
Slide 1

Copyright 2001, ActiveState

Python and XML

Slide 2

About me

• Paul Prescod, (paul@activestate.com)

• ActiveState Senior Developer

• Co-Author, XML Handbook

Slide 4

What is Python?

• Python is an easy-to-learn, powerful programming language

– Efficient high-level data structures

– Simple approach to object-oriented programming

– Elegant syntax and dynamic typing

Slide 5

Brief History of Python

• CWI, early 90s

• Dynamic, object-oriented, high-level language

• More than a text processing language

• More than a scripting language

• Scalable and object-oriented from the beginning

• Dynamically type checked

Slide 6

Python's business case

• Python can displace many other languages in the organization

• The Python interpreter is free

• Python is legally unencumbered

• Professional programmers find Python more flexible than most languages

• Amateur programmers are (often) more comfortable with Python than with Perl or Java

Slide 7

Usability features

• Exceptionally clear syntax

• Provides an obvious way to do most things

• A small set of features combines in powerful ways

• Only innovative where innovation is really necessary

Slide 8

More Usability features

• Huge amount of free code and libraries

• Interactive

• Designed to talk to the world

• Runs on Unix, Mac and Windows

• Integrates with the JVM (Jython) and the .NET Framework (Python.NET)

• Talks MS COM, XPCOM, CORBA, SOAP, XML-RPC, …

Slide 9

Scalability features

• Simple but powerful module system

• Simple but powerful class system

• Structured, standardized exceptions

Slide 11

Extendable

• New data types in Python or C

• Modules in Python or C

• Functions in Python or C

Slide 12

Python isn't picky!

Slide 14

Compared to Java

• Java is more difficult for amateur programmers

• Static type checking can be inconvenient in text processing

• Puritanical OO can be inconvenient

• Bottom line: Java can make simple projects harder

Slide 15

Why not Java: political

• "100% pure Java" gets in the way

• The Java environment punishes interoperability (e.g. getenv is

Slide 16

Jython (née JPython)

• Compiles Python classes to Java classes

• Embedded interpreter allows interactive coding

• Access to all Java classes

• For better or worse: maintains Java's security/platform-independence bubble

Slide 17

Jython can use Java tools

Slide 18

• Raw text searching is not as fast as Perl.

• Dynamic type checking requires more care in testing.

Slide 19

Python "Hello world"

print "Hello, World"

Slide 20

Python interpreter

• Just type:

C:\> python

Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32

Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam

>>> print "Hello, World"

Slide 21

• .py files get a compiled .pyc file in the same directory

• When the .py file is updated, the .pyc file is regenerated
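The slide describes the Python 1.5/2.x layout. Python 3.2 and later (PEP 3147) instead put the compiled file, by default, in a `__pycache__` subdirectory, with the interpreter version in its name. The sketch below shows this. The module name `demo_mod` and the temporary directory are assumptions made for illustration.

```python
import importlib.util
import os
import py_compile
import tempfile

# Write a throwaway module (hypothetical name) into a temporary directory.
tmpdir = tempfile.mkdtemp()
source = os.path.join(tmpdir, "demo_mod.py")
with open(source, "w") as f:
    f.write("x = 42\n")

# Byte-compile it explicitly; py_compile returns the path it wrote.
compiled = py_compile.compile(source, doraise=True)

# By default Python 3 places the .pyc under __pycache__ with a version tag,
# e.g. __pycache__/demo_mod.cpython-312.pyc (the tag varies by interpreter).
print(compiled)
print(compiled == importlib.util.cache_from_source(source))
```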

Slide 22

Interpreters

• DOS/Win32 (last slide)

• Unix (use ^D to exit)

• Graphical: “IDLE”, “PythonWin”

Slide 24

Numeric types

• int: 32 bit, e.g. "x=5"

• long: arbitrary size, e.g. "x=2L**128"

• float: accuracy depends on platform, e.g. "x=3.14"

• complex: real + imaginary parts, e.g. "x=5.3+3.2j"
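Python 3 drops the int/long distinction shown on this slide. `int` has arbitrary precision, and the `L` suffix is a syntax error. Here are the four numeric types under that modern behaviour:

```python
import sys

x = 5                 # int
big = 2 ** 128        # still an int in Python 3; no "L" suffix needed
f = 3.14              # float: a C double on virtually all platforms
z = 5.3 + 3.2j        # complex: real and imaginary parts are floats

print(type(big).__name__, big)  # int 340282366920938463463374607431768211456
print(z.real, z.imag)
print(sys.float_info.dig)       # decimal digits a float represents reliably
```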

Slide 25

Sequence types:

• Strings: "abcd"

• Tuples: (1,2,"b")

• Lists: [1,"a",3]

Slide 27

Sequence types: string

myStr = "abc" # assignment

myStr = myStr + "def" # = "abcdef"

otherstr = myStr[ 1 : 4 ] # = "bcd"

Slide 28

Sequence types: lists

myList = ["a", 5 , 3.25 , 2L , 4 + 3j ]

anotherList = ["a",myList, ["3","2"]]

anotherList2 = myList + myList

# = ["a", 5, 3.25, 2L, (4+3j), "a", 5, 3.25, 2L, (4+3j)]

yetAnotherList = myList[ 1 : 3 ]

# = [5,3.25]

Slide 29

Iterating over sequences

strlist = ["abc", "def", "ghi"]

for item in strlist:
    for char in item:
        print char

Slide 33

Getting the length

• The len() function gets a sequence's length

>>> len( "abc" )

3

>>> len( ["abc","def"] )

2

Slide 34

Traceback (innermost last):

File "<stdin>", line 1, in ?

TypeError: object doesn't support item assignment
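Python reports that traceback when code assigns into an immutable sequence. The wording shown comes from an old interpreter; modern versions name the type, e.g. `'tuple' object does not support item assignment`. This Python 3 sketch contrasts mutable lists with immutable tuples and strings:

```python
t = (1, 2, "b")      # tuple: immutable
s = "abcd"           # string: immutable
items = [1, "a", 3]  # list: mutable

items[0] = 99        # allowed: lists can be changed in place

for seq in (t, s):
    try:
        seq[0] = 99  # not allowed for tuples or strings
    except TypeError as exc:
        print(type(seq).__name__, "->", exc)

# To "change" an immutable sequence, build a new one.
s2 = "X" + s[1:]
print(items, s2)     # [99, 'a', 3] Xbcd
```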

Slide 35

Dictionaries

• Serve as a lookup table

• Maps "keys" to "values"

• Keys can be of any immutable type

• Assignment adds or changes members

• keys() method returns keys
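This Python 3 sketch illustrates the points above using made-up data. Strictly, keys must be hashable. In practice that means immutable values such as strings, numbers, and tuples of them. Note also that in Python 3, `keys()` returns a view rather than a list.

```python
greek = {"a": "alpha", "b": "beta"}   # lookup table: keys -> values

greek["g"] = "gamma"         # assignment adds a new member...
greek["a"] = "ALPHA"         # ...or changes an existing one
greek[(1, 2)] = "tuple key"  # a tuple is immutable, so it can be a key

print(greek["b"])            # beta
print(list(greek.keys()))    # ['a', 'b', 'g', (1, 2)] (insertion order, 3.7+)

try:
    greek[["a", "list"]] = 1  # lists are mutable, hence unhashable
except TypeError as exc:
    print("TypeError:", exc)
```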

Slide 36

'a' : 'alpha' }

Slide 37

>>> dict.clear()

>>> print dict

{}

Slide 38

File Objects

• Represent opened files:

myFile = open( "catalog.txt", "r" )

data = myFile.read()

myFile.close()

myFile = open( "catalog2.txt", "w" )

data = data + "more data"

myFile.write( data )

myFile.close()
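Current Python usually handles files with a `with` block, which closes the file even if an error occurs. Below is the same read-then-write flow in Python 3. It keeps the slide's file names but places them in a temporary directory, and the input text is made up.

```python
import os
import tempfile

tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "catalog.txt")
dst = os.path.join(tmpdir, "catalog2.txt")

with open(src, "w") as f:      # create some input to read
    f.write("some data, ")

# Each "with" block closes its file automatically on exit.
with open(src, "r") as f:
    data = f.read()

with open(dst, "w") as f:
    f.write(data + "more data")

with open(dst) as f:
    print(f.read())            # some data, more data
```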

Slide 39

Function definitions

• Encapsulate bits of code

• Can take a fixed or variable number of arguments

• Arguments can have default values
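Here the three points are shown as Python 3 code. The function names `greet` and `total` are hypothetical.

```python
def greet(name, greeting="Hello"):
    """One required argument plus one with a default value."""
    return "%s, %s!" % (greeting, name)

def total(*numbers):
    """Variable number of positional arguments, collected in a tuple."""
    result = 0
    for n in numbers:
        result += n
    return result

print(greet("World"))                # Hello, World!
print(greet("Paul", greeting="Hi"))  # Hi, Paul!
print(total(1, 2, 3))                # 6
```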

Slide 40

Functions are objects
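Because functions are ordinary objects, they can be stored in data structures, passed as arguments, and inspected like any other value. The following Python 3 sketch uses hypothetical names:

```python
def shout(text):
    return text.upper()

def whisper(text):
    return text.lower()

# Functions stored in a dictionary...
styles = {"loud": shout, "quiet": whisper}

# ...and passed as an argument to another function.
def apply_all(func, words):
    return [func(w) for w in words]

print(apply_all(styles["loud"], ["xml", "sax"]))  # ['XML', 'SAX']
print(shout.__name__)                              # shout
```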

Slide 41

Flow Control Statements

• if/elif/else

• while

• for

• try
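Each of the four statements is used once in this Python 3 sketch. Python has no separate `then` keyword; conditional branches are spelled `if`/`elif`/`else`. All names are illustrative.

```python
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"

countdown = []
n = 3
while n > 0:              # repeat while the condition holds
    countdown.append(n)
    n -= 1

labels = []
for v in (-1, 0, 1):      # iterate over any sequence
    labels.append(classify(v))

try:                      # handle an error instead of stopping
    value = int("not a number")
except ValueError:
    value = None

print(countdown, labels, value)  # [3, 2, 1] ['negative', 'zero', 'positive'] None
```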

Slide 42

Exception handling

• Python exception handling is similar to Java/C++

• Errors are reported in tracebacks

• Exceptions propagate up
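The following Python 3 sketch shows an exception raised deep inside one function and handled further up the call stack. `CatalogError`, `parse_price` and `load` are hypothetical names.

```python
class CatalogError(Exception):
    """Hypothetical application-level error."""

def parse_price(text):
    try:
        return float(text)
    except ValueError:
        # Translate a low-level error into an application-level one.
        raise CatalogError("bad price: %r" % text)

def load(prices):
    # No handler here: a CatalogError simply propagates to the caller.
    return [parse_price(p) for p in prices]

try:
    load(["1.50", "abc"])
except CatalogError as exc:
    print("caught:", exc)          # caught: bad price: 'abc'
finally:
    print("the finally block runs either way")
```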

Slide 43

Exception traceback

Traceback (innermost last):

File "test.py", line 10, in ?

Slide 44

Classes

• Classes combine code and data.

• They represent real world objects.

• We create "instance objects" from classes.

• Closest languages in terms of object model are Smalltalk and Ruby.

• Much more flexible than Java or C++

• More central to the language than in Perl/Tcl/PHP.
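In this Python 3 sketch, a class combines data with code and is used to create an instance object. The `Bookmark` class and its URL are hypothetical.

```python
class Bookmark:
    """Hypothetical class combining data (title, url) with code (describe)."""

    def __init__(self, title, url):
        self.title = title    # per-instance data
        self.url = url

    def describe(self):
        return "%s <%s>" % (self.title, self.url)

# Create an "instance object" from the class.
b = Bookmark("SIG for XML Processing in Python", "http://example.com/xml-sig")
print(b.describe())

# Python's object model is dynamic: attributes can be added at run time.
b.visited = True
print(b.visited)
```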

Slide 45

Inheritance

• Classes can specify a base class.

• The new class "inherits" methods and data.

• The new class can

– "override" methods.

– add data and methods.

• Multiple Inheritance is okay

• All methods are virtual.
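This Python 3 sketch covers each point on the slide: inheriting methods, overriding one, extending one, multiple inheritance, and run-time ("virtual") method lookup. All class names are hypothetical; they are not related to the DOM classes discussed later.

```python
class Item:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return "item " + self.name

    def kind(self):
        return "generic"

class Folder(Item):                       # Folder inherits from Item
    def kind(self):                       # override an inherited method
        return "folder"

class Printable:                          # a mix-in class
    def show(self):
        # self.describe() is looked up at run time, so a subclass's version wins
        return "[" + self.describe() + "]"

class PrintableFolder(Folder, Printable):  # multiple inheritance
    def describe(self):
        # extend the inherited method rather than replacing it outright
        return super().describe() + " (printable)"

pf = PrintableFolder("XML bookmarks")
print(pf.kind())   # folder
print(pf.show())   # [item XML bookmarks (printable)]
```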

Slide 46

Modules and Packages

• A module is a set of code in a single file

• A package is a collection of related modules

Slide 47

XML and Python

• Accessing XML with Python

• Parsing XML with Python

– Non-validating parsers

– Validating parsers

Slide 50

Parsers for Jython

Slide 51

Manipulating XML

• Flat-file processing with regular expressions (briefly!)

• PySAX - Simple API for XML

• PyDOM - W3C Document Object Model

• …

Slide 52

Flat File Processing

• XML documents are text

• Ordinary textual tools continue to work

• E.g. search for emph elements:

import re
for i in re.findall( r"<emph>(.*?)</emph>", input ):
    print i
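Here is the same search as a runnable Python 3 sketch, using a made-up input string. It assumes `emph` elements are never nested and that no comment or CDATA section contains the tag text. Those assumptions are exactly why regular expressions suit only simple jobs.

```python
import re

text = """<para>Some <emph>important</emph> text and
<emph>more
emphasis</emph>.</para>"""

# findall returns the captured group for each non-overlapping match.
# ".*?" is non-greedy, so each match stops at the nearest </emph>;
# re.DOTALL lets "." match the newline inside the second element.
matches = re.findall(r"<emph>(.*?)</emph>", text, re.DOTALL)
for m in matches:
    print(m)
```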

Slide 53

Flat File Recipe

• Unless your needs are very simple, let me help you!

• I’ve already converted the ultimate XML parsing regular expression to Python:

http://aspn.activestate.com/ASPN/Python/Cookbook/Recipe/65125

Slide 54

Events

• Think of an XML document as a series of events

• "Start tag", "End tag", "Characters", etc.

• We can handle hierarchy by tracking start/end tags

• We can deal with the document a little at a time
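To make the event view concrete, this sketch uses the standard library's low-level `xml.parsers.expat` module. That module is a swap-in chosen for illustration; the slides go on to use SAX. Each callback records one event, and a depth counter tracks hierarchy from start and end tags. The document string is a made-up example.

```python
import xml.parsers.expat

events = []
depth = 0

def start(name, attrs):          # "Start tag" event
    global depth
    events.append(("start", name, depth))
    depth += 1

def end(name):                   # "End tag" event
    global depth
    depth -= 1
    events.append(("end", name, depth))

def chars(data):                 # "Characters" event
    if data.strip():             # ignore whitespace-only runs
        events.append(("chars", data, depth))

parser = xml.parsers.expat.ParserCreate()
parser.buffer_text = True        # merge adjacent text into one callback
parser.StartElementHandler = start
parser.EndElementHandler = end
parser.CharacterDataHandler = chars

parser.Parse("<folder><title>XML bookmarks</title></folder>", True)
for event in events:
    print(event)
```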

Slide 55

PySAX

• "Simple API for XML"

• Common API for parsers

• Based on Java API

• Parser implements certain interfaces

• Application implements callback interfaces

Slide 56

SAX Model

• The application hands the parser an event handler object

• The parser sends events to the handler

• The handler can

– store them somehow,

– build something,

– re-route them to other parts of the app
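This is a minimal Python 3 sketch of the model. The application creates a handler object and hands it to the parser, and the handler builds something from the events; here it counts element types. The attribute name `tags` echoes the `print handler.tags` fragment later in the deck, but the `TagCounter` class and the sample document are hypothetical.

```python
import xml.sax
from xml.sax.handler import ContentHandler

class TagCounter(ContentHandler):
    """Builds a dictionary mapping element-type name -> occurrence count."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def startElement(self, name, attrs):
        self.tags[name] = self.tags.get(name, 0) + 1

doc = b"""<folder><title>XML bookmarks</title>
<bookmark><title>SIG for XML Processing in Python</title></bookmark>
</folder>"""

handler = TagCounter()
xml.sax.parseString(doc, handler)   # the parser sends events to the handler
print(handler.tags)                 # {'folder': 1, 'title': 2, 'bookmark': 1}
```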

Slide 61

print handler.tags

Slide 63

Error Handling

• In addition to a content handler, we should assign an error handler.

class MyErrorHandler:
    def warning(self, exception):
        print "Whoa, nelly!"
        print exception
    def error(self, exception):
        print "Whoa, nelly!"
        raise exception
    def fatalError(self, exception):
        print "Whoa, nelly!"
        raise exception
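The slide's class only defines the callbacks; it still has to be registered with the parser. In this Python 3 sketch, the handler subclasses `xml.sax.handler.ErrorHandler`, and an instance is passed as the third argument to `xml.sax.parseString`. The handler records messages instead of printing them, and the malformed document is a made-up example.

```python
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler

class MyErrorHandler(ErrorHandler):
    def __init__(self):
        self.messages = []

    def warning(self, exception):
        self.messages.append("warning: %s" % exception)  # keep going

    def error(self, exception):
        self.messages.append("error: %s" % exception)
        raise exception

    def fatalError(self, exception):
        self.messages.append("fatal: %s" % exception)
        raise exception

errors = MyErrorHandler()
try:
    # Mismatched tags make this document not well-formed.
    xml.sax.parseString(b"<a><b></a>", ContentHandler(), errors)
except xml.sax.SAXParseException as exc:
    print("stopped at line", exc.getLineNumber(), "column", exc.getColumnNumber())
print(errors.messages)
```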

Slide 65

Character handling

# print out characters in document

from xml.sax.handler import ContentHandler

import xml.sax, sys
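The slide's code stops after its imports. The following is one possible way such a handler could continue, written in Python 3; it is not the slide's missing code. The parser may deliver a single run of text through several `characters()` calls, so the handler accumulates chunks. `TextCollector` and the document are hypothetical.

```python
# print out characters in document
import xml.sax
from xml.sax.handler import ContentHandler

class TextCollector(ContentHandler):
    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # May be called several times for one text node.
        self.chunks.append(content)

doc = b"<folder><title>XML bookmarks</title><title>Python</title></folder>"
handler = TextCollector()
xml.sax.parseString(doc, handler)
text = "".join(handler.chunks)
print(text)   # XML bookmarksPython
```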

Slide 66

Document Object Model

• Document Object Model

• The DOM is a W3C standard

• Extended version of "Dynamic HTML"

• Defined in CORBA IDL

• Implemented in various languages

• Implemented in IE5.0 and eventually Netscape

Slide 67

The DOM

• The DOM is a tree-based API

• This implies a certain amount of overhead

• But also a lot of convenience and flexibility

• XPath implementation essentially requires tree-based APIs

Slide 68

DOM Nodes

• Elements, attributes, comments, etc. are called "nodes"

• Classes represent node types

• All node types subclass the "node" base class
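This Python 3 sketch uses the standard library's `xml.dom.minidom`. Every node object is an instance of a class derived from `xml.dom.Node`, and its `nodeType` tells you which kind of node it is. The sample document is made up.

```python
from xml.dom import Node, minidom

doc = minidom.parseString("<folder><!-- note --><title>XML bookmarks</title></folder>")
folder = doc.documentElement

for child in folder.childNodes:
    # Class name, numeric node type, and the shared base class.
    print(type(child).__name__, child.nodeType, isinstance(child, Node))

print(folder.childNodes[0].nodeType == Node.COMMENT_NODE)   # True
print(folder.childNodes[1].nodeType == Node.ELEMENT_NODE)   # True
```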

Slide 71

DOM node types

Slide 72

More DOM node types

Slide 73

Navigation properties

• parentNode - Parent of this node

• firstChild - First child of this node

• lastChild - Last child of this node

• previousSibling - Node immediately preceding this node

• nextSibling - Node immediately following this node

• childNodes - List containing all the children of this node

Slide 74

<title> SIG for XML Processing in

Python </title>

</bookmark>

</folder>

Slide 76

First "title" node

Properties:

• parentNode: folder element

• firstChild: Text node 'XML bookmarks'

• lastChild: Text node 'XML bookmarks'

• previousSibling: None

• nextSibling: bookmark element

• childNodes: A 1-element list: [ Text node 'XML bookmarks' ]
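This Python 3 minidom sketch checks these navigation properties. The deck does not show the full document, so the one below is a hypothetical reconstruction consistent with the bookmark fragment. It is written with no whitespace between tags. Otherwise minidom would keep whitespace-only Text nodes, and `nextSibling` would be a Text node rather than the bookmark element.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<folder><title>XML bookmarks</title>"
    "<bookmark><title>SIG for XML Processing in Python</title></bookmark>"
    "</folder>"
)

title = doc.getElementsByTagName("title")[0]   # first "title" node

print(title.parentNode.tagName)             # folder
print(title.firstChild.data)                # XML bookmarks
print(title.lastChild is title.firstChild)  # True: only one child
print(title.previousSibling)                # None
print(title.nextSibling.tagName)            # bookmark
print(len(title.childNodes))                # 1
```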

Slide 80

Modifying a DOM

appendChild(newChild)

insertBefore(newChild, refChild)

replaceChild(newChild, oldChild)

removeChild(oldChild)
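This Python 3 sketch uses all four methods on a small made-up document with `xml.dom.minidom`. New nodes come from the Document's `createElement`, `createComment` and `createTextNode` factory methods.

```python
from xml.dom import minidom

doc = minidom.parseString("<folder><title>XML bookmarks</title></folder>")
folder = doc.documentElement
title = folder.firstChild

# appendChild: add a new element at the end.
bookmark = doc.createElement("bookmark")
folder.appendChild(bookmark)

# insertBefore: put a comment in front of the title.
folder.insertBefore(doc.createComment(" edited "), title)

# replaceChild: swap the title's text node for a new one.
title.replaceChild(doc.createTextNode("Python XML bookmarks"), title.firstChild)

# removeChild: detach the (empty) bookmark again.
folder.removeChild(bookmark)

print(folder.toxml())
# <folder><!-- edited --><title>Python XML bookmarks</title></folder>
```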

Slide 81

The Document Node

• One Document node per document

• The base of the entire tree

• documentElement attribute contains a single Element node

• childNodes may have additional children, such as ProcessingInstruction nodes
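This Python 3 minidom sketch shows the Document node as the root of the tree, its `documentElement`, and an extra ProcessingInstruction child. The stylesheet PI and its `style.xsl` target are hypothetical.

```python
from xml.dom import Node, minidom

doc = minidom.parseString(
    '<?xml-stylesheet type="text/xsl" href="style.xsl"?>'
    "<folder><title>XML bookmarks</title></folder>"
)

print(doc.nodeType == Node.DOCUMENT_NODE)   # True: the base of the tree
print(doc.documentElement.tagName)          # folder: the single root element

# The Document's own children also include the processing instruction.
for child in doc.childNodes:
    print(child.nodeType, child.nodeName)
```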

Slide 83

PyDOM

• A richer, more robust DOM than minidom

• More classes, support for DOM 2+

• Integration with XPath and XSLT

Slide 85

PyXML Parsers

• xml.parsers.xmlproc

• qp_xml

• xml.sax.drivers

Slide 87

– …

Slide 88

Python SOAP Example

• SOAP.py:

import SOAP

server = SOAP.SOAPProxy( "http://localhost:8000/" )

print server.echo( "Hello world" )

Slide 89

XML and Zope

• Zope is an Open Source application server that publishes objects on the Internet.

• ParsedXML: Breaks up an XML document into bits.

• XML-RPC: You can plumb the depths of Zope with XML-RPC.

• ZCatalog: Index based on element-type names, attribute names, etc.

Slide 90

ParsedXML

• A free Zope “product” (extension)

• Every element is a first-class Zope object

• You can add “behavior” to XML documents

• RSS Channel Product

Slide 92

Redfoot

• Redfoot is a framework for distributed RDF-based applications, written in Python. It includes:

– an RDF database

– a query API for RDF

– an RDF parser and serializer

– a simple HTTP server providing a web interface for viewing and editing RDF

– a fully customizable UI

– the beginnings of a peer-to-peer architecture for communication between different RDF databases
