Addison-Wesley, XML and Java: Developing Web Applications, 2nd Edition, May 2002, ISBN 0201770040



Section 5.1 Introduction

Section 5.2 Basic Tips for Using SAX

Section 5.3 DOM versus SAX

Section 5.4 Summary


Unlike DOM, the SAX specification is not authorized by W3C. SAX was developed through the xml-dev mailing list, the largest community of XML-related developers. The development of SAX was finished in May 1998. SAX 2.0, which introduced namespace support and the feature/property mechanism, was completed in May 2000.

As described in Chapter 2, SAX is an event-based parsing API. Its methods and data structures are much simpler than those of DOM. This simplicity implies that application programs based on SAX are required to do more work than those based on DOM.

On the other hand, SAX-based programs can often achieve high performance.

In this chapter, we describe some tips for using SAX. Then we compare DOM and SAX, and introduce sample programs using DOM and SAX.


In Chapter 2, Section 2.4 (see Figure 2.2) and Section 2.4.2 describe the basic concepts of SAX and the programming model for SAX. The concept of SAX is simple. A SAX parser reads an XML document from the beginning, and the parser tells an application what it finds by using the callback methods of ContentHandler or other interfaces.
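The callback model described above can be sketched as follows. This is a minimal illustration, not a listing from the book: the class name EventLogger is hypothetical, and the parser is obtained through the JDK's built-in JAXP implementation rather than a specific Xerces class.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records each callback the parser makes, in order.
class EventLogger extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String local, String qname, Attributes atts) {
        events.add("startElement:" + qname);
    }

    @Override
    public void endElement(String uri, String local, String qname) {
        events.add("endElement:" + qname);
    }

    // Parses the given XML text and returns the recorded event names.
    static List<String> parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            EventLogger logger = new EventLogger();
            reader.setContentHandler(logger);
            reader.parse(new InputSource(new StringReader(xml)));
            return logger.events;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String event : parse("<a><b/></a>")) {
            System.out.println(event);
        }
    }
}
```

The application never asks the parser for data; it only reacts to the calls it receives, which is why SAX programs keep their own state between events.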

However, there are some things you should know. We discuss them in this section.

5.2.1 ContentHandler

In this section, we discuss a major trap for beginning users of SAX and the parser feature mechanism, an important feature introduced in SAX2.

Trap of the characters() Events

The characters() method of ContentHandler confuses SAX beginners. Consider the following document:

startDocument()


characters(): "\n Hello,\n XML & Java!\n"

endElement() for the root element
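A parser is allowed to report one run of text through several characters() calls, so a handler should not assume that a single call carries the whole string. The sketch below shows one common way to cope, assuming text-only elements: append every chunk to a buffer and read the buffer when the element ends. The class name TextCollector and its fields are hypothetical and are not taken from the book's TextMatch listing.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: gathers the text of each text-only element.
class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    final List<String> texts = new ArrayList<>();
    int charactersCalls = 0;   // how many chunks the parser delivered

    @Override
    public void startElement(String uri, String local, String qname, Attributes atts) {
        buffer.setLength(0);   // start a fresh buffer for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        charactersCalls++;
        buffer.append(ch, start, length);   // never assume this is the whole text
    }

    @Override
    public void endElement(String uri, String local, String qname) {
        texts.add(buffer.toString());   // the text is complete only here
        buffer.setLength(0);
    }

    static TextCollector collect(String xml) {
        try {
            TextCollector handler = new TextCollector();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        TextCollector result = collect("<root>Hello, XML &amp; Java!</root>");
        System.out.println(result.texts.get(0));
        System.out.println("characters() calls: " + result.charactersCalls);
    }
}
```

The number of characters() calls depends on the parser, so the sketch reports it instead of relying on it.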


Stack context;

public TextMatch(String pattern) {

this.buffer = new StringBuffer();


}

public void processingInstruction(String target, String data) throws SAXException {

// Nothing to do because PI does not affect the meaning

// of a document.

}

public void startElement(String uri, String local,


try {

XMLReader xreader = XMLReaderFactory.createXMLReader( "org.apache.xerces.parsers.SAXParser");


TextMatch finds "XML & Java" in the book element, the


NAMESPACE-NS URI/LOCAL NAME

QUALIFIED NAME

CALLS

*PrefixMapping()


-true true x x

Basically, you need not disable the namespace feature. Turn it off only when the slight overhead of this feature is unacceptable. Turn on the namespace-prefix feature if you need qualified names or namespace declarations as attributes.

According to the JAXP specification, a SAX parser created by SAXParserFactory is not namespace-aware by default. In the JAXP implementation of Xerces, SAXParserFactory.setNamespaceAware() affects the setting of the namespace feature. As for Crimson in the JAXP 1.1 reference implementation, SAXParserFactory.setNamespaceAware() seems to affect neither the namespace feature nor the namespace-prefix feature. We recommend that you always get an XMLReader instance by using SAXParser.getXMLReader() and that you set these features explicitly.
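The recommendation above can be sketched as follows. The two feature URIs are the standard SAX2 identifiers; the class and method names are hypothetical, and the behavior shown assumes the JAXP implementation bundled with a current JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: obtains an XMLReader and sets both features explicitly.
class NamespaceAwareReader {
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    static final String NAMESPACE_PREFIXES = "http://xml.org/sax/features/namespace-prefixes";

    static XMLReader create() {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature(NAMESPACES, true);           // report namespace URI and local name
            reader.setFeature(NAMESPACE_PREFIXES, false);  // do not report xmlns attributes
            return reader;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns "{namespace URI}local name" for the root element.
    static String describeRoot(String xml) {
        final String[] result = new String[1];
        try {
            XMLReader reader = create();
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qname, Attributes atts) {
                    if (result[0] == null) {
                        result[0] = "{" + uri + "}" + local;
                    }
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(describeRoot("<p:a xmlns:p=\"urn:x\"/>"));
    }
}
```

Setting the features on the reader itself, rather than relying on factory defaults, keeps the behavior the same across JAXP implementations.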

5.2.2 Using and Writing SAX Filters

A SAX filter receives SAX events from a SAX parser, modifies these events, and forwards them to a handler, as shown in Figure 5.1. As far as the SAX parser is concerned, the SAX filter can be seen as a handler. On the other hand, as far as the handler is concerned, the SAX filter can be seen as a SAX parser.

Figure 5.1 SAX filter
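The parser-filter-handler chain can be sketched with the standard org.xml.sax.helpers.XMLFilterImpl class. This is an illustrative example only: the UpperCaseFilter class and what it does to the text are assumptions chosen for brevity, not the filter from the book's listing.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: upper-cases character data before forwarding it.
class UpperCaseFilter extends XMLFilterImpl {
    UpperCaseFilter(XMLReader parent) {
        super(parent);   // the parser is the filter's parent
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        char[] upper = new String(ch, start, length).toUpperCase(Locale.ROOT).toCharArray();
        super.characters(upper, 0, upper.length);   // forward the modified event
    }

    // Runs parser -> filter -> handler and returns the text the handler saw.
    static String run(String xml) {
        final StringBuilder seen = new StringBuilder();
        try {
            XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            XMLReader filter = new UpperCaseFilter(parser);
            // The application talks to the filter exactly as it would to a parser.
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    seen.append(ch, start, length);
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        System.out.println(run("<a>hello</a>"));
    }
}
```

Events that the filter does not override pass through unchanged, so a filter only needs code for the events it modifies.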


interface for SAX parsers

Typical uses of SAX filters are the following:

Modifying XML documents

When you write a program for modifying XML documents, you might want to reuse XMLSerializer for serializing SAX events to an XML document. Then you only have to write a SAX filter that modifies SAX events, and insert the filter between a SAX parser and XMLSerializer.

implementing a SAX filter that concatenates consecutive characters() events


Suppose that you want to use two handlers for a single XML document at the same time. Unfortunately, you cannot register two or more handlers of the same type to one XMLReader instance. So you implement a handler as a SAX filter (see Figure 5.2), or you make a filter that accepts the registration of two handlers and duplicates the input events (see Figure 5.3).

Figure 5.2 A handler performs as a filter.

Figure 5.3 A filter duplicates events.
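The event-duplication idea of Figure 5.3 can be sketched in a simplified form. The example below assumes a plain ContentHandler that forwards each event to two other handlers, not a full XMLFilter; TeeHandler, ElementCounter, and TeeDemo are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: duplicates the element and text events to two handlers.
class TeeHandler extends DefaultHandler {
    private final ContentHandler first;
    private final ContentHandler second;

    TeeHandler(ContentHandler first, ContentHandler second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public void startElement(String uri, String local, String qname, Attributes atts)
            throws SAXException {
        first.startElement(uri, local, qname, atts);
        second.startElement(uri, local, qname, atts);
    }

    @Override
    public void endElement(String uri, String local, String qname) throws SAXException {
        first.endElement(uri, local, qname);
        second.endElement(uri, local, qname);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        first.characters(ch, start, length);
        second.characters(ch, start, length);
    }
}

// Hypothetical downstream handler: counts startElement events.
class ElementCounter extends DefaultHandler {
    int count;

    @Override
    public void startElement(String uri, String local, String qname, Attributes atts) {
        count++;
    }
}

class TeeDemo {
    // Feeds one parse to two counters and returns both counts.
    static int[] countWithTwoHandlers(String xml) {
        try {
            ElementCounter a = new ElementCounter();
            ElementCounter b = new ElementCounter();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new TeeHandler(a, b));
            return new int[] { a.count, b.count };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A production version would forward the remaining ContentHandler methods as well (document start and end, prefix mappings, processing instructions, and so on).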


A typical code fragment for using a SAX parser follows:

XMLReader parser = XMLReaderFactory.createXMLReader(); // or parser = new SAXParser() if you use Xerces.


interface by adding getParent() and setParent() The


Listing 5.3 is an example of a SAX filter. It replaces elements like <email>foo@example.com</email> with


*/

public void startElement(String uri, String local, String qname, Attributes atts)


return null So you have to check whether the next handler is


</addresses>

5.2.3 New Features of SAX2

In this section, we summarize the new features of SAX2 for developers who have experience with SAX1.

Namespace support

SAX1 was finalized before the "Namespaces in XML" specification became a W3C Recommendation. So SAX1 has no namespace support. With SAX2, applications can receive namespace information as described in Section 5.2.1.

SAX filters

SAX1 has no interface for filters, though we can write filters without such an interface. SAX2 introduced a standard XMLFilter interface. It makes writing and using filters easier.

More information about an XML document

With SAX1, applications can know nothing about comments, CDATA sections, and many types of declarations in DTDs. SAX2 supports them with new interfaces.

Feature/property mechanism

SAX2 provides a generic mechanism to enable or disable the features of SAX parsers and to set or get extra information.


Name changes to classes and interfaces

Some interfaces of SAX1 were made obsolete by SAX2. We recommend using the SAX2 interfaces even if you don't need the new features of SAX. Table 5.2 summarizes the name changes.

Table 5.2 Interface Changes between SAX1 and SAX2

SAX1               SAX2               Change
Parser             XMLReader          Support of new interfaces
ParserFactory      XMLReaderFactory   Support of new interfaces
DocumentHandler    ContentHandler     Support of namespace
HandlerBase        DefaultHandler     Support of new interfaces
AttributeList      Attributes         Support of namespace
AttributeListImpl  AttributesImpl     Support of new interfaces
N/A                DeclHandler        Receive declarations in DTDs
N/A                LexicalHandler     Receive lexical information such as comments and CDATA sections
N/A                XMLFilter          New filter interface
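Two of the SAX2 additions listed above, LexicalHandler and the property mechanism, can be sketched together. The property URI is the standard SAX2 identifier for registering a LexicalHandler; the class name LexicalLogger is hypothetical, and the example assumes a parser that supports the property, as the JDK's bundled parser does.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical handler: records lexical events that SAX1 could not report.
class LexicalLogger extends DefaultHandler2 {
    static final String LEXICAL_HANDLER = "http://xml.org/sax/properties/lexical-handler";

    final List<String> events = new ArrayList<>();

    @Override
    public void comment(char[] ch, int start, int length) {
        events.add("comment:" + new String(ch, start, length));
    }

    @Override public void startCDATA() { events.add("startCDATA"); }
    @Override public void endCDATA() { events.add("endCDATA"); }

    static List<String> parse(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            LexicalLogger logger = new LexicalLogger();
            reader.setContentHandler(logger);
            // LexicalHandler has no dedicated setter; it is registered as a property.
            reader.setProperty(LEXICAL_HANDLER, logger);
            reader.parse(new InputSource(new StringReader(xml)));
            return logger.events;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("<a><!--note--><![CDATA[x<y]]></a>"));
    }
}
```

Without the setProperty() call the comment and CDATA callbacks are simply never invoked, even though the handler implements them.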


We discussed the basic concepts of DOM and tips for using DOM in Chapter 4 and discussed those of SAX in the previous section. In Section 2.4.3, we discussed points for deciding whether to use DOM or SAX. In this section, we compare the performance of DOM and SAX and study the conversion of DOM from and to SAX.

5.3.1 Performance: Memory and Speed

In this section, we compare the performance of DOM and SAX based on memory usage and on parsing speed.

Memory Usage

First, we compare the memory usage of DOM and SAX. We can guess that SAX uses less memory than DOM.

We use the XML document shown in Listing 5.4. Its size is 348 bytes.


public static void main(String[] argv) throws Exception {

String xml = argv[0];


"Deferred DOM," we call DocumentImpl without deferred DOM

"Non-deferred DOM," and we call CoreDocumentImpl "Core

R:\samples> java chap05.MemoryUsageDOM org.apache.xerces.dom.CoreDocumentImpl false file:./chap05/memtest10.xml

104776,155584,278472,280792,324416,327032,329664,291320,334944,337560, 340192,301848


document. The second invokes Non-deferred DOM and uses about 2.62KB for one document. The third invokes Core DOM and uses about 2.60KB for one document.

Figure 5.5 shows the memory usage of SAX, Deferred DOM, Non-deferred DOM, and Core DOM.

Figure 5.5 Memory usage for SAX and DOM implementations

For Non-deferred DOM or Core DOM, the amount of memory used increases in proportion to the number of nodes in a document. For Deferred DOM, the amount of memory used is not proportional. It does not use 220KB for a document twice as large. Table 5.3 shows the memory usage for documents containing 10, 100, 200, 300, 400, or 500 child nodes.

This result indicates that Deferred DOM wastes much memory. In fact, Deferred DOM defers creating DOM nodes in order to improve not memory performance but parsing speed. In general, object creation in Java costs much time, and reducing object creation (new operators) is very effective for improving


public class SpeedTest {


"http://apache.org/xml/properties/dom/document-class-name"; static final String FEATURE_DEFER =


domp.setProperty(PROP_DOC, "org.apache.xerces.dom.CoreDocumentImpl");

for (int i = -1; i < n; i++) {


Non-deferred DOM: 12748ms

Core DOM: 12648ms

R:\samples> java chap05.SpeedTest 500 true file:./chap05/memtest500.xml

SAX: 11036ms


Because a serializer accesses all nodes in the DOM tree, all nodes are eventually created even when Deferred DOM does not create them during parsing. In fact, Deferred DOM is the slowest in parsing combined with serialization.
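A parsing-speed comparison of this kind can be sketched with plain JAXP. This is not the book's SpeedTest program: the class ParseTimer, its methods, and the generated test document are assumptions, and the sketch uses the JDK's default parsers instead of selecting Xerces document classes. It borrows only the idea of one untimed warm-up pass, which the loop starting at -1 in the fragment above suggests.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical timing harness; results vary by machine and JVM.
class ParseTimer {
    // Builds a document with the given number of child elements.
    static String sampleDocument(int children) {
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < children; i++) {
            sb.append("<child>").append(i).append("</child>");
        }
        return sb.append("</root>").toString();
    }

    // Returns elapsed nanoseconds for `runs` SAX parses (pass -1 is a warm-up).
    static long timeSax(String xml, int runs) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            DefaultHandler handler = new DefaultHandler();   // ignores every event
            long start = System.nanoTime();
            for (int i = -1; i < runs; i++) {
                if (i == 0) start = System.nanoTime();       // restart after warm-up
                parser.parse(new InputSource(new StringReader(xml)), handler);
            }
            return System.nanoTime() - start;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns elapsed nanoseconds for `runs` DOM parses (pass -1 is a warm-up).
    static long timeDom(String xml, int runs) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            long start = System.nanoTime();
            for (int i = -1; i < runs; i++) {
                if (i == 0) start = System.nanoTime();
                builder.parse(new InputSource(new StringReader(xml)));
            }
            return System.nanoTime() - start;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = sampleDocument(500);
        System.out.println("SAX: " + timeSax(xml, 1000) / 1_000_000 + "ms");
        System.out.println("DOM: " + timeDom(xml, 1000) / 1_000_000 + "ms");
    }
}
```

Because the DOM pass here never touches the tree it builds, an implementation that defers node creation would look better in this measurement than it would in a test that also walks or serializes the tree.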

5.3.2 Conversion from DOM to SAX and Vice Versa

As described earlier, the runtime performance of SAX is always better than that of DOM. However, application development with SAX only is a hard job. Converters from DOM to SAX and vice versa would be useful.

In this section, we introduce DOMReader, which throws SAX events from a DOM tree, and DOMConstructor, which creates a DOM tree from SAX events.


DOMReader traverses an input DOM tree and generates corresponding SAX events. It is derived from XMLReader, which is the SAX parser interface, because it generates SAX events. However, the input to DOMReader is a DOM node, though the input to XMLReader is InputSource or a URI. Thus, DOMReader ignores the parameters of the parse() method and receives the input DOM via the setProperty() method.
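For comparison, the same DOM-to-SAX conversion can be obtained from the standard JAXP transformation API instead of a hand-written reader. The sketch below is not DOMReader: it swaps in an identity Transformer that reads a DOMSource and writes to a SAXResult, and the class name DomToSax is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: walks a DOM tree and reports it as SAX events.
class DomToSax {
    // Parses the XML into a DOM tree, replays it as SAX events,
    // and returns the element names the handler received.
    static List<String> elementNames(String xml) {
        final List<String> names = new ArrayList<>();
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            ContentHandler handler = new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qname, Attributes atts) {
                    names.add(qname);
                }
            };

            // A Transformer created without a stylesheet copies input to output.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.transform(new DOMSource(doc), new SAXResult(handler));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<a><b>text</b></a>"));
    }
}
```

The handler cannot tell whether the events came from a parser or from a tree walk, which is the property that makes such converters useful.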

The core of DOMReader is the processNode() method, which generates corresponding SAX events from various types of DOM nodes. It is not difficult to understand this method if you are familiar with DOM and SAX. Because there are no ways to


// text is ignorable or not.

chars = node.getNodeValue().toCharArray();

this.chandler.characters(chars, 0, chars.length);

break;


xreader.setContentHandler(mon);


characters: length=11 '\n aaa\n '


programming models of DOM and SAX

Because both normal character data and CDATA sections are represented by characters() events, we cannot distinguish CDATA sections from characters() events by examining characters() events only. To distinguish CDATA sections, we


CDATA sections or not by checking startCDATA() and endCDATA(). As for entity references, we also have to check startEntity() and endEntity() to know whether or not a parser is processing an entity reference.

Methods such as startCDATA(), startEntity(), and comment() are methods of the LexicalHandler interface. So they are not called if a SAX parser does not support LexicalHandler or an application does not register a DOMConstructor instance as a LexicalHandler to a SAX parser. In this case:

No Comment nodes are generated.

No EntityReference nodes are created, and the contents of entity references are appended directly.

Text nodes are generated instead of CDATASection nodes.

These differences do not change the meaning of an XML document, though they change the lexical representation of the XML document.

The type of an output node of DOMConstructor depends on the input SAX events. We get a Document node if the input SAX events start with startDocument() and end with endDocument(). Meanwhile, we get an Element node if the input SAX events start with startElement() and end with endElement(). To convert part of an XML document to a DOM tree, you can create a SAX filter to discard unnecessary events. See Listing 5.10.
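The core of a SAX-to-DOM conversion can be sketched in a much reduced form. The example below is not the book's DOMConstructor: SimpleDomBuilder is a hypothetical class that handles only elements, attributes, and text, and it always builds under a Document node. It does follow the same two ideas visible in the listing fragments, namely buffering text until the next structural event and creating elements with createElementNS().

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical, simplified builder: elements, attributes, and text only.
class SimpleDomBuilder extends DefaultHandler {
    private final Document factory;                       // creates all nodes
    private final Deque<Node> open = new ArrayDeque<>();  // currently open ancestors
    private final StringBuilder buffer = new StringBuilder();

    SimpleDomBuilder(Document factory) {
        this.factory = factory;
        open.push(factory);
    }

    // DOM expects null, not "", for "no namespace".
    private static String nsOrNull(String uri) {
        return (uri == null || uri.isEmpty()) ? null : uri;
    }

    // Turns buffered character data into a single Text node.
    private void flushText() {
        if (buffer.length() == 0) return;
        open.peek().appendChild(factory.createTextNode(buffer.toString()));
        buffer.setLength(0);
    }

    @Override
    public void startElement(String uri, String local, String qname, Attributes atts) {
        flushText();
        Element elem = factory.createElementNS(nsOrNull(uri), qname);
        for (int i = 0; i < atts.getLength(); i++) {
            elem.setAttributeNS(nsOrNull(atts.getURI(i)), atts.getQName(i), atts.getValue(i));
        }
        open.peek().appendChild(elem);
        open.push(elem);
    }

    @Override
    public void endElement(String uri, String local, String qname) {
        flushText();
        open.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length);   // consecutive chunks become one Text node
    }

    static Document build(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            spf.newSAXParser().parse(new InputSource(new StringReader(xml)),
                                     new SimpleDomBuilder(doc));
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because no LexicalHandler is registered, this sketch shows exactly the reduced behavior described above: comments are dropped and CDATA sections arrive as ordinary text.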

Listing 5.10 Convert SAX events to a DOM tree, chap05/DOMConstructor.java


this.factory = factory;


protected void flushText() {

if (this.buffer == null || this.buffer.length() == 0) return;

String text = new String(this.buffer);

if (this.inCdata) {


}

public void processingInstruction(String target, String data) throws SAXException {

this.flushText();

ProcessingInstruction pi;

pi = this.factory.createProcessingInstruction(target, data);

this.output(pi);


Element elem = this.factory.createElementNS(uri, qname);

for (int i = 0; i < atts.getLength(); i++) {

}

public void endEntity(String name) throws SAXException {


The program SAX2DOM (see Listing 5.11) is an example of

converting SAX events to a DOM tree with DOMConstructor It


The result of running SAX2DOM follows. We can see that two identical tree structures are created.

R:\samples>java chap05.SAX2DOM file:./chap05/nstest.xml

- DOM -


#text


In this chapter, we discussed some tips for using SAX and SAX filters. Then we compared the performance of DOM and SAX and described converting from DOM to SAX and from SAX to DOM.

In Chapter 6, we provide general tips on using XML processors and Xerces and discuss the new Xerces2 architecture.


Section 2.5 Summary


In the previous sections, we showed how to read and parse an XML document. Next, we explain how to process an XML document by accessing its internal structure through APIs.

The XML 1.0 Recommendation defines the precise behavior of an XML processor when reading and parsing a document, but it says nothing about which API to use. In this section, we discuss two widely used APIs.

The Document Object Model (DOM), a tree structure-based API by W3C. The specification consists of Level 1 (Recommendation in October 1998), Level 2 (Recommendation in November 2000), and Level 3 (currently a Working Draft) documents. Xerces 1.4.3 supports most of DOM Level 2.

The Simple API for XML (SAX), an event-driven API developed by David Megginson and a number of people on the xml-dev mailing list. Although not sanctioned by any standards body, SAX is supported by most of the available XML processors. Xerces 1.4.3 supports SAX and SAX2, which supports namespaces. In this book, the word "SAX" refers to SAX (version 1.0) and SAX2.

Figure 2.2 depicts the difference between the DOM and SAX APIs. When an application uses a DOM-based parser, the parser parses an XML document and passes a Document instance to the application. The application must wait until the parser has parsed the whole XML document. When an application uses a SAX-based parser, the parser starts parsing an XML document and passes an event stream to the application in the course of parsing. The next sections discuss in detail the pros and cons of using these APIs.

Figure 2.2 DOM versus SAX


In SAX2, some interfaces have been changed and renamed to support namespaces. Xerces supports both the SAX and SAX2 APIs, but the old SAX interfaces are now deprecated.

2.4.1 DOM: Tree-Based API


document.forms(1).username.value refers to the value of the input field with the name username in the first form element in an HTML document. This expression is used to access the HTML DOM on HTML browsers like Microsoft Internet Explorer (IE) and Netscape Navigator.

However, current HTML object models and APIs to access them are browser-dependent (though the problem is being resolved). Thus you generally should prepare different pages suited for each type of browser that might execute your scripts. One goal of the DOM specification is to define a common, interoperable document object model for HTML as well as XML. The first edition of this book is based on the DOM Level 1 Recommendation. The DOM Level 2 Recommendation was published on November 13, 2000. Handling of namespaces, events, traversal, range, and views was introduced in DOM Level 2. Standardization of DOM Level 3 is in progress. It will support load and save functions and other new functions. The details of using the DOM API are discussed in Chapter 4.

In DOM, an XML document is represented as a tree whose nodes are elements, text, and so on. An XML processor generates the tree and hands it to an application. A DOM-based XML processor (for example, DOMParser or DocumentBuilder) creates the entire structure of an XML document in memory (though Xerces defers the creation of DOM nodes until they are accessed).
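The tree-based model can be sketched with the JAXP DocumentBuilder mentioned above. This is a minimal illustration with hypothetical names (DomWalk, countElements): the whole document is parsed first, and only then does the application walk the resulting tree.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: parses a document into a DOM tree and walks it.
class DomWalk {
    static Document parse(String xml) {
        try {
            // parse() returns only after the entire document has been read.
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts element nodes in the subtree rooted at the given node.
    static int countElements(Node node) {
        int count = (node.getNodeType() == Node.ELEMENT_NODE) ? 1 : 0;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countElements(child);
        }
        return count;
    }

    public static void main(String[] args) {
        Document doc = parse("<a><b/><c><d/></c></a>");
        System.out.println("root: " + doc.getDocumentElement().getTagName());
        System.out.println("elements: " + countElements(doc));
    }
}
```

Unlike a SAX handler, this code can visit nodes in any order and as many times as it likes, because the tree stays in memory after parsing.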

XML is a language for describing tree-structured data. In XML, an element is represented by a start tag and a matching end tag (or an empty-element tag). An element may contain one or
