
PART V

Programming

CHAPTER 11: Event-Driven Programming

CHAPTER 12: LINQ to XML

Event-Driven Programming

WHAT YOU WILL LEARN IN THIS CHAPTER:

XMLReader

There are many ways to extract information from an XML document. You’ve already seen how to use the document object model and XPath; both of these methods can be used to find any relevant item of data. Additionally, in Chapter 12 you’ll meet LINQ to XML, Microsoft’s latest attempt to incorporate XML data retrieval in its universal data access strategy.

Given the wide variety of methods already available, you may be wondering why you need more, and in particular why you need event-driven methods. The main answer is memory limitations. Other XML processing methods require that the whole XML document be loaded into memory (that is, RAM) before any processing can take place. Because XML documents typically use up to four times more RAM than the size of the file containing the document, some documents can take up more RAM than is available on a computer; it is therefore necessary to find an alternative method to extract data. This is where event-driven paradigms come into play. Instead of loading the complete file into memory, the file is

covered in this chapter


UNDERSTANDING SEQUENTIAL PROCESSING

There are two main ways of processing a file sequentially. The first relies on events being fired whenever specific items are found; whether you respond to these events is up to you. For example, say an event is fired when the opening tag of the root element is encountered, and the name of this element is passed to the event handler. Any time any textual content is found after this, another event is fired. In this scenario there would also be events that capture the closing of any elements, with the final event being fired when the closing tag of the root element is encountered.
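The event sequence just described can be sketched in Java with the SAX classes that ship with the JDK. This is a minimal illustration rather than the chapter's own listing: the class name `SaxEventDemo`, the method `collectEvents`, and the sample document are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records each event the parser fires, in order.
class SaxEventDemo {

    // Parses the XML text and returns a log of the events that were fired.
    static List<String> collectEvents(String xml) throws Exception {
        final List<String> events = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                events.add("start:" + qName);   // opening tag found
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Text may arrive in more than one chunk; whitespace-only chunks are skipped.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("text:" + text);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);     // closing tag found
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<people><person>Ada</person></people>"; // sample document (assumed)
        for (String event : collectEvents(xml)) {
            System.out.println(event);
        }
    }
}
```

The handler only reacts to the callbacks it overrides; any event it does not care about is simply ignored, which is the "whether you respond is up to you" point made above.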

The second method is slightly different in that you tell the processor what sort of content you are interested in. For example, you may want to read an attribute on the first child under the root element. To do so, you instruct the XML reader to move to the root element and then to its first child. You would then begin to read the attributes until you get to the one you need. Both of these methods are similar conceptually, and both cope admirably with the problem of larger memory usage posed by using the DOM, which requires the whole XML document to be loaded into memory before being processed.
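The second, reader-driven style can be sketched in Java as well. The example below uses the StAX `XMLStreamReader` that ships with JDK 6 and later as a stand-in pull-style reader; that choice, the class name `PullReaderDemo`, and the method `firstChildAttribute` are assumptions for illustration, not the reader the chapter itself goes on to cover.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class: pulls one attribute from the first child of the root.
class PullReaderDemo {

    // Returns the named attribute of the root's first child element, or null if absent.
    static String firstChildAttribute(String xml, String attrName) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            int depth = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    depth++;                       // 1 = root, 2 = first child
                    if (depth == 2) {
                        // Read the attributes until the wanted one is found.
                        for (int i = 0; i < reader.getAttributeCount(); i++) {
                            if (attrName.equals(reader.getAttributeLocalName(i))) {
                                return reader.getAttributeValue(i);
                            }
                        }
                        return null;               // first child lacks the attribute
                    }
                }
            }
            return null;                           // root has no child elements
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<people><person id=\"p1\" name=\"Ada\"/><person id=\"p2\"/></people>";
        System.out.println(firstChildAttribute(xml, "name"));
    }
}
```

Here the calling code asks for the next item and stops as soon as it has what it needs, so the rest of the document is never read.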

Processing files in a sequential fashion has one or two downsides, however. The first is that you can’t revisit content. If you read an element and then move on to one of its siblings or children, you can’t then go back and examine one of its attributes without starting from the beginning again. You need to plan carefully what information you’ll need. The second problem is validation. Imagine you receive the document shown here:

</document>

encounters, but won’t complain that the document is not valid until it reaches the relevant point. You may not care about the extra element, in which case you can just extract whatever you need, but if you want to validate before processing begins, this usually involves reading the document twice. This is the price you pay for not needing to load the full document into memory.
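The late-reporting behavior can be observed with a small experiment. As a simplifying assumption, the sketch below uses a document that is not well-formed (a mismatched closing tag) rather than one that fails schema validation, because that needs no schema; the point it illustrates is the same, namely that events for the earlier content are delivered before the problem is reported. The class name `LateErrorDemo` and the sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: shows that elements are reported before a later error is found.
class LateErrorDemo {

    // Returns the element names seen, followed by "OK" or an "ERROR ..." entry.
    static List<String> parseUntilError(String xml) throws Exception {
        final List<String> seen = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                seen.add(qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            seen.add("OK");
        } catch (SAXParseException e) {
            // The parser only notices the problem when it reaches it.
            seen.add("ERROR at line " + e.getLineNumber());
        }
        return seen;
    }

    public static void main(String[] args) throws Exception {
        // Sample input (assumed): the final closing tag does not match the root.
        String xml = "<document>\n<title>a</title>\n<body>b</body>\n</wrong>";
        System.out.println(parseUntilError(xml));
    }
}
```

Any work done in the handlers before the error has already happened by the time the exception arrives, which is why checking a document before processing it usually means a second pass.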

In the following sections you’ll examine the two methods in more detail. The pure event-driven method is called SAX and is commonly used with Java, although it can be used from any language.

USING SAX IN SEQUENTIAL PROCESSING

SAX stands for the Simple API for XML, and arose out of discussions on the XML-DEV list in the late 1990s.

Back then people were having problems because different parsers were incompatible. David Megginson took on the job of coordinating the process of specifying a new API with the group. On May 11, 1998, the SAX 1.0 specification was completed. A whole series of SAX 1.0–compliant parsers then began to emerge, both from large corporations, such as IBM and Sun, and from enterprising individuals, such as James Clark. All of these parsers were freely available for public download.

Eventually, a number of shortcomings in the specification became apparent, and David Megginson and his colleagues got back to work, finally producing the SAX 2.0 specification on May 5, 2000.

The improvements centered on added support for namespaces and tighter adherence to the XML specification. Several other enhancements were made to expose additional information in the XML document, but the core of SAX was very stable. On April 27, 2004, these changes were finalized and released as version 2.0.2.

SAX is specified as a set of Java interfaces, which initially meant that if you were going to do any serious work with it, you were looking at doing some Java programming using Java Development Kit (JDK) 1.1 or later. Now, however, a wide variety of languages have their own version of SAX, some of which you learn about later in the chapter. In deference to the SAX tradition, however, the examples in this chapter are written in Java.

source project hosted by SourceForge. To download SAX, go to the homepage and browse

.net/projects/sax. This is one of the extraordinary things about SAX: it isn’t owned by anyone. It doesn’t belong to any consortium, standards body, company, or individual. In other words, it doesn’t survive because some organization or government says that you must use it to comply with their standards, or because a specific company supporting it is dominant in the marketplace. It survives because it’s simple and it works.

Preparing to Run the Examples

The SAX specification does not limit which XML parser you use with your document. It simply sits on top of it and reports what content it finds. A number of different parsers are available out in the wild, but these examples use the one that comes with the JDK.
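Because SAX is an interface layer, Java code normally asks the JAXP factory for whatever parser the installed JDK supplies instead of naming one directly. The following is a small sketch of that lookup; the class name `WhichParser` and the method `defaultReader` are made up for the example, and the implementation class printed will depend on your JDK.

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

// Hypothetical demo class: obtains the SAX parser bundled with the running JDK.
class WhichParser {

    // Returns a namespace-aware XMLReader backed by the JDK's default parser.
    static XMLReader defaultReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);   // enable the SAX 2 namespace support
        SAXParser parser = factory.newSAXParser();
        return parser.getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        // Prints the name of the concrete parser class sitting underneath SAX.
        System.out.println(defaultReader().getClass().getName());
    }
}
```

The calling code depends only on the `org.xml.sax` interfaces, so a different parser could be substituted without changing the handlers.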

If you don’t have the JDK already installed, perform the following steps to do so:

1. Go to http://www.oracle.com/technetwork/java/javase/downloads/index.html. Download the latest version under the SE section. These examples use 1.6 but 1.7 is

.org/archives/xml-dev/. The list is still very active, and any XML-related problems are usually responded to within hours, if not minutes.

Title	Event-Driven Programming
Institution	University of Example
Major	Computer Science
Type	Thesis
Year of publication	2012
City	Hanoi
