PART V
Programming
CHAPTER 11: Event-Driven Programming
CHAPTER 12: LINQ to XML
Event-Driven Programming
WHAT YOU WILL LEARN IN THIS CHAPTER:
XMLReader
There are many ways to extract information from an XML document. You’ve already seen how to use the document object model and XPath; both of these methods can be used to find any relevant item of data. Additionally, in Chapter 12 you’ll meet LINQ to XML, Microsoft’s latest attempt to incorporate XML data retrieval in its universal data access strategy.
Given the wide variety of methods already available, you may be wondering why you need more, and why in particular you need event-driven methods. The main answer is memory limitations. Other XML processing methods require that the whole XML document be loaded into memory (that is, RAM) before any processing can take place. Because XML documents typically use up to four times more RAM than the size of the file containing the document, some documents can take up more RAM than is available on a computer; it is therefore necessary to find an alternative method to extract data. This is where event-driven paradigms come into play. Instead of loading the complete file into memory, the file is
covered in this chapter.
UNDERSTANDING SEQUENTIAL PROCESSING
There are two main ways of processing a file sequentially. The first relies on events being fired
whenever specific items are found; whether you respond to these events is up to you. For example,
say an event is fired when the opening tag of the root element is encountered, and the name of this
element is passed to the event handler. Any time any textual content is found after this, another
event is fired. In this scenario there would also be events that capture the closing of any elements,
with the final event being fired when the closing tag of the root element is encountered.
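The event sequence described above can be sketched in Java with the SAX classes bundled in the JDK. This is a minimal illustration, not the chapter's own example: the class name `EventTrace` and the method `trace` are hypothetical, and the handler simply records each event in the order it fires.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records the events a SAX parser fires.
public class EventTrace {

    // Parses the XML text and returns one entry per event, in firing order.
    public static List<String> trace(String xml) throws Exception {
        final List<String> events = new ArrayList<String>();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // Fired for every opening tag; qName is the element name.
                events.add("start:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Fired for textual content. A parser may deliver one run
                // of text in several chunks, so each chunk is recorded.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("text:" + text);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // Fired for every closing tag; the root's is the last event.
                events.add("end:" + qName);
            }
        };

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<document><data>Here is some data.</data></document>";
        for (String event : trace(xml)) {
            System.out.println(event);
        }
    }
}
```

The handler overrides only the callbacks it cares about; events it does not override are ignored, which is what "whether you respond to these events is up to you" amounts to in practice.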
The second method is slightly different in that you tell the processor what sort of content you are
interested in. For example, you may want to read an attribute on the first child under the root
element. To do so, you instruct the XML reader to move to the root element and then to its first
child. You would then begin to read the attributes until you get to the one you need. Both of these
methods are similar conceptually, and both cope admirably with the memory problem posed by the
DOM, which requires the whole XML document to be loaded into memory before being processed.
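As a rough sketch of this second, reader-driven style, the following uses the JDK's `javax.xml.stream.XMLStreamReader`. That choice is an assumption made for illustration; it is not necessarily the reader this chapter goes on to use. The class name `FirstChildAttribute` and the `id` attribute in the sample are hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class: pulls one attribute from the root's first child.
public class FirstChildAttribute {

    // Returns the value of the named attribute on the first child element
    // under the root, or null if there is no such child or attribute.
    public static String read(String xml, String attributeName) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            int elementsSeen = 0;
            while (reader.hasNext()) {
                // The caller drives the reader forward one item at a time.
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                elementsSeen++;
                if (elementsSeen < 2) {
                    continue; // the first start tag is the root itself
                }
                // The second start tag is the root's first child:
                // read its attributes until the wanted one is found.
                for (int i = 0; i < reader.getAttributeCount(); i++) {
                    if (attributeName.equals(reader.getAttributeLocalName(i))) {
                        return reader.getAttributeValue(i);
                    }
                }
                return null; // first child has no such attribute
            }
            return null; // root has no child elements
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<document><data id=\"42\">Here is some data.</data>"
                   + "<data id=\"43\">Here is some more data.</data></document>";
        System.out.println(read(xml, "id")); // prints 42
    }
}
```

The method returns as soon as it has what it needs, so the rest of the document is never read.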
Processing files in a sequential fashion has two downsides, however. The first is that you
can’t revisit content. If you read an element and then move on to one of its siblings or children,
you can’t then go back and examine one of its attributes without starting from the beginning
again. You need to plan carefully what information you’ll need. The second problem is validation.
Imagine you receive the document shown here:
<document>
<data>Here is some data.</data>
<data>Here is some more data.</data>
</document>
encounters, but won’t complain that the document is not valid until it reaches the relevant point.
You may not care about the extra element, in which case you can just extract whatever you need,
but if you want to validate before processing begins, this usually involves reading the document
twice. This is the price you pay for not needing to load the full document into memory.
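The two-pass approach can be sketched as follows, using the JDK's `javax.xml.validation` package for the first pass and SAX for the second. Everything specific here is an assumption for illustration: the class name `ValidateThenProcess`, the inline schema (a `document` root containing only `data` elements), and the `extra` element used to make the sample invalid.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: validate first, then process in a second read.
public class ValidateThenProcess {

    // Assumed schema: a <document> root holding one or more <data> elements.
    static final String SCHEMA =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='document'><xs:complexType><xs:sequence>"
        + "<xs:element name='data' type='xs:string' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Pass 1: read the whole document once, only to check it.
    // Returns false if it is invalid (or not well formed).
    public static boolean isValid(String xml) throws Exception {
        Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(SCHEMA)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    // Pass 2: read the document again and count the <data> elements.
    public static int countData(String xml) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if ("data".equals(qName)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<document><data>Here is some data.</data>"
                   + "<data>Here is some more data.</data><extra/></document>";
        if (isValid(xml)) {
            System.out.println(countData(xml));
        } else {
            System.out.println("Document rejected before processing began.");
        }
    }
}
```

Skipping the first pass and calling `countData` directly would still return a count for the invalid sample, which is the trade-off the text describes: processing can begin on a document that later turns out not to be valid.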
In the following sections you’ll examine the two methods in more detail. The pure event-driven
method is called SAX and is commonly used with Java, although it can be used from any language.
USING SAX IN SEQUENTIAL PROCESSING
SAX stands for the Simple API for XML, and arose out of discussions on the XML-DEV list in the
late 1990s.
Back then people were having problems because different parsers were incompatible. David Megginson took on the job of coordinating the process of specifying a new API with the group. On May 11, 1998, the SAX 1.0 specification was completed. A whole series of SAX 1.0–compliant parsers then began to emerge, both from large corporations, such as IBM and Sun, and from enterprising individuals, such as James Clark. All of these parsers were freely available for public download.
Eventually, a number of shortcomings in the specification became apparent, and David Megginson and his colleagues got back to work, finally producing the SAX 2.0 specification on May 5, 2000.
The improvements centered on added support for namespaces and tighter adherence to the XML specification. Several other enhancements were made to expose additional information in the XML document, but the core of SAX was very stable. On April 27, 2004, these changes were finalized and released as version 2.0.2.
SAX is specified as a set of Java interfaces, which initially meant that if you were going to do any serious work with it, you were looking at doing some Java programming using Java Development Kit (JDK) 1.1 or later. Now, however, a wide variety of languages have their own version of SAX, some of which you learn about later in the chapter. In deference to the SAX tradition, however, the examples in this chapter are written in Java.
source project hosted by SourceForge. To download SAX, go to the homepage and browse
.net/projects/sax. This is one of the extraordinary things about SAX: it isn’t owned by anyone. It doesn’t belong to any consortium, standards body, company, or individual. In other words, it doesn’t survive because some organization or government says that you must use it to comply with their standards, or because a specific company supporting it is dominant in the marketplace. It survives because it’s simple and it works.
Preparing to Run the Examples
The SAX specification does not limit which XML parser you use with your document. It simply sits
on top of it and reports what content it finds. A number of different parsers are available out in the wild, but these examples use the one that comes with the JDK.
If you don’t have the JDK already installed, perform the following steps to do so:
1. Go to http://www.oracle.com/technetwork/java/javase/downloads/index.html. Download the latest version under the SE section. These examples use 1.6 but 1.7 is
.org/archives/xml-dev/. The list is still very active, and any XML-related problems are usually responded to within hours, if not minutes.