Working with XML and LINQ to XML
This is not a book about XML, the eXtensible Markup Language, but XML has become such a part
of an ASP.NET programmer’s life that the topic deserves its own chapter. Although most of the
XML functionality in the .NET Framework appears to be in the System.Xml namespace, you can
find XML’s influence throughout the entire Framework, including System.Data and System.Web.
XML is oft maligned and misunderstood. To some, XML is simply a text-based markup language; to
others it is an object serialization format or a document-encoding standard. In fact, XML has become
the de facto standard manner in which data passes around the Internet. XML, however, is not really
a technology as much as it is a set of standards or guiding principles. It provides a structure within
which data can be stored, but the XML specification doesn’t dictate how XML processors, parsers,
formatters, and data access methods should be written or implemented. System.Xml, System.Xml.Linq,
and other namespaces contain the .NET Framework 3.5’s view on how programmers should
manipulate XML. Some of its techniques, such as XSLT and XML Schema, are standards-based.
Others, like XmlReader and XmlWriter, started in the world of the .NET Framework, and now Java has
similar classes. The .NET Framework 3.5, along with new compilers for C# 3.0 and VB 9, brings
LINQ and LINQ to XML (a Language-Integrated Query over XML) to the table.
This is an ASP.NET book, aimed at the professional Web developer, so it can’t be a book all about
LINQ. However, a single chapter can’t do LINQ justice. Rather than making this a chapter that
focuses exclusively on just System.Xml or System.Xml.Linq, this chapter will present the new LINQ
model and syntax as a juxtaposition to the way you’re used to manipulating XML. The examples
will include both the traditional and the new LINQ way of doing things. We recognize that you
won’t go and rewrite all your System.Xml code to use LINQ just because it’s cool, but seeing the
new syntax alongside what you are used to is an excellent way to learn the syntax, and it also
assists you in making decisions on which technology to use going forward.
In this chapter some listings will include a ‘q’ in the numbering scheme. These
listings demonstrate how you can use LINQ to XML to accomplish the same task
shown in the previous related listing. For example, Listing 10-5q shows the way
you’d accomplish the task from Listing 10-5 using LINQ to XML.
You’ll learn more about LINQ and its flexibility in the all-new chapter dedicated to the technology.
For the purposes of this chapter, know that System.Xml.Linq introduces an all-new series of objects
such as XDocument and XElement that in some ways complement the existing APIs, but in many ways
eclipse them. You’ll also see how these new classes have provided ‘‘bridges’’ back and forth between
System.Xml and System.Xml.Linq that will enable you to use many new techniques for clearer, simpler
code, while still utilizing the very useful, powerful (and well-tested) features of the System.Xml classes
you’re used to.
Ultimately, however, remember that while the .NET Framework has its own unique style of API around
the uses of XML, the XML consumed and produced by these techniques is standards-based and can be
used by other languages that consume XML. This chapter covers all the major techniques for
manipulating XML provided by the .NET Framework. XmlReader and XmlWriter offer incredible speed but may
require a bit more thought. The XmlDocument, or DOM, is the most commonly used method for
manipulating XML, but you’ll pay dearly in performance penalties without careful use. ADO.NET DataSets
have always provided XML support, and their deep support continues with .NET 3.5. Extensible Stylesheet
Language Transformations (XSLT) gained debugging capabilities in Visual Studio 2005, and is improved with
new features in Visual Studio 2008, such as XSLT Data Breakpoints and better support in the editor for
loading large documents. Additionally, XSLT stylesheets can be compiled into assemblies even more
easily with the new command-line stylesheet compiler. ASP.NET continues to make development easier
with some simple yet powerful server controls to manipulate XML.
Its flexibility and room for innovation make XML very powerful and a joy to work with.
Note that when the acronym XML appears by itself, the whole acronym is
capitalized, but when it appears in a function name or namespace, only the X is
capitalized, as in System.Xml or XmlTextReader. Microsoft’s API Design Guidelines
dictate that if an abbreviation of three or more characters appears in a variable
name, class name, or namespace, the first character is capitalized.
The Basics of XML
Listing 10-1, a Books.xml document that represents a bookstore’s inventory database, is one of the sample
documents used in this chapter. This example document has been used in various MSDN examples for
many years.
Listing 10-1: The Books.xml XML document
<?xml version='1.0'?>
<!-- This file is a part of a book store inventory database -->
<bookstore xmlns="http://example.books.com">
  <book genre="autobiography" publicationdate="1981" ISBN="1-861003-11-0">
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
  <book genre="novel" publicationdate="1967" ISBN="0-201-63361-2">
    <title>The Confidence Man</title>
    <author>
      <first-name>Herman</first-name>
      <last-name>Melville</last-name>
    </author>
    <price>11.99</price>
  </book>
  <book genre="philosophy" publicationdate="1991" ISBN="1-861001-57-6">
    <title>The Gorgias</title>
    <author>
      <first-name>Sidas</first-name>
      <last-name>Plato</last-name>
    </author>
    <price>9.99</price>
  </book>
</bookstore>
The first line of Listing 10-1, starting with <?xml version='1.0'?>, is an XML declaration. This line
should always appear before the first element in the XML document and indicates the version of XML
with which this document is compliant.
The second line is an XML comment and uses the same syntax as an HTML comment. This isn’t a
coincidence; remember that XML and HTML are both descendants of SGML, the Standard Generalized Markup Language. Comments are always optional in XML documents.
The third line, <bookstore>, is the opening tag of the root element, or document element, of the XML
document. An XML document can have only one root element. The last line in the document is the closing tag </bookstore> of the root element. No elements of the document can appear after the final
closing tag </bookstore>. The <bookstore> element contains an xmlns attribute such as
xmlns="http://example.books.com". Namespaces in XML are similar to namespaces in the .NET
Framework because they provide qualification of elements and attributes. It’s very likely that someone else
in the world has created a bookstore XML document before, and it’s also likely he or she chose an
element name such as <book> or <bookstore>. A namespace is defined to make your <book> element different from any others and to deal with the chance that other <book> elements might appear with yours in the same document, which XML allows.
This namespace is often a URL (Uniform/Universal Resource Locator), but it actually can be a URI
(Uniform/Universal Resource Identifier). A namespace can be a GUID or a nonsense string such as
www-computerzen-com:schema as long as it is unique. Recently, the convention has been to use a URL
because URLs are ostensibly unique, thus making the document’s associated schema unique. You learn more about schemas and namespaces in the next section.
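To see why the namespace matters to code that reads the document, here is a minimal C# sketch using LINQ to XML. The class and method names (NamespaceSketch, CountBooks) and the cut-down document are illustrative assumptions, not part of the chapter's listings. It counts book elements twice: once by unqualified name and once by the name qualified with http://example.books.com.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class NamespaceSketch
{
    // Cut-down stand-in for Books.xml; only the namespace declaration matters here.
    public const string Xml =
        "<bookstore xmlns='http://example.books.com'>" +
        "<book genre='novel'><title>The Confidence Man</title></book>" +
        "</bookstore>";

    // Counts the root's child elements that have exactly the given qualified name.
    public static int CountBooks(XName bookName)
    {
        XElement root = XElement.Parse(Xml);
        return root.Elements(bookName).Count();
    }

    public static void Main()
    {
        XNamespace ns = "http://example.books.com";

        // An unqualified name means "book in no namespace", so nothing matches.
        Console.WriteLine(CountBooks("book"));

        // Qualifying the name with the document's namespace finds the element.
        Console.WriteLine(CountBooks(ns + "book"));
    }
}
```

The first count is 0 and the second is 1, because the default namespace declared on bookstore also applies to its child elements.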
The fourth line is a little different because the <book> element contains some additional attributes such
as genre, publicationdate, and ISBN. The order of the elements matters in an XML document, but the
order of the attributes does not. These attributes are said to be on or contained within the book element.
Consider the following line of code:
<book genre="autobiography" publicationdate="1981" ISBN="1-861003-11-0">
Notice that every element following this line has a matching end tag, similar to the example that follows:
<example>This is a test</example>
If no matching end tag is used, the XML is not well formed; technically it isn’t even XML! These next two
example XML fragments are not well formed because the elements don’t match up:
<example>This is a test
<example>This is a test</anothertag>
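One way to check well-formedness in code is simply to let a parser read the text and see whether it objects. The following is a minimal sketch, assuming a helper named IsWellFormed (a hypothetical name, not a Framework method) that feeds each fragment above through XmlReader and reports whether an XmlException was thrown.

```csharp
using System;
using System.IO;
using System.Xml;

public static class WellFormedSketch
{
    // Returns true when the parser can read the whole text without error.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
            {
                // Walk every node; well-formedness errors surface as XmlException.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsWellFormed("<example>This is a test</example>"));    // True
        Console.WriteLine(IsWellFormed("<example>This is a test"));              // False
        Console.WriteLine(IsWellFormed("<example>This is a test</anothertag>")); // False
    }
}
```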
If the<example>element is empty, it might appear like this:
<example></example>
Alternatively, it could appear as a shortcut like this:
<example/>
The syntax is different, but the semantics are the same. The difference between the syntax and the
semantics of an XML document is crucial for understanding what XML is trying to accomplish. XML documents
are text files by their nature, but the information (the information set) can be represented by text that
is not byte-for-byte identical. The set of information is the same, but the actual bytes are not.
Note that attributes appear only within start tags or empty elements such as
<book genre="scifi"></book> or <book genre = "scifi" />. Visit the World Wide Web
Consortium’s (W3C) XML site at www.w3.org/XML/ for more detailed information
on XML.
The XML InfoSet
The XML InfoSet is a W3C concept that describes what is and isn’t significant in an XML document. The
InfoSet isn’t a class, a function, a namespace, or a language; it is a concept.
Listing 10-2 describes two XML documents that are syntactically different but semantically the same.
Listing 10-2: XML syntax versus semantics
XML document
<?xml version='1.0'?>
<bookstore>
  <book genre="autobiography" publicationdate="1981" ISBN="1-861003-11-0">
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price></price>
  </book>
</bookstore>
XML document that differs in syntax, but not in semantics
<?xml version='1.0'?><bookstore><book genre="autobiography"
publicationdate="1981" ISBN="1-861003-11-0"><title>The Autobiography of Benjamin
Franklin</title><author><first-name>Benjamin</first-name>
<last-name>Franklin</last-name></author><price/></book></bookstore>
Certainly, the first document in Listing 10-2 is easier for a human to read, but the second document is
just as easy for a computer to read. The second document has insignificant white space removed.
Notice also that the empty price element is written differently in the two documents. The first uses the verbose
form, <price></price>, whereas the second uses the shortcut form, <price/>, to express an empty element. However, both are
empty elements.
You can manipulate XML as elements and attributes. You can visualize XML as a tree of nodes. You
rarely, if ever, have to worry about angle brackets or parse text yourself. A text-based differences (diff)
tool would report these two documents are different because their character representations are different.
An XML-based differences tool would report (correctly) that they are the same document. Each document contains the same InfoSet.
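The idea behind an XML-aware comparison can be sketched in a few lines of C#. This is a deliberately simplified notion of equivalence, not a full InfoSet comparison and not how any particular diff tool works: the hypothetical SameContent helper below treats two elements as equal when they have the same name, the same attributes in any order, the same child elements in the same order, and the same text in leaf elements. It ignores comments, processing instructions, and text mixed in between child elements.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class InfoSetSketch
{
    // Simplified equivalence check; see the assumptions in the text above.
    public static bool SameContent(XElement a, XElement b)
    {
        if (a.Name != b.Name) return false;

        // Attribute order is not significant, so compare them sorted by name.
        var attrsA = a.Attributes().OrderBy(x => x.Name.ToString())
                      .Select(x => x.Name.ToString() + "=" + x.Value);
        var attrsB = b.Attributes().OrderBy(x => x.Name.ToString())
                      .Select(x => x.Name.ToString() + "=" + x.Value);
        if (!attrsA.SequenceEqual(attrsB)) return false;

        var kidsA = a.Elements().ToList();
        var kidsB = b.Elements().ToList();
        if (kidsA.Count != kidsB.Count) return false;

        // Leaf elements: <price></price> and <price/> both have the empty string as Value.
        if (kidsA.Count == 0) return a.Value == b.Value;

        // Element order is significant, so compare children pairwise.
        for (int i = 0; i < kidsA.Count; i++)
        {
            if (!SameContent(kidsA[i], kidsB[i])) return false;
        }
        return true;
    }

    public static void Main()
    {
        string verbose =
            "<book genre='autobiography' ISBN='1-861003-11-0'>\n" +
            "  <title>The Autobiography of Benjamin Franklin</title>\n" +
            "  <price></price>\n" +
            "</book>";
        string compact =
            "<book ISBN='1-861003-11-0' genre='autobiography'>" +
            "<title>The Autobiography of Benjamin Franklin</title><price/></book>";

        // XElement.Parse discards insignificant white space by default.
        Console.WriteLine(SameContent(XElement.Parse(verbose), XElement.Parse(compact)));
    }
}
```

The strings differ character by character, yet the comparison reports them as the same because white space between elements, attribute order, and the two spellings of an empty element are not treated as significant.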
You can run a free XML Diff Tool online at
http://www.deltaxml.com/free/compare//
XSD–XML Schema Definition
XML documents must be well formed at the very least. However, just because a document is well formed doesn’t ensure that its elements are in the right order, have the right name, or are the correct data types.
After creating a well-formed XML document, you should ensure that your document is also valid. A valid
XML document is well formed and also conforms to an associated XML Schema Definition (XSD) that describes what elements, simple types, and complex types are allowed in the document.
The schema for the Books.xml file is a glossary or vocabulary for the bookstore described in an XML
Schema definition. In programming terms, an XML Schema is a type definition, whereas an XML document is an instance of that type. Listing 10-3 describes one possible XML Schema called Books.xsd against which the Books.xml file validates.
Listing 10-3: The Books.xsd XML Schema
<?xml version="1.0" encoding="utf-8" ?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:tns="http://example.books.com"
    xmlns="http://example.books.com"
    targetNamespace="http://example.books.com"
    elementFormDefault="qualified">
  <xsd:element name="bookstore" type="bookstoreType"/>
  <xsd:complexType name="bookstoreType">
    <xsd:sequence maxOccurs="unbounded">
      <xsd:element name="book" type="bookType"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="bookType">
    <xsd:sequence>
      <xsd:element name="title" type="xsd:string"/>
      <xsd:element name="author" type="authorName"/>
      <xsd:element name="price" type="xsd:decimal"/>
    </xsd:sequence>
    <xsd:attribute name="genre" type="xsd:string"/>
    <xsd:attribute name="publicationdate" type="xsd:string"/>
    <xsd:attribute name="ISBN" type="xsd:string"/>
  </xsd:complexType>
  <xsd:complexType name="authorName">
    <xsd:sequence>
      <xsd:element name="first-name" type="xsd:string"/>
      <xsd:element name="last-name" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
The XML Schema in Listing 10-3 starts by including a series of namespace prefixes used in the schema
document as attributes on the root element. The prefix xsd: is declared on the root element
(xmlns:xsd="http://www.w3.org/2001/XMLSchema") and then used on all other elements of that schema.
The default namespace assumed for any elements without prefixes is described by the xmlns attribute
like this:
xmlns="http://example.books.com"
A namespace-qualified element has a prefix such as <xsd:element>. The target namespace for all
elements in this schema is declared with the targetNamespace attribute.
XML Schema can be daunting at first, but if you read each line to yourself as a declaration, it makes more
sense. For example, the line
<xsd:element name="bookstore" type="bookstoreType"/>
declares that an element named bookstore has the type bookstoreType. Because the
targetNamespace for the schema is http://example.books.com, that is the namespace of each declared type in the
Books.xsd schema. If you refer to Listing 10-1, you see that the namespace of the Books.xml document
is also http://example.books.com.
For more detailed information on XML Schema, visit the W3C’s XML Schema site at
www.w3.org/XML/Schema.
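To make the distinction between well formed and valid concrete, here is a hedged C# sketch that validates a document against a schema in code. It does not use Books.xsd itself; the one-element schema, the class name SchemaValidationSketch, and the Validate helper are assumptions made up for this illustration. The calls it relies on (XmlSchemaSet.Add and the Validate extension method for XDocument in System.Xml.Schema) are part of the .NET Framework 3.5.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

public static class SchemaValidationSketch
{
    // Hypothetical one-element schema in the chapter's target namespace.
    public const string Xsd =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'" +
        " targetNamespace='http://example.books.com'" +
        " elementFormDefault='qualified'>" +
        "<xsd:element name='price' type='xsd:decimal'/>" +
        "</xsd:schema>";

    // Returns the validation error messages; an empty list means the document is valid.
    public static List<string> Validate(string xml)
    {
        List<string> errors = new List<string>();

        XmlSchemaSet schemas = new XmlSchemaSet();
        schemas.Add("http://example.books.com",
                    XmlReader.Create(new StringReader(Xsd)));

        // Parse checks well-formedness; Validate then checks against the schema.
        XDocument document = XDocument.Parse(xml);
        document.Validate(schemas,
            delegate(object sender, ValidationEventArgs e) { errors.Add(e.Message); });

        return errors;
    }

    public static void Main()
    {
        // Well formed and valid: 8.99 is a decimal.
        Console.WriteLine(Validate(
            "<price xmlns='http://example.books.com'>8.99</price>").Count);

        // Well formed but not valid: "cheap" is not a decimal.
        foreach (string message in Validate(
            "<price xmlns='http://example.books.com'>cheap</price>"))
        {
            Console.WriteLine(message);
        }
    }
}
```

Both inputs parse without error, which is all that well-formedness requires; only the second produces validation messages, because its content does not match the xsd:decimal type the schema declares.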
Editing XML and XML Schema in Visual Studio 2008
If you start up Visual Studio 2008 and open the Books.xml file into the editor, you notice immediately
that the Visual Studio editor provides syntax highlighting and formats the XML document as a nicely
indented tree. If you start writing a new XML element anywhere, you don’t have access to IntelliSense. Even though the http://example.books.com namespace is the default namespace, Visual Studio 2008
has no way to find the Books.xsd file; it could be located anywhere. Remember that the namespace is not
a URL. It’s a URI, an identifier. Even if it were a URL, it wouldn’t be appropriate for the editor, or any program you write, to go out on the Web looking for a schema. You have to be explicit when associating XML Schema with instance documents.
Classes and methods are used to validate XML documents when you are working programmatically, but the Visual Studio editor needs a hint to find the Books.xsd schema. Assuming the Books.xsd file is in the same directory as Books.xml, you have three ways to inform the editor:
❑ Open the Books.xsd schema in Visual Studio in another window while the Books.xml file is also open.
❑ Include a schemaLocation attribute in the Books.xml file. If you open at least one XML file with the schemaLocation attribute set, Visual Studio uses that schema for any other open XML files that don’t include the attribute.
❑ Add the Books.xsd schema to the list of schemas that Visual Studio knows about internally by
adding it to the Schemas property in the document properties window of the Books.xml file.
When schemas are added in this way, Visual Studio checks the document’s namespace and
determines if it already knows of a schema that matches.
The schemaLocation attribute is in a different namespace, so include the xmlns namespace attribute and your chosen prefix for the schema’s location, as shown in Listing 10-4.
Listing 10-4: Updating the Books.xml file with a schemaLocation attribute
<?xml version='1.0'?>
<!-- This file is a part of a book store inventory database -->
<bookstore xmlns="http://example.books.com"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://example.books.com Books.xsd">
  <book genre="autobiography" publicationdate="1981" ISBN="1-861003-11-0">
    <title>The Autobiography of Benjamin Franklin</title>
Rest of the XML document omitted for brevity
The format for the schemaLocation attribute consists of pairs of strings separated by spaces where the
first string in each pair is a namespace URI and the second string is the location of the schema. The
location can be relative, as shown in Listing 10-4, or it can be an http:// URL or file:// location.
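The pair format is easy to take apart in code. The sketch below is only an illustration of the attribute's layout, not something Visual Studio or the Framework requires you to write; the class name SchemaLocationSketch and the ReadPairs helper are hypothetical. It reads xsi:schemaLocation from a root element and splits it into namespace-to-location pairs.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class SchemaLocationSketch
{
    // Splits xsi:schemaLocation into (namespace URI -> schema location) pairs.
    public static Dictionary<string, string> ReadPairs(XElement root)
    {
        XNamespace xsi = "http://www.w3.org/2001/XMLSchema-instance";
        Dictionary<string, string> pairs = new Dictionary<string, string>();

        // The cast yields null when the attribute is absent.
        string raw = (string)root.Attribute(xsi + "schemaLocation");
        if (raw == null) return pairs;

        string[] tokens = raw.Split(new char[] { ' ', '\t', '\r', '\n' },
                                    StringSplitOptions.RemoveEmptyEntries);

        // Tokens alternate: namespace URI, then the location of its schema.
        for (int i = 0; i + 1 < tokens.Length; i += 2)
        {
            pairs[tokens[i]] = tokens[i + 1];
        }
        return pairs;
    }

    public static void Main()
    {
        XElement root = XElement.Parse(
            "<bookstore xmlns='http://example.books.com'" +
            " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'" +
            " xsi:schemaLocation='http://example.books.com Books.xsd'/>");

        foreach (KeyValuePair<string, string> pair in ReadPairs(root))
        {
            Console.WriteLine(pair.Key + " -> " + pair.Value);
        }
    }
}
```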
When the Books.xsd schema can be located for the Books.xml document, Visual Studio 2008’s
XML Editor becomes considerably more useful. Not only does the editor underline incorrect elements
with blue squiggles, it also includes tooltips and IntelliSense for the entire document, as shown in
Figure 10-1.
Figure 10-1
Figure 10-2
When the XML Schema file from Listing 10-3 is loaded into the Visual Studio Editor, the default view
in Visual Studio 2008 for standard XSDs is now the XML Editor, rather than the DataSet Designer as in
Visual Studio 2005. However, if you right-click on Books.xsd in the Solution Explorer and select Open
With, you’ll get a brief warning that the DataSet Designer might modify your schema by removing non-dataset XML. Make sure your schema is backed up and select OK, and you’ll get a redesigned
DataSet Designer that presents the elements and complex types in a format that is familiar if you’ve
edited database schemas before (see Figure 10-2). This Designer view is intended to manipulate DataSets expressed as schema, but it can be a useful visualizer for many XSDs. However, as soon as you use this visualizer to edit your XSD, your document will be turned into a Microsoft DataSet. Therefore the DataSet Designer in Visual Studio 2008 is no longer suitable as a general-purpose visual XML Schema Editor.
A greatly enhanced XSD Designer will be released as an add-on soon after the release of Visual
Studio but wasn’t yet released at the time of this writing. This new XML Schema Editor will include a
Schema Explorer toolbox window that will present a comprehensive tree-view of complex schemas in
a much more scalable and appropriate way than can a more traditional ER-Diagram. For more details,
see Figure 10-3 or go to
http://blogs.msdn.com/xmlteam/archive/2007/08/27/announcing-ctp1-of-the-xml-schema-designer.aspx
Figure 10-3
After you have created an XML Schema that correctly describes an XML document, you’re ready to start
programmatically manipulating XML. The System.Xml and System.Xml.Linq namespaces provide a
number of ways to create, access, and query XML. XML Schemas provide valuable typing information
for all XML consumers that are type aware.
XmlReader and XmlWriter
XmlReader offers a pull-style API over an XML document that is unique to the .NET Framework. It
provides fast, forward-only, read-only access to XML documents. These documents may contain elements
in multiple namespaces. XmlReader is actually an abstract class that other classes derive from to provide
specific concrete instances like XmlTextReader and XmlNodeReader.
Things changed slightly with XmlReader between the .NET Framework 1.1 and 2.0, although nothing
significant changed in the XmlReader and XmlWriter classes in .NET 3.5, as most of the new functionality
was around LINQ. Since .NET 1.1, several convenient new methods have been added, and the way you
create XmlReader has changed for the better. XmlReader has become a factory. The primary way for you
to create an instance of an XmlReader is by using the Static/Shared Create method. Rather than creating
concrete implementations of the XmlReader class, you create an instance of the XmlReaderSettings class
and pass it to the Create method. You specify the features you want for your XmlReader object with the
XmlReaderSettings class. For example, you might want a specialized XmlReader that checks the validity
of an XML document with the IgnoreWhitespace and IgnoreComments properties pre-set. The Create
method of the XmlReader class provides you with an instance of an XmlReader without requiring you
to decide which implementation to use. You can also add features to existing XmlReaders by chaining
instances of the XmlReader class with each other because the Create method of XmlReader takes another
XmlReader as a parameter.
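Chaining can be sketched as follows. This is a minimal illustration under stated assumptions rather than one of the chapter's listings: the class name ChainedReaderSketch, the CountValidationErrors helper, and the one-element schema are all made up for the example. A plain reader is created first, then wrapped by a second reader whose XmlReaderSettings add schema validation, using the Create overload that accepts an existing XmlReader.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class ChainedReaderSketch
{
    // Hypothetical one-element schema used only for this illustration.
    public const string Xsd =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'" +
        " targetNamespace='http://example.books.com'" +
        " elementFormDefault='qualified'>" +
        "<xsd:element name='price' type='xsd:decimal'/>" +
        "</xsd:schema>";

    public static int CountValidationErrors(string xml)
    {
        int errors = 0;

        // Inner reader: a plain, non-validating reader over the text.
        XmlReader inner = XmlReader.Create(new StringReader(xml));

        // Settings describe the extra feature wanted: schema validation.
        XmlReaderSettings validating = new XmlReaderSettings();
        validating.ValidationType = ValidationType.Schema;
        validating.Schemas.Add("http://example.books.com",
                               XmlReader.Create(new StringReader(Xsd)));
        validating.ValidationEventHandler +=
            delegate(object sender, ValidationEventArgs e) { errors++; };

        // Outer reader: wraps the inner reader and layers validation on top of it.
        using (XmlReader outer = XmlReader.Create(inner, validating))
        {
            while (outer.Read()) { }
        }
        return errors;
    }

    public static void Main()
    {
        Console.WriteLine(CountValidationErrors(
            "<price xmlns='http://example.books.com'>8.99</price>"));
        Console.WriteLine(CountValidationErrors(
            "<price xmlns='http://example.books.com'>cheap</price>"));
    }
}
```

The calling code only ever talks to the outer reader; which concrete classes sit underneath is decided by the Create method.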
If you are accustomed to using the XmlDocument or DOM to write an entire XML fragment or
document into memory, you will find using XmlReader to be a very different process. A good analogy is
that XmlReader is to XmlDocument what the ADO ForwardOnly recordset is to the ADO Static recordset.
Remember that the ADO Static recordset loads the entire results set into memory and holds it there.
Certainly, you wouldn’t use a Static recordset if you want to retrieve only a few values. The same basic rules
apply to the XmlReader class. If you’re going to run through the document only once, you don’t want to
hold it in memory; you want the access to be as fast as possible. XmlReader is the right decision in
this case.
Listing 10-5 creates an XmlReader class instance and iterates forward through it, counting the number of
books in the Books.xml document from Listing 10-1. The XmlReaderSettings object specifies the features
that are required, rather than the actual kind of XmlReader to create. In this example,
IgnoreWhitespace and IgnoreComments are set to True. The XmlReaderSettings object is created with these property
settings and then passed to the Create method of XmlReader.
Listing 10-5: Processing XML with an XmlReader
VB
Imports System.IO
Imports System.Xml
Partial Class _Default
Inherits System.Web.UI.Page