Executive Summary
Overview of Namespaces
Overview of XPath, XPointer, and XQuery
Overview of XSLT
XML documents and DTDs provide the foundation for an evolving Internet document paradigm. As people developed XML applications, they naturally identified additional features they wanted to have and troublesome drawbacks they wanted to avoid. Such discoveries inspired further standards efforts built on the foundation of XML documents and DTDs. As the use of XML has evolved, this process has led to a complicated landscape where old specification efforts languish for extended periods of time and new efforts rapidly achieve acceptance. Therefore, identifying the standards relevant to your needs can be something of a challenge.
Categorizing the various specifications offers a potential aid to navigating the landscape. Figure 3-1 presents one potential categorization of XML-related standards. It expands on Figure 2-1 by revealing the detailed structure of both the platform standards and the domain standards. This categorization can help you identify the depth of understanding you need for any given standard.
Figure 3-1 Categories of Related Standards
standards define an XML format to encode information for a narrower purpose. In general, a standard that could ultimately affect a majority of XML developers is probably a platform standard. Within this category, there are three subcategories.
1. Technical standards. Standards in this category provide a precise technical foundation for XML processing components to work correctly. Without these standards, ambiguity could cause unexpected results when working with documents. The most important is XML Infoset, which provides an underlying data model for XML documents.
2. Feature standards. Standards in this category add new features to the overall XML document paradigm. Without these standards, people could not do all of the things that they want
standards
3. Infrastructure standards. Standards in this category provide crucial infrastructure necessary for many XML applications. They are low-level applications of the feature standards that provide services used by higher-level applications. Standards such as XHTML and XForms specify how to integrate XML with Web technologies. Others, such as SOAP and XML Encryption, specify a uniform way to perform common operations. Without these standards, XML applications would be incompatible with existing Web applications, and all developers would have to reinvent the same infrastructure.
It is important to have a thorough understanding of feature standards because they will help you to determine whether XML supplies the capabilities necessary to solve a given business problem. On the other hand, most managers only need to understand technical and infrastructure standards at a very high level. A general knowledge of technical standards could help you understand the source of a defect in a third-party
Figure 3-1 also shows the structure of the domain standards. The XML paradigm includes both horizontal and vertical domain standards. Horizontal domain standards provide a solution for manipulating information specific to a type of application but used in many industries. Examples include portable formats for vector graphics (SVG), multimedia (SMIL), and mathematical expressions (MathML). They generally facilitate common software components across different industries. Their
The vertical domain standards provide formats for describing information specific to a particular industry but used by many types of applications. Examples include interoperability formats for finance, health care, and telecommunications. They generally facilitate compatibility among systems used within the same industry. Their applicability depends on whether your organization works with information from a particular industry.
Table 3-1 XML Feature Standards

Standard                Abbreviation   Purpose
XML Namespaces          Namespaces     Prevent overlap of names used by different software
XML Path Language       XPath          Address data within a document
XML Pointer Language    XPointer       Specify locations within a document
XML Query Language      XQuery         Search for data within a set of documents
XSL Transformations     XSLT           Transform data from one format to another
Extensible
format definition rules than DTDs
Table 3-1 lists the most important feature standards. The rest of this chapter reviews the important details of these standards. At the end of the chapter, there is a summary of key technical and infrastructure standards.
As you saw in the previous chapter, the primary units of content in XML are elements and attributes. Authors distinguish among different types of elements and attributes by assigning them unique names. When applications process a document, they associate element content with the corresponding element name and attribute content with the corresponding attribute name.
As people started to use XML, they realized that organizations and applications often use the same name to describe different concepts. When these organizations want to exchange information using XML, conflicts may occur. A single document often combines information intended for multiple organizations and applications. For example, most mail order vendors do not charge customers until they ship the ordered goods. Enforcing this policy requires processing the same information with an Accounting application for billing purposes and a Fulfillment application for shipping purposes.
But what if the Accounting and Fulfillment applications use the term "status" to signify different meanings? Both applications want their respective status data associated with the "Status" element name. Such a naming collision can make life very uncomfortable for the developers responsible for specifying the formats of the affected documents.
XML Namespaces, a W3C Recommendation, enables developers to avoid naming collisions by assigning element and attribute names to namespaces. Accounting and Fulfillment departments could each have separate namespaces. Developers can thereby qualify the use of the "Status" element name with the appropriate namespace.
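The collision described above can be sketched in a few lines of Python using the standard library's ElementTree parser. This is only an illustration under stated assumptions: the Accounting namespace URI, the element content, and the function name are invented for the sketch, and only the Fulfillment URI follows the form used in this chapter's example.

```python
import xml.etree.ElementTree as ET

# Assumption: the Accounting URI is hypothetical; the Fulfillment URI
# follows the pattern that appears in this chapter's example.
ACCT = "http://www.foocompany.com/names/acct-REV10"
FUL = "http://www.foocompany.com/names/ful-REV10"

# Hypothetical order document with two "Status" elements that would
# collide if they were not qualified by a namespace prefix.
ORDER_XML = f"""
<Order xmlns:acct="{ACCT}" xmlns:ful="{FUL}">
  <acct:Status>invoiced</acct:Status>
  <ful:Status>shipped</ful:Status>
</Order>
"""


def status_by_namespace(xml_text):
    """Return the text of each Status element keyed by namespace URI."""
    root = ET.fromstring(xml_text)
    result = {}
    for uri in (ACCT, FUL):
        # ElementTree writes a qualified name as {namespace-uri}local-name.
        element = root.find(f"{{{uri}}}Status")
        if element is not None:
            result[uri] = element.text
    return result


statuses = status_by_namespace(ORDER_XML)
print("Accounting status:", statuses[ACCT])
print("Fulfillment status:", statuses[FUL])
```

Each application asks only for the name in its own namespace, so neither one ever sees the other's "Status" data.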
How It Works
xmlns:ful="http://www.foocompany.com/names/ful-REV10">
<acct:Customer>
element automatically use the Accounting namespace. Then we set a default namespace Fulfillment for the second Customer element. You can also set a default namespace and then use the syntax in Example 3-1 to specify a different namespace for particular elements or attributes. The combination of the two mechanisms for specifying namespaces delivers a convenient and flexible way to avoid naming collisions.
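To see how default namespaces resolve in practice, consider the following Python sketch. It is a rough illustration, not the chapter's own example: the document, the Accounting URI, and the helper name are assumptions. The point is that unprefixed child elements inherit the default namespace declared on the enclosing Customer element.

```python
import xml.etree.ElementTree as ET

ACCT = "http://www.foocompany.com/names/acct-REV10"  # hypothetical URI
FUL = "http://www.foocompany.com/names/ful-REV10"

# Hypothetical document: each Customer element sets its own default namespace.
DOC = f"""
<Customers>
  <Customer xmlns="{ACCT}"><Status>paid</Status></Customer>
  <Customer xmlns="{FUL}"><Status>backordered</Status></Customer>
</Customers>
"""


def split_tag(tag):
    """Split ElementTree's '{uri}local' form into a (uri, local) pair."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


root = ET.fromstring(DOC)
# Walk every element below the root in document order.
resolved = [split_tag(el.tag) for el in root.iter() if el is not root]
for uri, local in resolved:
    print(local, "->", uri)
```

Neither "Status" element carries a prefix, yet the parser reports a different namespace for each one.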
Practical Usage
In practice, the most important application of XML Namespaces comes when using the related feature standards. For example, a given document may have wrapper tags from an XML messaging standard, tags for domain-specific XML content, and tags from the XLink standard, specifying links to other documents. Without XML Namespaces, it would be impossible to ensure that the correct type of processing component received the elements intended for it. As you will see in the subsequent sections of this chapter, each of the other feature standards has its own namespace to avoid this problem.
In deciding whether to define your own namespaces, consider that the need for them increases in direct proportion to the number of different groups that will use a document format. In the case of industry-specific standards, this need is obviously quite high. The industry standards body does not want the format it defines to conflict with any of the internally defined formats of its members. An enterprisewide format for a large multinational corporation would also almost certainly need to use XML Namespaces so that it would not conflict with any local formats. However, a format intended for use by two parties, each of which plans to construct a dedicated application, might be able to get away without XML Namespaces. But it would probably be a good idea to use them anyway, in case the parties decide to expand the number of participants. A format used only by a single application for internal purposes almost
If you work for a software vendor that uses XML in a product, you will want to define a namespace if your XML features are visible to external developers or applications. Typically, using the Internet domain name for your company, the name of the product, and a version identifier will be sufficient.
If you work for a large enterprise that uses XML in an integration or e-commerce application, the proper approach for constructing namespaces is less clear. Of course, you use the domain name of your company for the first part of the namespace and a version identifier for the last part. But the middle part takes some thought. You can use a path that mirrors your organizational chart, but organizational charts may change rather often. If your enterprise has an organizational directory based on a standard data model such as X.500 or LDAP, you may be able to leverage that. In any case, an officially designated person should coordinate the policy used by all groups developing XML applications.
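One way to keep such a policy consistent is to generate namespace names from a single helper rather than typing them by hand. The Python sketch below is only an assumption about what such a helper might look like: the function name, its parameters, and the choice of "http://" plus a hyphen before the version are modeled on the "ful-REV10" name used in this chapter, not prescribed by XML Namespaces.

```python
def build_namespace_uri(domain, middle_parts, version):
    """Combine domain, middle path, and version into a namespace name.

    Hypothetical convention: http://<domain>/<middle path>-<version>.
    """
    cleaned = [part.strip("/") for part in middle_parts if part.strip("/")]
    if not domain or not cleaned or not version:
        raise ValueError("domain, middle part, and version are all required")
    return f"http://{domain}/{'/'.join(cleaned)}-{version}"


# Usage: the middle part might mirror a directory entry rather than an
# organizational chart, as the text suggests.
fulfillment = build_namespace_uri("www.foocompany.com", ["names", "ful"], "REV10")
print(fulfillment)
```

Because every group calls the same function, a change in policy for the middle part only has to be made in one place.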
An XML document represents a convenient package of related XML data. However, there are many situations where people need to work with a subset of data within a single document or an aggregation of data from several documents. To print a shipping label, they might want to grab just the address information from an order document. To print a fraud-resistant receipt, they might want to reference just the last four digits of the credit card number. To produce an upgrade notification, they might want contact information from every order document that includes the purchase of a specific product. To perform these types of operations, it helps to have the means of specifying such subsets and aggregations of document data.
XML Path Language (XPath) provides a syntax for addressing individual nodes within a document hierarchy. Developers specify a path from the document node to the target node. It includes basic selection criteria so that a particular XPath expression can differentiate among similar nodes. To grab only the billing address from orders, a developer could use
To create a fraud-resistant receipt, an author could use XPointer to specify the last four characters of the "Number" element within the "Card" element of an "Order" document. XPointer is useful primarily for hypertext applications and is taking a long time to work its way through the W3C standards process.
intermediate processing steps. It uses XPath as the format for specifying many of these parameters. To create a mailing list for
How They Work
With XPath, you specify a particular node or set of nodes by indicating a navigation path. Suppose you wanted to refer to each of the "Description" elements in our example order from Example 2-6. You could use three separate XPath expressions to address each of the three desired elements individually, as shown in Example 3-3.
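The idea of addressing elements by position can be tried out with Python's ElementTree, which supports a limited subset of XPath. The sketch below is an illustration only: the order structure and two of the three descriptions are assumptions, since Example 2-6 is not reproduced here, and the paths are written relative to the "Order" root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the example order: three line items,
# each with a Description child.
ORDER_XML = """
<Order>
  <LineItem><Description>FooBar Version 5</Description></LineItem>
  <LineItem><Description>FooBar Manual</Description></LineItem>
  <LineItem><Description>FooBar Support</Description></LineItem>
</Order>
"""

root = ET.fromstring(ORDER_XML)

# One expression per element: the [n] predicate selects the n-th LineItem.
paths = [f"./LineItem[{n}]/Description" for n in (1, 2, 3)]
individual = [root.find(path).text for path in paths]

# Dropping the predicate matches every LineItem with a single expression.
all_at_once = [d.text for d in root.findall("./LineItem/Description")]

for path, text in zip(paths, individual):
    print(path, "->", text)
```

The second query shows why a single expression without a positional predicate is usually more convenient when the number of line items is not known in advance.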
element based on the order in which they appear in the document. It's somewhat inconvenient to use a different expression for each "LineItem" element, especially if you don't
"Card" element of any "Payment" element within the document. This limitation prevents an unintended match within a document that also uses a "Number" element as part of something other than a credit card number. This expression then goes beyond XPath, using the "child:text" clause to narrow further the scope of the match to the internal text content of a node. The "position()=12" clause places the point just after the 12th character of the text content.
By combining two points to specify a range, authors can use XPointer to achieve the precision necessary to indicate only the last four digits of a credit card number. Example 3-7 specifies two separate points, using the syntax from Example 3-6. The first point is just after the 12th character. The second point is just after the 16th character. Connecting these two points with the "range-to" function returns all text between the points (in this case, the 13th, 14th, 15th, and 16th digits of the credit card number). Points can even cross node boundaries. This capability is useful for quoting passages in text documents. It is possible to grab all of the text from the 2nd paragraph of the 3rd chapter to the 5th paragraph of the 4th chapter using a similar XPointer expression.
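The arithmetic of the two points is easy to check with ordinary string slicing. The Python sketch below does not process XPointer at all; it only mirrors the range described above, and the card number and function name are made up for the illustration.

```python
def text_between_points(text, start_after, end_after):
    """Return the characters lying between two character points.

    A point "just after the n-th character" corresponds to the slice
    index n, so the range covers characters start_after+1 .. end_after.
    """
    if not 0 <= start_after <= end_after <= len(text):
        raise ValueError("points must lie within the text, in order")
    return text[start_after:end_after]


card_number = "0000111122223456"  # made-up 16-digit number
last_four = text_between_points(card_number, 12, 16)
print(last_four)
```

A point after the 12th character and a point after the 16th character enclose exactly four characters, which is why the example range yields the 13th through 16th digits.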
Example 2-9
Example 3-8 wraps the query expression in a "MailingList" element so that the query engine can generate a well-formed XML results document. The "FOR" clause specifies the scope of the query as "Order" documents within the "Orders" collection.
clause. This clause is useful for sophisticated queries where developers need to manipulate intermediate results to generate the final results.
Example 3-8
FOR $o IN document("http://www.foocompany.com")/Orders/Order
WHERE $o//Description = "FooBar Version 5"
RETURN
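The FOR, WHERE, and RETURN clauses correspond to a familiar loop, filter, and collect pattern. The Python sketch below imitates that pattern over an in-memory collection; it is not XQuery and does not contact any server. The sample orders, the "Customer" element, and the choice of what to return are assumptions, since the RETURN clause of the example is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical "Orders" collection; only the product name comes from the text.
ORDERS_XML = """
<Orders>
  <Order>
    <Customer>Alice Example</Customer>
    <LineItem><Description>FooBar Version 5</Description></LineItem>
  </Order>
  <Order>
    <Customer>Bob Example</Customer>
    <LineItem><Description>FooBar Manual</Description></LineItem>
  </Order>
</Orders>
"""


def mailing_list(orders_root, product):
    """Collect customers of every order that contains the given product."""
    result = ET.Element("MailingList")  # wrapper keeps the result well formed
    for order in orders_root.findall("Order"):  # FOR $o IN .../Orders/Order
        # $o//Description: every Description anywhere below the order
        descriptions = [d.text for d in order.iter("Description")]
        if product in descriptions:  # WHERE $o//Description = product
            # RETURN (assumed): the customer of the matching order
            ET.SubElement(result, "Customer").text = order.findtext("Customer")
    return result


result = mailing_list(ET.fromstring(ORDERS_XML), "FooBar Version 5")
names = [customer.text for customer in result]
print(ET.tostring(result, encoding="unicode"))
```

A real query engine would run the same logic against a stored collection of documents rather than a string held in memory.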
Drafts. The purpose of these revisions is to resolve two primary issues. The first is to ensure compatibility with the XML Schema data model. It would be nice if XPath expressions could use constraints based on the basic data types defined by XML Schema. The second is to ensure compatibility between XPath and XQuery. It makes a great deal of sense to use XPath as the syntax for expressing many of the XQuery parameters. However, there are several subtle issues concerning XPath 1.0 that inhibit this compatibility. XPath 2.0 will resolve these issues to the point where XQuery 1.0 will include XPath 2.0 as a subset.
Practical Usage
Developers and authors have been using XPath for some time, but they are just beginning to experiment with XPointer and XQuery. XPointer and XQuery incorporate much of the XPath syntax. These factors make XPath the most important feature standard for accessing parts of documents. Managers should ensure that all developers working on XML projects are comfortable using XPath.
to use a particular address for sending an invoice, it passes the document location and the appropriate XPath
XPointer offers advantages over XPath primarily for hypertext applications. In these applications, much of the content is text-oriented rather than data-oriented, and documents naturally tend to refer to content within other documents. This applicability extends to boundary situations where a text-oriented document refers to parts of a data-oriented document generated by an application. XPointer is closely related to XLink, the feature standard focused on XML hypertext links and discussed later. The motivation for XPointer stems from the common need for such links to specify a particular place within a document rather than the entire document.
While XPath and XPointer address the issue of accessing data within an individual document, XQuery handles accessing data across a collection of documents. Although the standard was still under development at the time of this writing, the motivation for it is clear. Any time an application generates a significant number of XML documents containing meaningful data, it's only a matter of time before someone wants to search those documents for a specific piece of information. In cases where XML is only one representation of data actually managed within a relational database model, existing query languages and tools will be adequate. But if XML is the native representation of the data or is the common representation provided by otherwise heterogeneous data models, XQuery should see extensive use.
The three standards covered in the previous section enable you to select relatively small parts of documents in various ways. In some cases, you may want to take this manipulation a step further by completely reorganizing a document. This type of transformation requires selecting different parts of a document and rearranging them. XSL Transformations (XSLT) extends the XPath model of how to address parts of documents with more sophisticated operators that enable developers to specify these rearrangements.
You may be wondering why, as the name implies, XSLT is associated with XSL. Originally, people saw XSL as a generic way to display XML documents for all presentation technologies. Accomplishing this goal naturally required two different types of features: (1) features for rearranging document content so that it made the most sense for display and (2) features for attaching display properties to the content. However, rearranging content turned out to be useful for other purposes, and the appropriate display properties turned out to be a topic of extensive debate. Because people wanted to use the rearranging features and agreement on these features was far ahead of that for display properties, the two standards
same logical type of document? That leads to meta-incompatibility and big headaches for managers who are
of groups calling different concepts by the same name. XSLT solves the problem of groups calling the same concept by different names.
Such a scenario is highly likely in applications such as supply chain management where two companies want to exchange conceptually the same information but already have formats that they use internally. Also, industry groups in finance, telecommunications, and transportation have defined formats for transactions in those industries. In some cases, multiple industry groups are working on the same problem, creating the potential for dueling standards. Moreover, with the integration of global supply chains, standards for related industries such as manufacturing and shipping may need to ensure compatibility where they overlap.
The idea behind XSLT is to define a scripting language (using XML syntax, of course) that enables developers to transform one format into another. Wherever two formats overlap, developers create a transform that extracts the overlapping information from one format and rearranges it into the other format. These transforms are directional; to rearrange documents in both directions, you would need two separate transforms.
How It Works
Consider the basic problem of automatically placing an order with a trading partner over the Internet. Foo Company has defined a Foo Company Order DTD that it uses internally. Bar Corp has defined a Bar Corp Order DTD that it uses internally. Now Foo Company wants to place orders automatically with Bar Corp.
is valid with respect to the Foo Company Order DTD. However, for Bar Corp to accept the order, the order document must be valid with respect to the Bar Corp Order DTD. To achieve this end, Foo Company creates a Foo Bar Transformation document that specifies how to translate a Foo Company Order Document into a Bar Corp Order Document, as shown in Figure 3-3.
Figure 3-3 Translating Order Formats with XSLT
To see how XSLT works, let's consider a simple example. Examples 3-9a and 3-9b show parts of order documents in two different formats. Example 3-9a models currency information as an attribute on the "Order" element. Example 3-9b models currency information as a child element of the "Order" element. The choice of modeling information as an attribute or child element is an arbitrary one, so it is likely that two different
elements using XSL-specific constructs. The transformation document selects the "Order" element in the source document. It then begins a new "Order" element with a new "Currency" child element in the translated document. It inserts the value of the "Currency" attribute of the selected order element in the source document as the element content of the "Currency"
DTDs, XSLT scripts also work with well-formed ones. However, people often use XSLT when they have in one data format many documents that they want translated to another data format, which are precisely the same conditions under which they use valid documents.
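The attribute-to-child-element rearrangement described for Examples 3-9a and 3-9b can be mimicked in plain Python to make the data flow concrete. This is deliberately not XSLT: it is a hand-written stand-in using the standard library, and the "Total" element and the sample values are assumptions added for the illustration.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical source in the first format: currency is an attribute.
SOURCE_XML = '<Order Currency="USD"><Total>100.00</Total></Order>'


def currency_attribute_to_child(source_order):
    """Build a new Order whose currency is a child element, not an attribute."""
    target = ET.Element("Order")  # begin a new "Order" element
    currency = ET.SubElement(target, "Currency")  # new "Currency" child
    # Insert the value of the source attribute as the element content.
    currency.text = source_order.get("Currency")
    for child in source_order:
        # Carry the remaining content over unchanged.
        target.append(copy.deepcopy(child))
    return target


translated = currency_attribute_to_child(ET.fromstring(SOURCE_XML))
print(ET.tostring(translated, encoding="unicode"))
```

Like an XSLT transform, the function is directional: converting back from the child-element format to the attribute format would need a second function.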
At the time of this writing, the W3C had commenced work on XSLT 2.0. The work was in its beginning stages, focusing on the requirements for the new version. Given the work on XML Schema and XPath 2.0 since XSLT 1.0 appeared as a Recommendation, one of the primary requirements was compatibility with the rest of the XML standards family. The rest of the proposed requirements mostly revolved around making XSLT easier to use. Since the release of XSLT 1.0, a great deal has been revealed about what people want to do with XSLT and what they are finding they can do. So there are a number of proposals to close the most glaring gaps between how often people need to perform a type of operation and how difficult that operation actually is to perform.
Practical Usage
XSLT is becoming an increasingly important part of the XML family of specifications. XML provides a general grammar for
organization, different consumers of the data will probably want it in different formats. XSLT provides the mechanism for supporting this customized data flow. It fundamentally alters the issue of information exchange from defining common data formats for all applications to defining the transformations necessary to deliver data to each application in the format it desires.
Another common use for XSLT is transforming data documents into presentation documents. There are a number of presentation formats based on tagged markup. HTML is by far the most widely used, but there is also Wireless Markup Language (WML) for small devices and VoiceXML for voice-driven interfaces. The W3C has redefined HTML in terms of XML with its XHTML initiative, while WML and VoiceXML are already defined in terms of XML. Therefore, it is fairly straightforward to create XSLT transforms that take a data-oriented format like an Order and rearrange it into a presentation-oriented format like XHTML. It's even possible to create pages that incorporate advanced features like JavaScript or VBScript.
Figure 3-4 shows the typical architecture of an XSLT-driven Web site. Note that the XSLT transformation from XML data into XHTML presentation takes place on the server rather than the client. While this approach violates the vision of users specifying their own presentation, it is much more convenient to do transforms on the server with the current Internet software infrastructure. Some projects have even combined XSLT with Cascading Style Sheets (CSS). CSS enables authors to specify detailed screen presentation properties for HTML elements. One drawback of CSS is that it can only add formatting information to a document; it cannot rearrange the document information to make more sense to the user. In this sense, XSLT and CSS complement each other. The application reorganizes the data-oriented XML with XSLT to suit the user requirements and then applies CSS to get the final presentation-oriented document.
There is some backlash against using XSLT. Some experts observe that XSLT is evolving into a complete programming language even though existing programming languages do almost as good a job of transforming XML documents. They believe that the small benefit offered by XSLT is overwhelmed by the cost of forcing developers to learn yet another special-purpose programming language. Finally, they feel the performance of XSLT is dismal when compared to the same operations in traditional programming languages. Others argue that because of the complexity of the low-level interfaces, using existing programming languages requires a high degree of skill that many Web developers do not have. They point to the success of JavaScript and to HTML itself as evidence that special-purpose scripting languages improve the reach of information exchange technologies. Last, they point out that people also complained about Java's initial performance and that the performance of XSLT will also certainly improve as it matures. Given this debate, it's probably worth encouraging your architects and lead developers to examine carefully whether XSLT is right for a particular project.
As discussed in the previous section, XSLT was originally part of the effort to create a master presentation language for XML documents. While the ability to rearrange document content turned out to have many applications, it is still a necessary
paginated output is printing, electronic renditions of pages such as PDF or electronic books also clearly fall in the paginated category. You could even imagine pagination applying in some sense to aural output for books on tape. As you can probably guess, XSL is therefore of primary interest to publishing professionals.
The goal of XSL is to specify a language that allows people to apply paginated formatting to XML documents without contaminating logical document content. There are three primary requirements for a solution.
1. Applying formatting rules to elements. Authors must be able to specify complete formatting rules for each type of element in a document. These rules include font format, indentation, line spacing, leading and trailing space, table formatting, and so on. Applying formatting rules to elements makes it very efficient to create stylesheets in conjunction with DTDs. Because all documents using a DTD have the same structure, it makes sense that they should have the same formatting rules.
formatting rules to each page. Changing the content of elements in a document changes the position of page breaks. It would be highly cumbersome for authors to figure out the position of these page breaks and insert special elements to provide features such as margins, headers, and footers. Therefore, stylesheets for paginated output need general rules for specifying these page-level formatting concepts.
3. Usable with different display technologies. One of the big advantages of a stylesheet approach is that an organization could render the same document content in different formats by changing stylesheets. A publisher could render the same book in a printing press format, an electronic book format, and a books on tape format. Therefore, the formatting rules must be flexible enough to accommodate the special needs of these different presentation platforms.
How It Works
The fundamental idea behind XSL is to use the same content to generate different paginated presentations. As just discussed, different presentations can accommodate different platforms. They could also accommodate different conventions, such as those used to format product press releases versus product
Each department can define its own stylesheet, and the
Figure 3-5 Applying Different Stylesheets for Different Users
Generating output occurs in two steps. First, an XSL stylesheet transforms a content document into a presentation document with the same mechanisms as using XSLT to transform one data document into another data document. The resulting presentation document contains XML content elements decorated with XSL formatting objects and is called a formatting object tree. Actually generating the final output requires the use of a formatter as a second step. As Figure 3-5 shows, a given presentation document may be rendered into many different outputs. In this case, each department generates both a PostScript file for printing and a PDF file for electronic storage.
<fo:static-content flow-name="xsl-region-after">
<fo:block> p <fo:page-number/></fo:block>
</fo:static-content>
</fo:page-sequence>
"Order" and gives it half-inch margins. Besides the main body of the page, it also defines a header with the "fo:region-before" element and a footer with the "fo:region-after" element. The second part assigns content to this header and footer. The header gets the title "Foo Company Order" and the footer gets a page number identifier.
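The footer assignment just described is ordinary XML in the XSL formatting objects namespace, so an application can also build it programmatically. The Python sketch below constructs the footer part of a formatting object tree with the standard library; it is an assumption about how one might do this, it produces only a fragment, and a separate formatter would still be needed to turn a complete tree into paginated output.

```python
import xml.etree.ElementTree as ET

# Namespace of XSL formatting objects, bound to the conventional "fo" prefix.
FO = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO)


def build_footer(prefix_text):
    """Build static footer content: some text followed by the page number."""
    footer = ET.Element(
        f"{{{FO}}}static-content", {"flow-name": "xsl-region-after"}
    )
    block = ET.SubElement(footer, f"{{{FO}}}block")
    block.text = prefix_text  # literal text placed before the page number
    ET.SubElement(block, f"{{{FO}}}page-number")  # filled in by the formatter
    return footer


footer = build_footer("p ")
serialized = ET.tostring(footer, encoding="unicode")
print(serialized)
```

The page number itself never appears in the tree; the formatter supplies it for each page when it lays out the document.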
Practical Usage
XSL achieved W3C Recommendation status relatively recently, so there is not much practical experience with its application. At the time of this writing, there were only a handful of XSL
discussed in the coverage of XSLT, applications that dynamically generate XML content as part of a server-side application are using XSLT to generate markup-oriented presentations such as HTML, WML, and VoiceXML directly. Combine this usage pattern with the focus of XSL on paginated output, and it becomes clear that the primary use of XSL will be for publishing.
Within publishing, XSL has the potential to affect both companies dedicated to the business of publishing and internal publishing groups within companies. For publishing companies, XSL enables them to target different delivery platforms such as printed pages, electronic books, and books on tape efficiently. For internal groups, XSL enables them to repurpose the same content for different uses such as press releases, data sheets, and technical specifications. There is also the possibility that XSL could greatly expand the scope of internal publishing to include capturing application-generated XML content, collating it, and dispersing it in a published form. All of these uses for XSL depend on the adoption by vendors of traditional publishing and content management systems.
Many people want to use XML on the Web, where linking is an absolute requirement, so it makes sense to define a companion linking standard for XML. Also, once people can easily exchange and understand structured documents, specifying relationships among them becomes very valuable. These forces have motivated the XML Linking Language (XLink) initiative. However, the advanced features of XLink may have little appeal to mainstream Web users.
Most Web authors are familiar with HTML and its href syntax. Therefore, a goal of the XLink specification is to make it very simple to use this syntax to create a one-way link from a point in an XML source document to a target document. This compatibility lowers the learning barrier for existing Web developers. It also raises the question of whether XLink gives these developers enough incentive to adopt it.
With XLink, it is possible to go beyond simple HTML-like links. Theoretically, once people have access to a wide variety of content structured in XML, they will naturally want to connect different pieces of this content in many different ways. To accommodate this need, XLink offers powerful capabilities for specifying relationships among multiple target documents rather than just between two documents. Of course, this more sophisticated approach requires more sophisticated software infrastructure and more sophisticated user behavior.
Suppose that an attorney has researched a particular esoteric and complex point of the law. Parts of many different court opinions apply in differing degrees to this attorney's particular case. With XLink, the attorney could create an extended link that led to all the different opinions, categorized the opinions by relevance, and pointed to specific passages in the opinions. The attorney could then e-mail this extended link to other attorneys on the case. To accomplish this task, the attorney would need
colleagues would need software features for navigating them. Whether there is a need for such features outside of specialized domains such as law and medicine remains an open question.
How It Works
Links with multiple targets are an exciting development, but let's start with the simple case. If you wanted to link just one Order Document to one Customer Document, you would use a simple link. A simple link is very similar to a standard HTML link. Figure 3-6 shows how the link is defined as part of the Order Document and targets the Customer Document.
Figure 3-6 A Simple XLink
The syntax for a link is itself XML. Therefore, extracting the linking information from a document does not require any additional capabilities. However, knowing what to do with this information requires an XLink processor that understands linking information. Document viewers such as Web browsers, therefore, need to include such a component to support links using the XLink syntax. Example 3-13 shows the actual syntax for the simple link represented in Figure 3-6.
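Because a link is itself XML, the extraction step mentioned above needs nothing beyond an ordinary parser. The Python sketch below pulls simple links out of a document with the standard library. It is an illustration only: the "CustomerRef" element, the target URL, and the title are hypothetical, and the sketch extracts the link without deciding what to do with it, which is the job of an XLink processor.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"  # XLink namespace

# Hypothetical Order Document carrying one simple link to a Customer Document.
ORDER_XML = f"""
<Order xmlns:xlink="{XLINK}">
  <CustomerRef xlink:type="simple"
               xlink:href="http://www.foocompany.com/customers/c1.xml"
               xlink:title="Customer record"/>
</Order>
"""


def simple_links(root):
    """Return (element name, target, title) for every simple link found."""
    links = []
    for element in root.iter():
        # XLink attributes are namespace-qualified, unlike HTML's href.
        if element.get(f"{{{XLINK}}}type") == "simple":
            links.append(
                (
                    element.tag,
                    element.get(f"{{{XLINK}}}href"),
                    element.get(f"{{{XLINK}}}title"),
                )
            )
    return links


links = simple_links(ET.fromstring(ORDER_XML))
for name, target, title in links:
    print(name, "->", target, f"({title})")
```

Any element can act as a link once it carries the XLink attributes, which is why the function inspects attributes rather than looking for a particular element name.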
independent of the content it connects. This separation of relationships from content is a powerful tool for categorizing content on the Web. For a given set of logical documents, there may be dozens of different categorizations, all implemented with links and all independent of each other.
Figure 3-7 Extended XLink
An extended link has a more complex structure than a simple link. Because the link includes multiple targets, there will be subelements corresponding to each target. Therefore, extended links have two kinds of attributes: those that appear once in the top-level element of the link and those that appear in each of the subelements corresponding to a target. Example 3-14 shows this syntax for the link represented in Figure 3-7.
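The two kinds of attributes can be seen by reading an extended link with a few lines of Python. The sketch below is a rough illustration under stated assumptions: the "Research" and "Opinion" element names, the labels, and the example.com URLs are invented, loosely following the attorney scenario, and the code only reads the link rather than navigating it.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"  # XLink namespace

# Hypothetical extended link with one locator subelement per target.
EXTENDED_XML = f"""
<Research xmlns:xlink="{XLINK}" xlink:type="extended"
          xlink:title="Relevant opinions">
  <Opinion xlink:type="locator" xlink:label="high"
           xlink:href="http://www.example.com/opinions/a.xml"/>
  <Opinion xlink:type="locator" xlink:label="low"
           xlink:href="http://www.example.com/opinions/b.xml"/>
</Research>
"""


def xlink_attr(element, name):
    """Read one namespace-qualified XLink attribute from an element."""
    return element.get(f"{{{XLINK}}}{name}")


def describe_extended_link(link):
    """Separate the once-per-link attributes from the per-target ones."""
    if xlink_attr(link, "type") != "extended":
        raise ValueError("not an extended link")
    targets = [
        (xlink_attr(child, "label"), xlink_attr(child, "href"))
        for child in link
        if xlink_attr(child, "type") == "locator"
    ]
    return {"title": xlink_attr(link, "title"), "targets": targets}


summary = describe_extended_link(ET.fromstring(EXTENDED_XML))
print(summary["title"])
for label, href in summary["targets"]:
    print(label, "->", href)
```

The title belongs to the link as a whole, while each label and location belongs to a single target, which matches the split between top-level and subelement attributes described above.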
"xlink:href" attribute gives the location of the document. Extended links have the same attributes as simple links for indicating contextual information, such as "xlink:label" and "xlink:title". In addition to the attributes shown here, there is also a set of advanced extended link attributes useful for creating sophisticated semantic webs among documents.
As discussed in the coverage of XPointer, you can combine XLink and XPointer to refer from one document to a specific location within another document. In this case, you add a "#" to the end of the URI in the "xlink:href" attribute and then