Addison-Wesley, XML: A Manager's Guide, 2nd Edition, August 2002, ISBN 0201770067

Executive Summary

Overview of Namespaces

Overview of XPath, XPointer, and XQuery

Overview of XSLT

XML documents and DTDs provide the foundation for an evolving Internet document paradigm. As people developed XML applications, they naturally identified additional features they wanted to have and troublesome drawbacks they wanted to avoid. Such discoveries inspired further standards efforts built on the foundation of XML documents and DTDs. As the use of XML has evolved, this process has led to a complicated landscape where old specification efforts languish for extended periods of time and new efforts rapidly achieve acceptance. Therefore, identifying the standards relevant to your needs can be something of a challenge.

Categorizing the various specifications offers a potential aid to navigating the landscape. Figure 3-1 presents one potential categorization of XML-related standards. It expands on Figure 2-1 by revealing the detailed structure of both the platform standards and the domain standards. This categorization can help you identify the depth of understanding you need for any given standard.

Figure 3-1 Categories of Related Standards

standards define an XML format to encode information for a narrower purpose. In general, a standard that could ultimately affect a majority of XML developers is probably a platform standard. Within this category, there are three subcategories.

1. Technical standards. Standards in this category provide a precise technical foundation for XML processing components to work correctly. Without these standards, ambiguity could cause unexpected results when working with documents. The most important is XML Infoset, which provides an underlying data model for XML documents.

2. Feature standards. Standards in this category add new features to the overall XML document paradigm. Without these standards, people could not do all of the things that they want

standards

3. Infrastructure standards. Standards in this category provide crucial infrastructure necessary for many XML applications. They are low-level applications of the feature standards that provide services used by higher-level applications. Standards such as XHTML and XForms specify how to integrate XML with Web technologies. Others, such as SOAP and XML Encryption, specify a uniform way to perform common operations. Without these standards, XML applications would be incompatible with existing Web applications, and all developers would have to reinvent the same infrastructure.

It is important to have a thorough understanding of feature standards because they will help you to determine whether XML supplies the capabilities necessary to solve a given business problem. On the other hand, most managers only need to understand technical and infrastructure standards at a very high level. A general knowledge of technical standards could help you understand the source of a defect in a third-party

Figure 3-1 also shows the structure of the domain standards. The XML paradigm includes both horizontal and vertical domain standards. Horizontal domain standards provide a solution for manipulating information specific to a type of application but used in many industries. Examples include portable formats for vector graphics (SVG), multimedia (SMIL), and mathematical expressions (MathML). They generally facilitate common software components across different industries. Their

The vertical domain standards provide formats for describing information specific to a particular industry but used by many types of applications. Examples include interoperability formats for finance, health care, and telecommunications. They generally facilitate compatibility among systems used within the same industry. Their applicability depends on whether your organization works with information from a particular industry.

Table 3-1 XML Feature Standards

Standard               Abbreviation   Purpose
XML Namespaces         Namespaces     Prevent overlap of names used by different software
XML Path Language      XPath          Address data within a document
XML Pointer Language   XPointer       Specify locations within a document
XML Query Language     XQuery         Search for data within a set of documents
XSL Transformations    XSLT           Transform data from one format to another
Extensible

format definition rules than DTDs

Table 3-1 lists the most important feature standards. The rest of this chapter reviews the important details of these standards. At the end of the chapter, there is a summary of key technical and infrastructure standards.

As you saw in the previous chapter, the primary units of content in XML are elements and attributes. Authors distinguish among different types of elements and attributes by assigning them unique names. When applications process a document, they associate element content with the corresponding element name and attribute content with the corresponding attribute name.

As people started to use XML, they realized that organizations and applications often use the same name to describe different concepts. When these organizations want to exchange information using XML, conflicts may occur. A single document often combines information intended for multiple organizations and applications. For example, most mail order vendors do not charge customers until they ship the ordered goods. Enforcing this policy requires processing the same information with an Accounting application for billing purposes and a Fulfillment application for shipping purposes.

But what if the Accounting and Fulfillment applications use the term "status" to signify different meanings? Both applications want their respective status data associated with the "Status" element name. Such a naming collision has the potential to make life very uncomfortable for developers responsible for specifying the formats of affected documents.

XML Namespaces, a W3C Recommendation, enables developers to avoid naming collisions by assigning element and attribute names to namespaces. Accounting and Fulfillment departments could each have separate namespaces. Developers can thereby qualify the use of the "Status" element name with the appropriate namespace.

How It Works

xmlns:ful="http://www.foocompany.com/names/ful-REV10"> <acct:Customer>

element automatically use the Accounting namespace. Then we set a default namespace Fulfillment for the second Customer element. You can also set a default namespace and then use the syntax in Example 3-1 to specify a different namespace for particular elements or attributes. The combination of the two mechanisms for specifying namespaces delivers a convenient and flexible way to avoid naming collisions.
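As a rough illustration of the two mechanisms (a prefix on individual names and a default namespace on an element), the sketch below parses a small order with Python's standard library. The document structure, the "Invoiced" and "Shipped" values, and the Accounting namespace URI are assumptions made for illustration; only the Fulfillment URI appears in the text above.

```python
import xml.etree.ElementTree as ET

# The Fulfillment URI is taken from the text; the Accounting URI is an
# assumed parallel, used here only for illustration.
ACCT_NS = "http://www.foocompany.com/names/acct-REV10"
FUL_NS = "http://www.foocompany.com/names/ful-REV10"

# Hypothetical order: the first Customer uses an explicit prefix, the
# second sets a default namespace that its children inherit.
ORDER_XML = f"""<Order xmlns:acct="{ACCT_NS}">
  <acct:Customer>
    <acct:Status>Invoiced</acct:Status>
  </acct:Customer>
  <Customer xmlns="{FUL_NS}">
    <Status>Shipped</Status>
  </Customer>
</Order>"""

root = ET.fromstring(ORDER_XML)

# The prefixes used in the queries are local to this script; only the
# namespace URIs have to match the document.
prefixes = {"acct": ACCT_NS, "ful": FUL_NS}

acct_status = root.find("acct:Customer/acct:Status", prefixes).text
ful_status = root.find("ful:Customer/ful:Status", prefixes).text

# No "Status" element exists outside a namespace, so nothing collides.
unqualified = root.findall(".//Status")

print(acct_status, ful_status, len(unqualified))
```

Each application asks only for the "Status" element in its own namespace, so the two meanings never meet even though they share a document.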

Practical Usage

In practice, the most important application of XML Namespaces comes when using the related feature standards. For example, a given document may have wrapper tags from an XML messaging standard, tags for domain-specific XML content, and tags from the XLink standard, specifying links to other documents. Without XML Namespaces, it would be impossible to ensure that the correct type of processing component received the elements intended for it. As you will see in the subsequent sections of this chapter, each of the other feature standards has its own namespace to avoid this problem.

In deciding whether to define your own namespaces, consider that the need for them increases in direct proportion to the number of different groups that will use a document format. In the case of industry-specific standards, this need is obviously quite high. The industry standards body does not want the format it defines to conflict with any of the internally defined formats of its members. An enterprisewide format for a large multinational corporation would also almost certainly need to use XML Namespaces so that it would not conflict with any local formats. However, a format intended for use by two parties, each of which plans to construct a dedicated application, might be able to get away without XML Namespaces. But it would probably be a good idea to use them anyway, in case the parties decide to expand the number of participants. A format used only by a single application for internal purposes almost

If you work for a software vendor that uses XML in a product, you will want to define a namespace if your XML features are visible to external developers or applications. Typically, using the Internet domain name for your company, the name of the product, and a version identifier will be sufficient.

If you work for a large enterprise that uses XML in an integration or e-commerce application, the proper approach for constructing namespaces is less clear. Of course, you use the domain name of your company for the first part of the namespace and a version identifier for the last part. But the middle part takes some thought. You can use a path that mirrors your organizational chart, but organizational charts may change rather often. If your enterprise has an organizational directory based on a standard data model such as X.500 or LDAP, you may be able to leverage that. In any case, an officially designated person should coordinate the policy used by all groups developing XML applications.
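One way to keep such a policy consistent across groups is to generate namespace names from a single helper. The sketch below is only an assumption about what a policy could look like: the function name is hypothetical, and the "/names/" segment and hyphenated version suffix simply mirror the one URI that appears in this chapter.

```python
def namespace_uri(domain: str, middle: str, version: str) -> str:
    """Build a namespace name from the three parts discussed above.

    The layout (http://<domain>/names/<middle>-<version>) is an assumed
    convention modeled on the chapter's example URI, not a requirement
    of the XML Namespaces Recommendation.
    """
    parts = {"domain": domain, "middle": middle, "version": version}
    for label, value in parts.items():
        if not value:
            raise ValueError(f"{label} part of the namespace must not be empty")
    return f"http://{domain}/names/{middle}-{version}"


fulfillment_ns = namespace_uri("www.foocompany.com", "ful", "REV10")
print(fulfillment_ns)
```

Whatever convention the designated coordinator chooses, centralizing it this way means a change to the middle part happens in one place.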

An XML document represents a convenient package of related XML data. However, there are many situations where people need to work with a subset of data within a single document or an aggregation of data from several documents. To print a shipping label, they might want to grab just the address information from an order document. To print a fraud-resistant receipt, they might want to reference just the last four digits of the credit card number. To produce an upgrade notification, they might want contact information from every order document that includes the purchase of a specific product. To perform these types of operations, it helps to have the means of specifying such subsets and aggregations of document data.

XML Path Language (XPath) provides a syntax for addressing individual nodes within a document hierarchy. Developers specify a path from the document node to the target node. It includes basic selection criteria so that a particular XPath expression can differentiate among similar nodes. To grab only the billing address from orders, a developer could use

To create a fraud-resistant receipt, an author could use XPointer to specify the last four characters of the "Number" element within the "Card" element of an "Order" document. XPointer is useful primarily for hypertext applications and is taking a long time to work its way through the W3C standards process.

intermediate processing steps. It uses XPath as the format for specifying many of these parameters. To create a mailing list for

How They Work

With XPath, you specify a particular node or set of nodes by indicating a navigation path. Suppose you wanted to refer to each of the "Description" elements in our example order from Example 2-6. You could use three separate XPath expressions to address each of the three desired elements individually, as shown in Example 3-3.
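The idea of addressing elements by position versus addressing them all at once can be sketched with the limited XPath subset in Python's standard library. The order document below is a hypothetical stand-in for Example 2-6, which is not reproduced here; the element layout and two of the three descriptions are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical order with three line items (structure assumed).
order = ET.fromstring(
    "<Order>"
    "<LineItem><Description>FooBar Version 5</Description></LineItem>"
    "<LineItem><Description>FooBar Manual</Description></LineItem>"
    "<LineItem><Description>FooBar Support Plan</Description></LineItem>"
    "</Order>"
)

# Three separate expressions, one per line item, selected by position.
# XPath positions start at 1, not 0.
by_position = [
    order.find(f"LineItem[{index}]/Description").text for index in (1, 2, 3)
]

# One expression that selects every Description under any LineItem,
# regardless of how many line items the order contains.
all_at_once = [node.text for node in order.findall("LineItem/Description")]

print(by_position)
print(all_at_once)
```

The single expression is the more convenient form when the number of line items is not known in advance.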

element based on the order in which they appear in the document. It's somewhat inconvenient to use a different expression for each "LineItem" element, especially if you don't

"Card" element of any "Payment" element within the document. This limitation prevents an unintended match within a document that also uses a "Number" element as part of something other than a credit card number. This expression then goes beyond XPath, using the "child:text" clause to narrow further the scope of the match to the internal text content of a node. The "position()=12" clause places the point just after the 12th character of the text content.

By combining two points to specify a range, authors can use XPointer to achieve the precision necessary to indicate only the last four digits of a credit card number. Example 3-7 specifies two separate points, using the syntax from Example 3-6. The first point is just after the 12th character. The second point is just after the 16th character. Connecting these two points with the "range-to" function returns all text between the points (in this case, the 13th, 14th, 15th, and 16th digits of the credit card number). Points can even cross node boundaries. This capability is useful for quoting passages in text documents. It is possible to grab all of the text from the 2nd paragraph of the 3rd chapter to the 5th paragraph of the 4th chapter using a similar XPointer expression.
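Python's standard library has no XPointer processor, so the following is only an analogue of the two-point range described above, not XPointer itself. It locates the "Number" element with a path and then takes the text between a point after the 12th character and a point after the 16th. The card number and document layout are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical order fragment; the 16-digit number is a placeholder.
order = ET.fromstring(
    "<Order><Payment><Card>"
    "<Number>1234567890123456</Number>"
    "</Card></Payment></Order>"
)

# Path restricted to Number within Card within Payment, so a "Number"
# used elsewhere in the document would not match.
number_text = order.find(".//Payment/Card/Number").text

# A "point" sits between characters: 12 means just after the 12th
# character, 16 means just after the 16th.
start_point = 12
end_point = 16

# The range between the two points covers characters 13 through 16.
last_four = number_text[start_point:end_point]

print(last_four)
```

Python slice offsets happen to count gaps between characters in the same way, which is why the two point values can be used directly.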

Example 2-9

Example 3-8 wraps the query expression in a "MailingList" element so that the query engine can generate a well-formed XML results document. The "FOR" clause specifies the scope of the query as "Order" documents within the "Orders" collection

clause. This clause is useful for sophisticated queries where developers need to manipulate intermediate results to generate the final results.

Example 3-8

FOR $o IN document("http://www.foocompany.com")/Orders/Order
WHERE $o//Description = "FooBar Version 5"
RETURN
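The query in this copy stops at RETURN, so its result clause is not known. As a loose Python analogue of the FOR / WHERE / RETURN pattern (not XQuery itself), the sketch below filters a small in-memory collection of orders and wraps the matches in a "MailingList" element. The customer data, the "Customer" and "Name" elements, and the choice to return each customer are all assumptions.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the "Orders" collection.
orders = ET.fromstring(
    "<Orders>"
    "<Order><Customer><Name>Ann</Name></Customer>"
    "<LineItem><Description>FooBar Version 5</Description></LineItem></Order>"
    "<Order><Customer><Name>Bob</Name></Customer>"
    "<LineItem><Description>BazQux Version 2</Description></LineItem></Order>"
    "</Orders>"
)

# Wrapper element so the result is a single well-formed document.
mailing_list = ET.Element("MailingList")

for order in orders.findall("Order"):  # FOR: each Order in the collection
    descriptions = [node.text for node in order.iter("Description")]
    if "FooBar Version 5" in descriptions:  # WHERE: any matching Description
        for customer in order.findall("Customer"):  # RETURN (assumed content)
            mailing_list.append(copy.deepcopy(customer))

names = [node.text for node in mailing_list.iter("Name")]
print(ET.tostring(mailing_list, encoding="unicode"))
```

A real query engine would evaluate the same logic against stored documents instead of an in-memory tree.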

Drafts. The purpose of these revisions is to resolve two primary issues. The first is to ensure compatibility with the XML Schema data model. It would be nice if XPath expressions could use constraints based on the basic data types defined by XML Schema. The second is to ensure compatibility between XPath and XQuery. It makes a great deal of sense to use XPath as the syntax for expressing many of the XQuery parameters. However, there are several subtle issues concerning XPath 1.0 that inhibit this compatibility. XPath 2.0 will resolve these issues to the point where XQuery 1.0 will include XPath 2.0 as a subset.

Practical Usage

Developers and authors have been using XPath for some time, but they are just beginning to experiment with XPointer and XQuery. XPointer and XQuery incorporate much of the XPath syntax. These factors make XPath the most important feature standard for accessing parts of documents. Managers should ensure that all developers working on XML projects are comfortable using XPath.

to use a particular address for sending an invoice, it passes the document location and the appropriate XPath

XPointer offers advantages over XPath primarily for hypertext applications. In these applications, much of the content is text-oriented rather than data-oriented, and documents naturally tend to refer to content within other documents. This applicability extends to boundary situations where a text-oriented document refers to parts of a data-oriented document generated by an application. XPointer is closely related to XLink, the feature standard focused on XML hypertext links and discussed later. The motivation for XPointer stems from the common need for such links to specify a particular place within a document rather than the entire document.

While XPath and XPointer address the issue of accessing data within an individual document, XQuery handles accessing data across a collection of documents. Although the standard was still under development at the time of this writing, the motivation for it is clear. Any time an application generates a significant number of XML documents containing meaningful data, it's only a matter of time before someone wants to search those documents for a specific piece of information. In cases where XML is only one representation of data actually managed within a relational database model, existing query languages and tools will be adequate. But if XML is the native representation of the data or is the common representation provided by otherwise heterogeneous data models, XQuery should see extensive use.

The three standards covered in the previous section enable you to select relatively small parts of documents in various ways. In some cases, you may want to take this manipulation a step further by completely reorganizing a document. This type of transformation requires selecting different parts of a document and rearranging them. XSL Transformations (XSLT) extends the XPath model of how to address parts of documents with more sophisticated operators that enable developers to specify these rearrangements.

You may be wondering why, as the name implies, XSLT is associated with XSL. Originally, people saw XSL as a generic way to display XML documents for all presentation technologies. Accomplishing this goal naturally required two different types of features: (1) features for rearranging document content so that it made the most sense for display and (2) features for attaching display properties to the content. However, rearranging content turned out to be useful for other purposes, and the appropriate display properties turned out to be a topic of extensive debate. Because people wanted to use the rearranging features and agreement on these features was far ahead of that for display properties, the two standards

same logical type of document? That leads to meta-incompatibility and big headaches for managers who are

of groups calling different concepts by the same name. XSLT solves the problem of groups calling the same concept by different names.

Such a scenario is highly likely in applications such as supply chain management where two companies want to exchange conceptually the same information but already have formats that they use internally. Also, industry groups in finance, telecommunications, and transportation have defined formats for transactions in those industries. In some cases, multiple industry groups are working on the same problem, creating the potential for dueling standards. Moreover, with the integration of global supply chains, standards for related industries such as manufacturing and shipping may need to ensure compatibility where they overlap.

The idea behind XSLT is to define a scripting language (using XML syntax, of course) that enables developers to transform one format into another. Wherever two formats overlapped, developers would create a transform that extracts the overlapping information from one format and rearranges it into the other format. These transforms are directional; to rearrange documents in both directions, you would need two separate transforms.

How It Works

Consider the basic problem of automatically placing an order with a trading partner over the Internet. Foo Company has defined a Foo Company Order DTD that it uses internally. Bar Corp has defined a Bar Corp Order DTD that it uses internally. Now Foo Company wants to place orders automatically with Bar Corp.

is valid with respect to the Foo Company Order DTD. However, for Bar Corp to accept the order, the order document must be valid with respect to the Bar Corp Order DTD. To achieve this end, Foo Company creates a Foo Bar Transformation document that specifies how to translate a Foo Company Order Document into a Bar Corp Order Document, as shown in Figure 3-3.

Figure 3-3 Translating Order Formats with XSLT

To see how XSLT works, let's consider a simple example. Examples 3-9a and 3-9b show parts of order documents in two different formats. Example 3-9a models currency information as an attribute on the "Order" element. Example 3-9b models currency information as a child element of the "Order" element. The choice of modeling information as an attribute or child element is an arbitrary one, so it is likely that two different

elements using XSL-specific constructs. The transformation document selects the "Order" element in the source document. It then begins a new "Order" element with a new "Currency" child element in the translated document. It inserts the value of the "Currency" attribute of the selected order element in the source document as the element content of the "Currency"

DTDs, XSLT scripts also work with well-formed ones. However, people often use XSLT when they have in one data format many documents that they want translated to another data format, which is precisely the same conditions under which they use valid documents.
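To make the attribute-to-child-element rearrangement described for Examples 3-9a and 3-9b concrete, here is the same translation written in plain Python rather than XSLT, since the standard library includes no XSLT processor. The function name, the "Total" element, and the sample values are hypothetical; only the "Order" element and "Currency" attribute come from the text.

```python
import xml.etree.ElementTree as ET


def currency_attribute_to_child(source_xml: str) -> str:
    """Translate an Order whose currency is an attribute into an Order
    whose currency is a child element (one direction only)."""
    source_order = ET.fromstring(source_xml)

    # Begin a new "Order" element in the translated document.
    translated_order = ET.Element("Order")

    # Insert the value of the source "Currency" attribute as the content
    # of a new "Currency" child element.
    currency = ET.SubElement(translated_order, "Currency")
    currency.text = source_order.get("Currency")

    # Carry the remaining content across unchanged.
    for child in source_order:
        translated_order.append(child)

    return ET.tostring(translated_order, encoding="unicode")


# Hypothetical source document in the attribute-based format.
translated = currency_attribute_to_child(
    '<Order Currency="USD"><Total>100</Total></Order>'
)
print(translated)
```

Like an XSLT transform, this is directional: translating back from the child-element format to the attribute format would need a second function.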

At the time of this writing, the W3C had commenced work on XSLT 2.0. The work was in its beginning stages, focusing on the requirements for the new version. Given the work on XML Schema and XPath 2.0 since XSLT 1.0 appeared as a Recommendation, one of the primary requirements was compatibility with the rest of the XML standards family. The rest of the proposed requirements mostly revolved around making XSLT easier to use. Since the release of XSLT 1.0, a great deal has been revealed about what people want to do with XSLT and what they are finding they can do. So there are a number of proposals to close the most glaring gaps between frequency of need to perform a type of operation and the difficulty of actually performing the operation.

Practical Usage

XSLT is becoming an increasingly important part of the XML family of specifications. XML provides a general grammar for

organization, different consumers of the data will probably want it in different formats. XSLT provides the mechanism for supporting this customized data flow. It fundamentally alters the issue of information exchange from defining common data formats for all applications to defining the transformations necessary to deliver data to each application in the format it desires.

Another common use for XSLT is transforming data documents into presentation documents. There are a number of presentation formats based on tagged markup. HTML is by far the most widely used, but there is also Wireless Markup Language (WML) for small devices and VoiceXML for voice-driven interfaces. The W3C has redefined HTML in terms of XML with its XHTML initiative, while WML and VoiceXML are already defined in terms of XML. Therefore, it is fairly straightforward to create XSLT transforms that take a data-oriented format like an Order and rearrange it into a presentation-oriented format like XHTML. It's even possible to create pages that incorporate advanced features like JavaScript or VBScript.

Figure 3-4 shows the typical architecture of an XSLT-driven Web site. Note that the XSLT transformation from XML data into XHTML presentation takes place on the server rather than the client. While this approach violates the vision of users specifying their own presentation, it is much more convenient to do transforms on the server with the current Internet software infrastructure. Some projects have even combined XSLT with Cascading Style Sheets (CSS). CSS enables authors to specify detailed screen presentation properties for HTML elements. One drawback of CSS is that it can only add formatting information to a document; it cannot rearrange the document information to make more sense to the user. In this sense, XSLT and CSS complement each other. The application reorganizes the data-oriented XML with XSLT to suit the user requirements and then applies CSS to get the final presentation-oriented document.

There is some backlash against using XSLT. Some experts observe that XSLT is evolving into a complete programming language even though existing programming languages do almost as good a job of transforming XML documents. They believe that the small benefit offered by XSLT is overwhelmed by the cost of forcing developers to learn yet another special-purpose programming language. Finally, they feel the performance of XSLT is dismal when compared to the same operations in traditional programming languages. Others argue that because of the complexity of the low-level interfaces, using existing programming languages requires a high degree of skill that many Web developers do not have. They point to the success of JavaScript and to HTML itself as evidence that special-purpose scripting languages improve the reach of information exchange technologies. Last, they point out that people also complained about Java's initial performance and that the performance of XSLT will also certainly improve as it matures. Given this debate, it's probably worth encouraging your architects and lead developers to examine carefully whether XSLT is right for a particular project.

As discussed in the previous section, XSLT was originally part of the effort to create a master presentation language for XML documents. While the ability to rearrange document content turned out to have many applications, it is still a necessary

paginated output is printing, electronic renditions of pages such as PDF or electronic books also clearly fall in the paginated category. You could even imagine pagination applying in some sense to aural output for books on tape. As you can probably guess, XSL is therefore of primary interest to publishing professionals.

The goal of XSL is to specify a language that allows people to apply paginated formatting to XML documents without contaminating logical document content. There are three primary requirements for a solution.

1. Applying formatting rules to elements. Authors must be able to specify complete formatting rules for each type of element in a document. These rules include font format, indentation, line spacing, leading and trailing space, table formatting, and so on. Applying formatting rules to elements makes it very efficient to create stylesheets in conjunction with DTDs. Because all documents using a DTD have the same structure, it makes sense that they should have the same formatting rules.

formatting rules to each page. Changing the content of elements in a document changes the position of page breaks. It would be highly cumbersome for authors to figure out the position of these page breaks and insert special elements to provide features such as margins, headers, and footers. Therefore, stylesheets for paginated output need general rules for specifying these page-level formatting concepts.

3. Usable with different display technologies. One of the big advantages of a stylesheet approach is that an organization could render the same document content in different formats by changing stylesheets. A publisher could render the same book in a printing press format, an electronic book format, and a books on tape format. Therefore, the formatting rules must be flexible enough to accommodate the special needs of these different presentation platforms.

How It Works

The fundamental idea behind XSL is to use the same content to generate different paginated presentations. As just discussed, different presentations can accommodate different platforms. They could also accommodate different conventions, such as those used to format product press releases versus product

Each department can define its own stylesheet, and the

Figure 3-5 Applying Different Stylesheets for Different Users

Generating output occurs in two steps. First, an XSL stylesheet transforms a content document into a presentation document with the same mechanisms as using XSLT to transform one data document into another data document. The resulting presentation document contains XML content elements decorated with XSL formatting objects and is called a formatting object tree. Actually generating the final output requires the use of a formatter as a second step. As Figure 3-5 shows, a given presentation document may be rendered into many different outputs. In this case, each department generates both a PostScript file for printing and a PDF file for electronic storage.

<fo:static-content flow-name="xsl-region-after">
  <fo:block> p <fo:page-number/></fo:block>
</fo:static-content>
</fo:page-sequence>

"Order" and gives it half-inch margins. Besides the main body of the page, it also defines a header with the "fo:region-before" element and a footer with the "fo:region-after" element. The second part assigns content to this header and footer. The header gets the title "Foo Company Order" and the footer gets a page number identifier.

Practical Usage

XSL achieved W3C Recommendation status relatively recently, so there is not much practical experience with its application. At the time of this writing, there were only a handful of XSL

discussed in the coverage of XSLT, applications that dynamically generate XML content as part of a server-side application are using XSLT to generate markup-oriented presentations such as HTML, WML, and VoiceXML directly. Combine this usage pattern with the focus of XSL on paginated output, and it becomes clear that the primary use of XSL will be for publishing.

Within publishing, XSL has the potential to affect both companies dedicated to the business of publishing and internal publishing groups within companies. For publishing companies, XSL enables them to target different delivery platforms such as printed pages, electronic books, and books on tape efficiently. For internal groups, XSL enables them to repurpose the same content for different uses such as press releases, data sheets, and technical specifications. There is also the possibility that XSL could greatly expand the scope of internal publishing to include capturing application-generated XML content, collating it, and dispersing it in a published form. All of these uses for XSL depend on the adoption by vendors of traditional publishing and content management systems.

Many people want to use XML on the Web, where linking is an absolute requirement, so it makes sense to define a companion linking standard for XML. Also, once people can easily exchange and understand structured documents, specifying relationships among them becomes very valuable. These forces have motivated the XML Linking Language (XLink) initiative. However, the advanced features of XLink may have little appeal to mainstream Web users.

Most Web authors are familiar with HTML and its href syntax. Therefore, a goal of the XLink specification is to make it very simple to use this syntax to create a one-way link from a point in an XML source document to a target document. This compatibility lowers the learning barrier for existing Web developers. It also raises the question of whether XLink gives these developers enough incentive to adopt it.

With XLink, it is possible to go beyond simple HTML-like links. Theoretically, once people have access to a wide variety of content structured in XML, they will naturally want to connect different pieces of this content in many different ways. To accommodate this need, XLink offers powerful capabilities for specifying relationships among multiple target documents rather than just between two documents. Of course, this more sophisticated approach requires more sophisticated software infrastructure and more sophisticated user behavior.

Suppose that an attorney has researched a particular esoteric and complex point of the law. Parts of many different court opinions apply in differing degrees to this attorney's particular case. With XLink, the attorney could create an extended link that led to all the different opinions, categorized the opinions by relevance, and pointed to specific passages in the opinions. The attorney could then e-mail this extended link to other attorneys on the case. To accomplish this task, the attorney would need

colleagues would need software features for navigating them. Whether there is a need for such features outside of specialized domains such as law and medicine remains an open question.

How It Works

Links with multiple targets are an exciting development, but let's start with the simple case. If you wanted to link just one Order Document to one Customer Document, you would use a simple link. A simple link is very similar to a standard HTML link. Figure 3-6 shows how the link is defined as part of the Order Document and targets the Customer Document.

Figure 3-6 A Simple XLink

The syntax for a link is itself XML. Therefore, extracting the linking information from a document does not require any additional capabilities. However, knowing what to do with this information requires an XLink processor that understands linking information. Document viewers such as Web browsers, therefore, need to include such a component to support links using the XLink syntax. Example 3-13 shows the actual syntax for the simple link represented in Figure 3-6.
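Because link syntax is ordinary XML, a general-purpose parser is enough to pull the linking information out of a document, as the following sketch suggests. The "CustomerLink" element name, the target file name, and the title are hypothetical; the XLink namespace URI and the "xlink:type", "xlink:href", and "xlink:title" attributes are part of the XLink Recommendation. Deciding what to do with the extracted links is still the job of an XLink-aware processor.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical Order Document containing one simple link to a
# Customer Document.
ORDER_XML = f"""<Order xmlns:xlink="{XLINK_NS}">
  <CustomerLink xlink:type="simple"
                xlink:href="customer.xml"
                xlink:title="Customer record"/>
</Order>"""

order = ET.fromstring(ORDER_XML)

# ElementTree stores namespaced attribute names as "{uri}local".
type_attr = f"{{{XLINK_NS}}}type"
href_attr = f"{{{XLINK_NS}}}href"

# Collect (element name, target) for every simple link in the document.
simple_links = [
    (element.tag, element.get(href_attr))
    for element in order.iter()
    if element.get(type_attr) == "simple"
]

print(simple_links)
```

Note that nothing here follows or displays the link; the script only shows that no special parsing capability is needed to find it.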

independent of the content it connects. This separation of relationships from content is a powerful tool for categorizing content on the Web. For a given set of logical documents, there may be dozens of different categorizations, all implemented with links and all independent of each other.

Figure 3-7 Extended XLink

An extended link has a more complex structure than a simple link. Because the link includes multiple targets, there will be subelements corresponding to each target. Therefore, extended links have two kinds of attributes: those that appear once in the top-level element of the link and those that appear in each of the subelements corresponding to a target. Example 3-14 shows this syntax for the link represented in Figure 3-7.

"xlink:href" attribute gives the location of the document. Extended links have the same attributes as simple links for indicating contextual information such as "xlink:label" and "xlink:title." In addition to the attributes shown here, there is also a set of advanced extended link attributes useful for creating sophisticated semantic webs among documents.

As discussed in the coverage of XPointer, you can combine XLink and XPointer to refer from one document to a specific location within another document. In this case, you add a "#" to the end of the URI in the "xlink:href" attribute and then
