Table 6-26
ParserFactory Class Methods
Method Name    Description
makeParser()    Creates a new SAX parser using the “org.xml.sax.parser” system property. Supported by: SAX 1
makeParser(className)    Creates a new SAX parser object using the class name provided. Supported by: SAX 1
AttributeListImpl
AttributeListImpl is the SAX helper class that implements the SAX 1 AttributeList interface for a list of XML attributes. As with the Parser and DocumentHandler interfaces, the AttributeList interface should not be used for new development. Consequently, the AttributeListImpl class should not be used either. We’ve included it here to help debug and upgrade SAX 1 code to the SAX 2 XMLReader, ContentHandler, and Attributes interfaces. Table 6-27 describes the methods.
Table 6-27
AttributeListImpl Class Methods
Method Name    Description
addAttribute(name, type, value)    Adds an attribute to an attribute list.
getLength()    Returns the number of attributes in the list. Supported by: SAX 1
getName(i)    Returns the name of an attribute by index. Attribute indexes start at 0. Supported by: SAX 1
getType(i)    Returns the type of an attribute by index. Attribute indexes start at 0. Supported by: SAX 1
getType(name)    Returns the type of an attribute by name. Supported by: SAX 1
getValue(i)    Returns the value of an attribute by index. Attribute indexes start at 0.
SAX extension interfaces
Aside from the SAX core interfaces, there are several extension interfaces that are implemented using the SAX extension API. SAX extensions are optional interfaces for SAX parsers. For example, the MSXML parser supports the DeclHandler and LexicalHandler interfaces, while the Apache Xerces parser classes support all extension interfaces. They can also be implemented independently of the SAX core interfaces. All extensions have been developed using the SAX 2 extensions API, and are not available in SAX 1.
At the beginning of this chapter, you reviewed the SAX extensions at the interface level. Now let’s review the methods that are contained in the extension interfaces.
You may see SAX documentation that refers to “SAX Extensions 1.x.” This refers to the SAX 2 Extensions 1.x API, not SAX 1. There is no SAX extension API for SAX 1.
Attributes2
The Attributes2 interface checks a DTD to see if an attribute in an XML document was declared in a DTD. It also checks to see if the DTD specifies a default value for the attribute. This interface is used mainly for data validation. Table 6-28 describes the methods.
Table 6-28
Attributes2 Interface Methods
Method Name    Description
isDeclared(index), isDeclared(qName), or isDeclared(uri, localName)    Returns true if the attribute was declared in the DTD. isDeclared accepts an index (starting with 0), a qualified name, or a Namespace URI and local name. Supported by: SAX 2
isSpecified(index), isSpecified(qName), or isSpecified(uri, localName)    Returns false if the attribute value was not given in the document and was instead supplied by a default in the DTD. isSpecified accepts an index (starting with 0), a qualified name, or a Namespace URI and local name.
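As a rough illustration of the two checks in Table 6-28, the following Java sketch parses a small document whose DTD supplies a default value and records the flags for each attribute. The class name Attributes2Demo, the method describeAttributes, and the sample document are made up for this example. The sketch assumes a parser that passes Attributes2 objects to startElement (the Xerces-based parser bundled with the JDK is expected to); the instanceof test covers parsers that do not.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.Attributes2;
import org.xml.sax.helpers.DefaultHandler;

class Attributes2Demo {
    // Hypothetical sample: "id" is typed in the document, "status" comes from a DTD default.
    static final String XML =
        "<!DOCTYPE item [\n"
      + "  <!ELEMENT item EMPTY>\n"
      + "  <!ATTLIST item id CDATA #REQUIRED status CDATA 'new'>\n"
      + "]>\n"
      + "<item id='42'/>";

    // Returns one line per attribute, for example "name declared=... specified=...".
    static List<String> describeAttributes(String xml) throws Exception {
        List<String> report = new ArrayList<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (!(atts instanceof Attributes2)) {
                    report.add("Attributes2 is not supported by this parser");
                    return;
                }
                Attributes2 atts2 = (Attributes2) atts;
                for (int i = 0; i < atts2.getLength(); i++) {
                    report.add(atts2.getQName(i)
                        + " declared=" + atts2.isDeclared(i)
                        + " specified=" + atts2.isSpecified(i));
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return report;
    }

    public static void main(String[] args) throws Exception {
        describeAttributes(XML).forEach(System.out::println);
    }
}
```

An application would typically use isSpecified to avoid writing DTD defaults back out when it re-serializes a document.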
DeclHandler Interface Methods
Method Name    Description
attributeDecl(eName, aName, type, mode, value)    Reports a DTD attribute type declaration. Values reported include any valid DTD values, such as “CDATA”, “ID”, “IDREF”, “IDREFS”, “NMTOKEN”, “NMTOKENS”, “ENTITY”, or “ENTITIES”, a token group, or a NOTATION reference. Supported by: SAX 2 and MSXML
elementDecl(name, model)    Reports a DTD element type declaration. Values reported include any valid DTD values, such as “EMPTY”, “ANY”, order specification, and so on. Supported by: SAX 2 and MSXML
externalEntityDecl(name, publicId, systemId)    Reports a parsed external entity declaration. Supported by: SAX 2 and MSXML
internalEntityDecl(name, value)    Reports a parsed internal entity declaration. Supported by: SAX 2 and MSXML
EntityResolver2
EntityResolver2 extends the EntityResolver interface by letting an application programmatically supply external subsets for entity references. This can be useful for automatically adding predefined DTD references to an XML document for validation while parsing. Table 6-30 describes the methods.
Table 6-30
EntityResolver2 Interface Methods
Method Name    Description
getExternalSubset(name, baseURI)    Returns an external subset for documents without a valid DOCTYPE declaration. Supported by: SAX 2
resolveEntity(name, publicId, baseURI, systemId)    Allows applications to map external entities to XML document InputSources, or map an external entity by URI. Supported by: SAX 2
LexicalHandler
LexicalHandler returns information about lexical events in an XML document. Comments, the start and end of a CDATA section, the start and end of a DTD declaration, and the start and end of an entity can be tracked with LexicalHandler. Table 6-31 describes the methods.
Table 6-31
LexicalHandler Interface Methods
Method Name    Description
comment(char[] ch, start, length)    This event is triggered when the parser encounters a comment anywhere in the document.
endCDATA()    This event is triggered when the parser encounters the end of a CDATA section. Supported by: SAX 2 and MSXML
endDTD()    This event is triggered when the parser encounters the end of a DTD declaration. Supported by: SAX 2 and MSXML
endEntity(name)    This event is triggered when the parser encounters the end of an entity. Supported by: SAX 2 and MSXML
startCDATA()    This event is triggered when the parser encounters the start of a CDATA section. Supported by: SAX 2 and MSXML
startDTD(name, publicId, systemId)    This event is triggered when the parser encounters the start of a DTD declaration. Supported by: SAX 2 and MSXML
startEntity(name)    This event is triggered when the parser encounters the beginning of internal or external XML entities.
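To make the events in Table 6-31 more concrete, here is a minimal Java sketch that registers a LexicalHandler and records the events it receives. The class name LexicalEventsDemo and the method lexicalEvents are invented for this example. The handler is attached through the standard SAX 2 property http://xml.org/sax/properties/lexical-handler, which a parser is free to reject if it does not support the extension.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.LexicalHandler;

class LexicalEventsDemo {
    // Parses the document and returns the lexical events in the order they were reported.
    static List<String> lexicalEvents(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        LexicalHandler handler = new LexicalHandler() {
            public void comment(char[] ch, int start, int length) {
                events.add("comment:" + new String(ch, start, length).trim());
            }
            public void startCDATA() { events.add("startCDATA"); }
            public void endCDATA() { events.add("endCDATA"); }
            public void startDTD(String name, String publicId, String systemId) {
                events.add("startDTD:" + name);
            }
            public void endDTD() { events.add("endDTD"); }
            public void startEntity(String name) { events.add("startEntity:" + name); }
            public void endEntity(String name) { events.add("endEntity:" + name); }
        };
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Lexical handlers are registered as a property, not through a setter method.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(lexicalEvents("<doc><!-- note --><![CDATA[a < b]]></doc>"));
    }
}
```

The text inside the CDATA section still arrives through the ContentHandler characters event; startCDATA and endCDATA only mark its boundaries.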
Locator2
Locator2 extends the Locator interface to return the encoding and the XML version for an XML document. Table 6-32 describes the methods.
Table 6-32
Locator2 Interface Methods
Method Name    Description
getXMLVersion()    Returns the entity XML version.
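The following Java sketch shows one way an application might read these values. The class name Locator2Demo and the method versionAndEncoding are hypothetical. The sketch assumes the parser hands a Locator2 to setDocumentLocator, and it waits until the first startElement event before asking, because the XML declaration may not have been read when the locator is first delivered.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.ext.Locator2;
import org.xml.sax.helpers.DefaultHandler;

class Locator2Demo {
    // Returns { xmlVersion, encoding }; both stay null if the parser offers no Locator2.
    static String[] versionAndEncoding(byte[] xmlBytes) throws Exception {
        String[] result = new String[2];
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;
            private boolean captured;

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Capture once, at the root element, when the declaration has been parsed.
                if (!captured && locator instanceof Locator2) {
                    Locator2 locator2 = (Locator2) locator;
                    result[0] = locator2.getXMLVersion();
                    result[1] = locator2.getEncoding();
                    captured = true;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new ByteArrayInputStream(xmlBytes), handler);
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><doc/>";
        String[] info = versionAndEncoding(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println("version=" + info[0] + ", encoding=" + info[1]);
    }
}
```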
SAX extension helper classes
The SAX extension helper classes provide the same programmatic access to the SAX Extension interfaces that the SAX helpers do to the SAX Core Interfaces. The optional SAX 2 Extension API interface properties, methods, and object classes have to be implemented to support these classes.
The SAX Extension Helper classes are only for Java implementations. Currently, MSXML does not support helper classes, though it does support some of the functionality through additional methods in the core interfaces.
Attributes2Impl
The Attributes2Impl helper class is the implementation class of the Attributes2 interface. Attributes2 checks a DTD to see if an attribute in an XML document was declared in a DTD. It also checks to see if the DTD specifies a default value for the attribute. It’s used mainly for data validation. Attributes2Impl extends the interface functionality by letting you add, edit, and delete attributes from lists, as described in Table 6-33.
Table 6-33
Attributes2Impl Class Methods
Method Name    Description
addAttribute(uri, localName, qName, type, value)    Adds an attribute to the end of the attribute list, setting its “specified” flag to true. Supported by: SAX 2
isDeclared(index), isDeclared(qName), or isDeclared(uri, localName)    Returns true if the attribute was declared in the DTD. isDeclared accepts an index (starting with 0), a qualified name, or a Namespace URI and local name. Supported by: SAX 2
isSpecified(index), isSpecified(qName), or isSpecified(uri, localName)    Returns false if the attribute value was not given in the document and was instead supplied by a default in the DTD. isSpecified accepts an index (starting with 0), a qualified name, or a Namespace URI and local name. Supported by: SAX 2
removeAttribute(index)    Removes an attribute from the attribute list. Attribute indexes start at 0. Supported by: SAX 2
setAttributes(Attributes atts)    Copies the specified Attributes object to a new Attributes object. Supported by: SAX 2
setDeclared(index, boolean value)    Sets the “declared” flag of a specified attribute. Attribute indexes start at 0. Supported by: SAX 2
setSpecified(index, boolean value)    Sets the “specified” flag of a specified attribute. Attribute indexes start at 0.
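The short Java sketch below exercises the add, edit, and delete methods from Table 6-33 without involving a parser. The class name Attributes2ImplDemo, the method buildList, and the attribute names are invented for illustration.

```java
import org.xml.sax.ext.Attributes2Impl;

class Attributes2ImplDemo {
    // Builds a two-item list, edits the flags of the second item, then removes the first.
    static Attributes2Impl buildList() {
        Attributes2Impl atts = new Attributes2Impl();
        // addAttribute(uri, localName, qName, type, value) marks each attribute as specified.
        atts.addAttribute("", "id", "id", "CDATA", "42");
        atts.addAttribute("", "status", "status", "CDATA", "new");
        // Pretend "status" came from a DTD default: declared, but not specified in the document.
        atts.setSpecified(1, false);
        atts.setDeclared(1, true);
        // Removing index 0 shifts "status" (and its flags) down to index 0.
        atts.removeAttribute(0);
        return atts;
    }

    public static void main(String[] args) {
        Attributes2Impl atts = buildList();
        for (int i = 0; i < atts.getLength(); i++) {
            System.out.println(atts.getQName(i) + "=" + atts.getValue(i)
                + " declared=" + atts.isDeclared(i)
                + " specified=" + atts.isSpecified(i));
        }
    }
}
```

A list built this way can be passed to any code that accepts an Attributes or Attributes2 object, which is handy when generating SAX events by hand.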
Table 6-34
DefaultHandler2 Class Methods
Method Name    Description
attributeDecl(eName, aName, type, mode, value)    Reports a DTD attribute type declaration. Values reported include any valid DTD values, such as “CDATA”, “ID”, “IDREF”, “IDREFS”, “NMTOKEN”, “NMTOKENS”, “ENTITY”, or “ENTITIES”, a token group, or a NOTATION reference. Source interface is DeclHandler. Supported by: SAX 2
elementDecl(name, model)    Reports a DTD element type declaration. Values reported include any valid DTD values, such as “EMPTY”, “ANY”, order specification, etc. Source interface is DeclHandler. Supported by: SAX 2
externalEntityDecl(name, publicId, systemId)    Reports a parsed external entity declaration. Source interface is DeclHandler. Supported by: SAX 2
internalEntityDecl(name, value)    Reports a parsed internal entity declaration. Source interface is DeclHandler. Supported by: SAX 2
comment(char[] ch, start, length)    This event is triggered when the parser encounters a comment anywhere in the document. Source interface is LexicalHandler. Supported by: SAX 2
startDTD(name, publicId, systemId)    This event is triggered when the parser encounters the start of a DTD declaration. Source interface is LexicalHandler.
endDTD()    This event is triggered when the parser encounters the end of a DTD declaration. Source interface is LexicalHandler.
startCDATA()    This event is triggered when the parser encounters the start of a CDATA section. Source interface is LexicalHandler.
endCDATA()    This event is triggered when the parser encounters the end of a CDATA section. Source interface is LexicalHandler.
startEntity(name)    This event is triggered when the parser encounters the beginning of internal or external XML entities. Source interface is LexicalHandler. Supported by: SAX 2
endEntity(name)    This event is triggered when the parser encounters the end of internal or external XML entities. Source interface is LexicalHandler. Supported by: SAX 2
getExternalSubset(name, baseURI)    Returns an external subset for documents without a valid DOCTYPE declaration. Source interface is EntityResolver2.
resolveEntity(publicId, systemId)    Allows applications to map an external entity by URI. Source interface is EntityResolver2. Supported by: SAX 2
resolveEntity(name, publicId, baseURI, systemId)    Allows applications to map external entities to XML document InputSources, or map an external entity by URI. Source interface is EntityResolver2. Supported by: SAX 2
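Because DefaultHandler2 implements the DeclHandler and LexicalHandler interfaces with empty methods, a subclass only needs to override the events it cares about. The Java sketch below is one possible arrangement; the names DefaultHandler2Demo, RecordingHandler, and record are made up for this example. Note that the same handler object still has to be registered under the lexical-handler and declaration-handler properties, since a parser does not discover the extension interfaces on its own.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class DefaultHandler2Demo {
    // Overrides two DeclHandler events and one LexicalHandler event; the rest stay no-ops.
    static class RecordingHandler extends DefaultHandler2 {
        final List<String> events = new ArrayList<>();

        @Override
        public void elementDecl(String name, String model) {
            events.add("elementDecl " + name + " " + model);
        }

        @Override
        public void internalEntityDecl(String name, String value) {
            events.add("internalEntityDecl " + name + "=" + value);
        }

        @Override
        public void comment(char[] ch, int start, int length) {
            events.add("comment " + new String(ch, start, length).trim());
        }
    }

    static List<String> record(String xml) throws Exception {
        RecordingHandler handler = new RecordingHandler();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        // One object serves as content, lexical, and declaration handler.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.setProperty("http://xml.org/sax/properties/declaration-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE note [<!ELEMENT note (#PCDATA)><!ENTITY who 'world'>]>"
                   + "<note><!-- hi -->&who;</note>";
        record(xml).forEach(System.out::println);
    }
}
```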
Locator2Impl
Locator2Impl is the implementation class for the Locator2 SAX extension interface. Locator2 extends the Locator interface to return the encoding and the XML version for an XML document. Table 6-35 describes the methods.
Table 6-35
Locator2Impl Class Methods
Method Name    Description
getEncoding()    Returns the type of character encoding for the entity.
Table 6-36
IMXAttributes Interface Methods
Method Name    Description
addAttribute(URI, LocalName, QName, Type, Value)    Adds an attribute to the end of an attribute list. Supported by: MSXML
addAttributeFromIndex(attributes, index)    Adds the attribute specified by an index value to the end of an attribute list. Attribute indexes start with 0. Supported by: MSXML
clear    Clears the attribute list. Attribute indexes start with 0. Supported by: MSXML
removeAttribute(index)    Removes an attribute from the attribute list. Attribute indexes start with 0. Supported by: MSXML
setAttribute(index, URI, localName, QName, type, value)    Sets an attribute in the list. Attribute indexes start with 0.
setLocalName(index, localName)    Sets the local name of a specified attribute. Attribute indexes start with 0. Supported by: MSXML
setQName(index, QName)    Sets the qualified name (QName) of a specified attribute. Attribute indexes start with 0. Supported by: MSXML
setType(index, type)    Sets the type of a specified attribute. Attribute indexes start with 0. Supported by: MSXML
setURI(index, URI)    Sets the Namespace URI of a specified attribute. Attribute indexes start with 0. Supported by: MSXML
setValue(index, value)    Sets the value of a specified attribute. Attribute indexes start with 0. Supported by: MSXML
IMXSchemaDeclHandler
The MSXML IMXSchemaDeclHandler extension interface provides schema information about an element being parsed, including attributes. Table 6-37 describes the methods.
Table 6-37
IMXSchemaDeclHandler Interface Methods
Method Name    Description
schemaElementDecl    Declares a schema for validation of an element. Assists in MSXML SAX validation when parsing. Supported by: MSXML
IMXWriter
IMXWriter writes parsed XML output to:
✦ An IStream object: A stream object representing a sequence of bytes that can be forwarded to another object such as a file or a screen.
✦ A string (remember, all XML documents are technically strings).
✦ A DOMDocument object: Can be passed to the MSXML DOM parser for further processing. For example, a new XML document could be parsed using SAX for speed, then sent to the DOM parser for DTD validation.
The encoding and version properties of IMXWriter are similar to the getXMLVersion() and getEncoding() methods of the SAX API Locator2 extension interface. Also, one piece of trivia: Note that this is the only SAX interface that has more properties than methods.
Table 6-38 describes the properties.
Table 6-38
IMXWriter Interface Properties
Property Name    Description
byteOrderMark (boolean)    Controls the writing of the Byte Order Mark (BOM) for encoding, according to XML 1.0.
disableOutputEscaping (boolean)    Sets the flag for the disable-output-escaping attribute of the <xsl:text> and <xsl:value-of> elements. If True, entity reference symbols and other non-XML data are passed without entity resolution. Supported by: MSXML
encoding (string)    Sets and gets XML document encoding for the written output. Supported by: MSXML
indent (boolean)    Sets indentation in the output. Supported by: MSXML
omitXMLDeclaration (boolean)    If true, the output will not include the XML declaration. Supported by: MSXML
output (variant)    Sets the destination and the type of IMXWriter output. Supported by: MSXML
standalone (boolean)    Sets the XML declaration standalone attribute to “yes” or “no.”
IMXWriter Interface Methods
Method Name    Description
flush()    Flushes the object’s internal buffer to its destination (not for DOMDocument output).
In this chapter, I provided a deep dive into the details of the Simple API for XML (SAX):
✦ A history of SAX
✦ SAX versions and evolution
✦ Understanding differences in W3C and MSXML SAX parser implementations
✦ SAX interfaces, extension interfaces, and helper classes
✦ SAX interface event callback methods
✦ SAX helper classes for implementing SAX 1 to SAX 2 compatibility
✦ Properties and methods for W3C and MSXML SAX interfaces
In the next chapter, we move on to something completely different: Extensible Stylesheet Language Transformations. The chapters will follow the same format as the parsing chapters. Chapter 7 is an introduction to XSL and XSLT, while Chapter 8 provides more information on implementing XSLT and includes working examples.
XSLT Concepts
Chapters 1, 2, and 3 showed you what XML was all about, how to develop XML documents, and how to make sure that XML document structures are enforced using data validation. Chapters 4, 5, and 6 showed you some of the things you can do with XML documents, namely parsing them for conversion to other types of data.
This chapter will discuss the syntax, structure, and theory of Extensible Stylesheet Language (XSL) and XSL Transformations (XSLT), with some basic examples for illustration.
Chapter 8 will show you XML and XSLT in real-world examples and tips for writing XSL stylesheets for XML documents.
Chapter 9 will extend those examples to show you how to use XSL: Formatting Objects (XSL:FO) with XML documents.
All of the XML document and stylesheet examples contained in this chapter can be downloaded from the xmlprogrammersbible.com Website, in the Downloads section.
Introducing the XSL Transformation Recommendation
XSL stands for Extensible Stylesheet Language. The XSL Transformations (XSLT) Recommendation describes the process of applying an XSL stylesheet to an XML document using a transformation engine, and also specifies the XSL language covered in this chapter. XSLT is based on DSSSL (Document Style Semantics and Specification Language), which was originally developed to define SGML document output formatting. XSLT 1.0 became a W3C Recommendation in 1999, and the full specification is available for review at http://www.w3.org/TR/xslt.
The XSLT Recommendation should not be confused with the very confusingly named Extensible Stylesheet Language (XSL) Version 1.0 Recommendation, which achieved W3C Recommendation status on 15 October 2001. This recommendation has more to do with XSL: Formatting Objects (XSL:FO) than XSL Transformations (XSLT). You can view the Extensible Stylesheet Language (XSL) Version 1.0 Recommendation at http://www.w3.org/TR/xsl/. Chapter 9 covers XSL: Formatting Objects, including most of the W3C Extensible Stylesheet Language 1.0 Recommendation.
Another W3C Recommendation that affects XSLT is the XML Path Language (XPath). XPath is a tree-based representation model of an XML document that is used in XSLT to describe elements, attributes, text data, and relative positions in an XML document. The full recommendation document can be seen at http://www.w3.org/TR/xpath.
Versions 2.0 of XSLT and XPath are currently in the Recommendation process, and are expected to become W3C Recommendations sometime in late 2003. The current documents and their status can be reviewed at http://www.w3.org/TR/xslt20req and http://www.w3.org/TR/xpath20req.
Stylesheet structure and syntax are defined in the W3C XSLT Recommendation document, and transformation engines are based on these definitions. Transformation engines support a variety of programming languages, usually based on the language that they are developed in. At time of writing, there is no comprehensive list of XSLT engines available, but the Open Directory Project provides a good overview at http://dmoz.org/Computers/Data_Formats/Markup_Languages/XML/Style_Sheets/XSL/Implementations/. Despite a multitude of XSLT engines supporting a multitude of languages, mainstream XSLT engines are split into two platform camps: Java and Microsoft.
One of the first Java transformation engines was the LotusXSL engine, which IBM donated to the Apache Software Foundation, where it became the Xalan transformation engine. Since then, Apache has developed Xalan Version 2, which implements a pluggable interface into Xalan 1 and 2, as well as integrated SAX and DOM parsers. Both of the Java versions of Xalan implement the W3C XSLT and XPath Recommendations. You can find more information on Xalan at http://xml.apache.org/xalan-j/index.html.
Microsoft support for XML 1.0 and a reduced implementation of the W3C XSLT Recommendation began with MS Internet Explorer 5, which also supported the Document Object Model (DOM), XML Namespaces, and beta support for XML Schemas. XML and XSL functionality was extended in later browser versions and separated from the browser into the MSXML parser, more recently renamed the Microsoft XML Core Services. MSXML is for use in client applications, via Web browsers and Microsoft server products, and is a core component of the .NET platform.
How an XSL Transformation Works
Developers create code that identifies an XML source, an XSL stylesheet, and a transformation output method and destination to a transformation engine, which is usually described as an XSL processor. Instructions from source code to the XSL processor perform a transformation using the predefined components. The XSL processor reads the source XML document and performs a transformation of the XML attributes, elements, and text values based on instructions in the XSL stylesheet.
XSLT stylesheets are well-formed XML documents that conform to W3C standards for syntax. Output format is specified in the XSL document as well, and can be HTML, text, or XML.
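As a minimal sketch of that sequence on the Java side, the following example uses the standard javax.xml.transform API to name a source, a stylesheet, and an output destination, and then asks the processor to run the transformation. The class name TransformDemo, the transform helper, and the tiny greeting document and stylesheet are all invented for illustration; a real application would more likely read files or streams than strings.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TransformDemo {
    // Hypothetical source document and stylesheet, kept as strings so the sketch is self-contained.
    static final String XML = "<greeting to='world'/>";
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>Hello, <xsl:value-of select='greeting/@to'/>!</xsl:template>"
      + "</xsl:stylesheet>";

    // Identifies the XML source, the stylesheet, and the destination, then runs the processor.
    static String transform(String xml, String xsl) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter destination = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(destination));
        return destination.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(XML, XSL));
    }
}
```

The output method here is set inside the stylesheet with the output element, which matches the point above that the output format is specified in the XSL document.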
XSL for attributes and elements
XSL directives and functions combined with XPath functions make up the vocabulary for XSL stylesheet transformations. All of the directives and functions will be explained a little later in this chapter. Before I get into the full list of directives and functions, let’s step through a very basic transformation using very basic source, output, and stylesheet formats. Listing 7-1 shows the very simple XML document that is based on the first XML document examined in Chapter 1. The document has a root element and a few nested elements, a few attributes, and a few text data values.
Listing 7-1: A Very Simple XML Document
✦ href: Must be a valid URI.
✦ title: Used for distinguishing between more than one xml-stylesheet processing instruction in the same XML document.
✦ media: A list of values as defined in the W3C HTML Recommendation Version 4.0 and higher. Used in addition to or instead of the title attribute.
✦ charset: Used to specify a separate encoding for a stylesheet. For example, the XML document may be UTF-8, and the XSL stylesheet could be ISO-8859-1. Theoretically, the XSLT processor should know how to handle the charset differences.
✦ alternate: For use when more than one xml-stylesheet processing instruction is in the same XML document. If the attribute value is no, the stylesheet should be used first. All other stylesheets should have an alternate attribute value of yes.
There are three ways that transformations happen:
✦ Referencing the XSL explicitly: As illustrated in the reference code earlier, and in Listing 7-1, a reference to a stylesheet can be explicitly declared using the xml-stylesheet processing instruction. This is useful when automatic client-side XSLT transformations are necessary and the client software, usually a Web browser, is W3C XSLT compliant. Explicit referencing is most commonly used for separation of data in XML documents from display characteristics in XSL stylesheets. The XML is usually transformed to HTML on a server or in a browser client before the HTML is displayed to a user.
✦ Referencing the stylesheet programmatically: Programs can declare the XML source, the XSL stylesheet, and the output destination, then invoke an XSLT processor to perform the transformation. This is the technique used on servers to separate XML document data from XSL stylesheet HTML display characteristics in XML-based Websites, where one stylesheet controls the display of many XML documents. It is also the way that most XML-to-XML and XML-to-text transformations occur in XML applications.
✦ Embedding XML into an XSL stylesheet: XML data can also be embedded into an XSL document. This is not recommended for the same reasons that embedded DTDs are not recommended. This is only mentioned here in case a developer comes across this technique in a legacy system. Embedded stylesheets represent a maintenance nightmare if the transformation or the source data should ever need to be altered, and defeat the purpose of transformations. In most cases, the transformed document can be substituted for the XML data and stylesheet combination document.
Next is the remainder of the XML document, which consists of a single rootelement element:
Nested under the “firstelement” element is the level1 element, which contains an attribute called children. The element name is used to describe the nesting level in the XML document, and the attribute is used to describe how many more levels of nesting are contained under the level1 element, in this case, no more nested levels (0). The phrase “This is level 1 of the nested elements” represents a textual data value for the level1 element that the text is nested in.
The secondelement element is a variation of the firstelement element. Let’s compare the firstelement and secondelement elements to get a better sense of the structure of the document:
Last but not least, to finish the XML document, the rootelement tag is closed:
</rootelement>
Listing 7-2 shows a stylesheet that transforms attributes in Listing 7-1 to elements by matching a pattern and applying a template to items in the source XML document that transforms them into a new format in the destination XML document.
Listing 7-2: A Very Simple XSL Stylesheet
The XSL stylesheet starts with an optional XML declaration and an attribute that sets the encoding style for the XSL stylesheet. Encoding style for the transformation output is handled separately:
The transform element can be used in place of stylesheet as the root element, but only if there is a good reason for not using stylesheet. The XSLT 1.0 Recommendation lists the version attribute as required on the root element, whether it is named stylesheet or transform, so the safest course is to always include version="1.0". Stating the version explicitly also keeps stylesheets unambiguous once XSLT 2.0 becomes an official W3C Recommendation.
There is one other Namespace declaration that developers may see in legacy applications and older stylesheets:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
This Namespace declaration was used in older stylesheets to maintain compatibility with Microsoft IE 5.0 browsers, which supported an older working draft of the W3C Recommendation. This Namespace should not be used unless compatibility with 5.0 browsers needs to be maintained.
XSLT Elements
The stylesheet element is used to specify the root element of W3C stylesheets.
XSLT vocabularies are mostly made up of elements that describe template instructions or types of data that XSLT processors use during transformations. Table 7-1 describes the full listing of XSL elements available to stylesheet developers.
Table 7-1
W3C XSLT Elements
Element Description
stylesheet    Defines a root element of a stylesheet. Can be used interchangeably with transform, but most stylesheets use stylesheet as a de facto standard.
transform    Defines a root element of a stylesheet. Should only be used to replace stylesheet as the root element of a stylesheet, and only if there is a good reason not to use stylesheet.
output    Defines the format of the output document. The html, xml, and text output methods are predefined. If the output method is xml, output is well-formed XML; html formats the output as HTML; and text is any character data, including RTF and PDF files. If no output method is specified, the XSLT processor usually checks whether the output is HTML, based on the nodes of the output document tree, and defaults to xml if no other determination can be made. Must be a child of the stylesheet element. Several optional attributes can also be used to define the output version and the encoding type, include or omit an XML declaration, define the standalone attribute, define a doctype, support output document indentation, and indicate a media type.
namespace-alias    Replaces a source document Namespace with a new Namespace in the output node tree. Must be a child of the stylesheet element.
preserve-space    Defines whitespace preservation for elements. Must be a child of the stylesheet element.
strip-space    Defines whitespace removal for elements. Must be a child of the stylesheet element.
key    Adds key values to each node in the result of an XPath expression. Must be defined as a child of the stylesheet element. For use with the key function in XPath expressions (functions are defined in Table 7-4).
import    Imports an external stylesheet into the current stylesheet. If there are conflicts between the current stylesheet and the imported stylesheet, the current stylesheet takes precedence. Must be defined as a child of the stylesheet element.
apply-imports    Follows the apply-templates rules but overrides a stylesheet template with the template from an imported stylesheet. Normally, the current stylesheet takes precedence over the imported stylesheet.
include    Includes an external stylesheet in the current stylesheet. If there are conflicts between the current stylesheet and the included stylesheet, it’s up to the XSLT processor to decide precedence. Must be defined as a child of the stylesheet element.
template    Applies rules in a match or select action. Optional attributes can be used for specifying a node-set by match, a template name, a processing priority for this template in case of conflicts in the stylesheet, and an optional QName for a subset of nodes in a node-set.
apply-templates    Applies templates to all children of the current node, or a specified node-set using the optional select attribute. Parameters can be passed using the with-param element.
call-template    Calls a template by name. Parameters can be passed using the with-param element. Results can be assigned to a variable.
param    Defines a parameter and a default value in a stylesheet template. A global parameter can be defined as a child of the stylesheet element.
with-param    Passes a parameter value to a template when call-template or apply-templates is used.
variable    Defines a variable in a template or a stylesheet. A global variable can be defined as a child of the stylesheet element.
copy    Copies the current node and any related Namespace only. Output matches the current node (element, attribute, text, processing instruction, comment, or Namespace).
copy-of    Copies the current node, Namespaces, descendant nodes, and attributes. Scope can be controlled with a select attribute.
if    Conditionally applies a template if the test attribute expression evaluates to true.
choose    Makes a choice based on multiple options. Used with when and otherwise.
when    An action for choose elements.
otherwise    A default action for choose elements. Must be the last child of a choose element.
for-each    Iteratively processes each node in a node-set defined by an XPath expression.
sort    Defines a sort key used by apply-templates to order a node-set and by for-each to specify the order of iterative processing of a node-set.
element    Adds an element to the output node tree. Names, Namespaces, and attributes can be added with the name, namespace, and use-attribute-sets attributes.
attribute    Adds an attribute to the output node tree. Must be a child of an element.
attribute-set    Defines a named list of attributes that can be added to the output node tree through use-attribute-sets. Must be a child of the stylesheet element.
text    Adds text to the output node tree.
value-of    Retrieves the string value of a node and writes it to the output node tree.
decimal-format    Specifies the format of numeric characters and symbols when converting to strings. Used with the format-number function only, not with the number element. (Functions are defined in Table 7-4.)
number    Adds a sequential number to the nodes of a node-set, based on the value attribute. Can also define the number format for the current node in the output node tree.
fallback    Defines alternatives for instructions that the current XSL processor does not support.
message    Adds a message to the output node tree. This element can also optionally stop processing on a stylesheet with the terminate attribute. Mostly used by developers for debugging stylesheets and XSLT processors.
processing-instruction    Adds a processing instruction to the output node tree.
comment    Adds a comment to the output node tree.
All of the elements in Table 7-1 should be prefixed by xsl:, for example xsl:template.
The other XSLT 1.0 output options are text or HTML, or a valid prefixed QName that can be resolved into a URI. For more complete documentation on this element, please refer to the XSLT element listings in Table 7-1.
Next, the stylesheet goes hunting for all the attributes in the XML document using the template element and the match attribute:
<xsl:template match="@*">
The match attribute is available with the template and key elements, and is used to match the pattern specified by the match attribute value. When an XSLT processor is invoked, the source XML document is parsed into a set of nodes in a tree, starting with the root element in the document. XSLT uses pattern matching to look through the document node tree and retrieve nodes that match the patterns specified. The @* attribute value is an XPath pattern: @ selects attribute nodes and * is a wildcard that matches any name, so the template matches every attribute in the source XML document.
XSL and XPath
The match attribute is one of several XSLT pattern-matching attributes that are used to find nodes in an XML source document. The match attribute is used to match a pattern in an XML document, for example, to detect the root element, or an attribute in the second element under the root element. Pattern matching is facilitated through XPath expressions, which express the parsed nodes of an XML document in tree hierarchy references. XPath follows a syntax that closely mirrors file system paths but in the context of an XML document. XPath tree representations break XML documents down into a series of connected root, element, text, attribute, Namespace, processing instruction, and comment nodes.
Imagine that the XSLT processor parses a document and places each of the elements in the document into a directory on a file system, defining attributes, Namespaces, and text data in each directory with special identifiers. The new file system starts with the root directory (/), and each descendant element can be found in a subdirectory under the root. XPath doesn’t work exactly like this, but on the surface it appears to, and the directory metaphor is a good point of reference for starting to understand how XPath really does work. Table 7-2 shows the basic location operators for XPath expressions.
Table 7-2
XPath Location Operators
Operator Description
The location operators are actually abbreviations of commonly used XPath node axes. Node axes are expressions that relate to the current node and radiate out from that node in different directions, to locate parents, ancestors, children, descendants, and siblings, in relation to the current node. Table 7-3 lists and describes the XPath node axes.
Table 7-3
XPath Node Axes
Axis  Description
ancestor  Ancestors of the current node, excluding the current node.
ancestor-or-self  The current node and all of its ancestors.
attribute  The attributes of the current node.
child  Children of the current node.
descendant  Descendants of the current node, excluding the current node.
descendant-or-self  The current node and all of its descendants.
following  All nodes that come after the current node in document order, excluding the current node’s descendants and any attribute and Namespace nodes.
following-sibling  All sibling nodes that come after the current node in document order. Descendants of those siblings are not included.
namespace  All Namespace nodes of the current node.
parent  The parent of the current node.
preceding  All nodes that come before the current node in document order, excluding the current node’s ancestors and any attribute and Namespace nodes.
preceding-sibling  All sibling nodes that come before the current node in document order. Descendants of those siblings are not included.
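The relationship between the abbreviated location operators and the full axis syntax can be tried out with the XPath API that ships with Java (javax.xml.xpath). The following is only a minimal sketch: the AxisDemo class, its eval() helper, and the three-quote sample document are assumptions made for illustration and are not part of the book's sample code.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Compares abbreviated XPath location operators with the full axis syntax. */
class AxisDemo {

    // Hypothetical sample document, loosely modeled on a list of quotes.
    static final String XML =
          "<quotedoc><quotelist author='Shakespeare, William'>"
        + "<quote source='Macbeth'>first</quote>"
        + "<quote source='Hamlet'>second</quote>"
        + "<quote source='Othello'>third</quote>"
        + "</quotelist></quotedoc>";

    /** Evaluates an XPath 1.0 expression against XML and returns its string value. */
    static String eval(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // "@author" is shorthand for "attribute::author"; a bare name uses the child axis.
        System.out.println(eval("/quotedoc/quotelist/@author"));
        System.out.println(eval("/child::quotedoc/child::quotelist/attribute::author"));

        // ".." is shorthand for "parent::node()".
        System.out.println(eval("name(//quote[1]/..)"));
        System.out.println(eval("name(//quote[1]/parent::node())"));

        // The sibling axes contain only siblings of the current node.
        System.out.println(eval("count(//quote[2]/following-sibling::quote)"));
        System.out.println(eval("count(//quote[2]/preceding-sibling::quote)"));

        // ancestor excludes the current node; ancestor-or-self includes it.
        System.out.println(eval("count(//quote[3]/ancestor::*)"));
        System.out.println(eval("count(//quote[3]/ancestor-or-self::*)"));
    }
}
```

Each pair of calls prints the same value twice, which shows that the abbreviated and unabbreviated forms select the same nodes.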
XPath axes, attributes, and namespaces
XPath axes treat attributes and Namespaces differently than they treat elements, text values, processing instructions, and comments, depending on the axis and the current node. This is because attributes and Namespaces in the document are not part of the hierarchy of elements, text values, processing instructions, and comments, but are located separately in the node tree.
✦ Attributes are only available from element nodes, not from other attribute and Namespace nodes.
✦ The child, descendant, following, following-sibling, preceding, and preceding-sibling axes never contain attributes or Namespaces. The child, descendant, following-sibling, and preceding-sibling axes are empty if the current node is an attribute or a Namespace node.
✦ Attributes of the current node can be accessed using the attribute axis or the attribute identifier (@), as long as the current node is an element node.
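The rules in the list above can be checked with a few XPath expressions. This is a hedged sketch using the standard javax.xml.xpath API; the AttributeAxisDemo class, its eval() helper, and the one-element sample document are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Shows how attributes sit outside the child and sibling axes. */
class AttributeAxisDemo {

    // Hypothetical sample: one element with two attributes and one text child.
    static final String XML =
        "<quote source='Macbeth' author='Shakespeare, William'>Out, out, brief candle!</quote>";

    /** Evaluates an XPath 1.0 expression against XML and returns its string value. */
    static String eval(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Attributes are reached through the attribute axis (or @), not the child axis.
        System.out.println("attributes: " + eval("count(/quote/@*)"));
        System.out.println("children:   " + eval("count(/quote/child::node())"));

        // With an attribute as the current node, the child and sibling axes are empty.
        System.out.println("attr children: " + eval("count(/quote/@source/child::node())"));
        System.out.println("attr siblings: " + eval("count(/quote/@source/following-sibling::node())"));

        // The parent axis still leads from the attribute back to its element.
        System.out.println("attr parent:   " + eval("name(/quote/@source/..)"));
    }
}
```

The element reports two attributes but only one child (its text node), and the attribute node can still reach its owning element through the parent axis.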
The next few lines in our example stylesheet create a new element based on the name of the current node in the XML document tree. The current node is set to an attribute in the XML document, based on the previous line in the XSL stylesheet (xsl:template match="@*"). However, XPath has limitations on what can be accessed if the current node is an attribute or Namespace. To get around this limitation, the XPath name() function is used to pass the name of the current attribute node to the new element declaration. The XPath location operator representing the self node (.) is used to pass the value of the attribute into the value of the new element, using the select attribute of the value-of element. The new element is then finished with the element closing tag, and the template is finished with the template closing tag:
<xsl:element name="{name()}">
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
The name() function is one of many functions that can be used in stylesheets. Unlike other types of XML, XPath supports five types of data, even though the data itself remains text:
✦ boolean objects: True or false values.
✦ numbers: Any numeric value.
✦ string: Any string.
✦ node-set: A set of nodes selected by an XPath expression or series of expressions.
✦ external object: A set of nodes returned by an XSLT extension function other than an XPath or XSLT expression. Support for external objects depends on the XSLT processor support for extensions.
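The first four data types in the list above map directly onto the return types of the standard Java XPath API, which makes them easy to inspect. The sketch below assumes a small hypothetical document and uses made-up helper names (XPathTypesDemo, asBoolean(), asNumber(), asString(), nodeCount()); external objects are processor-specific and are not shown.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Evaluates XPath expressions as boolean, number, string, and node-set values. */
class XPathTypesDemo {

    // Hypothetical sample document.
    static final String XML =
          "<quotelist quotes='2'>"
        + "<quote source='Macbeth'>first</quote>"
        + "<quote source='Hamlet'>second</quote>"
        + "</quotelist>";

    /** Parses XML into a DOM tree and evaluates the expression as the requested type. */
    static Object eval(String expression, QName returnType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc, returnType);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    static boolean asBoolean(String expression) {
        return (Boolean) eval(expression, XPathConstants.BOOLEAN);
    }

    static double asNumber(String expression) {
        return (Double) eval(expression, XPathConstants.NUMBER);
    }

    static String asString(String expression) {
        return (String) eval(expression, XPathConstants.STRING);
    }

    static int nodeCount(String expression) {
        return ((NodeList) eval(expression, XPathConstants.NODESET)).getLength();
    }

    public static void main(String[] args) {
        System.out.println("boolean:  " + asBoolean("count(//quote) = /quotelist/@quotes"));
        System.out.println("number:   " + asNumber("count(//quote)"));
        System.out.println("string:   " + asString("//quote[1]/@source"));
        System.out.println("node-set: " + nodeCount("//quote/@source") + " nodes");
    }
}
```

Note that the same underlying text (the quotes attribute, for example) is converted to a number or a Boolean only when an expression asks for it.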
There are also several functions related to each data type that can be used in XSL stylesheets. Table 7-4 describes the functions supported for each data type.
Table 7-4
Functions by Data Type
Function  Description
Boolean Functions
boolean()  Converts an expression to the Boolean data type value and returns true or false.
true()  Boolean true.
false()  Boolean false.
not()  Reverses Boolean true or false: not(true expression) = false, not(false expression) = true.
Number Functions
number()  Converts an expression to a numeric data type value.
round()  Rounds a value up or down to the nearest integer: round(98.49) = 98, round(98.5) = 99.
floor()  Rounds a value down to the nearest integer: floor(98.9) = 98.
ceiling()  Rounds a value up to the nearest integer: ceiling(98.4) = 99.
sum()  Sums the numeric values in a node-set.
count()  Counts the nodes in a node-set.
String Functions
string()  Converts an expression to a string data type value.
format-number()  Converts a numeric expression to a string data type value, using the decimal-format element values as a guide if the decimal-format element is present in a stylesheet.
concat()  Converts two or more expressions to a concatenated string data type value.
string-length()  Counts the characters in a string data type value.
contains()  Checks for a substring in a string. Returns Boolean true or false.
starts-with()  Checks for a substring at the beginning of a string. Returns Boolean true or false.
translate()  Replaces individual characters in a specified string data type value. Each character listed in the second argument is replaced by the character at the same position in the third argument.
substring()  Retrieves a substring in a specified string data type value, starting at a numeric character position and optionally ending at a specified numeric length after the starting point.
substring-after()  Retrieves all characters in a specified string data type value that occur after the first occurrence of a specified substring.
substring-before()  Retrieves all characters in a specified string data type value that occur before the first occurrence of a specified substring.
normalize-space()  Replaces any tab, newline, and carriage return characters in a string data type value with spaces, collapses runs of spaces into a single space, then removes any leading or trailing spaces from the new string.
Node Set Functions
current()  The current node in a single-node node-set.
position()  The position of the current node in a node-set.
key()  A node-set defined by the key element.
name()  The name of the selected node.
local-name()  The name of a node without a prefix, if a prefix exists.
namespace-uri()  The full URI of a node prefix, if a prefix exists.
unparsed-entity-uri()  The URI of an unparsed entity via a reference to the source document DTD, based on the entity name.
id()  A node-set with nodes that match the id value.
generate-id()  A unique string for a selected node in a node-set. The syntax follows well-formed XML rules.
lang()  A Boolean true or false, depending on whether the xml:lang attribute for the selected node matches the language identifier provided in an argument.
last()  The position of the last node in a node-set.
document()  Builds a node tree from an external XML document when provided with a valid document URI.
External Object Functions (Note: These functions may also apply to other data types.)
system-property()  Returns information about the processing environment. Useful when building multi-version and multi-platform stylesheets in conjunction with the fallback element.
element-available()  A Boolean true or false, based on whether an XSLT instruction or extension element is supported by the XSLT processor.
function-available()  A Boolean true or false, based on whether a function is supported by the XSLT processor.
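Several of the functions in Table 7-4 belong to core XPath 1.0, so they can be tried outside a stylesheet with the standard Java XPath API. The sketch below is illustrative only: the XPathFunctionsDemo class and its eval() helper are assumed names, and the placeholder document exists only to give the expressions a context. Functions that XSLT adds on top of XPath, such as format-number(), current(), key(), and document(), are not available through this API and are left out.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Tries a few core XPath 1.0 functions from Table 7-4. */
class XPathFunctionsDemo {

    /** Evaluates an expression against a placeholder document and returns its string value. */
    static String eval(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader("<empty/>")));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            // Number functions
            "round(98.49)", "round(98.5)", "floor(98.9)", "ceiling(98.4)",
            // Boolean functions
            "not(true())", "contains('Macbeth', 'bet')",
            // String functions: character positions start at 1
            "substring('Shakespeare', 1, 5)",
            "substring-after('Shakespeare, William', ', ')",
            "substring-before('Shakespeare, William', ',')",
            // translate() maps characters one by one; it does not replace substrings
            "translate('to-morrow', '-', ' ')",
            "normalize-space('  Out,   out  ')",
            "string-length('Macbeth')"
        };
        for (String expression : expressions) {
            System.out.println(expression + " = " + eval(expression));
        }
    }
}
```

Running the class prints each expression next to its result, which is a quick way to confirm how a function behaves before using it in a stylesheet.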
The next segment of the sample stylesheet uses the wildcard (*) to create a template that matches every element in the document. The copy element is used to copy the current node of the XML document, and the select attribute of the apply-templates element applies the predefined templates related to the attribute match (@*) and the current template match (*) while copying. After that, the XSL stylesheet is closed by the stylesheet closing tag.
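To see the two templates working together, the stylesheet can be run with the XSLT processor bundled with Java (javax.xml.transform). What follows is a sketch under stated assumptions rather than the book's listing: the AttributesToElements class, the embedded stylesheet text, and in particular the select="@*|node()" value and the omit-xml-declaration setting are guesses made so that the example is self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Runs a stylesheet that turns every attribute into a child element. */
class AttributesToElements {

    // Assumed reconstruction of the stylesheet described in the text.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        // Each attribute becomes an element named after the attribute.
        + "<xsl:template match='@*'>"
        + "<xsl:element name='{name()}'><xsl:value-of select='.'/></xsl:element>"
        + "</xsl:template>"
        // Each element is copied, then its attributes and children are processed.
        + "<xsl:template match='*'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies STYLESHEET to the given XML text and returns the serialized result. */
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<quote source='Macbeth'>Out, out, brief candle!</quote>"));
    }
}
```

Because xsl:copy copies an element without its attributes, the source attribute only reaches the output through the @* template, as a source child element.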
Listing 7-3: The transformation output document
XSLT Extensions with EXSLT.org
As mentioned earlier in this chapter, the W3C XSLT stylesheet Recommendation will probably be updated from Version 1.0 to Version 2.0 in late 2003. In the meantime, the 1999 1.0 Recommendation has been showing its age. The 1.0 specification does, however, leave room for extensions to existing stylesheet structure and syntax via the external-object data type and the extension-element-prefixes attribute in the stylesheet and transform elements, and the element-available and function-available functions. Many XSLT processors now support external extensions, and a good source of extensions can be found at EXSLT.org. Most extensions take the form of code that acts as add-in modules to existing XSLT processors and support functions that can be used as if they were part of the W3C Recommendation, once the modules are installed. EXSLT.org provides several free-distribution modules, plus setup instructions and function documentation. Developers are also welcome to contribute to the group with their own extensions.
In this chapter, I provided an introduction to XSL and a theoretical overview of XSLT; XSL stylesheet elements, structure, and syntax; XPath axes, functions, and data types; and a few XSLT-specific functions.
✦ All about EXSLT.org
In the next chapter, you’ll put all the lessons you have learned so far about XSL Transformations to use with examples that transform XML to text and HTML. We’ll also cover changing the format of XML documents using transformation.
XSL Transformations
In the last chapter, you were introduced to the theory of XSLT, XSL stylesheets, and XPath expressions. In this chapter, you’ll apply that theory to real-world examples that will show you how to use XSLT elements, functions, and XPath expressions to transform XML documents to other formats of XML, text, and HTML. The next chapter will extend the HTML examples in this chapter even further by using XSL:FO in our transformations.
All of the XML document and stylesheet examples contained in this chapter can be downloaded from the xmlprogrammingbible.com Website, in the Downloads section.
To Begin
All of the examples in this chapter use the same source XML file, which is the sample XML document I have used in previous chapters. This example starts with a list of selected quotes from William Shakespeare, then goes on to list three books that contain the quotes that are available for purchase from Amazon.com, and a Spanish translation of Macbeth, Romeo and Juliet, Hamlet, and other volumes that are available from http://www.elcorteingles.es. Amazon.com provides a service that returns XML documents based on a URL query, and the Amazon element is based on this format.
The elcorteingles.es book listing format and the quote listing, as well as other parts of the document, are used to illustrate several features of XSLT stylesheet transformations.
I convert the source document into XML, delimited text, and HTML to show you some advanced XSLT tips and tricks.
Listing 8-1 shows the XML document, named AmazonMacbethSpanish.xml, which I will refer back to in the next few examples.
Listing 8-1: The Contents of AmazonMacbethSpanish.xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<quotedoc>
<quotelist author="Shakespeare, William" quotes="4">
<quote source="Macbeth" author="Shakespeare, William">When the hurlyburly's done, / When the battle's lost and won.</quote>
<quote source="Macbeth" author="Shakespeare, William">Out, damned spot! out, I say! One; two; why, then 'tis time to do't ; Hell is murky! Fie, my lord, fie! a soldier, and afeard? What need we fear who knows it, when none can call our power to account? Yet who would have thought the old man to have had so much blood in him?</quote>
<quote source="Macbeth" author="Shakespeare, William">Is this a dagger which I see before me, the handle toward my hand? Come, let me clutch thee: I have thee not, and yet I see thee still. Art thou not, fatal vision, sensible to feeling as to sight? or art thou but a dagger of the mind, a false creation, proceeding from the heat-oppressed brain?</quote>
<quote source="Macbeth" author="Shakespeare, William">To-morrow, and to-morrow, and to-morrow, creeps in this petty pace from day to day, to the last syllable of recorded time; and all our yesterdays have lighted fools the way to dusty death. Out, out, brief candle! Life's but a walking shadow; a poor player, that struts and frets his hour upon the stage, and then is heard no more: it is a tale told by an idiot, full of sound and fury, signifying nothing.</quote>
<tagged_url>http://www.amazon.com:80/exec/obidos/redirect?tag=associateid&benztechnonogies=9441
A simple technique using xsl:copy-of
One of the simplest ways to start using XSL is to use the xsl:copy-of element to create a new XML document using a subset of a larger XML document. Listing 8-2 shows the contents of the XMLtoQuotes.xsl stylesheet. This stylesheet creates a new XML document containing just the quotes from the sample XML document in Listing 8-1.
Listing 8-2: The Code for the XMLtoQuotes.xsl Stylesheet
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
Walking through the transformation, I declare the XSL stylesheet as an XML document, and then declare an xsl: Namespace for the XSL elements in the stylesheet. Next, I specify the output method for the stylesheet as xml, and also specify the encoding for the output as ISO-8859-1, the same as the origin document. Note that the output encoding differs from the stylesheet encoding. This is a good illustration of the fact that the source XML document, the XSL stylesheet, and the transformation output can all use different encodings if needed. However, it’s worth pointing out that XSLT processors are only required to support UTF-8 and UTF-16 output encodings, so support for other encodings varies from processor to processor. I also set the indent attribute to "yes". The indent attribute is one of the optional and vague attributes that must be recognized but do not necessarily need to be supported in an XSLT processor. If the indent attribute is set to "yes", the XSLT processor is supposed to perform rudimentary formatting on the XSLT output.
<?xml version="1.0" encoding= <?xml version="1.0"
quotelist element in the source document, which is a child of the quotedoc root element, using the select attribute of the apply-templates element (select="/quotedoc/quotelist/*"):
Stylesheet
<xsl:template match="/">
<transformedquotes>
<xsl:apply-templates select="/quotedoc/quotelist/*">
Output XML Document Result
<transformedquotes>
The only other template in the stylesheet is called as a result of the apply-templates element. The template is applied to all XML data in the node-set via the match="*" attribute of the template element. In this case, the node-set contains all the child elements of the /quotedoc/quotelist element. The xsl:copy-of element makes a copy of all the nodes in a node-set without exception, including namespaces, attributes, and so on. The select attribute could limit the copy-of element to a specific scope, for example all of the attributes in the node-set, but in this case the select just passes the whole node-set to the transformation output document by using the XPath current node operator (.):
Stylesheet
<xsl:template match="*">
<xsl:copy-of select="."/>
</xsl:template>
Output XML Document Result
<quote source="Macbeth" author="Shakespeare, William">When the hurlyburly's done, / When the battle's lost and won.</quote>
<quote source="Macbeth" author="Shakespeare, William">Out, damned spot! out, I say! One; two; why, then 'tis time to do't ; Hell is murky! Fie, my lord, fie! a soldier, and afeard? What need we fear who knows it, when none can call our power to account? Yet who would have thought the old man to have had so much blood in him?</quote>
<quote source="Macbeth" author="Shakespeare, William">Is this a dagger which I see before me, the handle toward my hand? Come, let me clutch thee: I have thee not, and yet I see thee still. Art thou not, fatal vision, sensible to feeling as to sight? or art thou but a dagger of the mind, a false creation, proceeding from the heat-oppressed brain?</quote>
<quote source="Macbeth" author="Shakespeare, William">To-morrow, and to-morrow, and to-morrow, creeps in this petty pace from day to day, to the last syllable of recorded time; and all our yesterdays have lighted fools the way to dusty death. Out, out, brief candle! Life's but a walking shadow; a poor player, that struts and frets his hour upon the stage, and then is heard no more: it is a tale told by an idiot, full of sound and fury, signifying nothing.</quote>
Once the template is finished, control is passed back to the template that called the copy-of template, and the hard-coded transformedquotes closing tag is added to the XSLT output. Next, the template and the stylesheet closing tags finish the XSLT process.
Stylesheet
</xsl:apply-templates>
</transformedquotes>
</xsl:template>
</xsl:stylesheet>
Output XML Document Result
</transformedquotes>
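The whole walk-through can be reproduced with the XSLT processor that ships with Java. The sketch below is an illustration, not the book's code: the CopyOfQuotes class, the shortened source document (including its catalog element), and the stylesheet text assembled from the fragments above are assumptions, and the exact whitespace produced by indent="yes" depends on the processor.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Copies only the quote elements of a larger document into a new XML document. */
class CopyOfQuotes {

    // Stylesheet assembled from the fragments in the walk-through (an assumption).
    static final String STYLESHEET =
          "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
        + "<xsl:output method='xml' encoding='ISO-8859-1' indent='yes'/>"
        + "<xsl:template match='/'>"
        + "<transformedquotes>"
        + "<xsl:apply-templates select='/quotedoc/quotelist/*'/>"
        + "</transformedquotes>"
        + "</xsl:template>"
        + "<xsl:template match='*'>"
        + "<xsl:copy-of select='.'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical, shortened stand-in for the sample source document.
    static final String SOURCE =
          "<quotedoc>"
        + "<quotelist author='Shakespeare, William' quotes='2'>"
        + "<quote source='Macbeth'>first</quote>"
        + "<quote source='Hamlet'>second</quote>"
        + "</quotelist>"
        + "<catalog><title>Macbeth</title></catalog>"
        + "</quotedoc>";

    /** Applies STYLESHEET to the given XML text and returns the serialized result. */
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SOURCE));
    }
}
```

Only the children of quotelist are selected by apply-templates, so the catalog element never reaches the output, while copy-of carries each quote across with its attributes and text intact.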
Listing 8-3 shows the final XSLT transformation output in its entirety.
Listing 8-3: The XSLT Output Document