Wrox XSLT programmers reference 2nd edition apr 2001 ISBN 1861005067



XSLT Programmer's Reference, Second Edition

Michael Kay

© 2000 Wrox Press

All rights reserved. No part of this book may be reproduced, stored in a retrieval system or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embodied in critical articles or reviews.

The authors and publisher have made every effort in the preparation of this book to ensure the accuracy of the information. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, Wrox Press nor its dealers or distributors will be held liable for any damages caused or alleged to be caused either directly or indirectly by this book.

First Published: April 2000
Latest Reprint: March 2003

Published by Wrox Press Ltd
Arden House, 1102 Warwick Road, Acock's Green, Birmingham B27 6BH, UK
Printed in Canada

ISBN 1-861005-06-7

Trademark Acknowledgements

Wrox has endeavored to provide trademark information about all the companies and products mentioned in this book by the appropriate use of capitals. However, Wrox cannot guarantee the accuracy of this information.

Credits

Author: Michael Kay
Technical Reviewers: David Carlisle, Robert Chang, Michael Corning, Jim MacIntosh, Craig McQueen, Gary L Peskin, Paul Tchistopolskii, Linda van den Brink, Dmitry E Voytenko, Dan Wahlin
Category Manager: Dave Galloway
Technical Architect: Dianne Parker
Technical Editor: Simon Mackie
Author Agent: Marsha Collins
Project Managers: Avril Corbin, Beckie Stones
Production Manager: Simon Hardware
Production Coordinator


About the Author

Michael Kay has recently joined the systems architecture team at Software AG, working on the standards and interfaces for their XML product line, centred around the Tamino database. He also represents Software AG on the W3C XSL Working Group. Until then, he had spent most of his career as a software designer and systems architect with ICL, the IT services supplier. His background (and Ph.D., from the University of Cambridge) is in database technology. He has worked on the design of network, relational, and object-oriented database software products as well as a text search engine. In the XML world he is known as the developer of the open source Saxon product, the first fully-conformant implementation of the XSLT standard.

Michael lives in Reading, Berkshire with his wife and daughter. His hobbies, as you might guess from the examples in this book, include genealogy and choral singing.

Acknowledgements

Firstly, I'd like to acknowledge the work of the W3C XSL Working Group, who created the XSLT language. Without their efforts there would have been no language and no book. I wrote the first edition of this book from an outsider's perspective before I joined the group, and I've tried to keep that flavor in the second edition, despite the fact that I now have to take my share of responsibility for the spec being the way it is.

More specifically, I'm grateful to James Clark, the editor of the XSLT and XPath specifications, who responded courteously and promptly to a great many enquiries.

I've learnt a great deal of what I know about XSLT from the people on the XSL-List; not only from the experts who answer so many of the questions, but also from the many beginners who ask them. Many of the new techniques and explanations in the second edition were prompted by ideas first aired on this list.

I owe a debt both to ICL and to Software AG, my employers during the life of this project, who both offered me every encouragement and support.

My editors at Wrox Press, and the technical reviewers, made an invaluable contribution by pointing out the many places where improvements were needed.

And finally, I'm once again grateful for the support of Penny and Pippa, who took the news that I was planning a second edition with little more than a sigh of resignation.

namespaces, 63
node name, 60
string-value, 62
element-available() function, 476
alternative to xsl:fallback, 216
creating multiple output files, 478
examples, 481
rules, 477
instructions defined in XSLT, 477
testing for availability of
extension elements, 136
features in later XSLT versions, 480
vendor extensions, 481
usage, 480
Elements
attribute value template, 150
classification into groups, 151
document order, 150
expressions, 150
instantiate, 150
instructions, 150
literal result element, 150
patterns, 150
QName, 150
stylesheets, 151
template body, 151
template rules, 151
temporary trees, 151
embedded stylesheet, 304
example, 109
using Saxon, 110
using xsl:stylesheet element, 310
empty node-set, 92
result of comparisons, 88
empty string, 91
encoding
of HTML output, 280
of text output, 281
of XML output, 276
ends-with() function
there is none, 540
entities
external general parsed entity, 58
unparsed, 562
entity reference
effect on string-length(), 543
handled by XML parser, 141
HTML output, 279
EntityResolver
use with Saxon, 772
entity-uri() function, 64
EqualityExpr expression, 370
equals operator =, 370
applied to a tree, 375
not an identity comparison, 374
rules for node-sets, 372
rules for simple values, 371
ErrorListener interface (TrAX), 858
escaping of special characters, 325
eval()
Saxon extension function, 781
evaluate()
Saxon extension function, 782
example, 786
Xalan extension function, 815
Excelon
producers of Stylus Studio, 830
exclude-result-prefixes attribute
in an imported stylesheet, 227
in an included stylesheet, 239
xsl:stylesheet element, 312


AbbreviatedRelativeLocationPath, 354

AbbreviatedStep, 356

recognizing vendor extensions, 131

system-property() function, 131

testing whether extensions are available, 131

vendor defined attributes, 131

vendor defined top level elements, 131

XSLT open ended attribute values, 131

vendor discretion on values to support, 131

eXtensible Server Pages

see XSP pages

extension elements

definition, 117

implementation, vendor-dependent, 118

namespace, 305

non-standard elements from vendor and user, 136

saxon:group extension element, 117

saxon:while element, 135

supported in Saxon, 777

testing for availability, 476

tokenizing a string, 135

use for debugging, 311

use of xsl:fallback element, 118

xsl:exclude-result-prefixes attribute, 136

xsl:fallback element, 137

xsl:stylesheet element, 136

extension-element-prefixes attribute, 135

xsl:extension-element-prefixes attribute, 135


as substitute for updateable variables, 613

binding, 569

calling, 386

construct new tree in form of DOM, 572

node must be Document or Document Fragment object, 572

choice of language to write in, 568

mechanism for using other languages, 132

name contains namespace prefix and colon, 568

elements can specify different languages, 569

XSLT source tree, accessing, 570

extension-element-prefixes attribute

imported stylesheet, in, 239

relationship to element-available() function, 480

xsl:stylesheet element, 311

external functions within a loop, 585

external general parsed entity, 58

used as XML output, 272

external object

cannot use as a predicate, 412

table of conversion rules, 581

external object data type, 386

EZ/X

produced by Activated Intelligence, 828


Chapter 8 Contents

Overview

When are Extension Functions Needed?

Calling Extension Functions

What Language is Best?

Binding Extension Functions

XPath Trees and the DOM

The Java Language Binding

The JavaScript Language

my:function($arg1, 23, string(title))

The name of an extension function will always contain a namespace prefix and a colon. The prefix («my» in this example) must be declared in a namespace declaration on some containing element in the stylesheet, in the usual way. The function may take any number of arguments (zero or more), and the parentheses are needed even if there are no arguments. The arguments can be any XPath expressions; in our example, the first argument is a variable reference, the second is a number, and the third is a function call. The arguments are passed to the function by value, which means that the function can never modify the values of the arguments. The function always returns a result.

We'll have a lot more to say about the data types of the arguments, and the data type of the result, in due course.
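As a hedged illustration of such a call, the sketch below wraps it in a function-available() test so that the stylesheet still runs on a processor that does not provide the function. The namespace URI bound to «my», the value given to $arg1, and the function itself are all hypothetical; how a function is actually bound to that name depends on the processor.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:my="http://example.com/hypothetical-extensions"
    exclude-result-prefixes="my">

  <!-- Hypothetical value for the variable used as the first argument -->
  <xsl:variable name="arg1" select="'some value'"/>

  <!-- Matches the document element, so «title» refers to one of its children -->
  <xsl:template match="/*">
    <result>
      <xsl:choose>
        <!-- Only call the extension function if the processor provides it -->
        <xsl:when test="function-available('my:function')">
          <xsl:value-of select="my:function($arg1, 23, string(title))"/>
        </xsl:when>
        <xsl:otherwise>my:function is not available</xsl:otherwise>
      </xsl:choose>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

On a processor with no binding for the hypothetical function, the fallback text is written instead of the function result.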


<xsl:if> is analogous to the if statement found in many programming languages.

Defined in

XSLT section 9.1

Format

Expression    The Boolean condition to be tested

Content

A template body

Effect

The test expression is evaluated and the result is converted if necessary to a Boolean using the rules defined for the boolean() function. If the result is true, the contained template body is instantiated; otherwise, no action is taken.

Any XPath value may be converted to a Boolean. In brief, the rules are:

if the expression is a node-set, it is treated as true if the node-set contains at least one node (This means that a reference to a temporary tree is always treated as true.)

if the expression is a string, it is treated as true if the string is not empty

if the expression is a number, it is treated as true if the number is non-zero.

Usage

The <xsl:if> instruction is useful where an action is to be performed conditionally. It performs the functions of the if-then construct found in other programming languages. If there are two or more alternative actions (the equivalent of an if-then-else or switch or Select Case in other languages), use <xsl:choose> instead.
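The conversion rules listed under Effect can be seen in a small stylesheet such as the sketch below. The element name «item» and the attribute name «title» are assumptions made for illustration only.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Node-set: true only if at least one <item> element exists -->
    <xsl:if test="//item">document contains items&#10;</xsl:if>

    <!-- String: true only if the string is not empty -->
    <xsl:if test="string(/*/@title)">document element has a non-empty title&#10;</xsl:if>

    <!-- Number: true only if the number is non-zero -->
    <xsl:if test="count(//item)">item count is non-zero&#10;</xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Each test relies on the implicit conversion to a Boolean, so no explicit call to boolean() is needed.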

One common use of <xsl:if> is to test for error conditions. In this case it is often used with

the numeric value is less than zero, or

the numeric value is greater than 100

<xsl:if test="not(@percent) or (string(number(@percent))='NaN') or (number(@percent) &lt; 0) or (number(@percent) &gt; 100)">

<xsl:message>


<xsl:if test="position()!=last()">, </xsl:if>

<xsl:if test="position()=last()-1">and </xsl:if>


Who is this Book for?

Why a Second Edition?

What does the Book Cover?

How is the Book Structured?

feedback@wrox.com


Defined in

XSLT section 2.6.1

Format

<xsl:include href=uri />

Position

<xsl:include> is a top-level element, which means that it must appear as a child of the <xsl:stylesheet> element. There are no constraints on its ordering relative to other top-level elements in the stylesheet.

Attributes

href    mandatory    URI    The URI of the stylesheet to be included

Content

None; the element is always empty

Effect

The URI contained in the href attribute may be an absolute URI or a relative URI. If relative, it is interpreted relative to the base URI of the XML document or external entity containing the <xsl:include> element. For example, if a file main.xsl contains the element <xsl:include href="date.xsl"/> then by default the system will look for date.xsl in the same directory as main.xsl. With XSLT 1.1 you can change this behavior by using the xml:base attribute to specify a base URI explicitly, as described on page 62.

The URI must identify an XML document that is a valid XSLT stylesheet. The top level elements of this stylesheet are logically inserted into the including stylesheet module at the point where the <xsl:include> element appears. However:

These elements retain their base URI, so anything that involves referencing a relative URI is done relative to the original URI of the included stylesheet. This rule applies, for example, when expanding further <xsl:include> and <xsl:import> elements, or when using relative URIs as arguments to the document() function.

When a namespace prefix is used (typically within a QName, but it also applies to freestanding prefixes such as those in the xsl:exclude-result-prefixes attribute of a literal result element) it is interpreted using only the namespace declarations in the original stylesheet module in which the QName occurred. An included stylesheet module does not inherit namespace declarations from the module that includes it. This even applies to QNames constructed at execution time as the result of evaluating an expression, for example an expression used within an attribute value template for the name or namespace attribute of <xsl:element>.

The values of the version, extension-element-prefixes, and exclude-result-prefixes attributes that apply to an element in the included stylesheet module, as well as xml:lang and xml:space, are those that were defined on its own <xsl:stylesheet> element, not those on the <xsl:stylesheet> element of the including stylesheet module.

An exception is made for <xsl:import> elements in the included stylesheet module. <xsl:import> elements must come before any other top-level elements, so instead of placing them in their natural sequence in the including module, they are promoted so they appear after any <xsl:import> elements, but before any other top-level elements, in the including stylesheet module. This is relevant to situations where there are duplicate definitions and the XSLT processor is allowed to choose the one that comes last.

The included stylesheet module may use the simplified (literal-result-element-as-stylesheet) syntax, described in Chapter 3. This allows an entire stylesheet module to be defined as the content of an element such as <HTML>. It is then treated as if it were a module containing a single template, whose match pattern is «/» and whose content is the literal result element.
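To make the equivalence concrete, here is a hedged sketch of a simplified stylesheet followed by the full module it is treated as. The «title» element in the source document is an assumption for illustration.

```xml
<?xml version="1.0"?>
<!-- Simplified syntax: the whole module is one literal result element -->
<HTML xsl:version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <BODY>
    <H1><xsl:value-of select="/*/title"/></H1>
  </BODY>
</HTML>
```

```xml
<?xml version="1.0"?>
<!-- Equivalent full form: a single template rule matching «/» -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <HTML>
      <BODY>
        <H1><xsl:value-of select="/*/title"/></H1>
      </BODY>
    </HTML>
  </xsl:template>
</xsl:stylesheet>
```

Because the simplified form contributes a rule matching «/», including it in a module that already has such a rule creates a duplicate definition of the kind discussed in this section.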

The included stylesheet module may contain <xsl:include> statements to include further stylesheets, or <xsl:import> statements to import them. A stylesheet must not directly or indirectly include itself.

It is not an error to include the same stylesheet module more than once, either directly or indirectly, but it is not a useful thing to do. It may well cause errors due to the presence of duplicate declarations; in fact, if the stylesheet contains definitions of global variables or named templates, and is included more than once at the same import precedence, such errors are inevitable. In some other situations it is implementation-defined whether an XSLT processor will report duplicate declarations as an error, so the behavior may vary from one product to another.

Usage

<xsl:include> provides a simple textual inclusion facility analogous to the #include directive in C; it is purely a way of writing a stylesheet in a modular way so that commonly used definitions


some elements in common, which are to be processed in the same way regardless of where they occur. For example, these might include standard definitions of toolbars, backgrounds, and navigation buttons to go on your web pages, as well as standard styles applied to data elements such as product names, e-mail contact addresses, or dates.

To incorporate such standard content into a stylesheet without change, use <xsl:include>. If there are definitions you want to override, use <xsl:import>.

<xsl:include> is a compile-time facility; it is used to assemble the complete stylesheet before you start executing it. People sometimes ask how to include other stylesheets conditionally at run-time, based on conditions found in the source document. The answer is simple: you can't. It would be like writing a program in Visual Basic that modifies itself as it executes. If you do want different sets of rules to be active at different times, consider using modes, or consider inverting the logic, so that instead of having an all-purpose stylesheet that tries to include different sets of rules on different occasions, you make your principal stylesheet module the one that is most closely tailored to the circumstances, and use <xsl:import> to import the all-purpose rules into it, at a lower import precedence than the specialized rules.

It can make a difference where in your stylesheet the <xsl:include> statement is placed. There are some kinds of objects - notably, template rules - where if there is no other way of deciding which one to use, the XSLT processor has the option of giving priority to the one that occurs last in the stylesheet. This isn't something you can easily take advantage of, because in all these cases the processor also has the option of reporting an error. As a general principle, it's probably best to place <xsl:include> statements near the beginning of the file, because then if there are any accidental overlaps in the definitions, the ones in your principal stylesheet will either override those included from elsewhere, or be reported as errors.

The resulting output is:

This is because attributes generated using <xsl:attribute> override those generated by using a named attribute set; it has nothing to do with the fact that the attribute set came from an included stylesheet.

See also

<xsl:import> on page 226


of attribute names and values. The resulting attribute set can be applied as a whole to any output element, providing a way of defining commonly-used sets of attributes in a single place. Named attribute sets provide a capability similar to named styles in CSS.

Defined in

XSLT section 7.1.4

Format

<xsl:attribute-set name=QName use-attribute-sets=list-of-QNames >

name    mandatory    QName    The name of the attribute set

use-attribute-sets    optional    Whitespace-separated list of QNames    The names of other attribute sets to be incorporated into this attribute set

Content

Zero or more <xsl:attribute> elements

Effect

The name attribute is mandatory, and defines the name of the attribute set. It must be a QName: a name with or without a namespace prefix. If the name uses a prefix, it must refer to a namespace declaration that is in scope at this point in the stylesheet, and as usual it is the namespace URI rather than the prefix that is used when matching names. The name does not need to be unique; if there are several attribute sets with the same name, they are effectively merged.

The use-attribute-sets attribute is optional. It is used to build up one attribute set from a number of others. If present, its value must be a whitespace-separated list of tokens each of which is a valid QName that refers to another named attribute set in the stylesheet. For example:

The references must not be circular: if A refers to B, then B must not refer directly or indirectly to A. The order is significant: specifying a list of named attribute sets is equivalent to copying the <xsl:attribute> elements that they contain, in order, to the beginning of the list of <xsl:attribute> elements contained in this <xsl:attribute-set> element.

If several attribute sets have the same name, they are merged. If this merging finds two attributes with the same name, then the one in the attribute set with higher import precedence will take precedence. Import precedence is discussed under <xsl:import> on page 226. If they both have the same precedence, the XSLT processor has the option of using the one that came later in the stylesheet, or reporting an error. (Note that it isn't generally possible to detect this error at compile time, because the attribute names can be calculated at run-time using an attribute value template.)

The order in which this merging process takes place can affect the outcome. When use-attribute-sets appears on an <xsl:attribute-set> or <xsl:copy> element, or xsl:use-attribute-sets on a literal result element, it is expanded to create an equivalent sequence of <xsl:attribute> instructions. The precise rules are tortuous, even in the corrected version that appears in the XSLT 1.0 errata. They are best explained by example.

Suppose you have the following attribute-set definition:

<xsl:attribute-set name="B" use-attribute-sets="A1 A2">


equivalent list of <xsl:attribute> instructions before it merges this B with other attribute sets of the same name.

When B is expanded, the attributes derived from A1 and A2 will be output before the attributes p, q, and r, so if expanding A1 and A2 generates any attributes called p, q, and r, these will be overwritten by the values specified within B (percy, queenie, and rory).
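The overriding behavior described for B can be sketched as a complete stylesheet. The attribute names p, q, and r and the values percy, queenie, and rory come from the text; the contents of A1 and A2 and the «out» element are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical contents for A1 and A2 -->
  <xsl:attribute-set name="A1">
    <xsl:attribute name="p">from-A1</xsl:attribute>
    <xsl:attribute name="a">alpha</xsl:attribute>
  </xsl:attribute-set>

  <xsl:attribute-set name="A2">
    <xsl:attribute name="q">from-A2</xsl:attribute>
  </xsl:attribute-set>

  <!-- Attributes from A1 and A2 are generated first, then p, q and r -->
  <xsl:attribute-set name="B" use-attribute-sets="A1 A2">
    <xsl:attribute name="p">percy</xsl:attribute>
    <xsl:attribute name="q">queenie</xsl:attribute>
    <xsl:attribute name="r">rory</xsl:attribute>
  </xsl:attribute-set>

  <xsl:template match="/">
    <out xsl:use-attribute-sets="B"/>
  </xsl:template>

</xsl:stylesheet>
```

Under the rules above, the «out» element should carry p="percy", q="queenie", r="rory", and a="alpha": the values of p and q supplied by A1 and A2 are overwritten, while a survives because B does not redefine it. The order in which the attributes are serialized may vary between processors.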

In general, duplicate attribute names or attribute set names do not cause an error; if several attributes have the same name, the one that comes last (in the order produced by the merging rules given above) will take precedence. There is one situation only that is defined as an error, namely if two different attribute-sets with the same name and the same import precedence both produce attributes with the same name. Another way of saying this is: it is an error if the final result depends on the relative positions of the attribute-set definitions in the stylesheet. However, the processor is allowed to recover from this error by taking the definitions in stylesheet order, and in this particular case it's so much easier to take the recovery action than to detect the error that I suspect this is what most processors are likely to do.

Usage

The most common use of attribute sets is to define packages of attributes that constitute a display style, for example a collection of attributes for a font or for a table.

A named attribute set is used by referring to it in the use-attribute-sets attribute of the <xsl:element> or <xsl:copy> elements, or in the xsl:use-attribute-sets attribute of a literal result element, or, of course, in the use-attribute-sets attribute of another <xsl:attribute-set>. The first three cases all write an element node to the current output destination, and have the effect of adding the attributes in the named attribute set to that element node. Any attributes added implicitly from a named attribute set can be overridden by attribute nodes added explicitly by the invoking code.

An attribute set is not simply a textual macro. The attributes contained in the attribute set each have a template body to define the value, and although this will often be a simple text node, it may also, for example, declare variables or invoke other XSLT instructions such as <xsl:call-template> and <xsl:apply-templates>.

The rules for the scope of variables, described under <xsl:variable> on page 333, are the same as anywhere else, and are defined by the position of the definitions in the source stylesheet document. This means that the only way to parameterize the values of attributes in a named attribute set is by reference to global variables and parameters: there is no other way of passing parameters to an attribute set. However, the value of the generated attributes may depend on the context in the source document. The context is not changed when the attribute-set is used, so the current node («.») and current node list are exactly the same as in the calling template.

This is shown in the following example:

Example: Using an Attribute Set for Numbering

Let's suppose we want to copy an XML file containing a poem, but with the <line> elements in the poem output in the form <line number="3" of="18"> within the stanza.

<line>And suddenly the wind comes soft,</line>

<line>And Spring is here again;</line>

<line>And the hawthorn quickens with buds of green</line>

<line>And my heart with buds of pain.</line>
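A stylesheet along the following lines could produce that numbering. It is a sketch rather than the book's own listing: the attribute set name «sequence» is hypothetical, and it assumes that each stanza contains only <line> elements, so that position() and last() count lines.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Drop whitespace-only text nodes so they are not counted by position() -->
  <xsl:strip-space elements="*"/>

  <!-- The values depend on the context of the template that uses the set -->
  <xsl:attribute-set name="sequence">
    <xsl:attribute name="number"><xsl:value-of select="position()"/></xsl:attribute>
    <xsl:attribute name="of"><xsl:value-of select="last()"/></xsl:attribute>
  </xsl:attribute-set>

  <!-- Copy every other element unchanged (attributes are not copied here) -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- Copy each line, adding the numbering attributes -->
  <xsl:template match="line">
    <xsl:copy use-attribute-sets="sequence">
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The attribute set contains no reference to <line> at all; it works only because the current node list is inherited from the calling template.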


This produces the following output:

The output now becomes:

<table border="2" rules="cols" cellpadding="3"

Then this new attribute set could also be invoked by name from a literal result element, an <xsl:element> instruction, or an <xsl:copy> instruction.


Appendix E - Other Products

XSLT Programmer's Reference, Second Edition by Michael Kay

Wrox Press 2001

4XSLT

This is an open source implementation of XSLT written in the Python language. The vendor is the FourThought company of Boulder, Colorado, and the lead designer is Uche Ogbuji. Information on the company is on http://www.fourthought.com/, and the open source tools can be downloaded from http://4suite.org/index.epy

4XSLT forms one component of a suite of XML software which is remarkable in its scope, given the small size of the company producing it. As well as the XSLT processor, it includes an XML parser with DOM and SAX interfaces, an implementation of XLink and XPointer, an RDF implementation, and an object database. The main distinguishing feature of these tools is that they are all written in Python.

4XSLT is (as far as I can tell) a complete implementation of the XSLT 1.0 and XPath 1.0 Recommendations. The XPath implementation is freestanding, and can also be used directly in conjunction with the DOM implementation.

The XSLT processor can be run either from the command line, or from a Python application program. It also allows extension elements and extension functions to be written in Python.

The product contains a few built-in extension elements, including:

ft:apply-templates    A variation on xsl:apply-templates that allows the mode to be selected dynamically at run-time

ft:write-file    Produces multiple output documents (similar to the XSLT 1.1 <xsl:document> instruction)

ft:message-output    Defines the destination for output from the <xsl:message> instruction

4XSLT also contains a useful library of extension functions, many of them familiar from other processors, but some of them unique to 4XSLT:

ft:node-set()    Converts a result tree fragment to a node-set

ft:match()    Determines whether a string matches a regular expression

ft:escape-url()    Escapes special characters appearing in a URL

ft:iso-time()    Returns the current date and time in ISO 8601 format

ft:evaluate()    Evaluates an XPath expression supplied as a string

ft:distinct()    Removes duplicate nodes from a node-set

ft:split()    Splits a string into a set of text nodes, using a specified delimiter

ft:range()    Constructs a set of text nodes whose values are consecutive numbers

ft:if()    Acts as a conditional expression

ft:find()    Finds the position of a string within a containing string

In the weeks before we went to press, Uche Ogbuji was championing the creation of a vendor-neutral library of specifications for extension functions, being prepared under the editorship of Jeni Tennison; for details see http://www.jenitennison.com/xslt/exslt/common/ The thinking behind this initiative is that there is already so much overlap between different vendors' sets of extension functions, it makes sense for those provided by multiple vendors to use the same namespace, so that portable stylesheets can be written. It seems likely that 4XSLT will be in the lead in implementing this library, so watch out for news on this.

I haven't actually tried to install the 4XSLT product; it's probably very easy if you know Python and are familiar with its conventions, but a bit daunting if you don't. Having said that, Python seems to be one of those languages that people fall in love with once they've discovered it, so it might well be a skill worth acquiring. Find out more at http://www.python.org


There is a clear analogy here with object-oriented programming. Writing a stylesheet module that imports another is like writing a subclass, whose methods override the methods of the superclass. <xsl:apply-imports> behaves analogously to the super() function in object-oriented programming languages, allowing the functionality of the superclass to be incorporated in the functionality of the subclass.

Defined in

XSLT section 5.6

Format

XSLT 1.0 Format:

The specification doesn't say exactly what "imported into" means. A reasonable interpretation is a stylesheet module that is a descendant of this one in the import tree. A stylesheet module S becomes a child of another stylesheet module T in the import tree by being referenced in an <xsl:import> statement within T, or within a module that is included in T directly or indirectly by means of <xsl:include> statements. If module S is imported into module T, then templates defined in S will always have lower import precedence than templates defined in T.

With the draft XSLT 1.1 specification it becomes possible to specify parameters to be supplied to the called template, using <xsl:with-param> elements contained within the <xsl:apply-imports> element. These work in the same way as parameters for <xsl:call-template> and <xsl:apply-templates>; if the name of the supplied parameter matches the name of an <xsl:param> element within the called template, the parameter will take that value, otherwise it will take the default value supplied in the <xsl:param> element. It is not an error to supply parameters that don't match any <xsl:param> element in the called template rule; they will simply be ignored.

Usage and Examples

The intended usage pattern behind <xsl:apply-imports> is illustrated by the following example.

One stylesheet, a.xsl, contains general-purpose rules for rendering elements. For example, it might contain a general-purpose template rule for displaying dates, as follows:

<xsl:template match="timeline/date">

<b>

<xsl:value-of select="day"/>

<xsl:text>/</xsl:text>


«timeline/date» template rule is in a stylesheet that directly or indirectly imports the «date» template rule. It will not work, for example, if they are in the same stylesheet but defined with different priority.
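The arrangement this paragraph requires might be sketched as two modules. The file name a.xsl and the «day» element appear in the text; the name b.xsl, the «month» and «year» elements, and the exact content of both rules are assumptions for illustration.

```xml
<?xml version="1.0"?>
<!-- a.xsl: general-purpose «date» rule (hypothetical content) -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="date">
    <xsl:value-of select="day"/>
    <xsl:text>/</xsl:text>
    <xsl:value-of select="month"/>
    <xsl:text>/</xsl:text>
    <xsl:value-of select="year"/>
  </xsl:template>
</xsl:stylesheet>
```

```xml
<?xml version="1.0"?>
<!-- b.xsl: imports a.xsl and supplements its «date» rule -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:import href="a.xsl"/>

  <!-- Higher import precedence than the rule in a.xsl -->
  <xsl:template match="timeline/date">
    <b>
      <!-- Invoke the overridden rule from the imported module -->
      <xsl:apply-imports/>
    </b>
  </xsl:template>

</xsl:stylesheet>
```

With b.xsl as the principal stylesheet, a date inside a timeline is rendered by the imported rule and wrapped in <b>; dates elsewhere are handled by the rule in a.xsl alone.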

In many situations the same effect can be achieved equally well by giving the general-purpose template rule a name, and invoking it from the special-purpose template rule by using <xsl:call-template> (see page 179). This technique also works when overriding a template rule of lower priority (and equal import precedence). The one time this alternative technique is not possible is when the general-purpose template rule was written by someone else and cannot be changed. For example, this situation might arise if users of web pages were allowed to create XSLT stylesheets that modified the behavior of an author-supplied stylesheet.

But that approach doesn't work if you want one rule that overrides or supplements many others. One example I encountered was a developer who had a working stylesheet, but wanted to add the rule "output an HTML <a> tag for any source element that has an anchor attribute". Rather than modifying every rule in the existing stylesheet, this can be achieved by defining a new stylesheet module that imports the original one, and contains the single rule:
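A rule of this kind might look like the following sketch. The file name original.xsl is hypothetical, and the exact markup generated for the anchor is an assumption.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The existing, unmodified stylesheet (hypothetical file name) -->
  <xsl:import href="original.xsl"/>

  <!-- Wins over every imported rule because of its higher import precedence -->
  <xsl:template match="*[@anchor]">
    <a name="{@anchor}"/>
    <!-- Then process the element exactly as the original stylesheet would -->
    <xsl:apply-imports/>
  </xsl:template>

</xsl:stylesheet>
```

Elements without an anchor attribute never match this rule, so they are handled by the imported rules as before.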


or you could just as sensibly start at the end and work backwards. For a top-down approach, start with Expr on page 420), which is one of the key concepts that gives XPath its power.

Many languages distinguish the lexical rules, which define the format of basic tokens such as names, numbers, and operators, from the syntactic rules, which define how these tokens are combined to form expressions and other higher-level constructs.

The XPath specification includes both syntactic and lexical production rules, but they are not quite as clearly separated as in some languages. As some constructs appear in both, I've kept them bundled together, showing the lexical rules in the same alphabetical sequence as the syntax rules, but distinguishing them in the text. The main distinction between the two kinds of rule is that whitespace can be freely used between lexical tokens but not within a lexical token.

The top-level lexical rule is ExprToken


For example, the expression «format-number(12.5, '$#.00')» returns the string «$12.50».

Defined in

XSLT section 12.3

The effect of the function is defined by reference to the Java JDK 1.1 specifications; I have extracted the relevant information for ease of reference.

Format

format-number(value, format) → string

format-number(value, format, name) → string

Arguments

value    number    The input value. If it is not of type number, it is first converted to a number using the rules for the number() function.

format    string    A format pattern. If it is not of type string, it is first converted to a string using the rules for the string() function.

name (optional)    string    The name (a QName) of a decimal format, established using the <xsl:decimal-format> element. If the argument is omitted, the default decimal format is used.

in scope at the point in the stylesheet where the format-number() function is called Theremust be an <xsl:decimal-format> element in the stylesheet with the same expanded name,using the namespace URIs rather than prefixes in the comparison

If the third argument is omitted, the default decimal format is used. A default decimal format can be established for a stylesheet by including an <xsl:decimal-format> element with no name.

If there is no unnamed <xsl:decimal-format> element in the stylesheet, the system uses a built-in default format which is the same as specifying an <xsl:decimal-format> with no attributes.

The Format Pattern

The rules for the format pattern string are defined in XSLT by reference to the Java JDK 1.1 specification.

The structure of the format pattern is as follows, using the same syntax conventions as in Expressions:

pattern              subpattern ( pattern-separator subpattern )?
subpattern           prefix? integer ( decimal-point fraction )? suffix?
                     (but also allowing a grouping-separator to appear)
pattern-separator    «;» (by default)
decimal-point        «.» (by default)
grouping-separator   «,» (by default)
zero-digit           «0» (by default)
specialCharacters    see table below

In these syntax rules, the characters shown as «;», «.», «,», «#», and «0» are the default representations of pattern-separator, decimal-point, grouping-separator, digit, and zero-digit. If the relevant <xsl:decimal-format> element nominates different characters in these roles, the nominated character is used in its place in the format pattern.


The special characters used are as follows:

Special character                   Meaning
zero-digit (default «0»)            A digit will always appear at this point in the result string.
digit (default «#»)                 A digit will appear at this point in the result string unless it is a redundant leading or trailing zero.
decimal-point (default «.»)         Separates the integer and the fraction part of the number.
grouping-separator (default «,»)    Separates groups of digits.
pattern-separator (default «;»)     Separates the positive and negative format sub-patterns.
minus-sign (default «-»)            Minus sign.
percent-sign (default «%»)          Multiplies the number by 100 and shows it as a percentage.

If there is no explicit negative subpattern, «-» is prefixed to the positive form. That is, «0.00» alone is equivalent to «0.00;-0.00». If there is an explicit negative subpattern, it serves only to specify the negative prefix and suffix; the number of digits, minimal digits, and other characteristics are all the same as the positive pattern. That means that «#,##0.0#;(#)» has precisely the same result as «#,##0.0#;(#,##0.0#)».

The XSLT specification specifically references the JDK 1.1 specification for its definition of patterns. Some additional special characters were defined in decimal format patterns after the initial release of the JDK 1.1 specification: notably «¤» (the international currency sign #xA4), and «E», the exponent character, used for output in scientific notation. While the specification is clear that an XSLT processor isn't required to support these special characters, neither does it say that it must treat them as errors. If you're using a processor written in Java, it's actually quite likely that the way these characters are interpreted will vary depending on which specific Java VM you are using.

The grouping separator is commonly used for thousands, but in some countries for ten-thousands. The number of digits per group in the output string will be equal to the number of digits in the pattern between the last grouping separator and the end of the integer: any other grouping separators in the format pattern are ignored. For example, if you write «#,##,###,####» there will be a grouping separator every four digits.

It is not defined what happens if the format pattern is invalid. This means the implementation is free either to report an error or to display the number in some fallback representation.

Usage

Note that this facility for formatting numbers is completely separate from the facilities available through the <xsl:number> element. There is some overlapping functionality, but the syntax of the format patterns is quite unrelated. The format-number() function formats a single number, which need not be an integer. <xsl:number> is primarily designed to format a list of positive integers. For formatting a single positive integer, either facility can be used.

Examples

The following example shows the result of format-number() using the default decimal format. Examples with non-default decimal formats are shown under the <xsl:decimal-format> element on page 199.
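As a small sketch of the rules described above, the stylesheet below formats a few values with the default decimal format. The input values and patterns are my own illustrative choices, and the results noted in the comments assume a processor that follows the JDK 1.1 pattern rules.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Grouping separator and two fixed decimal places: 1,234.50 -->
    <xsl:value-of select="format-number(1234.5, '#,##0.00')"/>
    <xsl:text>&#xA;</xsl:text>
    <!-- zero-digit forces leading zeros: 042 -->
    <xsl:value-of select="format-number(42, '000')"/>
    <xsl:text>&#xA;</xsl:text>
    <!-- percent-sign multiplies the value by 100: 25% -->
    <xsl:value-of select="format-number(0.25, '0%')"/>
    <xsl:text>&#xA;</xsl:text>
    <!-- Explicit negative subpattern supplies prefix and suffix: (12.50) -->
    <xsl:value-of select="format-number(-12.5, '0.00;(0.00)')"/>
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The stylesheet ignores the content of its source document, so it can be run against any well-formed XML file.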


For example, the expression «count(.)» always returns 1.

Defined in

XPath section 4.1

Format

count(nodes) → number

Arguments

Argument   Data type   Meaning
nodes      node-set    The input node-set. An error is reported if the argument is not a node-set.

Result

A number giving the number of distinct nodes in the input node-set.

Rules

The count() function takes a node-set as its parameter, and returns the number of nodes present in the node-set.

Only the nodes that are members of the node-set in their own right are counted. Nodes that are children or descendants of these nodes are not included in the count.

Usage

A node-set is a mathematical set, so all the nodes it contains are distinct (which means they are different nodes - it doesn't mean, of course, that they must have different string-values). If you form a node-set using the union operator «|», any nodes that are in both operands will only be included in the result once. This means, for example, that the result of «count(. | /)» will be 1 if, and only if, the context node is the root.

Since XPath provides no other way of comparing whether two node-sets contain the same node, this can be a useful programming trick. For example, to test whether the context node is one of the nodes in the node-set in variable «$special», write:

<xsl:if test="count($special | .) = count($special)"> </xsl:if>

Avoid using count() to test whether a node-set is empty, for example by writing:

<xsl:if test="count(book[author='Hemingway']) != 0"> </xsl:if>

This can be better expressed as:

<xsl:if test="book[author='Hemingway']"> </xsl:if>

Both examples test whether the current node has any child <book> elements that have a child <author> element whose value is «Hemingway». However, the second example, as well as being more concise, is easier for the XSLT processor to optimize. Many implementations will be able to stop the scan of books as soon as a matching one is found.

The count() function is often an effective alternative to using <xsl:number>. For example, if the current node is a <bullet> element, then «count(preceding-sibling::bullet)+1» returns the same value as <xsl:number/> used with no attributes. The advantages of using count() are that it is rather more flexible in defining what you want to count, and it can be used directly in expressions. However, <xsl:number> gives a simple way of obtaining the sequence number, formatting it, and inserting it in the result tree in a single operation; it may also in some cases be easier for the processor to optimize.

Avoid using count() where last() would do the job just as well. This situation arises when you are processing a set of nodes using <xsl:apply-templates> or <xsl:for-each>; the number of nodes in that set is then available from the last() function. For example, it is probably inefficient to write:

<xsl:for-each select="book[author='Hemingway']">

</xsl:for-each>

because - unless the XSLT processor is rather clever - it will have to re-evaluate the expression «../book[author='Hemingway']» each time round the loop.
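The contrast between the two approaches can be sketched as follows. The <booklist> parent element and the "n of m" output format are hypothetical, chosen only to make the example complete.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- "booklist" is a hypothetical parent of the <book> elements. -->
  <xsl:template match="booklist">

    <!-- Probably inefficient: the count() expression may be
         re-evaluated for every selected book. -->
    <xsl:for-each select="book[author='Hemingway']">
      <p>
        <xsl:value-of select="position()"/>
        <xsl:text> of </xsl:text>
        <xsl:value-of select="count(../book[author='Hemingway'])"/>
      </p>
    </xsl:for-each>

    <!-- Same output: last() gives the size of the list being processed. -->
    <xsl:for-each select="book[author='Hemingway']">
      <p>
        <xsl:value-of select="position()"/>
        <xsl:text> of </xsl:text>
        <xsl:value-of select="last()"/>
      </p>
    </xsl:for-each>

  </xsl:template>

</xsl:stylesheet>
```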

<xsl:value-of select="count(//footnote)"/>

The following example assigns to a variable the number of attributes of the current node:
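A minimal sketch of such an assignment is shown here. The variable name attribute-count and the surrounding template are assumptions made so that the fragment can be run on its own.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="*">
    <!-- @* selects all the attributes of the current element. -->
    <xsl:variable name="attribute-count" select="count(@*)"/>
    <xsl:value-of select="name()"/>
    <xsl:text>: </xsl:text>
    <xsl:value-of select="$attribute-count"/>
    <xsl:text>&#xA;</xsl:text>
    <!-- Repeat for the child elements. -->
    <xsl:apply-templates select="*"/>
  </xsl:template>

</xsl:stylesheet>
```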

This example counts how many distinct values of the country attribute there are in a list of <city> elements.

Source

The source document is cities.xml:


in more detail at these aspects of the language

Variables

XSLT allows global variables to be defined, which are available throughout the whole stylesheet, as well as local variables, which are available only within a particular template body.

The name and value of a variable are defined in an <xsl:variable> element. For example:

<xsl:variable name="width" select="50"/>

This defines a variable whose name is width and whose value is the number 50. The variable can subsequently be referenced in an XPath expression as $width. If the <xsl:variable> element appears at the top level of the stylesheet (as a child of the <xsl:stylesheet> element) then it is a global variable; if it appears within the body of an <xsl:template> element then it is a local variable.

Similarly, XSLT also allows global and local parameters to be defined, using an <xsl:param> element. Global parameters are set from outside the stylesheet (for example, from the command line or from an API - the actual mechanism is implementor-defined). Local parameters to a template are set using an <xsl:with-param> element when the template is called.
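The following sketch puts a global variable, a global parameter, and a local parameter side by side. The names title, heading, and text are hypothetical, as is the HTML produced.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Global parameter: the default applies if no value is supplied
       from outside the stylesheet. -->
  <xsl:param name="title" select="'Untitled'"/>

  <!-- Global variable, visible throughout the stylesheet. -->
  <xsl:variable name="width" select="50"/>

  <xsl:template match="/">
    <xsl:call-template name="heading">
      <!-- Sets the local parameter of the called template. -->
      <xsl:with-param name="text" select="$title"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="heading">
    <!-- Local parameter, visible only within this template. -->
    <xsl:param name="text"/>
    <h1><xsl:value-of select="$text"/></h1>
    <hr width="{$width}"/>
  </xsl:template>

</xsl:stylesheet>
```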

Variables and parameters are not statically typed: they take whatever type of value is assigned tothem The five data types defined in XSLT and XPath are:

String (any sequence of Unicode characters permitted in XML)

Number (a double-precision floating point number as defined in IEEE 754)

Boolean (the value true or false)

Node-set (a set of nodes in a source tree)

External object (an object, for example a Java object, returned by an extension function written in a language other than XSLT/XPath). By allowing XSLT variables to identify an external object, external functions are able to pass arbitrary values to each other

In XSLT 1.0 there is an additional data type, the Result Tree Fragment. This is essentially a tree conforming to the model described earlier in this chapter. In the draft XSLT 1.1 specification, the result tree fragment is no longer a separate data type, but is merged into the node-set data type. A tree is manipulated in the form of a node-set containing one node, the root of the tree.

These data types are described in more detail later in the chapter, starting on page 87.

The use of variables is superficially very similar to their use in conventional programming and scripting languages. They even have similar scoping rules. However, there is one key difference: once a value has been given to a variable, it cannot be changed. This difference has a profound impact on the way programs are written, so it is discussed in detail in the section Programming Without Assignment Statements on page 609.

Expressions

The syntax of expressions is defined in the XPath Recommendation, and is described in detail in Chapter 5.

XPath expressions are used in a number of contexts in an XSLT stylesheet. They are used as attribute values for many XSLT elements, for example:

<xsl:value-of select="($x + $y) * 2"/>

In this example $x and $y are references to variables, and the operators «+» and «*» have their usual meanings of addition and multiplication.

Many XPath expressions, like this one, follow a syntax that is similar to other programming languages. The one that stands out, however, and the one that gave XPath its name, is the Path Expression.

A Path Expression defines a navigation path through the document tree. Starting at a defined origin, usually either the current node or the root, it follows a sequence of steps in defined directions. At each stage the path can branch, so for example you can find all the attributes of all the children of the origin node. The result is always a set of nodes. It might be empty or contain only one node, but it is still treated as a set.

The directions of navigation through the tree are called axes. The various axes are defined in detail in Chapter 5. They include:

The child axis, which finds all the children of a node

The attribute axis, which finds all the attributes of a node

The ancestor axis, which finds all the ancestors of a node

The following-sibling axis, which finds the nodes that come after this one and share the same parent

The preceding-sibling axis, which finds the nodes that come before this one and share the same parent

As well as specifying the direction of navigation through the tree, each step in a path expression can also qualify which nodes are to be selected. This can be done in several different ways:

By defining the name of the nodes (completely or partially)

By defining the type of nodes (e.g. elements or processing instructions)

By defining a predicate that the nodes must satisfy - an arbitrary Boolean expression


The syntax of a path expression uses «/» as an operator to separate the successive steps. A «/» at the start of a path expression indicates that the origin is the root node; otherwise it is generally the current node. Within each step, the axis is written first, separated from the other conditions by the separator «::». However, the child axis is the default, so it may be omitted; and the attribute axis may be abbreviated to «@».

For example:

child::item/attribute::category

is a path expression of two steps: the first selects all the child <item> elements of the current node, and the second step selects their category attributes. This can be abbreviated to:

item/@category

Predicates that the nodes must satisfy are written in square brackets, for example:
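One such predicate is sketched below, inside a small stylesheet so that it can be run. The <order> parent element and the category value 'book' are hypothetical; only the <item> element and its category attribute come from the example above.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- "order" is a hypothetical parent of the <item> elements. -->
  <xsl:template match="order">
    <!-- Selects only the child <item> elements whose category
         attribute has the value 'book'. -->
    <xsl:for-each select="item[@category='book']">
      <xsl:value-of select="."/>
      <xsl:text>&#xA;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```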

The static context includes:

The set of namespace declarations in force at the point where the expression is written. This determines the validity and meaning of any namespace prefixes used in the expression.

The set of variable declarations (that is, <xsl:variable> and <xsl:param> elements) in scope at the point where the expression is written. This determines the validity of any variable references used in the expression.

The set of functions that are available to be called. External function libraries can be set up using vendor-defined mechanisms, or in XSLT 1.1, by using the <xsl:script> element (described on page 293).

The Base URI of the stylesheet element containing the XPath expression. This only affects the result if the expression uses the document() function, using a relative URI that is interpreted relative to the stylesheet. The document() function is described on page 466.

The dynamic context consists of:

The current values of all the variables that are in scope for the expression. These may be different each time the expression is evaluated.

The current location in the source tree. The current location comprises:

The current node. This is the node in the source tree that is currently being processed. A node becomes the current node when it is processed using the <xsl:apply-templates> or <xsl:for-each> instructions. The current node can be referenced using the current() function.

The context node. This is normally the same as the current node, except in a predicate used to qualify a step within a path expression, when it is the node currently being tested by the predicate. The context node can be referenced using the expression «.», or the longer form «self::node()». For example, «a[.='Madrid']» selects all the <a> elements whose string-value is 'Madrid'.

The context position. This is an integer (>=1) that indicates the position of the context node in the current node list. The context position can be referenced using the position() function. When <xsl:apply-templates> or <xsl:for-each> are used to process a list of nodes, that list becomes the current node list, and the context position therefore takes the values 1 to n as each of the nodes in the list is processed. When a predicate is used within a path expression, the context position is the position of the node being tested within the set of nodes being tested. So, for example, «child::a[position() != 1]» selects all the child elements named <a>, except the first.

The context size. This is an integer (>=1) that indicates the number of nodes in the current node list. The context size can be referenced using the last() function. So, for example, «child::a[position() != last()]» selects all the child elements named <a>, except the last.

The XSLT and XPath specifications use different terminology to describe the context. XSLT uses the concepts of current node and current node list, while XPath uses the concepts of context node, context position, and context size. When the <xsl:apply-templates> or <xsl:for-each> instructions are executed, the current node list is set to the list of nodes being processed, and the current node is set to each of these nodes in turn. When an XPath expression is evaluated, the context node is set to the current node; the context position is set to the position of the current node within the current node list; and the context size is set to the size of the current node list.
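The way position() and last() track the current node list can be seen in a small sketch. The element name <city> is used only for illustration; the stylesheet outputs each city's position, the list size, and a comma after every entry except the last.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- The nodes selected here become the current node list. -->
    <xsl:for-each select="//city">
      <xsl:value-of select="position()"/>
      <xsl:text>/</xsl:text>
      <xsl:value-of select="last()"/>
      <xsl:text>: </xsl:text>
      <!-- «.» is the context node, here the same as the current node. -->
      <xsl:value-of select="."/>
      <xsl:if test="position() != last()">
        <xsl:text>, </xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```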

Some system functions that can be used in expressions have other dependencies on the context, for example the effect of the key() function depends on the set of <xsl:key> declarations in force; but the list above covers all the context information that is directly accessible to user-written expressions.

Data Types

XSLT is a dynamically-typed language, in that types are associated with values rather than with variables. In this respect it is similar to VBScript and JavaScript.


However, the functions boolean(), number(), and string() are also available to carry out explicit conversions. The table below summarizes the conversions between the five data types:

From \ To   Boolean                          number                      string                                         node-set         external object
Boolean     not applicable                   false → 0; true → 1         false → 'false'; true → 'true'                 not allowed      not allowed
number      0 or NaN → false; other → true   not applicable              convert to decimal format                      not allowed      not allowed
string      null → false; other → true       parse as a decimal number   not applicable                                 not allowed      not allowed
node-set    empty → false; other → true      convert via string          string-value of first node in document order   not applicable   not allowed

In XSLT 1.1, a temporary tree created using a non-empty <xsl:variable> element has a data type of node-set (the node-set contains a single node, the root of the tree), and its conversions to and from other data types are as defined in the table above. With XSLT 1.0, however, temporary trees are represented by an additional data type, the result tree fragment, which behaves in most ways like a node-set, but can't be used in all contexts where node-sets are allowed. Many XSLT 1.0 products provide an extension function to convert a result tree fragment to a node-set when required; in MSXML3, for example, the function is msxml:node-set().

See the relevant Appendix for details of extension functions in particular vendors' products.

Conversion of a result tree fragment to a string, number, or boolean happens implicitly, and has the same result as if a node-set containing just the root node of the tree were converted to the required type, using the rules in the table above. The effect of these rules is as follows:

To Boolean   always true
To number    convert to a string; then convert the string to a number
To string    the concatenation of all the text nodes in the tree

The following sections describe the core data types, namely Boolean, number, string, and node-set. External objects, which are provided for use by extension functions written in a different programming language, are described in Chapter 8.

Boolean Values

The Boolean data type in XPath contains the values true and false.

There are no constants to represent true and false; instead the values can be written using the function calls true() and false().

Boolean values may be obtained by comparing values of other data types using operators such as «=» and «!=», and they may be combined using the two operators «and» and «or» and the function not().

XPath differs from SQL in that it does not use three-valued logic. A Boolean value is always either true or false; it can never be undefined or null. The nearest equivalent to an SQL null value in XPath is an empty node-set, and when you compare an empty node-set to a string or number the result is always false, regardless of which comparison operator you use. For example, if the current element has no name attribute, then the expressions «@name='Boston'» and «@name!='Boston'» both return false. However, the expression «not(@name='Boston')» returns true.
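This behavior is easy to check with a short sketch. The stylesheet below applies all three tests to the document element; the labels A, B, and C are arbitrary. Run against a source document whose outermost element has no name attribute, it should output only «C».

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/*">
    <!-- With no name attribute, @name is an empty node-set,
         so both of these comparisons are false. -->
    <xsl:if test="@name = 'Boston'">A</xsl:if>
    <xsl:if test="@name != 'Boston'">B</xsl:if>
    <!-- not() of a false comparison is true. -->
    <xsl:if test="not(@name = 'Boston')">C</xsl:if>
  </xsl:template>

</xsl:stylesheet>
```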

For more information on the sometimes-strange behavior of the equality and inequality operators when applied to node-sets, see the sections EqualityExpr (

Number Values

A number in XPath is always a double-precision (64-bit) floating-point number, and its behavior is defined to follow the IEEE 754 standard. This standard (IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Std 754-1985) has been widely implemented by many microprocessors for some years, but it is only through its adoption in the Java language that it has become familiar to high-level language programmers. If you understand how floating point behaves in Java, the contents of this section will be quite familiar; if not, they may be rather strange.

Unlike most other programming languages, XPath does not use scientific notation for floating-point numbers, either on input or on output. If you want to enter the number one trillion, you must write 1000000000000, not 1.0E12.

IEEE 754 defines the following range of values for a double-precision number:

Value                   Description
Finite nonzero values   These are values of the form s × m × 2^x, where s (the sign) is +1 or -1, m (the mantissa) is a positive integer less than 2^53, and x (the exponent) is an integer between -1075 and 970, inclusive.
Positive zero           This is the result of subtracting a number from itself. It can also result from dividing any positive number by infinity, or from dividing a very small number by a very large number of the same sign.
Negative zero           This is the result of dividing any negative number by infinity. It can also result from dividing a positive number by minus infinity, or from dividing a very small negative number by a very large positive number, or vice versa.
Positive infinity       This is the result of dividing any positive number by zero. It can also result from multiplying two very large numbers with the same sign. Note that division by zero is not an error: it has a well-defined result.
Negative infinity       This is the result of dividing any negative number by zero. It can also result from multiplying two very large numbers with different signs.
NaN                     Not a Number. This is the result of attempting to convert a non-numeric string value to a number. It can also be used to mean "unknown" or "not applicable", like the SQL null value.

These values cannot all be written directly as XPath constants. However, they can be expressed as the result of expressions, for example:

Positive Infinity    1 div 0
Negative Infinity    -1 div 0
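A short sketch shows how these expressions appear when output. The «0 div 0» case is my own addition, included to show one way of producing NaN; the results in the comments follow the XPath rules for converting a number to a string.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Positive infinity is output as: Infinity -->
    <xsl:value-of select="1 div 0"/>
    <xsl:text>&#xA;</xsl:text>
    <!-- Negative infinity is output as: -Infinity -->
    <xsl:value-of select="-1 div 0"/>
    <xsl:text>&#xA;</xsl:text>
    <!-- Zero divided by zero has no sensible result: NaN -->
    <xsl:value-of select="0 div 0"/>
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```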

Technically, negative numbers cannot be written directly as constants: «-10» is an expression rather than a number, but in practice it can be used anywhere that a numeric constant can be used. The only thing you need to be careful of is that a space may be needed before the unary minus operator if you write an expression such as «$x div -1».

Except for NaN, number values are ordered. Arranged from smallest to largest, they are:

This ordering determines the result of less-than and greater-than comparisons, and also the result of sorting using <xsl:apply-templates> or <xsl:for-each> with a sort key specified using <xsl:sort data-type="number">.

NaN is unordered, so the operators «<», «<=», «>», and «>=» return false if either or both operands are NaN. However, when <xsl:sort> is used to sort a sequence of numeric values that includes one or more NaN values, NaN values are collated at the start of the sequence (or at the end if you choose descending order).

The position of NaN in the sort sequence was undefined in the original XSLT 1.0 specification, and this omission was corrected later in an erratum. So you may encounter XSLT processors that don't handle it in this way.

Positive zero and negative zero compare equal. This means that the operators «=», «<=», and «>=» return true, while «!=», «<», and «>» return false. However, other operations can distinguish positive and negative zero; for example, «1.0 div $x» has the value positive infinity if $x is positive zero, and negative infinity if $x is negative zero.

The equals operator «=» returns false if either or both operands are NaN, and the not-equals operator «!=» returns true if either or both operands are NaN. Watch out for the apparent contradictions this leads to; for example «$x=$x» can be false, and «$x<$y» doesn't necessarily give the same answer as «not($x>=$y)».

The simplest way to test whether a value $x is NaN is:
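Two possible tests are sketched below, assuming that $x holds a value of type number (if $x were a node-set, the comparison would follow different rules). The variable is set to NaN here by converting a non-numeric string, purely for demonstration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- 'abc' cannot be parsed as a number, so $x is NaN. -->
    <xsl:variable name="x" select="number('abc')"/>

    <!-- NaN is the only numeric value that is not equal to itself. -->
    <xsl:if test="$x != $x">
      <xsl:text>x is NaN&#xA;</xsl:text>
    </xsl:if>

    <!-- A more readable alternative: NaN converts to the string 'NaN'. -->
    <xsl:if test="string($x) = 'NaN'">
      <xsl:text>x converts to the string NaN&#xA;</xsl:text>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```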

«not(NaN=NaN)» is true

XPath provides a number of operators and functions that act on numeric values:

The numerical comparison operators «<», «<=», «>», and «>=». Note that within a stylesheet, you may need to use XML escape conventions to write these, for example «&lt;» in place of «<»

The numerical equality operators «=» and «!=»

The unary minus operator «-»

The multiplicative operators «*», «div», and «mod»

The additive operators «+» and «-»

The number() function, which can convert from any value to a number

The string() and format-number() functions, which convert a number to a string

The boolean() function, which converts a number to a Boolean

The functions round(), ceiling(), and floor(), which convert a number to an integer

The function sum(), which totals the numeric values of a set of nodes

Operators on numbers behave exactly as specified by IEEE 754. XPath is not as strict as Java in defining exactly what rounding algorithms should be used for inexact results, and in what sequence operations should be performed. Many implementations, however, will follow the Java rules.

XPath numeric operators and functions never produce an error. An operation that overflows produces positive or negative infinity, an operation that underflows produces positive or negative zero, and an operation that has no other sensible result produces NaN. All numeric operations and functions with NaN as an operand produce NaN as a result. For example, if you apply the sum() function to a node-set, then if the string value of any of the nodes cannot be converted to


String Values

A string value in XPath is any sequence of zero or more characters, where the alphabet of possible characters is the same as in XML: essentially the characters defined in Unicode.

String values can be written in XPath expressions in the form of a literal, using either single quotes or double quotes, for example 'John' or "Mary". In theory, the string can contain the opposite quote character as part of the value, for example "John's". In practice, however, XPath expressions are written within XML attributes, so the opposite quote character will generally already be in use for the attribute delimiters. For more details, see the section Literal on page 389.

There is no special null value, as there is in SQL. Where no other value is appropriate, a zero-length string is used. In fact the terms null string and empty string are used interchangeably to refer to a zero-length string.

The only ASCII control characters permitted (codes below #x20) are the whitespace characters #x9, #xA, and #xD (tab, newline, and carriage return).

Strings may be compared using the «=» and «!=» operators. They are compared character by character (there is no space-padding as in SQL). The implementation is allowed to normalize the strings before comparing them, to handle different Unicode representations of the same accented character, but it is not required to do so. There is no operator or function provided to compare two strings ignoring case: the best you can achieve (if you know the strings are restricted to a limited alphabet such as ASCII) is to convert from lowercase to uppercase, or vice versa, using the translate() function. Otherwise, use an external user-defined function.
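A case-insensitive comparison along these lines might be sketched as follows, assuming both strings use only unaccented ASCII letters. The variable names and the two sample strings are hypothetical.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- The two alphabets used by translate() to fold case. -->
  <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

  <xsl:template match="/">
    <xsl:variable name="a" select="'Boston'"/>
    <xsl:variable name="b" select="'BOSTON'"/>
    <!-- Convert both strings to uppercase before comparing. -->
    <xsl:if test="translate($a, $lower, $upper) = translate($b, $lower, $upper)">
      <xsl:text>equal, ignoring case</xsl:text>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Characters outside the two alphabets are left unchanged by translate(), so accented letters would still be compared exactly.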

There are no operators or functions to compare whether one string is greater than or less than another in alphabetic sequence: the «<» and «>» operators always force a numeric comparison. The only way around this (other than writing an external function) is to do a sort; write both strings as elements on a temporary tree, then process the elements on the tree in sorted order.

When counting characters in a string, for example in the string-length() function, it is the number of XML characters that is relevant, not the number of 16-bit Unicode codes. This means that Unicode surrogate pairs are counted as a single character. Unicode surrogate pairs, which are used to extend Unicode beyond 65,535 characters, are very rarely encountered in practice, though their use may increase in the future.

Node-set Values

A node-set is a set of nodes in the source document tree. If there are multiple source document trees, a node-set may contain nodes from more than one tree. The nodes may also come from temporary trees, or from trees passed into the stylesheet from external functions or as parameters.

The nodes in a node-set may be any type of node, and different types of node can be mixed in the same node-set. It is a pure mathematical set; each node can appear at most once, and there

to find out whether the current element has a name attribute

The nodes in a node-set may have children, but the children are not regarded as members of the node-set. For example, the expression «/» returns a node-set containing a single node, the root. The other nodes subordinate to the root can be reached from this node, but they are not themselves members of the node-set, and the value of «count(/)» is therefore always 1.

A node-set is not intrinsically ordered, though in many contexts the nodes are processed in document order. Where two nodes come from the same document, their relative position in document order is based on their position in the document tree: for example an element precedes its children in document order, and sibling nodes are listed in the same order as they appear in the original source document. Where two nodes come from different documents, their relative order is undefined. The ordering of attribute and namespace nodes is defined only partially: an element node is followed by its namespace nodes, then its attributes, and then its children, but the ordering of the namespace nodes among themselves, and of the attribute nodes among themselves, is undefined.

Temporary Trees

XSLT allows you to construct a new tree, and to refer to this tree using a variable. In XSLT 1.0, such trees are considered to be a data type in their own right, known as a result tree fragment. A result tree fragment behaves exactly like a node-set containing a single node, the root of the tree, but it cannot be used in every context where node-sets are allowed. Specifically, there are only two things you can do with a result tree fragment: you can copy it to another tree using the <xsl:copy-of> instruction, or you can extract its string value (the concatenation of all text nodes in the tree). Many vendors, however, provide an extension function to convert a result tree fragment to the equivalent node-set.

With XSLT 1.1, a temporary tree behaves exactly like a node-set, and can be used anywhere that

a node-set can be used The conversion to a node-set is implicit, whenever you use a tree-valuedvariable in a context that expects a node-set So the additional data type has disappeared, andwith it the rather awkward term "result tree fragment"

In fact, XSLT 1.1 no longer has a name for this concept at all. I've found I need a name to explain the concept in this book, so I've invented the terms "temporary tree" and "tree-valued variable". These terms aren't found in the XSLT Recommendation.

A temporary tree always contains a root node, and the root node may have children. A tree does not necessarily correspond to a well-formed XML document: for example, the root node can own text nodes directly, and it can have more than one element node among its children. However, it must conform to the same rules as an XML external parsed entity: for example, all the attributes belonging to an element node must have distinct names.

Example: Temporary Trees

A temporary tree is constructed by instantiating the body of an <xsl:variable> declaration, for example:

<xsl:variable name="tree"
>AAA<xsl:element name="x">
<xsl:attribute name="att">att-value</xsl:attribute
>BBB</xsl:element>

(The strange layout is to ensure there is no white space adjacent to the text values, because that would be difficult to show on the tree diagram. The effects of whitespace are discussed in Chapter 3.)

This creates the tree illustrated in the diagram below. Each box shows a node; the three layers are respectively the node type, the node name, and the string-value of the node. Once again, an asterisk indicates that the string-value is the concatenation of the string-values of the child nodes.

In XSLT 1.0, as we have seen, there are only two things you can do with this tree once it has been constructed: you can copy it to the current destination tree (which might be the final result tree or another temporary tree) using the <xsl:copy-of> instruction, or you can convert its value to a string. Converting it to a string gives the concatenation of all the text nodes in the tree: in the above example this is «AAABBBCCC».

In XSLT 1.1 you can manipulate the tree in many other ways For example:

You can apply templates to process its nodes, using <xsl:apply-templates>. It's a good idea to use a different mode, so that you know which template rules are supposed to apply to which tree.

You can access its nodes explicitly using a path expression, for example <xsl:value-of select="$tree/x/@att"/> will output the text «att-value».

You can use the sum() and count() functions, for example «count($tree//*)» will return 2 (the number of element nodes on the tree).

This means, for example, that you can use a temporary tree as a lookup table. The following stylesheet fragment uses data held in a temporary tree to get the name of the month, given its number held in a variable $mm:

Chapter 8 Contents

Overview
When are Extension Functions Needed?
Calling Extension Functions
What Language is Best?
Binding Extension Functions
XPath Trees and the DOM
The Java Language Binding
Identifying the Java Class
Choosing a Java Method
Rules for Converting Arguments
Handling the Return Value
Using Java Extension Functions
The XSLTContext Object
The JavaScript Language

The process of calling a Java method falls into four stages:

Using the namespace prefix of the XPath function name to identify a Java class.

Using the local part of the XPath function name, and if necessary the data types of the supplied arguments, to identify a method defined in that Java class.

Converting the supplied arguments to Java objects.

Handling the return value from the method, and any exceptions it throws.

We'll look at these four stages in turn.

The Java language binding was designed to satisfy two slightly different objectives:

Firstly, to make it easy to call methods available in the standard Java class library directly from your stylesheet. This allows you to perform common tasks such as obtaining the current date, getting a random number, performing trigonometric calculations, or testing whether a particular file exists.

Secondly, to make it easy to write new Java methods to augment your stylesheet with application-specific logic, for example a method that is given a product code, and returns the product description by performing an SQL query on your product database.

The first objective resulted in an interface that makes it possible to call virtually any Java method. The main limitation is that if the method throws an exception, execution of the stylesheet will fail.

The second objective resulted in facilities that allow your Java code to determine details of the XSLT and XPath processing context, which we will come back to on page 590.

Identifying the Java Class

As we've seen, a call on an extension function will always use a prefixed name, for example «my:function()». The prefix must be declared in a namespace declaration that is in scope at this point in the stylesheet, and the system uses this namespace declaration to determine the relevant namespace URI. For example, if the namespace declaration is «xmlns:my="http://schmidt.de/functions"», then this will be the namespace URI we are looking for.

I would normally recommend using a namespace URI that identifies the required class directly, for example «xmlns:math="java:java.lang.Math"». However, the class and the namespace are distinct things, so giving them the same name would confuse this explanation. We can start overloading the names in this way once we've understood clearly what's going on.

The processor then searches for an <xsl:script> element whose implements-prefix attribute identifies a prefix that maps to this same namespace URI. Often of course it will be the same prefix, but as always, it is the namespace URI that matters.

If the processor finds such an <xsl:script> element, its src attribute identifies the fully-qualified Java class name, prefixed by «java:» to make it a valid URI.

For example, if the function call is

<xsl:value-of select="dt:new()"/>

then the <xsl:script> element might be

<xsl:script language="java" implements-prefix="dt" src="java:java.util.Date"/>

This identifies the required Java class as java.util.Date. This class should normally be on the class path. Alternatively, it is possible to specify the path explicitly, using the archive attribute; this is especially useful when running the processor within a web browser, since it allows the classes to be downloaded from the server on demand.

If there are several <xsl:script> elements for the same language with the same precedence

Choosing a Java Method

The local part of the name in the XPath function call is used to identify a method to be called.

If the name used in the call is «new», then the system looks for public constructors rather than public methods; in all other cases it looks for a method with a matching name. The matching is done as follows:

First of all, hyphens in the name are removed, and any character that immediately follows a hyphen is forced to upper case. For example, if the XPath function call is written as acct:get-account-number(), then the processor will look for a Java method called getAccountNumber(). This is purely a cosmetic process that allows both languages to use their normal naming conventions. If you know that you want to call the getAccountNumber() method, then you could equally well write the call as acct:getAccountNumber() if you prefer.

Next, the system identifies all the public methods in the chosen class that have this name. It is an error if there are none. Of course, a Java class can have several methods with the same name and different arguments (this capability is usually called method overloading), so when this happens the processor now needs to look at the arguments to the call.
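The name matching just described can be sketched in a few lines of Java using reflection. This is an illustration only, not the code of any actual processor: the class MethodNameMatcher and its methods are hypothetical names, and the special case of «new» (constructors) is left out.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

final class MethodNameMatcher {
    /** Removes hyphens and upper-cases the character that follows each hyphen. */
    static String toJavaName(String xpathLocalName) {
        StringBuilder sb = new StringBuilder();
        boolean upperNext = false;
        for (char c : xpathLocalName.toCharArray()) {
            if (c == '-') {
                upperNext = true;
            } else {
                sb.append(upperNext ? Character.toUpperCase(c) : c);
                upperNext = false;
            }
        }
        return sb.toString();
    }

    /** Collects the public methods of the class that have the converted name. */
    static List<Method> candidates(Class<?> cls, String xpathLocalName) {
        String javaName = toJavaName(xpathLocalName);
        List<Method> result = new ArrayList<>();
        for (Method m : cls.getMethods()) { // getMethods() returns public methods only
            if (m.getName().equals(javaName)) {
                result.add(m);
            }
        }
        if (result.isEmpty()) {
            throw new IllegalArgumentException(
                    "No public method " + javaName + " in " + cls.getName());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toJavaName("get-account-number"));
        // java.lang.Math overloads max(), so several candidates remain at this stage.
        System.out.println(candidates(Math.class, "max").size());
    }
}
```

A name written without hyphens passes through unchanged, which is why both spellings of the call reach the same Java method.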

The system next eliminates any method that has the wrong number of arguments. This doesn't necessarily mean it must have exactly the same number of arguments as the function call. There are two factors that complicate the calculation:

If the first argument in the Java method signature is of class org.w3c.xsl.XSLTContext, then this argument doesn't contribute to the total. The processor will supply the value of this argument, as discussed on page 590, so it doesn't correspond to any argument in the XPath function call.

If the Java method being considered is an instance-level method (as distinct from a static method or a constructor), then the XPath function call must supply one extra argument: its first argument must be an external object, which is used as the target of the method call.

Any methods that have the wrong number of arguments, after taking these adjustments into account, are eliminated from consideration.
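As a rough sketch of the two adjustments, the following Java code works out how many arguments an XPath call would need to supply for a given method. The interface ContextStandIn is a hypothetical stand-in for org.w3c.xsl.XSLTContext (which is not part of the JDK), and the Account class and helper names are invented purely for the illustration.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

final class ArgumentCount {
    /** Hypothetical stand-in for org.w3c.xsl.XSLTContext. */
    interface ContextStandIn { }

    /** Invented sample class with one static and one instance-level method. */
    static class Account {
        public static String describe(ContextStandIn context, String code) {
            return code;
        }
        public int balance(int year) {
            return year;
        }
    }

    /** Number of arguments the XPath function call must supply for this method. */
    static int expectedXPathArguments(Method method) {
        Class<?>[] params = method.getParameterTypes();
        int count = params.length;
        if (count > 0 && params[0] == ContextStandIn.class) {
            count--; // the context argument is supplied by the processor
        }
        if (!Modifier.isStatic(method.getModifiers())) {
            count++; // the target object is passed as an extra first argument
        }
        return count;
    }

    /** Convenience lookup on the sample class, wrapping the checked exception. */
    static int expectedFor(String name, Class<?>... paramTypes) {
        try {
            return expectedXPathArguments(Account.class.getMethod(name, paramTypes));
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(expectedFor("describe", ContextStandIn.class, String.class));
        System.out.println(expectedFor("balance", int.class));
    }
}
```

Here the static method takes two Java parameters but expects one XPath argument, while the instance-level method takes one Java parameter but expects two.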

If this still leaves more than one candidate method, the selection now depends on the data types of the arguments to the XPath function call. Since XPath is a dynamically-typed language, the data types of the arguments can't always be determined in advance, so this final stage may have to be done at run-time. Potentially, the same XPath function call might cause different Java methods to be executed on different occasions.

The basic principle is to choose the method whose declared argument types are the best fit to the supplied argument types. The actual algorithm is similar to the one used by Java itself. Although it isn't the way the rules are described in the specification, the way I like to explain the algorithm is by showing that it chooses the method that requires minimum conversion effort for the arguments. Between any XPath data type and any Java class there is a certain conversion effort: the effort is zero if the types correspond exactly, and it is infinite if the conversion is not allowed. So for each candidate method, it is possible to calculate the conversion effort for each argument (the actual "effort" values are given in the next section). If the XPath data type of the supplied argument is A, and the Java class of the required argument is P, then the conversion effort is a function of A and P: say effort(A, P). The table starting on page 577 shows us, for example, that effort(number, double) = 0 and effort(number, String) = 16.

Suppose that the XPath data types of the supplied arguments are A1, A2, and A3, and that we are trying to choose between two Java methods, method P whose declared arguments are of Java classes P1, P2, and P3, and method Q whose arguments are of Java classes Q1, Q2, and Q3.

We don't simply work out the total effort by adding the conversion effort for each argument. Rather, we have to find a method where, for each argument individually, the conversion effort for that argument is no worse than for any other method.

If for each of the arguments n = 1..3 (in our example), either Pn is the same class as Qn, or effort(An, Pn) < effort(An, Qn), then method P is preferred to method Q. (There must be at least one argument where the two methods differ, otherwise Java would have rejected the methods as duplicates.) If there is one argument where effort(An, Pn) < effort(An, Qn), and another where effort(Am, Pm) > effort(Am, Qm), then neither P nor Q is preferred.

If, under these rules, one of the candidate methods is preferred to each one of the others, then that is the one chosen. If none of the methods is a clear winner, then an error is reported.
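The preference rule can be expressed compactly in Java. The sketch below is an illustration rather than the algorithm of any particular processor: the class MethodPreference and its methods are hypothetical, the effort values are passed in by the caller, and only the two values quoted above (0 and 16) are used in the demonstration. Candidates whose conversion is not allowed are assumed to have been removed beforehand.

```java
final class MethodPreference {
    /**
     * True if method P is preferred to method Q: for every argument, either the
     * declared classes are the same or P needs strictly less conversion effort.
     */
    static boolean isPreferred(Class<?>[] p, int[] effortP, Class<?>[] q, int[] effortQ) {
        for (int n = 0; n < p.length; n++) {
            if (p[n] != q[n] && !(effortP[n] < effortQ[n])) {
                return false;
            }
        }
        return true;
    }

    /**
     * Returns the index of the candidate that is preferred to each of the others,
     * or -1 if there is no clear winner (which the text treats as an error).
     */
    static int choose(Class<?>[][] signatures, int[][] efforts) {
        for (int i = 0; i < signatures.length; i++) {
            boolean winner = true;
            for (int j = 0; j < signatures.length; j++) {
                if (i != j && !isPreferred(signatures[i], efforts[i],
                                           signatures[j], efforts[j])) {
                    winner = false;
                }
            }
            if (winner) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Two XPath numbers are supplied; effort(number, double) = 0,
        // effort(number, String) = 16.
        Class<?>[][] signatures = {
            { double.class, String.class },
            { String.class, double.class },
            { double.class, double.class }
        };
        int[][] efforts = { { 0, 16 }, { 16, 0 }, { 0, 0 } };
        System.out.println(choose(signatures, efforts));
    }
}
```

With only the first two candidates, each is better on one argument and worse on the other, so neither is preferred; adding the third candidate gives a clear winner.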

To put this in concrete terms, suppose that the function call is

Method Effort for converting argument 1 Effort for converting argument 2

D were not available, the function call would be bounced as an error

Conversion Effort Table

The table below is used to calculate the conversion effort for converting an XPath value to a Java object (or value). The XPath data type of the supplied argument is listed along the top of the table, the Java class of the required argument in the leftmost column. The numbers in this table, of course, are just relative indications of conversion cost. We are only interested in knowing that it is less effort to convert a boolean to a String than to a Float; the actual values are irrelevant.


Rules for Converting Arguments

We've given a table that shows the notional effort in performing each possible conversion, but now we need to explain exactly how the conversions are done. What does it mean, for example, to convert an XPath boolean to a Java byte?

The rules are given in the following sections, organised according to the XPath data type of the supplied argument value.

Converting from an XPath boolean

Boolean.TRUE or Boolean.FALSE

«false»

double, float, long, int, short, byte, Double, Float, Long, Integer, Short, Byte: False is supplied as 0, true as 1.

Converting from an XPath number

float, Float: The value is rounded to the precision that can be accommodated in a float value. Values that are out of range for a float are converted to plus or minus infinity.

long, Long, int, Integer, short, Short, char, Character, byte, Byte: The value is truncated to an integer (rounded towards zero) by removing its fractional part. If the value is out of range for the target data type, an error is reported. An error is also reported if the supplied value is NaN (not a number).

boolean() function; zero becomes false, anything else becomes true

the XPath string() function

object
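The rules for converting an XPath number to a Java float or integer type can be sketched as follows. This is a minimal illustration for float and int only; the class NumberConversion is a hypothetical name, and IllegalArgumentException is simply a stand-in for however a real processor reports the error.

```java
final class NumberConversion {
    /** XPath number to Java float: rounded to float precision; overflow gives infinity. */
    static float toFloat(double xpathNumber) {
        return (float) xpathNumber; // Java's narrowing conversion already behaves this way
    }

    /** XPath number to Java int: truncate towards zero; NaN or out of range is an error. */
    static int toInt(double xpathNumber) {
        if (Double.isNaN(xpathNumber)) {
            throw new IllegalArgumentException("NaN cannot be converted to int");
        }
        double truncated = xpathNumber < 0 ? Math.ceil(xpathNumber) : Math.floor(xpathNumber);
        if (truncated < Integer.MIN_VALUE || truncated > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(xpathNumber + " is out of range for int");
        }
        return (int) truncated;
    }

    public static void main(String[] args) {
        System.out.println(toInt(3.9));
        System.out.println(toInt(-3.7));
        System.out.println(toFloat(1e300));
    }
}
```

Note that a plain Java cast from double to int would silently clamp out-of-range values and turn NaN into zero, so the explicit checks are what make the sketch follow the rule in the table.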

Converting from an XPath string

char, Character: If the string is one character long, this character is supplied as the value. If the string is zero-length or longer than one character, an error is reported.

Double, Float, Long, Integer, Short, Byte

The XPath string is converted to an XPath number using the number() function, and the resulting XPath number is then converted to the required Java type following the rules given in the previous table.

Converting from an XPath node-set

org.w3c.dom.NodeList: The nodes in the node-set are presented in the form of a DOM NodeList. The list will be in document order. An error may be reported if the node-set contains any namespace nodes.

following the rules of the XPath string() function (This gets the string value of the first node if there is at least one, or an empty string if the node-set is empty.)

org.w3c.dom.NodeList

using the XPath string() function. If this string contains exactly one character, that character is supplied; otherwise, an error is reported.

Double, Float, Long, Integer, Short, Byte: The node-set is converted to a number using the XPath number() function, and this is then converted to the required Java type in the same way as if an XPath number had been supplied as the argument. This will cause an error if the value is not numeric.

the node-set is empty, true if it contains at least one node

Converting from an external object

Any Java class C such that «OBJ instanceof C» is true, where OBJ is the wrapped external object: Unwrap the external object.

char, Character: Call the wrapped object's toString() method. Use this value if it is exactly one character long, otherwise report an error.

Byte

Call the wrapped object's toString() method, then convert this string to a number using the rules of the XPath number() function. Convert the resulting number to the required Java type as if an XPath number had been supplied in the function call.

Handling the Return Value

If the Java method throws an exception, an error is reported. It is not possible for the stylesheet to trap the error and continue processing. In practice this means that it's a bad idea to call Java methods that throw exceptions unless (a) you can prevent the exception occurring by making sure you get all the arguments right, or (b) the error would be fatal anyway.

If this is a problem, you can get round it by writing a wrapper method in Java that traps the exception and indicates the failure to the stylesheet within the return value, for example by returning a special value such as -1.
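A wrapper of this kind might look like the sketch below. The class SafeParse and the method parseQuantity() are hypothetical names chosen for the illustration, and the approach only works when -1 can never be a legitimate result.

```java
final class SafeParse {
    /**
     * Wrapper intended to be called from a stylesheet: it never throws,
     * and signals failure by returning -1 instead.
     */
    public static double parseQuantity(String text) {
        try {
            return Integer.parseInt(text.trim());
        } catch (RuntimeException e) {
            // Covers NumberFormatException for bad input and
            // NullPointerException if no string was supplied.
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseQuantity(" 42 "));
        System.out.println(parseQuantity("forty-two"));
    }
}
```

The stylesheet can then test the returned value and decide for itself how to handle the failure.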

If the Java method returns void, the result of the XPath function is (arbitrarily) an empty node-set. (The only point in calling a method that returns void is to cause some side effects, so take care; see the notes on functions with side-effects on page 587.)

If the Java method returns null, the result of the XPath function is an external object that is null. Unfortunately there is no direct way to test an external object to see if it is null, except by passing it to another extension function. To confound matters further, Java doesn't have a simple method you can call to test if an object is null. I hunted around the Java API documentation and the best I could find was the static method identityHashCode() on class java.lang.System, which returns zero if the argument is null. So you can write:
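The behaviour this trick relies on can be checked directly in Java. The sketch below is only an illustration with hypothetical class and method names. Note that the Java documentation guarantees zero for a null reference, but does not strictly promise a non-zero value for every real object, so the test is a practical one rather than a watertight one.

```java
final class NullTest {
    /** Uses the identityHashCode() trick: the hash code of the null reference is zero. */
    static boolean looksNull(Object obj) {
        return System.identityHashCode(obj) == 0;
    }

    public static void main(String[] args) {
        System.out.println(looksNull(null));
        System.out.println(looksNull(new Object()));
    }
}
```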

Simple return values are converted to XPath values as follows:

Value returned by Java method Result of XPath function call

deliberately doesn't define what happens if the string contains characters that are valid in Java but not in XML, for example the ASCII control character NUL (#x00). It's an error if the string contains such characters, but the system isn't obliged to report it.

Byte

If the Java method returns a DOM NodeList, or a single Node, the rules can become quite complex, and at the time of writing they aren't very clearly defined. Some of the complications are:

Many products will only be able to handle DOM data structures that were constructed using that particular vendor's product. In general, you can't expect to construct a DOM using, say, the Xerces XML parser, and then pass this DOM into the Oracle XSLT engine for processing: it will only accept a DOM constructed using its stable-mate Oracle parser.

What happens if you call a sequence of extension functions, and several functions return the same DOM node? Will these each return the same XPath node (producing the same value for the generate-id() function), or different XPath nodes? For a product that uses the DOM as its internal data structure, this probably isn't too much of a problem; you'll get the same node back each time. But for a product that performs a mapping or conversion between the DOM structure and its own internal tree representation, there are considerable difficulties in preserving node identity, especially as the Java DOM specification makes no guarantee that the same node in the DOM will always be represented by the same Java object - the DOM specification is deliberately written so that nodes can be materialized on demand, for example by fetching data from a database.

The XSLT specification does allow extension functions to return NodeLists or individual Nodes, but it doesn't really answer all these questions, so if you do this, you will probably find that your stylesheet is not fully portable.

Using Java Extension Functions

The simplest extension functions to write, and the kind to stick to whenever you can, are stateless methods, which get no external information from their environment and leave nothing behind when they have returned. These are best written as static methods, and we'll look at these first.

For more complex functions it may be necessary to maintain information between calls: for example, if you want to generate a sequence of random numbers. This will normally require the use of an instance-level method, so we'll discuss those afterwards.

Calling Static Methods

To call a static method all you need to know is the class name, the method name, and the expected arguments.

A typical example is the method max(a, b) found in the class java.lang.Math. This takes two numbers as arguments and returns whichever of them is the greater.

You can call this function as follows:

<xsl:script implements-prefix="Math" xmlns:math="java:java.lang.Math"

is the one that minimizes the conversion effort

Notice that both the arguments we supplied were XPath numbers. If we hadn't used the number() function to convert them to numbers, they would have been node-sets. Looking back at the table on page 577, the effort for converting an XPath node-set to a Java double, a float, a long, or an int is exactly the same. This means that none of the four methods would be preferred, and an error would be reported.

Static methods such as max() fit very cleanly into the XSLT processing model, because they have no side effects. It doesn't matter how often they are called, the same function with the same arguments will always produce the same results. So you don't need to worry about the order of execution of instructions in the stylesheet. As we'll see, the same can't be said of all external methods.

Calling Constructors and Instance-level Methods

You can call a Java constructor to create a new instance of a class as if it were a static method called new().

The result of a constructor is a Java object, and this will be wrapped inside an XPath value with the special data type external object. The only thing you can do with an external object is to pass it to another extension function. You can either pass it as an argument to a method, or you can invoke its own instance-level (that is, non-static) methods. To invoke an instance-level method of an object, you supply that object as the first argument of the XPath function call, followed by the values of the ordinary arguments if there are any.

For example, suppose you want to output the current date. The Java class java.util.Date has a default constructor that creates a Date object representing the current date and time; and the Date object has an instance-level method toString() that outputs the date in the format «Sat Jan 06 14:32:29 GMT 2001». You can use the XPath substring() function to extract the parts of the date you actually want to display.

So, here's the stylesheet code to do this:

<xsl:script language="java" implements-prefix="Date"

This will insert into the output file a comment of the form:

<!-- Output generated at 14:32 on 06 Jan 2001 -->

There are two external function calls here: a call to Date:new(), which calls the zero-argument constructor of the java.util.Date class to create a Date object, and a call to Date:toString(), which takes this object as its only argument. Because Date:toString() is identified as an instance-level (non-static) method, the first argument of the XPath function call must be a Date object, and this Date object is taken as the target of the method invocation.
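The rearrangement of the date string can be mimicked in plain Java to see which parts are being extracted. This is a sketch under assumptions, not the stylesheet itself: the class DateComment and the method describe() are hypothetical, and the character positions assume the layout shown in the example string.

```java
final class DateComment {
    /** Rearranges a string laid out like "Sat Jan 06 14:32:29 GMT 2001". */
    static String describe(String dateString) {
        String time = dateString.substring(11, 16);  // hours and minutes
        String day = dateString.substring(8, 10);
        String month = dateString.substring(4, 7);
        // The year is the last field; taking it from the end allows for
        // time zone names that are not exactly three characters long.
        String year = dateString.substring(dateString.length() - 4);
        return "Output generated at " + time + " on " + day + " " + month + " " + year;
    }

    public static void main(String[] args) {
        System.out.println(describe("Sat Jan 06 14:32:29 GMT 2001"));
        // The zero-argument constructor and toString() used by the stylesheet:
        System.out.println(describe(new java.util.Date().toString()));
    }
}
```

Java's substring() counts positions from 0 and takes an end position, whereas XPath's substring() counts from 1 and takes a length, so the time would be extracted in XPath with arguments 12 and 5.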

The value returned by the first function call and passed to the second is not one of the standard XPath data types, it is an external object. In this example we used the external object immediately, by passing it directly to another function call, but we could equally well have stored it in a variable for use later, or passed it as a parameter to a template. If you do store an external object in a variable, however, take care not to attempt any XPath operations on it: apparently harmless operations such as comparing it with another value, or converting it to a string or a boolean, will all cause errors.

Although this example requires the set of XPath data types to be extended to incorporate the idea of an external object, it still fits very cleanly into the XSLT processing model, because both the function calls are free of side effects. In this particular case it's no longer true that we'd get the same result every time we called the function - the whole point of this particular function is that the results depend on when it was called. But we're still on pretty safe ground. The next example will tread into slightly more dangerous territory.

Calling External Functions within a Loop

In the next example we will use a Java BufferedReader object to read an external file, copying it to the output one line at a time, each line being followed by an empty <br/> element.

Example: Calling External Functions within a Loop

Take your arrows, jasper-headed,

Take your war-club, Puggawaugun,

And your mittens, Minjekahwan,

And your birch-canoe for sailing,

And the oil of Mishe-Nama

Stylesheet

The stylesheet can be downloaded as reader.xsl

First we declare the namespaces we will need. Since the namespaces are used both in the <xsl:stylesheet> element itself. I shall stick to my convention of using the same «java:*» URI both to identify the location of the Java class, and as its namespace URI, and I will also use the abbreviated class name as the namespace prefix. You won't usually want these namespaces appearing in the result document, so you can suppress them using exclude-result-prefixes.

exclude-result-prefixes="FileReader BufferedReader System">

<xsl:script language="java" implements-prefix="FileReader"

to test whether an external Java object is null. So we'll use the System.identityHashCode() trick that I mentioned earlier; this function returns zero if the argument is null.

Take your bow, O Hiawatha,<br/>

Take your arrows, jasper-headed,<br/>

Take your war-club, Puggawaugun,<br/>

And your mittens, Minjekahwan,<br/>

And your birch-canoe for sailing,<br/>

And the oil of Mishe-Nama.<br/>

</out>

In this example the function call does have side effects, because the $reader variable is an external Java object which holds information about the current position in the file being read, and advances this position each time a line is read from the file. In general, function calls with side effects are dangerous, because XSLT does not define the order in which statements are executed. But in this case, the logic of the stylesheet is such that an XSLT processor would have to be very devious indeed to execute the statements in any order other than the obvious one. The fact that the recursive call on the read-lines template is within an <xsl:if> instruction that tests the $line variable means that the processor is forced to read a line, test the result, and then, if necessary, make the recursive call to read further lines.
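The control flow of the read-lines template can be mirrored in Java to show why the order is forced. This is a rough analogue, not the stylesheet: the class ReadLines and its methods are hypothetical, and a StringReader stands in for the external file so that the sketch is self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class ReadLines {
    /** Reads one line, tests it, and only then recurses to read further lines. */
    static void readLines(BufferedReader reader, StringBuilder out) {
        String line;
        try {
            line = reader.readLine(); // side effect: advances the position in the file
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (line != null) { // the recursion depends on the value just read
            out.append(line).append("<br/>\n");
            readLines(reader, out);
        }
    }

    /** Wraps the copied lines in an <out> element, as in the example output. */
    static String render(String text) {
        StringBuilder out = new StringBuilder("<out>\n");
        readLines(new BufferedReader(new StringReader(text)), out);
        return out.append("</out>").toString();
    }

    public static void main(String[] args) {
        System.out.println(render("Take your bow, O Hiawatha,\nTake your arrows, jasper-headed,"));
    }
}
```

Because the recursive call sits inside the test on the value just read, there is no other order in which the reads could sensibly happen.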

The next example uses side effects in a much less controlled way, and in this case causes results that will vary from one XSLT processor to another.

Functions with Uncontrolled Side Effects

Just to illustrate the dangers of using functions with side effects, we'll include an example where the effects are not predictable.

Example: A Function with Uncontrolled Side Effects

Source

Like the previous example, this stylesheet doesn't access the primary source document, so any XML file will do. For example, you can use the stylesheet itself as the source document.

In this example we'll read an input file containing names and addresses, for example addresses.txt. We'll assume this file is created by a legacy application and consists of groups of five lines. Each group contains a customer number on the first line, the customer's name on the second, an address on lines three and four, and a telephone number on line five. Because that's the way legacy data files often work, we'll assume that the last line of the file contains the string «****».

15668

exclude-result-prefixes="FileReader BufferedReader System">

<xsl:script language="java" implements-prefix="FileReader"

$line3, $line4, and $line5 will be evaluated in the order we've written them. There is no guarantee of this, and there are processors in existence for which this stylesheet will fail to produce the expected output. The processor is quite at liberty, for example, not to evaluate a variable until it is used, which means that $line3 will be evaluated before $line2, and worse still, $line5 (because it is never used) may not be evaluated at all, meaning that instead of reading a group of five lines from the file, the template will only read four lines each time it is invoked.

So this stylesheet may work on some XSLT processors but it will fail on others.
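The effect of lazy evaluation can be simulated in Java. The sketch below is an analogy only: the Lazy class, the readGroup() method, and the order in which the variables are used are all assumptions made for the demonstration, with a queue of strings standing in for the file.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.function.Supplier;

final class LazyOrder {
    /** A variable whose value is computed on first use and then remembered. */
    static final class Lazy<T> implements Supplier<T> {
        private final Supplier<T> source;
        private T value;
        private boolean evaluated;

        Lazy(Supplier<T> source) {
            this.source = source;
        }

        @Override
        public T get() {
            if (!evaluated) {
                value = source.get();
                evaluated = true;
            }
            return value;
        }
    }

    /** Declares five variables that each read one line, then uses only four of them. */
    static String readGroup(Deque<String> file) {
        Lazy<String> line1 = new Lazy<>(file::poll);
        Lazy<String> line2 = new Lazy<>(file::poll);
        Lazy<String> line3 = new Lazy<>(file::poll);
        Lazy<String> line4 = new Lazy<>(file::poll);
        Lazy<String> line5 = new Lazy<>(file::poll); // declared but never used
        // line3 happens to be used before line2, so it receives the second line.
        return line1.get() + "|" + line3.get() + "|" + line2.get() + "|" + line4.get();
    }

    public static void main(String[] args) {
        Deque<String> file = new ArrayDeque<>(Arrays.asList("A", "B", "C", "D", "E", "F"));
        System.out.println(readGroup(file));
        System.out.println(file.size() + " lines left unread");
    }
}
```

Only four lines are consumed, so the next group would start one line too early, and the variable declared as the third line actually holds the second.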

Output

The output depends entirely on the sequence of execution chosen by the particular XSLT processor.

As with the previous example, I couldn't find an XSLT processor that was capable of running this stylesheet at the time of writing.

In practice I would probably tackle this problem a completely different way. With a TrAX-based processor such as Saxon or Xalan, I would write a Java application to read the lines of the file and output them as a set of element nodes, as if the application were a SAX2 parser. I would specify a URIResolver that invokes this application if the URI takes a particular value, and then I would call the document() function from the stylesheet, with this URI as the argument value, to retrieve the contents of the data file as a secondary input document.

With MSXML3, I would probably adopt a similar approach, but using the DOM. I would write a JavaScript extension function that reads the file and populates a DOM, which it then returns to the stylesheet as a result of the extension function call.

Both these solutions are safer because they involve a single function call from the stylesheet to retrieve the entire file contents, so there is no dependency on the order of execution.
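A URIResolver along these lines might be sketched as follows. This is a simplification under stated assumptions rather than the approach exactly as described: it builds a small DOM instead of emitting SAX2 events, the URI «urn:example:addresses» and the element names lines and line are invented for the example, and the file contents are passed in as a string so that the code is self-contained.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Source;
import javax.xml.transform.URIResolver;
import javax.xml.transform.dom.DOMSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class LinesResolver implements URIResolver {
    /** Hypothetical URI that the stylesheet would pass to document(). */
    static final String SPECIAL_URI = "urn:example:addresses";

    private final String fileText;

    LinesResolver(String fileText) {
        this.fileText = fileText;
    }

    /** Wraps each line of the text in a <line> element under a <lines> root. */
    static Document toDocument(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("lines");
            doc.appendChild(root);
            for (String line : text.split("\n")) {
                Element element = doc.createElement("line");
                element.setTextContent(line);
                root.appendChild(element);
            }
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public Source resolve(String href, String base) {
        if (!SPECIAL_URI.equals(href)) {
            return null; // leave every other URI to the processor's default handling
        }
        return new DOMSource(toDocument(fileText), href);
    }

    public static void main(String[] args) {
        LinesResolver resolver = new LinesResolver("15668\nJ. Smith");
        System.out.println(resolver.resolve(SPECIAL_URI, null) != null);
    }
}
```

The resolver would be registered on the transformer with setURIResolver(), after which a single document() call in the stylesheet retrieves the whole file as a tree.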

The XSLTContext Object

One of the aims of the XSLT language designers was to allow users to create libraries of extension functions that mirror the capability and calling conventions of the core functions provided in XPath and XSLT, so calling an extension function should feel the same as calling a built-in function. Many of the core functions need to know the XPath context; for example, functions such as generate-id() operate on the context node if they are called with no arguments. So it was felt that extension functions should have access to information about the context. This is provided through an implicit extra parameter, which is supplied to a Java method if its signature is written to expect its first argument to be of class org.w3c.xsl.XSLTContext.

The XSLTContext object provides the following methods. None of them defines any specific exception conditions, though like any Java methods they are allowed to throw standard exceptions such as java.lang.IllegalArgumentException and java.lang.NullPointerException.

org.w3c.dom.Node getContextNode()

This method returns the XPath context node, represented as a DOM Node: the same node as retrieved by the XPath expression «.». You can use DOM methods to navigate from this node to other nodes in the tree. It isn't defined what happens in the unusual case where the context node is a namespace node (namespace nodes can't be represented directly in the DOM).

createProcessingInstruction, and importNode. If you call any other methods, the effect is undefined.

Object systemProperty(String namespaceURI, String localName)

This method returns the same result as the XPath system-property() function. Instead of supplying the property name as a QName as you would do in XPath, the two components of the expanded name are supplied separately. The type of the result depends on the property that was requested.

String stringValue(org.w3c.dom.Node node)

This method returns the string-value of the specified node, following the XPath rules. This is a convenience method, made available because it is surprisingly difficult to write an equivalent function that uses DOM interfaces only.

import org.w3c.dom.*;
import org.w3c.xsl.XSLTContext;

public abstract class Extensions {
    // Returns the XPath node type of the context node, as a string
    public static String nodeType(XSLTContext context) {
        Node node = context.getContextNode();
        if (node instanceof Document) return "root";
        if (node instanceof Element) return "element";
        if (node instanceof Attr) return "attribute";
        if (node instanceof ProcessingInstruction)
            return "processing-instruction";
        if (node instanceof Comment) return "comment";
        if (node instanceof Text) return "text";
        return "unknown node type";
    }
}

This method can then be invoked from a stylesheet as follows:

<xsl:if test="my:node-type() = 'attribute'">

<xsl:message terminate="yes">Context node is an

If you want to use such vendor-specific facilities, you can still keep your extension functions portable by using a structure such as:

public static String methodName(XSLTContext context) {
    if (context instanceof com.icl.saxon.Context) {
        com.icl.saxon.Context saxonContext =
            (com.icl.saxon.Context)context;
        // logic for Saxon
    } else if (context instanceof
            org.apache.xalan.extensions.ExpressionContext) {
        org.apache.xalan.extensions.ExpressionContext xalanContext =
