Sas Jacobs
Beginning XML with DOM and Ajax
From Novice to Professional
Beginning XML with DOM and Ajax: From Novice to Professional
Copyright © 2006 by Sas Jacobs
All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system, without the prior written permission of the copyright owner and the publisher.
ISBN-13 (pbk): 978-1-59059-676-0
ISBN-10 (pbk): 1-59059-676-5
Printed and bound in the United States of America 9 8 7 6 5 4 3 2 1
Trademarked names may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, we use the names only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.
Lead Editors: Charles Brown, Chris Mills
Technical Reviewer: Allan Kent
Editorial Board: Steve Anglin, Ewan Buckingham, Gary Cornell, Jason Gilmore, Jonathan Gennick, Jonathan Hassell, James Huddleston, Chris Mills, Matthew Moodie, Dominic Shakeshaft, Jim Sumser, Keir Thomas, Matt Wade
Project Manager: Beth Christmas
Copy Edit Manager: Nicole LeClerc
Copy Editor: Nicole Abramowitz
Assistant Production Director: Kari Brooks-Copony
Production Editor: Kelly Winquist
Compositor: Dina Quan
Proofreader: Dan Shaw
Indexer: Brenda Miller
Artist: Kinetic Publishing Services, LLC
Cover Designer: Kurt Krames
Manufacturing Director: Tom Debolski
Distributed to the book trade worldwide by Springer-Verlag New York, Inc., 233 Spring Street, 6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax 201-348-4505, e-mail orders-ny@springer-sbm.com, or visit http://www.springeronline.com.
For information on translations, please contact Apress directly at 2560 Ninth Street, Suite 219, Berkeley, CA 94710. Phone 510-549-5930, fax 510-549-5939, e-mail info@apress.com, or visit http://www.apress.com.
The information in this book is distributed on an “as is” basis, without warranty. Although every precaution has been taken in the preparation of this work, neither the author(s) nor Apress shall have any liability to any person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the information contained in this work.
The source code for this book is available to readers at http://www.apress.com in the Source Code section.
Contents at a Glance
About the Author
About the Technical Reviewer
Acknowledgments
Introduction
■ CHAPTER 1 Introduction to XML
■ CHAPTER 2 Related XML Recommendations
■ CHAPTER 3 Web Vocabularies
■ CHAPTER 4 Client-Side XML
■ CHAPTER 5 Displaying XML Using CSS
■ CHAPTER 6 Introduction to XSLT
■ CHAPTER 7 Advanced Client-Side XSLT Techniques
■ CHAPTER 8 Scripting in the Browser
■ CHAPTER 9 The Ajax Approach to Browser Scripting
■ CHAPTER 10 Using Flash to Display XML
■ CHAPTER 11 Introduction to Server-Side XML
■ CHAPTER 12 Case Study: Using .NET for an XML Application
■ CHAPTER 13 Case Study: Using PHP for an XML Application
■ INDEX
Contents
About the Author
About the Technical Reviewer
Acknowledgments
Introduction
■ CHAPTER 1 Introduction to XML
What Is XML?
A Brief History of XML
The Goals of XML
Understanding XML Syntax
Well-Formed Documents
Understanding the Difference Between Tags and Elements
Viewing a Complete XML Document
Understanding the Structure of an XML Document
Naming Rules in XML
Understanding the XML Document Prolog
Understanding Sections Within the XML Document Element
The XML Processing Model
XML Processing Types
DOM Parsing
SAX Parsing
Why Have Two Processing Models?
Some XML Tools
Summary
■ CHAPTER 2 Related XML Recommendations
Understanding the Role of XML Namespaces
Adding Namespaces to XML Documents
Adding Default Namespaces
Defining XML Vocabularies
The Document Type Definition
XML Schema
Comparing DTDs and Schemas
Other Schema Types
XML Vocabularies
Displaying XML
XML and CSS
XSL
XPath
XPath Expressions
Identifying Specific Nodes
Including Calculations and Functions
XPath Summary
Linking with XML
Simple Links
Extended Links
XPointer
XML Links Summary
Summary
■ CHAPTER 3 Web Vocabularies
XHTML
Separation of Presentation and Content
XHTML Construction Rules
XHTML Tools
Well-Formed and Valid XHTML Documents
XHTML Modularization
MathML
Presentation MathML
Content MathML
Scalable Vector Graphics
Vector Graphic Shapes
Images
Text
Putting It Together
Web Services
WSDL
SOAP
Other Web Vocabularies
RSS and News Feeds
VoiceXML
SMIL
Database Output Formats
Summary
■ CHAPTER 4 Client-Side XML
Why Use Client-Side XML?
Working with XML Content Client-Side
Styling Content in a Browser
Manipulating XML Content in a Browser
Working with XML in Flash
Examining XML Support in Major Browsers
Understanding the W3C DOM
Understanding the XML Schema Definition Language
Understanding XSLT
Microsoft Internet Explorer
Mozilla
Opera
Adobe (Formerly Macromedia) Flash
Choosing Between Client and Server
Using Client-Side XML
Using Server-Side XML
Summary
■ CHAPTER 5 Displaying XML Using CSS
Introduction to CSS
Why CSS?
CSS Rules
Styling XHTML Documents with CSS
Styling XML Documents with CSS
Attaching the Stylesheet
Selectors
Layout of XML with CSS
Understanding the W3C Box Model
Positioning in CSS
Displaying Tabular Data
Working with Display Properties
Working with Floating Elements
Table Row Spans
Linking Between Displayed XML Documents
XLink in Netscape and Firefox
Forcing Links Using the HTML Namespace
Adding Images in XML Documents
Adding Images with Netscape and Firefox
Using CSS to Add an Image
Using CSS to Add Content
Working with Attribute Content
Using Attributes in Selectors
Using Attribute Values in Documents
Summary
■ CHAPTER 6 Introduction to XSLT
Browser Support for XSLT
Using XSLT to Create Headers and Footers
Understanding XHTML, XSLT, and Namespaces
Creating the XSLT Stylesheet
Understanding the Stylesheet
Transforming the <body> Element
Applying the Transformation
Adding the Footer
Transformation Without Change
Creating a Table of Contents
Selecting Each Planet with <xsl:for-each>
Adding a New Planet
Presenting XML with XSLT
Moving from XHTML to XML
Styling the XML with XSLT
Removing Content with XSLT
Understanding the Role of XPath in XSLT
Including Images
Importing Templates
Including Templates
Tools for XSLT Development
Summary
■ CHAPTER 7 Advanced Client-Side XSLT Techniques
Sorting Data Within an XML Document
Sorting Dynamically with JavaScript
Adding Extension Functions (Internet Explorer)
Understanding More About Namespaces
Adding Extension Functions to the Stylesheet
Providing Support for Browsers Other Than IE
Working with Named Templates
Generating JavaScript with XSLT
Understanding XSLT Parameters
Understanding White Space and Modes
Working Through the onelinehtml Template
Finishing Off the Page
Generating JavaScript in Mozilla
XSLT Tips and Troubleshooting
Dealing with White Space
Using HTML Entities in XSLT
Checking Browser Type
Building on What Others Have Done
Understanding the Best Uses for XSLT
Summary
■ CHAPTER 8 Scripting in the Browser
The W3C XML DOM
Understanding Key DOM Interfaces
Examining Extra Functionality in MSXML
Browser Support for the W3C DOM
Using the xDOM Wrapper
xDOM Caveats
Using JavaScript with the DOM
Creating DOM Document Objects and Loading XML
XSLT Manipulation
Extracting Raw XML
Manipulating the DOM
Putting It into Practice
Understanding the Application
Examining the Code
Dealing with Large XML Documents
Summary
■ CHAPTER 9 The Ajax Approach to Browser Scripting
Understanding Ajax
Explaining the Role of Ajax Components
Understanding the XMLHttpRequest Object
Putting It Together
Username Validation with the XMLHttpRequest Object
Contacts Address Book Using an Ajax Approach
Using Cross-Browser Libraries
Sarissa
Other Ajax Frameworks and Toolkits
Backbase
Bindows
Dojo
Interactive Website Framework
qooxdoo
Criticisms of Ajax
Providing Visual Cues
Updating the Interface
Preloading Data
Providing Links to State and Enabling the Back Button
Ajax Best Practices and Design Principles
Minimizing Server Traffic
Using Standard Interface Methods
Using Wrappers or Libraries
Using Ajax Appropriately
Summary
■ CHAPTER 10 Using Flash to Display XML
The XML Class
Loading an XML Document
Understanding the XML Class
Understanding the XMLNode Class
Loading and Displaying XML Content in Flash
Updating XML Content in Flash
Sending XML Content from Flash
Using the XMLConnector Component
Loading an XML Document
Data Binding
Updating XML Content with Data Components
Understanding Flash Security
Summary
■ CHAPTER 11 Introduction to Server-Side XML
Server-Side vs. Client-Side XML Processing
Server-Side Languages
.NET
PHP
Working Through Simple Examples
The XML Document
Transforming the XML
Adding a New DVD
Modifying an Existing DVD
Deleting a DVD
Summary
■ CHAPTER 12 Case Study: Using .NET for an XML Application
Understanding the Application
Setting Up the Environment
Understanding the Components of the News Application
Summary
■ CHAPTER 13 Case Study: Using PHP for an XML Application
Understanding the Application
Setting Up the Environment
Understanding Components of the Weather Portal Application
Summary
■ INDEX
About the Author
■ SAS JACOBS is a web developer who set up her own business, Anything Is Possible, in 1994, working in the areas of web development, IT training, and technical writing. The business works with large and small clients building web applications with .NET, Flash, XML, and databases.
Sas has spoken at such conferences as Flashforward, webDU (previously known as MXDU), and FlashKit on topics related to XML and dynamic content in Flash.
In her spare time, Sas is passionate about traveling, photography, running, and enjoying life.
About the Technical Reviewer
■ ALLAN KENT is a born-and-bred South African and still lives and works in Cape Town. He has been programming in various languages and on diverse platforms for more than 20 years. He is currently the head of technology at Saatchi & Saatchi Cape Town.
Acknowledgments
I want to thank everyone at Apress for their help, support, and advice during the writing of this book. Thanks also to my family who has provided much support and love throughout the process.
Introduction
This book aims to provide a “one-stop shop” for developers who want to learn how to build Extensible Markup Language (XML) web applications. It explains XML and its role in the web development world. The book also introduces specific XML vocabularies and related XML recommendations.
I wrote the book for web developers at all levels. For those developers unfamiliar with XML applications, the book provides a great starting point and introduces some important client- and server-side techniques. More experienced developers can benefit from exposure to important coding techniques and understanding the workflow involved in creating XML applications.
The book starts with an explanation of XML and introduces the different components of an XML document. It then shows some related recommendations, including Document Type Definitions (DTDs), XML schema, Cascading Style Sheets (CSS), Extensible Stylesheet Language Transformations (XSLT), XPath, XLink, and XPointer. I cover some common XML vocabularies, such as Extensible HyperText Markup Language (XHTML), Mathematical Markup Language (MathML), and Scalable Vector Graphics (SVG).
The middle section of the book deals with client-side XML applications and shows how to display and transform XML documents with CSS and XSLT. This section also explores how the current web browsers support XML, and it covers how to use JavaScript to work with XML documents. In this section, I also provide an introduction to the Asynchronous JavaScript and XML (Ajax) approach.
The book finishes by examining how to work with XML on the server. It covers two server-side languages: PHP 5 and .NET 2.0. The last chapters of the book deconstruct two XML applications: a News application and a Community Weather Portal application.
The book includes lots of practical examples that developers can incorporate in their daily work. You can download the code samples from the Source Code area of the Apress web site at http://www.apress.com. I hope you find this book an invaluable reference to XML and that, through it, you see the incredible power and flexibility that XML offers to web developers.
CHAPTER 1
Introduction to XML
This chapter introduces you to Extensible Markup Language (XML) and explains some of its basic concepts. It’s an ideal place to start if you’re completely new to XML. The concepts that I introduce here are covered in more detail later in the book.
Web developers familiar with Extensible HyperText Markup Language (XHTML) are often unsure about its relationship with XML; it’s not always clear why they might need to learn about XML as well. Be assured that both technologies are important for developers.
XML is a metalanguage used for writing other languages, called XML vocabularies. XHTML is one of those vocabularies, so when you understand XML, you’ll also understand the rules underpinning XHTML. XHTML is HTML that conforms to XML rules, and you’ll find out more about this shortly.
XHTML has a number of limitations. It’s good at structuring and displaying information in web browsers, but its primary purpose is not to mark up data. XHTML can’t carry out advanced functions such as sorting and filtering content. You can’t create your own tags to describe the contents of an XHTML document. The fixed XHTML tags usually don’t bear any relationship to the type of content that they contain. For example, a paragraph tag is a generic container for any type of content.
XML addresses all of the limitations evident in HTML. It provides more flexibility than XHTML, as it works in concert with other standards that assist with presentation, organization, transformation, and navigation. XML documents are self-describing; their document structures can use descriptive tags to identify the content that they mark up.
I’ll cover these points in more detail within this chapter. I’ll explain more about XML and show why you might want to use it in your work. The chapter will cover:
• A definition and a short history of XML
• A discussion of how to write XML documents
• Information about the processing of XML content
When you finish this chapter, you should have a good understanding of XML and see where you might be able to use it in your work. I’ll start by explaining exactly what XML is and where it fits into the world of web development.
What Is XML?
The first and most important point about XML is that it’s not a language itself. Rather, it’s a metalanguage used for constructing other languages or vocabularies. XML describes the rules for how to create these vocabularies. Each language is likely to be different, but all use tags to mark up content. The choice of tag names and structures is flexible, and it’s common for groups to agree on standard XML vocabularies so that they can share information.
An example of an XML language is XHTML. XHTML describes a standard set of tags that you must use in a specific way. Each XHTML page contains two sections described by the <head> and <body> tags. Each of those sections can include only certain tags. For example, it’s not possible to include <meta> tags in the <body> section. Web developers around the world share the same standardized approach, and web browsers understand how to render XHTML tags.
XML is a recommendation of the World Wide Web Consortium (W3C), making it a standard that is free to use. The W3C provides a more formal definition of XML in its glossary at http://www.w3.org/TR/DOM-Level-2-Core/glossary.html:
Extensible Markup Language (XML) is an extremely simple dialect of SGML. The goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.
A Brief History of XML
XML came into being in 1998 and is based on Standard Generalized Markup Language (SGML). SGML is an international standard that you can think of as a language for defining other languages that mark up documents. HTML was based on SGML. One of the key points about SGML is that it’s difficult to use. XML aims to be much easier.
XML also owes much of its existence to HTML. HTML focused on the display of content; you couldn’t use it for more advanced features such as sorting and filtering. HTML wasn’t a very precise language, and it wasn’t case-sensitive. It was possible to write incorrect HTML content and still have a browser display the page correctly.
XML addresses many of the shortcomings found in HTML. In 1999, HTML was rewritten using the XML language construction rules as XHTML. The rules for construction of an XHTML document are more precise than those for HTML. The strictness with which these rules are enforced depends on which Document Type Declaration (DOCTYPE) you assign to the XHTML page. I’ll explain more about DOCTYPEs in Chapter 3.
Since 1998, it’s been clear that XML is a very powerful approach to managing information. XML documents allow for the sharing of data. A range of related W3C recommendations address transformation, display, and navigation within XML documents. You’ll find out more about these recommendations in Chapter 2.
Let’s summarize the key points:
• XML isn’t a language; its rules are used to construct other languages
• XML creates tag-based languages that mark up content
• XHTML is one of the languages created by XML as a reformulation of HTML
• XML is based on SGML
The Goals of XML
After the complexity of SGML, the W3C was very clear about its goals for XML. You can view these goals at http://www.w3.org/TR/REC-xml/#sec-origin-goals:
1. XML shall be straightforwardly usable over the Internet.
2. XML shall support a wide variety of applications.
3. XML shall be compatible with SGML.
4. It shall be easy to write programs which process XML documents.
5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
6. XML documents should be human-legible and reasonably clear.
7. The XML design should be prepared quickly.
8. The design of XML shall be formal and concise.
9. XML documents shall be easy to create.
10. Terseness in XML markup is of minimal importance.
A few things about these goals are worth noting. First, the W3C wants XML to be straightforward; in fact, several of the goals include the terms “easy” and “clear.”
Second, the W3C has given XML two targets: humans and XML processors. An XML processor or parser is a software package that processes an XML document. Processors can identify the contents of an XML document; read, write, and change an existing document; or create a new one from scratch.
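To make the idea of an XML processor concrete, here is a minimal sketch that uses Python’s standard xml.etree.ElementTree module. Python isn’t one of the languages covered in this book, so treat the choice of language as an assumption made purely for illustration; the element names and titles are hypothetical sample data.

```python
import xml.etree.ElementTree as ET

# Read: parse a (hypothetical) document and identify its contents.
root = ET.fromstring('<library><DVD id="1"><title>Sample Title</title></DVD></library>')
titles = [title.text for title in root.iter("title")]

# Change: update an attribute and add a child element to the existing document.
dvd = root.find("DVD")
dvd.set("id", "100")
ET.SubElement(dvd, "format").text = "Movie"
changed = ET.tostring(root, encoding="unicode")

# Create: build a new document from scratch and serialize it.
new_root = ET.Element("library")
new_dvd = ET.SubElement(new_root, "DVD", id="2")
ET.SubElement(new_dvd, "title").text = "Another Title"
created = ET.tostring(new_root, encoding="unicode")

print(titles)
print(changed)
print(created)
```

The three steps mirror the tasks listed above: identifying content, changing an existing document, and creating a new one.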
The aim is to open up the market for XML processors by keeping them simple to develop. Stricter construction rules mean that less processing is required. This in turn means that the targets for XML documents can be portable devices, such as mobile phones and PDAs.
By keeping documents human-readable, you can access data more readily, and you can build and debug applications more easily. The use of Unicode allows developers to create XML documents in a variety of languages. Unfortunately, a necessary side effect is that XML documents can be verbose, and describing data using XML can be a longer process than using other methods.
Third, note the term XML document. This term is broader than the traditional view of a physical document. Some XML documents exist in physical form, but others are created as a stream of information following XML construction rules. Examples include web services and calls to databases where the content is returned in XML format.
Now that you understand what XML is, let’s delve into the rules for constructing XML languages.
Understanding XML Syntax
XML allows you to construct your own tags. Instead of a generic XHTML paragraph tag, you could write markup such as:
<intro>Here is an introduction to XML.</intro>
In this example, the <intro> tag tells you the purpose of the text that it marks up. One big advantage of XML is that tags can describe their content; that’s why XML languages are often called self-describing.
XML is flexible enough to allow for the creation of many different types of languages to describe data. The only constraint on XML vocabularies is that they be well-formed.
Well-Formed Documents
XML documents are well-formed if they meet the following criteria:
• The document contains one or more elements
• The document contains a single document element, which may contain other elements
• Each element closes correctly
• Element names are case-sensitive
• Every attribute has a value, and attribute values are enclosed in quotation marks
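One way to see these criteria in action is to hand a few snippets to an XML parser and note which ones it rejects. The following sketch assumes Python and its standard xml.etree.ElementTree module, neither of which is used elsewhere in this book; the helper name is_well_formed and the sample snippets are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical snippets: the first is well-formed, the rest each break one rule.
samples = {
    "single document element": "<chapter><intro>Text</intro></chapter>",
    "two document elements": "<intro>One</intro><intro>Two</intro>",
    "element not closed": "<chapter><intro>Text</chapter>",
    "case mismatch in tags": "<intro>Text</Intro>",
    "unquoted attribute value": "<intro id=1>Text</intro>",
    "attribute without a value": "<td nowrap>A table cell</td>",
}

results = {label: is_well_formed(xml) for label, xml in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```

Only the first snippet passes. A parser reports the others as errors instead of trying to repair them, which is the main practical difference from the forgiving way that browsers treated HTML.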
I’ll describe all of these criteria throughout this chapter, but it’s worthwhile highlighting some points now. XML languages are case-sensitive; this means that the tag <intro> is not the same as <Intro> or <INTRO>. In XML, these are three different tags. Prior to the days of XHTML, HTML was case-insensitive, so <body> and <BODY> were equivalent tags.
All XML tags need to have an equivalent closing tag written in the same case as the opening tag. So the <intro> tag must have a matching </intro> tag. If no content exists between the opening and closing tags, you can abbreviate it into a single tag, <intro/>. Again, contrast this with HTML, where it was possible to write a single <p> tag to add a paragraph break.
The order of tags is important in XML. Tags that are opened first must close last:
<chapter><intro>Here is an introduction to XML.</intro></chapter>
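HTML pages had no such requirement. Overlapping tags, where the tag that opens first also closes first, would have been accepted in HTML, although they are unacceptable in XML.
In HTML, some attributes, such as the nowrap attribute in a <td> tag, didn’t need to contain an attribute name and value pair: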
<td nowrap>A table cell</td>
This type of tag construction isn’t possible in XML. You must replace it with something like this:
<td nowrap="true">A table cell</td>
Understanding the Difference Between Tags and Elements
You may have noticed that I’ve used the terms tag and element when talking about XML documents. At first glance, they seem interchangeable, but there’s a difference between the terms.
The term element describes opening and closing tags as well as any content. A tag is one part of an element. Tags start with an opening angle bracket and end with a closing angle bracket. Elements usually contain both an opening and closing tag as well as the content between.
The following line shows a complete element that contains the <intro> tag:
<intro>Here is an introduction to XML.</intro>
Now that you understand the construction rules, it’s time to look at a complete XML document.
Viewing a Complete XML Document
A complete piece of XML is referred to as a document. It doesn’t matter whether you’re dealing with XML that marks up text, information requested from a server, or records received from a database; all of these are documents.
Each XML document is made up of markup and character data. In general, the character data comprises the text between a start tag and an end tag, and everything else is markup. You can further divide markup into elements, attributes, text, entities, comments, character data (CDATA), and processing instructions.
The dvd.xml document illustrates the different parts of an XML document. You can download it, along with the other resource files, from the Source Code area of the Apress web site (http://www.apress.com). The document describes the contents of a small DVD library.
The XML document includes a comment describing its purpose:
<!-- This XML document describes a DVD library -->
I’ve added this comment as a guide for anyone reading the XML document. As with XHTML, developers normally use comments to add notations.
The document or root element is called <library>. You’ll notice that all elements within the document appear between the opening and closing <library> tags.
The document element contains a number of <DVD> elements, and each <DVD> element contains <title>, <format>, and <genre> elements. The <DVD> element also contains an id attribute. The <title>, <format>, and <genre> elements each contain text.
You can understand the structure and the contents of this document easily by looking at the tag names. It’s obvious, even without the comment, that this document describes a list of DVDs. You can also easily infer the relationship between all of the elements from the document.
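The structure described above can be explored with a short script. The sketch below assumes Python’s standard xml.etree.ElementTree module and uses a hypothetical, cut-down stand-in for dvd.xml: the element names, the comment, and the first DVD’s title, format, and genre come from this chapter, while the id value and the single-entry library are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for dvd.xml. The id value "1" is an assumption.
DVD_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<!-- This XML document describes a DVD library -->
<library>
  <DVD id="1">
    <title>Breakfast at Tiffany's</title>
    <format>Movie</format>
    <genre>Classic</genre>
  </DVD>
</library>
"""

root = ET.fromstring(DVD_XML)

# Turn each <DVD> element into a dictionary of its id attribute and child text.
dvds = [
    {"id": dvd.get("id"), **{child.tag: child.text for child in dvd}}
    for dvd in root.findall("DVD")
]

print(root.tag)
for dvd in dvds:
    print(dvd)
```

Because the tag names describe their content, the script needs no extra mapping to label each value; the dictionary keys come straight from the element names.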
Understanding the Structure of an XML Document
Each XML document is divided into two parts: the prolog and the document or root element. The prolog appears at the top of the XML document and contains information about the document. It’s a little like the <head> section of an XHTML document. In the XML document example, the prolog includes an XML declaration and a comment. It can also include other items, such as processing instructions or a Document Type Definition (DTD). You’ll find out more about these later in the “Processing Instructions” and “DTDs and XML Schemas” sections.
Well-formed XML documents must have a single document element that may optionally include other content. Any content within an XML document must appear within the document or root element. In the example XML document, the document element is <library>, and it contains all of the other elements.
You might wonder about the names that I’ve chosen for the elements within the XML document. You’re free to use any name for elements and attributes, providing that they conform to the rules for XML names.
Figure 1-1 shows the structure of an XML document.
Figure 1-1. The structure of an XML document
Naming Rules in XML
Elements, attributes, and some other constructs have names within XML documents. A name is made up of a starting character followed by name characters. Don’t forget that XML names are case-sensitive.
The starting character must be a letter or underscore; it can’t be a number. The name characters can include letters, digits, hyphens, underscores, and periods, but not spaces. Colons indicate namespaces in XML, so you shouldn’t include them within your names. You’ll learn more about namespaces in Chapter 2. To be sure that you’re using legal characters, it’s best to restrict yourself to the uppercase and lowercase letters of the Roman alphabet, numbers, hyphens, underscores, and periods.
If you’re authoring your own XML content as opposed to generating it automatically, it’s probably a good idea to adopt a standardized naming convention. You should also use descriptive names.
I prefer to write in CamelCase and start with a lowercase letter, unless the element name is capitalized normally:
<camelCaseElementName>Here is an element name</camelCaseElementName>
I tend to avoid using underscore characters in my names because I think it makes them harder to read.
The use of descriptive names makes it easier for humans to interpret the content. Imagine the difficulty you’d have with this:
<zyxtr>Some content</zyxtr>
Let’s summarize the rules for XML names:
• XML names must start with a letter or an underscore, not a number or other punctuation
• XML names cannot include spaces
• Don’t include a colon in a name unless it indicates a namespace
• XML names are case-sensitive
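As a rough illustration, the naming rules can be expressed as a regular expression. The sketch below assumes Python and deliberately checks only a conservative ASCII subset: letters, digits, hyphens, underscores, and periods. The XML specification also allows many non-ASCII characters, and colons for namespace prefixes, which this hypothetical helper leaves out.

```python
import re

# Simplified pattern: a letter or underscore first, then letters, digits,
# periods, underscores, or hyphens. This is stricter than the XML specification.
SIMPLE_XML_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_simple_xml_name(name: str) -> bool:
    """Return True if name fits the conservative ASCII naming subset."""
    return SIMPLE_XML_NAME.fullmatch(name) is not None


for candidate in ["camelCaseElementName", "_private", "2ndDVD", "DVD title", "-dash"]:
    print(candidate, is_simple_xml_name(candidate))
```

A check like this is handy when element names are generated from user input or database column names, where spaces and leading digits are common.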
I’ll describe the contents of an XML document in more detail. I’ll start by showing you the elements that can appear in the prolog.
Understanding the XML Document Prolog
The prolog of an XML document contains metainformation about the document rather than document content. It may contain the XML declaration, processing instructions, comments, and an embedded DTD or schema.
XML Declaration
The XML declaration provides information about the document, such as the character-encoding type.
If you include the XML declaration, it must appear on the first line of the XML document. Nothing can precede an XML declaration, not even white space. If you accidentally include white space before the declaration, XML processors won’t be able to parse the content of the XML document correctly and will generate an error message.
The XML declaration may also include attributes that provide information about the version, encoding, and whether the document is standalone:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
At the time of writing, the current XML version is 1.1. However, many processors don’t recognize this version, so it’s best to stick with a version 1.0 declaration for backward compatibility.
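The rule that nothing may precede the XML declaration is easy to verify with a parser. This sketch assumes Python’s standard xml.etree.ElementTree module; the helper name parses and the minimal <library/> document element are hypothetical.

```python
import xml.etree.ElementTree as ET

DECLARATION = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
BODY = b"<library/>"  # hypothetical minimal document element


def parses(document: bytes) -> bool:
    """Return True if the parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


declaration_first = parses(DECLARATION + BODY)
space_before_declaration = parses(b" " + DECLARATION + BODY)
newline_before_declaration = parses(b"\n" + DECLARATION + BODY)
no_declaration = parses(BODY)

print(declaration_first, space_before_declaration,
      newline_before_declaration, no_declaration)
```

A single leading space or blank line is enough to cause a parse error, while leaving out the declaration altogether is acceptable. This problem often appears when a server-side script emits white space before the XML output.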
Processing Instructions
The prolog can also include processing instructions (PIs) that pass information about the XML document to other applications. The XML processor doesn’t process PIs, but rather passes them on to the application unchanged.
PIs start with the characters <? and finish with ?>. They usually appear in the prolog, although they can appear in other places within an XML document.
■ Note An XML declaration also starts with the characters <?xml. Even though the XML declaration looks similar, it’s worth remembering that it’s quite different from a PI.
The following PI indicates a reference to an XSL stylesheet:
<?xml-stylesheet type="text/xsl" href="stylesheet.xsl"?>
The first item in a PI is a name, called the PI target. The preceding PI has the name xml-stylesheet. Names that start with xml are reserved for XML-specific PIs. The PI also has the text string type="text/xsl" href="stylesheet.xsl". Although this looks like two attributes, the content isn’t treated that way. You’ll see more examples of stylesheet PIs in Chapters 6 and 7.
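To see how a processor hands a PI to an application, the following sketch assumes Python’s standard xml.dom.minidom module. The <library/> document element is a hypothetical placeholder; the PI itself is the stylesheet reference shown above.

```python
from xml.dom import minidom

DOCUMENT = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="stylesheet.xsl"?>'
    "<library/>"
)

doc = minidom.parseString(DOCUMENT)

# Collect (target, data) for each processing instruction in the prolog.
# The XML declaration does not appear here because it isn't a PI.
pis = [
    (node.target, node.data)
    for node in doc.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]

print(pis)
```

The parser reports the target and a single text string. It doesn’t split that string into attributes, so an application that wants the href value has to extract it from the string itself.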
Comments
Comments can appear almost anywhere in an XML document. The example XML document included a comment in the prolog, so let’s look at comments with the other prolog contents.
XML comments look the same as XHTML comments. They begin with the characters <!-- and end with -->:
<!-- Here is a comment -->
Comments don’t affect the processing of an XML document. They’re normally intended for human readers. If you add a comment, you must be aware of the following rules:
• A comment may not contain the text -->; in fact, a double hyphen (--) isn’t allowed anywhere inside a comment
• A comment may not be included within a tag
• A comment should not hide either the opening or closing tags in an element
• An XML processor isn’t obliged to pass a comment to an application, although most do
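The comment rules can be checked with a parser as well. This sketch assumes Python’s standard xml.etree.ElementTree module; the helper name parses and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


valid_comment = parses("<library><!-- Here is a comment --></library>")
double_hyphen = parses("<library><!-- not -- allowed --></library>")
comment_in_tag = parses("<library <!-- comment --> ></library>")

# With its default settings, ElementTree doesn't pass comments on to the
# application, so only the <DVD> element shows up as a child.
root = ET.fromstring("<library><!-- note --><DVD/></library>")
children = [child.tag for child in root]

print(valid_comment, double_hyphen, comment_in_tag, children)
```

The last step illustrates the final rule in the list: this particular processor silently drops comments unless you configure it to keep them.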
DTDs and XML Schemas
DTDs and XML schemas provide rules about which elements and attributes can appear within the XML document. In other words, they specify which elements and attributes are valid and which are required or optional.
The prolog can include declarations about the XML document, a reference to an external DTD or schema, or both. I’ll explain more about DTDs and schemas in Chapter 2.
Understanding Sections Within the XML Document Element
The data within an XML document is stored within the document or root element. This element contains all other elements, attributes, text, and CDATA within the document and may also include entities and comments.
Elements
Elements serve many purposes in an XML document. They
• Mark up content
• Provide a description of the content they mark up
• Provide information about the order of data and its relative importance
• Show the relationships between data
Elements include a starting and ending tag as well as content. The content can be text, child elements, or both text and elements. The starting tag for an element can also contain
attributes. You can position comments inside elements.
In the earlier example, you saw the following structure within the <DVD> element:
The opening <DVD> tag contains an id attribute, and the element includes three other elements:
<title>, <format>, and <genre>. Each of these elements contains text.
You saw earlier that it’s necessary to open and close tags in the correct order. It would be wrong to write the following:
You can divide elements into the following types:
• Elements containing only text
• Elements containing only child elements
• Elements containing a mixture of child elements and text, or mixed elements
You’ll see how important it is to distinguish between these different types when I cover XML schemas in Chapter 2.
Elements Containing Only Text
Some elements contain only text content. You’ll recall from the previous example that the
<title>, <format>, and <genre> elements contain only text:
<title>Breakfast at Tiffany's</title>
<format>Movie</format>
<genre>Classic</genre>
Elements Containing Other Elements
It’s possible for an element to contain only other elements. The container element is called the
parent, while the elements contained inside are the child elements. The <DVD> element is an example of an element that contains child elements:
Mixed Elements
Mixed elements contain both text and child elements. The DVD example doesn’t include any
of these types of elements, but the following code block shows a mixed element:
<mixedElement>This element contains both text and child elements
<childElement>This element contains text</childElement>
<emptyElement/>
</mixedElement>
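As a rough illustration of how a parser sees a mixed element, the following Python sketch loads the same markup with xml.dom.minidom and lists the kinds of child nodes it finds. The line breaks between the tags are removed here, an assumption made so that whitespace-only text nodes don't clutter the result.

```python
from xml.dom import minidom

# The mixed element from the text, written without line breaks between tags.
SOURCE = (
    "<mixedElement>This element contains both text and child elements"
    "<childElement>This element contains text</childElement>"
    "<emptyElement/>"
    "</mixedElement>"
)

root = minidom.parseString(SOURCE).documentElement

# Record each direct child as either a text node or an element node.
kinds = []
for node in root.childNodes:
    if node.nodeType == node.TEXT_NODE:
        kinds.append(("text", node.data))
    elif node.nodeType == node.ELEMENT_NODE:
        kinds.append(("element", node.tagName))

for kind in kinds:
    print(kind)
```

The text and the two child elements appear side by side as children of <mixedElement>, which is what makes the element mixed.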
To summarize, elements have the following requirements:
• Elements must contain starting and ending tags, unless there is no content, in which case you can use the shorthand form.
• The tag names must obey the XML naming rules.
• Elements must be nested correctly.
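The nesting and shorthand requirements can be tried out with a small Python sketch. It assumes the standard xml.dom.minidom parser; the is_well_formed helper and the sample fragments are illustrative, with the <DVD> attributes left out for brevity.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(source):
    """Return True if the parser accepts the markup, False otherwise."""
    try:
        minidom.parseString(source)
        return True
    except ExpatError:
        return False


# Tags closed in the reverse order of opening: correct nesting.
NESTED = "<DVD><title>Breakfast at Tiffany's</title></DVD>"
# Tags closed in the same order as opening: the elements overlap.
CROSSED = "<DVD><title>Breakfast at Tiffany's</DVD></title>"

# The shorthand form and an explicit start/end pair both give an
# element with no content.
short_form = minidom.parseString("<emptyElement/>").documentElement
long_form = minidom.parseString("<emptyElement></emptyElement>").documentElement

print(is_well_formed(NESTED), is_well_formed(CROSSED))
```

A parser stops at the first well-formedness error, so an overlapping pair of tags makes the whole document unusable rather than just the affected element.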
Attributes
Another way to provide information in XML documents is by using attributes within the
opening tag of an element. Attributes normally provide additional information about the
element that they modify. There is no limit to the number of attributes that can appear inside an opening tag.
In this case, the data Introduction to XML is enclosed in a <p> element. This element tells
a web browser to display the information in a separate paragraph. The style attribute
provides additional information about how to display the data. Here, you’re telling the browser
to center the text.
Two common uses of attributes are to convey formatting information and to indicate the use of a specific format or encoding. For example, you could convey a date as
<Date Format="mmddyyyy">06081955</Date>
or indicate use of an International Organization for Standardization (ISO) date format using
<Date Code="ISO8601">1955-06-08</Date>
When an element contains an attribute, it’s said to be a complex type element. As you’ll
see later, this is important when writing XML schema documents.
You can use either a pair of double or single quotes for different attributes within the same element:
<elementName att1="value1" att2='value2'>Here is an element</elementName>
Make sure you don’t include one of each in a single attribute, or the document won’t be well formed.
■ Caution Be careful when cutting and pasting attributes from a word-processing document into an XML document. Word processors often use smart quotes, which cause an error in an XML document.
You can also write an attribute as a nested child element. For example, you could rewrite the <DVD> element
• An attribute is made up of a name/value pair.
• You must enclose the attribute value in single or double quotes.
• Attributes cannot contain an XML tag.
• Attribute names must follow the XML naming rules.
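The quoting rules lend themselves to a quick check. The Python sketch below assumes the standard xml.dom.minidom parser; the constant names and the is_well_formed helper are introduced here for illustration. It reads attributes written with both quote styles, then shows that mismatched quotes and smart quotes are both rejected.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(source):
    """Return True if the parser accepts the markup, False otherwise."""
    try:
        minidom.parseString(source)
        return True
    except ExpatError:
        return False


# Double quotes on one attribute and single quotes on another are fine.
MIXED_STYLES = """<elementName att1="value1" att2='value2'>Here is an element</elementName>"""
# One of each around a single value is not.
MISMATCHED = """<elementName att1="value1'>Here is an element</elementName>"""
# Smart quotes (U+201C and U+201D) pasted from a word processor.
SMART_QUOTES = "<elementName att1=\u201cvalue1\u201d>Here is an element</elementName>"

element = minidom.parseString(MIXED_STYLES).documentElement
values = {name: element.getAttribute(name) for name in ("att1", "att2")}
print(values)
```

Once the document is parsed, the quote style is gone: both values come back as ordinary strings.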
Text
All text within an XML document is contained inside opening and closing tags. Unless you mark the text as CDATA, it will be treated as if it were XML and processed accordingly. This means an opening angle bracket will be treated as if it were part of an XML tag.
If you want to use reserved characters within text, you must rewrite them as character entities. For example, you can write the left angle bracket < as &lt;. You can also embed the reserved characters within CDATA.
CDATA Sections
CDATA allows you to mark blocks of text so that they’re not processed as XML As I mentioned
before, this is useful for text that contains reserved XML characters:
<title><![CDATA[ Why 9 is < 10 ]]></title>
This CDATA section starts with <![CDATA[ and ends with ]]>. The character data is contained within the opening and closing square brackets. Obviously, the string ]]> can’t appear
within a CDATA section.
You can use CDATA sections in XML documents for embedding code, such as JavaScript, and for adding content that doesn’t need processing. For example, an application that reads
data from a database and marks it up in XML might embed all content in CDATA sections to
avoid the need to process the reserved characters explicitly. I’ll show you an example of using
CDATA with JavaScript in Chapter 3.
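Here is a minimal Python sketch of the two ways to carry a reserved character in text. It assumes the standard xml.dom.minidom parser and the escape helper from xml.sax.saxutils; the variable names are illustrative. The first part reads the <title> example as a CDATA section, and the second part produces the entity-based alternative.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape

# The CDATA example from the text, with its full delimiters.
SOURCE = "<title><![CDATA[ Why 9 is < 10 ]]></title>"

title = minidom.parseString(SOURCE).documentElement
section = title.firstChild

# minidom reports the content as a CDATA section node and leaves the
# reserved character untouched.
is_cdata = section.nodeType == section.CDATA_SECTION_NODE
raw_text = section.data

# The alternative: replace the reserved characters with entities.
escaped_text = escape(raw_text)
print(raw_text)
print(escaped_text)
```

Both forms deliver the same characters to the application; CDATA is simply more convenient when a block of text contains many reserved characters.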
Entities
Character entities are symbols that represent a single character. In XHTML, character entities
are used for special symbols such as an ampersand (&amp;) and a nonbreaking space (&nbsp;).
You can use character entities to replace the reserved characters in XML documents. All tags start with a left angle bracket, so it isn’t possible to include this character in the text
within an element:
<expression>10 < 25</expression>
If you try to process this element, the presence of the left angle bracket before the text 25 causes a processing error. Instead, you could replace this symbol with the entity &lt;:
<expression>10 &lt; 25</expression>
You need to consider the following reserved characters:
• <, which indicates the start of a tag name
• &, which indicates the first character of an entity
• xml, which is reserved for referring to parts of the XML language, such as xml-stylesheet
Table 1-1 summarizes the character entities that you need to use.
Sometimes you can’t include a literal character in an XML document, perhaps because the character doesn’t exist on a keyboard or because it’s a graphic character. Instead, you can add these as character entities using decimal or hexadecimal Unicode numbers. For example, you can encode the copyright symbol © as &#169; or &#xA9;.
If the reference starts with &# and ends with a semicolon, it’s a character reference. The number between is the Unicode code for the character required. If the code is written as a hexadecimal, then it’s prefixed with the character x.
You can also define your own entities. For example, you could define the reference
&copyright; to mean Copyright 2006 Apress. Each time you want to include this text in the XML document, you could use the entity reference &copyright;. This makes the text easier to manage and update.
Let’s move on to look at the processing of XML documents.
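The following Python sketch shows these references being resolved by a parser. It assumes the standard xml.etree.ElementTree module; the element names and the internal DTD subset used to declare the copyright entity are illustrative choices, since the declaration syntax is covered with DTDs in Chapter 2.

```python
import xml.etree.ElementTree as ET


def is_well_formed(source):
    """Return True if the parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


# A built-in entity stands in for the reserved left angle bracket.
expression = ET.fromstring("<expression>10 &lt; 25</expression>")

# Decimal and hexadecimal character references for the same character.
symbol = ET.fromstring("<symbol>&#169; &#xA9;</symbol>")

# A custom entity, declared in a hypothetical internal DTD subset.
DECLARED = (
    '<!DOCTYPE notice [<!ENTITY copyright "Copyright 2006 Apress">]>'
    "<notice>&copyright;</notice>"
)
notice = ET.fromstring(DECLARED)

# The same reference without a declaration is an error.
UNDECLARED = "<notice>&copyright;</notice>"

print(expression.text)
print(notice.text)
```

The application never sees the references themselves; it receives the replacement text, so changing the entity declaration updates every place that uses it.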
The XML Processing Model
The XML recommendation assumes that an XML document will be processed in a particular way. The model indicates that an XML processor passes the content and structure of the XML document to an application. XML processors are usually called XML parsers, as they parse the XML document; see Figure 1-2.
Common XML processors include Microsoft XML Parser (MSXML), Apache Xerces2, and the Oracle XML parser. You can write an application that uses any of these parsers. Some XML parsers are also available as prepackaged software that installs automatically. Extensible Stylesheet Language Transformations (XSLT) processors used to display XML in a web browser fall into this category. MSXML contains both an XML parser and an XSLT processor, and is both an XML processor and an application. It installs automatically with Internet Explorer and other Microsoft software.
XML Processing Types
There are two categories of XML processing: tree-based and event-based. Many XML parsers,
including later versions of MSXML, support both models. You’ll often hear tree-based parsers
referred to as Document Object Model (DOM) parsers, while event-based parsers are referred
to as Simple API for XML (SAX) parsers. Both are named after the specifications they support.
The DOM is a W3C recommendation that provides an application programming interface (API) to an XML document. Any application can use this API to manipulate an XML document,
read information, add new nodes, and edit the existing content. You can find out more
about this recommendation at http://www.w3.org/TR/REC-DOM-Level-1/.
SAX is not a W3C recommendation, but it does enjoy support from both large and small software companies. A SAX-based parser reads an XML document sequentially, firing off
events as it reaches important parts of the document, such as the start or end of an element.
You can find out more at http://www.saxproject.org/.
DOM Parsing
Figure 1-3 shows the dvd.xml document that you’ve been working with represented as a tree
structure.
Displaying the document in this way reinforces the relationship between the elements, as
in a family tree. The <library> element is the parent of the <DVD> elements and the grandparent
of the <title>, <format>, and <genre> elements. The <DVD> elements are siblings and have the
<library> element as a parent or ancestor. The <title>, <format>, and <genre> elements are
descendants of the <library> element.
DOM parsing allows access to these elements, their values, and all other parts of an XML document through either a programming language or a scripting language such as JavaScript.
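As a sketch of tree-based access, the Python code below walks a small library document with xml.dom.minidom, which implements a subset of the DOM API. The document is a hypothetical stand-in for dvd.xml: the id values, the second <DVD> element, and the <rating> element added at the end are all assumptions made for illustration.

```python
from xml.dom import minidom

# Hypothetical stand-in for dvd.xml, written without whitespace between
# tags so that every child node is an element.
SOURCE = (
    "<library>"
    '<DVD id="1"><title>Breakfast at Tiffany\'s</title>'
    "<format>Movie</format><genre>Classic</genre></DVD>"
    '<DVD id="2"><title>Sample Title</title>'
    "<format>Movie</format><genre>Drama</genre></DVD>"
    "</library>"
)

doc = minidom.parseString(SOURCE)
library = doc.documentElement

# Move down the tree: library -> DVD -> title -> text.
dvds = library.getElementsByTagName("DVD")
first = dvds[0]
title = first.getElementsByTagName("title")[0]
title_text = title.firstChild.data

# Move up and across the tree using the family relationships.
parent_of_title = title.parentNode          # the first <DVD> element
grandparent_of_title = parent_of_title.parentNode  # the <library> element
sibling_of_first = first.nextSibling        # the second <DVD> element

# The tree is writable: add a hypothetical <rating> child to the first DVD.
rating = doc.createElement("rating")
rating.appendChild(doc.createTextNode("PG"))
first.appendChild(rating)

print(title_text, first.getAttribute("id"))
```

Because the whole tree is held in memory, you can visit nodes in any order and as many times as you like.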
SAX Parsing
A SAX-based parser presents an XML document as a string of events. You must write handlers
for each event so that something suitable occurs when the event triggers the handler.
This type of parsing works well with languages that have good event-handling properties.
For instance, SAX parsing is used extensively with Java. It’s less suitable for the scripting
languages often employed on the web, so I don’t cover it in detail here.
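For readers who want to see the event model in action, here is a minimal sketch using Python's standard xml.sax module. The EventLog class and the one-element sample document are assumptions made for illustration; the handler method names are the ones the module defines.

```python
import xml.sax


class EventLog(xml.sax.ContentHandler):
    """Record each event the parser fires, in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        # Text may arrive in more than one piece; skip whitespace-only runs.
        if content.strip():
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


SOURCE = b"<DVD><title>Breakfast at Tiffany's</title></DVD>"

handler = EventLog()
xml.sax.parseString(SOURCE, handler)

for event in handler.events:
    print(event)
```

The parser keeps no tree. If the application needs the title later, the handler has to store it when the event fires.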
Why Have Two Processing Models?
Both processing models offer advantages. DOM-based parsing provides full read-write access
to an XML document, and you can traverse the document tree to access nodes within the document. It can also validate a document against a DTD or XML schema to determine that the document is valid.
However, DOM-based parsing must read the full XML document into memory, so DOM parsing can be slow and memory-intensive when working with large XML documents. It’s difficult to determine exactly what constitutes a large XML document, because processing time depends on computing power, memory, time available, and whether it’s working in a single-user environment or a multiuser environment such as a web server. As a rule, most systems cope with documents up to tens of megabytes in size, but you need to take care with files above this size.
The SAX-based model, on the other hand, is sequential in operation. Once a node has been processed, it is discarded and cannot be processed again. The whole document isn’t loaded into memory at once, so you can avoid problems associated with processing large XML documents. This method of processing puts the onus on you to store any information from the XML document that might be required later.
SAX is ideal, for example, as an intermediate routing product in a communications system. An incoming XML document is likely to consist of a small routing header and a larger document for delivery to the end point. Using SAX, a routing device can read the routing information and ignore the document, as the document is irrelevant to its delivery. A DOM-based parser, however, must parse the complete document to be able to deliver it to its ultimate destination.
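The routing scenario can be sketched in Python. Note that this sketch uses iterparse from xml.etree.ElementTree, an event-style streaming interface in the standard library, rather than SAX itself. The message format, the <to> element, and the read_destination function are all hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical message: a small routing header followed by a payload
# that the router has no need to inspect.
MESSAGE = (
    b"<message><header><to>node-42</to></header>"
    b"<body><record>1</record><record>2</record></body></message>"
)


def read_destination(stream):
    """Return the text of the first <to> element, then stop iterating."""
    for event, element in ET.iterparse(stream, events=("end",)):
        if element.tag == "to":
            # Returning here abandons the iterator, so later events
            # for the payload are never requested.
            return element.text
    return None


destination = read_destination(io.BytesIO(MESSAGE))
print(destination)
```

With a payload of many megabytes, stopping at the header means most of the stream is never parsed, which is the advantage the sequential model offers here.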
In general, XML development tools fall into several categories:
• Extensions to existing programmers’ IDEs
• XML-specific IDEs
• Individual tools
Tools such as Microsoft Visual Studio (http://msdn.microsoft.com/vstudio/) fall into the first category. They have good XML support aimed specifically at developers. At the time of writing, the latest version is Visual Studio 2005, which includes the following features:
• It helps you create and edit XML documents, including checking whether a document
The dedicated XML IDEs tend to cover similar ground and differ in the depth of their support and their user interfaces. Most of these tools have an XML editor, tools for creating DTDs
and XML schemas, and support for XSLT development. Several such tools are available,
including this small sample of common ones:
• Altova’s XML Suite: http://www.altova.com/suite.html
• TIBCO Software’s suite of XML tools: http://www.tibco.com/software/business_integration/xml_tools.jsp
• DataDirect Technologies’ Stylus Studio: http://www.stylusstudio.com/
Many of the suites mentioned include individual tools that you can use for editing XML documents. These include
• Altova’s XMLSpy: http://www.altova.com/products_ide.html
• Blast Radius’ XMetal: http://www.xmetal.com/index.x?products/xmetal/
• SyncRO Soft’s <oXygen/>: http://www.oxygenxml.com//
There are many other excellent tools available that I haven’t mentioned here. You can find out more by searching the Internet or subscribing to mailing lists such as XML-DEV
(http://xml.org/xml/xmldev.shtml).
Summary
In this chapter, you’ve been introduced to some of the basic concepts relating to XML I’ve
covered XML syntax in some detail, and I’ve shown you the benefits that XML provides for
web developers I’ve also shown you some of the tools that you can use to work with XML
documents
In Chapter 2, I’ll show you some of the related XML recommendations. You’ll learn how to work with DTDs and XML schemas. You’ll also find a brief introduction to XSLT, XPath, XLinks,
and XPointer.