If XML can do it, you can do it too
Now revised and expanded to cover the latest XML technologies and applications, this all-in-one tutorial and
reference shows you step by step how to put the power of XML to work in your Web pages. From document
type definitions and style sheets to XPointers, schemas, the Wireless Markup Language, XHTML and other
advanced tools and applications, XML expert Elliotte Rusty Harold gives you all the know-how and examples
you need to integrate XML with HTML, solve real-world development challenges, and create data-driven content.
Inside, you’ll find complete coverage of XML
• Create well-formed XML documents
• Place international characters in documents
• Validate documents against DTDs and schemas
• Use entities to build large documents from smaller parts
• Embed non-XML data in your documents
• Format your documents with CSS and XSL style sheets
• Connect documents with XLinks and XPointers
• Merge different XML vocabularies with namespaces
• Write metadata for Web pages using RDF
• Harness XML for site design, vector graphics,
and other real-world applications
Java 1.1 or later compatible platform such as Mac
OS 8.5 or later, Windows 95/98/Me/NT/2000,
Harness the power of CSS and XSL to format XML documents
Take XML to the limit using XLinks, XPointers, Schemas, SVG, and XHTML
XML
Elliotte Rusty Harold
“The XML Bible provides complete coverage on all XML-related
topics and will be an essential resource for any developer.”
—Sean Rhody, Technical Editor, XML Journal
XML code and authoring tools
on CD-ROM!
BONUS CD-ROM!
Sample XML code XML authoring tools W3C standards
Write Web pages in foreign languages and diverse scripts
Shareware programs are fully functional, free trial versions of copyrighted programs. If you like particular programs, register with their
authors for a nominal fee and receive licenses, enhanced versions, and technical support. Freeware programs are free, copyrighted
games, applications, and utilities. You can copy them to as many PCs as you like—free—but they have no technical support.
100% COMPREHENSIVE
• Code for all examples in the book, plus
additional examples
• XML authoring tools, including expat, XT, Xalan,
Xerces, Batik, FOP, SAXON, HTML Tidy, and
Mozilla
• World Wide Web Consortium XML standards
2nd Edition
Second Edition
Praise for Elliotte Rusty Harold’s XML Bible
“Great book! I have about 10 XML books and this is by far the best.”
— Edward Blair, Systems Analyst, AT&T
“I recommend the XML Bible. I found it to be really helpful, as I am a beginner
myself. It is easy to understand, which I found most useful since I am not a ‘tech-head.’”
— Marius Holth Hanssen, Independent IT Consultant
“I don’t know how to praise Elliotte Rusty Harold enough. When I read a technical book, I don’t expect to ENJOY it in the pure sense. Oh, I expect to ENJOY increasing
my knowledge or to ENJOY the experience of successfully understanding a particularly poorly written passage. Your text is enjoyable in the pure sense. It is fun to
read. I don’t have to force myself to pick up XML Bible — I jump for it because I
know I will be finding something on each page to make me smile.”
— Mike Maddux, Software Architect, Texas Department of Health
“Just wanted to take a minute and send you a big thank you for writing XML Bible
and Java Beans. Without those two books, my life would be so much harder!”
— Ove “Lime” Lindström, Java Consultant, Enea Realtime AB
XML
Bible
Second Edition
Elliotte Rusty Harold
Hungry Minds, Inc.
Copyright © 2001 Hungry Minds, Inc. All rights
reserved. No part of this book, including interior
design, cover design, and icons, may be reproduced
or transmitted in any form, by any means (electronic,
photocopying, recording, or otherwise) without the
prior written permission of the publisher.
Library of Congress Control Number: 2001089303
ISBN: 0-7645-4760-7
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
2B/RX/QV/QR/IN
Distributed in the United States
by Hungry Minds, Inc.
Distributed by CDG Books Canada Inc. for Canada; by
Transworld Publishers Limited in the United
Kingdom; by IDG Norge Books for Norway; by IDG
Sweden Books for Sweden; by IDG Books Australia
Publishing Corporation Pty. Ltd. for Australia and
New Zealand; by TransQuest Publishers Pte Ltd. for
Singapore, Malaysia, Thailand, Indonesia, and Hong
Kong; by Gotop Information Inc. for Taiwan; by ICG
Muse, Inc. for Japan; by Intersoft for South Africa; by
Eyrolles for France; by International Thomson
Publishing for Germany, Austria, and Switzerland; by
Distribuidora Cuspide for Argentina; by LR
International for Brazil; by Galileo Libros for Chile; by
Ediciones ZETA S.C.R. Ltda. for Peru; by WS
Computer Publishing Corporation, Inc., for the
Philippines; by Contemporanea de Ediciones for
Venezuela; by Express Computer Distributors for the
Caribbean and West Indies; by Micronesia Media
Distributor, Inc. for Micronesia; by Chips
Computadoras S.A. de C.V. for Mexico; by Editorial
Norma de Panama S.A. for Panama; by American
Bookshops for Finland.
discounts, premium and bulk quantity sales, and foreign-language translations, please contact our Customer Care department at 800-434-3422, fax 317-572-4002 or write to Hungry Minds, Inc., Attn: Customer Care Department, 10475 Crosspoint Boulevard, Indianapolis, IN 46256.
For information on licensing foreign or domestic rights, please contact our Sub-Rights Customer Care department at 212-884-5000.
For information on using Hungry Minds’ products and services in the classroom or for ordering examination copies, please contact our Educational Sales department at 800-434-2086 or fax 317-572-4005. For press review copies, author interviews, or other publicity information, please contact our Public Relations department at 317-572-3168 or fax 317-572-4168.
For authorization to photocopy items for corporate, personal, or educational use, please contact Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, or fax 978-750-4470.
LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: THE PUBLISHER AND AUTHOR HAVE USED THEIR BEST EFFORTS IN PREPARING THIS BOOK. THE PUBLISHER AND AUTHOR MAKE NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS BOOK AND SPECIFICALLY DISCLAIM ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. THERE ARE NO WARRANTIES WHICH EXTEND BEYOND THE DESCRIPTIONS CONTAINED IN THIS PARAGRAPH. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES REPRESENTATIVES OR WRITTEN SALES MATERIALS. THE ACCURACY AND COMPLETENESS OF THE INFORMATION PROVIDED HEREIN AND THE OPINIONS STATED HEREIN ARE NOT GUARANTEED OR WARRANTED TO PRODUCE ANY PARTICULAR RESULTS, AND THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR EVERY INDIVIDUAL. NEITHER THE PUBLISHER NOR AUTHOR SHALL BE LIABLE FOR ANY LOSS OF PROFIT OR ANY OTHER COMMERCIAL DAMAGES, INCLUDING BUT NOT LIMITED TO SPECIAL, INCIDENTAL, CONSEQUENTIAL, OR OTHER DAMAGES.
Netscape Communications Corporation has not authorized, sponsored, endorsed, or approved this
publication and is not responsible for its content. Netscape and the Netscape Communications Corporate Logos, are trademarks and trade names of Netscape Communications Corporation.
Trademarks: All trademarks are property of their respective owners. Hungry Minds, Inc. is not associated
with any product or vendor mentioned in this book.
is a trademark of
Hungry Minds, Inc.
Graphics and Production Specialists
Heather Pope, Jill Piscitelli,
Kathie Shutte
Quality Control Technicians
David Faust, Andy Hollandbeck,
Angel Perez, Dwight Ramsey,
Proofreading and Indexing
TECHBOOKS Production Services
Cover Image
Lawrance Huck
About the Author
Elliotte Rusty Harold is an internationally respected writer, programmer, and educator both on the Internet and off. He got his start writing FAQ lists for the Macintosh newsgroups on Usenet and has since branched out into books, Web sites, and newsletters. He’s an adjunct professor of computer science at Polytechnic University in Brooklyn, New York. His Cafe con Leche Web site at http://www.ibiblio.org/xml/ has become one of the most popular independent XML sites on the Internet.
Elliotte is originally from New Orleans, to which he returns periodically in search of a decent bowl of gumbo. However, he currently resides in the Prospect Heights neighborhood of Brooklyn with his wife, Beth, and cats, Charm (named after the quark) and Marjorie (named after his mother-in-law). When not writing books, he enjoys working on genealogy, mathematics, and quantum mechanics. His previous books include The Java Developer’s Resource, Java Network Programming, Java Secrets, JavaBeans, XML: Extensible Markup Language, and Java I/O.
Welcome to the second edition of the XML Bible. When the first edition was published about two years ago, XML was a promising technology with a small but growing niche. In the last two years, it has absolutely exploded. XML no longer needs to be justified as a good idea. In fact, the question developers are asking has changed from “Why XML?” to “Why not XML?” XML has become the data format of choice for fields as diverse as stock trading and graphic design. More new programs today are using XML than aren’t. A solid understanding of just what XML is and how to use it has become a sine qua non for the computer literate.
The XML Bible is your introduction to the exciting and fast-growing world of XML. With this book, you’ll learn how to write documents in XML and how to use style sheets to convert those documents into HTML so that legacy browsers can read them. You’ll also learn how to use document type definitions (DTDs) to describe and validate documents. You’ll experience a variety of XML applications in many domains, ranging from finance to vector graphics to genealogy. And you’ll learn how to take advantage of XML for your own unique projects, programs, and Web sites.
Who You Are
Unlike most other XML books on the market, the XML Bible discusses XML from the perspective of a Web-page author, not from the perspective of a software developer. I don’t spend a lot of time discussing BNF grammars or parsing element trees. Instead, I show you how you can use XML and existing tools today to more efficiently produce attractive, exciting, easy-to-use, easy-to-maintain Web sites that keep your readers coming back for more.
This book is aimed directly at Web-site developers. I assume you want to use XML to produce Web sites that are difficult to impossible to create with raw HTML. You’ll be amazed to discover that in conjunction with style sheets and a few free tools, XML enables you to do things that previously required either custom software costing hundreds to thousands of dollars per developer, or extensive knowledge of programming languages such as Perl. None of the software discussed in this book will cost you more than a few minutes of download time. None of the tricks require any programming.
What’s New in the Second Edition
For the second edition, this book was rewritten from the ground up. While I retained the basic flavor and outline that proved so popular with the first edition, the writing has been tightened up throughout. I tried to address all common complaints about the first edition. For instance, the largest examples are now smaller and easier to digest. Where mistakes or misstatements were found, they have been corrected. Most important, the text has been brought completely up to date with the state of the XML world in 2001. Many technologies that were rapidly changing, bleeding-edge tools in 1999 (XSLT, XSL-FO, XHTML, XLinks, XPointers, namespaces, etc.), have become the solid rocks on which future XML technologies are being built. Thus, it is now possible to offer much more comprehensive and final coverage of these, rather than the somewhat tentative first steps I took in the first edition.
The world never stands still for long, however. In the two years since the first edition appeared, new XML technologies have issued forth at a frightening pace. They are discussed here as well, though often with caveats that the details are still subject to change. There are several completely new chapters covering many of these cutting-edge applications, including chapters on:
✦ The Extensible Hypertext Markup Language (XHTML)
✦ Scalable Vector Graphics (SVG)
✦ Schemas
✦ The Wireless Markup Language (WML)
Even more important than the new chapters are the new sections woven into more familiar chapters. Although I made every effort to write more concisely in this edition (My favorite reader comment about the first edition was, “It would seem to me that if you asked the author to write 10,000 words about the colour blue, he would be able to do it without breaking into a sweat”), we still ended up with a book 200 pages longer than before, and most of those 200 pages are new material scattered throughout the book. If you liked the first edition, I can only surmise that you’re going to like the second edition even more. It is in every way a better, more comprehensive, more accurate book. If you didn’t like the first edition, I hope you’ll find the second more to your taste.
What You Need to Know
XML does build on top of the underlying infrastructure of the Internet and the Web. Consequently, I will assume you know how to ftp files, send e-mail, and load URLs into your Web browser of choice. I will also assume you have a reasonable knowledge of HTML at about the level supported by Netscape 1.1. On the other hand, when I discuss newer aspects of HTML that are not yet in widespread use, such as Cascading Style Sheets, I discuss them in depth.
To be more specific, in this book I assume that you can:
✦ Write a basic HTML page, including links, images, and text, using a text editor
✦ Place that page on a Web server
On the other hand, I do not assume that you:
✦ Know SGML. In fact, this preface is almost the only place in the entire book you’ll see the word SGML used. XML is supposed to be simpler and more widespread than SGML. It can’t be that if you have to learn SGML first.
✦ Are a programmer, whether of Java, Perl, C, or some other language. XML is a markup language, not a programming language. You don’t need to be a programmer to write XML documents.
What You’ll Learn
This book has one primary goal: to teach you to write XML documents for the Web.
Fortunately, XML has a decidedly flat learning curve, much like HTML (and unlike
SGML). As you learn a little you can do a little. As you learn a little more, you can do
a little more. Thus the chapters in this book build steadily on one another. They are
meant to be read in sequence. Along the way you’ll learn:
✦ How to author XML documents and deliver them to readers
✦ How semantic tagging makes XML documents easier to maintain and develop
than their HTML equivalents
✦ How to post XML documents on Web servers in a form everyone can read
✦ How to make sure your XML is well formed
✦ How to validate documents against DTDs and schemas
✦ How to use entities to build large documents from smaller parts
✦ How to describe data with attributes
✦ How to embed non-XML data in your documents
✦ How to merge different XML vocabularies with namespaces
✦ How to format your documents with CSS and XSL style sheets
✦ How to connect documents with XLinks and XPointers
✦ How to write metadata for Web pages using RDF
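Nothing in this book requires programming, but readers who do program may find it useful to see what "well formed" means to a parser. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module; the helper name `is_well_formed` and the three sample documents are hypothetical and are not taken from the book.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, for illustration only.
nested = "<greeting><b><i>Hello XML</i></b></greeting>"
overlapping = "<greeting><b><i>Hello XML</b></i></greeting>"
two_roots = "<greeting>Hello</greeting><greeting>XML</greeting>"

print(is_well_formed(nested))       # elements nest properly
print(is_well_formed(overlapping))  # elements overlap instead of nesting
print(is_well_formed(two_roots))    # more than one root element
```

Only the first document should be accepted. The other two break rules that also appear in this book's table of contents: elements may nest but may not overlap, and a document must have exactly one root element.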
In the final section of this book, you’ll see several practical examples of XML being
used for real-world applications, including:
✦ Web site design
✦ Schemas
✦ Push
✦ Vector graphics
✦ Genealogy
How the Book Is Organized
This book is divided into five parts:
Part II: Document Type Definitions
Part II (Chapters 8 through 13) focuses on document type definitions (DTDs). A DTD specifies which elements are and are not allowed in an XML document, and the exact context and structure of those elements. A validating parser can read a document, compare it to its DTD, and report any mistakes it finds. DTDs enable document authors to ensure that their work meets any necessary criteria.
In Part II, you’ll learn how to attach a DTD to a document, how to validate your documents against their DTDs, and how to write your own DTDs that solve your own problems. You’ll learn the syntax for declaring elements, attributes, entities, and notations. You’ll learn how to use entity declarations and entity references to build both a document and its DTD from multiple, independent pieces. This enables you to make long, hard-to-follow documents much simpler by separating them into related modules and components. You’ll learn how to integrate other forms of data like raw text and GIF image files in your XML document. And you’ll learn how to use namespaces to mix together different XML vocabularies in one document.
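The idea of building a document from pieces with entity declarations and entity references can be sketched briefly. This is a minimal illustration, assuming Python's standard `xml.etree.ElementTree` module; the element names, the entity name `footer`, and its replacement text are all hypothetical. Note that this parser is nonvalidating: it expands the internal entity declared in the DTD subset but does not check the document against a DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the internal DTD subset declares one general entity.
document = """<?xml version="1.0"?>
<!DOCTYPE page [
  <!ENTITY footer "Copyright 2001 Example Press">
]>
<page><body>Hello XML</body><foot>&footer;</foot></page>"""

root = ET.fromstring(document)

# The entity reference &footer; has been replaced by its declared text.
print(root.find("foot").text)
```

Declaring the text once and referring to it by name means a change to the declaration is picked up everywhere the reference is used.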
Part III: Style Languages
Part III, consisting of Chapters 14 through 18, teaches you everything you need to know about style sheets. XML markup specifies only what’s in a document. Unlike HTML, it does not say anything about what that content should look like. Information about an XML document’s appearance when printed, viewed in a Web browser, or otherwise displayed is stored in a style sheet. Different style sheets can be used for the same document. You might, for instance, want to use one style sheet that specifies small fonts for printing, another one with larger fonts for on-screen presentation, and a third with absolutely humongous fonts to project the document on a wall at a seminar. You can change the appearance of an XML document by choosing a different style sheet without touching the document itself.
Part III describes in detail the two style sheet languages in broadest use today, Cascading Style Sheets (CSS) and the Extensible Stylesheet Language (XSL). CSS is a simple style-sheet language originally designed for use with HTML. It applies fixed style rules to the contents of particular elements. CSS exists in two versions: CSS Level 1 and CSS Level 2. CSS Level 1 provides basic information about fonts, color, positioning, and text properties and is reasonably well supported by current Web browsers for HTML and XML. CSS Level 2 is a more recent standard that adds support for aural style sheets, user interface styles, international and bidirectional text, and more.
XSL, by contrast, is a more complicated and more powerful style language that can apply styles to the contents of elements as well as rearrange elements, add boilerplate text, and transform documents in almost arbitrary ways. XSL is divided into two parts: a transformation language for converting XML trees to alternative trees, and a formatting language for specifying the appearance of the elements of an XML tree. Currently, many more tools support the transformation language than the formatting language.
Part IV: Supplemental Technologies
Part IV consists of Chapters 19 through 21. It introduces some XML-based languages and syntaxes that layer on top of basic XML. XLinks provides
XPointers introduce a new syntax you can attach to the end of URLs to link not only to particular documents but also to particular parts of particular documents. RDF is an XML application used to embed metadata in XML and HTML documents. Metadata is information about a document, such as the author, date, and title of a work, rather than the work itself. All of these can be added to your own XML-based markup languages to extend their power and utility.
Part V: XML Applications
Part V, which consists of Chapters 22 to 28, shows you several practical uses of XML in different domains. XHTML is a reformulation of HTML 4.0 as valid XML. WML is an HTML-like language for serving Web content to cell phones, PDAs, pagers, and other memory, display, and bandwidth limited devices. Schemas are an XML-based syntax for describing the permissible content of XML documents that’s considerably more powerful and extensible than DTDs. Scalable Vector Graphics (SVG) is a standard XML format for drawings recommended by the World Wide Web Consortium (W3C). The Vector Markup Language (VML) is a Microsoft-proprietary XML application for vector graphics used by Office 2000 and Internet Explorer 5.0. Microsoft’s Channel Definition Format (CDF) is an XML-based markup language for defining channels that can push updated Web-site content to subscribers. Finally, a completely new application is developed for genealogical data to show you not just how to use XML tags, but why and when to choose them. Combining all of these different applications, you’ll develop a good sense of how XML applications are designed, built, and used in the real world.
What You Need
XML is a platform-independent technology. Furthermore, most of the best software for working with XML is written in Java and can run on multiple platforms. Much of this is included on the CD in the back of the book or is freely available on the Internet. To make the best use of this book and XML, you need:
✦ A Web browser that supports XML such as Mozilla, Netscape 6.0, or Opera 5.0. Internet Explorer 5.0/5.5 also supports XML; but its built-in XML parser, MSXML, is quite buggy, so you’ll need to upgrade it to MSXML 3.0 or later before you’ll be able to use many of the techniques in this book.
✦ A Java 1.2 or later virtual machine. (Java 1.1 can do in a pinch.) You’ll just need it to run programs written in Java. You won’t need to write any programs to use this book.
How to Use This Book
This book is designed to be read more or less cover to cover. Each chapter builds on the material in the previous chapters in a fairly predictable fashion. Of course, you’re always welcome to skim over material that’s already familiar to you. I also hope you’ll stop along the way to try out some of the examples and to write some XML documents of your own. It’s important to learn not just by reading, but also by doing. Before you get started, I’d like to make a couple of notes about grammatical conventions used in this book.
<father> The father element is not the same as the Father element or the FATHER element. Unfortunately, case-sensitive markup languages have an annoying habit of conflicting with standard English usage. On rare occasion, this means that you may encounter sentences that don’t begin with a capital letter. More commonly, you’ll see capitalization used in the middle of a sentence where you wouldn’t normally expect it. Please don’t get too bothered by this. All XML and
it will be obvious from the context what is meant.
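The case sensitivity of element names can be seen directly in a parser. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module; the `family` wrapper element and the variable names are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Three child elements whose names differ only in capitalization.
root = ET.fromstring("<family><father/><Father/><FATHER/></family>")
tags = [child.tag for child in root]
print(tags)

# A search for "father" matches only the all-lowercase element.
matches = root.findall("father")
print(len(matches))

# A start tag and end tag that differ in case do not match each other,
# so this fragment is rejected as not well formed.
try:
    ET.fromstring("<father></Father>")
    mismatch_accepted = True
except ET.ParseError:
    mismatch_accepted = False
print(mismatch_accepted)
```

The parser treats the three spellings as three unrelated element names, which is why this book preserves the exact capitalization of element names even in the middle of a sentence.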
I have also adopted the British convention of placing punctuation inside quote marks only when it belongs with the material quoted. Frankly, although I learned to write in the American educational system, I find the British system far more logical, especially when dealing with source code where the difference between a comma or a period and no punctuation at all can make the difference between perfectly correct and perfectly incorrect code.
What the Icons Mean
Throughout the book, I’ve used icons in the left margin to call your attention to
points that are particularly important
Note icons provide supplemental information about the subject at hand, but generally something that isn’t quite the main idea. Notes are often used to elaborate on a detailed technical point.
Tip icons indicate a more efficient way of doing something, or a technique that may not be obvious.
CD-ROM icons tell you that software discussed in the book is available on the companion CD-ROM. This icon also tells you whether a longer example, discussed but not included in its entirety in the book, is on the CD-ROM.
Caution icons warn you of a common misconception or that a procedure doesn’t always work quite like it’s supposed to. The most common reason for a Caution icon in this book is to point out the difference between what a specification says should happen and what actually does.
The Cross-Reference icon refers you to other chapters that have more to say about a particular subject.
About the Companion CD-ROM
Inside the back cover of this book is a CD-ROM that holds all numbered code listings from this book as well as some longer examples that couldn’t fit into this book. The CD-ROM also contains the complete text of various XML specifications in XML and HTML. (Some of the specifications are also available in other formats like PDF.) Finally, you will find an assortment of useful software for working with XML documents. Many (though not all) of these programs are written in Java, so they’ll run on any system with a reasonably compatible Java 1.1 or later virtual machine. Most of the programs that aren’t written in Java are designed for Windows 95 or later, though there are also a few programs for Mac and Linux readers.
For a complete description of the CD-ROM contents, please read Appendix A. In addition, to get a complete description of what is on the CD-ROM, you can load the file index.html onto your Web browser. The files on the companion CD-ROM are not compressed, so you can access them directly from the CD.
Feel free to send me specific questions regarding the material in this book. I’ll do my best to help you out and answer your questions, but I can’t guarantee a reply. The best way to reach me is by e-mail:
elharo@metalab.unc.edu
org/xml/, which contains a lot of XML-related material and is updated almost daily. Despite my persistent efforts to make this book perfect, some errors have doubtless slipped by. Even more certainly, some of the material discussed here will change over time. I’ll post any necessary updates and errata on my Web site at http://www.ibiblio.org/xml/books/bible/. Please let me know via e-mail of any errors that you find that aren’t already listed.
Elliotte Rusty Harold
elharo@metalab.unc.edu
http://www.ibiblio.org/xml/
New York City, April 7, 2001
The folks at Hungry Minds have all been great. The acquisitions editors, John Osborn on the first edition and Grace Buechlein on this edition, deserve special thanks for arranging the unusual scheduling this book required to hit the moving target that XML presents, as well as for putting up with multiple missed deadlines. I’ll do better on the third edition guys, I promise! Sharon Nash shepherded this book through the development process. With poise and grace, she managed the constantly shifting outline and schedule that a book based on unstable specifications and software requires. Terri Varveris edited the first edition. Without her, there could never have been a second edition.
Steven Champeon brought his SGML experience to the book, and provided many insightful comments on the text. My brother Thomas Harold put his command of chemistry at my disposal when I was trying to grasp the Chemical Markup Language. Carroll Bellau provided me with the parts of my family tree you’ll find in Chapter 20. Piroz Mohseni and Heather Williamson served as technical editors on the first edition and corrected many of my errors. Heather Williamson also wrote parts of the CSS, Namespaces, and VML chapters for the first edition. WandaJane Phillips wrote the original version of Chapter 27 on CDF that is adapted here.
I also greatly appreciate all the comments, questions, and corrections sent in by readers of the first edition and XML: Extensible Markup Language. I hope that I’ve managed to address most of those comments in this book. They’ve definitely helped make the XML Bible a better book. Particular thanks are due to Michael Dyck, Alan Esenther, and Donald Lancon Jr. for their especially detailed comments.
The agenting talents of David and Sherry Rogelberg of the Studio B Literary Agency (http://www.studiob.com/) have made it possible for me to write more or less full-time. I recommend them highly to anyone thinking about writing computer books. And as always, thanks go to my wife, Beth, for her endless love and understanding.
Preface
Acknowledgments
Part I: Introducing XML
Chapter 1: An Eagle’s Eye View of XML
Chapter 2: XML Applications
Chapter 3: Your First XML Document
Chapter 4: Structuring Data
Chapter 5: Attributes, Empty Tags, and XSL
Chapter 6: Well-formedness
Chapter 7: Foreign Languages and Non-Roman Text
Part II: Document Type Definitions
Chapter 8: DTDs and Validity
Chapter 9: Element Declarations
Chapter 10: Entity Declarations
Chapter 11: Attribute Declarations
Chapter 12: Unparsed Entities, Notations, and Non-XML Data
Chapter 13: Namespaces
Part III: Style Languages
Chapter 14: CSS Style Sheets
Chapter 15: CSS Layouts
Chapter 16: CSS Text Styles
Chapter 17: XSL Transformations
Chapter 18: XSL Formatting Objects
Part IV: Supplemental Technologies
Chapter 19: XLinks
Chapter 20: XPointers
Chapter 21: The Resource Description Framework
Chapter 24: Schemas
Chapter 25: Scalable Vector Graphics
Chapter 26: The Vector Markup Language
Chapter 27: The Channel Definition Format
Chapter 28: Designing a New XML Application
Appendix A: What’s on the CD-ROM
Appendix B: XML Reference Material
Appendix C: The XML 1.0 Specification, Second Edition
Index
End-User Licence Agreement
CD-ROM Installation Instructions
Preface
Acknowledgments
Part I: Introducing XML
Chapter 1: An Eagle’s Eye View of XML
What Is XML?
XML is a meta-markup language
XML describes structure and semantics, not formatting
Why Are Developers Excited About XML?
Design of field-specific markup languages
Self-describing data
Interchange of data among applications
Structured and integrated data
The Life of an XML Document
Editors
Parsers and processors
Browsers and other applications
The process summarized
Related Technologies
HTML
Cascading Style Sheets
Extensible Stylesheet Language
URLs and URIs
XLinks and XPointers
The Unicode character set
Putting the pieces together
Chapter 2: XML Applications
XML Applications
Chemical Markup Language
Mathematical Markup Language
Channel Definition Format
Classic literature
Synchronized Multimedia Integration Language
HTML+TIME
Open Software Description
Scalable Vector Graphics
Vector Markup Language
MusicML
VoiceXML
Open Financial Exchange
Extensible Forms Description Language
HR-XML
Resource Description Framework
XML for XML
XSL
XLinks
Schemas
Behind-the-Scene Uses of XML
Microsoft Office 2000
Netscape’s What’s Related
Chapter 3: Your First XML Document
Hello XML
Creating a simple XML document
Saving the XML file
Loading the XML file into a Web browser
Exploring the Simple XML Document
Assigning Meaning to XML Tags
Writing a Style Sheet for an XML Document
Attaching a Style Sheet to an XML Document
Chapter 4: Structuring Data
Examining the Data
Batters
Pitchers
Organization of the XML data
XMLizing the Data
Starting the document: XML declaration and root element
XMLizing league, division, and team data
XMLizing player data
XMLizing player statistics
Putting the XML document back together
The Advantages of the XML Format
Preparing a Style Sheet for Document Display
Linking to a style sheet
Assigning style rules to the root element
Assigning style rules to titles
Assigning style rules to player and statistics elements
Summing up
Chapter 5: Attributes, Empty Tags, and XSL
Attributes
Attributes versus Elements
Structured metadata
Meta-metadata
What’s your metadata is someone else’s data
Elements are more extensible
Good times to use attributes
Empty Elements and Empty Element Tags
Separation of pitchers and batters
Element contents and the select attribute
A document must have exactly one root element that completely contains all other elements
Text in XML
Elements and Tags
Element names
Every start tag must have a corresponding end tag
Empty element tags
Elements may nest but may not overlap
Chapter 7: Foreign Languages and Non-Roman Text
Non-Roman Scripts on the Web
Scripts, Character Sets, Fonts, and Glyphs
A character set for the script
A font for the character set
An input method for the character set
Operating system and application software
Legacy Character Sets
The ASCII character set
The ISO character sets
The MacRoman character set 193
The Windows ANSI character set 194
The Unicode Character Set 195
Unicode Encodings 201
Unicode 3.1 202
How to Write XML in Unicode 202
Converting to and from Unicode 203
Inserting characters in XML files with character references 204
How to write XML in other character sets 205
Chapter 8: DTDs and Validity 211
Document Type Definitions 211
Element Declarations 212
DTD Files 214
Document Type Declarations 215
Internal DTDs 216
Internal and external DTD subsets 217
Public DTDs 218
DTDs and style sheets 219
Validating Against a DTD 220
Command-line validators 221
Web-based validators 222
Chapter 9: Element Declarations 227
Analyzing the Document 227
The ANY Content Model 233
The #PCDATA Content Model 234
Child Elements 237
Sequences 239
One or More Children 240
Zero or More Children 240
Zero or One Child 241
Grouping with Parentheses 244
Choices 246
Mixed Content 247
Empty Elements 248
Comments in DTDs 249
Chapter 10: Entity Declarations 257
What Is an Entity? 257
Internal General Entities 258
Defining an internal general entity reference 259
Using general entity references in the DTD 262
Predefined general entity references 263
External General Entities 264
Text declarations 266
Nonvalidating parsers 268
Internal Parameter Entities 268
External Parameter Entities 270
Building a Document from Pieces 276
Chapter 11: Attribute Declarations 289
What Is an Attribute? 289
Declaring Attributes in DTDs 290
Declaring Multiple Attributes 291
Specifying Default Values for Attributes 292
#REQUIRED 292
#IMPLIED 293
#FIXED 294
Attribute Types 294
The CDATA attribute type 295
The NMTOKEN attribute type 295
The NMTOKENS attribute type 296
The enumerated attribute type 296
The ID attribute type 297
The IDREF attribute type 298
The IDREFS attribute type 299
The ENTITY attribute type 300
The ENTITIES attribute type 300
The NOTATION attribute type 301
Predefined Attributes 301
xml:space 302
xml:lang 303
Declarations of xml:lang 308
A DTD for Attribute-Based Baseball Statistics 308
Declaring SEASON attributes in the DTD 310
Declaring LEAGUE and DIVISION attributes in the DTD 310
Declaring TEAM attributes in the DTD 311
Declaring PLAYER attributes in the DTD 311
The complete DTD for the baseball statistics example 314
Chapter 12: Unparsed Entities, Notations, and Non-XML Data 317
Notations 318
Unparsed Entities 321
Declaring unparsed entities 321
Embedding unparsed entities 322
Embedding multiple unparsed entities 325
Processing Instructions 325
Conditional Sections in DTDs 329
Chapter 13: Namespaces 331
The Need for Namespaces 331
Namespace Syntax 333
Defining namespaces with xmlns attributes 336
Multiple namespaces 339
Attributes 343
Default namespaces 344
Namespaces and Validity 349
Chapter 14: CSS Style Sheets 353
What Are Cascading Style Sheets? 353
A simple CSS style sheet 354
Attaching style sheets to documents 354
Document Type Definitions and style sheets 357
CSS1 versus CSS2 358
CSS3 358
Comments in CSS 359
Selecting Elements 360
The universal selector 362
Grouping selectors 363
Hierarchy selectors 364
Attribute selectors 366
ID selectors 366
Pseudo-elements 367
Pseudo-classes 369
Inheritance 371
Cascades 372
Different Rules for Different Media 374
Importing Style Sheets 375
Style Sheet Character Sets 376
Chapter 15: CSS Layouts 379
CSS Units 380
Length values 381
URL values 383
Color values 384
Keyword values 388
Strings 388
The Display Property 388
Inline elements 393
Block elements 393
None 393
Compact and run-in elements 394
The width and height properties 410
The min-width and min-height properties 412
The max-width and max-height properties 413
The overflow property 413
Clipping 414
Positioning 415
The position property 415
Stacking elements with the z-index property 419
The float property 420
The clear property 421
Formatting Pages 422
@page 422
The size property 422
The margin property 423
The mark property 423
The page property 423
Controlling page breaks 424
Widows and orphans 425
Chapter 16: CSS Text Styles 427
Font Properties 427
Choosing the font family 428
Choosing the font style 430
Small caps 431
Setting the font weight 431
Setting the font size 432
The font shorthand property 438
The Color Property 439
Text Properties 440
Word spacing 441
The letter-spacing property 441
The text-decoration property 443
The vertical-align property 444
The text-transform property 445
The text-align property 445
The text-indent property 446
The text-shadow property 446
The line-height property 448
The white-space property 449
Background Properties 451
The background-color property 452
The background-image property 452
The background-repeat property 454
The background-attachment property 457
The background-position property 458
The background shorthand property 462
Visibility 463
Cursors 464
The Content Property 465
Quotes 466
Attributes 467
URIs 467
Counters 468
Aural Style Sheets 472
The speak property 473
The volume property 473
Pause properties 474
Cue properties 474
Play-during property 474
Spatial properties 475
Voice characteristics 476
Speech properties 478
Chapter 17: XSL Transformations 481
What Is XSL? 481
Overview of XSL Transformations 482
Trees 483
XSLT style sheet documents 486
Where does the XML transformation happen? 488
How to use Xalan 488
Direct display of XML files with XSLT style sheets 491
XSL Templates 493
The xsl:apply-templates element 494
The select attribute 496
Computing the Value of a Node with xsl:value-of 497
Processing Multiple Elements with xsl:for-each 499
Patterns for Matching Nodes 499
Matching the root node 500
Matching element names 501
Wild cards 502
Matching children with / 504
Matching descendants with // 505
Matching by ID 505
Matching attributes with @ 506
Matching comments with comment( ) 508
Matching processing instructions with processing-instruction( ) 509
Matching text nodes with text( ) 510
Using the or operator | 510
Testing with [ ] 511
XPath Expressions for Selecting Nodes 513
Node axes 514
Expression types 520
The Default Template Rules 531
The default rule for elements 531
The default rule for text nodes and attributes 532
The default rule for processing instructions and comments 532
Implications of the default rules 532
Deciding What Output to Include 533
Attribute value templates 533
Inserting elements into the output with xsl:element 535
Inserting attributes into the output with xsl:attribute 536
Defining attribute sets 537
Generating processing instructions with xsl:processing-instruction 538
Generating comments with xsl:comment 539
Generating text with xsl:text 539
Copying the Context Node with xsl:copy 540
Counting Nodes with xsl:number 542
Default numbers 543
Number to string conversion 547
Sorting Output Elements 548
Modes 551
Defining Constants with xsl:variable 553
Named Templates 555
Passing Parameters to Templates 556
Stripping and Preserving White Space 557
Making Choices 559
xsl:if 559
xsl:choose 559
Merging Multiple Style Sheets 560
Importing with xsl:import 560
Inclusion with xsl:include 561
Embedding with xsl:stylesheet 561
Chapter 18: XSL Formatting Objects 571
Formatting Objects and Their Properties 571
Formatting properties 574
Transforming to formatting objects 579
Using FOP 581
Page Layout 583
The root element 583
Simple page masters 584
Page sequences 587
Page sequence masters 596
Content 599
Block-level formatting objects 599
Inline formatting objects 600
Table formatting objects 601
Out-of-line formatting objects 601
Leaders and Rules 602
Graphics 604
fo:external-graphic 604
fo:instream-foreign-object 607
Graphic properties 609
Links 611
Lists 612
Tables 616
Inlines 622
Footnotes 623
Floats 623
Formatting Properties 624
The id property 625
The language property 625
Paragraph properties 625
Character properties 628
Sentence properties 631
Area properties 633
Aural properties 640
Chapter 19: XLinks 647
XLinks versus HTML Links 647
Linking Elements 648
Declaring XLink attributes in document type definitions 650
Descriptions of the Remote Resource 652
Link Behavior 653
The xlink:show attribute 653
The xlink:actuate attribute 655
Extended Links 657
Extended Link Syntax 658
Arcs 661
Out-of-Line Links 669
Chapter 20: XPointers 677
Why Use XPointers? 677
XPointer Examples 678
A Concrete Example 681
Location Paths, Steps, and Sets 684
The Root Node 686
Axes 686
The child axis 687
The descendant axis 688
The descendant-or-self axis 689
The parent axis 689
The self axis 689
The ancestor axis 689
The ancestor-or-self axis 689
The preceding axis 690
The following axis 690
The preceding-sibling axis 690
The following-sibling axis 690
The attribute axis 691
The namespace axis 691
The RDF Root Element 710
The Description element 710
Namespaces 711
Multiple properties and statements 713
Resource valued properties 715
XML valued properties 718
Abbreviated RDF syntax 718
Containers 719
The Bag container 720
The Seq container 722
The Alt container 723
Statements about containers 724
Statements about container members 727
Statements about implied bags 729
RDF Schemas 729
Chapter 22: XHTML 735
Why Validate HTML? 735
Moving to XHTML 737
Making the document well-formed XML 740
Making the document valid 747
The strict DTD 755
The frameset DTD 768
HTML Tidy 769
What’s New in XHTML 773
Character references 773
Custom entity references defined in DTD 777
Encoding declarations 780
The xml:lang attribute 781
CDATA sections 782
Chapter 23: The Wireless Markup Language 787
What Is WML? 788
Hello WML 788
The WML MIME media type 789
Browsing the Web from your phone 790
Cell phone simulators 791
Basic Text Markup 794
Tables 796
Images 798
Entity references 799
Cards and Links 800
Multicard decks 800
The do element 801
Anchors 804
Selections 807
The Options Menu 809
Templates 810
Events 811
The Header 814
The access element 814
Meta 815
Variables 816
Reading and writing variables 816
Input fields 819
Select 821
Setting a new context for variables 821
Talking Back to the Server 822
The greeting schema 832
Validating the document against the schema 834
Numeric data types 854
Time data types 856
XML data types 857
String data types 858
Miscellaneous data types 859
Schemas for default namespaces 871
Multiple namespaces, multiple schemas 875
The rect element 891
The circle element 894
The ellipse element 895
The line element 896
Polygons and polylines 898
Paths 899
Arcs 902
Curves 905
Text 907
Strings 907
Text on a path 909
Fonts and text styles 911
Text spans 912
Bitmapped Images 913
Coordinate Systems and Viewports 914
The viewport 915
Coordinate systems 917
Grouping Shapes 921
Referencing Shapes 922
Transformations 924
Linking 932
Metadata 933
SVG Editors 936
Chapter 26: The Vector Markup Language 939
What Is VML? 939
Drawing with a Keyboard 941
The shape element 942
Other shape attributes 944
Shape child elements 945
Predefined shapes 946
The shapetype element 947
The group element 949
Positioning VML Shapes with CSS Properties 950
The rotation property 953
The flip property 955
The center-x and center-y properties 956
VML in Microsoft Office 956
Settings 957
Drawing a house 958
Chapter 27: The Channel Definition Format 965
What Is the Channel Definition Format? 965
Creating Channels 966
Determining channel content 966
Creating CDF files and documents 967
Linking the Web page to the channel 968
Describing the Channel 970
Title 970
Abstract 972
Logos 973
Scheduling Updates 975
Precaching and Web Crawling 978
Precaching 978
Web crawling 978
The Reader Access Log 979
The BASE Attribute 981
The LASTMOD Attribute 982
The USAGE Element 984
Chapter 28: Designing a New XML Application 995
Organization of the Data 995
Listing the elements 997
Identifying the fundamental elements 998
Establishing relationships among the elements 1000
The Person DTD 1002
The Family DTD 1007
The Source DTD 1009
The Family Tree DTD 1010
Designing a Style Sheet for Family Trees 1017
Appendix A: What’s on the CD-ROM 1025
Appendix B: XML Reference Material 1029
Appendix C: The XML 1.0 Specification, Second Edition 1089
Index 1153
End-User Licence Agreement 1212
CD-ROM Installation Instructions 1214
Chapter 4
Structuring Data
Chapter 5
Attributes, Empty Tags, and XSL
Chapter 6
Well-formedness
Chapter 7
Foreign Languages and Non-Roman Text
I
An Eagle’s Eye View of XML
terms, what XML is and how it is used. It shows you how
the different pieces of the XML equation fit together, and how
an XML document is created and delivered to readers.
What Is XML?
XML stands for Extensible Markup Language (often
miscapitalized as eXtensible Markup Language to justify the acronym).
XML is a set of rules for defining semantic tags that break a
document into parts and identify the different parts of the
document. It is a meta-markup language that defines a syntax
in which other field-specific markup languages can be written.
XML is a meta-markup language
The first thing you need to understand about XML is that it
isn’t just another markup language like Hypertext Markup
Language (HTML) or TeX. These languages define a fixed set
of tags that describe a fixed number of elements. If the
markup language you use doesn’t contain the tag you need,
you’re out of luck. You can wait for the next version of the
markup language, hoping that it includes the tag you need,
but then you’re really at the mercy of whatever the vendor
chooses to include.
XML, however, is a meta-markup language. It’s a language in
which you make up the tags you need as you go along. These
tags must be organized according to certain general
principles, but they’re quite flexible in their meaning. For instance,
if you’re working on genealogy and need to describe family
names, personal names, dates, births, adoptions, deaths,
burial sites, families, marriages, divorces, and so on, you can
create tags for each of these. You don’t have to force your
data to fit into paragraphs, list items, table cells, and other
very general categories.
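To make this concrete, here is a minimal sketch of what such made-up genealogy tags might look like. Every tag name and every value below is an assumption invented for illustration; none of it comes from an existing standard, and you would choose whatever names suit your own data.

```xml
<?xml version="1.0"?>
<!-- Hypothetical tags and placeholder values, invented for this sketch -->
<PERSON>
  <NAME>
    <PERSONAL_NAME>Jane</PERSONAL_NAME>
    <FAMILY_NAME>Doe</FAMILY_NAME>
  </NAME>
  <BIRTH>
    <DATE>1 January 1900</DATE>
  </BIRTH>
  <DEATH>
    <DATE>31 December 1980</DATE>
    <BURIAL_SITE>Example Cemetery</BURIAL_SITE>
  </DEATH>
</PERSON>
```

Each tag says what its content is (a family name, a date, a burial site) rather than how it should look on the page.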
1
In This Chapter
What is XML?
Why are developers excited about XML?
The life of an XML document
Related technologies
The tags you create can be documented in a Document Type Definition (DTD).
You’ll learn more about DTDs in Part II of this book. For now, think of a DTD as a
vocabulary and a syntax for certain kinds of documents. For example, the MOL.DTD
in Peter Murray-Rust’s Chemical Markup Language (CML) describes a vocabulary
and a syntax for the molecular sciences: chemistry, crystallography, solid state
physics, and the like. It includes tags for atoms, molecules, bonds, spectra, and so
on. Many different people in the field can share this DTD. Other DTDs are available
for other fields, and you can create your own.
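As a rough sketch of what "a vocabulary and a syntax" means in practice, the DTD fragment below declares a handful of hypothetical genealogy elements. The element names are assumptions made up for this illustration (they are not taken from CML or any other published DTD). The declarations say only which elements exist and how they may nest.

```xml
<!-- Hypothetical genealogy vocabulary, invented for illustration -->
<!ELEMENT PERSON (NAME, BIRTH?, DEATH?)>
<!ELEMENT NAME (PERSONAL_NAME, FAMILY_NAME)>
<!ELEMENT PERSONAL_NAME (#PCDATA)>
<!ELEMENT FAMILY_NAME (#PCDATA)>
<!ELEMENT BIRTH (DATE)>
<!ELEMENT DEATH (DATE, BURIAL_SITE?)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT BURIAL_SITE (#PCDATA)>
```

Read the first line as: a PERSON contains one NAME, optionally followed by a BIRTH and then a DEATH. The element names form the vocabulary, and the content models in parentheses form the syntax.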
XML defines the meta syntax that field-specific markup languages such as MusicML,
MathML, and CML must follow. It specifies the rules for the low-level syntax, saying
how markup is distinguished from content, how attributes are attached to elements,
and so forth without saying what these tags, elements, and attributes are or
what they mean. It specifies the patterns that elements must follow without giving
the >
If an application understands this meta syntax, it at least partially understands all
the languages built from this meta syntax. A browser does not need to know in
advance each and every tag that might be used by thousands of different markup
languages. Instead, it discovers the tags used by any given document as it reads the
document or its DTD. The detailed instructions about how to display the content of
these tags are provided in a separate style sheet that is attached to the document.
For example, consider the three-dimensional Schrödinger equation:
Scientific papers are full of equations like this, but scientists have been waiting
eight years for the browser vendors to support the tags needed to write even the
most basic math. Musicians are in a similar bind, because Netscape and Internet
Explorer can’t display sheet music.
XML means you don’t have to wait for browser vendors to catch up with what you
want to do. You can invent the tags you need, when you need them, and tell the
browsers how to display these tags.
XML describes structure and semantics, not formatting
The second thing to understand about XML is that XML markup describes a
document’s structure and meaning. It does not describe the formatting of the elements
on the page. Formatting can be added to a document with a style sheet. The
document itself only contains tags that say what is in the document, not what the
document looks like.
that the contents are a cell in a table. In fact, some tags can have all three kinds of
heading, and the title of the page
For example, in HTML a song might be described using a definition title, definition
data, an unordered list, and list items. But none of these elements actually have
anything to do with music. The HTML might look something like this:
<dt>Hot Cop
<dd> by Jacques Morali, Henri Belolo, and Victor Willis
<ul>
<li> Jacques Morali
<li> PolyGram Records
any preexisting standard or specification. I just made them up on the spot because