Beginning XML
4th Edition
David Hunter, Jeff Rafter, Joe Fawcett, Eric van der Vlist, Danny Ayers, Jon Duckett, Andrew Watt, and Linda McKinnon
Copyright © 2007 by Wiley Publishing, Inc., Indianapolis, Indiana
Published simultaneously in Canada
ISBN: 978-0-470-11487-2
Manufactured in the United States of America
10 9 8 7 6 5 4 3 2 1
Library of Congress Cataloging-in-Publication Data:
Beginning XML / David Hunter [et al.] 4th ed
01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Legal Department, Wiley Publishing, Inc., 10475 Crosspoint Blvd., Indianapolis, IN 46256, (317) 572-3447, fax (317) 572-4355, or online at www.wiley.com/go/permissions.
LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: THE PUBLISHER AND THE AUTHOR MAKE NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS WORK AND SPECIFICALLY DISCLAIM ALL WARRANTIES, INCLUDING WITHOUT LIMITATION WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES OR PROMOTIONAL MATERIALS. THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR EVERY SITUATION. THIS WORK IS SOLD WITH THE UNDERSTANDING THAT THE PUBLISHER IS NOT ENGAGED IN RENDERING LEGAL, ACCOUNTING, OR OTHER PROFESSIONAL SERVICES. IF PROFESSIONAL ASSISTANCE IS REQUIRED, THE SERVICES OF A COMPETENT PROFESSIONAL PERSON SHOULD BE SOUGHT. NEITHER THE PUBLISHER NOR THE AUTHOR SHALL BE LIABLE FOR DAMAGES ARISING HEREFROM. THE FACT THAT AN ORGANIZATION OR WEBSITE IS REFERRED TO IN THIS WORK AS A CITATION AND/OR A POTENTIAL SOURCE OF FURTHER INFORMATION DOES NOT MEAN THAT THE AUTHOR OR THE PUBLISHER ENDORSES THE INFORMATION THE ORGANIZATION OR WEBSITE MAY PROVIDE OR RECOMMENDATIONS IT MAY MAKE. FURTHER, READERS SHOULD BE AWARE THAT INTERNET WEBSITES LISTED IN THIS WORK MAY HAVE CHANGED OR DISAPPEARED BETWEEN WHEN THIS WORK WAS WRITTEN AND WHEN IT IS READ.
For general information on our other products and services please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Trademarks: Wiley, the Wiley logo, Wrox, the Wrox logo, Programmer to Programmer, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not be used without written permission. All other trademarks are the property of their respective owners. Wiley Publishing, Inc., is not associated with any product or vendor mentioned in this book.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.
I would like to thank God, for continuing to give me opportunities to do what I love; my church family, for giving me more support than I deserve; and Andrea, for giving me more support than anyone deserves.
I would also like to thank the editors, for their constant help Their dedication to the quality of this book was a major factor in its success.
—David
To Ali and Jude, for their loving patience.
—Jeff
To my two brothers, Peter and Stephen, who have both helped me in my life
and career in their own ways, many thanks.
About the Authors
David Hunter is a Senior Technical Consultant for CGI, a full-service IT and business process services partner. Providing technical leadership and guidance for solving his clients’ business problems, he is a jack-of-all-trades and master of some. With a career that has included design, development, support, training, writing, and other roles, he has had extensive experience building scalable, reliable, enterprise-class applications. David loves to peek under the hood at any new technology that comes his way, and when one catches his fancy, he really gets his hands dirty. He loves nothing more than sharing these technologies with others.
Jeff Rafter is an independent consultant based in Redlands, California. His focus is on emerging technology and web standards, including XML and validation. He currently works with Baobab Health Partnership with a focus on improving world health.
Joe Fawcett (http://joe.fawcett.name) started programming in the 1970s and worked briefly in IT when leaving full-time education. He then pursued a more checkered career before returning to software development in 1994. In 2003 he was awarded the title of Microsoft Most Valuable Professional in XML for community contributions and technical expertise; he has subsequently been re-awarded every year since. Joe currently works in London and is head of software development for FTC Kaplan Ltd., a leading international provider of accountancy and business training.
Eric van der Vlist is an independent consultant and trainer. His domains of expertise include web development and XML technologies. He is the creator and main editor of XMLfr.org, the main site dedicated to XML technologies in French, the lead author of Professional Web 2.0 Programming, the author of the O’Reilly animal books XML Schema and RELAX NG, and a member of the ISO DSDL (http://dsdl.org) working group focused on XML schema languages. He is based in Paris and can be reached at vdv@dyomedea.com, or meet him at one of the many conferences where he presents his projects.
Danny Ayers is a freelance developer and consultant specializing in cutting-edge web technologies. His blog (http://dannyayers.com) tends to feature material relating to the Semantic Web and/or cat photos.
Linda McKinnon has more than 10 years of experience as a successful trainer and network engineer, assisting both private and public enterprises in network architecture design, implementation, system administration, and RFP procurement. She is a renowned mentor and has published numerous Linux study guides for Wiley Press and Gearhead Press.
Quality Control Technician
Proofreading
Aptara
Indexing
Broccoli Information Management
Anniversary Logo Design
Richard Pacifico
This book would not have been possible without the work of the many developers dedicated to improving the Web through standards. We would also like to thank the countless contributors to mailing lists, IRC channels, forums, and friends that have helped us through the difficult corners of the specifications and technologies presented in this book.
Thanks to Nicholas C. Zakas for his ideas and assistance in implementing the AutoSuggest Control. Many thanks to Phillip Pearson, who runs TopicExchange.com. He provided much-needed technical support that otherwise would have meant rewriting most of Chapter 14. We would also like to thank Jim Ley and Doug Schepers for their assistance on the case study and Chapter 19. Special thanks to our lead editor, Sara Shlaer, for her gentle and not so gentle persuasive powers and attention to detail; to editor Lisa Thibault, for her thoughtful assistance; and to Phred Menyhert, for a rigorous technical edit. Many thanks to our acquisitions editor, Jim Minatel, who has shepherded this book through many incarnations.
Where XML Can Be Used, and What You Can Use It For
Part II: Validation
Creating Elements with Simple Content and Attributes
Part III: Processing
Chapter 8: XSLT
Influencing the Output with the <xsl:output> Element
Named Templates and the <xsl:call-template> Element
Part IV: Databases
Exercise Questions
Two Ways to View DOM Nodes
Part VI: Communication
Exercise Questions
Part VII: Display
XForms Form Controls
Chapter 22: Case Study: Payment Calculator — Ruby on Rails Online
Appendix A: Exercise Solutions
Appendix E: XML Schema Element and Attribute Reference Online
Welcome to Beginning XML, Fourth Edition, the book I wish I’d had when I was first learning the language! When we wrote the first edition of this book, XML was a relatively new language but already gaining ground fast and becoming more and more widely used in a vast range of applications. By the time we started the second edition, XML had already proven itself to be more than a passing fad, and was in fact being used throughout the industry for an incredibly wide range of uses. As we began the third edition, it was clear that XML was a mature technology, but more important, it became evident that the XML landscape was dividing into several areas of expertise. In this edition, we needed to categorize the increasing number of specifications surrounding XML, which either use XML or provide functionality in addition to the XML core specification.
So what is XML? It’s a markup language, used to describe the structure of data in meaningful ways. Anywhere that data is input/output, stored, or transmitted from one place to another, is a potential fit for XML’s capabilities. Perhaps the most well-known applications are web-related (especially with the latest developments in handheld web access, for which some of the technology is XML-based).
However, there are many other non-web-based applications for which XML is useful: for example, as a replacement for (or to complement) traditional databases, or for the transfer of financial information between businesses. News organizations, along with individuals, have also been using XML to distribute syndicated news stories and blog entries.
This book aims to teach you all you need to know about XML: what it is, how it works, what technologies surround it, and how it can best be used in a variety of situations, from simple data transfer to using XML in your web pages. It answers the fundamental questions:
❑ What is XML?
❑ How do you use XML?
❑ How does it work?
❑ What can you use it for, anyway?
Who Is This Book For?
This book is for people who know that it would be a pretty good idea to learn XML but aren’t 100 percent sure why. You’ve heard the hype but haven’t seen enough substance to figure out what XML is and what it can do. You may be using development tools that try to hide the XML behind user interfaces and scripts, but you want to know what is really happening behind the scenes. You may already be somehow involved in web development and probably even know the basics of HTML, although neither of these qualifications is absolutely necessary for this book.
What you don’t need is knowledge of markup languages in general. This book assumes that you’re new to the concept of markup languages, and we have structured it in a way that should make sense to the beginner and yet quickly bring you to XML expert status.
The word “Beginning” in the title refers to the style of the book, rather than the reader’s experience level. There are two types of beginner for whom this book is ideal:
❑ Programmers who are already familiar with some web programming or data exchange techniques. Programmers in this category will already understand some of the concepts discussed here, but you will learn how you can incorporate XML technologies to enhance those solutions you currently develop.
❑ Those working in a programming environment but with no substantial knowledge or experience of web development or data exchange applications. In addition to learning how XML technologies can be applied to such applications, you will be introduced to some new concepts to help you understand how such systems work.
How This Book Is Organized
We’ve arranged the subjects covered in this book to take you from novice to expert in as logical a manner as we could. In this Fourth Edition, we have structured the book in sections that are based on various areas of XML expertise. Unless you are already using XML, you should start by reading the introduction to XML in Part I. From there, you can quickly jump into specific areas of expertise, or, if you prefer, you can read through the book in order. Keep in mind that there is quite a lot of overlap in XML, and that some of the sections make use of techniques described elsewhere in the book.
❑ We begin by explaining what exactly XML is and why the industry felt that a language like this was needed.
❑ After covering the why, the next logical step is the how, so we show you how to create well-formed XML.
❑ Once you understand the whys and hows of XML, you’ll go on to some more advanced things you can do when creating your XML documents, to make them not only well formed, but valid. (And you’ll learn what “valid” really means.)
❑ After you’re comfortable with XML and have seen it in action, we unleash the programmer within and look at an XML-based programming language that you can use to transform XML documents from one format to another.
❑ Eventually, you will need to store and retrieve XML information from databases. At this point, you will learn not only the state of the art for XML and databases, but also how to query XML information using an SQL-like syntax called XQuery.
❑ XML wouldn’t really be useful unless you could write programs to read the data in XML documents and create new XML documents, so we’ll get back to programming and look at a couple of ways that you can do that.
❑ Understanding how to program and use XML within your own business is one thing, but sending that information to a business partner or publishing it to the Internet is another. You’ll learn about technologies that use XML that enable you to send messages across the Internet, publish information, and discover services that provide information.
❑ Since you have all of this data in XML format, it would be great if you could easily display it to people, and it turns out you can. We’ll show you an XML version of HTML called XHTML. You’ll also look at a technology you may already be using in conjunction with HTML documents called CSS. CSS enables you to add visual styles to your XML documents. In addition, you’ll learn how to design stunning graphics and make interactive forms using XML.
❑ Finally, we end with a case study, which should help to give you ideas about how XML can be used in real-life situations, and which could be used in your own applications.
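To make the idea of well-formed XML from the list above a little more concrete before the chapters arrive, here is a minimal sketch in Python. It assumes only the standard library's xml.etree.ElementTree parser; the helper name is_well_formed and the sample <order> and <item> elements are invented for illustration and do not come from the book. Note that this checks well-formedness only, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents (not taken from the book).
good = "<order><item>Widget</item></order>"
bad = "<order><item>Widget</order>"  # <item> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

The parser rejects the second document because its start and end tags do not nest properly, which is one of the basic well-formedness rules.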
What’s Covered in This Book
This book builds on the strengths of the earlier editions, and provides new material to reflect the changes in the XML landscape, notably XQuery, RSS and Atom, and AJAX. Updates have been made to reflect the most recent versions of specifications and best practices throughout the book. In addition to the many changes, each chapter has a set of exercise questions to test your understanding of the material. Possible solutions to these questions appear in Appendix A.
Here we cover some basic concepts, introducing the fact that XML is a markup language (a bit like HTML) whereby you can define your own elements, tags, and attributes (known as a vocabulary). You’ll see that tags have no presentation meaning; they’re just a way to describe the structure of the data.
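As a rough illustration of what defining your own vocabulary can look like, the sketch below parses a tiny document with Python's standard xml.etree.ElementTree module. The <person>, <name>, <first>, and <last> elements and the id attribute are made up for this example; nothing here is prescribed by XML itself, which is exactly the point.

```python
import xml.etree.ElementTree as ET

# A made-up vocabulary: the element and attribute names below are
# illustrative assumptions, not part of any standard.
doc = """<person id="42">
  <name><first>John</first><last>Doe</last></name>
</person>"""

root = ET.fromstring(doc)

# The tags carry structure only; how the data is presented is up to the program.
first = root.findtext("name/first")
last = root.findtext("name/last")
print(root.tag, root.get("id"), first, last)
```

Because the tags describe structure rather than appearance, the same document could just as easily feed a report, a web page, or a database import.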
Part II: Validation
In addition to the well-formedness rules you learn in Part I, you will most likely want to learn how to create and use different XML vocabularies. This Part introduces you to DTDs, XML Schemas, and RELAX NG: three languages that define custom XML vocabularies. We also show you how to utilize these definitions to validate your XML documents.
Chapter 4: Document Type Definitions
You can specify how an XML document should be structured, and even provide default values, using Document Type Definitions (DTDs). If XML conforms to the associated DTD, it is known as valid XML. This chapter covers the basics of using DTDs.
Chapter 5: XML Schemas
XML Schemas, like DTDs, enable you to define how a document should be structured. In addition to defining document structure, they enable you to specify the individual datatypes of attribute values and element content. They are a more powerful alternative to DTDs.
Chapter 6: RELAX NG
RELAX NG is a third technology used to define the structure of documents. In addition to a new syntax and new features, it takes the best from XML Schemas and DTDs, and is therefore very simple and very powerful. RELAX NG has two syntaxes; both the full syntax and compact syntax are discussed.
Part III: Processing
In addition to defining and creating XML documents, you need to know how to work with documents to extract information and convert it to other formats. In fact, easily extracting information and converting it to other formats is what makes XML so powerful.
Part IV: Databases
Creating and processing XML documents is good, but eventually you will want to store those documents. This section describes strategies for storing and retrieving XML documents and document fragments from different databases.
Chapter 9: XQuery, the XML Query Language
Very often, you will need to retrieve information from within a database. XQuery, which is built on XPath and XPath2, enables you to do this in an elegant way.
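XQuery itself is not available in the Python standard library, so the following is only a sketch of the kind of path expression that XQuery builds on. It assumes the limited XPath subset supported by xml.etree.ElementTree, and the <catalog> data (including the second book title) is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog data; only the first title is a real book name.
catalog = ET.fromstring("""
<catalog>
  <book year="2007"><title>Beginning XML</title></book>
  <book year="2004"><title>Older Title</title></book>
</catalog>""")

# Path expression: every <title> inside a <book> whose year attribute is 2007.
titles = [t.text for t in catalog.findall("./book[@year='2007']/title")]
print(titles)
```

A full XQuery processor adds much more on top of such paths (iteration, sorting, constructing new XML), but selecting nodes by path and predicate is the common starting point.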
Chapter 10: XML and Databases
XML is perfect for structuring data, and some traditional databases are beginning to offer support for XML. This chapter discusses these, and provides a general overview of how XML can be used in an n-tier architecture. In addition, new databases based on XML are introduced.
Part V: Programming
At some point in your XML career, you will need to work with an XML document from within a custom application. The two most popular methodologies, the Document Object Model (DOM) and the Simple API for XML (SAX), are explained in this part.
Chapter 11: The Document Object Model (DOM)
Programmers can use a variety of programming languages to manipulate XML using the Document Object Model’s objects, interfaces, methods, and properties, which are described in this chapter.
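As a small taste of the DOM style of programming, here is a sketch that uses Python's standard xml.dom.minidom module, one of many DOM implementations. The <library> and <book> elements and the second title are assumptions made up for this example.

```python
from xml.dom import minidom

# Parse a tiny, made-up document into an in-memory DOM tree.
dom = minidom.parseString("<library><book>Beginning XML</book></library>")
root = dom.documentElement

# Create a new <book> element with a text node and attach it to the tree.
new_book = dom.createElement("book")
new_book.appendChild(dom.createTextNode("Another Title"))
root.appendChild(new_book)

# Walk the tree again and collect the text of every <book> element.
titles = [b.firstChild.data for b in root.getElementsByTagName("book")]
print(titles)
```

The whole document lives in memory as a tree of nodes, which makes it easy to navigate and modify at the cost of holding everything at once.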
Chapter 12: Simple API for XML (SAX)
An alternative to the DOM for programmatically manipulating XML data is to use the Simple API for XML (SAX) as an interface. This chapter shows how to use SAX and utilizes examples from the Java SAX API.
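The chapter itself works with the Java SAX API; purely as an illustration of the event-driven idea, the sketch below uses Python's standard xml.sax module instead. The TitleCollector class and the sample <books> document are hypothetical names chosen for this example.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every <title> element as parse events arrive."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def startElement(self, name, attrs):
        if name == "title":
            self.in_title = True
            self.titles.append("")

    def characters(self, content):
        # Text may arrive in several chunks, so append rather than assign.
        if self.in_title:
            self.titles[-1] += content

    def endElement(self, name):
        if name == "title":
            self.in_title = False


handler = TitleCollector()
xml.sax.parseString(
    b"<books><title>Beginning XML</title><title>SAX Notes</title></books>",
    handler,
)
print(handler.titles)
```

Unlike the DOM, no tree is built: the parser calls the handler as it reads, so the program keeps only what it chooses to remember.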
Part VI: Communication
Sending and receiving data from one computer to another is often difficult, but several technologies have been created to make communication with XML much easier. In this part we discuss RSS and content syndication, as well as web services and SOAP. This edition includes a new chapter on Ajax techniques.
Chapter 13: RSS, Atom, and Content Syndication
RSS is an actively evolving technology that is used to publish syndicated news stories and website summaries on the Internet. This chapter not only discusses how to use the different versions of RSS and Atom, it also covers the future direction of the technology. In addition, we demonstrate how to create a simple newsreader application that works with any of the currently published versions.
Chapter 14: Web Services
Web services enable you to perform cross-computer communications. This chapter describes web services and introduces you to using remote procedure calls in XML (using XML-RPC and REST), as well as giving you a brief look at major topics such as SOAP. Finally, it breaks down the assortment of specifications designed to work in conjunction with web services.
Chapter 15: SOAP and WSDL
Fundamental to XML web services, the Simple Object Access Protocol (SOAP) is one of the most popular specifications for allowing cross-computer communications. Using SOAP, you can package up XML documents and send them across the Internet to be processed. This chapter explains SOAP and the Web Services Description Language (WSDL) that is used to publish your service.
Chapter 16: Ajax
Ajax enables you to utilize JavaScript with web services and SOAP, or REST communications. Additionally, Ajax patterns can be used within web pages to communicate with the web server without refreshing. This chapter is new to the Fourth Edition.
Part VII: Display
Several XML technologies are devoted to displaying the data stored inside of an XML document. Some of these technologies are web-based, and some are designed for applications and mobile devices. In this part we discuss the primary display strategies and formats used today.
Chapter 17: Cascading Style Sheets (CSS)
Website designers have long been using Cascading Style Sheets (CSS) with their HTML to easily make changes to a website’s presentation without having to touch the underlying HTML documents. This power is also available for XML, enabling you to display XML documents right in the browser. Or, if you need a bit more flexibility with your presentation, you can use XSLT to transform your XML to HTML or XHTML and then use CSS to style these documents.
Chapter 18: XHTML
XHTML is a new version of HTML that follows the rules of XML. In this chapter we discuss the differences between HTML and XHTML, and show you how XHTML can help make your sites available to a wider variety of browsers, from legacy browsers to the latest browsers on mobile phones.
Chapter 19: Scalable Vector Graphics (SVG)
Do you want to produce a custom graphic using XML? SVG enables you to describe a graphic using XML-based vector commands. In this chapter we teach you the basics of SVG and then dive into a more complex SVG-based application that can be published to the Internet.
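Because an SVG graphic is just an XML document, any XML tool can produce one. The following sketch builds a one-circle graphic with Python's standard xml.etree.ElementTree module; the sizes, coordinates, and fill color are arbitrary values picked for illustration, not an example from the chapter.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize SVG elements in the default namespace instead of a generated prefix.
ET.register_namespace("", SVG_NS)

# Root <svg> element with an arbitrary 100x100 canvas.
svg = ET.Element(f"{{{SVG_NS}}}svg", {"width": "100", "height": "100"})

# A single circle described by its center, radius, and fill color.
ET.SubElement(
    svg,
    f"{{{SVG_NS}}}circle",
    {"cx": "50", "cy": "50", "r": "40", "fill": "steelblue"},
)

markup = ET.tostring(svg, encoding="unicode")
print(markup)
```

Saving the printed markup to a file with an .svg extension should give something an SVG-capable browser can display.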
Chapter 20: XForms
XForms are XML-based forms that can be used to design desktop applications, paper-based forms, and of course XHTML-based forms. In this chapter we demonstrate both the basics and some of the more interesting uses of XForms.
Part VIII: Case Study
Throughout the book you’ll gain an understanding of how XML is used in web, business-to-business (B2B), data storage, and many other applications. The case study covers an example application and shows how the theory can be put into practice in real-life situations. The case study is new to this edition.
Chapter 21: Case Study: Payment Calculator
This case study explores some of the possibilities and strategies for using XML in your website. It includes an example that demonstrates a loan payment calculator by creating a web page using XHTML and CSS, communicating with a local web service using AJAX, utilizing an XML Schema to build data structures in .NET, and ultimately using the Document Object Model to display the results in SVG. An online version of this case study on the book’s website covers the same material using Ruby on Rails instead of .NET.
Appendixes
Appendix A provides answers to the exercise questions that appear throughout the book. The remaining appendixes provide reference material that you may find useful as you begin to apply the knowledge gained throughout the book in your own applications.
The appendixes consist of the following:
❑ Appendix A: Exercise Solutions
❑ Appendix B: XPath Reference
❑ Appendix C: XSLT Reference
❑ Appendix D: The XML Document Object Model
❑ Appendix E: XML Schema Element and Attribute Reference
❑ Appendix F: XML Schema Datatypes Reference
❑ Appendix G: SAX 2.0.2 Reference
Appendixes A, B, and C are included within the book; Appendixes D–G are available on the book’s website.
What You Need to Use This Book
Because XML is a text-based technology, all you really need to create XML documents is Notepad or an equivalent text editor. However, to truly appreciate some of these samples in action, you might want to have a current Internet browser that can natively read XML documents, and even provide error messages if something is wrong. In any case, screenshots are provided throughout the book so that you can see what things should look like. Additionally, note the following:
❑ If you do have Internet Explorer, you also have an implementation of the DOM, which you may find useful in the chapters on that subject.
❑ Some of the examples and the case studies require access to a web server, such as Microsoft’s IIS (or PWS) or Apache.
❑ Throughout the book, other (freely available) XML tools are used, and we give instructions for obtaining these.
Within the validation section of the book we provide instructions on how to use Codeplot (http://codeplot.com). Codeplot is an online collaborative code editor with support for a wide assortment of XML technologies. Because many validation tools require programming experience or large downloads, the examples in this section instead use Codeplot. Codeplot can also be used to check the well-formedness of your XML documents, to transform XML documents using XSLT, and to assist you in coding XHTML, CSS, and SVG. The editor is free and was built using many of the techniques described in this book.
Additionally, we have attempted to show the use of XML in a variety of programming languages, including Java, JavaScript, PHP, Python, Visual Basic, ASP, C#, and Ruby on Rails. Therefore, while there is a good chance that you will see an example written in your favorite programming language, there is also a good chance you will encounter an example in a language you have never used. Whenever a new language is introduced, we include information on downloading and installing the necessary tools to use it. Because our focus is XML, regardless of which programming language is used in an example, the core XML concept is explained in detail.
Conventions
To help you get the most from the text and keep track of what’s happening, we’ve used several conventions throughout the book.
Try It Out
The Try It Out is an exercise you should work through, following the text in the book
1. They usually consist of a set of steps
2. Each step has a number
3. Follow the steps with your copy of the database.
How It Works
After each Try It Out, the code is explained in detail
Tips, hints, tricks, and asides to the current discussion are offset and placed in italics like this.
As for styles in the text:
❑ We highlight new terms and important words when we introduce them.
❑ We show filenames, URLs, and code within the text like so: persistence.properties
❑ We present code in two different ways:
In code examples we highlight new and important code with a gray background
The gray highlighting is not used for code that’s less important in the present context, or has been shown before.
Boxes like this one hold important, not-to-be forgotten information that is directly relevant to the surrounding text.
Source Code
As you work through the examples in this book, you may choose either to type in all the code manually or to use the source code files that accompany the book. All of the source code used in this book is available for download at www.wrox.com. Once at the site, simply locate the book’s title (either by using the Search box or by using one of the title lists) and click the Download Code link on the book’s detail page to obtain all the source code for the book.
Because many books have similar titles, you may find it easiest to search by ISBN; this book’s ISBN is 978-0-470-11487-2.
Once you download the code, just decompress it with your favorite compression tool. Alternately, you can go to the main Wrox code download page at www.wrox.com/dynamic/books/download.aspx to see the code available for this book and all other Wrox books.
Errata
We make every effort to ensure that there are no errors in the text or in the code. However, no one is perfect, and mistakes do occur. If you find an error in one of our books, such as a spelling mistake or a faulty piece of code, we would be very grateful for your feedback. By sending in errata you may save another reader hours of frustration, and at the same time you will be helping us provide even higher quality information.
To find the errata page for this book, go to www.wrox.com and locate the title using the Search box or one of the title lists. Then, on the book details page, click the Book Errata link. On this page you can view all errata that has been submitted for this book and posted by Wrox editors. A complete book list, including links to each book’s errata, is also available at www.wrox.com/misc-pages/booklist.shtml.
If you don’t spot “your” error on the Book Errata page, go to www.wrox.com/contact/techsupport.shtml and complete the form there to send us the error you have found. We’ll check the information and, if appropriate, post a message to the book’s errata page and fix the problem in subsequent editions of the book.
p2p.wrox.com
For author and peer discussion, join the P2P forums at p2p.wrox.com. The forums are a web-based system for you to post messages relating to Wrox books and related technologies and interact with other readers and technology users. The forums offer a subscription feature to e-mail you topics of interest of your choosing when new posts are made to the forums. Wrox authors, editors, other industry experts, and your fellow readers are present on these forums.
At http://p2p.wrox.com you will find a number of different forums that will help you not only as you read this book, but also as you develop your own applications. To join the forums, just follow these steps:
1. Go to p2p.wrox.com and click the Register link.
2. Read the terms of use and click Agree.
3. Complete the required information to join as well as any optional information you wish to provide and click Submit.
4. You will receive an e-mail with information describing how to verify your account and complete the joining process.
You can read messages in the forums without joining P2P, but in order to post your own messages, you must join.
Once you join, you can post new messages and respond to messages other users post. You can read messages at any time on the Web. If you would like to have new messages from a particular forum e-mailed to you, click the Subscribe to this Forum icon by the forum name in the forum listing.
For more information about how to use the Wrox P2P, be sure to read the P2P FAQs for answers to questions about how the forum software works as well as many common questions specific to P2P and Wrox books. To read the FAQs, click the FAQ link on any P2P page.
Part I: Introduction
Chapter 1: What Is XML?
Chapter 2: Well-Formed XML
Chapter 3: XML Namespaces