Trang 174 XML Step by Step
note
You can use a namespace prefix to qualify the name of the element in which the namespace is declared, even though the prefix is used before it’s declared. In the example document, if you declared the cd namespace within a TITLE element (rather than within the COLLECTION element), you could still apply that prefix to the element name:
<cd:TITLE xmlns:cd="http://www.mjyOnline.com/cds">
Violin Concerto in D
</cd:TITLE>
As an alternative to creating a namespace prefix and using it to explicitly qualify individual names, you can declare a default namespace within an element. The default namespace applies to the element in which it is declared (if that element has no namespace prefix) and to all elements with no prefix within the content of that element. Listing 3-5 shows the XML document from Listing 3-4 but with the book namespace (http://www.mjyOnline.com/books) declared as a default namespace, so that it doesn’t have to be explicitly applied to each of the book-related elements. (You’ll find a copy of this listing on the companion CD under the filename Collection Default.xml.)
Collection Default.xml
<?xml version="1.0"?>
<!-- File Name: Collection Default.xml -->
<COLLECTION
   xmlns="http://www.mjyOnline.com/books"
   xmlns:cd="http://www.mjyOnline.com/cds">
   <ITEM Status="in">
      <TITLE>The Adventures of Huckleberry Finn</TITLE>
      <AUTHOR>Mark Twain</AUTHOR>
      <PRICE>$5.49</PRICE>
   </ITEM>
   <cd:ITEM>
      <cd:TITLE>Violin Concerto in D</cd:TITLE>
      <cd:COMPOSER>Beethoven</cd:COMPOSER>
      <cd:PRICE>$14.95</cd:PRICE>
   </cd:ITEM>
   <ITEM Status="out">
      <TITLE>Leaves of Grass</TITLE>
      <AUTHOR>Walt Whitman</AUTHOR>
      <PRICE>$7.75</PRICE>
   </ITEM>
   <cd:ITEM>
      <cd:TITLE>Violin Concertos Numbers 1, 2, and 3</cd:TITLE>
      <cd:COMPOSER>Mozart</cd:COMPOSER>
      <cd:PRICE>$16.49</cd:PRICE>
   </cd:ITEM>
   <ITEM Status="out">
      <TITLE>The Legend of Sleepy Hollow</TITLE>
      <AUTHOR>Washington Irving</AUTHOR>
      <PRICE>$2.95</PRICE>
   </ITEM>
   <ITEM Status="in">
      <TITLE>The Marble Faun</TITLE>
      <AUTHOR>Nathaniel Hawthorne</AUTHOR>
      <PRICE>$10.95</PRICE>
   </ITEM>
</COLLECTION>
Listing 3-5.
You declare a default namespace by assigning the namespace name to the reserved xmlns attribute. In the example document in Listing 3-5, this is done in the COLLECTION element start-tag:
<COLLECTION
   xmlns="http://www.mjyOnline.com/books"
   xmlns:cd="http://www.mjyOnline.com/cds">
As a result, the COLLECTION element and all nested elements within it that don’t have prefixes (namely, the book-related elements) belong to the namespace named http://www.mjyOnline.com/books. The CD-related elements all have the cd prefix, which explicitly assigns them to the cd namespace rather than the default namespace.
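One way to see this assignment from a program’s point of view is sketched below. It is only an illustration, not part of the book’s listings: it assumes Python’s standard xml.etree.ElementTree module as the XML processor, uses an abbreviated copy of Listing 3-5, and the helper name namespace_of is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of Listing 3-5 (two ITEM elements only).
DOC = """<?xml version="1.0"?>
<COLLECTION
   xmlns="http://www.mjyOnline.com/books"
   xmlns:cd="http://www.mjyOnline.com/cds">
   <ITEM Status="in">
      <TITLE>The Adventures of Huckleberry Finn</TITLE>
   </ITEM>
   <cd:ITEM>
      <cd:TITLE>Violin Concerto in D</cd:TITLE>
   </cd:ITEM>
</COLLECTION>"""


def namespace_of(element):
    """Return the namespace name recorded for an element, or '' if it has none.

    ElementTree stores qualified names as '{namespace-name}local-name'.
    """
    tag = element.tag
    return tag[1:].split("}")[0] if tag.startswith("{") else ""


root = ET.fromstring(DOC)

# Pair every element's local name with the namespace it belongs to.
pairs = [(el.tag.split("}")[-1], namespace_of(el)) for el in root.iter()]

for local_name, namespace in pairs:
    print(f"{local_name:12} {namespace}")
```

The unprefixed COLLECTION, ITEM, and TITLE elements are reported in the books namespace, while the elements written with the cd prefix are reported in the cds namespace.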
You can override the default namespace within a nested element by assigning a different value to xmlns within that element. For instance, in the example document in Listing 3-5, if you defined an ITEM element for a CD by assigning an empty string to xmlns in its start-tag, the ITEM element and all elements within it would not belong to a namespace. (If you assign an empty string to xmlns, all nonprefixed elements within the scope of the assignment are considered not to belong to a namespace.)
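The effect of an empty xmlns value can be checked with a short sketch. This is an assumption-laden illustration rather than the book’s own listing: the small document is made up for the purpose, and Python’s standard xml.etree.ElementTree module stands in for the XML processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the first ITEM resets the default namespace
# with xmlns="", the second ITEM inherits it from COLLECTION.
DOC = """<COLLECTION xmlns="http://www.mjyOnline.com/books">
   <ITEM xmlns="">
      <TITLE>Violin Concerto in D</TITLE>
   </ITEM>
   <ITEM>
      <TITLE>Leaves of Grass</TITLE>
   </ITEM>
</COLLECTION>"""

root = ET.fromstring(DOC)

# ElementTree writes a namespaced name as '{namespace-name}local-name'
# and leaves the name bare when the element belongs to no namespace.
tags = [el.tag for el in root.iter()]

for tag in tags:
    print(tag)
```

The first ITEM and its TITLE are printed as bare names, which is how this parser shows that they belong to no namespace; the second ITEM and its TITLE keep the books namespace.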
Trang 3Chapter 3 Creating Well-Formed XML Documents 77
■ When you create an XSLT style sheet, as described in Chapter 12,
you use a standard set of elements that belong to the namespace
named http://www.w3.org/1999/XSL/Transform.
note
For more information on using namespaces in XML, see the topic “Using Namespaces in Documents” in the Microsoft XML SDK 4.0 help file, or the same topic in the XML SDK documentation provided by the MSDN (Microsoft
Developer Network) Library on the Web at http://msdn.microsoft.com/library.
You’ll find the official W3C XML namespace specification on the Web at http://www.w3.org/TR/REC-xml-names/.
Characters, Encoding, and Languages
The characters you can enter into an XML document are tab, carriage-return, line feed, and any of the legal characters belonging to the Unicode character set (or the equivalent ISO/IEC 10646 character set), which includes characters for all the world’s written languages. (For more information on these character sets and the specific characters you can use in XML, see the section “2.2 Characters” in the XML specification at http://www.w3.org/TR/REC-xml.)
An XML file can represent, or encode, the Unicode characters in different ways. For example, if the file uses the encoding scheme known as UTF-8, it represents a capital A as the number 65 stored in 8 bits (41 in hexadecimal). However, if it uses the encoding scheme known as UTF-16, it represents a capital A as the number 65 stored in 16 bits (0041 in hexadecimal).
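The two representations of a capital A can be inspected directly. The following is a minimal sketch in Python (an assumption of this illustration; the book itself doesn’t use Python). It uses the big-endian form of UTF-16 without a byte order mark so that only the bytes for the character itself are shown.

```python
# The Unicode number (code point) of a capital A.
code_point = ord("A")

# UTF-8 stores this character in a single 8-bit unit.
utf8_bytes = "A".encode("utf-8")

# UTF-16 stores it in a single 16-bit unit. "utf-16-be" is chosen here
# so that no byte order mark is added in front of the character.
utf16_bytes = "A".encode("utf-16-be")

print(code_point)          # the number 65
print(utf8_bytes.hex())    # hexadecimal digits of the UTF-8 form
print(utf16_bytes.hex())   # hexadecimal digits of the UTF-16 form
```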
If you save your XML document in a plain text format using Notepad or another text or programming editor, and if you use only the standard ASCII characters (characters numbered 1 through 127 in the Unicode character set, which are the common characters you can directly enter using an English language keyboard), it’s unlikely that you’ll have to worry about encoding. That’s because an XML processor will assume that the file uses the UTF-8 encoding scheme, and in a plain text file ASCII characters (and only ASCII characters) are normally encoded in conformance with the UTF-8 scheme.
Suppose, however, that you want to be able to type characters that aren’t
in the ASCII set directly into your element character data or your attribute values, such as the á and ñ in the following element:
<AUTHOR>Vicente Blasco Ibáñez</AUTHOR>
In this case, you must do two things:
1. Make sure that the XML file is encoded using a scheme that the XML processor can understand. All conforming XML processors must be able to handle UTF-8 and UTF-16 encoded files, so try to use one of these schemes. Some XML processors, however, support additional encoding schemes you can use.
To create your XML document, you must use a word processor or other program that can create text files in which all characters are uniformly encoded in a supported scheme. For example, you can create a UTF-8 encoded XML document by opening or creating it in Microsoft Word 2002, and then saving the file by choosing the Save As command from the File menu, selecting Plain Text (*.txt) in the Save As Type drop-down list in the Save As dialog box, clicking the Save button, and then in the File Conversion dialog box selecting the Unicode (UTF-8) encoding scheme. (In Word 2000, you need to select Encoded Text (*.txt) in the Save As Type drop-down list rather than Plain Text (*.txt).)
The Microsoft Notepad editor supplied with some versions of Windows also lets you select the encoding scheme when you save a file.
2. If your XML document is encoded in a scheme other than UTF-8 or UTF-16, you must specify the name of the scheme by including an encoding declaration in the XML declaration, immediately following the version information. For example, the following encoding declaration indicates that the file is encoded using the ISO-8859-1 scheme:
<?xml version="1.0" encoding="ISO-8859-1" ?>
(If you also include a standalone document declaration, as described in the sidebar “The standalone Document Declaration” on page 159, it must go after the encoding declaration.) If the XML processor can’t handle the specified encoding scheme, it will generate a fatal error.
Also, if your XML document references an external DTD subset (described in Chapter 5) or an external parsed entity (described in Chapter 6), and if the file containing the subset or entity uses an encoding scheme other than UTF-8 or UTF-16, you must include a text declaration at the very beginning of the file. A text declaration is similar to an XML declaration, except that the version information is optional, the encoding declaration is mandatory, and it can’t include a standalone document declaration. Here’s an example:
<?xml version="1.0" encoding="ISO-8859-1" ?>
(In an external parsed entity, the text declaration is not part of the entity’s replacement text that gets inserted by an entity reference.)
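The need for the encoding declaration can be demonstrated with a small experiment. The sketch below is an illustration under stated assumptions: it uses Python’s standard xml.etree.ElementTree module as the XML processor (one that happens to support ISO-8859-1), and it builds the file contents in memory instead of reading them from disk.

```python
import xml.etree.ElementTree as ET

name = "Vicente Blasco Ibáñez"

# The element as it would be stored in a file saved with ISO-8859-1 encoding.
body = f"<AUTHOR>{name}</AUTHOR>".encode("iso-8859-1")

# Same bytes, preceded by an XML declaration that names the encoding.
declared = b'<?xml version="1.0" encoding="ISO-8859-1" ?>' + body

# With the declaration, the processor decodes the accented characters correctly.
parsed_name = ET.fromstring(declared).text

# Without it, the processor assumes UTF-8, and the ISO-8859-1 bytes
# for the accented characters are not valid UTF-8.
try:
    ET.fromstring(body)
    undeclared_accepted = True
except ET.ParseError:
    undeclared_accepted = False

print(parsed_name)
print(undeclared_accepted)
```

The declared version yields the author’s name intact, while the undeclared version is rejected by this parser as not well-formed.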
You can also insert non-ASCII characters into any XML document, regardless of its encoding, by using character references as discussed in “Inserting Character References” on page 153.
The XML specification’s support for the Unicode character set allows you to freely include characters belonging to any written language. It might also be important to tell the application that handles your document the specific language used for the text in a particular element. For example, the application might need to know the language of the text in order to display it properly on the screen or to check its spelling. XML reserves an attribute named xml:lang for this purpose. (The xml: indicates that this attribute belongs to the xml namespace. Because this namespace is predefined, you don’t have to declare it. See “Using Namespaces” on page 69.) To specify the language of the text in a particular element (the text in the element’s character data as well as its attribute values), include an xml:lang attribute specification in the element’s start-tag, assigning it an identifier for the language, as in the following example elements:
<!-- This element contains U.S. English text: -->
<TITLE xml:lang="en-US">The Color Purple</TITLE>
<!-- This element contains British English text: -->
<TITLE xml:lang="en-GB">Colours I Have Known</TITLE>
<!-- This element contains generic English text: -->
<TITLE xml:lang="en">The XML Story</TITLE>
<!-- This element contains German text: -->
<TITLE xml:lang="de">Der Richter und Sein Henker</TITLE>
For a description of the official language identifiers you can assign to xml:lang, see the section “2.12 Language Identification” in the XML specification at http://www.w3.org/TR/REC-xml. The xml:lang attribute specification applies to the element in which it occurs and to any nested elements, unless it is overridden by another xml:lang attribute specification in a nested element. To indicate the language of the text throughout your entire document, just include xml:lang in the document element.
The xml:lang attribute doesn’t affect the behavior of the XML processor. The processor merely passes the attribute specification on to the application, which can use the value as appropriate. The XML specification doesn’t say how the xml:lang setting must be used.
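Because applying the xml:lang value is left to the application, the inheritance rule has to be implemented there. The following sketch shows one way an application might do it. It assumes Python’s standard xml.etree.ElementTree module, which reports the attribute under the predefined xml namespace name http://www.w3.org/XML/1998/namespace; the function name effective_langs and the sample COLLECTION document are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined xml namespace name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical document: the document element sets the language for
# everything, and one nested TITLE overrides it.
DOC = """<COLLECTION xml:lang="en">
   <TITLE>The XML Story</TITLE>
   <TITLE xml:lang="de">Der Richter und Sein Henker</TITLE>
</COLLECTION>"""


def effective_langs(element, inherited=None):
    """Yield (element, language) pairs, applying xml:lang to nested elements
    unless a nested element supplies its own xml:lang value."""
    lang = element.get(XML_LANG, inherited)
    yield element, lang
    for child in element:
        yield from effective_langs(child, lang)


root = ET.fromstring(DOC)
langs = [(el.tag, lang) for el, lang in effective_langs(root)]

for tag, lang in langs:
    print(tag, lang)
```

The first TITLE has no xml:lang of its own and therefore takes the value en from COLLECTION; the second TITLE overrides it with de.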
When you get to Chapters 5 and 7 on creating valid documents, keep in mind that in a valid document the xml:lang attribute must be defined just like any other attribute. (This will make sense when you read those chapters.) For instance, in a DTD you could define this attribute as in the following example attribute-list declaration:
<!ATTLIST TITLE xml:lang NMTOKEN #REQUIRED>
Adding Comments, Processing Instructions, and CDATA Sections
In this chapter, you’ll learn how to add three types of XML markup to your documents: comments, processing instructions, and CDATA sections. While these items aren’t required in a well-formed (or valid) XML document, they can be useful. You can use comments to make your document more understandable when read by humans. You can use processing instructions to modify the way an application handles or displays your document. And you can use CDATA sections to include almost any combination of characters within an element’s character data.
Inserting Comments
As you learned in Chapter 1, the sixth goal in the XML specification is that “XML documents should be human-legible and reasonably clear.” Well-placed and meaningful comments can greatly enhance the human readability and clarity of an XML document, just as comments can make program source code such as C or BASIC much more understandable. The XML processor ignores comment text, although it may pass the text on to the application.
Trang 8Chapter 4 Adding Comments, Processing Instructions, and CDATA Sections 83
And you can place them within an element’s content:
<?xml version="1.0"?>
<DOCELEMENT>
<!-- This comment is part of the content of the root element -->
This is a very simple XML document.
</DOCELEMENT>
Here’s an example of a comment that’s illegal because it’s placed within markup:
<?xml version="1.0"?>
<DOCELEMENT <!-- This is an ILLEGAL comment! --> >
This is a very simple XML document.
</DOCELEMENT>
You can, however, place a comment within a document type definition (DTD), even though a DTD is part of markup, provided that it’s not within a markup declaration in the DTD. You’ll learn all about DTDs and how to place comments within them in Chapter 5.
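The legal and illegal comment placements shown earlier can be tried against a parser. The sketch below is only an illustration: it assumes Python’s standard xml.etree.ElementTree module (version 3.8 or later for the insert_comments option), and the helper name is_well_formed is hypothetical. It also shows both behaviors allowed to a processor: dropping comment text, or passing it on to the application.

```python
import xml.etree.ElementTree as ET

LEGAL = """<?xml version="1.0"?>
<DOCELEMENT>
<!-- This comment is part of the content of the root element -->
This is a very simple XML document.
</DOCELEMENT>"""

ILLEGAL = """<?xml version="1.0"?>
<DOCELEMENT <!-- This is an ILLEGAL comment! --> >
This is a very simple XML document.
</DOCELEMENT>"""


def is_well_formed(text):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# By default this parser discards comments, so the root has no child nodes.
plain_root = ET.fromstring(LEGAL)

# Optionally, the tree builder can hand comments on to the application.
keeping_parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
kept_root = ET.fromstring(LEGAL, parser=keeping_parser)
comments = [node.text.strip() for node in kept_root if node.tag is ET.Comment]

print(is_well_formed(LEGAL), is_well_formed(ILLEGAL))
print(len(plain_root), comments)
```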
Using Processing Instructions
For the most part an XML document doesn’t include information on how the data is to be formatted or processed. However, the XML specification does provide a form of markup known as a processing instruction that lets you pass information to the application that isn’t part of the document’s data. The XML processor itself doesn’t act on processing instructions, but merely hands the text to the application, which can use the information as appropriate.
note
Recall from Chapter 2 that the XML processor is the software module that reads and stores the contents of an XML document The application is a separate software module that obtains the document’s contents from the processor and then manipulates and displays these contents When you display XML in Internet Explorer, the browser provides both the XML processor and at least the front end of the application (If you write a script to manipulate and display an XML document, you are supplying part of the application yourself.)
The Form of a Processing Instruction
A processing instruction has the following general form:
<?target instruction ?>
Here, target is the name of the application to which the instruction is directed. Note that you can’t insert white space (that is, space, tab, carriage-return, or line feed characters) between the first question mark (?) in the processing instruction and target. Any name is allowable, provided it follows these rules:
■ The name must begin with a letter or underscore (_), followed by zero or more letters, digits, periods (.), hyphens (-), or underscores.
■ The target name xml, in any combination of uppercase or lowercase letters, is reserved. (As you’ve seen, you use xml in lowercase letters for the document’s XML declaration, which is a special type of processing instruction.) To avert possible conflicts with current or future reserved target names, you should also avoid beginning a target name with xml (in any combination of cases), although the Internet Explorer parser doesn’t prohibit the use of such names.
And instruction is the information passed to the application. It can consist of any sequence of characters, except the character pair ?> (which is reserved for terminating the processing instruction).
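To make the split between target and instruction concrete, the sketch below lists the processing instructions a parser hands on, along with the two parts of each. It is an illustration under assumptions: Python’s standard xml.dom.minidom module plays the role of the XML processor, and the small document is made up, reusing the xml-stylesheet and ScriptA targets from this chapter’s examples.

```python
from xml.dom import minidom, Node

# Hypothetical document with one processing instruction before the
# document element and one after it.
DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="Inventory01.css"?>
<INVENTORY/>
<?ScriptA Category="books" Style="formal" ?>"""

document = minidom.parseString(DOC)

# Collect (target, instruction) for every processing instruction that is a
# direct child of the document. The XML declaration is not reported as one.
instructions = [
    (node.target, node.data.strip())
    for node in document.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]

for target, instruction in instructions:
    print(target, "->", instruction)
```

The parser doesn’t interpret the instruction text; pseudo-attributes such as type and href arrive as one plain string that the application must interpret itself.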
How You Can Use Processing Instructions
The particular processing instructions that will be recognized depend upon the application that will be handling your XML document. If you’re using Internet Explorer to display and work with your XML documents (as described throughout this book), you’ll find two main uses for processing instructions:
■ You can use standard, reserved processing instructions to tell Internet Explorer how to handle or display the document. An example you’ll see in this book is the processing instruction that tells Internet Explorer to display the document using a particular style sheet. For instance, the following processing instruction tells Internet Explorer to use the cascading style sheet (CSS) located in the file Inventory01.css:
<?xml-stylesheet type="text/css" href="Inventory01.css"?>
</BOOK>
</INVENTORY>
<!-- And here’s one following the document element: -->
<?ScriptA Category="books" Style="formal" ?>
Here’s an example of a processing instruction illegally placed within markup:
<!-- The following element contains an ILLEGAL
     processing instruction: -->
<BOOK <?ScriptA emphasize="yes" ?> >
<TITLE>Leaves of Grass</TITLE>
<AUTHOR>Walt Whitman</AUTHOR>
<BINDING>hardcover</BINDING>
<PAGES>462</PAGES>
<PRICE>$7.75</PRICE>
</BOOK>
You can, however, place a processing instruction within a document type definition (DTD), even though a DTD is part of markup, provided that it’s not within a markup declaration in the DTD. You’ll learn all about DTDs and how to place processing instructions within them in Chapter 5.
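The placement rule for processing instructions can be checked the same way as the rule for comments. The sketch below is an illustration only: it assumes Python’s standard xml.etree.ElementTree module as the parser, shortens the BOOK element from the example above, and uses a hypothetical helper named is_well_formed.

```python
import xml.etree.ElementTree as ET

# Processing instruction inside the element's content: allowed.
IN_CONTENT = """<BOOK>
<?ScriptA emphasize="yes" ?>
<TITLE>Leaves of Grass</TITLE>
</BOOK>"""

# Processing instruction inside the start-tag markup: not allowed.
IN_MARKUP = """<BOOK <?ScriptA emphasize="yes" ?> >
<TITLE>Leaves of Grass</TITLE>
</BOOK>"""


def is_well_formed(text):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


content_ok = is_well_formed(IN_CONTENT)
markup_ok = is_well_formed(IN_MARKUP)

print(content_ok, markup_ok)
```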
Including CDATA Sections
As you learned in Chapter 3, you can’t directly insert a left angle bracket (<) or an ampersand (&) as part of an element’s character data, because the XML parser would interpret either of these characters as the start of markup. One way to get around this restriction is to use a character reference (&#60; representing < or &#38; representing &) or a predefined general entity reference (&lt; representing < or &amp; representing &). You’ll learn about character and predefined general entity references in Chapter 6. However, if you need to insert many < or & characters, using these references is awkward and makes the data difficult for humans to read. In this case, it’s easier to place the text containing the restricted characters inside a CDATA section.
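The three ways of getting < and & into character data produce the same text once the document is parsed. The sketch below compares them; it is an illustration under assumptions, using Python’s standard xml.etree.ElementTree module as the parser and a made-up element named E.

```python
import xml.etree.ElementTree as ET

# Three spellings of the same character data: a < b & c
FORMS = {
    "character references": "<E>a &#60; b &#38; c</E>",
    "predefined entity references": "<E>a &lt; b &amp; c</E>",
    "CDATA section": "<E><![CDATA[a < b & c]]></E>",
}

# What the application receives as the element's character data in each case.
texts = {label: ET.fromstring(source).text for label, source in FORMS.items()}

# Typing the restricted characters directly is rejected by the parser.
try:
    ET.fromstring("<E>a < b & c</E>")
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False

for label, text in texts.items():
    print(f"{label}: {text}")
print(raw_accepted)
```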
The Form of a CDATA Section
A CDATA section begins with the characters <![CDATA[ and ends with the characters ]]>. Between these two delimiting character groups, you can type any characters except ]]>. You can freely include the often forbidden < and & characters. You can’t include ]]> because these characters would be interpreted as