<FILM Class="instructional">
<TITLE>The Use and Care of XML</TITLE>
<INSTRUCTOR>Michael J Young</INSTRUCTOR>
</FILM>
If you omitted the Class attribute, it would be assigned the default
value fictional. Assigning to Class a value other than fictional,
documentary, or instructional would be a validity error.
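The declaration that produces this behavior is not reproduced in this excerpt. A minimal sketch that is consistent with the description, assuming an enumerated attribute type and an assumed content model for FILM, might look like this:

```xml
<?xml version="1.0"?>
<!DOCTYPE FILM
   [
   <!-- Assumed content model; only TITLE and INSTRUCTOR appear in the example. -->
   <!ELEMENT FILM (TITLE, INSTRUCTOR)>
   <!ELEMENT TITLE (#PCDATA)>
   <!ELEMENT INSTRUCTOR (#PCDATA)>
   <!-- Enumerated type: Class must be one of the listed tokens.
        The quoted literal is the default used when Class is omitted. -->
   <!ATTLIST FILM
             Class (fictional | documentary | instructional) "fictional">
   ]
>
<!-- Class is omitted here, so a validating processor supplies "fictional". -->
<FILM>
   <TITLE>The Use and Care of XML</TITLE>
   <INSTRUCTOR>Michael J Young</INSTRUCTOR>
</FILM>
```

In this sketch, changing the start-tag to <FILM Class="comedy"> would make the document invalid, because comedy is not one of the enumerated tokens.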
■ The keyword NOTATION, followed by a space, followed by an open
parenthesis, followed by a list of notation names separated with |
characters, followed by a close parenthesis. Each of these names
must exactly match the name of a notation declared in the DTD. A
notation describes a data format or identifies the program used to
process a particular format. I’ll discuss notations in Chapter 6.
note
You cannot declare more than one NOTATION type attribute for a given element. Also, you cannot declare a NOTATION type attribute for an element that
is declared as EMPTY.
For example, assuming that the notations HTML, SGML, and RTF
are declared in your DTD, you could restrict the values of the
Format attribute to one of these notation names by declaring it like this:
<!ELEMENT EXAMPLE_DOCUMENT (#PCDATA)>
<!ATTLIST EXAMPLE_DOCUMENT
Format NOTATION (HTML|SGML|RTF) #REQUIRED>
You could then use the Format attribute to indicate the format of a
particular EXAMPLE_DOCUMENT element, as in this example:
<EXAMPLE_DOCUMENT Format="HTML">
<![CDATA[
<HTML>
<HEAD>
<TITLE>Mike’s Home Page</TITLE>
</HEAD>
<BODY>
<P>Welcome!</P>
</BODY>
</HTML>
]]>
</EXAMPLE_DOCUMENT>
Assigning Format a value other than HTML, SGML, or RTF would
be a validity error. (Notice the use of the CDATA section here, which allows you to use the left angle bracket (<) character freely within the element’s character data.)
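The example above assumes that the three notations are already declared. As a rough sketch of what a complete document could look like, the following adds notation declarations to the internal DTD subset. The system identifiers shown (text/html, text/sgml, and application/rtf) are placeholders chosen for illustration and are not taken from the original example:

```xml
<?xml version="1.0"?>
<!DOCTYPE EXAMPLE_DOCUMENT
   [
   <!-- Hypothetical notation declarations; the system identifiers are
        illustrative placeholders describing each format. -->
   <!NOTATION HTML SYSTEM "text/html">
   <!NOTATION SGML SYSTEM "text/sgml">
   <!NOTATION RTF  SYSTEM "application/rtf">

   <!ELEMENT EXAMPLE_DOCUMENT (#PCDATA)>
   <!-- Each name in the list must match a notation declared above. -->
   <!ATTLIST EXAMPLE_DOCUMENT
             Format NOTATION (HTML|SGML|RTF) #REQUIRED>
   ]
>
<EXAMPLE_DOCUMENT Format="RTF">{\rtf1 Welcome!}</EXAMPLE_DOCUMENT>
```

If one of the NOTATION declarations were removed while its name remained in the attribute's list, the document would no longer be valid.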
The Default Declaration
The default declaration is the third and final required component of an attribute definition. It specifies whether the attribute is required, and, if the attribute isn’t required, it indicates what the processor should do if the attribute is omitted. The declaration might, for example, provide a default attribute value that the processor should use if the attribute is absent.
[Figure: An attribute-list declaration, with its parts labeled: name of associated element, attribute definition, attribute name, attribute type, and default declaration.]
The default declaration has four possible forms:
■ #REQUIRED. With this form, you must specify an attribute value
for every element of the associated type. For example, the following
declaration indicates that you must assign a value to the Class
attribute within the start-tag of every FILM element in the document:
<!ATTLIST FILM Class CDATA #REQUIRED>
■ #IMPLIED. This form indicates that you can either include or omit
the attribute from an element of the associated type, and that if you omit the attribute, no default value is supplied to the processor.
(This form “implies” rather than “states” a value, causing the application to use its own default value; hence the name.) For example, the following declaration indicates that assigning a value to the
Class attribute within a FILM element is optional, and that the DTD doesn’t supply a default Class value:
<!ATTLIST FILM Class CDATA #IMPLIED>
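To see the two forms side by side, here is a small sketch of a complete document. The COLLECTION root element, the content models, and the Year attribute are assumptions introduced only for this illustration:

```xml
<?xml version="1.0"?>
<!DOCTYPE COLLECTION
   [
   <!-- Hypothetical container and content models for illustration. -->
   <!ELEMENT COLLECTION (FILM)*>
   <!ELEMENT FILM (TITLE)>
   <!ELEMENT TITLE (#PCDATA)>
   <!-- Class must appear on every FILM; Year (an assumed attribute)
        may be omitted, and the DTD supplies no default for it. -->
   <!ATTLIST FILM
             Class CDATA #REQUIRED
             Year  CDATA #IMPLIED>
   ]
>
<COLLECTION>
   <!-- Valid: both attributes are specified. -->
   <FILM Class="instructional" Year="2000">
      <TITLE>The Use and Care of XML</TITLE>
   </FILM>
   <!-- Valid: the optional Year attribute is omitted. -->
   <FILM Class="instructional">
      <TITLE>The Use and Care of XML</TITLE>
   </FILM>
</COLLECTION>
```

A FILM element written without a Class attribute would make this document invalid, whereas leaving out Year never does.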
<AUTHOR>Walt Whitman</AUTHOR>
<PRICE>$7.75</PRICE>
</ITEM>
<cd:ITEM>
<cd:TITLE>Violin Concertos Numbers 1, 2, and 3</cd:TITLE>
<cd:COMPOSER>Mozart</cd:COMPOSER>
<cd:PRICE>$16.49</cd:PRICE>
</cd:ITEM>
<ITEM Status="out">
<TITLE>The Legend of Sleepy Hollow</TITLE>
<AUTHOR>Washington Irving</AUTHOR>
<PRICE>$2.95</PRICE>
</ITEM>
<ITEM Status="in">
<TITLE>The Marble Faun</TITLE>
<AUTHOR>Nathaniel Hawthorne</AUTHOR>
<PRICE>$10.95</PRICE>
</ITEM>
</COLLECTION>
Listing 5-1.
■ If an element or attribute name in the document is explicitly
qualified using a namespace prefix, you must include that prefix when
you declare the element or attribute in the DTD. Hence, the example
document in Listing 5-1 declares the cd:ITEM element and its
subelements as follows:
<!ELEMENT cd:ITEM (cd:TITLE, cd:COMPOSER, cd:PRICE)>
<!ELEMENT cd:TITLE (#PCDATA)>
<!ELEMENT cd:COMPOSER (#PCDATA)>
<!ELEMENT cd:PRICE (#PCDATA)>
■ If an element is assigned to a namespace using a default namespace
assignment in the document, you declare it using its unqualified
name. Accordingly, the example document declares the
COLLECTION element using its unqualified name, even though it belongs by
default to the http://www.mjyOnline.com/books namespace:
<!ELEMENT COLLECTION (ITEM | cd:ITEM)*>
■ If a particular element name or attribute name belongs to several
different namespaces, or to no namespace, you must declare each use
of the name separately. Hence, the example document declares both
ITEM and cd:ITEM.
■ As with any attributes in a document, you must declare the attributes that appear in the special-purpose attribute specifications that are used to declare namespaces. In the example document, these attributes belong to the COLLECTION element and are declared as follows:
<!ATTLIST COLLECTION xmlns CDATA #REQUIRED xmlns:cd CDATA #REQUIRED>
Internet Explorer doesn’t support default values for these attributes.
In other words, you can’t declare these attributes with default values
and then omit the attribute specifications from the start-tag of the element that they belong to (COLLECTION in the example document). You must always explicitly assign attribute values.
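Putting these rules together, a reduced version of the example document might look like the following sketch. Listing 5-1 is only partly reproduced here, so several details are assumptions rather than the original listing: the namespace name bound to the cd prefix (http://www.mjyOnline.com/cds), the content model for ITEM, and the declaration of Status as an optional enumerated attribute.

```xml
<?xml version="1.0"?>
<!DOCTYPE COLLECTION
   [
   <!ELEMENT COLLECTION (ITEM | cd:ITEM)*>
   <!-- The namespace declarations are ordinary attributes as far as
        the DTD is concerned, so they must be declared too. -->
   <!ATTLIST COLLECTION
             xmlns    CDATA #REQUIRED
             xmlns:cd CDATA #REQUIRED>

   <!-- Elements in the default namespace: declared with unqualified names. -->
   <!ELEMENT ITEM (TITLE, AUTHOR, PRICE)>
   <!-- Assumed declaration: Status is optional and limited to in or out. -->
   <!ATTLIST ITEM Status (in | out) #IMPLIED>
   <!ELEMENT TITLE (#PCDATA)>
   <!ELEMENT AUTHOR (#PCDATA)>
   <!ELEMENT PRICE (#PCDATA)>

   <!-- Explicitly prefixed elements: declared with the prefix included. -->
   <!ELEMENT cd:ITEM (cd:TITLE, cd:COMPOSER, cd:PRICE)>
   <!ELEMENT cd:TITLE (#PCDATA)>
   <!ELEMENT cd:COMPOSER (#PCDATA)>
   <!ELEMENT cd:PRICE (#PCDATA)>
   ]
>
<!-- The cd namespace name below is a placeholder for illustration. -->
<COLLECTION
   xmlns="http://www.mjyOnline.com/books"
   xmlns:cd="http://www.mjyOnline.com/cds">
   <ITEM Status="in">
      <TITLE>The Marble Faun</TITLE>
      <AUTHOR>Nathaniel Hawthorne</AUTHOR>
      <PRICE>$10.95</PRICE>
   </ITEM>
   <cd:ITEM>
      <cd:TITLE>Violin Concertos Numbers 1, 2, and 3</cd:TITLE>
      <cd:COMPOSER>Mozart</cd:COMPOSER>
      <cd:PRICE>$16.49</cd:PRICE>
   </cd:ITEM>
</COLLECTION>
```

A DTD matches element and attribute names as literal strings and has no knowledge of namespaces, which is why the prefix used in the declarations must be the same prefix used in the document.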
Using an External DTD Subset
The document type definitions you’ve seen so far in this chapter are contained completely within the document type declaration in the document This type of
DTD is known as an internal DTD subset.
Alternatively, you can place all or part of the document’s DTD in a separate file, and then refer to that file from the document type declaration. A DTD, or a portion of a DTD, contained in a separate file is known as an external DTD subset.
note
Using an external DTD subset is advantageous primarily for a common DTD employed by an entire group of documents. Each document can refer to a single DTD file (or copy of that file) as an external DTD subset. This saves having to copy the DTD contents into each document that uses it, and also makes it easier to maintain the DTD. (You need to modify only the single DTD file, and any copies of that file, rather than edit all the documents that use it.) Recall from Chapter 1 that many of the standard XML applications are based on a common DTD included in all XML documents that conform to the application. To review, take a look at “Standard XML Applications” and “Real-World Uses for XML,” both in Chapter 1.
Using an External DTD Subset Only
To use only an external DTD subset, omit the block of markup declarations and the square bracket ([]) characters that contain them, and instead include the keyword SYSTEM followed by a quoted description of the location of the separate file that contains the DTD. Consider, for instance, the SIMPLE document you saw earlier in the chapter, which has an internal DTD subset:
<?xml version="1.0"?>
<!DOCTYPE SIMPLE
[
<!ELEMENT SIMPLE ANY>
]
>
<SIMPLE>This is an extremely simplistic XML document.</SIMPLE>
If this document used an external DTD subset, it would appear like this:
<?xml version="1.0"?>
<!DOCTYPE SIMPLE SYSTEM "Simple.dtd">
<SIMPLE>This is an extremely simplistic XML document.</SIMPLE>
And the file Simple.dtd would have the following contents:
<!ELEMENT SIMPLE ANY>
The file containing the external DTD subset can include any of the markup declarations that can be included in an internal DTD subset. I listed these in “Creating the Document Type Definition” on page 96.
note
For information on including a text declaration at the beginning of a file containing an external DTD subset, see the sidebar “Characters, Encoding, and Languages” on page 77.
The description of the file location (Simple.dtd in the example) is known as the system identifier. It can be delimited using either single quotes (') or double quotes ("). It can include any characters except the quotation character used to delimit it, and it must specify a valid URI (Uniform Resource Identifier) for the file containing the external DTD subset. Currently, the most common form of URI is a traditional URL (Uniform Resource Locator). (See the sidebar “URIs, URLs, and URNs” on page 73.) You can use a fully qualified URL, such as:
<!DOCTYPE SIMPLE SYSTEM "http://www.mjyOnline.com/dtds/Simple.dtd">
Or, you can use a partial URL that specifies a location relative to the location of the XML document containing the URL, such as:
<!DOCTYPE SIMPLE SYSTEM "Simple.dtd">
Relative URLs in XML documents work just like relative URLs in HTML pages. In the second example, if the full URL of the XML document were
http://www.mjyOnline.com/documents/Simple.xml, Simple.dtd would refer
to http://www.mjyOnline.com/documents/Simple.dtd. Likewise, if the
XML document were located at file:///C:\XML Step by Step\Example
Source\Simple.xml, Simple.dtd would refer to file:///C:\XML Step by
Step\Example Source\Simple.dtd.
Using Both an External DTD Subset and an Internal DTD Subset
To use both an external DTD subset and an internal DTD subset, include the SYSTEM keyword together with the system identifier giving the location of the external DTD subset file, followed by the internal DTD subset markup declarations within square bracket ([]) characters.
Here’s an example of a simple XML document with both an internal and an external DTD subset:
<?xml version="1.0"?>
<!DOCTYPE BOOK SYSTEM "Book.dtd"
[
<!ATTLIST BOOK ISBN CDATA #IMPLIED Year CDATA "2000">
<!ELEMENT TITLE (#PCDATA)>
]
>
<BOOK Year="1998">
<TITLE>The Scarlet Letter</TITLE>
</BOOK>
Here are the contents of the file containing the external DTD subset, Book.dtd:
<!ELEMENT BOOK ANY>
<!ATTLIST BOOK ISBN NMTOKEN #REQUIRED>
When you include both an external and an internal DTD subset, here’s how the XML processor combines their contents:
■ It merges the contents of the two subsets to form the complete DTD.
In the example document, the resultant merged DTD defines two
elements, TITLE and BOOK, and two attributes for the BOOK
element, ISBN and Year.
■ It processes the internal DTD subset before the external DTD subset
(even though the external subset reference appears first in the
document type declaration). Thus, if a particular item (element, attribute,
entity, or notation) is declared with the same name in both the
internal and external subsets, the declaration in the internal subset takes
precedence and the declaration in the external subset is considered a
redeclaration.
For instance, if an attribute with the same name and element type is
declared in both subsets, the processor uses the declaration in the
internal subset and ignores the one in the external subset. (As
explained earlier in this chapter, the processor uses the first declaration
for a particular attribute and ignores any subsequent ones.) In the
example document, the XML processor considers the ISBN attribute
to have the CDATA type and the #IMPLIED default declaration, and
therefore the following element (which leaves out ISBN) is valid:
<BOOK Year="1850">
<TITLE>The Scarlet Letter</TITLE>
</BOOK>
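The combined effect can be pictured as a single DTD. The sketch below is only an illustration of the merge rules just described, not output produced by any processor; it writes the declarations that take effect into one internal subset, in the order in which they are processed:

```xml
<?xml version="1.0"?>
<!DOCTYPE BOOK
   [
   <!-- From the internal subset, which is processed first. -->
   <!ATTLIST BOOK ISBN CDATA #IMPLIED Year CDATA "2000">
   <!ELEMENT TITLE (#PCDATA)>

   <!-- From Book.dtd, which is processed second. -->
   <!ELEMENT BOOK ANY>
   <!-- The ISBN declaration in Book.dtd (NMTOKEN, #REQUIRED) is ignored,
        because ISBN was already declared for BOOK above. -->
   ]
>
<BOOK>
   <TITLE>The Scarlet Letter</TITLE>
</BOOK>
```

Because the BOOK element in this sketch omits both attributes, the processor would supply the default value 2000 for Year, and no value at all for ISBN.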
note
For more information on redeclaring elements, attributes, entities, and notations, see the sidebar “Redeclarations in a DTD” on page 148. I’ll discuss entity and notation declarations in Chapter 6.
The way the XML processor combines an internal and an external DTD subset lets you use a common DTD (such as one provided for an XML application like MathML) as an external DTD subset, but then customize the DTD for the
<!ELEMENT BOOK (TITLE, AUTHOR, BINDING, PAGES, PRICE)>
<!ATTLIST BOOK InStock (yes|no) #REQUIRED>
<!ELEMENT TITLE (#PCDATA | SUBTITLE)*>
<!ELEMENT SUBTITLE (#PCDATA)>
<!ELEMENT AUTHOR (#PCDATA)>
<!ATTLIST AUTHOR Born CDATA #IMPLIED>
<!ELEMENT BINDING (#PCDATA)>
<!ELEMENT PAGES (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
]
>
<INVENTORY>
<BOOK InStock="yes">
<TITLE>The Adventures of Huckleberry Finn</TITLE>
<AUTHOR Born="1835">Mark Twain</AUTHOR>
<BINDING>mass market paperback</BINDING>
<PAGES>298</PAGES>
<PRICE>$5.49</PRICE>
</BOOK>
<BOOK InStock="no">
<TITLE>Leaves of Grass</TITLE>
<AUTHOR Born="1819">Walt Whitman</AUTHOR>
<BINDING>hardcover</BINDING>
<PAGES>462</PAGES>
<PRICE>$7.75</PRICE>
</BOOK>
<BOOK InStock="yes">
<TITLE>The Legend of Sleepy Hollow</TITLE>
<AUTHOR>Washington Irving</AUTHOR>
<BINDING>mass market paperback</BINDING>
<PAGES>98</PAGES>
<PRICE>$2.95</PRICE>
</BOOK>
<BOOK InStock="yes">
<TITLE>The Marble Faun</TITLE>
<AUTHOR Born="1804">Nathaniel Hawthorne</AUTHOR>
<BINDING>trade paperback</BINDING>
<PAGES>473</PAGES>
<PRICE>$10.95</PRICE>
</BOOK>
<BOOK InStock="no">
<TITLE>Moby-Dick <SUBTITLE>Or, The Whale</SUBTITLE></TITLE>
<AUTHOR Born="1819">Herman Melville</AUTHOR>
<BINDING>hardcover</BINDING>
<PAGES>724</PAGES>
<PRICE>$9.95</PRICE>
</BOOK>
<BOOK InStock="yes">
<TITLE>The Portrait of a Lady</TITLE>
<AUTHOR>Henry James</AUTHOR>
<BINDING>mass market paperback</BINDING>
<PAGES>256</PAGES>
<PRICE>$4.95</PRICE>
</BOOK>
<BOOK InStock="yes">
<TITLE>The Scarlet Letter</TITLE>
<AUTHOR>Nathaniel Hawthorne</AUTHOR>
<BINDING>trade paperback</BINDING>
<PAGES>253</PAGES>
<PRICE>$4.25</PRICE>
</BOOK>
<BOOK InStock="no">
<TITLE>The Turn of the Screw</TITLE>
<AUTHOR>Henry James</AUTHOR>
<BINDING>trade paperback</BINDING>
<PAGES>384</PAGES>
<PRICE>$3.35</PRICE>
</BOOK>
</INVENTORY>
Listing 5-2.
8 If you want to test the validity of your document, read the instructions for using the DTD validity-testing page that is presented in “Checking an XML Document for Validity Using a DTD” on page 396.
Defining and Using Entities
An important benefit of adding document type definitions (DTDs) to your XML documents is that they allow you to define entities. You can use entities to save time and reduce the size of your XML documents, to modularize your documents, and to incorporate diverse types of data into your documents. You define an entity in a DTD using a syntax similar to that used to declare an element or attribute in a valid XML document, as described in Chapter 5.
In this chapter, you’ll first learn some of the basic terminology used with entities and the different ways entities are classified. You’ll then discover how to declare each of the different entity types, and how to insert or identify the entities in your document where you need them. Next you’ll learn how to use two XML features that let you insert any type of character in any context: character references and predefined entities. The chapter concludes with a hands-on exercise to give you some practice working with entities within a complete XML document.
Entity Definitions and Classifications
The XML specification uses the term entity in a broad, general sense to refer to
any of the following types of storage units associated with XML documents:
■ The entire XML document itself, which is known as the
document entity
■ An external DTD subset (discussed in “Using an External DTD Subset” in Chapter 5)
■ An external file defined as an external entity in the DTD and used within the document