The Advantages of Making an XML Document Valid
Creating a valid XML document might seem to be a lot of unnecessary bother: You must first fully define the document’s content and structure in a DTD or XML schema and then create the document itself, following all the DTD or schema specifications. It might seem much easier to just immediately add whatever elements and attributes you need, as you did in the examples of well-formed documents in previous chapters.
If, however, you want to make sure that your document conforms to a specific structure or set of standards, providing a DTD or XML schema that describes the structure or standards allows an XML processor to check whether your document is in conformance. In other words, a DTD or XML schema provides a standard blueprint to the processor so that in checking the validity of the document, it can enforce the desired structure and guarantee that your document meets the required standards. If any part of the document doesn’t conform to the DTD or XML schema specification, the processor can display an error message so that you can edit the document and make it conform.
Making an XML document valid also fosters consistency within that document. For example, a DTD or XML schema can force you to always use the same element type for describing a given piece of information (for instance, to always enter a book title using a TITLE element rather than a NAME element); it can ensure that you always assign a designated value to an attribute (for instance, hardcover rather than hardback); and it can catch misspellings or typos in element or attribute names (for instance, typing PHILUM rather than PHYLUM for an element name).
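As a minimal sketch of this kind of consistency checking, the following document restricts a hypothetical Binding attribute to a fixed list of values. The attribute name and the book data are assumptions made for illustration, and the enumerated attribute type used here is one of several attribute types that a DTD can specify.

```xml
<?xml version="1.0"?>
<!DOCTYPE BOOK
   [
   <!ELEMENT BOOK (TITLE)>
   <!ELEMENT TITLE (#PCDATA)>
   <!-- Binding (a hypothetical attribute) may take only one of the two listed values -->
   <!ATTLIST BOOK Binding (hardcover | paperback) #REQUIRED>
   ]
>
<BOOK Binding="hardcover">
   <TITLE>Leaves of Grass</TITLE>
</BOOK>
```

With these declarations, a validating processor should report an error if you typed Binding="hardback", or if you entered the title in a NAME element rather than a TITLE element.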
Making XML documents valid is especially useful for ensuring uniformity among a group of similar documents. In fact, the XML standard defines a DTD as “a grammar for a class of documents.” Consider, for example, a Web publishing company that needs all its editors to create XML documents that conform to a common structure. Creating a single DTD or XML schema and using it for all documents can ensure that these documents uniformly comply with the required structure, and that editors don’t add arbitrary new elements, place information in the wrong order, assign the wrong data types to attributes, and so on. Of course, the document must be run through a processor that checks its validity.
Including a DTD or XML schema and checking validity is especially important if the documents are going to be processed by custom software (such as a Web page script) that expects a particular document content and structure. If all users of the software use a common appropriate DTD or XML schema for their XML documents, and if the documents are checked for validity, the users can be sure that their documents will be recognized by the processing software. For example, if a group of mathematicians are creating mathematical documents that will be displayed using a particular program, they could all include in their documents a common DTD that defines the required structure, elements, attributes, and other features.
In fact, most of the “real-world” XML applications listed at the end of Chapter
1, such as MathML, consist of a standard DTD or XML schema that all users of the application use with their XML documents, so that checking the documents for validity ensures that they conform to the application’s structure and will be recognized by any software designed for that application
note
The Microsoft Internet Explorer processor will check a document for validity only if the document contains a document type declaration and you open the document through an HTML Web page (using the techniques you’ll learn in Chapters 10 and 11), or if you use an XML schema as explained in Chapter 11.
If you open an XML document (one with or without a style sheet) directly in Internet Explorer (as you have done so far in this book and will do in Chapters 8, 9, and 12), the processor will check the entire document, including any document type declaration it contains, for well-formedness and will display a fatal error message for any infraction it encounters. However, the Internet Explorer processor will not check the document for validity, even if it contains a document type declaration.
To test a document with a DTD or XML schema for validity and to see messages for any well-formedness or validity errors the document contains, you can use one of the validity checking scripts (contained in HTML Web pages) that are given in “Checking an XML Document for Validity” on page 396. (These scripts are also provided on the companion CD.) You might want to read the instructions in that section now so that you can begin checking the validity of the XML documents you create.
Adding the Document Type Declaration
A document type declaration is a block of XML markup that you add to the prolog of a valid XML document. It can go anywhere within the prolog (outside of other markup) following the XML declaration. (Recall that if you include the XML declaration, it must be at the very beginning of the document.)
[Figure: The prolog and the document element of an XML document, with labels marking the two places in the prolog where the document type declaration can go.]
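To make the placement concrete, here is a small sketch of a prolog in which the document type declaration follows the XML declaration and sits between two comments. The NOTE element type and its content are hypothetical and serve only to make the document complete.

```xml
<?xml version="1.0"?>
<!-- A comment can come before the document type declaration -->
<!DOCTYPE NOTE
   [
   <!ELEMENT NOTE (#PCDATA)>
   ]
>
<!-- A comment can also come after it; the prolog ends where the document element begins -->
<NOTE>The document element follows the prolog.</NOTE>
```

The declaration would be equally acceptable immediately after the XML declaration or immediately before the NOTE start-tag, as long as it isn't placed inside other markup.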
A document type declaration defines the content and structure of the document. If you open a document without a document type declaration (or XML schema) in Internet Explorer, the Internet Explorer processor will merely check that the document is well-formed. If, however, you open a document with a document type declaration in Internet Explorer, the processor will, under certain circumstances, check the document for validity as well as for well-formedness, and your document must therefore conform to all declarations within the document type declaration. (See the note at the end of the previous section for a description of the circumstances under which Internet Explorer checks for validity.) You won’t, for example, be able to include any elements or attributes in the document that you haven’t declared in the document type declaration. And every element and attribute that you do include must match the specifications (such as the allowable content of an element or the permissible type of an attribute value) expressed in the corresponding declaration.
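The following sketch shows what such a violation might look like. The INVENTORY, ITEM, and PRICE names are hypothetical. The document is well-formed, but a validating processor should report a validity error because PRICE is used without being declared.

```xml
<?xml version="1.0"?>
<!DOCTYPE INVENTORY
   [
   <!ELEMENT INVENTORY (ITEM)+>
   <!ELEMENT ITEM (#PCDATA)>
   ]
>
<INVENTORY>
   <ITEM>Hammer</ITEM>
   <!-- Validity error: PRICE is not declared in the DTD -->
   <PRICE>12.50</PRICE>
</INVENTORY>
```

To make the document valid, you would either remove the PRICE element or declare the PRICE element type and add it to the content of INVENTORY.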
Well-Formedness and Validity Constraints
Well-formedness constraints are a set of rules given in the XML specification that you must follow, in addition to the rules specified in the formal XML grammar, to create a well-formed document. Because an XML document must be well-formed, any violation of a well-formedness constraint or any other failure to achieve well-formedness is considered a fatal error. When the XML processor encounters a fatal error, it must stop normal processing of the document and not attempt to recover.
Validity constraints are a further set of rules in the XML specification that you must follow if you’ve chosen to create a valid document by defining a DTD. (They don’t apply if you’ve chosen to create a valid document using an XML schema.) Because validity is optional for an XML document, a violation of a validity constraint is considered only an error, as opposed to a fatal error. When a validating XML processor (that is, one that checks documents for validity) encounters an error, it can simply report the problem and attempt to recover from it. Validity constraints consist of specific rules for creating a proper document type declaration with its DTD, and for creating a document that conforms to the specifications within your DTD.
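As a rough illustration of the difference, the following sketch (with hypothetical MEMO, TO, and BODY element types) is deliberately not well-formed. Because element names are case-sensitive, the end-tag doesn’t match its start-tag, which is a fatal error rather than a validity error.

```xml
<?xml version="1.0"?>
<!DOCTYPE MEMO
   [
   <!ELEMENT MEMO (TO, BODY)>
   <!ELEMENT TO (#PCDATA)>
   <!ELEMENT BODY (#PCDATA)>
   ]
>
<MEMO>
   <TO>All staff</TO>
   <!-- Fatal error on the next line: the end-tag </Body> doesn't match the start-tag <BODY> -->
   <BODY>Meeting at noon.</Body>
</MEMO>
```

By contrast, if the tags matched but the BODY element were left out entirely, the document would be well-formed, and a validating processor would report only a validity error.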
Declaring Element Types
In a valid XML document created using a DTD, you must explicitly declare the type of every element that you use in the document in an element type declaration within the DTD. An element type declaration indicates the name of the element type and the allowable content of the element (often specifying the order in which child elements can occur). Taken together, the element type declarations in the DTD map out the entire content and logical structure of the document. That is, the element type declarations indicate the element types that the document contains, the order of the elements, and the contents of these elements.
The Form of an Element Type Declaration
An element type declaration has the following general form:
<!ELEMENT Name contentspec>
Here, Name is the name of the element type being declared. (To review the rules for legal element names, see “The Anatomy of an Element” on page 53.) And contentspec is the content specification, which defines what the element can contain. The next section describes the different types of content specifications you can use.
The following is a declaration of an element type named TITLE, which is permitted to contain only character data (no child elements would be allowed):
<!ELEMENT TITLE (#PCDATA)>
And here’s a declaration for an element type named GENERAL, which can contain any type of content:
<!ELEMENT GENERAL ANY>
As a final example, here’s a complete XML document with two element types. The declaration of the COLLECTION element type indicates that it can contain one or more CD elements, and the declaration of the CD element type specifies that it can contain only character data. Notice that the document conforms to these declarations and is therefore valid:
<?xml version="1.0"?>
<!DOCTYPE COLLECTION
[
<!ELEMENT COLLECTION (CD)+>
<!ELEMENT CD (#PCDATA)>
<!-- You can also insert a comment in a DTD -->
]
>
<COLLECTION>
<CD>Mozart Violin Concertos 1, 2, and 3</CD>
<CD>Telemann Trumpet Concertos</CD>
<CD>Handel Concerti Grossi Op 3</CD>
</COLLECTION>
note
You can declare a particular element type only once in a given document. For general information on redeclaring items in the DTD, see the sidebar “Redeclarations in a DTD” on page 148.
The Element’s Content Specification
You can specify the content of an element (that is, fill in the contentspec part of the element type declaration) in four different ways:
■ EMPTY content. The EMPTY keyword indicates that the element must be empty; that is, it cannot have content. Here’s an example:
<!ELEMENT IMAGE EMPTY>
The following would be valid IMAGE elements you could enter into your document:
<IMAGE></IMAGE>
<IMAGE />
■ ANY content. The ANY keyword indicates that the element can have any legal content. That is, an element of this type can contain zero or more child elements of any declared type, in any order or number of repetitions, with or without interspersed character data. This is the most lax content specification, and creates an element type without content constraints. Here’s an example of a declaration:
<!ELEMENT MISC ANY>
■ Element content. With this type of content specification, the element can contain child elements of the indicated types, but can’t directly contain character data. I’ll describe this option in the next section.
■ Mixed content. With this type of content specification, the element can contain any quantity of character data. Also, if one or more child element types are specified in the declaration, the character data can be interspersed with any number of these child elements, in any order. I’ll describe this option later in this chapter.
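The four kinds of content specification can appear side by side in one DTD. The following is only a sketch; the ARTICLE document and all of its element names are hypothetical and were chosen so that each kind appears exactly once at the top level.

```xml
<?xml version="1.0"?>
<!DOCTYPE ARTICLE
   [
   <!-- Element content: exactly these four children, in this order -->
   <!ELEMENT ARTICLE (HEADLINE, PHOTO, BODY, NOTES)>
   <!-- Character data only (the simplest form of mixed content) -->
   <!ELEMENT HEADLINE (#PCDATA)>
   <!-- EMPTY content: the element can't contain anything -->
   <!ELEMENT PHOTO EMPTY>
   <!-- Mixed content: character data interspersed with EMPH child elements -->
   <!ELEMENT BODY (#PCDATA | EMPH)*>
   <!ELEMENT EMPH (#PCDATA)>
   <!-- ANY content: character data and any declared element types -->
   <!ELEMENT NOTES ANY>
   ]
>
<ARTICLE>
   <HEADLINE>Bridge Reopens</HEADLINE>
   <PHOTO/>
   <BODY>The bridge reopened <EMPH>ahead of schedule</EMPH> on Monday.</BODY>
   <NOTES>Checked by <EMPH>the city desk</EMPH>.</NOTES>
</ARTICLE>
```

Note that even an element declared with ANY can contain only element types that are declared somewhere in the DTD, which is why EMPH is acceptable inside NOTES.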
Specifying Element Content
If an element has element content, it can directly contain only the specified child elements. The element cannot contain character data, except for white space characters used to separate the child elements and enhance readability (for example, you can display each child element on a separate line and indent them using space or tab characters). As always, the processor must pass the white space characters on to the application, but the application will typically ignore them. (For more details, and to learn about an exception, see the sidebar “White Space in Elements” on page 56.)
Consider the following example XML document, which describes a single book:
<?xml version="1.0"?>
<!DOCTYPE BOOK
[
<!ELEMENT BOOK (TITLE, AUTHOR)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT AUTHOR (#PCDATA)>
]
>
<BOOK>
<TITLE>The Scarlet Letter</TITLE>
<AUTHOR>Nathaniel Hawthorne</AUTHOR>
</BOOK>
In this document, the BOOK element type is declared to have element content. The (TITLE, AUTHOR) following the element name in the declaration is known as the content model. A content model indicates the allowed types of child elements and their order. In this example, the content model indicates that a BOOK element must have exactly one TITLE child element followed by exactly one AUTHOR child element.
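For contrast, here is a sketch of a variation on the BOOK document that breaks the rule against character data in element content. The stray line of text is an assumption added for illustration; everything else matches the declarations above.

```xml
<?xml version="1.0"?>
<!DOCTYPE BOOK
   [
   <!ELEMENT BOOK (TITLE, AUTHOR)>
   <!ELEMENT TITLE (#PCDATA)>
   <!ELEMENT AUTHOR (#PCDATA)>
   ]
>
<BOOK>
   <!-- Validity error: BOOK has element content, so it can't directly contain character data -->
   A classic American novel
   <TITLE>The Scarlet Letter</TITLE>
   <AUTHOR>Nathaniel Hawthorne</AUTHOR>
</BOOK>
```

The white space used to indent TITLE and AUTHOR is permitted, but the line of text is not. Moving that text into a declared child element would make the document valid again.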
A content model can have either of the following two basic forms:
■ Sequence. The element must contain a specific sequence of child element types. You separate the names of the child element types with commas. For example, the following DTD indicates that a MOUNTAIN document element must have one NAME child element, followed by one HEIGHT child element, followed by one STATE child element:
<!DOCTYPE MOUNTAIN
[
<!ELEMENT MOUNTAIN (NAME, HEIGHT, STATE)>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT HEIGHT (#PCDATA)>
<!ELEMENT STATE (#PCDATA)>
]
>
Hence, the following document element would be valid:
<MOUNTAIN>
<NAME>Wheeler</NAME>
<HEIGHT>13161</HEIGHT>
<STATE>New Mexico</STATE>
</MOUNTAIN>
The following document element, however, would be invalid because the order of the child element types isn’t as declared:
<MOUNTAIN> <!-- Invalid element! -->
<STATE>New Mexico</STATE>
<NAME>Wheeler</NAME>
<HEIGHT>13161</HEIGHT>
</MOUNTAIN>
Omitting a child element type or including the same child element type more than once would also be invalid. As you can see, this is a very rigid form of declaration.
■ Choice. The element can have any one of a series of possible child element types, which are separated using | characters. For example, the following DTD specifies that a FILM element can contain one STAR child element, or one NARRATOR child element, or one INSTRUCTOR child element:
<!DOCTYPE FILM [
<!ELEMENT FILM (STAR | NARRATOR | INSTRUCTOR)>
<!ELEMENT STAR (#PCDATA)>
<!ELEMENT NARRATOR (#PCDATA)>
<!ELEMENT INSTRUCTOR (#PCDATA)>
]
>
Hence, the following document element would be valid:
<FILM>
<STAR>Robert Redford</STAR>
</FILM>
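A choice content model permits exactly one of the listed alternatives. The following sketch reuses the FILM declarations to show a document element that a validating processor should reject; the narrator’s name is a made-up placeholder.

```xml
<?xml version="1.0"?>
<!DOCTYPE FILM
   [
   <!ELEMENT FILM (STAR | NARRATOR | INSTRUCTOR)>
   <!ELEMENT STAR (#PCDATA)>
   <!ELEMENT NARRATOR (#PCDATA)>
   <!ELEMENT INSTRUCTOR (#PCDATA)>
   ]
>
<FILM>
   <STAR>Robert Redford</STAR>
   <!-- Validity error: the choice allows only one child element -->
   <NARRATOR>A. N. Other</NARRATOR>
</FILM>
```

An empty FILM element would also be invalid under this declaration, because the content model requires one of the three child element types to be present.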
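In the following declarations, TITLE has mixed content: it can contain character data interspersed with any number of SUBTITLE child elements.
<!ELEMENT TITLE (#PCDATA | SUBTITLE)*>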
<!ELEMENT SUBTITLE (#PCDATA)>
The following are valid TITLE elements, conforming to this
declaration:
<TITLE>Moby-Dick <SUBTITLE>Or, The Whale</SUBTITLE></TITLE>
<TITLE><SUBTITLE>Or, The Whale</SUBTITLE> Moby-Dick</TITLE>
<TITLE>Moby-Dick</TITLE>
<TITLE>
<SUBTITLE>Or, The Whale</SUBTITLE>
<SUBTITLE>Another Subtitle</SUBTITLE>
</TITLE>
<TITLE></TITLE>
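Although mixed content places no limit on the order or number of child elements, it does limit their types. In the following sketch, EDITION is a hypothetical element type that is declared in the DTD but isn’t listed in the content model for TITLE, so a validating processor should report an error.

```xml
<?xml version="1.0"?>
<!DOCTYPE TITLE
   [
   <!ELEMENT TITLE (#PCDATA | SUBTITLE)*>
   <!ELEMENT SUBTITLE (#PCDATA)>
   <!ELEMENT EDITION (#PCDATA)>
   ]
>
<TITLE>Moby-Dick
   <!-- Validity error: EDITION is not listed in the content model for TITLE -->
   <EDITION>Second</EDITION>
</TITLE>
```

Adding EDITION to the list, as in (#PCDATA | SUBTITLE | EDITION)*, would make this TITLE element valid.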
Declaring Attributes
In a valid XML document, you must also explicitly declare all attributes that you intend to use with the document’s elements. You define all the attributes associated with a particular element by using a type of DTD markup declaration known as an attribute-list declaration. This declaration does the following:
■ It defines the names of the attributes associated with the element In
a valid document, you can include in an element start-tag only those
attributes defined for that element
■ It specifies the data type of each attribute
■ It specifies for each attribute whether that attribute is required. If the attribute isn’t required, the attribute-list declaration also indicates what the processor should do if the attribute is omitted. (The declaration might, for example, provide a default attribute value that the processor will pass to the application.)
note
You can declare elements and attributes in any order in a DTD. For example, you can declare the attribute-list specification for a particular element before you declare that element.
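As a brief sketch of this flexibility, the following document declares an attribute for the CD element type before declaring the element type itself. The Format attribute is hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE CD
   [
   <!-- The attribute-list declaration comes before the element type declaration -->
   <!ATTLIST CD Format CDATA #REQUIRED>
   <!ELEMENT CD (#PCDATA)>
   ]
>
<CD Format="audio">Telemann Trumpet Concertos</CD>
```

Many authors still prefer to place each attribute-list declaration directly after the declaration of its element, simply because the DTD is easier to read that way.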
The Form of an Attribute-List Declaration
An attribute-list declaration has the following general form:
<!ATTLIST Name AttDefs>
Here, Name is the type name of the element associated with the attribute or attributes. AttDefs is a series of one or more attribute definitions, each of which defines one attribute. (The order of the attribute definitions in the attribute-list declaration isn’t significant. You can always include the attribute specifications in an element start-tag in any order.)
An attribute definition has the following form:
Name AttType DefaultDecl
Here, Name is the name of the attribute. (To review the rules for legal attribute names, see “Rules for Creating Attributes” on page 63.) AttType is the attribute type, which is the kind of value that can be assigned to the attribute. (I’ll describe the attribute type in the next section.) And DefaultDecl is the default declaration, which indicates whether the attribute is required and provides other information. (I’ll describe the default declaration later in this chapter.)
Say, for example, that you’ve declared an element type named FILM like this:
<!ELEMENT FILM (TITLE, (STAR | NARRATOR | INSTRUCTOR))>
Here’s an example of an attribute-list declaration that declares two attributes—
named Class and Year—for FILM elements:
<!ATTLIST FILM Class CDATA "fictional" Year CDATA #REQUIRED>
Here are the different parts of this declaration:
[Figure: An attribute-list declaration. Labels identify the name of the associated element and, within each of the two attribute definitions, the attribute name, attribute type, and default declaration.]
You can assign to the Class attribute any legal quoted string (the CDATA keyword); if you omit the attribute from a particular element, it will automatically be assigned the default value fictional. You can assign to the Year attribute any legal quoted string; this attribute, however, must be assigned a value in every FILM element (the #REQUIRED keyword), and it therefore doesn’t have a default value.
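To see how these declarations might play out in a document, consider the following sketch. The FILMS container element, the film titles, the names, and the years are all hypothetical; only the FILM element type declaration and its attribute-list declaration come from the discussion above.

```xml
<?xml version="1.0"?>
<!DOCTYPE FILMS
   [
   <!-- FILMS is a hypothetical container added so the document can hold two FILM elements -->
   <!ELEMENT FILMS (FILM)+>
   <!ELEMENT FILM (TITLE, (STAR | NARRATOR | INSTRUCTOR))>
   <!ELEMENT TITLE (#PCDATA)>
   <!ELEMENT STAR (#PCDATA)>
   <!ELEMENT NARRATOR (#PCDATA)>
   <!ELEMENT INSTRUCTOR (#PCDATA)>
   <!ATTLIST FILM Class CDATA "fictional" Year CDATA #REQUIRED>
   ]
>
<FILMS>
   <!-- Class is omitted here, so the default value "fictional" applies -->
   <FILM Year="1998">
      <TITLE>A Sample Drama</TITLE>
      <STAR>A. N. Actor</STAR>
   </FILM>
   <!-- The attributes can appear in any order in the start-tag -->
   <FILM Year="2001" Class="instructional">
      <TITLE>Learning to Sail</TITLE>
      <INSTRUCTOR>A. N. Other</INSTRUCTOR>
   </FILM>
</FILMS>
```

Leaving out the Year attribute from either FILM element would be a validity error, because the declaration marks that attribute as #REQUIRED.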