An important aspect of defining possible attribute values by an enumeration like this is that an XML editor can help the author of a document by prompting with the list of possible attribute values from the DTD when the element is being created.
An attribute that you declare as #FIXED must always have the default value. For example:
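As an illustrative sketch only (the units attribute and its value are assumptions made up for this example, not part of the Sketcher DTD), a fixed attribute might be declared like this:

```xml
<!-- Hypothetical declaration: units always has the value "pixels" -->
<!ELEMENT position EMPTY>
<!ATTLIST position units CDATA #FIXED "pixels">
```

A document may omit the units attribute, in which case the parser supplies "pixels". If the attribute does appear, its value must be exactly "pixels" for the document to be valid.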
Defining Parameter Entities
You will often need to repeat a block of information in different places in a DTD. A parameter entity identifies a block of parsed text by a name that you can use to insert the text at various places within a DTD. Note that parameter entities are for use only within a DTD. You cannot use parameter entity references in the body of a document. You declare general entities in the DTD when you want to repeat text within the document body.
The form for a parameter entity is very similar to what you saw for general entities except that a % character appears between ENTITY and the entity name, separated from both by a space. For example, it is quite likely that you would want to repeat the x and y attributes that you defined in the <position> element in the previous section in other elements. You could define a parameter entity for these attributes and then use that wherever these attributes should appear in an element declaration. Here's the parameter entity declaration:
<!ENTITY % coordinates "x CDATA #REQUIRED y CDATA #REQUIRED">
Now you can use the entity name to insert the x and y attribute definitions in an attribute declaration:
<!ATTLIST position %coordinates; >
A parameter entity declaration must precede its use in a DTD.
The substitution string in a parameter entity declaration is parsed and can include parameter and general entity references. As with general entities, a parameter entity can also be defined by a reference to a URI containing the substitution string.
Other Types of Attribute Value
There are a further eight possibilities for specifying the type of the attribute value. I won't go into detail
on these, but so you can recognize them, they are as follows:
ENTITY      An entity defined in the DTD. An entity here is a name identifying an unparsed entity defined elsewhere in the DTD by an ENTITY tag. The entity may or may not contain text. An entity could represent something very simple such as &lt;, which refers to a single character, or it could represent something more substantial such as an image.
ENTITIES    A list of entities defined in the DTD, separated by spaces.
ID          An ID is a unique name identifying an element in a document. This is to enable internal references to a particular element from elsewhere in the document.
IDREF       A reference to an element elsewhere in a document via its ID.
IDREFS      A list of references to IDs, separated by spaces.
NMTOKEN     A name token: a string made up only of the characters that XML allows in a name. Unlike an XML name, a name token may begin with any of those characters, including a digit.
NMTOKENS    A list of name tokens, separated by spaces.
NOTATION    A name identifying a notation, which is typically a format specification for an entity such as a JPEG or PostScript file. The notation will be identified elsewhere in the DTD using a NOTATION tag that may also identify an application capable of processing an entity in the given format.
A DTD for Sketcher
With what you know of XML and DTDs, you can have a stab at putting together a DTD for storing Sketcher files as XML. As I said before, an XML language has already been defined for representing and communicating two-dimensional graphics. This is called Scalable Vector Graphics, and you can find it at http://www.w3.org/TR/SVG/. While this would be the choice for transferring 2D graphics as XML documents in a real-world context, the objective here is to exercise your knowledge of XML and DTDs, so you'll reinvent your own version of this wheel, even though it will have fewer spokes and may wobble a bit.
First, let's consider what the general approach is going to be. Since the objective is to define a DTD that will enable you to exercise the Java API for XML with Sketcher, you'll define the language to make it an easy fit to Sketcher, rather than worry about the niceties of the best way to represent each geometric element. Since Sketcher itself was a vehicle for trying out various capabilities of the Java class libraries, it evolved in a somewhat Topsy-like fashion, with the result that the classes defining geometric entities are not necessarily ideal. However, you'll just map these directly in XML to avoid the mathematical jiggery-pokery that would be necessary if you adopted a more formal representation of geometry in XML.
A sketch is a very simple document. It's basically a sequence of lines, circles, rectangles, curves, and text. You can therefore define the root element <sketch> in the DTD as:
<!ELEMENT sketch (line|circle|rectangle|curve|text)*>
This just says that a sketch consists of zero or more of any of the elements between the parentheses. You now need to define each of these elements.
A line is easy. It is defined by its location, which is its start point and an end point. It also has an orientation (its rotation angle) and a color. You could define a <line> element like this:
<!ELEMENT line (color, position, endpoint)>
<!ATTLIST line angle CDATA #REQUIRED>
You could define color by a color attribute to the <line> element with a set of alternative values, but to allow the flexibility for lines of any color, it would be better to define a <color> element with three attributes for RGB values. In this case you can define the <color> element as:
<!ELEMENT color EMPTY>
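A minimal sketch of how the three RGB attributes might be declared follows; the attribute names R, G, and B are assumptions chosen for illustration:

```xml
<!-- Assumed attribute names for the red, green, and blue components -->
<!ATTLIST color
          R CDATA #REQUIRED
          G CDATA #REQUIRED
          B CDATA #REQUIRED
>
```

With a declaration along these lines, the color of a blue line would be written as <color R="0" G="0" B="255"/>.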
You must now define the <position> and <endpoint> elements. These are both points defined by an (x, y) coordinate pair, so you would sensibly define them consistently. Empty elements with attributes are the most economical way here, and you can use a parameter entity for the attributes:
<!ENTITY % coordinates "x CDATA #REQUIRED y CDATA #REQUIRED">
<!ELEMENT position EMPTY>
<!ATTLIST position %coordinates;>
<!ELEMENT endpoint EMPTY>
<!ATTLIST endpoint %coordinates;>
A rectangle will be defined very similarly to a line since it is defined by its position, which corresponds to the top-left corner, plus the coordinates of the bottom-right corner. It also has a color and a rotation angle. Here's how this will look in the DTD:
<!ELEMENT rectangle (color, position, bottomright)>
<!ATTLIST rectangle
angle CDATA #REQUIRED
>
<!ELEMENT bottomright EMPTY>
<!ATTLIST bottomright %coordinates;>
You don't need to define the <color> and <position> elements because you have already defined these earlier for the <line> element.
The <circle> element is no more difficult. Its position is the center, and it has a radius and a color. It also has a rotation angle. You can define it like this:
<!ELEMENT circle (color, position)>
<!ATTLIST circle
          radius CDATA #REQUIRED
          angle CDATA #REQUIRED
>
The <curve> element follows the same pattern:
<!ELEMENT curve (color, position, point+)>
<!ATTLIST curve angle CDATA #REQUIRED>
<!ELEMENT point EMPTY>
<!ATTLIST point %coordinates;>
The start point of the curve is defined by the <position> element, and it includes at least one <point> element, which is specified by the + operator.
Lastly, you have the element that defines a text element in Sketcher terms. You need to allow for the font name and its style and point size, a rotation angle for the text, and a color, plus the text itself, of course, and its position. A Text element is also a little different from the other elements, as its bounding rectangle is required to construct it, so you must also include that. You have some options as to how you define this element. You could use mixed element content in a <text> element, combining the text string with <font> and <position> elements, for example.
The disadvantage of this is that you cannot limit the number of occurrences of the child elements and how they are intermixed with the text. You can make the definition more precisely controlled by enclosing the text in its own element. Then you can define the <text> element as having element content, like this:
<!ELEMENT text (color, position, font, string)>
<!ATTLIST text angle CDATA #REQUIRED>
<!ELEMENT font EMPTY>
<!ATTLIST font
          fontname CDATA #REQUIRED
          fontstyle (plain|bold|italic|bold-italic) #REQUIRED
          pointsize CDATA #REQUIRED
>
<!ELEMENT string (#PCDATA|bounds)*>
<!ELEMENT bounds EMPTY>
<!ATTLIST bounds
          width CDATA #REQUIRED
          height CDATA #REQUIRED
>
That's all you need. The complete DTD for Sketcher documents will be:
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT sketch (line|circle|rectangle|curve|text)*>
<!ELEMENT color EMPTY>
<!ENTITY % coordinates "x CDATA #REQUIRED y CDATA #REQUIRED">
<!ELEMENT position EMPTY>
<!ATTLIST position %coordinates;>
<!ELEMENT endpoint EMPTY>
<!ATTLIST endpoint %coordinates;>
<!ELEMENT line (color, position, endpoint)>
<!ATTLIST line angle CDATA #REQUIRED>
<!ELEMENT rectangle (color, position, bottomright)>
<!ATTLIST rectangle angle CDATA #REQUIRED>
<!ELEMENT bottomright EMPTY>
<!ATTLIST bottomright %coordinates;>
<!ELEMENT circle (color, position)>
<!ATTLIST circle
          radius CDATA #REQUIRED
          angle CDATA #REQUIRED
>
<!ELEMENT curve (color, position, point+)>
<!ATTLIST curve angle CDATA #REQUIRED>
<!ELEMENT point EMPTY>
<!ATTLIST point %coordinates;>
<!ELEMENT text (color, position, font, string)>
<!ATTLIST text angle CDATA #REQUIRED>
<!ELEMENT font EMPTY>
<!ATTLIST font
          fontname CDATA #REQUIRED
          fontstyle (plain|bold|italic|bold-italic) #REQUIRED
          pointsize CDATA #REQUIRED
>
<!ELEMENT string (#PCDATA|bounds)*>
<!ELEMENT bounds EMPTY>
<!ATTLIST bounds
          width CDATA #REQUIRED
          height CDATA #REQUIRED
>
You can use this DTD to represent any sketch in XML. Stash it away in your Beg Java Stuff directory as sketcher.dtd. You'll try it out later.
Rules for a Well-Formed Document
Now that you know a bit more about XML elements and what goes into a DTD, I can formulate what you must do to ensure your XML document is well-formed. The rules for a document to be well-formed are quite simple:
1. If the XML declaration appears in the prolog, it must include the XML version. Other specifications in the XML declaration must be in the prescribed sequence: character encoding followed by standalone specification.
2. If the document type declaration appears in the prolog, the DOCTYPE name must match that of the root element, and the markup declarations in the DTD must be according to the rules for writing markup declarations.
3. The body of the document must contain at least one element, the root element, which contains all the other elements, and an instance of the root element must not appear in the content of another element. All elements must be properly nested.
4. Elements in the body of the document must be consistent with the markup declarations identified by the DOCTYPE declaration. (Strictly speaking, the XML specification classes this as a validity constraint, so only a validating parser checks it.)
The rules for writing an XML document are absolutely strict. Break one rule and your document is not well-formed and will not be processed. This strict application of the rules is essential because you are communicating data and its structure. If any laxity were permitted, it would open the door to uncertainty about how the data should be interpreted. HTML used to be quite different from XML in this respect. Until recently, the rules for writing HTML were only loosely applied by HTML readers such as web browsers.
For example, even though a paragraph in HTML should be defined using a start tag, <p>, and an end tag, </p>, you can usually get away with omitting the end tag, and you can use both capital and lowercase p, and indeed close a capital P paragraph with a lowercase p, and vice versa. You can often have overlapping tags in HTML and get away with that, too. While it is not to be recommended, a loose application of the rules for HTML is not so harmful since HTML is concerned only with data presentation. The worst that can happen is that the data does not display quite as you intended.
In 2000, the W3C released the XHTML 1.0 standard that makes HTML an XML language, so you can expect more and more HTML documents to conform to this. The enduring problem is, of course, that the Internet has accumulated a great deal of material over many years that is still very useful but that will never be well-formed XML, so browsers may never be fully XML-compliant.
XML Namespaces
Even though they are very simple, XML namespaces can be very confusing. The confusion arises because it is so easy to make assumptions about what they imply when you first meet them. Let's look briefly at why you have XML namespaces in the first place, and then see what an XML namespace actually is.
You saw earlier that an XML document can have only one DOCTYPE declaration. This can identify an external DTD by a URI or include explicit markup declarations, or it may do both. What happens if you want to combine two or more XML documents that each has its own DTD into a single document? The short answer is that you can't, or at least not easily. Since the DTD for each document will have been defined without regard for the other, element name collisions are a real possibility. It may be impossible to differentiate between different elements that share a common name, and in this case major revisions of the documents' contents as well as a new DTD will be necessary to deal with this. It won't be easy.
XML namespaces are intended to help deal with this problem. They enable names used in markup to be qualified so that you can make duplicate names that are used in different markup unique by putting them in separate namespaces. An XML namespace is just a collection of element and attribute names that is identified by a URI. Each name in an XML namespace is qualified by the URI that identifies the namespace. Thus, different XML namespaces may contain common names without causing confusion since each name is notionally qualified by the unique URI for the namespace that contains it.
I say "notionally qualified" because you don't usually qualify names using the URI directly, although you could. Normally, in the interests of not making the markup overly verbose, you use another name called a namespace prefix whose value is the URI for the namespace. For example, I could have a namespace that is identified by the URI http://www.wrox.com/Toys and a namespace prefix toys that contains a declaration for the name rubber_duck. I could have a second namespace with the URI http://www.wrox.com/BathAccessories and the namespace prefix bathAccessories that also defines the name rubber_duck. The rubber_duck name from the first namespace is referred to as toys:rubber_duck and that from the second namespace is bathAccessories:rubber_duck, so there is no possibility of confusing them. The colon is used in the qualified name to separate the namespace prefix from the local name, which is why I said earlier in the chapter that you should avoid the use of colons in ordinary XML names.
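As a sketch of how the two prefixes might appear side by side in one document (the <catalog> root element is hypothetical and belongs to no namespace):

```xml
<!-- Each prefix is bound to its namespace URI by an xmlns:prefix attribute -->
<catalog xmlns:toys="http://www.wrox.com/Toys"
         xmlns:bathAccessories="http://www.wrox.com/BathAccessories">
  <toys:rubber_duck/>
  <bathAccessories:rubber_duck/>
</catalog>
```

The two rubber_duck elements share a local name, but a namespace-aware parser treats them as different names because their prefixes map to different URIs.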
Let's come back to the confusing aspects of namespaces for a moment. There is a temptation to imagine that the URI that identifies an XML namespace also identifies a document somewhere that specifies the names in the namespace. This is not required by the namespace specification. The URI is just a unique identifier for the namespace and a unique qualifier for a set of names. It does not necessarily have any other purpose, or even have to refer to a real document; it only needs to be unique. The definition of how names within a given namespace relate to one another and the rules for markup that uses them is an entirely separate question. This may be provided by a DTD or some other mechanism such as an XML Schema.
http://www.wrox.com/dtds/sketches. You can use the namespace prefix to qualify names within the namespace, and since this maps to the URI, the URI is effectively the qualifier for the name. The URL that I've given here is hypothetical; it doesn't actually exist, but it could. The sole purpose of the URI identifying the namespace is to ensure that names within the namespace are unique, so it doesn't matter whether it exists or not. You can add as many namespace declarations within an element as you want, and each namespace declared in an element is available within that element and its content.
With the namespace declared with the sketcher prefix, you can use the <circle> element that is defined in the sketcher namespace like this:
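A minimal sketch, assuming the namespace is declared on the root element with the sketcher prefix and that the <circle> element follows the Sketcher DTD from earlier in the chapter (the R, G, and B attribute names and all the values are illustrative assumptions):

```xml
<sketcher:sketch xmlns:sketcher="http://www.wrox.com/dtds/sketches">
  <sketcher:circle radius="15" angle="0">
    <sketcher:color R="150" G="250" B="100"/>
    <sketcher:position x="30" y="50"/>
  </sketcher:circle>
</sketcher:sketch>
```

Every element name from the namespace carries the sketcher prefix, including the child elements of <circle>.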
A namespace has scope, a region of an XML document over which the namespace declaration is visible. The scope of a namespace is the content of the element within which it is declared, plus all direct or indirect child elements. The preceding namespace declaration applies to the <sketch> element and all the elements within it. If you declare a namespace in the root element for a document, its scope is the entire document.
You can declare a namespace without specifying a prefix. This namespace then becomes the default namespace in effect for this element and its content, and unqualified element names are assumed to belong to this namespace. Here's an example:
<sketch xmlns="http://www.wrox.com/dtds/sketches">
There is no namespace prefix specified, so the colon following xmlns is omitted. This namespace becomes the default, so you can use element names from this namespace without qualification and they are all implicitly within the default namespace. Note that a default namespace applies only to element names; an attribute name without a prefix is not in any namespace. For example:
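A sketch of what such a document might look like, again assuming the element structure of the Sketcher DTD and made-up attribute names and values for the color:

```xml
<sketch xmlns="http://www.wrox.com/dtds/sketches">
  <!-- No prefixes: sketch, circle, color, and position are all in the default namespace -->
  <circle radius="15" angle="0">
    <color R="150" G="250" B="100"/>
    <position x="30" y="50"/>
  </circle>
</sketch>
```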
in a document as the default.
You can declare several namespaces within a single element. Here's an example of a default namespace in use with another namespace:
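The following is a sketch only: the print prefix, its URI, and the lineweight and linestyle attributes are hypothetical names invented to show a second namespace alongside the default one:

```xml
<sketch xmlns="http://www.wrox.com/dtds/sketches"
        xmlns:print="http://www.wrox.com/dtds/printed">
  <!-- Unprefixed element names belong to the default namespace -->
  <circle radius="15" angle="0">
    <color R="150" G="250" B="100"/>
    <position x="30" y="50"/>
  </circle>
  <!-- Prefixed names belong to the hypothetical print namespace -->
  <print:circle print:lineweight="3" print:linestyle="dashed"/>
</sketch>
```

Both <circle> elements can coexist because one is qualified by the default namespace and the other by the namespace bound to the print prefix.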
XML Namespaces and DTDs
For a document to be valid, you must still have a DTD, and the document must be consistent with it. The way in which a DTD is defined has no specific provision for namespaces. The DTD for a document that uses namespaces must therefore define the elements and attributes using qualified names and must also make provision for the xmlns attribute, with or without its prefix, in the markup declaration for any element in which it can appear. Because the markup declarations in a DTD have no specific provision for accommodating namespaces, a DTD is a less than ideal vehicle for defining the rules for markup when namespaces are used. The XML Schema specification provides a much better solution, and overcomes a number of other problems associated with DTDs.
XML Schemas
Because of the limitations of DTDs that I mentioned earlier, the W3C has developed the XML Schema language for defining the content and structure of sets of XML documents, and this language is now a W3C standard. You use the XML Schema Definition language to create descriptions of particular kinds of XML documents in a similar manner to the way you use DTDs, and such descriptions are themselves referred to as XML Schemas and fulfill the same role as DTDs. The XML Schema language is itself defined in XML and is therefore implicitly extensible to support new capabilities when necessary. Because the XML Schema language enables you to specify the type and format of data within an XML document, it provides a way for you to define and create XML documents that are inherently more precise, and therefore safer, than documents described by a DTD.
It's easy to get confused when you are working with XML Schemas. One primary source of confusion is the various levels of language definition you are involved with. At the top level, you have XML; everything you are working with in this context is defined in XML. At the next level you have the XML Schema Definition language, defined in XML of course, and you use this language to define an XML Schema, which is a specification for a set of XML documents. At the lowest level you define an XML document, such as a document describing a Sketcher sketch, and this document is defined according to the rules you have defined in your XML Schema for Sketcher documents. Figure 22-3 shows the relationships between these various XML documents.
Figure 22-3 (diagram levels: XML; XML Schema Definition Language; XML Schema for Sketcher Documents). XML was used to define a language in which you can define XML Schema. You define an XML Schema for XML documents that define Sketcher sketches. Sketcher sketches can be stored and retrieved or otherwise communicated as long as they conform to the XML Schema.
The XML Schema language is sometimes referred to as XSD, from XML Schema Definition language. The XML Schema namespace is usually associated with the prefix name xsd, and files containing a definition for a class of XML documents often have the extension xsd. You'll also often see the prefix xs used for the XML Schema namespace, but in fact you can use anything you like. A detailed discussion of the XML Schema language is a substantial topic that really requires a whole book to do it justice, so I'll just give you enough of a flavor of how you define your own XML document schemas so that you're able to see how it differs from a DTD.
Defining a Schema
The elements in a schema that defines the structure and content of a class of XML documents are organized in a similar way to the elements in a DTD. A schema has a single root element that is unique, and all other elements must be contained within the root element and must be properly nested. Every schema consists of a schema root element with a number of nested sub-elements. Let's look at a simple example.
Here's a possible schema for XML documents that contain an address:
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<!-- This declares document content -->
<xsd:element name="address" type="AddressType"/>
<!-- This defines an element type that is used in the declaration of content -->
<xsd:complexType name="AddressType">
<xsd:sequence>
<xsd:element name="buildingnumber" type="xsd:positiveInteger"/>
<xsd:element name="street" type="xsd:string"/>
<xsd:element name="city" type="xsd:string"/>
<xsd:element name="state" type="xsd:string"/>
<xsd:element name="zip" type="xsd:decimal"/>
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
"http://www.w3.org/2001/XMLSchema" namespace, so xsd is shorthand for the full namespace name. Thus schema, complexType, sequence, and element are all names of elements in a namespace defined for the XML Schema Definition language. The root element for every XML Schema will be a schema element. Don't lose sight of what a schema is; it's a definition of the form of XML documents of a particular type, so it declares the elements that can be used in such a document and how they may be structured. A document that conforms to a particular schema does not have to identify the schema, but it can. I'll come back to how you reference a schema when you are defining an XML document a little later in this chapter.
The example uses an <annotation/> element to include some simple documentation in the schema definition. The text that is the documentation appears within a child <documentation/> element. You can also use an <appinfo/> child element within an <annotation/> element to reference information located at a given URI. Of course, you can also use XML comments, <!-- comment -->, within a schema, as the example shows.
In an XML Schema, a declaration specifies an element that is content for a document, whereas a definition defines an element type. The xsd:element element is a declaration that the content of a document consists of an <address/> element. Contrast this with the xsd:complexType element, which is a definition of the AddressType type for an element and does not declare document content. The xsd:element element in the schema declares that the address element is document content and happens to be of type AddressType, which is the type defined by the xsd:complexType element.
Now let's take a look at some of the elements that you use to define a document schema in a little more detail.
Defining Elements
As I said, the xsd:complexType element in the sample schema defines a type of element, not an element in the document. A complex element is simply an element that contains other elements, or that has attributes, or both. Any elements that are complex elements will need an xsd:complexType definition in the schema to define a type for the element. You place the definitions for child elements for a complex element between the complexType start and end tags. You also place the definitions for any attributes for a complex element between the complexType start and end tags. You can define a simple type using an xsd:simpleType definition in the schema. You would use a simple type definition to constrain attribute values or element content in some way. You'll see examples of this a little later in this chapter.
In the example you specify that any element of type AddressType contains a sequence of simple elements: a buildingnumber element, a street element, a city element, a state element, and a zip element. A simple element is an element that does not have child elements or attributes; it can contain only data, which can be of a variety of standard types or of a type that you define. The definition of each simple element that appears within an element of type AddressType uses an xsd:element element in which the name attribute specifies the name of the element being defined and the type attribute defines the type of data that can appear within the element. You can also control the number of occurrences of an element by specifying values for two further attributes within the xsd:element tag, as follows:
Attribute    Description
minOccurs    The value defines the minimum number of occurrences of the element and must be a non-negative integer (so it can be 0). If this attribute is not defined, then the minimum number of occurrences of the element is 1.
maxOccurs    The value defines the maximum number of occurrences of the element and can be a non-negative integer or the value unbounded. If this attribute is not defined, then the maximum number of occurrences of the element is 1.
Thus, if both of these attributes are omitted, as is the case with the child element definitions in the sample schema for elements of type AddressType, the minimum and maximum numbers of occurrences are both one, so the element must appear exactly once. If you specify minOccurs as 0, then the element is optional. Note that you must not specify a value for minOccurs that is greater than the value for maxOccurs. You should keep in mind that both attributes have default values of 1 when you specify a value for just one attribute.
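To make the occurrence rules concrete, here is a sketch of a variation on the address schema. The particular limits, and the phone element, are assumptions made up for illustration rather than part of the original example:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="address" type="AddressType"/>
  <xsd:complexType name="AddressType">
    <xsd:sequence>
      <!-- Optional: may be absent, and appears at most once (maxOccurs defaults to 1) -->
      <xsd:element name="buildingnumber" type="xsd:positiveInteger" minOccurs="0"/>
      <!-- Between one and three street lines (minOccurs defaults to 1) -->
      <xsd:element name="street" type="xsd:string" maxOccurs="3"/>
      <!-- Exactly once: both attributes take their default value of 1 -->
      <xsd:element name="city" type="xsd:string"/>
      <!-- Hypothetical element: any number of occurrences, including none -->
      <xsd:element name="phone" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```

Note that specifying minOccurs="2" on its own would be an error, because maxOccurs would still have its default value of 1.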
Specifying Data Types
In the example, each of the definitions for the five simple elements within an address element has a type specified for the data that it'll contain, and you specify the data type by a value for the type attribute. The data in a buildingnumber element is specified to be of type positiveInteger, the data in a zip element is of type decimal, and the others are all of type string. The positiveInteger and string types are relatively self-explanatory, corresponding to integers greater than or equal to 1 and to strings of characters. The XML Schema Definition language allows you to specify many different values for the type attribute in an element definition. Here are a few other examples:
negativeInteger       -1, -2, -3, -12345, and so on
nonNegativeInteger    0, 1, 2, 3, and so on
The float and double types correspond to values within the ranges for 32-bit and 64-bit floating-point values, respectively. There are many more standard types of data within the XML Schema Definition language, and because this is extensible, you can also define data types of your own.
You can also define a default value for a simple element by using the default attribute within the definition of the element. For example, within an XML representation of a sketch you will undoubtedly need to have an element defining a color. You might define this as a simple element like this:
<xsd:element name="color" type="xsd:string" default="blue"/>
This defines a color element containing data that is a string and a default value for the string of "blue".
In a similar way, you can define the content for a simple element to be a fixed value by specifying the content as the value for the fixed attribute within the xsd:element tag.
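As an illustration of both attributes, here is a sketch of two declarations as they might appear inside a schema root element that binds the xsd prefix; the country element and its value are hypothetical:

```xml
<!-- Fragment: belongs inside an xsd:schema element that declares the xsd prefix -->
<!-- An empty <color/> element is treated as if it contained "blue" -->
<xsd:element name="color" type="xsd:string" default="blue"/>
<!-- Hypothetical element: if it has content, that content must be "USA" -->
<xsd:element name="country" type="xsd:string" fixed="USA"/>
```

The difference is that a default value can be overridden by supplying other content, whereas a fixed value cannot.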
Defining Attributes for Complex Elements
You use the xsd:attribute tag to define an attribute for a complex element. Let's take an example to see how you do this. Suppose you decided that you would define a circle in an XML document for a sketch using a <circle/> element, where the coordinates of the center, the radius, and the color are specified by attributes. Within the document schema, you might define the type for an element representing a circle like this:
<xsd:complexType name="CircleType">
<xsd:attribute name="x" type="xsd:double"/>
<xsd:attribute name="y" type="xsd:double"/>
<xsd:attribute name="radius" type="xsd:double"/>
<xsd:attribute name="color" type="xsd:string"/>
</xsd:complexType>
The elements that define the attributes for the <circle/> element type appear within the complexType element, just like child element definitions. You specify the attribute name and the data type for the value in exactly the same way as for an element. The type specification is not mandatory. If you leave it out, it just means that anything goes as a value for the attribute.
You can also specify in the definition for an attribute whether it is optional or not by specifying a value for the use attribute within the xsd:attribute element. The value for the use attribute can be either "optional" or "required". For a circle element, none of the attributes are optional, so you might modify the complex type definition to the following:
<xsd:complexType name="CircleType">
<xsd:attribute name="x" type="xsd:double" use="required"/>
<xsd:attribute name="y" type="xsd:double" use="required"/>
<xsd:attribute name="radius" type="xsd:double" use="required"/>
<xsd:attribute name="color" type="xsd:string" use="required"/>
</xsd:complexType>
You might also want to restrict the values that can be assigned to an attribute. For example, the radius certainly cannot be zero or negative, and the color may be restricted to standard colors. You could do this by adding a simple type definition that defines restrictions on these values. For example:
<xsd:complexType name="circle">
<xsd:attribute name="x" type="xsd:double" use="required"/>
<xsd:attribute name="y" type="xsd:double" use="required"/>
<xsd:attribute name="radius" use="required">
I've used an xsd:minExclusive specification to define an exclusive lower limit for values of the radius attribute, and this specifies that the value must be greater than "0". Alternatively, you might prefer to use xsd:minInclusive with a value of "1" to set a sensible minimum value for the radius. You also have the option of specifying an upper limit on the values by specifying either maxInclusive or maxExclusive values. For the color attribute definition, I've introduced a restriction that the value must be one of a fixed set of values. Each value that is allowed is specified in an xsd:enumeration element, and there can be any number of these. Obviously, this doesn't just apply to strings; you can restrict the values for numeric types to be one of an enumerated set of values. For the color attribute the value must be one of the four string values specified.
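Putting the restrictions just described together, a complete type definition might look like the following sketch. The type name CircleType and the four color names are assumptions chosen for illustration:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="CircleType">
    <xsd:attribute name="x" type="xsd:double" use="required"/>
    <xsd:attribute name="y" type="xsd:double" use="required"/>
    <!-- The radius must be strictly greater than 0 -->
    <xsd:attribute name="radius" use="required">
      <xsd:simpleType>
        <xsd:restriction base="xsd:double">
          <xsd:minExclusive value="0"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:attribute>
    <!-- The color must be one of four enumerated strings (assumed values) -->
    <xsd:attribute name="color" use="required">
      <xsd:simpleType>
        <xsd:restriction base="xsd:string">
          <xsd:enumeration value="red"/>
          <xsd:enumeration value="blue"/>
          <xsd:enumeration value="green"/>
          <xsd:enumeration value="yellow"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:attribute>
  </xsd:complexType>
</xsd:schema>
```

Because the type of each restricted attribute is given by the nested xsd:simpleType, the xsd:attribute element itself has no type attribute.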
Defining Groups of Attributes
Sometimes several different elements have the same set of attributes. To avoid having to repeat the definitions for the attributes in such a set for each element that requires them, you can define an attribute group. Here's an example of a definition for an attribute group:
<xsd:attributeGroup name="coords">
<xsd:attribute name="x" type="xsd:double" use="required"/>
<xsd:attribute name="y" type="xsd:double" use="required"/>
</xsd:attributeGroup>
This defines a group of two attributes with names x and y that specify x and y coordinates for a point. The name of this attribute group is coords. In general, an attribute group can contain other attribute groups. You could use the coords attribute group within a complex type definition like this:
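A minimal sketch of such a type definition, assuming the type is named PointType to match the element declaration that follows, refers to the group through the ref attribute of an xsd:attributeGroup element:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:attributeGroup name="coords">
    <xsd:attribute name="x" type="xsd:double" use="required"/>
    <xsd:attribute name="y" type="xsd:double" use="required"/>
  </xsd:attributeGroup>
  <!-- PointType acquires the x and y attributes by referencing the group -->
  <xsd:complexType name="PointType">
    <xsd:attributeGroup ref="coords"/>
  </xsd:complexType>
</xsd:schema>
```

An element of the type can then be declared in the usual way: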
<xsd:element name="position" type="PointType"/>
This declares a <position/> element to be of type PointType, and thus to have the required attributes x and y.
Trang 16Specifying a Group of Element Choices
The xsd:choiceelement in the Schema Definition language enables you to specify that one out of agiven set of elements included in the choice definition must be present This will be useful in specifying
a schema for Sketcher documents because the content is essentially variable — it can be any sequence ofany of the basic types of elements Suppose that you have already defined types for the geometric andtext elements that can occur in a sketch You could use an xsd:choiceelement in the definition of acomplex type for a <sketch/>element like this:
<xsd:complexType name="SketchType">
  <xsd:choice minOccurs="0" maxOccurs="unbounded">
    <xsd:element name="line" type="LineType"/>
    <xsd:element name="rectangle" type="RectangleType"/>
    <xsd:element name="circle" type="CircleType"/>
    <xsd:element name="curve" type="CurveType"/>
    <xsd:element name="text" type="TextType"/>
  </xsd:choice>
</xsd:complexType>
This defines that an element of type SketchType contains zero or more elements that are each one of the five types identified in the xsd:choice element. Thus, each element can be any of the types LineType, RectangleType, CircleType, CurveType, or TextType, which are types for the primitive elements in a sketch that will be defined elsewhere in the schema. Given this definition for SketchType, you can declare the content for a sketch to be:
<xsd:element name="sketch" type="SketchType"/>
This declares the contents of an XML document for a sketch to be a <sketch/> element that has zero or more elements of any of the types that appeared in the preceding xsd:choice element. This is exactly what is required to accommodate any sketch, so this single declaration defines the entire contents of all possible sketches. All you need is to fill in a few details for the element types. I think you know enough about XML Schema to put together a schema for Sketcher documents.
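If you want to convince yourself that a repeated xsd:choice really does accept any number of child elements in any order, you could try something like the following sketch. It is only an illustration: the child elements are given the placeholder type xsd:string rather than the real Sketcher types, only three of the five choices are included, and the ChoiceCheck class name is invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class ChoiceCheck {
  // Simplified stand-in for SketchType: the children are plain strings here
  static final String SCHEMA =
      "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
    + "<xsd:element name='sketch' type='SketchType'/>"
    + "<xsd:complexType name='SketchType'>"
    + "<xsd:choice minOccurs='0' maxOccurs='unbounded'>"
    + "<xsd:element name='line' type='xsd:string'/>"
    + "<xsd:element name='circle' type='xsd:string'/>"
    + "<xsd:element name='text' type='xsd:string'/>"
    + "</xsd:choice></xsd:complexType></xsd:schema>";

  // Returns true if the document text is valid against SCHEMA
  static boolean isValid(String xml) {
    try {
      SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
      Schema schema = factory.newSchema(new StreamSource(new StringReader(SCHEMA)));
      schema.newValidator().validate(new StreamSource(new StringReader(xml)));
      return true;
    } catch (SAXException e) {        // Reported when the content does not match the choice
      return false;
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(isValid("<sketch/>"));                                   // Zero children
    System.out.println(isValid("<sketch><circle/><line/><circle/></sketch>"));  // Any order, repeated
    System.out.println(isValid("<sketch><polygon/></sketch>"));                 // Not one of the choices
  }
}
```

The minOccurs and maxOccurs attributes apply to the choice as a whole, so each repetition of the choice selects one child element independently of the others.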
A Schema for Sketcher
As I noted when I discussed a DTD for Sketcher, an XML document that defines a sketch can have a very simple structure. Essentially, it can consist of a <sketch/> element that contains a sequence of zero or more elements that define lines, rectangles, circles, curves, or text. These child elements may be in any sequence, and there can be any number of them. To accommodate the fact that any given child element must be one of five types of elements, you could use some of the XML fragments from earlier sections to make an initial stab at an outline of a Sketcher schema like this:
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- The entire document content -->
  <xsd:element name="sketch" type="SketchType"/>
  <!-- Type for a sketch root element -->
  <xsd:complexType name="SketchType">
    <xsd:choice minOccurs="0" maxOccurs="unbounded">
      <xsd:element name="line" type="LineType"/>
      <xsd:element name="rectangle" type="RectangleType"/>
      <xsd:element name="circle" type="CircleType"/>
      <xsd:element name="curve" type="CurveType"/>
      <xsd:element name="text" type="TextType"/>
    </xsd:choice>
  </xsd:complexType>
  <!-- Definitions for the child element types go here -->
</xsd:schema>
The document content is declared to be an element with the name sketch that is of type SketchType. The <sketch/> element is the root element, and because it can have child elements, it must be defined as a complex type. The child elements within a <sketch/> element are the elements specified by the xsd:choice element, which represents a selection of one of the five complex elements that can occur in a sketch. The minOccurs and maxOccurs attribute values for the xsd:choice element determine that there may be any number of such elements, including zero. Thus, this definition accommodates XML documents describing any Sketcher sketch. All you now need to do is fill in the definitions for the possible varieties of child elements.
Defining Line Elements
Let’s define the same XML elements in the schema for Sketcher as the DTD for Sketcher defines. On that basis, a line element will have three child elements specifying the color, position, and end point for a line, plus an attribute that specifies the rotation angle. You could define the type for a <line/> element in the schema like this:
<xsd:complexType name="LineType">
  <xsd:sequence>
    <xsd:element name="color" type="ColorType"/>
    <xsd:element name="position" type="PointType"/>
    <xsd:element name="endpoint" type="PointType"/>
  </xsd:sequence>
  <xsd:attribute name="angle" type="xsd:double" use="required"/>
</xsd:complexType>
Of course, you now must define the types that you’ve used in the definition of the complex type, LineType: the ColorType and PointType element types.
Defining a Type for Color Elements
As I discussed in the context of the DTD for Sketcher, the data for a <color/> element will be supplied by three attributes that specify the RGB values for the color. You can therefore define the element type like this:
<xsd:complexType name="ColorType">
  <xsd:attribute name="R" type="xsd:nonNegativeInteger" use="required"/>
  <xsd:attribute name="G" type="xsd:nonNegativeInteger" use="required"/>
  <xsd:attribute name="B" type="xsd:nonNegativeInteger" use="required"/>
</xsd:complexType>
This is a relatively simple complex type definition. There are just the three attributes, R, G, and B, that all have integer values that can be 0 or greater, and are all mandatory.
Defining a Type for Point Elements
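You saw a definition for the PointType element type earlier:
<xsd:complexType name="PointType">
  <xsd:attributeGroup ref="coords"/>
</xsd:complexType>
The point coordinates are specified by the coords attribute group, which is defined as:
<xsd:attributeGroup name="coords">
  <xsd:attribute name="x" type="xsd:double" use="required"/>
  <xsd:attribute name="y" type="xsd:double" use="required"/>
</xsd:attributeGroup>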
You’ll be able to use this attribute group in the definitions for other element types in the schema. The definition of this attribute group must appear at the top level in the schema, within the root element; otherwise, it will not be possible to refer to it from within an element declaration.
Defining a Rectangle Element Type
The definition of the type for a <rectangle/> element is almost identical to the LineType definition:
<xsd:complexType name="RectangleType">
  <xsd:sequence>
    <xsd:element name="color" type="ColorType"/>
    <xsd:element name="position" type="PointType"/>
    <xsd:element name="bottomright" type="PointType"/>
  </xsd:sequence>
  <xsd:attribute name="angle" type="xsd:double" use="required"/>
</xsd:complexType>
Defining a Circle Element Type
There’s nothing new in the definition of CircleType:
<xsd:complexType name="CircleType">
  <xsd:sequence>
    <xsd:element name="color" type="ColorType"/>
    <xsd:element name="position" type="PointType"/>
  </xsd:sequence>
  <xsd:attribute name="radius" type="xsd:double" use="required"/>
  <xsd:attribute name="angle" type="xsd:double" use="required"/>
</xsd:complexType>
The child elements appear within a sequence element, so their sequence is fixed. You have the radius and angle for a circle specified by attributes that both have values of type double, and are both mandatory.
Defining a Curve Element Type
A type for the curve element does introduce something new because the number of child elements is variable. A curve is defined by the origin plus one or more points, so the type definition must allow for an unlimited number of child elements defining points. Here’s how you can accommodate that:
<xsd:complexType name="CurveType">
  <xsd:sequence>
    <xsd:element name="color" type="ColorType"/>
    <xsd:element name="position" type="PointType"/>
    <xsd:element name="point" type="PointType" minOccurs="1" maxOccurs="unbounded"/>
  </xsd:sequence>
  <xsd:attribute name="angle" type="xsd:double" use="required"/>
</xsd:complexType>
Defining a Text Element Type
The type for <text/> elements is the odd one out, but it’s not difficult. It involves four child elements for the color, the position, the font, and the text itself, plus an attribute to specify the angle. The type definition will be as follows:
<xsd:complexType name="TextType">
  <xsd:sequence>
    <xsd:element name="color" type="ColorType"/>
    <xsd:element name="position" type="PointType"/>
    <xsd:element name="font" type="FontType"/>
    <xsd:element name="string" type="xsd:string"/>
  </xsd:sequence>
  <xsd:attribute name="angle" type="xsd:double" use="required"/>
</xsd:complexType>
The text string itself is a simple <string/> element, but the font is a complex element that requires a type definition:
<xsd:complexType name=”FontType”>
<xsd:attribute name=”fontname” type=”xsd:string” use=”required”/>
<xsd:attribute name=”fontstyle” use=”required”>
The Complete Sketcher Schema
If you assemble all the fragments into a single file, you’ll have the following definition for the Sketcher schema that defines XML documents containing a sketch:
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="sketch" type="SketchType"/>
  <!-- Type for a sketch root element -->
  <xsd:complexType name="SketchType">
    <xsd:choice minOccurs="0" maxOccurs="unbounded">
      <xsd:element name="line" type="LineType"/>
      <xsd:element name="rectangle" type="RectangleType"/>
      <xsd:element name="circle" type="CircleType"/>
      <xsd:element name="curve" type="CurveType"/>
      <xsd:element name="text" type="TextType"/>
      <xsd:element name="color" type="ColorType"/>
      <xsd:element name="position" type="PointType"/>
      <xsd:element name="endpoint" type="PointType"/>
      <xsd:element name="color" type="ColorType"/>
      <xsd:element name="position" type="PointType"/>
      <xsd:element name="bottomright" type="PointType"/>
      <xsd:element name="color" type="ColorType"/>
      <xsd:element name="position" type="PointType"/>
    </xsd:sequence>
    <xsd:attribute name="radius" type="xsd:double"/>
    <xsd:attribute name="angle" type="xsd:double"/>
  </xsd:complexType>
  <!-- Type for a curve element -->
  <xsd:complexType name="CurveType">
    <xsd:sequence>
      <xsd:element name="color" type="ColorType"/>
      <xsd:element name="position" type="PointType"/>
      <xsd:element name="point" type="PointType" minOccurs="1"
      <xsd:element name="color" type="ColorType"/>
      <xsd:element name="position" type="PointType"/>
      <xsd:element name="font" type="FontType"/>
      <xsd:element name="string" type="xsd:string"/>
    <xsd:attribute name="fontname" type="xsd:string" use="required"/>
    <xsd:attribute name="fontstyle" use="required">
    <xsd:attributeGroup ref="coords"/>
  </xsd:complexType>
  <!-- Type for a color element -->
  <xsd:complexType name="ColorType">
    <xsd:attribute name="R" type="xsd:nonNegativeInteger" use="required"/>
    <xsd:attribute name="G" type="xsd:nonNegativeInteger" use="required"/>
    <xsd:attribute name="B" type="xsd:nonNegativeInteger" use="required"/>
  </xsd:complexType>
  <!-- Attribute group specifying point coordinates -->
  <xsd:attributeGroup name="coords">
    <xsd:attribute name="x" type="xsd:double" use="required"/>
    <xsd:attribute name="y" type="xsd:double" use="required"/>
  </xsd:attributeGroup>
</xsd:schema>
This is somewhat longer than the DTD for Sketcher, but it does provide several advantages. All the data in the document now has types specified so the document is more precisely defined. This schema is XML, so the documents and the schema are defined in fundamentally the same way and are equally communicable. There is no problem combining one schema with another because namespaces are supported, and every schema can be easily extended. You can save this as a file Sketcher.xsd.
A Document That Uses a Schema
A document that has been defined in accordance with a particular schema is called an instance document for that schema. An instance document has to identify the schema to which it conforms, and this is done using attribute values within the root element of the document. Here’s an XML document for a sketch that identifies the location of the schema:
The value for the xmlns attribute identifies the namespace name http://www.w3.org/2001/XMLSchema-instance and specifies xsi as the prefix used to represent this namespace name. In an instance document, the value for the noNamespaceSchemaLocation attribute in the xsi namespace is a hint about the location where the schema for the document can be found. Here the value for noNamespaceSchemaLocation is a URI for a file on the local machine, and the spaces are escaped because this is required within a URI. The value you specify for the xsi:noNamespaceSchemaLocation attribute is always regarded as a hint, so in principle an application or parser processing this document is not obliged to take account of this. In practice though, this usually will be taken account of when the document is processed, unless there is good reason to ignore it.
You define a value for the noNamespaceSchemaLocation attribute because a sketch document has no namespace; if it had a namespace, you would define a value for the schemaLocation attribute that includes two URIs separated by whitespace within the value specification: the URI for the namespace and a URI that is a hint for the location of the schema. Obviously, since one or more spaces separate the two URIs, the URIs cannot contain unescaped spaces.
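To make the role of the xsi prefix concrete, here is a small sketch that reads the schema location hint from the root element of an instance document using a namespace aware DOM parser. The document text, the file URI in it, and the SchemaHint class are hypothetical values chosen for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class SchemaHint {
  // The namespace name that the xsi prefix conventionally represents
  static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";

  // Returns the value of xsi:noNamespaceSchemaLocation on the root element,
  // or an empty string if the attribute is absent
  static String schemaLocationHint(String xml) {
    try {
      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
      dbf.setNamespaceAware(true);    // Needed so the prefix is resolved to its namespace
      Element root = dbf.newDocumentBuilder()
                        .parse(new InputSource(new StringReader(xml)))
                        .getDocumentElement();
      return root.getAttributeNS(XSI, "noNamespaceSchemaLocation");
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    // Hypothetical instance document; note the escaped space in the URI
    String doc = "<sketch xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' "
               + "xsi:noNamespaceSchemaLocation='file:/C:/My%20Sketches/Sketcher.xsd'/>";
    System.out.println("Schema hint: " + schemaLocationHint(doc));
  }
}
```

The lookup uses the namespace name rather than the prefix, so it would work equally well if the document had chosen a prefix other than xsi. The parser here is not validating, so it makes no attempt to load the file that the hint refers to.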
Programming with XML Documents
Right at the beginning of this chapter I introduced the notion of an XML processor as a module that is used by an application to read XML documents. An XML processor parses the contents of a document and makes the elements, together with their attributes and content, available to the application, so it is also referred to as an XML parser. In case you haven’t met the term before, a parser is just a program module that breaks down text in a given language into its component parts. A natural language processor would have a parser that identifies the grammatical segments in each sentence. A compiler has a parser that identifies variables, constants, operators, and so on in a program statement. An application accesses the content of a document through an API provided by an XML parser and the parser does the job of figuring out what the document consists of.
Java supports two complementary APIs for processing an XML document:
❑ SAX, which is the Simple API for XML parsing
❑ DOM, which is the Document Object Model for XML
The support in JDK 5.0 is for DOM level 3 and for SAX version 2.0.2. JDK 5.0 also supports XSLT version 1.0, where XSL is the Extensible Stylesheet Language and T is Transformations, a language for transforming one XML document into another, or into some other textual representation such as HTML. However, I’ll concentrate on the basic application of DOM and SAX. XSLT is such an extensive topic that there are several books devoted entirely to it.
Before I get into detail on these APIs, let’s look at the broad differences between SAX and DOM, and get an idea of the circumstances in which you might choose to use one rather than the other.
SAX Processing
SAX uses an event-based process for reading an XML document that is implemented through a callback mechanism. This is very similar to the way in which you handle GUI events in Java. As the parser reads a document, each parsing event, such as recognizing the start or end of an element, results in a call to a particular method associated with that event. Such a method is often referred to as a handler. It is up to you to implement these methods to respond appropriately to the event. Each of your methods then has the opportunity to react, in any way that you wish, to the event that results in it being called. In Figure 22-4 you can see the events that would arise from the XML document example that you saw earlier.
Figure 22-4
Each type of event results in a different method in your program being called. There are, for example, different events for registering the beginning and end of a document. You can also see that the start and end of each element result in two further kinds of events, and another type of event occurs for each segment of document data. Thus, this particular document will involve five different methods in your program being called, one for each type of event, and some of them more than once, of course.
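The following sketch shows what the five kinds of callbacks might look like in code. It extends the DefaultHandler class from the org.xml.sax.helpers package and simply records a description of each event as it arrives. The EventTrace class and its trace() method are names invented for this illustration, and whitespace-only character data is skipped to keep the record short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class EventTrace extends DefaultHandler {
  final List<String> events = new ArrayList<String>();   // Events in the order received

  public void startDocument() { events.add("Start document"); }
  public void endDocument()   { events.add("End document"); }

  public void startElement(String uri, String localName, String qName, Attributes attrs) {
    events.add("Start element: " + qName);
  }

  public void endElement(String uri, String localName, String qName) {
    events.add("End element: " + qName);
  }

  public void characters(char[] ch, int start, int length) {
    String text = new String(ch, start, length).trim();
    if (text.length() > 0) {                  // Ignore segments that are only whitespace
      events.add("Characters: " + text);
    }
  }

  // Parses the document text and returns the recorded events
  static List<String> trace(String xml) {
    try {
      EventTrace handler = new EventTrace();
      SAXParserFactory.newInstance().newSAXParser()
                      .parse(new InputSource(new StringReader(xml)), handler);
      return handler.events;
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    for (String event : trace("<circle><radius>15</radius></circle>")) {
      System.out.println(event);
    }
  }
}
```

Each handler method is called by the parser, never by your own code, which is what makes this a callback mechanism.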
Because of the way SAX works, your application inevitably receives the document a piece at a time, with no representation of the whole document. This means that if you need to have the whole document available to your program with its elements and content properly structured, you have to assemble it yourself from the information supplied piecemeal to your callback methods.
Of course, it also means that you don’t have to keep the entire document in memory if you don’t need it, so if you are just looking for particular information from a document, all <phonenumber> elements, for example, you can just save those as you receive them through the callback mechanism, and discard the rest. As a consequence, SAX is a particularly fast and memory efficient way of selectively processing the contents of an XML document.
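As a rough illustration of this kind of selective processing, the handler below keeps only the text of <phonenumber> elements and ignores everything else in the document. The document layout used in main() and the PhoneNumberCollector class are assumptions made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PhoneNumberCollector extends DefaultHandler {
  private final List<String> numbers = new ArrayList<String>();
  private StringBuilder current;      // Non-null only while inside a <phonenumber> element

  public void startElement(String uri, String localName, String qName, Attributes attrs) {
    if (qName.equals("phonenumber")) {
      current = new StringBuilder();
    }
  }

  public void characters(char[] ch, int start, int length) {
    if (current != null) {
      current.append(ch, start, length);      // Text may arrive in more than one piece
    }
  }

  public void endElement(String uri, String localName, String qName) {
    if (qName.equals("phonenumber")) {
      numbers.add(current.toString().trim());
      current = null;                         // Everything else is discarded
    }
  }

  // Parses the document text and returns the phone numbers in document order
  static List<String> collect(String xml) {
    try {
      PhoneNumberCollector handler = new PhoneNumberCollector();
      SAXParserFactory.newInstance().newSAXParser()
                      .parse(new InputSource(new StringReader(xml)), handler);
      return handler.numbers;
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    // Hypothetical address book document
    String doc = "<people><person><name>Ann</name><phonenumber>555-1234</phonenumber></person>"
               + "<person><name>Bob</name><phonenumber>555-9876</phonenumber></person></people>";
    System.out.println(collect(doc));
  }
}
```

The only state the handler holds at any moment is the text of the element it is currently inside, so the memory required does not grow with the size of the parts of the document that you ignore.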
The events shown in Figure 22-4 for the example document, in the order in which they occur:
<circle>
  <radius>15</radius>
  <position>
    <x-coordinate>30</x-coordinate>
    <y-coordinate>50</y-coordinate>
  </position>
</circle>
Start document
Start element : circle
Start element : radius
Characters : 15
End element : radius
Start element : position
Start element : x-coordinate
Characters : 30
End element : x-coordinate
Start element : y-coordinate
Characters : 50
End element : y-coordinate
End element : position
End element : circle
End document
First of all, SAX itself is not an XML document parser; it is a public domain definition of an interface to an XML parser, where the parser is an external program. The public domain part of the SAX API is in three packages that are shipped as part of the JDK:
❑ org.xml.sax: This defines the Java interfaces specifying the SAX API and the InputSource class that encapsulates a source of an XML document to be parsed.
❑ org.xml.sax.helpers: This defines a number of helper classes for interfacing to a SAX parser.
❑ org.xml.sax.ext: This defines interfaces representing optional extensions to SAX2 to obtain information about a DTD, or to obtain information about comments and CDATA sections in a document.
In addition to these, the javax.xml.parsers package provides factory classes that you use to gain access to a parser, and the javax.xml.transform package defines interfaces and classes for XSLT 1.0 processing of an XML document.
In Java terms there are several interfaces involved. The ContentHandler interface defined in the org.xml.sax package specifies the methods that the SAX parser will call as it recognizes elements, attributes, and other components of an XML document. You must provide a class that implements these methods and responds to the method calls in the way that you want.
[Figure: the example <circle> document is processed by a DOM parser, and a Document object is returned to your program]
Once you have the Document object available, you can call the Document object’s methods to navigate through the elements in the document tree starting with the root element. With DOM, the entire document is available for you to process as often and in as many ways as you want. This is a major advantage over SAX processing. The downside to this is the amount of memory occupied by the document: there is no choice, you get it all, no matter how big it is. With some documents the amount of memory required may be prohibitively large.
DOM has one other unique advantage over SAX. It allows you to modify existing documents or create new ones. If you want to create an XML document programmatically and then transfer it to an external destination such as a file or another computer, DOM is the API for this since SAX has no direct provision for creating or modifying XML documents. I will go into detail on how you can use a DOM parser in the next chapter.
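Just to give a flavor of what creating and modifying a document with DOM involves ahead of the detailed discussion, here is a minimal sketch that builds a tiny document in memory, navigates it from the root element, and then changes an attribute value. The element and attribute names and the DomSketch class are illustrative assumptions, not the Sketcher document format.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomSketch {
  // Builds the equivalent of <sketch><circle radius="15"/></sketch> in memory
  static Document build() {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
      Element sketch = doc.createElement("sketch");
      doc.appendChild(sketch);                  // The root element
      Element circle = doc.createElement("circle");
      circle.setAttribute("radius", "15");
      sketch.appendChild(circle);
      return doc;
    } catch (ParserConfigurationException e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    Document doc = build();
    Element root = doc.getDocumentElement();              // Navigate from the root
    Element first = (Element) root.getFirstChild();
    System.out.println(root.getTagName() + " contains " + first.getTagName()
                       + " with radius " + first.getAttribute("radius"));
    first.setAttribute("radius", "20");                   // Modify the tree in place
    System.out.println("Radius is now " + first.getAttribute("radius"));
  }
}
```

Because the whole tree is held in memory, you can revisit and alter any part of it at any time, which is exactly what SAX cannot offer.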
Accessing Parsers
The javax.xml.parsers package defines four classes supporting the processing of XML documents:
SAXParserFactory         Enables you to create a configurable factory object that you can use to create a SAXParser object encapsulating a SAX-based parser
SAXParser                Defines an object that wraps a SAX-based parser
DocumentBuilderFactory   Enables you to create a configurable factory object that you can use to create a DocumentBuilder object encapsulating a DOM-based parser
DocumentBuilder          Defines an object that wraps a DOM-based parser
All four classes are abstract. This is because JAXP is designed to allow different parsers and their factory classes to be plugged in. Both DOM and SAX parsers are developed independently of the Java JDK so it is important to be able to integrate new parsers as they come along. The Xerces parser that is currently distributed with the JDK is controlled and developed by the Apache Project, and it provides a very comprehensive range of capabilities. However, you may want to take advantage of the features provided by other parsers from other organizations, and JAXP allows for that.
These abstract classes act as wrappers for the specific factory and parser objects that you need to use for a particular parser and insulate your code from a particular parser implementation. An instance of a factory object that can create an instance of a parser is created at run time, so your program can use a different parser without changing or even recompiling your code. Now that you have a rough idea of the general principles, let’s get down to specifics and practicalities, starting with SAX.
Using SAX
To process an XML document with SAX, you first have to establish contact with the parser that you want to use. The first step toward this is to create a SAXParserFactory object like this:
SAXParserFactory spf = SAXParserFactory.newInstance();
The SAXParserFactory class is defined in the javax.xml.parsers package along with the SAXParser class that encapsulates a parser. The SAXParserFactory class is abstract but the static newInstance() method will return a reference to an object of a class type that is a concrete implementation of SAXParserFactory. This will be the factory object for creating an object encapsulating a SAX parser.
Before you create a parser object, you can condition the capabilities of the parser object that the SAXParserFactory object will create. For example, the SAXParserFactory object has methods for determining whether the parser that it will attempt to create will be namespace aware or will validate the XML as it is parsed:
isNamespaceAware()   Returns true if the parser to be created is namespace aware, and false otherwise
isValidating()       Returns true if the parser to be created will validate the XML during parsing, and false otherwise
You can set the factory object to produce namespace aware parsers by calling its setNamespaceAware() method with an argument value of true. An argument value of false sets the factory object to produce parsers that are not namespace aware. A parser that is namespace aware can recognize the structure of names in a namespace, with a colon separating the namespace prefix from the name. A namespace aware parser can report the URI and local name separately for each element and attribute. A parser that is not namespace aware will report only an element or attribute name as a single name even when it contains a colon. In other words, a parser that is not namespace aware will treat a colon as just another character that is part of a name.
Similarly, calling the setValidating() method with an argument value of true will cause the factory object to produce parsers that can validate the XML as a document is parsed. A validating parser can verify that the document body has a DTD or a schema, and that the document content is consistent with the DTD or schema identified within the document.
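The difference that namespace awareness makes is easiest to see by parsing the same document both ways and looking at what is passed to the startElement() callback. The following is a sketch only; the NamespaceDemo class, the s prefix, and the urn:example:sketch namespace name are invented for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class NamespaceDemo extends DefaultHandler {
  String report;        // Describes the first element that the parser reports

  public void startElement(String uri, String localName, String qName, Attributes attrs) {
    if (report == null) {
      report = "uri=[" + uri + "] local=[" + localName + "] qName=[" + qName + "]";
    }
  }

  // Parses the document with or without namespace awareness and
  // returns the names reported for the root element
  static String describeRoot(String xml, boolean namespaceAware) {
    try {
      SAXParserFactory spf = SAXParserFactory.newInstance();
      spf.setNamespaceAware(namespaceAware);
      NamespaceDemo handler = new NamespaceDemo();
      spf.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
      return handler.report;
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    String doc = "<s:circle xmlns:s='urn:example:sketch'/>";   // Hypothetical namespace
    System.out.println("Aware:     " + describeRoot(doc, true));
    System.out.println("Not aware: " + describeRoot(doc, false));
  }
}
```

With the parser supplied with the JDK, the namespace aware parse supplies the namespace URI and the local name as separate arguments, whereas the parse that is not namespace aware leaves both of those empty and supplies only the qualified name with its colon.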
You can now use the SAXParserFactory object to create a SAXParser object as follows:
SAXParser parser = null;
try {
  parser = spf.newSAXParser();
} catch(SAXException e) {
  e.printStackTrace(System.err);
  System.exit(1);
} catch(ParserConfigurationException e) {
  e.printStackTrace(System.err);
  System.exit(1);
}
The SAXParser object that you create here will encapsulate the parser supplied with the JDK. The newSAXParser() method for the factory object can throw the two exceptions you are catching here. A ParserConfigurationException will be thrown if a parser cannot be created consistent with the configuration determined by the SAXParserFactory object, and a SAXException will be thrown if any other error occurs. For example, if you call setValidating() with the argument true and the parser does not have the capability for validating documents, a ParserConfigurationException would be thrown. This should not arise with the parser supplied with the JDK though, because it supports both of these features.
The ParserConfigurationException class is defined in the javax.xml.parsers package and the SAXException class is in the org.xml.sax package. Now let’s see what the default parser is by putting the code fragments you have looked at so far together in a working example.
Try It Out Accessing a SAX Parser
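Here’s the code to create a SAXParser object and output some details about it to the command line:
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
public class TrySAX {
  public static void main(String args[]) {
    // Create factory object
    SAXParserFactory spf = SAXParserFactory.newInstance();
    System.out.println("Parser will "+(spf.isNamespaceAware()?"":"not ") +
                                                        "be namespace aware");
    System.out.println("Parser will "+(spf.isValidating()?"":"not ") +
                                                        "validate XML");
    SAXParser parser = null;              // Stores the parser reference
    try {
      parser = spf.newSAXParser();        // Create the parser object
    } catch(SAXException e) {
      e.printStackTrace(System.err);
      System.exit(1);
    } catch(ParserConfigurationException e) {
      e.printStackTrace(System.err);
      System.exit(1);
    }
    System.out.println("Parser object is: "+ parser);
  }
}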
When I ran this I got the following output:
Parser will not be namespace aware
Parser will not validate XML
Parser object is: com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl@118f375
How It Works
The output shows that the default configuration for the SAX parser produced by the SAXParserFactory object spf will be neither namespace aware nor validating. The parser supplied with the JDK is the Xerces parser from the XML Apache Project. This parser implements the W3C standard for XML, the de facto SAX2 standard, and the W3C DOM standard. It also provides support for the W3C standard for XML Schema. You can find detailed information on the advantages of this particular parser on the http://xml.apache.org web site.
The code to create the parser works as I have already discussed. Once you have an instance of the factory object, you use that to create an object encapsulating the parser. Although the reference is returned as type SAXParser, the object is of type SAXParserImpl, which is a concrete implementation of the abstract SAXParser class for a particular parser.
The Xerces parser is capable of validating XML and can be namespace aware. All you need to do is specify which of these options you require by calling the appropriate method for the factory object. You can set the parser configuration for the factory object spf so that you get a validating and namespace aware parser by adding two lines to the program:
// Create factory object
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setNamespaceAware(true);
spf.setValidating(true);
If you compile and run the code again, you should get output something like:
Parser will be namespace aware
Parser will validate XML
Parser object is: com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl@867e89
You arrive at a SAXParser instance without tripping any exceptions, and you clearly now have a namespace aware and validating parser. By default the Xerces parser will validate an XML document with a DTD. To get it to validate a document with an XML Schema, you need to set another option for the parser, as I’ll discuss in the next section.
Using a Different Parser
You might like to try a different parser at some point. The simplest way to try out an alternative parser is to include the path to the jar file that contains the parser implementation in the -classpath option on the command line. For example, suppose you have downloaded a newer version of the Xerces 2 parser from the Apache web site that you want to try. Once you have extracted the zip files and stored them on your C:\ drive, say, you could run the example with the new Xerces parser with a command similar to this:
java -classpath .;C:\xerces-2_6_2\xercesImpl.jar -enableassertions TrySAX
Don’t forget the period in the classpath definition that specifies the current directory. Without it the TrySAX.class file will not be found. If you omit the -classpath option, the program will revert to using the default parser. Of course, you can use this technique to select a particular parser when you have several installed on your PC. Just add the path to the JAR for the parser to the classpath.
Parser Features and Properties
Specific parsers, such as the Xerces parser that you get with the JDK, define their own features and properties that control and report on the processing of XML documents. A feature is an option in processing XML that is either on or off, so a feature is set as a boolean value, either true or false. A property is a parameter that you set to a particular object value, usually of type String. There are standard SAX2 features and properties that may be common to several parsers, and non-standard features and properties that are parser-specific. Note that although a feature or property may be standard for SAX2, this does not mean that a SAX2 parser will necessarily support it.
Querying and Setting Parser Features
Namespace awareness and validating capability are both features of a parser, and you already know how you tell the parser factory object that you want a parser with these features turned on. In general, each parser feature is identified by a name that is a fully qualified URI, and the standard features for SAX2 parsing have names within the namespace http://xml.org/sax/features/. For example, the feature specifying namespace awareness has the name http://xml.org/sax/features/namespaces. Here are a few of the standard features that are defined for SAX2 parsers:
namespaces
    When true, the parser replaces prefixes to element and attribute names with the corresponding namespace URIs. If you set this feature to true, the document must have a schema that supports the use of namespaces. All SAX parsers must support this feature.
namespace-prefixes
    When true, the parser reports the original prefixed names and attributes used for namespace declarations. The default value for this feature is false. All SAX parsers must support this feature.
validation
    When true, the parser will validate the document and report any errors. The default value for the validation feature is false.
external-general-entities
    When true, the parser will include external general entities.
string-interning
    When true, all element and attribute names, namespace URIs, and local names use Java string interning so each of these corresponds to a unique object. This feature is always true for the Xerces parser.
external-parameter-entities
    When true, the parser will include external parameter entities and the external DTD subset.
lexical-handler/parameter-entities
    When true, the beginning and end of parameter entities will be reported.
There are other non-standard features for the Xerces parser. Consult the documentation for the parser on the Apache web site for more details. Apart from the namespaces and namespace-prefixes features that all SAX2 parsers are required to implement, there is no set collection of features for a SAX2 parser, so a parser may implement any number of arbitrary features that may or may not be in the list of standard features.
You have two ways to query and set features for a parser. You can call the getFeature() and setFeature() methods for the SAXParserFactory object to do this before you create the SAXParser object. The parser that is created will then have the features switched on. Alternatively, you can create a SAXParser object using the factory object and then obtain an org.xml.sax.XMLReader object reference from it by calling the getXMLReader() method. You can then call the getFeature() and setFeature() methods for the XMLReader object. XMLReader is the interface that a concrete SAX2 parser implements to allow features and properties to be set and queried. The principal difference in use between calling the factory object methods and calling the XMLReader object methods is that the methods for a SAXParserFactory object can throw an exception of type javax.xml.parsers.ParserConfigurationException if a parser cannot be created with the feature specified.
Once you have created a SAXParser object, you can obtain an XMLReader object reference from the parser like this:
XMLReader reader = null;
try {
  reader = parser.getXMLReader();
} catch(org.xml.sax.SAXException e) {
  System.err.println(e.getMessage());
}
The getFeature() method that the XMLReader interface declares for querying a feature expects an argument of type String that identifies the feature you are querying. The method returns a boolean value that indicates the state of the feature. The setFeature() method expects two arguments; the first argument is of type String and identifies the feature you are setting, and the second is of type boolean and specifies the state to be set. The setFeature() method can throw exceptions of type org.xml.sax.SAXNotRecognizedException if the feature is not found, or of type org.xml.sax.SAXNotSupportedException if the feature name was recognized but cannot be set to the boolean value you specify. Both exception types have SAXException as a base, so you can use this type to catch either of them. Here’s how you might set the features for the Xerces parser so that it will support namespace prefixes:

String nsPrefixesFeature = "http://xml.org/sax/features/namespace-prefixes";
XMLReader reader = null;
try {
  reader = parser.getXMLReader();
  reader.setFeature(nsPrefixesFeature, true);
} catch(org.xml.sax.SAXException e) {
  System.err.println(e.getMessage());
}

This sets the feature to make the parser report the original prefixed element and attribute names.
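Querying a feature works the same way as setting one. The following is a minimal sketch rather than part of the chapter's example code: the class name FeatureQueryDemo, the helper setAndDescribe(), and the made-up feature name in main() are all assumptions introduced for illustration. It sets a feature through the XMLReader, reads the state back with getFeature(), and shows where the two exception types described above would be caught.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

// Illustrative class; the name and helper method are not from the chapter.
class FeatureQueryDemo {
  static final String NS_PREFIXES =
      "http://xml.org/sax/features/namespace-prefixes";

  // Sets a feature through the XMLReader, then reads its state back.
  static String setAndDescribe(String feature, boolean state) {
    try {
      XMLReader reader =
          SAXParserFactory.newInstance().newSAXParser().getXMLReader();
      reader.setFeature(feature, state);
      return String.valueOf(reader.getFeature(feature));
    } catch (SAXNotRecognizedException e) {
      return "not recognized";           // The feature name is unknown to the parser
    } catch (SAXNotSupportedException e) {
      return "not supported";            // Known name, but this state cannot be set
    } catch (SAXException | ParserConfigurationException e) {
      return "failed: " + e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(setAndDescribe(NS_PREFIXES, true));
    System.out.println(setAndDescribe(NS_PREFIXES, false));
    // A made-up feature name, used only to exercise the error path.
    System.out.println(setAndDescribe("http://example.com/no-such-feature", true));
  }
}
```

Catching the two specific exception types before SAXException is only needed when you want to tell the cases apart; a single catch for SAXException covers both.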
If you want to use the SAXParserFactory object to set the features before you create the parser object, you could do it like this:

String nsPrefixesFeature = "http://xml.org/sax/features/namespace-prefixes";
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser parser = null;
try {
  spf.setFeature(nsPrefixesFeature, true);
  parser = spf.newSAXParser();
  System.out.println("Parser object is: " + parser);
} catch(SAXException e) {
  e.printStackTrace(System.err);
  System.exit(1);
} catch(ParserConfigurationException e) {
  e.printStackTrace(System.err);
  System.exit(1);
}

You must call the setFeature() method for the SAXParserFactory object in a try block because of the exceptions it may throw. The catch block for exceptions of type SAXException will catch the SAXNotRecognizedException and SAXNotSupportedException exceptions if they are thrown.
Setting Parser Properties
As I said at the outset, a property is a parser parameter with a value that is an object, usually a String object. Some properties have values that you set to influence the parser’s operation, while the values for other properties are set by the parser for you to retrieve to provide information about the parsing process.
You can set the properties for a parser by calling the setProperty() method for the SAXParser object after you have created it. The first argument to the method is the name of the property as type String, and the second argument is the value for the property. A property value can be of any class type, as the parameter type is Object, but it is usually of type String. The setProperty() method will throw a SAXNotRecognizedException if the property name is not recognized or a SAXNotSupportedException if the property name is recognized but not supported. Both of these exception classes are defined in the org.xml.sax package. Alternatively, you can get and set properties using the XMLReader object reference that you used to set features. The XMLReader interface declares the getProperty() and setProperty() methods with the same signatures as those for the SAXParser object.
You can also retrieve the values for some properties during parsing to obtain additional information about the most recent parsing event. You use the parser’s getProperty() method in this case. The argument to the method is the name of the property, and the method returns a reference to the property’s value.
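To make the setProperty() and getProperty() calls concrete, here is a hedged sketch that is not taken from the chapter. It assumes the parser supports the SAX2 lexical-handler property, which is optional, and it uses the DefaultHandler2 class from the org.xml.sax.ext package as the property value purely for illustration. The class name PropertyDemo and the method collectComments() are invented for this sketch.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Illustrative class; the name and helper method are not from the chapter.
class PropertyDemo {
  // One of the standard SAX2 property names; a parser need not support it.
  static final String LEXICAL_HANDLER =
      "http://xml.org/sax/properties/lexical-handler";

  // Parses xml and returns the text of each comment the parser reports.
  static List<String> collectComments(String xml) {
    List<String> comments = new ArrayList<>();
    DefaultHandler2 handler = new DefaultHandler2() {
      @Override
      public void comment(char[] ch, int start, int length) {
        comments.add(new String(ch, start, length).trim());
      }
    };
    try {
      SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
      parser.setProperty(LEXICAL_HANDLER, handler);      // The value is passed as an Object
      Object current = parser.getProperty(LEXICAL_HANDLER);
      System.out.println("Property is set: " + (current != null));
      parser.parse(new InputSource(new StringReader(xml)), handler);
    } catch (Exception e) {
      // Includes SAXNotRecognizedException and SAXNotSupportedException
      throw new IllegalStateException(e);
    }
    return comments;
  }

  public static void main(String[] args) {
    System.out.println(collectComments("<a><!-- note --><b/></a>"));
  }
}
```

The property value here is a handler object rather than a String, which is why the parameter type of setProperty() is Object.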
As with features, there is no defined set of parser properties, so you need to consult the parser documentation for information on these. There are four standard properties for a SAX parser, none of which are required to be supported by a SAX parser. Since these properties involve the more advanced features of SAX parser operation, they are beyond the scope of this book, but if you are interested, they are documented in the description for the org.xml.sax package that you’ll find in the JDK documentation.
Parsing Documents with SAX
To parse a document using the SAXParser object you simply call its parse() method. You have to supply two arguments to the parse() method. The first identifies the XML document, and the second is a reference of type DefaultHandler to a handler object that you will have created to process the contents of the document. The DefaultHandler object must contain a specific set of public methods that the SAXParser object expects to be able to call for each event, where each type of event corresponds to a particular syntactic element it finds in the document.
The DefaultHandler class that is defined in the org.xml.sax.helpers package already contains do-nothing definitions of all the callback methods that the SAXParser object expects to be able to call. Thus, all you have to do is to define a class that extends the DefaultHandler class and then override the methods in the DefaultHandler class for the events that you are interested in. But let’s not gallop too far ahead. You need to look into the versions of the parse() method that you have available before you get into handling parsing events.
The SAXParser class defines ten overloaded versions of the parse() method, but you’ll be interested in only five of them. The other five use a deprecated handler type HandlerBase that was applicable to SAX1, so you can ignore those and just look at the versions that relate to SAX2. All versions of the method have a return type of void, and the five varieties of the parse() method that you’ll consider are as follows:

parse(File aFile, DefaultHandler handler)
    Parses the document in the file specified by aFile using handler as the object containing the callback methods called by the parser. This will throw an exception of type IOException if an I/O error occurs, and of type IllegalArgumentException if aFile is null.
parse(String uri, DefaultHandler handler)
    Parses the document specified by uri using handler as the object defining the callback methods. This will throw an exception of type IllegalArgumentException if uri is null, and an exception of type IOException if an I/O error occurs.
parse(InputStream input, DefaultHandler handler)
    Parses input as the source of the XML with handler as the event handler. This will throw an exception of type IOException if an I/O error occurs, and of type IllegalArgumentException if input is null.
parse(InputStream input, DefaultHandler handler, String systemID)
    Parses input as the previous method, but uses systemID to resolve any relative URIs.
parse(InputSource source, DefaultHandler handler)
    Parses the document specified by source using handler as the object providing the callback methods to be called by the parser.

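The overloads listed above differ only in how the document is supplied. As a rough illustration, the sketch below parses the same in-memory XML text through the InputStream version and through the InputSource version, and checks the null-argument behavior of the File version. The class ParseOverloadsDemo, the nested ElementCounter handler, and the helper method names are all invented for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative class; none of these names come from the chapter.
class ParseOverloadsDemo {
  // Minimal handler that counts start-of-element events.
  static class ElementCounter extends DefaultHandler {
    int count;
    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attr) {
      count++;
    }
  }

  private static SAXParser newParser() {
    try {
      return SAXParserFactory.newInstance().newSAXParser();
    } catch (ParserConfigurationException | SAXException e) {
      throw new IllegalStateException(e);
    }
  }

  // Uses parse(InputStream, DefaultHandler).
  static int countFromStream(String xml) {
    ElementCounter handler = new ElementCounter();
    try {
      byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
      newParser().parse(new ByteArrayInputStream(bytes), handler);
    } catch (SAXException | IOException e) {
      throw new IllegalStateException(e);
    }
    return handler.count;
  }

  // Uses parse(InputSource, DefaultHandler) with a character stream.
  static int countFromSource(String xml) {
    ElementCounter handler = new ElementCounter();
    try {
      newParser().parse(new InputSource(new StringReader(xml)), handler);
    } catch (SAXException | IOException e) {
      throw new IllegalStateException(e);
    }
    return handler.count;
  }

  // A null File argument is rejected before any parsing starts.
  static boolean rejectsNullFile() {
    try {
      newParser().parse((File) null, new DefaultHandler());
      return false;
    } catch (IllegalArgumentException e) {
      return true;
    } catch (SAXException | IOException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    String xml = "<circle><color/></circle>";
    System.out.println(countFromStream(xml));
    System.out.println(countFromSource(xml));
    System.out.println(rejectsNullFile());
  }
}
```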
The InputSource class is defined in the org.xml.sax package. It defines an object that wraps a variety of sources for an XML document that you can use to pass a document reference to a parser. You can create an InputSource object from a java.io.InputStream object, a java.io.Reader object encapsulating a character stream, or a String specifying a URI, either a public name or a URL. If you specify the document source as a URL, it must be fully qualified.
Implementing a SAX Handler
As I said, the DefaultHandler class in the org.xml.sax.helpers package provides a default do-nothing implementation of each of the callback methods a SAX parser may call when parsing a document. These methods are declared in four interfaces that are all implemented by the DefaultHandler class:
❑ The ContentHandler interface declares methods that will be called to identify the content of a document to an application. You will usually want to implement all the methods defined in this interface in your subclass of DefaultHandler.
❑ The EntityResolver interface declares one method, resolveEntity(), that is called by a parser to pass a public and/or system ID to your application to allow external entities in the document to be resolved.
❑ The DTDHandler interface declares two methods that will be called to notify your application of DTD-related events.
❑ The ErrorHandler interface defines three methods that will be called when the parser has identified an error of some kind in the document.
All four interfaces are defined in the org.xml.sax package. Of course, the parse() method for the SAXParser object expects you to supply a reference of type DefaultHandler as an argument, so you have no choice but to extend the DefaultHandler class when you are defining your handler class. This accommodates just about anything you want to do since you decide which base class methods you want to override.
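Because DefaultHandler implements all four interfaces, one subclass can deal with errors as well as content. The following sketch, which is not from the chapter, overrides the three ErrorHandler methods (warning(), error(), and fatalError()) to record what the parser reports. The class ErrorReportingDemo, the nested ReportingHandler, and the check() helper are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative class; none of these names come from the chapter.
class ErrorReportingDemo {
  // Overrides the three ErrorHandler methods inherited from DefaultHandler.
  static class ReportingHandler extends DefaultHandler {
    final List<String> problems = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
      problems.add("warning at line " + e.getLineNumber());
    }

    @Override
    public void error(SAXParseException e) {
      problems.add("error at line " + e.getLineNumber());
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
      problems.add("fatal error at line " + e.getLineNumber());
      throw e;   // Parsing cannot usefully continue after a fatal error
    }
  }

  // Parses xml and returns whatever problems the parser reported.
  static List<String> check(String xml) {
    ReportingHandler handler = new ReportingHandler();
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
    } catch (SAXException e) {
      // Already recorded by fatalError(); nothing more to do here
    } catch (IOException | ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
    return handler.problems;
  }

  public static void main(String[] args) {
    System.out.println(check("<a><b/></a>"));   // Well-formed document
    System.out.println(check("<a><b></a>"));    // Mismatched end tag
  }
}
```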
The methods that you must implement to deal with parsing events that relate to document content are those declared by the ContentHandler interface, so let’s concentrate on those first. All the methods have a void return type, and they are as follows:

startDocument()
    Called when the start of a document is recognized.
endDocument()
    Called when the end of a document is recognized.
startElement(String uri, String localName, String qName, Attributes attr)
    Called when the start of an element is recognized. Up to three names may be provided for the element:
    uri is the namespace URI for the element name. This will be an empty string if there is no URI or if namespace processing is not being done.
    localName is the unprefixed local name for the element. This will be an empty string if namespace processing is not being done.
    qName is the qualified name for the element. This will be just the name if the parser is not namespace aware. (A colon, if it appears, is then just an ordinary character.)
    The attr reference encapsulates the attributes for the element that have explicit values.
endElement(String uri, String localName, String qName)
    Called when the end of an element is recognized. The references passed to the method are as described for the startElement() method.
characters(char[] ch, int start, int length)
    Called for each segment of character data that is recognized. Note that a contiguous segment of text within an element can be returned as several chunks by the parser via several calls to this method. The characters that are available are from ch[start] to ch[start+length-1], and you must not try to access the array outside these limits.
ignorableWhitespace(char[] ch, int start, int length)
    Called for each segment of ignorable whitespace that is recognized within the content of an element. Note that a contiguous segment of ignorable whitespace within an element can be returned as several chunks by the parser via several calls to this method. The whitespace characters that are available are from ch[start] to ch[start+length-1], and you must not try to access the array outside these limits.
startPrefixMapping(String prefix, String uri)
    Called when the start of a prefix-to-URI namespace mapping is identified. Most of the time you can disregard this method, as a parser will automatically replace prefixes for elements and attribute names by default.
endPrefixMapping(String prefix)
    Called when the end of a prefix-to-URI namespace mapping is identified. Most of the time you can disregard this method for the reason noted in the preceding method.
processingInstruction(String target, String data)
    Called for each processing instruction recognized.
skippedEntity(String name)
    Called for each entity that the parser skips.
setDocumentLocator(Locator locator)
    Called by the parser to pass a Locator object to your code that you can use to determine the location in the document of any SAX document event. The Locator object can provide the public identifier, the system ID, the line number, and the column number in the document for any event. Just implement this method if you want to receive this information for each event.

Your implementations of these methods can throw an exception of type SAXException if an error occurs.
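Since the parser may deliver one run of text through several calls to characters(), a handler that needs the complete text of an element usually has to accumulate the chunks. The sketch below is one possible way to do that, not the chapter's own code. It buffers the text in a StringBuilder and also keeps the Locator passed to setDocumentLocator() so it can note the line where each element ends. The names TextCollectorDemo, TextCollector, and collect() are invented, and the handler assumes simple elements that contain only text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative class; none of these names come from the chapter.
class TextCollectorDemo {
  static class TextCollector extends DefaultHandler {
    final List<String> results = new ArrayList<>();
    final List<Integer> endLines = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private Locator locator;   // May stay null if the parser supplies none

    @Override
    public void setDocumentLocator(Locator locator) {
      this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attr) {
      text.setLength(0);       // Discard text that preceded this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
      text.append(ch, start, length);   // Only ch[start]..ch[start+length-1]
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
      String content = text.toString().trim();
      if (!content.isEmpty()) {
        results.add(qName + "=" + content);
        endLines.add(locator == null ? -1 : locator.getLineNumber());
      }
      text.setLength(0);
    }
  }

  // Returns "name=text" for each element that contains non-blank text.
  static List<String> collect(String xml) {
    TextCollector handler = new TextCollector();
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
    } catch (SAXException | IOException | ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
    return handler.results;
  }

  public static void main(String[] args) {
    String xml = "<circle>\n  <label>first &amp; second</label>\n</circle>";
    System.out.println(collect(xml));
  }
}
```

The entity reference in the sample text is there because a parser will often split the text around it into separate characters() calls, which is exactly the case the buffer is meant to handle.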
When the startElement() method is called, it receives a reference to an object of type org.xml.sax.Attributes as the last argument. This object encapsulates information about all the attributes for the element. The Attributes interface declares methods you can call for the object to obtain details of each attribute name, its type, and its value. There are methods for obtaining this information about an attribute using either an index value to select a particular attribute, or using an attribute name, either a prefix qualified name or a name qualified by a namespace name. I’ll just describe the methods relating to using an index because that’s what the code examples will use. Index values start from 0. The methods that the Attributes interface declares for accessing attribute information using an index are as follows:

getLength()
    Returns a count of the number of attributes encapsulated in the object.
getLocalName(int index)
    Returns a reference to a String object containing the local name of the attribute for the index value passed as the argument.
getQName(int index)
    Returns a reference to a String object containing the XML 1.0 qualified name of the attribute for the index value passed as the argument.
getType(int index)
    Returns a reference to a String object containing the type of the attribute for the index value passed as the argument. The type is returned as one of the following:
    "CDATA", "ID", "IDREF", "IDREFS", "NMTOKEN", "NMTOKENS", "ENTITY", "ENTITIES", "NOTATION"
getValue(int index)
    Returns a reference to a String object containing the value of the attribute for the index value passed as the argument.
getURI(int index)
    Returns a reference to a String object containing the attribute’s namespace URI, or the empty string if no URI is available.

If the index value that you supply to any of the getXXX() methods here is out of range, then the method returns null.
Given a reference, attr, of type Attributes, you can retrieve information about all the attributes with the following code:

int attrCount = attr.getLength();
if(attrCount>0) {
  System.out.println("Attributes:");
  for(int i = 0 ; i<attrCount ; i++) {
    System.out.println(" Name : " + attr.getQName(i));
    System.out.println(" Type : " + attr.getType(i));
    System.out.println(" Value: " + attr.getValue(i));
  }
}

This is very straightforward. You look for data on attributes only if the value returned by the getLength() method is greater than zero. You then retrieve information about each attribute in the for loop.
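The Attributes interface also supports the lookup by name mentioned earlier. As a brief sketch that goes slightly beyond the index-based methods the chapter describes, the code below uses getValue(String qName), which returns null when the element has no attribute with that qualified name. The class AttributeLookupDemo and the lookup() helper are hypothetical, and the parser is left in its default state, which is not namespace aware.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative class; the name and helper method are not from the chapter.
class AttributeLookupDemo {
  // Returns the value of attrName on the last element called element, or null.
  static String lookup(String xml, String element, String attrName) {
    String[] found = new String[1];
    DefaultHandler handler = new DefaultHandler() {
      @Override
      public void startElement(String uri, String localName, String qName,
                               Attributes attr) {
        if (qName.equals(element)) {
          found[0] = attr.getValue(attrName);   // null if there is no such attribute
        }
      }
    };
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
    } catch (SAXException | IOException | ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
    return found[0];
  }

  public static void main(String[] args) {
    String xml = "<circle radius=\"20\" angle=\"0\"/>";
    System.out.println(lookup(xml, "circle", "radius"));
    System.out.println(lookup(xml, "circle", "colour"));
  }
}
```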
The DefaultHandler class is just like the adapter classes you have used for defining GUI event handlers. All you have to do to implement your own handler is extend the DefaultHandler class and define your own implementations for the methods you are interested in. The same caveat applies here that applied with adapter classes: you must take care that the signatures of your methods are correct. Otherwise, you are simply adding a new method rather than overriding one of the inherited methods. In this case, your program will then do nothing for the given event since the original do-nothing version of the method will execute, rather than your version. Let’s try implementing a handler class.
Try It Out: Handling Parsing Events
Let’s first define a handler class to deal with document parsing events. You’ll just implement a few of the methods from the ContentHandler interface in this, only those that apply to a very simple document, and you won’t worry about errors for the moment. Here’s the code:

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class MySAXHandler extends DefaultHandler {
  public void startDocument() {
    System.out.println("Start document: ");
  }

  public void endDocument() {
    System.out.println("End document: ");
  }

  public void startElement(String uri, String localName, String qname,
                           Attributes attr) {
    System.out.println("Start element: local name: " + localName + " qname: "
                       + qname + " uri: " + uri);
    int attrCount = attr.getLength();
    if(attrCount>0) {
      System.out.println("Attributes:");
      for(int i = 0 ; i<attrCount ; i++) {
        System.out.println(" Name : " + attr.getQName(i));
        System.out.println(" Type : " + attr.getType(i));
        System.out.println(" Value: " + attr.getValue(i));
      }
    }
  }

  public void endElement(String uri, String localName, String qname) {
    System.out.println("End element: local name: " + localName + " qname: "
                       + qname + " uri: " + uri);
  }

  public void characters(char[] ch, int start, int length) {
    System.out.println("Characters: " + new String(ch, start, length));
  }

  public void ignorableWhitespace(char[] ch, int start, int length) {
    System.out.println("Ignorable whitespace: " + new String(ch, start, length));
  }
}

Each handler method just outputs information about the event to the command line.
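The earlier caveat about method signatures can be checked by the compiler if you mark each handler method with the @Override annotation, which is available from Java 5 onward. The sketch below is an illustration rather than the chapter's code: the class OverrideCheckDemo, the nested CountingHandler, and the countElements() helper are invented names. It contains one correctly overridden startElement() and one accidental overload with a missing parameter, which compiles but is never called by the parser.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative class; none of these names come from the chapter.
class OverrideCheckDemo {
  static class CountingHandler extends DefaultHandler {
    int elements;

    // Correct signature: the compiler verifies that this overrides
    // the method inherited from DefaultHandler.
    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attr) {
      elements++;
    }

    // Accidental overload: the Attributes parameter is missing, so this is
    // a new method that the parser never calls. Adding @Override here would
    // turn the mistake into a compile-time error.
    public void startElement(String uri, String localName, String qName) {
      throw new IllegalStateException("never called by the parser");
    }
  }

  // Returns the number of elements the parser reported for xml.
  static int countElements(String xml) {
    CountingHandler handler = new CountingHandler();
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
    } catch (SAXException | IOException | ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
    return handler.elements;
  }

  public static void main(String[] args) {
    System.out.println(countElements("<circle><color/></circle>"));
  }
}
```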
Now you can define a program to use a handler of this class type to parse an XML document. You can make the example read the name of the XML file to be processed from the command line. Here’s the code:

import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import java.io.File;
import java.io.IOException;

public class TrySAXHandler {
  public static void main(String args[]) {
    if(args.length == 0) {
      System.out.println("No file to process. Usage is:"
                         + "\njava TrySAXHandler \"filename\" ");
      return;
    }
    File xmlFile = new File(args[0]);
    process(xmlFile);
  }

  private static void process(File file) {
    SAXParserFactory spf = SAXParserFactory.newInstance();
    SAXParser parser = null;
    spf.setNamespaceAware(true);
    spf.setValidating(true);
    System.out.println("Parser will " + (spf.isNamespaceAware() ? "" : "not ")
                       + "be namespace aware");
    System.out.println("Parser will " + (spf.isValidating() ? "" : "not ")
                       + "validate XML");
    try {
      parser = spf.newSAXParser();
      System.out.println("Parser object is: " + parser);
    } catch(SAXException e) {
      e.printStackTrace(System.err);
      System.exit(1);
    } catch(ParserConfigurationException e) {
      e.printStackTrace(System.err);
      System.exit(1);
    }
    System.out.println("\nStarting parsing of " + file + "\n");
    MySAXHandler handler = new MySAXHandler();
    try {
      parser.parse(file, handler);
    } catch(IOException e) {
      e.printStackTrace(System.err);
    } catch(SAXException e) {
      e.printStackTrace(System.err);
    }
  }
}

I created the circle.xml file with the following content:
java TrySAXHandler "C:/Beg Java Stuff/circle.xml"
On my computer the program produced the following output:

Parser will be namespace aware
Parser will validate XML
Parser object is: com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl@1d8957f

Starting parsing of circle.xml

Start document:
Start element: local name: circle qname: circle uri:
Attributes:
 Name : radius
 Type : CDATA
 Value: 20
 Name : angle
 Type : CDATA
 Value: 0
Characters:
Start element: local name: color qname: color uri:
Attributes:
 Name : R
 Type : CDATA
 Value: 255
 Name : G
 Type : CDATA

Next you call the static process() method with a reference to the File object for the XML document as the argument. This method creates the SAXParser object in the way you’ve seen previously and then creates a handler object of type MySAXHandler for use by the parser. The parsing process is started by calling the parse() method for the parser object, parser, with the file reference as the first argument and the handler reference as the second argument. This identifies the object whose methods will be called for parsing events.
You have overridden six of the do-nothing methods that are inherited from DefaultHandler in the MySAXHandler class and the output indicates which ones are called. Your method implementations just output a message along with the information that is passed as arguments. You can see from the output that there is no URI for a namespace in the document so the value for qname is identical to localName. Note how you form a String object in the characters() method from the specified sequence of elements in the ch array. You must access only the length elements from this array that start with the element ch[start].
The output also shows that the characters() method is sometimes called with just whitespace passed to the method in the ch array. This whitespace is ignorable whitespace that appears between the elements, but the parser is not recognizing it as such. This is because there is no DTD to define how elements are to be constructed in this document, so the parser has no way to know what can be ignored.
You can see that the output shows string values for both a local name and a qname. This is because you have the namespace awareness feature switched on. If you comment out the statement that calls