<element attr1="value1" attr2="value2" ></element>
However, as we noted above, there is also an alternative syntax, whereby we place the closing slash at the end of the opening element:
<element attr1="value1" attr2="value2" />
The following line defines an empty element image, with an attribute src with the value
The syntax of the processing instruction might be strangely familiar to you:
There are two types of entities – internal and external. The replacement text for an internal entity is specified in an entity declaration, whereas the replacement text for an external entity resides in a separate file, the location of which is specified in the entity declaration.
After the entity has been declared, it can be referenced within the document using the following syntax:
The Document Type Definition
The document type definition of an XML document is defined within a declaration known as the document type declaration. The DTD can be contained within this declaration, or the declaration can point to an external document containing the DTD. The DTD consists of element type declarations, attribute list declarations, entity declarations, and notation declarations. We will cover all of these in this section.
Be sure to distinguish between the document type definition, or DTD, and the document type declaration.
The syntax for a document type declaration is:
<!DOCTYPE rootelementname [
<!DOCTYPE rootelementname SYSTEM "http://www.harawat.com/books.dtd">
The rootelementname is the name of the root element of the document. The location of the file containing the DTD is http://www.harawat.com/books.dtd.
Element Type Declarations
The element type declaration indicates whether the element contains other elements, text, or is empty. It also specifies whether the elements are mandatory or optional, and how many times the elements can appear.
An element type declaration, specifying that an element can contain character data, looks as follows:
<!ELEMENT elementname (#PCDATA)>
Here ELEMENT is a keyword, elementname is the name of the element, and #PCDATA is also a keyword. #PCDATA stands for "parsed character data", that is, the data that can be handled by the XML parser.
For example, the following element declaration specifies that the element title contains character data:
<!ELEMENT title (#PCDATA)>
The syntax of an element type declaration for an empty element is:
<!ELEMENT elementname EMPTY>
Here elementname is the name of the element, and EMPTY is a keyword.
For example, the following element type declaration specifies that element image is empty:
<!ELEMENT image EMPTY>
The syntax of an element type declaration for an element that can contain anything – other elements or parsed character data – is as follows:
<!ELEMENT elementname ANY>
Here elementname is the name of the element and ANY is a keyword.
An element type declaration for an element that contains only other elements looks like this:
<!ELEMENT parentelement (childelement1, childelement2, ...)>
Here the element parentelement contains the child elements childelement1, childelement2, etc., in the order listed.
For example, the following element type declaration specifies that the element book contains the elements title, authors, isbn, price:
<!ELEMENT book (title, authors, isbn, price)>
The syntax of an element type declaration specifying that parentelement contains either childelement1 or childelement2, and so on, is:
<!ELEMENT parentelement (childelement1 | childelement2 | ...)>
For example, the following element type declaration specifies that element url can contain either httpurl or ftpurl:
<!ELEMENT url (httpurl | ftpurl)>
The following operators can be used in the element type declaration, to specify the number of allowed instances of elements within the parent element:
Operator   Description
?          Zero or one instance of the element is allowed
*          Zero or more instances of the element are allowed
+          One or more instances of the element are allowed
The following element type declaration specifies that the element authors contains zero or more instances of the element author:
<!ELEMENT authors (author*)>
The following element type declaration specifies that element authors contains one or more instances of element author:
<!ELEMENT authors (author+)>
The following element type declaration specifies that the element toc contains the element chapters and optionally can contain element appendixes:
<!ELEMENT toc (chapters, appendixes?)>
Attribute List Declarations
We saw earlier that an element can have attributes associated with it. The attribute list declaration specifies the attributes which specific elements can take. It also indicates whether the attributes are mandatory or not, the possible values for the attributes, default values, etc.
The syntax of the attribute list declaration is:
<!ATTLIST elementname
          attrname1 datatype1 flag1
          attrname2 datatype2 flag2
>
Here elementname is the name of the element, attrname1 is the name of an attribute, datatype1 specifies the type of information to be passed with the attribute, and flag1 indicates how the default values for the attribute are to be handled.
The possible values for the datatype field depend on the type of the attribute.
Possible values for the flags field are:
Flag        Description
#REQUIRED   This flag indicates that the attribute should be present in all instances of the element. If the attribute is not present in an instance of the element, then the document is not a valid document.
#IMPLIED    This flag indicates that the application can assume a default value for the attribute if the attribute is not specified in an element.
#FIXED      This flag indicates that the attribute can have only one value for all instances of elements in the document.
CDATA Attributes
CDATA attributes can have any character data as their value.
The following attribute list declaration specifies that instances of the element price must have an attribute currency whose value can be any character data:
<!ATTLIST price currency CDATA #REQUIRED>
Enumerated Attributes
Enumerated attributes can take one of the values listed in the declaration.
The following attribute list declaration specifies that instances of the element author can have an attribute gender, with a value of either "male" or "female":
<!ATTLIST author gender (male|female) #IMPLIED>
ID and IDREF Attributes
Attributes of type ID must have a unique value in an XML document. These attributes are used to uniquely identify instances of elements in the document.
The following attribute list declaration specifies that instances of the element employee must have an attribute employeeid, whose value must be unique in the XML document:
<!ATTLIST employee employeeid ID #REQUIRED>
The value of attributes of type IDREF must match the value of an ID attribute on some element in the XML document. Similarly, the values of attributes of type IDREFS must contain whitespace-delimited ID values in the document. Attributes of type IDREF and IDREFS are used to establish links between elements in the document.
The following attribute list declaration is used to establish a link between an employee and his or her manager and subordinates:
<!ATTLIST employee
          employeeid ID #REQUIRED
          managerid IDREF #IMPLIED
          subordinatesid IDREFS #IMPLIED>
Entity Attributes
Entity attributes provide a mechanism for referring to non-XML (binary) data from an XML document. The value of an entity attribute must match the name of an external entity declaration referring to non-XML data.
The following attribute list declaration specifies that the element book can have an entity attribute logo:
<!ATTLIST book logo ENTITY #IMPLIED>
Notation Declarations
Sometimes elements in XML documents might refer to an external file containing data in a format that an XML parser cannot read. Suppose we have an XML document containing the details of a book. We may want to put a reference to a GIF image of the cover along with the details of the book. The XML parser would not be able to process this data, so we need a mechanism to identify a helper application which will process this non-XML data. Notation declarations allow the XML parser to identify helper applications, which can be used to process non-XML data.
A notation declaration provides a name and an external identifier for a type of non-XML (unparsed) data. The external identifier for the notation allows the XML application to locate a helper application capable of processing data in the given notation.
For example, the following notation declaration specifies "file:///usr/bin/netscape" as the helper application for non-XML data of type "gif":
<!NOTATION gif SYSTEM "file:///usr/bin/netscape">
Entity Declarations
Entity declarations define entities which are used within the XML document. Whenever the XML parser encounters an entity reference in the XML document, it replaces it with the contents of the entity as defined in the entity declaration.
Internal entity declarations are in the following format:
<!ENTITY myname "Harish Rawat">
This entity declaration defines an entity myname, with the value "Harish Rawat". The following is an example of an external entity declaration, referring to a file containing XML data:
<!ENTITY description1 SYSTEM "description1.xml">
This entity declaration defines an entity named description1, with "description1.xml" as the system identifier. A "system identifier" is the location of the file containing the data.
When declaring external entities, public identifiers for the entity can also be specified. The XML parser, on encountering the external entity reference, first tries to resolve the reference using the public identifier, and only when that fails does it try to use the system identifier.
In this example, the entity description1 is declared with the public identifier "http://www.harawat.com/description1.xml", and the system identifier "description1.xml":
<!ENTITY description1 PUBLIC "http://www.harawat.com/description1.xml" "description1.xml">
If the file contains non-XML data, the syntax will be:
<!ENTITY booklogo SYSTEM "booklogo.gif" NDATA gif>
This entity declaration defines an entity booklogo, which refers to an external non-XML file booklogo.gif, of notation gif. The notation declaration for gif should appear earlier in the DTD.
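To see several of these declarations working together, the following sketch embeds a small DTD in a document and asks PHP whether the document is valid. It is only an illustration under stated assumptions: it relies on PHP's separate DOM extension (DOMDocument), which is not one of the expat-based functions described in this chapter, and the sample book data, the variable names, and the helper isValidAgainstDtd() are made up for the example.

```php
<?php
// Assumption: the DOM extension (DOMDocument) is installed. It is used here
// only because it can validate a document against its DTD.

// Hypothetical internal DTD subset built from the declarations shown above.
$dtd = <<<'DTD'
<!DOCTYPE book [
  <!ELEMENT book (title, authors, price)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT authors (author+)>
  <!ELEMENT author (#PCDATA)>
  <!ATTLIST author gender (male|female) #IMPLIED>
  <!ELEMENT price (#PCDATA)>
  <!ATTLIST price currency CDATA #REQUIRED>
]>
DTD;

$validBody = '<book><title>Sample</title>'
           . '<authors><author gender="male">Harish Rawat</author></authors>'
           . '<price currency="USD">10</price></book>';

// The same document, but without the #REQUIRED attribute currency.
$invalidBody = str_replace(' currency="USD"', '', $validBody);

// Hypothetical helper: true if $xml is well-formed and valid against its DTD.
function isValidAgainstDtd($xml) {
    // Collect validation messages internally instead of printing warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml) && $doc->validate();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

$validResult   = isValidAgainstDtd($dtd . "\n" . $validBody);
$invalidResult = isValidAgainstDtd($dtd . "\n" . $invalidBody);
```

Both documents are well-formed; only the first satisfies the #REQUIRED flag on the currency attribute, which is the distinction between well-formedness and validity.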
XML Support in PHP
PHP supports a set of functions that can be used for writing PHP-based XML applications. These functions can be used for parsing well-formed XML documents. The XML parser in PHP is a stream-based parser. Before parsing the document, different handlers (or callback functions) are registered with the parser. The XML document is fed to the parser in sections, and as the parser parses the document and recognizes different nodes, it calls the appropriate registered handler. Note that the XML parser does not check for the validity of the XML document. It won't generate any errors or warnings if the document is well-formed but not valid.
The PHP XML extension supports the Unicode character set through different character encodings. There are two types of character encodings, source encoding and target encoding. Source encoding is performed when the XML document is parsed. The default source encoding used by PHP is ISO-8859-1. Target encoding is carried out when PHP passes data to registered handler functions. Target encoding affects character data as well as tag names and processing instruction targets.
If the XML parser encounters characters outside the range that its source encoding is capable of representing, it will return an error. If PHP encounters characters in the parsed XML document that cannot be represented in the chosen target encoding, such characters will be replaced by a question mark.
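The effect of the target encoding can be sketched as follows. This is a minimal illustration, not a definitive recipe: it assumes a PHP version that accepts anonymous functions as handlers, and the sample text and variable names are invented. The input is UTF-8 and contains one character that ISO-8859-1 can represent and one (the euro sign) that it cannot.

```php
<?php
// UTF-8 input containing "é" (U+00E9) and the euro sign (U+20AC), written as
// byte escapes so the example does not depend on this file's own encoding.
$xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><t>caf\xC3\xA9 \xE2\x82\xAC</t>";

$parser = xml_parser_create("UTF-8");

// Ask for handler data in ISO-8859-1, which cannot represent the euro sign.
xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, "ISO-8859-1");

$text = "";
xml_set_character_data_handler($parser, function ($parser, $data) use (&$text) {
    // The handler may be called more than once, so we append.
    $text .= $data;
});

$ok = xml_parse($parser, $xml, true);

// Hexadecimal view of what the handler received, for inspection.
$hex = bin2hex($text);
```

Inspecting $hex shows how each character reached the handler: the "é" arrives as a single ISO-8859-1 byte, and the unrepresentable euro sign arrives as a question mark.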
XML support for PHP is implemented using the expat library. Expat is a library written in C, for parsing XML documents. More information about expat can be found at
3. Read the data from the XML file, and pass the data to the parser. This is where the actual parsing of the data occurs.
4. Free the parser, after the complete file has been parsed.
We will have a quick look at what this means in practice by showing a very simple XML parser (in fact, just about the simplest possible!), before going on to look at the individual functions in turn.
<?php
// First we define the handler functions to inform the parser what action to
// take on encountering a specific type of node.
// We'll just print out element opening and closing tags and character data.

// The handler for element opening tags
function startElementHandler($parser, $name, $attribs) {
    // Escape the angle brackets so that the browser displays the tag
    echo(htmlspecialchars("<$name>") . "<BR>");
}

// The handler for element closing tags
function endElementHandler($parser, $name) {
    echo(htmlspecialchars("</$name>") . "<BR>");
}

// The handler for character data
function cdataHandler($parser, $data) {
    echo("$data<BR>");
}

// Now we create the parser
$parser = xml_parser_create();

// Register the start and end element handlers
xml_set_element_handler($parser, "startElementHandler", "endElementHandler");

// Register the character data handler
xml_set_character_data_handler($parser, "cdataHandler");

// Open the XML file
$file = "test.xml";
if (!($fp = fopen($file, "r"))) {
    die("could not open $file for reading");
}

// Read chunks of 4K from the file, and pass them to the parser
while ($data = fread($fp, 4096)) {
    if (!xml_parse($parser, $data, feof($fp))) {
        die(sprintf("XML error at line %d column %d",
                    xml_get_current_line_number($parser),
                    xml_get_current_column_number($parser)));
    }
}

// Free the parser once the complete file has been parsed
fclose($fp);
xml_parser_free($parser);
?>
This will produce this output in the browser:
Now we'll go on to discuss the functions in detail. In the following sections, all the XML-related functions will be described, along with examples of their use.
Creating an XML Parser
The function xml_parser_create() creates an XML parser context.
int xml_parser_create(string [encoding_parameter]);
Parameter            Optional   Description   Default
encoding_parameter   Yes        The character source encoding that will be used by the parser. The source encoding once set cannot be changed later. The possible values are "ISO-8859-1", "US-ASCII" and "UTF-8".   "ISO-8859-1"
The function returns a handle (a positive integer value) on success, and false on error. The handle returned by xml_parser_create() will be passed as an argument to all the function calls which register handler functions with the parser, or change the options of the parser. We will see these function calls shortly.
We can define multiple parsers in a single PHP script. You may want to do this if you are parsing more than one XML document in the script.
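As a rough sketch of using more than one parser in a script, the example below creates two independent parsers, each counting the opening tags of its own document. The helper makeCountingParser(), the variable names, and the sample documents are hypothetical, and the anonymous functions assume a PHP version that accepts them as handlers.

```php
<?php
// Hypothetical helper: builds a parser that counts opening tags into $count.
function makeCountingParser(&$count) {
    $parser = xml_parser_create("UTF-8");
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attribs) use (&$count) { $count++; },
        function ($parser, $name) { /* nothing to do for closing tags */ }
    );
    return $parser;
}

$booksCount = 0;
$authorsCount = 0;

// Each call returns a separate parser with its own handlers and state.
$booksParser = makeCountingParser($booksCount);
$authorsParser = makeCountingParser($authorsCount);

// Each parser is fed a different (made-up) document.
xml_parse($booksParser, "<books><book/><book/></books>", true);
xml_parse($authorsParser, "<authors><author/></authors>", true);
```

Because every handler call receives the parser handle as its first argument, the two parsers do not interfere with each other even though they share the same handler code.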
Registering Handler functions
Before we can parse an XML document, we need to write functions which will handle the various nodes of the XML document. For example, we need to write a function which will handle the opening tag of XML elements, and another which will handle the closing tags. We also need to assign handlers for character data, processing instructions, etc. These handlers must be registered with the XML parser before the document can be parsed.
Registering Element Handlers
The function xml_set_element_handler() registers "start" and "end" element handler functions with the XML parser. Its syntax is:
int xml_set_element_handler(int parser, string startElementHandler, string endElementHandler);
Parameter             Optional   Description
parser                No         The handle of an XML parser, with which the start and end element handlers are registered.
startElementHandler   No         The name of the start element handler function. If null is specified then no start element handler is registered.
endElementHandler     No         The name of the end element handler function. If null is specified then no end element handler is registered.
The function returns true on success, or false if the call fails. The function call will return false if parser is not a valid parser handle.
The registered handler functions startElementHandler and endElementHandler should exist when an XML document is parsed; if they do not, a warning will be generated.
Start Element Handler
The user-defined start element handler function, registered with the parser through an xml_set_element_handler() function call, will be called when the parser encounters the opening tag of an element in the document. The function must be defined with the following syntax:
startElementHandler(int parser, string name, string attribs[]);
Parameter   Optional?   Description
parser      No          Reference to the XML parser which is calling this function.
name        No          The name of the element.
attribs     No          An associative array containing the attributes of the element.
For example, suppose we are parsing the following line of an XML document:
<author gender="male" age="24">Harish Rawat</author>
The XML parser will call our registered start element handler function with the following parameters:
startElementHandler($parser, "author", array("gender"=>"male", "age"=>"24"));
End Element Handler
The user-defined end element handler function, registered with the parser through an xml_set_element_handler() function call, will be called when the parser encounters an end tag of an element in the document. This function should have the following syntax:
endElementHandler(int parser, string name);
Parameter   Optional   Description
parser      No         Reference to the XML parser which is calling this function.
name        No         The name of the element.
For example, if we parse the following line of an XML document:
<author sex="male" age="24">Harish Rawat</author>
The registered end element handler function will be called with the following parameters:
endElementHandler($parser, "author");
Notice that the value of the name parameter is "author" and not "/author".
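The start and end element handlers are typically used together to track where we are in the document. The following sketch builds an indented outline of element names and attributes. The sample document, the variable names, and the use of anonymous functions (which assume a PHP version that supports them as handlers) are our own choices for illustration; case folding is switched off so that names arrive as written.

```php
<?php
// Made-up sample document.
$xml = '<book><title>XML</title><authors>'
     . '<author gender="male" age="24">Harish Rawat</author></authors></book>';

$lines = array();   // the outline, one entry per element
$depth = 0;         // current nesting level

$start = function ($parser, $name, $attribs) use (&$lines, &$depth) {
    // Turn the associative attribute array into "name=value" pairs.
    $pairs = array();
    foreach ($attribs as $key => $value) {
        $pairs[] = "$key=$value";
    }
    $lines[] = str_repeat("  ", $depth) . $name
             . ($pairs ? " [" . implode(", ", $pairs) . "]" : "");
    $depth++;
};

$end = function ($parser, $name) use (&$depth) {
    // $name is the bare element name, without a leading slash.
    $depth--;
};

$parser = xml_parser_create();
// Keep names as written; by default they would be upper-cased.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($parser, $start, $end);
xml_parse($parser, $xml, true);
```

After parsing, $lines holds one entry per element, indented by two spaces per nesting level.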
The Character Data Handler
The function xml_set_character_data_handler() registers the character data handler with the XML parser. The character data handler is called by the parser for all non-markup contents of the XML document:
int xml_set_character_data_handler (int parser, string characterDataHandler);
Parameter              Optional   Description
parser                 No         The handle of an XML parser, with which the character data handler is registered.
characterDataHandler   No         The name of the character data handler function. If null is specified then no character data handler is registered.
The function returns true on success; otherwise, false is returned. The function will return false if the parser is not a valid parser handle.
The registered handler function should exist when the XML document is parsed; otherwise, an error is generated.
Prototype for the Character Data Handler
The user-defined character data handler function, registered with the parser through a call to the xml_set_character_data_handler() function, will be called when the parser encounters non-markup content in the XML document and should have the following syntax:
characterDataHandler(int parser, string data);
Parameter   Optional   Description
parser      No         Reference to the XML parser which is calling this function.
data        No         The character data present in the XML document. The parser returns the character data as it is, and does not remove any white space.
While parsing the contents of an element, the character data handler can be called any number of times. This should be kept in mind while defining the character data handler function.
For example, while parsing the following line of an XML document:
<author sex="male" age="24">Harish Rawat</author>
The character data handler can be called once with the following parameters:
characterDataHandler($parser, "Harish Rawat");
Or it can be called twice; firstly as:
characterDataHandler($parser, "Harish ");
And again as:
characterDataHandler($parser, "Rawat");
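Because the text of one element may arrive in several pieces, a common approach is to append each piece to a buffer and only use the buffer when the closing tag is seen. The sketch below does this; the second author name, the variable names, and the chunk size of 7 bytes are arbitrary choices made for illustration, and the anonymous functions assume a PHP version that accepts them as handlers.

```php
<?php
// Made-up sample document ("A. N. Other" is a placeholder name).
$xml = '<authors><author>Harish Rawat</author><author>A. N. Other</author></authors>';

$buffer = "";        // text collected for the element currently open
$authors = array();  // completed author names

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    function ($parser, $name, $attribs) use (&$buffer) {
        if ($name === "author") {
            $buffer = "";             // start collecting afresh
        }
    },
    function ($parser, $name) use (&$buffer, &$authors) {
        if ($name === "author") {
            $authors[] = $buffer;     // the text is complete only now
        }
    }
);

xml_set_character_data_handler($parser, function ($parser, $data) use (&$buffer) {
    $buffer .= $data;                 // never assume a single call
});

// Feed the document in small pieces, which makes split text more likely.
$chunks = str_split($xml, 7);
$last = count($chunks) - 1;
foreach ($chunks as $i => $chunk) {
    xml_parse($parser, $chunk, $i === $last);
}
```

Whether the handler is called once or several times per element, $authors ends up with the complete names.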
The Processing Instruction Handler
The function xml_set_processing_instruction_handler() registers with the XML parser the function that will be used to handle processing instructions. The processing instruction handler is called by the parser when it encounters a processing instruction in the XML document:
int xml_set_processing_instruction_handler(int parser, string processingInstructionHandler);
Parameter                      Optional   Description
parser                         No         The handle of an XML parser with which the processing instruction handler is registered.
processingInstructionHandler   No         The name of the processing instruction handler function. If null is specified then no processing instruction handler is registered.
The function returns true on success or false on failure. The function will return false if parser is not a valid parser handle.
The registered handler function should exist when an XML document is parsed, or an error is generated.
Processing instructions, as we saw in the section on the XML Language, are application-specific instructions embedded in the XML document. This is similar to the way we embed PHP instructions in an HTML file.
Prototype for the Processing Instruction Handler
The user-defined processing instruction handler function, registered with the parser through the xml_set_processing_instruction_handler() function, will be called when the parser encounters processing instructions in the XML document and should have the following syntax:
processingInstructionHandler(int parser, string target, string data);
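Parameter   Optional   Description
parser      No         Reference to the XML parser which is calling this function.
target      No         The target of the processing instruction.
data        No         The data of the processing instruction, as passed by the parser.
For example, if we are parsing the following processing instruction in an XML document: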
<?php print "This document was created on Jan 01, 1999";?>
The processing instruction handler will be called with the following parameters:
processingInstructionHandler($parser, "php", "print \"This document was created on Jan 01, 1999\";");
A sample processing instruction handler might look like this:
function piHandler($parser, $target, $data) {
    if (strcmp(strtolower($target), "php") == 0) {
        eval($data);
    }
}
If you are defining such a processing instruction handler in your application, then you should do some security checks before executing the code. The code embedded through processing instructions can be malicious – for example, it could delete all the files on the server.
One security check could be to execute the code in the processing instructions only if the owner of the XML file and the XML parser are the same.
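A different precaution, sketched below, is not to execute processing instruction data at all, but to dispatch on a list of targets we trust and to treat the data as plain text. This replaces the eval() approach rather than implementing the ownership check just described. The "stamp" target, the $allowedTargets array, and the other names are hypothetical, and the anonymous functions assume a PHP version that supports them.

```php
<?php
// Hypothetical whitelist: PI targets mapped to functions we trust.
$allowedTargets = array(
    "stamp" => function ($data) { return "Created on " . trim($data); },
);

$output = array();    // results produced by trusted targets
$skipped = array();   // targets we refused to act on

$piHandler = function ($parser, $target, $data)
        use ($allowedTargets, &$output, &$skipped) {
    $key = strtolower($target);
    if (isset($allowedTargets[$key])) {
        // The instruction data is treated as plain text, never as code.
        $output[] = $allowedTargets[$key]($data);
    } else {
        $skipped[] = $target;
    }
};

// Made-up document with one trusted and one untrusted instruction.
$xml = '<doc><?stamp Jan 01, 1999?><?php print "hello";?></doc>';

$parser = xml_parser_create();
xml_set_processing_instruction_handler($parser, $piHandler);
xml_parse($parser, $xml, true);
```

Here the php instruction is recorded in $skipped and never run, so a document from an untrusted source cannot cause arbitrary code to execute.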
The Notation Declaration Handler
The function xml_set_notation_decl_handler() registers the notation declaration handler with the parser. The notation declaration handler is called by the parser whenever it encounters a notation declaration in the XML document.
int xml_set_notation_decl_handler(int parser, string notationDeclarationHandler);
Parameter                    Optional   Description
parser                       No         The handle of an XML parser with which the notation declaration handler is registered.
notationDeclarationHandler   No         The name of the notation declaration handler function. If null is specified then no notation declaration handler is registered.
The function returns true on success; otherwise, false is returned. The function will return false if parser is not a valid parser handle.
An error will be generated when the XML document is parsed if the notation declaration handler does not exist.
Prototype for the Notation Declaration Handler
The user-defined notation declaration handler function, registered with the parser through a call to the xml_set_notation_decl_handler() function, will be called when the parser encounters notation declarations in the XML document and should have the syntax:
notationDeclarationHandler(int parser, string notationName, string base, string systemId, string publicId);
Parameter      Optional   Description
parser         No         Reference to the XML parser which is calling this function.
notationName   No         The name of the notation.
base           No         The base for resolving systemId. Currently the value of this parameter will always be a null string.
systemId       No         The system identifier of the notation declaration.
publicId       No         The public identifier of the notation declaration.
For example, parsing the following notation declaration of an XML document:
<!NOTATION gif SYSTEM "file:///usr/bin/netscape">
Will cause the notation declaration handler to be called with the following parameters:
notationDeclarationHandler($parser, "gif", "", "file:///usr/bin/netscape", "");
Let's implement a sample notation declaration handler. This handler populates the associative array $helperApps with a mapping between the notation name and the name of the application that will handle the unparsed data of type $notationName. The $helperApps array can be used by the unparsed entity declaration handler to identify the application that should be used to process non-XML data. We will look at the unparsed entity declaration handler shortly.
function notationHandler($parser, $notationName, $base, $systemId, $publicId) {
    global $helperApps;
    // Map the notation name to its helper application
    $helperApps[$notationName] = $systemId;
}
The External Entity Reference Handler
The function xml_set_external_entity_ref_handler() registers the external entity reference handler with the XML parser. This handler is called by the parser when it encounters an external entity reference in an XML document. Note that the registered handler is called for external entity references and not external entity declarations.
Unlike other parsers (such as Microsoft Internet Explorer 5), the XML parser of PHP does not handle external entities. It simply calls the registered external entity reference handler to handle them.
int xml_set_external_entity_ref_handler(int parser, string externalEntityRefHandler);
Parameter                 Optional   Description
parser                    No         The handle for an XML parser with which the external entity reference handler is registered.
externalEntityRefHandler  No         The name of the external entity reference handler function. If null is specified then no external entity reference handler is registered.
The function returns true on success; otherwise, false is returned. The function will return false if parser is not a valid parser handle.
The registered handler function should exist when parsing an XML document, or an error will be generated.
Prototype for the External Entity Reference Handler
The user-defined external entity reference handler function, registered with the parser through an xml_set_external_entity_ref_handler() function call, will be called when the parser encounters external entity references in the XML document. This should have the following syntax:
int externalEntityRefHandler(int parser, string entityName, string base, string systemId, string publicId);
Parameter    Optional   Description
parser       No         Reference to the XML parser which is calling this function.
entityName   No         The name of the external entity.
base         No         The base for resolving systemId. Currently the value of this parameter will always be a null string.
systemId     No         The system identifier of the external entity.
publicId     No         The public identifier of the external entity.
The user-defined external entity reference handler should handle the external references in the XML document. If a true value is returned by the handler, the parser assumes that the external reference was successfully handled and the parsing continues. If the handler returns false, the parser will stop parsing.
As an example, suppose that an entity book_1861002777 is defined in the DTD of an XML document:
<!ENTITY book_1861002777 SYSTEM "1861002777.xml">
And the parser comes across the following line in an XML document:
&book_1861002777;
The external entity reference handler will be called with the parameters:
externalEntityRefHandler($parser, "book_1861002777", "", "1861002777.xml", "");
The Unparsed Entity Declaration Handler
The function xml_set_unparsed_entity_decl_handler() registers the unparsed entity declaration handler with the XML parser. The unparsed entity declaration handler is called by the parser when it encounters an unparsed entity declaration in an XML document.
int xml_set_unparsed_entity_decl_handler(int parser, string unparsedEntityDeclHandler);
Parameter                   Optional   Description
parser                      No         The handle of an XML parser with which the unparsed entity declaration handler is registered.
unparsedEntityDeclHandler   No         The name of the unparsed entity declaration handler function. If null is specified then no unparsed entity declaration handler is registered.
The function returns true on success or false on failure. The function returns false if parser is not a valid parser handle.
The registered handler function should exist when an XML document is parsed, or an error will be generated.
Prototype for the Unparsed Entity Declaration Handler
The user-defined unparsed entity declaration handler function, registered with the parser through an xml_set_unparsed_entity_decl_handler() function call, will be called when the parser encounters an unparsed entity declaration in the XML document. Its syntax is:
unparsedEntityDeclHandler(int parser, string entityName, string base, string systemId, string publicId, string notationName);
Parameter      Optional   Description
parser         No         Reference to the XML parser which is calling this function.
entityName     No         The name of the unparsed entity.
base           No         The base for resolving systemId. Currently the value of this parameter will always be a null string.
systemId       No         The system identifier of the unparsed entity.
publicId       No         The public identifier of the unparsed entity.
notationName   No         The name of the notation (defined in an earlier notation declaration), identifying the type of unparsed data.
For example, if the parser encounters the following line (in the DTD) of an XML document:
<!ENTITY book_gif_1861002777 SYSTEM "1861002777.gif" NDATA gif>
The unparsed entity declaration handler will be called with the following parameters:
unparsedEntityDeclHandler($parser, "book_gif_1861002777", "", "1861002777.gif", "", "gif");
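The notation declaration handler and the unparsed entity declaration handler are meant to work together: the first records which helper application belongs to each notation, and the second records which notation each non-XML file uses. The following sketch registers both. The $helperApps array follows the earlier description; the $unparsed array, the sample DTD, and the anonymous functions (which assume a PHP version that accepts them as handlers) are assumptions made for illustration. The values passed for base and publicId are deliberately ignored here, since they can differ between PHP builds.

```php
<?php
// Made-up document whose DTD declares one notation and one unparsed entity.
$xml = <<<'XML'
<?xml version="1.0"?>
<!DOCTYPE book [
  <!ELEMENT book EMPTY>
  <!NOTATION gif SYSTEM "file:///usr/bin/netscape">
  <!ENTITY book_gif_1861002777 SYSTEM "1861002777.gif" NDATA gif>
]>
<book/>
XML;

$helperApps = array();   // notation name => helper application
$unparsed = array();     // entity name => array(file, notation name)

$parser = xml_parser_create();

xml_set_notation_decl_handler(
    $parser,
    function ($parser, $notationName, $base, $systemId, $publicId)
            use (&$helperApps) {
        $helperApps[$notationName] = $systemId;
    }
);

xml_set_unparsed_entity_decl_handler(
    $parser,
    function ($parser, $entityName, $base, $systemId, $publicId, $notationName)
            use (&$unparsed) {
        $unparsed[$entityName] = array($systemId, $notationName);
    }
);

xml_parse($parser, $xml, true);
```

An application could then look up the helper for a given entity with $helperApps[$unparsed["book_gif_1861002777"][1]].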
The Default Handler
The function xml_set_default_handler() registers the default handler with the XML parser. The default handler is called by the parser for all the nodes of the XML document for which handlers cannot be registered (such as the XML version declaration, DTD declaration and comments). The default handler is also called for any other nodes for which handlers are not registered with the parser. For example, if the start and end element handlers are not registered with the parser, the parser will call the default handler (if registered) whenever it encounters element opening and closing tags in the XML document.
int xml_set_default_handler(int parser, string defaultHandler);
Parameter        Optional   Description
parser           No         The handle of an XML parser with which the default handler is registered.
defaultHandler   No         The name of the default handler function. If null is specified then no default handler is registered.
The function returns true on success, or false on error (e.g. if parser is not a valid parser handle).
The registered handler function should exist when parsing an XML document, or an error is generated.
Prototype for the Default Handler
The user-defined default handler gets called by the parser for all the nodes in the XML document for which handler functions are not registered. It should have the following syntax:
defaultHandler(int parser, string data);
Parameter   Optional   Description
parser      No         Reference to the XML parser which is calling this function.
data        No         The part of the XML document for which there is no registered handler.
For example, if the start and end element handlers are not registered with the parser and the parser encounters this line in an XML document:
<author sex="male" age="24">Harish Rawat</author>
The default handler will be called with the following values of function parameters:
defaultHandler($parser, "<author sex=\"male\" age=\"24\">");
Notice that the entire opening and closing tags of the element are passed as they are.
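To watch the default handler at work, we can register it as the only handler and record everything it is given. This is a minimal sketch with made-up variable names, using an anonymous function (which assumes a PHP version that accepts one as a handler). How the data is divided between calls may depend on the underlying parser library, so the sketch only joins the pieces together.

```php
<?php
$xml = '<author sex="male" age="24">Harish Rawat</author>';

$seen = array();   // every piece of data handed to the default handler

$parser = xml_parser_create();

// No element or character data handlers are registered, so the tags and
// the text of this document are reported to the default handler instead.
xml_set_default_handler($parser, function ($parser, $data) use (&$seen) {
    $seen[] = $data;
});

xml_parse($parser, $xml, true);

// Join the pieces; the number of calls is not something to rely on.
$all = implode("", $seen);
```

Printing $seen with print_r() is a quick way to find out which nodes reach the default handler on a particular installation.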
Parsing the XML Document
The xml_parse() function passes the contents of the XML document to the parser. This function accomplishes the actual parsing of the document – it calls the appropriate registered handlers as and when it encounters nodes in the document.
This function is called after all the handler functions for the various node types in the XML document have been registered with the parser.
int xml_parse(int parser, string data, int [isFinal]);
Parameter Optional Description Default
parser, which will parse the supplied data
document The complete contents of the XML file need not be passed in onecall
isFinal Yes Specifies the end of input
data
false
The function returns true if it was able to parse the data passed to it; otherwise, false is returned The error information in case of failure can be found with the xml_get_error_code() and xml_get_error_string() functions We shall look at these functions presently
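The following code fragment illustrates the use of the xml_parse() function:
// Open the XML file
if (!($fp = fopen($file, "r"))) {
    die("Could not open $file for reading");
}
// Read the file in 4K chunks and pass each chunk to the parser
while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error at line %d column %d",
            xml_get_current_line_number($xml_parser),
            xml_get_current_column_number($xml_parser)));
    }
}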
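To see the role of the isFinal argument without needing a file on disk, here is a small sketch that feeds an in-memory document to the parser in pieces. The sample document, the 16-byte chunk size and the handler names countStart and skipEnd are assumptions made for this illustration.

```php
<?php
// Hypothetical in-memory document, cut into small chunks to mimic fread().
$xml = '<books><book>PHP</book><book>XML</book></books>';
$chunks = str_split($xml, 16);

$elementCount = 0;

// Counts every opening tag the parser reports.
function countStart($parser, $name, $attrs) {
    global $elementCount;
    $elementCount++;
}

// Closing tags are not interesting here.
function skipEnd($parser, $name) {
}

$parser = xml_parser_create();
xml_set_element_handler($parser, "countStart", "skipEnd");

$last = count($chunks) - 1;
foreach ($chunks as $i => $chunk) {
    // The third argument is true only for the final chunk.
    if (!xml_parse($parser, $chunk, $i === $last)) {
        die(sprintf("XML error at line %d column %d",
            xml_get_current_line_number($parser),
            xml_get_current_column_number($parser)));
    }
}

echo "Elements seen: $elementCount\n";
```

Tags that straddle a chunk boundary are handled by the parser itself, which is why the chunks can be cut at arbitrary byte positions.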
Freeing the Parser
The function xml_parser_free() frees the XML parser which was created with the xml_parser_create() function. All the resources associated with the parser are freed. The XML parser should be freed after a complete XML document has been parsed, or if an error occurs while parsing a document.
int xml_parser_free(int parser);
Parameter  Optional  Description
parser     No        The handle of the XML parser which is to be freed
The function returns true if the parser was freed, otherwise false.
Parser Options
There are two options for the parser. We can set values for these options using the xml_parser_set_option() function, and retrieve the current value with the xml_parser_get_option() function.
These options are:
Option                      Data Type  Description                                                                                                                                    Default
XML_OPTION_CASE_FOLDING     Integer    If the value of the option is true, then the element names (start and end tags) will be upper-cased when the registered handlers are called  true
XML_OPTION_TARGET_ENCODING  String     The value of this option specifies the target encoding used by the parser when it invokes registered handlers                                 Same as the source encoding value, specified when the parser was created
xml_parser_set_option
The xml_parser_set_option() function sets the option specified in the option argument to the value in the value argument for the parser associated with the parser handle specified by the parser argument.
int xml_parser_set_option(int parser, int option, mixed value);
The function returns true if the new option was set; if the call failed, false is returned.
The function xml_parser_set_option() can be called at any point in the PHP program. The new option will take effect for any data that is parsed after the option has been set.
xml_parser_get_option
The xml_parser_get_option() function retrieves the value for the option specified by the option argument for the parser specified by the parser argument.
mixed xml_parser_get_option(int parser, int option);
This function returns the value of the option (the data type of the return value therefore depends on the option). If either the parser or the option argument is invalid then false is returned.
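The effect of XML_OPTION_CASE_FOLDING is easiest to see by parsing the same document twice. The sketch below does that; the helper elementNames() and the handler names are hypothetical and exist only for this illustration.

```php
<?php
$seenNames = array();

// Records each element name exactly as the parser reports it.
function rememberName($parser, $name, $attrs) {
    global $seenNames;
    $seenNames[] = $name;
}

function forgetEnd($parser, $name) {
}

// Hypothetical helper: parse $xml with case folding on or off
// and return the element names passed to the start handler.
function elementNames($xml, $caseFolding) {
    global $seenNames;
    $seenNames = array();
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding);
    xml_set_element_handler($parser, "rememberName", "forgetEnd");
    xml_parse($parser, $xml, true);
    return $seenNames;
}

$xml = '<book><title>PHP</title></book>';
$folded   = elementNames($xml, true);   // names arrive upper-cased
$unfolded = elementNames($xml, false);  // names arrive as written

// A freshly created parser has case folding switched on.
$fresh = xml_parser_create();
$default = (bool) xml_parser_get_option($fresh, XML_OPTION_CASE_FOLDING);

echo implode(",", $folded), "\n";
echo implode(",", $unfolded), "\n";
```

The cast to bool is there because the type returned for this option differs between PHP versions.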
Utility Functions
The remaining functions provide useful information or services that we may need when parsing an XML document. These functions provide information about any errors which occurred and about the current position in the XML document. There are also functions for encoding and decoding text.
xml_get_error_code
The function xml_get_error_code() returns the error code from the XML parser.
int xml_get_error_code(int parser);
Parameter  Optional  Description
parser     No        The handle of an XML parser
This function can be called after xml_parse() has returned false to find out the exact reason why the parsing of the passed data failed. The function returns false if the parser is not a valid XML parser.
xml_error_string
The xml_error_string() function returns the error message corresponding to an error code.
string xml_error_string(int errorCode);
Parameter  Optional  Description
errorCode  No        An error code returned by the xml_get_error_code() function
xml_get_current_line_number
int xml_get_current_line_number(int parser);
Parameter  Optional  Description
parser     No        The handle of an XML parser
This function returns the line number of the XML document that the parser is currently parsing. If parser is not a valid parser, false is returned. This function can be used to print the line number (for debugging purposes) when a call to the xml_parse() function returns false.
xml_get_current_column_number
The xml_get_current_column_number() function is similar to xml_get_current_line_number(); the only difference is that it returns the number of the current column in the line that the parser is parsing.
int xml_get_current_column_number(int parser);
The functions xml_get_current_line_number() and xml_get_current_column_number() can be used together when reporting parse errors in the XML document to give the user the exact location where the error occurred:
if (!xml_parse($parser, $data)) {
    // If error in parsing $data, then print line number and column number
    // of the XML file
    die(sprintf("Error in XML document at line %d column %d\n",
        xml_get_current_line_number($parser),
        xml_get_current_column_number($parser)));
}
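The error functions can also be combined into a single message. The following sketch assumes a hypothetical helper, describeParseError(), which returns a description of the failure instead of calling die(), so that the result can be inspected.

```php
<?php
// Hypothetical helper: returns a readable message when $xml cannot be
// parsed, or null when the parse succeeds.
function describeParseError($xml) {
    $parser = xml_parser_create();
    if (xml_parse($parser, $xml, true)) {
        return null;
    }
    $code = xml_get_error_code($parser);
    return sprintf("%s (code %d) at line %d column %d",
        xml_error_string($code),
        $code,
        xml_get_current_line_number($parser),
        xml_get_current_column_number($parser));
}

$good = describeParseError("<book><title>PHP</title></book>");
$bad  = describeParseError("<book><title>PHP</book>");  // mismatched tags

var_dump($good);
echo $bad, "\n";
```

The exact wording of the message and the reported column depend on the underlying XML library, so no particular output is assumed here.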
utf8_decode
The function utf8_decode() converts a UTF-8 encoded string to ISO-8859-1 encoding:
string utf8_decode(string data);
Parameter  Optional  Description
data       No        A UTF-8 encoded string
This function returns an ISO-8859-1 string corresponding to data.
utf8_encode
The utf8_encode() function converts an ISO-8859-1 encoded string to UTF-8 encoding:
string utf8_encode(string data);
Parameter  Optional  Description
data       No        An ISO-8859-1 encoded string
The function returns a UTF-8 string corresponding to data.
Web-Enabling Enterprise Applications
XML is now being used as the format for exchanging data between different business-to-business applications. Industry-standard Document Type Definitions are being defined to describe orders, transactions, inventory, billing etc. PHP can be used to provide the web-based front end for these business-to-business applications.
Smart Searches
PHP can be used to search XML documents. For example, if all the articles in a web site are written using the same DTD, which defines elements for author, title, abstract etc., then PHP can be used to search for the articles depending on author, title etc.
Converting XML to HTML
Currently there are very few browsers which have built-in XML support. PHP can be used on the server side to parse XML documents and return pure HTML to the client. This technique would allow all browsers, irrespective of their level of XML support, to view XML documents.
Additionally, the PHP script could send either the XML document or the converted HTML to the client, depending on the browser that has sent the request. For example, if the request for an XML file was sent from Internet Explorer 5 then the PHP script could simply return the XML file; otherwise, the script would convert the XML data into HTML and send an HTML page to the browser.
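A conversion of this kind can be sketched with the handler functions covered earlier. In the example below, the mapping from element names to HTML tags, the sample document and the handler names are all assumptions chosen for illustration; a real site would use whatever mapping suits its own DTD.

```php
<?php
// Hypothetical mapping from XML element names to HTML tags.
$htmlTagFor = array(
    "listofbooks" => "ul",
    "book"        => "li",
    "title"       => "b",
    "author"      => "i",
);
$html = "";

// Emit the opening HTML tag for elements that have a mapping.
function openHtmlTag($parser, $name, $attrs) {
    global $htmlTagFor, $html;
    if (isset($htmlTagFor[$name])) {
        $html .= "<" . $htmlTagFor[$name] . ">";
    }
}

// Emit the matching closing HTML tag.
function closeHtmlTag($parser, $name) {
    global $htmlTagFor, $html;
    if (isset($htmlTagFor[$name])) {
        $html .= "</" . $htmlTagFor[$name] . ">";
    }
}

// Copy text through, escaping characters that are special in HTML.
function copyText($parser, $data) {
    global $html;
    $html .= htmlspecialchars($data);
}

$xml = '<listofbooks><book><title>Professional PHP</title>'
     . '<author>H. Rawat</author></book></listofbooks>';

$parser = xml_parser_create();
// Keep element names in their original case so the mapping keys match.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler($parser, "openHtmlTag", "closeHtmlTag");
xml_set_character_data_handler($parser, "copyText");
xml_parse($parser, $xml, true);

echo $html, "\n";
```

Elements without an entry in the mapping are dropped while their text is kept, which is one simple way of presenting a reduced view of the same data.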
Different Views of the Same Data
PHP can be used to present different views of the same XML document, by deleting or modifying nodes within the document.
A Sample PHP XML Application
To give a simple example of what can be done with XML and PHP, we will implement a simple "Book Information Site". This site will allow users to search books using the title, author or ISBN of the book as the search criterion. Alternatively, the user can view the complete list of books. After the user has searched for all books of interest, he or she can view the table of contents of the book.
All the information about the books will be stored in an XML file.
Before looking at the code of the application, let us first look at the user interface of the application to get a feel for it.
The main page of the application allows the user to search books using the title, author or ISBN of the book as the search criterion; alternatively, users can view the complete list of books:
To search for a book, the user enters the search keyword in the Search Keyword text input box, and specifies the search category using the Search Books By list box. In the above figure, the search keyword is 'Java', and the search will be by title, so the application will search for all books with the word "Java" in the title.
To view all the books in the file, the user can click on the Complete list of books button.
The results of the search are shown in the figure below. Clicking on the title of a book will present the user with the table of contents for that book:
The other option available from the main page is to view the complete list of all the books. Again, the title of the book acts as a link which the user can click to view the book's table of contents:
This figure displays the table of contents of the book selected by the user from the Complete List of Books or the Search Page.
Now that we have looked at the screen shots of the application and understand its functionality, it is the right time to look at the code.
The book details are stored in XML files in order to enable smart searches, which form an important requirement of the application. A relational database could have been used for storing the data, but it would not have made sense to install a relational database for this simple application. Another important feature of XML is that it stores data in plain text files, so the data can easily be exchanged between applications, even on different platforms. It is even human-readable.
The XML file books.xml stores the book details. The following is a sample books.xml file containing details of three books. First we have the DTD:
<?xml version="1.0"?>
<!DOCTYPE listofbooks[
<!ELEMENT book (title, authors, isbn, price, toc)*>
<!ELEMENT title (#PCDATA)>
<!ELEMENT authors (author*)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT isbn (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT toc (chapters, appendices)>
<!ELEMENT chapters (chapter)*>
<!ELEMENT chapter (#PCDATA)>
<!ELEMENT appendices (appendix)*>
<!ELEMENT appendix (#PCDATA)>
<!ATTLIST price currency CDATA #REQUIRED>
<!ENTITY book_1861002947 SYSTEM "1861002947.xml">
<!ENTITY book_1861002971 SYSTEM "1861002971.xml">
<!ENTITY book_1861002777 SYSTEM "1861002777.xml">
]>
Then comes the data itself. The root element which encloses the data is named <listofbooks>. Each book is represented by a <book> element, which contains child elements for each item of information stored about the book, such as <title>, <isbn> and <price>. Since there may be more than one author for a book, there is an <authors> element for each book, which has a child <author> element for each author. The table of contents for the book is stored in another XML file, so we use an external entity reference to refer to that:
In this case, the root element is named <toc>, and this contains two elements, <chapters> and <appendices>. These contain child <chapter> and <appendix> elements respectively:
<chapter>Error Handling and Event Logging</chapter>
<chapter>Sessions and Session Tracking</chapter>
<chapter>Using the Servlet Context</chapter>
<chapter>Dynamic Content Generation</chapter>
<chapter>Introduction to JavaServer Pages</chapter>
<chapter>Connecting to Databases</chapter>
<chapter>Connection Pooling</chapter>
<chapter>Servlet Chaining</chapter>
<chapter>Servlet Communications</chapter>
<chapter>Distributed Computing with Servlets</chapter>
<chapter>JavaMail and Servlets</chapter>
<chapter>Introducing XML</chapter>
<chapter>Weeds of El-Limon 2</chapter>
<chapter>Bug Tracker Case Study</chapter>
<chapter>
Bug Tracker Case Study: Elaboration, Construction and Transition </chapter>
<chapter>Moving from CGI to Servlets</chapter>
<chapter>Internationalizing Web Sites</chapter>
<chapter>Smart Servlets</chapter>
<chapter>Server Programming with JNDI</chapter>
<chapter>Using LDAP and Java</chapter>
<chapter>Enterprise JavaBeans</chapter>
<chapter>Indexing and Searching</chapter>
<chapter>JINI and JavaSpaces: Servers of the Future</chapter>
<chapter>Working With JavaSpaces</chapter>
<chapter>Coding a Jini-based Website</chapter>
</chapters>
<appendices>
<appendix>HTTP</appendix>
<appendix>Java Object Streams and Serialization</appendix>
<appendix>The LogWriter Class</appendix>
<appendix>UML Tutorial</appendix>
<appendix>JServ Configuration</appendix>
<appendix>ServletRunner and Java Web Server Configuration</appendix>
<appendix>JRun Configuration</appendix>
<appendix>JSDK API Reference</appendix>
<appendix>JavaServer Pages API Reference</appendix>
<appendix>JNDI API Reference</appendix>
<appendix>Core JavaMail / JAF API Reference</appendix>
<appendix>Core Jini API Reference</appendix>
<appendix>JavaSpaces API Reference</appendix>
<FORM ACTION="search_books.php" METHOD=GET>
The file common.php contains common functions, which are used throughout the application. First we define some variables which we'll be using in all the PHP pages: the XML file which contains the data, a variable called $currentTag to hold the name of the element that is currently being parsed, and a number of variables to store the details for the book element that the parser is currently parsing:
<?php
$file = "books.xml";
$currentTag = "";
$titleValue = "";         // Value of the title element
$authorsValue = array();  // Array of the values of the author elements
$isbnValue = "";          // Value of the isbn element
$priceValue = "";         // Value of the price element
$currencyValue = "";      // Value of the price element's currency attribute
$descriptionValue = "";   // Value of the description entity reference
$authorCount = 0;         // Variable used to populate the $authorsValue array
We will also define an array to contain the book details:
$books = array(); // Contains the details of books
Next we define the start element handler of the parser. We store the element name in the global variable $currentTag, so that the character data handler can identify the element that is currently being parsed. If the current element is <price>, we store the value of the currency attribute in the global variable $currencyValue:
function startElement($parser, $name, $attr) {
    global $currentTag, $currencyValue;
$books array, and reinitialize the global variables to store the details of another book. If the function is called for an <author> element, we increment the $authorCount variable:
function endElement($parser, $name) {
global $titleValue, $authorsValue, $isbnValue, $priceValue,
$currencyValue, $books, $authorCount, $descriptionValue;
Now we define the character data handler. Depending on the value of $currentTag, we concatenate $data to the appropriate global variable:
function characterData($parser, $data) {
global $titleValue, $authorsValue, $isbnValue,$priceValue,
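Since the listings for these three handlers are incomplete here, the following is a sketch of how they might fit together, written from the descriptions in the text and not taken from the original code. The sample document is hypothetical, and resetting $currentTag in the end handler is a choice made for this sketch so that whitespace between elements is not appended to the previous value.

```php
<?php
// Hypothetical sample data; the application itself reads books.xml.
$sample = '<listofbooks><book><title>Professional Java</title>'
        . '<authors><author>A. Writer</author><author>B. Coder</author></authors>'
        . '<isbn>1861002777</isbn><price currency="USD">49.99</price></book>'
        . '</listofbooks>';

$currentTag = "";  $titleValue = "";  $authorsValue = array();
$isbnValue = "";   $priceValue = "";  $currencyValue = "";
$authorCount = 0;  $books = array();

// Remember which element is open; pick up the currency attribute of <price>.
function startElement($parser, $name, $attr) {
    global $currentTag, $currencyValue;
    $currentTag = $name;
    if ($name == "price" && isset($attr["currency"])) {
        $currencyValue = $attr["currency"];
    }
}

// At </book>, store the collected details and reset; at </author>, move on.
function endElement($parser, $name) {
    global $currentTag, $titleValue, $authorsValue, $isbnValue,
           $priceValue, $currencyValue, $books, $authorCount;
    if ($name == "book") {
        $books[] = array("title" => $titleValue, "authors" => $authorsValue,
                         "isbn" => $isbnValue, "price" => $priceValue,
                         "currency" => $currencyValue);
        $titleValue = "";  $authorsValue = array();  $isbnValue = "";
        $priceValue = "";  $currencyValue = "";      $authorCount = 0;
    } elseif ($name == "author") {
        $authorCount++;
    }
    $currentTag = "";  // assumption of this sketch, see the lead-in
}

// Append the text to whichever value belongs to the open element.
function characterData($parser, $data) {
    global $currentTag, $titleValue, $authorsValue, $isbnValue,
           $priceValue, $authorCount;
    if ($currentTag == "title") {
        $titleValue .= $data;
    } elseif ($currentTag == "author") {
        if (!isset($authorsValue[$authorCount])) {
            $authorsValue[$authorCount] = "";
        }
        $authorsValue[$authorCount] .= $data;
    } elseif ($currentTag == "isbn") {
        $isbnValue .= $data;
    } elseif ($currentTag == "price") {
        $priceValue .= $data;
    }
}

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler($parser, "startElement", "endElement");
xml_set_character_data_handler($parser, "characterData");
xml_parse($parser, $sample, true);

print_r($books);
```

Concatenating in the character data handler matters because the parser is free to deliver the text of one element in several calls.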
Our last handler is for external entity references; we simply store the value of $systemId in the global variable $descriptionValue:
function externalEntityHandler($parser, $entityName, $base, $systemId, $publicId) {
    global $descriptionValue;
    if (!$systemId) return false;
    $descriptionValue = $systemId;
    return true;
}
The function readBookInfo() parses the XML document and returns an array containing the details of books. First, we create a parser, register our handlers and set the XML_OPTION_CASE_FOLDING option of the XML parser to false, so that element names are passed to the handlers exactly as they appear in the document and are not converted to upper case:
function readBookInfo() {
    global $file, $books;
    $xml_parser = xml_parser_create();
    xml_set_element_handler($xml_parser, "startElement", "endElement");
    xml_set_character_data_handler($xml_parser, "characterData");
    xml_set_external_entity_ref_handler($xml_parser, "externalEntityHandler");
    xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, false);
Next we open the XML file, read the data from the file in 4K chunks, and pass the data to the XML parser $xml_parser:
    if (!($fp = fopen($file, "r"))) {
        die("Could not open $file for reading");
    }
    while (($data = fread($fp, 4096))) {
        if (!xml_parse($xml_parser, $data, feof($fp))) {
            die(sprintf("XML error at line %d column %d",
                xml_get_current_line_number($xml_parser),
                xml_get_current_column_number($xml_parser)));
        }
    }
When we've finished parsing, we free the XML parser and return the global $books array:
of the selected book:
function printBookInfo($titleValue, $authorsValue, $isbnValue, $priceValue, $currencyValue) {
function searchBookByISBN($books, $isbn) {
for($i = 0; $i < count($books); $i++) {
<?php require("common.php");
Within the table, we require the common.php file and call the readBookInfo() function to parse the XML document. This returns an array containing the details of all the books in the file, which we store in the variable $books:
} ?>
$books = readBookInfo();
If the search category is "ISBN", we print the details of the book with $searchKeyword as the value of its <isbn> element:
} elseif (strcmp($searchBy, "author")== 0) {
for ($i=0; $i<count($books); $i++) {
$authorsValue = $books[$i]["authors"];
for($j=0; $j<count($authorsValue)-1; $j++) {
if (strcmp(strtolower(trim($authorsValue[$j])),
strtolower(trim($searchKeyword))) == 0) printBookInfo($books[$i]["title"],
} else if (strcmp($searchBy, "title") == 0) {
for ($i=0; $i<count($books); $i++) {
if (strstr(strtolower(trim($books[$i]["title"])),
strtolower(trim($searchKeyword))))
printBookInfo($books[$i]["title"],
$books[$i]["authors"], $books[$i]["isbn"], $books[$i]["price"],
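The three search branches can be summarised in one function. The sketch below is an illustration only, not the application's code: findBooks() is a hypothetical helper that returns the matching books instead of printing them, and the sample array stands in for the result of readBookInfo(). As in the fragments above, the author search is an exact case-insensitive match and the title search is a case-insensitive substring match.

```php
<?php
// Hypothetical helper: return the books matching $searchKeyword
// for the given category ("isbn", "author" or "title").
function findBooks($books, $searchBy, $searchKeyword) {
    $needle = strtolower(trim($searchKeyword));
    $matches = array();
    foreach ($books as $book) {
        $hit = false;
        if ($searchBy == "isbn") {
            // ISBNs are compared as exact strings.
            $hit = (trim($book["isbn"]) === trim($searchKeyword));
        } elseif ($searchBy == "author") {
            // Any one author equal to the keyword is enough.
            foreach ($book["authors"] as $author) {
                if (strtolower(trim($author)) === $needle) {
                    $hit = true;
                    break;
                }
            }
        } elseif ($searchBy == "title") {
            // The keyword may appear anywhere in the title.
            $hit = ($needle !== "" &&
                    strpos(strtolower($book["title"]), $needle) !== false);
        }
        if ($hit) {
            $matches[] = $book;
        }
    }
    return $matches;
}

// Hypothetical data in the shape used by the $books array.
$books = array(
    array("title" => "Professional Java Server Programming",
          "authors" => array("A. Writer"), "isbn" => "1861002777"),
    array("title" => "Professional PHP Programming",
          "authors" => array("B. Coder", "Harish Rawat"), "isbn" => "1861002971"),
);

$byTitle  = findBooks($books, "title", "java");
$byAuthor = findBooks($books, "author", " harish rawat ");
$byIsbn   = findBooks($books, "isbn", "1861002971");
```

Returning the matches keeps the search logic separate from the HTML output, so a function such as printBookInfo() can be called on each result afterwards.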
<?php require("common.php");
The data in the table of contents XML files has a different structure from that of the file containing the book details, so we will define another set of global variables for parsing this XML file:
$currentTag1 = "";      // Name of the element that is being parsed
$chapters = array();    // Array of the values of chapter elements
$chapterNo = 0;         // Variable used to populate the $chapters array
$appendixes = array();  // Array of the values of appendix elements
$appendixNo = 0;        // Variable used to populate the $appendixes array
We will also need new handler functions for the parser. The handler for the opening element tags simply assigns the name of the current element to the $currentTag1 global variable:
function startElement1($parser, $name, $attr) {
    global $currentTag1;
    } else if (strcmp($name, "appendix") == 0) {
        $appendixNo++;
    }
}
In the character data handler for the parser, we store the value of $data in the appropriate array:
function characterData1($parser, $data) {
    global $chapters, $chapterNo, $appendixes, $appendixNo, $currentTag1;
    if (strcmp($currentTag1, "chapter") == 0) {
        $chapters[$chapterNo] = $data;
    } else if (strcmp($currentTag1, "appendix") == 0) {
        $appendixes[$appendixNo] = $data;
    }
}
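For comparison, here is a self-contained sketch of parsing a table of contents document. It is a variation on the handlers above and not the original code: the names tocStart, tocEnd and tocText and the sample document are hypothetical, and the text is appended to the most recent entry so that a title delivered in several pieces is kept whole.

```php
<?php
// Hypothetical sample in the shape of the table of contents files.
$tocXml = '<toc><chapters><chapter>Introducing XML</chapter>'
        . '<chapter>Smart Servlets</chapter></chapters>'
        . '<appendices><appendix>HTTP</appendix></appendices></toc>';

$tocTag = "";
$tocChapters = array();
$tocAppendices = array();

// Start a new, empty entry whenever a chapter or appendix opens.
function tocStart($parser, $name, $attr) {
    global $tocTag, $tocChapters, $tocAppendices;
    $tocTag = $name;
    if ($name == "chapter") {
        $tocChapters[] = "";
    } elseif ($name == "appendix") {
        $tocAppendices[] = "";
    }
}

// Forget the open element so stray whitespace is not collected.
function tocEnd($parser, $name) {
    global $tocTag;
    $tocTag = "";
}

// Append text to the entry that was opened most recently.
function tocText($parser, $data) {
    global $tocTag, $tocChapters, $tocAppendices;
    if ($tocTag == "chapter") {
        $tocChapters[count($tocChapters) - 1] .= $data;
    } elseif ($tocTag == "appendix") {
        $tocAppendices[count($tocAppendices) - 1] .= $data;
    }
}

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler($parser, "tocStart", "tocEnd");
xml_set_character_data_handler($parser, "tocText");
xml_parse($parser, $tocXml, true);

print_r($tocChapters);
print_r($tocAppendices);
```

Creating the entry in the start handler removes the need for separate counter variables, at the cost of differing slightly from the structure used in the text.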
Next we parse the XML file containing the details of all the books and find the book with an <isbn> element which has the same value as :