Professional PHP Programming, Part 5


However, as we noted above, there is also an alternative syntax, whereby we place the closing slash at the end of the opening element:

The following line defines an empty element image, with an attribute src with the value

The syntax of the processing instruction might be strangely familiar to you:

There are two types of entities: internal and external. The replacement text for an internal entity is specified in an entity declaration, whereas the replacement text for an external entity resides in a separate file, the location of which is specified in the entity declaration.

After the entity has been declared, it can be referenced within the document using the following syntax:


The Document Type Definition

The document type definition of an XML document is defined within a declaration known as the document type declaration. The DTD can be contained within this declaration, or the declaration can point to an external document containing the DTD. The DTD consists of element type declarations, attribute list declarations, entity declarations, and notation declarations. We will cover all of these in this section.

Be sure to distinguish between the document type definition, or DTD, and the document type declaration.

The syntax for a document type declaration that contains the DTD is:

<!DOCTYPE rootelementname [
...
]>

The syntax for a document type declaration that points to an external DTD is:

<!DOCTYPE rootelementname SYSTEM "http://www.harawat.com/books.dtd">

The rootelementname is the name of the root element of the document. The location of the file containing the DTD is http://www.harawat.com/books.dtd.

Element Type Declarations

The element type declaration indicates whether the element contains other elements, text, or is empty. It also specifies whether the elements are mandatory or optional, and how many times the elements can appear.

An element type declaration, specifying that an element can contain character data, looks as follows:

<!ELEMENT elementname (#PCDATA)>

Here ELEMENT is a keyword, elementname is the name of the element, and #PCDATA is also a keyword. #PCDATA stands for "parsed character data", that is, the data that can be handled by the XML parser.

For example, the following element declaration specifies that the element title contains character data:

<!ELEMENT title (#PCDATA)>

The syntax of an element type declaration for an empty element is:

<!ELEMENT elementname EMPTY>

Here elementname is the name of the element, and EMPTY is a keyword.

For example, the following element type declaration specifies that element image is empty:

<!ELEMENT image EMPTY>

The syntax of an element type declaration for an element that can contain anything (other elements or parsed character data) is as follows:

<!ELEMENT elementname ANY>

Here elementname is the name of the element and ANY is a keyword.

An element type declaration for an element that contains only other elements looks like this:

<!ELEMENT parentelement (childelement1, childelement2, ...)>

Here the element parentelement contains the child elements childelement1, childelement2, etc.

For example, the following element type declaration specifies that the element book contains the elements title, authors, isbn, price:

<!ELEMENT book (title, authors, isbn, price)>

The syntax of an element type declaration specifying that parentelement contains either childelement1 or childelement2, … is:

<!ELEMENT parentelement (childelement1 | childelement2 | ...)>

For example, the following element type declaration specifies that element url can contain either httpurl or ftpurl:


<!ELEMENT url (httpurl | ftpurl)>

The following operators can be used in the element type declaration, to specify the number of allowed instances of elements within the parent element:

Operator   Description
*          Zero or more instances of the element are allowed
+          One or more instances of the element are allowed
?          Zero or one instance of the element is allowed (the element is optional)

The following element type declaration specifies that the element authors contains zero or more instances of the element author:

<!ELEMENT authors (author*)>

The following element type declaration specifies that element authors contains one or more instances of element author:

<!ELEMENT authors (author+)>

The following element type declaration specifies that the element toc contains the element chapters and optionally can contain element appendixes:

<!ELEMENT toc (chapters, appendixes?)>
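To see these operators at work, the following sketch checks two small documents against an internal DTD. It is only an illustration under stated assumptions: it uses PHP's DOM extension (DOMDocument::validate()), which this chapter does not cover and which must be installed, and the helper name isValidAgainstInternalDtd as well as the chapter and appendix elements are invented for the example.

```php
<?php
// Hypothetical helper: returns true if the document is valid against the
// DTD contained in its own document type declaration.
function isValidAgainstInternalDtd(string $xml): bool
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml) && $doc->validate();

    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

// chapters must hold one or more chapter elements (+),
// appendixes is optional (?) and may hold zero or more appendix elements (*).
$dtd = <<<'DTD'
<!DOCTYPE toc [
<!ELEMENT toc (chapters, appendixes?)>
<!ELEMENT chapters (chapter+)>
<!ELEMENT chapter (#PCDATA)>
<!ELEMENT appendixes (appendix*)>
<!ELEMENT appendix (#PCDATA)>
]>
DTD;

$valid   = $dtd . "\n<toc><chapters><chapter>XML</chapter></chapters></toc>";
$invalid = $dtd . "\n<toc><chapters></chapters></toc>";

var_dump(isValidAgainstInternalDtd($valid));
var_dump(isValidAgainstInternalDtd($invalid));
```

The first document omits the optional appendixes element and is still valid; the second is well-formed but not valid, because chapters requires at least one chapter.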

Attribute List Declarations

We saw earlier that an element can have attributes associated with it. The attribute list declaration specifies the attributes which specific elements can take. It also indicates whether the attributes are mandatory or not, the possible values for the attributes, default values, etc.

The syntax of the attribute list declaration is:

<!ATTLIST elementname
    attrname1 datatype1 flag1
    attrname2 datatype2 flag2
>

Here elementname is the name of the element, attrname1 is the name of an attribute, datatype1 specifies the type of information to be passed with the attribute, and flag1 indicates how the default values for the attribute are to be handled.

The possible values for the datatype field depend on the type of the attribute.

Possible values for the flags field are:

Flag        Description
#REQUIRED   This flag indicates that the attribute should be present in all instances of the element. If the attribute is not present in an instance of the element, then the document is not a valid document.
#IMPLIED    This flag indicates that the application can assume a default value for the attribute if the attribute is not specified in an element.
#FIXED      This flag indicates that the attribute can have only one value for all instances of elements in the document.

CDATA Attributes

CDATA attributes can have any character data as their value

The following attribute list declaration specifies that instances of the element price must have an attribute currency whose value can be any character data:

<!ATTLIST price currency CDATA #REQUIRED>

Enumerated Attributes

Enumerated attributes can take one value from the list of values provided in the declaration.

The following attribute list declaration specifies that instances of the element author can have an attribute gender, with a value of either "male" or "female":

<!ATTLIST author gender (male|female) #IMPLIED>

ID and IDREF Attributes

Attributes of type ID must have a unique value in an XML document. These attributes are used to uniquely identify instances of elements in the document.

The following attribute list declaration specifies that instances of the element employee must have an attribute employeeid, and that its value should be unique in the XML document:

<!ATTLIST employee employeeid ID #REQUIRED>

The value of attributes of type IDREF must match the value of an ID attribute on some element in the XML document. Similarly, the values of attributes of type IDREFS must contain whitespace-delimited ID values in the document. Attributes of type IDREF and IDREFS are used to establish links between elements in the document.

The following attribute list declaration is used to establish a link between an employee and his or her manager and subordinates:

<!ATTLIST employee
    employeeid ID #REQUIRED
    managerid IDREF #IMPLIED
    subordinatesid IDREFS #IMPLIED>

Entity Attributes

Entity attributes provide a mechanism for referring to non-XML (binary) data from an XML document. The value of an entity attribute must match the name of an external entity declaration referring to non-XML data.

The following attribute list declaration specifies that the element book can have an entity attribute logo:

<!ATTLIST book logo ENTITY #IMPLIED>

Notation Declarations

Sometimes elements in XML documents might refer to an external file containing data in a format that an XML parser cannot read. Suppose we have an XML document containing the details of a book. We may want to put a reference to a GIF image of the cover along with the details of the book. The XML parser would not be able to process this data, so we need a mechanism to identify a helper application which will process this non-XML data. Notation declarations allow the XML parser to identify helper applications, which can be used to process non-XML data.

A notation declaration provides a name and an external identifier for a type of non-XML (unparsed) data. The external identifier for the notation allows the XML application to locate a helper application capable of processing data in the given notation.

For example, the following notation declaration specifies "file:///usr/bin/netscape" as the helper application for non-XML data of type "gif":

<!NOTATION gif SYSTEM "file:///usr/bin/netscape">

Entity Declarations

Entity declarations define entities which are used within the XML document. Whenever the XML parser encounters an entity reference in the XML document, it replaces it with the contents of the entity as defined in the entity declaration.

Internal entity declarations are in the following format:

<!ENTITY myname "Harish Rawat">

This entity declaration defines an entity myname, with the value "Harish Rawat". The following is an example of an external entity declaration, referring to a file containing XML data:

<!ENTITY description1 SYSTEM "description1.xml">

This entity declaration defines an entity named description1, with "description1.xml" as the system identifier. A "system identifier" is the location of the file containing the data.

When declaring external entities, a public identifier for the entity can also be specified. On encountering the external entity reference, the XML parser first tries to resolve the reference using the public identifier, and only if that fails does it try the system identifier.

In this example, the entity description1 is declared with the public identifier "http://www.harawat.com/description1.xml", and the system identifier "description1.xml". In XML syntax the PUBLIC keyword is followed by the public identifier and then the system identifier:

<!ENTITY description1 PUBLIC "http://www.harawat.com/description1.xml" "description1.xml">

If the file contains non-XML data, the syntax will be:

<!ENTITY booklogo SYSTEM "booklogo.gif" NDATA gif>

This entity declaration defines an entity booklogo, which refers to an external non-XML file booklogo.gif, of notation gif. The notation gif must itself be declared in a notation declaration.

XML Support in PHP

PHP supports a set of functions that can be used for writing PHP-based XML applications. These functions can be used for parsing well-formed XML documents. The XML parser in PHP is a stream-based parser. Before parsing the document, different handlers (or callback functions) are registered with the parser. The XML document is fed to the parser in sections, and as the parser parses the document and recognizes different nodes, it calls the appropriate registered handler. Note that the XML parser does not check for the validity of the XML document. It won't generate any errors or warnings if the document is well-formed but not valid.

The PHP XML extension supports the Unicode character set through different character encodings. There are two types of character encodings, source encoding and target encoding. Source encoding is performed when the XML document is parsed. The default source encoding used by PHP is ISO-8859-1. Target encoding is carried out when PHP passes data to registered handler functions. Target encoding affects character data as well as tag names and processing instruction targets.

If the XML parser encounters characters outside the range that its source encoding is capable of representing, it will return an error. If PHP encounters characters in the parsed XML document that cannot be represented in the chosen target encoding, such characters will be replaced by a question mark.

XML support for PHP is implemented using the expat library. Expat is a library written in C, for parsing XML documents. More information about expat can be found at

3. Read the data from the XML file, and pass the data to the parser. This is where the actual parsing of the data occurs.

4. Free the parser, after the complete file has been parsed.

We will have a quick look at what this means in practice by showing a very simple XML parser (in fact, just about the simplest possible!), before going on to look at the individual functions in turn

<?php
// First we define the handler functions to inform the parser what action to
// take on encountering a specific type of node.
// We'll just print out element opening and closing tags and character data.

// The handler for element opening tags
// (the angle brackets are escaped so that the browser displays them)
function startElementHandler($parser, $name, $attribs) {
    echo("&lt;$name&gt;<BR>");
}

// The handler for element closing tags
function endElementHandler($parser, $name) {
    echo("&lt;/$name&gt;<BR>");
}

// The handler for character data
function cdataHandler($parser, $data) {
    echo("$data<BR>");
}

// Now we create the parser
$parser = xml_parser_create();

// Register the start and end element handlers
xml_set_element_handler($parser, "startElementHandler", "endElementHandler");

// Register the character data handler
xml_set_character_data_handler($parser, "cdataHandler");

// Open the XML file
$file = "test.xml";
if (!($fp = fopen($file, "r"))) {
    die("could not open $file for reading");
}

// Read chunks of 4K from the file, and pass them to the parser
while ($data = fread($fp, 4096)) {
    if (!xml_parse($parser, $data, feof($fp))) {
        die(sprintf("XML error %d %d", xml_get_current_line_number($parser), xml_get_current_column_number($parser)));
    }
}

// Free the parser once the complete file has been parsed
xml_parser_free($parser);
?>

This will produce this output in the browser:

Now we'll go on to discuss the functions in detail. In the following sections, all the XML-related functions will be described, along with examples of their use.

Creating an XML Parser

The function xml_parser_create() creates an XML parser context.

int xml_parser_create(string [encoding_parameter]);

Parameter            Optional   Description   Default
encoding_parameter   Yes        The character source encoding that will be used by the parser. The source encoding, once set, cannot be changed later. The possible values are "ISO-8859-1", "US-ASCII" and "UTF-8".   "ISO-8859-1"

The function returns a handle (a positive integer value) on success, and false on error. The handle returned by xml_parser_create() will be passed as an argument to all the function calls which register handler functions with the parser, or change the options of the parser. We will see these function calls shortly.

We can define multiple parsers in a single PHP script. You may want to do this if you are parsing more than one XML document in the script.

Registering Handler Functions

Before we can parse an XML document, we need to write functions which will handle the various nodes of the XML document. For example, we need to write a function which will handle the opening tag of XML elements, and another which will handle the closing tags. We also need to assign handlers for character data, processing instructions, etc. These handlers must be registered with the XML parser before the document can be parsed.

Registering Element Handlers

The function xml_set_element_handler() registers "start" and "end" element handler functions with the XML parser. Its syntax is:

int xml_set_element_handler(int parser, string startElementHandler, string endElementHandler);

Parameter             Optional   Description
parser                No         The handle of an XML parser, with which the start and end element handlers are registered.
startElementHandler   No         The name of the start element handler function. If null is specified then no start element handler is registered.
endElementHandler     No         The name of the end element handler function. If null is specified then no end element handler is registered.

The function returns true on success, or false if the call fails. The function call will return false if parser is not a valid parser handle.

The registered handler functions startElementHandler and endElementHandler should exist when an XML document is parsed; if they do not, a warning will be generated.

Start Element Handler

The user-defined start element handler function, registered with the parser through an xml_set_element_handler() function call, will be called when the parser encounters the opening tag of an element in the document. The function must be defined with the following syntax:

startElementHandler(int parser, string name, string attribs[]);

Parameter   Optional?   Description
parser      No          A reference to the XML parser which is calling this function.
name        No          The name of the element.
attribs     No          An associative array containing the attributes of the element.

For example, suppose we are parsing the following line of an XML document:

<author gender="male" age="24">Harish Rawat</author>

The XML parser will call our registered start element handler function with the following parameters:

startElementHandler($parser, "author", array("gender"=>"male", "age"=>"24"));

End Element Handler

The user-defined end element handler function, registered with the parser through an xml_set_element_handler() function call, will be called when the parser encounters an end tag of an element in the document. This function should have the following syntax:

endElementHandler(int parser, string name);

Parameter   Optional   Description
parser      No         A reference to the XML parser which is calling this function.
name        No         The name of the element.

For example, if we parse the following line of an XML document:

<author sex="male" age="24">Harish Rawat</author>

The registered end element handler function will be called with the following parameters:

endElementHandler($parser, "author");

Notice that the value of the name parameter is "author" and not "/author".
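The following sketch shows both element handlers receiving their arguments. It makes two assumptions that go beyond the text: the handlers are passed as closures rather than as function names (this needs a PHP version that supports closures), and case folding is switched off, because with the default setting the parser upper-cases element and attribute names before calling the handlers. The $events array is purely illustrative.

```php
<?php
// Record every start and end event so the handler arguments can be inspected.
$events = [];

$parser = xml_parser_create();

// Case folding is enabled by default and would upper-case the names,
// so switch it off to receive them as written in the document.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

xml_set_element_handler(
    $parser,
    // Start element handler: receives the element name and its attributes.
    function ($parser, $name, $attribs) use (&$events) {
        $events[] = ['start', $name, $attribs];
    },
    // End element handler: receives only the element name.
    function ($parser, $name) use (&$events) {
        $events[] = ['end', $name];
    }
);

xml_parse($parser, '<author gender="male" age="24">Harish Rawat</author>', true);

print_r($events);
```

The start event carries the name "author" together with the attribute array, while the end event carries the name alone.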

The Character Data Handler

The function xml_set_character_data_handler() registers the character data handler with the XML parser. The character data handler is called by the parser for all non-markup contents of the XML document:

int xml_set_character_data_handler(int parser, string characterDataHandler);

Parameter              Optional   Description
parser                 No         The handle of an XML parser, with which the character data handler is registered.
characterDataHandler   No         The name of the character data handler function. If null is specified then no character data handler is registered.

The function returns true on success; otherwise false is returned. The function will return false if parser is not a valid parser handle.

The registered handler function should exist when an XML document is parsed; otherwise an error is generated.

Prototype for the Character Data Handler

The user-defined character data handler function, registered with the parser through a call to the xml_set_character_data_handler() function, will be called when the parser encounters non-markup content in the XML document and should have the following syntax:

characterDataHandler(int parser, string data);

Parameter   Optional   Description
parser      No         A reference to the XML parser which is calling this function.
data        No         The character data present in the XML document. The parser returns the character data as it is, and does not remove any white space.

While parsing the contents of an element, the character data handler can be called any number of times. This should be kept in mind while defining the character data handler function.

For example, while parsing the following line of an XML document:

The character data handler can be called once with the following parameters:

characterDataHandler($parser, "Harish Rawat");

Or it can be called twice; firstly as:

characterDataHandler($parser, "Harish ");

And again as:

characterDataHandler($parser, "Rawat");
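Because the text of one element may arrive in several pieces, a handler usually has to append each piece to a buffer and only use the buffer once the closing tag is seen. The sketch below illustrates that pattern; the variable names, the authors wrapper element and the deliberate split of the input into two xml_parse() calls are assumptions made for the example, and closures are used in place of function names.

```php
<?php
$buffer = '';   // text collected for the element currently being read
$names  = [];   // completed author names

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

xml_set_element_handler(
    $parser,
    // A new element starts: discard any text collected so far.
    function ($parser, $name, $attribs) use (&$buffer) {
        $buffer = '';
    },
    // The element is complete: the buffer now holds all of its text.
    function ($parser, $name) use (&$buffer, &$names) {
        if ($name === 'author') {
            $names[] = $buffer;
        }
    }
);

// Append rather than assign, since this may be called more than once
// for the contents of a single element.
xml_set_character_data_handler($parser, function ($parser, $data) use (&$buffer) {
    $buffer .= $data;
});

// Feed the document in two chunks that split the author's name.
xml_parse($parser, '<authors><author>Harish ', false);
xml_parse($parser, 'Rawat</author></authors>', true);

print_r($names);
```

However many times the handler is invoked, the name is reassembled as a single string when the end tag of author is reached.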

The Processing Instruction Handler

The function xml_set_processing_instruction_handler() registers with the XML parser the function that will be used to handle processing instructions. The processing instruction handler is called by the parser when it encounters a processing instruction in the XML document:

int xml_set_processing_instruction_handler(int parser, string processingInstructionHandler);

Parameter                      Optional   Description
parser                         No         The handle of an XML parser with which the processing instruction handler is registered.
processingInstructionHandler   No         The name of the processing instruction handler function. If null is specified then no processing instruction handler is registered.

The function returns true on success or false on failure. The function will return false if parser is not a valid parser handle.

The registered handler function should exist when an XML document is parsed, or an error is generated.

Processing instructions, as we saw in the section on the XML Language, are application-specific instructions embedded in the XML document. This is similar to the way we embed PHP instructions in an HTML file.

Prototype for the Processing Instruction Handler

The user-defined processing instruction handler function, registered with the parser through the xml_set_processing_instruction_handler() function, will be called when the parser encounters processing instructions in the XML document, and should have the following syntax:

processingInstructionHandler(int parser, string target, string data);

Parameter   Optional   Description
parser      No         A reference to the XML parser which is calling this function.
target      No         The target of the processing instruction.
data        No         The data of the processing instruction.

For example, if we are parsing the following processing instruction in an XML document:

<?php print "This document was created on Jan 01, 1999";?>

The processing instruction handler will be called with the following parameters:

processingInstructionHandler($parser, "php", "print \"This document was created on Jan 01, 1999\";");

A sample processing instruction handler might look like this:

function piHandler($parser, $target, $data) {
    if (strcmp(strtolower($target), "php") == 0) {
        eval($data);
    }
}

If you are defining such a processing instruction handler in your application, then you should do some security checks before executing the code. The code embedded through processing instructions can be malicious; for example, it could delete all the files on the server.

One security check could be to execute the code in the processing instructions only if the owner of the XML file and the XML parser are the same.
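A more cautious alternative is not to execute anything at all, and instead to record only those processing instructions whose target the application explicitly expects. The sketch below shows that idea; the report target, the $allowedTargets list and the use of a closure are all assumptions made for illustration, not part of the XML extension.

```php
<?php
// Hypothetical whitelist: only instructions with these targets are kept.
$allowedTargets = ['report'];
$instructions   = [];

$parser = xml_parser_create();

xml_set_processing_instruction_handler(
    $parser,
    function ($parser, $target, $data) use ($allowedTargets, &$instructions) {
        // Keep the instruction as plain data; nothing is passed to eval().
        if (in_array(strtolower($target), $allowedTargets, true)) {
            $instructions[] = [$target, $data];
        }
    }
);

// The second instruction has the target "php" and is silently ignored.
$xml = '<doc><?report created="1999-01-01"?><?php print "hi";?></doc>';
xml_parse($parser, $xml, true);

print_r($instructions);
```

The application can then interpret the recorded data itself, which keeps untrusted documents from running code on the server.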

The Notation Declaration Handler

The function xml_set_notation_decl_handler() registers the notation declaration handler with the parser. The notation declaration handler is called by the parser whenever it encounters a notation declaration in the XML document.

int xml_set_notation_decl_handler(int parser, string notationDeclarationHandler);

Parameter                    Optional   Description
parser                       No         The handle of an XML parser with which the notation declaration handler is registered.
notationDeclarationHandler   No         The name of the notation declaration handler function. If null is specified then no notation declaration handler is registered.

The function returns true on success; otherwise false is returned. The function will return false if parser is not a valid parser handle.

An error will be generated when the XML document is parsed if the notation declaration handler does not exist.

Prototype for the Notation Declaration Handler

The user-defined notation declaration handler function, registered with the parser through a call to the xml_set_notation_decl_handler() function, will be called when the parser encounters notation declarations in the XML document, and should have the syntax:

notationDeclarationHandler(int parser, string notationName, string base, string systemId, string publicId);

Parameter      Optional   Description
parser         No         A reference to the XML parser which is calling this function.
notationName   No         The name of the notation.
base           No         The base for resolving systemId. Currently the value of this parameter will always be a null string.
systemId       No         The system identifier of the notation declaration.
publicId       No         The public identifier of the notation declaration.

For example, parsing the following notation declaration of an XML document:

<!NOTATION gif SYSTEM "file:///usr/bin/netscape">

Will cause the notation declaration handler to be called with the following parameters:

notationDeclarationHandler($parser, "gif", "", "file:///usr/bin/netscape", "");

Let's implement a sample notation declaration handler. This handler populates the associative array $helperApps with a mapping between the notation name and the name of the application that will handle the unparsed data of type $notationName. The $helperApps array can be used by the unparsed entity declaration handler to identify the application that should be used to process non-XML data. We will look at the unparsed entity declaration handler shortly.

function notationHandler($parser, $notationName, $base, $systemId, $publicId) {
    global $helperApps;
    // Map the notation name to its helper application
    $helperApps[$notationName] = $systemId;
}

The External Entity Reference Handler

The function xml_set_external_entity_ref_handler() registers the external entity reference handler with the XML parser. This handler is called by the parser when it encounters an external entity reference in an XML document. Note that the registered handler is called for external entity references and not external entity declarations.

Unlike other parsers (such as Microsoft Internet Explorer 5), the XML parser of PHP does not handle external entities. It simply calls the registered external entity reference handler to handle them.

int xml_set_external_entity_ref_handler(int parser, string externalEntityRefHandler);

Parameter                  Optional   Description
parser                     No         The handle for an XML parser with which the external entity reference handler is registered.
externalEntityRefHandler   No         The name of the external entity reference handler function. If null is specified then no external entity reference handler is registered.

The function returns true on success; otherwise, false is returned. The function will return false if parser is not a valid parser handle.

The registered handler function should exist when parsing an XML document, or an error will be generated.

Prototype for the External Entity Reference Handler

The user-defined external entity reference handler function, registered with the parser through an xml_set_external_entity_ref_handler() function call, will be called when the parser encounters external entity references in the XML document. This should have the following syntax:

int externalEntityRefHandler(int parser, string entityName, string base, string systemId, string publicId);

Parameter    Optional   Description
parser       No         A reference to the XML parser which is calling this function.
entityName   No         The name of the external entity.
base         No         The base for resolving systemId. Currently the value of this parameter will always be a null string.
systemId     No         The system identifier of the external entity.
publicId     No         The public identifier of the external entity.

The user-defined external entity reference handler should handle the external references in the XML document. If a true value is returned by the handler, the parser assumes that the external reference was successfully handled and the parsing continues. If the handler returns false, the parser will stop parsing.

As an example, suppose that an entity &book_1861002777; is defined in the DTD of an XML document:

<!ENTITY book_1861002777 SYSTEM "1861002777.xml">

And the parser comes across the following line in an XML document:

&book_1861002777;

The external entity reference handler will be called with the parameters:

externalEntityRefHandler($parser, "book_1861002777", "", "1861002777.xml", "");

The Unparsed Entity Declaration Handler

The function xml_set_unparsed_entity_decl_handler() registers the unparsed entity declaration handler with the XML parser. The unparsed entity declaration handler is called by the parser when it encounters an unparsed entity declaration in an XML document.

int xml_set_unparsed_entity_decl_handler(int parser, string unparsedEntityDeclHandler);

Parameter                   Optional   Description
parser                      No         The handle of an XML parser with which the unparsed entity declaration handler is registered.
unparsedEntityDeclHandler   No         The name of the unparsed entity declaration handler function. If null is specified then no unparsed entity declaration handler is registered.

The function returns true on success or false on failure. The function returns false if parser is not a valid parser handle.

The registered handler function should exist when an XML document is parsed, or an error will be generated.

Prototype for the Unparsed Entity Declaration Handler

The user-defined unparsed entity declaration handler function, registered with the parser through an xml_set_unparsed_entity_decl_handler() function call, will be called when the parser encounters an unparsed entity declaration in the XML document. Its syntax is:

unparsedEntityDeclHandler(int parser, string entityName, string base, string systemId, string publicId, string notationName);

Parameter      Optional   Description
parser         No         A reference to the XML parser which is calling this function.
entityName     No         The name of the unparsed entity.
base           No         The base for resolving systemId. Currently the value of this parameter will always be a null string.
systemId       No         The system identifier of the unparsed entity.
publicId       No         The public identifier of the unparsed entity.
notationName   No         The name of the notation (defined in an earlier notation declaration), identifying the type of unparsed data.

For example, if the parser encounters the following line (in the DTD) of an XML document:

<!ENTITY book_gif_1861002777 SYSTEM "1861002777.gif" NDATA gif>

The unparsed entity declaration handler will be called with the following parameters:

unparsedEntityDeclHandler($parser, "book_gif_1861002777", "", "1861002777.gif", "", "gif");

The Default Handler

The function xml_set_default_handler() registers the default handler with the XML parser. The default handler is called by the parser for all the nodes of the XML document for which handlers cannot be registered (such as the XML version declaration, DTD declaration and comments). The default handler is also called for any other nodes for which handlers are not registered with the parser. For example, if the start and end element handlers are not registered with the parser, the parser will call the default handler (if registered) whenever it encounters element opening and closing tags in the XML document.

int xml_set_default_handler(int parser, string defaultHandler);

Parameter        Optional   Description
parser           No         The handle of an XML parser with which the default handler is registered.
defaultHandler   No         The name of the default handler function. If null is specified then no default handler is registered.

The function returns true on success, or false on error (e.g. if parser is not a valid parser handle).

The registered handler function should exist when parsing an XML document, or an error is generated.

Prototype for the Default Handler

The user-defined default handler gets called by the parser for all the nodes in the XML document for which handler functions are not registered. It should have the following syntax:

defaultHandler(int parser, string data);

Parameter   Optional   Description
parser      No         A reference to the XML parser which is calling this function.
data        No         The part of the XML document for which there is no registered handler.

For example, if the start and end element handlers are not registered with the parser and the parser encounters this line in an XML document:

The default handler will be called with the following values of function parameters:

defaultHandler($parser, "<author sex=\"male\" age=\"24\">");

Notice that the entire opening and closing tags of the element are passed as they are.
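As a rough illustration of this fallback, the sketch below registers only a default handler and collects whatever the parser passes to it. The $chunks array and the sample line are assumptions for the example, and the exact way the input is split into pieces may differ between PHP builds, so the code only joins the pieces back together rather than relying on a particular split.

```php
<?php
$chunks = [];

$parser = xml_parser_create();

// No element or character data handlers are registered, so the parser
// hands that content to the default handler instead.
xml_set_default_handler($parser, function ($parser, $data) use (&$chunks) {
    $chunks[] = $data;
});

xml_parse($parser, '<author sex="male" age="24">Harish Rawat</author>', true);

// Join the pieces to see everything that reached the default handler.
echo implode('', $chunks), "\n";
```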

Parsing the XML Document

The xml_parse() function passes the contents of the XML document to the parser. This function accomplishes the actual parsing of the document: it calls the appropriate registered handlers as and when it encounters nodes in the document.

This function is called after all the handler functions for the various node types in the XML document have been registered with the parser.

int xml_parse(int parser, string data, int [isFinal]);

Parameter   Optional   Description   Default
parser      No         The handle of the XML parser, which will parse the supplied data.
data        No         The contents of the XML document. The complete contents of the XML file need not be passed in one call.
isFinal     Yes        Specifies the end of input data.   false

The function returns true if it was able to parse the data passed to it; otherwise, false is returned. The error information in case of failure can be found with the xml_get_error_code() and xml_error_string() functions. We shall look at these functions presently.

The following code fragment illustrates the use of the xml_parse() function:

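// $fp is the XML file, already opened for reading with fopen().
// Read the file in chunks of 4K and pass each chunk to the parser.
while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error %d %d", xml_get_current_line_number($xml_parser), xml_get_current_column_number($xml_parser)));
    }
}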
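When xml_parse() fails, a more descriptive message can be built from the error code. The sketch below is a minimal example of that, assuming a deliberately malformed document held in a string; it uses xml_get_error_code() together with xml_error_string(), which converts the code into text. The wording of that text depends on the underlying XML library, so the code does not rely on it.

```php
<?php
$parser = xml_parser_create();

// The closing tag does not match the open <title> element,
// so this document is not well-formed.
$ok = xml_parse($parser, '<book><title>PHP</book>', true);

$message = '';
if (!$ok) {
    $message = sprintf(
        "XML error: %s at line %d, column %d",
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser),
        xml_get_current_column_number($parser)
    );
}

echo $message, "\n";
```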

Freeing the Parser

The function xml_parser_free() frees the XML parser which was created with the xml_parser_create() function. All the resources associated with the parser are freed. The XML parser should be freed after a complete XML document has been parsed, or if an error occurs while parsing a document.

int xml_parser_free(int parser);

Parameter   Optional   Description
parser      No         The handle of the XML parser, which is to be freed.

The function returns true if the parser was freed, otherwise false.

Parser Options

There are two options for the parser. We can set values for these options using the xml_parser_set_option() function, and retrieve the current value with the xml_parser_get_option() function.

These options are:

Option | Data Type | Description | Default
XML_OPTION_CASE_FOLDING | Integer | If the value of the option is true, then the element names (start and end tags) will be upper-cased when the registered handlers are called | true
XML_OPTION_TARGET_ENCODING | String | The value of this option specifies the target encoding used by the parser when it invokes registered handlers | Same as the source encoding value, specified when the parser was created

xml_parser_set_option

The xml_parser_set_option() function sets the option specified in the option argument to the value in the value argument for the parser associated with the parser handle specified by the parser argument.

int xml_parser_set_option(int parser, int option, mixed value);

The function returns true if the new option was set; if the call failed, false is returned.

The function xml_parser_set_option() can be called at any point in the PHP program. The new option will take effect for any data that is parsed after the option has been set.

xml_parser_get_option

The xml_parser_get_option() function retrieves the value for the option specified by the option argument for the parser specified by the parser argument.

mixed xml_parser_get_option(int parser, int option);

This function returns the value of the option (the data type of the return value therefore depends on the option). If either the parser or the option argument is invalid then false is returned.
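The effect of XML_OPTION_CASE_FOLDING is easiest to see by parsing the same document twice. This is a minimal sketch; collectElementNames() is a hypothetical helper written for this illustration, and the value read back is cast to bool because its exact type differs between PHP versions.

```php
<?php
// Hypothetical helper: parse $xml and return the element names seen by the
// start element handler, together with the case-folding setting read back.
function collectElementNames($xml, $caseFolding)
{
    $names = array();
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding);

    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attr) use (&$names) {
            $names[] = $name;
        },
        function ($parser, $name) {
            // End tags are ignored in this sketch.
        }
    );

    xml_parse($parser, $xml, true);

    // Read the option back from the parser.
    $folding = (bool) xml_parser_get_option($parser, XML_OPTION_CASE_FOLDING);

    return array($names, $folding);
}

list($folded, $on) = collectElementNames('<book><title/></book>', true);
list($asWritten, $off) = collectElementNames('<book><title/></book>', false);

echo implode(',', $folded), "\n";
echo implode(',', $asWritten), "\n";
```

With folding switched off, handlers must compare against element names exactly as they are written in the document.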

Utility Functions

The remaining functions provide useful information or services that we may need when parsing an XML document. These functions provide information about any errors which occurred and about the current position in the XML document. There are also functions for encoding and decoding text.

xml_get_error_code

The function xml_get_error_code() returns the error code from the XML parser

int xml_get_error_code(int parser);

Parameter | Optional | Description | Default
parser | No | The handle of an XML parser |

This function can be called after xml_parse() has returned false to find out the exact reason why the parsing of the passed data failed. The function returns false if the parser is not a valid XML parser.

xml_error_string

The xml_error_string() function returns the error message corresponding to an error code.

string xml_error_string(int errorCode);

Parameter | Optional | Description | Default
errorCode | No | An error code returned by the xml_get_error_code() function |

xml_get_current_line_number

int xml_get_current_line_number(int parser);

Parameter | Optional | Description | Default
parser | No | The handle of an XML parser |

This function returns the line number of the XML document that the parser is currently parsing. If parser is not a valid parser, false is returned. This function can be used to print the line number (for debugging purposes) when a call to the xml_parse() function returns false.

xml_get_current_column_number

The xml_get_current_column_number() function is similar to xml_get_current_line_number(); the only difference is that it returns the number of the current column in the line that the parser is parsing.

int xml_get_current_column_number(int parser);

The functions xml_get_current_line_number() and xml_get_current_column_number() can be used together when reporting parse errors in the XML document to give the user the exact location where the error occurred:

if (!xml_parse($parser, $data)) {
    // If there is an error in parsing $data, print the line number and
    // column number of the XML file
    die(sprintf("Error in XML document at line %d column %d\n",
        xml_get_current_line_number($parser),
        xml_get_current_column_number($parser)));
}
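The error functions can also be combined into a single report. The sketch below parses a deliberately malformed document; the sample input and the $report variable are assumptions for illustration, and the exact message and position depend on the underlying XML library.

```php
<?php
$parser = xml_parser_create();

// The closing tag does not match the opening tag, so the document
// is not well-formed.
$bad = "<book>\n<title>PHP</book>";

$report = '';
$code = 0;
if (!xml_parse($parser, $bad, true)) {
    $code = xml_get_error_code($parser);
    // Combine the message, the code and the position into one line.
    $report = sprintf("%s (code %d) at line %d column %d",
        xml_error_string($code),
        $code,
        xml_get_current_line_number($parser),
        xml_get_current_column_number($parser));
}

echo $report, "\n";
```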

utf8_decode

The function utf8_decode() converts a UTF-8 encoded string to ISO-8859-1 encoding:

string utf8_decode(string data);

This function returns an ISO-8859-1 string corresponding to data.

utf8_encode

The utf8_encode() function converts an ISO-8859-1 encoded string to UTF-8 encoding:

string utf8_encode(string data);

The function returns a UTF-8 string corresponding to data.
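A byte-level view makes the conversion concrete. This sketch assumes a PHP version that still ships both functions; they are deprecated as of PHP 8.2, so the @ operator is used here only to silence that notice. The sample string is made up.

```php
<?php
// "\xE9" is the single byte for "é" in ISO-8859-1.
$latin1 = "caf\xE9";

// Convert to UTF-8 and back again.
$utf8 = @utf8_encode($latin1);
$back = @utf8_decode($utf8);

echo bin2hex($latin1), "\n"; // bytes of the ISO-8859-1 string
echo bin2hex($utf8), "\n";   // bytes of the UTF-8 string
```

The UTF-8 form is one byte longer because "é" is encoded as two bytes.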


Web-Enabling Enterprise Applications

XML is now being used as the format for exchanging data between different business-to-business applications. Industry-standard Document Type Definitions are being defined to describe orders, transactions, inventory, billing, etc. PHP can be used to provide the web-based front end for these business-to-business applications.

Smart Searches

PHP can be used to search XML documents. For example, if all the articles in a web site are written using the same DTD, which defines elements for author, title, abstract, etc., then PHP can be used to search for the articles depending on author, title, etc.

Converting XML to HTML

Currently there are very few browsers which have built-in XML support. PHP can be used on the server side to parse XML documents and return pure HTML to the client. This technique would allow all browsers, irrespective of their level of XML support, to view XML documents.

Additionally, the PHP script could send either the XML document or the converted HTML to the client, depending on the browser that has sent the request. For example, if the request for an XML file was sent from Internet Explorer 5 then the PHP script could simply return the XML file; otherwise, the script would convert the XML data into HTML and send an HTML page to the browser.
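As a rough sketch of the server-side conversion, the handlers can emit HTML as the document is parsed. The $tagMap mapping and the sample document are hypothetical choices for this illustration, not part of the application described later.

```php
<?php
// Hypothetical mapping from XML element names to HTML tags.
$tagMap = array('book' => 'div', 'title' => 'h2', 'author' => 'p');
$html = '';

$parser = xml_parser_create();
// Keep element names as written so they match the keys of $tagMap.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

xml_set_element_handler(
    $parser,
    function ($parser, $name, $attr) use (&$html, $tagMap) {
        if (isset($tagMap[$name])) {
            $html .= '<' . $tagMap[$name] . '>';
        }
    },
    function ($parser, $name) use (&$html, $tagMap) {
        if (isset($tagMap[$name])) {
            $html .= '</' . $tagMap[$name] . '>';
        }
    }
);

xml_set_character_data_handler($parser, function ($parser, $data) use (&$html) {
    // Escape the text so it is safe to embed in an HTML page.
    $html .= htmlspecialchars($data);
});

xml_parse($parser, '<book><title>Java Servlets</title><author>A. Writer</author></book>', true);

echo $html, "\n";
```

Elements that are missing from the mapping are dropped, while their text content is still copied through.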

Different Views of the Same Data

PHP can be used to present different views of the same XML document, by deleting or modifying nodes within the document.

A Sample PHP XML Application

To give a simple example of what can be done with XML and PHP, we will implement a simple "Book Information Site". This site will allow users to search books using the title, author or ISBN of the book as the search criterion. Alternatively, the user can view the complete list of books. After the user has searched for all books of interest, he or she can view the table of contents of the book.

All the information about the books will be stored in an XML file.

Before looking at the code of the application, let us first look at the user interface of the application to get a feel for it.

The main page of the application allows the user to search books using the title, author or ISBN of the book as the search criterion; alternatively, users can view the complete list of books:


To search for a book, the user enters the search keyword in the Search Keyword text input box, and specifies the search category using the Search Books By list box. In the above figure, the search keyword is 'Java', and the search will be by title, so the application will search for all books with the word "Java" in the title.

To view all the books in the file, the user can click on the Complete list of books button.

The results of the search are shown in the figure below. Clicking on the title of a book will present the user with the table of contents for that book:


The other option available from the main page is to view the complete list of all the books. Again, the title of the book acts as a link which the user can click to view the book's table of contents:


This figure displays the table of contents of the book selected by the user from the Complete List of Books or the Search page.

Having looked at the screen shots, we understand what the application does, so it is the right time to look at the code.

The book details are stored in XML files in order to enable smart searches, which form an important requirement of the application. A relational database could have been used for storing the data, but it would not have made sense to install a relational database for this simple application. Another important feature of XML is that it stores data in plain text files, so the data can easily be exchanged between applications, even on different platforms. It is even human-readable.

The XML file books.xml stores the book details. The following is a sample books.xml file containing details of three books. First we have the DTD:

<?xml version="1.0"?>
<!DOCTYPE listofbooks[
    <!ELEMENT book (title, authors, isbn, price, toc)*>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT authors (author*)>
    <!ELEMENT author (#PCDATA)>
    <!ELEMENT isbn (#PCDATA)>
    <!ELEMENT price (#PCDATA)>
    <!ELEMENT toc (chapters, appendixes)>
    <!ELEMENT chapters (chapter)*>
    <!ELEMENT chapter (#PCDATA)>
    <!ELEMENT appendixes (appendix)*>
    <!ELEMENT appendix (#PCDATA)>
    <!ATTLIST price currency CDATA #REQUIRED>
]>

Then comes the data itself. The root element which encloses the data is named <listofbooks>. Each book is represented by a <book> element, which contains child elements for each item of information stored about the book, such as <title>, <isbn> and <price>. Since there may be more than one author for a book, there is an <authors> element for each book, which has a child <author> element for each author. The table of contents for the book is stored in another XML file, so we use an external entity reference to refer to that:


In this case, the root element is named <toc>, and this contains two elements, <chapters> and <appendices>. These contain child <chapter> and <appendix> elements respectively:

<chapter>Error Handling and Event Logging</chapter>

<chapter>Sessions and Session Tracking</chapter>

<chapter>Using the Servlet Context</chapter>

<chapter>Dynamic Content Generation</chapter>

<chapter>Introduction to JavaServer Pages</chapter>

<chapter>Connecting to Databases</chapter>

<chapter>Connection Pooling</chapter>

<chapter>Servlet Chaining</chapter>

<chapter>Servlet Communications</chapter>

<chapter>Distributed Computing with Servlets</chapter>

<chapter>JavaMail and Servlets</chapter>

<chapter>Introducing XML</chapter>

<chapter>Weeds of El-Limon 2</chapter>

<chapter>Bug Tracker Case Study</chapter>

<chapter>Bug Tracker Case Study: Elaboration, Construction and Transition</chapter>

<chapter>Moving from CGI to Servlets</chapter>

<chapter>Internationalizing Web Sites</chapter>

<chapter>Smart Servlets</chapter>

<chapter>Server Programming with JNDI</chapter>

<chapter>Using LDAP and Java</chapter>

<chapter>Enterprise JavaBeans</chapter>

<chapter>Indexing and Searching</chapter>

<chapter>JINI and JavaSpaces: Servers of the Future</chapter>

<chapter>Working With JavaSpaces</chapter>

<chapter>Coding a Jini-based Website</chapter>

</chapters>

<appendix>Java Object Streams and Serialization</appendix>


<appendix>The LogWriter Class</appendix>

<appendix>UML Tutorial</appendix>

<appendix>JServ Configuration</appendix>

<appendix>ServletRunner and Java Web Server Configuration</appendix>

<appendix>JRun Configuration</appendix>

<appendix>JSDK API Reference</appendix>

<appendix>JavaServer Pages API Reference</appendix>

<appendix>JNDI API Reference</appendix>

<appendix>Core JavaMail / JAF API Reference</appendix>

<appendix>Core Jini API Reference</appendix>

<appendix>JavaSpaces API Reference</appendix>


The file common.php contains common functions, which are used throughout the application. First we define some variables which we'll be using in all the PHP pages: the XML file which contains the data, a variable called $currentTag to hold the name of the element that is currently being parsed, and a number of variables to store the details for the book element that the parser is currently parsing:

<?php
$file = "books.xml";
$currentTag = "";
$titleValue = "";          // Value of the title element
$authorsValue = array();   // Array of the values of the author elements
$isbnValue = "";           // Value of the isbn element
$priceValue = "";          // Value of the price element
$currencyValue = "";       // Value of the price element's currency attribute
$descriptionValue = "";    // Value of the description entity reference
$authorCount = 0;          // Variable used to populate the $authorsValue array

We will also define an array to contain the book details:

$books = array(); // Contains the details of books

Next we define the start element handler of the parser. We store the element name in the global variable $currentTag, so that the character data handler can identify the element that is currently being parsed. If the current element is <price>, we store the value of the currency attribute in the global variable $currencyValue:


function startElement($parser, $name, $attr) {

global $currentTag, $currencyValue;

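Next we define the end element handler of the parser. If the function is called for a <book> element, we add the details of the book to the $books array, and reinitialize the global variables to store the details of another book. If the function is called for an <author> element, we increment the $authorCount variable: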

function endElement($parser, $name) {

global $titleValue, $authorsValue, $isbnValue, $priceValue,

$currencyValue, $books, $authorCount, $descriptionValue;

Now we define the character data handler. Depending on the value of $currentTag, we concatenate $data to the appropriate global variable:

function characterData($parser, $data) {

global $titleValue, $authorsValue, $isbnValue,$priceValue,
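The three handler listings above are cut short in this copy. The sketch below shows one way the start element, end element and character data handlers could cooperate to fill a $books array. It is not the book's code: it uses closures and a single $book array instead of the separate global variables, and the sample document, $book and $emptyBook are assumptions made for illustration.

```php
<?php
// Made-up sample data in the shape described by the DTD.
$xml = '<listofbooks>'
     . '<book><title>Java Servlets</title>'
     . '<authors><author>A. One</author><author>B. Two</author></authors>'
     . '<isbn>1234567890</isbn><price currency="USD">49.99</price></book>'
     . '</listofbooks>';

$books = array();
$currentTag = '';
$book = array('title' => '', 'authors' => array(), 'isbn' => '', 'price' => '', 'currency' => '');
$emptyBook = $book;

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

xml_set_element_handler(
    $parser,
    function ($parser, $name, $attr) use (&$currentTag, &$book) {
        $currentTag = $name;
        if ($name === 'price' && isset($attr['currency'])) {
            $book['currency'] = $attr['currency'];
        }
        if ($name === 'author') {
            $book['authors'][] = '';   // start a new author entry
        }
    },
    function ($parser, $name) use (&$currentTag, &$book, &$books, $emptyBook) {
        if ($name === 'book') {
            $books[] = $book;          // store the finished book
            $book = $emptyBook;        // reset for the next one
        }
        $currentTag = '';              // ignore text between elements
    }
);

xml_set_character_data_handler($parser, function ($parser, $data) use (&$currentTag, &$book) {
    // Character data can arrive in several pieces, so append rather than assign.
    if ($currentTag === 'title' || $currentTag === 'isbn' || $currentTag === 'price') {
        $book[$currentTag] .= $data;
    } elseif ($currentTag === 'author') {
        $last = count($book['authors']) - 1;
        $book['authors'][$last] .= $data;
    }
});

xml_parse($parser, $xml, true);
print_r($books);
```

Clearing $currentTag in the end handler keeps whitespace between elements from being appended to the previous field.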


Our last handler is for external entity references; we simply store the value of $systemId in the global variable $descriptionValue:

function externalEntityHandler($parser, $entityName, $base, $systemId, $publicId) {
    global $descriptionValue;

    if (!$systemId) return false;
    $descriptionValue = $systemId;
    return true;
}

The function readBookInfo() parses the XML document and returns an array containing the details of books. First, we create a parser, register our handlers and set the XML_OPTION_CASE_FOLDING option of the XML parser to false to ensure that the element names are not converted to upper case:

function readBookInfo() {
    global $file, $books;

    $xml_parser = xml_parser_create();
    xml_set_element_handler($xml_parser, "startElement", "endElement");
    xml_set_character_data_handler($xml_parser, "characterData");
    xml_set_external_entity_ref_handler($xml_parser, "externalEntityHandler");
    xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, false);

Next we open the XML file, read the data from the file in 4K chunks, and pass the data to the XML parser $xml_parser:

    if (!($fp = fopen($file, "r"))) {
        die("Could not open $file for reading");
    }

    while (($data = fread($fp, 4096))) {
        if (!xml_parse($xml_parser, $data, feof($fp))) {
            die(sprintf("XML error at line %d column %d",
                xml_get_current_line_number($xml_parser),
                xml_get_current_column_number($xml_parser)));
        }
    }

When we've finished parsing, we free the XML parser and return the global $books array:

of the selected book:


function printBookInfo($titleValue, $authorsValue, $isbnValue, $priceValue, $currencyValue) {

function searchBookByISBN($books, $isbn) {

for($i = 0; $i < count($books); $i++) {


<?php require("common.php");

Within the table, we require the common.php file and call readBookInfo() to parse the XML document. This returns an array containing the details of all the books in the file, which we store in the variable $books:

} ?>

$books = readBookInfo();


If the search category is "ISBN", we print the details of the book with $searchKeyword as the value of its <isbn> element:

} elseif (strcmp($searchBy, "author")== 0) {

for ($i=0; $i<count($books); $i++) {

$authorsValue = $books[$i]["authors"];

for($j=0; $j<count($authorsValue)-1; $j++) {

if (strcmp(strtolower(trim($authorsValue[$j])),

strtolower(trim($searchKeyword))) == 0) printBookInfo($books[$i]["title"],

} else if (strcmp($searchBy, "title") == 0) {

for ($i=0; $i<count($books); $i++) {

if (strstr(strtolower(trim($books[$i]["title"])),

strtolower(trim($searchKeyword))))

printBookInfo($books[$i]["title"],

$books[$i]["authors"], $books[$i]["isbn"], $books[$i]["price"],
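The search fragments above are incomplete in this copy. The two comparisons they use can be sketched as standalone functions: a case-insensitive substring match for titles and a case-insensitive exact match for authors. The function names, the guard against an empty keyword and the sample $books array are assumptions for illustration; only the array keys follow the $books[$i]["title"] usage in the text.

```php
<?php
// Hypothetical helper: return every book whose title contains the keyword.
function searchBooksByTitle($books, $keyword)
{
    $matches = array();
    $needle = strtolower(trim($keyword));
    foreach ($books as $book) {
        // Case-insensitive substring match; an empty keyword matches nothing.
        if ($needle !== '' && strstr(strtolower(trim($book['title'])), $needle) !== false) {
            $matches[] = $book;
        }
    }
    return $matches;
}

// Hypothetical helper: return every book that lists the given author.
function searchBooksByAuthor($books, $keyword)
{
    $matches = array();
    $needle = strtolower(trim($keyword));
    foreach ($books as $book) {
        foreach ($book['authors'] as $author) {
            // Exact match on the whole name, ignoring case and outer whitespace.
            if (strcmp(strtolower(trim($author)), $needle) == 0) {
                $matches[] = $book;
                break;
            }
        }
    }
    return $matches;
}

// Made-up sample data.
$books = array(
    array('title' => 'Java Servlets Explained', 'authors' => array('A. One', 'B. Two'), 'isbn' => '111'),
    array('title' => 'PHP and XML', 'authors' => array('C. Three'), 'isbn' => '222'),
);

foreach (searchBooksByTitle($books, 'Java') as $book) {
    echo $book['title'], "\n";
}
```

Returning the matches, rather than printing inside the loop, leaves the choice of output format to the calling page.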


<?php require("common.php");

The data in the table of contents XML files has a different structure from that of the file containing the book details, so we will define another set of global variables for parsing this XML file:

$currentTag1 = "";       // Name of the element that is being parsed
$chapters = array();     // Array of the values of chapter elements
$chapterNo = 0;          // Variable used to populate the $chapters array
$appendixes = array();   // Array of the values of appendix elements
$appendixNo = 0;         // Variable used to populate the $appendixes array

We will also need new handler functions for the parser. The handler for the opening element tags simply assigns the name of the current element to the $currentTag1 global variable:

function startElement1($parser, $name, $attr) { global $currentTag1;

} else if (strcmp($name, "appendix") == 0) { $appendixNo++ ;

} }

In the character data handler for the parser, we store the value of $data in the appropriate array:

function characterData1($parser, $data) {
    global $chapters, $chapterNo, $appendixes, $appendixNo, $currentTag1;

    if (strcmp($currentTag1, "chapter") == 0) {
        $chapters[$chapterNo] = $data;
    } else if (strcmp($currentTag1, "appendix") == 0) {
        $appendixes[$appendixNo] = $data;
    }
}
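For comparison, here is a self-contained sketch of reading a table of contents into $chapters and $appendixes. It departs from the listing above in one respect: text is collected in a $text buffer and stored when the element closes, which is an assumption of this sketch rather than the book's method. The sample document reuses a few titles from the listing and takes the element name appendixes from the DTD.

```php
<?php
$tocXml = '<toc><chapters>'
        . '<chapter>Introducing XML</chapter>'
        . '<chapter>Connection Pooling</chapter>'
        . '</chapters><appendixes>'
        . '<appendix>UML Tutorial</appendix>'
        . '</appendixes></toc>';

$chapters = array();
$appendixes = array();
$currentTag1 = '';
$text = '';   // buffer for the text of the element being read

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

xml_set_element_handler(
    $parser,
    function ($parser, $name, $attr) use (&$currentTag1, &$text) {
        $currentTag1 = $name;
        $text = '';
    },
    function ($parser, $name) use (&$currentTag1, &$text, &$chapters, &$appendixes) {
        // Store the buffered text once the element is complete.
        if (strcmp($name, 'chapter') == 0) {
            $chapters[] = trim($text);
        } elseif (strcmp($name, 'appendix') == 0) {
            $appendixes[] = trim($text);
        }
        $currentTag1 = '';
    }
);

xml_set_character_data_handler($parser, function ($parser, $data) use (&$currentTag1, &$text) {
    // Only text inside chapter and appendix elements is kept.
    if ($currentTag1 === 'chapter' || $currentTag1 === 'appendix') {
        $text .= $data;
    }
});

xml_parse($parser, $tocXml, true);

print_r($chapters);
print_r($appendixes);
```

Buffering avoids losing part of a title when the parser delivers the text of one element in more than one call.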

Next we parse the XML file containing the details of all the books and find the book with an <isbn> element which has the same value as :
