Document information

Title: You Know Nothing
Subject: PHP / MySQL Internet Solutions
Genre: Editorial
Year of publication: 2000
City: Ann Arbor
Pages: 63
File size: 2.96 MB



Jaws 0.5: Just When You Thought it was Safe to Go Back in the Water

An Advanced PHP & MySQL Hit Counter

You Know Nothing

Software development is humbling. Just when you think you’ve got a solid handle on every last (important) bit of technology you need to complete the project at hand, you’re often slapped in the face with the news that you’re just plain wrong. This news can be both frustrating and encouraging (at the same time, believe it or not).

Let me set the scene. Your team has been commissioned with adding a new section to your corporate intranet. In the course of the addition, you adopt a new technology of some sort. Perhaps this is a new database abstraction layer, or a different manner of handling HTML forms. It could be anything; it doesn’t really matter. Your team has worked on this new module for two months. You’ve put all of your collective knowledge and experience into the project. The launch date is in a couple of days, and you’re actually going to make your deadline.

So, this sounds pretty good so far; what could go wrong? Perhaps one of the directors is about to walk in with a must-have feature that needs to be in the next release, and will disrupt your schedule? Sure. This happens all the time, but it’s not the scenario I’m thinking of; that’s just frustrating, and rarely the least bit encouraging. The bad situation that I’m thinking of is (oddly) free of managerial influence.

This new technology that you’ve adopted is really great. It has a few problems, but you’ve managed to work around them. All things considered, it’s saved you many hours in the course of the past few weeks, and you’ve been bragging about it to your developer friends who work at different companies.

Then, in the course of your daily, duly-diligent reading of various PHP news sources, you discover a brand-new, just-released-yesterday extension that could replace this other new technology you’ve already adopted. Not only is it a suitable replacement, but it solves all of the problems you had to work around, and also opens the door to new possibilities that you didn’t even consider.

Frustrating because you’re about to release a critical project that encompasses technology that you’ve just discovered is inferior. But encouraging because you’re now awaiting the day you’re allowed to rip out all of that legacy (but, ironically, not-yet-released) code and employ a superior product.

So, what’s my point? Simple: I know nothing. What I think I know is only temporary, and could be supplanted at any moment. My life as a developer is a constant journey of staying on top of things, and no matter how much I think I “have it covered,” there’s always something new about to appear on the weblog, newsgroup, or source repository of tomorrow.

I hope the articles in this issue open your eyes to new ideas, especially the XMLPull article, which covers what I think is pretty sweet new (well, newer) technology. I also hope that it’s not too late to incorporate these ideas into your current (or next) project.

php|architect

Volume IV - Issue 5 May, 2005

Graphics & Layout

Markus Nix

php|architect (ISSN 1709-7169) is published twelve times a year by Marco Tabini & Associates, Inc., P.O. Box 54526, 1771 Avenue Road, Toronto, ON M5M 4N5, Canada.

Although all possible care has been placed in assuring the accuracy of the contents of this magazine, including all associated source code, listings and figures, the publisher assumes no responsibility with regard to the use of the information contained herein or in all associated material.

Contact Information:

Copyright © 2003-2005 Marco Tabini & Associates, Inc. All Rights Reserved

Solar 0.2.0

paul-m-jones.com announces the release of Solar 0.2.0.

What is it? According to solarphp.com: "Solar is a simple object library and application repository (that is, a combined class library and application component suite) for PHP5."

"Solar provides simple, easy-to-comprehend classes and components for the common aspects of web-based rapid application development, all under the LGPL."

"Solar is designed for developers who intend to distribute their applications to the world. This means the database driver functions work exactly the same way for each supported database. It also means that localization support is built in from the start." Get all the latest info from solarphp.com.

phpBB 2.0.14

The phpBB Group announces the release of phpBB 2.0.14, the "We know we are (not) furry" edition. "This release addresses some bugfixes as well as fixing some minor non-critical security issues. All issues not reported to us before being released are not credited to the founder, as usual."

"As with all new releases, we urge you to update as soon as possible. You can, of course, find this download on our downloads page (http://www.phpbb.com/downloads.php). As usual, three packages are available to simplify your update."

"The Full Package contains entire phpBB2 source and English language package."

For more information visit: http://phpbb.com

Vogoo-API.com is happy to announce the release of Vogoo PHP API 0.8.2.

Vogoo-API.com announces: Vogoo PHP API v0.8.2 is a free PHP API licensed under the terms of the GNU GPL. With Vogoo PHP API, you can easily and freely add professional collaborative filtering features to your Web Site.

v0.8.2 features

• Handles all member/product votes (available since v0.8)
• Fast computation of similarities between members (available since v0.8)
• One-to-one product recommendations (available since v0.8)
• Ability for members to specify when they are not interested in a product recommendation

Planned features for future versions

• New engine based on product recommendations that gives better performance when little information is available on the member
• Real time targeted ads
• Handles multiple product categories
• Collaborative filtering features available for non-member visitors
• Administration tool
• Engine for 'related sales'

Check out Vogoo-API.com for all the latest info.

The Zend PHP Certification Practice Test Book is now available!

We're happy to announce that, after many months of hard work, the Zend PHP Certification Practice Test Book, written by John Coggeshall and Marco Tabini, is now available for sale from our website and most book sellers worldwide!

The book provides 200 questions designed as a learning and practice tool for the Zend PHP Certification exam. Each question has been written and edited by four members of the Zend Education Board, the very same group who prepared the exam. The questions, which cover every topic in the exam, come with a detailed answer that explains not only the correct choice, but also the question's intention, pitfalls and the best strategy for tackling similar topics during the exam.

For more information, visit http://www.phparch.com/cert/mock_testing.php

Check out some of the hottest new releases from PEAR.

MDB2_Schema 0.2.0

PEAR::MDB2_Schema enables users to maintain RDBMS independent schema files in XML that can be used to create, alter and drop database entities and insert data into a database. Reverse engineering database schemas from existing databases is also supported. The format is compatible with both PEAR::MDB and Metabase.

MDB2 2.0.0beta4

PEAR MDB2 is a merge of the PEAR DB and Metabase php database abstraction layers.

Note that the API will be adapted to better fit with the new PHP 5-only PDO before the first stable release.

It provides a common API for all supported RDBMS The main difference to most other DB abstraction packages is that MDB2 goes much further to ensure portability Among other things, MDB2 features:

• An OO-style query API

• A DSN (data source name) or array format for specifying database servers

• Datatype abstraction and on demand datatype conversion

• Portable error codes

• Sequential and non sequential row fetching as well as bulk fetching

• Ability to make buffered and unbuffered queries

• Ordered array and associative array for the fetched rows

• Prepare/execute (bind) emulation

• Sequence emulation

• Replace emulation

• Limited Subselect emulation

• Row limit support

• Transactions support

• Large Object support

• Index/Unique support

• Module Framework to load advanced functionality on demand

• Table information interface

• RDBMS management methods (creating, dropping, altering)

• RDBMS independent xml based schema definition management

• Reverse engineering schemas from an existing DB (currently only MySQL)

• Full integration into the PEAR Framework

• PHPDoc API documentation

DataObject's links.ini file correctly, it will also automatically detect if a table field is a foreign key and will populate a selectbox with the linked table's entries. There are many optional parameters that you can place in your DataObjects.ini or in the properties of your derived classes, that you can use to fine-tune the form-generation, gradually turning the prototypes into fully-featured forms, and you can take control at any stage of the process.

Net_GeoIP 0.9.0alpha1

A library that uses Maxmind's GeoIP databases to accurately determine geographic location of an IP address.


Looking for a new PHP Extension? Check out some of the latest offerings from PECL.

While colorer is primarily designed for use with text editors, it can be also used for non-interactive syntax highlighting, for example, in web applications. This PHP extension provides basic functions for syntax highlighting.

pivotal to the increasing demand for Open Source software. Topics include Scalable Internet Architectures, Web Services, PHP, mod_perl, Apache HTTP Server, Java, XML, Subversion, and SpamAssassin.

The three main conference days offer a wide range of beginner, intermediate and advanced sessions. ApacheCon attendees have more than 70 sessions to choose from, to learn firsthand the latest developments of key Open-Source projects including the Apache HTTP Server, the world's most popular web server software.

With plenty of room for networking and peer discussions, attendees can meet ASF Members and participants during the ApacheCon Expo, evening events, Birds Of a Feather sessions and a number of informal social gatherings."

For more information visit: http://www.apachecon.com/

VS.Php 1.1.1

Jcx.Software brings news of the immediate availability of VS.Php version 1.1.1. This update adds support for PhpDoc commenting, secure ftp deployment capabilities and many bug fixes.

PhpDoc is a powerful feature of PHP that allows the developer to add comments to the source code that can be used to generate documentation. VS.Php uses this information to provide better intellisense content. For instance, VS.Php is able to parse those comments to determine what type a particular variable is. Intellisense uses this information to better help the developer. This update also adds support for the secure ftp protocol for deploying applications through a secure connection.

For information or to download VS.Php, visit:

http://www.jcxsoftware.com/

PHPEdit 1.2

PHPEdit proudly announces the release of the latest version, PHPEdit 1.2.

Next major version of PHPEdit is finally available for download. This version includes lots of changes in its internals, and adds new, powerful features to the IDE, like complete PHP5 support, real-time syntax checking, jump to declaration, SimpleTest integration, new document templates, phpDocumentor Wizard and lots of enhancements in existing tools like CodeHint, CodeInsight and CodeBrowser.

This version is available for free to all our customers. You can download it and test it for 30 days. You can also buy a license to avoid the time limit.

To grab the latest version, visit

http://www.waterproof.fr/products/PHPEdit/

The following methodology was motivated by a request from a client of mine who asked me to provide a web page access counter for their main corporate web site. A condition of the deal, though, was that they did not want to show the actual number of accesses publicly on the web site itself. Instead, they wanted to keep track of this data privately.

Their reasons for omitting a public counter were in keeping with the idea that they did not want to broadcast the activity on their site to all visitors, and, in keeping with the tone of their message, did not desire to display a typical web page access counter on their site. Instead, they wanted an access counter that would provide them with a means of comparing and contrasting the number of accesses from day to day so that they could analyze advertising impacts on the number of visitors who were hitting their site.

As you may know, numerous types of Web counters exist that are wide ranging in their capabilities and styles. However, I wanted to tailor a solution for my client that would keep track of the number of accesses to their site, while providing a tool to view these data in a manner that was meaningful and comparative. The output would provide an at-a-glance summary that would allow my client to assess the effectiveness of advertising campaigns with respect to changes in site activity.

What developed was a custom hit counter which continues to evolve over time; an example screenshot can be seen in Figure 1. The benefits of this hit counter are not so much in its uniqueness as in the possibilities it offers to the average PHP developer who is interested in evolving their skills in the domain of PHP, MySQL, and user interface design.

REQUIREMENTS

(5.0.4 available)

OS: Win2K Prof, Win2K Advanced Server, WinXP SP1/SP2

or greater (4.1 available)

The Anatomy of a Hit

An Advanced PHP & MySQL Hit Counter

by John R. Zaleski, Ph.D.

The combined approach of capturing web page access, and charting the results provides a simple standalone capability for graphically displaying hit counts to a web site that requires only a basic working knowledge of PHP and MySQL, yet provides a basic model for expanding and developing a much more sophisticated counter. Furthermore, the methodology for charting the hit count data can be decoupled from basic web page access counting for use in academic, business, or other types of data mining applications where data charting and mining provide a unique way of comparing and contrasting data as they change over time.

The counter and graphing methodology I provide here are very simple to understand and can be modified and used for many applications, even beyond web page access counting.

Calling the Hit Counter

The visual hit counter methodology consists of two separate pieces of code: one for incrementing hit count statistics on a web page, and another for analyzing and mining those statistics for relevant value. The decision to separate these two sets of functionality is somewhat heuristic, but it is borne out by logic: by separating the processing from the actual hit counting, we remove the potential performance impacts associated with database access for each visit to a web page. Instead, we assign the analytical data mining of the statistics themselves to a web site dedicated to their study. This has the overall effect of reducing the load time of the original web site so that users are not impacted.

To implement the data collection part of the process, the initial step in any web page involves incorporating the following lines of code:

<!-- Add the client hit counter -->
<?php include "./hc.php"; ?>
<!-- End body tag -->

The hc.php file is then included in the web page, at the desired location. Those wishing to make use of this methodology need only include the above code segment in their PHP page (once all supporting files have been uploaded to the server), and the hit counter becomes operational.

The hc.php code contains the logic to open a data file (hitcounter.dat), increment a counter, and store various other statistics to the opened file each time a web page with the preceding include statement is encountered.

We begin the code in hc.php by assigning the name of the data file to the variable $COUNT_FILE.

If the file referred to by $COUNT_FILE exists, and already contains data, we can assume the contents are the results of previous page accesses. So, we read the contents of the entire file. Upon reading the last value, I assign the content to the $contents variable, increment the value by 1, and append the new value to the hitcounter.dat file.

If this is the first time the web page has been accessed, the file is empty (or the file does not exist), so we have to create the file and write new data to it. In addition to simply writing the current counter value, I also write the date and time stamp; this is to facilitate the data mining process. The hitcounter.dat file has the following format:

[1] 23 14 45 PM Wednesday July 28th 2004 1
[2] 06 19 09 AM Thursday July 29th 2004 2
[3] 08 29 13 AM Thursday July 29th 2004 3

Note that much more information can be added (such as the identity of those accessing the web page). However, that code would need to be added to the structure of the hit count listing. The code fragment responsible for writing the output listing above is:

fwrite($fp, "[" . $counter . "] " . date("h:i A l F dS Y") . " " . $counter . " \n");

The entire code listing for the hit counter is contained in Listing 1. It is important to set the permissions to permit the hc.php file to read and write files in the directory in which it is placed. If this is not done properly, the script will be unable to write to the hitcounter.dat file.
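The counter logic described in this section can be sketched in a few lines. This is only an illustration under stated assumptions, not the author's Listing 1: the function name recordHit(), its parameters, and the date format string (inferred from the sample hitcounter.dat records, which show hour, minute and second as separate space-delimited fields) are all hypothetical.

```php
<?php
// Minimal sketch of the hit counter logic; NOT the article's Listing 1.
// recordHit() and the 'H i s A l F dS Y' format are assumptions made for
// illustration, chosen to reproduce the sample records shown in the text.

/**
 * Append one hit record to the data file and return the new counter value.
 */
function recordHit(string $countFile, ?int $now = null): int
{
    $now = $now ?? time();
    $counter = 0;

    // If the file exists and has content, the last field of the last
    // non-empty line holds the previous counter value.
    $lines = is_file($countFile)
        ? file($countFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
        : [];
    if (!empty($lines)) {
        $fields = explode(' ', trim(end($lines)));
        $counter = (int) end($fields);
    }
    $counter++;

    // Space-delimited record: "[n] HH MM SS AM/PM Weekday Month DDth YYYY n".
    $record = '[' . $counter . '] ' . date('H i s A l F dS Y', $now)
        . ' ' . $counter . "\n";

    // FILE_APPEND creates the file on first use; LOCK_EX serializes writers.
    file_put_contents($countFile, $record, FILE_APPEND | LOCK_EX);

    return $counter;
}

// Usage: two hits against a fresh temporary file.
$file = tempnam(sys_get_temp_dir(), 'hc');
recordHit($file);
echo recordHit($file), "\n"; // prints 2
unlink($file);
```

Reading the file with file() rather than checking filesize() first sidesteps PHP's stat cache, which can report a stale size when the same script writes and then re-checks the file.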

Plotting Preliminaries

Plotting preparation is accomplished using the siteIndex.php file (Listing 2). As I explained earlier, I had opted to create the hit counter method independently of the plotting code to decouple the hit counter method from the database. This serves several purposes. First, it allows those interested in just a plain hit counter to implement it without requiring them to master the techniques of database connectivity. Second, this takes performance considerations into account by avoiding database access during the counter incrementing process. Third, and finally, this enables the user to alter and improve the plotting routine independently of the hit counter so that accurate statistics can continue to be kept by keeping the index page intact.

It will be noted that in the hit counter method I developed in Listing 1, there is no direct output of the number of hits to the Web page. This is a matter of choice for the Web page owner. Sometimes individuals perceive that, if the count is too low, this can bode poorly for return visits, while others believe that the hit count statistic may be seen as inappropriate or tacky for the particular site. I manage several sites for local businesses, and I have experienced both kinds of sentiments from the business owners. Thus, by creating this separate method, and only publishing the link to a site that is not directly associated with the web index page and its child links, the business owners can privately view the web page statistics to determine how many accesses have been made. They can also view when these hits occurred, in the course of the past weeks and months, and correlate the data to external events (for instance, during periods of specific types of advertising).

Figure 1

Updating the Database

I begin by opening a connection to the database and entering all existing data from the hit counter method into it. This is accomplished in the siteIndex.php code:

$conn = mysql_connect("localhost", "root", "admin");

In the examples I provide, everything is run on the local machine (localhost), and I have set the username and password to root and admin, respectively. The name of the database instance can be arbitrarily defined by the user; I chose sitestats. Developers have their own naming conventions, and I’m merely giving you some insight into my own. So, selecting the appropriate database is accomplished via the following statement (as it appears in Listing 2):

mysql_select_db("sitestats", $conn) or die("Could not open sitestats: " . mysql_error());

The query in Listing 2 (select * from $table) allows me to determine the current number of rows contained in the table; this will be necessary later. In addition, I load an array with the data that I just read. To plot the data, I need it in a form that I can manipulate in memory:

while ($newArray = mysql_fetch_array($qry)) {
    $visits = $newArray['visits'];
    if (strcmp($debug, "yes") == 0)
        echo " maxVisits = " . $maxVisits . " value from db = " . $visits . "<br>";
    if ($visits > $maxVisits) $maxVisits = $visits;
}

Listing 1 (excerpt)

// If file exists, and has content, read that content,
// extract the counter value, add 1 to it, and re-write
// to the counter data file
//

if ( filesize( $COUNT_FILE ) > 0 ) {
    $contents = fread( $fp, filesize( $COUNT_FILE ) );
    if ( $debug == 1 ) echo $contents;

// ...

// If file exists, but has no content, this means it is
// the first time the counter is being used. In this
// instance, write the counter number and the date/time
// stamp to the hit counter file, with the counter

// ...

if ( $debug == 1 ) echo "[" . $counter . "] "
    . date( "h:i A l F dS, Y" );

If a value read from the database exceeds the running maximum, we adjust our old maximum to reflect the current value. The array variable, $visits, now contains all of the data from the database. Therefore, $visits is a multi-dimensional array that allows us to keep track of all of this data. The time has come to read the hitcounter.dat file and determine what’s new so that this can be added to the database, and the $visits array. The hitcounter.dat file is opened and its records are stored in a new temporary array, $fileElements.

The explode function is very useful in expanding the elements read from the data file into separate fields that are then assigned to the $fileElements array. This is simple because the field delimiter in the hitcounter.dat file is the space character.
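As a small illustration of the splitting step, the sketch below turns one hitcounter.dat record into named fields. The helper name parseHitRecord() is hypothetical; the field positions (1 to 3 for the time, 5 to 8 for the date, 9 for the visit count) follow the indices used in the Listing 2 excerpt.

```php
<?php
// Sketch only: parseHitRecord() is a hypothetical helper, not part of the
// article's listings. Field positions mirror the indices used in Listing 2.

/**
 * Split one space-delimited hit record into named fields.
 * Returns null when the line does not contain all ten fields.
 */
function parseHitRecord(string $line): ?array
{
    $f = explode(' ', trim($line));
    if (count($f) < 10) {
        return null; // incomplete or malformed record
    }
    return [
        'hour'       => $f[1],
        'minute'     => $f[2],
        'second'     => $f[3],
        'DayofWeek'  => $f[5],
        'Month'      => $f[6],
        'DayofMonth' => $f[7],
        'Year'       => $f[8],
        'visits'     => (int) $f[9],
    ];
}

$record = parseHitRecord('[2] 06 19 09 AM Thursday July 29th 2004 2');
echo $record['DayofWeek'], ' ', $record['visits'], "\n"; // prints "Thursday 2"
```

Returning null for short lines is a design choice of this sketch; it lets a caller skip a trailing empty line instead of reading undefined array offsets.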

The next step in the process involves locating the current position in the database and determining how many new data points need to be added. Then, we locate where to begin entering data into the database table. This is accomplished by reading the hitcounter.dat file and comparing the maximum number of visits last recorded in the database with the associated visit data contained in the data file. When the two are equal, the point has been reached in the data file wherein the last entry was made to the database. Any data contained beyond this point represents new information that must be inserted into the instance. This defines the starting index for future inserts into the database, which we fill using a for loop as follows:

for ($k = $startIndex + 1; $k < sizeof($data) - 1; $k++) {
    // ...
    $sql = "insert into sitevisits (visit_ID, hour, minute, second, DayofWeek, Month, DayofMonth, Year, visits) values ('', '$hour', '$minute', '$second', '$DayofWeek', '$Month', '$DayofMonth', '$Year', '$visits')";
    // ...
}

The values are taken from the variables $hour, $minute, $second, $DayofWeek, $Month, $DayofMonth, $Year, and $visits.
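The start-index search described above can be sketched independently of the database. The function name newRecordsSince() is hypothetical, and the sketch assumes each record has already been split into fields with the visit count at index 9.

```php
<?php
// Sketch only: newRecordsSince() is a hypothetical helper. It assumes every
// element of $fileElements is the result of explode(' ', $line), with the
// visit count stored at index 9.

/**
 * Return the records that come after the one whose visit count equals
 * the maximum already stored in the database.
 */
function newRecordsSince(array $fileElements, int $maxVisits): array
{
    $fileElements = array_values($fileElements); // ensure 0-based offsets
    $startIndex = -1; // -1: nothing from the file is in the database yet

    foreach ($fileElements as $k => $fields) {
        if (isset($fields[9]) && (int) $fields[9] === $maxVisits) {
            $startIndex = $k;
            break;
        }
    }

    // Everything beyond the matching record is new.
    return array_slice($fileElements, $startIndex + 1);
}

$records = [
    explode(' ', '[1] 23 14 45 PM Wednesday July 28th 2004 1'),
    explode(' ', '[2] 06 19 09 AM Thursday July 29th 2004 2'),
    explode(' ', '[3] 08 29 13 AM Thursday July 29th 2004 3'),
];
echo count(newRecordsSince($records, 1)), "\n"; // prints 2
```

Note one limitation of this sketch: if no record matches a non-zero maximum (for example, because the data file was truncated), every record is treated as new, so a real implementation would need to guard against duplicate inserts.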

Querying Results

Listing 3 is what I’ll call queryDb.php, one of the plotting workhorses of the methodology. I start by performing a general query (select * from $table, as in Listing 3) and fetching all data within the database.

Then, I assign these data to an array:

while ($newArray = mysql_fetch_array($qry) ) {

I scale the plotting of the individual bars to the current maximum value contained within the database. This is logical because over time, as more data accumulates, the overall maximum number of visits increases. It is therefore necessary to scale all data by the new maximum value so that earlier hit count recordings will display proportionally with respect to one another. Furthermore, since the maximum number of visits is (logically) always represented by the last data element within the database, it follows that we need to scale based on this last element.

Listing 2 (excerpt)

// Open the db connection to sitestats
// and look at the last entry

// ...

mysql_select_db( "sitestats", $conn )
    or die( "Could not open sitestats: " . mysql_error() );

// ...

$check = "select * from $table";
$qry = mysql_query( $check )
    or die( "Could not match data: " . mysql_error() );
$nRows = mysql_num_rows( $qry );
$maxVisits = 0;

while ( $newArray = mysql_fetch_array( $qry ) ) {
    $visits = $newArray['visits'];

    if ( strcmp( $debug, "yes" ) == 0 )
        echo " maxVisits = " . $maxVisits
            . " value from db = " . $visits . "<br>";

// ...

// Open the db connection to sitestats
// and prepare to insert data

// ...

mysql_select_db( "sitestats", $conn )
    or die( "Could not open sitestats: " . mysql_error() );

// ...

// Determine where to begin new data entry into database,
// based on what is contained in the hitcounter file
//********************************************************

// ...

$sql = "insert into sitevisits (visit_ID, hour,
    minute, second, DayofWeek, Month, DayofMonth,
    Year, visits) values ('', '$hour', '$minute',
    '$second', '$DayofWeek', '$Month', '$DayofMonth',

// ...

Thus, I define a maximum width using the variable $graphWidthMax = 400 pixels. Now, I need to define the height of each bar (that is, the width in the vertical sense), which I’ve arbitrarily assigned to be $barHeight = 10 pixels, and the absolute maximum width of each bar, taken as the latest data entry in the sitestats database table: $barMax = $dbElements[$nRows-1][4];

I also need to define the number of rows to plot on a given web page. This is an important feature because the number that should be plotted is related to each bar’s width as well as the resolution of the screen and the ability of the user to see the data clearly without having to use the scroll bar. Scrollbars can become a nuisance, too, if the user is continually moving them to see all data. Hence, one requirement which I imposed was to keep all of the data within the eye span of the user. So, I opted for a relatively low count in terms of bars per page. Now, since I will only be plotting 10 bars per page, I need to come up with a mechanism for allowing the user to move to a new page and show the next 10 bars in the database. I therefore defined variables to keep track of the starting row and the ending row on any given page. These quantities are represented as follows:

$numberRowsToPlot = 10;

$startRow = 0;

$endRow = $startRow + $numberRowsToPlot;

These equations will become important shortly. First, let’s plot the first 10 rows of data. We do this in a for loop, like this:

for ( $i = $startRow; $i < $endRow; $i++ ) {

$countVal = intval( $dbElements[$i][4] );

$barWidth = $graphWidthMax * $countVal/$barMax;

//

}

I begin with the $startRow on the page and end with the first $endRow. I retrieve the counter value at the current index $i of the $dbElements array and assign it to the variable $countVal. I then scale the $barWidth in proportion to the maximum graphing width (defined earlier as 400 pixels) normalized by the maximum number of hits. This gives me a proportional width with respect to the 400-pixel limit within the plotting frame (here, the web page itself).
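The scaling and paging arithmetic above can be checked with a short sketch. The function names barWidth() and pageBounds() are hypothetical; rounding to whole pixels, the zero-maximum guard, and clamping the end row to the number of available rows are additions of this sketch rather than part of the article's listing.

```php
<?php
// Sketch only: barWidth() and pageBounds() are hypothetical helpers.
// Rounding, the zero guard and the clamping are assumptions of this sketch.

/**
 * Scale a hit count to a bar width in pixels, relative to the largest count.
 */
function barWidth(int $countVal, int $barMax, int $graphWidthMax = 400): int
{
    if ($barMax <= 0) {
        return 0; // no data yet; avoid division by zero
    }
    return (int) round($graphWidthMax * $countVal / $barMax);
}

/**
 * Compute [startRow, endRow) for one page, never running past $nRows.
 */
function pageBounds(int $startRow, int $numberRowsToPlot, int $nRows): array
{
    $startRow = max(0, min($startRow, max(0, $nRows - 1)));
    $endRow = min($startRow + $numberRowsToPlot, $nRows);
    return [$startRow, $endRow];
}

echo barWidth(50, 200), "\n";                    // prints 100
echo implode('-', pageBounds(20, 10, 25)), "\n"; // prints 20-25
```

With a count of 50 and a maximum of 200, the bar takes one quarter of the 400-pixel frame, which is the proportional behavior the text describes.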

You’ll note from Figure 1 that data are printed alongside the bars, including the value of a particular bar width. This is done in a straightforward manner by simply encapsulating the printing of the data within a table, as columns within that table. This ensures uniform spacing and alignment of the data within the cells.

Without going into all of the details (because Listing 3 provides the explicit implementation), the key elements of this plotting process are as follows: create a table, enter the data values into columns via an echo statement, and concatenate multiple columns so that the data are aligned across the page:

echo "<tr>";
echo "<td align=right><font face=arial color=blue size=2>";
echo $dbElements[$i][0] . ",</font></td>";

But how do we actually create the bar? Very easily: wehave a JPG image of a single pixel, and labeledrreeddddoott jjppgg Within the second to last column of thetable we create an image reference to that JPG imageand size it where its width is equal to $$bbaarrWWiiddtthh and its

Listing 2 (cont’d)

185 echo “ startIndex = “ $startIndex

186 “ sizeof(data) = “ sizeof ( $data ) “<br>” ;

187

188 if ( $startIndex + 1 < sizeof ( $data ) ) {

189 $hour = $fileElements [ sizeof ( $data )- 1 ][ 1 ];

190 $minute = $fileElements [ sizeof ( $data )- 1 ][ 2 ];

191 $second = $fileElements [ sizeof ( $data )- 1 ][ 3 ];

192 $DayofWeek = $fileElements [ sizeof ( $data )- 1 ][ 5 ];

193 $Month = $fileElements [ sizeof ( $data )- 1 ][ 6 ];

194 $DayofMonth = $fileElements [ sizeof ( $data )- 1 ][ 7 ];

195 $Year = $fileElements [ sizeof ( $data )- 1 ][ 8 ];

196 $visits = $fileElements [ sizeof ( $data )- 1 ][ 9 ];

197

198 $sql = “insert into siteVisits (hour, minute, second,

199 DayofWeek, Month, DayofMonth, Year, visits) values

200 (‘$hour’, ‘$minute’, ‘$second’, ‘$DayofWeek’,

201 ‘$Month’, ‘$DayofMonth’, ‘$Year’, ‘$visits’)” ;

Trang 16

// Open the db connection to sitestats
// and look at the last entry
...
mysql_select_db("sitestats", $conn)
    or die("Could not open sitestats: " . mysql_error());

//*****************************************************
// Note: mysql_fetch_row($qry) retrieves a single row
// mysql_fetch_field($qry, $i) fetches field $i
//*****************************************************

$table = "sitevisits";
$check = "select * from $table";
$qry = mysql_query($check)
    or die("Could not match data: " . mysql_error());
...
if (strcmp($debug, "yes") == 0) echo "<table>";
if (strcmp($debug, "yes") == 0) echo "<th>";
if (strcmp($debug, "yes") == 0) echo "</th>";

$i = 0;
while ($newArray = mysql_fetch_array($qry)) {

    $dow = $newArray['DayofWeek'];
    $mo  = $newArray['Month'];
    $dom = $newArray['DayofMonth'];
    $yr  = $newArray['Year'];
    $vis = $newArray['visits'];
...
$countVal = intval($dbElements[$i][4]);
$barWidth = $graphWidthMax * $countVal / $barMax;

echo "<tr>";
echo "<td align=right><font face=arial color=blue "
    . "size=2>" . $dbElements[$i][0] . ",</font></td>";
echo "<td align=right><font face=arial color=blue "
    . "size=2>" . $dbElements[$i][1] . "</font></td>";
echo "<td align=right><font face=arial color=blue "
    . "size=2>" . $dbElements[$i][2] . "</font></td>";
echo "<td align=right><font face=arial color=blue "
    . "size=2>" . $dbElements[$i][3] . "</font></td>";
print("<td>\n");
echo "<font face=arial color=purple size=2>";
echo "<b>";
print("<img src=\"reddot.jpg\" ");
print("width=\"$barWidth\" height=\"$barHeight\">");
...
<font Style="font-family:arial; font-size:12pt;
font-style: bold; color: #000000;">
Entries: <?php echo $startRow; ?> to
<?php echo $endRow; ?> with
<?php echo $barMax; ?> total rows
</font>
</td>
<td>
<form method="post" action="queryDB1.php">
<input type="hidden" name="startRow"
value="<?php echo $startRow; ?>">
<input type="hidden" name="numberRowsToPlot"
value="<?php echo $numberRowsToPlot; ?>">
<input type="hidden" name="discrim" value="add">
<input type="hidden" name="delta" value="10">
<input type="submit" value=">"
Style="font-family:sans-serif; font-size:10pt;
font-style:bold; background:#4400ff none;

height is equal to $barHeight; the listing does this with two print statements that emit the <img> tag.

At the end of each bar, I print the actual value of the bar by outputting the value of $dbElements[$i][4].
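The bar scaling described above can be sketched as a pair of small functions. This is only an illustration: barWidth() and barTag() are hypothetical names, and the sample numbers are made up. Only $barWidth, $barHeight, $graphWidthMax, $barMax, and reddot.jpg come from the article.

```php
<?php
// Hypothetical helper: scale a count to a pixel width, as the listing does
// with $graphWidthMax * $countVal / $barMax.
function barWidth(int $count, int $barMax, int $graphWidthMax): int
{
    if ($barMax <= 0) {
        return 0; // avoid division by zero when there is no data
    }
    return (int) round($graphWidthMax * $count / $barMax);
}

// Hypothetical helper: build the <img> tag that stretches the one-pixel image.
function barTag(int $width, int $height): string
{
    return sprintf('<img src="reddot.jpg" width="%d" height="%d" alt="">', $width, $height);
}

$graphWidthMax = 400; // assumed maximum bar width in pixels
$barHeight = 10;      // assumed bar height in pixels
echo barTag(barWidth(25, 100, $graphWidthMax), $barHeight), "\n";
```

Rounding to an integer is a choice made here, since width attributes are whole pixels; the article's listing passes the computed value through as-is.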

Getting the Next 10 Rows

At the bottom of Listing 3, there are two forms. I will focus on the first form for the time being. This form accepts the current values of $startRow and $numberRowsToPlot and passes these, as hidden values, to the PHP code in Listing 4 (queryDB1.php). This is shown in the code segment below:

<form method="post" action="queryDB1.php">
<input type="hidden" name="startRow"
value="<?php echo $startRow; ?>">
<input type="hidden" name="numberRowsToPlot"
value="<?php echo $numberRowsToPlot; ?>">
<input type="hidden" name="discrim" value="add">
<input type="hidden" name="delta" value="10">
<input type="submit" value=">"
Style="font-family:sans-serif; font-size:10pt;
font-style:bold; background:#4400ff none;
color: #ccbbcc; height: 2em; width: 2em">
</form>

Key within this form code are the fields named discrim and delta, which are passed as hidden variables from queryDB.php to queryDB1.php. The ASCII text string “add” is assigned to the discrim field. As you’ll see in a moment, this is the key to how the queryDB1.php code displays results: they are posted through the form. These are retrieved within queryDB1.php and applied using the following code:

if ( strcmp($discrim, "add") == 0 ) {
    $startRow = $startRow + $delta;
    $endRow = $startRow + $delta;
    if ( $endRow > $barMax ) {
        $endRow = $barMax;
    }
}

If we click the right-hand arrow in Figure 1 (that is, the “increase” button) then we expect to be presented with the next 10 rows of data. This is accomplished within queryDB1.php by adding the value $delta to the current $startRow and assigning the new $endRow equal to the new $startRow plus $delta. We must be careful if we are at the last few elements of data, because by attempting to add $delta rows to the current $startRow we may, in effect, run off the end of the data table. To accommodate this event, I perform a check on the value of $endRow in relation to $barMax. If $endRow is greater than $barMax, then $endRow is simply set to $barMax. The application of this logic results in the screen snapshot shown in Figure 2, in which the next 10 rows appear.
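As a minimal sketch, the “add” step and its end-of-table check can be written as one function. The name nextWindow() is hypothetical, and the sample values (25 rows, steps of 10) are assumptions for illustration.

```php
<?php
// Sketch of the "add" branch: move the window forward by $delta rows and
// clamp the end of the window to the last row.
function nextWindow(int $startRow, int $delta, int $barMax): array
{
    $startRow = $startRow + $delta;
    $endRow = $startRow + $delta;
    if ($endRow > $barMax) {
        $endRow = $barMax; // do not run past the last row
    }
    return [$startRow, $endRow];
}

[$start, $end] = nextWindow(0, 10, 25);
echo "Entries: $start to $end\n"; // Entries: 10 to 20
[$start, $end] = nextWindow(10, 10, 25);
echo "Entries: $start to $end\n"; // Entries: 20 to 25
```

Like the article's code, this sketch clamps only $endRow. A further click at the end of the table would push $startRow past $barMax, so a production version would probably clamp both.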

In the interest of completeness, it must be noted that code Listings 5, 6, and 7 are those for header.php, logo.php, and footer.php, respectively. These are small files that contain web page header, title, and page closing HTML tags that are included in the main PHP documents.

Getting the Previous 10 Rows

This process continues: located at the bottom of queryDB1.php are three forms. The second form is the

<font Style="font-family:arial; font-size:12pt;
font-style: bold; color: #000000;">
Go to Entry:
</font>
</td>
<td>
<form method="post" action="queryDB1.php">
<input name="startRow" type="text">
<input type="hidden" name="numberRowsToPlot"
value="<?php echo $numberRowsToPlot; ?>">
<input type="hidden" name="discrim" value="val">
<input type="hidden" name="delta" value="10">
<input type="submit" value=">|<"
Style="font-family:sans-serif; font-size:8pt;
font-style:bold; background:#4400ff none;
color: #ccbbcc; height: 3em; width: 3em">

$startRow = $_POST['startRow'];
$numberRowsToPlot = $_POST['numberRowsToPlot'];
$discrim = $_POST['discrim'];
$delta = $_POST['delta'];

$debug = "no";

//***************************************
// Open the db connection to sitestats
// and look at the last entry
...
mysql_select_db("sitestats", $conn)
    or die("Could not open sitestats: " . mysql_error());

//*****************************************************
// Note: mysql_fetch_row($qry) retrieves a single row
// mysql_fetch_field($qry, $i) fetches field $i
...
$qry = mysql_query($check)
    or die("Could not match data because " . mysql_error());
...
if (strcmp($debug, "yes") == 0) echo "<table>";
if (strcmp($debug, "yes") == 0) echo "<th>";
if (strcmp($debug, "yes") == 0) echo "</th>";

$i = 0;
while ($newArray = mysql_fetch_array($qry)) {
    $dow = $newArray['DayofWeek'];
    $mo  = $newArray['Month'];
    $dom = $newArray['DayofMonth'];
    $yr  = $newArray['Year'];
    $vis = $newArray['visits'];
...
$endRow = $startRow + $delta;
if ($endRow > $barMax) $endRow = $barMax;
...
$startRow = $startRow + $delta;
$endRow = $startRow + $delta;
...
$startRow = $startRow - $delta;
$endRow = $startRow + $delta;

same as shown for queryDB.php, in which the variable $delta is added to the current $startRow and $endRow. The first form accommodates the left-hand arrow, and assigns the string “subtract” to the $discrim variable. The code in queryDB1.php is then called recursively. If the user opts to back up ten rows, then there is a “subtract” method that does the following:

if ( strcmp($discrim, "subtract") == 0 ) { // Going down
    $startRow = $startRow - $delta;
    $endRow = $startRow + $delta;
    if ( $startRow <= 0 ) {
        $startRow = 0;
        $endRow = $startRow + $delta;
    }
}

In this instance, the $startRow is decremented by the amount in $delta. The $endRow is still set $delta rows above $startRow. Then, we must accommodate the possibility of decrementing below the start row. The conditional statement handles this event by checking whether the current value of $startRow is less than or equal to zero. If so, assign zero to the $startRow variable, and set the $endRow to zero plus $delta.

Starting at an Arbitrary Row

The third and last form contained in queryDB1.php accommodates the condition in which a user wishes to go to an arbitrary row within the table. This behavior is preferred when, for example, much data exists within the database and the user would like to jump nearly to the end.

In this case, the value for $startRow is assigned directly by the user, through the form, and queryDB1.php is called recursively, again. The value of $discrim picks up the string value “val” from queryDB.php, and uses this to assign the $startRow:

<form method="post" action="queryDB1.php">
<input name="startRow" type="text">
<input type="hidden" name="numberRowsToPlot"
value="<?php echo $numberRowsToPlot; ?>">
<input type="hidden" name="discrim"
value="val">
<input type="hidden" name="delta" value="10">
<input type="submit" value=">|<"
Style="font-family:sans-serif; font-size:8pt;
font-style:bold; background:#4400ff none;
color: #ccbbcc; height: 3em; width: 3em">
</form>

The $startRow variable becomes the point at which values will start to be displayed, and is entered by the user through the form above. Again, queryDB1.php is called recursively, and the $discrim value is set to the string “val”. The code segment that catches this value follows:

if ( strcmp($discrim, "val") == 0 ) { // Go to specific range
    $endRow = $startRow + $delta;
    if ( $endRow > $barMax ) $endRow = $barMax;
}

Listing 4 (cont’d)

echo "<td align=right><font face=arial color=blue "
    . "size=2>" . $dbElements[$i][0] . ",</font></td>";
echo "<td align=right><font face=arial color=blue "
    . "size=2>" . $dbElements[$i][1] . "</font></td>";
echo "<td align=right><font face=arial color=blue "
    . "size=2>" . $dbElements[$i][2] . "</font></td>";
echo "<td align=right><font face=arial color=blue "
    . "size=2>" . $dbElements[$i][3] . "</font></td>";
print("<td>\n");
echo "<font face=arial color=purple size=2>";
echo "<b>";
print("<img src=\"reddot.jpg\" ");
print("width=\"$barWidth\" height=\"$barHeight\">");
...
<font Style="font-family:arial; font-size:12pt;
font-style: bold; color: #000000;">
Entries: <?php echo $startRow; ?> to
<?php echo $endRow; ?> with
<?php echo $barMax; ?> total rows
...
<form method="post" action="queryDB1.php">
<input type="hidden" name="startRow"
value="<?php echo $startRow; ?>">
<input type="hidden" name="numberRowsToPlot"
value="<?php echo $numberRowsToPlot; ?>">
<input type="hidden" name="discrim"
value="subtract">
<input type="hidden" name="delta" value="10">
<input type="submit" value="<"
Style="font-family:sans-serif; font-size:10pt;
font-style:bold; background:#4400ff none;
color: #ccbbcc; height: 2em; width: 2em">
...
<form method="post" action="queryDB1.php">
<input type="hidden" name="startRow"
value="<?php echo $startRow; ?>">
<input type="hidden" name="numberRowsToPlot"
value="<?php echo $numberRowsToPlot; ?>">
<input type="hidden" name="discrim" value="add">
<input type="hidden" name="delta" value="10">
<input type="submit" value=">"
Style="font-family:sans-serif; font-size:10pt;
font-style:bold; background:#4400ff none;
color: #ccbbcc; height: 2em; width: 2em">
</form>
</td>
<td>
<font Style="font-family:arial; font-size:12pt;
font-style: bold; color: #000000;">
Go to Entry:
</font>
</td>
<td>
<form method="post" action="queryDB1.php">
<input name="startRow" type="text">
<input type="hidden" name="numberRowsToPlot"
value="<?php echo $numberRowsToPlot; ?>">
<input type="hidden" name="discrim" value="val">
<input type="hidden" name="delta" value="10">
<input type="submit" value=">|<"
Style="font-family:sans-serif; font-size:8pt;
font-style:bold; background:#4400ff none;
color: #ccbbcc; height: 3em; width: 3em">

The $endRow variable is set to $startRow plus $delta. If the $endRow exceeds the number of rows in the database, it is automatically set to the maximum database row. In this way a user can access any starting row and hop over intermediate values as needed. The data are passed recursively back to queryDB1.php using the following variables, which are retrieved from the form through $_POST: startRow, numberRowsToPlot, discrim, and delta. The values are set based on the user’s selection during the previous call to queryDB1.php. It is possible to augment these statements by incorporating some error checking into the code to verify that the values have been set within the proper ranges. This is merely one suggestion offered to improve the robustness of the methodology.
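One way the suggested error checking might look is sketched below. It is not the article's code: computeWindow() is a hypothetical helper, and the defaults (a delta of 10, a start row of 0) are assumptions. It validates the posted values with filter_var() and then applies the add, subtract, and val branches, clamping the window to the table.

```php
<?php
// Hypothetical helper: validate posted values, apply the requested move,
// and keep the resulting window inside 0..$barMax.
function computeWindow(array $post, int $barMax): array
{
    $delta = filter_var($post['delta'] ?? null, FILTER_VALIDATE_INT,
        ['options' => ['default' => 10, 'min_range' => 1]]);
    $startRow = filter_var($post['startRow'] ?? null, FILTER_VALIDATE_INT,
        ['options' => ['default' => 0]]);
    $discrim = $post['discrim'] ?? 'val';

    if ($discrim === 'add') {
        $startRow += $delta;
    } elseif ($discrim === 'subtract') {
        $startRow -= $delta;
    }

    // Clamp both ends so that bad input cannot move the window off the table.
    $startRow = max(0, min($startRow, $barMax));
    $endRow = min($startRow + $delta, $barMax);
    return [$startRow, $endRow];
}

[$start, $end] = computeWindow(
    ['startRow' => '5', 'delta' => '10', 'discrim' => 'subtract'],
    25
);
echo "Entries: $start to $end\n"; // Entries: 0 to 10
```

In queryDB1.php the call would presumably be computeWindow($_POST, $barMax). Non-numeric input falls back to the defaults rather than raising an error, which is a design choice for this sketch.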

Operation and Data Base Table Structure

For those interested in using this methodology on their own sites, all files are provided for download in the code archive. Figure 3 shows the structure of the sitestats database and the sitevisits table; it contains a screenshot taken from phpMyAdmin, a useful tool for managing MySQL databases. A user wishing to recreate this site counter tool will need to install MySQL on the server and will need to create the database instance and table required to run the code.

Summary

I have intended to provide some insight into how to develop a simple and useful bar-chart based hit counter using PHP and MySQL. The code I have provided is the same as that which I am using on client sites to keep track of access statistics. A user having ordinary skill in the art of PHP and MySQL can take this idea much farther and include many different types of statistics.

The methodology I provide has educational value, as well, by illustrating a simple manner of implementing PHP database connectivity, a capability that is necessary for any type of advanced commercial application. Some additional ideas include adding site statistics on time of day, user identity, and server identity. It is even possible to accommodate statistics for each web page associated with a site, thereby providing details on the popularity of various pages and on whether the site is able to hold the interest of individuals so that they visit other features available at your site.

There is no limit to what you can do.

To Discuss this article:

http://forums.phparch.com/218

John R. Zaleski, Ph.D., is a biomedical systems engineer with 20 years of experience in software development and medical device integration as applied to acute care hospital environments. He has developed and fielded medical products that are currently in use in large acute care hospitals. He has developed products and many applications in Java, PHP, and MySQL and has authored two dozen patent applications and an equal number of refereed publications in the areas of medical device integration, software methods for medical device communication, software performance, and real-time clinical analysis of patient data.

Solving the Unicode Puzzle

by Michael Toppa

Many web sites cannot correctly interpret or display anything other than English language characters. Converting your site to UTF-8 (Unicode) enables you to handle characters from almost any language in the world. However, currently available conversion guidelines typically focus on just a single software product, offering little guidance on how to move UTF-8 encoded data between different products. Configuring your web server, PHP, and your database to support UTF-8 is one thing; configuring them so UTF-8 encoded data moves smoothly between them is another. This article guides you through a UTF-8 conversion using PHP, Oracle, and Apache. It also covers data exports to PDF, RTF, email, and plain text.

Unicode is a single character set designed to include characters from just about every writing system on the planet (and off the planet: even Klingon has been written for Unicode, although it is not part of the official standard). In recent years, Unicode has become more prevalent on the web, and all major web browsers, web servers, programming languages, and databases worth their salt now support it. Switching your web applications to Unicode will give you the ability to correctly handle and display any character from any language you’re likely to encounter.

Understanding the significance of Unicode requires first understanding some basics of character sets, and their history. The first thing you need to know was said best by Joel Spolsky of Joel On Software: “There ain’t no such thing as plain text.” If you don’t know the character set and the encoding that were used in the creation of a string of text, then you won’t know how to display it properly. For modern purposes, the story of character sets starts with ASCII. In the 1960s, unaccented English characters, as well as various control characters for carriage returns, page feeds, etc., were each assigned a number from 0 to 127; there was general agreement on these number assignments, and so ASCII was born. The ASCII characters could fit in 7 bits, and computers used 8-bit bytes, which left an extra bit of space. This led to the proliferation of hundreds of different character sets, with each one using this extra space in a different way. The characters from 0-127 are often referred to as Lower ASCII, and the characters from 128-255 as Upper ASCII or Extended ASCII. Extended ASCII character sets added characters from non-English languages, special characters like copyright symbols, and line-drawing characters to simplify drawing boxes, etc. With all these different versions of extended ASCII floating around, text generated on, say, a computer in Russia would turn into gibberish if you tried to read it on a computer in the US. This happened because the number codes representing the Cyrillic characters were assigned to totally different characters on the US computer. This became a bit of a problem when everyone started using the internet.

Unicode represents an effort to clean up this mess. The Unicode slogan is: “Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.” Unicode can do this because it allows characters to occupy more than one byte, so it has enough room to store characters from languages around the world, even Asian languages that have thousands of characters. With Unicode, it’s particularly important to understand the distinction between a character set and a character encoding. Unicode is a single character set, but there are three different ways to encode it: they are called UTF-8, UTF-16, and UTF-32 (there’s also UTF-7, but it was never officially adopted by the Unicode Consortium, and for the most part it’s been deprecated in favor of UTF-8). The numbers 8, 16, and 32 indicate the bits used for the Unicode code units (a complete character may occupy more than one code unit, so it can be multi-byte). All three encodings can represent any Unicode character, and each has its own advantages and disadvantages depending on what’s important in a particular implementation. In the case of web applications, UTF-8 is the encoding of choice because it stores the lower ASCII characters in a single-byte format. This makes UTF-8 fully compatible with “plain text” that contains only lower ASCII characters, even if you’re clueless about character encoding.
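To make the single-byte versus multi-byte point concrete, here is a small sketch that prints how many bytes a few characters occupy in UTF-8. The sample characters are chosen for illustration only; the byte sequences are written as escapes so the result does not depend on how this file itself is saved.

```php
<?php
// Each value is the UTF-8 byte sequence for one character.
$samples = [
    'A'         => "A",            // U+0041, lower ASCII: one byte
    'e-acute'   => "\xC3\xA9",     // U+00E9: two bytes
    'euro sign' => "\xE2\x82\xAC", // U+20AC: three bytes
];

foreach ($samples as $name => $bytes) {
    // strlen() counts bytes, which is exactly what we want to show here.
    printf("%-10s %d byte(s): %s\n", $name, strlen($bytes), strtoupper(bin2hex($bytes)));
}
```

The lower ASCII character is stored exactly as it would be in an ASCII file, which is why ASCII-only text is already valid UTF-8.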

For the sake of brevity, I’ve glossed over a great number of points related to Unicode and character sets. If you want to learn more, I highly recommend the article The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky, at www.joelonsoftware.com/articles/Unicode.html. It contains links to a number of other good resources as well.

Why Care About Unicode?

As far as Unicode and UTF-8 are concerned, all web sites can be placed in one of three categories: those that don’t need to care about them, those that should convert to UTF-8, and those that should convert to UTF-8 and internationalize.

The most common character set currently in use on the English-speaking side of the web, other than UTF-8, is Western ISO-8859-1 (aka Latin-1). If your site isn’t already using UTF-8, then you’re probably using Latin-1. If you’ve had no problems related to character sets so far, and you have absolutely no foreseeable needs to handle text outside the ASCII range, then you fall into the first category: you probably don’t need to do anything. As you’ll see in the rest of this article, converting to UTF-8 is not a painless process, so you should only undertake the work if you have some clearly identifiable, relevant goals to meet.

Here at the University of Pennsylvania School of Medicine, we fall into the second category: our web sites are in English, but we occasionally handle data from a variety of foreign languages that don’t use the English alphabet. We must receive, store, display, and transmit these characters faithfully. Since we can’t reliably predict what sort of characters might come our way, converting our applications to UTF-8 was the logical choice, since it can handle any language we might need to support.

The third category is for sites that don’t just occasionally handle foreign characters: they actually serve an international audience. In addition to using UTF-8, these sites typically employ various mechanisms that allow visitors to choose the language for displaying content. One important term applied here is internationalization, defined by the W3C as “[t]he process of designing, creating, and maintaining software that can serve the needs of users with differing language, cultural, or geographic requirements and expectations” (see http://www.w3.org/TR/ws-i18n-scenarios/). Another key term is localization: “[t]he tailoring of a system to the individual cultural expectations for a specific target market or group of individuals.” Sites that are able to dynamically perform localization for a variety of target audiences can do so because they’ve been configured with a good internationalization framework.

Internationalization and localization are substantial topics, and are not the focus of this article. However, getting all the various components of your web application environment to play nicely together using UTF-8 is a necessary step before you can even try internationalizing your site. So this article will be of interest to those who only want to handle the occasional non-English characters, and to those who are contemplating fully internationalizing their site.

Getting Ready for UTF-8

The first step is determining the scope of your work. At a minimum, you probably have PHP, a web server, and a database to consider. I’ll cover doing a UTF-8 conversion with PHP, Apache, and Oracle. If you are also using Oracle, then you must read An Overview on Globalizing Oracle PHP Applications at http://www.oracle.com/technology/tech/opensource/php/globalizing_oracle_php_applications.html. It’s an excellent starting point, but, unfortunately, it doesn’t always explain the reasons behind its recommendations, which means you’ll get stuck if things don’t happen to work after you follow its instructions. I’ll try to fill those gaps.

You also have to take a look at any other applications that interact with PHP, your web server, or your database, as they will also be affected by a character set conversion. For us, that included Smarty, PDFlib, and exporting data to RTF, text files, and email, so I’ll discuss those as well. Even if you have a different mix of applications, the concepts I’ll describe are probably applicable to your situation, although the implementation specifics, obviously, will be different.

Configuring Apache, PHP, and Oracle

Most of the time, PHP web applications are run under the Apache web server, which itself is running in a user account (assuming you’re in a Unix-ish environment). So, the first step is to set the environment of this account correctly. Since PHP and Oracle are speaking to each other through this account, it’s crucial to specify the right character set for it, so they both know what to expect. You do this by setting the NLS_LANG environment variable in the Apache configuration. The Oracle Overview document mentioned above says to set it to AL32UTF8, but doesn’t fully explain why. So when this didn’t do the trick for me, I had to do some more research. I looked up the Oracle Character Set descriptions and learned that AL32UTF8 corresponds to Unicode 3.1. After talking with our DBA I learned that our Oracle database was set to Unicode 3.0, which meant I needed to set the NLS_LANG character set to UTF8. Note that we ultimately switched to AL32UTF8, since it corresponds to the latest version of Unicode, and in Oracle it allows for conversion between UTF-16 and UTF-8 (just in case you ever need to do that). The moral of the story is that NLS_LANG should exactly match the character set you’re using in Oracle.
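A quick sanity check from the PHP side can catch a mismatch early. The sketch below assumes that NLS_LANG has the usual LANGUAGE_TERRITORY.CHARSET form (the language and territory part may be omitted). nlsCharset() is a hypothetical helper, and the expected value is an assumption that you would replace with whatever your DBA reports for the database.

```php
<?php
// Hypothetical helper: return the character set part of an NLS_LANG value,
// or null when no character set is specified.
function nlsCharset(string $nlsLang): ?string
{
    $dot = strrpos($nlsLang, '.');
    if ($dot === false || $dot === strlen($nlsLang) - 1) {
        return null;
    }
    return strtoupper(substr($nlsLang, $dot + 1));
}

$expected = 'AL32UTF8'; // assumption: the database character set
$actual = nlsCharset((string) getenv('NLS_LANG'));
if ($actual !== $expected) {
    // Log rather than die, so that the check is harmless in production.
    error_log('NLS_LANG character set is ' . var_export($actual, true)
        . ", expected $expected");
}
```

This only reports what the web server's environment contains; the variable itself still has to be set where Apache starts, as described above.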

What I just said contradicts the advice of the Oracle Overview document, where it says NLS_LANG should be set to match the client (in this case, PHP) but that it doesn’t need to match the database character set. That’s technically true, but a mismatch will quickly lead to trouble if, for example, you try to insert records from PHP that are in an encoding that’s not compatible with the Oracle character set. If you’re going to switch to UTF-8, do it wholeheartedly: set PHP, your web server, and your database all to UTF-8. This will save you the headache of translating character encodings as you move data around.

NLS_LANG is not the end of the story. It applies to the communication between PHP and Oracle, but it doesn’t determine how characters are encoded within PHP, and it doesn’t influence how documents are served by Apache. There are a few different approaches to consider for having Apache and PHP serve your web pages in UTF-8.

If you want all of the documents on your server to default to UTF-8, one option is to set the AddDefaultCharset directive in the Apache configuration to UTF-8. Note, however, that the Apache documentation at http://httpd.apache.org/docs-2.0/mod/core.html does not express enthusiasm about this approach: “AddDefaultCharset should only be used when all of the text resources to which it applies are known to be in that character encoding and it is too inconvenient to label their charset individually. One such example is to add the charset parameter to resources containing generated content, such as legacy CGI scripts, that might be vulnerable to cross-site scripting attacks due to user-provided data being included in the output. Note, however, that a better solution is to just fix (or delete) those scripts…”

If you want all of your PHP-generated content to be served in UTF-8, set default_charset=UTF-8 in your php.ini file. It’s OK if the PHP default_charset is different from what’s specified in Apache AddDefaultCharset: the former will apply only to PHP files, and the latter will apply to everything else.

If you want some (but not all) of your PHP documents served in UTF-8, you don’t have to modify php.ini. Instead, specify UTF-8 as the character set in the Content-type header of those files. It’s important to point out here that you should set this header with the PHP header() function. If you try to set it with an HTML Meta tag, and you’ve used Apache’s AddDefaultCharset directive to specify a different character set, the Apache directive will override your Meta tag.
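A minimal sketch of the per-document approach follows. contentTypeHeader() is a hypothetical helper introduced only to keep the header string in one place; header() and headers_sent() are the standard PHP functions.

```php
<?php
// Hypothetical helper: build the Content-Type header line.
function contentTypeHeader(string $mimeType, string $charset): string
{
    return "Content-Type: $mimeType; charset=$charset";
}

// header() only works before any output has been sent to the browser.
if (!headers_sent()) {
    header(contentTypeHeader('text/html', 'UTF-8'));
}

echo "<p>Gr\xC3\xBC\xC3\x9Fe</p>\n"; // "Grüße" written as UTF-8 bytes
```

Placing the call at the very top of the script, before any whitespace or HTML, avoids the usual “headers already sent” problem.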

Now that you’ve configured how you want documents served, you need to configure PHP so it can internally handle UTF-8. This means enabling multi-byte character support. You’ll need to re-compile PHP with the --enable-mbstring option (unless, of course, you had the foresight to do it previously), and set mbstring.internal_encoding=UTF-8 in your php.ini file.
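If editing php.ini is not an option, the same internal encoding can be selected at run time with mb_internal_encoding(). The snippet below is a sketch that assumes the mbstring extension is loaded.

```php
<?php
// Select UTF-8 as mbstring's internal encoding for this request.
mb_internal_encoding('UTF-8');

// Called without arguments, the function reports the current setting.
echo mb_internal_encoding(), "\n";
```

With this in place, mb_* functions that are called without an explicit encoding argument will treat their input as UTF-8.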

Look over the PHP documentation for multi-byte string functions at http://www.php.net/ref.mbstring. Many of the PHP string functions have multi-byte equivalents. An example is the best way to illustrate what this means. The multi-byte version of strlen() is mb_strlen(). The strlen() function assumes that a character always occupies a single byte, so it actually returns the length of a string in bytes, and does not necessarily indicate the number of characters. In UTF-8, though, a string that is 4 characters long could occupy anywhere from 4 to 16 bytes depending on the presence of multi-byte characters (UTF-8 as currently specified uses at most four bytes per character). The mb_strlen() function will correctly tell you the number of characters in such a string, but the regular strlen() function won’t.
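The difference is easy to see with a short example. This sketch assumes the mbstring extension is available; the sample word is arbitrary, and it is written with byte escapes so that the result does not depend on the encoding of the source file.

```php
<?php
// "naïve": five characters, but the i-diaeresis takes two bytes in UTF-8.
$word = "na\xC3\xAFve";

$bytes = strlen($word);             // counts bytes
$chars = mb_strlen($word, 'UTF-8'); // counts characters

echo "bytes: $bytes, characters: $chars\n"; // bytes: 6, characters: 5
```

Passing the encoding explicitly, as done here, keeps the result independent of the configured internal encoding.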

Because of all this, you should consider enabling PHP’s function overloading feature, described at http://php.net/ref.mbstring#mbstring.overload. Activating function overloading will cause PHP to automatically assume it’s handling multi-byte strings, so (continuing with the example) it will actually execute mb_strlen() when you call strlen(). If you’re making a wholesale conversion to UTF-8, and you don’t want to revise all of the string function calls in your existing code, implementing function overloading makes sense. But there are a couple of caveats:

Watch out for calls to strlen() (or any other string function) where it really is intended to work with the byte length, not the character length. In that situation, function overloading will end up giving you an unintended result. Fortunately, there is a workaround for mb_strlen(): it accepts a character set specification as a second argument, and if you pass in ‘latin1’ (even though it’s actually handling a UTF-8 string), the string will be evaluated as if it were single-byte encoded. mb_strlen($your_utf8_string, ‘latin1’) will give you the number of bytes in a multi-byte string.

You may not want to do function overloading on mail(). I’ll explain why in the discussion of email below.
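The byte-length workaround can be wrapped in a small helper so the intent stays visible in the code. This is a sketch, assuming mbstring is loaded: byteLength() is a hypothetical name, and it uses mbstring's '8bit' pseudo-encoding, which counts raw bytes just as the single-byte trick in the text does. Note also that, as far as the PHP manual documents it, the function overloading setting (mbstring.func_overload) was deprecated in PHP 7.2 and removed in PHP 8.0, so on current versions strlen() always counts bytes.

```php
<?php
// Hypothetical helper: always return the length in bytes, even where
// strlen() has been overloaded to count characters.
function byteLength(string $s): int
{
    return mb_strlen($s, '8bit');
}

$s = "\xE2\x82\xAC" . "5"; // euro sign (three bytes in UTF-8) followed by "5"

echo mb_strlen($s, 'UTF-8'), " characters, ", byteLength($s), " bytes\n"; // 2 characters, 4 bytes
```

Treating the string as ISO-8859-1, the encoding behind the article's ‘latin1’, gives the same byte count, because every byte is then counted as one character.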

Note that if you haven’t upgraded to PHP 5, the html_entity_decode() function will return an error if you pass it a UTF-8 string. This was the only UTF-8 incompatibility we found in PHP 4.3.

Going back to Oracle: starting with Oracle 9i, it provides improved handling for multi-byte characters by giving you a way to distinguish between byte length and character length. When creating a table, you can specify whether a column’s length is defined in terms of characters or bytes. For example, VARCHAR2(20 BYTE) will give you a 20-byte length field, and VARCHAR2(20 CHAR) will give you a 20-character length field. The default is BYTE, which you can alter with the NLS_LENGTH_SEMANTICS parameter; see your Oracle documentation for more details.
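The practical consequence of byte semantics can be checked from PHP before an insert. The sketch below is illustrative only: fitsByteColumn() is a hypothetical helper, and it assumes that strlen() has not been overloaded, so that it counts bytes.

```php
<?php
// Hypothetical check: would $value fit a column declared as
// VARCHAR2($maxBytes BYTE)?
function fitsByteColumn(string $value, int $maxBytes): bool
{
    return strlen($value) <= $maxBytes; // strlen() counts bytes
}

$ascii = str_repeat("e", 20);           // twenty characters, 20 bytes
$accented = str_repeat("\xC3\xA9", 20); // twenty e-acute characters, 40 bytes in UTF-8

var_dump(fitsByteColumn($ascii, 20));    // bool(true)
var_dump(fitsByteColumn($accented, 20)); // bool(false)
```

Both strings are 20 characters long, so both would fit a VARCHAR2(20 CHAR) column, but only the first fits VARCHAR2(20 BYTE).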

Beware Windows-1252 in Web Forms

As I mentioned, other than UTF-8, the character encoding you’re most likely to find on English-speaking web sites, these days, is Latin-1 (aka Western ISO-8859-1). One of the nice things about Unicode is that its first 256 characters are the same as in Latin-1. That is, the Latin-1 ASCII characters and its Extended ASCII characters live in the same numerical locations in Unicode (although UTF-8 stores those above 127 as two bytes rather than one). If you’re currently on Latin-1, this greatly eases the pain of switching to UTF-8.

So, the big “however” comes from—you guessed it—Windows Fortunately, Windows NT, 2000, and XP useUnicode internally and shouldn’t cause headaches for aUTF-8 web site But Windows 95 and 98 use theWindows-1252 character set Its standard ASCII charac-ters from 0-127 are the same as Latin-1 and UTF-8, butits Extended ASCII set is different If you have a form on

a web page that’s UTF-8 encoded, and someone ning Windows 9x fills out the form by copying-and-pasting text from Microsoft Word, Extended ASCIIcharacters may be interpreted properly You may haveexperienced this before: for example, the “©©” symbol inyour Word document turned into something like “ää”when you pasted it into a form Nothing about thecharacter’s underlying data changed—the decimal rep-resentation of the character is the same as it wasbefore—it just means something different in UTF-8than it does in Windows-1252

run-This was more of a problem in the past than it is now,

as modern browsers try to transparently perform acharacter set conversion for you as needed in these sit-uations But the problems are by no means entirely

resolved: see FORM submission and i18n ath

http://ppewww.ph.gla.ac.uk/~flavell/charset/ / f

form-i18n.html l for a thorough overview of all theissues related to this, as well as a rundown of how themajor browsers behave (if you’re wondering about the

meaning of i18n, it’s short-hand for

internationaliza-tion)

What makes this a truly maddening problem is verting a Latin-1 encoded database to UTF-8 whensome of the data in it came from Latin-1 encoded webforms where users pasted in Windows-1252 text, andtheir browsers didn’t convert the characters properly.There is no easy fix for this, as you simply have to look

con-at the records yourself to see if the Extended ASCIIcharacters are displaying as the user intended, or ifthere was a character set conversion problem along theway
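For incoming form data, one partial safeguard is to check whether a submitted value is well-formed UTF-8 and, if it is not, to treat it as Windows-1252. The sketch below assumes the mbstring extension; the helper name to_utf8_assuming_cp1252() is hypothetical. It only catches byte sequences that are invalid as UTF-8, so it does not solve the stored Latin-1 data problem described above.

```php
<?php
// Bytes as a Windows-1252 client might submit them: curly quotes
// (0x93 and 0x94) around the word "Hi". On their own these bytes
// are not valid UTF-8.
$submitted = "\x93Hi\x94";

// Hypothetical helper: return valid UTF-8, converting from
// Windows-1252 only when the input is not already well-formed UTF-8.
function to_utf8_assuming_cp1252($text)
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already well-formed UTF-8, leave it alone
    }
    return mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
}

$clean = to_utf8_assuming_cp1252($submitted);

// Print the bytes in hex so the conversion is visible.
echo bin2hex($clean), "\n";
```

The two curly quotes become the three-byte UTF-8 sequences E2 80 9C and E2 80 9D, while plain ASCII input passes through unchanged.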

UTF-8 Support in Smarty

Smarty handles UTF-8 transparently, almost. The one trouble spot is the escape modifier. It calls the PHP htmlentities() and htmlspecialchars() functions, but it doesn't provide them with the charset argument they need to work with UTF-8. The solution is to override escape with your own custom version. Start by making a copy of the Smarty escape modifier, and tweak it to pass along a charset argument to PHP. Then override the original with your custom version. If you won't always be using UTF-8, set your custom version to accept a charset argument, so you can adjust the functionality as needed. Look up the "Extending Smarty with Plugins" section of the manual on the Smarty site (http://smarty.php.net/) for instructions on how to customize Smarty.
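A charset-aware modifier might look roughly like the sketch below. It is not Smarty's own code: the modifier name escape_utf8 is made up, the smarty_modifier_ prefix follows Smarty's plugin naming convention as I understand it, and only two escape types are handled. It also adds a new modifier rather than replacing escape, which is the simpler of the two routes.

```php
<?php
// Under Smarty's plugin convention this would live in a file named
// modifier.escape_utf8.php (the name "escape_utf8" is hypothetical).
function smarty_modifier_escape_utf8($string, $esc_type = 'html', $charset = 'UTF-8')
{
    switch ($esc_type) {
        case 'html':
            // Escape only the HTML special characters, charset-aware.
            return htmlspecialchars($string, ENT_QUOTES, $charset);
        case 'htmlall':
            // Convert every character that has a named entity.
            return htmlentities($string, ENT_QUOTES, $charset);
        default:
            // Other escape types are left out of this sketch.
            return $string;
    }
}

// "café <b>" with é given as its UTF-8 bytes.
echo smarty_modifier_escape_utf8("caf\xC3\xA9 <b>"), "\n";
```

Because the charset is a parameter with a UTF-8 default, a template that needs another encoding can pass it explicitly.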

Exporting UTF-8 Data to PDF, RTF, Plain Text, and Email

It may not always be wise, or even possible, to keep data encoded in UTF-8 when exporting to other formats. As you'll see below, sometimes you need to change the character set before performing the export. Take a look at PHP's utf8_decode() and iconv() functions to learn about converting UTF-8 to single-byte encoding. Note that utf8_decode(), while easy to use, is limited to the Latin-1 character set (see the user contributed notes on the PHP utf8_decode() page for tips on dealing with other character sets).
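As an illustration of the iconv() route, the sketch below converts UTF-8 strings to two different single-byte character sets. It assumes the iconv extension is available; the sample strings are made up.

```php
<?php
// "Müller" in UTF-8: ü is the two bytes C3 BC.
$utf8 = "M\xC3\xBCller";

// Convert to Latin-1; ü becomes the single byte FC.
$latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);

echo strlen($utf8), " bytes as UTF-8, ", strlen($latin1), " bytes as Latin-1\n";

// Unlike utf8_decode(), iconv() can target other single-byte sets.
// "Ł" (C5 81 in UTF-8) is not in Latin-1 but exists in ISO-8859-2.
$latin2 = iconv('UTF-8', 'ISO-8859-2', "\xC5\x81");

echo bin2hex($latin2), "\n";
```

Note that iconv() fails when a character has no equivalent in the target set, so real code should check its return value before using the result.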

Our applications require exporting data to PDF, RTF, text files, and email:

To generate PDF, we run the PDFlib application on our web server to create PDF documents on the fly. PDFlib is an application specifically designed for processing PDF data and dynamically generating PDF documents; you can learn more about it at http://www.pdflib.com/. For it to work with UTF-8 data, you need to use it with a UTF-8 compatible font. The commonly used Windows TrueType fonts (Arial, Times New Roman, and Courier New) are Unicode compliant. However, that doesn't mean they can display any Unicode character. They are fine for English and most Central and Eastern European languages. For more on this, see the Font section of Alan Wood's Unicode Resources at http://www.alanwood.net/unicode/. It's important to mention Microsoft's Arial Unicode MS font, which is not the same as the standard Arial font. Arial Unicode MS can display characters from Arabic, Tamil, Thai, Hangul, Chinese, and many other languages. This means the font itself is huge: approximately 23 MB. If you try to use it with PDFlib running on your web server, you may run into performance problems.

If you are using, for example, Microsoft Word, it's easy to take a Unicode document and save it as an RTF file. It's also not difficult to use a tool like RTF File Generator (available at http://www.paggard.com/projects/rtf.generator/) to generate RTF files using PHP, as long as the source data does not include characters from multiple languages. It turns out to be quite difficult to use PHP to generate an RTF file when the source data is UTF-8 encoded and contains characters from several different languages. This is because RTF requires you to specify a character set for displaying the characters, and you can't just say "Unicode." You have to specify one or more ANSI, PC-8, Mac, or IBM PC character sets. This means you must analyze the multi-byte characters in a UTF-8 string and figure out what characters they represent. Then you need to specify in the header of the RTF file what character sets are needed to display them: a Hebrew character set for Hebrew characters, Arabic for Arabic, etc. Then in the body of the file you must flag the various chunks of non-English text and indicate which of these character sets are needed to display them. Rather than attempting this Herculean task, our solution is to do a utf8_decode() on our data before generating RTF files, so that the text is all in Latin-1. At the moment we can get away with this since none of the data going into the RTF files we currently generate contains non-English characters. We are planning to eventually discontinue our RTF support, so this will not be a long-term problem. Acquiring an understanding of how RTF works with Unicode data was difficult: of all the applications we encountered in this project, RTF was the least well documented when it came to Unicode.

We export data to text files, primarily in CSV format for use in spreadsheets. Surprisingly, current versions of Microsoft Excel do not support importing UTF-8 encoded text files. As with RTF, our solution is to perform a utf8_decode() before generating these text files. This doesn't pose any problems for us since the kind of data we put in spreadsheets does not contain any non-English characters.

As I mentioned, I do not recommend doing function overloading on the PHP mail() function. The reason has to do with line breaks. In Unix, a line break is represented by a line feed (LF, or \n) character; on Macs, it's represented by a carriage return (CR, or \r) character; and on Windows, by a CR+LF (\r\n). For email to work between platforms, an email standard was agreed upon in the early days of the internet, which is CR+LF. So, for example, on Unix, sendmail will add a CR as needed to each LF it finds in the body of an email message. But when an email is UTF-8, PHP will first base64 encode it before passing it off to sendmail. This encoding is done so that multi-byte UTF-8 characters can be transported within the 7-bit world of email (for more about this, see "Advanced E-mail Manipulation" by Wez Furlong, php|architect Vol. 3, Iss. 5). Sendmail and other mailers do not attempt to wade through the base64 encoding to "fix" the line breaks. Unless you're careful to put CR+LF line breaks in all your PHP generated emails before sending them, you'll end up sending emails with improper line breaks. This can have unpredictable results, as you're at the mercy of the recipient's email client software, and what it chooses to do with malformed line breaks. In our testing, we found that the LF-only line breaks in our UTF-8 encoded emails were interpreted as desired in Mac and Unix mail readers, and by Microsoft Outlook on Windows, but not by Eudora 6.2 (and previous versions) on Windows. In Eudora, the messages displayed with no line breaks at all. You can't say it's a Eudora bug, since the line breaks weren't meeting the standard. At this time, the emails we generate only contain basic English characters, so sticking with the standard mail() function meets our needs for now.
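If you do send UTF-8 mail, the line breaks can be normalized before the body is encoded. The following is a minimal sketch; the helper name normalize_to_crlf() is hypothetical and nothing here actually sends mail.

```php
<?php
// Hypothetical helper: rewrite any mix of LF, CR and CR+LF line
// breaks as CR+LF, the form the email standard expects.
function normalize_to_crlf($text)
{
    // "\r\n" is listed first so an existing CR+LF is matched as one
    // unit and is not doubled.
    return preg_replace('/\r\n|\r|\n/', "\r\n", $text);
}

// A body with Unix, old Mac, and Windows line breaks mixed together.
$body  = "Line one\nLine two\rLine three\r\nLine four";
$fixed = normalize_to_crlf($body);

// Show the result with the control characters made visible.
echo addcslashes($fixed, "\r\n"), "\n";
```

Running the body through a helper like this before any base64 encoding means the encoded content already carries standard line breaks, so nothing downstream needs to repair them.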

The Bumpy Road to Unicode Compliance

As you can see, converting your web site to UTF-8 is by no means a painless process. But the payoff is worth it if you plan to support characters from several languages. It's also a fascinating educational experience: you'll gain a stronger understanding of how Apache, Oracle, and PHP interact, how Unicode supports so many different languages, some of the gory details of how email works, how browsers deal with mismatching character sets, what a Unicode compliant font is, and much more. Even if you're not using the same software discussed in this article, hopefully I've at least imparted a sense of what kinds of problems you should look out for. If nothing else, hopefully you'll remember, "there ain't no such thing as plain text."


To Discuss this article:

http://forums.phparch.com/219

Michael Toppa is a web applications developer at the University of Pennsylvania School of Medicine. He has previously worked for Ask Jeeves, E*TRADE, and Stanford University Libraries' HighWire Press. He can be found on the web at www.toppa.com. Credit for a lot of the research in this article goes to all of the U Penn School of Medicine Web Development team.


The hype around XML (the logical connection of structure and data within a document) remains unbroken: there is no serious Content Management System that doesn't offer at least rudimentary XML support in one form or another. The dominant APIs for XML processing are DOM (Document Object Model) and SAX (Simple API for XML), two APIs that focus more on tags and less on data. The DOM API creates an XML document in a tree-like structure that is saved in memory for continuous use. SAX is different: it runs through a document and fires events based on the contents of the XML it is parsing.

Even before there was XML, there was the Document Object Model, or DOM. It allows a developer to refer to, retrieve, and change items within an XML structure, and is essential to working with XML. The Document Object Model is a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the structure, content and style of documents. For large XML documents the memory and processor resources consumed can be prohibitive, because building a DOM object is relatively processor intensive and the resulting DOM object usually consumes a large amount of memory.
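To make the tree model concrete, here is a small sketch using the DOM extension that ships with PHP 5. The sample document and its element names are made up for illustration.

```php
<?php
// A tiny made-up document with two <book> elements.
$xml = '<library><book id="1"><title>PHP 4 XML</title></book>'
     . '<book id="2"><title>Another Title</title></book></library>';

// The whole document is parsed into an in-memory tree first...
$doc = new DOMDocument();
$doc->loadXML($xml);

// ...and can then be queried in any order.
$titles = array();
foreach ($doc->getElementsByTagName('title') as $node) {
    $titles[] = $node->nodeValue;
}

// Items can also be changed in place and the tree serialized again.
$doc->getElementsByTagName('book')->item(0)->setAttribute('id', '100');

echo implode(', ', $titles), "\n";
echo $doc->saveXML($doc->documentElement), "\n";
```

Random access and in-place editing are what the tree buys you; the cost is that the complete document has to be held in memory before the first query runs.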

The SAX parser is often used to process large XML documents, but, unfortunately, it is poorly designed. Rather than being called by the parsing application, the SAX parser uses a message handler with callbacks, which is not straightforward. The approach taken by SAX makes the software architecture much more difficult than it needs to be. Although the resulting code may look sufficient, there are always some inherent problems because SAX does not maintain information about the current state; that's up to you. This can be fixed by keeping track of how deeply nested the start/end element is and by using extra flags, but it always requires adding extra state variables and code to do validation. Unlike that of DOM, the SAX specification is not a W3C (World Wide Web Consortium) standard; it was, instead, created by the members of the XML-DEV mailing list. A SAX parser doesn't build a tree structure of the document in memory like DOM does: the XML document is read sequentially, and special events are fired if the parser recognizes a significant component of the document (e.g. a comment). The parser doesn't keep track of previous elements; when it runs into a recognized chunk of the document, its work is done.
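The extra bookkeeping that SAX pushes onto the application can be seen in a short sketch built on PHP's expat-based xml_* functions. The sample document and the state variables are made up, and the closures assume PHP 5.3 or later.

```php
<?php
$xml = '<library><book><title>First</title></book>'
     . '<book><title>Second</title></book></library>';

// State that SAX does not keep for us: the nesting depth and
// whether the parser is currently inside a <title> element.
$depth   = 0;
$inTitle = false;
$titles  = array();

$parser = xml_parser_create();
// Keep tag names as written instead of upper-casing them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$depth, &$inTitle) {
        $depth++;
        if ($name === 'title') {
            $inTitle = true;
        }
    },
    function ($p, $name) use (&$depth, &$inTitle) {
        $depth--;
        if ($name === 'title') {
            $inTitle = false;
        }
    }
);

// Text arrives through a separate callback, so the flag set above is
// the only way to know which element it belongs to. (For long text
// this handler can fire more than once per element.)
xml_set_character_data_handler(
    $parser,
    function ($p, $data) use (&$inTitle, &$titles) {
        if ($inTitle) {
            $titles[] = $data;
        }
    }
);

xml_parse($parser, $xml, true);
print_r($titles);
```

Even for this trivial task, three callbacks have to cooperate through shared variables, which is the architectural burden the paragraph above describes.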

XMLPull
an Alternative to SAX and DOM
by Markus Nix

Despite the popularity of known APIs for XML processing, such as SAX and DOM, the XMLPull parser is finding more and more followers. There are equivalent programs for Java, Python, and Perl, and Harry Fuecks is writing an equivalent implementation for PHP. PHP 5 also comes with a native extension called xmlReader.

XMLPull is an alternative API for parsing XML. Perhaps you find the memory consumption of DOM too high or the manipulation of data with SAX too involved. If so, it will pay to take a closer look at XMLPull. Parsing XML with XMLPull reflects the organization of data structures, and therefore code written to use the XMLPull parser is much easier to maintain. State information is kept, naturally, on the parser's stack, as a consequence of method calls that can be nested as many times as necessary. Pull parsers offer big ease-of-use advantages compared to SAX, but you may be left wondering if they can measure up to SAX's industrial-strength performance. They can!

XMLPull was introduced in early 2002 by ringleaders from the two leading pull parser implementations, Stefan Haustein from the kXML project and Aleksander Slominski from XPP3 (XML Pull Parser). Both, feeling that the lack of a common API hindered wider pull parsing adoption, began to work on XMLPull in December 2001. The resulting API reflects their substantial experience, drawing from their respective projects to produce an interface that works well for a wide range of applications.

XMLPull for Java, for example, supports everything from J2ME (Java 2 Platform, Micro Edition) to J2EE (Java 2 Platform, Enterprise Edition). The J2ME requirement forced the lead developers of XMLPull to create a simple interface with the minimum number of classes necessary to function well in low memory environments. In contrast, J2EE environments don't usually suffer from such limited resources, but, instead, demand flexibility and performance. Accommodating both extremes with a single interface is tough.

According to the API introduction by Aleksander Slominski, "XML pull parsing allows incremental (sometimes called streaming) parsing of XML where application is in control - the parsing can be interrupted at any given moment and resumed when application is ready to consume more input."

While many Java programmers are already familiar with XMLPull, this method of accessing an XML document is still strange to most PHP programmers. The xmlReader API is similar to the SAX API (which is frequently used for simple XML processing in PHP), but provides a simpler, more standard and more extensible interface to handle large documents than the existing SAX version. It should be noted that XMLPull has no notion of callbacks. Think of XMLPull as defining a special kind of iterator that delivers an XML document's components to you, one at a time. It is totally up to you to decide when you're done with the current component, and ready to move to the next one. The parser always holds a particular state that matches the current component type. Many of the methods prove meaningful only when the parser is in a particular state, which is identified by a set of constant definitions.
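The iterator idea can be tried with the XMLReader class that PHP 5 provides. This is a minimal sketch with a made-up sample document; it shows XMLReader's own API, not the XMLPull interface discussed in this article.

```php
<?php
$xml = '<library><book><title>First</title></book>'
     . '<book><title>Second</title></book></library>';

$reader = new XMLReader();
$reader->XML($xml);

$titles = array();

// The application drives the loop: each read() advances the parser
// to the next node, and nothing happens until we ask for it.
while ($reader->read()) {
    // nodeType is the parser's current state, compared against the
    // class constants.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'title') {
        $reader->read(); // step onto the text node inside <title>
        if ($reader->nodeType === XMLReader::TEXT) {
            $titles[] = $reader->value;
        }
    }
}
$reader->close();

print_r($titles);
```

No callbacks or shared flags are needed: the "am I inside a title?" question is answered by where the code is in the loop.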

The Java API allows you to choose the detail level that your program will see. This is a very powerful feature when talking about layering. The original SAX interface did not report all of the information needed to validate a document, so developers had to build special methods into their parsers if they wanted to support validation.

    require_once( XML_XMLPULL . 'XmlPull/PushListener.php' );
    require_once( XML_XMLPULL . 'XmlPull/PullParser.php' );

    /**
     * Factory function for creating the pull parser
     * @param string parser type ('Expat' or 'HTMLSax')
     * @param string reader type ('File', 'String' or 'Struct')
     * @param mixed source to read (e.g. string, file path, struct)

A new Java Community Process (JCP) specification request specifies a standard API for Java pull parsers: JSR-173 (Streaming API for XML). Like SAX, XMLPull is not a W3C recommendation, and the only existing reference implementations are explicitly Java based (see the XMLPull API at http://xmlpull.org/).

A PHP Implementation by Harry Fuecks

If you know how callback functions work in the SAX parser, the interface of the XMLPull parser is easy to understand: a simple factory method is enough to establish a parser or reader type. The document is easily iterated to capture the parts of the document that are of interest. The HTMLSAX XMLPull implementation continues in the spirit of the original Java specification, and supplies a simple interface, versatility, ease of use, and good performance.

SAX Pushes, XMLPull Pulls

The pull parser turns the paradigm of SAX parsers around. Instead of the parser being forced to execute predefined callback functions when a certain component of a document is reached, it is asked to reply with the next component. This results in "pulling" instead of "pushing", and makes data processing easier.

In the Java community, there is a certain hype that surrounds pull parsing because, unlike SAX (or rather SAX2, if you prefer working with namespaces), it gives control of the parsing process back to the developer, instead of relying on a "black box." XMLPull allows incremental (streaming) parsing, so it is possible to pause the parser in its work, for example, to wait for the arrival of new data in unpredictable surroundings (such as when pulling data from a remote server). The pull parsers written for J2ME are made for such surroundings: good performance with a small footprint.

The PHP implementation follows the Java API in most scenarios. The principle of parsing using pull is very easy: the parser iterates over a data stream with the parse() method, and travels from event to event. The various event types are returned as values that relate to constants. With the original getEventType() method, these are START_DOCUMENT, START_TAG, TEXT, END_TAG, and END_DOCUMENT. In PHP, these differ slightly: XML_PULL_START_TAG, XML_PULL_END_TAG, XML_PULL_TEXT and XML_PULL_PI. XML_PULL_START_TAG offers information about the start tag of an element, including information about the attributes. XML_PULL_TEXT delivers CDATA information. The other conditions are self-explanatory. The parsing of an XML document with XMLPull can be seen in Listing 2.

At the time of writing, Fuecks' Pull Parser supports four conditions that are represented through the constants that I've mentioned above. In addition to these main four, there are also XML_PULL_ESCAPE and XML_PULL_JASP; these are useful only when working with the PEAR package (also written by Harry Fuecks). Support for namespaces is currently missing.

Most SAX parsers are built on top of a pull parsing layer. It is an interesting challenge to expose both the pull and push layers to the user, but such functionality allows a developer to use pull parsing when needed, without having to stop using the SAX API.

It is possible to convert a pull parser into a push model: during pull parsing, the caller has control over parsing and can push events. It is also possible to convert push into pull parsers, but this requires that all events be buffered, and converted from SAX callbacks.
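The buffering approach to turning push into pull can be sketched in a few lines. The class below is hypothetical and is not Fuecks' implementation: the class name, the next() method, and the event labels are all made up, and it uses PHP's xml_* functions as the SAX-style source.

```php
<?php
// Hypothetical class: records every SAX callback as an event, then
// hands the events out one at a time on request.
class BufferedPullParser
{
    private $events = array();
    private $pos = 0;

    public function __construct($xml)
    {
        $parser = xml_parser_create();
        xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
        xml_set_element_handler($parser, array($this, 'onStart'), array($this, 'onEnd'));
        xml_set_character_data_handler($parser, array($this, 'onText'));
        // Push phase: the whole document is parsed and buffered here.
        xml_parse($parser, $xml, true);
    }

    // SAX callbacks; they must be public so the parser can call them.
    public function onStart($p, $name, $attrs)
    {
        $this->events[] = array('START_TAG', $name);
    }

    public function onEnd($p, $name)
    {
        $this->events[] = array('END_TAG', $name);
    }

    public function onText($p, $data)
    {
        $this->events[] = array('TEXT', $data);
    }

    // Pull phase: return the next buffered event, or null at the end.
    public function next()
    {
        return $this->pos < count($this->events) ? $this->events[$this->pos++] : null;
    }
}

$pull = new BufferedPullParser('<a><b>hi</b></a>');
$seen = array();
while (($event = $pull->next()) !== null) {
    $seen[] = $event[0] . ':' . $event[1];
}
echo implode(' ', $seen), "\n";
```

The caller gets a pull-style loop, but the memory advantage of streaming is lost because every event is held in the buffer before the first one is delivered.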

An alternative implementation of this conversion involves an extra thread that can be used to pull more data from the SAX parser, but is kept suspended until the user asks for more events. This approach is best exemplified by Fuecks' Pull Parser Wrapper for SAX, which allows conversion from a SAX model into an XML pull parser. The parser implementation by Fuecks is based on the XML_SaxFilters PEAR package (see http://pear.php.net/package/XML_SaxFilters), and uses PEAR's iteration mechanism extensively. The PHP implementation of the SAX filter code was originally from Luis Argerich (http://phpxmlclasses.sourceforge.net/show_doc.php?class=class_sax_filters.html), and was mentioned in greater detail in the Wrox Press title "PHP 4 XML." Fuecks’


Date posted: 21/12/2013, 12:15
