Jaws 0.5: Just When You Thought it was Safe to Go Back in the Water
An Advanced PHP & MySQL Hit Counter
You Know NOTHING
Software development is humbling. Just when you think you’ve got a solid handle on every last (important) bit of technology you need to complete the project at hand, you’re often slapped in the face with the news that you’re just plain wrong. This news can be both frustrating and encouraging (at the same time, believe it or not).
Let me set the scene. Your team has been commissioned with adding a new section to your corporate intranet. In the course of the addition, you adopt a new technology of some sort. Perhaps this is a new database abstraction layer, or a different manner of handling HTML forms. It could be anything; it doesn’t really matter. Your team has worked on this new module for two months. You’ve put all of your collective knowledge and experience into the project. The launch date is in a couple of days, and you’re actually going to make your deadline.
So, this sounds pretty good so far; what could go wrong? Perhaps one of the directors is about to walk in with a must-have feature that needs to be in the next release, and will disrupt your schedule? Sure. This happens all the time, but it’s not the scenario I’m thinking of; that’s just frustrating, and rarely the least bit encouraging. The bad situation that I’m thinking of is (oddly) free of managerial influence.
This new technology that you’ve adopted is really great. It has a few problems, but you’ve managed to work around them. All things considered, it’s saved you many hours in the course of the past few weeks, and you’ve been bragging about it to your developer-friends who work at different companies.
Then, in the course of your daily, duly-diligent reading of various PHP news sources, you discover a brand-new, just-released-yesterday extension that could replace this other new technology you’ve already adopted. Not only is it a suitable replacement, but it solves all of the problems you had to work around, and also opens the door to new possibilities that you didn’t even consider.
Frustrating because you’re about to release a critical project that encompasses technology that you’ve just discovered is inferior. But encouraging because you’re now awaiting the day you’re allowed to rip out all of that legacy (but, ironically, not-yet-released) code and employ a superior product.
So, what’s my point? Simple: I know nothing. What I think I know is only temporary, and could be supplanted at any moment. My life as a developer is a constant journey of staying on top of things, and no matter how much I think I “have it covered,” there’s always something new about to appear on the weblog, newsgroup, or source repository of tomorrow.
I hope the articles in this issue open your eyes to new ideas, especially the XMLPull article, which I think covers pretty sweet new (well, newer) technology, and that it’s not too late to incorporate these ideas into your current (or next) project.
php|architect
Volume IV - Issue 5 May, 2005
Graphics & Layout
Markus Nix
php|architect (ISSN 1709-7169) is published twelve times a year by Marco Tabini & Associates, Inc., P.O. Box 54526, 1771 Avenue Road, Toronto, ON M5M 4N5, Canada.
Although all possible care has been placed in assuring the accuracy of the contents of this magazine, including all associated source code, listings and figures, the publisher assumes no responsibilities with regards to the use of the information contained herein or in all associated material.
Contact Information:
Copyright © 2003-2005 Marco Tabini & Associates, Inc. All Rights Reserved
Solar 0.2.0
paul-m-jones.com announces the release of Solar 0.2.0.
What is it? According to solarphp.com: "Solar is a simple object library and application repository (that is, a combined class library and application component suite) for PHP5."
"Solar provides simple, easy-to-comprehend classes and components for the common aspects of web-based rapid application development, all under the LGPL."
"Solar is designed for developers who intend to distribute their applications to the world. This means the database driver functions work exactly the same way for each supported database. It also means that localization support is built in from the start." Get all the latest info from solarphp.com.
phpBB 2.0.14
The phpBB Group announces the release of phpBB 2.0.14, the "We know we are (not) furry" edition. "This release addresses some bugfixes as well as fixing some minor non-critical security issues. All issues not reported to us before being released are not credited to the founder, as usual."
"As with all new releases, we urge you to update as soon as possible. You can, of course, find this download on our downloads page (http://www.phpbb.com/downloads.php). As usual, three packages are available to simplify your update."
"The Full Package contains entire phpBB2 source and English language package."
For more information visit: http://phpbb.com
Vogoo-API.com is happy to announce
the release of Vogoo PHP API 0.8.2.
Vogoo-API.com announces: Vogoo PHP API v0.8.2 is a free PHP API licensed under the terms of the GNU GPL. With Vogoo PHP API, you can easily and freely add professional collaborative filtering features to your Web Site.
v0.8.2 features
• Handles all member/product votes (available since v0.8)
• Fast computation of similarities between members (available since v0.8)
• One-to-one product recommendations (available since v0.8)
• Ability for members to specify when they are not interested in a product recommendation
Planned features for future versions
• New engine based on products recommendations that gives better performances when little information is available on the member.
• Real time targeted ads
• Handles multiple product categories
• Collaborative filtering features available for non-member visitors
• Administration tool
• Engine for 'related sales'.
Check out Vogoo-API.com for all
the latest info.
The Zend PHP Certification Practice Test Book is now available!
We're happy to announce that, after many months of hard work, the Zend PHP Certification Practice Test Book, written by John Coggeshall and Marco Tabini, is now available for sale from our website and most book sellers worldwide!
The book provides 200 questions designed as a learning and practice tool for the Zend PHP Certification exam. Each question has been written and edited by four members of the Zend Education Board, the very same group who prepared the exam. The questions, which cover every topic in the exam, come with a detailed answer that explains not only the correct choice, but also the question's intention, pitfalls and the best strategy for tackling similar topics during the exam.
For more information, visit http://www.phparch.com/cert/mock_testing.php
Check out some of the hottest new releases from PEAR.
MDB2_Schema 0.2.0
PEAR::MDB2_Schema enables users to maintain RDBMS independent schema files in XML that can be used to create, alter and drop database entities and insert data into a database. Reverse engineering database schemas from existing databases is also supported. The format is compatible with both PEAR::MDB and Metabase.
MDB2 2.0.0beta4
PEAR MDB2 is a merge of the PEAR DB and Metabase php database abstraction layers.
Note that the API will be adapted to better fit with the new PHP 5-only PDO before the first stable release.
It provides a common API for all supported RDBMS. The main difference to most other DB abstraction packages is that MDB2 goes much further to ensure portability. Among other things, MDB2 features:
• An OO-style query API
• A DSN (data source name) or array format for specifying database servers
• Datatype abstraction and on demand datatype conversion
• Portable error codes
• Sequential and non sequential row fetching as well as bulk fetching
• Ability to make buffered and unbuffered queries
• Ordered array and associative array for the fetched rows
• Prepare/execute (bind) emulation
• Sequence emulation
• Replace emulation
• Limited Subselect emulation
• Row limit support
• Transactions support
• Large Object support
• Index/Unique support
• Module Framework to load advanced functionality on demand
• Table information interface
• RDBMS management methods (creating, dropping, altering)
• RDBMS independent xml based schema definition management
• Reverse engineering schemas from an existing DB (currently only MySQL)
• Full integration into the PEAR Framework
• PHPDoc API documentation
DataObject's links.ini file correctly, it will also automatically detect if a table field is a foreign key and will populate a selectbox with the linked table's entries. There are many optional parameters that you can place in your DataObjects.ini or in the properties of your derived classes, that you can use to fine-tune the form-generation, gradually turning the prototypes into fully-featured forms, and you can take control at any stage of the process.
Net_GeoIP 0.9.0alpha1
A library that uses Maxmind's GeoIP databases to accurately determine geographic location of an IP address.
Looking for a new PHP Extension? Check out some of the latest offerings from PECL.
While colorer is primarily designed for use with text editors, it can also be used for non-interactive syntax highlighting, for example, in web applications. This PHP extension provides basic functions for syntax highlighting.
pivotal to the increasing demand for Open Source software. Topics include Scalable Internet Architectures, Web Services, PHP, mod_perl, Apache HTTP Server, Java, XML, Subversion, and SpamAssassin.
The three main conference days offer a wide range of beginner, intermediate and advanced sessions. ApacheCon attendees have more than 70 sessions to choose from, to learn firsthand the latest developments of key Open-Source projects including the Apache HTTP Server, the world's most popular web server software.
With plenty of room for networking and peer discussions, attendees can meet ASF Members and participants during the ApacheCon Expo, evening events, Birds Of a Feather sessions and a number of informal social gatherings."
For more information visit: http://www.apachecon.com/
VS.Php 1.1.1
Jcx.Software brings news of the immediate availability of VS.Php version 1.1.1. This update adds support for PhpDoc commenting, secure ftp deployment capabilities and many bug fixes.
PhpDoc is a powerful feature of PHP that allows the developer to add comments to the source code that can be used to generate documentation. VS.Php uses this information to provide better intellisense content. For instance, VS.Php is able to parse those comments to determine the type of a particular variable. Intellisense uses this information to better help the developer. This update also adds support for the secure ftp protocol for deploying applications through a secure connection.
For information or to download VS.Php, visit:
http://www.jcxsoftware.com/
PHPEdit 1.2
PHPEdit proudly announces the release of the latest version, PHPEdit 1.2.
Next major version of PHPEdit is finally available for download. This version includes lots of changes in its internals, and adds new, powerful features to the IDE, like complete PHP5 support, real-time syntax checking, jump to declaration, SimpleTest integration, new document templates, phpDocumentor Wizard and lots of enhancements in existing tools like CodeHint, CodeInsight and CodeBrowser.
This version is available for free to all our customers. You can download it and test it for 30 days. You can also buy a license to avoid the time limit.
To grab the latest version, visit
http://www.waterproof.fr/products/PHPEdit/
The following methodology was motivated by a request from a client of mine who asked me to provide a web page access counter for their main corporate web site. A condition of the deal, though, was that they did not want to show the actual number of accesses publicly on the web site itself. Instead, they wanted to keep track of this data privately.
Their reasons for omitting a public counter were in keeping with the idea that they did not want to broadcast the activity on their site to all visitors, and, in keeping with the tone of their message, did not desire to display a typical web page access counter on their site. Instead, they wanted an access counter that would provide them with a means of comparing and contrasting the number of accesses from day to day so that they could analyze advertising impacts on the number of visitors who were hitting their site.
As you may know, numerous types of Web counters exist that are wide ranging in their capabilities and styles. However, I wanted to tailor a solution for my client that would keep track of the number of accesses to their site, while providing a tool to view these data in a manner that was meaningful and comparative. The output would provide an at-a-glance summary that would allow my client to assess the effectiveness of advertising campaigns with respect to changes in site activity.
What developed was a custom hit counter which continues to evolve over time; an example screenshot can be seen in Figure 1. The benefits of this hit counter are not so much in its uniqueness as in the possibilities it offers to the average PHP developer who is interested in evolving their skills in the domain of PHP, MySQL, and user interface design.
REQUIREMENTS
(5.0.4 available) OS
Win2K Prof, Win2K Advanced Server, WinXP SP1/SP2
or greater (4.1 available)
The Anatomy of a Hit
An Advanced PHP & MySQL Hit Counter
by John R. Zaleski, Ph.D.
The combined approach of capturing web page access, and charting the results provides a simple standalone capability for graphically displaying hit counts to a web site that requires only a basic working knowledge of PHP and MySQL, yet provides a basic model for expanding and developing a much more sophisticated counter.
Furthermore, the methodology for charting the hit count data can be decoupled from basic web page access counting for use in academic, business, or other types of data mining applications where data charting and mining provide a unique way of comparing and contrasting data as they change over time.
The counter and graphing methodology I provide here are very simple to understand and can be modified and used for many applications, even beyond web page access counting.
Calling the Hit Counter
The visual hit counter methodology consists of two separate pieces of code: one for incrementing hit count statistics on a web page, and another for analyzing and mining those statistics for relevant value. The decision to separate these two sets of functionality is somewhat based on heuristics, but is born out of logic: by separating the processing from the actual hit counting, we remove the potential performance impacts associated with database access for each visit to a web page. Instead, we assign the analytical data mining of the statistics themselves to a web site dedicated to their study. This has the overall effect of reducing the load time of the original web site so that users are not impacted.
To implement the data collection part of the process, the initial step in any web page involves incorporating the following lines of code:
<!-- Add the client hit counter -->
<?php include "./hc.php"; ?>
<!-- End body tag -->
The hc.php file is then included in the web page, at the desired location. Those wishing to make use of this methodology need only include the above code segment in their PHP page (once all supporting files have been uploaded to the server), and the hit counter becomes operational.
The hc.php code contains the logic to open a data file (hitcounter.dat), increment a counter, and store various other statistics to the opened file each time a web page with the preceding include statement is encountered.
We begin the code in hc.php by assigning the name of the data file to the variable $COUNT_FILE:
If the file referred to by $COUNT_FILE exists, and already contains data, we can assume the contents are the results of previous page accesses. So, we read the contents of the entire file. Upon reading the last value, I assign the content to the $contents variable, increment the value by 1, and append the new value to the hitcounter.dat file.
If this is the first time the web page has been accessed, the file is empty (or the file does not exist), so we have to create the file and write new data to it. In addition to simply writing the current counter value, I also write the date and time stamp; this is to facilitate the data mining process. The hitcounter.dat file has the following format:
[1] 23 14 45 PM Wednesday July 28th 2004 1
[2] 06 19 09 AM Thursday July 29th 2004 2
[3] 08 29 13 AM Thursday July 29th 2004 3
Note that much more information can be added (such as the identity of those accessing the web page). However, that code would need to be added to the structure of the hit count listing. The code fragment responsible for writing the output listing above is:
fwrite($fp, "[" . $counter . "] " . date("h:i A l F dS Y") . " " .
    $counter . " \n");
The entire code listing for the hit counter is contained in Listing 1. It is important to set the permissions to permit the hc.php file to read and write files in the directory in which it is placed. If this is not done properly, the script will be unable to write to the hitcounter.dat file.
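As a rough illustration of the read, increment and append cycle described in this section, here is a minimal sketch. It is not the code from Listing 1: the function name recordHit() and its parameters are hypothetical, and the date format string is an assumption chosen to reproduce the space-separated sample records (the fwrite() call quoted in the text uses a different format string).

```php
<?php
// Sketch only: recordHit() and its parameters are hypothetical names,
// not taken from Listing 1.
function recordHit(string $countFile, int $timestamp): int
{
    // Read any existing records; a missing or empty file yields no lines.
    $lines = is_file($countFile)
        ? file($countFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
        : [];

    $counter = 0;
    if ($lines) {
        // The running total is the last space-separated field of the last record.
        $fields  = explode(' ', trim(end($lines)));
        $counter = (int) end($fields);
    }
    $counter++;

    // Assumed layout, matching the sample records:
    // [n] hour minute second AM/PM weekday month day year n
    $record = '[' . $counter . '] ' . date('H i s A l F dS Y', $timestamp)
            . ' ' . $counter . "\n";
    file_put_contents($countFile, $record, FILE_APPEND | LOCK_EX);

    return $counter;
}

// Usage: two hits against a scratch file.
$file = tempnam(sys_get_temp_dir(), 'hc');
recordHit($file, time());
echo recordHit($file, time()), "\n";
```

Appending with LOCK_EX is a design choice of this sketch, intended to keep two simultaneous page views from interleaving their records; the article does not say how Listing 1 handles concurrent writes.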
Plotting Preliminaries
Plotting preparation is accomplished using the siteIndex.php file (Listing 2). As I explained earlier, I had opted to create the hit counter method independently of the plotting code to decouple the hit counter method from the database. This serves several purposes. First, it allows those interested in just a plain hit counter to implement it without requiring them to master the techniques of database connectivity. Second, this takes performance considerations into account by avoiding database access during the counter incrementing process. Third, and finally, this enables the user to alter and improve the plotting routine independently of the hit counter so that accurate statistics can continue to be kept by keeping the index page intact.
It will be noted that in the hit counter method I developed in Listing 1, there is no direct output of the number of hits to the Web page. This is a matter of choice for the Web page owner. Sometimes individuals perceive that, if the count is too low, this can bode poorly for return visits, while others believe that the hit count statistic may be seen as inappropriate or tacky for the particular site. I manage several sites for local businesses, and I have experienced both kinds of sentiments from the business owners. Thus, by creating this separate method, and only publishing the link to a site that is not directly associated with the web index page and its child links, the business owners can privately view the web page statistics to determine how many accesses have been made. They can also view when these hits occurred, in the course of the past weeks and months, and correlate the data to external events (for instance, during periods of specific types of advertising).
Figure 1
Updating the Database
I begin by opening a connection to the database and entering all existing data from the hit counter method into it. This is accomplished in the siteIndex.php code:
$conn = mysql_connect("localhost", "root", "admin");
In the examples I provide, everything is run on the local machine (localhost), and I have set the username and password to root and admin, respectively. The name of the database instance can be arbitrarily defined by the user; I chose sitestats. Developers have their own naming conventions, and I’m merely giving you some insight into my own. So, selecting the appropriate database is accomplished via the following statement:
This query allows me to determine the current number of rows contained in the table; this will be necessary later. In addition, I load an array with the data that I just read. To plot the data, I need it in a form that I can manipulate in memory:
while ($newArray = mysql_fetch_array($qry)) {
    $visits = $newArray['visits'];
    if (strcmp($debug, "yes") == 0) echo " maxVisits = " . $maxVisits .
        " value from db = " . $visits . "<br>";
    if ($visits > $maxVisits) $maxVisits = $visits;
}
Listing 1 (excerpt)
// If file exists, and has content, read that content,
// extract the counter value, add 1 to it, and re-write
// to the counter data file
//

if (filesize($COUNT_FILE) > 0) {
    $contents = fread($fp, filesize($COUNT_FILE));
    if ($debug == 1) echo $contents;
// [...]

// If file exists, but has no content, this means it is
// the first time the counter is being used. In this
// instance, write the counter number and the date/time
// stamp to the hit counter file, with the counter
// [...]
    if ($debug == 1) echo "[" . $counter . "] " .
        date("h:i A l F dS, Y");
We compare each value read with the running maximum and adjust our old maximum to reflect the current value. The array variable, $visits, now contains all of the data from the database. Therefore, $visits is a multi-dimensional array that allows us to keep track of all of this data. The time has come to read the hitcounter.dat file and determine what’s new so that this can be added to the database, and the $visits array. The hitcounter.dat file is opened and its records are stored in a new temporary array, $fileElements:
The explode function is very useful in expanding the elements read from the data file into separate fields that are then assigned to the $fileElements array. This is simple because the field delimiter in the hitcounter.dat file is the space character.
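To make the field layout concrete, the following sketch splits one record from the sample hitcounter.dat format with explode(). The helper name parseHitRecord() and the array keys are illustrative assumptions; the field positions (1 to 3 for the time, 5 to 9 for the date and count) follow the indexes used in Listing 2.

```php
<?php
// Hypothetical helper: the key names are illustrative; the positions follow
// the indexes used in Listing 2 (position 0 is "[n]", position 4 is AM/PM).
function parseHitRecord(string $line): array
{
    $f = explode(' ', trim($line));

    return [
        'hour'       => $f[1],
        'minute'     => $f[2],
        'second'     => $f[3],
        'DayofWeek'  => $f[5],
        'Month'      => $f[6],
        'DayofMonth' => $f[7],
        'Year'       => $f[8],
        'visits'     => (int) $f[9],
    ];
}

// Usage with the first sample record from the article.
$row = parseHitRecord('[1] 23 14 45 PM Wednesday July 28th 2004 1');
echo $row['DayofWeek'], ' ', $row['visits'], "\n";
```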
The next step in the process involves locating the current position in the database and determining how many new data points need to be added. Then, we locate where to begin entering data into the database table. This is accomplished by reading the hitcounter.dat file and comparing the maximum number of visits last recorded in the database with the associated visit data contained in the data file. When the two are equal, the point has been reached in the data file wherein the last entry was made to the database. Any data contained beyond this point represents new information that must be inserted into the instance. This defines the starting index for future inserts into the database, which we fill using a for loop as follows:
for ($k = $startIndex+1; $k < sizeof($data)-1; $k++)
[...] visits) values ('', '$hour', '$minute', '$second',
'$DayofWeek', '$Month', '$DayofMonth', '$Year', [...]
[...] $minute, $second, $DayofWeek, $Month, $DayofMonth, $Year, and $visits.
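The search for the starting index can be sketched as follows. This is a hedged illustration rather than the code from Listing 2: findStartIndex() is a hypothetical helper, and it assumes that each entry of $fileElements is one exploded record whose field 9 holds the running visit count.

```php
<?php
// Hypothetical helper, not the code from Listing 2. Each entry of
// $fileElements is one exploded record; field 9 is the running visit count.
function findStartIndex(array $fileElements, int $maxVisits): int
{
    foreach ($fileElements as $k => $fields) {
        if ((int) $fields[9] === $maxVisits) {
            return $k;   // last record that is already in the database
        }
    }
    return -1;           // no match: every record in the file is new
}

// Usage: three records on file, the database already holds visits 1 and 2.
$fileElements = [
    explode(' ', '[1] 23 14 45 PM Wednesday July 28th 2004 1'),
    explode(' ', '[2] 06 19 09 AM Thursday July 29th 2004 2'),
    explode(' ', '[3] 08 29 13 AM Thursday July 29th 2004 3'),
];
$startIndex = findStartIndex($fileElements, 2);
$newRecords = array_slice($fileElements, $startIndex + 1);
echo count($newRecords), " new record(s)\n";
```

Returning -1 when nothing matches means that $startIndex + 1 is 0, so an empty database simply receives every record in the file.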
Querying Results
Listing 3 is what I’ll call queryDb.php, one of the plotting workhorses of the methodology. I start by performing a general query and fetching all data within the database:
Then, I assign these data to an array:
while ($newArray = mysql_fetch_array($qry) ) {
Listing 2 (excerpt)
// Open the db connection to sitestats
// and look at the last entry
// [...]
mysql_select_db("sitestats", $conn)
    or die("Could not open sitestats: " . mysql_error());
// [...]
$check = "select * from $table";
$qry = mysql_query($check)
    or die("Could not match data: " . mysql_error());
$nRows = mysql_num_rows($qry);
$maxVisits = 0;

while ($newArray = mysql_fetch_array($qry)) {
    $visits = $newArray['visits'];

    if (strcmp($debug, "yes") == 0)
        echo " maxVisits = " . $maxVisits .
            " value from db = " . $visits . "<br>";
// [...]
// Open the db connection to sitestats
// and prepare to insert data
// [...]
mysql_select_db("sitestats", $conn)
    or die("Could not open sitestats: " . mysql_error());
// [...]
// Determine where to begin new data entry into database,
// based on what is contained in the hitcounter file
//********************************************************
// [...]
$sql = "insert into sitevisits (visit_ID, hour,
    minute, second, DayofWeek, Month, DayofMonth,
    Year, visits) values ('', '$hour', '$minute',
    '$second', '$DayofWeek', '$Month', '$DayofMonth',
// [...]
I scale the plotting of the individual bars to the current maximum value contained within the database. This is logical because over time, as more data accumulates, the overall maximum number of visits increases. It is therefore necessary to scale all data by the new maximum value so that earlier hit count recordings will display proportionally with respect to one another. Furthermore, since the maximum number of visits is (logically) always represented by the last data element within the database, it follows that we need to scale based on this last element.
Thus, I define a maximum width using the variable $graphWidthMax = 400 pixels. Now, I need to define the height of each bar (that is, the width in the vertical sense), which I’ve arbitrarily assigned to be $barHeight = 10 pixels, and the absolute maximum width of each bar, taken as the latest data entry in the database sitestats table: $barMax = $dbElements[$nRows-1][4];
I also need to define the number of rows to plot on a given web page. This is an important feature because the number that should be plotted is related to each bar’s width as well as the resolution of the screen and the ability of the user to see the data clearly without having to use the scroll bar. Scrollbars can become a nuisance, too, if the user is continually moving them to see all data. Hence, one requirement which I imposed was to keep all of the data within the eye span of the user. So, I opted for a relatively low count in terms of bars per page. Now, since I will only be plotting 10 bars per page, I need to come up with a mechanism for allowing the user to move to a new page and show the next 10 bars in the database. I therefore defined variables to keep track of the starting row and the ending row on any given page. These quantities are represented as follows:
$numberRowsToPlot = 10;
$startRow = 0;
$endRow = $startRow + $numberRowsToPlot;
These equations will become important shortly. First, let’s plot the first 10 rows of data. We do this in a for-loop, like this:
for ( $i = $startRow; $i < $endRow; $i++ ) {
$countVal = intval( $dbElements[$i][4] );
$barWidth = $graphWidthMax * $countVal/$barMax;
//
}
I begin with the $startRow on the page and end with the first $endRow. I retrieve the counter value at the current index $i of the $dbElements array and assign it to the variable $countVal. I then scale the $barWidth in proportion to the maximum graphing width (defined earlier as 400 pixels) normalized by the maximum number of hits. This gives me a proportional width with respect to the 400-pixel limit within the plotting frame (here, the web page itself).
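The scaling rule can be tried in isolation with a small sketch. The helper barWidth() is hypothetical and not part of Listing 3; rounding to whole pixels, the guard against an empty table, and clamping $endRow to the number of available rows are assumptions added for this sketch (the loop in the article uses the unrounded quotient). The $counts array stands in for column 4 of $dbElements.

```php
<?php
// Hypothetical helper: scale one count to a pixel width. Rounding and the
// zero guard are additions of this sketch, not taken from Listing 3.
function barWidth(int $countVal, int $barMax, int $graphWidthMax = 400): int
{
    if ($barMax <= 0) {
        return 0;   // empty table: avoid dividing by zero
    }
    return (int) round($graphWidthMax * $countVal / $barMax);
}

$counts = [1, 2, 3, 4];          // stand-in for column 4 of $dbElements
$barMax = $counts[count($counts) - 1];   // the last entry is the maximum

$numberRowsToPlot = 10;
$startRow = 0;
// Clamp the window so a short table does not read past its last row.
$endRow = min($startRow + $numberRowsToPlot, count($counts));

$widths = [];
for ($i = $startRow; $i < $endRow; $i++) {
    $widths[] = barWidth($counts[$i], $barMax);
}
echo implode(', ', $widths), "\n";
```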
You’ll note from Figure 1 that data are printed alongside the bars, including the value of a particular bar width. This is done in a straightforward manner by simply encapsulating the printing of the data within a table, as columns within that table. This ensures uniform spacing and alignment of the data within the cells.
Without going into all of the details (because Listing 3 provides the explicit implementation), the key elements of this plotting process are as follows: create a table, enter the data values into columns via an echo statement, and concatenate multiple columns so that the data are aligned across the page:
echo "<tr>";
echo "<td align=right><font face=arial color=blue size=2>";
echo $dbElements[$i][0] . ",</font></td>";
But how do we actually create the bar? Very easily: wehave a JPG image of a single pixel, and labeledrreeddddoott jjppgg Within the second to last column of thetable we create an image reference to that JPG imageand size it where its width is equal to $$bbaarrWWiiddtthh and its
Listing 2 (cont’d)
185 echo “ startIndex = “ $startIndex
186 “ sizeof(data) = “ sizeof ( $data ) “<br>” ;
187
188 if ( $startIndex + 1 < sizeof ( $data ) ) {
189 $hour = $fileElements [ sizeof ( $data )- 1 ][ 1 ];
190 $minute = $fileElements [ sizeof ( $data )- 1 ][ 2 ];
191 $second = $fileElements [ sizeof ( $data )- 1 ][ 3 ];
192 $DayofWeek = $fileElements [ sizeof ( $data )- 1 ][ 5 ];
193 $Month = $fileElements [ sizeof ( $data )- 1 ][ 6 ];
194 $DayofMonth = $fileElements [ sizeof ( $data )- 1 ][ 7 ];
195 $Year = $fileElements [ sizeof ( $data )- 1 ][ 8 ];
196 $visits = $fileElements [ sizeof ( $data )- 1 ][ 9 ];
197
198 $sql = “insert into siteVisits (hour, minute, second,
199 DayofWeek, Month, DayofMonth, Year, visits) values
200 (‘$hour’, ‘$minute’, ‘$second’, ‘$DayofWeek’,
201 ‘$Month’, ‘$DayofMonth’, ‘$Year’, ‘$visits’)” ;
// Open the db connection to sitestats
// and look at the last entry
// ...
mysql_select_db("sitestats", $conn)
    or die("Could not open sitestats: " . mysql_error());


//*****************************************************
// Note: mysql_fetch_row($qry) retrieves a single row
//       mysql_fetch_field($qry, $i) fetches field $i
//*****************************************************

$table = "sitevisits";
$check = "select * from $table";
$qry = mysql_query($check)
    or die("Could not match data: " . mysql_error());
// ...
if (strcmp($debug, "yes") == 0) echo "<table>";
if (strcmp($debug, "yes") == 0) echo "<th>";
if (strcmp($debug, "yes") == 0) echo "</th>";

$i = 0;
while ($newArray = mysql_fetch_array($qry)) {

    $dow = $newArray['DayofWeek'];
    $mo  = $newArray['Month'];
    $dom = $newArray['DayofMonth'];
    $yr  = $newArray['Year'];
    $vis = $newArray['visits'];
// ...
$countVal = intval($dbElements[$i][4]);
$barWidth = $graphWidthMax * $countVal / $barMax;

echo "<tr>";
echo "<td align=right><font face=arial color=blue "
    . "size=2>" . $dbElements[$i][0] . ",</font></td>";
echo "<td align=right><font face=arial color=blue "
    . "size=2>" . $dbElements[$i][1] . "</font></td>";
echo "<td align=right><font face=arial color=blue "
    . "size=2>" . $dbElements[$i][2] . "</font></td>";
echo "<td align=right><font face=arial color=blue "
    . "size=2>" . $dbElements[$i][3] . "</font></td>";
print("<td>\n");
echo "<font face=arial color=purple size=2>";
echo "<b>";
print("<img src=\"reddot.jpg\" ");
print("width=\"$barWidth\" height=\"$barHeight\">");
// ...
<font Style="font-family:arial; font-size:12pt;
      font-style: bold; color: #000000;">
Entries: <?php echo $startRow; ?> to
<?php echo $endRow; ?> with
<?php echo $barMax; ?> total rows
</font>
</td>
<td>
<form method="post" action="queryDB1.php">
<input type="hidden" name="startRow"
       value="<?php echo $startRow; ?>" >
<input type="hidden" name="numberRowsToPlot"
       value="<?php echo $numberRowsToPlot; ?>" >
<input type="hidden" name="discrim" value="add" >
<input type="hidden" name="delta" value="10" >
<input type="submit" value=">"
       Style="font-family:sans-serif; font-size:10pt;
       font-style:bold; background:#4400ff none;
height is equal to $barHeight, as shown below:
At the end of each bar, I print the actual value of the bar, accomplished by outputting the value of $dbElements[$i][4].
Getting the Next 10 Rows
At the bottom of Listing 3, there are two forms. I will focus on the first form for the time being. This form takes the current values of $startRow and $numberRowsToPlot and passes them, as hidden values, to the PHP code in Listing 4 (queryDB1.php). This is shown in the code segment below:
<form method="post" action="queryDB1.php">
<input type="hidden" name="startRow"
       value="<?php echo $startRow; ?>" >
<input type="hidden" name="numberRowsToPlot"
       value="<?php echo $numberRowsToPlot; ?>" >
<input type="hidden" name="discrim" value="add" >
<input type="hidden" name="delta" value="10" >
<input type="submit" value=">"
       Style="font-family:sans-serif; font-size:10pt;
       font-style:bold; background:#4400ff none;
       color: #ccbbcc; height: 2em; width: 2em">
</form>
Key within this form code are the variables named $discrim and $delta, which are passed as hidden variables from queryDB.php to queryDB1.php. The ASCII text string “add” is assigned to the discrim field. As you’ll see in a moment, this is the key to how the queryDB1.php code displays results: the values are posted through the form. These are retrieved within queryDB1.php and acted on using the following code:
if ( strcmp($discrim, "add") == 0 ) {
    $startRow = $startRow + $delta;
    $endRow = $startRow + $delta;
    if ( $endRow > $barMax ) {
        $endRow = $barMax;
    }
}
If we click the right-hand arrow in Figure 1 (that is, the “increase” button), then we expect to be presented with the next 10 rows of data. This is accomplished within queryDB1.php by adding the value $delta to the current $startRow and setting the new $endRow equal to the new $startRow plus $delta. We must be careful if we are at the last few elements of data, because by attempting to add $delta rows to the current $startRow we may, in effect, run off the end of the data table. To accommodate this event, I perform a check on the value of $endRow in relation to $barMax. If $endRow is greater than $barMax, then simply assign $barMax to $endRow. The application of this logic results in the screen snapshot shown in Figure 2, in which the next 10 rows appear.
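
As a rough sketch of the logic just described, the “add” step can be wrapped in a small function. The name nextWindow() is made up for illustration and does not appear in the article's listings; the arithmetic is the same as in the code segment above.

```php
<?php
// Hypothetical helper (not part of the article's listings): computes the
// next window of rows, clamping the end of the window to the table size.
function nextWindow(int $startRow, int $delta, int $barMax): array
{
    $startRow = $startRow + $delta;   // advance the start by one page
    $endRow   = $startRow + $delta;   // the window is $delta rows wide
    if ($endRow > $barMax) {          // do not run off the end of the table
        $endRow = $barMax;
    }
    return [$startRow, $endRow];
}

list($start, $end) = nextWindow(10, 10, 25);
echo "Entries: $start to $end\n";     // Entries: 20 to 25
```

Note that, as in the original logic, only $endRow is clamped; a $startRow that already sits near the end of the table can still move past $barMax.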
In the interest of completeness, it must be noted that code Listings 5, 6, and 7 are those for header.php, logo.php, and footer.php, respectively. These are small files that contain web page header, title, and page closing HTML tags that are included in the main PHP documents.
Getting the Previous 10 Rows
This process continues: located at the bottom of queryDB1.php are three forms. The second form is the
<font Style="font-family:arial; font-size:12pt;
      font-style: bold; color: #000000;">
Go to Entry:
</font>
</td>
<td>
<form method="post" action="queryDB1.php">
<input name="startRow" type="text" >
<input type="hidden" name="numberRowsToPlot"
       value="<?php echo $numberRowsToPlot; ?>" >
<input type="hidden" name="discrim" value="val" >
<input type="hidden" name="delta" value="10" >
<input type="submit" value=">|<"
       Style="font-family:sans-serif; font-size:8pt;
       font-style:bold; background:#4400ff none;
       color: #ccbbcc; height: 3em; width: 3em">
$startRow = $_POST['startRow'];
$numberRowsToPlot = $_POST['numberRowsToPlot'];
$discrim = $_POST['discrim'];
$delta = $_POST['delta'];

$debug = "no";

//***************************************
// Open the db connection to sitestats
// and look at the last entry
// ...
mysql_select_db("sitestats", $conn)
    or die("Could not open sitestats: " . mysql_error());


//*****************************************************
// Note: mysql_fetch_row($qry) retrieves a single row
//       mysql_fetch_field($qry, $i) fetches field $i
// ...
$qry = mysql_query($check)
    or die("Could not match data because " . mysql_error());
// ...
if (strcmp($debug, "yes") == 0) echo "<table>";
if (strcmp($debug, "yes") == 0) echo "<th>";
if (strcmp($debug, "yes") == 0) echo "</th>";

$i = 0;
while ($newArray = mysql_fetch_array($qry)) {
    $dow = $newArray['DayofWeek'];
    $mo  = $newArray['Month'];
    $dom = $newArray['DayofMonth'];
    $yr  = $newArray['Year'];
    $vis = $newArray['visits'];
// ...
$endRow = $startRow + $delta;
if ($endRow > $barMax) $endRow = $barMax;
// ...
$startRow = $startRow + $delta;
$endRow = $startRow + $delta;
// ...
$startRow = $startRow - $delta;
$endRow = $startRow + $delta;
same as shown for queryDB.php, in which the variable $delta is added to the current $startRow and $endRow. The first form accommodates the left-hand arrow, and assigns the string “subtract” to the $discrim variable. The code in queryDB1.php is then called recursively (that is, the form posts back to the same script). If the user opts to back up ten rows, then there is a “subtract” branch that does the following:
if ( strcmp($discrim, "subtract") == 0 ) { // Going down
    $startRow = $startRow - $delta;
    $endRow = $startRow + $delta;
    if ( $startRow <= 0 ) {
        $startRow = 0;
        $endRow = $startRow + $delta;
    }
}
In this instance, the $startRow is decremented by the amount in $delta. The $endRow is still set $delta rows above $startRow. Then, we must accommodate the possibility of decrementing below the start row. The conditional statement handles this event by checking whether the current value of $startRow is less than or equal to zero. If so, assign zero to the $startRow variable, and set the $endRow to zero plus $delta.
Starting at an Arbitrary Row
The third and last form contained in queryDB1.php accommodates the condition in which a user wishes to go to an arbitrary row within the table. This behavior is preferred when, for example, much data exists within the database and the user would like to jump nearly to the end.
In this case, the value for $startRow is assigned directly by the user, through the form, and queryDB1.php is called recursively, again. The value of $discrim picks up the string value “val” from the form, and the script uses this to assign the $startRow:
<form method="post" action="queryDB1.php">
<input name="startRow" type="text" >
<input type="hidden" name="numberRowsToPlot"
       value="<?php echo $numberRowsToPlot; ?>" >
<input type="hidden" name="discrim"
       value="val" >
<input type="hidden" name="delta" value="10" >
<input type="submit" value=">|<"
       Style="font-family:sans-serif; font-size:8pt;
       font-style:bold; background:#4400ff none;
       color: #ccbbcc; height: 3em; width: 3em">
</form>
The $startRow variable becomes the point at which values will start to be displayed, and is entered by the user through the form above. Again, queryDB1.php is called recursively, and the $discrim value is set to the string “val”. The code segment that catches this value follows:
if ( strcmp($discrim, "val") == 0 ) { // Go to specific range
    $endRow = $startRow + $delta;
    if ( $endRow > $barMax ) $endRow = $barMax;
}
Listing 4 (cont’d)
echo "<td align=right><font face=arial color=blue "
    . "size=2>" . $dbElements[$i][0] . ",</font></td>";
echo "<td align=right><font face=arial color=blue "
    . "size=2>" . $dbElements[$i][1] . "</font></td>";
echo "<td align=right><font face=arial color=blue "
    . "size=2>" . $dbElements[$i][2] . "</font></td>";
echo "<td align=right><font face=arial color=blue "
    . "size=2>" . $dbElements[$i][3] . "</font></td>";
print("<td>\n");
echo "<font face=arial color=purple size=2>";
echo "<b>";
print("<img src=\"reddot.jpg\" ");
print("width=\"$barWidth\" height=\"$barHeight\">");
// ...
<font Style="font-family:arial; font-size:12pt;
      font-style: bold; color: #000000;">
Entries: <?php echo $startRow; ?> to
<?php echo $endRow; ?> with
<?php echo $barMax; ?> total rows
<!-- ... -->
<form method="post" action="queryDB1.php">
<input type="hidden" name="startRow"
       value="<?php echo $startRow; ?>" >
<input type="hidden" name="numberRowsToPlot"
       value="<?php echo $numberRowsToPlot; ?>" >
<input type="hidden" name="discrim"
       value="subtract" >
<input type="hidden" name="delta" value="10">
<input type="submit" value="<"
       Style="font-family:sans-serif; font-size:10pt;
       font-style:bold; background:#4400ff none;
       color: #ccbbcc; height: 2em; width: 2em">
<!-- ... -->
<form method="post" action="queryDB1.php">
<input type="hidden" name="startRow"
       value="<?php echo $startRow; ?>" >
<input type="hidden" name="numberRowsToPlot"
       value="<?php echo $numberRowsToPlot; ?>" >
<input type="hidden" name="discrim" value="add" >
<input type="hidden" name="delta" value="10" >
<input type="submit" value=">"
       Style="font-family:sans-serif; font-size:10pt;
       font-style:bold; background:#4400ff none;
       color: #ccbbcc; height: 2em; width: 2em">
</form>
</td>
<td>
<font Style="font-family:arial; font-size:12pt;
      font-style: bold; color: #000000;">
Go to Entry:
</font>
</td>
<td>
<form method="post" action="queryDB1.php">
<input name="startRow" type="text" >
<input type="hidden" name="numberRowsToPlot"
       value="<?php echo $numberRowsToPlot; ?>" >
<input type="hidden" name="discrim" value="val" >
<input type="hidden" name="delta" value="10" >
<input type="submit" value=">|<"
       Style="font-family:sans-serif; font-size:8pt;
       font-style:bold; background:#4400ff none;
       color: #ccbbcc; height: 3em; width: 3em">
The $endRow variable is set to $startRow plus $delta. If the $endRow exceeds the number of rows in the database, it is automatically set to the maximum database row. In this way a user can access any starting row and hop over intermediate values as needed. The data are passed recursively back to queryDB1.php using the variables retrieved from the form at the top of Listing 4 ($startRow, $numberRowsToPlot, $discrim, and $delta). The values are set based on the user’s selection during the previous call to queryDB1.php. It is possible to augment these statements by incorporating some error checking into the code to verify that the values have been set within the proper ranges. This is merely one suggestion offered to improve the robustness of the methodology.
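
One way such error checking might look is sketched below. This is not code from the article's archive: the function pageWindow(), its default page size of 10, and the decision to clamp both ends of the window are assumptions made for illustration. It folds the three discrim cases (“add”, “subtract”, and “val”) into one place and forces the result into the range 0 to $barMax.

```php
<?php
// Hypothetical helper (not from the article's listings): validates the
// posted paging values and returns a [startRow, endRow] window that is
// guaranteed to lie within 0..$barMax.
function pageWindow(array $post, int $barMax): array
{
    // Cast to int so non-numeric input becomes 0 instead of flowing onward.
    $startRow = isset($post['startRow']) ? (int) $post['startRow'] : 0;
    $delta    = isset($post['delta'])    ? (int) $post['delta']    : 10;
    $discrim  = isset($post['discrim'])  ? $post['discrim']        : 'val';

    if ($delta < 1) {
        $delta = 10;                      // assumed default page size
    }
    if ($discrim === 'add') {
        $startRow += $delta;              // ">" button
    } elseif ($discrim === 'subtract') {
        $startRow -= $delta;              // "<" button
    }                                     // "val": use $startRow as entered

    // Clamp both ends of the window to the valid range.
    $startRow = max(0, min($startRow, $barMax));
    $endRow   = min($startRow + $delta, $barMax);

    return [$startRow, $endRow];
}

print_r(pageWindow(['startRow' => '95', 'delta' => '10', 'discrim' => 'add'], 100));
```

In the script itself, the call would presumably be pageWindow($_POST, $barMax), made once $barMax is known.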
Operation and Data Base Table Structure
For those interested in using this methodology on their own sites, all files are provided for download in the code archive. Figure 3 shows the structure of the sitestats database and the sitevisits table in a screenshot taken from phpMyAdmin, a useful tool for managing MySQL databases. A user wishing to recreate this site counter tool will need to install MySQL on the server and will need to create the database instance and table required to run the code.
Summary
I have intended to provide some insight into how to develop a simple and useful bar-chart based hit counter using PHP and MySQL. The code I have provided is the same as that which I am using on client sites to keep track of access statistics. A user having ordinary skill in the art of PHP and MySQL can take this idea much farther and include many different types of statistics.
The methodology I provide has educational value, as well, by illustrating a simple manner of implementing PHP database connectivity, a capability that is necessary for any type of advanced commercial application. Some additional ideas include adding site statistics on time of day, user identity, and server identity. It is even possible to accommodate statistics for each web page associated with a site, thereby providing details on the popularity of various pages and on whether the site is able to hold the interest of individuals so that they visit other features available at your site.
There is no limit to what you can do.
To Discuss this article:
http://forums.phparch.com/218
John R. Zaleski, Ph.D., is a biomedical systems engineer with 20 years of experience in software development and medical device integration as applied to acute care hospital environments. He has developed and fielded medical products that are currently in use in large acute care hospitals. He has developed products and many applications in Java, PHP, and MySQL and has authored two dozen patent applications and an equal number of refereed publications in the areas of medical device integration, software methods for medical device communication, software performance, and real-time clinical analysis of patient data.
Unicode is a single character set designed to include characters from just about every writing system on the planet (and off the planet: even Klingon has been written for Unicode, although it is not part of the official standard). In recent years, Unicode has become more prevalent on the web, and all major web browsers, web servers, programming languages, and databases worth their salt now support it. Switching your web applications to Unicode will give you the ability to correctly handle and display any character from any language you’re likely to encounter.
Understanding the significance of Unicode requires first understanding some basics of character sets, and their history. The first thing you need to know was said best by Joel Spolsky of Joel On Software: “There ain’t no such thing as plain text.” If you don’t know the character set and the encoding that were used in the creation of a string of text, then you won’t know how to display it properly. For modern purposes, the story of character sets starts with ASCII. In the 1960s, unaccented English characters, as well as various control characters for carriage returns, page feeds, etc., were each assigned a number from 0 to 127; there was general agreement on these number assignments, and so ASCII was born. The ASCII characters could fit in 7 bits, and computers used 8-bit bytes, which left an extra bit of space. This led to the proliferation of hundreds of different character sets, with each one using this extra space in a different way. The characters from 0-127 are often referred to as Lower ASCII, and the characters from 128-255 as
Many web sites cannot correctly interpret or display anything other than English language characters. Converting your site to UTF-8 (Unicode) enables you to handle characters from almost any language in the world. However, currently available conversion guidelines typically focus on just a single software product, offering little guidance on how to move UTF-8 encoded data between different products. Configuring your web server, PHP, and your database to support UTF-8 is one thing; configuring them so UTF-8 encoded data moves smoothly between them is another. This article guides you through a UTF-8 conversion using PHP, Oracle, and Apache. It also covers data exports to PDF, RTF, email, and plain text.
Solving the Unicode Puzzle
by Michael Toppa
Upper ASCII or Extended ASCII. Extended ASCII character sets added characters from non-English languages, special characters like copyright symbols, and line-drawing characters to simplify drawing boxes, etc. With all these different versions of extended ASCII floating around, text generated on, say, a computer in Russia would turn into gibberish if you tried to read it on a computer in the US. This happened because the number codes representing the Cyrillic characters were assigned to totally different characters on the US computer. This became a bit of a problem when everyone started using the internet.
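
To make the “gibberish” problem concrete, here is a small sketch that decodes one and the same byte under two different 8-bit character sets. The choice of ISO-8859-1 and ISO-8859-5 as the “US” and “Russian” character sets is an assumption for illustration (other Cyrillic encodings exist), and the example relies on PHP's mbstring extension.

```php
<?php
// The same single byte decoded under two different 8-bit character sets.
// Requires the mbstring extension.
$byte = "\xE9";

$asLatin1   = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1'); // Western European
$asCyrillic = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-5'); // Cyrillic

echo bin2hex($asLatin1), "\n";    // c3a9: U+00E9, e with acute accent
echo bin2hex($asCyrillic), "\n";  // d189: U+0449, Cyrillic letter shcha
```

The byte value never changes; only the table used to interpret it does, which is exactly why text moved between such systems without a declared character set becomes unreadable.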
Unicode represents an effort to clean up this mess. The Unicode slogan is: “Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.” Unicode can do this because it allows characters to occupy more than one byte, so it has enough room to store characters from languages around the world, even Asian languages that have thousands of characters. With Unicode, it’s particularly important to understand the distinction between a character set and a character encoding. Unicode is a single character set, but there are three different ways to encode it: they are called UTF-8, UTF-16, and UTF-32 (there’s also UTF-7, but it was never officially adopted by the Unicode Consortium, and for the most part it’s been deprecated in favor of UTF-8). The numbers 8, 16, and 32 indicate the bits used for the Unicode code units (a complete character may occupy more than one code unit, so it can be multi-byte). All three encodings can represent any Unicode character, and each has its own advantages and disadvantages depending on what’s important in a particular implementation. In the case of web applications, UTF-8 is the encoding of choice because it stores the lower ASCII characters in a single-byte format. This makes UTF-8 fully compatible with “plain text” in the ASCII range, even if you’re clueless about character encoding.
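
The difference between the three encodings is easy to see by counting bytes. The following sketch is an illustration only (the sample characters are arbitrary choices, and it assumes the mbstring extension is available); it prints how many bytes each character occupies in UTF-8, UTF-16, and UTF-32.

```php
<?php
// Byte strings written with escapes so the example does not depend on the
// encoding of this source file.
$samples = [
    'A'         => 'A',             // U+0041, lower ASCII
    'e-acute'   => "\xC3\xA9",      // U+00E9
    'euro sign' => "\xE2\x82\xAC",  // U+20AC
];

foreach ($samples as $name => $utf8) {
    $utf16 = mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');
    $utf32 = mb_convert_encoding($utf8, 'UTF-32BE', 'UTF-8');
    printf("%-10s UTF-8: %d  UTF-16: %d  UTF-32: %d bytes\n",
        $name, strlen($utf8), strlen($utf16), strlen($utf32));
}
```

Only in UTF-8 does the letter “A” occupy a single byte with the same value it has in ASCII, which is the compatibility property described above.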
For the sake of brevity, I’ve glossed over a great number of points related to Unicode and character sets. If you want to learn more, I highly recommend the article The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky, at www.joelonsoftware.com/articles/Unicode.html. It contains links to a number of other good resources as well.
Why Care About Unicode?
As far as Unicode and UTF-8 are concerned, all web sites can be placed in one of three categories: those that don’t need to care about them, those that should convert to UTF-8, and those that should convert to UTF-8 and internationalize.
The most common character set currently in use on the English-speaking side of the web, other than UTF-8, is Western ISO-8859-1 (aka Latin-1). If your site isn’t already using UTF-8, then you’re probably using Latin-1. If you’ve had no problems related to character sets so far, and you have absolutely no foreseeable needs to handle text outside the ASCII range, then you fall into the first category: you probably don’t need to do anything. As you’ll see in the rest of this article, converting to UTF-8 is not a painless process, so you should only undertake the work if you have some clearly identifiable, relevant goals to meet.
Here at the University of Pennsylvania School of Medicine, we fall into the second category: our web sites are in English, but we occasionally handle data from a variety of foreign languages that don’t use the English alphabet. We must receive, store, display, and transmit these characters faithfully. Since we can’t reliably predict what sort of characters might come our way, converting our applications to UTF-8 was the logical choice, since it can handle any language we might need to support.
The third category is for sites that don’t just occasionally handle foreign characters: they actually serve an international audience. In addition to using UTF-8, these sites typically employ various mechanisms that allow visitors to choose the language for displaying content. One important term applied here is internationalization, defined by the W3C as “[t]he process of designing, creating, and maintaining software that can serve the needs of users with differing language, cultural, or geographic requirements and expectations” (see http://www.w3.org/TR/ws-i18n-scenarios/). Another key term is localization: “[t]he tailoring of a system to the individual cultural expectations for a specific target market or group of individuals.” Sites that are able to dynamically perform localization for a variety of target audiences can do so because they’ve been configured with a good internationalization framework.
Internationalization and localization are substantial topics, and are not the focus of this article. However, getting all the various components of your web application environment to play nicely together using UTF-8 is a necessary step before you can even try internationalizing your site. So this article will be of interest to those who only want to handle the occasional non-English characters, and to those who are contemplating fully internationalizing their site.
Getting Ready for UTF-8
The first step is determining the scope of your work. At a minimum, you probably have PHP, a web server, and a database to consider. I’ll cover doing a UTF-8 conversion with PHP, Apache, and Oracle. If you are also using Oracle, then you must read An Overview on Globalizing Oracle PHP Applications at http://www.oracle.com/technology/tech/opensource/php/globalizing_oracle_php_applications.html. It’s an excellent starting point, but, unfortunately, it doesn’t always explain the reasons behind its recommendations, which means you’ll get stuck if things don’t happen to work after you follow its instructions. I’ll try to fill those gaps.
You also have to take a look at any other applications that interact with PHP, your web server, or your database, as they will also be affected by a character set conversion. For us, that included Smarty, PDFlib, and exporting data to RTF, text files, and email, so I’ll discuss those as well. Even if you have a different mix of applications, the concepts I’ll describe are probably applicable to your situation, although the implementation specifics, obviously, will be different.
Configuring Apache, PHP, and Oracle
Most of the time, PHP web applications are run under the Apache web server, which itself is running in a user account (assuming you’re in a Unix-ish environment). So, the first step is to set the environment of this account correctly. Since PHP and Oracle are speaking to each other through this account, it’s crucial to specify the right character set for it, so they both know what to expect. You do this by setting the NLS_LANG environment variable in the Apache configuration. The Oracle Overview document mentioned above says to set it to AL32UTF8, but doesn’t fully explain why. So when this didn’t do the trick for me, I had to do some more research. I looked up the Oracle Character Set descriptions and learned that AL32UTF8 corresponds to Unicode 3.1. After talking with our DBA I learned that our Oracle database was set to Unicode 3.0, which meant I needed to set NLS_LANG=UTF8. Note that we ultimately switched to AL32UTF8, since it corresponds to the latest version of Unicode, and in Oracle it allows for conversion between UTF-16 and UTF-8 (just in case you ever need to do that). The moral of the story is that NLS_LANG should exactly match the character set you’re using in Oracle.
What I just said contradicts the advice of the Oracle Overview document, where it says NLS_LANG should be set to match the client (in this case, PHP) but that it doesn’t need to match the database character set. That’s technically true, but a mismatch will quickly lead to trouble if, for example, you try to insert records from PHP that are in an encoding that’s not compatible with the Oracle character set. If you’re going to switch to UTF-8, do it wholeheartedly: set PHP, your web server, and your database all to UTF-8. This will save you the headache of translating character encodings as you move data around.
NLS_LANG is not the end of the story. It applies to the communication between PHP and Oracle, but it doesn’t determine how characters are encoded within PHP, and it doesn’t influence how documents are served by Apache. There are a few different approaches to consider for having Apache and PHP serve your web pages in UTF-8.
If you want all of the documents on your server to default to UTF-8, one option is to set the AddDefaultCharset directive in the Apache configuration to UTF-8. Note, however, that the Apache documentation at http://httpd.apache.org/docs-2.0/mod/core.html does not express enthusiasm about this approach: “AddDefaultCharset should only be used when all of the text resources to which it applies are known to be in that character encoding and it is too inconvenient to label their charset individually. One such example is to add the charset parameter to resources containing generated content, such as legacy CGI scripts, that might be vulnerable to cross-site scripting attacks due to user-provided data being included in the output. Note, however, that a better solution is to just fix (or delete) those scripts…”
If you want all of your PHP-generated content to be served in UTF-8, set default_charset=UTF-8 in your php.ini file. It’s OK if the PHP default_charset is different from what’s specified in Apache AddDefaultCharset: the former will apply only to PHP files, and the latter will apply to everything else.
If you want some (but not all) of your PHP documents served in UTF-8, you don’t have to modify php.ini. Instead, specify UTF-8 as the character set in the Content-type header of those files. It’s important to point out here that you should set this header with the PHP header() function. If you try to set it with an HTML Meta tag, and you’ve used Apache’s AddDefaultCharset directive to specify a different character set, the Apache directive will override your Meta tag.
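
For the per-file approach, a minimal sketch might look like the following. The helper contentTypeHeader() is a made-up name used only to keep the header string in one place; the call that matters is PHP's own header() function, which must run before any page output.

```php
<?php
// Hypothetical helper: builds the Content-Type header line for a page.
function contentTypeHeader(string $mimeType = 'text/html', string $charset = 'UTF-8'): string
{
    return 'Content-Type: ' . $mimeType . '; charset=' . $charset;
}

// Headers must be sent before any page output.
if (!headers_sent()) {
    header(contentTypeHeader());
}
echo "<p>caf\xC3\xA9</p>\n"; // the body itself must really be UTF-8 encoded
```

Declaring UTF-8 in the header does not convert anything: the bytes PHP sends still have to be UTF-8, which is what the rest of this section is about.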
Now that you’ve configured how you want documents served, you need to configure PHP so it can internally handle UTF-8. This means enabling multi-byte character support. You’ll need to re-compile PHP with the --enable-mbstring option (unless, of course, you had the foresight to do it previously), and set mbstring.internal_encoding=UTF-8 in your php.ini file.
Look over the PHP documentation for multi-byte string functions at http://www.php.net/ref.mbstring.
Many of the PHP string functions have multi-byte equivalents. An example is the best way to illustrate what this means. The multi-byte version of strlen() is mb_strlen(). The strlen() function assumes that a character always occupies a single byte, so it actually returns the length of a string in bytes, and does not necessarily indicate the number of characters. In UTF-8, though, a string that is 4 characters long could occupy anywhere from 4 to 16 bytes, depending on the presence of multi-byte characters (a UTF-8 character takes at most 4 bytes). The mb_strlen() function will correctly tell you the number of characters in such a string, but the regular strlen() function won’t.
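
A short illustration of the difference, assuming the mbstring extension is loaded and function overloading is switched off. The string is written with byte escapes so the result does not depend on how this file is saved.

```php
<?php
// "café" in UTF-8: the accented letter occupies two bytes.
$word = "caf\xC3\xA9";

echo strlen($word), "\n";               // 5: counts bytes
echo mb_strlen($word, 'UTF-8'), "\n";   // 4: counts characters
```

Passing the encoding explicitly, as here, avoids relying on the mbstring.internal_encoding setting.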
Because of all this, you should consider enabling PHP’s function overloading feature, described at http://php.net/ref.mbstring#mbstring.overload. Activating function overloading will cause PHP to automatically assume it’s handling multi-byte strings, so (continuing with the example) it will actually execute mb_strlen() when you call strlen(). If you’re making a wholesale conversion to UTF-8, and you don’t want to revise all of the string function calls in your existing code, implementing function overloading makes sense. But there are a couple of caveats:
Watch out for calls to strlen() (or any other string function) where it really is intended to work with the byte length, not the character length. In that situation, function overloading will end up giving you an unintended result. Fortunately, there is a workaround for mb_strlen(): it accepts a character set specification as a second argument, and if you pass in 'latin1' (even though it’s actually handling a UTF-8 string), the string will be evaluated as if it were single-byte encoded. mb_strlen($your_utf8_string, 'latin1') will give you the number of bytes in a multi-byte string.
You may not want to do function overloading on mail(). I’ll explain why in the discussion of email below.
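
The byte-length workaround from the first caveat can be sketched as follows. This example spells the single-byte encoding as 'ISO-8859-1', the formal name for Latin-1, rather than the 'latin1' alias used above; that substitution is mine.

```php
<?php
$utf8 = "caf\xC3\xA9";   // 4 characters, 5 bytes

// Treating the string as a single-byte encoding makes every byte count as
// one "character", so the result is the byte length.
$bytes = mb_strlen($utf8, 'ISO-8859-1');
$chars = mb_strlen($utf8, 'UTF-8');

echo "$chars characters, $bytes bytes\n";   // 4 characters, 5 bytes
```

The mbstring documentation also lists an '8bit' encoding name that serves the same purpose and makes the intent a little more obvious to the next reader.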
Note that if you haven’t upgraded to PHP 5, the html_entity_decode() function will return an error if you pass it a UTF-8 string. This was the only UTF-8 incompatibility we found in PHP 4.3.
Going back to Oracle: starting with Oracle 9i, it provides improved handling for multi-byte characters by giving you a way to distinguish between byte length and character length. When creating a table, you can specify whether a column’s length is defined in terms of characters or bytes. For example, VARCHAR2(20 BYTE) will give you a 20-byte length field, and VARCHAR2(20 CHAR) will give you a 20-character length field. The default is BYTE, which you can alter with the NLS_LENGTH_SEMANTICS parameter; see your Oracle documentation for more details.
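
On the PHP side, the same distinction matters when validating input before an insert. The sketch below is hypothetical (the function fitsColumn() is not part of any Oracle or PHP API) and assumes the database character set is UTF-8, so that a string's byte length in PHP matches its byte length in the column.

```php
<?php
// Hypothetical pre-insert check: does $value fit a column declared with the
// given length under BYTE or CHAR semantics? Assumes the database character
// set is UTF-8, so byte length in PHP matches byte length in the column.
function fitsColumn(string $value, int $length, string $semantics = 'BYTE'): bool
{
    $size = ($semantics === 'CHAR')
        ? mb_strlen($value, 'UTF-8')   // characters
        : strlen($value);              // bytes
    return $size <= $length;
}

$name = str_repeat("\xC3\xA9", 15);    // 15 characters, 30 bytes

var_dump(fitsColumn($name, 20, 'BYTE'));  // bool(false)
var_dump(fitsColumn($name, 20, 'CHAR'));  // bool(true)
```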
Beware Windows-1252 in Web Forms
As I mentioned, other than UTF-8, the character ing you’re most likely to find on English-speaking websites, these days, is Latin-1 (aka Western ISO-8859-1).One of the nice things about UTF-8 is that the first 256characters are the same as in Latin-1 That is, the Latin-
encod-1 ASCII characters and its Extended ASCII characterslive in the same numerical locations in UTF-8 If you’recurrently on Latin-1, this greatly eases the pain ofswitching to UTF-8
So, the big “however” comes from—you guessed it—Windows Fortunately, Windows NT, 2000, and XP useUnicode internally and shouldn’t cause headaches for aUTF-8 web site But Windows 95 and 98 use theWindows-1252 character set Its standard ASCII charac-ters from 0-127 are the same as Latin-1 and UTF-8, butits Extended ASCII set is different If you have a form on
a web page that’s UTF-8 encoded, and someone ning Windows 9x fills out the form by copying-and-pasting text from Microsoft Word, Extended ASCIIcharacters may be interpreted properly You may haveexperienced this before: for example, the “©©” symbol inyour Word document turned into something like “ää”when you pasted it into a form Nothing about thecharacter’s underlying data changed—the decimal rep-resentation of the character is the same as it wasbefore—it just means something different in UTF-8than it does in Windows-1252
run-This was more of a problem in the past than it is now,
as modern browsers try to transparently perform acharacter set conversion for you as needed in these sit-uations But the problems are by no means entirely
resolved: see FORM submission and i18n ath
http://ppewww.ph.gla.ac.uk/~flavell/charset/ / f
form-i18n.html l for a thorough overview of all theissues related to this, as well as a rundown of how themajor browsers behave (if you’re wondering about the
meaning of i18n, it’s short-hand for
internationaliza-tion)
What makes this a truly maddening problem is converting a Latin-1 encoded database to UTF-8 when some of the data in it came from Latin-1 encoded web forms where users pasted in Windows-1252 text, and their browsers didn’t convert the characters properly. There is no easy fix for this, as you simply have to look at the records yourself to see if the Extended ASCII characters are displaying as the user intended, or if there was a character set conversion problem along the way.
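
The mismatch itself is easy to reproduce. The following is a minimal sketch, not code from our application: it assumes the iconv extension is available, and cp1252_to_utf8() is a hypothetical helper name. It takes a pair of Windows-1252 curly quotes and converts them to UTF-8 twice, once naming the correct source character set and once treating the bytes as Latin-1.

```php
<?php
// Hypothetical helper (an assumption for illustration): converts text that
// was really typed as Windows-1252 into UTF-8. Needs the iconv extension.
function cp1252_to_utf8($bytes)
{
    return iconv('CP1252', 'UTF-8', $bytes);
}

// 0x93 and 0x94 are curly quotes in Windows-1252;
// Latin-1 has control codes in those positions.
$pasted = "\x93Hello\x94";

// Correct source charset: each quote becomes a three-byte UTF-8 sequence.
echo bin2hex(cp1252_to_utf8($pasted)), "\n";

// Wrong source charset: the bytes are read as Latin-1 control codes,
// so the quotes are silently lost.
echo bin2hex(iconv('ISO-8859-1', 'UTF-8', $pasted)), "\n";
```

The two hex dumps differ only in the bytes for the quotes, which is exactly the kind of damage you end up hunting for by eye in the database.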
UTF-8 Support in Smarty
Smarty handles UTF-8 transparently, almost. The one trouble spot is the escape modifier. It calls the PHP htmlentities() and htmlspecialchars() functions, but it doesn’t provide them with the charset argument they need to work with UTF-8. The solution is to override escape with your own custom version. Start by making a copy of the Smarty escape modifier, and tweak it to pass along a charset argument to PHP. Then override the original with your custom version. If you won’t always be using UTF-8, set your custom version to accept a charset argument, so you can adjust the functionality as needed. Look up the “Extending Smarty with Plugins” section of the manual on the Smarty site (http://smarty.php.net/) for instructions on how to customize Smarty.
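
To give an idea of what the tweak amounts to, here is a rough sketch of a charset-aware escape function. It is not Smarty’s actual modifier code: the function name escape_with_charset() and the two escape types shown are assumptions for illustration, and hooking it into Smarty as a plugin is left to the manual section mentioned above. The only point is the third argument passed to the PHP functions.

```php
<?php
// Sketch of a charset-aware escape helper. The name and the two escape
// types handled here are assumptions; Smarty's own modifier supports more.
function escape_with_charset($string, $esc_type = 'html', $charset = 'UTF-8')
{
    switch ($esc_type) {
        case 'html':
            // The third argument tells PHP which encoding the input uses.
            return htmlspecialchars($string, ENT_QUOTES, $charset);
        case 'htmlall':
            return htmlentities($string, ENT_QUOTES, $charset);
        default:
            return $string;
    }
}

// Markup characters are escaped; the UTF-8 "é" passes through intact.
echo escape_with_charset("<b>caf\xC3\xA9</b>"), "\n";
```

Making the charset a parameter with a UTF-8 default keeps the function usable on pages that are still served in another character set.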
Exporting UTF-8 Data to PDF, RTF, Plain Text, and Email
It may not always be wise, or even possible, to keep data encoded in UTF-8 when exporting to other formats. As you’ll see below, sometimes you need to change the character set before performing the export. Take a look at PHP’s utf8_decode() and iconv() functions to learn about converting UTF-8 to a single-byte encoding. Note that utf8_decode(), while easy to use, is limited to the Latin-1 character set (see the user contributed notes on the PHP utf8_decode() page for tips on dealing with other character sets).
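
As a small illustration of such a conversion, the sketch below uses iconv() to turn a UTF-8 string into Latin-1. The helper name utf8_to_latin1() is made up for this example, and the code assumes the iconv extension is loaded. For text that fits in Latin-1, utf8_decode() produces the same bytes.

```php
<?php
// Hypothetical helper: convert a UTF-8 string to single-byte Latin-1
// before an export step. Assumes the iconv extension is loaded.
function utf8_to_latin1($utf8)
{
    // iconv() returns false when it cannot perform the conversion.
    return @iconv('UTF-8', 'ISO-8859-1', $utf8);
}

$utf8   = "caf\xC3\xA9";          // "café": the é takes two bytes in UTF-8
$latin1 = utf8_to_latin1($utf8);  // the é takes one byte in Latin-1

echo strlen($utf8), ' bytes -> ', strlen($latin1), " bytes\n";
```

Unlike utf8_decode(), iconv() lets you name a different target character set when Latin-1 is not enough.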
Our applications require exporting data to PDF, RTF, text files, and email:
To generate PDF, we run the PDFlib application on our web server to create PDF documents on the fly. PDFlib is an application specifically designed for processing PDF data and dynamically generating PDF documents; you can learn more about it at http://www.pdflib.com/. For it to work with UTF-8 data, you need to use it with a UTF-8 compatible font. The commonly used Windows TrueType fonts (Arial, Times New Roman, and Courier New) are Unicode compliant. However, that doesn’t mean they can display any Unicode character. They are fine for English and most Central and Eastern European languages. For more on this, see the Font section of Alan Wood’s Unicode Resources at http://www.alanwood.net/unicode/. It’s important to mention Microsoft’s Arial Unicode MS font, which is not the same as the standard Arial font. Arial Unicode MS can display characters from Arabic, Tamil, Thai, Hangul, Chinese, and many other languages. This means the font itself is huge: approximately 23 MB. If you try to use it with PDFlib running on your web server, you may run into performance problems.
If you are using, for example, Microsoft Word, it’s easy to take a Unicode document and save it as an RTF file. It’s also not difficult to use a tool like RTF File Generator (available at http://www.paggard.com/projects/rtf.generator/) to generate RTF files using PHP, as long as the source data does not include characters from multiple languages. It turns out to be quite difficult to use PHP to generate an RTF file when the source data is UTF-8 encoded and contains characters from several different languages. This is because RTF requires you to specify a character set for displaying the characters, and you can’t just say “Unicode.” You have to specify one or more ANSI, PC-8, Mac, or IBM PC character sets. This means you must analyze the multi-byte characters in a UTF-8 string and figure out what characters they represent. Then you need to specify in the header of the RTF file what character sets are needed to display them: a Hebrew character set for Hebrew characters, Arabic for Arabic, etc. Then in the body of the file you must flag the various chunks of non-English text and indicate which of these character sets are needed to display them. Rather than attempting this Herculean task, our solution is to do a utf8_decode() on our data before generating RTF files, so that the text is all in Latin-1. At the moment we can get away with this since none of the data going into the RTF files we currently generate contains non-English characters. We are planning to eventually discontinue our RTF support, so this will not be a long-term problem. Acquiring an understanding of how RTF works with Unicode data was difficult: of all the applications we encountered in this project, RTF was the least well documented when it came to Unicode.
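
To show what just the first step of that task involves (working out which scripts a UTF-8 string contains), here is a rough sketch. It does not generate RTF and is not something we use in production. It assumes PHP’s PCRE library was built with Unicode property support, and the function name detect_scripts() and the short list of scripts are made up for the example.

```php
<?php
// Illustration only: report which of a few scripts occur in a UTF-8
// string, using PCRE's Unicode script properties (needs the /u modifier).
// The function name and the short script list are assumptions.
function detect_scripts($utf8)
{
    $found = array();
    foreach (array('Latin', 'Hebrew', 'Arabic', 'Cyrillic', 'Greek') as $script) {
        if (preg_match('/\p{' . $script . '}/u', $utf8) === 1) {
            $found[] = $script;
        }
    }
    return $found;
}

// "Hello" followed by the Hebrew word "shalom", written as raw UTF-8 bytes.
$text = "Hello \xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";
echo implode(', ', detect_scripts($text)), "\n";
```

Each detected script would still have to be mapped to an RTF character set and every run of text flagged accordingly, which is the part that makes the job so laborious.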
We export data to text files, primarily in CSV format for use in spreadsheets. Surprisingly, current versions of Microsoft Excel do not support importing UTF-8 encoded text files. As with RTF, our solution is to perform a utf8_decode() before generating these text files. This doesn’t pose any problems for us since the kind of data we put in spreadsheets does not contain any non-English characters.

As I mentioned, I do not recommend doing function overloading on the PHP mail() function. The reason has to do with line breaks. In Unix, a line break is represented by a line feed (LF, or \n) character; on Macs, it’s represented by a carriage return (CR, or \r) character; and on Windows, by a CR+LF (\r\n). For email to work between platforms, an email standard was agreed upon in the early days of the internet, which is CR+LF. So, for example, on Unix, sendmail will add a CR as needed to each LF it finds in the body of an email message. But when an email is UTF-8, PHP will first base64 encode it before passing it off to sendmail. This encoding is done so that multi-byte UTF-8 characters can be transported within the 7-bit world of email (for more about this, see Advanced E-mail Manipulation by Wez Furlong, php|architect Vol 3, Iss 5). Sendmail and other mailers do not attempt to wade through the base64 encoding to “fix” the line breaks. Unless you’re careful to put CR+LF line breaks in all your PHP generated emails before sending them, you’ll end up sending emails with improper line breaks. This can have unpredictable results, as you’re at the mercy of the recipient’s email client software, and what it chooses to do with malformed line breaks. In our testing, we found that the LF-only line breaks in our UTF-8 encoded emails were interpreted as desired in Mac and Unix mail readers, and by Microsoft Outlook on Windows, but not by Eudora 6.2 (and previous versions) on Windows. In Eudora, the messages displayed with no line breaks at all. You can’t say it’s a Eudora bug, since the line breaks weren’t meeting the standard. At this time, the emails we generate only contain basic English characters, so sticking with the standard mail() function meets our needs for now.
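
If you do need to send UTF-8 mail, the line breaks have to be fixed before the body is base64 encoded. The sketch below shows one way that might be done. The helper names normalize_crlf() and encode_body() are hypothetical, and the code only prepares the body; it does not build headers or send anything.

```php
<?php
// Hypothetical helpers, shown only to illustrate the line-break issue.

// Convert any mix of LF, CR, and CR+LF line breaks to CR+LF.
function normalize_crlf($text)
{
    return preg_replace('/\r\n|\r|\n/', "\r\n", $text);
}

// Base64-encode a UTF-8 body in 76-character lines ending in CR+LF.
function encode_body($utf8Body)
{
    return chunk_split(base64_encode(normalize_crlf($utf8Body)), 76, "\r\n");
}

$body = "First line\nSecond line\n";
echo encode_body($body);
```

Normalizing first matters because, once the text is encoded, the mailer can no longer see the original line breaks and so cannot repair them.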
The Bumpy Road to Unicode Compliance
As you can see, converting your web site to UTF-8 is by no means a painless process. But the payoff is worth it if you plan to support characters from several languages. It’s also a fascinating educational experience: you’ll gain a stronger understanding of how Apache, Oracle, and PHP interact, how Unicode supports so many different languages, some of the gory details of how email works, how browsers deal with mismatching character sets, what a Unicode compliant font is, and much more. Even if you’re not using the same software discussed in this article, hopefully I’ve at least imparted a sense of what kinds of problems you should look out for. If nothing else, hopefully you’ll remember, “there ain’t no such thing as plain text.”
To Discuss this article:
http://forums.phparch.com/219
Michael Toppa is a web applications developer at the University of Pennsylvania School of Medicine. He has previously worked for Ask Jeeves, E*TRADE, and Stanford University Libraries’ HighWire Press. He can be found on the web at www.toppa.com. Credit for a lot of the research in this article goes to all of the U Penn School of Medicine Web Development team.
XMLPull
an Alternative to SAX and DOM
by Markus Nix

Despite the popularity of known APIs for XML processing, such as SAX and DOM, the XMLPull parser is finding more and more followers. There are equivalent programs for Java, Python, and Perl, and Harry Fuecks is writing an equivalent implementation for PHP. PHP 5 also comes with a native extension called xmlReader.

The hype around XML (the logical connection of structure and data within a document) remains unbroken: there is no serious Content Management System that doesn’t offer at least rudimentary XML support in one form or another.

The dominant APIs for XML processing are DOM (Document Object Model) and SAX (Simple API for XML), two APIs that focus more on tags and less on data. The DOM API creates an XML document in a tree-like structure that is saved in memory for continuous use. SAX is different: it runs through a document and fires events based on the contents of the XML it is parsing.

Even before there was XML, there was the Document Object Model, or DOM. It allows a developer to refer to, retrieve, and change items within an XML structure, and is essential to working with XML. The Document Object Model is a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the structure, content and style of documents. For large XML documents the memory and processor resources consumed can be prohibitive, because building a DOM object is relatively processor intensive and the resulting DOM object usually consumes a large amount of memory.

The SAX parser is often used to process large XML documents, but, unfortunately, it is poorly designed. Rather than being called by the parsing application, the SAX parser uses a message handler with callbacks, which is not straightforward. The approach taken by SAX makes the software architecture much more difficult than it needs to be. Although the resulting code may look sufficient, there are always some inherent problems because SAX does not maintain information about the current state; that’s up to you. This can be fixed by keeping track of how deeply nested the start/end element is and by using extra flags, but it always requires adding extra state variables and code to do validation. Unlike that of DOM, the SAX specification is not a W3C (World Wide Web Consortium) standard; it was, instead, created by the members of the XML-DEV mailing list. A SAX parser doesn’t build a tree structure of the document in memory, like DOM does: the XML document is read sequentially, and special events are fired if the parser recognizes a significant component of the document (e.g. a comment). The parser doesn’t keep track of previous elements; when it runs into a recognized chunk of the document, its work is done.

XMLPull is an alternative API for parsing XML. Perhaps you find the memory consumption of DOM too high or the manipulation of data with SAX too involved. If so, it will pay to take a closer look at XMLPull. Parsing XML with XMLPull reflects the organization of data structures, and therefore code written to use the XMLPull parser is much easier to maintain. State information is kept, naturally, on the parser’s stack, as a consequence of method calls that can be nested as many times as necessary. Pull parsers offer big ease-of-use advantages compared to SAX, but you may be left wondering if they can measure up to SAX’s industrial-strength performance. They can!
XMLPull was introduced in early 2002 by ringleaders from the two leading pull parser implementations, Stefan Haustein from the kXML project and Aleksander Slominski from XPP3 (XML Pull Parser). Both, feeling that the lack of a common API hindered wider pull parsing adoption, began to work on XMLPull in December 2001. The resulting API reflects their substantial experience, drawing from their respective projects to produce an interface that works well for a wide range of applications.
XMLPull for Java, for example, supports everything from J2ME (Java 2 Platform, Micro Edition) to J2EE (Java 2 Platform, Enterprise Edition). The J2ME requirement forced the lead developers of XMLPull to create a simple interface with the minimum number of classes necessary to function well in low memory environments. In contrast, J2EE environments don’t usually suffer from such limited resources, but, instead, demand flexibility and performance. Accommodating both extremes with a single interface is tough.
According to the API introduction by Aleksander Slominski, “XML pull parsing allows incremental (sometimes called streaming) parsing of XML where application is in control - the parsing can be interrupted at any given moment and resumed when application is ready to consume more input.”
While many Java programmers are already familiar with XMLPull, this method of accessing an XML document is still strange to most PHP programmers. The xmlReader API is similar to the SAX API (which is frequently used for simple XML processing in PHP), but provides a simpler, more standard and more extensible interface to handle large documents than the existing SAX version. It should be noted that XMLPull has no notion of callbacks. Think of XMLPull as defining a special kind of iterator that delivers an XML document’s components to you, one at a time. It is totally up to you to decide when you’re done with the current component, and ready to move to the next one. The parser always holds a particular state that matches the current component type. Many of the methods prove meaningful only when the parser is in a particular state, which is identified by a set of constant definitions.
The Java API allows you to choose the detail level that your program will see. This is a very powerful feature when talking about layering. The original SAX interface did not report all of the information needed to validate a document, so developers had to build special methods into their parsers if they wanted to support validation.

require_once( XML_XMLPULL . 'XmlPull/PushListener.php' );
require_once( XML_XMLPULL . 'XmlPull/PullParser.php' );

/**
 * Factory function for creating the pull parser
 * @param string parser type ('Expat' or 'HTMLSax')
 * @param string reader type ('File', 'String' or 'Struct')
 * @param mixed source to read (e.g. string, file path, struct)
A new Java Community Process (JCP) specification request specifies a standard API for Java pull parsers: JSR-173 (Streaming API for XML). Like SAX, XMLPull is not a W3C recommendation, as the only existing reference implementations are explicitly Java based (see the XMLPull API at http://xmlpull.org/).
A PHP Implementation by Harry Fuecks
If you know how callback functions work in the SAX parser, the interface of the XMLPull parser is easy to understand: a simple factory method is enough to establish a parser or reader type. The document is easily iterated to capture the parts of the document that are of interest. The HTMLSAX XMLPull implementation continues in the spirit of the original Java specification, and supplies a simple interface, versatility, ease of use, and good performance.
SAX Pushes, XMLPull Pulls
The pull parser turns the paradigm of SAX parsers around. Instead of the parser being forced to execute predefined callback functions when a certain component of a document is reached, it is asked to reply with the next component. This results in “pulling” instead of “pushing”, and makes data processing easier.
In the Java community, there is a certain hype that surrounds pull parsing because, unlike SAX (or rather SAX2, if you prefer working with namespaces), it gives control of the parsing process back to the developer, instead of relying on a “black box.” XMLPull allows incremental (streaming) parsing, so it is possible to pause the parser in its work, for example, to wait for the arrival of new data in unpredictable surroundings (such as when pulling data from a remote server). J2ME, mentioned above, is a typical example of such surroundings: a parser there needs good performance with a small footprint.
The PHP implementation follows the Java API in most scenarios. The principle of parsing using pull is very easy: the parser iterates over a data stream with the parse() method, and travels from event to event. In the original Java API, the various event types are returned by the getEventType() method as values that correspond to constants: START_DOCUMENT, START_TAG, TEXT, END_TAG, and END_DOCUMENT. In PHP, these differ slightly: XML_PULL_START_TAG, XML_PULL_END_TAG, XML_PULL_TEXT and XML_PULL_PI. XML_PULL_START_TAG offers information about the start tag of an element, including information about the attributes. XML_PULL_TEXT delivers CDATA information. The other conditions are self-explanatory. The parsing of an XML document with XMLPull can be seen in Listing 2.

At the time of writing, Fuecks’ Pull Parser supports four conditions that are represented through the constants that I’ve mentioned above. In addition to these main four, there are also XML_PULL_ESCAPE and XML_PULL_JASP; these are useful only when working with the PEAR package (also written by Harry Fuecks). Support for namespaces is currently missing.
Most SAX parsers are built on top of a pull parsing layer. It is an interesting challenge to expose both the pull and push layers to the user, but such functionality allows a developer to use pull parsing when needed, without having to stop using the SAX API.

It is possible to convert a pull parser into a push model: during pull parsing, the caller has control over parsing and can push events. It is also possible to convert push into pull parsers, but this requires that all events be buffered, and converted from SAX callbacks.
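
The buffering approach can be sketched in a few lines on top of PHP’s SAX-style xml_parser functions. The class below is a hypothetical illustration and not Fuecks’ implementation: the class name, the method names, and the event labels (borrowed from the Java constants) are all assumptions.

```php
<?php
// Hypothetical class for illustration; it is not Fuecks' implementation.
// SAX callbacks push events into a buffer, and nextEvent() lets the
// caller pull them out one at a time.
class BufferedPullParser
{
    private $events = array();

    public function __construct($xml)
    {
        $parser = xml_parser_create('UTF-8');
        xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
        xml_set_element_handler($parser, array($this, 'onStart'), array($this, 'onEnd'));
        xml_set_character_data_handler($parser, array($this, 'onText'));
        xml_parse($parser, $xml, true);   // all events are buffered up front
    }

    public function onStart($parser, $name, $attrs)
    {
        $this->events[] = array('START_TAG', $name);
    }

    public function onEnd($parser, $name)
    {
        $this->events[] = array('END_TAG', $name);
    }

    public function onText($parser, $data)
    {
        $this->events[] = array('TEXT', $data);
    }

    // Returns the next buffered event, or null when none are left.
    public function nextEvent()
    {
        return array_shift($this->events);
    }
}

$pull = new BufferedPullParser('<a>hi</a>');
while (($event = $pull->nextEvent()) !== null) {
    echo $event[0], ': ', $event[1], "\n";
}
```

The price of this simple design is memory: because the whole document is parsed before the first event is handed out, the buffer grows with the document, which is the cost that the buffering requirement implies.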
An alternative implementation of this conversion involves an extra thread that can be used to pull more data from the SAX parser, but is kept suspended until the user asks for more events. This approach is best exemplified by Fuecks’ Pull Parser Wrapper for SAX that allows conversion from a SAX model into an XML pull parser. The parser implementation by Fuecks is based on the XML_SaxFilters PEAR package (see http://pear.php.net/package/XML_SaxFilters), and uses PEAR’s iteration mechanism extensively. The PHP implementation of the SAX filter code was originally from Luis Argerich (http://phpxmlclasses.sourceforge.net/show_doc.php?class=class_sax_filters.html), and was mentioned in greater detail in the Wrox Press title “PHP 4 XML.” Fuecks’