JUNE 2004 VOLUME III - ISSUE 6
The Magazine For PHP Professionals
NEURAL NETWORKS
> Artificial Intelligence made easy with PHP and FANN <
Spell checking with PHP
Automatic language detection
Make your script determine the
language of written text
Portable and stable GUI applications with PHP and XUL
Efficient Oracle Programming
Incredible-looking forms with PHP,
Sign up before July 20th and save up to $100!
Christian Mayaud — Getting Your OSS Business Funded, Rasmus Lerdorf — Best Practices for PHP Developers, Jim Elliott — Open Source: The View from IBM, Daniel Kushner — Attacking the PHP Market, Andrei Zmievski — Andrei’s Regex Clinic, Wez Furlong — Introducing PDO, Regina Mullen — OSS in Legal Technology, Derick Rethans —
Multilingual Development with PHP, George Schlossnagle — PHP Design Patterns
and many, many more!
Jump Right To It.
PHP And the What-if Machine
by Andi Gutmans and Marco Tabini
Low-impact Programming with
*By signing this order form, you agree that we will charge your account in Canadian dollars for the “CAD” amounts indicated above. Because of fluctuations in the exchange rates, the actual amount charged in your currency on your credit card statement may vary slightly.
Choose a Subscription type:
Canada/USA $97.99 CAD ($69.99 US*)
International Air $139.99 CAD ($99.99 US*)
Combo edition add on $14.00 CAD ($10.00 US) (print + PDF edition)
Your charge will appear under the name "Marco Tabini & Associates, Inc." Please allow up to 4 to 6 weeks for your subscription to be established and your first issue
to be mailed to you.
*US Pricing is approximate and for illustration purposes only.
php|architect Subscription Dept.
You’ll never know what we’ll come up with next
Subscribe to the print edition and get a copy of Lumen's LightBulb, a
$99 value absolutely FREE †!
Login to your account
for more details
EXCLUSIVE!
† Lightbulb Lumination offer is valid until 12/31/2004 on the purchase of a 12-month print subscription.
Graphics & Layout
no responsibilities with regards of use of the information contained herein or in all associated material.
Contact Information:
Copyright © 2003-2004 Marco Tabini & Associates, Inc.
— All Rights Reserved
This month’s issue marks the first time, at least to my knowledge, that a topic such as artificial intelligence has been discussed in a PHP publication. AI is one of those topics most people talk about without really understanding its capabilities, and this has resulted in a lot of confusion out there. If you’re worried that your server will become sentient and try to take over the world (or, worse, spend all your money), you can rest assured that that will not be the case (at least until you run Internet Explorer; that’ll do the trick).
However, a technology like neural networks can come in very handy for a website developer. Ad-hoc predictive solutions for tasks such as fraud prevention and customer enticement already exist out there and are readily available for everyone to use, often at a steep price. As a PHP developer, however, you are both luckier and less fortunate at the same time. The FANN library extension that is now available through PECL provides you with the facilities needed to create, train and execute a generic neural network, which means that you can not only build applications similar to, or even better than, the ones available commercially, but also build new and exciting ones.
On the other hand, designing and training a neural network is a bit of a “black art” that requires a lot of trial and error, so you’ll have to be very creative with it. It’s excellent news for us that Evan Nemerson, who is the author and maintainer of the extension (as well as one of the original authors of the library), has agreed to tackle the problem of creating a neural net from a practical perspective: building a simple script that is capable of automatically determining the language in which a string of text is written. Even with surprisingly little training (and, even better, very little actual PHP code), the network can reach remarkably high levels of accuracy.
Still, I’m fairly convinced that, once more people start appreciating the abilities of the FANN library in finer detail, we’ll see applications built on top of it become available for everyone to use and tweak, and before you know it, your computer will shut down at the sound of “I’ll be back.”
Neural networks are not all we’re doing this month, of course. Ilia Alshanetsky covers spell checking, a topic that can be helpful to everyone who runs a website. As it turns out (but not surprisingly), PHP has excellent facilities that support spell-checking operations. We also have a great article on optimizing Oracle-based websites; now that Oracle is placing more and more interest in open-source projects, this is likely to come in handy to more and more developers, even if they are not in the enterprise arena. If you ever wanted to create beautiful-looking forms but dreaded the prospect of converting them to PDF, you’ll likely be
EDITORIAL
Continued on page 9
“eZ publish is an open source content management system and development framework. As a content management system (CMS) its most notable feature is its revolutionary, fully customizable, and extendable content model. This is also what makes it suitable as a platform for general Web development. Its stand-alone libraries can be used for cross-platform, database independent PHP projects. eZ publish is also well suited for news publishing, e-commerce (B2B and B2C), portals, and corporate Web sites, intranets, and extranets. eZ publish is dual licensed between GPL and the eZ publish professional license.”
View more information at eZ.no
phpPgAdmin 3.4 Released
Postgresql.com announces the release of phpPgAdmin 3.4.
“phpPgAdmin is a web-based administration tool for all 7.x versions of PostgreSQL.”
Some new features include:
• Add CACHE and CYCLE parameters in sequence creation
• View, add, edit and delete comments on tables, views, schemas, aggregates, conversions, operators, functions, types, opclasses, sequences and columns (Dan Boren & ChrisKL)
• Add config file option for turning off the display of comments
• Allow creating array columns in tables
• Allow adding array columns to tables
• many more…
Get all the info at Postgresql.com
Zend Technologies and Apollo Interactive Unite
Thursday, May 27th 2004 13:48:55 GMT
“Apollo Interactive®, America's leading Interactive Agency, and Zend Technologies, the PHP company, today announced a partnership to promote excellence in open source development. Through the alliance, the companies will share their varied technology perspectives to improve the functionality of the PHP language (which was developed by the founders of Zend) and refine PHP implementation for large, high-volume enterprise Web sites.
The combination of Apollo’s significant PHP site development experience and Zend’s technological expertise will help drive the continued evolution of PHP, an open source Web scripting language that is gaining momentum as the most popular language to power dynamic Web sites. The alliance will further the development of PHP’s infrastructure and enable Zend to establish best practices for its implementation in large enterprise environments.”
For more information visit: www.zend.com
NEW STUFF
PHP5 Coding Contest
Want to put your PHP5 skills to the test? Zend has announced its PHP5 coding contest, of which php|architect is also a sponsor.
“We’ve got lots of Prizes to give out just for entering, as well as the Grand Prizes: a top-of-the-range Dell laptop for a developer working by himself or an Apple iPod Mini for each member of your team! Your application will be rated both by your peers and by the panel of Judges we’ve assembled from among the most known and well-respected names in the PHP community.”
Get all the Contest information from Zend.com
Looking for a new PHP Extension? Check out some of the latest offerings from PECL.
BLENC 1.0alpha
BLENC is an extension that hooks into the Zend Engine, allowing for transparent encryption and execution of PHP scripts using the blowfish algorithm. It is not designed for complete security (it is still possible to disassemble the script into op codes using a package such as XDebug), however it does keep people out of your code and make reverse engineering difficult.
odbtp 1.1.1
This extension provides a set of ODBTP, Open Database Transport Protocol, client functions. ODBTP allows any platform to remotely access Win32-based databases. Linux and UNIX clients can use this extension to access Win32 databases like MS SQL Server, MS Access and Visual FoxPro.
PDO_MYSQL 0.1
This extension provides a MySQL 3.x/4.0 driver for PDO.
PHP 4.3.7 Released
PHP.net announced the release of PHP 4.3.7. The PHP Development Team is proud to announce the release of PHP 4.3.7. This is a maintenance release that, in addition to several non-critical bug fixes, addresses an input validation vulnerability in escapeshellcmd() and escapeshellarg() functions on the Windows platform. Users of PHP on Windows are encouraged to upgrade to this release as soon as possible.
For more information visit: http://qa.php.net/
New at php|a: PayPal support and single prints
Monday, June 7th 2004 13:22:00 GMT
You asked for it! php|architect's purchasing system now accepts PayPal as a valid payment method! You can use your PayPal account safely and securely to pay for all your php|a purchases.
Also, effective immediately you can now purchase individual print issues that will be delivered directly to your doorstep. Expect more past issues to become available as we update our inventory and introduce new shipping methods to get the magazines out to you faster!
PHP 5 Release Candidate 3 Released!
Tuesday, June 8th 2004 12:48:09 GMT
PHP.net announces the third release candidate of PHP5!
The third (and hopefully final) Release Candidate of PHP 5 is now available!
This mostly bug fix release improves PHP 5's stability and irons out some of the remaining issues before PHP 5 can be deemed release quality. Everyone is now encouraged to start playing with it!
There are few changes since Release Candidate 2.
For more information visit: www.php.net
• Creates configurations from scratch
• Parses and outputs different formats (XML, PHP, INI,
Apache )
• Edits existing configurations
• Converts configurations to other formats
• Allows manipulation of sections, comments, directives
• Parses configurations into a tree structure
• Provides XPath-like access to directives
XML_HTMLSax3 3.0.0RC1
XML_HTMLSax3 is a SAX-based XML parser for badly formed XML documents, such as HTML.
The original code base was developed by Alexander Zhukov and published at http://sourceforge.net/projects/phpshelve/. Alexander kindly gave permission to modify the code and license for inclusion in PEAR.
PEAR::XML_HTMLSax3 provides an API very similar to the native PHP XML extension (http://www.php.net/xml), allowing handlers using one to be easily adapted to the other. The key difference is HTMLSax will not break on badly formed XML, allowing it to be used for parsing HTML documents. Otherwise HTMLSax supports all the handlers available from Expat except namespace and external entity handlers. Provides methods for handling XML escapes as well as JSP/ASP opening and close tags.
DB_DataObject 1.6.1
DataObject performs 2 tasks:
1. It builds SQL statements based on the object's vars and the builder methods.
2. It acts as a datastore for a table row.
The core class is designed to be extended for each of your tables so that you put the data logic inside the data classes.
PHP_Beautifier 0.0.6.1
This program reformats and beautifies PHP source code files automatically. The program is Open Source and distributed under the terms of PHP License. It is written in PHP 5 and has a command line tool.
Editorial: Continued from page 5
interested in this month’s article on FDF forms: PHP provides an excellent interface to Adobe’s FDF library that lets you combine a PDF form with POST data and create a print-quality document with little or no effort.
Elsewhere, we cover XUL, the interface development language that must have been born out of one Mozilla developer asking the others “and now, how do we do it in Windows?” XUL is great for building a GUI application that can be ported across several operating systems and that requires almost no programming and, certainly, no code in C, Visual Basic et similia.
Finally, this issue also marks the debut of our very own Peter MacIntyre in the role of reviewer and author. Peter is a great help in the editorial process and, as it turns out, an incredibly gifted reviewer. Now, if I could only interest him in some Italian food…
“LightBulb is a complete, browser-based, WYSIWYG
PHP development suite which includes a PHP application generator, a code editor (with context and
classes prompting and highlighting), a complete
middleware/framework environment (Lumenation),
a GUI application interface, record locking, HIPPA
application compliance, user application logging,
transaction logging, current user monitoring, a
library of PHP classes and data access security, DB
compatibility, a report builder, a query builder, an
SQL builder, a source code manager, an application
management system, and a virtual desktop system
metaphor, and many other features.”
For more information or to download, visit ezsdk.com
Having your PHP-driven website use an Oracle database means you can tap into some of the most powerful tools available for web development. The speed and reliability of PHP code, coupled with the power and flexibility of an Oracle database, gives developers a tough combination to beat. However, unless you are careful, performance and reliability issues can creep up on you and become increasingly difficult to resolve.
This article is an attempt to outline a few steps you can take to create PHP and Oracle code that works well together and minimizes resource utilization on both sides. In other words, we’re presenting the tools and techniques of “low impact programming.”
All of the examples shown are drawn from real-life pain and suffering. Our environment consists of a number of web servers running Linux, Apache version 2, and usually the latest version of PHP. Our Oracle database servers typically run on Sun hardware operating Solaris (although we do have several test Oracle servers running on Linux).
We’ll start by describing the various ways in which you can minimize the impact of coding decisions on both the web and database servers. We will follow this up with some very specific examples of common tasks and the approaches you can take with them. We’ll conclude with tools and techniques for monitoring your progress at making low impact and robust web sites with PHP and Oracle.
What is “Low Impact Programming”?
Low impact programming, more an attitude than a skill, means always trying to reduce the ways in which the code we write and the configurations we make impact the servers on which they run. It means always searching for ways to reduce resource utilization regardless of how often a piece of code will be run.
In practical terms, writing low impact code means that your systems will scale without having to continue throwing hardware at the problem. Moreover, by concentrating on reducing resource utilization to accomplish the same tasks, you end up making your systems much more robust and fault tolerant.
We are often lulled into a false sense of security when we use tools like PHP and Oracle, because of their inherent speed and reliability. The danger is that when performance issues arise, they escalate quickly.
Keeping Resource Utilization Light
A number of factors influence what resources are required when connecting PHP and Oracle in a website. The easiest ways to reduce resource usage include using persistent database connections, avoiding unnecessary Oracle database commits, taking advantage of the Oracle SQL cache, and minimizing data transfer.
REQUIREMENTS
PHP and Oracle are an excellent combination for creating powerful and scalable web solutions. This article sheds light on those performance issues that might arise only under high-traffic situations, so that you can stop them before they ever start cropping up.
Persistent Connections
Many arguments exist both for and against using persistent connections. The biggest single advantage to using persistent connections, with Oracle as your database in particular, lies in the fact that creating database connections takes a lot of time and CPU power on both ends of the connection. In our testing, we found that opening a new Oracle database connection added between 0.25 and 0.5 seconds per page. Using persistent connections saved us this time on nearly every page.
If you choose to use persistent connections with an Oracle database, you must be aware of many things. Chief among them is that resources opened by one script on a persistent connection will remain open for subsequent scripts on the same connection. This has a cumulative effect on your database server and can be a stealth reason for system slowdowns and phantom error messages.
As an example, each statement handle created opens a cursor. The open cursor in the Oracle database represents a memory handle within the database, and all Oracle databases have a finite number of these handles available. While the persistent connections will eventually close (closing all the open cursors on that connection) when the Apache child process ends, on a busy site you can see the open cursor counts rise until you start getting error messages. If you’re going to use persistent connections, then, you must specifically close all statement handles you create.
Another problem area that can sneak up on you is the use of Oracle session parameters. One of the most common uses of Oracle session parameters is to set the default date format. If you want a particular date format for a query on one page and use the Oracle session parameters to accomplish that, then that change in the date format will persist to all scripts that happen to use the same connection. This can lead to very inconsistent output without any clear indication of why it is happening.
The last danger with persistent connections is that current versions of PHP don’t handle database restarts very well. If you are using persistent connections and restart your Oracle database, all of the open connections being used by your Apache server will become corrupt but will not be reopened until the next page. This means that every time you restart your Oracle database, you need to restart your Apache server or your users will see lots of error messages until all the old connections are retired.
Using persistent connections to reduce resource utilization can work if your PHP scripts all follow these guidelines:
• Program all scripts to clean up after themselves
• Only use Oracle session parameters in well defined and agreed upon ways
Having all your scripts clean up after themselves is an easy idea. If you open a statement handle, close it. If you open a new descriptor, close it. If you can remember to always close every resource you ever open, your Oracle server will reward you with even performance and high uptimes.
While there are quite a few useful features in an Oracle server that can be taken advantage of via Oracle session parameters, these parameters must always be mutually agreed upon by all the programmers and used consistently. If one programmer chooses to set an Oracle session parameter, the unintended effects this will have on everyone else’s code are very difficult to predict. Moreover, the cause of a bug report involving these types of parameters is almost impossible to find.
Minimizing Commits and Transaction Size
Every time an Oracle database does a commit, it will save whatever is in the current buffer to disk. This is true whether or not there is any data to save. Each of these disk writes takes time and resources on the Oracle server.
Because the default behavior of the Oracle functions in PHP is to have auto-commit turned on, you dramatically increase the number of unnecessary disk writes performed by the database server. The reason that so many of the disk writes are unnecessary lies in the fact that almost every statement handle used in a PHP site is a query for data. Unless your query is a “select for update” operation, a select statement will require no saving of data and only needs the disk to read.
The easiest way to avoid doing commits when all you want to do is read data from the database is to use the OCI_DEFAULT option on your OCIExecute statements. This changes the behavior of SQL execute statements from auto-committing your statement handle to deferring the commit. While this might lead to problems with the rollback space, if you followed the earlier advice of always closing your statement handles, then resource utilization will be kept to a minimum.
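As a rough illustration of the read-only pattern just described, the sketch below executes a select with OCI_DEFAULT and releases the statement handle when it is done. It assumes the OCI8 extension is loaded and uses the newer oci_* spellings of the OCI functions; fetch_rows_readonly() is a hypothetical helper name, and $conn is assumed to be a connection opened elsewhere.

```php
<?php
// Sketch only: assumes the OCI8 extension and an already-open connection.
// fetch_rows_readonly() is a hypothetical helper, not part of OCI8.
function fetch_rows_readonly($conn, string $sql): array
{
    $stmt = oci_parse($conn, $sql);
    if ($stmt === false) {
        throw new RuntimeException('Could not parse statement');
    }
    try {
        // OCI_DEFAULT runs the statement without an automatic commit,
        // which is all a plain select needs.
        if (!oci_execute($stmt, OCI_DEFAULT)) {
            throw new RuntimeException('Could not execute statement');
        }
        $rows = [];
        while (($row = oci_fetch_assoc($stmt)) !== false) {
            $rows[] = $row;
        }
        return $rows;
    } finally {
        // Release the statement handle (and its cursor) even on failure.
        oci_free_statement($stmt);
    }
}
```

A call such as fetch_rows_readonly($conn, 'select ename from emp') would return an array of rows. Putting the cleanup in a finally block is one way to make sure a persistent connection is not left holding an open cursor when a query fails.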
When doing inserts, updates, and deletes, however, you must do a commit or your data changes will not be saved. There may be situations where you wish to defer the commit until after further operations are completed, but under most circumstances you’ll want to commit as soon as you execute the statement. The only situation where you must defer a commit is when using certain types of bind variables such as when dealing with large objects or PL/SQL.
If you are going to defer commits when doing inserts, updates, or deletes, then you must make sure that you keep enough room in your Oracle rollback segments. The size of your rollback segments should be more than large enough to accommodate the largest transaction you will ever have in a single script. If you’re going to do a lot of work with large objects, then you should make sure that your rollback segments can accommodate the largest large object you think you’ll encounter.
Leveraging the SQL Buffer Cache
One of the chief benefits of using an Oracle database as the backbone of your PHP-driven site is that the SQL engine available to you has enormous power to manipulate data. With that power, however, you pay a price as your queries become more and more complicated. Each time you pass a SQL statement to the Oracle database, it must be parsed and an execution plan must be created. Every SQL statement must pass through the Oracle cost based optimizer (CBO) to determine what indexes will be used and in what order, along with the various join conditions to best return the data requested.
To keep from returning to the CBO unnecessarily, Oracle will maintain a cache of the results of each parse and execution. However, this cache is keyed on the exact text of the SQL statement and is case sensitive. As a result, to properly leverage this cache and avoid having to re-parse the same SQL statements over and over again, all of your SQL statements must be standardized.
The easiest way to standardize your SQL statements is to follow two simple guidelines: always use a consistent case convention and avoid putting newline characters in your SQL strings. While many programmers like to use a mixed case for all of their SQL statements, it is hard to find two programmers who will do it exactly the same way. The easiest way to avoid having case issues deny you equal access to the SQL buffer cache is to always use the same case for all SQL statements. While it is a personal choice on which case to use, we choose to always use lower case for SQL statements, since they are easier to type this way.
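One way to enforce these two guidelines mechanically is to pass every statement through a small normalizing function before it is parsed. The sketch below is only an illustration: normalize_sql() is a hypothetical helper, and it assumes that string literals are single-quoted and that your statements do not rely on double-quoted, case-sensitive identifiers (which it would lower-case).

```php
<?php
// Hypothetical helper: gives every statement one canonical spelling so that
// textually different copies of the same query become the same string.
function normalize_sql(string $sql): string
{
    // Split on single-quoted literals ('' is an escaped quote) so that their
    // contents are left alone; the capture group keeps them in the result.
    $parts = preg_split("/('(?:[^']|'')*')/", $sql, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            // Odd positions are the captured string literals: keep as written.
            $out .= $part;
        } else {
            // Outside literals: collapse newlines and runs of whitespace,
            // then lower-case the text.
            $out .= strtolower(preg_replace('/\s+/', ' ', $part));
        }
    }
    return trim($out);
}
```

With this in place, a query typed as "SELECT ename FROM emp" on one page and "select ename\n from emp" on another reaches the database as the same text.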
Another way to leverage the SQL cache is to look for queries that vary only by particular parameters. For example, if you are always calling up rows from a particular table just varying the primary key you query on, then you can make them all use the same SQL cache entry by using a bind variable for the varying parameter. In this way, the SQL is always the same but the bind variable lets you select which row you wish to return.
Another way to reduce the load on the Oracle database server is to move more complex queries into Oracle views. A view is simply a pre-defined query. The advantage to a view is that it gets compiled and optimized when it is created, rather than whenever the associated statement is executed. As a result, when the PHP script calls on the view, the database server has already dealt with the complex conditions. Effective use of views, often difficult for PHP programmers making the transition from other database systems, can dramatically reduce resource utilization. Another benefit is that your DBA can optimize the views in the system while leaving the PHP programmers’ code untouched.
By following these guidelines, a busy site can often achieve a buffer cache hit ratio of 80%, or even better under some circumstances. This will dramatically reduce the load on the CBO and other aspects of the Oracle database server.
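The bind-variable approach described in this section might look like the following sketch. It assumes the OCI8 extension and the newer oci_* function names; fetch_employee() is a hypothetical helper, and the emp table with an empno primary key is an assumption based on the sample schema the article's listings appear to use.

```php
<?php
// Sketch only: assumes OCI8 and an open connection in $conn.
// The table and column names (emp, empno, ename, sal) are assumptions.
function fetch_employee($conn, int $empno)
{
    // The SQL text is identical on every call; only the bound value changes,
    // so every lookup can share one cached statement.
    $stmt = oci_parse($conn, 'select ename, sal from emp where empno = :empno');
    if ($stmt === false) {
        throw new RuntimeException('Could not parse statement');
    }
    try {
        oci_bind_by_name($stmt, ':empno', $empno);
        if (!oci_execute($stmt, OCI_DEFAULT)) {
            throw new RuntimeException('Could not execute statement');
        }
        // Returns an associative array, or false when no row matches.
        return oci_fetch_assoc($stmt);
    } finally {
        oci_free_statement($stmt);
    }
}
```

Compare this with building the statement by concatenating the key into the string, which would produce a different SQL text, and therefore a fresh parse, for every employee number.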
Minimizing Data Transfer
Another stealth reason for seemingly slow performance lies in how much data is transferred between the Oracle server and the Apache server. Hopefully, the database and web servers are located physically close to one another (preferably on the same network subnet). However, large amounts of data transfer, often unnecessary, can cause slowdowns and lengthen response times.
One way to reduce data transfer is to create views that let you retrieve only the data that you actually need. For example, if you have a table with one or more large objects in it and you don’t need LOB data, don’t put those columns in your query. If you prefer to use the select * from… syntax, then create a view that contains the non-LOB columns of the table.
Another way to avoid unnecessary data transfer is to avoid running queries inside your result loops. If you have your code do an inner query for each return result row, then you’re going to be putting a lot of extra pressure on data transfer between your PHP script and the Oracle server. Because of the ability of the Oracle SQL engine to do complex, multi-dimensional queries, it is almost always possible to write a nested query as a single query.
Another place where data transfer can hurt performance is within the database itself. It is often difficult for the CBO to know that a join condition is a foreign key/primary key relationship. If you know that only one matching row will ever occur for a given join condition, then you can let the CBO know this by passing the FIRST_ROWS SQL compiler directive as in Listing 1. This lets the CBO know that all of the join conditions are foreign keys to primary keys.
One last condition where data transfer can affect performance is in the use of database links between multiple Oracle servers. When you do joins across database links or even on the far side of the database link, the local CBO will work especially hard putting all the data together. You will often find that performance on both the local database server and remote database server suffers as data transfers across the database link eat up resources on both sides.
Optimizing Common Tasks with PHP and Oracle
There are a number of common tasks where the choices made by the programmer can have a significant cumulative effect on performance. Among the tasks where these choices arise are providing paged output, computing subtotals and grand totals, finding sums and averages conditionally, and querying against date constraints. In all cases, there are multiple ways to accomplish the same task. We will show what we have found produces the least impact on all our servers together.
Paging Query Output
Programmers are often called upon to provide search output or reports in a paged format. For example, you may want to show search results limiting the output to only 20 results per page. In a web environment, this helps to avoid situations where a poor search may return thousands of entries, only the first few of which are really of interest to the user.
With some database systems, including MySQL and, to some extent, Microsoft SQL Server, built-in extensions to SQL allow you to do this quickly and easily. Fortunately or unfortunately, Oracle databases have only half of what you need to limit output, and it is always done before the sorting requested in the query. Given those limitations, you need to decide whether you will have PHP limit your search output or whether you will have your Oracle query return only the rows you are interested in.
To really understand the limitations in Oracle, let’s start to build a paged query from the inside out. Let’s say you want to retrieve all employee records sorted by last name and then first name. The query in Listing 2 works well enough. So long as there are only a few dozen employees, you never need worry about how many rows get returned and displayed in the browser.
Once you move beyond a few dozen rows returned to a few hundred rows returned, you’ll want to limit the output. If you wanted to display only the first twenty rows in the query and you had done a superficial reading of the Oracle SQL manuals, you would be tempted to use the query in Listing 3. Unfortunately, the Oracle CBO will limit your query by row number prior to applying the sorting routines, giving you inconsistent output. The cure is to put the main query from Listing 2 into a subquery, as in Listing 4.
This basic technique can give you a query that cuts off at a particular maximum value. This works well enough to display the first page of your paged output. Assuming you want to display the second page, you will have to employ another round of querying. If you modify the query in Listing 4 to also return the row number, then you can put the whole query in yet another subquery and limit output to only those rows whose row number is at least as large as the minimum value you want to return. The full query shown in Listing 5 shows how to return rows 20 through 39 of the result set.
The main performance issues you’ll face when trying to decide whether to use the paged query described above versus doing a solution with PHP code will center on whether data transfer or the Oracle CBO optimizations give you the best performance. Our experience has demonstrated that for the first 3-4 pages of output, the query solution gives slightly better results. The larger the inner result set and the more pages of output the user pages to, the closer the performance impact of the query and PHP solutions becomes.
rownum < 40) f where
f.r >= 20
Listing 5
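The nested-subquery pattern described in this section can be sketched as a small query builder. This is an illustration rather than a reconstruction of the article's listings: paged_query() is a hypothetical helper, the aliases e, f and r are chosen to match the visible tail of Listing 5, and the inner query is assumed to be trusted SQL that already contains its own order by clause.

```php
<?php
// Hypothetical helper: wraps an already-sorted query so that only rows whose
// row number lies between $first and $last (1-based, inclusive) come back.
function paged_query(string $sortedSql, int $first, int $last): string
{
    if ($first < 1 || $last < $first) {
        throw new InvalidArgumentException('Expected 1 <= first <= last');
    }
    // "rownum < $limit" keeps row numbers 1..$last of the sorted result.
    $limit = $last + 1;
    // Innermost: the sorted query. Middle: number the rows and cut off at the
    // maximum. Outermost: drop the rows below the minimum.
    return "select f.* from (select e.*, rownum r from ($sortedSql) e "
         . "where rownum < $limit) f where f.r >= $first";
}
```

For example, paged_query('select * from emp order by ename', 20, 39) produces a statement ending in "where rownum < 40) f where f.r >= 20". Because the page boundaries appear as literals in the SQL text, each page is a distinct statement; binding the two limits instead would let all pages share one cached statement.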
Using Roll-up Queries
Another common task when producing reports involves creating subtotals and grand totals. When the only computations are summations and counts, whether you use the aggregation functions in Oracle or you use PHP variables is almost immaterial. However, if your query involves averages or other functions, then you’ll want to take advantage of the large family of aggregation functions available in Oracle SQL.
The roll-up features available in Oracle SQL center on
options given the GROUP BY clause along with use of
the GROUPING function An example query using
these functions is shown in Listing 6 This query findsthe minimum, maximum, average, and totals for
$sql = "select d.dname, e.ename, min(e.sal) as min_salary, ";
$sql .= "max(e.sal) as max_salary, avg(e.sal) as avg_salary,
emp e, dept d where e.deptno=d.deptno group by
rollup(d.dname, e.ename)
Listing 6
salaries by employee with department subtotals and a grand total for all rows. In order to know which rows contain subtotals for a particular column, we include values from the GROUPING functions in our SELECT clause. The main thing to remember about the GROUPING function is that it behaves opposite to what you might expect: it returns 1 when the column has been rolled up into a subtotal and 0 on ordinary detail rows. Moreover, when a column is part of a rolled-up subtotal, the return set will have a NULL value in that column. You cannot count on this NULL alone to tell you when you have a subtotal row, because there may be actual NULLs that are a legitimate part of the return set.
A complete PHP example that runs the query from Listing 6 and uses the GROUPING function output to show subtotal rows and the grand total row is shown in Listing 7.
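To make the GROUPING flags concrete, here is a minimal sketch of how a fetched row might be labelled. The column aliases G_DNAME and G_ENAME (standing for grouping(d.dname) and grouping(e.ename)) are assumptions of ours, and the sample rows are hand-written stand-ins for what OCIFetchInto() would return.

```php
<?php
// Assumed aliases: G_DNAME = grouping(d.dname), G_ENAME = grouping(e.ename).
// A flag of 1 means "this column was rolled up on this row".
function rollup_row_label(array $row)
{
    if ($row['G_DNAME'] == 1) {
        return 'Grand total';                    // every column rolled up
    }
    if ($row['G_ENAME'] == 1) {
        return 'Subtotal for ' . $row['DNAME'];  // employees rolled up
    }
    return $row['DNAME'] . ' / ' . $row['ENAME'];  // ordinary detail row
}

// Hand-written sample rows in the order ROLLUP(d.dname, e.ename) produces.
$rows = array(
    array('DNAME' => 'SALES', 'ENAME' => 'ALLEN', 'G_DNAME' => 0, 'G_ENAME' => 0),
    array('DNAME' => 'SALES', 'ENAME' => null,    'G_DNAME' => 0, 'G_ENAME' => 1),
    array('DNAME' => null,    'ENAME' => null,    'G_DNAME' => 1, 'G_ENAME' => 1),
);
foreach ($rows as $row) {
    echo rollup_row_label($row), "\n";
}
```

Testing the flags rather than looking for NULL names keeps the labels correct even when a department or employee name is legitimately NULL.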
Using Case and NVL
A number of specialized data situations occur where the Oracle CASE and NVL functions provide beneficial solutions. Often, you will want to total things conditionally or otherwise operate on selective data. In addition, there are cases where you want to pivot your output from what the natural select order would give you. Finally, there are cases where you want one thing to happen if there is a value in a column and another thing to happen if the column is null.
To conditionally operate on columns, you can use PHP with variable accumulators, or you can do these accumulations in your query. The advantage to doing them in PHP is that you will cut down on the amount of work the Oracle CBO has to do when it is assembling the query. However, with the use of the SQL buffer cache, indexes, and potentially views, you can mitigate this quite a bit. The big disadvantage to doing this with PHP is that you will have a lot of unnecessary data transfer between your Oracle server and your Apache server. Moreover, you will be increasing the memory utilization on your Apache server, a commodity that is usually in short supply on a busy machine.
Let's say that you wanted to sum up the salaries for all managers. You can do this with the PHP code in Listing 8. This will retrieve all the data from the database and decide which data to use in its sum. Alternatively, you can use the single query shown in Listing 9 that will give you the answer right away.
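The contrast between the two approaches can be sketched as follows. This is an illustration under assumptions: we assume the EMP demo table has a JOB column holding 'MANAGER', the query text is our own guess at the shape of such a statement rather than a copy of Listing 9, and the rows are hand-written stand-ins for fetched data.

```php
<?php
// Query-side accumulation: one row comes back, already summed.
// (Assumed column names: JOB and SAL on the EMP demo table.)
$sql = "select sum(case when job = 'MANAGER' then sal else 0 end) as mgr_total "
     . "from emp";

// PHP-side accumulation: every row crosses the wire and is filtered here.
function manager_total(array $rows)
{
    $total = 0;
    foreach ($rows as $row) {
        if ($row['JOB'] == 'MANAGER') {
            $total += $row['SAL'];
        }
    }
    return $total;
}

// Stand-in rows for what a fetch loop over EMP would deliver.
$rows = array(
    array('ENAME' => 'JONES', 'JOB' => 'MANAGER', 'SAL' => 2975),
    array('ENAME' => 'SMITH', 'JOB' => 'CLERK',   'SAL' => 800),
    array('ENAME' => 'BLAKE', 'JOB' => 'MANAGER', 'SAL' => 2850),
);
echo manager_total($rows), "\n";   // prints 5825
```

Both routes give the same number; the difference is that the PHP loop had to receive three rows to produce it, while the CASE query returns exactly one.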
When you wish to transpose or pivot the output, you are often forced to retrieve all your data via calls to the database and then put it into arrays in PHP for output. This often is the most efficient method for accomplishing this task. However, if the circumstances are right, you can use the CASE function to retrieve the columns you wish to use individually. We have found this particularly useful when we wish to display reports on transactional data for today, yesterday, this week, and this month. The use of a CASE statement to pivot around these date values makes the process very straightforward. If you refer to Listing 10, you'll see a sample query whose output appears in Figure 1. This, like the previous example, gets just the information required with a minimum of data transfer.
Finally, when you are looking to execute a query that will have conditional logic based on whether a column is NULL or contains a value, the NVL function can greatly speed up the process. There are a number of ways in which this function can be of use. For example, when we have a sequence value that is used in a table for sorting purposes, we often want to have new entries appended at the end of the sequence. The SQL statement in Listing 11 will insert a new item and will guarantee that it will always appear at the end of the list.
Another instance where the NVL function can be of use is when dealing with effectivity dates. In those
while (OCIFetchInto($stmt, $row, OCI_ASSOC)) {
    $mgr_total += $row["SAL"];
sum(case when trunc(last_login_date)=trunc(sysdate)
    then 1 else 0 end) as today,
sum(case when trunc(last_login_date)=trunc(sysdate-1)
    then 1 else 0 end) as yesterday,
sum(case when trunc(last_login_date, 'IW')=trunc(sysdate, 'IW')
    then 1 else 0 end) as this_week,
sum(case when trunc(last_login_date, 'MONTH')=trunc(sysdate, 'MONTH')
    then 1 else 0 end) as this_month
(menu_item_seq.nextval, nvl(max(seq)+10, 10), 'New Item', '/new_item.html')
Listing 11
cases, you want to know if the current date is at least what the start date is indicating and at most what the end date is indicating. Listings 12 and 13 contain two alternative queries that will return the same data. The results of these queries are not terribly different in their impact on the database server and are mostly just an exercise in thinking about the uses of NVL in creative ways.
Fast Oracle Date Functions
One thing that often trips up programmers who are new to Oracle databases is that the Oracle DATE column data type is actually a date and time column. Oracle does not have a column type that is date only or time only, as many other RDBMS products do. Instead, Oracle date arithmetic treats a date as a number of days: the integer portion counts whole days, and the fractional portion represents what portion of a day the time represents. Thus, adding 10.5 to a date moves it forward by ten days and twelve hours.
This day-based arithmetic means that there are some very quick methods for doing particular kinds of date-related logic. For example, if you want to know the number of days between two dates (not counting any time of day differences), then you can use the TRUNC function as shown in Listing 14. If you want to know if a column named LAST_DATE matches today, then you can compare TRUNC(LAST_DATE) with TRUNC(SYSDATE). The TRUNC function, given just a single parameter, discards the time-of-day portion. This has the effect of converting the date and time into midnight of the given date.
If you pass additional arguments to TRUNC, you can move your date in even more strategic fashions. For example, to see whether two dates are in the same month, you can compare TRUNC(date1, 'MONTH') to TRUNC(date2, 'MONTH'). To see if two dates are in the same week, you can use TRUNC(date1, 'IW') and TRUNC(date2, 'IW'). Note that we use IW instead of WW since in Oracle, IW refers to an ISO week specification in which weeks always begin on Monday. If you use the WW week parameter, then the week will begin on whatever day of the week that year's January 1st occurs on.
If you refer back to Listing 10, you will see an effective use of the TRUNC function with Oracle dates. These functions are much faster than using either TO_CHAR comparisons or doing comparisons of BETWEEN. Moreover, because Oracle date columns also contain a time, using TRUNC will save you from inclusive problems when you do use a BETWEEN function. For example, let's say that you want to know all transactions that occurred between August 1, 2003, and August 31, 2003. If you just used the query in Listing 15, then no transactions that occurred during the day on August 31 would be included. However, if you use the query in Listing 16, then you'll pick up everything that occurred on August 31 regardless of the time.
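The end-of-range problem is easy to reproduce. In the sketch below, the table name `transactions` and the column name `txn_date` are hypothetical and the two query strings are our own illustration of the idea, not copies of Listings 15 and 16. The small function replays the same comparison in PHP with UTC timestamps so the effect can be seen without a database.

```php
<?php
// Hypothetical table/column names. The first form misses anything after
// midnight on the 31st; the second truncates the column before comparing.
$plain = "select * from transactions"
       . " where txn_date between to_date('2003-08-01', 'YYYY-MM-DD')"
       . " and to_date('2003-08-31', 'YYYY-MM-DD')";
$truncated = "select * from transactions"
           . " where trunc(txn_date) between to_date('2003-08-01', 'YYYY-MM-DD')"
           . " and to_date('2003-08-31', 'YYYY-MM-DD')";

// The same BETWEEN test replayed in PHP on UTC timestamps.
function in_august_2003($ts, $truncate)
{
    $lo = gmmktime(0, 0, 0, 8, 1, 2003);    // Aug 1, 00:00:00
    $hi = gmmktime(0, 0, 0, 8, 31, 2003);   // Aug 31, 00:00:00
    if ($truncate) {
        $ts -= $ts % 86400;                 // back to midnight, like TRUNC()
    }
    return $ts >= $lo && $ts <= $hi;
}

$afternoon = gmmktime(15, 30, 0, 8, 31, 2003);   // Aug 31, 15:30
var_dump(in_august_2003($afternoon, false));     // bool(false)
var_dump(in_august_2003($afternoon, true));      // bool(true)
```

A transaction logged at 15:30 on the last day falls outside the plain range because the upper bound is midnight at the start of that day.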
Tuning and Monitoring
In a perfect world, all PHP programmers would be experts in creating pre-tuned SQL statements. If we could always be counted on to do things in the most efficient manner, then we could do away with monitoring of our databases. However, since none of us ever seems to live in this perfect place, there is always a need to keep an eye on which queries are using what resources on the database and on the web server in order to keep on top of performance issues.
Tuning and monitoring consists of a number of tasks, most of the time performed by an Oracle DBA.
(start_date < sysdate or start_date is null) and
(end_date > sysdate or end_date is null)

nvl(start_date, sysdate) <= sysdate and
nvl(end_date, sysdate) >= sysdate
Listing 13
However, occasions do occur where a PHP programmer can participate in the tuning and monitoring cycles. Often, a DBA will find a query that is performing badly, will know how to fix it, but will have a terrible time actually finding where this query lives in the code for the web site. Moreover, the DBA will need to work closely with the programmer to ensure that any alterations to the query will continue to return the correct data.
The basics of performance tuning come down to two tasks for programmers: finding bad (or poorly performing) SQL and creating monitoring tools.
Finding Bad SQL
The Oracle data dictionary keeps track of system resource utilization for each and every query in the system. You can query various system tables to discover all kinds of performance characteristics at any time you wish to.
There are several ways in which to measure good and bad performance for Oracle SQL. The chief characteristics we monitor include buffer gets, parse calls, and disk reads. These refer to the various parts of the Oracle database server having the greatest impact on query performance. In each case, lower numbers indicate better performance characteristics.
To find the queries that have the poorest ratio of buffer gets, you can perform the query in Listing 17. Buffer gets (logical reads from the buffer cache) are a rough measure of CPU utilization in the Oracle server. If you are concerned only with a few database users, then you can limit the where clause to include only the database users you wish to find. To interpret the ratio returned in this query, let's examine the manner in which it is constructed. This query returns the ratio of buffer gets over the number of times the query was executed. This will let newer queries (those that haven't been executed many times yet) stand out over queries that have been in the system longer.
Another measure of poor performance would be the number of times a particular query must be parsed by the Oracle CBO. A query to find the worst performers in this category is shown in Listing 18. A higher number indicates a query that may need to be placed in a view or otherwise optimized to avoid having to parse it over and over again. While a view won't reduce the number of "soft" parses, it will cut down on the number of "hard" parses. Again, by dividing the number of parse calls by the execution count, newer queries will rise to the top of the list.
The last major area of performance indicators would be looking at those queries with the highest ratio of disk reads. To find these queries you can use Listing 19. This query will report a high ratio for a query if there are a large number of disk reads for each execution of the query. These are likely candidate queries to be further optimized with additional WHERE clauses or other techniques to cut down on data transfer.
More information on interpreting these values can be found in the Oracle publication Oracle 8i Designing and Tuning for Performance. This is usually available as a PDF document in the set of CDs that came with your Oracle server software.
select rownum as rank, b.*
from (
    select u.username, v.parse_calls, v.executions,
           round(v.parse_calls/decode(v.executions,0,1,v.executions)) as ratio,
           v.sql_text
    from v$sql v, dba_users u, (
        select parsing_user_id, 6*avg(parse_calls) as avg_parse_calls
        from v$sql where parse_calls > 0 group by parsing_user_id ) a
    where v.parse_calls > a.avg_parse_calls
      and v.parsing_user_id=a.parsing_user_id
      and v.parsing_user_id=u.user_id
    order by round(v.parse_calls/decode(v.executions,0,1,v.executions)) desc ) b
where rownum <= 80
Listing 18
select rownum as rank, b.*
from (
    select u.username, v.buffer_gets, v.executions,
           round(v.buffer_gets/decode(v.executions,0,1,v.executions)) as ratio,
           v.sql_text
    from v$sql v, dba_users u, (
        select parsing_user_id, avg(buffer_gets) as avg_buffer_gets
        from v$sql where buffer_gets > 0 group by parsing_user_id ) a
    where v.buffer_gets > a.avg_buffer_gets
      and v.parsing_user_id=a.parsing_user_id
      and v.parsing_user_id=u.user_id
    order by round(v.buffer_gets/decode(v.executions,0,1,v.executions)) desc ) b
where rownum <= 80
Listing 17
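Since the queries for buffer gets, parse calls, and disk reads differ only in the V$SQL column they examine, a script can generate them from one template. The helper below is a sketch of that idea: the function name `worst_sql_query()` and the alias `avg_metric` are our own, the column `disk_reads` is assumed to be used the same way as the other two counters, and the 6x multiplier that Listing 18 applies to the average is left out for simplicity.

```php
<?php
// Hypothetical helper: builds a "worst offenders" query for one V$SQL
// counter, following the shape of Listing 17.
function worst_sql_query($metric, $limit = 80)
{
    // Only known counter columns may be spliced into the SQL text.
    $allowed = array('buffer_gets', 'parse_calls', 'disk_reads');
    if (!in_array($metric, $allowed, true)) {
        return false;
    }
    $limit = (int) $limit;
    // decode() swaps a zero execution count for 1 to avoid division by zero.
    $ratio = "round(v.$metric/decode(v.executions,0,1,v.executions))";
    return "select rownum as rank, b.* from ("
         . " select u.username, v.$metric, v.executions, $ratio as ratio, v.sql_text"
         . " from v\$sql v, dba_users u,"
         . " (select parsing_user_id, avg($metric) as avg_metric"
         . " from v\$sql where $metric > 0 group by parsing_user_id) a"
         . " where v.$metric > a.avg_metric"
         . " and v.parsing_user_id = a.parsing_user_id"
         . " and v.parsing_user_id = u.user_id"
         . " order by $ratio desc) b"
         . " where rownum <= $limit";
}

echo worst_sql_query('disk_reads'), "\n";
```

Restricting `$metric` to a fixed list matters here because the column name cannot be supplied as a bind variable.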
Monitoring System Resources
When you've started looking at the performance of various queries within your system, you will soon find yourself wanting to do something more systematic to keep on top of performance issues before they get out of hand. When this time comes, you'll want to have a set of queries and a process in place to look at SQL performance over time.
By putting the queries mentioned above into a regularly scheduled script, you can see over time which queries are being used most often by your applications. This can be an invaluable tool for programmers who want to find out where optimization efforts will yield the highest results and can also keep you from spending lots of time optimizing a query that is run once per day at the expense of optimizing a query run on every single page on your site.
Another area where you can track performance issues is on the Apache servers. In this case, looking at how many active Apache child processes occur at any given time, as well as tracking load average, memory utilization, and overall system process counts, can help identify problems with your web server before they get out of hand.
We use a number of scripts and tools to monitor our systems on a regular basis. Among the key tools are scripts that run the queries looking for poor performance in buffer gets, parse calls, and disk reads on a daily basis. We also have on every web server scripts that monitor load averages, process counts, and memory utilization and feed that data into a round-robin database (RRD). We can then generate graphs of system performance for a number of time periods on an ongoing basis.
Only through a concerted effort on a number of fronts can you maintain a good picture of where your performance issues lie today and where the performance issues of tomorrow will likely occur.
Summary
Having access to a powerful database like Oracle is a tremendous asset to a PHP programmer. The flexibility and power of the data engine is something that can really help create complex and robust web sites. However, performance issues often arise that will take you by surprise unless you are prepared to deal with them.
Hopefully, this guide can serve as a starting point for you and your organization to take steps in utilizing your Oracle database to its full potential without causing too many problems. The lessons passed on here are all the result of painful processes as we dealt with performance issues in real life crisis situations. Perhaps learning how we solved performance issues will keep your web and database servers working well together.
PHP's primary spell checking functionality is made available through the pspell extension, which is based on the Aspell library.
The Aspell library is a well-established open source spell-checking engine used by many other applications. One of its neat abilities is the capability to spell check multiple languages, rather than the single one that most other solutions are limited to. At this time, Aspell has dictionaries for over 20 languages, and new ones are being added all the time. Because Aspell is a fairly commonly used library, it can be found by default on most open-source operating systems; chances are, you won't actually need to download and install a new library to take advantage of what Aspell has to offer.
This is very useful, because it makes the process of adding the pspell extension to PHP a simple matter of recompiling PHP with the --with-pspell flag, which should be helpful if you need to convince your ISP to add this extension. Unfortunately, even though the underlying library is almost always available, very few ISPs actually have this extension enabled, so keep that in mind when writing software that will depend on the functionality offered by pspell.
Getting Started with pspell
Installing or upgrading Aspell (PHP requires Aspell 0.50.0+) is a fairly simple process that involves downloading and installing the library itself, followed by the installation of the dictionaries you intend to use. The library includes only the spell checking engine; the dictionaries must be installed individually from separate packages available on the Aspell website. Additional dictionaries can be added at any point, so there is little need to install all of the available dictionaries right away. That said, the dictionary files themselves take very little space (about one to two megabytes each) and the advantage of compiling them at the onset is that you won't have to waste time if you want to use additional dictionaries at a later point. In any case, all major distributions have binary packages for both the library and commonly used dictionaries, so the upgrade/install process is fairly painless.
Once the library is installed, you simply need to add --with-pspell to your PHP configuration. If the library was not installed inside a standard location, such as /usr or /usr/local, you will need to specify the correct path to the directory where it resides, for example --with-pspell=/path/to/lib. You also have the option of installing pspell as a shared extension (via --with-pspell=shared,/usr) that can be enabled only for particular hosts. This is quite useful if you need to enable the functionality for a specific account or limit the capability to use pspell to higher tier accounts. It is important to keep in mind that spell checking is a relatively slow process and spell checking large quantities of text may take some time. Therefore, it is important to set execution
REQUIREMENTS
Everyone makes typos. That is a universal constant, but no one wants their typos to end up in the final product, be it an e-mail or a blog entry. Consequently, many programs have integrated spell checkers that can find and help correct the mistakes made by busy fingers. Unfortunately, for the most part this functionality is not available to many forms of web communications, such as forums, blogs and online comment systems. This is primarily due to the fact that it is not easy to implement a spell checker, and few developers are familiar with the extensions and libraries that can simplify the process. This article will focus on two PHP extensions that offer spell checking functionality that can be used to validate and correct typos and spelling errors.
time limits to prevent scripts from taking excessive amounts of time when forced to spell-check large documents.
Once all the necessary tools are in place, the actual spell-checking process can begin. The first step is the creation of a pspell resource that will allow the usage of the spell checker. This is done via the pspell_new() function, which accepts a number of parameters. The first and the only required parameter is the two-letter language code that tells the extension which dictionary will be used. Since some languages use multiple spellings for the same word (depending, for example, on the particular dialect of the language spoken in a country), you may also want to specify the country, which can be passed along as a second, optional parameter. For example, for the English language there are three possible country values: British, Canadian and American.
You also have the option of specifying a jargon and an encoding, although these values are largely unused and in most instances it is best to leave them at their defaults. The very last option (a bit mask) allows you to set your preferences regarding how hard pspell should try to find spelling alternatives to a word that is misspelled. The values range from PSPELL_FAST, which will return the fewest suggestions but will take the least amount of CPU, to PSPELL_BAD_SPELLERS, which will return the maximum possible number of suggestions, but will take a noticeably greater amount of CPU. The default mode is PSPELL_NORMAL, which tries to find a "happy compromise" between the quality of the suggestions returned and the processing time needed to generate them. You can also use this parameter to set an option that indicates how words that are not separated by a space (also known as run-togethers) should be handled. By default, they would be considered typos, but in some instances you may want to allow them.
The spell-checking options can also be set via the pspell_config_runtogether() function, which can change the run-together behaviour, and pspell_config_mode(), which can change the spelling mode. The ability to change the mode at any time is very handy, as it allows the usage of faster defaults and then, if these fail to generate the necessary data, to switch to a more complex mechanism for a particular word.
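As a rough illustration of the parameters just described, the sketch below opens a British English dictionary in the fastest mode with run-together words allowed. The wrapper function `open_speller()` is our own invention; it exists only so that the example degrades gracefully on a system where the pspell extension or the requested dictionary is missing.

```php
<?php
// Hypothetical wrapper around pspell_new(). Returns a speller handle,
// or false when pspell or the requested dictionary is unavailable.
function open_speller($lang = "en", $spelling = "british")
{
    if (!function_exists('pspell_new')) {
        return false;
    }
    // Jargon and encoding are left at their defaults (empty strings);
    // the last argument is the bit mask of mode and run-together flags.
    return @pspell_new($lang, $spelling, "", "",
                       PSPELL_FAST | PSPELL_RUN_TOGETHER);
}

$psl = open_speller();
if ($psl === false) {
    echo "spell checker unavailable\n";
} else {
    var_dump(pspell_check($psl, "colour"));
}
```

Switching PSPELL_FAST for PSPELL_BAD_SPELLERS in the mask trades CPU time for a longer list of suggestions.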
Now that we have created a spell-checking resource, it can be used to validate text. The actual text validation is done through two functions. The first, pspell_check(), will determine if the specified word is correctly spelled and return FALSE if it isn't. In this case, you can use the second, pspell_suggest(), to generate an array of possible alternatives.
if (!pspell_check($psl, "speler")) {
    $suggestions = pspell_suggest($psl, "speler");
    foreach ($suggestions as $word) {
        echo $word . "<br />\n";
    }
}
Both pspell_check() and pspell_suggest() can only work with one word at a time; to spell-check an entire document, you will need to first use PHP to break down the text into individual words that can be fed to pspell. If you are dealing with plain text, this is very simple to do, especially so if you have PHP 4.3.0, where the str_word_count() function is available. This function has three modes of operation: the default mode will simply count the number of words inside a string and return an integer result. The second mode, the one we want, will return an array of words that can be spell checked.
$wl = str_word_count("will return an aray of words", 1);
foreach ($wl as $key => $word) {
    if (!pspell_check($psl, $word)) {
        $sug = pspell_suggest($psl, $word);
        // replace word with 1st suggestion, if there is one
        if (!empty($sug)) {
            $wl[$key] = $sug[0];
        }
    }
}
// print corrected text (will return an array of words)
echo implode(' ', $wl);
If you are using an older version of PHP that does not have str_word_count() you can emulate the second operation mode by using preg_match_all(), which is noticeably slower, but will still get the job done.
if (!function_exists("str_word_count")) {
    function str_word_count($text) {
        preg_match_all('!(\w+)!', $text, $m);
        return $m[0];
    }
}
The problem with this code is that the word array you will receive in return from your call to the "simulated" str_word_count() function contains only the words, and all of the punctuation and non-alphabetic characters will not be present. Thus, if you simply do implode() as I did in the previous example, all of those characters will be lost and only the words will be retained, which is clearly a bad idea. We therefore need a way to replace the misspellings without losing the formatting, so that our modifications only affect the misspelled words.
This is where the third and arguably the most useful mode of the str_word_count() function comes into play. When this mode is used the resulting array will have the offset of the word inside the string as the key for each element. This allows you to easily find the position of the word inside the text and replace it via substr_replace() quickly and efficiently, and without the risk of text corruption. While this can be emulated with preg_match_all(), it would require the use of the PREG_OFFSET_CAPTURE flag that is only available in PHP 4.3.0 and higher. Since PHP 4.3.0 already has a native str_word_count() function, there is no need to emulate it using a slower alternative. If you are using an older version of PHP, you will need to come up with your own string parser, or better yet upgrade your installation!
// adjust offset since word has changed
$off += strlen($r) - strlen($w);
}
}
return $s;
}
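The offset bookkeeping is the part that is easiest to get wrong, so here is a self-contained sketch of it that runs without pspell. The function names `replace_words()` and `demo_suggest()` are hypothetical, and the fixed lookup table merely stands in for the pspell_check() and pspell_suggest() calls a real implementation would make.

```php
<?php
// $suggest stands in for the pspell calls: it receives a word and returns
// a replacement string, or null when the word should be left alone.
function replace_words($s, $suggest)
{
    $off = 0;  // how far later words have shifted because of earlier edits
    foreach (str_word_count($s, 2) as $pos => $w) {
        $r = call_user_func($suggest, $w);
        if ($r === null || $r === $w) {
            continue;
        }
        // $pos is the word's offset in the ORIGINAL string, so correct it.
        $s = substr_replace($s, $r, $pos + $off, strlen($w));
        // adjust offset since word has changed
        $off += strlen($r) - strlen($w);
    }
    return $s;
}

// Hypothetical fixed lookup table used instead of a real dictionary.
function demo_suggest($w)
{
    $fixes = array('aray' => 'array', 'stuf' => 'stuff');
    return isset($fixes[$w]) ? $fixes[$w] : null;
}

// prints: Will return an array, of stuff!
echo replace_words('Will return an aray, of stuf!', 'demo_suggest'), "\n";
```

Because only the word itself is spliced out, the comma and the exclamation mark survive untouched, which is exactly what the implode() approach could not guarantee.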
Replacing a misspelled word with the first suggestion offered by the spell checker is not always the best approach, although most of the time it will work reasonably well. Generally speaking, it is better to replace a word with a select box that would allow the user to choose a correct spelling or convert the word into a link that would raise a layer with possible suggestions through JavaScript. The function itself would pretty much remain the same, except that the code that deals with replacement would now loop through all the possible results and create an appropriate list of alternatives.
Advanced pspell
Now that the basic spell checking functionality is working, let's take a look at some of the more advanced capabilities that the pspell extension offers.
When working with text, you will undoubtedly encounter words that are correctly spelled, but that the spell checker does not recognize. This is a frequent occurrence when using industry-specific terminology, names or slang. In those instances, you would probably want to make the spell checker ignore this word and not try to suggest alternatives for it. For example, when working with HTML tags inside the text or formatting tags such as FUDcode or BBcode, you could add the tags to the dictionary so that the spell checker can simply skip over them and save you the time of having to add special handlers for those tags, which would overcomplicate your code. For this purpose, you can use the pspell_add_to_session() function to add a word to the current session, which effectively makes the spell checker ignore it.
// Before: will print <BLOCK QUOTE>stuff</BLOCK QUOTE>
echo spell_check_str($psl, '<BLOCKQUOTE>stuf</BLOCKQUOTE>');
// After: will print <BLOCKQUOTE>stuff</BLOCKQUOTE>
a joint ignore list. To create such a list, the pspell resource creation process needs to be changed to allow for the usage of personal dictionary files. Instead of using the pspell_new() function, pspell_config_create() is used to create a new pspell configuration resource. This function takes all of the same arguments as pspell_new(), except the option parameter. The options regarding the mode and handling of the run-togethers will need to be set via pspell_config_mode() and pspell_config_runtogether() separately. The pspell_config_personal() function is then used to specify the path to the custom word list file, containing a list of words to ignore. If you intend to add to this file, be sure that it is writable by the user PHP is going to be running as. Once these steps are completed, a pspell resource can be created from the configuration resource via the pspell_new_config() function.
// create new config based on english
New words can then be added to the ignore list via the pspell_add_to_personal() function, which is identical to pspell_add_to_session() as far as its parameters are concerned. Once all of the necessary words have been added, they can be appended to the ignore file via the pspell_save_wordlist() function.
// add word to personal dictionary file
pspell_add_to_personal($psl, "Ilia");
// save wordlist (appends to existing list)
pspell_save_wordlist($psl);
Unfortunately, the Aspell library does not provide an API for removing or modifying existing entries inside the personal word list file. To make these changes, you will need to write your own function. Fortunately, this is very easy to do, since the format of the file includes a basic header that specifies how many entries can be found inside it, followed by the entries themselves, one per line. If you find yourself editing your custom dictionaries very often, you can create a function for this purpose; otherwise, using your favorite text editor will do.
In some instances, you not only want to add words to the ignore list, but also add them to the dictionary file itself as possible alternatives that can be used in future runs as a replacement for typos in words that are not found in the stock dictionary file.
This, too, is something that can be done through the pspell extension, which allows for the creation and usage of personal dictionary files that can be used in addition to the base file provided for a particular language. If the dictionary file fails to find a match, it'll try using the custom file to determine a possible alternative for the misspelled word. As with the word list file, you first need to specify the path to the file where possible alternatives can be found. This is done via the pspell_config_repl() function, which takes a pspell configuration resource as the first parameter and the path to the alternate dictionary file as the second. The pspell_store_replacement() function can then be used to add replacement suggestions, and calling pspell_save_wordlist() will now save both the ignore list and the replacement list.
$psc = pspell_config_create("en");
// specify personal replacement file
pspell_config_repl($psc, "./my.rep");
// create the spell checker from the configuration
$psl = pspell_new_config($psc);
// add replacement
pspell_store_replacement($psl, "Iaaaliaa", "Ilia");
// save replacement
pspell_save_wordlist($psl);
If you do not want to save a replacement, you can use the pspell_store_replacement() function without specifying the path to the file and saving the word list. The replacement mechanism itself is intelligent enough that if the specified string is close enough to the source, it will use the replacement rather than the base dictionary, which may have further matches. Using the above code as an example, if I were to spell check "Iaaliaa", it would prioritize "Ilia", which was my replacement for "Iaaaliaa", over the dictionary's suggestion of "Alia". This, of course, means that, when adding replacement pairs, you don't actually need to add an entry for every possible misspelling of the word being added. Moreover, the library is intelligent enough to check its main database to see if the replacement is already available and, if it is, it will not add the word to the personal replacement file.
As with the word list file, there is no native function to modify or remove entries from the replacement file. Fortunately, the format of this file is even simpler than the one used by the word list, because, while it does have a one line header, it is not actually being used. Other than the header, the entries are stored in the bad_word replacement format and can be easily modified with the following function.
function md_repl($file, $src, $dst, $new_src = '', $new_dst = '') {
    // remove the old "bad_word replacement" entry
    $data = str_replace("\n{$src} {$dst}\n", "\n", file_get_contents($file));
    // add new replacement
    if ($new_src && $new_dst) {
        $data .= "{$new_src} {$new_dst}\n";
    }
    // update replacement file
    $fp = fopen($file, "w");
    fwrite($fp, $data);
    fclose($fp);
}
Beyond pspell
Aside from pspell, another spell checking extension called Enchant has been recently made available. This extension can be found inside the PECL repository and can be installed by running pear install enchant. It is based on the Enchant library that provides a common API to multiple spell checking engines, such as Aspell, Ispell, MySpell, and so on.
Having a single native API means that you can seamlessly use multiple engines without having to write your own wrappers around different interfaces. The Enchant library works directly with each spell checking library, so there is virtually no speed difference between using the native interface offered by pspell and the wrapper offered by Enchant.
The main advantage of Enchant is that it gives you the ability to use different spell checkers that may support other languages or have specific benefits, such as a lower memory footprint (Ispell) and better dictionaries. It also guarantees that you will have access to spell checking support on virtually any system, since at least one spell checking library is always included, although you will need to install Enchant itself.
The Enchant extension API is fairly similar to that of the pspell extension and, for the most part, offers the same capabilities, with a few notable differences. Since the Enchant extension can work with many different spell checking engines, the spell checking resource creation is designed to accommodate the selection of the engine to be used. The first step is to initialize the Enchant broker, which is done through a call to the enchant_broker_init() function. You can then use the resulting resource to determine what spell checking engines are supported by calling the enchant_broker_describe() function, which will return an array of information arrays about the supported engines.
/usr/lib/enchant/libenchant_aspell.so
)
)
The next step is determining the availability of a dictionary for a language you want to spell check. Unlike pspell, which uses two parameters to select language and locale, in Enchant both settings are handled by a single parameter, which looks something like language_LOCALE (for example, en_CA for Canadian English). This parameter is then passed to the enchant_broker_dict_exists() function, which returns True if a dictionary is available and False otherwise.
If you have more than one spell checking engine available (which is almost always the case), the Enchant library will automatically choose what it thinks is the best engine for the task, based on the availability of a dictionary and its quality. If the default choice is not to your liking, you can modify the order in which the engines are picked by using the enchant_broker_set_ordering() function, which takes the language string, followed by a comma-delimited string where the engines are listed in the order you want them to be used.
function, which will return an array with information about the selected engine.
an array of suggestions by calling the
To simplify the process, Enchant also offers the enchant_dict_quick_check() function, which can check a word and return a list of possible alternatives if it is not spelled correctly, all in one go. This makes the spell-checking code slightly faster and reduces the amount of PHP code you need to write (which is never a bad thing).
if (!enchant_dict_quick_check($d, "spel", $suggestions)) {
    print_r($suggestions);
}
When the function returns False, indicating that the specified word has been misspelled, and a variable is provided as the third optional argument (passed by reference), the function will populate that variable with possible spelling alternatives.
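The broker, dictionary, and quick-check steps described above can be tied together roughly as follows. This is only a sketch: the misspelled_words() helper and the en_US default are illustrative names chosen here, and the function simply returns null when the Enchant extension or the requested dictionary is not available on your system.

```php
<?php
// Sketch only: returns suggestions for each misspelled word, or null
// when the Enchant extension or the requested dictionary is unavailable.
// misspelled_words() is a hypothetical helper, not part of Enchant.
function misspelled_words($words, $tag = 'en_US')
{
    if (!function_exists('enchant_broker_init')) {
        return null; // Enchant extension not loaded
    }
    $broker = enchant_broker_init();
    if (!$broker || !enchant_broker_dict_exists($broker, $tag)) {
        return null; // no dictionary for this language
    }
    $dict = enchant_broker_request_dict($broker, $tag);
    if (!$dict) {
        return null;
    }
    $result = array();
    foreach ($words as $word) {
        $suggestions = array();
        // quick_check returns false for a misspelled word and
        // fills $suggestions by reference
        if (!enchant_dict_quick_check($dict, $word, $suggestions)) {
            $result[$word] = $suggestions;
        }
    }
    return $result;
}

var_dump(misspelled_words(array('spel', 'check')));
```

What the suggestions look like depends entirely on the engine and dictionary Enchant picks, so no particular output should be expected.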
Once you are done working with the spell checker, you should free the dictionary and the broker resources by calling the enchant_broker_free_dict() and enchant_broker_free() functions respectively. While PHP will free those resources automatically on script termination, it is generally better to do so manually, so that memory and dictionary file handles are released as soon as possible.
The Enchant extension also supports ignore lists, which can be used to allow certain words to be skipped by the spell checker. As with pspell, you have the ability to use both session- and file-based ignore lists, which can be shared by multiple processes. The session-based ignore lists are handled by two functions: enchant_dict_is_in_session(), which checks if a particular word is already being ignored, and enchant_dict_add_to_session(), which adds a word to a session’s ignore list. Since the add-to-session function does not return a status indicator, you should use enchant_dict_is_in_session() to verify that the word was, in fact, added successfully. Keep in mind that not all spell-checking engines may support this functionality, so this may not always be possible.
enchant_dict_add_to_session($d, "Ilia");
if (!enchant_dict_is_in_session($d, "Ilia")) {
    exit("Cannot add to session ignore list.\n");
}
To use a more permanent file-based ignore list, you first need to establish the path to your ignore file by calling the enchant_broker_request_pwl_dict() function. The file must already exist, but can be empty; if it does not exist or is not accessible, the function will fail. To add entries to the file, you can use the enchant_dict_add_to_personal() function. Like the session function, this function does not return a success indicator, and enchant_dict_is_in_session() should be used to confirm that the word has actually been added.
Because the file name you provide is a generic holder for the personal word list, the Enchant library will automatically create a spell-checking engine-specific file as well. For example, if the Aspell backend is being used, Enchant will also create my.pws inside the same directory as my.dict. This is a very important tidbit of information to keep in mind when adding new words to the list, since you will need to ensure that not only my.dict is writable, but the directory it is in as well.
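A quick pre-flight check for that requirement might look like the sketch below. The pwl_is_writable() helper is a made-up name for illustration; it uses only PHP's standard filesystem functions and does not touch Enchant at all.

```php
<?php
// Hypothetical helper: can both the personal word list file and the
// directory that contains it be written to?
function pwl_is_writable($path)
{
    return is_file($path)
        && is_writable($path)
        && is_writable(dirname($path));
}

// Demonstration with a throwaway file in the system temp directory
$tmp = tempnam(sys_get_temp_dir(), 'pwl');
var_dump(pwl_is_writable($tmp));
var_dump(pwl_is_writable($tmp . '.missing')); // bool(false): no such file
unlink($tmp);
```

Running a check like this before requesting the personal dictionary gives you a clearer error message than a failed Enchant call would.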
Adding replacement alternatives is also possible; however, unlike what happens with pspell, these are always session-specific, and there is no way to save them for later re-use. This is done through the enchant_dict_store_replacement() function, which takes a source string and a possible replacement that can be used for substitution.
enchant_dict_store_replacement($d, "AAliaaa", "Ilia");
enchant_dict_quick_check($d, "AAliaaa", $sug);
echo $sug[0] . "\n"; // will print Ilia
Mistakes Hapen Without A SpelCheker
As I hope you have had an opportunity to discover, spell checking text strings from your PHP scripts is not at all difficult. For the most part, the biggest difficulty is not in checking the text, but in actually breaking it down into individual words that can then be validated. Fortunately, ever since the introduction of the str_word_count() function, this has become a fairly trivial process.
The functionality offered by a spell checker has many uses, even in situations where users do not input long text strings. For example, in a search engine a spell checker can be used to validate keywords. You can also use it inside a PHP 404 handler (which I discussed in the March 2004 issue of php|architect) to check for typos and automatically correct them, taking the user to the right page without any manual intervention or extra steps. Ultimately, a spell checker is a powerful tool that can be successfully applied to many problems with little effort, but that can make a big impact on the quality of your applications.
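As a rough sketch of the word-splitting step, str_word_count() with a format argument of 1 returns the words it finds as an array, which can then be fed one by one to whichever checker you use. The extract_words() helper and the sample sentence are illustrative only and are not part of either extension.

```php
<?php
// Hypothetical helper: split free text into the distinct words
// that could each be passed to a spell checker.
function extract_words($text)
{
    // format 1 => return an array of the words found
    $words = str_word_count($text, 1);
    // check each distinct word only once
    return array_values(array_unique($words));
}

print_r(extract_words("the quick brown the fox"));
```

De-duplicating first keeps the number of dictionary lookups down when the same word appears many times in a long text.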
To Discuss this article:
http://forums.phparch.com/149
Ilia Alshanetsky is an active member of the PHP development team and is the current release manager of PHP 4.3.X. Ilia is also the principal developer of FUDforum (http://fudforum.org), an open source bulletin board, and a contributor to several other projects.
Sometimes, you need a paper trail: an honest-to-god, dead-tree paper trail. Maybe you’re dealing with data that needs to be formatted and compared by human eye to previous decades of paper records, or perhaps there are government regulations requiring paper, or maybe it’s even just a pointy-haired boss who wants to do things the way they’ve always been done.
Fortunately, you can actually merge the digital and the dead-tree using FDF, an Adobe technology layered on top of their PDF format, which is used pretty much by everybody everywhere for cross-platform printable documents.
You can think of an FDF file as a marriage between HTML forms and PDF documents. In practice, you can have the same interface on your website as you have in off-line paper-based forms, and all your web-based forms can be printed on demand, complete with all their fields filled in. You can use the FDF forms in place of your old HTML forms, suddenly stop worrying about browser compatibility and CSS layout snafus, and get decent printing to boot.
What does an FDF file look like?
You can create a simple FDF quite easily. Using Adobe Acrobat, you can scan in an existing paper form, or, even better, import a Word document or several other digital formats, including existing HTML forms.
There are also a variety of other tools to create an FDF, with varying features (and mis-features). A nice comparison round-up can be found at http://www.pcmag.com/article2/0,4149,1195058,00.asp
Alas, this article doesn’t specifically address creating FDF as a super-set of creating PDF files, so, in the end, you may be stuck with only the one option of Adobe Acrobat for now. Regardless of which tool you use to create an FDF, the basic internal structure will be the same.
Most likely, the tool available will turn the majority of your existing documents into a static image and, in the case of HTML forms, turn all your INPUT tags (and SELECT and TEXTAREA) into corresponding FDF elements.
If you’re dealing with scanned-in forms, you may need to export images from your PDF and ‘clean them up’ to remove some of the static parts that you’ll be replacing with FDF form elements, or you may find it easier, in some cases, to simply re-do the import after getting rid of the offending elements in the source documents. Since images tend to occupy a lot of memory, you may also need to experiment with different tools to find the one that lets you ‘slim down’ your PDF to a size that you consider appropriate for your needs.
For the purposes of this article, I have created the simple HTML form that you can see in Figure 1. I then used Adobe Acrobat to “Create PDF -> From Web Page” and pasted in my URL. Then I had to rename all the FORM input elements (see the “Challenges” sidebars) and saved the resulting PDF.
Internally, an FDF document keeps track of your field names and values, as well as some basic presentation information such as foreground/background color, font, data type (optional) and any client-side validation you wish to include. In addition, the FDF document will contain PDF information which will define the static look of the document. Field labels, form instructions, logos, and any other artwork will typically all be “hard-wired” into the FDF. Once processed, the PDF will look like what you see in Figure 2 if you open it with a text editor, or Figure 3 if you open it in Acrobat.
If you scan in a paper form or import a Word file (or a similar document), you may need to use your FDF tool to draw some controls on the FDF that will actually be used to input and output your data. You’ll need to name the boxes suitably, and it would be nice to give them Tooltips and nice labels. This is accomplished in different ways depending on the FDF tool you’re using, but if you dig around in the menus in your FDF editor, looking for something that says (or looks like) a radio button, an input box, and so on, you should be able to figure out the GUI without much in the way of problems. The GUI should also make it fairly obvious how to make an input box read-only or hidden, along with other similar amenities, although I must say that, in my opinion, the ‘forms’ menu in Acrobat is rather deeply buried, which makes things a bit difficult. Fortunately, you can completely re-arrange the Adobe tool palettes to your own needs, which I did by getting rid of all the commenting tools and adding the ‘forms’ tools instead. Ultimately, you need to create suitable FDF elements for your application.
The end user can then “fill in” the fields of the form with Adobe’s free Reader, which will enforce the validation, and then submit the data to your on-line application. There are several choices for submission, but one of them is “HTML”, which means the FDF will POST data exactly the same way an HTML FORM would.
Where does PHP come in?
Since you probably already know how to process POST data using PHP, that makes half of this process trivial: you can insert or update the data coming from your FDF forms exactly the same way you would with an HTML form if you select “HTML” on your Submit buttons in your FDF.
Stepping back in the process, you can also use PHP to pre-process the FDF documents before they are presented to the user to be filled in. You can pre-fill any known fields, saving the user from re-typing their name or other existing info. You can also show (or hide) various form elements as needed, so long as there is no corresponding visual element hard-wired in the PDF portion that remains to visually confuse the user. You may find yourself, as I did, converting what was formerly static text into an un-editable FDF form element just so that you can hide it when needed.
Pre-filling your form and manipulating the FDF will require the FDF library and the PHP FDF Module to be installed, and you will need to use a handful of PHP’s FDF functions, which is what we’ll focus on in this article. For more information on diagnosing your system and installing these software packages, see the “Installation” sidebar.
An FDF application
Your first step in PHP to create a dynamic FDF file consists of creating an FDF resource, just as you would create an image or database connection. You first want to create a ‘blank’ FDF using fdf_create() as shown in Listing 1. Next, you would want to pre-set any known fields in your FDF document. These could come from a database, or, as in the example shown in Listing 2, the current date.
The arguments to fdf_set_value() are mostly obvious: the FDF resource we have created, the name of the field, and the value to be used are about as simple as it gets. The final argument, in this case zero, is actually there only for backwards compatibility. In older versions of the FDF Toolkit, internalized FDF values (e.g., FDF’s TRUE/FALSE) and external values (such as our date) were treated separately, and you would pass in one or zero based on what kind of value you were using. In today’s version, the final argument is ignored, even though it is still required. Expect it to become optional in future PHP releases.
We have also used fdf_set_flags() to make our today field read-only, which will help cut down the number of submission errors. The Adobe FDF documentation for this function is quite obscure, but I think I have it sorted out. The first argument, of course, is our FDF resource, and the second is the name of the FDF field, which is straightforward enough.
Listing 2
// Pre-set the 'today' field to today's date:
fdf_set_value($outfdf, 'today', date('Y-m-d'), 0);
// Force the "today" field to be read-only:
fdf_set_flags($outfdf, "today", FDFSetFf, 1);
NOTE: I should also point out that there actually was a bug in PHP’s FDF Module in version 4.3.3, so that I had to roll back a PHP upgrade and I couldn’t use some nifty functions like fdf_enum_values() and fdf_save_string(). On the other hand, this was fixed in CVS just a few hours after the initial report. Open source software is just great!
The third argument is the “Key” for what we are going to be doing to the bits we are going to be changing. There are only six possible values for this argument:
• FDFSetFf
• FDFClearFf
• FDFFf
• FDFSetF
• FDFClrF
• FDFFlags
The first three “Key” values, FDFSetFf, FDFClearFf, and FDFFf, can be used to Set, Clear, or completely replace the field settings for the following attributes:
• ReadOnly (0x00000001)
• Required (0x00000002)
• Password (0x00002000) - text fields only
• FileSelect (0x00100000) - text fields only
• DoNotSpellCheck (0x00400000) - text fields only
• MultipleSelection (0x00200000) - listbox only
So, for example, fdf_set_flags($outfdf, "record_id", FDFSetFf, 1 | 2) would make the record_id field read-only and required. Other settings (e.g., DoNotScroll) would remain untouched, with whatever previous setting they had intact.
Changing FDFSetFf to FDFClearFf would clear the given attributes, making the field writable (not read-only) and allowing the field to be blank (not required):
// read/write, blank OK, preserve others
fdf_set_flags($outfdf, 'record_id', FDFClearFf, 1 | 2);
Using FDFFf instead would cause a wholesale replacement of the attributes: FDFFf would set the field read-only and required, and turn off any of the other settings that had previously been set to “on”:
// read-only, required, reset all others to 0:
fdf_set_flags($outfdf, 'record_id', FDFFf, 1 | 2);
The last three “Key” values, FDFSetF, FDFClrF, and FDFFlags, can be used to Set, Clear, or completely replace the following attributes:
• Hidden flag (0x02)
• Print flag (0x04)
• NoView (print only) flag (0x20)
Thus, fdf_set_flags($outfdf, "record_id", FDFSetF, 2) will hide a field, while fdf_set_flags($outfdf, "record_id", FDFClrF, 2) will show a field.
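Since the last argument is a bit mask, several attributes are combined with a bitwise OR rather than an AND. The short sketch below shows the arithmetic in plain PHP; the FLAG_* names are invented here for readability and are not constants defined by the FDF extension.

```php
<?php
// Illustrative names for the bit values listed in the article;
// these are NOT constants defined by the FDF extension.
define('FLAG_READONLY', 0x00000001);
define('FLAG_REQUIRED', 0x00000002);

// Bitwise OR combines flags; bitwise AND of two distinct bits is zero.
$readonly_required = FLAG_READONLY | FLAG_REQUIRED;

var_dump($readonly_required);            // int(3)
var_dump(FLAG_READONLY & FLAG_REQUIRED); // int(0)
```

With the FDF extension loaded, a value built this way would be passed as the final argument to fdf_set_flags().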
This can be handy to show/hide buttons after a form submission, as we do in our example application after processing the $_POST data input in Listing 3.
We also use the fdf_set_status() function to set the FDF’s status value. This is akin to a JavaScript alert() function. Whatever message you put into an FDF’s status will be displayed in a popup dialog when the FDF is opened. This is handy to set success or error messages, as we do here to display the user’s Score in our little quiz.
You may be wondering what happened to the PDF you created. For reasons I don’t really understand, you actually integrate that after you’ve set all the field values, flags, options, and so on. This is done with the fdf_set_file() function, using something like:
fdf_set_file($outfdf, "/full/path/to/your/PDF")
    or error_log("ERROR: Unable to set FDF file.");
To wrap this up, you then need to send the FDF to the browser. In more recent versions of PHP (4.3 and higher) you can use fdf_save_string($outfdf) to get the FDF as a string and echo it directly to your script’s output (which, if you’re running in a web environment, will reach the client browser). However, in older versions of PHP you need to save it to a temporary file first, as shown in Listing 4.
Sending the FDF to the browser, with the correct headers, is pretty straightforward, as you can see in Listing 5. This might look a bit weird if you’ve only used PHP to present HTML documents, but PHP is actually quite adept at spewing out a large variety of web-related documents, such as JPEG/PNG/GIF, PDF, Flash, and, in this case, FDF. After all, the user doesn’t need the document to be boring and static; if the right solution calls for a dynamic FDF, then PHP can do that.
Listing 3
// Setting an FDF "button" to submit the fields as "HTML"
// lets us process FDF submission exactly as we would an
// HTML FORM using $_POST:
if (isset($_POST) && count($_POST)) {
    $corrects = array('fdf_is' => 'Adobe', 'bug_count' => 3, 'case_sensitive' => 'on');
    $high = count($corrects);
    $score = 0;
    reset($_POST);
    while (list($key, $value) = each($_POST)) {
        if (isset($corrects[$key]) && ($corrects[$key] == $value)) {
            $score++;
        }
        error_log("setting $key to $value");
        fdf_set_value($outfdf, $key, $value, 1);
    }
    $percent = sprintf("%02.2f", 100 * $score / $high);
    fdf_set_status($outfdf, "You scored $score/$high for $percent%");
    // Hide the 'Grade Me' button, since they already took the Quiz:
    fdf_set_flags($outfdf, "save", FDFSetF, 2);
}
Listing 4
// Dump our PDF to a random temp file, since we don't
// have PHP >= 4.3.0 which would allow us to send it
// directly to the browser
$temp = tempnam("/tmp", "_FDF_");
fdf_save($outfdf, $temp);
fdf_close($outfdf);
If you’re using Netscape as your browser, that pretty much sums it up. In fact, you probably should go ahead and try to build a sample FDF application in PHP now by putting all that source code into a file named example_fdf.php on your server (you can, of course, also use the code that comes together with the magazine). You’ll also need to download or create the corresponding PDF, and store that on your server. Finally, you’ll need to change /path/to/your/PDF in the source to match the reality of your web server. You should be able to get this working with Netscape; just don’t try to use Internet Explorer on it yet.
Actually, there is one final note to the main portion of this article. You may have noticed that, in most of the listings, the source code uses PHP_SELF and other references to avoid using example_fdf or example.pdf. This is because everything except for Listings 2 and 3 can be pulled out and put into separate files that you can include and use over and over. I called them fdf_input.inc and fdf_output.inc myself, but you can arrange your own include files any way you like.
Listings 2 and 3 deal with the business logic of the actual Sample Quiz, while all the rest of the code handles the grungy details of FDF creation and working around Microsoft bugs. In the FDF projects on which I work, we have dozens, soon to be thousands, of forms using the same FDF code, and our core business logic boils down to using $_POST data to alter our database, which you already know how to do, and using fdf_set_value(), fdf_set_flags(), and fdf_set_status() to pre-fill our FDF, hide/show buttons, and pop up messages to the user respectively.
And Now, For the Fun Part: Internet Explorer
Microsoft Internet Explorer is badly broken, at least in some areas. So are all the other browsers, as you know, but in this case, IE is very badly broken indeed. Fortunately, there are ways around all that brokenness. In some cases, the brokenness is restricted to one version; in other cases it’s a bug that Microsoft has ignored for years. For simplicity, I’ll simply refer to “IE” rather than delving into the joys of browser versioning. You can always test on a dozen different versions if you really care to tie down which bugs exist where. Microsoft sure hasn’t bothered, though, so why should you?
Listing 5
// Send the correct headers to the browser
// Of course, IE ignores them, but other browsers care
header("Content-type: application/vnd.fdf");
header("Content-length: " . filesize($temp));
// Finally, spew out our PDF
// If your files are HUGE you might want to use
// fopen/fread/echo in a loop to spew out only N bytes at a
// time
// Last time I checked, readfile would pull the whole file
// into RAM. Use the source, Luke
readfile($temp);
Listing 6
// Microsoft Internet Explorer is broken, so we bury what
// should be _GET variables in our URL and import them
// here into $_PATH (like $_GET, only different)
// variables
if (isset($_SERVER['PATH_INFO'])) {
    $variables = $_SERVER['PATH_INFO'];
    // substr strips off initial "/"
    $variables = explode("/", substr($variables, 1));
    while (list(, $key) = each($variables)) {
        // the segment after each key is its value
        list(, $value) = each($variables);
        $_PATH[$key] = urldecode($value);
    }
}
Listing 7
// Microsoft Internet Explorer is broken, so we force every URL to be unique by
// embedding a random string in it
// It has to be before the "xxx.pdf" for the same reasons that forced us to
// build $_PATH above
// There is a one in two billion [mt_getrandmax()] chance that this will
// screw up (having a duplicate PDF cached on the user's machine)
// Actually, if you have a 64-bit machine, the odds are much better,
// but you probably know that already if you have a 64-bit machine
// strip off the name of the PDF:
$self_last_slash = strrpos($self, '/');
$self_front = substr($self, 0, $self_last_slash - 1);
$self_back = substr($self, $self_last_slash);
// strip off the previous random insertion:
$self_front_last_slash = strrpos($self_front, '/');
$self_front = substr($self, 0, $self_front_last_slash);
// generate random string
mt_srand((double) microtime() * 1000000);
$microsoft_sucks = mt_rand(0, mt_getrandmax());
$microsoft_sucks = "iebroken$microsoft_sucks";
// set the action of our "save" button dynamically
$action = "http://$_SERVER[HTTP_HOST]$self_front/$microsoft_sucks$self_back";
error_log("action: $action");
fdf_set_submit_form_action($outfdf, "save", FDFUp, $action, bindec("00111"));
The first problem with IE is that it ignores the RFC-specified Content-Type header, and attempts to infer the document type from the URL and binary contents of the file. Thus, your URL, which probably looks like http://example.com/example_fdf.php, is presumed by IE to be a .php file. This results in the browser complaining that it doesn’t know how to deal with a file of that type, even though you have correctly specified application/vnd.fdf as the document type, and IE is quite happy to display FDF documents whose URL ends in “.pdf”.
The simplest way to fix this is to rename your file from example_fdf.php to just example_fdf. This way, IE won’t know that it’s a PHP file, and will do the right thing. Naturally, without the “.php” on the end, Apache doesn’t know that it’s a PHP file either, but that’s easy to fix by creating (or adding to) a .htaccess file in the same directory:
<Files ~ "_fdf$">
ForceType application/x-httpd-php
</Files>
You also can completely cozen even the worst versions of Internet Explorer by tacking on a completely ‘bogus’ filename that ends in “.pdf” so your URL looks like:
http://example.com/example_fdf/iebroken.pdf
IE thinks it’s getting a document named iebroken.pdf, even though it’s really the PHP script example_fdf that is spewing out the FDF.
In fact, you can give it a more meaningful name, in case users want to right-click and “Save As...”. You may also want to add a Content-Disposition header and its filename component if you want to fix every browser known to man.
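One way to build such a header is sketched below. The disposition_header() helper, the choice of “inline”, and the sample file name are all assumptions made for illustration; the sketch only shows how the header line could be assembled, with a conservative clean-up of the suggested name.

```php
<?php
// Hypothetical helper: build a Content-Disposition header line
// suggesting a file name for "Save As".
function disposition_header($filename)
{
    // keep only a conservative set of characters in the suggested name
    $safe = preg_replace('/[^A-Za-z0-9._-]/', '_', basename($filename));
    return 'Content-Disposition: inline; filename="' . $safe . '"';
}

// In a real script this would be sent with header() before any output:
// header(disposition_header('quiz results.pdf'));
echo disposition_header('quiz results.pdf'), "\n";
// prints: Content-Disposition: inline; filename="quiz_results.pdf"
```

How individual browsers treat the suggested name varies, so it is worth testing with the browsers your users actually have.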
Thusly does OpenSource prevail over broken proprietary software. OpenSource 1, Microsoft 0.
You may also want to allow some data to be input via GET parameters in a URL. http://example.com/example_fdf?record_id=1 works fine in Netscape. Microsoft Internet Explorer, however, seems to have a deeply-ingrained belief that PDF files are boring and static, rather than dynamic, and will simply refuse to fire up Adobe’s Reader when you do that.
One solution to this is to simply bury the parameters in your URL; for example: http://example.com/example_fdf/record_id/1/iebroken.pdf. Microsoft IE will naively assume you have successive directories named “example_fdf”, “record_id”, and “1”, and that there is a file named iebroken.pdf, when, in fact, “example_fdf” is your PHP script, and the remainder of the URL is your GET data and a totally fictitious filename.
PHP will provide you with the URL information in a server variable called $_SERVER['PATH_INFO']. You can then write a small script to parse that variable and populate a $_PATH variable, just like $_GET and $_POST, except that you’re doing all the work, as shown in Listing 6. As you can see, we also use urldecode() to change any of those %XX or + signs into what they ought to be.
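The same idea can be written as a small, self-contained function, which makes it easy to try out on sample paths. The parse_path_info() name is made up for this sketch, and unlike Listing 6 it silently ignores a trailing unpaired segment such as the bogus file name.

```php
<?php
// Hypothetical helper: turn "/key1/value1/key2/value2/file.pdf"
// into array('key1' => 'value1', 'key2' => 'value2').
function parse_path_info($path_info)
{
    $result = array();
    // strip the leading "/" and split on the remaining slashes
    $parts = explode('/', ltrim($path_info, '/'));
    // consume segments two at a time; a trailing unpaired segment
    // (the bogus file name) is ignored
    for ($i = 0; $i + 1 < count($parts); $i += 2) {
        $result[urldecode($parts[$i])] = urldecode($parts[$i + 1]);
    }
    return $result;
}

print_r(parse_path_info('/record_id/1/iebroken.pdf'));
```

For the sample path above, the result contains a single entry mapping record_id to the string "1".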
Listing 8
// Normally, our code would call session_start() at this
// point. Microsoft Internet Explorer is broken, so we
// instead force the user to be logged in before they get
// here:
if (!isset($_COOKIE['PHPSESSID'])) {
    $host = $_SERVER['HTTP_HOST'];
    $redirect = "http://$host/login.php";
    // header("Location: $redirect");
}
Listing 9
// Shove in the blank template FDF
$self = $_SERVER['PHP_SELF'];
$basename = basename($self, ".pdf");
// Strip out the PATH_INFO and random parts:
$true_self = substr($self, 0, strpos($self, $basename . "_fdf") - 1);
$source = "http://$_SERVER[HTTP_HOST]$true_self/iebroken_fdf/$microsoft_sucks/$basename.pdf";
error_log("source: $source");
fdf_set_file($outfdf, $source)
    or error_log("ERROR: Unable to set FDF file to $source " . __FILE__ . ": " . __LINE__);
Since we’re already doing all sorts of horrible things to the URL anyway, it’s simple enough to tack on a random element so IE will have a different URL for every single edition of any given PDF document. Of course, that will make IE keep an awful lot of PDFs in its cache that nobody will ever be able to use, but there isn’t really that much that can be done about that.
Your FDF “Submit” buttons also have the URL embedded in their definition, so that they know where to send your POST data. So you’ll need to change the action of your PDF button elements as well, as I do in Listing 7. There, FDFUp is a constant that represents the action of the user letting up on the mouse button. While you might think FDFDown would be better, the standard interface convention is to wait for the “up” action, so the user can “slide off” the button with the mouse down to change their mind about clicking. The $action part is just the URL, complete with all the hoops we are jumping through to bypass Microsoft IE bugs.
The bindec("00111") part translates to “Use HTML POST, please”, which is the same as having selected “HTML” and “POST” in the Acrobat dialog for how the FDF data should be submitted. I won’t even go into the details of why 00111 means HTML POST, since I don’t think it’s a good idea to use anything else anyway. If you’re feeling up to it, you can read the FDF Toolkit Documentation that came with your FDF Toolkit from Adobe and puzzle out the reasoning behind that 00111.
Our third Microsoft wall smashed by OpenSource. Our final score is 3 to 0, in case you’ve forgotten.
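For what it is worth, the bindec("00111") expression mentioned earlier is only a readable way of writing a small integer. The lines below show the arithmetic and nothing more; what each individual bit means to the FDF Toolkit is not covered here.

```php
<?php
// bindec() converts a string of binary digits to an integer.
$flags = bindec("00111");
var_dump($flags);          // int(7)
var_dump($flags === 0x07); // bool(true): the three lowest bits are set
```

Writing the value in binary simply makes it easier to see which bit positions are switched on.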
I’d like to say that was the last and final hurdle in our FDF experience. But back here in the real world we had
a couple more challenges as we integrated my FDF code with my co-workers’ efforts. The first was that PHP’s session_start() function also “broke” the PDF files in Internet Explorer. This is probably the same basic
Installation Steps
First, you’ll need to make sure you have PHP’s FDF library installed, as well as Adobe’s FDF SDK library Both
are available for free, though Adobe requires an email address to download As I understand their license, you
can only re-distribute Adobe’s FDF SDK library by paying a fee, but can install it for free on your server, or your
client’s server
The only “tricks” to installing Adobe’s FDF SDK are:
• copy the LIBFDFTK.SO file to libfdftk.so in the right place
• copy the LIBFDFTK.H file to libfdftk.h in the right place
• convince Linux to re-load its dynamic libraries
One crucial part is getting a lower-case .so and lower-case .h and, perhaps not as obvious as I would hope,
keeping the main part of the filename the same: libfdf.so and thatfdfthing.h won’t work too well. This is all
so Linux will know libfdftk.so is a dynamic library and load it as part of the OS, just like a DLL in Windows
or a System Extension on a Mac.
The “right place” to put an .so file varies from distribution to distribution, but if you’re new to Linux you can
generally figure it out by finding a whole bunch of other .so files. Try looking in /usr/lib and /usr/include or
/usr/local/lib and /usr/local/include. You may also need to run ldconfig or, if all else fails, re-boot, which is
total overkill, but will suffice for newbie Linux admins. You may also want to play with ldconfig -v and
ldconfig -v | grep fdf to convince yourself that you correctly installed Adobe’s FDF SDK library
for the OS.
You then may need to re-configure PHP (if you installed from source) or download and compile parts of PHP
“Microsoft IE will cache your dynamic PDF, because of its deep-ingrained belief that PDF files are boring static documents.”
problem as the no-cache headers. Fortunately, we
didn’t need to have any PDF files as “entry points” (first
page visited) in our application, so we simply
redirected anybody without a valid session ID back to our login
page using the script in Listing 8. Since this is
essentially the same problem as the other headers, I won’t even
count this one; we’re ahead 3 to 0 anyway, and can
afford to be gracious.
In browser testing, it turned out that some Macintosh
Internet Explorer versions were also caching the PDF
referenced in our FDF documents. The PDF isn’t
actually “embedded” in an FDF as you might expect, but is
pulled in via HTTP separately. At any rate, we also had
to change our call to fdf_set_file() to
utilize the same sort of “random” URL to fool IE, as
shown in Listing 9.
This utilizes a very short iebroken_fdf script, which
also uses the .htaccess hack above (note the use of
~ *_fdf rather than a single filename) to “fool”
Macintosh Internet Explorer into not screwing up the
underlying PDF behind our FDF, as you can see in
Listing 10. Again, it’s pretty much the same
problem/solution as a previous hoop, so we won’t even
bother to count this one. Still 3 to 0.
It’s pretty sad that we’ve written more code to fix IE
bugs than to actually build our original application,
but, given its user base, it was also inevitable
Using Adobe Acrobat
While I’m at it, I might as well make a few pointed
comments about using Adobe Acrobat for building your FDF documents. If you decide to go ahead and buy it, this will, hopefully, save you a few headaches:
• Adobe Acrobat is quite good at importing various formats and creating an FDF, but whoever at Adobe thought it was a “Good Idea”
to make up a random prefix and re-name all
my HTML form attributes to something useless like EWRLQWIUENNIUEROPIUFLKJAEWI.form1.x1.f1 has serious issues. I can understand the
“form1” part, in case there are multiple FORM tags in a single page, but the rest seems to make no sense.
• If a given field is “required” or has some sort
of pre-set validation, Acrobat won’t be too happy when you serve up a blank form, and will complain to the end user that a required field has not been filled out. Of course it hasn’t been filled in with a valid entry… it’s
a blank form! The text of the error messages could also be much more pleasant and clear. I suspect there are extra steps one could take to override Adobe messages, but there’s no excuse for the defaults not being usable.
• If I’ve zoomed in significantly and hit the
if you have an RPM or equivalent, which probably doesn’t include the FDF library. You can use
<?php phpinfo();?> to quickly check if you have the FDF module or not. Either you see “FDF” mentioned in a nice little
grid section like the other stuff you have installed (e.g., MySQL) or you don’t have FDF.
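If you would rather not scan the whole phpinfo() page, a small script can ask PHP directly. This is only a sketch: fdf_status() is a made-up helper name, and it assumes the module registers itself under the name fdf.

```php
<?php
// extension_loaded() reports whether a named extension is active in this PHP build.
function fdf_status(): string
{
    return extension_loaded('fdf')
        ? "FDF extension is loaded"
        : "FDF extension is not loaded";
}

echo fdf_status(), "\n";
```

Run it through the same web server or command line binary that will serve your forms, since different PHP builds on one machine can load different extensions.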
If you are familiar with compiling PHP from source, just tack on:
--with-fdftk=/path/above/where/you/put/libfdftk
Note that if you put libfdftk.so in /usr/local/lib, you want to
use /usr/local and not /usr/local/lib as your path. This is because configure needs to dig down inside that
path and find both libfdftk.so and libfdftk.h.
If you are using an RPM, you can probably compile just PHP’s FDF library:
• Download the PHP source .tar.gz file matching your version. No fudging on this: get the same
exact version of PHP as your RPM PHP version.
• Use tar xzvf php-*.tar.gz to unpack the source.
• Use cd php-X.Y.Z (X.Y.Z is your version number) to move into that directory.
• Use ./configure --with-fdftk=shared,/path/above/your/libfdftk.so
• Use cp modules/fdf.so /path/to/your/php/extensions/directory to copy the resulting PHP
module to a place where you are allowed to load it into your PHP scripts on the fly.
If you have no clue where you are allowed to load PHP extensions from, don’t panic: PHP will print out an
error message telling you where that is when you try to load the library (see source below), or you could dig
through your php.ini file to find the setting if you are familiar with its structure.
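As a rough sketch of the php.ini route, PHP can report the relevant setting itself so you do not have to open the file by hand. The directive that names the directory extensions are loaded from is extension_dir; the variable names below are just for illustration.

```php
<?php
// ini_get() returns the current value of a configuration directive.
$dir = ini_get('extension_dir');
echo "extension_dir = $dir\n";

// php_ini_loaded_file() returns the path of the php.ini in use,
// or false if no php.ini was loaded.
$ini = php_ini_loaded_file();
echo "php.ini = ", ($ini === false ? "(none loaded)" : $ini), "\n";
```

The directory printed for extension_dir is the natural target for the cp step above.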
Installation Steps Continued