
DOCUMENT INFORMATION

Basic information

Title: Neural Networks with FANN and PHP
Author: Evan Nemerson
Publisher: php|architect
Subject: Artificial Intelligence
Type: Document
Year of publication: 2004
City: Toronto
Pages: 68
Size: 3.78 MB



JUNE 2004 VOLUME III - ISSUE 6

The Magazine For PHP Professionals

> Artificial Intelligence made easy with PHP and FANN <

NEURAL NETWORKS

Spell checking with PHP

Automatic language detection

Make your script determine the language of written text

Portable and stable GUI applications with PHP and XUL

Efficient Oracle Programming

Incredible-looking forms with PHP,


Sign up before July 20th and save up to $100!

Christian Mayaud - Getting Your OSS Business Funded, Rasmus Lerdorf - Best Practices for PHP Developers, Jim Elliott - Open Source: The View from IBM, Daniel Kushner - Attacking the PHP Market, Andrei Zmievski - Andrei’s Regex Clinic, Wez Furlong - Introducing PDO, Regina Mullen - OSS in Legal Technology, Derick Rethans - Multilingual Development with PHP, George Schlossnagle - PHP Design Patterns

and many, many more!

Jump Right To It.


PHP And the What-if Machine

by Andi Gutmans and Marco Tabini

10 Low-impact Programming with



*By signing this order form, you agree that we will charge your account in Canadian dollars for the “CAD” amounts indicated above. Because of fluctuations in the exchange rates, the actual amount charged in your currency on your credit card statement may vary slightly.

Choose a Subscription type:

Canada/USA $97.99 CAD ($69.99 US*)
International Air $139.99 CAD ($99.99 US*)
Combo edition add-on $14.00 CAD ($10.00 US) (print + PDF edition)

Your charge will appear under the name "Marco Tabini & Associates, Inc." Please allow up to 4 to 6 weeks for your subscription to be established and your first issue to be mailed to you.

*US Pricing is approximate and for illustration purposes only.

php|architect Subscription Dept.

VISA Mastercard American Express

Credit Card Number:


You’ll never know what we’ll come up with next

Subscribe to the print edition and get a copy of Lumen's LightBulb, a $99 value, absolutely FREE†!

Login to your account for more details

EXCLUSIVE!

† Lightbulb Lumination offer is valid until 12/31/2004 on the purchase of a 12-month print subscription.


Graphics & Layout

no responsibilities with regard to the use of the information contained herein or in all associated material.

Contact Information:

Copyright © 2003-2004 Marco Tabini & Associates, Inc. All Rights Reserved.

This month’s issue marks the first time, at least to my knowledge, that a topic such as artificial intelligence has been discussed in a PHP publication. AI is one of those topics most people talk about without really understanding its capabilities, and this has resulted in a lot of confusion out there. If you’re worried that your server will become sentient and try to take over the world (or, worse, spend all your money), you can rest assured that that will not be the case (at least until you run Internet Explorer; that’ll do the trick).

However, a technology like neural networks can come in very handy for a website developer. Ad-hoc predictive solutions for tasks such as fraud prevention and customer enticement already exist and are widely available for everyone to use, at an often steep price. As a PHP developer, however, you are both luckier and less fortunate at the same time. The FANN library extension, now available through PECL, gives you the facilities needed to create, train and execute a generic neural network. This means you can not only build applications similar to, or even better than, the ones available commercially, but also build new and exciting ones.

On the other hand, designing and training a neural network is a bit of a “black art” that requires a lot of trial and error, so you’ll have to be very creative with it. It’s excellent news for us that Evan Nemerson, who is the author and maintainer of the extension (as well as one of the original authors of the library), has agreed to tackle the problem of creating a neural net from a practical perspective: building a simple script that can automatically determine the language in which a string of text is written. Even with surprisingly little training (and, even better, very little actual PHP code), the network can reach surprisingly high levels of accuracy.

Still, I’m fairly convinced that, once more people start appreciating the abilities of the FANN library in finer detail, we’ll see applications built on top of it become available for everyone to use and tweak. Before you know it, your computer will shut down at the sound of “I’ll be back.”

Neural networks are not all we’re doing this month, of course. Ilia Alshanetsky covers spell checking, a topic that can be helpful to everyone who runs a website. As it turns out (but not surprisingly), PHP has excellent facilities that support spell-checking operations. We also have a great article on optimizing Oracle-based websites. Now that Oracle is taking more and more interest in open-source projects, this is likely to come in handy for more and more developers, even if they are not in the enterprise arena. If you ever wanted to create beautiful-looking forms but dreaded the prospect of converting them to PDF, you’ll likely be

EDITORIAL

Continued on page 9


“eZ publish is an open source content management system and development framework. As a content management system (CMS), its most notable feature is its revolutionary, fully customizable, and extendable content model. This is also what makes it suitable as a platform for general Web development. Its stand-alone libraries can be used for cross-platform, database-independent PHP projects. eZ publish is also well suited for news publishing, e-commerce (B2B and B2C), portals, and corporate Web sites, intranets, and extranets. eZ publish is dual licensed between GPL and the eZ publish professional license.”

View more information at eZ.no

phpPgAdmin 3.4 Released

Postgresql.com announces the release of phpPgAdmin 3.4.

“phpPgAdmin is a web-based administration tool for all 7.x versions of PostgreSQL.”

Some new features include:

• Add CACHE and CYCLE parameters in sequence creation
• View, add, edit and delete comments on tables, views, schemas, aggregates, conversions, operators, functions, types, opclasses, sequences and columns (Dan Boren & ChrisKL)
• Add config file option for turning off the display of comments
• Allow creating array columns in tables
• Allow adding array columns to tables
• many more…

Get all the info at Postgresql.com

Zend Technologies and Apollo Interactive Unite

Thursday, May 27th 2004 13:48:55 GMT

“Apollo Interactive®, America's leading Interactive Agency, and Zend Technologies, the PHP company, today announced a partnership to promote excellence in open source development. Through the alliance, the companies will share their varied technology perspectives to improve the functionality of the PHP language, which was developed by the founders of Zend, and refine PHP implementation for large, high-volume enterprise Web sites.

The combination of Apollo’s significant PHP site development experience and Zend’s technological expertise will help drive the continued evolution of PHP, an open source Web scripting language that is gaining momentum as the most popular language to power dynamic Web sites. The alliance will further the development of PHP’s infrastructure and enable Zend to establish best practices for its implementation in large enterprise environments.”

For more information visit: www.zend.com


NEW STUFF

PHP5 Coding Contest

Want to put your PHP5 skills to the test? Zend has announced its PHP5 coding contest, of which php|architect is also a sponsor.

“We’ve got lots of Prizes to give out just for entering, as well as the Grand Prizes: a top-of-the-range Dell laptop for a developer working by himself or an Apple iPod Mini for each member of your team! Your application will be rated both by your peers and by the panel of Judges we’ve assembled from among the most known and well-respected names in the PHP community.”

Get all the Contest information from Zend.com

Looking for a new PHP Extension? Check out some of the latest offerings from PECL.

BLENC 1.0alpha

BLENC is an extension that hooks into the Zend Engine, allowing for transparent encryption and execution of PHP scripts using the blowfish algorithm. It is not designed for complete security (it is still possible to disassemble the script into op codes using a package such as XDebug); however, it does keep people out of your code and makes reverse engineering difficult.

odbtp 1.1.1

This extension provides a set of ODBTP (Open Database Transport Protocol) client functions. ODBTP allows any platform to remotely access Win32-based databases. Linux and UNIX clients can use this extension to access Win32 databases like MS SQL Server, MS Access and Visual FoxPro.

PDO_MYSQL 0.1

This extension provides a MySQL 3.x/4.0 driver for PDO.

PHP 4.3.7 Released

PHP.net announced the release of PHP 4.3.7: “The PHP Development Team is proud to announce the release of PHP 4.3.7. This is a maintenance release that, in addition to several non-critical bug fixes, addresses an input validation vulnerability in the escapeshellcmd() and escapeshellarg() functions on the Windows platform. Users of PHP on Windows are encouraged to upgrade to this release as soon as possible.”

For more information visit: http://qa.php.net/


New at php|a: PayPal support and single prints

Monday, June 7th 2004 13:22:00 GMT

You asked for it! php|architect's purchasing system now accepts PayPal as a valid payment method! You can use your PayPal account safely and securely to pay for all your php|a purchases.

Also, effective immediately, you can now purchase individual print issues that will be delivered directly to your doorstep. Expect more past issues to become available as we update our inventory and introduce new shipping methods to get the magazines out to you faster!

PHP 5 Release Candidate 3 Released!

Tuesday, June 8th 2004 12:48:09 GMT

PHP.net announces the third release candidate of PHP 5!

The third (and hopefully final) Release Candidate of PHP 5 is now available!

This mostly bug-fix release improves PHP 5's stability and irons out some of the remaining issues before PHP 5 can be deemed release quality. Everyone is now encouraged to start playing with it!

There are few changes since Release Candidate 2, which can be found here.

For more information visit: www.php.net

• Creates configurations from scratch

• Parses and outputs different formats (XML, PHP, INI, Apache)

• Edits existing configurations

• Converts configurations to other formats

• Allows manipulation of sections, comments, directives

• Parses configurations into a tree structure

• Provides XPath-like access to directives

XML_HTMLSax3 3.0.0RC1

XML_HTMLSax3 is a SAX-based XML parser for badly formed XML documents, such as HTML.

The original code base was developed by Alexander Zhukov and published at http://sourceforge.net/projects/phpshelve/. Alexander kindly gave permission to modify the code and license for inclusion in PEAR.

PEAR::XML_HTMLSax3 provides an API very similar to the native PHP XML extension (http://www.php.net/xml), allowing handlers written for one to be easily adapted to the other. The key difference is that HTMLSax will not break on badly formed XML, allowing it to be used for parsing HTML documents. Otherwise, HTMLSax supports all the handlers available from Expat except namespace and external entity handlers. It also provides methods for handling XML escapes as well as JSP/ASP opening and closing tags.

DB_DataObject 1.6.1

DataObject performs 2 tasks:

1. It builds SQL statements based on the object's vars and the builder methods.

2. It acts as a datastore for a table row.

The core class is designed to be extended for each of your tables, so that you put the data logic inside the data classes.

php|a

PHP_Beautifier 0.0.6.1

This program reformats and beautifies PHP source code files automatically. The program is Open Source and distributed under the terms of the PHP License. It is written in PHP 5 and has a command line tool.


interested in this month’s article on FDF forms. PHP provides an excellent interface to Adobe’s FDF library that lets you combine a PDF form with POST data and create a print-quality document with little or no effort. Elsewhere, we cover XUL, the interface development language that must have been born out of one Mozilla developer asking the others, “and now, how do we do it in Windows?” XUL is great for building a GUI application that can be ported across several operating systems and that requires almost no programming and, certainly, no code in C, Visual Basic et similia.

Finally, this issue also marks the debut of our very own Peter MacIntyre in the role of reviewer and author. Peter is a great help in the editorial process and, as it turns out, an incredibly gifted reviewer. Now, if I could only interest him in some Italian food…

Editorial: Continued from page 5

“LightBulb is a complete, browser-based, WYSIWYG PHP development suite which includes a PHP application generator, a code editor (with context and classes prompting and highlighting), a complete middleware/framework environment (Lumenation), a GUI application interface, record locking, HIPAA application compliance, user application logging, transaction logging, current user monitoring, a library of PHP classes and data access security, DB compatibility, a report builder, a query builder, an SQL builder, a source code manager, an application management system, and a virtual desktop system metaphor, and many other features.”

For more information or to download, visit ezsdk.com


Having your PHP-driven website use an Oracle database means you can tap into some of the most powerful tools available for web development. The speed and reliability of PHP code, coupled with the power and flexibility of an Oracle database, gives developers a tough combination to beat. However, unless you are careful, performance and reliability issues can creep up on you and leave your systems with increasingly difficult problems to resolve.

This article outlines a few steps you can take to create PHP and Oracle code that works well together and minimizes resource utilization on both sides. In other words, we’re presenting the tools and techniques to create “low impact programming.”

All of the examples shown are drawn from real-life pain and suffering. Our environment consists of a number of web servers running Linux, Apache version 2, and usually the latest version of PHP. Our Oracle database servers typically run on Sun hardware under Solaris (although we do have several test Oracle servers running on Linux).

We’ll start by describing the various ways in which you can minimize the impact of coding decisions on both the web and database servers. We will follow this up with some very specific examples of common tasks and the approaches you can take with them. We’ll conclude with tools and techniques for monitoring your progress at making low impact and robust websites with PHP and Oracle.

What is “Low Impact Programming”?

Low impact programming, more an attitude than a skill, means always trying to reduce the ways in which the code we write and the configurations we make impact the servers on which they run. It means always searching for ways to reduce resource utilization, regardless of how often a piece of code will be run.

In practical terms, low impact programming means that your systems will scale without your having to keep throwing hardware at the problem. Moreover, by concentrating on reducing the resources needed to accomplish the same tasks, you end up making your systems much more robust and fault tolerant.

We are often lulled into a false sense of security when we use tools like PHP and Oracle because of their inherent speed and reliability. The danger is that when performance issues do arise, they escalate quickly.

Keeping Resource Utilization Light

A number of factors influence what resources are required when connecting PHP and Oracle in a website. The easiest ways to reduce resource usage include using persistent database connections, avoiding unnecessary Oracle database commits, taking advantage of the Oracle SQL cache, and minimizing data transfer.

REQUIREMENTS

PHP and Oracle are an excellent combination for creating powerful and scalable web solutions. This article sheds light on the performance issues that might arise only under high-traffic situations, so that you can stop them before they ever start cropping up.

Persistent Connections

Many arguments exist both for and against using persistent connections. The biggest single advantage of persistent connections, with Oracle as your database in particular, is that creating database connections takes a lot of time and CPU power on both ends of the connection. In our testing, we found that opening a new Oracle database connection added between 0.25 and 0.5 seconds per page. Using persistent connections saved us this time on nearly every page.

If you choose to use persistent connections with an Oracle database, you must be aware of several things. Chief among them is that resources opened by one script on a persistent connection remain open for subsequent scripts that use the same connection. This has a cumulative effect on your database server and can be a stealth reason for system slowdowns and phantom error messages.

As an example, each statement handle you create opens a cursor. An open cursor represents a memory handle within the Oracle database, and every Oracle database has a finite number of these handles available. While persistent connections will eventually close (closing all the open cursors on that connection) when the Apache child process ends, on a busy site you can see the open cursor count rise until you start getting error messages. If you’re going to use persistent connections, then, you must explicitly close every statement handle you create.

Another problem area that can sneak up on you is the use of Oracle session parameters. One of their most common uses is setting the default date format. If you want a particular date format for a query on one page and use a session parameter to get it, that change in the date format will persist for every script that happens to use the same connection. This can lead to very inconsistent output without any clear indication of why it is happening.

The last danger with persistent connections is that current versions of PHP don’t handle database restarts very well. If you are using persistent connections and restart your Oracle database, all of the open connections being used by your Apache server will become corrupt, but they will not be reopened until the next page. This means that every time you restart your Oracle database, you need to restart your Apache server, or your users will see lots of error messages until all the old connections are retired.

Using persistent connections to reduce resource utilization can work if your PHP scripts all follow these guidelines:

• Program all scripts to clean up after themselves
• Only use Oracle session parameters in well-defined and agreed-upon ways

Having all your scripts clean up after themselves is an easy idea. If you open a statement handle, close it. If you open a new descriptor, close it.

If you can remember to always close every resource you ever open, your Oracle server will reward you with even performance and high uptimes.
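One way to make it hard to forget to close what you open is to route every handle through a helper that releases the handle in a `finally` block. The sketch below is hypothetical: `withResource()` is our own name, not part of PHP or oci8. It needs PHP 5.5 or later for `try`/`finally`. The open and close steps are passed in as callables, so the helper itself does not depend on the oci8 extension.

```php
<?php
/**
 * Open a resource, hand it to $use, and always release it afterwards,
 * even when $use throws. Returns whatever $use returns.
 */
function withResource(callable $open, callable $use, callable $close)
{
    $resource = $open();
    try {
        return $use($resource);
    } finally {
        // Runs after a normal return and after an exception alike.
        $close($resource);
    }
}
```

With oci8, `$open` would wrap `oci_parse()`, `$use` would call `oci_execute()` and fetch the rows, and `$close` would be `oci_free_statement()`. The cursor is then released even if the fetch code throws.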

While quite a few useful features of an Oracle server can be taken advantage of via Oracle session parameters, these parameters must always be mutually agreed upon by all the programmers and used consistently. If one programmer chooses to set an Oracle session parameter, the unintended effects this will have on everyone else’s code are very difficult to predict. Moreover, the bugs behind reports involving these types of parameters are almost impossible to track down.
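A safe alternative to a session-wide date format is to request the format explicitly in each query with `to_char()`. In this minimal sketch, `formattedDate()` is a hypothetical helper, and the `emp` table with its `hiredate` column is assumed from Oracle's classic demo schema. PHP 7 or later is assumed for the type declarations.

```php
<?php
/**
 * Return a select-list expression that formats a DATE column explicitly,
 * so no script depends on a session-wide NLS_DATE_FORMAT setting.
 */
function formattedDate(string $column, string $alias, string $format = 'YYYY-MM-DD'): string
{
    // The mask is a literal inside the SQL text, so identical calls
    // produce identical statements.
    return sprintf("to_char(%s, '%s') as %s", $column, $format, $alias);
}

$sql = 'select e.ename, ' . formattedDate('e.hiredate', 'hire_date') . ' from emp e';
```

Because the format travels with the query, nothing is left behind on the persistent connection for the next script to inherit.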

Minimizing Commits and Transaction Size

Every time an Oracle database does a commit, it will save whatever is in the current buffer to disk. This is true whether or not there is any data to save. Each of these disk writes takes time and resources on the Oracle server.

Because the Oracle functions in PHP have auto-commit turned on by default, you dramatically increase the number of unnecessary disk writes performed by the database server. So many of these disk writes are unnecessary because almost every statement handle used in a PHP site is a query for data. Unless your query is a “select for update” operation, a select statement requires no saving of data and only needs the disk to read.

The easiest way to avoid doing commits when all you want to do is read data from the database is to use the OCI_DEFAULT option on your OCIExecute() calls. This changes the behavior of SQL execute statements from auto-committing your statement handle to deferring the commit. While this might lead to problems with the rollback segments, if you follow the earlier advice of always closing your statement handles, resource utilization will be kept to a minimum.

When doing inserts, updates, and deletes, however, you must do a commit or your data changes will not be saved. There may be situations where you wish to defer the commit until further operations are completed, but under most circumstances you’ll want to commit as soon as you execute the statement. The only situation where you must defer a commit is when using certain types of bind variables, such as when dealing with large objects or PL/SQL.
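The split between reads and writes described above can be sketched as a small heuristic that decides whether a statement can run with `OCI_DEFAULT` and no commit at all. `canSkipCommit()` is a hypothetical helper and deliberately crude. It only recognizes statements that begin with `select`, and it treats anything else (including `select ... for update`) as needing a commit. It does not parse string literals or comments. PHP 7 or later is assumed.

```php
<?php
/**
 * Rough heuristic: can this statement run without auto-commit and with no
 * commit at all? True only for plain queries; anything else, including
 * "select ... for update", is treated as needing a commit.
 * Not a SQL parser: literals and comments are not taken into account.
 */
function canSkipCommit(string $sql): bool
{
    $sql = strtolower(trim($sql));
    return strpos($sql, 'select') === 0
        && preg_match('/\bfor\s+update\b/', $sql) !== 1;
}
```

A caller would then pass `OCI_DEFAULT` to `oci_execute()` for plain queries and call `oci_commit()` on the connection once its inserts, updates or deletes are done.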

If you are going to defer commits when doing inserts, updates, or deletes, then you must make sure that you keep enough room in your Oracle rollback segments. Your rollback segments should be more than large enough to accommodate the largest transaction you will ever have in a single script. If you’re going to do a lot of work with large objects, then you should make sure that your rollback segments can accommodate the largest large object you think you’ll encounter.

Leveraging the SQL Buffer Cache

One of the chief benefits of using an Oracle database as the backbone of your PHP-driven site is that the SQL engine available to you has enormous power to manipulate data. With that power, however, you pay a price as your queries become more and more complicated. Each time you pass a SQL statement to the Oracle database, it must be parsed and an execution plan must be created. Every SQL statement must pass through the Oracle cost-based optimizer (CBO) to determine which indexes will be used and in what order, along with how the various join conditions are best evaluated to return the requested data.

To keep from returning to the CBO unnecessarily, Oracle maintains a cache of each parsed statement and its execution plan. However, this cache is keyed on the exact text of the SQL statement and is case sensitive. As a result, to properly leverage this cache and avoid re-parsing the same SQL statements over and over again, all of your SQL statements must be standardized.

The easiest way to standardize your SQL statements is to follow two simple guidelines: always use a consistent case convention, and avoid putting newline characters in your SQL strings. While many programmers like to use mixed case in their SQL statements, it is hard to find two programmers who will do it exactly the same way. The easiest way to keep case differences from denying you equal access to the SQL buffer cache is to always use the same case for all SQL statements. Which case to use is a personal choice; we always use lower case for SQL statements, since they are easier to type that way.
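If you want to enforce the two guidelines in code rather than by convention, a normalizing helper is one option. `normalizeSql()` below is a sketch of our own, not something PHP or Oracle provides. It lower-cases everything outside quoted text and collapses all whitespace, including newlines, to single spaces. It leaves `'...'` literals and `"..."` quoted identifiers untouched, because changing their case would change the query's meaning. It assumes the statements contain no `--` comments, since removing the newline would comment out the rest of the statement. PHP 7 or later is assumed.

```php
<?php
/**
 * Put a SQL string into one canonical form: lower case, single spaces,
 * no newlines. Quoted text ('...' literals, "..." identifiers) is kept as-is.
 */
function normalizeSql(string $sql): string
{
    // Split on quoted sections; the capture group keeps them in the array,
    // so even indexes are unquoted SQL and odd indexes are quoted text.
    $parts = preg_split('/(\'(?:[^\']|\'\')*\'|"[^"]*")/', $sql, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            $out .= $part;
        } else {
            $out .= strtolower(preg_replace('/\s+/', ' ', $part));
        }
    }
    return trim($out);
}
```

Two differently typed versions of the same query then map to one SQL text, and therefore to one cache entry.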

Another way to leverage the SQL cache is to look for queries that vary only by particular parameters. For example, if you are always calling up rows from a particular table and varying only the primary key you query on, you can make all of them use the same SQL cache entry by using a bind variable for the varying parameter. In this way, the SQL is always the same, but the bind variable lets you select which row you wish to return.

Another way to reduce the load on the Oracle database server is to move more complex queries into Oracle views. A view is simply a predefined query. The advantage of a view is that it gets compiled and optimized when it is created, rather than whenever the associated statement is executed. As a result, when the PHP script calls on the view, the database server has already dealt with the complex conditions. Effective use of views, often difficult for PHP programmers making the transition from other database systems, can dramatically reduce resource utilization. Another benefit is that your DBA can optimize the views in the system while leaving the PHP programmers’ code untouched.

By following these guidelines, a busy site can often achieve a buffer cache hit ratio of 80%, or even better under some circumstances. This will dramatically reduce the load on the CBO and other aspects of the Oracle database server.
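The bind-variable advice above can be illustrated by comparing the SQL texts a script would send. In this sketch, the function names are hypothetical and the `emp` table is assumed from Oracle's demo schema. The literal version produces a different statement for every employee number, while the bound version always produces the same text. `fetchEmployee()` shows how the bound query would be run with the real oci8 functions `oci_parse()`, `oci_bind_by_name()` and `oci_execute()`. It is not exercised here because it needs a live connection. PHP 7 or later is assumed.

```php
<?php
// Literal value baked into the text: every employee number yields a
// different statement, each parsed and cached separately.
function employeeSqlWithLiteral(int $empno): string
{
    return 'select ename, sal from emp where empno = ' . $empno;
}

// Bind variable: the text never changes; only the bound value does.
function employeeSqlWithBind(): string
{
    return 'select ename, sal from emp where empno = :empno';
}

// Executing the bound version with oci8 (requires a connection).
function fetchEmployee($conn, int $empno)
{
    $stmt = oci_parse($conn, employeeSqlWithBind());
    oci_bind_by_name($stmt, ':empno', $empno);
    oci_execute($stmt, OCI_DEFAULT);   // read-only, so no commit
    $row = oci_fetch_assoc($stmt);
    oci_free_statement($stmt);         // release the cursor
    return $row;
}
```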

Minimizing Data Transfer

Another stealth reason for seemingly slow performance lies in how much data is transferred between the Oracle server and the Apache server. Hopefully, the database and web servers are located physically close to one another (preferably on the same network subnet). However, large amounts of data transfer, often unnecessary, can cause slowdowns and lengthen response times.

One way to reduce data transfer is to create views that let you retrieve only the data you actually need. For example, if you have a table with one or more large objects in it and you don’t need the LOB data, don’t put those columns in your query. If you prefer to use the select * from… syntax, then create a view that contains the non-LOB columns of the table.
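The non-LOB view suggestion above could be scripted along these lines. `nonLobViewDdl()` is a hypothetical helper. It expects a map of column names to Oracle data types (for example, as read from the `DATA_TYPE` column of `USER_TAB_COLUMNS`) and treats `BLOB`, `CLOB`, `NCLOB` and `BFILE` as the LOB types to leave out. The table and view names in the test are made up, and unquoted (case-insensitive) column names are assumed. PHP 7 or later is assumed.

```php
<?php
/**
 * Build DDL for a view that exposes only the non-LOB columns of $table.
 * $columns maps column name => Oracle data type.
 */
function nonLobViewDdl(string $view, string $table, array $columns): string
{
    $lobTypes = ['BLOB', 'CLOB', 'NCLOB', 'BFILE'];
    $keep = [];
    foreach ($columns as $name => $type) {
        if (!in_array(strtoupper($type), $lobTypes, true)) {
            $keep[] = strtolower($name);
        }
    }
    if ($keep === []) {
        throw new InvalidArgumentException("$table has no non-LOB columns");
    }
    return sprintf('create or replace view %s as select %s from %s',
        $view, implode(', ', $keep), $table);
}
```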

Another way to avoid unnecessary data transfer is to write queries that don’t require further queries inside your result loops. If your code runs an inner query for each returned row, you’re going to put a lot of extra pressure on data transfer between your PHP script and the Oracle server. Because the Oracle SQL engine can handle complex, multi-dimensional queries, it is almost always possible to rewrite a nested query as a single query.
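Rather than issuing an inner query for each outer row, you can fetch a single joined query once and regroup the rows in PHP. For example, `select d.dname, e.ename from dept d, emp e where e.deptno = d.deptno order by d.dname, e.ename` does the job for the demo schema. `groupRows()` below is a hypothetical helper for the regrouping step. It assumes rows shaped like `oci_fetch_assoc()` output, with upper-case column names. PHP 7 or later is assumed.

```php
<?php
/**
 * Turn the flat result of one joined query into a nested array keyed by
 * $groupKey, instead of running an inner query for every outer row.
 */
function groupRows(array $rows, string $groupKey): array
{
    $grouped = [];
    foreach ($rows as $row) {
        $key = $row[$groupKey];
        unset($row[$groupKey]);      // the key is already the array index
        $grouped[$key][] = $row;
    }
    return $grouped;
}
```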

Another place where data transfer can hurt performance is within the database itself. It is often difficult for the CBO to know that a join condition is a foreign key/primary key relationship. If you know that only one matching row will ever occur for a given join condition, you can let the CBO know this by passing the FIRST_ROWS SQL compiler directive (optimizer hint), as in Listing 1. This lets the CBO know that all of the join conditions are foreign keys to primary keys.

One last situation where data transfer can affect performance is the use of database links between multiple Oracle servers. When you do joins across database links, or even entirely on the far side of a database link, the local CBO will work especially hard putting all the data together. You will often find that performance on both the local and the remote database server suffers as data transfers across the database link eat up resources on both sides.

Optimizing Common Tasks with PHP and Oracle

There are a number of common tasks where the choices made by the programmer can have a significant cumulative effect on performance. Among them are providing paged output, computing subtotals and grand totals, finding sums and averages conditionally, and querying against date constraints. In all cases, there are multiple ways to accomplish the same task. We will show the approaches that we have found produce the least impact on all our servers together.

Paging Query Output

Programmers are often called upon to provide search output or reports in a paged format. For example, you may want to show search results, limiting the output to 20 results per page. In a web environment, this helps to avoid situations where a poor search returns thousands of entries, only the first few of which are really of interest to the user.

With some RDBMS systems, including MySQL (LIMIT) and, to some extent, Microsoft SQL Server (TOP), built-in extensions to SQL allow you to do this quickly and easily. Fortunately or unfortunately, Oracle databases have only half of what you need to limit output, and the limit is always applied before the sorting requested in the query. Given those limitations, you need to decide whether you will have PHP limit your search output or have your Oracle query return only the rows you are interested in.

To really understand the limitations in Oracle, let’s build a paged query from the inside out. Let’s say you want to retrieve all employee records sorted by last name and then first name. The query in Listing 2 works well enough: so long as there are only a few dozen employees, you never need to worry about how many rows get returned and displayed in the browser.

Once you move beyond a few dozen rows returned to a few hundred, you’ll want to limit the output. If you wanted to display only the first twenty rows in the query and you had done a superficial reading of the Oracle SQL manuals, you would be tempted to use the query in Listing 3. Unfortunately, the Oracle CBO will limit your query by row number before applying the sorting routines, giving you inconsistent output. The cure is to put the main query from Listing 2 into a subquery, as in Listing 4.

This basic technique gives you a query that cuts off at a particular maximum row number, which works well enough to display the first page of your paged output. To display the second page, you will have to employ another round of querying. If you modify the query in Listing 4 to also return the row number, you can put the whole query in yet another subquery and limit output to only those rows whose row number is at least as large as the minimum value you want to return. The full query in Listing 5 shows how to return rows 20 through 39 of the result set.

The main performance issues you’ll face when trying

to decide whether to use the paged query describedabove versus doing a solution with PHP code will cen-ter on whether data transfer or the Oracle CBO opti-mizations give you the best performance Our experi-ence has demonstrated that for the first 3-4 pages ofoutput, the query solution gives slightly better results

rroowwnnuumm << 4400)) ff wwhheerree

ff rr >>== 2200

Listing 5

Trang 15

The larger the inner result set and the more pages of

output the user pages to, the closer the performance

impact between query and PHP solutions become
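The nesting can be generated from PHP. The helper below is a minimal sketch, not the query from the listings: the function name `build_paged_query()` and the bind names `:max_row` and `:min_row` are hypothetical. It keeps the ORDER BY in the innermost query, caps the middle query with ROWNUM, and filters the outer query on the captured row number, mirroring the structure described for Listings 4 and 5.

```php
<?php
// Hypothetical helper: wrap an ordered query so that only rows
// $first .. $first + $count - 1 (1-based) are returned.
function build_paged_query(string $ordered_sql, int $first, int $count): array
{
    // The ORDER BY stays inside $ordered_sql, because ROWNUM is
    // assigned before sorting at any single query level.
    $sql = "select * from ("
         . "select rownum r, q.* from ($ordered_sql) q "
         . "where rownum < :max_row"
         . ") f where f.r >= :min_row";

    // Bind the limits instead of pasting them into the SQL text
    return [$sql, [':max_row' => $first + $count, ':min_row' => $first]];
}

// Rows 20 through 39, as in Listing 5
[$sql, $binds] = build_paged_query("select ename, sal from emp order by ename", 20, 20);
echo $sql, "\n";
```

With the OCI8 functions, each entry in `$binds` would be bound with `oci_bind_by_name()` before executing. For page n with s rows per page, you would pass `$first = (n - 1) * s + 1`.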

Using Roll-up Queries

Another common task when producing reports involves creating subtotals and grand totals. When the only computations are sums and counts, it makes almost no difference whether you use the aggregation functions in Oracle or accumulate the values in PHP variables. However, if your query involves averages or other functions, you'll want to take advantage of the large family of aggregation functions available in Oracle SQL.

The roll-up features available in Oracle SQL center on the ROLLUP option of the GROUP BY clause, along with the GROUPING function. An example query using these features is shown in Listing 6. This query finds the minimum, maximum, average, and total salaries by employee, with department subtotals and a grand total for all rows. In order to know which rows contain subtotals for a particular column, we include values from the GROUPING function in our SELECT clause. The main thing to remember about GROUPING is that it behaves the opposite of what you might expect: it returns 1 when the column has been rolled up into a subtotal, and 0 for an ordinary detail row. Also, when a column is part of a rolled-up subtotal, the result set will have a NULL value in that column. You cannot count on this NULL to tell you that you have a subtotal row, because there may be actual NULLs that are a legitimate part of the result set.

$sql = "select d.dname, e.ename, min(e.sal) as min_salary, ";
$sql .= "max(e.sal) as max_salary, avg(e.sal) as avg_salary, ";
...
emp e, dept d where e.deptno=d.deptno group by
rollup(d.dname, e.ename)

Listing 6

A complete PHP example that uses the query from Listing 6 and relies on the GROUPING function output to show the subtotal rows and the grand total row is shown in Listing 7.
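The core of that approach is deciding, row by row, what kind of line you are looking at. The sketch below assumes the query also selects `grouping(d.dname) as grp_dname`, `grouping(e.ename) as grp_ename` and `sum(e.sal) as total_salary` (hypothetical aliases; OCI8 returns column names in upper case). It uses hard-coded sample rows in place of a live fetch loop, and it tests the GROUPING flags rather than the NULLs, for the reason given above.

```php
<?php
// Classify one row of a ROLLUP(d.dname, e.ename) result using the
// GROUPING flags, which are 1 for a rolled-up column and 0 otherwise.
function label_rollup_row(array $row): string
{
    if ((int) $row['GRP_DNAME'] === 1) {
        return 'Grand total';                     // both columns rolled up
    }
    if ((int) $row['GRP_ENAME'] === 1) {
        return 'Subtotal for ' . $row['DNAME'];   // employees rolled up
    }
    return $row['DNAME'] . ' / ' . $row['ENAME']; // ordinary detail row
}

// Sample rows standing in for OCIFetchInto() results
$rows = [
    ['DNAME' => 'ACCOUNTING', 'ENAME' => 'CLARK', 'TOTAL_SALARY' => 2450, 'GRP_DNAME' => 0, 'GRP_ENAME' => 0],
    ['DNAME' => 'ACCOUNTING', 'ENAME' => 'KING',  'TOTAL_SALARY' => 5000, 'GRP_DNAME' => 0, 'GRP_ENAME' => 0],
    ['DNAME' => 'ACCOUNTING', 'ENAME' => null,    'TOTAL_SALARY' => 7450, 'GRP_DNAME' => 0, 'GRP_ENAME' => 1],
    ['DNAME' => null,         'ENAME' => null,    'TOTAL_SALARY' => 7450, 'GRP_DNAME' => 1, 'GRP_ENAME' => 1],
];

foreach ($rows as $row) {
    printf("%-25s %8d\n", label_rollup_row($row), $row['TOTAL_SALARY']);
}
```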

Using Case and NVL

A number of specialized data situations occur where the Oracle CASE expression and NVL function provide useful solutions. Often, you will want to total things conditionally or otherwise operate on selected data. In addition, there are cases where you want to pivot your output away from what the natural select order would give you. Finally, there are cases where you want one thing to happen if there is a value in a column and another thing to happen if the column is NULL.

To operate on columns conditionally, you can use PHP with variable accumulators, or you can do these accumulations in your query. The advantage of doing them in PHP is that you cut down on the amount of work the Oracle CBO has to do when it assembles the result. However, with the use of the SQL buffer cache, indexes, and potentially views, you can mitigate this quite a bit. The big disadvantage of doing this with PHP is that you will have a lot of unnecessary data transfer between your Oracle server and your Apache server. You will also be increasing memory utilization on your Apache server, a commodity that is usually in short supply on a busy machine.

Let's say that you want to sum up the salaries for all managers. You can do this with the PHP code in Listing 8, which retrieves all the data from the database and decides in PHP which rows to include in the sum. Alternatively, you can use the single query shown in Listing 9, which gives you the answer right away.

When you wish to transpose or pivot the output, you are often forced to retrieve all your data from the database and then put it into arrays in PHP for output. This is often the most efficient method for accomplishing the task. However, if the circumstances are right, you can use CASE to retrieve the columns you wish to use individually. We have found this particularly useful when we wish to display reports on transactional data for today, yesterday, this week, and this month. Using a CASE expression to pivot around these date values makes the process very straightforward. Listing 10 shows a sample query whose output appears in Figure 1. Like the previous example, it gets just the information required with a minimum of data transfer.
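For comparison, here is roughly what the PHP side of the trade-off looks like when the same buckets are computed after fetching every row. This is a sketch, not the query from Listing 10: `pivot_logins()` is a hypothetical name, the timestamps are assumed to arrive as 'Y-m-d H:i:s' strings, and "this week" follows the ISO (Monday-start) rule that TRUNC(..., 'IW') uses. Every row has to cross the network before it can be counted, which is exactly the data transfer the CASE version avoids.

```php
<?php
// Hypothetical PHP-side pivot: count logins into the same buckets that
// the CASE expressions in Listing 10 produce.
function pivot_logins(array $logins, DateTimeImmutable $now): array
{
    $today     = $now->setTime(0, 0);       // like trunc(sysdate)
    $yesterday = $today->modify('-1 day');  // like trunc(sysdate-1)
    $counts = ['today' => 0, 'yesterday' => 0, 'this_week' => 0, 'this_month' => 0];

    foreach ($logins as $ts) {
        $login = new DateTimeImmutable($ts);
        $day   = $login->setTime(0, 0);     // like trunc(last_login_date)
        if ($day == $today) {
            $counts['today']++;
        }
        if ($day == $yesterday) {
            $counts['yesterday']++;
        }
        // ISO year and week number, like trunc(x, 'IW')
        if ($login->format('o-W') === $now->format('o-W')) {
            $counts['this_week']++;
        }
        // Calendar month, like trunc(x, 'MONTH')
        if ($login->format('Y-m') === $now->format('Y-m')) {
            $counts['this_month']++;
        }
    }
    return $counts;
}
```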

Finally, when you need to execute a query with conditional logic based on whether a column is NULL or contains a value, the NVL function can greatly speed up the process. There are a number of ways in which this function can be of use. For example, when we have a sequence value that is used in a table for sorting purposes, we often want new entries to be appended at the end of the sequence. The SQL statement in Listing 11 inserts a new item and guarantees that it will always appear at the end of the list.

while (OCIFetchInto($stmt, $row, OCI_ASSOC)) {
...
$mgr_total += $row["SAL"];

Listing 8

sum(case when trunc(last_login_date)=trunc(sysdate)
then 1 else 0 end) as today,
sum(case when trunc(last_login_date)=trunc(sysdate-1)
then 1 else 0 end) as yesterday,
sum(case when trunc(last_login_date, 'IW')=trunc(sysdate, 'IW')
then 1 else 0 end) as this_week,
sum(case when trunc(last_login_date, 'MONTH')=trunc(sysdate, 'MONTH')
then 1 else 0 end) as this_month

Listing 10

(menu_item_seq.nextval, nvl(max(seq)+10, 10), 'New Item', '/new_item.html')

Listing 11

Another instance where the NVL function can be of use is when dealing with effectivity dates. In those cases, you want to know whether the current date is on or after the start date and on or before the end date, where a NULL start or end date means there is no limit on that side. Listings 12 and 13 contain two alternative queries that return the same data. These queries do not differ much in their impact on the database server; they are mostly an exercise in thinking about creative uses of NVL.
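The same NULL-means-unbounded rule can be expressed in PHP, which helps when effectivity checks have to be made on rows that are already in memory. This is a hypothetical sketch: `is_effective()` is not from the article, PHP `null` stands in for SQL NULL, and dates are assumed to be 'Y-m-d H:i:s' strings, which sort correctly as plain strings.

```php
<?php
// PHP analogue of the NVL test in Listing 13: a missing start or end
// date is replaced by "now", so it can never exclude the row.
function is_effective(?string $start, ?string $end, string $now): bool
{
    $s = $start ?? $now; // nvl(start_date, sysdate)
    $e = $end ?? $now;   // nvl(end_date, sysdate)
    return strcmp($s, $now) <= 0 && strcmp($e, $now) >= 0;
}
```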

Fast Oracle Date Functions

One thing that often trips up programmers who are new to Oracle databases is that the Oracle DATE column data type actually holds both a date and a time. Oracle does not have a date-only or time-only column type, as many other RDBMS products do. Instead, Oracle treats dates in arithmetic as a count of days: whole numbers represent days, and the fractional part represents the portion of a day that the time covers. Thus, adding 0.5 to a date at midnight yields noon on the same day, and subtracting one date from another gives the difference in days, including any fraction.

This way of handling dates means that there are some very quick methods for doing particular kinds of date-related logic. For example, if you want to know the number of days between two dates (ignoring any time-of-day differences), you can use the TRUNC function as shown in Listing 14. If you want to know whether a column named LAST_DATE matches today, you can compare TRUNC(LAST_DATE) with TRUNC(SYSDATE). Given just a single parameter, TRUNC discards the fractional (time) part of the date, which has the effect of converting the date and time into midnight of the given date.
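PHP has no TRUNC, but the same idea (throw away the time, then count days) can be sketched with DateTimeImmutable. `days_between()` is a hypothetical helper, and it mirrors `trunc(date2) - trunc(date1)` only when both values are in the same time zone.

```php
<?php
// Whole calendar days from $from to $to, ignoring the time of day,
// in the spirit of trunc(date2) - trunc(date1).
function days_between(string $from, string $to): int
{
    $a = (new DateTimeImmutable($from))->setTime(0, 0); // midnight, like TRUNC
    $b = (new DateTimeImmutable($to))->setTime(0, 0);
    // %a is the absolute day count; %r adds a minus sign when $to < $from
    return (int) $a->diff($b)->format('%r%a');
}

// Only two hours apart, but on different dates
echo days_between('2003-08-01 23:00:00', '2003-08-02 01:00:00'), "\n"; // prints 1
```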

If you pass a format argument to TRUNC, you can move your date in even more strategic fashions. For example, to see whether two dates are in the same month, you can compare TRUNC(date1, 'MONTH') to TRUNC(date2, 'MONTH'). To see whether two dates are in the same week, you can compare TRUNC(date1, 'IW') and TRUNC(date2, 'IW'). Note that we use IW instead of WW because in Oracle, IW refers to the ISO week specification, in which weeks always begin on Monday. If you use the WW format, the week will begin on whatever day of the week January 1st of that year falls on.
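The difference between the two week definitions is easiest to see on a concrete pair of dates. The hypothetical helpers below imitate the rules described above in PHP: ISO weeks via the 'o-W' date format, and WW-style weeks by counting seven-day blocks from January 1. In 2003, January 1 fell on a Wednesday, so Sunday, August 31 and Monday, September 1 share a WW week but fall in different ISO weeks.

```php
<?php
// Like trunc(x, 'IW'): ISO weeks, which always start on Monday.
function same_iso_week(string $a, string $b): bool
{
    return (new DateTimeImmutable($a))->format('o-W')
        === (new DateTimeImmutable($b))->format('o-W');
}

// Like trunc(x, 'WW'): week 1 starts on January 1, so every week starts
// on the weekday that January 1 fell on that year.
function same_ww_week(string $a, string $b): bool
{
    $da = new DateTimeImmutable($a);
    $db = new DateTimeImmutable($b);
    // 'z' is the zero-based day of the year
    return $da->format('Y') === $db->format('Y')
        && intdiv((int) $da->format('z'), 7) === intdiv((int) $db->format('z'), 7);
}

var_dump(same_iso_week('2003-08-31', '2003-09-01')); // bool(false)
var_dump(same_ww_week('2003-08-31', '2003-09-01'));  // bool(true)
```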

If you refer back to Listing 10, you will see an effective use of the TRUNC function with Oracle dates. These functions are much faster than either TO_CHAR comparisons or BETWEEN comparisons. Moreover, because Oracle date columns also contain a time, using TRUNC will save you from problems with inclusive bounds when you do use BETWEEN. For example, let's say that you want to know all transactions that occurred between August 1, 2003, and August 31, 2003. If you just used the query in Listing 15, transactions that occurred after midnight on August 31 would not be included. However, if you use the query in Listing 16, you'll pick up everything that occurred on August 31 regardless of the time.
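The effect is easy to reproduce with plain timestamps. In the sketch below, ISO-formatted strings stand in for Oracle dates (an assumption that lets them compare correctly as strings), and `in_range()` and `trunc_day()` are hypothetical helpers playing the roles of BETWEEN and single-argument TRUNC.

```php
<?php
// Inclusive comparison, like BETWEEN $lo AND $hi
function in_range(string $ts, string $lo, string $hi): bool
{
    return strcmp($ts, $lo) >= 0 && strcmp($ts, $hi) <= 0;
}

// Keep only the date part, like TRUNC(txn_date); assumes 'Y-m-d H:i:s'
function trunc_day(string $ts): string
{
    return substr($ts, 0, 10) . ' 00:00:00';
}

$lo  = '2003-08-01 00:00:00';
$hi  = '2003-08-31 00:00:00'; // a bare date means midnight
$txn = '2003-08-31 14:30:00';

var_dump(in_range($txn, $lo, $hi));            // bool(false): missed
var_dump(in_range(trunc_day($txn), $lo, $hi)); // bool(true): included
```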

Tuning and Monitoring

In a perfect world, all PHP programmers would be experts in creating pre-tuned SQL statements. If we could always be counted on to do things in the most efficient manner, we could do away with monitoring our databases. However, since none of us ever seems to live in this perfect place, there is always a need to keep an eye on which queries are using what resources on the database and web servers, in order to stay on top of performance issues.

Tuning and monitoring consist of a number of tasks, most of which are usually performed by an Oracle DBA.

(start_date < sysdate or start_date is null) and
(end_date > sysdate or end_date is null)

Listing 12

nvl(start_date, sysdate) <= sysdate and
nvl(end_date, sysdate) >= sysdate

Listing 13

However, occasions do occur where a PHP programmer can participate in the tuning and monitoring cycles. Often, a DBA will find a query that is performing badly and will know how to fix it, but will have a terrible time actually finding where this query lives in the code for the web site. Moreover, the DBA will need to work closely with the programmer to ensure that any alterations to the query continue to return the correct data.

The basics of performance tuning come down to two tasks for programmers: finding bad (or poorly performing) SQL and creating monitoring tools.

Finding Bad SQL

The Oracle data dictionary keeps track of system resource utilization for each and every query in the system. You can query various system tables to discover all kinds of performance characteristics at any time.

There are several ways to measure good and bad performance for Oracle SQL. The chief characteristics we monitor are buffer gets, parse calls, and disk reads, which reflect the parts of the Oracle database server with the greatest impact on query performance. In each case, lower numbers indicate better performance.

To find the queries that have the poorest ratio of buffer gets, you can run the query in Listing 17. Buffer gets are a measure of CPU utilization in the Oracle server. If you are concerned only with a few database users, you can restrict the WHERE clause to include only those users. To interpret the ratio returned by this query, let's examine how it is constructed. The query divides the buffer gets for each statement by the number of times the statement was executed (treating zero executions as one). This lets newer queries (those that haven't been executed many times yet) stand out over queries that have been in the system longer.
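The arithmetic in Listing 17 is easy to reproduce in PHP, for instance when post-processing rows that a monitoring script has already fetched from v$sql. In this hypothetical sketch, `rank_by_ratio()` keeps statements above the average for the chosen metric and sorts them by that metric per execution, with `max(1, executions)` playing the role of `decode(v.executions,0,1,v.executions)`. It simplifies the listing by using one overall average instead of one per parsing user, and the sample rows are made up.

```php
<?php
// Hypothetical post-processing of v$sql rows in the spirit of Listing 17.
function rank_by_ratio(array $rows, string $metric, int $top = 80): array
{
    // Simplification: one overall average of the non-zero values,
    // not one average per parsing user as in the listing
    $values = array_filter(array_column($rows, $metric), fn($v) => $v > 0);
    $avg = $values ? array_sum($values) / count($values) : 0;

    $ranked = [];
    foreach ($rows as $row) {
        if ($row[$metric] > $avg) {
            // max(1, executions) mirrors decode(executions,0,1,executions)
            $row['RATIO'] = round($row[$metric] / max(1, $row['EXECUTIONS']));
            $ranked[] = $row;
        }
    }
    // Highest ratio first, then keep the top N (like rownum <= 80)
    usort($ranked, fn($x, $y) => $y['RATIO'] <=> $x['RATIO']);
    return array_slice($ranked, 0, $top);
}

$sample = [
    ['SQL_TEXT' => 'select * from emp',    'BUFFER_GETS' => 1000, 'EXECUTIONS' => 10],
    ['SQL_TEXT' => 'select * from orders', 'BUFFER_GETS' => 5000, 'EXECUTIONS' => 0],
    ['SQL_TEXT' => 'select * from dept',   'BUFFER_GETS' => 100,  'EXECUTIONS' => 1],
    ['SQL_TEXT' => 'select * from items',  'BUFFER_GETS' => 3000, 'EXECUTIONS' => 1000],
];
foreach (rank_by_ratio($sample, 'BUFFER_GETS') as $i => $r) {
    printf("%d %6d %s\n", $i + 1, $r['RATIO'], $r['SQL_TEXT']);
}
```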

Another measure of poor performance is the number of times a particular query must be parsed by Oracle. A query to find the worst performers in this category is shown in Listing 18. A higher number indicates a query that may need to be placed in a view or otherwise optimized to avoid having to parse it over and over again. While a view won't reduce the number of “soft” parses, it will cut down on the number of “hard” parses. Again, because the number of parse calls is divided by the execution count, newer queries will rise to the top of the list.

The last major area of performance indicators is the queries with the highest ratio of disk reads. To find these queries, you can use Listing 19. This query reports a high ratio for a query if there are a large number of disk reads for each execution of the query. These are likely candidates to be further optimized with additional WHERE clauses or other techniques that cut down on data transfer.

More information on interpreting these values can be found in the Oracle publication Oracle 8i Designing and Tuning for Performance. It is usually available as a PDF document in the set of CDs that came with your Oracle server software.


select rownum as rank, b.*
from (
select u.username, v.parse_calls, v.executions, round(v.parse_calls/decode(v.executions,0,1,v.executions)) as ratio, v.sql_text
from v$sql v, dba_users u, (
select parsing_user_id, 6*avg(parse_calls) as avg_parse_calls from
v$sql where parse_calls > 0 group by parsing_user_id ) a
where v.parse_calls > a.avg_parse_calls and v.parsing_user_id=a.parsing_user_id and v.parsing_user_id=u.user_id
order by round(v.parse_calls/decode(v.executions,0,1,v.executions)) desc ) b
where rownum <= 80

Listing 18

select rownum as rank, b.*
from (
select u.username, v.buffer_gets, v.executions, round(v.buffer_gets/decode(v.executions,0,1,v.executions)) as ratio, v.sql_text
from v$sql v, dba_users u, (
select parsing_user_id, avg(buffer_gets) as avg_buffer_gets from
v$sql where buffer_gets > 0 group by parsing_user_id ) a
where v.buffer_gets > a.avg_buffer_gets and v.parsing_user_id=a.parsing_user_id and v.parsing_user_id=u.user_id
order by round(v.buffer_gets/decode(v.executions,0,1,v.executions)) desc ) b
where rownum <= 80

Listing 17

Monitoring System Resources

Once you've started looking at the performance of various queries within your system, you will soon find yourself wanting to do something more systematic to keep on top of performance issues before they get out of hand. When that time comes, you'll want a set of queries and a process in place to look at SQL performance over time.

By putting the queries mentioned above into a regularly scheduled script, you can see over time which queries are being used most often by your applications. This can be an invaluable tool for programmers who want to find out where optimization efforts will yield the greatest results. It can also keep you from spending lots of time optimizing a query that runs once per day at the expense of a query that runs on every single page of your site.

Another place where you can track performance issues is on the Apache servers. Here, looking at how many active Apache child processes exist at any given time, as well as tracking load average, memory utilization, and overall system process counts, can help identify problems with your web server before they get out of hand.

We use a number of scripts and tools to monitor our systems on a regular basis. Among the key tools are scripts that run the queries looking for poor performance in buffer gets, parse calls, and disk reads on a daily basis. We also have scripts on every web server that monitor load averages, process counts, and memory utilization and feed that data into a round-robin database (RRD). We can then generate graphs of system performance for a number of time periods on an ongoing basis.

Only through a concerted effort on a number of fronts can you maintain a good picture of where your performance issues lie today and where the performance issues of tomorrow are likely to occur.

Summary

Having access to a powerful database like Oracle is a tremendous asset to a PHP programmer. The flexibility and power of the data engine can really help you create complex and robust web sites. However, performance issues often arise that will take you by surprise unless you are prepared to deal with them.

Hopefully, this guide can serve as a starting point for you and your organization to take steps toward using your Oracle database to its full potential without causing too many problems. The lessons passed on here are all the result of painful processes as we dealt with performance issues in real-life crisis situations. Perhaps learning how we solved those issues will keep your web and database servers working well together.


To Discuss this article:


Everyone makes typos. That is a universal constant, but no one wants their typos to end up in the final product, be it an e-mail or a blog entry. Consequently, many programs have integrated spell checkers that can find and help correct the mistakes made by busy fingers. Unfortunately, for the most part this functionality is not available in many forms of web communication, such as forums, blogs and online comment systems. This is primarily because it is not easy to implement a spell checker, and few developers are familiar with the extensions and libraries that can simplify the process. This article will focus on two PHP extensions that offer spell checking functionality that can be used to validate and correct typos and spelling errors.

PHP's primary spell checking functionality is made available through the pspell extension, which is based on the Aspell library.

The Aspell library is a well-established open source spell-checking engine used by many other applications. One of its neat abilities is spell checking multiple languages, rather than the single one that most other solutions are limited to. At this time, Aspell has dictionaries for over 20 languages, and new ones are being added all the time. Because Aspell is a fairly commonly used library, it can be found by default on most open source operating systems; chances are, you won't actually need to download and install a new library to take advantage of what Aspell has to offer.

This is very useful, because it makes adding the pspell extension to PHP a simple matter of recompiling PHP with the --with-pspell flag, which should help if you need to convince your ISP to add this extension. Unfortunately, even though the underlying library is almost always available, very few ISPs actually have this extension enabled, so keep that in mind when writing software that depends on the functionality offered by pspell.

Getting Started with pspell

Installing or upgrading Aspell (PHP requires Aspell 0.50.0 or later) is a fairly simple process that involves downloading and installing the library itself, followed by installing the dictionaries you intend to use. The library includes only the spell checking engine; the dictionaries must be installed individually from separate packages available on the Aspell website. Additional dictionaries can be added at any point, so there is little need to install all of the available dictionaries right away. That said, the dictionary files themselves take very little space (about one to two megabytes each), and the advantage of compiling them at the outset is that you won't have to waste time if you want to use additional dictionaries later. In any case, all major distributions have binary packages for both the library and commonly used dictionaries, so the install or upgrade process is fairly painless.

Once the library is installed, you simply need to add --with-pspell to your PHP configuration. If the library was not installed in a standard location, such as /usr or /usr/local, you will need to specify the path to the directory where it resides, for example --with-pspell=/path/to/lib. You also have the option of building pspell as a shared extension (via --with-pspell=shared,/usr) that can be enabled only for particular hosts. This is quite useful if you need to enable the functionality for a specific account or limit the ability to use pspell to higher-tier accounts. Keep in mind that spell checking is a relatively slow process, and spell checking large quantities of text may take some time. Therefore, it is important to set execution time limits to prevent scripts from taking excessive amounts of time when forced to spell-check large documents.

Once all the necessary tools are in place, the actual spell-checking process can begin. The first step is creating a pspell resource that gives you access to the spell checker. This is done via the pspell_new() function, which accepts a number of parameters. The first and only required parameter is the two-letter language code that tells the extension which dictionary to use. Since some languages use multiple spellings for the same word (depending, for example, on the particular dialect of the language spoken in a country), you may also want to specify the country, which can be passed as a second, optional parameter. For example, for the English language there are three possible country values: British, Canadian and American.

You also have the option of specifying a jargon and a character encoding, although these values are largely unused and in most instances it is best to leave them at their defaults. The very last option (a bit mask) allows you to set your preferences regarding how hard pspell should try to find alternatives to a misspelled word. The values range from PSPELL_FAST, which returns the fewest suggestions but takes the least CPU, to PSPELL_BAD_SPELLERS, which returns the maximum possible number of suggestions but takes noticeably more CPU. The default mode is PSPELL_NORMAL, which tries to find a “happy compromise” between the quality of the suggestions returned and the processing time needed to generate them. You can also use this parameter to indicate how words that are not separated by a space (also known as run-togethers) should be handled. By default, they are considered typos, but in some instances you may want to allow them.

The spell-checking options can also be set via the pspell_config_runtogether() function, which changes the run-together behaviour, and pspell_config_mode(), which changes the spelling mode. The ability to change the mode is very handy, as it allows you to use faster defaults and then, if these fail to generate the necessary data, switch to a more thorough mechanism for a particular word.

Now that we have created a spell-checking resource, it can be used to validate text. The actual text validation is done through two functions: pspell_check(), which determines whether the specified word is correctly spelled and returns false if it isn't, and pspell_suggest(), which you can then use to generate an array of possible alternatives.

$psl = pspell_new("en");
if (!pspell_check($psl, "speler")) {
    $suggestions = pspell_suggest($psl, "speler");
    foreach ($suggestions as $word) {
        echo $word . "<br />\n";
    }
}

Both pspell_check() and pspell_suggest() can only work with one word at a time; to spell-check an entire document, you will need to first use PHP to break the text down into individual words that can be fed to pspell. If you are dealing with plain text, this is very simple to do, especially if you have PHP 4.3.0 or later, where the str_word_count() function is available. This function has three modes of operation: the default mode simply counts the number of words inside a string and returns an integer. The second mode, the one we want here, returns an array of words that can be spell checked.

$wl = str_word_count("will return an aray of words", 1);
foreach ($wl as $key => $word) {
    if (!pspell_check($psl, $word)) {
        $sug = pspell_suggest($psl, $word);
        // replace word with 1st suggestion, if there is one
        if (!empty($sug)) {
            $wl[$key] = $sug[0];
        }
    }
}
// print corrected text (will return an array of words)
echo implode(' ', $wl);

If you are using an older version of PHP that does not have str_word_count(), you can emulate the second mode of operation by using preg_match_all(), which is noticeably slower but will still get the job done.

if (!function_exists("str_word_count")) {
    function str_word_count($text) {
        preg_match_all('!(\w+)!', $text, $m);
        return $m[0];
    }
}


The problem with this code is that the word array you receive from your call to the “simulated” str_word_count() function contains only the words; all of the punctuation and non-alphabetic characters are discarded. Thus, if you simply call implode() as I did in the previous example, all of those characters will be lost and only the words will be retained, which is clearly a bad idea. We need a way to replace the misspellings without losing the formatting, so that our modifications only affect the misspelled words.

This is where the third, and arguably the most useful, mode of the str_word_count() function comes into play. When this mode is used, the resulting array uses the offset of each word inside the string as the key for that element. This allows you to easily find the position of the word inside the text and replace it via substr_replace() quickly and efficiently, without the risk of text corruption. While this can be emulated with preg_match_all(), it would require the PREG_OFFSET_CAPTURE flag, which is only available in PHP 4.3.0 and higher. Since PHP 4.3.0 already has a native str_word_count() function, there is no need to emulate it using a slower alternative. If you are using an older version of PHP, you will need to come up with your own string parser, or better yet, upgrade your installation!

// adjust offset since word has changed

$off += strlen($r) - strlen($w);

}

}

return $s;

}
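Putting the pieces together, the replacement loop can be sketched as a complete function. This is a sketch, not the function whose ending appears above: `correct_text()` is a hypothetical name, and it takes two callbacks in place of direct pspell calls so that it can be exercised without the extension. The running `$off` value compensates for replacements that are longer or shorter than the original word, because the offsets returned by `str_word_count($s, 2)` refer to the unmodified string.

```php
<?php
// Replace misspelled words in place, preserving punctuation and spacing.
// $is_correct and $suggest stand in for pspell_check() and pspell_suggest().
function correct_text(string $s, callable $is_correct, callable $suggest): string
{
    $off = 0; // net length change caused by earlier replacements
    foreach (str_word_count($s, 2) as $pos => $w) {
        if ($is_correct($w)) {
            continue;
        }
        $sug = $suggest($w);
        if (!$sug) {
            continue; // no suggestion: leave the word alone
        }
        $r = $sug[0]; // take the first suggestion
        $s = substr_replace($s, $r, $pos + $off, strlen($w));
        // adjust offset since word has changed
        $off += strlen($r) - strlen($w);
    }
    return $s;
}

// Example with a tiny hard-coded table of known typos
$fixes = ['aray' => 'array'];
echo correct_text(
    'will return an aray, of words!',
    fn($w) => !isset($fixes[$w]),
    fn($w) => isset($fixes[$w]) ? [$fixes[$w]] : []
), "\n"; // prints: will return an array, of words!
```

With a pspell link in `$psl`, the callbacks would simply wrap the extension, for example `fn($w) => pspell_check($psl, $w)` and `fn($w) => pspell_suggest($psl, $w)` (arrow functions need PHP 7.4 or later).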

Replacing a misspelled word with the first suggestion offered by the spell checker is not always the best approach, although most of the time it works reasonably well. Generally speaking, it is better to replace the word with a select box that allows the user to choose the correct spelling, or to convert the word into a link that raises a layer with possible suggestions through JavaScript. The function itself would remain much the same, except that the code that deals with replacement would loop through all the possible results and create an appropriate list of alternatives.
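A minimal sketch of the select-box variant follows. `suggestion_select()` and its field naming are hypothetical; in practice the suggestions array would come from `pspell_suggest()`, and the field name might encode the word's offset so that the form handler knows where to apply the user's choice.

```php
<?php
// Render suggestions as a <select>; the original spelling stays the default.
function suggestion_select(string $name, string $word, array $suggestions): string
{
    $h = htmlspecialchars($word);
    $html = '<select name="' . htmlspecialchars($name) . '">'
          . "<option value=\"$h\" selected>$h</option>";
    foreach ($suggestions as $s) {
        $v = htmlspecialchars($s); // escape each suggestion for HTML output
        $html .= "<option value=\"$v\">$v</option>";
    }
    return $html . '</select>';
}

echo suggestion_select('fix[12]', 'speler', ['speller', 'spieler']), "\n";
```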

Advanced pspell

Now that the basic spell checking functionality is working, let's take a look at some of the more advanced capabilities that the pspell extension offers.

When working with text, you will undoubtedly encounter words that are correctly spelled but that the spell checker does not recognize. This happens frequently with industry-specific terminology, names or slang. In those instances, you would probably want to make the spell checker ignore the word and not try to suggest alternatives for it. For example, when working with HTML tags inside the text or formatting tags such as FUDcode or BBcode, you could add the tags to the dictionary so that the spell checker simply skips over them, saving you from writing special handlers for those tags and overcomplicating your code. For this purpose, you can use the pspell_add_to_session() function to add a word to the current session, which effectively makes the spell checker ignore it.

// Before: will print <BLOCK QUOTE>stuff</BLOCK QUOTE>
echo spell_check_str($psl, '<BLOCKQUOTE>stuf</BLOCKQUOTE>');
// After: will print <BLOCKQUOTE>stuff</BLOCKQUOTE>

a joint ignore list. To create such a list, the pspell resource creation process needs to be changed to allow the use of personal dictionary files. Instead of the pspell_new() function, pspell_config_create() is used to create a new pspell configuration resource. This function takes all of the same arguments as pspell_new(), except the mode parameter. The options regarding the mode and the handling of run-togethers need to be set separately via pspell_config_mode() and pspell_config_runtogether(). The pspell_config_personal() function is then used to specify the path to the custom word list file, which contains a list of words to ignore. If you intend to add to this file, be sure that it is writable by the user PHP runs as. Once these steps are completed, a pspell resource can be created from the configuration resource via the pspell_new_config() function.

// create new config based on english


New words can then be added to the ignore list via the pspell_add_to_personal() function, which is identical to pspell_add_to_session() as far as its parameters are concerned. Once all of the necessary words have been added, they can be appended to the ignore file via the pspell_save_wordlist() function.

// add word to personal dictionary file
pspell_add_to_personal($psl, "Ilia");
// save wordlist (appends to existing list)
pspell_save_wordlist($psl);

Unfortunately, the Aspell library does not provide an API for removing or modifying existing entries in the personal word list file. To make these changes, you will need to write your own function. Fortunately, this is very easy to do, since the file consists of a basic header that specifies how many entries can be found inside it, followed by the entries themselves, one per line. If you find yourself editing your custom dictionaries very often, you can create a function for this purpose; otherwise, your favorite text editor will suffice.
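Such a function can be quite short. The sketch below is hypothetical: it assumes, following the description above, a one-line header whose last whitespace-separated field is the entry count, followed by one word per line, and it rewrites that count after removing the word. Check the header of your own file before relying on that assumption.

```php
<?php
// Hypothetical helper: remove one word from a personal word list file.
function remove_personal_word(string $file, string $word): bool
{
    $lines = file($file, FILE_IGNORE_NEW_LINES);
    if ($lines === false || $lines === []) {
        return false;
    }
    $header = array_shift($lines);
    $words = array_values(array_filter($lines, fn($w) => $w !== ''));
    $kept = array_values(array_filter($words, fn($w) => $w !== $word));
    if (count($kept) === count($words)) {
        return false; // word not present; leave the file untouched
    }
    // Update the entry count, assumed to be the header's last field
    $fields = preg_split('/\s+/', trim($header));
    $fields[count($fields) - 1] = (string) count($kept);

    $out = implode(' ', $fields) . "\n";
    foreach ($kept as $w) {
        $out .= $w . "\n";
    }
    return file_put_contents($file, $out) !== false;
}
```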

In some instances, you not only want to add words to the ignore list, but also want to add them to the dictionary itself as possible alternatives that can be used in future runs as replacements for typos in words that are not found in the stock dictionary file.

This, too, can be done through the pspell extension, which allows for the creation and use of personal dictionary files in addition to the base file provided for a particular language. If the main dictionary fails to find a match, pspell will try using the custom file to determine a possible alternative for the misspelled word. As with the word list file, you first need to specify the path to the file where possible alternatives can be found. This is done via the pspell_config_repl() function, which takes a pspell configuration resource as the first parameter and the path to the replacement file as the second. pspell_store_replacement() can then be used to add replacement suggestions, and calling pspell_save_wordlist() will now save both the ignore list and the replacement list.

$psc = pspell_config_create("en");
// specify personal replacement file
pspell_config_repl($psc, "./my.rep");
// create the dictionary resource from the configuration
$psl = pspell_new_config($psc);
// add replacement
pspell_store_replacement($psl, "Iaaaliaa", "Ilia");
// save replacement
pspell_save_wordlist($psl);

If you do not want to save a replacement, you can use the pspell_store_replacement() function without specifying the path to the file and without saving the word list. The replacement mechanism itself is intelligent enough that, if the specified string is close enough to the source, it will use the replacement rather than the base dictionary, which may have further matches. Using the above code as an example, if I were to spell check "Iaaliaa", it would prioritize "Ilia", which was my replacement for "Iaaaliaa", over the dictionary's suggestion of "Alia". This, of course, means that, when adding replacement pairs, you don't actually need to add an entry for every possible misspelling of the word being added. Moreover, the library is intelligent enough to check its main database to see if the replacement is already available and, if it is, it will not add the word to the personal replacement file.

As with the word list file, there is no native function to modify or remove entries from the replacement file. Fortunately, the format of this file is even simpler than the one used by the word list because, while it does have a one-line header, the header is not actually used. Other than the header, the entries are stored in the "bad_word replacement" format and can be easily modified with the following function:

function md_repl($file, $src, $dst, $n_src = '', $n_dst = '') {
    // remove the existing "src dst" entry
    $data = str_replace("\n{$src} {$dst}\n", "\n", file_get_contents($file));
    // add new replacement, if both halves were given
    if ($n_src && $n_dst) {
        $data .= "{$n_src} {$n_dst}\n";
    }
    // update word list file
    $fp = fopen($file, "w");
    fwrite($fp, $data);
    fclose($fp);
}

Beyond pspell

Aside from pspell, another spell checking extension called Enchant has recently been made available. This extension can be found inside the PECL repository and can be installed by running pear install enchant. It is based on the Enchant library, which provides a common API to multiple spell checking engines, such as Aspell, Ispell, MySpell, and so on.

Having a single native API means that you can seamlessly use multiple engines without having to write your own wrappers around different interfaces. The Enchant library works directly with each spell checking library, so there is virtually no speed difference between using the native interface offered by pspell and the wrapper offered by Enchant.

The main advantage of Enchant is that it gives you the ability to use different spell checkers that may support other languages or have specific benefits, such as a lower memory footprint (Ispell) and better dictionaries. It also guarantees that you will have access to spell checking support on virtually any system, since at least one spell checking library is always included, although you will need to install Enchant itself.

The Enchant extension API is fairly similar to that of the pspell extension and, for the most part, offers the same capabilities, with a few notable differences. Since the Enchant extension can work with many different spell checking engines, the creation of the spell checking resource is designed to accommodate the selection of the engine to be used. The first step is to initialize the Enchant broker, which is done through a call to the enchant_broker_init() function. You can then use the resulting resource to determine what spell checking engines are supported by calling the enchant_broker_describe() function, which will return an array of information arrays about the supported engines.


The next step is determining the availability of a dictionary for the language you want to spell check. Unlike pspell, which uses two parameters to select language and locale, in Enchant both settings are handled by a single parameter, which looks something like language_LOCALE (for example, en_CA for Canadian English). This parameter is then passed to the enchant_broker_dict_exists() function, which returns True if a dictionary is available and False otherwise.

If you have more than one spell checking engine available (which is almost always the case), the Enchant library will automatically choose what it thinks is the best engine for the task, based on the availability of a dictionary and its quality. If the default choice is not to your liking, you can modify the order in which the engines are picked by using the enchant_broker_set_ordering() function, which takes the broker resource, the language string, and a comma-delimited string listing the engines in the order you want them to be used.

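Once a dictionary has been requested with enchant_broker_request_dict(), you can find out which engine was actually chosen by calling the enchant_dict_describe() function, which will return an array with information about the selected engine. Words can then be validated with enchant_dict_check() and, for words that fail, you can retrieve an array of suggestions by calling the enchant_dict_suggest() function.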

To simplify the process, Enchant also offers the enchant_dict_quick_check() function, which can check a word and return a list of possible alternatives if it is not spelled correctly, all in one go. This makes the spell checking code slightly faster and reduces the amount of PHP code you need to write (which is never a bad thing).

if (!enchant_dict_quick_check($d, "spel", $suggestions)) {
    print_r($suggestions);
}


When the function returns False, indicating that the specified word has been misspelled, and a variable is provided as the optional third argument (passed by reference), the function will populate that variable with possible spelling alternatives.

Once you are done working with the spell checker, you should free the dictionary and the broker resources by calling the enchant_broker_free_dict() and enchant_broker_free() functions, respectively. While PHP will free those resources automatically on script termination, it is generally better to do so manually, so that memory and dictionary file handles are released as soon as possible.
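Putting the broker, dictionary, check and cleanup steps together gives something like the sketch below. The function name enchant_misspellings() and any ordering string you pass are illustrative assumptions; the Enchant calls are the ones described above. One addition not covered in the text: enchant_broker_free_dict() and enchant_broker_free() are deprecated as of PHP 8.0, where the objects are released once the last reference is gone, so the sketch only calls them on older versions.

```php
<?php
// Sketch: spell check a list of words with Enchant. Returns
// word => suggestions for each misspelled word, or null when the
// extension or a dictionary for $tag (e.g. "en_CA") is unavailable.
function enchant_misspellings(string $tag, array $words, string $ordering = ''): ?array
{
    if (!function_exists('enchant_broker_init')) {
        return null; // Enchant extension not installed
    }
    $broker = enchant_broker_init();
    if (!$broker || !enchant_broker_dict_exists($broker, $tag)) {
        return null; // no dictionary for this language_LOCALE tag
    }
    if ($ordering !== '') {
        // comma-delimited engine list, tried in the order given
        enchant_broker_set_ordering($broker, $tag, $ordering);
    }
    $dict = enchant_broker_request_dict($broker, $tag);
    if (!$dict) {
        return null;
    }
    $result = array();
    foreach ($words as $word) {
        $suggestions = array();
        // quick_check returns false and fills $suggestions for misspellings
        if (!enchant_dict_quick_check($dict, $word, $suggestions)) {
            $result[$word] = $suggestions;
        }
    }
    // Free the dictionary first, then the broker (pre-PHP 8 only).
    if (PHP_VERSION_ID < 80000) {
        enchant_broker_free_dict($dict);
        enchant_broker_free($broker);
    }
    unset($dict, $broker);
    return $result;
}
```

The exact suggestions returned depend on which engine Enchant ends up using for the requested tag.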

The Enchant extension also supports ignore lists, which can be used to allow certain words to be skipped by the spell checker. As with pspell, you have the ability to use both session- and file-based ignore lists; the latter can be shared by multiple processes. The session-based ignore lists are handled by two functions: enchant_dict_is_in_session(), which checks if a particular word is already being ignored, and enchant_dict_add_to_session(), which adds a word to a session's ignore list. Since the add-to-session function does not return a status indicator, you should use enchant_dict_is_in_session() to verify whether the word was, in fact, added successfully. Keep in mind that not all spell checking engines may support this functionality, so this may not always be possible.

enchant_dict_add_to_session($d, "Ilia");
if (!enchant_dict_is_in_session($d, "Ilia")) {
    exit("Cannot add to session ignore list.\n");
}

To use a more permanent file-based ignore list, you first need to establish the path to your ignore file by calling the enchant_broker_request_pwl_dict() function. The file must already exist, but can be empty; if it does not exist or is not accessible, the function will fail. To add entries to the file, you can use the enchant_dict_add_to_personal() function. Like the session function, this function does not return a success indicator, and enchant_dict_is_in_session() should be used to confirm that the word has actually been added.

Because the file name you provide is a generic holder for the personal word list, the Enchant library will automatically create a spell checking engine-specific file as well. For example, if the Aspell backend is being used, Enchant will also create my.pws inside the same directory as my.dict. This is a very important tidbit of information to keep in mind when adding new words to the list, since you will need to ensure that not only my.dict but also the directory it is in are writable.
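Because a missing file or an unwritable directory only shows up later as a silent failure, it can help to check the path up front. A minimal sketch, assuming you want an empty list created when none exists yet; the function name pwl_path_ready() is hypothetical and only uses standard filesystem functions.

```php
<?php
// Hypothetical pre-flight check before passing $file to
// enchant_broker_request_pwl_dict(). The file must exist (it may be
// empty), and both it and its directory must be writable, because the
// backend writes a companion file (such as my.pws) next to it.
function pwl_path_ready(string $file): bool
{
    $dir = dirname($file);
    if (!is_dir($dir) || !is_writable($dir)) {
        return false;
    }
    // create an empty list if none exists yet
    if (!file_exists($file) && !touch($file)) {
        return false;
    }
    return is_file($file) && is_writable($file);
}
```

Only when pwl_path_ready() returns true would you go on to call enchant_broker_request_pwl_dict($broker, $file).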

Adding replacement alternatives is also possible; however, unlike with pspell, these are always session-specific, and there is no way to save them for later reuse. Replacements are added through the enchant_dict_store_replacement() function, which takes the dictionary resource, a source string, and a possible replacement that can be used for substitution:

enchant_dict_store_replacement($d, "AAliaaa", "Ilia");
enchant_dict_quick_check($d, "AAliaaa", $sug);
echo $sug[0] . "\n"; // will print Ilia

Mistakes Hapen Without A SpelCheker

As I hope you have had an opportunity to discover, spell checking text strings from your PHP scripts is not at all difficult. For the most part, the biggest difficulty is not in checking the text, but in actually breaking it down into individual words that can then be validated. Fortunately, ever since the introduction of the str_word_count() function, this has become a fairly trivial process.

The functionality offered by a spell checker has many uses, even in situations where users do not input long text strings. For example, in a search engine a spell checker can be used to validate keywords. You can also use it inside a PHP 404 handler (which I discussed in the March 2004 issue of php|architect) to check for typos and automatically correct them, taking the user to the right page without any manual intervention or extra steps. Ultimately, a spell checker is a powerful tool that can be successfully applied to many problems with little effort, but that can make a big impact on the quality of your applications.
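The word-splitting step can be sketched as follows, assuming you wrap whichever checker you use (pspell or Enchant) in a callback; find_misspelled() is a hypothetical name. Note that str_word_count() treats digits as separators and keeps apostrophes and hyphens inside words.

```php
<?php
// Sketch: break text into words with str_word_count() and collect the
// distinct words a checker rejects. $is_correct is any callable that
// takes a word and returns bool, such as a closure around pspell_check().
function find_misspelled(string $text, callable $is_correct): array
{
    $bad = array();
    // format 1 returns the words themselves, in order of appearance
    foreach (str_word_count($text, 1) as $word) {
        if (!$is_correct($word)) {
            $bad[] = $word;
        }
    }
    return array_values(array_unique($bad));
}
```

With pspell, the callback could be function ($w) use ($psl) { return pspell_check($psl, $w); }, where $psl comes from pspell_new("en").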


To Discuss this article:

http://forums.phparch.com/149

Ilia Alshanetsky is an active member of the PHP development team and is the current release manager of PHP 4.3.X. Ilia is also the principal developer of FUDforum (http://fudforum.org), an open source bulletin board, and a contributor to several other projects.


Sometimes, you need a paper trail: an honest-to-god, dead-tree paper trail. Maybe you're dealing with data that needs to be formatted and compared by human eye to previous decades of paper records, or perhaps there are government regulations requiring paper, or maybe it's even just a pointy-haired boss who wants to do things the way they've always been done.

Fortunately, you can actually merge the digital and the dead-tree using FDF, an Adobe technology layered on top of their PDF format, which is used pretty much by everybody everywhere for cross-platform printable documents.

You can think of an FDF file as a marriage between HTML forms and PDF documents. In practice, you can have the same interface on your website as you have in offline paper-based forms, and all your web-based forms can be printed on demand, complete with all their fields filled in. You can use FDF forms in place of your old HTML forms, and suddenly stop worrying about browser compatibility and CSS layout snafus, and get decent printing to boot.

What does an FDF file look like?

You can create a simple FDF quite easily. Using Adobe Acrobat, you can scan in an existing paper form or, even better, import a Word document or one of several other digital formats, including existing HTML forms.

There are also a variety of other tools to create an FDF, with varying features (and mis-features). A nice comparison round-up can be found at http://www.pcmag.com/article2/0,4149,1195058,00.asp

Alas, this article doesn't specifically address creating FDF as a superset of creating PDF files, so, in the end, you may be stuck with only the one option of Adobe Acrobat for now. Regardless of which tool you use to create an FDF, the basic internal structure will be the same.

Most likely, the tool available will turn the majority of your existing documents into a static image and, in the case of HTML forms, turn all your INPUT tags (and SELECT and TEXTAREA) into corresponding FDF elements.

If you're dealing with scanned-in forms, you may need to export images from your PDF and 'clean them up' to remove some of the static parts that you'll be replacing with FDF form elements, or you may find it easier, in some cases, to simply redo the import after getting rid of the offending elements in the source documents. Since images tend to occupy a lot of memory, you may also need to experiment with different tools to find the one that lets you 'slim down' your PDF to a size that you consider appropriate for your needs.

For the purposes of this article, I have created the simple HTML form that you can see in Figure 1. I then used Adobe Acrobat to "Create PDF -> From Web Page" and pasted in my URL. Then I had to rename all the FORM input elements (see the "Challenges" sidebars) and saved the resulting PDF.

Internally, an FDF document keeps track of your field names and values, as well as some basic presentation information such as foreground/background color, font, data type (optional) and any client-side validation you wish to include. In addition, the FDF document will contain PDF information which will define the static look of the document. Field labels, form instructions, logos, and any other artwork will typically all be "hard-wired" into the FDF. Once processed, the PDF will look like what you see in Figure 2 if you open it with a text editor, or Figure 3 if you open it in Acrobat.

If you scan in a paper form or import a Word file (or a similar document), you may need to use your FDF tool to draw some controls on the FDF that will actually be used to input and output your data. You'll need to name the boxes suitably, and it would be nice to give them Tooltips and nice labels. This is accomplished in different ways depending on the FDF tool you're using, but if you dig around in the menus in your FDF editor, looking for something that says (or looks like) a radio button, an input box, and so on, you should be able to figure out the GUI without much in the way of problems. The GUI should also make it fairly obvious how to make an input box read-only or hidden, along with other similar amenities, although I must say that, in my opinion, the 'forms' menu in Acrobat is rather deeply buried, which makes things a bit difficult. Fortunately, you can completely rearrange the Adobe tool palettes to suit your own needs, which I did by getting rid of all the commenting tools and adding the 'forms' tools instead. Ultimately, you need to create suitable FDF elements for your application.

The end user can then "fill in" the fields of the form with Adobe's free Reader, which will enforce the validation, and then submit the data to your online application. There are several choices for submission, but one of them is "HTML", which means the FDF will POST data exactly the same way an HTML FORM would.

Where does PHP come in?

Since you probably already know how to process POST data using PHP, that makes half of this process trivial: you can insert or update the data coming from your FDF forms exactly the same way you would with an HTML form if you select "HTML" on the Submit buttons in your FDF.

Stepping back in the process, you can also use PHP to pre-process the FDF documents before they are presented to the user to be filled in. You can pre-fill any known fields, saving the user from re-typing their name or other existing info. You can also show (or hide) various form elements as needed, so long as there is no corresponding visual element hard-wired in the PDF portion that remains to visually confuse the user. You may find yourself, as I did, converting what was formerly static text into an uneditable FDF form element just so that you can hide it when needed.

Pre-filling your form and manipulating the FDF will require the FDF library and the PHP FDF module to be installed, and you will need to use a handful of PHP's FDF functions, which is what we'll focus on in this article. For more information on diagnosing your system and installing these software packages, see the "Installation" sidebar.

An FDF application

Your first step in PHP to create a dynamic FDF file consists of creating an FDF resource, just as you would create an image or database connection. You first want to create a 'blank' FDF using fdf_create(), as shown in Listing 1. Next, you would want to pre-set any known fields in your FDF document. These could come from a database or, as in the example shown in Listing 2, the current date.

The arguments to fdf_set_value() are mostly obvious: the FDF resource we have created, the name of the field, and the value to be used are about as simple as it gets. The final argument, in this case zero, is actually there only for backwards compatibility. Older versions of the FDF Toolkit distinguished between internalized FDF values (e.g.: FDF's TRUE/FALSE) and external values (such as our date), and you would pass in one or zero based on what kind of value you were using. In today's version, the final argument is ignored, even though it is still required. Expect it to become optional in future PHP releases.

We have also used fdf_set_flags() to make our today field read-only, which will help cut down the number of submission errors. The Adobe FDF documentation for this function is quite obscure, but I think I have it sorted out. The first argument, of course, is our FDF resource, and the second is the name of the FDF field, which is straightforward enough.

The third argument is the "Key" for what we are going to be doing to the bits we are going to be changing. There are only six possible values for this argument:

FDFSetFf, FDFClearFf, FDFFf, FDFSetF, FDFClrF, FDFFlags

The first three "Key" values, FDFSetFf, FDFClearFf, and FDFFf, can be used to set, clear, or completely replace the field settings for the following attributes:

• ReadOnly (0x00000001)
• Required (0x00000002)
• Password (0x00002000) - text fields only
• FileSelect (0x00100000) - text fields only
• DoNotSpellCheck (0x00400000) - text fields only
• MultipleSelection (0x00200000) - listbox only

NOTE: I should also point out that there actually was a bug in PHP's FDF module in version 4.3.3, so I had to roll back a PHP upgrade and couldn't use some nifty functions like fdf_enum_values() and fdf_save_string(). On the other hand, this was fixed in CVS just a few hours after the initial report; open source software is just great!

// Pre-set the 'today' field to today's date:
fdf_set_value($outfdf, 'today', date('Y-m-d'), 0);
// Force the "today" field to be read-only (ReadOnly is a field flag, hence FDFSetFf):
fdf_set_flags($outfdf, 'today', FDFSetFf, 1);

So, for example, fdf_set_flags($outfdf, "record_id", FDFSetFf, 1 | 2) would make the record_id field read-only and required. Other settings (e.g.: DoNotScroll) would keep whatever value they had before.

Changing FDFSetFf to FDFClearFf would clear the given attributes, making the field writable (not read-only) and allowing the field to be blank (not required):

// read/write, blank OK, preserve others
fdf_set_flags($outfdf, 'record_id', FDFClearFf, 1 | 2);

Using FDFFf instead would cause a wholesale replacement of the attributes: FDFFf would set the field read-only and required, and turn off any of the other settings that had previously been set to "on":

// read-only, required, reset all others to 0:
fdf_set_flags($outfdf, 'record_id', FDFFf, 1 | 2);
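The three field-flag keys differ only in how the mask combines with the field's existing flags. The sketch below models that arithmetic in plain PHP. The function apply_ff(), its string keys and the FF_* constant names are hypothetical stand-ins, since the real work happens inside the FDF library. The DoNotScroll value, 0x00800000, comes from the PDF field-flag table rather than the list above. It also shows why masks are combined with bitwise OR: 1 | 2 is 3, while 1 & 2 is 0 and would change nothing.

```php
<?php
// Field-flag bits (PDF "Ff" values); names are illustrative.
const FF_READONLY = 0x00000001;
const FF_REQUIRED = 0x00000002;
const FF_DONOTSCROLL = 0x00800000;

// Hypothetical model of what FDFSetFf, FDFClearFf and FDFFf do to a
// field's existing flag word.
function apply_ff(int $current, string $key, int $mask): int
{
    switch ($key) {
        case 'FDFSetFf':
            return $current | $mask;   // turn the given bits on
        case 'FDFClearFf':
            return $current & ~$mask;  // turn the given bits off
        case 'FDFFf':
            return $mask;              // replace every bit wholesale
    }
    throw new InvalidArgumentException("unknown key $key");
}
```

Starting from a field with only DoNotScroll set, FDFSetFf with FF_READONLY | FF_REQUIRED yields 0x00800003, FDFClearFf with the same mask takes it back to 0x00800000, and FDFFf yields plain 3.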

The last three "Key" values, FDFSetF, FDFClrF, and FDFFlags, can be used to set, clear, or completely replace the following attributes:

• Hidden flag (0x02)
• Print flag (0x04)
• NoView (print only) flag (0x20)

Thus, fdf_set_flags($outfdf, "record_id", FDFSetF, 2) will hide a field, while fdf_set_flags($outfdf, "record_id", FDFClrF, 2) will show a field.

This can be handy to show/hide buttons after a form submission, as we do in our example application after processing the $_POST data input in Listing 3.

We also use the fdf_set_status() function to set the FDF's status value. This is akin to a JavaScript alert() function: whatever message you put into an FDF's status will be displayed in a popup dialog when the FDF is opened. This is handy to set success or error messages, as we do here to display the user's score in our little quiz.

You may be wondering what happened to the PDF you created. For reasons I don't really understand, you actually integrate that after you've set all the field values, flags, options, and so on. This is done with the fdf_set_file() function, using something like:

// the PDF is fetched by the client, so give it as an absolute URL
fdf_set_file($outfdf, "http://example.com/full/path/to/your/PDF") or error_log("ERROR: Unable to set FDF file.");

To wrap this up, you then need to send the FDF to the browser. In more recent versions of PHP (4.3 and higher), fdf_save_string($outfdf) returns the FDF as a string that you can echo straight to your script's output (which, if you're running in a web environment, will reach the client browser). However, in older versions of PHP you need to save it to a temporary file first, as shown in Listing 4.

Sending the FDF to the browser with the correct headers is pretty straightforward, as you can see in Listing 5. This might look a bit weird if you've only used PHP to present HTML documents, but PHP is actually quite adept at spewing out a large variety of web-related documents, such as JPEG/PNG/GIF, PDF, Flash, and, in this case, FDF. After all, the user doesn't need the document to be boring and static: if the right solution calls for a dynamic FDF, then PHP can do that.

Listing 3
// Setting an FDF "button" to submit the fields as "HTML"
// lets us process FDF submission exactly as we would an
// HTML FORM using $_POST:
if (!empty($_POST)) {
    $corrects = array('fdf_is' => 'Adobe', 'bug_count' => 3, 'case_sensitive' => 'on');
    $high = count($corrects);
    $score = 0;
    foreach ($_POST as $key => $value) {
        if (isset($corrects[$key]) && ($corrects[$key] == $value)) {
            $score++;
        }
        error_log("setting $key to $value");
        fdf_set_value($outfdf, $key, $value, 1);
    }
    $percent = sprintf("%02.2f", 100 * $score / $high);
    fdf_set_status($outfdf, "You scored $score/$high for $percent%");
    // Hide the 'Grade Me' button, since they already took the Quiz:
    fdf_set_flags($outfdf, "save", FDFSetF, 2);
}

Listing 4
// Dump our FDF to a random temp file, since we don't
// have PHP >= 4.3.0, which would allow us to send it
// directly to the browser
$temp = tempnam("/tmp", "_FDF_");
fdf_save($outfdf, $temp);
fdf_close($outfdf);

If you're using Netscape as your browser, that pretty much sums it up. In fact, you probably should go ahead and try to build a sample FDF application in PHP now by putting all that source code into a file named example_fdf.php on your server (you can, of course, also use the code that comes together with the magazine). You'll also need to download or create the corresponding PDF, and store that on your server. Finally, you'll need to change /path/to/your/PDF in the source to match the reality of your web server. You should be able to get this working with Netscape; just don't try to use Internet Explorer on it yet.

Actually, there is one final note to the main portion of this article. You may have noticed that, in most of the listings, the source code uses PHP_SELF and other references to avoid using example_fdf or example.pdf. This is because everything except for Listings 2 and 3 can be pulled out and put into separate files that you can include and use over and over. I called them fdf_input.inc and fdf_output.inc myself, but you can arrange your own include files any way you like.

Listings 2 and 3 deal with the business logic of the actual Sample Quiz, while all the rest of the code handles the grungy details of FDF creation and working around Microsoft bugs. In the FDF projects on which I work, we have dozens, soon to be thousands, of forms using the same FDF code, and our core business logic boils down to using $_POST data to alter our database, which you already know how to do, and using fdf_set_value(), fdf_set_flags(), and fdf_set_status() to pre-fill our FDF, hide/show buttons, and pop up messages to the user, respectively.

And Now, For the Fun Part: Internet Explorer

Microsoft Internet Explorer is badly broken, at least in some areas. So are all the other browsers, as you know, but in this case, IE is very badly broken indeed. Fortunately, there are ways around all that brokenness.

In some cases, the brokenness is restricted to one version; in other cases, it's a bug that Microsoft has ignored for years. For simplicity, I'll simply refer to "IE" rather than delving into the joys of browser versioning. You can always test on a dozen different versions if you really care to tie down which bugs exist where. Microsoft sure hasn't bothered, though, so why should you?

Listing 6
// Microsoft Internet Explorer is broken, so we bury what
// should be _GET variables in our URL and import them
// here into $_PATH (like $_GET, only different)
$_PATH = array();
if (isset($_SERVER['PATH_INFO'])) {
    // substr strips off initial "/"
    $variables = explode("/", substr($_SERVER['PATH_INFO'], 1));
    // walk the segments in name/value pairs; an unpaired
    // trailing segment (the fake file name) is ignored
    for ($i = 0; $i + 1 < count($variables); $i += 2) {
        $_PATH[urldecode($variables[$i])] = urldecode($variables[$i + 1]);
    }
}

Listing 7
// Microsoft Internet Explorer is broken, so we force every URL to be unique by
// embedding a random string in it.
// It has to be before the "xxx.pdf" for the same reasons that forced us to
// build $_PATH above.
// There is a one in two billion [mt_getrandmax()] chance that this will
// screw up (having a duplicate PDF cached on the user's machine).
// strip off the name of the PDF:
$self_last_slash = strrpos($self, '/');
$self_front = substr($self, 0, $self_last_slash);
$self_back = substr($self, $self_last_slash);
// strip off the previous random insertion:
$self_front_last_slash = strrpos($self_front, '/');
$self_front = substr($self_front, 0, $self_front_last_slash);
// generate random string
mt_srand((double) microtime() * 1000000);
$microsoft_sucks = mt_rand(0, mt_getrandmax());
$microsoft_sucks = "iebroken$microsoft_sucks";
// set the action of our "save" button dynamically
$action = "http://{$_SERVER['HTTP_HOST']}$self_front/$microsoft_sucks$self_back";
error_log("action: $action");
fdf_set_submit_form_action($outfdf, "save", FDFUp, $action, bindec("00111"));

Listing 5
// Send the correct headers to the browser.
// Of course, IE ignores them, but other browsers care.
header("Content-type: application/vnd.fdf");
header("Content-length: " . filesize($temp));
// Finally, spew out our FDF.
// If your files are HUGE you might want to use
// fopen/fread/echo in a loop to spew out only N bytes at a
// time. Last time I checked, readfile would pull the whole file
// into RAM. Use the source, Luke.
readfile($temp);
// remove the temporary file created in Listing 4
unlink($temp);

The first problem with IE is that it ignores the RFC-specified Content-Type header and attempts to infer the document type from the URL and the binary contents of the file. Thus, your URL, which probably looks like http://example.com/example_fdf.php, is presumed by IE to be a .php file. This results in the browser complaining that it doesn't know how to deal with a file of that type, even though you have correctly specified application/vnd.fdf as the document type, while IE is quite happy to display FDF documents whose URL ends in ".pdf".

The simplest way to fix this is to rename your file from example_fdf.php to just example_fdf. This way, IE won't know that it's a PHP file, and will do the right thing. Naturally, without the ".php" on the end, Apache doesn't know that it's a PHP file either, but that's easy to fix by creating (or adding to) a .htaccess file in the same directory:

<Files ~ "_fdf$">
    ForceType application/x-httpd-php
</Files>

You can also completely cozen even the worst versions of Internet Explorer by tacking on a completely 'bogus' filename that ends in ".pdf", so your URL looks like:

http://example.com/example_fdf/iebroken.pdf

IE thinks it's getting a document named iebroken.pdf, even though it's really the PHP script example_fdf that is spewing out the FDF.

In fact, you can give it a more meaningful name, in case users want to right-click and "Save As". You may also want to add a Content-Disposition header and its filename component if you want to fix every browser known to man.
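A minimal sketch of such a header follows. It assumes the bogus .pdf name trick described above, and fdf_disposition() and the file name quiz.pdf are hypothetical. "inline" asks the browser to display the document rather than save it, and the filename parameter supplies the name offered under "Save As".

```php
<?php
// Hypothetical helper: build a Content-Disposition header line that
// gives the generated document a meaningful name.
function fdf_disposition(string $filename, bool $inline = true): string
{
    // strip characters that would break the quoted filename
    // or allow header injection
    $safe = str_replace(array('"', '\\', "\r", "\n"), '', $filename);
    return 'Content-Disposition: ' . ($inline ? 'inline' : 'attachment')
        . '; filename="' . $safe . '"';
}
```

In the script, it would be sent alongside the Content-type header and before any output, e.g. header(fdf_disposition('quiz.pdf'));.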

Thusly does OpenSource prevail over broken proprietary software. OpenSource 1, Microsoft 0.

You may also want to allow some data to be input via GET parameters in a URL. A URL such as http://example.com/example_fdf?record_id=1 works fine in Netscape. Microsoft Internet Explorer, however, seems to have a deeply ingrained belief that PDF files are boring and static, rather than dynamic, and will simply refuse to fire up Adobe's Reader when you do that.

One solution to this is to simply bury the parameters in your URL; for example: http://example.com/example_fdf/record_id/1/iebroken.pdf. Microsoft IE will naively assume you have successive directories named "example_fdf", "record_id", and "1", and that there is a file named iebroken.pdf when, in fact, "example_fdf" is your PHP script, and the remainder of the URL is your GET data and a totally fictitious filename.

PHP will provide you with the URL information in a server variable called $_SERVER['PATH_INFO']. You can then write a small script to parse that variable and populate a $_PATH variable, just like $_GET and $_POST, except that you're doing all the work, as shown in Listing 6. As you can see, we also use urldecode() to change any of those %XX or + signs into what they ought to be.
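Generating links in this format is the mirror image of Listing 6. The sketch below assumes the parameter values contain no slashes, and the function name build_ie_url() and its default file name are hypothetical. rawurlencode() turns "/" into %2F, which Apache rejects by default (see its AllowEncodedSlashes directive), so such values would need a different encoding.

```php
<?php
// Hypothetical mirror image of Listing 6: encode name/value pairs as
// path segments and finish with a fake ".pdf" name so that IE hands the
// response to Adobe's Reader.
function build_ie_url(string $script, array $params, string $fake_name = 'iebroken.pdf'): string
{
    $url = rtrim($script, '/');
    foreach ($params as $name => $value) {
        // urldecode() in Listing 6 reverses rawurlencode()
        $url .= '/' . rawurlencode((string) $name) . '/' . rawurlencode((string) $value);
    }
    return $url . '/' . $fake_name;
}
```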


Listing 8
// Normally, our code would call session_start() at this
// point. Microsoft Internet Explorer is broken, so we
// instead force the user to be logged in before they get
// here:
if (!isset($_COOKIE['PHPSESSID'])) {
    $host = $_SERVER['HTTP_HOST'];
    $redirect = "http://$host/login.php";
    header("Location: $redirect");
    exit;
}

Listing 9
// Shove in the blank template FDF
$self = $_SERVER['PHP_SELF'];
$basename = basename($self, ".pdf");
// Strip out the PATH_INFO and random parts:
$true_self = substr($self, 0, strpos($self, $basename . "_fdf") - 1);
$source = "http://{$_SERVER['HTTP_HOST']}$true_self/iebroken_fdf/$microsoft_sucks/$basename.pdf";
error_log("source: $source");
fdf_set_file($outfdf, $source) or error_log("ERROR: Unable to set FDF file to $source " . __FILE__ . ": " . __LINE__);


Since we’re already doing all sorts of horrible things to the URL anyway, it’s simple enough to tack on a random element so IE will have a different URL for every single edition of any given PDF document. Of course, that will make IE keep an awful lot of PDFs in its cache that nobody will ever be able to use, but there isn’t really that much that can be done about that.
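A minimal sketch of building such a URL follows. The function cache_busting_url() and its parameters are hypothetical, and random_bytes() assumes PHP 7 or later; any token that is unique per request would serve the same purpose:

```php
<?php
// Hypothetical helper: insert a throwaway random segment before the fake
// filename, so every request for a PDF uses a URL IE has never cached.
function cache_busting_url(string $host, string $dir, string $script, string $basename): string
{
    // 8 random bytes become 16 hex characters.
    $random = bin2hex(random_bytes(8));
    return "http://$host$dir/$script/$random/$basename.pdf";
}

echo cache_busting_url('example.com', '', 'example_fdf', 'iebroken'), "\n";
```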

Your FDF “Submit” buttons also have the URL embedded in their definition, so that they know where to send your POST data. So you’ll need to change the action of your PDF button elements as well, as I do in […] is a constant that represents the action of the user letting up on the mouse button. While you might think FDFDown would be better, the standard interface convention is to wait for the “up” action, so the user can “slide off” the button with the mouse down to change their mind about clicking. The $action part is just the URL, complete with all the hoops we are jumping through to bypass Microsoft IE bugs.

The bindec("00111") part translates to “Use HTML POST, please”, which is the same as having selected “HTML” and “POST” in the Acrobat dialog for how the FDF data should be submitted. I won’t even go into the details of why 00111 means HTML POST, since I don’t think it’s a good idea to use anything else anyway. If you’re feeling up to it, you can read the FDF Toolkit documentation that came with your FDF Toolkit from Adobe and puzzle out the reasoning behind that 00111. Our third Microsoft wall smashed by OpenSource. Our final score is 3 to 0, in case you’ve forgotten.
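Putting those pieces together, a hedged sketch of the call might look like this. It assumes the FDF extension’s fdf_set_submit_form_action() and its FDFUp trigger constant, a button field named "submit", and a made-up action URL; the FDF calls only run if the extension is actually loaded:

```php
<?php
// Assumed: the PDF's button field is named "submit" (hypothetical), and the
// action URL uses the same path tricks as the rest of the article.
$action = 'http://example.com/forms/example_fdf/record_id/1/'
        . bin2hex(random_bytes(8)) . '/iebroken.pdf';

// bindec() turns the binary flag string into an integer: "00111" is 7.
$flags = bindec('00111');

if (function_exists('fdf_set_submit_form_action')) {
    $fdf = fdf_create();
    // FDFUp: fire when the mouse button is released over the field.
    fdf_set_submit_form_action($fdf, 'submit', FDFUp, $action, $flags);
}
```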

I’d like to say that was the last and final hurdle in our FDF experience. But back here in the real world, we had a couple more challenges as we integrated my FDF code with my co-workers’ efforts. The first was that PHP’s session_start() function also “broke” the PDF files in Internet Explorer. This is probably the same basic problem as the no-cache headers. Fortunately, we didn’t need to have any PDF files as “entry points” (first page visited) in our application, so we simply redirected anybody without a valid session ID back to our login page using the script in Listing 8. Since this is essentially the same problem as the other headers, I won’t even count this one; we’re ahead 3 to 0 anyway, and can afford to be gracious.

In browser testing, it turned out that some Macintosh Internet Explorer versions were also caching the PDF referenced in our FDF documents. The PDF isn’t actually “embedded” in an FDF as you might expect, but is pulled in separately via HTTP. At any rate, we also had to change our call to fdf_set_file() to use the same sort of “random” URL to fool IE, as shown in Listing 9.

This relies on a very short iebroken_fdf script, which also uses the .htaccess hack above (note the use of ~ *fdf rather than a single filename) to “fool” Macintosh Internet Explorer into not screwing up the underlying PDF behind our FDF, as you can see in Listing 10. Again, it’s pretty much the same problem and solution as a previous hoop, so we won’t even bother to count this one. Still 3 to 0.

It’s pretty sad that we’ve written more code to fix IE bugs than to actually build our original application, but, given its user base, it was also inevitable.

Using Adobe Acrobat

While I’m at it, I might as well make a few pointed comments about using Adobe Acrobat for building your FDF documents. If you decide to go ahead and buy it, this will, hopefully, save you a few headaches:

• Adobe Acrobat is quite good at importing various formats and creating an FDF, but whoever at Adobe thought it was a “Good Idea” to make up a random prefix and rename all my HTML form attributes to something useless like EWRLQWIUENNIUEROPIUFLKJAEWI form1 x1 f1 has serious issues. I can understand the “form1” part, in case there are multiple FORM tags in a single page, but the rest seems to make no sense.

• If a given field is “required” or has some sort of preset validation, Acrobat won’t be too happy when you serve up a blank form, and will complain to the end user that a required field has not been filled out. Of course it hasn’t been filled in with a valid entry… it’s a blank form! The text of the error messages could also be much more pleasant and clear. I suspect there are extra steps one could take to override Adobe’s messages, but there’s no excuse for the defaults not being usable.

• If I’ve zoomed in significantly and hit the

Installation Steps

First, you’ll need to make sure you have PHP’s FDF library installed, as well as Adobe’s FDF SDK library. Both are available for free, though Adobe requires an email address to download. As I understand their license, you can only redistribute Adobe’s FDF SDK library by paying a fee, but you can install it for free on your server, or your client’s server.

The only “tricks” to installing Adobe’s FDF SDK are:

• copy the LIBFDFTK.SO file to libfdftk.so in the right place
• copy the LIBFDFTK.H file to libfdftk.h in the right place
• convince Linux to reload its dynamic libraries

One crucial part is getting a lower-case .so and a lower-case .h and, perhaps not as obviously as I would hope, keeping the main part of the filename the same: libfdf.so and thatfdfthing.h won’t work too well. This is all so Linux will know libfdftk.so is a dynamic library and load it as part of the OS, just like a DLL in Windows or a System Extension on a Mac.

The “right place” to put a .so file varies from distribution to distribution, but if you’re new to Linux you can generally figure it out by finding a whole bunch of other .so files. You may also need to run ldconfig or, if all else fails, reboot, which is total overkill, but will suffice for newbie Linux admins. Try looking in /usr/lib and /usr/include or /usr/local/lib and /usr/local/include. You may also want to play with ldconfig -v and ldconfig -v | grep fdf to convince yourself that you correctly installed Adobe’s FDF SDK library for the OS.

You then may need to reconfigure PHP (if you installed from source) or download and compile parts of PHP if you have an RPM or equivalent, which probably doesn’t include the FDF library. You can use <?php phpinfo(); ?> to quickly check whether you have the FDF module. Either you see “FDF” mentioned in a nice little grid section like the other stuff you have installed (e.g., MySQL), or you don’t have FDF.
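If you’d rather not scan the phpinfo() output by eye, extension_loaded() answers the same question in one line. This sketch assumes the extension registers itself under the name "fdf", as the module file name fdf.so suggests:

```php
<?php
// Ask PHP directly instead of searching the phpinfo() grid for "FDF".
$hasFdf = extension_loaded('fdf');
echo $hasFdf ? "FDF support is available\n" : "FDF support is missing\n";
```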

If you are familiar with compiling PHP from source, just tack on:

--with-fdftk=/path/above/where/you/put/libfdftk

Note that if you put libfdftk.so in /usr/local/lib, you want to use /usr/local and not /usr/local/lib as your path. This is because configure needs to dig down inside that path and find both libfdftk.so and libfdftk.h.
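To make the “path above” rule concrete, here is a small sketch that derives the prefix from wherever you copied the library and checks that both files are where configure will look. The function fdftk_configure_flag() is hypothetical, and it assumes the header sits in the matching include directory, as in the /usr/local/lib and /usr/local/include example:

```php
<?php
// Hypothetical helper: given where you copied libfdftk.so, work out the
// prefix that --with-fdftk expects and make sure configure can find both
// files. Assumes the header lives in <prefix>/include.
function fdftk_configure_flag(string $soPath): ?string
{
    $prefix = dirname($soPath, 2);           // /usr/local/lib/libfdftk.so -> /usr/local
    $header = "$prefix/include/libfdftk.h";
    if (!is_file($soPath) || !is_file($header)) {
        return null;                         // configure would miss one of them
    }
    return "--with-fdftk=$prefix";
}

echo fdftk_configure_flag('/usr/local/lib/libfdftk.so') ?? 'libfdftk.so or libfdftk.h not found under /usr/local', "\n";
```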

If you are using an RPM, you can probably compile just PHP’s FDF library:

• Download the PHP source .tar.gz file matching your version. No fudging on this: get the exact same version of PHP as your RPM PHP version.
• Use tar xzvf php-*.tar.gz to unpack the source.
• Use cd php-X.Y.Z (X.Y.Z is your version number) to move into that directory.
• Use ./configure --with-fdftk=shared,/path/above/your/libfdftk.so
• Run make to build the modules.
• Use cp modules/fdf.so /path/to/your/php/extensions/directory to copy the resulting PHP module to a place where you are allowed to load it into your PHP scripts on the fly.
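Since the version has to match exactly, it can help to let PHP tell you which tarball to fetch. This sketch must be run under the same PHP as your RPM; it builds the name from the numeric version constants because PHP_VERSION may carry a distribution suffix:

```php
<?php
// Build the exact tarball and directory names for the PHP you are running.
// PHP_VERSION may include a distribution suffix, so use the numeric parts.
$version = sprintf('%d.%d.%d', PHP_MAJOR_VERSION, PHP_MINOR_VERSION, PHP_RELEASE_VERSION);
$tarball = "php-$version.tar.gz";
$srcDir  = "php-$version";
echo "download $tarball, then cd $srcDir\n";
```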

If you have no clue where you are allowed to load PHP extensions from, don’t panic: PHP will print out an error message telling you where that is when you try to load the library (see source below), or you could dig through your php.ini file to find the setting if you are familiar with its structure.
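Alternatively, you can simply ask PHP for the setting. This minimal sketch prints the extension_dir value and whether fdf.so has already been copied there:

```php
<?php
// Ask PHP where it loads extensions from instead of digging through php.ini.
$extDir = (string) ini_get('extension_dir');
$target = $extDir . DIRECTORY_SEPARATOR . 'fdf.so';
echo "extension_dir = $extDir\n";
echo is_file($target) ? "fdf.so is already in place\n" : "copy modules/fdf.so to $target\n";
```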

