MARCH 2004 VOLUME III - ISSUE 3
The Magazine For PHP Professionals
Plus:
Explore your HTML code with Tidy
Testing Automation With PHP
Using the Amazon.com API
Graphics & Layout
John Coggeshall, John Holmes,
Dr James McCaffrey, George Schlossnagle, Alessandro Sfondrini, Chris Shiflett, Andrea Trasatti
php|architect (ISSN 1709-7169) is published twelve times a year by Marco Tabini & Associates, Inc., P.O. Box 54526, 1771 Avenue Road, Toronto, ON M5M 4N5, Canada. Although all possible care has been placed in assuring the accuracy of the contents of this magazine, including all associated source code, listings and figures, the publisher assumes no responsibilities with regards to use of the information contained herein or in all associated material.
Contact Information:
General mailbox: info@phparch.com
Editorial: editors@phparch.com
Subscriptions: subs@phparch.com
Sales & advertising: sales@phparch.com
Technical support: support@phparch.com
Copyright © 2003-2004 Marco Tabini & Associates, Inc.
— All Rights Reserved
I'm sure you're familiar with the Chinese proverb "may you live in interesting times." Even though I rarely think of my professional life as dull and boring, the last month has been particularly exciting. As promised in my exit(0) column from last month's issue, if you look through the middle of the magazine you'll find a full report (in colour!) on the best conference I have ever attended: our very own php|cruise (forgive me for a bit of professional pride; eight months of prep work will do that to you). Things went so well that we're working on another cruise, this time going to Alaska in the fall, and plan on making php|c an annual event for many years to come.
All good things come to an end, of course, and, once back from the cruise, it's back to work. Luckily for us, work means bringing you yet another great issue of php|architect, and I personally consider that another good thing. Like every month, we've got some great content waiting for you in the following pages.
The one I'm most proud of is George Schlossnagle's regular expressions article. Regexes are something that pretty much every programmer has to deal with, but that very few among us really know how to use. In fact, I've seen developers write extremely complicated code with the explicit purpose of getting around having to use a regular expression, and that is just plain wrong. After all, using the best solution for each problem is what being a programmer is all about.
Thus, I approached George about writing an article on regular expressions, and it quickly became evident that one article would not even come close to covering the complexity of regex. Now, everyone knows that I always try my best to stay away from multi-part articles for a multitude of reasons, but in this case I felt that the topic more than deserved our attention over multiple issues and, therefore, George's article is the first in a series of three. Over the next three months, he will take you for a ride from the basics (which are covered in this issue) to the more complex and exotic aspects of regular expressions, thus hopefully providing the PHP world with a definitive guide to this topic.
If regular expressions are not your bag, one of the other topics covered in this month's issue is certain to tickle your fancy. For example, you may want to read Alessandro Sfondrini's excellent article on using the Amazon.com API directly from your PHP website, or Andrea Trasatti's look at the world of WAP. As you can probably imagine, both Andrea and Alessandro hail from my native Italy, and that alone makes their articles more than worth reading. There, my monthly heritage tax is now paid up!
As I'm sure you've noticed, in the past few months we've been publishing material about testing practices quite frequently. As larger and larger projects are devel-
Continued on page 8
NEW STUFF
PHP 5.0 Beta 4
PHP.net has announced the release of PHP 5.0 Beta 4. This fourth beta of PHP 5 is also scheduled to be the last one (barring unexpected surprises, which did occur with beta 3). This beta incorporates dozens of bug fixes since Beta 3, rewritten exceptions support, improved interfaces support, new experimental SOAP support, as well as lots of other improvements, some of which are documented in the ChangeLog. Some of the key features of PHP 5 include:
• PHP 5 features the Zend Engine 2.
• XML support has been completely redone in PHP 5; all extensions are now focused around the excellent libxml2 library (http://www.xmlsoft.org/).
• SQLite has been bundled with PHP. For more information on SQLite, please visit their website.
• A new SimpleXML extension for easily accessing and manipulating XML as PHP objects. It can also interface with the DOM extension and vice-versa.
• Streams have been greatly improved, including the ability to access low-level socket operations on streams.
PHP.net also announced the release of PHP 4.3.5 RC3. This will be the last release candidate prior to the final release, so please test it as much as possible.
For more information visit http://www.php.net/
Zend Optimizer 2.5.1
Zend has announced the release of Zend Optimizer 2.5.1.
Zend.com describes the Optimizer as: "a free application that runs the files encoded by the Zend Encoder and Zend SafeGuard Suite, while enhancing the running speed of PHP applications.
Benefits:
• Enables users to run files encoded by the Zend Encoder
• Increases runtime performance up to 40%."
Get more information from Zend.com
Zend Launches New PHP5 In-Depth Articles Section
Zend Technologies have launched a new version of their Developer's Corner on the zend.com website. PHP5 In-depth showcases articles from many well-known PHP authors on the new features of PHP. For more information, check out http://www.zend.com/php/in-depth.php
DEV Web Management System
Dev is a small but powerful and very flexible content management system for web portals. The system is licensed as freeware under the terms of the GNU GPL. It is absolutely free for non-commercial and commercial use, and is based on PHP 4 and MySQL.
This project allows the user to publish articles, evaluate articles by taking a poll, publish short news and create back-ends in XML format, manage download lists, manage advertisements on your site, be informed about events on your site, create system reports and export them into MS Excel or XML format, and much more.
"Welcome to this new version, aimed at stabilization of the 2.5 branch. Meanwhile, work is continuing on the new 2.6 branch. phpMyAdmin is a tool written in PHP intended to handle the administration of MySQL over the Web. Currently it can create and drop databases, create/drop/alter tables, delete/edit/add fields, execute any SQL statement, manage keys on fields."
For more information visit: www.phpmyadmin.net
PhpSQLiteAdmin 0.2
PhpSQLiteAdmin is a Web interface for the administration of SQLite databases. Version 0.2 comes with some new features and a lot of internal cleanups and refactoring. PhpSQLiteAdmin is still in an early stage of development. It comes free of charge and without warranty.
For more information visit: www.phpsqliteadmin.net
phpMyEdit 5.4
phpMyEdit generates PHP code for displaying/editing MySQL tables in HTML. All you need to do is to write a simple calling program (a utility to do this is included). It includes a huge set of table manipulation functions (record addition, change, view, copy, and remove), table sorting, filtering, table lookups, and more.
Several minor bugs were fixed. A few new options were added. Major features include tabs support, the ability to specify SQL expressions for fields when writing to the database, the ability to define new triggers, and more. All eval() calls were removed for security and performance reasons. Some code was optimized. Several parts of the documentation were updated. A lot of new language files were added and updated.
For more information visit: http://platon.sk/projects/phpMyEdit/
Looking for a new PHP Extension? Check out some of the latest offerings from PECL.
ionCube Releases New Encoder
UK-based ionCube has released a new version of their compiled code PHP encoding tools. New features include a choice of ASCII or binary encoded file formats and optional support for OpenSource extensions such
Editorial: Continued from page 5
That's it for this month: time for me to go tend to my sunburn while I start working on the next issue. Until then, happy readings!
In the article "Exploring the Google API with SOAP," which appeared in the January issue of php|a, I showed you what SOAP is and how it can be used together with PHP. We used a SOAP-encoded document to perform a search using the Google Engine, then we parsed the response to display the results on our website. To perform these operations, we wrote an application from scratch; this approach can be great to understand how SOAP works, but when a customer asks you to implement a SOAP-based feature in an application, you can't waste your time in that way.
In this case, there are some libraries that will make your coding quicker and easier: one of these is NuSOAP, which allows you to send Remote Procedure Calls (RPCs) over HTTP.
This article will show you how we can use the Amazon.com API with NuSOAP to perform searches and display product details, without having to sort through a lot of SOAP syntax: if you have had an opportunity to read my previous article, you will notice how much shorter an application written this way is, and how much time can actually be saved by using this method.
What are Amazon Web Services?
Amazon.com is one of the most widely known on-line shops. You can find and buy almost everything, from books to toys to power tools. Several years ago, Amazon launched a very successful affiliate program, which they later expanded into their Web Services program.
Why would you want to use Amazon Web Services (AWS)? For instance, if your website is about Literature, you may want to allow your users to look for books in the (huge) Amazon database directly from your pages, without redirecting them to Amazon.com. You can provide them with a detailed description of each book and, when they decide to buy one, you can add it directly to their Amazon shopping cart. When the time comes to complete the purchase, you can redirect the user directly to the Amazon website, where the checkout process actually takes place and you receive credit for your affiliate referral.
It is important to understand that AWS are designed only to retrieve information about products and to create and populate shopping carts, not to perform payments: this must be done directly on the Amazon website, the reason being, of course, the security of the customer's personal information. In any case, a significant portion of the transaction is performed from your website. This results in a benefit both for you and for your users, since you can offer your customers a nearly seamless user experience and collect your referral fees.
Access to AWS, as well as to the affiliate program, requires you to register with the Amazon Associates Program and obtain an Associates ID, which will identify each purchase sent through our website.
REQUIREMENTS
Other software: NuSOAP 0.6.4
Code Directory: webs-nusoap
Have you ever wanted to add an online shop to your website but gave up on the idea because you lack the expertise and resources to run it? Using SOAP, you can connect to Amazon Web Services and create a PHP application to remotely browse and search products, add them to Amazon shopping carts or wish lists and, yes, you can even earn money on every purchase performed from your site.
Getting started
Before we start coding, I recommend you download the AWS Software Developer's Kit from http://www.amazon.com/gp/browse.html/?node=3434641. It contains the License Agreement, a guide (you should have a look at it to familiarize yourself with the concepts associated with the program) and some code samples, including a few written in PHP!
As I mentioned earlier, you will also have to apply for your Developer's token, an alphanumerical string needed for performing searches and purchases. To do so, you have to visit https://associates.amazon.com/exec/panama/associates/join/developer/application.html and accept the AWS terms and conditions.
To write our application, we will take advantage of a PHP library called NuSOAP, which is really just a group of "userland" classes written in PHP and designed to allow developers to manage SOAP web services. It will speed up our coding by allowing us to focus on functionality rather than on the communication protocols. NuSOAP is distributed under the LGPL license, and can be downloaded here: http://dietrich.ganx4.com/nusoap/
To add NuSOAP support to our project, we simply have to include nusoap.php in our PHP scripts using require(). Performing a Remote Procedure Call (RPC) is simple. Look at this example:
require("nusoap.php");
$params = array('name' => 'value');
$s = new soapclient("http://server/file.wsdl", true);
$result = $s -> call('method', $params);
First of all, we include NuSOAP and we store the parameters we will use for the RPC in the $params associative array. We then create a new soapclient object, passing two arguments to the constructor: the SOAP server address and a boolean value that indicates whether the server uses a WSDL document. WSDL (Web Services Description Language) documents contain information about a web service, as well as its methods and properties. They are often used by web service providers, including Amazon.
Once we have created the object, all we have to do is to actually execute the RPC by invoking the call() method and specifying the remote method name and the parameters to be passed (contained in $params in our case). NuSOAP automatically fetches the results of the call and stores them in the $result array.
Since we are working with a WSDL-based server, NuSOAP can actually create a "proxy" PHP class capable of providing a better interface to our scripts. Once we have instantiated $s, we can obtain such a proxy with getProxy() (as Listing 1 does) and invoke a remote method directly on it.
Figure 1
Parameter Name | Type | Description
keyword | String | The keyword on which the search should be performed.
page | … | The page number. AWS returns ten results per page, so page 1 will contain results 1 through 10, page 2 results 11 through 20, and so on.
… | … | Specifies the ID of the store to browse. Each Amazon store has its unique ID, which indicates what kind of products it sells (e.g.: books, music, dvd, vhs, etc.). You can find a complete list of all the IDs available in the AWS documentation.
devtag | String | The Developer Token you have received from Amazon.

Figure 2
Result Datum | Type | Description
Url | String | The URL of the product page for this item on Amazon.
Asin | String | The Amazon.com Standard Item Number for this product.
ProductName | String | The name of the product (in our case, the title of the book).
Catalog | String | The category of the product (e.g.: books).
Authors | String | The name(s) of the author(s).
ReleaseDate | String | The release date, in human-readable format (e.g.: "23 February, 1976").
Manufacturer | String | The name of the product's manufacturer (the publisher in our case).
ImageUrlSmall | String | A pointer to the product's "small" image on the Amazon website.
ImageUrlMedium | String | Same as above, for a slightly larger image.
ImageUrlLarge | String | Same as above, but for an even larger image.
UsedPrice | String | The product's price for used copies.
This can be useful to simplify our code: first, we create a proxy client, $proxy; any subsequent RPCs to methods specified in the WSDL can be performed using the proxy, without having to use the NuSOAP call() method again. In our application, we will use proxies to work with AWS.
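To make the proxy idea concrete without needing NuSOAP or a network connection, here is a minimal sketch. It is not how NuSOAP builds its proxy (NuSOAP derives it from the WSDL); it only illustrates the principle of forwarding a method call to call(), using PHP's __call() magic method (PHP 5 and later). StubClient and SketchProxy are hypothetical names invented for this illustration.

```php
<?php
// Hypothetical stand-in for a SOAP client: it records calls instead of
// sending them, so the example runs without NuSOAP or a network.
class StubClient
{
    public $log = array();

    public function call($method, $params)
    {
        $this->log[] = $method;
        return array('method' => $method, 'params' => $params);
    }
}

// Minimal proxy: any method invoked on it is forwarded to call().
class SketchProxy
{
    private $client;

    public function __construct($client)
    {
        $this->client = $client;
    }

    public function __call($name, $arguments)
    {
        // The first argument, if any, is treated as the parameter array.
        $params = isset($arguments[0]) ? $arguments[0] : array();
        return $this->client->call($name, $params);
    }
}

$client = new StubClient();
$proxy  = new SketchProxy($client);

// Reads like a local method, but ends up in StubClient::call().
$result = $proxy->KeywordSearchRequest(array('keyword' => 'php'));
```

The calling code names the remote method directly, which is the convenience the article relies on in its listings.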
Designing the application
Now that we've laid down some ground rules, it's time to decide in detail what the goals of our application are going to be. Since we're all PHP fans, our example website will be about PHP and, therefore, we'll want to allow our users to buy books on this topic from Amazon.
The first thing that we need is a search page: users will be able to search for a particular keyword (or for a set of keywords) and the page will display some basic information about each book that matches the criteria, such as its title, an image, the publishing company, author or authors and price. We also have to provide a way to browse the results, since AWS calls only return ten results per call.
The search page should also contain a link for each product to another page on our website that will contain a detailed description of the book, including any user reviews and comments. From here, the users will be able to continue their purchase on Amazon.com or add the product to their wish lists.
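The ten-results-per-page rule translates into simple arithmetic. The sketch below is only an illustration: page_range() and total_pages() are hypothetical helper names, not part of AWS or NuSOAP, and in the real application the page count comes back from Amazon rather than being computed locally.

```php
<?php
// Number of results AWS returns per page, as stated in the article.
const RESULTS_PER_PAGE = 10;

// Returns array(first, last) result numbers for a page, or null when
// the page lies past the end of the result set.
function page_range($page, $totalResults)
{
    $page  = max(1, (int) $page);
    $first = ($page - 1) * RESULTS_PER_PAGE + 1;
    if ($first > $totalResults) {
        return null;
    }
    $last = min($page * RESULTS_PER_PAGE, $totalResults);
    return array($first, $last);
}

// Number of pages needed to show $totalResults results.
function total_pages($totalResults)
{
    return (int) ceil($totalResults / RESULTS_PER_PAGE);
}
```

With 35 results, for example, page 2 covers results 11 through 20 and the fourth and last page covers 31 through 35.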
The search page
If you have had an opportunity to read through the AWS documentation, you have probably discovered that searches by keyword can be performed using the KeywordSearchRequest() method, which requires the parameters shown in Figure 1.
Assuming that the call will be successful, the server will return an array containing several items:
• The TotalResults element, which indicates the number of total results returned by the query
• The TotalPages element, which provides the number of pages available in the search result
• The Details sub-array, which contains a set of data about each search result matching our search criteria that is included in the page we have requested. Given that a search only returns a maximum of ten items per page, you can expect that this array will contain no more than ten elements. The lite search mode returns the data shown in Figure 2.
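The shape of the response described in the list above can be pictured as a nested PHP array. The sample below is a sketch with made-up placeholder values (they are not real Amazon data), and summarize() is a hypothetical helper that reads the three elements defensively.

```php
<?php
// Made-up sample shaped like the response described above; the values
// are placeholders, not real Amazon data.
$results = array(
    'TotalResults' => 12,
    'TotalPages'   => 2,
    'Details'      => array(
        array('Asin' => '0000000001', 'ProductName' => 'Example Book One'),
        array('Asin' => '0000000002', 'ProductName' => 'Example Book Two'),
    ),
);

// Builds a one-line summary, tolerating missing elements.
function summarize(array $results)
{
    $details = (isset($results['Details']) && is_array($results['Details']))
        ? $results['Details']
        : array();

    return sprintf(
        '%d results on %d pages, %d shown',
        isset($results['TotalResults']) ? $results['TotalResults'] : 0,
        isset($results['TotalPages']) ? $results['TotalPages'] : 0,
        count($details)
    );
}
```

Checking that each element exists before using it means an empty or failed response produces a harmless summary instead of a notice.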
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="GET">
<input type="text" name="keyword" value="" />
<input type="hidden" name="page" value="1" />
<input type="submit" name="button" value="Search!" />
</form>
<?php
if (empty($_GET["keyword"])) // If the form hasn't been submitted
    exit; // Stops the execution

require("nusoap.php");

$client = new soapclient("http://soap.amazon.com/schemas2/AmazonWebServices.wsdl", true);
$proxy = $client->getProxy(); // Creates a WSDL client and a proxy

$param = array(
    'keyword' => $_GET["keyword"],
    'page' => $_GET["page"],
    // [...] lines 18-25 of the printed listing are missing from this copy:
    // the remaining (constant) parameters, the end of the array and the
    // KeywordSearchRequest() call that fills $results

// Escaped copies of the user input, safe to print inside HTML
$keyword = htmlspecialchars($_GET["keyword"]);
$page = (int) $_GET["page"];
$self = htmlspecialchars($_SERVER['PHP_SELF']);

if (empty($results["Details"])) // Checks whether there are results
    die("<h3>No results found for &quot;" . $keyword . "&quot;.</h3>");

echo "<h3>Searched Amazon.com for &quot;" . $keyword . "&quot; - page "
    . $page . " of " . $results["TotalPages"] . "</h3>";

foreach ($results["Details"] as $res) // Prints each product's details
    echo "<img src='" . $res["ImageUrlMedium"] . "' align='left' /><br/>\n"
        . "<a href='details.php?asin=" . $res["Asin"] . "'><b>" . $res["ProductName"] . "</b></a><br /><br />\n"
        . "<b>Authors</b>: " . @implode(', ', $res["Authors"]) . "<br />\n"
        . "<b>Publishing Company</b>: " . $res["Manufacturer"] . "<br />"
        . "<b>List Price</b>: " . $res["ListPrice"] . " - <b>Our Price</b>: "
        . $res["OurPrice"] . " - <b>Used Price</b>: " . $res["UsedPrice"] . "<br /><br /><br />\n\n";

if ($page > 1) // Prints a link to prev page if any
    echo "<a href='$self?keyword=" . urlencode($_GET["keyword"]) . "&amp;page=" . ($page - 1) . "'>Previous Page</a> \n";
if ($page < $results["TotalPages"]) // Prints a link to next page if any
    echo " <a href='$self?keyword=" . urlencode($_GET["keyword"]) . "&amp;page=" . ($page + 1) . "'>Next Page</a>";
?>
Listing 1
As you can see, the KeywordSearchRequest() method returns quite a few pieces of information for every result item, although, of course, we don't have to output all of them on our site. If you look at Listing 1 (the source for our search page) you'll see that the very first part of the file is nothing more than a simple HTML form, which contains an input text box for the keyword and a hidden field that forces the page number to 1; this way, a new search will automatically start from the first page of results.
The form uses the GET method because we need to use links for the "Next Page" and "Previous Page" operations (something like page.php?keyword=blah&page=2). Naturally, you could also use POST, but in that case it would be much more difficult for someone to create a direct link to your search results, which could, in theory, prevent you from completing some sales.
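Because the paging links carry the keyword in the query string, the keyword has to be URL-encoded and the finished URL escaped before it goes into an href attribute. The following is a small sketch of one way to do that; page_link() is a hypothetical helper name, and the script path passed to it is whatever your page is called.

```php
<?php
// Builds the href for a given results page. $script is the page's own
// path (for example the value of $_SERVER['PHP_SELF']).
function page_link($script, $keyword, $page)
{
    // http_build_query() URL-encodes each value; the separator is passed
    // explicitly so the result does not depend on php.ini settings.
    $query = http_build_query(
        array('keyword' => $keyword, 'page' => (int) $page),
        '',
        '&'
    );

    // Escape for use inside an HTML attribute (turns & into &amp;).
    return htmlspecialchars($script . '?' . $query, ENT_QUOTES);
}
```

A "Next Page" link would then be printed as '<a href="' . page_link($self, $keyword, $page + 1) . '">Next Page</a>', where $self, $keyword and $page are your own variables.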
The second part of the script contains the actual PHP code. First of all, an if statement stops the execution of the script if $_GET["keyword"] is empty. Otherwise, we include NuSOAP and create a SOAP client by passing the URI of the *.wsdl file for Amazon (which is provided in the AWS documentation) and the boolean true to indicate to the constructor of the soapclient() class that the SOAP client features WSDL support. We also create a proxy to call AWS methods directly, as we have seen in the first part of the article.
The parameters needed to invoke KeywordSearchRequest() are stored in the $param array; the first two (the keyword and the page number) are to be found in the $_GET superglobal, since they change each time we perform or browse a search, while the others are constant and, therefore, we hardcode them in our script. Remember to insert your developer token […] summary of the search: the keyword, the current page number and total page count, followed by details about each product in the current result page. These are actually produced by a simple foreach loop, which browses the $results["Details"] array, echoing the title of each book, a medium-size image, its authors, publishing company and prices. We will also provide a link to another page, details.php, which contains further information on each book. The link contains a reference to the product's ASIN (the Amazon identifier for each product) so that the application can retrieve the correct product from Amazon's catalogue with another RPC.
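Listing 1 prints the returned fields as they arrive and uses @implode() to keep quiet when the Authors field is not an array. As a hedged alternative, the sketch below checks the field's type explicitly and escapes text before printing it. Both function names are hypothetical, and only the title line of a result is rendered to keep the example short.

```php
<?php
// Returns the authors as a comma-separated string whether the field is
// an array, a single string, or missing altogether.
function authors_to_string($authors)
{
    if (is_array($authors)) {
        return implode(', ', $authors);
    }
    return is_string($authors) ? $authors : '';
}

// Renders the linked title and the authors of one search result.
function render_title_line(array $res)
{
    $asin  = isset($res['Asin']) ? $res['Asin'] : '';
    $title = isset($res['ProductName']) ? $res['ProductName'] : '';
    $by    = authors_to_string(isset($res['Authors']) ? $res['Authors'] : null);

    return '<a href="details.php?asin=' . rawurlencode($asin) . '"><b>'
        . htmlspecialchars($title, ENT_QUOTES) . '</b></a> '
        . htmlspecialchars($by, ENT_QUOTES);
}
```

Escaping matters here because product names and author names come from a remote service and may contain characters such as < or &.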
The last part of this page allows the user to browse the results: if the current page isn't the first one (Page 1), the script prints a link to the previous one and, if it isn't the last page (based on the information returned by our AWS call), it prints a link to the next one. Figure 3 shows our search page at work.

FEATURE: Connecting to Amazon.com Web Services with NuSOAP

Figure 6
Rating | Integer | The rating of the product in this review.
Summary | String | A summary of the review.
Comment | String | The full review itself.

Figure 4
… | … | The type of search. In this case, we'll choose heavy, since we want all the information available on a particular book.

Figure 5
Result Datum | Type | Description
SalesRank | Integer | The product's sales ranking.
Lists | Array of Strings | The names of the ListMania lists that contain the product.
BrowseList | Array of Arrays | Indicates the product categories in which the product can be found. Its contents look like this: BrowseList => Array ( [0] => Array ( BrowseName => PHP ) …
… | … | … takes to be shipped.
Reviews | Array | This array contains information about the customer reviews associated with the product. It includes three elements: AvgCustomerRating, which indicates the average customer rating for the product; TotalCustomerReviews, which contains the number of customer reviews available; and CustomerReviews, which is an array that contains the three most recent reviews (you can find the contents of this array in Figure 6).
SimilarProducts | Array of Strings | Contains the ASINs of products that are similar to this one.
The Product Detail Page
Now that we are done with the first part of the application, it's time to move on to the product detail page, which will show advanced information about a particular book. The AWS method we need in this case is AsinSearchRequest(), which needs the parameters shown in Figure 4. Just like before, the response that we get back from Amazon is an array of arrays, except that, in this case, we will simply concern ourselves with the first result set, since the ASIN uniquely identifies one product. Our data, therefore, will be stored in $results['Details'][0], which, in turn, will contain the information shown in Figure 5. As you can see, some of the values returned are the same as the results of the KeywordSearchRequest() call that we used in Listing 1, while some others, like the customer reviews, are more appropriate for a detailed product page.
Speaking of the product page, Listing 2 contains the code for details.php. First, we check $_GET["asin"]; if it is empty, the program displays a warning and exits. In a more complete application, you may want a slightly more verbose explanation of what went wrong, or perhaps an automatic redirection to the search page.
If we have an ASIN, we include the NuSOAP library, then create a SOAP client and proxy as we did in the previous page. Please note that we have to use sprintf() to transform the ASIN into a ten-character string, since AWS requires it to be submitted in that format (as an alternative, you could use str_pad() to ensure that the string is ten characters long). Bear in mind that the %010d format used in Listing 2 converts its argument to an integer, so it only suits ASINs made up entirely of digits; str_pad() leaves any letters intact.
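The difference between the two padding approaches is easy to see in isolation. The sketch below uses str_pad(); pad_asin() is a hypothetical helper name, and the identifiers used to exercise it are made-up values, not real products.

```php
<?php
// Left-pads an ASIN with zeros to ten characters without touching
// any letters it may contain.
function pad_asin($asin)
{
    return str_pad(trim((string) $asin), 10, '0', STR_PAD_LEFT);
}
```

For an all-digit value such as '123456789' both approaches give '0123456789'. For a made-up value ending in a letter, such as '012345678X', pad_asin() returns it unchanged, whereas sprintf("%010d", ...) would drop the trailing letter during the integer conversion.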
This time, we only need to pass the ASIN and specify heavy as the search type. Once the RPC has been executed, we retrieve the results and print them out, using a foreach loop to cycle through the user reviews.
The final touch in our application consists of providing a link back to the Amazon website in order to make it possible for our users to purchase a product: you can't do much selling by just showing which products are available!
The AWS documentation specifies that an HTTP form must be set up for the purpose of submitting the purchase information over to Amazon.com. This form (you can look at the one in Listing 2 for an example) uses the POST method, and its action attribute is really nothing more than a page on Amazon.com that contains the ASIN of the product that must be added to the user's shopping basket. A few additional hidden fields provide the ASIN, the Associates ID and the Developer's token. The form supports two different buttons: one adds the product to the user's basket, while the other adds it to his wish list.

<?php
if (empty($_GET["asin"]))
    die("<h3>No ASIN specified</h3>");

require("nusoap.php");
// Zero-pads the ASIN to ten characters (%d only suits all-digit ASINs)
$_GET["asin"] = sprintf("%010d", $_GET["asin"]);

$client = new soapclient("http://soap.amazon.com/schemas2/AmazonWebServices.wsdl", true);
$proxy = $client->getProxy(); // Creates a WSDL client and a proxy
// [...] lines 10-19 of the printed listing are missing from this copy:
// the AsinSearchRequest() call that fills $results and the end of this
// PHP block
?>
<h1><?=$results["Details"][0]["ProductName"] ?></h1>
<img src="<?=$results["Details"][0]["ImageUrlLarge"] ?>" align="left" height="350" />
<b>Authors:</b> <?=@implode(', ', $results["Details"][0]["Authors"]) ?><br /><br />
<b>Published by</b> <?=$results["Details"][0]["Manufacturer"] ?>
<b> on</b> <?=$results["Details"][0]["ReleaseDate"] ?><br /><br />
<b>List Price</b>: <?=$results["Details"][0]["ListPrice"] ?> -
<b>Our Price</b>: <?=$results["Details"][0]["OurPrice"] ?> -
<b>Used Price</b>: <?=$results["Details"][0]["UsedPrice"] ?><br /><br /><br />
<!-- Form to purchase on Amazon.com -->
<form method="POST" action="http://www.amazon.com/o/dt/assoc/handle-buy-box=<?=$_GET["asin"] ?>">
<input type="hidden" name="asin.<?=$_GET["asin"] ?>" value="1">
<input type="hidden" name="tag-value" value="webservices-20">
<input type="hidden" name="tag_value" value="webservices-20">
<input type="hidden" name="dev-tag-value" value="YOUR-DEV-TOKEN">
<input type="submit" name="submit.add-to-cart" value="Buy From Amazon.com">
<input type="submit" name="submit.add-to-registry.wishlist" value="Add to Wish List">
</form>
<!-- End Form -->
<b>ISBN:</b> <?=$results["Details"][0]["Isbn"] ?><br /><br />
<b>Availability:</b> <?=$results["Details"][0]["Availability"] ?><br /><br /><br />
<b>Sales Ranking:</b> <?=$results["Details"][0]["SalesRank"] ?><br /><br />
<b>Average customer rating:</b> <?=$results["Details"][0]["Reviews"]["AvgCustomerRating"] ?>
<br /><br /><h2>Read user reviews:</h2>
<?php
// Prints the summary, rating and text of each customer review
foreach ($results["Details"][0]["Reviews"]["CustomerReviews"] as $res)
    echo "<h3>" . $res["Summary"] . "</h3>"
        . "<b>Rating: </b>" . $res["Rating"] . "<br /><br />" . $res["Comment"] . "<br /><hr />";
?>
Listing 2
Further Improvements
As you have probably noticed, writing a SOAP-based application using a library like NuSOAP is much faster than developing your own SOAP classes; if you have read my article about the Google API that appeared in the January issue of php|a, you probably know what I am talking about. This means that you can develop rather complex applications without having to waste time dealing with the nitty-gritty details of the underlying protocol; in fact, we didn't even write any SOAP code for our Amazon application: NuSOAP did it all for us.
Naturally, the code that I have introduced here is very basic and could stand to gain from some improvements. For instance, Amazon Web Services allow you to manage a remote shopping cart or wish list by adding and removing items. The very last part of the purchase (the one where money changes hands) must still take place on Amazon.com, but you can let the user perform most of the normal operations associated with an e-commerce website without leaving your website. However, do keep in mind that if you choose to manage the user's shopping cart remotely, you can't change it once you've submitted it to Amazon; this is done to protect the end user from fraudulent transactions. You can check out the AWS documentation for more details on this topic; you'll find that it's not complicated at all.
Depending on your needs, you may choose to perform a different kind of search operation on your website: by similar products, by author, by ISBN, by manufacturer, and so on. You may also want to browse a "node", or product category (e.g. "programming", "web", etc.) directly, without performing a search. It goes without saying that all this depends on what your goals are.
If your Amazon-based shop becomes very popular, you may decide to join the Amazon Associates Program, an affiliate system that pays you commissions on every sale. Be careful, however: your application must not send more than one request per second to Amazon. Even if you provide an error handling system, you must not immediately retry a request if the previous one has failed.
You should also provide a caching system, in order tostore the data needed by your site without going backand forth to AWS for every request—you can check outBruno Pedro's excellent article in the February 2004issue of php|a for more idea on caching data from yourPHP scripts If you choose to do so, don't forget thatyou can't keep your data cached for more than twenty-four hours
Finally, please keep in mind that in the examplesshown in this article we always referred toAmazon.com, the American website AWS are alsoavailable for Amazon.co.uk, Amazon.de andAmazon.co.jp, but you have to modify the URIs in thescript, changing the specifications in the WSDL docu-ment from [soap.amazon.com/] to soap-eu.amazon.com/, and so on You will also have to addthe locale parameter to your RPC invocations—its valuecan be set to uk, de or jp, depending on which Amazon
F
FE EA AT TU UR RE E Connecting to Amazon.com Web Services with NuSOAP
Figure 3
Trang 16website you are referring to.
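The caching advice above can be sketched in a few lines. This is only an illustration under stated assumptions: cached_call(), the cache directory and the closure standing in for a real AWS request are all hypothetical names, not part of the article's listings, and the default lifetime simply mirrors the twenty-four hour limit mentioned above. Throttling to one request per second is not shown.

```php
<?php
// Hypothetical helper (not from the article's listings): return the cached
// result for $key if it is younger than $maxAge seconds; otherwise call
// $fetch(), store its result on disk and return it.
function cached_call($key, callable $fetch, $cacheDir, $maxAge = 86400)
{
    $file = $cacheDir . '/' . md5($key) . '.cache';

    // Serve from the cache while the file is still fresh.
    if (is_file($file) && (time() - filemtime($file)) < $maxAge) {
        return unserialize(file_get_contents($file));
    }

    // Cache miss or stale entry: fetch once and remember the result.
    $data = $fetch();
    file_put_contents($file, serialize($data));
    return $data;
}

// Usage: the closure stands in for a real request to AWS.
$cacheDir = sys_get_temp_dir() . '/aws_cache_' . uniqid();
mkdir($cacheDir);

$requests = 0;
$fetch = function () use (&$requests) {
    $requests++;                        // count how often "Amazon" is contacted
    return array('title' => 'Example product');
};

$first  = cached_call('keyword:php', $fetch, $cacheDir);
$second = cached_call('keyword:php', $fetch, $cacheDir);
```

The second call is answered from disk, so the remote service is contacted only once for the same key within the cache lifetime.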
I'm Outta Here
Amazon.com Web Services is a powerful tool that you can use to add e-commerce functionality to your site without going to the expense of developing an online store of your own and stocking all the merchandise. Even if you can't create a complete online shop using AWS (because the purchase must be completed on the Amazon website), you can still give your users a customized shopping experience that relies on the practically limitless resources of one of the world's most popular e-commerce websites.
The sample application that I showed you in this article is quite simple: if you plan to use it in a production environment, especially if your site has a lot of traffic, you should probably consider implementing features like error handling and caching in order to prevent problems with the Amazon servers. Adding these elements to your application may require some extra work, but it could all pay off if you enjoy decent traffic and join the Amazon Associates Program.
Perhaps most importantly, I hope to have given you a good idea of how much a SOAP library (in this article we have chosen NuSOAP, but there are other packages, like PEAR::SOAP) can simplify the creation of a complex application: write a few lines of code to perform a Remote Procedure Call and you're practically done.
To Discuss this article:
http://forums.phparch.com/130
Alessandro Sfondrini is a young Italian PHP programmer from Como. He has already written some online PHP tutorials and published scripts on the most important Italian web portals. You can contact him at giu_ale2@hotmail.com.
Regular expressions (commonly known as regexes) are a powerful tool for pattern matching and text manipulation. A typical problem that pulls people into learning regular expressions is text munging: you have a string of text and you need to replace portions of it based on certain rules. For instance, you might want to obfuscate all the email addresses in a block of text so that email addresses like george@example.com get translated to the form george [at] example [dot] com. Regular expressions are the tool for the job, and provide a powerful and deep syntax for handling tasks like these.
Alternatives to the PCRE Functions
PHP supplies some alternatives to the PCRE functions. The most direct competitor is the POSIX regular expression library that consists of ereg, ereg_replace and others. We won't be looking at the POSIX regular expression functions because the PCRE library provides a broader pattern-matching facility than its POSIX counterpart and the PCRE library is about 30% faster on average. The other option is to perform string matching with the standard string functions. As noted above,
Matchmaker, Matchmaker Make Me A Match
An Introduction to Regular Expressions
by George Schlossnagle
REQUIREMENTS
PHP: Any
OS: Any
Applications: N/A
Code Directory: match-regex
A quick search for the words "hate" and "regular expressions" on your favourite search engine is likely to bring up thousands upon thousands of hits. While most developers recognize the usefulness of regular expressions (and many can't do without them once they have figured out how regexes work), their use remains something of a black-magic art, right up there with hypnosis and session management. Despite looking complicated, however, regular expressions are much easier to work with than most people are willing to admit.
A Few Myths about Regexes
Before we get started, we should dispel a few popular myths about regexes:
Myth: Regular Expressions are Slow
Truth: Regular expressions can be slow, but they don't need to be. The main regular expression library used by PHP (called PCRE and consisting of the preg family of functions) is quite fast and also quite powerful. This power means that it is easy to write a short regular expression that performs a lot of work, and performing a lot of work with any tool can be slow.
Myth: You should use basic string functions instead of regular expressions
Truth: Regular string functions (for example strstr or strtok) are (marginally) faster than the regular expression that accomplishes the same task. That having been noted, this myth often leads to people implementing complicated string parsers using string matching functions where a single regular expression would do the trick. The PCRE library will generally match complex patterns faster than a parser you implement on your own.
the string functions are faster on the tasks they were designed for (finding specific characters or substrings), but are not an appropriate fit for anything but the simplest patterns.
Your First Regex
The simplest regex is a match against a static string. To determine if the string 'george@example.com' is present in a piece of text, we can use the following code:
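The listing that accompanied this paragraph is not reproduced here. A minimal sketch of such a static-string match, assuming a sample text of our own and escaping the dots so that they match literally, might look like this:

```php
<?php
$text = 'Please send comments to george@example.com.';

// The pattern is the static string between the / delimiters.
// The dot is escaped so that it matches a literal dot only.
$pattern = '/george@example\.com/';

// preg_match returns 1 when the pattern is found and 0 when it is not.
$found   = preg_match($pattern, $text) === 1;
$missing = preg_match($pattern, 'No address in here') === 1;
```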
Despite its simplicity, this example illustrates the basic syntax of a regex match. The regex itself is the first parameter, and is contained within slashes (/). The second parameter is the text you want to test the pattern against. The preg_match function returns 1 if the match succeeds and 0 if it fails (or false if an error occurs), so its result can be used directly as a condition. Using slashes to delimit regular expressions is a convention (taken from the UNIX utility awk), but is not necessary: you can actually use any character that is not alphanumeric, a backslash or whitespace. Alternative delimiters are convenient if your pattern itself contains slashes. For instance, when dealing with file paths or URLs (both of which contain numerous slashes), it is common to use a different delimiter.
We can also perform substitutions with PCREs. To substitute 'george at nospam.example.com' for my address (a common anti-spam technique), you can use the preg_replace function.
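As a rough illustration of the substitution just described (the original listing is missing, so the sample text and variable names below are assumptions), preg_replace takes the pattern, the replacement and the subject text, and returns the rewritten text:

```php
<?php
$text = 'Contact george@example.com for details.';

// Replace every occurrence of the address with its obfuscated form.
$safe = preg_replace(
    '/george@example\.com/',
    'george at nospam.example.com',
    $text
);
```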
The other PCRE functions are:
• preg_grep(string pattern, array subjects [, int flag]): preg_grep applies the specified pattern to every element of subjects, returning an array consisting of those that matched. If the optional flag is set to PREG_GREP_INVERT, only those elements that did not match will be returned.
• preg_match_all(string pattern, string subject, array matches [, int flags]): preg_match returns only the first match found in its subject text. preg_match_all matches as many times as possible, storing all the matches in the matches array. I will discuss this function in more detail later in the article.
• preg_replace_callback: This function makes it possible to perform very complex operations on a per-match basis through the use of callback functions. We will cover it in a future article, but some of its functionality overlaps with evaluated replacements, which are discussed in this article.
• preg_quote(string text): When using input text in a pattern, you may want to sanitize it to ensure it does not contain any regex metacharacters. preg_quote escapes all regex metacharacters in a string.
• preg_split(string pattern, string subject [, int limit [, int flags]]): preg_split performs similarly to explode, allowing us to break up the string subject into at most limit parts. Instead of splitting on a specific delimiter, preg_split allows the string to be broken based on a regex.
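To make the list above more concrete, here is a short sketch of three of these functions in action. The sample data and variable names are invented for illustration only.

```php
<?php
// preg_grep: keep only the array elements that match the pattern.
// The original array keys are preserved in the result.
$words  = array('apple', 'banana', 'cherry', 'avocado');
$aWords = preg_grep('/^a/', $words);
$notA   = preg_grep('/^a/', $words, PREG_GREP_INVERT);

// preg_quote: escape metacharacters so that input text is matched literally.
$quoted = preg_quote('example.com');

// preg_split: like explode(), but the delimiter is a pattern.
// Here the limit of 2 stops splitting after the first delimiter.
$parts = preg_split('/\s*,\s*/', 'red , green,blue', 2);
```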
The power of regular expressions is in matching complex patterns that cannot be identified using straightforward text-search functions like strstr(). The basic components of a regular expression pattern are:
• Character Classes: Patterns rarely consist of specified letters, but rather of classes of letters. For example, 'any number' instead of a particular number, or 'any letter' instead of a particular letter.
• Grouping: Grouping allows for changing the precedence of operations as well as providing a means to extract the text you matched with a pattern.
• Enumerations: Enumerators allow you to specify how many times a character class or sub-pattern appears. This allows for convenient expression of fixed length patterns like 'a US zipcode is 5 digits' as well as variable length patterns such as 'a domain is a number of alphanumeric characters separated by dots'.
• Alternations: Alternations allow for multiple patterns to be combined. Unlike character classes, which allow for a position to match multiple characters, alternations allow for entire patterns to be alternatively matched. For example, a valid workday can be Monday, Tuesday, Wednesday, Thursday or Friday.
• Positional Anchors: Anchors allow you to require your pattern to start matching at a specific location in the search text, for example at the beginning or end of a line.
• Global Pattern Modifiers: Global pattern modifiers allow you to change the basic behavior of a regular expression, for example rendering it case-insensitive.
Character Classes
While it's usually easy to find a particular substring within a larger string (for example, my e-mail address in a message), it's not always easy to find a particular type of substring, like any e-mail address. To do this, you need to be able to match against a more generic pattern and not just against a static string. PCRE supplies character classes to allow you to do this; a character class allows a specific character in a search text to be matched against a range of possible characters. For example, a US phone number is composed of a three digit area code, a three digit exchange, and a four digit line number, commonly delimited by a '-'. To match this pattern, you could use the following regular expression:
/\d\d\d-\d\d\d-\d\d\d\d/
The \d specifier is a built-in PCRE character class that consists of all the digits. There are a couple of things you should note about the pattern above. The first is that we have many \d's. In regular expressions, any character or character class matches only a single character unless you use an enumerator (which we'll cover later) to attach a quantity to it. Second, if you test this pattern you will find the following results:
• 555-123-4567 matches. This is correct.
• 5555-123-45678 matches. This is not correct.
The second example does not represent a valid phone number (the area code and line number are too long), but it matches because the pattern fits the substring 555-123-4567 inside it, as shown in Figure 1.
There are a couple of ways to combat this problem. If you know that your search text should be exactly a phone number (with no leading or trailing text), you can use positional anchors to force the pattern to start at the beginning of the text and end at the end, as we'll see later on.
If the phone number might be contained in text, on the other hand, you might try to fix the pattern by requiring at least one character of leading and trailing whitespace around the number, using a pattern like:
/\s\d\d\d-\d\d\d-\d\d\d\d\s/
The \s specifier is another character class for all whitespace (spaces, tabs, newlines, etc.). This pattern does not work in all situations, though, since if the text begins with the phone number you will be unable to match the leading \s. To handle this case, PCRE supports \b, a boundary condition that matches at the border (or boundary) between a 'word' and a 'non-word' (these are words in the C programming language sense: letters, numbers and underscores only). \b is actually not a character class, but what is known as a 'zero-width assertion'; this means that the \b specifier does not actually match the character on the other side of the boundary, but only ensures that such a boundary exists. Putting that into our pattern we can refine it to:
/\b\d\d\d-\d\d\d-\d\d\d\d\b/
Continuing the testing, we find that "077-xxx-yyyy" matches. This is not correct either: US and Canadian area codes and exchanges cannot begin with 0 or 1 (these are reserved for long distance and operator-assisted or international services). To be able to restrict the leading numbers to the allowed set, we need to be able to create our own character classes. In PCRE, these are constructed by filling a set of brackets ([ ]) with the characters we want to match. To match 2-9, we can use the character class [23456789], which is commonly shortened via a range operator to [2-9]. To use a custom character class in a pattern, you use it exactly as you would a regular character or character class. Here is the phone number pattern reworked to employ this:
/\b[2-9]\d\d-[2-9]\d\d-\d\d\d\d\b/
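The progression of phone-number patterns discussed in this section can be checked with a few calls to preg_match. This is just a sketch; the variable names and test strings are ours.

```php
<?php
$loose   = '/\d\d\d-\d\d\d-\d\d\d\d/';              // no boundaries
$bounded = '/\b\d\d\d-\d\d\d-\d\d\d\d\b/';          // word boundaries added
$strict  = '/\b[2-9]\d\d-[2-9]\d\d-\d\d\d\d\b/';    // leading digits restricted

// The loose pattern finds a "phone number" inside a longer digit string.
$r1 = preg_match($loose, '5555-123-45678');

// With \b on both sides the over-long number no longer matches.
$r2 = preg_match($bounded, '5555-123-45678');

// \b alone still accepts an area code starting with 0 ...
$r3 = preg_match($bounded, '077-123-4567');

// ... which the custom character class [2-9] rejects.
$r4 = preg_match($strict, '077-123-4567');
$r5 = preg_match($strict, 'Call 555-123-4567 today');
```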
PCRE provides six commonly used built-in character classes, described in Figure 2. Additionally, PCRE provides POSIX-style character classes for compatibility with POSIX-style regular expressions. These classes are described in Figure 3. POSIX character sets aren't commonly used in real-life code, which is a shame because they are often a perfect fit for problems that programmers encounter in their day-to-day work.
You can negate a POSIX character class by adding a ^ after the first colon. For instance, to match all non-letter characters, you could use the class [:^alpha:] (like all POSIX classes, it must appear inside a bracket expression, as in [[:^alpha:]]).
Negations are also available in custom character classes; for example, to match anything that is not the greater-than character (>), you can use the custom character class [^>]. Negations are very useful when you are creating regular expressions that extract quoted text or if you want to manually parse XML or HTML.
Since '-', '^' and '[ ]' have special meanings in custom character classes, if you want those actual characters to be elements of the class, you should escape them with a backslash (\). The two exceptions are the range operator -, which can appear unescaped as the last character in a class, since that is unambiguous, and the negation character ^, which can appear unescaped in any position but the first.
Grouping and Sub-Patterns
Usually, you will not only want to match a pattern, but extract data from it as well. To extract a specific part of a pattern, you surround it with parentheses. For example, to capture each part of the phone number pattern, you would add parentheses as follows:
/\b([2-9]\d\d)-([2-9]\d\d)-(\d\d\d\d)\b/
Figure 2
Basic Character Classes
. Matches any character.
\w An alphanumeric character or the underscore character.
Figure 3
[:alpha:] Any letter.
[:alnum:] Any alphanumeric character.
[:ascii:] Any ASCII character.
[:cntrl:] Any control character.
[:digit:] Any digit (same as \d).
[:graph:] Any alphanumeric or punctuation character.
[:lower:] Any lowercase letter.
[:print:] Any printable character.
[:space:] Any whitespace character (same as \s).
[:upper:] Any uppercase letter.
[:xdigit:] Any hexadecimal 'digit'.
Pattern fragments grouped in this fashion are called sub-patterns. To see what they capture, you need to pass a third argument to preg_match. This argument is set by the function to an array with the captured sub-pattern results in it. The zeroth element of the array is the text matched by the pattern as a whole, while the sub-pattern captures are at the offset of their pattern number. Patterns are numbered left-to-right and outside-to-inside. So in the pattern above the entire phone number is at offset 0, the area code is sub-pattern 1, the exchange is sub-pattern 2, and the line number is sub-pattern 3.
Here you can see a sample phone number being run through the regular expression:
$text = 'My phone number is 555-321-1212';
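Only the first line of this example is shown above. A minimal sketch of the complete call, assuming the parenthesized phone-number pattern from earlier in this section, could be:

```php
<?php
$text  = 'My phone number is 555-321-1212';
$regex = '/\b([2-9]\d\d)-([2-9]\d\d)-(\d\d\d\d)\b/';

if (preg_match($regex, $text, $matches)) {
    // $matches[0] is the whole match; offsets 1 to 3 are the sub-patterns.
    list($number, $area, $exchange, $line) = $matches;
}
```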
We can also nest patterns. If we wanted to capture the entire local part of the phone number, in addition to its component parts, the regex could be modified to be:
/\b([2-9]\d\d)-(([2-9]\d\d)-(\d\d\d\d))\b/
When we nest patterns, we move left to right and, when we hit a nested pattern, we take the outermost part first, then recursively parse its contents following the same rules. With the above pattern, the patterns are numbered as shown in Figure 4.
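The numbering rule can be verified by running the nested pattern and inspecting the match array. The sample text is the one used earlier; the variable names are ours.

```php
<?php
$regex = '/\b([2-9]\d\d)-(([2-9]\d\d)-(\d\d\d\d))\b/';
preg_match($regex, 'My phone number is 555-321-1212', $m);

// $m[0] whole match, $m[1] area code, $m[2] the outer (local) group,
// $m[3] and $m[4] the groups nested inside it.
```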
Sub-patterns are also extremely useful in substitutions, since they allow us access to the matched sub-patterns when performing the replacement. A captured sub-pattern can be accessed in the preg_replace replacement text by referencing its offset as \N (where N is the sub-pattern number). Here is an example that sanitizes phone numbers by obscuring their line number:
preg_replace("/\b([2-9]\d\d)-([2-9]\d\d)-(\d\d\d\d)\b/",
             '\1-\2-XXXX', $text);
If we run this on the text 'My phone number is 410-555-1212.', it returns 'My phone number is 410-555-XXXX.'
Note that the replacement string in the above example is single-quoted. If we were to double quote it, we would have to double escape our sub-pattern references as "\\1-\\2-XXXX". This may seem mysterious but the reasoning is this: the PCRE library needs to be passed the sub-pattern references as \1, but when we double-quote a string, PHP attempts to interpret the escaped characters for us. Single-quoting performs no such interpretation and leaves your references untouched. This is the same process by which "\n" becomes a newline, but '\n' remains literally '\n'.
We can reference sub-patterns in matches as well, using the same rules. A fun example of this is finding all 6-letter palindromes. A palindrome is a word that is spelled the same forward and backward, for example 'noon' or 'deed'. To spot a six-letter palindrome, we match 3 characters and require that we see them immediately in reverse order. Here is the pattern:
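The pattern itself is not reproduced at this point. One pattern that fits the description (an assumption on our part, not necessarily the author's exact expression) captures three word characters and then requires the back-references \3, \2 and \1, with \b on both sides:

```php
<?php
// Three captured word characters, then the same three in reverse order,
// bounded on both sides by \b so that only whole words qualify.
$palindrome = '/\b(\w)(\w)(\w)\3\2\1\b/';

$words = array('hallah', 'redder', 'banana', 'noon', 'xhallah');
$found = array();
foreach ($words as $word) {
    if (preg_match($palindrome, $word)) {
        $found[] = $word;
    }
}
```

Here 'noon' is rejected because it has only four letters, and 'xhallah' is rejected because the palindrome is not a whole word.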
This isn't the full story on RFC compliant email addresses. Because the specification allows for addresses to contain descriptions as well, a completely accurate email address validator is actually quite complex. An example can be found at the end of Mastering Regular Expressions in Perl; the regex presented there is X characters long! For most purposes, the regex presented above is completely sufficient.
Enumeration modifiers can also be used to compress patterns with long repetitive parts. For instance, the phone-number pattern can be compressed to:
/\b[2-9]\d{2}-[2-9]\d{2}-\d{4}\b/
or, by noting that the area code and exchange match the same pattern, we can compress it even further, as follows:
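The shorter pattern is missing here. A plausible form, offered only as an assumption, groups the repeated "[2-9]\d{2}-" part and applies {2} to it. The sketch below checks that it agrees with the longer pattern on a few sample strings of our own.

```php
<?php
$long  = '/\b[2-9]\d{2}-[2-9]\d{2}-\d{4}\b/';
// Assumed compressed form: the "[2-9]\d{2}-" group repeated exactly twice.
$short = '/\b([2-9]\d{2}-){2}\d{4}\b/';

$samples = array('555-123-4567', '077-123-4567', '555-123-456', '5555-123-4567');
$agree = true;
foreach ($samples as $sample) {
    if (preg_match($long, $sample) !== preg_match($short, $sample)) {
        $agree = false;
    }
}
```

Note that the group added for the repetition is a capturing sub-pattern, so it shifts the numbering of any sub-patterns that follow it.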
When we run this pattern against a palindrome like 'hallah', it matches as shown in Figure 5.
Notice that you need to use \b to make sure you don't misidentify words that contain palindrome substrings. If you are running on a UNIX system, Listing 1 is a code block that will find all the six-letter palindromes in the dictionary file /usr/share/dict/words.
When we use preg_match_all with sub-patterns, we have two choices of how we want the data returned to us. The default behavior is for the match array to contain an array for each sub-pattern, where that array contains the capture for the nth search match as its nth element. If that's confusing, here is how it looks when matching all the phone numbers in a text:
The alternative is to pass the optional flag PREG_SET_ORDER. With this flag set, the ordering of the match array is reversed: the match array contains one element for each match found in the search text, with that element being an array containing the sub-pattern captures for that match. If we are looking to replicate the Perl idiom
while($text =~ /$regex/g) {
# perform work on one set of matches at a time
}
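you can accomplish it with this PHP:
preg_match_all($regex, $text, $matches, PREG_SET_ORDER);
foreach ($matches as $match) { /* perform work on one set of matches at a time */ }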
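The difference between the two orderings is easiest to see side by side. The sketch below uses the phone-number pattern from earlier and a sample text of our own.

```php
<?php
$regex = '/\b([2-9]\d\d)-([2-9]\d\d)-(\d\d\d\d)\b/';
$text  = 'Office: 410-555-1212, home: 443-555-0100.';

// Default (PREG_PATTERN_ORDER): one array per sub-pattern.
preg_match_all($regex, $text, $byPattern);

// PREG_SET_ORDER: one array per match, holding that match's sub-patterns.
preg_match_all($regex, $text, $bySet, PREG_SET_ORDER);

$areaCodes = array();
foreach ($bySet as $match) {
    $areaCodes[] = $match[1];   // work on one set of matches at a time
}
```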
To handle this, PCRE supplies enumeration modifiers. The most basic description of an email address is a number of non-whitespace characters, followed by an '@', followed by more non-whitespace characters. \S is the character class for all non-whitespace characters, so using that we can write this simplistic email-matching pattern as:
/\S+@\S+/
+ is a PCRE enumerator that instructs the regex engine to match one or more instances of the character or character class it applies to. PCRE supports a number of enumeration methods for specifying that a character or character class should be matched multiple times, as you can see in Figure 6.
The + and * modifiers are both greedy. This means they will always match as long a sub-pattern as possible. This is not always the way you want your patterns to behave, but I will leave the details of when we might want a greedy or non-greedy match to a later article.
Enumeration modifiers can be applied not only to characters and character classes, but to sub-patterns as well. This allows for some pretty complex pattern generation, which is, after all, one of the best features of regular expressions (at least when you can understand what they do).
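A few quick experiments show what the + and * enumerators and their greedy behavior mean in practice. The sample strings are ours and are only meant as an illustration.

```php
<?php
// \S+ on both sides of the @ picks up the whole address.
preg_match('/\S+@\S+/', 'Write to george@example.com today', $email);

// Greedy matching: .+ runs from the first < to the last >.
preg_match('/<.+>/', '<b>bold</b> and <i>italic</i>', $greedy);

// * may match zero times, so it succeeds with an empty match at the very
// start of the text; + needs at least one character.
preg_match('/a*/', 'baaa', $star);
preg_match('/a+/', 'baaa', $plus);
```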
For example, we can use enumeration modifiers to significantly improve our email-address pattern.
Figure 6
Enumeration Modifiers
* Match 0 or more times.
+ Match 1 or more times.
? Match 0 or 1 times.
{m} Match exactly m times.
{m,n} Match between m and n times.
{m,} Match at least m times.
{0,n} Match between 0 and n times.
According to RFC 2822, which defines the "official" valid email address syntax, an email address is composed of a localpart, an '@' and a domain. The localpart is one or more characters from the set [\w!#$%"*+\/=?`{}|~^-], while a domain is a dot-separated list of parts composed of \w characters. The pattern for the local part is almost identical in form to \S+:
/[\w!#$%"*+\/=?`{}|~^-]+/
The pattern for domains is more complex. First, we need to identify elements in the string. These are given by
/[\w-]+/
If we only have two such elements, the domain pattern would look like this:
/[\w-]+\.[\w-]+/
Note that since '.' is a special regex character (the wildcard character class), we must escape it to have it match just the '.' character. Since we can have an arbitrary number of dot-separated segments, we will encapsulate the first part of the pattern in a sub-pattern and use the '+' enumerator to specify that it must occur one or more times:
/([\w-]+\.)+[\w-]+/
Creating a sub-pattern simply involves placing it inside parentheses. Combining the local and domain patterns together, we arrive at a decent regular expression for matching valid email addresses:
/[\w!#$%"*+\/=?`{}|~^-]+@([\w-]+\.)+[\w-]+/
We can use this regular expression to perform the anti-spam rewriting we illustrated at the beginning of the article.
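The rewriting code itself is not shown here. As a hedged sketch, we can wrap the local part and the domain of the email pattern in sub-patterns of our own and reuse them in the replacement; the sample text and the ' at nospam.' format follow the earlier example rather than any listing from the article.

```php
<?php
// The email pattern from above, with sub-patterns added around the local
// part (\1) and the whole domain (\2) so both can be reused.
$email = '/([\w!#$%"*+\/=?`{}|~^-]+)@(([\w-]+\.)+[\w-]+)/';

$text = 'Mail george@example.com or admin@lists.example.org.';
$safe = preg_replace($email, '\1 at nospam.\2', $text);
```

The trailing period of the sentence is left alone, because the domain pattern must end in at least one [\w-] character.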
Alternations
The last of the basic regular expression syntactical elements is alternation. Where character classes let us match a single character against a set of allowed characters, alternations allow for matching a string against multiple sub-patterns. For example, we might want to identify all HTTP and FTP addresses in a document for auto-linking or indexing purposes. We could do this with two regular expressions:
#https?://\S+#
#ftp://\S+#
but this will require the document to be completely scanned twice. Note that we are using # as a delimiter and not /, since our pattern contains slashes and we would rather not have to escape them. A more elegant approach is to combine them using an alternation, as follows:
#(https?|ftp)://\S+#
The alternation operator | means that the sub-pattern (https?|ftp) matches either https? ('http' with an optional 's') or ftp. To use this to automatically create anchor tags for all linked content, we can use a replacement like this:
preg_replace('#((https?|ftp)://\S+)#', '<a href="\1">\1</a>', $text);
Running this over a sample text, we notice that any preexisting anchor tags will become munged. For example:
Come visit us at <a href="http://www.phpa.com">phpa.com</a>.
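Both behaviors, the successful auto-linking and the munging of an existing anchor tag, can be reproduced with the replacement above. The plain-text sample is our own.

```php
<?php
$pattern = '#((https?|ftp)://\S+)#';
$replace = '<a href="\1">\1</a>';

// Plain text: the URL is wrapped in an anchor tag as intended.
$plain = preg_replace($pattern, $replace,
                      'Files are at ftp://ftp.example.com/pub today');

// Existing markup: \S+ swallows everything up to the next whitespace,
// so a second anchor tag is wrapped around the first one.
$html   = 'Come visit us at <a href="http://www.phpa.com">phpa.com</a>.';
$munged = preg_replace($pattern, $replace, $html);
```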
preg_replace('#([^\'"])((https?|ftp)://\S+)([[:punct:]])#',
             '\1<a href="\2">\2</a>\4', $text);
Note here that we need to capture and return in the substitution the non-quote character ([^\'"]) that we match before the URL to avoid losing it (the same goes for the punctuation character matched after the URL), and that we have to escape the single quote, since the entire pattern is part of a single-quoted string.
Positional Anchors
In the example of matching valid US phone numbers, the regular expression we had was good for spotting phone numbers in a block of text, but not for validating that a block of text is a phone number. To do that, we need to ensure that the phone number is the only element in the search text, with no leading or trailing components. Anchors help solve this problem. To mandate that our phone number match starts at the beginning of the search text and ends at the end of it, we can modify our regex as follows:
/^([2-9]\d{2})-([2-9]\d{2})-(\d{4})$/
The leading ^ anchors the match at the beginning of the text, meaning that the match will only succeed if it begins there. The trailing $ anchors the match at the end of the text, meaning that the match will only succeed if the pattern terminates on the final character of the text to be matched against.
Here we use a slightly modified version of the anchored pattern to make a function useful for validating user-inputted data. If the phone number is valid, it will return an array of its components. If not, it will return false. The regex has been made a bit more robust by allowing the delimiter (previously -) to be replaced by an optional ., - or whitespace character.
if(preg_match($regex, $phone, $matches)) {
return array( 'area_code' => $matches[1],
'exchange' => $matches[2], 'line_number' => $matches[3]);
}
return false;
}
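The opening lines of this function (its name and the regex) are not shown above. A self-contained sketch, in which the name parse_phone and the exact regex are our assumptions based on the description, might read:

```php
<?php
// Hypothetical name; the opening lines of the original function are missing.
function parse_phone($phone)
{
    // Anchored at both ends; each delimiter may be a dot, a dash,
    // a whitespace character, or absent altogether.
    $regex = '/^([2-9]\d{2})[.\s-]?([2-9]\d{2})[.\s-]?(\d{4})$/';

    if (preg_match($regex, $phone, $matches)) {
        return array('area_code'   => $matches[1],
                     'exchange'    => $matches[2],
                     'line_number' => $matches[3]);
    }
    return false;
}

$ok      = parse_phone('410-555-1212');
$dotted  = parse_phone('410.555.1212');
$compact = parse_phone('4105551212');
$padded  = parse_phone('call 410-555-1212');
$invalid = parse_phone('077-555-1212');
```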
Don't confuse the anchor operator ^ with the negated character class operator [^]. Because an anchor is not a character class (in fact it's a special zero-length look behind assertion, but that's a topic for a later article), it has no meaning inside a character class.
Anchors are also useful for extracting information near the beginning or end of a string. For example, a line from an Apache Common Log Format logfile looks like the following:
10.80.117.254 - - [13/Feb/2004:14:53:01 -0500]
"GET /~george/blog/ HTTP/1.1" 200 43489
This says that on February 13, 2004 a request for "/~george/blog/" was made from the IP address 10.80.117.254. This request was successful (it returned a 200 OK response code), and the amount of data returned was 43489 bytes. Writing a full parser for this log line is not too difficult (we will do so in the cookbook section at the end of the article), but many queries do not require parsing the entire log. For instance, if we want to count the number of occurrences of each response code, the expression to use is quite simple. Looking at the log format, we see that the last two fields are numbers, and we want the next to last one. Expressed as a regex, that pattern looks like this:
/(\d+) \d+$/
Working backwards, this says we first match the end of the line ($), then a number (which we don't bother to capture), then a number which we do want to capture (the response code). We can wrap this into a quick script to determine the frequency of various responses as shown in Listing 2. When we don't need to parse an entire text string, especially if its format is complex, anchors can make our life much easier.
Global Pattern Modifiers
The final regular expression syntactical elements we are going to discuss in this article are global pattern modifiers. As their name implies, global pattern modifiers change the overall behavior of the pattern. By far the most common of these is the case insensitivity modifier, i. Global modifiers are implemented in the Perl style, directly following the pattern they apply to. Here is a function which uses a regex to extract all addresses under a specified domain from a subject text, regardless of the casing of the domain (domains are case insensitive).
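function extract_addresses($domain, $text) {
    // Escape regex metacharacters (and the / delimiter) in the domain.
    $domain = preg_quote($domain, '/');
    // The domain is concatenated: variables are not expanded in single quotes.
    if(preg_match_all('/([\w!#\$%\"*+\/=?\'{}|~^-]+)@' . $domain . '/i',
                      $text, $matches, PREG_PATTERN_ORDER)) {
        return $matches[1];
    }
    return false;
}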
Notice here that, in addition to using the i modifier, we also use preg_quote to sanitize $domain. Data that can potentially come from an untrusted source (such as a user) should always be quoted to prevent the accidental or malicious inclusion of regex characters. Also, we use the PREG_PATTERN_ORDER flag so that all the sub-pattern \1 matches are stored in $matches[1]. Otherwise we would need to iterate over $matches and manually build the result set.
The other possible pattern modifiers are as follows:
if(($fp = fopen($logfile, "r")) == false) {
    print "Error opening $logfile\n";
foreach($frequency as $code => $occurences) {
    print "$code\t$occurences\n";
• m (treat as multiline): By default, PCRE assumes that we intend our search text to be processed as one big string, and ^ and $ will match only the beginning and ending of the search text, respectively. When the m modifier is used, ^ and $ will match at the beginning and ending of every line in the search text (the search text is considered to be broken into lines by any newline characters).
• s (treat as single line for wildcards): By default the wildcard character (.) will not match a newline. If it should match newlines as well, add the s modifier to the pattern.
• x (extended legibility): By default, any whitespace in a pattern is considered part of the pattern. With the x modifier, whitespace in the pattern (outside character classes) is ignored and a # starts a comment that runs to the end of the line, which can be helpful for readability and inline comments, as in the closing lines of this commented phone-number pattern:
([2-9]\d{2}) # Match the exchange as subpattern 2
[.\s-]? # An optional delimiter - dot, dash or whitespace
(\d{4}) # Match the line number as subpattern 3
/x
More information on creating readable patterns will be covered in a future article.
• A (Start anchored): This modifier is equivalent to putting a ^ at the start of our pattern; it anchors the pattern at the start of the search text. Thus the following two regular expressions are equivalent:
/^Subject: (.*)/
/Subject: (.*)/A
There are no benefits of using this method
over manually anchoring a pattern with ^^
(other than, perhaps, moving the anchor
character from the beginning of your
pat-tern to its end)
• D (Dollar end-only). If this modifier is set, the dollar end-anchor $ will match only at the end of the string. By default, $ will match before the final character if that character is a newline. This is ignored if the m modifier is also used.
• S (Study). If we are going to execute a pattern a number of times, we can use this flag to instruct PCRE to take extra time 'studying' the pattern to improve its efficiency.
• U (Ungreedy). By default, all matches in PCRE are greedy; that is, a pattern will attempt to match the longest possible piece of the search text. The U modifier reverses this behavior, asking PCRE to find the shortest possible match for the pattern. More on greedy versus non-greedy matching will be covered in a future article.
• u (UTF-8). This modifier instructs PCRE to treat patterns and search texts as UTF-8 characters instead of just single-byte characters. UTF-8 support is still new and should be used with some caution as it may be incomplete.
• e (Evaluated replacements). This causes the replacement string in a preg_replace call to be evaluated as PHP. Back-references are expanded and the resulting expression is executed via eval. The result of the evaluation is used as the final replacement text.
Let's try an example of how to use this: writing Wiki-style links to documents. In Wikis, putting so-called CamelCaps text in a document will link it to the wiki page of that name. Doing this blindly with a regex can be achieved with the following replacement:
$text = preg_replace('/\b(([A-Z]\w+){2,})\b/',
    '<a href="/wiki/\1.html">\1</a>', $text);
This might result in a number of nonexistent documents being linked to, though. If we want the rewriting to only happen if the destination document exists, we can perform the conditional replacement with an evaluated replacement as shown in Listing 3. Now, when a CamelCaps word is encountered, the regex checks is_wiki_page to see if it should be linked. If so, the text is replaced with a link; otherwise, it is left as-is (or, rather, it is replaced with itself).
Evaluated replacements and their companion function preg_replace_callback will be covered in depth in a future article.
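Listing 3 is not reproduced in this text, and the e modifier was deprecated in PHP 5.5 and removed in PHP 7.0. The following is therefore only a minimal sketch of the same conditional replacement, written with the companion function preg_replace_callback instead of an evaluated replacement. The is_wiki_page() shown here is a hypothetical stand-in that consults a hard-coded list, and link_wiki_words() is a name invented for this illustration.

```php
<?php
// Hypothetical stand-in for is_wiki_page(): a real version would check
// whether the destination document actually exists.
function is_wiki_page($name)
{
    $known_pages = array('HomePage', 'RegexTips');
    return in_array($name, $known_pages, true);
}

// Turn CamelCaps words into links, but only when the wiki page exists.
function link_wiki_words($text)
{
    return preg_replace_callback(
        '/\b(([A-Z]\w+){2,})\b/',
        function ($m) {
            if (is_wiki_page($m[1])) {
                $name = htmlspecialchars($m[1]);
                return '<a href="/wiki/' . $name . '.html">' . $name . '</a>';
            }
            return $m[0]; // no such page: replace the word with itself
        },
        $text
    );
}

print link_wiki_words('See HomePage and MissingPage.') . "\n";
// prints: See <a href="/wiki/HomePage.html">HomePage</a> and MissingPage.
```

The callback receives the same sub-pattern matches that the back-references would expand to, so no string of PHP code has to be built and evaluated.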
Unless specifically contraindicated (such as D and m), global pattern modifiers can be freely combined.
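The effect of several of these modifiers can be checked with a few one-line experiments. The sketch below is illustrative only: the sample strings are invented, and the first line of the commented phone pattern (the area code) is an assumption added to make the pattern complete.

```php
<?php
$text = "Subject: first\nSubject: second\n";

// m: ^ and $ match at every line rather than only at the ends of the text.
$plain     = preg_match_all('/^Subject: (.*)$/', $text, $m1);   // 0
$multiline = preg_match_all('/^Subject: (.*)$/m', $text, $m2);  // 2

// s: the wildcard . also matches a newline.
$dot_plain  = preg_match('/first.Subject/', $text);             // 0
$dot_single = preg_match('/first.Subject/s', $text);            // 1

// U: shortest possible match instead of the longest.
preg_match('/<.+>/', '<b>bold</b>', $greedy);                   // <b>bold</b>
preg_match('/<.+>/U', '<b>bold</b>', $ungreedy);                // <b>

// D: $ no longer matches before a trailing newline.
$dollar_default = preg_match('/^\d+$/', "123\n");               // 1
$dollar_endonly = preg_match('/^\d+$/D', "123\n");              // 0

// x: whitespace and # comments inside the pattern are ignored.
$phone = '/
    ([2-9]\d{2})   # area code as subpattern 1 (assumed for this sketch)
    [.\s-]?        # an optional delimiter: dot, dash or whitespace
    ([2-9]\d{2})   # the exchange as subpattern 2
    [.\s-]?        # an optional delimiter
    (\d{4})        # the line number as subpattern 3
/x';
preg_match($phone, '410-555-1234', $parts);

printf("m: %d vs %d\n", $plain, $multiline);
printf("s: %d vs %d\n", $dot_plain, $dot_single);
printf("U: %s vs %s\n", $greedy[0], $ungreedy[0]);
printf("D: %d vs %d\n", $dollar_default, $dollar_endonly);
printf("x: %s %s %s\n", $parts[1], $parts[2], $parts[3]);
```

One practical detail when using x in PHP: a comment inside the pattern must not contain the delimiter character (here /), because PHP looks for the closing delimiter before PCRE ever sees the comment.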
A Simple Regex Cookbook
As with most tools, the way to really learn regexes is to use them in practical situations. To help you get on your way, here is a short selection of recipes for making the most out of your regular expressions.
Apache Log Processing
Being able to extract information from webserver logfiles is essential to both good housekeeping (knowing what links are broken and the disposition of our traffic) and forensics (determining where traffic is coming from and what actions users are taking). The first step to this is being able to parse our logs into an easily accessible data structure. Apache common log format is defined as the following:
"%h %l %u %t \"%r\" %>s %b"
Where the individual fields are:
• %h: The IP address (or hostname if DNS lookups are enabled) of the requestor
• %l: The remote logname, as supplied by
• %>s: The three-digit response code of the final request served (Apache has a notion of internal redirects; this is the response code of the page actually returned to the user)
• %b: The number of bytes returned in the response
A function to parse a single line and return an array with its contents is given in Listing 4. Even though we didn't really explore it in much detail, the benefit of using extended legibility regexes should be obvious here: with 17 sub-patterns being captured, it would be extremely difficult to guess the correct offsets at a glance.
Listing 4 (fragment)
"( # begin request match ($m[12])
(GET|HEAD|POST) # the HTTP method ($m[13])
Now that we have a parser, its applications are nearly limitless. For example, Listing 5 shows a little script I like to leave running in a window on my desktop; I tail my Apache log into it and it reports the number of hits I get per second in real time.
Listing 5 (fragment)
while (($line = fgets(STDIN)) !== false) {
    if ($data = parse_clf_line($line)) {
        $this_time = $data[4];
        if ($last_time && $last_time != $this_time) {
            print "$last_time: $count\n";
Running it as
tail -f /apache/logs/mysite/access | freq.php
gives a running tally of hits per second (note that this will only run under a UNIX-like environment and that you'll need to make freq.php executable). This data could just as easily be written to an MRTG database for graphing, or something even cleverer. Because we have access to the fully parsed log line, we could easily convert this to display hits per hour by changing
$this_time = $data[4];
to
$this_time = "$data[5]/$data[6]/$data[7] $data[8]";
Similarly, we could count bytes instead of pages by accumulating $data[17] (bytes transferred) in $count.
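Since only fragments of Listings 4 and 5 survive in this text, here is a much smaller sketch of the same idea. It is not the article's parser: parse_clf_sketch() captures 7 sub-patterns rather than 17 and returns named keys instead of numeric offsets, and hits_per_second() works on an array of lines rather than on STDIN so that it can be tried without a log file. Both function names and the sample log lines are invented for illustration.

```php
<?php
// Simplified common log format parser (a sketch, not the article's Listing 4).
function parse_clf_sketch($line)
{
    $pattern = '/^
        (\S+) \s+            # %h   remote host
        (\S+) \s+            # %l   remote logname
        (\S+) \s+            # %u   remote user
        \[ ([^\]]+) \] \s+   # %t   timestamp
        "([^"]*)" \s+        # %r   first line of the request
        (\d{3}) \s+          # %>s  status code
        (\d+|-)              # %b   bytes sent, or - when no body was sent
    /x';
    if (!preg_match($pattern, $line, $m)) {
        return false;
    }
    return array(
        'host' => $m[1], 'logname' => $m[2], 'user' => $m[3], 'time' => $m[4],
        'request' => $m[5], 'status' => $m[6], 'bytes' => $m[7],
    );
}

// Count hits per timestamp; the log format has one-second resolution.
function hits_per_second(array $lines)
{
    $counts = array();
    foreach ($lines as $line) {
        $data = parse_clf_sketch($line);
        if ($data === false) {
            continue; // skip lines that are not in common log format
        }
        if (!isset($counts[$data['time']])) {
            $counts[$data['time']] = 0;
        }
        $counts[$data['time']]++;
    }
    return $counts;
}

$sample = array(
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /a.gif HTTP/1.0" 200 2326',
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /b.gif HTTP/1.0" 404 -',
    '127.0.0.1 - - [10/Oct/2000:13:55:37 -0700] "GET /c.gif HTTP/1.0" 200 512',
);
foreach (hits_per_second($sample) as $time => $count) {
    print "$time: $count\n";
}
```

Returning named keys trades the compactness of numeric offsets for code that does not break when a sub-pattern is added to the regex.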
Single Pass Template Substitution
In its simplest form, a templating system runs through a 'template' and replaces certain tokens with dynamic values. One of the things that makes many templating systems slow is that they must perform multiple passes through a document, one for each token to be replaced. If we standardize our token naming convention, we can actually perform the replacement in a single pass.
First, we require that all tokens be of the form {NAME} where NAME is a key in an associative array that contains our substitutions. With this in place, we can match all tokens in a single pass with the following regex:
/{(\w+)}/
Next we will use an evaluated replacement to substitute the appropriate value from the passed associative array. Here is the full function:
function expand_text($text, $data)
Your friend {FRIEND} has sent you an e-card.
Click <a href="{LINK}">here</a> to pick it up.
print expand_text($template, $data);
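The body of expand_text() did not survive in this text, so the following is only a sketch of how such a single-pass function could look today. It assumes PHP 5.3 or later and uses preg_replace_callback rather than an evaluated replacement, since the e modifier was removed in PHP 7.0. The values in $data are invented, and leaving unknown tokens untouched is a design choice of this sketch, not something the article specifies.

```php
<?php
// Single-pass token expansion: every {NAME} is looked up in $data.
function expand_text($text, array $data)
{
    return preg_replace_callback(
        '/{(\w+)}/',
        function ($m) use ($data) {
            // Unknown tokens are left in place so that typos stay visible.
            return array_key_exists($m[1], $data) ? $data[$m[1]] : $m[0];
        },
        $text
    );
}

$template = 'Your friend {FRIEND} has sent you an e-card. '
          . 'Click <a href="{LINK}">here</a> to pick it up.';
$data = array(
    'FRIEND' => 'George',                          // hypothetical values
    'LINK'   => 'http://www.example.com/card/42',
);
print expand_text($template, $data) . "\n";
```

However many tokens the template contains, the text is scanned once, and each lookup is a plain array access.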
Preventing Cross-Site Scripting Attacks
JavaScript is one of the banes of my existence. Don't get me wrong: it is a powerful and useful language, but its tight integration with HTML makes it a fertile playground for malicious users to launch cross-site scripting attacks. If we must allow HTML in user input, we will want to at least remove any JavaScript from it. Listing 6 shows one possible way to do so. This function looks for various DHTML and CSS directives that can be used for cross-site scripting attacks, and if any are found it performs a very draconian stripping of all but the basic formatting tags.
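Only a few lines of Listing 6 survive in this text, so the sketch below illustrates the approach described above rather than reproducing it. The event list is abbreviated, the set of suspicious patterns and of allowed tags is an assumption, and strip_scripting_sketch() is an invented name. It should be read as an illustration of the detect-then-strip idea, not as a vetted sanitizer.

```php
<?php
function strip_scripting_sketch($html)
{
    // Abbreviated, illustrative list of event handler names.
    $js_event_list = array('load', 'unload', 'click', 'dblclick', 'blur', 'submit');
    $js_events = implode('|', $js_event_list);

    $suspicious = array(
        '/<\s*script\b/i',                  // script blocks
        '/\bon(' . $js_events . ')\s*=/i',  // inline event handlers
        '/javascript\s*:/i',                // javascript: URLs
        '/expression\s*\(/i',               // CSS expressions
    );
    foreach ($suspicious as $pattern) {
        if (preg_match($pattern, $html)) {
            // Draconian fallback: keep only basic formatting tags...
            $html = strip_tags($html, '<b><i><u><em><strong><p><br>');
            // ...and drop every attribute from the tags that remain.
            return preg_replace('/<(\/?)(\w+)[^>]*>/', '<$1$2>', $html);
        }
    }
    return $html; // nothing suspicious found: leave the input alone
}

print strip_scripting_sketch('<b onclick="alert(1)">hi</b>') . "\n"; // <b>hi</b>
print strip_scripting_sketch('<i>fine</i>') . "\n";                  // <i>fine</i>
```

Note that strip_tags() on its own keeps the attributes of the tags it allows, which is why the second replacement is needed before the result can be considered free of event handlers.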
Conclusion
We have now come to the end of our journey through the basics of regular expressions. With these tools in your hands, you should be able to tackle almost any text matching challenge. Hopefully, you have lost any fears you might have had concerning regular expressions. Once past the terseness of their syntax, regexes can be a powerful and versatile addition to our programming toolkit.
At the same time, we have really only touched the tip of the regex iceberg. In addition to the things we have seen so far, the PCRE extension supports a number of fine-grain features that allow for incredibly complex matches. These advanced features will be covered in a future set of articles.
FEATURE Matchmaker, Matchmaker Make Me A Match
To Discuss this article:
http://forums.phparch.com/131
George Schlossnagle is a Principal at OmniTI Computer Consulting, a Maryland-based tech company specializing in high-volume web and email systems. Before joining OmniTI, George led technical operations at several high-profile community web sites where he developed experience managing PHP in very large enterprise environments. George is a frequent contributor to the PHP community. His work can be found in the PHP core, as well as in the PEAR and PECL extension repositories.
Before entering into information technology, George trained to be a mathematician and served a 2 year stint as a teacher in the Peace Corps. His experience has taught him to value an inter-disciplinary approach to problem solving that favors root-cause analysis of problems over simply addressing symptoms.
Listing 6 (fragment)
$js_event_list = array('load', 'unload', 'click', 'dblclick',
...
    'blur',
...
    'submit',
...
$js_events = implode('|', $js_event_list);
In this article, I will show you how to write powerful automated tests in PHP for your Web applications. PHP is remarkably well-suited for writing software test automation and the system I present is surprisingly short. Web applications built with PHP are becoming more and more common in the enterprise arena and, as a result, they are becoming increasingly complex. As PHP matures, the ability to write test automation becomes more valuable, but in conversations with my colleagues I discovered that the techniques required for automated testing of PHP Web applications are not well known. In this article, I will show you how to quickly write effective test automation that verifies your PHP Web applications' correctness.
The best way to show you what we will accomplish is with two screenshots. Figure 1 shows a dummy PHP Web application that accepts a last name for an employee and then searches a MySQL database and displays the employee's ID, first name, last name, and e-mail address. In this example searching for "Baker" correctly returns a single employee whose ID is 002, first name is Bob, and e-mail is bob@build.com.
Manually testing even this minimal Web application would be extremely tedious, time consuming, and error prone. Instead, we can test the application by programmatically sending input to the PHP script on the Web server, then capture the response stream, examine the response for a correct target, and log a pass or fail result. Figure 2 shows a PHP shell program that does just that. Test cases 0002 and 0003 correspond to the manual test shown in Figure 1.
You might have noticed that my examples use a Windows/IIS system rather than the more usual Linux/Apache setup. Most client companies that I work with are large and have a mixed technology environment. Because many of these companies are experimenting with PHP and MySQL on a Windows/IIS base, I decided to use that base for this article.
In the sections that follow I will walk you through the underlying PHP Web application so that you will understand what we are testing, briefly examine the underlying MySQL database so that you understand its relationship to the test automation, and carefully go over the PHP test automation program so that you can modify the source code to meet your own particular needs. I will conclude with a discussion of some of the ways you can extend this technique and use it in a production environment. After reading this article you will have the ability to write PHP test automation, a hopefully valuable addition to your skill set.
REQUIREMENTS
Other Software: N/A
Code Directory: auto-test
PHP enables Web developers to create complex Web applications; nothing new there. The techniques for writing automated tests for PHP Web applications, however, are not well known. In this article, James McCaffrey shows you a simple but representative PHP application and then walks you through the creation of a powerful automated test program written entirely in PHP. The code is explained in detail so you can use it as is, or modify and extend the technique to meet your own needs.
The PHP Web Application
The most common use of PHP among the companies I work with is to create dynamic Web pages that have an interface to a MySQL database. I created a reduced dummy Web application that contains the essential elements of most real-life applications I deal with. I started by making a small database named dbCompany, which contains a table named tblEmployees that has four columns: empid (employee ID), lastname, firstname, and email. I populated the table with the four rows of data you can see in Figure 3 (next page).
Next, I created a simple PHP Web application that searches the database. The code shown in Listing 1 generates the Web page shown in Figure 1.
Both the database and the PHP application are simplistic, but together they have all the elements needed to demonstrate test automation. Before I show you the test automation program, let's imagine what it would be like to manually test the application. (In fact, asking how to test a dummy Web application like this is often used as an interview question for dedicated software test engineers.) There are thousands of inputs you would have to enter into the page and then visually determine if the response was correct or not. Then, suppose you changed the logic or the database structure: you'd have to start all over. As you can imagine, this would not be fun, or particularly efficient.
To automate the testing of the dummy PHP Web application, we must programmatically send input to the PHP script (via HTTP), then capture the HTTP response stream, examine the response for strings that tell us if the response is correct or not, and log results. The PHP shell script shown in Listing 2 does exactly that and generated the output shown in Figure 2.
Figure 1
Figure 2
I structured the test automation as two functions. The main() function reads test case data from a text file, sends an input value to the PHP Web application, and examines the response for an expected value. The main() function calls a resHasTarget() function which returns TRUE if the response to some input data contains a target string.
Here are the contents of the test case file used in this
Each line of data represents a single test case. A 4-digit test case ID is followed by an input value, then an expected result, and an optional comment. So, in test case 0002, if we submit "Baker", we should see "Bob" in the response.
The main() function starts by assigning values to variables for the IP address of the Web server, the port on which the server listens, the path to the PHP application, and the method used to send user data:
$ipAddress = '127.0.0.1';
$port = '80';
$page = '/PHP/simple.php';
$method = 'POST';
Because this is test automation, you will know the IP address of the Web server that has your PHP application, and it will usually be 127.0.0.1 (localhost), unless you test on a server that is not installed on your local machine. Port 80 is the default HTTP port, but it may be different in a test environment. The two main methods of sending information to a Web server are POST and GET. Recall that our dummy Web application sends data using POST:
<form name="theForm" action="simple.php"
method="POST">
I will discuss using GET requests later. Next, main() prints some minimal header information to the shell and then opens the test case file for reading. The test automation reads the test case file line by line:
$line = fgets($fp, 4096);
list($caseid, $input, $expected, $comment) = explode(":", $line);
$postData = 'lastname=' . urlencode($input);
For each line, we parse the four colon-delimited fields using the explode() function. Using colons to delimit test case data is arbitrary; in general, you can use any character but want to avoid characters that appear in the actual test case data. We pass the input value through the urlencode() function and append the result to lastname=. The function replaces characters that might be misinterpreted by the Web server with their escaped equivalents. For example, a '/' character would be replaced by a %2F sequence.
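The parsing step can be made a little more defensive. The sketch below is one possible variant, not the article's code: parse_test_case() is an invented helper, and the sample line only follows the caseid:input:expected:comment layout described above, with a made-up comment. It strips the newline that fgets() leaves on the line, tolerates a missing comment, and lets the comment itself contain colons.

```php
<?php
function parse_test_case($line)
{
    // The limit of 4 keeps any colons inside the comment field intact.
    $fields = explode(':', rtrim($line, "\r\n"), 4);
    if (count($fields) < 3) {
        return false; // malformed line: ID, input and expected are required
    }
    $fields = array_pad($fields, 4, ''); // the comment is optional
    list($caseid, $input, $expected, $comment) = $fields;
    return array(
        'caseid'   => $caseid,
        'input'    => $input,
        'postData' => 'lastname=' . urlencode($input),
        'expected' => $expected,
        'comment'  => $comment,
    );
}

$case = parse_test_case("0002:Baker:Bob:one employee, note: exact match\n");
print $case['caseid'] . ' ' . $case['postData'] . ' -> ' . $case['expected'] . "\n";
// prints: 0002 lastname=Baker -> Bob
```

Without the rtrim() call, the last field of every line would carry the line's trailing newline, which matters as soon as the expected value is the last field on the line.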
After we have a test case ID, an input last name to send and an expected value to look for, the resHasTarget() function does all the work:
if (resHasTarget($ipAddress, $port, $method, $page, $postData, $expected))
    echo "$caseid Pass input = " . str_pad($input, 12) . "expected = $expected\n";
else
    echo "$caseid FAIL input = " . str_pad($input, 12) . "expected = $expected\n";
The resHasTarget() function posts data to the PHP Web application and checks if the expected value is in the response stream. For test case 0001, "lastname=Anderson" is posted to 127.0.0.1:80/PHP/simple.php and the response is examined for the presence of the string "Adam". If "Adam" is found, resHasTarget() returns TRUE and we log a "pass" message, otherwise we log a "fail" message.
Let's now examine the resHasTarget() function that does most of the actual work. We start by creating a socket and then using it to connect to our Web server:
$socket = socket_create(AF_INET, SOCK_STREAM, 0)
or die("Socket failed\n");
$connect = socket_connect($socket, $ipAddress, $port)
or die("Connect failed\n");
FEATURE Automated Testing For PHP Applications
Figure 3
The constants AF_INET and SOCK_STREAM mean that we want to use the dotted-quad notation (i.e., 127.0.0.1) and a full-duplex, TCP connection. There are two important alternatives to the socket_* family of functions I chose to use. A simpler, stream-based choice is the fsockopen() family of functions. A higher level choice is to use classes in the PEAR library. I have programmed sockets using all three methods and have found that any preference is more a matter of personal programming style than functionality. After we connect to the Web server we determine the size of the data we will be posting:
$reqBody = $postData;
$contentLength = strlen($reqBody);
The $postData input parameter assumes we have data in a name-value sequence like:
user=chris&age=25&job=tester
Next we construct the HTTP headers we are going to send to the server:
$send = $method . " " . $page . " HTTP/1.1\r\n";
$send .= "Host: localhost\r\n";
$send .= "Accept: */*\r\n";
$send .= "User-Agent: test.php test automation\r\n";
$send .= "Content-Type: application/x-www-form-urlencoded\r\n";
$send .= "Content-Length: " . $contentLength . "\r\n\r\n";
An HTTP request starts with a line that specifies the method (e.g., POST, GET, HEAD), followed by the path to the PHP application and the HTTP version. The next header line must specify the host that the request is being sent to. The next two header lines are optional. The Accept header tells the server what types of responses are acceptable (here we'll accept anything). The User-Agent header is a courtesy so the Web server knows who is making the request. The next two header lines are required for POST requests. Content-Type tells the server what kind of data is coming. You can think of application/x-www-form-urlencoded as a magic string that means "data from an HTML form".
The Content-Length header is the size of the POST data. Notice that we have to construct the POST data before the headers so we can specify the size at this point in the program. Also notice that the Content-Length header is followed by two carriage return, linefeed combinations. HTTP uses this line ending on every platform, not only on Windows, and the resulting empty line marks the end of the headers. Finally we append the POST data to the request.
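Collecting these steps into one function makes the request easy to inspect before it is ever written to a socket. The following is a minimal sketch under a few assumptions: build_post_request() is an invented name, the host is passed in rather than hard-coded, and a Connection: close header is added, which is not part of the article's code. With HTTP/1.1 a server may otherwise keep the connection open after responding, so a loop that reads until the socket is exhausted could wait a long time.

```php
<?php
function build_post_request($page, $host, $postData)
{
    $send  = "POST " . $page . " HTTP/1.1\r\n";
    $send .= "Host: " . $host . "\r\n";
    $send .= "Accept: */*\r\n";
    $send .= "User-Agent: test.php test automation\r\n";
    $send .= "Content-Type: application/x-www-form-urlencoded\r\n";
    $send .= "Content-Length: " . strlen($postData) . "\r\n";
    $send .= "Connection: close\r\n";   // assumption of this sketch, see above
    $send .= "\r\n";                    // empty line: end of the headers
    $send .= $postData;                 // the request body
    return $send;
}

$request = build_post_request('/PHP/simple.php', 'localhost', 'lastname=Baker');
print $request . "\n";
```

Because the body is passed in already encoded, strlen() gives exactly the byte count that Content-Length must announce.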
Now we are ready to send the HTTP request to the server, then grab the response stream and examine it:
socket_write($socket, $send, strlen($send));
while ($receiveBuffer = socket_read($socket, 2048))
The socket_write() function sends the request and associates the response to the socket. We read the response 2048 bytes (an arbitrary size) at a time (as opposed to line-by-line). We also use strpos() to see if the target string is anywhere in the 2048 bytes, and if it is we close the socket and return TRUE. If we examine the entire response and never find the target string we return FALSE.
There is one trick to watch for here: it is possible that a response stream block of bytes might end in the middle of the target, breaking it into two parts. If so, you would not find the target string. In practice this is not very likely and you can defend against this possibility by increasing the number of bytes read per socket_read() so that you capture the entire response stream.
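A larger read size reduces the chance of a split target but does not remove it, because a single read may return fewer bytes than requested. An alternative is to carry the tail of each block over to the next one. The sketch below shows that idea with invented names: stream_has_target() takes any callable that returns the next block, so it can be tried with an in-memory reader (make_reader() is purely a test helper) before being wired to a socket.

```php
<?php
// $read returns the next block of the response, or '' or false at the end.
function stream_has_target($read, $target)
{
    if ($target === '') {
        return true; // an empty target is trivially present
    }
    $tail = '';
    $keep = strlen($target) - 1; // longest piece of the target a block can end with
    while (($block = $read()) !== false && $block !== '') {
        $window = $tail . $block;
        if (strpos($window, $target) !== false) {
            return true;
        }
        $tail = $keep > 0 ? substr($window, -$keep) : '';
    }
    return false;
}

// Test helper: hands out the given blocks one at a time.
function make_reader(array $blocks)
{
    return function () use (&$blocks) {
        return count($blocks) ? array_shift($blocks) : '';
    };
}

// "Bob" is split across two blocks, yet it is still found.
var_dump(stream_has_target(make_reader(array('<td>002 B', 'ob Baker</td>')), 'Bob'));
// prints: bool(true)
```

With a connected socket, the reader could be a closure such as function () use ($socket) { return socket_read($socket, 2048); }, which keeps the memory use bounded regardless of the size of the response.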
To summarize, the key to automated testing of PHP Web applications is the ability to send raw HTTP data to the Web server. PHP has a family of socket functions that make it easy to do so. After reading information from test case files containing input values and expected values, you send the input to the server then examine the response for the expected value.
Using The GET Method
In the previous sections, we assumed that the PHP Web application under test sends data to the server using the POST method. What if the application uses GET? Suppose you have a Web application where the user submits a user ID and a password using GET. (By the way, this is a bad idea because with GET the form data is appended to the request URL.) The following code snippet shows how to send a request using GET:
// create socket
// connect
$send = "GET /PHP/form2.php?";
$send .= "userID=" . urlencode("root");
$send .= "&password=" . urlencode("secret");
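The snippet above stops at the query string. A complete GET request could be assembled as in the following sketch, which makes a few assumptions: build_get_request() is an invented name, http_build_query() (available since PHP 5) is used to encode the parameters, and a Connection: close header is added so that the server ends the connection after responding. A GET request carries no body, so no Content-Type or Content-Length header is sent.

```php
<?php
function build_get_request($page, $host, array $params)
{
    // Encode every name and value and join the pairs with a plain "&".
    $query = http_build_query($params, '', '&');

    $send  = "GET " . $page . "?" . $query . " HTTP/1.1\r\n";
    $send .= "Host: " . $host . "\r\n";
    $send .= "Accept: */*\r\n";
    $send .= "User-Agent: test.php test automation\r\n";
    $send .= "Connection: close\r\n";   // assumption of this sketch
    $send .= "\r\n";                    // empty line: end of the request
    return $send;
}

$get = build_get_request(
    '/PHP/form2.php',
    'localhost',
    array('userID' => 'root', 'password' => 'secret')
);
print $get;
```

The separator is passed explicitly because the default used by http_build_query() depends on the arg_separator.output configuration setting.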
Beyond the Basics
You can modify and extend the basic PHP application test framework presented here in many ways. For clarity, I used a simple text file to store test cases, but you should consider good alternatives, like XML or database storage. Using XML to hold your test cases is particularly appropriate when the test cases have a complex structure (for example, many optional parameters), or are shared across groups. A database, on the other hand, can come in handy when you have a very large number of test cases.
The technique in this article displays its output to a command shell. In a production environment, you will probably want to write test results to a text file or a SQL database. Writing to a text file is most appropriate when you are on a relatively short production cycle. Writing results to a SQL database is useful when you are in a long production cycle because you will be generating lots of data that can be shared and analyzed in many different ways.
In a production environment, I always add additional data to the results log. At a minimum, you will want to add counters for the number of cases which pass and which fail. I also like to add timing information for each test case and the overall test run. Timing information can uncover problems in the Web application code that basic pass-fail data misses. And for reporting purposes, you can timestamp the date of the test run.
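The counters, timings and timestamp mentioned above take only a few lines. The sketch below is one way to arrange them, under stated assumptions: run_cases() is an invented name, each test case is an array with at least a caseid key, and $runCase is a hypothetical callable that executes one case and returns true on a pass (in the article's program that role would be played by a call to resHasTarget()).

```php
<?php
function run_cases(array $cases, $runCase)
{
    $summary = array('pass' => 0, 'fail' => 0, 'seconds' => 0.0);
    printf("Test run started %s\n", date('Y-m-d H:i:s'));

    $runStart = microtime(true);
    foreach ($cases as $case) {
        $start   = microtime(true);
        $ok      = (bool) $runCase($case);
        $elapsed = microtime(true) - $start;

        $summary[$ok ? 'pass' : 'fail']++;
        printf("%s %s %.4f sec\n", $case['caseid'], $ok ? 'Pass' : 'FAIL', $elapsed);
    }
    $summary['seconds'] = microtime(true) - $runStart;

    printf("%d passed, %d failed, %.4f sec total\n",
        $summary['pass'], $summary['fail'], $summary['seconds']);
    return $summary;
}

// Stand-in runner for demonstration: "passes" only for the input Baker.
$demo = run_cases(
    array(
        array('caseid' => '0001', 'input' => 'Nobody'),
        array('caseid' => '0002', 'input' => 'Baker'),
    ),
    function ($case) {
        return $case['input'] === 'Baker';
    }
);
```

Returning the summary as an array leaves the caller free to write it to a text file or a database instead of the shell.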
To be honest, when I first started using PHP I was very surprised at how well it works as a language for software test automation. In general, it is best to write test automation using the same language as that used by the system under test: test a C++ application using C++, test a Java application using Java. The idea is that if you use different languages, you run into many cross-language issues which affect the validity of your test automation. But often, using the same language is just not possible. When I examined PHP's capabilities as a testing language, I was pleased to find that they are as good as any language I've worked with, and maybe even better, in some cases.
In the introduction to this article, I noted that most of the client companies I work with are currently investigating mixed-technology environments. As recently as twelve months ago, mixing Open Source and proprietary technologies usually had uneven results, but the situation has changed dramatically for the better. The machine on which I developed the techniques used in this article happily supports MySQL and SQL Server, C# and PHP, Apache and IIS, and dual boots into Linux and Windows XP. This works in PHP's favor: developers can install PHP over their existing technologies and gradually migrate. In particular, I am seeing many in-house shops start to move from ColdFusion to PHP as their programming platform of choice for Web projects.
Listing 1 (fragment)
<form name="theForm" action="simple.php" method="POST">
<p>Last name: <input type="text" name="lastname" /></p>
<p><input type="submit" value="Find Employee" /></p>
...
$search = $_POST['lastname'];
$query = "SELECT * FROM tblEmployees WHERE lastname = '"
...
echo "<td>" . $row['empid'] . " " . $row['firstname'];
echo " " . $row['lastname'] . " " . $row['email']
Listing 2 (fragment)
$socket = socket_create(AF_INET, SOCK_STREAM, 0)
    or die("Socket failed\n");
$connect = socket_connect($socket, $ipAddress, $port)
    or die("Connect failed\n");
$reqBody = $postData;
$contentLength = strlen($reqBody);
$send = $method . " " . $page . " HTTP/1.1\r\n";
$send .= "Host: localhost\r\n";
$send .= "Accept: */*\r\n";
$send .= "User-Agent: test.php test automation\r\n";
$send .= "Content-Type: application/x-www-form-urlencoded\r\n";
$send .= "Content-Length: " . $contentLength . "\r\n\r\n";
...
echo "\nBegin test run\n\n";
echo "caseid result\n";
An interesting side effect of the test automation presented in this article is that you can easily adapt the test code to create a general purpose HTTP response viewer. By placing an echo() statement inside the while loop that examines the response:
while ($receiveBuffer = socket_read($socket, 2048))
{
    echo $receiveBuffer;
}
and making a few other cosmetic changes you can view the entire response stream, as you can see in Figure 4. If you are new to programming with PHP at a low level, this is a great way to learn what is really going on with HTTP behind the scenes.
In principle, testing PHP Web applications is similar to traditional API (Application Programming Interface) or Unit testing. But because PHP applications are client-server based, there are additional connectivity issues. This means you will want to liberally use error checking. As usual for instructional articles, I removed all error checking in the code presented here. Based on my experience, adding exception handling code (if you're using PHP5) will double the size of your source code but is well worth the effort.
One valuable use of the technique presented in this article is to construct Developer Regression Tests (DRTs) for your PHP Web applications. DRTs are a sequence of automated tests that are run after you make changes to your application. They are designed to determine if your new code has broken existing functionality, before you check it into your version-control repository. You can also create an extensive set of test cases for a Full Test Pass.
Conclusion
In this article, I have shown you how easy it is to create test automation systems written in PHP for your applications. As PHP matures, testing will become more important and the ability to write automated tests will become more useful than it already is. And because PHP works so well in a mixed technology environment, the ability to write PHP test automation is a valuable addition to your skill set, no matter what platforms you use.
To Discuss this article:
Figure 4