
DOCUMENT INFORMATION

Basic information

Title: Matchmaker Make Me a Match: An Introduction to Regular Expressions
Institution: PHP Architecture
Subject: PHP Programming
Type: article
Year of publication: 2004
City: Toronto
Pages: 68
File size: 3.44 MB


MARCH 2004 VOLUME III - ISSUE 3

The Magazine For PHP Professionals

Plus:

Explore your HTML code with Tidy

Testing Automation With PHP

Using the Amazon.com API


Graphics & Layout

John Coggeshall, John Holmes,

Dr James McCaffrey, George Schlossnagle, Alessandro Sfondrini, Chris Shiflett, Andrea Trasatti

php|architect (ISSN 1709-7169) is published twelve times a year by Marco Tabini & Associates, Inc., P.O. Box 54526, 1771 Avenue Road, Toronto, ON M5M 4N5, Canada.

Although all possible care has been placed in assuring the accuracy of the contents of this magazine, including all associated source code, listings and figures, the publisher assumes no responsibilities with regards of use of the information contained herein or in all associated material.

Contact Information:

General mailbox: info@phparch.com

Editorial: editors@phparch.com

Subscriptions: subs@phparch.com

Sales & advertising: sales@phparch.com

Technical support: support@phparch.com

Copyright © 2003-2004 Marco Tabini & Associates, Inc.

— All Rights Reserved

I'm sure you're familiar with the Chinese proverb "may you live in interesting times." Even though I rarely think of my professional life as dull and boring, the last month has been particularly exciting. As promised in my exit(0) column from last month's issue, if you look through the middle of the magazine you'll find a full report (in colour!) on the best conference I have ever attended: our very own php|cruise (forgive me for a bit of professional pride; eight months of prep work will do that to you). Things went so well that we're working on another cruise, this time going to Alaska in the fall, and plan on making php|c an annual event for many years to come.

All good things come to an end, of course, and, once back from the cruise, it's back to work. Luckily for us, work means bringing you yet another great issue of php|architect, and I personally consider that another good thing. Like every month, we've got some great content waiting for you in the following pages.

The one I'm most proud of is George Schlossnagle's regular expressions article. Regexes are something that pretty much every programmer has to deal with, but that very few among us really know how to use. In fact, I've seen developers write extremely complicated code with the explicit purpose of getting around having to use a regular expression, and that is just plain wrong. After all, using the best solution for each problem is what being a programmer is all about.

Thus, I approached George about writing an article on regular expressions, and it quickly became evident that one article would not even come close to covering the complexity of regex. Now, everyone knows that I always try my best to stay away from multi-part articles for a multitude of reasons, but in this case I felt that the topic more than deserved our attention over multiple issues and, therefore, George's article is the first in a series of three. Over the next three months, he will take you for a ride from the basics (which are covered in this issue) to the more complex and exotic aspects of regular expressions, thus hopefully providing the PHP world with a definitive guide to this topic.

If regular expressions are not your bag, one of the other topics covered in this month's issue is certain to tickle your fancy. For example, you may want to read Alessandro Sfondrini's excellent article on using the Amazon.com API directly from your PHP website, or Andrea Trasatti's look at the world of WAP. As you can probably imagine, both Andrea and Alessandro hail from my native Italy, and that alone makes their articles more than worth reading. There, my monthly heritage tax is now paid up!

As I'm sure you've noticed, in the past few months we've been publishing material about testing practices quite frequently. As larger and larger projects are devel-

(Continued on page 8)

NEW STUFF

PHP 5.0 Beta 4

PHP.net has announced the release of PHP 5.0 Beta 4. This fourth beta of PHP 5 is also scheduled to be the last one (barring unexpected surprises, which did occur with Beta 3). This beta incorporates dozens of bug fixes since Beta 3, rewritten exceptions support, improved interfaces support, new experimental SOAP support, as well as lots of other improvements, some of which are documented in the ChangeLog. Some of the key features of PHP 5 include:

• PHP 5 features the Zend Engine 2.

• XML support has been completely redone in PHP 5; all extensions are now focused around the excellent libxml2 library (http://www.xmlsoft.org/).

• SQLite has been bundled with PHP. For more information on SQLite, please visit their website.

• A new SimpleXML extension for easily accessing and manipulating XML as PHP objects. It can also interface with the DOM extension and vice-versa.

• Streams have been greatly improved, including the ability to access low-level socket operations on streams.

PHP.net also announced the release of PHP 4.3.5 RC3. This will be the last release candidate prior to the final release, so please test it as much as possible.

For more information visit http://www.php.net/

Zend Optimizer 2.5.1

Zend has announced the release of Zend Optimizer 2.5.1.

Zend.com describes the Optimizer as: "a free application that runs the files encoded by the Zend Encoder and Zend SafeGuard Suite, while enhancing the running speed of PHP applications.

Benefits:

• Enables users to run files encoded by the Zend Encoder

• Increases runtime performance up to 40%."

Get more information from Zend.com.

Zend Launches New PHP5 In-Depth Articles Section

Zend Technologies have launched a new version of their Developer's Corner on the zend.com website. PHP5 In-depth showcases articles from many well-known PHP authors on the new features of PHP. For more information, check out http://www.zend.com/php/in-depth.php

DEV Web Management System

Dev is a small but powerful and very flexible content management system for web portals. The system is licensed as freeware under the terms of the GNU GPL license. It is absolutely free for non-commercial and commercial use. It is based on PHP 4 and MySQL.

This project allows the user to publish articles, evaluate articles by taking a poll, publish short news and create back-ends in XML format, manage download lists, manage advertisements on your site, be informed about events on your site, create system reports and export them into MS Excel or XML format and much

"Welcome to this new version, aimed at stabilization of the 2.5 branch. Meanwhile, work is continuing on the new 2.6 branch. PhpMyAdmin is a tool written in PHP intended to handle the administration of MySQL over the Web. Currently it can create and drop databases, create/drop/alter tables, delete/edit/add fields, execute any SQL statement, manage keys on fields."

For more information visit: www.phpmyadmin.net

PhpSQLiteAdmin 0.2

PhpSQLiteAdmin is a Web interface for the administration of SQLite databases.

Version 0.2 comes with some new features and a lot of internal cleanups and refactoring. PhpSQLiteAdmin is still in an early stage of development. It comes free of charge and without warranty.

For more information visit: www.phpsqliteadmin.net

phpMyEdit 5.4

phpMyEdit generates PHP code for displaying/editing MySQL tables in HTML. All you need to do is to write a simple calling program (a utility to do this is included). It includes a huge set of table manipulation functions (record addition, change, view, copy, and remove), table sorting, filtering, table lookups, and more.

Several minor bugs were fixed. A few new options were added. Major features include tabs support, the ability to specify SQL expressions for fields when writing to the database, the ability to define new triggers, and more. All eval() calls were removed due to security and performance reasons. Some code was optimized. Several parts of the documentation were updated. A lot of new language files were added and updated.

For more information visit: http://platon.sk/projects/phpMyEdit/

Looking for a new PHP Extension? Check out some of the latest offerings from PECL.

ionCube Releases New Encoder

UK-based ionCube has released a new version of their compiled code PHP encoding tools. New features include a choice of ASCII or binary encoded file formats and optional support for OpenSource extensions such

Editorial: Continued from page 5

That's it for this month: time for me to go tend to my sunburn while I start working on the next issue. Until then, happy readings!

In the article "Exploring the Google API with SOAP," which appeared in the January issue of php|a, I showed you what SOAP is and how it can be used together with PHP. We used a SOAP-encoded document to perform a search using the Google Engine, then we parsed the response to display the results on our website. To perform these operations, we wrote an application from scratch; this approach can be great to understand how SOAP works, but when a customer asks you to implement a SOAP-based feature in an application, you can't waste your time in that way.

In this case, there are some libraries that will make your coding quicker and easier: one of these is NuSOAP, which allows you to send Remote Procedure Calls (RPCs) over HTTP.

This article will show you how we can use the Amazon.com API with NuSOAP to perform searches and display product details, without having to sort through a lot of SOAP syntax: if you have had an opportunity to read my previous article, you will notice how much shorter an application written this way is, and how much time can actually be saved by using this method.

What are Amazon Web Services?

Amazon.com is one of the most widely known on-line shops. You can find and buy almost everything, from books to toys to power tools. Several years ago, Amazon launched a very successful affiliate program, which they later expanded in their Web Services program.

Why would you want to use Amazon Web Services (AWS)? For instance, if your website is about Literature, you may want to allow your users to look for books in the (huge) Amazon database directly from your pages, without redirecting them to Amazon.com. You can provide them with a detailed description of each book and, when they decide to buy one, you can add it directly to their Amazon shopping cart. When the time comes to complete the purchase, you can redirect the user directly to the Amazon website, where the checkout process actually takes place and you receive credit for your affiliate referral.

It is important to understand that AWS are designed only to retrieve information about products and create, as well as populate, shopping carts, not to perform payments: this must be done directly on the Amazon website, the reason being, of course, one of security for the customer's personal information. In any case, a significant portion of the transaction is performed from your website. This results in a benefit both for you and for your users, since you can offer your customers a nearly seamless user experience and collect your referral fees.

Access to AWS, as well as to the affiliate program, requires you to register with the Amazon Associates Program and obtain an Associates ID, which will identify each purchase sent through our website.

REQUIREMENTS

Other software: NuSOAP 0.6.4
Code Directory: webs-nusoap

Have you ever wanted to add an online shop to your website but gave up on the idea because you lack the expertise and resources to run it? Using SOAP, you can connect to Amazon Web Services and create a PHP application to remotely browse and search products, add them to Amazon shopping carts or wish lists and, yes, you can even earn money on every purchase performed from your site.

Getting started

Before we start coding, I recommend you download the AWS Software Developer's Kit from

http://www.amazon.com/gp/browse.html/?node=3434641

It contains the License Agreement, a guide (you should have a look at it to familiarize yourself with the concepts associated with the program) and some code samples, including a few written in PHP!

As I mentioned earlier, you will also have to apply for your Developer's token, an alphanumerical string needed for performing searches and purchases. To do so, you have to visit

https://associates.amazon.com/exec/panama/associates/join/developer/application.html

and accept the AWS terms and conditions.

To write our application, we will take advantage of a PHP library called NuSOAP, which is really just a group of "userland" classes written in PHP and designed to allow developers to manage SOAP web services. It will speed up our coding by allowing us to focus on functionality rather than on the communication protocols. NuSOAP is distributed under the LGPL license, and can be downloaded here:

http://dietrich.ganx4.com/nusoap/

To add NuSOAP support to our project, we simply have to include nusoap.php in our PHP scripts using require(). Performing a Remote Procedure Call (RPC) is simple. Look at this example:

require("nusoap.php");

$params = array('name' => 'value');

$s = new soapclient("http://server/file.wsdl", true);

$result = $s -> call('method', $params);

First of all, we include NuSOAP and we store the parameters we will use for the RPC in the $params associative array. We then create a new soapclient object, passing two arguments to the constructor: the SOAP server address and a boolean value that indicates whether the server uses a WSDL document. WSDL (Web Services Description Language) documents contain information about a web service, as well as its methods and properties. They are often used by web service providers, including Amazon.

Once we have created the object, all we have to do is to actually execute the RPC by invoking the call() method and specifying the remote method name and the parameters to be passed (contained in $params in our case). NuSOAP automatically fetches the results of the call and stores them in the $result array.

Since we are working with a WSDL-based server, NuSOAP can actually create a "proxy" PHP class capable of providing a better interface to our scripts. Once we have instantiated $s, we can also invoke a remote method in this way:

Parameter Name | Type | Description
keyword | String | The keyword on which the search should be performed.
page | (type missing) | The page number. AWS returns ten results per page, so page 1 will contain results 1 through 10, page 2 results 11 through 20, and so on.
(name missing) | (type missing) | Specifies the ID of the store to browse. Each Amazon store has its unique ID, which indicates what kind of products it sells (e.g.: books, music, dvd, vhs, etc.). You can find a complete list of all the IDs available in the AWS documentation.
devtag | String | The Developer Token you have received from Amazon.

Figure 2

Result Datum | Type | Description
Url | String | The URL of the product page for this item on Amazon
Asin | String | The Amazon.com Standard Item Number for this product
ProductName | String | The name of the product (in our case, the title of the book)
Catalog | String | The category of the product (e.g.: books)
Authors | String | The name(s) of the author(s)
ReleaseDate | String | The release date, in human-readable format (e.g.: "23 February, 1976")
Manufacturer | String | The name of the product's manufacturer (the publisher in our case)
ImageUrlSmall | String | A pointer to the product's "small" image on the Amazon website
ImageUrlMedium | String | Same as above, for a slightly larger image
ImageUrlLarge | String | Same as above, but for an even
UsedPrice | String | The product's price for used copies

This can be useful to simplify our code: first, we create a proxy client, $proxy; any subsequent RPCs to methods specified in the WSDL can be performed using the proxy, without having to use the NuSOAP call() method again. In our application, we will use proxies to work with AWS.

Designing the application

Now that we've laid down some ground rules, it's time to decide in detail what the goals of our application are going to be. Since we're all PHP fans, our example website will be about PHP and, therefore, we'll want to allow our users to buy books on this topic from Amazon.

The first thing that we need is a search page: users will be able to search for a particular keyword (or for a set of keywords) and the page will display some basic information about each book that matches the criteria, such as its title, an image, the publishing company, author or authors and price. We also have to provide a way to browse the results, since AWS calls only return ten results per call.

The search page should also contain a link for each product to another page on our website that will contain a detailed description of the book, including any user reviews and comments. From here, the users will be able to continue their purchase on Amazon.com or add the product to their wish lists.

The search page

If you have had an opportunity to read through the AWS documentation, you have probably discovered that searches by keyword can be performed using the KeywordSearchRequest() method, which requires the parameters shown in Figure 1.

Assuming that the call will be successful, the server will return an array containing several items:

• The TotalResults element, which indicates the number of total results returned by the query

• The TotalPages element, which provides the number of pages available in the search result

• The Details sub-array, which contains a set of data about each search result matching our search criteria that is included in the page we have requested. Given that a search only returns a maximum of ten items per page, you can expect that this array will contain no more than ten elements. The lite search mode returns the data shown in Figure 2.
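The shape of the response described above can be sketched with plain PHP arrays. The following is a minimal illustration, not part of the original listing: the sample values are invented, and page_range() is a hypothetical helper that relies only on the ten-results-per-page rule stated in the text.

```php
<?php
// Invented sample shaped like the response described above (not real AWS output).
$results = array(
    'TotalResults' => 22,
    'TotalPages'   => 3,
    'Details'      => array(
        array('Asin' => '0000000001', 'ProductName' => 'Example Book One'),
        array('Asin' => '0000000002', 'ProductName' => 'Example Book Two'),
    ),
);

// Hypothetical helper: returns array(first, last) result numbers shown on a page,
// assuming a fixed page size (ten, as stated in the article).
function page_range($page, $countOnPage, $perPage = 10)
{
    $first = ($page - 1) * $perPage + 1;
    return array($first, $first + $countOnPage - 1);
}

// Third page of the sample: two items, so results 21 through 22.
list($first, $last) = page_range(3, count($results['Details']));
echo "Showing results $first-$last of {$results['TotalResults']}\n";
```

Counting the Details entries, rather than assuming ten, keeps the range correct on the last page, which may be shorter.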

<form action="<?=$PHP_SELF?>" method="GET">
  <input type="text" name="keyword" value="" />
  <input type="hidden" name="page" value="1" />
  <input type="submit" name="button" value="Search!" />
</form>
<?php
if (empty($_GET["keyword"])) // If the form hasn't been submitted
    exit;                    // Stops the execution

require("nusoap.php");

$client = new soapclient("http://soap.amazon.com/schemas2/AmazonWebServices.wsdl", true);
$proxy = $client->getProxy(); // Creates a WSDL client and a proxy

$param = array(
    'keyword' => $_GET["keyword"],
    'page' => $_GET["page"],

[... part of the listing is missing in the source ...]

if (empty($results["Details"])) // Checks whether there are results
    die("<h3>No results found for &quot;" . $_GET["keyword"] . "&quot;.</h3>");

echo "<h3>Searched Amazon.com for &quot;" . $_GET["keyword"] . "&quot; - page "
    . $_GET["page"] . " of " . $results["TotalPages"] . "</h3>";

foreach ($results["Details"] as $res) // Prints each product's details
    echo "<img src='" . $res["ImageUrlMedium"] . "' align='left' /><br/>\n"
        . "<a href='details.php?asin=" . $res["Asin"] . "'><b>" . $res["ProductName"] . "</b></a><br /><br />\n"
        . "<b>Authors</b>: " . @implode(', ', $res["Authors"]) . "<br />\n"
        . "<b>Publishing Company</b>: " . $res["Manufacturer"] . "<br />"
        . "<b>List Price</b>: " . $res["ListPrice"] . " - <b>Our Price</b>: "
        . $res["OurPrice"] . " - <b>Used Price</b>: " . $res["UsedPrice"] . "<br /><br /><br />\n\n";

if ($_GET["page"] > 1) // Prints a link to the previous page, if any
    echo "<a href='$PHP_SELF?keyword=" . $_GET["keyword"] . "&page=" . ($_GET["page"] - 1) . "'>Previous Page</a>&nbsp;\n";
if ($_GET["page"] < $results["TotalPages"]) // Prints a link to the next page, if any
    echo "&nbsp;&nbsp;<a href='$PHP_SELF?keyword=" . $_GET["keyword"] . "&page=" . ($_GET["page"] + 1) . "'>Next Page</a>";
?>

Listing 1

As you can see, the KeywordSearchRequest() method returns quite a few pieces of information for every result item, although, of course, we don't have to output all of them on our site. If you look at Listing 1 (the source for our search page) you'll see that the very first part of the file is nothing more than a simple HTML form, which contains an input text box for the keyword and a hidden field that forces the page number to 1. This way, a new search will automatically start from the first page of results.

The form uses the GET method because we need to use links for the "Next Page" and "Previous Page" operations (something like page.php?keyword=blah&page=2). Naturally, you could also use POST, but in that case it would be much more difficult for someone to create a direct link to your search results, which could, in theory, prevent you from completing some sales.
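Because the keyword travels in the query string, it is worth encoding it when the "Next Page" and "Previous Page" links are built. The sketch below is one possible way to do this and is not taken from the original listing: page_link() and the script name search.php are hypothetical, while http_build_query() and htmlspecialchars() are standard PHP functions.

```php
<?php
// Hypothetical helper: builds the link to a given results page.
// http_build_query() URL-encodes the values; the separator is passed explicitly
// so the result does not depend on the arg_separator.output setting.
function page_link($script, $keyword, $page)
{
    $query = http_build_query(
        array('keyword' => $keyword, 'page' => (int) $page),
        '',
        '&'
    );
    // Escape the whole URL for use inside an HTML attribute (& becomes &amp;).
    return htmlspecialchars($script . '?' . $query, ENT_QUOTES);
}

echo "<a href='" . page_link('search.php', 'php & mysql', 2) . "'>Next Page</a>\n";
```

With this approach a keyword that contains spaces or ampersands still produces a link that points to the intended search.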

The second part of the script contains the actual PHP code. First of all, an if statement stops the execution of the script if $_GET["keyword"] is empty. Otherwise, we include NuSOAP and create a SOAP client by passing the URI of the *.wsdl file for Amazon (which is provided in AWS documentation) and the boolean true to indicate to the constructor of the soapclient() class that the SOAP client features WSDL support. We also create a proxy to call AWS methods directly as we have seen in the first part of the article.

The parameters needed to invoke KeywordSearchRequest() are stored in the $param array; the first two (the keyword and the page number) are to be found in the $_GET superglobal, since they change each time we perform or browse a search, while the others are constant and, therefore, we hardcode them in our script. Remember to insert your developer token

[...] ry of the search: the keyword, the current page number and total page count, followed by details about each product in the current result page. These are actually produced by a simple foreach loop, which browses the $results["Details"] array, echoing the title of each book, a medium-size image, its authors, publishing company and prices. We will also provide a link to another page, details.php, which contains further information on each book. The link contains a reference to the product's ASIN (the Amazon identifier for each product) in order to make the application able to retrieve the correct product from Amazon's catalogue with another RPC.
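The loop described above echoes values that come from the user and from a remote service, so a cautious variant would escape them before output. The following sketch is an illustration only: render_result() is a hypothetical helper, the array keys mirror those used in Listing 1, and the (array) cast is there because Figure 2 lists Authors as a string while the listing treats it as an array.

```php
<?php
// Hypothetical helper: renders one search result as HTML, escaping every value.
function render_result($res)
{
    // Small closure to escape text for HTML output.
    $e = function ($s) {
        return htmlspecialchars((string) $s, ENT_QUOTES);
    };
    // Accept either a single author string or an array of authors.
    $authors = isset($res['Authors']) ? (array) $res['Authors'] : array();

    return "<a href='details.php?asin=" . rawurlencode($res['Asin']) . "'><b>"
        . $e($res['ProductName']) . "</b></a><br />\n"
        . "<b>Authors</b>: " . $e(implode(', ', $authors)) . "<br />\n";
}

// Invented sample data, for illustration only.
echo render_result(array(
    'Asin'        => '0000000001',
    'ProductName' => 'PHP <b>& MySQL',
    'Authors'     => array('A. Author', 'B. Writer'),
));
```

The same idea applies to the keyword echoed in the page heading, which in Listing 1 is printed as received.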

The last part of this page allows the user to browse the results: if the current page isn't the first one (Page 1), the script prints a link to the previous one and, if it isn't the last page (based on the information returned by our AWS call), it prints a link to the next one.

Figure 3 shows our search page at work.

FEATURE: Connecting to Amazon.com Web Services with NuSOAP

Rating | Integer | The rating of the product in this review
Summary | String | A summary of the review
Comment | String | The full review itself

(name missing) | (type missing) | The type of search. In this case, we'll choose heavy, since we want all the information available on a particular book.

Result Datum | Type | Description
SalesRank | Integer | The product's sales ranking
Lists | Array of Strings | The names of the ListMania lists that contain the product
BrowseList | Array of Arrays | Indicates the product categories in which the product can be found. Its contents look like this: BrowseList => Array ( [0] => Array ( BrowseName => PHP )
(name missing) | (type missing) | [...] takes to be shipped
Reviews | (type missing) | This array contains information about the customer reviews associated with the product. It includes three elements: AvgCustomerRating, which indicates the average customer rating for the product, TotalCustomerReviews, which contains the number of customer reviews available and CustomerReviews, which is an array that contains the three most recent reviews (you can find the contents of this array in Figure 6).
SimilarProducts | Array of Strings | Contains the ASINs of products that are similar to this one.

The Product Detail Page

Now that we are done with the first part of the application, it's time to move on to the product detail page, which will show advanced information about a particular book. The AWS method we need in this case is AsinSearchRequest(), which needs the parameters shown in Figure 4. Just like before, the response that we get back from Amazon is an array of arrays, except that, in this case, we will simply concern ourselves with the first result set, since the ASIN uniquely identifies one product. Our data, therefore, will be stored in $results['Details'][0], which, in turn, will contain the information shown in Figure 5. As you can see, some of the values returned are the same as the results of the KeywordSearchRequest() call that we used in Listing 1, while some others, like the customer reviews, are more appropriate for a detailed product page.

Speaking of the product page, Listing 2 contains the code for details.php. First, we check $_GET["asin"]; if it is empty, the program displays a warning and exits. In a more complete application, you may want a slightly more verbose explanation of what went wrong, or perhaps an automatic redirection to the search page.

If we have an ASIN, we include the NuSOAP library, then create a SOAP client and proxy as we did in the previous page. Please note that we have to use sprintf() to transform the ASIN into a ten-character string, since AWS requires it to be submitted in that format (as an alternative, you could use str_pad() to ensure that the string is ten characters long).
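The str_pad() alternative mentioned above can be sketched as follows. One reason to consider it: the %d conversion in sprintf() treats its argument as an integer, so an identifier that contains a letter would be altered, whereas str_pad() works on the string as it is. normalize_asin() is a hypothetical helper and the sample identifiers are made up.

```php
<?php
// Hypothetical helper: left-pads an ASIN with zeros to ten characters
// without converting it to a number.
function normalize_asin($asin)
{
    return str_pad(trim((string) $asin), 10, '0', STR_PAD_LEFT);
}

// Made-up identifiers, for illustration only.
echo normalize_asin('672325256'), "\n";  // prints 0672325256
echo normalize_asin('067232525X'), "\n"; // prints 067232525X
```

A string that is already ten characters long, or longer, is returned unchanged.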

This time, we only need to pass the ASIN and specify heavy as the search type. Once the RPC has been executed, we retrieve the results and print them out, using a foreach loop to cycle through the user reviews.
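A product may come back without any reviews, in which case looping over a missing array would raise a warning. The sketch below shows one way to guard against that. It is an illustration under assumptions: summarize_reviews() is a hypothetical helper, $details stands for $results["Details"][0], and the sample data is invented.

```php
<?php
// Hypothetical helper: returns a plain-text summary of the reviews block.
function summarize_reviews($details)
{
    // empty() is safe here even when the nested keys do not exist.
    if (empty($details['Reviews']['CustomerReviews'])) {
        return "No customer reviews available.\n";
    }
    $out = 'Average rating: ' . $details['Reviews']['AvgCustomerRating'] . "\n";
    foreach ($details['Reviews']['CustomerReviews'] as $review) {
        $out .= 'Rating ' . $review['Rating'] . ': ' . $review['Summary'] . "\n";
    }
    return $out;
}

// Invented sample shaped like the Reviews element described in Figure 5.
$details = array(
    'Reviews' => array(
        'AvgCustomerRating' => 4.5,
        'CustomerReviews'   => array(
            array('Rating' => 5, 'Summary' => 'Great', 'Comment' => 'Very useful.'),
        ),
    ),
);
echo summarize_reviews($details);
```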

The final touch in our application consists of providing a link back to the Amazon website in order to make it possible for our users to purchase a product. You can't do much selling by just showing which products are available!

The AWS documentation specifies that an HTTP form must be set up for the purpose of submitting the purchase information over to Amazon.com. This form (you can look at the one in Listing 2 for an example) uses the POST method, and its action attribute is really nothing more than a page on Amazon.com that contains the ASIN of the product that must be added to the user's shopping basket. A few additional hidden fields provide the ASIN, the Associates ID and the Developer's token. The form supports two different buttons: one adds the product to the user's basket, while the other adds it to his wishlist.

<?php
if (empty($_GET["asin"]))
    die("<h3>No ASIN specified</h3>");

require("nusoap.php");
$_GET["asin"] = sprintf("%010d", $_GET["asin"]);

$client = new soapclient("http://soap.amazon.com/schemas2/AmazonWebServices.wsdl", true);
$proxy = $client->getProxy(); // Creates a WSDL client and a proxy

[... part of the listing is missing in the source ...]

<h1><?=$results["Details"][0]["ProductName"]?></h1>
<img src="<?=$results["Details"][0]["ImageUrlLarge"]?>" align="left" height="350" />
<b>Authors:</b> <?=@implode(', ', $results["Details"][0]["Authors"])?><br /><br />
<b>Published by</b> <?=$results["Details"][0]["Manufacturer"]?>
<b> on</b> <?=$results["Details"][0]["ReleaseDate"]?><br /><br />
<b>List Price</b>: <?=$results["Details"][0]["ListPrice"]?> -
<b>Our Price</b>: <?=$results["Details"][0]["OurPrice"]?> -
<b>Used Price</b>: <?=$results["Details"][0]["UsedPrice"]?><br /><br /><br />
<!-- Form to purchase on Amazon.com -->
<form method="POST" action="http://www.amazon.com/o/dt/assoc/handle-buy-box=<?=$_GET["asin"]?>">
<input type="hidden" name="asin<?=$_GET["asin"]?>" value="1">
<input type="hidden" name="tag-value" value="webservices-20">
<input type="hidden" name="tag_value" value="webservices-20">
<input type="hidden" name="dev-tag-value" value="YOUR-DEV-TOKEN">
<input type="submit" name="submit.add-to-cart" value="Buy From Amazon.com">&nbsp;&nbsp;
<input type="submit" name="submit.add-to-registry.wishlist" value="Add to Wish List">
</form>
<!-- End Form -->
<b>ISBN:</b> <?=$results["Details"][0]["Isbn"]?><br /><br />
<b>Availability:</b> <?=$results["Details"][0]["Availability"]?><br /><br /><br />
<b>Sales Ranking:</b> <?=$results["Details"][0]["SalesRank"]?><br /><br />
<b>Average customer rating:</b> <?=$results["Details"][0]["Reviews"]["AvgCustomerRating"]?>
<br /><br /><h2>Read user reviews:</h2>
<?php
foreach ($results["Details"][0]["Reviews"]["CustomerReviews"] as $res)
    echo "<h3>" . $res["Summary"] . "</h3>"
        . "<b>Rating: </b>" . $res["Rating"] . "<br /><br />" . $res["Comment"] . "<br /><hr />";
?>

Listing 2

Further Improvements

As you have probably noticed, writing a SOAP-based application using a library like NuSOAP is much faster than developing your own SOAP classes. If you have read my article about the Google API that appeared in the January issue of php|a, you probably know what I am talking about. This means that you can develop rather complex applications without having to waste time dealing with the nitty-gritty details of the underlying protocol; in fact, we didn't even write any SOAP code for our Amazon application: NuSOAP did it all for us.

Naturally, the code that I have introduced here is very basic and could stand to gain from some improvements. For instance, Amazon Web Services allow you to manage a remote shopping cart or wish list by adding and removing items. The very last part of the purchase (the one where money changes hands) must still take place on Amazon.com, but you can let the user perform most of the normal operations associated with an e-commerce website without leaving your website. However, do keep in mind that if you choose to manage the user's shopping cart remotely, you can't change it once you've submitted it to Amazon; this is done to protect the end user from fraudulent transactions. You can check out the AWS documentation for more details on this topic. You'll find that it's not complicated at all.

Depending on your needs, you may choose to perform a different kind of search operation on your website: by similar products, by author, by ISBN, by manufacturer, and so on. You may also want to browse a "node", or product category (e.g. "programming", "web", etc.), directly, without performing a search. It goes without saying that all this depends on what your goals are.

If your Amazon-based shop becomes very popular, you may decide to join the Amazon Associates Program, an affiliate system that pays you commissions on every sale. Be careful, however: your application must not send more than one request per second to Amazon, and even if you provide an error handling system, you must not immediately retry a request if the previous one has failed.

You should also provide a caching system, in order to store the data needed by your site without going back and forth to AWS for every request; you can check out Bruno Pedro's excellent article in the February 2004 issue of php|a for more ideas on caching data from your PHP scripts. If you choose to do so, don't forget that you can't keep your data cached for more than twenty-four hours.
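The two limits mentioned here (at most one request per second, and no cached data older than twenty-four hours) can be sketched as a small file cache wrapped around whatever function performs the real AWS call. This is only a sketch under our own assumptions: the names `throttle` and `cached_call`, the cache directory and the callback are hypothetical, and none of them belong to NuSOAP or AWS.

```php
<?php
// Hypothetical helpers: the names and the cache layout are assumptions.

// Sleep just long enough that calls are at least $minInterval seconds apart.
function throttle(float $minInterval = 1.0): void
{
    static $last = 0.0;   // time of the previous call, kept between calls
    $wait = $minInterval - (microtime(true) - $last);
    if ($wait > 0) {
        usleep((int) ceil($wait * 1000000));
    }
    $last = microtime(true);
}

// Return the cached value for $key if it is younger than $ttl seconds;
// otherwise call $fetch (throttled), store its result and return it.
function cached_call(string $key, callable $fetch, string $dir, int $ttl = 86400)
{
    $file = $dir . '/' . md5($key) . '.cache';
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        return unserialize(file_get_contents($file));
    }
    throttle();
    $value = $fetch();
    file_put_contents($file, serialize($value), LOCK_EX);
    return $value;
}
```

In a real page, `$fetch` would be a closure around the NuSOAP call; a failed request should be reported to the user rather than retried at once.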

Finally, please keep in mind that the examples shown in this article always referred to Amazon.com, the American website. AWS is also available for Amazon.co.uk, Amazon.de and Amazon.co.jp, but you have to modify the URIs in the script, changing the specifications in the WSDL document from soap.amazon.com/ to soap-eu.amazon.com/, and so on. You will also have to add the locale parameter to your RPC invocations; its value can be set to uk, de or jp, depending on which Amazon website you are referring to.

FEATURE: Connecting to Amazon.com Web Services with NuSOAP

Figure 3

I'm Outta Here

Amazon.com Web Services is a powerful tool that you can use to add e-commerce functionality to your site without going to the expense of developing an online store of your own and stocking all the merchandise. Even if you can't create a complete online shop using AWS (because the purchase must be completed on the Amazon website), you can still give your users a customized shopping experience that relies on the practically limitless resources of one of the world's most popular e-commerce websites.

The sample application that I showed you in this article is quite simple: if you plan to use it in a production environment, especially if your site has a lot of traffic, you should probably consider implementing features like error handling and caching in order to prevent problems with the Amazon servers. Adding these elements to your application may require some extra work, but it could all pay off if you enjoy decent traffic and join the Amazon Associates Program.

Perhaps most importantly, I hope to have given you a good idea of how much a SOAP library (in this article we have chosen NuSOAP, but there are other packages, like PEAR::SOAP) can simplify the creation of a complex application: write a few lines of code to perform a Remote Procedure Call and you're practical-

To Discuss this article:

http://forums.phparch.com/130

Alessandro Sfondrini is a young Italian PHP programmer from Como. He has already written some online PHP tutorials and published scripts on the most important Italian web portals. You can contact him at giu_ale2@hotmail.com.


Matchmaker, Matchmaker Make Me A Match

An Introduction to Regular Expressions

by George Schlossnagle

REQUIREMENTS

PHP: Any
OS: Any
Applications: N/A
Code Directory: match-regex

A quick search for the words "hate" and "regular expressions" on your favourite search engine is likely to bring up thousands upon thousands of hits. While most developers recognize the usefulness of regular expressions (and many can't do without them once they have figured out how regexes work), their use remains something of a black-magic art, right up there with hypnosis and session management. Despite looking complicated, however, regular expressions are much easier to work with than most people are willing to admit.

Regular expressions (commonly known as regexes) are a powerful tool for pattern matching and text manipulation. A typical problem that pulls people into learning regular expressions is text munging: you have a string of text and you need to replace portions of it based on certain rules. For instance, you might want to obfuscate all the email addresses in a block of text so that email addresses like george@example.com get translated to the form george [at] example [dot] com. Regular expressions are the tool for the job, and provide a powerful and deep syntax for handling tasks like these.

A Few Myths about Regexes

Before we get started, we should dispel a few popular myths about regexes:

Myth: Regular expressions are slow.

Truth: Regular expressions can be slow, but they don't need to be. The main regular expression library used by PHP (called PCRE and consisting of the preg family of functions) is quite fast and also quite powerful. This power means that it is easy to write a short regular expression that performs a lot of work, and performing a lot of work with any tool can be slow.

Myth: You should use basic string functions instead of regular expressions.

Truth: Regular string functions (for example strstr or strtok) are (marginally) faster than a regular expression that accomplishes the same task. That having been noted, this myth often leads to people implementing complicated string parsers using string matching functions where a single regular expression would do the trick. The PCRE library will always match complex patterns faster than a parser you implement on your own.

Alternatives to the PCRE Functions

PHP supplies some alternatives to the PCRE functions. The most direct competitor is the POSIX regular expression library that consists of ereg, ereg_replace and others. We won't be looking at the POSIX regular expression functions because the PCRE library provides a broader pattern-matching facility than its POSIX counterpart and the PCRE library is about 30% faster on average. The other option is to perform string matching with the standard string functions. As noted above, the string functions are faster on the tasks they were designed for (finding specific characters or substrings), but are not an appropriate fit for anything but the simplest patterns.

Your First Regex

The simplest regex is a match against a static string. To determine if the string 'george@example.com' is present in a piece of text, we can use the following code:
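A minimal sketch of such a static match follows. The sample text and the variable names are our own; only the address comes from the article.

```php
<?php
// Sample text of our own choosing.
$text = 'Please send feedback to george@example.com before Friday.';

// The pattern sits between the two slashes; the text to search comes second.
// preg_match() gives 1 when the pattern is found and 0 when it is not.
$found = preg_match('/george@example.com/', $text) === 1;

print $found ? "Address found\n" : "Address not found\n";
```

Strictly speaking, the unescaped dot in this pattern matches any character, not just a literal '.', which is harmless here but worth remembering once patterns get more specific.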

Despite its simplicity, this example illustrates the basic syntax of a regex match. The regex itself is the first parameter, and is contained within slashes (/). The second parameter is the text you want to test the pattern against. The preg_match function returns 1 (which PHP treats as true) if the match succeeds, and 0 (treated as false) if it fails. Using slashes to delimit regular expressions is a convention (taken from the UNIX utility awk), but is not necessary: you can actually use any character that is not alphanumeric, a backslash or whitespace. Alternative delimiters are convenient if your pattern itself contains slashes. For instance, when dealing with file paths or URLs (both of which contain numerous slashes), it is common to use a different delimiter.

We can also perform substitutions with PCRE. To substitute 'george at nospam.example.com' for my address (a common anti-spam technique), you can use preg_replace.
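As a rough sketch of that substitution (the sample text and variable names are ours), preg_replace takes the pattern, the replacement text and the subject, and returns the rewritten string:

```php
<?php
// Sample text of our own choosing.
$text = 'Contact george@example.com for details.';

// Replace every occurrence of the address with its anti-spam form.
// The dot is escaped so that it matches only a literal '.'.
$masked = preg_replace('/george@example\.com/', 'george at nospam.example.com', $text);

print $masked . "\n";
```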

The other PCRE functions are:

• preg_grep(string pattern, array subjects [, int flag]): preg_grep applies the specified pattern to every element of subjects, returning an array consisting of those that matched. If the optional flag is set to PREG_GREP_INVERT, only those elements that did not match will be returned.

• preg_match_all(string pattern, string subject [, array matches [, int flags]]): preg_match returns only the first match found in its subject text. preg_match_all matches as many times as possible, filling matches with an array of all the matches. I will discuss this function in more detail later in the article.

• preg_replace_callback: This function makes it possible to perform very complex operations on a per-match basis through the use of callback functions. We will cover it in a future article, but some of its functionality overlaps with evaluated replacements, which are discussed in this article.

• preg_quote(string text): When using input text in a pattern, you may want to sanitize it to ensure it does not contain any regex metacharacters. preg_quote escapes all regex metacharacters in a string.

• preg_split(string pattern, string subject [, int limit [, int flags]]): preg_split performs similarly to explode, allowing us to break up the string subject into at most limit parts. Instead of splitting on a specific delimiter, preg_split allows the string to be broken based on a regex.
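Three of the functions in this list are easy to try on their own. The sketch below uses sample data of our own; the calls themselves are the standard preg_grep, preg_quote and preg_split.

```php
<?php
// preg_grep: keep only the elements that match (original keys are preserved).
$words  = array('apple', 'banana', 'cherry', 'avocado');
$aWords = preg_grep('/^a/', $words);                    // elements starting with "a"
$others = preg_grep('/^a/', $words, PREG_GREP_INVERT);  // everything else

// preg_quote: escape regex metacharacters in text destined for a pattern.
$literal = preg_quote('example.com');                   // the dot gets a backslash

// preg_split: like explode(), but the delimiter is a pattern,
// here any run of whitespace and commas.
$parts = preg_split('/[\s,]+/', 'red, green  blue,yellow');
```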

The power of regular expressions is in matching complex patterns that cannot be identified using straightforward text-search functions like strstr(). The basic components of a regular expression pattern are:

• Character Classes: Patterns rarely consist of specified letters, but rather of classes of letters. For example, 'any number' instead of a particular number, or 'any letter' instead of a particular letter.

• Grouping: Grouping allows for changing the precedence of operations as well as providing a means to extract the text you matched with a pattern.

• Enumerations: Enumerators allow you to specify how many times a character class or sub-pattern appears. This allows for convenient expression of fixed-length patterns like 'a US zipcode is 5 digits' as well as variable-length patterns such as 'a domain is a number of alphanumeric characters separated by dots'.

• Alternations: Alternations allow for multiple patterns to be combined. Unlike character classes, which allow for a position to match multiple characters, alternations allow for entire patterns to be alternatively matched. For example, a valid workday can be Monday, Tuesday, Wednesday, Thursday or Friday.

• Positional Anchors: Anchors allow you to require your pattern to start matching at a specific location in the search text, for example at the beginning or end of a line.

• Global Pattern Modifiers: Global pattern modifiers allow you to change the basic behavior of a regular expression, for example rendering it case-insensitive.

Character Classes

While it's usually easy to find a particular substring within a larger string (for example, my e-mail address in a message), it's not always easy to find a particular type of substring, like any e-mail address. To do this, you need to be able to match against a more generic pattern and not just against a static string. PCRE supplies character classes to allow you to do this; a character class allows a specific character in a search text to be matched against a range of possible characters. For example, a US phone number is composed of a three-digit area code, a three-digit exchange, and a four-digit line number, commonly delimited by a '-'. To match this pattern, you could use the following regular expression:

/\d\d\d-\d\d\d-\d\d\d\d/

The \d specifier is a built-in PCRE character class that consists of all the digits. There are a couple of things you should note about the pattern above. The first is that we have many \d's. In regular expressions, any character or character class matches only a single character unless you use an enumerator (which we'll cover later) to attach a quantity to it. Second, if you test this pattern you will find the following results:

• 555-123-4567 matches. This is correct.

• 5555-123-45678 matches. This is not correct.

The second example does not represent a valid phone number (the area code and line number are too long), but it matches because the pattern fits as shown in Figure 1.

There are a couple of ways to combat this problem. If you know that your search text should be exactly a phone number (with no leading or trailing text), you can use positional anchors to force the pattern to start at the beginning of the text and end at the end, as we'll see later on.

If the phone number might be contained in text, on the other hand, you might try to fix the pattern by requiring at least one character of leading and trailing whitespace around the number, using a pattern like:

/\s\d\d\d-\d\d\d-\d\d\d\d\s/

The \s specifier is another character class, for all whitespace (spaces, tabs, newlines, etc.). This pattern does not work in all situations, though, since if the text begins with the phone number you will be unable to match the leading \s. To handle this case, PCRE supports \b, a boundary condition that matches at the border (or boundary) between a 'word' and a 'non-word' character (these are words in the C programming language sense: letters, numbers and underscores only). \b is actually not a character class, but what is known as a 'zero-width assertion'; this means that the \b specifier does not actually match the character on the other side of the boundary, but only ensures that such a boundary exists. Putting that into our pattern we can refine it to:

/\b\d\d\d-\d\d\d-\d\d\d\d\b/
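To see how the three variants of the pattern behave, here is a small sketch that runs each of them against sample strings of our own choosing (the array keys are just labels):

```php
<?php
$plain   = '/\d\d\d-\d\d\d-\d\d\d\d/';      // no boundaries at all
$spaced  = '/\s\d\d\d-\d\d\d-\d\d\d\d\s/';  // whitespace required on both sides
$bounded = '/\b\d\d\d-\d\d\d-\d\d\d\d\b/';  // word boundaries on both sides

$results = array(
    // The plain pattern finds 555-123-4567 inside the over-long number.
    'plain on too-long number'   => preg_match($plain,   '5555-123-45678'),
    // There is no whitespace before a number that starts the text.
    'spaced at start of text'    => preg_match($spaced,  '555-123-4567 is my number'),
    // \b is satisfied by the start of the text itself.
    'bounded at start of text'   => preg_match($bounded, '555-123-4567 is my number'),
    // Extra digits leave no boundary where the pattern needs one.
    'bounded on too-long number' => preg_match($bounded, '5555-123-45678'),
);

print_r($results);
```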

Continuing the testing, we find that a number of the form "077-xxx-yyyy" matches, even though US and Canadian area codes and exchanges cannot begin with 0 or 1 (these are reserved for long distance and operator-assisted or international services). To be able to restrict the leading numbers to the allowed set, we need to be able to create our own character classes. In PCRE, these are constructed by filling a set of brackets ([ ]) with the characters we want to match. To match 2-9, we can use the character class [23456789], which is commonly shortened via a range operator to [2-9]. To use a custom character class in a pattern, you use it exactly as you would a regular character or character class. Here is the phone number pattern reworked to employ this:
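A sketch of what the reworked pattern presumably looks like: we assume it is the \b version from above with [2-9] in place of the first digit of the area code and of the exchange. The test strings are ours.

```php
<?php
// Assumed form of the reworked pattern: [2-9] guards the first digit
// of both the area code and the exchange.
$pattern = '/\b[2-9]\d\d-[2-9]\d\d-\d\d\d\d\b/';

$valid       = preg_match($pattern, 'Call 410-555-1212 today');  // ordinary number
$badAreaCode = preg_match($pattern, 'Call 077-555-1212 today');  // area code starts with 0
$badExchange = preg_match($pattern, 'Call 410-155-1212 today');  // exchange starts with 1
```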

PCRE provides six commonly used built-in character classes, described in Figure 2. Additionally, PCRE provides POSIX-style character classes for compatibility with POSIX-style regular expressions. These classes are described in Figure 3. POSIX character sets aren't used much in real-life code, which is a shame because they are often a perfect fit for problems that programmers encounter in their day-to-day work.

You can negate a POSIX character class by adding a ^ after the first colon. For instance, to match all non-letter characters, you could use the class [:^alpha:]. Note that POSIX classes are only recognized inside a bracketed character class, so in a pattern this is written [[:^alpha:]].

Negations are also available in custom character classes; for example, to match anything that is not the greater-than character (>), you can use the custom character class [^>]. Negations are very useful when you are creating regular expressions that extract quoted text or if you want to manually parse XML or HTML.

Since '-', '^' and '[ ]' have special meanings in custom character classes, if you want those actual characters to be elements of the class, you should escape them with a backslash (\). The two exceptions are the range operator -, which can appear unescaped as the last character in a class, since that is unambiguous, and the negation character ^, which can appear unescaped in any position but the first.

Grouping and Sub-Patterns

Usually, you will not only want to match a pattern, but extract data from it as well. To extract a specific part of a pattern, you surround it with parentheses. For example, to capture each part of the phone number pattern, you would add parentheses as follows:

/\b([2-9]\d\d)-([2-9]\d\d)-(\d\d\d\d)\b/

Basic Character Classes

.           Matches any character.
\w          An alphanumeric character or the underscore character.

[:alpha:]   Any letter.
[:alnum:]   Any alphanumeric character.
[:ascii:]   Any ASCII character.
[:cntrl:]   Any control character.
[:digit:]   Any digit (same as \d).
[:graph:]   Any alphanumeric or punctuation character.
[:lower:]   Any lowercase letter.
[:print:]   Any printable character.
[:space:]   Any whitespace character (same as \s).
[:upper:]   Any uppercase letter.
[:xdigit:]  Any hexadecimal 'digit'.

Figure 3

Pattern fragments grouped in this fashion are called sub-patterns. To see what they capture, you need to pass a third argument to preg_match. This argument is set by the function to an array containing the captured sub-pattern results. The zeroth element of the array is the text matched by the pattern as a whole, while the sub-pattern captures are at the offset of their pattern number. Patterns are numbered left-to-right and outside-to-inside. So in the pattern above the entire phone number is at offset 0, the area code is sub-pattern 1, the exchange is sub-pattern 2, and the line number is sub-pattern 3.

Here you can see a sample phone number being run through the regular expression:

$text = 'My phone number is 555-321-1212';
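A minimal sketch of running that sample through the capturing pattern; the variable names $pattern and $matches are our own choice:

```php
<?php
$text    = 'My phone number is 555-321-1212';
$pattern = '/\b([2-9]\d\d)-([2-9]\d\d)-(\d\d\d\d)\b/';

// The third argument is filled in by preg_match():
// index 0 holds the whole match, indexes 1 to 3 hold the sub-patterns.
if (preg_match($pattern, $text, $matches)) {
    print_r($matches);
}
```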

We can also nest patterns. If we wanted to capture the entire local part of the phone number, in addition to its component parts, the regex could be modified to be:

/\b([2-9]\d\d)-(([2-9]\d\d)-(\d\d\d\d))\b/

When we nest patterns, we move left to right and, when we hit a nested pattern, we take the outermost part first, then recursively parse its contents following the same rules. With the above pattern, the patterns are numbered as shown in Figure 4.
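The numbering rule can be checked directly. In this sketch (sample text and variable names are ours) the nested pattern is applied and the captures are listed in order:

```php
<?php
$pattern = '/\b([2-9]\d\d)-(([2-9]\d\d)-(\d\d\d\d))\b/';
preg_match($pattern, 'My phone number is 555-321-1212', $m);

// $m[0]: whole match
// $m[1]: area code          (first opening parenthesis)
// $m[2]: whole local part   (second opening parenthesis, the outer group)
// $m[3]: exchange           (first group nested inside it)
// $m[4]: line number        (second group nested inside it)
print_r($m);
```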

Sub-patterns are also extremely useful in substitutions, since they allow us access to the matched sub-patterns when performing the replacement. A captured sub-pattern can be accessed in the preg_replace replacement text by referencing its offset as \N (where N is the sub-pattern number). Here is an example that sanitizes phone numbers by obscuring their line number:

preg_replace("/\b([2-9]\d\d)-([2-9]\d\d)-(\d\d\d\d)\b/", '\1-\2-XXXX', $text);

If we run this on the text 'My phone number is 410-555-1212.', it returns 'My phone number is 410-555-XXXX.'

Note that the replacement string in the above example is single-quoted. If we were to double-quote it, we would have to double-escape our sub-pattern references as "\\1-\\2-XXXX". This may seem mysterious, but the reasoning is this: the PCRE library needs to be passed the sub-pattern references as \1, but when we double-quote a string, PHP attempts to interpret the escaped characters for us. Single-quoting performs no such interpretation and leaves your references untouched. This is the same process by which "\n" becomes a newline, but '\n' remains literally '\n'.

We can reference sub-patterns in matches as well, using the same rules. A fun example of this is finding all six-letter palindromes. A palindrome is a word that is spelled the same forward and backward, for example 'noon' or 'deed'. To spot a six-letter palindrome, we match 3 characters and require that we see them immediately in reverse order. Here is the pattern:
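One way to write such a pattern, offered as our own reconstruction from the description above: capture three word characters, then require back-references \3, \2 and \1 in that order. The word list is sample data.

```php
<?php
// Assumed pattern: three captured word characters followed by the
// same three characters in reverse order, with word boundaries around.
$palindrome = '/\b(\w)(\w)(\w)\3\2\1\b/';

$words = array('hallah', 'redder', 'banana', 'unredder', 'noon');

// preg_grep keeps the matching words; array_values renumbers the result.
$found = array_values(preg_grep($palindrome, $words));

print_r($found);
```

On a system that has a dictionary file, the sample array could be swapped for the result of file('/usr/share/dict/words', FILE_IGNORE_NEW_LINES).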

This isn't the full story on RFC-compliant email addresses. Because the specification allows for addresses to contain descriptions as well, a completely accurate email address validator is actually quite complex. An example can be found at the end of Mastering Regular Expressions in Perl; the regex presented there is X characters long! For most purposes, the regex presented above is completely sufficient.

Enumeration modifiers can also be used to compress patterns with long repetitive parts. For instance, the phone-number pattern can be compressed to:

/\b[2-9]\d{2}-[2-9]\d{2}-\d{4}\b/

or, by noting that the area code and exchange match the same pattern, we can compress it even further, as follows:

When we run this pattern against a palindrome like 'hallah', it matches as shown in Figure 5.

Notice that you need to use \b to make sure you don't misidentify words that contain palindrome substrings. If you are running on a UNIX system, Listing 1 is a code block that will find all the six-letter palindromes in the dictionary file /usr/share/dict/words.

When we use preg_match_all with sub-patterns, we have two choices of how we want the data returned to us. The default behavior is for the match array to contain an array for each sub-pattern, where that array contains the capture for the nth search match as its nth element. If that's confusing, here is how it looks when matching all the phone numbers in a text:
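A small sketch of the default ordering, using a sample text and variable names of our own:

```php
<?php
$text    = 'Call 410-555-1212 or 212-555-9876.';
$pattern = '/\b([2-9]\d\d)-([2-9]\d\d)-(\d\d\d\d)\b/';

// With no flag given, preg_match_all() uses PREG_PATTERN_ORDER:
// $matches[0] lists the whole matches, $matches[1] the area codes,
// $matches[2] the exchanges and $matches[3] the line numbers.
$count = preg_match_all($pattern, $text, $matches);

print_r($matches);
```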

The alternative is to pass the optional flag PREG_SET_ORDER. With this flag set, the ordering of the match array is reversed: the match array contains one element for each match found in the search text, with that element being an array of the sub-pattern captures for that match. If we are looking to replicate the Perl idiom

while($text =~ /$regex/g) {
    # perform work on one set of matches at a time
}

we can accomplish it with this PHP:

preg_match_all($regex, $text, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
    // perform work on one set of matches at a time
}

To handle this, PCRE supplies enumeration modifiers.

The most basic description of an email address is a number of non-whitespace characters, followed by an '@', followed by more non-whitespace characters. \S is the character class for all non-whitespace characters, so using that we can write this simplistic email-matching pattern as:

/\S+@\S+/

+ is a PCRE enumerator that instructs the regex engine to match one or more instances of the character or character class it applies to. PCRE supports a number of enumeration methods for specifying that a character or character class should be matched multiple times, as you can see in Figure 6.

The + and * modifiers are both greedy. This means they will always match as long a sub-pattern as possible. This is not always the way you want your patterns to behave, but I will leave the details of when we might want a greedy or non-greedy match to a later article.

Enumeration modifiers can be applied not only to characters and character classes, but to sub-patterns as well. This allows for some pretty complex pattern generation, which is, after all, one of the best features of regular expressions (at least when you can understand what they do).

For example, we can use enumeration modifiers to significantly improve our email-address pattern.

Figure 6: Enumeration Modifiers

*       Match 0 or more times.
+       Match 1 or more times.
?       Match 0 or 1 times.
{m}     Match exactly m times.
{m,n}   Match between m and n times.
{m,}    Match at least m times.
{,n}    Match between 0 and n times (older PCRE releases read this form literally; {0,n} is the portable spelling).
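A few of these modifiers in action, as a sketch with sample strings of our own: {n} compresses repeated classes, + shows its greedy behavior, and ? makes the preceding character optional.

```php
<?php
// {n}: the long and the compressed phone patterns accept the same input.
$long  = '/\b[2-9]\d\d-[2-9]\d\d-\d\d\d\d\b/';
$short = '/\b[2-9]\d{2}-[2-9]\d{2}-\d{4}\b/';
$same  = preg_match($long, '410-555-1212') === preg_match($short, '410-555-1212');

// +: greedy, so .+ runs to the LAST '>' in the text, not the first.
preg_match('/<.+>/', '<b>bold</b> and <i>italic</i>', $greedy);

// ?: the 's' may appear zero times or once.
$http  = preg_match('/^https?$/', 'http');
$https = preg_match('/^https?$/', 'https');
```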

According to RFC 2822, which defines the "official" valid email address syntax, an email address is composed of a localpart, an '@' and a domain. The localpart is one or more characters from the set [\w!#$%"*+\/=?`{}|~^-], while a domain is a dot-separated list of parts composed of \w characters and hyphens. The pattern for the local part has the same shape as \S+, with the custom character class in place of \S:

/[\w!#$%"*+\/=?`{}|~^-]+/

The pattern for domains is more complex. First, we need to identify the dot-separated elements in the string. These are given by

/[\w-]+/

If we only have two such elements, the domain pattern would look like this:

/[\w-]+\.[\w-]+/

Note that since '.' is a special regex character (the wildcard character class), we must escape it to have it match just the '.' character. Since we can have an arbitrary number of dot-separated segments, we will encapsulate the first part of the pattern in a sub-pattern and use the '+' enumerator to specify that it must occur one or more times:

/([\w-]+\.)+[\w-]+/

Creating a sub-pattern simply involves placing it inside parentheses. Combining the local and domain patterns together, we arrive at a decent regular expression for matching valid email addresses:

/[\w!#$%"*+\/=?`{}|~^-]+@([\w-]+\.)+[\w-]+/

We can use this regular expression to perform the anti-spam rewriting we illustrated at the beginning of the article.
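One possible way to do that rewriting is sketched below. It is not necessarily the author's listing: the function name obfuscate_emails is hypothetical, the extra parentheses that capture the local part and the domain are our addition, and we lean on preg_replace_callback so that the dots inside the domain can be rewritten too.

```php
<?php
// Email pattern from the text, with the local part captured as
// sub-pattern 1 and the whole domain as sub-pattern 2 (our addition).
$email = '/([\w!#$%"*+\/=?`{}|~^-]+)@(([\w-]+\.)+[\w-]+)/';

// Hypothetical helper: rewrites user@host.tld as "user [at] host [dot] tld".
function obfuscate_emails(string $text, string $pattern): string
{
    return preg_replace_callback($pattern, function (array $m): string {
        // $m[1] is the local part, $m[2] the complete domain.
        return $m[1] . ' [at] ' . str_replace('.', ' [dot] ', $m[2]);
    }, $text);
}

$out = obfuscate_emails('Write to george@example.com today.', $email);
print $out . "\n";
```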

The last of the basic regular expression syntactical elements is alternation. Where character classes let us match a single character against a set of allowed characters, alternations allow for matching a string against multiple sub-patterns. For example, we might want to identify all HTTP and FTP addresses in a document for auto-linking or indexing purposes. We could do this with two regular expressions:

#https?://\S+#

#ftp://\S+#

but this will require the document to be completely scanned twice. Note that we are using # as a delimiter and not /, since our pattern contains slashes and we would rather not have to escape them. A more elegant approach is to combine them using an alternation, as follows:

#(https?|ftp)://\S+#

The alternation operator | means that the sub-pattern (https?|ftp) matches either https? ('http' with an optional 's') or ftp. To use this to automatically create anchor tags for all linked content, we can use a replacement like this:

preg_replace('#((https?|ftp)://\S+)#', '<a href="\1">\1</a>', $text);

Running this over a sample text, we notice that any preexisting anchor tags will become munged. For example:

Come visit us at <a href="http://www.phpa.com">phpa.com</a>.

preg_replace('#([^\'"])((https?|ftp)://\S+)([[:punct:]])#', '\1<a href="\2">\2</a>\4', $text);

Note here that we need to capture and return in the substitution the non-quote ([^\'"]) character we match before the URL to avoid losing it (the same goes for the trailing punctuation character, returned as \4), and that we have to escape the single quote, since the entire pattern is part of a single-quoted string.

Positional Anchors

In the example of matching valid US phone numbers, the regular expression we had was good for spotting phone numbers in a block of text, but not for validating that a block of text is a phone number. To do that, we need to ensure that the phone number is the only element in the search text, with no leading or trailing components. Anchors help solve this problem. To mandate that our phone number match starts at the beginning of the search text and ends at the end of it, we can modify our regex as follows:

/^([2-9]\d{2})-([2-9]\d{2})-(\d{4})$/

The leading ^ anchors the match at the beginning of the text, meaning that the match will only succeed if it begins there. The trailing $ anchors the match at the end of the text, meaning that the match will only succeed if the pattern terminates on the final character of the text to be matched against.

Here we use a slightly modified version of the anchored pattern to make a function useful for validating user-inputted data. If the phone number is valid, it will return an array of its components. If not, it will return false. The regex has been made a bit more robust by allowing the delimiter (previously -) to be replaced by an optional '.', '-' or whitespace character.

    if (preg_match($regex, $phone, $matches)) {
        return array(
            'area_code'   => $matches[1],
            'exchange'    => $matches[2],
            'line_number' => $matches[3]);
    }
    return false;
}
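A complete version of such a validator might look like the sketch below. The function name parse_phone is hypothetical, and the pattern is assembled from the pieces described in the text (the anchored phone pattern with an optional dot, dash or whitespace delimiter).

```php
<?php
// Hypothetical name; the pattern is our reconstruction from the description.
function parse_phone(string $phone)
{
    // ^ and $ force the whole input to be a phone number;
    // [.\s-]? allows an optional dot, whitespace or dash between the parts.
    $regex = '/^([2-9]\d{2})[.\s-]?([2-9]\d{2})[.\s-]?(\d{4})$/';
    if (preg_match($regex, $phone, $matches)) {
        return array(
            'area_code'   => $matches[1],
            'exchange'    => $matches[2],
            'line_number' => $matches[3],
        );
    }
    return false;
}

print_r(parse_phone('410.555 1212'));
```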

Don't confuse the anchor operator ^ with the negated character class operator [^]. Because an anchor is not a character class (in fact it's a special zero-length look-behind assertion, but that's a topic for a later article), it has no meaning inside a character class.

Anchors are also useful for extracting information

near the beginning or end of a string For example, a

line from an Apache Common Log Format logfile looks

like the following:

10.80.117.254 - - [13/Feb/2004:14:53:01 -0500] "GET /~george/blog/ HTTP/1.1" 200 43489

This says that on February 13, 2004 a request for "/~george/blog/" was made from the IP address 10.80.117.254. This request was successful (it returned a 200 OK response code), and the amount of data returned was 43489 bytes. Writing a full parser for this log line is not too difficult (we will do so in the cookbook section at the end of the article), but many queries do not require parsing the entire log. For instance, if we want to count the number of occurrences of each response code, the expression to use is quite simple. Looking at the log format, we see that the last two fields are numbers, and we want the next-to-last one. Expressed as a regex, that pattern looks like this:

/(\d+) \d+$/
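As a rough sketch of how this pattern can drive a response-code count: the helper name count_response_codes is hypothetical, and it takes the log lines as an array so that it can be tried without a real logfile.

```php
<?php
// Hypothetical helper: tallies the next-to-last field of each log line.
function count_response_codes(array $lines): array
{
    $frequency = array();
    foreach ($lines as $line) {
        // Anchor at the end of the line and capture the number
        // that precedes the final byte count.
        if (preg_match('/(\d+) \d+$/', rtrim($line), $m)) {
            $code = $m[1];
            $frequency[$code] = isset($frequency[$code]) ? $frequency[$code] + 1 : 1;
        }
    }
    return $frequency;
}

// Sample lines of our own, in Common Log Format.
$freq = count_response_codes(array(
    '10.80.117.254 - - [13/Feb/2004:14:53:01 -0500] "GET /~george/blog/ HTTP/1.1" 200 43489',
    '10.80.117.254 - - [13/Feb/2004:14:53:05 -0500] "GET /missing HTTP/1.1" 404 312',
    '10.80.117.254 - - [13/Feb/2004:14:53:09 -0500] "GET / HTTP/1.1" 200 1024',
));

foreach ($freq as $code => $occurrences) {
    print "$code\t$occurrences\n";
}
```

For a real logfile, the array could come from file($logfile), which returns the file's lines.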

Working backwards, this says we first match the end of the line ($), then a number (which we don't bother to capture), then a number which we do want to capture (the response code). We can wrap this into a quick script to determine the frequency of various responses as shown in Listing 2. When we don't need to parse an entire text string, especially if its format is complex, anchors can make our life much easier.

Global Pattern Modifiers

The final regular expression syntactical elements we are going to discuss in this article are global pattern modifiers. As their name implies, global pattern modifiers change the overall behavior of the pattern. By far the most common of these is the case insensitivity modifier, i. Global modifiers are implemented in the Perl style, directly following the pattern they apply to. Here is a function which uses a regex to extract all addresses under a specified domain from a subject text, regardless of the casing of the domain (domains are case-insensitive):

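function extract_addresses($domain, $text) {
    // Escape regex metacharacters (and the / delimiter) in the domain.
    $domain = preg_quote($domain, '/');
    // Double quotes are needed so that $domain is interpolated into the pattern.
    if (preg_match_all("/([\w!#\$%\"*+\/=?\'{}|~^-]+)@$domain/i",
                       $text, $matches, PREG_PATTERN_ORDER)) {
        return $matches[1];
    }
    return false;
}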

Notice here that, in addition to using the i modifier, we also use preg_quote to sanitize $domain. Data that can potentially come from an untrusted source (such as a user) should always be quoted to prevent the accidental or malicious inclusion of regex characters. Also, we use the PREG_PATTERN_ORDER flag so that all the sub-pattern \1 matches are stored in $matches[1]. Otherwise we would need to iterate over $matches and manually build the result set.

The other possible pattern modifiers are as follows:

if (($fp = fopen($logfile, "r")) == false) {
    print "Error opening $logfile\n";
...
foreach ($frequency as $code => $occurences) {
    print "$code\t$occurences\n";

• m (treat as multiline). By default, PCRE assumes that we intend our search text to be processed as one big string, and ^ and $ will match only the beginning and ending of the search text, respectively. When the m modifier is used, ^ and $ will match at the beginning and ending of every line in the search text (the search text is considered to be broken into lines by any newline characters).

• s (treat as single line for wildcards). By default, the wildcard character (.) will not match a newline. If . should match newlines as well, add the s modifier to the pattern.

• x (extended legibility). By default, any whitespace in a pattern is considered part of the pattern. With the x modifier, unescaped whitespace outside a character class is ignored, and everything from an unescaped # to the end of the line is treated as a comment. Allowing whitespace in a pattern can be helpful for readability and inline comments. Compare the following two patterns:

([2-9]\d{2}) # Match the exchange as subpattern 2
[.\s-]?      # An optional delimiter - dot, dash or ws
(\d{4})      # Match the line number as subpattern 3
/x

More information on creating readable patterns will be covered in a future article.

• A (start anchored). This modifier is equivalent to putting a ^ at the start of our pattern: it anchors the pattern at the start of the search text. Thus the following two regular expressions are equivalent:

/^Subject: (.*)/
/Subject: (.*)/A

There are no benefits to using this method over manually anchoring a pattern with ^ (other than, perhaps, moving the anchor character from the beginning of your pattern to its end).

• D (dollar end-only). If this modifier is set, the dollar end-anchor $ will match only at the end of the string. By default, $ will match before the final character if that character is a newline. This is ignored if the m modifier is also used.

• S (study). If we are going to execute a pattern a number of times, we can use this flag to instruct PCRE to take extra time 'studying' the pattern to improve its efficiency.

• U (ungreedy). By default, all matches in PCRE are greedy; that is, a pattern will attempt to match the longest possible piece of the search text. The U modifier reverses this behavior, asking PCRE to find the shortest possible match for the pattern. More on greedy versus non-greedy matching will be covered in a future article.

• u (UTF-8). This modifier instructs PCRE to treat patterns and search texts as UTF-8 characters instead of just single-byte characters. UTF-8 support is still new and should be used with some caution as it may be incomplete.

• e (evaluated replacements). This causes the replacement string in a preg_replace call to be evaluated as PHP. Back-references are expanded and the resulting expression is executed via eval. The result of the evaluation is used as the final replacement text. Let's try an example of how to use this: rewriting Wiki-style links to documents. In Wikis, putting so-called CamelCaps text in a document will link it to the wiki page of that name. Doing this blindly with a regex can be achieved with the following replacement:

$text = preg_replace('/\b(([A-Z]\w+){2,})\b/',
                     '<a href="/wiki/\1.html">\1</a>', $text);

This might result in a number of nonexistent documents being linked to, though. If we want the rewriting to happen only if the destination document exists, we can perform the conditional replacement with an evaluated replacement, as shown in Listing 3. Now, when a CamelCaps word is encountered, the regex checks is_wiki_page to see if it should be linked. If so, the text is replaced with a link; otherwise, it is left as-is (or, rather, it is replaced with itself). Evaluated replacements and their companion function preg_replace_callback will be covered in depth in a future article.
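The e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so on current PHP versions the same conditional rewrite has to go through preg_replace_callback. The following is a minimal sketch of that approach, not the article's Listing 3: the function name link_wiki_words and the $pageExists callback (standing in for is_wiki_page) are assumptions made for illustration.

```php
<?php
// Sketch only: link CamelCaps words, but only when the target page exists.
// $pageExists is a hypothetical callback playing the role of is_wiki_page().
function link_wiki_words(string $text, callable $pageExists): string
{
    return preg_replace_callback(
        '/\b(([A-Z]\w+){2,})\b/',
        function (array $m) use ($pageExists) {
            $name = $m[1];
            if ($pageExists($name)) {
                return '<a href="/wiki/' . $name . '.html">' . $name . '</a>';
            }
            // No such page: put the original text back unchanged.
            return $m[0];
        },
        $text
    );
}
```

Because the callback receives the matches as an array, no code is built from the matched text and nothing is passed to eval.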

Unless specifically contraindicated (such as D and m), global pattern modifiers can be freely combined.
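The effect of several of these modifiers is easiest to see side by side. The short script below is an illustrative sketch; the sample strings and variable names are made up for the demonstration and do not come from the article's listings.

```php
<?php
$text = "Subject: Hello\nsubject: World\n";

// i and m combined: ignore case, and let ^ and $ match at every line.
preg_match_all('/^subject: (.*)$/im', $text, $m);
$subjects = $m[1];

// s: the dot also matches newlines, so .* can span both lines.
$withS    = preg_match('/Hello.*World/s', $text);
$withoutS = preg_match('/Hello.*World/', $text);

// U: quantifiers become ungreedy.
preg_match('/<.+>/', '<b>bold</b>', $greedy);
preg_match('/<.+>/U', '<b>bold</b>', $ungreedy);

// D: $ matches only at the very end, not before a trailing newline.
$defaultDollar = preg_match('/^\d+$/', "123\n");
$dollarEndOnly = preg_match('/^\d+$/D', "123\n");
```

Here $greedy[0] holds the whole string while $ungreedy[0] holds only the opening tag, and the D variant rejects the input that the default pattern accepts.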

A Simple Regex Cookbook

As with most tools, the way to really learn regexes is to use them in practical situations. To help you get on your way, here is a short selection of recipes for making the most out of your regular expressions.

Apache Log Processing

Being able to extract information from webserver logfiles is essential to both good housekeeping (knowing what links are broken and the disposition of our traffic) and forensics (determining where traffic is coming from and what actions users are taking). The first step to this is being able to parse our logs into an easily accessible data structure. Apache common log format is defined as the following:

"%h %l %u %t \"%r\" %>s %b"

Where the individual fields are:

• %h: The IP address (or hostname if DNS lookups are enabled) of the requestor

• %l: The remote logname, as supplied by

• %>s: The three digit response code of the final request served (Apache has a notion of internal redirects; this is the response code on the page actually returned to the user)

• %b: The number of bytes returned in the response

A function to parse a single line and return an array with its contents is given in Listing 4. Even though we didn't really explore it in much detail, the benefit of using extended legibility regexes should be obvious here: with 17 sub-patterns being captured, it would be extremely difficult to guess the correct offsets at a glance. Now that we have a parser, its applications are nearly limitless. For example, Listing 5 shows a little script I like to leave running in a window on my desktop; I tail my Apache log into it and it reports the number of hits I get per second in real time. Running it as

tail -f /apache/logs/mysite/access | freq.php

gives a running tally of hits per second (note that this will only run under a UNIX-like environment and that you'll need to make freq.php executable). This data could just as easily be written to an MRTG database for graphing, or something even cleverer. Because we have access to the fully parsed log line, we could easily convert this to display hits per hour by changing

$this_time = $data[4];

to

$this_time = "$data[5]/$data[6]/$data[7] $data[8]";

Similarly, we could count bytes instead of pages by accumulating $data[17] (bytes transferred) in $count.

Listing 4 (excerpt)

"(                # begin request match ($m[12])
(GET|HEAD|POST)   # the HTTP method ($m[13])

Listing 5 (excerpt)

while (($line = fgets(STDIN)) !== false) {
    if ($data = parse_clf_line($line)) {
        $this_time = $data[4];
        if ($last_time && $last_time != $this_time) {
            print "$last_time: $count\n";
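To make the idea concrete without the full Listing 4, here is a much smaller sketch of a common log format parser written with the x modifier, plus an hourly tally built on top of it. It is not the author's parser: it captures 7 fields rather than 17, and the names parse_clf_sketch, hits_per_hour, and the array keys are assumptions chosen for this example.

```php
<?php
// Simplified Common Log Format parser; a hypothetical stand-in for Listing 4.
// Returns an associative array, or false if the line does not match.
function parse_clf_sketch(string $line)
{
    $pattern = '/^
        (\S+) \s+           # %h   remote host
        (\S+) \s+           # %l   remote logname
        (\S+) \s+           # %u   remote user
        \[ ([^\]]+) \] \s+  # %t   timestamp without the brackets
        " ([^"]*) " \s+     # %r   request line
        (\d{3}) \s+         # %>s  response code
        (\d+|-)             # %b   bytes sent, or a dash for none
    /x';
    if (!preg_match($pattern, $line, $m)) {
        return false;
    }
    return [
        'host' => $m[1], 'ident' => $m[2], 'user' => $m[3], 'time' => $m[4],
        'request' => $m[5], 'status' => $m[6], 'bytes' => $m[7],
    ];
}

// Count hits per hour; the key is the "dd/Mon/yyyy:hh" prefix of the timestamp.
function hits_per_hour(iterable $lines): array
{
    $hits = [];
    foreach ($lines as $line) {
        $data = parse_clf_sketch($line);
        if ($data === false) {
            continue; // skip lines that are not in common log format
        }
        $hour = substr($data['time'], 0, 14);
        $hits[$hour] = ($hits[$hour] ?? 0) + 1;
    }
    return $hits;
}
```

Named keys sidestep the offset-guessing problem mentioned above; summing the bytes field instead of adding 1 would give bytes per hour.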

Single Pass Template Substitution

In its simplest form, a templating system runs through a 'template' and replaces certain tokens with dynamic values. One of the things that makes many templating systems slow is that they must perform multiple passes through a document, one for each token to be replaced. If we standardize our token naming convention, we can actually perform the replacement in a single pass.

First, we require that all tokens be of the form {NAME}, where NAME is a key in an associative array that contains our substitutions. With this in place, we can match all tokens in a single pass with the following regex:

/{(\w+)}/

Next we will use an evaluated replacement to substitute the appropriate value from the passed associative array. Here is the full function:

function expand_text($text, $data)

Your friend {FRIEND} has sent you an e-card.

Click <a href="{LINK}">here</a> to pick it up.

print expand_text($template, $data);
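On PHP 7 and later, where evaluated replacements are no longer available, the same single-pass substitution can be sketched with preg_replace_callback. The function name expand_text_cb, the sample values, and the choice to leave unknown tokens untouched are assumptions for this example, not the article's expand_text.

```php
<?php
// Sketch only: replace every {NAME} token in one pass over the text.
function expand_text_cb(string $text, array $data): string
{
    return preg_replace_callback(
        '/\{(\w+)\}/',
        function (array $m) use ($data) {
            // Unknown tokens are left exactly as they were written.
            return array_key_exists($m[1], $data) ? (string) $data[$m[1]] : $m[0];
        },
        $text
    );
}

$template = "Your friend {FRIEND} has sent you an e-card.\n"
          . 'Click <a href="{LINK}">here</a> to pick it up.';
$data = ['FRIEND' => 'Alice', 'LINK' => 'http://example.com/card/1'];
$expanded = expand_text_cb($template, $data);
```

If the substituted values can come from users, they should be escaped for the output context (for example with htmlspecialchars) before being placed in $data.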

Preventing Cross-Site Scripting Attacks

JavaScript is one of the banes of my existence. Don't get me wrong: it is a powerful and useful language, but its tight integration with HTML makes it a fertile playground for malicious users to launch cross-site scripting attacks. If we must allow HTML in user input, we will want to at least remove any JavaScript from it. Listing 6 shows one possible way to do so. This function looks for various DHTML and CSS directives that can be used for cross-site scripting attacks, and if any are found it performs a very draconian stripping of all but the basic formatting tags.
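Since only a fragment of Listing 6 is available here, the following is a rough sketch of the detect-then-strip idea rather than the author's function. The name strip_risky_html, the shortened event list, the particular patterns, and the set of allowed tags are all assumptions, and a check like this should not be treated as a complete cross-site scripting defense.

```php
<?php
// Sketch only: if anything script-like is found, keep just basic formatting tags.
function strip_risky_html(string $html): string
{
    $js_event_list = ['load', 'unload', 'click', 'dblclick', 'blur', 'submit'];
    $js_events = implode('|', $js_event_list);

    $risky = [
        '/<\s*script\b/i',                    // script elements
        '/\bon(?:' . $js_events . ')\s*=/i',  // inline event handlers
        '/javascript\s*:/i',                  // javascript: URLs
        '/expression\s*\(/i',                 // CSS expressions
    ];

    foreach ($risky as $pattern) {
        if (preg_match($pattern, $html)) {
            // Draconian fallback: drop every tag except a few formatting ones...
            $clean = strip_tags($html, '<b><i><u><p><br>');
            // ...and remove all attributes from the tags that remain.
            return preg_replace('/<(\/?)(b|i|u|p|br)\b[^>]*>/i', '<$1$2>', $clean);
        }
    }
    return $html;
}
```

The second step matters because strip_tags leaves attributes on the tags it allows, so an onclick on a permitted tag would otherwise survive.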

Conclusion

We have now come to the end of our journey through the basics of regular expressions. With these tools in your hands, you should be able to tackle almost any text matching challenge. Hopefully, you have lost any fears you might have had concerning regular expressions. Once past the terseness of their syntax, regexes can be a powerful and versatile addition to our programming toolkit.

At the same time, we have really only touched the tip of the regex iceberg. In addition to the things we have seen so far, the PCRE extension supports a number of fine-grain features that allow for incredibly complex matches. These advanced features will be covered in a future set of articles.


To Discuss this article:

http://forums.phparch.com/131

George Schlossnagle is a Principal at OmniTI Computer Consulting, a Maryland-based tech company specializing in high-volume web and email systems. Before joining OmniTI, George led technical operations at several high-profile community web sites where he developed experience managing PHP in very large enterprise environments. George is a frequent contributor to the PHP community. His work can be found in the PHP core, as well as in the PEAR and PECL extension repositories.

Before entering into information technology, George trained to be a mathematician and served a 2 year stint as a teacher in the Peace Corps. His experience has taught him to value an inter-disciplinary approach to problem solving that favors root-cause analysis of problems over simply addressing symptoms.

Listing 6 (excerpt)

$js_event_list = array('load', 'unload', 'click', 'dblclick',
...
    'blur',
...
    'submit',
...
$js_events = implode('|', $js_event_list);


In this article, I will show you how to write powerful automated tests in PHP for your Web applications. PHP is remarkably well-suited for writing software test automation and the system I present is surprisingly short. Web applications built with PHP are becoming more and more common in the enterprise arena and, as a result, they are becoming increasingly complex. As PHP matures, the ability to write test automation becomes more valuable, but in conversations with my colleagues I discovered that the techniques required for automated testing of PHP Web applications are not well known. In this article, I will show you how to quickly write effective test automation that verifies your PHP Web applications' correctness.

The best way to show you what we will accomplish is with two screenshots. Figure 1 shows a dummy PHP Web application that accepts a last name for an employee and then searches a MySQL database and displays the employee's ID, first name, last name, and e-mail address. In this example, searching for "Baker" correctly returns a single employee whose ID is 002, first name is Bob, and e-mail is bob@build.com.

Manually testing even this minimal Web application would be extremely tedious, time consuming, and error prone. Instead, we can test the application by programmatically sending input to the PHP script on the Web server, then capture the response stream, examine the response for a correct target, and log a pass or fail result. Figure 2 shows a PHP shell program that does just that. Test cases 0002 and 0003 correspond to the manual test shown in Figure 1.

You might have noticed that my examples use a Windows/IIS system rather than the more usual Linux/Apache setup. Most client companies that I work with are large and have a mixed technology environment. Because many of these companies are experimenting with PHP and MySQL on a Windows/IIS base, I decided to use that base for this article.

In the sections that follow I will walk you through the underlying PHP Web application so that you will understand what we are testing, briefly examine the underlying MySQL database so that you understand its relationship to the test automation, and carefully go over the PHP test automation program so that you can modify the source code to meet your own particular needs. I will conclude with a discussion of some of the ways you can extend this technique and use it in a production environment. After reading this article you will have the ability to write PHP test automation, a hopefully valuable addition to your skill set.

REQUIREMENTS
Other Software: N/A
Code Directory: auto-test

PHP enables Web developers to create complex Web applications; nothing new there. The techniques for writing automated tests for PHP Web applications, however, are not well known. In this article, James McCaffrey shows you a simple but representative PHP application and then walks you through the creation of a powerful automated test program written entirely in PHP. The code is explained in detail so you can use it as is, or modify and extend the technique to meet your own needs.

The PHP Web Application

The most common use of PHP among the companies I work with is to create dynamic Web pages that have an interface to a MySQL database. I created a reduced dummy Web application that contains the essential elements of most real-life applications I deal with. I started by making a small database named dbCompany, which contains a table named tblEmployees that has four columns: empid (employee ID), lastname, firstname, and email. I populated the table with the four rows of data you can see in Figure 3.

Next, I created a simple PHP Web application that searches the database. The code shown in Listing 1 generates the Web page shown in Figure 1.

Both the database and the PHP application are simplistic, but together they have all the elements needed to demonstrate test automation. Before I show you the test automation program, let's imagine what it would be like to manually test the application. (In fact, asking how to test a dummy Web application like this is often used as an interview question for dedicated software test engineers.) There are thousands of inputs you would have to enter into the page and then visually determine if the response was correct or not. Then, suppose you changed the logic or the database structure: you'd have to start all over. As you can imagine, this would not be fun, or particularly efficient.

To automate the testing of the dummy PHP Web application, we must programmatically send input to the PHP script (via HTTP), then capture the HTTP response stream, examine the response for strings that tell us if the response is correct or not, and log results. The PHP shell script shown in Listing 2 does exactly that and generated the output shown in Figure 2.

Figure 1

Figure 2

I structured the test automation as two functions. The main() function reads test case data from a text file, sends an input value to the PHP Web application, and examines the response for an expected value. The main() function calls a resHasTarget() function, which returns TRUE if the response to some input data contains a target string.

Here are the contents of the test case file used in this

Each line of data represents a single test case. A 4-digit test case ID is followed by an input value, then an expected result, and an optional comment. So, in test case 0002, if we submit "Baker", we should see "Bob" in the response.

The main() function starts by assigning values to variables for the IP address of the Web server, the port on which the server listens, the path to the PHP application, and the method used to send user data:

$ipAddress = '127.0.0.1';

$port = '80';

$page = '/PHP/simple.php';

$method = 'POST';

Because this is test automation, you will know the IP address of the Web server that has your PHP application, and it will usually be 127.0.0.1 (localhost), unless you test on a server that is not installed on your local machine. Port 80 is the default HTTP port, but it may be different in a test environment. The two main methods of sending information to a Web server are POST and GET. Recall that our dummy Web application sends data using POST:

<form name="theForm" action="simple.php"

method="POST">

I will discuss using GET requests later. Next, main() prints some minimal header information to the shell and then opens the test case file for reading. The test automation reads the test case file line by line:

$line = fgets($fp, 4096);

list($caseid, $input, $expected, $comment) = explode(":", $line);

$postData = 'lastname=' . urlencode($input);

For each line, we parse the four colon-delimited fields using the explode() function. Using colons to delimit test case data is arbitrary; in general, you can use any character but want to avoid characters that appear in the actual test case data. We append the input value to lastname= using the urlencode() function. It replaces characters that might be misinterpreted by the Web server with their escaped equivalents. For example, a '/' character would be replaced by a %2F sequence.
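The parsing step can be sketched as a small helper. This is an illustration rather than code from Listing 2: the function name parseTestCase, the returned array keys, and the sample line in the usage below are assumptions, and the sketch additionally trims the line ending and tolerates a missing comment field.

```php
<?php
// Sketch only: split one colon-delimited test case line into its fields
// and build the POST body for it. Returns null for malformed lines.
function parseTestCase(string $line): ?array
{
    // Limit of 4 keeps any further colons inside the comment field.
    $fields = explode(':', rtrim($line, "\r\n"), 4);
    if (count($fields) < 3) {
        return null;
    }
    return [
        'caseid'   => $fields[0],
        'input'    => $fields[1],
        'expected' => $fields[2],
        'comment'  => $fields[3] ?? '',
        'postData' => 'lastname=' . urlencode($fields[1]),
    ];
}

$case = parseTestCase("0002:Baker:Bob:a hypothetical comment\n");
```

An input containing a slash, such as a/b, ends up in the POST body as lastname=a%2Fb.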

After we have a test case ID, an input last name to send, and an expected value to look for, the resHasTarget() function does all the work:

if (resHasTarget($ipAddress, $port, $method, $page, $postData, $expected))
    echo "$caseid Pass input = " . str_pad($input, 12) . "expected = $expected\n";
else
    echo "$caseid FAIL input = " . str_pad($input, 12) . "expected = $expected\n";

The resHasTarget() function posts data to the PHP Web application and checks if the expected value is in the response stream. For test case 0001, "lastname=Anderson" is posted to 127.0.0.1:80/PHP/simple.php and the response is examined for the presence of the string "Adam". If "Adam" is found, resHasTarget() returns TRUE and we log a "pass" message; otherwise we log a "fail" message.

Let's now examine the resHasTarget() function that does most of the actual work. We start by creating a socket and then using it to connect to our Web server:

$socket = socket_create(AF_INET, SOCK_STREAM, 0)

or die("Socket failed\n");

$connect = socket_connect($socket, $ipAddress, $port)

or die("Connect failed\n");

Figure 3

The constants AF_INET and SOCK_STREAM mean that we want to use IPv4 dotted-quad addresses (e.g., 127.0.0.1) and a full-duplex TCP connection. There are two important alternatives to the socket_* family of functions I chose to use. One is the fsockopen() family of functions, which exposes the connection as a file-like stream. A higher level choice is to use classes in the PEAR library. I have programmed sockets using all three methods and have found that any preference is more a matter of personal programming style than functionality. After we connect to the Web server we determine the size of the data we will be posting:

$reqBody = $postData;

$contentLength = strlen($reqBody);

The $postData input parameter assumes we have data in a name-value sequence, for example:

user=chris&age=25&job=tester

Next we construct the HTTP headers we are going to send to the server:

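$send  = $method . " " . $page . " HTTP/1.1\r\n";
$send .= "Host: localhost\r\n";
$send .= "Accept: */*\r\n";
$send .= "User-Agent: test.php test automation\r\n";
$send .= "Content-Type: application/x-www-form-urlencoded\r\n";
$send .= "Content-Length: " . $contentLength . "\r\n\r\n";
$send .= $reqBody;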

An HTTP request starts with a line that specifies the method (e.g., POST, GET, HEAD), followed by the path to the PHP application and the HTTP version. The next header line must specify the host that the request is being sent to. The next two header lines are optional. The Accept header tells the server what types of responses are acceptable (here we'll accept anything). The User-Agent header is a courtesy so the Web server knows who is making the request. The next two header lines are required for POST requests. Content-Type tells the server what kind of data is coming. You can think of application/x-www-form-urlencoded as a magic string that means "data from an HTML form".

The Content-Length header is the size of the POST data in bytes. Notice that we have to construct the POST data before the headers so we can specify the size at this point in the program. Also notice that the Content-Length header is followed by two carriage return, linefeed (\r\n) pairs: HTTP requires CRLF line endings on every platform, and the empty line marks the end of the headers. Finally we append the POST data to the request.
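The request layout described above can be collected into one helper. This is a sketch under stated assumptions: the function name buildPostRequest is made up for this example, and the Connection: close header is an addition of mine (it asks the server to close the connection after responding, so a read loop sees the end of the stream) that does not appear in the article's code.

```php
<?php
// Sketch only: assemble a raw HTTP/1.1 POST request for form data.
function buildPostRequest(string $host, string $page, string $postData): string
{
    $headers = [
        "POST $page HTTP/1.1",
        "Host: $host",
        "Accept: */*",
        "User-Agent: test.php test automation",
        "Content-Type: application/x-www-form-urlencoded",
        "Content-Length: " . strlen($postData),  // size of the body in bytes
        "Connection: close",                      // assumption, see lead-in
    ];
    // Each header ends with CRLF; an empty line separates headers from body.
    return implode("\r\n", $headers) . "\r\n\r\n" . $postData;
}

$request = buildPostRequest('localhost', '/PHP/simple.php', 'lastname=Baker');
```

Building the headers as an array makes it harder to forget a line ending or to overwrite the string instead of appending to it.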

Now we are ready to send the HTTP request to the

server, then grab the response stream and examine it:

socket_write($socket, $send, strlen($send));

while ($receiveBuffer = socket_read($socket, 2048))

The socket_write() function sends the request, and the response arrives on the same socket. We read the response 2048 bytes (an arbitrary size) at a time (as opposed to line-by-line). We also use strpos() to see if the target string is anywhere in the 2048 bytes, and if it is we close the socket and return TRUE. If we examine the entire response and never find the target string we return FALSE.

There is one trick to watch for here: it is possible that a response stream block of bytes might end in the middle of the target, breaking it into two parts. If so, you would not find the target string. In practice this is not very likely, and you can defend against this possibility by increasing the number of bytes read per socket_read() so that you capture the entire response stream.
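Another way to guard against a target split across two reads, offered here as an alternative to enlarging the buffer rather than as the article's method, is to carry the last few bytes of each block over to the next search. The function name chunksContain is an assumption; it takes any iterable of blocks so it can be tried without a live socket.

```php
<?php
// Sketch only: search a stream of blocks for $target, even when the target
// straddles a block boundary.
function chunksContain(iterable $chunks, string $target): bool
{
    if ($target === '') {
        return true;
    }
    // A split target can leave at most strlen($target) - 1 bytes in the
    // previous block, so that is all we need to remember.
    $overlap = strlen($target) - 1;
    $tail = '';
    foreach ($chunks as $chunk) {
        $window = $tail . $chunk;
        if (strpos($window, $target) !== false) {
            return true;
        }
        $tail = $overlap > 0 ? substr($window, -$overlap) : '';
    }
    return false;
}
```

With sockets, the blocks could come from a generator that yields each socket_read() result until the read returns an empty string or false.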

To summarize, the key to automated testing of PHP Web applications is the ability to send raw HTTP data to the Web server. PHP has a family of socket functions that make it easy to do so. After reading information from test case files containing input values and expected values, you send the input to the server then examine the response for the expected value.

Using The GET Method

In the previous sections, we assumed that the PHP Web application under test sends data to the server using the POST method. What if the application uses GET? Suppose you have a Web application where the user submits a user ID and a password using GET. (By the way, this is a bad idea because with GET the form data is appended to the request URL.) The following code snippet shows how to send a request using GET:

// create socket
// connect

$send  = "GET /PHP/form2.php?";
$send .= "userID=" . urlencode("root");
$send .= "&password=" . urlencode("secret");
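A complete GET request also needs the HTTP version on the request line, a Host header, and the blank line that ends the headers. The sketch below shows one way to assemble it; the function name buildGetRequest and the Connection: close header are assumptions for this example, and http_build_query is used to URL-encode the parameters.

```php
<?php
// Sketch only: assemble a raw HTTP/1.1 GET request with query parameters.
function buildGetRequest(string $host, string $page, array $params): string
{
    // Encodes names and values and joins the pairs with "&".
    $query  = http_build_query($params, '', '&');
    $target = $query === '' ? $page : $page . '?' . $query;

    return "GET $target HTTP/1.1\r\n"
         . "Host: $host\r\n"
         . "Accept: */*\r\n"
         . "Connection: close\r\n"   // assumption: lets the read loop terminate
         . "\r\n";                   // blank line: end of headers, no body
}

$getRequest = buildGetRequest('localhost', '/PHP/form2.php',
                              ['userID' => 'root', 'password' => 'secret']);
```

A GET request carries no body, so there is no Content-Type or Content-Length header to compute.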

Beyond the Basics

You can modify and extend the basic PHP application test framework presented here in many ways. For clarity, I used a simple text file to store test cases, but you should consider good alternatives, like XML or database storage. Using XML to hold your test cases is particularly appropriate when the test cases have a complex structure (for example, many optional parameters), or are shared across groups. A database, on the other hand, can come in handy when you have a very large number of test cases.

The technique in this article displays its output to a command shell. In a production environment, you will probably want to write test results to a text file or a SQL database. Writing to a text file is most appropriate when you are on a relatively short production cycle. Writing results to a SQL database is useful when you are in a long production cycle because you will be generating lots of data that can be shared and analyzed in many different ways.

In a production environment, I always add additional data to the results log. At a minimum, you will want to add counters for the number of cases which pass and which fail. I also like to add timing information for each test case and the overall test run. Timing information can uncover problems in the Web application code that basic pass-fail data misses. And for reporting purposes, you can timestamp the date of the test run.
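As a rough illustration of these additions, the sketch below wraps a test run with pass and fail counters, per-case and overall timing, and a start timestamp. The function name runTests, the shape of the summary array, and the $runCase callback (which would call something like resHasTarget() for one case) are assumptions, not part of the article's listings.

```php
<?php
// Sketch only: run every case through $runCase and collect summary data.
function runTests(array $cases, callable $runCase): array
{
    $summary = ['started' => date('c'), 'pass' => 0, 'fail' => 0, 'results' => []];
    $runStart = microtime(true);

    foreach ($cases as $case) {
        $caseStart = microtime(true);
        $ok = (bool) $runCase($case);
        $seconds = microtime(true) - $caseStart;

        if ($ok) {
            $summary['pass']++;
        } else {
            $summary['fail']++;
        }
        $summary['results'][] = [
            'caseid'  => $case['caseid'],
            'pass'    => $ok,
            'seconds' => $seconds,   // time spent on this one case
        ];
    }

    $summary['seconds'] = microtime(true) - $runStart;  // whole run
    return $summary;
}
```

The returned array can then be printed to the shell, appended to a text file, or inserted into a results table.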

To be honest, when I first started using PHP I was very surprised at how well it works as a language for software test automation. In general, it is best to write test automation using the same language as that used by the system under test: test a C++ application using C++, test a Java application using Java. The idea is that if you use different languages, you run into many cross-language issues which affect the validity of your test automation. But often, using the same language is just not possible. When I examined PHP's capabilities as a testing language, I was pleased to find that they are as good as any language I've worked with, and maybe even better in some cases.

In the introduction to this article, I noted that most of the client companies I work with are currently investigating mixed-technology environments. As recently as twelve months ago, mixing Open Source and proprietary technologies usually had uneven results, but the situation has changed dramatically for the better. The machine on which I developed the techniques used in this article happily supports MySQL and SQL Server, C# and PHP, Apache and IIS, and dual boots into Linux and Windows XP. This works in PHP's favor: developers can install PHP over their existing technologies and gradually migrate. In particular, I am seeing many in-house shops start to move from ColdFusion to PHP as their programming platform of choice for Web projects.

Listing 1 (excerpt)

<form name="theForm" action="simple.php" method="POST">
<p>Last name: <input type="text" name="lastname" /></p>
<p><input type="submit" value="Find Employee" /></p>
...
$search = $_POST['lastname'];
$query = "SELECT * FROM tblEmployees WHERE lastname = '"
...
echo "<td>" . $row['empid'] . " " . $row['firstname'];
echo " " . $row['lastname'] . " " . $row['email']

Listing 2 (excerpt)

$socket = socket_create(AF_INET, SOCK_STREAM, 0)
    or die("Socket failed\n");

$connect = socket_connect($socket, $ipAddress, $port)
    or die("Connect failed\n");

$reqBody = $postData;
$contentLength = strlen($reqBody);

$send  = $method . " " . $page . " HTTP/1.1\r\n";
$send .= "Host: localhost\r\n";
$send .= "Accept: */*\r\n";
$send .= "User-Agent: test.php test automation\r\n";
$send .= "Content-Type: application/x-www-form-urlencoded\r\n";
$send .= "Content-Length: " . $contentLength . "\r\n\r\n";
...
echo "\nBegin test run\n\n";
echo "caseid result\n";

An interesting side effect of the test automation presented in this article is that you can easily adapt the test code to create a general purpose HTTP response viewer. By placing an echo statement inside the while loop that examines the response:

while ($receiveBuffer = socket_read($socket, 2048))

{

echo $receiveBuffer;

}

and making a few other cosmetic changes, you can view the entire response stream, as you can see in Figure 4. If you are new to programming with PHP at a low level, this is a great way to learn what is really going on with HTTP behind the scenes.

In principle, testing PHP Web applications is similar to traditional API (Application Programming Interface) or unit testing. But because PHP applications are client-server based, there are additional connectivity issues. This means you will want to use error checking liberally. As usual for instructional articles, I removed all error checking in the code presented here. Based on my experience, adding exception handling code (if you're using PHP 5) will double the size of your source code but is well worth the effort.

One valuable use of the technique presented in this article is to construct Developer Regression Tests (DRTs) for your PHP Web applications. DRTs are a sequence of automated tests that are run after you make changes to your application. They are designed to determine if your new code has broken existing functionality, before you check it in to your version-control repository. You can also create an extensive set of test cases for a Full Test Pass.

Conclusion

In this article, I have shown you how easy it is to create test automation systems written in PHP for your applications. As PHP matures, testing will become more important and the ability to write automated tests will become more useful than it already is. And because PHP works so well in a mixed technology environment, the ability to write PHP test automation is a valuable addition to your skill set, no matter what platforms you use.

To Discuss this article:

Figure 4
