By Sara Golemon
Publisher: Sams Pub Date: May 30, 2006 Print ISBN-10: 0-672-32704-X Print ISBN-13: 978-0-672-32704-9 Pages: 456
In just a few years PHP has rapidly evolved from a small niche language to a powerful web development tool. Now in use on over 14 million Web sites, PHP is more stable and extensible than ever. However, there is no documentation on how to extend PHP; developers seeking to build PHP extensions and increase the performance and functionality of their PHP applications are left to word of mouth and muddling through PHP internals without systematic, helpful guidance. Although the basics of extension writing are fairly easy to grasp, the more advanced features have a tougher learning curve that can be very difficult to overcome. This is common at any moderate to high-traffic site, forcing the company to hire talented, and high-priced, developers to increase performance. With Extending and Embedding PHP, Sara Golemon brings writing extensions within the grasp of every PHP developer, while guiding the reader through the tricky internals of PHP.
Extending and Embedding PHP
Copyright © 2006 by Sams Publishing
All rights reserved. No part of this book shall be reproduced, stored in a retrieval system, or transmitted by any means, electronic, mechanical, photocopying, recording, or otherwise, without written permission from the publisher. No patent liability is assumed with respect to the use of the information contained herein. Although every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions. Nor is any liability assumed for damages resulting from the use of the information contained herein.
regarded as affecting the validity of any trademark or service mark.
Warning and Disclaimer
as accurate as possible, but no warranty or fitness is implied. The information provided is on an "as is" basis. The author and the publisher shall have neither liability nor responsibility to any person or entity with respect to any loss or damages arising from the information contained in this book.
Bulk Sales
Sams Publishing offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales. For
To my partner Angela, who waited with patience and constancy while I ignored her night after night making this title a reality. And to my family, who gave me strength, courage, and confidence, and made me the person I am today.
If you had told me when I submitted my first patch to the PHP project that I'd be writing a book on the topic just three years later, I'd have called you something unpleasant and placed you on /ignore. However, the culture surrounding PHP development is so welcoming, and so thoroughly entrapping, that looking back my only question is "Why aren't there more extension developers?"
The short (easy) answer, of course, is that while PHP's documentation of userspace syntax and functions is in every way second to none, the documentation of its internals is far from complete and consistently out of date. Even now, the march of progress towards full Unicode support in PHP6 is introducing dozens of new API calls and changing the way everyone from userspace scripters to core developers looks at strings and binary safety.
The response from those of us working on PHP who are most familiar with its quirks is usually, "Use the source." To be fair, that's a valid answer because nearly every method in the core, and the extensions (both bundled and PECL), are generously peppered with comments and formatted according to strict, well-followed standards that are easy to read once you're used to them.
But where do new developers start? How do they find out what PHP_LONG_MACRO_NAME() does? And what, precisely, is the difference between a zval and a pval? (Hint: There isn't one; they're the same variable type.) This book aims to bring the PHP internals a step closer to the level of accessibility that has made the userspace language so popular. By exposing the well-planned and powerful APIs of PHP and the Zend Engine, we'll all benefit from a richer pool of talented developers both from the commercial ranks and within the open source community.
Sara Golemon is a self-described terminal geek (pun intended). She has been involved in the PHP project as a core developer for nearly four years and is best known for approaching the language "a little bit differently than everyone else"; a quote you're welcome to take as either praise or criticism. She has worked as a programmer/analyst at the University of California, Berkeley for the past six years after serving the United States District Courts for several years prior. Sara is also the developer and lead maintainer of a dozen PECL extensions as well as libssh2, a non-PHP related project providing easy access to the SSH2 protocol. At the time of this writing, she is actively involved with migrating the streams layer for Unicode compatibility in PHP6.
As the reader of this book, you are our most important critic and commentator. We value your opinion and want to know what we're doing right, what we could do better, what areas you'd like to see us publish in, and any other words of wisdom you're willing to pass our way.
You can email or write me directly to let me know what you did or didn't like about this book, as well as what we can do to make our books stronger.
Please note that I cannot help you with technical problems related to the topic of this book, and that due to the high volume of mail I receive, I might not be able to reply to every message.
When you write, please be sure to include this book's title and author as well as your name and phone or email address. I will carefully review your comments and share them with the author and editors who worked on the book.
Visit our website and register this book at www.samspublishing.com/register for convenient access to any updates, downloads, or errata that might be available for this book.
Should You Read This Book?
You probably picked this book off the shelf because you have some level of interest in the PHP language. If you are new to programming in general and are looking to get into the industry with a robust but easy-to-use language, this is not the title for you. Have a look at PHP and MySQL Web Development or Teach Yourself PHP in 24 Hours. Both titles will get you accustomed to using PHP and have you writing applications in no time.
After you become familiar with the syntax and structure of PHP scripts, you'll be ready to delve into this title. Encyclopedic knowledge of the userspace functions available within PHP won't be necessary, but it will help to know what wheels don't need reinventing, and what proven design concepts can be followed.
Because the PHP interpreter was written in C, its extension and embedding API was written from a C language perspective. Although it is certainly possible to extend from or embed into another language, doing so is outside of the scope of this book. Knowing basic C syntax, datatypes, and pointer management is vital.
It will be helpful if you are familiar with autoconf syntax. Don't worry about it if you aren't; you'll only need to know a few basic rules of thumb to get by and you'll be introduced to these rules in Chapters 17, "Configuration and Linking," and 18, "Extension Generators."
Why Should You Read This Book?
This book aims to teach you how to do two things. First, it will
applications, making them more versatile and useful to your users and customers.
Why Would You Want to Extend PHP?
There are four common reasons for wanting to extend PHP. By far, the most common reason is to link against an external library and expose its API to userspace scripts. This motivation is seen in extensions like mysql, which links against the libmysqlclient library to provide the mysql_*() family of functions to PHP scripts.
These types of extensions are what developers are referring to when they describe PHP as "glue." The code that makes up the extension performs no significant degree of work on its own; rather, it creates an interpretation bridge between PHP's extension API and the API exposed by the library. Without this, PHP and libraries like libmysqlclient would not be able to communicate on a common level. Figure I.1 shows how this type of extension bridges the gap between third-party libraries and the PHP core.
Figure I.1 Glue Extensions
Coming in third is the sheer need for speed. PHP code has to be tokenized, compiled, and stepped through in a virtual machine environment, which can never be as fast as native code. Certain utilities (known as Opcode Caches) can allow scripts to skip the tokenization and compilation step on repeated execution, but they can never speed up the execution step. By translating it to C code, the maintainer sacrifices some of the ease of design that makes PHP so powerful, but gains a speed increase on the order of several multiples.
Lastly, a script author may have put years of work into a particularly clever subroutine and now wants to sell it to another party, but doesn't want to reveal the source code. One approach would be to use an opcode encryption program;
decryption phase.
What Does Embedding Actually Accomplish?
Let's say you've written an entire application in a nice, fast, lean, compiled language like C. To make the application more useful to your users or clients, you'd like to provide a means for them to script certain behaviors using a simple high-level language where they don't have to worry about memory management, or pointers, or linking, or any of that complicated stuff.
If the usefulness of such a feature isn't immediately obvious, consider what your office productivity applications would be without macros, or your command shell without batch files. What sorts of behavior would be impossible in a web browser without JavaScript? Would you be able to capture the magic Hula-Hoop and rescue the prince without being able to program your F1 key to fire a triple shot from your rocket launcher at just the right time to defeat the angry monkey? Well, maybe, but your thumbs would hurt.
So let's say you want to build customizable scripting into your application; you could write your own compiler, build an execution framework, and spend thousands of hours debugging it, or you could take a ready-made enterprise class language like PHP and embed its interpreter right into your application. Tough choice, isn't it?
This book is split into three primary topics. First you'll be reintroduced to PHP from the inside out in Part I, "Getting to Know PHP All Over Again." You'll see how the building blocks of the PHP interpreter fit together, and learn how familiar concepts from userspace map to their internal representations.
In Part II, "Extensions," you'll start to construct a functional PHP extension and learn how to use additional features of the PHP API. By the end of this section, you should be able to translate nearly any PHP script to faster, leaner C code. You'll also be ready to link against external libraries and perform actions not possible from userspace.
In Part III, "Embedding," you'll approach PHP from the opposite angle. Here, you'll start with an ordinary application and add PHP scripting support into it. You'll learn how to leverage safe_mode and other security features to execute user-supplied code safely, and coordinate multiple requests simultaneously.
Finally, you'll find a set of appendices containing a reference guide to API calls, solutions to common problems, and where to find existing extensions to crib from.
PHP Versus Zend
The first thing you need to know about PHP is that it's actually made up of five separate pieces shown in Figure I.2.
Figure I.2 Anatomy of PHP.
coordinates the lifecycle process you'll see in Chapter 1, "The PHP Lifecycle." This layer is what interfaces to web servers like Apache (through mod_php5.so) or the command line (through bin/php). In Part III, you'll be linking against the embed SAPI which operates at this layer.
machine where it reads and writes userspace variables, manages program flow, and periodically passes control to one of the other layers such as during a function call. Zend also provides per-request memory management and a robust API for environment manipulation.
Lying above PHP and Zend is the extension layer where you'll find all the functions available from userspace. Several of these extensions (such as standard, pcre, and session) are compiled
--with-mysql or --enable-sockets, or built as shared modules and then loaded in the php.ini with extension= or in userspace scripts using the dl() function. You'll be developing in this layer in Part II and Part III when you start to perform simultaneous embedding and extending.
Wrapped up around and threaded through all of this is the TSRM (Thread Safe Resource Management) layer. This portion of the PHP interpreter is what allows a single instance of PHP to execute multiple independent requests at the same time without stepping all over each other. Fortunately most of this layer is hidden from view through a range of macro functions that you'll gradually come to be familiar with through the course of this book.
What Is an Extension?
An extension is a discrete bundle of code that can be plugged into the PHP interpreter in order to provide additional functionality to userspace scripts. Extensions typically export at least one function, class, resource type, or stream implementation, often a dozen or more of these in some combination.
The most widely used extension is the standard extension, which defines more than 500 functions, 10 resource types, 2 classes, and 5 stream wrappers. This extension, along with the zend_builtin_functions extension, is always compiled into the PHP interpreter regardless of any other configuration options. Additional extensions, such as session, spl, pcre, mysql, and sockets, are enabled or disabled with configuration options, or built separately using the phpize tool.
One structure that each extension (or module) shares in common is the zend_module_entry struct defined in the PHP source
point" where PHP introduces itself to your extension and defines the startup and shutdown methods used by the lifecycle process described in Chapter 1 (see Figure I.3). This structure also references an array of zend_function_entry structures, defined in Zend/zend_API.h. This array, as the data type suggests, lists the built-in functions exported by the extension.
In Part III, you'll see how any application can leverage the power and flexibility of PHP code through the use of this simple and concise library.
Terms Used Throughout This Book
PHP: Refers to the PHP interpreter as a whole including Zend, TSRM, the SAPI layer, and any extensions.
PECL: The PHP Extension Community Library, pronounced "pickle." PECL (http://pecl.php.net) is the C-code offshoot of the PEAR project that uses many of the same packaging, deployment, and installation systems. PECL packages are usually PHP extensions, but may include Zend
PHP extension: Also known as a module. A discrete bundle of compiled code defining userspace-accessible functions, classes, stream implementations, constants, ini options, and specialized resource types. Anywhere you see the term extension used elsewhere in the text, you may assume it is referring to a PHP extension.
Zend extension: A variant of the PHP extension used by specialized systems such as OpCode caches and encoders. Zend extensions are beyond the scope of this book.
Userspace: The environment and API library visible to scripts actually written in the PHP language. Userspace has no access to PHP internals or data structures not explicitly granted to it by the workings of the Zend Engine and the various PHP extensions.
Internals (C-space): Engine and extension code. This term is used to refer to all those things that are not directly accessible to userspace code.
In a common web server environment, you'll never explicitly start the PHP interpreter; you'll start Apache or some other web server that will load PHP and process scripts as needed, that is, as .php documents are requested.
Though it may look very different, the CLI binary actually behaves just the same way. A php command, entered at the system prompt, starts up the "command line sapi," which acts like a mini web server designed to service a single request. When the script is done running, this mini PHP web server shuts down and returns control to the shell.
    if (zend_register_auto_global("_MYEXTENSION", sizeof("_MYEXTENSION") - 1, NULL TSRMLS_CC) == FAILURE) {
        return FAILURE;
    }
    zend_auto_global_disable_jit("_MYEXTENSION", sizeof("_MYEXTENSION") - 1 TSRMLS_CC);
#else
    if (zend_register_auto_global("_MYEXTENSION", sizeof("_MYEXTENSION") - 1 TSRMLS_CC) == FAILURE) {
        return FAILURE;
other tasks such as logging the page request to a file. RINIT can be thought of as a kind of auto_prepend_file directive for all
corresponds to auto_append_file in much the same way as RINIT corresponds to auto_prepend_file. The most important difference between RSHUTDOWN and auto_append_file, however, is that RSHUTDOWN will always be executed, whereas a call to die() or exit() inside the userspace script will skip any auto_append_file.
Any last-minute tasks that need to be performed can be handled in RSHUTDOWN before the symbol table and other resources are destroyed. After all RSHUTDOWN methods have completed, every variable in the symbol table is implicitly unset(), during which all non-persistent resource and object destructors are called in order to free resources gracefully.
/* Run at the end of every page request
 */
{
    zval **myext_autoglobal;
    if (zend_hash_find(&EG(symbol_table), "_MYEXTENSION", sizeof("_MYEXTENSION"), (void**)&myext_autoglobal) == SUCCESS) {
        /* Do something meaningful
}
Each PHP instance, whether started from an init script, or from the command line, follows a series of events involving both the Request/Module Init/Shutdown events covered previously, and the actual execution of scripts themselves. How many times, and how frequently each startup and shutdown phase is executed, depends on the SAPI in use. The four most common SAPI configurations are CLI/CGI, Multiprocess Module, Multithreaded Module, and Embedded.
CLI Life Cycle
The CLI (and CGI) SAPI is fairly unique in its single-request life cycle; however, the Module versus Request steps are still cycled in discrete loops. Figure 1.1 shows the progression of the PHP interpreter when called from the command line for the script test.php.
Figure 1.1 Request cycles versus engine life cycle.
The most common configuration of PHP embedded into a web server is using PHP built as an APXS module for Apache 1, or Apache 2 using the Pre-fork MPM. Many other web server configurations fit into this same category, which will be referred to as the multiprocess model through the rest of this book.
It's called the multiprocess model because when Apache starts up, it immediately forks several child processes, each of which has its own process space and functions independently from one another. Within a given child, the life cycle of that PHP instance looks immediately familiar as shown in Figure 1.2. The
Figure 1.2 Individual process life cycle.
This model does not allow any one child to be aware of data owned by another child, although it does allow children to die and be replaced at will without compromising the stability of any other child. Figure 1.3 shows multiple children of a single Apache invocation and the calls to each of their MINIT, RINIT, RSHUTDOWN, and MSHUTDOWN methods.
Figure 1.3 Multiprocess life cycles.
Increasingly, PHP is being seen in a number of multithreaded web server configurations such as the ISAPI interface to IIS and the Apache 2 Worker MPM. Under a multithreaded web server only one process runs at any given time, but multiple threads execute within that process space simultaneously. This allows several bits of overhead, including the repeated calls to MINIT/MSHUTDOWN, to be avoided, allows true global data to be allocated and initialized only once, and potentially opens the door for multiple requests to deterministically share information. Figure 1.4 shows the parallel process flow that occurs within PHP when run from a multithreaded web server such as Apache 2.
Figure 1.4 Multithreaded life cycles.
Recalling that the Embed SAPI is just another SAPI implementation following the same rules as the CLI, APXS, or ISAPI interfaces, it's easy to imagine that the life cycle of a request will follow the same basic path: Module Init => Request Init => Request => Request Shutdown => Module Shutdown. Indeed, the Embed SAPI follows each of these steps in perfect time with its siblings.
What makes the Embed SAPI appear unique is that the request may be fed in multiple script segments that function as part of a single whole request. Control will also pass back and forth between PHP and the calling application multiple times under most configurations.
Although an Embed request may consist of one or more code elements, embed applications are subject to the same request isolation requirements as web servers. In order to process two or more simultaneous embed environments, your application will either need to fork like Apache 1 or thread like Apache 2.
When PHP was in its infancy, it ran as a single process CGI and had no concern for thread safety because no process space could outlive a single request. An internal variable could be declared in the global scope and accessed or changed at will without consequence so long as its contents were properly initialized. Any resources that weren't cleaned up properly would be released when the CGI process terminated.
Later on, PHP was embedded into multiprocess web servers like Apache. A given internal variable could still be defined globally and safely accessed by the active request so long as it was properly initialized at the start of each request and cleaned up at the end because only one request per process space could ever be active at one time. At this point per-request memory management was added to keep resource leaks from growing out of control.
As single-process multithreaded web servers started to appear, however, a new approach to handling global data became necessary. Eventually this would emerge as a new layer called TSRM (Thread Safe Resource Management).
Thread-Safe Versus Non-Thread-Safe Declaration
In a simple non-threaded application, you would most likely declare global variables by placing them at the top of your source file. The compiler would then allocate a block of memory in your program's data segment to hold that unit of information.
In a multithreaded application where each thread needs its own version of that data element, it's necessary to allocate a separate block of memory for each thread. A given thread then
Thread-Safe Data Pools
During an extension's MINIT phase, the TSRM layer is notified how much data will need to be stored by that extension using one or more calls to the ts_allocate_id() function. TSRM adds that byte count to its running total of data space requirements, and returns a new, unique identifier for that segment's portion
}
When it comes time to access that data segment during a request, the extension requests a pointer from the TSRM layer for the current thread's resource pool, offset by the appropriate index suggested by the resource ID returned by ts_allocate_id(). Put another way, in terms of code flow, the following statement SAMPLE_G(sampleint) = 5; is one that you might see in the module associated with the previous MINIT statement. Under a thread-