Secure your applications against Email Injection Tips on Output Buffering
KOMODO - reviewed and much more
FPDI in Detail
Importing existing documents with Free PDF Import
2005 Look Back
Reflecting on last year’s events in the PHP world
with PHP guru Derick Rethans
i18n
Internationalize your web application
with less PHP code
PDFLib’s
VOLUME 5 ISSUE 1
If you want to bring a php-related topic to the attention of the professional php community, whether it
is personal research, company software, or anything else, why not write an article for php|architect?
If you would like to contribute, contact us and one of our editors will be happy to help you hone your idea and turn it into a beautiful article for our magazine. Visit www.phparch.com/writeforus.php
or contact our editorial team at write@phparch.com and get started!
Download this month’s code at: http://www.phparch.com/code/
CONTENTS
WRITE FOR US!
Features
Reflecting on last year’s
events in the PHP world
by DERICK RETHANS
Templating PDF’s for Maximum Reusability
by RON GOFF
with Free PDF Import
by JAN SLABON
Internationalize Your Web applications
with less PHP code
Why is it Taking so Long?
Lead times and the rationale behind them
Graphics & Layout
php|architect (ISSN 1709-7169) is published
twelve times a year by Marco Tabini & Associates, Inc., P.O. Box 54526, 1771 Avenue Road, Toronto,
ON M5M 4N5, Canada.
Although all possible care has been placed in assuring the accuracy of the contents of this magazine, including all associated source code, listings and figures, the publisher assumes
no responsibilities with regards of use of the information contained herein or in all associated material.
php|architect, php|a, the php|architect logo, Marco Tabini & Associates, Inc. and the Mta Logo are trademarks of Marco Tabini & Associates, Inc.
In the past five (or so) years especially, the desktop landscape has changed
dramatically. Desktops have traditionally been dominated by Windows, but
alternatives are making their way into both the office and home.
Apple’s hit operating systems in the OS X series, and other chic products
(like the iPod), have not only fueled the sales of Macintosh computers, but
have opened consumers’ minds to the reality that there are alternatives to Windows.
The market is still firmly in Microsoft’s grip, but more and more users are
making the “switch” to Mac (and to a much lesser extent, alternatives like Linux).
This diversity, while good, can cause portability problems, and as I’ve touched
on in past issues, developers can no longer target a single browser, but must become
more and more aware of standards and cross-browser/cross-platform compatibility
issues.
For the most part, developers seem to have the browser issue under control. I
personally never use Internet Explorer for anything but testing (I’m a Firefox fanboy),
and it’s very rare that I still run into sites that simply won’t work with FF. Even in
cases where it seems I’m out of luck, I can often spoof the User-Agent header, and
get a working site. Since Firefox is available on many platforms, it seems that the
HTML issue is (mostly) behind us. I say “mostly” because standards-compliance
and portability are things that we always need to strive for.
If you’ve tried to distribute a printable, offline-viewable, and well laid out
document in the past, you know that HTML doesn’t cut it. There’s little provision
for the features that are necessary to build a professional document (there is hope
with CSS, though). This often leaves websites delivering “richer” documents, such
as MS Word documents or RTF files.
The distribution of proprietary format documents leads to its own set of
problems, primarily document creation and portability. Have you tried to build a
Word document from your non-Windows Web server? It’s not fun. Equally tedious
is trying to get that document to render properly in different versions of Word, on
different platforms; worse is the rendering in non-Microsoft applications, such as
OpenOffice. Enter PDF.
Now, PDF is certainly not new technology. It does, however, seem to be becoming
more and more the de facto standard for document distribution. PDF is no stranger
to php|architect readers: if you’re not reading this on paper, you’re reading a PDF,
and we’ve brought you much PDF-centric content in the past, but we’ve certainly not
drained the PDF knowledge pool.
This month, we’re happy to focus on PDF once again, but this time with a twist:
using PHP to modify existing PDFs, through various means.
It’s also our pleasure to be running Derick Rethans’ PHP Lookback, 2005. Marco
will touch more on this in exit(0).
On that note, we at php|architect wish you and your business a happy and
successful 2006. Here’s to another great year of PHP!
PLATFORM
DIVERSITY
EDITORIAL
PHP 5.1.2 RC1
Ilia Alshanetsky announces the release of PHP
5.1.2 RC1.
“I’ve just packaged PHP 5.1.2RC1, the first
release candidate for the next 5.1 version. A
small holiday present for all PHP users, from
the PHP developers. This is primarily a bug
fixing release with its major points being:
• Many fixes to the strtotime() function,
over 10 bugs have been resolved.
• A fair number of fixes to PDO and its
drivers.
• New OCI8 that fixes large number of
bugs backported from head.
• A final fix for Apache 2 crash when
SSI includes are being used.
• A number of crash fixes in extensions
and core components.
• XMLwriter & Hash extensions were
added and enabled by default.”
Get all the info at http://ilia.ws/archives/97-PHP-5.1.2RC1-Released!.html
FUDforum 2.7.4RC1 Released
The FUDforum team has announced the latest
release of their open source forum package,
version 2.7.4 RC1. Some of the new features
include:
• Added subscribed forum filter to message navigator
• Added handling for in-lined attachments in mailing list import
• Added the ability to supply custom signature to message
synchronized from the forum back to mailing list or a news group
• Added support for allowing the user to select how many threads
they want to see per page
• Much more…
Visit FUDforum.org for all the latest info.
eZ components
eZ components 1.0 beta2
ez.no is proud to announce the release
of eZ components. ez.no announces: “eZ components is an enterprise ready, general purpose PHP platform. As a collection of high quality independent building blocks for PHP application development, eZ components will both speed up development and reduce risks. An application can use one or more components effortlessly, as they all adhere to the same naming conventions and follow the same structure. All components are based on PHP 5.1, except for the ones that require the new Unicode support that will be available from PHP 6 on.”
Need to speed up your development?
Check out ez.no for more info.
xajax 0.2
xajaxproject.org announces the release of version 0.2. What is it? The site describes it as: “an open source PHP class library that allows you to easily create powerful, web-based, Ajax applications using HTML, CSS, JavaScript, and PHP. Applications developed with xajax can asynchronously call server-side PHP functions and update content without reloading the page.”
To start working with xajax, visit
xajaxproject.org.
SQLiteManager 1.2.0RC2
If SQLite is the db of choice for your PHP application, you may be interested in the latest release of SQLiteManager. SQLiteManager.org lists the features as:
• Management of several databases (creation, access or upload)
• Management of the attached databases
• Create, edit and delete tables and indexes
• Insert, edit, delete records in these tables
• Management of views; create views from SELECTs
• Management of triggers
• Management of user defined functions
• Manual request and from file; it is possible to define the format of the requests, sqlite or MySQL; a conversion is done in order to directly import a MySQL database in SQLite
• Importing of records from a formatted text file
• Export of structure and the data
• Choice of several display skins
Check out SQLiteManager.org to start managing your SQLite DB, today.
php|architect Releases New PDFlib Book
We are proud to announce the release of our latest book
in the “Nanobooks” series, called Beginning PDF Programming
with PHP and PDFlib.
Authored by Ron Goff, this book provides a thorough introduction to the great capabilities provided by the PDFlib library for the creation and manipulation of PDF files.
The book features a foreword by Thomas Merz, the original author of PDFlib and founder of PDFlib GmbH, and tackles topics like PDF file creation, fonts, text, shapes and
much more, including PDFlib’s Block Tool, which allows for
the manipulation of existing PDF documents.
For more information, visit http://www.phparch.com/pppp
Check out the hottest new releases from PEAR.
Image_Color2 0.1.4
PHP 5 color conversion and basic mixing.
Currently supported color models:
• CMYK - Used in printing
• Grayscale - Perceptively weighted
grayscale
• Hex - Hex RGB colors, e.g. #abcdef
• HSL - Used in CSS3 to define colors
• HSV - Used by Photoshop and other
graphics packages
• Named - RGB value for named colors
like black, khaki, etc.
• WebsafeHex - Just like Hex but rounds
to websafe colors
Config 1.10.5
The Config package provides methods for
configuration manipulation.
• Creates configurations from scratch
• Parses and outputs different formats
(XML, PHP, INI, Apache )
• Edits existing configurations
• Converts configurations to other
It provides a common API for all supported RDBMS. The main difference to most other
DB abstraction packages is that MDB2 goes much further to ensure portability. Among other things MDB2 features:
• An OO-style query API
• A DSN (data source name) or array format for specifying database servers
• Datatype abstraction and on demand datatype conversion
• Various optional fetch modes to fix portability issues
• Portable error codes
• Sequential and non sequential row fetching as well as bulk fetching
• Ability to make buffered and unbuffered queries
• Ordered array and associative array for the fetched rows
• Prepare/execute (bind) emulation
• Sequence emulation
• Replace emulation
• Limited sub select emulation
• Row limit support
• Transactions support
• Large Object support
• Index/Unique Key/Primary Key support
• Reverse engineering schemas from
an existing DB
• SQL function call abstraction
• Full integration into the PEAR Framework
• PHPDoc API documentation
The GDChart extension provides an interface
to the bundled gdchart library This library
uses the (bundled) GD library to generate 20
different types of graphs, based on supplied
parameters.
The extension provides an OO interface
to gdchart, exposing the majority of options via
properties and complex (array) options via a
series of methods.
To use the current version of the extension,
PHP 5.0.0 is required; an older, PHP 4-only
version can be downloaded from CVS by
checking out the extension with the PECL_4_3
tag.
yaz 1.0.6
This extension implements a Z39.50 client for
PHP using the YAZ toolkit.
Fileinfo 1.0.3
This extension allows retrieval of information regarding the vast majority of files. This information may include dimensions, quality, length etc.
Additionally, it can also be used to retrieve the mime type for a particular file and, for text files, the proper language encoding.
pecl_http 0.21.0
It eases handling of HTTP URLs, dates, redirects, headers and messages, provides means for negotiation of the client’s preferred language and charset, as well as a convenient way to send any arbitrary data with caching and resuming capabilities.
It provides powerful request functionality,
if built with CURL support. Parallel requests are available for PHP-5 and greater.
PHP-5 classes: HttpUtil, HttpMessage, HttpRequest, HttpRequestPool, HttpDeflateStream, HttpInflateStream. PHP-5.1 classes: HttpResponse.
Xdebug 2.0.0beta5
The Xdebug extension helps you debug your scripts by providing a lot of valuable debug information. The debug information that Xdebug can provide includes the following:
• stack and function traces in error messages with:
  • full parameter display for user defined functions
  • function name, file name and line indications
  • support for member functions
• memory allocation
• protection for infinite recursions
Xdebug also provides:
• profiling information for PHP scripts
• script execution analysis
• capabilities to debug your scripts interactively with a debug client
Welcome to the fourth installment of the PHP
Look Back. Just as in previous years, we’ll
look back on PHP development discussions,
bloopers and accomplishments of the last
year. This is not supposed to be a fully
objective review of last year. Note that the opinions in
this article are those of the author, and not of the PHP
development team (nor of php|architect).
January
January was a quiet month, with not much going on.
After about 8 months[001], we finally added[002] a
PIC/non-PIC detection mechanism to the configure script that
will select non-PIC object generation for supported
platforms (Linux and FreeBSD). Non-PIC code is about
30% faster, as measured in earlier benchmarks.
on the events of the past year. Derick Rethans, a PHP internals developer, has been publishing a PHP Look Back for a few years now, and this year we saw it fitting to publish it here. Happy 2006!
by DERICK RETHANS
A week later, Leonardo[003] was wondering whether we planned on adding type hints for scalar types to PHP. As PHP is a weakly-typed language, this is not something
we wanted to add, although we did add support for an
“array” type hint, later in the year. With PHP 5.1’s new GOTO execution method (added last August), variable name lookups are cached internally. This caused some problems for Xdebug[004], as it needs some information to find out which variables are used in a specific scope. Andi committed[005] a patch that made Xdebug work properly, again.
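To illustrate the “array” type hint mentioned above, here is a minimal sketch. The function name countItems() is made up for this example and does not come from the discussion.

```php
<?php
// Hypothetical helper: the "array" hint makes PHP reject non-array
// arguments before the function body runs.
function countItems(array $items)
{
    return count($items);
}

echo countItems(array('a', 'b', 'c')), "\n"; // prints 3
```

Calling countItems('abc') is rejected by the engine; how that failure surfaces (an error or an exception) depends on the PHP version in use.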
Michael started working on his HTTP extension (which
generates way too many commit mails ;-) and encountered
a problem with a naming clash[006] between PEAR’s HTTP
class and his PECL extension. Greg responded[007], and said
that this problem will be solved when PEAR 1.4 comes
out, with its channel support.
February
Andi started discussions in February by pointing out a
date for the first beta of PHP 5.1: March 1st. He declared
that “both PDO and Date should be included in the default
distribution”[008] and others suggested that XML Reader[009]
should be included by default, as well. In reply to Andi,
Rasmus mentioned[010] that he would like to see the
filter extension included, as well. The discussion about
this extension quickly transitioned to data mangling
of input request variables, and how they could not be
influenced by the script authors, but only by the system
administrator. In the end, this discussion gave way
to the topic of operator overloading[011], where certain
people kept reiterating that operator overloading is a
“good thing.[012]”
Andrei tried to stop this discussion by being funny[013],
but it didn’t work very well[014]. Around the same time, Wez
announced[015] the first beta of PDO (PHP Data Objects).
Wez wanted people to test[016] PDO, and of course, over the
next couple of months, there were various PDO-related
concerns[017] and issues raised.
Another discussion in February was about auto
boxing[018] in PHP. Auto boxing is the encapsulation of
all primitive types as objects. Naturally, people asked
why[019] we would want to have this, and no sound
reason was given. In the end, this discussion suggested
that phpDocumentor[020] should handle type determination,
instead. Having a doc block[021] parsing extension to the
reflection API would be nice, although a bit hard.
We also had an often-recurring discussion[022] on why
the GPL[023] is a bad idea for PECL[024] extensions.
John added the first version[025] of XMLRPCi to CVS;
why he chose this silly name is still unknown. Jani
wrote about a problem with overwriting globals[026], an
issue that (later in the year) warranted a new PHP release, and Greg introduced[027] PEAR 1.4, with channel support.
Halfway through the month, Marcus[028] mentioned a few things that should go into PHP 5.1, most notably the __toString() fix, which unfortunately did not actually make it into the release. Type hinting with “= NULL” did make it in[029], though.
Martin Sarsale reported[030] an issue with references and segfaults, something which had been annoying us
at eZ systems[031] for quite some time, too. This issue got fixed in PHP 4.4, albeit not without a little bickering (more about that later).
March
In March, Ilia proposed[032] a patch that adds a special token that tells PHP’s parser to abort parsing when the token is encountered. This allows us to attach binary data to the end of a PHP script, which is highly useful for one-script installers, such as the one that FUDForum[033] uses.
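The article does not name the token. In released PHP versions this feature is available as __halt_compiler(), together with the __COMPILER_HALT_OFFSET__ constant that gives the byte offset of the attached data. The sketch below is only an illustration: buildInstaller(), the stub contents and the sample payload are invented here, and a real installer would ship as a single file rather than being generated at run time.

```php
<?php
// Hypothetical helper: write a one-file "installer" made of a small PHP stub
// followed by raw data that the parser never looks at.
function buildInstaller($path, $payload)
{
    $stub = <<<'STUB'
<?php
$fh = fopen(__FILE__, 'rb');
fseek($fh, __COMPILER_HALT_OFFSET__); // first byte after __halt_compiler();
$data = stream_get_contents($fh);
fclose($fh);
return $data;
__halt_compiler();
STUB;
    file_put_contents($path, $stub . $payload);
}

$path = tempnam(sys_get_temp_dir(), 'inst');
$payload = "\x00\x01 raw bytes, never parsed as PHP \xff";

buildInstaller($path, $payload);
$recovered = include $path; // runs the stub, which returns the attached data
unlink($path);

var_dump($recovered === $payload);
```

Because parsing stops at the token, the attached bytes can contain anything, including sequences that would otherwise be syntax errors.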
On the 14th of the month, Zeev released the first RCs[034] of both 5.0.4 and 4.3.11. We also encountered further reference issues[035].
The same guy that mailed tons of “fixes” to the internals list, last June[036], was back with more[037]
patches. Andrei, once again, pointed out[038] that it is
a good idea to check with an extension’s maintainer before applying patches, and Greg published[039] the
package2.xml documentation.
Lukas, once more, pointed out[040] the weird naming scheme that new extensions seem to be getting, and luckily Debian’s PHP packages got rid[041] of some of the insanity that was present in previous[042] releases by not always building in ZTS mode. Unfortunately, their packages still force PIC mode for the libraries.
A user brought up the idea of an upload meter patch[043], again, and although we all seemed to remember[044] that the original patch was rejected[044],
no one could find the original thread[046] where this was discussed. Last year’s Look Back discussed this too, and
there, the reason was mentioned[047].
In the last week of the month, we had some fuss[048]
about “FreeBSD doing stupid things[049]” regarding their
naming of auto tools executables[050].
April
April started with a suggestion[051] by Zeev to change
the way that __autoload() works, by allowing multiple
instances of this magic function. In the end, we didn’t
end up implementing this, and as Lukas described[052],
“Frameworks should provide __autoload() helper
methods, but should never implement the function itself.
It’s up to the end user to do this.” (This is exactly how
we implemented it for the eZ components[053].)
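The division of labour that Lukas describes can be sketched as follows. The class MyFramework_Loader, its methods and the lib directory are hypothetical, and the sketch hooks the helper in with spl_autoload_register() rather than a global __autoload() function so that it also runs on current PHP versions; that substitution is an assumption of this example, not something the thread proposed.

```php
<?php
// Hypothetical framework class: it offers a loading helper,
// but never installs an autoloader on its own.
class MyFramework_Loader
{
    // Map a class name such as "MyFramework_Mail_Message" to a file path.
    public static function pathFor($className, $baseDir)
    {
        return $baseDir . '/' . str_replace('_', '/', $className) . '.php';
    }

    // Load the class file if it exists; report whether anything was loaded.
    public static function load($className, $baseDir)
    {
        $path = self::pathFor($className, $baseDir);
        if (is_file($path)) {
            require $path;
            return true;
        }
        return false;
    }
}

// End-user code decides whether and how the helper is hooked in.
spl_autoload_register(function ($className) {
    MyFramework_Loader::load($className, __DIR__ . '/lib');
});
```

Keeping the hook in application code means two libraries never compete over who defines the single autoload function.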
Andi wanted to release PHP 5.1 Beta 1[054] really soon,
but, as Jani mentioned[055], there were quite a few things
that were still not fully ready, and thus the suggestion to
call it “Alpha”[056] was made, instead. During this thread,
some pet-features[058] were brought up[059].
Kamesh, from the Netware porting team, found
another reference issue[060]. Marcus added the File[061] class
to his SPL extension, causing a small stir: the new class
clashed with any application that already defines its
own File class. Although this is a valid point, projects
defining a “File” class should know better, and would be
wise to prefix their class names. This same issue will pop
up later in the year.
A last, somewhat larger, discussion erupted when
a question[062] about whether APC could be used as a
content cache was posted to the list. Rasmus found it an
interesting idea[063], although this functionality can also
be accomplished in user space. In the last point of the
thread, Rasmus mentioned[064] that APC will soon support
PHP 5.
May
May had a slow start, and things only got interesting
at the end of the month. The first discussion that came
up was Ilia’s removal of dangling commas from enums,
something that “was in c language from the first day[065].”
Apparently, GCC 4 is “becoming worse and worse[066],” but
luckily, we can still just ignore the warnings[067].
After a small private discussion with Dmitry about
Marcus’ and my reference fix patch[068], he came to the
conclusion that this patch breaks binary compatibility
and that this problem warrants a PHP 4.4 release. As this
reference problem has been affecting many users, and
definitely eZ over the past months, I wrote an email[069]
to the list stating that it is “totally irresponsible” not to
release a fix for such a grave bug. Zeev[070] also said that
“we should probably not fix this at all in the 4.x tree”
because of the hassles that accompany “breaking module
binary compatibility.” He also seemed to think that the bug can easily be worked around.
Other users were a bit happier[071] that we finally nailed this bug, and Jani replied to Zeev that the magnitude[072]
of this bug is pretty high. Rasmus added that he “will
be deploying the patch and happily breaking binary compatibility[073]” as soon as the patch is ready. Breaking binary compatibility is only a “burden on the maintainers
of these packages” (of the various distributions). Wez thought that “the only logical move forward is a 4.4 branch and release[074].” In the end, the Zeev almighty was
“tired of going through the reasons again and again[075]” and noted that “everyone appears to prefer the upsides
to the downsides.” This resulted in the creation of the
PHP_4_4 branch[076] in the first week of June.
June
Wez added a new patch to our CVS server that allows
us to block access[077] to specific branches; with this,
we closed the PHP_4_3 branch for good. A week later,
I announced 4.4.0RC1[078], which features the reference bug fix.
Andi wrote another PHP 5.1 mail[079], which spawned
a nice long discussion on adding goto[080] to PHP, and comparing goto to exceptions. Magnus smartly added[081]
that “people are talking about hypothetical messy code because of goto” and that they forget that you don’t have to use a language construct simply because it is available.
The same thread also went into a branch that discussed[082] the ifsetor() language construct. After Andi returned, he decided not to do anything with
goto or ifsetor()[083], and that it was now the time to branch, so that we can merge the Unicode support that was developed in parallel by mostly Andrei and Dmitry, although Rasmus was “pretty sure the current discussions will pale in comparison to the chaos that will be created when the Unicode stuff goes into HEAD![084]”
Johannes wondered when the new date stuff[085] was going in; it was added a week later, just before PHP 5.1 beta 2. Lukas suggested that we add[086] the public keyword
to PHP 4.4 for forward compatibility. Rasmus again wondered about “the reasoning for not having var be
a synonym for public in PHP 5[087].” Andi mentioned[088]
that this “was meant to help people find vars so that they can be explicit about the access modifiers” when moving to PHP 5.
A few days later, Andi read a blog posting[089] which described how PHP 4.4 is breaking backwards compatibility
by issuing an E_STRICT in cases where developers abuse return-by-reference. This, however, was not actually the case[090].
Yasuo started a long thread[091] on allow_url_fopen
and claimed it was dangerous[092]. The main result of
this thread seemed to be that we wanted to split the
setting into two different privileges: one that allows
remote opening of URLs and one to allow include() on
remote URLs. However, this is something we could not
yet change.
The last thread of the month was by Andi, writing
about the PHP 5.1 release process[093].
July
In July, Jessie suggested[094] a String extension that
declares only one class: String. This class is meant to
prevent copying of the string’s data for most operations
(which is currently done with PHP’s string functions).
Most of the other developers were against it, for
different reasons: “String is such a generic name for a
non-core class[095]” and “the savings gained by this will be
more than offset by OO overhead[096],” so we will not let
“this get anywhere near the core[097].”
In the same week, I made more changes to the date
extension[098] that allow users to more easily select the
timezone that they want, instead of having to rely on
the TZ environment variable. This is also needed because
the TZ environment variable[099] can most likely not
be used in a thread safe way, and it is certainly not
portable[100]. Also in the same week, I proposed an API
for new Date and Timezone functionality[101]. After some
pressure[102], I added[103] an OO API, too. Near the end of
the month, I committed the implementation of the new
date functionality[104]. It was, however, #ifdef-ed out to
facilitate discussions at a later date.
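Selecting a timezone from within a script, without touching the TZ environment variable, can be sketched like this. The example uses date_default_timezone_set() as it exists in released PHP versions; the helper formatIn() and the chosen timezones are made up for illustration.

```php
<?php
// Hypothetical helper: format one timestamp as seen from a given timezone.
function formatIn($timezone, $timestamp)
{
    // Sets the script-wide default; no TZ environment variable is involved.
    date_default_timezone_set($timezone);
    return date('Y-m-d H:i T', $timestamp);
}

$noonUtc = gmmktime(12, 0, 0, 1, 1, 2006); // 2006-01-01 12:00:00 UTC

echo formatIn('Europe/Amsterdam', $noonUtc), "\n"; // 2006-01-01 13:00 CET
echo formatIn('America/Toronto', $noonUtc), "\n";  // 2006-01-01 07:00 EST
```

The same default can also be supplied through the date.timezone configuration setting instead of a function call.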
Jessie came up with Yet Another Namespace
Proposal[105], and tried to come up with a solution for all
the previous problems we had with the implementation.
He also made several patches[106] that added namespaces
to PHP.
We had some more fuss[107] about PHP 4.4 breaking BC,
where some people didn’t see[108] why we had to implement this fix. Unfortunately, there were some quirks[109] that we still had to sort out.
In this same month, Rasmus released APC 3.0.0[110],
which came with PHP 5.1 support and numerous fixes.
August
August started with a discussion on instanceof[111] being
“broken,” as it raises a fatal error in the case where the class that is being checked for doesn’t exist. Andi declared “if you’re referencing classes/exceptions in your code that don’t exist, then something is very bogus with your code[112]” and “the only problem is if the class does not exist in your code base, in which case, your application should blow up![113]”
I raised a question about whether the new PHP with
Unicode should be called PHP 5.5 or PHP 6.0[114]. Andi (and the majority) wanted to go “with PHP 6 and aim to release it before Perl 6[115].”
After PHP_5_1 was branched, Andrei merged the Unicode branch and gave us some instructions on how
to get started with it[116]. He also introduced the general ideas behind the implementation[117].
PHP 5.1 RC1 was finally rolled, about half way through the month, followed by PHP 5.0.5 RC2[118], a week later.
During the development of the eZ components[119],
we discovered various things in PHP’s OO model that we wanted to see changed. One of those issues was described
in the Property Overloading RFC[120]. Unfortunately, not everybody could be convinced[121], and no changes were made. I will try again though :)
The other issue that we raised was that failed typehints throw a fatal error[122], while that is not strictly necessary. Instead of throwing exceptions[123] in this case, the discussion turned towards adding a new error mode[124]
(E_RECOVERABLE[125]) that will be used for non-engine-corrupting fatal errors at the language level; this is exactly the case with failed typehints.
The longest thread of the month was started by
Rasmus when he posted his PHP 6[126] wish list, which
featured controversial changes such as “removing
magic_quotes” and “making identifiers case-sensitive,”
to which most developers quickly agreed[127]. Following
his initial wish list, the crowd went wild and started
suggesting all kinds of weird changes, such as “Radically
change all of the operator syntaxes[128],” adding <?php6[129]
as a BC breaking mode, and “Named parameters[130].”
Marcus made a list of his own[131] which would later
become the first draft of the meeting agenda for a PHP
Developers Meeting.
September
In September, Antony committed[132] an upgraded OCI8
extension which fixes a lot of bugs[133]. We also decided
to play a bit nicer with version_compare(), regarding
naming[134] release candidates.
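Why the naming of release candidates matters to version_compare() can be seen in a small sketch. The wrapper isOlder() and the version strings are examples chosen here; they are not taken from the discussion.

```php
<?php
// Hypothetical wrapper around version_compare() for readability.
function isOlder($a, $b)
{
    return version_compare($a, $b, '<');
}

var_dump(isOlder('5.1.0beta2', '5.1.0RC1')); // bool(true)
var_dump(isOlder('5.1.0RC1', '5.1.0RC2'));   // bool(true)
var_dump(isOlder('5.1.0RC1', '5.1.0'));      // bool(true)
```

version_compare() gives special meaning to suffixes such as beta and RC, so a version string that follows this scheme sorts before the final release it leads up to.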
Zeev wanted to roll[135] PHP 5.0.5 but there was an
issue[136] with the shutdown order. The reference issues
returned, too. The first one[137] turned out to be an
incorrect merge to the PHP 5.0 branch, where suddenly
some of the notices turned into errors[138]. The second
one[139] is simply a small change in behaviour, which
previously created memory corruption. Rasmus explained
the issue a bit more[140], once again.
Ilia tried to implement a clever fix[141] which turned
out to be a problem later on. Pierre started a discussion
on supporting Unicode in identifiers, something he didn’t
want to see. PHP already supports using UTF-8 encoded
characters[142] in identifiers, so removing this feature
will break BC unnecessarily. Besides breaking BC, many
people simply want to use their own language for writing
code, as Tex[143] writes.
Zeev made another attempt at PHP 5.1.0 RC2[144] with
the latest PEAR being the only thing missing. Marcus
brought up the issue of __toString() again, and finally
managed to get it into CVS, but unfortunately not in time
for PHP 5.1.
Stanislav[146] noticed some problems with detecting
time zones, as the new date/time code did not try to
attempt detection in favour of the new date.timezone
setting[147]. After some discussion, we came up with a solution[147], which was then implemented. It should guess the timezone correctly in most cases, even on
Windows. I also added support for an external timezone database[149].
October
In October, I noticed some weird notices[150] with
“make install-pear,” without a clue as to why they were showing up This discussion turned into a “why does PEAR not support PHP 5.1” thread[151] In the end, Greg managed to nail down the weird notices, though
I also noticed a commit by Dmitry[152] that ignores “&” when $this is passed I pointed out that this should not be supported (in PHP 5), as it doesn’t make really sense that people won’t see a warning/notice/error when they’re doing something silly Dmitry explained[153] that disallowing it would break code, but he also writes that
by “using ‘=& $this’, a user can break the $this value”—which is something we definitely should prevent He suggested[154] we make this an E_STRICT warning, and Andi suggested[155] we escalate this to an E_ERROR in PHP 6, but neither of those things happened
A week later, Piotr[156] asked for a tarball of our CVS to make it “possible to convert it to Subversion repository so browsing the repositories would be much easier.”
We wondered[157] why he needed that, as we offer our own browser[158], already
Matthias[159] said that we “do not want to set off yet another discussion about the changes 4.4 brought,” but that is exactly what he did Again, there was something wrong with his code, and thus the warning is legal.After resolving the timezone issues, last month,
we were surprised by a message from Zeev He simply missed[161] the conclusion in the “lengthly thread.”
As a result of the negative comments on the PHP 4.4.0 release, Lukas, Ilia and I set up a routine[162] for involving some of the better-known projects in the PHP 4[163] and PHP 5[164] release processes. As part of this effort, we send out[165] a mail to all participating projects whenever we have a release candidate to test.
I raised[166] some concern regarding our current Unicode implementation because of maintenance issues. In part of my mail, I also indicated that I wanted “to clean up PHP 6 for real,[167]” after private discussions with Marcus and Ilia. Behind the scenes, we prepared some material to organize a PHP Developers Meeting to discuss the Unicode implementation and the extended “PHP 6 Wishlist.” I also committed[168] a patch that allows typehints for classes to work with = NULL[169].
Another guy raised the issue of “that new isset()-like language construct,[170]” but this ended up going nowhere, as people were suggesting very Perl-like[171] operators. Jani replied to this thread with “How about a good ol’ beating with a large trout?[172]”
On the last day of the month, we released PHP 4.4.1[173], which addresses some of the reference issues we’ve seen in PHP 4.4.0.
November
In November, we prepared to finally release PHP 5.1, and one of the efforts was to make an upgrade guide[174] for people switching to PHP 5.1. Sean noticed[175] a problem with the parameter parsing API’s automatic type conversion. Like Andrei[176], many people think that “passing ‘123abc’ and having it interpreted as 123” is still wrong.
Dmitry implemented[177] support for “= null” as default to array type hinting, something that I did not do[178] on purpose because “= array()” is the logically correct way of doing this. Andi agreed[179] with me on this.
Ilia implemented, in PHP 5.1RC5[180], one of the items that was on the outcome list of the PHP Developers Meeting: adding a notice that warns people that curly braces[181] for addressing a character in a string are now deprecated in favour of the [] operator. Contrary to the current explanation in our manual, {} and [] are exactly the same thing[182], and “having two constructs for the same behaviour is silly and leads to confusing, hard to read code.” The outcome of this discussion was the removal of the notice in PHP 5.1, and the likely conclusion is that the curly-brace syntax is not going to be removed.
Another change that was made in PHP 5.1RC6 was the creation of the “Date” class, which caused quite a stir after the release of PHP 5.1[183]. The reason to introduce it in 5.1 was simply to make sure that no applications were going to break if we introduced the Date class later in the 5.1.x series. Unfortunately, a lot of projects, including PEAR, never heard of “prefixing” class names, causing class name clashes. Marcus described the problem as “PEAR ignores coding standards,[184]” but others suggested that we rename the internal class[185] to something silly.
After the PDM I posted[187] the meeting notes[188] to the list. Most of the outcome was well appreciated, except the curly braces idea, which has already been discussed. With these notes, we hope to make PHP 6 a success. The notes also spawned numerous[189] polls[190] on the symbol to use for separating namespaces from class names/function names. We also discussed our version of a goto: labeled[191] breaks[192].
The filter extension[193], which I’ve been developing for quite some time, did not make it into PHP 5.1, although it is a good idea[194] to add it now, with an “experimental” status, so that this wanted extension gets more testing. Perhaps for PHP 5.1.2…
December
December was a quiet month with little action. Ilia proposed[195] a plan for PHP 5.1.2 and released PHP 5.1.2RC1[196], Zeev committed[197] Dmitry’s re-implementation of the FastCGI API, and some user[198] was whining about our “official” IRC channel (which doesn’t exist).
That was it for 2005 (as far as PHP internal development is concerned)! I hope you enjoyed reading this, and have a happy new year. Extra thanks go to Ilia for being the release master, Dmitry for maintaining the engine, Jani for hunting down bug reports, Andrei for his work on Unicode, Mike for his enormous stream of useless commit messages ;-), and to all others who made PHP happen this year.
DERICK RETHANS provides solutions for Internet related problems. He has contributed in a number of ways to the PHP project, including the mcrypt, date and input-filter extensions, bug fixes, additions and leading the QA team. He now works as project leader for the eZ components project for eZ systems A.S. In his spare time he likes to work on xdebug, watch movies, travel and practice photography. You can reach him at derick@derickrethans.nl.
PDFLib’s Block Tool
FEATURE
The PDFLib Block Tool, available for use only with PDFlib Personalization Server (PPS), helps create PDF documents derived from large amounts of variable data.
Before the block tool was added, it was a difficult process to place variable data, images, and even other PDFs into precise areas of a PDF that had been designed previously. Now, adding variable data is very simple and helps create great dynamic pieces for just about any application.
Installing the Block Tool
Currently, the block tool plug-in for Adobe Acrobat is only available on the Windows and Macintosh (both Mac OS 9 and Mac OS X) platforms. On either platform, you must also have Version 6 or 7 of Adobe Acrobat Professional or Adobe Acrobat Standard, or the full version of Adobe Acrobat 5. Other versions of Adobe Acrobat (Acrobat Reader and Acrobat Elements) and all other PDF creation tools do not work with the block tool plug-in. (Check the PDFlib web site for an up-to-date list of supported PDF authoring tools.)
Windows OS Installation
If you’re using Windows, you can use the block tool installer provided by PDFlib to get the plug-in installed correctly into your version of Adobe Acrobat 5, 6, or 7. The installer places the correct files into the Acrobat plug-ins folder, which is typically found at C:\Program Files\Adobe\Acrobat 6.0\Acrobat\plug_ins\PDFlib. The Windows version of the block tool is compatible only with PPS version 6.0.1
by Ron Goff
TO DISCUSS THIS ARTICLE VISIT:
http://forum.phparch.com/280
If you’ve been developing for any length
of time, you’ve probably been tasked with generating PDFs at some point In this article, we’ll discuss the process of combining data from many sources into a single PDF—from installation of the block tool, to creating the blocks in Adobe Acrobat, and then finally working with the blocks via PDFlib.
PDFLib’s Block Tool
CODE DIRECTORY: pdflib
Mac OS Installation
You can install the block tool in either Mac OS 9 or OS X. If you own Adobe Acrobat 5, place the files that comprise the block tool into the Acrobat plug-in directory, typically located at /Applications/Adobe Acrobat 5.0/Plug-Ins/.
If you’re using Adobe Acrobat version 6 or version 7, save the files that comprise the block tool into a new directory and then locate the Acrobat program, which is usually found at /Applications/Adobe Acrobat 6.0 Professional. Using the Finder, click once on the Acrobat application to select it and then choose “File > Get Info” from the menu bar. Locate the triangle next to the words “Plug-ins.” Expand the triangle, select “Add,” and then locate the folder that contains the block tool plug-in files.
Creating Blocks
After you install the block tool, you should see a new menu called “PDFlib Blocks” in Acrobat’s main menubar. You should also see a new icon that resembles [=]); this is the block tool. (See the top of Figure 1.) You use the block tool icon to create regions that you can fill with variable data.
When you click the block tool icon and hover over the PDF, your cursor turns into a crosshair. To create a block, click the mouse and hold it while dragging your cursor. As you drag your cursor, a lightly-outlined box should appear. (See Figure 1.)
When you’re satisfied with the size of the box, release the mouse button. A menu like the one shown in Figure 3 appears. The menu controls all of the properties of the block, including the formatting of the data that will be contained in the block (data that you will add via PDFlib).
FIGURE 1 FIGURE 2 FIGURE 3
The New and Improved Block Tool
If you’ve used previous versions of the block tool, you’ll notice that the new version is much more user friendly. The export and import features have also been updated, making it much quicker to apply blocks from previously formatted PDFs.
There are three types of blocks that can be created:
• The first and default type of block is text. It handles any type of text, whether it’s a single line of text or many lines of text.
• The second type of block is image. As its name implies, an image block is a container for the dynamic placement of images within the PDF.
• The third and last type is PDF, which is able to contain other PDFs.
Each block has general properties (see Figure 2) and type-specific properties. General properties set attributes such as the placement of the block, its background and border colours, and its orientation, to name just a few. Some of the sections that follow describe the type-specific properties.
So what do you do with blocks? As you might have inferred already, you use blocks to mix dynamic content amid static content. A designer can create a PDF, include static text and images, and then place blocks wherever dynamic content should appear. Your application “fills in the blanks,” so to speak, and because blocks retain properties such as typeface, font size, color, kerning, and other settings, the block, once filled, looks exactly like the rest of the document, just as the designer intended.
Using blocks, the application that generates each PDF document need not format anything. However, if you want to customize a block on-the-fly, you can. Pre-defined block attributes can be overwritten by your code.
Editing Block Settings
To change a block property, select the block you want to configure and then navigate to find the property you want to change. For example, Figure 3 shows how to edit the textflow property, which can be either true or false (hence, the dropdown menu).
The purpose of most properties is obvious, but be careful with attributes that specify font names. Unless you’re running Acrobat on the same machine as your PDFlib application, it’s likely that the set of fonts on the two machines (say, your desktop and the server, respectively) will differ. Be sure to use the names of fonts that are installed on your server.
Text Flow Settings
If you want a block to flow (automatically wrap and justify) arbitrary amounts of text, set the textflow property to true. Once set to true, an additional button named TextFlow appears next to the existing button labeled Text. Click on TextFlow to examine and set specific variables (such as leading and indents) that control how text flows in the block. All other text attributes (those for one line of text or a flow of text) remain in the same pane as the textflow property.
Image Settings
By changing the block option to image, you can use PDFlib to place images dynamically in a PDF. There are far fewer options for an image block than for a text block. The options screen for an image block is shown in Figure 5.
The defaultimage attribute names a default image to place if the image specified by PDFlib is unavailable.
The dpi setting, or the number of dots per inch, is used to override the dpi of an image. PDFlib will use the default dpi value of the image if it is available, or 72 dpi if this option isn’t set. If necessary, you can set the horizontal and vertical dpi independently by supplying two values instead of one, first horizontal dpi and then vertical dpi.
The scale property controls the scaling of the image. You can supply one value to scale horizontally and vertically equally, or supply two values, one for the horizontal and another for the vertical scale factor.
PDF Settings
The settings for a PDF block are very similar to the settings for an image block, as shown in Figure 6. defaultpdf specifies a default PDF to place if the PDF document that PDFlib names cannot be found.
defaultpdfpage specifies which page of the default PDF to place if the default PDF must be used.
scale controls the scaling of the PDF. As with an image, you can specify one value to apply to both axes or you can provide two values, one for horizontal scaling and another for vertical scaling.
Custom Settings
When using any type of block, you can specify custom attributes. Custom attributes do not affect the output when using PDFlib, but can be retrieved by PDFlib for interpretation by your code. Custom attributes are good for passing information to the PDFlib program, or even for just better record keeping.
As an example, say that you want to create a text block that’s limited to ten characters or fewer. Create the text block, add a custom property named length, set it to 10, and then retrieve the value via PDFlib at runtime. Your code can verify the length of a string before filling the block and react accordingly, perhaps truncating the string or asking the user to provide a new value.
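The length check described above might be sketched as follows. The helper name fitToBlockLimit() is hypothetical, and the limit is passed in as a plain integer: how the custom property is read from the template depends on your PDFlib version, so that step is only indicated in a comment. Truncation is just one of the two reactions mentioned; asking the user for a new value would replace the substr() branch.

```php
<?php
/**
 * Hypothetical helper: make sure $text respects a block's custom "length" limit.
 * Returns the text unchanged when it fits, or a truncated copy when it does not.
 */
function fitToBlockLimit(string $text, int $limit): string
{
    if ($limit < 0) {
        throw new InvalidArgumentException('The limit must not be negative.');
    }
    // strlen() counts bytes, which equals characters for single-byte
    // encodings such as winansi.
    return strlen($text) <= $limit ? $text : substr($text, 0, $limit);
}

// Assumption: $limit was read from the block's custom "length" property at
// runtime (retrieval not shown); 10 mirrors the value set in Acrobat.
$limit = 10;
$short = fitToBlockLimit('Ron Goff', $limit);
$long  = fitToBlockLimit('A value that is far too long', $limit);

echo $short, "\n";
echo $long, "\n";
```

The returned string can then be passed to the block-filling call as usual.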
The PDFlib Blocks Menu
To make setting up blocks easier, the “PDFlib Blocks” menu has a few handy tools. You can export and import blocks to re-use complex blocks, you can align elements, and more.
FIGURE 7
FIGURE 8
FIGURE 9
Exporting
The “Export” feature is a huge timesaver when dealing with multiple PDFs that require the same types of blocks. Once you’ve finished setting up blocks in a single “master” PDF, you can export those blocks and then import them over and over again into other PDFs. There are several different settings in the “Export” dialog (see Figure 7):
• You can export blocks from all pages of the PDF or from a subset of them.
• You can export blocks to a new PDF or to an existing PDF. Selecting “New File on Disk” creates a blank PDF with the blocks set in the new file. If you want to export blocks to a document that you already have opened in Adobe Acrobat, select “Open Document” and click “Choose” to see a list of all open documents. If you choose “Replace Existing Files”, the block tool will overwrite the target file with blank pages with the blocks in the proper place.
• The next option is “Export Which Blocks?” This section allows you to control which blocks are exported. You can export all blocks (depending on the number of pages you choose in the first section) or just the blocks that you highlight before exporting. You can also choose to delete the blocks that exist on the target PDF.
Importing
You can import blocks from another PDF using the import option in the “PDFlib Blocks” menu. When you choose “Import,” you will be presented with a screen to choose the file that contains the blocks you want to import (Figure 8).
After you choose the appropriate file, you can determine which pages the blocks should be applied to.
Alignment Options
The alignment option in the “PDFlib Blocks” menu allows you to align two blocks.
To align, choose a block. It should turn pink, reflecting that it’s your primary choice. Then choose another block; it should turn blue, indicating that it’s your secondary choice. When you select “Align,” the blue block should align with the pink block. Figure 9 shows two blocks: Block_1, the secondary block, is left-aligned to the primary block, Block_0.
The “Size” alignment option only works when more than one block is selected. You can change all secondary blocks (blue) to be either the same width or height as the primary block (pink).
The “Center” alignment option aligns all selected blocks horizontally, vertically, or both.
Defining Blocks and Detecting Settings
Two other time savers are available in the “PDFlib Blocks” menu: one creates a block from a placed object like an image, and another creates blocks that automatically detect the font settings and font color of the text that the block is being created over.
Click on “Click Object to Define Block” and then click on an object such as an image to create a block of the same dimension in the exact same position.
Or, if you click on “Detect Underlying Font and Color” before you create a block, the block’s font settings are automatically set to match the style and size of the text below the new block. This feature is especially useful when dealing with a lot of text and specific colors. (You may have to adjust the font name to match a font located on the server running PDFlib.)
Using Blocks
As you might imagine, working with blocks from within your code makes placing text, images, and PDFs into a dynamic PDF far simpler than writing code to control the pointer, stroke text line-by-line, and so on. With blocks, formatting is separated from your code, leaving all of the aesthetics to the designer creating the PDF. Better yet, a change to the design of the page doesn’t (necessarily) necessitate tweaking your code.
Setting up the dynamic PDF document is similar to what’s been shown in prior chapters, except you need to pull in the PDF that contains the blocks. First, specify the basic information:
PDF_set_info($p, "Creator", "block_tool.php");
PDF_set_info($p, "Author", "Ron Goff");
PDF_set_info($p, "Title", "Block Tool");
Next, pull in the PDF page that contains the blocks, place it into memory, and create a new blank page:
$block_file = "block_file.pdf";
$blockcontainer = PDF_open_pdi($p, $block_file, "", 0);
// Page standard 8.5 x 11 inches (612 x 792 points)
PDF_begin_page_ext($p, 612, 792, "");
Continuing, call up the actual page that you want to use. In the line of code below, the 1 (numeral one) refers to page one of the PDF that contains the blocks.
$page = PDF_open_pdi_page($p, $blockcontainer, 1, "");
If you want to use another page from the “template” PDF, just specify that page number instead of 1.
Finally, the page with blocks is “copied” to the new page in the new PDF.
PDF_fit_pdi_page($p, $page, 0.0, 0.0, "adjustpage");
The adjustpage option adjusts the size of the new page to match the page size of the template PDF. adjustpage overrides any page settings that have been set previously.
From here, you are ready to use the blocks.
Text Blocks
Whether working with a line of text or a text flow, text is easy to fill in: just specify the name of the block and the text to render and call PDF_fill_textblock().
$block = "Block_1";
$text = "All the pie in the sky wasn't enough to fill my plate";
PDF_fill_textblock($p, $page, $block, $text, "encoding=winansi");
The block name, here Block_1, is the name that was assigned to the block when it was created in the template PDF. (Block names are unique and the default name is Block_#, but a block name can be any string of alphanumeric characters.)
Notice that there are no extra formatting options. Whatever text you “insert” assumes the formatting of the block.
If you want to override a block’s formatting, you can. Where encoding=winansi appears, add the options that you want to override. For example, to override the font size, specify encoding=winansi fontsize=12.
You should also enable embedding as needed. You can enable embedding by adding embedding=true, as in encoding=winansi embedding=true.
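When several properties are overridden at once, assembling the option string by hand gets error-prone. The following is a minimal sketch of one way to build it; buildOptionList() is a hypothetical helper, not part of PDFlib, and it assumes simple values without spaces (anything more complex would need PDFlib's own quoting rules).

```php
<?php
/**
 * Hypothetical helper: turn an array of overrides into an option list
 * such as "encoding=winansi fontsize=12".
 */
function buildOptionList(array $overrides, string $encoding = 'winansi'): string
{
    // The encoding always comes first; an 'encoding' key in $overrides is ignored.
    $options = ['encoding' => $encoding] + $overrides;
    $parts = [];
    foreach ($options as $name => $value) {
        if (is_bool($value)) {
            $value = $value ? 'true' : 'false';   // booleans are spelled out
        }
        $parts[] = $name . '=' . $value;
    }
    return implode(' ', $parts);
}

$optlist = buildOptionList(['fontsize' => 12, 'embedding' => true]);
echo $optlist, "\n";

// With the handle and template page set up as in the article, the call would be:
// PDF_fill_textblock($p, $page, "Block_1", $text, $optlist);
```

Keeping the overrides in an array also makes it easy to apply the same set to several blocks.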
Image Blocks
The process of placing an image in an image block resembles that of placing the image “manually”: the image is loaded and then placed.
$block4 = "Block_4";
$image_load = "image.jpg";
$image = PDF_load_image($p, "auto", $image_load, "");
PDF_fill_imageblock($p, $page, $block4, $image, "");
PDF_close_image($p, $image);
In this example, the image image.jpg is placed in Block_4 using the function PDF_fill_imageblock().
Form Conversion
You may be familiar with the Adobe Acrobat “Form Tool,” a great way to create fillable areas of your PDF. So, why not just use forms to define variable data placement? Because the form tool is limited: it cannot specify advanced font settings, whereas the block tool has been designed specifically to customize all aspects of your text. However, if you have a PDF that used the form tool to define areas for text, there is an option within the “PDFlib Blocks” menu to convert your pre-made forms into blocks (Figure 5.4).
RON GOFF is the technical director/senior programmer for Conveyor Group (www.conveyorgroup.com), a Southern-California based web development firm. He is the author of several articles for PHP|Architect magazine and other online publications. Ron lives in California with his wife Nadia and 2 children. You can contact him at ron@conveyorgroup.com.
Closing the Page
After you’ve filled all of the appropriate blocks on the open page, you must close that page.
PDF_close_pdi_page($p, $page);
This line closes the imported page; after it is called, you can start a new page or end the entire document.
Putting It All Together
A complete example using the PDF_fill_textblock() function can be seen in Listing 1.
The PDFlib block tool is easy to use and provides for complex layouts without extensive programming. Using blocks, a designer can assign where dynamic text, images, and even PDFs are to be placed, yielding a much more professional result.
PDF_set_info($p, "Creator", "block_tool.php");
PDF_set_info($p, "Author", "Ron Goff");
PDF_set_info($p, "Title", "Block Tool");
// ...
header("Content-type: application/pdf");
header("Content-Length: $len");
header("Content-Disposition: inline; "
FPDI in Detail
FEATURE
by JAN SLABON
TO DISCUSS THIS ARTICLE VISIT:
http://forum.phparch.com/279
PDF documents, or better stated, the PDF format, have reached widespread popularity over the past few years, and this momentum continues. A very strong example of this is a recent ISO standard, which is based on PDF 1.4 and defines a PDF derivative for the long-term preservation of electronic documents. PDF has become a real standard!
In fact, the dynamic generation of PDF documents is an important issue today, and will continue to be so in the future. While it’s quite simple to build PDF documents on desktop PCs, their dynamic generation on a webserver, especially when using a language like PHP, can prove very difficult.
On the Internet, you’ll find several PDF APIs that
will allow you to create PDF documents with PHP Some
“FPDF” stands for “Free.”
Most PHP developers know about the ability to create PDF documents on the fly. When looking at the wide range of PHP classes or APIs, every product has its own advantages and disadvantages: some of them are very expensive and others are free, but don’t offer the same functionality as the expensive ones. The main difference between the free and commercial libraries is the ability to use external documents. PDFLib has supported this through its PDI interface, but the free classes didn’t support external documents until I released FPDI for FPDF, which gives you the same muscle, but for free!
When I was working with FPDF, I was often challenged with a situation where I had to rebuild a whole document programmatically. As you can imagine, this part was very frustrating, tedious, and time consuming. A digital version of your document is sitting right in front of you, and you just cannot use it.
Similarly, I ran into additional problems when dealing with vector based graphics and FPDF. There was no real way to import such things, except by converting them to bitmaps and using the Image() method of FPDF. I’m sure I don’t have to explain the drawbacks to this workaround.
When I found an article in php|architect (Vol 3, Issue 5) where Marco Tabini described how to parse a PDF and update it with some simple content, I got the idea to implement this technique into FPDF, which resulted in a library which was also named with 4 simple chars: FPDI (Free PDF Import).
I released my new library under the Apache Software License 2.0, which allows you to use it in your commercial or non-commercial projects. The project homepage can be found at http://fpdi.setasign.de. The article by Marco is freely available as a monthly sample, at http://www.phparch.com/issuedata/articles/article_110.pdf
In this article, I’ll introduce you to FPDI, explain how it was born, and cover its internal workings. I will assume that you have some knowledge of FPDF, and have a bit of experience with the Portable Document Format itself. If not, just download FPDF, and run the tutorials that Olivier provided in the package. This article will not tell you how to use FPDF, but will delve deeper into the details of the PDF structure and how FPDI extends FPDF, bringing out the ability to import single pages of existing PDF documents, not just modifying existing documents. This feature is not that clear to most people out there.
At this point I could tell you much about the structure of a PDF document, but as I already mentioned, the whole idea is based on another article, where everything you need to know about parsing a PDF is already described. I will cover some details about that issue later in this article.
I want to make it clear why I chose the “import single pages” method, instead of “really modifying/updating” a PDF. To put it simply: “It is much easier.” You can look at a PDF document as a collection of single objects which are linked to each other. Pages, images, font descriptions, and document information are all single objects and can be identified by a unique ID.
The PDF format is more flexible than just assigning objects by simple IDs, though: it allows one to define named relations. For example, these relations can be used to put an image into a content stream of a PDF page. You have to set up a resource dictionary, where you define the name of the image and its real object relation. After this, you can simply refer to the image by using the name you provided in the content stream. As FPDF, and any other PDF generators, use named relations, which lead into name conventions, you have to pay attention when updating a PDF.
If you’ve read Marco’s article, you’ll remember that there’s a part in it where he searches for the next available font name. This check has to be built into FPDF before every piece of code where FPDF creates a named relation.
Another disadvantage of updating documents is that you cannot remove single pages, or reuse an existing page in an easy way. The import method will, however, allow us to reuse, resize, crop or rotate pages. We can also avoid naming conventions, because every imported page has its own kind of namespace in the new document, as you’ll see below.
The Basics
While I was studying the PDF reference to find a good solution for importing pages, I came across a technique with the spooky name of “form XObjects”. I’m sure that everyone who stumbles upon this term thinks about conventional “forms” like those that we use in HTML, or on paper. In this case, “form” has another meaning: it corresponds to the notion of forms in the PostScript language.
A form XObject can be compared with a kind of layer. It is a self-contained description of any sequence of graphics objects; its whole structure is almost the same as the structure of a single page in a PDF document. The form XObject has its own resource dictionary, where named relations are defined. So, it seemed to be the perfect solution for my problem: if I could create form XObjects, I most certainly would be able to convert pages into them.
But form XObjects have more advantages than simply preparing FPDF for PDF import. For example, they can be reused at any time in a PDF document, where the viewer application can cache the rendered results to optimize the execution. It sounded like a kind of template to me, so I began extending FPDF with this feature, which resulted in a PHP class called fpdf_tpl. This class redirects all output made by FPDF into containers which will be used as form XObjects, so one can reuse any output created with FPDF, at any time.
This class has more to offer than merely preparing FPDF for FPDI, as already stated. You can reuse a template multiple times in a document, whereas it only needs to be written once to the resulting document, which leads to less memory usage and processing time in your script.
Examples of its use are: the generation of headers and/or footers, table headers which could be repeated on every page, a background grid of large tables, text in front or behind a template, etc.
If you take a look at Listing 1 and Figure 1, you’ll see a sample script which demonstrates the use of templates. You turn templates on and off by setting the $pdf->useTPLs property to true or false; the visual result is the same. This demo has no real meaning, but it shows how much the file size and process time decrease if you’re using templates. My tests gave me a process time of only 0.0766 seconds when using templates, and 3.649 seconds without them! The same was true for the buffer size: with templates it only takes up 14.5 kb; without templates, approximately 1.2 MB.
static $content = null;
$this->SetFont('Arial', 'B', 10);
$this->SetFillColor(255, 153, );

$this->Rect($this->lMargin, 28, $width =
    $this->w - $this->rMargin - $this->lMargin, 3, 'F');
$this->Rect($this->lMargin, $this->h - 10,
// ...
$content = file_get_contents(__FILE__);
$this->SetFont('Courier', '', );
$this->MultiCell($width - 3, 2.5, $content);
}

// For debugging purpose
function pdf($orientation = 'P', $unit = 'mm', $format = 'A4')
{
    $this->_startTime = microtime();
    parent::fpdf_tpl($orientation, $unit, $format);
}

// For debugging purpose
function Close() {
    $this->_endTime = microtime();
    $this->_writingTime = true;
// ...
list($usec, $sec) = explode(" ", $this->_startTime);
$start = ((float) $usec + (float) $sec);
list($usec, $sec) = explode(" ", $this->_endTime);
$end = ((float) $usec + (float) $sec);
$time = $end - $start;
// ...
for ($n = 0, $c = count($this->pages); $n < $c; $n++)
    $buffersize += strlen($this->pages[$n]);
for ($n = 0, $c = count($this->tpls); $n < $c; $n++)
    $buffersize += strlen($this->tpls[$n]['buffer']);
I hope that the main advantage of fpdf_tpl is now clear. Let’s skip ahead and take a deeper look at this class. The class uses an array named $this->tpls for holding all created templates, where each entry describes a single template as an array with special keys. The main entries in each template array are x, y, w, h and buffer. All other entries are just used to save other information, and are prefixed with o_.
A new property, with the name of $this->res, is used to assign resources like fonts, images, or other templates, to the template or the page. The assignment of resources to single pages is left in for testing purposes, and will be removed in the next release of fpdf_tpl.
So, we’ll only take a look at the tpl key in $this->res. This array is needed to rebuild the form XObject’s resource dictionary with named relations, which are used in the template. To redirect the output made by FPDF, I used a simple flag, $this->intpl, and extended the _out() method. I had to take special care because a form XObject cannot include internal or external links or, more generally, any kind of annotation.
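To make the redirection idea concrete, here is a toy model of it. This is not fpdf_tpl's actual code: the names tpls, intpl and _out() follow the article, while the class ToyTemplateWriter, its other members and the /TPL naming are assumptions made for this sketch. It only shows how one flag can route every write either to the page or to a template buffer, and why a reused template costs just a short reference on the page.

```php
<?php
/**
 * Toy model of output redirection (not fpdf_tpl's real implementation).
 */
class ToyTemplateWriter
{
    public $pageBuffer = '';   // stands in for the current page's content
    public $tpls = [];         // one array per template, with a 'buffer' key
    public $intpl = false;     // true while a template is being recorded
    private $current = -1;     // index of the template being recorded

    public function beginTemplate(): int
    {
        $this->tpls[] = ['x' => 0, 'y' => 0, 'w' => 0, 'h' => 0, 'buffer' => ''];
        $this->current = count($this->tpls) - 1;
        $this->intpl = true;
        return $this->current;
    }

    public function endTemplate(): void
    {
        $this->intpl = false;
    }

    // Every drawing operation ends up here; the flag decides the destination.
    public function _out(string $s): void
    {
        if ($this->intpl) {
            $this->tpls[$this->current]['buffer'] .= $s . "\n";
        } else {
            $this->pageBuffer .= $s . "\n";
        }
    }

    // Using a template only writes a short named reference to the page.
    public function useTemplate(int $idx): void
    {
        $this->_out('/TPL' . $idx . ' Do');
    }
}

$pdf = new ToyTemplateWriter();
$tpl = $pdf->beginTemplate();
$pdf->_out('0 0 100 20 re f');   // recorded once, inside the template
$pdf->endTemplate();

$pdf->useTemplate($tpl);         // referenced twice on the page
$pdf->useTemplate($tpl);

echo $pdf->tpls[$tpl]['buffer'];
echo $pdf->pageBuffer;
```

The drawing operator is stored a single time in the template buffer, and each use adds only one line to the page.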
FPDF uses a single, global resource dictionary for all
pages and creates it within the _putresources() method.
I extended this method to make it call _puttemplates(),
which creates all necessary template objects. After
the objects are created and written, the named relations
to them are written to the main resource dictionary.
All created templates are usable on every page!
Unfortunately, using the global resource dictionary isn’t
the best solution because it introduces problems when
interpreting or extracting pages of a document, as you
will see later.
With the fpdf_tpl class, I’ve built the basis for
FPDI. Now we have to convert the pages of an already
existing PDF document, but we have to parse it first to
get the desired information.
Parsing the Original Document
I owe a lot of credit to Marco’s article, because the
parsing of an existing document was nearly completely
covered in it.
I adapted all parsing functions into a single class, pdf_parser, and added support for reading streams. Let’s take a quick look at the structure and how the parsing has to be done. The first task of the parser is to read the xref-table of the PDF document. This is done by the pdf_parser::pdf_read_xref() method. The xref-table is similar to a table of contents: it gives us information about the objects used in the document and their byte-offset positions in the file. At the end of the xref-table, we’ll find the file trailer dictionary; the entries in this dictionary lead us to the catalogue dictionary of the file. The catalogue dictionary is the root of all objects in the document’s object hierarchy, and in it we’ll find the reference to the first page tree node of the document’s page tree, which is exactly what we’re searching for: all single pages used in the existing document.
The parser has to follow the whole page tree to get the exact page count and to collect other information on the pages. This is done by read_pages() in the extended class, fpdi_pdf_parser, and results in an array stored in the $this->pages property. The keys of $this->pages are the page numbers, starting at zero, and each entry holds the related page object. After this task is done, we have enough information about the source document for now.
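To make the first parsing step more concrete, here is a minimal sketch of how a classic xref-table can be located and read. It is not the code of pdf_parser::pdf_read_xref(); the function name sketch_read_xref() is hypothetical, and the sketch assumes a single plain-text xref-table (no cross-reference streams and no chain of incremental updates).

```php
<?php
// Return an array of object number => byte offset, or false on failure.
function sketch_read_xref($pdf)
{
    // The last "startxref" keyword is followed by the offset of the xref-table.
    $pos = strrpos($pdf, 'startxref');
    if ($pos === false
        || !preg_match('/startxref\s+(\d+)/', $pdf, $m, 0, $pos)) {
        return false;
    }
    $xrefStart = (int) $m[1];
    if (substr($pdf, $xrefStart, 4) !== 'xref') {
        return false;
    }

    // Split into lines, tolerating \r\n, \r and \n line endings.
    $lines = preg_split('/\r\n|\r|\n/', substr($pdf, $xrefStart + 4));
    $offsets = array();
    $objNum = 0;
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === 'trailer') {
            break; // the trailer dictionary follows the table
        }
        if (preg_match('/^(\d+)\s+(\d+)$/', $line, $m)) {
            // Subsection header: first object number and entry count.
            $objNum = (int) $m[1];
        } elseif (preg_match('/^(\d{10})\s+(\d{5})\s+([nf])$/', $line, $m)) {
            // Entry: 10-digit offset, 5-digit generation, n = in use, f = free.
            if ($m[3] === 'n') {
                $offsets[$objNum] = (int) $m[1];
            }
            $objNum++;
        }
    }
    return $offsets;
}

// Build a tiny PDF-like string with correct offsets to try it out.
$body = "%PDF-1.4\n";
$off1 = strlen($body);
$body .= "1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n";
$off2 = strlen($body);
$body .= "2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n";
$xrefPos = strlen($body);
$body .= "xref\n0 3\n0000000000 65535 f \n"
       . sprintf("%010d 00000 n \n", $off1)
       . sprintf("%010d 00000 n \n", $off2)
       . "trailer\n<< /Size 3 /Root 1 0 R >>\nstartxref\n"
       . $xrefPos . "\n%%EOF\n";

$offsets = sketch_read_xref($body);
```

With the offsets in hand, a parser can seek directly to any object, for example the catalogue named by the trailer's /Root entry, without scanning the whole file.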
While I was implementing this code, I got stuck
on some problems; it took me several days (and nights) to fix them. A major problem for me was determining the line endings in a file. Normally, this task is handled by the PHP configuration directive
distance of the first to the third and the second to the fourth value. This bug has been overlooked for a long time, because it only manifests itself if the MediaBox’s
x- or y-value is other than 0. It’ll be fixed in the future!
To resolve the MediaBox’s data, the extended parser for FPDI is shipped with a getPageBox() method. This method is needed because the MediaBox (or any other box) can also be a reference to another PDF object, or its value can be inherited from a parent node in the page tree. This method makes sure that the correct values will be resolved. Currently, FPDI supports only PDFs that contain a MediaBox; there are other boxes in the PDF specification, e.g. a CropBox or a TrimBox. If your PDF uses other boxes instead of a MediaBox, the results of FPDI might not be as expected. Also, if another box is used, you can ignore the bug described in the paragraph above.
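The two ideas in this paragraph, inheritance through the page tree and the fact that a box holds two corner points rather than a width and height, can be sketched like this. The function sketch_get_page_box() and the simplified node arrays with a 'parent' key are assumptions made for illustration; they are not FPDI's getPageBox() or the parser's internal structures.

```php
<?php
// Resolve a box for a page node, walking up the tree if it is inherited.
// Returns x, y, w, h or false if no node in the chain defines the box.
function sketch_get_page_box(array $node, $boxName)
{
    while (true) {
        if (isset($node[$boxName])) {
            // A PDF rectangle stores two opposite corners, not a size.
            list($x1, $y1, $x2, $y2) = $node[$boxName];
            return array(
                'x' => min($x1, $x2),
                'y' => min($y1, $y2),
                'w' => abs($x2 - $x1),
                'h' => abs($y2 - $y1),
            );
        }
        if (!isset($node['parent'])) {
            return false;
        }
        $node = $node['parent'];
    }
}

// The page itself has no MediaBox, so it inherits the one of its parent.
$pagesRoot = array('MediaBox' => array(10, 20, 605, 812));
$page = array('parent' => $pagesRoot);

$box = sketch_get_page_box($page, 'MediaBox');
```

Note that the width and height differ from the third and fourth values as soon as the lower-left corner is not at the origin.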
The next task is to fill the buffer of our template with the content stream of the imported page. There’s one important difference between a PDF page and a form XObject: a page can have multiple content streams, while a form XObject can only have one. Because of this, we have to concatenate all content streams
of a page into one single stream. To do this, there’s a method called getPageContent() in the extended parser (fpdi_pdf_parser).
All of these resolved streams can be encoded with different filters. The most commonly used filter is the
FlateDecode filter, which can be decoded with the zlib
functions, if they are enabled in the PHP installation. I’ve also written two more decoders for the LZWDecode and
ASCII85Decode filters. With these three filters, FPDI should handle nearly all documents which have encoded page content streams; until now there have been no bug reports related to an absent filter. The decoding of the content streams is done by the rebuildContentStream()
method in the extended parser class. After decoding, the streams can simply be concatenated into a single one and assigned to the buffer key in the desired template array.
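As a rough illustration of the decode-and-concatenate step, the sketch below handles only unfiltered and FlateDecode streams. sketch_rebuild_content() and the simplified stream arrays ('filter' and 'data' keys) are hypothetical and not the actual rebuildContentStream() implementation; the sketch also assumes the zlib extension is available.

```php
<?php
// Decode each content stream and join them into one.
// Returns false if a stream cannot be decoded.
function sketch_rebuild_content(array $streams)
{
    $decoded = array();
    foreach ($streams as $stream) {
        $data = $stream['data'];
        if ($stream['filter'] === 'FlateDecode') {
            if (!function_exists('gzuncompress')) {
                return false; // zlib is not enabled in this PHP build
            }
            $data = gzuncompress($data);
            if ($data === false) {
                return false;
            }
        } elseif ($stream['filter'] !== null) {
            return false; // filter not covered by this sketch
        }
        $decoded[] = $data;
    }
    // Streams may only be split at token boundaries, so whitespace
    // between them keeps the operators of adjacent streams apart.
    return implode("\n", $decoded);
}

$streams = array(
    array('filter' => 'FlateDecode', 'data' => gzcompress("q 1 0 0 1 0 0 cm")),
    array('filter' => null, 'data' => "Q"),
);
$content = sketch_rebuild_content($streams);
```

The resulting string is what would be stored under the buffer key of the template.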
The next step is to resolve the resources which are used in the content streams we want to import. These can be relations to images, fonts or other form XObjects. The resources are normally defined as named relations
in the page dictionary, or in a parent node in the page tree. To resolve them, the extended parser offers a
_getPageResources() method, which returns the desired resource data of the page. The method will not resolve the resource’s own data, but only information such as its name and the objects it references in the original document. The real import of these resources
auto_detect_line_endings, but as a PDF file can have
multiple updates by different programs (on different
operating systems), the line endings can be mixed. To
overcome this issue, I’ve written a wrapper for fgets()
which is used as a fallback function if fgets()
returns incorrect data. This wrapper function also enables
the class to be used with PHP versions older than 4.3,
where auto_detect_line_endings was introduced.
To make FPDI compatible with PHP versions older than
4.3, I also created wrapper functions for strspn()
and strcspn() (their optional start and length parameters
were only added in PHP 4.3), so that FPDI should run
with PHP 4.2+.
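A line reader that copes with mixed line endings might look like the following sketch. It is an assumption about how such a wrapper could work, not FPDI's own fallback function; the name sketch_fgets() is hypothetical, and unlike fgets() it returns the line without its terminator.

```php
<?php
// Read one line from $fp, accepting \n, \r\n or a lone \r as line ending.
// Returns the line without its terminator, or false at end of file.
function sketch_fgets($fp)
{
    $line = '';
    while (($c = fgetc($fp)) !== false) {
        if ($c === "\n") {
            return $line;
        }
        if ($c === "\r") {
            // Swallow a following \n (Windows style); otherwise step back
            // one byte so the next line keeps its first character.
            $next = fgetc($fp);
            if ($next !== false && $next !== "\n") {
                fseek($fp, -1, SEEK_CUR);
            }
            return $line;
        }
        $line .= $c;
    }
    return $line === '' ? false : $line;
}

// A stream that mixes all three conventions.
$fp = fopen('php://memory', 'w+');
fwrite($fp, "%PDF-1.4\r\nxref\rtrailer\nstartxref");
rewind($fp);

$lines = array();
while (($l = sketch_fgets($fp)) !== false) {
    $lines[] = $l;
}
fclose($fp);
```

Reading byte by byte is slow, which is one reason to use such a function only as a fallback when the built-in fgets() gives unusable results.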
During my testing (with hundreds of PDF files), I
found several minor bugs in the parsing process. Some
are fixed, and some are so rare that they can be ignored
for now.
Let’s Convert a Page to a Form XObject
First, we’ll take a deeper look at a page object found in
$this->pages of a parser object. A PDF object is represented
internally as an array, in a specified structure, as Marco
defined in his article. For demonstration purposes, we
use the demonstration PDF shipped with FPDI:
$pdf = new fpdi();
$pdf->setSourceFile('classes/pdfdoc.pdf');
echo "<pre>";
print_r($pdf->current_parser->pages[0]);
You can see the output in Listing 2. At first glance, it
seems very odd, but everything makes sense! Every entry
at any level is built as an array with at least the keys
0 and 1, where 0 describes the type of the value in key
1. All other keys are used to define special attributes
of that value. The types are defined as constants in
pdf_parser.php. For example, the 0 key at the lowest level
is 9, which is defined as a PDF object. This object’s value
is a dictionary (5), in this case a page dictionary, with
tokens that each have their own value types.
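The nesting described here can be illustrated with a small hand-built structure. Only the numeric type values 1, 5, 6 and 9 come from the text; the SKETCH_* constant names, the sketch_unwrap() helper and the exact shape of $pageObject are assumptions and may differ from what pdf_parser.php actually produces.

```php
<?php
// Hypothetical names for the type values mentioned in the text.
define('SKETCH_TYPE_NUMERIC', 1);
define('SKETCH_TYPE_DICTIONARY', 5);
define('SKETCH_TYPE_ARRAY', 6);
define('SKETCH_TYPE_OBJECT', 9);

// Simplified page object: key 0 is the type, key 1 the value.
$pageObject = array(
    0 => SKETCH_TYPE_OBJECT,
    1 => array(
        0 => SKETCH_TYPE_DICTIONARY,
        1 => array(
            '/MediaBox' => array(
                0 => SKETCH_TYPE_ARRAY,
                1 => array(
                    array(SKETCH_TYPE_NUMERIC, 0),
                    array(SKETCH_TYPE_NUMERIC, 0),
                    array(SKETCH_TYPE_NUMERIC, 595),
                    array(SKETCH_TYPE_NUMERIC, 842),
                ),
            ),
        ),
    ),
);

// Strip the type tags recursively and return plain PHP values.
function sketch_unwrap($value)
{
    switch ($value[0]) {
        case SKETCH_TYPE_OBJECT:
            return sketch_unwrap($value[1]);
        case SKETCH_TYPE_DICTIONARY:
        case SKETCH_TYPE_ARRAY:
            // array_map() keeps the string keys of a dictionary.
            return array_map('sketch_unwrap', $value[1]);
        default:
            return $value[1];
    }
}

$plain = sketch_unwrap($pageObject);
```

Keeping the type next to every value is what lets a parser tell a number from a name or a reference later on, at the cost of the verbose output seen in the listing.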
To import a page, FPDI offers a method called
ImportPage(), which is close to the BeginTemplate()
method of fpdf_tpl. As we’ve seen, the structure of a
template entry in $this->tpls contains main entries like
x, y, w, h and buffer.
If we take a closer look at Listing 2, we can see a
relationship between these entries. /MediaBox is an array
(6) of exactly 4 entries, whose value types are numeric
(1). The first entry’s value is that of x, the second of y,
the third of w and, not surprisingly, the last one of h. This is
actually a bug in the current release of FPDI. The last 2
values are also coordinates. The real values for the width
and the height have to be calculated by specifying the
A PDF cannot be compared to a file with
a structural language like HTML.
<?php
define('FPDF_FONTPATH', 'classes/font/');
require_once('classes/fpdi.php');

$pdf = new fpdi('L', 'pt');
// load the original document
$pagecount = $pdf->setSourceFile('pdfs/article_110.pdf');
// ...
// use the imported page
$size = $pdf->useTemplate($tplidx, $x, $y, 250);
// draw a border around the used page
$pdf->Rect($x, $y, $size['w'], $size['h'], 'D');

// if it's the third page in a row do a
// pagebreak and reset the x- and y-values