By Simon Cozens
Publisher: O'Reilly Pub Date: June 2005 ISBN: 0-596-00456-7 Pages: 304
With a worldwide community of users and more than a million dedicated programmers, Perl has proven to be the most effective language for the latest trends in computing and business.
Every programmer must keep up with the latest tools and techniques. This updated version of Advanced Perl Programming from O'Reilly gives you the essential knowledge of the modern Perl programmer. Whatever your current level of Perl expertise, this book will help you push your skills to the next level and become a more accomplished programmer.
O'Reilly's most high-level Perl tutorial to date, Advanced Perl Programming, Second Edition teaches you all the complex techniques for production-ready Perl programs. This
ins, extending Perl's object-oriented model, and testing your code for greater stability. Other topics include:
Praise for the Second Edition:
"The information here isn't theoretical. It presents tools and techniques for solving real problems cleanly and elegantly." Curtis 'Ovid' Poe
"Advanced Perl Programming collects hard-earned knowledge from some of the best programmers in the Perl community, and explains it in a way that even novices can apply immediately." chromatic, Editor of Perl.com
most titles (safari.oreilly.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or
Advanced Perl Programming, the image of a black leopard, and related trade dress are trademarks of O'Reilly Media, Inc.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.
While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.
It was all Nathan Torkington's fault. Our Antipodean programmer, editor, and O'Reilly conference supremo friend asked me to update the original Advanced Perl Programming way back in 2002.
The Perl world had changed drastically in the five years since the publication of the first edition, and it continues to change. Particularly, we've seen a shift away from techniques and toward resources: from doing things yourself with Perl to using what other people have done with Perl. In essence, advanced Perl programming has become more a matter of knowing where to find what you need on the CPAN,[*] rather than a matter of knowing what to do.
[*] The Comprehensive Perl Archive Network (http://www.cpan.org) is the primary resource for user-contributed Perl code.
Perl changed in other ways, too: the announcement of Perl 6 in 2000 ironically caused a renewed interest in Perl 5, with people stretching Perl in new and interesting directions to implement some of the ideas and blue-skies thinking about Perl 6. Contrary to what we all thought back then, far from killing off Perl 5, Perl 6's development has made it stronger and ensured it will be around longer.
So it was in this context that it made sense to update Advanced Perl Programming to reflect the changes in Perl and in the
weather, and playing poker. And more are being added every day, faster than any author can keep up. Second, as we've mentioned, because Perl is changing. I don't know what the next big advance in Perl will be; I can only take you through some of the more important techniques and resources available at the moment.
Hopefully, though, at the end of this book you'll have a good idea of how to use what's available, how you can save yourself time and effort by using Perl and the Perl resources available to get your job done, and how you can be ready to use and integrate whatever developments come down the line.
In the words of Larry Wall, may you do good magic with Perl!
If you've read Learning Perl and Programming Perl and wonder where to go from there, this book is for you. It'll help you climb to the next level of Perl wisdom. If you've been programming in Perl for years, you'll still find numerous practical tools and techniques to help you solve your everyday problems.
Chapter 1, Advanced Techniques, introduces a few common tricks advanced Perl programmers use with examples from
Text::Template, HTML::Template, HTML::Mason, and the Template Toolkit.
conversions, parsing, extraction, and Bayesian analysis.
Chapter 6, Perl and Unicode, reviews some of the problems and solutions to make the most of Perl's Unicode support.
Chapter 10, Fun with Perl, closes on a lighter note with a few recreational (and educational) uses of Perl.
The following typographical conventions are used in this book:
Plain text
Indicates menu titles, menu options, menu buttons, and keyboard accelerators (such as Alt and Ctrl).
Italic
Indicates new terms, URLs, email addresses, filenames, file extensions, pathnames, directories, and Unix utilities.
Constant width
Indicates commands, options, switches, variables, attributes, keys, functions, classes, namespaces, methods, modules, parameters, values, XML tags, HTML tags, the contents of files, or the output from commands.
Constant width bold
Shows commands or other text that should be typed literally by the user.
Constant width italic
This icon signifies a tip, suggestion, or general note.
This icon indicates a warning or caution.
This book is here to help you get your job done. In general, you may use the code in this book in your programs and documentation. You do not need to contact us for permission unless you're reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O'Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product's documentation does require permission.
We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: "Advanced Perl Programming, Second Edition by Simon Cozens. Copyright 2005 O'Reilly Media, Inc. 0-596-00456-7."
If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.
Please address comments and questions concerning this book to the publisher:
http://www.oreilly.com/catalog/advperl2/
To comment or ask technical questions about this book, send email to:
bookquestions@oreilly.com
For more information about our books, conferences, Resource Centers, and the O'Reilly Network, see our web site at:
http://www.oreilly.com
When you see a Safari Enabled icon on the cover of your favorite technology book, that means the book is available online through the O'Reilly Network Safari Bookshelf.
Safari offers a solution that's better than e-books. It's a virtual library that lets you easily search thousands of top tech books, cut and paste code samples, download chapters, and find quick answers when you need the most accurate, current information. Try it for free at http://safari.oreilly.com.
I've already blamed Nat Torkington for commissioning this book; I should thank him as well. As much as writing a book can be fun, this one has been. It has certainly been helped by my editors, beginning with Nat and Tatiana Apandi, and ending with the hugely talented Allison Randal, who has almost single-handedly corrected code, collated comments, and converted my rambling thoughts into something publishable. The production team at O'Reilly deserves a special mention, if only because of the torture I put them through in having a chapter on Unicode.
Allison also rounded up a great crew of highly knowledgeable reviewers: my thanks to Tony Bowden, Philippe Bruhat, Sean Burke, Piers Cawley, Nicholas Clark, James Duncan, Rafael Garcia-Suarez, Thomas Klausner, Tom McTighe, Curtis Poe, chromatic, and Andy Wardley.
And finally, there are a few people I'd like to thank personally: thanks to Heather Lang, Graeme Everist, and Juliet Humphrey for putting up with me last year, and to Jill Ford and the rest of her group at All Nations Christian College who have to put up with me now. Tony Bowden taught me more about good Perl programming than either of us would probably admit, and Simon Ponsonby taught me more about everything else than he realises. Thanks to Al and Jamie for being there, and to Malcolm and Caroline Macdonald and Noriko and Akio Kawamura for launching me on the current exciting stage of my life.
Once you have read the Camel Book (Programming Perl), or any other good Perl tutorial, you know almost all of the language. There are no secret keywords, no other magic sigils that turn on Perl's advanced mode and reveal hidden features. In one sense, this book is not going to tell you anything new about the Perl language.
What can I tell you, then? I used to be a student of music. Music is very simple. There are 12 possible notes in the scale of Western music, although some of the most wonderful melodies in the world only use, at most, eight of them. There are around four different durations of a note used in common melodies. There isn't a massive musical vocabulary to choose from. And music has been around a good deal longer than Perl. I used to wonder whether or not all the possible decent melodies would soon be figured out. Sometimes I listen to the Top 10 and think I was probably right back then.
But of course it's a bit more complicated than that. New music is still being produced. Knowing all the notes does not tell you the best way to put them together. I've said that there are no secret switches to turn on advanced features in Perl, and this means that everyone starts on a level playing field, in just the same way that Johann Sebastian Bach and a little kid playing with a xylophone have precisely the same raw materials to work with. The key to producing advanced Perl, or advanced
On the other hand, a book can certainly teach techniques, and in this chapter we're going to look at the three major classes of advanced programming techniques in Perl. First, we'll look at introspection: programs looking at programs, figuring out how they work, and changing them. For Perl this involves manipulating the symbol table, especially at runtime; playing with the behavior of built-in functions; and using AUTOLOAD to introduce new subroutines and control behavior of subroutine dispatch dynamically. We'll also briefly look at bytecode introspection, which is the ability to inspect some of the properties of the Perl bytecode tree to determine properties of the program.
The second idea we'll look at is the class model. Writing object-oriented programs and modules is sometimes regarded as advanced Perl, but I would categorize it as intermediate. As this is an advanced book, we're going to learn how to subvert Perl's object-oriented model to suit our goals.
Finally, there's the technique of what I call unexpected code: code that runs in places you might not expect it to. This means running code in place of operators in the case of overloading, some advanced uses of tying, and controlling when code runs using named blocks and eval.
These three areas, together with the special case of Perl XS programming (which we'll look at in Chapter 9 on Inline), delineate the fundamental techniques from which all advanced uses of Perl are made up.
First, though, introspection. These introspection techniques appear time and time again in advanced modules throughout the book. As such, they can be regarded as the most fundamental of the advanced techniques; everything else will build on these ideas.
1.1.1 Preparatory Work: Fun with Globs
Globs are one of the most misunderstood parts of the Perl language, but at the same time, one of the most fundamental. This is a shame, because a glob is a relatively simple concept.
When you access any global variable in Perl (that is, any variable that has not been declared with my), the perl interpreter looks up the variable name in the symbol table. For now, we'll consider the symbol table to be a mapping between a variable's name and some storage for its value, as in Figure 1-1.
Note that we say that the symbol table maps to storage for the value. Introductory programming texts should tell you that a variable is essentially a box in which you can get and set a value. Once we've looked up $a, we know where the box is, and we can get and set the values directly. In Perl terms, the symbol table maps to a reference to $a.
Figure 1-1. Consulting the symbol table, take 1
have noticed, however, that there are several things called a in Perl, including $a, @a, %a, &a, the filehandle a, and the directory handle a.
This is where the glob comes in. The symbol table maps a name like a to a glob, which is a structure holding references to all the variables called a, as in Figure 1-2.
Figure 1-2. Consulting the symbol table, take 2
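The two-stage look-up can be observed from Perl itself. The following is a minimal sketch, not code from the book: the package name Demo and the stored values are assumptions made for illustration. It relies on the fact that a package's symbol table is visible as the hash %Demo:: and that each entry in it is a glob.

```perl
use strict;
use warnings;

# Hypothetical package and values, chosen only for this illustration.
package Demo;

our $a = "scalar";            # the scalar slot of the glob *Demo::a
our @a = (1, 2, 3);           # the array slot
our %a = (key => "value");    # the hash slot
sub a { "code" }              # the code slot

package main;

# The symbol table of package Demo is the hash %Demo::.
# Its entry for the name "a" is a glob, not any single variable.
my $entry = $Demo::{a};
my $kind  = ref \$entry;      # "GLOB"

# One name, one glob, several independent variables behind it.
my @found = ($Demo::a, scalar(@Demo::a), $Demo::a{key}, Demo::a());
print "$kind: @found\n";
```

The single name a leads to one glob, and the sigil then selects which of the glob's slots is used.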
1.1.1.1 Aliasing
This disconnect between the name look-up and the reference look-up enables us to alias two names together. First, we get hold of their globs using the *name syntax, and then simply assign one glob to another, as in Figure 1-3.
Figure 1-3. Aliasing via glob assignment
We've assigned b's symbol table entry to point to a's glob. Now any time we look up a variable like %b, the first stage look-up takes us from the symbol table to a's glob, and returns us a reference to %a.
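A short sketch of this kind of aliasing, assuming package variables named a and b as in the figures (the values stored in them are made up for the example):

```perl
use strict;
use warnings;

our ($a, @a, %a);    # everything called "a"
our ($b, @b, %b);    # everything called "b"

$a = "scalar a";
@a = (1, 2, 3);
%a = (colour => "black");

# Point b's symbol table entry at a's glob.
*b = *a;

# Every kind of variable called "b" now resolves to its "a" counterpart.
my $same_hash = (\%b == \%a) ? 1 : 0;

# Changes made through one name are visible through the other.
push @b, 4;
$b{animal} = "leopard";

print "$b / @a / $a{animal}\n";
```

Because the whole glob is shared, the scalar, the array, and the hash are all aliased at once.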
We use a symbolic reference to turn the glob's name, which is a string, into the glob itself. This is just the same as the symbolic reference in this familiar but unpleasant piece of code:
they're doing. We use no strict 'refs'; to tell Perl that we're planning on doing good magic with symbolic references.
Many advanced uses of Perl need to do some of the things that strict prevents the uninitiated from doing. As an initiated Perl user, you will occasionally have to turn strictures off. This isn't something to take lightly, but don't be afraid of it; strict is a useful servant, but a bad master, and should be treated as such.
point to the *useful glob in the current Some::Module package. Now all references to useful( ) in the main package will resolve to
All we want to do is to put a subroutine into the &useful element of the *main::useful glob. If we were exporting a scalar or an array, we could assign a copy of its value to the glob by saying:
${caller( )."::useful"} = $useful;
@{caller( )."::useful"} = @useful;
reference to the glob, and Perl works out what type of reference it is and stores it in the appropriate part, as in Figure 1-4.
Figure 1-4. Assigning to a glob's array part
Notice that this is not the same as @a=@b; it is real aliasing. Any changes to @b will be seen in @a, and vice versa:
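As a minimal sketch of the difference (the array @copy and the values are assumptions added for contrast, not taken from the book), compare an ordinary list assignment with assigning an array reference to the glob:

```perl
use strict;
use warnings;

our (@a, @b, @copy);
@b = (1, 2, 3);

@copy = @b;      # ordinary assignment: copies the values
*a    = \@b;     # array reference assigned to the glob: @a becomes @b

push @b, 4;      # visible through @a, but not through @copy
$a[0] = 99;      # visible through @b as well

print "a=@a b=@b copy=@copy\n";
# prints "a=99 2 3 4 b=99 2 3 4 copy=1 2 3"
```

Only the array slot of *a is replaced here; $a and %a are left untouched.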
the useful glob. We do this by assigning a reference to &Some::Module::useful to *main::useful:
sub useful { 42 }
sub import {
    no strict 'refs';   # the glob name is built from a string
    *{caller( )."::useful"} = \&useful;
}
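To see the effect from the calling side, here is a single-file sketch under stated assumptions: Some::Module is defined inline rather than loaded from disk, and the BEGIN block stands in for the import call that use Some::Module would make after loading the file.

```perl
use strict;
use warnings;

package Some::Module;

sub useful { 42 }

sub import {
    no strict 'refs';                      # glob name comes from a string
    *{caller() . "::useful"} = \&useful;   # fill the caller's code slot
}

package main;

# Stand-in for "use Some::Module;", which calls import at compile time.
BEGIN { Some::Module->import }

my $answer = useful();    # resolves to Some::Module::useful
print "$answer\n";
```

Only the code slot of *main::useful is filled; any $useful or @useful in main is unaffected.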
This is similar to how the Exporter module works; the heart of Exporter is this segment of code in Exporter::Heavy::heavy_export:
Because $sym is "useful" (with no type sigil), the rest of the
The *glob = syntax obviously only works for assigning references to the appropriate part of the glob. If you want to access the individual references, you can treat the glob itself as a very restricted hash: *a{ARRAY} is the same as \@a, and *a{SCALAR} is the same as \$a. The other magic names you can use are HASH, IO, CODE, FORMAT, and GLOB, for the reference to the glob itself. There are also the really tricky PACKAGE and NAME elements, which tell you where the glob came from.
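A brief sketch of this hash-like access, assuming a scalar, an array, and a subroutine all named a in package main (the values are illustrative):

```perl
use strict;
use warnings;

our $a = 42;
our @a = (1, 2, 3);
sub a { "called" }

my $array_ref  = *a{ARRAY};     # same reference as \@a
my $scalar_ref = *a{SCALAR};    # same reference as \$a
my $code_ref   = *a{CODE};      # same reference as \&a

my $same = ($array_ref == \@a && $scalar_ref == \$a && $code_ref == \&a)
         ? 1 : 0;

# PACKAGE and NAME report where the glob lives.
my $where = *a{PACKAGE} . "::" . *a{NAME};

print "$same $where ", $code_ref->(), "\n";
```

The subscript form is read-only in practice: to change a slot, assign a reference to the glob itself.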
especially closures, to a glob. For instance, there's a module called Data::BT::PhoneBill that retrieves data from British
separated lines of information about a call and turns them into objects. An older version of the module split the line into an array and blessed the array as an object, providing a bunch of read-only accessors for data about a call:
Printing $seq after the block doesn't work, because the lexical variable is out of scope (it'll give you an error under use strict). However, the sequence subroutine can still access the variable to increment and return its value, because the closure { $seq += 3 } captured the lexical variable $seq.
See perlfaq7 and perlref for more details on closures.
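The behavior described in this note can be reproduced with a sketch along the following lines. The starting value of 3 and the surrounding bare block are assumptions; only $seq, sequence, and { $seq += 3 } come from the text.

```perl
use strict;
use warnings;

{
    my $seq = 3;                   # lexical, visible only inside this block
    sub sequence { $seq += 3 }     # the sub body captures $seq
}

# $seq itself is out of scope here, but sequence() still reaches it.
my @values = (sequence(), sequence(), sequence());
print "@values\n";    # prints "6 9 12"
```

Each call updates the same captured variable, so the state persists between calls even though nothing outside the block can name it.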
Of course, the inevitable happened: BT added a new column at the beginning, and all of the accessors had to shift down:
sub type { shift->[0] }
sub installation { shift->[1] }
sub line { shift->[2] }
Clearly this wasn't as easy to maintain as it should be. The first step was to rewrite the constructor to use a hash instead of an array as the basis for the object:
This code maps type to the first element of @data, installation to
in the array, equivalent to *type = sub { shift->{type} }. Because we're using a closure on $f, each accessor "remembers" which field it's the accessor for, even though the $f variable is out of scope once the loop is complete.
Creating a new subroutine by assigning a closure to a glob is a particularly common trick in advanced Perl usage.
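The accessor-generating loop can be sketched as follows. This is not the module's actual source: the class name My::Call, the constructor, and the sample data are hypothetical; only the field names type, installation, and line and the closure sub { shift->{$f} } follow the text.

```perl
use strict;
use warnings;

package My::Call;    # hypothetical stand-in for the module's call class

my @fields = qw(type installation line);

sub new {
    my ($class, @data) = @_;
    my %self;
    @self{@fields} = @data;          # map field names onto the data
    return bless \%self, $class;
}

for my $f (@fields) {
    no strict 'refs';                # the glob name comes from a string
    *{$f} = sub { shift->{$f} };     # each closure remembers its own $f
}

package main;

my $call = My::Call->new("local", "home", "01234 567890");
print join(", ", $call->type, $call->installation, $call->line), "\n";
```

If a column is added, only the @fields list changes; the accessors are regenerated from it.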
1.1.2 AUTOLOAD
There is, of course, a simpler way to achieve the accessor trick. Instead of defining each accessor individually, we can define a single routine that executes on any call to an undefined subroutine. In Perl, this takes the form of the AUTOLOAD subroutine, an ordinary subroutine with the magic name AUTOLOAD:
sub AUTOLOAD {
    print "I don't know what you want me to do!\n";
}
yow( );
Instead of dying with Undefined subroutine &yow called, Perl tries the AUTOLOAD subroutine and calls that instead.
To make this useful in the Data::BT::PhoneBill case, we need to know which subroutine was actually called. Thankfully, Perl
qualified variable name into a locally qualified name. A call to $call->type will set $AUTOLOAD to Data::BT::PhoneBill::_Call::type. Since we want everything after the last ::, we use a regular expression to extract the relevant part. This can then be used
accessor points to the right hash element. Finally, once the new subroutine is set up, we can use goto &subname to try again, calling the newly created Data::BT::PhoneBill::_Call::type method with the same parameters as before. The next time the same subroutine is called, it will be found in the symbol table (since we've just created it) and we won't go through AUTOLOAD again.
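Putting these steps together, a self-installing accessor might look like the sketch below. It is an illustration under assumptions, not the module's code: the class My::Call, its constructor, and the $autoload_calls counter are hypothetical, and the DESTROY guard is a common precaution added here.

```perl
use strict;
use warnings;

package My::Call;    # hypothetical class

our $AUTOLOAD;
my $autoload_calls = 0;    # only here to show AUTOLOAD runs once per name

sub new { my ($class, %args) = @_; return bless { %args }, $class }

sub AUTOLOAD {
    my ($name) = $AUTOLOAD =~ /.*::(.*)/;     # everything after the last ::
    return if $name eq 'DESTROY';             # never autoload the destructor
    $autoload_calls++;
    no strict 'refs';
    *{$AUTOLOAD} = sub { shift->{$name} };    # install the real accessor
    goto &{$AUTOLOAD};                        # retry with the same @_
}

package main;

my $call = My::Call->new(type => "local");
my @seen = ($call->type, $call->type);
print "@seen ($autoload_calls)\n";    # prints "local local (1)"
```

The second call to type never reaches AUTOLOAD, because the first call left a real subroutine in the symbol table.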
goto LABEL and goto &subname are two completely different operations, unfortunately with the same name. The first is generally discouraged, but the second has no such stigma attached to it. It is identical to subname(@_) but with one important difference: the current stack frame is obliterated and replaced with the new subroutine. If we had used
It's also conceivable that, because there is no third stack frame or call-
There are two things that every user of AUTOLOAD needs to know. The first is DESTROY. If your AUTOLOAD subroutine does
subroutine has been called, then the missing subroutine has been deemed to be dealt with. If you want to rethrow the undefined-subroutine error, you must do so manually. For instance, let's limit our Data::BT::PhoneBill::_Call::AUTOLOAD method to only deal with real elements of the hash, and not any random rubbish or typo that comes our way:
use Carp qw(croak);
sub AUTOLOAD {
    my $self = shift;
    if ($AUTOLOAD =~ /.*::(.*)/ and exists $self->{$1}) {
        return $self->{$1}
    }
    croak "Undefined subroutine $AUTOLOAD called";   # rethrow manually
}
Where do CORE:: and CORE::GLOBAL:: come in, then? First, if we're
module's documentation:
Unlike other modules that provide this capacity (e.g. Hook::PreAndPost and Hook::WrapSub), Hook::LexWrap implements