indicates the conditions you are interested in monitoring. You then call the IO::Poll object's poll() method, which blocks until one or more of the conditions is met. After poll() returns, you interrogate the object to learn which handles were
is monitored for both POLLIN and POLLOUT events, formed
by logically ORing the two constants. As described in more detail later, POLLIN and POLLOUT conditions occur when
Having set up the IO::Poll object, you usually enter an I/O loop. Each time through the loop, you call the poll object's poll() method to wait for an event to occur and then call handles() to determine which handles were affected:
(POLLERR). The second call looks for handles that are ready for writing. The
remainder of the example loop processes these handles in an application-specific manner.
Like select(), poll() must be used with sysread() and syswrite() only. Mixing poll() with routines that use standard I/O buffering (the <> operator or plain read() and write()) does not work.
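The loop described above can be sketched roughly as follows. This is a minimal illustration rather than the book's own listing: the helper name run_poll_demo, the use of a local pipe in place of sockets, and the 5-second timeout are all assumptions made to keep the example self-contained. It registers one handle for POLLIN and one for POLLOUT, calls poll(), and then asks handles() first for readable handles and then for writable ones, using only sysread() and syswrite().

```perl
use strict;
use warnings;
use IO::Poll qw(POLLIN POLLOUT POLLHUP POLLERR);

# Hypothetical helper: push $message through a pipe under IO::Poll control
# and return whatever arrives at the reading end.
sub run_poll_demo {
    my ($message) = @_;
    pipe(my $reader, my $writer) or die "pipe: $!";

    my $poll = IO::Poll->new;
    $poll->mask($reader => POLLIN);     # tell us when there is data to read
    $poll->mask($writer => POLLOUT);    # tell us when we can write

    my $received = '';
    while ($poll->handles) {            # loop while any handle is registered
        my $count = $poll->poll(5);     # assumed 5-second timeout
        die "poll: $!" if $count < 0;
        last if $count == 0;            # timed out

        # First call: handles that may be readable (or closed, or in error).
        for my $fh ($poll->handles(POLLIN | POLLHUP | POLLERR)) {
            my $bytes = sysread($fh, my $buffer, 1024);
            if ($bytes) {
                $received .= $buffer;
            } else {                    # 0 is end of file, undef is an error
                $poll->remove($fh);
                close $fh;
            }
        }

        # Second call: handles that are ready for writing.
        for my $fh ($poll->handles(POLLOUT)) {
            my $bytes = syswrite($fh, $message);
            die "syswrite: $!" unless defined $bytes;
            substr($message, 0, $bytes) = '';   # drop what was written
            if (length($message) == 0) {
                $poll->remove($fh);             # nothing left to send
                close $fh;
            }
        }
    }
    return $received;
}

print "received: ", run_poll_demo("hello, poll\n");
```

Each handle is removed from the poll set before it is closed, so the loop ends naturally once both ends of the pipe are finished.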
Chapter 10 Forking Servers and the inetd Daemon
Standard Techniques for Concurrency
Running Example: A Psychotherapist Server
The Psychotherapist as a Forking Server
A Client Script for the Psychotherapist Server
Daemonization on UNIX Systems
Text::Travesty (Chapter 17)
mchat_client.pl (Chapter 21)
Appendix B Perl Error Codes and Special Variables
System Error Constants
We've used select() and IO::Select extensively to multiplex among multiple I/O streams. However, the select() system call has some design limitations related to its use of a bit vector to represent the filehandles to be monitored. On an ordinary host, such as a desktop machine, the maximum number of files is usually a small number, such as 256, and the bit vectors will therefore be no longer than 32 bytes. However, on a host that is tuned for network applications, such as a Web server, this limit may be in the thousands. The bit vectors necessary to describe every possible filehandle then become quite large, forcing the operating system to scan through a large, sparsely populated bit vector each time select() is called. This may have an impact on performance.
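The arithmetic behind those vector sizes can be checked with Perl's built-in vec() function, which is how select() bit vectors are normally built. The sketch below is illustrative only: the helper name fd_vector and the limit of 4096 descriptors for a tuned host are assumptions, not figures from the text.

```perl
use strict;
use warnings;

# Hypothetical helper: build a select()-style bit vector with one bit set
# for each file descriptor number passed in.
sub fd_vector {
    my @fds  = @_;
    my $bits = '';
    vec($bits, $_, 1) = 1 for @fds;
    return $bits;
}

# With a limit of 256 files the highest descriptor number is 255.
my $desktop = fd_vector(0, 255);

# Assumed limit of 4096 files on a host tuned for network service.
my $server = fd_vector(0, 4095);

printf "limit 256:  %d bytes\n", length($desktop);   # 32 bytes
printf "limit 4096: %d bytes\n", length($server);    # 512 bytes
```

Only two descriptors are being watched in each case, yet the vector must still be long enough to reach the highest descriptor number, which is the sparseness the paragraph above describes.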
For this reason, the POSIX standard calls for an alternative API called poll(). It does much the same thing as select() but uses arrays rather than bit vectors to represent sets of filehandles. Because only the filehandles of interest are placed in the arrays, the poll() call doesn't waste time scanning through a large data structure to determine which filehandles to watch. You might also want to use poll() if you prefer its API, which is more elegant in some ways than select().
poll() is available to Perl programmers only via its object-oriented interface, IO::Poll. It was introduced during the development of Perl version 5.6. Be sure to use IO::Poll version 0.04 or higher because earlier versions weren't completely functional. This version can be found in Perl versions 5.7 and higher.
Each event is designated by one of the constants summarized in Table 16.1. They are divided into constants that can be added to bitmasks sent to poll() using the mask() method, and constants that are returned from poll() via the handles() method.
POLLRDBAND Priority data is available for reading. An attempt to read out-of-band data (Chapter 17) will succeed.
POLLPRI "High priority" data is available for reading. High priority data is a historical relic and should not be used for TCP/IP programming.
POLLOUT The handle can accept at least 1 byte of data for writing (as modified by the value of the socket's send buffer low water mark, as described in Chapter 12). syswrite() does not block as long as its length does not exceed this value. This event does not distinguish between normal and priority data.
POLLWRNORM The handle can accept at least 1 byte of normal (nonpriority) data.
POLLWRBAND The handle can accept at least 1 byte of out-of-band data (Chapter 17).
POLLERR An error occurred on the handle, such as a PIPE error. For sockets, you may be able to recover the actual error number by calling sockopt() with the SO_ERROR option (Chapter 13).
Unfortunately, this behavior is not universal. On some, if not all, Linux systems, POLLIN is not set when a socket is closed. Instead, you must check for a POLLHUP event. However, POLLHUP is relevant only to sockets and pipes, and does not apply to ordinary filehandles; this makes program logic a bit convoluted.
The most reasonable strategy is to recover the handles that may be readable by calling handles() with the bitmask POLLIN|POLLHUP|POLLERR. Pass each handle to sysread(), and let the return value tell you what state the handle was in.
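One way to act on the sysread() return value is sketched below. The helper name read_state and the 4096-byte read size are assumptions for illustration; the three cases themselves follow from how Perl's sysread() behaves: undef means an error (with the reason in $!), 0 means end of file, and a positive number means that many bytes were read.

```perl
use strict;
use warnings;

# Hypothetical helper: read once from a handle that poll() reported as
# possibly readable and classify the outcome.
sub read_state {
    my ($fh) = @_;
    my $bytes = sysread($fh, my $buffer, 4096);
    return ('error', "$!")   unless defined $bytes;  # undef: I/O error
    return ('eof',   '')     if $bytes == 0;         # 0: other end closed
    return ('data',  $buffer);                       # >0: bytes were read
}

# Demonstration on a local pipe rather than a socket.
pipe(my $reader, my $writer) or die "pipe: $!";
defined(syswrite($writer, "ping")) or die "syswrite: $!";

my @first = read_state($reader);    # data is waiting
close $writer;
my @second = read_state($reader);   # writer is gone, nothing left to read

print "first:  @first\n";
print "second: $second[0]\n";
```

Because the same three-way test works whether the event was POLLIN, POLLHUP, or POLLERR, the calling code does not need a separate branch for each platform's behavior.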
Network Programming with Perl
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book and we were aware of a trademark claim, the designations have been printed in initial capital letters or all capitals.
The author and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.
The publisher offers discounts on this book when ordered in quantity for special sales. For more information, please contact:
The network is everywhere. At the office, machines are wired together into local area networks, and the local networks are interconnected via the Internet. At home, personal computers are intermittently connected to the Internet or, increasingly, via "always-on" cable and DSL modems. New wireless technologies, such as Bluetooth, promise to vastly expand the network realm, embracing everything from cell phones to kitchen appliances.
Such an environment creates tremendous opportunities for innovation. Whole new classes of applications are now predicated on the availability of high-bandwidth, always-on connectivity. Interactive games allow players from around the globe to compete on virtual playing fields, and the instant messaging protocols let them broadcast news of their triumphs to their friends. New peer-to-peer systems, such as Napster and Gnutella, allow people to directly exchange MP3 audio files and other types of digital content. The SETI@Home project takes advantage of idle time on the millions of personal computers around the world to search for signs of extraterrestrial life in a vast collection of cosmic noise.
The ubiquity of the network allows for more earthbound applications as well. With the right knowledge, you can write a robot that will fetch and summarize prices from competitors' Web sites; a script to page you when a certain stock drops below a specified level; a program to generate daily management reports and send them off via e-mail; a server that centralizes some number-crunching task on a single high-powered machine, or alternatively distributes that task among the multiple nodes of a computer cluster.
Whether you are searching for the best price on a futon or for life in a distant galaxy, you'll need to understand how network applications work in order to take full advantage of these opportunities. You'll need a working understanding of the TCP/IP protocol, the common denominator for all Internet-based communications and the most common protocol in use in local area networks as well. You'll need to know how to connect to a remote program, how to exchange data with that program, and what to do when something goes wrong. To work with existing applications, such as Web servers, you'll have to understand how the application-level protocols are built on top of TCP/IP, and how to deal with common data exchange formats such as XML and MIME.
This book uses the Perl programming language to illustrate how to design and implement practical network applications. Perl is an ideal language for network programming for a number of reasons. First, like the rest of the language, Perl's networking facilities were designed to make the easy things easy. It takes just two lines of code to open a network connection to a server somewhere on the Internet and send it a message. A fully capable Web server can be written in a few dozen lines of code.
Second, Perl's open architecture has encouraged many talented programmers to contribute to an ever-expanding library of useful third-party modules. Many of these modules provide powerful interfaces to common network applications. For example, after loading the LWP::Simple module, a single function call allows you to fetch the contents of a remote Web page and store it in a variable. Other third-party modules provide intuitive interfaces to e-mail, FTP, net news, and a variety of network databases.
Perl also provides impressive portability. Most of the applications developed in this
However, the most compelling reason to choose Perl for network application development is that it allows you to fully exploit the power of TCP/IP. Perl provides you with full access to the same low-level networking calls that are available to C programs and other natively compiled languages. You can create multicast applications, implement multiplexed servers, and design peer-to-peer systems. Using Perl, you can rapidly prototype new networking applications and develop interfaces to existing ones. Should you ever need to write a networking application in C or Java, you'll be delighted to discover how much of the Perl API carries over into these languages.
This book does take advantage of the object-oriented features in Perl version 5 and higher, but most chapters do not assume a deep knowledge of this system. Chapter 1 addresses all the details you will need as a casual user of Perl objects.
This book is not a thorough review of the TCP/IP protocol at the lowest level, or a guide to installing and configuring network hubs, routers, and name servers. Many good books on the mechanics of the TCP/IP protocol and network administration are listed in Appendix D.
Chapters 1 and 2, Networking Basics and Processes, Pipes, and Signals, review Perl's functions and variables for input and output, discuss the exceptions that can occur during I/O operations, and use the piped filehandle as the basis for introducing sockets. These chapters also review Perl's process model, including signals and forking, and introduce Perl's object-oriented extensions.
Chapter 3, Introduction to Berkeley Sockets, covers the basics of Internet networking and discusses IP addresses, network ports, and the principles of client/server applications. It then turns to the Berkeley Socket API, which provides the programmer's interface to TCP/IP.
Chapters 4 and 5, The TCP Protocol and The IO::Socket API and Simple TCP Applications, show the basics of TCP, the networking protocol that provides reliable stream-oriented communications. These chapters demonstrate how to create client and server applications and then introduce examples that show the power of the technique as well as some common roadblocks.
Part II, Developing Clients for Common Services, looks at a collection of the best third-party modules that developers have contributed to the Comprehensive Perl Archive Network (CPAN).
Chapter 6, FTP and Telnet, introduces modules that provide access to the FTP file-sharing service, as well as to the flexible Net::Telnet module, which allows you to create clients to access all sorts of network services.
E-mail is still the dominant application on the Internet, and Chapter 7, SMTP: Sending Mail, introduces half of the equation. This chapter shows you how to create e-mail messages on the fly, including binary attachments, and send them to their destinations.
Chapter 8, POP, IMAP, and NNTP: Processing Mail and Netnews, covers the other half of e-mail, explaining modules that make it possible to receive mail
attachments.
Chapter 9, Web Clients, discusses the LWP module, which provides everything you need to talk to Web servers, download and process HTML documents, and parse XML.
Part III, Developing TCP Client/Server Systems (the longest part of the book), discusses the alternatives for designing TCP-based client/server systems. The major example used in these chapters is an interactive psychotherapist server, based on Joseph Weizenbaum's classic Eliza program.
Chapters 12 and 13, Multiplexed Operations and Nonblocking I/O, discuss the select() call, which enables an application to process multiple I/O streams concurrently without using multiprocessing or multithreading.
Chapter 14, Bulletproofing Servers, discusses techniques for enhancing the reliability and maintainability of network servers. Among the topics are logging, signal handling, and exceptions, as well as the important topic of network security.
Chapter 15, Preforking and Prethreading, presents enhancements to the forking and threading models discussed in earlier chapters. These enhancements increase a server's ability to perform well under heavy loads.
Chapter 16, IO::Poll, discusses an alternative to select() available on UNIX platforms. This module allows applications to multiplex multiple I/O streams using an API that some people find more natural than select()'s.
Part IV, Advanced Topics, addresses techniques that are useful for specialized applications.
Chapter 17, TCP Urgent Data, is devoted to TCP urgent or "out of band" data. This technique is often used in highly interactive applications in which the user urgently needs to signal the remote server.
Chapters 18 and 19, The UDP Protocol and UDP Servers, introduce the User Datagram Protocol, which provides a lightweight, message-oriented communications service. Chapter 18 introduces the protocol, and Chapter 19 shows how to design UDP servers. The major example in this and the next two chapters is a live online chat and messaging system written entirely in Perl.
by showing how to build one-to-all and one-to-many message broadcasting systems. In these chapters we extend the chat system to take advantage of automatic server discovery and multicasting.
expected imminently. I expect that Perl versions 5.8 and 5.9 (assuming there will be such versions) will be compatible with the code examples given here as well.
Over the horizon, however, is Perl version 6. Version 6, which is expected to be in early alpha form by the summer of 2001, will fix many of the idiosyncrasies and misfeatures of earlier versions of Perl. In so doing, however, it is expected to break most existing scripts. Fortunately, the Perl language developers are committed to developing tools to automatically port existing scripts to version 6. With an eye to this, I have tried to make the examples in this book generic, avoiding the more obscure Perl constructions.
Cross-Platform Compatibility
More serious are the differences between implementations of Perl on various operating systems. Perl started out on UNIX (and Linux) systems, but has been ported to many different operating systems, including Microsoft Windows, the Macintosh, VMS, OS/2, Plan9, and others. In many cases, a script written for the Windows platform will run on UNIX or Macintosh without modification.
The problem is that the I/O subsystem (the part of the system that manages input and output operations) is the part that differs most dramatically from operating system to operating system. This restricts the ability of Perl to make its I/O system completely portable. While Perl's basic I/O functionality is identical from port to port, some of the more sophisticated operations are either missing or behave significantly differently on non-UNIX platforms. This affects network programming, of course, because networking is fundamentally about input and output.
In this book, Chapters 1 through 9 use generic networking calls that will run on all platforms. The exceptions to this rule are the last example in Chapter 5, which calls a function that isn't implemented on the Macintosh, fork(), and some of the introductory discussion in Chapter 2 of process management on UNIX systems. The techniques discussed in these chapters are all you need for the vast majority of client programs, and are sufficient to get a simple server up and running. Chapters 10 through 22 deal with more advanced topics in server design. The table here shows whether the features in the chapters are supported by UNIX, Windows, or the
Chapter Subject UNIX/Linux Windows Macintosh
network programming
CPAN is a large Web-based collection of contributed Perl modules. You can get access to it via a Web or FTP browser, or by using a command-line application built into Perl itself.
privileges, you can install the modules in your home directory. At the perl Makefile.PL step, provide a PREFIX= argument with the path of your home directory. For example, assuming your home directory can be found at /home/jdoe, you would type:
% perl Makefile.PL PREFIX=/home/jdoe
The rest of the install procedure is identical to what was shown earlier.
If you are using a custom install directory, you must tell Perl to look in this directory for installed modules. One way to do this is to add the name of the directory to the environment variable PERL5LIB. For example:
Running make for GAAS/Digest-MD5-2.00.tar.gz
Fetching with LWP:
ftp://ftp.cis.ufl.edu/pub/perl/CPAN/authors/id/GAAS/Digest-MD5-2.00.tar.gz
CPAN: MD5 loaded ok
Fetching with LWP:
ftp://ftp.cis.ufl.edu/pub/perl/CPAN/authors/id/GAAS/CHECKSUMS
Checksum for /home/lstein/.cpan/sources/authors/id/GAAS/Digest-MD5-2.00.tar.gz ok
Digest-MD5-2.00/
Digest-MD5-2.00/typemap
Installing /usr/local/lib/perl5/site_perl/i586-linux/./auto/Digest/MD5/MD5.so
Installing /usr/local/lib/perl5/site_perl/i586-linux/./auto/Digest/MD5/MD5.bs
Installing /usr/local/lib/perl5/site_perl/i586-linux/./auto/MD5/MD5.so
It is easier, however, to use the ActiveState Perl Package Manager (PPM). This Perl script is installed by default in the ActiveState distribution of Perl, available at http://www.activestate.com/. Its interface is similar to the command-line CPAN interface shown in the previous section, except that it can install precompiled binaries as well as pure-Perl scripts. For example:
http://pudge.net/macperl/macperlmodinstall.html, which also gives instructions on downloading and installing them.
particular implementation is correct.
The RFC archives are mirrored at many locations on the Internet, and maintained in searchable form by several organizations. One of the best archives is maintained at http://www.faqs.org/rfcs/. To retrieve an RFC from this site, go to the indicated page and type the number of the desired RFC in the text field labeled "Display the document by number." The document will be delivered in a minimally HTMLized form. This page also allows you to search for standards documents, and to search the archive by keywords and phrases. If you prefer a text-only form, the http://www.faqs.org/ site contains a link to their FTP site, where you can find and download the RFCs in their original form.
Plain Old Documentation
Much of Perl's internal documentation comes in Plain Old Documentation (POD) format. These are mostly plain text, with a few markup elements inserted to indicate headings, subheadings, and itemized lists.
When you installed Perl, the POD documentation was installed as well. The POD files are located in the pod subdirectory of the Perl library directory. You can either read them directly, or use the perldoc script to format and display them in a text pager such as more.
To use perldoc, type the command and the name of the POD file you wish to view. The best place to start is the Perl table of contents, perltoc:
They say that the first skill an editor learns on the job is patience, but I think that Karen Gettman was born with an excess of it. She must have caught on after the second or third time that when I said "it should be done in just another week," I really was talking about months. Yet she never betrayed any sign of dismay, even though I'm sure she was fighting an increasingly restive production and marketing staff. To Karen, all I can say is "thank you!"
Thanks also to Mary Hart, the assistant editor responsible for my book. I have worked with Mary on other projects, and I know that it is her tireless effort that makes publishing with Addison-Wesley seem so frictionless.
I am extremely grateful to the technical reviewers who worked so diligently to keep me honest: Jon Orwant, James Lee, Harry Hochheiser, Robert Kolstad, Sander Wahls, and Megan Conklin. The book is very much better because of your efforts.
I owe a debt of gratitude to the long-suffering members of my laboratory: Ravi, David, Marco, Hong, Guanming, Nathalie, and Peter; they have somehow managed
The four chapters that follow will provide the fundamental knowledge you need to write networking applications in Perl using Berkeley sockets. They set the stage for later parts of the book that delve more deeply into specific network problems and their solutions.
Part 1: Basics
This chapter provides you with the background information you'll need to write TCP/IP applications in Perl. We review Perl's input/output (I/O) system using the language's built-in function calls, and then using the object-oriented (OO) extensions of Perl5. This will prepare you to use the object-oriented constructions in later chapters.
success of TCP/IP is due partly to the ubiquity of the sockets API, which is available for all major languages including C, C++, Java, BASIC, Python, COBOL, Pascal, FORTRAN, and, of course, Perl. The sockets API is similar in all these languages. There may be a lot of work involved porting a networking application from one computer language to another, but porting the part that does the socket communications is usually the least of your problems.
For dedicated Perl programmers, the answer to the question that starts this chapter is clear: because you can! But for those who are not already members of the choir, one can make a convincing argument that not only is networking good for Perl, but Perl is good for networking.
A Language Built for Interprocess Communication
Perl was built from the ground up to make it easy to do interprocess communication (the thing that happens when one program talks to another). As we shall see later in this chapter, in Perl there is very little difference between opening up a local file for reading and opening up a communications channel to read data from another local program. With only a little more work, you can open up a socket to read data from a program running remotely on another machine somewhere on the Internet. Once the communications channel is open, it matters little whether the thing at the other end is a file, a program running on the same machine, or a program running on a remote machine. Perl's input/output functions work in the same way for all three types of connections.
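As a rough illustration of that uniformity, the sketch below reads the same two lines first from an ordinary file and then from another local program through a pipe, using one reading routine for both. The helper name slurp_lines, the temporary file, and the choice of a second perl process as the "other program" are assumptions made so the example is self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read every line from an already-open handle.
# It neither knows nor cares what is at the other end.
sub slurp_lines {
    my ($fh) = @_;
    my @lines = <$fh>;
    chomp @lines;
    return @lines;
}

# 1. An ordinary local file.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "alpha\nbeta\n";
close $tmp or die "close: $!";
open(my $file, '<', $path) or die "open $path: $!";
my @from_file = slurp_lines($file);
close $file;

# 2. Another local program, read through a pipe. $^X is the path of the
#    running perl interpreter, used here as a stand-in for any command.
open(my $pipe, '-|', $^X, '-e', 'print "alpha\nbeta\n"')
    or die "pipe open: $!";
my @from_pipe = slurp_lines($pipe);
close $pipe;

print "file: @from_file\n";
print "pipe: @from_pipe\n";
```

A connected socket could be passed to slurp_lines() in the same way, since it is also an ordinary filehandle from the reading code's point of view.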
A Language Built for Text Processing
Another Perl feature that makes it good for network applications is its powerful integrated regular expression-matching and text-processing facilities. Much of the data on the Internet is text based (the Web, for instance), and a good portion of that is unpredictable, line-oriented data. Perl excels at manipulating this type of data, and is not vulnerable to the type of buffer overflow and memory overrun errors that make networking applications difficult to write (and possibly insecure) in languages like C and C++.
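A small sketch of what this looks like in practice: the routine below pulls "Name: value" pairs out of a block of loosely formatted, line-oriented text. The helper name parse_headers and the sample input are hypothetical; the point is only that one regular expression copes with stray whitespace, mixed line endings, and lines that do not fit the pattern, with no fixed-size buffers involved.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "Name: value" lines into a hash keyed by the
# lowercased name. Lines that don't look like headers are skipped.
sub parse_headers {
    my ($text) = @_;
    my %headers;
    for my $line (split /\r?\n/, $text) {
        next unless $line =~ /^\s*([\w-]+)\s*:\s*(.*?)\s*$/;
        $headers{ lc($1) } = $2;
    }
    return %headers;
}

my $input = "Content-Type: text/html\r\n"
          . "Content-Length:   42  \r\n"
          . "garbage line\r\n";

my %h = parse_headers($input);
print "$_ => $h{$_}\n" for sort keys %h;
```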
Perl is an Open Source project, one of the earliest. Examining other people's source code is the best way to figure out how to do something. Not only is the source code for all of Perl's networking modules available, but the whole source tree for the interpreter itself is available for your perusal. Another benefit of Perl's openness is that the project is open to any developer who wishes to contribute to the library modules or to the interpreter source code. This means that Perl adds features very rapidly, yet is stable and relatively bug free.
The universe of third-party Perl modules is available via a distributed Web-based archive called CPAN, for Comprehensive Perl Archive Network. You can search CPAN for modules of interest, download and install them, and contribute your own modules to the archive. The preface to this book describes CPAN and how to reach it.
Object-Oriented Networking Extensions
Perl5 has object-oriented extensions, and although OO purists may express dismay over the fast and loose way in which Perl has implemented these features, it is inarguable that the OO syntax can dramatically increase the readability and maintainability of certain applications. Nowhere is this more evident than in the library modules that provide a high-level interface to networking protocols. Among many others, the IO::Socket modules provide a clean and elegant interface to Berkeley sockets; Mail::Internet provides cross-platform access to Internet mail; LWP gives you everything you need to write Web clients; and the Net::FTP and Net::Telnet modules let you write interfaces to these important protocols.
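To give a flavor of the object-oriented style, here is a minimal sketch using IO::Socket::INET. It is an illustration under stated assumptions, not an example from the book: the helper name loopback_echo is hypothetical, and both ends of the connection live in one process on the loopback address so that the example needs no external server. The "server" end simply uppercases one line and sends it back.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: send one line across a loopback TCP connection and
# return the uppercased reply produced by the server end.
sub loopback_echo {
    my ($message) = @_;

    # Listening socket; port 0 asks the operating system for any free port.
    my $listener = IO::Socket::INET->new(
        Listen    => 1,
        LocalAddr => '127.0.0.1',
        LocalPort => 0,
        Proto     => 'tcp',
    ) or die "listen: $@";

    # Client socket connected to whatever port was assigned.
    my $client = IO::Socket::INET->new(
        PeerAddr => '127.0.0.1',
        PeerPort => $listener->sockport,
        Proto    => 'tcp',
    ) or die "connect: $@";

    my $server = $listener->accept or die "accept: $!";

    # Socket objects are filehandles, so print and <> work on them.
    print {$client} "$message\n";
    $client->flush;
    my $line = <$server>;
    chomp $line;

    print {$server} uc($line), "\n";
    $server->flush;
    my $reply = <$client>;
    chomp $reply;

    close $_ for $client, $server, $listener;
    return $reply;
}

print loopback_echo("hello"), "\n";
```

A real server would normally run in a separate process and handle many connections; keeping everything in one process here works only because each message is small enough to sit in the kernel's socket buffers.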
Security
Security is an important aspect of network application development, because by definition a network application allows a process running on a remote machine to affect its execution. Perl has some features that increase the security of network applications relative to other languages. Because of its dynamic memory management, Perl avoids the buffer overflows that lead to most of the security holes in C and other compiled languages. Of equal importance, Perl implements a powerful "taint" check system that prevents tainted data obtained from the network from being used in operations such as opening files for writing and executing system commands, which could be dangerous.
Performance
A last issue is performance. Because Perl is an interpreted language, Perl applications run several times more slowly than those written in C and other compiled languages, and about on par with Java and Python. In most networking applications, however, raw performance is not the issue; the I/O bottleneck is. On I/O-bound applications Perl runs just as fast (or as slowly) as a compiled program. In fact, it's possible for the performance of a Perl script to exceed that of a compiled program. Benchmarks of a simple Perl-based Web server that we develop in Chapter 12 are several times better than the C-based Apache Web server.
critical portions of your application in C, using the XS extension system. Or you can