Alternatively, we can get, manipulate, and set a stat object, as returned by the stat method, in the same manner as IPC::Msg objects:

stat – Generate an IPC::Semaphore::stat object we can manipulate and then use with set. For example:
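As a runnable sketch of the stat/set round trip (the semaphore set is created here purely for illustration; the mode values are assumptions, not from the original example):

```perl
use strict;
use warnings;
use IPC::SysV qw(IPC_PRIVATE IPC_CREAT S_IRUSR S_IWUSR);
use IPC::Semaphore;

# create a private one-semaphore set just for this illustration
my $sem = IPC::Semaphore->new(IPC_PRIVATE, 1, S_IRUSR | S_IWUSR | IPC_CREAT)
    or die "Cannot create semaphore set: $!";

my $stat = $sem->stat;                      # IPC::Semaphore::stat object
printf "mode is %o\n", $stat->mode & 0777;  # permission bits as created
$stat->mode(S_IRUSR);                       # make it owner read-only
$sem->set($stat);                           # write the settings back
printf "mode is now %o\n", $sem->stat->mode & 0777;
$sem->remove;                               # clean up the set
```

This requires a system with System V IPC support.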
getncnt – Return the number of processes that have executed a semop and are blocked waiting for the value of the specified semaphore to increase in value:
$ncnt = $sem->getncnt;
getzcnt – Return the number of processes that have executed a semop and are blocked waiting for the value of the specified semaphore to become zero:

$zcnt = $sem->getzcnt;
The real power of semaphores is bound up in the op method, which performs one or more semaphore operations on a semaphore set. This is the mechanism by which processes can block and be unblocked by other processes.
Each operation consists of three values: the semaphore number to operate on, an operation to perform, and a flag value. The operation is actually a value to increment or decrement by, and follows these rules:
❑ If the value is positive, the semaphore value is incremented by the supplied value. This always succeeds, and never blocks.

❑ If the supplied value is zero, and the semaphore value is zero, the operation succeeds. If the semaphore value is not zero then the operation blocks until the semaphore value becomes zero. This increases the value returned by getzcnt.

❑ If the value is negative, then the semaphore value is decremented by this value, unless this would take the value of the semaphore negative. In this case, the operation blocks until the semaphore becomes sufficiently positive to allow the decrement to happen. This increases the value returned by getncnt.
We can choose to operate on as many semaphores as we like. All operations must be able to complete before the operation as a whole can succeed. For example:
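A runnable sketch of a multi-operation op call; the private three-semaphore set and its initial values are set up here purely for illustration:

```perl
use strict;
use warnings;
use IPC::SysV qw(IPC_PRIVATE IPC_CREAT S_IRUSR S_IWUSR);
use IPC::Semaphore;

# a private three-semaphore set, created just for this demonstration
my $sem = IPC::Semaphore->new(IPC_PRIVATE, 3, S_IRUSR | S_IWUSR | IPC_CREAT)
    or die "Cannot create semaphore set: $!";
$sem->setall(0, 1, 0);    # sem 0 = 0, sem 1 = 1, sem 2 = 0

# all three operations must be able to complete, or the call blocks:
$sem->op(
    0,  0, 0,    # wait for semaphore 0 to be zero (it is, so no block)
    1, -1, 0,    # decrement semaphore 1 (would block if it were zero)
    2,  1, 0,    # increment semaphore 2 (always succeeds)
) or die "semop failed: $!";

print "values now: ", join(',', $sem->getall), "\n";
$sem->remove;    # clean up the set
```

Because semaphore 0 is already zero and semaphore 1 is positive, the whole call succeeds immediately; had either condition failed, all three operations would have waited together.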
The rules for blocking on semaphores allow us to create applications that can cooperate with each other; one application can control the execution of another by setting semaphore values. Applications can also coordinate over access to shared resources. This is a potentially large subject, so we will just give a simple illustrative example of how a semaphore can coordinate access to a common shared resource:
Application 1 creates a semaphore set with one semaphore, value 1, and creates a shared resource, for example a file, or an IPC shared memory segment. However, it decides to do a lot of initialization and so doesn't access the resource immediately.
Application 2 starts up, decrements the semaphore to 0, and accesses the shared resource.

Application 1 now tries to decrement the semaphore and access the resource. The semaphore is zero, so it cannot access it; it therefore blocks.

Application 2 finishes with the shared resource and increments the semaphore, an operation that always succeeds.

Application 1 can now decrement the semaphore, since it is now 1, and so the operation succeeds and no longer blocks.

Application 2 tries to access the resource a second time. First it tries to decrement the semaphore, but is unable to, and blocks.

Application 1 finishes and increments the semaphore.

Application 2 decrements the semaphore and accesses the resource.

...and so on.
Although this sounds complex, in reality it is very simple. In code, each application simply accesses the semaphore, creating it if it is not present, and then adds two lines around all accesses to the resource to be protected:

sub access_resource {
    # decrement semaphore, blocking if it is already zero
    $sem->op(0, -1, SEM_UNDO);
    # ...use the protected resource here...
    # increment semaphore, releasing it for the next process
    $sem->op(0, 1, SEM_UNDO);
}
If we do not want to block while waiting for a semaphore, we can specify IPC_NOWAIT for the flag value. We can do this on a per-semaphore basis too, if we want, though this could be confusing. For example:
$sem->op(0, -1, IPC_NOWAIT | SEM_UNDO);
die unless critical_subroutine();
As with message queues, care must be taken not to leave unused semaphore sets around after the last process exits.
We will return to the subject of semaphores when we come to talk about threads, which have their own semaphore mechanism, inspired greatly by the original IPC implementation described above.
Shared Memory Segments
While message queues and semaphores are relatively low-level constructs made a little more accessible by the IPC::Msg and IPC::Semaphore modules, shared memory has an altogether more powerful support module in the form of IPC::Shareable. The key reason for this is that IPC::Shareable implements shared memory through a tie mechanism, so rather than reading and writing from a memory block we can simply attach a variable to it and use that.
The tie takes four arguments: a variable (which may be a scalar, an array, or a hash), IPC::Shareable for the binding, and then an access key, followed optionally by a hash reference containing one or more key-value pairs. For example, the following code creates and ties a hash variable to a shared memory segment:
use IPC::Shareable;
my %local_hash;
tie %local_hash, 'IPC::Shareable', 'key', {create => 1};
$local_hash{'hashkey'} = 'value';
This creates a persistent shared memory object containing a hash variable, which can be accessed by any application or process by tie-ing a hash variable to the access key for the shared memory segment (in this case key):
# in a process in an application far, far away
my %other_hash;
tie %other_hash, 'IPC::Shareable', 'key';
$value = $other_hash{'hashkey'};
A key feature of shared memory is that, like message queues and semaphores, the shared memory segment exists independently of the application that created it. Even if all the users of the shared memory exit, it will continue to exist so long as it is not explicitly deleted (we can alter this behavior, though, as we will see in a moment).
Note that the key value is actually implemented as an integer, the same as for semaphores and message queues, so the string we pass is converted into an integer value by packing the first four characters into a 32-bit integer value. This means that only the first four characters of the key are used. As a simple example, baby and babyface are the same key to IPC::Shareable.
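We can see this packing behavior with plain pack and unpack (an illustration of the principle, not IPC::Shareable's own code):

```perl
use strict;
use warnings;

# pack 'A4' keeps only the first four characters, so any two strings
# sharing a four-character prefix collapse to the same 32-bit key
my $key1 = unpack 'L', pack 'A4', 'baby';
my $key2 = unpack 'L', pack 'A4', 'babyface';
print $key1 == $key2 ? "same key\n" : "different keys\n";   # same key
```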
The tied variable can be of any type, including a scalar containing a reference, in which case whatever the reference points to is converted into a shared form. This includes nested data structures and objects, making shared memory ties potentially very powerful. However, each reference becomes a new shared memory object, so a complex structure can quickly exceed the system limit on shared memory segments. In practice we should only try to tie relatively small nested structures to avoid trouble.

The fourth argument can contain a number of different configuration options that determine how the shared memory segment is accessed:
Option      Function

create      If true, create the key if it does not already exist. If the key does exist, then the tie succeeds and binds to the existing data, unless exclusive is also true. If create is false or not given, then the tie will fail if the key is not present.

exclusive   Used in conjunction with create. If true, allows a new key to be created but does not allow an existing key to be tied to successfully.

mode        Determines the access permissions of the shared memory segment. The value is an integer, traditionally an octal number or a combination of flags like S_IRWXU | S_IRGRP.

destroy     If true, causes the shared memory segment to be destroyed automatically when this process exits (but not if it aborts on a signal). In general only the creating application should do this, or be able to do this (by setting the permissions appropriately on creation).

size        Defines the size of the shared memory segment, in bytes. In general this defaults to an internally set maximum value, so we rarely need to use it.

key         If the tie is given three arguments, with the reference to the configuration options being the third, this value specifies the name of the shared memory segment:

            tie %hash, 'IPC::Shareable', {key => 'key', };
For example:
tie @array, 'IPC::Shareable', 'mysharedmem', {
    create => 1,
    exclusive => 0,
    mode => 0722,
    destroy => 1,
};
IPC::Shareable also provides methods for removing shared memory segments:

remove – Remove the shared memory segment, if we have permission to do so.

clean_up – Remove all shared memory segments created by this process.

clean_up_all – Remove all shared memory segments in existence for which this process has permission to do so.
For example:
# grab a handle to the tied object via the 'tied' command
$shmem = tied $shared_scalar;

# use the object handle to call the 'remove' method on it
print "Removed shared scalar" if $shmem->remove;
We can also lock variables using the IPC::Shareable object's shlock and shunlock methods. If the variable is already locked, the process attempting the lock will block until it becomes free. For example:
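A sketch of the locking idiom, assuming $shared_scalar has already been tied as shown earlier (IPC::Shareable is a CPAN module, so this will only run where it is installed):

```perl
# serialize updates through the underlying IPC::Shareable object
my $shmem = tied $shared_scalar;
$shmem->shlock;                  # blocks if another process holds the lock
$shared_scalar .= " updated";    # exclusive access while locked
$shmem->shunlock;                # let other processes proceed
```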
Threads
Threads are, very loosely speaking, the low-fat and lean version of forked processes. The trouble with fork is that it not only divides the thread of execution into two processes, it divides their code and data too. This means that where we had a group containing the Perl interpreter, %ENV hash, @ARGV array, and the set of loaded modules and subroutines, we now have two groups. In theory, this is wasteful of resources, very inconvenient, and unworkable for large numbers of processes. Additionally, since processes do not share data they must use constructs like pipes, shared memory segments, or signals to communicate with each other. In practice, most modern UNIX systems have become very intelligent in sharing executable segments behind the scenes, reducing the expense of a fork.
Regardless, threads attempt to solve all these problems. Like forked processes, each thread is a separate thread of execution. Also similar to forked processes, newly created threads are owned by the thread that created them, and they have unique identifiers, though these are thread IDs rather than process IDs. We can even wait for a thread to finish and collect its exit result, just like waitpid does for child processes. However, threads all run in the same process and share the same interpreter, code, and data; nothing is duplicated except the thread of execution. This makes them much more lightweight, so we can have very many of them, and we don't need to use any of the workarounds that forked processes need.
Thread support in Perl is still experimental. Added in Perl 5.6, work continues into Perl 5.7 and Perl 6. As shipped, most Perl interpreters do not support threads at all, but if we build Perl from scratch, as in Chapter 1, we can enable them. There are actually two types of thread; the current implementation provides for a threaded application but with only one interpreter dividing its attention between the threads. The newer implementation uses an interpreter that is itself threaded, greatly improving performance. However, this thread support is so new it isn't even available in Perl code yet. The likelihood is that full official thread support will arrive with Perl 6, but that doesn't mean we can't use it now – carefully – for some useful and interesting applications.
Checking for Thread Support
To find out if threads are available programmatically we can check for the usethreads key in the %Config hash provided by the Config module:
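A minimal check using the standard Config module:

```perl
use strict;
use warnings;
use Config;

# %Config holds the options Perl was built with; usethreads is set
# only on interpreters compiled with thread support
if ($Config{usethreads}) {
    print "This Perl supports threads\n";
} else {
    print "This Perl does not support threads\n";
}
```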
The choice of new or async depends on the nature of the thread we want to start; the two are identical in all respects apart from their syntax. If we only want to start one instance of a thread then async should be used, whereas new is better if we want to use the same subroutine for many different threads:
$thread1 = new Thread \&threadsub, $arg1;
$thread2 = new Thread \&threadsub, $arg2;
$thread3 = new Thread \&threadsub, $arg3;
Or, with a loop:
# start a new thread for each argument passed in @ARGV:
my @threads;
foreach (@ARGV) {
    push @threads, new Thread \&threadsub, $_;
}
We can do this with a certain level of impunity because threads are so much less resource-consumptive than forked processes.
Identifying Threads
Since we can start up many different threads all with the same subroutine as their entry point, it might seem tricky to tell them apart. However, this is not so. First, we can pass in different arguments when we start each thread to set them on different tasks. An example of this would be a filehandle, newly created by a server application, and this is exactly what an example of a threaded server in Chapter 23 does. Second, a thread can create a thread object to represent itself using the self class method:
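A short sketch using the Thread API described here (this is the Perl 5.005-era module; modern Perls provide the equivalent through threads->self and tid in the threads module):

```perl
# inside any thread, including the main one:
my $self = Thread->self;                 # thread object for this thread
print "running as thread ", $self->tid, "\n";
```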
It is possible to have more than one thread object containing the same thread ID, and this is actually common in some circumstances. We can check for equivalence by comparing the IDs, but we can do better by using the equal method:
print "Equal! \n" if $self->equal($thread);
Or, equivalently:
print "Equal! \n" if $thread->equal($self);
Thread identities can be useful for all kinds of things, one of the most useful being thread-specific data.
Thread-specific Data
One of the drawbacks of forked processes is that they do not share any of their data, making communication difficult. Threads have the opposite problem; they all share the same data, so finding privacy is that much harder.
Unlike some other languages, Perl does not have explicit support for thread-specific data, but we can improvise. If our thread code all fits into one subroutine we can simply create a lexical variable with my and use that. If we do have subroutines to call and we want them to be able to access the variables we create, our can be used. Alternatively, we can create a global hash of thread IDs and use the values for thread-specific data:
new Thread \&thread_sub, $arg1;
new Thread \&thread_sub, $arg2;
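Filling out that fragment, a sketch of the global-hash approach using the old Thread API described in this chapter (the hash and key names are illustrative):

```perl
my %thread_data;    # visible to all threads, indexed by thread ID

sub thread_sub {
    my $tid = Thread->self->tid;
    # by convention, each thread touches only its own slot
    $thread_data{$tid}{lines_handled}++;
}
```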
The advantage of using my or our is that the data is truly private, because it is lexically scoped to the enclosing subroutine. The advantage of the global hash is that threads can potentially look at each other's data in certain well-defined circumstances.
Thread Management
Perl keeps a list of every thread that has been created. We can get a copy of this list, as thread objects, with the Thread->list class method:
@threads = Thread->list;
One of these threads is our own thread. We can find out which by using the equal method:
$self = Thread->self;
foreach (@threads) {
    next if $self->equal($_);
    # ...do something to every thread except ourselves...
}
Just because a thread is present does not mean that it is running, however. Perl keeps a record of the return value of every thread when it exits, and keeps a record of the thread for as long as that value remains unclaimed. This is similar to child processes that have not had waitpid called for them. The threaded equivalent of waitpid is the join method, which we call on the thread we want to retrieve the exit value for:
$return_result = $thread->join;
The join method will block until the thread on which join was called exits. If the thread aborts (for example by calling die) then the error will be propagated to the thread that called join. This means that it will itself die unless the join is protected by an eval:
$return_result = eval { $thread->join };
if ($@) {
    warn "Thread unraveled before completion\n";
}
It is bad form to ignore the return value of a thread, since it clutters up the thread list with dead threads. If we do not care about the return value then we can tell the thread that we do not want it to linger and preserve its return value by telling it to detach:
$thread->detach;
The catch to this is that if the thread dies, nobody will notice, unless a signal handler for the __DIE__ hook, which checks Thread->self for the dying thread, has been registered. To reiterate: if we join a moribund thread from the main thread without precautions, we do have to worry about the application dying as a whole.
As a slightly fuller and more complete (although admittedly not particularly useful) example, this short program starts up five threads, then joins each of them in turn before exiting:
#!/usr/bin/perl
# join.pl
use warnings;
use strict;
# check we have threads
# wait for the last thread started to end
while (my $thread = shift @threads) {
print "Waiting for thread ", $thread->tid, " to end\n";
$thread->join;
print "Ended \n";
}
# exit
print "All threads done \n";
Typically, we care about the return value, and hence would always check it. However, in this case we are simply using join to avoid terminating the main thread prematurely.
Variable Locks
When multiple threads are all sharing data together we sometimes have problems stopping them from treading on each other's toes. Using thread-specific data solves part of this problem, but it does not deal with sharing common resources amongst a pool of threads.
The lock subroutine does handle this, however. It takes any variable as an argument and places a lock on it so that no other thread may lock it for as long as the lock persists, which is defined by its lexical scope. The distinction between lock and access is important; any thread can simply access the variable by not bothering to acquire the lock, so the lock is only good if all threads abide by it.
As a short and incomplete example, this subroutine locks a global variable for the duration of its body:
our $global;

sub varlocksub {
    lock $global;
    # ...work with $global, exclusively...
}
It is not necessary to unlock the variable; the end of the subroutine does it for us. Any lexical scope is acceptable, so we can also place locks inside the clauses of if statements, loops, map and grep blocks, and eval statements. We can also choose to lock arrays and hashes in their entirety if we lock the whole variable:
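For example, assuming suitably shared aggregate variables:

```perl
{
    lock @shared_array;   # no other thread may lock the whole array
    lock %shared_hash;    # likewise for the entire hash
    # ...modify both structures safely here...
}                         # both locks released at end of scope
```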
Condition Variables, Semaphores, and Queues
Locked variables have more applications than simply controlling access. We can also use them as conditional blocks by having threads wait on a variable until it is signaled to proceed. In this mode the variable, termed a condition variable, takes on the role of a starting line; each thread lines up on the block (so to speak) until the starting pistol is fired by another thread. Depending on the type of signal we send, either a single thread is given the go-ahead to continue, or all threads are signaled.

Condition variables are a powerful tool for organizing threads, allowing us to control the flow of data through a threaded application and preventing threads from accessing shared data in an unsynchronized manner. They are also the basis for other kinds of thread interaction. We can use thread semaphores, provided by the Thread::Semaphore module, to signal between threads. We can also implement a queue of tasks between threads using the Thread::Queue module. Both these modules build upon the basic features of condition variables to provide their functionality, but wrap them in a more convenient form.
To get a feel for how each of these works, we will implement a basic but functional threaded application, first using condition variables directly, then using semaphores, and finally using a queue.
Condition Variables
Continuing the analogy of the starting line, to 'line up' threads on a locked variable we use the cond_wait subroutine. This takes a locked variable, unlocks it, and then waits until it receives a signal from another thread. When it receives a signal, the thread resumes execution and relocks the variable.

To have several threads all waiting on the same variable we need only have each thread lock and then cond_wait the variable in turn. Since the lock prevents more than one thread executing cond_wait at the same time, the process is automatically handled for us. The following code extract shows the basic technique applied to a pool of threads:
my $lockvar; # lock variable – note it is not locked at this point

sub my_waiting_thread {
    # wait for signal
    {
        lock $lockvar;
        cond_wait $lockvar;
    }
    # ...woken up: carry on working...
}

To wake up a single waiting thread we do with cond_signal:

# wake up one thread
cond_signal $lockvar;
This will unblock one thread waiting on the condition variable. The thread that is restarted is essentially random; we cannot assume that the first thread to block will be the first to be unblocked again. This is appropriate when we have a pool of threads at our disposal, all of which perform the same basic function. Alternatively, we can unblock all threads at once by calling cond_broadcast thus:
# everybody up!
cond_broadcast $lockvar;
This sends a message to each thread waiting on that variable, which is appropriate for circumstances where a common resource is conditionally available and we want to stop or start all threads depending on whether the resource is available or not. It is important to realize, however, that if no threads are waiting on the variable, the signal is discarded; it is not kept until a thread is ready to respond to it. It is also important to realize that this has nothing (directly) to do with process signals, as handled by the %SIG hash; writing a threaded application to handle process signals is a more complex task; see the Thread::Signal module for details.
Note that the actual value of the lock variable is entirely irrelevant to this process, so we can use it for other things. For instance, we can use it to pass a value to the thread at the moment that we signal it. The following short threaded application does just this, using a pool of service threads to handle lines of input passed to them by the main thread. While examining the application, pay close attention to the two condition variables that lie at the heart of the application:
❑ $pool – used by the main thread to signal that a new line is ready. It is waited on by all the service threads. Its value is programmed to hold the number of threads currently waiting, so the main thread knows whether or not it can send a signal or if it must wait for a service thread to become ready.
❑ $line – used by whichever thread is woken by the signal to $pool. Lets the main thread know that the line of input read by the main thread has been copied to the service thread and that a new line may now be read. The value is the text of the line that was read.
The two condition variables allow the main thread and the pool of service threads to cooperate with each other. This ensures that each line read by the main thread is passed to one service thread both quickly and safely:
#!/usr/bin/perl
# threadpool.pl
use warnings;
use strict;
use Thread qw(cond_signal cond_wait cond_broadcast yield);
my $threads = 3; # number of service threads to create
my $pool = 0; # child lock variable and pool counter set to 0 here,
# service threads increment it when they are ready for input
my $line=""; # parent lock variable and input line set to "" here, we
# assign each new line of input to it, and set it to 'undef'
# when we are finished to tell service threads to quit
# a locked print subroutine – stops thread output mingling
sub thr_print : locked {
    print @_;
}
# create a pool of three service threads
foreach (1 .. $threads) {
    new Thread \&process_thing;
}
# main loop: Read a line, wait for a service thread to become available,
# signal that a new line is ready, then wait for whichever thread picked
# up the line to signal to continue
while ($line = <>) {
chomp $line;
thr_print "Main thread got '$line'\n";
    # do not signal until at least one thread is ready
    if ($pool == 0) {
        thr_print "Main thread has no service threads available, yielding\n";
        yield until $pool > 0;
    }
    thr_print "Main thread has $pool service threads available\n";
    # signal that a new line is ready
    {
        lock $pool;
        cond_signal $pool;
    }
    thr_print "Main thread sent signal, waiting to be signaled\n";
    # wait for whichever thread wakes up to signal us
    {
        lock $line;
        cond_wait $line;
    }
    thr_print "Main thread received signal, reading next line\n";
}
thr_print "All lines processed, sending end signal\n";
# set the line to special value 'undef' to indicate end of input
# has the 'quit' signal been sent while we were busy?
last unless (defined $line);
# wait to be woken up
thr_print "Thread ", $self->tid, " waiting\n";
{
    lock $pool;
$thread_line = $line;
# was this the 'quit' signal? Check the value sent
last unless (defined $thread_line);
# let main thread know we have got the value
thr_print "Thread ", $self->tid, " retrieved data, signaling main\n";
{
}
The means by which the application terminates is also worth noting. Threads do not necessarily terminate just because the main thread does, so in order to exit a threaded application cleanly we need to make sure all the service threads terminate, too. This is especially important if resources needed by some threads are being used by others. Shutting down threads in the wrong order can lead to serious problems. In this application the main thread uses cond_signal to signal the $pool variable and wake up one service thread when a new line is available. Once all input has been read we need to shut down all the service threads, which means getting their entire attention. To do that, we give $line the special value undef and then use cond_broadcast to signal all threads to pick up the new 'line' and exit when they see that it is undef. However, this alone is not enough, because a thread might be busy and not waiting. To deal with that possibility, the service thread subroutine also checks the value of $line at the top of the loop, just in case the thread missed the signal.
Finally, this application also illustrates the use of the locked subroutine attribute. The thr_print subroutine is a wrapper for the regular print function that only allows one thread to print at a time. This prevents the output of different threads from getting intermingled. For simple tasks like this one, locked subroutines are an acceptable solution to an otherwise tricky problem that would require at least a lock variable. For longer tasks, locked subroutines can be a serious bottleneck, affecting the performance of a threaded application, so we should use them with care and never for anything likely to take appreciable time.
Semaphores
Although it works perfectly well, the above application is a little more complex than it needs to be. Most forms of threads (whatever language or platform they reside on) support the concept of semaphores, and Perl is no different. We covered IPC semaphores earlier, and thread semaphores are very similar. They are essentially numeric flags that take a value of zero or any positive number and obey the following simple rules:
❑ Only one thread at a time may manipulate the value of a semaphore in either direction
❑ Any thread may increment a semaphore immediately
❑ Any thread may decrement a semaphore immediately if the decrement will not take it below zero

❑ If a thread attempts to decrement a semaphore below zero, it will block until another thread raises the semaphore high enough
Perl provides thread semaphores through the Thread::Semaphore module, which implements semaphores in terms of condition variables – the code of Thread::Semaphore is actually quite short, as well as instructive. It provides a constructor and two object methods:
new – create a new semaphore, for example:
$semaphore = new Thread::Semaphore; # create semaphore, initial value 1
$semaphore2 = new Thread::Semaphore(0); # create semaphore, initial value 0
up – increment semaphore, for example:
$semaphore->up; # increment semaphore by 1
$semaphore->up(5); # increment semaphore by 5
down – decrement semaphore, blocking if necessary:
$semaphore->down; # decrement semaphore by 1
$semaphore->down(5); # decrement semaphore by 5
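Since every operation in the following sequence leaves the semaphore non-negative, none of these calls blocks; a runnable sketch (requires a threads-enabled Perl):

```perl
use strict;
use warnings;
use Thread::Semaphore;

my $semaphore = Thread::Semaphore->new(2);   # initial value 2
$semaphore->down;       # 2 -> 1, succeeds immediately
$semaphore->down;       # 1 -> 0; a further down would block here
$semaphore->up(3);      # 0 -> 3, always succeeds
$semaphore->down(3);    # 3 -> 0, succeeds immediately
print "no blocking occurred\n";
```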
Depending on our requirements we can use semaphores as binary stop/go toggles or allow them to range to larger values to indicate the availability of a resource. Here is an adapted form of our earlier threaded application, rewritten to replace the condition variables with semaphores:
my $threads = 3; # number of service threads to create
my $line = ""; # input line
my $main = new Thread::Semaphore; # proceed semaphore, initial value 1
my $next = new Thread::Semaphore(0); # new line semaphore, initial value 0
# a locked print subroutine – stops thread output mingling
sub thr_print : locked {
    print @_;
}
# create a pool of three service threads
foreach (1 .. $threads) {
new Thread \&process_thing;
}
# main loop: read a line, raise 'next' semaphore to indicate a line is
# available, then wait for whichever thread lowered the 'next' semaphore
# to raise the 'main' semaphore, indicating we can continue
while ($line = <>) {
chomp $line;
thr_print "Main thread got '$line'\n";
# notify service threads that a new line is ready
$next->up;
thr_print "Main thread set new line semaphore, waiting to proceed\n";
thr_print "All lines processed, sending end signal\n";
# set the line to special value 'undef' to indicate end of input
$line = undef;
# to terminate all threads, raise 'new line' semaphore to >= number of
# service threads: all service threads will decrement it and read the
$thread_line = $line;
# was this the 'quit' signal? Check the value sent
last unless (defined $thread_line);
# let main thread know we have got the value
thr_print "Thread ", $self->tid, " retrieved data, signaling main\n";
}
The semaphore version of the application is simpler than the condition variable implementation, if only because we have hidden the details of all the cond_wait and cond_signal functions inside calls to up and down. Instead of signaling the pool of service threads via a condition variable, the main thread simply raises the next semaphore by one, giving it the value 1. Meanwhile, all the service threads are attempting to decrement this semaphore. One will succeed and receive the new line of input, and the others will fail, continuing to block until the semaphore is raised again. When it has copied the line to its own local variable, the thread raises the main semaphore to tell the main thread that it can proceed to read another line. The concept is recognizably the same as the previous example, but is easier to follow.
We have also taken advantage of the fact that semaphores can hold any positive value to terminate the application. When the main thread runs out of input it simply raises the 'next' semaphore to be equal to the number of service threads. At this point all the threads can decrement the semaphore, read the value of $line that we again set to undef, and quit. If a thread is still busy the semaphore will remain positive until it finishes and comes back to decrement it – we have no need to put in an extra check in case we missed a signal.
Queues
Many threaded applications, our example a case in point, involve the transport of data between several different threads. In a complex application incoming data might travel through multiple threads, passed from one to the next before being passed out again: a bucket-chain model. We can create pools of threads at each stage along the chain in a similar way to the example application above, but this does not improve upon the mechanism that allows each thread to pass data to the next in line.
The two versions of the application that we have produced so far are limited by the fact that they only handle a single value at a time. Before the main thread can read another line, it has to dispose of the previous one. Even though we can process multiple lines with multiple service threads, the communication between the main thread and the service threads is not very efficient. If we were communicating between different processes we might use a pipe, which buffers output from one process until the other can read it; the same idea works for threads, too, and takes the form of a queue.
Perl provides support for queues through the Thread::Queue module, which implements simple thread queues in a similar way to the semaphores created by Thread::Semaphore. Rather than a single numeric flag, however, the queue consists of a list, to which values may be added at one end and removed from the other. At heart this is essentially no more than a push and shift operation. By using condition variables and locking, however, the module creates a queue to which values may be added and from which they may be removed safely in a threaded environment, following rules similar to those for semaphores:
❑ Only one thread may add or remove values in the queue at a time
❑ Any thread may add values to a queue immediately
❑ Any thread may remove values from a queue immediately if there are enough values available
new – create a new queue, for example:

$queue = new Thread::Queue;
enqueue – add one or more values to a queue, for example:
$queue->enqueue($value); # add a single value
$queue->enqueue(@values); # add several values
dequeue – remove a single value from a queue, blocking if necessary:
$value = $queue->dequeue; # remove a single value, block
dequeue_nb – remove a single value from a queue, without blocking (instead returning undef if nothing is available):

$value = $queue->dequeue_nb; # remove a single value, don't block
if (defined $value) {
    # ...process the value...
}
pending – return the number of values currently in the queue, for example:
print "There are ",$queue->pending," items in the queue\n";
Using a queue we can rewrite our threaded application again to separate the main thread from the pool
of service threads. Since the queue can take multiple values, the main thread no longer has to wait for each value it passes on to be picked up before it can continue. This simplifies both the code and the execution of the program. The queue has no limit, however, so we make sure not to read too much by checking the size of the queue, and yielding if it reaches a limit we choose. Here is a revised version of the same application, using a queue:
#!/usr/bin/perl
# queue.pl
use warnings;
use strict;

use Thread 'yield';
use Thread::Queue;
use Thread::Semaphore;

my $threads = 3;      # number of service threads to create
my $maxqueuesize = 5; # maximum size of queue allowed

my $queue = new Thread::Queue;        # the queue
my $ready = new Thread::Semaphore(0); # a 'start-gun' semaphore,
                                      # initialized to 0; each service
                                      # thread raises it by 1

# a locked print subroutine - stops thread output mingling
sub thr_print : locked {
    print @_;
}

# create a pool of service threads
foreach (1..$threads) {
    new Thread \&process_thing, $ready, $queue;
}

# wait until every service thread has raised the 'start-gun' semaphore
$ready->down($threads);

# main loop: read a line, queue it, read another, repeat until done
# yield and wait if the queue gets too large
while (<>) {
    chomp;
    thr_print "Main thread got '$_'\n";

    # stall if we're getting too far ahead of the service threads
    yield while $queue->pending >= $maxqueuesize;

    # queue the new line
    $queue->enqueue($_);
}

thr_print "All lines processed, queuing end signals\n";

# to terminate all threads, send as many 'undef's as there are service
# threads
$queue->enqueue( (undef) x $threads );

thr_print "Main thread ended\n";
exit 0;

# the thread subroutine - block on the queue until work arrives
sub process_thing {
    my ($ready, $queue) = @_;
    my $self = Thread->self;
    my $thread_line;

    thr_print "Thread ", $self->tid, " started\n";
    $ready->up; # indicate that we're ready to go

    # dequeue blocks until a value arrives; an undef value means 'quit'
    while (defined($thread_line = $queue->dequeue)) {
        thr_print "Thread ", $self->tid, " got '$thread_line'\n";
    }

    thr_print "Thread ", $self->tid, " ended\n";
}
Since the service threads block if no values are waiting in the queue, this approach effectively handles the job of having service threads wait – we previously dealt with this using condition variables and semaphores. However, we don't need a return semaphore any more, because there is no longer any need for a service thread to signal the main thread that it can continue – the main thread is free to continue as soon as it has copied the new line into the queue.
The means by which we terminate the program has also changed. Originally we set the line variable to undef and broadcast to all the waiting threads. We replaced that with a semaphore, which we raised high enough so that all service threads could decrement it. With a queue we use a variation on the semaphore approach, adding sufficient undef values to the queue so that all service threads can remove one and exit.
We have added one further refinement to this version of the application – a 'start-gun' semaphore. Simply put, this is a special semaphore that is created with a value of zero and incremented by one by each service thread as it starts. The main thread attempts to decrement the semaphore by a number equal to the number of service threads, so it will only start to read lines once all service threads are running. Why is this useful? Because threads have no priority of execution. In the previous examples the first service threads will start receiving and processing lines before later threads have even initialized. In a busy threaded application, the activity of these threads may mean that the service threads started last do
so very slowly, and may possibly never get the time to initialize themselves properly. In order to make sure we have a full pool of threads at our disposal, we use this semaphore to hold back the main thread until the entire pool is assembled.
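The 'start-gun' arithmetic can be sketched on its own. In this hypothetical fragment the thread-spawning is elided (the up calls stand in for the service threads), so the counting behavior is easy to see:

```perl
use strict;
use warnings;
use Thread::Semaphore;

my $threads = 3;
my $ready = Thread::Semaphore->new(0);  # the 'start-gun' starts at zero

# each service thread raises the semaphore by one as it starts up;
# here a simple loop stands in for the threads themselves
$ready->up for 1 .. $threads;

# the main thread decrements by the full pool size in a single call;
# this blocks until all service threads have checked in
$ready->down($threads);

print "all $threads service threads are ready\n";
```

The single down($threads) call is the key: it cannot complete piecemeal, so the main thread is held back until the entire pool has assembled.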
There are many other ways that threaded applications can be built, and with more depth to the examples than we have shown here. Applications that also need to handle process signals should consider the Thread::Signal module, for instance. Perl threads are still experimental, so an in-depth discussion of them is perhaps inappropriate. However, more examples and a more detailed discussion of threaded programming models and how Perl handles threads can be found in the perlthrtut manual page by executing:
> perldoc perlthrtut
Thread Safety and Locked Code
Due to the fact that threads share data, modules and applications that were not built with them in mind can become confused if two threads try to execute the same code. This generally comes about when a module keeps data in global variables, which are visible to both threads and which are overwritten by each
of them in turn. The result is threads that pollute each other's data.
The best solution to avoid this problem is to rewrite the code to allow multiple threads to execute it at once. Code that does this is called thread safe. If this is too hard or time-consuming, we can use the :locked subroutine attribute to force a subroutine into single-threaded mode; only one thread may be inside the subroutine at any one time (other calling threads will block until the current call is completed):

sub singlethreaded : locked {
    print "One at a time! Wait your turn!\n";
}
This works well for functional applications, but for object-oriented programming locking a method can
be overly restrictive; we only need to make sure that two threads handling the same object do not conflict. To do this we can add the :method attribute:
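A sketch of the combined attributes, using the Perl 5.005 Thread model described in this chapter (the class and method names here are invented for illustration; this syntax does not carry over to later threading models):

```perl
package Counter;

# only one thread may run this method per *object*; calls made on
# different Counter objects may still run concurrently
sub increment : locked : method {
    my $self = shift;
    $self->{count}++;
}
```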
We can also lock a subroutine in code by calling lock on it:
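A runtime sketch, again under the old Thread model (the subroutine name is hypothetical, and this form of lock is specific to 5.005-style threads):

```perl
sub some_sub {
    # taking a lock on the subroutine itself at runtime has the same
    # effect as the :locked attribute, for the duration of this call
    lock(&some_sub);
    print "One at a time!\n";
}
```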
Summary

Over the course of this chapter we have covered the following:
❑ Signals and how to handle them
❑ Using Perl's fork function to start new processes
❑ Inter-Process Communication and the various ways in which different processes can communicate with one another
❑ Sharing of data between processes, using message queues, semaphores, and shared memory segments
❑ Thread support in Perl
Networking with Perl
Network programming is a very large topic. In this chapter, we will attempt to provide a solid
foundation in the basic concepts and principles of networking. We will cover the TCP/IP protocol and its close relatives, then move on to developing simple network clients and servers using Perl's standard libraries. We will also look at the functions Perl provides for determining network configurations, and the helper modules that make analyzing this information simpler.
Protocol Layers
For any message to be communicated, it must travel through some form of medium. This medium can range from air and space (used for satellite links and packet radio) to copper cabling and fiber optics. The medium of choice for the last twenty years has been copper cabling, due to the ubiquity of cost-effective equipment based on open standards like Ethernet. This supplies the physical part of the network, also known as the physical layer.
On top of this is layered a series of protocols, each providing a specific function, the preceding layer acting as the foundation for the one above. The ISO worked on a seven-layer reference model for protocol standardization, to allow multiple vendors to communicate with each other. While it does not define any specific protocol or implementation, many modern protocols were designed against its model. The OSI (Open Systems Interconnection) reference model appears as follows, from top to bottom:

    Application
    Presentation
    Session
    Transport
    Network
    Data-link
    Physical
The bottom layer, which we already mentioned, is the communication, or physical, medium. Using Ethernet as an example, this layer includes not only the copper cabling, but also the impedance and voltage levels used on it.
The second layer, known as the data-link layer, defines how that medium is to be used. For instance, our standard telephone systems are analog devices; the voice traffic is carried over analog carrier waves on the wire. Ethernet, like most computer network devices, prefers square waves. Because of this, this layer comprises the circuitry used to encode any transmissions in a form the physical layer can deliver, and decode any incoming traffic. It also has to ensure successful transmission of traffic.
If none of this makes any sense, don't despair, as this is typically the domain of electrical and electronic engineers, not programmers.
The network layer works very closely with the data-link layer. Not only must it ensure that it is
providing the data-link layer with information it can encode (handling only digital traffic to digital networks, etc.), but it also provides the addressing scheme to be used on the network.
Transport layers provide handshaking and checks to ensure reliability of delivery. Certain protocols demand acknowledgment of every packet of information delivered to the recipient, and if certain portions aren't acknowledged, they are re-sent by this layer.
The session layer is responsible for establishing and maintaining sessions, as its name would suggest. Sessions are typically necessary for larger and multiple exchanges between specific services and/or users, where the order of information transmitted and received is of great importance.
The presentation layer is just that – a layer responsible for presenting a consistent interface or API for anything needing to establish and maintain connections.
Finally, the application layer specifies precisely how each type of service will communicate across these connections. HTTP, NNTP, and SMTP are examples of application protocols.
Frames, Packets, and Encapsulation
In order to put all of this into perspective, let's examine how one of the most prevalent network
architectures compares to this reference model. Since we'll be reusing much of this information later in the chapter, we'll look at TCP/IP networks running over Ethernet:
OSI Model                             TCP/IP over Ethernet

Application, Presentation, Session    Application/Terminal
Transport                             TCP/UDP
Network                               IP
Data-link, Physical                   Ethernet
As you can see, the Ethernet standard itself (of which there are several variations and evolutions) encompasses both the physical and data-link layers. At the physical layer, it specifies the electrical signaling, clocking requirements, and connectors.
At the data-link layer, Ethernet transceivers accomplish the encoding and decoding of traffic, and ensure delivery of traffic at both ends of the communication. This last part is actually a bit more
involved than one might guess, due to the nature of the physical layer. Since the medium is copper, it should be obvious that only one host (or node) can transmit at a time. If more than one tries to send out
a signal simultaneously, all signals are effectively scrambled by the mixed voltage levels. This event is called a collision. The Ethernet specification anticipates that this will happen (especially on extremely busy networks), and so provides a solution. When a collision is detected, each node will wait for a random interval before re-transmitting its data.
In order to prevent particularly long communication sessions between hosts from hogging the line, Ethernet specifies that all communications be of a specific length of data, called a frame. Each frame is delivered only when the wire is sensed as free of traffic. This allows multiple hosts to appear to be
communicating simultaneously. In truth, they are cooperatively sending out their frames one at a time, allowing control of the wire to be passed among several hosts before sending out the succeeding frames.
On the receiving end, the specification requires that every node on the network have a unique hardware address (called a MAC address). Every frame is addressed to a specific address, and hence, only the node with that address should actually listen to those frames. Each address is comprised of two 24-bit segments. The first is the manufacturer's ID, assigned by the IEEE body. The last 24 bits are the unique equipment identifier. The exception, of course, is broadcast traffic, which is not so addressed, since all nodes should be listening for it.
This brings up a couple of caveats that should be mentioned here; while in theory you should never have to worry about two pieces of off-the-shelf gear having the same MAC address, it can and does happen, especially with those units that allow that address to be reprogrammed. This can cause some quite bizarre network problems that are extremely difficult to isolate if we're not expecting them.
Another caveat is that while each node should only read the frames addressed to it, it doesn't necessarily have to be that way. Low-level network monitoring software ignores this, and reads all traffic. This is called sniffing the wire, and there are legitimate reasons to do this. In the case mentioned above, realizing that two nodes have the same MAC address often requires that you sniff the wire. As you can see, Ethernet is a carefully managed cooperative environment, but it is fallible.
To learn more about Ethernet, or any of the competing standards, we can order publications from the IEEE (Institute of Electrical and Electronics Engineers) web site (http://www.ieee.org/), which governs the standards; or alternatively, from related sites like the Gigabit Ethernet Alliance (http://www.gigabit-ethernet.org/). http://www.ethernet.org/, while offering a few forums discussing Ethernet standards, is not affiliated with any of the official industry bodies. Lastly, we can always search on the Internet for IEEE 802.3, the CSMA/CD specification that defines the operation of Ethernet technologies. For
printed references, Macmillan Technical Publishing's Switched, Fast, and Gigabit Ethernet
(ISBN 1578700736), written by Robert Breyer and Sean Riley, is one of the best publications available. It covers the whole gamut of Ethernet networking, including bridging, routing, QOS, and more.
The Internet Protocol
Referring once more to our comparison chart, we now move into the realm of TCP/IP. TCP/IP is actually a bit misleading, since it refers to two separate protocols (and more are used, though not explicitly mentioned). As can be seen, the network layer in the OSI model directly correlates to IP (Internet Protocol), providing an addressing and routing scheme for the networks. Why, if Ethernet provides for addressing via MAC addresses, do we need yet another addressing scheme? The answer is, because of the segmented nature of large networks. The Ethernet MAC system only works if every node addressed is on the same wire (or bridge). If two segments, or even two separate networks, are linked together via a router or host, then a packet addressed from one host on one network to another on the other network would never reach its destination if it could only rely on the MAC address. IP provides the means for that packet to be delivered across the bridging router or host onto the other network, where it can finally be delivered via the MAC address.
We'll illustrate this more clearly in a moment, but first let's get a better understanding of what IP is. IP
as we know it today is actually IPv4, a protocol that provides a 32-bit segmented addressing system. The
protocol specification is detailed in RFC 791 (all RFC documents can be obtained from the RFC Editor's office web site, http://www.rfc-editor.org/, which is responsible for assigning each of them numbers and publishing them).
As an aside, we should notice that we're dealing with more than one governing body just to implement one model. The reason for this is simple; the IEEE, being electrical and electronics engineers, are responsible for the hardware portion of the implementation. Once we move into software, though, most open standards are suggested through RFCs, or 'Requests for Comments'. Multiple bodies work closely with the RFC Editor's office, including IANA (Internet Assigned Numbers Authority), IETF (Internet Engineering Task Force), and others.
There is an emerging new standard, called IPv6, which offers a larger addressing space and additional
features, but we won't be covering it here since IPv6 networks are still fairly rare. You can read more about IPv6 in RFC 2460.
Back to IPv4 and its addressing scheme. Notice that we said that IPv4 provides a segmented 32-bit address space. This address space is similar to MAC addresses, in that we divide the 32 bits into two separate designators. Unlike MAC's manufacturer:unit-number scheme, the first segment refers to the
network, and the second the host. Also unlike MAC, the two divisions don't have to be equally sized – in fact, they rarely are. While the distinction between the network and host can appear subtle, it's extremely important for routing traffic through disparate networks, as you will see. Regardless of this, an IP address is usually represented as a dotted quad (four octets in decimal form), and incorporates both the network and the host segment: 209.165.161.91
In RFC 1700, the Assigned Numbers standard, five classes of networks are defined:

Class   First octet range   Bitmask of first octet
A       0-127               0xxxxxxx
B       128-191             10xxxxxx
C       192-223             110xxxxx
D       224-239             1110xxxx
E       240-255             1111xxxx
The 10.0.0.0 network address, for instance, is a class A network, since the first octet falls in the class
A range, and 192.168.1.0 is a class C network. In practice, classes A to C are reserved for unicast addresses (networks where nodes typically communicate with single end-to-end connections), class D is reserved for multicast (nodes typically talking simultaneously to groups of other nodes), and E is reserved for future use. We need only be concerned with A through C here.
The class of each unicast network determines the maximum size of each network, since each class masks out an additional octet in each IP address, restricting the number of hosts:

Class   Network mask     Maximum hosts
A       255.0.0.0        16,777,214
B       255.255.0.0      65,534
C       255.255.255.0    254
An administrator assigned a class C network, like 192.168.1.0, would only be able to use the last octet, and hence only address 256 hosts (actually, the number of usable hosts is always 2 less, since 0 is reserved for the network address, and the highest number is usually reserved for network-wide broadcasts). Each class also has a block of addresses set aside for private use:
Class Private IP block
A 10.0.0.0 - 10.255.255.255
B 172.16.0.0 - 172.31.255.255
C 192.168.0.0 - 192.168.255.255
Because these are unregulated addresses, multiple organizations will often use the same blocks
internally. This does not cause any conflicts, since these blocks are not routable on public networks (like the Internet). Any host on a company network using these blocks needs to either go through a proxy with a public address, or use some sort of network address translation (often referred to as NAT) in order to communicate with other hosts on public networks. Private IP assignments are covered in RFC 1918.
It is important to note that none of this precludes the ability of any network administrator to
subdivide internal network blocks in any manner they see fit – dividing the 10.x.x.x class A address space into several class Cs, or even smaller divisions (bit masks can be applied to the last octet as well). This is performed quite frequently to relieve network congestion and localize traffic onto subnets. This, in essence, creates a classless (or CIDR, Classless InterDomain Routing) network, though it may still appear as one monolithic class network to the outside world.
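The masking arithmetic behind subnetting is easy to demonstrate in Perl, since the Socket module's inet_aton returns a packed 4-byte string that can be combined with a mask using a bitwise AND. The addresses in this sketch are made up for illustration:

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# a host on the 10.0.0.0 class A network, re-divided with a
# class C style mask (255.255.255.0), as described above
my $host = inet_aton('10.4.200.15');
my $mask = inet_aton('255.255.255.0');

# the network address is the host address ANDed with the mask
my $network = inet_ntoa($host & $mask);
print "network: $network\n";   # prints 'network: 10.4.200.0'
```

Two hosts are on the same subnet exactly when this masked value is identical for both of them.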
IP uses these network addresses and masks to determine the proper routing and delivery for each bit of information being sent. Each connection is evaluated as either needing local or remote delivery. Local delivery is the easiest, since IP will actually delegate to the lower layers to deliver the data:
❑ IP receives a packet (in this case, packet refers to some kind of payload) for delivery
❑ IP determines that local delivery can be done, since the destination address appears to be on the same network as its own
❑ An ARP request is broadcast asking, 'Who has 192.168.1.10?'
❑ 192.168.1.10 replies with its own MAC address
❑ IP hands over the packet to the Ethernet layer for delivery to the MAC in the ARP response
❑ Ethernet delivers the packet directly
This method will not work, however, when the network is segmented. When this is done, some sort of bridge or router must exist that links the two networks, such as a host with two NICs, or a router. That bridge will have a local address on both networks, and the network will be configured to deliver all traffic that's not for local hosts through the bridge's local address.
Recall that we explained that Ethernet delivers all of its payloads in frames. IP delivers all of its
payloads in packets, and both frames and packets have their own addressing schemes. The Ethernet layer is intentionally ignorant of what it is delivering, and who the ultimate recipient is, since its only concern is local delivery. In order to preserve all of that extra addressing information that IP requires, it performs complete encapsulation of the IP packet. That packet, as you may have guessed, has more than just raw data from the upper layers; it begins with a header that includes the IP addressing information, with the remainder taken up by the data itself. Ethernet receives that packet (which must
be small enough to fit into an Ethernet frame's data segment), stores the entire packet in its frame's data segment, and delivers it to the local recipient's MAC address.
Accordingly, remote traffic is processed at two layers. Once IP determines that it is delivering to a
remote host, it requests the local address of the router or gateway, and has Ethernet deliver the packet
to the gateway; the gateway then forwards it to the other network for local delivery:
(Figure: two network segments joined by a gateway; its interface on the second network has MAC address 2B:00:3E:56:F2:FA and IP address 192.168.2.1)
So, if 192.168.1.7 wanted to send a message to 192.168.2.4, the transaction would look like this:
❑ IP receives a packet for delivery
❑ IP determines that local delivery cannot be done, since the destination address appears to be
on a different network from its own, and so it needs to deliver the packet through the local gateway (referred to as the default route in networking terms), which it has previously been informed is 192.168.1.1
❑ An ARP request is broadcast asking, 'Who has 192.168.1.1?'
❑ 192.168.1.1 replies with its own MAC address
❑ IP hands over the packet to the Ethernet layer for delivery to the MAC in the ARP response
❑ Ethernet delivers the packet to the gateway
❑ The gateway examines the packet, and notices the IP address is meant for its other local network
❑ The IP layer on the gateway's other local port receives the packet
❑ IP determines that it can deliver the packet locally
❑ An ARP request is broadcast asking, 'Who has 192.168.2.4?'
❑ 192.168.2.4 replies with its own MAC address
❑ IP hands over the packet to the Ethernet layer for delivery to the MAC in the ARP response
❑ Ethernet delivers the packet
IP is also responsible for another very important task: if the upper layers hand it large bundles of data (each individual bundle would be considered an IP datagram), it must break them apart into something small enough for it to handle with its header information, and still be encapsulated into a single frame. When this happens, each packet sent out includes a sequence number in the header, allowing the receiving end's IP layer to reconstruct the original datagram from the individual packets.
UDP & TCP
By now, we should have a good idea of how networks deliver and route traffic as needed, and we've covered the first three layers of the OSI model with illustrations from TCP/IP over Ethernet. We can also see that IP could route traffic from Ethernet to another network layer, such as token ring, or PPP over a modem connection. We're now ready to move on to the next two layers in the OSI model, Transport and Session. These two layers are combined and handled by TCP (Transmission Control Protocol) and UDP (User Datagram Protocol). We'll cover UDP first, since it's the simpler of the two protocols.
UDP provides a simple transport for any applications that don't need guarantees of delivery or data integrity. It may not be apparent from that description what the benefits of UDP are, but it does have a few. For one, UDP gains a great deal of performance, since less work needs to be done. There's a great deal of overhead involved in maintaining TCP's state and data integrity; UDP applications can send and forget. Another benefit is the ability to broadcast (or multicast) data to groups of machines. TCP is a tightly controlled end-to-end protocol, permitting only one node to be addressed.
NTP (Network Time Protocol) is one application protocol that derives some great advantages from using UDP over TCP. Using TCP, a time server would have to wait and allow individual machines to connect and poll the time, meaning that redundant traffic is generated for every additional machine on the network. A second or more could elapse between the time being sent and the time being received, meaning an incorrect time being recorded. Using UDP allows it to multicast its time synchronization traffic in one burst and have all machines receive and respond accordingly, relieving a significant amount of traffic, which can be better used for other services. In this scenario, it's not critical that each machine be guaranteed delivery of that synchronization data, since it's not likely that a
machine will drift much, if at all, between the multicast broadcasts. As long as it receives some of the broadcasts occasionally, it will stay in sync, and bandwidth is preserved.
TCP, on the other hand, is much more expensive to maintain, but if you need to ensure that all the data arrives intact, there's no other alternative. TCP also allows stateful connections, with each exchange between nodes acknowledged and verified. If any portion of the data fails to come in, TCP will
automatically request a retransmission, and the same will occur for any data arriving that fails to generate the same checksum.
Another important feature is TCP's use of internal addressing. Every application using a TCP
connection is assigned a unique port number. Server applications typically will use the same port to listen on, to ensure that all clients know what IP port they need to connect to for reaching that service. TCP connections take this one step farther, by maintaining each completed circuit by independent local and remote IP:port pairs. This allows the same server to serve multiple clients simultaneously, since the traffic is segregated according to the combined local and remote addresses and ports. It is important to note that while UDP has a concept of ports as well, being a connectionless protocol, no virtual circuits are built; it simply sends and receives on the port the application has registered with the UDP layer, and hence the same capabilities do not exist. Also note that port addressing for UDP and TCP exists
independently, allowing you to run a UDP and a TCP application on the same port number
simultaneously.
The port portion of the IP:port pair is only 16 bits, though that rarely presents a problem. Traditionally, ports 1-1023 are reserved as privileged ports, which means that no service should be running on these ports that hasn't been configured by the system administrator. Web servers, as an example, run on port
80, and mail services run on port 25. When a user application connects to such a service, it is assigned a random high-numbered port that is currently unused. Therefore, the connection to a web server will typically look like 209.165.161.91:80<->12.18.192.3:4616.
These features make TCP the most widely used protocol on the Internet. Applications like web services (the HTTP protocol) would certainly not be as enjoyable if the content came in out of order and occasionally corrupted. The same goes for other application protocols like mail (SMTP), and news (NNTP).
Both protocols are covered by their own RFCs: 768 for UDP, and 793 for TCP
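The TCP/UDP distinction shows up directly in Perl's low-level socket call, which takes the socket type (stream or datagram) and the protocol as separate arguments. A minimal sketch, creating one socket of each kind:

```perl
use strict;
use warnings;
use Socket;   # exports PF_INET, SOCK_STREAM, SOCK_DGRAM, etc.

# a stream socket, which will use TCP
socket(my $tcp, PF_INET, SOCK_STREAM, getprotobyname('tcp'))
    or die "tcp socket: $!";

# a datagram socket, which will use UDP
socket(my $udp, PF_INET, SOCK_DGRAM, getprotobyname('udp'))
    or die "udp socket: $!";

print "created both socket types\n";
close $tcp;
close $udp;
```

Nothing has been connected or bound yet; these calls merely allocate the endpoints, as we will see later in the chapter.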
ICMP
ICMP (Internet Control Message Protocol) is another protocol that should be mentioned for
completeness. It's not a transport protocol in its own right, but a signaling protocol used by the other protocols. For instance, should a problem arise that appears to be blocking all datagrams from reaching the recipient, IP would use ICMP to report the problem to the sender. The popular ping command
generates ICMP messages (an 'Echo Request' message, in this case) too, to verify that the desired hosts are
on the network, and 'alive'.
ICMP can deliver a range of messages including Destination Unreachable, Time Exceeded, Redirect Error, Source Quench, Parameter Problem, and a wide variety of sub-messages. It can also provide some network probing with Echo/Timestamp/Address Mask Requests and Replies.
ICMP is more fully documented in RFC 792
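Perl's standard Net::Ping module wraps this functionality up. Note that sending real ICMP Echo Requests requires root privileges on most systems, so by default Net::Ping probes with TCP instead. A sketch (the host name is an example only, and the result depends on network reachability):

```perl
use strict;
use warnings;
use Net::Ping;

# 'icmp' sends real ICMP Echo Requests but needs root privileges;
# 'tcp' works unprivileged by attempting a connection to a port
my $ping = Net::Ping->new('tcp');
$ping->port_number(80);   # probe a web server port instead of echo(7)

if ($ping->ping('www.example.com', 5)) {
    print "host is alive\n";
} else {
    print "no response\n";
}
$ping->close;
```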
Other Protocols
It is important to realize that other protocols exist at various levels of the OSI reference model, such as IPX, AppleTalk, NetBEUI, and DECnet. Perl can even work with some of these protocols. We won't be covering them here, though, since a clear majority of Perl applications reside in the TCP/IP domain. Furthermore, the principles illustrated here will apply in part to the operation of other types of network.

As mentioned earlier, SMTP, HTTP, and NNTP are examples of application protocols that work directly with the transport protocol. The only thing that these types of protocols specify is how they are to use the connection presented by the lower layers. For HTTP, as an example, the protocol specifies that each connection will consist of simple query and response pairs, and what the syntax and commands allowed in the queries are.
Anonymous, Broadcast, and Loopback Addresses
Any address within a network with a host segment (the host segment, remember, is the unmasked portion of any network address) that consists entirely of ones or zeros is not normally treated as a specific network interface. By default, a host segment containing all zeros is an 'anonymous' address (since it is, in actuality, the network's address) that can be set as the sender in a TCP/IP packet. For example: 192.168.0.0
Of course, a receiving system is entirely within its rights to ignore anonymous transmissions sent to it. This is not a valid address for the recipient, however. Conversely, setting the host segment to all ones is
a broadcast to all of the systems that have IP addresses in that range, for example: 192.168.255.255. Pinging this address would result in every host on that subnet returning the echo request (at least, it does with UNIX systems; Windows NT appears to ignore it).
In addition, most hosts define the special address 127.0.0.1 as a self-referential address, also known
as the loopback address. The loopback address allows hosts to open network connections to themselves, which is actually more useful than it might seem; we can, for example, run a web browser on the same machine as a web server without sending packets onto the network. This address, which is usually
assigned to the special host name localhost, is also useful for testing networking applications when no
actual network is available.
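We can confirm this conventional mapping from Perl, using gethostbyname in scalar context (which returns a packed address) and Socket's inet_ntoa. On most systems this prints 127.0.0.1, though the exact result depends on the local resolver configuration:

```perl
use strict;
use warnings;
use Socket qw(inet_ntoa);

# resolve the conventional loopback host name
my $packed = gethostbyname('localhost')
    or die "cannot resolve localhost: $!";

print "localhost is ", inet_ntoa($packed), "\n";
```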
Networking with Perl
Sockets are Perl's interface into the networking layer, mapping the details of a network connection into something that looks and feels exactly like a filehandle. This analogy didn't start with Perl, of course, but came into popular usage with BSD UNIX. Perl merely continues that tradition, just as other languages do.
Sockets
There are two basic kinds of sockets: Internet Domain and UNIX domain sockets. Internet Domain sockets (hereafter referred to as INET sockets) are associated with an IP address, a port number, and a protocol, allowing you to establish network connections. UNIX domain sockets, on the other hand, appear as files in the local filing system, but act as a first-in, first-out channel and are used for communication between processes on the same machine.
Sockets have a type associated with them, which often determines the default protocol to be used. INET stream sockets use the TCP protocol, since TCP provides flow control, state, and session management. INET datagram sockets, on the other hand, would naturally use UDP. Information flows in the same way that it does with a regular filehandle, with data read and written in bytes and characters.
Some socket implementations already provide support for the IPv6 protocol or handle other protocols such as IPX, X.25, or AppleTalk. Depending on the protocol used, various different socket types may or may not be allowed. Other than stream and datagram sockets, a 'raw' socket type exists, which gives a direct interface to the IP protocol. Other supported socket types will be documented in the socket(2) man page on a UNIX-like system.
'Socket.pm'
Since Perl's socket support is a wrapper for the native C libraries (on UNIX, anyway), it can support any type your system can. Non-UNIX platforms (like Windows) may have varying or incomplete support for sockets, so check your Perl documentation to determine the extent of your support. At the very least, standard INET sockets should be supported on most platforms.
The socket functions are a very direct interface. One consequence of this is that the idiosyncrasies of the original C API poke through in a number of places, notably the large number of numeric values for the various arguments of the socket functions. In addition, the address information arguments required by functions like bind, connect, and send need to be in a packed sockaddr format that is acceptable to the underlying C library function. Fortunately, the Socket module provides constants for all of the socket arguments, so we won't have to memorize those numeric values. It also provides conversion utilities like inet_aton and inet_ntoa that convert string addresses into the packed form expected and returned by the C functions.
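As a minimal sketch of these conversion utilities (using the example address from the table below), a dotted-quad string packs down to four network-order bytes, and inet_ntoa reverses the conversion exactly:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# inet_aton packs a dotted-quad string (or hostname) into
# four bytes in network byte order
my $packed = inet_aton('209.165.161.91');
print length($packed), " bytes\n";    # prints "4 bytes"

# inet_ntoa converts the packed form back into a string
print inet_ntoa($packed), "\n";       # prints "209.165.161.91"
```

Passing a hostname instead of a dotted quad triggers a name lookup, so the call may block if the resolver has to go out on the network.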
A summary of Perl's socket functions is given below, some directly supported by Perl, the rest provided by Socket.pm. Each of them is explained in more detail later in the chapter.
Function Description
socket Create a new socket filehandle in the specified domain (Internet or UNIX) of the specified type (streaming, datagram, raw, etc.) and protocol (TCP, UDP, etc.). For example:
socket(ISOCK, PF_INET, SOCK_STREAM, $proto);
shutdown Close a socket and all filehandles that are associated with it. This differs from a simple close, which only closes the current process's given filehandle, not any copies held by child processes. In addition, a socket may be closed completely or converted to a read-only or write-only socket by shutting down one half of the connection. For example:
shutdown(ISOCK, 2);
socketpair Generate a pair of UNIX domain sockets linked back-to-back. This is a quick and easy way to create a full-duplex pipe (unlike the pipe function) and is covered in more detail in Chapter 22. For example:
socketpair(RSOCK, WSOCK, AF_UNIX, SOCK_STREAM, PF_UNSPEC);
Servers:
Function Description
bind Bind a newly created socket to a specified port and address. For example:
bind ISOCK, $packed_addr;
listen (Stream sockets only) Set up a queue for receiving incoming network connection requests on a bound socket. For example:
listen ISOCK, $qsize;
accept (Stream sockets only) Accept and create a new communications socket on a bound and listened-to server socket. For example:
accept CNNCT, ISOCK;
Clients:
Function Description
connect Connect a socket to a remote server at a given port and IP address, which must be bound and listening on the specified port and address. Note: while datagram sockets can't really connect, this is supported; for that type of socket it sets the default destination. For example:
connect ISOCK, $packed_addr;
Trang 37Options:
Function Description
getsockopt Retrieve a configuration option from a socket. For example:
$opt = getsockopt ISOCK, SOL_SOCKET, SO_DONTROUTE;
setsockopt Set a configuration option on a socket. For example:
setsockopt ISOCK, SOL_SOCKET, SO_REUSEADDR, 1;
IO:
Function Description
send Send a message. For UDP sockets this is the only way to send data, and an addressee must be supplied, unless a default destination has already been set by using the connect function. For TCP sockets no addressee is needed. For example:
send ISOCK, $message, 0;
recv Receive a message. For a UDP socket, this is the only way to receive data; on success it returns the sender's address (as a sockaddr structure). TCP sockets may also use recv. For example:
$sender = recv ISOCK, $message, 1024, 0;
Conversion Functions:
Function Description
inet_aton aton = ASCII to Network byte order.
Convert a hostname (for example, www.myserver.com) or a textual representation of an IP address (for example, 209.165.161.91) into a four-byte packed value for use with INET sockets. If a hostname is supplied, a name lookup, possibly involving a network request, is performed to find its IP address (this is a Perl extra, and not part of the C call). Used in conjunction with pack_sockaddr_in. For example:
my $ip = inet_aton($hostname);
inet_ntoa ntoa = Network byte order to ASCII.
Convert a four-byte packed value into a textual representation of an IP address, for example, 209.165.161.91. Used in conjunction with unpack_sockaddr_in.
Function Description
pack_sockaddr_in Generate a sockaddr_in structure suitable for use with the bind, connect, and send functions from a port number and a packed four-byte IP address. For example:
my $addr = pack_sockaddr_in($port, $ip);
The IP address can be generated with inet_aton.
unpack_sockaddr_in Extract the port number and packed four-byte IP address from the supplied sockaddr_in structure:
my ($port, $ip) = unpack_sockaddr_in($addr);
sockaddr_in Call either unpack_sockaddr_in or pack_sockaddr_in, depending on whether it is called in a list (unpack) or scalar (pack) context:
my $addr = sockaddr_in($port, $ip);
my ($port, $ip) = sockaddr_in($addr);
The dual nature of this function can lead to considerable confusion, so using it in only one direction, or using the pack and unpack versions explicitly, is recommended.
pack_sockaddr_un Convert a pathname into a sockaddr_un structure for use with UNIX domain sockets:
my $addr = pack_sockaddr_un($path);
unpack_sockaddr_un Convert a sockaddr_un structure into a pathname:
my $path = unpack_sockaddr_un($addr);
sockaddr_un Call either unpack_sockaddr_un or pack_sockaddr_un, depending on whether it is called in a list (unpack) or scalar (pack) context:
my $addr = sockaddr_un($path);
my ($path) = sockaddr_un($addr);
This function is even more confusing than sockaddr_in, since it only returns one value in either case. Using it for packing only, or using the pack and unpack versions explicitly, is definitely recommended.
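To stay on the safe side, here is a short sketch using the explicit pack and unpack forms for a sockaddr_in round trip (the port and address are illustrative values):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(pack_sockaddr_in unpack_sockaddr_in inet_aton inet_ntoa);

# Pack a port number and a packed IP into a sockaddr_in structure...
my $addr = pack_sockaddr_in(8080, inet_aton('192.168.0.1'));

# ...then recover both values again, exactly as they went in
my ($port, $ip) = unpack_sockaddr_in($addr);
print "$port ", inet_ntoa($ip), "\n";   # prints "8080 192.168.0.1"
```

Because each function only works in one direction, there is no context-dependent behavior to trip over.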
In addition to the utility functions, Socket supplies four symbols for special IP addresses in a packed format suitable for passing to pack_sockaddr_in:
INADDR_ANY Tells the socket that no specific address is requested for use; we just want to be able to talk directly to all local networks.
INADDR_BROADCAST Uses the generic broadcast address of 255.255.255.255, which transmits a broadcast on all local networks.
INADDR_LOOPBACK The loopback address for the local host, generally 127.0.0.1.
INADDR_NONE An address meaning 'invalid IP address' in certain operations. Usually 255.255.255.255. This is invalid for TCP, for example, since TCP does not permit broadcasts.
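Since each of these symbols is already in packed four-byte form, inet_ntoa will show their conventional dotted-quad values; a quick sketch:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(inet_ntoa INADDR_ANY INADDR_BROADCAST INADDR_LOOPBACK INADDR_NONE);

# Each constant is a packed address suitable for pack_sockaddr_in;
# converting with inet_ntoa reveals the underlying value
print "ANY:       ", inet_ntoa(INADDR_ANY), "\n";        # 0.0.0.0
print "BROADCAST: ", inet_ntoa(INADDR_BROADCAST), "\n";  # 255.255.255.255
print "LOOPBACK:  ", inet_ntoa(INADDR_LOOPBACK), "\n";   # 127.0.0.1
print "NONE:      ", inet_ntoa(INADDR_NONE), "\n";       # 255.255.255.255
```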
Opening Sockets
The procedure for opening sockets is the same for both UNIX and INET sockets; it only differs depending on whether you're opening a client or server connection. Server sockets typically follow three steps: creating the socket, initializing a socket data structure, and binding the socket and the structure together:
socket(USOCK, PF_UNIX, SOCK_STREAM, 0) || die "socket error: $!\n";
$sock_struct = sockaddr_un('/tmp/server.sock');
bind(USOCK, $sock_struct) || die "bind error: $!\n";
Of course, you may also need to insert a setsockopt call to fully configure the socket appropriately, but the above three steps are needed as a minimum. Opening client connections is similar: create the socket, create the packed remote address, and connect to the server:
socket(ISOCK, PF_INET, SOCK_STREAM, getprotobyname('tcp')) || die "socket error: $!\n";
$paddr = sockaddr_in($port, inet_aton($ip));
connect(ISOCK, $paddr) || die "connect error: $!\n";
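The same three server steps apply to an INET socket. A minimal sketch (the variable names are illustrative, and port 0 asks the kernel for any free ephemeral port, which getsockname can then reveal):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket;

# Step 1: create a TCP socket
socket(my $server, PF_INET, SOCK_STREAM, getprotobyname('tcp'))
    or die "socket error: $!\n";

# Step 2: build the packed address - port 0 on the loopback interface
my $addr = sockaddr_in(0, INADDR_LOOPBACK);

# Step 3: bind them together, then start listening
bind($server, $addr) or die "bind error: $!\n";
listen($server, SOMAXCONN) or die "listen error: $!\n";

# getsockname shows which port the kernel actually assigned
my ($port) = sockaddr_in(getsockname $server);
print "listening on port $port\n";
```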
Configuring Socket Options
The socket specification defines a number of options that may be set for sockets to alter their behavior. For direct manipulation of socket options, we can make use of Perl's built-in getsockopt and setsockopt functions, which provide low-level socket option handling capabilities. For the most part we rarely want to set socket options; however, there are a few that are worth dealing with from time to time. These options apply almost exclusively to INET sockets.
The getsockopt function takes three arguments: a socket filehandle, a protocol level, and the option to query. Protocol level refers to the level at which the desired option operates in the network stack; TCP, for instance, is a higher-level protocol than IP, and so has a higher value. We needn't bother with the particular values, though, since constants are available. setsockopt takes the same three arguments, but it can also include a value to set the option to. Leaving that argument out effectively unsets that option.
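As a sketch of the two functions working together, we can set SO_REUSEADDR on a freshly created socket and read it straight back; note that getsockopt returns the value in packed form, so it needs unpacking before inspection:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket;

socket(my $sock, PF_INET, SOCK_STREAM, getprotobyname('tcp'))
    or die "socket error: $!\n";

# Set the option...
setsockopt($sock, SOL_SOCKET, SO_REUSEADDR, 1)
    or die "setsockopt error: $!\n";

# ...then read it back; the result is a packed integer
my $packed = getsockopt($sock, SOL_SOCKET, SO_REUSEADDR);
my $value  = unpack 'i', $packed;
print "SO_REUSEADDR is ", ($value ? "on" : "off"), "\n";
```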
Options include:
Option Description
SO_ACCEPTCONN The socket is listening for connections
SO_BROADCAST The socket is enabled for broadcasts (used with UDP)
SO_DEBUG Debugging mode enabled
SO_LINGER Sockets with data remaining to send continue to exist after the process that created them exits, until all data has been sent.
SO_DONTROUTE The socket will not route. See also MSG_DONTROUTE in 'Reading from and Writing to Sockets' below.
SO_ERROR (Read-only) The socket is in an error condition
SO_KEEPALIVE Prevent TCP/IP closing the connection due to inactivity by maintaining a periodic exchange of low-level messages between the local and remote sockets.
SO_DONTLINGER Sockets disappear immediately, instead of continuing to exist until all
data has been sent
SO_OOBINLINE Allow 'out of band' data to be read with a regular recv – see 'Reading from and Writing to Sockets' below.
SO_RCVBUF The size of the receive buffer
SO_RCVTIMEO The timeout value for receive operations
SO_REUSEADDR Allow bound addresses to be reused immediately
SO_SNDBUF The size of the send buffer
SO_SNDTIMEO The timeout period for send operations
SO_TYPE The socket type (stream, datagram, etc.)
Of these, the most useful is SO_REUSEADDR, which allows a given port to be reused, even if another socket exists that's in a TIME_WAIT state. The common use for this is to allow an immediate server restart, even if the kernel hasn't finished its garbage collection and removed the last socket:
setsockopt SERVER, SOL_SOCKET, SO_REUSEADDR, 1;
We can be a little more creative with the commas to provide an arguably more legible statement:
setsockopt SERVER => SOL_SOCKET, SO_REUSEADDR => 1;
The SO_LINGER option is also useful, as is its opposite, SO_DONTLINGER. Usually the default, SO_DONTLINGER causes sockets to close immediately when a process exits, even if data remains in the buffer to be sent. Alternatively, you can specify SO_LINGER, which causes sockets to remain open as long as they have data still to send to a remote client:
setsockopt SERVER, SOL_SOCKET, SO_DONTLINGER, 1;