Professional Perl Programming (Wrox, 2001), Part 5



Opening Filehandles at the System Level

Opening files at the system level is handled by sysopen. This, like open, takes a filehandle name and a filename as arguments. Unlike open, sysopen does not take a mode string like > or +<, but a numeric mode made up of several mode flags whose values specify the desired attributes of the filehandle. The Fcntl module defines labels for these numeric flags, such as O_WRONLY for write-only access or O_CREAT to create the file if it does not exist. The mode cannot be combined with the filename as it is with open; instead it comes after the filename as an additional third parameter:

use Fcntl; # import standard symbols
sysopen SYSHANDLE, $filename, O_WRONLY | O_CREAT;

The standard open modes can all be expressed in terms of a sysopen mode value; < is equivalent to O_RDONLY, and > is equivalent to O_WRONLY|O_CREAT|O_TRUNC. The following two statements are actually identical, but phrased differently:

# open a file write-only with 'open'
open HANDLE, "> $filename";

# open a file write-only with 'sysopen'
sysopen HANDLE, $filename, O_WRONLY|O_CREAT|O_TRUNC;
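As a rough check of that equivalence, the following sketch writes a file with the > mode of open and then reopens it with sysopen and the flags O_WRONLY|O_CREAT|O_TRUNC, so that the old contents are discarded. The scratch directory from File::Temp, the file name, and the lexical filehandles are assumptions made for illustration; they are not part of the original example.

```perl
use strict;
use warnings;
use Fcntl;                      # O_WRONLY, O_CREAT, O_TRUNC
use File::Temp qw(tempdir);

# scratch file in a temporary directory (name chosen for illustration only)
my $dir      = tempdir(CLEANUP => 1);
my $filename = "$dir/demo.txt";

# write some text with the '>' mode of 'open'
open(my $out, '>', $filename) or die "open failed: $!\n";
print $out "first version, rather long\n";
close $out;

# reopen with the equivalent sysopen flags: the old contents are truncated
sysopen(my $sys, $filename, O_WRONLY|O_CREAT|O_TRUNC)
    or die "sysopen failed: $!\n";
print $sys "second\n";
close $sys;

my $size = -s $filename;
print "File is now $size bytes long\n";
```

If the flags behave like >, the reported size covers only the second, shorter line.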

Note that sysopen does not create a different sort of filehandle from open. In particular, it does not create an unbuffered handle. Whether or not the filehandle is buffered or unbuffered depends on how we read and write with it. Functions like read, getc, and the readline operator work via the standard IO buffers, while functions like sysread and syswrite bypass them. sysopen itself has no opinion on the use of buffers or not; it merely provides a lower-level way to create filehandles.

For the curious coming from a C background: while it is true that sysopen uses the open system call to generate an unbuffered file descriptor, it then uses fdopen to create a filehandle from that file descriptor and returns this to Perl. So sysopen does create a filehandle, even though it does not use the fopen library call.

For many applications we only need to supply three parameters to sysopen. (In fact, in many cases we can get away with two, because an open mode flag of 0 is usually equivalent to O_RDONLY. However, it is dangerous to assume this; always use the Fcntl symbols.) We can also supply a fourth optional parameter describing the permissions of the file in cases where the file is created. We will come to that in a moment.

Open Mode Flags

sysopen allows us to specify all manner of flags in the open mode, some generically useful, others very specific, and a few more than a little obscure. The main point of using sysopen rather than open is to gain access to these flags directly, allowing us to create open modes other than the six standard combinations supported by open. Some are relevant only to particular kinds of file, such as terminal devices. The following tables show the flags that are combined to make up the various modes.

Always specify one (and only one) of the primary modes:

O_RDONLY    Open file for reading only.
O_RDWR      Open file for reading and writing.
O_WRONLY    Open file for writing only (not supported on Windows).

These are additional modes:

O_APPEND    Open file for appending.
O_CREAT     Create file if it does not exist.
O_TRUNC     Truncate file on opening it (writing).

See the table in the 'Creating Filehandles with IO::File' section earlier in the chapter for a comparison of open and sysopen modes to see how these mode flags are combined to create the six modes of open. Some other useful flags we can only access with sysopen include the following:

Text and Binary files:

O_BINARY    Use binary mode (no newline translation).
O_TEXT      Use text mode (do newline translation).

Non-blocking IO:

O_NONBLOCK  Enable non-blocking mode.
O_NDELAY    Alias (usually) for O_NONBLOCK. Semantics may vary on platforms for
            filehandles that are associated with networking.

Additional Modes:

O_EXCL      Create file only if it does not already exist (meaningful only with
            O_CREAT). If it does exist, fail rather than open it.
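The effect of combining O_EXCL with O_CREAT can be sketched as follows. The helper name create_exclusively and the lock file name are hypothetical, chosen only for this illustration; the first attempt should create the file and the second should be refused because the file now exists.

```perl
use strict;
use warnings;
use Fcntl;                      # O_WRONLY, O_CREAT, O_EXCL
use File::Temp qw(tempdir);

my $dir      = tempdir(CLEANUP => 1);
my $lockfile = "$dir/demo.lock";    # hypothetical lock file name

# returns a filehandle if we created the file, or undef if it already existed
sub create_exclusively {
    my ($path) = @_;
    sysopen(my $fh, $path, O_WRONLY|O_CREAT|O_EXCL) or return;
    return $fh;
}

my $first  = create_exclusively($lockfile);
my $second = create_exclusively($lockfile);

print "First attempt:  ", ($first  ? "created" : "failed"), "\n";
print "Second attempt: ", ($second ? "created" : "refused, file exists"), "\n";

close $first if $first;
```

Because the existence check and the creation happen in one system call, this combination is commonly used for simple lock files.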

Non-blocking IO

One of the main reasons for using sysopen over open is for non-blocking IO. Normally when a read or write (including a system read or write performed by sysread or syswrite) is performed, the system will wait for the operation to complete. In the case of reading, Perl will wait for input to arrive and only return control to our application when it has something for us. Frequently, however, we do not want to wait because we want to do other things in the meantime, so we use sysopen and the O_NONBLOCK flag (although it should be noted that the O_NONBLOCK flag is not recognized by Windows at present):

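use Fcntl;

# open serial port read only, non-blocking
sysopen SERIAL, '/dev/ttyS0', O_RDONLY|O_NONBLOCK
    or die "Cannot open serial port: $!\n";

# attempt to read characters
my $key;
while (1) {
    # sysread returns the number of characters read, 0 at end of file,
    # and undef if nothing could be read
    my $count = sysread SERIAL, $key, 1;
    if ($count) {
        print "Got '$key'\n";
    } elsif (defined $count) {
        last;
    } else {
        warn "No input available\n";
        # wait before trying again
        sleep(1);
    }
}

# close the port
close SERIAL;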

When non-blocking mode is enabled, an attempt to read from a filehandle when no data is available fails and raises the EAGAIN error in $!. We can get the symbol for EAGAIN from the POSIX module, so a better way to write the above example would have been:

use POSIX qw(EAGAIN);

# ... open the port and start the read loop as before ...

    } else {
        if ($! == EAGAIN) {
            warn "No input available\n";
            # wait before trying again
            sleep(1);
        } else {
            warn "Error attempting to read: $!\n";
            last;
        }
    }
}

# close the port
close SERIAL;
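Since few machines have a serial device to hand, the same behavior can be tried out with a pipe. This is only a sketch for UNIX-like platforms: it uses the blocking method of IO::Handle to switch the read end to non-blocking mode, which is an assumption of this illustration rather than the O_NONBLOCK flag of sysopen used above, and it also checks EWOULDBLOCK, which some systems report instead of EAGAIN.

```perl
use strict;
use warnings;
use IO::Handle;                 # for the blocking() method
use POSIX qw(EAGAIN EWOULDBLOCK);

# a pipe gives us a data source that is empty until we write to it
pipe(my $reader, my $writer) or die "pipe failed: $!\n";
$reader->blocking(0);           # switch the read end to non-blocking mode

# nothing has been written yet, so the read fails instead of waiting
my $count = sysread $reader, my $buffer, 16;
my $would_block = !defined($count) && ($! == EAGAIN || $! == EWOULDBLOCK);
print "No input available\n" if $would_block;

# once data arrives, the same call succeeds
syswrite $writer, "hello";
$count = sysread $reader, $buffer, 16;
print "Got '$buffer' ($count characters)\n";
```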

In this case we have used sysread to read an individual character directly from the serial port. We could also have used read or even getc to do the same thing via the filehandle's buffers. This probably would have been better as the filehandle will read in several kilobytes of characters if it can, and then return them to us one by one. From our perspective there is no difference, but from the point of view of the serial port it makes a lot of difference.

The Permissions Mask

Since it works at a lower level than open, sysopen also allows us to specify a numeric permissions mask as a fourth argument, either in the conventional octal format, or as a combination of flags defined by the :mode import label of the Fcntl module:

use Fcntl qw(:DEFAULT :mode); # O_* flags plus the S_I* permission flags

# permissions mode, as octal integer
sysopen HANDLE, $filename, O_WRONLY|O_CREAT, 0644;

# permissions mode, as set of Fcntl flags:
sysopen HANDLE, $filename, O_WRONLY|O_CREAT, S_IRUSR;
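The mask we pass is a request, not a guarantee: on UNIX-like systems the process umask is applied to it before the file is created. The sketch below assumes such a platform; it sets the umask to 022 for the duration of the example, asks for read and write permission for everyone using the S_I* flags, and then inspects the result with stat. The file name is illustrative only.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :mode);   # O_* flags plus S_I* permission flags
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/perms.txt";    # illustrative file name

my $old_umask = umask 022;      # remove write permission for group and others

# ask for rw-rw-rw-; the umask reduces this to rw-r--r--
my $requested = S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH;
sysopen(my $fh, $file, O_WRONLY|O_CREAT, $requested)
    or die "sysopen failed: $!\n";
close $fh;
umask $old_umask;

# S_IMODE strips the file type bits from the mode returned by stat
my $mode = S_IMODE((stat $file)[2]);
printf "Permissions are %04o\n", $mode;
```

On a UNIX-like system this should report 0644, the same permissions as the octal example above.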

Using 'sysopen' via 'IO::File'

It is not actually necessary to use sysopen to make use of its features; the new method of IO::File automatically uses sysopen if we give it a numeric mode instead of a string:

# 'IO::File' open for read/write using 'open'

$fh = new IO::File ($filename, '+<');

# 'IO::File' open for read/write using 'sysopen'

$fh = new IO::File ($filename, O_RDWR);

Since new can automatically detect a numeric mode flag and pass it to sysopen instead of open, we can also pass in the permissions mask too:

# 'IO::File' open for read/write using 'sysopen' with permissions mask

$fh = new IO::File ($filename, O_RDWR, 0644);

The advantage of IO::File is, of course, that it makes filehandles much easier to manipulate and to pass in and out of subroutines. This makes it a good choice for programming regardless of whether we choose to use open or sysopen underneath.
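A minimal sketch of this, assuming a scratch file and a hypothetical write_greeting subroutine, might look like the following. The numeric mode means that IO::File opens the file with sysopen, and the resulting object is passed to a subroutine like any other scalar.

```perl
use strict;
use warnings;
use Fcntl;
use IO::File;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/iofile.txt";   # illustrative file name

# numeric mode, so IO::File uses sysopen underneath
my $fh = IO::File->new($file, O_RDWR|O_CREAT, 0644)
    or die "Cannot open $file: $!\n";

# a subroutine can take the handle like any other scalar
sub write_greeting {
    my ($handle, $name) = @_;
    print $handle "Hello, $name\n";
}
write_greeting($fh, 'world');

# rewind and read the line back through the same handle
$fh->seek(0, 0);
my $line = $fh->getline;
$fh->close;
print "Read back: $line";
```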

Unbuffered Reading

Reading and writing at the standard IO level are handled by Perl functions such as read and print. The unbuffered system-level equivalents are sysread and syswrite.

sysread looks suspiciously similar to read at first glance. Like read, it takes a filehandle to read from, a scalar variable to store the result in, a length giving the amount of data to read (or attempt to read), and an optional offset:

use Fcntl;

die "Usage: $0 file\n" unless @ARGV;

sysopen HANDLE, $ARGV[0], O_RDONLY|O_NONBLOCK
    or die "Cannot open $ARGV[0]: $!\n";

# read 20 chrs into $result
my $result;
my $chrs = sysread HANDLE, $result, 20;

if ($chrs == 20) {
    # got all 20, try to read another 30 chrs into $result after the first 20
    $chrs += sysread HANDLE, $result, 30, 20;
    print "Got '$result'\n";
    if ($chrs < 50) {
        print "Data source exhausted after $chrs characters\n";
    } else {
        print "Read $chrs characters\n";
    }
} elsif ($chrs > 0) {
    print "Got '$result'\n";
    print "Data source exhausted after $chrs characters\n";
} else {
    print "No data!\n";
}

The return value from sysread is the number of characters successfully read. This may be less than the number requested if the data source runs out, and 0 if there is no data to read. However, note that if O_NONBLOCK is not set, then sysread will wait for more data to arrive rather than returning 0. If there is some data but not enough to satisfy the request then sysread will return when it exhausts the data source. As stated before, Windows does not recognize O_NONBLOCK, so this example will not work properly on that platform.

In fact, the above example is more of an example of how to use sysopen (to get a non-blocking filehandle) than it is of how to use sysread, as we could just as easily have used read in this example, and with the same effect. The difference between the two is that read would read as much data as possible in the first call, and the second call would only cause a read of the file if the first failed to retrieve 50 characters. The fact that it only returns 20 to us is irrelevant; buffering stores up the rest until we need them. Of course we might not want to buffer the data; we may want to share the filehandle between different processes instead. In cases like that, we would use sysread.

There is no system-level definition of the end-of-file condition, but we can do the equivalent by checking for a zero return from sysread instead.
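A read loop built on that zero return might be sketched like this. The scratch file, the chunk size of 1024, and the variable names are assumptions for illustration; the offset argument is used to append each chunk to the end of the data already read.

```perl
use strict;
use warnings;
use Fcntl;
use File::Temp qw(tempfile);

# create a small scratch file to read back
my ($tmp, $filename) = tempfile(UNLINK => 1);
print $tmp "x" x 2500;
close $tmp;

sysopen(my $in, $filename, O_RDONLY) or die "sysopen failed: $!\n";

my ($data, $total, $reads) = ('', 0, 0);
while (1) {
    # append each chunk at the current end of $data
    my $count = sysread $in, $data, 1024, length $data;
    die "Read error: $!\n" unless defined $count;
    last if $count == 0;        # zero means end of file
    $total += $count;
    $reads++;
}
close $in;

print "Read $total characters in $reads calls\n";
```

Note that an undefined return is treated separately from zero: undef signals an error, whereas zero signals that the data source is exhausted.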

Unbuffered Writing

The counterpart to sysread is syswrite, which writes data directly to the filehandle rather than into the filehandle's buffer. This is very useful in all kinds of applications, especially those that involve sending short bursts of information between different processes.

syswrite takes a filehandle, some data, a length, and an optional offset as parameters. It then writes data from the string to the filehandle up to the end of the text contained in the scalar, or the value supplied as the length, whichever is shorter. If an offset is supplied, syswrite starts writing data from that character position. For example, this code snippet writes out the contents of the scalar $data to HANDLE (presumably a serial or network connection) at a rate of five hundred characters per second:

$pos = 0; $span = 500; $length = length($data);

while ($pos < $length) {
    syswrite HANDLE, $data, $span, $pos;
    # move on to the next span and wait a second before sending it
    $pos += $span;
    sleep(1);
}

Unlike sysread, the length is also an optional parameter. If omitted, the length of the supplied data is used instead, making syswrite a close analogue for an unbuffered print, only without the ability to accept a list of arguments. We can invent a sysprint that will work like print using syswrite though:

# an unbuffered print-a-like

sub sysprint {

# check for a leading filehandle and remove it if present

my $fh = (defined fileno($_[0])) ? shift : *STDOUT;

# use $, to join arguments, just like print
my $joiner = defined($,) ? $, : '';

syswrite $fh, join($joiner, @_);

}

sysprint(*STDOUT, "This ", "works ", "like ", "print ", "(sort of) ", "\n");

See the section on pipes in Chapter 22 for another example where sysread and syswrite are useful to avoid deadlocks between communicating processes.
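One detail worth keeping in mind is that syswrite returns the number of characters actually written, which on pipes and sockets can be fewer than we asked for. A loop that keeps writing from the point reached so far can be sketched as follows; write_all is a hypothetical helper invented for this illustration, and a scratch file stands in for the real connection.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# write all of $data to $fh, retrying until nothing is left
# (write_all is a hypothetical helper, not a Perl built-in)
sub write_all {
    my ($fh, $data) = @_;
    my $length = length $data;
    my $pos    = 0;
    while ($pos < $length) {
        my $written = syswrite $fh, $data, $length - $pos, $pos;
        return unless defined $written;     # error is left in $!
        $pos += $written;
    }
    return $pos;
}

my ($out, $filename) = tempfile(UNLINK => 1);
my $sent = write_all($out, "a" x 10_000) or die "Write failed: $!\n";
close $out;

my $size = -s $filename;
print "Wrote $sent characters, file size is $size\n";
```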

System-Level File Positioning

The system-level equivalent of the seek and tell functions is sysseek, which carries out both roles. Like seek, it takes a filehandle, a position, and a whence flag that is set to either 0, 1, or 2, or the Fcntl equivalent symbols SEEK_SET, SEEK_CUR, and SEEK_END:

# seek using whence numbers

sysseek HANDLE, 0, 0; # rewind to start

sysseek HANDLE, 0, 2; # seek to end of file

sysseek HANDLE, 20, 1; # seek forward 20 characters

# seek using Fcntl symbols

use Fcntl qw(:seek);

sysseek HANDLE, 0, SEEK_SET; # rewind to start

sysseek HANDLE, 0, SEEK_END; # seek to end of file

sysseek HANDLE, 20, SEEK_CUR; # seek forward 20 characters

The new file position is returned by sysseek. To find the current position using sysseek we can simply seek to it using a whence flag of SEEK_CUR and a position of zero:

use Fcntl qw(:seek);

$pos = sysseek HANDLE, 0, SEEK_CUR;

Apart from the difference caused by buffering, sysseek is identical in operation to seek. However, tell and sysseek can (and often do) return radically different values for the file position. This is because the position returned by tell is determined by the amount of data read by our application, whereas the position returned by sysseek is determined by the amount of data read by Perl, which includes the data buffered by the filehandle. We can calculate the amount of data currently buffered by taking the difference between the two values:

print "There are ", sysseek(HANDLE, 0, 1) - tell(HANDLE), " bytes in the buffer\n";
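To see the two positions diverge, we can read a single line through the buffered interface and then ask both functions where they think we are. This sketch assumes a small scratch file that fits entirely in the filehandle's buffer; the exact figures depend on the platform and on Perl's buffer size, so none are quoted here.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);
use File::Temp qw(tempfile);

# scratch file with ten short lines
my ($tmp, $filename) = tempfile(UNLINK => 1);
print $tmp "line $_\n" for 1 .. 10;
close $tmp;

open(my $in, '<', $filename) or die "open failed: $!\n";
my $first = <$in>;              # buffered read of a single line

my $app_pos = tell($in);                    # what our program has consumed
my $sys_pos = sysseek($in, 0, SEEK_CUR);    # how far Perl has really read
print "tell says $app_pos, sysseek says ", $sys_pos + 0, "\n";
print "There are ", $sys_pos - $app_pos, " bytes in the buffer\n";
close $in;
```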


No discussion of system-level IO would be entirely complete without a brief look at the fcntl and ioctl functions. These provide very low-level access to filehandles, retrieving and setting parameters that are often otherwise inaccessible to us. fcntl is more generic, and works across most kinds of filehandle. ioctl is targeted at special files, such as character and block devices, and is much more UNIX-specific.

Returned values, if any, are placed into a passed scalar variable; the return value is used to indicate success or failure only. Both functions return undef on failure and set the error in $!. Otherwise they either return a positive value (if the underlying system call returns one), or the special return value '0 but true', which returns 0 in a numeric context but tests true otherwise. To get the original underlying return value (which is -1 for failure, 0 or a positive value for success), we can write:

# calculate original numeric return value from system 'fcntl'

$result = int (fcntl(HANDLE, $action, $value) || -1);
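The behavior of the '0 but true' value can be tried out without calling fcntl at all. In the sketch below a literal string stands in for the return value of a successful call, and an undefined variable stands in for a failed one; both stand-ins are assumptions made purely for illustration.

```perl
use strict;
use warnings;

# stand-in for a successful fcntl or ioctl call whose system call returned 0
my $returned = '0 but true';
# stand-in for a failed call
my $failure;

my $is_success = $returned ? 1 : 0;     # true in boolean context
my $as_number  = $returned + 0;         # 0 in numeric context, with no warning

# the idiom from the text: map undef (failure) to -1, anything else to a number
my $original = int($returned || -1);
my $failed   = int($failure  || -1);

print "success=$is_success number=$as_number original=$original failed=$failed\n";
# prints "success=1 number=0 original=0 failed=-1"
```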

Both functions will raise a fatal error if used on a platform that does not support the underlying system calls. On Windows they are not fatal: they return 0, but do not actually do anything.

Setting Filehandle Attributes with 'fcntl'

The fcntl function (not to be confused with the Fcntl module) performs miscellaneous actions on a filehandle. It takes three parameters: a filehandle, an action to perform, and a value. For actions that retrieve information this value must be a scalar variable, into which fcntl places the results of the call. The names of the actions are defined, appropriately enough, in the Fcntl module. For example, we can use fcntl to set a filehandle into non-blocking mode after we have opened it:

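use Fcntl; # F_GETFL, F_SETFL, and O_NONBLOCK are exported by default

# get the current open mode flags (F_GETFL reports them as its return value)
my $mode = fcntl(HANDLE, F_GETFL, 0)
    or die "Cannot get flags: $!\n";

# add O_NONBLOCK and set
fcntl(HANDLE, F_SETFL, $mode | O_NONBLOCK)
    or die "Cannot set flags: $!\n";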
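Wrapped up as a subroutine and tried on a pipe, the same pair of calls might look like the sketch below. It assumes a UNIX-like platform, and set_nonblocking is a hypothetical helper name; the flags are read back afterwards to confirm that O_NONBLOCK has been added.

```perl
use strict;
use warnings;
use Fcntl;      # F_GETFL, F_SETFL, O_NONBLOCK

# set_nonblocking is a hypothetical helper wrapping the two fcntl calls
sub set_nonblocking {
    my ($fh) = @_;
    my $flags = fcntl($fh, F_GETFL, 0);
    return unless defined $flags;           # undef means the call failed
    return fcntl($fh, F_SETFL, $flags | O_NONBLOCK);
}

pipe(my $reader, my $writer) or die "pipe failed: $!\n";

my $before = fcntl($reader, F_GETFL, 0) & O_NONBLOCK;
set_nonblocking($reader) or die "Cannot set non-blocking mode: $!\n";
my $after  = fcntl($reader, F_GETFL, 0) & O_NONBLOCK;

print "O_NONBLOCK was ", ($before ? "on" : "off"),
      ", is now ",       ($after  ? "on" : "off"), "\n";
```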

A reasonably complete list of actions (some platform-specific) supported by fcntl follows. Consult the system documentation for details of which actions fcntl supports on a particular platform; it may vary from the list below.

This action duplicates the underlying file descriptor:

F_DUPFD     Duplicate the file descriptor, returning the new descriptor. A low-level
            version of open's & mode.

The following actions get or set the close-on-exec flag. This flag determines whether the filehandle survives across an exec call. Normally STDIN, STDOUT, and STDERR survive and other filehandles are closed. The threshold for new filehandles can be set with the special variable $^F. The F_SETFD action allows the flag of an individual, and already extant, filehandle to be modified.

F_GETFD     Read the close-on-exec flag. File descriptors with this flag set are closed
            across a call to exec.
F_SETFD     Set the close-on-exec flag. For example, to preserve a filehandle across
            exec we would use:

            fcntl HANDLE, F_SETFD, 0;
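The two actions can be combined to inspect the flag and then clear it, as in this sketch for UNIX-like platforms. It relies on the FD_CLOEXEC constant from the Fcntl module, which names the close-on-exec bit; the scratch file from File::Temp is an assumption made for illustration.

```perl
use strict;
use warnings;
use Fcntl;      # F_GETFD, F_SETFD, FD_CLOEXEC
use File::Temp qw(tempfile);

my ($fh, $filename) = tempfile(UNLINK => 1);

# descriptors numbered above $^F get the close-on-exec flag when opened
my $flags = fcntl($fh, F_GETFD, 0);
defined $flags or die "fcntl F_GETFD failed: $!\n";
my $was_set = ($flags & FD_CLOEXEC) ? 1 : 0;

# clear the flag so that the handle would survive an exec
fcntl($fh, F_SETFD, $flags & ~FD_CLOEXEC)
    or die "fcntl F_SETFD failed: $!\n";
my $now_set = (fcntl($fh, F_GETFD, 0) & FD_CLOEXEC) ? 1 : 0;

print "close-on-exec before: $was_set, after: $now_set\n";
```

Masking out only the FD_CLOEXEC bit, rather than writing a literal 0, leaves any other descriptor flags the platform may define untouched.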

The following actions get and set the mode flags of the filehandle, as specified by open modes like > or sysopen mode flags like O_RDONLY:

F_GETFL     Get the open mode flags, as set by open or sysopen. A combination of flags
            such as O_RDONLY, O_CREAT, O_APPEND, and so on.
F_SETFL     Set the open mode flags. Usually only the flags O_APPEND, O_ASYNC
            (Linux/BSD), and O_NONBLOCK can be set; others are unaffected. See above
            for an example.

The following actions, which handle discretionary file locking, are similar to but not the same as the flock system call (unless flock is implemented in terms of fcntl, which is sometimes the case). For most purposes, flock is a lot simpler and more portable, but may not work on network-mounted filesystems (for example, via NFS). See the flock discussion earlier in the chapter for a more detailed comparison of the two approaches (and how Perl's own flock relates to them).

F_GETLK     Determine if a file is locked or not. The parameter needs to be a scalar
            variable, into which details of the lock are written. The l_type field is
            set to F_UNLCK if no lock is present.
F_SETLK     Set a lock, returning immediately with undef on failure. The parameter
            needs to be a packed flock structure containing the lock details. $! is
            set to the reason for the failure if the lock attempt fails.
F_SETLKW    Set a lock, waiting for the file to become available if necessary. Returns
            undef if interrupted.

The lock type can be one of F_RDLCK, F_WRLCK, or F_UNLCK, which have the obvious meanings. There is no nonblock flag as there is for flock because that function is handled by F_SETLK and F_SETLKW. However, the values passed and returned to these actions are packed lock structures (not simple values), so we need to use pack and unpack to create arguments that are suitable for fcntl when using these actions. Here is a short script that implements a generic locking subroutine, three specific lock subroutines that use it, and a quick demonstration of using them:

use Fcntl;

# generic lock subroutine
sub _do_lock {
    my ($locktype, $fh, $block) = @_;
    $block |= 0; # don't block unless asked to

    # is this a blocking or non-blocking attempt
    my $op = $block ? F_SETLKW : F_SETLK;

    # pack a structure suitable for this operation
    # (the layout of the flock structure is platform-specific)
    my $lock = pack('s s l l s', $locktype, 0, 0, 0, 0);

    # establish the chosen lock in the chosen way
    my $res = fcntl($fh, $op, $lock);
    return $res;
}

# specific lock types
sub read_lock {return _do_lock(F_RDLCK, @_);}
sub write_lock {return _do_lock(F_WRLCK, @_);}
sub undo_lock {return _do_lock(F_UNLCK, @_);}

# called like this:
open MYHANDLE, "+> myfile" or die "Failed to open: $!\n";

# block write lock
write_lock(*MYHANDLE, 1) or die "Failed to lock: $!\n";

print MYHANDLE "Only I can write here\n";

# undo (can't block anyway)
undo_lock(*MYHANDLE) or die "Failed to unlock: $!\n";

close MYHANDLE;

If (assuming the platform supports it) the O_ASYNC mode flag is specified in sysopen (or enabled using fcntl and F_SETFL), then signals are generated by open file descriptors whenever reading or writing becomes possible, which allows us to write asynchronous IO routines that respond to events. These actions allow the target and type of signal generated to be configured:

F_GETOWN    Get the process id (or process group) that is receiving signals (SIGIO or
            SIGURG) on this file descriptor, if the O_ASYNC mode flag is enabled. By
            default the process id is that of the process that opened the filehandle,
            that is, us.
F_SETOWN    Set the process id (or process group) that will receive signals on this
            file descriptor, if O_ASYNC is enabled.
F_GETSIG    (Linux only) Get the signal type generated by filehandles with O_ASYNC
            enabled. The default of zero generates a SIGIO signal, as does a setting
            of SIGIO.
F_SETSIG    (Linux only) Set the signal type generated by filehandles with O_ASYNC
            enabled. The POSIX module also defines symbols for the various signal
            names.

Controlling Devices with 'ioctl'

The ioctl function closely resembles fcntl, both in syntax and in operation. Like fcntl, it takes a filehandle, an action, and a value (which in the case of retrieval actions, must be a scalar variable). It also returns '0 but true' on success and undef on failure, setting $! to the reason if so. Also like fcntl, attempting to use ioctl on a platform that does not support it will cause a fatal error.

ioctl is an interface for controlling filehandles that are associated with devices, such as serial ports, terminals, CD-ROM drives, and so on. ioctl is a very low-level tool for analyzing and programming the device underlying a filehandle, and it is relatively rarely that we need to use it. Most of the useful ioctl actions are already encapsulated into more convenient modules or are, at the least, handled more elegantly by the POSIX module. However, in a few cases it can be useful, so long as we realize that it is not very portable (many platforms do not support it) and that higher-level and more portable solutions are generally preferable.

The different actions supported by ioctl can be considerable (as well as highly platform-dependent) since different device categories may support their own particular family of ioctl actions: serial ports have one set, terminals have another, and so on. When Perl is built, it analyzes the underlying system and attempts to compile a list of constant-defining functions, each one corresponding to the equivalent C header file. The most common symbols are placed in sys/ioctl.ph. Other ioctl symbols may be defined in different header files; on a Linux system, CD-ROM ioctls are defined in the header file linux/cdrom.ph. Here's how we can eject a CD on a Linux box:

open CDROM, '/dev/cdrom';

ioctl CDROM, 0x5309, 1; # the ioctl number for CDROMEJECT

# ioctl CDROM, &CDROMEJECT, 1;

close CDROM;

For a complete list of ioctl actions consult the system documentation for the device type in question; Linux defines ioctls in the manual page ioctl_list. Serial port definitions can also be found in the header files compiled by Perl: /usr/lib/perl5/5.6.0/<platform-type>/bits/ioctls.ph for the actions, and ioctl_types.ph for flags such as TIOCM_RTS and TIOCM_CD, in a standard UNIX Perl 5.6 installation. Terminal definitions are covered by the POSIX::Termios routines and documented in the POSIX module manual page.

Note that many of the more common actions performed by ioctl are better handled elsewhere (by several of the standard Perl library modules covered in this chapter). In Chapter 15, the POSIX getattr and setattr routines and the POSIX::Termios module are covered.

POSIX IO

The POSIX module provides a direct interface to the standard C library upon which Perl is built, including all of the file-handling routines. Most of the time we never need to bother with these since Perl already supports most of them in its own file functions. However, on occasion, the POSIX calls can come in useful, so we will quickly detail what is available.

The POSIX module provides three main categories of routines that relate to file handling and IO. The first works on filehandles, and the majority of these are identical in every respect to the standard Perl file functions. The second works on file descriptors, and is the basis of the system-level IO functions. The third is specifically aimed at talking to terminals, and we discuss it further in Chapter 15.

Intuitively we might expect Perl's sys file functions to return and operate on file descriptors (not filehandles), since they are after all supposed to be 'system' level. In fact they do, by using fileno to determine the underlying file descriptor and then using the appropriate system-level POSIX call. So, even though we have a filehandle complete with buffers, we may never actually use them. The advantage of this approach is that we can use filehandles and still carry out unbuffered IO without ever having to worry about file descriptors unless we really want to.

POSIX Filehandle Routines

The POSIX module provides interfaces to the standard filehandle routines as a convenience to programmers migrating from backgrounds such as C, who are used to routines called fopen, fflush, fstat, fgetpos, and so on. In actual fact all of these routines either map directly onto Perl's built-in functions (POSIX::getc simply calls CORE::getc for example), or onto methods in the IO::File (fopen goes to IO::File::open), IO::Handle (ungetc goes to IO::Handle::ungetc), or IO::Seekable (ftell goes to IO::Seekable::tell) modules.

In short, there is really no reason to use these functions, and we are almost certainly better off using Perl's built-in functions and the IO:: family of modules. However, for those who have a lot of experience with the POSIX library calls and are interested in a quick port with minimal fuss, the POSIX module does provide their equivalents in Perl.

POSIX File Descriptor Routines

The POSIX routines that operate on file descriptors are summarized below. Note that we can use fdopen to create a filehandle from a file descriptor that was created using open or creat. We can use fileno to get a file descriptor from a filehandle for use in these routines:

close fd                  Close a file descriptor created by open or creat.
creat file, perm          Create a file with an open mode of O_WRONLY|O_CREAT|O_TRUNC.
                          Takes a permissions mask as a second parameter. Shorthand
                          for open.
fdopen fd                 Create a filehandle from a file descriptor; equivalent to
                          the open mode &=<fd>.
fstat fd                  Return 'stat' information for the file descriptor. The
                          returned list of values is identical to that returned by
                          Perl's stat.
dup fd                    Duplicate an existing file descriptor, returning the number
                          of the new file descriptor. Equivalent to the open mode
                          &<fd>, except it returns a file descriptor.
dup2 oldfd, newfd         Make newfd a duplicate of oldfd, closing newfd first if it
                          is currently open. No direct Perl equivalent.
open file, mode, perm     Open a file descriptor. Identical to sysopen except that it
                          returns a file descriptor, not a filehandle. sysopen can be
                          simulated by following open with fdopen on the generated
                          file descriptor.
pipe                      Create a pair of file descriptors connected to either end
                          of a unidirectional pipe. Returns a list of two
                          descriptors; the first is read-only, the second is
                          write-only. Identical to Perl's pipe except that it returns
                          file descriptors, not filehandles.
read fd, $buf, length     Read from a file descriptor. Identical to Perl's sysread
                          except that it uses a file descriptor and not a filehandle.
write fd, $buf, length    Write to a file descriptor. Identical to Perl's syswrite
                          except that it uses a file descriptor and not a filehandle.

Note that importing these functions into our own applications can cause problems, since many of them have the same name as a Perl counterpart that uses filehandles rather than file descriptors. For that reason, these routines are better called through their namespace prefix:

$fd = POSIX::open($path, O_RDWR|O_APPEND|O_EXCL, 0644);
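A short round trip through the descriptor routines, all called through the POSIX:: prefix, might be sketched as follows. The scratch file is an assumption for illustration, and POSIX::lseek, which is not listed in the table above, is used to return to the start of the file before reading.

```perl
use strict;
use warnings;
use Fcntl;                      # O_* flags for POSIX::open
use POSIX ();                   # import nothing; call everything as POSIX::name
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/posix.txt";    # illustrative file name

# POSIX::open returns a file descriptor number, not a filehandle
my $fd = POSIX::open($path, O_RDWR|O_CREAT, 0644);
defined $fd or die "POSIX::open failed: $!\n";

my $message = "descriptor IO\n";
my $written = POSIX::write($fd, $message, length $message);
defined $written or die "POSIX::write failed: $!\n";

# move back to the start and read the data in again
POSIX::lseek($fd, 0, 0);
my $buffer;
my $count = POSIX::read($fd, $buffer, 100);
defined $count or die "POSIX::read failed: $!\n";
POSIX::close($fd);

print "Wrote $written bytes, read back $count: $buffer";
```

Loading POSIX with an empty import list keeps its open, read, write, and close from clashing with Perl's built-in functions of the same names.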

Moving between POSIX and Standard IO

Occasionally we might want to work with both a POSIX file descriptor and a filehandle for the same file. For example, we may want to make use of functions in the POSIX library or third party C libraries that expect file descriptors as arguments. This takes a lot of care and attention to pull off, because as we remarked when we started, mixing buffered and unbuffered operations can corrupt data and confuse the file position. However, if we really want to do this we can convert between filehandles and file descriptors.

Generating a filehandle from a file descriptor involves wrapping it in a stdio file structure containing a pair of buffers. In C this is done by the fdopen library call. In Perl we can do the same thing with the special &= open mode, which takes a file descriptor as an argument:

# wrap a file descriptor in a new filehandle

open HANDLE, "&= $descriptor";

Much the same thing happens implicitly when we duplicate a filehandle with open. The & mode is a shorthand for extracting the file descriptor of a filehandle and then creating a new filehandle structure around it:

# duplicate a filehandle the quick way

open NEWOUT, "& STDOUT";

# duplicate a filehandle the explicit way

$stout = fileno STDOUT;

open NEWOUT, "& $stout";

Note that there is a difference between & and &=. &= associates a new filehandle with an existing file descriptor. Closing any filehandle associated with that descriptor closes all of them. & creates a new file descriptor that is associated with the same file, but is nonetheless a different descriptor. The file descriptors and their associated filehandles share file positions, but can be closed independently of each other. See the section on open at the start of the chapter for more on these special modes.

Extracting the file descriptor from a filehandle is trivial; we just use the fileno function:

$descriptor = fileno HANDLE;

We can also use fileno to find out if two filehandles share a descriptor (as handles associated with &= do), since they will have the same file descriptor:

if (fileno(HANDLE1) == fileno(HANDLE2)) {
    print "Handles are duplicated\n";
}
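The difference between the two modes shows up directly in the descriptor numbers. This sketch uses the three-argument form of open with lexical filehandles and a scratch file, which are assumptions of the illustration rather than the two-argument style used above; '>&' should yield a new descriptor, whereas '>&=' should reuse the original one.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($orig, $filename) = tempfile(UNLINK => 1);

# '&' duplicates: a new descriptor that refers to the same open file
open(my $dup, '>&', $orig) or die "Cannot duplicate: $!\n";

# '&=' aliases: a new filehandle wrapped around the same descriptor
open(my $alias, '>&=', $orig) or die "Cannot alias: $!\n";

my ($fd_orig, $fd_dup, $fd_alias) = map { fileno $_ } $orig, $dup, $alias;
print "original=$fd_orig duplicate=$fd_dup alias=$fd_alias\n";

print "The alias shares the original descriptor\n" if $fd_alias == $fd_orig;
print "The duplicate has its own descriptor\n"     if $fd_dup   != $fd_orig;

# the duplicate can be closed without affecting the original
close $dup;
print $orig "still open\n" or die "Original handle was closed: $!\n";
close $orig;
```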

Summary

In this chapter we looked at using filehandles as a means of communication between our Perl programs and external data sources. We saw that Perl provides us with a rich suite of functions for manipulating filehandles. We examined various ways of creating, referring to, reading, and writing to filehandles. As well as this, we also investigated changing the default output filehandle, duplicating and aliasing filehandles, redirecting filehandles, and caching many filehandles.

We looked into the extra control that we gain when manipulating filehandles at the system level, including performing unbuffered reading and writing.

Finally, we examined the fcntl and ioctl functions, and the POSIX module, and discussed when we might need to use them and how to do that.

Manipulating Files and Directories

Files and Filenames

There are plenty of applications involving files that do not necessarily involve opening a filehandle. Obvious examples include copying, moving or renaming files, and interrogating files for their size, permissions or ownership. In this chapter we cover testing files for different properties such as their type (file, directory, and so on) and accessibility (can we read and/or write it?). We also delve deeper into file attributes with the stat and lstat functions, and take a look at file globbing, which is an interface to the wildcard file-naming features usually supplied by shells.

In addition to file globbing and interrogation, the Perl standard library supplies a toolkit of modules for copying, comparing, and processing files that are written to work portably and produce the correct results regardless of the underlying platform. Modules that fall into this group include File::Copy, File::Compare, and File::CheckTree.

We also take a look at creating and using temporary files, both privately and as a mechanism for sharing transient data between different programs. Having covered files, we then move to manipulating directories.

Let's start first by examining how Perl allows us to extract user and group information, which is useful for managing files.

Getting User and Group Information

Perl provides built-in support for handling user and group information on UNIX platforms through the

getpwent and getgrent families of functions This support is principally derived from the underlying

C library functions of the same names, which are in turn dependent on the details of the

implementation provided by the operating system All UNIX platforms provide broadly the samefeatures for user and group management in terms of user and group names and ids, but they varyslightly in what additional information they store Perl makes a reasonable attempt to unify and handleall the variations, but the system documentation is the best source of information on what values thesefunctions return

UNIX platforms define user and group information in the /etc/passwd and /etc/group files, but this oversimplifies the actual process of looking up user and group information for two reasons. First, if a shadow password file is in use then the user information in /etc/passwd will not contain an encrypted password in the password field. Second, if alternative sources of user and group information are configured (such as NIS or NIS+), then requesting user or group information may additionally (or alternatively) initiate a network lookup to retrieve information from a remote server. On most UNIX platforms the order in which local and remote information sources are consulted is defined by the file /etc/nsswitch.conf.

Support for other security models and platforms is not provided through built-in functions, but is available through extension modules. Windows NT programmers, for example, can make use of the Win32::AdminMisc module to gain access to the Win32 Security API. Windows and other non-UNIX platforms do not support getpwent or getgrent, though the Cygwin port does provide a veneer of UNIX security that allows these functions to work on Windows platforms with limited functionality. Access Control Lists (ACLs) and other advanced security features are beyond the reach of the built-in functions even on UNIX platforms, but they can be handled via various modules available from CPAN; modules exist for most common security solutions.

User Information

UNIX platforms store local user information in the /etc/passwd file (though as noted above they may also retrieve information remotely). The format varies slightly, but typically has a structure like this:

fred:RGdmsaynFgP56:301:200:Fred A:/home/fred:/bin/bash
jim:Edkl1y7NMtO/M:302:200:Jim B:/home/jim:/bin/ksh
mysql:!!:120:120:MySQL server:/var/lib/mysql:/bin/csh

Each line follows the same format, and contains the following fields: name, password, user id, primary group id, comment/GECOS, home directory, and login shell. In this case we are not using a shadow password file, so the password field contains an encrypted password. The first two lines are for regular users, while the third defines an identity for a MySQL database server to run as. It doesn't want or need a password since it is not intended as a login user, so the password is disabled with !! (* is often also used for this purpose).

getpwent (pwent is short for 'password entry') retrieves one entry from the user information file at a time, starting from the first. In list context it returns no fewer than ten fields:

($name, $passwd, $uid, $gid, $quota, $comment, $gcos, $dir, $shell, $expire) = getpwent;

Since the format and source of user information varies, not all these fields are always defined, and some of them have alternative meanings. A summary of each field and its possible meanings is given in the table below; consult the manual page for the passwd file (typically via man 5 passwd) for exact details of what fields are provided on a given platform.


Field Name    Number    Meaning

name       0    The login name of the user.

passwd     1    The encrypted password. Depending on the platform, the password may be encrypted using the standard UNIX crypt function, or the more secure MD5 hashing algorithm. If a shadow password file is in use, this field returns an asterisk. Additionally, disabled accounts often prefix passwords with ! to disable them.

uid        2    The user id of this user.

gid        3    The primary group of this user. Other groups can be found using the group functions detailed later.

quota      4    The disk space quota allotted to this user. Frequently unsupported. On some systems this may be a change or age field instead.

comment    5    A comment, usually the user's full name. On some systems this may be a class field instead. The comment field is often called the gcos field, but this is not technically accurate; this or the next item may therefore actually contain the comment.

gcos       6    Also known as GECOS, standing for 'General Electric Comprehensive Operating System'. An extended comment containing a comma-separated series of values, for example the user's name, location, and work/home phone numbers. Frequently unimplemented, but see the note on comment above.

dir        7    The home directory of the user, for example, /home/name.

shell      8    The preferred login shell of the user, for example, /usr/bin/bash.

expire     9    The expiry date of the user account. Frequently unsupported, often

Supporting getpwent are the setpwent and endpwent functions. The setpwent function resets the pointer for the next record returned by getpwent to the start of the password file. It is analogous to the rewinddir function in the same way that getpwent is analogous to both opendir and readdir combined. Since there is only one password file, it takes no arguments:

setpwent;


The endpwent function is analogous to closedir: it closes the internal file pointer created whenever we use getpwent (or getpwnam/getpwuid, detailed below). We cannot get access to this internal filehandle, but we can still free it if we are resource-conscious programmers. Additionally, if a network query was made then this will close the connection:

endpwent;
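As a rough illustration of how the three functions fit together, the sketch below counts the entries in the user database twice. The subroutine name count_users is invented for this example, and the script assumes a UNIX-like platform where getpwent is implemented. Because setpwent rewinds before each pass, both passes should see the same records.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Count the records visible through getpwent (count_users is a hypothetical helper).
sub count_users {
    my $count = 0;
    setpwent;                                   # rewind to the first record
    while (defined(my $name = getpwent)) {      # scalar context: login name only
        $count++;
    }
    endpwent;                                   # release the file or connection
    return $count;
}

my $first  = count_users();
my $second = count_users();
print "First pass: $first entries, second pass: $second entries\n";
```

The explicit defined test guards against a login name that happens to evaluate as false, since Perl does not add that test automatically for getpwent.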

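The getpwnam and getpwuid functions look up user names and user ids from each other. getpwnam takes a user name as an argument and returns the user id in scalar context or the full list of ten fields in list context. getpwuid does the reverse: it takes a user id and returns the user name in scalar context or the same list of fields in list context.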

Both functions also have the same effect as setpwent in that they reset the position of the pointer used by getpwent, so they cannot be combined with it in loops.
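The two lookups can be sketched as a round trip from the current process's real user id to a login name and back. The subroutine round_trip is a name made up for this illustration, and the script assumes a platform where the getpw functions are available; in a minimal environment the current uid may have no entry at all, which the sketch reports rather than treating as an error.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Map a user id to a login name and back again (hypothetical helper).
# Returns (name, uid) on success or an empty list if the id has no entry.
sub round_trip {
    my ($uid) = @_;
    my $name = getpwuid($uid);            # scalar context: login name
    return unless defined $name;
    my $found_uid = getpwnam($name);      # scalar context: user id
    return ($name, $found_uid);
}

my $me = $<;                              # real user id of this process
if (my ($name, $uid) = round_trip($me)) {
    my $home = (getpwnam($name))[7];      # list context: field 7 is the home directory
    print "uid $me is '$name' (found again as uid $uid), home directory $home\n";
} else {
    print "No password entry for uid $me\n";
}
```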

Since ten fields is rather a lot to manage, Perl provides the User::pwent module to provide an object-oriented interface to the pw functions. It is one of several modules that all behave similarly; others are User::grent (for group information), Net::hostent, Net::servent, Net::netent, Net::protoent (for network information), and File::stat (for the stat and lstat functions).

User::pwent works by overriding the built-in getpwent, getpwnam, and getpwuid functions with object-oriented versions returning a pw object, complete with methods to extract the relevant fields. It also has the advantage of knowing which methods actually apply, which we can determine using the pw_has class method. Here is an object-oriented user information listing program, which uses getpwent to illustrate how the User::pwent module is used:

#!/usr/bin/perl
# listobjpw.pl
use warnings;
use strict;
use User::pwent qw(:DEFAULT pw_has);

print "Supported fields: ", scalar(pw_has), "\n";

while (my $user = getpwent) {
    print 'Name    : ', $user->name, "\n";
    print 'Password: ', $user->passwd, "\n";
    print 'User ID : ', $user->uid, "\n";
    print 'Group ID: ', $user->gid, "\n";

    # one of quota, change or age
    print 'Quota   : ', $user->quota, "\n" if pw_has('quota');
    print 'Change  : ', $user->change, "\n" if pw_has('change');
    print 'Age     : ', $user->age, "\n" if pw_has('age');

    # one of comment or class (also possibly gcos is comment)
    print 'Comment : ', $user->comment, "\n" if pw_has('comment');
    print 'Class   : ', $user->class, "\n" if pw_has('class');

    print 'Home Dir: ', $user->dir, "\n";
    print 'Shell   : ', $user->shell, "\n";

    # maybe gcos, maybe not
    print 'GECOS   : ', $user->gecos, "\n" if pw_has('gecos');

    # maybe expires, maybe not
    print 'Expire  : ', $user->expire, "\n" if pw_has('expire');

    # separate records
    print "\n";
}

If called with no arguments, the pw_has class method returns a list of supported fields in list context, and a space-separated string suitable for printing in scalar context. Because we generally want to use it without prefixing User::pwent:: we specify it in the import list. However, to retain the default imports that override getpwent etc., we also need to specify the special :DEFAULT tag.

We can also import scalar variables for each field and avoid the method calls by adding the :FIELDS tag (which also implies :DEFAULT) to the import list. This generates a set of scalar variables with the same names as their method equivalents but prefixed with pw_. The equivalent of the above object-oriented script written using field variables is:

#!/usr/bin/perl
# listfldpw.pl
use warnings;
use strict;
use User::pwent qw(:FIELDS pw_has);

print "Supported fields: ", scalar(pw_has), "\n";

while (my $user = getpwent) {
    print 'Name    : ', $pw_name, "\n";
    print 'Password: ', $pw_passwd, "\n";
    print 'User ID : ', $pw_uid, "\n";
    print 'Group ID: ', $pw_gid, "\n";

    # one of quota, change or age
    print 'Quota   : ', $pw_quota, "\n" if pw_has('quota');
    print 'Change  : ', $pw_change, "\n" if pw_has('change');
    print 'Age     : ', $pw_age, "\n" if pw_has('age');

    # one of comment or class (also possibly gcos is comment)
    print 'Comment : ', $pw_comment, "\n" if pw_has('comment');
    print 'Class   : ', $pw_class, "\n" if pw_has('class');

    print 'Home Dir: ', $pw_dir, "\n";
    print 'Shell   : ', $pw_shell, "\n";

    # maybe gcos, maybe not
    print 'GECOS   : ', $pw_gecos, "\n" if pw_has('gecos');

    # maybe expires, maybe not
    print 'Expire  : ', $pw_expire, "\n" if pw_has('expire');

    # separate records
    print "\n";
}

We may selectively import variables if we want to use a subset, but since this overrides the default import we must also explicitly import the functions we want to override:

use User::pwent qw($pw_name $pw_uid $pw_gid getpwnam);

To call the original getpwent, getpwnam, and getpwuid functions we can use the CORE:: prefix. Alternatively, we could suppress the overrides by passing an empty import list or a list containing neither :DEFAULT nor :FIELDS. As an example, here is another version of the above script that invents a new object method has for the User::pwent package and then uses that and class method calls only, avoiding all imports:

print "Supported fields: ", scalar(User::pwent::has), "\n";

while (my $user = User::pwent::getpwent) {
    print 'Name    : ', $user->name, "\n";
    print 'Password: ', $user->passwd, "\n";
    print 'User ID : ', $user->uid, "\n";
    print 'Group ID: ', $user->gid, "\n";

    # one of quota, change or age
    print 'Quota   : ', $user->quota, "\n" if $user->has('quota');
    print 'Change  : ', $user->change, "\n" if $user->has('change');
    print 'Age     : ', $user->age, "\n" if $user->has('age');

    # one of comment or class (also possibly gcos is comment)
    print 'Comment : ', $user->comment, "\n" if $user->has('comment');
    print 'Class   : ', $user->class, "\n" if $user->has('class');

    print 'Home Dir: ', $user->dir, "\n";
    print 'Shell   : ', $user->shell, "\n";

    # maybe gcos, maybe not
    print 'GECOS   : ', $user->gecos, "\n" if $user->has('gecos');

    # maybe expires, maybe not
    print 'Expire  : ', $user->expire, "\n" if $user->has('expire');

    # separate records
    print "\n";
}

As a convenience, the User::pwent module also provides the getpw subroutine, which takes either a user name or a user id, returning a user object either way:

$user = getpw($user_name_or_id);

If the passed argument looks numeric, then getpwuid is called underneath to do the work; otherwise getpwnam is called.

Group Information

UNIX groups are a second tier of privileges between the user's own privileges and those of all users on the system. Files, for example, carry three sets of permissions for reading, writing, and execution: one for the file's owner, one for the file's owning group, and one for everyone else (see later in the chapter for more on this). All users belong to one primary group, and files they create are assigned to this group. This information is locally recorded in the /etc/passwd file and can be found locally or remotely with the getpwent, getpwnam, and getpwuid functions as described above. In addition, users may belong to any number of secondary groups. This information, along with the group ids (or 'gid's) and group names, is locally stored in the /etc/group file and can be extracted locally or remotely with the getgrent, getgrnam, and getgrgid functions.

The getgrent function reads one entry from the groups file each time it is called, starting with the first and returning the next entry in turn on each subsequent call. It returns four fields: the group name, a password (which is usually not defined), the group id, and a space-separated string of the users who belong to that group:

#!/usr/bin/perl
# listgr.pl
use warnings;
use strict;

while (my ($name, $passwd, $gid, $members) = getgrent) {
    print "$gid: $name [$passwd] $members \n";
}

Alternatively, if we call getgrent in a scalar context, it returns just the group name:

#!/usr/bin/perl
# listgroups.pl
use warnings;

As with getpwent, using getgrent causes Perl (or more accurately, the underlying C library) to open a filehandle (or open a connection to an NIS or NIS+ server) internally. Mirroring the supporting functions of getpwent, setgrent resets the pointer of the group filehandle to the start, and endgrent closes the file (and/or network connection) and frees the associated resources.
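A small sketch of the group functions working together follows. It reads the whole group database into a hash, rewinding first with setgrent and tidying up with endgrent. The subroutine name read_groups and the shape of the returned structure are choices made for this example only, and a UNIX-like platform is assumed.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Build a map of group name => [gid, [members]] (hypothetical helper).
sub read_groups {
    my %groups;
    setgrent;                                   # start from the first entry
    while (my ($name, $passwd, $gid, $members) = getgrent) {
        # in list context the members arrive as one space-separated string
        $groups{$name} = [$gid, [split ' ', ($members || '')]];
    }
    endgrent;                                   # release the file or connection
    return \%groups;
}

my $groups = read_groups();
foreach my $name (sort keys %$groups) {
    my ($gid, $members) = @{ $groups->{$name} };
    print "$gid: $name (", (join(', ', @$members) || 'no listed members'), ")\n";
}
```

Note that the member list usually covers secondary memberships only; users whose primary group this is are recorded in the user database instead.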

Perl provides the User::grent module as an object-oriented interface to the getgrent, getgrnam, and getgrgid functions. It works very similarly to User::pwent, but provides fewer methods as it has fewer fields to manage. It also does not have to contend with the variations of field meanings that User::pwent does, and is consequently simpler to use. Here is an object-oriented group lister using User::grent:

while (my $group = getgrent) {

print 'Name : ', $group->name, "\n";

print 'Password: ', $group->passwd, "\n";

print 'Group ID: ', $group->gid, "\n";

print 'Members : ', join(', ', @{$group->members}), "\n\n";

}

As with User::pwent (and indeed all similar modules like Net::hostent, etc.), we can use the :FIELDS tag to import variables that automatically update whenever any of getgrent, getgrnam, or getgrgid are called. Here is the previous example reworked to use variables:

#!/usr/bin/perl

# listfldgr.pl

use warnings;

use strict;

use User::grent qw(:FIELDS);

while (my $group = getgrent) {

print 'Name : ', $gr_name, "\n";

print 'Password: ', $gr_passwd, "\n";

print 'Group ID: ', $gr_gid, "\n";

print 'Members : ', join(', ', @{$group->members}), "\n\n";

}

We can also selectively import variables if we only want to use some of them:

use User::grent qw($gr_name $gr_gid);

In this case the overriding of getgrent etc. will not take place, so we would need to call User::grent::getgrent rather than just getgrent, or pass getgrent as a term in the import list. To avoid importing anything at all, just pass an empty import list.


The Unary File Test Operators

Perl provides a full complement of file test operators. They test filenames for various properties, for example, determining whether they are a file, directory, link, or other kind of file, determining who owns them, and their access privileges. All of these file tests consist of a single minus followed by a letter, which determines the nature of the test, and either a filehandle or a string containing the filename. Here are a few examples:

-r $filename     # return true if file is readable by us
-w $filename     # return true if file is writable by us
-d DIRECTORY     # return true if DIRECTORY is opened to a directory
-t STDIN         # return true if STDIN is interactive

Collectively these functions are known as the -X or file test operators

The slightly odd-looking syntax comes from the UNIX file test utility test and the built-in equivalents in most UNIX shells. Despite their strange appearance, the file test operators are really functions that behave just like any other built-in unary (single argument) Perl operator, and will happily accept parentheses:

print "It's a file!" if -f($filename);

If no filename or handle is supplied then the value of $_ is used as a default, which makes for some very terse if somewhat algebraic expressions:

foreach (@files) {
    print "$_ is a readable text file\n" if -r && -T;   # -T for 'text' file
}

Only single letters following a minus sign are interpreted as file tests, so there is never any confusion between file test operators and negated expressions:

-o($name)      # test if $name is owned by us
-oct($name)    # return negated value of $name interpreted as octal

The full list of file tests is given below, loosely categorized into functional groups. Note that not all of these tests may work, depending on the underlying platform. For instance, operating systems that do not understand ownership in the UNIX model will not make a distinction between -r and -R, since this requires the concept of real and effective user IDs (the Win32 API does support 'impersonation' but this is not the same thing and is not used here). They will also not return anything useful for -o. Similarly, the -b and -c tests are specific to UNIX device files and have no relevance on other platforms.
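To make the default use of $_ concrete, here is a brief sketch that sorts a list of paths into directories and non-empty plain files. The classify subroutine and the file names are invented for the illustration; the files live in a temporary directory created with File::Temp so that the script leaves nothing behind.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use File::Temp qw(tempdir);

# Split a list of paths into directories and non-empty plain files
# (classify is a hypothetical helper).
sub classify {
    my @paths = @_;
    my @dirs  = grep { -d } @paths;               # no argument: tests $_
    my @files = grep { -f $_ && -s _ } @paths;    # _ reuses the stat made by -f
    my %found = (dir => \@dirs, file => \@files);
    return \%found;
}

# set up one non-empty file and one empty file to test against
my $dir   = tempdir(CLEANUP => 1);
my $full  = "$dir/full.txt";
my $empty = "$dir/empty.txt";
open my $out, '>', $full or die "Cannot write $full: $!";
print {$out} "some text\n";
close $out or die "Cannot close $full: $!";
open $out, '>', $empty or die "Cannot write $empty: $!";
close $out or die "Cannot close $empty: $!";

my $found = classify($dir, $full, $empty, "$dir/missing");
print "Directories: @{ $found->{dir} }\n";
print "Non-empty files: @{ $found->{file} }\n";
```

A path that does not exist simply fails every test, so the missing file drops out of both lists without any special handling.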

This tests for the existence of a file:

-e Return true if file exists. Equivalent to the return value of the stat function


These test for read, write, and execute for effective and real users. On non-UNIX platforms, which don't have the concepts of real and effective users, the capital and lowercase versions are equivalent:

-r Return true if file is readable by effective user id

-R Return true if file is readable by real user id

-w Return true if file is writable by effective user id

-W Return true if file is writable by real user id

-x Return true if file is executable by effective user id

-X Return true if file is executable by real user id

The following test for ownership and permissions (on non-UNIX platforms -o returns 1 and the others return ''). Note that these are UNIX-based tests. On Windows, files are owned by 'groups' as opposed to 'users':

-o Return true if file is owned by our real user id

-u Return true if file is setuid (chmod u+s, executables only)

-g Return true if file is setgid (chmod g+s, executables only); this does not exist on Windows

-k Return true if file is sticky (chmod +t, executables only); this does not exist on Windows

These tests for size work on Windows as on UNIX:

-z Return true if file has zero length (that is, it is empty)

-s Return true if file has non-zero length (opposite of -z)

The following are file type tests While -f, -d, -t are generic, the others are platform dependent:

-f Return true if file is a plain file (that is, not a directory, link, pipe, etc.)

-d Return true if file is a directory

-l Return true if file is a symbolic link

-p Return true if file is a named pipe or filehandle is a pipe filehandle

-S Return true if file is a UNIX domain socket or filehandle is a socket filehandle

-b Return true if file is a block device

-c Return true if file is a character device

-t Return true if file is interactive (opened to a terminal)

We can use -T and -B to test whether a file is text or binary:

-T Return true if file is a text file See below for details

-B Return true if file is not a text file. See below for details


The following test for times, and also work on Windows:

-M Returns the age of the file (the time since it was last modified) as a fractional number of days, counting from the time at which the application started (which avoids a system call to find the current time). A smaller value means a more recently modified file, so to find which of two files is more recent we can write:

$file = (-M $file1 < -M $file2) ? $file1 : $file2;

-A Returns the time since the file was last accessed, in the same units as -M

-C On UNIX, returns the time since the last inode change, in the same units as -M (not the creation time, as is commonly misconceived; it corresponds to the creation time only so long as the inode has not changed since the file was created). On other platforms, it is based on the creation time
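Since -M measures age, the smaller value belongs to the newer file. The following sketch checks this with two temporary files whose modification times are set explicitly with utime; the subroutine name newer is made up for the example.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use File::Temp qw(tempfile);

# Return whichever of two files was modified more recently (hypothetical helper).
# -M gives the age in days, so the smaller value marks the newer file.
sub newer {
    my ($file1, $file2) = @_;
    return (-M $file1 < -M $file2) ? $file1 : $file2;
}

my (undef, $old) = tempfile(UNLINK => 1);
my (undef, $new) = tempfile(UNLINK => 1);

my $now = time;
utime $now - 3600, $now - 3600, $old or die "Cannot set times on $old: $!";
utime $now, $now, $new               or die "Cannot set times on $new: $!";

printf "%s is the newer file; %s is %.3f days old\n",
    newer($old, $new), $old, -M $old;
```

Because ages are counted from the moment the script started, a file modified after that point has a negative age, which still compares correctly.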

Link Transparency and Testing for Links

This section is only relevant if our chosen platform supports the concept of symbolic links, which is to say all UNIX variants, but not most other platforms (in particular, Windows 'shortcuts' are an artifact of the desktop, and nothing to do with the actual filing system).

The stat function, which is the basis of all the file test operators (except -l), automatically follows symbolic links and returns information based on the real file, directory, pipe, etc., that it finds at the end of the link. Consequently, file tests like -f and -d return true if the file at the end of the link is a plain file or directory. We do not therefore have to worry about links when we just want to know if a file is readable:

my @lines;
if (-e $filename) {
    if (-r $filename) {
        open FILE, $filename;    # open file for reading
        @lines = <FILE>;
    } else {
        die "Cannot open $filename for reading \n";
    }
} else {
    die "Cannot open $filename - file does not exist \n";
}

If we want to find out if a file actually is a link, we have to use the -l test. This gathers information about the link itself and not the file it points to, returning true if the file is in fact a link. A practical upshot of this is that we can test for broken links by testing -l and -e:

if (-l $file and !-e $file) {
    print "'$file' is a broken link! \n";
}

This is also useful for testing that a file is not a link when we do not expect it to be. A utility designed to be run under 'root' should check that files it writes to have not been replaced with links to /etc/passwd, for example.
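The broken-link test can be tried out safely in a temporary directory. This sketch creates a link, checks it, removes the target, and checks again. The subroutine is_broken_link and the file names are illustrative only, and the script assumes a platform where symlink is implemented.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use File::Temp qw(tempdir);

# True if $path is a symbolic link whose target does not exist
# (is_broken_link is a hypothetical helper).
sub is_broken_link {
    my ($path) = @_;
    return (-l $path and not -e $path) ? 1 : 0;
}

my $dir    = tempdir(CLEANUP => 1);
my $target = "$dir/target";
my $link   = "$dir/link";

open my $out, '>', $target or die "Cannot create $target: $!";
close $out or die "Cannot close $target: $!";
symlink $target, $link or die "Cannot create link $link: $!";

print "With target present: ", is_broken_link($link), "\n";
unlink $target or die "Cannot remove $target: $!";
print "With target removed: ", is_broken_link($link), "\n";
```

The -l test uses lstat and so sees the link itself, while -e follows the link and fails once the target has gone.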

Testing Binary and Text Files

-T and -B test files to see if they are text or binary. They do this by examining the start of the file and counting the number of non-text characters present. If this exceeds a third, the file is determined to be binary, otherwise it is determined to be text. If a null (ASCII 0) character is seen anywhere in the examined data then the file is binary.
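As a quick experiment with this heuristic, the sketch below writes a text file, a file containing every byte value (including a null), and an empty file, then classifies each. The names kind_of and write_file are invented here. Empty files are handled separately because an empty file satisfies both -T and -B.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use File::Temp qw(tempdir);

# Describe a path using the -f, -z, and -T tests (hypothetical helper).
sub kind_of {
    my ($path) = @_;
    return 'not a plain file' unless -f $path;
    return 'empty' if -z _;            # an empty file passes both -T and -B
    return 'text'  if -T $path;
    return 'binary';
}

# Write raw bytes to a file (hypothetical helper).
sub write_file {
    my ($path, $data) = @_;
    open my $out, '>:raw', $path or die "Cannot write $path: $!";
    print {$out} $data;
    close $out or die "Cannot close $path: $!";
}

my $dir = tempdir(CLEANUP => 1);
write_file("$dir/notes.txt", "plain ASCII text\n" x 10);
write_file("$dir/data.bin",  pack('C*', 0 .. 255));    # includes a null byte
write_file("$dir/empty",     '');

foreach my $path ("$dir/notes.txt", "$dir/data.bin", "$dir/empty", $dir) {
    print "$path: ", kind_of($path), "\n";
}
```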


Since -T and -B only make sense in the context of a plain file, they are commonly combined with -f:

if (-f $file && -T $file) {

Reusing the Results of a Prior 'stat' or 'lstat'

The underlying mechanism behind the file test operators is a call to either stat or (in the case of -l) lstat. In order to test the file, each operator will make a call to stat to interrogate the file for information. If we want to make several tests this is inefficient, because a disk access needs to be made

print "$filename does not exist \n";

}

Note that in this example we have used lstat so the link test -l _ will work correctly. -l requires an lstat and not a stat, and will generate an error if we try to use it with the results of a previous stat:

The stat preceding -l _ wasn't an lstat
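One way to picture the reuse of a prior lstat is the sketch below, in which a single lstat call feeds every later test through the special filehandle _. The subroutine name describe and the wording of its messages are assumptions made for the illustration.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Describe a path using one lstat call; each later test reads the
# cached result through _ (describe is a hypothetical helper).
sub describe {
    my ($filename) = @_;
    return "$filename does not exist"     unless lstat $filename;
    return "$filename is a symbolic link" if -l _;
    return "$filename is a directory"     if -d _;
    return "$filename is not a plain file" unless -f _;

    my @notes;
    push @notes, 'readable' if -r _;
    push @notes, 'writable' if -w _;
    push @notes, (-z _) ? 'empty' : (-s _) . ' bytes';
    return "$filename is a plain file (" . join(', ', @notes) . ")";
}

print describe($_), "\n" foreach ($0, '.', '/no/such/file');
```

Because the first call is lstat, a symbolic link is reported as a link rather than as whatever it points to.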

Caching of the results of stat and lstat works for prior file tests too, so we could also write something like this:

if (-e $filename) {
    print "$filename exists \n";
    print "$filename is a file \n" if -f _;
}

Or:

if (-f $filename && -T _) {
    print "$filename exists and is a text file \n";
}

The only drawback to this is that only -l calls lstat, so we cannot test for a link this way unless the first test is -l.

Using 'stat' Objects

Accessing the values returned by stat can be a little inconvenient, not to mention inelegant. For example, this is how we find the size of a file:

$size = (stat $filename)[7];

Or, printing it out:

print ((stat $filename)[7]); # need to use extra parentheses with print

Unless we happen to know that the eighth element is the size or we are taking care to write particularly legible code, this leads to unfriendly code. Fortunately we can use the File::stat module instead. The File::stat module simplifies the use of stat and lstat by overriding them with subroutines that return stat objects instead of a list. These objects can then be queried using one of File::stat's methods, which have the same names as the values that they return.

As an example, this short program uses the size, blksize, and blocks methods to return the size of the file supplied on the command line:

#!/usr/bin/perl
# filesize.pl
use warnings;
        " bytes and occupies ", $stat->blksize * $stat->blocks,
        " bytes of disc space \n";
} else {
    print "Cannot stat $filename: $! \n";
}

As an alternative to using object methods, we can import thirteen scalar variables containing the results of the last stat or lstat into our program by adding an import list of :FIELDS. Each variable takes the same name as the corresponding method prefixed with the string st_. For example:


# filesizefld.pl

use warnings;

use strict;

use File::stat qw(:FIELDS);

print "Enter filename: ";

my $filename = <>;

chomp($filename);

if (stat $filename) {

print "'$filename' is ", $st_size,

" bytes and occupies ", $st_blksize * $st_blocks,

" bytes of disc space \n";

} else {

print "Cannot stat $filename: $! \n";

}

If we want to use the original versions of stat and lstat we can do so by prefixing them with the CORE:: package name:

use File::stat;

@new_stat = stat $filename; # use new 'stat'

@old_stat = CORE::stat $filename; # use original 'stat'

Alternatively we can prevent the override from happening by supplying an empty import list:

use File::stat qw(); # or '', etc

We can now use the File::stat versions of stat and lstat by qualifying them with the full package name:

$stat = File::stat::stat $filename;

print "File is ", $stat->size(), " bytes \n";

The full list of File::stat object methods and field names is presented in the section 'Interrogating Files' later in the chapter.
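To compare the two interfaces side by side, this sketch stats the same temporary file through the File::stat object and through the built-in list-returning stat reached via CORE::. The file and its 1000-byte contents are created purely for the demonstration.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use File::stat;                 # overrides stat and lstat with object-returning versions
use File::Temp qw(tempfile);

# create a scratch file of known size
my ($fh, $filename) = tempfile(UNLINK => 1);
print {$fh} "x" x 1000;
close $fh or die "Cannot close $filename: $!";

my $st  = stat($filename) or die "Cannot stat $filename: $!";
my @raw = CORE::stat($filename);       # the built-in, list-returning stat

printf "size via object: %d, via list: %d\n", $st->size, $raw[7];
printf "permissions: %04o, modified: %s\n",
    $st->mode & 07777, scalar localtime($st->mtime);
```

The method names make the intent obvious at a glance, whereas the list form relies on the reader remembering that element 7 is the size.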

Access Control Lists, the Superuser, and the 'filetest' Pragma

The file tests -r, -w, and -x, and their uppercase counterparts determine their return value from the results of the stat function. Unfortunately this does not always produce an accurate result. Some of the reasons that these file tests may produce incorrect or misleading results include:

* An Access Control List (ACL) is in operation

* The filesystem is read-only

* We have superuser privileges


All these cases tend to produce 'false positive' results, implying that the file is accessible when in fact it is not. For example, the file may be writable, but the filesystem is not.

In the case of the superuser, -r, -R, -w, and -W will always return true, even if the file is set as unreadable and unwritable, because the superuser can just disregard the actual file permissions. Similarly, -x and -X will return true if any of the execute permissions (user, group, other) are set. To check if the file is really writable, we must use stat and check the file permissions directly:

$mode = ((stat $filename)[2]);

$writable = $mode & 0200; # test for owner write permission

Again, this is a UNIX-specific example. Most other platforms do not support permissions; Windows NT does, but does it in a different way.
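The direct check on the mode can be sketched with the symbolic constants from the Fcntl module in place of the literal 0200. The subroutine name owner_writable is invented for this example, and the script assumes UNIX-style permissions. When run as the superuser, -w is expected to report the file as writable in both cases, while the mode check tracks the permission bits themselves.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Fcntl qw(:mode);            # S_IWUSR and the other mode constants
use File::Temp qw(tempfile);

# True if the owner-write bit is set in the file's permissions, whatever
# privileges the current process has (hypothetical helper).
sub owner_writable {
    my ($filename) = @_;
    my $mode = (stat $filename)[2];
    die "Cannot stat $filename: $!" unless defined $mode;
    return ($mode & S_IWUSR) ? 1 : 0;      # S_IWUSR is 0200
}

my (undef, $filename) = tempfile(UNLINK => 1);
foreach my $perms (0644, 0444) {
    chmod $perms, $filename or die "Cannot chmod $filename: $!";
    printf "%04o: owner write bit is %s, -w reports %s\n",
        $perms,
        (owner_writable($filename) ? 'set' : 'clear'),
        ((-w $filename) ? 'writable' : 'not writable');
}
```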

For the other cases we can try to use the filetest pragma, which alters the operation of the file tests for access by overriding them with more rigorous tests that interrogate the operating system instead. Currently there is only one mode of operation, access, which causes the file test operators to use the underlying access system call, if available:

use filetest 'access';

This modifies the behavior of the file test operators to use the operating system's access call to check the true permissions of a file, as modified by access control lists or filesystems that are mounted read-only. We can also make our own direct tests of filenames with the access subroutine supplied by the POSIX module (note that it does not work on filehandles). It takes a filename and a numeric flag containing the permissions we want to check for. These are defined as constants in the POSIX module:

R_OK Test file has read permission

W_OK Test file has write permission

X_OK Test file has execute permission

F_OK Test that file exists Implied by R_OK, W_OK, or X_OK

Note that F_OK is implied by the other three, so it need never be specified directly (to test for existence we can as easily use the -e test, or -f if we require a plain file).

While access provides no extra functionality over the standard file tests, it does allow us to make more than one test simultaneously. As an example, to test that a file is both readable and writable we would use:

use filetest 'access';

use POSIX;

$can_readwrite = access($filename, R_OK|W_OK);

The return value from access is undef on failure and '0 but true' on success. This is a string that evaluates to zero in a numeric context and to true in a Boolean context, for instance an if or while condition. On failure $! is set to indicate the reason.
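The return values can be observed directly with a short sketch. Here can_readwrite is a hypothetical wrapper that reduces the result of POSIX's access to 1 or 0, and the path /no/such/file is assumed not to exist.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use POSIX qw(access R_OK W_OK);
use File::Temp qw(tempfile);

# Report whether the named file may be both read and written
# (can_readwrite is a hypothetical helper).
sub can_readwrite {
    my ($filename) = @_;
    my $result = access($filename, R_OK | W_OK);
    return defined $result ? 1 : 0;
}

my (undef, $filename) = tempfile(UNLINK => 1);

# on success the result is true as a condition but zero as a number
my $result = access($filename, R_OK | W_OK);
print "access returned '$result', which is ", $result + 0, " as a number\n";

# on failure the result is undef and $! holds the reason
print "Cannot access /no/such/file: $!\n" unless can_readwrite('/no/such/file');
```

Testing the result with defined, or simply as a condition, avoids the trap of comparing it numerically against zero.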


Automating Multiple File Tests

We often want to perform a series of different file tests across a range of different files. Installation scripts, for example, often do this to verify that all the installed files are in the correct place and with the correct permissions.

While it is possible to manually work through a list of files, we can make life a little simpler by using the File::CheckTree module instead. This module provides a single subroutine, validate, that takes a series of filenames and -X style file tests and applies each of them in turn, generating warnings as it does so.

Unusually for a library subroutine, validate accepts its input in lines, in order to allow the list of files and tests to be written in the style of a manifest list. As an example, here is validate being used to check for the existence of three directories and an executable file installed by a fictional application:

$SIG{__WARN__} = sub { };                      # do nothing

$SIG{__WARN__} = sub { print LOGFILE @_ };     # redirect to install log

In fact this may be necessary in any case, since validate (as of Perl 5.6, at least) fails to properly initialize some of its internal variables, leading to 'Use of uninitialized value' warnings. We can eliminate these with selective use of no warnings or with a signal handler like the ones above.

The same file may be listed any number of times, with different tests applied each time. Alternatively, multiple tests may be bunched together into one file test, so that instead of specifying two tests one after the other they can be done together. Hence, instead of writing two lines:

/home/install/myapp/bin/myapp -f

/home/install/myapp/bin/myapp -x

We can write both tests as one line:

/home/install/myapp/bin/myapp -fx

The second test is dependent on the first, so only one warning can be generated from a bunched test. If we want to test for both conditions independently (we want to know if it is not a plain file and we want to know if it is not executable) we need to put the tests on separate lines.


Normal and negated tests cannot be bunched, so if we want to test that a filename corresponds to a plain file that is not executable, we must use separate tests:

validate(q{
    /home/install/myapp/scripts/myscript.pl -f
    /home/install/myapp/scripts/myscript.pl !-xug
})

Rather than a file test operator, the test may also be the command cd. This causes the directory named at the start of the line to be made the current working directory. Any relative paths given after this are taken relative to that directory until the next cd, which may also be relative:

    about_us.html -rf
    text.bin      -f || warn "Not a plain file"

});

validate is entirely insensitive to extra whitespace, so we can use additional spacing to clarify what file is being tested where In the above example we have indented the files to make it clear which directory they are being tested in.

We can supply our own warnings, and make tests fatal by suffixing the file test with || and either warn or die. These work in exactly the same way as their Perl function counterparts. However, while warn may take an optional descriptive error message, die ignores it. In the above example we have terminated immediately if the installation directory does not exist, since the other tests would be pointless. If we do specify our own error messages we can use the variable $file, supplied by the module, to insert the name of the file whose test failed:

validate(q{
    /etc        -d || warn "What, no $file directory? \n"
    /var/spool  -d || die
})

This trick relies on the error messages being interpolated at run time, so using single quotes or the q quoting operator is essential in this case.

Relative pathnames specified to validate before a cd are taken relative to the current directory. Unfortunately validate currently does not take proper account of this when reporting errors and adds a leading / to the pathname, giving the impression that the filename being tested is absolute (this may be fixed in a later release). To work around this, an explicit cd to the current directory suffices:


    localfile -f
})

One of the advantages of File::CheckTree is that the file list can be built dynamically, possibly generated from an existing file tree created by File::Find (see later). For example, using File::Find we can determine the type and permissions of each file and directory in a tree, then generate a test list suitable for File::CheckTree to validate new installations of that tree. See 'Finding Files' and the other modules in this section for pointers.
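As a rough sketch of this idea, the following script walks a small throwaway tree with File::Find and produces the text of a test list. The helper name build_test_list and the choice of tests (-d for directories, -f for everything else) are our own assumptions for illustration; the resulting string is only printed here, so the sketch does not depend on File::CheckTree being installed.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Build a File::CheckTree-style test list for everything under $root.
# build_test_list is a hypothetical helper, not part of either module.
sub build_test_list {
    my ($root) = @_;
    my @lines = ("$root cd || die");
    find({
        no_chdir => 1,
        wanted   => sub {
            return if $File::Find::name eq $root;
            # make the path relative to $root so that it follows the 'cd'
            (my $relative = $File::Find::name) =~ s{^\Q$root\E/}{};
            my $test = (-d $File::Find::name) ? '-d' : '-f';
            push @lines, "    $relative $test";
        },
    }, $root);
    return join("\n", @lines) . "\n";
}

# Demonstrate on a small temporary tree.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/scripts" or die "mkdir failed: $!";
open my $fh, '>', "$root/scripts/myscript.pl" or die "open failed: $!";
close $fh;

my $tests = build_test_list($root);
print $tests;
```

The generated text could then be passed to validate in place of a hand-written list; a fuller version might also derive permission tests such as -r or -x from the mode of each file.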

Interrogating Files

While the file test operators are satisfactory for the majority of cases, if we want to interrogate a file in detail then it is sometimes more convenient to use stat or lstat directly and examine the results. Both functions return details of the filename or filehandle supplied as their argument. lstat is identical to stat except in the case of a symbolic link, where stat will return details of the file pointed to by the link and lstat will return details of the link itself. In either case, a thirteen-element list is returned:

# stat filehandle into a list

@stat_info = stat FILEHANDLE;

# lstat filename into separate scalars

($dev, $inode, $mode, $nlink, $uid, $gid, $rdev, $size,
    $atime, $mtime, $ctime, $blksize, $blocks) = lstat $filename;

The thirteen values are always returned, but may not be defined or have meaning in every case, either because they do not apply to the file or filehandle being tested or because they have no meaning on the underlying platform. A full list of File::stat methods and object names is shown below, including the meanings and index number in the @stat_info array:

Method   Number  Description

dev      0       The device number of the filesystem on which the file resides
ino      1       The inode of the file
mode     2       The file mode, combining the file type and the file permissions
nlink    3       The number of hard (not symbolic) references to the inode underneath the filename
uid      4       The user id of the user that owns the file
gid      5       The group id of the group that owns the file
rdev     6       The device identifier (block and character special files only)
size     7       The size of the file, in bytes
atime    8       The last access time, in seconds since the epoch
mtime    9       The last modification time, in seconds since the epoch
ctime    10      The last inode change time, in seconds since the epoch
blksize  11      The preferred block size of the filesystem
blocks   12      The number of blocks allocated to the file. The product $stat_info[11]*$stat_info[12] is the size of the file as allocated in the filesystem, provided that blocks are counted in units of blksize (many systems count 512-byte blocks instead). The actual size of the file in terms of its contents will most likely be less than this, as it will only partially fill the last block; use size for that
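To make the indices in the table above concrete, here is a minimal sketch that interrogates a temporary file both ways: through the list returned by the built-in stat and through the named methods of File::stat. The temporary file and the variable names are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::stat ();   # load the module without overriding the built-in stat

# Create a small file so that there is something to interrogate.
my ($fh, $filename) = tempfile(UNLINK => 1);
binmode $fh;
print $fh "hello\n";
close $fh or die "close failed: $!";

# Built-in stat: a thirteen-element list, indexed as in the table.
my @stat_info = stat($filename) or die "stat failed: $!";
my $nlink = $stat_info[3];
my $size  = $stat_info[7];

# File::stat: the same values through named methods.
my $st = File::stat::stat($filename) or die "stat failed: $!";
printf "size: %d bytes, links: %d, modified: %s\n",
    $st->size, $st->nlink, scalar localtime($st->mtime);
```

The object interface costs a little more than the plain list, but avoids having to remember which index holds which value.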

Several of the values returned by stat relate to the 'inode' of the file. Under UNIX, the inode of a file is a numeric id that is allocated by the filesystem and is the file's 'true' identity, with the filename being just an alias. Since more than one filename may point to the same file, the nlink value may be more than one, though it cannot be less (since then the inode would have no filename and we would not be able to refer to it). The ctime value indicates the last time the inode of the file changed. It often coincides with the creation time, but it is updated whenever the inode changes, for example after a chmod or chown. In contrast, the access and modification times refer to actual file access.

On other platforms, some of these values are either undefined or meaningless. Under Windows, the device number is related to the drive letter, there is no 'inode', and the value of nlink is always '1'. The uid and gid values are always zero, and no value is returned for either blksize or blocks. There is a mode, though only the file type is useful; the permissions are always 777. While Windows NT does have a fairly complex permissions system, it is not accessible this way; see below.

Changing File Attributes

UNIX and other platforms that support the concept of file permissions and ownership can make use of the chmod and chown functions to modify the permissions and ownership of a file from Perl. chmod modifies the file permissions of a file for the three categories user, group, and other. The chown function modifies which user corresponds to the user permissions, and which group corresponds to the group permissions. Every other user and group falls under the other category. Ownership and permissions are therefore inextricably linked.

On UNIX, the file type and the file permissions are combined into the mode value returned by stat. On Windows, the file type is still useful, though the permissions are always set to 777.

File Ownership

File ownership is a highly platform-dependent concept. Perl grew up on UNIX systems, and so attempts to handle ownership in a UNIX-like way. Under UNIX and other platforms that borrowed their semantics from UNIX, files have an owner, represented by the file's user id, and a group owner, represented by the file's group id. Each relates to a different set of file permissions, so the user may have the ability to read and write a file whereas other users in the same group may only get to read it. Others may not have even that, depending on the setting of the file permissions.

File ownership is handled by the chown function, which does the work of both the chown and chgrp commands. It takes at least three parameters: a user id, a group id, and one or more files to change:

$successes = chown $uid, $gid, @files;


The number of files successfully changed is returned. If only one file is given to chown, this allows a simple Boolean test to be used to determine success:

unless (chown $uid, $gid, $filename) {

die "chown failed: $! \n";

}

To change only the user or group, supply -1 as the value for the other parameter. For instance, a chgrp function can be simulated with:
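A minimal sketch of such a function follows. The name chgrp is our own choice (Perl has no built-in of that name), and the demonstration simply re-applies the group a temporary file already has, on the assumption that the system allows an unprivileged owner to do so.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Change only the group of the given files, leaving the owner untouched.
# Returns the number of files successfully changed, like chown itself.
sub chgrp {
    my ($gid, @files) = @_;
    return chown -1, $gid, @files;
}

# Demonstrate by re-assigning a temporary file to its existing group.
my ($fh, $filename) = tempfile(UNLINK => 1);
close $fh;

my $gid     = (stat $filename)[5];
my $changed = chgrp($gid, $filename);
print "changed $changed file(s)\n";
```

A chown-only counterpart would pass -1 in the group position instead.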

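On POSIX systems, whether chown is restricted to privileged users can be queried through the POSIX module:

use POSIX qw(sysconf _PC_CHOWN_RESTRICTED);

$chown_restricted = sysconf(_PC_CHOWN_RESTRICTED);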

If this returns a true value, then a chown by an unprivileged user will not be permitted.

chown needs a user or group id to function; it will not accept a user or group name. To deduce a user id from the name, at least on a UNIX-like system, we can use the getpwnam function. Likewise, to deduce a group id from the name we can use the getgrnam function. We can use getpwent and getgrent instead to retrieve one user or group at a time, respectively (see earlier in the chapter for more). As a quick example, the following script builds tables of user and group ids, which can subsequently be used with chown:

# print out basic user and group information

foreach my $user (sort {$users{$a} <=> $users{$b}} keys %users) {

    print "$users{$user}: $user, group $usergroup{$user} ($groups[$usergroup{$user}])\n";

}
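For reference, here is one self-contained way the tables might be built, offered as a sketch and not as the original listing. It assumes a UNIX-like system where getpwent and getgrent are implemented, and it deliberately keeps group names in a hash (%group_name, our own name) in place of an array, since group ids can be very large and sparse.

```perl
use strict;
use warnings;

my (%users, %usergroup, %group_name);

# user name => user id, and user name => primary group id
while (my ($name, undef, $uid, $gid) = getpwent()) {
    $users{$name}     = $uid;
    $usergroup{$name} = $gid;
}
endpwent();

# group id => group name
while (my ($name, undef, $gid) = getgrent()) {
    $group_name{$gid} = $name;
}
endgrent();

# print out basic user and group information, ordered by user id
foreach my $user (sort { $users{$a} <=> $users{$b} } keys %users) {
    my $gid   = $usergroup{$user};
    my $group = defined $group_name{$gid} ? $group_name{$gid} : '?';
    print "$users{$user}: $user, group $gid ($group)\n";
}
```

With the tables in place, a name can be turned into an id for chown with a simple hash lookup such as $users{$name}.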


Perl provides two functions that are specifically related to file permissions, chmod and umask.

The chmod function allows us to set the permissions of a file. Permissions are grouped into three categories: user, which applies to the file's owner, group, which applies to the file's group owner, and other, which applies to anyone who is not the file's owner or a member of the file's group owner. Within each category each file may be given read, write, and execute permission.

chmod represents each of the nine values (3 categories x 3 permissions) by a different numeric flag, which are traditionally put together to form a three digit octal number, each digit corresponding to the respective category. The flag values within each digit are 4 for read permission, 2 for write permission, and 1 for execute permission, as demonstrated by the following examples (prefixed by a leading 0 to remind us that these are octal values):

0200 Owner write permission

0040 Group read permission

0001 Other execute permission

The total of the read, write, and execute permissions for a category is 7, which is why octal is so convenient to represent the combined permissions flag. Read, write and execute permission for the owner only would be represented as 0700. Similarly, read, write and execute permission for the owner, read and execute permission for the group and execute only permission for everyone else would be:

0751, which is 0400+0200+0100 + 0040+0010 + 0001
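The arithmetic above can be checked directly in Perl, as in the following short sketch. The variable names are ours; the only point being made is that the flags combine with bitwise OR and that permission strings must be converted with oct before use.

```perl
use strict;
use warnings;

# Individual permission flags, written in octal.
my ($u_r, $u_w, $u_x) = (0400, 0200, 0100);   # owner read, write, execute
my ($g_r, $g_x)       = (0040, 0010);         # group read, execute
my $o_x               = 0001;                 # other execute

my $mode = $u_r | $u_w | $u_x | $g_r | $g_x | $o_x;
printf "combined mode: %04o\n", $mode;        # prints "combined mode: 0751"

# A string such as "751" must be converted with oct() before use;
# the decimal number 751 would mean something quite different.
my $from_string = oct("751");
```

Forgetting the leading 0 (or the call to oct) is a common source of surprising permissions, since 751 decimal is 01357 in octal.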

Having explained the permissions flag, the chmod function itself is comparatively simple, taking a permissions flag, as calculated above, as its first argument and applying it to one or more files given as the second and subsequent arguments. For example:
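A short sketch of such a call, using a temporary file so that it can be run safely; the file and the chosen mode of 0640 are assumptions made for the illustration:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $filename) = tempfile(UNLINK => 1);
close $fh;

# owner read and write, group read, no access for others
chmod 0640, $filename or die "chmod failed: $!";

# read the permissions back; the low twelve bits of the mode hold them
my $perms = (stat $filename)[2] & 07777;
printf "%s now has permissions %04o\n", $filename, $perms;
```

Like chown, chmod returns the number of files it changed, so with a list of files the return value can be compared against the size of the list.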

try to set. The following table shows the permission bits that can be used with umask and their meanings:


umask number File Permission

umask only defines the access permissions. Called without an argument, it returns the current value of the umask, which is inherited from the shell, and is typically set to a value of 002 (mask other write permission) or 022 (mask group and other write permissions):
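As a minimal sketch of querying and changing the umask (the value 027 is an arbitrary choice for the example, and the original value is restored at the end):

```perl
use strict;
use warnings;

# Query the current umask; with no argument umask only reports the value.
my $old = umask;
printf "current umask: %03o\n", $old;

# Set a new umask; the previous value is returned.
my $previous = umask 027;      # mask group write and all access for others
my $current  = umask;
printf "umask changed from %03o to %03o\n", $previous, $current;

# Restore the original setting.
umask $old;
```

Because the umask belongs to the process, a change made this way affects every file the script creates afterwards, and is inherited by any child processes it starts.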

The open function always uses permissions of 0666 (read and write for all categories), whereas sysopen allows the permissions to be specified in the call. Since umask controls the permissions of new files by removing unwanted permissions, we do not need to (and generally should not) specify more restrictive permissions to sysopen.
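The interaction between the requested permissions and the umask can be seen in the following sketch, which creates a file in a temporary directory. The file name and the umask of 022 are assumptions for the example; on a filesystem with default ACLs the result could differ.

```perl
use strict;
use warnings;
use Fcntl;                          # O_WRONLY, O_CREAT, O_EXCL
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/example.txt";

my $old_umask = umask 022;          # mask group and other write permission

# Request 0666 and let the umask remove what is not wanted.
sysopen(my $fh, $file, O_WRONLY | O_CREAT | O_EXCL, 0666)
    or die "sysopen failed: $!";
close $fh;

my $perms = (stat $file)[2] & 07777;
printf "requested 0666 with umask 022, got %04o\n", $perms;   # typically 0644

umask $old_umask;                   # restore the previous umask
```

The permissions actually applied are the requested value with the umask bits cleared, that is 0666 & ~022, which is why asking for 0666 leaves the final decision to the user's own umask.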

The 'Fcntl' Module

The Fcntl module provides symbolic constants for all of the flags contained in both the permissions and the filetype parts of the mode value. It also provides two functions for extracting each part, as an alternative to computing the values by hand:

use Fcntl qw(:mode); # import file mode constants

$type = S_IFMT($mode); # extract file type

$perm = S_IMODE($mode); # extract file permissions

printf "File permissions are: %o \n", $perm;


The filetype part of the mode defines the type of the file, and is the basis of the file test operators like -d, -f, and -l that test for the type of a file. The Fcntl module defines symbolic constants for these:

S_IFREG Regular file -f

S_IFBLK Block special file -b

S_IFCHR Character special file -c

S_IFIFO Pipe or named fifo -p

S_IFWHT Whiteout entry (BSD systems); no file test operator corresponds to it, since -t tests a filehandle and not the mode

Note that Fcntl also defines a number of subroutines that test the mode for the desired property. These have very similar names, for example S_IFDIR and S_ISFIFO, and it is easy to get the subroutines and flags confused. Since we have the file test operators, we do not usually need to use these subroutines, so we mention them only to eliminate possible confusion.

These flags can also be used with sysopen, IO::File's new method and the stat function described previously, where they can be compared against the mode value. As an example of how these flags can be used, here is the equivalent of the -d file test operator written using stat and the Fcntl module:

$mode = ((stat $filename)[2]);

$is_directory = S_IFMT($mode) == S_IFDIR;

Or, to test that a file is neither a block nor a character special file:

$is_not_special = S_IFMT($mode) != S_IFBLK && S_IFMT($mode) != S_IFCHR;

The Fcntl module also defines functions that do this for us. Each function takes the same name as the flag but with S_IF replaced with S_IS. For instance, to test for a directory we can instead use:

$is_directory = S_ISDIR($mode);

Of course, the -d file test operator is somewhat simpler in this case.
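The three approaches can be compared side by side in a small sketch. It creates a temporary directory and a file inside it, then asks each of them whether it is a directory; the %is_dir hash and the file names are our own additions.

```perl
use strict;
use warnings;
use Fcntl qw(:mode);                # S_IFMT, S_IFDIR, S_ISDIR, ...
use File::Temp qw(tempdir tempfile);

my $dir = tempdir(CLEANUP => 1);
my ($fh, $file) = tempfile(DIR => $dir);
close $fh;

my %is_dir;
foreach my $path ($dir, $file) {
    my $mode = (stat $path)[2];

    # Three equivalent ways of asking "is this a directory?"
    my $by_mask     = (S_IFMT($mode) == S_IFDIR) ? 1 : 0;
    my $by_function = S_ISDIR($mode)             ? 1 : 0;
    my $by_operator = (-d $path)                 ? 1 : 0;

    $is_dir{$path} = [$by_mask, $by_function, $by_operator];
    print "$path: mask=$by_mask function=$by_function operator=$by_operator\n";
}
```

Note that the file type is compared for equality after masking with S_IFMT; the type values are not independent bits, so a plain bitwise AND against S_IFDIR would also succeed for some other file types.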

The permissions part of the mode defines the read, write, and execute privileges that the file grants to the file's owner, the file's group, and others. It is the basis of the file test operators like -r, -w, -u, and -g that test for the accessibility of a file. The Fcntl module also defines symbolic constants for these:


Name Description Number

S_IRWXU User can read, write, execute 00700

S_IRWXG Group can read, write, execute 00070

S_IRWXO Others can read, write, execute 00007

For example, to test a file for user read and write permission, plus group read permission, we could use:

$perms_ok = ($mode & (S_IRUSR | S_IWUSR | S_IRGRP)) == (S_IRUSR | S_IWUSR | S_IRGRP);

To test that a file has exactly these permissions and no others we would instead write:

$exact_perms = S_IMODE($mode) == (S_IRUSR | S_IWUSR | S_IRGRP);

The file permission flags are useful not only for making sense of the mode value returned by stat but also in the chmod function. Consult the manual page for the chmod system call (on UNIX platforms) for details of the more esoteric bits such as sticky and swap.
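The following sketch puts the permission constants to work on a temporary file whose mode is first set to a known value with chmod. The file and the $wanted variable are assumptions for the example; the parentheses matter, because & binds more tightly than | and == more tightly than either.

```perl
use strict;
use warnings;
use Fcntl qw(:mode);                # S_IRUSR, S_IWUSR, S_IRGRP, S_IRWXO, S_IMODE
use File::Temp qw(tempfile);

my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;
chmod 0640, $file or die "chmod failed: $!";

my $mode   = (stat $file)[2];
my $wanted = S_IRUSR | S_IWUSR | S_IRGRP;       # the same bits as 0640

# All of the wanted bits are set (others may be set too).
my $perms_ok = ($mode & $wanted) == $wanted;

# Exactly the wanted bits are set; S_IMODE strips the file type first.
my $exact_perms = S_IMODE($mode) == $wanted;

# Is any access at all granted to 'other'?
my $other_access = ($mode & S_IRWXO) ? 1 : 0;

printf "mode %04o: ok=%d exact=%d other=%d\n",
    S_IMODE($mode), $perms_ok ? 1 : 0, $exact_perms ? 1 : 0, $other_access;
```

The same constants can be passed to chmod, so chmod S_IRUSR | S_IWUSR | S_IRGRP, $file is another way of writing chmod 0640, $file.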

Linking, Unlinking, and Renaming Files

The presence of filenames can be manipulated directly with the link and unlink built-in functions. These provide the ability to edit the entries for files in the filesystem, creating new ones or removing existing ones. They are not the same as creating and deleting files, however. On platforms that support the concept, link creates a new link (entry in the filing system) to an existing file; it does not create a copy (except on Windows, where it does exactly this). Likewise, unlink removes a filename from the filing system, but if the file has more than one link, and therefore more than one filename, the file will persist. This is an important point to grasp, because it often leads to confusion.


link will not create links for directories, though it will create links for all other types of file. For directories we can create symbolic links only. Additionally, we cannot create hard links between different file systems, nor between directories on some file systems (for example, AFS). On UNIX, link works by giving two names in the file system the same underlying inode. On Windows and other file systems that do not have this concept, an attempt to link will create a copy of the original file.

On success, link returns true and a new filename will exist for the file. The old one continues to exist, and either name can be used to read or alter the contents of the file. Both links are therefore exactly equivalent.
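This equivalence can be observed with a short sketch, assuming a UNIX-like filesystem that supports hard links. The file names are invented for the example; the script checks that both names share one inode and that data written through one name is visible through the other.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir      = tempdir(CLEANUP => 1);
my $original = "$dir/original.txt";
my $second   = "$dir/second.txt";

open my $out, '>', $original or die "open failed: $!";
print $out "written through the first name\n";
close $out or die "close failed: $!";

link $original, $second or die "link failed: $!";

# Both names now refer to the same inode, whose link count is two.
my ($dev1, $ino1, undef, $nlink) = stat $original;
my ($dev2, $ino2)                = stat $second;
my $same_file = ($dev1 == $dev2 && $ino1 == $ino2) ? 1 : 0;
print "same file: $same_file, links: $nlink\n";

# Data appended through one name is visible through the other.
open my $append, '>>', $second or die "open failed: $!";
print $append "appended through the second name\n";
close $append or die "close failed: $!";

open my $in, '<', $original or die "open failed: $!";
my @lines = <$in>;
close $in;
print "lines read through the first name: ", scalar(@lines), "\n";
```

Comparing the device and inode numbers of two names is also a general way of finding out whether they refer to the same file.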

This equivalence extends to permissions and ownership. On UNIX these are stored in the inode and not in the directory entry, so the new link carries the same permissions and ownership as the original, and a later chmod or chown applied through one name is immediately visible through the other. A hard link therefore cannot be used to create, for example, a read-only and a read-write entry point to the same data.

unlink <*.bak>; # the same, via a file glob

unlink is not necessarily the same as deleting a file, for two reasons. First, if the file has more than one link then it will still be available by other names in the file system. Although we cannot (easily) find out the names of the other links, we can find out how many links a file has through stat. We can therefore establish in advance if unlink will remove the file from the file system completely, or just one of the links for it, by calling stat:

$links = (stat $filename)[3];

Or more legibly with the File::stat module:

use File::stat;

$stat = stat($filename);

$links = $stat->nlink;
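Wrapped up as a helper, the check might look like the following sketch. The name is_last_link and the file names are hypothetical, and the answer is only advisory: another process could add or remove a link between the call to stat and the call to unlink.

```perl
use strict;
use warnings;
use File::stat;                     # stat now returns an object
use File::Temp qw(tempdir);

# True if unlinking $filename would remove the file's last remaining name.
# is_last_link is a hypothetical helper for illustration.
sub is_last_link {
    my ($filename) = @_;
    my $stat = stat($filename) or return;
    return $stat->nlink == 1;
}

my $dir   = tempdir(CLEANUP => 1);
my $file  = "$dir/data.txt";
my $alias = "$dir/alias.txt";

open my $fh, '>', $file or die "open failed: $!";
close $fh;
link $file, $alias or die "link failed: $!";

my $before = is_last_link($file);   # false: two names exist
unlink $alias or die "unlink failed: $!";
my $after  = is_last_link($file);   # true: only one name is left

print "before: ", ($before ? "last link" : "other links exist"), "\n";
print "after:  ", ($after  ? "last link" : "other links exist"), "\n";
```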


Second, on platforms that support it (generally UNIX-like ones), if any process has an open filehandle for the file then it will persist for as long as the filehandle persists. This means that even after an unlink has completely removed all links to a file it will still exist and can be read, written, and have its contents copied to a new file. Indeed, the new_tmpfile method of IO::File does exactly this, of which more will be said later in the chapter. Other platforms (such as Windows) will generally reject the attempt to unlink the file so long as a process holds an open filehandle on it.
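On a UNIX-like system this behavior is easy to demonstrate. The sketch below, which uses an invented file name inside a temporary directory, removes the only name of a file while holding it open and then continues to write to and read from it.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/scratch.txt";

# Open for reading and writing, then remove the file's only name.
open my $fh, '+>', $file or die "open failed: $!";
unlink $file or die "unlink failed: $!";
my $name_gone = (-e $file) ? 0 : 1;

# The data is still reachable through the open handle.
print $fh "still here\n";
seek $fh, 0, 0 or die "seek failed: $!";
my $line = <$fh>;
print "name gone: $name_gone, read back: $line";

close $fh;      # only now can the filesystem reclaim the space
```

This is a convenient way of creating an anonymous scratch file that is guaranteed to disappear when the program exits, however it exits.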

Note that unlink will not unlink directories unless we are on UNIX, Perl was given the -U flag, and we have superuser privilege. Even so, it is an inadvisable thing to do, since it will also remove the directory contents, including any subdirectories and their contents, from the filing system hierarchy, but not recycle the space that they occupy on the disk. Instead they will appear in the lost+found directory the next time an fsck filing system check is performed, which is unlikely to be what we intended. The rmdir built-in command covered later in the chapter is the preferred approach, or the rmtree function from File::Path for more advanced applications involving multiple directories.

}

The built-in rename function is essentially equivalent to the above subroutine:

# using the built-in function:

rename($current, $new);

This is effective for simple cases but it will fail in a number of situations, most notably if the new filename is on a different filesystem from the old (a floppy disk to a hard drive, for instance). rename uses the rename system call, if available. However, on many systems it is equivalent to the link/unlink subroutine above. For a properly portable solution that works across all platforms, consider using the move routine from the File::Copy module, which has been specifically written to handle most special cases.
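The three approaches can be compared in one short sketch. The link_rename subroutine is our own illustration of a rename built from link and unlink, not necessarily the listing referred to above; unlike the built-in rename it fails if the new name already exists, and it cannot cross filesystems.

```perl
use strict;
use warnings;
use File::Copy qw(move);
use File::Temp qw(tempdir);

# A rename built from link and unlink; it only works within one filesystem.
# link_rename is a hypothetical name chosen for this illustration.
sub link_rename {
    my ($current, $new) = @_;
    link $current, $new or return 0;    # create the new name
    unlink $current     or return 0;    # then remove the old one
    return 1;
}

my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/a.txt" or die "open failed: $!";
close $fh;

link_rename("$dir/a.txt", "$dir/b.txt") or die "link/unlink rename failed: $!";
rename "$dir/b.txt", "$dir/c.txt"       or die "rename failed: $!";
move("$dir/c.txt", "$dir/d.txt")        or die "move failed: $!";

print "final name exists\n" if -e "$dir/d.txt";
```

Of the three, move is the one that falls back to copying the file and deleting the original when a simple rename is not possible.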

Symbolic Links

On platforms that support it, we can also create a soft or symbolic link with the built-in symlink function. This is syntactically identical to link but creates a pointer to the file rather than a direct hard link:

unless (symlink $currentname, $newname) {

die "Failed to link: $! \n";

}
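A fuller sketch, assuming a platform that supports symbolic links, shows how the link can be inspected afterwards. The file names are invented for the example; readlink returns the path stored in the link, while stat and lstat report on the target and on the link itself respectively.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir    = tempdir(CLEANUP => 1);
my $target = "$dir/target.txt";
my $link   = "$dir/shortcut";

open my $fh, '>', $target or die "open failed: $!";
binmode $fh;
print $fh "some content\n";
close $fh or die "close failed: $!";

symlink $target, $link or die "Failed to link: $!\n";

# readlink returns the path stored in the link itself.
my $points_to = readlink $link;

# stat follows the link to the target; -l and lstat look at the link.
my $target_size = (stat $link)[7];
my $is_link     = (-l $link) ? 1 : 0;

print "$link -> $points_to ($target_size bytes, symlink: $is_link)\n";
```

Unlike a hard link, the symbolic link is left dangling if the target is removed: stat on the link then fails, while lstat still succeeds.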

Title	Input and Output with Filehandles
Institution	Wrox Press
Subject	Professional Perl Programming
Type	book
Year of publication	2001
City	Birmingham
