
Classic Shell Scripting, Part 7


Document information: 44 pages, 875.45 KB


$ ls                          # Show that we have an empty directory
$ ln -s ... two               # Create a soft (symbolic) link to a nonexistent file
$ file two                    # Diagnose this file
$ find .                      # Find all files
.
./two
$ find . -type l              # Find soft links only
./two
$ find . -type l -follow      # Find soft links and try to follow them

The -links option requires a following integer number. If it is unsigned, it selects only files having that many hard links. If it is negative, only files with fewer than that many (in absolute value) links are selected. If it has a plus sign, then only files with more than that many links are selected. Thus, the usual way to find files with hard links is to use -links +1.

The -atime (access time), -ctime (inode-change time), and -mtime (modification time) options require a following integer number, measured in days. If unsigned, it means exactly that many days old. If negative, it means less than that absolute value. With a plus sign, it means more than that value. A common idiom is find . -mtime -7, which finds files modified in the last week.

It is regrettable that find does not allow the number to have a fractional part or a units suffix: we've often wanted to specify units of years, months, weeks, hours, minutes, or seconds with these options. GNU find provides the -amin, -cmin, and -mmin options, which take values in minutes, but units suffixes on the original timestamp selection options would have been more general.

A related option, -newer filename, selects only files modified more recently than the specified file. If you need finer granularity than a day, you can create an empty file with touch -t date_time timestampfile, and then use that file with the -newer option. If you want to find files older than that file, negate the selector: ! -newer timestampfile.

The find command selector options can be combined: all must match for the action to be taken. They can be joined with the -a (AND) option if you wish. There is also a -o (OR) option that specifies that at least one selector of the surrounding pair must match.

The -a and -o operators, together with the grouping options \( and \), can be used to create complex Boolean selectors. You'll rarely need them, and when you do, you'll find them complex enough that you'll want to hide them in a script once they are debugged, and then just use that script happily ever after.
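The selectors described above can be tried on a throwaway tree. This is a minimal sketch, assuming a POSIX find and touch -t plus the widely available (but non-POSIX) mktemp; the function name find_selector_demo and every filename in it are hypothetical.

```shell
#!/bin/sh
# Sketch: try -links, -newer, ! -newer, and a grouped -o on a scratch tree.
# find_selector_demo and all filenames below are hypothetical.
find_selector_demo() (
    base=$(mktemp -d) || exit 1
    mkdir "$base/tree" && cd "$base/tree" || exit 1

    printf 'data\n' > a.txt
    ln a.txt hard.txt                     # a.txt and hard.txt share an inode: 2 links
    printf 'log\n' > b.log
    : > empty.txt                         # zero-length file
    printf 'old\n' > old.txt
    touch -t 200001010000 old.txt         # backdate to 2000-01-01 00:00
    touch -t 201001010000 "$base/stamp"   # timestamp file, kept outside the tree

    echo 'links:'                         # more than one hard link
    find . -type f -links +1 | LC_ALL=C sort
    echo 'newer:'                         # modified after the stamp
    find . -type f -newer "$base/stamp" | LC_ALL=C sort
    echo 'older:'                         # negated: not newer than the stamp
    find . -type f ! -newer "$base/stamp" | LC_ALL=C sort
    echo 'log-or-empty:'                  # grouped OR; -a is implied before \(
    find . -type f \( -name '*.log' -o -size 0 \) | LC_ALL=C sort

    cd / && rm -rf "$base"
)

find_selector_demo
```

Note that the escaped parentheses keep -type f applied to both alternatives of the -o; without them, -a would bind more tightly than -o.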

10.4.3.2 A simple find script

So far, our find examples have simply produced lists of files, sometimes feeding them into a simple pipeline. Now let's look at a slightly more complex example. In an earlier section, we presented a simple sed script to (begin to) convert HTML to XHTML, the XML-based version of HTML. Combining sed with find and a simple loop accomplishes the task in just a few lines of code:

# First cd to the top-level web site directory
find . -name '*.html' -type f |                             # Find all HTML files
    while IFS= read -r file                                 # Read filename into variable
    do
        echo "$file"                                        # Print progress
        mv "$file" "$file.save"                             # Save a backup copy
        sed -f "$HOME/html2xhtml.sed" < "$file.save" > "$file"   # Make the change
    done

10.4.3.3 A complex find script

In this section, we develop a real working example of find's virtuosity.[8] It is a shell script named filesdirectories that some of our local users with large home-directory trees run nightly via the crontab system (see Section 13.6.4) to create several lists of files and directories, grouped by the number of days within which they have been changed. This helps remind them of their recent activities, and provides a much faster way to search their trees for particular files by searching a single list file rather than the filesystem itself.

[8] Our thanks go to Pieter J. Bowman at the University of Utah for this example.

filesdirectories requires GNU find for access to the -fprint option, which permits multiple output files to be created in one pass through the directory tree, producing a tenfold speedup for this script over a version that used multiple invocations of the original Unix find.
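To see what -fprint buys, here is a minimal sketch, assuming GNU find (where -fprint is available): one traversal writes regular files to one list and directories to another. The function name fprint_demo and the list names files.lst and dirs.lst are hypothetical.

```shell
#!/bin/sh
# Sketch: one GNU find traversal that writes two lists via -fprint.
# fprint_demo and the list names files.lst / dirs.lst are hypothetical.
fprint_demo() (
    base=$(mktemp -d) || exit 1
    mkdir -p "$base/tree/sub" && cd "$base/tree" || exit 1
    : > top.txt
    : > sub/inner.txt

    # Regular files go to one list and directories to the other,
    # all in a single pass over the tree.
    find . -type f -fprint "$base/files.lst" \
        -o -type d -fprint "$base/dirs.lst"

    echo 'files:'
    LC_ALL=C sort "$base/files.lst"
    echo 'dirs:'
    LC_ALL=C sort "$base/dirs.lst"

    cd / && rm -rf "$base"
)

fprint_demo
```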

The script begins with the usual security features, starting with the #! line:

#! /

It then sets the IFS variable to newline-space-tab:

IFS='
 	'

and sets the PATH variable to ensure that GNU find is found first:

PATH=/usr/local/bin:/bin:/usr/bin    # need GNU find for -fprint option
export PATH

It then checks for the expected single argument; if it is missing, the script prints a brief error message on standard error and exits with a nonzero status value:

if [ $# -ne 1 ]
then
    echo "Usage: $0 directory" >&2
    exit 1
fi

As a final security feature, the script invokes umask to limit access to the owner of the output files:

umask 077                            # ensure file privacy

It lets the TMPDIR environment variable override the default temporary directory:

TMP=${TMPDIR:-/tmp}                  # allow alternate temporary directory

It then initializes TMPFILES to a long list of temporary files that collect the output:

TMPFILES="
    $TMP/DIRECTORIES.all.$$ $TMP/DIRECTORIES.all.$$.tmp
    $TMP/DIRECTORIES.last01.$$ $TMP/DIRECTORIES.last01.$$.tmp
    $TMP/DIRECTORIES.last02.$$ $TMP/DIRECTORIES.last02.$$.tmp
    $TMP/DIRECTORIES.last07.$$ $TMP/DIRECTORIES.last07.$$.tmp
    $TMP/DIRECTORIES.last14.$$ $TMP/DIRECTORIES.last14.$$.tmp
    $TMP/DIRECTORIES.last31.$$ $TMP/DIRECTORIES.last31.$$.tmp
    $TMP/FILES.all.$$ $TMP/FILES.all.$$.tmp
    $TMP/FILES.last01.$$ $TMP/FILES.last01.$$.tmp
    $TMP/FILES.last02.$$ $TMP/FILES.last02.$$.tmp
    $TMP/FILES.last07.$$ $TMP/FILES.last07.$$.tmp
    $TMP/FILES.last14.$$ $TMP/FILES.last14.$$.tmp
    $TMP/FILES.last31.$$ $TMP/FILES.last31.$$.tmp
"

These output files contain the names of directories and files in the entire tree (*.all.*), as well as the names of those modified in the last day (*.last01.*), last two days (*.last02.*), and so on.

The WD variable saves the argument directory name for later use, and then the script changes to that directory:

WD=$1
cd "$WD" || exit 1

Changing the working directory before running find solves two problems:

• If the argument is not a directory, or is but lacks the needed permissions, then the cd command fails, and the script terminates immediately with a nonzero exit value.

• If the argument is a symbolic link, cd follows the link to the real location. find does not follow symbolic links unless given extra options, but there is no way to tell it to do so only for the top-level directory. In practice, we do not want filesdirectories to follow links in the directory tree, although it is straightforward to add an option to do so.

The trap commands ensure that the temporary files are removed when the script terminates:

trap 'exit 1' HUP INT PIPE QUIT TERM
trap 'rm -f $TMPFILES' EXIT

The exit status value is preserved across the trap (see Section 13.3.2).
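The cleanup and exit-status behavior of these two traps can be checked with a small experiment. This sketch assumes a POSIX sh; trap_demo and the child script child.sh are hypothetical. The child installs the same two traps as filesdirectories, creates its temporary files, and exits with status 3.

```shell
#!/bin/sh
# Sketch: check that an EXIT trap removes temporary files and that the
# script's own exit status survives the trap. trap_demo is hypothetical.
trap_demo() {
    dir=$(mktemp -d) || return 1
    # Child script using the same two traps as filesdirectories.
    cat > "$dir/child.sh" <<'EOF'
TMPFILES="$1/a.$$ $1/b.$$"
trap 'exit 1' HUP INT PIPE QUIT TERM
trap 'rm -f $TMPFILES' EXIT
: > "$1/a.$$"
: > "$1/b.$$"
exit 3
EOF
    sh "$dir/child.sh" "$dir"
    status=$?                                # exit status seen by the parent
    rm -f "$dir/child.sh"
    left=$(ls "$dir" | wc -l | tr -d ' ')    # temp files still present
    rmdir "$dir"
    echo "status=$status leftover=$left"
}

trap_demo    # prints: status=3 leftover=0
```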

Next comes the multiline find command, which does all of the hard work. The lines with the -name option match the names of the output files from a previous run, and the -true option causes them to be ignored so that they do not clutter the output reports:

find . \
    -name DIRECTORIES.all -true \
    -o -name 'DIRECTORIES.last[0-9][0-9]' -true \
    -o -name FILES.all -true \
    -o -name 'FILES.last[0-9][0-9]' -true \

The next line matches all ordinary files, and the -fprint option writes their names to $TMP/FILES.all.$$:

    -o -type f -fprint $TMP/FILES.all.$$ \

The next five lines select files modified in the last 31, 14, 7, 2, and 1 days (the -type f selector is still in effect), and the -fprint option writes their names to the indicated temporary files:

    -a -mtime -31 -fprint $TMP/FILES.last31.$$ \
    -a -mtime -14 -fprint $TMP/FILES.last14.$$ \
    -a -mtime -7 -fprint $TMP/FILES.last07.$$ \
    -a -mtime -2 -fprint $TMP/FILES.last02.$$ \
    -a -mtime -1 -fprint $TMP/FILES.last01.$$ \

The tests are made in order from oldest to newest because each set of files is a subset of the previous ones, reducing the work at each step. Thus, a ten-day-old file will pass the first two -mtime tests, but will fail the next three, so it will be included only in the FILES.last31.$$ and FILES.last14.$$ files.

The next line matches directories, and the -fprint option writes their names to $TMP/DIRECTORIES.all.$$:

    -o -type d -fprint $TMP/DIRECTORIES.all.$$ \

The final five lines of the find command match subsets of directories (the -type d selector is still in effect) and write their names, just as for files earlier in the command:

    -a -mtime -31 -fprint $TMP/DIRECTORIES.last31.$$ \
    -a -mtime -14 -fprint $TMP/DIRECTORIES.last14.$$ \
    -a -mtime -7 -fprint $TMP/DIRECTORIES.last07.$$ \
    -a -mtime -2 -fprint $TMP/DIRECTORIES.last02.$$ \
    -a -mtime -1 -fprint $TMP/DIRECTORIES.last01.$$
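The subset behavior of the chained -mtime tests, including the ten-day-old case, can be observed directly. This minimal sketch assumes GNU find (for -fprint) and GNU touch (for -d '10 days ago'); mtime_chain_demo and the file and list names are hypothetical.

```shell
#!/bin/sh
# Sketch: replay the chained -mtime tests on files of known ages.
# Assumes GNU find (-fprint) and GNU touch (-d); names are hypothetical.
mtime_chain_demo() (
    base=$(mktemp -d) || exit 1
    mkdir "$base/tree" && cd "$base/tree" || exit 1
    : > new.txt
    : > three.txt && touch -d '3 days ago' three.txt
    : > ten.txt && touch -d '10 days ago' ten.txt

    # Same shape as the filesdirectories file tests: each -fprint is
    # reached only if every earlier -mtime test in the chain succeeded.
    find . -type f -fprint "$base/all" \
        -a -mtime -31 -fprint "$base/last31" \
        -a -mtime -14 -fprint "$base/last14" \
        -a -mtime -7 -fprint "$base/last07" \
        -a -mtime -2 -fprint "$base/last02" \
        -a -mtime -1 -fprint "$base/last01"

    # Report which lists each file landed in.
    for f in ./new.txt ./three.txt ./ten.txt
    do
        lists=
        for l in last31 last14 last07 last02 last01
        do
            grep -qxF "$f" "$base/$l" && lists="$lists $l"
        done
        echo "$f:$lists"
    done

    cd / && rm -rf "$base"
)

mtime_chain_demo
```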

When the find command finishes, its preliminary reports are available in the temporary files. The script then processes them in a loop over the report files:

for i in FILES.all FILES.last31 FILES.last14 FILES.last07 \
         FILES.last02 FILES.last01 DIRECTORIES.all \
         DIRECTORIES.last31 DIRECTORIES.last14 \
         DIRECTORIES.last07 DIRECTORIES.last02 DIRECTORIES.last01
do

sed replaces the prefix ./ in each report line with the user-specified directory name so that the output files contain full, rather than relative, pathnames. The second expression handles the top-level directory itself, which find reports as a bare dot, and using = rather than / as the delimiter means that the slashes in $WD need no escaping:

    sed -e "s=^[.]/=$WD/=" -e "s=^[.]$=$WD=" $TMP/$i.$$ |

sort orders the results from sed into a temporary file named by the input filename suffixed with .tmp:

        LC_ALL=C sort > $TMP/$i.$$.tmp

Setting LC_ALL to C produces the traditional Unix sort order that we have long been used to, and avoids surprise and confusion when more modern locales are set. Using the traditional order is particularly helpful in our diverse environments because our systems differ in their default locales.

The cmp command silently checks whether the report file differs from that of a previous run, and if so, replaces the old one:

    cmp -s $TMP/$i.$$.tmp $i || mv $TMP/$i.$$.tmp $i

Otherwise, the temporary file is left for cleanup by the trap handler.

The final statement of the script completes the loop:

done

At runtime, the script terminates via the EXIT trap set earlier.
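The per-report step of the loop can be exercised on its own. This is a minimal sketch assuming POSIX sed, sort, and cmp; update_report, its three arguments, and the sample directory name /home/jones are hypothetical. As in the script, a directory name containing = would confuse the sed commands.

```shell
#!/bin/sh
# Sketch of one pass through the filesdirectories report loop.
# update_report and its arguments are hypothetical names.
update_report() {
    raw=$1 wd=$2 report=$3     # raw find output, directory name, report file
    # Turn ./name into $wd/name and a bare "." into $wd, then sort bytewise.
    sed -e "s=^[.]/=$wd/=" -e "s=^[.]\$=$wd=" "$raw" |
        LC_ALL=C sort > "$report.tmp"
    if cmp -s "$report.tmp" "$report"
    then
        rm -f "$report.tmp"           # unchanged: keep the old report and its timestamp
        echo unchanged
    else
        mv "$report.tmp" "$report"    # new or different: install it
        echo updated
    fi
}

work=$(mktemp -d)
printf './b\n.\n./a\n' > "$work/raw"
update_report "$work/raw" /home/jones "$work/FILES.all"    # first run: updated
update_report "$work/raw" /home/jones "$work/FILES.all"    # same input: unchanged
cat "$work/FILES.all"
rm -rf "$work"
```

Here the sketch removes the unchanged temporary file itself, whereas filesdirectories leaves that job to its EXIT trap.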

The complete filesdirectories script is collected in Example 10-1. Its structure should be clear enough that you can easily modify it to add other report files, such as for files and directories modified in the last quarter, half year, and year. By changing the sign of the -mtime values, you can get reports of files that have not been recently modified, which might be helpful in tracking down obsolete files.

Example 10-1. A complex shell script for find

# Find all files and directories, and groups of
# recently modified ones, in a directory tree, creating
# lists in FILES.* and DIRECTORIES.* at top level.
#
# Usage:
#       filesdirectories directory

IFS='
 	'

PATH=/usr/local/bin:/bin:/usr/bin    # need GNU find for -fprint option
export PATH

if [ $# -ne 1 ]
then
    echo "Usage: $0 directory" >&2
    exit 1
fi

umask 077                            # ensure file privacy
TMP=${TMPDIR:-/tmp}                  # allow alternate temporary directory
TMPFILES="
    $TMP/DIRECTORIES.all.$$ $TMP/DIRECTORIES.all.$$.tmp
    $TMP/DIRECTORIES.last01.$$ $TMP/DIRECTORIES.last01.$$.tmp
    $TMP/DIRECTORIES.last02.$$ $TMP/DIRECTORIES.last02.$$.tmp
    $TMP/DIRECTORIES.last07.$$ $TMP/DIRECTORIES.last07.$$.tmp
    $TMP/DIRECTORIES.last14.$$ $TMP/DIRECTORIES.last14.$$.tmp
    $TMP/DIRECTORIES.last31.$$ $TMP/DIRECTORIES.last31.$$.tmp
    $TMP/FILES.all.$$ $TMP/FILES.all.$$.tmp
    $TMP/FILES.last01.$$ $TMP/FILES.last01.$$.tmp
    $TMP/FILES.last02.$$ $TMP/FILES.last02.$$.tmp
    $TMP/FILES.last07.$$ $TMP/FILES.last07.$$.tmp
    $TMP/FILES.last14.$$ $TMP/FILES.last14.$$.tmp
    $TMP/FILES.last31.$$ $TMP/FILES.last31.$$.tmp
"

WD=$1
cd "$WD" || exit 1

trap 'exit 1'          HUP INT PIPE QUIT TERM
trap 'rm -f $TMPFILES' EXIT

find . \
    -name DIRECTORIES.all -true \
    -o -name 'DIRECTORIES.last[0-9][0-9]' -true \
    -o -name FILES.all -true \
    -o -name 'FILES.last[0-9][0-9]' -true \
    -o -type f -fprint $TMP/FILES.all.$$ \
    -a -mtime -31 -fprint $TMP/FILES.last31.$$ \
    -a -mtime -14 -fprint $TMP/FILES.last14.$$ \
    -a -mtime -7 -fprint $TMP/FILES.last07.$$ \
    -a -mtime -2 -fprint $TMP/FILES.last02.$$ \
    -a -mtime -1 -fprint $TMP/FILES.last01.$$ \
    -o -type d -fprint $TMP/DIRECTORIES.all.$$ \
    -a -mtime -31 -fprint $TMP/DIRECTORIES.last31.$$ \
    -a -mtime -14 -fprint $TMP/DIRECTORIES.last14.$$ \
    -a -mtime -7 -fprint $TMP/DIRECTORIES.last07.$$ \
    -a -mtime -2 -fprint $TMP/DIRECTORIES.last02.$$ \
    -a -mtime -1 -fprint $TMP/DIRECTORIES.last01.$$

for i in FILES.all FILES.last31 FILES.last14 FILES.last07 \
         FILES.last02 FILES.last01 DIRECTORIES.all \
         DIRECTORIES.last31 DIRECTORIES.last14 \
         DIRECTORIES.last07 DIRECTORIES.last02 DIRECTORIES.last01
do
    sed -e "s=^[.]/=$WD/=" -e "s=^[.]$=$WD=" $TMP/$i.$$ |
        LC_ALL=C sort > $TMP/$i.$$.tmp
    cmp -s $TMP/$i.$$.tmp $i || mv $TMP/$i.$$.tmp $i
done

10.4.3.4 Finding Problem Files


In Section 10.1, we noted the difficulties presented by filenames containing special characters, such as newline. GNU find has the -print0 option to display filenames as NUL-terminated strings. Since pathnames can legally contain any character except NUL, this option provides a way to produce lists of filenames that can be parsed unambiguously.

It is hard to parse such lists with typical Unix tools, most of which assume line-oriented text input. However, in a compiled language with byte-at-a-time input, such as C, C++, or Java, it is straightforward to write a program to diagnose the presence of problematic filenames in your filesystem. Sometimes they get there by simple accident; sometimes they are a way of disguising something. For example, suppose that a directory listing appeared to show the two special hidden dotted files for the current and parent directory. However, notice that we did not use the -a option, so we should not have seen any hidden files, and also, there appears to be a space before the first dot in the output. Something is just not right! Let's apply find and od to investigate further:

$ find . -print0 | od -ab          # Convert NUL-terminated list to octal and ASCII
0000000 nul / sp nul / sp nul /
056 056 000 056 057 056 sp sp nl

Now we can see what is going on: we have the normal dot directory, then a file named space-dot, and another whose name contains the sequence dot-dot-space-dot-dot-space-dot-dot-space-dot-space-newline-newline-newline-space-space. Unless someone was practicing Morse code in your filesystem, these files look awfully suspicious, and you should investigate them further before you get rid of them.

10.5 Running Commands: xargs

When find produces a list of files, it is often useful to be able to supply that list as arguments to another command. Normally, this is done with the shell's command substitution feature, as in this example of searching for the symbol POSIX_OPEN_MAX in system header files:

$ grep POSIX_OPEN_MAX /dev/null $(find /usr/include -type f | sort)


/usr/include/limits.h:#define _POSIX_OPEN_MAX 16

Whenever you write a program or a command that deals with a list of objects, you should make sure that it behaves properly if the list is empty. Because grep reads standard input when it is given no file arguments, we supplied an argument of /dev/null to ensure that it does not hang waiting for terminal input if find produces no output: that will not happen here, but it is good to develop defensive programming habits.

The output from the substituted command can sometimes be so long that the limit on the combined length of a command line and its environment variables is exceeded. When that happens, you'll see this instead:

$ grep POSIX_OPEN_MAX /dev/null $(find /usr/include -type f | sort)
/usr/local/bin/grep: Argument list too long

That limit can be found with getconf:

$ getconf ARG_MAX                  # Get system configuration value of ARG_MAX
131072

On the systems that we tested, the reported values ranged from a low of 24,576 (IBM AIX) to a high of 1,048,320 (Sun Solaris).

The solution to the ARG_MAX problem is provided by xargs: it takes a list of arguments on standard input, one per line, and feeds them in suitably sized groups (determined by the host's value of ARG_MAX) to another command given as arguments to xargs. Here is an example that eliminates the obnoxious Argument list too long error:

$ find /usr/include -type f | xargs grep POSIX_OPEN_MAX /dev/null
/usr/include/bits/posix1_lim.h:#define _POSIX_OPEN_MAX 16
/usr/include/bits/posix1_lim.h:#define _POSIX_FD_SETSIZE _POSIX_OPEN_MAX

Here, the /dev/null argument ensures that grep always sees at least two file arguments, causing it to print the filename at the start of each reported match. If xargs gets no input filenames, it terminates silently without even invoking its argument program. (GNU xargs is an exception: by default it runs the command once anyway, unless given the -r option.)

GNU xargs has the --null option to handle the NUL-terminated filename lists produced by GNU find's -print0 option. xargs passes each such filename as a complete argument to the command that it runs, without danger of shell (mis)interpretation or newline confusion; it is then up to that command to handle its arguments sensibly.

xargs has options to control where the arguments are substituted, and to limit the number of arguments passed to one invocation of the argument command. The GNU version can even run multiple argument processes in parallel. However, the simple form shown here suffices most of the time. Consult the xargs(1) manual pages for further details, and for examples of some of the wizardry possible with its fancier features.

10.6 Filesystem Space Information

With suitable options, the find and ls commands report file sizes, so with the help of a short awk program, you can report how many bytes your files occupy:

$ find . -ls | awk '{Sum += $7} END {printf("Total: %.0f bytes\n", Sum)}'
Total: 23079017 bytes

However, that report underestimates the space used, because files are allocated in fixed-size blocks, and it tells us nothing about the used and available space in the entire filesystem. Two other useful tools provide better solutions: df and du.
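The gap between a byte count and allocated space is easy to see on a tiny tree. This sketch is an illustration under stated assumptions rather than the book's method: it totals bytes with cat and wc -c instead of relying on the find -ls column layout, and compares the result with du -s -k. space_demo is a hypothetical name, and the du figure depends on the filesystem's block size.

```shell
#!/bin/sh
# Sketch: byte total of a tree's files versus du's allocated-space figure.
# space_demo is a hypothetical name; the du result depends on the filesystem.
space_demo() {
    dir=$(mktemp -d) || return 1
    printf 'x' > "$dir/one"          # 1 byte
    printf 'hello\n' > "$dir/two"    # 6 bytes
    bytes=$(find "$dir" -type f -exec cat {} + | wc -c | tr -d ' ')
    kbytes=$(du -s -k "$dir" | awk '{ print $1 }')
    rm -rf "$dir"
    echo "bytes=$bytes du_kbytes=$kbytes"
}

space_demo
```

On a filesystem with 4 KB blocks, the du figure is typically far larger than the 7 bytes of data, which is the underestimate the text describes.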


10.6.1 The df Command

df (disk free) gives a one-line summary of used and available space on each mounted filesystem. The units are system-dependent blocks on some systems, and kilobytes on others. Most modern implementations support the -k option to force kilobyte units, and the -l (lowercase L) option to include only local filesystems, excluding network-mounted ones. Here is a typical example from one of our web servers:

Filesystem           1K-blocks      Used Available Use% Mounted on
...                        38M      7.9M       29M  22% /boot
...                       9.7G      6.2G      3.0G  68% /export
none                      502M         0      502M   0% /dev/shm
/dev/sda8                  99M      4.4M       90M   5% /tmp
...                                                  3% /var
...                                                  4% /ww

The order of the report lines is arbitrary, but the presence of the one-line header makes it harder to apply sort. Fortunately, on most systems, the output is only a few lines long.

Usage
df [ options ] [ files-or-directories ]

-l
Lowercase L. Show only local filesystems.

Behavior
For each file or directory argument, or for all filesystems if there are no such arguments, df produces a one-line header that identifies the output columns, followed by a usage report for the filesystem containing that file or directory.

Caveats
The output of df varies considerably between systems, making it hard to use reliably in portable shell scripts.
Space reports for remote filesystems may be inaccurate.
df's output is not sorted.
Reports represent only a single snapshot that might be quite different a short time later in an active multiuser system.

You can supply a list of one or more filesystem names or mount points to limit the output to just those:

$ df -lk /dev/sda6 /var
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda6              4032092   1684660   2142608  45% /ww
/dev/sda9             13432904    269704  12480844   3% /var

df's reports about the free space on remote filesystems may be inaccurate, because of software implementation inconsistencies in accounting for the space reserved for emergency use.

For network-mounted filesystems, entries in the Filesystem column are prefixed by hostname:, making the column wide enough that some df implementations split the display into two lines, which is a nuisance for software that parses the output. Here's an example from a Sun Solaris system:

Filesystem           1k-blocks      Used Available Use% Mounted on
fs:/export/home/0075
                      35197586  33528481   1317130  97% /a/fs/export/home/0075

In Section B.4.3 in Appendix B, we discuss the issue that the inode table in a filesystem has an immutable size that is set when the filesystem is created. The -i (inode units) option provides a way to assess inode usage. Here is an example:

Filesystem            Inodes   IUsed   IFree IUse% Mounted on

The /ww filesystem is in excellent shape, since its inode use and filesystem space are both just over 40 percent of capacity. For a healthy computing system, system managers should routinely monitor inode usage on their filesystems.


df is one of those commands where there is wide variation in the options and output appearance, which again is a nuisance for portable programs that want to parse its output. Hewlett-Packard's implementation on HP-UX is radically different, but fortunately, HP provides a Berkeley-style equivalent, bdf, that produces output that is similar to our example. To deal with this variation, we recommend that you install the GNU version everywhere at your site; it is part of the package cited in Section 4.1.5.
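Where installing GNU df everywhere is not an option, one way to reduce the parsing problem is to request the POSIX output format: df -P prints one line per filesystem in fixed columns, and -k selects 1024-byte units. This is a minimal sketch, assuming a df that supports -P; fs_over is a hypothetical helper, and mount points containing spaces would be truncated by awk's field splitting.

```shell
#!/bin/sh
# Sketch: list filesystems whose capacity is at or above a threshold,
# using POSIX-format df output. fs_over is a hypothetical helper name.
fs_over() {
    # $1 = threshold in percent. -P keeps each entry on one line;
    # columns: Filesystem 1024-blocks Used Available Capacity Mounted-on.
    df -P -k | awk -v limit="$1" 'NR > 1 {
        pct = $5
        sub(/%$/, "", pct)                 # "45%" -> "45"
        if (pct + 0 >= limit) print $6, $5
    }'
}

fs_over 90    # mount points that are at least 90% full
```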

10.6.2 The du Command

df summarizes free space by filesystem, but does not tell you how much space a particular directory tree requires. That job is done by du (disk usage). Like its companion, df, du's options tend to vary substantially between systems, and its space units also may vary. Two important options are widely implemented: -k (kilobyte units) and -s (summarize). Here are examples from our web server system:

$ du -s /tmp

The GNU version provides the -h (human-readable) option:

$ du -h -s /var/log /var/spool /var/tmp

du does not count extra hard links to the same file, and normally ignores soft links. However, some implementations provide options to force soft links to be followed, but the option names vary: consult the manual pages for your system.


-s
Show only a one-line summary for each argument.

Behavior
For each file or directory argument, or for the current directory if no such arguments are given, du normally produces one output line containing an integer representing the usage, followed by the name of the file or directory. Unless the -s option is given, each directory argument is searched recursively, with one report line for each nested directory.

Caveats
du's output is not sorted.

One common problem that du helps to solve is finding out who the big filesystem users are. Assuming that user home-directory trees reside in /home/users, root can do this:

# du -s -k /home/users/* | sort -k1nr | less       # Find large home directory trees

This produces a list of the top space consumers, from largest to smallest. A find dirs -size +10000 command (that is, files larger than 10,000 512-byte blocks, about 5 MB) in a few of the largest directory trees can quickly locate files that might be candidates for compression or deletion, and the du output can identify user directory trees that might better be moved to larger quarters.

Some managers automate the regular processing of du reports, sending warning mail to users with unexpectedly large directory trees, such as with the script in Example 7-1 in Chapter 7. In our experience, this is much better than using the filesystem quota system (see the manual pages for quota(1)), since it avoids assigning magic numbers (filesystem-space limits) to users; those numbers are invariably wrong, and they inevitably prevent people from getting legitimate work done.

There is nothing magic about how du works: like any other program, it has to descend through the filesystem and total up the space used by every file. Thus, it can be slow on large filesystems, and it can be locked out of directory trees by strict permissions; if its output contains Permission denied messages, its report undercounts the space usage. Generally, only root has sufficient privileges to use du everywhere in the local system.
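The du-and-sort idiom can be wrapped in a small function and checked on a scratch tree. This sketch assumes du -s -k and sort -k as used in the text; biggest_dirs is a hypothetical name, and the trailing slash in the glob restricts the measurement to subdirectories.

```shell
#!/bin/sh
# Sketch: rank the subdirectories of a directory by space used, largest first.
# biggest_dirs is a hypothetical helper name.
biggest_dirs() {
    # $1 = parent directory. The */ glob matches subdirectories only;
    # each output line is "kilobytes<TAB>directory/".
    du -s -k "$1"/*/ 2>/dev/null | sort -k1,1nr
}

biggest_dirs /home/users | head -n 10    # the ten largest home directory trees
```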

10.7 Comparing Files

In this section, we look at four related topics that involve comparing files:

• Checking whether two files are the same, and if not, finding how they differ

• Applying the differences between two files to recover one from the other

• Using checksums to find identical files

• Using digital signatures for file verification

10.7.1 The cmp and diff Utilities

A problem that frequently arises in text processing is determining whether the contents of two or more files are the same, even if their names differ.

If you have just two candidates, then the file comparison utility, cmp, readily provides the answer:

$ cp /bin/ls /tmp
$ cmp /bin/ls /tmp/ls              # Compare the original with the copy
                                   # No output means that the files are identical
$ cmp /bin/cp /bin/ls              # Compare different files
/bin/cp /bin/ls differ: char 27, line 1
                                   # Output identifies the location of the first difference

cmp is silent when its two argument files are identical. If you are interested only in its exit status, you can suppress the warning message with the -s option:

$ cmp -s /bin/cp /bin/ls           # Compare different files silently
$ echo $?                          # Display the exit code
1                                  # Nonzero value means that the files differ

If you want to know the differences between two similar files, diff does the job:

$ echo Test 1 > test.1             # Create first test file
$ echo Test 2 > test.2             # Create second test file
$ diff test.[12]                   # Compare the two files
1c1
< Test 1
---
> Test 2

It is conventional in using diff to supply the older file as the first argument.

Difference lines prefixed by a left angle bracket correspond to the left (first) file, and those prefixed by a right angle bracket come from the right (second) file. The 1c1 preceding the differences is a compact representation of the input-file line numbers where the difference occurred, and the operation needed to make the edit: here, c means change. In larger examples, you will usually also find a for add and d for delete.

diff's output is carefully designed so that it can be used by other programs. For example, revision control systems use diff to manage the differences between successive versions of files under their management.

There is an occasionally useful companion to diff that does a slightly different job: diff3 compares three files, such as a base version and modified files produced by two different people, and produces an ed-command script that can be used to merge both sets of modifications back into the base version. We do not illustrate it here, but you can find examples in the diff3(1) manual pages.

10.7.2 The patch Utility

The patch utility uses the output of diff and either of the original files to reconstruct the other one. Because the differences are generally much smaller than the original files, software developers often exchange difference listings via email, and use patch to apply them. Here is how patch can convert the contents of test.1 to match those of test.2:

$ diff -c test.[12] > test.dif     # Save a context difference in test.dif
$ patch < test.dif                 # Apply the differences
patching file test.1
$ cat test.1                       # Show the patched test.1 file
Test 2

patch applies as many of the differences as it can; it reports any failures for you to handle manually.

Although patch can use the ordinary output of diff, it is more common to use diff's -c option to get a context difference. That more verbose report tells patch the filenames, and allows it to verify the change location and to recover from mismatches. Context differences are not essential if neither of the two files has been changed since the differences were recorded, but in software development, quite often one or the other will have evolved.

10.7.3 File Checksum Matching

If you have lots of files that you suspect have identical contents, using cmp or diff would require comparing all pairs of them, leading to an execution time that grows quadratically in the number of files, which is soon intolerable.
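For a handful of files, the all-pairs approach is easy to write with cmp -s, and it makes the quadratic cost visible: N files need N(N-1)/2 comparisons. A minimal sketch; same_pairs is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report every pair of identical files by comparing all pairs
# with cmp -s (N files => N(N-1)/2 comparisons). same_pairs is hypothetical.
same_pairs() {
    while [ $# -gt 1 ]
    do
        first=$1
        shift                       # compare $first with every later argument
        for other in "$@"
        do
            if cmp -s "$first" "$other"
            then
                echo "$first = $other"
            fi
        done
    done
}

d=$(mktemp -d)
echo same > "$d/a"; echo same > "$d/b"; echo other > "$d/c"
same_pairs "$d/a" "$d/b" "$d/c"    # reports only the a/b pair
rm -rf "$d"
```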

You can get nearly linear performance by using file checksums. There are several utilities for computing checksums of files and strings, including sum, cksum, and checksum,[9] the message-digest tools[10] md5 and md5sum, and the secure-hash algorithm[11] tools sha, sha1sum, sha256, and sha384. Regrettably, implementations of sum differ across platforms, making its output useless for comparing checksums of files across different systems. Apart from the sum command, only a few of these programs are likely to be present on any particular system, but all are easy to build and install. Their output formats differ.

[10] R. Rivest, RFC 1321: The MD5 Message-Digest Algorithm, available at ftp://ftp.internic.net/rfc/rfc1321.txt. md5sum is part of the GNU coreutils package.

[11] NIST, FIPS PUB 180-1: Secure Hash Standard, April 1995, available at http://www.cerberussystems.com/INFOSEC/stds/fip180-1.htm, and implemented in the GNU coreutils package.

The long hexadecimal signature string is just a many-digit integer that is computed from all of the bytes of the file in such a way as to make it unlikely that any other byte stream could produce the same value. With good algorithms, longer signatures in general mean greater likelihood of uniqueness. The md5sum output has 32 hexadecimal digits, equivalent to 128 bits. Thus, the chance[12] of having two different files with identical signatures is only about one in $2^{64} \approx 1.84 \times 10^{19}$, which is probably negligible. Recent cryptographic research has demonstrated that it is possible to create families of pairs of files with the same MD5 checksum. However, creating a file with similar, but not identical, contents as an existing file, both with the same checksum, is likely to remain a difficult problem.

[12] If you randomly select an item from a collection of N items, each has a 1/N chance of being chosen. If you select M items, then of the M(M-1)/2 possible pairs, the chance of finding a pair with identical elements is approximately (M(M-1)/2)/N. That value reaches probability 1/2 for M about the square root of N. This is called the birthday paradox; you can find discussions of it in books on cryptography, number theory, and probability, as well as at numerous web sites. Its glossary entry includes a short proof and numerical examples.
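Plugging MD5's 128-bit signature size into the footnote's approximation reproduces the figure quoted above; this is a back-of-the-envelope check, not a rigorous bound:

```latex
P(M) \approx \frac{M(M-1)/2}{N}, \qquad
P(M) = \tfrac{1}{2} \;\Longrightarrow\; M(M-1) = N \;\Longrightarrow\; M \approx \sqrt{N}

N = 2^{128} \;\Longrightarrow\; M \approx \sqrt{2^{128}} = 2^{64} \approx 1.84 \times 10^{19}
```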


To find matches in a set of signatures, use them as indices into a table of signature counts, and report just those cases where the counts exceed one. awk is just the tool that we need, and the program in Example 10-2 is short and clear.

Example 10-2. Finding matching file contents

We can conclude, for example, that ed and red are identical programs on this system, although they may still vary their behavior according to the name that they are invoked with.

Files with identical contents are often links to each other, especially when found in system directories. identical-files provides more useful information when applied to user directories, where it is less likely that files are links and more likely that they're unintentional copies.
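A minimal sketch of the table-of-counts idea, assuming GNU md5sum output (32 hexadecimal digits, two spaces, then the filename). show_identical is a hypothetical name and is not the identical-files program of Example 10-2; filenames containing newlines or backslashes, which md5sum escapes, are not handled.

```shell
#!/bin/sh
# Sketch: group files by MD5 signature and print groups with more than one
# member. Assumes GNU md5sum lines of the form: 32 hex digits, 2 spaces, name.
# show_identical is a hypothetical helper, not the book's identical-files.
show_identical() {
    md5sum "$@" | awk '
    {
        sig = $1
        name = substr($0, 35)             # text after the digits and two spaces
        count[sig]++
        names[sig] = names[sig] " " name  # collect names in input order
    }
    END {
        for (sig in count)
            if (count[sig] > 1)
                print sig ":" names[sig]
    }'
}

d=$(mktemp -d)
printf 'same\n' > "$d/a"; printf 'same\n' > "$d/b"; printf 'other\n' > "$d/c"
show_identical "$d"/*    # one line: the shared signature, then a and b
rm -rf "$d"
```

Each file is read once by md5sum, so the work grows linearly with the total data size; awk's associative array does the grouping.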


e the checksum of a file with different contents Software announcements often include checksums of the

e However, checksums alone do not provide verification: if the checksum were recorded in another file

at you downloaded with the software, an attacker could have maliciously changed the software and simply

ate key, known only to its owner, and a public key, potentially known to

t is decryptable with that key,

Alice

be confident that only

[13]

10.7.4 Digital Signature Verification

The various checksum utilities provide a single number that is characteristic of the file, and is unlikely to bsame as the

distribution files so that you have an easy way to tell whether the copy that you just downloaded matches thoriginal

th

revised the checksum accordingly

The solution to this problem comes from public-key cryptography, where data security is obtained from the

existence of two related keys: a priv

anyone Either key may be used for encryption; the other is then used for decryption The security of public-key cryptography lies in the belief that knowledge of the public key, and text tha

provides no practical information that can be used to recover the private key The great breakthrough of this invention was that it solved the biggest problem in historical cryptography: secure exchange of encryption keys among the parties needing to communicate

re is how the private and public keys are used If Alice wants to sign an open letter, she uses her private

to encrypt it Bob uses Alice's public key to decrypt the signed letter, and can then be confident that only

could have signed it, provided that she is trusted not to divulge her private key

If Alice wants to send a letter to Bob that only he can read, she encrypts it with Bob's public key, and he then uses his private key to decrypt it As long as Bob keeps his private key secret, Alice can

Bob can read her letter

It isn't necessary to encrypt the entire message: instead, if just a file checksum is encrypted, then one has a digital signature. This is useful if the message itself can be public, but a way is needed to verify its authenticity. Several tools for public-key cryptography are implemented in the GNU Privacy Guard (GnuPG) and Pretty Good Privacy[14] (PGP) utilities. A complete description of these packages requires an entire book; see Chapter 16. However, it is straightforward to use them for one important task: verification of digital signatures.

We illustrate only GnuPG here, since it is under active development and it builds more easily and on more platforms than PGP.
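Continuing the toy RSA example, the following sketch signs only a checksum of a file rather than the file itself. The POSIX cksum CRC, reduced modulo n to fit the tiny key, stands in for the cryptographic hash that a real signature would use (a CRC is easy to forge, so this is for illustration only); file_digest, sign_file, and verify_file are hypothetical names.

```shell
#!/bin/sh
# Toy RSA key pair (illustration only; real keys are far larger)
N=3233      # modulus, 61 * 53
E=17        # public exponent
D=2753      # private exponent

# modpow BASE EXP MOD: square-and-multiply modular exponentiation
modpow() {
    mp_base=$(( $1 % $3 )); mp_exp=$2; mp_mod=$3; mp_result=1
    while [ "$mp_exp" -gt 0 ]; do
        if [ $(( mp_exp % 2 )) -eq 1 ]; then
            mp_result=$(( mp_result * mp_base % mp_mod ))
        fi
        mp_base=$(( mp_base * mp_base % mp_mod ))
        mp_exp=$(( mp_exp / 2 ))
    done
    echo "$mp_result"
}

# file_digest FILE: POSIX cksum CRC of FILE, reduced modulo N so that it
# fits the toy key (a stand-in for a real cryptographic hash)
file_digest() {
    cksum < "$1" | awk -v n="$N" '{ print $1 % n }'
}

# sign_file FILE: the signer applies the PRIVATE exponent to the digest only
sign_file() {
    modpow "$(file_digest "$1")" "$D" "$N"
}

# verify_file FILE SIG: anyone applies the PUBLIC exponent and compares the
# result with the file's current digest; succeeds only on a match
verify_file() {
    [ "$(modpow "$2" "$E" "$N")" -eq "$(file_digest "$1")" ]
}
```

For instance, sig=$(sign_file coreutils-5.0.tar.gz) yields a small number that could be published alongside the archive, and verify_file coreutils-5.0.tar.gz "$sig" succeeds only if the public-key computation recovers the file's current checksum.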

$ ls -l coreutils-5.0.tar*                    Show the distribution files
-rw-rw-r-- 1 jones devel 6020616 Apr 2 2003 coreutils-5.0.tar.gz
-rw-rw-r-- 1 jones devel      65 Apr 2 2003 coreutils-5.0.tar.gz.sig

$ gpg coreutils-5.0.tar.gz.sig                 Try to verify the signature
gpg: Signature made Wed Apr 2 14:26:58 2003 MST using DSA key ID D333CBA1
gpg: Can't check signature: public key not found

The signature verification failed because we have not added the signer's public key to the gpg key ring. If we knew who signed the file, then we might be able to find the public key at the signer's personal web site or ask the signer for a copy via email. However, the only information that we have here is the key ID. Fortunately, people who use digital signatures generally register their public keys with a third-party public-key server, and that registration is automatically shared with other key servers. Some of the major ones are listed in Table 10-2, and more can be found by web search engines. Replicated copies of public keys enhance security: if one key server is unavailable or compromised, you can easily switch to another one.

Table 10-2. Major public-key servers

Country    URL

Use a web browser to visit the key server, type the key ID 0xD333CBA1 into a search box (the leading 0x is mandatory), and get a report like this:

Public Key Server -- Index ''0xD333CBA1 ''

Type bits /keyID    Date       User ID
pub  1024D/D333CBA1 1999/09/26 Jim Meyering <meyering@ascend.com>

Follow the link on the key ID (D333CBA1 in the preceding report) to get a web page like this:

Public Key Server -- Get ''0xD333CBA1 ''

Version: PGP Key Server 0.9.6

Finally, save the key text in a temporary file (say, temp.key) and add it to your key ring:

$ gpg --import temp.key                  Add the public key to your key ring
gpg: key D333CBA1: public key "Jim Meyering <jim@meyering.net>" imported
gpg: Total number processed: 1
gpg:               imported: 1
gpg: checking the trustdb
gpg: checking at depth 0 signed=0 ot(-/q/n/m/f/u)=0/0/0/0/0/
gpg: next trustdb check due at ????-??-??

Now you can verify the signature successfully:

$ gpg coreutils-5.0.tar.gz.sig           Verify the digital signature
gpg: Signature made Wed Apr 2 14:26:58 2003 MST using DSA key ID D333CBA1
gpg: Good signature from "Jim Meyering <jim@meyering.net>"
gpg:                 aka "Jim Meyering <meyering@lucent.com>"
gpg:                 aka "Jim Meyering <meyering@na-net.ornl.gov>"
gpg:                 aka "Jim Meyering <meyering@pobox.com>"
gpg:                 aka "Jim Meyering <meyering@ascend.com>"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: D70D 9D25 AF38 37A5 909A 4683 FDD2 DEAC D333 CBA1
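If you save the whole key-server page rather than just the key text, a sed range address can cut the key out before you import it. This sketch assumes the page wraps the key in the standard OpenPGP ASCII-armor delimiter lines (BEGIN PGP PUBLIC KEY BLOCK and END PGP PUBLIC KEY BLOCK, each framed by five hyphens); the function name extract_key and the file name page.html are hypothetical.

```shell
#!/bin/sh
# extract_key PAGE: print only the ASCII-armored public key block found in
# PAGE, from the BEGIN delimiter line through the END delimiter line
extract_key() {
    sed -n '/-----BEGIN PGP PUBLIC KEY BLOCK-----/,/-----END PGP PUBLIC KEY BLOCK-----/p' "$1"
}

# Hypothetical usage:
#   extract_key page.html > temp.key
#   gpg --import temp.key
```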

The warning in the successful verification simply means that you have not certified that the signer's key really does belong to him. Unless you personally know the signer and have good reason to believe that the key is valid, you should not certify keys.

An attacker could modify and repackage the distribution, but without knowledge of the signer's (secret) private key, the digital signature cannot be reproduced, and gpg detects the attack:

$ ls -l coreutils-5.0.tar.gz             List the maliciously modified archive file
-rw-rw-r-- 1 jones devel 6074205 Apr 2 2003 coreutils-5.0.tar.gz

$ gpg coreutils-5.0.tar.gz.sig           Try to verify the digital signature

... be revealed when the signature was verified. Security is never perfect.

You do not need to use a web browser to retrieve a public key: the GNU wget utility[15] can retrieve it directly from a key server. Our getpubkey script does so, and then prints a reminder of how to add the retrieved key to your key rings:

...
    echo "Try:  pgp -ka $tmpfile"
    echo "      pgpgpg -ka $tmpfile"
    echo "      rm -f $tmpfile"
done

Here is an example of its use:

$ getpubkey D333CBA1                     Get the public key for this key ID
Try:  pgp -ka /tmp/p
      pgpgpg -ka /tm
      rm -f /tmp/pgp-0xD333CBA1.tmp.21643

Some keys can be used with both PGP and GnuPG, but others cannot, so the reminder covers both. Because the command-line options for gpg and pgp differ, and pgp was developed first, gpg comes with a wrapper program, pgpgpg, that takes the same options as pgp, but calls gpg to do the work. Here, pgpgpg -ka is the same as gpg --import.

getpubkey allows you to add retrieved keys to either, or both, of your GnuPG and PGP key rings, at the expense of a bit of cut-and-paste. gpg provides a one-step solution, but only updates your GnuPG key ring:

$ gpg --keyserver pgp.mit.edu --search-keys 0xD333CBA1
gpg: searching for "0xD333CBA1" from HKP server pgp.mit.edu
Keys 1-6 of 6 for "0xD333CBA1"
(1)     Jim Meyering <meyering@ascend.com>
          1024 bit DSA key D333CBA1, created 1999-09-26
Enter number(s), N)ext, or Q)uit > 1
gpg: key D333CBA1: public key "Jim Meyering <jim@meyering.net>" imported
gpg: Total number processed: 1
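Because the key server's search box insists on the leading 0x while gpg reports bare IDs such as D333CBA1, a script that accepts either form needs to normalize them first. A sketch of that step, using the hypothetical name normalize_keyid:

```shell
#!/bin/sh
# normalize_keyid ID...: print each key ID with exactly one leading 0x
normalize_keyid() {
    for id in "$@"
    do
        id=${id#0x}     # drop an existing lowercase prefix...
        id=${id#0X}     # ...or an uppercase one
        printf '0x%s\n' "$id"
    done
}

normalize_keyid D333CBA1 0xD333CBA1    # prints 0xD333CBA1 twice
```

The result can be handed to the one-step command shown above, for example gpg --keyserver pgp.mit.edu --search-keys "$(normalize_keyid D333CBA1)".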

The locate command searches a database constructed by complete scans of the filesystem. When you know part or all of a filename and just want to find where it is in the filesystem, locate is generally the best way to track it down, unless it was created after the database was constructed.

The type command is a good way to find out information about shell commands, and our pathfind script from Chapter 8 provides a more general solution for locating files in a specified directory path.

We took several pages to explore the powerful find command, which uses brute-force filesystem traversal to find files that match user-specified criteria. Nevertheless, we still had to leave many of its facilities for you to discover on your own from its manual pages and the extensive manual for GNU find.

We gave a brief treatment of xargs, another powerful command for doing operations on lists of files, often produced upstream in a pipeline by find. Not only does this overcome command-line length restrictions on many systems, but it also gives you the opportunity to insert additional filters in the pipeline to further control what files are ultimately processed.

The df and du commands report the space used in filesystems and directory trees. Learn them well, because you may use them often.

We wrapped up with a description of commands for comparing files, applying patches, generating file checksums, and validating digital signatures.
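The xargs pattern recalled above looks like this in practice: find produces the file list, grep acts as an additional filter in the middle of the pipeline, and xargs hands the surviving names to cat in batches small enough for the system's argument-length limit. The function name, the *.sh selection, and the backup directory are illustrative assumptions.

```shell
#!/bin/sh
# count_lines DIR: total number of lines in the *.sh files under DIR,
# skipping anything inside a directory named backup.
# Assumes file names contain no blanks, quotes, backslashes, or newlines,
# because xargs splits its input on whitespace and interprets quotes.
count_lines() {
    find "$1" -type f -name '*.sh' -print |
        grep -v '/backup/' |    # extra filter between find and xargs
        xargs cat |             # xargs batches names under the ARG_MAX limit
        wc -l
}
```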

Chapter 11. Extended Example: Merging User Databases

The Unix password file, /etc/passwd, has shown up in several places throughout the book. System administration tasks often revolve around manipulation of the password file (and the corresponding group file, /etc/group). The format is well known:[1]

tolstoy:x:2076:10:Leo Tolstoy:/home/tolstoy:/bin/bash

There are seven fields: username, encrypted password, user ID number (UID), group ID number (GID), full name, home directory, and login shell. It's a bad idea to leave any field empty. In particular, if the password field is empty, the user can log in without a password, and anyone with access to the system or a terminal on it can log in as that user. If the seventh field (the shell) is left empty, Unix defaults to the Bourne shell, /bin/sh.

[1]

BSD systems maintain an additional file, /etc/master.passwd, which has three additional fields: the user's login class, password change time, and account expiration time. These fields are placed between the GID field and the field for the full name.
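Because the fields are separated by colons, awk -F: splits each line directly, which makes it easy to check for the risky empty fields just described. A sketch, with the hypothetical name audit_passwd:

```shell
#!/bin/sh
# audit_passwd [FILE...]: check /etc/passwd-format lines (stdin if no FILE).
# Fields: 1 username, 2 password, 3 UID, 4 GID, 5 full name, 6 home, 7 shell
audit_passwd() {
    awk -F: '
        NF != 7  { printf "line %d has %d fields, expected 7\n", FNR, NF; next }
        $2 == "" { printf "%s (UID %s): empty password field\n", $1, $3 }
        $7 == "" { printf "%s (UID %s): empty shell, defaults to /bin/sh\n", $1, $3 }
    ' "$@"
}

# Usage: audit_passwd /etc/passwd
```

A clean file produces no output, so the function can run quietly from a periodic job.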

As is discussed in detail in Appendix B, it is the user and group ID numbers that Unix uses for permission checking when accessing files. If two users have different names but the same UID number, then as far as Unix knows, they are identical. There are rare occasions when you want such a situation, but usually having two accounts with the same UID number is a mistake. In particular, NFS requires a uniform UID space; user number 2076 on all systems accessing each other via NFS had better be the same user (tolstoy), or else there will be serious security problems.

Now, return with us for a moment to yesteryear (around 1986), when Sun's NFS was just beginning to become popular and available on non-Sun systems. At the time, one of us was a system administrator of two separate 4.2 BSD Unix minicomputers. These systems communicated via TCP/IP, but did not have NFS. However, a new OS vendor was scheduled to make 4.3 BSD + NFS available for these systems. There were a number of users with accounts on both systems; typically the username was the same, but the UID wasn't! These systems were soon to be sharing filesystems via NFS; it was imperative that their UID spaces be merged. The task was to write a series of scripts that would:

• Merge the /etc/passwd files of the two systems. This entailed ensuring that all users from both systems had unique UID numbers.
• Change the ownership of all files to the correct users in the case where an existing UID was to be used for a different user.

It is this task that we recreate in this chapter, from scratch. (The original scripts are long gone, and it's occasionally interesting and instructive to reinvent a useful wheel.) This problem isn't just academic, either: consider two departments in a company that have been separate but that now must merge. It's possible for there to be users with accounts on systems in multiple departments. If you're a system administrator, you may one day face this very task. In any case, we think it is an interesting problem to solve.
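The first goal, unique UID numbers after the merge, suggests a quick check that is useful both before and after merging: scan the password files together and report any UID claimed by more than one distinct username. This is a sketch, not the chapter's eventual solution; uid_conflicts and the file names u1.passwd and u2.passwd are hypothetical.

```shell
#!/bin/sh
# uid_conflicts FILE...: print "UID: name1 name2 ..." for every UID that
# more than one distinct username claims across the given password files
uid_conflicts() {
    awk -F: '
        # count each (UID, username) pair only once
        !seen[$3 ":" $1]++ {
            users[$3] = users[$3] " " $1
            count[$3]++
        }
        END {
            for (uid in count)
                if (count[uid] > 1)
                    print uid ":" users[uid]
        }' "$@" | sort -t: -k1,1n
}

# Hypothetical usage: uid_conflicts u1.passwd u2.passwd
```

Usernames are listed in the order in which they first appear, so the file named first contributes the first name on each line.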

11.2 The Password Files

Let's call our two hypothetical Unix systems u1 and u2. Example 11-1 presents the /etc/passwd file from u1.[2]

[2]

Any resemblance to actual users, living or dead, is purely coincidental.

Example 11-1. u1 /etc/passwd file

root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
...Tolstoy:/home/tolstoy:/bin/bash
...Camus:/home/camus:/bin/bash
jhancock:x:200:10:John Hancock:/home/jhancock:/bin/bash
...:/home/abe:/bin/bash
...:/home/tj:/bin/bash
...

And Example 11-2 presents the /etc/passwd file from u2.

Example 11-2. u2 /etc/passwd file

root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
george:x:1100:10:George Washi...
betsy:x:1110:10:Betsy Ross:/home/betsy:/bin/bash
jhancock:x:300:10:John Hancock:/home/jhancock:/bin/bash

If you examine these files carefully, you'll see that they represent the various possibilities that our program has to handle:

• Users for whom the username and UID are the same on both systems. This happens most typically with administrative accounts such as root and bin.
• Users for whom the username and UID exist only on one system but not the other. In this case, when the files are merged, there is no problem.
• Users for whom the username is the same on both systems, but the UIDs are different.
• Users for whom the username is different on both systems, but the UIDs are the same.
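One way to sort accounts into the cases above is a two-file awk pass: remember u1's entries by username and by UID, then test each u2 entry against them. The sketch below only illustrates that classification and is not the program developed in the rest of the chapter; classify_users and its output labels are hypothetical, and an account that fits two cases is reported only under the first one it matches.

```shell
#!/bin/sh
# classify_users U1FILE U2FILE: label each account by how it compares
# across two password files.  Output lines (sorted):
#   same NAME UID                 same username and UID on both systems
#   uid-differs NAME UID1 UID2    same username, different UIDs
#   name-differs NAME1 NAME2 UID  different usernames sharing one UID
#   u1-only NAME UID, u2-only NAME UID   account exists on one system only
# Caveat: the FNR == NR test assumes U1FILE is not empty.
classify_users() {
    awk -F: '
        FNR == NR { uid1[$1] = $3; name1[$3] = $1; next }   # first file: u1
        {                                                    # second file: u2
            seen2[$1] = 1; name2[$3] = $1
            if (($1 in uid1) && uid1[$1] == $3)
                print "same", $1, $3
            else if ($1 in uid1)
                print "uid-differs", $1, uid1[$1], $3
            else if ($3 in name1)
                print "name-differs", name1[$3], $1, $3
            else
                print "u2-only", $1, $3
        }
        END {
            # u1 accounts never matched on u2, by name or by UID
            for (n in uid1)
                if (!(n in seen2) && !(uid1[n] in name2))
                    print "u1-only", n, uid1[n]
        }' "$1" "$2" | LC_ALL=C sort
}
```

Sorting in the C locale groups the lines by label, which makes each case easy to review by eye.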

11.3 Merging Password Files


Posted: 12/08/2014, 10:22
