Text-Processing One-Liners
Even though this book is about using the shell’s command language, I use a fair number of calls to other utilities for text processing. Sed, awk, and grep are the primary UNIX text-processing utilities, although I have used others. This chapter gives you a collection of short and useful one-liners that illustrate quite a few methods for gathering specific information from various textual sources.
Very often when writing a script, you need to know source data locations before you start pruning the data for further processing. For instance, you can find the load average of a running Linux system from the first line of the output of the top utility, the output of the uptime command, the output of the w command, and in the /proc/loadavg file. There are almost always multiple ways to gather and process information, and the tools introduced in this chapter should give you an excellent start on knowing what you will need to do in many situations.
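As a small taste of what is to come, the 1-minute load average can be pulled out of /proc/loadavg with a single awk call (a minimal sketch; the first whitespace-separated field of that file is the 1-minute average):
awk '{print $1}' /proc/loadavg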
For more information about any of these utilities, consult Appendix C of this book or the man pages of individual utilities. This chapter is not intended to cover these utilities exhaustively; several of these utilities have had complete books written about them.
An extremely common use of the utilities discussed in this chapter is to modify or filter a string that is obtained from any one of a number of sources, such as from an environment variable or from the output of a system command. For consistency in these examples, the following common variable is echoed and piped to the utility to illustrate the mode of use:
VAR="The quick brown fox jumped over the lazy dog."
Displaying Specific Fields
The following example is a simple awk statement to extract data fields from a string containing a record with multiple fields, assuming that whitespace characters separate the fields. The awk field variables start at $1 and increment up through the end of the string. In our example string, there are nine fields separated by whitespace. The awk positional variable $0 is special in that it holds the value of the whole string. Quite often, the print statement will target only a single field, but this example shows how to extract and reorder several of the input fields:
echo $VAR | awk '{print $1, $8, $4, $5, $6, $7, $3, $9}'
This produces the following output:
The lazy fox jumped over the brown dog
Specifying the Field Separator
Here is another simple use of awk, where the field separator is specified using the -F command-line switch. Using this option causes the source string to be split up based on something other than whitespace. In this case it is the letter o:
echo $VAR | awk -Fo '{print $4}'
This produces the following output:
ver the lazy d
Simple Pattern-Matching
Matching specific fields of the input is very useful in finding data quickly. A grep command can easily return lines that match a given string, but awk can return lines that match a specific value in a specific field. The following example finds and displays all lines whose second field is equal to the string casper in /etc/hosts. The test used for the second field could be changed from equal (==) to not equal (!=) to find the lines in the file that do not contain the string casper in the second field, and more complicated conditions can be constructed in the usual way.
awk '$2 == "casper" {print $0}' /etc/hosts
This produces the following output:
172.16.5.4 casper casper.mydomain.com
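As noted above, the equality test can be inverted or combined with other tests. Here is a hedged sketch of both variations; the output will depend on the contents of your own /etc/hosts file. The first command prints every line whose second field is not casper, and the second prints lines whose second field is either casper or phred:
awk '$2 != "casper" {print $0}' /etc/hosts
awk '$2 == "casper" || $2 == "phred" {print $0}' /etc/hosts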
Matching Fields Against Several Values
Another pattern-matching technique, which is similar to the previous one, is to look for one of several alternatives in a specific field. The example here extends the previous one a bit by looking for lines in my /etc/hosts file whose IP addresses (in field 1) start with either 127 or 172. Note that the alternatives between the slashes (/) are separated by the pipe (|) character; this is awk notation for the regular expression specifying the pattern “starting with 127 or starting with 172.” The pattern-matching operator ~ could also be replaced with the negated operator !~ to return the lines in the file that don’t match the expression.
awk '$1 ~ /^127|^172/ {print $0}' /etc/hosts
This produces the following output:
127.0.0.1 localhost
172.16.5.2 phred phred.mydomain.com
172.16.5.4 casper casper.mydomain.com
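To see the negated form, replacing ~ with !~ lists every line whose first field does not begin with 127 or 172 (a sketch only; what it prints depends on your own hosts file):
awk '$1 !~ /^127|^172/ {print $0}' /etc/hosts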
Determining the Number of Fields
This one-liner illustrates the use of a special awk internal variable NF, whose value is the number of fields in the current line of input. You may want to try changing the field separator as shown in the earlier example and note the difference in the result.
echo $VAR | awk '{print NF}'
This produces the following output:
9
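Combining this with the -F switch from the earlier example shows how the count changes with the separator; splitting the example string on the letter o yields five fields:
echo $VAR | awk -Fo '{print NF}'
This produces the following output:
5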
Determining the Last Field
This is a slightly modified version of the previous example; it adds a dollar sign ($) in front of the NF variable. This will print out the value of the last field instead of the number of fields.
echo $VAR | awk '{print $NF}'
The following output results:
dog
Determining the Second-to-Last Field
The previous three examples all relate directly to the standard numeric awk field variables. From our example string, $NF would be equal to $9. This variable is one layer more abstract than directly referencing a positional variable; it allows you to reference any particular field of a string of arbitrary length through logic. We can use NF to get the second-to-last field of the string, as in the next example. This could be easily modified to reference other positions in the input relative to the last field.
echo $VAR | awk '{print $(NF-1)}'
You get the following output:
lazy
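The same arithmetic works for any offset from the end of the line; for example, $(NF-2) extracts the third-to-last field:
echo $VAR | awk '{print $(NF-2)}'
You get the following output:
the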
Passing Variables to awk
In some cases you may not know until the command is run which field you want. You can deal with this by passing a value to awk when it is invoked. The following example shows how you can pass the value of the shell variable TheCount to an awk command. The -v switch to awk specifies that you are going to set a variable; following the -v switch is the variable being assigned within awk.
TheCount=3
echo $VAR | awk -v counter=$TheCount '{print $counter}'
This produces the following output:
brown
The -v switch is a relatively new option for assigning a variable, and it may not be ideal when you’re shooting for portability. In that case, this usage should do the trick:
TheCount=3
echo $VAR | awk '{print $counter}' counter=$TheCount
It produces the following output:
brown
Using a Variable Passed to awk in a Condition
Here is another use of shell variables with the awk command. The NODE=$node assignment sets the internal awk variable NODE to the value of the shell variable $node. The awk command then checks whether the second field ($2) of each line of the input file is equal to the value of NODE. If a line matches, then $3 is output. In this example, the /etc/hosts file was used. The code works like that in the “Simple Pattern-Matching” example shown earlier, except that the value to compare against can be specified independently of the field that is output.
awk -v NODE=$node '$2 == NODE {print $3}' /etc/hosts
The output depends on the contents of your /etc/hosts file, but the intended effect is to display the domain name corresponding to the specified node name. Try setting the node variable to the name of your system before running this command. My system is named casper and this is its hosts file entry:
172.16.5.4 casper casper.mydomain.com
Thus, if on some line in the /etc/hosts file the system name stored in the node variable is in field 2, then the third field of that line will be displayed. When I run this command after setting the shell variable $node to casper, the output is the third field of the /etc/hosts entry for casper: casper.mydomain.com.
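Putting the two steps together looks like this (a minimal sketch; substitute your own system name for casper):
node=casper
awk -v NODE=$node '$2 == NODE {print $3}' /etc/hosts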
Displaying a Range of Fields (Main Method)
Usually, printing a range of fields from an input line cannot be expressed using simple syntax. Unless the range is fixed, you generally need to have awk loop through a previously specified list of fields, printing each one in turn. In this example, the for loop starts with a fixed field number (here, 3) and ends with the value of the NF variable. You can modify this easily to permit any range. The printf (formatted print) command in the body of the loop prints the current field, followed by a space. The last print statement outside the loop adds a final carriage return at the end of the output.
echo $VAR | awk '{for(i=3; i<=NF; i++) {printf "%s ",$i}; print ""}'
Here is the output:
brown fox jumped over the lazy dog
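Both ends of the range can be made adjustable in the same way; here is a sketch that passes the start and end positions in as awk variables (the names first and last are arbitrary):
echo $VAR | awk -v first=2 -v last=5 '{for(i=first; i<=last; i++) {printf "%s ",$i}; print ""}'
This produces the following output:
quick brown fox jumped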
Displaying a Range of Fields (Alternate Method)
One last use of external variables being passed to awk is related to potential problems with awk versions. In some cases, the versions of awk, nawk, or gawk handle the -v switch differently. There are also issues when passing variables that have spaces included in literal strings. Most awk commands given on the command line are contained within single quotes: '. When passing an external shell variable to awk, you can embed the variable directly into the command, at the point where its value is needed, by surrounding it with more single quotes. In the following example, the awk command starts with a single quote and then begins a for loop. The counter variable i is set to the initial value of 3 and will continue to loop while i is less than or equal to $end. $end is a shell variable that is embedded between two single quotes. The first of these quotes ends the initial awk statement, and the shell is then used to expand the value of the $end variable. The second single quote that follows the $end variable reopens the awk command, which includes the loop increment as well as the print statements. The final single quote ends the whole awk statement.
This example is very simple and nearly the same as the range-printing solution; it illustrates the use of a shell variable within an awk command. The differences are that the ending variable ($end) is passed from the shell environment and that it is not contained within the single quotes of the awk command. The shell variable $end is set to the value 6.
echo $VAR | awk '{for(i=3; i<='$end'; i++) {printf "%s ",$i}; print ""}'
Here is the output:
brown fox jumped over
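For completeness, the $end assignment has to be made in the shell before the command is run; the full sequence would look something like this:
end=6
echo $VAR | awk '{for(i=3; i<='$end'; i++) {printf "%s ",$i}; print ""}'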
Determining the Length of a String Using awk
The length value in awk is another internal variable that contains the number of characters in the current line.
echo $VAR | awk '{print length}'
Here’s the output:
45
Determining the Length of a String Using expr
Another solution for this task uses the internal length function of expr.
expr length "$VAR"
The following output results:
45
Displaying a Substring with awk
Substring extraction can be performed using a built-in function of awk. The function has
the following form:
substr(string,position of first character of substring,substring character count)
The following example extracts a substring of three characters from the third field of
the VAR variable, starting from the second character in the field.
echo $VAR | awk '{print substr($3,2,3)}'
You get the following output:
row
Displaying a Substring with expr
Here is a method of extracting a substring using expr. It uses the substr() function of expr. As before, the first argument is the string, the second is the position of the desired substring’s starting character, and the last is the number of characters in the substring. The example gets 4 characters from the string stored in VAR, starting at character number 12.
expr substr "$VAR" 12 4
The following output results:
rown
Conducting Simple Search and Replace with sed
The following example searches for space characters within each line of input and replaces them with the string %20. The search-and-replace syntax follows the pattern s/search string/replacement string/. The g at the end of the expression is optional; it stands for global and indicates that you want to replace all instances of the search term found in the line. Without the g, the command replaces only the first instance of the search term.
echo $VAR | sed -e "s/ /%20/g"
The following output results:
The%20quick%20brown%20fox%20jumped%20over%20the%20lazy%20dog
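Dropping the g makes the difference easy to see; the following variant converts only the first space on the line and leaves the rest untouched:
echo $VAR | sed -e "s/ /%20/"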
Disregarding Blank and Commented Lines from a File
This example is a little more involved. First it uses a sed command to strip comments from a specified file (here, /etc/ntp.conf). The output is then piped to awk, which is used to print only non-null lines (i.e., lines whose length is not 0). The sed expression matches a pound sign (#) followed by .*, which denotes “any number of any characters,” and replaces whatever it matches with nothing; lines without a # are echoed unchanged. The effect of this is to echo the original contents of the file with the comments removed, which turns commented-out lines into empty ones. The sed output is piped into an awk one-liner that filters out lines of length 0. The resulting sequence is a quick way to remove all blank and commented entries of a file.
sed -e "s/#.*//g" /etc/ntp.conf | awk '{if(length !=0) print $0}'
The output will, of course, be specific to the file used as input.
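A similar result could be obtained with egrep alone (a hedged alternative; unlike the sed version, it does not strip comments that trail valid configuration lines):
egrep -v "^#|^$" /etc/ntp.conf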
Conducting Dual Search and Replace with sed
A more advanced search and replace first checks the input for a string other than the one that is going to be replaced, and performs the search-and-replace operation only if this string is found. For instance, you might have a file in which each line contains a name and address, and you want to change “Portland” to “Gresham” on the lines containing the name Ron Peters.
This can be accomplished using sed by including a pattern before the search expression. Continuing with our “quick brown fox” example, the following code first searches for the word “quick” in the input and then replaces all instances (g) of the string he with the replacement string she on the line if the word was found.
echo $VAR | sed -e "/quick/s/he/she/g"
Here’s the output:
Tshe quick brown fox jumped over tshe lazy dog
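The name-and-address case mentioned above would follow the same pattern; here is a hypothetical sketch (the file name addresses.txt is assumed purely for illustration):
sed -e "/Ron Peters/s/Portland/Gresham/g" addresses.txt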
Filtering Lines with sed
Sometimes filtering out certain lines is desirable. For instance, when parsing ps output, you might not want the header line displayed. The following sed example removes the first line from the stdout of a call to ps. This is similar to the head command, but it has the opposite effect: while a head command grabs the specified number of leading lines and drops the rest, our example removes the specified number of initial lines from the output of ps (here, 1) and displays the rest. (You could use the tail command, but you would need to know the total number of lines.) Removing more than the first line is as simple as changing the specified line to a range of lines; to remove the first three lines, you would change 1d to 1,3d.
ps -ef | sed -e '1d'
This produces the following output (the first line shown here is the header that was removed):
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 22:32 ? 00:00:05 init [5]
root 2 1 0 22:32 ? 00:00:01 [keventd]
root 3 1 0 22:32 ? 00:00:00 [kapmd]
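Extending the range works the same way; this sketch drops the first three lines of the ps output instead of just the header:
ps -ef | sed -e '1,3d'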
Searching for Multiple Strings with egrep
egrep is a utility that works in much the same way as the traditional grep command. Handily, it will search for more than one string at a time. In this example, I search for any one of three alternative search strings within the /etc/passwd file.
egrep "desktop|mysql|ntp" /etc/passwd
It produces the following output:
ntp:x:38:38::/etc/ntp:/sbin/nologin
desktop:x:80:80:desktop:/var/lib/menu/kde:/sbin/nologin
mysql:x:27:27:MySQL Server:/var/lib/mysql:/bin/bash
A Clean Method of Searching the Process Table
Traditionally a command to find a specific process in the process table would look something like this:
ps -ef | grep some_string
When this command is run, the output includes not only the process data you were looking for, but also the data for the grep process itself, since the search string is also contained in the invocation of grep. To clean up the output, you can add an additional pipe to remove the extra grep process entry with the -v switch to grep, like this:
ps -ef | grep some_string | grep -v grep
There is a little trick for performing this task without the additional pipe:
ps -ef | grep "[s]ome_string"
This turns the original search string into a regular expression. The new grep command has the same effect as the previous one because the regular expression evaluates to the