Page 1

An Introduction to Perl

Sources and inspirations:

http://www.cs.utk.edu/~plank/plank/classes/cs494/494/notes/Perl/lecture.html

Randal L. Schwartz and Tom Christiansen, “Learning Perl”, 2nd ed., O’Reilly

Randal L. Schwartz and Tom Phoenix, “Learning Perl”, 3rd ed., O’Reilly

Dr Nathalie Japkowicz, Dr Alan Williams

Go O'Reilly!

Page 2

Perl overview (1)

• Perl = Practical extraction and report language

• Perl = Pathologically eclectic rubbish lister 

• It is a powerful general-purpose language, which is particularly useful for writing “quick and dirty”

Page 3

Perl overview (2)

• In the hierarchy of programming languages, Perl sits halfway between high-level languages such as Pascal, C and C++, and shell scripts (languages that add control structure to the Unix command-line instructions) such as sh, sed and awk

• By the way:

– awk = Aho, Weinberger, Kernighan

– sed = Stream Editor

Page 5

Advantages of Perl (2)

• Perl offers extremely strong regular expression

capabilities, which allow fast, flexible and reliable

string handling operations, especially pattern

Page 6

Disadvantages of Perl

• Perl is a jumble! It contains many, many features from many languages and tools

• It contains different constructs for the same functionality (for example, there are at least 5 ways to perform a one-line if statement)

• It is not a very readable language

• You cannot distribute a Perl program as an opaque binary. That is, you cannot really commercialize products you develop in Perl

Page 7

Perl resources and versions

• http://www.perl.org tells you everything that you want to know about Perl

• What you will see here is Perl 5

• Perl 5.8.0 was released in July 2002

• Perl 6 (http://dev.perl.org/perl6/) is the next version, still under development, but moving along nicely. The first book on Perl 6 is in stores (http://www.oreilly.com/catalog/perl6es)

Page 8

Scalar data: strings and numbers

Scalars need not be defined or have their types declared: Perl understands from context.

% cat hellos.pl

#!/usr/bin/perl -w

print "Hello" . " " . "world\n";

print "hi there " . 2 . " worlds!" . "\n";

print (("5" + 6) . " eggs\n" . " in " . "

Page 9

Scalar variables

Scalar variable names start with a dollar sign. They do not have to be declared.

12

$k\n
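A minimal sketch of scalar variables in use. The names $name, $text and $sum are illustrative and not from the slides. The sketch enables "use strict", which does require a "my" declaration; the slides' own examples run without it.

```perl
use strict;
use warnings;

my $k    = 12;                  # a number
my $name = "world";             # a string
my $text = $k . " " . $name;    # "." concatenates; 12 is converted to "12"
my $sum  = "5" + 6;             # "+" is numeric; "5" is converted to 5

print "$text\n";
print "$sum\n";
```

The same variable can hold a number at one point and a string at another; the operator applied to it decides how the value is interpreted.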

Page 10

Quotes and substitution

Suppose $x = 3

Single-quotes ' ' allow no substitution except for the

escape sequences \\ and \'

print('$x\n'); gives $x\n and no new line

Double-quotes " " allow substitution of variables like $x and control codes like \n (newline)

print("$x\n"); gives 3 (and a new line).
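The difference between the two quoting styles can be sketched as follows. The variable names $single and $double are illustrative only.

```perl
use strict;
use warnings;

my $x = 3;

my $single = '$x\n';    # no substitution: four literal characters  $ x \ n
my $double = "$x\n";    # substitution: the digit 3 followed by a newline

print $single, "\n";    # a newline is added so the next output starts on its own line
print $double;
```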

Back-quotes ` ` also allow substitution, then try to

execute the result as a system command, returning as the final value whatever the system command outputs

$y = `date`; print($y); results in

Page 11

Control statements: if, else, elsif

print "'$name' follows 'fred'\n";}

elsif ($name eq 'fred') {

print "both names are 'fred'\n";}
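Only the tail of the slide's example survives here. The sketch below is one plausible complete if/elsif/else chain built around the surviving messages; the first condition (gt) and the final else branch are assumptions, and compare_to_fred is a hypothetical helper name.

```perl
use strict;
use warnings;

# Compare a name with 'fred' using the string operators gt and eq.
sub compare_to_fred {
    my ($name) = @_;
    if ($name gt 'fred') {
        return "'$name' follows 'fred'";
    }
    elsif ($name eq 'fred') {
        return "both names are 'fred'";
    }
    else {
        return "'$name' precedes 'fred'";
    }
}

print compare_to_fred($_), "\n" for qw(wilma fred barney);
```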

Page 12

Control statements: loops (1)

% oddsum_while.pl

10

Use of uninitialized value at

oddnums.pl line 6, <STDIN> chunk 1.

Page 13

Control statements: loops (2)

• End-line comments begin with #

• It is okay, though not nice, to use a variable without initialization (like $sum). Such a variable is initialized to 0 if it is first used as a number or to the empty string "" if it is first used as a string. In fact, it is always undef, variously converted

• Perl can, if asked, issue a warning (use the -w flag)

• Of course, while is only one of many looping constructs in Perl. Read on
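The points above can be illustrated with a short sketch. The original oddsum script is not reproduced in this copy, so odd_sum below is a hypothetical stand-in that sums odd numbers with a while loop, followed by a demonstration of how undef converts to 0 or to the empty string.

```perl
use strict;
use warnings;

# Sum the odd numbers from 1 up to $n with a while loop.
sub odd_sum {
    my ($n) = @_;
    my $sum = 0;    # explicit initialization
    my $i   = 1;
    while ($i <= $n) {
        $sum += $i;
        $i   += 2;
    }
    return $sum;
}

my $unset;          # never assigned, so its value is undef
my ($as_number, $as_string);
{
    no warnings 'uninitialized';    # silence the -w style warning for this demo
    $as_number = $unset + 0;        # undef used as a number
    $as_string = $unset . "";       # undef used as a string
}

print odd_sum(10), "\n";            # 1 + 3 + 5 + 7 + 9
```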

Page 14

Control statements: loops (3)

Page 15

Control statements: loops (4)

We also have do-while and do-until, and we have foreach. Read on.
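A brief sketch of the three constructs just named. The @log array and the loop bounds are illustrative choices, not taken from the slides.

```perl
use strict;
use warnings;

my @log;

# do-while: the body runs once before the condition is first tested.
my $i = 0;
do {
    push @log, "do-while $i";
    $i++;
} while ($i < 2);

# do-until: the same idea, with the condition negated.
my $j = 0;
do {
    push @log, "do-until $j";
    $j++;
} until ($j >= 2);

# foreach: visit each element of a list in turn.
foreach my $word (qw(a bc d)) {
    push @log, "foreach $word";
}

print "$_\n" for @log;
```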

Page 16

Control statements: loops (5)

Page 17

Control constructs compared

            C                     Perl (braces required)

the same    if () { }             if () { }
            if (! ) { }           unless () { }
different   } else if () { }      } elsif () { }
the same    while () { }          while () { }
the same    for (aa;bb;cc) { }    for (aa;bb;cc) { }
                                  foreach $v (@array) { }
similar     0 is FALSE            0, "0", and "" are FALSE
similar     != 0 is TRUE          anything not false is TRUE
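The truth rules in the last two rows can be checked with a small sketch. is_true is a hypothetical helper. The undef case is an addition to the table: undef also counts as false, consistent with its conversion to 0 or "".

```perl
use strict;
use warnings;

# Report 1 if Perl treats the value as true, 0 otherwise.
sub is_true {
    my ($value) = @_;
    return $value ? 1 : 0;
}

print "0     -> ", is_true(0),     "\n";
print "'0'   -> ", is_true("0"),   "\n";
print "''    -> ", is_true(""),    "\n";
print "undef -> ", is_true(undef), "\n";
print "'0.0' -> ", is_true("0.0"), "\n";   # a non-empty string other than "0"
print "'00'  -> ", is_true("00"),  "\n";   # likewise
```

The strings "0.0" and "00" are true because only the exact string "0" and the empty string are false.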

Page 18

Lists and arrays

• A list is an ordered collection of scalars. An array is a variable that contains a list

• Each element is an independent scalar value. A list can hold numbers, strings, undef values: any mixture of kinds of scalar values

• To use an array element, prefix the array name with a $; place a subscript in square brackets

• To access the whole array, prefix its name with a @

• You can copy an array into another. You can use the
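A minimal sketch of the array operations described on this slide. The array name @mixed and its contents are illustrative only.

```perl
use strict;
use warnings;

my @mixed = (10, "twenty", undef, 40);   # numbers, strings and undef can be mixed

my $first = $mixed[0];      # one element: $ prefix, subscript in square brackets
my $last  = $mixed[-1];     # negative subscripts count from the end
my $size  = scalar @mixed;  # the whole array in scalar context gives its length

my @copy = @mixed;          # copying an array copies every element
$copy[0] = 99;              # changing the copy leaves the original alone

print "first=$first last=$last size=$size\n";
```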

Page 19

Command-line arguments

Suppose that a Perl program stored in the file cleanUp

is invoked in Unix/Linux with the command:

cleanUp -o result.htm data.htm

The built-in list named @ARGV then contains three

elements:

('-o', 'result.htm', 'data.htm')

These three elements can be accessed as:

$ARGV[0]

$ARGV[1]

$ARGV[2]
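A sketch of walking over command-line arguments. describe_args is a hypothetical helper; in a real script you would pass it @ARGV. When the script is run without arguments, the sketch falls back to the cleanUp example values from this slide.

```perl
use strict;
use warnings;

# Build one descriptive line per argument.
sub describe_args {
    my @args = @_;
    my @lines;
    foreach my $i (0 .. $#args) {
        push @lines, sprintf('$ARGV[%d] = %s', $i, $args[$i]);
    }
    return @lines;
}

# Use the real arguments if there are any, else the slide's example.
my @to_show = @ARGV ? @ARGV : ('-o', 'result.htm', 'data.htm');
print "$_\n" for describe_args(@to_show);
```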

Page 20

John Nathalie Zebra

hello nil notary

Page 21

Array examples (2A)

Page 22

Array examples (2B)

% each_rev.pl

a bc d efg efg d bc a

hi j

j hi klm nopq st

Page 23

Array examples (3)

Reversing a text file (whole lines)
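The slide's own script is missing from this copy. As a sketch of the idea, the built-in reverse applied to a list of lines returns them in the opposite order; reverse_lines is a hypothetical wrapper, and the in-memory @lines stands in for a file so that the example runs without input.

```perl
use strict;
use warnings;

# Return the given lines in reverse order (call in list context).
sub reverse_lines {
    my @lines = @_;
    return reverse @lines;
}

my @lines = ("first\n", "second\n", "third\n");
print reverse_lines(@lines);

# On real input the same idea is commonly written as:  print reverse <>;
```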

Page 25

• A hash is similar to an array, but instead of subscripts, we can have anything as a key, and we use curly brackets rather than square brackets

• The official name is associative array (known to be implemented by hashing)

• Keys and values can be any scalars; keys are always converted to strings

• To refer to a hash as a whole, prefix its name with a %

• If you assign a hash to an array, it becomes a simple list
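A minimal sketch of these hash properties. The hash names %age and %by_number and their contents are illustrative only.

```perl
use strict;
use warnings;

my %age = (fred => 35, wilma => 33);
$age{barney} = 31;              # curly brackets select one element by key

my $n = 1.0;
my %by_number;
$by_number{$n} = "one";         # the numeric key is stored as the string "1"

my @flat = %age;                # a hash assigned to an array becomes a flat
                                # key/value list (pair order is not fixed)

for my $name (sort keys %age) {
    print "$name is $age{$name}\n";
}
```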

Page 29

<> loops over the files listed

Page 30

Hash examples III: character frequency count

# end of input, print %count
for $c (sort keys %count) {
    print "$c\t$count{$c}\n";
}
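Only the printing loop of the slide's script survives. A complete sketch under assumptions might look like the following: char_count is a hypothetical helper, and it counts the characters of a string passed in rather than reading input with <> as the original presumably did.

```perl
use strict;
use warnings;

# Count how often each character occurs in a string.
sub char_count {
    my ($text) = @_;
    my %count;
    $count{$_}++ for split //, $text;   # split // yields single characters
    return %count;
}

my %count = char_count("hello world");
for my $c (sort keys %count) {
    print "$c\t$count{$c}\n";
}
```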

Page 31

Character frequency count (2)

Page 32

• A subroutine is a user-defined function. The syntax is very simple; so are the semantics

marked with &. The value returned is that of the last expression evaluated
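A small sketch of a subroutine that relies on the last-expression rule. The name bigger and the variables $x and $y are illustrative; the subroutine reads variables declared outside it, in the spirit of the early examples that do not yet pass arguments.

```perl
use strict;
use warnings;

my ($x, $y) = (3, 8);    # variables shared with the subroutine

sub bigger {
    # No explicit return: the value of the last expression evaluated is returned.
    if ($x > $y) { $x } else { $y }
}

my $z = &bigger();       # invocation marked with &
print "$z\n";
```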

Page 33

Subroutines (2)

A few housekeeping rules

• You can place your definitions anywhere in the file,

though it is recommended to have them at the beginning

• Perl always uses the latest definition in the file—any

preceding one is ignored

• Certain elements of the syntax are optional

• The & might sometimes be omitted (but it is not a good idea).

• The return operator may precede a value to be

returned (this can be useful):

if ( $x > $y ) { return $x }

Page 34

Subroutines (3)

• Clearly, the use of global variables is much too limited. Subroutines take arguments, and work on them via a predefined list variable @_ or its elements $_[0], $_[1] and so on

Page 35

Subroutines (4)

• $_[0], $_[1] are not fun to work with. We can rename them locally, using the my operator, which creates a sub's private variables. Here, we declare two such variables and initialize them right away
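The slide's code is missing from this copy. A sketch of what such a two-argument max could look like follows; the private names $m and $n are assumptions (the names $a and $b are avoided because Perl's sort uses them).

```perl
use strict;
use warnings;

sub max {
    my ($m, $n) = @_;    # private copies of the first two arguments
    if ($m > $n) { $m } else { $n }
}

print &max(16, 19), "\n";
print &max(7, 3), "\n";
```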

Page 36

• This produces 19 (23 gets ignored) and 26 (the second value is undef, which is treated as 0)

Page 37

Subroutines (6)

• We could stop the subroutine if the number of arguments is wrong. The (generally very useful!) operator die does that for us.

The script is stopped after printing this:

max needs two arguments: 16 19 23
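A sketch of an argument check with die that would produce the message shown above. The slide's own code is not in this copy, so the exact wording of the check is an assumption; eval is used here only so that the demonstration can continue after the failure.

```perl
use strict;
use warnings;

sub max {
    # "@_" inside double quotes lists the arguments separated by spaces.
    die "max needs two arguments: @_\n" if @_ != 2;
    my ($m, $n) = @_;
    return $m > $n ? $m : $n;
}

# Without eval the script would stop here; eval traps the error in $@.
my $result = eval { max(16, 19, 23) };
my $error  = $@;
print "caught: $error" unless defined $result;
```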

Page 38

Subroutines (7)

• We can have just a warning, if we use the

operator warn instead

The script prints this:

max needs two arguments: 16 19 23
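The warn variant can be sketched the same way. The slide's code is not in this copy, so the check below is an assumption; the $SIG{__WARN__} handler is added only to capture the message for inspection and is not needed in ordinary use, where warn prints to standard error.

```perl
use strict;
use warnings;

sub max {
    warn "max needs two arguments: @_\n" if @_ != 2;   # report, but carry on
    my ($m, $n) = @_;
    return $m > $n ? $m : $n;
}

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };     # collect instead of printing

my $result = max(16, 19, 23);    # warns, then compares the first two arguments
print "result: $result\n";
print "warned: $_" for @warnings;
```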

Page 39

Subroutines (8)

• It is, by the way, not a bad idea to generalize max

by allowing it to take any number of arguments

Page 40

$curr_max

}

$z = &max ( );

if ( defined $z ) { print $z . "\n"; }
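Only the tail of the generalized max survives above. One way to complete it, as a sketch under assumptions: take the first argument as the running maximum and compare it with the rest. The loop body is an assumption; $curr_max and the undef result for an empty argument list follow the surviving fragment.

```perl
use strict;
use warnings;

sub max {
    my $curr_max = shift @_;          # undef if no arguments were given
    foreach my $value (@_) {
        $curr_max = $value if $value > $curr_max;
    }
    $curr_max                         # last expression: the return value
}

my $z = &max();
if (defined $z) { print $z . "\n"; }  # prints nothing: there was no maximum

print &max(16, 19, 23), "\n";
```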

Page 41

Regular expressions (1)

• A regular expression (also called a pattern) is a template that describes a class of strings. A string either matches the pattern or does not

• The simplest pattern is one character

• A character class (the pattern matches any of these characters) is written in square brackets:

[01234567]   an octal digit
[0-7]        an octal digit
[0-9A-F]     a hex digit
[^A-Za-z]    not a letter (^ "negates")
[0-9-]       a decimal digit or a minus
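The character classes listed above can be tried out with a short sketch. The %class table and the in_class helper are illustrative names; qr// simply stores a compiled pattern in a variable.

```perl
use strict;
use warnings;

my %class = (
    octal          => qr/[0-7]/,
    hex            => qr/[0-9A-F]/,
    non_letter     => qr/[^A-Za-z]/,
    digit_or_minus => qr/[0-9-]/,
);

# Report 1 if the single character belongs to the named class, else 0.
sub in_class {
    my ($name, $char) = @_;
    return $char =~ $class{$name} ? 1 : 0;
}

print "7 octal?  ", in_class('octal', '7'), "\n";
print "8 octal?  ", in_class('octal', '8'), "\n";
print "G hex?    ", in_class('hex', 'G'), "\n";
print "- digit or minus? ", in_class('digit_or_minus', '-'), "\n";
```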

Page 42

Regular expressions (2)

• Metacharacters:

. (dot) any character except \n

• Anchors:

^ the beginning of a string

$ the end of a string

Page 43

Regular expressions (3)

$x = "01239876AGH";

if ( $x =~ /^0[1-9]{4,}/ ) { print "yes1\n"; }

if ( $x =~ /[A-Z]{3}$/ ) { print "yes2\n"; }

if ( $x =~ /^.*[A-Z]{4}$/ ) { print "yes3\n"; }

• The Boolean operator =~ tries to match a string

with a regular expression written inside slashes

Page 44

• Patterns can be grouped by parentheses (the whole pattern becomes one item)

• Alternatives are denoted by the bar |
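Grouping and alternation can be sketched as follows. The helper names and the sample words are illustrative; the anchors and the {2} multiplier are the ones used elsewhere in these slides.

```perl
use strict;
use warnings;

# Alternation inside a group: "cat" or "dog", optionally followed by "s".
sub is_pet {
    my ($s) = @_;
    return $s =~ /^(cat|dog)s?$/ ? 1 : 0;
}

# A group treated as one item: the multiplier applies to all of "ab".
sub is_abab {
    my ($s) = @_;
    return $s =~ /^(ab){2}$/ ? 1 : 0;
}

print "cats: ", is_pet("cats"), "\n";
print "cow:  ", is_pet("cow"),  "\n";
print "abab: ", is_abab("abab"), "\n";
print "abb:  ", is_abab("abb"),  "\n";
```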

Page 45

• Some character classes are predefined:

class not class
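The rows of this table are missing from this copy. As an assumption about what it listed, Perl's standard shorthand classes are \d (digit), \w (word character) and \s (whitespace), with \D, \W and \S as their negations. The sample string below is illustrative.

```perl
use strict;
use warnings;

my $s = "Room 42\n";

my ($digits)     = $s =~ /(\d+)/;    # first run of digits
my ($word)       = $s =~ /(\w+)/;    # first run of word characters
my ($non_digits) = $s =~ /(\D+)/;    # first run of non-digits

(my $compact = $s) =~ s/\s//g;       # copy, then remove all whitespace

print "digits=$digits word=$word compact=$compact\n";
```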

Page 46

Regular expression examples (1)

Page 47

Regular expression examples (2)

$j = "JjJjJjJj";

Page 48

Regular expression examples (3)

$k = "Boom Boom, out go the lights!";

$k =~ /(Boom\W){2}/; # yes: \W is space, comma

$k =~ /\Bgo\B/; # no: "go" is a complete word

Page 49

Regular expression substitution (1)

We can modify a string variable by applying a substitution. The operator is =~ and the substitution is written as:

Page 50

Regular expression substitution (2)

Matched patterns are remembered in built-in variables $1, $2, $3 etc. These variables keep their values until the next successful match.

Each set of parentheses in a pattern corresponds to a

'just' a single string to play with

a single string to play with, just just
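The slide's own example is only partly preserved. As a separate sketch of the same mechanism, the following swaps two words using $1 and $2; the sample text and variable names are illustrative.

```perl
use strict;
use warnings;

my $text    = "John Smith";
my $swapped = "";

# Each pair of parentheses captures one word: $1 is the first, $2 the second.
if ($text =~ /(\w+) (\w+)/) {
    $swapped = "$2, $1";
}

print "$swapped\n";
```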

Page 51

Regular expression substitution (3)

A substitution can be applied to all occurrences of

the pattern, that is, globally:
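A sketch contrasting a single substitution with a global one, using the g modifier of s///. The sample strings are illustrative.

```perl
use strict;
use warnings;

my $first_only = "a-b-c-d";
$first_only =~ s/-/+/;                   # replaces the first match only

my $every = "a-b-c-d";
my $replaced = ($every =~ s/-/+/g);      # g: replace every match; returns the count

print "$first_only\n";
print "$every ($replaced replacements)\n";
```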

Page 52

Regular expression substitution (4)

$v = "This is a double double word.";

$v =~ s/(\b\w+\b) \1/$1/;   # in the replacement part, write $1 rather than \1

print "$v\n";

This is a double word.

$v = "This is a triple triple triple word.";

Page 53

Regular expression substitution (5)

Here is a more realistic example (last year's homework). It needs some explanation, which will be given in class.

# Find all dates, selecting and reinserting the context.
# $1 and $6 match the context. Superfluous digits,
# such as 43 and 55 in 432001-01-2255, belong in the context.
# "Dates" such as April 31 or February 30 are allowed.
# There are no provisions for leap years.
s/(\D*)(($Year)-($Month)-($Day))(\D|.*$)/$1<date>$2<\/date>$6/g;
s/(\D*)(($Day)-($Month)-($Year))(\D|.*$)/$1<date>$2<\/date>$6/g;
print $_;
}

Page 54

Regular expression substitution (6)

DATA

Both 12-09-2000 and 25-8-324 are good dates,

but 30-14-1955 and 10-10-10 are not. OTOH, 10-10-010 is.

Page 55

In another course 

• Predefined variables (lots!)

• More on lists, arrays and hashes

• More on regular expressions

Page 56

Mistakes that novices make (1)

Adapted from Programming Perl, page 361. Thanks to Alan Williams for this list.

1. Testing "all-at-once" instead of incrementally, either bottom-up or top-down

2. Optimistically skipping print scaffolding to dump values and show progress

3. Not running with the perl -w switch to catch obvious typographical errors

4. Leaving off $ or @ or % from the front of a variable

5. Forgetting the trailing semicolon

Page 57

Mistakes that novices make (2)

7. Unbalanced (), {}, [], "", '', ``, and sometimes <>

8. Confusing '' and "", or / and \

9. Using == instead of eq, != instead of ne, = instead of ==, and so on

• ('White' == 'Black') and ($x = 5) evaluate as (0 == 0) and (5) and thus are true!

10. Using "else if" instead of "elsif"

11. Putting a comma after the file handle in a print statement
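Mistake 9 is easy to reproduce. The sketch below evaluates the two expressions from the bullet and the correct string comparison; the variable names are illustrative, and the warnings that -w would normally issue for these lines are switched off inside the block so the demonstration runs quietly.

```perl
use strict;
use warnings;

my ($numeric_eq, $assigned);
my $x = 0;
{
    no warnings qw(numeric syntax);     # -w would flag both lines below
    $numeric_eq = ('White' == 'Black') ? 1 : 0;   # both strings are 0 as numbers
    $assigned   = ($x = 5) ? 1 : 0;               # assignment, value 5, so true
}
my $string_eq = ('White' eq 'Black') ? 1 : 0;     # the comparison that was meant

print "== says $numeric_eq, eq says $string_eq, = says $assigned\n";
```

Running with -w (mistake 3 in this list) is what catches the first of these in practice.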

Page 58

Mistakes that novices make (3)

12. Not chopping the output of backquotes `date` or not

normally start at 0, not 1

14. Using $_, $1, or other side-effect variables, then modifying the code in a way that unknowingly affects or is affected by these

15. Forgetting that regular expressions are greedy, seeking

Date posted: 23/10/2014, 16:11
