Perl course English slides


Perl course

The teacher:

Peter Wad Sackett

Center for Biological Sequence Analysis

pws@cbs.dtu.dk

Computer scientist

Programmed in Perl since 1995

Taught Perl since 2002


The beginner book

Learning Perl, 4th ed

by Randal Schwartz, Tom Phoenix & brian d foy (O'Reilly)

The bible

Programming Perl, 3rd ed.

by Larry Wall, Tom Christiansen & Jon Orwant (O'Reilly)

The rest are more or less successful spin-offs

Perl strengths and weaknesses

PROS:

Fairly standard C-like syntax

Runs on Unix, Windows and Mac among others

Powerful text parsing facilities

Large library base

Quick development

Known as the "glue" that connects applications

CONS:

Not as quick as compiled languages

Possible (and easy) to make ugly and hard to maintain code

All variables (scalars) start with $

A variable name may contain alphanumeric characters and underscore

Case matters

A simple variable can be either a string or a floating point number, but does not need to be declared as any specific type.

Perl has a number of predefined variables (sometimes used), consisting of $ and a single non-alphanumeric character

Examples: $var1, $i, $MyCount, $remember_this


Numbers and operators

Numbers are assigned in a "natural" manner

Bitwise operators:

| (or) & (and) ^ (xor) ~ (not) >> (right shift) << (left shift)

Autoincrement and autodecrement:

++ --
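A minimal sketch of number assignment and of the operators listed above. The variable names and values are made up for illustration.

```perl
use strict;
use warnings;

my $int   = 42;        # integer
my $float = 3.14;      # floating point
my $big   = 1.5e3;     # scientific notation, i.e. 1500
my $hex   = 0xff;      # hexadecimal literal, i.e. 255

my $or    = 6 | 3;     # 110 | 011 = 111 = 7
my $and   = 6 & 3;     # 110 & 011 = 010 = 2
my $xor   = 6 ^ 3;     # 110 ^ 011 = 101 = 5
my $left  = 1 << 4;    # 16
my $right = 64 >> 2;   # 16

my $count = 5;
$count++;              # autoincrement: now 6
$count++;              # now 7
$count--;              # autodecrement: back to 6

print "$or $and $xor $left $right $count\n";
```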


Strings are assigned with quotes:

$string = 'This is a literal string';

$string = "This is an $interpolated string\n";

Interpolated strings are searched for variables and special character combinations that have meaning, like \n for newline and \t for tab

If a number is used in a string context then it is changed to a string and vice versa

String operators:

. (concatenation) x (repetition)
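The difference between literal and interpolated strings, the two string operators and the automatic number/string conversion can be sketched as follows. All names and values are illustrative.

```perl
use strict;
use warnings;

my $name    = 'Peter';
my $literal = 'Hello $name\n';           # single quotes: taken literally
my $interp  = "Hello $name\n";           # double quotes: $name and \n are expanded
my $joined  = 'Perl' . ' ' . 'course';   # concatenation: 'Perl course'
my $ruler   = '-' x 10;                  # repetition: ten dashes

my ($three, $four) = ('3', 4);
my $sum  = $three + $four;               # string used as a number: 7
my $text = $three . $four;               # number used as a string: '34'

print $interp, "$joined\n$ruler\n$sum $text\n";
```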


elsif (predicate2) { # no spelling mistake

# this will be executed if this predicate is true

&&  and
||  or
!   not
    xor

Examples:

$age > 18 and $height < 1.4

($name eq 'Peter' or $name eq 'Chris') and $wage <= 25000

Perl uses short-circuit (lazy) evaluation
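Short-circuit evaluation means that the right-hand side of and/or is only evaluated when the left-hand side has not already decided the result. A small sketch, where expensive() is a hypothetical helper that simply counts how often it is called:

```perl
use strict;
use warnings;

my $calls = 0;
sub expensive { $calls++; return 1; }   # hypothetical helper, counts its calls

my $age    = 25;
my $height = 1.8;

# The left side is false, so 'and' never evaluates expensive().
my $first  = ($age < 18 and expensive());
# The left side is true, so 'or' stops there and skips expensive().
my $second = ($age > 18 or expensive());
# The left side is true, so 'and' has to evaluate the right side as well.
my $third  = ($height > 1.4 and expensive());

print "expensive() was called $calls time(s)\n";
```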


Shorthand notation

Often if statements, and sometimes loops, have only one line of code to be executed in the block. Perl has a shorthand notation for that

print "$_\n" for (1 .. 10);

As seen, the structure of the statement is turned around
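A few more shorthand forms, as a sketch with made-up variables. Note that the shorthand for only walks through a list; it does not accept the three-part C-style loop header.

```perl
use strict;
use warnings;

my $age = 25;
my @out;

push @out, 'adult'  if $age >= 18;       # shorthand if
push @out, 'senior' unless $age < 65;    # unless is the negated if
push @out, $_ * 2   for (1 .. 3);        # each list item is available in $_

my $i = 0;
$i++ while $i < 10;                      # the single statement is repeated

print "@out $i\n";
```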


Output – printing to screen

The print statement prints a comma separated list of values

print "Hello world\n";

print 'Result is ', $num1 + $num2, "\n";

print "My name is $name\n";

For better output formatting use printf, which is similar to the C function

printf("%02d/%02d %04d\n", $day, $month, $year);

printf("Sum is %7.2f\n", $sum);

The output of print(f) goes to the last selected filehandle unless otherwise specified. This is usually STDOUT, which is usually the screen

Input – getting it from the keyboard

The keyboard is usually STDIN unless redirection is in play

Lines are read from the keyboard just as lines are read from any filehandle

If there is no more input (EoF, EoT) then $line is assigned the undefined value. There is a function for checking that, too.

if (defined $line) {}
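A sketch of the usual read loop built on defined. The helper read_lines is hypothetical, and to keep the example self-contained it reads from an in-memory string; calling read_lines(\*STDIN) would read from the keyboard in the same way.

```perl
use strict;
use warnings;

# Read lines from a filehandle until the read returns undef (end of input).
sub read_lines {
    my ($fh) = @_;
    my @lines;
    while (defined(my $line = <$fh>)) {
        chomp $line;              # strip the trailing newline
        push @lines, $line;
    }
    return @lines;
}

# Stand-in for keyboard input: a filehandle opened on a string.
my $typed = "first\nsecond\n";
open(my $in, '<', \$typed) or die "Can't open in-memory file: $!\n";
my @lines = read_lines($in);
close $in;

print scalar(@lines), " lines read\n";
```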


A simple Perl program

Variables are declared with the keyword my and are private (local) to the block in which they are declared.

Scope (lexical)

A block can be considered as the statements between { and }

A variable declared with my is known only in the enclosing block

Only the most recently declared variable of a given name is known in the block

my $age; # declaring $age in main program making it a global

# here is unwritten code that gets age

if ($age < 10) {
    for (my $i = 1; $i < $age; $i++) { # private $i
        print "Year: $i\n";
    }
}
elsif ($age > 80) {
    my $age = 40; # private $age only known in this block
    print "You are only $age years old.\n";
}

print "You are really $age years old.\n";


Opening files

The modern open is a three-parameter function call

open(FILEHANDLE, $mode, $filename)

The usual file modes are:

<    reading
>    writing
>>   appending
+<   reading and writing
|-   output is piped to program in $filename
-|   output from program in $filename is piped (read) to Perl

open(IN, '<', "myfile.txt") or die "Can't read file $!\n";

close IN;
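A runnable sketch of the write, append and read modes. It uses a scratch file created with the core module File::Temp so that it does not depend on an existing myfile.txt; the file contents are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file that is removed again when the program exits.
my ($tmp, $filename) = tempfile(UNLINK => 1);
close $tmp;

open(OUT, '>', $filename) or die "Can't write file $filename: $!\n";
print OUT "line one\n";
close OUT;

open(OUT, '>>', $filename) or die "Can't append to file $filename: $!\n";
print OUT "line two\n";
close OUT;

open(IN, '<', $filename) or die "Can't read file $filename: $!\n";
my @lines = <IN>;          # list context: read all remaining lines at once
close IN;

print "Read ", scalar(@lines), " lines\n";
```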


File system functions


File test operators

There is a whole range of file test operators that all look like -X

print "File exists" if -e $filename;

Some of the more useful are:

-e True if file exists

-z True if file has zero size

-s Returns file size

-T True if text file

-B True if binary file

-r True if file is readable by effective uid/gid

-d True if file is a directory

-l True if file is a symbolic link


String functions 1

Remove a trailing record separator from a string, usually newline

my $no_of_chars_removed = chomp $line;

Remove the last character from a string

my $char_removed = chop $line;

Return lower-case version of a string
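The functions on this slide in action, as a small sketch. lc returns the lower-case version of a string; uc is its upper-case counterpart. The sample string is made up.

```perl
use strict;
use warnings;

my $line = "Hello World\n";

my $no_of_chars_removed = chomp $line;   # 1, $line is now 'Hello World'
my $char_removed        = chop $line;    # 'd', $line is now 'Hello Worl'

my $lower = lc $line;                    # 'hello worl'
my $upper = uc $line;                    # 'HELLO WORL'

print "$no_of_chars_removed $char_removed $lower $upper\n";
```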


String functions 2

Strings start with position 0

Return the number of characters in a string

my $len = length($string);

Find a substring within a string

my $pos = index($string, $substring, $optional_position);

Right-to-left substring search

my $pos = rindex($string, $substring, $optional_position);

Flip/reverse a string

my $rstring = reverse $string;

Formatted print into a string (like printf)

sprintf($format, $variables…);

Get or alter a portion of a string

my $substring = substr($string, $position);

my $substring = substr($string, $position, $length);

substr($string, $position, $length, $replacementstring);
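A sketch of these functions applied to a made-up DNA string. Positions are zero-based, and index/rindex return -1 when the substring is not found.

```perl
use strict;
use warnings;

my $string = 'ACGTTGCAACGT';

my $len   = length($string);             # 12
my $first = index($string, 'ACG');       # 0
my $next  = index($string, 'ACG', 1);    # 8, search starts at position 1
my $last  = rindex($string, 'GT');      # 10, last occurrence
my $none  = index($string, 'XYZ');       # -1, not found
my $rev   = reverse $string;             # 'TGCAACGTTGCA'
my $part  = substr($string, 4, 4);       # 'TGCA'
my $tail  = substr($string, 8);          # 'ACGT', from position 8 to the end
my $label = sprintf("%s has %03d bases", $part, $len);

substr($string, 0, 3, 'NNN');            # four-parameter form replaces in place

print "$label\n$string\n";
```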


Stateful parsing

Stateful parsing is a robust and simple method to read data that are split over several lines in a file. It works by recognizing the line (or line before) where the data starts (green line) and the line (or line after) where it ends (red line). The green and/or red line can contain part of the data. The principle is shown here, but code can easily be added to handle specific situations

my $flag = 0;
my $data = '';
while (defined (my $line = <IN>)) {
    $flag = 0 if $line eq 'red';      # red line: stop collecting
    $data .= $line if $flag == 1;     # collect the lines between green and red
    $flag = 1 if $line eq 'green';    # green line: start collecting
}
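A runnable version of the principle, as a sketch. The input is a made-up in-memory "file" where the marker lines are literally green and red; in real data the two tests would typically be regular expressions that recognize the start and end lines.

```perl
use strict;
use warnings;

# Hypothetical input: the data sits between a 'green' and a 'red' line.
my $text = "header\ngreen\nATGC\nGGTA\nred\nfooter\n";
open(my $in, '<', \$text) or die "Can't open in-memory file: $!\n";

my $flag = 0;
my $data = '';
while (defined(my $line = <$in>)) {
    chomp $line;                        # compare without the newline
    $flag = 0 if $line eq 'red';        # red line: stop collecting
    $data .= "$line\n" if $flag == 1;   # collect while the flag is up
    $flag = 1 if $line eq 'green';      # green line: collect from the next line on
}
close $in;

print $data;
```

The order of the three statements decides whether the green and red lines themselves become part of the data; here neither does.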

Arrays are denoted with @. They are initialized as a comma separated list of values (scalars). They can contain any mix of numbers, strings or references. The first element is at position 0, i.e. arrays are zero-based. There is no need to declare the size of the array except for performance reasons for large arrays. It grows and shrinks as needed

my @array;

my @array = (1, 'two', 3, 'four is 4');

Individual elements are accessed as variables, i.e with $

print $array[0], $array[1];

Length of an array

scalar(@array) == $#array + 1


Iterating over arrays

A straightforward for-loop

for (my $i = 0; $i <= $#array; $i++) {
    print $array[$i]*2, "\n";
}

The special foreach-loop designed for arrays

foreach my $element (@array) {
    print $element*2, "\n";
}

If you change the $element inside the foreach loop the actual value in the array is changed
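The aliasing can be seen in this small sketch with made-up values. When the array must stay untouched, copy the element first.

```perl
use strict;
use warnings;

my @array = (1, 2, 3);

foreach my $element (@array) {
    $element *= 2;            # $element is an alias, so the array itself changes
}
print "@array\n";             # prints: 2 4 6

my @doubled;
foreach my $element (@array) {
    my $copy = $element;      # work on a copy to leave @array alone
    $copy *= 2;
    push @doubled, $copy;
}
print "@array | @doubled\n";  # prints: 2 4 6 | 4 8 12
```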

Adding and/or removing elements at any place in an array

my @goners = splice(@array, $position);

my @goners = splice(@array, $position, $length);

my @goners = splice(@array, $position, $length, $value);

my @goners = splice(@array, $position, $length, @tmp);
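What the different forms of splice do to an array, as a sketch on a made-up list of letters. A length of 0 inserts without removing anything.

```perl
use strict;
use warnings;

my @array = ('a', 'b', 'c', 'd', 'e', 'f');

my @tail = splice(@array, 4);           # removes e f    -> a b c d
my @mid  = splice(@array, 1, 2);        # removes b c    -> a d
splice(@array, 1, 0, 'x', 'y');         # inserts x y    -> a x y d
my @old  = splice(@array, 0, 1, 'A');   # replaces a     -> A x y d

print "@array | @tail | @mid | @old\n";
```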


Array functions 2

Sorting an array

@array = sort @array; # alphabetical sort

@array = sort {$a <=> $b} @array; # numerical sort

Reversing an array

@array = reverse @array;

Splitting a string into an array

my @array = split(m/regex/, $string, $optional_number);

my @array = split(' ', $string);

Joining an array into a string

my $string = join("\n", @array);

Find the elements in a list that test true against a given criterion

@newarray = grep(m/regex/, @array);
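The functions above applied to made-up data, as a sketch. Note how the default sort compares the numbers as strings.

```perl
use strict;
use warnings;

my @unsorted  = (10, 9, 100, 1);
my @alpha     = sort @unsorted;                 # 1 10 100 9 (compared as strings)
my @numeric   = sort { $a <=> $b } @unsorted;   # 1 9 10 100
my @backwards = reverse @numeric;               # 100 10 9 1

my @words  = split(' ', "  the quick  brown fox ");  # ' ' splits on any whitespace
my @fields = split(m/,/, 'a,b,c,d', 2);              # at most 2 fields: 'a' and 'b,c,d'
my $joined = join('-', @words);                      # 'the-quick-brown-fox'
my @long   = grep(m/^.{4,}$/, @words);               # words of four or more characters

print "@alpha | @numeric | @backwards\n";
print "$joined | @fields | @long\n";
```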


Predefined arrays

Perl has a few predefined arrays

@INC, which is a list of include directories used for locating modules

@ARGV, which is the argument vector. Any parameters given to the program on the command line end up here

./perl_program 1 file.txt

@ARGV contains (1, 'file.txt') at program start

Very useful for serious programs


Regular expressions – classes

Regular expressions return a true/false value and the match is available

print "match" if $string =~ m/regex/;

print "match" if not $string !~ m/regex/;

Character classes with [ ]

m/A[BCD]A/ m/A[a-g]A/ m/A[12a-z]A/ m/A[^a-z\d]A/

Regular expressions – quantifiers

Often a match contains repeated parts, like an unknown number of digits. This is done with a quantifier that follows the character or class it applies to
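The slide's own list of quantifiers did not survive, so the following sketch shows the standard Perl quantifiers on made-up strings. Each test stores 1 for a match and 0 for no match.

```perl
use strict;
use warnings;

my %matched;
$matched{star}  = 'AA'       =~ m/^AB*A$/     ? 1 : 0;  # *     zero or more B's
$matched{plus}  = 'AA'       =~ m/^AB+A$/     ? 1 : 0;  # +     one or more, so no match
$matched{quest} = 'ABA'      =~ m/^AB?A$/     ? 1 : 0;  # ?     zero or one
$matched{exact} = 'ABBBA'    =~ m/^AB{3}A$/   ? 1 : 0;  # {3}   exactly three
$matched{range} = 'ABBBBA'   =~ m/^AB{1,3}A$/ ? 1 : 0;  # {1,3} one to three, so no match
$matched{min}   = 'id 12345' =~ m/\d{4,}/     ? 1 : 0;  # {4,}  four or more digits

print "$_: $matched{$_}\n" for sort keys %matched;
```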


Regular expressions – groups

Often a pattern consists of repeated groups of characters. A group is created by parentheses. This is also the way to extract data from a match

m/(AB)+/ m/([A-Z]{1,2}\d{4,})/

The match of the first group will be available in $1, the second group in $2

If a data line looks like this, e.g. the first line in a SwissProt entry

ID ASM_HUMAN STANDARD; PRT; 629 AA

$id = $1 if $line =~ m/ID\s+(\w+)/;

Alternation with | is a way to match either this or that

$name = $1 if $string =~ m/(Peter|Chris)/;
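A sketch that pulls two pieces of data out of the ID line with two groups. The exact spacing of the line and the pattern for the sequence length are assumptions made for the example.

```perl
use strict;
use warnings;

my $line = 'ID   ASM_HUMAN      STANDARD;      PRT;   629 AA.';

my ($id, $length);
if ($line =~ m/^ID\s+(\w+)\s.*?(\d+)\s+AA/) {
    ($id, $length) = ($1, $2);      # first group in $1, second group in $2
}

# In list context a match returns the captured groups directly.
my ($name) = 'Dear Chris,' =~ m/(Peter|Chris)/;

print "$id is $length residues long, hello $name\n";
```

Copy $1 and $2 into named variables right after the match, since the next successful match overwrites them.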


Regular expressions – bindings

A very useful and performance-efficient trick is to bind the match to the beginning and/or end of the line

m/^ID\s(\w+)/ caret at first position binds to the beginning of the line

m/pattern$/ dollar sign at last position binds to the end of the line

Always define patterns to be as narrow as possible as that makes them stronger and more exact

Regular expressions are best created by matching the pattern you look for, not by matching what the pattern is not

Variables can be used in a pattern


Regular expressions – modifiers

The function of a RE can be modified by adding a letter after the final /. The most useful modifiers are:

i ignore case

g global – all occurrences

o compile once; this only makes a difference when the pattern contains variables, whose values must then not change

m multiline - ^ and $ match internal lines

m/peter/io finds Peter and PETER in a line - fast

A wonderful trick to find all numbers in a line is

my @array = $line =~ m/\d+/g;

my @array = $line =~ m/=(\d+)/go;

print "Good number" if $line =~ m/^-?\d+(\.\d+)?$/o;
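The g modifier behaves differently in list and scalar context, which this sketch shows on a made-up line. The number test from the slide is reused with grep to filter a list of candidate strings.

```perl
use strict;
use warnings;

my $line = 'x=10 y=250 z=3 label=none';

my @all   = $line =~ m/\d+/g;       # every run of digits: (10, 250, 3)
my @after = $line =~ m/=(\d+)/g;    # with a group, only the captured parts are returned

my @names;
while ($line =~ m/(\w+)=/g) {       # in scalar context /g finds one match per iteration
    push @names, $1;
}

my @good = grep { m/^-?\d+(\.\d+)?$/ } ('42', '-3.14', '1e5', '7.');

print "@all | @after | @names | @good\n";
```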

Regular expressions - substitution

Regular expressions can also be used to replace text in a string. This is substitution and is quite similar to matching. The new text is quite literal; however, $1, $2 etc. work here

$string =~ s/regex/newtext/;

$string =~ s/(\d+) kr/$1 dollar/; # replacing kroner with dollar

There is a useful extra modifier e which allows perl code to be executed as the replacement text in substitution

$string =~ s/(\d+) kr/'X' x length($1) . ' dollar'/e;

# replacing kroner with dollar, but replacing the amount with X's.
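The substitutions above on a made-up string, as a sketch. The (my $copy = $original) =~ s/.../.../ idiom substitutes in a copy and leaves the original untouched; s/// itself returns the number of replacements made.

```perl
use strict;
use warnings;

my $price = 'Total: 250 kr';

(my $dollar = $price) =~ s/(\d+) kr/$1 dollar/;
(my $masked = $price) =~ s/(\d+) kr/'X' x length($1) . ' dollar'/e;

my $text = 'a b  c   d';
my $n    = ($text =~ s/\s+/_/g);    # with g all occurrences are replaced; $n is 3

print "$dollar\n$masked\n$text ($n replacements)\n";
```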

Somewhat like simple substitution, transliteration (or translation) replaces characters with other characters in a string

$string =~ tr/SEARCHLIST/REPLACEMENTLIST/;

$dna =~ tr/ATCGatcg/TAGCTAGC/; # Complementing dna

$letters =~ tr/A-Z/a-z/; # lowercasing a string

The modifiers are

c Complement the SEARCHLIST.

d Delete found but unreplaced characters.

s Squash (remove) duplicate replaced characters

Transliteration returns the number of characters replaced, so a quick way to count the number of, say, A's in a string is

$count = $string =~ tr/A/A/;
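A sketch of transliteration and its modifiers on a made-up sequence. With an empty replacement list the search list is used as replacement, so the string is left unchanged and tr only counts.

```perl
use strict;
use warnings;

my $dna = 'ATGCatgcNN';

(my $complement = $dna) =~ tr/ATCGatcg/TAGCTAGC/;   # 'TACGTACGNN'
my $revcomp = reverse $complement;                  # 'NNGCATGCAT'

my $a_count = ($dna =~ tr/Aa//);                    # counts A and a: 2

(my $clean = $dna) =~ tr/ACGTacgt//cd;              # c + d: delete all non-bases
(my $squashed = 'AAACCCGGG') =~ tr/A-Z//s;          # s: squash runs into 'ACG'

print "$complement $revcomp $a_count $clean $squashed\n";
```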

Hashes are unordered lists and are very flexible data structures. Arrays can be considered as a special case of hashes. % is used for a hash. Data in a hash is a number of key/value pairs. One of the more obvious uses of a hash is as a translation table

my %hash = (1 => 'one', 'one' => 1, 2 => 'two', 'two' => 2);

print $hash{1}, "\n" if $hash{'two'} == $number;

$hash{3} = 'three';

It should be obvious from the key/value pair structure that a key is unique in the hash, whereas a value can be repeated any number of times

Hash slices are possible on the values. Notice the @

my @slice = @hash{'one', 'two'};
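A hash used as a translation table, sketched with a tiny codon table. The table is deliberately incomplete and the sequence is made up; unknown codons are translated to X.

```perl
use strict;
use warnings;

my %aa = ('ATG' => 'M', 'TGG' => 'W', 'TAA' => '*');
$aa{'GGC'} = 'G';                      # adding a key/value pair later

my $dna     = 'ATGTGGGGCTAA';
my $protein = '';
for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
    my $codon = substr($dna, $i, 3);
    $protein .= exists $aa{$codon} ? $aa{$codon} : 'X';
}

my @slice = @aa{'ATG', 'TGG'};         # hash slice: ('M', 'W')

print "$protein @slice\n";
```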


Hash functions

delete $hash{$key}; # Deletes a key/value pair

exists $hash{$key} # Returns true if the key/value pair exists

keys %hash # Returns an array with all the keys of the hash

values %hash # Returns an array with the values of the hash

each %hash # Used in iteration over the hash

The usual ways to iterate over a hash are

foreach my $key (keys %hash) {
    print "$key => $hash{$key}\n";
}

while (my ($key, $value) = each %hash) {
    print "$key => $value\n";
}


Semi-advanced hash usage

Sparse N-dimensional matrix

You need a large sparsely populated N-dimensional matrix. A very good and easy way is to use a hash, even if a hash is a "flat" data structure. The secret is in constructing an appropriate key. An example could be a three dimensional matrix which could be populated in this way:

$matrix{"$x,$y,$z"} = $value;

Access the matrix like this:

$value = exists $matrix{"$x,$y,$z"} ? $matrix{"$x,$y,$z"} : 0;

Notice that $x, $y, $z are not limited to numbers; they could be SwissProt IDs or other data that makes sense

The matrix does not have to be regular
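The sparse matrix idea wrapped in two small helper subroutines, as a sketch. The names set_cell and get_cell and the sample coordinates are hypothetical.

```perl
use strict;
use warnings;

my %matrix;

# Store a value under the combined key "x,y,z".
sub set_cell {
    my ($x, $y, $z, $value) = @_;
    $matrix{"$x,$y,$z"} = $value;
    return;
}

# Return the stored value, or 0 for cells that were never populated.
sub get_cell {
    my ($x, $y, $z) = @_;
    return exists $matrix{"$x,$y,$z"} ? $matrix{"$x,$y,$z"} : 0;
}

set_cell(1, 2, 3, 5);
set_cell(1000, 2000, 3000, 7);
set_cell('ASM_HUMAN', 'x', 'y', 1);    # coordinates need not be numbers

print get_cell(1, 2, 3), ' ', get_cell(9, 9, 9), ' ', scalar(keys %matrix), "\n";
```

Only the populated cells take up memory. The separator must be a character that cannot occur in the coordinates themselves, otherwise two different cells could end up with the same key.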

The elements in @_ are aliases to the real variables in the calling environment, meaning that if you change $_[0] etc., it is changed in the main program.

sub mysub {
    my ($parm1, $parm2) = @_; # call-by-value
    return $parm1 + $parm2;
}

sub mysub2 {
    return $_[0] + $_[1]; # call-by-reference
}
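The difference between copying the parameters and working on the aliases in @_ shows up as soon as a subroutine changes them. A sketch with two hypothetical subroutines:

```perl
use strict;
use warnings;

sub double_copy {
    my ($n) = @_;        # $n is a copy, the caller's variable is untouched
    $n *= 2;
    return $n;
}

sub double_in_place {
    $_[0] *= 2;          # $_[0] is an alias, so the caller's variable changes
    return;
}

my $value  = 10;
my $result = double_copy($value);    # $result is 20, $value is still 10
my $before = $value;
double_in_place($value);             # $value is now 20

print "$before $result $value\n";
```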


Subroutines 2

There can be any number of return statements in a subroutine. You can return any number of scalars, arrays and/or hashes, but they will just be flattened into a list. This means that for practical purposes you can return any number of scalars, but just one array or one hash. The same argument is valid for parameters passed to the subroutine. The way around this problem is to use references

sub passarray {
    my ($parm1, $parm2, @array) = @_;
}

sub passhash {
    my ($parm1, %hash) = @_;
}

Subroutine calls are usually denoted with &

my ($res1, $res2) = &calc1($parm1, $parm2);

my %hash = &calc(1, @parmarray);


$$hashref{$key} = 'blabla';

${$hashref}{$key} = 'blabla';

$hashref->{$key} = 'blabla';


print "This is a normal variable\n"; }

If the variable tested is really a reference, then the type of reference is returned by ref.

if (ref $reference eq 'SCALAR') {

print "This is a reference to a scalar (variable)\n"; }

elsif (ref $reference eq 'ARRAY') {

print "This is a reference to an array\n"; }

elsif (ref $reference eq 'HASH') {

print "This is a reference to a hash\n"; }

elsif (ref $reference eq 'CODE') {

print "This is a reference to a subroutine\n"; }

elsif (ref $reference eq 'REF') {

print "This is a reference to another reference\n"; }

There are a few other possibilities, but they are seldom used
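The tests above can be collected in one small subroutine, here sketched as the hypothetical describe. ref returns an empty string for anything that is not a reference.

```perl
use strict;
use warnings;

# Return the reference type, or a fixed text for a non-reference.
sub describe {
    my ($thing) = @_;
    my $type = ref $thing;
    return 'normal variable' if $type eq '';
    return $type;
}

my @kinds = map { describe($_) }
    (42, \'text', [1, 2], { a => 1 }, sub { 1 }, \\'x');

print join(', ', @kinds), "\n";
```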

sub refarraypass {
    my ($arrayref1, $arrayref2, $hashref) = @_;
    print $hashref->{'key'} if $$arrayref1[1] eq ${$arrayref2}[2];
}

# Main program

&refarraypass(\@monsterarray, \@bigarray, \%tinyhash);

Passing lists as references is efficient, both with respect to performance and memory
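A runnable sketch of the same technique. The subroutine summarize and its data are made up; the point is that the two arrays stay separate inside the subroutine and that the caller's hash can be filled in through the reference.

```perl
use strict;
use warnings;

sub summarize {
    my ($arrayref1, $arrayref2, $hashref) = @_;
    my $total = 0;
    $total += $_ for (@$arrayref1, @{$arrayref2});   # both ways of dereferencing
    $hashref->{'total'} = $total;                    # changes the caller's hash
    return (scalar(@$arrayref1), scalar(@$arrayref2));
}

# Main program
my @small = (1, 2, 3);
my @big   = (10, 20);
my %info;
my ($n1, $n2) = summarize(\@small, \@big, \%info);

print "$n1 and $n2 elements, total $info{'total'}\n";
```

Only three scalars (the references) are copied into @_, no matter how large the arrays and the hash are.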
