Perl course
The teacher:
Peter Wad Sackett
Center for Biological Sequence Analysis
pws@cbs.dtu.dk
Computer scientist
Programmed in Perl since 1995
Taught Perl since 2002
The beginner book
Learning Perl, 4th ed.
by Randal Schwartz, Tom Phoenix & brian d foy (O'Reilly)
The bible
Programming Perl, 3rd ed.
by Larry Wall, Tom Christiansen & Jon Orwant (O'Reilly)
The rest are more or less successful spin-offs
Perl strengths and weaknesses
PROS:
Fairly standard C-like syntax
Runs on Unix, Windows and Mac among others
Powerful text parsing facilities
Large library base
Quick development
Known as the ”glue” that connects applications
CONS:
Not as quick as compiled languages
Possible (and easy) to write ugly, hard-to-maintain code
All simple variables (scalars) start with $
A variable name may contain alphanumeric characters and underscore
Case matters
A simple variable can be either a string or a floating point number, and does not need to be declared as any specific type.
Perl has a number of predefined variables (sometimes used), consisting of $ and a single non-alphanumeric character
Examples: $var1, $i, $MyCount, $remember_this
Numbers and operators
Numbers are assigned in a ”natural” manner;
Bitwise operators:
| (or) & (and) ^ (xor) ~ (not) >> (rightshift) << (leftshift)
Autoincrement and autodecrement:
++ --
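A minimal sketch of number assignment and the operators listed above. The variable names and values are made up for illustration:

```perl
use strict;
use warnings;

# Numbers are assigned without any type declaration.
my $int   = 42;
my $float = 3.14;
my $big   = 1.5e3;          # scientific notation, i.e. 1500

# Bitwise operators work on the integer value.
my $and   = 12 & 10;        # 1100 & 1010 = 1000 (8)
my $or    = 12 | 10;        # 1100 | 1010 = 1110 (14)
my $xor   = 12 ^ 10;        # 1100 ^ 1010 = 0110 (6)
my $left  = 1 << 4;         # 16
my $right = 256 >> 2;       # 64

# Autoincrement and autodecrement.
my $count = 5;
$count++;                   # now 6
$count--;                   # back to 5

print "$and $or $xor $left $right $count\n";
```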
Strings are assigned with quotes:
$string = 'This is a literal string';
$string = "This is an $interpolated string\n";
Interpolated strings are searched for variables and special character combinations that have meaning, like \n for newline and \t for tab
If a number is used in a string context then it is changed to a string and vice versa
String operators:
. (concatenation) x (repetition)
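The difference between the two kinds of quotes, and the two string operators, can be sketched like this. The names and values are illustrative only:

```perl
use strict;
use warnings;

my $name    = 'Peter';
my $literal = 'Hello $name\n';      # single quotes: nothing is interpolated
my $interp  = "Hello $name\n";      # double quotes: $name and \n are expanded

my $greeting = 'Hello' . ', ' . 'world';   # concatenation
my $line     = '-' x 10;                   # repetition

# A number used as a string, and a string used as a number.
my $text = 'Total: ' . 3 * 4;       # the number 12 becomes the string "12"
my $sum  = '10' + 5;                # the string "10" becomes the number 10

print $interp, "$greeting\n", "$line\n", "$text\n", "$sum\n";
```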
elsif (predicate2) { # no spelling mistake
# this will be executed if this predicate is true
&& and    || or    ! not    xor
Examples:
$age > 18 and $height < 1.4
($name eq 'Peter' or $name eq 'Chris') and $wage <= 25000
Perl uses short-circuit (lazy) evaluation
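Short-circuit evaluation means the right-hand side is only evaluated when it can still affect the result. A small sketch, where expensive_check is a hypothetical helper that counts how often it is called:

```perl
use strict;
use warnings;

my $calls = 0;
sub expensive_check {       # hypothetical helper, counts its invocations
    $calls++;
    return 1;
}

my $age = 15;

# The left side is false, so the right side of "and" is never evaluated.
my $adult_ok = ($age > 18 and expensive_check());

# The left side is true, so "or" does not need the right side either.
my $minor_ok = ($age <= 18 or expensive_check());

print "expensive_check was called $calls times\n";
```

This is also why the common idiom open(...) or die ... works: die only runs when open returns false.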
Shorthand notation
Often if statements and sometimes loops have only one line of code to be executed in the block. Perl has a shorthand notation for that.
print "$_\n" for (1 .. 10); # the loop modifier takes a list, not a C-style (init; test; step)
As seen, the structure of the statement is turned around.
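A short sketch of the statement modifiers Perl offers (if, unless, for/foreach, while). The variables are made up for the example:

```perl
use strict;
use warnings;

my $age = 20;
my @out;

# The statement comes first, the condition or loop last.
push @out, 'adult'   if $age >= 18;
push @out, 'child'   unless $age >= 18;
push @out, "line $_" for (1 .. 3);      # $_ holds the current value

my $n = 0;
$n++ while $n < 5;

print "$_\n" for @out;
```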
Output – printing to screen
The print statement prints a comma-separated list of values
print "Hello world\n";
print 'Result is ', $num1 + $num2, "\n";
print "My name is $name\n";
For better output formatting use printf, which is similar to the C function
printf ("%02d/%02d %04d\n", $day, $month, $year);
printf ("Sum is %7.2f\n", $sum);
The output of print(f) goes to the last selected filehandle unless otherwise specified. This is usually STDOUT, which is usually the screen.
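The formats above can be tried out with sprintf, which takes the same format string as printf but returns the result instead of printing it. The date and number below are made-up values:

```perl
use strict;
use warnings;

my ($day, $month, $year) = (7, 3, 2024);    # made-up example date

my $date = sprintf("%02d/%02d %04d", $day, $month, $year);   # zero-padded fields
my $sum  = sprintf("Sum is %7.2f", 3.14159);                 # width 7, 2 decimals

print "$date\n";
print STDOUT "$sum\n";      # explicit filehandle: no comma after STDOUT
```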
Input – getting it from the keyboard
The keyboard is usually STDIN unless redirection is in play
Lines are read from the keyboard like any lines are read from a filehandle
If there is no input on the line (EoF, EoT) then $line is assigned the undefined value. There is a function for checking that, too.
if (defined $line) {}
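A sketch of the usual read loop. To let it run unattended, an in-memory filehandle stands in for the keyboard here; that substitution is an assumption of this example, and a real program would read from <STDIN> instead of <$in>:

```perl
use strict;
use warnings;

# Simulated keyboard input; replace <$in> with <STDIN> in a real program.
my $typed = "Peter\n42\n";
open(my $in, '<', \$typed) or die "Can't open in-memory file: $!\n";

my @lines;
while (defined(my $line = <$in>)) {    # undef signals end of input
    chomp $line;                       # remove the trailing newline
    push @lines, $line;
}
close $in;

print "Read: $_\n" for @lines;
```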
A simple Perl program
Variables are declared by the keyword my and are private (local) to the block in which they are declared.
Scope (lexical)
A block can be considered as the statements between { and }
A variable declared with my is known only in the enclosing block
Only the most recently declared variable of a given name is known in the block
my $age; # declared in the main program, so it is visible everywhere below (in effect a global)
# here is unwritten code that gets age
if ($age < 10) {
    for (my $i = 1; $i < $age; $i++) { # private $i
        print "Year: $i\n";
    }
}
elsif ($age > 80) {
    my $age = 40; # private $age only known in this block
    print "You are only $age years old.\n";
}
print "You are really $age years old.\n";
Opening files
The modern open is a three-parameter function call
open(FILEHANDLE, $mode, $filename)
The usual file modes are:
< reading
> writing
>> appending
+< reading and writing
|- output is piped to program in $filename
-| output from program in $filename is piped (read) to Perl
open(IN, '<', "myfile.txt") or die "Can't read file $!\n";
close IN;
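A sketch of writing, appending and reading with the three-parameter open. It uses lexical filehandles ($out, $in) instead of the bareword IN, which is a stylistic choice of this example, and a scratch file from the core module File::Temp so no real data is touched:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file that is removed when the program ends.
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
close $tmp_fh;

open(my $out, '>', $filename) or die "Can't write $filename: $!\n";
print $out "first line\n";
close $out;

open($out, '>>', $filename) or die "Can't append to $filename: $!\n";
print $out "second line\n";
close $out;

open(my $in, '<', $filename) or die "Can't read $filename: $!\n";
my @lines = <$in>;          # list context reads all lines at once
close $in;

print "The file has ", scalar(@lines), " lines\n";
```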
File system functions
File test operators
There is a whole range of file test operators that all look like -X
print "File exists" if -e $filename;
Some of the more useful are:
-e True if file exists
-z True if file has zero size
-s Returns file size
-T True if text file
-B True if binary file
-r True if file is readable by effective uid/gid
-d True if file is a directory
-l True if file is a symbolic link
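A few of these operators in action, as a sketch. The file and directory are temporary ones created with File::Temp just for the example:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile tempdir);

my $dir = tempdir(CLEANUP => 1);
my ($fh, $filename) = tempfile(DIR => $dir);
print $fh "some text\n";
close $fh;

my $exists  = -e $filename          ? 1 : 0;
my $size    = -s $filename;                     # size in bytes
my $is_dir  = -d $dir               ? 1 : 0;
my $missing = -e "$dir/no_such_file" ? 1 : 0;

print "exists=$exists size=$size dir=$is_dir missing=$missing\n";
```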
String functions 1
Remove a trailing record separator from a string, usually newline
my $no_of_chars_removed = chomp $line;
Remove the last character from a string
my $char_removed = chop $line;
Return lower-case version of a string
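A sketch of these functions on a made-up string. lc is the built-in for lower-casing; uc and ucfirst are shown alongside it for comparison:

```perl
use strict;
use warnings;

my $line = "Hello World\n";

my $removed = chomp $line;      # removes the newline, returns 1
my $last    = chop $line;       # removes and returns the last character

my $lower = lc $line;           # 'hello worl'
my $upper = uc $line;           # 'HELLO WORL'
my $first = ucfirst $lower;     # 'Hello worl'

print "$line | $lower | $upper | $first\n";
```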
String functions 2
Strings start with position 0
Return the number of characters in a string
my $len = length($string);
Find a substring within a string
my $pos = index($string, $substring, $optional_position);
Right-to-left substring search
my $pos = rindex($string, $substring, $optional_position);
Flip/reverse a string
my $rstring = reverse $string;
Formatted print into a string (like printf)
sprintf($format, $variables…);
Get or alter a portion of a string
my $substring = substr($string, $position);
my $substring = substr($string, $position, $length);
substr($string, $position, $length, $replacementstring);
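The functions above applied to one made-up string, as a sketch:

```perl
use strict;
use warnings;

my $string = 'The quick brown fox';

my $len  = length($string);             # 19
my $pos  = index($string, 'quick');     # 4
my $none = index($string, 'cat');       # -1 when nothing is found
my $last = rindex($string, 'o');        # 17, the "o" in "fox"
my $word = substr($string, 4, 5);       # 'quick'
my $tail = substr($string, -3);         # negative position counts from the end

substr($string, 4, 5, 'slow');          # 4-argument form replaces in place

# reverse flips a string only in scalar context; in list context
# it reverses the order of the list elements instead.
my $rev = reverse 'abc';

print "$string, $rev\n";
```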
Stateful parsing
Stateful parsing is a robust and simple method to read data that are split up on several lines in a file. It works by recognizing the line (or line before) where data starts (green line) and the line (or line after) where it ends (red line). The green and/or red line can contain part of the data. The principle is shown here, but code can easily be added to handle specific situations.
my $flag = 0;
my $data = '';
while (defined (my $line = <IN>)) {
    $flag = 0 if $line =~ m/^red/;      # red line: stop collecting ($line still ends in a newline, so eq would not match)
    $data .= $line if $flag == 1;       # append, so lines collected earlier are kept
    $flag = 1 if $line =~ m/^green/;    # green line: start collecting
}
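A runnable sketch of the same principle. The input text and the BEGIN/END markers are invented for the example, and an in-memory filehandle replaces the file:

```perl
use strict;
use warnings;

# Made-up input: the data of interest sits between BEGIN and END.
my $text = "header\nBEGIN\nACGT\nTTGA\nEND\nfooter\n";
open(my $in, '<', \$text) or die "Can't open in-memory file: $!\n";

my $flag = 0;
my $data = '';
while (defined(my $line = <$in>)) {
    chomp $line;
    $flag = 0 if $line eq 'END';        # red line: stop collecting
    $data .= $line if $flag == 1;       # collect while the flag is up
    $flag = 1 if $line eq 'BEGIN';      # green line: start collecting
}
close $in;

print "$data\n";
```

The order of the three statements decides whether the marker lines themselves end up in $data. With the order shown, neither marker line is collected.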
Arrays are denoted with @. They are initialized as a comma-separated list of values (scalars). They can contain any mix of numbers, strings or references. The first element is at position 0, i.e. arrays are zero-based. There is no need to declare the size of the array except for performance reasons for large arrays. It grows and shrinks as needed.
my @array;
my @array = (1, 'two', 3, 'four is 4');
Individual elements are accessed as variables, i.e with $
print $array[0], $array[1];
Length of an array
scalar(@array) == $#array + 1
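A small sketch of element access and array length, using the array from above:

```perl
use strict;
use warnings;

my @array = (1, 'two', 3, 'four is 4');

my $first      = $array[0];         # 1
my $last       = $array[-1];        # negative index counts from the end
my $length     = scalar(@array);    # 4
my $last_index = $#array;           # 3

$array[5] = 'six';                  # the array grows; $array[4] is undefined
my $new_length = @array;            # scalar context gives the length: 6

print "$first, $last, $length, $last_index, $new_length\n";
```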
Iterating over arrays
A straightforward for-loop
for (my $i = 0; $i <= $#array; $i++) {
    print $array[$i]*2, "\n";
}
The special foreach-loop designed for arrays
foreach my $element (@array) {
    print $element*2, "\n";
}
If you change the $element inside the foreach loop, the actual value in the array is changed
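The aliasing can be seen in a short sketch. The arrays are made-up examples:

```perl
use strict;
use warnings;

my @prices = (10, 20, 30);

# $element is an alias, so the array itself is modified.
foreach my $element (@prices) {
    $element *= 2;
}

# Collect results in a second array when the original must stay untouched.
my @original = (1, 2, 3);
my @doubled;
foreach my $element (@original) {
    push @doubled, $element * 2;
}

print "@prices | @original | @doubled\n";
```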
Adding and/or removing elements at any place in an array
my @goners = splice(@array, $position);
my @goners = splice(@array, $position, $length);
my @goners = splice(@array, $position, $length, $value);
my @goners = splice(@array, $position, $length, @tmp);
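The four forms of splice, traced step by step on a made-up array:

```perl
use strict;
use warnings;

my @array = ('a' .. 'f');                       # a b c d e f

my @tail = splice(@array, 4);                   # removes e f   -> a b c d
my @mid  = splice(@array, 1, 2);                # removes b c   -> a d
splice(@array, 1, 0, 'x', 'y');                 # length 0 inserts -> a x y d
my @old  = splice(@array, 0, 1, 'A');           # replaces a    -> A x y d

print "@array | @tail | @mid | @old\n";
```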
Array functions 2
Sorting an array
@array = sort @array; # alphabetical sort
@array = sort {$a <=> $b} @array; # numerical sort
Reversing an array
@array = reverse @array;
Splitting a string into an array
my @array = split(m/regex/, $string, $optional_number);
my @array = split(' ', $string);
Joining an array into a string
my $string = join("\n", @array);
Find the elements in a list that test true against a given criterion
@newarray = grep(m/regex/, @array);
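A sketch showing why the numerical sort block matters, plus split, join and grep on made-up data:

```perl
use strict;
use warnings;

my @numbers = (10, 9, 100, 1);
my @alpha   = sort @numbers;                    # string order: 1 10 100 9
my @numeric = sort { $a <=> $b } @numbers;      # number order: 1 9 10 100

my @fields = split(m/,/, 'a,b,c');              # ('a', 'b', 'c')
my @words  = split(' ', '  two   words ');      # ' ' skips leading whitespace
my $joined = join('-', @fields);                # 'a-b-c'

my @big    = grep { $_ > 9 } @numbers;          # block form: (10, 100)
my @with_a = grep(m/a/, @fields);               # regex form: ('a')

print "@alpha | @numeric | $joined | @big\n";
```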
Predefined arrays
Perl has a few predefined arrays
@INC, which is a list of include directories used for locating modules
@ARGV, which is the argument vector. Any parameters given to the program on the command line end up here
./perl_program 1 file.txt
@ARGV contains (1, 'file.txt') at program start
Very useful for serious programs
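A sketch of typical @ARGV handling. The first line simulates the command line from the example above so the code runs without arguments; it would be removed in a real program. The parameter names are assumptions:

```perl
use strict;
use warnings;

# Simulates "./perl_program 1 file.txt"; remove this line in a real program.
@ARGV = (1, 'file.txt');

die "Usage: $0 <count> <filename>\n" unless @ARGV == 2;
my ($count, $filename) = @ARGV;

print "Will process $filename $count time(s)\n";
```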
Regular expressions – classes
Regular expressions return a true/false value and the match is available
print "match" if $string =~ m/regex/;
print "match" if not $string !~ m/regex/;
Character classes with [ ]
m/A[BCD]A/ m/A[a-g]A/ m/A[12a-z]A/ m/A[^a-z\d]A/
Regular expressions – quantifiers
Often a match contains repeated parts, like an unknown number of digits. This is done with a quantifier that follows the character or character class to be repeated.
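The standard Perl quantifiers are *, +, ?, {n} and {n,m}. A sketch with made-up test strings, each result stored as 1 (match) or 0 (no match):

```perl
use strict;
use warnings;

my $plus     = ('abc123def' =~ m/\d+/)   ? 1 : 0;   # + : one or more digits
my $optional = ('color' =~ m/colou?r/)   ? 1 : 0;   # ? : zero or one "u"
my $star     = ('AA' =~ m/AB*A/)         ? 1 : 0;   # * : zero or more "B"
my $exact    = ('2024' =~ m/^\d{4}$/)    ? 1 : 0;   # {4}   : exactly four
my $range    = ('12345' =~ m/^\d{2,4}$/) ? 1 : 0;   # {2,4} : fails, five digits

print "$plus $optional $star $exact $range\n";
```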
Regular expressions – groups
Often a pattern consists of repeated groups of characters. A group is created by parentheses. This is also the way to extract data from a match.
m/(AB)+/ m/([A-Z]{1,2}\d{4,})/
The match of the first group will be available in $1, second group in $2
If a data line looks like this, e.g. the first line in a SwissProt entry
ID ASM_HUMAN STANDARD; PRT; 629 AA
$id = $1 if $line =~ m/ID\s+(\w+)/;
Alternation with | is a way to match either this or that
$name = $1 if $string =~ m/(Peter|Chris)/;
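A sketch extracting two groups from the SwissProt line above. The pattern for the sequence length is this example's own addition, not part of the original:

```perl
use strict;
use warnings;

my $line = 'ID   ASM_HUMAN     STANDARD;      PRT;   629 AA';

my ($id, $length);
if ($line =~ m/^ID\s+(\w+).*?(\d+)\s+AA/) {
    ($id, $length) = ($1, $2);      # copy $1, $2 right after the match
}

# In list context a match returns the captured groups directly.
my ($name)             = 'Hello Chris' =~ m/(Peter|Chris)/;
my ($letters, $digits) = 'AB1234' =~ m/^([A-Z]{1,2})(\d{4,})$/;

print "$id $length $name $letters $digits\n";
```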
Regular expressions – bindings
A very useful and performance-efficient trick is to bind the match to the beginning and/or end of the line
m/^ID\s(\w+)/ caret at first position binds to the beginning of the line
m/pattern$/ dollar sign at last position binds to the end of the line
Always define patterns to be as narrow as possible as that makes them stronger and more exact
Regular expressions are best created by matching the pattern you look for, not by matching what the pattern is not
Variables can be used in a pattern
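A sketch of anchors and of variables inside a pattern. The \Q...\E escape, which makes the variable's content match literally, is an addition of this example:

```perl
use strict;
use warnings;

my $line = 'ID   ASM_HUMAN';

my $id      = ($line =~ m/^ID\s+(\w+)/) ? $1 : '';  # anchored at the start
my $keyword = 'HUMAN';
my $at_end  = ($line =~ m/$keyword$/)   ? 1 : 0;    # variable, anchored at the end

# Without \Q...\E the dot in the variable is a regex wildcard.
my $literal  = 'a.c';
my $unquoted = ('abc' =~ m/^$literal$/)     ? 1 : 0;    # 1: "." matches "b"
my $quoted   = ('abc' =~ m/^\Q$literal\E$/) ? 1 : 0;    # 0: "." must be a dot

print "$id $at_end $unquoted $quoted\n";
```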
Regular expressions – modifiers
The function of a RE can be modified by adding a letter after the final /. The most useful modifiers are:
i ignore case
g global – all occurrences
o compile once – a pattern containing variables is interpolated only the first time; it has no effect on patterns without variables
m multiline - ^ and $ match internal lines
m/peter/io finds Peter and PETER in a line - fast
A wonderful trick to find all numbers in a line is
my @array = $line =~ m/\d+/g;
my @array = $line =~ m/=(\d+)/go;
print "Good number" if $line =~ m/^-?\d+(\.\d+)?$/o;
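A sketch of the g, i and m modifiers on made-up input. The o modifier is left out here since it does not change what a constant pattern matches:

```perl
use strict;
use warnings;

my $line = 'x=10 y=25 z=7';

my @numbers = $line =~ m/\d+/g;             # all matches: (10, 25, 7)
my @values  = $line =~ m/[yz]=(\d+)/g;      # with a group, only $1 is returned
my $found   = ('PETER' =~ m/peter/i) ? 1 : 0;

# m lets ^ and $ match at every line start and end inside the string.
my $text = "ID one\nAC two\nID three\n";
my @ids  = $text =~ m/^ID\s+(\w+)$/mg;      # ('one', 'three')

# The number check from the text, applied to a few candidates.
my @good = grep { m/^-?\d+(\.\d+)?$/ } ('42', '-3.14', '3.', 'abc');

print "@numbers | @values | @ids | @good\n";
```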
Regular expressions – substitution
Regular expressions can also be used to replace text in a string. This is substitution and is quite similar to matching. The newtext is quite literal, however $1, $2 etc. work here.
$string =~ s/regex/newtext/;
$string =~ s/(\d+) kr/$1 dollar/; # replacing kroner with dollar
There is a useful extra modifier e which allows perl code to be executed as the replacement text in substitution
$string =~ s/(\d+) kr/'X' x length($1) . ' dollar'/e;
# replacing kroner with dollar, but replacing the amount with X's.
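A sketch of plain substitution, the e modifier and the return value. The strings are made up, and the (my $copy = $string) =~ s/.../.../ idiom is used so the original stays unchanged:

```perl
use strict;
use warnings;

my $string = 'Price: 250 kr';

(my $dollar = $string) =~ s/(\d+) kr/$1 dollar/;
(my $hidden = $string) =~ s/(\d+) kr/'X' x length($1) . ' dollar'/e;

# With g every occurrence is replaced; the substitution returns the count.
my $text  = 'a  b   c';
my $count = ($text =~ s/\s+/ /g);

print "$dollar | $hidden | $text ($count)\n";
```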
Somewhat like simple substitution, transliteration (or translation) replaces characters with other characters in a string
$string =~ tr/SEARCHLIST/REPLACEMENTLIST/;
$dna =~ tr/ATCGatcg/TAGCTAGC/; # Complementing dna
$letters =~ tr/A-Z/a-z/; # lowercasing a string
The modifiers are
c Complement the SEARCHLIST.
d Delete found but unreplaced characters.
s Squash (remove) duplicate replaced characters
Transliteration returns the number of characters replaced, so a quick way to count the number of, say, A's in a string is
$count = $string =~ tr/A/A/;
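A sketch of transliteration with and without modifiers, on made-up strings. Combining tr with reverse to get a reverse complement is this example's own addition:

```perl
use strict;
use warnings;

my $dna = 'ATGCatgc';

(my $comp = $dna) =~ tr/ATCGatcg/TAGCTAGC/;     # complement, upper-cased
my $revcomp = reverse $comp;                    # reverse complement

# An empty replacement list leaves the string unchanged and just counts.
my $count_a = ($dna =~ tr/Aa//);

(my $squashed = 'aabbccdd') =~ tr/a-z//s;       # s: runs become one character
(my $digits = 'tel: 555-1234') =~ tr/0-9//cd;   # c + d: delete all non-digits

print "$comp $revcomp $count_a $squashed $digits\n";
```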
Hashes are unordered lists and are very flexible data structures. Arrays can be considered as a special case of hashes. % is used for a hash. Data in a hash is a number of key/value pairs. One of the more obvious uses of a hash is as a translation table.
my %hash = (1 => 'one', 'one' => 1, 2 => 'two', 'two' => 2);
print $hash{1}, "\n" if $hash{'two'} == $number;
$hash{3} = 'three';
It should be obvious from the key/value pair structure that a key is unique in the hash, whereas a value can be repeated any number of times.
Hash slices are possible on the values. Notice the @.
my @slice = @hash{'one', 'two'};
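A sketch of the translation table above, showing that a key is unique and how a slice works:

```perl
use strict;
use warnings;

my %hash = (1 => 'one', 'one' => 1, 2 => 'two', 'two' => 2);

$hash{3} = 'three';                     # new key: the hash grows
$hash{3} = 'tre';                       # same key: the value is overwritten

my @slice  = @hash{'one', 'two'};       # values for several keys at once
my $n_keys = scalar keys %hash;         # 5

print "$hash{1} $hash{3} @slice $n_keys\n";
```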
Hash functions
delete $hash{$key}; # Deletes a key/value pair
exists $hash{$key} # Returns true if the key/value pair exists
keys %hash # Returns an array with all the keys of the hash
values %hash # Returns an array with the values of the hash
each %hash # Used in iteration over the hash
The usual ways to iterate over a hash are
foreach my $key (keys %hash) {
    print "$key => $hash{$key}\n";
}
while (my ($key, $value) = each %hash) {
    print "$key => $value\n";
}
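Since a hash is unordered, keys are usually sorted before output. A sketch with a made-up word count:

```perl
use strict;
use warnings;

my %count;
$count{$_}++ foreach qw(apple pear apple fig apple pear);

my @by_name  = sort keys %count;                                # by key
my @by_count = sort { $count{$b} <=> $count{$a} } keys %count;  # by value, descending

delete $count{fig};
my $has_fig  = exists $count{fig}  ? 1 : 0;
my $has_pear = exists $count{pear} ? 1 : 0;

foreach my $key (sort keys %count) {
    print "$key => $count{$key}\n";
}
```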
Semi-advanced hash usage
Sparse N-dimensional matrix
You need a large sparsely populated N-dimensional matrix. A very good and easy way is to use a hash, even if a hash is a "flat" data structure. The secret is in constructing an appropriate key. An example could be a three dimensional matrix which could be populated in this way:
$matrix{"$x,$y,$z"} = $value;
Access the matrix like this:
$value = exists $matrix{"$x,$y,$z"} ? $matrix{"$x,$y,$z"} : 0;
Notice that $x, $y, $z are not limited to numbers; they could be SwissProt IDs or other data that makes sense
The matrix does not have to be regular
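The key construction can be wrapped in two small subroutines. This is a sketch; set_cell and get_cell are hypothetical names and the stored values are made up:

```perl
use strict;
use warnings;

my %matrix;

# Hypothetical helpers that hide the key construction.
sub set_cell {
    my ($x, $y, $z, $value) = @_;
    $matrix{"$x,$y,$z"} = $value;
    return;
}
sub get_cell {
    my ($x, $y, $z) = @_;
    my $key = "$x,$y,$z";
    return exists $matrix{$key} ? $matrix{$key} : 0;
}

set_cell(1, 2, 3, 4.5);
set_cell('ASM_HUMAN', 'P53_HUMAN', 0, 0.87);    # keys need not be numbers

my $filled = get_cell(1, 2, 3);         # 4.5
my $empty  = get_cell(9, 9, 9);         # 0, and no entry is created
my $cells  = scalar keys %matrix;       # only 2 cells are stored

print "$filled $empty $cells\n";
```

The separator must not occur inside the coordinates themselves, otherwise two different cells could produce the same key.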
The elements in @_ are aliases to the real variables in the calling environment, meaning that if you change $_[0] etc., it is changed in the main program.
sub mysub {
my ($parm1, $parm2) = @_; # call-by-value
return $parm1 + $parm2;
}
sub mysub2 {
return $_[0] + $_[1]; # call-by-reference
}
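The difference between copying the parameters and working on @_ directly, as a sketch with hypothetical subroutine names:

```perl
use strict;
use warnings;

sub double_copy {
    my ($value) = @_;       # copy: the caller's variable is safe
    $value *= 2;
    return $value;
}

sub double_in_place {
    $_[0] *= 2;             # alias: the caller's variable is changed
    return;
}

my $x = 5;
my $y = double_copy($x);    # $x is still 5, $y is 10
my $before = $x;
double_in_place($x);        # $x is now 10

print "$before $y $x\n";
```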
Subroutines 2
There can be any number of return statements in a subroutine. You can return any number of scalars, arrays and/or hashes, but they will just be flattened into a list. This means that for practical purposes, you can return any number of scalars, but just one array or one hash. The same argument is valid for parameters passed to the subroutine. The way around this problem is to use references.
sub passarray {
    my ($parm1, $parm2, @array) = @_;
}
sub passhash {
    my ($parm1, %hash) = @_;
}
Subroutine calls can be denoted with &; it is optional when the call uses parentheses
my ($res1, $res2) = &calc1($parm1, $parm2);
my %hash = &calc(1, @parmarray);
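A sketch of the flattening described above. The subroutine names are hypothetical:

```perl
use strict;
use warnings;

sub count_args {
    return scalar @_;
}

my @first  = (1, 2);
my @second = (3, 4, 5);
my $n = count_args(@first, @second);    # 5: the two arrays arrive as one list

# An array in the parameter list swallows everything that is left,
# so it has to come last.
sub head_and_rest {
    my ($head, @rest) = @_;
    return ($head, scalar @rest);
}
my ($head, $rest_count) = head_and_rest(@first, @second);

# A hash can be returned, but it travels as a flat key/value list.
sub make_hash {
    return (one => 1, two => 2);
}
my %numbers = make_hash();

print "$n $head $rest_count $numbers{two}\n";
```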
$$hashref{$key} = 'blabla';
${$hashref}{$key} = 'blabla';
$hashref->{$key} = 'blabla';
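The three notations above are equivalent. A sketch that creates references with the backslash operator and dereferences them; the data is made up:

```perl
use strict;
use warnings;

my %hash  = (a => 1);
my @array = (10, 20, 30);

my $hashref  = \%hash;          # backslash creates a reference
my $arrayref = \@array;

$$hashref{b}   = 2;             # all three forms reach the same hash
${$hashref}{c} = 3;
$hashref->{d}  = 4;

my $second = $arrayref->[1];    # 20
my @copy   = @{$arrayref};      # dereference the whole array
my $n_keys = scalar keys %$hashref;

print "$second @copy $n_keys $hash{d}\n";
```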
print "This is a normal variable\n"; }
If the variable tested is really a reference, then the type of reference is returned by ref.
if (ref $reference eq 'SCALAR') {
print "This is a reference to a scalar (variable)\n"; }
elsif (ref $reference eq 'ARRAY') {
print "This is a reference to an array\n"; }
elsif (ref $reference eq 'HASH') {
print "This is a reference to a hash\n"; }
elsif (ref $reference eq 'CODE') {
print "This is a reference to a subroutine\n"; }
elsif (ref $reference eq 'REF') {
print "This is a reference to another reference\n"; }
There are a few other possibilities, but they are seldom used
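The same test can be packed into a small subroutine. This is a sketch; describe is a hypothetical name and the test values are made up:

```perl
use strict;
use warnings;

sub describe {
    my ($thing) = @_;
    my $type = ref $thing;      # empty string for a non-reference
    return $type eq '' ? 'normal variable' : "reference to $type";
}

my @results = map { describe($_) }
    (42, \'text', [1, 2], { a => 1 }, sub { 1 }, \\'text');

print "$_\n" for @results;
```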
sub refarraypass {
    my ($arrayref1, $arrayref2, $hashref) = @_;
    print $hashref->{'key'} if $$arrayref1[1] eq ${$arrayref2}[2];
}
# Main program
&refarraypass(\@monsterarray, \@bigarray, \%tinyhash);
Passing lists as references is efficient, both with respect to performance and memory
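A sketch of passing and returning several lists via references. The subroutine names and data are hypothetical:

```perl
use strict;
use warnings;

# Both arrays arrive intact because only two scalars
# (the references) are passed.
sub longer_array {
    my ($first_ref, $second_ref) = @_;
    return @$first_ref >= @$second_ref ? $first_ref : $second_ref;
}

my @small = (1, 2);
my @large = (1 .. 1000);

my $winner = longer_array(\@small, \@large);
my $size   = scalar @$winner;

# Returning references keeps several lists apart as well.
sub split_even_odd {
    my ($numbers_ref) = @_;
    my (@even, @odd);
    foreach my $n (@$numbers_ref) {
        if ($n % 2 == 0) { push @even, $n } else { push @odd, $n }
    }
    return (\@even, \@odd);
}
my ($even_ref, $odd_ref) = split_even_odd([1 .. 6]);

print "size=$size even=@$even_ref odd=@$odd_ref\n";
```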