Page 1

Introduction to Perl

Matt Hudson

(with thanks to Stuart Brown of NYU, for some great examples and teaching ideas)

Page 2

• clustalw: align protein or DNA sequences

• fasta34: search a sequence using an older, slower, but sometimes more flexible algorithm

Page 3

grep – my favorite

• Allows you to pick out lines of a text file that match a query, count them, and retrieve lines around the match.

grep 'Query=' myblast.txt

What sequences did I BLAST?

grep -c '>' testprotein.txt

How many sequences are in this file?

grep -A 10 '>' testprotein.txt

Give me each header line plus the ten lines that follow it (the start of each protein)

Page 4

ftp commands

• ftp ftp.ncbi.nih.gov    go to the NCBI site
• open                    open a connection
• ls                      same as UNIX
• cd                      same as UNIX
• get                     get me this file
• mget                    get more than one file
• put                     put a file on the server
• lcd                     local cd
• !                       local shell
• close                   close connection
• bye                     exit the ftp program

Page 5

Test time

• OK. You are now up and running with UNIX, and can use it to do some fairly sophisticated bioinformatics.

• We’re going to concentrate on Perl scripting from now on.

Page 6

UNIX books

• You might find that your UNIX skills need some refreshing from time to time. I recommend having one of these books around in case you need some help using the command line:

• For students who haven’t done much UNIX:

Sams Teach Yourself Unix in 24 Hours (4th Edition) (Paperback)

by Dave Taylor

• For more advanced UNIX users:

UNIX System V: A Practical Guide (3rd Edition) (Paperback)

by Mark G. Sobell

• Also, for those of you not so familiar with bioinformatics:

Bioinformatics for Dummies (Paperback)

by Jean-Michel Claverie and Cedric Notredame

Page 7

This I have heard good things about but not used much myself:

Beginning Perl, Second Edition (Paperback)

This is a classic but slow going if you know no programming:

Learning Perl, Fourth Edition (Paperback)

This is better if you have little programming experience, but not a textbook:

Perl for Dummies (Fourth Edition) (Paperback)

by Paul Hoffman

• Once you get started:

Programming Perl, 3rd Edition

Page 8

Why use Perl?

• Interpreted language – quick to program

• Easy to learn compared to most languages

• Designed for working with text files

• Free for all operating systems

• Most popular language in bioinformatics – many scripts available that you can “borrow”, plus ready-made modules

Page 9

• Run the program

• Look at the output

• Correct the errors (debugging)

• Edit the script and try again.

Page 10

All programming courses traditionally start with a program that prints “Hello, world!” So, in keeping with that tradition:

Remember your program?

#!/usr/bin/perl
print "Hello, world\n";

Note:

• No line numbers.

• Each command line ends with a semicolon.

Page 11

– Use \n in a text string to signify a newline.

– The \ character is called “backslash”.

– It is an “escape”: it changes the meaning of the character after it. In this case it changes “n” to “newline”. Other examples are \t (tab) and \$ (print an actual dollar sign; normally a dollar sign has a special meaning).
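The escapes described above can be tried out in a few lines. This is only a minimal sketch; the variable names and the text being printed are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# \t becomes a tab and \n becomes a newline inside double quotes.
my $header = "Gene\tLength\n";

# \$ prints a real dollar sign instead of starting a variable name.
my $cost = "Primers cost \$20\n";

print $header;
print $cost;
```

Try removing the backslash before the dollar sign to see how Perl reacts when it thinks a variable name is starting.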

Page 12

Program details

• Perl programs on UNIX start with a line like:

#!/usr/bin/perl

• Perl treats anything after a # as a comment. On the first line, #! is an instruction not to Perl but to UNIX: it names the program that should run the script.

• Elsewhere in the program # is used for comments to explain the code.

• Lines that are Perl commands end with a semicolon (;).

Page 13

Run your Perl program

Page 14

• In Perl, strings are very important. They are just a series of any text characters: letters, numbers, ><?>:$%^&*, etc.

• In the statement

print "Hello, world\n";

the part inside the quotes is a string.

Page 15

Numbers, etc.

• The other common type of data is a number.

• Perl can handle numbers in most common formats, without any complications:

456 5.6743 6.3E-26

• Arithmetic operators:

+ (add)

- (subtract)

* (multiply)

/ (divide)

** (exponentiation)

Page 16

A program using numbers

#!/usr/bin/perl
print "2+2\n";
print 3*4, "\n";
print "8/2=", 8/2, "\n";

Do you get it?

Numbers in quotes are part of a string.

Numbers outside quotes are numbers, and Perl does the arithmetic before printing the result.
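To see the difference side by side, here is a small sketch in the same spirit as the program above. The variable names are invented for this example.

```perl
use strict;
use warnings;

my $as_string = "2+2";   # inside quotes: just three characters of text
my $as_number = 2 + 2;   # outside quotes: Perl calculates the sum

print "$as_string\n";    # prints 2+2
print $as_number, "\n";  # prints 4
print 2**10, "\n";       # exponentiation: prints 1024
```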

Page 17

• Up till now, we’ve been telling the computer exactly what to print. But in order for the program to generate what is printed, we need to use variables

• A variable name starts with “$”

• It can hold either a string or a number

Page 18

Assigning values

In pretty much all programming languages, = means “assign this value to this variable”.

The “my” command in Perl declares the variable. This is optional but highly recommended.

So, you assign values to a variable as follows:

my $number = 123;
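One common way to make sure you never forget “my” is to put use strict at the top of the script, so that Perl complains about any variable that was not declared. A short sketch, with a made-up gene name as the string value:

```perl
use strict;     # Perl now refuses to run if a variable was never declared
use warnings;   # Perl also warns about likely mistakes

my $number = 123;          # declare and assign in one step
my $gene   = "BRCA1";      # a variable can hold a string too (example value)

$number = $number + 1;     # assigning again needs no second "my"

print "$gene $number\n";   # prints BRCA1 124
```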

Page 19

A program with variables

Page 20

• If you put a variable inside double quotes, Perl interpolates the variable (replaces its name with its value)

print "The number is $number\n";

The number is 9

• If you use single quotes, no interpolation happens

print 'The number is $number\n';

The number is $number\n

• A more flexible way to do this is to “escape” the $

print "The value of \$number is $number\n";
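The three cases above can be put in one script and compared. This is a sketch that assumes $number was set to 9, as in the output shown on the slide; the other variable names are invented.

```perl
use strict;
use warnings;

my $number = 9;

my $double  = "The number is $number\n";               # interpolated
my $single  = 'The number is $number\n';               # taken literally
my $escaped = "The value of \$number is $number\n";    # only the first $ is escaped

print $double;         # The number is 9
print $single, "\n";   # The number is $number\n
print $escaped;        # The value of $number is 9
```

Note that inside single quotes even \n stays as two ordinary characters, which is why the second print adds its own newline.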

Page 21

Variables - summary

• A variable name starts with a $

• It contains a number or a text string

• Use my to define a variable

• Use = to assign a value

• Use \ to stop the variable being interpolated

• Take care with variable names and with changing the contents of variables

Page 22

Standard Input

• To make the program do something, we need to input data

• <STDIN> tells the program to expect input, by default from the keyboard

– Usually this is assigned to a variable

print "Please type a number: ";
my $num = <STDIN>;
print "Your number is $num\n";
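Whatever the user types arrives with the Enter keypress (a newline) still attached, so it is usual to remove it with chomp. The sketch below wraps this in a small helper, read_value, which is a name invented here. So that it runs without a keyboard, it reads from a string that stands in for the typed input.

```perl
use strict;
use warnings;

# Read one line from a filehandle and remove the trailing newline.
sub read_value {
    my ($fh) = @_;
    my $value = <$fh>;
    chomp $value if defined $value;
    return $value;
}

# Stand-in for the keyboard: a string opened as if it were a file.
# In a real script, call read_value(\*STDIN) instead.
my $typed = "42\n";
open(my $fake_input, '<', \$typed) or die "Cannot open in-memory input: $!";

my $num = read_value($fake_input);
print "Your number is $num\n";   # prints: Your number is 42
```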

Page 24

Perl evaluates the expression (1 == 1)

Note: TWO, NOT ONE, EQUALS SIGNS!

The if statement causes the command in curly brackets to be executed ONLY IF the expression is true
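The program this slide describes is not reproduced in this copy, so here is a minimal sketch of what such a test might look like. The $message variable and its text are assumptions made for illustration.

```perl
use strict;
use warnings;

my $message = "nothing happened";

# The expression is true, so the command in the curly brackets runs.
if (1 == 1) {
    $message = "one equals one";
}

# The expression is false, so this block is skipped.
if (1 == 2) {
    $message = "one equals two";
}

print "$message\n";   # prints: one equals one
```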

Page 25

• if evaluates some statement in parentheses (must be true or false)

• Note: conditional block is indented, using tabs

– Perl doesn’t care about indents, but it makes your code more “human readable”

Page 26

Comparing variables

if ($one == $two) {print "one equals two";}

Note there are TWO equals signs in this expression. If you remember, = means “assign this variable this value”. So == actually means “equals”. You can also use:

> Greater than

< Less than

>= Greater than or equal to

<= Less than or equal to

!= Not equal to
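Here is one way the numeric comparisons could be used together. The helper name describe is invented for this sketch, and it uses a subroutine (a named block of code), which these slides have not covered, purely so the same tests can be run on several pairs of numbers.

```perl
use strict;
use warnings;

# Say how the first number compares with the second.
sub describe {
    my ($one, $two) = @_;
    if ($one == $two) { return "equal to"; }
    if ($one >  $two) { return "greater than"; }
    return "less than";
}

print "5 is ", describe(5, 7), " 7\n";   # prints: 5 is less than 7
print "9 is ", describe(9, 7), " 7\n";   # prints: 9 is greater than 7
print "7 is ", describe(7, 7), " 7\n";   # prints: 7 is equal to 7
```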

Page 27

What’s a block?

• In the case of an “if” statement:

• If the test is true, execute all the command lines inside the {} brackets. If not, then go on past the closing } to the statements below

• You can also do stuff in a block over and over again using a loop – more later.

Page 28

die, scum

• die kills your script safely and prints a message

• It is often used to prevent you doing something regrettable – e.g. running your script on a file that doesn’t exist, or overwriting an existing file
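As a rough sketch of the first case, the script below refuses to carry on when a file is missing. The file name is hypothetical, and -e is Perl's "does this file exist?" test. The eval wrapper is only there so this demo can show the message and keep going; a real script would normally just let die stop it.

```perl
use strict;
use warnings;

# Hypothetical file name, for illustration only.
my $file = "no_such_file_for_demo.txt";

# eval catches the die so that the demo can display what die said.
my $ok = eval {
    if (! -e $file) {
        die "Cannot find $file\n";
    }
    1;
};
my $message = $@;   # $@ holds the text that die was given

print "Script stopped with: $message" unless $ok;
```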

Page 29

Exercising the Perl muscles

• Now let’s write a script to ask the user their age, and then deliver an insult specific to the age bracket:

• Over 25 – old fogey

• Under 15 – callow youth

• 15-25 – (insert your own insult here)

Page 30

Conditional Blocks, summary

• An if test can be used to control multiple lines of commands, as in this example:

print "Enter your age: ";
my $age = <STDIN>;
chomp $age;    # remove the newline typed with the number

if ($age < 15) {
    print "You are too young for this kind of work!\n";
    die "too young";
}

if ($age > 25) {
    print "You're old enough to know better!\n";
    die "too old";
}

Page 31

• An array can store multiple pieces of data

• They are essential for the most useful functions of Perl. They can store data such as:

– the lines of a text file (e.g. primer sequences)

– a list of numbers (e.g. BLAST e values)

• Arrays are designated with the symbol @

my @bases = ("A", "C", "G", "T");
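Going slightly beyond the slide, the sketch below shows how individual items might be pulled back out of the array. Two details here are standard Perl rather than anything stated above: a single element is written with $ and square brackets, and counting starts at 0.

```perl
use strict;
use warnings;

my @bases = ("A", "C", "G", "T");

my $first = $bases[0];        # the first element is number 0
my $last  = $bases[3];        # so the fourth element is number 3
my $count = scalar(@bases);   # how many elements the array holds

print "First: $first, last: $last, total: $count\n";
# prints: First: A, last: T, total: 4
```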

Page 32

Converting a variable to an array

split splits a variable into parts and puts them in an array.

my $dnastring = "ACGTGCTA";

my @dnaarray = split //, $dnastring;

@dnaarray is now (A, C, G, T, G, C, T, A)

@dnaarray = split /T/, $dnastring;

@dnaarray is now (ACG, GC, A)
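Both uses of split can be checked with a short script. This sketch reuses the slide's $dnastring; the array names @letters and @pieces are invented here.

```perl
use strict;
use warnings;

my $dnastring = "ACGTGCTA";

my @letters = split //, $dnastring;    # empty pattern: one letter per element
my @pieces  = split /T/, $dnastring;   # cut at every T; the T itself is dropped

print scalar(@letters), " letters: @letters\n";   # 8 letters: A C G T G C T A
print scalar(@pieces), " pieces: @pieces\n";      # 3 pieces: ACG GC A
```

Putting an array inside double quotes prints its elements separated by spaces, which is handy for a quick look at the contents.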

Page 33

Converting an array to a variable

• join combines the elements of an array into a single scalar variable (a string)

$dnastring = join('', @dnaarray);

The first argument is the spacer (empty here); the second is which array to join.
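A small sketch of join with two different spacers; the variable names $plain and $dashed are made up for the example.

```perl
use strict;
use warnings;

my @dnaarray = ("A", "C", "G", "T");

my $plain  = join('', @dnaarray);    # empty spacer: ACGT
my $dashed = join('-', @dnaarray);   # dash as spacer: A-C-G-T

print "$plain\n";
print "$dashed\n";
```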

Page 34

• A loop repeats a set of commands until it is done. The commands are placed in a BLOCK: some code delimited with curly brackets {}

• Loops are really useful with arrays.

• The “foreach” loop is probably the most useful of all:

foreach my $base (@dnaarray) {
    print "$base ";
}
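A foreach loop can also do some work on every element instead of just printing it. The sketch below adds up a list of numbers and counts the big ones; the sequence lengths are invented values, not real data.

```perl
use strict;
use warnings;

# Hypothetical sequence lengths, for illustration only.
my @lengths = (120, 45, 300);

my $total = 0;
my $long  = 0;

foreach my $length (@lengths) {
    $total = $total + $length;       # running total
    if ($length > 100) {
        $long = $long + 1;           # count sequences longer than 100
    }
}

print "Total length: $total\n";      # prints: Total length: 465
print "Longer than 100: $long\n";    # prints: Longer than 100: 2
```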

Page 35

Comparing strings

• String comparison (is the text the same?)

• eq (equal)

• ne (not equal)

• There are others, but beware of them!
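One reason to be careful: using the numeric == on text does not compare the letters. Perl turns each piece of text into a number first, and text with no digits counts as 0, so two different sequences look "equal". A sketch of the pitfall, with invented variable names:

```perl
use strict;
use warnings;

my $first  = "GAATTC";
my $second = "GGATCC";

# Correct: eq compares the text itself.
my $text_verdict = "different";
if ($first eq $second) {
    $text_verdict = "same";
}

# Wrong tool: == treats both strings as the number 0.
my $number_verdict = "different";
{
    no warnings 'numeric';   # silence the warning Perl would rightly give
    if ($first == $second) {
        $number_verdict = "same";
    }
}

print "eq says: $text_verdict\n";     # prints: eq says: different
print "== says: $number_verdict\n";   # prints: == says: same
```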

Page 36

Getting part of a string

• substr takes characters out of a string

$letter = substr($dnastring, $position, 1);

The three arguments are: which string, where in the string, and how many letters to take.
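A quick sketch of substr in action. One detail worth knowing, which is standard Perl rather than something stated on the slide: positions are counted from 0, so the first letter is at position 0. The variable names are made up.

```perl
use strict;
use warnings;

my $dnastring = "ACGTGCTA";

my $first = substr($dnastring, 0, 1);   # one letter from position 0: A
my $three = substr($dnastring, 2, 3);   # three letters from position 2: GTG

print "$first\n";
print "$three\n";
```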

Page 37

Combining strings

• Strings can be concatenated (joined)

• Use the dot . operator
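For instance, two halves of a sequence can be stuck together with the dot. A minimal sketch with invented variable names:

```perl
use strict;
use warnings;

my $left  = "GAA";
my $right = "TTC";

my $site  = $left . $right;                  # GAATTC
my $label = "Site: " . $site . "\n";         # the dot can be chained

print $label;                                # prints: Site: GAATTC
```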

Page 38

Making Decisions - review

• The if statement is generally used together with numerical or string comparison operators, inside an (expression)

numerical: ==, !=, >, <, >=, <=

string: eq, ne

• You can make decisions on each member of an array using a loop which puts each part of the array through the test, one at a time

Page 39

More healthy exercise

• Write a program that asks the user for a DNA restriction site, and then tells them whether that particular sequence matches the site for the restriction enzyme EcoRI, or BamHI, or HindIII.

• Site for EcoRI: GAATTC

• BamHI: GGATCC

• HindIII: AAGCTT
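If you get stuck, here is one possible shape for an answer; there are many others. It is a sketch only: the subroutine name enzyme_for is invented, the input is a fixed string standing in for <STDIN> so the script runs unattended, and it expects the site in capital letters.

```perl
use strict;
use warnings;

# Return the name of the enzyme whose site matches, if any.
sub enzyme_for {
    my ($site) = @_;
    if ($site eq "GAATTC") { return "EcoRI"; }
    if ($site eq "GGATCC") { return "BamHI"; }
    if ($site eq "AAGCTT") { return "HindIII"; }
    return "none of the three enzymes";
}

my $typed = "GGATCC\n";   # stands in for: my $typed = <STDIN>;
chomp $typed;

print "$typed matches ", enzyme_for($typed), "\n";
# prints: GGATCC matches BamHI
```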

Date posted: 23/10/2014, 16:11
