
DSpace at VNU: An improving method for estimating amino acid replacement models



An improving method for estimating amino acid replacement models

Lê Văn Đạt

University of Engineering and Technology (Trường Đại học Công nghệ). Major: Computer Science; Code: 60 48 01

Supervisor: Dr. Lê Sỹ Vinh

Year of defense: 2012

Abstract: Amino acid replacement models (also called amino acid substitution models or matrices) play important roles in protein phylogenetic analysis and protein sequence alignment. Dayhoff was the first to propose a method for building amino acid models, in 1972. Currently, maximum likelihood (ML) methods are widely used to estimate popular models such as WAG, LG, and FLU. However, ML methods are slow and not applicable to large datasets. The most time-consuming step in estimating matrices is building phylogenetic trees from protein alignments. In this thesis, we propose new methods that overcome this obstacle by splitting large alignments into small ones that still contain enough evolutionary information for estimating matrices. Experiments with both the Pfam and FLU data sets show that the proposed methods are about three to nine times faster than the best current method, while the quality of the estimated matrices is nearly the same. Thus, our methods will enable researchers to estimate matrices from very large datasets.
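The splitting idea in the abstract can be illustrated with a minimal sketch. This excerpt does not give the thesis's actual procedure, so everything below is an assumption made for illustration: the function name `split_alignment_randomly`, the `max_seqs` and `seed` parameters, the choice to split by sequences (rows) while keeping all columns, and the handling of a leftover single sequence.

```python
import random
from typing import Dict, List


def split_alignment_randomly(
    alignment: Dict[str, str],
    max_seqs: int,
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Randomly partition the sequences of an alignment into sub-alignments.

    `alignment` maps sequence names to aligned strings of equal length.
    Each sub-alignment keeps every column and holds at most `max_seqs`
    sequences, except that a leftover single sequence is merged into the
    previous group, which then holds `max_seqs + 1`.
    """
    if max_seqs < 2:
        raise ValueError("max_seqs must be at least 2")
    if len({len(seq) for seq in alignment.values()}) > 1:
        raise ValueError("all sequences must have the same aligned length")

    names = sorted(alignment)  # fixed starting order so a seed is reproducible
    rng = random.Random(seed)
    rng.shuffle(names)

    # Cut the shuffled names into consecutive groups of max_seqs names.
    chunks = [names[i:i + max_seqs] for i in range(0, len(names), max_seqs)]

    # A group with one sequence carries no substitution signal, so merge it
    # into the previous group (an illustrative choice, not from the thesis).
    if len(chunks) > 1 and len(chunks[-1]) < 2:
        chunks[-2].extend(chunks.pop())

    return [{name: alignment[name] for name in chunk} for chunk in chunks]
```

Each sub-alignment could then be passed to a tree-building program separately, which is where a speed-up would come from if tree inference cost grows faster than linearly in the number of sequences.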


List of Figures

1.1 Motivation

1.2 Outline of thesis

2.1 Amino acids and amino acid substitutions

2.2 Markov model

2.3 Amino acid substitution models

2.4 Methods to estimate amino acid substitution models

2.4.1 Counting methods

2.4.2 Maximum likelihood methods

2.4.2.1 Intro

2.4.2.2 Steps to build an amino acid substitution model by maximum likelihood method

2.4.3

3 Alignment splitting methods for estimating amino

3.1 The multiple alignment

3.2 Steps to build an amino acid substitution model by alignment splitting method

3.3 Random alignment splitting

3.4 Tree-based alignment splitting

4 Results

4.1 Compare methods on Pfam data set

4.1.1 Data preparation

4.1.2 Time

4.1.3 P

4.1.4 Robustness of model

4.2 Compare methods on FLU data set

4.2.1 Data preparation

4.2.2 Time

4.2.3 P

4.2.4 Robustness of model


Date posted: 15/12/2017, 11:14
