
Prediction of coenzyme specificity in dehydrogenases/reductases




A hidden Markov model-based method and its application on complete genomes

Yvonne Kallberg1,2 and Bengt Persson1,2

1 IFM Bioinformatics, Linköping University, Sweden

2 Centre for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden

Keywords

bioinformatics; coenzyme specificity; hidden Markov model; prediction; Rossmann fold

Correspondence

B. Persson, IFM Bioinformatics, Linköping University, S-581 83 Linköping, Sweden

Fax: +46 13 137568

Tel: +46 13 282983

E-mail: bpn@ifm.liu.se

(Received 13 December 2005, revised 17 January 2006, accepted 23 January 2006)

doi:10.1111/j.1742-4658.2006.05153.x

Dehydrogenases and reductases are enzymes of fundamental metabolic importance that often adopt a specific structure known as the Rossmann fold. This fold, consisting of a six-stranded β-sheet surrounded by α-helices, is responsible for coenzyme binding. We have developed a method to identify Rossmann folds and predict their coenzyme specificity (NAD, NADP or FAD) using only the amino acid sequence as input. The method is based upon hidden Markov models and sequence pattern analysis. The prediction sensitivity is 79% and the selectivity close to 100%. The method was applied to a set of 68 genomes, representing the three kingdoms archaea, bacteria and eukaryota. In prokaryotes, 3% of the genes were found to code for Rossmann-fold proteins, while the corresponding ratio in eukaryotes is only around 1%. In all genomes, NAD is the most preferred cofactor (41–49%), followed by NADP with 30–38%, while FAD is the least preferred cofactor (21%). However, the NAD preponderance over NADP is most pronounced in archaea, and least in eukaryotes. In all three kingdoms, only 3–8% of the Rossmann proteins are predicted to have more than one membrane-spanning segment, which is much lower than the frequency of membrane proteins in general. Analysis of the major protein types in eukaryotes reveals that the most common type (26%) of the Rossmann proteins are short-chain dehydrogenases/reductases. In addition, the identified Rossmann proteins were analyzed with respect to further protein types, enzyme classes and redundancy. The described method is available at http://www.ifm.liu.se/bioinfo, where the preferred coenzyme and its binding region are predicted given an amino acid sequence as input.

Abbreviations

HMM, hidden Markov model; ORF, open reading frame; SDR, short-chain dehydrogenase/reductase; TM, transmembrane.

Dehydrogenases and reductases are enzymes of fundamental metabolic importance that utilize coenzymes for electron transport (NAD(H), NADP(H) or FAD(H2), herein denoted NAD, NADP and FAD). The enzymes bind the coenzyme through a double βαβαβ fold, resulting in a six-stranded β-sheet surrounded by α-helices, known as the Rossmann fold [1]. This domain is often found in combination with other domains of different folding types, either on the N-terminal side, on the C-terminal side, or interrupting the Rossmann fold [2]. For example, glutathione reductases have two domains of the Rossmann-fold type, one FAD-binding domain that is interrupted by an NAD(P)-binding domain (PDB code 3grs [3]). 6-Phosphogluconate dehydrogenases have an NADP-binding domain of the Rossmann-fold type followed by a C-terminal catalytic domain consisting of α-helices only (PDB code 2pgd [4]).

In the first part of the Rossmann fold (β1α1β2), there are three glycine residues surrounded by hydrophobic residues, with the first glycine at the end of the β1 strand and the other two at the beginning of the α1 helix (Fig. 3, top right, Experimental procedures). The first two glycine residues are involved in dinucleotide binding, while the third is involved in the close packing of the β-strands and the α-helix [5]. Most of the early characterized dehydrogenases/reductases showed a spacing of these glycine residues in a GxGxxG pattern, where 'x' denotes any residue [5,6]. However, as new members of this fold have been recognized, the general pattern is now described as Gx(x)Gx(x)G [7], i.e. the spacing between the glycine residues can be one or two residues. The members of the extended short-chain dehydrogenase/reductase (SDR) family have the GxxGxxG variant of this pattern, whereas the classical SDRs still do not fit the description, since they instead have a GxxxGxG pattern ([8] and references therein).
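The glycine spacings described above translate directly into regular expressions. The following is a minimal sketch, not the authors' implementation: the function name `find_gly_motifs` and the example sequences are hypothetical, and a real search would also consider the hydrophobic residues surrounding the glycines.

```python
import re

# Gx(x)Gx(x)G: one or two arbitrary residues between consecutive glycines.
GENERAL_GLY = r"G.{1,2}G.{1,2}G"
# Classical SDRs are described in the text as GxxxGxG.
CLASSICAL_SDR_GLY = r"G.{3}G.G"

def find_gly_motifs(seq):
    """Return sorted (start, matched_text, label) tuples for every motif hit."""
    hits = []
    for label, pattern in (("Gx(x)Gx(x)G", GENERAL_GLY), ("GxxxGxG", CLASSICAL_SDR_GLY)):
        # Wrapping the pattern in a lookahead makes finditer report overlapping hits.
        for m in re.finditer("(?=(" + pattern + "))", seq):
            hits.append((m.start(), m.group(1), label))
    return sorted(hits)

# Hypothetical toy sequences, not taken from the paper's data set.
print(find_gly_motifs("AAGAGAAGAA"))  # GxGxxG spacing
print(find_gly_motifs("TGAASGIG"))    # GxxxGxG spacing
```

Positions are zero-based. A motif hit alone is only weak evidence for a Rossmann fold, which is why the text combines patterns with hidden Markov models.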

The residues at the end of the β2 strand normally guide identification of the nature of the coenzyme, i.e. whether an enzyme binds FAD, NAD or NADP. In general, the presence of a negatively charged residue indicates that FAD or NAD is the preferred cofactor [5], because of the steric hindrance against accommodating the additional 2′-phosphate found in NADP. NADP-preferring enzymes typically have a basic residue one position down-chain instead [5]. Among the classical SDRs, a basic residue at the position preceding the second glycine residue in the Gly-pattern also indicates that the enzyme prefers NADP over NAD [8].

A more difficult task is to distinguish between the coenzyme types FAD and NAD. Most NAD-preferring enzymes have an aspartic acid residue at the end of the β2-strand, while FAD-preferring enzymes instead have a glutamic acid residue at this position. However, there are exceptions in both cases that prevent this feature from being used to differentiate between the two types.
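The residue rules of thumb from the two preceding paragraphs can be written down as a small lookup. This is only an illustrative sketch of those textbook tendencies, not the classifier developed in this work: the function name `coenzyme_hint` is hypothetical, treating only Lys and Arg as basic is an assumption, and, as noted above, exceptions exist in every branch.

```python
BASIC = {"K", "R"}  # assumption: histidine is not counted as basic here

def coenzyme_hint(beta2_end, next_residue):
    """Rule-of-thumb hint from the residue ending the beta2 strand and the one after it."""
    if beta2_end == "D":
        return "NAD"    # aspartic acid: most NAD-preferring enzymes
    if beta2_end == "E":
        return "FAD"    # glutamic acid: most FAD-preferring enzymes
    if next_residue in BASIC:
        return "NADP"   # basic residue one position down-chain
    return "undetermined"

print(coenzyme_hint("D", "L"))
print(coenzyme_hint("S", "R"))
```

Because the position of the β2 strand end is itself unknown for a raw sequence, such a rule is only usable after the fold has been located.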

We have now developed a method that, from the amino acid sequence alone, identifies a protein with coenzyme binding of the Rossmann type and predicts the coenzyme specificity. The method is applied to all eukaryotic and archaeal genomes and a representative set of bacterial genomes.

Results and discussion

We have developed a method for prediction of coenzyme specificity, based upon hidden Markov models (HMMs) and sequence motifs (see Experimental procedures). To the best of our knowledge, there is no prediction method available with the same applicability as the one presented here. A search in InterPro [9] using key words such as 'Rossmann', 'NAD', 'NADP' and 'FAD' reveals many entries, but there is no single entry that can be used to identify the motifs of interest. While most entries are on the protein family level, there are some on the domain level as well, e.g. 'NAD_BS' (identifier IPR000205), which identifies NAD binding sites. However, this motif only identifies 29 gene products in the human Ensembl [10] database, a number far below what could be expected.

Rossmann fold in completed genomes

The new method was applied to a selection of 68 completed genomes, representing archaea, bacteria and eukaryota. In total, around 9200 Rossmann proteins were identified in these genomes. The median numbers of Rossmann proteins in each organism within eukaryotes, bacteria and archaea are 196, 67 and 59, respectively, corresponding to 1% of the eukaryotic proteins and 3% of the prokaryotic proteins. As expected, the number of predicted coenzyme binding proteins within a genome increases with its size (Fig. 1). The number of Rossmann folds increases steeply for genomes with up to 10 000 open reading frames (ORFs), while it levels out for larger genomes. Among eukaryotes, Oryza sativa is at the top with 655 predicted Rossmann proteins, and Trypanosoma brucei is at the bottom with only three Rossmann proteins. In bacteria, the corresponding extremes are Mycobacterium tuberculosis (185 proteins) and Chlamydophila caviae (13 proteins), while in archaea the top and bottom are represented by Haloarcula marismortui (146 proteins) and Nanoarchaeum equitans (five proteins). The genomes of Oryza sativa and Xenopus tropicalis have many more coenzyme binding proteins than the others (655 and 646, respectively), but given the size of their genomes (61 000 and 53 000 ORFs) the proportions are still within the same range as for other eukaryotes. There are four eukaryotic parasites (Plasmodium falciparum, Plasmodium yoelii, Leishmania major and Entamoeba histolytica) for which the ratio of coenzyme binding proteins is much lower than expected, possibly due to their ability to rely on the dehydrogenase/reductase systems of the host organism.

Fig. 1. Number of coenzyme binding proteins in each genome plotted versus number of open reading frames (ORFs), with archaea, bacteria and eukaryota shown as separate series. The number of Rossmann folds increases steeply for genomes with up to 10 000 ORFs, while it levels out for larger genomes.

Redundancy

Prokaryotic species, with a typical maximum genome size of 5000 ORFs, have a moderate sequence redundancy among their coenzyme binding proteins. Using a threshold of maximum 60% pairwise sequence identity, 0–10% of the sequences are redundant. Most of the small eukaryotic genomes have a comparable level of redundancy. In general, the redundancy of Rossmann proteins is similar to that of other proteins in the genomes. However, there are five genomes that do not follow this pattern. In Thermoplasma volcanium, Pyrococcus horikoshii, Thermococcus kodakaraensis, Candida glabrata and Yarrowia lipolytica, the Rossmann proteins are two to three times more redundant than proteins in general. The redundancy among eukaryotes increases with genome size and is 30–40% for genome sizes around 30 000 ORFs. There are some outliers, e.g. Apis mellifera, with a very high redundancy level of 54% in spite of a rather small genome (17 000 ORFs), but the redundancy in general in this genome is 46%. Comparing the two plant genomes, Arabidopsis thaliana and Oryza sativa, we find different redundancy in general (33% vs. 46%), while the numbers are much closer considering Rossmann proteins only (40% vs. 37%).
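The text does not spell out how the redundancy percentages were computed. One common way to obtain such a figure, shown here purely as an assumption, is a greedy filter that keeps a sequence only if it is at most 60% identical to every sequence already kept; the fraction of discarded sequences is then the redundancy. The function name, the identity callback and the toy identities below are all hypothetical.

```python
def redundancy_fraction(ids, identity, threshold=60.0):
    """Greedy filter: keep a sequence only if its identity to every kept one is <= threshold.

    ids       -- sequence identifiers in processing order
    identity  -- callable (id_a, id_b) -> percent pairwise identity
    Returns the fraction of sequences discarded as redundant.
    """
    kept = []
    for seq_id in ids:
        if all(identity(seq_id, rep) <= threshold for rep in kept):
            kept.append(seq_id)
    return (len(ids) - len(kept)) / len(ids) if ids else 0.0

# Toy identities (percent); unlisted pairs default to 20%.
pairs = {frozenset(("a", "b")): 80.0}
toy_identity = lambda x, y: pairs.get(frozenset((x, y)), 20.0)
print(redundancy_fraction(["a", "b", "c", "d"], toy_identity))
```

The result of a greedy filter depends on processing order, so the exact percentages reported above may have been obtained with a different clustering procedure.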

Prediction of coenzyme specificity

In general, for all kingdoms, NAD is the most preferred specificity, while FAD is the least (Table 1). Irrespective of kingdom, FAD preference constitutes 21% on average, while the NAD and NADP ratios vary somewhat. For nearly all prokaryotic organisms, the NAD-preferring Rossmann folds are more numerous than the NADP-preferring ones (Fig. 2). The only exceptions are Lactobacillus acidophilus, Staphylococcus aureus, Aeropyrum pernix, Pyrobaculum aerophilum, Sulfolobus tokodaii and Thermococcus kodakaraensis. However, among eukaryotes, the NAD- and NADP-preferring enzymes are close to equal in number for most species. In plant, worm and insect, there is a majority of NADP-preferring enzymes, while mammals and chicken have a majority of NAD-preferring enzymes. In a previous study of short-chain dehydrogenases/reductases (SDRs) it was found that NADP is more frequent than NAD in human, mouse, fruit fly, worm, plant and yeast [8]. As mentioned above, this is still valid when including all Rossmann-fold proteins for the lower organisms, but in human and mouse the balance is shifted and NAD is the most frequent coenzyme.

Dual coenzyme sites

Some proteins have two Rossmann binding sites; for example, the flavin monooxygenases have both an FAD and an NAD binding site. Out of the 9200 proteins predicted to have a Rossmann fold, almost 700 have more than one such fold. For all kingdoms, the fraction of Rossmann proteins with dual sites amounts to 0–10%, with some exceptions. Among the eukaryotes, the proportion in Entamoeba histolytica, Plasmodium falciparum and Plasmodium yoelii is 15, 18 and 15%, respectively. The bacterial genome of Chlamydophila caviae also shows a dual-site proportion of 15%, while the archaeal genomes of Thermococcus kodakaraensis and Nanoarchaeum equitans show 17 and 20%, respectively. These high ratios are partly caused by the low number of Rossmann-fold proteins.

Protein families

Among the annotated human Rossmann proteins, most proteins have EC numbers within main group 1 (oxidoreductases). However, there are several SDRs and multifunctional enzymes also within groups 3 (hydrolases), 4 (lyases), and 5 (isomerases), reflecting the versatility of the Rossmann fold.

Among the eukaryotic genomes annotated by Ensembl, 60% of the Rossmann-fold proteins are found to belong to 10 major groups. The SDR superfamily contributes 26% and is by far the largest group (Table 2). The three next largest groups are various flavin-binding oxidoreductases, with proportions of around 6% each. Closely related species show approximately the same number of proteins within each family, but there are a few notable exceptions. Rat aldehyde dehydrogenases, for instance, are almost twice as frequent as mouse aldehyde dehydrogenases, and FAD-dependent pyridine nucleotide-disulphide oxidoreductases are also more numerous in rat compared to mouse. Another species that deviates from the general pattern is yeast. In this species, the fifth major group, zinc-containing alcohol dehydrogenases, has almost as many members as the SDRs (Table 2).

Table 1. Average coenzyme preference among archaeal, bacterial and eukaryotic genomes.

Fig. 2. Coenzyme preferences in all investigated genomes from eukaryota, bacteria and archaea. The left axis shows numbers of coenzyme binding proteins, and the right axis shows numbers of ORFs. Species names are given on the horizontal axis.

Table 2. The 10 most common types of Rossmann-fold proteins in eukaryotic genomes. The types are listed according to annotation of Pfam families as given in the Ensembl entries. The fish genome is represented by Danio rerio, the fly by Drosophila melanogaster, the worm by Caenorhabditis elegans, and the yeast by Saccharomyces cerevisiae. The total column gives the percentage of all proteins of all types and all species included in the study. The species columns give the number of proteins of each type.

Transmembrane regions

A number of dehydrogenases and reductases are membrane-attached. The transmembrane (TM) helix can be found either in the N-terminal part of the protein, as in 11-beta hydroxysteroid dehydrogenase type 1 [11], or in the C-terminal part, as in monoamine oxidase B [12]. There can also be multiple TM helices, as in the proton-pumping nicotinamide nucleotide transhydrogenase, a three-domain protein with the first and third domains binding NAD and NADP, respectively, and the second domain consisting of 13–14 TM helices [13].

For all Rossmann proteins found in the genomes, transmembrane regions were predicted (see Experimental procedures). Rossmann-fold regions are sometimes falsely predicted as TM regions, due to the hydrophobic nature of the fold. In this study, over half (57%) of the predicted membrane-bound proteins were found to have at least one TM region predicted in the Rossmann fold. These predicted TM segments were therefore excluded from this analysis. As the TM prediction ambiguities are considerable, Rossmann-fold predictions could be used to increase the reliability of TM predictions.

While the average proportion of membrane proteins with two transmembrane segments or more is about 15–30% in all kingdoms [14,15], the proportion of membrane-bound Rossmann-fold proteins only amounts to 3–8% (Table 3). The proportion of membrane-bound proteins with a Rossmann fold is about twice as high in eukaryotes as in prokaryotes. It was also noticed that the organisms, even closely related ones, showed considerable variation in how many Rossmann proteins had TM regions. There are three parasites with a very high proportion of Rossmann membrane proteins: Plasmodium falciparum and Plasmodium yoelii with one-third each, and Encephalitozoon cuniculi with as many as five of its six predicted Rossmann proteins also being predicted as membrane proteins.

The majority of proteins were found to harbor one or two TM segments (800 proteins vs. 350 proteins with more than two TM helices), with one TM segment being most common (600 proteins). Positioning of the TM segments C-terminally of the coenzyme binding site was twice as common as N-terminal positioning. Looking at differences in TM attachment between the various coenzyme specificities, it was found that NADP-preferring enzymes are the most common type to be membrane bound. Around 44% (500 proteins) of the Rossmann membrane proteins are NADP-preferring, which is a larger proportion than for NADP-preferring Rossmann proteins in general (36%, Table 4). Conversely, NAD-preferring membrane proteins amount to 33% (400 proteins), which is lower than the frequency in general (43%, Table 4). Finally, FAD preference is 15% (close to 200 proteins), also below the general occurrence (21%). Thus, NADP preference is overrepresented, while NAD and FAD preferences are underrepresented. Protein sequences predicted to have two or more coenzyme binding sites were the least common to be membrane bound, with only 100 sequences out of 670 predicted to have TM helices.
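The over- and underrepresentation stated above can be made explicit as a ratio between the share among membrane-bound Rossmann proteins and the share among Rossmann proteins in general. The sketch below uses only the rounded percentages quoted in the text; the variable names are illustrative, and no statistical test is implied.

```python
# Rounded percentages quoted in the text.
membrane_share = {"NADP": 44, "NAD": 33, "FAD": 15}
general_share = {"NADP": 36, "NAD": 43, "FAD": 21}

# Ratio > 1 means overrepresented among membrane proteins, < 1 underrepresented.
enrichment = {c: membrane_share[c] / general_share[c] for c in membrane_share}

for coenzyme, ratio in sorted(enrichment.items(), key=lambda kv: -kv[1]):
    status = "over" if ratio > 1 else "under"
    print(f"{coenzyme}: {ratio:.2f} ({status}represented)")
```

With these inputs only NADP has a ratio above 1, in line with the conclusion drawn in the paragraph above.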

In the human genome, there are 45 Rossmann proteins with predicted TM regions. The three main families found among them are the SDRs (27%), flavin-containing monooxygenases (13%) and F420-dependent oxidoreductases (11%).

Proteins of the Rossmann-fold type constitute a considerable group with many members. These proteins display great versatility in terms of functions and sequence compositions. In spite of these differences, our study demonstrates the power of sequence-based predictions. It is our hope and belief that the presented prediction tool will be a welcome addition to the arsenal of analysis methods available for large-scale protein function exploration. The prediction tool is available via http://www.ifm.liu.se/bioinfo, where a web form allows the user to enter one or several amino acid sequence(s) and in return get the Rossmann-fold prediction with estimated coenzyme preference and position.

Table 3. Proportion of Rossmann-fold membrane proteins, with more than one predicted transmembrane region, compared to membrane proteins in general.

Table 4. Distribution of various types of Rossmann-fold transmembrane proteins with different coenzyme specificities. 1N and 2N indicate 1 and 2 transmembrane segments N-terminally of the coenzyme binding site. Similarly, 1C and 2C denote 1 and 2 transmembrane segments C-terminally of the coenzyme binding site. >2 TM indicates more than two transmembrane segments, irrespective of the coenzyme binding site location. The numbers include all 68 investigated genomes.

Fig. 3. Overview of the novel prediction method. Sample sequences of the Rossmann-fold motif are shown (top right); α and β denote secondary structure elements. Arrows indicate positions of critical importance for coenzyme specificity prediction. In the flow chart, the boxes describe the different steps of the method.

Experimental procedures

We have developed a method which identifies coenzyme binding regions in proteins and also predicts whether the specificity is FAD, NAD or NADP. The method is based upon a combination of HMMs and sequence motif matching, as outlined in Fig. 3. The HMMs are used to extract a number of potential hits, which are subsequently subjected to a filtering process followed by prediction of coenzyme specificity. During the development phase, different combinations of HMMs were tried: one for each type of specificity, one for all, and one for FAD-binding combined with one for NAD(P)-binding proteins. The latter was found to be the best solution in terms of specificity and selectivity. All HMMs were developed using the hmmbuild command in HMMer [17], with the parameters -F and --fast, followed by the hmmcalibrate command.
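As an illustration of the model-building step, the sketch below assembles the two command lines for one model without executing them. It assumes HMMER 2.x, where `hmmbuild` takes the output HMM file followed by the alignment file and `hmmcalibrate` exists as a separate program (it is absent from HMMER 3), and it reads the typeset options as `-F` (overwrite an existing file) and `--fast`. The function name and the file names are hypothetical.

```python
def hmmer2_build_commands(alignment_path, hmm_path):
    """Return argv lists for building and then calibrating one HMM (HMMER 2.x syntax)."""
    build = ["hmmbuild", "-F", "--fast", hmm_path, alignment_path]
    calibrate = ["hmmcalibrate", hmm_path]
    return [build, calibrate]

# Hypothetical file names: one FAD model and one NAD(P) model, as in the chosen setup.
for name in ("fad", "nad_p"):
    for argv in hmmer2_build_commands(name + ".aln", name + ".hmm"):
        print(" ".join(argv))
```

Each list could be passed to `subprocess.run(argv, check=True)` on a machine where HMMER 2.x is installed.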

The ASTRAL database [18], version 1.65 with maximum 30% sequence identity, was used to obtain a trustworthy test set. The selected proteins belong to the folds […] domain' and 'Nucleotide-binding domain'. The dataset was scrutinized and only proteins utilizing FAD or NAD(P) in a typical manner were used, i.e. only selecting sequences […] Fig. 1. A total of 16 proteins were removed, of which five do not bind the coenzymes of interest and the others deviate in their coenzyme-binding manner. The resulting data set, with 120 members, was manually aligned based upon the three-dimensional structures and divided into six groups with an even distribution of the three coenzyme specificities in each group (Supplement Tables 1–3). These groups were then included in a six-fold jack-knife test, iteratively training the two HMMs, one with FAD-binding sequences and one with NAD(P)-binding sequences, using sequences from five of the groups and testing against the remaining group and a false data set. The false data sets were created by dividing the remaining sequences in the ASTRAL data set (4701 sequences) into six equally sized groups.
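The grouping and jack-knife loop described above can be sketched as follows. This is an assumed reconstruction, not the authors' script: the function names are hypothetical, the round-robin dealing is just one way to get an even spread of coenzyme classes per group, and the toy data use 40 sequences per class because the actual class sizes are not given here.

```python
from collections import defaultdict

def stratified_groups(labelled, n_groups=6):
    """Deal the sequences of each coenzyme class round-robin into n_groups groups."""
    by_label = defaultdict(list)
    for seq_id, label in labelled:
        by_label[label].append(seq_id)
    groups = [[] for _ in range(n_groups)]
    for label in sorted(by_label):
        for i, seq_id in enumerate(by_label[label]):
            groups[i % n_groups].append((seq_id, label))
    return groups

def jackknife_rounds(groups):
    """Yield (training, held_out) pairs, holding out one group per round."""
    for k, held_out in enumerate(groups):
        training = [item for j, g in enumerate(groups) if j != k for item in g]
        yield training, held_out

# Toy data: 120 hypothetical identifiers, 40 per class (an assumption).
labelled = [(f"{lab}_{i}", lab) for lab in ("FAD", "NAD", "NADP") for i in range(40)]
groups = stratified_groups(labelled)
for training, held_out in jackknife_rounds(groups):
    print(len(training), len(held_out))
```

In each round, the FAD and NAD(P) models would be trained on the `training` part only and evaluated on `held_out` plus one sixth of the false data set.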

As the method is divided into two steps, true coenzyme binding proteins can be lost either during the database search or during the classification. Only two FAD-binding proteins are lost (false negatives): one is classified as NADP-binding and the other is classified as false, i.e. non-Rossmann fold. Among the NAD-binding proteins, a total of 10 are false negatives: four are lost during the database search, five are classified as false, and one is classified as […]-binding. The group with most failures is the NADP-binding proteins, with a total of 13 false negatives: eight are lost during the database search, three are classified as false, and two are falsely predicted to be NAD-binding.

False positives, i.e. protein sequences falsely predicted to have certain coenzyme specificities, can be of two types: either they do not bind the coenzymes of interest, or they do but the coenzyme preference is not correctly predicted. Initially, during the database search, 62 proteins were picked up which do not bind any of the coenzymes of interest. However, only three of them remain as false positives after the classification step: molybdenum cofactor biosynthesis protein (1jw9, MoeB), glycinamide ribonucleotide transformylase (1kjq, PurT), and a cell division protein (1ofu, FtsZ). Common to all three is a Rossmann-fold-like structure at the predicted coenzyme binding site. MoeB and PurT are ATP-binding proteins, but while the predicted coenzyme binding region in MoeB is in contact with ATP, in PurT it is the substrate (glycinamide ribonucleotide) that is in contact with the corresponding region. FtsZ is a GTPase and its coenzyme is in contact with the region falsely predicted to be NADP-bound. In addition to these three, there are four Rossmann-fold proteins where the wrong coenzyme is predicted, giving a total of seven false positives.

Table 5. Prediction sensitivity and specificity of the novel prediction method as judged against the ASTRAL database. TP = true positives, TN = true negatives, FP = false positives, FN = false negatives. The sensitivity is calculated as $\frac{TP}{TP+FN}$, the specificity as $1 - \frac{FP}{FP+TN}$, and Matthews correlation coefficient as

$$\frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$$

All in all, for 95 of 120 sequences the correct coenzyme specificity was predicted and only seven of 4701 sequences were false positives, yielding an overall prediction sensitivity of 79.2%, a specificity of 99.9% and a Matthews correlation coefficient of 0.86 (Table 5).
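The three summary figures can be checked from the counts given in the text using the standard definitions of sensitivity, specificity and the Matthews correlation coefficient. In the sketch below TP = 95, FN = 25 (2 + 10 + 13 false negatives) and FP = 7 come from the text, while TN = 4701 − 7 = 4694 is an assumption about how the negatives were counted.

```python
from math import sqrt

def sensitivity(tp, fn):
    return tp / (tp + fn)

def specificity(fp, tn):
    return 1 - fp / (fp + tn)

def matthews_cc(tp, tn, fp, fn):
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator

tp, fn, fp = 95, 25, 7
tn = 4701 - fp  # assumption: all remaining false-set sequences are true negatives

print(f"sensitivity = {100 * sensitivity(tp, fn):.1f}%")
print(f"specificity = {100 * specificity(fp, tn):.1f}%")
print(f"MCC         = {matthews_cc(tp, tn, fp, fn):.2f}")
```

Under this assumption the rounded values agree with the 79.2%, 99.9% and 0.86 reported above.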

The method, using HMMs trained on all six groups, was applied to 68 genomes: all available among eukaryotes (30) and archaea (18), and a representative selection of 20 bacterial genomes. Genome sequences were downloaded from […] tigr.org/pub/data/).

TM regions were predicted using Phobius [19], a tool based on HMMs with the ability to differentiate between signal sequences and true transmembrane sequences. The TM regions were subsequently scrutinized, and in those cases where they overlap with a predicted Rossmann-fold region (coenzyme binding site plus 65 residues), the transmembrane prediction was ignored.
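The overlap filter can be sketched as a simple interval test. The code below is an assumption-laden illustration rather than the authors' procedure: it takes the Rossmann-fold region to run from the start of the predicted binding site to 65 residues past its end, uses inclusive residue coordinates, and the function names and example coordinates are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive residue ranges share at least one position."""
    return a_start <= b_end and b_start <= a_end

def filter_tm_segments(tm_segments, site_start, site_end, extension=65):
    """Drop predicted TM segments that overlap the assumed Rossmann-fold region."""
    region_start, region_end = site_start, site_end + extension
    return [(s, e) for (s, e) in tm_segments
            if not overlaps(s, e, region_start, region_end)]

# Hypothetical predictions: three TM segments and a binding site at residues 30-36.
tm_predictions = [(5, 25), (40, 60), (200, 220)]
print(filter_tm_segments(tm_predictions, site_start=30, site_end=36))
```

Only the segment at residues 40–60 falls inside the assumed region (30–101) and is discarded; the other two are kept.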

References

1 Rossmann MG, Liljas A, Brändén C-I & Banaszak LJ (1975) In The Enzymes (Boyer PD, ed.), Vol. 11, 3rd edn, pp. 61–102. Academic Press, New York.

2 Brenner SE, Chothia C, Hubbard TJP & Murzin AG (1996) Understanding protein structure: using scop for fold interpretation. Methods Enzymol 266, 635–643.

3 Schulz GE, Schirmer RH, Sachsenheimer W & Pai EF (1978) The structure of the flavoenzyme glutathione reductase. Nature 273, 120–124.

4 Adams MJ, Ellis GH, Gover S, Naylor CE & Phillips C (1994) Crystallographic study of coenzyme, coenzyme analogue and substrate binding in 6-phosphogluconate dehydrogenase: implications for NADP specificity and the enzyme mechanism. Structure 2, 651–668.

5 Wierenga RK, De Maeyer MCH & Hol WGJ (1985) Interaction of pyrophosphate moieties with α-helixes in dinucleotide binding proteins. Biochemistry 24, 1346–1357.

6 Wierenga RK, Terpstra P & Hol WGJ (1986) Prediction of the occurrence of the ADP-binding beta alpha beta-fold in proteins, using an amino acid sequence fingerprint. J Mol Biol 187, 101–107.

7 Carugo O & Argos P (1997) NADP-dependent enzymes I: Conserved stereochemistry of cofactor binding. Proteins 28, 10–28.

8 Kallberg Y, Oppermann U, Jörnvall H & Persson B […] Eur J Biochem 269, 4409–4417.

9 Mulder NJ, Apweiler R, Attwood TK, et al. (2005) InterPro, progress and status in 2005. Nucleic Acids Res 33, D201–205.

10 Hubbard T, Andrews D, Caccamo M, et al. (2005) Ensembl 2005. Nucleic Acids Res 33, D447–453.

11 Odermatt A, Arnold P, Stauffer A, Frey BM & Frey FJ (1999) The N-terminal anchor sequences of 11beta-hydroxysteroid dehydrogenases determine their orientation in the endoplasmic reticulum membrane. J Biol Chem 274, 28762–28770.

12 Binda C, Hubalek F, Li M, Edmondson DE & Mattevi A (2004) Crystal structure of human monoamine oxidase B, a drug target enzyme monotopically inserted into the mitochondrial outer membrane. FEBS Lett 564, 225–228.

13 Jackson JB, Peake SJ & White SA (1999) Structure and mechanism of proton-translocating transhydrogenase. FEBS Lett 464, 1–8.

14 Liu J & Rost B (2001) Comparing function and structure between entire proteomes. Protein Sci 10, 1970–1979.

15 Krogh A, Larsson B, von Heijne G & Sonnhammer EL (2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol 305, 567–580.

16 Nilsson J, Persson B & von Heijne G (2005) Comparative analysis of amino acid distributions in integral membrane proteins from 107 genomes. Proteins 60, 606–616.

17 Eddy SR (1998) Profile hidden Markov models. Bioinformatics 14, 755–763 (http://hmmer.wustl.edu).

18 Chandonia JM, Hon G, Walker NS, Lo Conte L, Koehl P, Levitt M & Brenner SE (2004) The ASTRAL Compendium in 2004. Nucleic Acids Res 32, 189–192.

19 Käll L, Krogh A & Sonnhammer EL (2004) A combined transmembrane topology and signal peptide prediction method. J Mol Biol 338, 1027–1036.

Supplementary material

The following supplementary material is available online:

Table S1. All enzymes used in the development of the prediction method.

Table S2. Alignment of NAD- and NADP-preferring enzymes used in the development of the prediction method.

Table S3. Alignment of FAD-preferring enzymes used in the development of the prediction method.

This material is available as part of the online article from http://www.blackwell-synergy.com
