A hidden Markov model-based method and its application on complete genomes
Yvonne Kallberg1,2 and Bengt Persson1,2
1 IFM Bioinformatics, Linköping University, Sweden
2 Centre for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden
Dehydrogenases and reductases are enzymes of fundamental metabolic importance that utilize coenzymes for electron transport (NAD(H), NADP(H) or FAD(H2), herein denoted NAD, NADP and FAD). The enzymes bind the coenzyme through a double βαβαβ fold, resulting in a six-stranded β-sheet surrounded by α-helices, known as the Rossmann fold [1]. This domain is often found in combination with other domains of different folding types either on the N-terminal side, the C-terminal side, or interrupting the Rossmann fold [2]. For example, glutathione reductases have two domains of the Rossmann-fold type, one FAD-binding domain that is interrupted by an NAD(P)-binding domain (PDB code 3grs [3]). 6-Phosphogluconate dehydrogenases have an NADP-binding domain of the Rossmann-fold type followed by a
Keywords
bioinformatics; coenzyme specificity; hidden
Markov model; prediction; Rossmann fold
Correspondence
B Persson, IFM Bioinformatics, Linköping
University, S-581 83 Linköping, Sweden
Fax: +46 13 137568
Tel: +46 13 282983
E-mail: bpn@ifm.liu.se
(Received 13 December 2005, revised 17
January 2006, accepted 23 January 2006)
doi:10.1111/j.1742-4658.2006.05153.x
Dehydrogenases and reductases are enzymes of fundamental metabolic importance that often adopt a specific structure known as the Rossmann fold. This fold, consisting of a six-stranded β-sheet surrounded by α-helices, is responsible for coenzyme binding. We have developed a method to identify Rossmann folds and predict their coenzyme specificity (NAD, NADP or FAD) using only the amino acid sequence as input. The method is based upon hidden Markov models and sequence pattern analysis. The prediction sensitivity is 79% and the selectivity close to 100%. The method was applied to a set of 68 genomes, representing the three kingdoms archaea, bacteria and eukaryota. In prokaryotes, 3% of the genes were found to code for Rossmann-fold proteins, while the corresponding ratio in eukaryotes is only around 1%. In all genomes, NAD is the most preferred cofactor (41–49%), followed by NADP with 30–38%, while FAD is the least preferred cofactor (21%). However, the NAD preponderance over NADP is most pronounced in archaea, and least in eukaryotes. In all three kingdoms, only 3–8% of the Rossmann proteins are predicted to have more than one membrane-spanning segment, which is much lower than the frequency of membrane proteins in general. Analysis of the major protein types in eukaryotes reveals that the most common type (26%) of the Rossmann proteins are short-chain dehydrogenases/reductases. In addition, the identified Rossmann proteins were analyzed with respect to further protein types, enzyme classes and redundancy. The described method is available at http://www.ifm.liu.se/bioinfo, where the preferred coenzyme and its binding region are predicted given an amino acid sequence as input.
Abbreviations
C-terminal catalytic domain consisting of α-helices only (PDB code 2pgd [4]).
In the first part of the Rossmann fold (β1α1β2), there are three glycine residues surrounded by hydrophobic residues, with the first glycine at the end of the β1 strand and the other two at the beginning of the α1 helix (Fig. 3, top right, Experimental procedures). The first two glycine residues are involved in dinucleotide binding, while the third is involved in the close packing of the β-strands and the α-helix [5]. Most of the early characterized dehydrogenases/reductases showed a spacing of these glycine residues in a GxGxxG pattern, where ‘x’ denotes any residue [5,6]. However, as new members of this fold have been recognized, the general pattern is now described as Gx(x)Gx(x)G [7], i.e. the spacing between the glycine residues can be one or two residues. The members of the extended short-chain dehydrogenase/reductase (SDR) family have a GxxGxxG pattern, which fits this description, whereas the classical SDRs still do not fit into the description, since they instead have a GxxxGxG pattern ([8] and references therein).
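The glycine patterns above can be written as regular expressions. The following is a minimal sketch for illustration only, not the prediction method of this paper, which combines HMMs with motif matching; the names GENERAL, CLASSICAL_SDR and find_motifs are hypothetical, and a pattern hit alone does not establish a Rossmann fold.

```python
import re

# Gx(x)Gx(x)G: one or two arbitrary residues between consecutive glycines.
GENERAL = re.compile(r"G.{1,2}G.{1,2}G")
# Classical SDRs use GxxxGxG, which the general pattern does not cover.
CLASSICAL_SDR = re.compile(r"G.{3}G.G")


def find_motifs(sequence, pattern):
    """Return (0-based start, matched text) for every match.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    lookahead = re.compile("(?=(" + pattern.pattern + "))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(sequence)]


if __name__ == "__main__":
    # Toy sequences, chosen only to exercise the two patterns.
    print(find_motifs("AAGAGAAGAA", GENERAL))
    print(find_motifs("TGASSGIGKA", CLASSICAL_SDR))
```

Keeping the two expressions separate mirrors the text: the classical SDR spacing is a distinct case rather than an instance of the general pattern.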
The residues at the end of the β2 strand normally guide identification of the nature of the coenzyme, i.e. whether an enzyme binds FAD, NAD or NADP. In general, the presence of a negatively charged residue indicates that FAD or NAD is the preferred cofactor [5], because it sterically hinders accommodation of the additional 2′-phosphate found in NADP. NADP-preferring enzymes typically have a basic residue one position down-chain instead [5]. Among the classical SDRs, a basic residue at the position preceding the second glycine residue in the Gly-pattern also indicates that the enzyme prefers NADP over NAD [8].
A more difficult task is to distinguish between the coenzyme types FAD and NAD. Most NAD-preferring enzymes have an aspartic acid residue at the end of the β2-strand, while FAD-preferring enzymes instead have a glutamic acid residue at this position. However, there are exceptions in both cases that prevent this feature from being used to differentiate between the two types.
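The rules of thumb described in the two preceding paragraphs can be summarized in a few lines of code. This is a deliberately naive sketch and not the classifier developed in this work: the function name guess_coenzyme is hypothetical, the caller is assumed to already know which residue ends the β2 strand, and treating only Lys and Arg as basic is an assumption made here.

```python
ACIDIC = {"D", "E"}
BASIC = {"K", "R"}  # assumption: histidine is not counted as basic here


def guess_coenzyme(beta2_end, next_residue):
    """Rule-of-thumb guess from the residue ending beta2 and the one after it."""
    if beta2_end in ACIDIC:
        # Asp is more typical of NAD and Glu of FAD, but exceptions exist,
        # so both coenzymes are reported and only a tendency is given.
        likely = "NAD" if beta2_end == "D" else "FAD"
        return f"NAD or FAD (more often {likely})"
    if next_residue in BASIC:
        return "NADP"
    return "undetermined"
```

The hedged return values reflect the exceptions noted in the text, which are the reason a plain residue rule is not sufficient on its own.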
We have now developed a method that, from the amino acid sequence alone, identifies a protein with coenzyme binding of the Rossmann type and predicts the coenzyme specificity. The method is applied to all eukaryotic and archaeal genomes and a representative set of bacterial genomes.
Results and discussion
We have developed a method for prediction of coenzyme specificity, based upon hidden Markov models (HMMs) and sequence motifs (see Experimental procedures). To the best of our knowledge there is no prediction method available with the same applicability as the one presented here. A search in InterPro [9] using key words such as ‘Rossmann’, ‘NAD’, ‘NADP’ and ‘FAD’ reveals many entries but there is no single entry which can be used to identify the motifs of interest. While most entries are on protein family level, there are some on domain level as well, e.g. ‘NAD_BS’ (identifier IPR000205) which identifies NAD binding sites. However, this motif only identifies 29 gene products in the human Ensembl [10] database, a number far below what could be expected.
Rossmann fold in completed genomes
The new method was applied to a selection of 68 completed genomes, representing archaea, bacteria and eukaryota. In total, around 9200 Rossmann proteins were identified in these genomes. The median numbers of Rossmann proteins in each organism within eukaryotes, bacteria and archaea are 196, 67 and 59, respectively, corresponding to 1% of the eukaryotic proteins and 3% of the prokaryotic proteins. As expected, the number of predicted coenzyme binding proteins within a genome increases with its size (Fig. 1). The number of Rossmann folds has a steep increase for genomes with up to 10 000 open reading frames (ORFs), while it levels out for larger genomes. Among eukaryotes, Oryza sativa is at the top with 655 predicted Rossmann proteins, and Trypanosoma brucei is at the bottom with only three Rossmann proteins. In bacteria, the corresponding extremes are Mycobacterium tuberculosis (185 proteins) and Chlamydophila caviae (13 proteins), while in archaea the top and bottom are represented by Haloarcula marismortui (146 proteins) and Nanoarchaeum equitans (five proteins). The genomes of Oryza sativa and Xenopus tropicalis have many more coenzyme binding proteins than the others (655 and 646, respectively), but given the size of their genomes (61 000 and 53 000 ORFs) the proportions are still within the same range as for other eukaryotes. There are four eukaryotic parasites (Plasmodium falciparum, Plasmodium yoelii, Leishmania major and Entamoeba histolytica) for which the ratio of coenzyme binding proteins is much lower than expected, possibly due to their ability to rely on the dehydrogenase/reductase systems of the host organism.
Fig. 1. Number of coenzyme binding proteins in each genome plotted versus number of open reading frames (ORFs), for archaea, bacteria and eukaryota. The number of Rossmann folds increases steeply for genomes with up to 10 000 ORFs, while it levels out for larger genomes.
Redundancy
Prokaryotic species, with a typical maximum genome size of 5000 ORFs, have a moderate sequence redundancy among their coenzyme binding proteins. Using a threshold of maximum 60% pair-wise sequence identity, 0–10% of the sequences are redundant. Most of the small eukaryotic genomes have a comparable level of redundancy. In general, the redundancy of Rossmann proteins is similar to that of other proteins in the genomes. However, there are five genomes which do not follow this pattern. In Thermoplasma volcanium, Pyrococcus horikoshii, Thermococcus kodakaraensis, Candida glabrata and Yarrowia lipolytica, the Rossmann proteins are two to three times more redundant than proteins in general. The redundancy among eukaryotes increases with genome size and is 30–40% for genome sizes around 30 000 ORFs. There are some outliers, e.g. Apis mellifera, with a very high redundancy level of 54% in spite of a rather small genome (17 000 ORFs), but the redundancy in general in this genome is 46%. Comparing the two plant genomes, Arabidopsis thaliana and Oryza sativa, we find different redundancy in general (33% vs. 46%), while the numbers are much closer considering Rossmann proteins only (40% vs. 37%).
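One way to turn a pair-wise identity threshold into a redundancy percentage is a greedy filter, sketched below. The text does not state which clustering procedure was used, so this is an assumption for illustration; redundant_fraction and the identity callback are hypothetical names, and the pair-wise identities are assumed to have been computed beforehand by an alignment tool.

```python
def redundant_fraction(ids, identity, threshold=60.0):
    """Greedy redundancy filter.

    A sequence is kept only if its identity (in percent) to every
    already-kept sequence is at most `threshold`. Returns the kept
    identifiers and the fraction of sequences discarded as redundant.
    """
    kept = []
    for seq_id in ids:
        if all(identity(seq_id, other) <= threshold for other in kept):
            kept.append(seq_id)
    removed = len(ids) - len(kept)
    return kept, (removed / len(ids) if ids else 0.0)
```

With this greedy scheme the result can depend on the input order, which is one reason the actual procedure may have differed.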
Prediction of coenzyme specificity
In general, for all kingdoms, NAD is the specificity most preferred, while FAD is the least (Table 1). Irrespective of kingdom, FAD preference constitutes 21% on average, while the NAD and NADP ratios vary somewhat. For nearly all prokaryotic organisms, the NAD-preferring Rossmann folds are more numerous than the NADP-preferring (Fig. 2). The only exceptions are Lactobacillus acidophilus, Staphylococcus aureus, Aeropyrum pernix, Pyrobaculum aerophilum, Sulfolobus tokodaii and Thermococcus kodakaraensis. However, among eukaryotes it can be seen that for most species the NAD- and NADP-preferring enzymes are close to equal in numbers. In plant, worm and insect, there is a majority of NADP-preferring enzymes while mammals and chicken have a majority of NAD-preferring enzymes. In a previous study of short-chain dehydrogenases/reductases (SDRs) it was found that NADP is more frequent than NAD in human, mouse, fruit fly, worm, plant and yeast [8]. As mentioned above, this is still valid when including all Rossmann-fold proteins for the lower organisms, but in human and mouse the balance is shifted and NAD is the most frequent coenzyme.
Dual coenzyme sites
Some proteins have two Rossmann binding sites; for example, the flavin monooxygenases with both an FAD and an NAD binding site. Out of the 9200 proteins predicted to have a Rossmann fold, almost 700 have more than one such fold. For all kingdoms, the fraction of Rossmann proteins with dual sites amounts to 0–10%, with some exceptions. Among the eukaryotes Entamoeba histolytica, Plasmodium falciparum and Plasmodium yoelii the proportion is 15, 18 and 15%, respectively. The bacterial genome of Chlamydophila caviae also shows a dual-site proportion of 15%, while the archaeal genomes of Thermococcus kodakaraensis and Nanoarchaeum equitans show 17 and 20%, respectively. These high ratios are partly caused by the low number of Rossmann-fold proteins.
Protein families
Among the annotated human Rossmann proteins, most proteins have EC numbers within main group 1 (oxidoreductases). However, there are several SDRs and multifunctional enzymes also within groups 3 (hydrolases), 4 (lyases), and 5 (isomerases), reflecting the versatility of the Rossmann fold.
Among the eukaryotic genomes annotated by Ensembl, 60% of the Rossmann-fold proteins are found to belong to 10 major groups. The SDR superfamily contributes with 26%, and is by far the largest group (Table 2). The three next largest groups are various flavin-binding oxidoreductases with proportions each of around 6%. Closely related species show approximately the same number of proteins within each family, but there are a few notable exceptions. Rat aldehyde dehydrogenases, for instance, are almost twice as frequent as mouse aldehyde dehydrogenases, and FAD-dependent pyridine nucleotide-disulphide oxidoreductases are also more numerous in rat compared to mouse. Another species which deviates from the general pattern is yeast. In this species, the fifth major group, zinc-containing alcohol dehydrogenases, has almost as many members as the SDRs (Table 2).
Table 1. Average coenzyme preference among archaean, bacterial, and eukaryotic genomes.
Fig. 2. Coenzyme preferences in all investigated genomes from eukaryota, bacteria and archaea. The left axis shows numbers of coenzyme binding proteins, and the right axis shows numbers of ORFs. Species names are given on the horizontal axis.
Table 2. The 10 most common types of Rossmann-fold proteins in eukaryotic genomes. The types are listed according to annotation of Pfam families as given in the Ensembl entries. The fish genome is represented by Danio rerio, the fly by Drosophila melanogaster, the worm by Caenorhabditis elegans, and the yeast by Saccharomyces cerevisiae. The total column gives the percentage of all proteins of all types and all species included in the study. The species columns give the number of proteins of each type.
Transmembrane regions
A number of dehydrogenases and reductases are membrane-attached. The transmembrane (TM) helix can be found in either the N-terminal part of the protein, as in 11-beta hydroxysteroid dehydrogenase type 1 [11], or in the C-terminal part, as in monoamine oxidase B [12]. There can also be multiple TM helices as, e.g., in the proton-pumping nicotinamide nucleotide transhydrogenase, a three-domain protein with the first and third domains binding NAD and NADP, respectively, and the second domain consisting of 13–14 TM helices [13].
For all Rossmann proteins found in the genomes, transmembrane regions were predicted (see Experimental procedures). Rossmann-fold regions are sometimes falsely predicted as TM regions, due to the hydrophobic nature of the fold. In this study, over half (57%) of the predicted membrane-bound proteins were found to have at least one TM region predicted in the Rossmann fold. These predicted TM segments were therefore excluded in this analysis. As the TM prediction ambiguities are considerable, Rossmann-fold predictions could be used to increase the reliability of TM predictions.
While the average proportion of membrane proteins with two transmembrane segments or more is about 15–30% in all kingdoms [14,15], the proportion of membrane-bound Rossmann-fold proteins only amounts to 3–8% (Table 3). The proportion of membrane-bound proteins with Rossmann fold is about twice as high in eukaryotes as in prokaryotes. It was also noticed that the organisms, even closely related ones, showed considerable variations in how many Rossmann proteins had TM regions. There are three parasites with a very high proportion of Rossmann membrane proteins, Plasmodium falciparum and Plasmodium yoelii with one-third each, and Encephalitozoon cuniculi with as many as five of its six predicted Rossmann proteins also being predicted as membrane proteins.
The majority of proteins were found to harbor one or two TM segments (800 proteins vs. 350 proteins with more than two TM helices), with one TM segment being most usual (600 proteins). A positioning of the TM segments C-terminally of the coenzyme binding site was twice as common as an N-terminal positioning. Looking at differences in TM attachment between the various coenzyme specificities, it was found that NADP-preferring enzymes are the most common type to be membrane bound. Around 44% (500 proteins) of the Rossmann membrane proteins are NADP-preferring, which is a larger proportion than Rossmann NADP-preferring proteins in general (36%, Table 4). Inversely, NAD-preferring membrane proteins amount to 33% (400 proteins), which is lower than the frequency in general (43%, Table 4). Finally, FAD preference is 15% (close to 200 proteins), also below the general occurrence (21%). Thus, NADP preference is overrepresented, while NAD and FAD preferences are underrepresented. Protein sequences predicted to have two or more coenzyme binding sites were the least common to be membrane bound, with only 100 sequences out of 670 predicted to have TM helices.
In the human genome, there are 45 Rossmann proteins with predicted TM regions. The three main families found among them are the SDRs (27%), flavin-containing monooxygenases (13%) and F420-dependent oxidoreductases (11%).
Proteins of the Rossmann-fold type constitute a considerable group with many members. These proteins display great versatility in terms of functions and sequence compositions. In spite of these differences, our study demonstrates the power of sequence-based predictions. It is our hope and belief that the presented prediction tool will be a welcome addition to the arsenal of analysis methods available for large-scale protein function exploration. The prediction tool is available via http://www.ifm.liu.se/bioinfo, where a web form allows the user to enter one or several amino acid sequence(s) and in return get the Rossmann-fold prediction with estimated coenzyme preference and position.
Table 3. Proportion of Rossmann-fold membrane proteins, with more than one predicted transmembrane region, compared to membrane proteins in general.
Table 4. Distribution of various types of Rossmann-fold transmembrane proteins with different coenzyme specificities. 1N and 2N indicate 1 and 2 transmembrane segments N-terminally of the coenzyme binding site. Similarly, 1C and 2C denote 1 and 2 transmembrane segments C-terminally of the coenzyme binding site. >2 TM indicates more than two transmembrane segments, irrespective of the coenzyme binding site location. The numbers include all 68 investigated genomes.
Fig. 3. Overview of the novel prediction method. Sample sequences of the Rossmann-fold motif are shown (top right). α and β denote secondary structure elements. Arrows indicate positions of critical importance for coenzyme specificity prediction. In the flow chart, the boxes describe the different steps of the method.
Experimental procedures
We have developed a method which identifies coenzyme binding regions in proteins, and also predicts whether the specificity is FAD, NAD or NADP. The method is based upon a combination of HMMs and sequence motif matching as outlined in Fig. 3. The HMMs are used to extract a number of potential hits which subsequently are subjected to a filtering process followed by prediction of coenzyme specificity. During the development phase, different combinations of HMMs were tried: one for each type of specificity, one for all, and one for FAD-binding combined with one for NAD(P)-binding proteins. The last combination was found to be the best solution in terms of specificity and selectivity. All HMMs were developed using the hmmbuild command in HMMer [17], with the parameters -F and --fast, followed by the hmmcalibrate command.
The ASTRAL database [18], version 1.65 with maximum 30% sequence identity, was used to obtain a trustworthy test set. The selected proteins belong to the folds […] domain’ and ‘Nucleotide-binding domain’. The dataset was scrutinized and only proteins utilizing FAD or NAD(P) in a typical manner were used, i.e. only selecting sequences […] Fig 1. A total of 16 proteins were removed, of which five do not bind the coenzymes of interest and the others deviate in their coenzyme-binding manner. The resulting data set, with 120 members, was manually aligned based upon their three-dimensional structures, and divided into six groups with an even distribution of the three coenzyme specificities in each group (Supplementary Tables S1–S3). These groups were then included in a six-fold jack-knife test, iteratively training the two HMMs, one with FAD-binding sequences and one with NAD(P)-binding sequences, using sequences from five of the groups and testing against the remaining group and a false data set. The false data sets were created by dividing the remaining sequences in the ASTRAL data set (4701 sequences) into six equally sized groups.
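The six-fold jack-knife layout described above can be sketched as follows. This only illustrates the splitting and hold-out logic under simple assumptions; the function names are hypothetical, the round-robin dealing is one of several ways to obtain an even class distribution, and the HMM training and scoring steps (hmmbuild, hmmcalibrate and the database search) are not reproduced here.

```python
from collections import defaultdict


def stratified_groups(labelled_ids, n_groups=6):
    """Deal sequences into n_groups so each coenzyme class is spread evenly.

    labelled_ids is an iterable of (sequence_id, coenzyme) pairs.
    """
    by_class = defaultdict(list)
    for seq_id, coenzyme in labelled_ids:
        by_class[coenzyme].append(seq_id)
    groups = [[] for _ in range(n_groups)]
    for coenzyme in sorted(by_class):
        # Round-robin within each class keeps the class counts balanced.
        for i, seq_id in enumerate(by_class[coenzyme]):
            groups[i % n_groups].append(seq_id)
    return groups


def jackknife_rounds(groups):
    """Yield (training_ids, test_ids), holding out one group per round."""
    for held_out in range(len(groups)):
        train = [s for i, g in enumerate(groups) if i != held_out for s in g]
        yield train, list(groups[held_out])
```

In each round the two models would be trained on the five training groups and evaluated on the held-out group together with one sixth of the false data set.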
As the method is divided into two steps, true coenzyme binding proteins can be lost either during the database search or during the classification. Only two FAD-binding proteins are lost (false negatives): one is classified as NADP-binding and the other is classified as false, i.e. non-Rossmann fold. Among the NAD-binding proteins a total of 10 are false negatives: four are lost during the database search, five are classified as false, and one is classified as binding a different coenzyme. The group with most failures is NADP-binding proteins, with a total of 13 false negatives: eight are lost during database search, three are classified as false, and two are falsely predicted to be NAD-binding.
False positives, i.e. protein sequences falsely predicted to have certain coenzyme specificities, can be of two types: either they do not bind the coenzymes of interest or they do but the coenzyme preference is not correctly predicted. Initially, during the database search, 62 proteins were picked up which do not bind any of the coenzymes of interest. However, only three of them remain as false positives after the classification step: molybdenum cofactor biosynthesis protein (1jw9, MoeB), glycinamide ribonucleotide transformylase (1kjq, PurT), and a cell division protein (1ofu, FtsZ). Common to all three is a Rossmann-fold-like structure at the predicted coenzyme binding site. MoeB and PurT are ATP-binding proteins, but while the predicted coenzyme binding region in MoeB is in contact with ATP, in PurT it is the substrate (glycinamide ribonucleotide) which is in contact with the corresponding region. FtsZ is a GTPase and its coenzyme is in contact with the region falsely predicted to be NADP-bound. In addition to these three there are four Rossmann-fold proteins where the wrong coenzyme is predicted, rendering a total of seven false positives.
Table 5. Prediction sensitivity and specificity of the novel prediction method as judged against the ASTRAL database. TP = true positives, FP = false positives, TN = true negatives, FN = false negatives. The sensitivity is calculated as $\frac{TP}{TP+FN}$, the specificity as $1-\frac{FP}{FP+TN}$, and Matthews correlation coefficient as $\frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$.
All in all, for 95 of 120 sequences the correct coenzyme specificity was predicted and only seven of 4701 sequences were false positives, yielding an overall prediction sensitivity of 79.2%, a specificity of 99.9% and a Matthews correlation coefficient of 0.86 (Table 5).
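The reported figures can be checked from the counts given in the text. The sketch below assumes, as the sentence above suggests, that all seven false positives are counted against the 4701 false sequences; the function names are chosen here for illustration.

```python
from math import sqrt


def sensitivity(tp, fn):
    return tp / (tp + fn)


def specificity(fp, tn):
    return 1 - fp / (fp + tn)


def matthews(tp, tn, fp, fn):
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom


# Counts from the text: 95 of 120 true sequences predicted correctly,
# 7 of 4701 false sequences predicted as coenzyme binding.
tp, fn = 95, 120 - 95
fp, tn = 7, 4701 - 7

if __name__ == "__main__":
    print(f"sensitivity = {sensitivity(tp, fn):.1%}")
    print(f"specificity = {specificity(fp, tn):.1%}")
    print(f"MCC         = {matthews(tp, tn, fp, fn):.2f}")
```

Under these assumptions the values agree with the 79.2%, 99.9% and 0.86 stated above.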
The method, using HMMs trained on all six groups, was applied to 68 genomes: all available among eukaryotes (30) and archaea (18), and a representative selection of 20 bacterial genomes. Genome sequences were downloaded from […] tigr.org/pub/data/).
TM regions were predicted using Phobius [19], a tool based on HMMs, with the ability to differentiate between signal sequences and true transmembrane sequences. The TM regions were subsequently scrutinized, and in those cases where they overlap with a predicted Rossmann-fold region (coenzyme binding site plus 65 residues), the transmembrane prediction was ignored.
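The overlap filter can be sketched as below. The text does not say in which direction the 65 residues are added, so extending the binding site towards the C-terminus is an assumption made here, as are the function name and the use of 1-based inclusive coordinates.

```python
def filter_tm_segments(tm_segments, site_start, site_end, extension=65):
    """Drop predicted TM segments that overlap the Rossmann-fold region.

    The region is taken as [site_start, site_end + extension] (assumed
    C-terminal extension). Coordinates are 1-based and inclusive.
    """
    region_start, region_end = site_start, site_end + extension
    return [
        (start, end)
        for start, end in tm_segments
        if end < region_start or start > region_end
    ]
```

For example, with a binding site at residues 30–45 the region spans residues 30–110, so a predicted segment at 40–60 would be discarded while segments at 5–25 and 150–170 would be retained.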
References
1 Rossmann MG, Liljas A, Brändén C-I & Banaszak LJ (1975) In The Enzymes (Boyer PD, ed.), Vol. 11, 3rd edn, pp. 61–102. Academic Press, New York.
2 Brenner SE, Chothia C, Hubbard TJP & Murzin AG (1996) Understanding protein structure: using scop for fold interpretation. Methods Enzymol 266, 635–643.
3 Schulz GE, Schirmer RH, Sachsenheimer W & Pai EF (1978) The structure of the flavoenzyme glutathione reductase. Nature 273, 120–124.
4 Adams MJ, Ellis GH, Gover S, Naylor CE & Phillips C (1994) Crystallographic study of coenzyme, coenzyme analogue and substrate binding in 6-phosphogluconate dehydrogenase: implications for NADP specificity and the enzyme mechanism. Structure 2, 651–668.
5 Wierenga RK, De Maeyer MCH & Hol GJ (1985) Interaction of pyrophosphate moieties with α-helixes in dinucleotide binding proteins. Biochemistry 24, 1346–1357.
6 Wierenga RK, Terpstra P & Hol WGJ (1986) Prediction of the occurrence of the ADP-binding beta alpha beta-fold in proteins, using an amino acid sequence fingerprint. J Mol Biol 187, 101–107.
7 Carugo O & Argos P (1997) NADP-dependent enzymes I: Conserved stereochemistry of cofactor binding. Proteins 28, 10–28.
8 Kallberg Y, Oppermann U, Jörnvall H & Persson B Eur J Biochem 269, 4409–4417.
9 Mulder NJ, Apweiler R, Attwood TK, et al. (2005) InterPro, progress and status in 2005. Nucleic Acids Res 33, D201–205.
10 Hubbard T, Andrews D, Caccamo M, et al. (2005) Ensembl 2005. Nucleic Acids Res 33, D447–453.
11 Odermatt A, Arnold P, Stauffer A, Frey BM & Frey FJ (1999) The N-terminal anchor sequences of 11beta-hydroxysteroid dehydrogenases determine their orientation in the endoplasmic reticulum membrane. J Biol Chem 274, 28762–28770.
12 Binda C, Hubalek F, Li M, Edmondson DE & Mattevi A (2004) Crystal structure of human monoamine oxidase B, a drug target enzyme monotopically inserted into the mitochondrial outer membrane. FEBS Lett 564, 225–228.
13 Jackson JB, Peake SJ & White SA (1999) Structure and mechanism of proton-translocating transhydrogenase. FEBS Lett 464, 1–8.
14 Liu J & Rost B (2001) Comparing function and structure between entire proteomes. Protein Sci 10, 1970–1979.
15 Krogh A, Larsson B, von Heijne G & Sonnhammer EL (2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol 305, 567–580.
16 Nilsson J, Persson B & von Heijne G (2005) Comparative analysis of amino acid distributions in integral membrane proteins from 107 genomes. Proteins 60, 606–616.
17 Eddy SR (1998) Profile hidden Markov models. Bioinformatics 14, 755–763 (http://hmmer.wustl.edu).
18 Chandonia JM, Hon G, Walker NS, Lo Conte L, Koehl P, Levitt M & Brenner SE (2004) The ASTRAL Compendium in 2004. Nucleic Acids Res 32, 189–192.
19 Käll L, Krogh A & Sonnhammer EL (2004) A combined transmembrane topology and signal peptide prediction method. J Mol Biol 338, 1027–1036.
Supplementary material
The following supplementary material is available online:
Table S1. All enzymes used in the development of the prediction method.
Table S2. Alignment of NAD- and NADP-preferring enzymes used in the development of the prediction method.
Table S3. Alignment of FAD-preferring enzymes used in the development of the prediction method.
This material is available as part of the online article from http://www.blackwell-synergy.com.