Short-chain dehydrogenases/reductases (SDRs)
Coenzyme-based functional assignments in completed genomes
Yvonne Kallberg1,2, Udo Oppermann1, Hans Jörnvall1 and Bengt Persson1,2
1Department of Medical Biochemistry and Biophysics and 2Stockholm Bioinformatics Centre, Karolinska Institutet, Sweden
Short-chain dehydrogenases/reductases (SDRs) are enzymes of great functional diversity. Even at sequence identities of typically only 15–30%, specific sequence motifs are detectable, reflecting common folding patterns. We have developed a functional assignment scheme based on these motifs and we find five families. Two of these families were known previously and are called 'classical' and 'extended' families, but they are now distinguished at a further level based on coenzyme specificities. This analysis gives seven subfamilies of classical SDRs and three subfamilies of extended SDRs. We find that NADP(H) is the preferred coenzyme among most classical SDRs, while NAD(H) is that preferred among most extended SDRs. Three families are novel entities, denoted 'intermediate', 'divergent' and 'complex', encompassing short-chain alcohol dehydrogenases, enoyl reductases and multifunctional enzymes, respectively. The assignment scheme was applied to the genomes of human, mouse, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana and Saccharomyces cerevisiae. In the animal genomes, the extended SDRs amount to around one quarter or less of the total number of SDRs, while in the A. thaliana and S. cerevisiae genomes, the extended members constitute about 40% of the SDR forms. The numbers of NAD(H)-dependent and NADP(H)-dependent SDRs are similar in human, mouse and plant, while the proportions of NAD(H)-dependent enzymes are much lower in fruit fly, worm and yeast. We show that, in spite of the great diversity of the SDR superfamily, the primary structure alone can be used for functional assignments and for predictions of coenzyme preference.
Keywords: short-chain dehydrogenases/reductases; genome; coenzyme; sequence patterns; bioinformatics
Short-chain dehydrogenases/reductases (SDRs) are enzymes with subunits of about 250 residues that catalyse NAD(P)(H)-dependent oxidation/reduction reactions. The concept of SDRs was established in 1981 [1], at a time when the only members known were a prokaryotic ribitol dehydrogenase and an insect alcohol dehydrogenase. Since then, the SDR family has grown enormously, both in the number of known members and the diversity of their functions. Already some years ago, over 1000 forms were ascribed to the SDR superfamily [2], and currently at least 3000 members, including species variants, are known, with a substrate spectrum ranging from alcohols, sugars, steroids and aromatic compounds to xenobiotics. The N-terminal region binds the coenzymes NAD(H) or NADP(H), while the C-terminal region constitutes the substrate-binding part. Although the residue identity is as low as 15–30%, the 3D folds are quite similar, except for the C-terminal regions.
The SDRs have been divided into two large families, 'classical' and 'extended', with different Gly-motifs in the coenzyme-binding regions and different chain lengths: around 250 residues in classical SDRs and 350 in extended SDRs [3]. Few residues are completely conserved, but several sequence motifs are distinguishable within the families.
It is desirable to define distinct characteristics of these families for functional assignments of new sequences added to the SDR superfamily. We have now defined characteristic differences for all SDR types and distinguish five SDR families. Furthermore, seven subfamilies are delineated within the classical SDRs and three subfamilies within the extended SDRs. These characteristics can be used for functional predictions of further, novel structures, and the assignment system developed is now applied to the genomes of human, mouse, Drosophila melanogaster, Arabidopsis thaliana, Caenorhabditis elegans and Saccharomyces cerevisiae.
MATERIALS AND METHODS
We trained a Hidden Markov model [4] on a set of 95 SDR sequences extracted from SWISSPROT with less than 70% identity in pairwise comparisons, using a manually curated alignment based on human SDRs as seed sequences. The resulting Hidden Markov model was subsequently used to search the databases SWISSPROT [5] and KIND [6], selecting every sequence that had an expect value below 10⁻¹⁵ as a candidate SDR. When these candidate sequences were aligned, they separated into five clusters (Fig. 1), two of which were the classical and extended families [3] and three were the specific families of insect alcohol dehydrogenase, enoyl reductase and multifunctional enzymes. These three novel families were named 'intermediate', 'divergent' and 'complex', respectively. The first level of assignments would then be to sort sequences into these five families using a motif-based approach.
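The identity cut-off used to build the training set can be illustrated with a minimal sketch. The text does not state how pairwise identity was computed or how redundant sequences were removed, so the definition below (matches over columns where neither sequence has a gap) and the greedy keep-first strategy are assumptions made for illustration only; the function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: both strings are rows of the same alignment, with '-' as gap.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def filter_redundant(aligned: dict[str, str], max_identity: float = 70.0) -> list[str]:
    """Greedily keep sequences that stay below max_identity to every kept one."""
    kept: list[str] = []
    for name, seq in aligned.items():
        if all(percent_identity(seq, aligned[k]) < max_identity for k in kept):
            kept.append(name)
    return kept
```

With a greedy filter of this kind the result depends on input order, so a real pipeline would normally fix the order (for example, best-annotated sequences first) before filtering.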
Correspondence to B. Persson, Department of Medical Biochemistry
and Biophysics, Karolinska Institutet, S-171 77 Stockholm,
Sweden. Fax: + 46 8 337 462, Tel.: + 46 8 728 7730,
E-mail: bengt.persson@mbb.ki.se
Abbreviation: SDR, short-chain dehydrogenase/reductase.
(Received 25 April 2002, revised 16 July 2002,
accepted 24 July 2002)
Based on a nonredundant set (< 80% identity; 100 classical, 80 extended, 7 intermediate, 12 divergent and 12 complex) of known SDR members in SWISSPROT, we developed sequence motifs covering the most conserved parts of the sequences. Three sequence motifs were developed for each family (Fig. 2) to optimize specificity and sensitivity. Within each family, 40 of the most preserved amino acid residues in the alignment were selected. The amino acid types 'accepted' at a position were those observed together with those with similar amino acid properties, e.g. if Ile and Val are observed, then Leu is also accepted at that position.
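The rule for 'accepted' residue types can be sketched as follows. The paper gives only the Ile/Val/Leu example; the property groups listed here and the requirement that at least two members of a group be observed before the group is added are assumptions, not the authors' definitions.

```python
# Hypothetical property groups; the paper does not list the groups it used.
PROPERTY_GROUPS = [
    set("ILV"),  # aliphatic
    set("FWY"),  # aromatic
    set("DE"),   # acidic
    set("KR"),   # basic
    set("ST"),   # hydroxyl
    set("NQ"),   # amide
]


def accepted_residues(observed: set[str], min_members: int = 2) -> set[str]:
    """Return observed residue types plus similar types from the same group.

    Assumption: a group is added only when at least `min_members` of its
    residues are observed at the alignment position.
    """
    accepted = set(observed)
    for group in PROPERTY_GROUPS:
        if len(observed & group) >= min_members:
            accepted |= group
    return accepted
```

Under these assumptions, a position where Ile and Val are observed also accepts Leu, matching the example in the text.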
During an iterative process, an automated sorting procedure was developed. The aligned sequences were scored against the sequence motifs in the following manner. The presence of an accepted amino acid residue type at a motif position increases the sequence score by one point. If instead a gap is found at that position, the score is decreased by one point. A large part of the motifs covers the coenzyme-binding region. Other enzyme families that also bind NAD(P)(H) might be detected with this profile and introduce false positives in our set. Thus, in order to separate SDRs from other enzymes, key positions in the classical and extended motifs were deduced from multiple sequence alignments. These key positions (bold in Fig. 2) render a score of +3 if present and a score of −3 if absent. Thus, each sequence is associated with five different scores, one for each family. Incomplete sequences can pose a problem when using sequence-based methods, because such sequences might render a low score and thus be classified incorrectly. In this report, sequences with more than 20% gap positions in the alignment were removed from the data set and not subjected to the scoring process.
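The scoring rules described above can be written out as a short sketch. The motif representation (alignment column mapped to an accepted set and a key-position flag) is hypothetical, and two points the text leaves open are filled in as labelled assumptions: a non-accepted residue at an ordinary position scores zero, and the 20% gap limit is taken over all alignment columns.

```python
GAP = "-"


def score_against_motif(aligned_seq: str, motif: dict[int, tuple[set[str], bool]]) -> int:
    """Score one aligned sequence against one family motif.

    motif maps an alignment column to (accepted residue types, is_key_position).
    """
    score = 0
    for col, (accepted, is_key) in motif.items():
        residue = aligned_seq[col]
        if is_key:
            # Key positions: +3 if an accepted residue is present, -3 otherwise.
            score += 3 if residue in accepted else -3
        elif residue in accepted:
            score += 1
        elif residue == GAP:
            score -= 1
        # Assumption: a non-accepted residue at an ordinary position scores 0.
    return score


def gap_fraction(aligned_seq: str) -> float:
    """Fraction of gap characters (assumption: counted over all columns)."""
    return aligned_seq.count(GAP) / len(aligned_seq)


def is_scorable(aligned_seq: str, max_gap_fraction: float = 0.20) -> bool:
    """Sequences with more than 20% gap positions are excluded from scoring."""
    return gap_fraction(aligned_seq) <= max_gap_fraction
```

Running `score_against_motif` once per family motif yields the five scores per sequence that the text refers to.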
The sorting procedure, with the groups and thresholds, is shown in Fig. 3. The thresholds were obtained through a systematic iterative procedure. The scores were used to sort the sequences into one of the five families. There are members of the SDR superfamily that do not meet any of the family requirements, i.e. the scores are below the thresholds. Rather than lowering the thresholds or extending the motifs, such sequences are sorted into an artificial group called 'unclassified' SDR. Another artificial group, 'potential' SDR, is also used. It consists of sequences that are not SDR members as far as can be judged today, but have some properties in common with the SDR family.
For the structural comparisons, the 3D structures of members within the SDR superfamily were superimposed using ICM (version 2.7, Molsoft LLC, San Diego, CA, USA) [7].
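The family sorting described above (five scores per sequence, compared against thresholds) might be organised as in the sketch below. The actual conditions are those given in the boxes of Fig. 3 and are not reproduced here; the threshold values, the margin-based choice between families, and the single floor separating the two artificial groups are all placeholders invented for illustration.

```python
FAMILIES = ("classical", "extended", "intermediate", "divergent", "complex")


def assign_family(
    scores: dict[str, int],
    family_threshold: dict[str, int],
    superfamily_floor: int,
) -> str:
    """Sort a sequence into one of seven groups from its five family scores.

    Hypothetical logic: pick the family whose score exceeds its threshold by
    the largest margin; otherwise fall back to one of the artificial groups.
    """
    best = max(FAMILIES, key=lambda fam: scores[fam] - family_threshold[fam])
    if scores[best] >= family_threshold[best]:
        return best
    # Placeholder rule for separating the two artificial groups.
    if max(scores.values()) >= superfamily_floor:
        return "unclassified"
    return "potential"
```

The function returns one of the five family names or one of the two artificial group names, giving the seven groups mentioned in the text.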
RESULTS
Five SDR families
In order to get functional assignments for the members of the SDR superfamily, we developed an assignment system to distinguish families with specific characteristics. The SDR superfamily divides into five families (Fig. 1), of which two are the previously established classical and extended, and three are novel entities, denoted intermediate, divergent and complex.
The classical family encompasses oxidoreductases (EC 1.-.-.-), such as steroid dehydrogenases and carbonyl reductases. The extended family consists of isomerases (EC 5.-.-.-), e.g. galactose epimerases, and lyases (EC 4.-.-.-), such as glucose dehydratases, but several oxidoreductases are also found within this family, e.g. in
Fig. 2. Conserved sequence motifs in the SDR families as derived from a multiple sequence alignment. For each of the five SDR families, specific sequence patterns exist. Three motif segments, with a total of 40 preserved positions that cover the coenzyme-binding and active site regions, have been chosen for each family. Multiple amino acid occurrences at a position are written within brackets. 'x' denotes any amino acid residue or gap, and when present the subsequent number indicates the number of x residues/gaps. Amino acid residues written in bold indicate positions of special importance in the classical and extended motifs. Because the motifs are based upon the sequence majority, insertions in single sequences do not affect the patterns.
Fig. 1. The two levels of classification within the SDR superfamily. At the first level, the members of the SDR superfamily are separated into five families. At the second level, members of the classical and extended families are separated into seven and three subfamilies, respectively, based upon coenzyme-binding residue patterns.
multifunctional enzymes such as the 3β-hydroxysteroid dehydrogenase/Δ4,5 isomerase cluster.
The intermediate family exhibits an atypical Gly-motif (G/AxxGxxG/A) that resembles patterns of extended SDRs, except that Ala is highly represented instead of Gly. However, the remaining parts of the sequences are more closely related to the classical SDRs, e.g. with an NGAG motif (corresponding to the NNAG motif in β4, Table 4), and with a subunit size (about 250 residues) like that of the classical SDRs. In this family, thus denoted intermediate, we find fruit fly alcohol dehydrogenases, constituting a set of SDRs that divides into three lines with around 35% sequence identity, in pair-wise comparisons, between them.
The divergent family with enoyl reductases from bacteria and plants constitutes a set of NADH-dependent enzymes with three patterns that deviate from those typical of most SDRs. First, the Gly-motif is differently spaced, with five residues instead of three between the first two glycine residues. Second, in bacteria the second and third glycines have been replaced with serine and alanine, i.e. the motif is GxxxxxSxA. Third, there is a methionine instead of a tyrosine in the active site motif, while the tyrosine is found three positions upchain, i.e. YxxMxxxK instead of YxxxK. The 3D structures of FabI from Escherichia coli (PDB code 1qsg) and Mycobacterium tuberculosis (1bvr) reveal that the tyrosine and lysine residues are close in space. They are located within an α-helix, and the spacing between the two residues makes them face the same side with a similar distance between Tyr-Oη and Lys-Nζ as for the classical SDRs, i.e. with a 1.3-Å difference compared to the 3α,20β-hydroxysteroid dehydrogenase, and with spatial freedom for the lysine residue to move closer to the tyrosine residue. Thus, they can function the same way as when the residues are only three positions apart [8,9].
The complex family is named after its members, which are parts of multifunctional enzyme complexes present in all forms of life, e.g. fatty acid synthase. They are NADP(H)-binding proteins with the SDR region having a beta-ketoacyl reductive function. This group has the unique motif of YxxxN at the active site rather than the typical YxxxK.
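The three active-site patterns named in the text (YxxxK, YxxMxxxK and YxxxN) translate directly into regular expressions, as in the illustrative sketch below. This is only a pattern scan on an ungapped sequence; the assignment procedure in the paper works on aligned motif positions, and such short patterns will also match by chance elsewhere in a protein. The dictionary and function names are hypothetical.

```python
import re

# 'x' in the paper's notation means any residue, written '.' in a regex.
ACTIVE_SITE_PATTERNS = {
    "typical": re.compile(r"Y...K"),       # YxxxK
    "divergent": re.compile(r"Y..M...K"),  # YxxMxxxK
    "complex": re.compile(r"Y...N"),       # YxxxN
}


def find_active_site_motifs(seq: str) -> list[tuple[str, int, str]]:
    """Return (pattern label, 0-based start, matched text) for every hit."""
    hits = []
    for label, pattern in ACTIVE_SITE_PATTERNS.items():
        for match in pattern.finditer(seq):
            hits.append((label, match.start(), match.group()))
    return hits
```

A hit is best read as a candidate to be checked against the alignment, not as a family assignment in itself.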
Using a Hidden Markov model, candidate SDR sequences were extracted from SWISSPROT and KIND. These sequences were aligned and sorted into the five families, classical, extended, intermediate, divergent and complex, using a motif-based approach (for details please see the Materials and methods section). The two databases show the same ratios for the different families (Table 1). The family of classical SDRs is the largest, capturing half of the sequences, while the family of extended SDRs is second in size with a quarter of the sequences.
Even when the most divergent sequences have been assigned to families, there is still large sequence variation among the members of the classical and the extended SDRs. The sequence identity is as low as 8% (classical) and 10% (extended) in pair-wise comparisons (Table 1). Thus, these two families are subject to a further assignment procedure, at a second level, based upon coenzyme specificity.
Table 1. Number of SDR family members in the SWISSPROT and KIND databases.
Family
Group size Residue identity Group size Residue identity
Fig. 3. Flow chart of the family assignment procedure. Each sequence is scored against the five different family motifs. Depending on these scores, the sequences are sorted into seven groups – the five families and two 'artificial' groups. The conditions for each selection are given within boxes.
Coenzyme-based subfamily assignments
The coenzyme-binding residues were used in the subfamily assignments. A βαβ-fold, part of the Rossmann fold [10], has been found to be common to enzymes that bind NAD(H), NADP(H) or FAD [11]. An acidic residue is often present at the C-terminal end of the second β-strand in enzymes that are NAD(H)-binding [12]. This residue forms hydrogen bonds to the 2′- and 3′-hydroxyl groups of the adenine ribose moiety. NADP(H)-preferring enzymes have instead two basic residues (Arg or Lys) that bind to the 2′-phosphate [cf. 13]. The first of these basic residues is found in the Gly-motif, immediately preceding the second glycine. The second basic residue is positioned directly after the crucial acidic residue of NAD(H)-preferring enzymes, i.e. at the first loop position after the second β-strand. The pattern of charged residues was used to distinguish subfamilies within the classical and extended SDR families.
Subfamilies within the classical SDR family
We superimposed experimentally solved 3D structures of classical SDRs, and compared residues within 4 Å of the coenzyme. NAD(H)-preferring enzymes (3α,20β-hydroxysteroid dehydrogenase, 7α-hydroxysteroid dehydrogenase, 2,3-dihydroxybiphenyl dehydrogenase, 2,3-butanediol dehydrogenase, 3-hydroxyacyl-CoA dehydrogenase type 2 and dihydropteridine reductase; PDB codes 2hsd, 1ahh, 1bdb, 1geg, 1e3w and 1dhr) have an acidic residue present at the end of the second β-strand (key position 36 in Table 2). Presence of the Asp residue at this position alone seems to determine the preference of NAD(H) over NADP(H), as neither a basic residue adjacent to this acidic residue (1bdb), nor a basic residue in the Gly-motif (2hsd) alters the coenzyme preference. NADP(H)-binding enzymes seem to be less strict in their requirement for two basic residues. Three structures (carbonyl reductase, tropinone reductase II and sepiapterin reductase; PDB codes 1cyd, 2ae2, and 1oaa) have both these residues (key positions 15 and 37 in Table 2), while trihydroxynaphthalene reductase (1ybv) and 3-oxoacyl reductase (1edo) have only the first, and 17β-hydroxysteroid dehydrogenase type 1 (1fdu) has only the second basic residue.
Because only a few structures are experimentally solved, we created an alignment including all classical SDRs with coenzyme specificity annotated in SWISSPROT. The sequences were aligned using a Hidden Markov model trained on sequences from the classical family only, to avoid artefacts due to the great diversity of the SDR superfamily. We found that the correlations between patterns of charged residues and coenzyme specificity are generally applicable. Sequence motifs based upon the patterns of charged residues were developed and used to sort the classical SDRs into four subfamilies of NAD(H)-binding proteins (Fig. 1). These subfamilies were denoted cD1d, cD1e, cD2 and cD3. Sequences that bind NAD(H) and have a negatively charged amino acid residue present at the end of the second β-strand (key position 36, Table 2) are sorted into subfamily cD1d if this charged residue is aspartic acid or subfamily cD1e if it is glutamic acid. Sequences that instead have a negatively charged residue at the first or second position after the second β-strand (key positions 37 or 38, Table 2) are sorted into subfamily cD2 or cD3, respectively. The NADP(H)-binding proteins are sorted into three subfamilies. Sequences with a basic residue in the Gly-motif (key position 15, Table 2) are sorted into subfamily cP1, while those with a basic residue at the first position after the second β-strand (key position 37, Table 2) are sorted into subfamily cP2. The cP3 subfamily is formed from sequences that have basic residues at both these positions.
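The subfamily rules for classical SDRs can be summarised as a small decision function over the residues at key positions 15, 36, 37 and 38. The rules themselves come from the text; the order in which they are tested is partly an assumption. Testing position 36 first follows the observation that an acidic residue there decides for NAD(H) even when basic residues are present, while the precedence of cD2/cD3 over the basic-residue rules is a guess. The function name and the 'unassigned' label are hypothetical.

```python
ACIDIC = set("DE")
BASIC = set("KR")


def classical_subfamily(res15: str, res36: str, res37: str, res38: str) -> str:
    """Assign a classical SDR to a coenzyme subfamily from four key residues."""
    # NAD(H)-binding subfamilies: acidic residue at or just after position 36.
    if res36 == "D":
        return "cD1d"
    if res36 == "E":
        return "cD1e"
    if res37 in ACIDIC:
        return "cD2"
    if res38 in ACIDIC:
        return "cD3"
    # NADP(H)-binding subfamilies: basic residues at positions 15 and/or 37.
    basic15 = res15 in BASIC
    basic37 = res37 in BASIC
    if basic15 and basic37:
        return "cP3"
    if basic15:
        return "cP1"
    if basic37:
        return "cP2"
    return "unassigned"
```

For instance, a sequence with Ser, Leu, Arg and Ala at the four positions lacks the acidic residue and has only the second basic residue, so it is returned as cP2.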
The new sorting process was applied to every classical SDR sequence in SWISSPROT and KIND, giving the distribution of subfamilies shown in Table 2. NADP(H)-binding is twice as frequent as NAD(H)-binding (about 60% vs. 30%), indicating that there are more forms catalysing the reductive reactions than the oxidative reactions. Only about 10% of the sequences do not have any of the typical patterns and thus cannot be classified.
For all but six of the 218 assigned classical SDRs, the coenzyme specificity is correctly predicted, as judged by agreement with the annotations in the SWISSPROT database entries. Scrutinizing the six deviating cases, we find that in four (Dhb1_Human, Dhb7_Mouse, Dhpr_Rat and Idno_Ecoli) there are experimental studies [14–17] that support our predictions. The remaining two cases are sequences involved in fatty acid biosynthesis (Fabg_Thema and Fag2_Syny3). They are annotated as NADPH-binding in SWISSPROT, and other proteins of the same functional type indeed use NADPH as coenzyme. However, in contrast to them, these two sequences have an aspartic acid at the last
Table 2. Number of classical SDRs within the SWISSPROT and KIND databases, divided into different coenzyme-binding subfamilies. Key position numbers refer to 3α,20β-hydroxysteroid dehydrogenase (PDB code 2hsd).
Subfamily
Key positions
SWISSPROT KIND
position of the second β-strand and are thus predicted to be NAD(H)-binding by our method (subfamily cD1d). It is still not experimentally verified whether these two sequences bind NADH or bind NADPH in an atypical manner.
Subfamilies within the extended SDR family
The number of experimentally solved 3D structures for the extended family is lower than for the classical family. At present, there are two known structures for NAD(H)-preferring enzymes (UDP-galactose 4-epimerase and dTDP-glucose 4,6-dehydratase; PDB codes 1ek6 and 1bxk). As for the NAD(H)-preferring enzymes of the classical type, those of the extended family also present the acidic residue (at key position 33, Table 3), and it is concluded to be the exclusive determinant of an NAD(H)-preferring enzyme. There are two structures of NADP(H)-preferring enzymes (GDP-fucose synthetase and ADP-L-glycero-D-mannoheptose 6-epimerase; PDB codes 1bsv and 1eq2). However, when superimposing these structures the root mean square deviation is 10 Å, and one of the main differences between the structures is in the coenzyme-binding region. The second structure (1eq2) is atypical of the family [18,19], as it prefers NADP(H) but still has the aspartic acid at the end of the second β-strand typical of NAD(H)-binding. Thus, the assignment of NADP(H)-preferring enzymes of the extended type is based only on the alignment of known annotated members of this type. In the alignment, we find that the basic residue present in the Gly-motif among the classical SDRs does not have a counterpart among the extended SDRs. The second basic residue, in the loop after the second β-strand, is conserved among extended SDRs as well (key position 34, Table 3).
For the extended SDRs, two NAD(H)-binding subfamilies (eD1 and eD2) and one NADP(H)-binding subfamily (eP1) were defined based on the alignment. NAD(H)-binding sequences with an acidic residue at the end of the second β-strand (key position 33, Table 3) are sorted into the eD1 subfamily and those that have an acidic residue two positions downchain are sorted into the eD2 subfamily. The eP1 subfamily consists of NADP(H)-binding sequences that have a basic residue at the first loop position after the second β-strand (key position 34, Table 3). Table 3 displays the results when this classification system is applied to the SWISSPROT and KIND databases. In contrast to the results for the classical SDRs, a majority of the extended SDRs are predicted to be NAD(H)-binding rather than NADP(H)-binding. The NAD(H)-binding enzymes are twice as many as the NADP(H)-binding ones, indicating that there are more dehydrogenases than reductases in the extended SDR family. Around 10% of the sequences lack charged residues at the deterministic positions.
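The extended-family rules reduce to three key positions (33, 34 and 35, numbered after UDP-galactose 4-epimerase) and can be sketched in the same way. As before, the rules are taken from the text while the test order is an assumption: position 33 is checked first because the acidic residue there is described as the determinant of NAD(H) preference, and placing eD2 before eP1 is a guess. The function name and the 'unassigned' label are hypothetical.

```python
ACIDIC = set("DE")
BASIC = set("KR")


def extended_subfamily(res33: str, res34: str, res35: str) -> str:
    """Assign an extended SDR to a coenzyme subfamily from three key residues."""
    if res33 in ACIDIC:
        return "eD1"  # acidic residue at the end of the second beta-strand
    if res35 in ACIDIC:
        return "eD2"  # acidic residue two positions downchain
    if res34 in BASIC:
        return "eP1"  # basic residue at the first loop position
    return "unassigned"
```

A purely sequence-based rule of this kind will place any sequence with an acidic residue at position 33 in eD1, including an enzyme such as the 1eq2 structure discussed above, which carries that aspartic acid yet prefers NADP(H).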
For all but eight of the 118 assigned extended SDRs, the predicted coenzyme specificities agree with those annotated in SWISSPROT. There are three ADP-L-glycero-D-mannoheptose 6-epimerases that are predicted to be NAD(H)-binding. The sequences harbour an aspartic acid residue at the NAD(H)-deterministic position, but these enzymes prefer NADP(H) rather than NAD(H). The structure of the E. coli enzyme (1eq2) shows that the Asp residue is in a more open conformation in contrast to other NAD(H)-preferring enzymes, and that therefore NADP(H) can be accommodated [18,19]. There are five other sequences where the predicted coenzyme preferences are in disagreement with the annotated preferences. One enzyme (galactose epimerase, Gale_Vibch) is predicted to prefer NADP(H), but as the galactose epimerases normally prefer NAD(H), the prediction is probably deceived by a misalignment due to a deletion of nine residues. Another NADP(H)-predicted sequence (Noel_Rhifr) is annotated as NAD(H)-preferring, but also as a mannose dehydratase, and mannose dehydratases in general prefer NADP(H) to NAD(H). There are no experimental data to support either alternative. The last three sequences are dTDP-4-dehydrorhamnose reductases (Rbd1_Ecoli, Rbd2_Ecoli and Rfbd_Salty) with around 80% pair-wise residue identity. They are predicted to be NAD(H)-preferring but are annotated to be NADP(H)-preferring. However, the enzyme from S. enterica (Rfbd_Salty) has been shown to have dual coenzyme specificity, with a slight preference for NADH [20].
Application to genome data
We also applied our method to six of the genome databases available, i.e. human [21], mouse (July 2001; Celera Genomics, Rockville, MD), C. elegans [22], D. melanogaster [23], A. thaliana [24] and S. cerevisiae [25]. In Fig. 4, results of the assignments are displayed. The numbers of SDRs found are similar when comparing the human and mouse genomes. These genomes were released recently and cannot be considered to be complete. Thus, the number of SDRs in these genomes can be expected to increase [26].
For the human and mouse genomes, the distribution between classical (gray) and extended (white) families is similar to that in the general protein databases, where the extended members amount to around 25% or less of the total SDR number. However, in the S. cerevisiae and A. thaliana genomes about 40% of the SDR forms are
Table 3. Number of extended SDRs within the SWISSPROT and KIND databases, assigned into different coenzyme-binding subfamilies. Key position numbers refer to UDP-galactose 4-epimerase (PDB code 1ek6).
Subfamily
Key positions
extended. Yeast has a much smaller genome than the others with only 19 SDRs in total, and the seven extended SDRs might reflect a critical minimum of extended SDRs [2]. In the plant (A. thaliana) genome the extended members are close to half of the total SDR forms, reflecting the different metabolic requirements in plants involving several carbohydrate rearrangements. The total number of SDR forms is greater in A. thaliana than in other species, compatible with the large number of gene duplications in plants [27]. However, the ratio between extended and classical forms is still the same when reducing the data set for homology at the 60% and 80% levels.
The absolute numbers of extended SDRs are similar in the animal species (10–18). The number of classical SDRs is between 39 and 48 in human, mouse and fruit fly, while the worm has 72 classical SDRs. The worm shows a considerable gene duplication tendency [28], which if affecting classical and extended SDRs differently could explain this difference.
Also shown in Fig. 4 are the results of the subfamily assignments within the classical and extended SDRs. The pie charts show the relative number of NAD(H)-preferring sequences (lined pattern) vs. NADP(H)-preferring sequences (solid) in each genome. The number of NAD(H)-dependent SDRs is close to the number of NADP(H)-dependent SDRs in human, mouse and A. thaliana. In contrast, the NAD(H)-dependent enzymes amount to only one quarter in fruit fly and one eighth in worm and yeast.
The observation that classical SDRs most frequently utilize NADP(H) is remarkable. In the worm genome, 60 sequences are sorted into the NADP(H) classes, while only eight are sorted into NAD(H) classes. For extended SDRs, the observation that most of them in general are NAD(H)-dependent is not valid for fruit fly and yeast, where most extended SDRs instead bind NADP(H), and A. thaliana, where the numbers of NAD(H)- and NADP(H)-dependent forms are close to equal (34 vs. 27).
DISCUSSION
Database quality considerations
Our method for functional assignments was applied to completed eukaryotic genomes, revealing that the SDR subfamily patterns vary considerably between different species. However, the genome databases are often preliminary and contain errors. Exons might be missing, resulting in partial sequences. Falsely ascribed exon borders will result in sequences with erroneous deletions and/or insertions. A motif-based method that is dependent on a correct alignment is of course sensitive to these types of error. Still, bearing in mind that several genome sequences are preliminary, this type of classification is valuable to deduce early functional assignments.
Automated annotation methods are developed to assign functions to newly sequenced proteins. A drawback with automated annotation is that errors might be introduced [29]. Manual annotation should be of higher quality but is very time-consuming, which leads to difficulties in keeping up the pace with the genome sequencing projects. In this study, we detected some errors in annotation of coenzyme specificity in SWISSPROT, a database that is manually annotated and thereby believed to be reliable. There were three different types of error between the keywords and the references in these database entries. First, the quoted publications reported different coenzyme specificities, but the keywords only mentioned one of them. Second, there were entries where the quoted publications stated one type of coenzyme while the keyword stated a different type. Third, there were entries where the keywords reported a coenzyme specificity without any verifying reference, and the keywords did not say 'probable' or 'by similarity', or any other word to inform about the uncertainty. Thus, it is still necessary to perform database assignment checks, and the present method is useful for this purpose, in addition to its value in primary assignments.
Classical SDRs vs. extended SDRs
The multiple sequence alignments of classical and extended SDRs (Fig. 5) show that even though these families are highly divergent, there are conserved regions that can serve as fingerprints in the identification of novel SDR members (Fig. 6). In these regions, used to identify classical and extended SDR family members (see Materials and methods), some motifs are of special interest. These are listed in Table 4. In the N-terminal region, we find the pattern of three glycine residues that is characteristic of NAD(P)(H)-binding enzymes. These residues are spaced differently in classical and extended SDRs (Table 4).
Fig. 4. Classical and extended SDRs and their coenzyme preference shown for the genomes investigated. The pie charts display the proportions between classical (gray) and extended (white) SDRs with specificity for NAD(H) (lined pattern) and NADP(H) (solid), for each of the six genomes studied. The number of SDR enzymes with their coenzyme specificity assigned is given within parentheses.
In both families there is a conserved aspartic acid residue, in the loop between β3 and α3, required for stabilization of the adenine-binding pocket [13,30]. In the extended family this residue is often followed by another charged residue two positions downchain.
The motif positioned in and adjacent to β4 (Table 4) is less conserved among extended SDRs compared to classical SDRs. Typically, extended SDRs prefer a histidine residue rather than an asparagine residue at the end of this β-strand. In classical SDRs, the NNAG motif has a role to stabilize the β-strands within the central β-sheet and to position this central β-sheet [30].
There is a motif in α4 that is especially well conserved among the extended SDRs. The α4 motif is also conserved among the classical SDRs. Here, the asparagine residue is involved in building the active site geometry by positioning the lysine residue and being part of a postulated proton relay [30].
The active site residues in β5 and α5 (serine, tyrosine and lysine) are found in both classical and extended SDRs. The extended SDRs have a conserved proline residue preceding the tyrosine residue, and also a conserved negatively charged residue four residues downchain of the lysine residue. Neither of these two residues is conserved in the
Fig. 6. 3D structure of a classical SDR enzyme with motifs indicated. The spheres show the coenzyme-deterministic positions for NAD(H) in red and NADP(H) in blue. Regions used to identify SDR members (cf. Fig. 2) are shown by blue ribbons. The coenzyme is coloured magenta. The structure is 3α,20β-hydroxysteroid dehydrogenase (PDB code 2hsd). The figure was made using the programme
Fig. 5. Multiple sequence alignments of classical and extended SDRs. The first three columns give the SWISSPROT sequence identifier, PDB identifier and subfamily membership. The secondary structure elements of 3α,20β-hydroxysteroid dehydrogenase (PDB code 2hsd) are shown above the classical SDR alignment, while the secondary structure elements of UDP-galactose 4-epimerase (PDB code 1ek6) are shown below the extended SDR alignment. Boxed residues denote key positions in coenzyme binding. Coloured residues represent conservation of 60%, as calculated for a larger data set (red = acidic, green = polar, light blue = hydrophobic, dark blue = basic, purple = Gly or Pro). Arrows 1, 2 and 3 above the alignment show the key positions 15, 36 and 37 (cf. Table 2). Arrows 1, 2 and 3 below the alignment show the key positions 33, 34 and 35 (cf. Table 3).
classical family; instead, they have a conserved aspartic acid residue about 13 positions downchain from the lysine residue.
Coenzyme specificity as classification basis
The two-level classification system divides members of the SDR superfamily into families and subfamilies, using a motif-based approach. For the five families detected at the first level – classical, extended, intermediate, divergent and complex – specific sequence patterns were extracted (Table 2). The patterns for families with few and/or closely related members (i.e. the intermediate, divergent and complex families) might need to be updated when further members are added, to avoid a bias towards the presently known sequences.
At the second level, the sequences belonging to the classical and extended families were further divided into seven and three subfamilies, respectively. These subfamilies were defined based on coenzyme specificity and patterns of charged residues in the coenzyme-binding region. The human 17β-hydroxysteroid dehydrogenase type 1 is an NADP(H)-preferring enzyme with a serine residue (Ser12) at the position before the second glycine residue of the glycine motif. There is an arginine residue (Arg37) at the first position after the second β-strand. Site-directed mutagenesis experiments show that exchanging Ser12 for lysine increased the specificity for NADP(H), while substituting Leu36 with aspartic acid changed the preference from NADP(H) to NAD(H) [34], supporting the crystallographic analysis and our motif-based assignments.
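The reasoning behind these mutagenesis results can be condensed into a simple rule of thumb, sketched below in Python. It is a deliberate simplification and not the full subfamily scheme of Table 2: an acidic residue at the end of the second β-strand is taken to indicate NAD(H), a basic residue at either of the two other key positions is taken to indicate NADP(H), and anything else is left undetermined. The function name, the argument names and the precedence given to the acidic residue are assumptions made for illustration.

```python
ACIDIC = set("DE")
BASIC = set("KR")


def predict_coenzyme(gly_motif_res: str, beta2_end_res: str,
                     after_beta2_res: str) -> str:
    """Rule-of-thumb coenzyme preference from three key residues.

    gly_motif_res   -- residue before the second Gly of the glycine motif
    beta2_end_res   -- last residue of the second beta-strand
    after_beta2_res -- first residue after the second beta-strand

    ASSUMPTION: an acidic residue at the end of beta2 overrides basic
    residues elsewhere; this is a simplification for illustration.
    """
    if beta2_end_res in ACIDIC:
        return "NAD(H)"
    if gly_motif_res in BASIC or after_beta2_res in BASIC:
        return "NADP(H)"
    return "undetermined"


if __name__ == "__main__":
    # Human 17beta-hydroxysteroid dehydrogenase type 1: Ser12, Leu36, Arg37
    print(predict_coenzyme("S", "L", "R"))
    # Leu36 -> Asp variant
    print(predict_coenzyme("S", "D", "R"))
    # Ser12 -> Lys variant
    print(predict_coenzyme("K", "L", "R"))
```

Applied to the residues named above, the sketch returns NADP(H) for the wild-type enzyme and for the Ser12 to lysine variant, and NAD(H) for the Leu36 to aspartic acid variant, in line with the experimental observations [34]. Sequences without charged residues at these positions are returned as undetermined.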
The specificity might also depend on factors other than the sequence patterns defined thus far. Some enzymes show dual coenzyme specificity and might bind alternative coenzymes in different tissues and in different cellular compartments. Molecular modelling using docking calculations might be helpful in the prediction of coenzyme preference [35].
There are members of the classical type where no motifs for coenzyme specificity were established, as no charged residues are found at the key positions otherwise identified as crucial for this task (Table 2). This is the situation for 11β-hydroxysteroid dehydrogenase type 2 and human 17β-hydroxysteroid dehydrogenase type 2. However, charged residues are found further downchain, and their roles might be clarified when the 3D structures become known. The retinol dehydrogenases (RDH) constitute a group where experiments show that bovine RDH is NAD+-dependent [36], while the rat RDH is NADP+-dependent [37]. These two sequences are very similar in the Gly-region and identical at the positions used to distinguish between NAD(H) and NADP(H) enzymes. Based on homology modelling of rat and bovine RDH [38], a basic residue further downchain (Lys64) in rat RDH is believed to enable NADP+ to bind. The corresponding residue in bovine RDH is polar (Thr61). Only when their respective 3D structures have been experimentally determined will it be possible to check which residues determine the difference between NAD(H) and NADP(H) specificity in these enzymes.
In summary, we have shown that functional assignments can be made and coenzyme preferences can be predicted from the amino acid sequence alone for SDR enzymes. For this divergent superfamily, we could distinguish families and subfamilies, which will help future assignments. The present approach using hidden Markov models and sequence patterns is general and can be extended to further enzyme families.
ACKNOWLEDGEMENTS
Financial support from the Swedish Research Council, the Swedish Foundation for Strategic Research, the Swedish Society for Medical Research, the Swedish Society of Medicine, the Novo Nordisk Foundation and Karolinska Institutet is gratefully acknowledged.
REFERENCES
1. Jörnvall, H., Persson, M. & Jeffery, J. (1981) Alcohol and polyol dehydrogenases are both divided into two protein types, and structural properties cross-relate the different enzyme activities within each type. Proc. Natl Acad. Sci. USA 78, 4226–4230.
2. Jörnvall, H., Höög, J.-O. & Persson, B. (1999) SDR and MDR: completed genome sequences show these protein families to be large, of old origin, and of complex nature. FEBS Lett. 445, 261–264.
3. Jörnvall, H., Persson, B., Krook, M., Atrian, S., Gonzalez-Duarte, R., Jeffery, J. & Ghosh, D. (1995) Short-chain dehydrogenases/reductases (SDR). Biochemistry 34, 6003–6013.
4. Karplus, K., Barrett, C. & Hughey, R. (1998) Hidden Markov models for detecting remote protein homologies. Bioinformatics 14, 846–856.
5. Bairoch, A. & Apweiler, R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28, 45–48.
Table 4. Conserved sequence motifs in the classical and the extended SDR families. In the motifs, 'a' denotes an aromatic residue, 'c' a charged residue, 'h' a hydrophobic residue, 'p' a polar residue and 'x' any residue. Alternative amino acids at a motif position are given within brackets.

Secondary structure element   Classical SDR motif    Extended SDR motif    Suggested function                                Reference
β1 + α1                       TGxxxGhG               TGxxGhaG              Structural role in coenzyme binding region        [1,2,31]
β3 + α3                       Dhx[cp]                DhxD                  Adenine ring binding of coenzyme                  [30]
β4                            GxhDhhhNNAGh           [DE]xhhHxAA           Structural role in stabilizing central β-sheet    [30]
β5                            GxhhxhSSh              hhhxSSxxhaG           Part of active site                               [2,31]
α5                            Yx[AS][ST]K            PYxx[AS]Kxxh[DE]      Part of active site                               [2,31]
β6                            h[KR]h[NS]xhxPGxxxT    h[KR]xxNGP            Structural role, reaction direction               [32,33]
6. Kallberg, Y. & Persson, B. (1999) KIND – a nonredundant protein database. Bioinformatics 15, 260–261.
7. Abagyan, R. & Totrov, M. (1994) Biased probability Monte Carlo conformational searches and electrostatic calculations for peptides and proteins. J. Mol. Biol. 235, 983–1002.
8. Stewart, M.J., Parikh, S., Xiao, G., Tonge, P.J. & Kisker, C. (1999) Structural basis and mechanism of enoyl reductase inhibition by triclosan. J. Mol. Biol. 290, 859–865.
9. Rozwarski, D.A., Vilcheze, C., Sugantino, M., Bittman, R. & Sacchettini, J.C. (1999) Crystal structure of the Mycobacterium tuberculosis enoyl-ACP reductase, InhA, in complex with NAD+ and a C16 fatty acyl substrate. J. Biol. Chem. 274, 15582–15589.
10. Rossmann, M.G., Liljas, A., Brändén, C.-I. & Banaszak, L.J. (1975) The Enzymes, 3rd edn (Boyer, P.D., ed.), Vol. 11, pp. 61–102. Academic Press, New York.
11. Wierenga, R.K., de Maeyer, M.C. & Hol, W.G. (1985) Interaction of pyrophosphate moieties with α-helices in dinucleotide binding proteins. Biochemistry 24, 1346–1357.
12. Wierenga, R.K., Terpstra, P. & Hol, W.G. (1986) Prediction of the occurrence of the ADP-binding beta alpha beta-fold in proteins, using an amino acid sequence fingerprint. J. Mol. Biol. 187, 101–107.
13. Tanaka, N., Nonaka, T., Nakanishi, M., Deyashiki, Y., Hara, A. & Mitsui, Y. (1996) Crystal structure of the ternary complex of mouse lung carbonyl reductase at 1.8 Å resolution: the structural origin of coenzyme specificity in the short-chain dehydrogenase/reductase family. Structure 4, 33–45.
14. Breton, R., Housset, D., Mazza, C. & Fontecilla-Camps, J.C. (1996) The structure of a complex of human 17beta-hydroxysteroid dehydrogenase with estradiol and NADP+ identifies two principal targets for the design of inhibitors. Structure 4, 905–915.
15. Nokelainen, P., Peltoketo, H., Vihko, R. & Vihko, P. (1998) Expression cloning of a novel estrogenic mouse 17 beta-hydroxysteroid dehydrogenase/17-ketosteroid reductase (m17HSD7), previously described as a prolactin receptor-associated protein (PRAP) in rat. Mol. Endocrinol. 12, 1048–1059.
16. Varughese, K.I., Skinner, M.M., Whiteley, J.M., Matthews, D.A. & Xuong, N.H. (1992) Crystal structure of rat liver dihydropteridine reductase. Proc. Natl Acad. Sci. USA 89, 6080–6084.
17. Bausch, C., Peekhaus, N., Utz, C., Blais, T., Murray, E., Lowary, T. & Conway, T. (1998) Sequence analysis of the GntII (subsidiary) system for gluconate metabolism reveals a novel pathway for L-idonic acid catabolism in Escherichia coli. J. Bacteriol. 180, 3704–3710.
18. Deacon, A.M., Ni, Y.S., Coleman, W.G. Jr & Ealick, S.E. (2000) The crystal structure of ADP-L-glycero-D-mannoheptose 6-epimerase: catalysis with a twist. Structure Fold. Des. 8, 453–462.
19. Ni, Y., McPhie, P., Deacon, A., Ealick, S. & Coleman, W.G. Jr (2001) Evidence that NADP+ is the physiological cofactor of ADP-L-glycero-D-mannoheptose 6-epimerase. J. Biol. Chem. 276, 27329–27334.
20. Graninger, M., Nidetzky, B., Heinrichs, D.E., Whitfield, C. & Messner, P. (1999) Characterization of dTDP-4-dehydrorhamnose 3,5-epimerase and dTDP-4-dehydrorhamnose reductase, required for dTDP-L-rhamnose biosynthesis in Salmonella enterica serovar Typhimurium LT2. J. Biol. Chem. 274, 25069–25077.
21. Venter, J.C. et al. (2001) The sequence of the human genome. Science 291, 1304–1351.
22. Wilson, R.K. (1999) How the worm was won. The C. elegans genome sequencing project. Trends Genet. 15, 51–58.
23. Adams, M.D. et al. (2000) The genome sequence of Drosophila melanogaster. Science 287, 2185–2195.
24. Huala, E. et al. (2001) The Arabidopsis Information Resource (TAIR): a comprehensive database and web-based information retrieval, analysis, and visualization system for a model plant. Nucleic Acids Res. 29, 102–105.
25. Mewes, H.W. et al. (1997) Overview of the yeast genome. Nature 387, 7–65.
26. Kallberg, Y., Oppermann, U., Jörnvall, H. & Persson, B. (2002) Short-chain dehydrogenase/reductase (SDR) relationships: a large family with eight clusters common to human, animal, and plant genomes. Protein Sci. 11, 636–641.
27. Bancroft, I. (2000) Insights into the structural and functional evolution of plant genomes afforded by the nucleotide sequences of chromosomes 2 and 4 of Arabidopsis thaliana. Yeast 17, 1–5.
28. Semple, C. & Wolfe, K.H. (1999) Gene duplication and gene conversion in the Caenorhabditis elegans genome. J. Mol. Evol. 48, 555–564.
29. Devos, D. & Valencia, A. (2001) Intrinsic errors in genome annotation. Trends Genet. 17, 429–431.
30. Filling, C., Berndt, K.D., Benach, J., Knapp, S., Prozorovski, T., Nordling, E., Ladenstein, R., Jörnvall, H. & Oppermann, U. (2002) Critical residues for structure and catalysis in short-chain dehydrogenases/reductases (SDR). J. Biol. Chem. 277, 25677–25684.
31. Oppermann, U.C., Filling, C., Berndt, K.D., Persson, B., Benach, J., Ladenstein, R. & Jörnvall, H. (1997) Active site directed mutagenesis of 3 beta/17 beta-hydroxysteroid dehydrogenase establishes differential effects on short-chain dehydrogenase/reductase reactions. Biochemistry 36, 34–40.
32. Filling, C., Nordling, E., Benach, J., Berndt, K.D., Ladenstein, R., Jörnvall, H. & Oppermann, U. (2001) Structural role of conserved Asn179 in the short-chain dehydrogenase/reductase scaffold. Biochem. Biophys. Res. Commun. 289, 712–717.
33. Ghosh, D. & Vihko, P. (2001) Molecular mechanisms of estrogen recognition and 17-keto reduction by human 17beta-hydroxysteroid dehydrogenase 1. Chem. Biol. Interact. 130–132, 637–650.
34. Huang, Y.W., Pineau, I., Chang, H.J., Azzi, A., Bellemare, V., Laberge, S. & Lin, S.X. (2001) Critical residues for the specificity of cofactors and substrates in human estrogenic 17beta-hydroxysteroid dehydrogenase 1: variants designed from the three-dimensional structure of the enzyme. Mol. Endocrinol. 11, 2010–2020.
35. Peralba, J.M., Cederlund, E., Crosas, B., Moreno, A., Julià, P., Martínez, S.E., Persson, B., Farrés, J., Parés, X. & Jörnvall, H. (1999) An NADP(H)-dependent stomach alcohol dehydrogenase. Structural and enzymatic properties of a gastric NADP(H)-dependent and retinal-active alcohol dehydrogenase. J. Biol. Chem. 274, 26021–26026.
36. Simon, A., Hellman, U., Wernstedt, C. & Eriksson, U. (1995) The retinal pigment epithelial-specific 11-cis retinol dehydrogenase belongs to the family of short chain alcohol dehydrogenases. J. Biol. Chem. 270, 1107–1112.
37. Chai, X., Boerman, M.H., Zhai, Y. & Napoli, J.L. (1995) Cloning of a cDNA for liver microsomal retinol dehydrogenase. A tissue-specific, short-chain alcohol dehydrogenase. J. Biol. Chem. 270, 3900–3904.
38. Tsigelny, I. & Baker, M.E. (1996) Structures important in NAD(P)(H) specificity for mammalian retinol and 11-cis-retinol dehydrogenases. Biochem. Biophys. Res. Commun. 226, 118–127.