
Short-chain dehydrogenases/reductases (SDRs)

Coenzyme-based functional assignments in completed genomes

Yvonne Kallberg¹,², Udo Oppermann¹, Hans Jörnvall¹ and Bengt Persson¹,²

¹Department of Medical Biochemistry and Biophysics and ²Stockholm Bioinformatics Centre, Karolinska Institutet, Sweden

Short-chain dehydrogenases/reductases (SDRs) are enzymes of great functional diversity. Even at sequence identities of typically only 15–30%, specific sequence motifs are detectable, reflecting common folding patterns. We have developed a functional assignment scheme based on these motifs and we find five families. Two of these families were known previously and are called ‘classical’ and ‘extended’ families, but they are now distinguished at a further level based on coenzyme specificities. This analysis gives seven subfamilies of classical SDRs and three subfamilies of extended SDRs. We find that NADP(H) is the preferred coenzyme among most classical SDRs, while NAD(H) is that preferred among most extended SDRs. Three families are novel entities, denoted ‘intermediate’, ‘divergent’ and ‘complex’, encompassing short-chain alcohol dehydrogenases, enoyl reductases and multifunctional enzymes, respectively. The assignment scheme was applied to the genomes of human, mouse, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana and Saccharomyces cerevisiae. In the animal genomes, the extended SDRs amount to around one quarter or less of the total number of SDRs, while in the A. thaliana and S. cerevisiae genomes, the extended members constitute about 40% of the SDR forms. The numbers of NAD(H)-dependent and NADP(H)-dependent SDRs are similar in human, mouse and plant, while the proportions of NAD(H)-dependent enzymes are much lower in fruit fly, worm and yeast. We show that, in spite of the great diversity of the SDR superfamily, the primary structure alone can be used for functional assignments and for predictions of coenzyme preference.

Keywords: short-chain dehydrogenases/reductases; genome; coenzyme; sequence patterns; bioinformatics

Short-chain dehydrogenases/reductases (SDRs) are enzymes with subunits of about 250 residues that catalyse NAD(P)(H)-dependent oxidation/reduction reactions. The concept of SDRs was established in 1981 [1], at a time when the only members known were a prokaryotic ribitol dehydrogenase and an insect alcohol dehydrogenase. Since then, the SDR family has grown enormously, both in the number of known members and the diversity of their functions. Already some years ago, over 1000 forms were ascribed to the SDR superfamily [2], and currently at least 3000 members, including species variants, are known, with a substrate spectrum ranging from alcohols, sugars, steroids and aromatic compounds to xenobiotics. The N-terminal region binds the coenzymes NAD(H) or NADP(H), while the C-terminal region constitutes the substrate-binding part. Although the residue identity is as low as 15–30%, the 3D folds are quite similar, except for the C-terminal regions.

The SDRs have been divided into two large families, ‘classical’ and ‘extended’, with different Gly-motifs in the coenzyme-binding regions and different chain lengths: around 250 residues in classical SDRs and 350 in extended SDRs [3]. Few residues are completely conserved, but several sequence motifs are distinguishable within the families.

It is desirable to define distinct characteristics of these families for functional assignments of new sequences added to the SDR superfamily. We have now defined characteristic differences for all SDR types and distinguish five SDR families. Furthermore, seven subfamilies are delineated within the classical SDRs and three subfamilies within the extended SDRs. These characteristics can be used for functional predictions of further, novel structures, and the assignment system developed is now applied to the genomes of human, mouse, Drosophila melanogaster, Arabidopsis thaliana, Caenorhabditis elegans and Saccharomyces cerevisiae.

MATERIALS AND METHODS

We trained a Hidden Markov model [4] on a set of 95 SDR sequences extracted from SWISSPROT with less than 70% identity in pairwise comparisons, using a manually curated alignment based on human SDRs as seed sequences. The resulting Hidden Markov model was subsequently used to search the databases SWISSPROT [5] and KIND [6], selecting every sequence that had an expect value below 10⁻¹⁵ as a candidate SDR. When these candidate sequences were aligned, they separated into five clusters (Fig. 1), two of which were the classical and extended families [3] and three were the specific families of insect alcohol dehydrogenase, enoyl reductase and multifunctional enzymes. These three novel families were named ‘intermediate’, ‘divergent’ and ‘complex’, respectively. The first level of assignments would then be to sort sequences into these five families using a motif-based approach.
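The two numeric filters in this paragraph (the 70% pairwise-identity limit for the training set and the expect-value cut-off of 10⁻¹⁵ for candidate SDRs) can be illustrated with a short Python sketch. The function names, the greedy filtering order and the definition of identity (identical residues divided by the columns in which both sequences carry a residue) are assumptions made for this example; the text does not state how identity was computed or how the filtering was implemented.

```python
# Illustrative sketch only: names and the identity definition are assumptions,
# not the authors' implementation.

E_VALUE_CUTOFF = 1e-15        # expect-value limit stated in the text
MAX_TRAINING_IDENTITY = 0.70  # pairwise identity limit for the training set


def pairwise_identity(a: str, b: str) -> float:
    """Identity between two aligned sequences ('-' marks a gap).

    Assumption: columns with a gap in either sequence are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        identical += x == y
    return identical / compared if compared else 0.0


def reduce_redundancy(aligned: dict, max_identity: float = MAX_TRAINING_IDENTITY) -> dict:
    """Greedily keep sequences that stay below max_identity to every kept one."""
    kept = {}
    for name, seq in aligned.items():
        if all(pairwise_identity(seq, other) < max_identity
               for other in kept.values()):
            kept[name] = seq
    return kept


def select_candidates(hits: dict, cutoff: float = E_VALUE_CUTOFF) -> list:
    """Return identifiers of database hits whose expect value is below the cut-off."""
    return [name for name, e_value in hits.items() if e_value < cutoff]
```

A greedy filter of this kind depends on the input order, so a different ordering of the sequences may yield a different nonredundant set of similar size.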

Correspondence to B. Persson, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, S-171 77 Stockholm, Sweden. Fax: +46 8 337 462, Tel.: +46 8 728 7730, E-mail: bengt.persson@mbb.ki.se

Abbreviation: SDR, short-chain dehydrogenase/reductase.

(Received 25 April 2002, revised 16 July 2002, accepted 24 July 2002)


Based on a nonredundant set (< 80% identity; 100 classical, 80 extended, 7 intermediate, 12 divergent and 12 complex) of known SDR members in SWISSPROT, we developed sequence motifs covering the most conserved parts of the sequences. Three sequence motifs were developed for each family (Fig. 2) to optimize specificity and sensitivity. Within each family, 40 of the most preserved amino acid residues in the alignment were selected. The amino acid types ‘accepted’ at a position were those observed together with those with similar amino acid properties, e.g. if Ile and Val are observed, then Leu is also accepted at that position.
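The rule for ‘accepted’ residue types can be sketched in Python as follows. The property groups and the requirement that at least two members of a group be observed before the whole group is accepted are assumptions chosen to reproduce the Ile/Val/Leu example; the text does not list the groups that were actually used.

```python
# Hypothetical property groups, chosen only for illustration.
PROPERTY_GROUPS = {
    "aliphatic": frozenset("ILV"),
    "aromatic": frozenset("FWY"),
    "small": frozenset("AGS"),
    "hydroxyl": frozenset("ST"),
    "amide": frozenset("NQ"),
    "acidic": frozenset("DE"),
    "basic": frozenset("KR"),
}


def accepted_residues(observed, min_members=2):
    """Expand the residue types observed at one alignment position.

    Assumption: a whole property group is accepted once at least
    `min_members` of its residues have been observed at the position.
    """
    observed = set(observed)
    accepted = set(observed)
    for members in PROPERTY_GROUPS.values():
        if len(members & observed) >= min_members:
            accepted |= members
    return accepted
```

With these assumed groups, `accepted_residues("IV")` also admits Leu, whereas a position where only Ile is observed stays restricted to Ile.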

During an iterative process, an automated sorting procedure was developed. The aligned sequences were scored against the sequence motifs in the following manner. The presence of an accepted amino acid residue type at a motif position increases the sequence score by one point. If instead a gap is found at that position, the score is decreased by one point. A large part of the motifs covers the coenzyme-binding region. Other enzyme families that also bind NAD(P)(H) might be detected with this profile and introduce false positives in our set. Thus, in order to separate SDRs from other enzymes, key positions in the classical and extended motifs were deduced from multiple sequence alignments. These key positions (bold in Fig. 2) render a score of +3 if present and a score of −3 if absent. Thus, each sequence is associated with five different scores, one for each family.

Incomplete sequences can pose a problem when using sequence-based methods, because such sequences might render a low score and thus be classified incorrectly. In this report, sequences with more than 20% gap positions in the alignment were removed from the data set and not subjected to the scoring process.
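The scoring rules described above can be written down as a minimal Python sketch. The data structure and function names are hypothetical. Three points are assumptions not stated in the text: a residue that is neither accepted nor a gap at an ordinary motif position leaves the score unchanged, a gap at a key position counts as ‘absent’ (−3), and the 20% gap limit is evaluated over the whole aligned sequence.

```python
from dataclasses import dataclass

GAP = "-"


@dataclass(frozen=True)
class MotifPosition:
    column: int          # 0-based column in the alignment
    accepted: frozenset  # residue types accepted at this position
    key: bool = False    # True for key positions (bold in Fig. 2)


def score_against_motif(aligned_seq, motif):
    """Score one aligned sequence against one family motif."""
    score = 0
    for pos in motif:
        residue = aligned_seq[pos.column]
        if pos.key:
            # Key positions: +3 if an accepted residue is present, otherwise -3.
            score += 3 if residue in pos.accepted else -3
        elif residue == GAP:
            score -= 1
        elif residue in pos.accepted:
            score += 1
    return score


def too_gappy(aligned_seq, max_gap_fraction=0.20):
    """True if more than max_gap_fraction of the aligned positions are gaps."""
    return aligned_seq.count(GAP) / len(aligned_seq) > max_gap_fraction
```

Scoring a sequence against each of the five family motifs in turn yields the five scores per sequence that the sorting procedure relies on.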

The sorting procedure, with the groups and thresholds, is shown in Fig. 3. The thresholds were obtained through a systematic iterative procedure. The scores were used to sort the sequences into one of the five families. There are members of the SDR superfamily that do not meet any of the family requirements, i.e. the scores are below the thresholds. Rather than lowering the thresholds or extending the motifs, such sequences are sorted into an artificial group called ‘unclassified’ SDR. Another artificial group, ‘potential’ SDR, is also used. It consists of sequences that are not SDR members as far as can be judged today, but have some properties in common with the SDR family.

For the structural comparisons, the 3D structures of members within the SDR superfamily were superimposed using ICM (version 2.7, Molsoft LLC, San Diego, CA, USA) [7].
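The threshold-based sorting of the five family scores might be sketched as below. This is a simplified stand-in for the flow chart in Fig. 3, whose actual conditions and threshold values are not given in the text: the rule of choosing the family that clears its threshold by the largest margin is an assumption, the thresholds must be supplied by the caller, and the ‘potential’ SDR group is not modelled.

```python
FAMILIES = ("classical", "extended", "intermediate", "divergent", "complex")


def assign_family(scores, thresholds):
    """Assign a sequence to a family from its five motif scores.

    Assumption: the family whose score exceeds its threshold by the largest
    margin wins; if no score reaches its threshold the sequence is
    'unclassified'.
    """
    best_family = "unclassified"
    best_margin = None
    for family in FAMILIES:
        margin = scores[family] - thresholds[family]
        if margin >= 0 and (best_margin is None or margin > best_margin):
            best_family, best_margin = family, margin
    return best_family


# Placeholder thresholds, invented for this example only.
example_thresholds = {family: 20 for family in FAMILIES}
example_scores = {"classical": 31, "extended": 22, "intermediate": 5,
                  "divergent": -4, "complex": 0}
print(assign_family(example_scores, example_thresholds))
```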

RESULTS

Five SDR families

In order to get functional assignments for the members of the SDR superfamily, we developed an assignment system to distinguish families with specific characteristics. The SDR superfamily divides into five families (Fig. 1), of which two are the previously established classical and extended, and three are novel entities, denoted intermediate, divergent and complex.

The classical family encompasses oxidoreductases (EC 1.-.-.-), such as steroid dehydrogenases and carbonyl reductases. The extended family consists of isomerases (EC 5.-.-.-), e.g. galactose epimerases, and lyases (EC 4.-.-.-), such as glucose dehydratases, but several oxidoreductases are also found within this family, e.g. in multifunctional enzymes such as the 3β-hydroxysteroid dehydrogenase/Δ4,5 isomerase cluster.

Fig. 1. The two levels of classification within the SDR superfamily. At the first level, the members of the SDR superfamily are separated into five families. At the second level, members of the classical and extended families are separated into seven and three subfamilies, respectively, based upon coenzyme-binding residue patterns.

Fig. 2. Conserved sequence motifs in the SDR families as derived from a multiple sequence alignment. For each of the five SDR families, specific sequence patterns exist. Three motif segments, with a total of 40 preserved positions that cover the coenzyme-binding and active site regions, have been chosen for each family. Multiple amino acid occurrences at a position are written within brackets. ‘x’ denotes any amino acid residue or gap, and when present the subsequent number indicates the number of x residues/gaps. Amino acid residues written in bold indicate positions of special importance in the classical and extended motifs. Because the motifs are based upon the sequence majority, insertions in single sequences do not affect the patterns.

The intermediate family exhibits an atypical Gly-motif (G/AxxGxxG/A) that resembles patterns of extended SDRs, except that Ala is highly represented instead of Gly. However, the remaining parts of the sequences are more closely related to the classical SDRs, e.g. with an NGAG motif (corresponding to the NNAG motif in β4, Table 4), and with a subunit size (about 250 residues) like that of the classical SDRs. In this family, thus denoted intermediate, we find fruit fly alcohol dehydrogenases, constituting a set of SDRs that divides into three lines with around 35% sequence identity, in pairwise comparisons, between them.

The divergent family with enoyl reductases from bacteria and plants constitutes a set of NADH-dependent enzymes with three patterns that deviate from those typical of most SDRs. First, the Gly-motif is differently spaced, with five residues instead of three between the first two glycine residues. Second, in bacteria the second and third glycines have been replaced with serine and alanine, i.e. the motif is GxxxxxSxA. Third, there is a methionine instead of a tyrosine in the active site motif, while the tyrosine is found three positions upchain, i.e. YxxMxxxK instead of YxxxK. The 3D structures of FabI from Escherichia coli (PDB code 1qsg) and Mycobacterium tuberculosis (1bvr) reveal that the tyrosine and lysine residues are close in space. They are located within an α-helix and the spacing between the two residues makes them face the same side, with a similar distance between the tyrosine hydroxyl oxygen (Tyr-Oη) and the lysine side-chain nitrogen (Lys-Nζ) as for the classical SDRs, i.e. with a 1.3-Å difference compared to the 3α,20β-hydroxysteroid dehydrogenase, and with spatial freedom for the lysine residue to move closer to the tyrosine residue. Thus, they can function the same way as when the residues are only three positions apart [8,9].

The complex family is named after its members, which are parts of multifunctional enzyme complexes present in all forms of life, e.g. fatty acid synthase. They are NADP(H)-binding proteins with the SDR region having a β-ketoacyl reductive function. This group has the unique motif of YxxxN at the active site rather than the typical YxxxK.
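The family-specific motifs quoted above translate directly into regular expressions, with ‘x’ (any residue) written as ‘.’. The Python sketch below only illustrates that translation; the labels and the function name are invented for the example, and such short patterns will also match by chance in unrelated sequences, so they are no substitute for the alignment-based motifs of Fig. 2.

```python
import re

# 'x' in the motifs of the text (any residue) becomes '.' in the patterns.
MOTIFS = {
    "Gly-motif, intermediate (G/AxxGxxG/A)": r"[GA]..G..[GA]",
    "Gly-motif, divergent bacterial (GxxxxxSxA)": r"G.....S.A",
    "active site, typical (YxxxK)": r"Y...K",
    "active site, divergent (YxxMxxxK)": r"Y..M...K",
    "active site, complex (YxxxN)": r"Y...N",
}


def find_motifs(sequence):
    """Map each motif label to the 0-based start positions of its matches."""
    hits = {}
    for label, pattern in MOTIFS.items():
        # The lookahead allows overlapping occurrences to be reported too.
        starts = [m.start() for m in re.finditer(f"(?={pattern})", sequence)]
        if starts:
            hits[label] = starts
    return hits
```

For a toy sequence such as `"TTYAAMAAAKTT"`, only the divergent active-site pattern is reported, because the lysine lies seven positions after the tyrosine rather than four.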

Using a Hidden Markov model, candidate SDR sequences were extracted from SWISSPROT and KIND. These sequences were aligned and sorted into the five families, classical, extended, intermediate, divergent and complex, using a motif-based approach (for details, see the Materials and methods section). The two databases show the same ratios for the different families (Table 1). The family of classical SDRs is the largest, capturing half of the sequences, while the family of extended SDRs is second in size with a quarter of the sequences.

Even when the most divergent sequences have been assigned to families, there is still large sequence variation among the members of the classical and the extended SDRs. The sequence identity is as low as 8% (classical) and 10% (extended) in pairwise comparisons (Table 1). Thus, these two families are subject to a further assignment procedure, at a second level, based upon coenzyme specificity.

Table 1. Number of SDR family members in the SWISSPROT and KIND databases. Column headings: Family; Group size; Residue identity; Group size; Residue identity.

Fig. 3. Flow chart of the family assignment procedure. Each sequence is scored against the five different family motifs. Depending on these scores, the sequences are sorted into seven groups: the five families and two ‘artificial’ groups. The conditions for each selection are given within boxes.

Coenzyme-based subfamily assignments

The coenzyme-binding residues were used in the subfamily assignments. A βαβ-fold, part of the Rossmann fold [10], has been found to be common to enzymes that bind NAD(H), NADP(H) or FAD [11]. An acidic residue is often present at the C-terminal end of the second β-strand in enzymes that are NAD(H)-binding [12]. This residue forms hydrogen bonds to the 2′- and 3′-hydroxyl groups of the adenine ribose moiety. NADP(H)-preferring enzymes have instead two basic residues (Arg or Lys) that bind to the 2′-phosphate [cf. 13]. The first of these basic residues is found in the Gly-motif, immediately preceding the second glycine. The second basic residue is positioned directly after the crucial acidic residue of NAD(H)-preferring enzymes, i.e. at the first loop position after the second β-strand. The pattern of charged residues was used to distinguish subfamilies within the classical and extended SDR families.

Subfamilies within the classical SDR family

We superimposed experimentally solved 3D structures of classical SDRs and compared residues within 4 Å of the coenzyme. NAD(H)-preferring enzymes (3α,20β-hydroxysteroid dehydrogenase, 7α-hydroxysteroid dehydrogenase, 2,3-dihydroxybiphenyl dehydrogenase, 2,3-butanediol dehydrogenase, 3-hydroxyacyl-CoA dehydrogenase type 2 and dihydropteridine reductase; PDB codes 2hsd, 1ahh, 1bdb, 1geg, 1e3w and 1dhr) have an acidic residue present at the end of the second β-strand (key position 36 in Table 2). Presence of the Asp residue at this position alone seems to determine the preference of NAD(H) over NADP(H), as neither a basic residue adjacent to this acidic residue (1bdb), nor a basic residue in the Gly-motif (2hsd) alters the coenzyme preference. NADP(H)-binding enzymes seem to be less strict in their requirement for two basic residues. Three structures (carbonyl reductase, tropinone reductase II and sepiapterin reductase; PDB codes 1cyd, 2ae2 and 1oaa) have both these residues (key positions 15 and 37 in Table 2), while trihydroxynaphthalene reductase (1ybv) and 3-oxoacyl reductase (1edo) have only the first, and 17β-hydroxysteroid dehydrogenase type 1 (1fdu) has only the second basic residue.

Because only few structures are experimentally solved, we created an alignment including all classical SDRs with coenzyme specificity annotated in SWISSPROT. The sequences were aligned using a Hidden Markov model trained on sequences from the classical family only, to avoid artefacts due to the great diversity of the SDR superfamily. We found that the correlations between patterns of charged residues and coenzyme specificity are generally applicable. Sequence motifs based upon the patterns of charged residues were developed and used to sort the classical SDRs into four subfamilies of NAD(H)-binding proteins (Fig. 1). These subfamilies were denoted cD1d, cD1e, cD2 and cD3. Sequences that bind NAD(H) and have a negatively charged amino acid residue present at the end of the second β-strand (key position 36, Table 2) are sorted into subfamily cD1d if this charged residue is aspartic acid or subfamily cD1e if it is glutamic acid. Sequences that instead have a negatively charged residue at the first or second position after the second β-strand (key positions 37 or 38, Table 2) are sorted into subfamily cD2 or cD3, respectively. The NADP(H)-binding proteins are sorted into three subfamilies. Sequences with a basic residue in the Gly-motif (key position 15, Table 2) are sorted into subfamily cP1, while those with a basic residue at the first position after the second β-strand (key position 37, Table 2) are sorted into subfamily cP2. The cP3 subfamily is formed from sequences that have basic residues at both these positions.
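The subfamily rules for classical SDRs can be condensed into a small decision function. The Python sketch below takes the residues found at key positions 15, 36, 37 and 38 (numbering of 2hsd) as input. The order in which the rules are tested is an assumption: the acidic positions are checked first because the text reports that an acidic residue at position 36 seems to override basic residues elsewhere, but the precedence among the remaining rules is not spelled out in the text.

```python
ACIDIC = frozenset("DE")
BASIC = frozenset("KR")


def classify_classical(res15, res36, res37, res38):
    """Return the classical SDR subfamily, or None if no pattern applies.

    Arguments are one-letter residue codes at key positions 15, 36, 37
    and 38. The rule order is an assumption made for this sketch.
    """
    if res36 == "D":
        return "cD1d"   # NAD(H), Asp at the end of the second beta-strand
    if res36 == "E":
        return "cD1e"   # NAD(H), Glu at the same position
    if res37 in ACIDIC:
        return "cD2"    # NAD(H), acidic residue one position later
    if res38 in ACIDIC:
        return "cD3"    # NAD(H), acidic residue two positions later
    basic15 = res15 in BASIC
    basic37 = res37 in BASIC
    if basic15 and basic37:
        return "cP3"    # NADP(H), both basic residues
    if basic15:
        return "cP1"    # NADP(H), basic residue in the Gly-motif only
    if basic37:
        return "cP2"    # NADP(H), basic residue after the second beta-strand only
    return None
```

A sequence with a non-basic residue at position 15, a hydrophobic residue at 36 and an arginine at 37 is placed in cP2 by this sketch, provided position 38 is not acidic.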

The new sorting process was applied to every classical SDR sequence in SWISSPROT and KIND, giving the distribution of subfamilies shown in Table 2. NADP(H)-binding is twice as frequent as NAD(H)-binding (about 60% vs. 30%), indicating that there are more forms catalysing the reductive reactions than the oxidative reactions. Only about 10% of the sequences do not have any of the typical patterns and thus cannot be classified.

For all but six of the 218 assigned classical SDRs, the coenzyme specificity is correctly predicted, as judged by agreements with the annotations in the SWISSPROT database entries. Scrutinizing the six deviating cases, we find that in four (Dhb1_Human, Dhb7_Mouse, Dhpr_Rat and Idno_Ecoli) there are experimental studies [14–17] that support our predictions. The remaining two cases are sequences involved in fatty acid biosynthesis (Fabg_Thema and Fag2_Syny3). They are annotated as NADPH-binding in SWISSPROT, and other proteins of the same functional type indeed use NADPH as coenzyme. However, in contrast to them, these two sequences have an aspartic acid at the last position of the second β-strand and are thus predicted to be NAD(H)-binding by our method (subfamily cD1d). It is still not experimentally verified if these two sequences bind NADH or if they bind NADPH in an atypical manner.

Table 2. Number of classical SDRs within the SWISSPROT and KIND databases, divided into different coenzyme-binding subfamilies. Key position numbers refer to 3α,20β-hydroxysteroid dehydrogenase (PDB code 2hsd). Column headings: Subfamily; Key positions; SWISSPROT; KIND.

Subfamilies within the extended SDR family

The number of experimentally solved 3D structures for the extended family is lower than for the classical family. At present, there are two known structures for NAD(H)-preferring enzymes (UDP-galactose 4-epimerase and dTDP-glucose 4,6-dehydratase; PDB codes 1ek6 and 1bxk). As for the NAD(H)-preferring enzymes of the classical type, those of the extended family also present the acidic residue (at key position 33, Table 3), and it is concluded to be the exclusive determinant of an NAD(H)-preferring enzyme. There are two structures of NADP(H)-preferring enzymes (GDP-fucose synthetase and ADP-L-glycero-D-mannoheptose 6-epimerase; PDB codes 1bsv and 1eq2). However, when superimposing these structures the root mean square deviation is 10 Å, and one of the main differences between the structures is in the coenzyme-binding region. The second structure (1eq2) is atypical of the family [18,19], as it prefers NADP(H) but still has the aspartic acid at the end of the second β-strand typical of NAD(H)-binding. Thus, the assignment of NADP(H)-preferring enzymes of the extended type is based only on the alignment of known annotated members of this type. In the alignment, we find that the basic residue present in the Gly-motif among the classical SDRs does not have a counterpart among the extended SDRs. The second basic residue, in the loop after the second β-strand, is conserved among extended SDRs as well (key position 34, Table 3).

For the extended SDRs, two NAD(H)-binding subfamilies (eD1 and eD2) and one NADP(H)-binding subfamily (eP1) were defined based on the alignment. NAD(H)-binding sequences with an acidic residue at the end of the second β-strand (key position 33, Table 3) are sorted into the eD1 subfamily and those that have an acidic residue two positions downchain are sorted into the eD2 subfamily. The eP1 subfamily consists of NADP(H)-binding sequences that have a basic residue at the first loop position after the second β-strand (key position 34, Table 3).
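The three rules for extended SDRs can be sketched in the same way as a small Python function taking the residues at key positions 33, 34 and 35 (numbering of 1ek6; position 35 is the one two positions downchain of 33). The order in which the rules are applied is an assumption of this sketch, as the text does not say how a sequence matching more than one pattern is handled.

```python
ACIDIC = frozenset("DE")
BASIC = frozenset("KR")


def classify_extended(res33, res34, res35):
    """Return the extended SDR subfamily, or None if no pattern applies.

    Arguments are one-letter residue codes at key positions 33, 34 and 35.
    The rule order is an assumption made for this sketch.
    """
    if res33 in ACIDIC:
        return "eD1"   # NAD(H), acidic residue at the end of the second beta-strand
    if res35 in ACIDIC:
        return "eD2"   # NAD(H), acidic residue two positions downchain
    if res34 in BASIC:
        return "eP1"   # NADP(H), basic residue at the first loop position
    return None
```

A rule set of this kind places any sequence with an acidic residue at position 33 in eD1, which is also what happens to the atypical NADP(H)-preferring ADP-L-glycero-D-mannoheptose 6-epimerase (1eq2) described in this section.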

Table 3 displays the results when this classification system is applied to the SWISSPROT and KIND databases. In contrast to the results for the classical SDRs, a majority of the extended SDRs are predicted to be NAD(H)-binding rather than NADP(H)-binding. The NAD(H)-binding enzymes are twice as many as the NADP(H)-binding ones, indicating that there are more dehydrogenases than reductases in the extended SDR family. Around 10% of the sequences lack charged residues at the deterministic positions.

For all but eight of the 118 assigned extended SDRs, the predicted coenzyme specificities agree with those annotated in SWISSPROT. There are three ADP-L-glycero-D-mannoheptose 6-epimerases that are predicted to be NAD(H)-binding. The sequences harbour an aspartic acid residue at the NAD(H)-deterministic position, but these enzymes prefer NADP(H) rather than NAD(H). The structure of the E. coli enzyme (1eq2) shows that the Asp residue is in a more open conformation in contrast to other NAD(H)-preferring enzymes, and that therefore NADP(H) can be accommodated [18,19]. There are five other sequences where the predicted coenzyme preferences are in disagreement with the annotated preferences. One enzyme (galactose epimerase, Gale_Vibch) is predicted to prefer NADP(H), but as the galactose epimerases normally prefer NAD(H), the prediction is probably misled by a misalignment due to a deletion of nine residues. Another NADP(H)-predicted sequence (Noel_Rhifr) is annotated as NAD(H)-preferring, but also as a mannose dehydratase, an enzyme type that in general prefers NADP(H) to NAD(H). There are no experimental data to support either alternative. The last three sequences are dTDP-4-dehydrorhamnose reductases (Rbd1_Ecoli, Rbd2_Ecoli and Rfbd_Salty) with around 80% pairwise residue identity. They are predicted to be NAD(H)-preferring but are annotated to be NADP(H)-preferring. However, the enzyme from S. enterica (Rfbd_Salty) has been shown to have dual coenzyme specificity, with a slight preference for NADH [20].

Application to genome data

We also applied our method to six of the genome databases available, i.e. human [21], mouse (July 2001; Celera Genomics, Rockville, MD), C. elegans [22], D. melanogaster [23], A. thaliana [24] and S. cerevisiae [25]. In Fig. 4, results of the assignments are displayed. The numbers of SDRs found are similar when comparing the human and mouse genomes. These genomes were released recently and cannot be considered to be complete. Thus, the number of SDRs in these genomes can be expected to increase [26].

For the human and mouse genomes, the distribution between classical (gray) and extended (white) families is similar to that in the general protein databases, where the extended members amount to around 25% or less of the total SDR number. However, in the S. cerevisiae and A. thaliana genomes about 40% of the SDR forms are extended. Yeast has a much smaller genome than the others with only 19 SDRs in total, and the seven extended SDRs might reflect a critical minimum of extended SDRs [2]. In the plant (A. thaliana) genome the extended members are close to half of the total SDR forms, reflecting the different metabolic requirements in plants involving several carbohydrate rearrangements. The total number of SDR forms is greater in A. thaliana than in other species, compatible with the large number of gene duplications in plants [27]. However, the ratio between extended and classical forms is still the same when reducing the data set for homology at the 60% and 80% levels.

The absolute numbers of extended SDRs are similar in the animal species (10–18). The number of classical SDRs is between 39 and 48 in human, mouse and fruit fly, while the worm has 72 classical SDRs. The worm shows a considerable gene duplication tendency [28], which if affecting classical and extended SDRs differently could explain this difference.

Table 3. Number of extended SDRs within the SWISSPROT and KIND databases, assigned into different coenzyme-binding subfamilies. Key position numbers refer to UDP-galactose 4-epimerase (PDB code 1ek6). Column headings: Subfamily; Key positions.

Also shown in Fig. 4 are the results of the subfamily assignments within the classical and extended SDRs. The pie charts show the relative number of NAD(H)-preferring sequences (lined pattern) vs. NADP(H)-preferring sequences (solid) in each genome. The number of NAD(H)-dependent SDRs is close to the number of NADP(H)-dependent SDRs in human, mouse and A. thaliana. In contrast, the NAD(H)-dependent enzymes amount to only one quarter in fruit fly and one eighth in worm and yeast.

The observation that classical SDRs most frequently utilize NADP(H) is remarkable. In the worm genome, 60 sequences are sorted into the NADP(H) classes, while only eight are sorted into NAD(H) classes. For extended SDRs, the observation that most of them in general are NAD(H)-dependent is not valid for fruit fly and yeast, where most extended SDRs instead bind NADP(H), and A. thaliana, where the numbers of NAD(H)- and NADP(H)-dependent forms are close to equal (34 vs. 27).

DISCUSSION

Database quality considerations

Our method for functional assignments was applied to completed eukaryotic genomes, revealing that the SDR subfamily patterns vary considerably between different species. However, the genome databases are often preliminary and contain errors. Exons might be missing, resulting in partial sequences. Falsely ascribed exon borders will result in sequences with erroneous deletions and/or insertions. A motif-based method that is dependent on a correct alignment is of course sensitive to these types of error. Still, bearing in mind that several genome sequences are preliminary, this type of classification is valuable to deduce early functional assignments.

Automated annotation methods are developed to assign functions to newly sequenced proteins. A drawback with automated annotation is that errors might be introduced [29]. Manual annotation should be of higher quality but is very time-consuming, which leads to difficulties in keeping up the pace with the genome sequencing projects. In this study, we detected some errors in annotation of coenzyme specificity in SWISSPROT, a database that is manually annotated and thereby believed to be reliable. There were three different types of error between the keywords and the references in these database entries. First, the quoted publications reported different coenzyme specificities, but the keywords only mentioned one of them. Second, there were entries where the quoted publications stated one type of coenzyme while the keyword stated a different type. Third, there were entries where the keywords reported a coenzyme specificity without any verifying reference, and the keywords did not say ‘probable’ or ‘by similarity’, or any other word to inform about the uncertainty. Thus, it is still necessary to perform database assignment checks, and the present method is useful for this purpose, in addition to its value in primary assignments.
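A database assignment check of the kind suggested here amounts to comparing predicted and annotated coenzyme preferences and listing the entries that differ. The Python sketch below shows one possible form; the function names and the dictionary layout are assumptions for illustration. The two mismatching entries in the usage example are taken from the Results section, while `Example_Entry` is a made-up placeholder.

```python
def find_disagreements(predicted, annotated):
    """Return the sorted names of entries whose prediction and annotation differ.

    Entries lacking an annotation are skipped rather than counted as errors.
    """
    return sorted(
        name
        for name, coenzyme in predicted.items()
        if name in annotated and annotated[name] != coenzyme
    )


def agreement_rate(n_assigned, n_disagreeing):
    """Fraction of assigned sequences whose prediction matches the annotation."""
    return (n_assigned - n_disagreeing) / n_assigned


predicted = {"Fabg_Thema": "NAD(H)", "Gale_Vibch": "NADP(H)", "Example_Entry": "NAD(H)"}
annotated = {"Fabg_Thema": "NADP(H)", "Gale_Vibch": "NAD(H)", "Example_Entry": "NAD(H)"}
print(find_disagreements(predicted, annotated))
```

Applied to the counts reported in the Results, `agreement_rate(218, 6)` gives about 0.97 for the classical SDRs and `agreement_rate(118, 8)` about 0.93 for the extended SDRs.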

Classical SDRs vs. extended SDRs

The multiple sequence alignments of classical and extended SDRs (Fig. 5) show that even though these families are highly divergent, there are conserved regions that can serve as fingerprints in the identification of novel SDR members (Fig. 6). In these regions, used to identify classical and extended SDR family members (see Materials and methods), some motifs are of special interest. These are listed in Table 4. In the N-terminal region, we find the pattern of three glycine residues that is characteristic of NAD(P)(H)-binding enzymes. These residues are spaced differently in classical and extended SDRs (Table 4).

Fig. 4. Classical and extended SDRs and their coenzyme preference shown for the genomes investigated. The pie charts display the proportions between classical (gray) and extended (white) SDRs with specificity for NAD(H) (lined pattern) and NADP(H) (solid), for each of the six genomes studied. The number of SDR enzymes with their coenzyme specificity assigned is given within parentheses.

In both families there is a conserved aspartic acid residue, in the loop between β3 and α3, required for stabilization of the adenine-binding pocket [13,30]. In the extended family this residue is often followed by another charged residue two positions downchain.

The motif positioned in and adjacent to β4 (Table 4) is less conserved among extended SDRs compared to classical SDRs. Typically, extended SDRs prefer a histidine residue rather than an asparagine residue at the end of this β-strand. In classical SDRs, the NNAG motif has a role to stabilize the β-strands within the central β-sheet and to position this central β-sheet [30].

There is a motif in α4 that is especially well conserved among the extended SDRs. The α4 motif is also conserved among the classical SDRs. Here, the asparagine residue is involved in building the active site geometry by positioning the lysine residue and being part of a postulated proton relay [30].

The active site residues in β5 and α5 (serine, tyrosine and lysine) are found in both classical and extended SDRs. The extended SDRs have a conserved proline residue preceding the tyrosine residue, and also a conserved negatively charged residue four residues downchain of the lysine residue. Neither of these two residues is conserved in the classical family; instead, they have a conserved aspartic acid residue about 13 positions downchain from the lysine residue.

Fig. 5. Multiple sequence alignments of classical and extended SDRs. The first three columns give the SWISSPROT sequence identifier, PDB identifier and subfamily membership. The secondary structure elements of 3α,20β-hydroxysteroid dehydrogenase (PDB code 2hsd) are shown above the classical SDR alignment, while the secondary structure elements of UDP-galactose 4-epimerase (PDB code 1ek6) are shown below the extended SDR alignment. Boxed residues denote key positions in coenzyme binding. Coloured residues represent conservation of 60%, as calculated for a larger data set (red = acidic, green = polar, light blue = hydrophobic, dark blue = basic, purple = Gly or Pro). Arrows 1, 2 and 3 above the alignment show the key positions 15, 36 and 37 (cf. Table 2). Arrows 1, 2 and 3 below the alignment show the key positions 33, 34 and 35 (cf. Table 3).

Fig. 6. 3D structure of a classical SDR enzyme with motifs indicated. The spheres show the coenzyme-deterministic positions for NAD(H) in red and NADP(H) in blue. Regions used to identify SDR members (cf. Fig. 2) are shown by blue ribbons. The coenzyme is coloured magenta. The structure is 3α,20β-hydroxysteroid dehydrogenase (PDB code 2hsd). The figure was made using the programme

Coenzyme specificity as classification basis

The two-level classification system divides members of the SDR superfamily into families and subfamilies, using a motif-based approach. For the five families detected at the first level (classical, extended, intermediate, divergent and complex), specific sequence patterns were extracted (Table 2). The patterns for families with few and/or closely related members (i.e. the intermediate, divergent and complex families) might need to be updated when further members are added, to avoid a bias towards the presently known sequences.
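To make the motif-based approach concrete, the following is a minimal sketch of how a motif written in the notation of Table 4 could be turned into a regular expression and searched for in a protein sequence. It is not the procedure used in this work. The function names `motif_to_regex` and `find_motif` are hypothetical, and the residue sets assigned to the symbols 'a', 'c', 'h' and 'p' are assumptions, since the text names these classes without listing their members.

```python
import re

# Residue classes for the lowercase motif symbols. The membership of each
# class is an assumption for this sketch; the text only names the classes.
RESIDUE_CLASSES = {
    "a": "FHWY",       # aromatic (assumed set)
    "c": "DEHKR",      # charged (assumed set)
    "h": "ACFILMVWY",  # hydrophobic (assumed set)
    "p": "CHNQSTY",    # polar (assumed set)
}


def motif_to_regex(motif: str) -> str:
    """Translate a motif such as 'Yx[AS][ST]K' into a regular expression."""
    parts = []
    i = 0
    while i < len(motif):
        if motif[i] == "[":
            end = motif.index("]", i)  # bracketed alternatives
            group = motif[i + 1:end]
            i = end + 1
        else:
            group = motif[i]           # single symbol
            i += 1
        if "x" in group:
            parts.append(".")          # 'x' matches any residue
            continue
        allowed = set()
        for symbol in group:
            # Uppercase letters are literal amino acids; lowercase are classes.
            allowed.update(symbol if symbol.isupper() else RESIDUE_CLASSES[symbol])
        residues = "".join(sorted(allowed))
        parts.append(residues if len(residues) == 1 else f"[{residues}]")
    return "".join(parts)


def find_motif(sequence: str, motif: str) -> list:
    """Return 0-based start positions of non-overlapping motif matches."""
    pattern = motif_to_regex(motif)
    return [m.start() for m in re.finditer(pattern, sequence.upper())]
```

With these assumed classes, `find_motif("MKTGASSGIGLA", "TGxxxGhG")` returns `[2]`. A pattern set of this kind is only as good as the sequences it was derived from, which is why patterns for families with few members may need revision as new members appear.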

At the second level, the sequences belonging to the classical and extended families were further divided into seven and three subfamilies, respectively. These subfamilies were defined based on coenzyme specificity and patterns of charged residues in the coenzyme-binding region. The human 17β-hydroxysteroid dehydrogenase type 1 is an NADP(H)-preferring enzyme with a serine residue (Ser12) at the position before the second glycine residue of the glycine motif. There is an arginine residue (Arg37) at the first position after the second β-strand. Site-directed mutagenesis experiments show that an exchange of Ser12 to lysine increased the specificity for NADP(H), while a substitution of Leu36 to an aspartic acid changed the preference from NADP(H) to NAD(H) [34], supporting the crystallographic analysis and our motif-based assignments.
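The logic of assigning coenzyme preference from charged residues can be illustrated with a deliberately simplified sketch. It is not the subfamily rule set of this work. It assumes that the residues at the three key positions have already been read from an alignment, that an acidic residue (Asp or Glu) at any of them indicates NAD(H), and that otherwise a basic residue (Lys or Arg) indicates NADP(H). Letting the acidic residue take precedence is an assumption suggested by the Leu36 to Asp exchange described above. The function name `predict_coenzyme` is hypothetical.

```python
ACIDIC = set("DE")  # negatively charged side chains
BASIC = set("KR")   # positively charged side chains


def predict_coenzyme(key_residues: str) -> str:
    """Guess coenzyme preference from the residues at the key positions.

    Simplified rule (an assumption, not the published subfamily patterns):
    an acidic residue suggests NAD(H), otherwise a basic residue suggests
    NADP(H); with no charged residue the preference is left undetermined.
    """
    residues = set(key_residues.upper())
    if residues & ACIDIC:
        return "NAD(H)"
    if residues & BASIC:
        return "NADP(H)"
    return "undetermined"
```

Assuming that Ser12, Leu36 and Arg37 of human 17β-hydroxysteroid dehydrogenase type 1 occupy the three key positions, `predict_coenzyme("SLR")` gives `"NADP(H)"`, and the Leu36 to Asp variant, `predict_coenzyme("SDR")`, gives `"NAD(H)"`.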

The specificity might also depend on factors other than the sequence patterns defined thus far. Some enzymes show dual coenzyme specificity and might bind alternative coenzymes in different tissues and in different cellular compartments. Molecular modelling using docking calculations might be helpful in the prediction of coenzyme preference [35].

There are members of the classical type where no motifs for coenzyme specificity were established, as no charged residues are found at the key positions otherwise identified as crucial for this task (Table 2). This is the situation for 11β-hydroxysteroid dehydrogenases type 2 and human 17β-hydroxysteroid dehydrogenase type 2. However, charged residues are found further downchain, and their roles might be clarified when the 3D structures become known. The retinol dehydrogenases (RDH) constitute a group where experiments show that bovine RDH is NAD+-dependent [36], while the rat RDH is NADP+-dependent [37]. These two sequences are very similar in the Gly-region and identical at the positions used to distinguish between NAD(H) and NADP(H) enzymes. Based on homology modelling of rat and bovine RDH [38], a basic residue further downchain (Lys64) in rat RDH is believed to enable NADP+ to bind. The corresponding residue in bovine RDH is polar (Thr61). Only when their respective 3D structures have been experimentally determined will it be possible to check which residues are responsible for distinguishing NAD(H) from NADP(H) specificity in these enzymes.

In summary, we have shown that functional assignments can be made and coenzyme preferences can be predicted from the amino acid sequence alone for SDR enzymes. For this divergent superfamily, we could distinguish families and subfamilies, which will help future assignments. The present approach using hidden Markov models and sequence patterns is general and can be extended to further enzyme families.

Acknowledgements

Financial support from the Swedish Research Council, the Swedish Foundation for Strategic Research, the Swedish Society for Medical Research, the Swedish Society of Medicine, the Novo Nordisk Foundation and Karolinska Institutet is gratefully acknowledged.

References

1. Jörnvall, H., Persson, M. & Jeffery, J. (1981) Alcohol and polyol dehydrogenases are both divided into two protein types, and structural properties cross-relate the different enzyme activities within each type. Proc. Natl Acad. Sci. USA 78, 4226–4230.

2. Jörnvall, H., Höög, J.-O. & Persson, B. (1999) SDR and MDR: completed genome sequences show these protein families to be large, of old origin, and of complex nature. FEBS Lett. 445, 261–264.

3. Jörnvall, H., Persson, B., Krook, M., Atrian, S., Gonzalez-Duarte, R., Jeffery, J. & Ghosh, D. (1995) Short-chain dehydrogenases/reductases (SDR). Biochemistry 34, 6003–6013.

4. Karplus, K., Barrett, C. & Hughey, R. (1998) Hidden Markov models for detecting remote protein homologies. Bioinformatics 14, 846–856.

5. Bairoch, A. & Apweiler, R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28, 45–48.

Table 4. Conserved sequence motifs in the classical and the extended SDR families. In the motifs, 'a' denotes an aromatic residue, 'c' a charged residue, 'h' a hydrophobic residue, 'p' a polar residue and 'x' any residue. Alternative amino acids at a motif position are given within brackets.

Secondary structure element | SDR motif (classical) | SDR motif (extended) | Suggested function | Reference
β1 + α1 | TGxxxGhG | TGxxGhaG | Structural role in coenzyme binding region | [1,2,31]
β3 + α3 | Dhx[cp] | DhxD | Adenine ring binding of coenzyme | [30]
β4 | GxhDhhhNNAGh | [DE]xhhHxAA | Structural role in stabilizing central β-sheet | [30]
β5 | GxhhxhSSh | hhhxSSxxhaG | Part of active site | [2,31]
α5 | Yx[AS][ST]K | PYxx[AS]Kxxh[DE] | Part of active site | [2,31]
β6 | h[KR]h[NS]xhxPGxxxT | h[KR]xxNGP | Structural role, reaction direction | [32,33]


6. Kallberg, Y. & Persson, B. (1999) KIND – a nonredundant protein database. Bioinformatics 15, 260–261.

7. Abagyan, R. & Totrov, M. (1994) Biased probability Monte Carlo conformational searches and electrostatic calculations for peptides and proteins. J. Mol. Biol. 235, 983–1002.

8. Stewart, M.J., Parikh, S., Xiao, G., Tonge, P.J. & Kisker, C. (1999) Structural basis and mechanism of enoyl reductase inhibition by triclosan. J. Mol. Biol. 290, 859–865.

9. Rozwarski, D.A., Vilcheze, C., Sugantino, M., Bittman, R. & Sacchettini, J.C. (1999) Crystal structure of the Mycobacterium tuberculosis enoyl-ACP reductase, InhA, in complex with NAD+ and a C16 fatty acyl substrate. J. Biol. Chem. 274, 15582–15589.

10. Rossmann, M.G., Liljas, A., Brändén, C.-I. & Banaszak, L.J. (1975) The Enzymes, 3rd edn (Boyer, P.D., ed.), Vol. 11, pp. 61–102. Academic Press, New York.

11. Wierenga, R.K., de Maeyer, M.C. & Hol, W.G. (1985) Interaction of pyrophosphate moieties with α-helices in dinucleotide binding proteins. Biochemistry 24, 1346–1357.

12. Wierenga, R.K., Terpstra, P. & Hol, W.G. (1986) Prediction of the occurrence of the ADP-binding beta alpha beta-fold in proteins, using an amino acid sequence fingerprint. J. Mol. Biol. 187, 101–107.

13. Tanaka, N., Nonaka, T., Nakanishi, M., Deyashiki, Y., Hara, A. & Mitsui, Y. (1996) Crystal structure of the ternary complex of mouse lung carbonyl reductase at 1.8 Å resolution: the structural origin of coenzyme specificity in the short-chain dehydrogenase/reductase family. Structure 4, 33–45.

14. Breton, R., Housset, D., Mazza, C. & Fontecilla-Camps, J.C. (1996) The structure of a complex of human 17beta-hydroxysteroid dehydrogenase with estradiol and NADP+ identifies two principal targets for the design of inhibitors. Structure 4, 905–915.

15. Nokelainen, P., Peltoketo, H., Vihko, R. & Vihko, P. (1998) Expression cloning of a novel estrogenic mouse 17 beta-hydroxysteroid dehydrogenase/17-ketosteroid reductase (m17HSD7), previously described as a prolactin receptor-associated protein (PRAP) in rat. Mol. Endocrinol. 12, 1048–1059.

16. Varughese, K.I., Skinner, M.M., Whiteley, J.M., Matthews, D.A. & Xuong, N.H. (1992) Crystal structure of rat liver dihydropteridine reductase. Proc. Natl Acad. Sci. USA 89, 6080–6084.

17. Bausch, C., Peekhaus, N., Utz, C., Blais, T., Murray, E., Lowary, T. & Conway, T. (1998) Sequence analysis of the GntII (subsidiary) system for gluconate metabolism reveals a novel pathway for L-idonic acid catabolism in Escherichia coli. J. Bacteriol. 180, 3704–3710.

18. Deacon, A.M., Ni, Y.S., Coleman, W.G. Jr & Ealick, S.E. (2000) The crystal structure of ADP-L-glycero-D-mannoheptose 6-epimerase: catalysis with a twist. Structure Fold. Des. 8, 453–462.

19. Ni, Y., McPhie, P., Deacon, A., Ealick, S. & Coleman, W.G. Jr (2001) Evidence that NADP+ is the physiological cofactor of ADP-L-glycero-D-mannoheptose 6-epimerase. J. Biol. Chem. 276, 27329–27334.

20. Graninger, M., Nidetzky, B., Heinrichs, D.E., Whitfield, C. & Messner, P. (1999) Characterization of dTDP-4-dehydrorhamnose 3,5-epimerase and dTDP-4-dehydrorhamnose reductase, required for dTDP-L-rhamnose biosynthesis in Salmonella enterica serovar Typhimurium LT2. J. Biol. Chem. 274, 25069–25077.

21. Venter, J.C. et al. (2001) The sequence of the human genome. Science 291, 1304–1351.

22. Wilson, R.K. (1999) How the worm was won. The C. elegans genome sequencing project. Trends Genet. 15, 51–58.

23. Adams, M.D. et al. (2000) The genome sequence of Drosophila melanogaster. Science 287, 2185–2195.

24. Huala, E. et al. (2001) The Arabidopsis Information Resource (TAIR): a comprehensive database and web-based information retrieval, analysis, and visualization system for a model plant. Nucleic Acids Res. 29, 102–105.

25. Mewes, H.W. et al. (1997) Overview of the yeast genome. Nature 387, 7–65.

26. Kallberg, Y., Oppermann, U., Jörnvall, H. & Persson, B. (2002) Short-chain dehydrogenase/reductase (SDR) relationships: a large family with eight clusters common to human, animal, and plant genomes. Protein Sci. 11, 636–641.

27. Bancroft, I. (2000) Insights into the structural and functional evolution of plant genomes afforded by the nucleotide sequences of chromosomes 2 and 4 of Arabidopsis thaliana. Yeast 17, 1–5.

28. Semple, C. & Wolfe, K.H. (1999) Gene duplication and gene conversion in the Caenorhabditis elegans genome. J. Mol. Evol. 48, 555–564.

29. Devos, D. & Valencia, A. (2001) Intrinsic errors in genome annotation. Trends Genet. 17, 429–431.

30. Filling, C., Berndt, K.D., Benach, J., Knapp, S., Prozorovski, T., Nordling, E., Ladenstein, R., Jörnvall, H. & Oppermann, U. (2002) Critical residues for structure and catalysis in short-chain dehydrogenases/reductases (SDR). J. Biol. Chem. 277, 25677–25684.

31. Oppermann, U.C., Filling, C., Berndt, K.D., Persson, B., Benach, J., Ladenstein, R. & Jörnvall, H. (1997) Active site directed mutagenesis of 3 beta/17 beta-hydroxysteroid dehydrogenase establishes differential effects on short-chain dehydrogenase/reductase reactions. Biochemistry 36, 34–40.

32. Filling, C., Nordling, E., Benach, J., Berndt, K.D., Ladenstein, R., Jörnvall, H. & Oppermann, U. (2001) Structural role of conserved Asn179 in the short-chain dehydrogenase/reductase scaffold. Biochem. Biophys. Res. Commun. 289, 712–717.

33. Ghosh, D. & Vihko, P. (2001) Molecular mechanisms of estrogen recognition and 17-keto reduction by human 17beta-hydroxysteroid dehydrogenase 1. Chem. Biol. Interact. 130–132, 637–650.

34. Huang, Y.W., Pineau, I., Chang, H.J., Azzi, A., Bellemare, V., Laberge, S. & Lin, S.X. (2001) Critical residues for the specificity of cofactors and substrates in human estrogenic 17beta-hydroxysteroid dehydrogenase 1: variants designed from the three-dimensional structure of the enzyme. Mol. Endocrinol. 11, 2010–2020.

35. Peralba, J.M., Cederlund, E., Crosas, B., Moreno, A., Julià, P., Martínez, S.E., Persson, B., Farrés, J., Parés, X. & Jörnvall, H. (1999) An NADP(H)-dependent stomach alcohol dehydrogenase. Structural and enzymatic properties of a gastric NADP(H)-dependent and retinal-active alcohol dehydrogenase. J. Biol. Chem. 274, 26021–26026.

36. Simon, A., Hellman, U., Wernstedt, C. & Eriksson, U. (1995) The retinal pigment epithelial-specific 11-cis retinol dehydrogenase belongs to the family of short chain alcohol dehydrogenases. J. Biol. Chem. 270, 1107–1112.

37. Chai, X., Boerman, M.H., Zhai, Y. & Napoli, J.L. (1995) Cloning of a cDNA for liver microsomal retinol dehydrogenase. A tissue-specific, short-chain alcohol dehydrogenase. J. Biol. Chem. 270, 3900–3904.

38. Tsigelny, I. & Baker, M.E. (1996) Structures important in NAD(P)(H) specificity for mammalian retinol and 11-cis-retinol dehydrogenases. Biochem. Biophys. Res. Commun. 226, 118–127.

Date posted: 23/03/2014, 21:21
