Scientific report: Analysis of ancient sequence motifs in the H+-PPase family



Analysis of ancient sequence motifs in the H+-PPase family

Joel Hedlund1*, Roberto Cantoni2,3,4*, Margareta Baltscheffsky2, Herrick Baltscheffsky2 and Bengt Persson1,4

1 IFM Bioinformatics, Linköping University, Sweden

2 Department of Biochemistry and Biophysics, Arrhenius Laboratories, Stockholm University, Sweden

3 Department of Physical Sciences, ‘Federico II’ University of Naples, Italy

4 Department of Cell and Molecular Biology (CMB), Programme for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden

The unique family of membrane-bound proton-pumping inorganic pyrophosphatases, involving pyrophosphate as the alternative to ATP, was investigated by characterizing 166 members of the UniProtKB/Swiss-Prot + UniProtKB/TrEMBL databases and available completed genomes, using sequence comparisons and a hidden Markov model based upon a conserved 57-residue region in the loop between transmembrane segments 5 and 6. The hidden Markov model was also used to search the approximately one million sequences recently reported from a large-scale sequencing project of organisms in the Sargasso Sea, resulting in an additional 164 partial pyrophosphatase sequences. The strongly conserved 57-residue region was found to contain two nonapeptidyl sequences, mainly consisting of the four ‘very early’ proteinaceous amino acid residues Gly, Ala, Val and Asp, compatible with an ancient origin of the inorganic pyrophosphatases. The nonapeptide patterns have charged amino acid residues at positions 1, 5 and 9, are apparent binding sites for the substrate and parts of the active site, and were shown to be so specific for these enzymes that they can be used for functional assignments of unannotated genomes.

Keywords
bioinformatics; hidden Markov models; molecular evolution; proteinaceous amino acids; pyrophosphatase

Correspondence
B. Persson, IFM Bioinformatics, Linköping University, S-581 83 Linköping, Sweden
Fax: +46 13 137 568
Tel: +46 13 282 983
E-mail: bpn@ifm.liu.se

*These authors contributed equally to this work

(Received 9 August 2006, revised 26 September 2006, accepted 27 September 2006)

doi:10.1111/j.1742-4658.2006.05514.x

Abbreviations
H+-PPase/H+-PPi synthase, proton-pumping inorganic pyrophosphatase/pyrophosphate synthase; HMM, hidden Markov model; Pi, inorganic phosphate; PPi, pyrophosphate.

Membrane-bound inorganic pyrophosphatase/pyrophosphate synthase (H+-PPase/H+-PPi synthase) [1,2] activities were first described in chromatophores from the purple nonsulphur photosynthetic bacterium Rhodospirillum rubrum, where the enzyme functions as a proton pump [3]. The gene for H+-PPase/H+-PPi synthase, which is the enzyme involved in the photosynthetic formation of pyrophosphate (PPi) [2], was cloned in 1998 [4], and the primary structure and further properties have been deduced [5]. Moreover, 3D models of parts of the putative active site loop between transmembrane segments 5 and 6 of R. rubrum have been presented [6]. This 57 amino acid residue loop contains three sequence motifs (GGG, DVGADLVGK and DNVGDNVGD) within the following sequence: LGGGIFTKCADVGADLVGKVEAGIPEDDPRNPAVIADNVGDNVGDCAGMAADLFETY.

In what apparently is a pyrophosphate-binding region and an essential part of the active site for the phosphorylation/phosphatase reaction [7], three different ‘primitive’ tetrapeptide motifs (DVGA, DLVG and DNVG) [5,6] have been located in two nonapeptide sequences (DVGADLVGK and DNVGDNVGD) which, with their charged amino acids at positions 1, 5 and 9, seem to be involved in binding the metal-phosphate substrates [7]. These sequence motifs are denoted ‘primitive’ because they have a high content of the four ‘very early’ proteinaceous amino acids G (glycine), A (alanine), D (aspartic acid) and V (valine). In 1978, Eigen & Schuster [8] proposed a detailed model for the evolution of RNA codons, which suggests that the RNA triplets GGC and GCC (coding for glycine and alanine, respectively) were the earliest codons. Transition mutations at the second base would then have given GAC and GUC, which code for aspartic acid and valine, respectively. The evidence supporting the role of these four proteinaceous amino acids as ‘very early’ [5] has been further confirmed by Trifonov [9]. Notably, experiments by Miller [10], leading to the synthesis of amino acids under certain ‘simulated prebiotic’ conditions, gave these four amino acids in the highest yield, and they all were among the amino acids found in the Murchison meteorite [11]. Both the high content of the ‘very early’ amino acids in these sequence motifs, and the fact that the motifs have remained essentially unchanged during billions of years of biological evolution, from Archaea and Bacteria to Eukaryotes, provide a background for our fascination with various aspects of their evolution and function.
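The codon argument attributed to Eigen & Schuster [8] can be checked with a few lines of Python. This is an illustrative sketch only: the four-entry codon table is a hand-written excerpt of the standard genetic code, and the helper name `second_base_transition` is hypothetical rather than taken from the cited work.

```python
# Excerpt of the standard genetic code, restricted to the four codons discussed.
CODON_TO_AA = {"GGC": "Gly", "GCC": "Ala", "GAC": "Asp", "GUC": "Val"}

# A transition swaps purine for purine (A<->G) or pyrimidine for pyrimidine (C<->U).
TRANSITION = {"A": "G", "G": "A", "C": "U", "U": "C"}


def second_base_transition(codon: str) -> str:
    """Return the codon obtained by a transition mutation at the second base."""
    return codon[0] + TRANSITION[codon[1]] + codon[2]


for start in ("GGC", "GCC"):
    mutated = second_base_transition(start)
    print(f"{start} ({CODON_TO_AA[start]}) -> {mutated} ({CODON_TO_AA[mutated]})")
```

Under these assumptions, the two proposed earliest codons map onto the codons for aspartic acid and valine by a single transition each, which is the relationship the text describes.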

As pointed out previously [5], the GGG motif of the R. rubrum H+-PPase may be of special functional significance in a possible conformational change mechanism for the physiological coupling between the light-induced pumping of protons and the photophosphorylation of inorganic phosphate (Pi) to PPi [2] or its reversal, the dark hydrolysis of PPi to Pi [1]. All three glycines are strictly conserved, except in very rare cases where one, usually the first G, is substituted with an A. Because this substitution is conservative and alanine, just like glycine, can serve as a versatile link in conformational changes, the substitution of one G with one A should not cause any drastic change in the suggested function of the GGG motif. G and A are also seen to have been interchanged elsewhere, as will be exemplified in our discussion of the DVGADLVGK motif.

Two notable features of the H+-PPase family stand out at present. A unique simplicity of these homologous, integrally membrane-bound enzymes resides in both their substrates (Pi and PPi) and their single-subunit dimerized structure [12]. Their extreme hydrophobicity, with approximately half of the residues in the 16 ± 1 transmembrane segments, has so far made all attempts to obtain high-resolution 3D information on the H+-PPase unsuccessful. Bioinformatics may be used to provide new perspectives on the possible evolution of this ancient, widely spread and unusually highly conserved energy-transferring enzyme family.

The possibility has been considered that PPi could have been a predecessor to ATP, and that H+-PPases could have been direct or indirect evolutionary ancestors of ATPases [5,13]. From numerous genome projects, a large number of both prokaryotic and eukaryotic H+-PPase sequences are now known. Our new overview of the high content of strongly conserved, very early amino acids in the three putative active site motifs shown above, in the loop between transmembrane segments 5 and 6, warrants a closer look for sequence similarities in other known polypeptides. We thus also evaluate the possibility that the putative active site may have evolved from an original enzyme structure involved in an emerging metabolism of energy-rich phosphate [6]. A recent, additional indication in this direction was given by the discovery that shifting the growth conditions of R. rubrum from aerobic/dark to anaerobic/light resulted in concerted transcriptional activation of both H+-PPase and photosynthetic genes by the same anaerobic regulatory factor [14], pointing towards PPi as the energy-rich phosphate product of early bacterial photophosphorylation. Furthermore, the unique property of the H+-PPase family, to use PPi rather than ATP for energy coupling and utilization in biological membranes, provides an alternative angle to the study of the central, but notoriously elusive, coupling mechanism between energy-rich phosphates and proton pumps. Finally, the major problems encountered by several laboratories in obtaining H+-PPase samples for 3D studies have motivated detailed searches with various bioinformatic techniques in order to extend the characterization of the members of the H+-PPase family.

Results and Discussion

Characterization of H+-PPase members

In order to detect members of the membrane-bound, proton-pumping H+-PPase family, all completed genomes and the UniProtKB protein sequence database [15] were searched using FASTA [16], resulting in the characterization of over 100 different members. The sequences were multiply aligned, revealing a number of well-conserved regions, especially the highly conserved 57-residue segment in the loop between transmembrane segments 5 and 6. This segment was used to create a hidden Markov model (HMM), which was subsequently used to search UniProtKB and the currently available genomes in order to identify further family members. Using this HMM, 166 sequences were found (supplementary Table S1). Remarkably, only other H+-PPases were found using this strategy, indicating high specificity of the model. The poorest scoring H+-PPase sequence had an E-value of 8.3e-25 (i.e. under the circumstances of the search, only 8.3e-25 unrelated sequences could be expected to attain the same match quality by chance alone) [17]. Thus, even the weakest family member is detected with high statistical significance. Furthermore, the best scoring non-H+-PPase protein was detected at an E-value of 15, far below statistical significance. Thus, the model is well suited for detection of H+-PPases. The 57-residue segment seems to be unique to the H+-PPase family, which might be seen as an indication of their early separation from other families, making H+-PPases a low-positioned branch in the genealogical tree of protein families. Consequently, this segment can be used as a fingerprint in the search for further members of this family in new sequence data.

Analyses of the various primary structures of H+-PPases (PPi synthases) raise the possibility of exploring, in molecular detail, various properties of this ‘primitive’ alternative to the ubiquitously occurring ATP synthases. Looking at the species distribution, H+-PPases are present in several archaeal and bacterial species [5,18]. In eukaryotes, these enzymes are found in plants and in a few blood parasites, such as Tetrahymena and Plasmodium.

The fact that four ‘very early’ amino acid residues have been retained indicates very early optimization. This may be a rather unusual situation compared with early motifs in other proteins, where stepwise evolution of the motifs may have provided further optimization through the introduction of later amino acids.

Different H+-PPase subfamilies

In order to characterize the relationships within the family, a dendrogram was calculated (Fig. 1) based upon the multiple sequence alignment. The tree shows that H+-PPases form two subfamilies, corresponding to type 1 and type 2 [19,20]. Several species present multiple forms of H+-PPases, which in many cases are distantly related (37–39% pairwise residue identity) and belong to the separate types. For cress (Arabidopsis thaliana), there are five type 1 and six type 2 enzymes, whereas for rice (Oryza sativa) there are 18 enzymes of type 1 and two enzymes of type 2. Among plants, the multiplicity has so far only been seen in organisms for which the complete genomes are available, and this distantly related multiplicity can thus be expected to occur in further plants as well. The blood parasites Tetrahymena, Plasmodium falciparum and Plasmodium yoelii show a similar multiplicity. Moreover, some archaeal species (e.g. Methanosarcina mazei and Methanosarcina acetivorans) have multiple H+-PPases, whereas most bacterial species do not show any multiplicity, except for the bacterium Rhodopseudomonas palustris, which has two different pyrophosphatases, found on separate branches within type 2 (Fig. 1).

This multiplicity contrasts with the situation for the family I soluble PPases, where two variants of different sizes are known but are strictly divided between different kingdoms: one in prokaryotes and the other in eukaryotes [21].

Surprisingly, two eukaryotic sequences from the frog Xenopus tropicalis are found among the bacterial type 2 members (see Fig. 1). However, because this genome project is not complete, it cannot yet be claimed that these sequences are correct.

In order to characterize the differences between sequences of types 1 and 2, we compared the positions that are strictly conserved within each type but differ between the types, as indicated in Fig. 2. In total, seven such differences were found (four in the transmembrane segments and three on the cytosolic side), whereas none was found on the noncytosolic side. Two of the differences were found between residues with different physicochemical properties, implying possible functional impact (Fig. 2, residues indicated by bullet symbols). Functional impact has already been confirmed for one of these differences (position 507 in HPPA_STRCO), as an Ala/Lys mutation introduced at the corresponding position in the Carboxydothermus hydrogenoformans type 1 H+-PPase has been shown to confer the potassium independence of type 2 H+-PPases on the enzyme [22]. At position 253 in the loop between transmembrane segments 5 and 6, close to one of the conserved nonapeptides, the type 1 enzymes have predominantly hydrophobic Ile or Val, while type 2 enzymes have polar Cys or Ser.

Furthermore, there are two exchanges within weakly conserved CLUSTALW groups of residues [23] (Fig. 2, open ring symbols). At position 266 in transmembrane segment 6, the type 1 residues are Glu or Gly, while the type 2 residues are Val or Ala. At position 510, in loop 11–12 close to the membrane, type 1 enzymes prefer Gly, while type 2 enzymes prefer Thr or Ala. The remaining three residue exchanges are within strongly conserved CLUSTALW groups, and these are all located in transmembrane segments (Fig. 2).
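The notion of a position that is conserved within each type but differs between the types can be sketched in code. The example below assumes the rule stated in the legend of Fig. 2 (the two most common residues cover more than 90% of each type, and the two residue sets share no member). The function names and the toy alignment columns are hypothetical and only loosely modelled on position 253; they are not the authors' implementation or data.

```python
from collections import Counter


def top_two(column, threshold=0.90):
    """Return the two most common residues if together they exceed `threshold`, else None."""
    best = Counter(column).most_common(2)
    covered = sum(count for _, count in best)
    if covered / len(column) > threshold:
        return {residue for residue, _ in best}
    return None


def differentially_conserved(type1_column, type2_column, threshold=0.90):
    """True if each type is dominated by at most two residues and the sets do not overlap."""
    set1 = top_two(type1_column, threshold)
    set2 = top_two(type2_column, threshold)
    return set1 is not None and set2 is not None and set1.isdisjoint(set2)


# Hypothetical alignment columns, one letter per sequence.
type1 = list("IIIIIIVVVI")  # Ile/Val, as described for type 1 enzymes
type2 = list("CCCCCSSSCC")  # Cys/Ser, as described for type 2 enzymes

print(differentially_conserved(type1, type2))               # True
print(differentially_conserved(type1, list("IIIIICCCCC")))  # False, Ile is shared
```

A further step, not shown here, would be to classify each qualifying position by whether the exchanged residues fall within a CLUSTALW conservation group.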

Fig. 1. Dendrogram of H+-PPases. The dendrogram is based upon the multiple sequence alignment of all PPase sequences found in the UniProtKB/Swiss-Prot, UniProtKB/TrEMBL and GenomeLKPG databases, with removal of sequences showing 90% or more identity to any of the other sequences in the alignment. Two sequences (Q420R8_DESHA and Q4CED6_CLOTM) show unclear relationships and were excluded without affecting the general tree topology. Red marks at branch points indicate bootstrap values below 888/1000, the bootstrap value at the branch point separating Type 1 (K+-dependent) from Type 2 (K+-independent) enzymes. The horizontal bar shows the branch length corresponding to 5% residue differences. Each branch end-point is designated with identifiers from UniProtKB/Swiss-Prot, UniProtKB/TrEMBL or the respective genome project, prefixed with indicators of kingdom (A, archaeal; B, bacterial; E, eukaryotic), species group (A, Alveolata; E, Euglenozoa; P, Plant; V, Vertebrates) and PPase type (1 or 2). Archaea are labelled yellow, bacteria blue, and plants green. The red-labelled sequences originate from primitive eukaryotes (alveolates and euglenozoans). Two sequences from the unfinished Xenopus tropicalis genome project are shown in purple. Accession numbers are given within parentheses after the UniProtKB/Swiss-Prot ID. For UniProtKB/TrEMBL IDs, the accession number forms the first part of the ID. All type 1 PPases are found in the upper part of the dendrogram, while type 2 PPases are in the lower part. Several plants and protozoan organisms have both type 1 and type 2 PPases, whereas most of the bacterial forms only present one of the types.

Fig. 2. Schematic view of the H+-PPase sequence with residue differences between type 1 and type 2 indicated. The black line symbolizes the amino acid chain, and the horizontal green lines denote membrane borders with the cytosolic side downwards and the noncytosolic side upwards. The transmembrane topology is from a previous publication [26]. The transmembrane segments are numbered from 1 to 17 and the loops are numbered 1–2 to 16–17. The letters N and C denote the N- and C-terminus, respectively. Positional numbers refer to the reference sequence from Streptomyces coelicolor (UniProtKB/Swiss-Prot ID HPPA_STRCO, AC Q9X913). The green regions represent the five conserved regions in Fig. 3. The red boxes indicate the four nonapeptides (shown in red in Fig. 3). Thick lines denote positions where the two most common residue types together make up more than 95% of the residue type content. Positions with differential conservation are indicated by labels of the type ‘AB/CD’, which denotes that more than 90% of the Type 1 enzymes have residue A or B at this position, while 90% of the Type 2 enzymes have C or D, and that A and C are the most common ones. Boldface residue letters indicate that more than 85% of the sequences in the corresponding subtype have this residue. Conserved substitutions within a CLUSTALW ‘strongly conserved group of residues’ are shown by dash symbols on the backbone, and open ring symbols are similarly used for CLUSTALW ‘weakly conserved groups of residues’. Bullet symbols on the backbone denote conserved substitutions between two residues that do not occur together in any of the CLUSTALW groups. This latter form of substitution implies functional impact. Also, in order to count as a conserved substitution, none of the residue types A or B can be identical to the residue types C or D.

Conserved regions within H+-PPases

From the multiple sequence alignment of all H+-PPases, a limited number of strongly conserved regions are clearly visible. A conservation profile was calculated, showing the degree of conservation averaged over windows of 11 residues along the protein chain (Fig. 3). From the resulting plot it is clear that most of the conserved residues are located on the cytosolic side of the membrane, while only a few residues on the noncytosolic side are conserved. Furthermore, there are five segments that are seen as distinct peaks, indicating strong conservation. These sequences are listed in Fig. 3. The locations of these segments are schematically indicated in the topology plot in Fig. 2, shown as red-coloured segments. The first region corresponds to the previously mentioned 57-residue conserved region in the loop between transmembrane segments 5 and 6, while the second and third regions are found in the loop between segments 11 and 12, and the fourth and fifth regions in the loop between segments 15 and 16.
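The windowed conservation profile described above can be illustrated with a minimal sliding-window average. This sketch assumes that a per-column conservation score (in percent) is already available. The function name `window_average` and the example scores are hypothetical. The window length of 11 comes from the text, and the 55% cut-off is the dotted line mentioned in the legend of Fig. 3.

```python
def window_average(scores, window=11):
    """Average `scores` over every full window of `window` consecutive positions."""
    if window > len(scores):
        return []
    total = sum(scores[:window])
    averages = [total / window]
    for i in range(window, len(scores)):
        # Slide the window one position: add the new score, drop the oldest.
        total += scores[i] - scores[i - window]
        averages.append(total / window)
    return averages


# Hypothetical per-column conservation scores (percent) for a short stretch.
column_scores = [20, 30, 80, 90, 95, 100, 100, 90, 85, 80, 70, 40, 30, 20]
profile = window_average(column_scores)

# Window start indices whose average exceeds the 55% line.
peaks = [i for i, value in enumerate(profile) if value > 55]
print(len(profile), peaks)
```

Only full windows are averaged here, so the profile is shorter than the input by `window - 1` positions; how the original analysis treated the chain ends is not stated in the text.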

We developed HMMs for regions 2+3 and 4+5, which were used to search the UniProtKB/Swiss-Prot and UniProtKB/TrEMBL databases for potentially homologous proteins. However, for these regions too, we only found proteins of the H+-PPase family, thus emphasizing the uniqueness of this family.

Fig. 3. Conservation plot for H+-PPases and conserved sequence motifs. From the multiple sequence alignment of H+-PPases, CLUSTALW column scores were averaged for ungapped 11-residue windows of the reference sequence from Streptomyces coelicolor, HPPA_STRCO, and plotted in green. The predicted membrane topology is shown in blue, where high values indicate the noncytosolic side and low values the cytosolic side, whereas medium values correspond to transmembrane regions. The plot shows that the conserved regions coincide with cytoplasmic and transmembrane regions. There are five regions with distinct peaks above 55% column score (dotted line), indicating strong conservation. The sequences for these regions in S. coelicolor are shown below the plot. In region 1, the well-conserved patterns, including the two nonapeptides, are highlighted in red. Distantly related nonapeptides in regions 4 and 5 are also highlighted in red, together with a GGS pattern, possibly corresponding to the GGG pattern in region 1 (cf. text).

The 57-residue conserved region in the loop between transmembrane segments 5 and 6

From the multiple sequence alignment of all 145 H+-PPases, the consensus sequence for the well-conserved 57-residue region was calculated (Fig. 4A). The region contains the three conserved sequence motifs GGG, DVGADLVGK and DNVGDNVGD, which have already received particular attention from the evolutionary viewpoint [5,6] and appear to form functionally significant parts of the active site of the enzyme [7]. Both the second and the third motif contain nine amino acid residues, of which the first, fifth and ninth are charged. The number of amino acid residues separating the three motifs is remarkably constant in all the H+-PPases.
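As a small worked example, the positions of the three motifs can be located in the 57-residue R. rubrum loop quoted in the introduction. The positions printed are relative to this loop fragment, not to the full-length protein, and the variable names are illustrative.

```python
# 57-residue loop of the R. rubrum enzyme, as quoted in the introduction.
LOOP = "LGGGIFTKCADVGADLVGKVEAGIPEDDPRNPAVIADNVGDNVGDCAGMAADLFETY"
MOTIFS = ("GGG", "DVGADLVGK", "DNVGDNVGD")

# 1-based start position of the first occurrence of each motif.
starts = {motif: LOOP.index(motif) + 1 for motif in MOTIFS}
for motif, start in starts.items():
    print(f"{motif}: residues {start}-{start + len(motif) - 1}")

# Offset between the start positions of the two nonapeptides.
offset = starts["DNVGDNVGD"] - starts["DVGADLVGK"]
print("nonapeptide start-to-start offset:", offset)  # 26
```

For this sequence the nonapeptides start at loop positions 11 and 37, a start-to-start offset of 26 residues. Repeating the same calculation over every aligned family member would be one way to test the constancy of the spacing.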

The nonapeptide DVGADLVGK was seen to have a partial counterpart in the P-loop [24] of the active β-subunit of ATP synthase. In an alignment of the PPi synthase from R. rubrum with the P-loop in animal mitochondrial ATP synthase, four of eight amino acid residues were found to be identical [5].

In order to investigate further the evolutionary variation of this 57-residue region, we applied the HMM to search the approximately one million sequences recently reported from a large-scale sequencing project of organisms in the Sargasso Sea [25]. With the model, we were able to extract an additional 164 pyrophosphatase sequences (supplementary Table S2), not overlapping with the initial set of sequences. The consensus sequence of the Sargasso sequences is shown in Fig. 4B. Compared with all sequences detected in the UniProtKB/Swiss-Prot, UniProtKB/TrEMBL and GenomeLKPG databases (Fig. 4A), the variability is smaller among the Sargasso sequences, but many of the residue variations are identical. Furthermore, the variable regions are generally located at the same sites as for the sequences in Fig. 4A.

Fig. 4. Consensus sequences of the conserved 57 amino acid residue region. (A) The consensus sequence derived from all sequences found in UniProtKB/Swiss-Prot, UniProtKB/TrEMBL and available genomes using the HMM search is shown. The sequence is shown using sequence logos [34], clearly showing the high conservation of this region. Residues are coloured according to chemical properties: green represents polar residues (G, S, T, Y, C, Q, N), blue basic (K, R, H), red acidic (D, E) and black hydrophobic residues (A, V, L, I, P, W, F, M). The amino acid residues at each position are also shown in plain text below the x-axis, where the top row represents the most common residue type with alternative residues ordered in decreasing frequency. (B) We also used the HMM to search the approximately one million sequences recently reported from the Sargasso Sea [25]. The sequence logos and positional residue variants are shown as in (A).

For a number of plant H+-PPases, the HMM also finds distant similarity to a second region of the enzymes, possibly a visible trace of an ancient gene duplication. This second region is located at residues 738–785 [numbering according to the A. thaliana sequence with accession number Q9FWR2 (UniProtKB Q9FWR2_ARATH)]. This region forms loop 15–16, located on the cytosolic side according to experimental investigations [26] (cf. Fig. 3). The patterns of this second region are also seen in further species variants, but are most clearly distinguishable in the plant sequences. According to the three well-conserved sequence segments of H+-PPases, the second nonapeptide motif is well conserved, with all three aspartic acid residues unchanged (Fig. 3, marked residues in regions 1 and 5). For the first nonapeptide motif, charges are found at positions 1, 5 and 9 in the order Asp, Asp, Lys in region 1, whereas the order is Asp, Lys, Asp in region 4. Notably, the positional spacing between the two nonapeptides in region 1 and in regions 4+5 is identical (26 residues). Furthermore, the GGG motif preceding the nonapeptides in region 1 could correspond to a GGA or GGS motif preceding the nonapeptide in region 4 (Fig. 3, marked residues in peptides 1 and 4). Taken together, these observations might reflect an ancient gene duplication event, as previously suggested [5].

Occurrence of the typical H+-PPase nonapeptides in other proteins

In order to investigate the general occurrence of the two nonapeptides characterized as typical of H+-PPases, they were compared with all sequences in the UniProtKB/Swiss-Prot and UniProtKB/TrEMBL databases. The sequence patterns used in the searches were based upon all positional variants occurring in any of the H+-PPases, excluding those with only a single observation, resulting in the patterns D-[VIMT]-[GA]-[AGS]-D-[LI]-[VSMA]-G-K and D-[NCFL]-[VITA]-G-D-N-[VA]-G-D. In Table 1, these patterns are shown as DpppDppGK and DppGDNpGD, respectively. Remarkably, only 11 and 7, respectively, of the 192 and 32 possible combinations of different nonapeptides were found to have matches in the UniProtKB/Swiss-Prot or UniProtKB/TrEMBL databases (Table 1). All but one of the sequences found by the first nonapeptide pattern are annotated as H+-PPases. For the second nonapeptide pattern, all but two are H+-PPases. The exceptions are a putative DNA damage-inducible protein from Erythrobacter litoralis (Q4TQ38_9SPHN), presumably not related to the H+-PPases, and two hypothetical proteins from Neurospora crassa (Q871A9_NEUCR and Q7RZ15_NEUCR). Furthermore, as seen in Table 1, four H+-PPases are not detected by the first nonapeptide pattern, because those proteins have one atypical amino acid residue in this pattern. Thus, the nonapeptide patterns are, with these few exceptions, specific for the H+-PPases. As seen in Table 1, the number of non-H+-PPase hits increases dramatically when the patterns are extended to DXXXDXXGK and DXXGDNXGD, respectively, where X represents any amino acid residue.
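The two search patterns translate directly into regular expressions, and the counts of 192 and 32 possible nonapeptides follow from multiplying the number of alternatives at each position. The sketch below handles only the pattern elements used here (fixed residues and bracketed sets). The helper names `to_regex` and `n_combinations` are hypothetical, and this is not the ps_scan implementation.

```python
import re
from math import prod

FIRST = "D-[VIMT]-[GA]-[AGS]-D-[LI]-[VSMA]-G-K"
SECOND = "D-[NCFL]-[VITA]-G-D-N-[VA]-G-D"


def to_regex(pattern: str) -> re.Pattern:
    """Translate a simple PROSITE-style pattern (fixed residues and [..] sets only)."""
    return re.compile(pattern.replace("-", ""))


def n_combinations(pattern: str) -> int:
    """Count the distinct peptides a pattern can match."""
    return prod(len(element.strip("[]")) for element in pattern.split("-"))


print(n_combinations(FIRST), n_combinations(SECOND))   # 192 32
print(bool(to_regex(FIRST).fullmatch("DVGADLVGK")))    # True
print(bool(to_regex(SECOND).fullmatch("DNVGDNVGD")))   # True
```

To scan whole protein sequences rather than isolated peptides, `search` or `finditer` would be used in place of `fullmatch`.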

Table 1. Number of proteins and occurrences of PPi-related sequence motifs in the UniProtKB/Swiss-Prot and UniProtKB/TrEMBL databases. The peptides DpppDppGK and DppGDNpGD denote patterns based upon all positional variants occurring in any of the H+-PPases, excluding those with only a single observation, corresponding to the patterns D-[VIMT]-[GA]-[AGS]-D-[LI]-[VSMA]-G-K and D-[NCFL]-[VITA]-G-D-N-[VA]-G-D. In the peptides DXXXDXXGK and DXXGDNXGD, X represents any amino acid residue.
Columns: Sequence motif; UniProtKB/Swiss-Prot (Proteins, Hits, PPases); UniProtKB/TrEMBL (Proteins, Hits, PPases). Row groups: First nonapeptide; Second nonapeptide.

We extended the pattern search to screen the UniProtKB/Swiss-Prot database for very simple motifs of possible ancestral significance, with an alternation of Asp and one of the other ‘very early’ amino acids (e.g. DADADADAD) (Table 2). It can be seen that the pattern VDVDV is under-represented compared with the patterns ADADA and GDGDG, even when considering the general frequencies of the residues (V, 6.7%; A, 7.9%; G, 7.0%). Similarly, the DGDGD pattern is over-represented compared with the patterns DADAD and DVDVD. This over-representation is still present after homology reduction (at the 80% and 60% levels) and might well reflect structural properties.

We also searched the complete UniProtKB (Swiss-Prot + TrEMBL) database for patterns with alternations of any two ‘very early’ amino acid residues. The largest number of proteins was found for the sequences AGAGA (6185) and GAGAG (6203), in agreement with the assumption that G and A are both very early, flexible and frequent amino acids. Close to 6000 results were also reported for the pattern AVAVA (5865), while much smaller numbers were found for patterns such as GVGVG and DVDVD (852). The small difference in frequencies between V and D (6.7% and 5.3%, respectively) in known proteins does not fully explain the discrepancy between AVAVA and ADADA.
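To make the frequency argument concrete, the following sketch estimates how much of a difference between AVAVA and ADADA would be expected from residue composition alone. It assumes a deliberately simple null model in which residues occur independently with the background frequencies quoted in the text. That model is an assumption of this example, not a method described by the authors.

```python
from math import prod

# Background residue frequencies quoted in the text (as fractions).
FREQ = {"V": 0.067, "A": 0.079, "G": 0.070, "D": 0.053}


def expected_probability(peptide: str) -> float:
    """Probability of `peptide` at a given position if residues were independent."""
    return prod(FREQ[residue] for residue in peptide)


ratio = expected_probability("AVAVA") / expected_probability("ADADA")
print(f"expected AVAVA:ADADA ratio = {ratio:.2f}")  # 1.60
```

Under this model, composition alone predicts only about 1.6 times as many AVAVA as ADADA occurrences, which is the scale against which the observed discrepancy is being judged.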

Three of the sequences found had long segments consisting of repetitious patterns containing two of the four early amino acid residues A, G, D and V. The residues A and D, present in an alternating pattern (ADADAD...), were found in the surface protein SdrI from Staphylococcus saprophyticus and in two putative peptidoglycan-bound proteins from Listeria innocua and Listeria monocytogenes. These proteins are supposedly attached to the cell wall peptidoglycan by amide bonds. Such patterns are believed to be evolutionary relics of early sequence pattern formation by mutations and duplications, from early homo-oligomers (GGGGG... and AAAAA...) [27].

In the conserved nonapeptides DVGADLVGK and DNVGDNVGD, 14 out of 18 residues belong to the four ‘primitive’ residue types G, A, V and D. In order to make a general assessment of the frequency of ‘primitive’ and repetitive patterns including the ancient amino acid residues, we searched the protein databases for sequences including these four residues in various combinations. Thus, we scanned UniProtKB/Swiss-Prot for the sequence motif D-[A/G/V]3-D-[A/G/V]3-D, and for a similar motif in which alanine is replaced with asparagine, given the presence of asparagines in one of the two putative active site motifs of the R. rubrum H+-PPase. The only patterns found in the proteins were DNVGDNVGD, unique to the H+-PPase family, and DNNNDNNND, in the spindle assembly checkpoint component MAD1 from Saccharomyces cerevisiae (mitotic arrest deficient protein 1; UniProtKB/Swiss-Prot ID MAD1_YEAST). We thus concluded that the charged residues (positions 1, 5 and 9) of the two nonapeptides form a unique and unaltered pattern, presumably with a critical function and characteristic of the H+-PPase family.
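The residue count and the two scanned motifs can be reproduced on the peptides named in the text. The regular expressions below are one possible rendering of D-[A/G/V]3-D-[A/G/V]3-D and of its asparagine-for-alanine variant; the variable names are illustrative.

```python
import re

NONAPEPTIDES = ("DVGADLVGK", "DNVGDNVGD")
EARLY = set("GAVD")

# Count residues belonging to the four 'very early' types.
early_count = sum(res in EARLY for peptide in NONAPEPTIDES for res in peptide)
total = sum(len(peptide) for peptide in NONAPEPTIDES)
print(f"{early_count} of {total} residues are G, A, V or D")  # 14 of 18

# The scanned motif and its variant with alanine replaced by asparagine.
AGV = re.compile(r"D[AGV]{3}D[AGV]{3}D")
NGV = re.compile(r"D[NGV]{3}D[NGV]{3}D")

for peptide in ("DVGADLVGK", "DNVGDNVGD", "DNNNDNNND"):
    print(peptide, bool(AGV.fullmatch(peptide)), bool(NGV.fullmatch(peptide)))
```

Of the three peptides, only DNVGDNVGD and DNNNDNNND satisfy the asparagine variant and none satisfies the alanine form, consistent with the two hits reported above.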

Putative metal-binding patterns

Asp residues are strictly conserved in the H+-PPase nonapeptide motifs DVGADLVGK and DNVGDNVGD. The residues aspartic acid (Asp) and glutamic acid (Glu) can act as metal ligands in various proteins [28,29]. UniProtKB/Swiss-Prot was screened for patterns of nine amino acid residues with either Asp or Glu at every fourth position (1, 5 and 9), allowing any residue at the remaining positions. If the sequence formed an α-helix, with one turn every 3.6 residues, the charged residues would face the same side, facilitating metal binding at the active site.
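The geometric argument can be made explicit with a short calculation. The sketch assumes an ideal α-helix with 3.6 residues per turn (100° of rotation per residue), which is a textbook idealization rather than a structural result for the H+-PPase. The function name is hypothetical.

```python
RESIDUES_PER_TURN = 3.6
DEGREES_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN  # about 100 degrees


def helical_angle(position: int) -> float:
    """Angle in degrees (0-360) around the helix axis, relative to position 1."""
    return ((position - 1) * DEGREES_PER_RESIDUE) % 360.0


angles = {position: helical_angle(position) for position in (1, 5, 9)}
spread = max(angles.values()) - min(angles.values())
print(angles)
print("angular spread:", spread)
```

Positions 1, 5 and 9 come out at roughly 0°, 40° and 80°, so all three side chains would lie within about a quarter of the helix circumference, in line with the statement that they face the same side.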

In UniProtKB/Swiss-Prot, 11 648 proteins were found with the motif D-X3-D-X3-D, while more than twice as many (26 389 proteins) were found with the motif E-X3-E-X3-E. We investigated protein family relationships based upon the Pfam annotations in the UniProtKB/Swiss-Prot entries. For the ‘E-motif’, the most frequent domain was elongation factor Tu (472 proteins), and the protein kinase domain (320 proteins) was the second most frequent. For the ‘D-motif’, the most frequent domain was the EF hand Ca2+ ion-binding motif (297 occurrences), followed by S-adenosylmethionine synthases (187 proteins) and kinases (178 occurrences).

Table 2. Sequence patterns containing the four ‘primitive’ amino acid residues searched in the UniProtKB/Swiss-Prot database.
Columns: Pattern; Number of proteins; Number of occurrences.

Conclusions

The rapidly expanding numbers of available amino acid sequences provide novel possibilities to explore early biological evolution, in the direction from ‘very early’ polypeptide synthesis to various known or putative active sites of enzymes, especially those of the H+-PPase family. Our analyses with bioinformatic methods have shown that the H+-PPases are unique in their sequence properties, with no close relatives detectable using the presently available sensitive methods. The analyses have also shown that the membrane-bound H+-PPases form a large family, divided into two subclasses (types 1 and 2) where, notably, both types are found in plants, protozoa, bacteria and archaea. No occurrences exist in vertebrates, with the possible exception of a reported sequence from the X. tropicalis early genome project. In soluble family I PPases, two structural types are known to be strictly divided: one in prokaryotes and the other in eukaryotes [21].

The well-conserved nonapeptides in the loop between transmembrane segments 5 and 6 show specific patterns that can be used for functional assignments in unannotated genomes. The distance between the two nonapeptides is unchanged in all known sequences. We believe that our novel explorations on peptide motifs, both as such and as formed in apparent closeness to situations plausibly existing at the time of the origin and very early evolution of life on Earth, may be usefully extended when even more sequence data and, especially, the first 3D structure of an H+-PPase become available. Based on the 3D structure and the results presented here, rational selections of site-specific mutants may be expected to illuminate further both the evolutionary and the dynamic aspects of H+-PPase function.
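The constant spacing between the two nonapeptides suggests a simple check that could be applied to candidate sequences. The sketch below is hypothetical and not the in-house software used here: it assumes that "charged" means D, E, K or R, and it reports a spacing only when exactly two matches are found. It is demonstrated on the 57-residue R. rubrum loop given in the Introduction.

```python
import re

# Assumption: "charged" residues are taken to be D, E, K and R.
# Pattern: charged residue at positions 1, 5 and 9, any residue elsewhere.
CHARGED_NONAPEPTIDE = re.compile(r"(?=([DEKR].{3}[DEKR].{3}[DEKR]))")


def nonapeptide_hits(sequence):
    """Return (0-based start, matched nonapeptide) for every match."""
    return [(m.start(), m.group(1)) for m in CHARGED_NONAPEPTIDE.finditer(sequence)]


def nonapeptide_spacing(sequence):
    """Distance between the starts of the two nonapeptides, or None
    when the sequence does not contain exactly two matches."""
    hits = nonapeptide_hits(sequence)
    if len(hits) != 2:
        return None
    return hits[1][0] - hits[0][0]


RUBRUM_LOOP = "LGGGIFTKCADVGADLVGKVEAGIPEDDPRNPAVIADNVGDNVGDCAGMAADLFETY"

print(nonapeptide_hits(RUBRUM_LOOP))     # [(10, 'DVGADLVGK'), (36, 'DNVGDNVGD')]
print(nonapeptide_spacing(RUBRUM_LOOP))  # 26
```

A candidate loop whose spacing differs from that of known family members, or which lacks one of the two matches, could then be flagged for manual inspection.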

Experimental procedures

Pyrophosphatase sequences were searched using BLAST [30] against UniProtKB, version 6.3 (October 2005, http://www.uniprot.org) [15], and an in-house database of all genomes in the public domain (ftp.ensemble.org; ftp.ncbi.nih.gov; ftp.tigr.org), denoted GenomeLKPG (A Bresell and J Hedlund, Linköping University, Sweden, personal communication). The searches were complemented by HMM-based screenings based upon the ‘H_PPase’ model from Pfam [31]. We also built and calibrated our own HMM, based upon 86 sequences. For building the HMM and for the screenings, the HMMER software (http://hmmer.wustl.edu) [17] was used with default parameters for building and calibrating (commands ‘hmmbuild’ and ‘hmmcalibrate’).

General sequence comparisons were made using the program FASTA [16] and pattern searches using the ps_scan utility from the Prosite database [32].

In the multiple sequence alignments, sequences annotated as fragments, and those shorter than 300 residues, were removed to improve the alignment quality. In the phylogenetic trees, sequences with pairwise residue identity of more than 90% to any other sequence were excluded. Multiple sequence alignments were calculated using DIALIGN [33], and dendrograms were generated using the neighbour joining method, as implemented in ClustalX [23].
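The two filtering steps can be sketched as follows. This is a hedged illustration rather than the procedure actually used: the `Record` type and function names are hypothetical, identity is computed over alignment columns where both sequences have a residue, and redundancy is reduced greedily in input order, which is only one possible reading of "excluded".

```python
from typing import NamedTuple


class Record(NamedTuple):
    name: str
    aligned: str               # gapped sequence; all rows have equal length
    is_fragment: bool = False  # True if annotated as a fragment


def ungapped_length(aligned):
    return len(aligned) - aligned.count("-")


def percent_identity(a, b):
    """Identity over columns where both sequences have a residue."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def filter_for_alignment(records, min_length=300):
    """Drop annotated fragments and sequences shorter than min_length."""
    return [r for r in records
            if not r.is_fragment and ungapped_length(r.aligned) >= min_length]


def filter_for_tree(records, max_identity=90.0):
    """Greedily keep a record only if it is not more than max_identity
    percent identical to any record already kept (an assumption)."""
    kept = []
    for r in records:
        if all(percent_identity(r.aligned, k.aligned) <= max_identity for k in kept):
            kept.append(r)
    return kept


# Toy data with a lowered length cut-off, for illustration only.
records = [
    Record("a", "ACDEFGHIKL"),
    Record("b", "ACDEFGHIKV"),        # 90% identical to "a": kept
    Record("c", "ACDEFGHIKL"),        # 100% identical to "a": excluded
    Record("d", "ACD-------"),        # too short
    Record("e", "ACDEFGHIKL", True),  # annotated fragment
]
for_alignment = filter_for_alignment(records, min_length=5)
for_tree = filter_for_tree(for_alignment)
print([r.name for r in for_alignment], [r.name for r in for_tree])
```

The greedy scheme keeps the first representative of each group of near-identical sequences, so the result depends on input order.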

The plots in Figs 2 and 3 were generated using in-house produced software to calculate residue conservation and intergroup differences, and to map information on substitutions, conserved regions and sequence motifs onto the membrane topology.

Acknowledgements

We thank Anders Bresell for early access to the GenomeLKPG database and Jan-Ove Järrhed for computer support. Financial support from Carl Tryggers Stiftelse för Vetenskaplig Forskning, Magnus Bergvalls Stiftelse, Stiftelsen Wenner-Grenska Samfundet, Karolinska Institutets Stiftelser and Linköping University is gratefully acknowledged.

References

1 Baltscheffsky M (1964) Some characteristics of the pyrophosphatase reaction in energy-generating systems. Abstracts 1st FEBS Meeting, p. 67. London.

2 Baltscheffsky H, von Stedingk L-V, Heldt HW & Klingenberg M (1966) Inorganic pyrophosphate: formation in bacterial photophosphorylation. Science 153, 1120–1122.

3 Moyle J, Mitchell R & Mitchell P (1972) Proton-translocating pyrophosphatase of Rhodospirillum rubrum. FEBS Lett 23, 233–236.

4 Baltscheffsky M, Nadanaciva S & Schultz A (1998) A pyrophosphate synthase gene: molecular cloning and sequencing of the cDNA encoding the inorganic pyrophosphate synthase from Rhodospirillum rubrum. Biochim Biophys Acta 1364, 301–306.

5 Baltscheffsky M, Schultz A & Baltscheffsky H (1999) H+-PPases: a tightly membrane-bound family. FEBS Lett 457, 527–533.

6 Baltscheffsky H, Schultz A, Persson B & Baltscheffsky M (2001) Tetra- and nonapeptidyl motifs in the origin and evolution of photosynthetic bioenergy conversion. In First Steps in the Origin of Life in the Universe (Chela-Flores J, Owen T & Raulin F, eds), pp. 173–178. Kluwer, Dordrecht.

7 Nakanishi Y, Saijo T, Wada Y & Maeshima M (2001) Mutagenic analysis of functional residues in putative substrate-binding site and acidic domains of vacuolar H+-pyrophosphatase. J Biol Chem 276, 7654–7660.

8 Eigen M & Schuster P (1978) The hypercycle. Naturwissenschaften 65, 341–369.

9 Trifonov EN (2000) Consensus temporal order of amino acids and evolution of the triplet code. Gene 261, 139–151.

10 Miller SL (1953) A production of amino acids under possible primitive earth conditions. Science 117, 528–529.
