calcium-binding protein?
Liliane Mouawad1, Adriana Isvoran2, Eric Quiniou1 and Constantin T. Craescu1
2 Department of Chemistry, West University of Timisoara, Romania
Calcium transport and/or regulation are important events for the normal morphology and metabolism of the cell and play significant roles in the mechanisms of many disease processes [1]. The proteins that interact with the calcium ions involved in these events are called calcium-binding proteins (CaBPs). They form two main subfamilies: the EF-hand CaBPs and the non-EF-hand CaBPs. EF-hand CaBPs, whose prototype is calmodulin [2], are characterized by the presence of structural motifs called 'EF-hands'. Non-EF-hand CaBPs do not use this structural motif to bind calcium; they may be found in the cytoplasm (such as C2 domain proteins) [3], in the extracellular medium [4] or associated with the membrane (such as annexins) [5].
For the EF-hand CaBPs, each EF-hand motif contains two helices connected by the calcium-binding loop, a highly conserved region that binds the metal ion. Many CaBPs exhibit two domains, each containing two EF-hand motifs; the N-terminal (helices A, B, C and D) and C-terminal (helices E, F, G and H) domains are connected by a linker region (Fig. 1).
Keywords
calcium-binding proteins; centrin; EF-hand;
hydrophobicity; predicted form
Correspondence
Curie-Recherche, Centre Universitaire Paris-Sud,
Bâtiment 112, 91405 Orsay Cedex, France
Fax: +33 1 69 07 53 27
Tel: +33 1 69 86 71 51
E-mail: liliane.mouawad@curie.u-psud.fr
(Received 8 September 2008, revised 8 December 2008, accepted 10 December 2008)
doi:10.1111/j.1742-4658.2008.06851.x
The EF-hand calcium-binding proteins may exist in either an extended or a compact conformation. This conformation is sometimes correlated with the function of the calcium-binding protein. For those proteins whose structure and function are known, calcium sensors are usually extended and calcium buffers compact; hence, there is interest in predicting the form of the protein starting from its sequence. In the present study, we used two different procedures: one that already exists in the literature, the sosuidumbbell algorithm, which is based mainly on the charges of the two EF-hand domains, and a novel procedure based on linker average hydrophilicity. The linker consists of the residues that connect the domains. The two procedures were tested on 17 known-structure calcium-binding proteins and then applied to 59 unknown-structure centrins. The sosuidumbbell algorithm yielded the correct conformations for only 15 of the known-structure proteins and predicted that all centrins should be in a closed form. The linker average hydrophilicity procedure discriminated well between all the extended and non-extended forms of the known-structure calcium-binding proteins, and its predictions for centrins agreed well with their phylogenetic classification. The linker average hydrophilicity criterion is a simple and powerful means to discriminate between extended and non-extended forms of calcium-binding proteins. What is remarkable is that only a few residues that constitute the linker (between 2 and 20 in our tested sample of proteins) are responsible for the form of the calcium-binding protein, showing that this form is mainly governed by short-range interactions.
Abbreviations
CaBP, calcium-binding protein; LAH, linker average hydrophilicity; PDB, Protein Data Bank.
EF-hand CaBPs are divided into two broad classes [6]: those that bind calcium to regulate its concentration (calcium-buffering and calcium-transporting proteins) and those that bind calcium to decode its signal (calcium-sensor proteins). The two functional classes also have different structural features: calcium-buffering and calcium-transporting proteins, such as parvalbumin [7] or the Nereis diversicolor sarcoplasmic calcium-binding protein [8], usually have a compact tertiary structure and are not conformationally sensitive to calcium binding, whereas calcium-sensor proteins, such as calmodulin [2] and troponin C [9], have extended tertiary structures and show important conformational changes upon calcium binding. In the extended form, the linker between the two domains may be structured as a straight helix, whereas, in the non-extended form, the linker is unstructured, leading to either a floppy conformation or a very compact one (Fig. 2) [10]. It is important to understand the physical reasons for these differences. This would provide tools to predict the form of the CaBPs from their sequences, and therefore indicate their biological function.
Recently, a protein classification tool, sosuidumbbell [11], was developed to predict the degree of compactness of proteins starting from their amino acid sequences. This tool is based on studies undertaken on all the monomers of the Protein Data Bank (PDB) [12], and not just CaBPs, indicating that the electrostatic repulsion between the domains is a dominant factor in the stabilization of the extended structures, in addition to the amphiphilic character of the central flexible region. By contrast, globular proteins are predicted to be stabilized by a hydrophobic core built by residues from the two domains. Using the sosuidumbbell algorithm, we analyzed 17 CaBPs with known 3D structures (Table 1). Fifteen of them were predicted in the correct form, but two structures were incorrectly predicted: human calmodulin-like protein (1GGZ) [13] and human centrin 2 (2GGM) [14], which are extended proteins, were predicted to be compact. These exceptions represent a non-negligible percentage (2 of 17, i.e. 12%) and they emphasize the need for a more detailed analysis of the sequence–structure relationship in the case of CaBPs.
In the present study, we have developed a novel procedure based on the linker average hydrophilicity (LAH), which we applied to our sample of 17 known-structure CaBPs and to unknown structures of centrins. Centrins, a subfamily of CaBPs, are essential components of microtubule-organizing centers in organisms ranging from algae and yeast to humans [15,16]. They are EF-hand calcium-binding proteins with a sequence similarity to calmodulin but distinct calcium-binding properties [15]. They were shown to be involved in centrosome duplication [17] and the contraction of centrin-based fiber systems [18] and to play a functional role in nuclear export pathways [19]. The Ca2+ dependence of the centrin interactions with their targets suggests that centrins play a regulatory role by activating or changing the conformation of various target proteins. Analyses of amino acid sequences of centrins from different organisms reveal at least four phylogenetic families and several phylogenetic subfamilies [20,21]. The centrins that we consider in the present study are listed in Table 2: (a) the Chlamydomonas reinhardtii-like family (CrCen-like), which contains centrins from the subfamilies of green algae and vertebrate isoforms Cen1 and Cen2; (b) the higher plants Arabidopsis-like family (AtCen-like); (c) the yeast Saccharomyces cerevisiae-like family (Cdc31-like), which contains mainly two subfamilies, fungal centrins and the vertebrate isoform Cen3; and (d) the Paramecium tetraurelia infraciliary lattice family (PtICL1-like), organized in ten subfamilies that contain 35 identified isoforms [22]. The 3D structure of the entire protein in complex with its target polypeptide is known for only two centrins: the human centrin HsCen2 (2GGM) [14] and the Saccharomyces cerevisiae centrin ScCdc31 (2DOQ) [23].

Fig. 1. Schematic representation of an EF-hand protein (labels in the figure: Loop I, Loop II, Loop III, Loop IV, Linker). Each EF-hand motif consists of two helices linked by a calcium loop (black dots represent calcium ions). Two motifs constitute one EF-hand domain. The N- and C-domains are connected by a linker (bold line).

Fig. 2. View of the 3D structures of two CaBPs: (A) the extended form of calmodulin (PDB code: 1CLL) and (B) the non-extended form of guanylate cyclase activating protein 2 (PDB code: 1JBA). The helices are in cyan, the β-sheets are in yellow and the linker is in red. The linker in 1CLL is structured, whereas in the non-extended form it is a loop.
The functional diversity of centrins should depend on their sequence and their Ca2+-binding properties. However, we may ask whether the global conformation or the conformational preference of individual centrin molecules also plays a role in target recognition and the plasticity of heteromolecular complexes. This idea is supported by the recent observation that yeast ScCdc31 bound to a ScSfi1 fragment shows a bent conformation [23], whereas human HsCen2 in complex with an XPC peptide is completely extended [14]. In the present study, we present a new and simple theoretical procedure for the global shape prediction of EF-hand proteins that allows us to analyze the possible shape diversity of the centrins presented in Table 2.
Table 1. Features of the known-structure CaBPs used in the present study, showing the name of the protein, its code in the PDB, its code in SwissProt, the experimental structure and the structure predicted by the sosuidumbbell algorithm (http://bp.nuap.nagoya-u.ac.jp/sosui/sosuidumbbell/dumbbell_submit.html). CIB, calcium- and integrin-binding protein; SCBP, sarcoplasmic calcium-binding protein.
Columns: Protein; PDB code; SwissProt code; Experimental structure; Structure predicted by the SOSUIDUMBBELL algorithm.

Table 2. Phylogenetic classification of centrins. All centrins considered in the present study (with known and unknown structures) are classified by families and subfamilies. The PDB codes of the known structures of fragments (*) or the entire protein are given.

Results and Discussion
Utilization of the SOSUIDUMBBELL algorithm
We first applied the sosuidumbbell algorithm (http://bp.nuap.nagoya-u.ac.jp/sosui/sosuidumbbell/dumbbell_submit.html) to all the CaBPs with known 3D structures (Table 1). In this algorithm, a structure is predicted to be extended if it obeys four criteria: (a) the absolute value of the net charge of the entire protein is higher than 20 (|Qprot| > 20); (b) the absolute net charge density (|Qprot|/N, where N is the total number of residues) is higher than 0.14 (dQ > 0.14); (c) there is a charge balance between the two domains (|QNQC| > 100); and (d) there is a high amphiphilicity at the center of the linker region and a high hydropathy at its termini [11]. Based on these four criteria, the results yielded 15 well-predicted structures and two incorrectly predicted ones. The latter are human calmodulin-like protein (1GGZ) and human centrin 2 (2GGM), the structures of which are extended but predicted as non-extended. The question therefore remained as to which of the four criteria is responsible for this misprediction. To address this question, we first examined the first two criteria. For this purpose, we calculated the absolute net charge and the charge density of the entire protein for all the investigated CaBPs (Table 3), with known and unknown structures (Tables 1 and 2). First, we followed exactly the procedure described by Uchikoga et al. [11], namely that histidine residues were considered as positively charged (although at the pH values corresponding to the great majority of the experiments, they are deprotonated) and the calcium ions that might bind to the protein were omitted. The results (Fig. 3A,B and Table 3) show that, as indicated above, only five known-structure proteins are predicted to be extended instead of the seven expected (1GGZ and 2GGM are mispredicted) and all centrins with unknown structures are predicted in a non-extended form. In a second step, the histidines were considered neutral (CaBPs usually contain very few His residues) and the Ca2+ ions were added, but the results were even worse (data not shown) because the net charge was diminished and therefore the structures were predicted to be even more compact. The first two criteria therefore appear to be responsible for the misprediction of the form of 1GGZ and 2GGM. Moreover, concerning centrins with unknown structures, some experimental results (C. T. Craescu & S. Miron, unpublished data) in addition to the phylogenetic classification indicate that at least the CrCen family proteins should be in an extended form, which is not the case in the prediction based on the first two criteria.
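The first two criteria amount to simple arithmetic on the sequence. The following is a minimal Python sketch of that arithmetic, not the sosuidumbbell implementation. It assumes unit charges (+1 for Lys and Arg, +1 for His only when `his_positive` is true, -1 for Asp and Glu) and ignores chain termini and bound calcium ions. The function names are hypothetical.

```python
def net_charge(seq, his_positive=True):
    """Net charge of a one-letter protein sequence under unit-charge assumptions."""
    positive = "KRH" if his_positive else "KR"  # His counted as +1 only if requested
    negative = "DE"
    plus = sum(seq.count(aa) for aa in positive)
    minus = sum(seq.count(aa) for aa in negative)
    return plus - minus


def passes_charge_criteria(seq, his_positive=True, q_min=20, density_min=0.14):
    """Check criteria (a) and (b): |Qprot| > 20 and |Qprot|/N > 0.14."""
    q_abs = abs(net_charge(seq, his_positive))
    density = q_abs / len(seq)  # N is the total number of residues
    return q_abs > q_min and density > density_min
```

Calling `passes_charge_criteria` twice, once with `his_positive=True` and once with `his_positive=False`, mirrors the two histidine treatments described above, although the second step in the text also added the calcium ions, which this sketch does not model.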
The last two criteria in the sosuidumbbell algorithm are strongly dependent on the definition of the domains and the inter-domain linker. The delimitation of this linker is not always obvious: in the extended structures, it forms a helix continuous with helices D and E, whereas, in some compact conformations, it is a very short unstructured region (Fig. 2). In the sosuidumbbell algorithm, the linker considered may be too long and, consequently, the domains too short, as for calmodulin, where helices D and E, which belong to the N- and C-domains, respectively, are considered as parts of the linker [11]. In the present study, to determine the linker, we first identified the calcium-binding loops (Fig. 1), then counted ten residues after loop II (corresponding to helix D) and ten residues before loop III (corresponding to helix E); the remaining residues in between were considered as the inter-domain linker. Ten residues were assigned to helices D and E because the experimental structural data show that a helix belonging to an EF-hand motif contains ten residues on average. Consequently, in the proteins investigated in the present study, the linker was between two and 20 residues long (Table 3), corresponding to 0.96% and 10.26%, respectively, of the protein sequence length.
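The linker delimitation described above can be sketched as follows. This is an illustrative Python fragment under stated assumptions: residue numbers are 1-based, the last residue of loop II and the first residue of loop III are supplied by the user, and the function name and arguments are hypothetical.

```python
def linker_bounds(loop2_end, loop3_start, helix_len=10):
    """Return (first, last) 1-based residue numbers of the inter-domain linker.

    Ten residues after loop II are assigned to helix D and ten residues
    before loop III to helix E; whatever remains in between is the linker.
    Returns None when no residue remains.
    """
    first = loop2_end + helix_len + 1
    last = loop3_start - helix_len - 1
    if first > last:
        return None
    return first, last


def linker_length(loop2_end, loop3_start, helix_len=10):
    """Number of linker residues (0 when the linker is empty)."""
    bounds = linker_bounds(loop2_end, loop3_start, helix_len)
    return 0 if bounds is None else bounds[1] - bounds[0] + 1
```

For instance, with purely illustrative loop positions 106 and 132, `linker_bounds(106, 132)` gives residues 117 to 121, a five-residue linker.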
Based on this definition of the linker, the charges of the N- and C-domains were calculated without considering the calcium ions. In Fig. 3C, we report the absolute value of the product of these charges, |QNQC|, which represents the charge balance between the domains. With the exception of troponins, all the investigated proteins are characterized by products |QNQC| lower than 100, and therefore are predicted to be non-extended.
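As an illustration of the third criterion, the sketch below computes |QNQC| for the two domains on either side of a given linker. It is a simplified Python example, assuming unit residue charges, no calcium ions, and that everything before the linker belongs to the N-domain and everything after it to the C-domain. The function names are hypothetical.

```python
def segment_charge(segment, his_positive=True):
    """Net unit charge of a sequence segment (Lys/Arg/His positive, Asp/Glu negative)."""
    positive = "KRH" if his_positive else "KR"
    plus = sum(segment.count(aa) for aa in positive)
    minus = sum(segment.count(aa) for aa in "DE")
    return plus - minus


def domain_charge_product(seq, linker_first, linker_last, his_positive=True):
    """Return |QN * QC| for a linker given by 1-based, inclusive residue numbers."""
    n_domain = seq[:linker_first - 1]  # residues before the linker
    c_domain = seq[linker_last:]       # residues after the linker
    q_n = segment_charge(n_domain, his_positive)
    q_c = segment_charge(c_domain, his_positive)
    return abs(q_n * q_c)
```

A product above 100 would satisfy criterion (c); the toy sequence `"D" * 12 + "AAA" + "E" * 10` with linker residues 13 to 15 gives 120.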
Table 3. Recoverable column headings: QN, QC, Qlink, Qprot, |QNQC|, dQ (|Qprot|/N), linker length, n.

From these results, it is clear that, for CaBPs, the charges of the entire protein or of the separate domains are not responsible for the extended or compact form of the protein. This is obvious in the case of human centrin 2 (HsCen2). In this protein, the first 25 amino acids, corresponding to a disordered region, are highly charged [24,25], with the net charge of this peptide being equal to +6 (it contains seven basic residues and one acidic residue). The X-ray structure of this protein was obtained in the presence [14] and in the absence [25] of these residues (PDB codes 2GGM and 2OBH, respectively). In both cases, HsCen2 adopts an extended conformation, showing that the charge balance of the domains does not play an important role for this protein. Nevertheless, in both cases, the sosuidumbbell algorithm predicts a non-extended form, which is not correct. Moreover, the structures of all the extended forms of the CaBPs considered in the present study were determined experimentally in the presence of calcium ions. Because these ions significantly reduce the charges of the domains, and therefore their electrostatic repulsion, calcium binding should favor the compact structure of CaBPs if the domain charges governed the form, which is not the case.
The fourth criterion of the sosuidumbbell tool refers to the hydrophobicity of the central linker region, which is calculated using the Kyte & Doolittle scale [26]. Uchikoga et al. [11] described the linker region of an extended protein as having a strongly negative hydrophobicity at its center (i.e. being significantly hydrophilic), whereas its edges (helices D and E) are hydrophobic. In the present study, the same calculations were applied to all known-structure proteins, and it was observed that, in some cases, non-extended proteins (e.g. recoverin; 1REC) present the same hydropathy profile around the linker as extended proteins, such as calmodulin or troponin C (1OSA and 4TNC; Fig. 3D). Therefore, none of the criteria retained in the sosuidumbbell algorithm is completely reliable for predicting the form of the CaBPs. This motivated our search for other criteria.
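A hydropathy profile of the kind compared here can be sketched with a sliding-window average over the Kyte & Doolittle scale. The Python example below is only illustrative: the window size is not stated in the text, so the default of 7 is an assumption, as is the choice to shorten the window at the sequence ends. The function name is hypothetical.

```python
# Kyte & Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=7):
    """Window-averaged Kyte & Doolittle hydropathy for each residue of seq.

    The window is centered on each residue and truncated at the sequence
    ends (an assumption of this sketch).
    """
    half = window // 2
    values = [KYTE_DOOLITTLE[aa] for aa in seq]
    profile = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        profile.append(sum(values[lo:hi]) / (hi - lo))
    return profile
```

Slicing the returned list around the linker, for example from helix D to helix E, gives the kind of curve that is centered on the linker for comparison between proteins.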
Utilization of other criteria
Contact area
We analyzed the contact area between the domains of known-structure non-extended CaBPs. As expected, most of the residues at the interface were found to be hydrophobic. In most compact structures, a tryptophan (or, less frequently, a phenylalanine) located in one domain was buried in a hydrophobic cavity in the other domain, which would stabilize the compact structure. Unfortunately, this observation cannot be used as a predictive tool starting from the sequence because the aromatic residue is not located in a specific part of it. Indeed, the sequence of the linker and its close vicinity (three more residues on each side of the linker) does not always contain tryptophan or phenylalanine residues for compact forms (see 1REC, 1JBA, 1BJF and 2SCP in Table 3).
Fig. 3. (A) The absolute net charge (|Qprot|) plotted against protein number for the known extended structures (filled circles), the known non-extended structures (open diamonds) and the unknown structures of centrins (filled triangles). It can be seen that two extended structures are mispredicted (1GGZ and 2GGM) and that all the unknown-structure centrins are predicted to be non-extended. (B) The absolute net charge density (dQ) with a horizontal line limit at 0.14. (C) The absolute value of the product of the two domain charges (|QNQC|) in the absence of calcium ions with a horizontal line limit at 100. In this case, only troponin C molecules are predicted to be extended. (D) The hydrophobicity profile of the linker region and its surroundings (helix D, linker, helix E) using the Kyte & Doolittle scale for two extended structures (dotted line, 1OSA; dashed line, 4TNC) and for a non-extended one (solid line, 1REC), plotted against relative residue number. For convenience of comparison, the three sequences were renumbered and centered on the linker. The zero point corresponds to residue number 92 in 4TNC, 81 in 1OSA and 98 in 1REC, which represents the center of the linker in each case.

The presence of helix breakers
Prolines and, to a lesser extent, glycines are well-known helix breakers. We investigated the presence of such residues in the linker or its vicinity (i.e. the linker plus three residues on each side). The results presented in Table 3 show that, as expected, the presence of a Pro yields a non-extended form by breaking the central helix that constitutes the linker, but the reverse is not true because not all compact CaBPs contain a Pro in the linker. Therefore, this criterion cannot constitute a predictive rule. Moreover, concerning glycines, it was observed that both troponin C proteins (4TNC and 1TN4), which are extended, have one Gly in the linker, as do bovine recoverin (1REC), guanylate cyclase activating protein 2 (1JBA) and bovine neurocalcin δ (1BJF), which present very compact structures.
Net electric charge of the linker
It might be assumed that the net electric charge of the linker plays a role if there is repulsion between this linker and the adjacent domains. This property was therefore investigated (Table 3), but it did not yield a good discriminating criterion: in HsCen2 (2GGM), which is extended, the linker is neutral, as it is in bovine neurocalcin δ (1BJF) and amphioxus sarcoplasmic calcium-binding protein (2SAS), which are non-extended structures.
Hydrophilicity of the linker
The criterion that yielded the best results was based on the hydrophilicity of the linker. It was obtained by the procedure detailed below. First, the hydrophilicity (h_i) of each residue i of the protein was calculated using the Hopp & Woods scale [27] with a nine-residue sliding window. In this scale, positive values correspond to hydrophilic positions.
Second, the linker was determined as described above: if the last residue of calcium-binding loop II is denoted J and the first residue of calcium-binding loop III is denoted K, the linker consists of all residues in the open interval ]J + 10, K − 10[.
Finally, the LAH was calculated:

\[ \mathrm{LAH} = \frac{1}{n} \sum_{i \,\in\, ]J+10,\; K-10[} h_i \]

where n is the number of residues in the linker and h_i is the hydrophilicity at position i of the linker.
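The three steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' program: the scale values are those of Hopp & Woods, the nine-residue window is centered on each residue and truncated at the sequence ends (an assumption), and the loop positions J and K must be supplied by the user. The function names are hypothetical.

```python
# Hopp & Woods hydrophilicity values (positive = hydrophilic).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3,
    "N": 0.2, "Q": 0.2, "G": 0.0, "P": 0.0, "T": -0.4,
    "A": -0.5, "H": -0.5, "C": -1.0, "M": -1.3, "V": -1.5,
    "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def hydrophilicity_profile(seq, window=9):
    """Step 1: window-averaged hydrophilicity h_i for every residue."""
    half = window // 2
    values = [HOPP_WOODS[aa] for aa in seq]
    profile = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        profile.append(sum(values[lo:hi]) / (hi - lo))
    return profile


def linker_average_hydrophilicity(seq, loop2_end, loop3_start, window=9, helix_len=10):
    """Steps 2 and 3: average h_i over residues J+11 .. K-11 (1-based)."""
    first = loop2_end + helix_len + 1    # first residue inside ]J+10, K-10[
    last = loop3_start - helix_len - 1   # last residue inside ]J+10, K-10[
    if first > last:
        raise ValueError("empty linker for the given loop positions")
    h = hydrophilicity_profile(seq, window)
    linker = h[first - 1:last]
    return sum(linker) / len(linker)
```

Setting `window=1` gives the raw, unsmoothed hydrophilicity of the linker residues only.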
This procedure was applied to all proteins in Tables 1 and 2. The results are presented in Fig. 4. Remarkably, the LAH values discriminated well between the extended and non-extended forms of the known structures of the CaBPs, with two distinct sets of points: LAH was greater than 1.6 for the extended forms and lower than 1.2 for the others. Therefore, the midpoint value of 1.4 was taken as the threshold above which a two-domain EF-hand protein is predicted to be extended. Moreover, one of the reviewers of the present study suggested the case of calcineurin B-like protein 2 from Arabidopsis (SwissProt code: Q8LAS7, PDB code: 1UHN), which we had omitted from our sample. The protein consists of 226 residues and the linker of five residues (residues 117–121). The calculated LAH value is 0.2978, predicting a compact structure, in good agreement with the 3D structure of the protein. For centrins with unknown structures, it can be seen that the LAH values reflect the phylogenetic classification well, although this classification is based on the entire sequence, whereas LAH is based on only a few residues in the linker region.
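The resulting decision rule is a single comparison against the threshold. A minimal Python sketch follows, with a hypothetical function name; treating a value exactly equal to 1.4 as non-extended is an assumption, since no such value is discussed in the text.

```python
LAH_THRESHOLD = 1.4  # midpoint between the two observed sets (> 1.6 and < 1.2)


def predict_form(lah_value, threshold=LAH_THRESHOLD):
    """Predict 'extended' when LAH is above the threshold, else 'non-extended'."""
    return "extended" if lah_value > threshold else "non-extended"
```

With the value quoted for 1UHN, `predict_form(0.2978)` returns `"non-extended"`.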
To determine whether the discriminating power of the linker average hydrophilicity is fortuitous, LAH values were plotted against the radius of gyration of the known structures (Fig. 5). A clear correlation is observed between these two features, with a correlation coefficient equal to 0.82 and a Student coefficient of 36.98 (for 16 degrees of freedom that correspond to 17 points), indicating that the probability of this correlation being random is < 0.001. The LAH algorithm is available at: http://u759.curie.u-psud.fr/modelisation/LAH
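For readers who wish to check the significance of a correlation coefficient themselves, the textbook test for a Pearson coefficient r over n points uses t = r·sqrt(n − 2)/sqrt(1 − r²) with n − 2 degrees of freedom. The Python sketch below evaluates that formula; it is an assumption that this test is comparable to the one used above, and the Student coefficient quoted in the text may follow a different convention, so the sketch is not an attempt to reproduce it.

```python
import math


def correlation_t_statistic(r, n):
    """Textbook t statistic for a Pearson correlation r over n points (n - 2 d.f.)."""
    if n <= 2 or abs(r) >= 1:
        raise ValueError("need n > 2 and |r| < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


t_value = correlation_t_statistic(0.82, 17)  # about 5.55 with 15 degrees of freedom
```

A t value of this size at 15 degrees of freedom also corresponds to a probability below 0.001, in line with the conclusion stated above.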
Fig. 4. The LAH for the investigated proteins, plotted against protein number. The horizontal line marks the threshold above which structures are predicted to be extended (LAH > 1.4). The symbols distinguish the known extended structures (filled circles), the known non-extended structures (open diamonds) and the unknown structures of centrins (filled triangles). For the unknown-structure centrins, we indicate the phylogenetic subfamilies (labels in the plot: Cen2, AtCen, ICL11, ICL1e, ICL5, ICL10, Cdc31, ICL8, ICL9).

The predictive power of the present method depends on the determination of the linker limits, which must be defined objectively. To find such a definition, several delimitations were tested, including the one used in the sosuidumbbell tool. We observed that considering long linkers, which overlap adjacent helices, does not allow us to discriminate between the different forms of CaBPs because the results were contaminated by the nature of the extra residues, whereas the shortest possible linkers provided the most reliable way to discriminate between the extended and compact forms. However, it must be noted that the influence of four neighboring residues at both ends of the linker is taken into account indirectly because of the nine-residue window used in the calculations of hydrophilicity. Raw hydrophilicity data (equivalent to a one-residue window) were also tested to check the importance of this influence. The results were qualitatively similar to those obtained with the nine-residue window with respect to the prediction of the form of the protein, but the correlation between LAH and the radius of gyration was less evident.
Moreover, this discrimination was possible when calculating LAH with the Hopp & Woods scale for hydropathy. Three other scales were tested (Kyte & Doolittle [26], Miyazawa & Jernigan [28] and Janin [29]) but did not provide satisfactory results. This is mainly due to the scores attributed to the Asn, Gln and Trp residues, which are considered to be much more hydrophilic in these scales than in the Hopp & Woods scale.
Applying the LAH method to centrins showed that the CrCen-like proteins are predicted to be extended, which is in good agreement with the known structure of one member of this family, HsCen2 [14,25]. The Cdc31-like family is predicted to be in the non-extended form, which is also in good agreement with the known structure of ScCdc31 [23]. There is no experimental information about the other centrins, but we predict that members of the AtCen family are in an extended form, similar to the CrCen family, and that the PtICL family is divided into two sets: the extended proteins (ICL1a, ICL3a and ICL11 subfamilies) and the non-extended ones (ICL1e, ICL3b, ICL5, ICL7, ICL8, ICL9 and ICL10 subfamilies).
Conclusions
The results obtained in the present study indicate that the extended and compact forms of EF-hand proteins do not necessarily depend on the electric charge of the domains, but are mainly determined by the hydrophilicity (as determined by the Hopp & Woods scale) of the residues that link the two domains. The definition of the linker is very important and should not include residues from the adjacent helices. What is remarkable is that, once the linker is defined objectively, the nature of its residues appears to determine the form of the CaBP, whatever the length of this linker; it can be as long as 20 residues, as in calcineurin B homologous protein 1, 2CT9 (representing approximately 10% of the protein length; Table 3), or as short as two residues, as in P. tetraurelia infraciliary lattice centrin 9, PtICL9 (< 1% of the protein length). However, the length of the linker in the set of proteins considered in the present study is approximately five residues on average, which is rather short. This indicates that the form of CaBPs is likely governed by short-distance interactions.
Experimental procedures
Seventeen CaBPs with known structures, two of them centrins, in addition to 59 centrins with unknown structures, were considered in the present study.
Choice of the proteins
CaBPs with known structures were taken from the PDB [12]. Only proteins containing four EF-hand motifs were considered. The chosen structures had to satisfy several criteria.
First, the proteins had to be in their unbound state (i.e. not in complex with their target peptides, because peptide binding may cause conformational changes of the entire protein). There were, however, two exceptions: human centrin 2 (2GGM) and yeast centrin (2DOQ), in which the peptide interacts with only one domain (the C-domain) and therefore does not modify the relative position of the two domains. In addition, these two structures were the only ones available in the PDB for this family of proteins.
Second, the EF-hand proteins that had an extended structure resolved by NMR were discarded because they did not provide enough information concerning the relative positions of their domains.
Fig. 5. The radius of gyration of the known-structure CaBPs versus their LAH. The straight line shows the linear fit of the points. The correlation coefficient is 0.82.