androgen-dependent protein secreted from rat seminal vesicle
Silvia Vilasi and Raffaele Ragone
Dipartimento di Biochimica e Biofisica, Naples, Italy
The view that a protein must fold into the correct shape, as encoded in the amino acid sequence, before it can function has been deeply rooted in protein science, even before the three-dimensional structure of a protein was first solved. However, for some proteins, especially those involved in signalling and regulation [1], the unstructured state has been suggested to be essential for basic cellular functions and recognized as a separate functional and structural category [2,3]. These are proteins or domains that, in their native state, are either completely disordered or contain large disordered regions, and therefore do not fit the standard sequence–structure–function paradigm, because intrinsic disorder, whether local or extended to the entire protein length, is crucially important for their function. Dunker and Obradovic [4] categorized functional intrinsically disordered regions into molten globule-like and random coil-like structural forms, and Uversky [5] suggested the existence of an additional pre-molten globule form, whose peculiarity is the presence of unstable secondary structure. Betraying still imperfect categorization, these systems are currently classified as ‘intrinsically disordered proteins’ (IDPs), but the use of other synonymous expressions, such as ‘intrinsically unstructured proteins’, is widespread in the literature [6]. More than 100 such proteins are known, including Tau, prions, Bcl-2, p53, 4E-BP1 and eIF1A [5,7].
Keywords
bioinformatics; disorder prediction; intrinsically disordered proteins; seminal vesicle protein no. 4; structure–function relationship
Correspondence
R. Ragone, Dipartimento di Biochimica
e Biofisica, Seconda Università di Napoli,
via S. Maria di Costantinopoli 16,
80138 Naples, Italy
Fax: +39 081 294136
Tel: +39 081 294042
E-mail: raffrag@tiscali.it;
raffaele.ragone@unina2.it
(Received 30 October 2007, revised 5
December 2007, accepted 13 December
2007)
doi:10.1111/j.1742-4658.2007.06242.x
The potent immunomodulatory, anti-inflammatory and procoagulant properties of protein no. 4 secreted from the rat seminal vesicle epithelium (SV-IV) have previously been found to be modulated by a supramolecular monomer–trimer equilibrium. More structural details that integrate experimental data into a predictive framework have recently been reported. Unfortunately, homology modelling and fold-recognition strategies were not successful in creating a theoretical model of the structural organization of SV-IV. It was inferred that the global structure of SV-IV is not similar to that of any protein of known three-dimensional structure. Reversing the classical approach to the sequence–structure–function paradigm, in this paper we report novel information obtained by comparing the physicochemical parameters of SV-IV with two datasets composed of intrinsically unfolded and ideally globular proteins. In addition, we analyse the SV-IV sequence by several publicly available disorder-oriented predictors. Overall, disorder predictions and a re-examination of existing experimental data strongly suggest that SV-IV needs large plasticity to efficiently interact with the different targets that characterize its multifaceted biological function, and should therefore be better classified as an intrinsically disordered protein.
Abbreviations
HCA, hydrophobic cluster analysis; IDPs, intrinsically disordered proteins; PDB, protein data bank; SV-IV, rat seminal vesicle protein no. 4; SVM, support vector machine.
Of the proteins studied in our laboratory, SV-IV (seminal vesicle protein no. 4, so identified according to its electrophoretic mobility in SDS-PAGE; precursor SWISS-PROT ID, SVP2_RAT) is a basic (pI = 8.9), thermostable protein of 90 residues (Mr = 9758) secreted from the rat seminal vesicle epithelium under strict androgen transcriptional control, which has been found to possess potent non-species-specific immunomodulatory, anti-inflammatory and procoagulant properties [8]. It has been purified to homogeneity and characterized extensively [8–10]. It is encoded by a gene that has been isolated, sequenced and expressed in Escherichia coli [11–14]. On the basis of its biological and biochemical characteristics, SV-IV appears to be a molecule of obvious pharmacological interest. SV-IV-immunorelated proteins have been discovered in several rat tissues, as well as in human seminal fluid and seminal vesicle secretion [13,14]. The segment 3–41 of SV-IV has been found to have a high amino acid sequence similarity with the C-terminal segment 34–66 of uteroglobin, a secreted protein from rabbit displaying phospholipase A2 inhibitory activity in vitro and anti-inflammatory effects in vivo [15,16]. Others have also been able to prepare potent anti-inflammatory peptides from the region of highest similarity between uteroglobin and lipocortin I, a protein that has been suggested to mediate the anti-inflammatory effects of glucocorticoids [17]. It is therefore highly desirable to obtain as complete structural information as possible.
From a structural standpoint, early circular dichroism and fluorescence polarization data indicated scarce structural organization [18]. This agreed with a predictor of local flexibility [19], although other predictive algorithms have, by contrast, suggested either the presence [18] or lack [20] of an appreciable amount of secondary structure. Recently, it has been found that, in the range of physiological concentrations (2–48 µM [20,21]), the peculiar biological properties of SV-IV are probably modulated by a supramolecular equilibrium in which a trimeric form competes with monomeric protein for binding to a large variety of SV-IV targets [20]. Eventually, Caporale et al. [22] found agreement between the amounts of predicted and experimental helical structure present in the monomeric form (20 and 24%, respectively), and attempted to create a theoretical model of the structural organization of SV-IV. However, on noting that homology modelling and fold-recognition strategies were not able to provide detailed structural information, they concluded that ‘SV-IV assumes a global structure that is not similar to any protein of known three-dimensional structure’ [22]. Indeed, such an occurrence suggests that SV-IV could violate the standard sequence–structure–function paradigm, but the authors did not investigate this possibility.
We have verified that, in terms of disorder- and order-promoting amino acid subsets [23,24], the composition of SV-IV does not strictly conform to trends previously found to occur in IDPs, except for a very high content of serine (24%). Furthermore, a search of the DisProt database [25] did not return any hits for SV-IV, indicating that no DisProt sequence resembles this protein. However, novel information obtained by publicly available disorder-oriented predictors emphasizes that the functional state of SV-IV lacks significant structural organization. This evidence is sufficient to confidently state that SV-IV can be classified amongst IDPs. Incidentally, the present work also confirms that homology modelling and fold-recognition strategies are best suited to obtain information on the architecture of ordered proteins, but the study of IDPs as if they were ordered can prove to be highly frustrating. Thus, when dealing with proteins of uncertain three-dimensional structure, it would be more correct and less time-consuming to look for disorder before attempting modelling procedures.
Results
Survey of existing structural information
In addition to fluorescence polarization and both far- and near-UV circular dichroism data from our laboratory [18,20,22], experimental evidence that regular structure is scarce in SV-IV comes from SDS-PAGE, which is routinely used to assess the Mr values of proteins. Because of their unusual amino acid composition, IDPs bind less SDS than usual and their apparent Mr value is often 1.2–1.8 times higher than the real value calculated from sequence data or measured by mass spectrometry [7]. Indeed, the mobility of SV-IV in SDS-PAGE is compatible with an Mr value of about 15 000–18 000 [9], which can be compared with an Mr value of 9758 calculated from the sequence. Size-exclusion chromatography also indicates that the hydrodynamic radius of SV-IV resembles that of an IDP [7], because purified SV-IV elutes well behind chymotrypsinogen (Mr = 25 600) and slightly ahead of RNase A (Mr = 13 600) [9]. Finally, digestion of SV-IV with trypsin suggests that all but Lys80 of the potential proteolytic sites represented by nine lysine and seven arginine residues are able to efficiently interact with the catalytic site of the enzyme [22], as expected for an IDP-like polypeptide [7]. This piece of information has prompted us to perform predictive analyses aimed at clarifying whether or not the SV-IV sequence is compatible with the classical sequence–structure–function paradigm.
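As a rough check of the SDS-PAGE argument, the ratio between apparent and sequence-derived Mr can be computed directly from the figures quoted above. The sketch below is illustrative only; the function and constant names are not part of the original work.

```python
# Values quoted in the text: Mr from the sequence and apparent Mr from SDS-PAGE [9].
MR_SEQUENCE = 9758
MR_SDS_PAGE = (15_000, 18_000)


def apparent_to_real_ratio(apparent_mr: float, real_mr: float) -> float:
    """Return the ratio of the apparent (gel) Mr to the sequence-derived Mr."""
    return apparent_mr / real_mr


# Ratios for the lower and upper bounds of the apparent Mr range.
ratios = [apparent_to_real_ratio(m, MR_SEQUENCE) for m in MR_SDS_PAGE]
print([round(r, 2) for r in ratios])
```

The two ratios come out at about 1.54 and 1.84, which lies within, or marginally above, the 1.2–1.8 interval reported for IDPs [7].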
Analysis of physicochemical parameters
It has recently emerged that protein disorder tends to be related to general chemical properties, rather than to the abundance or scarcity of specific amino acids [26]. Indeed, like early analyses of protein disorder that were based on the reasoning that protein folding is governed by a balance between hydrophobic forces (attractive) and electrostatic forces between similarly charged residues (repulsive) [23], disorder-oriented predictors largely use physicochemical parameters, such as hydrophobicity [24,27–33], the absolute value of the net charge [24,27–29,33], C-α B-factors [24,27–29,32,34] and number of contacts [35–38]. Accordingly, we obtained preliminary information on the structural preference of SV-IV by comparing values per residue of these parameters with those of two protein databases composed of ideally globular [35] and natively unfolded [39] proteins, respectively. Visual inspection of two-dimensional plots obtained by considering all possible combinations of two parameters suggests that SV-IV has a strong preference to conform to the general structural features expected for IDPs, because in no case do SV-IV data points fall in regions populated by ordered proteins (Fig. 1).
General prediction analysis
Owing to increased interest in the structure–function relationships of IDPs, disorder-related literature is increasing, as witnessed by several recent reviews [40–43]. To obtain reliable predictions, two general options are presently available: (a) the combined use of ab initio algorithms, such as a recent scheme based on well-known predictors [23]; or (b) recent programs with improved performance on some benchmarks, such as those based on expected packing density [36–38] or support vector machine (SVM) methods [44–46] (see Materials and methods for further details). However, as the SV-IV sequence comprises amino acid subsets different from those previously found to occur in IDPs [23,24] and does not resemble any known sequence included in the DisProt database [25], it may be valuable to proceed with caution and investigate both options.
The first procedure comprises a preliminary search for low-complexity regions through the seg algorithm [47], followed by a thorough analysis benefiting from the combined use of several ab initio methods, such as pondr (VSL1 and VL-XT) [24,27–29], hydrophobic cluster analysis (hca) [30], prelink [31], globplot [32], disembl [34], ronn [48], iupred [49], disopred2 [50] and norsp [51]. When applied to SV-IV, seg resulted in a long non-globular region spanning the entire sequence, except for a few amino acids in the N- and C-termini (amino acids 1–4 and 84–90, respectively). Other structural peculiarities, such as disulfide-forming cysteine residues, zinc fingers and leucine zippers [52], are absent from the SV-IV sequence. On the functional side, SV-IV is predicted to be a metal binding protein [53], but the expected probability of correct classification is about 60%, which is lower than the actual classification accuracy based on the analysis of 9932 positive and 45 999 negative samples of proteins [54]. The vast majority of the other methods also converged to indicate an abundance of intrinsic disorder in SV-IV, except for a few amino acids in the C-terminal region. In particular, hydrophobic clusters, which are typical of secondary structure elements, were almost totally absent from the hca plot, and prelink predicted the whole sequence as disordered. By contrast, some regular structure was predicted by X-ray-based algorithms, such as various disembl routines and disopred2 (segments 31–39, 49–59 and 77–90), and discrepancies also affected globplot analyses, depending on the particular order–disorder propensity set chosen to obtain predictions, but in no cases were potential globular domains predicted. When subjected to norsp, the SV-IV protein did not appear to conform to criteria fixed for identifying non-regular secondary structure (NORS) regions, although about 70% of residues were predicted to be in loopy regions. We suspect that no NORS region can be predicted in SV-IV because the recommended length of the sequence window used to calculate the structural content (70 amino acids) is close to the protein length (90 amino acids). Finally, a vanishingly small probability of coiled-coil regions was also predicted by multicoil [55] and coils [56] algorithms (not shown). The above results are summarized in Fig. 2.
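The remark about the norsp window can be made concrete with a small sketch. It applies only the secondary-structure part of the NORS criterion (regular structure below 12% over 70 consecutive residues) to a predicted secondary-structure string; the solvent-exposure condition is deliberately ignored, and the function name, the H/E/L alphabet and the input strings are assumptions made for illustration, not the norsp implementation.

```python
from typing import List


def nors_windows(ss: str, window: int = 70, max_structured: float = 0.12) -> List[int]:
    """Return 1-based start positions of windows whose helix (H) plus strand (E)
    content is below max_structured. Any other character counts as non-regular."""
    hits = []
    for start in range(len(ss) - window + 1):
        segment = ss[start:start + window]
        structured = (segment.count("H") + segment.count("E")) / window
        if structured < max_structured:
            hits.append(start + 1)
    return hits


# A 90-residue protein offers only 90 - 70 + 1 = 21 candidate windows.
all_loop = "L" * 90                      # hypothetical, fully non-regular prediction
print(len(nors_windows(all_loop)))       # number of qualifying windows

# Hypothetical prediction with a 20-residue helix at the N-terminus.
with_helix = "H" * 20 + "L" * 70
print(nors_windows(with_helix))
```

With so few windows available, a modest amount of predicted regular structure near either terminus is enough to disqualify most of them, which is in line with the explanation suggested above.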
Another set of predictions was performed using algorithms that have been reported to predict protein disorder more accurately than other methods, namely the foldunfold predictor [36–38] and the SVM-based poodle suite [44–46]. According to foldunfold, SV-IV is probably fully disordered, because the average value of the disorder parameter over its sequence is less than the disorder threshold. Moreover, the average value of the disorder parameter over regions 1–34, 36–57 and 59–80 is less than the disorder threshold and the regions are longer than the reliable frame (11 residues), which means that these regions are predicted as fully disordered (Fig. 3A). Similarly, poodle predictions suggest that: (a) the entire SV-IV sequence corresponds to a long disorder region (poodle-l); (b) a few residues (amino acids 39–40 and 85–90) do not belong to short disorder regions (poodle-s); and (c) disorder characterizes the whole protein because of the high disorder propensity of all residues (poodle-w) (Fig. 3B).
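The logic of a threshold-and-frame decision of the kind described for foldunfold can be sketched as follows. This is a minimal illustration, not the foldunfold code: the per-residue profile, the threshold value and the smoothing scheme are hypothetical, and only the frame length of 11 residues is taken from the text.

```python
from typing import List, Tuple


def window_average(profile: List[float], window: int) -> List[float]:
    """Centred moving average; windows are truncated at the sequence ends."""
    half = window // 2
    out = []
    for i in range(len(profile)):
        lo, hi = max(0, i - half), min(len(profile), i + half + 1)
        out.append(sum(profile[lo:hi]) / (hi - lo))
    return out


def regions_below(smoothed: List[float], threshold: float, min_len: int) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs that stay below the threshold
    and are longer than min_len residues."""
    regions, start = [], None
    # A sentinel value above any threshold closes a run that reaches the C-terminus.
    for i, value in enumerate(smoothed + [float("inf")]):
        if value < threshold and start is None:
            start = i
        elif value >= threshold and start is not None:
            if i - start > min_len:
                regions.append((start + 1, i))
            start = None
    return regions


# Hypothetical per-residue profile and threshold, for illustration only.
profile = [18.0] * 15 + [22.0] * 10
smoothed = window_average(profile, window=1)   # window=1 leaves the profile unchanged
print(regions_below(smoothed, threshold=20.0, min_len=11))
```

A run is reported only when it is both below the threshold and longer than the frame, which mirrors the two conditions stated for regions 1–34, 36–57 and 59–80.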
Other predictions
To complete our analysis, we verified whether or not SV-IV possesses biased amino acid composition and can be maximally separated from globular proteins. Both features have been found to occur in IDPs. On the first point, Weathers et al. [26,57] have recently examined the contribution of various vectors to recognizing proteins that contain disordered regions through an SVM trained on naturally occurring disordered and ordered proteins. They found that high recognition accuracy can be obtained by an SVM that incorporates only amino acid composition, and very good recognition accuracy was retained using reduced sets of amino acids based on chemical similarity. Overall, this suggests that composition alone and general physicochemical properties, rather than specific amino acids, are sufficient to accurately recognize disorder. We applied the SVM method to compare the SV-IV sequence with the primary structures of 80 ideally folded and 90 natively unfolded proteins. Fig. 4A shows the mean values of the disorder score for all of these proteins. Although the regions covered by the two protein datasets overlap to some extent, the SV-IV datum clearly belongs to the region populated by natively unfolded proteins. With regard to the second point, other authors [35] have devised an optimal set of artificial parameters for 20 amino acid residues by a Monte Carlo algorithm, by which they have obtained maximal separation between sets of natively unfolded and ideally globular proteins. Following the same rationale as above, we compared the mean value of the artificial parameter for SV-IV and the two sets of proteins. Even in this case, the SV-IV datum unequivocally falls amongst natively unfolded proteins, whose data points are well separated from those of globular proteins (Fig. 4B). Finally, Fig. 4C summarizes the results obtained by other algorithms, such as dispro [58], some additional methods not included in the pondr package developed by Dunker et al. [59,60], and drippred [61]. All of these algorithms agreed in predicting that 100% of the amino acids in the SV-IV sequence are disordered, except drippred, which resulted in 32% of residues scoring as regular structure.

Fig. 1. Two-dimensional plots. The SV-IV datum (red symbol) is compared with the two sets of 90 natively unfolded and 80 ideally globular proteins (black and grey symbols, respectively) using the mean values of physicochemical parameters computed from the sequence. (A) Number of contacts versus hydrophobicity. (B) Number of contacts versus net charge. (C) Number of contacts versus C-α B-factors. (D) Net charge versus hydrophobicity. (E) Net charge versus C-α B-factors. (F) Hydrophobicity versus C-α B-factors.

Fig. 2. Analysis of the SV-IV sequence using well-known predictors. The original graphic output of each method and the corresponding interpretation are shown. In HCA, the protein sequence is shown on a duplicated α-helical net with hydrophobic clusters identified by solid contours and amino acid numbers indicated on the top. Special symbols denote proline, glycine, threonine and serine.

Fig. 3. Analysis of the SV-IV sequence using improved performance programs. Graphic output of FOLDUNFOLD [36–38] (A) and POODLE [44–46] (B) predictors, plotted against residue position. Panel annotations: POODLE-S, amino acids 39–40 and 85–90 have borderline disorder (probability very close to 0.5) and the remaining regions are predicted as disordered; FOLDUNFOLD, POODLE-L and POODLE-W, the whole protein is predicted as disordered.
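The composition-only representation discussed in this section can be illustrated with a short sketch that turns a sequence into a 20-component composition vector and into a coarser vector over a reduced alphabet. The grouping below is a hypothetical example of a chemistry-based reduction, not the sets used by Weathers et al. [26,57], and the toy sequence is not SV-IV.

```python
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical reduced alphabet: residues grouped by broad chemical similarity.
REDUCED_GROUPS: Dict[str, str] = {
    "hydrophobic": "AVLIMC",
    "aromatic": "FWY",
    "polar": "STNQ",
    "positive": "KRH",
    "negative": "DE",
    "special": "GP",
}


def composition(seq: str) -> List[float]:
    """Fractional composition over the 20 standard amino acids, in AMINO_ACIDS order."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def reduced_composition(seq: str) -> Dict[str, float]:
    """Fractional composition over the reduced alphabet defined above."""
    counts = Counter(seq)
    return {
        name: sum(counts[a] for a in members) / len(seq)
        for name, members in REDUCED_GROUPS.items()
    }


toy = "MKSSRDEAAK"  # hypothetical sequence, for illustration only
print(composition(toy))
print(reduced_composition(toy))
```

Either vector could serve as the input of a classifier; the choice of classifier and its training are outside the scope of this sketch.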
Discussion
The structural information re-examined here indicates that intrinsic disorder is abundant in SV-IV. Thus, it was to be expected that homology modelling and fold-recognition strategies would be unable to create a theoretical model of the structural organization of SV-IV [22]. Indeed, we have used several disorder predictors to obtain novel evidence that the odd behaviour of SV-IV is not compatible with the classical sequence–structure–function paradigm. Our predictions suggest that: (a) the entire SV-IV sequence does not encode any region with globular organization; (b) a few isolated segments (mostly the C-terminal region) may possess some regular structure; (c) the prediction of regular structure almost exclusively comes from methods based on Protein Data Bank (PDB) missing coordinates (disembl routines, disopred2 and drippred) and secondary structure-derived propensities (globplot with Deleage–Roux and Russell–Linding parameters); and (d) the mean physicochemical properties of SV-IV are typical of IDPs, as suggested by methods based on visual inspection. This could provide a clue for the clarification of the still obscure aspects of the SV-IV structure–function relationships.
Lack of consensus affecting disorder prediction in some regions of SV-IV may result from the different sensitivity displayed by disorder predictors towards the various functional properties that are encoded in separate segments of the protein sequence. Indeed, integrity of the primary structure was found to be necessary for immunomodulation, whereas all of the procoagulant and anti-inflammatory properties were located in the fragment 1–70, which is devoid of any immunomodulatory activity, but possesses the same procoagulant and anti-inflammatory activity as the native protein. Moreover, the fragment 8–16 was the shortest N-terminal-derived peptide that possessed equivalent or slightly higher anti-inflammatory activity than the native protein, but did not possess any immunomodulatory or procoagulant activity. Finally, CNBr cleavage of SV-IV at the single Met70 residue generated the biologically inactive 71–90 peptide [16], suggesting that the immunomodulatory properties of SV-IV are strictly governed by the cooperation between this and the 1–70 region.

Fig. 4. Additional predictions of disorder. Comparison of the SV-IV sequence with the primary structures of 90 natively unfolded and 80 ideally globular proteins (same symbols as in Fig. 1) using the SVM method [26,57] (A) and an optimal set of artificial parameters [35] (B). (C) Results obtained by other algorithms. The horizontal axis of the scatter plots is the number of residues in the protein; labels in panel (C) include DISpro (1–90) and VL2 (1–90).
Concerning the organization of SV-IV, the results reported here are in substantial agreement with previous secondary structure predictions, at least with regard to the 1–70 region. In fact, the self-association process that underlies the overall functional behaviour of the protein induces conformational changes mainly in this region, which has been suggested to be without secondary structure in the monomer, but to contain some α-helix in the trimer [22]. However, minor discrepancies amongst disorder predictions, as well as between disorder and secondary structure predictions, suggest that several peptide segments within the protein sequence might display chameleon structural behaviour. In this regard, previous experiments in buffer solution [18] have shown that a structural rearrangement of SV-IV takes place after treatment with 0.2–6.0 mM SDS. As this interval includes the critical micellar concentration of the surfactant (2.6 mM) [62,63], it may be inferred that SV-IV interacts with the membrane-like environment of SDS micelles, either through direct formation of a protein–surfactant complex or by an indirect process in which the micelle is formed first and the protein is then inserted into it. This process is totally different from the non-specific massive cooperative binding of SDS to proteins at submicellar concentrations, and mimics the situation that SV-IV experiences in most cell-based biological assays, where its multifaceted biological function involves efficient binding to the plasma membrane of its target cells (macrophages, T lymphocytes and polymorphonuclear cells) at specific sites (Kd ≈ 10⁻⁷–10⁻⁸) [16], and can be obtained only through large plasticity of the structure.
Materials and methods
Protein databases
The database of disordered proteins was created using a list of natively unfolded proteins [39] and the SWISS-PROT protein sequence data bank [64]. The ideal database of globular proteins is available at the address http://phys.protres.ru/resources/folded_80.html [35,37], as selected by inspecting the four general classes in the SCOP database (1.63 release) [65].
Physicochemical parameters
The mean protein hydrophobicity was calculated using the Kyte–Doolittle scale [66], rescaled to a range of 0–1 [33]. The expected average number of contacts per residue in the globular state was calculated according to [35]. The mean net charge was defined as the absolute value of the difference between the numbers of positively and negatively charged residues at pH 7.0, divided by the total residue number, according to [39]. The average structural B-factor (isotropic temperature factor) scale (2.0 SD) was obtained from [32], where only the B-factors for the C-α atoms were considered to minimize influence by crystal packing and other structural artefacts.
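The two sequence-derived quantities defined above, mean hydrophobicity and mean net charge, can be sketched in a few lines. This is an illustrative reading of the definitions rather than the code used in the study: the linear rescaling of the Kyte–Doolittle scale from [-4.5, 4.5] to [0, 1], the treatment of Lys and Arg as the only positively charged residues (His taken as neutral at pH 7.0) and the toy sequence are all assumptions.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(seq: str) -> float:
    """Mean Kyte-Doolittle hydropathy, linearly rescaled so that -4.5 maps to 0 and 4.5 to 1."""
    rescaled = [(KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq]
    return sum(rescaled) / len(seq)


def mean_net_charge(seq: str) -> float:
    """|positive - negative| / length, counting K and R as +1 and D and E as -1.
    Histidine is assumed to be neutral at pH 7.0."""
    positive = seq.count("K") + seq.count("R")
    negative = seq.count("D") + seq.count("E")
    return abs(positive - negative) / len(seq)


toy = "MKSSRDEAAK"  # hypothetical sequence, not SV-IV
print(mean_hydrophobicity(toy), mean_net_charge(toy))
```

Each protein in the two reference sets would contribute one such pair of values, giving one point per protein in a plot of the kind shown in Fig. 1.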
Predictors of disorder
Below, we list all predictors used in this study, pointing out their salient features. A detailed description of each predictor is outside the scope of this paper, and the reader interested in more details is invited to refer to the relevant article(s). The seg algorithm (http://mendel.imp.ac.at/METHODS/seg.server.html), based on the rationale that compact globular structures exhibit quasi-random statistical properties, is designed to detect regions of biased amino acid composition using mathematically defined properties [47]. The stringency of the search for low-complexity segments is determined by three user-defined parameters [trigger window, W; trigger complexity, K(1); extension complexity, K(2)], using the parameter sequences 45, 3.4, 3.75 and 25, 3.0, 3.3 for long and short non-globular domains, respectively. Predictors of natural disordered regions (PONDRs) included in the pondr collection (http://www.pondr.com) are typically feed-forward neural networks trained on non-redundant sets of ordered and disordered sequences that help to ensure modest predictor biases and to enable the predictors to generalize to new sequences [27–29]. PONDRs come in several versions depending on the sequence attributes taken over windows of 9–21 amino acids. These attributes, such as the fractional composition of particular amino acids, hydropathy or sequence complexity, are averaged over these windows, and the values are used to train the neural network during predictor construction. The same values are used as inputs to make predictions. The regional order neural network (ronn) software, originally developed to identify protease cleavage sites, is a method based on sequence alignment available at http://www.strubi.ox.ac.uk/RONN [48]. The iupred server at http://iupred.enzim.hu estimates favourable pairwise contacts in protein sequences and assigns order/disorder status based on the assumption that intrinsically unstructured/disordered proteins and domains (IUPs) have special sequences that do not fold because of their inability to form sufficient stabilizing inter-residue interactions [49]. The disembl software available at http://dis.embl.de is based on artificial neural networks trained to assign disorder by using three different definitions of disorder: residues within loops/coils, residues within loops with a high degree of mobility as determined from X-ray temperature factors (B-factors), and residues with PDB missing coordinates as defined by Remark465 entries in PDB [34]. The disopred2 disorder prediction server at http://bioinf.cs.ucl.ac.uk/disopred restricts the definition of disorder to those residues that appear in the sequence records but with coordinates missing from the electron density map, and an SVM was trained to specifically recognize these [50]. globplot (http://globplot.embl.de) is a web service based on the tendency of residues to be in an ordered or disordered state, and uses different propensity sets based on amino acid hydrophobicities (Kyte–Doolittle and Hopp–Woods), B-factors, PDB missing coordinates and secondary structure-derived propensities (Deleage–Roux and Russell–Linding) [32]. norsp is an on-line predictor of NORS regions that is not trained on any dataset and predicts segments in which the content in regular secondary structure is below 12% over at least 70 consecutive residues, and at least 10 consecutive residues are predicted to be exposed. It can be accessed at http://cubic.bioc.columbia.edu/services/NORSp [51]. The identification of hydrophobic clusters was performed by hca available at http://bioserv.rpbs.jussieu.fr, which allows the easy discrimination of globular regions from non-globular ones and, in globular regions, the identification of secondary structures [30]. prelink (http://genomics.eu.org/spip/PreLink) is an hca-derived method that calculates the amino acid distributions in structured and unstructured regions, the probability that a given sequence fragment is part of either a structured or an unstructured region, and the distance of each amino acid to the nearest hydrophobic cluster. Using these three values along a protein sequence, unstructured regions can be predicted with very simple rules [31]. The multicoil program (http://groups.csail.mit.edu/cb/multicoil/cgi-bin/multicoil.cgi) predicts the location of coiled-coil regions in amino acid sequences and classifies the predictions as dimeric or trimeric [55]. coils (http://ch.embnet.org/software/COILS_form.html) is a program that compares a sequence with a database of known parallel two-stranded coiled-coils and derives a similarity score. By comparing this score with the distribution of scores in globular and coiled-coil proteins, the program then calculates the probability that the sequence will adopt a coiled-coil conformation [56].
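The idea of triggering on low compositional complexity within a sliding window, as described for seg at the start of this section, can be illustrated with a simplified stand-in. The sketch below uses the Shannon entropy of the residue composition (in bits) as the complexity measure and reports windows at or below a trigger value; it omits the extension step governed by K(2) and is not the seg algorithm itself. The function names, the toy sequence and the toy parameter values are assumptions.

```python
import math
from collections import Counter
from typing import List


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq: str, window: int, trigger: float) -> List[int]:
    """Return 1-based start positions of windows whose entropy is at or below the trigger."""
    return [
        i + 1
        for i in range(len(seq) - window + 1)
        if shannon_entropy(seq[i:i + window]) <= trigger
    ]


# Hypothetical sequence with a serine-rich stretch; window and trigger are toy values.
toy = "SSSSSSACDEFGHIKL"
print(low_complexity_windows(toy, window=6, trigger=1.0))
```

In this toy case only the windows dominated by the serine run fall below the trigger, which is the behaviour a composition-bias detector is expected to show.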
Predictions with improved performance were carried out by the foldunfold web server available at http://skuld.protres.ru/~mlobanov/ogu/ogu.cgi, based on the observation that disorder is connected to a weak expected packing density, as evaluated by the observed number of contacts within 8 Å for each amino acid residue in the globular state [35–38], and the SVM-based poodle (prediction of order and disorder by machine learning, http://mbs.cbrc.jp/poodle) system. The poodle suite predicts protein disorder from amino acid sequences and provides three types of predictions: poodle-l and poodle-s predict long disorder regions (mainly longer than 40 consecutive amino acids) and short disorder regions, respectively; poodle-w is for binary prediction of whole protein disorder [44–46].
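Per-residue predictors of this kind typically return a disorder probability for each position, which is then converted into a binary call. The following sketch shows one way such a conversion could be done; the 0.5 cut-off, the choice of treating values equal to the cut-off as disordered, and the example probabilities are assumptions for illustration and do not reproduce the poodle output format.

```python
from typing import List, Tuple


def call_disorder(probs: List[float], cutoff: float = 0.5) -> Tuple[str, float]:
    """Return a per-residue call string ('D' disordered, 'O' ordered)
    and the fraction of residues called disordered."""
    calls = "".join("D" if p >= cutoff else "O" for p in probs)
    return calls, calls.count("D") / len(probs)


# Hypothetical per-residue probabilities for a four-residue stretch.
calls, fraction = call_disorder([0.9, 0.8, 0.4, 0.6])
print(calls, fraction)
```

Residues with probabilities very close to the cut-off change class under small perturbations, which is why such positions are best described as borderline rather than as ordered.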
Another SVM method for recognizing IDPs was applied according to the procedure described in [26,57], using the mySVM implementation of SVM theory by Rüping [67]. The set of artificial parameters for 20 amino acid residues calculated by the Monte Carlo algorithm to maximally separate natively unfolded and ideally globular proteins was obtained from [35]. Additional predictions were performed by: dispro software (http://www.igb.uci.edu/servers/psss.html), which relies on machine learning methods and leverages evolutionary information as well as predicted secondary structure and relative solvent accessibility [58]; the VL2 and VL3 predictors available at http://www.ist.temple.edu/disprot/predictor.php, which rely on partitioning protein disorder into flavours based on competition amongst increasing numbers of predictors [59] and on an ensemble of feed-forward neural networks based on the same attributes as VL2 [60], respectively; and the drippred server (http://www.sbc.su.se/~maccallr/disorder), developed for sequence profile visualization and contact map prediction, which predicts structural disorder by looking for sequence patterns that are not typically found in the PDB [61].
Acknowledgements
This paper is dedicated to the memory of the unforgettable Harold C. Helgeson (a.k.a. Hal), founder of the Laboratory of Theoretical Geochemistry and Biogeochemistry at U. C. Berkeley (a.k.a. Prediction Central), who is probably sailing off the coast near Margaritaville. The authors are grateful to V. N. Uversky for his help in creating the list of natively unfolded proteins.
References
1 Dunker AK, Brown CJ, Lawson JD, Iakoucheva LM & Obradovic Z (2002) Intrinsic disorder and protein func-tion Biochemistry 41, 6573–6582
2 Wright PE & Dyson HJ (1999) Intrinsically unstruc-tured proteins: re-assessing the protein structure–func-tion paradigm J Mol Biol 293, 321–331
3 Dyson HJ & Wright PE (2005) Intrinsically unstruc-tured proteins and their functions Nat Rev Mol Cell Biol 6, 197–208
4 Dunker AK & Obradovic Z (2001) The protein trinity – linking function and disorder Nat Biotechnol 19, 805– 806
5 Uversky VN (2002) Natively unfolded proteins: a point where biology waits for physics Protein Sci 11, 739– 756
6 Radivojac P, Iakoucheva LM, Oldfield CJ, Obradovic Z, Uversky VN & Dunker AK (2007) Intrinsic disorder and functional proteomics. Biophys J 92, 1439–1456.
7 Tompa P (2002) Intrinsically unstructured proteins. Trends Biochem Sci 27, 527–533.
8 Metafora S, Esposito C, Caputo I, Lepretti M, Cassese D, Dicitore A, Ferranti P & Stiuso P (2007) Seminal vesicle protein IV and its derived active peptides: a possible physiological role in seminal clotting. Semin Thromb Hemost 33, 53–59.
9 Ostrowski MC, Kistler MK & Kistler WS (1979) Purification and cell-free synthesis of a major protein from rat seminal vesicle secretion. A potential marker for androgen action. J Biol Chem 254, 383–390.
10 Pan Y-CE & Li SSL (1982) Structure of secretory protein IV from rat seminal vesicles. Int J Pept Protein Res 20, 177–187.
11 Harris SE, Mansson P-E, Tully DB & Burkhart B (1983) Seminal vesicle secretion IV gene: allelic difference due to a series of 20-base-pair direct tandem repeats within an intron. Proc Natl Acad Sci USA 80, 6460–6464.
12 Kandala C, Kistler MK, Lawther RP & Kistler WS (1983) Characterization of a genomic clone for rat seminal vesicle secretory protein IV. Nucleic Acids Res 11, 3169–3186.
13 McDonald C, Williams L, McTurck P, Fuller F, McIntosh E & Higgins S (1983) Isolation and characterisation of genes for androgen-responsive secretory proteins of rat seminal vesicles. Nucleic Acids Res 11, 917–930.
14 D’Ambrosio E, Del Grosso N, Ravagnan G, Peluso G & Metafora S (1993) Cloning and expression of the rat genomic DNA sequence coding for the secreted form of the protein SV-IV. Bull Mol Biol Med 18, 215–223.
15 Metafora S, Facchiano F, Facchiano A, Esposito C, Peluso G & Porta R (1987) Homology between rabbit uteroglobin and the rat seminal vesicle sperm binding protein: prediction of structural features of glutamine substrates for transglutaminase. J Protein Chem 6, 353–359.
16 Ialenti A, Santagada V, Caliendo G, Severino B, Fiorino F, Maffia P, Ianaro A, Morelli F, Di Micco B, Cartenì M et al. (2001) Synthesis of novel anti-inflammatory peptides derived from the amino-acid sequence of the bioactive protein SV-IV. Eur J Biochem 268, 3399–3406.
17 Miele L, Cordella-Miele E, Facchiano A & Mukherjee AB (1988) Novel anti-inflammatory peptides from the region of highest similarity between uteroglobin and lipocortin I. Nature 335, 726–730.
18 Stiuso P, Ragone R, De Santis A, Metafora S, Peluso G, Ravagnan G & Colonna G (1989) Structural properties of rat seminal vesicle protein IV: effect of sodium dodecylsulfate. In Biochemical Aspects on the Immunopathology of Reproduction (Spera G, Mukherjee AB, Ravagnan G & Metafora S, eds), pp. 105–111. Acta Medica, Rome.
19 Ragone R, Facchiano F, Facchiano A, Facchiano AM & Colonna G (1989) Flexibility plot of proteins. Protein Eng 2, 497–504.
20 Stiuso P, Metafora S, Facchiano AM, Colonna G & Ragone R (1999) The self association of protein SV-IV and its possible functional implications. Eur J Biochem 266, 1029–1035.
21 Tufano MA, Porta R, Farzati B, Di Pierro P, Rossano F, Catalanotti P, Baroni A & Metafora S (1996) Rat seminal vesicle protein SV-IV and its transglutaminase-synthesized polyaminated derivative Spd2-SV-IV induce cytokine release from human resting lymphocytes and monocytes in vitro. Cell Immunol 168, 148–157.
22 Caporale C, Caruso C, Colonna G, Facchiano A, Ferranti P, Mamone G, Picariello G, Colonna F, Metafora S & Stiuso P (2004) Structural properties of the protein SV-IV. Eur J Biochem 271, 263–271.
23 Ferron F, Longhi S, Canard B & Karlin D (2006) A practical overview of protein disorder prediction methods. Proteins 65, 1–14.
24 Romero P, Obradovic Z, Li X, Garner EC, Brown CJ & Dunker AK (2001) Sequence complexity of disordered protein. Proteins 42, 38–48.
25 Sickmeier M, Hamilton JA, LeGall T, Vacic V, Cortese MS, Tantos A, Szabo B, Tompa P, Chen J, Uversky VN et al. (2007) DisProt: the database of disordered proteins. Nucleic Acids Res 35, D786–793.
26 Weathers EA, Paulaitis ME, Woolf TB & Hoh JH (2004) Reduced amino acid alphabet is sufficient to accurately recognize intrinsically disordered protein. FEBS Lett 576, 348–352.
27 Romero P, Obradovic Z & Dunker AK (1997) Sequence data analysis for long disordered regions prediction in the calcineurin family. Genome Inform 8, 110–124.
28 Li X, Romero P, Rani M, Dunker AK & Obradovic Z (1999) Predicting protein disorder for N-, C-, and internal regions. Genome Inform 10, 30–40.
29 Obradovic Z, Peng K, Vucetic S, Radivojac P & Dunker AK (2005) Exploiting heterogeneous sequence properties improves prediction of protein disorder. Proteins 61 (Suppl 7), 176–182.
30 Gaboriaud C, Bissery V, Benchetrit T & Mornon JP (1987) Hydrophobic cluster analysis: an efficient new way to compare and analyse amino acid sequences. FEBS Lett 224, 149–155.
31 Coeytaux K & Poupon A (2005) Prediction of unfolded segments in a protein sequence based on amino acid composition. Bioinformatics 21, 1891–1900.
32 Linding R, Russell RB, Neduva V & Gibson TJ (2003) GlobPlot: exploring protein sequences for globularity and disorder. Nucleic Acids Res 31, 3701–3708.