Universal positions in globular proteins
From observation to simulation
Nikolaos Papandreou1, Igor N. Berezovsky2,3, Anne Lopes4, Elias Eliopoulos1 and Jacques Chomilier4
1 Laboratory of Genetics, Agricultural University of Athens, Greece; 2 Department of Structural Biology, The Weizmann Institute of Science, Rehovot, Israel; 3 Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA; 4 Equipe Biologie Structurale, LMCP, Universités Paris 6 and Paris 7, Paris, France
The description of globular protein structures as an ensemble of contiguous ‘closed loops’ or ‘tightened end fragments’ reveals fold elements crucial for the formation of stable structures and for navigating the very process of protein folding. These are the ends of the loops, which are spatially close to each other but are situated apart in the polypeptide chain by 25–30 residues. They also correlate with the locations of highly conserved hydrophobic residues (referred to as topohydrophobic), in a structural alignment of the members of a protein family. This study analysed these positions in 111 representatives of different protein folds, and then carried out dynamic Monte Carlo simulations of the first steps of the folding process, aimed at predicting the origins of the assembling folds. The simulations demonstrated that there is an obvious trend for certain sets of residues, named ‘mostly interacting residues’, to be buried at the early stages of the folding process. Location of these residues at the loop ends and correlation with topohydrophobic positions are demonstrated, thereby giving a route to simulations of the protein folding process.
Keywords: folding nucleus; hydrophobic core; lattice simulation; protein folding.
Despite the continuously increasing number of experimentally determined protein structures, many new folds are still to be discovered. This was illustrated clearly in a recent study [1], where a plot of the number of protein families vs. the number of resolved complete genomes resulted in a quasi-linearly increasing function. Elucidating the evolutionary mechanisms leading to the emergence of a finite number of protein folds [2,3] from the vast number of protein sequences [4,5], as well as the mechanisms of the formation of mature protein globules [6], remains a topic of both great challenge and interest. The latter mechanisms are related to the physical basis of protein structure formation and stability [7], and thus can point to possible evolutionary routes [8].
This study is based on universal structural units of protein folds, named ‘closed loops’ [9] or ‘tightened-end fragments’ (TEFs) [10]. These major elements are universally present in all types of protein folds and have the following features in common: (a) they usually start and end in the hydrophobic core [11]; (b) they form loop-like structures of nearly standard size (25–30 amino acid residues); (c) they serve as universal units of protein domain structure [12]; (d) the ends of these elements (the so-called locks [13]) mainly correspond to clusters of hydrophobic amino acids in general (WIMVYLF), and to highly conserved ones, the topohydrophobic (TH) positions [14,15], in particular. Determination of the TH positions is based on the analysis of multiple structural alignments of members of a protein family, limited to a maximum pairwise sequence identity of 30%. TH positions are of particular importance for the formation and stability of the protein core [16]. From a dynamic point of view, the early formation of a nucleus composed of TH positions would favor the formation of closed loops and considerably speed up the folding process [17]. The coupled concepts of TH and closed loops/TEFs therefore offer a simple and general scenario for the folding mechanism of globular proteins [11,15] and provide a set of critical positions in the protein core [10,11,13]. The loop structure of globular proteins is a general concept, independent of secondary structure, as well as of the particular folding mechanism of each protein [9,10,13].
This study addresses the question of predicting these critical positions from the sequence, a task of major importance when approaching the structure of a protein of unknown fold. To successfully build such a structure, numerous pieces of information have to be collected by combining various methods. An initial calculation of critical positions could be a first step, providing a frame of structural restraints, as TEF limits and TH residues are located mainly inside the protein core.
The notion of topohydrophobic positions suggests that the forces that bury these residues and lead to a stable core do not rely on the details of the amino acid side chain structure, but rather on an adequate succession of hydrophobic and polar amino acid residues along the polypeptide chain. Thus simplified protein models, such as lattice ones, are adequate tools for calculations aimed at locating critical residues.
Correspondence to N. Papandreou, Laboratory of Genetics, Agricultural University of Athens, Iera Odos 75, 11855 Athens, Greece. Fax: +30 2105294322, Tel.: +30 2105294372, E-mail: papandre@aua.gr
Abbreviations: MIR, mostly interacting residues; PDB, protein data bank; SCOP, structural classification of proteins; TEF, tightened end fragment; TH, topohydrophobic.
(Received 29 June 2004, revised 22 September 2004, accepted 15 October 2004)
This study was carried out on a dataset of 111 globular proteins with well-defined structures in the Protein Data Bank (PDB) that were representative of various folds, and for which the TEFs were available. For a subset of 73 proteins of the above database, the TH positions have also been determined.
The initial stages of folding were simulated using a simplified model, which consists of an alpha-carbon reduced representation of the polypeptide chain on a 24-first-neighbour lattice. A standard Monte Carlo algorithm dynamically simulated the folding process and a statistical mean force potential was used to describe the interactions between noncontiguous residues. A commonly accepted lattice model was used [18], and the analysis focused on the first stages of the folding process, measuring the tendency of amino acids to be packed inside the hydrophobic core, depending on the peculiarities of the polypeptide chain sequence.
Starting from random conformations, the Monte Carlo simulations revealed that a subset of hydrophobic residues had a strong tendency to be buried. These residues, named ‘mostly interacting residues’ (MIR), were found to statistically match TEF limits and TH positions.
These results are in agreement with the hydrophobic collapse mechanism, which can be further generalized to the nucleation–condensation mechanism, a hybrid of the hierarchical and hydrophobic collapse mechanisms [23,24].
Materials and methods
The protein database consisted of 111 globular protein chains, representing 78 different folds, according to the structural classification of proteins (SCOP classification) [22]. In detail, there are 26 α class proteins, 23 β class, 26 α + β class, 18 α/β class and 18 of the small proteins class, providing a balanced representation of the major known folds. The polypeptide chain lengths vary between 50 and 250 residues.
Simulations have been carried out using a Cα representation of the polypeptide chains, and the lattice geometry (Fig. 1) is as in [18].
On an underlying cubic lattice (Fig. 1, dotted lines) with edges of unit length, contiguous alpha carbons are connected by vectors of the form (±2, ±1, 0), including all permutations of the three coordinates (Fig. 1, solid lines). The length of such a vector is √5 lattice units and is equivalent to 3.8 Å, the typical distance between contiguous alpha carbons in proteins. In this geometry, for residue i, there are 24 possible positions for residue i + 1 to occupy. This kind of polypeptide chain projection allows for a more realistic representation of the polypeptide chain [18]. Two spatial constraints are implemented. First, the distance between noncontiguous alpha carbons cannot be less than 3.8 Å (√5 lattice units). Second, contrary to the cubic lattice, where only angles of 90° and 180° are possible, the limit angles here are 66° and 143° (seven possible values), approximating the range of pseudo-angles in natural proteins [19].
The different nature of amino acids is taken into account in the force field used to attribute an energy value to each chain conformation. The distance-independent 20 × 20 residue pair energy matrix of Miyazawa and Jernigan was used [20]. In detail, if two noncontiguous residues i and j are found within a distance smaller than or equal to 5.88 Å, a term Eij is added to the total energy, depending on their nature. The maximum interaction range of 5.88 Å corresponds to √12 lattice units and seems a reasonable estimate for the mean noncovalent interaction range between amino acid residues.
For each protein, 100 different initial conformations were randomly generated and used as starting points for 100 simulation runs, to avoid dependence on the initial state. The only constraint placed on initial states is their noncompactness, in the sense that amino acid residues placed far apart in the sequence were not allowed to be close in space, to avoid clustering due to a particular initial conformation. Quantitatively, this constraint introduces a minimum spatial distance, dmin, according to the separation Δ = |i − j| between residues i and j: (1) Δ = 6–10, dmin = 7 Å; (2) Δ = 11–15, dmin = 11 Å; (3) Δ = 16–20, dmin = 19 Å; (4) Δ more than 20, dmin = 27 Å.
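The contact energy and the noncompactness rule can be sketched as follows. This is an illustration under stated assumptions, not the published implementation: coordinates are integer lattice points, the pair matrix `TOY_PAIR_ENERGY` is a hypothetical three-residue stand-in for the Miyazawa and Jernigan values, and separations below 6 are assumed to carry no dmin constraint because the text does not list them.

```python
import math

LATTICE_UNIT_A = 3.8 / math.sqrt(5)   # one lattice unit in angstroms
CUTOFF_SQ = 12                        # (sqrt(12) lattice units)^2, about 5.88 A

# Hypothetical toy values, NOT the Miyazawa-Jernigan matrix.
TOY_PAIR_ENERGY = {
    ("L", "L"): -1.0, ("L", "V"): -0.8, ("V", "V"): -0.6,
    ("L", "K"): 0.0, ("V", "K"): 0.0, ("K", "K"): 0.2,
}


def dist_sq(p, q):
    """Squared distance between two lattice points, in lattice units."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def pair_energy(a, b, matrix=TOY_PAIR_ENERGY):
    """Symmetric lookup of the pair term E_ij."""
    return matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]


def total_energy(sequence, coords, matrix=TOY_PAIR_ENERGY):
    """Sum E_ij over noncontiguous pairs within the interaction range."""
    energy = 0.0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):   # skip chain neighbours
            if dist_sq(coords[i], coords[j]) <= CUTOFF_SQ:
                energy += pair_energy(sequence[i], sequence[j], matrix)
    return energy


def d_min(separation):
    """Minimum allowed distance (angstroms) for a sequence separation."""
    if separation > 20:
        return 27.0
    if separation >= 16:
        return 19.0
    if separation >= 11:
        return 11.0
    if separation >= 6:
        return 7.0
    return 0.0   # assumption: no constraint is listed below a separation of 6


def is_noncompact(coords):
    """True if every pair separated by 6 or more residues respects d_min."""
    for i in range(len(coords)):
        for j in range(i + 6, len(coords)):
            dist_a = math.sqrt(dist_sq(coords[i], coords[j])) * LATTICE_UNIT_A
            if dist_a < d_min(j - i):
                return False
    return True
```

Working with squared distances in lattice units avoids floating-point comparisons at the cutoff, since a pair exactly √12 units apart must count as interacting.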
The single residue movements [18] are of two kinds: end flip movements for the N- and C-terminal residues and corner movements for the others. The choice of the move set is more or less arbitrary, as the elementary one-residue moves are sufficient to bring the protein to a folded state. In this case, the restriction to elementary moves only, apart from its simplicity, permits a sequential analysis of the chain tendency to form compact fragments around particular amino acid residues from the beginning of the simulation. After each move, the calculated conformational energy was subjected to a standard Metropolis criterion, at constant temperature.
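The standard Metropolis criterion mentioned above accepts every move that does not raise the energy and accepts an uphill move with probability exp(−ΔE/kT). A minimal sketch follows; the function name, the argument `kT` and its value are assumptions, since the text does not state the simulation temperature.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng):
    """Return True if a trial move with energy change delta_e is accepted.

    delta_e and kT must be expressed in the same energy units.
    """
    if delta_e <= 0.0:
        return True                      # downhill or neutral moves always pass
    return rng.random() < math.exp(-delta_e / kT)


# Usage: estimate the acceptance rate of an uphill move of one kT.
rng = random.Random(0)
trials = 20000
accepted = sum(metropolis_accept(1.0, 1.0, rng) for _ in range(trials))
acceptance_rate = accepted / trials      # expected to be close to exp(-1)
```

In a full simulation, `delta_e` would be the difference in conformational energy before and after an end flip or corner move.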
Because the goal was to analyse the propensity of residues to be buried from the start of folding, we ensured that the maximum number of Monte Carlo steps was sufficient to allow formation of compact chain fragments. Due to the serial nature of the algorithm, this time limit is correlated with the protein chain length L. It was empirically determined that for small proteins of about 50 residues, the value tmax is around 10^6 Monte Carlo steps. Thus, the following linear relation was adopted to generalize tmax to proteins of any length L: tmax = INT(10^6 × L/50), where INT is the integer part, because t is an integer by definition (Monte Carlo steps).
Fig. 1. The lattice model. The solid line represents the backbone from Cα to Cα positions, while the dotted line is the underlying cubic lattice.
For each simulation, 10^4 records of intermediate conformations were taken at regular time intervals. As the number of simulations per protein is 100 (one for each initial state), the end result is a set of 10^6 records per protein.
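The run length and the recording schedule follow directly from the chain length. The sketch below assumes that a record is taken at the end of each interval, which the text does not specify; the function names are illustrative.

```python
def t_max(chain_length, steps_for_50=10**6):
    """Maximum number of Monte Carlo steps: INT(10^6 * L / 50)."""
    return (steps_for_50 * chain_length) // 50


def record_times(chain_length, n_records=10**4):
    """Monte Carlo step indices at which a conformation is recorded."""
    total = t_max(chain_length)
    interval = total // n_records        # equals 2 * L with the default values
    return range(interval, total + 1, interval)


# Usage: a 111-residue chain, 100 runs per protein.
times = record_times(111)
records_per_protein = 100 * len(times)
```

Because 10^6/50 is an integer, the interval is exactly 2L steps, so every run contributes the same number of records regardless of chain length.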
For every recorded conformation, and for each amino acid residue, the number of residues with which it is in noncovalent interaction was calculated. In spatial terms, these noncovalent neighbours are the amino acid residues lying within a distance of 5.88 Å or √12 lattice units. For a given protein and for residue i, at the r-th record, the number of noncovalent neighbours is nc(i,r). The time mean of this quantity is

$$\mathrm{NC}(i) = \frac{1}{10^{6}} \sum_{r=1}^{10^{6}} \mathrm{nc}(i,r)$$

NC(i) values are rounded to the nearest integer. This mean number of noncovalent neighbours is a quantitative measure of the tendency of a residue to be buried from solvent. The higher the NC(i), the stronger this tendency.
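The neighbour count and its time mean can be sketched as follows, assuming integer lattice coordinates, the exclusion of chain neighbours i − 1 and i + 1 from the count, and half-up rounding. The rounding convention and the function names are assumptions; the text only says that values are rounded to the nearest integer.

```python
import math

CUTOFF_SQ = 12   # (sqrt(12) lattice units)^2, i.e. about 5.88 A


def nc_record(coords):
    """nc(i, r) for one recorded conformation of integer lattice points."""
    n = len(coords)
    counts = [0] * n
    for i in range(n):
        for j in range(i + 2, n):        # skip the residue itself and chain neighbours
            d2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            if d2 <= CUTOFF_SQ:
                counts[i] += 1
                counts[j] += 1
    return counts


def mean_nc(records):
    """NC(i): mean of nc(i, r) over all records, rounded half up."""
    n = len(records[0])
    totals = [0] * n
    for coords in records:
        for i, count in enumerate(nc_record(coords)):
            totals[i] += count
    return [math.floor(total / len(records) + 0.5) for total in totals]


# Usage with two toy four-residue conformations (not real data).
folded = [(0, 0, 0), (2, 1, 0), (2, 3, 1), (0, 2, 1)]
extended = [(0, 0, 0), (2, 1, 0), (4, 0, 0), (6, 1, 0)]
nc_values = mean_nc([folded, extended])
```

In the study the list of records would hold the 10^6 conformations collected for one protein rather than two toy chains.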
If NC is the mean value of NC(i) over the sequence for a given protein, the residues for which NC(i) is significantly higher than NC are of particular interest and are called mostly interacting residues (MIRs). Their selection requires fixing a cut-off value above the mean value NC. It was found that NC(i) varies between 1 and 8 and that NC = 4 for all studied proteins. Figure 2 presents the distribution of the different values of NC(i) over the amino acid residues of all 111 proteins. The most probable value is four, which coincides with the sequence mean of four stated above. From this distribution, it appears that 13% of residues have a number of noncovalent neighbours equal to or higher than six, which was adopted as the lowest NC(i) value for considering residue i as a MIR.
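Selecting MIRs from a profile of NC(i) values is then a simple threshold test. The sketch below uses the cut-off of six from the text; the function names, the 1-based numbering of positions and the example profile are illustrative assumptions.

```python
def select_mirs(nc_values, threshold=6):
    """Return 1-based sequence positions whose NC(i) reaches the threshold."""
    return [i + 1 for i, nc in enumerate(nc_values) if nc >= threshold]


def fraction_at_or_above(nc_values, threshold=6):
    """Fraction of residues with NC(i) at or above the threshold."""
    return sum(nc >= threshold for nc in nc_values) / len(nc_values)


# Usage with a made-up profile of eight residues (not real data).
profile = [4, 6, 3, 8, 5, 7, 4, 2]
mirs = select_mirs(profile)
share = fraction_at_or_above(profile)
```

Applied to the pooled NC(i) values of all 111 proteins, `fraction_at_or_above` corresponds to the 13% figure quoted above.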
In order to validate this model, once the positions of MIRs were determined they were compared with TEF limits and with topohydrophobic positions. The comparison with TEFs was performed on the complete database of 111 proteins. The comparison with TH positions was performed on a 73-protein subset of this database, where these positions were determined. For the remaining 38 proteins, the calculation of TH positions was not possible, because it requires at least four 3D structures of members of the same family, with a pairwise identity not exceeding 30% [14,15]. This criterion was not fulfilled for these 38 cases.
The PDB codes [21] of the database are given in Table 1.
Results
The Monte Carlo algorithm for folding simulation has been applied to the entire protein dataset and the histograms NC(i), containing the distribution of noncovalent neighbours along the amino acid sequence, have been obtained for each protein.
Figure 3 illustrates the positions of TEFs, TH and MIR for 10 proteins of the database, representative of the various classes as determined by SCOP [22].
Among the 1920 calculated MIRs, 92% were hydrophobic, following the definition of topohydrophobic residues (i.e. they belonged to the set ‘VIMWYLF’). Also, the total numbers of MIRs and TH positions, in the 73-protein subset where they are compared, are relatively close (1299 MIRs vs. 1011 TH). In the same subset, the total number of TEFs was 309; thus the number of TEF limits was 618, about half the number of MIRs.
To assess the overall quality of agreement between predicted critical positions (MIR) and structure-defined ones (TH and TEF limits), a statistical analysis is required. This has been carried out over the whole database, i.e. over all 111 proteins for the comparison between MIR and TEF limits and over the subset of 73 proteins for the comparison between MIR and TH. The results are presented in two histograms in Figs 4 and 5. The histogram of Fig. 4 gives the comparison between MIR and TH positions and is constructed as follows. Each TH position is placed at the origin of the abscissa. Then, the neighbouring MIRs that are closer to this central TH than to any other TH are located. Their number is plotted as a function of their sequence distance with respect to the central TH. This is repeated for all THs along all the 73 proteins of the data set. Thus Fig. 4 shows a histogram of the separation between each TH and the closest MIRs. The plotted distances range from −20 to +20, and MIRs lying at distances greater than ±20 residues from the closest TH are added to the histogram at the ±20 positions. The second histogram (Fig. 5) follows the same rules and concerns the comparison of MIR with TEF limits. It is constructed using the whole database of 111 proteins.
Both histograms (Figs 4 and 5) show a clear peak at the origin. This is an indication that the residues predicted to be MIRs actually do correspond to TH positions. They also statistically correlate with TEF limits, which are mostly hydrophobic [13], as it was already shown that most TH positions are located in or in the vicinity of TEF ends [10]. The agreement between MIR and TH is very clear: 63% of MIRs were found within ±5 positions of a TH residue. The TEF histogram presents two main secondary maxima at positions ±3, and 57% of MIRs were found within ±5 positions of a TEF limit. This good agreement between prediction and analysis [13] is of great interest for the prediction of elements of the protein core from the sequence.
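The construction of the histograms in Figs 4 and 5 can be sketched as follows. Each MIR is assigned to its nearest reference position (a TH position or a TEF limit), the signed sequence offset is clamped to ±20, and the share of MIRs within ±5 is then read off. The sign convention (MIR minus reference), the tie-breaking rule and the function names are assumptions not stated in the text.

```python
from collections import Counter


def offset_histogram(mirs, references, clamp=20):
    """Histogram of signed offsets between each MIR and its nearest reference."""
    hist = Counter()
    for m in mirs:
        # Nearest reference; ties go to the lower sequence position (assumption).
        nearest = min(references, key=lambda r: (abs(m - r), r))
        offset = max(-clamp, min(clamp, m - nearest))
        hist[offset] += 1
    return hist


def fraction_within(hist, window=5):
    """Fraction of counted MIRs lying within +/- window of a reference."""
    total = sum(hist.values())
    return sum(c for d, c in hist.items() if abs(d) <= window) / total


# Usage with made-up positions for a single protein (not real data).
example_mirs = [5, 12, 40, 90]
example_th = [5, 10, 60]
hist = offset_histogram(example_mirs, example_th)
close_share = fraction_within(hist)
```

For the published figures, the counters of all proteins in the relevant subset would be summed before computing the fraction.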
Discussion
The existence of critical positions in protein structures, punctuated by TH positions and/or TEF limits, is of great importance for protein folding and stability. Consecutive formation of the globule core [10,11,17] composed essentially of these residues [13] leads to tremendous optimization of the folding process, by reducing the conformational space to be explored. Thus, the prediction of these ‘hot’ residues becomes an important step in approaching the native three-dimensional structure. A first approach to this goal was undertaken in this study. The guiding hypothesis was that, in order to achieve fast folding, critical residues should have a tendency to contact each other and thus form the origins of the hydrophobic core. The results confirmed this hypothesis. Using a simple alpha-carbon lattice model, formation of the nucleation sites at initial steps of the folding process was demonstrated.
Fig. 2. Distribution of the mean number of noncovalent neighbours over all 111 sequences of the dataset.
Table 1. A list of the PDB codes, names and SCOP classes of the proteins studied. The TEFs are known for all these proteins. Proteins with known TH positions are in bold. The uppercase letters at the end of the code correspond to the chain.
PDB
PDB
PDB
1aep Apolipophorin-III α 2sns Staphylococcal
nuclease
1–4-β-glucanase
phosphoribosyltransferase
α/β
subunit Iib
α/β
acetyltransferase
α/β
1poc Phospholipase A2 α 2stv STNV coat protein β 3chy Signal transduction protein α/β
1lbd Retinoid-X receptor α α 1qabA Transthyretin β 1dhr Dihydropteridin reductase α/β
2ilk Interleukin-10 α 1cbs Cellular
retinoic-acid-binding protein
β 1asu Retroviral integrase,
catalytic domain
α/β
ligand binding core
α/β 2sas Calcium-binding
protein
α 1ptf Histidine-containing
phosphocarrier
1hbg Glycera globin α 153L Lysozyme, Goose α + β 1akz Uracil-DNA glycosylase α/β
2mhbA Hemoglobin (horse) α 1acf Profilin α + β 1ns5A Hypothetical protein YbeA α/β 1dkeA Hemoglobin (human) α 1ctf Ribosomal protein
L7/12
α + β 1jkeB D-Tyr tRNAtyr deacylase α/β
1lki Leukemia inhibitory
factor
α 1apyA Glycosylasparaginase α + β 1dtdB Carboxypeptidase inhibitor small 3cytO Mitochondrial
cytochrome c
invariant chain fragment
small 3c2c Cytochrome c2 α 1dtp Diphtheria toxin α + β 2bbkL Methylamine dehydrogenase small
1enh DNA-binding protein α 2pii Signal transduction
protein
Peptostreptococcus
α + β 1i8nA Anti-platelet protein small 1pht Phosphatidylinositol
3-kinase
β 1fxd Ferredoxin II,
Desulfovibrio gigas
1pwt α-Spectrin, SH3 domain β 1c0bA Ribonuclease A α + β 1ehs Heat-stable enterotoxin B small 1semA Signal transduction protein β 1shaA c-src Tyrosine kinase α + β 1tgj TGF-β3 small 1cauB Seed storage protein
7S vicilin
β 1ag2 Prion protein domain α + β 4rxn Rubredoxin,
Clostridium pasteurianum
small
Pyrococcus furiosus
small
factor)
1anu Cohesin-2 domain β 1hucB (Pro)cathepsin B α + β 1hpi HIPIP, Ectothiorhodospira
vacuolata
small 1f3g Glucose-specific factor III β 2act Actinidin α + β 1hip HIPIP, Allochromatium
vinosum
small 1sno Staphylococcal nuclease β 2ci2 Chymotrypsin
inhibitor CI-2
α + β 1knt Collagen type VI small
Fig. 3. Examples of comparison of MIR, TH and TEF for 10 sequences of various folds. In each example, the PDB code (with the chain) is given, followed by the name, the SCOP class and the fold of the protein in parentheses. The following lines represent the sequence and the TEFs. The residues belonging to a TEF are indicated by ‘I’. In case of TEF overlap, two lines are used for this representation (for example in protein 1shaA). The next line shows TH positions, where the corresponding residues are indicated by ‘T’. The final line shows MIR residues, indicated by ‘M’. For 3chy and 5p21, due to the sequence length, the results appear in two consecutive blocks.
These results suggest that folding initiation can be based on the early formation of a set of nucleation sites around selected hydrophobic residues [10,11,13]. This is essentially the basis of the hydrophobic collapse mechanism [23], which supposes formation of hydrophobic tertiary interactions that initiate secondary structure. It can be extended to a unified nucleation–condensation mechanism, which is a combination of hierarchical and hydrophobic collapse mechanisms [23,24]. In the latter case, hydrophobic tertiary interactions are consolidated at the same time as elements of secondary structure (with possible variations of the kinetics of the mechanism caused by the different intrinsic stabilities of the secondary structural elements). These models have been developed from experiments and simulations of folding and unfolding of several small proteins [23,24] and particularly from the analysis of the residual structure of denatured states, which are thought to correlate with the nucleation sites. The comparison of MIR predictions with this type of data is being considered for future studies.
The secondary peaks in the histogram representing the correlation between MIR and TEF (Fig. 5) come from the proteins belonging mainly to the α class. For these folds, the TEF limits are often located inside α helices and are mainly hydrophobic. Sometimes, the predicted MIR are not exactly these limits but are the nearest hydrophobic residues, which in a helix are located three positions away because of the α-helix periodicity. This observation is in full agreement with the definition of the van der Waals locks, as extended (three to five residues long) segments of polypeptide chains interacting with each other, and thus forming ‘loop-n-lock’ structures in globular proteins [13].
The main conclusion of this study is that the burial of MIR positions can create anchors for the sequential formation of closed loops. These results remarkably corroborate experimental evidence on the initial stages of the folding process. NMR analysis of folding intermediates of bovine pancreatic trypsin inhibitor [25] revealed loop formation in early, non-native states, stabilized by nonlocal interactions. Also, an NMR study on the folding of lysozyme [26] showed the early formation of hydrophobic clusters, which are linked together by long-range interactions. These interactions were shown not to occur in the native structure, but they are apparently important for keeping the loop structure and thereby speeding up the folding process. The appearance of these essential features in this folding simulation permits an initial estimation of the anchor regions for loop formation. This approach therefore provides a set of structural constraints from first principles for an unknown structure. This information could be incorporated at the early steps of a prediction method for building protein structures from the sequence by producing anchor residues known to belong to the structural core. In a second stage they can be introduced as a set of distance constraints in a more detailed modeling process.
Acknowledgements
This project has been funded by a Concerted Action from the European Union, QLG2-CT-2002–01298, and by the Greek-French bilateral PLATO program (grant no. 04146WM). I. N. B. was also supported by the Post-Doctoral Fellowship of the Feinberg Graduate School, Weizmann Institute of Science.
Fig. 4. Histogram of the correspondence between TH positions and MIR from a set of 73 proteins.
Fig. 5. Histogram of the correspondence between TEF ends and MIR from a set of 111 proteins.
References
1. Kunin, V., Cases, I., Enright, A.J., de Lorenzo, V. & Ouzounis, C.A. (2003) Myriads of protein families, and still counting. Genome Biol. 4, 401.
2. Koonin, E.V., Wolf, Y.I. & Karev, G.P. (2002) The structure of the protein universe and genome evolution. Nature 420, 218–223.
3. Xia, Y. & Levitt, M. (2004) Simulating protein evolution in sequence and structure space. Curr. Opin. Struct. Biol. 14, 202–207.
4. Rost, B. (2002) Did evolution leap to create the protein universe? Curr. Opin. Struct. Biol. 12, 409–416.
5. Liu, J. & Rost, B. (2003) Domains, motifs and clusters in the protein universe. Curr. Opin. Chem. Biol. 7, 5–11.
6. Daggett, V. & Fersht, A. (2003) The present view of the mechanism of protein folding. Nat. Rev. Mol. Cell Biol. 4, 497–502.
7. Shakhnovich, E.I. (1997) Theoretical studies of protein-folding thermodynamics and kinetics. Curr. Opin. Struct. Biol. 7, 29–40.
8. Tiana, G., Shakhnovich, B.E., Dokholyan, N.V. & Shakhnovich, E.I. (2004) Imprint of evolution on protein structures. Proc. Natl Acad. Sci. USA 101, 2846–2851.
9. Berezovsky, I.N., Grosberg, A.Y. & Trifonov, E.N. (2000) Closed loops of nearly standard size: common basic element of protein structure. FEBS Lett. 466, 283–286.
10. Lamarine, M., Mornon, J.P., Berezovsky, I.N. & Chomilier, J. (2001) Distribution of tightened end fragments of globular proteins statistically match that of topohydrophobic positions: towards an efficient punctuation of protein folding? Cell. Mol. Life Sci. 58, 492–498.
11. Berezovsky, I.N., Kirzhner, V., Kirzhner, A. & Trifonov, E.N. (2001) Protein folding: looping from hydrophobic nuclei. Proteins 45, 346–350.
12. Berezovsky, I.N. (2003) Discrete structure of van der Waals domains in globular proteins. Protein Engineering 16, 161–167.
13. Berezovsky, I.N. & Trifonov, E.N. (2001) Van der Waals locks: loop-n-lock structure of globular proteins. J. Mol. Biol. 307, 1419–1426.
14. Poupon, A. & Mornon, J.P. (1998) Populations of hydrophobic amino acids within protein globular domains; identification of conserved ‘topohydrophobic’ positions. Proteins 33, 329–342.
15. Poupon, A. & Mornon, J.P. (1999) ‘Topohydrophobic positions’ as key markers of globular protein folds. Theoret. Chem. Accounts 101, 2–8.
16. Poupon, A. & Mornon, J.P. (1999) Predicting the protein folding nucleus from sequences. FEBS Lett. 452, 283–289.
17. Berezovsky, I.N. & Trifonov, E.N. (2002) Loop fold structure of proteins: resolution of Levinthal’s paradox. J. Biomol. Struct. Dynamics 20, 5–6.
18. Skolnick, J. & Kolinski, A. (1991) Dynamic Monte Carlo simulations of a new lattice model of globular protein folding, structure and dynamics. J. Mol. Biol. 221, 499–531.
19. Labesse, G., Colloc’h, N., Pothier, J. & Mornon, J.P. (1997) P-SEA: a new efficient assignment of secondary structure from C alpha trace of proteins. Comput. Appl. Biosci. 13, 291–295.
20. Miyazawa, S. & Jernigan, R.L. (1996) Residue-residue potentials with a favorable contact pair term and an unfavorable high packing density term for simulation and threading. J. Mol. Biol. 256, 623–644.
21. Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N. & Bourne, P.E. (2000) The Protein Data Bank. Nucleic Acids Res. 28, 235–242.
22. Murzin, A.G., Brenner, S.E., Hubbard, T. & Chothia, C. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536–540.
23. Fersht, A. & Daggett, V. (2002) Protein folding at atomic resolution. Cell 108, 573–582.
24. Fersht, A. (1997) Nucleation mechanisms in protein folding. Curr. Opin. Struct. Biol. 7, 3–9.
25. Ittah, V. & Haas, E. (1995) Nonlocal interactions stabilize long range loops in the initial folding intermediates of reduced bovine pancreatic trypsin inhibitor. Biochemistry 34, 4493–4506.
26. Klein-Seetharaman, J., Oikawa, M., Grimshaw, S.B., Wirmer, J., Duchardt, E., Ueda, T., Imoto, T., Smith, L.J., Dobson, C.M. & Schwalbe, H. (2002) Long-range interactions within a non-native protein. Science 295, 1719–1722.