Squash Inhibitors: From Structural Motifs to Macrocyclic Knottins
Laurent Chiche1,*, Annie Heitz1, Jean-Christophe Gelly1, Jérôme Gracy1, Pham T.T. Chau2, Phan T. Ha2, Jean-François Hernandez3 and Dung Le-Nguyen4
1 Centre de Biochimie Structurale, CNRS UMR5048, INSERM UMR554, Université Montpellier I, Faculté de Pharmacie, 15, Avenue Flahault, 34093 Montpellier, France; 2 Center for Biotechnology, Vietnam National University, 90, Nguyen Trai Street, Hanoi, Vietnam; 3 Laboratoire des Aminoacides, Peptides et Protéines, CNRS UMR5810, Universités Montpellier I & II, Faculté de Pharmacie, 15, Avenue Flahault, 34093 Montpellier, France; 4 INSERM U376, CHU Arnaud-de-Villeneuve, 371, rue du doyen Giraud, 34295 Montpellier, France
Abstract: In this article, we will first introduce the squash inhibitors, a well-established family of highly potent canonical serine proteinase inhibitors isolated from Cucurbitaceae. The squash inhibitors were among the first discovered proteins with the typical knottin fold shared by numerous peptides extracted from plants, animals and fungi. Knottins contain three knotted disulfide bridges, two of them arranged as a Cystine-Stabilized Beta-sheet motif. In contrast to cyclotides, for which no natural linear homolog is known, most squash inhibitors are linear. However, Momordica cochinchinensis Trypsin Inhibitor-I and -II (MCoTI-I and -II), 34-residue squash inhibitors isolated from seeds of a common Cucurbitaceae from Vietnam, were recently shown to be macrocyclic. In these circular squash inhibitors, a short peptide linker connects residues that correspond to the N- and C-termini in homologous linear squash inhibitors. In this review we present the isolation, characterization, chemical synthesis, and activity of these macrocyclic knottins. The solution structure of MCoTI-II will be compared with topologically similar cyclotides, homologous linear squash inhibitors and other knottins, and potential applications of such scaffolds will be briefly discussed.
Keywords: Macrocyclic proteins, knottins, inhibitor cystine knots, structural motifs, squash inhibitors, disulfide bridges, drug design, serine proteinases.
INTRODUCTION
Nature has many secrets that remain to be discovered, and the discovery of macrocyclic proteins has revealed one such new area that is set to become an important field in structural biology. Several examples of macrocyclic proteins have appeared in the last few years, strongly suggesting that many are still to come in the near future.
The cyclotide family reviewed in the preceding articles in this issue [1-3] is particularly remarkable in that it comprises a large number of proteins, all of them cyclic. Although the stability afforded by the circular feature is clear, it is unclear whether linear counterparts exist in nature and will be discovered in the future. In this field there is still much to be explained.
In this article we focus on a protein family that was discovered more than twenty years ago and, until recently, and in contrast to cyclotides, comprised only linear compounds: the squash inhibitors. The recent unexpected discovery of circular squash inhibitors from Momordica cochinchinensis, MCoTI-I and -II [4], supports the idea that macrocyclization of proteins may not be uncommon in nature and provides interesting perspectives for structural proteomics, since end-to-end cyclization, like many other post-translational modifications, cannot be simply deduced from genomic sequences.
We will first present the main historical and structural highlights of the squash inhibitors. This will be extended to the intriguing structural class of proteins known as knottins that share a similar scaffold. Our recent efforts to organize and standardize knottin data for improved analyses and comparisons will be briefly discussed.
Then, starting from this background, we will describe the macrocyclic squash inhibitors from seeds of Momordica cochinchinensis, including their discovery, isolation, sequence and structure. They will be compared to cyclotides, to linear homologs and to structurally similar linear knottins. Chemical synthesis and possible applications of this scaffold will be discussed.
*Address correspondence to this author at the Centre de Biochimie Structurale, CNRS UMR5048, INSERM UMR554, Université Montpellier I, Faculté de Pharmacie, 15, Avenue Flahault, 34093 Montpellier, France; Tel: (33)[0]-4670-43432; Fax: (33)[0]-4675-29623; E-mail: chiche@cbs.cnrs.fr
THE FAMILY OF SQUASH INHIBITORS OF SERINE PROTEINASES
The so-called 'canonical' inhibitors of serine proteinases interact with their target enzyme in a substrate-like mechanism via a binding loop with a characteristic conformation [5, 6]. Among these, the squash inhibitors of serine proteinases are small (27-34 residues) disulfide-rich proteins discovered in the late 1970s in seeds of winter squash [7]. So far, all known homologs originate from the Cucurbitaceae plant family [8]. Squash inhibitors were for some time the smallest known natural serine protease inhibitors until the recent discovery of SFTI-1, a 14-residue circular peptide, which is discussed in detail by Korsinczky et al. [9] in this issue. Association constants with various serine proteinases may be as high as 10^12 M^-1, making these inhibitors among the most potent ones [8]. As with other plant protease inhibitors, squash inhibitors are presumed to participate in defense mechanisms by conferring resistance to pests, pathogens or insects [10].
342 Current Protein and Peptide Science, 2004, Vol. 5, No. 5, Chiche et al.
Squash inhibitors contain six cysteines involved in three disulfide bridges with I-IV, II-V, III-VI connectivity. The first three-dimensional (3D) structure determinations revealed a very specific knotted scaffold achieved when one disulfide bridge (III-VI) crosses the macrocycle formed by the two other disulfides and the interconnecting peptide backbone [11-13]. This remarkably stable knotted topology had previously been observed in only one compound, the carboxypeptidase A inhibitor from potato (PCI) [14]. Since these pioneering observations, nearly one hundred proteins have been explicitly shown through structural studies to share this specific knotted scaffold.
THE KNOTTIN FOLD AND THE CSB MOTIF
Small disulfide-rich proteins sharing the disulfide connectivity and topology of the squash inhibitors are now known as knottins [15] or Inhibitor Cystine Knots [16]. However, despite similar overall topologies, it soon became apparent that the I-IV disulfide bridge is not structurally conserved between different knottin families and that only two disulfides (II-V and III-VI) are highly conserved [17]. This observation was later supported by folding experiments on the squash inhibitor EETI-II, in which it was shown that two disulfides are necessary and sufficient to stabilize most of the native structure [18-20].
The synthesis and biophysical study of the truncated EETI-II peptide Min-23, comprising only cysteines II, III, V and VI, confirmed that the elementary two-disulfide motif, which we called the Cystine Stabilized Beta-sheet (CSB) motif, is an autonomous folding unit and is the elementary structural motif in knottins [22]. Interestingly, although this motif has been shown to display high stability (Tm ~100°C), it has never been observed alone in nature, without supplementary disulfide bridges. The CSB motif is found not only in knottins but also in numerous other small disulfide-rich folds (e.g. EGF-like motifs or scorpion toxins), and is actually the most widespread disulfide motif (data not shown). It may thus be hypothesized either that an ancestral CSB-based protein once existed but was lost during evolution, or that the different CSB-based folds appeared independently as a result of convergent evolution. Relationships between the CSB motif and knottins are summarized in (Fig. 1).
As new knottins are discovered, it is becoming more and more apparent that nature has used this stable scaffold in very different contexts to achieve various biological roles. At present more than 12 protein families, with virtually no sequence identity, share the knottin fold. Due to its small size, its well-defined structure, and its high stability, this scaffold is thought to be an appealing structural template for drug design. Therefore, to facilitate analyses and comparisons, we have recently proposed a simple knottin nomenclature based on loop lengths between cysteines, and a unique knottin numbering based on cysteine connectivity and structural conservation of the CSB motif [24]. These are illustrated in (Fig. 2) and used throughout the rest of the paper.
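As an illustration of the loop-length nomenclature, the following minimal Python sketch parses a string of the form (a)b.c(d)e[f] into its loop lengths. The function names are hypothetical and are not part of the KNOTTIN database tools. The residue count assumes that each value counts only non-cysteine residues and that the peptide is cyclic with exactly six cysteines, which is consistent with the 34 residues reported for MCoTI-II, (6)5.3(1)5[8].

```python
import re

# Pattern for the (a)b.c(d)e[f] nomenclature; the [f] linker is optional
# because it only exists in cyclic members.
_NOMENCLATURE = re.compile(
    r"^\((\d+)\)(\d+)\.(\d+)\((\d+)\)(\d+)(?:\[(\d+)\])?$"
)


def parse_knottin_nomenclature(text):
    """Return loop lengths a-f as a dict; f is None for linear knottins."""
    match = _NOMENCLATURE.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid knottin nomenclature: {text!r}")
    values = [None if g is None else int(g) for g in match.groups()]
    return dict(zip("abcdef", values))


def cyclic_residue_count(loops, n_cysteines=6):
    """Total residues of a cyclic knottin (assumption: loops exclude cysteines)."""
    if loops["f"] is None:
        raise ValueError("a C-to-N linker [f] is required for a cyclic knottin")
    return sum(loops.values()) + n_cysteines


mcoti2 = parse_knottin_nomenclature("(6)5.3(1)5[8]")
print(mcoti2, cyclic_residue_count(mcoti2))
```

For linear knottins the nomenclature does not describe the N- and C-terminal tails, so the sketch deliberately refuses to compute a total length when [f] is absent.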
Moreover, a KNOTTIN database gathering information on knottins has been set up [24] and can be freely accessed on the Internet (http://knottin.cbs.cnrs.fr or http://knottin.com). Database searches by keyword, sequence, nomenclature, or geometrical pattern can be carried out, and various displays are offered. Renumbered sequences and structures, as well as structurally fitted PDB files, are available. All these tools greatly facilitate knottin analyses, and particularly sequence and structure comparisons, as shown in the succeeding sections. A sequence alignment between representative knottin sequences is shown in (Fig. 3A).
Fig. (1). From elementary CSB motif to macrocyclic knottins. The figure indicates structural relationships between knottins and non-knottin CSB-based proteins. The images of the CSB motif, the linear and circular squash inhibitors, and the cyclotides were prepared with the MOLMOL [21] and POV-Ray (http://www.povray.org/) programs using coordinates of Min-23, EETI-II, MCoTI-II and kalata B1. The (a)b.c(d)e[f] nomenclature is explained in Fig. (2). No cyclic knottins have yet been discovered for which c=0. Note that relationships in this figure do not necessarily imply evolutionary relationships (see discussion).
Fig. (2). Knottin numbering and nomenclature. An automatically drawn two-dimensional (2D) Collier de Perles representation [23, 24] of MCoTI-II is shown. The line between residues 43 and 58 is a result of the 2D representation and does not indicate a chain break. The cysteines involved in the knot are displayed with a black or grey background. Roman and Arabic numbers indicate the order in the sequence and the new unique numbering of cysteines involved in the knot, respectively. Cysteine IV (grey background) does not have a fixed number. Letters a-f refer to successive loops between cysteines of the knot, and to the number of amino acids therein. The latter values are used to establish the nomenclature {e.g. MCoTI-II: (6)5.3(1)5[8]}. Numbers between round brackets refer to peptide segments involved in the disulfide macrocycle, whereas the number between square brackets refers to the C-to-N linker in cyclic squash inhibitors or cyclotides.
Fig. (3). A. Sequence alignment between knottins. One representative sequence is shown for most knottin families except conotoxins (GVIA and gm9a) and squash inhibitors (CPTI-II, EETI-II and MCoTI-I, -II and -III). The two-disulfide peptide Min-23 corresponding to the CSB motif is shown at the bottom. Disulfide bridges of cysteines I-IV and of the CSB motif are shown on top as thin and thick lines, respectively. Additional disulfide bridges are shown as thin boxes and lines. Numbering is according to [24] and Fig. (2). The X letter in the PCI sequence stands for the ambiguous Glu/Gln residue. The "<" in the MCoTI-III sequence stands for an N-terminal pyroglutamic acid. B. Proteolytic fragments identified during MCoTI-II characterization in comparison with the EETI-II sequence.
MACROCYCLIC SQUASH INHIBITORS FROM
MOMORDICA COCHINCHINENSIS
Isolation and Characterization
MCoTI-I and -II were isolated from dormant seeds of the squash Momordica cochinchinensis (MCo), a common Cucurbitaceae in Vietnam [25]. These trypsin inhibitors (TIs), once extracted from homogenized seeds, were purified using a series of chromatographic steps including gel filtration and ion-exchange chromatography, TIs being detected by testing the collected fractions for trypsin inhibitory activity (TIA) [25]. Different TIs were separated at this stage and were further analyzed and purified using reverse-phase HPLC. Finally, six species were isolated on a mono-S column and characterized [4]. Results are summarized in (Fig. 4). The sequence of the most abundant TI (i.e. MCoTI-II) was determined first. Amino acid analysis of the compound showed that MCoTI-II was composed of 34 residues. To allow sequencing, half-cystines were reduced and alkylated. Mass spectrometry analysis of the resulting species showed that MCoTI-II contained three disulfide bonds, as is the case with all known squash inhibitors. However, all attempts to sequence the alkylated peptide remained unsuccessful, indicating a blocked N-terminus. Proteolytic digestion was thus performed with the endo-Lys-C protease, since amino acid analysis revealed the presence of three lysine residues in the sequence. Proteolysis yielded two fragments (Fig. 3B): a small one, which could be directly sequenced, and a large one, which was sub-digested using chymotrypsin (data not shown). Sequence alignment of the fragments with known squash inhibitors indicated that the twenty N-terminal residues of the large fragment significantly matched the C-terminal part of the TI consensus, while its C-terminal sequence was homologous to the N-terminal portion of squash TIs. This surprising result was strongly indicative of a macrocyclic structure. The two fragments comprised 33 residues together, thus lacking one residue. According to amino acid analysis and protease specificity, this residue had to be a lysine. Calculation of the molecular weight of a linear peptide composed of these three portions would give a mass 18 units above that measured, again suggesting the macrocyclic nature of this TI. This characteristic, as well as the sequence, was fully confirmed by digestion of the reduced/alkylated MCoTI-II using endo-Asp-N, as shown in (Fig. 3B). As there was no first or last residue, numbering was based on alignment with linear squash TIs. The last residue was considered to be the glycine residue corresponding to the conserved C-terminal glycine in linear squash inhibitors. It is likely that the sequence shown in (Fig. 3A) is contained as such in the linear precursor. Indeed, the first residues in the sequence of MCoTI-II (SGSDGGV) are clearly similar to the corresponding pro-sequence of the towel gourd trypsin inhibitor TGT-II (SGRHGGI) [26].
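The 18-unit mass argument can be made concrete with a small calculation: head-to-tail cyclization removes one water molecule (about 18.02 Da) relative to the linear peptide, and each disulfide bond removes two hydrogens. The sketch below is only an illustration under stated assumptions: it uses standard average residue masses, the function name is hypothetical, and the example peptide is a made-up toy sequence, not one of the MCoTI fragments.

```python
# Standard average residue masses (Da) for the 20 common amino acids.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153      # average mass of H2O
HYDROGEN = 1.00794   # average mass of H


def average_mass(sequence, cyclic=False, disulfides=0):
    """Average mass of a peptide; hypothetical helper for illustration."""
    mass = sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper())
    if not cyclic:
        mass += WATER  # free N- and C-termini add one water
    mass -= 2 * HYDROGEN * disulfides  # each S-S bond loses two hydrogens
    return mass


toy = "GCKDGC"  # made-up sequence, used only to show the mass difference
linear = average_mass(toy, cyclic=False, disulfides=1)
cyclic = average_mass(toy, cyclic=True, disulfides=1)
print(round(linear - cyclic, 4))
```

Whatever the sequence, the linear and head-to-tail cyclic forms differ by one water, which is the difference the authors used as evidence for a macrocycle.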
Other species were identified using a similar approach. Firstly, the species isolated from peak D, with a mass identical to that of MCoTI-II, and that isolated from peak F, with a mass 18 units below that of MCoTI-II, were shown to be isomeric forms of MCoTI-II. They were found to arise from rearrangement of an Asp-Gly peptide bond located in the C-to-N linker, when compared with linear homologs. The former contained a β-Asp-Gly bond. The latter species corresponded to the succinimide cyclic intermediate (aspartimide) formed during conversion of the α-Asp-Gly peptide bond (MCoTI-II) into the β-Asp-Gly bond. The unusual stability of the succinimide moiety might arise from the overall constrained structure of the macrocyclic peptide. The species contained in peak B, MCoTI-I, was also shown to be macrocyclic. As shown in (Fig. 3), its sequence differed from that of MCoTI-II at two positions close to the reactive site. As for MCoTI-II, a species with a mass identical to that of MCoTI-I was isolated from peak A. Although it has not been fully characterized, it is likely that this species also derived from an Asp-Gly bond rearrangement.
Finally, a third inhibitor, MCoTI-III, was identified from peak F. Although the reduced/alkylated species could not be directly sequenced, this appeared not to be due to macrocyclization but to the presence of an N-terminal pyroglutamate residue. After removal of pyroglutamate using a pyroglutamyl aminopeptidase, MCoTI-III was shown to be a regular linear member of the squash inhibitor family (Fig. 3A).
At this time MCoTI-I and -II are the only known cyclic squash inhibitors, and the reason why remains to be determined. The cyclization of MCoTIs might depend on the presence of a specific but unknown transpeptidase in Momordica cochinchinensis seeds that would be absent in other sources of squash inhibitors. Alternatively, minor macrocyclic TIs in other Cucurbitaceae might not have been detected due to sequencing difficulties. In fact, contrary to cyclotides, circularization was not expected in squash inhibitors and could have been missed before the discovery of MCoTIs, raising the possibility that Cucurbitaceae produce both linear and cyclic inhibitors, with only the former being directly sequenceable. Indeed, just after we determined the sequence of MCoTI-II, another group reported the partial sequence of a TI isolated from Momordica cochinchinensis [27]. The major mass measured on this sample was close to 3480, corresponding to that of MCoTI-I, probably the major inhibitor in the sample. However, the reported sequence was that of a contaminant corresponding to a cleaved form of MCoTI-II. Clearly, even if the macrocyclic TI was the major species in the sample, only a linear sequenceable species was reported.
Fig. (4). Isolation of MCoTI-I, -II, and -III. Ion-exchange chromatography on a mono-S column using a NaCl gradient was performed on compounds with TIA eluted from the gel filtration column [4, 25]. Several peaks containing TIA, labeled A to F, were collected. Average masses of the major compounds contained in these peaks have been measured by electrospray mass spectrometry.
Chemical Synthesis of MCoTI-I
The potential interest in cyclic MCoTIs as scaffolds for drug design (see below) prompted us to perform chemical synthesis. Chemical synthesis and folding of circular peptides containing multiple disulfide bonds is a complicated process. Two main strategies have been reported, depending on whether the disulfide bridges or the backbone cyclization is formed first: (i) After classical chain assembly and work-up, the linear precursor peptide is first oxidized, then cyclization is achieved via conventional coupling procedures [28]. This approach, in which folding favors cyclization by bringing the N- and C-termini into close proximity, is not fully compatible with Lys- and Asp-containing peptides, as these residues may undergo undesirable couplings during the cyclization step. (ii) Cyclization is achieved prior to oxidation, either directly on the resin or via a C-terminal thioester [29]. Whether in vivo cyclization of MCoTIs occurs after or before disulfide bond formation remains unknown, although the former process, which would bring the N- and C-termini of the linear precursor into close proximity, appears more likely. Nevertheless, we have synthesized cyclic MCoTI-I using the second approach and then using a new simpler protocol [30].
Our first synthesis of MCoTI-I was based on the second approach, and used the thioester ligation procedure described by Tam & Lu [29]. This approach consists of several steps that can be summarized as follows: (i) Peptide elongation was initiated with Boc-Ile79-CO-S-CH2CH2CO-MBHA resin. Position 79 was selected in order to favor the intramolecular transthioesterification (via the thiol of Cys80) with the thioester, leading to head-to-tail cyclization. Cysteine side-chains were protected by methylbenzyl (Meb) groups at positions 40, 80, 60, 100 (the cysteines of the CSB motif), and acetamidomethyl (Acm) groups at positions 20 and 78 (Figs. 2 and 3). (ii) After cleavage with HF and classical work-up, the linear deprotected peptide (except Cys(Acm)) was submitted to cyclization. (iii) Oxidation of cysteines of the CSB motif (40, 80, 60, 100) was performed in the presence of DMSO, followed by a rapid HPLC purification. (iv) The last disulfide bridge (cysteines 20 and 78) was formed (I2/MeOH), and MCoTI-I (major peak) was finally obtained by HPLC purification (mass: 3479.29, expected: 3478.51). The trypsin inhibitory activity of the synthetic product was comparable to that of native MCoTI-I, as assessed by a qualitative agar-agar dish assay (with edestin as substrate) based on the method described by Leluk & Pham [31].
In our second preparation of MCoTI-I, Boc-Ala63-CO-S-CH2CH2CO-MBHA resin was used as starting material. All six Cys residues were introduced as Meb derivatives. Elongation of the peptide was achieved according to the method described earlier [32]. After HF cleavage, the crude peptide was dissolved in water containing 10% CH3CN, and the solution was stirred for 24 h (pH 8). The major peak obtained by HPLC purification coeluted with the sample of MCoTI-I obtained from the first synthesis (mass: 3479.57). In this method, cyclization and oxidation of all six cysteines were performed in a single step. Despite its simplicity, this new approach afforded a slightly better yield than the tedious step-by-step approach of Tam & Lu [29]. The good yield of native protein is likely a result of the strong tendency of MCoTI-I to form native disulfide bridges, as previously observed for EETI-II [32]. Much simpler and more cost-effective, this approach opens new routes to the chemical synthesis of cyclic squash inhibitors. It would be interesting to see if it can be applied to other macrocyclic knottins as well.
Three-dimensional Structure of MCoTI-II, and Comparison with the Cyclotide Kalata B1
Sufficient quantities of MCoTI-II were gathered from the natural source for structural studies, and the solution structure was solved by NMR simultaneously by our group and by David Craik's group [33, 34]. Not surprisingly, the structure of MCoTI-II is very close to that of linear homologs. As shown in (Fig. 1), all typical structural elements of the squash inhibitors are present in MCoTI-II: the triple-stranded β-sheet and the two disulfide bridges that define the CSB motif, as well as the short 3_10 helix and the two β-turns. The root mean square (rms) deviation for superimposition of the backbone atoms of residues 40, 60-61, 79-81 and 99-100 (knottin numbering, see Fig. 2) of the core CSB motif of MCoTI-II [33] onto the reference X-ray knottin structure (CPTI-II, PDB ID 2btc, chain I) is as low as 0.35 Å (see the KNOTTIN database). Extending the superimposition to the whole segment 40-100 (i.e. from the second to the last cysteine) or to the 20-100 segment (i.e. from the first to the last cysteine, including the somewhat flexible inhibitory loop) leads to rms deviations of 0.76 Å and 0.85 Å, respectively. These values are sufficiently low to claim that the C-to-N cyclization in MCoTI-II has no significant impact on the protein structure.
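For readers who wish to reproduce this kind of comparison, the sketch below computes an rms deviation after optimal rigid-body superposition of two matched sets of atom coordinates. The text does not state which fitting algorithm was used; the Kabsch least-squares method is assumed here as a common choice, NumPy is assumed to be available, and the function name and toy coordinates are hypothetical. In practice the inputs would be the backbone atoms of the selected residues (e.g. 40, 60-61, 79-81 and 99-100) extracted from the two PDB files.

```python
import numpy as np


def kabsch_rmsd(mobile, reference):
    """RMSD (same units as the input) after optimal superposition.

    Both arguments are (N, 3) arrays of matched atom coordinates.
    """
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(reference, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two matching (N, 3) coordinate arrays")
    # Remove the translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate the mobile set onto the reference and measure the deviation.
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# Toy check: a rotated and shifted copy superimposes with RMSD close to 0.
coords = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.2, 0.3],
                   [0.2, 2.0, 1.0], [-1.0, 0.5, 2.0]])
angle = np.radians(30.0)
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
moved = coords @ rot_z.T + np.array([3.0, -2.0, 1.0])
print(kabsch_rmsd(moved, coords))
```

Restricting the fit to the CSB core, then extending it to longer segments as done in the text, only changes which atoms are passed in; the superposition itself is the same.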
Actually, the C-to-N linker appeared as the most flexible part of the molecule, rather than as a stabilizing element [33, 34], and macrocyclization is not likely to be an essential element of MCoTI-II stability. However, no linearization has yet been reported for MCoTI-II for comparison, and the details of the impact of macrocyclization on the structure and stability of MCoTI-II remain to be determined. Nevertheless, the isolation of the linear homologous trypsin inhibitor MCoTI-III (77% identity with MCoTI-II), also from seeds of Momordica cochinchinensis, clearly demonstrates that cyclization is unnecessary for trypsin inhibition. Thus the cyclic feature of MCoTI-II is a determinant of neither the stability nor the activity of the molecule. It is worth noting here that pseudo-cyclizations do occur in most squash TIs via salt-bridging between the side-chain of an N-terminal arginine and the C-terminal carboxylate [11, 35-38]. Nevertheless, the biological role of cyclization in MCoTI-I and -II remains unclear. The melting temperature of EETI-II was shown to be approximately 140°C, revealing the extremely high stability of linear squash TIs [22]. Even the truncated two-disulfide Min-23 peptide (Figs. 1 and 3) displays a high stability (Tm ~100°C) [22]. Therefore, the most likely significant impact of cyclization would be on resistance to exoproteases through removal of the protein N- and C-termini, and that might be relevant for the biological role of MCoTIs.
Interestingly, a quite different scenario has been reported for the topologically very similar kalata B1 cyclotide. It has been shown that linearization of this peptide induces only limited disruption of structural features but a total loss of hemolytic activity [39], suggesting that, in this case, cyclization affords a slight but necessary stabilization. Simple comparison of the loop [f] sequences in MCoTI-II and kalata B1 supports these observations. The 8-residue linker in MCoTI-II contains four glycines and no prolines, while the 7-residue linker in kalata B1 contains one proline and only one glycine. Since glycines and prolines are the most and least flexible of all residues, respectively, this simple sequence analysis is consistent with the linker in MCoTI-II being more flexible. Although these differences appear very subtle, they might contain part of the explanation why linear squash inhibitors are common, whereas natural linear analogs of cyclotides are unknown.
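The linker comparison above amounts to counting glycines and prolines in loop [f]. A minimal Python sketch follows; the function name is hypothetical, and the MCoTI-II linker string is reconstructed here as an assumption from two statements in the text: the last residue is the conserved glycine, and the first residues of the sequence are SGSDGGV.

```python
def linker_flexibility_profile(linker):
    """Count the residues used in the text as crude flexibility indicators."""
    linker = linker.upper()
    return {
        "length": len(linker),
        "glycines": linker.count("G"),  # most flexible residue
        "prolines": linker.count("P"),  # least flexible residue
    }


# Assumed reconstruction of the MCoTI-II C-to-N linker (loop [f]):
# conserved last glycine followed by the first residues SGSDGGV.
mcoti2_linker = "G" + "SGSDGGV"
print(linker_flexibility_profile(mcoti2_linker))
```

The result agrees with the description of an 8-residue linker containing four glycines and no prolines. The kalata B1 linker is not spelled out in this section, so it is left out of the sketch.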
Although cyclic squash inhibitors and cyclotides share a common topology, several differences can be observed apart from the C-to-N linker. In contrast to cyclotides, the biological activity of the squash inhibitors is most certainly inhibition of serine proteases. Accordingly, the canonical inhibitory binding loop, i.e. loop (a), is well defined and conserved. Although possible inhibitory activity of kalata B1 has been examined once, no canonical loop can be recognized in cyclotides, and loop (a) is shorter (3 residues), playing a structural rather than functional role through H-bonding of the Glu22 side-chain with backbone amides [40]. Conversely, loop b is highly hydrophilic in squash inhibitors, with a structural role due to H-bonding of the side chains of Asp43 and Asp59, but rather hydrophobic in cyclotides, with a potential biological role. These observations support previous analyses suggesting that different sequences and different stabilizing interactions can give rise to highly similar 3D structures [41].
WHY DO ALL KNOTTIN FAMILIES NOT HAVE CYCLIC MEMBERS?
Knottins define a very intriguing structural class of small disulfide-rich proteins. Their scaffold is very small, yet remarkably stable thanks to three knotted disulfides. The topological organization places the N- and C-termini in rather close proximity, since they lie at the same end of two adjacent strands of an anti-parallel β-sheet. This proximity has allowed circularization by natural head-to-tail ligation in two knottin families, the cyclotides and the squash inhibitors. Interestingly, it has long been observed that a surprisingly high fraction of proteins have N- and C-termini close to each other [42]. This feature has been used in many cases to perform non-native circularization [43, 44] or circular permutation of protein sequences [45, 46]. Starting from a representative ensemble of 2169 protein structures [47], we have performed a brief analysis of distances between N- and C-termini. The resulting distribution, shown in (Fig. 5), indicates that a significant number of proteins have N- and C-termini within 15-20 Å, a distance that can easily be spanned by linkers of just a few residues.
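An analysis of this kind can be sketched in a few lines of Python. The text does not specify which atoms or which bin width were used, so both are assumptions here, and the function names and toy distances are hypothetical; real input would be terminal-atom coordinates read from each structure in the ensemble.

```python
import math
from collections import Counter


def termini_distance(n_term_xyz, c_term_xyz):
    """Euclidean distance (Å) between two terminal atoms."""
    return math.dist(n_term_xyz, c_term_xyz)


def distance_histogram(distances, bin_width=5.0):
    """Map (bin_start, bin_end) to the number of distances in that bin."""
    counts = Counter(int(d // bin_width) for d in distances)
    return {
        (k * bin_width, (k + 1) * bin_width): counts[k]
        for k in sorted(counts)
    }


# Made-up distances, for illustration only.
toy_distances = [4.0, 9.0, 18.0, 19.5]
print(termini_distance((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))
print(distance_histogram(toy_distances))
```

Bins are half-open intervals, so a distance of exactly 5.0 Å falls in the 5-10 Å bin with the default width.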
Fig. (5). Histogram of distances between N- and C-termini in a representative ensemble of protein structures.
Nevertheless, the discovery of cyclic MCoTIs has revealed the first family where nature has used this strategy of circularizing a few homologs by connecting proximal N- and C-termini through a short peptide linker, whereas most members remain linear. It is striking, however, that similar circularization does not occur in other knottin families, which contain only linear members. Comparison of squash inhibitors and cyclotides with linear knottin families provides potential clues on circularization, based on the different location of Cys IV between families. The two known families that include circular members, the cyclotides and the squash inhibitors, have a cysteine in position 78 (knottin numbering, Figs. 2 and 3), i.e. near Cys V, whereas it is in position 61, i.e. adjacent to Cys III, in most other families (Fig. 3). Since Cys I, which is close to the N-terminus, is disulfide-linked with Cys IV, the displacement of Cys IV from one end of loop c to the other places the N-terminus in quite different locations, and significantly modifies the distance between termini. Thus, the distance between the amide of residue 19 and the carbonyl of residue 101 in the linear (CPTI-II, PDB ID: 2btc, chain I) and cyclic (MCoTI-II, PDB IDs: 1ha9, 1ib9) squash inhibitors is about 9 Å. This distance is about twice as large (18 Å) in omega-agatoxin IVB (Fig. 3, PDB ID: 1agg), and this difference is roughly conserved over other squash inhibitors and spider toxins. Thus it may be postulated that circularization is easier for knottins with nomenclatures such that c>(d) {e.g. 1ha9: (6)5.3(1)5[8]} than when the reverse is true {e.g. 1agg: (7)6.0(4)10}. Although chemical circularization of a conotoxin has been reported [48], the detailed impact on the conotoxin structure is not yet available.
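The postulated rule can be expressed as a simple test on the nomenclature string. The sketch below is illustrative only: the function names are hypothetical, and the rule itself is the authors' postulate rather than an established predictor.

```python
import re

# Captures loop c and loop (d) from the (a)b.c(d)e[f] nomenclature.
_C_AND_D = re.compile(r"^\(\d+\)\d+\.(\d+)\((\d+)\)\d+(?:\[\d+\])?$")


def loop_c_and_d(nomenclature):
    """Return the lengths of loops c and d as a tuple of integers."""
    match = _C_AND_D.match(nomenclature.strip())
    if match is None:
        raise ValueError(f"not a valid knottin nomenclature: {nomenclature!r}")
    return int(match.group(1)), int(match.group(2))


def fits_circularization_postulate(nomenclature):
    """True when c > d, i.e. Cys IV lies near Cys V rather than Cys III."""
    c, d = loop_c_and_d(nomenclature)
    return c > d


for pdb_id, name in [("1ha9", "(6)5.3(1)5[8]"), ("1agg", "(7)6.0(4)10")]:
    print(pdb_id, fits_circularization_postulate(name))
```

With the two examples quoted in the text, the cyclic squash inhibitor (c=3, d=1) satisfies the condition, while omega-agatoxin IVB (c=0, d=4) does not.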
Nevertheless, the observation of macrocyclic squash inhibitors opens new perspectives in the application of the knottin scaffold in drug design, and these are discussed briefly below.
POTENTIAL APPLICATIONS OF (CYCLIC) SQUASH
INHIBITORS AND ANALOGS IN DRUG-DESIGN
Although the role of the macrocyclization remains unclear, an obvious advantage is that it confers resistance to exopeptidases. Together with knotted disulfide bridges, these constraints render the macrocyclic peptides highly stable. As discussed by Craik et al. in this issue [1], several cyclotides were shown to be resistant to proteases and to boiling, and kalata B1, the active component of extracts used in traditional medicine, appears to be orally active. Similarly, MCoTI-II was shown to be resistant to thermolysin at 50°C and to heat treatment of the seeds [4], and thus also represents a very interesting scaffold in drug-design approaches.
Squash inhibitors are small but very potent serine proteinase inhibitors, and it has been shown that mutation at or near the P1 site allows generation of potent inhibitors of serine proteinases of medical interest, e.g. neutrophil elastase, which is involved in several diseases (emphysema, cystic fibrosis or rheumatoid arthritis) [15, 49, 50]. This could be extended to other serine proteinases of pharmacological interest, such as coagulation factors and other proteases of the clotting cascade, or matriptase, which is involved in cancer. Using the cyclic nature of MCoTIs in similar approaches would certainly improve the bioavailability of the new molecules.
Although more speculative, a still much greater potential can be expected from using the MCoTI scaffold as a stable and protease-resistant structural template onto which new biological activities could be transferred. Several strategies in this direction have already been reported using the linear squash inhibitor EETI-II or the elementary CSB motif. These studies suggest that the homologous cyclic MCoTI-I or -II peptides could be easily modified to engineer small, stable molecules with new, selected activity. These approaches are summarized in (Fig. 6).
A pioneering study transferred the C-terminal sequence of PCI onto EETI-II, resulting in a double-headed inhibitor of trypsin and carboxypeptidase [15, 17]. More recently, the primary trypsin binding loop (a) of EETI-II was replaced by either a 13- or a 17-residue epitope from the Sendai virus L-protein or the human bone Gla-protein, respectively, and the chimeric peptides were displayed on the Escherichia coli outer membrane as fusion proteins [51]. In another work, the same binding loop was replaced by a sequence derived from the third domain of the turkey ovomucoid inhibitor and optimized to inhibit porcine pancreatic elastase [52]. Finally, this same loop was also subjected to randomization in an mRNA display approach [53].
Fig. (6). Summary of drug design reports based on the EETI-II squash inhibitor scaffold.
The second β-turn of EETI-II, i.e. loop e, has also been
the subject of several studies.
First, selection of trypsin binders from a phage-displayed
library with four randomized positions in loop e showed that
only a few sequences can be accommodated in this loop [54].
Furthermore, circular permutation of EETI-II, in which the
termini are linked by a (Gly)3 tripeptide and loop e is
cleaved, yielded a correctly folded compound with
native-like structure (unpublished results). This result shows that
loop e is not essential for folding, although its transfer into
homologous CMTI-III was shown to improve synthesis yield
[55]. Finally, grafting a new residue sequence, taken from
the SH3 RT loop of an HIV-1 Nef-binding kinase, onto the
same loop in Min-23 resulted in a correctly folded chimera
with a conserved CSB motif [56]. All these studies clearly
demonstrate that loops (a) and e of linear squash inhibitors
can accommodate large sequence modifications. Moreover,
comparison of linear and circular squash inhibitors strongly
suggests that the head-to-tail linker, i.e. loop [f], is not
essential for correct folding of the circular compounds and
can also be varied significantly. This was indeed verified by
the correct folding of the circular permutant of EETI-II with
a (Gly)3 loop [f] (unpublished results). Overall, these studies
clearly demonstrate that the scaffold of the squash inhibitors
(and probably of many other knottins) is indeed a very
promising template for drug design, and this is further
reinforced by the recent discovery of cyclic squash
inhibitors. There is no doubt that new molecules developed
using one of the above approaches would benefit from the
increased stability and protease resistance afforded by
cyclization. In this context, it is worth noting that the CSB
motif itself has N- and C-termini in extremely close
proximity, as exemplified in the Min-23 peptide shown in
(Fig. 1), but no cyclization of this kind of compound has yet
been tested.
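At the level of the amino acid sequence alone, the circular permutation discussed above (termini joined by a linker, backbone opened elsewhere) can be illustrated with a minimal Python sketch. The function names, the toy sequence and the cut position are hypothetical and chosen only for illustration; they do not reproduce the EETI-II construct, and the sketch says nothing about folding or disulfide pairing.

```python
def circular_permutant(linear_seq: str, linker: str, cut_after: int) -> str:
    """Return the sequence obtained by joining the termini of `linear_seq`
    with `linker` and re-opening the backbone after residue `cut_after`
    (0-based index into `linear_seq`)."""
    if not 0 <= cut_after < len(linear_seq) - 1:
        raise ValueError("cut_after must leave at least one residue on each side")
    # Conceptual ring: original sequence followed by the linker, which
    # connects the old C-terminus back to the old N-terminus.
    ring = linear_seq + linker
    # The new N-terminus is the residue just after the cut; read once
    # around the ring from there.
    return ring[cut_after + 1:] + ring[:cut_after + 1]


def same_ring(seq_a: str, seq_b: str) -> bool:
    """True if the two linear sequences are rotations of one another,
    i.e. they would give the same macrocycle once head-to-tail closed."""
    return len(seq_a) == len(seq_b) and seq_b in seq_a + seq_a


if __name__ == "__main__":
    toy = "ACDEFGHIKL"  # made-up 10-residue sequence, not a real inhibitor
    permuted = circular_permutant(toy, "GGG", cut_after=6)
    print(permuted)
    print(same_ring(toy + "GGG", permuted))
```

The `same_ring` check expresses the point made in the text: the permutant and the linker-closed parent share the same cyclic sequence and differ only in where the backbone is open.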
DIVERGENT OR CONVERGENT EVOLUTION?
An interesting question remains open to discussion, i.e.
the putative evolutionary relationship between different
knottin families and, more specifically, between cyclic
knottins, i.e. squash inhibitors and cyclotides. The first part
of the question has been tentatively addressed in a recent
paper [57]. Based on gene organization and 3D structure, the
authors suggest the existence of two different ancestors for
knottins from plants and from animals. The structural
criterion used was that animal knottins have a c=0 loop,
whereas plant knottins have a c≠0 loop. However, although
this mostly holds, it can be seen from (Fig. 1) that
several knottins contradict this proposal: on the one hand,
conotoxin gm9a has a c=3 loop, whereas on the other hand
Gurmarin from Gymnema sylvestre, α-amylase inhibitor
from Amaranthus hypochondriacus, and PAFP-S from
Phytolacca americana display a c=0 loop. Nevertheless,
although many knottin families display sequences with no
detectable relationship, it is likely from sequence
comparisons (data not shown) that at least some knottin
families are evolutionarily related, for example toxins from
cone snails and toxins from spiders. Alternatively, it might
be considered that the two-disulfide CSB motif is simply a
stable structural arrangement often found in small proteins,
just as helix bundles or β-sheet Greek-key motifs are
observed in many unrelated globular proteins. All known CSB-based proteins, however, display at least one additional disulfide. But there may not be an infinite number of ways to add a supplementary disulfide, and creating a knottin may be one of the most stabilizing ways. In other words, it is tempting to speculate that many knottins have actually evolved to the same fold as a result of convergent evolution. Considering the absence of any significant sequence homology, the possibility that cyclotides from the Rubiaceae and Violaceae plant families and circular squash inhibitors from the Cucurbitaceae plant family have evolved by convergent evolution cannot be ruled out. Examples of evolutionary convergence of shape are well known at the morphological level, e.g. between the succulent euphorbia of Africa and the cacti of the Americas. Nevertheless, the intriguing structural proximity between the two families suggests evolutionary relationships [34]. In line with this, it can be noted that the Violaceae and Cucurbitaceae families are particularly close
in taxonomy (common taxonomy in SwissProt [58] for Violaceae and Cucurbitaceae: Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; rosids; eurosids I), but a definite answer regarding the existence of
a common ancestor should probably await the discovery of many more protein and genetic sequences in these or related families.
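The c=0 versus c≠0 criterion discussed in this section amounts to counting the residues between consecutive cysteines. The Python sketch below illustrates that count under a stated assumption: loops a to e are taken to be the segments between the six successive knottin cysteines, so that loop c lies between the third and fourth cysteine. The function name and both sequences are invented for illustration and are not real knottin sequences.

```python
def knottin_loop_lengths(seq: str) -> dict[str, int]:
    """Return the number of residues in loops a-e, i.e. the segments
    between the six successive cysteines of a knottin-like sequence."""
    cys = [i for i, residue in enumerate(seq) if residue == "C"]
    if len(cys) != 6:
        raise ValueError(f"expected exactly 6 cysteines, found {len(cys)}")
    # Loop k spans the residues strictly between cysteine k and cysteine k+1.
    return {name: cys[k + 1] - cys[k] - 1 for k, name in enumerate("abcde")}


if __name__ == "__main__":
    # Made-up sequences: the first has adjacent third and fourth
    # cysteines (c=0), the second has three residues between them (c=3).
    toy_c0 = "GCKKKKKKCSSSSCCTTTTCGGGGCG"
    toy_c3 = "GCKKKKKKCSSSSCAAACTTTTCGGGGCG"
    for label, seq in (("toy_c0", toy_c0), ("toy_c3", toy_c3)):
        loops = knottin_loop_lengths(seq)
        print(label, loops, "c=0" if loops["c"] == 0 else "c≠0")
```

Such a count only classifies sequences by loop length; as noted in the text, the length of loop c alone does not settle the evolutionary origin of a knottin.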
ABBREVIATIONS
2D = Two-dimensional
3D = Three-dimensional
Acm = Acetamidomethyl
CSB = Cystine-Stabilized Beta-sheet
CPTI = Cucurbita pepo Trypsin Inhibitor
DMSO = Dimethylsulfoxide
EETI-II = Ecballium elaterium Trypsin Inhibitor-II
HPLC = High performance liquid chromatography
MBHA = Methylbenzhydrylamine
MCoTI = Momordica cochinchinensis Trypsin Inhibitor
Meb = Methylbenzyl
rms = Root mean square
NMR = Nuclear magnetic resonance
PCI = Potato Carboxypeptidase Inhibitor
PDB = Protein Data Bank
TCEP = Tris(2-carboxyethyl)phosphine
TI = Trypsin Inhibitor
TIA = Trypsin inhibitory activity
REFERENCES
[1] Craik, D. J., Daly, N. L., Plan, M. R., Trabi, M., and Mulvenna, J. (2004) Curr. Prot. & Pept. Sci., 5, 297-315.
[2] Göransson, U., Svangård, E., Clæson, P., and Bohlin, L. (2004) Curr. Prot. & Pept. Sci., 5, 317-329.
[3] Gustafson, K. R., McKee, T. C., and Bokesch, H. R. (2004) Curr. Prot. & Pept. Sci., 5, 331-340.
[4] Hernandez, J. F., Gagnon, J., Chiche, L., Nguyen, T. M., Andrieu, J. P., Heitz, A., Trinh Hong, T., Pham, T. T., and Le Nguyen, D. (2000) Biochemistry, 39, 5722-30.
[5] Bode, W., and Huber, R. (1992) Eur. J. Biochem., 204, 433-51.
[6] Laskowski, M., Jr., and Kato, I. (1980) Annu. Rev. Biochem., 49, 593-626.
[7] Polanowski, A., Wilusz, T., Nienartowicz, B., Cieslar, E., Slominska, A., and Nowak, K. (1980) Acta Biochim. Pol., 27, 371-82.
[8] Otlewski, J., and Krowarsch, D. (1996) Acta Biochim. Pol., 43, 431-44.
[9] Korsinczky, M. L., Schirra, H. J., and Craik, D. J. (2004) Curr. Prot. & Pept. Sci., 5, 351-364.
[10] Konarev, A. V., Anisimova, I. N., Gavrilova, V. A., Vachrusheva, T. E., Konechnaya, G. Y., Lewis, M., and Shewry, P. R. (2002) Phytochemistry, 59, 279-91.
[11] Bode, W., Greyling, H. J., Huber, R., Otlewski, J., and Wilusz, T. (1989) FEBS Lett., 242, 285-92.
[12] Heitz, A., Chiche, L., Le-Nguyen, D., and Castro, B. (1989) Biochemistry, 28, 2392-8.
[13] Chiche, L., Gaboriaud, C., Heitz, A., Mornon, J. P., Castro, B., and Kollman, P. A. (1989) Proteins, 6, 405-17.
[14] Rees, D. C., and Lipscomb, W. N. (1982) J. Mol. Biol., 160, 475-98.
[15] Le Nguyen, D., Heitz, A., Chiche, L., Castro, B., Boigegrain, R. A., Favel, A., and Coletti-Previero, M. A. (1990) Biochimie, 72, 431-5.
[16] Pallaghy, P. K., Nielsen, K. J., Craik, D. J., and Norton, R. S. (1994) Protein Sci., 3, 1833-9.
[17] Chiche, L., Heitz, A., Padilla, A., Le-Nguyen, D., and Castro, B. (1993) Protein Eng., 6, 675-82.
[18] Heitz, A., Chiche, L., Le-Nguyen, D., and Castro, B. (1995) Eur. J. Biochem., 233, 837-46.
[19] Heitz, A., Le-Nguyen, D., Castro, B., and Chiche, L. (1997) Lett. Pept. Sci., 4, 245-9.
[20] Le-Nguyen, D., Heitz, A., Chiche, L., el Hajji, M., and Castro, B. (1993) Protein Sci., 2, 165-74.
[21] Koradi, R., Billeter, M., and Wuthrich, K. (1996) J. Mol. Graph., 14, 51-5, 29-32.
[22] Heitz, A., Le-Nguyen, D., and Chiche, L. (1999) Biochemistry, 38, 10615-25.
[23] Lefranc, M. P., Giudicelli, V., Ginestoux, C., Bodmer, J., Muller, W., Bontrop, R., Lemaitre, M., Malik, A., Barbie, V., and Chaume, D. (1999) Nucleic Acids Res., 27, 209-12.
[24] Gelly, J.-C., Gracy, J., Kaas, Q., Le Nguyen, D., Heitz, A., and Chiche, L. (2004) Nucleic Acids Res., 32, D156-D159.
[25] Pham, T. C., and Nguyen, T. M. (1996) VNU Journal of Science, Nat. Sci. (Vietnamese, English summary), 33-41.
[26] Ling, M. H., Qi, H. Y., and Chi, C. W. (1993) J. Biol. Chem., 268, 810-4.
[27] Huang, B., Ng, T. B., Fong, W. P., Wan, C. C., and Yeung, H. W. (1999) Int. J. Biochem. Cell Biol., 31, 707-15.
[28] Daly, N. L., Love, S., Alewood, P. F., and Craik, D. J. (1999) Biochemistry, 38, 10606-14.
[29] Tam, J., and Lu, Y.-A. (1997) Tetrahedron Lett., 38, 5599-602.
[30] Le Nguyen, D., Barry, L. G., Tam, J. P., Heitz, A., Chiche, L., Hernandez, J. F., and Pham, T. C. (2002) in Peptides 2002 (Benedetti, E., and Pedone, C., Eds.) pp. 182-183, Edizioni Ziino, Napoli, Italy.
[31] Leluk, J., and Pham, T. T. C. (1985) in XXIst Meeting of Polish Biochemical Society, p. 139, Krakow, Poland.
[32] Le-Nguyen, D., Nalis, D., and Castro, B. (1989) Int. J. Pept. Protein Res., 34, 492-7.
[33] Heitz, A., Hernandez, J. F., Gagnon, J., Hong, T. T., Pham, T. T., Nguyen, T. M., Le-Nguyen, D., and Chiche, L. (2001) Biochemistry, 40, 7973-83.
[34] Felizmenio-Quimio, M. E., Daly, N. L., and Craik, D. J. (2001) J. Biol. Chem., 276, 22875-82.
[35] Helland, R., Berglund, G. I., Otlewski, J., Apostoluk, W., Andersen, O. A., Willassen, N. P., and Smalas, A. O. (1999) Acta Crystallogr. D Biol. Crystallogr., 55, 139-48.
[36] Huang, Q., Liu, S., and Tang, Y. (1993) J. Mol. Biol., 229, 1022-36.
[37] Zhu, Y., Huang, Q., Qian, M., Jia, Y., and Tang, Y. (1999) J. Protein Chem., 18, 505-9.
[38] Thaimattam, R., Tykarska, E., Bierzynski, A., Sheldrick, G. M., Jaskolski, M., Zhu, Y., Huang, Q., Qian, M., Jia, Y., Tang, Y., Liu, S., Bode, W., Greyling, H. J., Huber, R., Otlewski, J., Wilusz, T., Helland, R., Berglund, G. I., Apostoluk, W., Andersen, O. A., Willassen, N. P., and Smalas, A. O. (2002) Acta Crystallogr. D Biol. Crystallogr., 58, 1448-61.
[39] Barry, D. G., Daly, N. L., Clark, R. J., Sando, L., and Craik, D. J. (2003) Biochemistry, 42, 6688-95.
[40] Rosengren, K. J., Daly, N. L., Plan, M. R., Waine, C., and Craik, D. J. (2003) J. Biol. Chem., 278, 8606-16.
[41] Laurents, D. V., Subbiah, S., and Levitt, M. (1994) Protein Sci., 3, 1938-44.
[42] Thornton, J. M., and Sibanda, B. L. (1983) J. Mol. Biol., 167, 443-60.
[43] Iwai, H., and Pluckthun, A. (1999) FEBS Lett., 459, 166-72.
[44] Goldenberg, D. P., and Creighton, T. E. (1983) J. Mol. Biol., 165, 407-13.
[45] Graf, R., and Schachman, H. K. (1996) Proc. Natl. Acad. Sci. USA, 93, 11591-6.
[46] Hennecke, J., Sebbel, P., and Glockshuber, R. (1999) J. Mol. Biol., 286, 1197-215.
[47] Jones, D. T. (1999) J. Mol. Biol., 292, 195-202.
[48] Craik, D., Daly, N. L., and Nielsen, K. J. (2000) PCT International Patent Application WO 0015654.
[49] Rolka, K., Kupryszewski, G., Ragnarsson, U., Otlewski, J., Wilusz, T., and Polanowski, A. (1989) Biol. Chem. Hoppe Seyler, 370, 499-502.
[50] Rozycki, J., Kupryszewski, G., Rolka, K., Ragnarsson, U., Zbyryt, T., Krokoszynska, I., Wilusz, T., Otlewski, J., and Polanowski, A. (1994) Biol. Chem. Hoppe Seyler, 375, 289-91.
[51] Christmann, A., Walter, K., Wentzel, A., Kratzner, R., and Kolmar, H. (1999) Protein Eng., 12, 797-806.
[52] Ay, J., Hilpert, K., Krauss, N., Schneider-Mergener, J., and Hohne, W. (2003) Acta Crystallogr. D Biol. Crystallogr., 59, 247-54.
[53] Baggio, R., Burgstaller, P., Hale, S. P., Putney, A. R., Lane, M., Lipovsek, D., Wright, M. C., Roberts, R. W., Liu, R., Szostak, J. W., and Wagner, R. W. (2002) J. Mol. Recognit., 15, 126-34.
[54] Wentzel, A., Christmann, A., Kratzner, R., and Kolmar, H. (1999) J. Biol. Chem., 274, 21037-43.
[55] Rolka, K., Kupryszewski, G., Ragnarsson, U., Otlewski, J., Krokoszynska, I., and Wilusz, T. (1991) in Peptides 1990 (Giralt, E., and Andreu, D., Eds.) pp. 768-771, ESCOM Science Publishers, Leiden, The Netherlands.
[56] Heitz, A., Le-Nguyen, D., Dumas, C., and Chiche, L. (2000) in Peptides 2000 (Martinez, J., and Fehrentz, J. A., Eds.) pp. 415-416, Editions EDK, Paris, France.
[57] Zhu, S., Darbon, H., Dyason, K., Verdonck, F., and Tytgat, J. (2003) FASEB J., 17, 1765-7.
[58] Boeckmann, B., Bairoch, A., Apweiler, R., Blatter, M. C., Estreicher, A., Gasteiger, E., Martin, M. J., Michoud, K., O'Donovan, C., Phan, I., Pilbout, S., and Schneider, M. (2003) Nucleic Acids Res., 31, 365-70.
Received: February 15, 2004 Accepted: May 26, 2004