MINIREVIEW
Selection of stably folded proteins by phage-display with proteolysis Yawen Bai and Hanqiao Feng
Laboratory of Biochemistry, National Cancer Institute, Bethesda, MD, USA
To facilitate the process of protein design and learn the basic rules that control the structure and stability of proteins, combinatorial methods have been developed to select or screen proteins with desired properties from libraries of mutants. One such method uses phage-display and proteolysis to select stably folded proteins. This method does not rely on specific properties of proteins for selection. Therefore, in principle, it can be applied to any protein. Since its first demonstration in 1998, the method has been used to create hyperthermophilic proteins, to evolve novel folded domains from a library generated by combinatorial shuffling of polypeptide segments, and to convert a partially unfolded structure to a fully folded protein.
Keywords: hydrophobic repacking; phage-display; protein design; proteolysis
Introduction
There are two basic biophysical issues in protein design. One is to find mutations that make proteins thermodynamically more stable. The other is to find an amino acid sequence for a polypeptide chain that will fold to a target structure. The first issue is important for developing therapeutic drugs and industrially useful enzymes. The second issue is more critical for learning and testing the principles of protein folding. Although significant progress has been made towards the rational design of proteins with simple motifs [1–3], it is still difficult to design native-like proteins with globular structures [4]. In addition, it is still not completely clear how the stability of a protein is encoded in its sequence and how individual amino acid residues contribute to stability. Thus, combinatorial approaches to select or screen proteins with the desired properties from libraries of mutant proteins have been sought [5–8]. Phage-display coupled with proteolysis for the selection of stably folded proteins is one such recently developed method. It was first demonstrated in 1998 by two research groups [9,10] based on the following considerations: (a) stably folded and well structured proteins should be more resistant to protease digestion than those that are less stable and poorly folded; (b) M13 and fd phages are resistant to cleavage by many proteases; (c) the surface g3p proteins of the phage are needed for bacterial infection, which couples cleavage of the inserted guest protein to loss of phage infectivity. In these demonstrations (Fig. 1A), guest variants of a protein (mutants of barnase and ribonuclease T1, respectively) with different thermodynamic stabilities were inserted between the C-terminal region and the two N-terminal domains of g3p. After several rounds of protease digestion and amplification of the library, variants with high thermodynamic stability were enriched over those with low thermodynamic stability.
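
The enrichment over repeated rounds can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, that each variant has a fixed probability of escaping cleavage in one protease round and that amplification preserves the relative proportions of the surviving phage. The variant names and survival probabilities are hypothetical and are not data from the studies cited.

```python
def enrich(frequencies, survival, rounds):
    """Return variant frequencies after repeated proteolysis/amplification rounds.

    frequencies: dict mapping variant name -> starting fraction of the library
    survival:    dict mapping variant name -> assumed probability that a phage
                 displaying this variant escapes cleavage in one round
    """
    current = dict(frequencies)
    for _ in range(rounds):
        # Proteolysis: each variant is depleted according to its survival probability.
        surviving = {name: frac * survival[name] for name, frac in current.items()}
        # Amplification: infective phage are regrown, so fractions are renormalized.
        total = sum(surviving.values())
        current = {name: frac / total for name, frac in surviving.items()}
    return current


# Hypothetical library: a rare stable variant among less stable ones.
start = {"stable": 0.001, "intermediate": 0.099, "unstable": 0.9}
survival = {"stable": 0.9, "intermediate": 0.3, "unstable": 0.01}

after = enrich(start, survival, rounds=3)
for name, frac in after.items():
    print(f"{name}: {frac:.3f}")
```

Under these assumptions the rare stable variant gains ground in every round, and additional rounds are needed before it dominates the library.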
A similar approach but with a different selection strategy (Fig. 1B) was developed later [11,12]. His-tagged guest proteins were fused to the N-terminus of the g3p protein. Phages carrying uncut proteins were selected using Ni-coated chips and monitored using surface plasmon resonance. In this study, the authors aimed to demonstrate that stably folded protein structures can be obtained by focusing on the design of a hydrophobic core: a core-directed protein design approach. This design process has three steps: first, generation of multiple core mutants of the target protein; second, display of the mutants on the phage surface; and finally, selection for stably folded mutants by challenging the system with protease. The concept was demonstrated by studying the core packing of ubiquitin. Eight hydrophobic core residues in ubiquitin were mutated randomly and simultaneously with all 20 amino acids. The mutants were displayed on the surface of the phage and challenged with the protease chymotrypsin. The selected sequences were found to be very close to the wild type, consistent with the hypothesis that a hydrophobic core may be used to direct protein design for globular proteins. The authors concluded that the best solution to the core-packing problem for ubiquitin is the natural wild type sequence, or residue combinations extremely close to it. This result is similar to the earlier conclusion obtained using combinatorial computational methods [13,14]. Intriguingly, however, all selected proteins were less stable than the wild type, and the wild type protein was not selected, suggesting that other factors may prevent selection of the most stable proteins. One possible reason is that there are protease digestion sites in a loop region that might unfold locally (Table 1).
Correspondence to Y. Bai, Laboratory of Biochemistry, National Cancer Institute, Building 37, Room 6114E, Bethesda, MD 20892, USA. Fax: +1 301 402 3095, Tel.: +1 301 594 2375, E-mail: yawen@helix.nih.gov
Abbreviations: Bc-Csp, cold shock protein from Bacillus caldolyticus; Bs-CspB, cold shock protein from Bacillus subtilis; cyt b562, cytochrome b562.
(Received 5 January 2004, revised 11 February 2004, accepted 5 March 2004)
Towards selection of stable proteins
Despite the significant effort that has been devoted to studying the stability of proteins, it is still not fully understood how the stability of a protein is encoded in its sequence and how individual amino acid residues contribute to stability. To learn the factors that stabilize proteins, researchers have recently become interested in studying proteins from thermophilic organisms. These proteins are stable at very high temperatures. Thus, it is hoped that the rules for stabilizing proteins may be revealed by comparing thermophilic proteins with their mesophilic counterparts. This is, however, complicated by the existence of many neutral mutations, which makes it difficult to find which mutation or combination of mutations is important for stability. To gain insights into this issue, Martin et al. [15] used phage-display with proteolysis to convert the mesophilic cold shock protein, Bs-CspB, from Bacillus subtilis to a hyperthermophilic protein in a relatively controlled manner. Mesophilic Bs-CspB differs from its thermophilic counterpart Bc-Csp from Bacillus caldolyticus at 12 surface-exposed positions. In their study, six of these positions were randomized by saturation mutagenesis, in which any of the 20 amino acids can occur at each of the six positions. Selection was made under two different conditions: in the presence of guanidinium chloride and at elevated temperature. Several of the selected mutants are significantly more stable than the naturally thermostable homolog Bc-Csp, and the best variant reaches the stability of Tm-Csp (the homolog from the hyperthermophile Thermotoga maritima). Interestingly, this variant differs from Tm-Csp at five positions and from Bc-Csp at all six randomized positions, indicating that proteins can be strongly stabilized by many different sets of surface mutations. Furthermore, the outcome of the selection was found to depend on the selection conditions. In the ionic denaturant (guanidinium chloride) solution, nonpolar surface interactions were optimized, whereas at elevated temperatures variants with improved electrostatics were selected, pointing to different strategies for stabilization at the protein surface.
Pedersen et al. [16] also attempted a similar experiment to seek stable mutants of barnase. Using a subset of codons that encode only hydrophobic residues, a library of barnase mutants was made by randomizing the residues at the 17 positions that differ from those in the homologous protein binase. The library was then challenged with trypsin. Among the 20 clones selected, 10 were studied for their stability. None of the selected mutants was found to be more stable than wild type barnase. This result has been attributed to possible local unfolding in barnase (Table 1).
Towards selection of protein structures
It has been suggested that proteins occurring in nature have evolved by the assembly of nonhomologous genes. Small protein domains may have evolved by assembly and/or exchange of small gene segments, leading to diversification of the domain architecture and even generation of an entirely new fold. Riechmann & Winter [17] have investigated this proposal. Using phage-display and proteolysis, they selected stably folded proteins from a phage library in which the DNA encoding the N-terminal half of a β-barrel domain (from cold shock protein CspA) was substituted with fragmented genomic Escherichia coli DNA. The phage library was then challenged by several proteases.
Table 1. Summary of proteins studied using phage-display with proteolysis. In barnase, R110 is the last residue; R59 and K62 are in the loop region. In ubiquitin, residues F45 and Y59 are in the loop region. TS, tagged selection; SIP, selectively infective phage.
Molecule     Protease                     Positions     Stability  Structure change  Cutting site in loops  Method
Barnase      Trypsin                      surface/core  decrease   no                yes                    SIP
Ubiquitin    Chymotrypsin                 core          decrease   no                yes                    TS
RNaseT1(4A)  Trypsin/Chymotrypsin/Pepsin  surface       increase   no                no                     SIP
CspA         Trypsin/Thermolysin          surface/core  increase   yes               no                     SIP
Fig. 1. Two different ways of selecting stably folded proteins using phage-display with proteolysis. (A) Selectively infective phage (SIP) exploits the fact that the N-terminal domains (N1, N2) of the minor coat protein (g3p) are responsible for binding to and infection of E. coli. Thus, incorporation of a library of target proteins between the N-terminal domains and the C-terminal (CT) domain allows a protease-based selection, because proteolysis of the target protein also removes the N-terminal domains and prevents the phage from infecting E. coli. (B) A library of target proteins with a tag can be fused to the g3p protein on the surface of the phage. The tag can be a His-tag [11] or an antibody-binding protein such as the protein A B-domain [18]. Removal of unstable proteins by proteolysis also removes the tag and prevents the associated phage from being selected for further infection of E. coli.
Four proteins selected from the library were soluble and were characterized using NMR, CD and amide hydrogen exchange. The CD spectra indicated formation of a β-sheet structure, consistent with the segment from CspA. Thermal melting of the selected proteins was cooperative. The thermodynamic stability of the proteins ranged from 1.8 to 5.3 kcal·mol⁻¹. NMR spectra of these proteins showed sharp peaks, suggesting that folded proteins were selected. Detailed structural information is still needed to demonstrate the final success of this approach.
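
To give a sense of what a stability of 1.8 versus 5.3 kcal·mol⁻¹ means for the population of unfolded, protease-sensitive molecules, the following minimal sketch converts an unfolding free energy into an equilibrium unfolded fraction. It assumes simple two-state folding and a temperature of 25 °C; neither assumption is stated in the text, and the function name is illustrative.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_unfolded(delta_g_unfold, temperature_k=298.15):
    """Two-state estimate of the unfolded population.

    delta_g_unfold: free energy of unfolding in kcal/mol (positive = stable)
    temperature_k:  temperature in kelvin (25 C assumed; not stated in the text)
    """
    # Equilibrium constant for N <-> U from the unfolding free energy.
    k_unfold = math.exp(-delta_g_unfold / (R_KCAL * temperature_k))
    return k_unfold / (1.0 + k_unfold)


for dg in (1.8, 5.3):
    print(f"dG = {dg} kcal/mol -> fraction unfolded ~ {fraction_unfolded(dg):.2e}")
```

Under these assumptions the least stable of the selected proteins would be unfolded a few per cent of the time, whereas the most stable would be unfolded only about one molecule in ten thousand.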
In a more recent test of the core-directed design proposal, Chu et al. [18] converted a partially unfolded state, apocytochrome b562, to a fully folded four-helix bundle protein in the absence of any cofactors. In this work, the authors used a method similar to that of Finucane et al. [11], except that the protein A B-domain, instead of a His-tag, was used to select the folded proteins. Cytochrome b562 (cyt b562) is a four-helix bundle protein with a heme holding the N- and C-terminal helices together (Fig. 2A). In the absence of heme, apocytochrome b562 adopts a partially unfolded conformation in which the C-terminal helix is largely unfolded while the other three helices remain folded. To create a four-helix bundle protein in the absence of heme, four residues, at positions 7, 98, 102 and 106, that are expected to form a hydrophobic core and substitute for the heme were mutated. Residue 7 was changed to Trp to provide a fluorescence probe for studying the protein’s physical properties. The other three positions were randomly mutated. In addition, residue 99 in the region for redesign was substituted with Arg to provide a specific cutting site for the protease Arg-C. This library of mutants was displayed on the surface of phage and challenged with protease Arg-C to select stably folded proteins. The consensus sequences from this selection showed some interesting results. Hydrophobic residues occurred at position 98, while hydrophilic residues occurred at positions 102 and 106. Nevertheless, the selected proteins were thermodynamically very stable.
The structure of one of the selected proteins, with Ile, Asn and Gly at positions 98, 102 and 106, was characterized using multidimensional NMR. All four helices were formed in the structure. Furthermore, site-directed mutagenesis was used to change one of the two hydrophilic residues to a hydrophobic residue. This mutation increased the stability of the protein, suggesting that the selection was not based solely on the protein’s global stability. Based on a comparison between the NMR structure of the selected protein and a crystal structure of another mutant that has two hydrophobic residues substituting for the two hydrophilic residues, an interpretation of the selection result was proposed. In the X-ray structure, the hydrophobic interaction distorted the last turn of the C-terminal helix, which may make the site for proteolysis more accessible. We have recently obtained a high-resolution structure of the selected protein (Fig. 2B) (H. Feng & Y. Bai, unpublished result). The structure shows that the C-terminal end of the fourth helix moves slightly and uses hydrophobic residues (Y101 and Y105), which are packed between the fourth and third helices in the wild type protein, to participate in the new hydrophobic core of the structure. The two hydrophilic residues in the selected structure are now exposed, which explains why hydrophilic residues were selected at these two positions. This result confirms the idea of using a hydrophobic core to direct protein design. However, it also shows that proteins can make subtle structural changes to find alternative ways to fulfil the hydrophobic interactions, which makes it difficult to predict the selection result.
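
Because the outcome of a selection depends on where the protease can cut, it can help to list candidate cleavage sites before a library is designed. The sketch below is a simplified illustration, not the procedure used in the studies cited: it assumes textbook specificities (trypsin after Lys or Arg, chymotrypsin after Phe, Tyr or Trp, Arg-C after Arg), ignores secondary preferences such as a neighbouring proline, and uses a hypothetical test sequence.

```python
# Simplified specificity rules: residues after which each protease cuts.
# Real enzymes have further preferences (e.g. neighbouring proline) ignored here.
CLEAVAGE_RULES = {
    "trypsin": set("KR"),
    "chymotrypsin": set("FYW"),
    "arg-c": set("R"),
}


def candidate_sites(sequence, protease):
    """Return 1-based positions of residues after which the protease may cut.

    The last residue is skipped because no peptide bond follows it.
    """
    residues = CLEAVAGE_RULES[protease.lower()]
    return [
        i
        for i, aa in enumerate(sequence.upper()[:-1], start=1)
        if aa in residues
    ]


# Hypothetical 12-residue segment, not a sequence from the proteins discussed.
segment = "ALKDFRGYEWLK"
print(candidate_sites(segment, "trypsin"))       # [3, 6]
print(candidate_sites(segment, "chymotrypsin"))  # [5, 8, 10]
print(candidate_sites(segment, "Arg-C"))         # [6]
```

Positions returned by such a scan would still have to be mapped onto the three-dimensional structure to decide whether they lie in loops or in regions that are exposed only on unfolding.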
Effect of flexible loops and partially folded intermediate on selection
Depending on the position of protease cutting sites in the structure, the existence of flexible loops and partially unfolded states could have a significant effect on the result of selection. If the cutting sites are in a flexible loop of the native structure, they could prevent the selection of stable proteins. By examining the structures of the proteins studied by phage-display and proteolysis, we found that protease cutting sites exist in the loop regions in both cases (barnase and ubiquitin) in which the selection did not produce very stable proteins (Table 1). A more serious problem can arise from the existence of partially unfolded states that have the protease digestion sites in their unfolded regions (Fig. 3). This is because mutations in the folded regions of the intermediate do not significantly change the relative population of the intermediate and the fully folded state. Therefore, little evolutionary pressure for the selection of stable proteins is exerted if mutations are made in the folded region. To be able to select stable mutants using phage-display and proteolysis, it is necessary that the protease cutting sites be close to the mutation sites or in a region that is exposed only upon global unfolding. Such a stable region may be identified by the presence of the slowest exchanging amide protons.
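
The argument about partially unfolded intermediates can be made concrete with a three-state Boltzmann calculation. The sketch below is an illustration under stated assumptions: the states N, I and U are at equilibrium, the cutting site is exposed in both I and U, the temperature is 25 °C, and all free energies (including the 1.5 kcal·mol⁻¹ stabilization) are hypothetical values chosen for the example.

```python
import math

RT = 1.987e-3 * 298.15  # kcal/mol at an assumed 25 C


def cleavable_fraction(dg_ni, dg_nu):
    """Population of states exposing the cutting site (I and U) among N, I and U.

    dg_ni: free energy of I above N (kcal/mol)
    dg_nu: free energy of U above N (kcal/mol)
    """
    w_n = 1.0  # Boltzmann weight of N, taken as the reference state
    w_i = math.exp(-dg_ni / RT)
    w_u = math.exp(-dg_nu / RT)
    return (w_i + w_u) / (w_n + w_i + w_u)


# Hypothetical energies: I lies 2 kcal/mol and U 6 kcal/mol above N.
base = cleavable_fraction(2.0, 6.0)
# Mutation in the region that stays folded in I: stabilizes N and I equally,
# so only the N-U gap grows (by an assumed 1.5 kcal/mol).
folded_region = cleavable_fraction(2.0, 7.5)
# Mutation in the region that is unfolded in I: stabilizes N alone,
# so both gaps grow.
unfolded_region = cleavable_fraction(3.5, 7.5)
print(base, folded_region, unfolded_region)
```

With these numbers the mutation in the folded region of the intermediate leaves the cleavable population almost unchanged, whereas the same stabilization placed in the region that is unfolded in the intermediate reduces it by roughly an order of magnitude.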
Fig. 2. Effect of structural change on the selection. (A) Structure of cyt b562. Residues M7 and H102 are the ligands of the heme. The heme is represented by a red ellipse. (B) Hydrophobic residues (Y101 and Y105) that were originally packed between the third and fourth helices in cyt b562 have become part of the new hydrophobic core. Side chains at positions 102 and 106 that face inside in cyt b562 have become exposed in the selected structure.
Design of native-like proteins
The major difficulty encountered in protein design has been that designed proteins often have more heterogeneous structures than typical natural proteins. The initially designed structures often had the correct secondary structure and topology but lacked the well-packed hydrophobic core that is characteristic of most natural proteins [19,20]. Iterative experimental design processes are normally required to achieve the final target [3,21]. This problem has become more pressing because proteins designed recently by computational methods have also failed in this respect. Nauli et al. [22] redesigned the second β-hairpin of the protein G B1-domain and obtained a protein that is more stable than the wild type by 4 kcal·mol⁻¹. The structure of this protein has been solved by X-ray crystallography [23]. The B-factors of the mutated residues are much higher than those of other residues, indicating that there are significant dynamic motions in the redesigned structure, which may contribute in part to the thermodynamic stability. We also examined another computer-designed protein G B1-domain variant, by Malakauskas & Mayo [24]. This redesigned protein is also more stable than the wild type by 4 kcal·mol⁻¹. In this case, the dynamic behavior of the redesigned protein is even more dramatic. Several cross peaks that correspond to the redesigned residues in the ¹H-¹⁵N HSQC spectrum have very weak intensities, even though the structure of the redesigned protein has been solved using NMR [24]. Examination of the mutations in the two computer-redesigned proteins shows that most of the mutations are from polar to hydrophobic residues. Thus, the two designs have essentially reversed the earlier de novo design practice, in which polar residues were incorporated into the designed hydrophobic core to obtain a unique conformation at the expense of protein stability [25]. Regarding this issue, it should be noted that these redesigned proteins have 1D ¹H-NMR spectra that look very much like those of native-like structures. This suggests that a 1D ¹H-NMR spectrum is insufficient for determining whether a redesigned protein has increased dynamic motion at a fine level, and that ¹H-¹⁵N HSQC spectra may be a minimum requirement for characterizing the dynamic behavior of redesigned proteins in the future. As proteins with heterogeneous structures and dynamic behavior in the native state are likely to be more sensitive to protease digestion than those with well-packed structures, phage-display coupled with proteolysis may be useful for solving this difficult problem. The backbone dynamics [26] and the 3D structure of the redesigned apocyt b562 determined by NMR clearly show that the protein has a uniquely folded state.
Combinatorial computation versus phage-display
Fig. 3. Effect of a partially unfolded intermediate on the selection result. If the cutting site for the protease is in the unfolded region of an intermediate state, selection of stable proteins will not be achieved because the mutations will not change the free energy difference between the intermediate (I) and the native (N) states. U represents the unfolded state.
Significant progress has been made using combinatorial computation to design proteins [1,13,22,27,28]. The advantage of the computational methods is that they can examine very large numbers of mutations [27]. The limitation of the current computational methods, however, is that most of the computer programs need to have the backbone conformation completely fixed in order to make the computation efficient [22,27,29]. Fixing the backbone conformation could prevent the selection of attractive alternative structures that differ slightly in backbone conformation. Earlier work on T4 lysozyme revealed that over-packed core mutants typically responded by slight alteration of the main chain, preserving near-ideal rotameric side chain conformations [30]. Some efforts have been made towards solving this problem. For example, backbone freedom was considered in designing proteins by using algebraic parameterization of the backbone for proteins with simple motifs [1] and by manipulating the relative orientations of supersecondary structural elements [31]. A more general method has also been explored by Desjarlais & Handel [32]. Another concern is that computational methods generally do not consider multi-body interactions. Therefore, long range effects of a mutation, which have been shown to be important even in small proteins [33,34], are not considered in the calculation. The calculated stabilities of selected proteins are not correlated with those measured in experiments, suggesting a lack of intrinsic consistency and reliability in the computational methods [35]. In comparison with the computational methods, the major limitation of the phage-display and proteolysis method is that the size of the library is relatively small, permitting simultaneous mutations at only about six positions in each library. This limitation may be alleviated to some extent if the complementary nature of the side chain interactions is considered. An advantage of the phage-display method is that the backbone of the protein does not need to be strictly defined and long range effects of mutations are included automatically, so the method can explore structures that would be missed using computational methods.
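
The library-size limitation follows from simple combinatorics: the number of distinct protein sequences grows as a power of the number of randomized positions. The short sketch below computes this for six positions, the practical limit mentioned above, and for eight positions, the number of core residues randomized in the ubiquitin study. It counts protein sequences only and makes no assumption about the codon scheme or about how many clones a given phage library actually contains.

```python
def library_size(positions, choices_per_position=20):
    """Number of distinct sequences when each position is varied independently."""
    return choices_per_position ** positions


for n in (6, 8):
    print(f"{n} positions: {library_size(n):.2e} protein sequences")
```

Each additional fully randomized position multiplies the sequence space by 20, so two extra positions raise it 400-fold.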
Perspectives
The current experimental results of using phage-display and proteolysis to select stably folded protein structures clearly indicate that this method is a powerful tool for protein design. Further refinement of the method should help to provide insights into the forces that stabilize proteins and to design proteins with new folds. A more promising prospect is to combine computational methods with phage-display. For example, the computational approach can be used to identify potentially important positions for mutation, while phage-display and proteolysis can be used for the final selection.
References
1. Harbury, P.B., Plecs, J.J., Tidor, B., Alber, T. & Kim, P.S. (1998) High-resolution protein design with backbone freedom. Science 282, 1462–1467.
2. Schafmeister, C.E., LaPorte, S.L., Miercke, L.J. & Stroud, R.M. (1997) A designed four helix bundle protein with native-like structure. Nat. Struct. Biol. 4, 1039–1046.
3. Walsh, S.T., Cheng, H., Bryson, J.W., Roder, H. & DeGrado, W.F. (1999) Solution structure and dynamics of a de novo designed three-helix bundle protein. Proc. Natl Acad. Sci. USA 96, 5486–5491.
4. DeGrado, W.F., Summa, C.M., Pavone, V., Nastri, F. & Lombardi, A. (1999) De novo design and structural characterization of proteins and metalloproteins. Annu. Rev. Biochem. 68, 779–819.
5. Ness, J.E., Del Cardayre, S.B., Minshull, J. & Stemmer, W.P. (2001) Molecular breeding: the natural approach to protein design. Adv. Protein Chem. 55, 261–286.
6. Wintrode, P.L. & Arnold, F.H. (2001) Temperature adaptation of enzymes: lessons from laboratory evolution. Adv. Protein Chem. 55, 161–226.
7. Kamtekar, S., Schiffer, J.M., Xiong, H., Babik, J.M. & Hecht, M.H. (1993) Protein design by binary patterning of polar and nonpolar amino acids. Science 262, 1680–1685.
8. Michnick, S.W. (2001) Exploring protein interactions by interaction-induced folding of proteins from complementary peptide fragments. Curr. Opin. Struct. Biol. 11, 472–477.
9. Sieber, V., Plückthun, A. & Schmid, F.X. (1998) Selecting proteins with improved stability by a phage-based method. Nat. Biotechnol. 16, 955–960.
10. Kristensen, P. & Winter, G. (1998) Proteolytic selection for protein folding using filamentous bacteriophages. Fold. Des. 3, 321–328.
11. Finucane, M.D., Tuna, M., Lees, J.H. & Woolfson, D.N. (1999) Core-directed protein design. I. An experimental method for selecting stable proteins from combinatorial libraries. Biochemistry 38, 11604–11612.
12. Finucane, M.D. & Woolfson, D.N. (1999) Core-directed protein design. II. Rescue of a multiply mutated and destabilized variant of ubiquitin. Biochemistry 38, 11613–11623.
13. Lazar, G.A., Desjarlais, J.R. & Handel, T.M. (1997) De novo design of the hydrophobic core of ubiquitin. Protein Sci. 6, 1167–1178.
14. Wernisch, L., Hery, S. & Wodak, S.J. (2000) Automatic protein design with all atom force-fields by exact and heuristic optimization. J. Mol. Biol. 301, 713–736.
15. Martin, A., Kather, I. & Schmid, F.X. (2002) Origins of the high stability of an in vitro-selected cold-shock protein. J. Mol. Biol. 318, 1341–1349.
16. Pedersen, J.S., Otzen, D.E. & Kristensen, P. (2002) Directed evolution of barnase stability using proteolytic selection. J. Mol. Biol. 323, 115–123.
17. Riechmann, L. & Winter, G. (2000) Novel folded protein domains generated by combinatorial shuffling of polypeptide segments. Proc. Natl Acad. Sci. USA 97, 10068–10073.
18. Chu, R.A., Takei, J., Knowles, J.R., Andrykovitch, M., Pei, W., Kajava, A.V., Steinbach, P., Ji, X. & Bai, Y. (2002) Redesign of a four-helix bundle protein by phage display coupled with proteolysis and structural characterization by NMR and X-ray crystallography. J. Mol. Biol. 323, 253–262.
19. Regan, L. & DeGrado, W.F. (1988) Characterization of a helical protein designed from first principles. Science 241, 976–978.
20. Kamtekar, S., Schiffer, J.M., Xiong, H., Babik, J.M. & Hecht, M.H. (1993) Protein design by binary patterning of polar and nonpolar amino acids. Science 262, 1680–1685.
21. Betz, S.F., Bryson, J.W. & DeGrado, W.F. (1995) Native-like and structurally characterized designed alpha-helical bundles. Curr. Opin. Struct. Biol. 5, 457–463.
22. Nauli, S., Kuhlman, B. & Baker, D. (2001) Computer-based redesign of a protein folding pathway. Nat. Struct. Biol. 8, 602–605.
23. Nauli, S., Kuhlman, B., Trong, I.L., Stenkamp, R.E., Teller, D. & Baker, D. (2002) Crystal structures and increased stabilization of the protein G variants with switched folding pathways NuG1 and NuG2. Protein Sci. 11, 2924–2931.
24. Malakauskas, S.M. & Mayo, S.L. (1998) Design, structure and stability of a hyperthermophilic protein variant. Nat. Struct. Biol. 5, 470–475.
25. Lumb, K.J. & Kim, P.S. (1995) A buried polar interaction imparts structural uniqueness in a designed heterodimeric coiled coil. Biochemistry 34, 8642–8648.
26. Takei, J., Pei, W., Vu, D. & Bai, Y. (2002) Populating partially unfolded forms by hydrogen exchange-directed protein engineering. Biochemistry 41, 12308–12312.
27. Dahiyat, B.I. & Mayo, S.L. (1997) De novo protein design: fully automated sequence selection. Science 278, 82–87.
28. Looger, L.L. & Hellinga, H.W. (2001) Generalized dead-end elimination algorithms make large-scale protein side-chain structure prediction tractable: implications for protein design and structural genomics. J. Mol. Biol. 307, 429–445.
29. Desjarlais, J.R. & Handel, T.M. (1995) De novo design of the hydrophobic cores of proteins. Protein Sci. 4, 2006–2018.
30. Baldwin, E.P. & Matthews, B.W. (1994) Core-packing constraints, hydrophobicity and protein design. Curr. Opin. Biotechnol. 5, 396–402.
31. Su, A. & Mayo, S.L. (1997) Coupling backbone flexibility and amino acid sequence selection in protein design. Protein Sci. 6, 1701–1707.
32. Desjarlais, J.R. & Handel, T.M. (1999) Side-chain and backbone flexibility in protein core design. J. Mol. Biol. 289, 305–318.
33. Lockless, S.W. & Ranganathan, R. (1999) Evolutionarily conserved pathways of energetic connectivity in protein families. Science 286, 295–299.
34. Russ, W.P. & Ranganathan, R. (2002) Knowledge-based potential functions in protein design. Curr. Opin. Struct. Biol. 12, 447–452.
35. Mendes, J., Guerois, R. & Serrano, L. (2002) Energy estimation in protein design. Curr. Opin. Struct. Biol. 12, 441–446.