core of the protein. The extensively H-bonded nature of α-helices and β-sheets is
ideal for this purpose, and these structures effectively stabilize the polar groups of
the peptide backbone in the protein core.
The framework of sheets and helices in the interior of a globular protein is
typically constant and conserved in both sequence and structure. The surface of a
globular protein is different in several ways. Typically, much of the protein surface is
composed of the loops and tight turns that connect the helices and sheets of the protein
core, although helices and sheets may also be found on the surface. The result is that
the surface of a globular protein is often a complex landscape of different structural
elements. These complex surface structures can interact in certain cases with small
molecules or even large proteins that have complementary structure or charge
(Figure 6.20). These regions of complementary, recognizable structure are typically
formed from the peptide segments that connect elements of secondary structure. They
are the basis for enzyme–substrate interactions, protein–protein associations in cell
signaling pathways, antigen–antibody interactions, and more.
The segments of the protein that are neither helix, sheet, nor turn have
traditionally been referred to as coil or random coil. Both of these terms are misleading.
Most of these “loop” segments are neither coiled nor random, in any sense of the
words. These structures are every bit as organized and stable as the defined
secondary structures. They just don’t conform to any frequently recurring pattern.
These so-called coil structures are strongly influenced by side-chain interactions
with the rest of the protein.
Waters on the Protein Surface Stabilize the Structure
A globular protein’s surface structure also includes water molecules. Many of the
polar backbone and side chain groups on the surface of a globular protein make
H bonds with solvent water molecules. There are often several such water molecules
per amino acid residue, and some are in fixed positions (Figure 6.21). Relatively few
water molecules are found inside the protein.
In some globular proteins (Figure 6.22), it is common for one face of an
α-helix to be exposed to the water solvent, with the other face toward the
hydrophobic interior of the protein. The outward face of such an amphiphilic helix
consists mainly of polar and charged residues, whereas the inward face contains mostly
nonpolar, hydrophobic residues. A good example of such a surface helix is that of
residues 153 to 166 of flavodoxin from Anabaena (Figure 6.22a). Note that the
helical wheel presentation of this helix readily shows that one face contains four
hydrophobic residues and that the other is almost entirely polar and charged.
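The geometry behind a helical wheel can be sketched in a few lines of Python. This is only an illustration under a stated assumption that does not come from the passage above: an ideal α-helix with 3.6 residues per turn, so that each residue is rotated 100° from the previous one when the helix is viewed end on. The function names and the 60° window used to define a "face" are hypothetical choices.

```python
# Sketch: angular positions on a helical wheel for an ideal alpha-helix.
# Assumption (not from the text): 3.6 residues per turn, i.e. 100 degrees per residue.
DEGREES_PER_RESIDUE = 100  # 360 / 3.6


def wheel_angles(n_residues):
    """Return the wheel angle (degrees) of each position, with position 1 at 0."""
    return [((i - 1) * DEGREES_PER_RESIDUE) % 360 for i in range(1, n_residues + 1)]


def same_face(n_residues, reference=1, window=60):
    """Positions whose wheel angle lies within +/- window degrees of the reference."""
    angles = wheel_angles(n_residues)
    ref = angles[reference - 1]
    face = []
    for pos, angle in enumerate(angles, start=1):
        diff = abs(angle - ref) % 360
        diff = min(diff, 360 - diff)  # shortest angular distance around the wheel
        if diff <= window:
            face.append(pos)
    return face


if __name__ == "__main__":
    print(wheel_angles(14))  # angles for a 14-residue helix
    print(same_face(14))     # positions sharing a face with position 1
```

Under these assumptions, positions 1, 4, 5, 8, and 12 of a 14-residue helix fall on one face, which is why hydrophobic residues spaced three or four positions apart line up on the same side of an amphiphilic helix.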
Less commonly, an α-helix can be completely buried in the protein interior
or completely exposed to solvent. Citrate synthase is a dimeric protein in which
α-helical segments form part of the subunit–subunit interface. As shown in Figure
6.22b, one of these helices (residues 260 to 270) is highly hydrophobic and contains
only two polar residues, as would befit a helix in the protein core. On the other hand,
Figure 6.22c shows the solvent-exposed helix (residues 74 to 87) of calmodulin, which
consists of 10 charged residues, 2 polar residues, and only 2 nonpolar residues.
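The residue counts quoted for these two helices can be checked with a short tally. The sketch below uses the residue labels that appear in Figure 6.22 (their order around the wheel is not used). The three-way grouping is an assumption made for illustration: Asp, Glu, Lys, and Arg are treated as charged; Ser, Thr, Asn, and Gln as polar; everything else, including Gly, as nonpolar.

```python
from collections import Counter

# Assumed classification, for illustration only.
CHARGED = {"Asp", "Glu", "Lys", "Arg"}
POLAR = {"Ser", "Thr", "Asn", "Gln"}

# Residue labels from Figure 6.22 (composition only; wheel order is not needed).
CALMODULIN_74_87 = ["Arg", "Asp", "Ile", "Glu", "Lys", "Arg", "Thr",
                    "Glu", "Glu", "Met", "Asp", "Glu", "Lys", "Ser"]
CITRATE_SYNTHASE_260_270 = ["Leu", "Gly", "Ala", "Ala", "Ser", "Leu",
                            "Phe", "Met", "Ala", "Ala", "Asn"]


def classify(residue):
    """Assign a residue to 'charged', 'polar', or 'nonpolar'."""
    if residue in CHARGED:
        return "charged"
    if residue in POLAR:
        return "polar"
    return "nonpolar"


def composition(residues):
    """Count residues in each class; absent classes read as 0."""
    return Counter(classify(r) for r in residues)


if __name__ == "__main__":
    print(dict(composition(CALMODULIN_74_87)))
    print(dict(composition(CITRATE_SYNTHASE_260_270)))
```

With this grouping the calmodulin helix tallies as 10 charged, 2 polar, and 2 nonpolar residues, and the citrate synthase helix as 2 polar and 9 nonpolar residues, in line with the description above.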
Packing Considerations
The secondary and tertiary structures of ribonuclease A (Figure 6.19) and other
globular proteins illustrate the importance of packing in tertiary structures. Secondary
structures pack closely to one another and also intercalate with (insert between)
extended polypeptide chains. If the sum of the van der Waals volumes of a protein’s
constituent amino acids is divided by the volume occupied by the protein, packing
densities of 0.72 to 0.77 are typically obtained. These packing densities are similar to
those of a collection of solid spheres. This means that even with close packing,
approximately 25% of the total volume of a protein is not occupied by protein atoms.
Nearly all of this space is in the form of very small cavities. Cavities the size of water
molecules or larger do occasionally occur, but they make up only a small fraction of the
total protein volume.
FIGURE 6.20 The surfaces of proteins are complementary to the molecules they bind. PEP carboxykinase (shown here, pdb id 1K3D) is an enzyme from the metabolic pathway that synthesizes glucose (gluconeogenesis; see Chapter 22). In the so-called “active site” (yellow) of this enzyme, catalysis depends on complementary binding of substrates. Shown in this image are ADP (brown), a Mg²⁺ ion (blue), and AlF₃ (a phosphate analog, in green, above the Mg²⁺).
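The packing-density calculation described in this section is a simple ratio, and the roughly 25% of unoccupied volume follows directly from it. The sketch below is illustrative only; the function names and the example volumes are hypothetical, and only the 0.72 to 0.77 range comes from the text.

```python
def packing_density(vdw_volume, occupied_volume):
    """Sum of van der Waals volumes divided by the volume occupied by the protein."""
    if occupied_volume <= 0:
        raise ValueError("occupied_volume must be positive")
    return vdw_volume / occupied_volume


def void_fraction(density):
    """Fraction of the protein volume not occupied by protein atoms."""
    return 1.0 - density


if __name__ == "__main__":
    # Range of packing densities quoted in the text.
    for density in (0.72, 0.77):
        print(f"density {density:.2f} -> unoccupied fraction {void_fraction(density):.2f}")
    # Hypothetical volumes (arbitrary units) giving a density inside that range.
    print(packing_density(75.0, 100.0))
```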
FIGURE 6.21 The surfaces of proteins are ideally suited to form multiple H bonds with water molecules. Shown here are waters (blue and white) associated with actinidin, an enzyme from kiwi fruit that cleaves polypeptide chains at arginine residues (pdb id 2ACT). The polar backbone atoms and side chain groups on the surface of actinidin are extensively H-bonded with water.
Helical wheel diagrams for Figure 6.22:
(a) α-Helix from flavodoxin (residues 153–166). Residue labels shown: Asp 153, Lys, Lys, Ala, Asp, Ser, Ser, Glu, Glu, Arg, Trp, Leu, Ile, Val.
(b) α-Helix from citrate synthase (residues 260–270). Residue labels shown: Leu 260, Gly, Ala, Ala, Ser, Leu, Phe, Met, Ala, Ala, Asn.
(c) α-Helix from calmodulin (residues 74–87). Residue labels shown: Arg 74, Asp, Ile, Glu, Lys, Arg, Thr, Glu, Glu, Met, Asp, Glu, Lys, Ser.
ACTIVE FIGURE 6.22 The so-called helical wheel presentation can reveal the polar or nonpolar character of α-helices. If the helix is viewed end on, and the residues are numbered with residue 1 closest
to the viewer, it is easy to see how polar and nonpolar residues are distributed to form a wheel. (a) The α-helix
consisting of residues 153–166 (red) in flavodoxin from Anabaena is a surface helix and is amphipathic (pdb id
1RCF). (b) The two helices (orange and red) in the interior of the citrate synthase dimer (residues 260–270 in
each monomer) are mostly hydrophobic (pdb id 5CSC). (c) The exposed helix (residues 74–87, red) of
calmodulin is entirely accessible to solvent and consists mainly of polar and charged residues (pdb id 1CLL).
HUMAN BIOCHEMISTRY
Collagen-Related Diseases
Collagen provides an ideal case study of the molecular basis of
physiology and disease. For example, the nature and extent of
collagen crosslinking depends on the age and function of the tissue.
Collagen from young animals is predominantly un-crosslinked and
can be extracted in soluble form, whereas collagen from older
animals is highly crosslinked and thus insoluble. The loss of flexibility
of joints with aging is probably due in part to increased
crosslinking of collagen.
Several serious and debilitating diseases involving collagen
abnormalities are known. Lathyrism occurs in animals due to the
regular consumption of seeds of Lathyrus odoratus, the sweet pea,
and involves weakening and abnormalities in blood vessels, joints,
and bones. These conditions are caused by β-aminopropionitrile
(see figure), which covalently inactivates lysyl oxidase, preventing
intramolecular crosslinking of collagen and causing abnormalities
in joints, bones, and blood vessels.
Scurvy results from a dietary vitamin C deficiency and involves the inability to form collagen fibrils properly. This is the result of reduced activity of prolyl hydroxylase, which is vitamin C–dependent, as previously noted. Scurvy leads to lesions in the skin and blood vessels, and in its advanced stages, it can lead to grotesque disfiguration and eventual death. Although rare in the modern world, it was a disease well known to seafaring explorers
in earlier times who did not appreciate the importance of fresh fruits and vegetables in the diet.
A number of rare genetic diseases involve collagen
abnormalities, including Marfan’s syndrome and the Ehlers–Danlos syndromes,
which result in hyperextensible joints and skin. The formation of
atherosclerotic plaques, which cause arterial blockages in advanced stages, is due in part to the abnormal formation of collagenous structures in blood vessels.
β-Aminopropionitrile: N≡C–CH₂–CH₂–NH₃⁺
FIGURE 6.23 TonEBP is a DNA-binding protein consisting of two distinct domains. The N-terminal domain is shown here on the right, with DNA (orange) in the middle, and the C-terminal domain on the left (pdb id 1IMH).
It is likely that such cavities provide flexibility for proteins and facilitate
conformation changes and a wide range of protein dynamics (discussed later).
Protein Domains Are Nature’s Modular Strategy for Protein Design
Proteins range in molecular weight from a thousand to more than a million. It is
tempting to think that the size of unique globular, folded structures would increase
with molecular weight, but this is not what has been observed. Proteins composed
of about 250 amino acids or fewer often have a simple, compact globular shape.
However, larger globular proteins are usually made up of two or more recognizable and
distinct structures, termed domains or modules: compact, folded protein
structures that are usually stable by themselves in aqueous solution. Figure 6.23 shows a
two-domain DNA-binding protein, TonEBP, in which the two distinct domains are
joined by a short segment of the peptide chain. Most domains consist of a single
continuous portion of the protein sequence, but in some proteins the domain
sequence is interrupted by a sequence belonging to some other part of the protein
that may even form a separate domain (Figure 6.24). In either case, typical domain structures consist of hydrophobic cores with hydrophilic surfaces (as was the case for ribonuclease, Figure 6.19). Importantly, individual domains often possess unique functional behaviors (for example, the ability to bind a particular ligand with high affinity and specificity), and an individual domain from a larger protein often expresses its unique function within the larger protein in which it is found. Multidomain proteins typically possess the sum total of functional properties and behaviors of their constituent domains.
It is likely that proteins consisting of multiple domains (and thus multiple functions) evolved by the fusion of genes that once coded for separate proteins. This would require gene duplication to be common in nature, and analysis of completed genomes has confirmed that approximately 90% of domains in eukaryotes have been duplicated. Thus, the protein domain is a fundamental unit in evolution. Many proteins have been “assembled” by duplicating domains and then combining them in different ways. Many proteins are assemblies constructed from several individual domains, and some proteins contain multiple copies of the same domain. Figure 6.25 shows the tertiary structures of nine domains that are frequently duplicated, and Figure 6.26 presents several proteins that contain multiple copies of one
or more of these domains.
FIGURE 6.24 Malonyl CoA:ACP transacylase (pdb id
1NM2) is a metabolic enzyme consisting of two
subdomains. The large subdomain (blue) includes residues
1–132 and 198–316 and consists of a β-sheet
surrounded by 12 α-helices. The small subdomain (gold,
residues 133–197) consists of a four-stranded
antiparallel β-sheet and two α-helices.
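The residue ranges in this caption give a concrete case of a domain whose sequence is interrupted by another domain. A minimal sketch, assuming nothing beyond those ranges (the dictionary layout and function names are hypothetical), is:

```python
# Residue ranges taken from the Figure 6.24 caption (inclusive).
SUBDOMAINS = {
    "large": [(1, 132), (198, 316)],  # interrupted by the small subdomain
    "small": [(133, 197)],
}


def subdomain_of(residue_number):
    """Return the subdomain containing a residue, or None if out of range."""
    for name, segments in SUBDOMAINS.items():
        for start, end in segments:
            if start <= residue_number <= end:
                return name
    return None


def is_continuous(name):
    """True if the subdomain is a single uninterrupted stretch of sequence."""
    return len(SUBDOMAINS[name]) == 1


if __name__ == "__main__":
    for residue in (100, 150, 200):
        print(residue, subdomain_of(residue))
    print("large continuous?", is_continuous("large"))
    print("small continuous?", is_continuous("small"))
```

Residues 100 and 200 both map to the large subdomain even though residue 150, which lies between them in sequence, belongs to the small one.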
FIGURE 6.25 Ribbon structures of several protein modules used in the construction of complex multimodule
proteins (scale bar: 1 nm). (a) The complement control protein module (pdb id 1HCC). (b) The immunoglobulin module
(pdb id 1T89). (c) The fibronectin type I module (pdb id 1Q06). (d) The growth factor module (pdb id 1FSB). (e) The kringle module (pdb id 1HPK). (f) The GYF module (pdb id 1GYF). (g) The
γ-carboxyglutamate module (pdb id 1CFI). (h) The FF module (pdb id 1UZC). (i) The DED domain (pdb id 1A1W).
Classification Schemes for the Protein Universe Are Based on Domains
The astounding diversity of properties and behaviors in living things can now be
explored through the analysis of vast amounts of genomic information. Assessment
of sequence and structural data from several million proteins in both protein and
genome databases has shown that there is a relatively limited number of structurally
distinct domains in proteins. Several comprehensive projects have organized the
available information in defined hierarchies or levels of protein structure.
The Structural Classification of Proteins database (SCOP,
http://scop.mrc-lmb.cam.ac.uk/scop) recognizes five overarching classes, which encompass
most proteins. SCOP is based on hierarchical levels that embody the evolutionary and
structural relationships among known proteins, and protein classification in SCOP is
essentially a manual process using visual inspection and comparison of structures.
CATH is another hierarchical classification system (http://www.cathdb.info) that
groups protein domain structures into evolutionary families and structural
groupings, depending on sequence and structure similarities. CATH differs from SCOP
Proteins shown in Figure 6.26, each drawn as a chain of modules (labels in the diagram include F1, F2, and F3): Factors VII, IX, X and protein C; Factor XII; tPA; C1r, C1s; C2, factor B; fibronectin; twitchin. The plasma membrane is indicated.
FIGURE 6.26 A sampling of proteins that consist of mosaics of individual protein modules. The modules shown include γCG, a module containing γ-carboxyglutamate
residues; G, an epidermal growth factor–like module;
K, the “kringle” domain, named for a Danish pastry;
C, which is found in complement proteins; F1, F2, and F3, first found in fibronectin; I, the immunoglobulin superfamily domain; N, found in some growth factor receptors; E, a module homologous to the calcium-binding E–F hand domain; and LB, a lectin module found in some cell surface proteins. (Adapted from Baron,
M., Norman, D., and Campbell, I., 1991. Protein modules. Trends in
Biochemical Sciences 16:13–17.)
in that it combines manual analysis with automation based on quantitative algorithms to classify protein structures. Figure 6.27 compares the hierarchical structures of SCOP and CATH and defines the different levels of structure.
Although the hierarchical names in SCOP and CATH differ somewhat, there are
common threads shared in these schemes. Class is determined from the overall composition of secondary structure elements in a domain. A fold describes the number, arrangement, and connections of these secondary structure elements. A superfamily
includes domains of similar folds and usually similar functions, thus suggesting a
common evolutionary ancestry. A family usually includes domains with closely
related amino acid sequences (in addition to folding similarities). Although the numbers of unique folds, superfamilies, and families increase as more genomes are
known and analyzed, it has become apparent that the number of protein domains in nature is large but limited. How many proteins can we expect to identify and understand
someday? There are approximately $10^{3}$ to $10^{5}$ genes per organism and approximately 13.6 million species of living organisms on earth (and this latter number is likely an underestimate). Thus, there may be approximately $(10^{3}\ \text{to}\ 10^{5}) \times (1.36 \times 10^{7})$, or $10^{10}$ to $10^{12}$, different proteins in all organisms on earth. Still, this vast number of proteins may well consist of only about $10^{5}$ sequence domain families (Figure 6.27) and
approximately $10^{3}$ protein folds of known structure, a remarkably small number compared
to the total number of protein-coding genes (see Table 1.6). It is anticipated that
most newly identified proteins will resemble other known proteins and that most
structures can be broken into two or more domains, which resemble tertiary
structures observed in other proteins.
The CATH Hierarchy: Class (4), Architecture (40), Topology (1084), Homologous Superfamily (2091), Sequence Family (7794), Domain (93885)
The SCOP Hierarchy: Class (7), Fold (1086), Superfamily (1777), Family (3464), Domain (97178)
FIGURE 6.27 SCOP and CATH are hierarchical
classification systems for the known proteins. Proteins are
classified in SCOP by a manual process, whereas CATH
combines manual and automated procedures. Numbers
indicate the population of each category.
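The order-of-magnitude estimate of the number of different proteins on earth can be reproduced with a few lines of arithmetic. The sketch below uses only the figures quoted in the text (10³ to 10⁵ genes per organism and 13.6 million species) and makes the same implicit simplification, namely that every gene in every species counts as a distinct protein. The variable and function names are hypothetical.

```python
SPECIES_ON_EARTH = 1.36e7          # 13.6 million species, as quoted in the text
GENES_PER_ORGANISM = (1e3, 1e5)    # low and high estimates from the text


def protein_estimate(genes_per_organism, species=SPECIES_ON_EARTH):
    """Rough count of proteins: genes per organism times number of species."""
    return genes_per_organism * species


if __name__ == "__main__":
    low = protein_estimate(GENES_PER_ORGANISM[0])
    high = protein_estimate(GENES_PER_ORGANISM[1])
    print(f"low estimate:  {low:.2e}")   # on the order of 10^10
    print(f"high estimate: {high:.2e}")  # on the order of 10^12
```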
Because structure depends on sequence, and because function depends on
structure, it is tempting to imagine that all proteins of a similar structure should share a
common function, but this is not always true. For example, the TIM barrel is a
common protein fold consisting of eight α-helices and eight β-strands that alternate
along the peptide backbone to form a doughnutlike tertiary structure. The TIM
barrel is named for triose phosphate isomerase, an enzyme that interconverts
ketone and aldehyde substrates in the breakdown of sugars (see Chapter 18).
However, other TIM barrel proteins carry out very different functions (Figure 6.28a),
including the reduction of aldose sugars and hydrolysis of phosphate esters.
Moreover, not all proteins of similar function possess similar domains. Both proteins
shown in Figure 6.28b catalyze the same reaction, but they bear little structural
similarity to each other.
Denaturation Leads to Loss of Protein Structure and Function
Whereas the primary structure of proteins arises from covalent bonds, the
secondary, tertiary, and quaternary levels of protein structure are maintained by weak,
noncovalent forces. The environment of a living cell is exquisitely suited to
maintain these weak forces and to preserve the structures of its many proteins. However,
a variety of external stresses (for example, heat or chemical treatment) can disrupt
Panel titles in Figure 6.28: (a) Same domain type, different functions. (b) Same function, different structures: aspartate aminotransferase and D-amino acid aminotransferase.
FIGURE 6.28 (a) Some proteins share similar structural
features but carry out quite different functions (triose phosphate isomerase, pdb id 8TIM; aldose reductase, pdb id 1ADS; phosphotriesterase, pdb id 1DPM).
(b) Proteins with quite different structures can carry
out similar functions (yeast aspartate aminotransferase, pdb id 1YAA; D-amino acid aminotransferase, pdb id
3DAA).
these weak forces in a process termed denaturation: the loss of protein structure
and function.
An everyday example is the denaturation of the protein ovalbumin during the cooking of an egg (Figure 6.29). About 10% of the mass of an egg white is protein, and 54% of that is ovalbumin. When a chicken egg is cracked open, the “egg white”
is a nearly transparent, viscous fluid. Cooking turns this fluid to a solid, white mass. The egg white proteins have unfolded and have precipitated out of solution, and the unfolded proteins have aggregated into a solid mass.
As a typical protein solution is heated slowly, the protein remains in its native state until it approaches a characteristic melting temperature, Tm. As the solution is heated further, the protein denatures over a narrow range of temperatures around
Tm (Figure 6.30). Denaturation over a very small temperature range such as this is
evidence of a two-state transition between the native and the unfolded states of the
protein, and this implies that unfolding is an all-or-none process: When weak forces are disrupted in one part of the protein, the entire structure breaks down.
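The sharpness of a two-state transition can be illustrated with a standard textbook simplification that is not taken from this passage: the free energy of unfolding is approximated as ΔG = ΔH(1 − T/Tm), with the heat-capacity change neglected, and the fraction unfolded follows from the equilibrium constant K = exp(−ΔG/RT). The enthalpy value and the Tm used in the example are hypothetical, chosen only to show the shape of the curve.

```python
import math

R = 8.314  # gas constant, J/(mol K)


def fraction_unfolded(temp_c, tm_c, delta_h):
    """Fraction of protein unfolded in a simple two-state model.

    temp_c, tm_c: temperature and melting temperature in degrees Celsius.
    delta_h: unfolding enthalpy in J/mol (hypothetical illustrative value).
    Assumes delta_G = delta_h * (1 - T/Tm), i.e. no heat-capacity change.
    """
    t = temp_c + 273.15
    tm = tm_c + 273.15
    delta_g = delta_h * (1.0 - t / tm)   # free energy of unfolding
    k = math.exp(-delta_g / (R * t))     # unfolded/native equilibrium constant
    return k / (1.0 + k)


if __name__ == "__main__":
    TM_C = 55.0       # illustrative melting temperature
    DELTA_H = 4.0e5   # hypothetical unfolding enthalpy, J/mol
    for temp in (40, 45, 50, 55, 60, 65, 70):
        print(f"{temp:3d} C  fraction unfolded = {fraction_unfolded(temp, TM_C, DELTA_H):.3f}")
```

With these illustrative numbers the protein goes from almost fully native to almost fully unfolded within roughly 20 degrees centered on Tm, and exactly half is unfolded at Tm itself.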
Most proteins can also be denatured below the transition temperature by a variety of chemical agents, including acid or base, organic solvents, detergents, and particular denaturing solutes. Guanidine hydrochloride and urea are examples of the latter (Figure 6.31). Denaturation in all these cases involves disruption of the weak forces that stabilize proteins. Covalent bonds are not affected. Acids and bases cause protonation and deprotonation of dissociable groups on the protein, altering ionic interactions and hydrogen bonds. Organic solvents and detergents disrupt hydrophobic interactions that bury nonpolar groups in the protein interior. The effects
of guanidine hydrochloride and urea are more complex. Recent research indicates
FIGURE 6.29 The proteins of egg white are denatured (as evidenced by their precipitation and aggregation) during cooking. More than half of the protein in egg whites is ovalbumin. The ovalbumin monomer is shown (pdb id 1OVA).
(b) Denaturation plotted against [GdmCl] (M).
FIGURE 6.31 Proteins can be denatured (unfolded) by high concentrations of guanidine-HCl or urea. The denaturation of chymotrypsin is plotted here. (Adapted from Fersht, A., 1999. Structure and Mechanism in Protein Science.
FIGURE 6.30 Proteins can be denatured by heat, with
commensurate loss of function. Ribonuclease A (blue)
and ribonuclease B (red) lose activity above about 55°C.
These two enzymes share identical protein structures,
but ribonuclease B possesses a carbohydrate chain
attached to Asn 34. Axes: temperature (°C) and a scale running from 0.00 to 1.00. (Adapted from Arnold, U., and
Ulbrich-Hofmann, R., 1997. Kinetic and thermodynamic thermal
stabilities of ribonuclease A and ribonuclease B. Biochemistry 36:
2166–2172.)
(a) Guanidine HCl: H₂N–C(=NH₂⁺)–NH₂ Cl⁻. Urea: H₂N–C(=O)–NH₂.
that these agents denature proteins by both direct effects (binding to hydrophilic
groups on the protein) and indirect effects (altering the structure and dynamics of
the water solvent). Also, both guanidine hydrochloride and urea are good H-bond
donors and acceptors.
Anfinsen’s Classic Experiment Proved That Sequence Determines Structure
As noted earlier (Section 6.2), all the information needed to fold a polypeptide into
its native structure is contained in the amino acid sequence. This simple but
profound truth of protein structure was confirmed in the 1950s by the elegant studies
of denaturation and renaturation of proteins by Christian Anfinsen and his
co-workers at the National Institutes of Health. For their pivotal studies, they chose the
small enzyme ribonuclease A from bovine pancreas, a protein with 124 residues
and four disulfide bonds (Figures 6.19 and 6.32). (Ribonuclease cleaves chains of
Labels in Figure 6.32: numbered residues 26, 40, 58, 65, 72, 84, 95, and 110; “Hypothetical Inactive Form (Note random formation of disulfides)”; “+ MCE + Urea”; “– MCE + Oxygen”; “– MCE – Urea + Oxygen”; “Small amount of MCE w/gentle warming”.
FIGURE 6.32 Ribonuclease can be unfolded by treatment with urea, and β-mercaptoethanol
(MCE) cleaves disulfide bonds. If β-mercaptoethanol is then removed (but not urea) under
oxidizing conditions, disulfide bonds reform in the still-unfolded protein (one possible
hypothetical inactive form is shown). If urea is removed in the presence of a small amount of
β-mercaptoethanol with gentle warming, ribonuclease returns to its native structure (with the
correct set of disulfide bonds), and full enzymatic activity is restored. This experiment
demonstrated that the information required for folding of globular proteins is contained in the
primary structure.
ribonucleic acid. Only ribonuclease in its native structure possesses enzyme activity,
so loss of activity in a denaturation experiment was proof of loss of structure.) They treated solutions of ribonuclease with a combination of urea, which unfolded the protein, and mercaptoethanol, which reduced the disulfide bridges. This treatment destroyed all enzymatic activity of ribonuclease.
Anfinsen discovered that removing the mercaptoethanol but not the urea restored only 1% of the enzyme activity. This was attributed to the formation of random disulfide bridges by the still-denatured protein. With eight Cys residues, there are 105 possible ways to make four disulfide bridges; thus, a residual activity of 1% made sense to Anfinsen. (The first Cys to form a disulfide has seven possible partners, the next Cys has five possible partners, the next has three, and the last Cys has only one choice: $7 \times 5 \times 3 \times 1 = 105$.) However, if Anfinsen removed
mercaptoethanol and urea at the same time, the polypeptide was able to fold into its native
structure, the correct set of four disulfides reformed, and full enzyme activity was recovered (Figure 6.32). This experiment demonstrated that the information needed for protein folding resided entirely within the amino acid sequence of the protein itself. Many subsequent experiments with a variety of proteins have confirmed this fundamental postulate. For his studies of the relationship of sequence and structure, Anfinsen shared the 1972 Nobel Prize in Chemistry (with William H. Stein and Stanford Moore).
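The count of 105 possible disulfide arrangements can be verified both by the product described in the text and by brute-force enumeration. In the sketch below the eight residue numbers are the positions marked in Figure 6.32, which correspond to the eight Cys residues of ribonuclease A; the function names are hypothetical.

```python
# Residue numbers marked in Figure 6.32 (the eight Cys residues of ribonuclease A).
CYS_POSITIONS = [26, 40, 58, 65, 72, 84, 95, 110]


def count_pairings(n_cys):
    """Number of ways to pair n_cys residues: (n-1) x (n-3) x ... x 1."""
    if n_cys % 2:
        raise ValueError("need an even number of Cys residues")
    count = 1
    for partners in range(n_cys - 1, 0, -2):
        count *= partners
    return count


def enumerate_pairings(items):
    """Yield every way of splitting items into unordered disulfide pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in enumerate_pairings(remaining):
            yield [(first, partner)] + tail


if __name__ == "__main__":
    total = count_pairings(len(CYS_POSITIONS))
    print(total)                                         # 7 x 5 x 3 x 1
    print(len(list(enumerate_pairings(CYS_POSITIONS))))  # same count by enumeration
    print(f"chance of the native pairing at random: {1 / total:.2%}")
```

Only one of the 105 arrangements is the native one, so random pairing would give slightly under 1% correctly bridged molecules, consistent with the residual activity Anfinsen observed.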
Is There a Single Mechanism for Protein Folding?
Christian Anfinsen’s experiments demonstrated that proteins can fold reversibly. A corollary result of Anfinsen’s work is that the native structures of at least some globular proteins are thermodynamically stable states. But the matter of how a given protein achieves such a stable state is a complex one. Cyrus Levinthal pointed out in
1968 that so many conformations are possible for a typical protein that the protein does not have sufficient time to reach its most stable conformational state by
sampling all the possible conformations. This argument, termed Levinthal’s paradox, goes
as follows: Consider a protein of 100 amino acids. Assume that there are only two conformational possibilities per amino acid, or $2^{100} \approx 1.27 \times 10^{30}$ possibilities. Allow
$10^{-13}$ sec for the protein to test each conformational possibility in search of the overall energy minimum:
$$(10^{-13}\ \text{sec})(1.27 \times 10^{30}) = 1.27 \times 10^{17}\ \text{sec} \approx 4 \times 10^{9}\ \text{years}$$
Four billion years is the approximate age of the earth.
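The arithmetic of Levinthal’s argument is easy to reproduce. The sketch below uses the numbers given in the text (100 residues, two conformations per residue, 10⁻¹³ sec per conformation tested); the function name and the 365.25-day year are assumptions made for the conversion to years.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length for the conversion


def levinthal_search(n_residues=100, states_per_residue=2, seconds_per_test=1e-13):
    """Return (conformations, seconds, years) for an exhaustive conformational search."""
    conformations = states_per_residue ** n_residues
    seconds = conformations * seconds_per_test
    years = seconds / SECONDS_PER_YEAR
    return conformations, seconds, years


if __name__ == "__main__":
    conformations, seconds, years = levinthal_search()
    print(f"conformations: {conformations:.3e}")
    print(f"search time:   {seconds:.3e} sec")
    print(f"               {years:.2e} years")
```

Changing `states_per_residue` to 3 shows how quickly the estimate grows: the search time increases by many orders of magnitude, which only strengthens the argument.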
Levinthal’s paradox led protein chemists to hypothesize that proteins must fold
by specific “folding pathways,” and many research efforts have been devoted to the search for these pathways. Several consistent themes have emerged from these studies. Each of them may well play a role in the folding process:
• Secondary structures (helices, sheets, and turns) probably form first.
• Nonpolar residues may aggregate or coalesce in a process termed a hydrophobic collapse.
• Subsequent steps probably involve formation of long-range interactions between secondary structures or involving other hydrophobic interactions.
• The folding process may involve one or more intermediate states, including transition states and what have become known as molten globules.
The folding of most globular proteins may well involve several of these themes. For example, even in the denatured state, many proteins appear to possess small
amounts of residual structure due to hydrophobic interactions, with strong
interresidue contacts between side chains that are relatively distant in the native protein structure. Such interactions, together with small amounts of secondary structure,
may act as sites of nucleation for the folding process. A bit further in the folding
process, the molten globule is postulated to be a flexible but compact form characterized by significant amounts of secondary structure, virtually no precise tertiary structure, and a loosely packed hydrophobic core. Moreover, it is likely that any one