Numerical calculations of the pH of maximal protein stability
The effect of the sequence composition and three-dimensional structure
Emil Alexov
Howard Hughes Medical Institute and Columbia University, Biochemistry Department, New York, USA
A large number of proteins, found experimentally to have different optimum pH of maximal stability, were studied to reveal the basic principles of their preference for a particular pH. The pH-dependent free energy of folding was modeled numerically as a function of pH, as was the net charge of the protein. The optimum pH was determined in the numerical calculations as the pH of the minimum free energy of folding. The experimental data for the pH of maximal stability (experimental optimum pH) were reproduced (rmsd = 0.73). It was shown that the optimum pH results from two factors: amino acid composition and the organization of the titratable groups within the 3D structure. It was demonstrated that the optimum pH and isoelectric point could be quite different. In many cases, the optimum pH was found at a pH corresponding to a large net charge of the protein. At the same time, there was a tendency for proteins having acidic optimum pHs to have a base/acid ratio smaller than one and vice versa. The correlation between the optimum pH and base/acid ratio is significant if only buried groups are taken into account. It was shown that a protein that provides a favorable electrostatic environment for acids and disfavors the bases tends to have a high optimum pH and vice versa.
Keywords: electrostatics; pH stability; pKa; optimum pH
The concentration of hydrogen ions (pH) is an important factor that affects protein function and stability in different locations in the cell and in the body [1]. Physiological pH varies among organs in the human body: the pH in the digestive tract ranges from 1.5 to 7.0, in the kidney it ranges from 4.5 to 8.0, and body liquids have a pH of 7.2–7.4 [2]. It was shown that the interstitial fluid of solid tumors has pH = 6.5–6.8, which differs from the physiological pH of normal tissue and thus can be used for the design of pH-selective drugs [3].
The structure and function of most macromolecules are influenced by pH, and most proteins operate optimally at a particular pH (optimum pH) [4]. On the basis of indirect measurements, it has been found that the intracellular pH usually ranges between 4.5 and 7.4 in different cells [5]. The pH of an organelle affects protein function, and variation of pH away from normal could be responsible for drug resistance [6]. Lysosomal enzymes function best at the low pH of 5 found in lysosomes, whereas cytosolic enzymes function best at the close to neutral pH of 7.2 [1].
Experimental studies of pH-dependent properties [7–11], such as stability, solubility and activity, provide the benchmarks for numerical simulation. Experiments revealed that although the net charge of ribonuclease Sa does affect the solubility, it does not affect the pH of maximal stability or activity [12]. Another experimental technique, acidic or basic denaturation [13–15], demonstrates the importance of electrostatic interactions for protein stability.
pH-dependent phenomena have been extensively modeled using numerical approaches [16–19]. A typical task is to compute the pKas of ionizable groups [20–26], the isoelectric point [27,28] or the electrostatic potential distribution around the active site [29]. It was shown that the activity of nine lipases correlates with the pH dependence of the electrostatic potential mapped on the molecular surface of the molecules [29]. The pH dependence of the unfolding energy has been modeled extensively, and the models reproduced reasonably well the experimental denaturation free energy as a function of pH [19,30–36].
The success of the numerical protocol to compute the pH dependence of the free energy depends on the model of the unfolded state, the model of the folded state and thus on the calculated pKas. It is well recognized that the unfolded state is compact and native-like, but the magnitude of the residual pairwise interactions and the desolvation energies has been debated. Some of the studies found that any residual structure of the unfolded state has a negligible effect on the calculated pH dependence of the unfolding free energy [31], while others found the opposite [33–36]. It was estimated that the pKas of the acidic groups in the unfolded state are shifted by −0.3 pK units with respect to the pKas of model compounds. Although including the measured and simulated pK shifts in the model of the unfolded state changes the pH dependence of the unfolding free energy, in most cases it does not change the pH of maximal stability [33–36]. Much more
Correspondence to E Alexov, Howard Hughes Medical Institute and
Columbia University, Biochemistry Department, 630W 168 Street,
New York, NY 10032, USA.
Fax: + 1 212 305 6926, Tel.: + 1 212 305 0265,
E-mail: ea388@columbia.edu
Abbreviations: MCCE, multi-conformation continuum electrostatic;
SAS, solvent accessible surface.
(Received 15 September 2003, accepted 11 November 2003)
important is the modeling of the folded state, where the errors of computing pKas could be significantly larger than 0.3 units. Over the years there has been a continuous effort to develop methods for accurate pKa predictions [20,21]. These include empirical methods [37], macroscopic methods [38–41], finite difference Poisson–Boltzmann (FDPB)-based methods [20–22,42], FDPB and molecular dynamics [43–45], FDPB and molecular mechanics [25,46,47] and Warshel's microscopic methods (e.g., [16,17]). The predicted pKas were benchmarked against the experimental data and the average rmsd was found to vary from the best value of 0.5 pK [38], to 0.7 pK [48], to 0.83 pK [25] and to 0.89 pK [22]. The Multi-Conformation Continuum Electrostatics (MCCE) [25] method was shown to be among the best pKa predictors and it is employed in this work.
In the present work we compute the pH dependence of the free energy of folding and the net charge. The optimum pH was identified as the pH at which the free energy of folding has a minimum. A large number of proteins having different optimum pH [49] were studied to find the effect of the amino acid composition and 3D structure on the optimum pH.
Experimental procedures
Methods
Calculations were carried out using available 3D structures of selected proteins. A text search was performed on the BRENDA database [49] in the field of pH of stability. The following search strings were used: maximal stability, maximum stability, optimal stability, optimum stability, best stability, highest stability and greatest stability. This revealed 168 proteins with experimentally determined pHs of maximal stability. Then a search of the Protein Data Bank (PDB) was performed to find available structures for these proteins. An attempt was made to select PDB structures of proteins from the same species as those used in the experiment (43 structures). Structures with missing residues were omitted, as were the structures of proteins participating in large complexes, resulting in a final set of 28 protein structures. The protein names, the PDB file names and the experimental pH of maximal stability are provided in Table 1. The source of the data is the BRENDA database and thus the present study is limited to the proteins listed there. There will always be proteins with experimentally determined
Table 1. Proteins and corresponding PDB [57] files used in the paper. The experimental optimum pH (pH of optimal stability) is taken from the BRENDA website [49]. The calculated optimum pH (the pH of the minimum of the free energy of folding) is given in the fourth column. The difference (fifth column) is the calculated optimum pH minus the experimental value. The base/acid ratio for all ionizable groups is in the sixth column, while the seventh shows the base/acid ratio for 66% buried groups. The last three columns show the averaged intrinsic pK shift, the averaged pKa shift and the net charge of the folded protein at the optimum pH, respectively.
Protein                 PDB code  Exp. pH  Calc. pH  Difference  Base/acid  Buried base/acid  Intrinsic pK shift  pKa shift  Net charge
Dioxygenase             1b4u      8.0      8.0       0.0         0.94       1.33              0.08                -0.51      -3.0
Transferase             1f8x      6.5      5.0       -1.5        0.72       0.28              0.40                0.34       -5.5
Glutathione synthetase  1sga      8.0      7.5       -0.5        0.87       0.88              0.41                -0.58      -10.0
Isomerase               1b0z      6.0      6.0       0.0         1.02       0.90              0.05                -0.48      2.1
Coenzyme A              1bdo      6.5      7.0       0.5         0.67       1.50              0.22                0.03       -4.1
Dienelactone hydrolase  1din      7.0      6.5       -0.5        1.04       1.17              0.26                -0.36      -2.7
Dehydrogenase           1dpg      6.2      6.0       -0.2        0.79       1.05              0.38                -0.41      -13.0
Endothiapepsin          1gvx      4.15     4.0       -0.15       0.52       0.07              1.45                2.06       6.5
Dehydratase             1aw5      9.0      9.0       0.0         1.07       0.85              0.17                -0.48      -6.8
Cathepsin B             1huc      5.15     5.0       -0.15       0.90       0.73              1.28                0.11       5.8
Alginate lyase          1hv6      7.0      7.0       0.0         1.17       0.93              0.63                -0.72      2.7
Xylanase                1igo      5.5      6.5       1.0         1.41       1.00              0.60                -0.74      7.3
Hydrolase               1iun      7.5      7.0       -0.0        0.86       1.50              0.11                -1.15      -1.1
Aspartic protease       1j71      4.15     3.0       -1.15       0.54       0.33              0.98                1.32       9.4
Aldolase                1jcj      8.5      8.5       0.0         0.97       0.54              0.55                -0.19      -5.1
L-Asparaginase          1jsl      8.5      7.0       -1.5        1.17       1.85              -0.12               -0.83      -0.1
Amylase                 1lop      5.9      6.0       0.1         0.81       1.00              0.33                -0.42      -8.2
γ-Glutamyl hydrolase    1l9x      7.0      7.5       0.5         1.19       0.77              0.45                -0.02      2.8
Metapyrocatechase       1mpy      7.7      7.0       -0.7        1.0        1.33              0.11                -1.35      -12.0
Pyruvate oxidase        1pow      5.7      6.0       0.3         0.91       0.78              0.60                -0.51      -2.0
Chitosanase             1qgi      6.0      6.5       0.5         1.09       0.54              0.29                -0.31      5.0
Xylose isomerase        1qt1      8.0      8.0       0.0         0.84       1.50              0.24                -0.30      -16.0
Pyruvate decarboxylase  1zpd      6.0      7.0       1.0         1.02       0.83              0.47                -0.24      3.8
Acid α-amylase          2aaa      4.9      4.0       -0.90       0.51       0.64              1.53                1.48       -1.7
Formate dehydrogenase   2nac      5.6      7.0       1.40        1.11       1.42              0.06                -1.1       2.4
Phosphorylase           2tpt      6.0      5.0       -1.0        0.91       0.93              0.38                -0.34      -3.8
β-Amylase               5bca      5.5      5.0       -0.5        1.07       0.91              0.19                -0.13      15.1
optimum pH that were not in the database, and therefore are not modeled in the paper. However, an additional four well-studied proteins were used to benchmark the method over a broad pH range and to compare the effect of mutations.
Free energy and net charge of unfolded state
The unfolded state is modeled as a chain of noninteracting amino acids (the possibility of residual interactions in the unfolded state is discussed at the end of the discussion section). Thus, the free energy of ionizable groups (pH-dependent free energy) is calculated as [31]:
\[
\Delta G_{\mathrm{unf}} = -kT\ln(Z_{\mathrm{unf}}) = -kT\sum_{i=1}^{N}\ln\left\{1+\exp\left[-2.3\,\gamma(i)\left(\mathrm{pH}-pK_{\mathrm{sol}}(i)\right)\right]\right\} \tag{1}
\]
where k is the Boltzmann constant, T is the temperature in Kelvin, N is the number of ionizable groups, γ(i) is +1 for bases and −1 for acids, pKsol(i) is the standard pKa value in solution of group i (e.g., [47]) and pH is the pH of the solution. Zunf is the partition function of the unfolded state and ΔGunf is the free energy of the unfolded state. The reference state of zero free energy is defined as the state with all groups in their neutral forms [31].
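The net charge is calculated using the standard formula that comes from the Henderson–Hasselbalch equation:
\[
q_{\mathrm{unf}} = \sum_{i=1}^{N}\gamma(i)\,\frac{10^{-\gamma(i)\left(\mathrm{pH}-pK_{\mathrm{sol}}(i)\right)}}{1+10^{-\gamma(i)\left(\mathrm{pH}-pK_{\mathrm{sol}}(i)\right)}} \tag{2}
\]
where γ(i) = −1 or +1 in the case of an acid or a base, respectively.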
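The two unfolded-state formulas can be sketched in a few lines of Python. This is only an illustration: the model-compound pKa values in `PK_SOL` are assumed round numbers rather than the set used in the paper, the function names are hypothetical, and energies are returned in units of kT.

```python
import math

LN10 = math.log(10.0)  # the factor written as 2.3 in Eqn 1

# Illustrative model-compound pKa values (assumed, not taken from the paper).
PK_SOL = {"D": 4.0, "E": 4.4, "H": 6.5, "K": 10.4, "R": 12.0}
GAMMA = {"D": -1, "E": -1, "H": +1, "K": +1, "R": +1}  # -1 acid, +1 base


def unfolded_free_energy(sequence, ph):
    """Eqn 1 in units of kT: -sum_i ln{1 + exp[-2.3*gamma_i*(pH - pKsol_i)]}."""
    total = 0.0
    for res in sequence:
        if res in PK_SOL:
            x = -LN10 * GAMMA[res] * (ph - PK_SOL[res])
            total -= math.log1p(math.exp(x))
    return total


def unfolded_net_charge(sequence, ph):
    """Eqn 2: sum over groups of gamma_i times the ionized fraction."""
    charge = 0.0
    for res in sequence:
        if res in PK_SOL:
            w = 10.0 ** (-GAMMA[res] * (ph - PK_SOL[res]))
            charge += GAMMA[res] * w / (1.0 + w)
    return charge
```

For a toy sequence with one Asp and one Lys, the net charge vanishes midway between the two assumed pKa values, and the free energy of the unfolded state is highest there, in line with the bell-shaped profile discussed in the Results.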
Free energy and net charge of the folded state
The pH-dependent free energy of the folded state is calculated using the 3D structures of the proteins listed in Table 1. The 3D structure comprises N ionizable groups (the same number as in the unfolded state) and L polar groups. Each of them might have several alternative side-chain rotamers [50] or alternative polar proton positions [47]. In addition, ionizable groups are either ionized or neutral. All these alternatives are called conformers, which are either ionization or positional conformers. There is no a priori information to indicate which conformer is most likely to exist at given conditions of, for example, pH and salt concentration. Each microstate comprises one conformer per residue. The Monte Carlo method was used to estimate the probability of microstates. This procedure is called multi-conformation continuum electrostatics (MCCE) and it is described in more detail elsewhere [25,47,50]. A brief summary of the MCCE method is provided in a later section.
To find the free energy one should calculate the partition function for each of the proteins. Thus, one should construct all possible combinations of conformers. Because of the very large number of conformers (in most cases more than 1000), the Monte Carlo method (Metropolis algorithm [51]) is used to find the probability of the microstates [20,47,50,52]. However, to construct the partition function one needs to know all microstate energies and to sum their Boltzmann exponentials. Each microstate energy should be counted only once, which introduces an extra level of complexity. A special procedure was designed that collects the lowest microstate energies and ensures that each microstate is counted only once [50]. A microstate was considered to be unique if its energy differs by more than 0.001 kT from the energies of all previously generated states. A much more stringent procedure that compares the microstate composition would require significant computation time and therefore was not implemented. This results in a function that estimates the partition function. This effective partition function will not include the states with high energy (they are rejected by the Metropolis algorithm), but they have a negligible effect [53]. In addition, the constructed partition function may not include all low-energy microstates, because a given microstate may not be generated in the Monte Carlo sampling or because two or more distinct microstates may have identical or very similar energies. Bearing in mind all these possibilities, the effective partition function (Zfol) is calculated as [50]:
\[
Z_{\mathrm{fol}} = \sum_{n=1}^{X_{\mathrm{fol}}}\exp\left(-\Delta G^{\mathrm{fol}}_{n}/kT\right) \tag{3}
\]
where ΔGfol(n) is the energy of microstate n and Xfol is the number of microstates collected in the Monte Carlo procedure. Then the free energy of ionizable and polar groups in the folded state is:
\[
\Delta G_{\mathrm{fol}} = -kT\ln(Z_{\mathrm{fol}}) \tag{4}
\]
The occupancy of each conformer (ρfol(i)) [52] is calculated in the Metropolis algorithm and then used to calculate the net charge of the folded state:
\[
q_{\mathrm{fol}} = \sum_{i=1}^{M}\gamma(i)\,\rho^{\mathrm{fol}}_{i} \tag{5}
\]
where M is the total number of conformers. [Note that γ(i) = 0 for nonionizable conformers.]
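The construction of the effective partition function can be illustrated with a short Python sketch. It assumes that the sampled microstate energies are already available as a list in units of kT; the function name and the sort-then-filter strategy are illustrative choices, not the procedure of [50].

```python
import math


def effective_free_energy(microstate_energies, tol=0.001):
    """Return -ln(Z_fol) in kT from sampled microstate energies (in kT).

    Energies that lie within `tol` kT of an already accepted energy are
    treated as repeats of the same microstate, mirroring the uniqueness
    criterion described in the text.
    """
    unique = []
    for e in sorted(microstate_energies):
        # The list is sorted, so comparing with the last accepted value
        # is enough to compare with all accepted values.
        if not unique or e - unique[-1] > tol:
            unique.append(e)
    e_min = unique[0]
    # Shifting by the lowest energy keeps the exponentials in range.
    z_scaled = sum(math.exp(-(e - e_min)) for e in unique)
    return e_min - math.log(z_scaled)
```

As the text points out, an energy-based uniqueness test cannot distinguish two different microstates with nearly identical energies, so the result is an estimate rather than the exact free energy.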
Free energy of folding
The pH-dependent free energy of folding is calculated as the difference between the free energy of the folded and unfolded states:
\[
\Delta\Delta G_{\mathrm{folding}} = \Delta G_{\mathrm{fol}} - \Delta G_{\mathrm{unf}} \tag{6}
\]
An alternative formula for calculating the pH dependence of the free energy of folding is [19,31,54,55]:
\[
\Delta\Delta G_{\mathrm{folding}} = 2.3\,kT\int_{\mathrm{pH}_1}^{\mathrm{pH}_2}\Delta q\,\mathrm{d}\,\mathrm{pH} \tag{7}
\]
where pH1 and pH2 determine the pH interval and Δq is the change of the net charge of the protein from the unfolded to the folded state.
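As a consistency check, the link between Eqn 7 and the partition-function route can be verified for the unfolded state directly from Eqns 1 and 2. The short derivation below assumes that the factor 2.3 stands for ln 10, and that the analogous relation holds for the folded state, which the text supports by citation rather than by derivation.

\[
\frac{\partial \Delta G_{\mathrm{unf}}}{\partial\,\mathrm{pH}}
= -kT\sum_{i=1}^{N}\frac{-2.3\,\gamma(i)\,e^{-2.3\,\gamma(i)(\mathrm{pH}-pK_{\mathrm{sol}}(i))}}{1+e^{-2.3\,\gamma(i)(\mathrm{pH}-pK_{\mathrm{sol}}(i))}}
= 2.3\,kT\sum_{i=1}^{N}\gamma(i)\,\frac{10^{-\gamma(i)(\mathrm{pH}-pK_{\mathrm{sol}}(i))}}{1+10^{-\gamma(i)(\mathrm{pH}-pK_{\mathrm{sol}}(i))}}
= 2.3\,kT\,q_{\mathrm{unf}}
\]

With the corresponding relation for the folded state, subtracting and integrating from pH1 to pH2 gives

\[
\Delta\Delta G_{\mathrm{folding}}(\mathrm{pH}_2)-\Delta\Delta G_{\mathrm{folding}}(\mathrm{pH}_1)
= 2.3\,kT\int_{\mathrm{pH}_1}^{\mathrm{pH}_2}\left(q_{\mathrm{fol}}-q_{\mathrm{unf}}\right)\mathrm{d}\,\mathrm{pH}
\]

so the folding free energy has an extremum at the pH where the folded and unfolded states carry the same net charge.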
Computational method: MCCE method
The basic principles of the method have been described elsewhere [47,50]. The MCCE [25] method allows us to find the equilibrated conformation and ionization states of protein side chains, buried waters, ions, and ligands. The method uses multiple preselected choices for atomic positions and ionization states for many selected side chains and ligands. Then, electrostatic and nonelectrostatic energies are calculated, providing look-up tables of conformer self-energies and conformer–conformer pairwise interactions. Protein microstates are then constructed by choosing one conformer for each side chain and ligand. Monte Carlo sampling then uses each microstate energy to find each conformer's probability.
Thus, the MCCE procedure is divided into three stages: (a) selection of residues and generation of conformers; (b) calculation of energies; and (c) Monte Carlo sampling.
Selection of residues. The amino acids that are involved in strong electrostatic interactions (magnitude > 3.5 kT) are selected. They are provided with extra side-chain rotamers to reduce the effects of possible imperfections of crystal structures. The reason is that a small change in their position might cause a significant change in the pairwise interactions [56]. The threshold of 3.5 kT was chosen based on extensive modeling of structures and fitting to experimentally determined quantities [25]. The selection is made by calculating the electrostatic interactions using the original PDB [57] structure. The alternative side chains for these selected residues are built using a standard library of rotamers [58] and by adding an extra side-chain position using a procedure developed in Honig's laboratory [59]. The backbone is kept rigid. Then the original structure and alternative side chains are provided with hydrogen atoms. Polar protons of the side chains are assigned by satisfying all hydrogen acceptors and avoiding all hydrogen donors [25]. Thus, every polar side chain and the neutral forms of acids have alternative polar proton positions.
Calculation of energies. The alternative side chains and polar proton positions determine the conformational space for a particular structure, and they are called conformers. The next step is to compute the energies of each conformer and to store them in look-up tables. Because of conformational flexibility, the energy is no longer only electrostatic in origin, but also has a nonelectrostatic component [47,50].
Electrostatic energies are calculated by DelPhi [60,61], using the PARSE [62] charge and radii set. The internal dielectric constant is 4 [63], while the solution dielectric constant is taken to be 80. The molecular surface is generated with a water probe of radius 1.4 Å [64]. The ionic strength is 0.15 M and the linear Poisson–Boltzmann equation is used. The focusing technique [65] was employed to achieve a grid resolution of about two grid points per Ångström. The M calculations, where M is the number of conformers, produce a vector of length M for the reaction field energy ΔGrxn,i and an M × M array of the pairwise interactions between all possible conformers, ΔGel(ij). In addition, each conformer has pairwise electrostatic interactions with the backbone, resulting in a vector of length M, ΔG. The magnitude of the strong pairwise and backbone interactions is altered as described in [56]. Such a correction was shown to improve significantly the accuracy of the calculated pKas [25].
Having alternative side chains and polar hydrogen positions requires the nonelectrostatic energy to be taken into account too. This energy is a constant in calculations that use a rigid protein structure (and therefore need not be calculated), but in MCCE it plays an important role in discriminating between alternative positional conformers. The nonelectrostatic interactions for each conformer are the torsion energy, a self-energy term which is independent of the position of all other residues in the protein, and the pairwise Lennard–Jones interactions, both with portions of the protein that are held rigid and with conformers of side chains that have different allowed positions [25,47,50].
Thus, the pH-dependent free energy of microstate n of the folded state is [20,21,47,50]:
\[
\Delta G^{\mathrm{fol}}_{n} = \sum_{i=1}^{M}\left\{2.3\,kT\,\delta_{n}(i)\left[\gamma(i)\left(\mathrm{pH}-pK_{\mathrm{sol}}(i)\right)+\Delta pK_{\mathrm{int}}(i)\right]+\sum_{j=i+1}^{M}\delta_{n}(i)\,\delta_{n}(j)\left(G^{\mathrm{el}}_{ij}+G^{\mathrm{nonel}}_{ij}\right)\right\};
\]
\[
\Delta pK_{\mathrm{int}}(i) = \Delta pK_{\mathrm{solv}}(i)+\Delta pK_{\mathrm{dip}}(i)+\Delta pK_{\mathrm{nonel}}(i) \tag{8}
\]
where δn(i) is 1 if the ith conformer is present in the nth microstate (and 0 otherwise), M is the total number of conformers, ΔpKint(i) is the electrostatic and nonelectrostatic permanent energy contribution to the energy of conformer i (note that it does not contain interactions with polar groups), γ(i) is 1 for bases, −1 for acids, and 0 for neutral groups, ΔpKsolv(i) is the change of solvation energy of group i, ΔpKdip(i) is the electrostatic interaction with permanent charges, ΔpKnonel(i) is the nonelectrostatic energy with the rigid part of the protein, and Gel(ij) and Gnonel(ij) are the pairwise electrostatic and nonelectrostatic interactions, respectively, between conformers i and j.
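The microstate energy of Eqn 8 amounts to a sum over look-up tables. The Python sketch below shows one way to evaluate it, assuming energies in units of kT, a symmetric table `pair` that already holds the sum of the electrostatic and nonelectrostatic pairwise terms, and hypothetical argument names.

```python
LN10 = 2.302585092994046  # the factor written as 2.3 in Eqn 8


def microstate_energy(state, gamma, pk_sol, dpk_int, pair, ph):
    """Eqn 8 in kT; `state` lists the indices of the conformers present."""
    energy = 0.0
    for a, i in enumerate(state):
        # Self term: titration against solution pKa plus the intrinsic shift.
        energy += LN10 * (gamma[i] * (ph - pk_sol[i]) + dpk_int[i])
        # Pairwise terms, each pair of present conformers counted once.
        for j in state[a + 1:]:
            energy += pair[i][j]
    return energy
```

Listing only the conformers that are present plays the role of the δn(i) factors: absent conformers simply never enter the sums.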
Monte Carlo sampling. The Monte Carlo algorithm is used to estimate the occupancy (the probability) of each conformer at a given pH. The convergence is considered successful if the average fluctuation of the occupancy is smaller than 0.01 [25]. The pH at which the net charge of a given titratable group is 0.5 is pK½. To adopt a common nomenclature, pK½ will be referred to as pKa throughout the text.
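A toy Metropolis sampler makes the occupancy estimate concrete. This is a minimal sketch under stated assumptions, not the MCCE implementation: energies are in kT, `self_e[i]` is assumed to hold the full pH-dependent self term of conformer i, `pair` is a symmetric pairwise table, the names are hypothetical, and the convergence criterion quoted above is replaced by a fixed number of steps.

```python
import math
import random


def metropolis_occupancy(residues, self_e, pair, n_steps=20000, seed=0):
    """Estimate conformer occupancies by Metropolis sampling (energies in kT).

    residues: list of lists, the conformer indices available to each residue.
    self_e:   self_e[i] is the self energy of conformer i at the chosen pH.
    pair:     pair[i][j] is the symmetric pairwise energy of conformers i, j.
    """
    rng = random.Random(seed)
    state = [confs[0] for confs in residues]
    counts = [0] * len(self_e)

    def energy_of(r, conf):
        # All energy terms that involve residue r when it adopts `conf`.
        return self_e[conf] + sum(
            pair[conf][other] for k, other in enumerate(state) if k != r)

    for _ in range(n_steps):
        r = rng.randrange(len(residues))
        trial = rng.choice(residues[r])
        delta = energy_of(r, trial) - energy_of(r, state[r])
        # Metropolis rule: accept downhill moves, uphill with exp(-delta).
        if delta <= 0.0 or rng.random() < math.exp(-delta):
            state[r] = trial
        for conf in state:
            counts[conf] += 1
    return [c / n_steps for c in counts]
```

The occupancies of the conformers belonging to one residue sum to one, and multiplying each occupancy by the charge of its conformer and summing gives the folded-state net charge.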
Optimum pH, isoelectric point (pI) and bases/acids ratio
The experimental pH of maximal stability for each of the proteins listed in Table 1 is taken from the BRENDA website [49]. The database does not always provide a single number for the optimum pH. If a given protein is reported to be stable over a range of pHs, then the optimum pH is taken to be the middle of the pH range.
The optimum pH in the numerical calculation is determined as the pH at which the free energy of folding has a minimum. In the case that the free energy of folding has a minimum over a pH interval, the optimum pH is the middle of the interval. The calculations were carried out in steps of ΔpH = 1. Thus, the computational resolution of determining the pH optimum was 0.5 pH units.
The calculated and experimental pH intervals were not compared, because in many cases the BRENDA database provides only the pH of optimal stability. In addition, in most cases the experimental pH interval of stability given in the BRENDA database does not provide information about the free energy change that the protein can tolerate and still be stable. Therefore it cannot be compared with the numerical results, which provide only the pH dependence of the folding free energy. Some proteins may tolerate a free energy change of 10 kcal·mol⁻¹ and still be stable, while others become unstable upon a change of only a few kcal·mol⁻¹.
The calculated isoelectric point (pI) is the pH at which the net charge of the folded state is equal to zero. There are practically no experimental data for the pI of the proteins listed in Table 1. The net charge at optimum pH is the calculated net charge of the folded protein at the pH optimum. The base/acid ratio was calculated by counting all Asp and Glu residues as acids and all Arg, Lys and His residues as bases. In some cases, one or more acidic and/or His residues were calculated to be neutral at a particular pH optimum, but they were still counted. The reason for this was to avoid the bias of the 3D structure and to calculate the base/acid ratio purely from the sequence. A given residue is counted as 66% buried if its solvent accessible surface (SAS) is at most one-third of its SAS in solution. Averaged intrinsic pK shifts were calculated as
\[
\frac{1}{N}\sum_{i=1}^{N}\left(pK_{\mathrm{int}}(i)-pK_{\mathrm{sol}}(i)\right)
\]
and the averaged pKa shift as
\[
\frac{1}{N}\sum_{i=1}^{N}\left(pK_{a}(i)-pK_{\mathrm{sol}}(i)\right)
\]
Thus, a negative pK shift corresponds to conditions such that the protein stabilizes acids and destabilizes bases and vice versa. Arginines were not included in the calculations because their pKas are calculated in many cases to be outside the calculated pH range.
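Both descriptors are simple to compute once the sequence and the pK values are at hand. The following Python sketch uses one-letter residue codes and hypothetical function names; it does not implement the burial criterion, which needs solvent accessible surface values from a structure.

```python
def base_acid_ratio(sequence):
    """Count Arg, Lys, His as bases and Asp, Glu as acids (one-letter codes)."""
    bases = sum(sequence.count(r) for r in "RKH")
    acids = sum(sequence.count(r) for r in "DE")
    return bases / acids  # raises ZeroDivisionError if there are no acids


def averaged_pk_shift(pk_values, pk_sol_values):
    """Mean of pK(i) - pKsol(i) over the N groups supplied."""
    shifts = [pk - sol for pk, sol in zip(pk_values, pk_sol_values)]
    return sum(shifts) / len(shifts)
```

Passing intrinsic pK values to `averaged_pk_shift` gives the averaged intrinsic pK shift, and passing calculated pKa values gives the averaged pKa shift.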
Results
Origin of optimum pH
The paper reports the pH dependence of the free energy of folding. Despite the differences among the calculated proteins, the results show that the pH-dependence profile of the free energy of folding is approximately bell-shaped and has a minimum at a certain pH, referred to throughout the paper as the optimum pH.
To better understand the origin of the optimum pH, a particular case will be considered in detail. Figure 1A shows the free energies of cathepsin B calculated in the pH range 0–14. Three energies were computed: the free energy of the unfolded state (bottom line), the free energy of the folded state (middle line) and the free energy of folding (top curve). For the sake of convenience the free energies of the folded state and of folding are shifted by additive constants so as to have the same magnitude as the free energy of the unfolded state at the pH of the extreme value (in this case pH = 5). This improves the resolution of the graph without changing its interpretation, because the energies contain an undetermined constant (hydrophobic interactions, entropy change, van der Waals interactions and other pH-independent energies).
Free energy of unfolded state. It can be seen (Fig. 1A) that the free energy of the unfolded state has a maximum value at pH = 5 and that it rapidly decreases at low and high pHs. Such behavior can be easily understood given Eqn 1. At low pH, the pKsol of all acidic groups is higher than the current pH and thus they contribute negligibly to the partition function. In contrast, all basic groups contribute significantly to the partition function. As the pH decreases, their contribution increases, making the free energy more negative. At medium pHs, all ionizable groups are ionized (except His and Tyr), but their effect on the free energy is quite small, because their pKsol are close to the pH. This results in a maximum of the free energy corresponding to the least favorable state. At high pHs, the situation is reversed: all acidic groups make a major contribution to the partition function, while bases add very little. Thus, the free energy profile of the unfolded state is always a smooth (bell-shaped) curve with a maximum at a certain pH. The shape of the curve and the position of the maximum depend entirely upon the amino acid composition.
Fig 1 Cathepsin B pH-dependent properties.
(A) Free energy; (B) net charge.
Free energy of folded state. The free energy of the folded state behaves in a similar manner, but it changes less with the pH (Fig. 1A). Note that it has a maximum at pH = 6. The major difference occurs at low and high pHs, where the free energy of the folded state does not decrease as fast as that of the unfolded state. The 3D structure adds several new energy terms to the microstate energy (Eqn 8) and to the partition function: ΔpKint(i) (which originates in part from the desolvation energy) and the pairwise interactions Gij (a detailed discussion of the effect of desolvation and pairwise energies on the stability is given in [31]). If these two terms compensate each other, then Eqn 8 might be thought to resemble the microstate energy formula of the unfolded state, Eqn 1. But there is an important difference: the amino acids are coupled through the pairwise interactions. The pairwise energies are a function of the ionization states. Thus, the de-ionization of a given group will cancel its pairwise interaction energies with the rest of the protein. The effect of the coupling can be easily understood at the extremes of pH. Consider a very low pH such that the pKas of all acidic groups are higher than the current pH. At such a pH all acids will be fully protonated and thus the bases (having their own desolvation penalty) will be left without favorable interactions. Thus the energy of the folded state will be less favorable (because of the desolvation energy and the lack of favorable interactions) than the energy of the unfolded state.
Free energy of folding. The pH dependence of the free energy of folding results from the difference of the above free energies (Fig. 1A). It will always have a minimum at a certain pH (in principle it might have more than one minimum). This minimum may or may not coincide with the pH at which the unfolded free energy has its maximum. The folding free energy always has a bell shape, and it is unfavorable at low and high pHs as compared to the free energy at the optimum pH.
Net charge. An alternative way of addressing the same question is to compute the net charge of the protein (Fig. 1B). One can see that at the extremes of pH, the protein is highly charged. At low pH it has a huge net positive charge and at high pH a huge net negative charge. A straightforward conclusion could be made that acidic/basic denaturation is caused by the repulsion forces among charges of the same type. However, all these positive charges at low pH exist also at medium pH, where the proteins are stable. What is missing at low pH, and what causes acid denaturation, is the favorable interactions with negatively charged groups. At low pH, bases are left without the support of acids, and they have to pay an energy penalty for their desolvation and for unfavorable pairwise energies among themselves.
Equation 7 provides an additional tool for determining the optimum pH. At the optimum pH, the curve of the folding free energy must have an extremum, i.e. the curve must invert its pH behavior. At pH lower than the optimum pH, the free energy of folding should decrease with increasing pH, then it should have a minimum at pH equal to the optimum pH, and then it should increase with a further increase of the pH. Such behavior corresponds to a negative net charge difference between the folded and unfolded states at pH smaller than the optimum pH. As the pH increases, the net charge difference should get smaller in magnitude, and at the optimum pH it should be zero. A further increase of the pH (above the optimum pH) should make the net charge difference a positive number. One can see in Fig. 1B that the net charge of folding follows such a pattern and is zero at pH = 5, where the free energy of folding has a minimum.
General analysis of the optimum pH
Comparison to experimental data. Although this paper focuses on the pH of maximal stability, it is useful to compare the calculated pH dependence of the folding free energy for a set of proteins subjected to extensive experimental measurements. Figure 2 plots the calculated and experimental pH dependence of the free energy of folding. The experimental data are taken from Fersht [66,67], Robertson [68] and Pace [10]. One can see that the calculated pH-dependent free energy agrees well with the experimental data. The most important conclusion for the aims of the paper is that the calculated pH dependence profile of the free energy of folding is similar to that of the experiment. The only exception is ribonuclease A, where the calculated pH optimum is 8 while the experiment finds the best stability at pH = 6. It should be noted that the calculated results are similar to the results reported by Elcock [33] and Zhou [36] for the case of an idealized unfolded state. From the works of the above authors, as well as from the Karshikoff laboratory [34], one can see that the residual interactions in the unfolded state do not affect the pH optimum in the majority of the studied cases.
An additional possibility for comparison is offered by the mutant data. Table 2 shows the stability change of barnase caused by mutations of charged residues. The calculated numbers are the pKa shifts (with respect to the standard pKsol) of each of these ionizable residues. Thus, the energy of the mutant residue is not taken into account in the numerical calculations. Even under such a simplification, the calculated numbers are within 0.84 kcal·mol⁻¹ rmsd of the experiment.
Figure 3 compares the calculated optimum pH vs. the experimental optimum pH for the 28 proteins listed in Table 1. One can see that the calculated values are in good agreement with the experimental data. The slope of the fitting line is 0.93 and the Pearson correlation coefficient is 0.86. The rmsd between calculated and experimentally determined optimum pHs is 0.73. The optimum pH ranges from 2 to 9 (4–9 experimentally), which provides a broad range of pHs to be compared.
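The agreement statistics quoted here (rmsd and Pearson correlation coefficient) follow from paired lists of calculated and experimental optimum pHs. The sketch below gives plain-Python versions with hypothetical function names; it is not the analysis code behind Fig. 3.

```python
import math


def rmsd(calc, exp):
    """Root-mean-square deviation between paired values."""
    return math.sqrt(sum((c - e) ** 2 for c, e in zip(calc, exp)) / len(calc))


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Feeding the fourth and third columns of Table 1 to these functions would be one way to recompute the agreement measures from the tabulated values.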
The origin of the optimum pH. The position of the optimum pH depends on the amino acid composition and on the organization of the amino acids within the 3D structure. To find which of these two factors dominates, we plotted the calculated optimum pH of the free energy of folding vs. the pH at which the free energy of the unfolded state has its maximum (Fig. 4). The free energy of folding results from the difference of the free energies of the folded and unfolded states. Thus, if the last two energies have the same pH dependence, the free energy of folding will be pH independent. If the free energies of the unfolded and of the folded state have a similar shape and a maximum at the same pH, then most likely the optimum pH will also be at this pH. If the curve of the free energy of the folded state is steeper at basic pHs (or flatter at acidic pHs) compared to the free energy of the unfolded state, then the difference, i.e. the free energy of folding, will have its optimum pH shifted to the right on the pH scale. Such a phenomenon will occur if the protein stabilizes acids. Then the optimum pH will be higher than the pH of maximal free energy of the unfolded state (points above the
Table 2. Experimental and calculated effect of single mutants on the stability of barnase.
Mutant        Experiment (kcal·mol⁻¹)   Calculation (kcal·mol⁻¹)
R69S, R69M    −2.67, −2.24              −1.9
R110A         −0.45                     −2.17
Fig. 2. The calculated pH dependence of the free energy of folding (solid line) and experimental data (●). The ionic strength was selected to match experimental conditions: barnase (I = 50 mM), OMTKY3 (I = 10 mM), CI2 (I = 50 mM) and ribonuclease A (I = 30 mM).
Fig. 3. The calculated optimum pH vs the experimental optimum pH. The figure shows only 27 data points, because the calculated and experimental data for 1b4u and 1qt1 overlap.
Fig. 4. The calculated optimum pH vs the pH of maximal free energy of the unfolded state. Only 19 points can be seen in the figure because of overlap, but all 28 points are taken into account in the calculation of the correlation coefficient.
diagonal). If the protein stabilizes bases (or destabilizes acids), then the optimum pH is lower than the pH of maximal free energy of the unfolded state (points below the diagonal). The points lying on the diagonal represent cases for which the amino acid sequence dominates in determining the optimum pH. The points offset from the diagonal manifest the importance of the 3D structure. In each case where the 3D structure causes a shift of the solution pKa of ionizable groups, the stability changes [31,69]. If the protein favors the charges, then the stability increases. Of the 28 proteins studied in the paper, nine lie on the main diagonal (tolerance 0.5 pK units), while 19 are offset by more than 0.5 pK units. Thus, in 32% of the cases the amino acid composition is the dominant factor determining the optimum pH, and in 68% of the cases the 3D structure is.
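The counting behind the 32%/68% split can be sketched as below. This is only an illustration of the stated criterion (a point counts as lying on the diagonal when the optimum pH is within the tolerance of the pH of maximal free energy of the unfolded state). The names `classify_by_diagonal` and `percent` and the three example entries are hypothetical and are not data from the paper.

```python
def classify_by_diagonal(points, tolerance=0.5):
    """Split (name, pH_max_unfolded, optimum_pH) entries by their offset from the diagonal."""
    on_diagonal, above, below = [], [], []
    for name, ph_max_unfolded, ph_optimum in points:
        offset = ph_optimum - ph_max_unfolded
        if abs(offset) <= tolerance:
            on_diagonal.append(name)   # composition dominates
        elif offset > 0:
            above.append(name)         # optimum shifted to higher pH by the structure
        else:
            below.append(name)         # optimum shifted to lower pH by the structure
    return on_diagonal, above, below

def percent(count, total):
    """Rounded percentage of count in total."""
    return round(100.0 * count / total)

# Hypothetical entries for illustration only.
points = [("protein_a", 6.0, 6.2), ("protein_b", 5.0, 7.5), ("protein_c", 7.0, 4.0)]
on_diagonal, above, below = classify_by_diagonal(points)

# The counts reported in the text: 9 of 28 on the diagonal, 19 of 28 offset.
print(percent(9, 28), percent(19, 28))
```

With the counts given in the text, 9/28 and 19/28 round to 32% and 68%, respectively.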
To check for a possible correlation between the optimum pH and the pK shifts with respect to the standard pKsol, the two quantities were plotted against each other in Fig. 5. Two pK shifts were calculated: the intrinsic pK shift, which does not account for the interactions with ionizable and polar groups, and the pKa shift, which reflects the total energy change from solution to the protein for each ionizable group. In both cases a correlation with the pH optimum exists, although the correlation coefficients are low. A positive pK shift corresponds to pKs of acids and bases larger than those of model compounds, and thus to an electrostatic environment that disfavors acids and favors bases. The most acidic enzymes were found to use this strategy to lower their optimum pH (see the far right-hand side of Fig. 5). The most basic enzymes induce a slight positive shift of the intrinsic pK, but adding the pairwise interactions turns the pK shift into a negative number. The enzymes between these two extremes do not induce a large pK shift on average.
It is well known that the pH dependence of the free energy is an integral of the net charge difference between the folded and unfolded states over a particular pH interval (Equation 7) [31,55,70]. A negative net charge difference corresponds to a negative change of the free energy (the free energy becomes more favorable as pH increases). Thus, if an acid has a pKa lower than the standard pKsol, it will titrate at lower pH in the folded state than in the unfolded state. As a result, such a group will contribute a negative number to the net charge difference. Conversely, a positive net charge difference corresponds to a positive free energy change, i.e. to a less favorable free energy of folding. This corresponds to pKas higher than the standard pKsol. At the optimum pH the net charge difference should be zero. At very low and at very high pHs, the free energy of folding is unfavorable, because either bases or acids are left without the support of their counterparts. Between these two extremes, the free energy of folding must have a minimum. Going from very low pH to high pH, the first several ionization events will be the deprotonation of acids. Because these few acids are in the environment of the positive potential of the bases, they have pKas lower than in the unfolded state, and thus the net charge difference between the folded and unfolded states will be negative. Thus, the free energy of folding will decrease. If the protein does not support the acids, then the rest of the acids will have pKas higher than in the unfolded state. This results in a positive net charge difference between the folded and unfolded states and increases the free energy of folding. Thus, the optimum pH will be at low pH. Conversely, if the protein favors the acids, then most of them will have pKas lower than in the unfolded state and the net charge difference between the folded and unfolded states will be negative. Thus, the free energy of folding will keep decreasing with increasing pH. This will result in an optimum pH shifted to higher pHs.
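The argument above can be illustrated with a toy numerical model. The sketch below assumes independent titratable groups described by the Henderson-Hasselbalch equation and integrates 2.303RT times the net charge difference (folded minus unfolded) over pH. It is not the MCCE procedure used in the paper, and the function names and all pKa values are hypothetical choices made only to show the trend.

```python
RT_LN10 = 1.364  # 2.303*R*T in kcal/mol at about 298 K

def net_charge(ph, acid_pkas, base_pkas):
    """Net charge of independent groups (Henderson-Hasselbalch)."""
    negative = sum(1.0 / (1.0 + 10.0 ** (pka - ph)) for pka in acid_pkas)
    positive = sum(1.0 / (1.0 + 10.0 ** (ph - pka)) for pka in base_pkas)
    return positive - negative

def folding_free_energy_profile(folded, unfolded, ph_min=0.0, ph_max=14.0, step=0.01):
    """Trapezoid integration of 2.303RT*(Q_folded - Q_unfolded); zero at ph_min."""
    n = int(round((ph_max - ph_min) / step))
    phs = [ph_min + i * step for i in range(n + 1)]
    dq = [net_charge(p, *folded) - net_charge(p, *unfolded) for p in phs]
    dg = [0.0]
    for i in range(1, len(phs)):
        dg.append(dg[-1] + RT_LN10 * 0.5 * (dq[i] + dq[i - 1]) * step)
    return phs, dg

def optimum_ph(phs, dg):
    """pH at which the free energy of folding is minimal."""
    return phs[min(range(len(dg)), key=dg.__getitem__)]

# Hypothetical pKa values: four acids and two bases, bases unperturbed by folding.
bases = [10.4, 10.4]
unfolded = ([4.0, 4.0, 4.0, 4.0], bases)
favors_acids = ([3.0, 3.0, 3.0, 5.5], bases)     # most acid pKas lowered
disfavors_acids = ([3.0, 5.5, 5.5, 5.5], bases)  # most acid pKas raised

phs, dg_favor = folding_free_energy_profile(favors_acids, unfolded)
_, dg_disfavor = folding_free_energy_profile(disfavors_acids, unfolded)
opt_favor = optimum_ph(phs, dg_favor)
opt_disfavor = optimum_ph(phs, dg_disfavor)
print(opt_favor, opt_disfavor)
```

In this toy model the minimum of the folding free energy falls where the net charge difference changes sign, and the protein that lowers most acid pKas has its optimum at a higher pH than the one that raises them, in line with the reasoning above.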
The optimum pH is not uniquely determined by the ratio of basic to acidic groups. Figure 6A demonstrates that enzymes with quite different base/acid ratios have similar optimum pHs and that proteins with similar base/acid ratios function at completely different pHs. At the same time, a trend is clearly seen. The proteins that function at low pH have fewer bases (low base/acid ratio), while the enzymes working at high pH have more bases than acids (see also Table 2). The Pearson correlation coefficient is less than 0.4, which demonstrates that the base/acid ratio is not the most important factor in determining the optimum pH. However, restricting the count to buried amino acids only, one finds a much better correlation (Fig. 6B). This improvement suggests that the pH optimum is mostly determined by the buried charged groups, although the correlation is still weak.
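A base/acid ratio of the kind discussed above can be computed from a one-letter sequence as sketched below. The residue sets are an assumption (Lys and Arg as bases, Asp and Glu as acids; the paper may count histidine or other groups differently), and the `buried` flags stand in for whatever burial criterion is applied, which is not specified here. The function name and the example sequence are hypothetical.

```python
ACIDS = frozenset("DE")   # assumption: Asp and Glu
BASES = frozenset("KR")   # assumption: Lys and Arg (His could be added)

def base_acid_ratio(sequence, buried=None, bases=BASES, acids=ACIDS):
    """Ratio of basic to acidic residues; if `buried` is given, count only flagged positions."""
    seq = sequence.upper()
    if buried is None:
        buried = [True] * len(seq)
    if len(buried) != len(seq):
        raise ValueError("buried flags must match the sequence length")
    n_bases = sum(1 for aa, flag in zip(seq, buried) if flag and aa in bases)
    n_acids = sum(1 for aa, flag in zip(seq, buried) if flag and aa in acids)
    if n_acids == 0:
        raise ValueError("no acidic residues counted")
    return n_bases / n_acids

# Hypothetical eight-residue sequence and burial flags, for illustration only.
sequence = "MKDEEDRA"
all_ratio = base_acid_ratio(sequence)
buried_flags = [False, True, True, False, False, False, False, False]
buried_ratio = base_acid_ratio(sequence, buried=buried_flags)
print(all_ratio, buried_ratio)
```

The example shows how the ratio for all residues and the ratio for buried residues only can differ for the same sequence, which is the distinction drawn between Fig. 6A and Fig. 6B.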
The effect of the net charge on the stability of the proteins is demonstrated in Fig. 7A,B, where the optimum pH is plotted against the calculated isoelectric point (pI) and the net charge at the optimum pH. At the isoelectric point the net charge of the protein is zero, i.e. there are equal numbers of negative and positive charges. The graph shows that there is no correlation (Pearson coefficient = 0.09) between the isoelectric point and the optimum pH. At the same time, the correlation between the
Fig. 5. The experimental optimum pH vs the averaged pK shifts. (A) Averaged intrinsic pKa; (B) averaged pKa shift.
optimum pH and the net charge of the folded state is not negligible. The signal is weak, but there is a clear tendency for proteins with acidic optimum pH to be positively charged and for proteins with basic optimum pH to carry a negative net charge. There are only a few proteins that have no net charge at the optimum pH.
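The two quantities compared in this paragraph, the isoelectric point and the net charge at a given pH, can be estimated from sequence alone as sketched below. This is a simplified illustration, not the calculation of the paper: the groups are treated as independent, the pKa values are round illustrative numbers rather than the pKsol values used in the study, and termini, Tyr and Cys are omitted. The function names are hypothetical.

```python
# Illustrative model pKa values (assumptions, not the paper's pKsol set).
ACID_PKA = {"D": 4.0, "E": 4.4}
BASE_PKA = {"H": 6.5, "K": 10.4, "R": 12.0}

def sequence_net_charge(sequence, ph):
    """Net charge of independent side chains at the given pH (Henderson-Hasselbalch)."""
    charge = 0.0
    for aa in sequence.upper():
        if aa in ACID_PKA:
            charge -= 1.0 / (1.0 + 10.0 ** (ACID_PKA[aa] - ph))
        elif aa in BASE_PKA:
            charge += 1.0 / (1.0 + 10.0 ** (ph - BASE_PKA[aa]))
    return charge

def isoelectric_point(sequence, lo=0.0, hi=14.0, tol=1e-4):
    """Bisection for the pH of zero net charge; the charge decreases monotonically with pH."""
    if sequence_net_charge(sequence, lo) <= 0 or sequence_net_charge(sequence, hi) >= 0:
        raise ValueError("net charge does not change sign between lo and hi")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sequence_net_charge(sequence, mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# One Asp and one Lys: the pI is midway between the two model pKa values.
pi_dk = isoelectric_point("DK")
# Adding a second acid lowers the pI.
pi_ddk = isoelectric_point("DDK")
print(round(pi_dk, 2), round(pi_ddk, 2))
```

Evaluating `sequence_net_charge` at the optimum pH instead of at the pI gives the kind of quantity plotted in Fig. 7B, with the caveat that in the folded protein the pKa values are shifted from these model values.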
Discussion
The study has shown that the pH of maximal stability can be calculated using the 3D structure of proteins. Twenty-eight different proteins were studied, most of them with undetectable sequence and structural similarity. The optimum pH varies from very acidic to very basic. Such diversity provided a good test for the computational method (MCCE) used in the study. Relatively good agreement with the experimental data was achieved, resulting in a correlation of 0.85 and rmsd = 0.73. At the same time, as indicated in Fig. 3, there are three proteins whose calculated optimum pH is offset by about 1.5 pK units from the experimental value (see Table 1). The reason for such a discrepancy could be conformational changes that are not included in the model. In addition, all calculations were carried out at physiological salt concentration (I = 0.15 M), while the experimental conditions under which the optimum pH was measured are in many cases not available. This may or may not be a source of significant error, because although the salt concentration strongly affects the pKa values in proteins [71,72] and in model compounds [73], it may not necessarily affect the optimum pH [74]. At the same time, it is interesting to point out that the average rmsd of calculated to experimental pH optimum is 0.73, which is similar to, and slightly better than, the average rmsd of pKa calculations [25].
Two major factors determine the optimum pH: the amino acid composition and the 3D structure of the protein. The relative importance of these two factors varies among the proteins. To test our conclusions, two proteins that have different optimum pHs (acidic and basic) and are structurally superimposable are discussed below.
Figure 8A shows a structural alignment of acid α-amylase (pdb code 2aaa) and xylose isomerase (pdb code 1qt1). The first protein has an acidic optimum pH (calculated optimum pH = 4, experimental optimum pH = 4.9), while the second has a basic optimum pH (calculated and experimental optimum pH = 8). The core structures of the proteins are well aligned (rmsd = 5.0 Å and PSD = 1.47 [75]). The part of the sequence alignment generated from the structural superimposition is shown in Fig. 8B. The positions that correspond to Arg or Lys residues in the xylose isomerase sequence and are aligned to nonbasic groups in the acid α-amylase sequence are highlighted. One can see that 31 basic groups of the xylose isomerase sequence are replaced by negative, polar or neutral groups in the acid α-amylase sequence. There are only a few examples of the opposite case, which are not shown in the figure. This results in a base/acid ratio of 0.51 for acid α-amylase and 0.84 for xylose isomerase. This difference in the amino acid composition results in a different pH dependence of the free energy of the unfolded state and thus demonstrates the effect of the amino acid composition on the optimum pH. From a structural point of view it is interesting to mention that most of the
Fig. 7. The experimental optimum pH vs the calculated isoelectric point (A) and the net charge at the pH optimum (B).
Fig. 6. The experimental optimum pH vs the ratio of bases/acids. Only 27 data points can be seen, because of the overlap between 1qt1 and 1b4u. (A) All amino acids; (B) buried amino acids.
extra basic groups within the xylose isomerase structure are not within the extra loop regions, but rather within the core structure (see Fig. 8A). This confirms the observation (Fig. 6B) that buried groups affect the optimum pH and that an enzyme with an acidic optimum pH has a low base/acid ratio. It remains to be shown that this is a general behavior of all enzymes operating at low pH.
The three-dimensional structure of the protein plays an even more significant role than the sequence composition in determining the optimum pH (68% of the cases in this work). The ability of
Fig. 8. Alignment of acid α-amylase (2aaa.pdb) and xylose isomerase (1qt1.pdb). Structural and sequence alignments were carried out with GRASP 2 [79]. (A) Structural alignment in ribbon representation: the acid amylase backbone is shown in green and xylose isomerase in blue. The red patches show the positions where Arg/Lys in xylose isomerase are replaced by negative, polar or neutral groups in acid amylase (see Fig. 8B). (B) Sequence alignment from the structural superimposition: highlighted are the positions at which Arg/Lys in the xylose isomerase sequence are aligned to acidic, polar or neutral groups in the acid α-amylase sequence.