
HPLC for Pharmaceutical Scientists 2007 (Part 19)


DOCUMENT INFORMATION

Basic information

Title: LC/MS Analysis of Proteins and Peptides in Drug Discovery
Authors: Guodong Chen, Yan-Hui Liu, Birendra N. Pramanik
Publisher: John Wiley & Sons, Inc.
Field: Pharmaceutical Sciences
Year of publication: 2007
Pages: 63
File size: 1.75 MB


HPLC for Pharmaceutical Scientists, Edited by Yuri Kazakevich and Rosario LoBrutto. Copyright © 2007 by John Wiley & Sons, Inc.

The modern drug discovery process, in general, involves the identification of a biochemical target (usually a protein target), screening of synthetic compounds or compound libraries from combinatorial chemistry/natural sources for a lead compound, and optimization of the lead compound (activity, selectivity, pharmacokinetics, etc.) for recommending a potential clinical candidate. The ultimate goal is to develop highly potent compounds (small molecules) that bind noncovalently with target proteins and produce the desired therapeutic response with minimal side effects [1].

In addition, the discovery of the structure of DNA by Francis Crick and James Watson laid a foundation for the $30 billion-a-year biotechnology industry that has produced some 160 drugs and vaccines, treating everything from breast cancer to diabetes. Recent advances in recombinant DNA technology have provided means to produce and develop protein products as novel drugs, vaccines, and diagnostic agents. For example, INTRON A (interferon α-2b) is one of the first recombinant protein drugs introduced on the market. This synthetic E. coli recombinant DNA-derived protein functions like the natural interferon produced by the human body as part of the immune system in response to the presence of enemy cells. It not only interferes with foreign invaders that may cause infections, but also prevents the growth and spread of other diseased cells in the body. This protein drug is effective in treating hepatitis C virus and a variety of tumors. ENBREL (etanercept) is another protein drug used for treatment of rheumatoid arthritis. It is produced from a Chinese hamster ovary mammalian cell expression system. This protein drug is a dimeric fusion protein consisting of the extracellular ligand-binding portion of the human 75-kilodalton (kDa) tumor necrosis factor (TNF) receptor. TNF is one of the chemical messengers that are involved in the inflammatory process. Too much TNF produced in the human body overwhelms the human immune system's ability to control inflammation in the joints. ENBREL binds to and inactivates some TNF molecules before they can trigger inflammation, thus reducing inflammatory symptoms [2, 3].

One of the difficulties encountered in producing large quantities of biologically active proteins is the elimination of microheterogeneity related to these proteins. The therapeutic proteins and the drug target proteins are usually associated with post-translational modifications, such as phosphorylation [4], glycosylation [5], aggregation, and disulfide bond formation [6], all of which contribute to the heterogeneity of the proteins. These post-translational modifications control many biological activities/processes. Therefore, characterization of proteins with respect to assessment of purity and structure is an integral part of the overall efforts toward drug development, including submission of the analytical data to the regulatory agencies. Furthermore, progress in genomics and proteomics research has generated new proteins that require rapid characterization by analytical methods [7].

19.2 GENERAL STRATEGIES FOR ANALYSIS OF PROTEINS/PEPTIDES

The analytical strategies for protein characterization rely heavily on high-performance liquid chromatography (HPLC) and/or electrophoretic separation of proteins/peptides, followed by other detection methods [e.g., mass spectrometry (MS)].

19.2.1 HPLC Methods in Proteins/Peptides

Achieving good separation of proteins/peptides is always one of the main challenges in chromatographic separations. Proteins are highly complex molecules with an enormous amount of structural diversity, including hydrophobic/hydrophilic and anionic/cationic interactions. The differences in physical, chemical, and functional properties of proteins/peptides provide the molecular basis for their separations. There are five basic chromatographic separation methods: size-exclusion chromatography, ion-exchange chromatography, reversed-phase chromatography, hydrophobic interaction chromatography (HIC), and affinity chromatography (detailed discussions on the first three techniques are provided in Part I of this book) [8, 9].


Size-exclusion chromatography (often referred to as gel filtration or gel permeation chromatography) is a chromatographic process involving separation of proteins on the basis of their differential apparent molecular sizes [10]. The column packing materials usually consist of particles with well-controlled pore size. When mobile-phase liquid flows through these particles, proteins (solutes) of different sizes can get into and out of the pores with different accessibility. For a specific size-exclusion column with a specific pore size, proteins with molecular weights above the exclusion limit (in daltons) of the column are too large to enter the pores; they are excluded from the pores and elute in the void volume. Proteins with molecular weights less than the exclusion limit have differing access to the pores of the particles and elute after the void volume, depending on their size and shape. In theory, there is a linear relationship between the logarithm of protein molecular size (molecular weight) and the elution volume of the protein. A calibration curve based on this linear relationship can be used to determine the molecular weight of proteins, assuming that the protein is globular and symmetrical in shape, and that there is no other interaction between the protein and column. In practice, denaturants (e.g., 0.1% SDS) are sometimes used in the mobile phase to disrupt possible formation of undesired protein aggregates in solution and promote uniformity in conformations of proteins. Thus, the separation can be performed in near-ideal situations to obtain more accurate molecular weight determination of proteins using this approach.
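The calibration described above can be sketched in a few lines. This is a minimal illustration, assuming a plain least-squares fit of log10(MW) against elution volume; the standards listed are hypothetical values chosen only to show the shape of the data, not measurements from the chapter.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def sec_calibration(standards):
    """Fit log10(MW) versus elution volume.

    standards: list of (molecular_weight_da, elution_volume_ml) pairs.
    """
    volumes = [volume for _, volume in standards]
    log_mw = [math.log10(mw) for mw, _ in standards]
    return fit_line(volumes, log_mw)

def estimate_mw(elution_volume_ml, slope, intercept):
    """Read a molecular weight (Da) off the calibration line."""
    return 10 ** (slope * elution_volume_ml + intercept)

# Hypothetical calibration standards (MW in Da, elution volume in mL).
STANDARDS = [(670_000, 8.0), (158_000, 10.0), (44_000, 12.0),
             (17_000, 13.5), (1_350, 17.0)]

slope, intercept = sec_calibration(STANDARDS)
print(f"estimated MW at 11.0 mL: {estimate_mw(11.0, slope, intercept):.0f} Da")
```

The estimate is only as good as the assumptions stated in the text: a globular protein and no secondary interaction with the packing material.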

Several parameters should be given special consideration in method development of size-exclusion chromatography. Although its nature of separation requires no interactions between the proteins and stationary phase, the column packing material often exhibits anionic and hydrophobic character. The addition of salts to the mobile phase can suppress these column effects. However, a higher concentration of salts (>0.5 M) might promote hydrophobic interactions between proteins and the column, so the amount of salt added to the mobile phase should be carefully adjusted. Another factor is pH. The formation of silanolate anions on the column can be minimized by carrying out experiments at pH values less than 7. Typical experimental conditions include mobile phases with low ionic strength buffers (<0.1 M) in near-physiological pH ranges, for example, 50 mM phosphate buffer with 100 mM KCl (pH 6.8). Flow rates can vary from 0.5 mL/min to 1.0 mL/min, although better resolution can be achieved with slower flow rates. The sample injection volume and analyte concentration are also critical for optimum performance. The loading capacity is very low for size-exclusion chromatography. Generally, the sample injection volume should not exceed 5% of the column bed volume in order to maintain good resolution. Protein samples should be concentrated, without causing precipitation, prior to analysis. Once an appropriate method is developed, size-exclusion chromatography can be an excellent method for separation of protein complexes. It is also suitable for buffer exchange as a desalting procedure in protein purifications (salts can be easily separated from proteins by size-exclusion chromatography) and for estimation of the molecular weight of proteins. A key advantage of this technique is that the biological activity of proteins is maintained during the separation.
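The 5% loading guideline is easy to turn into a number for a given column. The sketch below assumes the bed volume can be approximated by the geometric volume of the empty column cylinder; the 7.8 × 300 mm dimensions are a hypothetical example, not a column named in the text.

```python
import math

def bed_volume_ml(inner_diameter_mm, length_mm):
    """Geometric volume of the column cylinder in mL (1 cm^3 = 1 mL)."""
    radius_cm = inner_diameter_mm / 20.0   # mm diameter -> cm radius
    length_cm = length_mm / 10.0
    return math.pi * radius_cm ** 2 * length_cm

def max_injection_volume_ml(inner_diameter_mm, length_mm, fraction=0.05):
    """Largest injection volume under the 5%-of-bed-volume guideline."""
    return fraction * bed_volume_ml(inner_diameter_mm, length_mm)

# Hypothetical analytical SEC column: 7.8 mm i.d. x 300 mm.
print(f"bed volume:    {bed_volume_ml(7.8, 300):.2f} mL")
print(f"max injection: {max_injection_volume_ml(7.8, 300) * 1000:.0f} uL")
```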

Ion-exchange chromatography relies on reversible, electrostatic (or ionic) interactions between charged proteins/peptides in the mobile phase and charged ion-exchange groups on the stationary phase [11]. Proteins/peptides normally possess either net positive or negative charges depending on pH. They are positively charged at pH values below their pI (isoelectric point) and negatively charged at pH values above their pI. Acidic proteins and peptides (pI < 6) are normally separated using anion-exchange columns because they are negatively charged. Basic proteins and peptides (pI > 8) are usually chromatographed on cation-exchange columns because they are positively charged. The choice of pH is important for optimum separation results. The pH of the mobile phase is typically set at least one pH unit away from the pKa of the ion-exchange resin in order to keep roughly 90% or more of the full charge on the column. For anion-exchange columns, the pH is chosen to be lower than the pKa. For cation exchangers, the pH is set to be higher than the pKa. Other key parameters include the ionic strength of the mobile phase. The salts used in the buffer solution are the counterions that might bind to the ion-exchange column in competition with proteins/peptides. Thus, if a protein/peptide is strongly bound to the ion-exchange column, a stronger counterion can be used to improve the elution. Some common counterions, in order of relative strength, are Cs⁺ > K⁺ > NH₄⁺ > Na⁺ and PO₄³⁻ > CN⁻ > HCOO⁻ > CH₃COO⁻. A distinctive feature of ion-exchange chromatography is that the biological activity of proteins is almost always preserved, and this separation method can also be used to concentrate dilute protein samples.
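The two pH rules above (the sign of the protein charge relative to its pI, and the fraction of charged groups on the resin) can be sketched as follows. This is an illustrative calculation assuming simple Henderson-Hasselbalch behaviour for a single ionizable group; the function names and example pKa values are hypothetical.

```python
def anion_exchanger_charge_fraction(ph, pka):
    """Fraction of basic resin groups that are protonated (positively charged)."""
    return 1.0 / (1.0 + 10 ** (ph - pka))

def cation_exchanger_charge_fraction(ph, pka):
    """Fraction of acidic resin groups that are deprotonated (negatively charged)."""
    return 1.0 / (1.0 + 10 ** (pka - ph))

def suggest_exchanger(protein_pi, ph):
    """Pick the exchanger type from the sign of the protein's net charge."""
    if ph > protein_pi:
        return "anion-exchange"    # protein carries a net negative charge
    if ph < protein_pi:
        return "cation-exchange"   # protein carries a net positive charge
    return "none"                  # no net charge at pH = pI

# One pH unit below the pKa of a hypothetical anion-exchange group (pKa 9.0):
print(f"{anion_exchanger_charge_fraction(8.0, 9.0):.3f}")
print(suggest_exchanger(protein_pi=5.0, ph=7.0))
```

One pH unit away from the pKa corresponds to about 91% of the groups being charged, which is where the "90% of the full charge" rule of thumb comes from.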

More recently, another related technique, chromatofocusing, has emerged as a chromatographic technique complementary to electrophoretic methods for pI determination. Chromatofocusing is an ion-exchange technique in which a pH gradient is established across the column, allowing for the eventual separation of amphoteric substances (e.g., proteins) based on their pI. The main advantages of chromatofocusing are the high loadability of the column, high resolving power that allows separation of two proteins (e.g., a protein and a degradation product variant) differing by less than 0.05 pI units, and high efficiency due to both the gradient elution mode and the special focusing effect of the polyampholytes. Furthermore, peptides and proteins are less likely to precipitate in chromatofocusing than in isoelectric focusing.

Reversed-phase (RP) chromatography is a hydrophobic separation technique based on the interaction between the nonpolar regions of proteins/peptides and the stationary phase [12]. It typically utilizes volatile organic solvents (acetonitrile, etc.) as mobile phases under acidic pH conditions. It provides high speed and high efficiency and is compatible with MS detection. This technique is the most widely used HPLC method in the separation of peptides and proteins.

There are a number of factors to be considered in method development of RPLC for separation of proteins and peptides. Appropriate pore size is one of the primary considerations in selecting a column. For proteins greater than 10 kDa, a large pore size (300 Å) is necessary to reduce restricted access of the protein to the stationary phase and avoid poor recoveries and decreased efficiencies. Polypeptides (<10 kDa) can be effectively separated using a column with a small pore size (<150 Å). The hydrophobicity of the protein is also important when choosing a column. In general, a C18 column is used for hydrophilic proteins/small peptides, and a C4 or C5 bonded phase is used for hydrophobic proteins/large polypeptides. The use of a C4/C5 column for hydrophobic proteins may reduce undesired protein adsorption on the column, because a more retentive C18 column can lead to irreversible binding of hydrophobic proteins to the column. The most commonly used mobile phase in RPLC involves acetonitrile solution with 0.1% trifluoroacetic acid (TFA). In addition, alcohols such as isopropanol are sometimes used for large and more hydrophobic proteins to enhance the elution and improve recovery. Note that all mobile phase reagents should be of the highest quality to avoid the appearance of ghost peaks from solvent impurities. Ion-pairing reagents are often used to optimize resolution and retention. For example, hydrophobic, anionic ion-pairing reagents (e.g., TFA and pentafluoropropionic acid) can complex with positively charged basic residues and influence the chromatography. On the other hand, hydrophobic, cationic ion-pairing reagents (e.g., triethylamine acetate) interact with negatively charged groups (e.g., carboxylic acid, free carboxyl terminus at pH > pKa) and affect their retention. Thus, manipulation of ion-pairing reagent and pH value provides alternative approaches in optimizing RPLC. Variation of flow rate and gradient rate can have an impact on the chromatography as well. An increase in flow rate or a decrease in gradient rate improves resolution, although it may result in a loss of sensitivity. Typically, a shallower gradient is employed to maintain good resolution, for example, 0.25% to 4% per minute. Column temperature also affects the separation. Higher column temperature usually improves column efficiency, peak shape, and resolution. However, it may lead to the loss of biological activity of the protein.

Hydrophobic interaction chromatography involves weak interactions of hydrophobic patches on the surface of the intact protein and nonpolar groups on the stationary phase [13]. This technique uses aqueous mobile phases of high ionic strength and neutral pH. It does not denature or unfold proteins and can be used to detect protein conformational changes. Key factors affecting protein separations include column, salt, mobile-phase pH, and temperature. Most columns used in HIC are made of silica-based stationary phases with modified aryl groups, diol derivatives, and short alkyl chains. The overall hydrophobicity of the stationary phase is determined by both the nonpolar character of the bonded ligands and their density. Strong column-solute interactions should be avoided to reduce denaturation. The type and concentration of salt are critical in HIC. One of the considerations in choosing a salt is its surface tension. Salts with higher surface tension values may lead to an increase in solute retention. The amount of protein bound to the column also increases with increasing salt concentration. More hydrophobic proteins should be separated using salts with higher surface tensions. Commonly used salts, in order of relative surface tension, are KCl < NaCl < Na2HPO4 < (NH4)2SO4 < Na3PO4, with typical concentrations ranging from 1 M to 3 M in order to maximize selectivity or column capacity. The pH value in HIC is usually maintained in the neutral range (pH 5–8). The appropriate pH for optimizing resolution/selectivity in HIC can only be determined empirically, since proteins differ significantly in their susceptibility to denaturation with changing pH. Another important parameter in developing an HIC method is temperature. In general, proteins tend to be more stable at lower temperatures. To maintain the conformations of proteins, the lowest temperature sufficient for separation should be used in the HIC technique.

As an illustration of the HIC technique, recombinant human growth hormone (hGH) and methionyl hGH (met-hGH) were well separated by HIC [14]. The optimized conditions were found to be 1 M ammonium phosphate dibasic, pH 8.0/propanol (99.5 : 0.5) and 0.1 M sodium phosphate dibasic, pH 8.0/propanol (97.5 : 2.5) for mobile phases A and B, respectively, with a descending gradient from 100% A to 100% B in 30 minutes at a column (TSK-phenyl 5PW, 75 × 7.5 mm) temperature of 30°C. Note that the addition of a small amount of propanol as an organic modifier significantly decreases elution time while maintaining resolution and efficiency. This HIC method allowed separation of several hGH variants from the main hGH peak while retaining their native structures.
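To make the descending salt gradient concrete, the sketch below interpolates the nominal mobile-phase composition during the 30-minute run from 100% A to 100% B. It assumes ideal linear mixing with no dwell volume, and it treats the two phosphate salts simply as a combined phosphate molarity; both simplifications are assumptions of this illustration, not statements from the cited method.

```python
def linear_gradient(t_min, duration_min, start, end):
    """Linear interpolation between start and end, clamped to the run time."""
    fraction = min(max(t_min / duration_min, 0.0), 1.0)
    return start + fraction * (end - start)

def hic_composition(t_min, duration_min=30.0):
    """Nominal composition at time t for the A -> B gradient in the text.

    A: 1 M phosphate salt with 0.5% propanol; B: 0.1 M phosphate salt
    with 2.5% propanol (values taken from the 99.5:0.5 and 97.5:2.5 ratios).
    """
    return {
        "percent_b": linear_gradient(t_min, duration_min, 0.0, 100.0),
        "phosphate_molar": linear_gradient(t_min, duration_min, 1.0, 0.1),
        "propanol_percent": linear_gradient(t_min, duration_min, 0.5, 2.5),
    }

for t in (0, 15, 30):
    print(t, hic_composition(t))
```

Halfway through the run the nominal salt concentration has fallen to about 0.55 M while the propanol content has risen to about 1.5%, which is the combination that drives elution of the more hydrophobic variants.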

Affinity chromatography is based on reversible, specific binding of one biomolecule to another [15]. The analyte to be purified is specifically and reversibly adsorbed to a ligand (binding substance) that is immobilized by a covalent bond to a chromatographic bed material (matrix). The choice of ligand is a critical factor in affinity chromatography, because it determines the interaction mode between the solute and the ligand. There are two types of ligands: specific ones and multifunctional ones. Specific ligands include potent binders of single classes of peptides or proteins, such as enzyme substrates/inhibitors and antigens/antibodies. Examples of multifunctional ligands include (a) concanavalin A, which binds to some specific carbohydrate residues, and (b) nucleotides, which bind to enzymes. The chromatography begins with sample loading, in which samples are applied under conditions favorable for their specific binding to the ligand. Analytes of interest are consequently bound to the ligand while unbound substances are washed away. Recovery of the molecules of interest can be achieved by changing experimental conditions to favor desorption (elution). Elution techniques include changes in mobile-phase composition (e.g., ionic strength, pH) and disruption of the ligand/solute complex using competitive ligands in the mobile phase. The separation of analytes depends on their native conformations (for proteins) and relative binding affinities for the immobilized ligand on the column. The affinity interactions can be extremely specific (e.g., an antibody binding to its antigen). This technique is a powerful tool in investigating protein–protein, protein–peptide, and drug–protein interactions. Its applications in inhibitor screening using affinity chromatography–MS methods in drug discovery will be discussed later in this chapter.

19.2.2 MS Methods for Protein Characterization

MS is another powerful analytical technique for protein characterization. This technique measures mass-to-charge ratios of ions in the gas phase, providing both molecular weight (MW) information and structural information [16]. The introduction of electrospray ionization (ESI) [17, 18] and matrix-assisted laser desorption/ionization (MALDI) [19] or soft ionization [20] has revolutionized applications of MS in protein characterization, making it quite straightforward to analyze proteins with molecular weights of over 1 million daltons (Da). ESI forms multiply charged ions for proteins/peptides by spraying the sample solution through a nozzle under a strong electrical field. The molecular weight of a protein can be calculated from a group of [M + nH]ⁿ⁺ ions in the ESI spectrum with better precision. Also, multiply charged ions appear at m/z values that are only fractions of the actual molecular weight of the analyte. This allows one to observe high-molecular-weight proteins beyond the normal mass range of a mass spectrometer. In addition, ESI operates at atmospheric pressure, which allows direct on-line analysis by interfacing HPLC with MS. The MALDI technique has high ionization efficiencies for proteins and can achieve a mass range of over 500 kDa when coupled with a time-of-flight (TOF) mass analyzer. In this technique, proteins are mixed with an IR- or UV-absorbing matrix in large excess, and the mixed sample is deposited on a sample target, dried, and inserted into the mass spectrometer for laser irradiation. In contrast to the multiply charged ions in ESI, singly charged ions are the most abundant species in the MALDI-MS spectrum. High sensitivity (low femtomole) can be achieved with MALDI-MS analysis.

The very first step in protein characterization is the molecular weight determination. With multiply charged ions formed in ESI, a deconvoluted mass spectrum can be generated to give an average molecular weight of the protein by calculating from successive multiply charged ions. For example, Figure 19-1 shows an ESI mass spectrum of a recombinant interferon α-2b (antiviral protein drug) with a charge distribution of +9 to +13. The deconvoluted spectrum (Figure 19-1, insert) gives a molecular weight of 19,266.3 Da for this protein. The mass measurement precision and accuracy are enhanced by the use of all the observed multiply charged ions (typically better than 0.01% for masses up to 100 kDa) [21]. The MALDI-MS technique can also be employed to analyze intact proteins with high tolerance of impurities (salts, etc.). Figure 19-2 illustrates a MALDI-TOF mass spectrum of 1 pmol of anti-IL-5 MAB protein with an average molecular weight of 146.5 kDa [1]. The singly charged molecular ion [M + H]⁺ is observed at m/z 146,485, along with a doubly charged molecular ion.
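The calculation of a molecular weight "from successive multiply charged ions" can be illustrated with a short sketch. It assumes an idealized spectrum in which every peak is an [M + nH]ⁿ⁺ ion and neighbouring peaks differ by exactly one proton; the peak list is synthetic, generated from the 19,266.3 Da value quoted above rather than read from Figure 19-1, and real deconvolution software is considerably more elaborate.

```python
PROTON = 1.007276  # mass of a proton in Da

def mz_of(mass, charge):
    """m/z of an [M + nH]n+ ion for neutral mass M and charge n."""
    return (mass + charge * PROTON) / charge

def charge_of(mz_high, mz_low):
    """Charge of the peak at mz_high, given that the adjacent peak at
    mz_low carries exactly one more proton."""
    return round((mz_low - PROTON) / (mz_high - mz_low))

def deconvolute(peaks):
    """Average neutral mass from a series of adjacent charge-state peaks."""
    ordered = sorted(peaks, reverse=True)  # highest m/z = lowest charge
    if len(ordered) < 2:
        raise ValueError("need at least two adjacent charge states")
    lowest_charge = charge_of(ordered[0], ordered[1])
    masses = [(lowest_charge + i) * (mz - PROTON)
              for i, mz in enumerate(ordered)]
    return sum(masses) / len(masses)

# Synthetic +9 to +13 envelope for a 19,266.3 Da protein.
peaks = [mz_of(19266.3, z) for z in range(9, 14)]
print([round(p, 2) for p in peaks])
print(round(deconvolute(peaks), 1))
```

Averaging over every charge state is what gives ESI its mass precision: each peak is an independent estimate of the same neutral mass.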

Figure 19-1 Positive ion ESI mass spectrum of rh-IFN-α-2b. The insert shows a deconvoluted spectrum.

Figure 19-2 MALDI-TOF mass spectrum of 1 pmol of anti-IL-5 MAB protein. (Reprinted from reference 1, with permission of the Thomson Corporation.)

The protein identification or sequence determination of a protein can be achieved using two different approaches: "top-down" [22, 23] and "bottom-up" [24]. A top-down experiment involves high-resolution measurement of an intact molecular weight and direct fragmentation of protein ions by tandem mass spectrometry (MS/MS) [25]. This approach surveys an entire protein sequence with 100% coverage. Post-translational modifications such as glycosylation and phosphorylation tend to remain intact during MS/MS fragmentation at the protein level. The fragment ions obtained allow protein identification by database retrieval, quick positioning of the N- and C-termini, confirmation of large sections of sequence, and partial or exact localization of modifications. This is a preferred method for protein identifications. However, there are some obstacles that need to be overcome before this approach can be widely accepted as a standard in protein identifications. These challenges include accessibility of expensive MS instrumentation for accurate mass measurements of large proteins, development of suitable MS instrumentation for efficient, automated MS/MS data acquisition, and appropriate database search algorithms.

In contrast to the top-down methodology, the bottom-up experiment refers to the process in which proteins are digested into smaller peptides by enzymatic cleavage without measuring the accurate mass of the intact protein. These enzymatically digested peptides (tryptic peptides, etc.) can often be unique in terms of their mass, amino acid composition/sequence, and separation characteristics. They can be separated/detected and either (a) directly searched against a genome or protein database for protein identification (peptide mass mapping) or (b) further dissociated in a tandem mass spectrometric experiment to generate fragment ions for database search (sequence tagging) [26, 27]. The principal fragment ions in polypeptide ions are b ions (N-terminus) and y ions (C-terminus), resulting from cleavages of amide bonds under collision-induced dissociation [28]. These are amino acid-specific fragment ions and can be used to derive sequences of polypeptides. Further database search based on the MS/MS information can lead to identification of proteins. The general sequence coverage from this approach (5–70%) is far less than the 100% obtained from the top-down approach. Post-translational modifications are likely to be lost during MS/MS fragmentation at the peptide level. In spite of these limitations, the bottom-up approach has become a current standard method in protein identifications because of its high-throughput format and well-refined methodology (for example, mature instrumentation and excellent software development) [29]. Some specific examples using this approach will be described in the following sections.
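How b and y ions encode a sequence can be shown with a small calculation. The sketch below computes singly charged b- and y-ion m/z values from standard monoisotopic residue masses; the peptide LLNLSR is used purely as a short illustrative input, and the function names are hypothetical.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

def peptide_mass(peptide):
    """Monoisotopic mass of the neutral peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER

def fragment_ions(peptide):
    """Singly charged b and y ion m/z values (b1..b(n-1), y1..y(n-1))."""
    masses = [MONO[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)           # N-terminal piece
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)  # C-terminal piece
    return b_ions, y_ions

b_ions, y_ions = fragment_ions("LLNLSR")
for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"b{i} = {b:9.4f}   y{i} = {y:9.4f}")
```

The spacing between consecutive ions of one series equals a residue mass, which is why a ladder of b or y ions can be read as a sequence. Leucine and isoleucine share the same mass and cannot be told apart this way.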

DRUG TARGETS

19.3.1 Biotechnology Products Development

The production of biologically important proteins by recombinant DNA techniques and development of modified counterparts is a very challenging field. Certain criteria of safety, quality, and efficacy are required for the development and approval of these protein products as therapeutic agents. The presence of structural variations during the different steps in the protein production process could affect the protein's biological properties and alter the safety, potency, and stability of the protein product. The development of sensitive analytical techniques for the analysis of therapeutic proteins is essential for the quality control and structural characterization of recombinant protein products. Two examples are illustrated below: recombinant human granulocyte-macrophage colony stimulating factor (rh-GM-CSF) and interferon alpha-2b (rh-IFN-α-2b).

... glycoproteins that regulate the differentiation, activation, and proliferation of multiple blood-cell types from progenitor stem cells. This particular glycoprotein is essential for the proliferation and differentiation of progenitor cells into mature granulocytes and macrophages [30]. It enhances the production and function of white blood cells, with potential clinical applications in follow-up treatment for patients who have gone through chemotherapy or radiation therapy for tumors, as well as bone marrow transplantation. GM-CSF has been cloned and expressed in various cell lines that include yeast, Chinese hamster ovary, and E. coli. The E. coli derived GM-CSF used in this study contains 127 amino acids and has a molecular weight of ∼14,477.6 Da.

One of the first measurements performed to characterize a protein is determination of the molecular weight. It is an important physical parameter that can be used to confirm the primary structure and identity of the protein, characterize post-translational modifications, and determine batch-to-batch reproducibility in the production of recombinant proteins. The mature protein sequence for human GM-CSF with four cysteine residues is shown in Table 19-1 [31]. Figure 19-3A displays the ESI-MS spectrum of rh-GM-CSF, containing a series of multiply charged ions ranging from the 7+ to the 16+ charge state that correspond to molecular ions of the protein. The measured average molecular weight (14,472 Da, as shown in the insert) suggests the presence of two disulfide bonds in the rh-GM-CSF, because the calculated average molecular weight of rh-GM-CSF derived from the sequence is 14,477.6 Da (without accounting for existing disulfide bonds). This was further supported by ESI-MS analysis of rh-GM-CSF after reduction with β-mercaptoethanol, as shown in Figure 19-3B. The 4-Da mass shift of the measured molecular weight of reduced rh-GM-CSF (14,476 Da) from nonreduced rh-GM-CSF confirms the presence of two disulfide bonds in the protein molecule. In addition, the charge state distribution is shifted to higher charge states (17+, 18+, 19+, 20+) for the reduced form, indicating a more open protein structure that is accessible to protonation upon disulfide-bond reduction. Furthermore, the molecular weight information obtained from the ESI-MS spectrum has a high accuracy of mass measurement (generally better than 0.01%).

TABLE 19-1 Amino Acid Sequence of rh-GM-CSF from E. coli

APARSPSPSTQPWEHVNAIQEARRLLNLSRDTAAEMNETVEVI
SEMFDLQEPTC54LQTRLELYKQGLRGSLTKLKGPLTMMASHYK
QHC88PPTPETSC96ATQIITFESFKENLKDFLLVIPFDC121WEPVQE

T1–T12 and V1–V13 mark the expected tryptic and V8 protease peptides, respectively.
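The disulfide count follows from simple arithmetic: each disulfide bond removes two hydrogen atoms, so reduction adds about 2.016 Da per bond. The sketch below applies that to the masses quoted in the text; it assumes the measured values are average masses and that reduction is the only change between the two samples.

```python
HYDROGEN_AVG = 1.00794  # average atomic mass of hydrogen in Da

def disulfide_count(reduced_mass, nonreduced_mass):
    """Number of disulfide bonds implied by the mass gain on reduction."""
    return round((reduced_mass - nonreduced_mass) / (2 * HYDROGEN_AVG))

def expected_oxidized_mass(sequence_mass, n_disulfides):
    """Mass expected when n disulfide bonds are formed (2 H lost per bond)."""
    return sequence_mass - n_disulfides * 2 * HYDROGEN_AVG

# Values quoted in the text for rh-GM-CSF.
print(disulfide_count(reduced_mass=14476.0, nonreduced_mass=14472.0))
print(round(expected_oxidized_mass(14477.6, 2), 1))
```

With four cysteine residues in the sequence, two is also the maximum number of disulfide bonds the protein can form, so the mass shift accounts for every cysteine.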

Figure 19-3 Positive ion ESI mass spectra of rh-GM-CSF (A) in 1% HCOOH and (B) after treatment with β-mercaptoethanol. The deconvoluted spectra are shown in the inserts. (Reprinted from reference 31, with permission of the Protein Society.)

The primary structural information of the protein can be obtained by enzymatic cleavage of the protein into smaller peptide fragments, followed by MS determination of the molecular weights of the resulting peptide mixture (peptide mass mapping). In this case, peptide mass mapping involved enzymatic digestion of the rh-GM-CSF with either trypsin or Staphylococcus aureus V8 protease, followed by MS analysis of the digestion mixtures. Trypsin selectively cleaves rh-GM-CSF at the C-terminal side of arginine (R) and lysine (K), while V8 protease specifically cleaves the peptide bond on the C-terminal side of glutamic acid (E) residues. It is important to note that an enzymatic digest of a large protein can yield fragments of incomplete digestion. For example, trypsin does not cleave at a lysine-proline (K-P) bond, and R-P bonds are marginally more susceptible. Also, peptide fragments that contained two contiguous basic sites (K-K, K-R, R-R, etc.) are observed with R or K on the N-terminus. This results from the poor exoprotease activity of trypsin. Similarly, V8 protease can produce incomplete digestion products; Asp (D) is occasionally cleaved. The expected peptide fragments from enzymatic cleavages of rh-GM-CSF with trypsin or V8 are shown in Table 19-1.

TABLE 19-2 Tryptic Digest of rh-GM-CSF (V0) and Its Variants (V1 and V2)

For the tryptic digest of unmodified rh-GM-CSF (V0), the mass values of the majority of the observed signals could be matched with the molecular ions of the tryptic peptides predicted from the amino acid sequence (Table 19-2), with the exception of the cysteine-containing fragments T4 (DTAAEMNETVEVISEMFDLQEPTC54LQTR), T10 (QHC88PPTPETSC96ATQIITFESFK), and T12 (DFLLVIPFDC121WEPVQE). These peptide fragments (T4, T10, T12) are interconnected by disulfide bonds with an isotopically averaged mass of 7614.6 Da, as illustrated in Figure 19-4. This disulfide-linked core peptide was detected at m/z 7613.3 by Cs⁺ liquid secondary-ion MS, indicating the presence of this core peptide and two disulfide bonds in rh-GM-CSF. Furthermore, these peptide fragments were released after treatment of the tryptic digests with dithiothreitol (reducing reagent), and subsequent MS analysis of the mixture yielded signals at m/z 3202.3, 2466.8, and 1951.8 corresponding to their free sulfhydryl forms as T4, T10, and T12, respectively, thus confirming the presence of two disulfide bonds in rh-GM-CSF. The assignment of the cysteine-containing peptides was also confirmed by MS analysis of a tryptic digest of rh-GM-CSF in which the cysteine residues were S-alkylated with 4-vinylpyridine in the presence of butylphosphine [32]. The resulting pyridylethyl cysteine tryptic peptides were observed as strong ions with masses 106 Da higher than the unmodified peptides (data not shown).
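The peptide mass mapping logic above can be reproduced in silico. The sketch below is an illustration under stated assumptions: the sequence string is pieced together from Table 19-1 and the peptide sequences quoted in the text, the cleavage rules are the idealized ones (trypsin after K/R but not before P, V8 protease after E, no missed cleavages), and masses are computed from standard average residue masses.

```python
# Sequence assumed from Table 19-1 and the peptides quoted in the text.
SEQ = ("APARSPSPSTQPWEHVNAIQEARRLLNLSRDTAAEMNETVEVI"
       "SEMFDLQEPTCLQTRLELYKQGLRGSLTKLKGPLTMMASHYK"
       "QHCPPTPETSCATQIITFESFKENLKDFLLVIPFDCWEPVQE")

# Standard average residue masses (Da).
AVG = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_AVG = 18.0153
HYDROGEN_AVG = 1.00794
PROTON = 1.00728

def digest(seq, cleave_after, no_cut_before=""):
    """Cut after each listed residue unless the next residue blocks cleavage."""
    peptides, start = [], 0
    for i in range(len(seq) - 1):
        if seq[i] in cleave_after and seq[i + 1] not in no_cut_before:
            peptides.append(seq[start:i + 1])
            start = i + 1
    peptides.append(seq[start:])
    return peptides

def average_mass(peptide):
    """Average mass of a neutral, fully reduced peptide."""
    return sum(AVG[aa] for aa in peptide) + WATER_AVG

tryptic = digest(SEQ, "KR", no_cut_before="P")
v8 = digest(SEQ, "E")

# Disulfide-linked tryptic core: all cysteine-containing peptides,
# minus 2 H per disulfide bond (two bonds), plus one proton for [M + H]+.
core = [p for p in tryptic if "C" in p]
core_mh = sum(average_mass(p) for p in core) - 4 * HYDROGEN_AVG + PROTON

print(f"intact protein: {average_mass(SEQ):.1f} Da")
print(f"tryptic peptides: {len(tryptic)}, V8 peptides: {len(v8)}")
print(f"core [M + H]+: {core_mh:.1f}")
```

Under these assumptions the calculated intact mass and the core peptide value land close to the 14,477.6 Da and 7614.6 Da figures quoted in the text. Note that the strict tryptic rule also yields a lone R from the R-R site, which a real digest may instead leave attached to a neighbouring peptide, as the text describes.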

Although tryptic peptide mass mapping of rh-GM-CSF demonstrated the presence of two disulfide bonds and suggested two possible combinations of disulfide pairing (i.e., exact modification site) as C54-C88/C96-C121 or C54-C96/C88-C121, the assignment of the disulfide pairing was not possible due to the absence of a tryptic site between the C88 and C96 residues of T10. Thus, V8 protease was employed to digest rh-GM-CSF and cleave the protein between each half-cystine residue at the C-terminal side of glutamic acid. The MS analysis of the V8 protease digest of rh-GM-CSF confirmed the presence of most of the predicted peptides (Table 19-3). The ions at m/z 2272 and 3036 corresponded to the disulfide-linked peptides V8-SS-V10 (PTC54LQTRLE-SS-TSC96ATQIITFE) and V7,8-SS-V10 (MFDLQEPTC54LQTRLE-SS-TSC96ATQIITFE), the latter arising from incomplete cleavage at Glu(51). These MS signals disappeared upon dithiothreitol (DTT) reduction, thus suggesting a Cys(54)–Cys(96) disulfide bond. The absence of digested peptides V1 and V7 was likely due to incomplete cleavages, as indicated by the presence of V1-2 and V7-8 peptides. Interestingly, V9 and V12 peptides were not observed in the spectra despite their hydrophobic character based on primary structures. This signal suppression may arise from contributions of the peptide's secondary or tertiary structure affecting its hydrophobic character [31]. To overcome the difficulty in detecting the absent peptides, the mixture of digested V8 peptides was separated by HPLC and isolated fractions were analyzed by MS. All 13 V8 peptide fragments were revealed. V1 peptide was observed as V1-2 at m/z 2302, while V7 peptide was shown as part of V7-8 at m/z 1824 due to incomplete cleavages. V9 peptide was not only seen at m/z 3712 as expected, but was also identified as V9-SS-V12-13 (LYKQGLRGSLTKLKGPLTMMASHYKQHC88PPTPE-SS-NLKDFLLVIPFDC121WEPVQE, m/z 6017.6) and V9-SS-V11-13 (LYKQGLRGSLTKLKGPLTMMASHYKQHC88PPTPE-SS-SFKENLKDFLLVIPFDC121WEPVQE, m/z 6508.7) [31]. These data clearly established another pairing of disulfide bond between Cys(88) and Cys(121).

Figure 19-4 Amino acid sequence and calculated average mass values of the tryptic peptides comprising the disulfide-linked core peptide in rh-GM-CSF.

For a recombinant protein, post-translational modifications such as phosphorylation, oxidation, deamidation, and sulfation are known to occur. The GM-CSF variants were first observed after SDS polyacrylamide gel electrophoresis (SDS-PAGE) of an E. coli-derived GM-CSF preparation as a hazy band located slightly above the band corresponding to unmodified GM-CSF (V0). The haze was further separated and purified by preparative reversed-phase HPLC. Typically, a Rainin Dynamax C4 column (300 Å, 4.1 × 250 mm) was run at a flow rate of 30 mL/min on a Rainin autoprep preparative HPLC system. Samples were eluted using a linear gradient of 27% to 72% acetonitrile in 0.1% trifluoroacetic acid (TFA) over a 30-min period. A Knauer variable wavelength detector set at 280-nm absorbance was used to monitor peaks. Fractions were taken manually based on UV absorption and retention time. Isolated fractions containing two GM-CSF variants, V1 and V2, were diluted threefold and re-chromatographed separately on a Rainin Dynamax C4 column (300 Å, 2.1 × 250 mm) at a flow rate of 10 mL/min on a Rainin autoprep HPLC system using a linear gradient of 27% to 72% acetonitrile in 0.1% TFA. These two variants, V1 and V2, were found to have comparable biological activity to the parent GM-CSF (V0). Further structural identification work was carried out on isolated fractions using MS methods.
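The disulfide-linked V8 peptide signals reported earlier for rh-GM-CSF (m/z 2272 and 3036) can be checked with simple arithmetic: add the masses of the two chains, subtract two hydrogens for the disulfide bond, and add one proton. The following is a minimal sketch, assuming average residue masses, singly protonated ions, and the chain sequences PTCLQTRLE (V8), MFDLQEPTCLQTRLE (V7,8), and TSCATQIITFE (V10) as read from the text. The function names are illustrative and are not taken from the original work.

```python
# Average residue masses (Da) for the 20 common amino acids.
AVG_RESIDUE = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153      # H2O for the free N- and C-termini
HYDROGEN = 1.00794   # average mass of one hydrogen atom
PROTON = 1.00728     # mass of the charge-carrying proton


def peptide_mass(sequence):
    """Average neutral mass of a linear peptide with free thiols."""
    return sum(AVG_RESIDUE[aa] for aa in sequence) + WATER


def disulfide_linked_mh(chain_a, chain_b):
    """[M+H]+ of two chains joined by one disulfide bond (loss of 2 H)."""
    neutral = peptide_mass(chain_a) + peptide_mass(chain_b) - 2 * HYDROGEN
    return neutral + PROTON


v8_v10 = disulfide_linked_mh("PTCLQTRLE", "TSCATQIITFE")
v78_v10 = disulfide_linked_mh("MFDLQEPTCLQTRLE", "TSCATQIITFE")
print(f"V8-SS-V10   [M+H]+ = {v8_v10:.1f}")   # compare with the signal at m/z 2272
print(f"V7,8-SS-V10 [M+H]+ = {v78_v10:.1f}")  # compare with the signal at m/z 3036
```

After DTT reduction each chain regains its two hydrogens and is detected separately, which is consistent with the disappearance of the linked-peptide signals.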

TABLE 19-3 V8 Protease Digest of rh-GM-CSF (V0) and Its Variant (V2)

The peptide mass mapping strategy using trypsin and V8 protease was applied to solve structural identification problems of the variants. The comparison of the trypsin and V8 protease digests of the native GM-CSF (V0) and its variants V1 and V2 demonstrated that one or two methionine residues in V0 had been converted to methionine sulfoxides (Tables 19-2 and 19-3). In the case of V1, tryptic peptide T9 had a mass increase of 16 Da (m/z 1252, Table 19-2), suggesting oxidation of Met(79) or Met(80). In the case of V2, however, both tryptic peptides T4 (m/z 3218) and T9 (m/z 1252) had a mass shift of 16 Da with respect to T4 and T9 in V0 (Table 19-2). Therefore, V2 contains two methionine sulfoxides: one at Met(46), the other at Met(79) or Met(80). The assignment of Met(46) oxidation was further confirmed by a mass increase of 16 Da for V8 protease peptides V7-8 and V7-8-SS-V10. No tandem MS experiments were attempted to differentiate oxidation sites between Met(79) and Met(80) at that time because of instrumentation limitations, although these experiments would have provided detailed information on the exact modification sites. An example of this approach using modern instrumentation is illustrated in the case of rh-IFN-α-2b. The structural assignments of V1 and V2 were further supported by MS studies of chemically modified proteins VS-1, VS-2, VS-3, and VS-4 that have different degrees of oxidation of the four methionine residues in the rh-GM-CSF amino acid sequence (data not shown). In these experiments, GM-CSF was treated with H2O2 under optimized conditions to produce oxidized proteins. The preferential oxidation of Met(79) was observed in the mapping experiments of permethylated GM-CSF, where an unusual cleavage at Met(79)-Met(80) yielded a signal at m/z 1306 and a weak signal 16 Da higher.
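The variant assignments above rest on comparing each peptide mass with its counterpart in V0 and matching the difference to a known modification. A minimal sketch of that comparison follows. The modification table (average mass shifts for the modifications named earlier in this section), the tolerance, and the function name are assumptions chosen for illustration.

```python
# Average mass shifts (Da) for the modifications named in the text.
MOD_SHIFT = {
    "deamidation": 0.98,      # Asn -> Asp or Gln -> Glu
    "oxidation": 16.00,       # e.g. Met -> Met sulfoxide
    "phosphorylation": 79.98,
    "sulfation": 80.06,
}


def annotate_shift(observed, reference, tol=0.5):
    """Return the mass difference and the modifications that match it."""
    delta = observed - reference
    matches = [name for name, shift in MOD_SHIFT.items()
               if abs(delta - shift) <= tol]
    return delta, matches


# Tryptic peptide T4: m/z 3218 in V2 versus m/z 3202.3 in V0 (values from the text).
delta, matches = annotate_shift(3218.0, 3202.3)
print(f"T4 shift = {delta:+.1f} Da, candidates: {matches}")
```

A mass shift alone does not locate the modified residue when a peptide contains several candidates, as with Met(79) and Met(80) in T9; that distinction needs fragment-ion data.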

It is evident from the discussions above that the mass spectrometric method in combination with enzymatic digestion offers a convenient approach to the characterization of GM-CSF and its variants. The ESI-MS method demonstrated a mass accuracy of better than 0.01% for a recombinant protein. The mass spectral data of the enzymatic digests of GM-CSF and its variants allow the precise determination of the molecular weights of the peptides, leading to the identification of sites of covalent modifications, the disulfide bonding pattern, and confirmation of the cDNA-derived sequence of the protein.

IFN-α-2b is a recombinant DNA-derived therapeutic protein that is used as an anticancer agent and in the treatment of chronic hepatitis B and C [33]. It is a 165-amino acid protein, containing four cysteines at positions 1, 29, 98, and 138. These four cysteines form two disulfide bonds: cysteine 1, the N-terminal amino acid, is linked to cysteine 98, and cysteine 29 is linked to cysteine 138 (Figure 19-5). The molecular weight of IFN-α-2b is calculated to be 19,265 Da from its cDNA amino acid sequence [34]. The sequence and disulfide mapping of IFN-α-2b has been successfully carried out using the same peptide mass mapping method as described in the case of rh-GM-CSF, that is, enzymatic digestion of the purified protein with trypsin and mass analysis of the digested peptide mixtures [35].

It is not unusual that the E. coli expression of IFN-α-2b produces several isoforms in addition to the target protein, as shown in its reversed-phase HPLC chromatogram (Figure 19-6). Two of the three isoforms, Iso-2 and Iso-3, were predicted to be incorrectly folded forms of the target protein with scrambled disulfides. The third isoform, Iso-4, was thought to be reduced IFN-α-2b containing four free cysteine sulfhydryls (SH). The level of Iso-4 was observed to decrease during the purification process, suggesting that Iso-4 may refold back to IFN-α-2b. Earlier RP-HPLC data provided experimental evidence that IFN-α-2b could be reduced with DTT to Iso-4, and Iso-4 might be re-oxidized to IFN-α-2b. In addition to these isoforms, a fourth component, a variant of IFN-α-2b, was detected either co-eluting with or as a small shoulder eluting in front of the target protein peak (peak 1). The separation of this shoulder peak from IFN-α-2b depended on the HPLC column load; for example, better separation was obtained with lower column loads as illustrated in Figure 19-7. The exact structures of these isoforms and the variant of IFN-α-2b can only be obtained using mass spectrometry in conjunction with RP-HPLC.

The initial studies were carried out using on-line RP-HPLC coupled with a single quadrupole ESI-MS to measure the molecular weights of IFN-α-2b components. The mass spectrum showed that, other than IFN-α-2b, peak 1 in Figure 19-7c contained a protein with an MW of 19,281 Da that was 16 Da higher than the predicted MW of 19,265 Da for IFN-α-2b. This higher mass component corresponds to oxidation of one of the five methionine amino acids present in IFN-α-2b. The oxidation of a methionine is also indicated by the fact that this component elutes earlier than the parent protein. It is well known that proteins containing an oxidized methionine are more hydrophilic and tend to elute earlier on RP-HPLC than the parent protein [36, 37]. This oxidized variant is present at less than approximately 2% by HPLC peak area normalization. The dynamic range of the mass spectrometer was large enough to detect the presence of this variant as well as the more abundant IFN-α-2b, even at more diluted column loads as shown in Figure 19-7.

Figure 19-5 Amino acid sequence of rh-IFN-α-2b.

HPLC peaks 2 and 3 in Figure 19-6 corresponded to the predicted scrambled disulfides of IFN-α-2b, Iso-2 and Iso-3. They were expected to have the same MW of 19,265 Da as that of IFN-α-2b (peak 1). However, the measured MWs were found to be different from those predicted for an incorrectly folded form of IFN-α-2b. The determined MW of Iso-2 (Mr = 19,310 Da) was 45 Da higher than that of IFN-α-2b. This increased mass suggests the possibility of acetylation of the N-terminus of the reduced target protein, since the acetyl group, CH3CO-, corresponds to a mass addition of 42 Da. The MW of Iso-3 (Mr = 19,643) was 378 Da higher than that of IFN-α-2b. The protein MW information obtained from MS studies indicated that neither peak 2 nor peak 3 corresponded to the postulated scrambled disulfides of IFN-α-2b. They are most likely to be post-translationally modified IFN-α-2b.

Figure 19-6 RP-HPLC chromatographic profile of an “in-process” sample from E. coli recombinant DNA-derived IFN-α-2b. Peak 1 is IFN-α-2b. Isoform peaks 2 and 3 are putative scrambled disulfides. Isoform peak 4 is a putative open disulfide. The HPLC was run under a linear gradient of 49–65% B (10 : 90 H2O : CH3CN/0.1% TFA) over 24 minutes with the UV set at 214 nm. Mobile phase A was water with 0.1% TFA and the flow rate was set at 0.2 mL/min. The column used was a Vydac C8 column at 30°C (2.1 mm × 50 mm, 5 µm, 300 Å).

HPLC peak 4, Iso-4, in Figure 19-6 corresponded to the putative reduced IFN-α-2b containing four free cysteine sulfhydryls (Mr = 19,269 Da). It was expected to have an MW that was 4 Da higher than that of the target protein. The mass spectrum of peak 4 revealed that this symmetrical HPLC peak actually consisted of two co-eluting components. The MW of one of the components, at 19,269 Da, corresponded to the reduced IFN-α-2b, that is, the predicted Iso-4. However, the MW of the second component, at 19,336 Da, is 71 Da higher than that of the target protein. No obvious post-translational modification could be proposed.

Figure 19-7 RP-HPLC chromatograms showing dependence of the early eluting variant, peak A, on column load. (a) Peak A and peak 1 resolved with a column load of 3 µg of protein. (b) Peak A and peak 1 partially resolved with a column load of ∼6 µg of protein. (c) Peak A and peak 1 co-eluting with a column load of ∼15 µg of protein.

The above approach using RP-HPLC/ESI-MS to determine the MW of the isoforms is a powerful tool in monitoring the production process of IFN-α-2b. It provided insight into the potential structures of two of the four isoforms and the variant that were present at various stages in the production of the target protein. However, the structure and the identification of the post-translational modifications in Iso-2, Iso-3, and Iso-4 could not be determined solely based on this approach. To fully characterize the post-translational modifications, individual isoforms were isolated from an early step in the purification of IFN-α-2b, followed by extensive MS characterization. This was demonstrated in the case of Iso-4.

The first step was to verify the MW of the isolated protein Iso-4 using triple quadrupole ESI-MS. The MW of isolated Iso-4 was found to be 72 Da higher than that expected for IFN-α-2b. The next step involved RP-HPLC/ESI-MS analysis of tryptic digests of the control IFN-α-2b and IFN Iso-4 in order to identify the nature of the modification. The peptide mass mapping results are displayed in Figure 19-8 and Table 19-4. Comparison of the ESI-MS peptide maps of the two proteins shows differences in the N-terminal peptide fragments. The N-terminal peptide fragment of IFN-α-2b, T1 (1CDLPQTHSLGSR12), is linked with peptide T10 (or T9,10 and T9,10,11) through the disulfide bond formed between Cys-1 and Cys-98. These disulfide-linked peptide fragments, for example T1-ss-T10 (m/z 4617), were largely absent in the Iso-4 digest shown in Figure 19-8b. Instead, the Iso-4 tryptic peptide map revealed two new peptide fragments at m/z 1314 and 1384, respectively. These peptide fragments corresponded to the N-terminal peptide fragment T1 and T1 + 70 Da. The mass difference of 70 Da in these peptide fragments is in agreement with the mass difference (70 Da) between Iso-4 and IFN-α-2b when the mass increase of 2 Da resulting from reduction of the disulfide bond is considered.

The amino acid sequence of the modified peptide and the site of the modification in Iso-4 were further determined by RP-HPLC/ESI-MS/MS studies of the doubly charged molecular ions of the T1 (m/z 658) and the T1 + 70 Da (m/z 693) peptides (Figure 19-9). Tandem MS data of the doubly charged ion for T1 + 70 demonstrated that the peptide fragment was indeed the N-terminal tryptic peptide fragment, T1, of IFN-α-2b with a 70-Da modification group residing on the N-terminal cysteine. The observation of the more prominent N-terminal fragment ions of the modified T1 peptide, which were shifted by 26 Da compared with those of the T1 peptide of IFN-α-2b, implied a rapid loss of 44 Da (CO2). This suggested that a labile carboxyl group could be a part of the 70-Da modification moiety. This assumption was further confirmed by observation of the loss of 44 Da from T1 + 70 using a higher orifice potential (80 V) for peptide mass mapping of Iso-4 using MS. No such loss was detected for the T1 peptide under the same orifice condition. The product ion spectrum of the doubly charged ion of T1 + 26, generated from the high orifice ESI-MS experiment, exhibited the N-terminal fragment ions of b2 + 26, b3 + 26, and a2 + 26. As expected, the second series of fragment ions, that is, b2 + 70, b3 + 70, and a2 + 70, were absent.
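The m/z values quoted for T1 and its modified forms follow directly from the peptide sequence. The sketch below recomputes them, assuming average residue masses, a 70-Da group of composition C3H2O2, and loss of CO2 for the 26-Da form. The helper names are illustrative and not part of the original study.

```python
# Average residue masses (Da) for the 20 common amino acids.
AVG_RESIDUE = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153
PROTON = 1.00728
PYRUVATE = 70.047   # C3H2O2, assumed composition of the 70-Da group
CO2 = 44.010


def neutral_mass(sequence, modification=0.0):
    """Average neutral mass of a peptide plus an optional modification."""
    return sum(AVG_RESIDUE[aa] for aa in sequence) + WATER + modification


def mz(mass, charge):
    """m/z of the [M + zH]z+ ion."""
    return (mass + charge * PROTON) / charge


def b_ion(sequence, n, modification=0.0):
    """m/z of the singly charged b_n ion carrying an N-terminal modification."""
    return sum(AVG_RESIDUE[aa] for aa in sequence[:n]) + modification + PROTON


T1 = "CDLPQTHSLGSR"
for label, mod in [("T1", 0.0), ("T1 + 70", PYRUVATE), ("T1 + 26", PYRUVATE - CO2)]:
    m = neutral_mass(T1, mod)
    print(f"{label:8s} [M+H]+ = {mz(m, 1):7.1f}   [M+2H]2+ = {mz(m, 2):6.1f}")

print(f"b2 + 26 = {b_ion(T1, 2, PYRUVATE - CO2):.1f}")
```

Because the b ions contain the N-terminus, a shift in the b series but not in C-terminal fragments is what places the modification on the N-terminal Cys-Asp segment.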

Figure 19-8 Peptide mass mapping by RP-HPLC/ESI-MS. (a) Total ion chromatogram (TIC) of the trypsin-digested IFN-α-2b showing the intact N-terminal peptide disulfide fragments, T1-ss-T10 and T1-ss-T9,10. (b) TIC of the trypsin-digested Iso-4 displaying the absence of the intact N-terminal peptide disulfide fragments, T1-ss-T10 and T1-ss-T9,10, and the appearance of a T1 + 70 Da peptide fragment. The tryptic peptides were first desalted with 5% mobile phase B (CH3CN/0.08% TFA), followed by a gradient run on a Supelcosil LC-18-DB column (1 mm × 300 mm, 100 Å) with 5–95% B in 150 minutes (40 µL/min, with mobile phase A: water with 0.1% TFA).

TABLE 19-4 Tryptic Peptide Fragments of IFN-α-2b

T8: Ala-Glu-Thr-Ile-Pro-Val-Leu-His-Glu-Met-Ile-Gln-Gln-Ile-Phe-Asn-Leu-Phe-Ser-...
T9: Asp-Ser-Ser-Ala-Ala-Trp-Asp-Glu-Thr-Leu-Leu-Asp-Lys
T10: Phe-Tyr-Thr-Glu-Leu-Tyr-Gln-Gln-Leu-Asn-Asp-Leu-Glu-Ala-Cys-Val-Ile-Gln-Gly-Val-Gly-Val-Thr-Glu-Thr-Pro-Leu-Met-Lys
T11: Glu-Asp-Ser-Ile-Leu-Ala-Val-Arg
T12: Lys
T13: Tyr-Phe-Gln-Arg
T14: Ile-Thr-Leu-Tyr-Leu-Lys
T15: Glu-Lys
T16: Lys
T17: Tyr-Ser-Pro-Cys-Ala-Trp-Glu-Val-Val-Arg
T18: Ala-Glu-Ile-Met-Arg
T19: Ser-Phe-Ser-Leu-Ser-Thr-Asn-Leu-Gln-Glu-Ser-Leu-Arg
T20: Ser-Lys
T21: Glu

The elemental composition of the 70-Da post-translational modification group was determined by accurate mass measurement using high-resolution MALDI-TOF-MS. The result showed that the 70-Da modification group was a pyruvate (C3H2O2). Pyruvic acid (CH3COCOOH), like acetic acid (CH3COOH) and other common acids, forms a strong amide bond through the carboxyl group (C-1) with the N-terminal amine group in proteins [38, 39]. This amide bond is generally stable to mild acidic and basic conditions. However, the pyruvate bond in Iso-4 appeared to be labile under mild acidic conditions. In addition, the modification of the protein through C-1 of pyruvic acid is not likely to generate a labile carboxyl group in the modification moiety as observed in the MS/MS studies. This information led to the hypothesis that the pyruvation of IFN Iso-4 involved a unique chemistry in which a ketimine link was likely formed between C-2 of pyruvic acid and the N-terminal cysteine amino group. This ketimine bond is reversible under mild acidic conditions as illustrated in Figure 19-10. The absence of the disulfide bond between Cys-1 and Cys-98 in Iso-4 favors formation of the cyclic pyruvate intermediate (B) rather than formation of the ketimine (imine) intermediate (A). This hypothesis was confirmed by comparing the product ion spectrum of the T1 peptide fragment of Iso-4 with that of a synthetically prepared T1 peptide fragment that was derivatized with pyruvic acid. The MS/MS analysis of this pyruvated synthetic peptide generated the same fragmentation pattern as that of the N-terminal tryptic peptide of Iso-4. The N-terminal fragment ion of b2 + 26 (m/z 245), generated by MS/MS of T1 + 70 from Iso-4 and the synthetic peptide, was further dissociated in the ion trap mass spectrometer, producing fragment ions at m/z 102 and 130 resulting from cleavage of CO-CH and the amide bond of Cys-1, respectively. This multiple-stage MS analysis (MSn) further supported the original hypothesis.
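To see why an accurate mass measurement can single out C3H2O2 among the compositions with a nominal mass of 70 Da, it helps to list the candidates and their exact masses. The sketch below enumerates C/H/N/O combinations by mass only. It applies no valence or chemical plausibility filter, the atom limits are arbitrary assumptions, and no measured value from the original study is used.

```python
from itertools import product

# Monoisotopic masses (Da) of the most abundant isotopes.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
NOMINAL = {"C": 12, "H": 1, "N": 14, "O": 16}
ELEMENTS = "CHNO"


def formula(counts):
    """Render atom counts as a Hill-style string, e.g. (3, 2, 0, 2) -> C3H2O2."""
    return "".join(f"{el}{n if n > 1 else ''}"
                   for el, n in zip(ELEMENTS, counts) if n)


def candidates(nominal_mass, max_atoms=(5, 12, 4, 4)):
    """All C/H/N/O compositions with the given nominal mass, sorted by exact mass."""
    found = []
    for counts in product(*(range(m + 1) for m in max_atoms)):
        if sum(NOMINAL[el] * n for el, n in zip(ELEMENTS, counts)) != nominal_mass:
            continue
        exact = sum(MONO[el] * n for el, n in zip(ELEMENTS, counts))
        found.append((formula(counts), exact))
    return sorted(found, key=lambda item: item[1])


for name, exact in candidates(70):
    print(f"{name:10s} {exact:.4f}")
```

The listed compositions differ from one another by roughly 0.01 Da or more, so a measurement accurate to a few millidaltons on the modification mass is sufficient to discriminate among them.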

In addition, the DNPH (2,4-dinitrophenylhydrazine) [38] and NADH (dihydronicotinamide adenine dinucleotide) [36] studies with purified Iso-4 provided the evidence that the 70-Da moiety was a pyruvate derivative (C3H2O2). In the DNPH study, treatment of Iso-4 with acid and 2,4-dinitrophenylhydrazine produced the 2,4-dinitrophenylhydrazone of the pyruvic acid liberated from Iso-4. In the NADH study, the amount of NAD+ produced is proportional to the amount of pyruvic acid liberated from the mild acid hydrolysis of Iso-4. These procedures were also applied to purified IFN-α-2b as a control. The control experiments demonstrated that the pyruvate derivative was the active component measured in the Iso-4 experiments.

Figure 19-9 LC/ESI-MS/MS product ion mass spectra of the doubly charged ions of the N-terminal tryptic peptide T1 of (a) IFN-α-2b (m/z 658) and (b) Iso-4 (m/z 693). N-terminal Cys-Asp was identified as the modified fragment.

To verify that Iso-4 was interconvertible with IFN-α-2b, a sample of purified Iso-4 was treated under mild acidic conditions in an attempt to convert it to IFN-α-2b. The MW measurement of the converted protein by RP-HPLC/ESI-MS confirmed that Iso-4 could be converted to IFN-α-2b under mild acidic conditions. Furthermore, the IFN-α-2b obtained from the conversion of Iso-4 was enzymatically digested with trypsin and studied by RP-HPLC/ESI-MS to assess the status of disulfide bonds. The presence of the two disulfide-bonded peptide fragments, T1-ss-T10 and T5-ss-T17, revealed the correctly folded IFN-α-2b.

The other isoforms, Iso-2 and Iso-3, expressed in the E. coli fermentation of IFN-α-2b, were characterized using a similar approach. The isolated Iso-2 and Iso-3 were enzymatically digested with trypsin, and the resulting peptide mixtures were mass-mapped using RP-HPLC/ESI-MS. The results indicated that Iso-2 was a correctly folded IFN-α-2b acetylated on the amino group of the N-terminal cysteine. Iso-3 was similarly determined to be a glutathionated form (Cys-98) of the partially reduced IFN-α-2b that was pyruvated on the N-terminal cysteine. The complete structures for IFN-α-2b, Iso-2, Iso-3, and Iso-4 are shown in Figure 19-11.
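The structural assignments can be compared against the intact masses measured earlier by on-line RP-HPLC/ESI-MS. The sketch below adds the expected mass shift of each assigned modification to the calculated MW of IFN-α-2b (19,265 Da) and reports the residual against the measured value. The shift table uses average masses, and the bookkeeping for Iso-3 and Iso-4 (one opened disulfide contributing 2 Da) is an assumption made for this illustration, not a statement from the original study.

```python
IFN_MW = 19265.0  # Da, calculated from the cDNA sequence (value from the text)

# Average mass shifts (Da) relative to correctly folded IFN-alpha-2b.
SHIFT = {
    "oxidation": 15.999,          # +O on one methionine
    "acetylation": 42.037,        # +C2H2O on the N-terminal amine
    "pyruvate": 70.047,           # +C3H2O2 on the N-terminal cysteine
    "glutathione": 305.308,       # S-glutathionylation of one free cysteine
    "reduced_disulfide": 2.016,   # +2 H when one disulfide bond is open
}

# Measured MW (from the text) and the modifications assigned to each species.
ASSIGNMENTS = {
    "oxidized variant": (19281.0, ["oxidation"]),
    "Iso-2": (19310.0, ["acetylation"]),
    "Iso-3": (19643.0, ["glutathione", "pyruvate", "reduced_disulfide"]),
    "Iso-4": (19336.0, ["pyruvate", "reduced_disulfide"]),
}


def residual(measured_mw, modifications):
    """Measured MW minus the MW predicted for the assigned modifications."""
    predicted = IFN_MW + sum(SHIFT[name] for name in modifications)
    return measured_mw - predicted


for species, (measured, mods) in ASSIGNMENTS.items():
    print(f"{species:17s} residual = {residual(measured, mods):+.1f} Da")
```

The residuals indicate how closely each assignment accounts for the single-quadrupole measurement; the peptide-level data described above remain the actual basis for the assignments.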

Figure 19-10 The pyruvate formation with the N-terminal cysteine. The C-2 carbonyl in pyruvic acid initially forms a ketimine intermediate (A). The sulfhydryl (SH) group of Cys-1 generated from the reduced cysteine 1–98 disulfide bond in Iso-4 tends to favor the formation of the more thermodynamically stable cyclic thiazolidine pyruvate intermediate (B).


The pyruvic modification of the N-terminal cysteine of E. coli-derived recombinant IFN-α-2b via a ketimine linkage has not been reported previously. There were only two cases in the literature that involved the ketimine formation of the pyruvic acid C-2 carbonyl group and the amino group of the N-terminal cysteine amino acid, including the post-translational modification of the Ner protein of the bacteriophage Mu [38] and the β-chain of hemoglobin A1b [39] with pyruvic acid. The chemistry of pyruvic acid attachment to Iso-4 from this study has a significant impact on the production of IFN-α-2b. It led to the development of a reproducible conversion procedure from Iso-4 to IFN-α-2b in the production process, resulting in a five- to sevenfold increase in the production yield.

Figure 19-11 The structures of IFN-α-2b and its three isoforms, Iso-2, Iso-3, and Iso-4. The solid line indicates the disulfide bond formation, while the dashed line indicates the reduced disulfide bond or partial disulfide bond formation.

19.3.2 Protein Glycosylation and Phosphorylation

Phosphorylation and glycosylation are key factors in modulating protein structures and functions within cells. Glycosylation probably affects more than half of all proteins in a eukaryotic cell [40]. In the extracellular environment, the oligosaccharide moieties of glycoproteins are implicated in a wide range of cell–cell and cell–matrix recognition events which exert effects on cellular recognition in infection, cancer, and immune response. There are many instances where glycan structures have been shown to have significant importance in the biological function of a protein. For example, glycosylation of Asn-319 on rabies virus glycoprotein is essential for the secretion of soluble rabies virus glycoprotein [41]. Changes in levels and types of glycosylation can be associated with disease. It has been illustrated that detecting changes in glycan structure may be used as a diagnostic for aggressive breast cancer [42]. Glycan profiling of normal and diseased forms of a glycoprotein has provided new insights for future research in rheumatoid arthritis, prion disease, and congenital disorders of glycosylation [43–47]. In all these diseases, differences in glycosylation indicate that there are cellular or genetic changes that affect the activity of specific glycosyltransferases. Glycosylation also represents the most common modification for recombinant protein products expressed in mammalian and insect cell lines. Carbohydrate modifications of recombinant proteins have significant impacts on their solubility, immunogenicity, resistance to proteolysis, circulatory half-life, and thermal stability, all of which will affect the use of the recombinant proteins as therapeutic entities or as drug targets. The important roles that glycoproteins play in biology and medicine have stimulated a rapid expansion of the field of glycobiology and brought up the need to develop rapid and accurate analytical methods to characterize the glycoproteins.

Glycosylation occurs in the endoplasmic reticulum (ER) and Golgi compartments of the cell, and the reactions are catalyzed by membrane-bound glycosyltransferases and glycosidases [48, 49]. All mammalian N-linked oligosaccharides share a common trimannosyl core, Man3GlcNAc2, derived from a biosynthetic precursor, Glc3Man9GlcNAc2, that is added cotranslationally to polypeptides in the ER. There are three types of N-linked oligosaccharides: high mannose-type, complex-type, and hybrid-type. For N-linked glycoproteins, the attachment of glycan structures to proteins usually occurs at an Asn-Xaa-Ser/Thr consensus sequence, where Xaa may be any amino acid except proline. O-oligosaccharide biosynthesis is initiated in the Golgi by the addition of a single sugar to serine or threonine. There are at least seven O-linked oligosaccharide core structures, four of which are particularly widespread in mammalian glycoproteins [49].
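The Asn-Xaa-Ser/Thr rule lends itself to a simple sequence scan for potential N-glycosylation sites. A minimal sketch follows, using a regular expression with a lookahead so that overlapping sequons are all reported. The function name and the test sequence are hypothetical, and a matching sequon marks only a potential site, not an occupied one.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr.
# The lookahead keeps overlapping sequons (e.g. N-N-S-S) from being skipped.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence):
    """Return (1-based Asn position, tripeptide) for each potential N-glycosylation site."""
    return [(match.start() + 1, match.group(1))
            for match in SEQUON.finditer(sequence)]


# Hypothetical test sequence: NLT is a sequon, NPS is excluded by the proline rule.
print(find_sequons("AANLTKNPSNNSS"))
```

Whether a predicted site actually carries a glycan has to be established experimentally, for example by peptide mass mapping before and after deglycosylation.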

Carbohydrates are polymers with a wide diversity of glycan structures, which comes from the variation in the type, number, and position of individual sugar residues, the degree of branching, and the level of acetylation, methylation, sialylation, phosphorylation, and sulfation. The populations of sugars attached to an individual protein will depend on the cell type in which the glycoprotein is expressed and on the physiological status of the cell, and they may be developmentally and disease-regulated. A glycoprotein usually exists as complex mixtures of glycosylated variants (glycoforms) due to (a) the diversity of oligosaccharides attached to the glycoprotein and (b) the occupancy of each glycosylation site. Complete structural characterization of a glycoprotein requires the determination of the peptide primary sequence and the glycosylation sites, as well as the definition of the attached oligosaccharides in terms of their linear sequencing, branching, linkage, configurations, and the positional isomers.

In general, a glycoprotein is enzymatically digested such that each glycosylation site is located within a separate peptide [50]. HPLC separation of the peptides coupled with MS precursor ion scans of sugar-specific oxonium ions, such as m/z 163 (protonated Hex), m/z 204 (protonated HexNAc), or m/z 366 (protonated Hex-HexNAc), allows the glycopeptides to be identified from the mixture of peptides for further studies [51–54]. In some cases, where tandem MS is not available, fragmentation induced by internal energy transfer to the ion during the ionization process (in-source fragmentation during electrospray or post-source decay in MALDI) can also generate these marker ions in the low m/z range of the mass spectra for glycopeptide identification [55, 56]. Dissociation of a glycopeptide in a CID experiment will provide information on the primary sequence of the peptide, the type of sugar attached, and the amino acid residue that was modified by the glycosyl group [57–59]. However, identification of glycopeptides from a peptide mixture by the above MS approaches can sometimes be problematic, primarily due to the poor ionization efficiency of glycopeptides compared to their unmodified forms and the extensive gas-phase deglycosylation, which hampers locating the site of sugar attachment [52, 57, 60]. To overcome these problems, several strategies have been developed:

1. Removal of glycans through β-elimination for O-linked glycans or enzymatic digestion using N-glycosidase for N-linked glycans. O-linked glycan elimination converts Ser to Ala and Thr to aminobutyric acid, where 16-Da mass losses will be observed and can serve as a marker for the site of sugar attachment. Deglycosylation by N-glycosidase converts Asn to Asp with a mass increase of 1 Da. N-glycosidase F with simultaneous partial (50%) or full 18O-labeling of glycosylated asparagine residues has been used to magnify the mass difference for glycosylation site identification [61–64]. The glycan elimination with subsequent changes in peptide MW simplifies the downstream determination of the glycosylation site by MS/MS. Furthermore, the 18O/16O labeling method can be used to determine the degree of occupancy of each N-glycosylation site.

2. Affinity capture of the N-linked glycopeptides or glycoproteins via lectin column-mediated affinity purification followed by MS analysis [63–65]. Kaji et al. [64] developed a strategy termed isotope-coded glycosylation-site-specific tagging (IGOT), which combined the lectin affinity purification and the N-glycosidase-mediated 18O labeling method. Applying IGOT, they characterized the N-linked high-mannose and/or hybrid-type glycoproteins from an extract of C. elegans proteins and were able to identify 250 glycoproteins with the simultaneous determination of 400 unique N-glycosylation sites by using multidimensional LC-MS.


3. “Top-down” sequence analysis of whole glycoprotein ions using CID and ion/ion proton transfer in a quadrupole ion trap MS [66]. This approach eliminated gas-phase deglycosylation of the N-linked oligosaccharide in ribonuclease B, and the glycosylation site was identified to be Asn-Leu-Thr at residues 34–36 [66].
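The marker-ion screening described before the list of strategies (oxonium ions at m/z 163, 204, and 366) can be illustrated with a small peak-list filter. This is a sketch under stated assumptions: the spectrum is a plain list of (m/z, intensity) pairs, the marker masses are the monoisotopic values for the ions named in the text, and the tolerance, intensity threshold, and example peaks are hypothetical.

```python
# Sugar-specific oxonium marker ions named in the text (monoisotopic m/z).
MARKER_IONS = {
    "Hex": 163.06,
    "HexNAc": 204.09,
    "Hex-HexNAc": 366.14,
}


def find_marker_ions(peaks, tol=0.3, min_rel_intensity=0.05):
    """Return {marker name: relative intensity} for markers found in a peak list.

    peaks is a list of (m/z, intensity) pairs; a marker counts as present when
    a peak lies within tol of its m/z and reaches min_rel_intensity of the
    base peak.
    """
    if not peaks:
        return {}
    base = max(intensity for _, intensity in peaks)
    hits = {}
    for name, target in MARKER_IONS.items():
        best = max((intensity for mz, intensity in peaks
                    if abs(mz - target) <= tol), default=0.0)
        if best >= min_rel_intensity * base:
            hits[name] = best / base
    return hits


# Hypothetical low-mass region of a spectrum from one HPLC fraction.
spectrum = [(163.5, 10.0), (204.1, 800.0), (366.2, 300.0), (500.3, 1000.0)]
print(find_marker_ions(spectrum))
```

Applied to every scan of an LC/MS run, such a filter flags the fractions likely to contain glycopeptides, which can then be examined in detail.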

As an example of general structural characterization of glycoproteins, MS analysis of CHO cell-derived interleukin-4 (IL-4) is illustrated here. IL-4 is a T-cell-derived lymphokine that mediates the growth, proliferation, and differentiation of B- and T-lymphocytes and myeloid cells [67]. CHO cell-derived IL-4 is a 129-amino acid glycoprotein that contains two potential N-glycosylation sites at Asn-38 and Asn-105 [68]. Compositional analysis of the oligosaccharide moieties of CHO IL-4, carried out by high-performance anion-exchange chromatography coupled with pulsed amperometric detection, resulted in the following molar concentrations: Man 1.9, GlcN 3.9, Gal 2.1, sialic acid 1.9, Fuc 0.7, and GalN < 0.04.

Analysis of the intact CHO IL-4 by ESI-MS was first attempted to provide preliminary information on the glycan components [69]. The ESI mass spectrum contained three envelopes of multiply charged ions ranging from the 8+ to the 10+ charge state, each comprising eight peaks corresponding to the individual glycoforms of the protein (Figure 19-12A). The deconvoluted mass spectrum revealed several components, with signals at 17,019 Da and 17,309 Da corresponding to sialylated glycoforms (Figure 19-12B), since their mass separation concurred with the incremental mass of the sialic acid unit (NeuAc; 291 Da). The presence of these sialylated components, combined with the carbohydrate compositional results, indicated the likely identity of these major glycans as the fucosylated biantennary oligosaccharide in the mono- and di-sialylated forms (theoretical MW of IL-4: 14,963 Da, not accounting for the existing disulfide bonds). The signal at 16,727 Da corresponded to the asialo biantennary oligosaccharide, whereas a glycoform lacking a unit of galactose and fucose yielded the signal at 16,417 Da (Figure 19-12B). Other, higher mass signals in the deconvoluted ESI mass spectrum indicated the presence of tri- and tetraantennary glycans containing up to three additional lactosamine units (Hex-HexNAc, in-chain mass of 365 Da). Overall, the ESI-MS analysis provided the correct assignment of the two major glycoforms in CHO IL-4 and also allowed the detection of signals arising from more complex glycans.

In order to assess the size of the carbohydrate component, CHO IL-4 deglycosylated by N-glycanase was analyzed by ESI-MS. The deconvoluted spectrum displayed an MW of 14,955 Da, which conformed well to the theoretical MW of the protein (MW 14,963 Da) when considering the presence of three disulfide bonds in the protein.
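The glycoform masses in the deconvoluted spectrum can be reproduced from the protein MW and the glycan composition. The sketch below assumes average residue masses for the sugars, three disulfide bonds (6 H lost), and a fucosylated biantennary complex-type glycan of composition Hex5HexNAc4dHex1 for the asialo form. These compositions are inferred for illustration from the description above and are not quoted from the original work.

```python
# Average in-chain (residue) masses of the monosaccharides, in Da.
SUGAR = {"Hex": 162.1406, "HexNAc": 203.1925, "dHex": 146.1412, "NeuAc": 291.2546}
H2 = 2.0159                  # two hydrogens lost per disulfide bond
PROTEIN_MW = 14963.0         # theoretical MW of IL-4 with free thiols (from the text)
N_DISULFIDES = 3


def glycoprotein_mass(glycan):
    """Average mass of the disulfide-bonded protein carrying one glycan.

    glycan maps monosaccharide names to residue counts.
    """
    glycan_mass = sum(SUGAR[name] * count for name, count in glycan.items())
    return PROTEIN_MW - N_DISULFIDES * H2 + glycan_mass


GLYCOFORMS = {
    "asialo biantennary":         {"Hex": 5, "HexNAc": 4, "dHex": 1},
    "monosialylated":             {"Hex": 5, "HexNAc": 4, "dHex": 1, "NeuAc": 1},
    "disialylated":               {"Hex": 5, "HexNAc": 4, "dHex": 1, "NeuAc": 2},
    "asialo lacking Gal and Fuc": {"Hex": 4, "HexNAc": 4},
}

for name, glycan in GLYCOFORMS.items():
    print(f"{name:27s} {glycoprotein_mass(glycan):9.1f} Da")
```

These predictions can be compared with the deconvoluted signals at 16,727, 17,019, 17,309, and 16,417 Da, respectively.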

Figure 19-12 Positive-ion ESI mass spectrum of CHO IL-4. (A) Raw spectrum and (B) deconvoluted spectrum. (Reprinted from reference 69, with permission of John Wiley & Sons, Ltd.)

Mapping of the primary structure of CHO IL-4 was carried out by tryptic hydrolysis followed by measurement of the resulting peptide fragments by on-line HPLC/ESI-MS. Since the primary sequence of IL-4 was known, this MS mapping approach could confirm the cDNA-derived protein sequence, and also allow identification of any posttranslational modification(s), by comparing the ESI-derived mass values with the calculated MW of the predicted tryptic peptides. Figure 19-13 exhibits the amino acid sequence of IL-4 with all the peptide fragments from trypsin digestion. The major difference between the LC/MS profiles of the CHO IL-4 tryptic digest and the deglycosylated CHO IL-4 tryptic digest was the observed pattern and molecular mass values for HPLC peaks 11 and 14 (Table 19-5), thus making them good candidates for glycopeptide-containing fractions. The monosaccharide composition results and the presence of sialic acid residues, which was shown at the molecular mass level (Figure 19-12B), indicated the presence of a complex-type N-linked oligosaccharide. Since the types of complex N-linked carbohydrate structures typically present in mammalian proteins contain a defined number of sequence and branching variations, the ESI-derived glycopeptide masses should reveal the type of the attached carbohydrate. For HPLC peak 11, the resulting glycopeptide molecular masses determined from charge states were 5286, 5577, and 5868 Da. These masses were in agreement with the assignment of the T4,5–T10 disulfide-linked peptide containing an asialo biantennary oligosaccharide (calculated MW 5286.6 Da), accompanied by two glycoforms with one or two NeuAc units 291 Da apart. Several weak glycopeptide signals corresponding to variations in the Hex-HexNAc content of the carbohydrate were also observed in the ESI mass spectrum, indicating either additional branching of the asialo biantennary structure or arm extension of the asialo biantennary glycan prior to capping with NeuAc groups.
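Molecular masses "determined from charge states" are obtained from pairs of adjacent multiply charged ions of the same species. The sketch below shows the standard two-peak calculation for positive-ion ESI, assuming protonation as the only charge carrier. The charge states used in the example (3+ and 4+) are hypothetical, since the text does not state which charge states were observed.

```python
PROTON = 1.00728  # Da


def mz(mass, charge):
    """m/z of the [M + zH]z+ ion of a species with neutral mass `mass`."""
    return (mass + charge * PROTON) / charge


def deconvolute(mz_high, mz_low):
    """Charge and neutral mass from two adjacent charge states.

    mz_high carries charge z and mz_low carries charge z + 1.
    """
    z = round((mz_low - PROTON) / (mz_high - mz_low))
    mass = z * (mz_high - PROTON)
    return z, mass


# Glycopeptide masses reported for HPLC peak 11 (from the text), with
# simulated 3+ and 4+ ions used to demonstrate the calculation.
for mass in (5286.0, 5577.0, 5868.0):
    z, recovered = deconvolute(mz(mass, 3), mz(mass, 4))
    print(f"m/z {mz(mass, 3):.2f} (z={z}) and {mz(mass, 4):.2f} -> {recovered:.1f} Da")
```

With real data the two peaks carry independent measurement errors, so the recovered masses from several charge-state pairs are normally averaged.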

Figure 19-13 Amino acid sequence of rhIL-4 indicating all the expected tryptic peptides Tn. The potential glycosylation sites at Asn38 and Asn105 are indicated in bold. (Reprinted from reference 69, with permission of John Wiley & Sons, Ltd.)

TABLE 19-5 ESI-MS Analysis of the Tryptic Digest of CHO IL-4 for HPLC Peaks 11 and 14

Observed Mr


Identification of the CHO IL-4 glycosylation site was provided by HPLC/ESI-MS analysis of the V8 protease digest, where the V4,5 peptide (residues 27–43) containing a sialylated biantennary N-linked oligosaccharide was observed, with ESI-derived MW of 3929 and 4220 Da (data not shown). These glycopeptides were absent in the analysis of the N-glycanase-treated IL-4, where multiply charged ions corresponding to the V4,5 peptide were detected with an ESI-derived MW of 1868 Da. These data are consistent with the presence of a biantennary complex-type carbohydrate at the Asn-38 residue of peptide fragment V4, thus confirming that N-glycosylation occurs at Asn-38 rather than the other potential site of Asn-105. Thus, it seems that the presence of the Asn-38-containing tryptic peptide T5 resulted in partial inaccessibility of the adjacent Lys-37 tryptic site and shifted the disulfide-indicative ESI signals to higher m/z values owing to incorporation of the T5-CHO glycopeptide (T5*) into the adjacent disulfide-linked peptide T4.

Detection of low-mass sugar-specific oxonium ions in the ESI mass spectrum can assist in the identification of glycopeptide-containing fractions among the HPLC peaks of a glycoprotein proteolytic digest. Production of these ions can be induced by increasing the orifice potential, which controls the extent of fragmentation in the declustering region of a mass spectrometer. For example, at a higher orifice potential of 110 V, low-mass sugar oxonium ions at m/z 204 (HexNAc+), m/z 274 (NeuAc+), m/z 366 (Hex-HexNAc+) and m/z 657 (NeuAc-Hex-HexNAc+) were observed for CHO IL-4 tryptic peak 11 (data not shown), indicating that this HPLC peak contained sialylated oligosaccharides of the complex type. Similar monitoring of these sugar-diagnostic ions revealed the other fractions (peak 14, e.g.) containing glycopeptide fragments from CHO IL-4. Thus, all glycopeptide-containing fractions during the HPLC/ESI-MS analysis of CHO IL-4 were rapidly identified by carrying out the ESI-MS experiments at an elevated orifice potential, without having to search the ESI spectra of each individual peak for signal patterns characteristic of glycopeptides.

This HPLC/ESI-MS approach proved to be useful in detecting several glycoforms in CHO IL-4. Not only were the main asialo and sialylated biantennary glycoforms detected, but also additional signals indicative of higher branching were well-separated. This rapid assessment of glycosylation at the molecular level is invaluable for an initial batch-to-batch evaluation of mammalian cell-derived proteins. It provides real-time monitoring of the existing glycoform distribution and also allows for the detection of any changes that may arise from varying the conditions of the production process.
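The oxonium-ion screening of HPLC fractions described above can be sketched as a simple peak-matching routine. The Python example below computes the four diagnostic m/z values from monoisotopic residue masses and flags a spectrum that contains them. The peak lists, the 0.5 m/z tolerance, and the two-ion threshold are hypothetical choices made for illustration; the study's actual data were not shown. Note that the ion at m/z 274 is modeled here as the NeuAc oxonium ion minus water, which is what reproduces the nominal value of 274.

```python
PROTON = 1.00728    # Da
WATER = 18.01056    # Da
# Monoisotopic residue masses (Da) of the monosaccharide units.
HEX, HEXNAC, NEUAC = 162.05282, 203.07937, 291.09542

# Diagnostic oxonium ions named in the text, as singly charged m/z values.
DIAGNOSTIC_IONS = {
    "HexNAc+": HEXNAC + PROTON,
    "NeuAc+ (minus water)": NEUAC - WATER + PROTON,
    "Hex-HexNAc+": HEX + HEXNAC + PROTON,
    "NeuAc-Hex-HexNAc+": NEUAC + HEX + HEXNAC + PROTON,
}


def find_oxonium_ions(peaks, tol=0.5):
    """Names of diagnostic ions present in a list of (m/z, intensity) peaks."""
    found = []
    for name, target in DIAGNOSTIC_IONS.items():
        if any(abs(mz - target) <= tol and intensity > 0
               for mz, intensity in peaks):
            found.append(name)
    return found


def is_glycopeptide_fraction(peaks, min_hits=2):
    """Flag a fraction when at least min_hits diagnostic ions are seen.

    min_hits is an assumed threshold, not a published criterion.
    """
    return len(find_oxonium_ions(peaks)) >= min_hits


# Hypothetical peak lists for a glycopeptide and a plain peptide fraction.
glyco_peaks = [(204.1, 50.0), (274.1, 30.0), (366.1, 80.0), (657.2, 20.0)]
plain_peaks = [(175.1, 40.0), (512.3, 100.0)]
print(is_glycopeptide_fraction(glyco_peaks), is_glycopeptide_fraction(plain_peaks))
```

Requiring more than one diagnostic ion is a design choice that reduces false positives from an unrelated fragment that happens to fall near m/z 204.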

Phosphorylation is a posttranslational modification that is reflected in close to 30% of eukaryotic gene products and almost 2% of the human genome-encoded protein kinases. Protein phosphorylation plays an essential role in intercellular communication during development, in physiological responses and homeostasis, and in the functioning of the nervous and immune systems. Reversible phosphorylation regulates many diverse cellular processes such as growth, metabolism, proliferation, motility, and differentiation [70–72]. Mutation and deregulation of proteins such as protein kinases play causal roles in human diseases. The features of all cancers, deregulated cell growth and apoptosis, are a result of defective signaling pathways [73]; protein kinases are essential elements in the signaling pathways that mediate cell growth and programmed cell death. Therefore, a complete catalog and characterization of phosphorylated proteins will afford the possibility of developing agonists and antagonists of these enzymes for use in disease therapy [74, 75].

In eukaryotic cells, protein phosphorylation happens mostly on serine, threonine, and tyrosine residues (it could also happen on histidine, arginine, lysine, cysteine, glutamic acid, and aspartic acid to a much lesser extent). A comprehensive analysis of protein phosphorylation involves the identification of the phosphoproteins, the localization of the residues that are phosphorylated, and the quantitation of phosphorylation. The MS-based approach for characterization of phosphorylated proteins is based on the lability of the phosphate moiety of the phosphorylated peptides upon low-energy collisional activation in a tandem mass spectrometric experiment. The detection of a characteristic loss allows the identification of phosphorylated peptides from an unseparated peptide mixture or during on-line HPLC experiments. In the positive ion mode, a neutral loss of 98 Da (H3PO4, or HPO3 and H2O) from the phosphopeptide can be used to confirm the existence of a phosphopeptide [76, 77]. The fragment ion at m/z −79 (PO3−), however, is more frequently used in the negative ion mode for phosphorylation-specific precursor ion scanning [78–81]. The advantage of precursor ion scanning of m/z −79 includes its applicability to all phosphopeptides with phosphorylation occurring on serine, threonine, or tyrosine. Precursor ion scanning of m/z 216 has also been used for detection of tyrosine-specific phosphorylation. Mann and co-workers [82] monitored precursors of m/z 216.043 (immonium ion of phosphotyrosine) by using a quadrupole TOF mass spectrometer in the positive ion mode and showed that the quadrupole TOF was ∼fivefold more sensitive than the triple quadrupole instrument in monitoring precursors of this ion. FTMS coupled with electron capture dissociation (ECD) has been applied to characterize phosphopeptides or intact phosphoproteins in recent studies [83]. In this technique, dissociation is induced by electron recombination with the protons of the multiply charged peptide or protein, where labile modifications such as phosphorylation of serine and threonine remain intact. This process allows unambiguous assignment of the modification site in the peptide/protein.
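The mass arithmetic behind the 98 Da neutral loss and the m/z 216.043 immonium ion can be checked with a few lines of Python. This is a minimal sketch: the function names, the example precursor (m/z 800.40, 2+), its fragment list, and the 0.5 m/z tolerance are hypothetical and serve only to show how the marker ions are located. Because the lost H3PO4 is neutral, the fragment keeps the precursor charge, so the observed shift is 98/z on the m/z scale.

```python
# Monoisotopic masses in Da.
H3PO4 = 97.97690          # phosphoric acid, lost as a neutral
HPO3 = 79.96633           # mass added to a residue by phosphorylation
TYR_IMMONIUM = 136.07570  # immonium ion of unmodified tyrosine

# Immonium ion of phosphotyrosine, monitored in positive-ion precursor scans.
PHOSPHOTYR_IMMONIUM = TYR_IMMONIUM + HPO3


def neutral_loss_mz(precursor_mz, z, loss=H3PO4):
    """m/z of the fragment left after a neutral loss; the charge z is kept."""
    return precursor_mz - loss / z


def shows_phospho_loss(precursor_mz, z, fragment_mzs, tol=0.5):
    """True if any fragment matches the precursor minus H3PO4 within tol."""
    target = neutral_loss_mz(precursor_mz, z)
    return any(abs(mz - target) <= tol for mz in fragment_mzs)


# Hypothetical doubly charged precursor and its MS/MS fragment m/z values.
fragments = [751.4, 600.3, 216.04]
print(round(neutral_loss_mz(800.40, 2), 3))
print(shows_phospho_loss(800.40, 2, fragments))
print(round(PHOSPHOTYR_IMMONIUM, 3))
```

The computed phosphotyrosine immonium ion agrees with the quoted m/z 216.043 to within about 0.001.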

Although tandem MS is powerful, mapping phosphorylation sites by relying solely on it presents challenges. For example, in the commonly used positive ion detection mode of peptide MS analysis, signal suppression of phosphate-containing peptides is often evident; the inherent lability of the phosphate group undergoing neutral loss of HPO3 (80 Da) upon CID can make the identification of the phosphorylation site difficult; for long phosphopeptides,

