Wheat germ cell-free platform for eukaryotic protein production
Dmitriy A Vinarov, Carrie L Loushin Newman and John L Markley
Center for Eukaryotic Structural Genomics, Biochemistry Department, University of Wisconsin-Madison, Madison, WI, USA
Introduction
One of the most important tasks in biotechnology today is the development of improved systems and strategies for synthesizing any desired protein or protein fragment in its folded, soluble form on a preparative scale. This task is fundamental to the success of structural genomics projects, which promise to capitalize upon numerous advances in science and technology to change the appreciation and understanding of biological systems. Structural genomics implies a move away from hypothesis-driven research to a system of solving structures first and using these structures, and other structures modeled from them, as the source of hypotheses for further research. The medical incentives for understanding protein structure are great. Many diseases are caused by defects in a single protein that alter its folding, stability, or activity. Determining the structures of proteins involved in diseases will move us a step closer to improving disease treatment, diagnosis, and prevention. Beyond their specific medical applications, structural genomics projects are teaching fundamental lessons about the structural basis of life on this planet.
Keywords
cell-free extract; in vitro; isotopic labeling; NMR screening; NMR structure determination; protein production; protein structure; transcription; translation; wheat germ
Correspondence
J L Markley, Biochemistry Department,
University of Wisconsin-Madison, 433
Babcock Drive, Madison, WI 53706, USA
Fax: +1 608 262 3759
Tel: +1 608 263 9349
E-mail: markley@nmrfam.wisc.edu
Website: http://uwstructuralgenomics.org
(Received 2 May 2006, revised 13 July 2006, accepted 26 July 2006)
doi:10.1111/j.1742-4658.2006.05434.x
We describe a platform that utilizes wheat germ cell-free technology to produce protein samples for NMR structure determinations. In the first stage, cloned DNA molecules coding for proteins of interest are transcribed and translated on a small scale (25 µL) to determine levels of protein expression and solubility. The amount of protein produced (typically 2–10 µg) is sufficient to be visualized by polyacrylamide gel electrophoresis. The fraction of soluble protein is estimated by comparing gel scans of total protein and soluble protein. Targets that pass this first screen by exhibiting high protein production and solubility move to the second stage. In the second stage, the DNA is transcribed on a larger scale, and labeled proteins are produced by incorporation of [15N]-labeled amino acids in a 4 mL translation reaction that typically produces 1–3 mg of protein. The [15N]-labeled proteins are screened by 1H-15N correlated NMR spectroscopy to determine whether the protein is a good candidate for solution structure determination. Targets that pass this second screen are then translated in a medium containing amino acids doubly labeled with 15N and 13C. We describe the automation of these steps and their application to targets chosen from a variety of eukaryotic genomes: Arabidopsis thaliana, human, mouse, rat, and zebrafish. We present protein yields and costs and compare the wheat germ cell-free approach with alternative methods. Finally, we discuss remaining bottlenecks and approaches to their solution.
Abbreviations
CESG, Center for Eukaryotic Structural Genomics; GST, glutathione S-transferase; HSQC, heteronuclear single-quantum correlation
spectroscopy; IMAC, immobilized metal affinity chromatography; PDB, Protein Data Bank; [U-15N]-, uniform labeling with nitrogen-15; SAIL, stereo-array isotope labeled; Se-Met, selenomethionine.
Protein production remains a bottleneck in proteomics, for both structural and functional studies. Most structural biology groups and structural genomics centers utilize cell-based, heterologous protein production from Escherichia coli. However, this approach fails with many individual proteins, particularly those from eukaryotes. Failures result from no or low expression, low solubility, or degradation. Expression levels can be improved by producing the protein of interest as a cleavable fusion with a highly expressing protein. Low solubility can result from failure of the protein to fold properly, aggregation of folded protein, or unfavorable properties of the construct (intrinsic insolubility of the native sequence or insolubility introduced by a non-native sequence, such as a purification tag or other cloning artifact). As indicated in TargetDB, the target registration database for structural genomics (http://targetdb.pdb.org/), the proportion of targets that code for ‘unique proteins’ that yield soluble protein is only about one-third for prokaryotic proteins and much lower for eukaryotic proteins. In this context, a unique protein is defined as one with a peptide sequence exhibiting ≤ 30% sequence identity to the sequence of any protein with a three-dimensional structure deposited in the Protein Data Bank.
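The ‘unique protein’ criterion can be made concrete with a short calculation. The Python sketch below assumes that each target has already been aligned (by some external tool) to its closest PDB sequences, and it counts identity over ungapped alignment columns; that convention and the function names are illustrative assumptions, not the procedure used by TargetDB.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences.

    Assumption: identity is counted over columns where neither sequence
    has a gap; other conventions give somewhat different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only ungapped alignment columns.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x.upper() == y.upper())
    return 100.0 * matches / len(columns)


def is_unique_target(alignments_to_pdb, threshold: float = 30.0) -> bool:
    """True if the target is <= threshold % identical to every PDB hit.

    An empty list (no PDB hits at all) also counts as unique.
    """
    return all(percent_identity(a, b) <= threshold
               for a, b in alignments_to_pdb)


if __name__ == "__main__":
    # Hypothetical toy alignments, not real TargetDB or PDB data.
    print(percent_identity("MKT-LV", "MRTALV"))
    print(is_unique_target([("ACDEFGHIKL", "ACWWWWWWWW")]))
```

Ignoring gapped columns keeps short, gappy alignments from looking artificially dissimilar; a production pipeline would also need a minimum alignment length.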
Solubility can be improved greatly by producing the protein of interest as a cleavable fusion with a highly soluble protein. This strategy may enable the protein to fold properly without aggregation so that it stays in solution following cleavage. Many eukaryotic proteins are ‘natively disordered’, that is, they do not adopt a single, stable, folded structure. Some natively disordered proteins require an additional factor for folding: a metal ion, a small molecule cofactor, another peptide chain, or an oligonucleotide. Other proteins may require extensive post-translational modification to achieve their native folded state. Platforms for structural investigations must support the production of proteins on the scale of 2–10 mg. For efficient structure determination by NMR spectroscopy, the proteins must be labeled with stable isotopes (15N or 13C+15N, or for larger proteins 2H+13C+15N). For X-ray crystallography, proteins normally are labeled with selenomethionine (Se-Met) to support multiwavelength anomalous dispersion data collection for phase determination. Because protein production and labeling on this scale are expensive, it is important to screen targets first on a smaller scale to identify which constructs are expressed, soluble without aggregation, folded, and stable under the conditions used for NMR structure determinations or crystallization trials.
In vitro cell-free methods for protein synthesis with extracts from prokaryotic [1] or eukaryotic [2] cells offer an alternative to the E. coli cell-based platforms. Cell-free approaches have a number of potential advantages over other alternatives to heterologous expression in E. coli cells. Stable isotope or Se-Met labeling is easier with cell-free systems than with yeast, mammalian, or insect cell systems [3–5]. Cell-free systems may permit successful production of proteins that undergo proteolysis [6,7] or accumulate in inclusion bodies [8] in cells. Cell-free systems support selective labeling strategies [9–12] that cannot be achieved in bacterial whole cell systems. An important emerging approach is the incorporation of stereo-array isotope labeled (SAIL) amino acids [13], chemically synthesized amino acids with stereo-specifically arrayed stable isotope (2H and 13C) labeling patterns that are optimal for NMR spectroscopy. SAIL amino acids are being commercialized by a start-up company in Japan (Sail Technologies, Inc., Yokohama, Japan) and, when available, will raise the threshold for high-throughput NMR structure determinations from 20 kDa to 40 kDa and above [13]. The SAIL amino acids must be incorporated into proteins by in vitro synthesis so as not to disturb the labeling pattern.
Cell-free systems have been used for the production of various kinds of proteins, including membrane proteins [14] and proteins that are toxic to cells [8,15]. It is possible to collect NMR spectra of [15N]-labeled proteins prior to isolation from the cell-free protein synthesis mixture [16,17]. One of the features of cell-free protein production is that only the protein of interest is labeled, so that contaminating proteins do not show up in normal multinuclear NMR spectra. Cell-free protein production protocols are streamlined compared to cell-based protocols, in that they do not require cell harvesting or cell lysis. Protein purification is usually simpler, because the protein of interest starts out more concentrated and is isolated from a smaller set of contaminants.
The RIKEN Structural Genomics Center, in collaboration with Roche, has pioneered the use of cell-free protein production through a coupled transcription-translation system employing E. coli extracts [18–22].
It has been found, however, that most of the proteins that are produced well in E. coli cell-free systems are the same ones that are produced successfully from E. coli cells [10]. Thus, despite other potential advantages, the E. coli cell-free approach may not greatly expand the range of proteins that can be produced in a soluble, folded state, although it may be possible to overcome this limitation by redesigning the gene sequence (see below), by adding chaperones or other factors [22,24], or by reengineering ribosomal proteins [25].
One of the first in vitro translation systems to be investigated was prepared from wheat germ extracts, but yields from this eukaryotic extract were low [2]. Y. Endo and his group at Ehime University (Matsuyama, Japan) achieved a breakthrough in this technology by finding that an inhibitor of ribosomal protein synthesis, tritin, is associated with the coat of the wheat embryo [26]. They developed a process for removing this contaminating inhibitor and patented this process along with methods for utilizing the improved wheat germ extract [26–31]. Endo founded a company, CellFree Sciences Co., Ltd (Yokohama, Japan), to commercialize the technology. We found this approach to be promising and formed a cooperative undertaking with Ehime University and CellFree Sciences Co., Ltd with the goal of investigating the potential of wheat germ cell-free protein production as an enabling technology in our structural genomics project, the Center for Eukaryotic Structural Genomics (CESG; Madison, WI). As discussed here, we have found this technology to be robust, and our wheat germ cell-free pipeline now supports high-throughput screening for protein production and solubility and provides stable isotope labeled protein samples for the majority of the NMR structures determined at CESG [32,33].
CESG’s wheat germ cell-free platform
Our detailed protocol for wheat germ cell-free protein production is available elsewhere [34]. In short, the approach consists of four steps (Fig. 1A): (1) creation of a plasmid used for in vitro transcription; (2) small scale (25–50 µL) screening to assay the level of protein production and solubility; (3) larger scale (4–12 mL) production of [U-15N]-protein used to evaluate whether solution conditions can be found that render the target suitable for NMR structure determination (soluble, monodisperse, folded, and stable); and (4) production of sufficient [U-13C,15N]-protein for multidimensional, multinuclear magnetic resonance data collection. We purchase the wheat germ extract from CellFree Sciences Co., Ltd, the RNA polymerase from Promega (Madison, WI), and the labeled amino acids from Cambridge Isotope Laboratories (Andover, MA). Details about these and other reagents and supplies are found in our publications [32–34].
The purification scheme is shown in Fig. 1B. In step (1), a defined series of cloning procedures is used to create a DNA plasmid containing the target gene and 5′ and 3′ extensions that promote efficient transcription and translation. In step (2), small scale protein expression and purification trials are carried out, generally in a 96 well format. Successful candidates from these screens (those estimated to yield > 0.5 mg·mL−1 target protein with solubility > 75%) are then selected for larger scale protein production with incorporation of [15N]-labeled amino acids. Purified [U-15N]-protein samples produced in step (3) are then assayed by 1H-15N correlation spectroscopy (1H-15N HSQC) for their suitability as structural candidates (they must be folded, monodisperse, and stable at room temperature for at least 14 days). The solution conditions can be refined as part of this step. Targets that pass these tests are then prepared as [U-13C,15N]-protein samples, step (4).
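The go/no-go criteria of steps (2)–(4) can be summarized as a small decision function. This is a minimal sketch with hypothetical field names (`yield_mg_per_ml`, `soluble_fraction`, `hsqc_folded`, `stable_days`); the thresholds (> 0.5 mg·mL−1, > 75% soluble, stable for at least 14 days) are the ones stated above, and the returned labels are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds quoted in the text for moving a target forward.
MIN_YIELD_MG_PER_ML = 0.5    # step 2: estimated yield must exceed this
MIN_SOLUBLE_FRACTION = 0.75  # step 2: solubility must exceed 75%
MIN_STABLE_DAYS = 14         # step 3: stable at room temperature >= 14 days


@dataclass
class TargetResult:
    """Screening results for one construct (hypothetical field names)."""
    name: str
    yield_mg_per_ml: Optional[float] = None   # small-scale yield estimate
    soluble_fraction: Optional[float] = None  # 0.0-1.0, from gel scans
    hsqc_folded: Optional[bool] = None        # folded and monodisperse by HSQC
    stable_days: Optional[int] = None         # days stable at room temperature


def next_step(t: TargetResult) -> str:
    """Return the next pipeline action for a target."""
    if t.yield_mg_per_ml is None or t.soluble_fraction is None:
        return "step 2: small-scale screen"
    if (t.yield_mg_per_ml <= MIN_YIELD_MG_PER_ML
            or t.soluble_fraction <= MIN_SOLUBLE_FRACTION):
        return "fail at step 2: redesign construct"
    if t.hsqc_folded is None or t.stable_days is None:
        return "step 3: [U-15N] production and HSQC screen"
    if not t.hsqc_folded or t.stable_days < MIN_STABLE_DAYS:
        return "fail at step 3: refine conditions or redesign"
    return "step 4: [U-13C,15N] production"


if __name__ == "__main__":
    print(next_step(TargetResult("example", 0.8, 0.9, True, 14)))
```

Encoding the gates as data-driven checks makes it easy to log exactly where each target dropped out, which is the information needed for the success-rate bookkeeping discussed later.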
[Figure 1 panels: (A) workflow from target selection through 1 Cloning (PCR from cDNA, ligation cloning, DNA plasmid prep); 2 Small scale transcription and translation (screening at 50 µL scale; analysis of expression level, solubility, tag cleavage); 3 Production and analysis of [15N]-protein (translation on [15N]-amino acids in a 4 mL reaction; isolation, purification, tag removal; HSQC NMR, solubility, stability, and MS analysis); 4 Production of [13C,15N]-protein (as above but with double labeling, 4–12 mL reaction); structure determination by NMR. (B) purification after the 4–12 mL cell-free reaction: cleavable N-GST tag via 1st GSTrap column, concentration, PreScission protease cleavage, and 2nd GSTrap column; non-cleavable N-(His)6 tag via Ni-HiTrap chelating column and concentration; both followed by Superdex75 in NMR buffer and concentration to the NMR sample.]
Fig. 1 (A) Workflow diagram showing how the wheat germ cell-free platform is used to screen constructs for the expression of soluble protein, to produce [15N]-labeled protein for NMR screening for suitability as a structural candidate, and to produce double-labeled [13C,15N]-protein for structure determination. (B) Schematic illustration of the steps involved in isolating and purifying proteins produced by the wheat germ cell-free platform, depending on the type of tag: non-cleavable N-(His)6 tag or cleavable N-GST tag.
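The tag-dependent purification routes of Fig. 1B can be written down as ordered step lists. The order below is read from the figure text as extracted and may omit details of the full protocol [34]; the dictionary keys "GST" and "His6" are hypothetical labels.

```python
from typing import Dict, List

# Tag-specific steps, in the order read from Fig. 1B.
TAG_SPECIFIC_STEPS: Dict[str, List[str]] = {
    # cleavable N-terminal GST tag
    "GST": [
        "1st GSTrap column",
        "concentrate",
        "PreScission protease cleavage",
        "2nd GSTrap column",
    ],
    # non-cleavable N-terminal (His)6 tag
    "His6": [
        "Ni-HiTrap chelating column",
        "concentrate",
    ],
}

# Steps shared by both routes before NMR.
COMMON_FINAL_STEPS: List[str] = [
    "Superdex75 gel filtration in NMR buffer",
    "concentrate to NMR sample",
]


def purification_steps(tag: str) -> List[str]:
    """Return a new ordered list of purification steps for a tag type."""
    try:
        specific = TAG_SPECIFIC_STEPS[tag]
    except KeyError:
        raise ValueError(
            f"unknown tag {tag!r}; expected one of {sorted(TAG_SPECIFIC_STEPS)}"
        ) from None
    return specific + COMMON_FINAL_STEPS


if __name__ == "__main__":
    for tag in ("His6", "GST"):
        print(tag, "->", " | ".join(purification_steps(tag)))
```

Keeping the shared gel filtration and concentration steps in one list reflects the figure, where both branches converge on the same final steps.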
We have tested the wheat germ cell-free platform in the context of NMR-based structural genomics of eukaryotic proteins and have compared it with our parallel E. coli cell-based platform. Our experience is summarized briefly as follows. (a) Targets can be screened more quickly and more economically for protein expression and solubility by the cell-free approach than by the cell-based approach. The efficiency of this process is important, because we need to screen many targets or multiple constructs of a given target in order to find one that produces a protein that is soluble and well folded. As an example of multiple screening of a given target, we have screened targets with a noncleavable His6 tag, with a cleavable His6 tag, and with a cleavable glutathione S-transferase (GST) tag and have shown complementary success with these [35]. (b) Because of the smaller volumes involved, the isolation and purification of 1–5 mg quantities of labeled protein for NMR structural studies is faster and less labor intensive with proteins prepared by the cell-free approach than with the cell-based approach. (c) Proteins produced with the wheat germ extract from CellFree Sciences and labeled amino acids generally show high levels of enrichment by mass spectrometry: > 95% 15N/(14N+15N) or 13C/(12C+13C). These high levels are excellent for NMR spectroscopy. (d) The cell-free system supports the production of proteins with a variety of labeling patterns: uniform labeling with 2H, 13C, and 15N; selective labeling by residue type; and SAIL (discussed above).
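The enrichment levels in (c) are ratios of heavy isotope to total isotope abundance. The sketch below assumes that the 14N/15N (or 12C/13C) abundances have already been extracted from the mass spectrum, which is the harder part in practice and is not shown; the function names are hypothetical.

```python
def isotope_enrichment(heavy: float, light: float) -> float:
    """Fraction heavy / (light + heavy), e.g. 15N/(14N+15N) or 13C/(12C+13C)."""
    if heavy < 0 or light < 0 or heavy + light == 0:
        raise ValueError("abundances must be non-negative and not both zero")
    return heavy / (heavy + light)


def adequate_for_nmr(heavy: float, light: float, threshold: float = 0.95) -> bool:
    """Apply the > 95% enrichment level quoted in the text (strict inequality)."""
    return isotope_enrichment(heavy, light) > threshold


if __name__ == "__main__":
    # Hypothetical relative abundances for one labeled sample.
    print(isotope_enrichment(97.0, 3.0), adequate_for_nmr(97.0, 3.0))
```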
We recently carried out a detailed comparison of the wheat germ cell-free and E. coli cell-based approaches to protein production for NMR structure determination [35]. In this study, 96 randomly chosen Arabidopsis thaliana targets were carried through CESG’s wheat germ cell-free and E. coli cell-based pipelines. If possible, [15N]-labeled versions of each protein were produced for analysis by 1H-15N correlation NMR spectroscopy. Of the 96 starting targets, only eight from the cell-free pipeline and five from the cell-based pipeline were found suitable for NMR structural analysis on the basis of the NMR results. In this comparison, the five targets that proved successful by the E. coli cell-based approach were also successful by the cell-free approach.
Our wheat germ cell-free approach appears to have advantages over published in vitro protein production protocols that utilize E. coli S30 extract. (a) Cell-free protocols utilizing E. coli extract usually call for testing multiple plasmids with sequence differences outside the protein coding region to find one that produces protein in high yield [36]. By contrast, with the wheat germ cell-free protocol we have found no advantage in modifying the plasmid sequence outside the coding region, and hence use a single plasmid construct for all targets. (b) Protocols for E. coli S30 cell-free synthesis typically employ additives, such as polyethylene glycol, to improve protein yields [10]. These additives need to be removed prior to NMR structural studies. No such additives are required with the wheat germ cell-free approach, probably because the wheat germ extract contains chaperones and other factors that contribute to higher protein yields. (c) To achieve a high level of label incorporation from E. coli S30 extract, it may be necessary to take pains to remove endogenous unlabeled amino acids [10]. (d) Proteins prepared from E. coli S30 extract may be heterogeneous as the result of incomplete cleavage of the N-terminal methionine. This heterogeneity can lead to doubling of NMR peaks [10]. An effective solution is to make all proteins with a cleavable N-terminal sequence. This complication does not occur with proteins produced in vitro from wheat germ extract. (e) Wheat germ extracts contain chaperones and do not require the addition of chaperones, as is sometimes needed for high yields from E. coli S30 extract [37,38]. A comparison of protein production from wheat germ extract and E. coli S30 extract [39] demonstrated that a significantly higher proportion of multiple domain eukaryotic proteins were soluble when translated by wheat germ extract.
Automation
All of the cell-free operations can be carried out by hand, and this is how we started using the technology. Because of the small volume requirements for screening (25–50 µL) and protein production for structural studies (4–12 mL), cell-free methods have proved amenable to automation. CESG uses a CellFree Sciences GeneDecoder1000™ robotic system (Fig. 2) to automate the small scale screening of constructs for protein production and solubility. This unit makes it possible to carry out as many as 1052 small scale (25 µL) screening reactions per week. CESG has two prototype robotic units developed by CellFree Sciences for larger scale protein production (Fig. 2). The Protemist10™ robotic system requires preparation of the mRNA off-line, whereas the newer Protemist100™ starts with DNA and produces the mRNA transcript prior to the translation step. Each of these systems supports 24 transcription and translation reactions of 4 mL each per week. Typical yields for the Protemist runs are 0.3–0.5 mg purified protein per mL reaction mixture. These robotic systems handle the many steps that are tedious to carry out by hand, and work through the night. They have greatly reduced the manpower requirements of cell-free screening and protein production.
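The throughput figures above translate into a rough weekly capacity per production unit. This back-of-the-envelope Python sketch simply multiplies the quoted numbers; it assumes every reaction falls within the typical yield range, so it is an estimate rather than a measured output, and the function name is hypothetical.

```python
def weekly_purified_protein_mg(reactions_per_week: int,
                               ml_per_reaction: float,
                               yield_mg_per_ml: float) -> float:
    """Purified protein per week = reactions x volume x yield per mL."""
    return reactions_per_week * ml_per_reaction * yield_mg_per_ml


# Quoted per Protemist unit: 24 reactions of 4 mL per week,
# 0.3-0.5 mg purified protein per mL of reaction mixture.
low = weekly_purified_protein_mg(24, 4.0, 0.3)
high = weekly_purified_protein_mg(24, 4.0, 0.5)

# GeneDecoder1000 screening mode: up to four 96 well plates per overnight run.
wells_per_overnight_run = 4 * 96

print(f"Protemist: {low:.1f}-{high:.1f} mg purified protein per week per unit")
print(f"GeneDecoder1000: up to {wells_per_overnight_run} screening reactions per overnight run")
```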
Success rates with eukaryotic targets
The centers involved in the NIH Protein Structure Initiative (USA) are generating information about success rates in going from a selected target gene to a completed and deposited three-dimensional protein structure. The overall success rates still tend to be quite low, in the range of 2% to 20%, depending on the center and the types of targets selected. It is clear from all centers that the yields of structures for eukaryotic targets are much lower than for prokaryotic targets. In the interest of efficiency and cost savings, it is important to analyze where failures occur and to devise strategies to minimize them. The most effective routes for improvement involve a combination of bioinformatics and small scale screening. Bioinformatics relies on prior information and mathematical models for correlating success rates with gene sequences. Small scale screening offers the most economical way of testing whether a cloned and sequenced target will proceed through the critical stages leading to a structure. The initial screening step determines the level of gene expression and the solubility of the product. As described above, CESG’s wheat germ cell-free platform supports rapid and economical small scale screening for expression and solubility. We currently test constructs with and without an N-terminal tag and have shown success in rescuing failed targets by truncating the N- and/or C-termini. The second screening operation relevant to NMR structure determinations is the screening of the [15N]-labeled protein target by 1H-15N HSQC spectroscopy. This test, which is repeated after one week to determine whether the protein is stable in solution, is highly diagnostic for the success of an NMR structure determination. Proteins that pass this test are then produced with [15N+13C]-labeling.
Fig. 2 Fully automated protein synthesizers from CellFree Sciences. (Left) GeneDecoder1000™, which operates in two small scale modes. In the screening mode, it handles up to four 96 well plates per overnight run, produces 2–10 µg protein per well, and uses 1.0–5.0 mL wheat germ extract per plate. In the small scale protein production mode, it can handle up to two 96 well plates per overnight run, produces between 10 and 50 µg protein per well, and uses 5.0–10.0 mL wheat germ extract per plate. (Center) Protemist10™ robotic system, which is capable of carrying out 24 translation reactions of 4 mL each per week. The unit produces 1–3 mg protein per reaction and uses 3 mL wheat germ extract per reaction. This system requires off-line preparation of the mRNA. (Right) Protemist100™ robotic system, which supports 24 transcription and translation reactions of 4 mL each per week. Its capabilities are similar to those of the Protemist10, but it has the added feature of automated mRNA production. These robotic systems carry out a variety of operations including solvent extraction, high level multi-channel liquid handling, centrifugation, and incubation at various temperatures. An onboard microprocessor interfaced with the computer connected to the database keeps detailed log files that contain information about temperatures, volumes, and operational performance at every step.
We have accumulated experience in using the cell-free platform to produce proteins from several eukaryotic genomes. These include 722 different structural genomics targets from human, mouse, and Arabidopsis (Table 1). Most of the targets selected for testing have coded for proteins of less than 25 kDa, because this is the size limit for high-throughput structure determinations by NMR spectroscopy. In addition, we have carried out small scale wheat germ cell-free screening of approximately 150 larger proteins (25–70 kDa), and the success rates for expressing soluble proteins appear to be comparable to our earlier results with smaller targets presented in Table 1.
We define ‘highly soluble’ as ≥ 75% of the total protein being present in the soluble fraction. Of the same proteins produced with N-terminal GST tags and with N-terminal (His)6 tags, 9% more were highly soluble with the GST tag. Only 5% of proteins soluble as GST fusions became insoluble following cleavage and removal of the GST tag. Thus, the results show that proteins fused to GST can be more highly soluble and that the advantage may persist after the tag is removed (presumably through improved folding of the purified fusion protein prior to cleavage).
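The soluble fraction behind the ‘highly soluble’ cutoff is estimated by comparing gel scans of total and soluble protein. The sketch below assumes band intensities from densitometry of lanes loaded with equivalent volumes and a single background value for both bands; these are simplifying assumptions, the function names are hypothetical, and clamping to [0, 1] is a design choice to absorb scan noise.

```python
def soluble_fraction(total_band: float, soluble_band: float,
                     background: float = 0.0) -> float:
    """Estimate the soluble fraction from gel-scan band intensities.

    Assumes the total and soluble lanes were loaded with equivalent
    volumes of the same reaction and share one background value.
    """
    total = total_band - background
    soluble = soluble_band - background
    if total <= 0:
        raise ValueError("total band must be above background")
    # Clamp to [0, 1] so small scan-to-scan noise cannot exceed 100%.
    return min(1.0, max(0.0, soluble / total))


def is_highly_soluble(fraction: float, threshold: float = 0.75) -> bool:
    """'Highly soluble' means >= 75% of total protein in the soluble fraction."""
    return fraction >= threshold


if __name__ == "__main__":
    # Hypothetical intensities for the same target with two different tags.
    his_tag = soluble_fraction(1000, 600)
    gst_tag = soluble_fraction(1000, 850)
    print(is_highly_soluble(his_tag), is_highly_soluble(gst_tag))
```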
We have gathered statistics specific to human proteins. Of 174 human targets (most with unknown function) that were successfully cloned, 135 (78%) showed expression at levels suitable for structural investigations. Of these expressed proteins, 55 (41%) were soluble at levels needed for NMR spectroscopy. Of these, 36 (66%) gave [15N]-labeled samples at levels that could be evaluated by NMR spectroscopy. To date, nine of these human proteins have yielded NMR structures. In total, CESG has determined NMR structures of 18 eukaryotic proteins produced by this methodology (Fig. 3). The average yield of purified, labeled human proteins made for NMR structural studies has been 0.3 mg·mL−1 reaction mixture.
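The percentages above are step-over-previous-step ratios, the same convention stated in footnote b of Table 1. The Python sketch below reproduces that bookkeeping for the human counts and adds the overall fraction of cloned targets that reached a structure; the function names are illustrative.

```python
from typing import List, Sequence


def stepwise_percentages(counts: Sequence[int]) -> List[float]:
    """Percent of targets at each step relative to the previous step."""
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    # A step following zero survivors is reported as 0% instead of dividing by zero.
    return [100.0 * cur / prev if prev else 0.0
            for prev, cur in zip(counts, counts[1:])]


def overall_percentage(counts: Sequence[int]) -> float:
    """Percent of the starting targets that survive to the last step."""
    return 100.0 * counts[-1] / counts[0]


# Human targets from the text: cloned -> expressed -> soluble
# -> evaluable [15N] sample -> NMR structure.
human = [174, 135, 55, 36, 9]
steps = ["expressed", "soluble", "[15N] sample", "structure"]
for label, pct in zip(steps, stepwise_percentages(human)):
    print(f"{label:>14}: {pct:5.1f}% of previous step")
print(f"overall: {overall_percentage(human):.1f}% of cloned targets")
```

Applied to the Arabidopsis counts in Table 1 (381, 351, 269, 120, 76, 17, 9, 9, 8), the same function gives the percentages listed in that row once rounded to whole numbers.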
Costs
Labor savings, coupled with the high level of incorporation of labeled amino acids and the high yield of folded protein samples, make the overall cost of the wheat germ cell-free method comparable to that of the E. coli cell-based approach for NMR structure determinations of eukaryotic proteins. One of the main advantages of the automated wheat germ cell-free protein expression system is that the overall process requires much less time and effort compared to our current cell-based methods. Not including the cloning steps, it generally takes 48 h (using the GeneDecoder1000™), or 72 h (manually), to screen 96 targets for expression and solubility on the small scale. The purification protocols also require less time and effort than cell-based protocols because of the smaller volumes (4–12 mL versus 500–1000 mL) and higher initial purity. Using the latest General Electric Healthcare HIS-TRAP purification technology (Piscataway, NJ), immobilized metal affinity chromatography (IMAC) purification of His-tagged proteins requires 40 min of processing time and results in protein samples that are 75–85% pure. Gel filtration adds an additional 3 h and can increase the purity to > 95% for proteins < 15 kDa and to 90% for proteins < 20 kDa. GST purification results in > 95% purity regardless of size; however, the minimal time to process the sample is greater than 10 h.
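The time and purity figures quoted for the three purification routes can be collected into a small lookup. This sketch assumes the route names shown (hypothetical labels), reads "90% for proteins < 20 kDa" as applying to the 15–20 kDa range, and returns None where the text gives no figure.

```python
from typing import NamedTuple, Optional


class PurificationEstimate(NamedTuple):
    min_hours: float       # processing time quoted in the text (lower bound for GST)
    purity: Optional[str]  # purity quoted in the text, None if not reported


def estimate_purification(route: str, mass_kda: float) -> PurificationEstimate:
    """Look up the time and purity quoted for a purification route."""
    imac_hours = 40 / 60  # IMAC (HIS-TRAP): 40 min, 75-85% pure
    if route == "IMAC":
        return PurificationEstimate(imac_hours, "75-85%")
    if route == "IMAC+gel filtration":
        # Gel filtration adds about 3 h; the achievable purity depends on size.
        if mass_kda < 15:
            purity: Optional[str] = ">95%"
        elif mass_kda < 20:
            purity = "90%"
        else:
            purity = None  # not reported for proteins of 20 kDa or more
        return PurificationEstimate(imac_hours + 3.0, purity)
    if route == "GST":
        # > 95% regardless of size, but more than 10 h of processing.
        return PurificationEstimate(10.0, ">95%")
    raise ValueError(f"unknown route {route!r}")


if __name__ == "__main__":
    for route in ("IMAC", "IMAC+gel filtration", "GST"):
        print(route, estimate_purification(route, 12.0))
```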
Table 1 Statistics on eukaryotic proteins produced by CESG’s wheat germ cell-free platform. Small scale (µg): automated 96 well format production overnight. Large scale (mg): automated 8 × 4 mL production overnight.
Genome | Targets selected | Targets cloned successfully | Targets showing acceptable expression | Targets showing adequate solubility | [15N]-labeled proteins produced | Acceptable [15N-1H]-HSQC spectrum | Protein stable for > 10 days | [13C,15N]-labeled protein made (a) | 3D structures by NMR
Arabidopsis | 381 | 351 (92%) | 269 (77%) | 120 (45%) | 76 (63%) | 17 (22%) | 9 (53%) | 9 (100%) | 8 (89%)
Total | 722 | 654 (91%) | 451 (69%) | 189 (42%) | 123 (65%) | 34 (28%) | 20 (59%) | 20 (100%) | 18 (90%)
a Average yield of purified double-labeled proteins used in structural investigations was 0.3 mg·mL−1 reaction mixture. b Percentages represent the number of successful targets at a given step divided by the number coming from the previous step ((174/191) = 91% in the case indicated).
[Figure 3 panels: (1) Hs.78877, 11 kDa, PDB 2G2B; (2) At5g39720.1, 19 kDa, PDB 2G0Q; (3) At5g66040.1, 14 kDa, PDB 1TQ1; (4) Dr.13312, 12 kDa, PDB 2FB7; (5) At3g01050.1, 13 kDa, PDB 1SE9; (6) At2g24940.1, 11 kDa, PDB 1T0G; (7) At3g51030.1, 14 kDa, PDB 1XFL; (8) Hs.102419, 13 kDa, PDB 1ZR9; (9) Hs.157607, 14 kDa, PDB 2ETT; (10) At2g23090.1, 9 kDa, PDB 1WVK; (11) P62627, dimer, 22 kDa, PDB 1Y4O; (12) At2g46140.1, 19 kDa, PDB 1YYC]
Fig. 3 Examples of three-dimensional solution structures of eukaryotic proteins determined by NMR spectroscopy from labeled samples produced by the wheat germ cell-free platform described here. All structures have been deposited in the Protein Data Bank under the accession codes indicated. The molecular masses of the proteins are indicated; these proteins are relatively small because they were chosen as targets for high-throughput NMR structure determination, which currently has a practical size limit of 25 kDa. (1) Hs.78877 is human allograft inflammatory factor 1. (2) At5g39720.1 is a protein of unknown function from A. thaliana. (3) At5g66040.1 is a single domain sulfurtransferase and is annotated as a senescence-associated protein (sen1-like protein) and ketoconazole resistance protein. (4) Dr.13312 is a protein of unknown function from zebrafish. (5) At3g01050.1 from A. thaliana has a ubiquitin-like fold and may be prenylated at a putative C-terminal CAAX box motif so as to target the protein and its binding partners to a membrane compartment of the cell [32]. (6) At2g24940.1 from A. thaliana gave a structure with a cytochrome b5-like fold but with some resemblance to steroid binding proteins [42]; a subsequent NMR study showed that the protein binds progesterone. This protein failed to express in the E. coli cell-based pipeline. (7) At3g51030.1 is an h1 thioredoxin from A. thaliana [43]. This protein was also produced from E. coli cells; it gave an acceptable HSQC spectrum but failed to crystallize. (8) Hs.102419 is a human C2H2-type zinc finger protein. (9) Hs.157607 is the PX domain of human sorting nexin 22. (10) At2g23090.1 is a partially disordered protein of unknown function from A. thaliana. (11) P62627 from mouse is isoform 1 of Roadblock/LC7, a light chain in the dynein complex [44]. (12) At2g46140.1 from A. thaliana is a late embryogenesis abundant (LEA) protein of a type expressed under conditions of cellular stress, such as desiccation, cold, osmotic stress, and heat [45].
Because stable isotope labeled amino acids required for NMR structure determinations are expensive, it is important that the protein yield per quantity of amino acid supplied be high. With cell-free systems (E. coli or wheat germ), 10% of the labeled amino acid mixture supplied is incorporated into the protein produced and purified.
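A 10% incorporation rate means that roughly ten times the desired protein mass must be supplied as labeled amino acid mixture. The sketch treats incorporation as a simple mass ratio of purified protein to amino acids supplied, which is an assumption (the basis of the percentage is not stated); the function name is hypothetical.

```python
def labeled_amino_acids_needed_mg(protein_mg: float,
                                  incorporation: float = 0.10) -> float:
    """Labeled amino acid mixture (mg) to supply for a target protein yield.

    Assumes incorporation = purified protein mass / amino acid mass supplied.
    """
    if protein_mg < 0:
        raise ValueError("protein_mg must be non-negative")
    if not 0 < incorporation <= 1:
        raise ValueError("incorporation must be in (0, 1]")
    return protein_mg / incorporation


if __name__ == "__main__":
    # Hypothetical target: 2 mg of purified [13C,15N]-protein.
    print(labeled_amino_acids_needed_mg(2.0))
```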
Although the cell-free approach is much less labor intensive than our E. coli cell-based pipeline, it requires more expensive reagents and supplies. Current limitations of the method stem from the restricted availability and high cost of highly active wheat germ extract. These problems should ease as the wheat germ cell-free approach becomes more widespread and as increasing demand for cell-free extract stimulates improvements in production technology. The costs of stable isotope labeled amino acids also may be expected to decrease as demand accelerates. Average supplies costs currently are: US$47 per target for cloning and expression/solubility testing (with unpurified reaction mixture assayed by SDS/PAGE), US$370 per mg for Se-Met protein, US$390 per mg for [15N]-protein, and US$470 per mg for [13C,15N]-protein (with proteins isolated and purified).
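The per-mg supplies costs above can be combined into a rough per-target estimate. In this sketch the quantities (1 mg of [15N]-protein for the HSQC screen and 2 mg of [13C,15N]-protein) are hypothetical, the labeling keys are illustrative labels, and labor, equipment, and failed targets are not included.

```python
# Average supplies costs quoted in the text (US$).
SCREEN_PER_TARGET = 47.0  # cloning plus expression/solubility test
PER_MG = {
    "Se-Met": 370.0,
    "15N": 390.0,
    "13C,15N": 470.0,
}


def supplies_cost(mg_by_label: dict, screened: bool = True) -> float:
    """Estimated supplies cost for one target (labor not included)."""
    cost = SCREEN_PER_TARGET if screened else 0.0
    for label, mg in mg_by_label.items():
        if label not in PER_MG:
            raise ValueError(f"unknown labeling scheme {label!r}")
        if mg < 0:
            raise ValueError("quantities must be non-negative")
        cost += PER_MG[label] * mg
    return cost


if __name__ == "__main__":
    # Hypothetical NMR target: 1 mg [15N] protein, then 2 mg [13C,15N] protein.
    print(supplies_cost({"15N": 1.0, "13C,15N": 2.0}))
```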
The major advantages of the wheat germ cell-free method over the E. coli cell-based pipeline are that it supports the production of a larger fraction of targets as folded, soluble protein and that it is much faster to prepare additional samples or truncated samples as needed for successful structure determinations. The E. coli approach has a cost advantage when its protein yields are much higher than those from the cell-free approach. The overall costs of each approach appear to be similar for NMR structure determinations.
Prospects
Because of the complementarity of cell-free and cell-based methods, we envision that it will be most efficient to screen each new target by both methods. Initially, we did not have an easy way to screen targets by the two approaches, because the cell-based pipeline used ligation-independent cloning technology, whereas the cell-free pipeline used ligation cloning into the pEU vector. To remedy this, we recently implemented a cloning strategy that enables efficient small-scale screening by both cell-free and cell-based methods [40]; this approach uses Promega’s Flexi Vector technology to transfer the target gene from one plasmid to another. By comparing the small-scale screening results from the two platforms, we can now choose the one more likely to be successful. If the cell-based approach is selected for an NMR target, we use a self-induction medium developed for producing [15N]- or [13C,15N]-labeled protein from E. coli cells [41].
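The text does not spell out how the two sets of small-scale screening results are compared. As a hypothetical illustration only, the sketch below encodes the two criteria stated in this article: prefer whichever platform gives folded, soluble protein, and, when both succeed, favor the E. coli route only if its projected yield is much higher. The `ScreenResult` fields, the `choose_platform` function, and the default `yield_ratio` of 5 are assumptions, not CESG parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScreenResult:
    """Outcome of small-scale screening on one platform (hypothetical fields)."""
    soluble: bool         # folded, soluble product observed in the screen
    projected_mg: float   # projected mg of purified protein at scale-up


def choose_platform(cell_free: ScreenResult, cell_based: ScreenResult,
                    yield_ratio: float = 5.0) -> Optional[str]:
    """Pick the platform more likely to succeed for one target.

    Returns "cell-free", "cell-based", or None if neither screen gave
    soluble protein. yield_ratio is an assumed threshold: how many times
    higher the E. coli projected yield must be before its cost advantage
    is taken to outweigh the cell-free platform.
    """
    if cell_free.soluble and not cell_based.soluble:
        return "cell-free"
    if cell_based.soluble and not cell_free.soluble:
        return "cell-based"
    if not (cell_free.soluble or cell_based.soluble):
        return None  # failed on both platforms: candidate for redesign
    # Both soluble: prefer the cell-based route only when its yield is much higher.
    if cell_based.projected_mg >= yield_ratio * cell_free.projected_mg:
        return "cell-based"
    return "cell-free"
```

For example, a target that is soluble on both platforms with projected yields of 1 mg (cell-free) and 8 mg (cell-based) is routed to the cell-based pipeline under the default threshold, whereas 2 mg versus 4 mg stays with the cell-free platform.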
The largest remaining bottlenecks associated with the wheat germ cell-free protocol are the limited solubility, aggregation, or limited stability exhibited by many targets. Improvements in any of these areas would greatly lower the costs of structure determinations. Our ongoing research aims to investigate the reasons for these types of failure and to develop approaches for rescuing failed targets. Some structural genomics centers start multiple constructs for each selected target (different N- and C-termini, different fusions, or different vectors and hosts) and choose the one that yields the most soluble protein. We have initiated a pilot study to determine whether initially producing constructs with multiple N- and C-termini for small-scale screening would be more efficient than our current approach of redesigning failed constructs.
Currently, CESG’s X-ray structure pipeline requires on the order of 10 mg of Se-Met protein for each target. We anticipate that, as reliable small-scale crystallization screening methods become available, the wheat germ cell-free method could become part of the X-ray crystallography pipeline. We have already determined by mass spectrometry that the wheat germ cell-free approach supports high-level incorporation of Se-Met, and we have made small quantities of Se-Met-labeled proteins for use in chip-based (Fluidigm, South San Francisco, CA) crystallization screening.
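The appeal of small-scale crystallization screening follows from simple arithmetic on the Se-Met rate quoted earlier in this section (US$370 per mg). The snippet below is only that multiplication; `se_met_supply_cost` is a hypothetical helper, and the 0.5 mg figure for a chip-based screen is an assumed value, not one reported here.

```python
SE_MET_COST_PER_MG = 370  # US$ per mg of purified Se-Met protein (quoted average)


def se_met_supply_cost(mg_required: float) -> float:
    """Supply cost of producing mg_required of Se-Met protein by cell-free synthesis."""
    if mg_required < 0:
        raise ValueError("mg_required must be non-negative")
    return SE_MET_COST_PER_MG * mg_required


current = se_met_supply_cost(10)        # ~10 mg per target today: 3700
small_scale = se_met_supply_cost(0.5)   # hypothetical 0.5 mg screen: 185.0
```

At the quoted rate, the current requirement of about 10 mg corresponds to roughly US$3,700 in supplies per target, so supply cost falls in direct proportion to the amount of protein a screen needs.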
Acknowledgements
We gratefully acknowledge the work of all CESG staff members and collaborators, and fruitful interactions with Professor Y Endo and his group at Ehime University, Matsuyama, Japan, and with staff members of CellFree Sciences Co., Ltd (Yokohama, Japan) in adapting their technology to research and production environments. Supported by NIH grants 1U54 GM074901 (which supports CESG) and P41 RR02301 (which supports the National Magnetic Resonance Facility at Madison, where NMR spectroscopy was carried out).
References
1 Kramer G, Kudlicki W, Hardesty B, Higgens SJ & Hames BD (1999) Cell-free coupled transcription-translation systems from Escherichia coli. In Protein Expression. A Practical Approach (Higgens SJ & Hames BD, eds), pp 201–223. Oxford University Press, Oxford, UK.
2 Clemens MM, Prujin GJ, Higgens SJ & Hames BD (1999) Protein synthesis in eukaryotic cell-free systems. In Protein Expression. A Practical Approach (Higgens SJ & Hames BD, eds), pp 129–165. Oxford University Press, Oxford, UK.
3 Cubeddu L, Moss CX, Swarbrick JD, Gooley AA, Williams KL, Curmi PM, Slade MB & Mabbutt BC (2000) Dictyostelium discoideum as expression host: isotopic labeling of a recombinant glycoprotein for NMR studies. Protein Expr Purif 19, 335–342.
4 Strauss A, Bitsch F, Cutting B, Fendrich G, Graff P, Liebetanz J, Zurini M & Jahnke W (2003) Amino-acid-type selective isotope labeling of proteins expressed in baculovirus-infected insect cells useful for NMR studies. J Biomol NMR 26, 367–372.
5 Bruggert M, Rehm T, Shanker S, Georgescu J & Holak TA (2003) A novel medium for expression of proteins selectively labeled with 15N-amino acids in Spodoptera frugiperda (Sf9) insect cells. J Biomol NMR 25, 335–348.
6 Goff SA & Goldberg AL (1987) An Increased Content of Protease LA, the Lon Gene-Product, Increases Protein-Degradation and Blocks Growth in Escherichia coli. J Biol Chem 262, 4508–4515.
7 Maurizi MR (1987) Degradation in vitro of bacteriophage lambda N protein by Lon protease from Escherichia coli. J Biol Chem 262, 2696–2703.
8 Chrunyk BA, Evans J, Lillquist J, Young P & Wetzel R (1993) Inclusion-Body Formation and Protein Stability in Sequence Variants of Interleukin-1-Beta. J Biol Chem 268, 18053–18061.
9 Shi J, Pelton JG, Cho HS & Wemmer DE (2004) Protein signal assignments using specific labeling and cell-free synthesis. J Biomol NMR 28, 235–247.
10 Torizawa T, Shimizu M, Taoka M, Miyano H & Kainosho M (2004) Efficient production of isotopically labeled proteins by cell-free synthesis: a practical protocol. J Biomol NMR 30, 311–325.
11 Yabuki T, Kigawa T, Dohmae N, Takio K, Terada T, Ito Y, Laue ED, Cooper JA, Kainosho M & Yokoyama S (1998) Dual Amino Acid-Selective and Site-Directed Stable-Isotope Labeling of the Human c-Ha-Ras Protein by Cell-Free Synthesis. J Biomol NMR 11, 295–306.
12 Kigawa T, Muto Y & Yokoyama S (1995) Cell-Free Synthesis and Amino Acid-Selective Stable Isotope Labeling of Proteins for NMR Analysis. J Biomol NMR 6, 129–134.
13 Kainosho M, Torizawa T, Iwashita Y, Terauchi T, Mei Ono A & Güntert P (2006) Optimal isotope labelling for NMR protein structure determinations. Nature 440, 52–57.
14 Klammt C, Lohr F, Schafer B, Haase W, Doetsch V, Rüterjans H, Glaubitz C & Bernhard F (2004) High level cell-free expression and specific labeling of integral membrane proteins. Eur J Biochem 271, 568–580.
15 Henrich B, Lubitz W & Plapp R (1982) Lysis of Escherichia coli by induction of Cloned Phi-X174 Genes. Mol Gen Genet 185, 493–497.
16 Guignard L, Ozawa K, Pursglove SE, Otting G & Dixon NE (2002) NMR analysis of in vitro-synthesized proteins without purification: a high-throughput approach. FEBS Lett 524, 159–162.
17 Kohno T (2005) Production of proteins for NMR studies using the wheat germ cell-free system. Methods Mol Biol 310, 169–185.
18 Kigawa T & Yokoyama S (2002) [High-throughput cell-free protein expression system for structural genomics and proteomics studies]. Tanpakushitsu Kakusan Koso 47, 1014–1019.
19 Yokoyama S (2003) Protein expression systems for structural genomics and proteomics. Curr Opin Chem Biol 7, 39–43.
20 Kigawa T, Yabuki T, Yoshida Y, Tsutsui M, Ito Y, Shibata T & Yokoyama S (1999) Cell-free production and stable-isotope labeling of milligram quantities of proteins. FEBS Lett 442, 15–19.
21 Kim DM, Kigawa T, Choi CY & Yokoyama S (1996) A Highly Efficient Cell-Free Protein Synthesis System from Escherichia coli. Eur J Biochem 239, 881–886.
22 Yokoyama S, Hirota H, Kigawa T, Yabuki T, Shirouzu M, Terada T, Ito Y, Matsuo Y, Kuroda Y, Nishimura Y, Kyogoku Y, Miki K, Masui R & Kuramitsu S (2000) Structural genomics projects in Japan. Nat Struct Biol 7 (Suppl.), 943–945.
23 Kim DM & Swartz JR (2000) Prolonging cell-free protein synthesis by selective reagent additions. Biotechnol Prog 16, 385–390.
24 Yin G & Swartz JR (2004) Enhancing multiple disulfide bonded protein folding in a cell-free system. Biotechnol Bioeng 86, 188–195.
25 Chumpolkulwong N, Hori-Takemoto C, Hosaka T, Inaoka T, Kigawa T, Shirouzu M, Ochi K & Yokoyama S (2004) Effects of Escherichia coli ribosomal protein S12 mutations on cell-free protein synthesis. Eur J Biochem 271, 1127–1134.
26 Madin K, Sawasaki T, Ogasawara T & Endo Y (2000) A highly efficient and robust cell-free protein synthesis system prepared from wheat embryos: Plants apparently contain a suicide system directed at ribosomes. Proc Natl Acad Sci USA 97, 559–564.
27 Endo Y (2001) Genomics to Proteomics: A High-throughput Cell-free Protein Synthesis System for Practical Use. The 3rd ORCS International Symposium on Ribosome Engineering, January 22–23, 2001, Tsukuba, Japan.
28 Kawasaki T, Gouda MD, Sawasaki T, Takai K & Endo Y (2003) Efficient synthesis of a disulfide-containing protein through a batch cell-free system from wheat germ. Eur J Biochem 270, 4780–4786.
29 Morita EH, Sawasaki T, Tanaka R, Endo Y & Kohno T (2003) A wheat germ cell-free system is a novel way to screen protein folding and function. Protein Sci 12, 1216–1221.
30 Sawasaki T, Ogasawara T, Morishita R & Endo Y (2002) A cell-free protein synthesis system for high-throughput proteomics. Proc Natl Acad Sci USA 99, 14652–14657.
31 Sawasaki T, Hasegawa Y, Tsuchimochi M, Kamura N, Ogasawara T, Kuroita T & Endo Y (2002) A bilayer cell-free protein synthesis system for high-throughput screening of gene products. FEBS Lett 514, 102–105.
32 Vinarov DA, Lytle BL, Peterson FC, Tyler EM, Volkman BF & Markley JL (2004) Cell-free protein production and labeling protocol for NMR-based structural proteomics. Nat Methods 1, 149–153.
33 Vinarov DA & Markley JL (2005) High-Throughput Automated Platform for NMR-Based Structural Proteomics. Expert Rev Proteomics 2, 49–55.
34 Vinarov DA, Tyler EM, Loushin Newman CL, Shahan MN & Markley JL (2006) Protein Production using the Wheat Germ Cell-Free Expression System. In Current Protocols in Protein Science (Coligan JE, Dunn BM, Ploegh HL, Speicher DW & Wingfield PT, eds; series ed. Taylor G), pp 5.18.1–5.18.18. Unlimited Learning Resources, Winston-Salem, NC.
35 Tyler RC, Aceti DJ, Bingman CA, Cornilescu CC, Fox BG, Frederick RO, Jeon WB, Lee MS, Newman CS, Peterson FC, Phillips GN Jr, Shahan MN, Singh S, Song J, Sreenath H, Tyler EM, Ulrich EL, Vinarov DA, Vojtik FC, Volkman BF, Wrobel RL, Zhao Q & Markley JL (2005) Comparison of cell-based and cell-free protocols for producing target proteins from the Arabidopsis thaliana genome for structural studies. Proteins 59, 633–643.
36 Betton JM (2003) Rapid translation system (RTS): a promising alternative for recombinant protein production. Curr Protein Pept Sci 4, 73–80.
37 Ryabova LA, Desplancq D, Spirin AS & Pluckthun A (1997) Functional antibody production using cell-free translation: effects of protein disulfide isomerase and chaperones. Nat Biotechnol 15, 79–84.
38 Kang SH, Kim DM, Kim HJ, Jun SY, Lee KY & Kim HJ (2005) Cell-free production of aggregation-prone proteins in soluble and active forms. Biotechnol Prog 21, 1412–1419.
39 Hirano N, Sawasaki T, Tozawa Y, Endo Y & Takai K (2006) Tolerance for random recombination of domains in prokaryotic and eukaryotic translation systems: Limited interdomain misfolding in a eukaryotic translation system. Proteins 64, 343–354.
40 Blommel PG, Martin PA, Wrobel RL, Steffen E & Fox BG (2006) High-efficiency single-step production of expression plasmids from cDNA clones using the Flexi Vector cloning system. Protein Expr Purif 47, 562–570.
41 Tyler RC, Sreenath H, Aceti DJ, Bingman CA, Singh S, Markley JL & Fox BG (2005) Auto-Induction Medium for the Production of [U-15N]- and [U-13C, U-15N]-labeled Proteins for NMR Screening and Structure Determination. Protein Expr Purif 40, 268–278.
42 Song J, Vinarov D, Tyler EM, Shahan MN, Tyler RC & Markley JL (2004) Hypothetical protein At2g24940.1 from Arabidopsis thaliana has a cytochrome b5 like fold. J Biomol NMR 30, 215–218.
43 Peterson FC, Lytle BL, Sampath S, Vinarov D, Tyler E, Shahan M, Markley JL & Volkman BF (2005) Solution structure of thioredoxin h1 from Arabidopsis thaliana. Protein Sci 14, 2195–2200.
44 Song J, Tyler RC, Lee MS, Tyler EM & Markley JL (2005) Solution structure of isoform 1 of Roadblock/LC7, a light chain in the dynein complex. J Mol Biol 354, 1043–1051.
45 Singh S, Cornilescu CC, Tyler RC, Cornilescu G, Tonelli M, Lee MS & Markley JL (2005) Solution structure of a late embryogenesis abundant protein (LEA14) from Arabidopsis thaliana, a cellular stress-related protein. Protein Sci 14, 2601–2609.