…in controlling the flow of genetic information. This packaging system has evolved to index our genomes such that certain genes become readily accessible to the transcription machinery, while other genes are reversibly silenced. Moreover, chromatin-based mechanisms of gene regulation, often involving domains of covalent modifications of DNA and histones, can be inherited from one generation to the next. The heritability of chromatin states in the absence of DNA mutation has contributed greatly to the current excitement in the field of epigenetics.
The past 5 years have witnessed an explosion of new research on chromatin biology and biochemistry. Chromatin structure and function are now widely recognized as being critical to regulating gene expression, maintaining genomic stability, and ensuring faithful chromosome transmission. Moreover, links between chromatin metabolism and disease are beginning to emerge. The identification of altered DNA methylation and histone acetylase activity in human cancers, the use of histone deacetylase inhibitors in the treatment of leukemia, and the tumor suppressor activities of ATP-dependent chromatin remodeling enzymes are examples that likely represent just the tip of the iceberg.
As such, the field is attracting new investigators who enter with little firsthand experience with the standard assays used to dissect chromatin structure and function. In addition, even seasoned veterans are overwhelmed by the rapid introduction of new chromatin technologies. Accordingly, we sought to bring together a useful “go-to” set of chromatin-based methods that would update and complement two previous publications in this series, Volume 170 (Nucleosomes) and Volume 304 (Chromatin). While many of the classic protocols in those volumes remain as timely now as when they were written, it is our hope that the present series will fill in the gaps for the next several years.
This 3-volume set of Methods in Enzymology provides nearly one hundred procedures covering the full range of tools (bioinformatics, structural biology, biophysics, biochemistry, genetics, and cell biology) employed in chromatin research. Volume 375 includes a histone database; methods for preparation of histones, histone variants, modified histones, and defined chromatin segments; protocols for nucleosome reconstitution and analysis; and cytological methods for imaging chromatin functions in vivo. Volume 376 includes electron microscopy and biophysical protocols for visualizing chromatin and detecting chromatin interactions, enzymological assays for histone-modifying enzymes, and immunochemical protocols for the in situ detection of histone modifications and chromatin proteins. Volume 377 includes genetic assays of histones and chromatin regulators, methods for the preparation and analysis of histone-modifying and ATP-dependent chromatin remodeling enzymes, and assays for transcription and DNA repair on chromatin templates. We are exceedingly grateful to the very large number of colleagues representing the field’s leading laboratories, who have taken the time and effort to make their technical expertise available in this series.
Finally, we wish to take the opportunity to remember Vincent Allfrey, Andrei Mirzabekov, Harold Weintraub, Abraham Worcel, and especially Alan Wolffe, co-editor of Volume 304 (Chromatin). All of these individuals had key roles in shaping the chromatin field into what it is today.
C. David Allis
Carl Wu
Editors’ Note: Additional methods can be found in Methods in Enzymology, Vol. 371 (RNA Polymerases and Associated Factors, Part D), Section III, Chromatin, Sankar L. Adhya and Susan Garges, Editors.
METHODS IN ENZYMOLOGY
EDITORS-IN-CHIEF
DIVISION OF BIOLOGY
CALIFORNIA INSTITUTE OF TECHNOLOGY
PASADENA, CALIFORNIA
FOUNDING EDITORS
Sidney P. Colowick and Nathan O. Kaplan
Contributors to Volume 375
Article numbers are in parentheses and following the names of contributors.
Affiliations listed are current.
Chad Alexander (3), The University of Tennessee-Oak Ridge Graduate School of Genome Science and Technology, Oak Ridge National Laboratory, Life Sciences Division, Oak Ridge, Tennessee 37831-8080
Geneviève Almouzni (8), Institut Curie, Section de Recherche, F-75248, Paris Cedex 05, France
Satoshi Ando (18), Department of Molecular Life Science, School of Medicine, Tokai University, Kanagawa 259-1193, Japan
Yunhe Bao (2), Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, Colorado 80523-1870
Blaine Bartholomew (13), Department of Biochemistry & Molecular Biology, Southern Illinois University School of Medicine, Carbondale, Illinois 62901-4413
David P. Bazett-Jones (28), Programme in Cell Biology, Hospital for Sick Children, Toronto, Ontario M5G 1X8, Canada
Andrew S. Belmont (23), Department of Cell and Structural Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801
Leise Berven (16), Children’s Medical Research Institute, Westmead, New South Wales 2415, Australia
Yehudit Birger (21), National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892
Hinrich Boeger (11), Department of Structural Biology, Stanford University School of Medicine, Stanford, California 94305
William M. Bonner (5), Laboratory of Molecular Pharmacology, National Cancer Institute, Bethesda, Maryland 20892
Michael Bruno (14), Division of Gene Regulation and Expression, The Wellcome Trust Biocentre, Department of Biochemistry, University of Dundee, Dundee, DD1 5EH, Scotland, United Kingdom
Gerard J. Bunick (3), Life Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-8080
Michael Bustin (21), National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892
Anne E. Carpenter (23), Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142
Gustavo Carrero (26), Department of Mathematical and Statistical Sciences, Faculty of Science, University of Alberta, Edmonton, Alberta T6G 2E1, Canada
David Carter (29), Laboratory of Chromatin and Gene Expression, Babraham Institute, Cambridge CB2 4AT, United Kingdom
Frédéric Catez (21), National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892
Lyubomira Chakalova (29), Laboratory of Chromatin and Gene Expression, Babraham Institute, Cambridge CB2 4AT, United Kingdom
Srinivas Chakravarthy (2), Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, Colorado 80523-1870
Lakshmi N. Changolkar (15), Department of Animal Biology, School of Veterinary Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104
Lisa Ann Cirillo (9), Department of Cell Biology, Neurobiology, and Anatomy, Medical College of Wisconsin, Milwaukee, Wisconsin 53149
Peter R. Cook (24), The Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, United Kingdom
Ellen Crawford (26), Department of Oncology, Faculty of Medicine, University of Alberta and Cross Cancer Institute, Edmonton, Alberta T6G 2E1, Canada
Wouter de Laat (30), Department of Cell Biology, ErasmusMC, 3015 GE Rotterdam, The Netherlands
Gerda de Vries (26), Department of Mathematical and Statistical Sciences, Faculty of Science, University of Alberta, Edmonton, Alberta T6G 2E1, Canada
Graham Dellaire (28), Programme in Cell Biology, Hospital for Sick Children, Toronto, Ontario M5G 1X8, Canada
John D. Diller (10), Department of Biochemistry and Molecular Biology, Center for Gene Regulation, The Pennsylvania State University, University Park, Pennsylvania 16802
Charles E. Ducker (10), Department of Biochemistry and Molecular Biology, Center for Gene Regulation, The Pennsylvania State University, University Park, Pennsylvania 16802
Pamela N. Dyer (2), Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, Colorado 80523-1870
Raji S. Edayathumangalam (2), Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, Colorado 80523-1870
Thomas G. Fazzio (6), Fred Hutchinson Cancer Research Center, Seattle, Washington 98109-1024
Andrew Flaus (14), Division of Gene Regulation and Expression, The Wellcome Trust Biocentre, Department of Biochemistry, University of Dundee, Dundee, DD1 5EH, Scotland, United Kingdom
Peter Fraser (29), Laboratory of Chromatin and Gene Expression, Babraham Institute, Cambridge CB2 4AT, United Kingdom
Susan M. Gasser (22), Department of Molecular Biology, University of Geneva, 1211 Geneva 4, Switzerland
Stanislaw A. Gorski (25), National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892
Joachim Griesenbeck (11), Department of Structural Biology, Stanford University School of Medicine, Stanford, California 94305
Frank Grosveld (30), Department of Cell Biology, ErasmusMC, 3015 GE Rotterdam, The Netherlands
B. Leif Hanson (3), The University of Tennessee-Oak Ridge Graduate School of Genome Science and Technology, Life Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-8080
Joel M. Harp (3), Department of Biochemistry and Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37232-8725
Keiji Hashimoto (17), Core Research for Evolutional Science and Technology, Saitama 332-0012, Japan
Jeffrey J. Hayes (12), Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, New York 14642
Florence Hediger (22), Department of Molecular Biology, University of Geneva, 1211 Geneva 4, Switzerland
Michael J. Hendzel (26), Department of Oncology, University of Alberta and Cross Cancer Institute, Edmonton, Alberta T6G 1Z2, Canada
Miki Hieda (24), Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, United Kingdom
Stefan R. Kassabov (13), Department of Biochemistry & Molecular Biology, Southern Illinois University School of Medicine, Carbondale, Illinois 62901-4413
Hiroshi Kimura (24), Horizontal Medical Research Organization, School of Medicine, Kyoto University, Kyoto 606-8510, Japan
Roger D. Kornberg (11), Department of Structural Biology, Stanford University School of Medicine, Stanford, California 94305
David Landsman (1), National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894
Paul J. Laybourn (7), Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, Colorado 80523-1870
Jae-Hwan Lim (21), National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892
Karolin Luger (2), Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, Colorado 80523-1870
James G. McNally (27), Laboratory of Receptor Biology and Gene Expression, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892
Tom Misteli (25), National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892
Craig A. Mizzen (19), Department of Cell & Structural Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801
Setsuo Morishita (17), Department of Molecular Biology, School of Science, Nagoya University, Nagoya 464-8601, Japan
Uma M. Muthurajan (2), Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, Colorado 80523-1870
Frank R. Neumann (22), Department of Molecular Biology, University of Geneva, 1211 Geneva 4, Switzerland
Rozalia Nisman (28), Programme in Cell Biology, Hospital for Sick Children, Toronto, Ontario M5G 1X8, Canada
Tom Owen-Hughes (14), Division of Gene Regulation and Expression, The Wellcome Trust Biocentre, Department of Biochemistry, University of Dundee, Dundee, DD1 5EH, Scotland, United Kingdom
John R. Pehrson (15), Department of Animal Biology, School of Veterinary Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104
Craig L. Peterson (4), University of Massachusetts Medical School, Worcester, Massachusetts 01605
Robert D. Phair (25), BioInformatics Services, Rockville, Maryland 20854
Duane R. Pilch (5), Laboratory of Molecular Pharmacology, National Cancer Institute, Bethesda, Maryland 20892
Yuri V. Postnikov (21), National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892
Danny Rangasamy (16), The John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory 2601, Australia
Dominique Ray-Gallet (8), Institut Curie, Section de Recherche, F-75248, Paris Cedex 05, France
Christophe Redon (5), Laboratory of Molecular Pharmacology, National Cancer Institute, Bethesda, Maryland 20892
Raymond Reeves (20), School of Molecular Biosciences, Biochemistry/Biophysics, Washington State University, Pullman, Washington 99164-4660
Patricia Ridgway (16), The John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory 2601, Australia
Chun Ruan (10), Department of Biochemistry and Molecular Biology, Center for Gene Regulation, The Pennsylvania State University, University Park, Pennsylvania 16802
Olga A. Sedelnikova (5), Laboratory of Molecular Pharmacology, National Cancer Institute, Bethesda, Maryland 20892
Michael A. Shogren-Knaak (4), University of Massachusetts Medical School, Worcester, Massachusetts 01605
Robert T. Simpson (10), Department of Biochemistry and Molecular Biology, Center for Gene Regulation, The Pennsylvania State University, University Park, Pennsylvania 16802
Erik Splinter (30), Department of Cell Biology, ErasmusMC, 3015 GE Rotterdam, The Netherlands
Diana A. Stavreva (27), Laboratory of Receptor Biology and Gene Expression, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892
J. Seth Strattan (11), Department of Structural Biology, Stanford University School of Medicine, Stanford, California 94305
Steven A. Sullivan (1), National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894
Ulrica Svensson (16), The John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory 2601, Australia
Angela Taddei (22), Department of Molecular Biology, University of Geneva, 1211 Geneva 4, Switzerland
John Th’ng (26), Northwestern Ontario Regional Cancer Centre, Thunder Bay, Ontario P7A 7T1, Canada
David John Tremethick (16), The John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory 2601, Australia
Toshio Tsukiyama (6), Fred Hutchinson Cancer Research Center, Seattle, Washington 98109-1024
Jay C. Vary, Jr. (6), Molecular and Cellular Biology Program, University of Washington, Seattle, Washington 98195
Cindy L. White (2), Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, Colorado 80523-1870
Sriwan Wongwisansri (7), Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, Colorado 80523-1870
Kinya Yoda (17, 18), Bioscience and Biotechnology Center, Nagoya University, Nagoya 464-8601, Japan
Kenneth S. Zaret (9), Cell and Developmental Biology Program, Fox Chase Cancer Center, Philadelphia, Pennsylvania
[1] Mining Core Histone Sequences from Public Protein Databases
Considerations
Our initial goal was to collect as many reported histone sequences as we could find. Among the considerations that came into play were the following:
1. Sourcing of sequences. Several excellent public sequence repositories make protein sequences available to researchers. We relied on the protein database maintained by the National Center for Biotechnology Information (NCBI), which is updated frequently and has been compiled from worldwide sources, including Swiss-Prot,3 the Protein Information Resource (PIR),4 the Protein Research Foundation (PRF) (http://www.prf.or.jp/en/), the Protein Data Bank (PDB),5 and translations from annotated coding regions in GenBank6 and RefSeq,7 a curated, nonredundant set of sequences.
1 S. Sullivan, D. W. Sink, K. L. Trout, I. Makalowska, P. M. Taylor, A. D. Baxevanis, and D. Landsman, Nucleic Acids Res. 30, 341 (2002).
2 S. A. Sullivan and D. Landsman, Proteins 52, 454 (2003).
3 B. Boeckmann, A. Bairoch, R. Apweiler, M. C. Blatter, A. Estreicher, E. Gasteiger, M. J. Martin, K. Michoud, C. O’Donovan, I. Phan, S. Pilbout, and M. Schneider, Nucleic Acids Res. 31, 365 (2003).
4 C. H. Wu, L. S. Yeh, H. Huang, L. Arminski, J. Castro-Alvear, Y. Chen, Z. Hu, P. Kourtesis, R. S. Ledley, B. E. Suzek, C. R. Vinayaka, J. Zhang, and W. C. Barker, Nucleic Acids Res. 31, 345 (2003).
5 J. Westbrook, Z. Feng, L. Chen, H. Yang, and H. M. Berman, Nucleic Acids Res. 31, 489 (2003).
2. Sequence-harvesting tools. In general, a sequence database search is a similarity search of either the actual sequence data or its annotation. We find that both must be targeted in order to maximize the sequence harvest, because sequence-based searches alone can miss small or ambiguous sequence fragments that have been deposited in the public databases, and text-based searches can miss “cryptic” histones, that is, those with inadequate or incorrect annotation.
For text-based searches of sequence annotation we used the Entrez search engine at the NCBI Web site (http://www.ncbi.nlm.nih.gov/Entrez). For sequence-based searching we used several varieties of the popular Basic Local Alignment Search Tool (BLAST) pairwise alignment algorithm. The most commonly used sequence similarity search tools find “hits” based on pairwise alignments of each sequence in the database to either the query sequence alone, as in BLAST, or a query profile derived from a previously aligned set of similar sequences, as in PSI-BLAST or HMMER.8,9 The latter tools are better at finding highly divergent members of a protein family but can be expected to return false positives, requiring further filtering of results. PSI-BLAST is actually a hybrid tool that performs one round of standard BLAST, using a user-supplied query sequence, and then builds a profile from the alignment of the initial BLAST results, which becomes the query for the next round of BLAST. The process is reiterated until “convergence” is reached, that is, until no more new matches are found above the cutoff score. Ideally this should take fewer than 10 iterations, but convergence can be elusive when the query sequence matches a diverse and perhaps distantly related set of proteins. Convergence was more difficult to interpret in searches for nonhistone proteins containing the histone fold than in harvesting core histone sequences. With the latter, we found that seven iterations were sufficient to reach either convergence or the point at which all the “new” hits appeared by other criteria to be false positives. PSI-BLAST routinely returned a small number of true-positive matches to the query sequences that single-round gapped protein BLAST (BLASTPGP) had missed.
Reasonably fast BLASTPGP and PSI-BLAST servers are available at the NCBI Web site (http://www.ncbi.nlm.nih.gov/BLAST). One advantage of the NCBI Web site PSI-BLAST implementation over a command-line version is that the user can edit each set of aligned sequences before it is used to generate a profile. This can redirect a diverging sequence search back toward convergence. Unfortunately, however, a valid match from one iteration can fall below the noise cutoff in the next, and in the WWW-based implementation, that match is lost. Therefore we ran PSI-BLAST (and BLASTPGP) from the command line in a UNIX environment, which allowed us to save the results from all of the iterations into one file for subsequent text parsing. Running from the command line also allows considerable flexibility in setting other BLAST options. Most default values were adequate for typical BLAST searches, but we commonly increased the number of displayed description lines and alignments (the -v and -b options, respectively) to 3000, to ensure retrieval of all the possible hits for subsequent filtering steps.
6 D. A. Benson, I. Karsch-Mizrachi, D. J. Lipman, J. Ostell, and D. L. Wheeler, Nucleic Acids Res. 31, 23 (2003).
7 K. D. Pruitt, T. Tatusova, and D. R. Maglott, Nucleic Acids Res. 31, 34 (2003).
8 S. F. Altschul, T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman, Nucleic Acids Res. 25, 3389 (1997).
9 S. R. Eddy, Bioinformatics 14, 755 (1998).
3. Query sequences. Histones are ancient proteins, found in all known eukaryote lineages as well as in some archaeal microbes. With a single query sequence, some valid hits might be missed because of the sequence divergence and extreme biodiversity of the histones, even when a profile-generating protocol is used. To maximize the identification of eukaryote core histones from the protein databases, we “bracketed” the kingdom evolutionarily by using core histone sequences from both human and yeast as search queries. This proved important for the more divergent histones, H2A and H2B, but less so for the more conserved histones, H4 and H3. For example, queries with human or yeast H4 or H3 returned almost the same sets of true-positive hits. In H3 searches, the most common outliers that required taxonomic bracketing to capture were sequence fragments from protists and members of the centromeric H3 subclass (data not shown).
4. Sequence redundancy. Sequence redundancy is the bane of most database searches. Redundant sequences in a large public sequence repository such as GenBank are usually the same sequence from the same organism, automatically harvested from different databases, rather than sequences originating from discrete sequencing projects in different laboratories. Web-based sequence similarity search tools, such as PSI-BLAST at the NCBI Web site, therefore tend to present results in a convenient, nonredundant fashion, with the identifiers of identical sequences grouped together under a single anchor sequence. To populate the histone database, however, we required every sequence in FASTA format (i.e., each record consisting of only a unique definition line and a sequence), one reason being that homologous histones display remarkable degrees of sequence identity, rather than mere similarity, across species. It is not uncommon for fully “redundant” histone sequences in the public database to derive from more than one species, and we wanted to start with a set in which such identical sequences are properly resolved. Because we were attempting an exhaustive search, the well-intentioned nonredundancy of the public databases was, for us, an obstacle. Our strategy was to extract all the unique sequence identifiers from the BLAST outputs (in the case of NCBI records, the unique identifier is the GI number found at the beginning of the definition line of a FASTA-formatted record) into a file, and to use this file to generate a corresponding library of FASTA records. NCBI Entrez on the World Wide Web can take a file of GI numbers as input for batch retrieval of records; alternatively, we used the SEALS software suite to perform such retrievals in a UNIX environment.10 SEALS has a tool, fauniq, for reducing a set of redundant FASTA sequence records to a nonredundant format on the basis of either definition-line identifiers such as the GI number or the sequence itself. This tool proved invaluable for filtering BLAST outputs to remove GI-based redundancies and for generating nonredundant sequence sets for alignment and variation analysis.
5. Fragmentary, ambiguous, and frameshifted sequences. Some sequences in the public databases are less than full-length; for example, a few records annotated as “histone H3” consist of only two or three amino acid residues. As sequences shorten, they become less distinguishable from chance hits, and their detection with typical “flavors” of BLAST against a large database becomes more difficult. This problem is compounded if, as is the case with histones, the protein features segments of low sequence complexity, or if the fragment records contain ambiguous (“X”) residues. To capture sequence fragments, we first divided the full-length query sequence into overlapping segments, using a 20-residue window sampled at 10-residue intervals along its length. This was easily done with the SEALS fenestrate command. We then used these segments as queries against the public database in a modified gapped BLAST search optimized to capture short, nearly exact matches (a search option that is also available at the NCBI Web site cited earlier). For these searches, low-complexity filters were turned off. The combined results of all the “window BLASTs” for a query sequence were made nonredundant with respect to GI number.
Frameshifted sequences (either authentic or erroneous) can pose a similar problem, depending on the size of the frameshifted region. Putative frameshifts are easily identified by visual inspection of multiple alignments of query results, for example, using the popular CLUSTAL X program,11 where they manifest as a sudden and extensive loss of sequence similarity.
10 D. R. Walker and E. V. Koonin, Proc. Int. Conf. Intell. Syst. Mol. Biol. 5, 333 (1997).
11 J. D. Thompson, T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins, Nucleic Acids Res. 25, 4876 (1997).
To verify a frameshift, assuming access to the genomic DNA or cDNA record for the protein (which is often, but not always, available in public databases), one should translate the DNA in all frames and add those conceptual translations to the alignment; the correct frames will be visually evident in a true frameshift. Several tools exist on the Web for doing such translations; we commonly use the one at the ExPASy (Expert Protein Analysis System) Web site: http://us.expasy.org/tools/dna.html. A translation tool is also available in the SEALS package.
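The reason given in item 2 for saving every PSI-BLAST iteration (a true hit found in one round can drop below the cutoff in the next) can be illustrated in Python by pooling per-iteration hit sets. This is a minimal sketch that assumes the hits have already been parsed from the saved output into one set of GI numbers per round; parsing depends on the BLAST output format and is not shown, and the GI numbers in the example are hypothetical. Convergence is taken, as in the text, to be the first round that adds no new match.

```python
from typing import List, Optional, Set, Tuple


def pool_iterations(rounds: List[Set[int]]) -> Tuple[Set[int], Optional[int], Set[int]]:
    """Pool the hit sets of successive PSI-BLAST rounds.

    rounds: one set of GI numbers per iteration, in order.
    Returns (all_hits, converged_at, lost):
      all_hits: every GI seen in any round
      converged_at: 1-based round that added no GI not already seen (None if none did)
      lost: GIs seen earlier but absent from the final round, i.e., the hits that
            would be lost if only the last round's output were kept
    """
    all_hits: Set[int] = set()
    converged_at: Optional[int] = None
    for i, hits in enumerate(rounds, start=1):
        if i > 1 and converged_at is None and not (hits - all_hits):
            converged_at = i
        all_hits |= hits
    lost = all_hits - rounds[-1] if rounds else set()
    return all_hits, converged_at, lost


# Hypothetical GI numbers: GI 3 drops below the cutoff after round 1.
rounds = [{1, 2, 3}, {1, 2, 4}, {1, 2, 4, 5}, {1, 2, 4, 5}]
all_hits, converged_at, lost = pool_iterations(rounds)
print(sorted(all_hits), converged_at, sorted(lost))  # [1, 2, 3, 4, 5] 4 [3]
```

Keeping `all_hits` rather than only the final round is what recovers GI 3 here; pooled hits would still go through the false-positive filtering described in the text.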
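Item 5's windowing step (20-residue segments sampled every 10 residues) can be sketched as follows. The function names are hypothetical stand-ins for the SEALS fenestrate command, whose exact handling of the sequence end we have not checked; this sketch assumes one extra window flush with the C-terminus should be added so that no residue goes uncovered. The example input is the amino-terminal 45 residues of histone H3.

```python
from typing import List, Tuple


def window_segments(seq: str, width: int = 20, step: int = 10) -> List[Tuple[int, str]]:
    """Overlapping windows of `width` residues taken every `step` residues.

    Returns (1-based start, segment) pairs. If the stride does not land exactly on
    the C-terminus, one extra window ending at the last residue is added (an
    assumption of this sketch; fenestrate may behave differently).
    """
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    if len(seq) <= width:
        return [(1, seq)]
    starts = list(range(0, len(seq) - width + 1, step))
    if starts[-1] != len(seq) - width:
        starts.append(len(seq) - width)
    return [(s + 1, seq[s:s + width]) for s in starts]


def windows_to_fasta(name: str, seq: str, **kwargs) -> str:
    """Write the windows as FASTA records, one per intended BLAST query."""
    return "\n".join(
        f">{name}_{start}\n{segment}" for start, segment in window_segments(seq, **kwargs)
    )


# 45-residue example: windows start at residues 1, 11, 21, and 26.
query = "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHRYRPGT"
print(windows_to_fasta("H3_query", query))
```

Each record would then be submitted as a separate query in a search tuned for short, nearly exact matches, with low-complexity filtering off as described in item 5, and the pooled results made nonredundant by GI number.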
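For the frameshift check in item 5, the conceptual translation of a DNA record in all six reading frames can be sketched without external libraries, using the standard genetic code (NCBI translation table 1). This is an illustrative stand-in for the ExPASy and SEALS translation tools, not a reimplementation of either: stop codons appear as '*', codons containing anything other than A, C, G, or T become 'X', and incomplete trailing codons are dropped. The short DNA string in the example is hypothetical.

```python
# Standard genetic code (NCBI translation table 1); codon bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna: str, offset: int = 0) -> str:
    """Translate dna from the given offset; unknown codons become 'X', stops '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(offset, len(dna) - 2, 3)
    )


def six_frames(dna: str) -> dict:
    """Conceptual translations of dna in all six reading frames."""
    dna = dna.upper().replace("U", "T")
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna, offset)
        frames[f"-{offset + 1}"] = translate(reverse_complement, offset)
    return frames


for frame, protein in six_frames("ATGGCCTAA").items():
    print(frame, protein)
```

Adding all six translations to the protein alignment (e.g., in CLUSTAL X) shows which frame carries the similarity on each side of a putative frameshift.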
Comparison of Search Strategies
There are many available variations on the basic BLAST search protocol. We investigated several parameters for their effects on the identification of histone H3 sequences. Histone H3 is a moderately diverse histone class, with more than half of the known full-length sequences displaying >80% identity in their histone fold domains; this figure falls between those for the more highly conserved H4 class and the more diverse H2A and H2B classes.2 The H3 class comprises two subclasses that are markedly distinct in sequence and in function: replication-dependent H3 (the major H3) and centromeric H3. There is also a third, replication-independent H3.3 class, although its sequence is only marginally divergent from that of the major H3.
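Figures such as “>80% identity in their histone fold domains” depend on how identity is counted, which the text does not specify. The Python sketch below assumes one common convention: identical residues divided by the number of aligned columns in which neither sequence has a gap. Other conventions (dividing by the full alignment length, or by the length of the shorter sequence) give different values for the same alignment.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences over their ungapped columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # columns with a gap in either sequence are not counted
        compared += 1
        identical += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical aligned fragments: 3 identities over 4 ungapped columns.
print(percent_identity("AR-KQ", "ARTKA"))  # 75.0
```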
We first compiled a redundant reference set of H3 sequences, using a variety of BLAST- and Entrez-based searches, to include as many probable H3 sequence records as we could find in the NCBI protein database. This set was manually reviewed to eliminate false positives, yielding a final set of 1742 good candidate H3 sequences from all three subclasses. We then compared the results of different individual BLAST and Entrez search strategies with the reference set to determine the efficiency (percentage of hits that are true positives, i.e., that are also found in the reference set) and the success (percentage of the reference set found by the search method). The results are shown in Table I. Entrez searches of eukaryotic sequence record annotation used the queries “H3” or “histon.” The BLAST parameters that we varied were BLAST flavor (gapped BLAST versus gapped PSI-BLAST versus gapped BLAST for short, nearly exact window matches); query sequence (human versus yeast); database size (all versus the eukaryotic subset); and SEG low-complexity filtering (off versus on).
The Entrez results indicate that about 17% of the H3 sequences in the public database (the 290 reference sequences not retrieved by the “H3” query) are cryptic, lacking specific annotation as H3 histones. A search with “histon” as the query term recovered 95% of the reference sequences, at the cost of many more false positives, as one would expect. The “histon” query also captured all of the true-positive “H3” query results (data not shown).
TABLE I
Comparison of Search Strategies for H3 Histone Sequences (a)
Search strategy                          Unique GI     H3  Success (%)  Efficiency (%)
Reference H3 set                              1742   1742
Entrez “eukaryota[ORGN]”                 1,143,461   1742        100.0             0.2
Entrez “H3”                                   3303   1452         83.4            44.0
Entrez “histon”                               9297   1653         94.9            17.8
Entrez “eukaryota[ORGN] and H3”               2703   1452         83.4            53.7
Entrez “eukaryota[ORGN] and histon”           7453   1653         94.9            22.2
BLASTPGP H3human                              1747   1719         98.7            98.4
BLASTPGP H3human+eukgi+seg                    1754   1722         98.9            98.2
BLASTPGP H3yeast                              1777   1718         98.6            96.7
BLASTPGP H3yeast+seg                          1777   1718         98.6            96.7
BLASTPGP H3yeast+eukgi                        1780   1718         98.6            96.5
BLASTPGP H3yeast+eukgi+seg                    1780   1718         98.6            96.5
PSIBLASTPGP H3human                           1897   1726         99.1            91.0
PSIBLASTPGP H3human+seg                       1897   1726         99.1            91.0
PSIBLASTPGP H3human+eukgi                     1949   1727         99.1            88.6
PSIBLASTPGP H3human+eukgi+seg                 1949   1727         99.1            88.6
PSIBLASTPGP H3yeast                           2011   1726         99.1            85.8
PSIBLASTPGP H3yeast+seg                       2011   1726         99.1            85.8
PSIBLASTPGP H3yeast+eukgi                     2077   1727         99.1            83.1
PSIBLASTPGP H3yeast+eukgi+seg                 2077   1727         99.1            83.1
WINBLASTPGP H3human                         69,678   1730         99.3             2.5
WINBLASTPGP H3human+eukgi                   60,821   1732         99.4             2.8
WINBLASTPGP H3human+eukgi+seg                 1697   1646         94.5            97.0
WINBLASTPGP H3yeast                         70,864   1730         99.3             2.4
WINBLASTPGP H3yeast+eukgi                   63,949   1730         99.3             2.7
WINBLASTPGP H3yeast+eukgi+seg                 1788   1646         94.5            92.1
(a) Entrez queries of the NCBI protein database were conducted from the NCBI Web site www.ncbi.nlm.nih.gov/Entrez. BLAST searches using human or yeast histone H3 sequences were performed from the command line in a UNIX environment: BLASTPGP, gapped protein BLAST; PSIBLASTPGP, iterated gapped protein BLAST using profiles; WINBLASTPGP, gapped protein BLAST for short, nearly exact matches, using sequence windows as queries; eukgi, search restricted to sequences from eukaryotes; seg, SEG filtering of low-complexity regions enabled. All results were compared with a curated reference set of H3 sequences. Column headers: unique GI, number of unique sequence records retrieved; H3, number of retrieved unique GIs shared with the reference set; success, percent H3/reference set; efficiency, percent H3/unique GI.
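The two measures defined above reduce to simple ratios (in information-retrieval terms, efficiency is precision and success is recall). The Python sketch below recomputes four rows of Table I from the counts the table reports, which also serves as a check on its rounding; the function names are illustrative only.

```python
REFERENCE_SIZE = 1742  # curated reference set of H3 sequences (Table I)


def success(shared: int, reference: int = REFERENCE_SIZE) -> float:
    """Percent of the reference set retrieved by a search."""
    return 100.0 * shared / reference


def efficiency(shared: int, unique_gi: int) -> float:
    """Percent of the unique GIs retrieved that are also in the reference set."""
    return 100.0 * shared / unique_gi


# (unique GI, H3) counts copied from four rows of Table I.
rows = {
    'Entrez "H3"': (3303, 1452),
    "BLASTPGP H3human": (1747, 1719),
    "PSIBLASTPGP H3yeast": (2011, 1726),
    "WINBLASTPGP H3human+eukgi+seg": (1697, 1646),
}
for name, (unique_gi, shared) in rows.items():
    print(f"{name:32s} success {success(shared):5.1f}%  "
          f"efficiency {efficiency(shared, unique_gi):5.1f}%")
```

The printed percentages agree with the corresponding Table I entries to one decimal place.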
Any of the BLAST-based strategies was sufficient to capture at least 94% of the reference set from the public databases. The best combination of efficiency and success was achieved using gapped BLAST (98.6 to 98.9% success at 96.5 to 98.4% efficiency). The effects of differences in query sequence, database size, and filtering were minor compared with the difference between using BLAST, PSI-BLAST, or windowed BLAST, because the latter two BLAST implementations return far more false positives while increasing the success rate only marginally. Low-complexity (low-entropy) sequence filtering appeared to make no difference whatsoever except in the case of windowed searches, in which the query sequence was divided into overlapping segments 20 residues each in length, with several gapped BLAST parameters altered to facilitate finding short, nearly exact matches to the query segments. Using the low-complexity filter here vastly increased efficiency (from 2.4 to 2.8% without filtering to 92.1 to 97.0% with it) by greatly reducing false positives, although success fell (from 99.3 to 99.4% to 94.5%) in comparison with nonfiltered strategies, reflecting the presence of short, often basic low-complexity regions that are a hallmark of core histone sequences.
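Why low-complexity filtering discards so many histone fragment hits can be made concrete with a simple compositional measure. The Python sketch below computes Shannon entropy (bits per residue) in sliding windows; it is a simplified stand-in chosen for illustration, not the SEG algorithm, which uses its own complexity measures and thresholds. The window width here is an arbitrary choice, and the example segments are hypothetical. Windows built from only a few residue types, like the lysine-rich stretches of histone tails, score low.

```python
import math
from collections import Counter
from typing import List, Tuple


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits per residue) of the residue composition of segment."""
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in Counter(segment).values())


def window_entropies(seq: str, width: int = 12) -> List[Tuple[int, float]]:
    """Entropy of every window of `width` residues, as (1-based start, bits) pairs."""
    return [(i + 1, shannon_entropy(seq[i:i + width])) for i in range(len(seq) - width + 1)]


# Hypothetical 12-residue segments: a lysine/alanine-only stretch versus a varied one.
for seg in ("KKAAKKAAKKAA", "MSTELRDVQWGH"):
    print(seg, round(shannon_entropy(seg), 2))
```

A window made of two residue types in equal amounts scores 1 bit per residue, whereas 12 distinct residues score log2(12), about 3.58 bits; a filter that masks low-scoring windows therefore removes exactly the short, basic stretches that windowed histone queries would otherwise match.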
Unfortunately, as these results show, no single method captures all the relevant sequence records. A combination of strategies was the only way to achieve 100% success. However, the results of our comparison suggest a rational way to mine the maximum number of histone sequence records of a class from a database. The first step is to perform a single-round gapped BLAST search, making sure that the options for ‘‘number of descriptions’’ and ‘‘number of alignments’’ returned are set high (e.g., several thousand each). This should return most of the true positives with high efficiency. This set should be inspected carefully, using a variety of tools including text search of the definition lines, multiple alignment, and further BLAST searches with a different query sequence, to remove false positives. The resulting validated set becomes most useful in subsequent searches employing other strategies, such as PSI-BLAST or text-based searches. The validated set can be used to subtract known positives from subsequent search results, using difference-finding tools such as the SEALS fanot command, which finds the logical exclusion of two sets of FASTA records or definition lines. This leaves a much shorter list of candidates from the new search results to be examined for new true positives. As these are identified, they are added to the validated set, increasing its usefulness as a filter. This search strategy has also served us well in harvesting histone H4, H2A, and H2B sequences, and should work for any well-conserved class of protein sequences.
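The subtraction step can be illustrated with a small Python sketch. It is a stand-in for, not a reimplementation of, the SEALS fanot command: it assumes each record is keyed by the first whitespace-delimited token of its definition line and returns the new hits whose keys are absent from the validated set (new AND NOT validated).

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {identifier: sequence}.

    The identifier is taken to be the first whitespace-delimited token
    after '>' (an assumption about the definition-line format).
    """
    records: dict[str, str] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            records[current] = ""  # a repeated identifier replaces the earlier record
        elif current is not None:
            records[current] += line  # join wrapped sequence lines
    return records


def new_candidates(search_hits: str, validated: str) -> dict[str, str]:
    """Hits whose identifiers are not in the validated set (hits AND NOT validated)."""
    known = set(parse_fasta(validated))
    return {rid: seq for rid, seq in parse_fasta(search_hits).items() if rid not in known}


if __name__ == "__main__":
    hits = ">gi|1 histone H3\nARTKQ\nTARKS\n>gi|2 H3.3\nARTKQTARK\n>gi|3 hypothetical\nMSGRG\n"
    validated = ">gi|1 histone H3\nARTKQTARKS\n"
    print(sorted(new_candidates(hits, validated)))
```

Keying on identifiers means an identical sequence deposited under a new accession still surfaces as a candidate; keying on the sequence string instead would suppress such redundant records, at the cost of hiding new database entries.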
Histone Sequence Variants
Histone variants have been divided into ‘‘homomorphous’’ and ‘‘heteromorphous’’ categories.12,13 Homomorphous variants have relatively minor sequence differences and require high-resolution separation methods to distinguish them biochemically (reviewed in von Holt et al.14). They are found in all four core histone classes, and are presumed to be functionally identical. Heteromorphous variants are readily distinguished by conventional biochemical separation methods and tend to be distinct from other histones in their class with respect to function and/or spatiotemporal localization, as well as sequence. The distinction between the two categories of variants is not rigid (for example, the ostensibly homomorphous H3.3 appears to be functionally distinct from the major H3) and may become less so as the functions of more variants are experimentally tested. In clustering trees made from multiple sequence alignments of each histone class, heteromorphous variants tend to form biodiverse clades distinct from the major form, indicating early branching off from major histones, whereas homomorphous variants tend to commingle with the major form in clades that are more strongly delineated by phylogeny than by any other factor, suggesting that the variants arose after the founding speciation event (data not shown; see also Thatcher and Gorovsky15). For all core histone classes, sequence alignments show clear distinctions between metazoan, plant, fungal, and various basal eukaryote subclasses. Distinct subclasses within the metazoan sequences are also common (e.g., insect or echinoderm sequences). Nomenclature is only occasionally helpful in classifying histone variants. It is not standardized, and thus ‘‘H3.2’’ in one species may not be similar to ‘‘H3.2’’ in another. The only other constant among aligned histone sequences, apparent in Figs. 1–4, is that there tends to be less variation in the α-helical regions of the histone fold than in the interhelical loops and the N- and C-terminal regions flanking the histone fold. This pattern of variation is common in other α-helix-containing protein families.
H2A
The H2A class is the most diverse of the four core histone classes, both functionally and in terms of sequence, comprising four subclasses of known or putative functional variants in addition to typical phylogeny-based subclasses (Fig. 1A and B). H2A.X is found in species spanning the eukaryotic spectrum and features a conserved serine four residues from the carboxyl terminus (part of an SQ motif, positions 208 and 209 in Fig. 1A) that is phosphorylated in response to double-stranded DNA breaks, perhaps marking the site for repair (reviewed in Redon et al.16). Interestingly, the fungal H2A subclass clusters near the H2A.X subclass, and also features a conserved SQ motif at its C terminus. H2A.F/Z sequences constitute another pan-eukaryotic subclass and are necessary but not sufficient for H2A function in the organisms tested. Characteristic H2A.F/Z residues in a C-terminal, H3-binding portion of the protein (positions 145–193 in Fig. 1A) have been suggested to impart a specific, although as yet unknown, function, as have the lysine residues in the amino-terminal portion (reviewed in Redon et al.16). Of these lysine residues, two (at positions 11 and 42 in Fig. 1A) appear to be specific to H2A.F/Z and not the major metazoan H2A. MacroH2A is a large bipartite histone divided into a recognizable H2A portion with many subclass-characteristic substitutions, and a long C-terminal extension found in no other histone subclass (residues 227–430 in Fig. 1B). MacroH2A has been found only in vertebrates and is concentrated in the inactive female X chromosome (reviewed in Brown17). H2A-Bbd is a highly divergent subclass, so far found only in mammals, which displays a localization complementary to that of macroH2A; that is, it is excluded from inactive chromosomes.18
12. M. H. West and W. M. Bonner, Biochemistry 19, 3238 (1980).
13. J. Ausio, D. W. Abbott, X. Wang, and S. C. Moore, Biochem. Cell Biol. 79, 693 (2001).
14. C. von Holt, W. F. Brandt, H. J. Greyling, G. G. Lindsey, J. D. Retief, J. D. Rodrigues, S. Schwager, and B. T. Sewell, Methods Enzymol. 170, 431 (1989).
15. T. H. Thatcher and M. A. Gorovsky, Nucleic Acids Res. 22, 174 (1994).
16. C. Redon, D. Pilch, E. Rogakou, O. Sedelnikova, K. Newrock, and W. Bonner, Curr. Opin. Genet. Dev. 12, 162 (2002).
17. D. T. Brown, Genome Biol. 2, Reviews 0006 (2001).
18. B. P. Chadwick and H. F. Willard, J. Cell Biol. 152, 375 (2001).
Fig. 1. Summary of H2A subclasses and variants. (A) A consensus sequence of all aligned H2A sequences is shown at the top. Dots in the sequences below indicate identity to the consensus. Groups are named on the basis of clustering patterns observed in neighbor-joining trees of aligned H2A sequences (not shown). Names, a selection of sequence descriptors found in the definition lines of the sequence records; seq, number of unique sequences in the group; sp, number of species in the group; max sp/seq, the greatest number of species having the same sequence in the group. For each group the first line is the consensus sequence for that group. Variations from the group consensus are indicated below it. Italic indicates a ‘‘singleton,’’ i.e., the residue was found in only one sequence from one species in the group. An asterisk (*) indicates singleton identity or a gap. Background color key: white, identity to the anchored consensus; black, gap; orange, aromatic; yellow, aliphatic/hydrophobic; light green, glycine; green, hydrophilic; light blue, histidine; blue, basic; red, acidic. (B) C-terminal section of macroH2A.
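The H2A.X feature described above, a conserved serine four residues from the carboxyl terminus that forms part of an SQ motif, suggests a simple sequence screen. The Python sketch below flags sequences whose fourth- and third-to-last residues are S and Q. It is illustrative only: fungal H2A sequences also carry a C-terminal SQ motif, so a hit is not by itself evidence of H2A.X membership, and the short example tails are hypothetical.

```python
def has_c_terminal_sq(seq: str, offset: int = 4) -> bool:
    """True if the residue `offset` positions from the C terminus is S and the
    next residue is Q, i.e. the sequence ends ...SQxx when offset=4."""
    seq = "".join(seq.split()).upper()
    if offset < 2 or len(seq) < offset:
        return False
    i = len(seq) - offset  # 0-based index of the candidate serine
    return seq[i] == "S" and seq[i + 1] == "Q"


def flag_sq_records(records: dict[str, str]) -> list[str]:
    """Identifiers of records whose sequences carry the C-terminal SQ motif."""
    return [rid for rid, seq in records.items() if has_c_terminal_sq(seq)]


if __name__ == "__main__":
    toy = {"tail_a": "GKKATQASQEY", "tail_b": "GKKTESHHKAKGK"}  # hypothetical C-terminal tails
    print(flag_sq_records(toy))
```

A position-based test like this assumes the C terminus is complete in the database record; truncated or fragmentary entries would need to be excluded first.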
Fig. 2. Summary of H2B subclasses and variants. A consensus sequence of all aligned H2B sequences is shown at the top. Dots in the sequences below indicate identity to the consensus. Groups are named on the basis of clustering patterns observed in neighbor-joining trees of aligned H2B sequences (not shown). Names, a selection of sequence descriptors found in the definition lines of the sequence records; seq, number of unique sequences in the group; sp, number of species in the group; max sp/seq, the greatest number of species having the same sequence in the group. For each group the first line is the consensus sequence for that group. Variations from the group consensus are indicated below it. Italic indicates a ‘‘singleton,’’ i.e., the residue was found in only one sequence from one species in the group. An asterisk (*) indicates singleton identity or a gap. Background color key: white, identity to the anchored consensus; black, gap; orange, aromatic; yellow, aliphatic/hydrophobic; light green, glycine; green, hydrophilic; light blue, histidine; blue, basic; red, acidic.
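The figure legends describe a dot-identity display against a consensus and three per-group counts (seq, sp, and max sp/seq). The Python sketch below shows one way such a summary could be computed from pre-aligned, equal-length sequences. The majority-rule consensus, its tie-breaking, and the (species, aligned sequence) input format are assumptions for illustration, not the pipeline used to prepare the figures.

```python
from collections import Counter, defaultdict


def consensus(aligned: list[str]) -> str:
    """Majority-rule consensus of equal-length aligned sequences.

    Ties go to the residue seen first in the column, because Counter keeps
    insertion order for equal counts.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


def dot_display(seq: str, ref: str) -> str:
    """Show residues identical to the reference as '.', differences as the residue."""
    return "".join("." if a == b else a for a, b in zip(seq, ref))


def group_stats(records: list[tuple[str, str]]) -> dict[str, int]:
    """records: (species, aligned sequence) pairs for one group.

    seq: number of unique sequences; sp: number of species;
    max_sp_per_seq: the most species sharing one identical sequence.
    """
    species_by_seq: dict[str, set[str]] = defaultdict(set)
    for species, seq in records:
        species_by_seq[seq].add(species)
    return {
        "seq": len(species_by_seq),
        "sp": len({species for species, _ in records}),
        "max_sp_per_seq": max(len(v) for v in species_by_seq.values()),
    }


if __name__ == "__main__":
    aln = ["MSGRGK", "MSGRGK", "MAGRGK", "MSGKGK"]  # hypothetical toy alignment
    cons = consensus(aln)
    print(cons)
    for s in aln:
        print(dot_display(s, cons))
```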
Fig. 3. Summary of H3 subclasses and variants. A consensus sequence of all aligned H3 sequences is shown at the top. Dots in the sequences below indicate identity to the consensus. Groups are named on the basis of clustering patterns observed in neighbor-joining trees of aligned H3 sequences (not shown). Names, a selection of sequence descriptors found in the definition lines of the sequence records; seq, number of unique sequences in the group; sp, number of species in the group; max sp/seq, the greatest number of species having the same sequence in the group. For each group the first line is the consensus sequence for that group. Variations from the group consensus are indicated below it. Italic indicates a ‘‘singleton,’’ i.e., the residue was found in only one sequence from one species in the group. An asterisk (*) indicates singleton identity or a gap. Background color key: white, identity to the anchored consensus; black, gap; orange, aromatic; yellow, aliphatic/hydrophobic; light green, glycine; green, hydrophilic; light blue, histidine; blue, basic; red, acidic.
Functional subclasses of H2B sequences have not been positively identified, although at least one tissue-specific form has been identified in mammalian testis (Fig. 2). An echinoderm sperm variant featuring a repeating pentapeptide has also been described (reviewed in von Holt et al.19), indicating that the echinoderm group in Fig. 2 probably could be subdivided further. The N-terminal diversity seen within the plant subclass in Fig. 2 suggests that it, too, could be further subdivided.
at positions 153–156.21 Centromere-specific H3 is found in species ranging from yeast to human, and its deposition has been shown to be replication independent (reviewed in Smith22). It is thought to help specify centromere regional identity within the chromosome. Centromeric H3 displays somewhat more subclass specificity (and considerably more diversity) within the histone fold than other H3 subclasses (Fig. 3), which may reflect a role in forming specialized nucleosomes.
Fig. 4. Summary of H4 subclasses and variants. A consensus sequence of all aligned H4 sequences is shown at the top. Dots in the sequences below indicate identity to the consensus. Groups are named on the basis of clustering patterns observed in neighbor-joining trees of aligned H4 sequences (not shown). Names, a selection of sequence descriptors found in the definition lines of the sequence records; seq, number of unique sequences in the group; sp, number of species in the group; max sp/seq, the greatest number of species having the same sequence in the group. For each group the first line is the consensus sequence for that group. Variations from the group consensus are indicated below it. Italic indicates a ‘‘singleton,’’ i.e., the residue was found in only one sequence from one species in the group. An asterisk (*) indicates singleton identity or a gap. Background color key: white, identity to the anchored consensus; black, gap; orange, aromatic; yellow, aliphatic/hydrophobic; light green, glycine; green, hydrophilic; light blue, histidine; blue, basic; red, acidic.
19. C. von Holt, W. N. Strickland, W. F. Brandt, and M. S. Strickland, FEBS Lett. 100, 201 (1979).
20. K. Ahmad and S. Henikoff, Proc. Natl. Acad. Sci. USA 99(Suppl. 4), 16477 (2002).
21. K. Ahmad and S. Henikoff, Mol. Cell 9, 1191 (2002).
22. M. M. Smith, Curr. Opin. Cell Biol. 14, 279 (2002).
H4
The H4 class is the most conserved of the four core histones. No functional, localization, or expression variants are known, and thus the clustering of its sequences falls entirely along phylogenetic lines (Fig. 4). Complete alignments of all histone proteins can be found at http://genome.nhgri.nih.gov/histones/ . Histones for the various species can be obtained by querying the Histone Sequence Database. The figures for this manuscript are available at http://www.ncbi.nlm.nih.gov/CBBresearch/Landsman/mie/ .
[2] Reconstitution of Nucleosome Core Particles from
Recombinant Histones and DNA
By Pamela N. Dyer, Raji S. Edayathumangalam,
Cindy L. White, Yunhe Bao, Srinivas Chakravarthy,
Uma M. Muthurajan, and Karolin Luger
Introduction
The ability to prepare nucleosome core particles (NCPs), or nucleosomal arrays, from recombinant histone proteins and defined-sequence DNA has become a requirement in many projects that address the role of histone modifications, histone variants, or histone mutations in nucleosome and chromatin structure. This approach offers many advantages, such as the ability to combine histone variants and tail deletion mutants, and the opportunity to study the effect of individual histone tail modifications on nucleosome structure and function.
We have previously described comprehensive protocols for the expression and purification of histones, for the refolding of the histone octamer, and for the reconstitution and purification of crystallization-grade mononucleosomes.1 The previously published version has now been amended, and steps that can be omitted or simplified if high degrees of purity and homogeneity are not an issue are indicated. The cloning strategies for the construction of plasmids containing multiple repeats of defined DNA sequences, and the subsequent large-scale isolation of defined-sequence DNA for nucleosome reconstitution, are described in detail. We also describe adapted procedures to prepare nucleosomes with histones from other species, and for the refolding and reconstitution of (H2A–H2B) dimers and (H3–H4)2 tetramers. Methods to reconstitute nucleosomes from different histone subcomplexes are also described. A flow chart for all procedures involved in the preparation of ‘‘synthetic nucleosomes’’ is given in Fig. 1. Procedures described here are indicated in gray in Fig. 1.
Cloning and Purification of Large Amounts of Defined-Sequence DNA
Cloning Strategy
A general procedure to construct a plasmid containing multiple repeats of a given DNA sequence, based on published strategies,2,3 and to purify large amounts of defined-sequence DNA fragments is outlined below.
Copyright 2004, Elsevier Inc. All rights reserved.
Figure 2 outlines the cloning strategy for fragments containing either the complete desired sequence (Fig. 2A), or one-half of a palindromic DNA fragment (Fig. 2B). Because of the recombination activities in most bacterial cells, long palindromic DNA fragments cannot be amplified, but must be assembled by ligation of two halves. Figure 3 describes the strategy for duplication and outlines procedures for insert preparation. We use pUC-based vectors for these constructs.
1. K. Luger, T. J. Rechsteiner, and T. J. Richmond, Methods Enzymol. 304, 3 (1999).
2. R. T. Simpson, F. Thoma, and J. M. Brubaker, Cell 42, 799 (1985).
3. T. J. Richmond, M. A. Searles, and R. T. Simpson, J. Mol. Biol. 199, 161 (1988).
Fig. 1. Flow chart of methods used for preparation of components for nucleosomes. Procedures that are described in this chapter are shown in gray.
In designing the cloning strategy for creating multiple DNA repeats, the DNA sequence of interest is flanked by restriction sites as shown in Fig. 2, where A is a unique site (e.g., KpnI), B and B′ are sites for enzymes that are compatible, but nonidentical (e.g., BamHI and BglII), and C is a site for an enzyme that is used to excise the fragment from the plasmid (e.g., EcoRV). Here, blunt ends are desirable. If the final DNA fragment is to be generated by large-scale ligation of two shorter fragments (e.g., if palindromic 146-bp DNA fragments are the desired end product), restriction enzyme D should generate overhangs suitable for high-efficiency ligation. We used EcoRI for a perfectly palindromic 146-bp DNA fragment,4 and a HinfI site to generate 147-bp DNA fragments by ligation of two fragments.5 Because large amounts of the restriction enzymes cutting sites C and D will be used, economic considerations also come into play in the cloning strategy.
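Why a B/B′ pair such as BamHI and BglII is useful can be checked with a few lines of Python. Both enzymes cut after the first base of a 6-bp palindromic site and leave the same 5′-GATC overhang, so their ends ligate, yet the hybrid junction matches neither recognition sequence and is not recut. The sketch models only these two enzymes, on the top strand, and is not a general restriction-mapping tool.

```python
# Top-strand recognition sequence and cut position (number of bases 5' of the cut).
# Only the two enzymes named in the example are modeled.
ENZYMES = {
    "BamHI": ("GGATCC", 1),  # G^GATCC
    "BglII": ("AGATCT", 1),  # A^GATCT
}


def overhang(enzyme: str) -> str:
    """5' overhang left by a palindromic site cut at the same offset on both strands."""
    site, cut = ENZYMES[enzyme]
    return site[cut:len(site) - cut]


def compatible(e1: str, e2: str) -> bool:
    """Ends can anneal and ligate if their single-stranded overhangs are identical."""
    return overhang(e1) == overhang(e2)


def hybrid_junction(left: str, right: str) -> str:
    """Top strand of the site formed by ligating a `left`-cut end to a `right`-cut end."""
    lsite, lcut = ENZYMES[left]
    rsite, rcut = ENZYMES[right]
    return lsite[:lcut] + rsite[rcut:]


def recognized_by(junction: str) -> list[str]:
    """Modeled enzymes whose site matches the junction (both sites are
    palindromic, so checking the top strand is sufficient)."""
    return [name for name, (site, _) in ENZYMES.items() if site == junction]


if __name__ == "__main__":
    j = hybrid_junction("BamHI", "BglII")
    print(compatible("BamHI", "BglII"), j, recognized_by(j))
```

This is the property that lets each doubling step reuse the same A/B and A/B′ digests: internal junctions between repeats are immune to B and B′, so only the outermost sites are cut.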
Digestion of the plasmid DNA with A and B creates a vector into which a fragment generated by A and B′ can be ligated, destroying the restriction site at the B–B′ junction (Fig. 3). Thus, with each cloning step, the number of inserts can be doubled. The individual steps for fragment insertion and amplification are described below.
4. K. Luger, A. W. Maeder, R. K. Richmond, D. F. Sargent, and T. J. Richmond, Nature 389,
B′, compatible cohesive ends; C, generates end(s) of actual fragments (large amounts needed); D, used for head–head ligation of two fragments; overhang can be chosen to allow or prohibit self-ligation.
Fig. 3. Strategy for amplification and preparation of large amounts of inserts designed in Fig. 2. (A) Cloning and duplication strategy. Sites for restriction enzymes are indicated by symbols (see inset for legend). (B) Insert preparation from large-scale plasmid preparations (see text for details) for palindromic DNA fragments that undergo ligation. CIP, incubation with calf intestine phosphatase. (C) Insert preparation for nonpalindromic DNA fragments that do not need to be self-ligated.
1. Synthesize and anneal pair(s) of suitable oligonucleotides (oligos). Follow standard cloning procedures to insert the fragment into a suitable high-copy plasmid via restriction sites A and B′.
2. Cut the plasmid containing the proper insert with restriction enzymes A and B (digest 1). Purify the vector DNA.
3. Cut the plasmid containing the insert with a second digest of restriction enzymes A and B′ (digest 2). Purify the insert DNA away from the plasmid vector and keep the insert generated by the digest.
4. Ligate the insert DNA (created by digest 2) with the vector DNA (created by digest 1).
5. Repeat steps 2–4. Each repetition doubles the number of previously present insert copies. Depending on the length of the insert, about 16 to 24 inserts can be obtained easily. Use HB101 cells or other RecA-minus host cells for plasmid amplification. The following statistics give the experimental amplification efficiencies found by our laboratory for each doubling cycle: 1 → 2, 100% efficiency; 2 → 4, 70% efficiency; 4 → 8, 60% efficiency; 8 → 16, 40% efficiency.
6. Assay for total size of the insert by digestion with restriction enzymes A and B′, and check for integrity of inserts by sequencing (early stages) and by cutting with C.
7. If efficiencies for duplication are low, try ligation of a 2-mer or 4-mer instead of duplication, to increase insert number.
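The doubling efficiencies in step 5 can be turned into a rough screening budget. The Python sketch below assumes (an interpretation, not stated in the text) that each efficiency is the probability that a randomly picked clone carries the correctly doubled insert and that clones are independent; it then finds the smallest number of colonies n for which 1 - (1 - p)^n reaches a chosen confidence, 95% by default.

```python
import math

# Doubling efficiencies from step 5: (copies before, copies after) -> efficiency.
EFFICIENCY = {(1, 2): 1.00, (2, 4): 0.70, (4, 8): 0.60, (8, 16): 0.40}


def colonies_to_screen(p: float, confidence: float = 0.95) -> int:
    """Smallest n such that 1 - (1 - p)**n >= confidence, treating p as the
    probability that one randomly picked clone is correct (an assumption)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    if p == 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


def cycles_needed(target_copies: int, start_copies: int = 1) -> int:
    """Doubling cycles (steps 2-4) needed to reach at least target_copies."""
    return math.ceil(math.log2(target_copies / start_copies))


if __name__ == "__main__":
    for (before, after), p in EFFICIENCY.items():
        print(f"{before} -> {after}: screen {colonies_to_screen(p)} colonies")
    print("cycles from 1 to 16 copies:", cycles_needed(16))
```

Under these assumptions, screening three colonies at the 2 → 4 step, four at 4 → 8, and six at 8 → 16 gives a 95% chance of at least one correct clone. Because pure doubling from one copy jumps from 16 to 32, a 24-copy insert would come from combining a 16-mer with an 8-mer, in the spirit of step 7.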
Large-Scale Plasmid Purification
This method has been adapted from the original alkaline lysis protocol described earlier.6 It has been optimized for high yields and purity of pUC-based plasmids containing 24 copies of a 146-bp (or 84-bp) insert.
12 wide-bottom 4-L Fernbach flasks
Buffers and Reagents
Alkaline lysis solution I: 50 mM glucose, 25 mM Tris-HCl (pH 8.0),
CIA: Chloroform–isoamyl alcohol (24:1, v/v)
3 M Sodium acetate (pH 5.2), autoclaved
PAGE [10% polyacrylamide, 0.2× Tris–borate–EDTA (TBE)]
40% PEG 6000, autoclaved
Phenol, Tris–EDTA (TE) equilibrated
RNase A (DNase free6)
TE 10/0.1: 10 mM Tris-HCl (pH 8.0), 0.1 mM Na-EDTA; autoclaved
TE 10/50: 10 mM Tris-HCl (pH 8.0), 50 mM Na-EDTA; autoclaved
T4 DNA ligase (200,000 U/ml)
Terrific broth (TB): 1.2% (w/v) Bacto Tryptone, 2.4% (w/v) yeast extract, 0.4% (v/v) glycerol. Adjust autoclaved and cooled medium to a final concentration of 17 mM KH2PO4 and 72 mM K2HPO4
Plasmid Purification
1. Inoculate each of four 5-ml precultures containing TB (or 2× TY; see Histone Expression and Purification, below) and ampicillin (100 µg/ml) with a colony from a freshly transformed plate. Shake for 3–4 h at 37°. Transfer all precultures to a 500-ml flask containing 100 ml of 2× TY and ampicillin (100 µg/ml), and incubate for 2–3 h at 37° until turbid. Do not grow to saturation. Transfer equal amounts of the preculture to 12 Fernbach flasks containing 500 ml of TB and ampicillin at 100 µg/ml. Incubate under vigorous shaking for 16–18 h at 37°. Harvest cells by centrifugation in 500-ml centrifuge bottles. The fresh-weight yield is 125 g of cells. Cells should be processed immediately for optimal yields.
2. Resuspend cells from 6 liters of cell culture in a total of 360 ml of alkaline lysis solution I by passage through a 10-ml plastic pipette. Redistribute the cells equally back into the six centrifuge bottles. Add 120 ml of alkaline lysis solution II to each bottle. Mix by shaking vigorously at least 20 times, until the thick translucent suspension is completely free of any clumps of cells. Incubate on ice, and shake repeatedly for a total of 10 to 20 min. Break up large clumps that still remain after such treatment by passage through a 10-ml disposable plastic pipette.
3. Carefully pour 210 ml of ice-cold alkaline lysis solution III down the side of each bottle. Mix by inverting and swirling 10 times and incubate on ice for 20 min. This step is critical because plasmid DNA is renatured, whereas chromosomal DNA precipitates. Viscosity is reduced dramatically during this step. Low yields, or large amounts of chromosomal DNA in the plasmid preparation, may result if mixing is done too slowly.
4. Centrifuge at 10,000g for 20 min at 4°. Warm the rotor to 20° by running it empty at 8000g for 15 min. Pour the supernatant through Miracloth to remove remaining precipitate, and add 0.52 volume of isopropanol. Let stand at room temperature for 15 min.
5. Centrifuge at 10,000g for 30 min at 20° to collect the precipitate. Air dry for 30 min to 1 h. Using a clean spatula, distribute pellets between two 30-ml centrifuge tubes. Use 5 ml of TE 10/50 to rinse out the centrifuge bottles, and adjust each tube to a final volume of 20 ml. Mix the DNA into a homogeneous solution, and then add 120 µl of RNase A (10 mg/ml) and incubate at 37° overnight. (An RNase A stock of 1.2 Kunitz units/µl should be diluted 1:100 in relation to the final reaction, giving ~0.01 Kunitz unit/µl of reaction mix.) The pellets should have dissolved completely. (Store at −20° as necessary.)
6. If the suspension is viscous, dilute with TE 10/50 buffer to up to twice the volume. Extract each 20 ml of suspension with 10 ml of phenol. Centrifuge at 27,000g for 20 min at 20°. The DNA will be in the upper, aqueous phase, separated from the phenol phase by a thick white interphase. Repeat two more times or until the interphase is clear. Extract the aqueous phase with 10 ml of CIA. Spin for 5 min (12,000g, 20°). Transfer the aqueous phase into a 50-ml centrifuge tube and adjust to a final volume of 30 ml with TE 10/50.
7. Precipitate plasmid DNA by adding one-fifth of the original volume of 4 M NaCl (to give 0.5 M NaCl) and two-fifths of the original volume of 40% PEG 6000 [to give 10% (w/v) PEG 6000]. Mix at 37° for 5 min and incubate on ice for 30 min.
8. Centrifuge at 3000g in a swinging-bucket tabletop centrifuge for 20 min at 4°. Decant the supernatant, which contains RNA. Dissolve the pellets in a total of 15 ml of TE 10/0.1 (overnight at room temperature or for less time at 37°). Check both fractions by agarose gel electrophoresis. Fractionation should be complete, and there should be no traces of RNA visible in the plasmid fraction.
9. Extract two times with 10 ml of CIA to remove PEG. Ethanol precipitate the DNA by addition of a 1/10 volume of 3 M sodium acetate (pH 5.2) and 2.5 volumes of cold absolute ethanol. Pellet the DNA, dissolve in 10 ml of TE 10/0.1 by incubating for 1 to several hours at 37°, and determine the total concentration. Yields are usually between 150 and 200 mg.
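The one-fifth and two-fifths volumes in step 7 account for the dilution caused by the additions themselves. The Python sketch below solves for the stock volumes, per volume of DNA solution, that reach given final NaCl and PEG concentrations, and also runs the calculation in reverse. It assumes the 4 M NaCl and 40% (w/v) PEG 6000 stocks listed above and simple volume additivity.

```python
def precipitation_volumes(nacl_target_m: float, peg_target_pct: float,
                          nacl_stock_m: float = 4.0,
                          peg_stock_pct: float = 40.0) -> tuple[float, float]:
    """Volumes of NaCl and PEG stock to add per 1 volume of DNA solution.

    With a and b volumes added, the final volume is T = 1 + a + b, and
    c_nacl * a / T = t_nacl, c_peg * b / T = t_peg. Writing f = t / c for each
    component gives a = f_nacl * T, b = f_peg * T, and T = 1 / (1 - f_nacl - f_peg).
    """
    f_nacl = nacl_target_m / nacl_stock_m
    f_peg = peg_target_pct / peg_stock_pct
    if f_nacl + f_peg >= 1:
        raise ValueError("targets cannot be reached with these stock concentrations")
    total = 1.0 / (1.0 - f_nacl - f_peg)
    return f_nacl * total, f_peg * total


def final_concentrations(nacl_vol: float, peg_vol: float,
                         nacl_stock_m: float = 4.0,
                         peg_stock_pct: float = 40.0) -> tuple[float, float]:
    """Final NaCl (M) and PEG (% w/v) after adding the given stock volumes to 1 volume."""
    total = 1.0 + nacl_vol + peg_vol
    return nacl_stock_m * nacl_vol / total, peg_stock_pct * peg_vol / total


if __name__ == "__main__":
    print(precipitation_volumes(0.5, 10.0))    # step 7 targets
    print(final_concentrations(0.192, 0.346))  # volumes used for insert/vector separation
```

By the same arithmetic, the 0.192 and 0.346 volumes used in the insert purification below give about 0.5 M NaCl and 9% (w/v) PEG 6000, a slightly lower PEG concentration than in step 7.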
Purification of Insert. Experimental details in this section depend on the restriction sites that were chosen in the design of the plasmid. Given the large amounts of DNA present, restriction digests can routinely be performed at plasmid concentrations of 1 mg/ml. Most restriction enzymes are more efficient under these conditions. Optimize reaction conditions before proceeding with large-scale digestions. Below we give conditions that were used for isolation of the palindromic 146-bp DNA fragment derived from human α-satellite DNA that is routinely used for crystallography.4
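Several of the steps that follow specify enzyme amounts per nanomole of restriction site or DNA end (EcoRV, EcoRI, CIAP) or per microgram of DNA (ligase). The Python helper below converts a DNA mass into those units. It assumes an average of 650 g/mol per base pair of double-stranded DNA (a common approximation, not a value from the text); the plasmid length and number of sites must be supplied by the user, and the example plasmid is hypothetical.

```python
# Approximate average molar mass of one base pair of double-stranded DNA (g/mol).
# Rule-of-thumb value assumed here; it is not taken from the text.
BP_MASS_G_PER_MOL = 650.0


def nmol_molecules(mass_mg: float, length_bp: int) -> float:
    """Nanomoles of DNA molecules in mass_mg of DNA that is length_bp long."""
    grams = mass_mg / 1000.0
    return grams / (length_bp * BP_MASS_G_PER_MOL) * 1e9


def units_per_site_dose(mass_mg: float, length_bp: int,
                        sites_per_molecule: int, units_per_nmol: float) -> float:
    """Enzyme units for a dose given per nanomole of site or DNA end.

    For CIAP, dosed per DNA end, use sites_per_molecule = 2 for each linear fragment.
    """
    return nmol_molecules(mass_mg, length_bp) * sites_per_molecule * units_per_nmol


def expected_fragment_mg(plasmid_mg: float, fragment_fraction: float,
                         recovery: float = 1.0) -> float:
    """Mass of excised fragment given the fraction of plasmid length it makes up."""
    return plasmid_mg * fragment_fraction * recovery


if __name__ == "__main__":
    # Hypothetical plasmid: 150 mg of a 6,200-bp construct carrying 25 EcoRV sites.
    sites_nmol = nmol_molecules(150, 6200) * 25
    print(f"EcoRV sites: {sites_nmol:.0f} nmol")
    print(f"EcoRV at 30 U/nmol site: {units_per_site_dose(150, 6200, 25, 30):.0f} U")
```

Working through such numbers before a large-scale digest makes the point about enzyme cost concrete: at gram-scale plasmid preparations, per-site dosing quickly adds up to tens of thousands of units.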
1. The insert is excised with EcoRV, at a plasmid concentration of 1 mg/ml, in sterile 50-ml centrifuge tubes. Use 30 units of EcoRV per nanomole of EcoRV site. Incubate at 37° for at least 16 h, and then check for completion by gel electrophoresis on 10% polyacrylamide gels (0.2× TBE). If the digest is not complete, add 50% more restriction enzyme and incubate for another 15 h. Check the digest as described above.
2. Separate the excised EcoRV fragment from the linearized plasmid by PEG precipitation. Add 0.192 volume of 4 M NaCl and 0.346 volume of 40% PEG 6000. Incubate on ice for 1 h and spin down the vector DNA at 27,000g and 4° for 20 min. Precipitate the EcoRV fragment contained in the supernatant by the addition of 2.5 volumes of cold 100% ethanol. Air dry the DNA briefly (10 min) and dissolve in 5 ml of TE 10/0.1.
3. Determine the concentration. Check both the PEG supernatant and the PEG pellet on a 1% agarose gel and by PAGE as described above (run a series of 1:10 dilutions). There should be no cross-contamination between the two fractions. Yields should be close to 90% (i.e., if the fragments encompass 40% of the entire plasmid, 40 mg of excised fragment should be obtained from 100 mg of plasmid). Note: This procedure will not work for DNA fragments with sticky ends.
4. If the cloned fragment represents the entire sequence, either use it as is (after phenol extraction and ethanol precipitation), or purify it further by ion-exchange chromatography. If further cutting and ligation are required, proceed with step 5.
5. Dephosphorylate the EcoRV fragment by combining EcoRV fragment (1 mg/ml) with calf intestine alkaline phosphatase (CIAP, 1 U/nmol of DNA end; Roche), using the conditions given by the manufacturer. Incubate at 37° for 24 h, and then add 50% of the original amount of CIAP and incubate for another 24 h at 37°. Complete dephosphorylation is essential, because self-ligation of the blunt ends during subsequent steps needs to be avoided. If in doubt, perform a small-scale assay for blunt-end ligation. None should occur if dephosphorylation is complete.
6. Inactivate the CIAP by extracting the DNA solution two times with 50% of the original volume of phenol–CIA (1:1 mixture) and then ethanol precipitate by addition of a 1/10 volume of 3 M sodium acetate (pH 5.2) and 2.5 volumes of cold ethanol. Spin down the precipitated DNA at 3000g (swinging-bucket tabletop centrifuge), air dry the pellet briefly, and dissolve in 5 ml of TE 10/0.1.
7. To create cohesive ends for self-ligation, use EcoRI at 20–30 U/nmol of EcoRI site (substrate concentration, 1 mg/ml) and incubate at 37° for at least 15 h. Check completion of the digest by PAGE. Make sure the digestion is complete before proceeding with the next step.
8. FPLC purify the fragment by chromatography over a TSK-DEAE column (the sample can be loaded directly, or it can be ethanol precipitated to reduce the volume). Ethanol precipitate the FPLC fractions (no need to add salt), air dry the pellet briefly, and dissolve it in 5 ml of TE 10/0.1 or 1× ligation buffer (see below). Yields are typically 85% of the starting amount.
9. Perform a small-scale ligation to test whether ligation can be driven to completion and to assess whether dephosphorylation of the EcoRV ends was complete. Incomplete dephosphorylation would be visible as a ladder resulting from blunt-ended tail–tail ligation of the EcoRV fragments. Use 0.5 U of ligase per microgram of fragment, at a substrate concentration of 1 mg/ml, under conditions given by the manufacturer. Incubate at room temperature for at least 15 h, and check completion of ligation by PAGE. Add more ligase if necessary.
10. If necessary, separate ligated from unligated fragments by ion-exchange chromatography on a TSK-DEAE column (or another ion-exchange column of similarly high resolution). This separation depends strongly on the DNA sequence and must be optimized individually.
Histone Expression and Purification
These procedures, which utilize expression vectors for Xenopus laevis histones,7 have been described extensively.1 We have since used this protocol to express and purify various H2A and H3 histone variants from different species (e.g., Suto et al.8), and histones from yeast (White et al.9; also see Wittmeyer et al.10), Drosophila, and mouse. All these histones have been subcloned in untagged form into the pET vector series (Novagen, Madison, WI). Histidine-tagged histones are also purified in the same way. In some cases, codon usage has been optimized for Escherichia coli, and the time after induction as well as the bacterial strain have been optimized for each case. In some cases, better results are obtained with BL21(DE3) strains that compensate for poor codon usage. All of these proteins are expressed in insoluble form and are isolated from the insoluble fraction obtained after cell lysis (inclusion bodies).
7. K. Luger, T. J. Rechsteiner, A. J. Flaus, M. M. Waye, and T. J. Richmond, J. Mol. Biol. 272, 301 (1997).
8. R. K. Suto, M. J. Clarkson, D. J. Tremethick, and K. Luger, Nat. Struct. Biol. 7, 1121 (2000).
9. C. L. White, R. K. Suto, and K. Luger, EMBO J. 20, 5207 (2001).
10. J. Wittmeyer and T. Formosa, Methods Enzymol. 262, 415 (1995).
Equipment
Dialysis tubing (6- to 8-kDa cutoff, 2.5- to 4-cm flat width)
Ion-exchange column, TSK SP-5 PW resin material
6 wide-bottom Fernbach flasks (4 L)
Tissumizer (Tekmar, Cincinnati, OH) or sonicator for cell lysis
Buffers and Reagents
Ampicillin (100-mg/ml stock solution, sterile filtered)
Chloramphenicol stock solution, 25 mg/ml in ethanol
Dimethyl sulfoxide (DMSO)
Glucose
Lysozyme
Liquid nitrogen
SDS–PAGE equipment: Standard equipment, 18% SDS gels
TYE agar plates: 1.0% (w/v) Bacto Tryptone, 0.5% (w/v) yeast extract, 0.8% (w/v) NaCl, 1.5% (w/v) agar, ampicillin (100 µg/ml), and chloramphenicol (25 µg/ml)
2× TY: 1.6% (w/v) Bacto Tryptone, 1.0% yeast extract, 0.5% NaCl, with antibiotics and 0.1% glucose
Unfolding buffer: 6 M guanidinium-HCl, 20 mM Tris-HCl (pH 7.5),
Histone Expression
1. Transform BL21(DE3)pLysS cells with 0.1 to 1 µg of the histone expression plasmid and plate on TYE agar plates with ampicillin (100 µg/ml) and chloramphenicol (25 µg/ml). Incubate at 37° overnight. For best and most reproducible results, a new transformation should be done each night for the protein that is expressed the next day. For some histones, BL21(DE3)pLysS Codonplus (RIL) or BL21(DE3) cells will give better results.
2. Expression conditions depend on the histone in question and should be optimized individually. For most histones, conditions given in Luger et al.7 are adequate.
3. Inoculate each of four preculture tubes (4 ml of 2× TY with antibiotics and 0.1% glucose) with one colony from the culture plate. Incubate in a shaker at 37°.
4. When the preculture tubes appear slightly turbid (2–3 h), add the contents of all four tubes to a flask containing 100 ml of 2× TY with appropriate antibiotics and glucose. Incubate in a shaker at 37°. For the most reproducible results, do not let precultures grow to saturation.
5. When the 100-ml flask has reached an OD600 of 0.4, distribute the contents evenly into six wide-bottom Fernbach flasks containing 1 liter each of 2× TY medium and appropriate antibiotics and glucose. Incubate in a shaker at 37° until the OD600 reaches about 0.4. Induce expression by addition of IPTG to a final concentration of 0.2–0.4 mM.
6. After 2 h, harvest the cells at room temperature and resuspend the cell pellets in a total of 35 ml of wash buffer. Flash freeze in liquid nitrogen and store at −20° in a 50-ml centrifuge tube.
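Step 5 dilutes 100 ml of culture at OD600 0.4 into six 1-liter flasks and then grows it back to an OD600 of about 0.4 before induction. Assuming exponential growth at a constant doubling time with no lag phase (the doubling time is an assumed input; the text gives none, and it varies with strain, medium, and the histone expressed), the expected outgrowth time can be estimated as in this Python sketch.

```python
import math


def outgrowth_hours(start_od: float, inoculum_ml: float, final_volume_ml: float,
                    target_od: float, doubling_min: float) -> float:
    """Hours for a diluted culture to reach target_od, assuming pure
    exponential growth at a constant doubling time (no lag phase)."""
    od_after_dilution = start_od * inoculum_ml / final_volume_ml
    doublings = max(0.0, math.log2(target_od / od_after_dilution))
    return doublings * doubling_min / 60.0


if __name__ == "__main__":
    # Step 5: 100 ml at OD600 0.4 split into 6 x 1 liter (6,100 ml in total).
    for t_d in (25, 30, 40):  # assumed doubling times in minutes
        print(t_d, round(outgrowth_hours(0.4, 100, 6100, 0.4, t_d), 1))
```

The 61-fold dilution corresponds to roughly six doublings, so the outgrowth time scales directly with whatever doubling time a given expression strain shows; a real culture will also spend some time in lag after transfer.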
Note. Cells expressing histone proteins (especially H4) are prone to lysis and should be centrifuged at room temperature. For the same reason, it is not recommended (or necessary) that the cell pellet be washed. Resuspend the cells well before freezing, as this will improve lysis on thawing. The cell suspension can be stored at −20° or −70°.
Inclusion Body Preparation
1. Lyse the cell suspension by thawing at 37°.
2. Pour the cell extracts into 250-ml centrifuge bottles. At this point, the cells should be viscous. If the cell suspension is still watery, then full lysis has not occurred. In this case, or if no pLysS plasmid has been present, add lysozyme to a concentration of 1 mg/ml and incubate on ice for 30 min. Repeated freeze–thaw cycles also facilitate lysis. Bring the total volume to 100 ml.
3. Blend the cell extracts with the Tissumizer until viscosity is reduced; avoid overheating the sample. A sonicator can also be used with similar results.
4. Spin at 4° for 20 min at 12,000g. Pour off the supernatant and resuspend the tight, solid pellet in 75 ml of wash buffer containing 1% Triton X-100. If the pellet is ‘‘spongy,’’ sonicate/blend (Tissumizer) again. Spin for 20 min as described previously.
5. Repeat the wash once more as described above, and then once with wash buffer lacking Triton X-100. The drained pellet can be stored for a limited time at −20°.
Histone Purification
A two-step purification procedure yielding up to 1 g of highly pure histone protein from 6 liters of induced cells has been described previously.¹ The purification protocol involves gel filtration and HPLC/ion-exchange chromatography under denaturing conditions. If purity is not a major concern, one of the chromatography steps (usually the ion-exchange chromatography) can be omitted. The gel-filtration column can be scaled down accordingly if only small amounts of histones are to be purified. The purified proteins can be stored in lyophilized form for extended periods of time and used in refolding reactions as described below.
Refolding of Histone Octamer
All possible combinations of recombinant Xenopus laevis full-length and globular-domain histone proteins, as well as histones from other species or histone variants, can be refolded into functional histone octamers according to a previously described protocol.¹ The method works best for 6 to 15 mg of total protein; the limiting factor is the size of the gel-filtration column. Much smaller samples can be prepared with an analytical column. Some applications require the preparation of H2A–H2B dimers and (H3–H4)₂ tetramers. The same protocols can be used for refolding and purification of these histone subcomplexes.
Equipment
Dialysis tubing (6- to 8-kDa cutoff, 2.3-cm flat width)
HiLoad 16/60 Superdex 200 HR preparation-grade gel-filtration column (Pharmacia), equipped with UV detector and fraction collector
SDS–PAGE equipment: standard equipment for 18% SDS gels
Concentration device: devices suitable for volumes of up to 25 ml [e.g., Centricon centrifugal filter devices; Amicon Bioseparations (Millipore, Bedford, MA)]
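The HiLoad 16/60 column listed above separates the refolded species by size, and step 4 of the refolding procedure gives approximate peak positions for each. As a bookkeeping aid, collected fractions could be given provisional labels from those volumes. The sketch below is a hypothetical helper, not part of the published protocol: `EXPECTED_PEAKS_ML`, `label_fraction`, the 66.5-ml octamer midpoint and the 3-ml tolerance are assumptions. Because the octamer and tetramer peaks are only about 5 ml apart and positions drift between columns and runs, such labels are only a guide; fraction composition still has to be confirmed by SDS–PAGE.

```python
# Approximate peak elution volumes (ml) on a HiLoad 16/60 Superdex 200 column,
# taken from the refolding protocol (step 4). Real values drift between
# columns and runs, so labels produced here are provisional only.
EXPECTED_PEAKS_ML = {
    "aggregates": 45.0,
    "octamer": 66.5,            # assumed midpoint of the quoted 65-68 ml range
    "(H3-H4)2 tetramer": 72.0,
    "H2A-H2B dimer": 84.0,
}


def label_fraction(elution_ml, tolerance_ml=3.0):
    """Return the species whose expected peak is nearest to `elution_ml`,
    or None if no peak lies within `tolerance_ml` (a hypothetical cutoff,
    not a value from the protocol)."""
    name, peak = min(
        EXPECTED_PEAKS_ML.items(),
        key=lambda item: abs(item[1] - elution_ml),
    )
    return name if abs(peak - elution_ml) <= tolerance_ml else None


if __name__ == "__main__":
    # Print a provisional label for a few example elution volumes.
    for volume in (44.0, 66.0, 72.5, 84.0, 55.0):
        print(volume, label_fraction(volume))
```

Volumes that fall between peaks (55 ml in the example) are deliberately left unlabeled rather than forced onto the nearest species.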
Buffers and Reagents
Purified and lyophilized histones (3- to 4-mg aliquots)
Unfolding buffer: 6 M guanidinium chloride, 20 mM Tris-HCl (pH 7.5), 5 mM DTT. Must be made fresh for good refolding efficiency.
Refolding buffer: 2 M NaCl, 10 mM Tris-HCl (pH 7.5), 1 mM Na-EDTA, 5 mM 2-mercaptoethanol (2-ME)
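Steps 1 and 2 of the octamer refolding procedure rest on simple arithmetic: converting A276 into a concentration with the Beer–Lambert law, then combining the four histones in exactly equimolar amounts at 1 mg/ml total protein. The sketch below illustrates that arithmetic and is not the authors' code. All names (`HistoneStock`, `conc_mg_per_ml`, `equimolar_mix`, `residual_denaturant`) are hypothetical. The example numbers are placeholders, not the Table I values; real molecular weights and extinction coefficients come from Table I or ProtParam. The dialysis estimate additionally assumes that every buffer change reaches equilibrium and that the sample volume stays constant, neither of which holds exactly in practice.

```python
from dataclasses import dataclass


@dataclass
class HistoneStock:
    name: str
    mw: float       # molecular weight, g/mol (take from Table I or ProtParam)
    epsilon: float  # molar extinction coefficient at 276 nm, M^-1 cm^-1
    a276: float     # absorbance of the undiluted stock vs. unfolding buffer


def conc_mg_per_ml(stock: HistoneStock, path_cm: float = 1.0) -> float:
    """Beer-Lambert: c [M] = A / (epsilon * l); c [M] * MW [g/mol] = g/l = mg/ml."""
    return stock.a276 / (stock.epsilon * path_cm) * stock.mw


def equimolar_mix(stocks, total_mg, final_mg_per_ml=1.0):
    """Return (stock volumes in ml, unfolding-buffer volume in ml).

    Equal moles of each histone means each contributes a mass proportional
    to its molecular weight: mass_i = total_mg * mw_i / sum(mw).
    """
    sum_mw = sum(s.mw for s in stocks)
    volumes = {}
    for s in stocks:
        mass_mg = total_mg * s.mw / sum_mw
        volumes[s.name] = mass_mg / conc_mg_per_ml(s)
    final_ml = total_mg / final_mg_per_ml
    buffer_ml = final_ml - sum(volumes.values())
    if buffer_ml < 0:
        raise ValueError("stocks too dilute to reach the target concentration")
    return volumes, buffer_ml


def residual_denaturant(c0_molar, sample_ml, bath_ml, changes):
    """Idealized guanidinium concentration after `changes` dialysis steps,
    assuming equilibrium at each step and a constant sample volume."""
    return c0_molar * (sample_ml / (sample_ml + bath_ml)) ** changes


if __name__ == "__main__":
    # PLACEHOLDER stocks with made-up numbers (each works out to 2 mg/ml);
    # these are NOT real histone values.
    demo = [
        HistoneStock("histone_1", mw=10000.0, epsilon=5000.0, a276=1.0),
        HistoneStock("histone_2", mw=10000.0, epsilon=5000.0, a276=1.0),
        HistoneStock("histone_3", mw=15000.0, epsilon=7500.0, a276=1.0),
        HistoneStock("histone_4", mw=15000.0, epsilon=7500.0, a276=1.0),
    ]
    vols, buf = equimolar_mix(demo, total_mg=10.0)
    print(vols, round(buf, 3))
    # 10 ml sample, three changes of 600 ml refolding buffer, 6 M guanidinium.
    print(residual_denaturant(6.0, 10.0, 600.0, 3))
```

With these placeholder stocks, a 10-mg refolding reaction needs 1 ml each of the two lighter histones, 1.5 ml each of the two heavier ones, and 5 ml of unfolding buffer. Under the idealized dialysis assumptions, three 600-ml changes leave roughly 26 µM guanidinium.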
Histone Octamer Refolding
1. Dissolve each histone aliquot to a concentration of approximately 2 mg/ml in unfolding buffer. Allow unfolding to proceed for at least 30 min and for no more than 3 h. Determine the concentration of the unfolded histone proteins by measuring the absorbance of the "undiluted" solution at 276 nm against unfolding buffer (remove any undissolved particulate matter by centrifugation, if necessary). Extinction coefficients can be obtained from Table I (for full-length Xenopus and yeast histones) or calculated (for histones from other species or histone variants) with the ProtParam tool at http://ca.expasy.org/tools/protparam.html. Note: Using correct extinction coefficients is essential for good refolding yields.
2. Mix the histone proteins in exactly equimolar ratios and adjust to a total final protein concentration of 1 mg/ml with unfolding buffer. Dialyze at 4° against at least three changes of 600 ml of refolding buffer (at least 6 h each; the second or third change should be overnight). Histone octamer should always be kept at 0–4° to avoid dissociation.
3. Remove any precipitated protein by centrifugation. Concentrate to a final volume of approximately 1 ml, using the concentration device. Histone octamers refolded with tailless histones often stick to the filter membrane of the concentration device and take much longer to concentrate. Mix the octamer solution periodically (pipette up and down) to avoid clogging the filtration device.
4. Load the sample onto the gel-filtration column, previously equilibrated with refolding buffer as described.¹ High-molecular-weight aggregates elute after about 45 ml, histone octamer at 65 to 68 ml, (H3–H4)₂ tetramer at about 72 ml, and H2A–H2B dimer at 84 ml (Fig. 4).
5. Check the purity and stoichiometry of the fractions by 18% SDS–PAGE. Dilute samples at least 2.5-fold before loading onto the gel to reduce band distortion caused by the high salt concentration. If the octamer contains globular H3, be aware that globular H3 comigrates with full-length H4, and only two bands will be seen on the gel.⁷
6. Pool the fractions containing octamer and concentrate them, using the concentration device, to 3–15 mg/ml. Determine the concentration of the octamer spectrophotometrically. Extinction coefficients can be
TABLE I. Molecular Weights and Molar Extinction Coefficients (ε) for Full-Length Xenopus laevis and Saccharomyces cerevisiae Histone Proteins