RT–PCR was devised as a method of RNA amplification and quantitation after its conversion to DNA. RT–PCR can be used for cloning, cDNA library construction and probe synthesis. The technique consists of two parts (Figure 4.12): the synthesis of DNA from RNA by reverse transcription (RT) and the subsequent amplification of a specific DNA molecule by the polymerase chain reaction (PCR). The RT reaction uses an RNA template (typically either total RNA or polyA+ RNA), a primer (random or oligo-dT primers), dNTPs, buffer and a reverse transcriptase enzyme (which we will discuss more in Chapter 5) to generate a single-stranded DNA molecule complementary to the RNA (cDNA). The cDNA then serves as a template in the PCR reaction. During the first cycle of PCR, the single DNA strand is made double stranded through the binding of another, complementary, primer and the action of Taq DNA polymerase.
Like other methods of mRNA analysis, such as northern blots and nuclease protection assays, RT–PCR can be used to quantify the amount of mRNA that was contained in the original sample. This type of analysis is particularly important for monitoring changes in gene expression. However, because PCR amplification is exponential, small sample-to-sample concentration and loading differences are amplified as well. Even large differences in target concentration (100-fold or more) may produce the same intensity of band after 25 or 30 PCR cycles. Therefore, RT–PCR requires careful optimization when used for quantitative mRNA analysis. Quantitation usually takes one of two forms, relative or absolute.
• Relative quantitation compares transcript abundance across multiple samples, using a co-amplified internal control for sample normalization. Results are expressed as ratios of the gene-specific signal to the internal control signal. This yields a corrected relative value for the gene-specific product in each sample. These values may be compared between samples for an estimate of the relative expression of target RNA in the samples.
• Absolute quantitation, using competitive RT–PCR, measures the absolute amount (e.g. 5.3 × 10^5 copies) of a specific mRNA sequence within a sample. Dilutions of a synthetic RNA (containing identical primer binding sites, but slightly shorter than the target RNA) are added to the sample and are co-amplified with the target. The PCR product from the endogenous transcript is then compared with the concentration curve created by the synthetic competitor RNA.
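The normalization used in relative quantitation can be illustrated with a short calculation. The following is a minimal sketch, assuming that band intensities have already been measured in arbitrary units; the sample names, the numbers and the function name are hypothetical and are not taken from the text.

```python
def relative_expression(target_signal, control_signal):
    """Return the gene-specific signal divided by the internal control signal."""
    if control_signal <= 0:
        raise ValueError("internal control signal must be positive")
    return target_signal / control_signal


# Hypothetical band intensities (arbitrary units) for two samples.
samples = {
    "untreated": {"target": 1200.0, "control": 4000.0},
    "treated": {"target": 3000.0, "control": 5000.0},
}

# Corrected relative value for the gene-specific product in each sample.
normalized = {
    name: relative_expression(s["target"], s["control"])
    for name, s in samples.items()
}

# Comparing the corrected values between samples estimates relative expression.
fold_change = normalized["treated"] / normalized["untreated"]
print(normalized, fold_change)
```

Dividing by the internal control removes loading differences between samples, so the raw target signals are never compared directly.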
4.9 Real-time PCR
Quantitative real-time RT–PCR combines the best attributes of both relative and competitive RT–PCR in that it is accurate, precise, high throughput and relatively easy to perform. Real-time PCR automates the otherwise laborious process of relative RT–PCR by quantitating reaction products for each sample in every cycle. Real-time PCR systems rely upon the detection and quantitation of a fluorescent reporter, whose signal increases in direct proportion to the amount of PCR product in a reaction. In the simplest form, the reporter is the double-strand DNA-specific dye SYBR Green (Wittwer et al., 1997). SYBR Green binds double-stranded DNA, probably in the minor groove, and, upon excitation, emits light. Thus, if the dye is included in a PCR reaction, the fluorescence increases as PCR product accumulates. The advantages of SYBR Green are that it is inexpensive, easy to use and sensitive. The disadvantage is that SYBR Green will bind to any double-stranded DNA in the reaction, including primer dimers and other non-specific reaction products, which can result in an over-estimation of the target concentration. For single PCR product reactions with well designed primers, SYBR Green can work extremely well, with spurious non-specific background only showing up in very late cycles.
The alternative method for quantifying PCR products is TaqMan, which relies on fluorescence resonance energy transfer (FRET) of hybridization probes for quantitation (Figure 4.13). TaqMan probes are oligonucleotides that contain a fluorescent reporter dye, typically attached to the 5' base, and a quenching dye, typically attached to the 3' base. The probe is designed to hybridize to an internal region of a PCR product. When irradiated, the excited reporter dye transfers energy to the nearby quenching dye molecule rather than fluorescing, resulting in a non-fluorescent substrate. During PCR, when the polymerase replicates a template on which a probe is bound, the 5' to 3' exonuclease activity of the polymerase cleaves the probe (Holland et al., 1991). This separates the fluorescent and quenching dyes and FRET no longer occurs. Fluorescence increases in each PCR cycle, proportional to the rate of probe cleavage, and is measured in a modified thermocycler. Real-time PCR is a powerful quantitative tool, but the cost of reagents and equipment is much higher than that of standard PCR reactions.
Figure 4.13. TaqMan real-time PCR quantification. Three primers are used during the PCR process: two of these (primers 1 and 2) dictate the beginning of DNA replication on each DNA strand, and the third (the probe) binds to one strand in between. The probe contains two modified bases, a fluorescent reporter (R) at its 5'-end and a fluorescence quencher (Q) at its 3'-end. As DNA replication proceeds, the extended product from primer 1 displaces the 5'-end of the probe and the exonuclease activity of the polymerase cleaves the fluorescent reporter from the probe. The separation of the reporter from the quencher allows it to fluoresce. The amount of fluorescence is proportional to the amount of PCR product being made and is measured during each PCR cycle.
4.10 Applications of PCR
• fingerprinting/population analysis
• genome analysis
• quantitative PCR of RNA or DNA.
We will touch on some of these topics in later chapters but, again, interested readers are directed toward more dedicated literature (McPherson and Møller, 2000; Innis, Gelfand and Sninsky, 1999).
5 Cloning a gene
Key concepts
DNA libraries are pools of recombinant DNA molecules.
Genomic libraries contain fragments of all DNA sequences present in the genome.
cDNA libraries are tissue and developmental stage specific. Their formation is dependent on an RNA-dependent DNA polymerase enzyme, reverse transcriptase.
PCR-based libraries negate the requirement for cloned DNA fragments and can undergo subtraction to isolate genes that are differentially expressed.
Genomes contain an enormous amount of DNA (Table 5.1). Consequently, each gene contained within a genome represents only a tiny fraction of the genome size itself. All traditional DNA cloning strategies are composed of four parts: the generation of foreign DNA fragments, the insertion of foreign DNA into a vector, the transformation of the recombinant DNA molecule into a host cell in which it can replicate and a method of selecting or screening clones to identify those that contain the particular recombinant we are interested in. In this chapter we will address some of the particular problems and issues with the first two of these steps in the formation of DNA libraries. A DNA library is simply a collection of DNA fragments.
There are several different types of library that we will consider here. DNA fragment libraries are designated as being either a genomic DNA library or a cDNA library. Most traditional methods of library construction involve the physical cloning of various DNA fragments into a suitable vector. However, as we will see later, DNA fragments that are not cloned (e.g. those derived
Analysis of Genes and Genomes Richard J Reece
2004 John Wiley & Sons, Ltd ISBNs: 0-470-84379-9 (HB); 0-470-84380-2 (PB)
cDNA libraries are constructed by the conversion of mRNA from a particular tissue sample into DNA fragments that can be cloned into an appropriate vector. cDNA libraries thus contain only the coding sequence of genes expressed in a tissue sample together with small regions of the 5' and 3' untranslated portions of the gene. Consequently, cDNA libraries isolated from different tissues of the same organism may be radically different in their composition. The genes expressed in one tissue type or developmental stage may well be different from those expressed in another tissue type or developmental stage. Additionally, the composition of a cDNA library reflects the relative abundance of mRNA in the original tissue sample. Highly expressed genes will be represented in the library multiple times, whereas genes expressed at a low level will be represented in the library less frequently.
5.1 Genomic Libraries
The smallest unit of DNA within a genome is the chromosome. Even in the simplest organisms, however, chromosomes contain an enormous quantity of DNA. For example, the E. coli chromosome contains some 4.6 Mbp (4 600 000 bp) of DNA (Table 5.1). This amount of DNA is far too large to be cloned into any of the vectors currently available (Chapter 3). Therefore it is necessary, and indeed desirable, to fragment the DNA before it is cloned into an appropriate vector. A 'divide and conquer' strategy comes into play here, whereby relatively small fragments of the genome can be assigned a specific function whereas the whole genome is somewhat impenetrable. The method of fragmentation plays an important role in the quality of the final library. Ideally, the genomic DNA should be broken up into random and overlapping fragments prior to cloning. Such cleavage would ensure that the library contains representative copies of all DNA fragments present within the genome, and that fragment bias is not encountered by the cleavage of DNA at specific sites only. There are two basic mechanisms for cleaving DNA that are used in the construction of genomic libraries.
(a) Mechanical shearing. Purified genomic DNA is either passed several times through a narrow-gauge syringe needle or subjected to sonication to break up the DNA into suitable size fragments that can be cloned. Typically, an average DNA fragment size of about 20 kbp is desirable for cloning into λ-based vectors. Mechanical methods such as these have the advantage that DNA fragmentation is random, but suffer from the fact that large quantities of DNA are required, and that the average DNA fragment size may be quite variable.
(b) Restriction enzyme digestion. Restriction enzymes, such as EcoRI, often recognize 6 bp DNA sequences and cleave the DNA within the recognition sequence. On average, a 6 bp DNA sequence will occur approximately every 4000 bp within DNA (4^6 = 4096 bp). Complete digestion of genomic DNA with EcoRI will generate DNA fragments that are generally too small to be useful in genomic library construction. Other restriction enzymes, e.g. NotI, recognize and cleave 8 bp recognition sequences. Such sequences will occur much less commonly within DNA (approximately once every 65 kbp). However, restriction enzyme cleavage to produce DNA fragments suffers as a consequence of the recognition sites themselves. If, by chance, a gene that we would like to clone contains multiple recognition sites for a particular restriction enzyme, then the fragments generated after enzyme digestion may be too small to clone, and consequently the gene may not be represented within a library. To overcome this problem, genomic DNA libraries are usually constructed by digesting the genomic DNA with restriction enzymes in such a way that the digestion does not go to completion (Figure 5.1). Partial restriction digests will ensure that not all DNA recognition sequences are cut and, consequently, that the library produced should contain copies of genes that may possess multiple restriction enzyme recognition sequences. In practice, restriction digestion is normally performed using a restriction enzyme, or often two, that recognize and cleave very commonly occurring sequences. For example, as shown in Figure 5.2, high-molecular-weight genomic DNA is partially cleaved with a mixture of the restriction enzymes HaeIII and AluI. Each of these restriction enzymes recognizes a 4 bp DNA sequence. Their recognition sequences should therefore occur, on average, approximately every 256 bp within genomic DNA. The partial digestion, however, limits the number of restriction enzyme sites that are actually cut and leads to the formation of genomic DNA fragments of a suitable size for cloning. DNA fragments produced in this manner have blunt ends since both HaeIII and AluI cut DNA in a blunt-ended fashion:
HaeIII: 5'-GG CC-3'
        3'-CC GG-5'
AluI:   5'-AG CT-3'
        3'-TC GA-5'
Figure 5.1. The complete and partial digestion of a DNA fragment using a restriction enzyme. (a) Complete digestion ensures that all restriction enzyme recognition sites (RE) are cut. (b) Partial digestion results in the cleavage of a random subset of the recognition sites. Partial digestion will generate a variety of products as indicated.
• Linkers or adaptors. As shown in Figure 5.2, the blunt-ended DNA fragments can be ligated to a series of oligonucleotides that either contain the recognition sequence for a restriction enzyme (linkers) or possess one blunt end for ligation to the genomic DNA and an overhanging sticky end for cloning into particular restriction sites (adaptors). In the case shown here, the DNA fragments are first protected from restriction enzyme cleavage by treatment with a specific DNA methylase (Maniatis et al., 1978). Treatment of the DNA fragments with the EcoRI methylase, in the presence of S-adenosylmethionine, will result in the methylation of the internal-most A residue within the EcoRI recognition sequence (5'-GAATTC-3'). DNA modified in this fashion cannot be cleaved by the restriction enzyme (see Figure 2.1). The oligonucleotide linkers are then added to the methylated DNA in large excess in the presence of high concentrations of DNA ligase. Subsequent treatment with the EcoRI restriction enzyme will result in DNA cleavage only within the linker molecules, which are the only ones that contain non-methylated EcoRI restriction enzyme recognition sequences. The resulting DNA fragments can then be cloned into the EcoRI restriction site of a suitable vector.
• Restriction enzymes that generate sticky ends. The genomic DNA may be initially digested with a commonly occurring restriction enzyme that generates sticky ends. For example, digestion of genomic DNA with the restriction enzyme Sau3AI (recognition sequence 5'-GATC-3') generates DNA fragments that are compatible with the sticky end produced by BamHI (recognition sequence 5'-GGATCC-3') cleavage of a vector. The ease of this second approach makes its use far more prevalent.
Figure 5.2. The construction of a genomic DNA library (labels: high-molecular-weight DNA, >100 kbp; partial restriction digest and size fractionation, 20 kbp; EcoRI methylase; mix and ligate). See the text for details.
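The site frequencies quoted for 4, 6 and 8 bp recognition sequences, and the compatibility of Sau3AI and BamHI ends, can be checked with a small calculation. This is a minimal sketch, assuming that the four bases are equally frequent and independent (real genomes deviate from this). The cut positions in the table are stated as assumptions of the sketch, and the function names are hypothetical.

```python
def expected_site_spacing(site_length):
    """Average distance (bp) between occurrences of one specific site,
    assuming equal and independent base frequencies."""
    return 4 ** site_length


for enzyme, site in [("HaeIII", "GGCC"), ("EcoRI", "GAATTC"), ("NotI", "GCGGCCGC")]:
    print(f"{enzyme} ({site}): about one site every {expected_site_spacing(len(site))} bp")

# Assumed cut positions on the top strand, counted from the 5'-end of the site:
# Sau3AI cuts before GATC (index 0); BamHI cuts between G and GATCC (index 1).
ENZYMES = {"Sau3AI": ("GATC", 0), "BamHI": ("GGATCC", 1)}


def five_prime_overhang(site, cut_index):
    """Single-stranded 5' overhang left when a palindromic site is cut
    at the same offset on both strands."""
    return site[cut_index:len(site) - cut_index]


overhangs = {name: five_prime_overhang(*spec) for name, spec in ENZYMES.items()}
print(overhangs)
```

Both enzymes leave the same four-base overhang in this sketch, which is why a Sau3AI-digested fragment can be ligated into a BamHI-cut vector.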
Once the DNA fragments are produced, they are cloned into a suitable vector. Often this will be a λ-based vector but, as we have seen in Chapter 3, a variety of vectors are available for cloning large DNA fragments. The recombinant vector and insert combinations are then grown in E. coli such that a single bacterial colony or viral plaque arises from the ligation of a single genomic DNA fragment into the vector. E. coli cells infected with either a λ phage or transformed with a plasmid DNA are unable to support the replication of additional DNA molecules of the same type. Consequently, each λ plaque or bacterial colony contains multiple copies of the same recombinant DNA molecule. A library of these molecules is produced by pooling colonies or plaques such that sufficient are present to ensure that each genomic DNA fragment is represented at least once within the library. The main advantage of cloning large DNA fragments is that fewer individual clones must be pooled together to form a representative library. A pertinent question to ask here is how many individual colonies or plaques must be pooled to ensure that a library is truly representative of the genomic DNA from which it was made. The answer to this depends upon both the size of the genome from which the library is made and the average size of the cloned DNA fragments within the library. For example, if a library of the E. coli genome (4.6 Mbp) were constructed containing 5 kbp fragments, then the ratio of the genome size to the average individual cloned fragment size (f) would give the lowest possible number of clones that the library must contain:
\[ f = \frac{\text{genome size}}{\text{fragment size}} = \frac{4\,600\,000\ \text{bp}}{5000\ \text{bp}} = 920 \]
Therefore, an E. coli genomic library of this size would require at least 920 independent clones. Using the same calculation, a human genomic library containing similar sized inserts would require at least 580 000 independent recombinants to construct a representative library. If the fragment size is increased to 20 kbp, as is common for λ vectors, then the human library must contain at least 145 000 independent recombinant clones to be representative. The ratio of genome size to fragment size is, however, an under-estimate of the complexity required for the construction of a library. Libraries must contain a much larger number of recombinant clones than this since some sequences are invariably under-represented, either by chance sampling error or as a consequence of the DNA sequence itself. Perhaps the cloned DNA is relatively toxic to the host cell in which the recombinant vector is replicated, or it contains sequences that are difficult to clone, e.g. highly repetitive DNA. In the mid-1970s, Clarke and Carbon derived a formula relating the probability (P) of including any DNA sequence in a random library to the number (N) of independent recombinants (Clarke and Carbon, 1976):
\[ N = \frac{\ln(1 - P)}{\ln\left(1 - \frac{1}{f}\right)} \]
where f is the ratio of the genome size to the fragment size described above, and ln is the natural log. Therefore, to achieve a 99 per cent probability (P = 0.99) of including any particular sequence of random human genomic DNA in a library of 20 kbp fragments, N = 6.7 × 10^5. In practice, most human genomic libraries will contain over one million independent recombinant clones.
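The library-size estimates above can be reproduced numerically. The following is a minimal sketch, assuming a human genome size of 2.9 × 10^9 bp, which is the value implied by the 580 000 and 145 000 clone figures quoted in the text; the function names are hypothetical.

```python
import math


def minimum_clones(genome_size_bp, fragment_size_bp):
    """Ratio f of genome size to average insert size (lower bound on library size)."""
    return genome_size_bp / fragment_size_bp


def clones_for_probability(probability, genome_size_bp, fragment_size_bp):
    """Clarke and Carbon estimate: N = ln(1 - P) / ln(1 - 1/f)."""
    f = minimum_clones(genome_size_bp, fragment_size_bp)
    return math.log(1 - probability) / math.log(1 - 1 / f)


E_COLI_BP = 4_600_000
# Assumption: genome size inferred from the clone numbers quoted in the text
# (580 000 clones x 5 kbp inserts).
HUMAN_BP = 2_900_000_000

ecoli_f = minimum_clones(E_COLI_BP, 5_000)
human_f = minimum_clones(HUMAN_BP, 20_000)
human_n = clones_for_probability(0.99, HUMAN_BP, 20_000)

print(f"E. coli, 5 kbp inserts: f = {ecoli_f:.0f}")
print(f"Human, 20 kbp inserts: f = {human_f:.0f}, N for P = 0.99: {human_n:.2e}")
```

Because ln(1 - 1/f) is close to -1/f when f is large, N is roughly f × ln(1/(1 - P)), i.e. about 4.6 times the minimum number of clones for P = 0.99.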
The pooling together of either recombinant plaques or bacterial colonies generates a primary library. The recombinant clones are simply washed off the growth plates and combined into a suitable test-tube. The library should contain a representative copy of each DNA molecule from which it was produced. Of course, it is possible that some DNA molecules cannot be incorporated within the library. Certain DNA sequences may be toxic to E. coli. The foreign DNA may
• be fortuitously expressed in E. coli and the protein or protein fragment may be harmful to bacterial growth,
• act as a binding site for E. coli proteins and sequester them in such a way that they are unable to perform their natural function or
• be highly repetitive and eliminated from the bacterial cell through recombination.
The primary library is usually of a low titre and is often quite unstable. To increase both its stability and its titre, the library is often subjected to an amplification step. That is, the collection of phages or bacterial colonies is plated out once more, and the resulting progeny collected to form an amplified library. The amplified library usually has a much larger volume than the primary library, and consequently may be screened many, or even hundreds of, times. Pooled collections of λ phages can be stored almost indefinitely. Bacterial cells harbouring plasmids are more difficult to store and there is often a high degree of recombinant clone loss upon revival of frozen bacterial cells. Amplification of the library is essential if the library is to be screened multiple times. However, it is possible that the amplification process will result in the composition of the amplified library not truly reflecting the primary one. As we have already discussed, certain DNA sequences may be relatively toxic to E. coli cells; as a consequence, bacteria harbouring such clones will grow more slowly than other bacteria harbouring DNA sequences that do not affect bacterial growth. Such problematic DNA sequences may be present in the primary library, but will be lost, or under-represented, after the growth phase required to produce the amplified library.
5.2 cDNA Libraries
Not only are the genomes of higher-eukaryotic organisms big, but also only a small fraction of the DNA contained within them codes for genes. The Human Genome Sequencing Project (Chapter 9) has estimated that genes constitute only about 1.5 per cent of the DNA contained within the genome. The knowledge of the entire genome sequence is important to understand the potential of a cell, i.e. the proteins that it could potentially produce, but perhaps more important is knowledge of the protein content that individual cells actually produce. All cells within an individual organism are derived from the same genome sequence, but the way in which the genome is transcribed and translated is unique to individual cell types, and to the individual developmental stages of each cell. Although many of the genes expressed by different cell types will be the same, e.g. the genes encoding the enzymes of the TCA cycle, some will also be different, e.g. some of the genes expressed within a skin cell will be different from those of a muscle cell. These differentially expressed genes, and the proteins that they produce, define each individual cell type. Thus, the mRNA that is contained within a cell gives us a snapshot of the genes being expressed within that cell at any particular time. mRNA actually represents only a small fraction of the total RNA contained within a cell (Table 5.2).
Most eukaryotic protein-coding genes are transcribed by RNA polymerase II and the resulting mRNA is usually subjected to a number of post-transcriptional modifications, including the addition of a 7-methylguanosine cap at the 5'-end, and the addition of 100–200 adenine residues (a poly(A) tail) at the 3'-end of the transcript (see Figure 1.27). Additionally, the mRNA undergoes splicing to remove the introns so that the translation of a single contiguous message can occur.
Table 5.2. The distribution of RNA molecules within cells. In eukaryotes, RNA polymerase II is responsible for the production of approximately 60 per cent of newly synthesized transcripts. Due to its instability, however, mRNA accumulates at a level of 10 per cent or less (Brandhorst and McConkey, 1974).
The problem with mRNA is, of course, that it cannot be maintained in stable vectors and is difficult to manipulate. Consequently, a DNA copy (called complementary DNA, or cDNA) of the mRNA is required before a library can be constructed. The conversion of RNA to DNA is dependent upon the action of reverse transcriptase, an enzyme found in retroviruses that is responsible for the conversion of their RNA genome into a DNA copy prior to integration into host cells (Figure 5.3). David Baltimore and Howard Temin first discovered the enzyme independently in 1970 (Temin and Mizutani, 1970; Baltimore, 1970).
Figure 5.3. Reverse transcriptase. The X-ray crystal structure at 1.8 Å resolution of a catalytically active fragment of reverse transcriptase from Moloney murine leukemia virus (MMLV-RT) (Georgiadis et al., 1995). The enzyme is an RNA-dependent DNA polymerase that is used in the conversion of mRNA into cDNA.
Reverse transcriptase is an RNA-dependent DNA polymerase that, like all other DNA polymerases, catalyses the addition of new nucleotides to a growing chain in a 5' to 3' direction. Reverse transcriptases generally have two types of enzymatic activity.
• DNA polymerase activity. In the retroviral life cycle, reverse transcriptase produces a DNA copy from RNA only but, as used in the laboratory, it will transcribe both single-stranded RNA and single-stranded DNA templates with essentially the same efficiency. In both cases, an RNA or DNA primer is required to initiate synthesis.
• RNaseH activity. RNaseH is a ribonuclease that degrades the RNA from RNA–DNA hybrids, such as those formed during reverse transcription of an RNA template. RNaseH functions as both an endonuclease and exonuclease to hydrolyse its target molecules.
All retroviruses encode their own reverse transcriptase (RT), but the commercially available enzymes used in cDNA library construction are derived either from Moloney murine leukemia virus (MMLV-RT) or from avian myeloblastosis virus (AMV-RT), after purification of the enzyme from virally infected cells or following expression in and purification from E. coli. Both enzymes have the same fundamental activities, but differ in a number of characteristics, including temperature and pH optima. MMLV-RT is a single polypeptide of 71 kDa in size, while AMV-RT is composed of two polypeptide chains 64 kDa and 96 kDa in size. Most importantly, MMLV-RT has a very weak RNaseH activity compared to AMV-RT, which gives it an obvious advantage when being used to synthesize DNA from long RNA molecules.
The process of producing a double-stranded cDNA copy of an mRNA molecule is shown in Figure 5.4. The presence of a polyA tail is unique to mRNA, and provides a mechanism of distinguishing and isolating mRNA from the more abundant rRNA and tRNA molecules. mRNA can be physically isolated from its more abundant relatives by passing total RNA over a column to which polymers of deoxythymidine (oligo-dT) are bound. RNA molecules that do not contain multiple adenine residues will be unable to adhere to such a column and will flow straight through the column. mRNA molecules, on the other hand, will bind through complementary base pairing to the column and will be eluted only when the concentration of salt flowing through the column is lowered.
The cloning of cDNA is initiated by mixing short (12–18 base) oligonucleotides of dT with purified mRNA such that the oligonucleotide will anneal to the polyA tail of the RNA molecule. Reverse transcriptase is then added and uses the oligo-dT as a primer to synthesize a single strand of cDNA in the presence of the four deoxynucleotide triphosphates (dNTPs). The resulting molecules will be double-stranded hybrids of one cDNA and one mRNA molecule. cDNA strands made using a simple oligo-dT primer will, however, have heterogeneous ends. The primer can pair at numerous positions throughout the polyA tail and consequently will yield cDNA fragments of different lengths which may have been derived from the same mRNA molecule. To overcome this problem, anchored oligo-dT primers are often employed. In addition to the 12–18 base dT sequence, anchored primers are constructed such that the extreme 3'-end contains either a G, A or C residue (Liang and Pardee, 1992). Such primers (5'-T(12–18)V-3', where V = G, A or C) will only efficiently initiate DNA synthesis if they are paired at the extreme 5'-end of the polyA tail, when the G, A or C residue can base pair with the nucleotide immediately preceding the polyA sequence.
Figure 5.4. The construction of a cDNA library. See the text for details.
The production of the second DNA strand, like all DNA replication, requires a primer to initiate DNA synthesis. However, beyond the polyA tail, mRNA molecules produced from different genes will be different. Therefore, a mechanism is required to initiate DNA synthesis at sequences corresponding to the 5'-end of the mRNA. Early cDNA cloning strategies involved the formation of a hair-pin in the newly synthesized cDNA strand, which would serve as a self-priming structure for the formation of the second strand. The hair-pin would be subsequently removed from the double-stranded cDNA by treatment with S1 nuclease (Efstratiadis et al., 1976). However, such methods invariably resulted in the loss of sequences at the 5'-end of genes, and so the second DNA strand is usually synthesized following either nick translation or homopolymer tailing.
• Nick translation. RNaseH is used to partially digest the RNA component of the RNA–DNA hybrids (Gubler and Hoffman, 1983). The remaining RNA is then used as a primer for fresh DNA synthesis using DNA polymerase I in the presence of the four dNTPs, and finally DNA ligase is used to seal any remaining nicks in the DNA backbone. The resulting double-stranded cDNA molecule can subsequently be cloned into a suitable vector.
• Homopolymer tailing. The RNA–DNA hybrids formed after the first cDNA strand synthesis are treated with the enzyme terminal transferase in the presence of a single deoxynucleotide triphosphate. Terminal deoxynucleotidyl transferase (TdT) is a template-independent polymerase that catalyses the addition of deoxynucleotides to the 3'-ends of DNA molecules (Chang and Bollum, 1986). TdT activity was initially identified by the analysis of immunoglobulin V(D)J recombination, in which extra nucleotides were found to be inserted into the joined segments that were not present in either segment before joining (Alt and Baltimore, 1982). TdT is found at high concentration in the thymus and bone marrow where such recombination events occur, but is commercially available as a recombinant protein over-produced in and purified from E. coli. DNA (and RNA) molecules incubated with TdT in the presence of dCTP will have multiple C residues added to their 3'-ends (Figure 5.4). Prior to the synthesis of the second DNA strand, the RNA of the RNA–DNA hybrids must be removed to provide a single-stranded template for new DNA synthesis. This can be achieved easily by treating the hybrids with alkali. RNA is hydrolysed into ribonucleotides around pH 11, while DNA is resistant to hydrolysis up to about pH 13 (Watson and Yamazaki, 1973). Increasing the pH to about 12 therefore results in the hydrolysis of the RNA, but not the DNA. Full-length cDNA strands are separated from the ribonucleotides on the basis of their size using sucrose gradient centrifugation. The resulting cDNA strands will have multiple C residues at their 3'-ends and multiple T residues at their 5'-ends (Figure 5.4). Second-strand cDNA synthesis is then initiated using an oligo-dG primer that will bind, through complementary base pairing, to the newly formed polyC sequence. Reverse transcriptase, performing the role of a DNA-dependent DNA polymerase, in the presence of the four dNTPs will produce the second cDNA strand.
Figure 5.5. cDNA that is to be expressed must be cloned in a defined orientation so that the promoter element to which it is attached will initiate the transcription of the sense strand of the DNA, rather than the antisense strand.
Homopolymer tailing has an additional advantage in that both the 5'- and 3'-ends of the original mRNA are tagged with specific and known sequences in the resulting double-stranded cDNA. This can be immensely helpful when cloning cDNA fragments in a specific orientation is required, e.g. during the expression of the cDNA. mRNA molecules are directional. The 5'-end represents the beginning of the gene sequence, and the 3' polyA tail occurs at the end of the gene sequence. Therefore, if we want to express the cDNA in, for instance, bacterial cells, it is important to ensure that only the sense strand of the cDNA is transcribed. If the antisense strand is cloned downstream of a bacterial promoter, then the resulting transcript (if produced at all) will not encode the intended protein (Figure 5.5).
5.3 Directional cDNA Cloning
The synthesis of cDNA using modified oligonucleotides to initiate each strand of DNA synthesis allows the insertion of unique restriction enzyme recognition sites at either end of the cDNA so that cloning of the cDNA fragments can only occur in one direction (Figure 5.6). In the example shown here, the oligo-dT primer also contains additional sequences at the 5'-end that encode a XhoI restriction enzyme recognition site (5'-CTCGAG-3'). As we discussed for PCR in Chapter 4, the primer initiates DNA synthesis and is itself incorporated into the extended product. Thus a XhoI restriction enzyme recognition site will be incorporated into the 3'-end of the cDNA. Similarly, the primer used to initiate the second cDNA strand contains, in addition to the oligo-dG sequence, an EcoRI restriction enzyme recognition site (5'-GAATTC-3') at its 5'-end. Consequently, the cDNA produced will contain an EcoRI site at its 5'-end and a XhoI site at its 3'-end. The placement of these sites means that the cDNA can be cloned directionally. A plasmid bearing a suitable promoter followed by, in order, an EcoRI and a XhoI restriction enzyme recognition site will accept the cut cDNA fragments in one orientation only. Thus the promoter will drive the expression of the gene encoded by the cDNA and not the reverse orientation of the opposite strand.

Figure 5.6. Directional cDNA cloning. Modified primers initiate DNA synthesis and result in the insertion of restriction enzyme recognition sequences at the 5'- and 3'-ends of the cDNA. Labels in the figure: primer 5'–GGGGAATTCGGGGG–3'; double-stranded cDNA; cut with EcoRI and XhoI; clone into EcoRI–XhoI-cut vector; promoter.
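The logic of directional cloning can be sketched in a few lines of Python. This is a simplified model, not a laboratory protocol: each cut end is represented only by the 5' overhang that the enzyme leaves, and the function names and the "sense"/"antisense" labels are assumptions made for this illustration. The vector's promoter is assumed to lie to the left of its left-hand site.

```python
# Sketch only: sticky ends are modelled by the 5' overhang each enzyme leaves.
OVERHANG = {
    "EcoRI": "AATT",  # cuts G^AATTC
    "XhoI": "TCGA",   # cuts C^TCGAG
}


def ends_match(insert_ends, vector_ends):
    """True if the left and right overhangs of insert and vector are compatible."""
    return all(OVERHANG[i] == OVERHANG[v] for i, v in zip(insert_ends, vector_ends))


def accepted_orientations(insert_ends, vector_ends):
    """List the orientations in which a cut insert can ligate into a cut vector.

    Both arguments are (left_enzyme, right_enzyme) pairs. The promoter is
    assumed (for this sketch) to sit to the left of the vector's left site.
    """
    flipped = (insert_ends[1], insert_ends[0])  # the insert rotated by 180 degrees
    accepted = []
    if ends_match(insert_ends, vector_ends):
        accepted.append("sense")
    if ends_match(flipped, vector_ends):
        accepted.append("antisense")
    return accepted


# Two different enzymes: only one orientation fits.
print(accepted_orientations(("EcoRI", "XhoI"), ("EcoRI", "XhoI")))
# A single enzyme at both ends: either orientation fits.
print(accepted_orientations(("EcoRI", "EcoRI"), ("EcoRI", "EcoRI")))
```

With two different enzymes the flipped insert presents mismatched overhangs and is rejected, whereas an insert with the same site at both ends can enter in either orientation.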
An obvious problem of cutting cDNA with restriction enzymes is that the cDNA itself may contain restriction enzyme recognition sites. Strategies to overcome this problem similar to those we have already encountered during the construction of genomic libraries can also be employed here. Additionally, the inclusion of methylated forms of various deoxynucleotides during the synthesis of the cDNA will protect the newly synthesized DNA strands from cleavage by certain restriction enzymes. For example, cDNA produced in the presence of methylated dCTP will be resistant to cleavage by XhoI by virtue of the presence of the methylated C residues (Endow and Roberts, 1977). Alternatively, rare-cutting restriction enzyme recognition sites, e.g. the recognition sequence for NotI (5'-GCGGCCGC-3'), may be added to the ends of the cDNA fragments to reduce the likelihood of enzyme cleavage within the cDNA itself.
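A rough calculation shows why an eight-base site such as NotI is less likely to occur within a cDNA than a six-base site such as EcoRI or XhoI. The sketch below assumes, purely for illustration, a random sequence in which the four bases are equally frequent, which real cDNAs are not; the function names and the 2000 bp example length are hypothetical.

```python
def expected_sites(site_len, seq_len):
    """Expected occurrences of one specific site in a random sequence.

    Assumes equal base frequencies, so any given site of length n
    matches a given window with probability 1 / 4**n.
    """
    windows = max(seq_len - site_len + 1, 0)
    return windows / 4 ** site_len


def internal_sites(seq, site):
    """Return the 0-based start positions of every occurrence of site in seq."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


cdna_length = 2000  # hypothetical cDNA length in base pairs
print(expected_sites(6, cdna_length))  # six-base site, e.g. EcoRI or XhoI
print(expected_sites(8, cdna_length))  # eight-base site, e.g. NotI
print(internal_sites("AAGAATTCAACTCGAGGAATTC", "GAATTC"))
```

Under these assumptions a six-base site is expected about once every 4096 bp and an eight-base site about once every 65 536 bp, so a scan with `internal_sites` before choosing the cloning enzymes is a cheap check.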
The initiation of cDNA synthesis using oligo-dT primers has been immensely successful in the construction of a variety of cDNA libraries. The approach does, however, have limitations. An oligo-dT primer is suitable only for reverse transcription of mRNA molecules with polyA tails. Prokaryotic RNA and some eukaryotic mRNAs do not have polyA tails (Adesnik et al., 1972). Therefore prokaryotic cDNA libraries cannot be produced by this method, and in eukaryotic libraries some sequences may never be present. Additionally, initial priming at the 3'-end of transcripts will tend to result in the formation of libraries that are enriched with DNA fragments representing the 3'-ends of genes – long transcripts may therefore not be fully represented within the library. These problems can be addressed by using random primers to initiate the first strand of cDNA synthesis. The random primers are usually six to nine nucleotides in length and are synthesized to be a mixture of all possible bases at each position (5'-NNNNNN-3'). Random primers will hybridize at random positions along the mRNA and will serve as starting points for DNA synthesis. cDNA cloned by this method, following the synthesis of the second strand, is unlikely to be full length, but will generate DNA fragments that are more representative of the starting mRNA. Methods have been devised to clone full-length cDNAs starting from a fragment that may have been isolated from a random-primed library (Frohman, Duch and Martin, 1988).
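To give a feel for what "a mixture of all possible bases at each position" means, the following sketch enumerates a random hexamer pool and finds where one primer could anneal on an mRNA. It assumes perfect base pairing only, and the example sequences and function names are invented for illustration.

```python
from itertools import product


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def random_primer_pool(length):
    """Every possible DNA primer of the given length (4**length sequences)."""
    return ["".join(bases) for bases in product("ACGT", repeat=length)]


def priming_sites(mrna, primer):
    """0-based positions on the mRNA where the primer can pair perfectly.

    The primer anneals wherever the mRNA (read as DNA letters) contains
    the reverse complement of the primer.
    """
    template = mrna.upper().replace("U", "T")
    target = revcomp(primer)
    sites = []
    start = template.find(target)
    while start != -1:
        sites.append(start)
        start = template.find(target, start + 1)
    return sites


hexamers = random_primer_pool(6)
print(len(hexamers))                            # number of distinct hexamers
print(priming_sites("GGGAAACCCUUU", "GGGTTT"))  # hypothetical mRNA and primer
```

A hexamer pool contains 4^6 = 4096 different sequences and a nonamer pool 4^9 = 262 144, which is why some member of the pool can be expected to pair with almost any stretch of a transcript.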
5.4 PCR Based Libraries
The construction of high-quality cDNA libraries is both time consuming and technically difficult. The stability and permanency of a library in which the cDNA fragments have been physically cloned into a vector, coupled with the ability to screen it multiple times, makes these libraries popular choices for isolating cDNA clones. In many cases, however, the need to construct a cloned cDNA library can be bypassed by the analysis of PCR products formed from mRNA. This type of approach is only possible if screening of the DNA fragments (Chapter 6) is performed using nucleic acid hybridization and is not applicable when functional analysis of the encoded protein is required. Nevertheless, PCR-based libraries are easy and rapid to both construct and screen. PCR-based libraries are constructed using a combination of reverse transcription and PCR (RT–PCR) (Mocharla, Mocharla and Hodes, 1990). RT–PCR is both sensitive and versatile. The technique can be used to determine the presence or absence of a transcript, to estimate expression levels and to clone cDNA products without the necessity of constructing and screening a cDNA library.
A generalized overall scheme for the production of an RT–PCR library from a mixed population of unknown mRNA molecules is shown in Figure 5.7. Most RT–PCR protocols employ reverse transcriptase to produce the first cDNA strand (Murakawa et al., 1988). The production of a single strand of cDNA is sufficient before proceeding to the PCR stage, where second-strand cDNA synthesis and subsequent PCR amplification are performed using a thermostable DNA polymerase, e.g. Taq DNA polymerase (see Chapter 4). In addition to their DNA-dependent DNA polymerase activity, some thermostable DNA polymerases (e.g. Thermus thermophilus (Tth) DNA polymerase) possess a reverse transcriptase activity in the presence of manganese ions. This has led to the development of protocols for single-enzyme reverse transcription and PCR amplification (Myers and Gelfand, 1991). Systems have also been developed in which the reverse transcriptase reaction and PCR are performed in the same buffer, eliminating secondary additions to the reaction mix and so decreasing both hands-on time and the likelihood of introducing contaminants into the reaction (Wang, Cao and Johnson, 1992). Such systems are ideal for the amplification of mRNA molecules whose sequence is already known using highly specific primers, but the construction of an amplified representative library requires additional steps to ensure that each mRNA molecule within the population is represented within the library.

Several methods have been devised to amplify all potential mRNA species within a sample. The method outlined in Figure 5.7 utilizes many of the same elements as we have already seen in cDNA library construction. The first cDNA strand is synthesized using reverse transcriptase from an oligo-dT primer to which additional, unique sequences have been added at the 5'-end. The mRNA strand of the RNA–DNA hybrid is removed by treatment with RNaseH, prior to the addition of multiple C residues at the 3'-end of the DNA molecule using terminal transferase. The second cDNA strand is synthesized using an oligo-dG primer that, again, has unique sequences at its 5'-end. The thermostable DNA polymerase that will be used for the subsequent PCR reaction may also be used to perform the synthesis of the second cDNA strand. The unique sequences at the 5'- and 3'-ends of the resulting double-stranded cDNA are then used as primer binding sites for a PCR reaction using primers containing these sequences. The resulting PCR products will contain a huge number of copies of each cDNA molecule produced in the RT reaction.
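The sequence bookkeeping in this scheme can be followed with a small string model. This is a sketch under simplifying assumptions: every trailing A of the transcript is treated as polyA tail, priming and extension are perfect, and the two anchor sequences, the tail length and the function names are hypothetical values chosen for the example.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def anchored_cdna(mrna, anchor_dt, anchor_dg, tail=10):
    """Return (C-tailed first strand, second strand), both written 5'->3'.

    anchor_dt and anchor_dg are the unique sequences placed at the 5'-ends
    of the oligo-dT and oligo-dG primers (hypothetical values in this sketch).
    """
    dna = mrna.upper().replace("U", "T")
    body = dna.rstrip("A")                 # transcript without its polyA tail
    poly_a = len(dna) - len(body)
    if poly_a == 0:
        raise ValueError("oligo-dT priming needs a polyA tail")
    # Reverse transcription: the primer is incorporated into the product.
    first = anchor_dt + "T" * poly_a + revcomp(body)
    # Terminal transferase adds C residues to the 3'-end of the first strand.
    first_tailed = first + "C" * tail
    # Second strand: the oligo-dG primer pairs with the C tail and is extended.
    second = anchor_dg + "G" * tail + revcomp(first)
    return first_tailed, second


mrna = "AUGGCUUGGCC" + "A" * 15            # hypothetical transcript
first, second = anchored_cdna(mrna, "GAGCTCACTG", "TGCAGGATCC")
print(second.startswith("TGCAGGATCC"))         # anchor from the oligo-dG primer
print(second.endswith(revcomp("GAGCTCACTG")))  # complement of the oligo-dT anchor
```

Because every double-stranded cDNA ends up flanked by the same two anchor sequences, a single pair of PCR primers matching those anchors can amplify the whole population regardless of the transcript sequence in between.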
5.5 Subtraction Libraries
As we have discussed earlier, many of the mRNA molecules produced by different cells will be the same. For example, almost all cells need to produce the enzymes required for glucose metabolism, and many of the intracellular protein components of all cells are identical. Therefore, we might want to concentrate only on the differences between cell types to identify genes that are distinctive to a cell type, developmental stage or particular environmental stress. The advantage of PCR-based cDNA libraries is that they are amenable to the removal (subtraction) of sequences that are common between two separate libraries. This gives an enrichment of the unique sequences and allows these to be studied more readily. A mechanism by which this type of subtraction can occur is shown in Figure 5.8.

Figure 5.8. Steps labelled in the figure include "Boil and anneal" and "Extract biotin using avidin".
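In logical terms, subtraction is a set difference between two libraries. The sketch below is only an analogy: at the bench, common sequences are removed by hybridization rather than by exact string comparison, and the name "tester" for the library being enriched, the function name and the example sequences are assumptions made here.

```python
def subtract_library(tester, driver):
    """Return the tester sequences that have no counterpart in the driver.

    Exact string identity stands in for hybridization between
    complementary strands, which is what removes common sequences
    in the real procedure.
    """
    driver_set = set(driver)
    return [seq for seq in tester if seq not in driver_set]


# Hypothetical cDNA sequences: two are shared, one is unique to the tester.
tester = ["ATGGCTTGG", "ATGAAACCC", "ATGTTTGGG"]
driver = ["ATGGCTTGG", "ATGAAACCC", "ATGCCCAAA"]
print(subtract_library(tester, driver))
```

Only the sequence found in the tester but not in the driver survives, which mirrors the enrichment of unique sequences described above.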
Two PCR-based cDNA libraries are constructed from different mRNA samples. One of the libraries (the driver) is produced using an oligonucleotide that has a biotin moiety chemically added to it. Biotin is a cofactor required for enzymes that are involved in ATP-dependent carboxylation reactions, e.g. acetyl-CoA carboxylase and pyruvate carboxylase (Figure 5.9). Biotin deficiencies in animals are rare, but can be observed following excessive consumption of raw eggs (Baugh, Malone and Butterworth, 1968). The binding of an egg-white protein, called avidin, to biotin prevents its intestinal absorption (Figure 5.9). The complex formed between biotin and avidin is extraordinarily stable (dissociation constant, Kd ∼ 1 × 10⁻¹⁵ M) and avidin can effectively sequester
Figure 5.9. The structure of biotin and the avidin–biotin complex. (a) The molecular structure of biotin. (b) The binding of the biotin molecule to avidin. Shown here is an avidin monomer with a biotin molecule (blue) bound (Pugliese et al., 1993). The functional avidin molecule is a homotetramer.