Covalent Modification of Histones
Chromatin is also remodeled through the action of enzymes that covalently modify
side chains on histones within the core octamer. These modifications either
diminish DNA:histone associations through disruption of electrostatic interactions or
introduce substitutions that can recruit binding of new protein participants through
protein–protein interactions.
Initial events in transcriptional activation include acetyl-CoA–dependent acetylation
of lysine residues in histone tails by histone acetyltransferases
(HATs) (Figure 29.30). The histone transacetylases responsible are essential
components of several megadalton-size complexes known to be required for transcription
co-activation (co-activation in the sense that they are required along with RNA polymerase
II and other components of the transcriptional apparatus). Examples of such
complexes include TFIID (some of whose TAFIIs have HAT activity), the SAGA
complex (which also contains TAFIIs), and the ADA complex. N-Acetylation suppresses the
positive charge in histone tails, diminishing their interaction with the negatively
charged DNA.
Phosphorylation of Ser residues and methylation of Lys residues in histone tails also
contribute to transcription regulation (Figure 29.30). Attachment of small proteins to
histone C-terminal lysine residues through ubiquitination and sumoylation (see
Chapter 31) provides two additional forms of covalent modification found in nucleosomes.
Collectively, these modifications create binding sites for proteins that modulate chromatin
structure, such as the chromatin-remodeling complexes with bromodomains that
interact specifically with acetylated lysine residues and chromodomains that bind to
methylated lysine residues. A “histone code” has emerged.
Covalent Modification of Histones Forms the Basis of the Histone Code
A code based on histone-tail covalent modifications determines gene expression
through selective recruitment of proteins. Proteins that cause chromatin
compaction (heterochromatin formation) lead to repression; proteins giving easier
access to DNA through relaxation of histone:DNA interactions favor the possibility of
gene expression.
FIGURE 29.30 A schematic diagram of the nucleosome illustrating the various covalent modifications on the N-terminal tails of histones. acK = acetylated lysine residue; meK = methylated lysine residue; meR = methylated arginine residue; PS = phosphorylated serine residue. The numbers indicate the positions of the amino acids in the amino acid sequences. Note the prevalence of modifiable sites, particularly acetylatable lysines, on the N-terminal tails of histones H2B, H3, and H4.
The prominent forms of histone covalent modification are lysine acetylation, lysine methylation, serine phosphorylation, lysine ubiquitination, and lysine sumoylation. The lysine residue at position 9 (K9) in the histone H3 amino acid sequence is methylated in heterochromatin, the compacted, repressed state of chromatin. In contrast, lysine 4 (K4) of histone H3 typically is methylated in chromatin where gene expression is active. Different proteins are recruited to these two methylated forms of histone H3. Methylated K9 recruits heterochromatin protein 1 (HP1), which binds via its chromodomain. On the other hand, methylated K4 binds CHD1, a chromatin remodeling protein with two chromodomains. (CHD1 is an acronym for chromodomain, helicase, DNA-binding.) Ubiquitination of Lys120 in the C-terminal tail of H2B favors methylation (and thus transcription activation), whereas ubiquitination of Lys119 in the C-terminal tail of H2A favors repression. Sumoylation of Lys residues tends to repress transcription; apparently, sumoylation antagonizes acetylation.
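The marks discussed in this paragraph can be collected into a small lookup table, which makes the "code" idea concrete: a (histone, residue, modification) key maps to a recruited reader and a transcriptional tendency. The Python sketch below is only an illustration of that bookkeeping. The names `Mark`, `HISTONE_CODE`, and `interpret` are hypothetical, and the table holds nothing beyond the four examples given in the text.

```python
from typing import NamedTuple


class Mark(NamedTuple):
    reader: str   # protein recruited to the mark, if the text names one
    outcome: str  # transcriptional tendency stated in the text


# Only the four examples from the paragraph above; not a complete histone code.
HISTONE_CODE = {
    ("H3", "K9", "methyl"): Mark("HP1 (one chromodomain)", "repression"),
    ("H3", "K4", "methyl"): Mark("CHD1 (two chromodomains)", "activation"),
    ("H2B", "K120", "ubiquitin"): Mark("not named in the text", "activation"),
    ("H2A", "K119", "ubiquitin"): Mark("not named in the text", "repression"),
}


def interpret(histone: str, residue: str, modification: str) -> str:
    """Look up a mark and describe it; unknown marks are reported as such."""
    mark = HISTONE_CODE.get((histone, residue, modification))
    if mark is None:
        return "not covered by this sketch"
    return (f"{histone}{residue} {modification}: reader = {mark.reader}; "
            f"favors {mark.outcome}")


print(interpret("H3", "K9", "methyl"))
print(interpret("H3", "K4", "methyl"))
```

A dictionary keyed on the full (histone, residue, modification) triple reflects the point of the paragraph: the same chemical group has opposite consequences depending on where it sits.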
Methylation and Phosphorylation Act as a Binary Switch in the Histone Code
As cells enter mitosis, the chromatin becomes condensed and histone H3 is not only methylated at K9 but also phosphorylated at the adjacent serine residue, S10. S10 phosphorylation triggers the dissociation of HP1 from the heterochromatin. Thus, phosphorylation of the residue neighboring K9 trumps HP1 binding. Similarly, phosphorylation of the threonine residue (T3) neighboring K4 in the histone H3 tail evicts CHD1 from its site on the methylated K4. Apparently, lysine methylation is the “on” position for the binary switch that recruits specific proteins to histone tails, and phosphorylation at a neighboring residue turns the switch to the “off” position by ejecting the bound proteins. There are at least 16 instances of serines or threonines immediately flanking lysine residues in the four histones that constitute the histone core octamer of the nucleosome. The methylation-phosphorylation binary switch may be a general phenomenon in the regulation of chromatin dynamics.
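Candidate switch sites of this kind can be located by scanning a sequence for lysines with a serine or threonine immediately before or after them. The Python sketch below does this for a short stretch of the histone H3 N-terminal tail. The function name `switch_candidates` is hypothetical, and the 20-residue sequence is supplied here as an assumption (it is not given in the text), chosen so that the numbering matches the T3/K4 and K9/S10 pairs described above.

```python
def switch_candidates(sequence: str) -> list[tuple[str, str]]:
    """Return (lysine, neighbor) labels, numbered from 1, for every K
    that has a Ser or Thr immediately before or after it."""
    hits = []
    for i, residue in enumerate(sequence):
        if residue != "K":
            continue
        for j in (i - 1, i + 1):
            if 0 <= j < len(sequence) and sequence[j] in "ST":
                hits.append((f"K{i + 1}", f"{sequence[j]}{j + 1}"))
    return hits


# Assumed first 20 residues of the mature histone H3 tail (illustrative input).
H3_TAIL = "ARTKQTARKSTGGKAPRKQL"

for lysine, neighbor in switch_candidates(H3_TAIL):
    print(f"{lysine} is flanked by {neighbor}")
```

On this input the scan reports the two pairs named in the text (K4 with T3, K9 with S10); K14 and K18 have no adjacent Ser or Thr in this stretch and are skipped.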
Chromatin Deacetylation Leads to Transcription Repression
Deacetylation of histones is a biologically relevant matter, and enzyme complexes that
carry out such reactions have been characterized. Known as histone deacetylase
complexes, or HDACs, they catalyze the removal of acetyl groups from lysine residues
along the histone tails, restoring the chromatin to a repressed state. Beyond these effects on transcription, histone modifications determine whether significant cellular events involving DNA allocation through mitosis and meiosis may occur.
Nucleosome Alteration and Interaction of RNA Polymerase II with the Promoter Are Essential Features in Eukaryotic Gene Activation
Gene activation (the initiation of transcription) can thus be viewed as a process requiring two principal steps: (1) alterations in nucleosomes (and thus, chromatin) that relieve the general repressed state imposed by chromatin structure, followed by (2) the interaction of RNA polymerase II and the GTFs with the promoter. Transcription activators (proteins that bind to enhancers and response elements) initiate the process by recruiting chromatin-altering proteins (the chromatin-remodeling complexes and histone-modifying enzymes described previously). Once these alterations have occurred, promoter DNA is accessible to TBP:TFIID, the other GTFs, and RNA polymerase II. Transcription activation, however, requires communication between RNA polymerase II and the transcription activator for transcription to take place. Mediator (or Srb/Med) fulfills this function. Mediator interacts with both the transcription activator and the CTD of RNA polymerase II. This Mediator bridge provides an essential interface for communication between enhancers and promoters, triggering RNA polymerase II to begin transcription. A general model for transcription initiation is shown in Figure 29.31. Once transcription begins, Mediator is replaced by another complex called Elongator. Elongator has HAT subunits whose activity remodels downstream nucleosomes as RNA polymerase II progresses along the chromatin-associated DNA.
The interactions described thus far emphasize regulation of gene expression at
the level of RNA polymerase II recruitment to promoters. However, whole-genome
analyses show that, for many genes, RNA polymerase II is already situated at
promoters and appears to be paused there, awaiting signals that will activate the
elongation phase of transcription. Thus, the expression of many genes may be regulated
at the level of transcription elongation.
Beyond these considerations, various mechanisms regulate gene expression
through events that take place subsequent to transcription. Post-transcriptional
gene regulation mediated by microRNAs, such as RNAi (see Chapter 12) and gene
silencing (see Chapter 10), as well as alternative RNA splicing and nucleotide
changes introduced through RNA editing (as described in this chapter), are
mechanisms targeting transcripts. Post-translational modifications of proteins also play a
major role in the regulation of gene expression, as assessed at the level of
biological activity (see Chapter 30).
A SINE of the Times
An interesting twist on transcription regulation comes from the discovery that
certain noncoding RNAs (ncRNAs) act as transcription factors through direct binding
to RNA polymerase II. For example, ncRNA B2 in mouse and Alu RNA in humans
are RNAs encoded within short interspersed elements (SINEs). SINEs are
abundant within animal DNA and were once considered “junk” DNA because they lack
protein-coding properties. Alu RNA or ncRNA B2 blocks promoter-bound RNA
polymerase II by interfering with transcription initiation.
Specific DNA Sequences?
Proteins that recognize nucleic acids do so by the basic rule of macromolecular
recognition. That is, such proteins present a three-dimensional shape or contour that is
structurally and chemically complementary to the surface of a DNA sequence. When
the two molecules come into contact, the numerous atomic interactions that underlie
recognition and binding can take place. Nucleotide sequence–specific recognition
by the protein involves a set of atomic contacts with the bases and the sugar–phosphate
backbone. Hydrogen bonding is critical for recognition, with amino acid
side chains providing most of the critical contacts with DNA. Protein contacts with the
bases of DNA usually (but not always) occur within the major groove. Protein contacts
with the DNA backbone involve both H bonds and salt bridges with electronegative
oxygen atoms of the phosphodiester linkages. Structural studies on regulatory proteins
that bind to specific DNA sequences have revealed that roughly 80% of such proteins
can be assigned to one of three principal classes based on their possession of one of
FIGURE 29.31 A model for the transcriptional regulation of eukaryotic genes. The DNA is a green ribbon wrapped around disclike nucleosomes. A specific transcription factor (TF, pink) is bound to a regulatory element (either an enhancer or silencer). RNA polymerase II and its associated GTFs (blue) are bound at the promoter. The N-terminal tails of histones are shown as wavy lines (blue) emanating from the nucleosome discs. A specific transcription factor that is a transcription activator stimulates transcription through interaction with a co-activator whose HAT activity renders the DNA more accessible and through interactions with the Mediator complex associated with RNA polymerase II. A specific transcription factor that is a repressor interacts with a co-repressor that has HDAC activity that deacetylates histones, restructuring the nucleosomes into a repressed state. (From Figure 1 in Kornberg, R. D., 1999. Eukaryotic transcription control. Trends in Biochemical Sciences 24:M46–M49.)
three kinds of small, distinctive structural motifs: the helix-turn-helix (or HTH), the
zinc finger (or Zn-finger), and the leucine zipper-basic region (or bZIP). The latter two
motifs are found only in DNA-binding proteins from eukaryotic organisms.
In addition to their DNA-binding domains, these proteins commonly possess other structural domains that function in protein:protein recognitions essential to oligomerization (for example, dimer formation), DNA looping, transcriptional activation, and signal reception (for example, effector binding).
α-Helices Fit Snugly into the Major Groove of B-DNA
A recurring structural feature in DNA-binding proteins is the presence of
α-helical segments that fit directly into the major groove of B-form DNA. The
diameter of an α-helix (including its side chains) is about 1.2 nm. The dimensions of the
major groove in B-DNA are 1.2 nm wide by 0.6 to 0.8 nm deep. Thus, one side of an
α-helix can fit snugly into the major groove. Although examples of β-sheet DNA
recognition elements in proteins are known, the α-helix and B-form DNA are the
predominant structures involved in protein:DNA interactions. Significantly, proteins can recognize specific sites in “normal” B-DNA; the DNA need not assume any unusual, alternative conformation (such as Z-DNA).
Proteins with the Helix-Turn-Helix Motif Use One Helix to Recognize DNA
The HTH motif is a protein structural domain consisting of two successive α-helices
separated by a sharp β-turn (Figure 29.32). Within this domain, the α-helix situated
more toward the C-terminal end of the protein, the so-called helix 3, is the DNA
recognition helix; it fits nicely into the major groove, with several of its side chains
touching DNA base pairs. Helix 2, the helix at the beginning of the HTH motif,
creates a stable structural domain through hydrophobic interactions with helix 3 that locks helix 3 into its DNA interface. Proteins with HTH motifs bind to DNA as dimers. In the dimer, the two helix 3 cylinders are antiparallel to each other, such that their N→C orientations match the inverted relationship of nucleotide sequence
in the dyad-symmetric DNA-binding site. An example is Antp. Antp is a member of
a family of eukaryotic proteins involved in the regulation of early embryonic development that have in common an amino acid sequence element known as the
homeobox⁶ domain. The homeobox is a DNA motif that encodes a related 60–amino acid sequence (the homeobox domain) found among proteins of virtually every
HUMAN BIOCHEMISTRY
Storage of Long-Term Memory Depends on Gene Expression Activated by CREB-Type Transcription Factors
Learning can be defined as the process whereby new information is acquired and memory as the process by which this information is retained. Short-term memory (which lasts minutes or hours) requires only the covalent modification of preexisting proteins, but long-term memory (which lasts days, weeks, or a lifetime) depends on gene expression, protein synthesis, and the establishment of new neuronal connections.
The macromolecular synthesis underlying long-term memory storage requires cAMP-response element-binding (CREB) protein–related transcription factors and the activation of cAMP-dependent gene expression. Serotonin (5-hydroxytryptamine, or 5-HT, a hormone implicated in learning and memory) acting on neurons promotes cAMP synthesis, which in turn stimulates protein kinase A to phosphorylate CREB protein–related transcription factors that activate transcription of cAMP-inducible genes. These genes are characterized by the presence of CRE (cAMP response element) consensus sequences containing the 8-bp TGACGTCA palindrome. CREB transcription factors are bZIP-type proteins (see later discussion). These exciting findings opened a new arena in molecular biology, the molecular biology of cognition. Eric Kandel was awarded the 2000 Nobel Prize in Physiology or Medicine for, among other things, his discovery of the role of CREB-type transcription factors in long-term memory storage.
Cognition is the act or process of knowing; the acquisition of knowledge.
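In the DNA sense, a palindrome is a sequence that reads the same on both strands in the 5′→3′ direction; that is, it equals its own reverse complement. The short Python sketch below checks this for the CRE sequence TGACGTCA quoted above and, for contrast, for the AP-1 consensus TGACTCA cited in the legend to Figure 29.35. The function names are hypothetical and chosen for illustration only.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """True when the sequence is identical to its reverse complement."""
    return seq == reverse_complement(seq)


CRE = "TGACGTCA"   # cAMP response element core, from the text
AP1 = "TGACTCA"    # AP-1 consensus, from the legend to Figure 29.35

print(CRE, reverse_complement(CRE), is_palindromic(CRE))
print(AP1, reverse_complement(AP1), is_palindromic(AP1))
```

The 8-bp CRE site is a perfect palindrome, consistent with recognition by a symmetric dimer. The 7-bp AP-1 site is not strictly identical to its reverse complement, because a sequence of odd length has an unpaired central base pair.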
⁶ Homeo derives from homeotic genes, a set of genes originally discovered in the fruit fly Drosophila melanogaster through their involvement in the specification of body parts during development.
FIGURE 29.32 An HTH motif protein: Antp monomer bound to DNA. Helix 3 (yellow) is locked into the major groove of the DNA by helix 2 (magenta) (pdb id 9ANT).
eukaryote, from yeast to man. Embedded within the homeobox domain is an HTH
motif. Homeobox domain proteins act as sequence-specific transcription factors.
Typically, the homeobox portion comprises only 10% or so of the protein’s mass,
with the remainder of the protein serving in protein:protein interactions essential
to transcription regulation. Other DNA-binding proteins with HTH motifs are lac
repressor, trp repressor, and the C-terminal domain of CAP.
How Does the Recognition Helix Recognize Its Specific DNA-Binding Site? The
edges of base pairs in dsDNA present a pattern of hydrogen-bond donor and
acceptor groups within the major and minor grooves, but only the pattern displayed
on the major-groove side is distinctive for each of the four base pairs A:T, T:A, C:G,
and G:C. (You can get an idea of this by inspecting the structures of the base pairs
in Figure 11.6.) Thus, the base-pair edges in the major groove act as a recognition
matrix identifiable through H bonding with a specific protein, so it is not necessary
to melt the base pairs to read the base sequence. Although formation of such
H bonds is very important in DNA:protein recognition, other interactions also play
a significant role. For example, the C-5-methyl groups unique to thymine residues
are nonpolar “knobs” projecting into the major groove.
Proteins Also Recognize DNA via “Indirect Readout.” Indirect readout is the term
for the ability of a protein to indirectly recognize a particular nucleotide sequence by
recognizing local conformational variations resulting from the effects that base
sequence has on DNA structure. Superficially, the B-form structure of DNA appears to
be a uniform cylinder. Nevertheless, the conformation of DNA over a short distance
along its circumference varies subtly according to local base sequence. That is, base
sequences generate unique contours that proteins can recognize. Because these
contours arise from the base sequence, the DNA-binding protein “indirectly reads out”
the base sequence through interactions with the DNA backbone. In the E. coli Trp
repressor:trp operator DNA complex, the Trp repressor engages in 30 specific
hydrogen bonds to the DNA: 28 involve phosphate groups in the backbone; only 2 are to
bases. Thus, some sequence-specific DNA-binding proteins are able to recognize an
overall DNA conformation caused by the specific DNA sequence.
Some Proteins Bind to DNA via Zn-Finger Motifs
There are many classes of Zn-finger motifs. The prototype Zn-finger is a structural
feature formed by a pair of Cys residues separated by 2 residues, then a run of 12 amino
acids, and finally a pair of His residues separated by 3 residues (Cys-x2-Cys-x12-His-x3-His).
This motif may be repeated as many as 13 times over the primary structure of a
Zn-finger protein. Each repeat coordinates a zinc ion via its 2 Cys and 2 His residues
(Figure 29.33). The 12 or so residues separating the Cys and His coordination sites
FIGURE 29.33 The Zn-finger motif of the C2H2 type showing (a) the coordination of Cys and His residues to Zn and (b) the secondary structure. (c) Structure of a classic C2H2 zinc finger protein (zif268) with three zinc fingers bound to DNA (pdb id 1ZAA).
are looped out and form a distinct DNA interaction module, the so-called Zn-finger. When Zn-finger proteins associate with DNA, each Zn-finger binds in the major groove and interacts with about five nucleotides, adjacent fingers interacting with contiguous stretches of DNA. Many DNA-binding proteins with this motif have been identified. In all cases, the finger motif is repeated at least two times, with at least a 7– to 8–amino acid linker between Cys/Cys and His/His sites. Proteins with this general pattern are assigned to the C2H2 class of Zn-finger proteins to distinguish them from proteins bearing another kind of Zn-finger, the Cx type, which includes the C4 and C6 Zn-finger proteins. The Cx proteins have a variable number of Cys residues available for Zn chelation. For example, the vertebrate steroid receptors have two sets of Cys residues, one with four conserved cysteines (C4) and the other with five (C5).
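The prototype spacing Cys-x2-Cys-x12-His-x3-His translates directly into a pattern that can be searched for in a one-letter protein sequence. The Python sketch below encodes exactly that spacing as a regular expression. The names `C2H2` and `find_c2h2` and the toy sequence are hypothetical, and because real C2H2 fingers tolerate some variation in spacing, this strict pattern should be read as an illustration of the prototype rather than a general detector.

```python
import re

# Cys, any 2 residues, Cys, any 12 residues, His, any 3 residues, His.
C2H2 = re.compile(r"C.{2}C.{12}H.{3}H")


def find_c2h2(seq: str) -> list[tuple[int, str]]:
    """Return (start position numbered from 1, matched segment) for each
    non-overlapping match of the prototype C2H2 spacing."""
    return [(m.start() + 1, m.group()) for m in C2H2.finditer(seq)]


# Toy sequence built for illustration: one finger-like segment with flanks.
FINGER = "C" + "AA" + "C" + "A" * 12 + "H" + "AAA" + "H"
TOY = "GG" + FINGER + "GG"

print(find_c2h2(TOY))
```

Each match spans 21 residues (2 Cys, 2 His, and 17 spacer positions), which is the length implied by the formula in the text.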
Some DNA-Binding Proteins Use a Basic Region-Leucine Zipper (bZIP) Motif
bZIP is a structural motif characterizing the third major class of sequence-specific,
DNA-binding proteins. This motif was first recognized by Steve McKnight in C/EBP, a
heat-stable, DNA-binding protein isolated from rat liver nuclei that binds to both CCAAT promoter elements and certain enhancer core elements.⁷ The DNA-binding domain of C/EBP was localized to the C-terminal region of the protein. This region shows a notable absence of Pro residues, suggesting it might be arrayed in an α-helix.
Within this region are two clusters of basic residues: A and B. Further along is a 28-residue sequence. When this latter region is displayed end-to-end down the axis of
a hypothetical α-helix, beginning at Leu315, an amphipathic cylinder is generated, similar to the one shown in Figure 6.22. One side of this amphipathic helix consists principally of hydrophobic residues (particularly leucines), whereas the other side has an array of negatively and positively charged side chains (Asp, Glu, Arg, and Lys), as well
as many uncharged polar side chains (glutamines, threonines, and serines).
The Zipper Motif of bZIP Proteins Operates Through Intersubunit Interaction of Leucine Side Chains
The leucine zipper motif arises from the periodic repetition of leucine residues within this helical region. The periodicity causes the Leu side chains to protrude from the same side of the helical cylinder, where they can enter into hydrophobic interactions with a similar set of Leu side chains extending from a matching helix in a second polypeptide. These hydrophobic interactions establish a stable noncovalent linkage, fostering dimerization of the two polypeptides (as shown in Figure 29.34). The leucine zipper is not a DNA-binding domain. Instead, it functions in protein dimerization. Leucine zippers have been found in other mammalian transcriptional
regulatory proteins, including Myc, Fos, and Jun.
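The periodic repetition of leucines can be searched for computationally. The Python sketch below assumes the heptad spacing (a leucine at every seventh position) that is characteristic of leucine zippers; that spacing and the minimum of four repeats are assumptions supplied here, not values stated in the passage above. The function name `leucine_heptad_runs` and the toy sequence are likewise hypothetical.

```python
def leucine_heptad_runs(seq: str, min_repeats: int = 4, spacing: int = 7) -> list[int]:
    """Return start positions (numbered from 1) at which Leu recurs
    every `spacing` residues for at least `min_repeats` occurrences."""
    span = spacing * (min_repeats - 1)  # distance from first to last Leu
    starts = []
    for i in range(len(seq) - span):
        if all(seq[i + k * spacing] == "L" for k in range(min_repeats)):
            starts.append(i + 1)
    return starts


# Toy helix for illustration: Leu at positions 1, 8, 15, 22, and 29.
TOY_ZIPPER = "LAAAAAA" * 5

print(leucine_heptad_runs(TOY_ZIPPER))
```

A spacing of seven matches roughly two turns of an α-helix (about 3.6 residues per turn), which is why leucines spaced this way line up along one face of the helical cylinder.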
The Basic Region of bZIP Proteins Provides the DNA-Binding Motif
The actual DNA contact surface of bZIP proteins is contributed by a 16-residue segment that ends exactly 7 residues before the first Leu residue of the Leu zipper. This DNA contact region is rich in basic residues and hence is referred to as the basic region. Two bZIP polypeptides join via a Leu zipper to form a Y-shaped molecule in which the stem of the Y corresponds to a coiled pair of α-helices held by the leucine zipper. The arms of the Y are the respective basic regions of each polypeptide; they act as a linked set of DNA contact surfaces (Figure 29.34). The dimer interacts with a DNA target site by situating the fork of the Y at the center of the dyad-symmetric DNA sequence. The two arms of the Y can then track along the major groove of the DNA in opposite directions, reading the specific recognition sequence (Figure 29.35). An interesting aspect of bZIP proteins is that the two polypeptides need not be identical (Figure 29.35). Heterodimers can form, provided both polypeptides possess a leucine zipper region. An important consequence of heterodimer formation is that the DNA target site need not be a palindromic sequence. The respective basic regions of the two different bZIP polypeptides (for example, Fos and Jun) can track along the major groove reading two different base sequences. Heterodimer formation expands enormously the DNA recognition and regulatory possibilities of this set of proteins.
Chelation is from the Greek word chele, meaning “claw”; it refers to the binding of a metal ion to two or more nonmetallic atoms in the same molecule.
FIGURE 29.34 Model for a dimeric bZIP protein. Two bZIP polypeptides dimerize to form a Y-shaped molecule. The stem of the Y is the Leu zipper, and it holds the two polypeptides together. Each arm of the Y is the basic region from one polypeptide. Each arm is composed of two α-helical segments: BR-A and BR-B (basic regions A and B).
⁷ The acronym C/EBP designates this protein as a “CCAAT and enhancer-binding protein.”
and Delivered to the Ribosomes for Translation?
Transcription and translation are concomitant processes in prokaryotes, but in
eukaryotes, the two processes are spatially separated (see Chapter 10).
Transcription occurs on DNA in the nucleus, and translation occurs on ribosomes in the cytoplasm.
Consequently, transcripts must be transported from the nucleus to the cytosol to
be translated. On the way, these transcripts undergo processing: alterations that
convert the newly synthesized RNAs, or primary transcripts, into mature
messenger RNAs. Also, unlike prokaryotes, in which many mRNAs encode more than
one polypeptide (that is, they are polycistronic), eukaryotic mRNAs encode only
one polypeptide (that is, they are exclusively monocistronic).
Eukaryotic Genes Are Split Genes
Most genes in higher eukaryotes are split into coding regions, called exons,⁸ and
noncoding regions, called introns (Figure 29.36; see also Figure 10.20). Introns are
the intervening nucleotide sequences that are removed from the primary
transcript when it is processed into a mature RNA. Gene expression in eukaryotes
entails not only transcription but also the processing of primary transcripts to yield the
mature RNA molecules we classify as mRNAs, tRNAs, rRNAs, and so forth.
The Organization of Exons and Introns in Split Genes Is Both Diverse and Conserved
Split genes occur in an incredible variety of interruptions and sizes. The yeast actin
gene is a simple example, having only a single 309-bp intron that separates the
nucleotides encoding the first 3 amino acids from those encoding the remaining 350 or
so amino acids in the protein. The chicken ovalbumin gene is composed of 8 exons
FIGURE 29.35 Model for the heterodimeric bZIP transcription factor c-Fos:c-Jun bound to a DNA oligomer containing the AP-1 consensus target sequence TGACTCA (pdb id 1FOS).
FIGURE 29.36 The organization of split eukaryotic genes. [Diagram: a gene on the DNA coding strand, with promoter/enhancer sequences and exons 1 to 4 separated by introns, is transcribed into an mRNA transcript that is processed (capping, methylation, poly(A) addition, splicing) into mature mRNA carrying a 7-mG cap, a 5′-untranslated region, exons 1 to 4, a 3′-untranslated region (variable length since transcription termination is imprecise), and a poly(A) tail, (A)100–200.]
⁸ Although the term exon is commonly used to refer to the protein-coding regions of an interrupted or
split gene, a more precise definition would specify exons as sequences that are represented in mature
RNA molecules. This definition encompasses not only protein-coding genes but also the genes for
various RNAs (such as tRNAs or rRNAs) from which intervening sequences must be excised in order to
generate the mature gene product.
and 7 introns. The two vitellogenin genes of the African clawed toad Xenopus laevis are
both spread over more than 21 kbp of DNA; their primary transcripts consist of just
6 kb of message that is punctuated by 33 introns. The chicken pro α-2 collagen gene
has a length of about 40 kbp; the coding regions constitute only 5 kb distributed over
51 exons within the primary transcript. The exons are quite small, ranging from 45 to
249 bases in size.
Clearly, the mechanism by which introns are removed and multiple exons are spliced together to generate a continuous, translatable mRNA must be both precise and complex. If one base too many or too few is excised during splicing, the coding sequence in the mRNA will be disrupted. The mammalian DHFR (dihydrofolate reductase) gene is split into 6 exons spread over more than 31 kbp of DNA. The 6 exons are spliced together to give a 6-kb mRNA (Figure 29.37). Note that, in three different mammalian species, the size and position of the exons are essentially the same but that the lengths of the corresponding introns vary considerably. Indeed, the lengths of introns in vertebrate genes range from a minimum of about 60 bases to more than 10,000 bp. Many introns have nonsense codons in all three reading frames and thus are untranslatable. Introns are found in the genes of mitochondria and chloroplasts as well as in nuclear genes. Although introns have been observed in archaea and even bacteriophage T4, none are known in the genomes of bacteria.
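The need for single-nucleotide precision in splicing can be illustrated with a toy transcript. In the Python sketch below, exons are joined from given coordinates and the result is read in triplets; shifting one exon boundary by a single base changes every downstream codon. The sequences, coordinates, and function names (`splice`, `codons`) are hypothetical and were made up for this illustration.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments given as 0-based, end-exclusive (start, end) pairs."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def codons(mrna: str) -> list[str]:
    """Split an mRNA into complete triplets, dropping any trailing 1 or 2 bases."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


# Toy pre-mRNA: exon 1, an intron beginning GU and ending AG, then exon 2.
EXON1 = "AUGGCU"
INTRON = "GUAAGUCCCUCAACUUUUCAG"
EXON2 = "GAAUGGUAA"
PRE_MRNA = EXON1 + INTRON + EXON2

exon2_start = len(EXON1) + len(INTRON)
correct = splice(PRE_MRNA, [(0, len(EXON1)), (exon2_start, len(PRE_MRNA))])
one_too_many = splice(PRE_MRNA, [(0, len(EXON1)), (exon2_start + 1, len(PRE_MRNA))])

print(codons(correct))       # reading frame preserved
print(codons(one_too_many))  # one extra base excised: downstream frame shifted
```

With correct splicing the triplets after the junction are GAA, UGG, UAA; removing one base too many turns them into AAU, GGU, and the stop codon is lost from the reading frame.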
Post-Transcriptional Processing of Messenger RNA Precursors Involves Capping, Methylation, Polyadenylylation, and Splicing
Capping and Methylation of Eukaryotic mRNAs. The protein-coding genes of eukaryotes are transcribed by RNA polymerase II to form primary transcripts or pre-mRNAs that serve as precursors to mRNA. As a population, these RNA molecules are very large and their nucleotide sequences are very heterogeneous because they represent the transcripts of many different genes, hence the designation heterogeneous nuclear RNA, or hnRNA. Shortly after transcription of hnRNA is initiated, the 5′-end of the growing transcript is capped by addition of a guanylyl residue. This reaction is catalyzed by the nuclear enzyme guanylyl transferase using GTP as substrate (Figure 29.38). The cap structure is methylated at the 7-position of the G residue. Additional methylations may occur at the 2′-O positions of the two nucleosides following the 7-methyl-G cap and at the 6-amino group of a first base adenine (Figure 29.39).
3′-Polyadenylylation of Eukaryotic mRNAs. Transcription by RNA polymerase II typically continues past the 3′-end of the mature messenger RNA. Primary transcripts show heterogeneity in sequence at their 3′-ends, indicating that the precise point where termination occurs is nonspecific. However, termination does not normally occur until RNA polymerase II has transcribed past a consensus AAUAAA sequence known as the polyadenylylation signal.
Most eukaryotic mRNAs have 100 to 200 adenine residues attached at their 3′-end, the poly(A) tail. [Histone mRNAs are the only common mRNAs that lack
FIGURE 29.37 The organization of the mammalian DHFR gene in three representative species (Chinese hamster, mouse, and human). Note that the exons are much shorter than the introns. Note also that the exon pattern is more highly conserved than the intron pattern.
poly(A) tails.] These A residues are not encoded in the DNA but are added
post-transcriptionally by the enzyme poly(A) polymerase, using ATP as a substrate. The
consensus AAUAAA is not itself the poly(A) addition site; instead it defines the
position where poly(A) addition occurs (Figure 29.40). The consensus AAUAAA is
found 10 to 35 nucleotides upstream from where the nascent primary transcript is
cleaved by an endonuclease to generate a new 3′-OH end. This end is where the
poly(A) tail is added. The processing events of mRNA capping, poly(A) addition,
and splicing of the primary transcript create the mature mRNA. Interestingly, both
the guanylyl transferase that adds the 5′-cap structure and the enzymes that process
the 3′-end of the transcript and add the poly(A) tail are anchored to RNA
polymerase II via interactions with its RPB1 CTD.
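The relationship between the AAUAAA signal and the cleavage site can be expressed as a simple search: find each occurrence of the signal and report the window 10 to 35 nucleotides downstream in which cleavage would be expected. The Python sketch below does this. The function name `cleavage_windows` is hypothetical, and the choice to count the distance from the last nucleotide of the signal is an assumption, since the text does not say from which end of the hexamer the 10 to 35 nucleotides are measured.

```python
SIGNAL = "AAUAAA"


def cleavage_windows(transcript: str, lo: int = 10, hi: int = 35) -> list[tuple[int, int, int]]:
    """For each AAUAAA, return (signal_start, window_start, window_end),
    all numbered from 1 and inclusive. Distances are counted from the
    last nucleotide of the signal (an assumption made for this sketch)."""
    windows = []
    pos = transcript.find(SIGNAL)
    while pos != -1:
        signal_last = pos + len(SIGNAL)      # 1-based position of the final A
        first = signal_last + lo
        last = min(signal_last + hi, len(transcript))
        if first <= len(transcript):
            windows.append((pos + 1, first, last))
        pos = transcript.find(SIGNAL, pos + 1)
    return windows


# Toy transcript for illustration: 5 nt, the signal, then 40 nt of 3' sequence.
TOY = "G" * 5 + SIGNAL + "C" * 40

print(cleavage_windows(TOY))
```

In a real transcript the actual cleavage position within this window also depends on downstream elements such as the G/U-rich sequence mentioned in the legend to Figure 29.40, which this sketch ignores.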
Nuclear Pre-mRNA Splicing
Within the nucleus, hnRNA forms ribonucleoprotein particles (RNPs) through
association with a characteristic set of nuclear proteins. These proteins interact with
the nascent RNA chain as it is synthesized, maintaining the hnRNA in an untangled,
FIGURE 29.38 The capping of eukaryotic pre-mRNAs. Guanylyl transferase catalyzes the addition of a guanylyl residue (Gp) derived from GTP to the 5′-end of the growing transcript, which has a 5′-triphosphate group already there. In the process, pyrophosphate (pp) is liberated from GTP and the terminal phosphate (p) is removed from the transcript. Gppp + pppApNpNpNp… → GpppApNpNpNp… + pp + p (A is often the initial nucleotide in the primary transcript).
FIGURE 29.39 Methylation of several specific sites located at the 5′-end of eukaryotic pre-mRNAs is an essential step in mRNA maturation. A cap bearing only a single -CH3 on the guanyl is termed cap 0. This methylation occurs in all eukaryotic mRNAs. If a methyl is also added to the 2′-O position of the first nucleotide after the cap, a cap 1 structure is generated. This is the predominant cap form in all multicellular eukaryotes. Some species add a third -CH3 to the 2′-O position of the second nucleotide after the cap, giving a cap 2 structure. Also, if the first base after the cap is an adenine, it may be methylated on its 6-NH2. In addition, approximately 0.1% of the adenine bases throughout the mRNA of higher eukaryotes carry methylation on their 6-NH2 groups.
accessible conformation. The substrate for splicing, that is, intron excision and exon ligation, is the capped primary transcript emerging from the RNA polymerase II transcriptional apparatus, in the form of an RNP complex. Splicing occurs exclusively in the nucleus. The mature mRNA that results is then exported to the cytoplasm to be translated. Splicing requires precise cleavage at the 5′- and 3′-ends of introns and the accurate joining of the two ends. Consensus sequences define the exon/intron junctions in eukaryotic mRNA precursors, as indicated from an analysis of the splice sites in vertebrate genes (Figure 29.41). Note that the sequences GU and AG are found at the 5′- and 3′-ends, respectively, of introns in pre-mRNAs from higher eukaryotes. In addition to the splice junctions, a conserved sequence within the intron, the branch site, is also essential to pre-mRNA splicing. The site lies 18 to 40 nucleotides upstream from the 3′-splice site and is represented in higher eukaryotes by the consensus sequence YNYRAY, where Y is any pyrimidine, R is any purine, and N is any nucleotide.
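The three sequence features just described (GU at the 5′-end of the intron, AG at its 3′-end, and a YNYRAY branch site 18 to 40 nucleotides upstream of the 3′-splice site) can be checked with a few lines of Python. In the sketch below, the function name `check_intron` and the toy intron are hypothetical, and the distance is counted as the number of intron nucleotides that follow the branch-site A, which is one reasonable reading of "upstream from the 3′-splice site" and is an assumption of this example.

```python
import re

# YNYRAY: Y = pyrimidine (C or U), R = purine (A or G), N = any nucleotide.
# The lookahead lets overlapping candidate sites be reported.
BRANCH = re.compile(r"(?=([CU][ACGU][CU][AG]A[CU]))")


def check_intron(intron: str) -> tuple[bool, list[tuple[int, str, int]]]:
    """Return (has GU...AG ends, list of branch-site candidates).
    Each candidate is (position of the branch A numbered from 1, motif,
    number of intron nucleotides downstream of that A)."""
    has_ends = intron.startswith("GU") and intron.endswith("AG")
    candidates = []
    for m in BRANCH.finditer(intron):
        a_index = m.start() + 4                  # the A is the 5th base of YNYRAY
        downstream = len(intron) - 1 - a_index   # nucleotides after the A
        if 18 <= downstream <= 40:
            candidates.append((a_index + 1, m.group(1), downstream))
    return has_ends, candidates


# Toy intron for illustration: 5' GU, a CUCAAC branch site, a pyrimidine tract, 3' AG.
TOY_INTRON = "GUAAGU" + "C" * 10 + "CUCAAC" + "U" * 20 + "CAG"

print(check_intron(TOY_INTRON))
```

The toy intron ends in a run of pyrimidines followed by CAG, mirroring the 3′-splice site consensus summarized in Figure 29.41.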
The Splicing Reaction Proceeds via Formation of a Lariat Intermediate
The mechanism for splicing nuclear mRNA precursors is shown in Figure 29.42.
A covalently closed loop of RNA, the lariat, is formed by attachment of the
5′-phosphate group of the intron’s invariant 5′-G to the 2′-OH at the invariant branch site A to form a 2′-5′ phosphodiester bond. Note that lariat formation creates
an unusual branched nucleic acid. The lariat structure is excised when the 3′-OH of the consensus G at the 3′-end of the 5′ exon (Exon 1, Figure 29.42) covalently joins with the 5′-phosphate at the 5′-end of the 3′ exon (Exon 2). The reactions that occur are transesterification reactions where an OH group reacts with a phosphoester bond, displacing an -OH to form a new phosphoester link. Because the reactions lead to no net change in the number of phosphodiester linkages, no energy
in-
FIGURE 29.40 Poly(A) addition to the 3′-ends of transcripts occurs 10 to 35 nucleotides downstream from a consensus AAUAAA sequence, defined as the polyadenylylation signal. CPSF (cleavage and polyadenylylation specificity factor) binds to this signal sequence and mediates looping of the 3′-end of the transcript through interactions with a G/U-rich sequence even further downstream. Cleavage factors (CFs) then bind and bring about the endonucleolytic cleavage of the transcript to create a new 3′-end 10 to 35 nucleotides downstream from the polyadenylylation signal. Poly(A) polymerase (PAP) then successively adds 200 to 250 adenylyl residues to the new 3′-end. (RNA polymerase II is also a significant part of the polyadenylylation complex at the 3′-end of the transcript, but for simplicity in illustration, its presence is not shown in the lower part of the figure.)
5′-Splice Site Consensus: (exon) AG:GUAAGU (intron)
3′-Splice Site Consensus: (intron) PyPyPyPyPyPyPyPy–CAG:G–– (exon)
FIGURE 29.41 Consensus sequences at the splice sites in vertebrate genes.