Alternative splicing: global insights
Martina Hallegger*, Miriam Llorian* and Christopher W J Smith
Department of Biochemistry, University of Cambridge, UK
Keywords
alternative splicing; microarray; RNA-Seq

Correspondence
C W J Smith, Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK
Fax: +44 1223 766002
Tel: +44 1223 333655
E-mail: cwjs1@cam.ac.uk

*These authors contributed equally to this work

(Received 26 August 2009, accepted 22 October 2009)

doi:10.1111/j.1742-4658.2009.07521.x

Following the original reports of pre-mRNA splicing in 1977, it was quickly realized that splicing together of different combinations of splice sites – alternative splicing – allows individual genes to generate more than one mRNA isoform. The full extent of alternative splicing only began to be revealed once large-scale genome and transcriptome sequencing projects began, rapidly revealing that alternative splicing is the rule rather than the exception. Recent technical innovations have facilitated the investigation of alternative splicing at a global scale. Splice-sensitive microarray platforms and deep sequencing allow quantitative profiling of very large numbers of alternative splicing events, whereas global analysis of the targets of RNA binding proteins reveals the regulatory networks involved in post-transcriptional gene control. Combined with sophisticated computational analysis, these new approaches are beginning to reveal the so-called ‘RNA code’ that underlies tissue and developmentally regulated alternative splicing, and that can be disrupted by disease-causing mutations.

Abbreviations
CLIP, UV cross-linking and immunoprecipitation; CELF, CUGBP and ETR3 like family (of RNA binding proteins); CUGBP, CUG binding protein; miRNA, micro-RNA; RNP, ribonucleoprotein; MBNL, muscleblind like; PTB, polypyrimidine tract binding protein; SELEX, selective evolution of ligands by exponential enrichment; SR protein, serine-arginine rich protein.

Introduction

Alternative splicing allows individual genes to produce two or more variant mRNAs, which in many cases encode functionally distinct proteins. With the progressive generation of ever larger sequence datasets, the proportion of multi-exon human genes that are known to be alternatively spliced has expanded to 92–94%, of which 85% have a minor isoform frequency of at least 15% [1,2]. Despite some debate about the extent to which all of this alternative splicing is functionally important [3], there is no disputing that alternative splicing is a major contributor to the diverse repertoire of transcriptomes and proteomes. Its importance is underscored by the fact that misregulated alternative splicing can lead to human disease [4,5]. As part of the overarching effort to understand how the information encrypted within genomes is used to generate fully functional organisms, it is therefore necessary to decipher the ‘RNA codes’ underlying regulated patterns of alternative splicing.

Traditionally, research on alternative splicing regulation focused on the study of minigene models in vitro or in vivo. The picture that emerged is that regulation of alternative splicing occurs via the action of numerous RNA binding proteins expressed at variable levels between tissues. These activators and repressors often mediate their effects by binding to enhancer and silencer elements within or surrounding alternatively spliced exons (reviewed in [6]). Although much progress has been made using model systems, a drawback is that even when a model alternative splicing event has been thoroughly characterized it is not immediately obvious which of its features are generally shared by coregulated alternative splicing events as part of a common regulatory programme, and which features are oddities of the particular model system. Over recent years new high-throughput methodologies have allowed the analysis of thousands of alternative splicing events in parallel. These tools – principally splice-sensitive microarrays, but also medium-throughput automated RT-PCR, and increasingly deep sequencing – allow large-scale quantitative profiling of splice variants. This is important in allowing the generation of large datasets of coregulated splicing events – a prerequisite for defining RNA codes. Biomedically, these approaches can facilitate the identification of splicing signatures that are associated with pathologies [7]. At the same time, improved methods for defining the full cellular complement of RNAs to which a particular protein binds – for example, CLIP (UV cross-linking and immunoprecipitation [8]) and its ‘next generation’ derivative HITS-CLIP [9] or CLIP-Seq [10] – as well as a global analysis of alternative splicing changes produced as a result of splicing factor knockdown or knockout, provide additional ‘factor-centric’ datasets that can contribute to defining the codes.

Several recent reviews have covered different aspects of these global analyses [11–15]. The aim of this minireview is to highlight some of the recently published information that contributes towards breaking the RNA code by the application of high-throughput methodology, mainly focusing upon work in mammalian systems. We start by providing a brief review of the enabling technologies, and move on to discuss the insights they have allowed and possible future developments.
Analogue and digital transcriptome profiling

Early microarrays typically contained probes consisting of full-length cDNAs or oligonucleotide probes located towards the 3′ end of transcripts, and were unable to distinguish alternatively spliced isoforms. However, a number of current array designs, in different ‘flavours’ depending on the location of the probes, can distinguish between splice variants (Fig. 1A, Table 1): (a) tiling arrays, with overlapping probes across a known genomic sequence (a chromosome or an entire genome) [16]; (b) exon-body arrays, in which probes are located within exons (for example, the Affymetrix human ExonArray includes 1.4 million probe sets corresponding to all known human exons, ranging from the well annotated to more speculative computational predictions [17–20]); (c) splice-junction arrays, which contain probes crossing spliced junctions [21]; or (d) exon-junction arrays, which contain probes within exons as well as across exon junctions. Among the exon-junction arrays that have been used successfully are human and mouse arrays interrogating 3100 and 3700 cassette exons, respectively [22,23]. A similar design has been used to interrogate 8315 alternative splicing events in Drosophila [24–26]. Finally, a ‘whole transcript’ microarray monitoring 203 672 exons and 178 351 exon junctions has allowed the identification of more than 24 000 human alternative splicing events [27]. Such arrays have been applied successfully to study changes in alternative splicing under different conditions, ranging from tissue-specific changes [17,27,28], cancer-associated splicing [19,29], signal-activated splicing [26,30] and developmentally regulated splicing [20,31], as well as to define functional targets by splicing factor depletion [18,25,32–34] and alternative splicing events linked to nonsense-mediated decay [35]. Although splice-sensitive microarrays have been applied with great success (see Table 1), they have some limitations, including cross-hybridization problems, limited dynamic range, as well as a low signal-to-noise ratio due to background. In particular, many of the normal rules for optimal probe design have to be relaxed or ignored in the case of exon-junction probes. Finally, arrays are not an ideal platform for discovering new alternative splicing events, including, for example, inclusion of pseudo-exons (see accompanying review by Dhir and Buratti [36]), and they are limited to organisms with sequenced genomes.
Sequence-based methods, including small tags, such as expressed sequence tags, cap analysis of gene expression [37], serial analysis of gene expression [38], as well as full-length cDNAs [39,40], have been used to obtain digital counts of transcript abundance, but they have suffered from bias introduced in the sample preparation, inability to detect lowly expressed genes and low statistical power. The development of high-throughput DNA sequencing technologies [10,41,42] circumvents many of these previous barriers [1,43–48]. RNA-Seq has the capacity to generate millions of short sequence reads (25–30 or 200–400 nucleotides depending on the sequencing technology) of cDNAs derived from polyA-enriched mRNA [45]. Reads are then mapped on to unique locations on the genome and annotated transcriptome (for splice-junction reads), providing a digital count of expressed sequences (exons). Differences in read densities across genes in different conditions allow for quantification of gene expression [2,43]. Comparison with microarray or RT-PCR data shows that read counts give an accurate estimate of relative gene expression levels across a very broad dynamic range [1,2].
Fig. 1. High-throughput methods for global analyses of alternative splicing. (A) Schematic representation of different splice-sensitive microarrays (adapted from [27]). Exon arrays, typically Affymetrix Exon Arrays, contain oligonucleotide probe sets for every known and predicted exon. Junction arrays, as used in [21], contain probes spanning exon junctions across annotated genes. Exon-junction arrays typically contain both exon-body and exon-junction probes. The coverage of these arrays varies from a few thousand cassette exons [22,23] to all annotated alternatively spliced genes in Drosophila [24–26] or every single annotated exon and exon junction in 18 000 human genes [27]. The bottom panel shows an example of differential exon usage for a typical cassette exon by means of the differential hybridization signals. (B) RNA-Seq. The genomic structure for a typical cassette exon is depicted in the middle of the panel, where constitutive exons are shown in purple and the alternative cassette exon in blue. Sequence reads obtained from the high-throughput method are represented in colour-coded rectangles (see inset) and are mapped within the genomic sequence. The counting of reads corresponding to inclusion (upper) and skipping (bottom) allows for the estimation of ‘inclusion ratios’ for the different alternatively spliced isoforms.

Because many sequence reads span exon–exon junctions, RNA-Seq can identify novel splicing events. The discovery of new alternative splicing events and mRNA isoforms is an area where the new sequencing technologies will have an immediate impact. However, a greater challenge is to harness RNA-Seq for digital quantitative profiling of alternative splicing (Fig. 1B). In principle, changes in alternative splicing between two conditions can be quantitated by comparing the number of reads mapping to reciprocal events (e.g. exon inclusion versus skipping) [2], or by normalizing the number of reads mapping to a particular splice junction or exon by the number of reads across the gene. In practice, large-amplitude changes in alternative splicing events within genes that are themselves highly expressed are readily detected (e.g. the ‘switch-like’ events reported in [2]). Only in one-third of 105 000 annotated alternative splicing events were reciprocal reads detected by Wang et al. [2], allowing quantification of tissue-specific differential splicing using a minimum threshold of 10% change in inclusion ratio between tissues. However, more subtle changes in alternative splicing within genes for which few reads are available will evade detection [49]. Recent estimates suggest that 200 million reads would be required to quantitate accurately the splicing levels in 80% of genes [15]. In the future, the progressively decreasing cost and increasing read lengths and volume of high-throughput sequencing can only advance the ability of RNA-Seq to profile alternative splicing quantitatively. Methods to ‘focus’ sequence reads on to splice junctions, such as RNA-mediated annealing, selection, extension and ligation [50] or preselection by customized capture arrays [51], might enable more cost-effective quantitative profiling of a large number of alternative splicing events. In the meantime, some of the splice-sensitive microarray platforms will remain competitive.
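The read-counting idea described above (inclusion versus skipping reads, with a minimum 10% change in inclusion ratio between tissues) can be illustrated with a minimal Python sketch. It is not the procedure used in [2]: the function names, the `min_reads` cut-off and the example counts are assumptions made for illustration, and a real analysis would also normalize for the number of mappable positions that support each isoform.

```python
def inclusion_ratio(inclusion_reads, skipping_reads, min_reads=10):
    """Fraction of reciprocal reads that support exon inclusion.

    Returns None when fewer than `min_reads` reads cover the event;
    this coverage cut-off is an illustrative assumption.
    """
    total = inclusion_reads + skipping_reads
    if total < min_reads:
        return None
    return inclusion_reads / total


def differential_splicing(counts_a, counts_b, min_delta=0.10, min_reads=10):
    """Compare two conditions, each given as {event: (inclusion, skipping)}.

    Returns (event, ratio_a, ratio_b, delta) for events whose inclusion
    ratio differs by at least `min_delta` (10% by default).
    """
    hits = []
    for event in sorted(counts_a.keys() & counts_b.keys()):
        ratio_a = inclusion_ratio(*counts_a[event], min_reads=min_reads)
        ratio_b = inclusion_ratio(*counts_b[event], min_reads=min_reads)
        if ratio_a is None or ratio_b is None:
            continue  # too few reads to estimate the ratio
        delta = ratio_b - ratio_a
        if abs(delta) >= min_delta:
            hits.append((event, ratio_a, ratio_b, delta))
    return hits


# Hypothetical read counts: (inclusion reads, skipping reads) per event.
brain = {"exonA": (90, 10), "exonB": (3, 2), "exonC": (50, 50)}
liver = {"exonA": (20, 80), "exonB": (40, 40), "exonC": (52, 48)}

if __name__ == "__main__":
    for hit in differential_splicing(brain, liver):
        print(hit)
```

In this toy input, only the switch-like event passes both filters; the low-coverage event is dropped, which mirrors the point that subtle changes in weakly expressed genes evade detection.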
Table 1. Summary of splice-sensitive microarray analyses.
Array | Samples | Organism | Validation rate (events tested) | Reference
203 672 exons / 178 351 exon junctions | 48 tissues and cell lines | Human | 74% (23 events tested) | [27]
110 367 exons / 93 382 exon junctions | Time course of heart development | Mouse | Not mentioned | [31]
Affymetrix Exon Array, probe sets for 1 million exons | Colon, bladder, prostate cancer tissues
Exon Array and array featuring exon-body and exon-junction probe sets | UPF3 in HeLa
8315 mRNAs / 9868 alt junction probes | Knockdown of SR and hnRNP proteins in S2 cells
 | Knockdown of hnRNP proteins in S2 cells
 | Alternative splicing changes upon insulin or Wingless stimulation

Surveying splicing regulator targets

Cataloguing the targets of RNA binding proteins that are known splicing regulators provides a complementary entry point for unravelling RNA codes. ‘Functional targets’ can be classified as the set of alternative splicing events that are affected by perturbing the levels of a splicing regulator, by knockdown, knockout or overexpression. These targets can be identified by global transcriptome profiling tools, such as splice-sensitive microarrays [18,25,32–34], medium-throughput RT-PCR [52], RNA-Seq or even quantitative proteomics [53]. However, apparent functional targets can include indirect secondary targets.

A complementary approach is to identify direct RNA ‘binding targets’. Selective evolution of ligands by exponential enrichment (SELEX) is an initial fully in vitro approach that defines the optimal binding site, typically short variably degenerate motifs, for an RNA binding protein by iterative selection from an initially fully degenerate sequence pool [54]. A variant approach, genomic SELEX, uses RNA transcribed from genomic DNA as the starting pool for selection [55]. SELEX is a useful, although not obligatory, precursor to methods that catalogue the actual RNA species (mRNA or pre-mRNA) bound by a splicing regulatory protein. Direct immunoprecipitation without prior cross-linking (RNP immunoprecipitation) followed by hybridization to arrays can be a useful approach [25]. However, a more powerful approach for identifying binding targets is CLIP (Fig. 2), which was originally developed to identify targets of the neuron-specific NOVA proteins [8,56]. RNA is first cross-linked in vivo to bound protein by UV irradiation, fragmented to 100 nucleotide tags, isolated by immunoprecipitation, reverse transcribed and then sequenced. A key feature of CLIP is that UV induces ‘zero-length’ cross-links only between RNA and directly bound proteins, thereby allowing enrichment of specifically bound sequences by immunoprecipitation under stringent conditions. The original CLIP procedure has now been modified, with direct high-throughput sequencing of reverse transcribed tags [9,10]. The so-called HITS-CLIP [9] or CLIP-Seq [10] protocols allow saturated coverage of binding targets, giving a truly global view of the RNP landscape of individual proteins, and suggesting possible novel functions. This ‘next generation’ CLIP approach has already been applied to the splicing regulators NOVA [57], FOX2 [58], SFRS1 (better known as SF2/ASF) [59,60], as well as the miRNA-associated protein, argonaute [61]. The comprehensive view afforded by this approach reveals additional, nonsplicing-related, roles for these RNA binding proteins. For example, a surprising new function for NOVA2 in alternative poly(A)-site choice was discovered. Neuronal cells in general tend to process at promoter-distal poly(A)-sites and the NOVA2 targets follow this trend. Proliferating cells produce shorter 3′ UTRs and therefore reduce the potential of miRNA regulation [62]. By the same token, neuronal transcripts with long UTRs are potentially more prone to regulatory inputs from both miRNAs and 3′ UTR binding proteins.
In practice, methods to define functional and binding targets are complementary. A comprehensive global analysis of the Drosophila homologues of the mammalian hnRNP A/B proteins, hrp36, hrp38, hrp40 and hrp48, involved analysis by a splice-sensitive array of alterations in alternative splicing upon knockdown, determination of SELEX motifs in vitro and direct immunoprecipitation without prior cross-linking followed by hybridization to a whole genome tiling array [25]. This provided many insights into the functional redundancy and specialization of this family, and provided hints about their probable mechanism of action. Perhaps most surprisingly, in view of popular models about antagonism between the two families of proteins, very few alternative splicing events were found to be regulated by both hnRNP and SR proteins [24,25].
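The complementarity of functional and binding targets can be pictured as simple set operations: events that both change upon perturbation and are bound by the factor are candidate direct targets, whereas events that change without detectable binding may be indirect. The Python sketch below is only an illustration under that assumption; the function name, category labels and event identifiers are hypothetical and do not come from the studies cited.

```python
def classify_targets(functional, binding):
    """Partition splicing events by the evidence supporting them.

    functional: events whose splicing changes when the regulator is
                knocked down, knocked out or overexpressed.
    binding:    events with binding evidence for the regulator
                (for example, from CLIP tags).
    """
    functional, binding = set(functional), set(binding)
    return {
        "candidate_direct": functional & binding,
        "possibly_indirect": functional - binding,
        "bound_without_change": binding - functional,
    }


# Hypothetical event identifiers, for illustration only.
knockdown_hits = {"ev1", "ev2", "ev3", "ev4"}
clip_hits = {"ev2", "ev4", "ev7"}

if __name__ == "__main__":
    for label, events in classify_targets(knockdown_hits, clip_hits).items():
        print(label, sorted(events))
```

The same intersection, applied to the functional targets of two different regulators, gives the kind of overlap estimate discussed above for hnRNP and SR proteins.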
Fig. 2. HITS-CLIP. Intact tissue or tissue culture cells are UV irradiated to induce covalent cross-links between RNA and RNA binding proteins. Cells are lysed under very stringent conditions and treated with DNase and partially digested with RNases. The RNA–RNP complex is pulled down by immunoprecipitation. The RNA is radioactively 5′ labelled and ligated to a 5′ RNA linker. The sample is run on SDS/PAGE with neutral pH and blotted. Only RNA cross-linked to protein will be transferred on to the membrane. A small fragment of membrane is isolated at a position that corresponds to the protein plus RNA between 50 and 100 nucleotides. After proteinase K digestion, the RNA is recovered from the membrane and ligated on its 3′ end to an RNA adapter with complementarity to the RT primer. The following PCR step with primer complementary to ligated linkers also allows the addition of appropriate HITS-specific primer sequences (adapted from [76]).

Tissue and individual variations in alternative splicing

Over the last year, several reports have focussed on the global analysis of transcript isoform differences between human tissues [1,2,16,27,28,47,63,64], mouse tissues [31,63], normal and cancer tissues [64], in response to specific signalling pathways in Drosophila [26], or developmental transitions in human brain [28], mouse heart [31] and mouse stem cells [63]. The combination of these approaches has revealed extensive transcript complexity.

Sequencing approaches show that many transcripts extend beyond the previously annotated 5′ and 3′ gene boundaries [1,2,63]. Moreover, there has been a substantial increase in the number of known alternative splicing events, with the number of newly discovered splice junctions ranging from 1400 in one study [63] to between 4294 and 11 099 in another [1]. The majority of detected alternative splicing events, including those newly discovered, show clear tissue specificity, demonstrating the importance of alternative splicing in tissue-specific programmes of gene expression. In one study alone, involving 400 million 32 base reads from 15 human tissues and cell lines, 22 000 tissue-specific alternative transcript events were identified [2]. A group of alternative splicing events that shows extreme changes between tissues – so-called ‘switch-like’ events – is associated with the regulation of highly tissue-specific functions by switching between distinct full-length isoforms [2]. Perhaps unsurprisingly, some of these switch-like alternative splicing events within highly expressed genes (e.g. TPM1) have been used for many years as model systems of regulated alternative splicing.
Interestingly, although in many cases alternative splicing regulates functionally coherent groups of genes, there is no significant overlap between those genes that are differentially transcribed and those that are differentially spliced within the same tissue or within specific cell programmes [27,30,31,42,65]. For example, upon T cell activation, genes related to the immunological response are affected at the level of transcription, whereas cell cycle genes are differentially spliced [30]. These findings build upon the original observations of Pan et al. [23] suggesting that overall programmes of tissue-specific gene expression involve independent subprogrammes operating on different subsets of genes at the levels of transcription and splicing [66]. On the other hand, in response to certain signalling pathways in Drosophila melanogaster cells, a 40% overlap was found between genes that undergo transcriptional changes and those that undergo splicing changes, suggesting that transcriptional and post-transcriptional co-ordination could be important to deploy quick responses upon certain stimuli [26].
Sequencing and array studies have also provided fascinating glimpses at the degree to which alternative splicing varies between individuals. RNA-Seq of seven cerebellar cortex samples [2] and exon tiling array analysis of 57 lymphoblastoid cell lines [16] both showed a significant association between genomic variations (single nucleotide polymorphisms) and alternative splicing patterns. Happily (for those working on mechanisms of tissue-specific splicing), both studies indicated that although alternative splicing variation between individuals is common, it is secondary to tissue-specific alternative splicing.
Motifs and maps

RNA-Seq and microarray analysis on tissues have generated a genome-scale catalogue of isoform expression profiles [2,17,27,31]. These data provide a resource to identify the RNA sequences involved in the regulation of tissue-specific alternative splicing by motif enrichment analysis. In some cases, the motifs associated with tissue-specific alternative splicing hint at the involvement of ‘usual suspects’ – well-known splicing regulators with defined binding sequences.

By microarray profiling 48 human tissues and systematically screening for 4-mer to 7-mer RNA ‘words’ associated with 24 426 alternatively spliced exons, Castle et al. [27] identified 143 motifs enriched near tissue-specific exons. Interestingly, the two most frequent motifs, UCUCU and UGCAUG, coincide with binding consensus sequences for PTB/nPTB and FOX splicing factors, and show a distinct pattern of genomic localization. Similar observations were made based on RNA-Seq reads from 15 human tissues and cell lines [2]. UCUCU motifs were enriched within a 200 nucleotide region upstream of cassette exons that are upregulated in brain and striated muscle. The extent to which these exons are spliced correlates inversely with PTB expression levels [2,27], consistent with PTB’s well-known role as a splicing repressor [67].
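The screening for enriched RNA ‘words’ described above can be sketched in Python as a comparison of k-mer frequencies between sequences flanking regulated exons (foreground) and control sequences (background). This is a simplified illustration, not the method of Castle et al. [27]: the function names, the pseudocount and the toy sequences are assumptions, and a real screen would add a statistical test and multiple-testing correction.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count all overlapping k-mers, treating DNA T as RNA U."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper().replace("T", "U")
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def kmer_enrichment(foreground, background, k, pseudocount=1.0):
    """Rank k-mers by foreground/background frequency ratio.

    The pseudocount (an illustrative choice) avoids division by zero
    for words that are absent from the background set.
    """
    fg = count_kmers(foreground, k)
    bg = count_kmers(background, k)
    fg_total = sum(fg.values()) + pseudocount * 4 ** k
    bg_total = sum(bg.values()) + pseudocount * 4 ** k
    scores = {}
    for kmer, n in fg.items():
        fg_freq = (n + pseudocount) / fg_total
        bg_freq = (bg[kmer] + pseudocount) / bg_total
        scores[kmer] = fg_freq / bg_freq
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy sequences, for illustration only.
regulated_flanks = ["AAUGCAUGAA", "CCUGCAUGCC"]
control_flanks = ["AAAAAAAAAA", "CCCCCCCCCC"]

if __name__ == "__main__":
    for kmer, score in kmer_enrichment(regulated_flanks, control_flanks, k=6)[:3]:
        print(kmer, round(score, 2))
```

Running the same function for k = 4 to 7 would reproduce the range of word lengths mentioned above.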
The Castle et al. [27] junction array was also used to analyse alternative splicing during development of the mouse heart, resulting in the identification of 63 developmentally regulated alternative splicing events, falling into three temporal groups. More than half of these events were regulated similarly during development of the chicken heart [31]. Enriched motifs included binding sites for the CUGBP, MBNL, FOX, STAR and PTB families of splicing factors. Forty-four of these alternative splicing events were further investigated in hearts from transgenic animals that overexpressed CUGBP1 or were depleted of MBNL1. Of the 24 exons with altered inclusion levels, 13 were regulated by CUGBP1, five by MBNL1 and six antagonistically by both [31]. The switch in relative activities of CUGBP and MBNL proteins during development appears to explain a large subset of splicing transitions detected during postnatal heart development.
Observation of enriched motifs in the cases above allowed inferences to be drawn about the probable cognate binding proteins, e.g. FOX, PTB, MBNL and CELF proteins. However, there are more than 300 RNA binding proteins encoded in mammalian genomes [68], which have the potential to act as splicing regulators, but for most little or nothing is known about their binding specificity. Traditional SELEX to determine their binding specificity would be laborious. However, a new array-based procedure may provide the capability to rapidly derive the optimal binding motifs for many of these proteins [69], which would assist in future attempts to link factors with enriched motifs.
NOVA and FOX maps

In the case of two families of mammalian proteins, the FOX and NOVA proteins, a variety of techniques, culminating in HITS-CLIP analysis, have converged on very similar RNA maps, in which the precise location of binding sites for the cognate proteins is predictive of their action as either activators or repressors of alternatively spliced exons.
The NOVA proteins are neuron-specific RNA binding proteins that are targets of a neuronal autoimmune response associated with cancer. Analysis of these proteins in the Darnell laboratory has led the way in the global analysis of RNA binding protein function [70]. SELEX analysis indicated that the optimal binding site for NOVA consisted of clusters of three YCAY motifs [71], and importantly a cluster of such motifs matched a cis element crucial for NOVA-regulated alternative splicing of an exon in the GABAA gene. Analysis of alterations in alternative splicing in the neocortex of wild-type and Nova2−/− mice using an Affymetrix prototype junction array with 40 000 probe sets allowed the identification of 50 alternative splicing events that were NOVA regulated [72]. The genes affected by NOVA-dependent alternative splicing were highly enriched for proteins involved in synaptic function, emphasizing the fact that alternative splicing targets functionally coherent groups of genes. The CLIP method was originally developed to analyse in vivo NOVA binding RNAs by conventional cloning and sequencing of purified RNA tags. Of the moderate number of sequence tags identified, only 20% contained clusters of YCAY motifs, but in these cases the tags were often associated with NOVA-regulated alternative splicing events [56]. On the basis of the accumulated group of validated NOVA targets, a bioinformatic exercise was carried out to identify clusters of YCAY motifs within 200 nucleotides of alternative exons or their flanking constitutive exons, and moreover to predict whether these clusters would act as enhancers or silencers [73]. The resulting NOVA RNA map contained various intronic and exonic silencers, as well as intronic enhancers. NOVA clusters within the downstream intron were invariably enhancers, whereas within the exon and most positions in the upstream intron they were silencers. Most recently, the NOVA RNA map has been refined by high-throughput sequencing (using the Roche 454 platform) of NOVA2 CLIP tags from mouse neocortex, with confirmation of splicing outcomes by splice-junction array comparison of wild-type and Nova2−/− mice [57]. As expected, the comprehensive HITS-CLIP approach rediscovered many of the previously known NOVA targets, as well as many new ones. The refined NOVA map showed that NOVA binding clusters within 500 nucleotides of the alternative 5′ splice site or constitutive 3′ splice site acted as enhancers, whereas NOVA binding within 500 nucleotides of the constitutive 5′ splice site or surrounding the NOVA-regulated exon was inhibitory.

The FOX1 and -2 proteins are alternative splicing regulators that have a single RNA binding domain with an unusual degree of specificity for the cognate UGCAUG binding site [74]. In a number of recent global transcriptome profiling studies, FOX binding motifs were found to be associated with exons regulated in striated muscle and neurons [2,27,75], consistent with the expression patterns of FOX1 and -2. Analysis of breast and ovarian cancer using an RT-PCR panel of alternative splicing events indicated that one-third of cases of increased exon skipping in cancer were associated with downstream FOX sites. Moreover, FOX2 expression is lower in breast cancer and its own alternative splicing is altered in ovarian cancer [64]. Closer analysis of the various FOX datasets showed an interesting position-dependent effect, reminiscent of the NOVA map [57,73]. When located downstream of alternatively spliced exons, FOX binding sites act as enhancers, whereas on the upstream side they act as repressors (Fig. 3). The FOX ‘RNA map’ was also converged upon by two additional approaches that used FOX binding sites and mRNA targets as the starting point. The long and nondegenerate nature of the FOX binding site allowed Zhang et al. [75] to conduct a computational search for positionally conserved UGCAUG motifs within 200 nucleotides of internal exons across 28 vertebrate genomes. Comparing the bioinformatics with data collected from the Castle et al. [27] custom exon-junction array for alternative splicing in 47 different tissues and cell lines, they identified the position dependency of FOX binding sites. Finally, CLIP-Seq analysis was carried out for FOX2 binding sites in human embryonic stem cells [58]. Of 5.3 million 36 nucleotide reads, 4.4 million mapped to unique genomic locations leading to the identification of > 3500 clusters representing genuine FOX2 binding events. Surprisingly, although the UGCAUG motif was highly enriched, an exact match was only found in 22% of clusters, and even the core GCAUG pentamer was present in only 33%, indicating that FOX2 can bind to other sites, perhaps in co-operation with other proteins. FOX2 sites were highly enriched around alternatively spliced exons and a similar position-dependent FOX2 activity map was deduced. Interestingly, it appears that FOX2 is a key player in a splicing regulatory network in human embryonic stem cells. The alternative splicing events regulated by FOX2 were highly enriched for splicing regulatory proteins, including numerous hnRNP and SR proteins and an autoregulatory splicing event in the FOX2 gene itself [58]. In contrast, different sets of FOX2 targets were identified in neural progenitors, with the major functional enrichment being for cytoskeletal proteins, consistent with other reports [27,58,64,75].
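The position-dependent logic of these RNA maps can be illustrated with a short Python sketch that counts motif occurrences (UGCAUG for FOX, or the degenerate YCAY for NOVA) in the intronic windows flanking an alternative exon. It is a toy model under stated assumptions: the function names, the 200 nucleotide default window and the simple majority rule in `predict_effect` are illustrative and are not taken from the cited studies, which rely on clustering, conservation and experimental validation.

```python
import re

# Minimal IUPAC table for RNA motifs (Y = pyrimidine, R = purine).
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U",
         "Y": "[CU]", "R": "[AG]", "N": "[ACGU]"}


def motif_positions(seq, motif):
    """Start positions of all (possibly overlapping) motif matches."""
    seq = seq.upper().replace("T", "U")
    body = "".join(IUPAC[base] for base in motif.upper())
    # A lookahead lets the regex engine report overlapping matches.
    pattern = re.compile("(?=(" + body + "))")
    return [match.start() for match in pattern.finditer(seq)]


def flank_counts(upstream_intron, downstream_intron, motif, window=200):
    """Count motifs in the last `window` nt upstream of the exon and
    the first `window` nt downstream of it."""
    upstream = upstream_intron[-window:]
    downstream = downstream_intron[:window]
    return {"upstream": len(motif_positions(upstream, motif)),
            "downstream": len(motif_positions(downstream, motif))}


def predict_effect(counts):
    """Toy rule: downstream sites favour inclusion, upstream sites
    favour skipping, as in the FOX and NOVA maps described above."""
    if counts["downstream"] > counts["upstream"]:
        return "enhanced inclusion"
    if counts["upstream"] > counts["downstream"]:
        return "repressed inclusion"
    return "no prediction"


# Toy intron sequences, for illustration only.
example = flank_counts("AAAAAA", "GGUGCAUGAAUGCAUGCC", "UGCAUG")

if __name__ == "__main__":
    print(example, predict_effect(example))
    print(motif_positions("UCAUUCAUCCAU", "YCAY"))
```

Because FOX2 CLIP clusters often lack an exact UGCAUG match, as noted above, a motif scan of this kind would miss a substantial fraction of genuine binding sites.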
Towards a predictive splicing map
Global alternative splicing profiling points towards the association of some sequence motifs and their cognate binding proteins with some tissue-specific splicing programmes, whereas the NOVA and FOX splicing maps indicate the position-dependent activity of some splicing regulators. But even the activity of FOX and NOVA when bound at particular locations is dependent upon the binding and activity of other factors. There is still some way to go before a full tissue-specific splicing code, with the ability to predict the consequences of mutations, is deciphered.
A recent study highlighted one of the important future directions. The Frey and Blencowe groups have developed a machine-learning approach in which the tissue-specific splicing profiles of 3707 mouse cassette exons, gathered using a custom junction-array platform [22], have been combined with over a thousand separate ‘RNA features’ in order to generate a ‘splicing code’ that predicts changes in exon inclusion between tissues. The features include known protein binding sequences (including FOX, NOVA and PTB/nPTB), motifs with predicted silencer or enhancer activity, secondary structures, conservation, exon and intron size, and whether exon inclusion or skipping introduces a premature termination codon. Using this approach, distinct combinations of features are found to be predictive of five different tissue categories of alternative splicing: central nervous system, muscle, embryo, ‘digestive organs’ (including liver, kidney, gut) and tissue independent (B Frey, personal communication). This pioneering study is based upon a moderate number of cassette exons and 27 tissue-specific datasets, but it provides a clear direction for future endeavours.
Further refinement of the splicing code will be readily achieved by a combination of additional tissue datasets and analysis of transcriptomes of defined cell types (most tissues contain a variety of differentiated cell types), together with larger numbers and different categories of alternative splicing events. The ability to sequence the transcriptomes of single cells [63] will also be enormously helpful as improved methods for sequencing-based quantitative profiling of alternative splicing are developed.
Of course, defining the logic of the code will pose many questions about the underlying mechanisms. For example, why do FOX and NOVA proteins inhibit from an upstream position, but activate from downstream of an alternative exon? As the details of the splicing codes are revealed, there will be scope for a great deal of further mechanistic dissection at the molecular level. However, in contrast to earlier work on alternative splicing mechanisms, experimentalists will know in advance that they are revealing the mechanisms of generally applicable programmes.
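As a rough illustration of what a handful of the ‘RNA features’ listed above might look like in practice, the following Python sketch turns one cassette exon and its flanking introns into a small feature dictionary. It is not the Frey and Blencowe implementation. The class CassetteExon, the function names, the 300 nt window and the frame-preservation flag (a crude stand-in for the premature termination codon feature) are all assumptions made for illustration; the only motif taken from the text is UGCAUG (FOX).

```python
from dataclasses import dataclass

# Only the FOX motif (UGCAUG) comes from the text; a real feature set would
# hold many more entries (NOVA, PTB/nPTB, predicted enhancers and silencers).
MOTIFS = {"FOX": "UGCAUG"}


@dataclass
class CassetteExon:
    """Hypothetical container; all sequences are given 5' to 3'."""
    upstream_intron: str
    exon: str
    downstream_intron: str


def count_motif(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, accepting DNA (T) or RNA (U) input."""
    rna = seq.upper().replace("T", "U")
    return sum(rna.startswith(motif, i) for i in range(len(rna) - len(motif) + 1))


def rna_features(event: CassetteExon, window: int = 300) -> dict:
    """Build a small feature dictionary for one cassette exon."""
    up = event.upstream_intron[-window:]     # intron sequence nearest the 3' splice site
    down = event.downstream_intron[:window]  # intron sequence nearest the 5' splice site
    features = {
        "exon_length": len(event.exon),
        "upstream_intron_length": len(event.upstream_intron),
        "downstream_intron_length": len(event.downstream_intron),
        # Assumption: an exon whose length is a multiple of 3 can be skipped
        # without shifting the reading frame. This is only a proxy for the
        # premature-termination-codon feature described in the text.
        "frame_preserving": len(event.exon) % 3 == 0,
    }
    for name, motif in MOTIFS.items():
        features[f"{name}_upstream"] = count_motif(up, motif)
        features[f"{name}_downstream"] = count_motif(down, motif)
    return features


# Toy sequences, invented for illustration only.
example = CassetteExon("AAUGCAUGAA", "GGGCCC", "UUTGCATGUUUGCAUG")
print(rna_features(example))
```

In a machine-learning setting, dictionaries of this kind for many exons would form the input matrix, with measured tissue-specific inclusion changes as the target; the choice of model is left open here.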
Fig. 3. Position-dependent activity of FOX proteins. (A) Enrichment
of UGCAUG motifs in the downstream intron is associated with
increased exon inclusion in heart, skeletal muscle, brain and
cerebellar cortex. Higher motif frequency in the upstream intron is
associated with reduced inclusion in skeletal muscle. Adapted from [2].
(B) Enrichment of FOX binding sites on the upstream side of
alternatively spliced exons, indicated by the blue line, is associated with
FOX-dependent exon skipping, whereas enrichment on the
downstream side (red line) is associated with FOX-dependent inclusion.
Adapted from [2,58,64,75].
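The kind of positional summary shown in panel B can be sketched as a binned count of motif positions relative to the exon. This is a minimal Python illustration under stated assumptions: the function names, the 50 nt bin size and the four bins are hypothetical, and real splicing maps are built from experimentally defined target exons with background controls, which are omitted here.

```python
from collections import Counter

FOX_MOTIF = "UGCAUG"  # FOX binding motif named in the figure legend


def motif_starts(seq: str, motif: str = FOX_MOTIF) -> list:
    """Return 0-based start positions of motif in seq (DNA or RNA input)."""
    rna = seq.upper().replace("T", "U")
    return [i for i in range(len(rna) - len(motif) + 1) if rna.startswith(motif, i)]


def binned_profile(flanks, bin_size: int = 50, n_bins: int = 4, from_end: bool = False) -> list:
    """Count FOX motif sites per distance bin over a set of intron flanks.

    from_end=False: distance is measured from the start of each sequence
    (downstream intron, distance from the 5' splice site).
    from_end=True: distance is measured back from the end of each sequence
    (upstream intron, distance from the 3' splice site).
    """
    counts = Counter()
    for seq in flanks:
        for start in motif_starts(seq):
            if from_end:
                distance = len(seq) - (start + len(FOX_MOTIF))
            else:
                distance = start
            bin_index = distance // bin_size
            if bin_index < n_bins:
                counts[bin_index] += 1
    return [counts[b] for b in range(n_bins)]


# Toy sequences, invented for illustration only.
upstream_flanks = ["A" * 60 + "UGCAUG" + "A" * 10]
downstream_flanks = ["A" * 55 + "TGCATG"]
print("upstream bins:", binned_profile(upstream_flanks, from_end=True))
print("downstream bins:", binned_profile(downstream_flanks))
```

Comparing such profiles between exons that are skipped and exons that are included in a FOX-dependent manner would be one way to look for the upstream and downstream enrichment described in the legend.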
Acknowledgements
We thank Brendan Frey for comments on the
manuscript and for communicating unpublished data. Work
in the CWJS laboratory is funded by the Wellcome
Trust (programme grant 077877) and by EC grant
EURASNET-LSHG-CT-2005-518238.
References
1 Pan Q, Shai O, Lee LJ, Frey BJ & Blencowe BJ (2008)
Deep surveying of alternative splicing complexity in the
human transcriptome by high-throughput sequencing
Nat Genet 40, 1413–1415
2 Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang
L, Mayr C, Kingsmore SF, Schroth GP & Burge CB
(2008) Alternative isoform regulation in human tissue
transcriptomes Nature 456, 470–476
3 Melamud E & Moult J (2009) Stochastic noise in
splicing machinery Nucleic Acids Res 37, 4873–4886
4 Raponi M & Baralle D (2009) Alternative splicing: good
and bad effects of translationally silent substitutions
FEBS J 277, doi:10.1111/j.1742-4658.2009.07519.x
5 Faustino NA & Cooper TA (2003) Pre-mRNA splicing
and human disease Genes Dev 17, 419–437
6 Matlin AJ, Clark F & Smith CW (2005) Understanding
alternative splicing: towards a cellular code Nat Rev
Mol Cell Biol 6, 386–398
7 Soreq L, Gilboa-Geffen A, Berrih-Aknin S, Lacoste P,
Darvasi A, Soreq E, Bergman H & Soreq H (2008)
Identifying alternative hyper-splicing signatures in
MG-thymoma by exon arrays PLoS ONE 3, e2392
8 Ule J, Jensen K, Mele A & Darnell RB (2005) CLIP:
a method for identifying protein-RNA interaction sites
in living cells Methods 37, 376–386
9 Jensen KB & Darnell RB (2008) CLIP: crosslinking and
immunoprecipitation of in vivo RNA targets of
RNA-binding proteins Methods Mol Biol 488, 85–98
10 Wang Z, Gerstein M & Snyder M (2009) RNA-Seq: a
revolutionary tool for transcriptomics Nat Rev Genet
10, 57–63
11 Ben-Dov C, Hartmann B, Lundgren J & Valcarcel J
(2008) Genome-wide analysis of alternative pre-mRNA
splicing J Biol Chem 283, 1229–1233
12 Hartmann B & Valcarcel J (2009) Decrypting the
genome’s alternative messages Curr Opin Cell Biol 21,
377–386
13 Moore MJ & Silver PA (2008) Global analysis of
mRNA splicing RNA 14, 197–203
14 Wang Z & Burge CB (2008) Splicing regulation: from a
parts list of regulatory elements to an integrated splicing
code RNA 14, 802–813
15 Blencowe BJ, Ahmad S & Lee LJ (2009)
Current-generation high-throughput sequencing: deepening insights
into mammalian transcriptomes Genes Dev 23, 1379–1386
16 Kwan T, Benovoy D, Dias C, Gurd S, Provencher C, Beaulieu P, Hudson TJ, Sladek R & Majewski J (2008) Genome-wide analysis of transcript isoform variation in humans Nat Genet 40, 225–231
17 Clark TA, Schweitzer AC, Chen TX, Staples MK, Lu
G, Wang H, Williams A & Blume JE (2007) Discovery
of tissue-specific exons using comprehensive human exon microarrays Genome Biol 8, R64
18 Oberdoerffer S, Moita LF, Neems D, Freitas RP, Hacohen N & Rao A (2008) Regulation of CD45 alternative splicing by heterogeneous ribonucleoprotein, hnRNPLL Science 321, 686–691
19 Gardina PJ, Clark TA, Shimada B, Staples MK, Yang
Q, Veitch J, Schweitzer A, Awad T, Sugnet C, Dee S
et al. (2006) Alternative splicing and differential gene expression in colon cancer detected by a whole genome exon array BMC Genomics 7, 325
20 Yamamoto ML, Clark TA, Gee SL, Kang JA, Schweitzer AC, Wickrema A & Conboy JG (2009) Alternative pre-mRNA splicing switches modulate gene expression in late erythropoiesis Blood 113, 3363–3370
21 Johnson JM, Castle J, Garrett-Engele P, Kan Z, Loerch
PM, Armour CD, Santos R, Schadt EE, Stoughton R
& Shoemaker DD (2003) Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays Science 302, 2141–2144
22 Fagnani M, Barash Y, Ip JY, Misquitta C, Pan Q, Saltzman AL, Shai O, Lee L, Rozenhek A, Mohammad
N et al (2007) Functional coordination of alternative splicing in the mammalian central nervous system Genome Biol 8, R108
23 Pan Q, Shai O, Misquitta C, Zhang W, Saltzman AL, Mohammad N, Babak T, Siu H, Hughes TR, Morris
QD et al (2004) Revealing global regulatory features of mammalian alternative splicing using a quantitative microarray platform Mol Cell 16, 929–941
24 Blanchette M, Green RE, Brenner SE & Rio DC (2005) Global analysis of positive and negative pre-mRNA splicing regulators in Drosophila Genes Dev 19, 1306–1314
25 Blanchette M, Green RE, MacArthur S, Brooks AN, Brenner SE, Eisen MB & Rio DC (2009) Genome-wide analysis of alternative pre-mRNA splicing and RNA-binding specificities of the Drosophila hnRNP A/B family members Mol Cell 33, 438–449
26 Hartmann B, Castelo R, Blanchette M, Boue S, Rio
DC & Valcarcel J (2009) Global analysis of alternative splicing regulation by insulin and wingless signaling in Drosophila cells Genome Biol 10, R11
27 Castle JC, Zhang C, Shah JK, Kulkarni AV, Kalsotra
A, Cooper TA & Johnson JM (2008) Expression of 24,426 human alternative splicing events and predicted
cis regulation in 48 tissues and cell lines Nat Genet 40,
1416–1425
28 Johnson MB, Kawasawa YI, Mason CE, Krsnik Z,
Coppola G, Bogdanovic D, Geschwind DH, Mane
SM, State MW & Sestan N (2009) Functional and
evolutionary insights into human brain development
through global transcriptome analysis Neuron 62, 494–509
29 Thorsen K, Sorensen KD, Brems-Eskildsen AS,
Modin C, Gaustadnes M, Hein AM, Kruhoffer M,
Laurberg S, Borre M, Wang K et al (2008)
Alternative splicing in colon, bladder, and prostate cancer
identified by exon array analysis Mol Cell Proteomics
7, 1214–1224
30 Ip JY, Tong A, Pan Q, Topp JD, Blencowe BJ &
Lynch KW (2007) Global analysis of alternative splicing
during T-cell activation RNA 13, 563–572
31 Kalsotra A, Xiao X, Ward AJ, Castle JC, Johnson JM,
Burge CB & Cooper TA (2008) A postnatal switch of
CELF and MBNL proteins reprograms alternative
splicing in the developing heart Proc Natl Acad Sci
USA 105, 20333–20338
32 Hung LH, Heiner M, Hui J, Schreiner S, Benes V &
Bindereif A (2008) Diverse roles of hnRNP L in
mammalian mRNA processing: a combined microarray and
RNAi analysis RNA 14, 284–296
33 Chawla G, Lin CH, Han A, Shiue L, Ares M Jr &
Black DL (2009) Sam68 regulates a set of alternatively
spliced exons during neurogenesis Mol Cell Biol 29,
201–213
34 Xing Y, Stoilov P, Kapur K, Han A, Jiang H, Shen S,
Black DL & Wong WH (2008) MADS: a new and
improved method for analysis of differential alternative
splicing by exon-tiling microarrays RNA 14,
1470–1479
35 Saltzman AL, Kim YK, Pan Q, Fagnani MM, Maquat
LE & Blencowe BJ (2008) Regulation of multiple core
spliceosomal proteins by alternative splicing-coupled
nonsense-mediated mRNA decay Mol Cell Biol 28,
4320–4330
36 Dhir A & Buratti E (2009) Alternative splicing: role of
pseudoexons in human disease and potential therapeutic
strategies FEBS J 277, doi:10.1111/j.1742-4658.2009.07520.x
37 Shiraki T, Kondo S, Katayama S, Waki K, Kasukawa
T, Kawaji H, Kodzius R, Watahiki A, Nakamura M,
Arakawa T et al (2003) Cap analysis gene expression
for high-throughput analysis of transcriptional starting
point and identification of promoter usage Proc Natl
Acad Sci USA 100, 15776–15781
38 Velculescu VE, Zhang L, Vogelstein B & Kinzler KW
(1995) Serial analysis of gene expression Science 270,
484–487
39 Iida K, Fukami-Kobayashi K, Toyoda A, Sakaki Y,
Kobayashi M, Seki M & Shinozaki K (2009) Analysis
of multiple occurrences of alternative splicing events in Arabidopsis thaliana using novel sequenced full-length cDNAs DNA Res 15, 155–164
40 Kim YC, Wu Q, Chen J, Xuan Z, Jung YC, Zhang
MQ, Rowley JD & Wang SM (2009) The transcriptome
of human CD34+ hematopoietic stem-progenitor cells Proc Natl Acad Sci USA 106, 8278–8283
41 Ansorge WJ (2009) Next-generation DNA sequencing techniques N Biotechnol 25, 195–203
42 Calarco JA, Saltzman AL, Ip JY & Blencowe BJ (2007) Technologies for the global discovery and analysis of alternative splicing Adv Exp Med Biol 623, 64–84
43 Mortazavi A, Williams BA, McCue K, Schaeffer L & Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq Nat Meth 5, 621–628
44 Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M & Snyder M (2008) The transcriptional landscape of the yeast genome defined by RNA sequencing Science 320, 1344–1349
45 Wilhelm BT, Marguerat S, Watt S, Schubert F, Wood
V, Goodhead I, Penkett CJ, Rogers J & Bahler J (2008) Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution Nature 453, 1239–1243
46 Lister R, O’Malley RC, Tonti-Filippini J, Gregory BD, Berry CC, Millar AH & Ecker JR (2008) Highly integrated single-base resolution maps of the epigenome in Arabidopsis Cell 133, 523–536
47 Sultan M, Schulz MH, Richard H, Magen A, Klingenhoff A, Scherf M, Seifert M, Borodina T, Soldatov A, Parkhomchuk D et al (2008) A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome Science 321, 956–960
48 Cloonan N, Forrest AR, Kolle G, Gardiner BB, Faulkner GJ, Brown MK, Taylor DF, Steptoe AL, Wani S, Bethel G et al (2008) Stem cell transcriptome profiling via massive-scale mRNA sequencing Nat Meth 5, 613–619
49 Li H, Lovci MT, Kwon YS, Rosenfeld MG, Fu XD & Yeo GW (2008) Determination of tag density required for digital transcriptome analysis: application to an androgen-sensitive prostate cancer model Proc Natl Acad Sci USA 105, 20179–20184
50 Yeakley JM, Fan JB, Doucet D, Luo L, Wickham E, Ye
Z, Chee MS & Fu XD (2002) Profiling alternative splicing on fiber-optic arrays Nat Biotechnol 20, 353–358
51 Hodges E, Xuan Z, Balija V, Kramer M, Molla MN, Smith SW, Middle CM, Rodesch MJ, Albert TJ, Hannon
GJ et al (2007) Genome-wide in situ exon capture for selective resequencing Nat Genet 39, 1522–1527
52 Venables JP, Koh CS, Froehlich U, Lapointe E, Couture S, Inkel L, Bramard A, Paquet ER, Watier V, Durand M et al (2008) Multiple and specific mRNA processing targets for the major human hnRNP proteins Mol Cell Biol 28, 6033–6043