Chapter 7
Analysis of the Dynamics of Co-transcriptional Binding Accessibility of AON Target Sites
In this chapter, the dynamics of the co-transcriptional binding accessibility of previously published AON target sites are analyzed and correlated with the degree of reported efficiency in the induction of exon skipping.
7.1 Overview of the analysis methodology
The analysis carried out in this chapter involves the following sequential steps:
1. Data collection (Section 7.2). Previously published AONs whose efficiency in inducing selective exon skipping in the dystrophin pre-mRNA has been tested in wet-lab experiments are gathered; only AONs that target ESE sites are included. They are then graded according to their reported efficiencies.
2. Prediction of the co-transcriptional secondary structures of exons (Section 7.3). A model that approximates transcription is used to predict the co-transcriptional secondary structures of the exons targeted by the AONs gathered in step 1.
3. Analysis of the dynamics of the co-transcriptional binding accessibility (Sections 7.4 and 7.5). The co-transcriptional binding accessibility of each nucleotide within an AON target site is determined based on whether it is paired in the predicted secondary structures. Novel scoring systems are formulated to quantify the dynamics of the co-transcriptional binding accessibility.
4. Test for correlation between reported AON efficiencies in inducing exon skipping and their co-transcriptional binding accessibility (Sections 7.4 and 7.5). The scores (formulated in step 3) in each grade of AONs are tested for statistically significant differences against other grades using the two-sample Kolmogorov-Smirnov (K-S) test; each grade of AONs has distinct reported efficiencies. All statistical tests are performed using the statistical software R, version 2.0.0 (http://www.R-project.org).
Note: throughout the thesis, “efficacy” is used to describe qualitatively the ability of an AON to induce selective exon skipping, whereas “efficiency” is used to quantify the percentage of total mRNA molecules whose selected exon is skipped by an AON.
7.2 Data set for analysis
A total of 176 AONs that target ESEs to induce the skipping of 67 exons in the dystrophin pre-mRNA, reported by two independent sources, Aartsma-Rus et al. (2005) and Wilton et al. (2007), were analyzed. Although the cell lines and experimental protocols used in these two studies were similar, the AONs from each study were analyzed separately for the following reasons. Firstly, the range of AON lengths, which may influence AON performance (Harding et al., 2007), differed significantly between the studies. The AONs from Aartsma-Rus et al. (2005) and Wilton et al. (2007) showed median lengths of 19 and 26 nucleotides respectively and, for the purpose of this study, are henceforth denoted as Set A and Set B respectively. Note that only 62 of the 82 AONs reported by Wilton et al. (2007) are included in Set B, as the remaining ones either target non-ESE sites or result in non-specific exon skipping. Secondly, as broken down in Table 7-1, the respective sources graded their AONs differently according to their efficiencies in inducing exon skipping; AON efficiency was calculated based on densitograph semi-quantification in the two publications.
Table 7-1 Classification of published AON (antisense oligonucleotide) sequences
Published AONs from two independent sources are denoted as Set A and Set B respectively. In each set, AONs are classified into different grades according to their efficiencies (E) in the induction of exon skipping.
E ≥ 25%
Grade (+): 0% < E < 25%
Grade (+ 2): 0% < E < 10%
Siggia, 2000; Gultyaev et al., 1995), computational time is tractable only for
Alternatively, algorithms that can efficiently predict the secondary structures of a long, fully synthesized mRNA are considered (Zuker, 2003; Knudsen and Hein, 2003; Ding and Lawrence, 2003; Flamm et al., 2000). Among them is mfold (Zuker, 2003), which is chosen in this study because, firstly, it has a relatively high average prediction accuracy of 70% (Mathews et al., 1999) and, secondly, it has been used in most published experimental work on AONs that target the dystrophin gene (Aartsma-Rus et al., 2002, 2005; Errington et al., 2003); the results of this study can therefore be compared with them on a common basis.
Figure 7-1 A model to approximate transcription elongation
To approximate the transcription elongation process, a “window of analysis” is shifted one nucleotide at a time along the pre-mRNA sequence towards the 3’ end. At the first window, its 3’ end coincides with the 3’ end of the target exon. Correspondingly, at the last window, its 5’ end coincides with the 5’ end of the target exon. Each window of analysis corresponds to a step of transcriptional analysis at which the possible secondary structures of its sequence are predicted.
[Figure 7-1 diagram: a target exon containing the AON target site, flanked by introns; a 1500-nt window of analysis is shown at the 1st, 2nd, 3rd and last steps of transcriptional analysis, shifted by 1 nt per step in the direction of pre-mRNA elongation.]
As mfold does not consider folding paths, they are approximated using the model depicted in Figure 7-1. A “window of analysis” with a pre-determined sequence length of 1500 nucleotides that includes the full length of the targeted exon corresponds to a “step of transcriptional analysis”. To approximate the transcription elongation process, the window of analysis is shifted one nucleotide at a time along the pre-mRNA sequence towards the 3’ end. At each step of transcriptional analysis, the possible secondary structures for the window sequence are predicted using mfold version 3.1 (Zuker, 2003; Mathews et al., 1999). Since it is highly probable that the nascent pre-mRNA does not have the chance to assume optimal secondary structures, sub-optimal secondary structures whose energies lie within 5% of the optimum are also considered. On average, 44,582 secondary structures are predicted per exon, of which 24 to 47 secondary structures are predicted in each step of transcriptional analysis; the number of secondary structures predicted in the 79 exons is given in Appendix A-17.
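The window-shifting model described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the thesis implementation: the function name `windows_of_analysis`, the 0-based half-open coordinates, and the assumption that enough flanking sequence exists on both sides of the exon (no clamping at transcript ends) are all hypothetical choices made here.

```python
from typing import Iterator, Tuple


def windows_of_analysis(
    exon_start: int, exon_end: int, window_len: int = 1500
) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) of each window, 0-based half-open, in 5' to 3' order.

    The first window ends at the exon's 3' end and the last window starts at
    the exon's 5' end, so every window contains the whole exon.
    Assumption: the pre-mRNA has enough flanking sequence on both sides.
    """
    if exon_end - exon_start > window_len:
        raise ValueError("exon is longer than the window of analysis")
    first_start = exon_end - window_len
    for start in range(first_start, exon_start + 1):  # shift 1 nt per step
        yield start, start + window_len


# Hypothetical 150-nt exon occupying positions [5000, 5150) of a pre-mRNA.
steps = list(windows_of_analysis(5000, 5150))
print(len(steps), steps[0], steps[-1])  # 1351 (3650, 5150) (5000, 6500)
```

Under this reading of Figure 7-1, an exon of length n gives 1500 - n + 1 steps of transcriptional analysis; each yielded window would then be passed to the structure prediction tool.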
Note that the model considers only the local secondary structures around the target exon. As abundant hnRNPs (heterogeneous nuclear ribonucleoproteins) package long introns into compact secondary structures that deter long-distance or global intra-molecular complementary base pairing (Alberts et al., 2002), this assumption is justified given that long introns are typical in the dystrophin gene (Figure 6-1 of Chapter 6). The 1500-nucleotide length of the window of analysis, on the other hand, is estimated from experimental measurements. It has been reported that the 3’ splice site is recognized 48 seconds after it is transcribed (Beyer and Osheim, 1988). Based on the measured elongation rate of dystrophin pre-mRNA of 1700 to 2500 nucleotides per minute (Tennyson et al., 1995), about 1360 to 2000 nucleotides would be appended to the nascent transcript during this period. Nevertheless, the co-transcriptional secondary structures of exons 2 (62 bp), 29 (150 bp) and 59 (269 bp) were also predicted with window-of-analysis lengths of 1200 and 2000 nucleotides; however, no statistical differences in their co-transcriptional secondary structures were detected (data not shown).
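The range quoted above for the window length follows from multiplying each elongation rate by the 48-second delay, expressed in minutes. Written out as a worked calculation:

```latex
\[
1700\ \frac{\text{nt}}{\text{min}} \times \frac{48}{60}\ \text{min} = 1360\ \text{nt},
\qquad
2500\ \frac{\text{nt}}{\text{min}} \times \frac{48}{60}\ \text{min} = 2000\ \text{nt}.
\]
```

The chosen window length of 1500 nucleotides lies within this range.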
7.4 Analysis of the dynamics of co-transcriptional binding accessibility of AON target sites
Four levels of analysis using scoring methodologies of increasing complexity are used to score the binding accessibility of AON target sites in the two sets of published AONs. Scores at each level of analysis are then correlated with the degree of reported AON efficiency and efficacy for each set of AONs. Note that these scoring methodologies are applicable to any secondary structure prediction tool, as long as co-transcriptional secondary structures of AON target sites can be generated.
At this simplest level of analysis, the binding accessibility score of an AON target site (L1) is computed. To do so, however, the binding accessibility score of each nucleotide within the AON target site is needed; it is determined by this ratio:
\[
\text{nucleotide accessibility score} = \frac{\text{Number of predicted secondary structures in which the nucleotide is unpaired}}{\text{Total number of secondary structures predicted}}
\]
Note: all secondary structures predicted at every step of transcriptional analysis (Figure 7-1) are included in the calculation; a nucleotide is “unpaired” when it does not form complementary base pairing with another nucleotide within the pre-mRNA.
Thus, the accessibility score for the AON target site, L1, is:
\[
\text{L1} = \frac{\text{Sum of nucleotide accessibility scores for all nucleotides within the AON target site}}{\text{Total number of nucleotides in AON target site}}
\]
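The two ratios above can be illustrated with a small Python sketch. It assumes, purely for illustration, that each predicted structure is available as a dot-bracket string together with the pre-mRNA position at which its window starts; this representation and the function names are hypothetical and are not taken from the thesis or from mfold's output format.

```python
from typing import Sequence, Tuple

# One predicted structure: (window_start, dot_bracket), where dot_bracket[i]
# describes pre-mRNA position window_start + i and "." means unpaired.
# (Assumed representation, for illustration only.)
Structure = Tuple[int, str]


def nucleotide_accessibility(position: int, structures: Sequence[Structure]) -> float:
    """Fraction of all predicted structures in which `position` is unpaired."""
    unpaired = sum(1 for start, db in structures if db[position - start] == ".")
    return unpaired / len(structures)


def l1_score(site_start: int, site_end: int, structures: Sequence[Structure]) -> float:
    """Mean nucleotide accessibility over the target site [site_start, site_end)."""
    scores = [nucleotide_accessibility(p, structures) for p in range(site_start, site_end)]
    return sum(scores) / len(scores)


# Toy data: structures pooled from two steps (windows starting at 0 and 1).
structures = [(0, "((..))"), (0, "(....)"), (1, "((..))"), (1, ".(()).")]
print(nucleotide_accessibility(2, structures))  # 0.5
print(nucleotide_accessibility(3, structures))  # 0.75
print(l1_score(2, 4, structures))               # 0.625
```

Because every window contains the whole exon, each target-site nucleotide appears in every pooled structure, which is why a single denominator (the total number of structures) can be used.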
The L1 scores for each AON target site analyzed are tabulated in Appendix A-18. The two-sample Kolmogorov-Smirnov (K-S) test is used to test for statistically significant differences in the L1 scores for target sites between any two AON grades of the same set. Table 7-2 tabulates the p-values for the statistical tests. To ensure consistent test outcomes, two exclusive one-tailed tests, i.e., Ho: 1st < 2nd and Ho: 1st > 2nd (columns 2 and 3), are performed for each test case (as described in column 1). For instance, for the test case (++ versus –) of Set A, Ho: 1st < 2nd tests whether L1 scores for target sites in (++) AONs are smaller than those in (–) AONs. Here, Ho denotes the directional hypothesis under test (the alternative hypothesis in conventional terminology): it is accepted if p-value < 0.05 and rejected otherwise. Thus, the test outcomes in a particular test case are inconsistent if the hypotheses of the two tests are both accepted.
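The pair of one-tailed tests described above can be sketched as follows. The thesis used R for all tests; the code below is a standard-library Python approximation written only for illustration. It uses the large-sample approximation for the one-sided p-value, which is rough for small samples and is not expected to reproduce R's values exactly. The function names, and the mapping of "1st < 2nd" onto the one-sided statistic D+, are assumptions made here.

```python
import math
from typing import Sequence


def ks_one_sided(first: Sequence[float], second: Sequence[float]) -> tuple:
    """One-sided two-sample K-S test for "first tends to be smaller than second".

    Returns (d_plus, p), where d_plus = max over x of F_first(x) - F_second(x)
    and p is the large-sample approximation exp(-2 * n*m/(n+m) * d_plus**2).
    """
    n, m = len(first), len(second)
    d_plus = 0.0
    for x in sorted(set(first) | set(second)):
        f1 = sum(v <= x for v in first) / n   # empirical CDF of first at x
        f2 = sum(v <= x for v in second) / m  # empirical CDF of second at x
        d_plus = max(d_plus, f1 - f2)
    p = math.exp(-2.0 * n * m / (n + m) * d_plus ** 2)
    return d_plus, p


def run_test_case(first, second, alpha=0.05):
    """Run both exclusive one-tailed tests for one test case."""
    _, p_less = ks_one_sided(first, second)      # Ho: 1st < 2nd
    _, p_greater = ks_one_sided(second, first)   # Ho: 1st > 2nd
    consistent = not (p_less < alpha and p_greater < alpha)
    return p_less, p_greater, consistent


# Toy scores (not thesis data): the first group is clearly smaller.
print(run_test_case([0.1, 0.2, 0.3, 0.4, 0.5], [0.6, 0.7, 0.8, 0.9, 1.0]))
```

Running both directions makes the consistency check explicit: a test case would be flagged as inconsistent only if both one-tailed p-values fell below the significance level.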
Table 7-2 p-values for K-S tests using the first level score (L1)
p-values (columns 2 and 3) of the K-S tests for the target sites of AONs in (A) Set A and (B) Set B. Statistically significant p-values are indicated in bold and underlined. Column 1 describes the test case. The last column indicates whether the particular test case tests for AON efficacy and/or efficiency. In (B), (+ 1,2) denotes AONs merged from (+ 1) and (+ 2) AONs. Note: the Wilcoxon rank-sum test cannot be used as one of its key assumptions is violated, i.e., the distributions of each AON grade’s L1 scores are distinct (box plots not shown).
For AONs in Set A, L1 scores for target sites in each grade of AONs do not show any statistical difference (Table 7-2A), which agrees with the results reported by Aartsma-Rus et al. (2005) and Harding et al. (2007). For AONs in Set B, L1 scores for target sites of (++) and (+ 1) AONs are statistically higher than those of (–) AONs; their p-values are highlighted in Table 7-2B. This result indicates that (++) and (+ 1) AON target sites are more accessible than (–) AON target sites and, therefore, the L1 score could correlate with AON efficacy for Set B AONs.
At this level of analysis, the nucleotide accessibility scores of every nucleotide in an AON target site are screened to determine the presence of two or more scores with values below 0.1 occurring consecutively in the nucleotide sequence of the target site (Figure 7-2). Such a grouping of nucleotide accessibility scores below 0.1 is termed a “low accessibility cluster”; refer to Table S3 of Wee et al. (2008a) (attached in Appendix A-1) for the list of low accessibility clusters manifested in all the analyzed AONs. In Set A, 71% of target sites of (–) AONs had one or more low accessibility clusters. While only 17% of target sites of (+) AONs had one or more clusters, they were manifested in 52% of target sites of (++) AONs. Set B also exhibited similar trends: 71%, 70% and 80% of target sites of (–) AONs, (+) AONs and (++) AONs respectively had one or more clusters. Therefore, the presence of these clusters in AON target sites does not correlate with AON efficacy and efficiency.
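The screening rule above (two or more consecutive nucleotide accessibility scores below 0.1) amounts to a simple run-detection pass. The following Python sketch is illustrative only; the function name and the half-open index convention are hypothetical.

```python
from typing import List, Sequence, Tuple


def low_accessibility_clusters(
    scores: Sequence[float], threshold: float = 0.1, min_run: int = 2
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index pairs of runs of at least
    `min_run` consecutive scores strictly below `threshold`."""
    clusters = []
    run_start = None
    # A sentinel equal to the threshold closes any run that reaches the end.
    for i, score in enumerate(list(scores) + [threshold]):
        if score < threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_run:
                clusters.append((run_start, i))
            run_start = None
    return clusters


# Toy scores along a target site (not thesis data).
site_scores = [0.5, 0.05, 0.02, 0.4, 0.09, 0.3, 0.0, 0.0, 0.01]
print(low_accessibility_clusters(site_scores))  # [(1, 3), (6, 9)]
```

A target site would then count as having a low accessibility cluster whenever the returned list is non-empty; the isolated low score at index 4 does not qualify because it is not part of a run of two or more.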
Figure 7-2 Nucleotide accessibility scores of all the nucleotides in three representative AON target sites
In each plot, the horizontal axis represents the nucleotide position in the respective target exon and the nucleotide accessibility score is plotted on the vertical axis. The low accessibility clusters are indicated in red.
The nucleotide accessibility scores at the first and second levels of analysis are mean scores. As a result, two nucleotides with identical accessibility scores may have markedly different numbers of unpaired predicted secondary structures at each step of transcriptional analysis. In analyzing accessibility for AON binding, it may be important to take into account steps of transcriptional analysis in which there is no predicted secondary structure with the nucleotide unpaired, i.e., the nucleotide is predicted to be completely inaccessible or “engaged” at the particular step of transcriptional analysis, as illustrated in Figure 7-3B. For the purpose of analysis, at every step of transcriptional analysis, each nucleotide in the AON target site that is engaged may then be depicted in a plot as illustrated in Figure 7-4. Table S4 of Wee et al. (2008a) (attached in Appendix A-1) tabulates these plots for all the AON target sites analyzed.
Figure 7-3 Schematic illustration of an engaged nucleotide
(A) – (C) Schematic multiple secondary structures of the targeted exon (drawn in black) are predicted in each step of transcriptional analysis, with some of the possible structural motifs shown here. For illustration purposes, a particular nucleotide (marked in red) within an AON target site (green line) is tracked. When this nucleotide is paired (denoted with *), it is not accessible for AON binding. If this nucleotide is paired in all predicted secondary structures, it is defined as an engaged nucleotide at this particular step of transcriptional analysis (B).
Figure 7-4 Schematic plot depicting the incidences of engaged nucleotides
In the above illustration, the horizontal axis denotes sequential steps of transcriptional analysis while the vertical axis denotes numbered nucleotides within the AON target site. At each step of transcriptional analysis, nucleotides in the target site that are engaged are depicted as black dots in the plot. The calculations of the fourth level scores, L4_OR and L4_AND, are illustrated (refer to Section 7.5 for details).
For each nucleotide in an AON target site, a nucleotide engaged score is defined as:
\[
\text{nucleotide engaged score} = \frac{\text{Total number of steps of transcriptional analysis at which the nucleotide is engaged}}{\text{Total number of steps of transcriptional analysis}}
\]
Following this, an AON target site engaged score (L3) is defined as:
Sum of nucleotide engaged scores for all nucleotides within the AON target site
In contrast to the L1 score, the higher the L3 score, the less accessible a target site is for AON binding. Appendix A-18 tabulates the L3 scores for all the AONs analyzed.
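The engaged-nucleotide bookkeeping behind the L3 score can be sketched in Python as follows. The sketch assumes, for illustration only, that each predicted structure is represented as the set of pre-mRNA positions that are base-paired in it and that every step has at least one structure; the names are hypothetical. It computes the per-nucleotide engaged scores and their sum over the target site, which is the quantity named in the L3 definition; it does not assume any further normalization.

```python
from typing import Sequence

# steps[s] lists the structures predicted at step s; each structure is the
# set of positions that are base-paired in it (assumed representation).
Step = Sequence[frozenset]


def is_engaged(position: int, step: Step) -> bool:
    """True if `position` is paired in every structure predicted at this step."""
    return all(position in paired for paired in step)


def nucleotide_engaged_score(position: int, steps: Sequence[Step]) -> float:
    """Fraction of steps of transcriptional analysis at which `position` is engaged."""
    return sum(is_engaged(position, step) for step in steps) / len(steps)


def summed_engaged_score(site: range, steps: Sequence[Step]) -> float:
    """Sum of nucleotide engaged scores over all positions in the target site."""
    return sum(nucleotide_engaged_score(p, steps) for p in site)


# Toy data: four steps, two structures per step, target site at positions 2-4.
steps = [
    [frozenset({2, 3}), frozenset({2})],     # position 2 engaged
    [frozenset({2, 3}), frozenset({2, 3})],  # positions 2 and 3 engaged
    [frozenset(), frozenset({3})],           # nothing engaged
    [frozenset({3}), frozenset({3})],        # position 3 engaged
]
print(nucleotide_engaged_score(2, steps))          # 0.5
print(summed_engaged_score(range(2, 5), steps))    # 1.0
```

Unlike the pooled ratio used for L1, this score is evaluated step by step, so a nucleotide contributes only at steps where no predicted structure leaves it unpaired.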
For Set A AONs, target sites of (++) AONs have statistically lower engaged scores than target sites of both (–) and (+) AONs (Table 7-3A). Therefore, the L3 score can statistically differentiate both AON efficacy and efficiency. However, seven outlier AONs (6% of total AONs) are identified. In this context, these are AONs whose target site L3 scores contradict their AON grades. For instance, h52AON2 and h60AON2, graded as (–), could not induce exon skipping although the L3 scores of their target sites are below the 5th percentile of the L3 scores of (++) AON target sites (Appendix A-18). On the other hand, the target sites of h45AON5 and h46AON4, graded as (+), and of h51AON29, h55AON5 and h77AON2, graded as (++), all have L3 scores higher than the 95th percentile of the L3 scores of (–) AON target sites (Appendix A-18), yet these AONs could still induce exon skipping. The omission of these outlier AONs strengthens the correlation of L3 scores with AON efficacy and efficiency (Table 7-3A).
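The outlier criterion described above can be expressed as a short Python sketch. The thesis does not state which percentile convention was used, so the nearest-rank definition below is an assumption, as are the function names, the grade keys and the toy AON names (which are deliberately not real AONs).

```python
import math
from typing import Dict, List


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile (one of several conventions; an assumption here)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def find_outliers(l3_by_grade: Dict[str, Dict[str, float]]) -> List[str]:
    """Flag AONs whose L3 score contradicts their grade.

    l3_by_grade maps a grade ("++", "+", "-") to {AON name: L3 score}.
    """
    low_cut = percentile(list(l3_by_grade["++"].values()), 5)
    high_cut = percentile(list(l3_by_grade["-"].values()), 95)
    # (-) AONs with scores below the 5th percentile of (++) scores.
    outliers = [a for a, s in l3_by_grade["-"].items() if s < low_cut]
    # (+) and (++) AONs with scores above the 95th percentile of (-) scores.
    for grade in ("+", "++"):
        outliers += [a for a, s in l3_by_grade[grade].items() if s > high_cut]
    return outliers


# Toy scores with placeholder names (not thesis data).
toy = {
    "++": {"p1": 0.10, "p2": 0.12, "p3": 0.15, "p4": 0.90},
    "+": {"q1": 0.30, "q2": 0.95},
    "-": {"n1": 0.05, "n2": 0.50, "n3": 0.60, "n4": 0.70},
}
print(find_outliers(toy))  # ['n1', 'q2', 'p4']
```

A different percentile convention (for example, linear interpolation) could shift the cut-offs slightly and therefore change which borderline AONs are flagged.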
For Set B AONs, target sites of (++) AONs have statistically lower engaged scores than target sites of (–) AONs (Table 7-3B). Upon omission of four outlier AONs (6% of total AONs, i.e., H30A, H58A, H64A and H34A2), L3 scores can statistically differentiate efficacy between (+ 1) and (–) AONs, and efficiency between (++) and (+ 2) AONs. Overall, L3 scores correlate with the efficacies and efficiencies of (++), (+) and (+ 1) AONs better than L1 scores do (Table 7-2B versus Table 7-3B): L3 scores can differentiate between more AON grades than L1 scores, and for K-S tests in which L1 scores show statistical significance, the corresponding K-S tests of L3 scores achieve even lower p-values.
Table 7-3 p-values for K-S tests using the third level score (L3)
p-values (with outliers: columns 2 and 3; without outliers: columns 4 and 5) of the K-S tests for the target sites of AONs in (A) Set A and (B) Set B. Statistically significant p-values are indicated in bold and underlined. Column 1 describes the test case. The last column indicates whether the particular test case tests for AON efficacy and/or efficiency. In (B), (+ 1,2) denotes AONs merged from (+ 1) and (+ 2) AONs. Note: the Wilcoxon rank-sum test cannot be used as one of its key assumptions is violated, i.e., the distributions of each AON grade’s L3 scores are distinct (box plots not shown).
With outliers Without outliers Test case:
To explain the contrast between K-S test results of the first and third level
scores, quartiles of the normalized L1 (L1) and L3 (L3) scores of AON target sites for AONs in each grade of Sets A and B are plotted for comparison; for example, the L1 score of an AON target site is the relative percentage difference between its L1 score