Mathematical and computational analysis of intracellular dynamics 7


Chapter 7

Analysis of the Dynamics of Co-transcriptional Binding Accessibility of AON Target Sites

In this chapter, the dynamics of the co-transcriptional binding accessibility of previously published AON target sites are analyzed and correlated with the degree of reported efficiency in the induction of exon skipping.

7.1 Overview of the analysis methodology

The analysis carried out in this chapter involves the following sequential steps:

1. Data collection (Section 7.2). Previously published AONs whose efficiency in inducing selective exon skipping in the dystrophin pre-mRNA has been tested in wet-lab experiments are gathered; only AONs that target ESE sites are included. They are then graded according to their reported efficiencies.

2. Prediction of the co-transcriptional secondary structures of exons (Section 7.3). A model that approximates transcription is used to predict the co-transcriptional secondary structures of the exons targeted by the AONs gathered in step 1.

3. Analysis of the dynamics of the co-transcriptional binding accessibility (Sections 7.4 and 7.5). The co-transcriptional binding accessibility of each nucleotide within an AON target site is determined based on whether it is paired in the predicted secondary structures; novel scoring systems are formulated to quantify the dynamics of the co-transcriptional binding accessibility.

4. Test for correlation between reported AON efficiencies in inducing exon skipping and their co-transcriptional binding accessibility (Sections 7.4 and 7.5). The scores (formulated in step 3) in each grade of AONs are tested for statistically significant differences against other grades using the two-sample Kolmogorov-Smirnov (K-S) test; each grade of AONs has a distinct range of reported efficiencies. All statistical tests are performed using the statistical software R, version 2.0.0 (http://www.R-project.org).

Note: throughout the thesis, “efficacy” is used to describe qualitatively the ability of an AON to induce selective exon skipping, whereas “efficiency” is used to quantify the percentage of total mRNA molecules whose selected exon is skipped by an AON.

7.2 Data set for analysis

A total of 176 AONs that target ESEs to induce the skipping of 67 exons in the dystrophin pre-mRNA, reported by two independent sources (Aartsma-Rus et al., 2005; Wilton et al., 2007), were analyzed. Although the cell lines and experimental protocols used in these two studies were similar, the AONs from each study were analyzed separately for the following reasons. Firstly, the range of AON lengths, which may influence AON performance (Harding et al., 2007), differed significantly between the studies. The AONs from Aartsma-Rus et al. (2005) and Wilton et al. (2007) had median lengths of 19 and 26 nucleotides respectively and, for the purpose of this study, are henceforth denoted as Set A and Set B respectively. Note that only 62 of the 82 AONs reported by Wilton et al. (2007) are included in Set B, as the remaining ones either target non-ESE sites or result in unspecific exon skipping. Secondly, as broken down in Table 7-1, the respective sources graded their AONs differently according to their efficiencies in inducing exon skipping; in both publications, AON efficiency was calculated based on densitograph semi-quantification.

Table 7-1 Classification of published AON (antisense oligonucleotide) sequences

Published AONs from two independent sources are denoted as Set A and Set B respectively. In each set, AONs are classified into different grades according to their efficiencies (E) in the induction of exon skipping.

E ≥ 25%

Grade (+): 0% < E < 25%

Grade (+ 2): 0% < E < 10%

Siggia, 2000; Gultyaev et al., 1995), computational time is tractable only for


Alternatively, algorithms that can efficiently predict the secondary structure of a long, fully synthesized mRNA are considered (Zuker, 2003; Knudsen and Hein, 2003; Ding and Lawrence, 2003; Flamm et al., 2000). Among them is mfold (Zuker, 2003), which is chosen in this study for two reasons. Firstly, it has a relatively high average prediction accuracy of 70% (Mathews et al., 1999). Secondly, it has been used in most published experimental work on AONs that target the dystrophin gene (Aartsma-Rus et al., 2002, 2005; Errington et al., 2003), so the results of this study can be compared with them on a common basis.

Figure 7-1 A model to approximate transcription elongation

To approximate the transcription elongation process, a “window of analysis” is shifted one nucleotide at a time along the pre-mRNA sequence towards the 3’ end. At the first window, its 3’ end coincides with the 3’ end of the target exon. Correspondingly, at the last window, its 5’ end coincides with the 5’ end of the target exon. Each window of analysis corresponds to a step of transcriptional analysis at which the possible secondary structures of its sequence are predicted.

[Figure 7-1 labels: exon; AON target site; flanking introns; 1500 nt window of analysis; 1st, 2nd, 3rd and last steps of transcriptional analysis; 1 nt shift per step; direction of pre-mRNA elongation]


As mfold does not consider folding paths, they are approximated using the model depicted in Figure 7-1. A “window of analysis” with a pre-determined sequence length of 1500 nucleotides, which includes the full length of the targeted exon, corresponds to a “step of transcriptional analysis”. To approximate the transcription elongation process, the window of analysis is shifted one nucleotide at a time along the pre-mRNA sequence towards the 3’ end. At each step of transcriptional analysis, the possible secondary structures for the window sequence are predicted using mfold version 3.1 (Zuker, 2003; Mathews et al., 1999). Since the nascent pre-mRNA may well not have the chance to assume optimal secondary structures, sub-optimal secondary structures whose energies lie within 5% of the optimum are considered. On average, 44,582 secondary structures are predicted per exon, with 24 to 47 secondary structures predicted in each step of transcriptional analysis; the number of secondary structures predicted in the 79 exons is given in Appendix A-17.
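The sliding-window model can be sketched as follows. This is a minimal illustration only, not the code used in the thesis: the function name, the 0-based half-open coordinate convention and the requirement that enough flanking sequence exists on both sides of the exon are all assumptions made here. The call to mfold at each step is deliberately left out.

```python
from typing import Iterator, Tuple


def windows_of_analysis(
    exon_start: int, exon_end: int, seq_len: int, window: int = 1500
) -> Iterator[Tuple[int, int]]:
    """Yield one (start, end) window per step of transcriptional analysis.

    Coordinates are 0-based and half-open (an assumption of this sketch).
    The first window ends at the 3' end of the exon; the last window
    starts at the 5' end of the exon; each step shifts the window by 1 nt.
    """
    exon_len = exon_end - exon_start
    if exon_len <= 0 or exon_len > window:
        raise ValueError("exon must be non-empty and fit inside the window")
    first_start = exon_end - window   # 3' end of window == 3' end of exon
    last_start = exon_start           # 5' end of window == 5' end of exon
    if first_start < 0 or last_start + window > seq_len:
        # Assumption: the sketch refuses exons without enough flanking sequence.
        raise ValueError("not enough flanking sequence for this window length")
    for start in range(first_start, last_start + 1):
        yield start, start + window


# Hypothetical 100-nt exon at positions 2000..2099 of a 5000-nt pre-mRNA.
steps = list(windows_of_analysis(2000, 2100, 5000))
print(len(steps), steps[0], steps[-1])
```

Under these assumptions the number of steps equals the window length minus the exon length plus one, and every window contains the whole exon, so every nucleotide of an AON target site is covered at every step.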

Note that the model considers only the local secondary structures around the target exon. As abundant hnRNPs (heterogeneous nuclear ribonucleoproteins) package long introns into compact secondary structures that deter long-distance or global intra-molecular complementary base pairing (Alberts et al., 2002), this assumption is justified given that long introns are typical in the dystrophin gene (Figure 6-1 of Chapter 6). The 1500-nucleotide length of the window of analysis, on the other hand, is estimated from experimental measurements. It has been reported that the 3’ splice site is recognized 48 seconds after it is transcribed (Beyer and Osheim, 1988). Based on the measured elongation rate of dystrophin pre-mRNA of 1700 to 2500 nucleotides per minute (Tennyson et al., 1995), about 1360 to 2000 nucleotides would be appended to the nascent transcript during this period. Nevertheless, co-transcriptional secondary structures of exons 2 (62 bp), 29 (150 bp) and 59 (269 bp) were also predicted with window lengths of 1200 and 2000 nucleotides, and no statistical differences in their co-transcriptional secondary structures were detected (data not shown).
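The window-length estimate can be checked as a worked calculation, assuming a constant elongation rate over the 48-second interval. The symbols N_min and N_max are introduced here for illustration only and do not appear in the thesis; the block assumes the amsmath package.

```latex
\begin{align*}
N_{\min} &= 1700~\text{nt/min} \times \frac{48~\text{s}}{60~\text{s/min}} = 1360~\text{nt}, \\
N_{\max} &= 2500~\text{nt/min} \times \frac{48~\text{s}}{60~\text{s/min}} = 2000~\text{nt}.
\end{align*}
```

The chosen window length of 1500 nucleotides lies within this range.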

7.4 Analysis of the dynamics of co-transcriptional binding accessibility of AON target sites

Four levels of analysis using scoring methodologies of increasing complexity are used to score the binding accessibility of AON target sites in the two sets of published AONs. Scores at each level of analysis are then correlated with the degree of reported AON efficiency and efficacy for each set of AONs. Note that these scoring methodologies are applicable to any secondary structure prediction tool, as long as co-transcriptional secondary structures of AON target sites can be generated.

At this simplest level of analysis, the binding accessibility score of an AON target site (L1) is computed. To do so, the binding accessibility score of each nucleotide within the AON target site is first needed; it is given by the ratio:

Nucleotide accessibility score = (number of predicted secondary structures in which the nucleotide is unpaired) / (total number of secondary structures predicted)


Note: all secondary structures predicted at every step of transcriptional analysis (Figure 7-1) are included in the calculation; a nucleotide is “unpaired” when it does not form a complementary base pair with another nucleotide within the pre-mRNA.

Thus, the accessibility score for the AON target site, L1, is:

L1 = (sum of nucleotide accessibility scores for all nucleotides within the AON target site) / (total number of nucleotides in the AON target site)
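The two ratios above can be illustrated with a short sketch. It assumes, purely for illustration, that each predicted secondary structure is available as a dot-bracket string (where "." marks an unpaired nucleotide) together with the pre-mRNA coordinate at which its window starts; this representation and all function names are hypothetical and are not taken from the thesis or from mfold.

```python
from typing import Sequence, Tuple

# (window start in pre-mRNA coordinates, dot-bracket string); hypothetical format.
Structure = Tuple[int, str]


def _symbol_at(structure: Structure, position: int) -> str:
    """Return the dot-bracket symbol of a pre-mRNA position in one structure."""
    start, dot_bracket = structure
    offset = position - start
    if not 0 <= offset < len(dot_bracket):
        raise IndexError("position lies outside this window")
    return dot_bracket[offset]


def nucleotide_accessibility(structures: Sequence[Structure], position: int) -> float:
    """Fraction of all predicted structures in which the nucleotide is unpaired."""
    unpaired = sum(1 for s in structures if _symbol_at(s, position) == ".")
    return unpaired / len(structures)


def l1_score(structures: Sequence[Structure], site_start: int, site_end: int) -> float:
    """Mean nucleotide accessibility over the target site [site_start, site_end)."""
    scores = [nucleotide_accessibility(structures, p) for p in range(site_start, site_end)]
    return sum(scores) / len(scores)


# Toy data: two structures pooled from two steps, target site at positions 2..4.
toy = [(0, "((..))"), (1, "(..).")]
print(l1_score(toy, 2, 5))
```

Structures from all steps are pooled into one list here, matching the statement that all secondary structures predicted at every step are included in the calculation.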

The L1 scores for each AON target site analyzed are tabulated in Appendix A-18. The two-sample Kolmogorov-Smirnov (K-S) test is used to test for statistically significant differences in the L1 scores of target sites between any two AON grades of the same set. Table 7-2 tabulates the p-values for the statistical tests. To ensure consistent test outcomes, two mutually exclusive one-tailed tests, i.e., Ho: 1st < 2nd and Ho: 1st > 2nd (columns 2 and 3), are performed for each test case (as described in column 1). For instance, for the test case (++ versus –) of Set A, Ho: 1st < 2nd tests whether the L1 scores for target sites of (++) AONs are smaller than those of (–) AONs. The stated hypothesis is accepted if the p-value < 0.05 and rejected otherwise; in conventional terminology, it therefore plays the role of the alternative hypothesis of the one-tailed test. Thus, the test outcomes in a particular test case are inconsistent if the hypotheses of both tests are accepted.
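The pair of one-tailed tests can be sketched in plain Python as below. This is an illustrative approximation, not a reproduction of the R analysis in the thesis: it uses the standard asymptotic formula exp(-2 * (m*n/(m+n)) * D^2) for the one-sided two-sample K-S statistic, which may differ from the p-values R reports for small samples. The function and variable names are hypothetical.

```python
import math
from bisect import bisect_right
from typing import Sequence, Tuple


def ks_one_tailed(first: Sequence[float], second: Sequence[float]) -> Tuple[float, float]:
    """Return approximate p-values for the hypotheses (1st < 2nd, 1st > 2nd)."""
    xs, ys = sorted(first), sorted(second)
    m, n = len(xs), len(ys)
    if m == 0 or n == 0:
        raise ValueError("both samples must be non-empty")
    d_first_lower = 0.0   # largest amount by which ECDF(first) exceeds ECDF(second)
    d_first_higher = 0.0  # largest amount by which ECDF(second) exceeds ECDF(first)
    for t in sorted(set(xs) | set(ys)):
        f1 = bisect_right(xs, t) / m   # empirical CDF of the first sample at t
        f2 = bisect_right(ys, t) / n   # empirical CDF of the second sample at t
        d_first_lower = max(d_first_lower, f1 - f2)
        d_first_higher = max(d_first_higher, f2 - f1)
    effective_n = m * n / (m + n)
    p_first_lower = math.exp(-2.0 * effective_n * d_first_lower ** 2)
    p_first_higher = math.exp(-2.0 * effective_n * d_first_higher ** 2)
    return p_first_lower, p_first_higher


# Toy scores in which the first grade is clearly lower than the second.
grade_1 = [0.01 * i for i in range(1, 11)]
grade_2 = [0.50 + 0.01 * i for i in range(1, 11)]
p_lower, p_higher = ks_one_tailed(grade_1, grade_2)
print(p_lower < 0.05, p_higher < 0.05)
```

A test case would be flagged as inconsistent, in the sense used above, only if both returned p-values fell below 0.05.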


Table 7-2 p-values for K-S tests using the first level score (L1)

p-values (columns 2 and 3) of the K-S tests for the target sites of AONs in (A) Set A and (B) Set B. Statistically significant p-values are indicated in bold and underlined. Column 1 describes the test case. The last column indicates whether the particular test case tests for AON efficacy and/or efficiency. In (B), (+ 1,2) denotes AONs merged from (+ 1) and (+ 2) AONs. Note: the Wilcoxon rank-sum test cannot be used as one of its key assumptions is violated, i.e., the distributions of each AON grade’s L1 scores are distinct (box plots not shown).

For AONs in Set A, L1 scores for target sites in each grade of AONs do not show any statistical difference (Table 7-2A), which agrees with the results reported by Aartsma-Rus et al. (2005) and Harding et al. (2007). For AONs in Set B, L1 scores for target sites of (++) and (+ 1) AONs are statistically higher than those of (–) AONs; their p-values are highlighted in Table 7-2B. This result indicates that (++) and (+ 1) AON target sites are more accessible than (–) AON target sites, and therefore the L1 score could correlate with AON efficacy for Set B AONs.

At this level of analysis, the nucleotide accessibility scores of every nucleotide in an AON target site are screened to determine the presence of two or more scores with values below 0.1 occurring consecutively in the nucleotide sequence of the target site (Figure 7-2). Such a grouping of nucleotide accessibility scores below 0.1 is termed a “low accessibility cluster”; refer to Table S3 of Wee et al. (2008a) (attached in Appendix A-1) for the list of low accessibility clusters found in all the analyzed AONs. In Set A, 71% of target sites of (–) AONs had one or more low accessibility clusters. While only 17% of target sites of (+) AONs had one or more clusters, they were found in 52% of target sites of (++) AONs. Set B exhibited similar trends: 71%, 70% and 80% of target sites of (–) AONs, (+) AONs and (++) AONs respectively had one or more clusters. Therefore, the presence of these clusters in the AON target sites does not correlate with AON efficacy and efficiency.
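The screening for low accessibility clusters amounts to finding runs of at least two consecutive nucleotide accessibility scores below 0.1. A minimal sketch is given below; the function name, parameter names and the half-open index convention are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def low_accessibility_clusters(
    scores: Sequence[float], threshold: float = 0.1, min_len: int = 2
) -> List[Tuple[int, int]]:
    """Return (start, end) half-open index ranges of runs of scores below threshold.

    Only runs of at least min_len consecutive nucleotides count as clusters.
    """
    clusters: List[Tuple[int, int]] = []
    run_start = None
    for i, score in enumerate(scores):
        if score < threshold:
            if run_start is None:
                run_start = i          # a new run of low scores begins here
        else:
            if run_start is not None and i - run_start >= min_len:
                clusters.append((run_start, i))
            run_start = None
    # Close a run that extends to the last nucleotide of the target site.
    if run_start is not None and len(scores) - run_start >= min_len:
        clusters.append((run_start, len(scores)))
    return clusters


# Toy target site: one isolated low score (index 4) and two clusters.
site_scores = [0.5, 0.05, 0.02, 0.3, 0.09, 0.4, 0.0, 0.01, 0.03]
print(low_accessibility_clusters(site_scores))
```

A target site would then be counted as having "one or more clusters" whenever the returned list is non-empty.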

Figure 7-2 Nucleotide accessibility scores of all the nucleotides in three representative AON target sites

In each plot, the horizontal axis represents the nucleotide position in the respective target exon and the nucleotide accessibility score is plotted on the vertical axis. The low accessibility clusters are indicated in red.

The nucleotide accessibility scores at the first and second levels of analysis are mean scores. As a result, two nucleotides with identical accessibility scores may have markedly different numbers of predicted secondary structures in which they are unpaired at each step of transcriptional analysis. In analyzing accessibility for AON binding, it may be important to take into account steps of transcriptional analysis at which a nucleotide is unpaired in none of the predicted secondary structures, i.e., the nucleotide is predicted to be completely inaccessible or “engaged” at that particular step of transcriptional analysis, as illustrated in Figure 7-3B. For the purpose of analysis, each nucleotide in the AON target site that is engaged at a given step of transcriptional analysis may then be depicted in a plot as illustrated in Figure 7-4. Table S4 of Wee et al. (2008a) (attached in Appendix A-1) tabulates these plots for all the AON target sites analyzed.


Figure 7-3 Schematic illustration of an engaged nucleotide

(A) – (C) Multiple secondary structures of the targeted exon (drawn in black) are predicted in each step of transcriptional analysis; some of the possible structural motifs are shown schematically here. For illustration purposes, a particular nucleotide (marked in red) within an AON target site (green line) is tracked. When this nucleotide is paired (denoted with *), it is not accessible for AON binding. If this nucleotide is paired in all predicted secondary structures, it is defined as an engaged nucleotide at this particular step of transcriptional analysis (B).

Figure 7-4 Schematic plot depicting the incidences of engaged nucleotides

In the above illustration, the horizontal axis denotes sequential steps of transcriptional analysis while the vertical axis denotes numbered nucleotides within the AON target site. At each step of transcriptional analysis, nucleotides in the target site that are engaged are depicted as black dots in the plot. The calculations of the fourth level scores, L4_OR and L4_AND, are illustrated (refer to Section 7.5 for details).

For each nucleotide in an AON target site, a nucleotide engaged score is defined as:

Nucleotide engaged score = (total number of steps of transcriptional analysis at which the nucleotide is engaged) / (total number of steps of transcriptional analysis)

Following this, an AON target site engaged score (L3) is defined as:

Sum of nucleotide engaged scores for all nucleotides within the AON target site
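The engaged scores can be sketched as follows, again assuming a hypothetical dot-bracket representation in which the structures are grouped by step of transcriptional analysis. The sketch computes the nucleotide engaged score and the sum over the target site exactly as worded above; any further normalization of the sum is not shown, and all names are illustrative.

```python
from typing import List, Sequence, Tuple

# One step: (window start in pre-mRNA coordinates, dot-bracket structures
# predicted at that step). Hypothetical format; each step is assumed to
# contain at least one structure.
Step = Tuple[int, List[str]]


def is_engaged(step: Step, position: int) -> bool:
    """True if the nucleotide is paired in every structure predicted at this step."""
    start, structures = step
    offset = position - start
    for dot_bracket in structures:
        if not 0 <= offset < len(dot_bracket):
            raise IndexError("position lies outside this window")
        if dot_bracket[offset] == ".":
            return False   # unpaired in at least one structure: not engaged
    return True


def nucleotide_engaged_score(steps: Sequence[Step], position: int) -> float:
    """Fraction of steps at which the nucleotide is engaged."""
    return sum(is_engaged(step, position) for step in steps) / len(steps)


def engaged_score_sum(steps: Sequence[Step], site_start: int, site_end: int) -> float:
    """Sum of nucleotide engaged scores over the target site [site_start, site_end)."""
    return sum(nucleotide_engaged_score(steps, p) for p in range(site_start, site_end))


# Toy data: two steps, target site at positions 2..4.
toy_steps = [(0, ["((..))", "(....)"]), (1, ["(())."])]
print(engaged_score_sum(toy_steps, 2, 5))
```

Unlike the pooled ratio used for L1, this score is evaluated step by step, so a nucleotide that is frequently unpaired overall can still be engaged at individual steps.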



In contrast to the L1 score, the higher the L3 score, the less accessible a target site is for AON binding. Appendix A-18 tabulates the L3 scores for all the AONs analyzed.

For Set A AONs, target sites of (++) AONs have statistically lower engaged scores than target sites of both (–) and (+) AONs (Table 7-3A). Therefore, the L3 score can statistically differentiate both AON efficacy and efficiency. However, seven outlier AONs (6% of total AONs) are identified. In this context, these are AONs whose target site L3 scores contradict their AON grades. For instance, h52AON2 and h60AON2, graded as (–), could not induce exon skipping although the L3 scores of their target sites are below the 5th percentile of the L3 scores of (++) AON target sites (Appendix A-18). On the other hand, h45AON5 and h46AON4, graded as (+), and h51AON29, h55AON5 and h77AON2, graded as (++), could still induce exon skipping although the L3 scores of their target sites are all higher than the 95th percentile of the L3 scores of (–) AON target sites (Appendix A-18). The omission of these outlier AONs strengthens the correlation of L3 scores with AON efficacy and efficiency (Table 7-3A).

For Set B AONs, target sites of (++) AONs have statistically lower engaged scores than target sites of (–) AONs (Table 7-3B). Upon omission of four outlier AONs (6% of total AONs, i.e., H30A, H58A, H64A and H34A2), L3 scores can statistically differentiate efficacy between (+ 1) and (–) AONs, and efficiency between (++) and (+ 2) AONs. Overall, L3 scores correlate with the efficacies and efficiencies of (++), (+) and (+ 1) AONs better than L1 scores do (Table 7-2B versus Table 7-3B): L3 scores can differentiate between more AON grades than L1 scores, and for K-S tests in which L1 scores show statistical significance, the corresponding K-S tests of L3 scores achieve even lower p-values.

Table 7-3 p-values for K-S tests using the third level score (L3)

p-values (with outliers: columns 2 and 3; without outliers: columns 4 and 5) of the K-S tests for the target sites of AONs in (A) Set A and (B) Set B. Statistically significant p-values are indicated in bold and underlined. Column 1 describes the test case. The last column indicates whether the particular test case tests for AON efficacy and/or efficiency. In (B), (+ 1,2) denotes AONs merged from (+ 1) and (+ 2) AONs. Note: the Wilcoxon rank-sum test cannot be used as one of its key assumptions is violated, i.e., the distributions of each AON grade’s L3 scores are distinct (box plots not shown).

With outliers Without outliers Test case:

To explain the contrast between K-S test results of the first and third level

scores, quartiles of the normalized L1 (L1) and L3 (L3) scores of AON target sites for AONs in each grade of Sets A and B are plotted for comparison; for example, the L1 score of an AON target site is the relative percentage difference between its L1 score
