MVRM: A hybrid approach to predict siRNA efficacy

Bui Ngoc Thang
University of Engineering and Technology, Vietnam National University, Hanoi
144 Xuanthuy, Caugiay, Hanoi, Vietnam
Email: thangbn@vnu.edu.vn

Le Sy Vinh
University of Engineering and Technology, Vietnam National University, Hanoi
144 Xuanthuy, Caugiay, Hanoi, Vietnam
Email: vinhls@vnu.edu.vn

Ho Tu Bao
School of Knowledge Science, Japan Advanced Institute of Science and Technology
Email: bao@jaist.ac.jp

Abstract: The discovery of RNA interference (RNAi) has opened the way to designing novel drugs for different diseases. Selecting short interfering RNAs (siRNAs) that can knock down target genes efficiently is one of the key tasks in studying RNAi. A number of predictive models have been proposed to predict the knockdown efficacy of siRNAs; however, their performance is still far from expectations. This work aims to develop a predictive model to enhance siRNA knockdown efficacy prediction. The key idea is to combine the rule-based and the model-based approaches. To this end, views of siRNAs that integrate available siRNA design rules are first learned using an adaptive Fuzzy C-Means (FCM) algorithm. The learned views and other properties of siRNAs are combined into the final representations of siRNAs. The elastic net regression method is employed to learn a predictive model from these final representations. Experiments on benchmark datasets showed that the proposed method achieved stable and accurate results in comparison with other methods.

I. INTRODUCTION

RNA interference (RNAi) is a cellular process in which long double-stranded RNA duplexes or hairpin precursors are cleaved into short interfering RNAs (siRNAs) by the ribonuclease III enzyme Dicer. siRNAs bind to the RNA-induced silencing complex (RISC) and are unwound into sense and antisense strands; the antisense strands then bind to their complementary target mRNAs and induce their degradation. In 1998, Fire and Mello discovered the important role of dsRNAs when they studied RNAi in the nematode worm Caenorhabditis elegans (they were awarded the Nobel Prize in Physiology or Medicine in 2006 for their contributions to research on RNAi). Studies following the discovery of RNAi have had an immense impact on biomedical research and have made RNAi a valuable tool for designing novel medical applications [27], [7], [13], [25], [17], [10]. In RNAi research, synthesizing highly effective siRNAs is a crucial task in designing novel drugs for the treatment of different diseases such as influenza A virus, HIV, hepatitis B virus, RSV, and cancer. As a consequence, siRNA-based silencing is considered one of the most promising techniques for future therapy, and predicting the knockdown efficacy of siRNAs is an essential problem for effective siRNA selection [39], [40], [28], [31], [32], [33], [34], [35].

A number of algorithms have been proposed to design and predict effective siRNAs. They can be categorized into two approaches: the rule-based approach and the model-based approach [14], [18], [23].

The rule-based approach proposes different rules to generate effective siRNAs. These rules were empirically designed and examined on small datasets. The first rational siRNA design rule was found by Elbashir et al. [6]. They suggested that siRNAs of size 19–21 nt with 2 nt overhangs at the 3' ends can efficiently degrade the mRNAs of target genes. Scherer et al. [22] found that thermodynamic properties are important characteristics for designing effective siRNAs that inhibit specific target mRNAs. Soon after that, various rational design rules for generating effective siRNAs were proposed [21], [30], [1], [15], [37], [38]. For example, Ui-Tei and colleagues [30] examined 72 siRNAs targeting six genes and discovered four criteria for effective siRNA design: (i) A or U at position 19, (ii) G or C at position 1, (iii) at least five U or A residues in positions 13–19, and (iv) no GC stretch longer than 9 nt. Amarzguioui and co-workers [1] analyzed 46 siRNAs targeting genes and reported the following rule of six criteria for effective siRNA design: (i) ΔT3 = T3 − T5, the difference between the number of A/U residues in the three terminal positions at the 3' end and at the 5' end (relative to the sense strand of the siRNA), where ΔT3 > 1 is positively correlated; (ii) a G or C residue at position 1, positively correlated; (iii) a U residue at position 1, negatively correlated; (iv) an A residue at position 6, positively correlated; (v) A or U at position 19, positively correlated; (vi) G at position 19, negatively correlated.
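As an illustration of how such a rule can be checked mechanically, the following minimal sketch evaluates the four Ui-Tei criteria as they are stated above. It assumes that positions are counted from 1 on the 19-nt sequence as given; the function names are hypothetical and are not taken from [30].

```python
def longest_gc_stretch(seq):
    """Length of the longest run of consecutive G/C nucleotides."""
    best = run = 0
    for nt in seq:
        run = run + 1 if nt in "GC" else 0
        best = max(best, run)
    return best


def ui_tei_criteria(seq):
    """Evaluate the four criteria on a 19-nt sequence (1-based positions)."""
    if len(seq) != 19:
        raise ValueError("expected a 19-nt sequence")
    return {
        "A/U at position 19": seq[18] in "AU",
        "G/C at position 1": seq[0] in "GC",
        ">= 5 A/U in positions 13-19": sum(nt in "AU" for nt in seq[12:19]) >= 5,
        "no GC stretch > 9 nt": longest_gc_stretch(seq) <= 9,
    }


# Example sequence used later in this paper to illustrate the encoding.
checks = ui_tei_criteria("GAAAGGAAUUGUAUAAAUC")
```

For this sequence, only the first criterion fails, because position 19 holds a C.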

However, the rule-based approach has not been satisfactory. About 65% of the siRNAs generated by these rules failed when experimentally tested, in that they did not reach 90% inhibition, and nearly 20% of them were inactive [20]. The main reason is that siRNA design rules were empirically derived from small datasets in which siRNAs were synthesized for specific genes. Therefore, each rule on its own is in general poor at designing highly effective siRNAs.

The model-based approach includes predictive models that were learned from larger datasets by different machine learning techniques. Predictive models are more accurate and reliable than the rule-based approach [24]. For example, Huesken and co-workers [12] proposed a new algorithm, BIOPREDsi, by applying artificial neural networks to a dataset of 2431 scored siRNAs. This dataset has been widely used as a benchmark to train and test other predictive models such as ThermoComposition21 [24], DSIR [28], i-Score [14], and the Scales model [36]. These predictive models are currently regarded as the best predictors [18], [36]. More recently, Sciabola et al. [23] employed three-dimensional structural information of siRNAs to increase the performance of their model. A stable predictive model called BiLTR [3] was developed to predict the knockdown efficacy of siRNAs.

2015 Seventh International Conference on Knowledge and Systems Engineering

Although model-based methods are better than rule-based methods, they suffer from some drawbacks. Their performance is still low and unstable. The predictive ability of these models decreases considerably and varies when they are tested on independent datasets, as shown by the performance of 18 current models tested on three independent datasets [23]. Our analyses reveal two main reasons for this. (1) siRNA datasets were provided by different groups under different protocols in different scenarios [16], [41], so the distributions of these datasets are very different and siRNA data are heterogeneous. (2) The performance of machine learning methods depends heavily on the choice of data representation (or features) to which they are applied. In previous models, siRNAs were encoded by binary, spectral, tetrahedron, and sequence representations. However, because of the diversity of siRNA distributions and the unsuitable measures based on these representations, they can be inappropriate for representing siRNAs in order to build a good model for predicting siRNA efficacy.

In this paper, we develop a hybrid approach, named MVRM, to predict siRNA knockdown efficacy. The method combines design rules and machine learning methods to build a predictive model. To this end, we focus on the representation of siRNAs. Available siRNA design rules are considered as prior background knowledge for generating views to represent siRNAs. Each view captures the characteristics of one siRNA design rule. These views are then learned by exploiting the fuzzy C-means algorithm. A new representation of siRNAs is composed of the learned views and other properties of siRNAs such as melting temperature, molecular weight, and thermodynamic values. After transforming siRNAs to the new representation, a predictive model is learned by applying a regularized method, the elastic net, to predict the knockdown efficacy of siRNAs.

Our method is experimentally compared with other methods on benchmark datasets. The experiments show promising results: the performance of MVRM is comparable to or better than that of other methods.

II. METHODS

Our model, MVRM, is a hybrid of the rule-based and the model-based approaches, so it consists of two main phases: learning siRNA views from design rules to build new representations of siRNAs, and building a predictive model from these new representations to predict the knockdown efficacy of siRNAs.

A. Learning siRNA views

Let S = {s_1, s_2, ..., s_m} be a set of m siRNA sequences with the same length n. The knockdown efficacy of sequence s_i ∈ S is e_i (i = 1, ..., m). A set of k design rules R = {r_1, r_2, ..., r_k} is collected from previous rule-based studies. Learning siRNA views consists of four steps: encoding siRNAs by content, encoding rules to views, learning siRNA views, and encoding siRNAs by learned views.

TABLE I. THE FIVE WELL-STUDIED CHARACTERISTICS OF siRNA SEQUENCES

Property                                 Condition          Encoding column
GC content                               From 0.3 to 0.6    (1,0,-1,-1) at column (n + 1)
                                         Otherwise          (0,1,-1,-1) at column (n + 1)
T                                        >= 1               (1,0,-1,-1) at column (n + 2)
GC stretch                               >= 9               (1,0,-1,-1) at column (n + 3)
A/Us at five positions of the 5' end     >= 3               (1,0,-1,-1) at column (n + 4)
A/Us at seven positions of the 5' end    >= 5               (1,0,-1,-1) at column (n + 5)

Encoding siRNAs by content: Each siRNA is a sequence of n nucleotides, such as "GAAAGGAAUUGUAUAAAUC". There are five well-studied characteristics of an siRNA [26]: (1) GC content, (2) the difference in the number of A/U in the 3 nucleotides at the two ends (T), (3) GC stretch, and (4), (5) the number of A/U at five and seven positions of the 5' end of the antisense strand. This step encodes siRNA sequence s_i (i = 1, ..., m) by a binary matrix M_i of size 4 × (n + 5), in which the 4 rows represent the 4 nucleotide types and the (n + 5) columns represent the n nucleotides and the 5 siRNA characteristics. The first n columns represent the n nucleotides, i.e., column c (c = 1, ..., n) is a binary vector of size 4 × 1 representing the nucleotide at position c of the siRNA sequence. Specifically, the four nucleotides A, C, G, and U are encoded by the vectors (1, 0, 0, 0)^T, (0, 1, 0, 0)^T, (0, 0, 1, 0)^T, and (0, 0, 0, 1)^T, respectively. The last five columns of the matrix represent the five characteristics of the siRNA. They are computed and encoded as vectors as described in Table I. The encoding matrix M of the 19-nucleotide siRNA sequence "GAAAGGAAUUGUAUAAAUC" is described in Table II.
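The content encoding can be sketched as follows. This is a minimal illustration, not the authors' implementation: it builds the n nucleotide columns and only the GC-content column of Table I, it treats the bounds 0.3 and 0.6 as inclusive (an assumption), and the function names are hypothetical. The remaining four characteristic columns would be appended in the same way.

```python
NUCLEOTIDES = "ACGU"  # row order used in the text: A, C, G, U


def encode_nucleotides(seq):
    """Return the first n columns of M, each as a 4-element one-hot column."""
    columns = []
    for nt in seq:
        column = [0, 0, 0, 0]
        column[NUCLEOTIDES.index(nt)] = 1
        columns.append(column)
    return columns


def encode_gc_content(seq):
    """Column n + 1 of M, following the GC-content rows of Table I."""
    gc = sum(nt in "GC" for nt in seq) / len(seq)
    # Assumption: the interval "from 0.3 to 0.6" includes both end points.
    return [1, 0, -1, -1] if 0.3 <= gc <= 0.6 else [0, 1, -1, -1]


seq = "GAAAGGAAUUGUAUAAAUC"
columns = encode_nucleotides(seq) + [encode_gc_content(seq)]
```

Here the matrix is stored column by column; the example sequence has a GC content of 5/19, so its GC-content column falls in the "Otherwise" case.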

Encoding design rules to views: This step encodes each design rule r_i (i = 1, ..., k) by a matrix T_i (view T_i) of size 4 × (n + 5), in which the 4 rows represent the 4 nucleotide types and the (n + 5) columns represent the n nucleotides and the 5 siRNA characteristics. Column j (j = 1, ..., n) of the matrix shows the knockdown efficacy of nucleotides A, C, G, and U at position j. The last five columns describe the knockdown efficacy of the five siRNA characteristics.

The knockdown efficacy values of view T_i have to satisfy the constraints of the siRNA design rule. Design rule r_i propositionally describes the occurrence or absence of nucleotides at different positions on effective siRNAs, as well as the other siRNA characteristics mentioned above. Thus, if design rule r_i states the occurrence (or absence) of some nucleotides at the j-th position, then their corresponding values in view T_i should be greater (or smaller) than the other values in column j. Similarly, if design rule r_i involves the j-th characteristic, the corresponding value in column (n + j) of matrix T_i should be greater than the other values in that column.

For example, consider a rule r and its encoding matrix T. Suppose the design rule states that, at position 19, nucleotide A is effective and nucleotide C is ineffective. This means that the knockdown efficacy of nucleotide A is larger than that of the other nucleotides, and the knockdown efficacy of nucleotide C is smaller than that of the other nucleotides. T[1,19], T[2,19], T[3,19], and T[4,19] are the knockdown efficacy values of A, C, G, and U, respectively. The rule at position 19 can be expressed as specific constraints on matrix T as follows:

• T[2, 19] − T[1, 19] < 0, i.e., A is more effective than C
• T[3, 19] − T[1, 19] < 0, i.e., A is more effective than G
• T[4, 19] − T[1, 19] < 0, i.e., A is more effective than U
• T[2, 19] − T[3, 19] < 0, i.e., C is less effective than G
• T[2, 19] − T[4, 19] < 0, i.e., C is less effective than U

TABLE II. THE ENCODING MATRIX M OF siRNA SEQUENCE GAAAGGAAUUGUAUAAAUC. THE FIRST 19 COLUMNS ENCODE THE 19 NUCLEOTIDES OF THE siRNA SEQUENCE. THE LAST 5 COLUMNS ENCODE THE 5 CHARACTERISTICS OF THE siRNA SEQUENCE.

Position 1 2 3 4 5 ... 18 19 20 21 22 23 24

Let G_i (i = 1, ..., k) be the set of specific constraints of rule r_i on matrix T_i, where each constraint of G_i has the form (T[p, j] − T[q, j] < 0), with rows p, q = 1, ..., 4 and column j = 1, ..., n + 5.
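A constraint set of this form is straightforward to represent and check in code. The sketch below is illustrative only: the triple (p, q, j), standing for T[p, j] − T[q, j] < 0 with 1-based indices, and the helper names are assumptions, and the numerical values placed in the view are made up for the example.

```python
# Row indices follow the text: 1 = A, 2 = C, 3 = G, 4 = U.
A, C, G, U = 1, 2, 3, 4

# Each triple (p, q, j) stands for the constraint T[p, j] - T[q, j] < 0.
constraints_pos19 = [
    (C, A, 19), (G, A, 19), (U, A, 19),  # A is more effective than C, G, U
    (C, G, 19), (C, U, 19),              # C is less effective than G, U
]


def satisfies(T, constraints):
    """True if view T (a list of 4 rows) meets every constraint."""
    return all(T[p - 1][j - 1] < T[q - 1][j - 1] for p, q, j in constraints)


def make_view(n_columns, fill=0.25):
    """A flat 4 x n_columns view in which no nucleotide is preferred."""
    return [[fill] * n_columns for _ in range(4)]


T = make_view(24)                      # n + 5 = 24 columns for n = 19
flat_ok = satisfies(T, constraints_pos19)

# Illustrative values at position 19 that respect the rule.
T[A - 1][18], T[C - 1][18], T[G - 1][18], T[U - 1][18] = 0.6, 0.05, 0.2, 0.15
rule_ok = satisfies(T, constraints_pos19)
```

A flat view violates the strict inequalities, while the adjusted column at position 19 satisfies all five of them.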

Learning views: The siRNA set {s_1, s_2, ..., s_m} serves as the training set for learning the k views (optimizing the k matrices T_1, ..., T_k). Learning views can be considered as a clustering problem [5] in which the k matrices are the centers of k clusters. Each encoding matrix M_i of siRNA s_i is assigned to view T_j with a membership value u_ij (i = 1, ..., m; j = 1, ..., k). This means that siRNA sequences can be generated by different views with different degrees of confidence.

We employ the FCM algorithm [2] with k clusters to optimize the k views (matrices) and the membership values by minimizing the following objective function:

\[ R = \sum_{i=1}^{m} \sum_{j=1}^{k} u_{ij}^{2} \, \| M_i - T_j \|_{Fro}^{2} \qquad (1) \]

subject to:

1) the k constraint sets G_1, ..., G_k, where G_i is the set of specific constraints of rule r_i on matrix T_i;
2) \( \sum_{j=1}^{k} u_{ij} = 1, \; i = 1, \ldots, m \),

where \( \| \cdot \|_{Fro} \) is the Frobenius norm of a matrix. Membership values and matrices can be solved for by an iterative method: each column of a matrix is derived while keeping the other ones fixed. The final solution is computed as follows:

\[ u_{ij} = \frac{1}{\sum_{z=1}^{k} \left( \dfrac{\| M_i - T_j \|_{Fro}}{\| M_i - T_z \|_{Fro}} \right)^{2}} \qquad (2) \]

\[ T_j[., c] = \frac{\sum_{i=1}^{m} u_{ij}^{2} \, M_i[., c]}{\sum_{i=1}^{m} u_{ij}^{2}} \qquad (3) \]

where T_j[., c] is the vector corresponding to the c-th column of matrix T_j.

Algorithm 1 describes the two steps: computing the membership values of the encoding matrices and updating the matrices corresponding to the views. These two steps are repeated until the membership values and the views meet the convergence criteria.
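The two update steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions rather than the authors' code: matrices are stored as lists of rows, the small constant eps that guards against a zero distance is an addition not mentioned in the paper, the constraint check on each updated column is left out, and the function names are hypothetical.

```python
import math


def fro_distance(M, T):
    """Frobenius norm of M - T for matrices stored as lists of rows."""
    return math.sqrt(sum((m - t) ** 2
                         for row_m, row_t in zip(M, T)
                         for m, t in zip(row_m, row_t)))


def update_memberships(Ms, Ts, eps=1e-12):
    """Membership update: u_ij = 1 / sum_z (d_ij / d_iz)^2."""
    U = []
    for M in Ms:
        d = [max(fro_distance(M, T), eps) for T in Ts]
        U.append([1.0 / sum((d_j / d_z) ** 2 for d_z in d) for d_j in d])
    return U


def update_column(Ms, U, j, c):
    """View update: weighted mean of column c over all siRNAs, weights u_ij^2."""
    weights = [u_i[j] ** 2 for u_i in U]
    total = sum(weights)
    return [sum(w * M[row][c] for w, M in zip(weights, Ms)) / total
            for row in range(len(Ms[0]))]


# Toy data: two 2 x 2 "encodings" and two 2 x 2 "views".
Ms = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
Ts = [[[0.9, 0.1], [0.1, 0.9]], [[0.2, 0.8], [0.8, 0.2]]]
U = update_memberships(Ms, Ts)
new_column = update_column(Ms, U, j=0, c=0)
```

In the full procedure, a newly computed column would replace the current one only if it still satisfies the constraint set of its view.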

Encoding siRNAs by views: To obtain the final representation of siRNAs, the learned views are linearly combined and other properties of siRNAs are added. In particular, nucleotides A, C, G, and U of an siRNA at position c (c = 1, ..., n) are represented by the vectors (T_1[1, c], ..., T_k[1, c]), (T_1[2, c], ..., T_k[2, c]), (T_1[3, c], ..., T_k[3, c]), and (T_1[4, c], ..., T_k[4, c]), respectively. If the GC content of an siRNA satisfies its condition (see Table I), it is represented by the vector (T_1[1, n + 1], ..., T_k[1, n + 1]). Otherwise, it is represented by (T_1[2, n + 1], ..., T_k[2, n + 1]). The four other characteristics of siRNAs are encoded in a similar way. In short, each siRNA sequence is encoded by a vector of size k × (n + 5). Moreover, five other properties of siRNAs (melting temperature, molecular weight, and three thermodynamic properties: enthalpy, entropy, and free energy) are added to the final representation. They are calculated by the nearest-neighbor method [48]. As a result, each siRNA is encoded by a vector of size k × (n + 5) + 5.
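The construction of the final feature vector can be sketched as follows. The sketch assumes that every column of the content encoding M contains exactly one entry equal to 1 (which holds for the nucleotide columns and for the characteristic columns of Table I), and that features are ordered by column and then by view; that ordering and the function name are assumptions made for illustration.

```python
def encode_by_views(M, views, properties):
    """Flatten an siRNA encoding through k views and append extra properties.

    M and every view are 4 x (n + 5) matrices stored as lists of rows. In each
    column of M, the row holding the 1 selects which entry of the same column
    is read from every view.
    """
    n_columns = len(M[0])
    features = []
    for c in range(n_columns):
        active_row = next(r for r in range(4) if M[r][c] == 1)
        features.extend(T[active_row][c] for T in views)
    return features + list(properties)


# Toy input with n = 19 and k = 5: every column of M selects row 1 (A),
# and view number v is filled with the constant value v.
M = [[1] * 24, [0] * 24, [0] * 24, [0] * 24]
views = [[[float(v)] * 24 for _ in range(4)] for v in range(5)]
properties = [0.0] * 5  # placeholders for the five physicochemical properties
vector = encode_by_views(M, views, properties)
```

With k = 5 and n = 19, the resulting vector has 5 × 24 + 5 = 125 entries.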

B. Learning a predictive model

This step builds a predictive model using the new representation of siRNAs. The elastic net method [49] is applied to build the model for predicting the knockdown efficacy of siRNAs. This method not only builds the model but also selects the important features that affect the target label. In addition, thanks to the lasso regularization term of the elastic net method, the significant variables, i.e., the important characteristics that influence the knockdown efficacy of siRNAs, are detected.
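For reference, the naive elastic net criterion of [49] can be written as below. The symbols are not defined in this paper and are introduced here as assumptions: y is the vector of knockdown efficacies e_i, X is the matrix whose rows are the final siRNA representations, β is the coefficient vector, and λ_1, λ_2 are the weights of the lasso and ridge penalties. The exact parameterization used for MVRM is not stated, so this should be read as the general form only.

```latex
\hat{\beta} = \arg\min_{\beta} \;
  \| y - X\beta \|_2^2
  + \lambda_2 \, \| \beta \|_2^2
  + \lambda_1 \, \| \beta \|_1
```

The \(\ell_1\) term is the part that drives coefficients of less important features to exactly zero, which is what makes feature selection possible.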

III. EXPERIMENTAL EVALUATION

This section presents an experimental evaluation that compares the proposed method MVRM (multiple-view-based regression model) with recent methods for siRNA knockdown efficacy prediction on four benchmark datasets:

• The Huesken dataset of 2431 siRNA sequences targeting 34 human and rodent mRNAs, commonly divided into the training set HU train of 2182 siRNAs and the testing set HU test of 249 siRNAs [12].

• The Reynolds dataset of 240 siRNAs [21].

• The Vicker dataset of 76 siRNA sequences targeting two genes [29].

• The Harborth dataset of 44 siRNA sequences targeting one gene [9].

We employed five siRNA design rules (k = 5) to learn five views of siRNAs. Specifically, the five design rules are the Reynolds rule [21], the Ui-Tei rule [30], the Amarzguioui rule [1], the Jagla rule [15], and the Hsieh rule [11]. The HU train set was used to learn these views and the MVRM model. The other datasets were used for the comparative evaluation.


Algorithm 1. Multi-view Learning

Input: A dataset S = {s_1, s_2, ..., s_m}, where s_i (i = 1, ..., m) are siRNA sequences of length n; a set R of k design rules; t_Max, the maximum number of iterations.
Output: k matrices (views) T_1, T_2, ..., T_k.

main
    Encode siRNAs by content
    for r_i in R do
        Form the set of constraints G_i based on r_i
        Initialize the view T_i satisfying G_i
    end for
    t = 0 {Iterative step}
    repeat
        t ← t + 1
        {Compute membership values}
        for i = 1 to m do
            for j = 1 to k do
                Compute u_ij^(t) using equation (2)
            end for
        end for
        {Update views}
        for j = 1 to k do
            for c = 1 to n + 5 do
                Compute T_j[., c]^(t) using equation (3)
                if T_j[., c]^(t) satisfies the constraints G_j then
                    T_j[., c] ← T_j[., c]^(t)
                end if
            end for
        end for
    until the relative changes ||T_p^(t) − T_p^(t−1)||_Fro / ||T_p^(t−1)||_Fro and ||u_qp^(t) − u_qp^(t−1)||_2 / ||u_qp^(t−1)||_2 (p = 1, ..., k; q = 1, ..., m) meet the convergence criteria, or (t > t_Max)
end main

Fig. 1. Upper and lower curves of mean squared error as a function of λ values.

Fig. 2. Coefficients of the MVRM model show the importance of 125 features.

TABLE III. THE R VALUES OF 18 MODELS AND MVRM ON THREE INDEPENDENT DATA SETS

Algorithm          R Reynolds    R Vicker     R Harborth
                   (244si/7g)    (76si/2g)    (44si/1g)
GPboot [42]        0.55          0.35         0.43
Takasaki [43]      0.03          0.25         0.01
Reynolds 1 [21]    0.35          0.47         0.23
Reynolds 2 [21]    0.37          0.44         0.23
Schwarz [37]       0.29          0.35         0.01
Khvorova [44]      0.15          0.19         0.11
Stockholm 1 [45]   0.05          0.18         0.28
Stockholm 2 [45]   0.00          0.15         0.41
i-Score [14]       0.54          0.58         0.43
BIOPREDsi [12]     0.53          0.57         0.51
MVRM model         0.6           0.614        0.52

The tuning parameter of the objective function of the model was estimated by 10-fold cross validation. Figure 1 shows the curves of the upper and lower bounds of the mean squared error between predicted efficacy and experimental efficacy obtained through cross validation. We used five design rules and five other properties of siRNAs in learning our model, so the final representation has 5 × 24 + 5 = 125 features. After learning the model, 78 important features that influence the knockdown efficacy of siRNAs were chosen. Figure 2 shows the influence of the 125 features. During the learning process, the coefficients of less important features are driven to zero. Based on the coefficients of the MVRM model, important features can be easily selected in order to design effective siRNAs.
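The cross-validation procedure for choosing the tuning parameter can be sketched as follows. This is a generic illustration and not the authors' setup: the fold assignment, the random seed, and the function names are assumptions, and cv_error is a hypothetical user-supplied function that would fit the model on the training indices and return the mean squared error on the held-out indices.

```python
import random


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train, test) index lists for each of the n_folds folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[f::n_folds] for f in range(n_folds)]
    for f in range(n_folds):
        test = folds[f]
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, test


def select_lambda(candidates, cv_error, n_samples, n_folds=10):
    """Return the candidate value with the lowest mean held-out error."""
    def mean_error(lam):
        errors = [cv_error(lam, train, test)
                  for train, test in k_fold_indices(n_samples, n_folds)]
        return sum(errors) / len(errors)
    return min(candidates, key=mean_error)


# The HU train set contains 2182 siRNAs.
splits = list(k_fold_indices(2182))
```

Each siRNA appears in exactly one held-out fold, so every candidate value is scored on all of the training data once.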

The MVRM model was compared with most state-of-the-art methods. For a fair comparison, we carried out the experiments on MVRM under the same conditions as reported for the other methods. Concretely, the comparative evaluation is as follows.

1) Comparison of MVRM with BIOPREDsi [12], ThermoComposition21 [24], DSIR [28], SVM [23], and BiLTR [3] when trained on the HU train set and tested on the HU test set. The Pearson correlation coefficients of those five models are 0.66, 0.66, 0.67, 0.80, and 0.67, respectively. The performance of MVRM estimated on the HU test set is 0.66. The performance of the MVRM model is similar to that of the other models but lower than that of the SVM model. The reason is that the SVM model uses positional features and 3D information. The 3D features capture the flexibility and strain of siRNAs, which can be important characteristics for the siRNAs of the HU test set, extracted from human NCI-H1299, HeLa genes and rodent genes [12].

2) Comparison of MVRM with 19 models, including BIOPREDsi, DSIR, SVM, and BiLTR, when all models were trained on the HU train set and tested on the three independent datasets of Reynolds, Vicker, and Harborth. The Pearson correlation coefficients of the MVRM model are 0.6, 0.614, and 0.52 when tested on the Reynolds, Vicker, and Harborth datasets, respectively. Table III shows that MVRM achieved considerably higher results than the first 17 models. It was better than the SVM and BiLTR models when tested on the first two datasets. MVRM was not as good as BiLTR on the Harborth dataset. However, one limitation of the BiLTR model is the computational cost of training its transformation matrices and parameters. It took about 5 days to train BiLTR but only about five minutes to train the MVRM model. Besides that, unlike most other models, the MVRM model produces stable results across the independent siRNA datasets.
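The Pearson correlation coefficient R used in these comparisons can be computed as in the sketch below. The function name and the small arrays of predicted and observed efficacies are made up for illustration; they are not data from the benchmark sets.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


# Illustrative values only.
r = pearson_r([0.2, 0.4, 0.6, 0.8], [0.1, 0.5, 0.55, 0.9])
```

R equals 1 for a perfect increasing linear relationship and −1 for a perfect decreasing one, so values such as 0.6 indicate a moderate positive agreement between predicted and measured efficacy.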

In these comparative studies, it was found that the performance of MVRM is more stable and higher than that of other models. The reason is that previous siRNA representations can be unsuitable for representing siRNAs provided by different groups under different protocols. In our proposed method, the representation is enriched by incorporating background knowledge of siRNA design rules. Therefore, it can capture the distribution diversity of siRNA data.

As presented in the experimental comparative evaluation, MVRM achieved better results than most other methods in predicting siRNA knockdown efficacy.

IV. CONCLUSION

In this paper, we have proposed a stable and accurate method to predict the knockdown efficacy of siRNA sequences. In the model, to enrich the siRNA representation, views of siRNAs are constructed and learned by incorporating background knowledge of available design rules. By combining these views, an appropriate siRNA representation is developed to represent siRNAs belonging to different distributions that are provided by research groups under different protocols.

The experimental comparative evaluation on commonly used datasets, with standard evaluation procedures in different contexts, shows that the proposed method achieved promising results. There are some reasons for that. First, it is expensive to experimentally analyze the knockdown efficacy of siRNAs, and thus most available datasets are relatively small, leading to limited results. Second, MVRM has the advantage of incorporating domain knowledge (siRNA design rules) experimentally found from different datasets. Third, MVRM is generic and can be easily extended when new design rules are discovered. When our proposed model was tested on the three independent datasets generated by different empirical experiments, MVRM achieved the best results on the Reynolds and Vicker datasets. Additionally, the performance of the MVRM model is higher than that of the other models except the SVM and BiLTR models when tested on the Harborth dataset (Table III).

ACKNOWLEDGMENT

Bui Ngoc Thang and Le Sy Vinh are financially supported by the Vietnam National Foundation for Science and Technology (102.01-2013.04).

REFERENCES

[1] Amarzguioui M, Prydz H, An algorithm for selection of functional siRNA sequences, Biochem Biophys Res Commun, 316:1050–8, 2004.

[2] Bezdek JC, Ehrlich R, Full W, FCM: The fuzzy c-means clustering algorithm, Computers & Geosciences, 10(2):191–203, 1984.

[3] Bui TN, Ho TB, Tatsuo K, A semi-supervised tensor regression model for siRNA efficacy prediction, BMC Bioinformatics, 16:80, 2015.

[4] Chang PC, Pan WJ, Chen CW, Chen YT, Chu YW, A design engine of siRNA that integrates SVMs prediction and feature filters, Biocatalysis and Agricultural Biotechnology, 1:129–134, 2012.

[5] Chang X, Dacheng T, Chao X, A Survey on Multi-view Learning, CoRR abs/1304.5634, 2013.

[6] Elbashir SM, Lendeckel W, Tuschl T, RNA interference is mediated by 21- and 22-nucleotide RNAs, Genes Dev, 15:188–200, 2001.

[7] Elbashir SM, Harborth J, Lendeckel W, Yalcin A, Klaus W, Tuschl T, Duplexes of 21-nucleotide RNAs mediate RNA interference in cultured mammalian cells, Nature, 411:494–498, 2001.

[8] Gong W, Ren Y, Xu Q, Wang Y, Lin D, Zhou H, Li T, Integrated siRNA design based on surveying of features associated with high RNAi effectiveness, BMC Bioinformatics, 7:516, 2006.

[9] Harborth J, Elbashir SM, Vandenburgh K, Manninga H, Scaringe SA, Weber K, Tuschl T, Sequence, chemical, and structural variation of small interfering RNAs and short hairpin RNAs and the effect on mammalian gene silencing, Antisense Nucleic Acid Drug Dev, 13:83–105, 2003.

[10] Hannon GJ, Rossi JJ, Unlocking the potential of the human genome with RNA interference, Nature, 431:371–378, 2004.

[11] Hsieh AC, Bo R, Manola J, Vazquez F, Bare O, Khvorova A, Scaringe S, Sellers WR, A library of siRNA duplexes targeting the phosphoinositide 3-kinase pathway: determinants of gene silencing for use in cell-based screens, Nucleic Acids Res, 32:893–901, 2004.

[12] Huesken D, Lange J, Mickanin C, Weiler J, Asselbergs F, Warner J, Mellon B, Engel S, Rosenberg A, Cohen D, Labow M, Reinhardt M, Natt F, Hall J, Design of a Genome-Wide siRNA Library Using an Artificial Neural Network, Nature Biotechnology, 23:995–1001, 2005.

[13] Hutvagner G, McLachlan J, Balint E, Tuschl T, Zamore PD, A cellular function for the RNA interference enzyme Dicer in small temporal RNA maturation, Science, 293:834–838, 2001.

[14] Ichihara M, Murakumo Y, Masuda A, Matsuura T, Asai N, Jijiwa M, Ishida M, Shinmi J, Yatsuya H, Qiao S et al., Thermodynamic instability of siRNA duplex is a prerequisite for dependable prediction of siRNA activities, Nucleic Acids Res, 35:e123, 2007.

[15] Jagla B, Aulner N, Kelly PD, Song D, Volchuk A, Zatorski A, Shum D, Mayer T, De Angelis DA, Ouerfelli O, Rutishauser U, Rothman JE, Sequence characteristics of functional siRNAs, RNA, 11:864–872, 2005.

[16] Klingelhoefer JW, Moutsianas L, Holmes CC, Approximate Bayesian feature selection on a large meta-dataset offers novel insights on factors that effect siRNA potency, Bioinformatics, 25:1594–1601, 2009.

[17] Meister G, Tuschl T, Mechanisms of gene silencing by double-stranded RNA, Nature, 431:343–349, 2004.

[18] Mysara M, Elhefnawi M, Garibaldi JM, MysiRNA: improving siRNA efficacy prediction using a machine-learning model combining multi-tools and whole stacking energy, J Biomed Inform, 45:528–34, 2012.

[19] Qiu S, Lane T, A Framework for Multiple Kernel Support Vector Regression and Its Applications to siRNA Efficacy Prediction, IEEE/ACM Trans Comput Biology Bioinform, 6:190–199, 2009.


[20] Ren Y, Gong W, Xu Q, Zheng X, Lin D, et al., siRecords: an extensive database of mammalian siRNAs with efficacy ratings, Bioinformatics, 22:1027–1028, 2006.

[21] Reynolds A, Leake D, Boese Q, Scaringe S, Marshall WS, Khvorova A, Rational siRNA design for RNA interference, Nat Biotechnol, 22:326–330, 2004.

[22] Scherer LJ, Rossi JJ, Approaches for the sequence-specific knockdown of mRNA, Nat Biotechnol, 21:1457–1465, 2003.

[23] Sciabola S, Cao Q, Orozco M, Faustino I, Stanton RV, Improved nucleic acid descriptors for siRNA efficacy prediction, Nucleic Acids Res, 41:1383–1394, 2013.

[24] Shabalina SA, Spiridonov AN, Ogurtsov AY, Computational models with thermodynamic and composition features improve siRNA design, BMC Bioinformatics, 7:65, 2006.

[25] Sudarsana LR, Sarojamma V, Ramakrishna V, Future of RNAi in medicine: a review, World J Med Sci, 2:1–14, 2007.

[26] Takasaki S, Methods for Selecting Effective siRNA Target Sequences Using a Variety of Statistical and Analytical Techniques, Methods Mol Biol, 942:17–55, 2013.

[27] Tuschl T, Zamore PD, Lehmann R, Bartel DP, Sharp PA, Targeted mRNA degradation by double-stranded RNA in vitro, Genes Dev, 13:3191–3197, 1999.

[28] Vert JP, Foveau N, Lajaunie C, Vandenbrouck Y, An accurate and interpretable model for siRNA efficacy prediction, BMC Bioinformatics, 7:520, 2006.

[29] Vickers TA, Koo S, Bennett CF, Crooke ST, Dean NM, Baker BF, Efficient reduction of target RNAs by small interfering RNA and RNase H-dependent antisense agents. A comparative analysis, J Biol Chem, 278:7108–7118, 2003.

[30] Ui-Tei K, Naito Y, Takahashi F, Haraguchi T, Ohki-Hamazaki H, Juni A, Ueda R, Saigo K, Guidelines for the selection of highly effective siRNA sequences for mammalian and chick RNA interference, Nucleic Acids Res, 32:936–948, 2004.

[31] Ui-Tei K, Optimal choice of functional and off-target effect-reduced siRNAs for RNAi therapeutics, Front Genet, 4:107, 2013.

[32] Angart P, Vocelle D, Chan C, Walton SP, Design of siRNA therapeutics from the molecular scale, Pharmaceuticals, 6:440–468, 2013.

[33] Gavrilov K, Saltzman WM, Therapeutic siRNA: principles, challenges, and strategies, Yale J Biol Med, 85:187–200, 2012.

[34] Mutisya D, Selvam C, Lunstad BD, Pallan PS, Haas A, Leake D, Egli M, Rozners E, Amides are excellent mimics of phosphate internucleoside linkages and are well tolerated in short interfering RNAs, Nucleic Acids Res, 42(10):6542–51, 2014.

[35] Deng Y, Wang CC, Choy KW, Du Q, Chen J, Wang Q, Li L, Chung TK, Tang T, Therapeutic potentials of gene silencing by RNA interference: principles, challenges, and new strategies, Gene, 538(2):217–27, 2014.

[36] Matveeva O, Nechipurenko Y, Rossi L, Moore B, Ogurtsov AY, Atkins JF, et al., Comparison of approaches for rational siRNA design leading to a new efficient and transparent method, Access, 35:1–10, 2007.

[37] Schwarz DS, Hutvagner G, Du T, Xu Z, Aronin N, Zamore PD, Asymmetry in the assembly of the RNAi enzyme complex, Cell, 115(2):199–208, 2003.

[38] Khvorova A, Reynolds A, Jayasena SD, Functional siRNAs and miRNAs exhibit strand bias, Cell, 115(2):209–216, 2003.

[39] Schramm G, Ramey R, siRNA design including secondary structure target site prediction, Nature Methods, 2(8), doi: 10.1038/nmeth780, 2005 (Application Notes).

[40] Hannon GJ, Rossi JJ, Unlocking the potential of the human genome with RNA interference, Nature, 431:371–378, 2004.

[41] Qi L, Han Z, Ruixin Z, Ying X, Zhiwei C, Reconsideration of in silico siRNA design from a perspective of heterogeneous data integration: problems and solutions, Brief Bioinform, 15:292–305, 2014.

[42] Saetrom P, Predicting the efficacy of short oligonucleotides in antisense and RNAi experiments with boosted genetic programming, Bioinformatics, 20(17):3055–3063, 2004.

[43] Takasaki S, Kotani S, Konagaya A, An effective method for selecting siRNA target sequences in mammalian cells, Cell Cycle, 3(6):790–5, 2004.

[44] Khvorova A, Reynolds A, Jayasena SD, Functional siRNAs and miRNAs exhibit strand bias, Cell, 115:209–216, 2003.

[45] Chalk A, Wahlestedt C, Sonnhammer E, Improved and automated prediction of effective siRNA, Biochem Biophys Res Commun, 319(1):264–274, 2004.

[46] Luo K, Chang D, The gene-silencing efficiency of siRNA is strongly dependent on the local structure of mRNA at the targeted region, Biochem Biophys Res Commun, 318(1):303–310, 2004.

[47] Katoh T, Suzuki T, Specific residues at every third position of siRNA shape its efficient RNAi activity, Nucleic Acids Res, 35:e27, 2007.

[48] SantaLucia J Jr, A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics, Proceedings of the National Academy of Sciences USA, 95:1460–1465, 1998.

[49] Zou H, Hastie T, Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society, Series B, 67(2):301–320, 2005.

[50] Kopka H, Daly PW, A Guide to LaTeX, 3rd ed., Harlow, England: Addison-Wesley, 1999.
