MVRM: A hybrid approach to predict siRNA efficacy
Bui Ngoc Thang
University of Engineering and Technology,
Vietnam National University, Hanoi
144 Xuanthuy, Caugiay, Hanoi, Vietnam
Email: thangbn@vnu.edu.vn
Le Sy Vinh
University of Engineering and Technology,
Vietnam National University, Hanoi
144 Xuanthuy, Caugiay, Hanoi, Vietnam
Email: vinhls@vnu.edu.vn
Ho Tu Bao
School of Knowledge Science,
Japan Advanced Institute of Science and Technology
Email: bao@jaist.ac.jp
Abstract: The discovery of RNA interference (RNAi) has led to the design of novel drugs for different diseases. Selecting short interfering RNAs (siRNAs) that can knock down target genes efficiently is one of the key tasks in studying RNAi. A number of predictive models have been proposed to predict the knockdown efficacy of siRNAs; however, their performance is still far from expectations. This work aims to develop a predictive model that improves siRNA knockdown efficacy prediction. The key idea is to combine the rule–based and the model–based approaches. To this end, views of siRNAs that integrate available siRNA design rules are first learned using an adaptive Fuzzy C Means (FCM) algorithm. The learned views and other properties of siRNAs are combined into final representations of siRNAs. The elastic net regression method is employed to learn a predictive model from these final representations. Experiments on benchmark datasets showed that the proposed method achieved stable and accurate results in comparison with other methods.
I INTRODUCTION
RNA interference (RNAi) is a cellular process in which long double-stranded RNA (dsRNA) duplexes or hairpin precursors are cleaved into short interfering RNAs (siRNAs) by the ribonuclease III enzyme Dicer. The siRNAs bind to the RNA-induced silencing complex (RISC) and are unwound into sense and antisense strands; the antisense strands then bind to their complementary target mRNAs and induce their degradation. In 1998, Fire and Mello discovered the important role of dsRNAs when they studied RNAi in the nematode worm Caenorhabditis elegans (they were awarded the Nobel Prize in Physiology or Medicine in 2006 for their contributions to research on RNAi). The discovery of RNAi has had an immense impact on biomedical research and has made RNAi a valuable tool for designing novel medical applications [27], [7], [13], [25], [17], [10]. In RNAi research, synthesizing highly effective siRNAs is a crucial task in designing novel drugs for the treatment of diseases such as influenza A virus, HIV, hepatitis B virus, RSV and cancer. As a consequence, siRNA–based silencing is considered one of the most promising techniques for future therapy, and predicting the knockdown efficacy of siRNAs is an essential problem for effective siRNA selection [39], [40], [28], [31], [32], [33], [34], [35].
A number of algorithms have been proposed to design and predict effective siRNAs. They can be categorized into two approaches: the rule–based approach and the model–based approach [14], [18], [23].
The rule–based approach proposes different rules to generate effective siRNAs. These rules were empirically designed and examined on small datasets. The first rational siRNA design rule was reported by Elbashir et al. [6]. They suggested that siRNAs of size 19–21 nt with 2 nt overhangs at the 3' ends can efficiently degrade target gene mRNAs. Scherer et al. [22] found that thermodynamic properties are important characteristics for designing effective siRNAs that inhibit specific target mRNAs. Soon after that, various rational design rules to generate effective siRNAs were proposed [21], [30], [1], [15], [37], [38]. For example, Ui-Tei and colleagues [30] examined 72 siRNAs targeting six genes and discovered four criteria for effective siRNA design: (i) A or U at position 19, (ii) G or C at position 1, (iii) at least five U or A residues in positions 13–19, (iv) no GC stretch of more than 9 nt. Amarzguioui and co–workers [1] analyzed 46 siRNAs targeting genes and reported the following rule of six criteria for effective siRNA design: (i) ΔT3 = T3 − T5, the difference between the number of A/U residues in the three terminal positions at the 3' end and at the 5' end (relative to the sense strand of the siRNA), where ΔT3 > 1 is positively correlated; (ii) a G or C residue at position 1, positively correlated; (iii) a U residue at position 1, negatively correlated; (iv) an A residue at position 6, positively correlated; (v) A or U at position 19, positively correlated; (vi) G at position 19, negatively correlated.
However, the rule–based approach is not satisfactory. About 65% of siRNAs generated by these rules failed when experimentally tested: they did not reach 90% inhibition, and nearly 20% of them were inactive [20]. The main reason is that siRNA design rules were derived empirically from small datasets of siRNAs synthesized for specific genes. Therefore, individual rules are in general poor at designing highly effective siRNAs.
The model–based approach comprises predictive models learned from larger datasets by different machine learning techniques. These models are more accurate and reliable than the rule–based approach [24]. For example, Huesken and co–workers [12] proposed a new algorithm, Biopredsi, by applying artificial neural networks to a dataset of 2431 scored siRNAs. This dataset has been widely used as a benchmark to train and test other predictive models such as ThermoComposition21 [24], DSIR [28], i–Score [14] and Scales [36]. These predictive models are currently regarded as the best predictors [18], [36]. More recently, Sciabola et al. [23] employed three–dimensional structural information of siRNAs to increase the performance of their model. A stable predictive model called BiLTR [3] was also developed to predict the knockdown efficacy of siRNAs.
2015 Seventh International Conference on Knowledge and Systems Engineering
Although model–based methods are better than rule-based methods, they suffer from some drawbacks. Their performance is still low and unstable. The predictive ability of these models decreases considerably and varies when they are tested on independent datasets, as shown by the performance of 18 current models tested on three independent datasets [23]. Our analyses reveal two main reasons for this. (1) siRNA datasets were provided by different groups under different protocols in different scenarios [16], [41], so the distributions of these datasets are very different and siRNA data are heterogeneous. (2) The performance of machine learning methods also depends heavily on the choice of data representation (or features) to which they are applied. In previous models, siRNAs were encoded by binary, spectral, tetrahedron, and sequence representations. However, because of the diversity of siRNA distributions and the unsuitable measures based on these representations, they can be inappropriate for building a good model to predict siRNA efficacy.
In this paper, we develop a hybrid approach, named MVRM, to predict siRNA knockdown efficacy. The method combines design rules and machine learning methods to build a predictive model. To this end, we focus on the representation of siRNAs. Available siRNA design rules are considered as prior background knowledge for generating views to represent siRNAs. Each view captures the characteristics of one siRNA design rule. These views are then learned by exploiting the fuzzy C means algorithm. A new representation of siRNAs is composed of the learned views and other properties of siRNAs such as melting temperature, molecular weight and thermodynamic values. After transforming siRNAs to the new representation, a predictive model is learned by applying a regularized method, Elastic Net, to predict the knockdown efficacy of siRNAs.
Our method is experimentally compared with other methods on benchmark datasets. Experiments show promising results: the performance of MVRM is comparable to or better than that of other methods.
II METHODS
Our model, MVRM, is a hybrid of the rule–based and the model–based approaches, so it consists of two main phases: learning siRNA views from design rules to build new representations of siRNAs, and building a predictive model from these new representations to predict the knockdown efficacy of siRNAs.
A Learning siRNA views
Let S = {s_1, s_2, ..., s_m} be a set of m siRNA sequences of the same length n. The knockdown efficacy of sequence s_i ∈ S is e_i (i = 1, ..., m). A set of k design rules R = {r_1, r_2, ..., r_k} is collected from previous rule-based studies. Learning siRNA views consists of four steps: encoding siRNAs by content, encoding rules to views, learning siRNA views, and encoding siRNAs by learned views.
TABLE I. The five well-studied characteristics of siRNA sequences

Property                                 Condition          Encoding column
GC content                               From 0.3 to 0.6    (1,0,-1,-1) at column (n + 1)
                                         Otherwise          (0,1,-1,-1) at column (n + 1)
T                                        >= 1               (1,0,-1,-1) at column (n + 2)
                                         Otherwise          (0,1,-1,-1) at column (n + 2)
GC stretch                               >= 9               (1,0,-1,-1) at column (n + 3)
                                         Otherwise          (0,1,-1,-1) at column (n + 3)
A/Us at five positions of the 5' end     >= 3               (1,0,-1,-1) at column (n + 4)
                                         Otherwise          (0,1,-1,-1) at column (n + 4)
A/Us at seven positions of the 5' end    >= 5               (1,0,-1,-1) at column (n + 5)
                                         Otherwise          (0,1,-1,-1) at column (n + 5)
Encoding siRNAs by content: Each siRNA is a sequence of n nucleotides such as "GAAAGGAAUUGUAUAAAUC". There are five well-studied characteristics of an siRNA [26]: (1) GC content, (2) the difference in the number of A/U residues in the 3 nucleotides at the two ends (T), (3) GC stretch, and (4), (5) the number of A/U residues at five and at seven positions of the 5' end of the antisense strand. This step encodes each siRNA sequence s_i (i = 1, ..., m) by a binary matrix M_i of size 4 × (n + 5), in which the 4 rows correspond to the 4 nucleotide types and the (n + 5) columns correspond to the n nucleotides and the 5 siRNA characteristics. The first n columns represent the n nucleotides, i.e., column c (c = 1, ..., n) is a binary vector of size 4 × 1 representing the nucleotide at position c of the siRNA sequence. Specifically, the four nucleotides A, C, G, and U are encoded by the vectors (1, 0, 0, 0)^T, (0, 1, 0, 0)^T, (0, 0, 1, 0)^T and (0, 0, 0, 1)^T, respectively. The last five columns of the matrix represent the five characteristics of the siRNA. They are computed and encoded as vectors as described in Table I. The encoding matrix M of the 19-nucleotide siRNA sequence "GAAAGGAAUUGUAUAAAUC" is described in Table II.
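The content encoding can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `encode_by_content` and `gc_content_ok` are hypothetical, NumPy is assumed to be available, and only the GC content condition is computed here. The other four conditions of Table I are passed in as booleans because their exact computation is not spelled out in the text.

```python
import numpy as np

NUCLEOTIDES = "ACGU"  # row order used by the encoding vectors in the text


def gc_content_ok(seq):
    """Table I condition for GC content: fraction of G/C between 0.3 and 0.6."""
    gc = sum(nt in "GC" for nt in seq) / len(seq)
    return 0.3 <= gc <= 0.6


def encode_by_content(seq, flags):
    """Build the 4 x (n + 5) matrix M for one siRNA.

    flags: five booleans, one per characteristic of Table I, telling whether
    the condition holds (hypothetical interface, not from the paper).
    """
    n = len(seq)
    M = np.zeros((4, n + 5))
    for c, nt in enumerate(seq):
        M[NUCLEOTIDES.index(nt), c] = 1.0          # one-hot nucleotide column
    for j, satisfied in enumerate(flags):
        # Table I: (1,0,-1,-1) if the condition holds, (0,1,-1,-1) otherwise
        M[:, n + j] = (1, 0, -1, -1) if satisfied else (0, 1, -1, -1)
    return M


seq = "GAAAGGAAUUGUAUAAAUC"
# The last four flags are placeholder values chosen only for illustration.
M = encode_by_content(seq, [gc_content_ok(seq), True, False, True, True])
print(M.shape)
```

For the example sequence the matrix has 19 nucleotide columns followed by 5 characteristic columns, matching the 4 × (n + 5) layout described above.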
Encoding design rules to views: This step encodes each design rule r_i (i = 1, ..., k) by a matrix T_i (view T_i) of size 4 × (n + 5), in which the 4 rows correspond to the 4 nucleotide types and the (n + 5) columns correspond to the n nucleotides and the 5 siRNA characteristics. Column j (j = 1, ..., n) of the matrix holds the knockdown efficacy of nucleotides A, C, G and U at position j. The last five columns describe the knockdown efficacy of the five siRNA characteristics.
The knockdown efficacy values of view T_i have to satisfy the constraints of the siRNA design rule. The design rule r_i propositionally describes the occurrence or absence of nucleotides at different positions of effective siRNAs, as well as the other siRNA characteristics mentioned above. Thus, if design rule r_i states the occurrence (or absence) of some nucleotides at position j, then their corresponding values in view T_i should be greater (or smaller) than the other values in column j. Similarly, if design rule r_i involves the j-th characteristic, the corresponding value in column (n + j) of matrix T_i should be greater than the other values in that column.
TABLE II. The encoding matrix M of siRNA sequence GAAAGGAAUUGUAUAAAUC. The first 19 columns encode the 19 nucleotides of the siRNA sequence. The last 5 columns encode the 5 characteristics of the siRNA sequence.
Position 1 2 3 4 5 ... 18 19 20 21 22 23 24

For example, consider a rule r and its encoding matrix T. The design rule states that at position 19, nucleotide A is effective and nucleotide C is ineffective. This means that the knockdown efficacy of nucleotide A is larger than that of the other nucleotides, and the knockdown efficacy of nucleotide C is smaller than that of the other nucleotides. T[1,19], T[2,19], T[3,19], and T[4,19] are the knockdown efficacies of A, C, G, and U, respectively. The rule at position 19 can be expressed as specific constraints on matrix T as follows:
• T[2, 19] − T[1, 19] < 0, i.e., A is more effective than C
• T[3, 19] − T[1, 19] < 0, i.e., A is more effective than G
• T[4, 19] − T[1, 19] < 0, i.e., A is more effective than U
• T[2, 19] − T[3, 19] < 0, i.e., C is less effective than G
• T[2, 19] − T[4, 19] < 0, i.e., C is less effective than U
Let G_i (i = 1, ..., k) be the set of specific constraints of rule r_i on matrix T_i, where each constraint in G_i has the form (T[p, j] − T[q, j] < 0) with rows p, q = 1, ..., 4 and column j = 1, ..., n + 5.
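One possible way to hold such constraint sets in code is as triples (p, q, j), each standing for T[p, j] − T[q, j] < 0. The sketch below is an assumption-laden illustration rather than the authors' implementation: the helper names `position_constraints` and `satisfies` are hypothetical, indices are 0-based (so position 19 becomes column 18), and NumPy is assumed.

```python
import numpy as np

ROW = {"A": 0, "C": 1, "G": 2, "U": 3}  # 0-based rows for A, C, G, U


def position_constraints(column, effective=(), ineffective=()):
    """Return triples (p, q, j), each meaning T[p, j] - T[q, j] < 0."""
    eff = [ROW[x] for x in effective]
    ineff = [ROW[x] for x in ineffective]
    constraints = []
    for e in eff:
        # every other nucleotide must score below an effective one
        constraints += [(p, e, column) for p in range(4) if p not in eff]
    for w in ineff:
        # an ineffective nucleotide must score below the remaining ones
        constraints += [(w, q, column) for q in range(4)
                        if q not in eff and q not in ineff]
    return constraints


def satisfies(T, constraints):
    """Check whether view T meets every constraint in the set."""
    return all(T[p, j] - T[q, j] < 0 for p, q, j in constraints)


# Rule from the text: at position 19, A is effective and C is ineffective.
G = position_constraints(18, effective="A", ineffective="C")
T = np.zeros((4, 24))
T[:, 18] = [0.9, 0.1, 0.5, 0.5]
print(len(G), satisfies(T, G))
```

The example rule yields five triples, one per bullet in the list above.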
Learning views: The siRNA set {s_1, s_2, ..., s_m} serves as the training set to learn k views (i.e., to optimize k matrices T_1, ..., T_k). Learning views can be considered as a clustering problem [5] in which the k matrices are the centers of k clusters. Each encoding matrix M_i of siRNA s_i is assigned to view T_j with a membership value u_ij (i = 1, ..., m; j = 1, ..., k). This means that siRNA sequences can be generated by different views with different confidences.
We employ the FCM algorithm [2] with k clusters to optimize the k views (matrices) and the membership values by minimizing the following objective function
$$R = \sum_{i=1}^{m} \sum_{j=1}^{k} u_{ij}^{2} \, \| M_i - T_j \|_{Fro}^{2} \qquad (1)$$

subject to:
1) the k constraint sets G_1, ..., G_k, where G_i is the set of specific constraints of rule r_i on matrix T_i;
2) $\sum_{j=1}^{k} u_{ij} = 1$, $i = 1, \ldots, m$,

where $\| \cdot \|_{Fro}$ is the Frobenius norm of a matrix. Membership values and matrices can be solved by using an iterative method: each column of a matrix is derived while keeping the other ones fixed. The final solution is computed as follows:

$$u_{ij} = \frac{1}{\sum_{z=1}^{k} \dfrac{\| M_i - T_j \|_{Fro}}{\| M_i - T_z \|_{Fro}}} \qquad (2)$$

$$T_j[\cdot, c] = \frac{\sum_{i=1}^{m} u_{ij}^{2} \, M_i[\cdot, c]}{\sum_{i=1}^{m} u_{ij}^{2}} \qquad (3)$$

where $T_j[\cdot, c]$ is the vector corresponding to the c-th column of the matrix T_j.
Algorithm 1 describes the two steps: computing the membership values of the encoding matrices and updating the matrices corresponding to the views. These two steps are repeated until the membership values and views meet the convergence criteria.
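The two alternating steps can be sketched with NumPy as follows. This is a minimal sketch under assumptions, not the authors' code: the function names, the `power` parameter and the `accept` callback are hypothetical. With `power=1.0` the membership update follows equation (2) as written; textbook FCM with squared memberships would correspond to `power=2.0`. The `accept` callback stands in for the constraint check on G_j.

```python
import numpy as np


def update_memberships(M, T, power=1.0, eps=1e-12):
    """Membership update in the spirit of equation (2).

    M: array (m, 4, n + 5) of encoding matrices; T: array (k, 4, n + 5) of views.
    """
    # d[i, j] = Frobenius norm of M_i - T_j (eps avoids division by zero)
    d = np.linalg.norm(M[:, None] - T[None], axis=(2, 3)) + eps
    ratio = (d[:, :, None] / d[:, None, :]) ** power   # entry [i, j, z] = d_ij / d_iz
    return 1.0 / ratio.sum(axis=2)                     # shape (m, k)


def update_views(M, U, T, accept):
    """Column-wise view update of equation (3), kept only if accepted."""
    W = U ** 2
    candidate = np.einsum("ij,irc->jrc", W, M) / W.sum(axis=0)[:, None, None]
    T = T.copy()
    for j in range(T.shape[0]):
        for c in range(T.shape[2]):
            trial = T[j].copy()
            trial[:, c] = candidate[j, :, c]
            if accept(j, trial):        # e.g. a check of the constraints G_j
                T[j] = trial
    return T


rng = np.random.default_rng(0)
M = rng.random((6, 4, 24))
T = rng.random((2, 4, 24))
U = update_memberships(M, T)
T_new = update_views(M, U, T, accept=lambda j, trial: True)
print(U.sum(axis=1))
```

Each row of `U` sums to one, so every siRNA distributes its membership over the k views.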
Encoding siRNAs by views: To obtain the final representation of siRNAs, the learned views are combined and other properties of siRNAs are added. In particular, nucleotides A, C, G, and U of an siRNA at position c (c = 1, ..., n) are represented by the vectors (T_1[1, c], ..., T_k[1, c]), (T_1[2, c], ..., T_k[2, c]), (T_1[3, c], ..., T_k[3, c]), and (T_1[4, c], ..., T_k[4, c]), respectively. If the GC content of the siRNA satisfies its condition (see Table I), it is represented by the vector (T_1[1, n + 1], ..., T_k[1, n + 1]). Otherwise, it is represented by (T_1[2, n + 1], ..., T_k[2, n + 1]). The four other characteristics of siRNAs are handled in the same way. In short, each siRNA sequence is encoded by a vector of size k × (n + 5). Moreover, five other properties of siRNAs (melting temperature, molecular weight, and three thermodynamic properties: enthalpy, entropy, and free energy) are added to the final representation. They are calculated by the nearest neighbor method [48]. As a result, each siRNA is encoded by a vector of size k × (n + 5) + 5.
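A compact way to express this lookup is sketched below. It relies on an observation that follows from the content encoding: in every column of M the single entry equal to 1 sits in the row that must be read from each view (the nucleotide's row for the first n columns, and row 1 or row 2 for a satisfied or unsatisfied characteristic). The function name `encode_by_views` is hypothetical, NumPy is assumed, and the five physicochemical properties are taken as given numbers rather than computed.

```python
import numpy as np


def encode_by_views(M, views, properties):
    """Final feature vector of size k * (n + 5) + 5 for one siRNA.

    M: 4 x (n + 5) content encoding; views: list of k learned matrices;
    properties: the five extra values (melting temperature, molecular weight,
    enthalpy, entropy, free energy), assumed to be precomputed.
    """
    rows = M.argmax(axis=0)                 # row holding the 1 in each column
    cols = np.arange(M.shape[1])
    per_column = np.stack([T[rows, cols] for T in views], axis=1)  # (n + 5, k)
    return np.concatenate([per_column.ravel(),
                           np.asarray(properties, dtype=float)])


n, k = 19, 5
M = np.zeros((4, n + 5))
M[1, :n] = 1                                # toy siRNA made of C only
M[:, n:] = np.array([[1], [0], [-1], [-1]]) # all five conditions satisfied
views = [np.arange(4.0)[:, None] * np.ones((4, n + 5)) + 10 * j for j in range(k)]
x = encode_by_views(M, views, np.zeros(5))
print(x.shape)
```

With k = 5 views and n = 19 the vector has 5 × 24 + 5 = 125 entries.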
B Learning a predictive model
This step builds a predictive model using the new representation of siRNAs. The elastic net method [49] is applied to build the model for predicting the knockdown efficacy of siRNAs. This method not only builds the model but also selects the important features that affect the target label. In addition, thanks to the lasso regularization term of the elastic net method, significant variables, i.e., important characteristics that influence the knockdown efficacy of siRNAs, are detected.
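To make the role of the two penalty terms concrete, here is a small coordinate-descent sketch of elastic net regression. It is an illustration only and not the implementation used in the paper: the parameter names `alpha` and `l1_ratio` are assumptions that mirror the parametrisation used by scikit-learn's `ElasticNet`, there is no intercept or feature scaling, and NumPy is assumed.

```python
import numpy as np


def soft_threshold(x, t):
    """Shrink x towards zero by t (the lasso part of the update)."""
    return np.sign(x) * max(abs(x) - t, 0.0)


def elastic_net(X, y, alpha, l1_ratio=0.5, n_iter=200):
    """Minimise (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1
    + 0.5*alpha*(1 - l1_ratio)*||w||_2^2 by cyclic coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    z = (X ** 2).sum(axis=0) / n
    residual = y - X @ w
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue                          # constant-zero feature
            residual += X[:, j] * w[j]            # remove feature j's contribution
            rho = X[:, j] @ residual / n
            w[j] = soft_threshold(rho, alpha * l1_ratio) / (
                z[j] + alpha * (1 - l1_ratio))
            residual -= X[:, j] * w[j]            # add the updated contribution back
    return w


X = np.eye(4)
y = np.array([2.0, 0.0, 0.0, 0.0])
w_plain = elastic_net(X, y, alpha=0.0)    # no penalty: ordinary least squares
w_shrunk = elastic_net(X, y, alpha=0.2)   # coefficient shrunk towards zero
w_zero = elastic_net(X, y, alpha=1.0)     # penalty large enough to zero it out
print(w_plain, w_shrunk, w_zero)
```

The last call shows the feature-selection effect mentioned above: once the L1 threshold exceeds the correlation of a feature with the residual, its coefficient becomes exactly zero.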
III EXPERIMENTAL EVALUATION
This section presents an experimental evaluation comparing the proposed method MVRM (multiple view based regression model) with recent methods for siRNA knockdown efficacy prediction on four benchmark datasets:
• The Huesken dataset of 2431 siRNA sequences targeting 34 human and rodent mRNAs, commonly divided into the training set HU train of 2182 siRNAs and the testing set HU test of 249 siRNAs [12].
• The Reynolds dataset of 240 siRNAs [21].
• The Vicker dataset of 76 siRNA sequences targeting two genes [29].
• The Harborth dataset of 44 siRNA sequences targeting one gene [9].
We employed five siRNA design rules (k = 5) to learn five views of siRNAs. Specifically, the five design rules are the Reynolds rule [21], the Ui-Tei rule [30], the Amarzguioui rule [1], the Jagla rule [15] and the Hsieh rule [11]. The HU train set was used to learn these views and the MVRM model. The other datasets were used for comparative evaluation.
Algorithm 1 Multi-view Learning
Input: A dataset S = {s_1, s_2, ..., s_m}, where s_i (i = 1, ..., m) are siRNA sequences of length n; a set R of k design rules; t_Max, the maximum number of iterations.
Output: k matrices (views) T_1, T_2, ..., T_k.
main
  Encode siRNAs by content
  for r_i in R do
    – Form the set of constraints G_i based on r_i
    – Initialize the view T_i satisfying G_i
  end for
  t = 0 {Iterative step}
  repeat
    t ← t + 1
    {Compute membership values as follows}
    for i = 1 to m do
      for j = 1 to k do
        Compute u_ij^(t) using equation (2)
      end for
    end for
    {Update views as follows}
    for j = 1 to k do
      for c = 1 to n + 5 do
        Compute T_j[., c]^(t) using equation (3)
        if (T_j[., c]^(t) satisfies the constraints G_j) then
          T_j[., c] ← T_j[., c]^(t)
        end if
      end for
    end for
  until the relative changes ||T_p^(t) − T_p^(t−1)||_Fro / ||T_p^(t−1)||_Fro and ||u_qp^(t) − u_qp^(t−1)||_2 / ||u_qp^(t−1)||_2 meet the convergence threshold for all p = 1, ..., k and q = 1, ..., m, or (t > t_Max)
end main
Fig. 1. Upper and lower curves of mean squared error as a function of λ values.

Fig. 2. Coefficients of the MVRM model show the importance of 125 features.

TABLE III. The R values of 18 models and MVRM on three independent data sets

Algorithm          R_Reynolds (244si/7g)   R_Vicker (76si/2g)   R_Harborth (44si/1g)
GPboot [42]        0.55                    0.35                 0.43
Takasaki [43]      0.03                    0.25                 0.01
Reynolds 1 [21]    0.35                    0.47                 0.23
Reynolds 2 [21]    0.37                    0.44                 0.23
Schwarz [37]       0.29                    0.35                 0.01
Khvorova [44]      0.15                    0.19                 0.11
Stockholm 1 [45]   0.05                    0.18                 0.28
Stockholm 2 [45]   0.00                    0.15                 0.41
i-score [14]       0.54                    0.58                 0.43
BIOPREDsi [12]     0.53                    0.57                 0.51
MVRM model         0.6                     0.614                0.52

The tuning parameter of the objective function of the model was estimated by 10–fold cross validation. Figure 1 shows the curves of the upper and lower bounds of the mean squared error between predicted efficacy and experimental efficacy obtained through cross validation. We used five design rules and five other properties of siRNAs in learning our model, so the final representation has 5 × 24 + 5 = 125 features. After learning the model, 78 important features that influence the knockdown efficacy of siRNAs were chosen. Figure 2 describes the influence of the 125 features. During the learning process, the coefficients of less important features are driven to zero. Based on the coefficients of the MVRM model, important features can be easily selected in order to design effective siRNAs.
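The selection of the tuning parameter by 10–fold cross validation can be sketched generically. The code below is a hedged illustration: `kfold_mse`, `select_lambda` and the `fit_predict` callback are hypothetical names, the fold assignment is a simple random split, and the toy model at the end only exercises the functions (it is not an elastic net). The mean and standard deviation returned per value are the kind of quantities from which upper and lower error curves can be drawn.

```python
import numpy as np


def kfold_mse(fit_predict, X, y, lam, folds=10, seed=0):
    """Mean and standard deviation of held-out MSE for one value of lam.

    fit_predict(X_train, y_train, X_test, lam) must return predictions
    for X_test (hypothetical interface).
    """
    idx = np.random.default_rng(seed).permutation(len(y))
    errors = []
    for test_idx in np.array_split(idx, folds):
        train_idx = np.setdiff1d(idx, test_idx)
        pred = fit_predict(X[train_idx], y[train_idx], X[test_idx], lam)
        errors.append(np.mean((pred - y[test_idx]) ** 2))
    return float(np.mean(errors)), float(np.std(errors))


def select_lambda(fit_predict, X, y, lambdas, folds=10):
    """Pick the candidate value with the lowest cross-validated MSE."""
    scores = {lam: kfold_mse(fit_predict, X, y, lam, folds)[0] for lam in lambdas}
    return min(scores, key=scores.get)


# Toy stand-in model: predicts a shrunken training mean.
def shrunken_mean(X_train, y_train, X_test, lam):
    return np.full(len(X_test), y_train.mean() / (1.0 + lam))


X = np.zeros((20, 1))
y = np.ones(20)
best = select_lambda(shrunken_mean, X, y, lambdas=[0.0, 1.0])
print(best, kfold_mse(shrunken_mean, X, y, 1.0))
```

In practice the callback would wrap the regression model being tuned, and the candidate values would cover a grid of λ.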
The MVRM model was compared to most state–of–the–art methods. For a fair comparison, we carried out experiments on MVRM under the same conditions as reported for the other methods. Concretely, the comparative evaluation is as follows.
1) Comparison of MVRM with BIOPREDsi [12], ThermoComposition21 [24], DSIR [28], SVM [23] and BiLTR [3] when trained on the HU train set and tested on the HU test set. The Pearson correlation coefficients of those five models are 0.66, 0.66, 0.67, 0.80 and 0.67, respectively. The performance of MVRM estimated on the HU test set is 0.66. The performance of the MVRM model is similar to that of the other models but lower than that of the SVM model. The reason is that the SVM model uses positional features and 3D information. This 3D feature captures the flexibility and strain of siRNAs, which can be important characteristics for siRNAs of the HU test set extracted from human NCI–H1299, Hela genes and rodent genes [12].
2) Comparison of MVRM with 19 models including BIOPREDsi, DSIR, SVM, and BiLTR when all models were trained on the HU train set and tested on the three independent datasets of Reynolds, Vicker and Harborth. The Pearson correlation coefficients of the MVRM model are 0.6, 0.614, and 0.52 when tested on the Reynolds, Vicker and Harborth datasets, respectively. Table III shows that MVRM achieved considerably higher results than the first 17 models. It was better than the SVM and BiLTR models when tested on the first two datasets. MVRM was not as good as BiLTR on the Harborth dataset. However, one limitation of the BiLTR model is the computational cost of training its transformation matrices and parameters. It took about 5 days to train BiLTR but only about five minutes to train the MVRM model. Besides that, unlike most other models, the MVRM model produces stable results across the independent siRNA datasets.
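The R values reported in this comparison are Pearson correlation coefficients between predicted and experimentally measured efficacies. A minimal standard-library sketch of that metric is given below; the function name `pearson_r` and the toy numbers are illustrative assumptions, not data from the paper.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


# Toy values only: perfectly correlated and perfectly anti-correlated cases.
print(pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(pearson_r([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

A value near 1 means that the predicted ranking of siRNAs closely follows the measured knockdown efficacy.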
In these comparative studies, the performance of MVRM was found to be more stable and higher than that of most other models. The reason is that previous siRNA representations can be unsuitable for representing siRNAs provided by different groups under different protocols. In our proposed method, the representation is enriched by incorporating background knowledge of siRNA design rules. Therefore, it can capture the distribution diversity of siRNA data.
As presented in the experimental comparative evaluation, MVRM achieved better results than most other methods in predicting siRNA knockdown efficacy.
IV CONCLUSION
In this paper, we have proposed a stable and accurate method to predict the knockdown efficacy of siRNA sequences. In the model, to enrich the siRNA representation, views of siRNAs are constructed and learned by incorporating background knowledge of available design rules. By combining these views, an appropriate siRNA representation is developed to represent siRNAs that belong to different distributions and are provided by research groups under different protocols.
The experimental comparative evaluation on commonly used datasets, with standard evaluation procedures in different contexts, shows that the proposed method achieved promising results. There are some reasons for that. First, it is expensive to experimentally analyze the knockdown efficacy of siRNAs, and thus most available datasets are relatively small, leading to limited results. Second, MVRM has the advantage of incorporating domain knowledge (siRNA design rules) found experimentally from different datasets. Third, MVRM is generic and can be easily extended when new design rules are discovered. When our proposed model was tested on the three independent datasets generated by different empirical experiments, MVRM achieved the best results on the Reynolds and Vicker datasets. Additionally, the performance of the MVRM model is higher than that of the other models except the SVM and BiLTR models when tested on the Harborth dataset (Table III).
ACKNOWLEDGMENT
Bui Ngoc Thang and Le Sy Vinh are financially supported by Vietnam National Foundation for Science and Technology (102.01-2013.04).
REFERENCES
[1] Amarzguioui M, Prydz H, An algorithm for selection of functional siRNA sequences, Biochem Biophys Res Commun, 316:1050–8, 2004.
[2] Bezdek JC, Ehrlich R, Full W, FCM: The fuzzy c-means clustering
algorithm, Computers & Geosciences, 10 (2): 191–203, 1984.
[3] Bui TN, Ho TB, Tatsuo K, A semi-supervised tensor regression model
for siRNA efficacy prediction, BMC Bioinformatics, 16: 80, 2015.
[4] Chang PC, Pan WJ, Chen CW, Chen YT, Chu YW, A design engine of
siRNA that integrates SVMs prediction and feature filters, Biocatalysis
and Agricultural Biotechnology, 1:129–134, 2012.
[5] Chang X., Dacheng T., Chao X., A Survey on Multi-view Learning, CoRR
abs/1304.5634, 2013.
[6] Elbashir SM, Lendeckel W, Tuschl T, RNA interference is mediated by
21– and 22–nucleotide RNAs, Genes Dev, 2001, 15:188–200.
[7] Elbashir SM, Harborth J, Lendeckel W, Yalcin A, Klaus W, Tuschl T,
Duplexes of 21-nucleotide RNAs mediate RNA interference in cultured mammalian cells, Nature, 411:494–498, 2001.
[8] Gong W, Ren Y, Xu Q, Wang Y, Lin D, Zhou H, Li T, Integrated
siRNA design based on surveying of features associated with high RNAi effectiveness, BMC Bioinformatics, 7:516, 2006.
[9] Harborth J, Elbashir SM, Vandenburgh K, Manninga H, Scaringe SA,
Weber K, Tuschl T, Sequence, chemical, and structural variation of small
interfering RNAs and short hairpin RNAs and the effect on mammalian gene silencing, Antisense Nucleic Acid Drug Dev, 13:83–105, 2003.
[10] Hannon GJ, Rossi JJ, Unlocking the potential of the human genome with RNA interference, Nature, 431:371–378, 2004.
[11] Hsieh AC, Bo R, Manola J, Vazquez F, Bare O, Khvorova A, Scaringe S,
Sellers WR, A library of siRNA duplexes targeting the phosphoinositide
3-kinase pathway: determinants of gene silencing for use in cell-based screens, Nucleic Acids Res, 32:893–901, 2004.
[12] Huesken D, Lange J, Mickanin C, Weiler J, Asselbergs F, Warner J, Mellon B, Engel S, Rosenberg A, Cohen D, Labow M, Reinhardt M, Natt F, Hall J, Design of a Genome–Wide siRNA Library Using an Artificial Neural Network, Nature Biotechnology, 23:995–1001, 2005.
[13] Hutvagner G, McLachlan J, Balint E, Tuschl T, Zamore PD, A cellular
function for the RNA interference enzyme Dicer in small temporal RNA maturation, Science, 293:834–838, 2001.
[14] Ichihara M, Murakumo Y, Masuda A, Matsuura T, Asai N, Jijiwa M,
Ishida M, Shinmi J, Yatsuya H, Qiao S et al., Thermodynamic instability
of siRNA duplex is a prerequisite for dependable prediction of siRNA activities, Nucleic Acids Res, 35:e123, 2007.
[15] Jagla B, Aulner N, Kelly PD, Song D, Volchuk A, Zatorski A, Shum
D, Mayer T, De Angelis DA, Ouerfelli O, Rutishauser U, Rothman JE,
Sequence characteristics of functional siRNAs, RNA, 11:864–872, 2005.
[16] Klingelhoefer JW, Moutsianas L, Holmes CC, Approximate Bayesian
feature selection on a large meta-dataset offers novel insights on factors that effect siRNA potency, Bioinformatics, 25:1594–1601, 2009.
[17] Meister G, Tuschl T, Mechanisms of gene silencing by double-stranded RNA, Nature, 431:343–349, 2004.
[18] Mysara M, Elhefnawi M, Garibaldi JM, MysiRNA: improving siRNA
efficacy prediction using a machine-learning model combining multi-tools and whole stacking energy, J Biomed Inform, 45:528–34, 2012.
[19] Qiu S, Lane T, A Framework for Multiple Kernel Support Vector
Regression and Its Applications to siRNA Efficacy Prediction, IEEE/ACM
Trans Comput Biology Bioinform, 6:190–199, 2009.
[20] Ren Y, Gong W, Xu Q, Zheng X, Lin D, et al., siRecords: an extensive database of mammalian siRNAs with efficacy ratings, Bioinformatics, 22:1027–1028, 2006.
[21] Reynolds A, Leake D, Boese Q, Scaringe S, Marshall WS, Khvorova
A, Rational siRNA design for RNA interference, Nat Biotechnol, 22:326–
330, 2004.
[22] Scherer LJ, Rossi JJ, Approaches for the sequence-specific knockdown
of mRNA, Nat Biotechnol., 21:1457–1465, 2003.
[23] Sciabola S, Cao Q, Orozco M, Faustino I, Stanton RV, Improved nucleic
acid descriptors for siRNA efficacy prediction, Nucl Acids Res, 41:1383–
1394, 2013.
[24] Shabalina SA, Spiridonov AN, Ogurtsov AY, Computational models
with thermodynamic and composition features improve siRNA design,
BMC Bioinformatics, 7:65, 2006.
[25] Sudarsana LR, Sarojamma V, Ramakrishna V, Future of RNAi in
medicine: a review, World J Med Sci, 2:1–14, 2007.
[26] Takasaki S, Methods for Selecting Effective siRNA Target Sequences
Using a Variety of Statistical and Analytical Techniques, Methods Mol
Biol, 942:17–55, 2013.
[27] Tuschl T, Zamore PD, Lehmann R, Bartel DP, Sharp PA, Targeted
mRNA degradation by double-stranded RNA in vitro, Genes Dev.,
13:3191–3197, 1999.
[28] Vert JP, Foveau N, Lajaunie C, Vandenbrouck Y, An accurate and
interpretable model for siRNA efficacy prediction, BMC Bioinformatics,
7:520, 2006.
[29] Vickers TA, Koo S, Bennett CF, Crooke ST, Dean NM, Baker BF,
Efficient reduction of target RNAs by small interfering RNA and RNase
H-dependent antisense agents A comparative analysis, J Biol Chem.,
278:7108–7118, 2003.
[30] Ui-Tei K, Naito Y, Takahashi F, Haraguchi T, Ohki–Hamazaki H, Juni
A, Ueda R, Saigo K, Guidelines for the selection of highly effective
siRNA sequences for mammalian and chick RNA interference, Nucleic
Acids Res., 32:936–948, 2004.
[31] Ui–Tei K: Optimal choice of functional and off–target effect–reduced
siRNAs for RNAi therapeutics, Front Genet, 4:107, 2013.
[32] Angart P, Vocelle D, Chan C, Walton SP, Design of siRNA therapeutics
from the molecular scale, Pharmaceuticals, 6:440–468, 2013.
[33] Gavrilov K, Saltzman WM, Therapeutic siRNA: principles, challenges,
and strategies, Yale J Biol Med., 85:187–200, 2012.
[34] Mutisya D, Selvam C, Lunstad BD, Pallan PS, Haas A, Leake D, Egli
M, Rozners E, Amides are excellent mimics of phosphate internucleoside
linkages and are well tolerated in short interfering RNAs, Nucleic Acids
Res, 42(10):6542–51, 2014.
[35] Deng Y, Wang CC, Choy KW, Du Q, Chen J, Wang Q, Li L, Chung TK,
Tang T, Therapeutic potentials of gene silencing by RNA interference:
principles, challenges, and new strategies, Gene, 538(2):217–27, 2014.
[36] Matveeva O, Nechipurenko Y, Rossi L, Moore B, Ogurtsov AY, Atkins
JF, et al., Comparison of approaches for rational siRNA design leading
to a new efficient and transparent method, Access, 35:1–10, 2007.
[37] Schwarz DS, Hutvagner G, Du T, Xu Z, Aronin N, Zamore PD, Asymmetry in the assembly of the RNAi enzyme complex, Cell, 115(2):199–208, 2003.
[38] Khvorova A, Reynolds A, Jayasena SD, Functional siRNAs and miRNAs
exhibit strand bias, Cell, 115(2):209–216, 2003.
[39] Schramm G, Ramey R, siRNA design including secondary structure target site prediction, Nature Methods, 2(8), doi: 10.1038/nmeth780, 2005 (Application Notes).
[40] Hannon GJ, Rossi JJ, Unlocking the potential of the human genome
with RNA interference, Nature, 431:371–378, 2004.
[41] Qi L, Han Z, Ruixin Z, Ying X, and Zhiwei C, Reconsideration of in
silico siRNA design from a perspective of heterogeneous data integration:
problems and solutions, Brief Bioinform., 15:292–305, 2014.
[42] Saetrom P, Predicting the efficacy of short oligonucleotides in antisense and RNAi experiments with boosted genetic programming, Bioinformatics, 20(17):3055–3063, 2004.
[43] Takasaki S, Kotani S, Konagaya A, An effective method for selecting
siRNA target sequences in mammalian cells, Cell Cycle, 3(6):790–5,
2004.
[44] Khvorova A, Reynolds A, Jayasena SD, Functional siRNAs and miRNAs
exhibit strand bias, Cell, 115:209–216, 2003.
[45] Chalk A, Wahlestedt C, Sonnhammer E, Improved and automated prediction of effective siRNA, Biochem Biophys Res Commun., 319(1):264–274, 2004.
[46] Luo K, Chang D, The gene–silencing efficiency of siRNA is strongly
dependent on the local structure of mRNA at the targeted region,
Biochem Biophys Res Commun, 318 (1):303–310, 2004.
[47] Katoh T, Suzuki T, Specific residues at every third position of siRNA shape its efficient RNAi activity, Nucleic Acids Res, 35:e27, 2007.
[48] SantaLucia Jr., J., A unified view of polymer, dumbbell, and oligonucleotide DNA nearest–neighbor thermodynamics, Proceedings of the National Academy of Sciences USA, 95:1460–1465, 1998.
[49] Zou, H., Hastie T., Regularization and variable selection via the elastic
net, Journal of the Royal Statistical Society, Series B, 67(2): 301–320,
2005.
[50] H. Kopka and P. W. Daly, A Guide to LaTeX, 3rd ed. Harlow, England: Addison-Wesley, 1999.