DOI 10.1007/s11030-015-9571-9
FULL-LENGTH PAPER
Multi-output model with Box–Jenkins operators of linear
indices to predict multi-target inhibitors of
ubiquitin–proteasome pathway
Gerardo M Casañola-Martin · Huong Le-Thi-Thu · Facundo Pérez-Giménez ·
Yovani Marrero-Ponce · Matilde Merino-Sanjuán · Concepción Abad ·
Humberto González-Díaz
Received: 11 August 2014 / Accepted: 14 February 2015
© Springer International Publishing Switzerland 2015
Abstract The ubiquitin–proteasome pathway (UPP) plays
an important role in the degradation of cellular proteins and
the regulation of cellular processes that include cell
cycle control, proliferation, differentiation, and apoptosis.
Consequently, the disruption of proteasome activity leads to
different pathological states linked to clinical disorders such
as inflammation, neurodegeneration, and cancer. The use of
UPP inhibitors is one of the proposed approaches to manage
these alterations. On the other hand, the ChEMBL database
contains >5,000 experimental outcomes for >2,000 compounds
tested as possible proteasome inhibitors using a large number
of pharmacological assay protocols. These assays report a
large number of experimental parameters of biological
activity, such as EC50, IC50, percent of inhibition, and many others,
that have been determined under many different conditions,
targets, organisms, etc. Although this large amount of data
offers new opportunities for the computational discovery of
Electronic supplementary material The online version of this
article (doi:10.1007/s11030-015-9571-9) contains supplementary
material, which is available to authorized users.
G M Casañola-Martin (B) · C Abad
Departament de Bioquímica i Biologia Molecular, Universitat de
València, 46100 Burjassot, Spain
e-mail: gmaikelc@gmail.com; gerardo.casanola@uv.es
G M Casañola-Martin · F Pérez-Giménez
Unidad de Investigación de Diseño de Fármacos y Conectividad
Molecular, Departamento de Química Física, Facultad de Farmacia,
Universitat de València, Valencia, Spain
G M Casañola-Martin
Faculty of Environmental Science, Pontifical University Catholic of
Ecuador in Esmeraldas (PUCESE), C/ Espejo y Santa Cruz S/N, 080150
Esmeraldas, Ecuador
H Le-Thi-Thu
School of Medicine and Pharmacy, Vietnam National University Hanoi
(VNU), 144 Xuan Thuy, Cau Giay, Hanoi, Vietnam
proteasome inhibitors, the complexity of these data represents a bottleneck for the development of predictive models. In this work, we used linear molecular indices calculated with the software TOMOCOMD-CARDD and Box–Jenkins moving average operators to develop a multi-output model that can predict outcomes for 20 experimental parameters in >450 assays carried out under different conditions. The resulting multi-output model showed accuracy, sensitivity, and specificity values above 70 % for the training and validation series. Finally, this model is considered multi-target and multi-scale, because it predicts the inhibition of the UPP for drugs against 22 molecular or cellular targets of different organisms contained in the ChEMBL database.
Keywords Ubiquitin–proteasome pathway inhibitors · ChEMBL · Multi-target · Multi-scale and multi-output models · Moving averages · QSAR
Y Marrero-Ponce Facultad de Química Farmacéutica, Universidad de Cartagena, Cartagena de Indias, Bolivar, Colombia
M Merino-Sanjuán Department of Pharmacy and Pharmaceutical Technology, University
of Valencia, Valencia, Spain
M Merino-Sanjuán Institute of Molecular Recognition and Technological Development (IDM), Inter-Universitary Institute from Polytechnic University of Valencia and University of Valencia, Valencia, Spain
H González-Díaz (B) Department of Organic Chemistry II, University of the Basque Country UPV/EHU, 48940 Leioa, Spain
e-mail: humberto.gonzalezdiaz@ehu.es
H González-Díaz IKERBASQUE, Basque Foundation for Science,
48011 Bilbao, Spain
Mol Divers
Introduction
The ubiquitin–proteasome pathway is one of the two main
proteolytic systems in mammalian cells [1]. This pathway
is involved in a great number of cellular processes that
include cellular homeostasis, cell cycle control, gene
expression, DNA repair, signal transduction, immune responses,
and apoptosis [2]. The growing list of human diseases in
which protein homeostasis is disrupted reveals the
importance of the ubiquitin–proteasome pathway for normal
cellular function and its potential as a therapeutic
target [3]. The proteasome core has been a primary target
in cancer therapy since the discovery of the proteasome
inhibitor bortezomib, and at present the development of
proteasome inhibitors involves the use of many
methods [4]. Current efforts in this field are aimed at
the search for new drugs against the ubiquitin–proteasome
pathway that show greater selectivity, potency, and safety
in order to minimize side effects.
In this sense, it is important to develop new in silico
models in order to predict novel, potent, and selective
ubiquitin–proteasome pathway inhibitors. To this end, it is
necessary to compile large datasets of these compounds from
public sources, which are readily accessible. The
ChEMBL database [5,6] (https://www.ebi.ac.uk/chembldb)
includes more than 11,420,000 activity data for >1,295,500
compounds and 9,844 targets. This vast quantity of data
opens a wide field for the application of
computational approaches for activity prediction [6,7]. The
analysis of the data is very complex due to the three types
of chemical and pharmacological information that appear:
(1) multi-targeting, (2) multi-outputting, and/or (3)
multi-scaling. The multi-targeting aspect emerges
from the formation of different pairs of interactions ($I_{qr}$)
between drugs ($d_q$) and targets ($t_r$) [8–10]. In our case,
the target interactions are represented as networks of nodes
(proteins, genes, RNAs, miRNAs) interconnected by a link
when there is a target–target interaction between two of them.
The multi-output complication comprises the use of different
types of targets, assay conditions, assays, organisms,
experimental measures, etc., in order to decide whether two nodes
(assays, drugs, targets, etc.) are linked ($I_{ij} = 1$) or not ($I_{ij} = 0$).
The case of multi-scaling is given by the different structural
levels of the organization of matter that can be described
by different input variables. In this sense, the models need
to be multi-scale to collect the information at some of the
following levels: molecular structure (drugs),
macromolecular structure (molecular targets), cellular (cellular line
targets), and organisms (species from which the targets were
extracted). In our previous study, we used the
MARCH-INSIDE (MI) approach to obtain the Shannon entropy measures of a
molecular graph (G), which we used in turn as inputs for the Box–Jenkins moving average (MA) operators used in time series analysis [11]. MA models gained popularity after the initial work of Box and Jenkins [12] on autoregressive integrated moving average (ARIMA) and similar models. The Box–Jenkins MA operators used in time series are the average values of one characteristic of the system for different intervals of time or seasons. In multi-output modeling, we calculate the MA operators as the average of a property of the system (molecular descriptors or any other property to be considered) for all drugs or targets with a specific response in one assay carried out under a sub-set of reference conditions ($c_r$). Consequently, our MA operator acts over a sub-set of conditions of the pharmacological assays. The application of MA operators to domains other than time is increasing because of their wide applicability.
In this sense, the main objective of this kind of work is to assess interactions or links between drugs and targets, proteins, brain regions, and other complex systems. For this purpose, it is adequate to use the MA properties of the network nodes (drugs, proteins, reactions, laws, neurons, etc.) that form links ($I_{qr}$) in the specific rth sub-set of reference conditions ($c_r$).
For this reason, we decided to call this strategy assessing
of links with moving averages (ALMA), in a similar manner
to other authors for different multi-target and/or multi-output (mo) models [13–15].
The method is very versatile, because we can use molecular descriptors calculated by different chemoinformatics software as input. The software TOMOCOMD-CARDD (TC), developed by Marrero-Ponce et al. [16], is a well-known tool for the calculation of several families of 2D/3D molecular descriptors. In particular, we can use TC to calculate different types of atom-based linear indices $f_q(G, N, M, w)_g$ for a given compound (the qth compound). We can compute these indices for the molecular graph G of the compound, taking into consideration a specific norm (N), a matrix (M), a vector of physicochemical weights (w) for atoms, etc. In addition, we can determine linear indices for different groups (g) of atoms in the molecule and assign them different values according to the specific molecular fragments selected. Some applications of linear indices include the estimation of chemical, physical, and kinetic properties of compounds [17,18]. This method has also been applied to studies of different biological activities, for example antibacterials [19], tyrosinase inhibitors [20], and trypanosomal inhibitors [21], among others [22]. Besides, the linear indices are very flexible and useful for studying different complex systems. The types of complex systems already studied with linear indices include RNA secondary structures [23] and protein stability effects [24].
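As a rough illustration of the ingredients named above (a matrix M, a weight vector w, a group of atoms g, and a norm N), the following Python sketch computes a local linear index as the Manhattan norm of the entries of M^k·w restricted to one atom group. It is only a sketch under stated assumptions: it assumes the kth-order atom linear indices are the entries of M^k·w, a plain adjacency matrix stands in for the graph-theoretical electronic-density matrix, the function names are hypothetical, and the toy weights are illustrative rather than the values used by TC.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, vector)) for row in matrix]


def atom_linear_indices(matrix, weights, order):
    """Entries of (M^order) w, obtained by applying M to w `order` times."""
    values = list(weights)
    for _ in range(order):
        values = mat_vec(matrix, values)
    return values


def local_linear_index(matrix, weights, order, group):
    """Manhattan (N1) norm of the atom-level indices over the atoms in `group`."""
    values = atom_linear_indices(matrix, weights, order)
    return sum(abs(values[i]) for i in group)


# Toy three-atom chain C-C-O with hydrogens suppressed (assumption: a plain
# adjacency matrix replaces the electronic-density matrix used by TC).
ADJ = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
WEIGHTS = [2.5, 2.5, 3.5]  # illustrative electronegativity-like weights
HETERO = [2]               # index of the heteroatom (the X group)

index_order0 = local_linear_index(ADJ, WEIGHTS, 0, HETERO)
index_order1 = local_linear_index(ADJ, WEIGHTS, 1, HETERO)
```

Changing `group` to the indices of H bond acceptors or donors would give the corresponding local indices for those atom sets in the same way.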
In a recent work [25], an ALMA model for neuroprotective drugs present in ChEMBL was capable of
predicting $I_{qr}$ of drugs with targets in multi-output tests, taking
into account the drug responses. The TOPS-MODE program [26]
was used to parametrize the structure of the compounds.
In a more recent work [27], an ALMA classifier with good
performance was found using the MI scheme.
Both models were able to predict the links
between drugs and targets. However, we did not carry out
a formal construction or comparison of the drug–target
networks for the ChEMBL data in the previous papers.
In any case, despite the high versatility of entropy
measures to codify structural information, there is no
report of a multi-target model of drug–target interactions
for compounds with inhibitory activity against the
ubiquitin–proteasome pathway. Therefore, in this study, we describe
for the first time a multi-target, multi-output, and
multi-scale ALMA model based on atom-based linear indices for
ChEMBL data of ubiquitin–proteasome pathway inhibitory
compounds.

Fig. 1 A representative sample of the compounds used in this study together with their ChEMBL codes
Materials and methods
CHEMBL dataset: assembling of training and validation sets
We searched and downloaded from the public database ChEMBL a general data set composed of >5,602 results of multiple assay endpoints [5,6] for UPP inhibitors. The value
of the observed (obs) class variable, $A_q(c_r)_{obs} = 1$ (active
compound) or $A_q(c_r)_{obs} = 0$ (non-active compound), was assigned to every qth drug biologically assayed under the different
conditions $c_r$. The dataset used to train and validate the model
includes N = 5,602 statistical cases formed by $N_d$ = 2,954
unique drugs, each of them assayed in at least one out of 20 possible standard type measures, which were determined in at least one out of 474 assays. Each assay involves, in turn, at least one out of 20 protein or cellular targets from seven different organisms. Some examples
of the compounds used in this work are shown in Fig. 1. As can be
observed, the chemicals with UPP inhibitory activity extracted from the
ChEMBL dataset are structurally diverse. In the same way, the
SMILES codes of the 2,897 compounds used in this study
are given in Supplementary Material 1.

Table 1 Dataset used in this study
As noted above, the total set of statistical cases (5,602)
formed the entire experimental space used here. When
choosing the training and validation sets, we ensured
that each of the different conditions was
included in both sets, for both the active
and the inactive cases, to guarantee an adequate and
representative sample for the training and test sets. Subject to this requirement, we
randomly picked the compounds for the training (T) and
validation (CV) sets. As shown in Table 1, there are 1,827
active cases and 2,376 inactive ones in the training
set (4,203 cases). The validation set consists of 1,399 cases,
of which 607 are active and 792 inactive. The cases in
the validation set were never used in the development of the
ALMA models.
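The selection rule described above (random assignment, with every condition and both classes present in each set) can be sketched as a stratified random split. The Python code below is a minimal sketch, not the procedure actually used by the authors: the function name, the tuple layout of a case, the fixed seed, and the 75 % training fraction (chosen only because 4,203 of 5,602 cases is roughly 75 %) are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(cases, train_fraction=0.75, seed=0):
    """Split cases so each (condition, class) stratum appears in both sets.

    cases: list of (case_id, condition, is_active) tuples (hypothetical layout).
    Strata with a single case cannot be shared and are kept in training.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for case in cases:
        _, condition, is_active = case
        strata[(condition, is_active)].append(case)

    train, validation = [], []
    for key in sorted(strata):
        members = strata[key][:]
        rng.shuffle(members)
        n_train = round(train_fraction * len(members))
        if len(members) > 1:
            # Keep at least one case of this stratum on each side of the split.
            n_train = min(max(n_train, 1), len(members) - 1)
        else:
            n_train = len(members)
        train.extend(members[:n_train])
        validation.extend(members[n_train:])
    return train, validation


# Toy data: 40 cases, two measure types, active when the id is a multiple of 3.
toy_cases = [(i, "IC50" if i % 2 else "Ki", i % 3 == 0) for i in range(40)]
train_set, validation_set = stratified_split(toy_cases)
```

In a real application the stratum key would combine all the reference conditions of a case, not just a single measure type.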
Molecular descriptors: TOMOCOMD-CARDD atom-based
linear indices
TOMOCOMD-CARDD is a molecular descriptor (MD)
calculating program comprised of two suites with parallel
functionalities. The first is a comprehensive collection of MD
calculating modules based on the so-called “relations
frequency matrices,” molecular fingerprints, and a pool of the
most relevant MDs reported in the literature. The second
suite comprises a set of modules derived from algebraic
considerations, collectively known as QuBiLS (acronym for
Quadratic, Bilinear and Linear MapS). This suite includes
three modules: (1) QuBiLS-MAS (QuBiLS-based on
Graph–Theoretical Electronic-Density Matrices and Atomic
weightingS), (2) QuBiLS-MIDAS (QuBiLS-based on MInkowski
Distance matrices and Atomic weightingS), and (3)
QuBiLS-POMAS (QuBiLS-based on molecular surface-based
POtential Matrices and Atomic weightingS). In this application,
only the QuBiLS-MAS module is included. QuBiLS-MAS
constitutes a unique combination of methods to calculate MDs
on an algebraic basis. These MDs can be used for a wide
range of applications in all areas of chemistry, in particular
in drug design, lead compound discovery and optimization,
[...]ment of compound libraries, and prediction of absorption, distribution, metabolism, excretion and toxicity (ADMET) properties.
In this work, we use the atom-based linear indices calculated with the software TC ver. 1.0 [16] as molecular descriptors $D_{ik}$. For each qth chemical, we calculated the
different types of atom-based linear indices $f_q(N, M, w)_g$. The norm selected was the Manhattan distance (N1). The
matrix M used was the graph-theoretical electronic-density
matrix [called non-stochastic (NS)] [28,29]. The atoms in each molecular structure were differentiated with the following physicochemical weights (w): Ghose–Crippen LogP, electronegativity, and van der Waals volume, which allow a better understanding of the problem. Moreover, in our study the different groups of atoms calculated for the compounds were H bond acceptors (A), C atoms in aliphatic chains (C),
H bond donors (D), C atoms in aromatic portions (P), and heteroatoms (X). The general equation for the definition of the atom-based linear indices is shown below (Eq. 1):

$$f_{qk}(G, N1, M, w)_g = f_{qk}(w)_g = \sum_{i} |f_i|_g \qquad (1)$$

Computational methods
A theoretical framework in ALMA models
ALMA models may be classified as a general type of model
to assess the links in different systems. These approaches are adaptable to all molecular descriptors, graph invariants,
or descriptors for complex networks. Here we used the $f_{qk}$ linear indices of kth type for the qth compound represented by a
matrix M. The aim of this model is to link the scores $S_q(c_r)$
with the molecular descriptors $D_{ik}$ of a given compound $d_q$
and the deviation terms $\Delta f_{qk}(c_r) = f_{qk} - \langle f_{qk}(c_r) \rangle$. The
model has the following general form:

$$S_q(c_r) = a_0 + \sum_{k,r} b_{rk} \, \Delta f_{qk}(c_r) = a_0 + \sum_{k,r} b_{rk} \left[ f_{qk} - \langle f_{qk}(c_r) \rangle \right] \qquad (2)$$
Fig. 2 Graphical flowchart of all the steps taken in this work to develop the new ALMA model for UPP inhibitors

The output-dependent variable is $S_q(c_r) = S_q(c_1, c_2, c_3, c_4, c_5, c_6, c_7) = S_q$(measure type, target, target mapping, assay type, data curation, assay protocol, organism). In our
case, $S_q(c_r)$ denotes the effect of the qth compound, $d_q$, in the rth test carried out under the conditions $c_r$. In Eq. (2), $f_{qk}$ and
$\Delta f_{qk}(c_r)$ are used as independent variables. The input
variable $\langle f_{qk}(c_r) \rangle$ is the mean of the kth descriptor $f_{qk}$ over all
compounds assayed in one test carried out under the same conditions $c_r$, in the manner of
the Box–Jenkins moving average operators proposed
previously [12] and used in other successful applications [30–33]. In
this MA approach, the values of the descriptor $f_{qk}$
are summed over the $n_r$ compounds evaluated under the same
conditions $c_r$, and the sum is then divided by
$n_r$, as can be observed in Eq. (3):

$$\langle f_{qk}(c_r) \rangle = \frac{1}{n_r} \sum_{q=1}^{n_r} f_{qk} \qquad (3)$$
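The averaging step of Eq. (3) and the resulting deviation terms can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the record layout, the function names, and the toy descriptor values are assumptions. Each record pairs a compound and its descriptor value with the condition under which it was assayed, so the same compound receives a different deviation under each condition.

```python
from collections import defaultdict


def condition_means(records):
    """Mean descriptor value per condition, as in Eq. (3).

    records: list of (compound_id, condition, descriptor_value) tuples.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _, condition, value in records:
        totals[condition] += value
        counts[condition] += 1
    return {cond: totals[cond] / counts[cond] for cond in totals}


def deviations(records):
    """Deviation of each descriptor value from the mean of its condition."""
    means = condition_means(records)
    return [(cid, cond, value - means[cond]) for cid, cond, value in records]


# Toy data: compound d1 keeps the same descriptor value (2.0) in both assays.
toy_records = [
    ("d1", "IC50 (nM)", 2.0),
    ("d2", "IC50 (nM)", 4.0),
    ("d1", "Ki (nM)", 2.0),
    ("d3", "Ki (nM)", 1.0),
]
toy_means = condition_means(toy_records)
toy_deviations = deviations(toy_records)
```

Here compound d1 lies below the average of the IC50 group but above the average of the Ki group, which is how a single structural descriptor becomes condition dependent.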
Developing and performance of QSAR ALMA model
In order to assemble the ALMA model, we used the
linear discriminant analysis (LDA) technique implemented in
the software package STATISTICA 6.0 [34]. This
technique is very useful for the task of separating two or
more classes, as described in detail in the technical
literature [35,36]. The algorithm is capable of finding
models that output the predicted group
membership of new observations. Besides, this technique
is one of the most commonly used in drug discovery,
with applications to different biological activities that
include, among others, theoretical studies of acetylcholinesterase inhibitors, modeling of anti-allergic natural compounds by molecular topology, and predictive modeling of human monoamine oxidase inhibitors [37–39], and more recently to high-dimensional datasets [40]. In our study, the STATISTICA software [41] was used to develop the ALMA model, and performance parameters were considered to assess the quality of the classification functions. In the same way, the number of variables in the models was kept
to a minimum, in keeping with the principle of parsimony (Occam’s razor).
The quality of this ALMA model was determined by examining Wilks’ λ parameter (U statistic), which
for the overall discrimination can take values in the range from 0 (perfect discrimination) to 1 (no discrimination). The
squared Mahalanobis distance ($D^2$) indicates the separation
of the respective groups, showing whether the model possesses an appropriate discriminatory power for differentiating between the two corresponding groups. The Fisher
ratio (F), the corresponding p level [p(F)], the accuracy
(Ac), specificity (Sp), and sensitivity (Sn) were also used to assess the performance of the ALMA model [33]. In
Fig. 2 we depict the graphical flowchart of all the steps taken in
this work in order to develop a new ALMA model for UPP
inhibitors.

Table 2 Results of ALMA models

| Sub-set | Stat. (a) | % | Groups (b) | I_qt(c_r)pred = 1 (c) | I_qt(c_r)pred = 0 (d) |
|---|---|---|---|---|---|
| Train | Sp | 73.0 | I_qt(c_r)obs = 1 | 1,007 | 372 |
| | Ac | 71.6 | Total | | |
| CV | Sp | 70.9 | I_qt(c_r)obs = 1 | 334 | 137 |
| | Sn | 70.6 | I_qt(c_r)obs = 0 | 273 | 655 |
| | Ac | 70.7 | Total | | |

(a) Sn positive correct/positive total, Sp negative correct/negative total, Ac total correct/total
(b) I_qt(c_r)obs: observed experimental measure of interaction/not interaction (1/0) with the rth target
(c) Prediction of the experimental measure of interaction with the rth target
(d) Prediction of the experimental measure of no interaction with the rth target
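The percentages in Table 2 follow directly from the classification counts. As a check, the Python sketch below recomputes them from the four CV counts reported in the table (334, 137, 273, and 655). The function and variable names are hypothetical, and the outputs are named neutrally by observed group because the table labels the row for the observed class 1 as Sp and the row for the observed class 0 as Sn.

```python
def classification_rates(obs1_pred1, obs1_pred0, obs0_pred1, obs0_pred0):
    """Percent correctly classified in each observed group, and overall accuracy."""
    total_obs1 = obs1_pred1 + obs1_pred0
    total_obs0 = obs0_pred1 + obs0_pred0
    correct_obs1 = 100.0 * obs1_pred1 / total_obs1
    correct_obs0 = 100.0 * obs0_pred0 / total_obs0
    accuracy = 100.0 * (obs1_pred1 + obs0_pred0) / (total_obs1 + total_obs0)
    return correct_obs1, correct_obs0, accuracy


# CV counts as printed in Table 2 (rows: observed class, columns: predicted class).
cv_rates = classification_rates(334, 137, 273, 655)
cv_rounded = tuple(round(rate, 1) for rate in cv_rates)
```

With these counts the three rounded rates are 70.9, 70.6, and 70.7 %, matching the CV rows of the table.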
Results and discussion
Model training and validation
Here, we report the first ALMA model to predict the
experimental measure of interaction with the rth target ($I_{qr} = 1$) or
not ($I_{qr} = 0$) when the qth drug presents a value higher than
average. The output $S_q(c_r)$ of our ALMA model depends
on both the chemical structure of the qth compound and the set
of conditions selected to carry out the biological assay ($c_r$).
Therefore, different outputs in terms of probabilities should
be expected if the test conditions $c_r$ are changed for the same
compound [42]. The boundary conditions $c_r$ included here
are those defined previously in the “Computational methods”
section. As can be noted in Table 2, the values of accuracy,
specificity, and sensitivity of the ALMA classification
equation for the training and validation sets are above 70 %. These
values are considered adequate in bioactivity data modeling
studies [38].
The statistical parameters used to measure the quality of
the equation were the number of cases used to train the model
(N), Chi-square ($\chi^2$), and the p level [33]. The probability
cut-off for this LDA model is ${}^{i}p_1(c_j) > 0 \Rightarrow A_i(c_j) = 1$. In the
same way, due to the complexity of the molecular descriptors
in the equation, we give a more detailed description of
the meaning of the seven variables included in the model in
Table 3.
In this case, when the equation predicts an outcome
above zero for a chemical $d_i$, the compound is predicted to have a positive response in the
rth test carried out under the conditions $c_r$. The equation of the
model is:

$$S_q(c_r) = -0.2159 - 0.0004\,\Delta f_1 - 0.2265\,\Delta f_2 + 0.0007\,\Delta f_3 + 0.0002\,\Delta f_4 - 0.0027\,\Delta f_5 + 0.1358\,\Delta f_6 + 0.0408\,\Delta f_7$$
$$N = 4203 \qquad \chi^2 = 838.661 \qquad p < 0.005 \qquad (4)$$

Table 3 Variables used as input for the model

| Variable | Symbol | Molecular descriptor details |
|---|---|---|
| Δf1 | $f_{q1}(N1, M, e)_A$ | Linear index of order 5 of M calculated for the set of atoms A using e |
| Δf2 | $f_{q2}(N1, M, e)_D$ | Linear index of order 0 of M calculated for the set of atoms D using e |
| Δf3 | $f_{q3}(N1, M, v)_A$ | Linear index of order 3 of M calculated for the set of atoms A using v |
| Δf4 | $f_{q4}(N1, M, v)_D$ | Linear index of order 5 of M calculated for the set of atoms D using v |
| Δf5 | $f_{q5}(N1, M, e)_D$ | Linear index of order 4 of M calculated for the set of atoms D using e |
| Δf6 | $f_{q6}(N1, M, e)_D$ | Linear index of order 0 of M calculated for the set of atoms D using e |
| Δf7 | $f_{q7}(N1, M, e)_X$ | Linear index of order 0 of M calculated for the set of atoms X using e |

M is the graph-theoretical electronic-density matrix. The sets of atoms
for local indices are: A, the set of H bond acceptor atoms (N, O, F, Cl); D, the set
of H bond donors (N and O atoms that have one bond with an H atom);
and X, the heteroatoms (all atoms other than C and H). The weight
vectors used to calculate the linear indices were v for atomic van der Waals volumes and e for atomic electronegativities.
As can be observed from Eq. 4, the parameters $\Delta f_1$, $\Delta f_2$, and $\Delta f_5$ have a negative impact on the activity; these correspond to the boundary conditions related to measure, target, and data curation, respectively. On the other hand, the variables
$\Delta f_3$, $\Delta f_4$, $\Delta f_6$, and $\Delta f_7$ (mapping, assay type, protocol, and organism) have a positive influence on the activity. Besides, this equation shows which parameters contribute most to the activity. One is $\Delta f_6$, with a coefficient of 0.1358, which is a very reasonable result because the most important variations in the activity, even for the same compounds, arise from the different protocols used to quantify the activity. Another is $\Delta f_2$, which has a coefficient of −0.2265 in the equation and thus a significant negative contribution to the activity.
[Table 4: average descriptor values $\langle f_k \rangle_{av}$ for the boundary conditions $c_1$ (experimental measure), $c_2$, $c_3$, $c_6$, and $c_7$]

To use this model in predictive studies, we only have to substitute into Eq. 4 the value of the molecular descriptor of the compound ($f_{qk}$) and the respective average value
of the descriptor for all compounds measured under the same conditions. In Table 4, some
examples of these average values for different targets, measures,
assays, and organisms are depicted. Moreover, in Table 1 of
online Supplementary Material 2 the values of these
parameters for the seven reference conditions are listed.
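The substitution described above can be sketched as a small Python function that evaluates Eq. 4. The intercept and coefficients are those printed in Eq. 4; everything else is an assumption made for illustration: the function names are hypothetical, the inputs of Eq. 4 are taken to be the deviations of each descriptor from its condition average, and a score above zero is read as a predicted positive response.

```python
INTERCEPT = -0.2159
# Coefficients of the seven terms of Eq. 4, in order.
COEFFICIENTS = [-0.0004, -0.2265, 0.0007, 0.0002, -0.0027, 0.1358, 0.0408]


def alma_score(descriptors, condition_averages):
    """Evaluate Eq. 4 from seven descriptor values and their condition averages.

    Assumption: each input of Eq. 4 is the deviation of the kth descriptor
    from the average of that descriptor under the matching condition.
    """
    if len(descriptors) != 7 or len(condition_averages) != 7:
        raise ValueError("Eq. 4 uses exactly seven variables")
    return INTERCEPT + sum(
        coef * (value - average)
        for coef, value, average in zip(COEFFICIENTS, descriptors, condition_averages)
    )


def predict_active(descriptors, condition_averages):
    """Assumed decision rule: a score above zero is a predicted positive response."""
    return alma_score(descriptors, condition_averages) > 0


# Hypothetical inputs: a compound equal to every condition average, and one
# whose sixth descriptor lies 10 units above its condition average.
baseline_score = alma_score([1.0] * 7, [1.0] * 7)
shifted_score = alma_score([0, 0, 0, 0, 0, 10, 0], [0] * 7)
```

A compound whose descriptors all equal the condition averages scores exactly the intercept, so any predicted activity comes from deviations weighted by the coefficients.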
In addition, different numbers of experimental values are
available for each boundary condition. As can be
observed in Table 4, among the experimental
measures some, such as IC50 (nM), EC50 (nM), and
Ki (nM), are the most represented, which gives more accuracy
when performing predictions. Other measures, such as ratio (M/s)
and ED50 (µM), are less represented, but they add
diversity, being representative of other experimental
parameters that could be explored by researchers. The same
occurs for the other boundary conditions, illustrating
the wide diversity of experimental assays, targets, and
organisms for which predictions can be made.
Our current approach versus previous methods
For any modeling study, such as ours on UPP inhibitory
activity, adequate performance on the training and
test sets should first be demonstrated, and the results should then
be compared with previous methodologies.
Therefore, we reviewed the literature to search for QSAR
studies on UPP inhibitory activity. We found only
one study on this topic, based on a database
of 705 cases (compounds) [43]. In that report, the
accuracy in the training and prediction sets exceeds
that of our model, with values greater than 85
and 80 %, respectively. However, the dataset of the
previous study consists of only one target and one
experimental assay under one condition. The techniques
used there are based on machine learning algorithms, whereas
in our case a simple technique, linear
discriminant analysis, was used. As mentioned in previous sections,
the main advantage of our proposed ALMA model is the
use of different reference conditions (assays, targets,
organisms, etc.), for which a wide variety of predictions can be
made.
Conclusions
The ubiquitin–proteasome pathway (UPP) plays a main role
in many human pathologies, such as multiple myeloma,
neurodegenerative diseases, and others that have a great
impact on humankind. However, the identification of hit or lead
compounds to be introduced into the drug research process
by traditional methods is getting
more difficult. In this sense, in silico techniques
are proposed as one of the solutions that
could help make drug discovery more efficient and faster.
New algorithms that involve information technology,
sta[...] discovery, can be useful to accelerate the identification of high-quality compounds using minimum resources. This new method should be successful for fast and parallel evaluation of huge structural chemical databases [44]. These strategies, which are more efficient, can be used to complement QSAR models in virtual assays, and the costs of massive screening can be reduced [45,46].
In this sense, we developed a useful mt-QSAR and mo-QSAR model based on TC atom-based linear indices to fit a large and complex dataset extracted from ChEMBL. This ALMA model, based on multiplexing, was capable of discriminating with good performance under different boundary conditions that include assay conditions, targets, and organisms, among others. Moreover, the influence (positive or negative)
of each parameter on the activity was explained in some detail, together with the contribution (high or low) of the different boundary conditions to the UPP inhibitory activity. This allows us to look forward to many new insights in the field of UPP inhibitor research in the years to come, and shows how the combination of molecular descriptors and Box–Jenkins moving average operators helps to develop useful multi-output models. This new type of QSAR model could
be used as an innovative technology with the aim of increasing the hit rates in biomolecular screening tasks for the identification of potentially active compounds. Finally, the present report opens new ways for the search of drugs that interact with different targets in the UPP, linked to the search
for new chemical entities that are active against neurodegenerative diseases, inflammation, or cancer.
Acknowledgments Casañola-Martin, G. M. thanks the program
Estades Temporals per a Investigadors Convidats for a fellowship to
research at Valencia University (2013–2014). Le-Thi-Thu, H. gratefully acknowledges the support from the Vietnam National
University, Hanoi. Marrero-Ponce, Y. thanks the International
Professor program for a fellowship to work at Cartagena University in the
year 2013–2014. Also, thanks to Prof. Aroa Reguero from the Pontifical Catholic University of Ecuador in Esmeraldas (PUCESE) for her help in the review of the manuscript. Finally, the authors also thank the anonymous referees and editor for their useful comments that contributed to the improvement of this work.
References
1 Ciechanover A (2005) Proteolysis: from the lysosome to ubiquitin and the proteasome Nat Rev Mol Cell Biol 6:79–87 doi:10.1038/ nrm1552
2 Tu Y, Chen C, Pan J, Xu J, Zhou ZG, Wang CY (2012) The ubiquitin proteasome pathway (UPP) in the regulation of cell cycle control and DNA damage repair and its implication in tumorigenesis Int
J Clin Exp Pathol 5:726–738
3 Zhang J, Wu P, Hu Y (2013) Clinical and marketed proteasome inhibitors for cancer treatment Curr Med Chem 20:2537–2551.
Trang 94 Pevzner Y, Metcalf R, Kantor M, Sagaro D, Daniel K (2013) Recent
advances in proteasome inhibitor discovery Expert Opin Drug
Dis-cov 8:537–568 doi:10.1517/17460441.2013.780020
5 Heikamp K, Bajorath J (2011) Large-scale similarity search
profil-ing of ChEMBL compound data sets J Chem Inf Model 51:1831–
1839 doi:10.1021/ci200199u
6 Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A,
Light Y, McGlinchey S, Michalovich D, Al-Lazikani B, Overington
JP (2012) ChEMBL: a large-scale bioactivity database for drug
discovery Nucleic Acids Res 40:D1100–D1107 doi:10.1093/nar/
gkr777
7 Mok NY, Brenk R (2011) Mining the ChEMBL database: an
effi-cient chemoinformatics workflow for assembling an ion
channel-focused screening library J Chem Inf Model 51:2449–2454.
doi:10.1021/ci200260t
8 Hu Y, Bajorath J (2010) Molecular scaffolds with high propensity
to form multi-target activity cliffs J Chem Inf Model 50:500–510.
doi:10.1021/ci100059q
9 Erhan D, L’Heureux PJ, Yue SY, Bengio Y (2006) Collaborative
fil-tering on a family of biological targets J Chem Inf Model 46:626–
635 doi:10.1021/ci050367t
10 Namasivayam V, Hu Y, Balfer J, Bajorath J (2013) Classification of
compounds with distinct or overlapping multi-target activities and
diverse molecular mechanisms using emerging chemical patterns.
J Chem Inf Model 53:1272–1281 doi:10.1021/ci400186n
11 Tenorio-Borroto E, Garcia-Mera X, Penuelas-Rivas CG,
Vasquez-Chagoyan JC, Prado-Prado FJ, Castanedo N, Gonzalez-Diaz H
(2013) Entropy model for multiplex drug–target interaction
end-points of drug immunotoxicity Curr Top Med Chem 13:1636–
1649 doi:10.2174/15680266113139990114
12 Box GEP, Jenkins GM (1970) Time series analysis: forecasting and
control Holden-Day, San Francisco
13 Speck-Planche A, Kleandrova VV, Cordeiro MN (2013)
Chemoin-formatics for rational discovery of safe antibacterial drugs:
simul-taneous predictions of biological activity against streptococci and
toxicological profiles in laboratory animals Bioorg Med Chem
21:2727–2732 doi:10.1016/j.bmc.2013.03.015
14 Speck-Planche A, Kleandrova VV, Luan F, Cordeiro MN (2012)
Chemoinformatics in multi-target drug discovery for anti-cancer
therapy: in silico design of potent and versatile anti-brain tumor
agents Anti-Cancer Agent Med Chem 12:678–685 doi:10.2174/
187152012800617722
15 Speck-Planche A, Kleandrova VV, Luan F, Cordeiro MN (2012)
Chemoinformatics in anti-cancer chemotherapy: multi-target
QSAR model for the in silico discovery of anti-breast cancer agents.
Eur J Pharm Sci 47:273–279 doi:10.1016/j.ejps.2012.04.012
16 Marrero-Ponce Y, Valdés-Martini JR, Jacas CRG (2012)
TOMOCOMD-CARDD QuBiLS Software QUBILs-MAS
Ver-sion 1.0, CAMD-BIR Unit, Universidad Central “Marta Abreu” de
Las Villas
17 Marrero-Ponce Y, Medina-Marrero R, Castillo-Garit JA,
Romero-Zaldivar V, Torrens F, Castro EA (2005) Protein linear indices of
the ‘macromolecular pseudograph alpha-carbon atom adjacency
matrix’ in bioinformatics Part 1: prediction of protein stability
effects of a complete set of alanine substitutions in Arc repressor.
Bioorg Med Chem 13:3003–3015 doi:10.1016/j.bmc.2005.01.062
18 Marrero-Ponce Y, Castillo-Garit JA, Torrens F, Romero-Zaldivar
V, Castro E (2004) Atom, atom-type, and total linear indices of the
“molecular pseudograph’s atom adjacency matrix”: application to
QSPR/QSAR studies of organic compounds Molecules 9:1100–
1123 doi:10.3390/91201100
19 Marrero-Ponce Y, Medina-Marrero R, Martinez Y, Torrens F,
Romero-Zaldivar V, Castro EA (2006) Non-stochastic and
stochastic linear indices of the molecular pseudograph’s atom adjacency
matrix: a novel approach for computational -in silico- screening
and “rational” selection of new lead antibacterial agents J Mol Mod 12:255–271 doi:10.1007/s00894-005-0024-8
20 Rescigno A, Casañola-Martin GM, Sanjust E, Zucca P, Marrero-Ponce Y (2011) Vanilloid derivatives as tyrosinase inhibitors driven
by virtual screening-based QSAR models Drug Test Anal 3:176–
181 doi:10.1002/dta.187
21 Vega MC, Montero-Torres A, Marrero-Ponce Y, Rolón M, Gómez-Barrio A, Escario JA, Arán VJ, Nogal JJ, Meneses-Marcel A, Torrens F (2006) New ligand-based approach for the discovery of antitrypanosomal compounds Bioorg Med Chem Lett 16:1898–1904 doi:10.1016/j.bmcl.2005.12.087
22 Brito-Sánchez Y, Castillo-Garit JA, Le-Thi-Thu H, González-Madariaga Y, Torrens F, Marrero-Ponce Y, Rodríguez-Borges JE (2013) Comparative study to predict toxic modes of action of phenols from molecular structures SAR QSAR Environ Res 24:235–
251 doi:10.1080/1062936x.2013.766260
23 Marrero-Ponce Y, Castillo-Garit JA, Nodarse D (2005) Linear indices of the ‘macromolecular graph’s nucleotides adjacency matrix’ as a promising approach for bioinformatics studies Part 1: prediction of paromomycin’s affinity constant with HIV-1 psi-RNA packaging region Bioorg Med Chem 13:3397–3404 doi:10.1016/j.bmc.2005.03.010
24 Marrero-Ponce Y, Medina-Marrero R, Castillo-Garit JA, Romero-Zaldivar V, Torrens F, Castro EA (2005) Protein linear indices of the
‘macromolecular pseudograph α-carbon atom adjacency matrix’
in bioinformatics Part 1: prediction of protein stability effects of a complete set of alanine substitutions in Arc repressor Bioorg Med Chem 13:3003–3015 doi:10.1016/j.bmc.2005.01.062
25 Luan F, Cordeiro MN, Alonso N, Garcia-Mera X, Caamano O, Romero-Duran FJ, Yanez M, Gonzalez-Diaz H (2013) TOPS-MODE model of multiplexing neuroprotective effects of drugs and experimental-theoretic study of new 1,3-rasagiline derivatives potentially useful in neurodegenerative diseases Bioorg Med Chem 21:1870–1879 doi:10.1016/j.bmc.2013.01.035
26 Marzaro G, Chilin A, Guiotto A, Uriarte E, Brun P, Castagliuolo I, Tonus F, Gonzalez-Diaz H (2011) Using the TOPS-MODE approach to fit multi-target QSAR models for tyrosine kinases inhibitors Eur J Med Chem 46:2185–2192 doi:10.1016/j.ejmech.2011.02.072
27 Alonso N, Caamano O, Romero-Duran FJ, Luan F, Dias Soeiro Cordeiro MN, Yanez M, Gonzalez-Diaz H, Garcia-Mera X (2013) Model for high-throughput screening of multi-target drugs in chemical neurosciences; synthesis, assay and theoretic study of rasagiline carbamates ACS Chem Neurosci 4:1393–1403 doi:10.1021/cn400111n
28 Marrero-Ponce Y, Castillo-Garit JA, Olazabal E, Serrano HS, Morales A, Castanedo N, Ibarra-Velarde F, Huesca-Guillen A, Sanchez AM, Torrens F, Castro EA (2005) Atom, atom-type and total molecular linear indices as a promising approach for bioorganic and medicinal chemistry: theoretical and experimental assessment of a novel method for virtual screening and rational design of new lead anthelmintic Bioorg Med Chem 13:1005–1020 doi:10.1016/j.bmc.2004.11.040
29 Marrero-Ponce Y, Machado-Tugores Y, Pereira DM, Escario JA, Barrio AG, Nogal-Ruiz JJ, Ochoa C, Aran VJ, Martinez-Fernandez
AR, Sanchez RN, Montero-Torres A, Torrens F, Meneses-Marcel
A (2005) A computer-based approach to the rational discovery of new trichomonacidal drugs by atom-type linear indices Curr Drug Discov Technol 2:245–265 doi:10.2174/157016305775202955
30 Concu R, Dea-Ayuela MA, Perez-Montoto LG, Prado-Prado FJ, Uriarte E, Bolas-Fernandez F, Podda G, Pazos A, Munteanu CR, Ubeira FM, Gonzalez-Diaz H (2009) 3D entropy and moments prediction of enzyme classes and experimental–theoretic study of peptide fingerprints in Leishmania parasites Biochim Biophys Acta 1794:1784–1794 doi:10.1016/j.bbapap.2009.08.020
31 Speck-Planche A, Kleandrova VV, Luan F, Cordeiro MN (2011)
Multi-target drug discovery in anti-cancer therapy: fragment-based
approach toward the design of potent and versatile anti-prostate
cancer agents Bioorg Med Chem 19:6239–6244
doi:10.1016/j.bmc.2011.09.015
32 Tenorio-Borroto E, Rivas CGP, Chagoyan JCV, Castanedo N,
Prado-Prado FJ, Garcia-Mera X, Gonzalez-Diaz H (2012) ANN
multiplexing model of drugs effect on macrophages; theoretical
and flow cytometry study on the cytotoxicity of the anti-microbial
drug G1 in spleen Bioorg Med Chem
doi:10.1016/j.bmc.2012.07.020
33 Hill T, Lewicki P (2006) Statistics: methods and applications: a
comprehensive reference for science, industry and data mining.
StatSoft, Tulsa
34 StatSoft Inc (2002) STATISTICA (data analysis software system),
version 6.0
35 Tabachnick BG, Fidell LS (1996) Using multivariate statistics.
HarperCollins College, New York
36 Duart MJ, García-Domenech R, Anton-Fos GM, Galvez J (2001)
Optimization of a mathematical topological pattern for the
prediction of antihistaminic activity J Comput Aided Mol Des 15:561–
572 doi:10.1023/A:1011115824070
37 Prado-Prado FJ, Escobar M, García-Mera X (2013)
Review of bioinformatics and theoretical studies of
acetylcholinesterase inhibitors Curr Bioinform 8:496–510
doi:10.2174/1574893611308040012
38 García-Domenech R, Zanni R, Galvez-Llompart M, De
Julián-Ortiz JV (2013) Modeling anti-allergic natural compounds by
molecular topology Comb Chem High Throughput Screen 16:628–
635 doi:10.2174/1386207311316080005
39 Helguera AM, Pérez-Garrido A, Gaspar A, Reis J, Cagide F, Vina
D, Cordeiro MNDS, Borges F (2013) Combining QSAR classification models for predictive modeling of human monoamine oxidase inhibitors Eur J Med Chem 59:75–90 doi:10.1016/j.ejmech.2012.10.035
40 Mai Q (2013) A review of discriminant analysis in high dimensions Wiley Interdisciplin Rev Computat Statist 5:190–197 doi:10.1002/wics.1257
41 StatSoft Inc (2001) STATISTICA (data analysis software system)
vs 6.0 StatSoft Inc., Tulsa
42 Gerets HH, Dhalluin S, Atienzar FA (2011) Multiplexing cell viability assays Methods Mol Biol 740:91–101 doi:10.1007/978-1-61779-108-6-11
43 Casañola-Martin GM, Le-Thi-Thu H, Marrero-Ponce Y, Castillo-Garit JA, Torrens F, Perez-Gimenez F, Abad C (2014) Analysis of proteasome inhibition prediction using atom-based quadratic indices enhanced by machine learning classification techniques Lett Drug Des Discov 11:705–711 doi:10.2174/1570180811666140122001144
44 Oprea TI (2002) Current trends in lead discovery: are we looking for the appropriate properties? J Comput Aid Mol Des 16:325–334 doi:10.1023/A:1020877402759
45 Xu J, Hagler A (2002) Chemoinformatics and drug discovery Molecules 7:566–700 doi:10.3390/70800566
46 Seifert HJM, Wolf K, Vitt D (2003) Virtual high-throughput
in silico screening Biosilico 1:143–149 doi:10.1016/S1478-5382(03)02359-X