
DOI 10.1007/s11030-015-9571-9

FULL-LENGTH PAPER

Multi-output model with Box–Jenkins operators of linear indices to predict multi-target inhibitors of ubiquitin–proteasome pathway

Gerardo M. Casañola-Martin · Huong Le-Thi-Thu · Facundo Pérez-Giménez · Yovani Marrero-Ponce · Matilde Merino-Sanjuán · Concepción Abad · Humberto González-Díaz

Received: 11 August 2014 / Accepted: 14 February 2015

© Springer International Publishing Switzerland 2015

Abstract The ubiquitin–proteasome pathway (UPP) plays an important role in the degradation of cellular proteins and in the regulation of cellular processes that include cell cycle control, proliferation, differentiation, and apoptosis. Accordingly, the disruption of proteasome activity leads to different pathological states linked to clinical disorders such as inflammation, neurodegeneration, and cancer. The use of UPP inhibitors is one of the approaches proposed to manage these alterations. On the other hand, the ChEMBL database contains >5,000 experimental outcomes for >2,000 compounds tested as possible proteasome inhibitors using a large number of pharmacological assay protocols. These assays report a large number of experimental parameters of biological activity, such as EC50, IC50, percent of inhibition, and many others, determined under many different conditions, targets, organisms, etc. Although this large amount of data offers new opportunities for the computational discovery of

Electronic supplementary material The online version of this

article (doi:10.1007/s11030-015-9571-9) contains supplementary

material, which is available to authorized users.

G. M. Casañola-Martin (corresponding author) · C. Abad
Departament de Bioquímica i Biologia Molecular, Universitat de València, 46100 Burjassot, Spain
e-mail: gmaikelc@gmail.com; gerardo.casanola@uv.es

G. M. Casañola-Martin · F. Pérez-Giménez
Unidad de Investigación de Diseño de Fármacos y Conectividad Molecular, Departamento de Química Física, Facultad de Farmacia, Universitat de València, Valencia, Spain

G. M. Casañola-Martin
Faculty of Environmental Science, Pontifical Catholic University of Ecuador in Esmeraldas (PUCESE), C/ Espejo y Santa Cruz S/N, 080150 Esmeraldas, Ecuador

H. Le-Thi-Thu
School of Medicine and Pharmacy, Vietnam National University Hanoi (VNU), 144 Xuan Thuy, Cau Giay, Hanoi, Vietnam

proteasome inhibitors, the complexity of these data represents a bottleneck for the development of predictive models. In this work, we used linear molecular indices calculated with the software TOMOCOMD-CARDD and Box–Jenkins moving average operators to develop a multi-output model that can predict outcomes for 20 experimental parameters in >450 assays carried out under different conditions. The resulting multi-output model showed accuracy, sensitivity, and specificity values above 70 % for the training and validation series. Finally, this model is considered multi-target and multi-scale because it predicts the inhibition of the UPP for drugs against 22 molecular or cellular targets of different organisms contained in the ChEMBL database.

Keywords Ubiquitin–proteasome pathway inhibitors · ChEMBL · Multi-target · Multi-scale and multi-output models · Moving averages · QSAR

Y. Marrero-Ponce
Facultad de Química Farmacéutica, Universidad de Cartagena, Cartagena de Indias, Bolivar, Colombia

M. Merino-Sanjuán
Department of Pharmacy and Pharmaceutical Technology, University of Valencia, Valencia, Spain

M. Merino-Sanjuán
Institute of Molecular Recognition and Technological Development (IDM), Inter-Universitary Institute from Polytechnic University of Valencia and University of Valencia, Valencia, Spain

H. González-Díaz (corresponding author)
Department of Organic Chemistry II, University of the Basque Country UPV/EHU, 48940 Leioa, Spain
e-mail: humberto.gonzalezdiaz@ehu.es

H. González-Díaz
IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Spain


Mol Divers

Introduction

The ubiquitin–proteasome pathway is one of the two main proteolytic systems in mammalian cells [1]. This pathway is involved in a great number of cellular processes that include cellular homeostasis, cell cycle control, gene expression, DNA repair, signal transduction, immune responses, and apoptosis [2]. The growing list of human diseases in which protein homeostasis is disrupted reveals the importance of the ubiquitin–proteasome pathway for normal cellular function and its potential as a therapeutic target [3]. The proteasome core has been a primary target in cancer therapy since the discovery of the proteasome inhibitor bortezomib, and at present the development of proteasome inhibitors involves the use of many methods [4]. Current efforts in this field are aimed at finding new drugs against the ubiquitin–proteasome pathway with greater selectivity, potency, and safety, so as to minimize side effects.

In this sense, it is important to develop new in silico models to predict novel, potent, and selective ubiquitin–proteasome pathway inhibitors. This requires compiling large datasets of these compounds from accessible public sources. The ChEMBL database [5,6] (https://www.ebi.ac.uk/chembldb) includes more than 11,420,000 activity data for >1,295,500 compounds and 9,844 targets. This vast quantity of data opens a wide field for the application of computational approaches to activity prediction [6,7]. The analysis of the data is very complex because of the three types of chemical and pharmacological information involved: (1) multi-targeting, (2) multi-outputting, and/or (3) multi-scaling. The multi-targeting aspect emerges from the formation of different pairs of interactions ($I_{qr}$) between drugs ($d_q$) and targets ($t_r$) [8–10]. In our case, the target interactions are represented as networks of nodes (proteins, genes, RNAs, miRNAs) interconnected by a link when there is a target–target interaction between two of them. The multi-output complication comprises the use of different types of targets, assay conditions, assays, organisms, experimental measures, etc., in order to decide whether two nodes (assays, drugs, targets, etc.) are linked ($I_{ij} = 1$) or not ($I_{ij} = 0$). Multi-scaling arises from the different structural levels of the organization of matter, which can be described by different input variables. In this sense, the models need to be multi-scale to collect the information at some of the following levels: molecular structure (drugs), macromolecular structure (molecular targets), cellular (cellular line targets), and organisms (species from which the targets were extracted).

In our previous study, we used MARCH-INSIDE (MI) to obtain the Shannon entropy measures of a molecular graph (G), which we used in turn as inputs for the Box–Jenkins moving average (MA) operators used in time series analysis [11]. MA models gained popularity after the initial work by Box and Jenkins [12] on autoregressive integrated moving average (ARIMA) and similar models. The Box–Jenkins MA operators used in time series are the average values of one characteristic of the system over different intervals of time or seasons. In multi-output modeling, we calculate the MA operators as the average of a property of the system (molecular descriptors or any other property considered) for all drugs or targets with a specific response in one assay carried out under a sub-set of reference conditions ($c_r$). Consequently, our MA operator acts over a sub-set of conditions of the pharmacological assays. MA operators are increasingly applied to domains other than time because of their wide applicability.

In this sense, the main objective of this kind of work is to assess interactions or links between drugs and targets, proteins, brain regions, and other complex systems. For this purpose, it is adequate to use the MA properties of the network nodes (drugs, proteins, reactions, laws, neurons, etc.) that form links ($I_{qr}$) in the specific rth sub-set of reference conditions ($c_r$). For this reason, we decided to call this strategy assessing links with moving averages (ALMA), in a similar manner to other authors for different multi-target and/or multi-output (mo) models [13–15].

The method is very versatile, because we can use molecular descriptors calculated by different chemoinformatics software as input. The software TOMOCOMD-CARDD (TC), developed by Marrero-Ponce et al. [16], is a well-known tool for the calculation of several families of 2D/3D molecular descriptors. In particular, we can use TC to calculate different types of atom-based linear indices $f_q(G, N, M, w)_g$ for a given compound (the qth compound). We can compute these indices for the molecular graph G of the compound, taking into consideration a specific norm (N), a matrix (M), a vector of physicochemical weights (w) for the atoms, etc. In addition, we can determine linear indices for different groups (g) of atoms in the molecule and assign them different values according to the specific molecular fragments selected. Some applications of linear indices include the estimation of chemical, physical, and kinetic properties of compounds [17,18]. The method has also been used to study different biological activities, for example antibacterials [19], tyrosinase inhibitors [20], trypanosomal inhibitors [21], and so on [22]. Besides, the linear indices are very flexible and useful to study different complex systems. The types of complex systems already studied with linear indices include RNA secondary structures [23] and protein stability effects [24].

In a recent work [25], an ALMA model for neuroprotective drugs present in ChEMBL was capable of predicting the $I_{qr}$ of drugs with targets in multi-output tests, taking into account the drug responses. The TOPS-MODE program [26] was used for the parametrization of the structural parameters of the compounds. In a more recent work [27], an ALMA classifier with good performance was found using the MI scheme. Both models were able to predict the links between drugs and targets. However, we did not carry out a formal construction or a comparison of the drug–target networks for the ChEMBL data in the previous papers. In any case, despite the high versatility of entropy measures to codify structural information, there is no report of a multi-target model for drug–target interactions for compounds with inhibitory activity on the ubiquitin–proteasome pathway. Therefore, in this study, we describe for the first time a multi-target, multi-output, and multi-scale ALMA model based on atom-based linear indices for ChEMBL data of ubiquitin–proteasome pathway inhibitory compounds.

Fig. 1 A representative sample of the compounds used in this study together with their ChEMBL codes

Materials and methods

CHEMBL dataset: assembling of training and validation sets

We searched and downloaded from the public database ChEMBL a general data set composed of >5,602 results of multiple assay endpoints [5,6] for UPP inhibitors. The value of the observed (obs) class variable, $A_q(c_r)_{obs} = 1$ (active compound) or $A_q(c_r)_{obs} = 0$ (non-active compound), was assigned to every qth drug biologically assayed under the different conditions $c_r$. The dataset used to train and validate the model includes N = 5,602 statistical cases formed by $N_d$ = 2,954 unique drugs, each of them assayed for at least one out of 20 possible standard type measures, which were determined in at least one out of 474 assays. Each assay involves, in turn, at least one out of 20 protein or cellular targets from seven different organisms. Figure 1 shows some examples of the compounds used in this work. As can be observed, the chemicals with UPP inhibitory activity extracted from the ChEMBL dataset are structurally diverse. In the same way, Supplementary Material 1 lists the SMILES codes of the 2,897 compounds used in this study.

Table 1 Dataset used in this study

As noted above, the total set of statistical cases (5,602) constitutes the whole experimental space used here. When choosing the training and validation sets, we made sure that each one of the different conditions was included in both sets, for both the active and the inactive cases, to guarantee an adequate and representative sample for the training and test sets. Within this constraint, we picked the compounds for the training (T) and validation (CV) sets at random. As shown in Table 1, there are 1,827 active cases and 2,376 inactive ones in the training set (4,203 cases). The validation set consists of 1,399 cases, 607 active and 792 inactive. The cases in the validation set were never used in the development of the ALMA models.
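The sampling rule described above (random selection, with every condition and both activity classes present in the training and validation sets) can be sketched as a stratified split. This is only an illustration, not the authors' procedure: the record layout (`condition` and `active` keys), the function name `stratified_split`, the 25 % hold-out (roughly 1,399 of 5,602 cases), the fixed seed, and the toy data are all assumptions.

```python
import random
from collections import defaultdict


def stratified_split(cases, validation_fraction=0.25, seed=0):
    """Randomly split cases so that every (condition, class) stratum
    with at least two members contributes to both sets."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for case in cases:
        strata[(case["condition"], case["active"])].append(case)

    training, validation = [], []
    for members in strata.values():
        members = list(members)
        rng.shuffle(members)
        if len(members) < 2:
            n_val = 0  # a singleton stratum can only go to training
        else:
            n_val = round(len(members) * validation_fraction)
            n_val = min(max(n_val, 1), len(members) - 1)
        validation.extend(members[:n_val])
        training.extend(members[n_val:])
    return training, validation


# Hypothetical toy data: 4 conditions x 2 classes x 8 cases each.
toy_cases = [
    {"id": f"{c}-{a}-{i}", "condition": c, "active": a}
    for c in ("c1", "c2", "c3", "c4")
    for a in (0, 1)
    for i in range(8)
]
train_set, validation_set = stratified_split(toy_cases)
```

Splitting inside each stratum, rather than over the pooled cases, is what keeps rare conditions from ending up in only one of the two sets.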

Molecular descriptors: TOMOCOMD-CARDD atom-based linear indices

TOMOCOMD-CARDD is a molecular descriptor (MD) calculating program comprised of two suites with parallel functionalities. The first is a comprehensive collection of MD calculating modules based on the so-called “relations frequency matrices,” molecular fingerprints, and a pool of the most relevant MDs reported in the literature. The second suite comprises a set of modules derived from algebraic considerations, collectively known as QuBiLS (acronym for Quadratic, Bilinear and Linear MapS). This suite includes three modules: (1) QuBiLS-MAS (QuBiLS based on Graph–Theoretical Electronic-Density Matrices and Atomic weightingS), (2) QuBiLS-MIDAS (QuBiLS based on MInkowski Distance matrices and Atomic weightingS), and (3) QuBiLS-POMAS (QuBiLS based on molecular surface-based POtential Matrices and Atomic weightingS). In this application, only the QuBiLS-MAS module is included. QuBiLS-MAS constitutes a unique combination of methods to calculate MDs on an algebraic basis. These MDs can be used for a wide range of applications in all areas of chemistry, in particular in drug design, lead compound discovery and optimization, […]ment of compound libraries, and prediction of absorption, distribution, metabolism, excretion and toxicity (ADMET) properties.

In this work, we use the atom-based linear indices calculated with the software TC ver. 1.0 [16] as molecular descriptors $D_{ik}$. For each qth chemical, we calculated the different types of atom-based linear indices $f_q(N, M, w)_g$. The norm selected was the Manhattan distance ($N_1$). The matrix M used was the graph-theoretical electronic-density matrix [called non-stochastic (NS)] [28,29]. The atoms in each molecular structure were differentiated with the following physicochemical weights (w): Ghose–Crippen LogP, electronegativity, and van der Waals volume, which can allow a better understanding of the problem. Moreover, in our study the different groups of atoms calculated for the compounds were H bond acceptors (A), C atoms in aliphatic chains (C), H bond donors (D), C atoms in aromatic portions (P), and heteroatoms (X). The general equation for the definition of the atom-based linear indices is shown below (Eq. 1).

$$f_{qk}(G, N_1, M, w)_g = f_{qk}(w)_g = \sum_{i \in g} \lvert f_i \rvert \qquad (1)$$

Computational methods
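A local linear index of the kind defined in Eq. 1 can be illustrated with a short sketch. It assumes, following the general linear-indices literature rather than anything stated in this section, that the atom-level values of order k are the entries of the vector M^k·w. It also uses a toy adjacency matrix in place of the graph-theoretical electronic-density matrix, and all function names and numbers are hypothetical.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def atom_linear_indices(matrix, weights, order):
    """Atom-level values f_i = (M^order . w)_i; order 0 returns the weights."""
    values = list(weights)
    for _ in range(order):
        values = mat_vec(matrix, values)
    return values


def local_linear_index(matrix, weights, order, group):
    """Manhattan-norm (N1) local index: sum of |f_i| over the atoms in `group`."""
    values = atom_linear_indices(matrix, weights, order)
    return sum(abs(values[i]) for i in group)


# Toy three-atom chain (assumption: plain adjacency matrix, made-up weights).
toy_matrix = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
toy_weights = [2.0, 3.0, 1.0]
terminal_atoms = [0, 2]  # hypothetical atom group g

index_order0 = local_linear_index(toy_matrix, toy_weights, 0, terminal_atoms)
index_order1 = local_linear_index(toy_matrix, toy_weights, 1, terminal_atoms)
```

Restricting the sum to a group of atom positions is what turns a total index into a local one, such as the acceptor (A), donor (D), or heteroatom (X) indices used later in the model.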

A theoretical framework in ALMA models

ALMA models may be classified as a general type of model to assess the links in different systems. These approaches are very adaptable to all molecular descriptors, graph invariants, or descriptors for complex networks. Here we used the linear indices $f_{qk}$ of the kth type for the qth compound represented by a matrix M. The aim of this model is to link the scores $S_q(c_r)$ with the molecular descriptors $D_{ik}$ of a given compound $d_r$ and the deviation terms $\Delta f_{qk}(c_r) = f_{qk} - \langle f_{qk}(c_r) \rangle$. The model has the following general form:

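$$S_q(c_r) = a_0 + \sum_{k} a_k\, f_{qk} + \sum_{r}\sum_{k} b_{rk}\, \Delta f_{qk}(c_r) = a_0 + \sum_{k} a_k\, f_{qk} + \sum_{r}\sum_{k} b_{rk} \left[ f_{qk} - \langle f_{qk}(c_r) \rangle \right] \qquad (2)$$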

The output-dependent variable is $S_q(c_r) = S_q(c_1, c_2, c_3, c_4, c_5, c_6, c_7) = S_q$(measure type, target, target mapping, assay type, data curation, assay protocol, organism). In our case, $S_q(c_r)$ is a numerical score for the effect of the qth compound, defined as $d_r$, in the rth test carried out under the conditions $c_r$. In Eq. (2), $f_{qk}$ and $\langle f_{qk}(c_r) \rangle$ are used as independent variables. The input variable $\langle f_{qk}(c_r) \rangle$ is the mean of the kth descriptor $f_{qk}$ over all the chemicals assayed in one test procedure carried out under the same conditions $c_r$, in the manner of the Box–Jenkins moving average operators proposed previously [12] and used in other successful applications [30–33]. In this MA approach, $\langle f_{qk}(c_r) \rangle$ is obtained by summing the $f_{qk}$ descriptors of the $n_r$ compounds evaluated under the same conditions $c_r$ and then dividing this value by the number of drugs $n_r$, as can be observed in Eq. (3).

$$\langle f_{qk}(c_r) \rangle = \frac{1}{n_r} \sum_{q=1}^{n_r} f_{qk} \qquad (3)$$

Fig. 2 Graphical flowchart of all the steps taken in this work to develop the new ALMA model for UPP inhibitors
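Eq. 3 and the deviation term used in Eq. 2 amount to a group-wise mean followed by a subtraction. The sketch below shows this for a single descriptor; the record layout (one `(condition, value)` pair per assayed compound), the function names, and the numbers are illustrative assumptions, not data from the study.

```python
from collections import defaultdict


def moving_averages(records):
    """Mean descriptor value per condition c_r (the MA operator of Eq. 3)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for condition, value in records:
        totals[condition] += value
        counts[condition] += 1
    return {condition: totals[condition] / counts[condition] for condition in totals}


def deviation(value, condition, averages):
    """Deviation of one compound's descriptor from the mean of its condition."""
    return value - averages[condition]


# Hypothetical descriptor values for compounds assayed under two conditions.
toy_records = [
    ("IC50 (nM)", 4.0),
    ("IC50 (nM)", 6.0),
    ("Ki (nM)", 10.0),
]
toy_averages = moving_averages(toy_records)
toy_deviation = deviation(4.0, "IC50 (nM)", toy_averages)
```

The same compound therefore yields a different deviation for each condition under which it is evaluated, which is how a single descriptor can carry condition-specific information into the model.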

Development and performance of the QSAR ALMA model

In order to assemble the ALMA model, we used the linear discriminant analysis (LDA) technique implemented in the software package STATISTICA 6.0 [34]. This technique is very useful for the task of separating two or more classes, as described in detail in the technical literature [35,36]. The algorithm is capable of finding models and giving as output the prediction of the group membership of new observations. Besides, this technique is one of the most commonly used in drug discovery, with applications to different biological activities that include, among others, theoretical studies of acetylcholinesterase inhibitors, modeling of anti-allergic natural compounds by molecular topology, and predictive modeling of human monoamine oxidase inhibitors [37–39], and more recently to high-dimensional datasets [40]. In our study, the STATISTICA software [41] was used to develop the ALMA model, and performance parameters were considered to assess the quality of the classification functions. In the same way, the number of variables in the models was kept to a minimum, in line with the principle of parsimony (Occam’s razor).

The quality of this ALMA model was determined by examining Wilks’ λ parameter (U statistic), which for the overall discrimination can take values in the range from 0 (perfect discrimination) to 1 (no discrimination). The squared Mahalanobis distance ($D^2$) indicates the separation of the respective groups, showing whether the model possesses an appropriate discriminatory power for differentiating between the two corresponding groups. The Fisher ratio (F), the corresponding p level [p(F)], the accuracy (Ac), specificity (Sp), and sensitivity (Sn) were also used to assess the performance of the ALMA model [33]. Figure 2 depicts the graphical flowchart of all the steps taken in this work to develop a new ALMA model for UPP inhibitors.

Table 2 Results of ALMA models

| Sub-set | Stat^a | % | Groups^b | $I_{qt}(c_r)_{pred} = 1$^c | $I_{qt}(c_r)_{pred} = 0$^d |
|---|---|---|---|---|---|
| Train | Sp | 73.0 | $I_{qt}(c_r)_{obs} = 1$ | 1,007 | 372 |
| | Ac | 71.6 | Total | | |
| CV | Sp | 70.9 | $I_{qt}(c_r)_{obs} = 1$ | 334 | 137 |
| | Sn | 70.6 | $I_{qt}(c_r)_{obs} = 0$ | 273 | 655 |
| | Ac | 70.7 | Total | | |

a Sn = positive correct/positive total, Sp = negative correct/negative total, Ac = total correct/total
b $I_{qt}(c_r)_{obs}$: observed experimental measure of interaction/no interaction (1/0) with the rth target
c Prediction of the experimental measure of interaction with the rth target
d Prediction of the experimental measure of no interaction with the rth target
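The percentages in Table 2 follow from the classification counts by simple ratios, as the table footnote indicates. The sketch below recomputes them from the four CV counts. The function and key names are assumptions, and the rates are reported per observed class without attaching the Sn and Sp labels to them, because the extracted table layout does not make that mapping unambiguous.

```python
def classification_rates(obs1_pred1, obs1_pred0, obs0_pred1, obs0_pred0):
    """Percentage of correct predictions per observed class and overall."""
    n_obs1 = obs1_pred1 + obs1_pred0
    n_obs0 = obs0_pred1 + obs0_pred0
    correct = obs1_pred1 + obs0_pred0
    return {
        "correct_obs1": 100.0 * obs1_pred1 / n_obs1,
        "correct_obs0": 100.0 * obs0_pred0 / n_obs0,
        "accuracy": 100.0 * correct / (n_obs1 + n_obs0),
    }


# CV counts as they appear in Table 2.
cv_rates = classification_rates(334, 137, 273, 655)
cv_rounded = {name: round(value, 1) for name, value in cv_rates.items()}
```

Rounded to one decimal, the three values are 70.9, 70.6, and 70.7 %, matching the CV percentages listed in Table 2.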

Results and discussion

Model training and validation

Here, we report the first ALMA model to predict the experimental measure of interaction with the rth target ($I_{qr} = 1$) or not ($I_{qr} = 0$) when the qth drug presents a value higher than average. The output $S_q(c_r)$ of our ALMA model depends on both the chemical structure of the qth compound and the set of conditions selected to carry out the biological assay ($c_r$). Therefore, different outputs in terms of probabilities should be expected if the test conditions $c_r$ are changed for the same compound [42]. The boundary conditions $c_r$ included here are those defined previously in the “Computational methods” section. As can be noted in Table 2, the values of accuracy, specificity, and sensitivity of the ALMA classification equation for the training and validation sets are above 70 %. These values are considered adequate in bioactivity data modeling studies [38].

The statistical parameters used to measure the quality of the equation were the number of cases used to train the model (N), Chi-square ($\chi^2$), and the p level [33]. The probability cut-off for this LDA model is ${}^{i}p_1(c_j) > 0 \Rightarrow A_i(c_j) = 1$. In the same way, because of the complexity of the molecular descriptors in the equation, a more detailed description of the meaning of the seven variables included in the model is given in Table 3.

Table 3 Variables used as input for the model

| Variable | Symbol | Molecular descriptor details |
|---|---|---|
| f1 | $f_{q1}(N_1, M, e)_A$ | Linear index of order 5 of M calculated for the set of atoms A using e |
| f2 | $f_{q2}(N_1, M, e)_D$ | Linear index of order 0 of M calculated for the set of atoms D using e |
| f3 | $f_{q3}(N_1, M, v)_A$ | Linear index of order 3 of M calculated for the set of atoms A using v |
| f4 | $f_{q4}(N_1, M, v)_D$ | Linear index of order 5 of M calculated for the set of atoms D using v |
| f5 | $f_{q5}(N_1, M, e)_D$ | Linear index of order 4 of M calculated for the set of atoms D using e |
| f6 | $f_{q6}(N_1, M, e)_D$ | Linear index of order 0 of M calculated for the set of atoms D using e |
| f7 | $f_{q7}(N_1, M, e)_X$ | Linear index of order 0 of M calculated for the set of atoms X using e |

M is the graph-theoretical electronic-density matrix. The sets of atoms for local indices are: A, set of H bond acceptor atoms (N, O, F, Cl); D, set of H bond donors (N and O atoms that have one bond with an H atom); and X, heteroatoms (all atoms other than C and H). The weight vectors used to calculate the linear indices were v for atomic van der Waals volumes and e for atomic electronegativities.

In this case, a chemical $d_i$ for which the equation predicts an outcome above zero is expected to give a positive response in the rth test carried out under the conditions $c_r$. The equation of the model is:

$$S_q(c_r) = -0.2159 - 0.0004\, f_1 - 0.2265\, f_2 + 0.0007\, f_3 + 0.0002\, f_4 - 0.0027\, f_5 + 0.1358\, f_6 + 0.0408\, f_7$$

$$N = 4203 \qquad \chi^2 = 838.661 \qquad p < 0.005 \qquad (4)$$

As can be observed from Eq. 4, the parameters f1, f2, and f5 have a negative impact on the activity; these are the boundary conditions related to measure, target, and data curation, respectively. On the other hand, the variables f3, f4, f6, and f7 (mapping, assay type, protocol, and organism) have a positive influence on the activity. Besides, the equation shows which parameters contribute most to the activity. One of them is f6, with a coefficient of 0.1358, which is a very reasonable result because the most important variations in the activity, even for the same compounds, are produced by the different protocols used to quantify it. The same occurs with the f2 parameter, which has a coefficient of −0.2265 in the equation and thus makes a significant negative contribution to the activity.
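Evaluating Eq. 4 for a new case is a single weighted sum followed by the cut-off at zero mentioned in the text. The sketch below uses the published intercept and coefficients. The function names are hypothetical, and the seven terms are passed in exactly as they enter Eq. 4; how each term is obtained from the descriptors and the condition averages is outside this sketch. The input values are made up for illustration.

```python
INTERCEPT = -0.2159
# Coefficients of f1..f7 as printed in Eq. 4.
COEFFICIENTS = (-0.0004, -0.2265, 0.0007, 0.0002, -0.0027, 0.1358, 0.0408)


def alma_score(terms):
    """Score S_q(c_r) of Eq. 4 for the seven input terms f1..f7."""
    if len(terms) != len(COEFFICIENTS):
        raise ValueError("Eq. 4 expects exactly seven terms (f1..f7)")
    return INTERCEPT + sum(c * t for c, t in zip(COEFFICIENTS, terms))


def predict_active(terms):
    """Class assignment with the cut-off at zero: 1 if the score is above 0."""
    return 1 if alma_score(terms) > 0 else 0


# Made-up inputs: all terms zero, then only f6 set to 10.
baseline_terms = [0.0] * 7
shifted_terms = [0.0, 0.0, 0.0, 0.0, 0.0, 10.0, 0.0]

baseline_class = predict_active(baseline_terms)
shifted_class = predict_active(shifted_terms)
```

With all terms at zero the score equals the intercept and the case is classified as inactive; raising only f6 moves the score above the cut-off, in line with the positive weight of that term in Eq. 4.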

To use this model in predictive studies, we only have to substitute into Eq. 4 the value of the molecular descriptor of the compound ($f_{qk}$) and the respective average value of the descriptor for all compounds measured under the same conditions. Examples of these average values for different targets, measures, assays, and organisms are depicted in Table 4. Moreover, Table 1 of the online Supplementary Material 2 lists the values of these parameters for the seven reference conditions.

Table 4 (column headings recovered: experimental measure with $c_1$ and $\langle f_1 \rangle_{av}$; $c_2$ and $\langle f_2 \rangle_{av}$; $c_3$ and $\langle f_3 \rangle_{av}$; $c_6$ and $\langle f_6 \rangle_{av}$; $c_7$ and $\langle f_7 \rangle_{av}$)

In addition, for each boundary condition, different numbers of experimental values are available. As can be observed in Table 4, in the case of the experimental measures, some parameters such as IC50 (nM), EC50 (nM), and Ki (nM) are the most represented, which gives more accuracy when performing predictions. Other parameters, like ratio (M/s) and ED50 (µM), are less represented, but they also add diversity and are representative of other experimental parameters that could be explored by researchers. The same occurs for the other boundary conditions used, illustrating the wide diversity of experimental assays, targets, and organisms for which predictions could be made.

Our current approach versus previous methods

For any modeling study, such as ours on UPP inhibitory activity, adequate performance on the training and test sets should first be demonstrated, and then the model should be compared with previous methodologies. Therefore, we reviewed the literature to search for QSAR studies on UPP inhibitory activity. We found only one study on this topic, with a database consisting of 705 cases (compounds) [43]. In that report, the accuracy values for the training and prediction sets exceed those of our model, with values greater than 85 and 80 %, respectively. However, the dataset of the previous study consists of only one target and one experimental assay under one condition. The techniques used there are based on machine learning algorithms, in contrast to our case, where a simple technique like linear discriminant analysis was used. As mentioned above, the main advantage of our proposed ALMA model is the use of different reference conditions (assays, targets, organisms, etc.), for which a wide variety of predictions could be made.

Conclusions

The ubiquitin–proteasome pathway (UPP) plays a main role in many human pathologies, such as multiple myeloma, neurodegenerative diseases, and others that have a great impact on humankind. However, identifying hit or lead compounds to be introduced into the drug research process by traditional methods is getting more difficult. In this sense, in silico techniques in drug discovery are proposed as one of the solutions that could help this process to become faster and more efficient. New algorithms that involve information technology, sta[…]discovery, can be useful to accelerate the identification of high-quality compounds using minimum resources. Such methods should be successful for the fast and parallel evaluation of huge structural chemical databases [44]. These strategies, which are more efficient, can be used to complement the QSAR models in virtual assays, and the costs of massive screening can be reduced in all respects [45,46].

In this sense, we developed a useful mt-QSAR and mo-QSAR model based on TC atom-based linear indices to fit a large and complex dataset extracted from ChEMBL. This ALMA model, based on multiplexing, was capable of discriminating with good performance under different boundary conditions that include assay conditions, targets, and organisms, among others. Moreover, the influence (positive or negative) of each parameter on the activity was explained in some detail, together with the contribution (high or low) of the different boundary conditions to the UPP inhibitory activity. This allows us to look forward to many new insights in the field of UPP inhibitor research in the years to come, and shows how the combination of molecular descriptors and Box–Jenkins moving average operators helps to develop useful multi-output models. This new type of QSAR model could be used as an innovative technology to increase hit rates in biomolecular screening tasks for the identification of potentially active compounds. Finally, the present report opens new ways to search for drugs that interact with different targets in the UPP, linked to the search for new chemical entities that are active against neurodegenerative diseases, inflammation, or cancer.

Acknowledgments Casañola-Martin, G. M. thanks the program Estades Temporals per a Investigadors Convidats for a fellowship to research at Valencia University (2013–2014). Le-Thi-Thu, H. gratefully acknowledges the support from the Vietnam National University, Hanoi. Marrero-Ponce, Y. thanks the International Professor program for a fellowship to work at Cartagena University in the year 2013–2014. Thanks also go to Prof. Aroa Reguero from the Pontifical Catholic University of Ecuador in Esmeraldas (PUCESE) for her help in the review of the manuscript. Finally, the authors thank the anonymous referees and the editor for their useful comments, which contributed to the improvement of this work.

References

1. Ciechanover A (2005) Proteolysis: from the lysosome to ubiquitin and the proteasome. Nat Rev Mol Cell Biol 6:79–87. doi:10.1038/nrm1552

2. Tu Y, Chen C, Pan J, Xu J, Zhou ZG, Wang CY (2012) The ubiquitin proteasome pathway (UPP) in the regulation of cell cycle control and DNA damage repair and its implication in tumorigenesis. Int J Clin Exp Pathol 5:726–738

3. Zhang J, Wu P, Hu Y (2013) Clinical and marketed proteasome inhibitors for cancer treatment. Curr Med Chem 20:2537–2551.


4. Pevzner Y, Metcalf R, Kantor M, Sagaro D, Daniel K (2013) Recent advances in proteasome inhibitor discovery. Expert Opin Drug Discov 8:537–568. doi:10.1517/17460441.2013.780020

5. Heikamp K, Bajorath J (2011) Large-scale similarity search profiling of ChEMBL compound data sets. J Chem Inf Model 51:1831–1839. doi:10.1021/ci200199u

6. Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, McGlinchey S, Michalovich D, Al-Lazikani B, Overington JP (2012) ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res 40:D1100–D1107. doi:10.1093/nar/gkr777

7. Mok NY, Brenk R (2011) Mining the ChEMBL database: an efficient chemoinformatics workflow for assembling an ion channel-focused screening library. J Chem Inf Model 51:2449–2454. doi:10.1021/ci200260t

8. Hu Y, Bajorath J (2010) Molecular scaffolds with high propensity to form multi-target activity cliffs. J Chem Inf Model 50:500–510. doi:10.1021/ci100059q

9. Erhan D, L’Heureux PJ, Yue SY, Bengio Y (2006) Collaborative filtering on a family of biological targets. J Chem Inf Model 46:626–635. doi:10.1021/ci050367t

10. Namasivayam V, Hu Y, Balfer J, Bajorath J (2013) Classification of compounds with distinct or overlapping multi-target activities and diverse molecular mechanisms using emerging chemical patterns. J Chem Inf Model 53:1272–1281. doi:10.1021/ci400186n

11. Tenorio-Borroto E, Garcia-Mera X, Penuelas-Rivas CG, Vasquez-Chagoyan JC, Prado-Prado FJ, Castanedo N, Gonzalez-Diaz H (2013) Entropy model for multiplex drug–target interaction endpoints of drug immunotoxicity. Curr Top Med Chem 13:1636–1649. doi:10.2174/15680266113139990114

12. Box GEP, Jenkins GM (1970) Time series analysis: forecasting and control. Holden-Day, San Francisco

13. Speck-Planche A, Kleandrova VV, Cordeiro MN (2013) Chemoinformatics for rational discovery of safe antibacterial drugs: simultaneous predictions of biological activity against streptococci and toxicological profiles in laboratory animals. Bioorg Med Chem 21:2727–2732. doi:10.1016/j.bmc.2013.03.015

14. Speck-Planche A, Kleandrova VV, Luan F, Cordeiro MN (2012) Chemoinformatics in multi-target drug discovery for anti-cancer therapy: in silico design of potent and versatile anti-brain tumor agents. Anti-Cancer Agent Med Chem 12:678–685. doi:10.2174/187152012800617722

15. Speck-Planche A, Kleandrova VV, Luan F, Cordeiro MN (2012) Chemoinformatics in anti-cancer chemotherapy: multi-target QSAR model for the in silico discovery of anti-breast cancer agents. Eur J Pharm Sci 47:273–279. doi:10.1016/j.ejps.2012.04.012

16. Marrero-Ponce Y, Valdés-Martini JR, Jacas CRG (2012) TOMOCOMD-CARDD QuBiLS Software QUBILs-MAS Version 1.0, CAMD-BIR Unit, Universidad Central “Marta Abreu” de Las Villas

17. Marrero-Ponce Y, Medina-Marrero R, Castillo-Garit JA, Romero-Zaldivar V, Torrens F, Castro EA (2005) Protein linear indices of the ‘macromolecular pseudograph alpha-carbon atom adjacency matrix’ in bioinformatics. Part 1: prediction of protein stability effects of a complete set of alanine substitutions in Arc repressor. Bioorg Med Chem 13:3003–3015. doi:10.1016/j.bmc.2005.01.062

18. Marrero-Ponce Y, Castillo-Garit JA, Torrens F, Romero-Zaldivar V, Castro E (2004) Atom, atom-type, and total linear indices of the “molecular pseudograph’s atom adjacency matrix”: application to QSPR/QSAR studies of organic compounds. Molecules 9:1100–1123. doi:10.3390/91201100

19. Marrero-Ponce Y, Medina-Marrero R, Martinez Y, Torrens F, Romero-Zaldivar V, Castro EA (2006) Non-stochastic and stochastic linear indices of the molecular pseudograph’s atom adjacency matrix: a novel approach for computational -in silico- screening and “rational” selection of new lead antibacterial agents. J Mol Mod 12:255–271. doi:10.1007/s00894-005-0024-8

20. Rescigno A, Casañola-Martin GM, Sanjust E, Zucca P, Marrero-Ponce Y (2011) Vanilloid derivatives as tyrosinase inhibitors driven by virtual screening-based QSAR models. Drug Test Anal 3:176–181. doi:10.1002/dta.187

21. Vega MC, Montero-Torres A, Marrero-Ponce Y, Rolón M, Gómez-Barrio A, Escario JA, Arán VJ, Nogal JJ, Meneses-Marcel A, Torrens F (2006) New ligand-based approach for the discovery of antitrypanosomal compounds. Bioorg Med Chem Lett 16:1898–1904. doi:10.1016/j.bmcl.2005.12.087

22. Brito-Sánchez Y, Castillo-Garit JA, Le-Thi-Thu H, González-Madariaga Y, Torrens F, Marrero-Ponce Y, Rodríguez-Borges JE (2013) Comparative study to predict toxic modes of action of phenols from molecular structures. SAR QSAR Environ Res 24:235–251. doi:10.1080/1062936x.2013.766260

23. Marrero-Ponce Y, Castillo-Garit JA, Nodarse D (2005) Linear indices of the ‘macromolecular graph’s nucleotides adjacency matrix’ as a promising approach for bioinformatics studies. Part 1: prediction of paromomycin’s affinity constant with HIV-1 psi-RNA packaging region. Bioorg Med Chem 13:3397–3404. doi:10.1016/j.bmc.2005.03.010

24. Marrero-Ponce Y, Medina-Marrero R, Castillo-Garit JA, Romero-Zaldivar V, Torrens F, Castro EA (2005) Protein linear indices of the ‘macromolecular pseudograph α-carbon atom adjacency matrix’ in bioinformatics. Part 1: prediction of protein stability effects of a complete set of alanine substitutions in Arc repressor. Bioorg Med Chem 13:3003–3015. doi:10.1016/j.bmc.2005.01.062

25. Luan F, Cordeiro MN, Alonso N, Garcia-Mera X, Caamano O, Romero-Duran FJ, Yanez M, Gonzalez-Diaz H (2013) TOPS-MODE model of multiplexing neuroprotective effects of drugs and experimental-theoretic study of new 1,3-rasagiline derivatives potentially useful in neurodegenerative diseases. Bioorg Med Chem 21:1870–1879. doi:10.1016/j.bmc.2013.01.035

26. Marzaro G, Chilin A, Guiotto A, Uriarte E, Brun P, Castagliuolo I, Tonus F, Gonzalez-Diaz H (2011) Using the TOPS-MODE approach to fit multi-target QSAR models for tyrosine kinases inhibitors. Eur J Med Chem 46:2185–2192. doi:10.1016/j.ejmech.2011.02.072

27. Alonso N, Caamano O, Romero-Duran FJ, Luan F, Dias Soeiro Cordeiro MN, Yanez M, Gonzalez-Diaz H, Garcia-Mera X (2013) Model for high-throughput screening of multi-target drugs in chemical neurosciences; synthesis, assay and theoretic study of rasagiline carbamates. ACS Chem Neurosci 4:1393–1403. doi:10.1021/cn400111n

28. Marrero-Ponce Y, Castillo-Garit JA, Olazabal E, Serrano HS, Morales A, Castanedo N, Ibarra-Velarde F, Huesca-Guillen A, Sanchez AM, Torrens F, Castro EA (2005) Atom, atom-type and total molecular linear indices as a promising approach for bioorganic and medicinal chemistry: theoretical and experimental assessment of a novel method for virtual screening and rational design of new lead anthelmintic. Bioorg Med Chem 13:1005–1020. doi:10.1016/j.bmc.2004.11.040

29. Marrero-Ponce Y, Machado-Tugores Y, Pereira DM, Escario JA, Barrio AG, Nogal-Ruiz JJ, Ochoa C, Aran VJ, Martinez-Fernandez AR, Sanchez RN, Montero-Torres A, Torrens F, Meneses-Marcel A (2005) A computer-based approach to the rational discovery of new trichomonacidal drugs by atom-type linear indices. Curr Drug Discov Technol 2:245–265. doi:10.2174/157016305775202955

30. Concu R, Dea-Ayuela MA, Perez-Montoto LG, Prado-Prado FJ, Uriarte E, Bolas-Fernandez F, Podda G, Pazos A, Munteanu CR, Ubeira FM, Gonzalez-Diaz H (2009) 3D entropy and moments prediction of enzyme classes and experimental–theoretic study of peptide fingerprints in Leishmania parasites. Biochim Biophys Acta 1794:1784–1794. doi:10.1016/j.bbapap.2009.08.020

31. Speck-Planche A, Kleandrova VV, Luan F, Cordeiro MN (2011) Multi-target drug discovery in anti-cancer therapy: fragment-based approach toward the design of potent and versatile anti-prostate cancer agents. Bioorg Med Chem 19:6239–6244. doi:10.1016/j.bmc.2011.09.015

32. Tenorio-Borroto E, Rivas CGP, Chagoyan JCV, Castanedo N, Prado-Prado FJ, Garcia-Mera X, Gonzalez-Diaz H (2012) ANN multiplexing model of drugs effect on macrophages; theoretical and flow cytometry study on the cytotoxicity of the anti-microbial drug G1 in spleen. Bioorg Med Chem. doi:10.1016/j.bmc.2012.07.020

33. Hill T, Lewicki P (2006) Statistics: methods and applications: a comprehensive reference for science, industry and data mining. StatSoft, Tulsa

34. StatSoft Inc (2002) STATISTICA (data analysis software system), version 6.0

35. Tabachnick BG, Fidell LS (1996) Using multivariate statistics. HarperCollins College, New York

36. Duart MJ, García-Domenech R, Anton-Fos GM, Galvez J (2001) Optimization of a mathematical topological pattern for the prediction of antihistaminic activity. J Comput Aided Mol Des 15:561–572. doi:10.1023/A:1011115824070

37. Prado-Prado FJ, Escobar M, García-Mera X (2013) Review of bioinformatics and theoretical studies of acetylcholinesterase inhibitors. Curr Bioinform 8:496–510. doi:10.2174/1574893611308040012

38. García-Domenech R, Zanni R, Galvez-Llompart M, De Julián-Ortiz JV (2013) Modeling anti-allergic natural compounds by molecular topology. Comb Chem High Throughput Screen 16:628–635. doi:10.2174/1386207311316080005

39. Helguera AM, Pérez-Garrido A, Gaspar A, Reis J, Cagide F, Vina D, Cordeiro MNDS, Borges F (2013) Combining QSAR classification models for predictive modeling of human monoamine oxidase inhibitors. Eur J Med Chem 59:75–90. doi:10.1016/j.ejmech.2012.10.035

40. Mai Q (2013) A review of discriminant analysis in high dimensions. Wiley Interdisciplin Rev Computat Statist 5:190–197. doi:10.1002/wics.1257

41. StatSoft Inc (2001) STATISTICA (data analysis software system) vs 6.0. StatSoft Inc., Tulsa

42. Gerets HH, Dhalluin S, Atienzar FA (2011) Multiplexing cell viability assays. Methods Mol Biol 740:91–101. doi:10.1007/978-1-61779-108-6-11

43. Casañola-Martin GM, Le-Thi-Thu H, Marrero-Ponce Y, Castillo-Garit JA, Torrens F, Perez-Gimenez F, Abad C (2014) Analysis of proteasome inhibition prediction using atom-based quadratic indices enhanced by machine learning classification techniques. Lett Drug Des Discov 11:705–711. doi:10.2174/1570180811666140122001144

44. Oprea TI (2002) Current trends in lead discovery: are we looking for the appropriate properties? J Comput Aid Mol Des 16:325–334. doi:10.1023/A:1020877402759

45. Xu J, Hagler A (2002) Chemoinformatics and drug discovery. Molecules 7:566–700. doi:10.3390/70800566

46. Seifert HJM, Wolf K, Vitt D (2003) Virtual high-throughput in silico screening. Biosilico 1:143–149. doi:10.1016/S1478-5382(03)02359-X
