
All content following this page was uploaded by César Raúl García-Jacas on 05 August 2015


Current Bioinformatics, 2015, 10, 000-000 1

Optimum Search Strategies or Novel 3D Molecular Descriptors: Is there a Stalemate?

1 Unit of Computer-Aided Molecular “Biosilico” Discovery and Bioinformatics Research (CAMD-BIR International), Cartagena de Indias, Bolívar, Colombia

2 Institut Universitari de Ciència Molecular, Universitat de València, Edifici d'Instituts de Paterna, P.O. Box 22085, E-46071, València, Spain

3 Grupo de Investigación en Estudios Químicos y Biológicos, Facultad de Ciencias Básicas, Universidad Tecnológica de Bolívar, Cartagena de Indias, Bolívar, Colombia

4 Grupo de Investigación de Bioinformática, Centro de Estudio de Matemática Computacional (CEMC), Universidad de las Ciencias Informáticas (UCI), La Habana, Cuba

5 Faculty of Computing and Systems, Pontifical University Catholic of Ecuador in Esmeraldas (PUCESE), C/ Espejo y Santa Cruz S/N, 080150 Esmeraldas, Ecuador

6 Laboratorio de Electrónica Molecular, Universidad del Zulia, Facultad Experimental de Ciencias, Departamento de Química, Maracaibo, República Bolivariana de Venezuela

7 Laboratorio de Caracterización Molecular y Biomolecular, Departamento de Investigación en Tecnología de los Materiales y el Ambiente (DITeMA), Instituto Venezolano de Investigaciones Científicas (IVIC), Avenida 74 con calle 14A, Maracaibo, República Bolivariana de Venezuela

8 Departamento de Química, Universidade Federal de Lavras, UFLA, Caixa Postal 3037, 37200-000 Lavras, MG, Brazil

9 School of Medicine and Pharmacy, Vietnam National University, Hanoi (VNU), 144 Xuan Thuy, Cau Giay, Hanoi, Vietnam

Abstract: The present manuscript describes a novel alignment-free 3D-QSAR method (QuBiLS-MIDAS Duplex) based on algebraic bilinear, quadratic and linear forms on the kth two-tuple spatial-(dis)similarity matrix. Generalization schemes for the inter-atomic spatial distance using diverse (dis)-similarity measures are discussed. In addition, normalization approaches for the two-tuple spatial-(dis)similarity matrix using simple-stochastic, double-stochastic and mutual probability schemes are introduced. With the aim of taking into consideration particular inter-atomic interactions in total or local-fragment indices, path and length cut-off constraints are used. Also, in order to generalize the use of the linear combination of atom-level indices to yield global (molecular) definitions, a set of aggregation operators (invariants) is applied. A Shannon’s entropy-based variability study of the proposed 3D algebraic form-based indices and the DRAGON molecular descriptor families demonstrates superior performance for the former. A principal component analysis reveals that the novel indices codify structural information orthogonal to that captured by the DRAGON indices. Finally, a QSAR study of the binding affinity to the corticosteroid-binding globulin using Cramer’s steroid database is performed. This study reveals that the QuBiLS-MIDAS Duplex approach yields performance statistics similar or superior to those of all the 3D-QSAR methods reported in the literature so far, even with fewer degrees of freedom, using both the 31 steroids as the training set and the popular division of Cramer’s database into training [1-21] and test [22-31] sets. It is thus expected that this methodology provides useful tools for the diversity analysis of compound datasets and high-throughput screening structure–activity data.

Keywords: Alignment free method, aggregation operator, Minkowski distance matrix, principal component analysis, QuBiLS-MIDAS, 3D-QSAR, two-tuple spatial-(dis)similarity matrix, TOMOCOMD-CARDD, variability analysis

1 INTRODUCTION

The advent of 3D-QSAR methods represents a fundamental shift from the classical Hansch-Fujita (2D-QSAR) approach, motivated by the rationale that the spatial arrangement of molecular structures plays a determinant role in comprehending the ligand–receptor interactions [1]. Right from the pioneering work by Cramer [2], the 3D-QSAR methods have enjoyed considerable enthusiasm over their capability to adequately model the biological activities of chemical structures. In principle, the 3D-QSAR techniques can be divided into two main groups: alignment-based techniques (CoMFA-related methods) and alignment-independent methods [e.g. CoMASA (Comparative Molecular Active Site Analysis)]. However, the use of 3D-QSAR methods has been far from a fairy tale; several problems have been met. On the one hand, the use of alignment rules comes with a number of challenges, such as their subjectivity, i.e. they are generally inapplicable to structurally diverse datasets (albeit there are works in this sense, e.g. see reference [3]), and the computation of steric and/or electrostatic interaction energies yields numerous variables (a high-dimensional MD space) relative to the dataset size, which usually include noisy variables that tend to compromise the quality of the QSAR models [4-6].

*Address correspondence to this author (Yovani Marrero-Ponce) at the Unit of Computer-Aided Molecular “Biosilico” Discovery and Bioinformatics Research (CAMD-BIR International), Cartagena de Indias, Bolívar, Colombia; Tel: 3043926347; E-mails: ymarrero77@yahoo.es, ymponce@gmail.com

Efforts have been made to address the limitations of 3D-QSAR methods. For example, techniques aimed at addressing the high dimensionality problem include filtering data points prior to QSAR modeling [7, 8], variable selection procedures [5, 9, 10] and grouping points [11]. On the other hand, similarity matrix correlations defined in terms of shape or electrostatic potentials were introduced with the aim of lowering the computational cost of 3D methods, though at the expense of losing significant features of the molecules [12]. Strategies aimed at improving the alignment rules have also been proposed, such as the Monte Carlo algorithm [13] and least squares fitting [14]. Other approaches, such as the hypothetical active site lattice (HASL), convert superposed molecular sets into a lattice of regularly spaced points defined by 3D Cartesian coordinates and atom types [15-17].

On the other hand, rather than improving the alignment rules, several alignment-independent techniques have been proposed, such as the use of 3D models based on Cartesian coordinates [18], molecular transforms [19, 20], molecular spectra [21, 22], as well as the extension of traditional 2D molecular indices to consider 3D information [23-27]. Other alignment-free methods include CoMMA (Comparative Molecular Moment Analysis) [28] and the van der Waals excluded volume [29], among others. These methods are invariant to both translation and rotation of the molecular structures, and have generally yielded results comparable to those of the alignment-based methods.

However, although relentless efforts have been made to improve 3D-QSAR techniques or to provide alternative, robust and computationally cheap ones, improvements in the quality of the 3D-QSAR models have in reality been minimal, either due to the complexity of modeling biological activities or to the weaknesses inherent to the present methods, creating a kind of “out of reach” model performance. So, is it possible to break through these “boundaries”? Looking at the current state of 3D-QSAR modeling in general, the balance of responses to this question may well lean towards the negative. However, our argument is that it is imperative to diversify the space spanned by the 3D molecular parameters, to yield variables that correctly “fit” or adjust to the “troublesome” behavior of the molecules, rather than concentrating almost exclusively on the quest for the correct relationship among variables by using more powerful (linear or non-linear) search strategies and optimization functions.

In previous reports, Marrero-Ponce et al. introduced outstanding features related to the topological (2D) and chiral (2.5D) aspects of the molecules through the atom-based and bond-based TOMOCOMD-CARDD (acronym for Topological Molecular Computer Design – Computer Aided Rational Drug Design) molecular descriptors (MDs) (now condensed in the QuBiLS-MAS module) [30-39]. These MDs codify molecular information by means of the bilinear, quadratic and linear algebraic forms and the graph–theoretical electronic-density matrices. Thus, bearing in mind these successful results and based on the same linear algebraic concepts, this manuscript is dedicated to the definition and generalization of the 3D algebraic-based QuBiLS-MIDAS (acronym for Quadratic, Bilinear and N-Linear Maps based on n-Tuple Spatial Metric [(Dis)-Similarity] Matrices and Atomic WeightingS) Duplex MDs for relations between atom-pairs, which constitute a module of the TOMOCOMD-CARDD framework.

2 THEORETICAL FRAMEWORK

2.1 Bilinear, Quadratic and Linear Form-based Indices for Atom-Level and Total (Whole-Molecule) Definitions

The bilinear, quadratic and linear MDs for each atom “a” are computed as bilinear, quadratic and linear algebraic maps (Eqs 1-3, respectively), where n is the number of atoms of the chemical structure, and $u_1, \ldots, u_n$, $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$ are the coordinates (or components) of the molecular vectors [U], [X] and [Y]. The use of molecular vectors as representations of chemical structures has been explained in detail elsewhere [35-37, 40]. In the present report, the components of these molecular vectors are computed from the following atom- and fragment-based properties (weighting schemes): 1) atomic mass (m), 2) van der Waals volume (v), 3) atomic polarizability (p), 4) atomic Pauling electronegativity (e), 5) atomic Ghose-Crippen LogP (a) [23, 41, 42], 6) atomic Gasteiger-Marsili charge (c) [43], 7) atomic polar surface area (psa) [44], 8) atomic refractivity (r) [23, 41, 42], 9) atomic hardness (h) and 10) atomic softness (s). These properties were implemented in the QuBiLS-MIDAS program [45, 46], mainly using the Chemistry Development Kit (CDK) library [47].

The atom-level spatial-(dis)similarity matrix (SDSM) $\mathbb{G}^{a,k}$ for the atom “a” is obtained from $\mathbb{G}^{k}$ as follows:

$$
g_{ij}^{a,k} =
\begin{cases}
g_{ij}^{k} & \text{if } i = a \wedge j = a \\
\tfrac{1}{2}\, g_{ij}^{k} & \text{if } i = a \vee j = a \\
0 & \text{otherwise}
\end{cases}
\qquad (4)
$$

So, if a molecule is divided into “a” atoms, then the matrix $\mathbb{G}^{k}$ can be decomposed into “a” atom-level matrices $\mathbb{G}^{a,k}$, each of which yields one atom-level index (see Eqs 1-3). In this way, the total (whole-molecule, that is, considering all atoms) bilinear, quadratic and linear indices may be represented as a vector $\vec{L}$ of size n, where each entry is the atom-level index (descriptor) for the atom “a”.

Therefore, from this decomposition, the total bilinear, quadratic and linear indices are calculated as a linear combination (summation) of the atom-level indices (values of the vector $\vec{L}$). Generalizations of this approach using several aggregation operators will be discussed later (see section 2.6). The summation over $\vec{L}$ is equivalent to the matrix products involving $\mathbb{G}^{k}$ and the property vector [Y], analogous to the original approach for 2D global bilinear, quadratic and linear algebraic forms [37, 48, 49], as shown in Eqs 5-7, where [X] and [Y] are column vectors (n×1 matrices), and $[U]^{T}$ and $[X]^{T}$ are the transposes of the vector [U] and the property vector [X], respectively.
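The decomposition in Eq 4 can be checked numerically. The sketch below is illustrative only: it assumes that the total bilinear, quadratic and linear indices take the usual matrix forms $[X]^{T}\mathbb{G}^{k}[Y]$, $[X]^{T}\mathbb{G}^{k}[X]$ and $[U]^{T}\mathbb{G}^{k}[X]$ (with [U] a vector of ones), which is consistent with the transposes mentioned above but is not copied from Eqs 1-3 or 5-7. The function names, the 3-atom matrix and the property values are hypothetical.

```python
import numpy as np

def atom_level_matrix(G, a):
    """Atom-level matrix for atom index a (0-based), following Eq 4."""
    G = np.asarray(G, dtype=float)
    Ga = np.zeros_like(G)
    Ga[a, :] = 0.5 * G[a, :]   # entries in row a get half weight
    Ga[:, a] = 0.5 * G[:, a]   # entries in column a get half weight
    Ga[a, a] = G[a, a]         # the diagonal entry of atom a keeps full weight
    return Ga

# Assumed matrix forms of the three algebraic maps (not taken from the source).
def bilinear(G, x, y):
    return float(x @ G @ y)

def quadratic(G, x):
    return float(x @ G @ x)

def linear(G, x):
    u = np.ones(len(x))        # [U]: vector of ones (assumption)
    return float(u @ G @ x)

# Hypothetical 3-atom symmetric matrix and property vectors (illustrative values).
G = np.array([[2.0, 1.5, 2.4],
              [1.5, 0.0, 1.1],
              [2.4, 1.1, 1.0]])
x = np.array([12.011, 15.999, 1.008])
y = np.array([2.55, 3.44, 2.20])

# Vector L of atom-level quadratic indices and the corresponding total index.
atom_q = np.array([quadratic(atom_level_matrix(G, a), x) for a in range(len(x))])
total_q = quadratic(G, x)
print(atom_q, atom_q.sum(), total_q)
```

Because each off-diagonal entry is split evenly between the two atoms it connects, the atom-level matrices add up to the total matrix, so summing the atom-level indices recovers the whole-molecule index under these assumptions.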

$\mathbb{G}^{k}$ is the kth two-tuple spatial-(dis)similarity matrix for a molecule, which constitutes a generalization of the well-known geometric distance matrix [20, 50]. The geometric distance matrix (or geometry matrix) of a molecule is a square symmetric n×n matrix, where each off-diagonal entry is the Euclidean distance (geometric distance) between the atoms i and j, and the diagonal entries are always zero [12, 20, 50, 51]. In the present report, several approaches are proposed as an extension/generalization of the traditionally used geometric distance matrix. These will be discussed in the next section.

2.2 The Two-Tuple Spatial-(Dis)Similarity Matrix (SDSM) and its Physicochemical Nature

Keen interest in the codification of the geometric and topographic aspects of molecular structures, as a logical extension of the topological representation, can be traced back to the mid-1980s. This approach codifies information related to the molecular geometry represented by a geometric distance matrix [12, 19, 20, 24]. As previously mentioned, the geometric distance matrix uses the Euclidean distance to codify inter-atomic interactions within a molecule.

Formally, let N be a set of elements and D a function defined on pairs of elements of N. If D satisfies properties 1-3, it is called a distance on N, while if D satisfies properties 1-4 it is termed a distance metric. On the other hand, if D satisfies axioms 1, 2 and 4 it is termed a pseudometric, but if D does not satisfy property 4 it is a nonmetric.

To compute the distance between two atoms, the 3D Cartesian coordinates x, y, z are considered. These coordinates are continuous variables, and the Euclidean metric is the most common measure employed to compute the distance for this type of variable. It is striking that, up to now, the Euclidean distance has been considered practically the exclusive inter-atomic metric in the computation of 3D MDs, although no evidence other than intuitive reasoning upholds it as the most suitable distance metric. Therefore, if a molecule is in a Euclidean space, and taking into account the previous distance and metric definitions, it is possible to generalize the distance between the atoms i and j through the Minkowski distance (Eq 8). The resulting “Minkowski distance matrix”, with elements defined in Eq 8, is the more general (extended or expanded) case of the well-known geometric distance matrix (recovered if p = 2). However, numerous metrics have been used successfully in machine learning algorithms and similarity studies [53-55]; these could also be used to compute the inter-atomic dissimilarity, thereby serving as generalization schemes for the spatial distance, $d_{ij}$, used in this report for the computation of the inter-atomic geometric distance.
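As an illustration of this generalization, the following minimal sketch builds a Minkowski distance matrix from 3D Cartesian coordinates using the standard definition $d_{ij} = \left(\sum_{h} |x_{ih} - x_{jh}|^{p}\right)^{1/p}$. This is the textbook formula and is assumed, not confirmed, to match Eq 8; the function name and the three coordinate triples are hypothetical.

```python
import numpy as np

def minkowski_matrix(coords, p):
    """Pairwise Minkowski distances of order p between rows of coords (n x 3)."""
    coords = np.asarray(coords, dtype=float)
    # Absolute coordinate differences for every atom pair, shape (n, n, 3).
    diff = np.abs(coords[:, None, :] - coords[None, :, :])
    if np.isinf(p):
        # Limit p -> infinity: the largest coordinate difference.
        return diff.max(axis=-1)
    return (diff ** p).sum(axis=-1) ** (1.0 / p)

# Hypothetical coordinates of three atoms (illustrative values only).
coords = [[0.0, 0.0, 0.0],
          [1.0, 0.0, 0.0],
          [1.0, 1.0, 0.0]]

D2 = minkowski_matrix(coords, 2)         # Euclidean (geometric distance matrix)
D1 = minkowski_matrix(coords, 1)         # Manhattan / city-block
Dinf = minkowski_matrix(coords, np.inf)  # limit p = infinity
D05 = minkowski_matrix(coords, 0.5)      # fractional order

print(D2[0, 2], D1[0, 2], Dinf[0, 2], D05[0, 2])
```

For orders p < 1 the Minkowski expression no longer obeys the triangle inequality (in this example the value for atoms 1 and 3 exceeds the sum of the values for pairs 1-2 and 2-3), so such orders behave as dissimilarity measures and not as metrics in the strict sense.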

So, why use diverse (dis)-similarity metrics? The values obtained from different metrics may exhibit a high degree of correlation, as an indication of the similarity between the objects under study, as shown by Holliday et al. in a comparative analysis of the Cosine and Tanimoto coefficients [56]. Conversely, if these values show low correlations among them, this may reflect very different features among the objects being compared. Therefore, it must not be assumed as a premise that any single “best” distance metric exists, even if this report is only addressed to the domain of chemical structure handling. In fact, as noted by Jones and Curtice [57] in a debate regarding the association between indexing terms in information retrieval systems: “What is annoying is that no clear-cut criterion for choice among the alternatives has emerged. As a result, few candidate measures have been permanently dismissed from consideration, and a rather large set of formulas remains available.” Accordingly, there is a continuing interest in analyzing and comparing the available coefficients (metrics) in order to ensure that the most suitable one(s) are used in any concrete similarity-based system. In conclusion, the use of several (dis)-similarity metrics is necessary because they have some degree of orthogonality; thus, the corresponding MDs will carry independent information, which will also be highly complementary because each metric reflects very different characteristics of atom-pairs in a molecule.

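On the other hand, with the aim of taking into account close and distant inter-atomic interactions within the molecular skeleton, we adapt a generalized expression for the (dis)-similarity between the atoms i and j (see Table 1). Furthermore, to achieve greater discrimination of molecular structures, the diagonal entries can be assigned one of two different values: 1) the number of lone-pair electrons of each atom, or 2) the (dis)-similarity between each atom and the center of the molecule. The matrix elements are defined as follows:

$$
g_{ij}^{1} =
\begin{cases}
D_{ij} & \text{if } i \neq j \wedge i, j \text{ are atoms of the molecule} \\
L_{ij} \ (\text{or } D_{io}) & \text{if } i = j \wedge \text{lone-pairs are considered} \\
0 & \text{otherwise}
\end{cases}
$$

where $D_{ij}$ is the (dis)-similarity between the atoms i and j, and the diagonal entry is either 1) $L_{ij}$, the number of lone-pair electrons on the atom i, or 2) the (dis)-similarity between atom i and the center of the molecule, $g_{io}$ ($D_{io}$).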

Table 1. Metrics used to compute the “distance” between two atoms of a molecule

Minkowski (M1-M7): p = 0.25, 0.5, 1, 1.5, 2, 2.5, 3 and ∞ [where p = 1 gives the Manhattan, city-block or taxi distance (also known as the Hamming distance between binary vectors) and p = 2 gives the Euclidean distance]

The kth power of the matrix is computed in such a way that the elements of the matrix $\mathbb{G}^{k}$ are equal to the elements of $\mathbb{G}^{1}$ raised to the power k. The non-stochastic two-tuple spatial-(dis)similarity matrix is thus a generalized matrix, because it is determined through the Hadamard product, that is, by raising the elements of the matrix to different real powers [20]. However, generalized reciprocal matrices, where k takes negative values (k ≤ -1), are also employed as matrix forms. That is to say, the matrices employed in this report are calculated by raising the matrix coefficients to both positive and negative exponents. When the matrix exponent is negative and the number of lone pairs of each atom i in the molecule is selected as the diagonal element, the reciprocal is not applied to the diagonal. Nonetheless, the reciprocal is computed if the (dis)-similarity between each atom i and the center of the molecule is chosen as the diagonal coefficient.
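A minimal sketch of these element-wise (Hadamard) powers is given below. Two points are assumptions made here for illustration and are not stated in the text: that "the reciprocal is not applied" means lone-pair diagonal entries are raised to |k| when k is negative, and that zero entries are left at zero when a reciprocal would otherwise be required. The function name and the 2×2 matrices are hypothetical.

```python
import numpy as np

def hadamard_power(G1, k, diagonal="lone_pairs"):
    """Element-wise k-th power of the first-order matrix G1.

    diagonal: "lone_pairs" (diagonal holds lone-pair counts) or
              "center" (diagonal holds atom-to-center (dis)similarities).
    """
    G1 = np.asarray(G1, dtype=float)
    if k >= 0:
        # Positive powers (and k = 0, which gives a matrix of ones).
        return G1 ** k
    Gk = np.zeros_like(G1)
    nonzero = G1 != 0
    # Reciprocal powers; zero entries stay zero (assumption).
    Gk[nonzero] = G1[nonzero] ** k
    if diagonal == "lone_pairs":
        # Assumption: no reciprocal on the lone-pair diagonal, so use |k|.
        np.fill_diagonal(Gk, np.diag(G1) ** abs(k))
    return Gk

# Hypothetical 2-atom matrices (illustrative values only).
G_lp = np.array([[3.0, 2.0],
                 [2.0, 0.0]])      # diagonal: lone-pair counts
G_ctr = np.array([[1.0, 2.0],
                  [2.0, 4.0]])     # diagonal: distance to molecular center

print(hadamard_power(G_lp, 2))
print(hadamard_power(G_lp, -1, diagonal="lone_pairs"))
print(hadamard_power(G_ctr, -1, diagonal="center"))
```

Note that these are not matrix-multiplication powers: the k = 2 matrix simply holds the squares of the k = 1 entries, which agrees with the values listed in Table 2 (for example, 1.111 squared is approximately 1.235).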

The use of these matrix powers and their corresponding reciprocals for computing the bilinear, quadratic and linear indices is based on the physicochemical nature of distinct non-covalent interactions, such as van der Waals terms, gravitational interactions and the Coulomb potential, which are associated with the powers of the matrix coefficients, where k = 0, ±1, ±2, ±3, …, ±12. These exponents take into account the different interactions between atoms in a molecule; for example, particular values of k correspond to Coulombic and/or gravitational interactions. The maximum k value, ±12, is related to non-bonded (mainly steric) interactions associated with the functional form of the Lennard-Jones 6-12 potential, as in most CoMFA-like studies.

2.3 Normalization Formalisms based on Simple-Stochastic, Double-Stochastic and Mutual Probability Schemes

Matrices constitute the most common mathematical representation to codify structural information of molecules [20]. Of particular interest are the matrices related to molecular geometry, such as the geometry matrix, the molecular influence matrix and others, which serve as a starting point for the calculation of many 3D MDs. However, probabilistic transformations of such matrices are rarely used. As exceptions, stochastic matrices are defined in the framework of the MARCH-INSIDE descriptors [58, 59], the TOMOCOMD-CARDD 2D descriptors (now condensed in the QuBiLS-MAS module of the TOMOCOMD-CARDD software) [33, 60] and walk counts (random walk Markov matrix). In addition, Carbó-Dorca [61] employed a stochastic scaling by means of a simple stochastic transformation. This transformation was applied to Quantum Similarity Matrices (QSM), providing a stochastic QSM. In these methods a simple stochastic scaling has been employed, where the sum of the coefficients of each row is used as a scale factor. In this way, unsymmetrical matrices whose rows can be interpreted as discrete probability distributions are created.

Formally, stochastic matrices are square matrices of non-negative real numbers in which each column sum (left stochastic matrices) or each row sum (right stochastic matrices) is equal to 1, so that the coefficients of each column or row can be interpreted as probabilities [62]. In contrast, the MDs defined to date do not use the double-stochastic matrix, which is a stochastic matrix where the elements of each column and each row sum to 1.

Starting from the non-stochastic two-tuple spatial-(dis)similarity matrix, $\mathbb{G}_{ns}^{k}$ (NS-SDSM), three probability schemes are applied. These schemes are associated with inter-atomic interactions in the chemical structure. For the TOMOCOMD-CARDD 2D and 2.5D indices (QuBiLS-MAS program), the stochastic graph–theoretical electronic-density matrix for a molecule describes changes in the electron distribution over time throughout the molecular backbone. In this scheme, a hypothetical case is considered in which a set of atoms is initially free in space (discrete objects in space). Later, the outer-shell electrons of the atoms are distributed around the atomic cores in discrete time intervals. In this sense, the electrons of an arbitrary atom can move to other atoms at different discrete time periods throughout the chemical-bonding framework. In the geometrical approach, this matrix can be interpreted as the change in the probability of atoms in a molecule interacting with each other. Consequently, this probability can be considered a measure of the spreading of the atoms (taken as discrete objects) in space.

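In this sense, the kth simple-stochastic two-tuple spatial-(dis)similarity matrix, $\mathbb{G}_{ss}^{k}$ (SS-SDSM), has been defined, whose elements are computed as

$$
{}^{ss}g_{ij}^{k} = \frac{g_{ij}^{k}}{\sum_{j=1}^{n} g_{ij}^{k}}
$$

where the denominator (the sum of the entries of row i) is called the k-order spatial-(dis)-similarity vertex degree of atom i (see Schemes 1 and 2).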
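The simple-stochastic scaling can be sketched in a few lines. The example below divides each row by its sum and checks the result against the first row of the k = 1 NS-SDSM and SS-SDSM listed in Table 2; the function name is hypothetical, and leaving all-zero rows unchanged is an assumption of this sketch.

```python
import numpy as np

def simple_stochastic(G):
    """Divide every row of G by its row sum (the vertex degree of atom i)."""
    G = np.asarray(G, dtype=float)
    row_sums = G.sum(axis=1, keepdims=True)
    out = np.zeros_like(G)
    # Rows that sum to zero are left as zeros (assumption).
    np.divide(G, row_sums, out=out, where=row_sums != 0)
    return out

# First row of the k = 1 NS-SDSM from Table 2, treated as a 1 x 12 matrix.
ns_row = np.array([[0.000, 1.111, 1.957, 2.958, 2.283, 3.797,
                    1.977, 1.083, 1.672, 1.630, 1.064, 2.173]])
# Corresponding SS-SDSM row reported in Table 2 (three decimals).
ss_row_table = np.array([[0.000, 0.051, 0.090, 0.136, 0.105, 0.175,
                          0.091, 0.050, 0.077, 0.075, 0.049, 0.100]])

ss_row = simple_stochastic(ns_row)
print(np.round(ss_row, 3))

# Small hypothetical symmetric matrix: the scaled result is no longer symmetric.
G = np.array([[0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0],
              [3.0, 1.0, 2.0]])
SS = simple_stochastic(G)
print(SS)
```

Each row of the scaled matrix sums to 1, while the entries (i, j) and (j, i) generally differ because the two rows are divided by different vertex degrees.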

However, this matrix is not necessarily symmetrical, in that the probability for atom i to interact with atom j is different from the probability for atom j to interact with atom i. With the purpose of equalizing the probabilities in both senses, a double-stochastic matrix is used, defined as a matrix with real non-negative entries whose column and row sums are all equal to 1. This yields the double-stochastic two-tuple spatial-(dis)similarity matrix, $\mathbb{G}_{ds}^{k}$ (DS-SDSM). Obtaining the double-stochastic matrix associated with a non-double-stochastic matrix is not trivial. Sinkhorn showed that a strictly positive matrix A can be scaled to a double-stochastic matrix B by [63]:

$$
B = D_{1} A D_{2}
$$

where $D_{1}$ and $D_{2}$ are diagonal matrices with strictly positive diagonal entries. Sinkhorn and Knopp extended this theorem to consider non-negative matrices and proposed a well-known iterative algorithm for matrix balancing, named after the authors [64]. In this sense, a DS-SDSM is obtained using equation 11 and the Sinkhorn-Knopp algorithm.
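The Sinkhorn-Knopp iteration itself is short: rows and columns are normalized alternately until both sets of sums are close to 1. The sketch below is a generic implementation of that idea and is not taken from the QuBiLS-MIDAS software; the function name, tolerance and iteration cap are hypothetical, and the input is assumed to have no all-zero row or column. The 4×4 example reuses the k = 1 NS-SDSM entries of the first four atoms in Table 2.

```python
import numpy as np

def sinkhorn_knopp(A, tol=1e-10, max_iter=10000):
    """Scale a non-negative square matrix towards a double-stochastic one."""
    B = np.asarray(A, dtype=float).copy()
    for _ in range(max_iter):
        B /= B.sum(axis=1, keepdims=True)   # make every row sum to 1
        B /= B.sum(axis=0, keepdims=True)   # make every column sum to 1
        # Columns are exact after the last step, so only rows need checking.
        if np.allclose(B.sum(axis=1), 1.0, rtol=0.0, atol=tol):
            break
    return B

# Upper-left 4 x 4 block of the k = 1 NS-SDSM in Table 2 (atoms 1-4).
A = np.array([[0.000, 1.111, 1.957, 2.958],
              [1.111, 0.000, 1.074, 1.925],
              [1.957, 1.074, 0.000, 1.065],
              [2.958, 1.925, 1.065, 0.000]])

DS = sinkhorn_knopp(A)
print(np.round(DS, 3))
print(DS.sum(axis=0), DS.sum(axis=1))

# For k = 0 the NS-SDSM is a 12 x 12 matrix of ones, giving entries of 1/12.
DS0 = sinkhorn_knopp(np.ones((12, 12)))
print(round(DS0[0, 0], 3))
```

For a symmetric input the balanced matrix is again symmetric (up to the convergence tolerance), which is the property used above to equalize the interaction probabilities in both senses.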


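In addition, the mutual probability two-tuple spatial-(dis)similarity matrix, $\mathbb{G}_{mp}^{k}$ (MP-SDSM), is introduced. The elements of this matrix are computed as

$$
{}^{mp}g_{ij}^{k} = \frac{g_{ij}^{k}}{S}, \qquad S = \sum_{i=1}^{n}\sum_{j=1}^{n} g_{ij}^{k}
$$

where $g_{ij}^{k}$ is the NS-SDSM entry for the atoms i and j, and S is the sample space. The sample space is computed as the sum of all the elements of the NS-SDSM. Note that, while the simple-stochastic probability scheme has been previously used in the TOMOCOMD-CARDD approach [33, 60], the double-stochastic and mutual probability schemes are presented for the first time as alternative normalization strategies. Scheme 1 shows the steps followed in the computation of the NS-, SS-, DS- and MP-SDSMs.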
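The mutual probability scaling can be sketched as a division of every entry by the sum of all entries. This reading is an assumption, chosen because it reproduces the k = 0 values in Table 2 (1/144 ≈ 0.007 for a 12-atom molecule); the function name and the small example matrix are hypothetical.

```python
import numpy as np

def mutual_probability(G):
    """Divide every entry of G by the sum of all entries (the sample space S)."""
    G = np.asarray(G, dtype=float)
    S = G.sum()
    if S == 0:
        # Degenerate all-zero matrix: return it unchanged (assumption).
        return G.copy()
    return G / S

# k = 0 case for a 12-atom molecule: NS-SDSM is a matrix of ones.
MP0 = mutual_probability(np.ones((12, 12)))
print(round(MP0[0, 0], 3))

# Small hypothetical symmetric matrix (illustrative values only).
G = np.array([[0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0],
              [3.0, 1.0, 2.0]])
MP = mutual_probability(G)
print(MP, MP.sum())
```

Unlike the simple-stochastic scaling, this normalization uses a single scale factor for the whole matrix, so a symmetric input stays symmetric and all entries together sum to 1.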

In order to illustrate the calculation of these matrix approaches, the molecular structure of (E)-3-(4,5-dihydrooxazol-4-yl)-2-fluoro-3-(methylthio)acrylonitrile is considered. Table 2 depicts the zero (k = 0), first (k = 1), second (k = 2) and third (k = 3) powers of the NS-, SS-, DS- and MP-SDSMs for this molecular structure. An example of the computation of the atom-level SDSM matrix is shown in

2.4 Local-Fragment (Group, Atom-Type) Bilinear, Quadratic and Linear Algebraic Indices

In addition, the proposed matrices ($\mathbb{G}_{ns}^{k}$, $\mathbb{G}_{ss}^{k}$, $\mathbb{G}_{ds}^{k}$ and $\mathbb{G}_{mp}^{k}$) can be used to codify information on a specific molecular fragment (F) of the molecule. Therefore, an SDSM for the molecular fragment F, $\mathbb{G}_{F}^{k}$, is obtained from the total matrix $\mathbb{G}^{k}$.

Similar to the total atom-level indices (see Eqs 1-3), the local-fragment two-tuple atom-level indices are computed as a vector in which each entry is the value of a local-fragment index according to the atom “a” considered. The definition of these indices is as follows:

Table 2. A) Chemical structure of (E)-3-(4,5-dihydrooxazol-4-yl)-2-fluoro-3-(methylthio)acrylonitrile and its labeled molecular scaffold. B), C), D) and E) The zero (k = 0), first (k = 1), second (k = 2) and third (k = 3) powers of the non-stochastic (NS), simple-stochastic (SS), double-stochastic (DS) and mutual probability (MP) spatial-(dis)similarity matrices (SDSM) of the molecule. Each line below is one matrix row (atoms 1-12); the 12 values before the bar belong to the first matrix named in the block heading and the 12 values after it to the second.

A) [Chemical structure drawing]

B) NS-, SS-, DS- and MP-SDSM for k = 0

NS-SDSM | SS-SDSM
1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083
1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083
1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083
1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083
1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083
1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083
1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083
1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083
1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083
1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083
1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083
1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083

DS-SDSM | MP-SDSM
0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 | 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007
0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 | 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007
0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 | 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007
0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 | 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007
0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 | 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007
0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 | 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007
0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 | 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007
0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 | 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007
0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 | 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007
0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 | 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007
0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 | 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007
0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 | 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007 0.007

C) NS-, SS-, DS- and MP-SDSM for k = 1

NS-SDSM | SS-SDSM
0.000 1.111 1.957 2.958 2.283 3.797 1.977 1.083 1.672 1.630 1.064 2.173 | 0.000 0.051 0.090 0.136 0.105 0.175 0.091 0.050 0.077 0.075 0.049 0.100
1.111 0.000 1.074 1.925 1.798 2.731 1.258 1.847 2.498 2.400 1.739 2.145 | 0.054 0.000 0.052 0.094 0.088 0.133 0.061 0.090 0.122 0.117 0.085 0.105
1.957 1.074 0.000 1.065 0.963 1.928 2.021 2.257 2.807 2.829 2.411 3.154 | 0.087 0.048 0.000 0.047 0.043 0.086 0.090 0.100 0.125 0.126 0.107 0.140
2.958 1.925 1.065 0.000 1.640 0.863 2.376 3.310 3.853 3.823 3.362 3.694 | 0.102 0.067 0.037 0.000 0.057 0.030 0.082 0.115 0.133 0.132 0.116 0.128
2.283 1.798 0.963 1.640 3.000 2.400 2.928 2.137 2.506 2.726 2.626 3.937 | 0.079 0.062 0.033 0.057 0.104 0.083 0.101 0.074 0.087 0.094 0.091 0.136
3.797 2.731 1.928 0.863 2.400 1.000 2.937 4.169 4.705 4.650 4.174 4.286 | 0.101 0.073 0.051 0.023 0.064 0.027 0.078 0.111 0.125 0.124 0.111 0.114
1.977 1.258 2.021 2.376 2.928 2.937 2.000 2.919 3.599 3.358 2.513 1.379 | 0.068 0.043 0.069 0.081 0.100 0.100 0.068 0.100 0.123 0.115 0.086 0.047
1.083 1.847 2.257 3.310 2.137 4.169 2.919 0.000 1.022 1.618 1.663 3.209 | 0.043 0.073 0.089 0.131 0.085 0.165 0.116 0.000 0.040 0.064 0.066 0.127
1.672 2.498 2.807 3.853 2.506 4.705 3.599 1.022 2.000 0.963 1.567 3.743 | 0.054 0.081 0.091 0.125 0.081 0.152 0.116 0.033 0.065 0.031 0.051 0.121
1.630 2.400 2.829 3.823 2.726 4.650 3.358 1.618 0.963 0.000 0.915 3.362 | 0.058 0.085 0.100 0.135 0.096 0.164 0.119 0.057 0.034 0.000 0.032 0.119
1.064 1.739 2.411 3.362 2.626 4.174 2.513 1.663 1.567 0.915 1.000 2.480 | 0.042 0.068 0.094 0.132 0.103 0.164 0.099 0.065 0.061 0.036 0.039 0.097
2.173 2.145 3.154 3.694 3.937 4.286 1.379 3.209 3.743 3.362 2.480 0.000 | 0.065 0.064 0.094 0.110 0.117 0.128 0.041 0.096 0.112 0.100 0.074 0.000

DS-SDSM | MP-SDSM
0.000 0.075 0.117 0.132 0.105 0.129 0.089 0.059 0.073 0.078 0.057 0.085 | 0.000 0.003 0.006 0.009 0.007 0.011 0.006 0.003 0.005 0.005 0.003 0.007
0.075 0.000 0.067 0.090 0.087 0.097 0.060 0.105 0.115 0.120 0.097 0.087 | 0.003 0.000 0.003 0.006 0.005 0.008 0.004 0.006 0.008 0.007 0.005 0.006
0.117 0.067 0.000 0.044 0.041 0.060 0.085 0.114 0.114 0.125 0.119 0.114 | 0.006 0.003 0.000 0.003 0.003 0.006 0.006 0.007 0.008 0.008 0.007 0.009
0.132 0.090 0.044 0.000 0.052 0.020 0.074 0.124 0.116 0.126 0.124 0.099 | 0.009 0.006 0.003 0.000 0.005 0.003 0.007 0.010 0.012 0.011 0.010 0.011
0.105 0.087 0.041 0.052 0.099 0.058 0.094 0.083 0.078 0.093 0.100 0.109 | 0.007 0.005 0.003 0.005 0.009 0.007 0.009 0.006 0.008 0.008 0.008 0.012
0.129 0.097 0.060 0.020 0.058 0.018 0.070 0.119 0.108 0.117 0.117 0.088 | 0.011 0.008 0.006 0.003 0.007 0.003 0.009 0.013 0.014 0.014 0.013 0.013
0.089 0.060 0.085 0.074 0.094 0.070 0.063 0.111 0.110 0.112 0.094 0.038 | 0.006 0.004 0.006 0.007 0.009 0.009 0.006 0.009 0.011 0.010 0.008 0.004
0.059 0.105 0.114 0.124 0.083 0.119 0.111 0.000 0.038 0.065 0.075 0.105 | 0.003 0.006 0.007 0.010 0.006 0.013 0.009 0.000 0.003 0.005 0.005 0.010
0.073 0.115 0.114 0.116 0.078 0.108 0.110 0.038 0.060 0.031 0.057 0.099 | 0.005 0.008 0.008 0.012 0.008 0.014 0.011 0.003 0.006 0.003 0.005 0.011
0.078 0.120 0.125 0.126 0.093 0.117 0.112 0.065 0.031 0.000 0.036 0.097 | 0.005 0.007 0.008 0.011 0.008 0.014 0.010 0.005 0.003 0.000 0.003 0.010
0.057 0.097 0.119 0.124 0.100 0.117 0.094 0.075 0.057 0.036 0.044 0.080 | 0.003 0.005 0.007 0.010 0.008 0.013 0.008 0.005 0.005 0.003 0.003 0.007
0.085 0.087 0.114 0.099 0.109 0.088 0.038 0.105 0.099 0.097 0.080 0.000 | 0.007 0.006 0.009 0.011 0.012 0.013 0.004 0.010 0.011 0.010 0.007 0.000

D) NS-, SS-, DS- and MP-SDSM for k = 2

NS-SDSM | SS-SDSM
0.000 1.235 3.829 8.749 5.212 14.419 3.909 1.172 2.796 2.655 1.132 4.723 | 0.000 0.025 0.077 0.176 0.105 0.289 0.078 0.024 0.056 0.053 0.023 0.095
1.235 0.000 1.153 3.706 3.231 7.460 1.583 3.411 6.242 5.758 3.025 4.601 | 0.030 0.000 0.028 0.090 0.078 0.180 0.038 0.082 0.151 0.139 0.073 0.111
3.829 1.153 0.000 1.134 0.927 3.719 4.083 5.095 7.880 8.004 5.812 9.947 | 0.074 0.022 0.000 0.022 0.018 0.072 0.079 0.099 0.153 0.155 0.113 0.193
8.749 3.706 1.134 0.000 2.688 0.745 5.646 10.959 14.843 14.619 11.305 13.648 | 0.099 0.042 0.013 0.000 0.031 0.008 0.064 0.124 0.169 0.166 0.128 0.155
5.212 3.231 0.927 2.688 9.000 5.760 8.572 4.568 6.279 7.431 6.895 15.499 | 0.069 0.042 0.012 0.035 0.118 0.076 0.113 0.060 0.083 0.098 0.091 0.204
14.419 7.460 3.719 0.745 5.760 1.000 8.624 17.382 22.133 21.622 17.419 18.373 | 0.104 0.054 0.027 0.005 0.042 0.007 0.062 0.125 0.160 0.156 0.126 0.133
3.909 1.583 4.083 5.646 8.572 8.624 4.000 8.518 12.952 11.279 6.317 1.900 | 0.051 0.020 0.053 0.073 0.111 0.111 0.052 0.110 0.167 0.146 0.082 0.025
1.172 3.411 5.095 10.959 4.568 17.382 8.518 0.000 1.044 2.617 2.767 10.296 | 0.017 0.050 0.075 0.162 0.067 0.256 0.126 0.000 0.015 0.039 0.041 0.152
2.796 6.242 7.880 14.843 6.279 22.133 12.952 1.044 4.000 0.927 2.456 14.013 | 0.029 0.065 0.082 0.155 0.066 0.232 0.136 0.011 0.042 0.010 0.026 0.147
2.655 5.758 8.004 14.619 7.431 21.622 11.279 2.617 0.927 0.000 0.837 11.301 | 0.031 0.066 0.092 0.168 0.085 0.248 0.130 0.030 0.011 0.000 0.010 0.130
1.132 3.025 5.812 11.305 6.895 17.419 6.317 2.767 2.456 0.837 1.000 6.150 | 0.017 0.046 0.089 0.174 0.106 0.268 0.097 0.042 0.038 0.013 0.015 0.094
4.723 4.601 9.947 13.648 15.499 18.373 1.900 10.296 14.013 11.301 6.150 0.000 | 0.043 0.042 0.090 0.124 0.140 0.166 0.017 0.093 0.127 0.102 0.056 0.000

DS-SDSM | MP-SDSM
0.000 0.060 0.134 0.163 0.123 0.166 0.090 0.036 0.057 0.060 0.035 0.076 | 0.000 0.001 0.004 0.009 0.005 0.015 0.004 0.001 0.003 0.003 0.001 0.005
0.060 0.000 0.045 0.077 0.086 0.096 0.041 0.116 0.144 0.145 0.105 0.083 | 0.001 0.000 0.001 0.004 0.003 0.008 0.002 0.004 0.007 0.006 0.003 0.005
0.134 0.045 0.000 0.017 0.018 0.035 0.076 0.125 0.131 0.145 0.145 0.129 | 0.004 0.001 0.000 0.001 0.001 0.004 0.004 0.005 0.008 0.008 0.006 0.010
0.163 0.077 0.017 0.000 0.027 0.004 0.055 0.142 0.130 0.140 0.150 0.094 | 0.009 0.004 0.001 0.000 0.003 0.001 0.006 0.012 0.016 0.015 0.012 0.014
0.123 0.086 0.018 0.027 0.116 0.036 0.107 0.075 0.070 0.091 0.116 0.136 | 0.005 0.003 0.001 0.003 0.009 0.006 0.009 0.005 0.007 0.008 0.007 0.016
0.166 0.096 0.035 0.004 0.036 0.003 0.052 0.139 0.120 0.128 0.143 0.078 | 0.015 0.008 0.004 0.001 0.006 0.001 0.009 0.018 0.023 0.023 0.018 0.019
0.090 0.041 0.076 0.055 0.107 0.052 0.049 0.136 0.141 0.134 0.103 0.016 | 0.004 0.002 0.004 0.006 0.009 0.009 0.004 0.009 0.014 0.012 0.007 0.002
0.036 0.116 0.125 0.142 0.075 0.139 0.136 0.000 0.015 0.041 0.060 0.116 | 0.001 0.004 0.005 0.012 0.005 0.018 0.009 0.000 0.001 0.003 0.003 0.011
0.057 0.144 0.131 0.130 0.070 0.120 0.141 0.015 0.039 0.010 0.036 0.107 | 0.003 0.007 0.008 0.016 0.007 0.023 0.014 0.001 0.004 0.001 0.003 0.015


and column “j” of the local-fragment two-tuple atom-level

matrix,  𝔾𝔾!!,!, according to the atom “a” This matrix is

computed for each atom of the molecule from the

the distances between each atom-pair belonging to the

𝐿𝐿

and linear indices for atom-types or groups (see Schemes 1 and 2). In this report, these local MDs can be calculated on seven chemical (or functional) groups in the molecule: hydrogen-bond acceptors (A), carbon atoms in aliphatic chains (C), hydrogen-bond donors (D), halogens (G), terminal methyl groups (M), carbon atoms in aromatic portions (P) and heteroatoms (O, N and S in all valence states, denoted as X).

Up to this section, we have used the summation of the total atom-level contributions and local-fragment atom-level

local-fragment) NS-, SS-, DS- and MP-bilinear, quadratic and linear molecular indices. In subsection 2.6, we propose alternative strategies (invariants) for obtaining indices from LOVIs other than summation.
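The additive scheme used up to this point can be sketched in Python. This is a minimal illustration, not the QuBiLS-MIDAS implementation: the halving rule in `atom_level_matrix` is an assumption read off the values in Table 3 (off-diagonal entries in the row and column of atom a are half the total entry, the diagonal entry is kept whole), and the 3-atom matrix, the property vectors and the fragment membership are hypothetical.

```python
from typing import Iterable, List, Sequence

Matrix = List[List[float]]


def atom_level_matrix(total: Matrix, a: int) -> Matrix:
    """Matrix of atom a: keeps only row and column a of the total matrix.

    Assumption (inferred from Table 3): off-diagonal entries are halved,
    the diagonal entry g_aa is kept whole, all other entries are zero.
    """
    n = len(total)
    local = [[0.0] * n for _ in range(n)]
    local[a][a] = total[a][a]
    for j in range(n):
        if j != a:
            local[a][j] = total[a][j] / 2.0
            local[j][a] = total[j][a] / 2.0
    return local


def bilinear_form(x: Sequence[float], m: Matrix, y: Sequence[float]) -> float:
    """Plain bilinear form: sum over i, j of x_i * m_ij * y_j."""
    n = len(m)
    return sum(x[i] * m[i][j] * y[j] for i in range(n) for j in range(n))


def atom_level_indices(total: Matrix, x: Sequence[float],
                       y: Sequence[float]) -> List[float]:
    """One bilinear value per atom (the LOVIs)."""
    return [bilinear_form(x, atom_level_matrix(total, a), y)
            for a in range(len(total))]


def fragment_index(lovis: Sequence[float], atoms: Iterable[int]) -> float:
    """Additive index: sum of the LOVIs of the atoms in a fragment."""
    return sum(lovis[a] for a in atoms)


# Hypothetical 3-atom example (values are illustrative only).
total = [[0.0, 0.9, 0.5],
         [0.9, 2.0, 0.4],
         [0.5, 0.4, 1.0]]
x = [12.0, 19.0, 32.0]  # first atomic property
y = [20.0, 13.0, 24.0]  # second atomic property

lovis = atom_level_indices(total, x, y)
total_index = fragment_index(lovis, range(3))  # all atoms
hetero_index = fragment_index(lovis, [1, 2])   # hypothetical fragment X
```

Under this rule the atom-level matrices add up to the total matrix, so the sum of all LOVIs reproduces the total bilinear index, while a local-fragment index is the same sum restricted to the atoms of one group.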

(Table 2) contd…

F) NS-, SS-, DS- and MP-SDSM, $\mathbb{G}^{k}$ for k = 2

0.060 0.145 0.145 0.140 0.091 0.128 0.134 0.041 0.010 0.000 0.013 0.094 0.003 0.006 0.008 0.015 0.008 0.023 0.012 0.003 0.001 0.000 0.001 0.012 0.035 0.105 0.145 0.150 0.116 0.143 0.103 0.060 0.036 0.013 0.022 0.071 0.001 0.003 0.006 0.012 0.007 0.018 0.007 0.003 0.003 0.001 0.001 0.006 0.076 0.083 0.129 0.094 0.136 0.078 0.016 0.116 0.107 0.094 0.071 0.000 0.005 0.005 0.010 0.014 0.016 0.019 0.002 0.011 0.015 0.012 0.006 0.000

G) NS-, SS-, DS- and MP-SDSM, $\mathbb{G}^{k}$ for k = 3

0.00 1.37 7.49 25.88 11.90 54.75 7.73 1.27 4.67 4.33 1.21 10.26 0.000 0.010 0.057 0.198 0.091 0.418 0.059 0.010 0.036 0.033 0.009 0.078 1.37 0.00 1.24 7.14 5.81 20.37 1.99 6.30 15.60 13.82 5.26 9.87 0.015 0.000 0.014 0.080 0.065 0.230 0.022 0.071 0.176 0.156 0.059 0.111 7.49 1.24 0.00 1.21 0.89 7.17 8.25 11.50 22.12 22.64 14.01 31.37 0.059 0.010 0.000 0.009 0.007 0.056 0.065 0.090 0.173 0.177 0.110 0.245 25.88 7.14 1.21 0.00 4.41 0.64 13.42 36.28 57.19 55.89 38.01 50.42 0.089 0.025 0.004 0.000 0.015 0.002 0.046 0.125 0.197 0.192 0.131 0.174 11.90 5.81 0.89 4.41 27.00 13.82 25.10 9.76 15.73 20.26 18.11 61.02 0.056 0.027 0.004 0.021 0.126 0.065 0.117 0.046 0.074 0.095 0.085 0.285 54.75 20.37 7.17 0.64 13.82 1.00 25.33 72.47 104.13 100.54 72.70 78.76 0.099 0.037 0.013 0.001 0.025 0.002 0.046 0.131 0.189 0.182 0.132 0.143 7.73 1.99 8.25 13.42 25.10 25.33 8.00 24.86 46.61 37.88 15.88 2.62 0.036 0.009 0.038 0.062 0.115 0.116 0.037 0.114 0.214 0.174 0.073 0.012 1.27 6.30 11.50 36.28 9.76 72.47 24.86 0.00 1.07 4.23 4.60 33.04 0.006 0.031 0.056 0.177 0.048 0.353 0.121 0.000 0.005 0.021 0.022 0.161 4.67 15.60 22.12 57.19 15.73 104.13 46.61 1.07 8.00 0.89 3.85 52.46 0.014 0.047 0.067 0.172 0.047 0.313 0.140 0.003 0.024 0.003 0.012 0.158 4.33 13.82 22.64 55.89 20.26 100.54 37.88 4.23 0.89 0.00 0.77 37.99 0.014 0.046 0.076 0.187 0.068 0.336 0.127 0.014 0.003 0.000 0.003 0.127 1.21 5.26 14.01 38.01 18.11 72.70 15.88 4.60 3.85 0.77 1.00 15.25 0.006 0.028 0.073 0.199 0.095 0.381 0.083 0.024 0.020 0.004 0.005 0.080 10.26 9.87 31.37 50.42 61.02 78.76 2.62 33.04 52.46 37.99 15.25 0.00 0.027 0.026 0.082 0.132 0.159 0.206 0.007 0.086 0.137 0.099 0.040 0.000

0.000 0.047 0.146 0.189 0.137 0.200 0.087 0.020 0.042 0.043 0.020 0.067 0.000 0.000 0.002 0.009 0.004 0.018 0.003 0.000 0.002 0.001 0.000 0.003 0.047 0.000 0.030 0.064 0.082 0.091 0.028 0.125 0.173 0.170 0.110 0.080 0.000 0.000 0.000 0.002 0.002 0.007 0.001 0.002 0.005 0.005 0.002 0.003 0.146 0.030 0.000 0.006 0.007 0.018 0.064 0.128 0.138 0.156 0.165 0.143 0.002 0.000 0.000 0.000 0.000 0.002 0.003 0.004 0.007 0.007 0.005 0.010 0.189 0.064 0.006 0.000 0.013 0.001 0.039 0.152 0.135 0.146 0.168 0.086 0.009 0.002 0.000 0.000 0.001 0.000 0.004 0.012 0.019 0.018 0.013 0.017 0.137 0.082 0.007 0.013 0.128 0.021 0.116 0.064 0.058 0.083 0.126 0.165 0.004 0.002 0.000 0.001 0.009 0.005 0.008 0.003 0.005 0.007 0.006 0.020 0.200 0.091 0.018 0.001 0.021 0.000 0.037 0.151 0.122 0.130 0.161 0.067 0.018 0.007 0.002 0.000 0.005 0.000 0.008 0.024 0.034 0.033 0.024 0.026 0.087 0.028 0.064 0.039 0.116 0.037 0.036 0.160 0.168 0.151 0.108 0.007 0.003 0.001 0.003 0.004 0.008 0.008 0.003 0.008 0.015 0.012 0.005 0.001 0.020 0.125 0.128 0.152 0.064 0.151 0.160 0.000 0.006 0.024 0.045 0.124 0.000 0.002 0.004 0.012 0.003 0.024 0.008 0.000 0.000 0.001 0.002 0.011 0.042 0.173 0.138 0.135 0.058 0.122 0.168 0.006 0.023 0.003 0.021 0.111 0.002 0.005 0.007 0.019 0.005 0.034 0.015 0.000 0.003 0.000 0.001 0.017 0.043 0.170 0.156 0.146 0.083 0.130 0.151 0.024 0.003 0.000 0.005 0.089 0.001 0.005 0.007 0.018 0.007 0.033 0.012 0.001 0.000 0.000 0.000 0.013 0.020 0.110 0.165 0.168 0.126 0.161 0.108 0.045 0.021 0.005 0.010 0.061 0.000 0.002 0.005 0.013 0.006 0.024 0.005 0.002 0.001 0.000 0.000 0.005 0.067 0.080 0.143 0.086 0.165 0.067 0.007 0.124 0.111 0.089 0.061 0.000 0.003 0.003 0.010 0.017 0.020 0.026 0.001 0.011 0.017 0.013 0.005 0.000


10 Current Bioinformatics, 2015, Vol 10, No 3 Marrero-Ponce et al

Table 3. A) Non-stochastic matrix of order 1 (NS-SDSM 1) of the chemical structure (E)-3-(4,5-dihydrooxazol-4-yl)-2-fluoro-3-(methylthio)acrylonitrile. This matrix belongs to the total bilinear index, using the Euclidean distance and the properties mass and van der Waals (vdw) volume. B) The atom-level non-stochastic matrices, NS-SDSM a,k, derived from the total NS-SDSM 1, for all atoms of the molecule.

A) NS-SDSM order 1

0.000 0.900 0.511 0.338 0.438 0.263 0.506 0.924 0.598 0.614 0.940 0.460

0.900 0.000 0.931 0.519 0.556 0.366 0.795 0.541 0.400 0.417 0.575 0.466

0.511 0.931 0.000 0.939 1.039 0.519 0.495 0.443 0.356 0.353 0.415 0.317

0.338 0.519 0.939 0.000 0.610 1.158 0.421 0.302 0.260 0.262 0.297 0.271

0.438 0.556 1.039 0.610 3.000 0.417 0.342 0.468 0.399 0.367 0.381 0.254

0.263 0.366 0.519 1.158 0.417 1.000 0.341 0.240 0.213 0.215 0.240 0.233

0.506 0.795 0.495 0.421 0.342 0.341 2.000 0.343 0.278 0.298 0.398 0.725

0.924 0.541 0.443 0.302 0.468 0.240 0.343 0.000 0.979 0.618 0.601 0.312

0.598 0.400 0.356 0.260 0.399 0.213 0.278 0.979 2.000 1.039 0.638 0.267

0.614 0.417 0.353 0.262 0.367 0.215 0.298 0.618 1.039 0.000 1.093 0.297

0.940 0.575 0.415 0.297 0.381 0.240 0.398 0.601 0.638 1.093 1.000 0.403

0.460 0.466 0.317 0.271 0.254 0.233 0.725 0.312 0.267 0.297 0.403 0.000

B) Atom-level NS-SDSM order 1 for all atoms of the molecule

0.000 0.450 0.256 0.169 0.219 0.132 0.253 0.462 0.299 0.307 0.470 0.230 0.000 0.450 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.450 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.450 0.000 0.466 0.260 0.278 0.183 0.397 0.271 0.200 0.208 0.287 0.233 0.256 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.466 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.169 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.260 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.219 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.278 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.132 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.183 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.253 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.397 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.462 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.271 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.299 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.200 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.307 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.208 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.470 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.287 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.230 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.233 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

0.000 0.000 0.256 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.169 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.466 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.260 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.256 0.466 0.000 0.469 0.519 0.259 0.247 0.222 0.178 0.177 0.207 0.159 0.000 0.000 0.000 0.469 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.469 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.169 0.260 0.469 0.000 0.305 0.579 0.210 0.151 0.130 0.131 0.149 0.135 0.000 0.000 0.519 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.305 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.259 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.579 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.247 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.210 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.222 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.151 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.178 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.130 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.177 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.131 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.207 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.149 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.159 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.135 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

C) Atom-level NS-SDSM order 1 for all atoms of the molecule

0.000 0.000 0.000 0.000 0.219 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.132 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.278 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.183 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.519 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.259 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.305 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.579 0.000 0.000 0.000 0.000 0.000 0.000 0.219 0.278 0.519 0.305 3.000 0.208 0.171 0.234 0.200 0.183 0.190 0.127 0.000 0.000 0.000 0.000 0.000 0.208 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.208 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.132 0.183 0.259 0.579 0.208 1.000 0.170 0.120 0.106 0.108 0.120 0.117


2.5 Constraints: Topological and Geometric Neighborhood Quotient Matrices

The geometry matrix (G) [20, 50] contains information about the 3D molecular conformation and configuration, but it does not contain information about atom connectivity. Therefore, for several applications, the geometry matrix is accompanied by a connectivity table or

several combinations with other “topological” or

(Table 3) contd… D) Atom-level NS-SDSM order 1 for all atoms of the molecule

0.000 0.000 0.000 0.000 0.171 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.170 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.234 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.120 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.200 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.106 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.183 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.108 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.190 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.120 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.127 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.117 0.000 0.000 0.000 0.000 0.000 0.000

0.000 0.000 0.000 0.000 0.000 0.000 0.253 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.462 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.397 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.271 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.247 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.222 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.210 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.151 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.171 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.234 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.170 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.120 0.000 0.000 0.000 0.000 0.253 0.397 0.247 0.210 0.171 0.170 2.000 0.171 0.139 0.149 0.199 0.363 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.171 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.171 0.000 0.000 0.000 0.000 0.000 0.462 0.271 0.222 0.151 0.234 0.120 0.171 0.000 0.489 0.309 0.301 0.156 0.000 0.000 0.000 0.000 0.000 0.000 0.139 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.489 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.149 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.309 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.199 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.301 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.363 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.156 0.000 0.000 0.000 0.000

0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.299 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.307 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.200 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.208 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.178 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.177 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.130 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.131 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.200 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.183 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.106 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.108 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.139 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.149 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.489 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.309 0.000 0.000 0.299 0.200 0.178 0.130 0.200 0.106 0.139 0.489 2.000 0.519 0.319 0.134 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.519 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.519 0.000 0.000 0.000 0.307 0.208 0.177 0.131 0.183 0.108 0.149 0.309 0.519 0.000 0.547 0.149 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.319 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.547 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.134 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.149 0.000 0.000

0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.470 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.230 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.287 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.233 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.207 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.159 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.149 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.135 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.190 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.127 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.120 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.117 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.199 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.363 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.301 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.156 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.319 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.134 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.547 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.149 0.470 0.287 0.207 0.149 0.190 0.120 0.199 0.301 0.319 0.547 1.000 0.202 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.202 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.202 0.000 0.230 0.233 0.159 0.135 0.127 0.117 0.363 0.156 0.134 0.149 0.202 0.000


Table 4. Labeled chemical structure of (E)-3-(4,5-dihydrooxazol-4-yl)-2-fluoro-3-(methylthio)acrylonitrile and its local-fragment [for heteroatoms (X)] NS-SDSM matrices of order 1, using the Euclidean distance with and without cut-offs (topological and geometrical thresholds) and considering the lone-pair electrons in the main diagonal. A) Topological interactions at lag p, cut-off interval [2; 4-5]. For simplicity, only the interactions from the fluorine atom to the other atoms in the molecule are displayed, with discontinuous lines. The “heavy” and slightly discontinuous lines correspond to interactions of the F atom with heteroatoms (X) and with non-heteroatoms (C atoms), respectively. Blue, green and red lines denote contacts between the F atom and other atoms at topological distances of 2, 4 or 5. B) The corresponding tree of topological atomic distances for the annotated atom in (A). The root and leaves are labeled with the corresponding atom numbers. C) NS-SDSM 1 for fragment X with topological lag p, cut-off interval [2; 4-5]. D) NS-SDSM 1 for fragment X with geometrical lag l, cut-off interval [1.0, 2.0, and 3.0 Å] for the F atom. E) NS-SDSM 1 for fragment X with topological and geometrical cut-off intervals (lag p [2; 4-5] and lag l [1.0-3.0 Å]). F) NS-SDSM 1 for fragment X without cut-off (KA: keep all matrix elements).


“topographical” distances. In this sense, from the geometric

also be re-defined as a distance/distance matrix (D/D) in order to merge 2D and 3D information of the molecular structure in the same mathematical representation [20]. Another important matrix is the geometric distance/topological distance quotient matrix, represented as G/D, whose entries are computed by dividing the respective entries of the geometry matrix (G) by those of the graph distance matrix (D) (D/G is the corresponding reciprocal of G/D). Other matrices that merge 2D and 3D chemical information are the distance–distance combined matrices, namely G^D (geometric distance–topological distance combined matrix), T^D (topographic distance–topological distance combined matrix), and D^G and D^T, which are the transposes of G^D and T^D, respectively [65].
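As a small illustration of the G/D quotient matrix described above, the following Python sketch divides each off-diagonal geometric distance by the corresponding topological distance. The three-atom chain and its distances are hypothetical, and setting the diagonal to zero (where both distances are zero) is an assumed convention, not one stated in the text.

```python
from typing import List

Matrix = List[List[float]]


def quotient_matrix(num: Matrix, den: Matrix) -> Matrix:
    """Entry-wise quotient num_ij / den_ij for i != j.

    Assumption: diagonal entries are set to 0.0, since both distances
    vanish there; the graph is connected, so den_ij > 0 off the diagonal.
    """
    n = len(num)
    return [[num[i][j] / den[i][j] if i != j else 0.0 for j in range(n)]
            for i in range(n)]


# Hypothetical bent three-atom chain 0-1-2.
D = [[0.0, 1.0, 2.0],   # topological (bond-count) distances
     [1.0, 0.0, 1.0],
     [2.0, 1.0, 0.0]]
G = [[0.0, 1.5, 2.4],   # Euclidean distances in angstroms
     [1.5, 0.0, 1.5],
     [2.4, 1.5, 0.0]]

GD = quotient_matrix(G, D)  # geometric/topological quotient
DG = quotient_matrix(D, G)  # its entry-wise reciprocal
```

For bonded atoms the quotient equals the bond length, whereas for atoms two or more bonds apart it shrinks as the chain folds back on itself, which is how this matrix mixes 2D and 3D information.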

To take into account only some inter-atomic interactions (for example, short-, middle- and large-contacts) in total or local-fragment indices, and thus account for the most relevant interactions, two different constraints are proposed:

topological distance at a lag p, represented as “path cut-off”

distance at a lag l, denoted as “length cut-off”

matrix: the two-tuple topological and geometric neighborhood quotient SDSM of order 1, denoted as $\mathbb{NQG}^{1}$. This

𝔾𝔾!

user-defined thresholds p and/or l, and zero otherwise. Then,

$$ {}^{NQ}g_{ij}^{1} = g_{ij}^{1} \quad \text{if } p_{\min} \le p_{ij} \le p_{\max} \text{ and/or } l_{\min} \le l_{ij} \le l_{\max} $$

user-defined topological and Euclidean distance thresholds, respectively. Min and Max denote the minimum and maximum cut-offs (rank).

The constraints approach (both path and length thresholds) permits us to unify 2D and 3D information as well as to consider the most relevant interactions (see Table 4 for a simple example). In addition, to avoid untrustworthy or irrelevant molecular information arising from long-range inter-atomic relations, topological and/or geometrical cut-offs are used whose values take into account only those inter-atomic relations that are significant to the considered interactions. In this way, atoms that are far apart and do not contribute to the interactions are not involved in the computations.
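The cut-off idea can be sketched as a simple masking step. The code below is a hedged illustration rather than the authors' implementation: the function name, the `both` flag (which chooses between the "and" and the "or" reading of the condition) and the four-atom example are assumptions, and diagonal entries are treated like any other entry, which may differ from how the software handles them.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]
Interval = Tuple[float, float]


def neighborhood_quotient(g: Matrix, p: Matrix, l: Matrix,
                          path: Optional[Interval] = None,
                          length: Optional[Interval] = None,
                          both: bool = True) -> Matrix:
    """Keep g_ij only for pairs that satisfy the chosen cut-offs.

    p holds topological distances, l holds Euclidean distances.
    With no cut-off given, every entry is kept (KA: keep all).
    """
    n = len(g)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            in_path = path is not None and path[0] <= p[i][j] <= path[1]
            in_len = length is not None and length[0] <= l[i][j] <= length[1]
            if path is not None and length is not None:
                keep = (in_path and in_len) if both else (in_path or in_len)
            elif path is not None:
                keep = in_path
            elif length is not None:
                keep = in_len
            else:
                keep = True
            if keep:
                out[i][j] = g[i][j]
    return out


# Hypothetical linear chain 0-1-2-3.
p = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
l = [[0.0, 1.5, 2.5, 3.8],
     [1.5, 0.0, 1.5, 2.5],
     [2.5, 1.5, 0.0, 1.5],
     [3.8, 2.5, 1.5, 0.0]]
# Illustrative matrix entries: reciprocal Euclidean distances.
g = [[0.0 if i == j else 1.0 / l[i][j] for j in range(4)] for i in range(4)]

nq = neighborhood_quotient(g, p, l, path=(2, 3), length=(1.0, 3.0))
keep_all = neighborhood_quotient(g, p, l)
```

With a path interval of 2 to 3 and a length interval of 1.0 to 3.0 Å, only the pairs (0, 2) and (1, 3) survive: bonded pairs fail the path test and the pair (0, 3) fails the length test.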

This approach is rather different from the one previously used, which combines matrices with topological/topographical/geometrical Euclidean distances [20, 65]. In addition, the

generalized matrices (see above) [20]. A quite similar approach has been previously used in chemo-informatics. The adjacency matrix is an instance of a neighborhood matrix

distance matrix. Applied to the geometry matrix, a threshold t is equivalent to a predefined geometric distance that yields a sparse geometry matrix, in which only the atom pairs not too far from each other are taken into account. In the

geometrical thresholds or both can be used. If constraints are

calculations, i.e., it is not mandatory to use any constraints for calculations. However, selecting the “cut-off” makes it possible to differentiate the interaction types. For example, when a topological cut-off is applied, atomic indices can be calculated for atoms separated by 1 step (covalent interactions) or for those separated by more than 1 step (p ≥ 2), to characterize the non-covalent interactions between atoms i and j.

The relationship between distance and the magnitude of non-covalent interactions of diverse nature shows that these interactions contribute to maintaining the 3D structure of the molecule, depending on the distance between the interacting groups. Thus, some of these interactions are important only when the functional groups are close to each other, or are distant in the covalent structure but sterically close (large-contacts).

For example, the use of the length criterion (together with the exponent k) makes it possible to take into account only the interactions among the functional groups of atoms i and j that contribute significantly to the maintenance of the molecular structure. On the other hand, the k exponent in the

that exists between the distance and the strength of the interaction between the functional groups of atoms i and j.

The path criterion makes it possible to select the interactions between atoms found at a given topological distance. In this way, the matrices can be constructed from information about the contacts (interactions) among atoms separated by a given distance (or distance range) in the 2D structure of the molecule, with the objective of studying possible relations between a specific property and the topological characteristics of the molecule.
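To apply a path criterion, the topological distance between every pair of atoms is needed first. A minimal sketch, assuming the molecule is given as a plain bond list (the ring-plus-substituent graph below is hypothetical), uses a breadth-first search from each atom and then selects the pairs whose distance falls within the chosen lag interval.

```python
from collections import deque
from typing import List, Sequence, Tuple


def topological_distances(n_atoms: int,
                          bonds: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """Shortest bond-count distance between all atom pairs (BFS per atom).

    Unreachable pairs keep the value -1.
    """
    adj: List[List[int]] = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    dist = [[-1] * n_atoms for _ in range(n_atoms)]
    for start in range(n_atoms):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[start][v] < 0:
                    dist[start][v] = dist[start][u] + 1
                    queue.append(v)
    return dist


def pairs_at_lag(dist: List[List[int]], p_min: int,
                 p_max: int) -> List[Tuple[int, int]]:
    """Atom pairs (i < j) whose topological distance lies in [p_min, p_max]."""
    n = len(dist)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if p_min <= dist[i][j] <= p_max]


# Hypothetical graph: five-membered ring 0-1-2-3-4 with substituent 5 on atom 0.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4), (0, 5)]
dist = topological_distances(6, bonds)

covalent = pairs_at_lag(dist, 1, 1)      # atoms separated by 1 step
non_covalent = pairs_at_lag(dist, 2, 3)  # atoms separated by 2 or 3 steps
```

Here `covalent` recovers the six bonded pairs, while `non_covalent` contains the remaining nine pairs, since no two atoms in this small graph are more than three bonds apart.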

An example of the application of the path and length criteria in the construction of the matrix that characterizes the 3D structure of the molecule can be found in Table 4. From these neighborhood matrices, other neighborhood matrices are derived for the description of the 3D features of molecules (see Schemes 1 and 2). These matrices are

representations are obtained through the application of the

Our approach can be viewed as thresholds that generalize and unify the use of lag k and lag r in 2D- and 3D-Moreau–Broto autocorrelations, respectively. For autocorrelation MDs determined on a molecular graph, the lag k cut-off exactly matches the topological distance between any pair of vertices. Autocorrelation indices for 3D molecular geometry are


computed using Euclidean inter-atomic distances (r), represented in the geometric matrix (G), instead of topological

cut-off is rather different from our “length” threshold, because in the former MDs the inter-atomic distance is split into distance intervals of equal size.
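The contrast drawn above can be made concrete with a simplified sketch. The first function bins atom pairs into equal-size distance intervals, in the spirit of a 3D autocorrelation; the second sums the same pair products over a single, freely chosen length interval. Both are illustrative pair sums under assumed names, weights and distances, not the exact descriptor definitions of any particular package.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def binned_autocorrelation(w: Sequence[float], dist: Matrix,
                           bin_width: float, n_bins: int) -> List[float]:
    """Sum w_i * w_j over pairs, grouped into equal-width distance bins.

    Bin b collects the pairs with b * bin_width <= d_ij < (b + 1) * bin_width.
    """
    ac = [0.0] * n_bins
    n = len(w)
    for i in range(n):
        for j in range(i + 1, n):
            b = int(dist[i][j] // bin_width)
            if b < n_bins:
                ac[b] += w[i] * w[j]
    return ac


def interval_sum(w: Sequence[float], dist: Matrix,
                 l_min: float, l_max: float) -> float:
    """Sum w_i * w_j over pairs with l_min <= d_ij <= l_max."""
    n = len(w)
    return sum(w[i] * w[j] for i in range(n) for j in range(i + 1, n)
               if l_min <= dist[i][j] <= l_max)


# Hypothetical three-atom example.
w = [1.0, 2.0, 3.0]
dist = [[0.0, 1.5, 2.4],
        [1.5, 0.0, 1.5],
        [2.4, 1.5, 0.0]]

ac = binned_autocorrelation(w, dist, bin_width=1.0, n_bins=3)
custom = interval_sum(w, dist, l_min=1.2, l_max=2.5)
```

The binned version is tied to a fixed grid (here 0-1, 1-2 and 2-3 Å), whereas the interval version accepts any user-defined minimum and maximum, for instance 1.2 to 2.5 Å, which straddles two bins.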

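neighborhood interaction geodesic matrix, that is, the interaction geodesic matrix (NIGM), $\delta_{ij}$:

$$ {}^{NQ}g_{ij}^{1} = g_{ij}^{1} \times \delta_{ij}, \quad \text{where } \delta_{ij} = \begin{cases} 1 & \text{if } p_{\min} \le p_{ij} \le p_{\max} \text{ and/or } l_{\min} \le l_{ij} \le l_{\max} \\ 0 & \text{otherwise} \end{cases} \tag{18} $$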

and j. That is to say, the $\mathbb{NQG}$ matrix can be obtained using a Hadamard product between matrices of the same size, SDSM and

1

ij

and linear indices at lags p and l (using both thresholds at the same time) can be re-expressed and compacted by the following equation:

!" 𝑥𝑥!𝑦𝑦! = !" NQgij k 𝑥𝑥!𝑦𝑦!=

local-fragment) bilinear, quadratic and linear indices using the p and l thresholds. Here, the m form is: a) a bilinear map if [X] ≠ [Y] (different atomic properties in the [X] and [Y] vectors), b) a quadratic map if [X] = [Y] (the same atomic property in [X]

quotient SDSM

2.6 Generalization of Method of Obtaining Total and Local-Fragment Indices from LOVIs: Is It More Than the Sum of Its Parts?

The notion of invariants as a generalization scheme for the linear combination of atomic contributions to yield global (molecular) definitions derives from the hypothesis that the most suitable global definition of a natural system may not necessarily be additive. Indeed, it was demonstrated in [66-68] that operators other than the sum can yield better correlations with certain chemical properties. These invariants are applied to the vector 𝐿 of atom-level indices and are classified into four major groups (see Table 5):

1. Norms (or Metrics) Invariants: Minkowski norms (N1, N2, N3) and Penrose size (PN). Note that N1 in our case is equivalent to the summation of the components of vector 𝐿 (Eqs. 5-7).

2. Mean Invariants (first statistical moment): geometric mean (G), arithmetic mean (M), quadratic mean (P2), power mean of third degree (P3) and harmonic mean (A).

3. Statistical Invariants (higher statistical moments): variance (V), skewness (S), kurtosis (K), standard deviation (SD), variation coefficient (CV), range (R), percentile 25 (Q1), percentile 50 (Q2), percentile 75

4. Classical Invariants: Autocorrelation (AC), Gravitational (GV), Total Information Content (TIC), Mean Information Content (MIC), Standardized Information Content (SIC), Total Sum (TS), Ivanciuc–Balaban (IB), Electrotopological State (ES) and

quadratic and linear indices defined in Equations 5, 6 and 7, respectively. In the same way, these mathematical operators can be applied to a vector composed of a particular class of chemical local fragment (group and atom-type) to obtain diverse local-fragment indices that describe a given molecule. Note that, for the classical invariants, in addition to using atom-level indices as LOVIs (in place of the vertex degrees), these algorithms usually involve summations, which are likewise generalized using the norms, means and statistical invariants. Scheme 2 summarizes the steps (and generalizations) followed in the computation of these novel 3D-MDs. Finally, all 3D algebraic-based MDs are calculated with the QuBiLS-MIDAS software [45, 46], a module of the TOMOCOMD-CARDD approach.
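A few of the norm, mean and statistical invariants listed in this subsection can be sketched as plain functions over a vector of LOVIs. This is an illustrative subset under stated assumptions: the LOVI values are hypothetical, the population form of the variance is an arbitrary choice (the text does not say which form the software uses), and the geometric and harmonic means as written require strictly positive components.

```python
import math
from typing import Dict, Sequence


def minkowski_norm(v: Sequence[float], p: int) -> float:
    """Minkowski norm of order p; N1 equals the plain sum only when all
    components are non-negative."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def arithmetic_mean(v: Sequence[float]) -> float:
    return sum(v) / len(v)


def geometric_mean(v: Sequence[float]) -> float:
    """Requires strictly positive components."""
    return math.exp(sum(math.log(x) for x in v) / len(v))


def power_mean(v: Sequence[float], p: int) -> float:
    """Power mean of degree p (p = 2 quadratic, p = 3 third degree)."""
    return (sum(x ** p for x in v) / len(v)) ** (1.0 / p)


def harmonic_mean(v: Sequence[float]) -> float:
    """Requires non-zero components."""
    return len(v) / sum(1.0 / x for x in v)


def variance(v: Sequence[float]) -> float:
    """Population variance (assumed form)."""
    m = arithmetic_mean(v)
    return sum((x - m) ** 2 for x in v) / len(v)


def value_range(v: Sequence[float]) -> float:
    return max(v) - min(v)


# Hypothetical atom-level indices for a three-atom molecule.
lovis = [1.0, 2.0, 4.0]

invariants: Dict[str, float] = {
    "N1": minkowski_norm(lovis, 1),
    "N2": minkowski_norm(lovis, 2),
    "G": geometric_mean(lovis),
    "M": arithmetic_mean(lovis),
    "P2": power_mean(lovis, 2),
    "P3": power_mean(lovis, 3),
    "A": harmonic_mean(lovis),
    "V": variance(lovis),
    "R": value_range(lovis),
}
```

Each entry of `invariants` is a different molecular descriptor built from the same atom-level vector; restricting `lovis` to the atoms of one group would give the corresponding local-fragment descriptors.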

3 SHANNON’S ENTROPY-BASED VARIABILITY ANALYSIS OF THE QUBILS-MIDAS DUPLEX 3D INDICES AND COMPARISON WITH OTHER APPROACHES

Recently, Godden et al. proposed an information-theory-based algorithm for evaluating the relevance of variables [69]. This unsupervised method is based on computing Shannon’s Entropy (SE) of variables, on the premise that variables desirable for chemo-informatics tasks should possess high SE values, as an indicator of their tendency to change gradually with modifications of the molecular structure, while redundant variables (from a case-wise perspective) should possess low SE values, with the lower limit being zero SE for variables that assign the “same value” to dissimilar cases,


Table 5. Norms, Means and Statistical Invariants as Generalizations of the Linear Combination of LOVIs as Global (or Local) MD Operators, as well as Classical Algorithms which Generalize the First Three Groups.
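The Shannon's-entropy criterion introduced in Section 3 can be sketched as follows. The equal-width binning and the default number of bins are assumptions made for illustration, since this excerpt does not state how the descriptor values are discretized; the sample values are hypothetical.

```python
import math
from typing import Sequence


def shannon_entropy(values: Sequence[float], n_bins: int = 4) -> float:
    """Shannon entropy (in bits) of a descriptor over a set of molecules.

    Values are placed in n_bins equal-width bins between the minimum and
    maximum (assumed scheme). A constant descriptor has zero entropy.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value falls on the upper edge, so clamp it
        # into the last bin.
        b = min(int((v - lo) / width), n_bins - 1)
        counts[b] += 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


# Hypothetical descriptor values over a small data set.
constant_descriptor = [5.0] * 8            # same value for every molecule
spread_descriptor = [0.0, 1.0, 2.0, 3.0]   # one molecule per bin

se_constant = shannon_entropy(constant_descriptor)
se_spread = shannon_entropy(spread_descriptor, n_bins=4)
```

The constant descriptor gives an entropy of zero, the lower limit mentioned in the text, while the evenly spread one reaches the maximum for four bins, log2(4) = 2 bits.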

Ngày đăng: 16/12/2017, 00:02

REFERENCES

[1] Kubinyi H. QSAR and 3D QSAR in Drug Design: 1. Methodology. Drug Disc Today 1997; 2: 457-67.
[42] Balaban AT. Steric fit in quantitative structure-activity relations. Springer-Verlag; 1980.
[43] Gasteiger J, Marsili M. Iterative partial equalization of orbital electronegativity - a rapid access to atomic charges. Tetrahedron 1980; 36: 3219-28.
[44] Ertl P, Rohde B, Selzer P. Fast Calculation of Molecular Polar Surface Area as a Sum of Fragment-Based Contributions and Its Application to the Prediction of Drug Transport Properties. J Med Chem 2000; 43: 3714-7.
[45] García-Jacas CR, Marrero-Ponce Y, Acevedo-Martínez L, Barigye SJ, Valdés-Martiní JR, Contreras-Torres E. QuBiLS-MIDAS: A Parallel Free-Software for Molecular Descriptors Computation Based on Multilinear Algebraic Maps. J Comput Chem 2014; 35(18): 1395-409.
[46] García-Jacas CR, Aguilera-Mendoza L, González-Pérez R, Marrero-Ponce Y, Acevedo-Martínez L, Barigye SJ, et al. Multi-Server Approach for High-Throughput Molecular Descriptors Calculation based on Multi-Linear Algebraic Maps. Mol Inf 2015; 34(1): 60-9.
[47] Steinbeck C, Han YQ, Kuhn S, Horlacher O, Luttmann E, Willighagen EL. The Chemistry Development Kit (CDK): An open-source Java library for chemo- and bioinformatics. J Chem Inf Comput Sci 2003; 43(2): 493-500.
[48] Marrero-Ponce Y, Castillo-Garit J, Torrens F, Romero Zaldivar V, Castro E. Atom, Atom-Type, and Total Linear Indices of the “Molecular Pseudograph’s Atom Adjacency Matrix”: Application to QSPR/QSAR Studies of Organic Compounds. Molecules 2004; 9(12): 1100-23.
[49] Marrero-Ponce Y, Medina-Marrero R, Torrens F, Martinez Y, Romero-Zaldivar V, Castro EA. Atom, atom-type, and total nonstochastic and stochastic quadratic fingerprints: a promising approach for modeling of antibacterial activity. Bioorg Med Chem 2005; 13(8): 2881-99.
[50] Nikolic S, Trinajstic N, Mihalic Z, Carter S. On the geometric-distance matrix and the corresponding structural invariants of molecular systems. Chem Phys Lett 1991; 179(1-2): 21-8.
[51] Devillers J, Balaban AT. Topological Indices and Related Descriptors in QSAR and QSPR. Amsterdam, The Netherlands: Gordon and Breach; 1999.
[52] Vargas-Quesada B, Anegón FM. Visualizing the structure of science. New York: Springer; 2007.
[53] Willett P. Chemoinformatics - Similarity and Diversity in Chemical Libraries. Curr Opinion Biotechnol 2000; 11: 85-8.
[54] Balaban AT, Bertelsen S, Basak SC. New centric topological indexes for acyclic molecules (trees) and substituents (rooted trees), and coding of rooted trees. MATCH Commun Math Comput Chem 1994; 30: 55-72.
[55] Balaban AT, Feroiu V. Correlations between structure and critical data or vapor pressures of alkanes by means of topological indices. Rep Mol Theor 1990; 1: 133-9.
[56] Holliday JD, Ranade SS, Willett P. A Fast Algorithm for Selecting Sets of Dissimilar Molecules from Large Chemical Databases. Quant Struct-Act Relat 1995; 14: 501-6.
[58] Jones PE, Curtice RM. A Framework for Comparing Document Term Association Measures. Am Doc 1967; 18: 153-61.
[61] Marrero-Ponce Y, Castillo-Garit JA, Castro EA, Torrens F, Rotondo R. 3D-chiral (2.5) atom-based TOMOCOMD-CARDD descriptors: theory and QSAR applications to central chirality codification. J Math Chem 2008; 44: 755-86.
[68] Barigye SJ, Marrero-Ponce Y, Martínez López Y, Artiles Martínez LM, Pino-Urias RW, Martínez Santiago O, et al. Relations Frequency Hypermatrices in Mutual, Conditional and Joint Entropy-Based Information Indices. J Comput Chem 2013; 34: 259-74.
[100] Cosentino U, Moro G, Bonalumi D, Bonati L, Lasagni M, Todeschini R, et al. A combined use of global and local approaches in 3D-QSAR. Chemom Intell Laborat Sys 2000; 52: 183-94.
