Current Bioinformatics, 2015, 10, 000-000
1574-8936/15 $58.00+.00 © 2015 Bentham Science Publishers
Optimum Search Strategies or Novel 3D Molecular Descriptors: Is there a Stalemate?
Yovani Marrero-Ponce*,1,2,3, César R García-Jacas1,4, Stephen J Barigye1,8, José R Valdés-Martiní1,
1 Unit of Computer-Aided Molecular “Biosilico” Discovery and Bioinformatics Research (CAMD-BIR
International), Cartagena de Indias, Bolívar, Colombia
2 Institut Universitari de Ciència Molecular, Universitat de València, Edifici d'Instituts de Paterna, P.O Box
22085, E-46071, València, Spain
3 Grupo de Investigación en Estudios Químicos y Biológicos, Facultad de Ciencias Básicas, Universidad
Tecnológica de Bolívar, Cartagena de Indias, Bolívar, Colombia
4 Grupo de Investigación de Bioinformática, Centro de Estudio de Matemática Computacional (CEMC),
Universidad de las Ciencias Informáticas (UCI), La Habana, Cuba
5 Faculty of Computing and Systems, Pontifical University Catholic of Ecuador in Esmeraldas (PUCESE) C/ Espejo y Santa
Cruz S/N, 080150 Esmeraldas, Ecuador
6 Laboratorio de Electrónica Molecular, Universidad del Zulia, Facultad Experimental de Ciencias, Departamento de Química Maracaibo, República Bolivariana de Venezuela
7 Laboratorio de Caracterización Molecular y Biomolecular, Departamento de Investigación en Tecnología de los Materiales
y el Ambiente (DITeMA), Instituto Venezolano de Investigaciones Científicas (IVIC), Avenida 74 con calle 14A, Maracaibo,
República Bolivariana de Venezuela
8 Departamento de Química, Universidade Federal de Lavras, UFLA Caixa Postal 3037, 37200-000 Lavras, MG, Brazil
9 School of Medicine and Pharmacy, Vietnam National University, Hanoi (VNU) 144 Xuan Thuy, Cau Giay, Hanoi, Vietnam
Abstract: The present manuscript describes a novel alignment-free 3D-QSAR method (QuBiLS-MIDAS Duplex) based on algebraic bilinear, quadratic and linear forms on the kth two-tuple spatial-(dis)similarity matrix. Generalization schemes for the inter-atomic spatial distance using diverse (dis)-similarity measures are discussed. In addition, normalization approaches for the two-tuple spatial-(dis)similarity matrix based on simple-stochastic, double-stochastic and mutual probability schemes are introduced. With the aim of taking into consideration particular inter-atomic interactions in total or local-fragment indices, path and length cut-off constraints are used. Also, in order to generalize the use of the linear combination of atom-level indices to yield global (molecular) definitions, a set of aggregation operators (invariants) is applied. A Shannon's entropy-based variability study of the proposed 3D algebraic form-based indices and the DRAGON molecular descriptor families demonstrates superior performance for the former. A principal component analysis reveals that the novel indices codify structural information orthogonal to that captured by the DRAGON indices. Finally, a QSAR study of the binding affinity to the corticosteroid-binding globulin using Cramer's steroid database is performed. This study reveals that the QuBiLS-MIDAS Duplex approach yields performance statistics similar or superior to those of all the 3D-QSAR methods reported in the literature so far, even with fewer degrees of freedom, using both the 31 steroids as the training set and the popular division of Cramer's database into training [1-21] and test [22-31] sets. It is thus expected that this methodology provides useful tools for the diversity analysis of compound datasets and high-throughput screening structure–activity data.
Keywords: Alignment free method, aggregation operator, Minkowski distance matrix, principal component analysis,
QuBiLS-MIDAS, 3D-QSAR, two-tuple spatial-(dis)similarity matrix, TOMOCOMD-CARDD, variability analysis
1 INTRODUCTION
The advent of 3D-QSAR methods represents a fundamental shift from the classical Hansch-Fujita (2D-QSAR) approach, motivated by the rationale that the spatial arrangement of molecular structures plays a determinant role in comprehending ligand–receptor interactions [1]. Right from the pioneering work by Cramer [2], 3D-QSAR methods have enjoyed considerable enthusiasm over their capability to adequately model the biological activities of chemical structures. In principle, 3D-QSAR techniques can be divided into two main groups: alignment-based techniques (CoMFA-related methods) and alignment-independent methods [e.g. CoMASA (Comparative Molecular Active Site Analysis)]. However, the use of 3D-QSAR methods has been far from a fairy tale; several problems have been met. On the one hand, the use of alignment rules comes with a number of challenges, such as their subjectivity and the fact that they are generally inapplicable to structurally diverse datasets (albeit there are works in this sense, e.g. see reference [3]). On the other hand, the computation of steric and/or electrostatic interaction energies yields numerous variables (a high-dimensional MD space) relative to the dataset size and usually includes noisy variables that tend to compromise the quality of the QSAR models [4-6].

*Address correspondence to this author at the Unit of Computer-Aided Molecular “Biosilico” Discovery and Bioinformatics Research (CAMD-BIR International), Cartagena de Indias, Bolívar, Colombia; Tel: 3043926347; E-mails: ymarrero77@yahoo.es, ymponce@gmail.com
Efforts have been made to address the limitations of 3D-QSAR methods. For example, techniques aimed at addressing the high dimensionality problem include filtering data points prior to QSAR modeling [7, 8], variable selection procedures [5, 9, 10] and grouping points [11]. On the other hand, similarity matrix correlations defined in terms of shape or electrostatic potentials were introduced with the aim of lowering the computational cost of 3D methods, though at the expense of losing significant features of the molecules [12]. Strategies aimed at improving the alignment rules have also been proposed, such as the Monte Carlo algorithm [13] and least squares fitting [14]. Other approaches, such as the hypothetical active site lattice (HASL), convert superposed molecular sets into a set of regularly spaced points (lattice), which are defined by 3D Cartesian coordinates and atom types [15-17].
On the other hand, rather than improving the alignment rules, several alignment-independent techniques have been proposed, such as the use of 3D models based on Cartesian coordinates [18], molecular transforms [19, 20], molecular spectra [21, 22], as well as the extension of traditional 2D molecular indices to consider 3D information [23-27]. Other alignment-free methods include CoMMA (Comparative Molecular Moment Analysis) [28], van der Waals excluded volume [29], etc. These methods are invariant to both translation and rotation of the molecular structures, and have generally yielded results comparable to those of the alignment-based methods.
However, although relentless efforts have been made to improve or provide alternative, robust and computationally cheap 3D-QSAR techniques, improvements in the quality of the 3D-QSAR models have in reality been minimal, either due to the complexity of modeling biological activities or to weaknesses inherent in the present methods, creating some kind of “out of reach” model performance. So is it possible to break through these “boundaries”? Looking at the current state of 3D-QSAR modeling in general, the balance of responses to this question may well lie towards the negative end. However, our argument is that it is imperative to diversify the space spanned by the 3D molecular parameters, to yield variables that correctly “fit” or adjust to the “troublesome” behavior of the molecules, rather than concentrating nearly exclusively on the quest for the correct relationship among variables by using more powerful (linear or non-linear) search strategies and optimization functions.
In previous reports, Marrero-Ponce et al. introduced outstanding features related to the topological (2D) and chiral (2.5D) aspects of molecules through the atom-based and bond-based TOMOCOMD-CARDD (acronym for Topological Molecular Computer Design – Computer Aided Rational Drug Design) molecular descriptors (MDs) (now condensed in the QuBiLS-MAS module) [30-39]. These MDs codify molecular information by means of bilinear, quadratic and linear algebraic forms and the graph-theoretical electronic-density matrices. Thus, bearing in mind these successful results and based on the same linear algebraic concepts, this manuscript is dedicated to the definition and generalization of the 3D algebraic-based QuBiLS-MIDAS (acronym for Quadratic, Bilinear and N-Linear Maps based on n-Tuple Spatial Metric [(Dis)-Similarity] Matrices and Atomic WeightingS) Duplex MDs for relations between atom-pairs, which constitute a module of the TOMOCOMD-CARDD framework.
2 THEORETICAL FRAMEWORK
2.1 Bilinear, Quadratic and Linear Form-based Indices for Atom-Level and Total (Whole-Molecule) Definitions
If a molecule is composed of n atoms, then the kth bilinear, quadratic and linear MDs for each atom “a” are computed as bilinear, quadratic and linear algebraic maps (forms) in $\mathbb{R}^n$, in the canonical basis set, and these are mathematically expressed as follows, respectively:
where n is the number of atoms of the chemical structure, u is a vector with all coefficients equal to 1, and $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$ are the coordinates (or components) of the molecular (or property) vectors x and y in a system of canonical (‘natural’) basis vectors of $\mathbb{R}^n$. The use of atom-based molecular vectors as representations of chemical structures has been explained in detail elsewhere [35-37, 40]. In the present report, the components of these molecular vectors are computed from the following atom- and fragment-based properties (weighting schemes): 1) atomic mass (m), 2) van der Waals volume (v), 3) atomic polarizability (p), 4) atomic Pauling electronegativity (e), 5) atomic Ghose-Crippen LogP (a) [23, 41, 42], 6) atomic Gasteiger-Marsili charge (c) [43], 7) atomic polar surface area (psa) [44], 8) atomic refractivity (r) [23, 41, 42], 9) atomic hardness (h) and 10) atomic softness (s). These properties were implemented in the QuBiLS-MIDAS program [45, 46], mainly using the Chemistry Development Kit (CDK) library [47].
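For orientation, the three atom-level forms described above can be written as double sums over atom pairs. The display below is only a sketch inferred from the surrounding definitions (the property vectors x and y, the unit vector u, and the atom-level coefficients $g_{ij}^{a,k}$); the left-hand-side symbols $b_a^k$, $q_a^k$ and $f_a^k$ are notation assumed here for illustration and may differ from the authors' own.

```latex
% Sketch of the atom-level bilinear, quadratic and linear forms (Eqs 1-3),
% inferred from the text; the left-hand-side names are assumptions.
\begin{align}
  b_a^k(x, y) &= \sum_{i=1}^{n} \sum_{j=1}^{n} g_{ij}^{a,k}\, x_i\, y_j \tag{1} \\
  q_a^k(x, x) &= \sum_{i=1}^{n} \sum_{j=1}^{n} g_{ij}^{a,k}\, x_i\, x_j \tag{2} \\
  f_a^k(u, x) &= \sum_{i=1}^{n} \sum_{j=1}^{n} g_{ij}^{a,k}\, u_i\, x_j,
  \qquad u_i = 1 \ \text{for all } i \tag{3}
\end{align}
```

Under this reading, the quadratic form is the bilinear form with y = x, and the linear form is the bilinear form with the first vector replaced by the all-ones vector u.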
The coefficients $g_{ij}^{a,k}$ are the elements of the kth two-tuple atom-level spatial-(dis)similarity matrix (SDSM) $\mathbb{G}^{a,k}$ for atom “a”. These are obtained from the coefficients $g_{ij}^{k}$ of $\mathbb{G}^{k}$ as follows:
$$
g_{ij}^{a,k} =
\begin{cases}
g_{ij}^{k} & \text{if } i = a \wedge j = a \\
\tfrac{1}{2}\, g_{ij}^{k} & \text{if } i = a \vee j = a \\
0 & \text{otherwise}
\end{cases}
\qquad (4)
$$
So, if a molecule is divided into its n atoms, then the matrix $\mathbb{G}^{k}$ can be divided into n atom-level matrices $\mathbb{G}^{a,k}$, in such a way that the kth power of the matrix $\mathbb{G}$ is exactly equal to the sum of the kth powers of the atom-level matrices $\mathbb{G}^{a,k}$. In this way, each $\mathbb{G}^{a,k}$ matrix determines an atom-level index for each atom “a”, which is denoted as $L_a$ (see Eqs 1-3). Accordingly, the total (whole-molecule, that is, considering all atoms) bilinear, quadratic and linear indices may be represented as a vector L of size n, where each entry $L_a$ corresponds to the kth bilinear, quadratic or linear atom-level index (descriptor) for the atom “a”.
Therefore, from this decomposition, the total bilinear, quadratic and linear indices are calculated as a linear combination (summation) of the atom-level indices (the values of the vector L). Generalizations of this approach using several aggregation operators will be discussed later (see section 2.6). The summation over L is equivalent to the product of the property vector $[X]^T$ (or $[U]^T$), the matrix $\mathbb{G}^{k}$ and the property vector $[Y]$, analogous to the original approach for 2D global bilinear, quadratic and linear algebraic forms [37, 48, 49], as shown in Eqs 5-7 (see also
where [X] and [Y] are column vectors (n×1 matrices) corresponding to the coordinates of the vectors x and y in the canonical basis of $\mathbb{R}^n$, while $[U]^T$ and $[X]^T$ (1×n matrices) are the transposes of the vector [U] and the property vector [X], respectively.
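The decomposition in Eq 4 and the equivalence between the summed atom-level indices and the matrix product can be checked numerically. The following is a minimal sketch, assuming NumPy; the function names, the 3×3 matrix and the property values are hypothetical and chosen only for illustration, and `g` stands for an already computed $\mathbb{G}^{k}$.

```python
import numpy as np

def atom_level_matrices(g):
    """Split an n x n matrix into n atom-level matrices following Eq 4.

    For atom a: the diagonal entry g[a, a] is kept whole, every other
    entry in row a or column a is halved, and all remaining entries are 0.
    """
    g = np.asarray(g, dtype=float)
    parts = []
    for a in range(g.shape[0]):
        ga = np.zeros_like(g)
        ga[a, :] = 0.5 * g[a, :]   # row a: half of each pair entry
        ga[:, a] = 0.5 * g[:, a]   # column a: half of each pair entry
        ga[a, a] = g[a, a]         # diagonal entry is not halved
        parts.append(ga)
    return parts

def atom_level_indices(g, left, right):
    """Vector L of atom-level indices, L_a = left^T G^(a,k) right."""
    return np.array([left @ ga @ right for ga in atom_level_matrices(g)])

# Hypothetical example data (not taken from the paper).
g = np.array([[0.0, 1.5, 2.0],
              [1.5, 1.0, 0.5],
              [2.0, 0.5, 2.0]])
x = np.array([12.0, 16.0, 14.0])   # e.g. one atomic property
y = np.array([2.55, 3.44, 3.04])   # e.g. a second atomic property
u = np.ones(3)                     # all-ones vector for the linear form

L_bilinear = atom_level_indices(g, x, y)
L_quadratic = atom_level_indices(g, x, x)
L_linear = atom_level_indices(g, u, x)

# Totals obtained by summing the atom-level indices.
total_bilinear = L_bilinear.sum()
total_quadratic = L_quadratic.sum()
total_linear = L_linear.sum()
```

In this sketch the totals coincide with `x @ g @ y`, `x @ g @ x` and `u @ g @ x`, which is the matrix-product form the text refers to; replacing `.sum()` with another reduction would correspond to the aggregation operators mentioned for section 2.6.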
Finally, $\mathbb{G}^{k}$ is the kth two-tuple spatial-(dis)similarity matrix for a molecule, which constitutes a generalization of the well-known geometric distance matrix [20, 50]. The geometric distance matrix (or geometry matrix) of a molecule is a square symmetric n×n matrix, where each entry $r_{ij}$ is computed as the Euclidean distance (geometric distance) between the atoms i and j, and the diagonal entries are always zero [12, 20, 50, 51]. In the present report, several approaches are proposed as an extension/generalization of the traditionally used geometric distance matrix. These will be discussed in the next section.
2.2 The Two-Tuple Spatial-(Dis)Similarity Matrix (SDSM) and its Physicochemical Nature
Keen interest in the codification of the geometric and topographic aspects of molecular structures, as a logical extension of the topological representation, can be traced back to the mid-1980s. This approach codifies information related to the molecular geometry represented by a geometric distance matrix [12, 19, 20, 24]. As previously mentioned, the geometric distance matrix uses the Euclidean distance to codify inter-atomic interactions within a molecule.
Formally, let N be a set of elements, a function D:
4. D(a,b) ≤ D(a,c) + D(c,b) (triangle inequality)
If D satisfies properties 1-3, it is called a distance on N, while if D satisfies properties 1-4, it is called a distance metric. On the other hand, if D satisfies axioms 1, 2 and 4, it is called a pseudometric, and if D does not satisfy property 4, it is a nonmetric.
To compute the distance between two atoms, the 3D Cartesian coordinates x, y, z are considered. These coordinates are continuous variables, and the Euclidean metric is the most common measure employed to compute the distance for this type of variable. It is striking that, up to now, the Euclidean distance has been considered practically the exclusive inter-atomic metric in the computation of 3D MDs, although there is no evidence other than intuitive reasoning that upholds it as the most suitable distance metric. Therefore, if a molecule is in a Euclidean space, and taking into account the previous distance and metric definitions, it is then possible to generalize the distance between the atoms i and j through the Minkowski distance (Eq 8). The “Minkowski distance matrix” with elements defined in Eq 8 is the more general (extended or expanded) case of the well-known geometric distance matrix (recovered if p = 2). However, there exist numerous metrics that have been used successfully in machine learning algorithms and similarity studies [53-55] that could be used to compute the inter-atomic dissimilarity, thereby serving as generalization schemes for the spatial distance $g_{ij}$ for atoms i and j. Table 1 shows the set of metrics used in this report for the computation of the inter-atomic geometric distance.
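As an illustration of how a Minkowski-type inter-atomic distance matrix generalizes the Euclidean one, the following minimal sketch (assuming NumPy) computes $d_{ij} = (\sum_h |c_{ih} - c_{jh}|^p)^{1/p}$ over the three Cartesian components, with the maximum absolute difference used for p = ∞. The function name and the coordinates are hypothetical; this is the standard Minkowski formula, not necessarily the exact form of the paper's Eq 8.

```python
import numpy as np

def minkowski_matrix(coords, p=2.0):
    """Pairwise Minkowski distances between atoms.

    coords: (n, 3) array of Cartesian coordinates.
    p: order of the distance; p = 1 is Manhattan, p = 2 is Euclidean,
       p = inf is the maximum (Chebyshev) distance.
    """
    coords = np.asarray(coords, dtype=float)
    # Absolute coordinate differences for every atom pair, shape (n, n, 3).
    diff = np.abs(coords[:, None, :] - coords[None, :, :])
    if np.isinf(p):
        return diff.max(axis=2)
    return (diff ** p).sum(axis=2) ** (1.0 / p)

# Hypothetical coordinates (not taken from the paper).
coords = np.array([[0.0, 0.0, 0.0],
                   [3.0, 4.0, 0.0],
                   [1.0, 1.0, 1.0]])

d_euclidean = minkowski_matrix(coords, p=2)
d_manhattan = minkowski_matrix(coords, p=1)
d_chebyshev = minkowski_matrix(coords, p=np.inf)
d_fractional = minkowski_matrix(coords, p=0.5)
```

Note that for p < 1 the triangle inequality no longer holds in general, so such values give a dissimilarity measure rather than a metric in the strict sense defined above.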
So, why use diverse (dis)-similarity metrics? The values obtained from different metrics may exhibit a high degree of correlation, indicative of the similarity between the objects under study, as shown by Holliday et al. in a comparative analysis of the Cosine and Tanimoto coefficients [56]. Conversely, if these values show low correlations among them, this may reflect very different features of the objects being compared. Therefore, it must not be assumed as a premise that any single “best” distance metric exists, even if this report is only addressed to the domain of chemical structure handling. In fact, as noted by Jones and Curtice [57] in a debate regarding the association between indexing terms in information retrieval systems: “What is annoying is that no clear-cut criterion for choice among the alternatives has emerged. As a result, few candidate measures have been permanently dismissed from consideration, and a rather large set of formulas remains available.” Accordingly, there is a continuing interest in analyzing and comparing the available coefficients (metrics) in order to ensure that the most suitable one(s) are used in any concrete similarity-based system. In conclusion, the use of several (dis)-similarity metrics is necessary because they have some degree of orthogonality; thus, the corresponding MDs will carry independent information, which will also be highly complementary because each metric reflects very different characteristics of the atom-pairs in a molecule.
On the other hand, with the aim of taking into account close and distant inter-atomic interactions within the molecular skeleton, we adopt a generalized expression for the kth two-tuple spatial-(dis)similarity matrix, denoted by $\mathbb{G}^{k}$, where the superscript k indicates the power to which the $\mathbb{G}$ matrix is raised. In this sense, for k = 0, the matrix $\mathbb{G}^{0}$ has each entry equal to 1; for k = 1, the elements $g_{ij}^{1}$ of the matrix $\mathbb{G}^{1}$ represent the (dis)-similarity between atoms i and j (see Table 1). Furthermore, to achieve greater discrimination of molecular structures, the diagonal entries can be assigned one of two different values: 1) the number of lone-pair electrons of each atom, or 2) the geometric distance, $g_{io}$, between each atom i and the center of the molecule, o. In this case, the elements $g_{ij}^{1}$ of the matrix $\mathbb{G}^{1}$ are defined as follows:
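$$
g_{ij}^{1} =
\begin{cases}
D_{ij} & \text{if } i \neq j \wedge i, j \text{ are atoms of the molecule} \\
L_{ij} \ (\text{or } D_{io}) & \text{if } i = j \wedge \text{lone-pairs are considered} \\
0 & \text{otherwise}
\end{cases}
\qquad (9)
$$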
where $D_{ij}$ is the (dis)-similarity between atoms i and j (see Table 1), and $L_{ij}$ could be: 1) the lone-pair (electron) number on the atom i, or 2) the (dis)-similarity between atom i and the center of the molecule, $g_{io}$ ($D_{io}$).
Table 1 Metrics used to compute the “distance” between two atoms of a molecule
Minkowski (M1-M7): p = 0.25, 0.5, 1, 1.5, 2, 2.5, 3, and ∞ [where p = 1 is the Manhattan, city-block or taxi distance (also known as the Hamming distance between binary vectors) and p = 2 is the Euclidean distance]

The matrices $\mathbb{G}^{k}$ (k ≥ 2) are calculated by multiplying the elements of the matrix $\mathbb{G}^{k-1}$ by the elements of the matrix $\mathbb{G}^{1}$, in such a way that the elements of the matrix $\mathbb{G}^{k}$ are equal to $(g_{ij}^{1})^{k}$ for all atom-pairs i, j of a molecule. When no normalizing procedure is employed for the elements of $\mathbb{G}^{k}$ (see section 2.3 below), this matrix is designated as the kth non-stochastic two-tuple spatial-(dis)similarity matrix (NS-SDSM, $\mathbb{G}_{ns}^{k}$). The matrix $\mathbb{G}_{ns}^{k}$ can be classified as a generalized matrix because it is determined through the Hadamard product, that is, by raising the elements of the matrix to different real powers [20]. In addition, generalized reciprocal matrices, where k takes negative values (k ≤ -1), are also employed. That is to say, the matrices employed in this report are calculated by raising the matrix coefficients to both positive and negative exponents. When the matrix exponent is negative and the number of lone pairs of each atom i in the molecule is selected as the diagonal element, the reciprocal is not applied to the diagonal. Nonetheless, the reciprocal is computed if the (dis)-similarity between each atom i and the center of the molecule is chosen as the diagonal coefficient.
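The element-wise (Hadamard) powers described above can be sketched as follows, assuming NumPy. The function name and the example matrix are hypothetical. Two choices in this sketch are assumptions not stated in the text: zero entries are left at zero for negative k (to avoid division by zero), and a lone-pair diagonal is raised to |k| when k is negative, as one possible reading of "the reciprocal is not applied".

```python
import numpy as np

def ns_sdsm_power(g1, k, lone_pair_diagonal=True):
    """kth non-stochastic matrix from the first-order matrix g1.

    g1: (n, n) matrix with (dis)-similarities off the diagonal and either
        lone-pair counts or atom-to-center values on the diagonal.
    k:  integer exponent (positive, zero or negative).
    """
    g1 = np.asarray(g1, dtype=float)
    if k == 0:
        # For k = 0 every entry equals 1, as stated in the text.
        return np.ones_like(g1)
    gk = np.zeros_like(g1)
    nonzero = g1 != 0
    # Element-wise power; zero entries stay zero (assumption for k < 0).
    gk[nonzero] = g1[nonzero] ** k
    if k < 0 and lone_pair_diagonal:
        # Assumption: the lone-pair diagonal is not inverted, so it is
        # raised to |k| instead of k.
        np.fill_diagonal(gk, np.diag(g1) ** abs(k))
    return gk

# Hypothetical first-order matrix: distances off-diagonal, lone-pair
# counts (3 and 0) on the diagonal.
g1 = np.array([[3.0, 2.0],
               [2.0, 0.0]])

g_k2 = ns_sdsm_power(g1, 2)
g_km1_lone_pairs = ns_sdsm_power(g1, -1, lone_pair_diagonal=True)
g_km1_center = ns_sdsm_power(g1, -1, lone_pair_diagonal=False)
g_k0 = ns_sdsm_power(g1, 0)
```

With `lone_pair_diagonal=False` the diagonal is treated like any other entry, which corresponds to the case where the atom-to-center (dis)-similarity is used and the reciprocal is computed.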
The use of the elements of the matrix $\mathbb{G}_{ns}^{k}$ and of its corresponding reciprocal for computing the bilinear, quadratic and linear indices is based on the physicochemical nature of distinct non-covalent interactions, such as van der Waals terms, gravitational interactions, the Coulomb potential and so on. Indeed, the kth power of the matrix $\mathbb{G}_{ns}$ is related to the powers of its coefficients, where k = 0, ±1, ±2, ±3, …, ±12. These exponents take into account the different interactions between atoms in a molecule; for example, for k = ±1 and k = ±2, the $\mathbb{G}_{ns}^{k}$ matrix reflects interactions like Coulombic and/or gravitational ones, respectively. The maximum k value, ±12, is related to non-bonded (mainly steric) interactions associated with the functional form of the Lennard-Jones 6-12 potential, as in most CoMFA-like studies.
2.3 Normalization Formalisms based on Simple-Stochastic, Double-Stochastic and Mutual Probability Schemes
Matrices constitute the most common mathematical representation to codify structural information of molecules [20]. Of particular interest are the matrices related to molecular geometry, such as the geometry matrix, the molecular influence matrix, and others, which serve as a starting point for the calculation of many 3D-MDs. However, it is unusual to apply probabilistic transformations to matrices in general. As exceptions to this rule, stochastic matrices are defined in the framework of the MARCH-INSIDE descriptors [58, 59], the TOMOCOMD-CARDD 2D descriptors (now condensed in the QuBiLS-MAS module of the TOMOCOMD-CARDD software) [33, 60], and in walk counts (random walk Markov matrix). In addition, Carbo-Dorca [61] also employed a stochastic scaling by means of a simple stochastic transformation. This transformation was applied to Quantum Similarity Matrices (QSM), providing a stochastic QSM. In these methods a simple stochastic scaling has been employed, where the sum of the coefficients of each row is used as a scale factor. In this way, unsymmetrical matrices whose rows can be interpreted as discrete probability distributions are created.
Formally, stochastic matrices are square matrices where each column sum (left stochastic matrices) or each row sum (right stochastic matrices) is equal to 1; that is, the coefficients of each column or row consist of non-negative real numbers that can be interpreted as probabilities [62]. In contrast, the MDs defined to date do not use the double-stochastic matrix, which is a stochastic matrix where the elements of each column and each row sum to 1.
With the aim of normalizing the “extended” kth non-stochastic two-tuple spatial-(dis)similarity matrix, $\mathbb{G}_{ns}^{k}$ (NS-SDSM), three probability schemes are applied. These schemes are associated with inter-atomic interactions in the chemical structure. For the TOMOCOMD-CARDD 2D and 2.5D indices (QuBiLS-MAS program), the stochastic graph-theoretical electronic-density matrix of a molecule describes changes in the electron distribution over time throughout the molecular backbone. In this scheme, a hypothetical case is considered in which a set of atoms is initially free in space (discrete objects in space). Later, the outer-shell electrons of the atoms are distributed around the atomic cores in discrete time intervals. In this sense, the electrons of an arbitrary atom can move to other atoms at different discrete time periods throughout the chemical-bonding framework. In the geometrical approach, this matrix can be interpreted as the change in the probability of atoms in a molecule interacting with each other. Consequently, this probability can be considered a measure of the spreading of the atoms (taken as discrete objects) in space.
On this basis, the kth simple-stochastic two-tuple spatial-(dis)similarity matrix, $\mathbb{G}_{ss}^{k}$ (SS-SDSM), has been defined, which is obtained from $\mathbb{G}_{ns}^{k}$ as follows:
$$
{}^{ss}g_{ij}^{k} = \frac{g_{ij}^{k}}{\sum_{j=1}^{n} g_{ij}^{k}}
$$
where $g_{ij}^{k}$ are the coefficients of the kth power of the $\mathbb{G}_{ns}$ matrix, and the sum of the elements of the ith row of $\mathbb{G}_{ns}^{k}$ is called the k-order spatial-(dis)-similarity vertex degree of atom i (see Schemes 1 and 2).
However, this matrix is not necessarily symmetrical, in that the probability of atom i interacting with atom j differs from the probability of atom j interacting with atom i. With the purpose of equalizing the probabilities in both directions, a double-stochastic matrix is used, defined as a matrix with real non-negative entries whose column and row sums are 1. These matrices are therefore referred to as kth double-stochastic two-tuple spatial-(dis)similarity matrices, $\mathbb{G}_{ds}^{k}$ (DS-SDSM), and may be calculated from $\mathbb{G}_{ns}^{k}$ using equation 11 and the Sinkhorn-Knopp algorithm.
Finally, the kth mutual probability two-tuple spatial-(dis)similarity matrix, $\mathbb{G}_{mp}^{k}$ (MP-SDSM), is introduced. The elements of $\mathbb{G}_{mp}^{k}$ are obtained as follows:
$$
{}^{mp}g_{ij}^{k} = \frac{g_{ij}^{k}}{S} = \frac{g_{ij}^{k}}{\sum_{i=1}^{n}\sum_{j=1}^{n} g_{ij}^{k}}
$$
where ${}^{mp}g_{ij}^{k}$ denotes the mutual probability between vertices i and j, and S is the sample space. The sample space is computed by summing all elements of $\mathbb{G}_{ns}^{k}$. It should be pointed out that while the simple-stochastic probability scheme has been previously used in the TOMOCOMD-CARDD approach [33, 60], the double-stochastic probability and mutual probability schemes are presented here for the first time as alternative normalization strategies. Scheme 1 shows the steps followed in the computation of the NS-, SS-, DS- and MP-SDSMs.
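The three normalization schemes can be sketched in a few lines, assuming NumPy. The function names are hypothetical, and the double-stochastic routine is a plain alternating row/column scaling in the spirit of the Sinkhorn-Knopp algorithm named in the text, not necessarily the implementation used in the QuBiLS-MIDAS program; it assumes every row and column has a non-zero sum.

```python
import numpy as np

def simple_stochastic(g):
    """Divide every entry by its row sum (rows then sum to 1)."""
    g = np.asarray(g, dtype=float)
    row_sums = g.sum(axis=1, keepdims=True)
    # Rows whose sum is zero are left as zeros.
    return np.divide(g, row_sums, out=np.zeros_like(g), where=row_sums != 0)

def double_stochastic(g, tol=1e-10, max_iter=10000):
    """Alternate row and column scaling until rows and columns sum to 1."""
    m = np.asarray(g, dtype=float).copy()
    for _ in range(max_iter):
        m /= m.sum(axis=1, keepdims=True)   # scale rows
        m /= m.sum(axis=0, keepdims=True)   # scale columns
        if np.abs(m.sum(axis=1) - 1.0).max() < tol:
            break
    return m

def mutual_probability(g):
    """Divide every entry by the sum of all entries (the sample space S)."""
    g = np.asarray(g, dtype=float)
    return g / g.sum()

# k = 0 case for a 12-atom molecule: every NS entry equals 1.
ns_k0 = np.ones((12, 12))
ss_k0 = simple_stochastic(ns_k0)
ds_k0 = double_stochastic(ns_k0)
mp_k0 = mutual_probability(ns_k0)

# Hypothetical symmetric positive matrix for a second check.
ns_example = np.array([[1.0, 2.0, 3.0],
                       [2.0, 1.0, 4.0],
                       [3.0, 4.0, 2.0]])
ss_example = simple_stochastic(ns_example)
ds_example = double_stochastic(ns_example)
mp_example = mutual_probability(ns_example)
```

For the all-ones 12×12 matrix, the entries come out as 1/12 for the SS and DS matrices and 1/144 for the MP matrix, which agrees with the rounded values 0.083, 0.083 and 0.007 listed for k = 0 in Table 2.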
In order to illustrate the calculation of these matrices, the molecular structure of (E)-3-(4,5-dihydrooxazol-4-yl)-2-fluoro-3-(methylthio)acrylonitrile is considered. Table 2 depicts the zero (k = 0), first (k = 1), second (k = 2) and third (k = 3) powers of the NS-, SS-, DS- and MP-SDSMs for this molecular structure. An example of the computation of the atom-level SDSM is shown in Table 3 using the NS-SDSM, $\mathbb{G}_{ns}^{a,k}$, for k = 1.
2.4 Local-Fragment (Group, Atom-Type) Bilinear, Quadratic and Linear Algebraic Indices
In addition, the proposed matrices ($\mathbb{G}_{ns}^{k}$, $\mathbb{G}_{ss}^{k}$, $\mathbb{G}_{ds}^{k}$ and $\mathbb{G}_{mp}^{k}$) could be used to codify information on a specific molecular fragment (F) of the molecule. Therefore, an SDSM for the molecular fragment F, $\mathbb{G}_{F}^{k}$, is obtained from the total matrix $\mathbb{G}^{k}$. The elements $g_{ijF}^{k}$ of the local-fragment matrix are defined as follows:
Similar to the total atom-level indices (see Eqs 1-3), the local-fragment two-tuple atom-level indices are computed as a vector of LOVIs, $L_{F}$, where each entry $L_{aF}$ corresponds to the value of a local-fragment index for the atom “a” considered. These indices are defined as follows:
Table 2 A) Chemical structure of (E)-3-(4,5-dihydrooxazol-4-yl)-2-fluoro-3-(methylthio)acrylonitrile and its labeled molecular scaffold. B), C), D) and E) The zero (k = 0), first (k = 1), second (k = 2) and third (k = 3) powers of the non-stochastic (NS), simple-stochastic (SS), double-stochastic (DS) and mutual probability (MP) spatial-(dis)similarity matrices (SDSM) of the molecule, respectively.
A) 2D (left) and 3D (right) molecular structure
[Figure: 2D and 3D drawings of the molecule]
B) NS-, SS-, DS- and MP-SDSM, $\mathbb{G}^{k}$, for k = 0 (12 × 12 matrices with all entries identical)
NS-SDSM: every entry equals 1.000
SS-SDSM: every entry equals 0.083
DS-SDSM: every entry equals 0.083
MP-SDSM: every entry equals 0.007
C) NS-, SS-, DS- and MP-SDSM, $\mathbb{G}^{k}$, for k = 1

NS-SDSM
0.000 1.111 1.957 2.958 2.283 3.797 1.977 1.083 1.672 1.630 1.064 2.173
1.111 0.000 1.074 1.925 1.798 2.731 1.258 1.847 2.498 2.400 1.739 2.145
1.957 1.074 0.000 1.065 0.963 1.928 2.021 2.257 2.807 2.829 2.411 3.154
2.958 1.925 1.065 0.000 1.640 0.863 2.376 3.310 3.853 3.823 3.362 3.694
2.283 1.798 0.963 1.640 3.000 2.400 2.928 2.137 2.506 2.726 2.626 3.937
3.797 2.731 1.928 0.863 2.400 1.000 2.937 4.169 4.705 4.650 4.174 4.286
1.977 1.258 2.021 2.376 2.928 2.937 2.000 2.919 3.599 3.358 2.513 1.379
1.083 1.847 2.257 3.310 2.137 4.169 2.919 0.000 1.022 1.618 1.663 3.209
1.672 2.498 2.807 3.853 2.506 4.705 3.599 1.022 2.000 0.963 1.567 3.743
1.630 2.400 2.829 3.823 2.726 4.650 3.358 1.618 0.963 0.000 0.915 3.362
1.064 1.739 2.411 3.362 2.626 4.174 2.513 1.663 1.567 0.915 1.000 2.480
2.173 2.145 3.154 3.694 3.937 4.286 1.379 3.209 3.743 3.362 2.480 0.000

SS-SDSM
0.000 0.051 0.090 0.136 0.105 0.175 0.091 0.050 0.077 0.075 0.049 0.100
0.054 0.000 0.052 0.094 0.088 0.133 0.061 0.090 0.122 0.117 0.085 0.105
0.087 0.048 0.000 0.047 0.043 0.086 0.090 0.100 0.125 0.126 0.107 0.140
0.102 0.067 0.037 0.000 0.057 0.030 0.082 0.115 0.133 0.132 0.116 0.128
0.079 0.062 0.033 0.057 0.104 0.083 0.101 0.074 0.087 0.094 0.091 0.136
0.101 0.073 0.051 0.023 0.064 0.027 0.078 0.111 0.125 0.124 0.111 0.114
0.068 0.043 0.069 0.081 0.100 0.100 0.068 0.100 0.123 0.115 0.086 0.047
0.043 0.073 0.089 0.131 0.085 0.165 0.116 0.000 0.040 0.064 0.066 0.127
0.054 0.081 0.091 0.125 0.081 0.152 0.116 0.033 0.065 0.031 0.051 0.121
0.058 0.085 0.100 0.135 0.096 0.164 0.119 0.057 0.034 0.000 0.032 0.119
0.042 0.068 0.094 0.132 0.103 0.164 0.099 0.065 0.061 0.036 0.039 0.097
0.065 0.064 0.094 0.110 0.117 0.128 0.041 0.096 0.112 0.100 0.074 0.000

DS-SDSM
0.000 0.075 0.117 0.132 0.105 0.129 0.089 0.059 0.073 0.078 0.057 0.085
0.075 0.000 0.067 0.090 0.087 0.097 0.060 0.105 0.115 0.120 0.097 0.087
0.117 0.067 0.000 0.044 0.041 0.060 0.085 0.114 0.114 0.125 0.119 0.114
0.132 0.090 0.044 0.000 0.052 0.020 0.074 0.124 0.116 0.126 0.124 0.099
0.105 0.087 0.041 0.052 0.099 0.058 0.094 0.083 0.078 0.093 0.100 0.109
0.129 0.097 0.060 0.020 0.058 0.018 0.070 0.119 0.108 0.117 0.117 0.088
0.089 0.060 0.085 0.074 0.094 0.070 0.063 0.111 0.110 0.112 0.094 0.038
0.059 0.105 0.114 0.124 0.083 0.119 0.111 0.000 0.038 0.065 0.075 0.105
0.073 0.115 0.114 0.116 0.078 0.108 0.110 0.038 0.060 0.031 0.057 0.099
0.078 0.120 0.125 0.126 0.093 0.117 0.112 0.065 0.031 0.000 0.036 0.097
0.057 0.097 0.119 0.124 0.100 0.117 0.094 0.075 0.057 0.036 0.044 0.080
0.085 0.087 0.114 0.099 0.109 0.088 0.038 0.105 0.099 0.097 0.080 0.000

MP-SDSM
0.000 0.003 0.006 0.009 0.007 0.011 0.006 0.003 0.005 0.005 0.003 0.007
0.003 0.000 0.003 0.006 0.005 0.008 0.004 0.006 0.008 0.007 0.005 0.006
0.006 0.003 0.000 0.003 0.003 0.006 0.006 0.007 0.008 0.008 0.007 0.009
0.009 0.006 0.003 0.000 0.005 0.003 0.007 0.010 0.012 0.011 0.010 0.011
0.007 0.005 0.003 0.005 0.009 0.007 0.009 0.006 0.008 0.008 0.008 0.012
0.011 0.008 0.006 0.003 0.007 0.003 0.009 0.013 0.014 0.014 0.013 0.013
0.006 0.004 0.006 0.007 0.009 0.009 0.006 0.009 0.011 0.010 0.008 0.004
0.003 0.006 0.007 0.010 0.006 0.013 0.009 0.000 0.003 0.005 0.005 0.010
0.005 0.008 0.008 0.012 0.008 0.014 0.011 0.003 0.006 0.003 0.005 0.011
0.005 0.007 0.008 0.011 0.008 0.014 0.010 0.005 0.003 0.000 0.003 0.010
0.003 0.005 0.007 0.010 0.008 0.013 0.008 0.005 0.005 0.003 0.003 0.007
0.007 0.006 0.009 0.011 0.012 0.013 0.004 0.010 0.011 0.010 0.007 0.000
E) NS-, SS-, DS- and MP-SDSM, $\mathbb{G}^{k}$, for k = 2
0.000 1.235 3.829 8.749 5.212 14.419 3.909 1.172 2.796 2.655 1.132 4.723 0.000 0.025 0.077 0.176 0.105 0.289 0.078 0.024 0.056 0.053 0.023 0.095 1.235 0.000 1.153 3.706 3.231 7.460 1.583 3.411 6.242 5.758 3.025 4.601 0.030 0.000 0.028 0.090 0.078 0.180 0.038 0.082 0.151 0.139 0.073 0.111 3.829 1.153 0.000 1.134 0.927 3.719 4.083 5.095 7.880 8.004 5.812 9.947 0.074 0.022 0.000 0.022 0.018 0.072 0.079 0.099 0.153 0.155 0.113 0.193 8.749 3.706 1.134 0.000 2.688 0.745 5.646 10.959 14.843 14.619 11.305 13.648 0.099 0.042 0.013 0.000 0.031 0.008 0.064 0.124 0.169 0.166 0.128 0.155 5.212 3.231 0.927 2.688 9.000 5.760 8.572 4.568 6.279 7.431 6.895 15.499 0.069 0.042 0.012 0.035 0.118 0.076 0.113 0.060 0.083 0.098 0.091 0.204 14.419 7.460 3.719 0.745 5.760 1.000 8.624 17.382 22.133 21.622 17.419 18.373 0.104 0.054 0.027 0.005 0.042 0.007 0.062 0.125 0.160 0.156 0.126 0.133 3.909 1.583 4.083 5.646 8.572 8.624 4.000 8.518 12.952 11.279 6.317 1.900 0.051 0.020 0.053 0.073 0.111 0.111 0.052 0.110 0.167 0.146 0.082 0.025 1.172 3.411 5.095 10.959 4.568 17.382 8.518 0.000 1.044 2.617 2.767 10.296 0.017 0.050 0.075 0.162 0.067 0.256 0.126 0.000 0.015 0.039 0.041 0.152 2.796 6.242 7.880 14.843 6.279 22.133 12.952 1.044 4.000 0.927 2.456 14.013 0.029 0.065 0.082 0.155 0.066 0.232 0.136 0.011 0.042 0.010 0.026 0.147 2.655 5.758 8.004 14.619 7.431 21.622 11.279 2.617 0.927 0.000 0.837 11.301 0.031 0.066 0.092 0.168 0.085 0.248 0.130 0.030 0.011 0.000 0.010 0.130 1.132 3.025 5.812 11.305 6.895 17.419 6.317 2.767 2.456 0.837 1.000 6.150 0.017 0.046 0.089 0.174 0.106 0.268 0.097 0.042 0.038 0.013 0.015 0.094 4.723 4.601 9.947 13.648 15.499 18.373 1.900 10.296 14.013 11.301 6.150 0.000 0.043 0.042 0.090 0.124 0.140 0.166 0.017 0.093 0.127 0.102 0.056 0.000
0.000 0.060 0.134 0.163 0.123 0.166 0.090 0.036 0.057 0.060 0.035 0.076 0.000 0.001 0.004 0.009 0.005 0.015 0.004 0.001 0.003 0.003 0.001 0.005 0.060 0.000 0.045 0.077 0.086 0.096 0.041 0.116 0.144 0.145 0.105 0.083 0.001 0.000 0.001 0.004 0.003 0.008 0.002 0.004 0.007 0.006 0.003 0.005 0.134 0.045 0.000 0.017 0.018 0.035 0.076 0.125 0.131 0.145 0.145 0.129 0.004 0.001 0.000 0.001 0.001 0.004 0.004 0.005 0.008 0.008 0.006 0.010 0.163 0.077 0.017 0.000 0.027 0.004 0.055 0.142 0.130 0.140 0.150 0.094 0.009 0.004 0.001 0.000 0.003 0.001 0.006 0.012 0.016 0.015 0.012 0.014 0.123 0.086 0.018 0.027 0.116 0.036 0.107 0.075 0.070 0.091 0.116 0.136 0.005 0.003 0.001 0.003 0.009 0.006 0.009 0.005 0.007 0.008 0.007 0.016 0.166 0.096 0.035 0.004 0.036 0.003 0.052 0.139 0.120 0.128 0.143 0.078 0.015 0.008 0.004 0.001 0.006 0.001 0.009 0.018 0.023 0.023 0.018 0.019 0.090 0.041 0.076 0.055 0.107 0.052 0.049 0.136 0.141 0.134 0.103 0.016 0.004 0.002 0.004 0.006 0.009 0.009 0.004 0.009 0.014 0.012 0.007 0.002 0.036 0.116 0.125 0.142 0.075 0.139 0.136 0.000 0.015 0.041 0.060 0.116 0.001 0.004 0.005 0.012 0.005 0.018 0.009 0.000 0.001 0.003 0.003 0.011 0.057 0.144 0.131 0.130 0.070 0.120 0.141 0.015 0.039 0.010 0.036 0.107 0.003 0.007 0.008 0.016 0.007 0.023 0.014 0.001 0.004 0.001 0.003 0.015
where $g_{ijF}^{a,k}$ is the kth element corresponding to row “i” and column “j” of the local-fragment two-tuple atom-level matrix, $\mathbb{G}_{F}^{a,k}$, according to the atom “a”. This matrix is computed for each atom of the molecule from the local-fragment matrix $\mathbb{G}_{F}^{k}$, which contains information regarding the distances between each atom pair belonging to the molecular fragment (F) considered. Summation over the vector $L$ of LOVIs yields the kth local-fragment bilinear, quadratic and linear indices for atom-types or groups (see Schemes 1 and 2). In this report, these local MDs can be calculated on seven chemical (or functional) groups in the molecule: hydrogen bond acceptors (A), carbon atoms in aliphatic chains (C), hydrogen bond donors (D), halogens (G), terminal methyl groups (M), carbon atoms in aromatic portions (P) and heteroatoms (O, N and S in all valence states, denoted as X).
Up to this section, we have used the summation of the total atom-level contributions and local-fragment atom-level contributions as the only operator to obtain the kth total (or local-fragment) NS-, SS-, DS- and MP-bilinear, quadratic and linear molecular indices. In subsection 2.6, we propose alternative strategies (invariants) for obtaining indices from LOVIs other than the summation.
(Table 2) contd…
F) NS-, SS-, DS- and MP-SDSM, $\mathbb{G}^{k}$, for k = 2
0.060 0.145 0.145 0.140 0.091 0.128 0.134 0.041 0.010 0.000 0.013 0.094 0.003 0.006 0.008 0.015 0.008 0.023 0.012 0.003 0.001 0.000 0.001 0.012 0.035 0.105 0.145 0.150 0.116 0.143 0.103 0.060 0.036 0.013 0.022 0.071 0.001 0.003 0.006 0.012 0.007 0.018 0.007 0.003 0.003 0.001 0.001 0.006 0.076 0.083 0.129 0.094 0.136 0.078 0.016 0.116 0.107 0.094 0.071 0.000 0.005 0.005 0.010 0.014 0.016 0.019 0.002 0.011 0.015 0.012 0.006 0.000
G) NS-, SS-, DS- and MP-SDSM, $\mathbb{G}^{k}$, for k = 3
0.00 1.37 7.49 25.88 11.90 54.75 7.73 1.27 4.67 4.33 1.21 10.26 0.000 0.010 0.057 0.198 0.091 0.418 0.059 0.010 0.036 0.033 0.009 0.078 1.37 0.00 1.24 7.14 5.81 20.37 1.99 6.30 15.60 13.82 5.26 9.87 0.015 0.000 0.014 0.080 0.065 0.230 0.022 0.071 0.176 0.156 0.059 0.111 7.49 1.24 0.00 1.21 0.89 7.17 8.25 11.50 22.12 22.64 14.01 31.37 0.059 0.010 0.000 0.009 0.007 0.056 0.065 0.090 0.173 0.177 0.110 0.245 25.88 7.14 1.21 0.00 4.41 0.64 13.42 36.28 57.19 55.89 38.01 50.42 0.089 0.025 0.004 0.000 0.015 0.002 0.046 0.125 0.197 0.192 0.131 0.174 11.90 5.81 0.89 4.41 27.00 13.82 25.10 9.76 15.73 20.26 18.11 61.02 0.056 0.027 0.004 0.021 0.126 0.065 0.117 0.046 0.074 0.095 0.085 0.285 54.75 20.37 7.17 0.64 13.82 1.00 25.33 72.47 104.13 100.54 72.70 78.76 0.099 0.037 0.013 0.001 0.025 0.002 0.046 0.131 0.189 0.182 0.132 0.143 7.73 1.99 8.25 13.42 25.10 25.33 8.00 24.86 46.61 37.88 15.88 2.62 0.036 0.009 0.038 0.062 0.115 0.116 0.037 0.114 0.214 0.174 0.073 0.012 1.27 6.30 11.50 36.28 9.76 72.47 24.86 0.00 1.07 4.23 4.60 33.04 0.006 0.031 0.056 0.177 0.048 0.353 0.121 0.000 0.005 0.021 0.022 0.161 4.67 15.60 22.12 57.19 15.73 104.13 46.61 1.07 8.00 0.89 3.85 52.46 0.014 0.047 0.067 0.172 0.047 0.313 0.140 0.003 0.024 0.003 0.012 0.158 4.33 13.82 22.64 55.89 20.26 100.54 37.88 4.23 0.89 0.00 0.77 37.99 0.014 0.046 0.076 0.187 0.068 0.336 0.127 0.014 0.003 0.000 0.003 0.127 1.21 5.26 14.01 38.01 18.11 72.70 15.88 4.60 3.85 0.77 1.00 15.25 0.006 0.028 0.073 0.199 0.095 0.381 0.083 0.024 0.020 0.004 0.005 0.080 10.26 9.87 31.37 50.42 61.02 78.76 2.62 33.04 52.46 37.99 15.25 0.00 0.027 0.026 0.082 0.132 0.159 0.206 0.007 0.086 0.137 0.099 0.040 0.000
0.000 0.047 0.146 0.189 0.137 0.200 0.087 0.020 0.042 0.043 0.020 0.067 0.000 0.000 0.002 0.009 0.004 0.018 0.003 0.000 0.002 0.001 0.000 0.003 0.047 0.000 0.030 0.064 0.082 0.091 0.028 0.125 0.173 0.170 0.110 0.080 0.000 0.000 0.000 0.002 0.002 0.007 0.001 0.002 0.005 0.005 0.002 0.003 0.146 0.030 0.000 0.006 0.007 0.018 0.064 0.128 0.138 0.156 0.165 0.143 0.002 0.000 0.000 0.000 0.000 0.002 0.003 0.004 0.007 0.007 0.005 0.010 0.189 0.064 0.006 0.000 0.013 0.001 0.039 0.152 0.135 0.146 0.168 0.086 0.009 0.002 0.000 0.000 0.001 0.000 0.004 0.012 0.019 0.018 0.013 0.017 0.137 0.082 0.007 0.013 0.128 0.021 0.116 0.064 0.058 0.083 0.126 0.165 0.004 0.002 0.000 0.001 0.009 0.005 0.008 0.003 0.005 0.007 0.006 0.020 0.200 0.091 0.018 0.001 0.021 0.000 0.037 0.151 0.122 0.130 0.161 0.067 0.018 0.007 0.002 0.000 0.005 0.000 0.008 0.024 0.034 0.033 0.024 0.026 0.087 0.028 0.064 0.039 0.116 0.037 0.036 0.160 0.168 0.151 0.108 0.007 0.003 0.001 0.003 0.004 0.008 0.008 0.003 0.008 0.015 0.012 0.005 0.001 0.020 0.125 0.128 0.152 0.064 0.151 0.160 0.000 0.006 0.024 0.045 0.124 0.000 0.002 0.004 0.012 0.003 0.024 0.008 0.000 0.000 0.001 0.002 0.011 0.042 0.173 0.138 0.135 0.058 0.122 0.168 0.006 0.023 0.003 0.021 0.111 0.002 0.005 0.007 0.019 0.005 0.034 0.015 0.000 0.003 0.000 0.001 0.017 0.043 0.170 0.156 0.146 0.083 0.130 0.151 0.024 0.003 0.000 0.005 0.089 0.001 0.005 0.007 0.018 0.007 0.033 0.012 0.001 0.000 0.000 0.000 0.013 0.020 0.110 0.165 0.168 0.126 0.161 0.108 0.045 0.021 0.005 0.010 0.061 0.000 0.002 0.005 0.013 0.006 0.024 0.005 0.002 0.001 0.000 0.000 0.005 0.067 0.080 0.143 0.086 0.165 0.067 0.007 0.124 0.111 0.089 0.061 0.000 0.003 0.003 0.010 0.017 0.020 0.026 0.001 0.011 0.017 0.013 0.005 0.000
(E)-3-(4,5-dihydrooxazol-4-yl)-2-fluoro-3-(methylthio)acrylonitrile This matrix belongs to the total bilinear index, using the Euclidean distance and the properties
atoms of the molecule
A) NS-SDSM order 1
0.000 0.900 0.511 0.338 0.438 0.263 0.506 0.924 0.598 0.614 0.940 0.460
0.900 0.000 0.931 0.519 0.556 0.366 0.795 0.541 0.400 0.417 0.575 0.466
0.511 0.931 0.000 0.939 1.039 0.519 0.495 0.443 0.356 0.353 0.415 0.317
0.338 0.519 0.939 0.000 0.610 1.158 0.421 0.302 0.260 0.262 0.297 0.271
0.438 0.556 1.039 0.610 3.000 0.417 0.342 0.468 0.399 0.367 0.381 0.254
0.263 0.366 0.519 1.158 0.417 1.000 0.341 0.240 0.213 0.215 0.240 0.233
0.506 0.795 0.495 0.421 0.342 0.341 2.000 0.343 0.278 0.298 0.398 0.725
0.924 0.541 0.443 0.302 0.468 0.240 0.343 0.000 0.979 0.618 0.601 0.312
0.598 0.400 0.356 0.260 0.399 0.213 0.278 0.979 2.000 1.039 0.638 0.267
0.614 0.417 0.353 0.262 0.367 0.215 0.298 0.618 1.039 0.000 1.093 0.297
0.940 0.575 0.415 0.297 0.381 0.240 0.398 0.601 0.638 1.093 1.000 0.403
0.460 0.466 0.317 0.271 0.254 0.233 0.725 0.312 0.267 0.297 0.403 0.000
B) Atom-level NS-SDSM order 1 for all atoms of the molecule
0.000 0.450 0.256 0.169 0.219 0.132 0.253 0.462 0.299 0.307 0.470 0.230 0.000 0.450 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.450 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.450 0.000 0.466 0.260 0.278 0.183 0.397 0.271 0.200 0.208 0.287 0.233 0.256 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.466 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.169 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.260 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.219 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.278 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.132 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.183 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.253 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.397 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.462 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.271 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.299 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.200 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.307 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.208 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.470 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.287 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.230 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.233 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.256 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.169 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.466 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.260 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.256 0.466 0.000 0.469 0.519 0.259 0.247 0.222 0.178 0.177 0.207 0.159 0.000 0.000 0.000 0.469 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.469 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.169 0.260 0.469 0.000 0.305 0.579 0.210 0.151 0.130 0.131 0.149 0.135 0.000 0.000 0.519 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.305 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.259 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.579 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.247 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.210 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.222 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.151 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.178 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.130 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.177 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.131 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.207 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.149 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.159 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.135 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
C) Atom-level NS-SDSM order 1 for all atoms of the molecule
0.000 0.000 0.000 0.000 0.219 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.132 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.278 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.183 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.519 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.259 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.305 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.579 0.000 0.000 0.000 0.000 0.000 0.000 0.219 0.278 0.519 0.305 3.000 0.208 0.171 0.234 0.200 0.183 0.190 0.127 0.000 0.000 0.000 0.000 0.000 0.208 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.208 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.132 0.183 0.259 0.579 0.208 1.000 0.170 0.120 0.106 0.108 0.120 0.117
2.5 Constraints: Topological and Geometric Neighborhood Quotient Matrices
The geometry matrix (G) [20, 50] contains information related to the 3D molecular conformation and configuration, but it does not contain information about atom connectivity. Therefore, for several applications, the geometry matrix is accompanied by a connectivity table or several combinations with other “topological” or
(Table 3) contd…
D) Atom-level NS-SDSM order 1 for all atoms of the molecule
0.000 0.000 0.000 0.000 0.171 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.170 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.234 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.120 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.200 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.106 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.183 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.108 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.190 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.120 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.127 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.117 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.253 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.462 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.397 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.271 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.247 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.222 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.210 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.151 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.171 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.234 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.170 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.120 0.000 0.000 0.000 0.000 0.253 0.397 0.247 0.210 0.171 0.170 2.000 0.171 0.139 0.149 0.199 0.363 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.171 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.171 0.000 0.000 0.000 0.000 0.000 0.462 0.271 0.222 0.151 0.234 0.120 0.171 0.000 0.489 0.309 0.301 0.156 0.000 0.000 0.000 0.000 0.000 0.000 0.139 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.489 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.149 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.309 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.199 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.301 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.363 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.156 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.299 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.307 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.200 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.208 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.178 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.177 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.130 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.131 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.200 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.183 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.106 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.108 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.139 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.149 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.489 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.309 0.000 0.000 0.299 0.200 0.178 0.130 0.200 0.106 0.139 0.489 2.000 0.519 0.319 0.134 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.519 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.519 0.000 0.000 0.000 0.307 0.208 0.177 0.131 0.183 0.108 0.149 0.309 0.519 0.000 0.547 0.149 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.319 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.547 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.134 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.149 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.470 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.230 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.287 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.233 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.207 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.159 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.149 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.135 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.190 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.127 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.120 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.117 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.199 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.363 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.301 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.156 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.319 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.134 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.547 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.149 0.470 0.287 0.207 0.149 0.190 0.120 0.199 0.301 0.319 0.547 1.000 0.202 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.202 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.202 0.000 0.230 0.233 0.159 0.135 0.127 0.117 0.363 0.156 0.134 0.149 0.202 0.000
Table 4. Labeled chemical structure of (E)-3-(4,5-dihydrooxazol-4-yl)-2-fluoro-3-(methylthio)acrylonitrile and its local-fragment [for heteroatoms (X)] NS-SDSM matrices of order 1 using the Euclidean distance, with and without cut-offs (topological and geometrical thresholds), considering the lone-pair electrons in the main diagonal. A) Topological interaction at lag p, cut-off interval [2; 4-5]. For simplicity, only the interactions from the fluorine atom to the other atoms in the molecule are displayed, with discontinuous lines. The “heavy” and slightly discontinuous lines correspond to interactions of the F atom with heteroatoms (X) and with non-heteroatoms (C atoms), respectively. Blue, green and red lines denote contact between the F atom and other atoms at a topological distance of 2, 4 or 5. B) Tree of topological atomic distances of the annotated atom in (A). The root and leaves are labeled with the corresponding atom numbers. C)
1.14 0.90 0.48 0.82 3.00 2.40 2.93 1.07 2.51 1.36 2.63 0.00
0.00 1.37 0.96 0.43 2.40 1.00 2.94 0.00 0.00 0.00 0.00 0.00 0.99 0.63 1.01 1.19 2.93 2.94 2.00 1.46 0.00 0.00 2.51 0.69 0.00 0.00 0.00 0.00 1.07 0.00 1.46 0.00 0.51 0.00 0.83 0.00 0.84 1.25 1.40 0.00 2.51 0.00 0.00 0.51 2.00 0.48 1.57 0.00 0.00 0.00 0.00 0.00 1.36 0.00 0.00 0.00 0.48 0.00 0.46 0.00 0.53 0.87 1.21 0.00 2.63 0.00 2.51 0.83 1.57 0.46 1.00 1.24 0.00 0.00 0.00 0.00 0.00 0.00 0.69 0.00 0.00 0.00 1.24 0.00
1.14 0.90 0.48 0.82 3.00 2.40 2.93 1.07 2.51 1.36 2.63 1.97
1.90 1.37 0.96 0.43 2.40 1.00 2.94 2.08 4.70 2.32 4.17 2.14 0.99 0.63 1.01 1.19 2.93 2.94 2.00 1.46 3.60 1.68 2.51 0.69 0.00 0.00 0.00 0.00 1.07 2.08 1.46 0.00 0.51 0.00 0.83 0.00 0.84 1.25 1.40 1.93 2.51 4.70 3.60 0.51 2.00 0.48 1.57 1.87 0.00 0.00 0.00 0.00 1.36 2.32 1.68 0.00 0.48 0.00 0.46 0.00 0.53 0.87 1.21 1.68 2.63 4.17 2.51 0.83 1.57 0.46 1.00 1.24 0.00 0.00 0.00 0.00 1.97 2.14 0.69 0.00 1.87 0.00 1.24 0.00
“topographical” distances. In this sense, from the geometric ($r_{ij}$) or topographic ($t_{ij}$) distances, the geometry matrix can also be re-defined as a distance/distance matrix (D/D) in order to merge 2D and 3D information of the molecular structure in the same mathematical representation [20]. Other important matrices are the geometric distance/topological distance quotient matrix, represented as G/D, whose entries are computed by dividing the respective coefficients of the geometry matrix (G) by those of the graph distance matrix (D) (D/G constitutes the corresponding reciprocal of G/D). Other matrices that merge 2D and 3D chemical information are the distance–distance combined matrices, namely G^D (geometric distance–topological distance combined matrix), T^D (topographic distance–topological distance combined matrix) and D^G and D^T, which are the transposes of the representations G^D and T^D [65].
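As an illustration of the G/D quotient matrix described above, the following minimal Python sketch builds G from Cartesian coordinates, D from a bond adjacency matrix, and divides them entry by entry. The function names, the toy three-atom chain and the convention of setting the undefined diagonal (0/0) to zero are assumptions of this sketch, not definitions taken from the text.

```python
import numpy as np

def geometry_matrix(coords):
    """Pairwise Euclidean distances between atoms (geometry matrix G)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def topological_distance_matrix(adjacency):
    """Shortest-path bond counts between atoms (graph distance matrix D),
    computed with the Floyd-Warshall algorithm."""
    adjacency = np.asarray(adjacency)
    n = adjacency.shape[0]
    dist = np.where(adjacency > 0, 1.0, np.inf)
    np.fill_diagonal(dist, 0.0)
    for k in range(n):
        # Relax every pair (i, j) through the intermediate atom k.
        dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])
    return dist

def quotient_matrix(G, D):
    """Entry-wise G/D; the diagonal is left at zero (assumed convention)."""
    Q = np.zeros_like(G, dtype=float)
    off_diagonal = D > 0
    Q[off_diagonal] = G[off_diagonal] / D[off_diagonal]
    return Q

# Hypothetical bent three-atom chain: atoms 0-1 and 1-2 are bonded.
coords = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]

G = geometry_matrix(coords)
D = topological_distance_matrix(adjacency)
GD = quotient_matrix(G, D)
```

For bonded atoms the quotient equals the bond length, while for atoms 0 and 2 it is the through-space distance divided by the two-bond path length, so values well below the typical bond length flag pairs that are folded towards each other.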
In order to take into account only some inter-atomic interactions (for example, short-, middle- and large-contacts) in total or local-fragment indices, and thus account for the most relevant interactions, two different constraints are proposed:
1) Graph-theoretical cut-off (p), calculated from the topological distance at a lag p and represented as “path cut-off”.
2) Geometric cut-off (l), calculated from the Euclidean distance at a lag l and denoted as “length cut-off”.
The use of one or both cut-offs over the $\mathbb{G}^{1}$ matrix yields a new matrix: the two-tuple topological and geometric neighborhood quotient SDSM$^{1}$, denoted as $\mathbb{NQG}^{1}$. This $\mathbb{NQG}^{1}$ is a sparse matrix whose entries are the coefficients of the $\mathbb{G}^{1}$ matrix for the atom pairs whose distances fall within the user-defined thresholds p and/or l, and zero otherwise. The elements of the $\mathbb{NQG}^{1}$ matrix, ${}^{NQ}g_{ij}^{1}$, are then defined as follows:

$$ {}^{NQ}g_{ij}^{1} = \begin{cases} g_{ij}^{1} & \text{if } p_{min} \le p_{ij} \le p_{max} \text{ or/and } l_{min} \le l_{ij} \le l_{max} \\ 0 & \text{otherwise} \end{cases} $$

where $g_{ij}^{1}$ of the matrix $\mathbb{G}^{1}$ represents the (dis)similarity between atomic nuclei i and j (see Table 1), $p_{ij}$ and $l_{ij}$ are the topological and Euclidean distances between atoms i and j, respectively, and the subscripts min and max denote the minimum and maximum user-defined cut-offs (range).
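The definition above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `neighborhood_quotient`, the `combine` switch for the "or/and" choice, and the `keep_diagonal` option (which retains the main-diagonal entries regardless of the cut-offs) are assumptions introduced here, and the 3 x 3 matrices are made-up toy values.

```python
import numpy as np

def neighborhood_quotient(G1, P=None, L=None, p_range=None, l_range=None,
                          combine="and", keep_diagonal=True):
    """Keep g_ij where the selected cut-off(s) hold and set it to zero elsewhere.

    G1: matrix of g_ij^1 values; P: topological distances p_ij;
    L: Euclidean distances l_ij; p_range/l_range: (min, max) cut-offs.
    """
    G1 = np.asarray(G1, dtype=float)
    masks = []
    if p_range is not None:
        P = np.asarray(P)
        masks.append((P >= p_range[0]) & (P <= p_range[1]))
    if l_range is not None:
        L = np.asarray(L, dtype=float)
        masks.append((L >= l_range[0]) & (L <= l_range[1]))
    if not masks:
        # No constraint requested: the full matrix is used.
        return G1.copy()
    reducer = np.logical_and if combine == "and" else np.logical_or
    keep = reducer.reduce(masks)
    if keep_diagonal:
        # Assumed convention: diagonal entries are never truncated.
        keep = keep | np.eye(G1.shape[0], dtype=bool)
    return np.where(keep, G1, 0.0)

# Toy data for a three-atom chain (hypothetical values).
G1 = [[2.0, 1.1, 1.9], [1.1, 0.0, 1.0], [1.9, 1.0, 3.0]]
P = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
L = [[0.0, 1.1, 1.9], [1.1, 0.0, 1.0], [1.9, 1.0, 0.0]]

# Path cut-off only: keep pairs exactly two bonds apart.
nq_path = neighborhood_quotient(G1, P=P, p_range=(2, 2))
# Both cut-offs: bonded pairs whose length lies in [0.5, 1.05].
nq_both = neighborhood_quotient(G1, P=P, L=L, p_range=(1, 1), l_range=(0.5, 1.05))
```

With `combine="or"` a pair is kept when either cut-off is satisfied, which mirrors the "or/and" wording of the definition.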
The constraints approach (both path and length thresholds) permits us to unify 2D and 3D information as well as to consider the most relevant interactions (see Table 4 for a simple example). In addition, to avoid untrustworthy or irrelevant molecular information arising from long-range inter-atomic relations, topological and/or geometrical cut-offs are used so that only those inter-atomic relations significant to the considered interactions are taken into account. In this way, atoms that are far apart and do not contribute to the interactions are not involved in the computations.
This approach is rather different from the one previously used, which combines matrices with topological/topographical/geometrical Euclidean distances [20, 65]. In addition, the new truncated matrix, $\mathbb{NQG}^{1}$, is also a class of the generalized matrices (see above) [20]. A quite similar approach has been previously used in chemo-informatics. The adjacency matrix is an instance of a neighborhood matrix computed by using a threshold of $d_{ij} = 1$ on the vertex distance matrix. Applied to the geometry matrix, a threshold t is equivalent to a predefined geometric distance that yields a sparse geometry matrix, where only the atom pairs not too far from each other are taken into account. In the same spirit, in the $\mathbb{NQG}^{1}$ matrix the topological or geometrical thresholds, or both, can be used. If constraints are not used, then the $\mathbb{G}^{1}$ matrix is used for all the MD calculations, i.e., it is not mandatory to use any constraints for calculations. However, selecting the “cut-off” permits differentiating the interaction types. For example, when a topological cut-off is applied, atomic indices can be calculated for atoms separated by 1 step (covalent interactions) or for those separated by more than 1 step (p ≥ 2) to characterize the non-covalent interactions between atoms i and j.
The relationship between the distance and the magnitude of the non-covalent interactions of diverse nature demonstrates that these contribute to the maintenance of the 3D structure of the molecule, depending on the distance at which the interacting groups are found. In this way, some of these interactions are only important when the functional groups are very close to each other, or distant in the covalent structure but sterically close (large-contacts).
For example, the use of the length criterion (together with the exponent k) permits taking into account only the interactions among the functional groups of atoms i and j that significantly contribute to the maintenance of the molecular structure. On the other hand, the k exponent in the term $g_{ij}^{k}$ of Eqs 1-6 models the functional relationship that exists between the distance and the strength of the interaction between the functional groups of atoms i and j.
The path criterion permits selecting the interactions for atoms found at a given topological distance. In this way, it could be useful to construct the matrices from the information about the contact (interaction) among atoms separated by a given distance (or distance range) in the 2D structure of the molecule, with the objective of studying the possible relations between a specific property and the topological characteristics of the molecule.
An example of the application of the path and length criteria in the construction of the matrix that characterizes the 3D structure of the molecule can be found in Table 4. From these neighborhood matrices, other neighborhood matrices are derived for the description of the 3D features of molecules (see Schemes 1 and 2). These matrices are denominated NS-, SS-, DS- and MP-$\mathbb{NQG}$ matrices. These representations are obtained through the application of Eqs 10-12 to $\mathbb{NQG}^{1}$.
Our approach could be viewed as thresholds that generalize and unify the use of lag k and lag r in 2D- and 3D-Moreau–Broto autocorrelations, respectively. For autocorrelation MDs determined on a molecular graph, the lag k cut-off exactly matches the topological distance between any pair of vertices. Autocorrelation indices for 3D molecular geometry are computed using Euclidean inter-atomic distances (r), represented in the geometric matrix (G), instead of topological distances [49, 50]. However, in this 3D-autocorrelation the $r_{ij}$ cut-off is rather different from our “length” threshold, because in the former MDs the inter-atomic distance is split into distance intervals of equal size.
From this point of view, the $\mathbb{NQG}^{1}$ matrix can also be obtained directly from NS-SDSM$^{1}$ ($\mathbb{G}^{1}$) by using a neighborhood interaction geodesic matrix; that is, the elements ${}^{NQ}g_{ij}^{1}$ of the $\mathbb{NQG}^{1}$ matrix can alternatively be obtained by multiplying the corresponding elements of NS-SDSM$^{1}$ ($\mathbb{G}^{1}$) by their respective elements in the neighborhood interaction geodesic matrix (NIGM), $\delta_{ij}$:

$$ {}^{NQ}g_{ij}^{1} = g_{ij}^{1} \times \delta_{ij}, \qquad \delta_{ij} = \begin{cases} 1 & \text{if } p_{min} \le p_{ij} \le p_{max} \text{ or/and } l_{min} \le l_{ij} \le l_{max} \\ 0 & \text{otherwise} \end{cases} \qquad (18) $$

where $p_{min}$ and $p_{max}$ are the pre-defined topological distance values, and $p_{ij}$ is the topological distance between the atoms i and j; $l_{min}$ and $l_{max}$ are the pre-determined geometric distance values, and $l_{ij}$ is the geometric distance between the atoms i and j. That is to say, the $\mathbb{NQG}$ matrix can be obtained using a Hadamard product between the matrices SDSM and NIGM, which have the same size, represented as $\mathbb{NQG} = \mathbb{G}^{1} \otimes NIGM = [g_{ij}^{1} \times \delta_{ij}]$ (see Eq 18). So, the kth global bilinear, quadratic and linear indices at lags p and l (using both thresholds at the same time) can be re-expressed and compacted by the following equation:
𝑚𝑚!,!! 𝑥𝑥, 𝑦𝑦 = gij k×δ!"
!" 𝑥𝑥!𝑦𝑦! = !" NQgij k 𝑥𝑥!𝑦𝑦!=
where $m_{p,l}^{k}(x, y)$ are the k th (k = 0, ±1, ±2, ±3, …, ±12)
algebraic maps at the lags p, l; that is, the k th total (or
local-fragment) bilinear, quadratic and linear indices using the p and l
thresholds. Here, the form of m is: a) a bilinear map if [X] ≠ [Y]
(different atomic properties in the [X] and [Y] vectors), b) a
quadratic map if [X] = [Y] (the same atomic property in the [X]
and [Y] vectors), and c) a linear map if [X] = [U] ≠ [Y] ([X] is
the unit vector and [Y] the atomic property vector). $\mathbb{NQG}^{k}$ is
the k th two-tuple topological and geometric neighborhood
quotient SDSM.
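Eq 18 and the algebraic map at lags p, l can be sketched numerically as follows. This is only an illustration under stated assumptions: the three-atom matrices, the property vectors and the helper names (`nigm`, `algebraic_map`) are hypothetical, the k = 1 matrix is simply taken to be a Euclidean distance matrix, and the two thresholds are combined with a logical "and" by default (the text allows "and/or").

```python
import numpy as np

def nigm(P, L, p_min, p_max, l_min, l_max, mode="and"):
    """Neighborhood interaction geodesic matrix: 1 where the thresholds hold."""
    topo = (P >= p_min) & (P <= p_max)
    geom = (L >= l_min) & (L <= l_max)
    keep = topo & geom if mode == "and" else topo | geom
    return keep.astype(float)

def algebraic_map(G, delta, x, y):
    """Sum over i, j of g[i, j] * delta[i, j] * x[i] * y[j], in matrix form."""
    NQG = G * delta  # Hadamard (element-wise) product
    return float(x @ NQG @ y)

# Hypothetical 3-atom chain: topological (P) and Euclidean (L) distances.
P = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
L = np.array([[0.0, 1.5, 2.4], [1.5, 0.0, 1.5], [2.4, 1.5, 0.0]])
G = L.copy()  # assumption: the k = 1 matrix is the plain distance matrix

x = np.array([1.0, 2.0, 3.0])  # made-up atomic property
y = np.array([0.5, 1.0, 1.5])  # a second made-up atomic property
u = np.ones(3)                 # unit vector [U]

delta = nigm(P, L, p_min=1, p_max=1, l_min=0.0, l_max=2.0)

bilinear = algebraic_map(G, delta, x, y)   # [X] != [Y]
quadratic = algebraic_map(G, delta, x, x)  # [X] == [Y]
linear = algebraic_map(G, delta, u, y)     # [X] == [U]
```

The same function yields the three kinds of index; only the pair of vectors passed to it changes, which mirrors cases a) to c) above.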
2.6 Generalization of Method of Obtaining Total and
Local-Fragment Indices from LOVIs: Is It More Than
the Sum of Its Parts?
The notion of invariants as a generalization scheme for
the linear combination of atomic contributions to yield
global (molecular) definitions is derived from the hypothesis
that the most suitable global definition of a natural system
may not necessarily be additive. Indeed, it was demonstrated
in [66-68] that operators other than the sum could yield
better correlations with certain chemical properties.
These invariants are applied to the vector L of atom-level indices and are classified into four major groups (see Table
5):
1 Norms: norms N1, N2 and N3, and Penrose size (PN). Note that
N1 in our case is equivalent to the summation of the
components of vector L (Eqs 5-7).
2 Means: Geometric mean (G), arithmetic mean (M), quadratic mean (P2), power mean of third degree (P3) and
harmonic mean (A).
3 Statistical Invariants (higher statistical moments):
Variance (V), skewness (S), kurtosis (K), standard deviation (SD), variation coefficient (CV), range (R), percentile 25 (Q1), percentile 50 (Q2), percentile 75 (Q3), inter-quartile range (I50), maximum Li (MX) and minimum Li (MN).
4 Classical Algorithms: Gravitational (GV), Total Information Content (TIC), Mean Information Content (MIC), Standardized Information Content (SIC), Total Sum (TS), Ivanciuc-Balaban (IB), Electrotopological State (ES) and
quadratic and linear indices defined in equations 5, 6 and
7, respectively. In the same way, these mathematical
operators could be applied to a vector composed of a particular class of chemical local fragments (group and atom-type) to obtain diverse local-fragment indices describing a given molecule. Note that, for the classical invariants, in addition to using atom-level indices as LOVIs (in place of the vertex degrees), the summations that these algorithms usually carry are generalized as well using the norms,
means and statistical invariants. Scheme 2 summarizes
the steps (and generalizations) followed in the computation
of these novel 3D-MDs. Finally, all 3D algebraic-based MDs are calculated with the QuBiLS-MIDAS software [45, 46], a module of the TOMOCOMD-CARDD approach.
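How replacing the plain sum by other operators changes the resulting global index can be sketched as follows. The LOVI vector is made up and the function name is hypothetical; the formulas are textbook definitions (p-norms, classical means, population variance, linearly interpolated percentiles) and may differ in normalization from those tabulated in Table 5. Penrose size, skewness, kurtosis, variation coefficient and the classical algorithms are left out of this sketch.

```python
import numpy as np

def invariants(lovis):
    """Aggregate a vector of atom-level indices (LOVIs) with several operators."""
    L = np.asarray(lovis, dtype=float)
    q1, q2, q3 = np.percentile(L, [25, 50, 75])
    return {
        # Norms (p-norms with p = 1, 2, 3)
        "N1": np.sum(np.abs(L)),
        "N2": np.sqrt(np.sum(L ** 2)),
        "N3": np.sum(np.abs(L) ** 3) ** (1.0 / 3.0),
        # Means (this sketch assumes strictly positive entries)
        "G": np.exp(np.mean(np.log(L))),
        "M": np.mean(L),
        "P2": np.sqrt(np.mean(L ** 2)),
        "P3": np.mean(L ** 3) ** (1.0 / 3.0),
        "A": len(L) / np.sum(1.0 / L),
        # Statistical invariants (population variance and standard deviation)
        "V": np.var(L),
        "SD": np.std(L),
        "R": np.max(L) - np.min(L),
        "Q1": q1, "Q2": q2, "Q3": q3, "I50": q3 - q1,
        "MX": np.max(L), "MN": np.min(L),
    }

lovis = [1.0, 2.0, 4.0, 8.0]  # made-up positive atom-level indices
inv = invariants(lovis)
```

For non-negative LOVIs, N1 coincides with the plain sum of the components, so the additive index is recovered as one particular case among the operators.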
3 SHANNON’S ENTROPY-BASED VARIABILITY ANALYSIS OF THE QUBILS-MIDAS DUPLEX 3D INDICES AND COMPARISON WITH OTHER APPROACHES
Recently, Godden et al. proposed an information
theory-based algorithm for evaluating the relevance of variables [69]. This unsupervised method is based on the computation of Shannon’s Entropy (SE) for variables, on the premise that variables desirable for chemo-informatics tasks should possess high SE values as an indicator of their tendency to change gradually with modification of the molecular structure, while redundant variables (from a case-wise perspective) should possess low SE values, with the lower limit being zero SE for variables that assign the “same value” to dissimilar cases,
Table 5 Norms, Means and Statistical Invariants as Generalizations of the Linear Combination of LOVIs as Global (or Local) MDs Operator, as well as Classical algorithms which generalize the first three groups
[Table body not recoverable from the source text; only the following table notes survive.]
… of the i th atom, and $\Delta I_{i}$ is the field effect on the i th atom, calculated as the perturbation of the $I_{i}$ of the i th atom by all other atoms in the molecule; $d_{ij}$ is the topological distance between the i th and the j th atoms, and n is the number of atoms. The exponent k is 2.
a The second group (invariants 5-9) could be renamed “location statistics” if the percentiles and the maximum (minimum) are included in this group. In that case, the third group (invariants 10-21) could be renamed “spread and shape statistics”.
b LOVIs for the “a” atoms in the molecule, that is, the atom-level algebraic indices.