
QSPR modelling of stability constants of metal thiosemicarbazone using artificial neural network and multivariate linear regression in environmental analysis


DOCUMENT INFORMATION

Basic information

Title: QSPR Modelling of Stability Constants of Metal Thiosemicarbazone Using Artificial Neural Network and Multivariate Linear Regression in Environmental Analysis
Authors: Nguyen Minh Quang, Pham Thi Thu Trang, Tran Xuan Mau, Tran Thi Thanh Ngoc, Pham Van Tat
Institution: Faculty of Chemistry, University of Sciences, Hue University
Field: Environmental Analysis
Type: Scientific Conference Proceedings
Year of publication: 2018
City: Hue
Number of pages: 7
File size: 437.7 KB

Content

The Fourth Scientific Conference SEMREGG 2018


QSPR MODELLING OF STABILITY CONSTANTS OF METAL-THIOSEMICARBAZONE USING ARTIFICIAL NEURAL NETWORK AND MULTIVARIATE LINEAR REGRESSION IN ENVIRONMENTAL ANALYSIS

Nguyen Minh Quang 1,2, Pham Thi Thu Trang 2, Tran Xuan Mau 1, Tran Thi Thanh Ngoc 3, Pham Van Tat 4*

1 Faculty of Chemistry, University of Sciences, Hue University, 77 Nguyen Hue, Hue City

2 Faculty of Chemical Engineering, Industrial University of Ho Chi Minh City, 12 Nguyen Van Bao, Go Vap district, Ho Chi Minh City

3 Faculty of Geology and Mineralogy, Ho Chi Minh City University of Natural Resources and Environment, 236B Le Van Sy, Tan Binh district, Ho Chi Minh City

4 Faculty of Science and Technology, Hoa Sen University, 8 Dinh Cong Trang, District 1, Ho Chi Minh City

*Email: vantat@gmail.com

ABSTRACT

The quantitative structure-property relationships (QSPRs) between molecular structure and the stability constants (logβ11) of metal-thiosemicarbazone complexes were constructed by the multivariate linear regression (MLR) and artificial neural network (ANN) methods. The structural descriptors of the complexes include 2D and 3D molecular descriptors and physicochemical descriptors. The stability constants and their experimental parameters were collected from the published literature. The best model, QSPRMLR (with k = 11), consists of the molecular descriptors xvp6, xvpc4, xvp7, xp5, xp4, N3, electric energy, cosmo area, dipole, knotp and volume. The quality of this QSPRMLR model was verified by the statistical values R2train = 0.926, Q2LOO = 0.842, SE = 0.790 and Fstat = 58.992. The neural network model QSPRANN, with architecture I(11)-HL(8)-O(1), was characterized by the statistical values R2train = 0.994, Q2CV = 0.998 and R2test = 0.993. The logβ11 values predicted by the QSPRMLR and QSPRANN models in external validation are in good agreement with those from the experimental literature. These outcomes can guide the design of new thiosemicarbazone derivatives for the quantitative analysis of metal ions in environmental samples.

Keywords: QSPR models, stability constants logβ11, multivariate linear regression, artificial neural network, thiosemicarbazone

1 INTRODUCTION

Nowadays, the more industry develops, the more the environment is polluted, as heavy metal ions are released from factories, industrial parks and production facilities. The control and analysis of heavy metal ions therefore have to be fast and cheap to meet practical demands.

Worldwide, many methods have been used to determine heavy metal content, such as AAS, ICP-AES, ICP-MS and spectrophotometry [1-3], in which complexes of ligands with metal ions have been used extensively [4]. In this study, we aim to characterize thiosemicarbazone derivatives because of their good complexing ability and the many published studies on their use in photometric analysis, which is a simple and inexpensive method.

QSPR modelling studies provide many advantages. Notably, QSPR helps in achieving efficient, effective, safe and environmentally benign chemicals and processes, and thereby facilitates sustainable chemical processes [5].

However, for a ligand to be used in complexes, the stability constant is an important consideration. It is a measure of the strength of the interaction between the ligand and the metal ion that form the complex. In addition, to orient empirical research quickly, and with the development of computer science, it has become common to carry out theoretical research that combines computational chemistry with chemical software and appropriate mathematical methods to solve complex problems [6].

In this work, we construct quantitative structure-property relationships (QSPR) using the structural descriptors and stability constants of complexes between metal ions and thiosemicarbazones. The structural descriptors are calculated using the semi-empirical quantum chemistry methods PM7 and PM7/sparkle [7], molecular mechanics, and connectivity calculations. The multivariate linear model QSPRMLR is established using the least-squares technique and the structural descriptors. The artificial neural network model QSPRANN is constructed by the error back-propagation method using a multilayer perceptron whose input layer consists of the structural descriptors of the best selected QSPRMLR model. The stability constants logβ11 predicted by the QSPR models for the complexes in the test set are validated against experimental data from the literature.

2 METHODOLOGY

2.1 Data set selection

Selecting the complexes that constitute the data set is the first step of the study. The logβ11 values of the metal-thiosemicarbazone complexes were taken from the literature and are listed in Table 1. A schematic representation of the complexes is given in Fig. 1 [8].

Figure 1. Structure of metal-thiosemicarbazone complexes: a) general complex structure; b) complex between Co2+/Zn2+ and nicotinaldehyde thiosemicarbazone [9]

The logβ11 values are log-transformed stability constants of the metal-thiosemicarbazone complexes. The stability constant β is calculated from the reaction between a metal ion (M) and a thiosemicarbazone ligand (L) in aqueous solution. The reaction is

p M + q L ⇌ MpLq (1)


In the case of one step with p = 1 and q = 1, the stability constant for the formation of ML is given by [10]

$$\beta_{11} = \frac{[\mathrm{ML}]}{[\mathrm{M}][\mathrm{L}]} \tag{2}$$
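As a small illustration of the definition above, the following sketch computes logβ11 for a 1:1 complex from equilibrium concentrations. The function name and the concentration values are hypothetical and are not taken from Table 1; the base-10 logarithm is assumed, as is usual for stability constants.

```python
import math


def log_beta11(conc_ml, conc_m, conc_l):
    """Return log10(beta11), with beta11 = [ML] / ([M][L]) for a 1:1 complex.

    All concentrations are equilibrium concentrations in mol/L.
    """
    if min(conc_ml, conc_m, conc_l) <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_ml / (conc_m * conc_l))


# Hypothetical equilibrium concentrations (illustration only).
example = log_beta11(conc_ml=1.0e-4, conc_m=1.0e-5, conc_l=1.0e-5)
print(f"log beta11 = {example:.3f}")
```

With these made-up concentrations the ratio is about 10^6, so the printed value is close to 6.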

Table 1 Complexes of metal ions and thiosemicarbazone and stability constants


2.2 Multivariate linear regression

Multivariate linear regression (MLR) is a widely used statistical method for modelling the dependency between two or more variables by fitting a linear equation to the observed data. The relationship between the independent and dependent variables is described by the following equation [25, 26]

$$Y = \beta_0 + \sum_{i=1}^{k} \beta_i X_i + \varepsilon \tag{3}$$

where Y is the dependent variable, β0 is the intercept of the model, βi is the slope associated with the independent variable Xi, k is the number of variables in the equation, and ε is the error term.

In this study, MLR is chosen to develop a relationship between the log-transformed stability constants (logβ11) and the structural parameters affecting them. The log-transformed stability constant (logβ11) is the dependent variable, while the other parameters are independent variables.
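To make the idea of fitting a linear equation to observed data concrete, here is a minimal least-squares sketch in plain Python. It solves the normal equations by Gauss-Jordan elimination; this is only an illustration and not the procedure of the Regress or MS-EXCEL tools used in the study. The function name `fit_mlr` and the toy data are hypothetical.

```python
def fit_mlr(X, y):
    """Return least-squares coefficients [b0, b1, ..., bk] for y = b0 + sum(bi * xi).

    X is a list of descriptor rows, y the list of observed responses.
    Solves the normal equations (A^T A) b = A^T y by Gauss-Jordan elimination.
    """
    rows = [[1.0] + list(r) for r in X]  # prepend a column of ones for the intercept
    p = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    for c in range(p):
        # Partial pivoting: move the row with the largest entry in column c up.
        piv = max(range(c, p), key=lambda idx: abs(aug[idx][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        pivot_value = aug[c][c]
        aug[c] = [v / pivot_value for v in aug[c]]
        for r in range(p):
            if r != c:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    return [aug[i][p] for i in range(p)]


# Toy data generated from y = 1 + 2*x1 - 3*x2 (hypothetical, noise-free).
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1.0, 3.0, -2.0, 0.0, 2.0]
coefficients = fit_mlr(X, y)
print(coefficients)
```

Because the toy data are noise-free, the fit recovers the generating coefficients up to rounding. For real descriptor tables, a numerically more robust solver (for example a QR-based routine from a numerical library) would usually be preferable to the normal equations.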

The MLR method is used to build the QSPRMLR model. The method determines the model by the principle of least squares, that is, by minimizing the sum of squared differences between the observed and predicted values. This minimization yields the estimators of the model parameters. The models were screened using the values R2train and Q2LOO [25-27]. Both were calculated with the same formula (4)

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2}{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2} \tag{4}$$

where n is the number of observations; Yi, Ŷi and Ȳ are the experimental, calculated and average values, respectively.
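The sketch below shows how the same formula can serve both statistics: for R2train the predictions come from the model fitted on all training points, whereas for Q2LOO each prediction comes from a model refitted without that point. For brevity it uses a single-descriptor straight-line fit; the function names and data are hypothetical, and taking Ȳ as the mean of all observations in the leave-one-out case is an assumption, since the text does not specify it.

```python
def r_squared(y_obs, y_pred):
    """Formula (4): 1 - SS_residual / SS_total."""
    mean_y = sum(y_obs) / len(y_obs)
    ss_res = sum((yo - yp) ** 2 for yo, yp in zip(y_obs, y_pred))
    ss_tot = sum((yo - mean_y) ** 2 for yo in y_obs)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Ordinary least squares for one descriptor: returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def q2_loo(x, y):
    """Leave-one-out Q2: predict each point from a model fitted without it."""
    predictions = []
    for i in range(len(x)):
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        predictions.append(b0 + b1 * x[i])
    return r_squared(y, predictions)


# Hypothetical single-descriptor data (illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
b0, b1 = fit_line(x, y)
r2_train = r_squared(y, [b0 + b1 * xi for xi in x])
q2 = q2_loo(x, y)
print(f"R2train = {r2_train:.4f}, Q2LOO = {q2:.4f}")
```

For an ordinary least-squares model, Q2LOO computed this way cannot exceed R2train, which is why a large gap between the two is commonly read as a sign of overfitting.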

Another statistic used to evaluate an MLR model is the standard error (SE) of the estimate, which is a measure of the accuracy of predictions. Recall that the regression line is the line that minimizes the sum of squared deviations of prediction. The standard error of the estimate is closely related to this quantity and is defined below [28]

$$SE = \sqrt{\frac{\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2}{n - k - 1}} \tag{5}$$
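A minimal sketch of the standard error of the estimate follows. It assumes the common multiple-regression form in which the residual sum of squares is divided by n - k - 1 degrees of freedom (n observations, k descriptors plus an intercept); the function name and numbers are hypothetical.

```python
import math


def standard_error(y_obs, y_pred, k):
    """Standard error of the estimate, assuming n - k - 1 degrees of freedom."""
    n = len(y_obs)
    dof = n - k - 1
    if dof <= 0:
        raise ValueError("need more observations than fitted parameters")
    ss_res = sum((yo - yp) ** 2 for yo, yp in zip(y_obs, y_pred))
    return math.sqrt(ss_res / dof)


# Hypothetical observed and predicted values for a one-descriptor model.
se = standard_error([1.0, 2.0, 3.0, 4.0, 5.0], [1.1, 1.9, 3.2, 3.8, 5.0], k=1)
print(f"SE = {se:.4f}")
```

The degrees-of-freedom correction matters when k is large relative to n: adding descriptors always lowers the residual sum of squares, but it also shrinks the denominator.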

The QSPRMLR model was constructed from the database of complexes between metal ions and the ligands, including the 2D and 3D molecular descriptors, the quantum parameters and the stability constants logβ11 in Table 1.

The two-dimensional structures of the metal-thiosemicarbazone complexes were drawn using BIOVIA Draw 2017 [29] and optimized by means of quantum mechanics on the MoPac2016 system [30]. The quantum descriptors were calculated using the semi-empirical quantum method with the new version PM7, and PM7/sparkle for lanthanides [7]. The resulting geometry was transferred into the QSARIS system [27, 31], which calculated the 2D and 3D topological descriptors.

First, the data set was divided into training and test sets, with the test set containing about 20 % of the initial set; the training set was used to construct the regression model. The QSPRMLR models were constructed using the backward-elimination and forward regression techniques on the Regress system [25] and MS-EXCEL [26, 27, 32]. The artificial neural network model QSPRANN was built using the multilayer training technique on the Matlab system [33]. The predictability of the QSPR models was cross-validated by means of the leave-one-out (LOO) method using the statistic Q2LOO.

To assess the degree of influence of the variables in the QSPRMLR models, we introduced a quantity called the average contribution percentage, MPxk,i. It is the percentage contribution of each independent variable in the selected QSPR models (with i from 1 to k) and is determined according to formula (6) [27, 31]

$$MPx_{k,i} = \frac{1}{N}\sum_{m=1}^{N}\frac{100\,\left|b_{k,i}\,x_{m,i}\right|}{\sum_{j=1}^{k}\left|b_{k,j}\,x_{m,j}\right|},\ \% \tag{6}$$

where N is the number of observations; m indexes the substances used to calculate the Pxk,i values; and bk,i are the parameters of the model.
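The average contribution percentage can be sketched as follows. For each substance, every term b·x is expressed as a percentage of the sum of all terms, and these percentages are then averaged over the substances. Using absolute values of the terms is an assumption made here so that the contributions are non-negative and sum to 100 %; the function name and data are hypothetical.

```python
def mean_contribution_percent(coefficients, descriptor_rows):
    """Return the average contribution percentage of each descriptor.

    coefficients: slopes b_1..b_k of the model (intercept excluded).
    descriptor_rows: one row of k descriptor values per substance.
    Absolute values |b_i * x_i| are assumed.
    """
    k = len(coefficients)
    totals = [0.0] * k
    for row in descriptor_rows:
        terms = [abs(b * x) for b, x in zip(coefficients, row)]
        denominator = sum(terms)
        for i in range(k):
            totals[i] += 100.0 * terms[i] / denominator
    return [t / len(descriptor_rows) for t in totals]


# Hypothetical two-descriptor model and two substances.
mpx = mean_contribution_percent([2.0, -1.0], [[1.0, 2.0], [3.0, 2.0]])
print(mpx)
```

In this toy case the first substance gives contributions of 50 % and 50 %, the second 75 % and 25 %, so the averages are 62.5 % and 37.5 %.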

2.3 Artificial neural network

An artificial neural network (ANN) is a mathematical model that tries to simulate the functional aspects of biological neural networks. It consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation. Many research works have applied the ANN method successfully in various fields such as mathematics, robotic control, medicine and chemistry [34, 35].

An ANN model includes an input layer, one or more hidden layers, and an output layer. Each layer contains neurons, and weights connect the neurons of one layer to those of the next. There are many kinds of ANN architectures for various applications, of which the multi-layer perceptron (MLP) is the simplest and the most commonly used for prediction [36]. The ANN architecture used in this study is therefore a multilayer feed-forward network with a single hidden layer. The model is composed of an input layer, a hidden layer and one output layer.

We used a typical feed-forward neural network and trained it with an error back-propagation learning algorithm. The way this type of network propagates information in the feed-forward direction can be written as [37, 38]

$$o_j = f\left(\sum_{i=0}^{N} w_{ij} \cdot x_i - \theta_j\right) \tag{7}$$

where xi is the input factor, oj is the output factor, wij is the weight factor between two nodes, θj is the internal threshold, and f is the transfer function. In this work, we used the hyperbolic tangent sigmoid transfer function to train the ANN models. It is described by the following equation [37, 38]

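$$f(z) = \operatorname{tansig}(z) = \frac{2}{1 + e^{-2z}} - 1 \tag{8}$$

where z is the argument of the transfer function.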
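The feed-forward computation described above (weighted sum of the inputs, internal threshold, transfer function) can be sketched for a network with one hidden layer. The sketch assumes that the threshold is subtracted from the weighted sum and that the output node is linear; neither detail is stated in the text. The tiny 2-2-1 network and all weights are hypothetical, whereas the model in the paper has the architecture I(11)-HL(8)-O(1).

```python
import math


def tansig(z):
    """Hyperbolic tangent sigmoid: 2 / (1 + exp(-2z)) - 1, equal to tanh(z)."""
    return 2.0 / (1.0 + math.exp(-2.0 * z)) - 1.0


def layer(inputs, weights, thresholds, transfer):
    """Output of one layer: transfer(sum_i w_ij * x_i - theta_j) for each node j."""
    return [
        transfer(sum(w * x for w, x in zip(node_weights, inputs)) - theta)
        for node_weights, theta in zip(weights, thresholds)
    ]


def forward(x, w_hidden, th_hidden, w_out, th_out):
    """Forward pass: tansig hidden layer followed by one linear output node (assumed)."""
    hidden = layer(x, w_hidden, th_hidden, tansig)
    return layer(hidden, w_out, th_out, lambda z: z)[0]


# Hypothetical 2-2-1 network (weights are made up for illustration).
y_out = forward(
    x=[0.5, -0.2],
    w_hidden=[[0.1, 0.4], [-0.3, 0.2]],
    th_hidden=[0.0, 0.1],
    w_out=[[0.7, -0.5]],
    th_out=[0.05],
)
print(y_out)
```

Training by back-propagation would adjust the weights and thresholds to reduce the prediction error; that step is omitted here.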

The ANN model is trained until the mean square error (MSEANN) is minimized, after which the network output is compared with the actual output values obtained from experimental results [38]. MSEANN is the average squared error between the network outputs (o) and the target outputs (t). It is written as follows [38]


$$MSE_{ANN} = \frac{1}{n}\sum_{i=1}^{n}\left(t_i - o_i\right)^2 \tag{9}$$
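The mean square error between network outputs and targets can be computed as in the short sketch below; the function name and values are hypothetical.

```python
def mse(outputs, targets):
    """Average squared difference between network outputs (o) and targets (t)."""
    if not outputs or len(outputs) != len(targets):
        raise ValueError("outputs and targets must be non-empty and of equal length")
    return sum((t - o) ** 2 for o, t in zip(outputs, targets)) / len(outputs)


# Hypothetical network outputs and experimental targets.
error = mse(outputs=[1.0, 2.0, 3.0], targets=[1.0, 2.5, 2.0])
print(f"MSE = {error:.4f}")
```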

The QSPRANN model is also developed with the neural network technique using the “nntool” tool on the Matlab system [33]. The QSPRANN model is trained with the Levenberg-Marquardt back-propagation algorithm and the hyperbolic tangent sigmoid transfer function. The data set is divided randomly into three subsets: 70 % for the training set, 15 % for the cross-validation set and the remaining 15 % for the independent test set.
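A random 70/15/15 division of this kind can be sketched in a few lines. This is only an illustration in Python and not the Matlab routine used in the study; the function name, the seed and the data set size of 100 are hypothetical.

```python
import random


def split_indices(n, fractions=(0.70, 0.15, 0.15), seed=0):
    """Randomly split indices 0..n-1 into training, validation and test subsets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

Assigning the remainder to the test set guarantees that every sample lands in exactly one subset even when the fractions do not divide n evenly.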

2.4 External validation

External validation ensures the predictability and applicability of the developed QSPR models for the prediction of untested molecules [5]. The models are externally validated using the statistical parameter Q2test, which is calculated with the same equation (4) for the test set. To evaluate the discrepancies between the experimental logβ11 values and those predicted by the models, we used single-factor ANOVA.
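As an illustration of the single-factor ANOVA mentioned above, the sketch below computes the F statistic that compares the variation between groups (for example experimental, MLR-predicted and ANN-predicted values) with the variation within groups. It returns only F and the degrees of freedom; obtaining a p-value would require the F distribution from a statistics library. The function name and data are hypothetical.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a single-factor ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical groups of values (illustration only).
f_value, df_b, df_w = one_way_anova_f([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(f"F = {f_value:.3f} with ({df_b}, {df_w}) degrees of freedom")
```

An F value below the critical value for the given degrees of freedom indicates no significant difference between the group means, which is the outcome one hopes for when comparing predicted and experimental values.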

In addition, error analysis is an important part of QSPR studies. To assess the predictive performance of the developed models, the mean absolute value of the relative error (MARE), which measures the overall error of the QSPR models, is calculated according to formula (10)

$$MARE,\% = \frac{1}{n}\sum_{i=1}^{n} ARE_i,\% \quad \text{where} \quad ARE_i,\% = \frac{\left|\log\beta_{11,\mathrm{exp}} - \log\beta_{11,\mathrm{cal}}\right|}{\log\beta_{11,\mathrm{exp}}} \times 100 \tag{10}$$

n is the number of test substances; β11,exp and β11,cal are the experimental and calculated stability constants.
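The error measures can be sketched as below: ARE is the absolute relative error of one predicted logβ11 value in percent, and MARE is the mean of the ARE values over the test substances. Expressing the ratio in percent by multiplying by 100, and taking the absolute value of the experimental value in the denominator, are assumptions of this sketch; the function names and numbers are hypothetical.

```python
def are_percent(log_beta_exp, log_beta_cal):
    """Absolute relative error of one prediction, in percent."""
    return abs(log_beta_exp - log_beta_cal) / abs(log_beta_exp) * 100.0


def mare_percent(exp_values, cal_values):
    """Mean of the absolute relative errors over all test substances, in percent."""
    errors = [are_percent(e, c) for e, c in zip(exp_values, cal_values)]
    return sum(errors) / len(errors)


# Hypothetical experimental and calculated log(beta11) values.
mare = mare_percent([10.0, 5.0], [9.0, 5.5])
print(f"MARE = {mare:.2f} %")
```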

3 RESULTS AND DISCUSSION

3.1 QSPR MLR modelling

The surveyed models were evaluated using statistical parameters such as SE, R2train, Q2LOO and Fstat (Fisher's value). A good calibration model has high R2, Q2 and F values and a low SE value with the smallest number of descriptors; of these, the R2 and Q2 values are the most important. The QSPRMLR models and their statistical values are shown in Table 2.

The results in Table 2 show that as k increases, the R2train and Q2LOO values increase and the SE value decreases. Once k reaches 11, the R2train and Q2LOO values satisfy the statistical conditions [39]. When k increases to 12, the R2train and Q2LOO parameters continue to increase and the SE still decreases, but the variation is negligible. Models with k greater than or equal to 13 are unnecessary because they only increase the number of variables. Thus, the QSPRMLR model with k = 11 is the best of all the models. The quality of this QSPRMLR model is shown by the R2train value of 0.926, the standard error SE of 0.790, the Fstat value of 58.992 and the Q2LOO value of 0.842. The linear regression equation of the QSPRMLR model is as follows

$$\log\beta_{11} = 7.984 - 5.997x_1 + 3.044x_2 + 5.960x_3 - 24.356x_4 + 26.688x_5 + 22.313x_6 - 0.00127x_7 - 0.227x_8 + 1.148x_9 + 13.437x_{10} + 0.089x_{11} \tag{11}$$
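Applying the regression equation to a new complex amounts to a weighted sum of its descriptor values. The sketch below encodes the published coefficients; the assignment of x1 to x11 to descriptor names follows the order in which the abstract lists them, which the discussion confirms only for xp5 (x4), xp4 (x5), cosmo area (x8) and volume (x11), so the remaining names are an assumption. The descriptor values in the example are hypothetical.

```python
# Coefficients of equation (11). The name-to-index mapping is assumed to follow
# the order of the descriptor list in the abstract.
INTERCEPT = 7.984
COEFFICIENTS = {
    "xvp6": -5.997,              # x1
    "xvpc4": 3.044,              # x2
    "xvp7": 5.960,               # x3
    "xp5": -24.356,              # x4
    "xp4": 26.688,               # x5
    "N3": 22.313,                # x6
    "electric_energy": -0.00127,  # x7
    "cosmo_area": -0.227,        # x8
    "dipole": 1.148,             # x9
    "knotp": 13.437,             # x10
    "volume": 0.089,             # x11
}


def predict_log_beta11(descriptors):
    """Evaluate the linear model for one complex given all 11 descriptor values."""
    missing = set(COEFFICIENTS) - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptors: {sorted(missing)}")
    return INTERCEPT + sum(COEFFICIENTS[name] * descriptors[name] for name in COEFFICIENTS)


# Hypothetical descriptor values: all zero, so the prediction equals the intercept.
zero_descriptors = {name: 0.0 for name in COEFFICIENTS}
baseline = predict_log_beta11(zero_descriptors)
print(baseline)
```

Real use would require descriptor values computed with the same software and settings as in the study, since the coefficients are tied to those definitions and units.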


Table 2 Selected QSPRMLR model (k of 4 to 13) and statistical values

Notation of molecular descriptors

The number of descriptors k was selected in the range 4 to 13. Changing the number of structural parameters changes the values of SE, R2train and Q2LOO (Figure 2a).

Figure 2 a) The change of SE, R2train and Q2LOO values according to k descriptors; b) Comparison of experimental vs predicted values logβ11 of the data set using the QSPRMLR model (with k = 11)

According to Table 3, the most important contributions of the molecular descriptors in each complex are ranked by their GMPxi values (GMPxi is the average value of MPxk,i, calculated from the results of three good models with k = 11 to 13): xp4 > xp5 > cosmo area > volume. The xp4 parameter (x5), with a GMPx5 value of 31.2463, strongly influences the stability constant of the complexes. The xp4 parameter is called Chi path 4, the simple 4th-order path Chi index. Next, the xp5 parameter (x4) is called Chi path 5, the simple 5th-order path Chi index. The last two parameters that strongly affect the stability constant are cosmo area (x8) and volume (x11), which are geometric parameters of the molecule.

[Figure 2a: SE, R2train and Q2LOO versus the number of variables, k]

Date posted: 03/03/2023, 08:35
