Accepted Manuscript
Title: An implementation of the Levenberg–Marquardt algorithm for simultaneous-energy-gradient fitting using two-layer feed-forward neural networks
Author: Hieu T. Nguyen-Truong, Hung M. Le
To appear in: Chemical Physics Letters
Please cite this article as: Hieu T. Nguyen-Truong, Hung M. Le, An implementation of the Levenberg–Marquardt algorithm for simultaneous-energy-gradient fitting using two-layer feed-forward neural networks, Chemical Physics Letters (2015), http://dx.doi.org/10.1016/j.cplett.2015.04.019
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
An implementation of the Levenberg–Marquardt
algorithm for simultaneous-energy-gradient fitting using
two-layer feed-forward neural networks
Hieu T. Nguyen-Truong, Hung M. Le∗
Faculty of Materials Science, University of Science, Vietnam National University, Ho Chi Minh City, Vietnam
Abstract
We present in this study a new and robust algorithm for feed-forward neural network (NN) fitting. This method is developed for application in potential energy surface (PES) construction, in which simultaneous energy-gradient fitting is implemented using the well-established Levenberg–Marquardt (LM) algorithm. Three fitting examples are demonstrated, which include the vibrational PES of H2O and the reactive PESs of O3 and ClOOCl. In the three testing cases, our new LM implementation has been shown to work very efficiently. Not only does it increase fitting accuracy, it also offers two other advantages: fewer training iterations are utilized and fewer data points are required for fitting.
Keywords: Levenberg–Marquardt algorithm, Feed-Forward Neural Network,
Potential Energy Surface
1 Introduction
For years, artificial neural networks have been a robust and powerful tool for constructing ab initio potential energy surfaces with reasonable computational effort [1–23]. However, in most of the above studies, large data sets are required to construct qualitative PESs which can sufficiently describe reaction channels and long-range atomic interactions. It is therefore of
∗ Corresponding author. Email address: hung.m.le@hotmail.com (Hung M. Le)
importance to develop an efficient training algorithm with better performance to improve the fitting accuracy of NNs.
In most studies, energy is obviously the main fitting objective in NN PES construction. It should be noted that atomic forces, which are derived from energy gradients, are also required to be well-approximated, since they are vital for solving the differential equations of motion in molecular dynamics simulations. Not long ago, Pukrittayakamee et al. [7] established an approach to simultaneously fit energies and forces in NN models using the back-propagation algorithm. In principle, not only does this method improve training efficiency, it also helps to reduce the size of training data. Indeed, we confirmed the validity of such an argument in a previous study in which the back-propagation training method was employed to fit energies and gradients simultaneously; that method also incorporated symmetry permutations of input variables [13]. However, the back-propagation method often suffers from an extremely low convergence rate. In fact, there have been efforts to improve its working efficiency [24, 25]. On the other hand, the use of the Jacobian-based Levenberg–Marquardt algorithm is very promising in terms of performance efficiency.
In the present work, we develop a new version of the Levenberg–Marquardt algorithm as an approach to simultaneously fit energies and forces using a two-layer feed-forward NN. We hereby term our newly-implemented algorithm the function-derivative Levenberg–Marquardt (FD-LM) fitting method. To validate fitting efficiency, we apply the current FD-LM approach to three molecular systems with different levels of PES roughness: H2O, O3, and ClOOCl. By providing a direct comparison of the PES fitting accuracy of the present work with those obtained by the back-propagation algorithm [13] and by fitting only energies [11], we observe that FD-LM significantly improves fitting accuracy and thereby saves considerable computational resources.
2 Mathematical Background and Implementation
It is important to note that the traditional approximation method for two-layer feed-forward NNs only fits targeted energies. In order to include force (gradient) fitting, we follow Pukrittayakamee et al. [7] and employ a minimization scheme for the mean square error (MSE), the so-called performance index, as follows:
P = \frac{1}{L} \sum_{q=1}^{Q} \left[ \left( t_q - a_q \right)^2 + \frac{\rho}{R} \sum_{r=1}^{R} \left( \frac{\partial t_q}{\partial p_{r,q}} - \frac{\partial a_q}{\partial p_{r,q}} \right)^2 \right], \qquad (1)
where L = Q(R + 1), Q is the number of samples, R is the number of input parameters, t_q is the qth scaled target (scaled into a particular pre-defined range such as [-1, 1] in order to enhance fitting efficiency), ∂t_q/∂p_{r,q} is the rth partial derivative of the qth target (each quantity is converted in correspondence with the scaled inputs and targets), a_q is the qth output energy predicted by the NN, and ∂a_q/∂p_{r,q} is the predicted rth partial derivative of the qth sample given by the NN. An empirical parameter, ρ, is introduced into the performance-index formula to assign a “penalty” on the significance of all gradients with respect to the targeted energies. Using a small ρ value would lower the predicting accuracy of gradients and thereby put a higher constraint on the fitting accuracy of energy. On the other hand, a large ρ value would reduce the fitting accuracy of the main target (energy). Obviously, if ρ vanishes, our implemented algorithm essentially works as a traditional energy-fitting procedure. Although this parameter can be tuned to adjust fitting performance, Pukrittayakamee et al. [7] suggested a rule to first determine ρ:
\rho = \frac{\max\left\{ |t_q|^2 \right\}}{\max\left\{ \left| \partial t_q / \partial p_{r,q} \right|^2 \right\}}. \qquad (2)
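As a concrete illustration of Eqs. (1) and (2), the following sketch computes the penalty parameter ρ and the performance index P with NumPy. The function and variable names are our own choices, not the authors' code:

```python
import numpy as np

def initial_rho(t, dt_dp):
    """Heuristic initial penalty, Eq. (2): rho = max|t|^2 / max|dt/dp|^2."""
    return np.max(np.abs(t)) ** 2 / np.max(np.abs(dt_dp)) ** 2

def performance_index(t, a, dt_dp, da_dp, rho):
    """Mean-square error over energies and gradients, Eq. (1).

    t, a         : (Q,) scaled target and predicted energies
    dt_dp, da_dp : (Q, R) target and predicted partial derivatives
    """
    Q, R = dt_dp.shape
    L = Q * (R + 1)
    return (np.sum((t - a) ** 2)
            + rho / R * np.sum((dt_dp - da_dp) ** 2)) / L
```

With ρ chosen this way, the energy and gradient terms enter the index at comparable magnitudes before any manual tuning.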
The output a_q of the NN is given by

a_q = \sum_{n=1}^{N} w_n^2 a_{n,q}^1 + b^2, \qquad (3)

where N, w_n^2, and b^2 are the number of hidden neurons, the weights, and the bias of the second layer of the NN, respectively, and

a_{n,q}^1 = f\left( \sum_{r=1}^{R} w_{n,r}^1 p_{r,q} + b_n^1 \right), \qquad (4)
where f(x) is the transfer function (here we employ tanh as the transfer function), and w_{n,r}^1 and b_n^1 are the weights and biases of the first NN layer, respectively. The scaled variables, p_{r,q}, are employed as input signals instead of the original inputs p_{r,q}^0. The scaling scheme is as follows:
p_{r,q} = \frac{p_{r,q}^0 - \min\{p_{r,q}^0\}}{\max\{p_{r,q}^0\} - \min\{p_{r,q}^0\}}, \quad (q = 1, 2, \ldots, Q), \qquad (5)

where \max\{p_{r,q}^0\} and \min\{p_{r,q}^0\} are the maximum and minimum values of the original input within the Q samples, respectively.
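The scaling of Eq. (5) and the forward pass of Eqs. (3)–(4) can be sketched as follows; the array layout (samples along rows) and names are our own conventions, not the authors' implementation:

```python
import numpy as np

def minmax_scale(p0):
    """Scale each input variable into [0, 1] per Eq. (5); p0 has shape (Q, R)."""
    pmin, pmax = p0.min(axis=0), p0.max(axis=0)
    return (p0 - pmin) / (pmax - pmin)

def forward(p, W1, b1, w2, b2):
    """Two-layer feed-forward NN of Eqs. (3)-(4): tanh hidden layer, linear output.

    p : (Q, R) scaled inputs; W1 : (N, R); b1 : (N,); w2 : (N,); b2 : scalar.
    Returns predicted outputs a (Q,) and hidden activations a1 (Q, N).
    """
    a1 = np.tanh(p @ W1.T + b1)   # hidden layer, Eq. (4)
    a = a1 @ w2 + b2              # linear output layer, Eq. (3)
    return a, a1
```

Keeping the hidden activations a1 around is convenient later, since every Jacobian entry in Eqs. (13)–(20) is expressed through a_{n,q}^1.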
In the training process, weights and biases are continuously updated to reduce the performance index. Now, let us denote by x a column vector that represents all NN parameters, i.e., the weights and biases of the two layers. The form of x is written as:
x^T = \left[\; w_{1,1}^1 \;\; w_{1,2}^1 \;\; \cdots \;\; w_{1,R}^1 \;\; w_{2,1}^1 \;\; w_{2,2}^1 \;\; \cdots \;\; w_{N,R}^1 \;\; b_1^1 \;\; b_2^1 \;\; \cdots \;\; b_N^1 \;\; w_1^2 \;\; w_2^2 \;\; \cdots \;\; w_N^2 \;\; b^2 \;\right]. \qquad (6)
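In code, the ordering of Eq. (6) corresponds to a simple pack/unpack pair; this is an illustrative sketch with our own function names:

```python
import numpy as np

def pack(W1, b1, w2, b2):
    """Flatten all NN parameters into the column vector x of Eq. (6)."""
    return np.concatenate([W1.ravel(), b1, w2, [b2]])

def unpack(x, N, R):
    """Inverse of pack(); expects M = N*(R + 1) + (N + 1) entries."""
    W1 = x[:N * R].reshape(N, R)
    b1 = x[N * R:N * R + N]
    w2 = x[N * R + N:N * R + 2 * N]
    b2 = x[-1]
    return W1, b1, w2, b2
```

Fixing this ordering once makes the column layout of the Jacobian in Eq. (12) unambiguous.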
The number of parameters in vector x is M = N(R + 1) + (N + 1). In the present work, we use the Levenberg–Marquardt algorithm to update vector x:

x_{\mathrm{new}} = x - \left( J^T J + \mu I \right)^{-1} J^T e, \qquad (7)

where e is the error vector, defined as a column vector of L elements,

e^T = \left[\; e_1^* \;\; e_{1,1} \;\; e_{2,1} \;\; \cdots \;\; e_{R,1} \;\; e_2^* \;\; e_{1,2} \;\; e_{2,2} \;\; \cdots \;\; e_{R,2} \;\; \cdots \;\; e_Q^* \;\; e_{1,Q} \;\; \cdots \;\; e_{R,Q} \;\right], \qquad (8)

where

e_q^* = t_q - a_q, \qquad (9)

and

e_{r,q} = \sqrt{\frac{\rho}{R}} \left( \frac{\partial t_q}{\partial p_{r,q}} - \frac{\partial a_q}{\partial p_{r,q}} \right). \qquad (10)
We can rewrite the performance index P in Eq. (1) as a function of the error vector e:

P = \frac{1}{L} \sum_{q=1}^{Q} \left[ \left( e_q^* \right)^2 + \sum_{r=1}^{R} \left( e_{r,q} \right)^2 \right]. \qquad (11)
The Jacobian J in Eq. (7) is an L × M matrix as given below:

J = \begin{bmatrix}
\frac{\partial e_1^*}{\partial w_{1,1}^1} & \frac{\partial e_1^*}{\partial w_{1,2}^1} & \cdots & \frac{\partial e_1^*}{\partial w_{N,R}^1} & \frac{\partial e_1^*}{\partial b_1^1} & \cdots & \frac{\partial e_1^*}{\partial b_N^1} & \frac{\partial e_1^*}{\partial w_1^2} & \cdots & \frac{\partial e_1^*}{\partial w_N^2} & \frac{\partial e_1^*}{\partial b^2} \\
\frac{\partial e_{1,1}}{\partial w_{1,1}^1} & \frac{\partial e_{1,1}}{\partial w_{1,2}^1} & \cdots & \frac{\partial e_{1,1}}{\partial w_{N,R}^1} & \frac{\partial e_{1,1}}{\partial b_1^1} & \cdots & \frac{\partial e_{1,1}}{\partial b_N^1} & \frac{\partial e_{1,1}}{\partial w_1^2} & \cdots & \frac{\partial e_{1,1}}{\partial w_N^2} & \frac{\partial e_{1,1}}{\partial b^2} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots & \vdots \\
\frac{\partial e_{R,Q}}{\partial w_{1,1}^1} & \frac{\partial e_{R,Q}}{\partial w_{1,2}^1} & \cdots & \frac{\partial e_{R,Q}}{\partial w_{N,R}^1} & \frac{\partial e_{R,Q}}{\partial b_1^1} & \cdots & \frac{\partial e_{R,Q}}{\partial b_N^1} & \frac{\partial e_{R,Q}}{\partial w_1^2} & \cdots & \frac{\partial e_{R,Q}}{\partial w_N^2} & \frac{\partial e_{R,Q}}{\partial b^2}
\end{bmatrix}, \qquad (12)
where

\frac{\partial e_q^*}{\partial b^2} = -1, \qquad (13)

\frac{\partial e_q^*}{\partial w_n^2} = -a_{n,q}^1, \qquad (14)

\frac{\partial e_q^*}{\partial b_n^1} = -w_n^2 \left[ 1 - \left( a_{n,q}^1 \right)^2 \right], \qquad (15)

\frac{\partial e_q^*}{\partial w_{n,r}^1} = p_{r,q} \frac{\partial e_q^*}{\partial b_n^1}, \qquad (16)

\frac{\partial e_{r,q}}{\partial b^2} = 0, \qquad (17)

\frac{\partial e_{r,q}}{\partial w_n^2} = -\sqrt{\frac{\rho}{R}}\, w_{n,r}^1 \left[ 1 - \left( a_{n,q}^1 \right)^2 \right], \qquad (18)

\frac{\partial e_{r,q}}{\partial b_n^1} = -2 w_n^2 a_{n,q}^1 \frac{\partial e_{r,q}}{\partial w_n^2}, \qquad (19)

\frac{\partial e_{r,q}}{\partial w_{n,r'}^1} = \sqrt{\frac{\rho}{R}} \left( \delta_{r,r'} - 2 w_{n,r}^1 a_{n,q}^1 p_{r',q} \right) \frac{\partial e_q^*}{\partial b_n^1}. \qquad (20)
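The assembly of the error vector (Eqs. 9–10), the analytic Jacobian (Eqs. 13–20), and the update of Eq. (7) can be sketched as one function. This is an illustrative NumPy sketch under our own naming and array-layout conventions, not the authors' implementation:

```python
import numpy as np

def lm_step(p, t, dt_dp, W1, b1, w2, b2, rho, mu):
    """One Levenberg-Marquardt update (Eq. 7) with the analytic Jacobian
    of Eqs. (13)-(20). Shapes: p (Q,R), t (Q,), dt_dp (Q,R); W1 (N,R);
    b1, w2 (N,); b2 scalar. Parameter order: [W1.ravel(), b1, w2, b2]."""
    Q, R = p.shape
    N = W1.shape[0]
    M = N * (R + 1) + (N + 1)          # number of NN parameters
    L = Q * (R + 1)                    # number of error terms
    s = np.sqrt(rho / R)

    e = np.empty(L)
    J = np.zeros((L, M))
    for q in range(Q):
        a1 = np.tanh(W1 @ p[q] + b1)   # hidden activations, Eq. (4)
        d = 1.0 - a1 ** 2              # tanh derivative at the hidden layer
        a = w2 @ a1 + b2               # output energy, Eq. (3)
        da_dp = (w2 * d) @ W1          # predicted gradient, shape (R,)

        row = q * (R + 1)
        e[row] = t[q] - a                                # Eq. (9)
        e[row + 1:row + 1 + R] = s * (dt_dp[q] - da_dp)  # Eq. (10)

        de_db1 = -w2 * d                                 # Eq. (15)
        # energy row: blocks [W1 | b1 | w2 | b2]
        J[row, :N * R] = np.outer(de_db1, p[q]).ravel()  # Eq. (16)
        J[row, N * R:N * R + N] = de_db1                 # Eq. (15)
        J[row, N * R + N:N * R + 2 * N] = -a1            # Eq. (14)
        J[row, -1] = -1.0                                # Eq. (13)

        for r in range(R):
            rr = row + 1 + r
            der_dw2 = -s * W1[:, r] * d                  # Eq. (18)
            der_db1 = -2.0 * w2 * a1 * der_dw2           # Eq. (19)
            # Eq. (20): delta_{r,r'} couples only the matching input column
            dW1 = s * (np.eye(R)[r] -
                       2.0 * np.outer(W1[:, r] * a1, p[q])) * de_db1[:, None]
            J[rr, :N * R] = dW1.ravel()
            J[rr, N * R:N * R + N] = der_db1
            J[rr, N * R + N:N * R + 2 * N] = der_dw2
            # d e_{r,q} / d b2 = 0, Eq. (17): column already zero

    step = np.linalg.solve(J.T @ J + mu * np.eye(M), J.T @ e)  # Eq. (7)
    return e, J, step
```

A useful sanity check on such a sketch is to compare each column of J against a finite-difference derivative of e, which is exactly what the test below does.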
3 Results and Discussions
As mentioned earlier, the newly-implemented FD-LM algorithm for simultaneous energy-gradient fitting is applied to construct the PESs of three molecular systems at different levels of complexity for testing purposes. Those systems include H2O, O3, and ClOOCl (see Table 1 for more details). The PES of O3 was previously reported by Le et al. [11] using the traditional NN fitting method with only energy approximation, while Nguyen-Truong and Le [13] reported NN PESs for H2O and ClOOCl trained by the back-propagation algorithm with simultaneous-energy-gradient approximation. By providing direct comparisons of the present FD-LM results with those reported in the literature [11–13], we are able to discuss the major advantages of our newly-developed FD-LM algorithm in PES fitting.

In each case, a training set is employed to train the NN PES, while an independent testing set is used to evaluate statistical error. The use of a validation set to prevent “over-fitting” (often referred to as the “early stopping” technique) is in fact unnecessary in the FD-LM scheme because of the incorporated gradient interpolation during the fitting process. The training process is terminated when either µ (in Eq. (7)) reaches 10⁻⁶ or the training error increases continuously for 20 epochs.
In the first case, we re-construct the vibrational ground-state PES of molecular H2O based on existing ab initio data [13]. With only three internal variables, the potential function is rather simple compared to the two later cases. In the training process, we utilize a set of 191 configurations with a potential energy range of 1.5 eV. In the hidden NN layer, we use 10 neurons, which brings the
Table 1: Smallest and largest values of the internal coordinates (Å for bond distances, degrees for angles), and potential energy (Hartree).

                          smallest    largest
H2O
  θ(H1–O–H2)               64.18      164.57
O3
  r1(O1–O2)                 1.107       2.157
  r2(O2–O3)                 1.057       1.557
  θ(O1–O2–O3)              73.72      157.72
ClOOCl
  r1(O1–Cl1)                1.481       2.448
  r2(O2–O1)                 1.048       2.823
  r3(O2–Cl2)                1.486       2.444
  θ1(O2–O1–Cl1)            73.494     179.663
  θ2(Cl2–O2–O1)            79.872     178.927
  φ(Cl2–O2–O1–Cl1)          0.027     184.672
total number of NN parameters to 51. Subsequently, a statistical test is performed with an independent set of 5,612 configurations, and the testing error reveals excellent fitting accuracy. More details regarding the best achieved results for training and testing can be found in Table 2, where the previously-reported results [13] are also included for comparison purposes. We see that the current FD-LM algorithm is much more powerful and robust than NN back-propagation. In particular, the present root-mean-squared error (RMSE) for a testing set of 5,612 configurations is as small as 6.55 × 10⁻⁴ eV (or 1.51 × 10⁻² kcal/mol),
while the corresponding absolute-average error (AAE) is 4.59 × 10⁻⁴ eV (or 1.06 × 10⁻² kcal/mol). Those errors are in fact very small when compared to the energy range (1.5 eV).
Not only do we obtain excellent accuracy in energy prediction, we also successfully reproduce forces with respect to the three internal variables describing the H2O molecule. In Fig. 1, we show the real and interpolated energies and forces of 50 testing configurations. We observe that the NN-predicted results are in excellent agreement with the data given by MP2/6-311G* calculations. Moreover, the optimization process indicates an advancement: only 188 epochs (training iterations) are required to reach numerical convergence. Recall that in the previous study [13], more than 60,000 epochs were used by the NN back-propagation method.
The next fitting objective, the reactive PES of O3, is more complicated in terms of electronic structure. Even though it is a three-atom problem like H2O, fourth-order Møller–Plesset perturbation [26, 27] (MP4) calculations suggested that the reactive PES of O3 lies higher in energy and consists of three different switchable spin states: the singlet ground state and two excited states (triplet and quintet). Therefore, the potential energy function of such a molecule with a complicated electronic structure is expected to be quite complex. In the training process, we employ the pre-constructed dataset (obtained by grid scans of internal coordinates) with an energy range of 2.5 eV [11].
The total number of hidden neurons used to fit the PES of O3 is 150. At termination, a total of 1,187 epochs are used to train the NN parameters, and the analysis of numerical accuracy for O3 is provided in Table 2. The previous fitting error reported by Le et al. [11] is also included for comparison. It should be noted that we only use 2,815 configurations to construct the O3 training set, whereas Le et al. [11] used up to 5,906 configurations. It can be seen that the testing result obtained from FD-LM fitting is better than that reported in the previous study (0.035 eV vs. 0.045 eV in RMSE, respectively). The improvement in our result is clearly due to the use of simultaneous energy-force fitting to approximate the PES. From the above two testing cases, we see that the
Table 2: RMSE and AAE for the training and testing sets, given in eV with kcal/mol values in parentheses.

H2O
  RMSE  training set (191)     3.58 × 10⁻⁴ (8.26 × 10⁻³)
        testing set (5,612)    6.55 × 10⁻⁴ (1.51 × 10⁻²)
        Ref. 13                1.03 × 10⁻² (2.38 × 10⁻¹)
  AAE   training set (191)     2.88 × 10⁻⁴ (6.63 × 10⁻³)
        testing set (5,612)    4.59 × 10⁻⁴ (1.06 × 10⁻²)
        Ref. 13                7.80 × 10⁻³ (1.80 × 10⁻¹)
O3
  RMSE  training set (2,815)   3.54 × 10⁻² (8.16 × 10⁻¹)
        testing set (50)       3.50 × 10⁻² (8.07 × 10⁻¹)
        Ref. 11                0.45 × 10⁻¹ (1.03)
  AAE   training set (2,815)   2.17 × 10⁻² (5.00 × 10⁻¹)
        testing set (50)       2.22 × 10⁻² (5.12 × 10⁻¹)
        Ref. 11                7.56 × 10⁻² (1.74)
ClOOCl
  RMSE  training set (1,693)   2.37 × 10⁻⁴ (5.48 × 10⁻³)
        testing set (17,457)   2.97 × 10⁻² (6.85 × 10⁻¹)
        Ref. 12                1.37 × 10⁻² (3.16 × 10⁻¹)
        Ref. 13                4.09 × 10⁻² (9.43 × 10⁻¹)
  AAE   training set (1,693)   1.73 × 10⁻⁴ (3.99 × 10⁻³)
        testing set (17,457)   5.93 × 10⁻³ (1.37 × 10⁻¹)
        Ref. 12                0.78 × 10⁻² (1.80 × 10⁻¹)
        Ref. 13                2.69 × 10⁻² (6.20 × 10⁻¹)
[Figure 1: panels (a)–(d) comparing the NN-predicted energies (Hartree) and forces of testing H2O configurations with ab initio calculations.]
current FD-LM method not only reduces the fitting error, but also downsizes the training set. In Fig. 2, we show the approximated and true energies and forces of 50 randomized configurations. In a few cases, the predictions of gradients with respect to the O–O bond or O–O–O bending angle are not quite accurate; however, the majority of testing cases still reveal very good gradient predictions.

In addition, we also attempted to train the NN with a smaller dataset of 1,409 configurations. Indeed, the fitting accuracy for the training set is somewhat improved (RMSE = 0.017 eV), while the testing accuracy for all available data points is not as good (RMSE = 0.076 eV).
In the last case, the PES of ClOOCl is constructed. The four atoms in the system constitute a set of six internal input variables, which include three bond distances, two bending angles, and one dihedral angle. With more internal variables, we employ a NN with 150 hidden neurons to fit this PES. For comparison purposes, we utilize a previously-constructed data set in the training process, which contains 1,693 configurations as also employed in a previous study (with an energy range of 1.2 eV) [13]. The RMSE and AAE of the training set are reported as 2.37 × 10⁻⁴ eV (or 5.48 × 10⁻³ kcal/mol) and 1.73 × 10⁻⁴ eV (or 3.99 × 10⁻³ kcal/mol), respectively. Compared to the RMSE reported in a previous study, we believe that the current LM implementation is highly advantageous. A large testing set of 17,457 configurations (also obtained from a previous study [12]) is employed to validate fitting accuracy. The testing RMSE is subsequently estimated as 0.030 eV, which is comparable to the error produced by back-propagation fitting (0.041 eV). In particular, for the training set, we even observe that the FD-LM algorithm significantly improves the fitting error (almost 60 times smaller than the RMSE reported in the literature [12]).
In total, 1,241 epochs are used to approximate the PES with excellent statistical accuracy, while we note that more than 60,000 epochs were used when the back-propagation algorithm was employed. The testing RMSE is, however, unusually larger than the RMSE of the training set, as can be seen in Table 2. Unlike RMSE, if we look more carefully at the training and testing AAE in this