Accepted Manuscript
Title: An implementation of the Levenberg–Marquardt algorithm for simultaneous-energy-gradient fitting using two-layer feed-forward neural networks
Author: Hieu T. Nguyen-Truong, Hung M. Le
To appear in: Chemical Physics Letters
Please cite this article as: Hieu T. Nguyen-Truong, Hung M. Le, An implementation of the Levenberg–Marquardt algorithm for simultaneous-energy-gradient fitting using two-layer feed-forward neural networks, Chemical Physics Letters (2015), http://dx.doi.org/10.1016/j.cplett.2015.04.019
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
An implementation of the Levenberg–Marquardt
algorithm for simultaneous-energy-gradient fitting using
two-layer feed-forward neural networks
Hieu T. Nguyen-Truong, Hung M. Le∗
Faculty of Materials Science, University of Science, Vietnam National University, Ho Chi Minh City, Vietnam
Abstract
We present in this study a new and robust algorithm for feed-forward neural network (NN) fitting. This method is developed for application in potential energy surface (PES) construction, in which simultaneous energy-gradient fitting is implemented using the well-established Levenberg–Marquardt (LM) algorithm. Three fitting examples are demonstrated, which include the vibrational PES of H2O and the reactive PESs of O3 and ClOOCl. In the three testing cases, our new LM implementation has been shown to work very efficiently. Not only does it increase fitting accuracy, it also offers two other advantages: fewer training iterations are utilized and fewer data points are required for fitting.
Keywords: Levenberg–Marquardt algorithm, Feed-Forward Neural Network,
Potential Energy Surface
1 Introduction
For years, artificial neural networks have been a robust and powerful tool for constructing ab initio potential energy surfaces with reasonable computational effort [1–23]. However, in most of the above studies, large data sets are required to construct qualitative PESs which can sufficiently describe reaction channels and long-range atomic interactions. It is therefore of
∗ Corresponding author. Email address: hung.m.le@hotmail.com (Hung M. Le)
importance to develop an efficient training algorithm with better performance to improve the fitting accuracy of NNs.
In most studies, energy is obviously the main fitting objective in NN PES construction. It should be noted that atomic forces, which are derived from energy gradients, are also required to be well-approximated, since they are vital for solving the differential equations of motion in molecular dynamics simulations. Not long ago, Pukrittayakamee et al. [7] established an approach to simultaneously fit energies and forces in NN models using the back-propagation algorithm. In principle, not only does this method improve training efficiency, it also helps to reduce the size of training data. Indeed, we confirmed the validity of such an argument in a previous study in which the back-propagation training method was employed to fit energies and gradients simultaneously; that method also incorporated symmetry permutations of input variables [13]. However, the back-propagation method often suffers from an extremely low convergence rate. In fact, there have been efforts to improve its working efficiency [24, 25]. On the other hand, the use of the Jacobian-based Levenberg–Marquardt algorithm is very promising in terms of performance efficiency.
In the present work, we develop a new version of the Levenberg–Marquardt algorithm as an approach to simultaneously fit energies and forces using a two-layer feed-forward NN. We hereby term our newly-implemented algorithm the function-derivative Levenberg–Marquardt (FD-LM) fitting method. To validate fitting efficiency, we apply the current FD-LM approach to three molecular systems with different levels of PES roughness: H2O, O3, and ClOOCl. By providing a direct comparison of the PES fitting accuracy of the present work with those obtained by the back-propagation algorithm [13] and by fitting only energies [11], we observe that FD-LM significantly improves fitting accuracy and thereby saves considerable computational resources.
2 Mathematical Background and Implementation
It is important to note that the traditional approximation method for two-layer feed-forward NNs only fits targeted energies. In order to include force (gradient) fitting, we follow Pukrittayakamee et al. [7] and employ a minimization scheme for the mean square error (MSE), the so-called performance index, as follows:
P = \frac{1}{L} \sum_{q=1}^{Q} \left[ \left( t_q - a_q \right)^2 + \frac{\rho}{R} \sum_{r=1}^{R} \left( \frac{\partial t_q}{\partial p_{r,q}} - \frac{\partial a_q}{\partial p_{r,q}} \right)^2 \right], \qquad (1)
where L = Q(R + 1), Q is the number of samples, R is the number of input parameters, t_q is the qth scaled target (scaled into a particular pre-defined range such as [-1, 1] in order to enhance fitting efficiency), ∂t_q/∂p_{r,q} is the rth partial derivative of the qth target (each quantity is converted in correspondence with the scaled inputs and targets), a_q is the qth output energy predicted by the NN, and ∂a_q/∂p_{r,q} is the predicted rth partial derivative of the qth sample given by the NN. An empirical parameter, ρ, is introduced into the performance-index formula to assign a “penalty” on the significance of all gradients with respect to the targeted energies. Using a small ρ value would lower the predicting accuracy of gradients and thereby put a higher constraint on the fitting accuracy of energy. On the other hand, a large ρ value would reduce the fitting accuracy of the main target (energy). Obviously, if ρ vanishes, our implemented algorithm essentially works as a traditional energy-fitting procedure. Although this parameter can be tuned to adjust fitting performance, Pukrittayakamee et al. [7] suggested a rule to first determine ρ:
\rho = \frac{\max\left\{ |t_q|^2 \right\}}{\max\left\{ \left| \partial t_q / \partial p_{r,q} \right|^2 \right\}}. \qquad (2)
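As a concrete illustration of Eqs. (1) and (2), the following sketch computes the penalty parameter ρ and the performance index P with NumPy. The function and variable names are our own choices, not the authors' code:

```python
import numpy as np

def initial_rho(t, dt_dp):
    """Heuristic initial penalty, Eq. (2): rho = max|t|^2 / max|dt/dp|^2."""
    return np.max(np.abs(t)) ** 2 / np.max(np.abs(dt_dp)) ** 2

def performance_index(t, a, dt_dp, da_dp, rho):
    """Mean-square error over energies and gradients, Eq. (1).

    t, a         : (Q,) scaled target and predicted energies
    dt_dp, da_dp : (Q, R) target and predicted partial derivatives
    """
    Q, R = dt_dp.shape
    L = Q * (R + 1)
    return (np.sum((t - a) ** 2)
            + rho / R * np.sum((dt_dp - da_dp) ** 2)) / L
```

With ρ chosen this way, the energy and gradient terms enter the index at comparable magnitudes before any manual tuning.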
The output a_q of the NN is given by

a_q = \sum_{n=1}^{N} w_n^2 a_{n,q}^1 + b^2, \qquad (3)

where N, w_n^2, and b^2 are the number of hidden neurons, the weights, and the bias of the second layer of the NN, respectively, and

a_{n,q}^1 = f\left( \sum_{r=1}^{R} w_{n,r}^1 p_{r,q} + b_n^1 \right), \qquad (4)
where f(x) is the transfer function (here we employ tanh as the transfer function), and w_{n,r}^1 and b_n^1 are the weights and biases of the first NN layer, respectively. The scaled variables, p_{r,q}, are employed as input signals instead of the original inputs p_{r,q}^0. The scaling scheme is as follows:
p_{r,q} = \frac{p_{r,q}^0 - \min\{p_{r,q}^0\}}{\max\{p_{r,q}^0\} - \min\{p_{r,q}^0\}}, \quad (q = 1, 2, \ldots, Q), \qquad (5)

where \max\{p_{r,q}^0\} and \min\{p_{r,q}^0\} are the maximum and minimum values of the original input within the Q samples, respectively.
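The scaling of Eq. (5) and the forward pass of Eqs. (3)–(4) can be sketched as follows; the array layout (samples along rows) and names are our own conventions, not the authors' implementation:

```python
import numpy as np

def minmax_scale(p0):
    """Scale each input variable into [0, 1] per Eq. (5); p0 has shape (Q, R)."""
    pmin, pmax = p0.min(axis=0), p0.max(axis=0)
    return (p0 - pmin) / (pmax - pmin)

def forward(p, W1, b1, w2, b2):
    """Two-layer feed-forward NN of Eqs. (3)-(4): tanh hidden layer, linear output.

    p : (Q, R) scaled inputs; W1 : (N, R); b1 : (N,); w2 : (N,); b2 : scalar.
    Returns predicted outputs a (Q,) and hidden activations a1 (Q, N).
    """
    a1 = np.tanh(p @ W1.T + b1)   # hidden layer, Eq. (4)
    a = a1 @ w2 + b2              # linear output layer, Eq. (3)
    return a, a1
```

Keeping the hidden activations a1 around is convenient later, since every Jacobian entry in Eqs. (13)–(20) is expressed through a_{n,q}^1.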
In the training process, weights and biases are continuously updated to reduce the performance index. Now, let us denote by x a column vector that represents all NN parameters, i.e., the weights and biases of the two layers. The form of x is written as:
x^T = \left[\; w_{1,1}^1 \;\; w_{1,2}^1 \;\; \cdots \;\; w_{1,R}^1 \;\; w_{2,1}^1 \;\; w_{2,2}^1 \;\; \cdots \;\; w_{N,R}^1 \;\; b_1^1 \;\; b_2^1 \;\; \cdots \;\; b_N^1 \;\; w_1^2 \;\; w_2^2 \;\; \cdots \;\; w_N^2 \;\; b^2 \;\right]. \qquad (6)
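In code, the ordering of Eq. (6) corresponds to a simple pack/unpack pair; this is an illustrative sketch with our own function names:

```python
import numpy as np

def pack(W1, b1, w2, b2):
    """Flatten all NN parameters into the column vector x of Eq. (6)."""
    return np.concatenate([W1.ravel(), b1, w2, [b2]])

def unpack(x, N, R):
    """Inverse of pack(); expects M = N*(R + 1) + (N + 1) entries."""
    W1 = x[:N * R].reshape(N, R)
    b1 = x[N * R:N * R + N]
    w2 = x[N * R + N:N * R + 2 * N]
    b2 = x[-1]
    return W1, b1, w2, b2
```

Fixing this ordering once makes the column layout of the Jacobian in Eq. (12) unambiguous.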
The number of parameters in vector x is M = N(R + 1) + (N + 1). In the present work, we use the Levenberg–Marquardt algorithm to update vector x:

x_{\mathrm{new}} = x - \left( J^T J + \mu I \right)^{-1} J^T e, \qquad (7)

where e is the error vector, defined as a column vector of L elements,

e^T = \left[\; e_1^* \;\; e_{1,1} \;\; e_{2,1} \;\; \cdots \;\; e_{R,1} \;\; e_2^* \;\; e_{1,2} \;\; e_{2,2} \;\; \cdots \;\; e_{R,2} \;\; \cdots \;\; e_Q^* \;\; e_{1,Q} \;\; \cdots \;\; e_{R,Q} \;\right], \qquad (8)

where

e_q^* = t_q - a_q, \qquad (9)

and

e_{r,q} = \sqrt{\frac{\rho}{R}} \left( \frac{\partial t_q}{\partial p_{r,q}} - \frac{\partial a_q}{\partial p_{r,q}} \right). \qquad (10)
We can rewrite the performance index P in Eq. (1) as a function of the error vector e:

P = \frac{1}{L} \sum_{q=1}^{Q} \left[ \left( e_q^* \right)^2 + \sum_{r=1}^{R} \left( e_{r,q} \right)^2 \right]. \qquad (11)
The Jacobian J in Eq. (7) is an L × M matrix as given below:

J = \begin{bmatrix}
\frac{\partial e_1^*}{\partial w_{1,1}^1} & \frac{\partial e_1^*}{\partial w_{1,2}^1} & \cdots & \frac{\partial e_1^*}{\partial w_{N,R}^1} & \frac{\partial e_1^*}{\partial b_1^1} & \cdots & \frac{\partial e_1^*}{\partial b_N^1} & \frac{\partial e_1^*}{\partial w_1^2} & \cdots & \frac{\partial e_1^*}{\partial w_N^2} & \frac{\partial e_1^*}{\partial b^2} \\
\frac{\partial e_{1,1}}{\partial w_{1,1}^1} & \frac{\partial e_{1,1}}{\partial w_{1,2}^1} & \cdots & \frac{\partial e_{1,1}}{\partial w_{N,R}^1} & \frac{\partial e_{1,1}}{\partial b_1^1} & \cdots & \frac{\partial e_{1,1}}{\partial b_N^1} & \frac{\partial e_{1,1}}{\partial w_1^2} & \cdots & \frac{\partial e_{1,1}}{\partial w_N^2} & \frac{\partial e_{1,1}}{\partial b^2} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots & \vdots \\
\frac{\partial e_{R,Q}}{\partial w_{1,1}^1} & \frac{\partial e_{R,Q}}{\partial w_{1,2}^1} & \cdots & \frac{\partial e_{R,Q}}{\partial w_{N,R}^1} & \frac{\partial e_{R,Q}}{\partial b_1^1} & \cdots & \frac{\partial e_{R,Q}}{\partial b_N^1} & \frac{\partial e_{R,Q}}{\partial w_1^2} & \cdots & \frac{\partial e_{R,Q}}{\partial w_N^2} & \frac{\partial e_{R,Q}}{\partial b^2}
\end{bmatrix}, \qquad (12)
where

\frac{\partial e_q^*}{\partial b^2} = -1, \qquad (13)

\frac{\partial e_q^*}{\partial w_n^2} = -a_{n,q}^1, \qquad (14)

\frac{\partial e_q^*}{\partial b_n^1} = -w_n^2 \left[ 1 - \left( a_{n,q}^1 \right)^2 \right], \qquad (15)

\frac{\partial e_q^*}{\partial w_{n,r}^1} = p_{r,q} \frac{\partial e_q^*}{\partial b_n^1}, \qquad (16)

\frac{\partial e_{r,q}}{\partial b^2} = 0, \qquad (17)

\frac{\partial e_{r,q}}{\partial w_n^2} = -\sqrt{\frac{\rho}{R}}\, w_{n,r}^1 \left[ 1 - \left( a_{n,q}^1 \right)^2 \right], \qquad (18)

\frac{\partial e_{r,q}}{\partial b_n^1} = -2 w_n^2 a_{n,q}^1 \frac{\partial e_{r,q}}{\partial w_n^2}, \qquad (19)

\frac{\partial e_{r,q}}{\partial w_{n,r'}^1} = \sqrt{\frac{\rho}{R}} \left( \delta_{r,r'} - 2 w_{n,r}^1 a_{n,q}^1 p_{r',q} \right) \frac{\partial e_q^*}{\partial b_n^1}. \qquad (20)
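The assembly of the error vector (Eqs. 9–10), the analytic Jacobian (Eqs. 13–20), and the update of Eq. (7) can be sketched as one function. This is an illustrative NumPy sketch under our own naming and array-layout conventions, not the authors' implementation:

```python
import numpy as np

def lm_step(p, t, dt_dp, W1, b1, w2, b2, rho, mu):
    """One Levenberg-Marquardt update (Eq. 7) with the analytic Jacobian
    of Eqs. (13)-(20). Shapes: p (Q,R), t (Q,), dt_dp (Q,R); W1 (N,R);
    b1, w2 (N,); b2 scalar. Parameter order: [W1.ravel(), b1, w2, b2]."""
    Q, R = p.shape
    N = W1.shape[0]
    M = N * (R + 1) + (N + 1)          # number of NN parameters
    L = Q * (R + 1)                    # number of error terms
    s = np.sqrt(rho / R)

    e = np.empty(L)
    J = np.zeros((L, M))
    for q in range(Q):
        a1 = np.tanh(W1 @ p[q] + b1)   # hidden activations, Eq. (4)
        d = 1.0 - a1 ** 2              # tanh derivative at the hidden layer
        a = w2 @ a1 + b2               # output energy, Eq. (3)
        da_dp = (w2 * d) @ W1          # predicted gradient, shape (R,)

        row = q * (R + 1)
        e[row] = t[q] - a                                # Eq. (9)
        e[row + 1:row + 1 + R] = s * (dt_dp[q] - da_dp)  # Eq. (10)

        de_db1 = -w2 * d                                 # Eq. (15)
        # energy row: blocks [W1 | b1 | w2 | b2]
        J[row, :N * R] = np.outer(de_db1, p[q]).ravel()  # Eq. (16)
        J[row, N * R:N * R + N] = de_db1                 # Eq. (15)
        J[row, N * R + N:N * R + 2 * N] = -a1            # Eq. (14)
        J[row, -1] = -1.0                                # Eq. (13)

        for r in range(R):
            rr = row + 1 + r
            der_dw2 = -s * W1[:, r] * d                  # Eq. (18)
            der_db1 = -2.0 * w2 * a1 * der_dw2           # Eq. (19)
            # Eq. (20): delta_{r,r'} couples only the matching input column
            dW1 = s * (np.eye(R)[r] -
                       2.0 * np.outer(W1[:, r] * a1, p[q])) * de_db1[:, None]
            J[rr, :N * R] = dW1.ravel()
            J[rr, N * R:N * R + N] = der_db1
            J[rr, N * R + N:N * R + 2 * N] = der_dw2
            # d e_{r,q} / d b2 = 0, Eq. (17): column already zero

    step = np.linalg.solve(J.T @ J + mu * np.eye(M), J.T @ e)  # Eq. (7)
    return e, J, step
```

A useful sanity check on such a sketch is to compare each column of J against a finite-difference derivative of e, which is exactly what the test below does.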
3 Results and Discussions
As mentioned earlier, the newly-implemented FD-LM algorithm for simultaneous energy-gradient fitting is applied to construct the PESs of three molecular systems at different levels of complexity for testing purposes. Those systems include H2O, O3, and ClOOCl (see Table 1 for more details). The PES of O3 was previously reported by Le et al. [11] using the traditional NN fitting method with only energy approximation, while Nguyen-Truong and Le [13] reported NN PESs for H2O and ClOOCl trained by the back-propagation algorithm with simultaneous-energy-gradient approximation. By providing direct comparisons of the present FD-LM results with those reported in the literature [11–13], we are able to discuss the major advantages of our newly-developed FD-LM algorithm in PES fitting.

In each case, a training set is employed to train the NN PES, while an independent testing set is used to evaluate statistical error. The use of a validation set to prevent “over-fitting” (often referred to as the “early stopping” technique) is in fact unnecessary in the FD-LM scheme because of the incorporated gradient interpolation during the fitting process. The training process is terminated when either µ (in Eq. (7)) reaches 10⁻⁶ or the training error increases continuously for 20 epochs.
In the first case, we re-construct the vibrational ground-state PES of molecular H2O based on existing ab initio data [13]. With only three internal variables, the potential function is rather simple compared to the two later cases. In the training process, we utilize a set of 191 configurations with a potential energy range of 1.5 eV. In the hidden NN layer, we use 10 neurons, which brings the
Table 1: Smallest and largest values of the internal coordinates (Å for bond distances, degrees for angles), and potential energy (Hartree).

                          smallest    largest
H2O
  θ(H1–O–H2)               64.18      164.57
O3
  r1(O1–O2)                 1.107       2.157
  r2(O2–O3)                 1.057       1.557
  θ(O1–O2–O3)              73.72      157.72
ClOOCl
  r1(O1–Cl1)                1.481       2.448
  r2(O2–O1)                 1.048       2.823
  r3(O2–Cl2)                1.486       2.444
  θ1(O2–O1–Cl1)            73.494     179.663
  θ2(Cl2–O2–O1)            79.872     178.927
  φ(Cl2–O2–O1–Cl1)          0.027     184.672
total number of NN parameters to 51. Subsequently, a statistical test is performed with an independent set of 5,612 configurations, and the testing error reveals excellent fitting accuracy. More details regarding the best achieved results for training and testing can be found in Table 2, where the previously-reported results [13] are also included for comparison purposes. We see that the current FD-LM algorithm is much more powerful and robust than NN back-propagation. In particular, the present root-mean-squared error (RMSE) for a testing set of 5,612 configurations is as small as 6.55 × 10⁻⁴ eV (or 1.51 × 10⁻² kcal/mol),
while the corresponding absolute-average error (AAE) is 4.59 × 10⁻⁴ eV (or 1.06 × 10⁻² kcal/mol). Those errors are in fact very small when compared to the energy range (1.5 eV).
Not only do we obtain excellent accuracy in energy prediction, we also successfully reproduce forces with respect to the three internal variables describing the H2O molecule. In Fig. 1, we show the real and interpolated energies and forces of 50 testing configurations. We observe that the NN-predicted results are in excellent agreement with the data given by MP2/6-311G* calculations. Moreover, the optimization process indicates an advancement: only 188 epochs (training iterations) are required to reach numerical convergence. Recall that in the previous study [13], more than 60,000 epochs were used by the NN back-propagation method.
The next fitting objective, the reactive PES of O3, is more complicated in terms of electronic structure. Even though it is a three-atom problem like H2O, fourth-order Møller–Plesset perturbation [26, 27] (MP4) calculations suggested that the reactive PES of O3 lies higher in energy and consists of three different switchable spin states: the singlet ground state and two excited states (triplet and quintet). Therefore, the potential energy function of such a molecule with a complicated electronic structure is expected to be quite complex. In the training process, we employ the pre-constructed dataset (obtained by grid scans of internal coordinates) with an energy range of 2.5 eV [11].
The total number of hidden neurons used to fit the PES of O3 is 150. At termination, a total of 1,187 epochs are used to train the NN parameters, and the analysis of numerical accuracy for O3 is provided in Table 2. The previous fitting error reported by Le et al. [11] is also included for comparison. It should be noted that we only use 2,815 configurations to construct the O3 training set, whereas Le et al. [11] used up to 5,906 configurations. It can be seen that the testing result obtained from FD-LM fitting is better than that reported in the previous study (0.035 eV vs. 0.045 eV in RMSE, respectively). The improvement in our result is clearly due to the use of simultaneous energy-force fitting to approximate the PES. From the above two testing cases, we see that the
Table 2: RMSE and AAE for the training and testing sets, given in eV with kcal/mol values in parentheses.

H2O
  RMSE  training set (191)     3.58 × 10⁻⁴ (8.26 × 10⁻³)
        testing set (5,612)    6.55 × 10⁻⁴ (1.51 × 10⁻²)
        Ref. 13                1.03 × 10⁻² (2.38 × 10⁻¹)
  AAE   training set (191)     2.88 × 10⁻⁴ (6.63 × 10⁻³)
        testing set (5,612)    4.59 × 10⁻⁴ (1.06 × 10⁻²)
        Ref. 13                7.80 × 10⁻³ (1.80 × 10⁻¹)
O3
  RMSE  training set (2,815)   3.54 × 10⁻² (8.16 × 10⁻¹)
        testing set (50)       3.50 × 10⁻² (8.07 × 10⁻¹)
        Ref. 11                0.45 × 10⁻¹ (1.03)
  AAE   training set (2,815)   2.17 × 10⁻² (5.00 × 10⁻¹)
        testing set (50)       2.22 × 10⁻² (5.12 × 10⁻¹)
        Ref. 11                7.56 × 10⁻² (1.74)
ClOOCl
  RMSE  training set (1,693)   2.37 × 10⁻⁴ (5.48 × 10⁻³)
        testing set (17,457)   2.97 × 10⁻² (6.85 × 10⁻¹)
        Ref. 12                1.37 × 10⁻² (3.16 × 10⁻¹)
        Ref. 13                4.09 × 10⁻² (9.43 × 10⁻¹)
  AAE   training set (1,693)   1.73 × 10⁻⁴ (3.99 × 10⁻³)
        testing set (17,457)   5.93 × 10⁻³ (1.37 × 10⁻¹)
        Ref. 12                0.78 × 10⁻² (1.80 × 10⁻¹)
        Ref. 13                2.69 × 10⁻² (6.20 × 10⁻¹)
[Figure 1: panels (a)–(d) comparing the NN-predicted energies (Hartree) and forces of testing H2O configurations with ab initio calculations.]
current FD-LM method not only reduces the fitting error, but also downsizes the training set. In Fig. 2, we show the approximated and true energies and forces of 50 randomized configurations. In a few cases, the predictions of gradients with respect to the O–O bond or O–O–O bending angle are not quite accurate; however, the majority of testing cases still reveal very good gradient predictions.

In addition, we also attempted to train the NN with a smaller dataset of 1,409 configurations. Indeed, the fitting accuracy for the training set is somewhat improved (RMSE = 0.017 eV), while the testing accuracy for all available data points is not as good (RMSE = 0.076 eV).
In the last case, the PES of ClOOCl is constructed. The four atoms in the system constitute a set of six internal input variables, which include three bond distances, two bending angles, and one dihedral angle. With more internal variables, we employ a NN with 150 hidden neurons to fit this PES. For comparison purposes, we utilize a previously-constructed data set in the training process, which contains 1,693 configurations as also employed in a previous study (with an energy range of 1.2 eV) [13]. The RMSE and AAE of the training set are reported as 2.37 × 10⁻⁴ eV (or 5.48 × 10⁻³ kcal/mol) and 1.73 × 10⁻⁴ eV (or 3.99 × 10⁻³ kcal/mol), respectively. Compared to the RMSE reported in a previous study, we believe that the current LM implementation is highly advantageous. A large testing set of 17,457 configurations (also obtained from a previous study [12]) is employed to validate fitting accuracy. The testing RMSE is subsequently estimated as 0.030 eV, which is comparable to the error produced by back-propagation fitting (0.041 eV). In particular, for the training set, we even observe that the FD-LM algorithm significantly improves the fitting error (almost 60 times smaller than the RMSE reported in the literature [12]).
In total, 1,241 epochs are used to approximate the PES with excellent statistical accuracy, while we note that more than 60,000 epochs were used when the back-propagation algorithm was employed. The testing RMSE is, however, unusually larger than the RMSE of the training set, as can be seen in Table 2. Unlike RMSE, if we look more carefully at the training and testing AAE in this