Signal Processing 87 (2007) 2708–2717
A novel efficient two-phase algorithm for training interpolation radial basis function networks
Hoang Xuan Huana, Dang Thi Thu Hiena, Huu Tue Huynhb,c,
a Faculty of Information Technology, College of Technology, Vietnam National University, Hanoi, Vietnam
b Faculty of Electronics and Telecommunications, College of Technology, Vietnam National University, Hanoi, Vietnam
c Department of Electrical and Computer Engineering, Laval University, Quebec, Canada

Received 18 October 2006; received in revised form 28 April 2007; accepted 8 May 2007
Available online 16 May 2007
Abstract
Interpolation radial basis function (RBF) networks have been widely used in various applications. The output layer weights are usually determined by minimizing the sum-of-squares error or by directly solving the interpolation equations. When the number of interpolation nodes is large, these methods are time consuming, make it difficult to control the balance between the convergence rate and the generality, and make it difficult to reach high accuracy. In this paper, we propose a two-phase algorithm for training interpolation RBF networks with bell-shaped basis functions. In the first phase, the width parameters of the basis functions are determined by taking into account the tradeoff between the error and the convergence rate. Then, the output layer weights are determined by finding the fixed point of a given contraction transformation. The running time of this new algorithm is relatively short, and the balance between the convergence rate and the generality is easily controlled by adjusting the involved parameters, while the error can be made as small as desired. Moreover, the running time can be further reduced thanks to the possibility of parallelizing the proposed algorithm. Finally, its efficiency is illustrated by simulations.
© 2007 Elsevier B.V. All rights reserved.
Keywords: Radial basis functions; Width parameters; Output weights; Contraction transformation; Fixed point
1 Introduction

Radial basis function (RBF) networks, first proposed by Powell [1] and introduced into the neural network literature by Broomhead and Lowe [2], have been widely used in pattern recognition, equalization, clustering, etc. (see [3–6]). In a multivariate interpolation network of a function f, the interpolation function has the form

$\varphi(x) = \sum_{k=1}^{M} w_k h(\|x - v_k\|, \sigma_k) + w_0$
doi:10.1016/j.sigpro.2007.05.001
This work has been financially supported by the College of Technology, Vietnam National University, Hanoi. Some preliminary results of this work were presented at the Vietnamese National Workshop on Some Selected Topics of Information Technology, Hai Phong, 25–27 August 2005.

Corresponding author: Faculty of Electronics and Telecommunications, College of Technology, Vietnam National University, 144 Xuanthuy, Caugiay, Hanoi, Vietnam. Tel.: +84 4 754 9271; fax: +84 4 754 9338.

E-mail addresses: huanhx@vnu.edu.vn (H.X. Huan), dthien2000@yahoo.com (D.T.T. Hien), huynh@gel.ulaval.ca, tuehh@vnu.edu.vn (H.T. Huynh).
such that $\varphi(x_k) = y_k$, $\forall k = 1, \ldots, N$, where $\{x_k\}_{k=1}^{N}$ is a set of n-dimensional vectors (called interpolation nodes) and $y_k = f(x_k)$ is the measured value of the function f at the node $x_k$. The real functions $h(\|x - v_k\|, \sigma_k)$ are called RBFs with the centers $v_k$, $M$ ($M \le N$) is the number of RBFs used to approximate f, and $w_k$ and $\sigma_k$ are unknown parameters to be determined. Properties of RBFs were studied in [7–9]. The most common kind of RBF is the Gaussian function $h(u, \sigma) = e^{-u^2/\sigma^2}$.
In interpolation RBF networks, the centers are the interpolation nodes; in this case, $M = N$ and $v_k = x_k$ for all k. In network training algorithms, the parameters $w_k$ and $\sigma_k$ are often determined by minimizing the sum-of-squares error or by directly solving the interpolation equations (see [4,6]). An advantage of interpolation RBF networks, proved by Bianchini et al. [10], is that their sum-of-squares error has no local minima, so that any optimization procedure always gives a unique solution. The most common training algorithm is the gradient descent method. Although the training time for an RBF network is shorter than that for a multilayer perceptron (MLP), it is still rather long, and the efficiency of any optimization algorithm depends on the choice of initial values [ref]. On the other hand, it is difficult to obtain small errors, and it is not easy to control the balance between the convergence rate and the generality, which depends on the radial parameters. Consequently, interpolation networks are only used when the number of interpolation nodes is not too large; Looney [5] suggests using this kind of network only when the number of interpolation nodes is less than 200.
Let us consider an interpolation problem in $R^4$ with 10 points along each dimension. The total number of nodes is $10^4$; even with this relatively high figure, the interpolation problem is still very sparse in nature, and known methods cannot handle this situation. In this paper, we propose a highly efficient two-phase algorithm for training interpolation networks. In the first phase, the radial parameters $\sigma_k$ are determined by balancing the convergence rate against the generality. In the second phase, the output weights $w_k$ are determined by computing the fixed point of a given contraction transformation. This algorithm converges quickly and can be parallelized in order to reduce its running time. Furthermore, it yields high accuracy. Preliminary results show that the algorithm works well even when the number of interpolation nodes is relatively large, up to 5000 nodes.
This paper is organized as follows. In Section 2, RBF networks and the usual training methods are briefly introduced. Section 3 is dedicated to the new training algorithm. Simulation results are presented in Section 4. Finally, important features of the algorithm are discussed in the conclusion.
2 Interpolation problems and RBF networks: an overview
In this section, the interpolation problem is stated first; then Gaussian RBFs and interpolation RBF networks are briefly introduced.
2.1 Multivariate interpolation problem and radial basis functions
2.1.1 Multivariate interpolation problem

Consider a multivariate function $f: D (\subset R^n) \to R^m$ and a sample set $\{x_k, y_k\}_{k=1}^{N}$ ($x_k \in R^n$, $y_k \in R^m$) such that $f(x_k) = y_k$ for $k = 1, \ldots, N$. Let $\varphi$ be a function of a known form satisfying the interpolation conditions

$\varphi(x_i) = y_i, \quad \forall i = 1, \ldots, N.$   (1)

Eq. (1) helps determine the unknown parameters in $\varphi$. The points $x_k$ are called interpolation nodes, and the function $\varphi$ is called the interpolation function of f; it is used to approximate f on the domain D. In 1987, Powell proposed to use RBFs as the interpolation function $\varphi$. This technique, using Gaussian RBFs, is described in the following; for further details, see [4–6].
2.1.2 Radial basis function technique

Without loss of generality, it is assumed that m equals 1. The interpolation function $\varphi$ has the following form:

$\varphi(x) = \sum_{k=1}^{N} w_k \varphi_k(x) + w_0,$   (2)

where

$\varphi_k(x) = e^{-\|x - v_k\|^2 / \sigma_k^2}$   (3)

is the kth RBF corresponding to the function $h(\|x - v_k\|, \sigma_k)$ in Section 1, $\|u\| = \sqrt{\sum_{i=1}^{n} u_i^2}$ is the Euclidean norm of u, the interpolation node $v_k$ is the center vector of $\varphi_k$, and $\sigma_k$ and $w_k$ are unknown parameters to be determined. For each k, the parameter $\sigma_k$, also called the width of $\varphi_k$, is used
to control the domain of influence of the RBF $\varphi_k$: if $\|x - v_k\| > 3\sigma_k$, then $\varphi_k(x)$ is almost negligible. In the approximation problem, the number of RBFs is much smaller than N, and the center vectors are chosen by any convenient method.
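To make the technique concrete, the following minimal Python sketch (our own illustration, not from the paper; the names `rbf_eval`, `centers`, `widths` are ours) evaluates $\varphi(x)$ of Eqs. (2)–(3) for given centers, widths and weights:

```python
import numpy as np

def rbf_eval(x, centers, widths, w, w0):
    """phi(x) = sum_k w_k * exp(-||x - v_k||^2 / sigma_k^2) + w0  (Eqs. (2)-(3))."""
    d2 = np.sum((centers - x) ** 2, axis=1)   # squared distances ||x - v_k||^2
    return float(w @ np.exp(-d2 / widths ** 2) + w0)
```

At a center $v_k$ the kth basis function equals 1, so the network output there is dominated by $w_k + w_0$ plus the (small) contributions of the other basis functions.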
By inserting the interpolation conditions of (1) into (2), a system of equations is obtained in order to determine the sets $\{\sigma_k\}$ and $\{w_k\}$:

$\varphi(x_i) = \sum_{k=1}^{N} w_k \varphi_k(x_i) + w_0 = y_i, \quad \forall i = 1, \ldots, N.$   (4)

Taking (3) into account, i.e. $v_k = x_k$, this gives

$\sum_{k=1}^{N} w_k e^{-\|x_i - x_k\|^2 / \sigma_k^2} = y_i - w_0 = z_i, \quad \forall i = 1, \ldots, N.$   (5)

If the parameters $\sigma_k$ are selected, then we consider the $N \times N$ matrix

$\Phi = [\varphi_{k,i}],$   (6)

with

$\varphi_{k,i} = \varphi_k(x_i) = e^{-\|x_i - x_k\|^2 / \sigma_k^2}.$   (7)

Micchelli [11] has proved that if the nodes $x_k$ are pairwise different, then $\Phi$ is positive-definite, and hence invertible. Therefore, for any $w_0$, there always exists a unique solution $w_1, \ldots, w_N$ of (5). The above technique can then be used to design interpolation RBF neural networks (hereinafter called interpolation RBF networks).
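For small N, the unique solution guaranteed by this invertibility can be obtained by solving (5) directly. A sketch (our illustration of the direct method discussed here, not the paper's proposed algorithm; names are ours):

```python
import numpy as np

def solve_weights_direct(X, y, widths, w0):
    """Solve Eq. (5) directly: Phi w = z with z_i = y_i - w0 and
    Phi[i, k] = exp(-||x_i - x_k||^2 / sigma_k^2)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)  # pairwise ||x_i - x_k||^2
    Phi = np.exp(-d2 / widths[None, :] ** 2)
    return np.linalg.solve(Phi, y - w0)
```

As the paper notes, this direct approach becomes unstable and expensive when N reaches the hundreds, which motivates the iterative scheme of Section 3.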
2.2 Interpolation RBF networks

An interpolation RBF network which interpolates an n-variable real function $f: D (\subset R^n) \to R^m$ is a three-layer feedforward network. It is composed of n nodes in the input layer, represented by the input vector $x \in R^n$; N hidden neurons, of which the kth output is the value of the radial function $\varphi_k$; and m output neurons which determine the interpolated values of f. The hidden layer is also called the RBF layer.

Like other two-phase algorithms, one advantage of this new algorithm is that the m neurons of the output layer can be trained separately. There are many different ways to train an RBF network. Schwenker et al. [6] categorize these training methods into one-, two-, and three-phase learning schemes. In one-phase training, the widths $\sigma_k$ of the RBF layer are set to a predefined real number $\sigma$, and only the output layer weights $w_k$ are adjusted. The most common learning scheme is two-phase training, where the two layers of the RBF network are trained separately: the width parameters of the RBF layer are determined first, and the output layer weights are then trained by a supervised learning rule. Three-phase training is only used for approximation RBF networks; after the initialization of the RBF network using two-phase training, the whole architecture is adjusted through a further optimization procedure. The output layer may be determined directly by solving (4); however, when the number of interpolation nodes reaches the hundreds, these methods are unstable. Usually, in a training algorithm, the output weights are determined by minimizing the sum-of-squares error, which is defined as
$E = \sum_{k=1}^{N} \|\varphi(x_k) - y_k\|^2.$

Since the function E does not have local minima, optimization procedures always give a good solution of (4). In practice, the training time of an RBF network is much shorter than that of an MLP. However, known multivariate minimization methods still take rather long running times, and it is difficult to reach a very small error or to parallelize the algorithm structure.
Moreover, the width parameters of the RBFs also affect the network quality and the training time [5]. Preferably, these parameters should be chosen large when the number of nodes is small, and small when the number of nodes is large. Therefore, they can be used to control the balance between the convergence rate and the generality of the network.

In the following, a new two-phase training algorithm is proposed. Briefly, in the first phase, the width parameters $\sigma_k$ of the network are determined by balancing its approximation generality against its convergence rate; in the second phase, the output weights $w_k$ are iteratively adjusted by finding the corresponding fixed point of a given contraction transformation.
3 Iterative training algorithm

The main idea of the new training algorithm, which is stated in the following basic theorem, is based on a contraction mapping related to the matrix $\Phi$.
3.1 Basic theorem

Denote by $W = (w_1, \ldots, w_N)^T$ and $Z = (z_1, \ldots, z_N)^T$, respectively, the output weight vector and the right-hand side of (5). By setting

$C = I - \Phi = [c_{k,j}],$   (8)

we have

$c_{k,j} = \begin{cases} 0 & \text{if } k = j, \\ -e^{-\|x_j - x_k\|^2/\sigma_k^2} & \text{if } k \ne j. \end{cases}$   (9)

Then, (4) can be expressed as follows:

$W = CW + Z.$   (10)

As mentioned in Section 2.1, if the width parameters $\sigma_k$ and $w_0$ are determined, then (10) always has a unique solution W. First, we set $w_0$ to the average of all $y_k$:

$w_0 = \frac{1}{N} \sum_{k=1}^{N} y_k.$   (11)

Now, for each $k \le N$, we have the following function $q_k$ with argument $\sigma_k$:

$q_k = \sum_{j=1,\, j \ne k}^{N} e^{-\|x_j - x_k\|^2/\sigma_k^2}.$   (12)

Theorem 1. The function $q_k(\sigma_k)$ is increasing. Also, for every positive number $q < 1$, there exists a $\sigma_k$ such that $q_k$ is equal to q.

Proof. From (9) and (12), we can easily verify that $q_k$ is an increasing function of $\sigma_k$. Moreover, we have

$\lim_{\sigma_k \to \infty} q_k = N - 1 \quad \text{and} \quad \lim_{\sigma_k \to 0} q_k = 0.$   (13)

Because the function $q_k$ is continuous, for every $q \in (0, 1)$ there exists a $\sigma_k$ such that $q_k(\sigma_k) = q$. The theorem is proved. □
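Since $q_k$ is continuous and increasing in $\sigma_k$, the width promised by Theorem 1 can be located numerically, for instance by bisection. A sketch (our illustration; the paper's own procedure in Section 3.2 instead adjusts $\sigma_k$ by a fixed factor a):

```python
import numpy as np

def q_of_sigma(X, k, sigma_k):
    """q_k(sigma_k) = sum_{j != k} exp(-||x_j - x_k||^2 / sigma_k^2), Eq. (12)."""
    d2 = np.sum((X - X[k]) ** 2, axis=1)
    d2 = np.delete(d2, k)                     # drop the j == k term
    return float(np.sum(np.exp(-d2 / sigma_k ** 2)))

def sigma_for_q(X, k, q, lo=1e-8, hi=1e8, iters=200):
    """Bisection for sigma_k with q_k(sigma_k) = q; a root exists for any
    0 < q < 1 because q_k increases from 0 to N - 1 (Theorem 1)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if q_of_sigma(X, k, mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```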
This theorem shows that for each given positive value $q < 1$, we can find a set of values $\{\sigma_k\}_{k=1}^{N}$ such that the solution $W^*$ of (10) is the fixed point of the contraction transformation $CW + Z$ with contraction coefficient q.
3.2 Algorithm description

Given an error $\epsilon$, a positive number $q < 1$ and a given $0 < a < 1$, the objective of our algorithm is to determine the parameters $\sigma_k$ and $W^*$. In the first phase, the $\sigma_k$ are determined such that $q_k \le q$, and such that if $\sigma_k$ were replaced by $\sigma_k / a$ then $q_k > q$. Therefore, the norm $\|C\|^* = \max_{\|u\|^* \le 1} \|Cu\|^*$ of the matrix C induced by the vector norm $\|\cdot\|^*$ defined in Eq. (14) is smaller than q. In the second phase, the solution $W^*$ of Eq. (10) is iteratively adjusted by finding the fixed point of the contraction transformation $CW + Z$. The algorithm is specified in Fig. 1 and described in detail thereafter.

3.2.1 Phase 1: Determining width parameters

The first phase of the algorithm determines the width parameters $\sigma_k$ such that $q_k \le q$ and $q_k$ is closest to q; i.e., if we replace $\sigma_k$ by $\sigma_k / a$ then $q_k > q$. Given a positive number $a < 1$ and an initial width $\sigma_0$, which might be chosen equal to $1/(\sqrt{2}\,(2N)^{1/n})$ as suggested in [5], the algorithm performs the following iterative procedure.

3.2.2 Phase 2: Determining output weights
To determine the solution $W^*$ of Eq. (10), the following iterative procedure is executed.

For each N-dimensional vector u, we denote by $\|u\|^*$ the following norm:

$\|u\|^* = \sum_{j=1}^{N} |u_j|.$   (14)

The end condition of the algorithm can be chosen from one of the following:

(a) $\frac{q}{1-q} \|W_1 - W_0\|^* \le \epsilon,$   (15)

(b) $t \ge \frac{\ln(\epsilon(1-q)/\|Z\|^*)}{\ln q} = \frac{\ln \epsilon - \ln \|Z\|^* + \ln(1-q)}{\ln q},$   (16)

where t is the number of iterations.
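Putting the two phases together, the procedure can be sketched as follows (our Python rendering under the paper's assumptions, using end condition (a); variable names are ours):

```python
import numpy as np

def train_rbf(X, y, q=0.8, a=0.9, eps=1e-6):
    """Two-phase training sketch. Phase 1: adjust each sigma_k by factors of a
    until q_k <= q < q_k(sigma_k / a). Phase 2: fixed-point iteration
    W <- C W + Z of the contraction (10), stopped by end condition (a)."""
    N = X.shape[0]
    w0 = y.mean()                                   # Eq. (11)
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)

    def qk(k, s):                                   # Eq. (12)
        e = np.exp(-d2[k] / s ** 2)
        return e.sum() - 1.0                        # drop the j == k term

    # Phase 1: start from the initial width suggested in [5]
    sigma = np.full(N, 1.0 / (np.sqrt(2.0) * (2.0 * N) ** (1.0 / X.shape[1])))
    for k in range(N):
        if qk(k, sigma[k]) > q:
            while qk(k, sigma[k]) > q:              # shrink (step 3)
                sigma[k] *= a
        else:
            while qk(k, sigma[k] / a) <= q:         # grow (step 4)
                sigma[k] /= a

    # Phase 2: fixed-point iteration of W = C W + Z
    Phi = np.exp(-d2 / sigma[None, :] ** 2)
    C = np.eye(N) - Phi                             # Eq. (8)
    Z = y - w0
    W = np.zeros(N)
    while True:
        W_new = C @ W + Z
        if q / (1.0 - q) * np.abs(W_new - W).sum() <= eps:  # condition (a)
            return W_new, sigma, w0
        W = W_new
```

Because the column sums of $|C|$ are the $q_k \le q < 1$, the iteration is a contraction in the norm (14) and the loop terminates with $\|W_1 - W^*\|^* \le \epsilon$.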
Fig. 1. Network training procedure.
These end conditions are suggested by the following convergence theorem.
3.3 Convergence property

The following theorem ensures the convergence of the algorithm and allows us to estimate its error.

Theorem 2. The algorithm always ends after a finite number of iterations, and the final error is bounded by

$\|W_1 - W^*\|^* \le \epsilon.$   (17)

Proof. First, from the conclusion of Theorem 1, it can be seen that the first phase of the algorithm always ends after a finite number of steps, and $q_k \le q$ for every k. On the other hand, the norm $\|C\|^*$ of the matrix C induced by the vector norm $\|\cdot\|^*$ in Eq. (14) is determined by the following equation (see [12, Theorem 2, Subsection 9.6, Chapter I]):

$\|C\|^* = \max_{k \le N} q_k.$   (18)

Therefore, phase 2 corresponds to the procedure of finding the fixed point of the contraction transformation $Cu + Z$ with contraction coefficient q, with respect to the initial approximations $u_0 = 0$ and $u_1 = Z$. It follows that if we perform t iterative steps in phase 2, then $W_1$ corresponds to the (t+1)th approximate solution $u_{t+1}$ of the fixed point $W^*$ of the contraction transformation. Using Theorem 1 in Subsection 12.2 of [12], the training error can be bounded by

$\|W_1 - W^*\|^* \le \frac{q^{t+1}}{1-q} \|u_1 - u_0\|^* = \frac{q^{t+1}}{1-q} \|Z\|^*.$   (19)

It is easy to verify that expression (16) is equivalent to $q^{t+1}/(1-q)\,\|Z\|^* \le \epsilon$; then the statement holds if end condition (b) is used. On the other hand, applying (19) at t = 0, with $W_0 = u_0$ and $u_1 = W_1$,

$\|W_1 - W^*\|^* \le \frac{q}{1-q} \|W_1 - W_0\|^*.$   (20)

Combining (15) and (20) gives (17); then the statement holds if end condition (a) is used. The theorem is proved. □
3.4 Complexity of the algorithm

In this section, the complexity of each phase of the algorithm is analyzed.

Phase 1: Besides n and N, the complexity of the first phase depends on the distribution of the interpolation nodes $\{x_k\}_{k=1}^{N}$ and does not depend on the function f. Depending on the initial choice of $\sigma_0$, either $q_k > q$ (corresponding to step 3 of Fig. 2) or $q_k \le q$ (corresponding to step 4 of Fig. 2). In the former case, for every $k \le N$, let $m_k$ be the number of iterations in step 3 such that $q_k > q$ with $\sigma_k = a^{m_k - 1}\sigma_0$ but $q_k \le q$ with $\sigma_k = a^{m_k}\sigma_0$. Therefore,

$m_k \le \log_a \frac{\sigma_{\min}}{\sigma_0}, \quad \text{where } \sigma_{\min} = \min\{\sigma_k\} \ (\le \sigma_0).$   (21)

In the same manner, if $m_k$ is the number of iterations in step 4, then

$m_k \le \log_a \frac{\sigma_0}{\sigma_{\max}}, \quad \text{where } \sigma_{\max} = \max\{\sigma_k\} \ (\ge \sigma_0).$   (22)

Let

$c = \max\left\{\log_a \frac{\sigma_{\min}}{\sigma_0},\ \log_a \frac{\sigma_0}{\sigma_{\max}}\right\};$   (23)

then the complexity of phase 1 is $O(cnN^2)$ (Fig. 3).

Phase 2: The number T of iterations in phase 2 depends on the norm $\|Z\|^*$ of the vector Z and on the value q. It follows from (16) and the proof of Theorem 2 that T can be estimated by

$T = \left\lceil \frac{\ln(\epsilon(1-q)/\|Z\|^*)}{\ln q} \right\rceil.$   (24)

Therefore, the complexity of phase 2 is $O(TnN^2)$. Hence, the total complexity of this new algorithm is $O((T+c)nN^2)$.
4 Simulation study

Simulations for a 3-input RBF network are performed in order to test the training time and the generality of the algorithm. Its efficiency is also compared to that of the gradient algorithm. The network generality is tested by the following procedure: first, some points that do not belong to the set of interpolation nodes are chosen; then, after the network has been trained, the network outputs are compared to the true values of the function at these points in order to estimate the error.

Because all norms in a finite-dimensional space are equivalent (see [12, theorem in Section 9.2]), instead of the norm $\|\cdot\|^*$ determined by (14), the norm $\|u\|_\infty = \max_{j \le N} |u_j|$ is used for the end condition (15). Since $\|u\|^* \le N\|u\|_\infty$, this change does not affect the convergence property of the algorithm.
The data in the following example are obtained by approximately scaling each dimension and then combining the coordinates, from which the data points are chosen. The simulations are run on a computer with the following configuration: Intel Pentium 4 processor, 3.0 GHz, 256 MB DDR RAM. The test results and comments for the function $y = x_1 x_2 + \sin(x_2 + x_3 + 1) + 4$ are presented below.
4.1 Test of training time

The training time, reflecting the convergence rate, is examined for several numbers of nodes and for different values of the parameters q, a and $\epsilon$.
4.1.1 Testing results

Simulations are done for the function $y = x_1 x_2 + \sin(x_2 + x_3 + 1) + 4$ with $x_1 \in [0,3]$, $x_2 \in [0,4]$, $x_3 \in [0,5]$. The stopping rule uses $\epsilon = 10^{-6}$. The parameters q and a are set in turn to 0.5 and 0.5; 0.8 and 0.5; 0.8 and 0.7; 0.8 and 0.8, respectively, with the number of nodes varying from 100 to 5000. The simulation results are presented in Table 1.

Table 2 shows results for $q = a = 0.7$ with 2500 nodes and different values of the stopping error $\epsilon$.

Comments: From these results, it is observed that:

(1) The training time of our algorithm is relatively short (only several minutes for the case of about 3000 nodes). It increases when q or a
Fig. 3. Specification of the second phase of the algorithm.
Fig. 2. Specification of the first phase of the algorithm.
increase. This means that the smaller q or a is, the shorter the training time; the training time is, however, more sensitive to a than to q.

(2) When the stopping error is reduced, the total training time changes very slightly. This means that the high accuracy required for a given application does not strongly affect the training time.
4.2 Test of generality when q or a is changed sequentially

To avoid unnecessarily long running times, in this part we limit the number of nodes to 400. These nodes, scattered in the domain {$x_1 \in [0,3]$, $x_2 \in [0,4]$, $x_3 \in [0,5]$}, are generated as described above, and the network is trained for different values of q and a, with the stopping error $\epsilon = 10^{-6}$. After the training is completed, the errors at 8 randomly chosen points that do not belong to the trained nodes are checked. Test results for the cases in which q or a is changed sequentially are presented in Tables 3 and 4.
4.2.1 Test with q = 0.8 and a changed sequentially

Testing results: Experimental results for $\epsilon = 10^{-6}$, q = 0.8 and a set in turn to 0.9, 0.8, 0.6, 0.4 are presented in Table 3.

Comment: From these results, it can be observed that when a increases, the checked errors decrease quickly. This implies that when a is small, the width parameters $\sigma_k$ are also small, which influences the generality of the network. In our experience, it is convenient to set $a \in [0.7, 0.9]$; the concrete choice depends on the balance between the demanded training time and the generality of the network.
4.2.2 Test with a = 0.9 and q changed sequentially

Testing results: The results for $\epsilon = 10^{-6}$, a = 0.9 and q set in turn to 0.9, 0.7, 0.5, 0.3 are presented in Table 4.

Comment: These results show that the generality of the network strongly increases when q increases, although the change of q only weakly influences the training time, as mentioned in Section 4.1.

4.3 Comparison with the gradient algorithm
We have performed simulations for the function $y = x_1 x_2 + \sin(x_2 + x_3 + 1) + 4$ with 100 interpolation nodes and $x_1 \in [0,3]$, $x_2 \in [0,4]$, $x_3 \in [0,5]$. For the gradient algorithm family, it is very difficult to reach a high training accuracy, and it is also difficult to control the generality of the networks. Besides the training time, the accuracy at trained nodes and the error at untrained nodes (generality) obtained by the gradient method and by our algorithm are now compared. The program for the gradient algorithm is implemented using Matlab 6.5.
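The kind of baseline used for this comparison can be sketched as plain gradient descent on the output-layer sum-of-squares error (our Python stand-in for the Matlab baseline, not the authors' exact code; the learning rate and iteration count are illustrative):

```python
import numpy as np

def train_weights_gradient(Phi, z, lr=1e-3, n_iter=10000):
    """Minimize E(w) = ||Phi w - z||^2 by gradient descent; only a stand-in
    sketch of the gradient-family baseline compared against in this section."""
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        w -= lr * 2.0 * Phi.T @ (Phi @ w - z)   # gradient of E with respect to w
    return w
```

In contrast to the fixed-point iteration of Section 3, the accuracy reached here depends on the learning rate and on how long the loop is run, which is the behavior the comparison below exhibits.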
4.3.1 Test of accuracy at trained nodes

We randomly choose 8 nodes among the 100 interpolation nodes. After training the network by our algorithm with $\epsilon = 10^{-6}$, q = 0.8, a = 0.9 (training time: 1 s) and by the gradient algorithm in two cases, with 100 iterations (training time: 1 s) and with 10,000
Table 1
Training time (s) for the stopping rule defined by $\epsilon = 10^{-6}$, for numbers of nodes from 100 to 5000 and (q, a) set to (0.5, 0.5), (0.8, 0.5), (0.8, 0.7) and (0.8, 0.8)

Table 2
Training time (s) for the case of 2500 nodes, $q = a = 0.7$ and varying stopping error $\epsilon$
iterations (training time: 180 s), we check the errors at the chosen nodes to compare the accuracy of the algorithms.
Testing results: The experimental results are presented in Table 5.

Comment: It can be observed that our algorithm is much better than the gradient algorithm in both training time and accuracy. This fact seems natural, because the gradient algorithm uses an optimization procedure, and it is known to be difficult to obtain high accuracy with any optimization procedure.

4.3.2 Comparison of generality

We randomly choose 8 untrained nodes. After training the network by the two algorithms with the same parameters as in Section 4.3.1, we check the errors at the chosen nodes to compare the generality of the algorithms.

Testing results: The experimental results are presented in Table 6.

Comments: From these results, it is very important to observe that in MLP networks, it is well known that when the training error is small, the overfitting phenomenon may happen [13]. But in RBF networks, the RBFs only have local influence, so that when the data are not noisy, overfitting is not a serious problem. In fact, the simulation results show that this new algorithm offers a very short training time with a very small test error, compared to the gradient algorithm.
5 Discussion and conclusion

This paper proposes a simple two-phase algorithm to train interpolation RBF networks. The first phase is to iteratively determine the widths of
Table 3
Checking error at nodes with q = 0.8, $\epsilon = 10^{-6}$ and a set in turn to 0.9, 0.8, 0.6. Each row gives the coordinates of the checked point, the original function value, and then the interpolation value and error (×10⁻⁴) for a = 0.9 (training time 5 s), a = 0.8 (4 s) and a = 0.6 (4 s).

(2.68412, 2.94652, 3.329423): 26.065739 | 26.0679, 21.6 | 26.06879, 30.502 | 26.0691, 33.802
(2.21042, 1.052145, 0.040721): 10.007523 | 10.0024, 51.24 | 10.0144, 68.763 | 10.0146, 71.163
(2.842314, 2.525423, 0.048435): 23.983329 | 24.01001, 266.81 | 24.0201, 367.706 | 24.0251, 417.90
(2.842315, 3.789123, 3.283235): 35.587645 | 35.5818, 58.45 | 35.5799, 77.452 | 35.5963, 86.548
(2.05235, 3.78235, 1.63321): 20.063778 | 20.05203, 117.48 | 20.0803, 165.219 | 20.0812, 174.21
(2.84202, 3.789241, 3.283023): 35.582265 | 35.5986, 163.34 | 35.5621, 201.655 | 35.561, 212.65
(2.051234, 3.15775, 0.59763): 16.287349 | 16.28183, 55.16 | 16.294, 66.505 | 16.295, 78.505
(2.52621, 3.36832, 0.86412): 24.627938 | 24.67451, 465.72 | 24.58628, 416.584 | 24.5798, 481.38

Table 4
Checking error at nodes with a = 0.9, $\epsilon = 10^{-6}$ and q set in turn to 0.9, 0.7, 0.5. Each row gives the coordinates of the checked point, the original function value, and then the interpolation value and error (×10⁻⁴) for q = 0.9, q = 0.7 and q = 0.5 (all with a = 0.9).

(2.68412, 2.94652, 3.32942): 26.06573 | 26.0655, 2.22 | 26.0654, 3.12 | 26.0693, 35.46
(2.21042, 1.052145, 0.04072): 10.00752 | 10.0217, 141.79 | 10.0196, 120.33 | 10.0224, 149.06
(2.842314, 2.525423, 0.04843): 23.98332 | 24.0112, 279.17 | 24.0204, 370.87 | 24.0221, 387.53
(2.842315, 3.789123, 3.28323): 35.58764 | 35.5818, 58.03 | 35.5819, 57.27 | 35.5818, 58.08
(2.05235, 3.78235, 1.63321): 20.06377 | 20.1105, 467.62 | 20.1159, 520.95 | 20.1135, 497.7
(2.84202, 3.7892411, 3.28302): 35.58226 | 35.5881, 58.26 | 35.5884, 61.45 | 35.5886, 63.11
(2.051234, 3.15775, 0.59763): 16.28734 | 16.2853, 20.73 | 16.2852, 21.13 | 16.2775, 98.93
(2.52621, 3.36832, 0.86412): 24.62793 | 24.6117, 162.8 | 24.6133, 146.16 | 24.6108, 171.74
the Gaussian RBF associated with each node, and each RBF is trained separately from the others. The second phase iteratively computes the output layer weights by using a given contraction mapping. It is shown in this paper that the algorithm always converges, and that the running time depends only on the initial values of q, a and $\epsilon$, on the distribution of the interpolation nodes, and on the vector norm of the interpolated function computed at these nodes.

Owing to the numerical advantages of contraction transformations, it is easy to obtain very small training errors, and it is also easy to control the balance between the convergence rate and the generality of the network by setting appropriate parameter values. One of the most important features of this algorithm is that the output layer weights can be trained independently, so that the whole algorithm can be parallelized. Furthermore, for a large network, the stopping rule based on the norm of N-dimensional vectors can be replaced by the much simpler one defined in Eq. (16) to avoid lengthy computations.
When the number of nodes is very large, a clustering approach can be used to regroup the data into several sets of smaller size. By doing so, the training can be done in parallel for each cluster, which helps to reduce the training time. The obtained networks are called local RBF networks. This approach might be considered as equivalent to the spline method, and it will be presented in a forthcoming paper.
In the case of a very large number N of nodes, and from the point of view of a neural network as an associative memory, another approach can be exploited. In fact, an approximate RBF network can be designed with a number of hidden nodes
Table 5
Checking error at 8 trained nodes to compare accuracy. Each row gives the coordinates of the checked node, the original function value, and then the interpolation value and error (×10⁻⁴) for the gradient algorithm with 100 iterations (training time 1 s), the gradient algorithm with 10,000 iterations (180 s), and the new algorithm with $\epsilon = 10^{-6}$, q = 0.8, a = 0.9 (1 s).

(1.666667, 0.000000, 0.000000): 4.841471 | 4.4645, 3769.7 | 5.0959, 2544.2 | 4.84146, 0.1
(0.333333, 0.444444, 1.379573): 4.361647 | 3.5933, 7683.4 | 3.6708, 6908.4 | 4.36166, 0.09
(2.666667, 0.444444, 1.536421): 7.320530 | 8.7058, 13852.7 | 7.2647, 558.2 | 7.32052, 0.08
(0.666667, 1.333333, 0.128552): 5.221158 | 4.0646, 11565.5 | 4.9517, 2694.5 | 5.22117, 0.1
(2.666667, 1.333333, 1.589585): 12.77726 | 12.5041, 2731.6 | 12.1965, 5807.6 | 12.7772, 0.7
(1.666667, 1.777778, 0.088890): 9.209746 | 6.6682, 25415.4 | 9.2944, 846.5 | 9.20972, 0.2
(2.333333, 0.444444, 0.039225): 7.415960 | 6.7228, 6931.5 | 7.48, 640.4 | 7.41596, 0.005
(2.666667, 3.555556, 0.852303): 28.51619 | 28.0927, 4234.9 | 29.2798, 7636.0 | 28.5162, 0.09

Table 6
Checking error at 8 untrained nodes to compare generality. Columns as in Table 5.

(0.32163, 0.45123, 1.38123): 4.350910 | 2.1394, 22115.1 | 3.9309, 4200.1 | 4.32214, 287.7
(0.67123, 0.8912, 1.4512): 4.202069 | 2.8529, 13491.6 | 4.7884, 5863.3 | 4.20115, 9.1
(1.68125, 1.34121, 0.27423): 8.293276 | 6.1078, 21854.7 | 8.3869, 936.2 | 8.30878, 155.0
(0.34312, 1.78123, 2.56984): 3.406823 | 3.2115, 1953.2 | 4.1438, 7369.7 | 3.399, 78.2
(2.65989, 3.56012, 0.8498): 28.42147 | 27.5174, 9040.7 | 29.1648, 7433.2 | 28.429, 75.2
(1.67013, 2.23123, 0.29423): 9.84913 | 8.6415, 12076.3 | 9.5863, 2628.3 | 9.79204, 570.9
(2.65914, 3.56123, 0.85612): 28.41991 | 27.5147, 9052.1 | 29.1634, 7434.8 | 28.419, 9.1
(1.3163, 0.44925, 1.12987): 5.311670 | 3.5188, 17928.7 | 5.3729, 612.2 | 5.28737, 243.0
much smaller than N, based on the following scheme. First, the data set is partitioned into K clusters $\{C_i\}_{i=1}^{K}$ by using any clustering algorithm (for example, the k-means method). Then the center $\nu_i$ of the RBF associated with the ith hidden neuron can be chosen to be the mean vector $d_i$ of $C_i$, or the vector in $C_i$ nearest to $d_i$. The network is trained by the algorithm with the set of new interpolation nodes $\{\nu_i\}_{i=1}^{K}$. Other choices based on any variation of this approach can be made, depending on the context of the desired applications.
An advantage of RBF networks is their local influence property (see [13,14]), so that width parameters are generally chosen small (see Section 6.1, pp. 288–289 of [15]); in particular, [5, Section 7.7, p. 262] suggests choosing $\sigma$ = 0.05 or 0.1, and Section 3.11, p. 99 of [5] suggests using $\sigma = 1/(2N)^{1/n}$, which is very small. Therefore, $q_k$ must be small. In fact, the condition $q_k < 1$ presents some limitations, but does not affect the algorithm performance.
Our iterative algorithm is based on the principle of contraction mapping; to ensure the contraction property, the choice of q via (12) is fundamental to this algorithm, so it is rather empirical and does not correspond to any optimality consideration. In RBF networks, determining the optimum width parameters (or q) is still an open problem.
References

[1] M.J.D. Powell, Radial basis function approximations to polynomials, in: Proceedings of Numerical Analysis 1987, Dundee, UK, 1988, pp. 223–241.
[2] D.S. Broomhead, D. Lowe, Multivariable functional interpolation and adaptive networks, Complex Syst. 2 (1988) 321–355.
[3] E. Blanzieri, Theoretical interpretations and applications of radial basis function networks, Technical Report DIT-03-023, Informatica e Telecomunicazioni, University of Trento, 2003.
[4] S. Haykin, Neural Networks: A Comprehensive Foundation, second ed., Prentice-Hall Inc., Englewood Cliffs, NJ, 1999.
[5] C.G. Looney, Pattern Recognition Using Neural Networks: Theory and Algorithms for Engineers and Scientists, Oxford University Press, New York, 1997.
[6] F. Schwenker, H.A. Kestler, G. Palm, Three learning phases for radial-basis-function networks, Neural Networks 14 (4–5) (2001) 439–458.
[7] E.J. Hartman, J.D. Keeler, J.M. Kowalski, Layered neural networks with Gaussian hidden units as universal approximations, Neural Comput. 2 (2) (1990) 210–215.
[8] J. Park, I.W. Sandberg, Approximation and radial-basis-function networks, Neural Comput. 5 (3) (1993) 305–316.
[9] T. Poggio, F. Girosi, Networks for approximating and learning, Proc. IEEE 78 (9) (1990) 1481–1497.
[10] M. Bianchini, P. Frasconi, M. Gori, Learning without local minima in radial basis function networks, IEEE Trans. Neural Networks 6 (3) (1995) 749–756.
[11] C. Micchelli, Interpolation of scattered data: distance matrices and conditionally positive definite functions, Constr. Approx. 2 (1986) 11–22.
[12] L. Collatz, Functional Analysis and Numerical Mathematics, Academic Press, New York, 1966.
[13] T.M. Mitchell, Machine Learning, McGraw-Hill, New York, 1997.
[14] H.X. Huan, D.T.T. Hien, An iterative algorithm for training interpolation RBF networks, in: Proceedings of the Vietnamese National Workshop on Some Selected Topics of Information Technology, Haiphong, Vietnam, 2005, pp. 314–323.
[15] M.H. Hassoun, Fundamentals of Artificial Neural Networks, MIT Press, Cambridge, MA, 1995.