Signal Processing 87 (2007) 2708–2717
A novel efficient two-phase algorithm for training interpolation radial basis function networks
Hoang Xuan Huana, Dang Thi Thu Hiena, Huu Tue Huynhb,c,
a Faculty of Information Technology, College of Technology, Vietnam National University, Hanoi, Vietnam
b Faculty of Electronics and Telecommunications, College of Technology, Vietnam National University, Hanoi, Vietnam
c Department of Electrical and Computer Engineering, Laval University, Quebec, Canada

Received 18 October 2006; received in revised form 28 April 2007; accepted 8 May 2007
Available online 16 May 2007
Abstract
Interpolation radial basis function (RBF) networks have been widely used in various applications. The output layer weights are usually determined by minimizing the sum-of-squares error or by directly solving the interpolation equations. When the number of interpolation nodes is large, these methods are time consuming, make it difficult to control the balance between the convergence rate and the generality, and make it difficult to reach high accuracy. In this paper, we propose a two-phase algorithm for training interpolation RBF networks with bell-shaped basis functions. In the first phase, the width parameters of the basis functions are determined by taking into account the tradeoff between the error and the convergence rate. Then, the output layer weights are determined by finding the fixed point of a given contraction transformation. The running time of this new algorithm is relatively short, and the balance between the convergence rate and the generality is easily controlled by adjusting the involved parameters, while the error can be made as small as desired. Moreover, the running time can be further reduced thanks to the possibility of parallelizing the proposed algorithm. Finally, its efficiency is illustrated by simulations.
© 2007 Elsevier B.V. All rights reserved.
Keywords: Radial basis functions; Width parameters; Output weights; Contraction transformation; Fixed point
1 Introduction

Radial basis function (RBF) networks, first proposed by Powell [1] and introduced into the neural network literature by Broomhead and Lowe [2], have been widely used in pattern recognition, equalization, clustering, etc. (see [3–6]). In a multivariate interpolation network of a function f, the interpolation function has the form

$\varphi(x) = \sum_{k=1}^{M} w_k h(\|x - v_k\|, \sigma_k) + w_0$
doi:10.1016/j.sigpro.2007.05.001
This work has been financially supported by the College of Technology, Vietnam National University, Hanoi. Some preliminary results of this work were presented at the Vietnamese National Workshop on Some Selected Topics of Information Technology, Hai Phong, 25–27 August 2005.

Corresponding author: Faculty of Electronics and Telecommunications, College of Technology, Vietnam National University, 144 Xuanthuy, Caugiay, Hanoi, Vietnam. Tel.: +84 4 754 9271; fax: +84 4 754 9338.

E-mail addresses: huanhx@vnu.edu.vn (H.X. Huan), dthien2000@yahoo.com (D.T.T. Hien), huynh@gel.ulaval.ca, tuehh@vnu.edu.vn (H.T. Huynh).
such that $\varphi(x_k) = y_k$, $\forall k = 1, \ldots, N$, where $\{x_k\}_{k=1}^{N}$ is a set of n-dimensional vectors (called interpolation nodes) and $y_k = f(x_k)$ is the measured value of the function f at the node $x_k$. The real functions $h(\|x - v_k\|, \sigma_k)$ are called RBFs with the centers $v_k$, $M$ ($M \le N$) is the number of RBFs used to approximate f, and $w_k$ and $\sigma_k$ are unknown parameters to be determined. Properties of RBFs were studied in [7–9]. The most common kind of RBF is the Gaussian function $h(u, \sigma) = e^{-u^2/\sigma^2}$.
In interpolation RBF networks, the centers are the interpolation nodes; in this case, $M = N$ and $v_k = x_k$ for all k. In network training algorithms, the parameters $w_k$ and $\sigma_k$ are often determined by minimizing the sum-of-squares error or by directly solving the interpolation equations (see [4,6]). An advantage of interpolation RBF networks, proved by Bianchini et al. [10], is that their sum-of-squares error has no local minima, so that any optimization procedure always gives a unique solution. The most common training algorithm is the gradient descent method. Although the training time for an RBF network is shorter than that for a multilayer perceptron (MLP), it is still rather long, and the efficiency of any optimization algorithm depends on the choice of initial values [ref]. On the other hand, it is difficult to obtain small errors, and it is not easy to control the balance between the convergence rate and the generality, which depends on the radial parameters. Consequently, interpolation networks are only used when the number of interpolation nodes is not too large; Looney [5] suggests using this kind of network only when the number of interpolation nodes is less than 200.
Let us consider an interpolation problem in $R^4$ with 10 points along each dimension. The total number of nodes is $10^4$; even with this relatively high figure, the interpolation problem is still very sparse in nature, and known methods cannot handle this situation. In this paper, we propose a highly efficient two-phase algorithm for training interpolation networks. In the first phase, the radial parameters $\sigma_k$ are determined by balancing the convergence rate against the generality. In the second phase, the output weights $w_k$ are determined by computing the fixed point of a given contraction transformation. This algorithm converges quickly and can be parallelized in order to reduce its running time. Furthermore, it yields high accuracy. Preliminary results show that the algorithm works well even when the number of interpolation nodes is relatively large, up to 5000 nodes.
This paper is organized as follows. In Section 2, RBF networks and the usual training methods are briefly introduced. Section 3 is dedicated to the new training algorithm. Simulation results are presented in Section 4. Finally, important features of the algorithm are discussed in the conclusion.
2 Interpolation problems and RBF networks: an overview
In this section, the interpolation problem is stated first; then Gaussian RBFs and interpolation RBF networks are briefly introduced.
2.1 Multivariate interpolation problem and radial basis functions
2.1.1 Multivariate interpolation problem

Consider a multivariate function $f: D (\subset R^n) \to R^m$ and a sample set $\{x_k, y_k\}_{k=1}^{N}$ ($x_k \in R^n$, $y_k \in R^m$) such that $f(x_k) = y_k$ for $k = 1, \ldots, N$. Let $\varphi$ be a function of a known form satisfying the interpolation conditions

$\varphi(x_i) = y_i, \quad \forall i = 1, \ldots, N.$   (1)

Eq. (1) helps determine the unknown parameters in $\varphi$. The points $x_k$ are called interpolation nodes, and the function $\varphi$ is called the interpolation function of f; it is used to approximate f on the domain D. In 1987, Powell proposed to use RBFs as the interpolation function $\varphi$. This technique, using Gaussian RBFs, is described in the following; for further details, see [4–6].
2.1.2 Radial basis function technique

Without loss of generality, it is assumed that m equals 1. The interpolation function $\varphi$ has the following form:

$\varphi(x) = \sum_{k=1}^{N} w_k \varphi_k(x) + w_0,$   (2)

where

$\varphi_k(x) = e^{-\|x - v_k\|^2 / \sigma_k^2}$   (3)

is the kth RBF corresponding to the function $h(\|x - v_k\|, \sigma_k)$ in Section 1, $\|u\| = \sqrt{\sum_{i=1}^{n} u_i^2}$ is the Euclidean norm of u, the interpolation node $v_k$ is the center vector of $\varphi_k$, and $\sigma_k$ and $w_k$ are unknown parameters to be determined. For each k, the parameter $\sigma_k$, also called the width of $\varphi_k$, is used
to control the domain of influence of the RBF $\varphi_k$: if $\|x - v_k\| > 3\sigma_k$, then $\varphi_k(x)$ is almost negligible. In the approximation problem, the number of RBFs is much smaller than N, and the center vectors are chosen by any convenient method.
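To make the technique concrete, the following minimal Python sketch (our own illustration, not from the paper; the names `rbf_eval`, `centers`, `widths` are ours) evaluates $\varphi(x)$ of Eqs. (2)–(3) for given centers, widths and weights:

```python
import numpy as np

def rbf_eval(x, centers, widths, w, w0):
    """phi(x) = sum_k w_k * exp(-||x - v_k||^2 / sigma_k^2) + w0  (Eqs. (2)-(3))."""
    d2 = np.sum((centers - x) ** 2, axis=1)   # squared distances ||x - v_k||^2
    return float(w @ np.exp(-d2 / widths ** 2) + w0)
```

At a center $v_k$ the kth basis function equals 1, so the network output there is dominated by $w_k + w_0$ plus the (small) contributions of the other basis functions.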
By inserting the interpolation conditions of (1) into (2), a system of equations is obtained in order to determine the sets $\{\sigma_k\}$ and $\{w_k\}$:

$\varphi(x_i) = \sum_{k=1}^{N} w_k \varphi_k(x_i) + w_0 = y_i, \quad \forall i = 1, \ldots, N.$   (4)

Taking (3) into account, i.e. $v_k = x_k$, this gives

$\sum_{k=1}^{N} w_k e^{-\|x_i - x_k\|^2 / \sigma_k^2} = y_i - w_0 = z_i, \quad \forall i = 1, \ldots, N.$   (5)

If the parameters $\sigma_k$ are selected, then we consider the $N \times N$ matrix

$\Phi = [\varphi_{k,i}],$   (6)

with

$\varphi_{k,i} = \varphi_k(x_i) = e^{-\|x_i - x_k\|^2 / \sigma_k^2}.$   (7)

Micchelli [11] has proved that if the nodes $x_k$ are pairwise different, then $\Phi$ is positive-definite, and hence invertible. Therefore, for any $w_0$, there always exists a unique solution $w_1, \ldots, w_N$ of (5). The above technique can then be used to design interpolation RBF neural networks (hereinafter called interpolation RBF networks).
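For small N, the unique solution guaranteed by this invertibility can be obtained by solving (5) directly. A sketch (our illustration of the direct method discussed here, not the paper's proposed algorithm; names are ours):

```python
import numpy as np

def solve_weights_direct(X, y, widths, w0):
    """Solve Eq. (5) directly: Phi w = z with z_i = y_i - w0 and
    Phi[i, k] = exp(-||x_i - x_k||^2 / sigma_k^2)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)  # pairwise ||x_i - x_k||^2
    Phi = np.exp(-d2 / widths[None, :] ** 2)
    return np.linalg.solve(Phi, y - w0)
```

As the paper notes, this direct approach becomes unstable and expensive when N reaches the hundreds, which motivates the iterative scheme of Section 3.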
2.2 Interpolation RBF networks

An interpolation RBF network which interpolates an n-variable real function $f: D (\subset R^n) \to R^m$ is a three-layer feedforward network. It is composed of n nodes in the input layer, represented by the input vector $x \in R^n$; N hidden neurons, of which the kth output is the value of the radial function $\varphi_k$; and m output neurons which determine the interpolated values of f. The hidden layer is also called the RBF layer.

Like other two-phase algorithms, one advantage of this new algorithm is that the m neurons of the output layer can be trained separately. There are many different ways to train an RBF network. Schwenker et al. [6] categorize these training methods into one-, two-, and three-phase learning schemes. In one-phase training, the widths $\sigma_k$ of the RBF layer are set to a predefined real number $\sigma$, and only the output layer weights $w_k$ are adjusted. The most common learning scheme is two-phase training, where the two layers of the RBF network are trained separately: the width parameters of the RBF layer are determined first, and the output layer weights are then trained by a supervised learning rule. Three-phase training is only used for approximation RBF networks; after the initialization of the RBF network using two-phase training, the whole architecture is adjusted through a further optimization procedure. The output layer may be determined directly by solving (4); however, when the number of interpolation nodes reaches the hundreds, these methods are unstable. Usually, in a training algorithm, the output weights are determined by minimizing the sum-of-squares error, which is defined as
$E = \sum_{k=1}^{N} \|\varphi(x_k) - y_k\|^2.$

Since the function E does not have local minima, optimization procedures always give a good solution of (4). In practice, the training time of an RBF network is much shorter than that of an MLP. However, known multivariate minimization methods still take rather long running times, and it is difficult to reach a very small error or to parallelize the algorithm structure.
Moreover, the width parameters of the RBFs also affect the network quality and the training time [5]. Preferably, these parameters should be chosen large when the number of nodes is small, and small when the number of nodes is large. Therefore, they can be used to control the balance between the convergence rate and the generality of the network.

In the following, a new two-phase training algorithm is proposed. Briefly, in the first phase, the width parameters $\sigma_k$ of the network are determined by balancing its approximation generality against its convergence rate; in the second phase, the output weights $w_k$ are iteratively adjusted by finding the corresponding fixed point of a given contraction transformation.
3 Iterative training algorithm

The main idea of the new training algorithm, which is stated in the following basic theorem, is based on a contraction mapping related to the matrix $\Phi$.
3.1 Basic theorem

Denote by $W = (w_1, \ldots, w_N)^T$ and $Z = (z_1, \ldots, z_N)^T$, respectively, the output weight vector and the right-hand side of (5). By setting

$C = I - \Phi = [c_{k,j}],$   (8)

we have

$c_{k,j} = \begin{cases} 0 & \text{if } k = j, \\ -e^{-\|x_j - x_k\|^2/\sigma_k^2} & \text{if } k \ne j. \end{cases}$   (9)

Then, (4) can be expressed as follows:

$W = CW + Z.$   (10)

As mentioned in Section 2.1, if the width parameters $\sigma_k$ and $w_0$ are determined, then (10) always has a unique solution W. First, we set $w_0$ to the average of all $y_k$:

$w_0 = \frac{1}{N} \sum_{k=1}^{N} y_k.$   (11)

Now, for each $k \le N$, we have the following function $q_k$ with argument $\sigma_k$:

$q_k = \sum_{j=1,\, j \ne k}^{N} e^{-\|x_j - x_k\|^2/\sigma_k^2}.$   (12)

Theorem 1. The function $q_k(\sigma_k)$ is increasing. Also, for every positive number $q < 1$, there exists a $\sigma_k$ such that $q_k$ is equal to q.

Proof. From (9) and (12), we can easily verify that $q_k$ is an increasing function of $\sigma_k$. Moreover, we have

$\lim_{\sigma_k \to \infty} q_k = N - 1 \quad \text{and} \quad \lim_{\sigma_k \to 0} q_k = 0.$   (13)

Because the function $q_k$ is continuous, for every $q \in (0, 1)$ there exists a $\sigma_k$ such that $q_k(\sigma_k) = q$. The theorem is proved. □
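Since $q_k$ is continuous and increasing in $\sigma_k$, the width promised by Theorem 1 can be located numerically, for instance by bisection. A sketch (our illustration; the paper's own procedure in Section 3.2 instead adjusts $\sigma_k$ by a fixed factor a):

```python
import numpy as np

def q_of_sigma(X, k, sigma_k):
    """q_k(sigma_k) = sum_{j != k} exp(-||x_j - x_k||^2 / sigma_k^2), Eq. (12)."""
    d2 = np.sum((X - X[k]) ** 2, axis=1)
    d2 = np.delete(d2, k)                     # drop the j == k term
    return float(np.sum(np.exp(-d2 / sigma_k ** 2)))

def sigma_for_q(X, k, q, lo=1e-8, hi=1e8, iters=200):
    """Bisection for sigma_k with q_k(sigma_k) = q; a root exists for any
    0 < q < 1 because q_k increases from 0 to N - 1 (Theorem 1)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if q_of_sigma(X, k, mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```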
This theorem shows that for each given positive value $q < 1$, we can find a set of values $\{\sigma_k\}_{k=1}^{N}$ such that the solution $W^*$ of (10) is the fixed point of the contraction transformation $CW + Z$ with contraction coefficient q.
3.2 Algorithm description

Given an error $\epsilon$, a positive number $q < 1$ and a given $0 < a < 1$, the objective of our algorithm is to determine the parameters $\sigma_k$ and $W^*$. In the first phase, the $\sigma_k$ are determined such that $q_k \le q$, and such that if $\sigma_k$ were replaced by $\sigma_k / a$ then $q_k > q$. Therefore, the norm $\|C\|^* = \max_{\|u\|^* \le 1} \|Cu\|^*$ of the matrix C induced by the vector norm $\|\cdot\|^*$ defined in Eq. (14) is smaller than q. In the second phase, the solution $W^*$ of Eq. (10) is iteratively adjusted by finding the fixed point of the contraction transformation $CW + Z$. The algorithm is specified in Fig. 1 and described in detail thereafter.

3.2.1 Phase 1: Determining width parameters

The first phase of the algorithm determines the width parameters $\sigma_k$ such that $q_k \le q$ and $q_k$ is closest to q; i.e., if we replace $\sigma_k$ by $\sigma_k / a$ then $q_k > q$. Given a positive number $a < 1$ and an initial width $\sigma_0$, which might be chosen equal to $1/(\sqrt{2}\,(2N)^{1/n})$ as suggested in [5], the algorithm performs the following iterative procedure.

3.2.2 Phase 2: Determining output weights
To determine the solution $W^*$ of Eq. (10), the following iterative procedure is executed.

For each N-dimensional vector u, we denote by $\|u\|^*$ the following norm:

$\|u\|^* = \sum_{j=1}^{N} |u_j|.$   (14)

The end condition of the algorithm can be chosen from one of the following:

(a) $\frac{q}{1-q} \|W_1 - W_0\|^* \le \epsilon,$   (15)

(b) $t \ge \frac{\ln(\epsilon(1-q)/\|Z\|^*)}{\ln q} = \frac{\ln \epsilon - \ln \|Z\|^* + \ln(1-q)}{\ln q},$   (16)

where t is the number of iterations.
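Putting the two phases together, the procedure can be sketched as follows (our Python rendering under the paper's assumptions, using end condition (a); variable names are ours):

```python
import numpy as np

def train_rbf(X, y, q=0.8, a=0.9, eps=1e-6):
    """Two-phase training sketch. Phase 1: adjust each sigma_k by factors of a
    until q_k <= q < q_k(sigma_k / a). Phase 2: fixed-point iteration
    W <- C W + Z of the contraction (10), stopped by end condition (a)."""
    N = X.shape[0]
    w0 = y.mean()                                   # Eq. (11)
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)

    def qk(k, s):                                   # Eq. (12)
        e = np.exp(-d2[k] / s ** 2)
        return e.sum() - 1.0                        # drop the j == k term

    # Phase 1: start from the initial width suggested in [5]
    sigma = np.full(N, 1.0 / (np.sqrt(2.0) * (2.0 * N) ** (1.0 / X.shape[1])))
    for k in range(N):
        if qk(k, sigma[k]) > q:
            while qk(k, sigma[k]) > q:              # shrink (step 3)
                sigma[k] *= a
        else:
            while qk(k, sigma[k] / a) <= q:         # grow (step 4)
                sigma[k] /= a

    # Phase 2: fixed-point iteration of W = C W + Z
    Phi = np.exp(-d2 / sigma[None, :] ** 2)
    C = np.eye(N) - Phi                             # Eq. (8)
    Z = y - w0
    W = np.zeros(N)
    while True:
        W_new = C @ W + Z
        if q / (1.0 - q) * np.abs(W_new - W).sum() <= eps:  # condition (a)
            return W_new, sigma, w0
        W = W_new
```

Because the column sums of $|C|$ are the $q_k \le q < 1$, the iteration is a contraction in the norm (14) and the loop terminates with $\|W_1 - W^*\|^* \le \epsilon$.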
Fig. 1. Network training procedure.
These end conditions are suggested by the following convergence theorem.
3.3 Convergence property

The following theorem ensures the convergence of the algorithm and allows us to estimate its error.

Theorem 2. The algorithm always ends after a finite number of iterations, and the final error is bounded by

$\|W_1 - W^*\|^* \le \epsilon.$   (17)

Proof. First, from the conclusion of Theorem 1, it can be seen that the first phase of the algorithm always ends after a finite number of steps, and $q_k \le q$ for every k. On the other hand, the norm $\|C\|^*$ of the matrix C induced by the vector norm $\|\cdot\|^*$ in Eq. (14) is determined by the following equation (see [12, Theorem 2, Subsection 9.6, Chapter I]):

$\|C\|^* = \max_{k \le N} q_k.$   (18)

Therefore, phase 2 corresponds to the procedure of finding the fixed point of the contraction transformation $Cu + Z$ with contraction coefficient q, with respect to the initial approximations $u_0 = 0$ and $u_1 = Z$. It follows that if we perform t iterative steps in phase 2, then $W_1$ corresponds to the (t+1)th approximate solution $u_{t+1}$ of the fixed point $W^*$ of the contraction transformation. Using Theorem 1 in Subsection 12.2 of [12], the training error can be bounded by

$\|W_1 - W^*\|^* \le \frac{q^{t+1}}{1-q} \|u_1 - u_0\|^* = \frac{q^{t+1}}{1-q} \|Z\|^*.$   (19)

It is easy to verify that expression (16) is equivalent to $q^{t+1}/(1-q)\,\|Z\|^* \le \epsilon$; then the statement holds if end condition (b) is used. On the other hand, applying (19) at t = 0, with $W_0 = u_0$ and $u_1 = W_1$,

$\|W_1 - W^*\|^* \le \frac{q}{1-q} \|W_1 - W_0\|^*.$   (20)

Combining (15) and (20) gives (17); then the statement holds if end condition (a) is used. The theorem is proved. □
3.4 Complexity of the algorithm

In this section, the complexity of each phase of the algorithm is analyzed.

Phase 1: Besides n and N, the complexity of the first phase depends on the distribution of the interpolation nodes $\{x_k\}_{k=1}^{N}$ and does not depend on the function f. Depending on the initial choice of $\sigma_0$, either $q_k > q$ (corresponding to step 3 of Fig. 2) or $q_k \le q$ (corresponding to step 4 of Fig. 2). In the former case, for every $k \le N$, let $m_k$ be the number of iterations in step 3 such that $q_k > q$ with $\sigma_k = a^{m_k - 1}\sigma_0$ but $q_k \le q$ with $\sigma_k = a^{m_k}\sigma_0$. Therefore,

$m_k \le \log_a \frac{\sigma_{\min}}{\sigma_0}, \quad \text{where } \sigma_{\min} = \min\{\sigma_k\} \ (\le \sigma_0).$   (21)

In the same manner, if $m_k$ is the number of iterations in step 4, then

$m_k \le \log_a \frac{\sigma_0}{\sigma_{\max}}, \quad \text{where } \sigma_{\max} = \max\{\sigma_k\} \ (\ge \sigma_0).$   (22)

Let

$c = \max\left\{\log_a \frac{\sigma_{\min}}{\sigma_0},\ \log_a \frac{\sigma_0}{\sigma_{\max}}\right\};$   (23)

then the complexity of phase 1 is $O(cnN^2)$ (Fig. 3).

Phase 2: The number T of iterations in phase 2 depends on the norm $\|Z\|^*$ of the vector Z and on the value q. It follows from (16) and the proof of Theorem 2 that T can be estimated by

$T = \left\lceil \frac{\ln(\epsilon(1-q)/\|Z\|^*)}{\ln q} \right\rceil.$   (24)

Therefore, the complexity of phase 2 is $O(TnN^2)$. Hence, the total complexity of this new algorithm is $O((T+c)nN^2)$.
4 Simulation study

Simulations for a 3-input RBF network are performed in order to test the training time and the generality of the algorithm. Its efficiency is also compared to that of the gradient algorithm. The network generality is tested by the following procedure: first, some points that do not belong to the set of interpolation nodes are chosen; then, after the network has been trained, the network outputs are compared to the true values of the function at these points in order to estimate the error.

Because all norms in a finite-dimensional space are equivalent (see [12, theorem in Section 9.2]), instead of the norm $\|\cdot\|^*$ determined by (14), the norm $\|u\|_\infty = \max_{j \le N} |u_j|$ is used for the end condition (15). Since $\|u\|^* \le N\|u\|_\infty$, this change does not affect the convergence property of the algorithm.
The data in the following example are obtained by approximately scaling each dimension and then combining the coordinates, from which the data points are chosen. The simulations are run on a computer with the following configuration: Intel Pentium 4 processor, 3.0 GHz, 256 MB DDR RAM. The test results and comments for the function $y = x_1 x_2 + \sin(x_2 + x_3 + 1) + 4$ are presented below.
4.1 Test of training time

The training time, reflecting the convergence rate, is examined for several numbers of nodes and for different values of the parameters q, a and $\epsilon$.
4.1.1 Testing results

Simulations are done for the function $y = x_1 x_2 + \sin(x_2 + x_3 + 1) + 4$ with $x_1 \in [0,3]$, $x_2 \in [0,4]$, $x_3 \in [0,5]$. The stopping rule uses $\epsilon = 10^{-6}$. The parameters q and a are set in turn to 0.5 and 0.5; 0.8 and 0.5; 0.8 and 0.7; 0.8 and 0.8, respectively, with the number of nodes varying from 100 to 5000. The simulation results are presented in Table 1.

Table 2 shows results for $q = a = 0.7$ with 2500 nodes and different values of the stopping error $\epsilon$.

Comments: From these results, it is observed that:

(1) The training time of our algorithm is relatively short (only several minutes for the case of about 3000 nodes). It increases when q or a
Fig. 3. Specification of the second phase of the algorithm.
Fig. 2. Specification of the first phase of the algorithm.
increase. This means that the smaller q or a is, the shorter the training time; the training time is, however, more sensitive to a than to q.

(2) When the stopping error is reduced, the total training time changes very slightly. This means that the high accuracy required for a given application does not strongly affect the training time.
4.2 Test of generality when q or a is changed sequentially

To avoid unnecessarily long running times, in this part we limit the number of nodes to 400. These nodes, scattered in the domain {$x_1 \in [0,3]$, $x_2 \in [0,4]$, $x_3 \in [0,5]$}, are generated as described above, and the network is trained for different values of q and a, with the stopping error $\epsilon = 10^{-6}$. After the training is completed, the errors at 8 randomly chosen points that do not belong to the trained nodes are checked. Test results for the cases in which q or a is changed sequentially are presented in Tables 3 and 4.
4.2.1 Test with q = 0.8 and a changed sequentially

Testing results: Experimental results for $\epsilon = 10^{-6}$, q = 0.8 and a set in turn to 0.9, 0.8, 0.6, 0.4 are presented in Table 3.

Comment: From these results, it can be observed that when a increases, the checked errors decrease quickly. This implies that when a is small, the width parameters $\sigma_k$ are also small, which influences the generality of the network. In our experience, it is convenient to set $a \in [0.7, 0.9]$; the concrete choice depends on the balance between the demanded training time and the generality of the network.
4.2.2 Test with a = 0.9 and q changed sequentially

Testing results: The results for $\epsilon = 10^{-6}$, a = 0.9 and q set in turn to 0.9, 0.7, 0.5, 0.3 are presented in Table 4.

Comment: These results show that the generality of the network strongly increases when q increases, although the change of q only weakly influences the training time, as mentioned in Section 4.1.

4.3 Comparison with the gradient algorithm
We have performed simulations for the function $y = x_1 x_2 + \sin(x_2 + x_3 + 1) + 4$ with 100 interpolation nodes and $x_1 \in [0,3]$, $x_2 \in [0,4]$, $x_3 \in [0,5]$. For the gradient algorithm family, it is very difficult to reach a high training accuracy, and it is also difficult to control the generality of the networks. Besides the training time, the accuracy at trained nodes and the error at untrained nodes (generality) obtained by the gradient method and by our algorithm are now compared. The program for the gradient algorithm is implemented using Matlab 6.5.
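The kind of baseline used for this comparison can be sketched as plain gradient descent on the output-layer sum-of-squares error (our Python stand-in for the Matlab baseline, not the authors' exact code; the learning rate and iteration count are illustrative):

```python
import numpy as np

def train_weights_gradient(Phi, z, lr=1e-3, n_iter=10000):
    """Minimize E(w) = ||Phi w - z||^2 by gradient descent; only a stand-in
    sketch of the gradient-family baseline compared against in this section."""
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        w -= lr * 2.0 * Phi.T @ (Phi @ w - z)   # gradient of E with respect to w
    return w
```

In contrast to the fixed-point iteration of Section 3, the accuracy reached here depends on the learning rate and on how long the loop is run, which is the behavior the comparison below exhibits.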
4.3.1 Test of accuracy at trained nodes

We randomly choose 8 nodes among the 100 interpolation nodes. After training the network by our algorithm with $\epsilon = 10^{-6}$, q = 0.8, a = 0.9 (training time: 1 s) and by the gradient algorithm in two cases, with 100 iterations (training time: 1 s) and with 10,000
Table 1
Training time (s) for the stopping rule defined by $\epsilon = 10^{-6}$, for numbers of nodes from 100 to 5000 and (q, a) set to (0.5, 0.5), (0.8, 0.5), (0.8, 0.7) and (0.8, 0.8)

Table 2
Training time (s) for the case of 2500 nodes, $q = a = 0.7$ and varying stopping error $\epsilon$
iterations (training time: 180 s), we check the errors at the chosen nodes to compare the accuracy of the algorithms.
Testing results: The experimental results are presented in Table 5.

Comment: It can be observed that our algorithm is much better than the gradient algorithm in both training time and accuracy. This fact seems natural, because the gradient algorithm uses an optimization procedure, and it is known to be difficult to obtain high accuracy with any optimization procedure.

4.3.2 Comparison of generality

We randomly choose 8 untrained nodes. After training the network by the two algorithms with the same parameters as in Section 4.3.1, we check the errors at the chosen nodes to compare the generality of the algorithms.

Testing results: The experimental results are presented in Table 6.

Comments: From these results, it is very important to observe that in MLP networks, it is well known that when the training error is small, the overfitting phenomenon may happen [13]. But in RBF networks, the RBFs only have local influence, so that when the data are not noisy, overfitting is not a serious problem. In fact, the simulation results show that this new algorithm offers a very short training time with a very small test error, compared to the gradient algorithm.
5 Discussion and conclusion

This paper proposes a simple two-phase algorithm to train interpolation RBF networks. The first phase is to iteratively determine the widths of
Table 3
Checking error at nodes with q = 0.8, $\epsilon = 10^{-6}$ and a set in turn to 0.9, 0.8, 0.6. Each row gives the coordinates of the checked point, the original function value, and then the interpolation value and error (×10⁻⁴) for a = 0.9 (training time 5 s), a = 0.8 (4 s) and a = 0.6 (4 s).

(2.68412, 2.94652, 3.329423): 26.065739 | 26.0679, 21.6 | 26.06879, 30.502 | 26.0691, 33.802
(2.21042, 1.052145, 0.040721): 10.007523 | 10.0024, 51.24 | 10.0144, 68.763 | 10.0146, 71.163
(2.842314, 2.525423, 0.048435): 23.983329 | 24.01001, 266.81 | 24.0201, 367.706 | 24.0251, 417.90
(2.842315, 3.789123, 3.283235): 35.587645 | 35.5818, 58.45 | 35.5799, 77.452 | 35.5963, 86.548
(2.05235, 3.78235, 1.63321): 20.063778 | 20.05203, 117.48 | 20.0803, 165.219 | 20.0812, 174.21
(2.84202, 3.789241, 3.283023): 35.582265 | 35.5986, 163.34 | 35.5621, 201.655 | 35.561, 212.65
(2.051234, 3.15775, 0.59763): 16.287349 | 16.28183, 55.16 | 16.294, 66.505 | 16.295, 78.505
(2.52621, 3.36832, 0.86412): 24.627938 | 24.67451, 465.72 | 24.58628, 416.584 | 24.5798, 481.38

Table 4
Checking error at nodes with a = 0.9, $\epsilon = 10^{-6}$ and q set in turn to 0.9, 0.7, 0.5. Each row gives the coordinates of the checked point, the original function value, and then the interpolation value and error (×10⁻⁴) for q = 0.9, q = 0.7 and q = 0.5 (all with a = 0.9).

(2.68412, 2.94652, 3.32942): 26.06573 | 26.0655, 2.22 | 26.0654, 3.12 | 26.0693, 35.46
(2.21042, 1.052145, 0.04072): 10.00752 | 10.0217, 141.79 | 10.0196, 120.33 | 10.0224, 149.06
(2.842314, 2.525423, 0.04843): 23.98332 | 24.0112, 279.17 | 24.0204, 370.87 | 24.0221, 387.53
(2.842315, 3.789123, 3.28323): 35.58764 | 35.5818, 58.03 | 35.5819, 57.27 | 35.5818, 58.08
(2.05235, 3.78235, 1.63321): 20.06377 | 20.1105, 467.62 | 20.1159, 520.95 | 20.1135, 497.7
(2.84202, 3.7892411, 3.28302): 35.58226 | 35.5881, 58.26 | 35.5884, 61.45 | 35.5886, 63.11
(2.051234, 3.15775, 0.59763): 16.28734 | 16.2853, 20.73 | 16.2852, 21.13 | 16.2775, 98.93
(2.52621, 3.36832, 0.86412): 24.62793 | 24.6117, 162.8 | 24.6133, 146.16 | 24.6108, 171.74
the Gaussian RBF associated with each node, and each RBF is trained separately from the others. The second phase iteratively computes the output layer weights by using a given contraction mapping. It is shown in this paper that the algorithm always converges, and that the running time depends only on the initial values of q, a and $\epsilon$, on the distribution of the interpolation nodes, and on the vector norm of the interpolated function computed at these nodes.

Owing to the numerical advantages of contraction transformations, it is easy to obtain very small training errors, and it is also easy to control the balance between the convergence rate and the generality of the network by setting appropriate parameter values. One of the most important features of this algorithm is that the output layer weights can be trained independently, so that the whole algorithm can be parallelized. Furthermore, for a large network, the stopping rule based on the norm of N-dimensional vectors can be replaced by the much simpler one defined in Eq. (16) to avoid lengthy computations.
When the number of nodes is very large, a clustering approach can be used to regroup the data into several sets of smaller size. By doing so, the training can be done in parallel for each cluster, which helps to reduce the training time. The obtained networks are called local RBF networks. This approach might be considered as equivalent to the spline method, and it will be presented in a forthcoming paper.
In the case of a very large number N of nodes, and from the point of view of a neural network as an associative memory, another approach can be exploited. In fact, an approximate RBF network can be designed with a number of hidden nodes
Table 5
Checking error at 8 trained nodes to compare accuracy. Each row gives the coordinates of the checked node, the original function value, and then the interpolation value and error (×10⁻⁴) for the gradient algorithm with 100 iterations (training time 1 s), the gradient algorithm with 10,000 iterations (180 s), and the new algorithm with $\epsilon = 10^{-6}$, q = 0.8, a = 0.9 (1 s).

(1.666667, 0.000000, 0.000000): 4.841471 | 4.4645, 3769.7 | 5.0959, 2544.2 | 4.84146, 0.1
(0.333333, 0.444444, 1.379573): 4.361647 | 3.5933, 7683.4 | 3.6708, 6908.4 | 4.36166, 0.09
(2.666667, 0.444444, 1.536421): 7.320530 | 8.7058, 13852.7 | 7.2647, 558.2 | 7.32052, 0.08
(0.666667, 1.333333, 0.128552): 5.221158 | 4.0646, 11565.5 | 4.9517, 2694.5 | 5.22117, 0.1
(2.666667, 1.333333, 1.589585): 12.77726 | 12.5041, 2731.6 | 12.1965, 5807.6 | 12.7772, 0.7
(1.666667, 1.777778, 0.088890): 9.209746 | 6.6682, 25415.4 | 9.2944, 846.5 | 9.20972, 0.2
(2.333333, 0.444444, 0.039225): 7.415960 | 6.7228, 6931.5 | 7.48, 640.4 | 7.41596, 0.005
(2.666667, 3.555556, 0.852303): 28.51619 | 28.0927, 4234.9 | 29.2798, 7636.0 | 28.5162, 0.09

Table 6
Checking error at 8 untrained nodes to compare generality. Columns as in Table 5.

(0.32163, 0.45123, 1.38123): 4.350910 | 2.1394, 22115.1 | 3.9309, 4200.1 | 4.32214, 287.7
(0.67123, 0.8912, 1.4512): 4.202069 | 2.8529, 13491.6 | 4.7884, 5863.3 | 4.20115, 9.1
(1.68125, 1.34121, 0.27423): 8.293276 | 6.1078, 21854.7 | 8.3869, 936.2 | 8.30878, 155.0
(0.34312, 1.78123, 2.56984): 3.406823 | 3.2115, 1953.2 | 4.1438, 7369.7 | 3.399, 78.2
(2.65989, 3.56012, 0.8498): 28.42147 | 27.5174, 9040.7 | 29.1648, 7433.2 | 28.429, 75.2
(1.67013, 2.23123, 0.29423): 9.84913 | 8.6415, 12076.3 | 9.5863, 2628.3 | 9.79204, 570.9
(2.65914, 3.56123, 0.85612): 28.41991 | 27.5147, 9052.1 | 29.1634, 7434.8 | 28.419, 9.1
(1.3163, 0.44925, 1.12987): 5.311670 | 3.5188, 17928.7 | 5.3729, 612.2 | 5.28737, 243.0
much smaller than N, based on the following scheme. First, the data set is partitioned into K clusters $\{C_i\}_{i=1}^{K}$ by using any clustering algorithm (for example, the k-means method). Then the center $\nu_i$ of the RBF associated with the ith hidden neuron can be chosen to be the mean vector $d_i$ of $C_i$, or the vector in $C_i$ nearest to $d_i$. The network is trained by the algorithm with the set of new interpolation nodes $\{\nu_i\}_{i=1}^{K}$. Other choices based on any variation of this approach can be made, depending on the context of the desired applications.
An advantage of RBF networks is their local influence property (see [13,14]), so that width parameters are generally chosen small (see Section 6.1, pp. 288–289 of [15]); in particular, [5, Section 7.7, p. 262] suggests choosing $\sigma$ = 0.05 or 0.1, and Section 3.11, p. 99 of [5] suggests using $\sigma = 1/(2N)^{1/n}$, which is very small. Therefore, $q_k$ must be small. In fact, the condition $q_k < 1$ presents some limitations, but does not affect the algorithm performance.
Our iterative algorithm is based on the principle of contraction mapping; to ensure the contraction property, the choice of q via (12) is fundamental to this algorithm, so it is rather empirical and does not correspond to any optimality consideration. In RBF networks, determining the optimum width parameters (or q) is still an open problem.
References

[1] M.J.D. Powell, Radial basis function approximations to polynomials, in: Proceedings of Numerical Analysis 1987, Dundee, UK, 1988, pp. 223–241.
[2] D.S. Broomhead, D. Lowe, Multivariable functional interpolation and adaptive networks, Complex Syst. 2 (1988) 321–355.
[3] E. Blanzieri, Theoretical interpretations and applications of radial basis function networks, Technical Report DIT-03-023, Informatica e Telecomunicazioni, University of Trento, 2003.
[4] S. Haykin, Neural Networks: A Comprehensive Foundation, second ed., Prentice-Hall Inc., Englewood Cliffs, NJ, 1999.
[5] C.G. Looney, Pattern Recognition Using Neural Networks: Theory and Algorithms for Engineers and Scientists, Oxford University Press, New York, 1997.
[6] F. Schwenker, H.A. Kestler, G. Palm, Three learning phases for radial-basis-function networks, Neural Networks 14 (4–5) (2001) 439–458.
[7] E.J. Hartman, J.D. Keeler, J.M. Kowalski, Layered neural networks with Gaussian hidden units as universal approximations, Neural Comput. 2 (2) (1990) 210–215.
[8] J. Park, I.W. Sandberg, Approximation and radial-basis-function networks, Neural Comput. 5 (3) (1993) 305–316.
[9] T. Poggio, F. Girosi, Networks for approximating and learning, Proc. IEEE 78 (9) (1990) 1481–1497.
[10] M. Bianchini, P. Frasconi, M. Gori, Learning without local minima in radial basis function networks, IEEE Trans. Neural Networks 6 (3) (1995) 749–756.
[11] C. Micchelli, Interpolation of scattered data: distance matrices and conditionally positive definite functions, Constr. Approx. 2 (1986) 11–22.
[12] L. Collatz, Functional Analysis and Numerical Mathematics, Academic Press, New York, 1966.
[13] T.M. Mitchell, Machine Learning, McGraw-Hill, New York, 1997.
[14] H.X. Huan, D.T.T. Hien, An iterative algorithm for training interpolation RBF networks, in: Proceedings of the Vietnamese National Workshop on Some Selected Topics of Information Technology, Haiphong, Vietnam, 2005, pp. 314–323.
[15] M.H. Hassoun, Fundamentals of Artificial Neural Networks, MIT Press, Cambridge, MA, 1995.