Copyright © 2001 John Wiley & Sons Ltd
ISBNs: 0-471-49517-4 (Hardback); 0-470-84535-X (Electronic)
6 Neural Networks as Nonlinear Adaptive Filters
6.1 Perspective
Neural networks, in particular recurrent neural networks, are cast into the framework of nonlinear adaptive filters. In this context, the relation between recurrent neural networks and polynomial filters is first established. Learning strategies and algorithms are then developed for neural adaptive system identifiers and predictors. Finally, issues concerning the choice of a neural architecture with respect to the bias and variance of the prediction performance are discussed.
6.2 Introduction
Representation of nonlinear systems in terms of NARMA/NARMAX models has been discussed at length in the work of Billings and others (Billings 1980; Chen and Billings 1989; Connor 1994; Nerrand et al. 1994). Some cognitive aspects of neural nonlinear filters are provided in Maass and Sontag (2000). Pearson (1995), in his article on nonlinear input–output modelling, shows that block oriented nonlinear models are a subset of the class of Volterra models. So, for instance, the Hammerstein model, which consists of a static nonlinearity f(·) applied at the output of a linear dynamical system described by its z-domain transfer function H(z), can be represented¹ by the Volterra series.
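To make the Volterra representation concrete, the following is a minimal sketch (in Python) of a second-order truncated Volterra filter; the quadratic truncation, the memory length and the kernel values in the usage example are assumptions made purely for illustration, not quantities taken from the text.

```python
import numpy as np

def volterra2(x, h0, h1, h2):
    """Second-order truncated Volterra filter:
    y(k) = h0 + sum_i h1[i] x(k-i) + sum_{i,j} h2[i,j] x(k-i) x(k-j),
    with i, j = 0, ..., M-1 where M = len(h1); kernels are hypothetical."""
    M = len(h1)
    y = np.zeros(len(x))
    for k in range(len(x)):
        # delay vector [x(k), x(k-1), ..., x(k-M+1)], zero-padded at the start
        u = np.array([x[k - i] if k - i >= 0 else 0.0 for i in range(M)])
        y[k] = h0 + h1 @ u + u @ h2 @ u
    return y

# Usage with arbitrary (hypothetical) kernels and a random input
rng = np.random.default_rng(0)
x = rng.standard_normal(100)
h1 = np.array([0.5, 0.2, 0.1])
h2 = 0.05 * np.outer(h1, h1)          # hypothetical second-order kernel
y = volterra2(x, h0=0.0, h1=h1, h2=h2)
```

The full Volterra series continues this pattern with third- and higher-order kernels, which is precisely what makes it grow so quickly with memory length.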
In the previous chapter, we have shown that neural networks, be they feedforward or recurrent, cannot generate time delays of an order higher than the dimension of the input to the network. Another important feature is the capability to generate subharmonics in the spectrum of the output of a nonlinear neural filter (Pearson 1995). The key property for generating subharmonics in nonlinear systems is recursion; hence, recurrent neural networks are necessary for their generation.
1 Under the condition that the function f is analytic, and that the Volterra series can be thought of as a generalised Taylor series expansion, the coefficients of the model (6.2) that do not vanish are those for which all the indices coincide, i.e. h_{i,j,...,z} ≠ 0 ⇔ i = j = · · · = z.
Notice that, as pointed out in Pearson (1995), block-stochastic models are, generally speaking, not suitable for this application.
In Hakim et al. (1991), by using the Weierstrass polynomial expansion theorem, the relation between neural networks and Volterra series is established, which is then extended to a more general case and to continuous functions that cannot be expanded via a Taylor series expansion.² Both feedforward and recurrent networks are characterised by means of a Volterra series and vice versa.
Neural networks are often referred to as 'adaptive neural networks'. As already shown, adaptive filters and neural networks are formally equivalent, and neural networks, employed as nonlinear adaptive filters, are generalisations of linear adaptive filters. However, in neural network applications, they have mostly been used in such a way that the network is first trained on a particular training set and subsequently used. This approach is not an online adaptive approach, in contrast with linear adaptive filters, which undergo continual adaptation.
Two groups of learning techniques are used for training recurrent neural networks: a direct gradient computation technique (used in nonlinear adaptive filtering) and a recurrent backpropagation technique (commonly used in neural networks for offline applications). The real-time recurrent learning (RTRL) algorithm (Williams and Zipser 1989a) is a technique which uses direct gradient computation, and is used if the network coefficients change slowly with time. This technique is essentially an LMS learning algorithm for a nonlinear IIR filter. It should be noticed that, with the same computation time, it might be possible to unfold the recurrent neural network into its corresponding feedforward counterpart and hence to train it by backpropagation. The backpropagation through time (BPTT) algorithm is such a technique (Werbos 1990).
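As an illustration of direct gradient computation, the sketch below implements an RTRL-style online update for a single recurrent perceptron with one feedback tap; the tanh nonlinearity, the learning rate and the one-step-ahead prediction task in the usage example are assumptions for this sketch rather than choices prescribed in the text.

```python
import numpy as np

def rtrl_recurrent_perceptron(x, d, eta=0.05):
    """One-tap recurrent perceptron y(k) = tanh(w0*x(k) + w1*y(k-1) + w2),
    trained by real-time (direct) gradient descent on e(k) = d(k) - y(k).

    For this single-neuron case the RTRL sensitivity recursion is
        pi(k) = Phi'(v(k)) * (u(k) + w1 * pi(k-1)),
    where u(k) = [x(k), y(k-1), 1] and pi(k) = dy(k)/dw."""
    w = np.zeros(3)            # [input weight, feedback weight, bias]
    pi = np.zeros(3)           # running sensitivities dy/dw
    y_prev = 0.0
    y = np.zeros(len(x))
    for k in range(len(x)):
        u = np.array([x[k], y_prev, 1.0])
        v = w @ u
        y[k] = np.tanh(v)
        e = d[k] - y[k]
        dphi = 1.0 - y[k] ** 2             # derivative of tanh at v(k)
        pi = dphi * (u + w[1] * pi)        # RTRL sensitivity recursion
        w = w + eta * e * pi               # gradient step on 0.5*e^2
        y_prev = y[k]
    return w, y

# Usage: one-step-ahead prediction of a hypothetical noisy sine wave
rng = np.random.default_rng(0)
x = np.sin(0.1 * np.arange(400)) + 0.05 * rng.standard_normal(400)
w, y_hat = rtrl_recurrent_perceptron(x[:-1], x[1:])
```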
Some of the benefits of using neural networks as nonlinear adaptive filters are that no assumptions concerning the Markov property, Gaussian distributions or additive measurement noise are necessary (Lo 1994). A neural filter would be a suitable choice even if mathematical models of the input process and measurement noise are not known (black box modelling).
6.3 Overview
We start with the relationship between Volterra and bilinear filters and neural networks. Recurrent neural networks are then considered as nonlinear adaptive filters and neural architectures for this case are analysed. Learning algorithms for online training of recurrent neural networks are developed inductively, starting from corresponding algorithms for linear adaptive IIR filters. Some issues concerning the problem of vanishing gradient and the bias/variance dilemma are finally addressed.
6.4 Neural Networks and Polynomial Filters
It has been shown in Chapter 5 that a small-scale neural network can represent high-order nonlinear systems, whereas a large number of terms are required for an equivalent Volterra series representation. For instance, as already shown, after performing a Taylor series expansion for the output of the neural network depicted in Figure 5.3, with input signals u(k − 1) and u(k − 2), we obtain

y(k) = c_0 + c_1 u(k − 1) + c_2 u(k − 2) + c_3 u²(k − 1) + c_4 u²(k − 2) + c_5 u(k − 1)u(k − 2) + · · · .

The number of terms in the Volterra series and the complexity of the kernels h(·) increase exponentially with the order of the delay in system (6.2). This problem restricts practical applications of Volterra series to small-scale systems.

2 For instance nonsmooth functions, such as |x|.
Nonlinear system identification, on the other hand, has been traditionally based upon the Kolmogorov approximation theorem (neural network existence theorem), which states that a neural network with a hidden layer can approximate an arbitrary nonlinear system. Kolmogorov's theorem, however, is not that relevant in the context of networks for learning (Girosi and Poggio 1989b). The problem is that the inner functions in Kolmogorov's formula (4.1), although continuous, have to be highly nonsmooth. Following the analysis from Chapter 5, it is straightforward that multilayered and recurrent neural networks have the ability to approximate an arbitrary nonlinear system, whereas Volterra series fail even for simple saturation elements.
Another convenient form of nonlinear system is the bilinear (truncated Volterra) system described by

y(k) = Σ_{j=1}^{N−1} c_j y(k − j) + Σ_{i=0}^{N−1} Σ_{j=1}^{N−1} b_{i,j} y(k − j)x(k − i) + Σ_{i=0}^{N−1} a_i x(k − i).   (6.3)
Despite its simplicity, this is a powerful nonlinear model, and a large class of nonlinear systems (including Volterra systems) can be approximated arbitrarily well using this model. Its functional dependence (6.3) shows that it belongs to a class of general recursive nonlinear models. A recurrent neural network that realises a simple bilinear model is depicted in Figure 6.1. As seen from Figure 6.1, multiplicative input nodes (denoted by '×') have to be introduced to represent the bilinear model. Bias terms are omitted and the chosen neuron is linear.
Example 6.4.1. Show that the recurrent network shown in Figure 6.1 realises a bilinear model. Also show that this network can be described in terms of NARMAX models.
Figure 6.1 Recurrent neural network representation of the bilinear model
Solution. The functional description of the recurrent network depicted in Figure 6.1 is given by

y(k) = c_1 y(k − 1) + b_{0,1} x(k)y(k − 1) + b_{1,1} x(k − 1)y(k − 1) + a_0 x(k) + a_1 x(k − 1),   (6.4)

which belongs to the class of bilinear models (6.3). The functional description of the network from Figure 6.1 can also be expressed as

y(k) = F(y(k − 1), x(k), x(k − 1)),   (6.5)

which is a NARMA representation of model (6.4).
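A minimal sketch of the recurrent network of Figure 6.1, computing one output sample per time step according to (6.4); the coefficient values used below are arbitrary and serve only to make the example runnable.

```python
def bilinear_step(x_k, x_km1, y_km1, c1, b01, b11, a0, a1):
    """One step of the bilinear recurrent model (6.4):
    y(k) = c1*y(k-1) + b01*x(k)*y(k-1) + b11*x(k-1)*y(k-1) + a0*x(k) + a1*x(k-1)."""
    return c1 * y_km1 + b01 * x_k * y_km1 + b11 * x_km1 * y_km1 + a0 * x_k + a1 * x_km1

def bilinear_filter(x, c1=0.3, b01=0.1, b11=0.05, a0=1.0, a1=0.5):
    """Run the model over a sequence with zero initial conditions;
    the coefficient values are arbitrary illustration choices."""
    y_prev, x_prev = 0.0, 0.0
    y = []
    for x_k in x:
        y_k = bilinear_step(x_k, x_prev, y_prev, c1, b01, b11, a0, a1)
        y.append(y_k)
        x_prev, y_prev = x_k, y_k
    return y
```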
Example 6.4.1 confirms the duality between Volterra, bilinear, NARMA/NARMAX and recurrent neural models. To further establish the connection between Volterra series and a neural network, let us express the activation potential of the nodes of the network as

net_i(k) = Σ_j w_{i,j} x(k − j),   (6.6)

where net_i(k) is the activation potential of the ith hidden neuron, w_{i,j} are weights and x(k − j) are inputs to the network. If the nonlinear activation functions of the neurons are expressed via an Lth-order polynomial expansion³ as

Φ(net_i(k)) ≈ Σ_{l=0}^{L} ξ_{il} net_i^l(k),   (6.7)
3 Using the Weierstrass theorem, this expansion can be arbitrarily accurate. However, in practice we resort to a moderate order of this polynomial expansion.
then the neural model described in (6.6) and (6.7) can be related to the Volterra model (6.2). The actual relationship is rather complicated, and the Volterra kernels are expressed as sums of products of the weights from the input to the hidden units, the weights associated with the output neuron, and the coefficients ξ_{il} from (6.7). Chon et al. (1998) have used this kind of relationship to compare the Volterra and neural approaches when applied to the processing of biomedical signals.
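The sketch below makes this relationship explicit for a single-hidden-layer network with linear output, y(k) = Σ_i v_i Φ(net_i(k)), when Φ is replaced by a quadratic expansion Φ(z) ≈ ξ_0 + ξ_1 z + ξ_2 z²: collecting terms gives the zeroth-, first- and second-order Volterra kernels as sums of products of the network weights and the ξ coefficients. The tanh activation, the least-squares polynomial fit and the weight values are assumptions for illustration.

```python
import numpy as np

def volterra_kernels_from_network(W, v, xi):
    """Approximate Volterra kernels of a one-hidden-layer network
    y(k) = sum_i v[i] * Phi(net_i(k)),  net_i(k) = sum_j W[i, j] * x(k - j),
    when Phi is replaced by Phi(z) ~ xi0 + xi1*z + xi2*z**2.

    Collecting terms gives
        h0         = xi0 * sum_i v[i]
        h1[j]      = xi1 * sum_i v[i] * W[i, j]
        h2[j1, j2] = xi2 * sum_i v[i] * W[i, j1] * W[i, j2]."""
    xi0, xi1, xi2 = xi
    h0 = xi0 * np.sum(v)
    h1 = xi1 * (v @ W)
    h2 = xi2 * (W.T * v) @ W       # equals xi2 * W^T diag(v) W
    return h0, h1, h2

# Hypothetical network weights and a least-squares quadratic fit of tanh on [-1, 1]
rng = np.random.default_rng(1)
W = 0.5 * rng.standard_normal((4, 3))     # 4 hidden neurons, memory length 3
v = 0.5 * rng.standard_normal(4)
z = np.linspace(-1.0, 1.0, 201)
xi = np.polyfit(z, np.tanh(z), 2)[::-1]   # coefficients [xi0, xi1, xi2]
h0, h1, h2 = volterra_kernels_from_network(W, v, xi)
```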
Hence, to avoid the difficulty of excessive computation associated with Volterra series, an input–output relationship of a nonlinear predictor that computes the output in terms of past inputs and outputs may be introduced as⁴

ŷ(k) = F(y(k − 1), . . . , y(k − N), u(k − 1), . . . , u(k − M)),   (6.8)
where F(·) is some nonlinear function. The function F may change for different input variables or for different regions of interest. A NARMAX model may therefore be a correct representation only in a region around some operating point. Leontaritis and Billings (1985) rigorously proved that a discrete time nonlinear time invariant system can always be represented by model (6.8) in the vicinity of an equilibrium point provided that
• the response function of the system is finitely realisable, and
• it is possible to linearise the system around the chosen equilibrium point.
As already shown, some of the other frequently used models, such as the bilinear polynomial filter, given by (6.3), are obvious special cases of a simple NARMAX model.
6.5 Neural Networks and Nonlinear Adaptive Filters
To perform nonlinear adaptive filtering, tracking and system identification of nonlinear time-varying systems, there is a need to introduce dynamics in neural networks. These dynamics can be introduced via recurrent neural networks, which are the focus of this book.

The design of linear filters is conveniently specified by a frequency response which we would like to match. In the nonlinear case, however, since a transfer function of a nonlinear filter is not available in the frequency domain, one has to resort to different techniques. For instance, the design of nonlinear filters may be thought of as a nonlinear constrained optimisation problem in Fock space (deFigueiredo 1997).
In a recurrent neural network architecture, the feedback brings the delayed outputs from hidden and output neurons back into the network input vector u(k), as shown in Figure 5.13. Since gradient learning algorithms are sequential, these delayed outputs of neurons represent filtered data from the previous discrete time instant. Due to this 'memory', at each time instant the network is presented with the raw, possibly noisy, external input data s(k), s(k − 1), . . . , s(k − M) from Figure 5.13 and Equation (5.31), and with filtered data y_1(k − 1), . . . , y_N(k − 1) from the network output.
4 As already shown, this model is referred to as the NARMAX model (nonlinear ARMAX), since it resembles the linear ARMAX model.
Figure 6.2 NARMA recurrent perceptron
Intuitively, this filtered input history helps to improve the processing performance of recurrent neural networks, as compared with feedforward networks. Notice that the history of past outputs is never presented to the learning algorithm for feedforward networks. Therefore, a recurrent neural network should be able to process signals corrupted by additive noise even in the case when the noise distribution is varying over time.
On the other hand, a nonlinear dynamical system can be described by

x(k + 1) = f(x(k)),   (6.9)

with an observation process

y(k) = h(x(k)) + ε(k),   (6.10)

where ε(k) is observation noise (Haykin and Principe 1998). Takens' embedding theorem (Takens 1981) states that the geometric structure of system (6.9) can be recovered from the sequence {y(k)} in a D-dimensional space spanned by⁵

[y(k), y(k − 1), . . . , y(k − (D − 1))],   (6.11)

provided that D ≥ 2d + 1, where d is the dimension of the state space of system (6.9).

Figure 6.3 Nonlinear IIR filter structures: (b) a recurrent linear/nonlinear neural filter structure
Therefore, one advantage of NARMA models over FIR models is the parsimony of NARMA models, since an upper bound on the order of a NARMA model is twice the order of the state (phase) space of the system being analysed.
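A short sketch of the delay-embedding construction behind (6.11): given an observed scalar sequence {y(k)}, it forms the D-dimensional delay vectors from which, by Takens' theorem, the geometry of the underlying state space can be recovered when D ≥ 2d + 1. The unit delay between coordinates and the noisy sine wave used as data are assumptions for the example.

```python
import numpy as np

def delay_embed(y, D):
    """Return the matrix whose rows are the delay vectors
    [y(k), y(k-1), ..., y(k-(D-1))] for k = D-1, ..., len(y)-1."""
    y = np.asarray(y)
    return np.column_stack([y[D - 1 - i:len(y) - i] for i in range(D)])

# Example: embed a hypothetical noisy sine wave in D = 3 dimensions
k = np.arange(500)
y = np.sin(0.05 * k) + 0.01 * np.random.default_rng(2).standard_normal(500)
Y = delay_embed(y, D=3)    # shape (498, 3), one delay vector per row
```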
The simplest recurrent neural network architecture is a recurrent perceptron, shown in Figure 6.2. This is a simple, yet effective architecture. The equations which describe the recurrent perceptron shown in Figure 6.2 are
y(k) = Φ(v(k)),   v(k) = u^T(k)w(k),   (6.12)

where u(k) = [x(k − 1), . . . , x(k − M), 1, y(k − 1), . . . , y(k − N)]^T is the input vector, w(k) is the corresponding (M + N + 1) × 1 weight vector, Φ(·) is a nonlinear activation function and (·)^T denotes the vector transpose operator.
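A direct transcription of (6.12) as a forward pass, in which the input vector is assembled exactly as defined above; the logistic activation and zero initial conditions are assumptions made for this sketch.

```python
import numpy as np

def recurrent_perceptron(x, w, M, N, phi=lambda v: 1.0 / (1.0 + np.exp(-v))):
    """NARMA recurrent perceptron (6.12): y(k) = Phi(u(k)^T w) with
    u(k) = [x(k-1), ..., x(k-M), 1, y(k-1), ..., y(k-N)]^T.
    Zero initial conditions; Phi defaults to the logistic function (an assumption)."""
    assert len(w) == M + 1 + N
    x_hist = np.zeros(M)          # [x(k-1), ..., x(k-M)]
    y_hist = np.zeros(N)          # [y(k-1), ..., y(k-N)]
    y = np.zeros(len(x))
    for k in range(len(x)):
        u = np.concatenate([x_hist, [1.0], y_hist])
        y[k] = phi(u @ w)
        x_hist = np.concatenate([[x[k]], x_hist[:-1]])   # shift input history
        y_hist = np.concatenate([[y[k]], y_hist[:-1]])   # feed back the output
    return y
```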
5 Model (6.11) is in fact a NAR/NARMAX model.
Figure 6.5 Fully connected feedforward neural filter
A recurrent perceptron is a recursive adaptive filter with an arbitrary output function, as shown in Figure 6.3. Figure 6.3(a) shows the recurrent perceptron structure as a nonlinear infinite impulse response (IIR) filter. Figure 6.3(b) depicts the parallel linear/nonlinear structure, which is one of the possible architectures. These structures stem directly from IIR filters and are described in McDonnell and Waagen (1994), Connor (1994) and Nerrand et al. (1994). Here, A(z), B(z), C(z) and D(z) denote z-domain linear transfer functions. The general structure of a fully connected, multilayer neural feedforward filter is shown in Figure 6.5 and represents a generalisation of a simple nonlinear feedforward perceptron with dynamic synapses, shown in Figure 6.4. This structure consists of an input layer, a layer of hidden neurons and an output layer. Although the output neuron shown in Figure 6.5 is linear, it could be nonlinear. In that case, attention should be paid that the dynamic ranges of the input signal and the output neuron match.
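A sketch of the feedforward neural filter of Figure 6.5 in its simplest single-hidden-layer form: a tapped delay line of the input feeds a layer of nonlinear neurons followed by a linear output neuron. The tanh activation, the weight shapes and the random weights in the usage example are assumptions for illustration; the defining feature is the absence of any feedback of past outputs.

```python
import numpy as np

def feedforward_neural_filter(x, W_hidden, b_hidden, w_out, b_out):
    """y(k) = w_out^T tanh(W_hidden u(k) + b_hidden) + b_out, where
    u(k) = [x(k), x(k-1), ..., x(k-M+1)] is a tapped delay line of length M."""
    M = W_hidden.shape[1]
    u = np.zeros(M)
    y = np.zeros(len(x))
    for k in range(len(x)):
        u = np.concatenate([[x[k]], u[:-1]])        # shift the input delay line
        hidden = np.tanh(W_hidden @ u + b_hidden)   # nonlinear hidden layer
        y[k] = w_out @ hidden + b_out               # linear output neuron
    return y

# Usage with arbitrary (untrained) weights: 5 hidden neurons, memory length 4
rng = np.random.default_rng(5)
y = feedforward_neural_filter(rng.standard_normal(100),
                              W_hidden=rng.standard_normal((5, 4)),
                              b_hidden=np.zeros(5),
                              w_out=rng.standard_normal(5),
                              b_out=0.0)
```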
Another generalisation of a fully connected recurrent neural filter is shown in Figure 6.6. This network consists of nonlinear neural filters, as depicted in Figure 6.5, applied to both the input and the output signal, the outputs of which are summed together. This is a fairly general structure which resembles the architecture of a linear IIR filter.

Narendra and Parthasarathy (1990) provide deep insight into structures of neural networks for identification of nonlinear dynamical systems. Due to the duality between system identification and prediction, the same architectures are suitable for prediction applications. From Figures 6.3–6.6, we can identify four general architectures of neural networks for prediction and system identification. These architectures come as combinations of linear/nonlinear parts from the architecture shown in Figure 6.6, and for the nonlinear prediction configuration are specified as follows.
(i) The output y(k) is a linear function of previous outputs and a nonlinear function of previous inputs, given by

y(k) = Σ_{j=1}^{N} a_j(k) y(k − j) + F(u(k − 1), u(k − 2), . . . , u(k − M)),   (6.13)

where F(·) is some nonlinear function. This architecture is shown in Figure 6.7(a).
(ii) The output y(k) is a nonlinear function of past outputs and a linear function of past inputs, given by

y(k) = F(y(k − 1), y(k − 2), . . . , y(k − N)) + Σ_{i=1}^{M} b_i(k) u(k − i).   (6.14)

This architecture is depicted in Figure 6.7(b).
(iii) The output y(k) is a nonlinear function of both past inputs and outputs. The functional relationship between the past inputs and outputs can be expressed in a separable manner as

y(k) = F(y(k − 1), . . . , y(k − N)) + G(u(k − 1), . . . , u(k − M)).   (6.15)

This architecture is depicted in Figure 6.7(c).

(iv) The output y(k) is a nonlinear function of past inputs and outputs, as

y(k) = F(y(k − 1), . . . , y(k − N), u(k − 1), . . . , u(k − M)).   (6.16)

This architecture is depicted in Figure 6.7(d) and is the most general.

Figure 6.7 Architectures of recurrent neural networks as nonlinear adaptive filters
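The four configurations can be summarised as four one-step predictors, as in the sketch below; F and G stand for arbitrary nonlinear maps supplied by the caller, and a and b are vectors of linear coefficients, so only the functional forms (6.13)–(6.16) are fixed, not any particular network.

```python
import numpy as np

# One-step predictors for the four architectures; y_past = [y(k-1), ..., y(k-N)],
# u_past = [u(k-1), ..., u(k-M)]; F and G are arbitrary nonlinear callables.

def arch_i(y_past, u_past, a, F):     # (6.13): linear AR part, nonlinear input part
    return np.dot(a, y_past) + F(u_past)

def arch_ii(y_past, u_past, b, F):    # (6.14): nonlinear AR part, linear input part
    return F(y_past) + np.dot(b, u_past)

def arch_iii(y_past, u_past, F, G):   # (6.15): separable nonlinear parts
    return F(y_past) + G(u_past)

def arch_iv(y_past, u_past, F):       # (6.16): fully coupled, the most general
    return F(np.concatenate([y_past, u_past]))
```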
Figure 6.8 NARMA type neural identifier
6.6 Training Algorithms for Recurrent Neural Networks
A natural error criterion, upon which the training of recurrent neural networks is based, is in the form of the accumulated squared prediction error over the whole dataset, given by

E = Σ_{m=1}^{M} λ(m) e²(m),   (6.17)

where e(m) is the prediction error at time instant m and λ(m) is a weighting factor. In the stationary case, all the error terms can be weighted equally, in particular λ(m) = 1, m = 1, 2, . . . , M, whereas in the nonstationary case, since the statistics change over time, it is unreasonable to take into account the whole previous history of the errors. For this case, a forgetting mechanism is usually employed, whereby 0 < λ(m) < 1. Since many real-world signals are nonstationary, online learning algorithms commonly use the squared instantaneous error as an error criterion, i.e.

E(k) = (1/2) e²(k).   (6.18)

Here, the coefficient 1/2 is included for convenience in the derivation of the algorithms.
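The two criteria can be written down directly; the sketch below uses the accumulated cost (6.17) and the instantaneous cost (6.18) as given above, with an exponential forgetting profile λ(m) = λ₀^{M−m} chosen purely as one example of a forgetting mechanism.

```python
import numpy as np

def accumulated_cost(e, lam):
    """Weighted accumulated squared prediction error, E = sum_m lam[m] * e[m]**2."""
    return np.sum(lam * e ** 2)

def instantaneous_cost(e_k):
    """Instantaneous error criterion E(k) = 0.5 * e(k)**2 used in online learning."""
    return 0.5 * e_k ** 2

# Example: exponential forgetting, most recent error weighted most heavily
e = np.random.default_rng(3).standard_normal(200)   # hypothetical prediction errors
M = len(e)
lam0 = 0.99
lam = lam0 ** np.arange(M - 1, -1, -1)               # lambda(m) = lam0**(M - m)
E_total = accumulated_cost(e, lam)
E_now = instantaneous_cost(e[-1])
```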
6.7 Learning Strategies for a Neural Predictor/Identifier
A NARMA/NARMAX type neural identifier is depicted in Figure 6.8. When considering a neural predictor, the only difference is the position of the neural module within the system structure, as shown in Chapter 2. There are two main training strategies to estimate the weights of the neural network shown in Figure 6.8. In the first approach, the links between the real system and the neural identifier are as depicted in Figure 6.9. During training, the configuration shown in Figure 6.9 can be described by

ŷ(k) = f(u(k), . . . , u(k − M), y(k − 1), . . . , y(k − N)),   (6.19)
which is referred to as the nonlinear series–parallel model (Alippi and Piuri 1996; Qin et al. 1992). In this configuration, the desired signal y(k) is presented to the network, which produces biased estimates (Narendra 1996).

Figure 6.9 The nonlinear series–parallel (teacher forcing) learning configuration

Figure 6.10 The nonlinear parallel (supervised) learning configuration
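A sketch contrasting the two learning configurations for a one-step NARMA identifier: in the series–parallel (teacher forcing) configuration of Figure 6.9 the regression vector is filled with delayed desired outputs, as in (6.19), whereas in the parallel configuration of Figure 6.10 the network's own past predictions are fed back. The generic model callable, the linear-in-parameters example system and the buffering of past samples are all assumptions made for illustration.

```python
import numpy as np

def identify(u, y, model, M, N, parallel=False):
    """One-step-ahead NARMA identification.

    series-parallel (parallel=False): y_hat(k) = model(u(k..k-M), y(k-1..k-N))
    parallel        (parallel=True) : y_hat(k) = model(u(k..k-M), y_hat(k-1..k-N))"""
    y_hat = np.zeros(len(y))
    for k in range(len(y)):
        u_reg = np.array([u[k - i] if k - i >= 0 else 0.0 for i in range(M + 1)])
        past = y_hat if parallel else y            # feed back estimates or desired outputs
        y_reg = np.array([past[k - j] if k - j >= 1 else 0.0 for j in range(1, N + 1)])
        y_hat[k] = model(u_reg, y_reg)
    return y_hat

# Hypothetical "identified" model: a simple linear-in-parameters map
model = lambda u_reg, y_reg: 0.5 * y_reg[0] + 0.3 * u_reg[0] + 0.1 * u_reg[1]

# Hypothetical true system generating the desired signal y
rng = np.random.default_rng(4)
u = rng.standard_normal(300)
y = np.zeros(300)
for k in range(300):
    y[k] = 0.5 * (y[k - 1] if k >= 1 else 0.0) + 0.3 * u[k] + 0.1 * (u[k - 1] if k >= 1 else 0.0)

y_sp = identify(u, y, model, M=1, N=1, parallel=False)   # series-parallel (teacher forcing)
y_p  = identify(u, y, model, M=1, N=1, parallel=True)    # parallel (network output fed back)
```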