Authored by Danilo P. Mandic, Jonathon A. Chambers
Copyright © 2001 John Wiley & Sons Ltd
ISBNs: 0-471-49517-4 (Hardback); 0-470-84535-X (Electronic)
5
Recurrent Neural Networks
Architectures
5.1 Perspective
In this chapter, the use of neural networks, in particular recurrent neural networks, in system identification, signal processing and forecasting is considered. The ability of neural networks to model nonlinear dynamical systems is demonstrated, and the correspondence between neural networks and block-stochastic models is established. Finally, further discussion of recurrent neural network architectures is provided.
5.2 Introduction
There are numerous situations in which the use of linear filters and models is limited. For instance, when trying to identify a saturation type nonlinearity, linear models will inevitably fail. This is also the case when separating signals with overlapping spectral components.
Most real-world signals are generated, to a certain extent, by a nonlinear mechanism and therefore in many applications the choice of a nonlinear model may be necessary to achieve an acceptable performance from an adaptive predictor. Communications channels, for instance, often need nonlinear equalisers to achieve acceptable performance. The choice of model is of crucial importance¹ and practical applications have shown that nonlinear models can offer a better prediction performance than their linear counterparts. They also reveal rich dynamical behaviour, such as limit cycles, bifurcations and fixed points, that cannot be captured by linear models (Gershenfeld and Weigend 1993).

By system we consider the actual underlying physics² that generate the data, whereas by model we consider a mathematical description of the system. Many variations of mathematical models can be postulated on the basis of datasets collected from observations of a system, and their suitability assessed by various performance metrics.

1 System identification, for instance, consists of the choice of the model, model parameter estimation and model validation.
2 Technically, the notions of system and process are equivalent (Pearson 1995; Sjöberg et al. 1995).

Figure 5.1 Effects of the y = tanh(v) nonlinearity in a neuron model upon two example inputs

Since it is not possible to characterise nonlinear systems by their impulse response, one has to resort to less general models, such as homomorphic filters, morphological filters and polynomial filters. Some of the most frequently used polynomial filters are based upon Volterra series (Mathews 1991), a nonlinear analogue of the linear impulse response, threshold autoregressive (TAR) models (Priestley 1991) and Hammerstein and Wiener models. The latter two represent structures that consist of a linear dynamical model and a static zero-memory nonlinearity. An overview of these models can be found in Haber and Unbehauen (1990). Notice that for nonlinear systems, the ordering of the modules within a modular structure³ plays an important role.
To illustrate some important features associated with nonlinear neurons, let us consider a squashing nonlinear activation function of a neuron, shown in Figure 5.1. For two identical mixed sinusoidal inputs with different offsets, passed through this nonlinearity, the output behaviour varies from amplifying and slightly distorting the input signal (solid line in Figure 5.1) to attenuating and considerably nonlinearly distorting the input signal (broken line in Figure 5.1). From the viewpoint of system theory, neural networks represent nonlinear maps, mapping one metric space to another.

3 To depict this, for two modules performing nonlinear functions H1 = sin(x) and H2 = e^x, we have H1(H2(x)) ≠ H2(H1(x)), since sin(e^x) ≠ e^sin(x). This is the reason to use the term nesting rather than cascading in modular neural networks.
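Both points above, the non-commutativity of nested nonlinear modules and the offset-dependent distortion introduced by a squashing nonlinearity, can be checked numerically. A minimal sketch follows; the signal amplitude and the offset value are illustrative choices, not taken from the text:

```python
import numpy as np

# Two modules: H1(x) = sin(x), H2(x) = exp(x); nesting order matters.
H1, H2 = np.sin, np.exp
x = 1.0
nested_a = H1(H2(x))   # sin(e^x)
nested_b = H2(H1(x))   # e^sin(x)
print(nested_a, nested_b)   # the two orderings give different values

# Effect of the tanh squashing nonlinearity on an offset input:
t = np.linspace(0, 2 * np.pi, 1000)
s = 0.5 * np.sin(t)
out_centred = np.tanh(s)        # near-linear region: mild distortion
out_offset = np.tanh(s + 2.0)   # saturated region: strong attenuation

# The peak-to-peak swing collapses once the operating point is
# pushed into saturation by the offset.
swing_centred = out_centred.max() - out_centred.min()
swing_offset = out_offset.max() - out_offset.min()
print(swing_centred, swing_offset)
```

The offset input emerges both attenuated and nonlinearly distorted, mirroring the broken-line behaviour described for Figure 5.1.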
Nonlinear system modelling has traditionally focused on Volterra–Wiener analysis. These models are nonparametric and computationally extremely demanding. The Volterra series expansion is given by

y(k) = h0 + Σi h1(i)x(k − i) + Σi Σj h2(i, j)x(k − i)x(k − j) + · · · (5.1)

for the representation of a causal system. A nonlinear system represented by a Volterra series is completely characterised by its Volterra kernels hi, i = 0, 1, 2, . . . . The Volterra modelling of a nonlinear system requires a great deal of computation, and mostly second- or third-order Volterra systems are used in practice.
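A truncated second-order Volterra filter of the kind used in practice can be sketched as follows; the kernel values and input are hypothetical, chosen only to make the structure concrete:

```python
import numpy as np

def volterra2(x, h0, h1, h2):
    """Truncated second-order Volterra filter.
    h1: first-order kernel, shape (M,)
    h2: second-order kernel, shape (M, M)
    """
    M = len(h1)
    y = np.full(len(x), h0, dtype=float)
    for k in range(len(x)):
        # delayed input vector [x(k), x(k-1), ..., x(k-M+1)], zero-padded
        xv = np.array([x[k - i] if k - i >= 0 else 0.0 for i in range(M)])
        y[k] += h1 @ xv + xv @ h2 @ xv
    return y

# hypothetical kernels for illustration
h0 = 0.1
h1 = np.array([0.5, 0.25])
h2 = np.array([[0.1, 0.05],
               [0.05, 0.0]])
x = np.array([1.0, 0.0, -1.0, 0.5])
y = volterra2(x, h0, h1, h2)
print(y)
```

Note how the kernel count grows quadratically with the memory length M, which is why orders above two or three quickly become computationally prohibitive.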
Since the Volterra series expansion is a Taylor series expansion with memory, they both fail when describing a system with discontinuities, such as

y(k) = sgn(x(k)), (5.2)

where sgn(·) is the signum function.
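The failure at a discontinuity can be seen numerically: any finite polynomial (Taylor-type) model is continuous, so a least-squares polynomial fit to sgn(·) leaves a residual error of order one near the jump. The grid and polynomial degree below are arbitrary illustrative choices:

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 2001)   # grid includes points close to the jump
y = np.sign(x)

# least-squares polynomial fit of fairly high degree
coeffs = np.polyfit(x, y, deg=11)
p = np.polyval(coeffs, x)

err = np.abs(p - y)
print(err.max())   # remains O(1) near the discontinuity at 0
```

Raising the degree narrows the region of large error but never removes it, which is the essence of why Taylor-type expansions cannot describe such systems.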
To overcome this difficulty, nonlinear parametric models, termed NARMAX, described by nonlinear difference equations, have been introduced (Billings 1980; Chon and Cohen 1997; Chon et al. 1999; Connor 1994). Unlike the Volterra–Wiener representation, the NARMAX representation of nonlinear systems offers a compact representation.
The NARMAX model describes a system by using a nonlinear functional dependence between lagged inputs, outputs and/or prediction errors. A polynomial expansion of the transfer function of a NARMAX neural network does not comprise delayed versions of the input and output of order higher than those presented to the network. Therefore, an input of insufficient order will result in undermodelling, which complies with Takens' embedding theorem (Takens 1981).

Applications of neural networks in forecasting, signal processing and control require treatment of the dynamics associated with the input signal. Feedforward networks for processing of dynamical systems tend to capture the dynamics by including past inputs in the input vector. However, for dynamical modelling of complex systems, there is a need to involve feedback, i.e. to use recurrent neural networks. There are various configurations of recurrent neural networks, which are used by Jordan (1986) for control of robots, by Elman (1990) for problems in linguistics and by Williams and Zipser (1989a) for nonlinear adaptive filtering and pattern recognition. In Jordan's network, past values of network outputs are fed back into hidden units; in Elman's network, past values of the outputs of hidden units are fed back into themselves; whereas in the Williams–Zipser architecture, the network is fully connected, having one hidden layer.
There are numerous modular and hybrid architectures, combining linear adaptive filters and neural networks. These include the pipelined recurrent neural network (PRNN) and networks combining recurrent networks and FIR adaptive filters. The main idea here is that the linear filter captures the linear 'portion' of the input process, whereas a neural network captures the nonlinear dynamics associated with the process.
5.3 Overview
The basic modes of modelling, such as parametric, nonparametric, white box, black box and grey box modelling, are introduced. Afterwards, the dynamical richness of neural models is addressed, and feedforward and recurrent modelling for noisy time series are compared. Block-stochastic models are introduced, and neural networks are shown to be able to represent these models. The chapter concludes with an overview of recurrent neural network architectures and recurrent neural networks for NARMAX modelling.
5.4 Basic Modes of Modelling
The notions of parametric, nonparametric, black box, grey box and white box modelling are explained. These can be used to categorise neural network algorithms, such as the direct gradient computation, a posteriori and normalised algorithms. The basic idea behind these approaches to modelling is not to estimate what is already known. One should, therefore, utilise prior knowledge and knowledge about the physics of the system when selecting the neural network model prior to parameter estimation.
5.4.1 Parametric versus Nonparametric Modelling
A review of nonlinear input–output modelling techniques is given in Pearson (1995). Three classes of input–output models are parametric, nonparametric and semiparametric models. We next briefly address them.
• Parametric modelling assumes a fixed structure for the model. The model identification problem then simplifies to estimating a finite set of parameters of this fixed model. This estimation is based upon the prediction of real input data, so as to best match the input data dynamics. An example of this technique is the broad class of ARIMA/NARMA models. For a given structure of the model (NARMA, for instance), we recursively estimate the parameters of the chosen model.
• Nonparametric modelling seeks a particular model structure from the input data. The actual model is not known beforehand. An example taken from nonparametric regression is that we look for a model in the form y(k) = f(x(k)) without knowing the function f(·) (Pearson 1995).
• Semiparametric modelling is a combination of the above. Part of the model structure is completely specified and known beforehand, whereas the other part of the model is either not known or only loosely specified.
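The parametric case above, recursively estimating the parameters of a fixed model structure, can be sketched with an LMS-style gradient update on a simple ARMA-type structure. The system coefficients, step size and signal length are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed fixed parametric structure: y(k) = a*y(k-1) + b*x(k-1)
a_true, b_true = 0.5, 0.1
x = rng.standard_normal(5000)
y = np.zeros_like(x)
for k in range(1, len(x)):
    y[k] = a_true * y[k - 1] + b_true * x[k - 1]

# Recursive (LMS) estimation of the two parameters of the fixed model
theta = np.zeros(2)   # [a_hat, b_hat]
mu = 0.05             # step size
for k in range(1, len(x)):
    phi = np.array([y[k - 1], x[k - 1]])   # regressor
    e = y[k] - theta @ phi                 # prediction error
    theta += mu * e * phi                  # gradient update

print(theta)   # should approach [0.5, 0.1]
```

The model structure is fixed throughout; only the finite parameter vector is adapted, which is exactly what distinguishes the parametric mode from the nonparametric one.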
Neural networks, especially recurrent neural networks, can be employed within estimators of all of the above classes of models. Closely related to the above concepts are white, grey and black box modelling techniques.

5.4.2 White, Grey and Black Box Modelling
To understand and analyse real-world physical phenomena, various mathematical models have been developed. Depending on the a priori knowledge about the process, data and model, we differentiate between three fairly general modes of modelling. The idea is to distinguish between three levels of prior knowledge, which have been 'colour-coded'. An overview of the white, grey and black box modelling techniques can be found in Aguirre (2000) and Sjöberg et al. (1995).
Given data gathered from planet movements, Kepler's gravitational laws might well provide the initial framework for building a mathematical model of the process. This mode of modelling is referred to as white box modelling (Aguirre 2000), underlining its fairly deterministic nature. Static data are used to calculate the parameters, and to do that the underlying physical process has to be understood. It is therefore possible to build a white box model entirely from physical insight and prior knowledge. However, the underlying physics are generally not completely known, or are too complicated, and often one has to resort to other types of modelling.
The exact form of the input–output relationship that describes a real-world system is most commonly unknown, and therefore modelling is based upon a chosen set of known functions. In addition, if the model is to approximate the system with arbitrary accuracy, the set of chosen nonlinear continuous functions must be dense. This is the case with polynomials. In this light, neural networks can be viewed as another mode of functional representation. Black box modelling therefore assumes no previous knowledge about the system that produces the data. However, the chosen network structure belongs to architectures that are known to be flexible and to have performed satisfactorily on similar problems. The aim hereby is to find a function F that approximates the process y based on the previous observations of the process, yPAST, and the input u, as

y(k) = F(yPAST, u). (5.3)
This 'black box' establishes a functional dependence between the input and output, which can be either linear or nonlinear. The downside is that it is generally not possible to learn about the true physical process that generates the data, especially if a linear model is used. Once the training process is complete, a neural network represents a black box, nonparametric process model. Knowledge about the process is embedded in the values of the network parameters (i.e. synaptic weights).
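A black box neural model of this kind can be sketched with a small feedforward network trained by gradient descent. The toy process, network size, step size and iteration count below are all illustrative assumptions; the point is only that the process knowledge ends up embedded in the weights:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy process: y(k) depends nonlinearly on y(k-1) and u(k-1)
u = rng.uniform(-1, 1, 2000)
y = np.zeros_like(u)
for k in range(1, len(u)):
    y[k] = 0.3 * y[k - 1] + np.tanh(u[k - 1])

# Black-box regressor: inputs [y(k-1), u(k-1)], target y(k)
X = np.stack([y[:-1], u[:-1]], axis=1)
t = y[1:]

# Minimal one-hidden-layer network, batch gradient descent
nh = 8
W1 = rng.standard_normal((2, nh)) * 0.5
b1 = np.zeros(nh)
W2 = rng.standard_normal(nh) * 0.5
b2 = 0.0
lr = 0.1

def forward(X):
    h = np.tanh(X @ W1 + b1)
    return h, h @ W2 + b2

_, pred0 = forward(X)
loss0 = np.mean((pred0 - t) ** 2)   # error before training

for _ in range(500):
    h, pred = forward(X)
    err = pred - t
    gW2 = h.T @ err / len(t)
    gb2 = err.mean()
    dh = np.outer(err, W2) * (1 - h ** 2)   # backprop through tanh
    gW1 = X.T @ dh / len(t)
    gb1 = dh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, pred = forward(X)
loss = np.mean((pred - t) ** 2)
print(loss0, loss)   # mean-squared error drops substantially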
A natural compromise between the two previous approaches is so-called grey box modelling. It is obtained from black box modelling if some information about the system is known a priori. This can be a probability density function, general statistics of the process data, an impulse response or attractor geometry. In Sjöberg et al. (1995), two subclasses of grey box models are considered: physical modelling, where a model structure is built upon an understanding of the underlying physics, as for instance the state-space model structure; and semiphysical modelling, where, based upon physical insight, certain nonlinear combinations of data structures are suggested, and then estimated by black box methodology.
Figure 5.2 Nonlinear prediction configuration using a neural network model
5.5 NARMAX Models and Embedding Dimension
For neural networks, the number of input nodes specifies the dimension of the network input. In practice, the true state of the system is not observable and the mathematical model of the system that generates the dynamics is not known. The question arises: is the sequence of measurements {y(k)} sufficient to reconstruct the nonlinear system dynamics? Under some regularity conditions, Takens' (1981) and Mañé's (1981) embedding theorems establish this connection. To ensure that the dynamics of a nonlinear process estimated by a neural network are fully recovered, it is convenient to use Takens' embedding theorem (Takens 1981), which states that to obtain a faithful reconstruction of the system dynamics, the embedding dimension d must satisfy

d ≥ 2D + 1, (5.4)

where D is the dimension of the system attractor. Takens' embedding theorem (Takens 1981; Wan 1993) establishes a diffeomorphism between a finite window of the time series
[y(k − 1), y(k − 2), . . . , y(k − N)] (5.5)

and the underlying state of the dynamic system which generates the time series. This implies that a nonlinear regression

y(k) = g[y(k − 1), y(k − 2), . . . , y(k − N)] (5.6)

can model the nonlinear time series. An important feature of the delay-embedding theorem due to Takens (1981) is that it is physically implemented by delay lines.
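The delay vectors (5.5) and the regression targets of (5.6) can be assembled directly from a measured sequence. A small sketch follows; the helper name and the toy signal are illustrative:

```python
import numpy as np

def delay_embed(y, N):
    """Build delay vectors [y(k-1), ..., y(k-N)] as rows, with
    targets y(k), for k = N, ..., len(y)-1 (cf. (5.5)-(5.6))."""
    Y = np.stack([y[N - 1 - i : len(y) - 1 - i] for i in range(N)], axis=1)
    return Y, y[N:]

y = np.arange(10.0)          # toy 'time series'
Y, t = delay_embed(y, 3)
print(Y[0], t[0])            # [2. 1. 0.] 3.0
```

Each row of Y is exactly the tapped-delay-line content that would feed the input nodes of the network, so the number of input nodes fixes the embedding dimension N.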
Figure 5.3 A NARMAX recurrent perceptron with p = 1 and q = 1
There is a deep connection between time-lagged vectors and the underlying dynamics. Delay vectors are not just a representation of a state of the system; their length is the key to recovering the full dynamical structure of a nonlinear system. A general starting point would be to use a network for which the input vector comprises delayed inputs and outputs, as shown in Figure 5.2. For the network in Figure 5.2, both the input and the output are passed through delay lines, hence indicating the NARMAX character of this network. The switch in this figure indicates two possible modes of learning, which will be explained in Chapter 6.
5.6 How Dynamically Rich are Nonlinear Neural Models?
To make an initial step toward comparing neural and other nonlinear models, we perform a Taylor series expansion of the sigmoidal nonlinear activation function of a single neuron model as (Billings et al. 1992)

Φ(v(k)) = 1/2 + βv(k)/4 − (βv(k))³/48 + · · · . (5.7)

Depending on the steepness β and the activation potential v(k), the polynomial representation (5.7) of the transfer function of a neuron exhibits a complex nonlinear behaviour.
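The first terms of this expansion can be verified numerically. Assuming the logistic sigmoid Φ(v) = 1/(1 + e^(−βv)), the cubic truncation 1/2 + βv/4 − (βv)³/48 agrees closely for small activation potentials:

```python
import math

def sigmoid(v, beta=1.0):
    # logistic sigmoid activation
    return 1.0 / (1.0 + math.exp(-beta * v))

def taylor3(v, beta=1.0):
    # cubic truncation of the Taylor expansion about v = 0
    bv = beta * v
    return 0.5 + bv / 4 - bv ** 3 / 48

for v in (0.05, 0.1, 0.2):
    print(v, abs(sigmoid(v) - taylor3(v)))   # error shrinks like v^5
```

The leading neglected term is of order (βv)⁵, so for |βv| well below one the polynomial view of the neuron is an excellent local description.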
Let us now consider a NARMAX recurrent perceptron with p = 1 and q = 1, as shown in Figure 5.3, which is a simple example of a recurrent neural network. Its mathematical description is given by

y(k) = Φ(w1x(k − 1) + w2y(k − 1) + w0). (5.8)

Expanding (5.8) using (5.7) yields

y(k) = 1/2 + (1/4)[w1x(k − 1) + w2y(k − 1) + w0] − (1/48)[w1x(k − 1) + w2y(k − 1) + w0]³ + · · · , (5.9)

where β = 1. Expression (5.9) illustrates the dynamical richness of squashing activation functions. The associated dynamics, when represented in terms of polynomials, are quite complex. Networks with more neurons and hidden layers will produce more complicated dynamics than those in (5.9). Following the same approach, for a general recurrent neural network, we obtain (Billings et al. 1992) a polynomial representation, denoted (5.10), whose coefficients are determined by the weights of the network. Representation (5.10) also models an offset (mean value) c0 of the input signal.
5.6.1 Feedforward versus Recurrent Networks for Nonlinear Modelling
The choice of which neural network to employ to represent a nonlinear physical process depends on the dynamics and complexity of the network that is best for representing the problem in hand. For instance, due to feedback, recurrent networks may suffer from instability and sensitivity to noise. Feedforward networks, on the other hand, might not be powerful enough to capture the dynamics of the underlying nonlinear dynamical system. To illustrate this problem, we resort to a simple IIR (ARMA) linear system described by the following first-order difference equation:
z(k) = 0.5z(k − 1) + 0.1x(k − 1). (5.11)
The system (5.11) is stable, since the pole of its transfer function is at 0.5, i.e. within the unit circle in the z-plane. However, in a noisy environment, the output z(k) is corrupted by noise e(k), so that the noisy output y(k) of system (5.11) becomes

y(k) = z(k) + e(k), (5.12)

which will affect the quality of estimation based on this model. This happens because the noise terms accumulate during the recursions⁴ of (5.11) as
y(k) = 0.5y(k − 1) + 0.1x(k − 1) + e(k) − 0.5e(k − 1). (5.13)
An equivalent FIR (MA) representation of the same filter (5.11), using the method of long division, gives

z(k) = 0.1x(k − 1) + 0.05x(k − 2) + 0.025x(k − 3) + 0.0125x(k − 4) + · · · (5.14)
and the representation of the noisy system now becomes

y(k) = 0.1x(k − 1) + 0.05x(k − 2) + 0.025x(k − 3) + 0.0125x(k − 4) + · · · + e(k). (5.15)
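The equivalence of the recursion (5.11) and its long-division FIR form (5.14) can be checked directly, along with how quickly the truncation error vanishes. The truncation length and test signal below are illustrative choices:

```python
import numpy as np

def iir(x):
    # z(k) = 0.5 z(k-1) + 0.1 x(k-1), cf. (5.11)
    z = np.zeros(len(x))
    for k in range(1, len(x)):
        z[k] = 0.5 * z[k - 1] + 0.1 * x[k - 1]
    return z

def fir(x, M):
    # truncated long-division form: z(k) = sum_i 0.1 * 0.5^(i-1) x(k-i)
    z = np.zeros(len(x))
    for k in range(len(x)):
        for i in range(1, M + 1):
            if k - i >= 0:
                z[k] += 0.1 * 0.5 ** (i - 1) * x[k - i]
    return z

rng = np.random.default_rng(2)
x = rng.standard_normal(200)
z_iir = iir(x)
z_fir = fir(x, 30)
print(np.max(np.abs(z_iir - z_fir)))   # truncation error ~ 0.5^30
```

The two outputs agree to within the geometrically small truncation tail, illustrating the text's point: the FIR form avoids the noise recursion at the price of an (in principle infinite) expansion length.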
4 Notice that if the noise e(k) is zero mean and white, it appears coloured in (5.13), i.e. correlated with previous outputs, which leads to biased estimates.
Trang 9Clearly, the noise in (5.15) is not correlated with the previous outputs and the mates are unbiased.5 The price to pay, however, is the infinite length of the exactrepresentation of (5.11).
esti-A similar principle applies to neural networks In Chapter 6 we address the modes
of learning in neural networks and discuss the bias/variance dilemma for recurrentneural networks
5.7 Wiener and Hammerstein Models and Dynamical Neural Networks
Under relatively mild conditions,⁶ the output signal of a nonlinear model can be considered as a combination of outputs from some suitable submodels. The structure identification, model validation and parameter estimation based upon these submodels are more convenient than those of the whole model. Block oriented stochastic models consist of static nonlinear and dynamical linear modules. Such models often occur in practice, examples of which are

• the Hammerstein model, where a zero-memory nonlinearity is followed by a linear dynamical system characterised by its transfer function H(z) = N(z)/D(z);

• the Wiener model, where a linear dynamical system is followed by a zero-memory nonlinearity.
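The two block orderings can be sketched as follows; the first-order linear block, the tanh nonlinearity and the input ramp are illustrative assumptions, chosen only to show that the orderings are not interchangeable:

```python
import numpy as np

def linear_block(u, a=0.5, b=1.0):
    # simple first-order linear dynamical block: z(k) = a z(k-1) + b u(k-1)
    z = np.zeros(len(u))
    for k in range(1, len(u)):
        z[k] = a * z[k - 1] + b * u[k - 1]
    return z

f = np.tanh   # zero-memory (static) nonlinearity

u = np.linspace(-2, 2, 50)
y_hammerstein = linear_block(f(u))   # static nonlinearity -> linear dynamics
y_wiener = f(linear_block(u))        # linear dynamics -> static nonlinearity

# The two orderings produce clearly different outputs
print(np.max(np.abs(y_hammerstein - y_wiener)))
```

Note that the Wiener output is confined to the range of the nonlinearity, whereas the Hammerstein output can exceed it through linear accumulation, one concrete way the module ordering shows up in the data.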
5.7.1 Overview of Block-Stochastic Models
The definitions of these stochastic models are given by the corresponding block diagrams, shown in Figure 5.4.
Figure 5.4 Nonlinear stochastic models used in control and signal processing: (a) the Hammerstein stochastic model; (b) the Wiener stochastic model
Theoretically, there are finite size neural systems with dynamic synapses which can represent all of the above. Moreover, some modular neural architectures, such as the PRNN (Haykin and Li 1995), are able to represent the block-cascaded Wiener–Hammerstein systems described in Mandic and Chambers (1999c), where relatively mild conditions suffice for a system to be representable this way.
5.7.2 Connection Between Block-Stochastic Models and Neural Networks
Block diagrams of the Wiener and Hammerstein systems are shown in Figure 5.4. The nonlinear function from Figure 5.4(a) can generally be assumed to be a polynomial.⁷ The representation of the Hammerstein model whose output is corrupted with additive noise ν(k) is

y(k) = Φ[u(k − 1)] + Σ∞i=2 hiΦ[u(k − i)] + ν(k), (5.22)

where Φ is a continuous nonlinear function. A further requirement is that the linear dynamical subsystem be stable. This network is shown in Figure 5.5.