close to reality) and associative (i.e. include typical profiles) but not descriptive. Examining the artificial neural network itself only shows meaningless numeric values. The ANN model is fundamentally a black box. On the other hand, being continuous and derivable, one can explore ANN models beyond simple statistical interrogation to determine typical profiles and explicative variables (network inputs), and apply example data to determine their associated probabilities. Artificial neural networks have the ability to account for any functional dependency by discovering (i.e. learning and then modelling) the nature of the dependency without needing to be prompted. The process goes straight from the data to the model without intermediary interpretation or problem simplification. There are no inherent conditions placed on the predicted variable, which can be a yes/no output, a continuous value, or one or more classes among n, etc. However, artificial neural networks are insensitive to unreliability in the data.
Artificial neural networks have been applied in engineering design in predictive modelling of system behaviour using simulation augmented with ANN model interpolation (Chryssolouris et al. 1989), as well as in interpolation of Taguchi robust design points so that a full factorial design can be simulated to search for optimal design parameter settings (Schmerr et al. 1991).
An artificial neural network is a set of elements (i.e. neurodes or, more commonly, neurons) linked to one another, and that transmit information to each other through connected links. Example data (a to i) are given as the inputs to the ANN model. Various values of the data are then transmitted through the connections, being modified during the process until, on arrival at the bottom of the network, they have become the predicted values, for example the pair of risk probabilities P1 and P2 indicated in Fig 5.53.
a) The Building Blocks of Artificial Neural Networks
Artificial neural networks are highly distributed interconnections of adaptive
non-linear processing elements (PEs), as illustrated below in Fig 5.54.
The connection strengths, also called the network weights, can be adapted so that the network's output matches a desired response. A more detailed view of a PE is shown in Fig 5.55.
An artificial neural network is no more than an interconnection of PEs. The form of the interconnection provides one of the key variables for dividing neural networks into families. The most general case is the fully connected neural network: by definition, any PE can feed or receive activations from any other, including itself. Therefore, when the weights are represented in matrix form (the weight matrix), the matrix will be fully populated. A fully connected network of six PEs, with its (6×6) weight matrix, is presented in Fig 5.56.
This network is called a recurrent network. In recurrent networks, some of the connections may be absent, but there are still feedback connections. An input presented to a recurrent network at time t will affect the network's output for future time steps greater than t. Therefore, recurrent networks need to be operated over time.
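As an illustrative sketch (in Python) of why a recurrent network must be operated over time: a fully populated weight matrix, applied repeatedly, lets an input presented at time t influence the PE outputs at every later step. The weight values, the input vector, and the sigmoid non-linearity below are assumptions for illustration only, not values taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

n_pe = 6                                          # six processing elements, as in Fig 5.56
W = rng.normal(scale=0.3, size=(n_pe, n_pe))      # fully populated weight matrix (hypothetical values)

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))               # a common non-linearity (assumed here)

state = np.zeros(n_pe)                            # PE outputs at time t = 0
x_t = np.array([1.0, 0.5, -0.2, 0.0, 0.0, 0.0])   # input presented at time t (hypothetical)

# Operate the recurrent network over time: the input at time t keeps
# influencing the state at t+1, t+2, ... through the feedback connections.
for t in range(5):
    state = sigma(W @ state + (x_t if t == 0 else 0.0))
    print(f"t={t + 1}, PE outputs: {np.round(state, 3)}")
```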
If the interconnection matrix is restricted to feed-forward activations (no feedback nor self-connections), the ANN is defined as a feed-forward network. Feed-forward networks are instantaneous mappers, i.e. the output is valid immediately after the presentation of an input. A special class of feed-forward networks is the layered class, also termed the multi-layer perceptron (MLP). Multi-layer perceptrons have their non-linear PEs arranged in layers, without feedback connections, whereby the layers that receive input are called the input layers, layers in contact with the outside world are called output layers, and layers without direct access to the outside world, i.e. connected only to the input or output layers, are called hidden layers (Valluru 1995).

Fig 5.53 Schematic layout of a complex artificial neural network (Valluru 1995)
Fig 5.54 The building blocks of artificial neural networks, where σ is the non-linearity, x_i the output of unit i, x_j the input to unit j, and w_ij the weights that connect unit i to unit j
Fig 5.55 Detailed view of a processing element (PE)
Fig 5.56 A fully connected ANN, and its weight matrix
Fig 5.57 Multi-layer perceptron structure
The weight matrix of a multi-layer perceptron can be developed as follows (Figs 5.57 and 5.58): in the example MLP of Fig 5.57, the input layer contains PEs 1, 2 and 3, the hidden layer contains PEs 4 and 5, and the output layer contains PE 6.

Figure 5.58 shows the MLP's weight matrix. Most entries in the weight matrix of an MLP are zero. In particular, any feed-forward network has at least the main diagonal and the elements below it populated with zeros. Feed-forward neural networks are therefore a special case of recurrent networks. Implementing partially connected topologies with the fully connected system and then zeroing weights is inefficient, but is sometimes done, depending on the requirements for the artificial neural network. A case in point would be the weight matrix of the MLP below:
Fig 5.58 Weight matrix structure for the multi-layer perceptron
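A minimal sketch of the weight matrix just described, assuming the layer assignment of Fig 5.57 (PEs 1, 2 and 3 in the input layer, PEs 4 and 5 in the hidden layer, PE 6 in the output layer). The placeholder weight value 0.1 is not from the text; only the zero/non-zero pattern matters here.

```python
import numpy as np

n_pe = 6
W = np.zeros((n_pe, n_pe))        # row i, column j: weight from PE i+1 to PE j+1

# Feed-forward connections only: input layer (PEs 1-3) -> hidden layer (PEs 4-5)
for i in (0, 1, 2):               # PEs 1, 2, 3
    for j in (3, 4):              # PEs 4, 5
        W[i, j] = 0.1             # placeholder weight value

# Hidden layer (PEs 4-5) -> output layer (PE 6)
for i in (3, 4):
    W[i, 5] = 0.1                 # placeholder weight value

print(W)
# Most entries are zero; in particular, the main diagonal and everything below it
# are zero, the feed-forward special case of the fully connected weight matrix.
```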
b) Structure of Artificial Neural Networks
A basic artificial neural network (ANN) structure thus consists of three layers: the input layer, the hidden layer, and the output layer, as indicated in Fig 5.59 (Haykin 1999).
This MLP works in the following manner: for a given input vector

\vec{x}_0 = \{a_0, \ldots, a_i\}   (5.104)

the following output vector is computed:

\vec{o}_0 = \{c_0, \ldots, c_i\}   (5.105)

The ANN implements the function f, where

f(\vec{x}_0) = \vec{o}_0   (5.106)
The basic processing element (PE) group of the MLP is termed the artificial perceptron (AP). The AP has a set of input connections from PEs of another layer, as indicated in Fig 5.60 (Haykin 1999).
Fig 5.59 Basic structure of an artificial neural network
Fig 5.60 Input connections of the artificial perceptron (a_n, b_1)
An AP computes its output in the following fashion: the output is usually a real number, and is a function of the activation z_i, where

o_i = σ(z_i)   (5.107)

The activation is computed as the weighted sum of the AP's inputs:

z_i = ∑_∀j w_ji a_j   (5.108)

where σ is the activation function. There are many different activation functions (σ) in use. ANNs that work with binary vectors usually use the step function

σ(z) = 1 if z ∈ [θ, ∞), else σ(z) = 0 (usually θ = 0)

Units with this type of activation function are called threshold logic units (TLUs), as indicated in the binary step function illustrated in Fig 5.61.
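A minimal sketch of this computation, assuming a step-function TLU; the weights, inputs and threshold below are hypothetical values chosen only for illustration.

```python
def ap_output(inputs, weights, theta=0.0):
    """Artificial perceptron: activation z = sum of weighted inputs,
    output = step function of the activation (a TLU)."""
    z = sum(w * a for w, a in zip(weights, inputs))   # activation z
    return 1 if z >= theta else 0                     # sigma(z): 1 if z in [theta, inf), else 0

# Hypothetical example: three inputs, illustrative weights, threshold 0.5
print(ap_output([1, 0, 1], [0.4, 0.3, 0.2], theta=0.5))   # z = 0.6 -> output 1
```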
Fig 5.61 The binary step-function threshold logic unit (TLU)
Fig 5.62 The non-binary sigmoid-function threshold logic unit (TLU)
Graphic examples of threshold logic units (TLUs) are given in Figs 5.61 and 5.62 (Fausett 1994).
Non-binary ANNs often use the sigmoid function as the activation function, where the parameter ρ determines the shape of the sigmoid, as indicated in Fig 5.62 and in Eq 5.109.
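Eq 5.109 itself is not reproduced in this extract, so the sketch below uses one common parameterisation of the sigmoid with a shape parameter ρ; the exact form used in the source may differ.

```python
import math

def sigmoid(z, rho=1.0):
    """Sigmoid activation; rho controls the steepness (shape) of the curve.
    This particular parameterisation is an assumption, not taken from Eq 5.109."""
    return 1.0 / (1.0 + math.exp(-rho * z))

for z in (-4, -1, 0, 1, 4):
    print(z, round(sigmoid(z, rho=1.0), 3))
```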
The most significant advantage of an MLP is that the artificial neural network is highly parallel. The MLP is also robust in the presence of noise (i.e. deviations in input), where a small amount of noise will not drastically affect the output. Furthermore, it can deal with unseen inputs, through generalisation from the learned input-output combinations. The threshold function ensures that the activation value will not go beyond certain values (generally, between 0 and 1) and protects against catastrophic evolutions (a loop effect where values become higher and higher).
c) Learning in Artificial Neural Networks
The basic operation of each AP is to multiply its input values by a weight (one per input), add these together, place the result into a threshold function, and then send the result to the neurodes downstream in the following layer. The learning mechanism of artificial neural networks is as follows: each set of example data is input to the ANN, and these values are then propagated towards the output through the basic operation of each AP.
The prediction obtained at the ANN's output(s) is most probably erroneous, especially at the beginning. The error value is then computed as the difference between the expected value and the actual output value. This error value is back-propagated by going upwards in the network and modifying the weights proportionally to each AP's contribution to the total error value. This mechanism is repeated for each set of example data in the learning set, while performance on the test set improves.
This learning mechanism is called error back propagation. The method is not unique to artificial neural nets, and is a general method (i.e. a gradient method) applicable to other evolutionary computation objects.
Fig 5.63 Boolean-function input connections of the artificial perceptron (a_n, o_0)
Table 5.26 Boolean-function input values of the artificial perceptron (a_n, o_0)
For example, consider the input connections of the AP of an artificial neural network implementing the Boolean AND function (θ = 2), as illustrated in Fig 5.63 (Haykin 1999). Consider all the possible values of the ANN implementing the Boolean AND function (θ = 2) for a_0, a_1, z, and o_0, as given in Table 5.26.
The two-dimensional pattern space of the AP can now be developed according to the values given in Table 5.26. This is illustrated in Fig 5.64. The TLU groups its input vectors into two classes: one for which the output is 0, and the other for which the output is 1. The pattern space for an n-input unit will be n-dimensional (Fausett 1994).
If the TLU uses threshold θ, then for the input vector \vec{x}_0 the output will be 1 if ∑_∀i w_i a_i ≥ θ, else 0. The equation for the decision plane is ∑_∀i w_i a_i = θ, which is a diagonal line in the two-dimensional case, as illustrated in Fig 5.64. Thus, in the case of the previous example:

w_0 a_0 + w_1 a_1 = θ  ⇔  a_1 = −(w_0/w_1) · a_0 + (θ/w_1)
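The values of Table 5.26 can be enumerated with the short sketch below. The threshold θ = 2 comes from the text; the weights w_0 = w_1 = 1 are an assumption, chosen so that the TLU realises the Boolean AND.

```python
# Boolean AND with a TLU: theta = 2 (from the text), w0 = w1 = 1 (assumed)
theta, w0, w1 = 2, 1, 1

print(" a0  a1   z  o0")
for a0 in (0, 1):
    for a1 in (0, 1):
        z = w0 * a0 + w1 * a1              # activation
        o0 = 1 if z >= theta else 0        # TLU output
        print(f"  {a0}   {a1}   {z}   {o0}")

# Decision plane (a line in two dimensions): a1 = -(w0/w1) * a0 + theta/w1
print("decision line: a1 = "
      f"{-(w0 / w1)} * a0 + {theta / w1}")
```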
Learning rules. Several learning rules are used to train threshold logic units (TLUs), such as the gradient descent technique and the delta learning rule.
Fig 5.64 Boolean-function pattern space and TLU of the artificial perceptron (a_n, o_0)
Fig 5.65 The gradient descent technique
Suppose y is a function of x (y = f(x)), f(x) is continuous, and the derivative dy/dx can be found at any point. However, if no information is available on the shape of f(x), local or global minima cannot be found using classical methods of calculus. The slope of the tangent to the curve at x_0 is [dy/dx]_{x_0}.
For small values of Δx, Δy can be approximated using the expression

y_1 − y_0 = [dy/dx]_{x_0} (x_1 − x_0)   (5.110)

where:
Δy = y_1 − y_0
Δx = x_1 − x_0.

Let:
Δx = α · dy/dx  ⇒  Δy = α (dy/dx)^2

where α is a small parameter chosen so as not to overshoot any minima or maxima.
Starting from a given point (x_0) in Fig 5.65, the local minima of the function f(x) can be found by moving down the curve (Δx = α · dy/dx), until Δy becomes negative (at that point, the curve has already started moving away from the local minima). This technique is termed gradient descent. The gradient descent technique is used to train TLUs.
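A minimal sketch of the gradient descent technique on a one-dimensional example; the function f(x), the starting point and α are illustrative choices, and the update steps against the slope (x ← x − α · dy/dx) so that y decreases at each step.

```python
def f(x):
    return (x - 2.0) ** 2 + 1.0             # example function with a minimum at x = 2

def dfdx(x):
    return 2.0 * (x - 2.0)                  # its derivative dy/dx

alpha = 0.1                                 # small step parameter, so minima are not overshot
x = 5.0                                     # starting point x0 (hypothetical)

# Step against the slope so that y decreases, homing in on the local minimum.
for step in range(25):
    x = x - alpha * dfdx(x)

print(round(x, 4), round(f(x), 4))          # x approaches 2, f(x) approaches 1
```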
d) Back Propagation in Artificial Neural Networks
Consider the ANN of Fig 5.66, and assume the neurodes are TLUs, α(x) = 0, ∀x (Haykin 1999).
The back-propagation (BP) algorithm accounts for errors in the output layer using all the weights of the ANN. Thus, if a TLU in the output layer is off, the algorithm will change weights not only between the hidden and output layer but also between the input and hidden layer. The BP algorithm uses the delta learning rule expressed as

Δw_i = α (t_j − z_j) · a_ji  (Δx = dy/dx · α)
Fig 5.66 Basic structure of an artificial neural network: back propagation
If the training set consists of the following pairs for the TLU

(\vec{x}_j, t_j), j = 0, \ldots, n, where \vec{x}_j = (a_{j0}, \ldots, a_{jm})

then the error for each pair is defined as

E_j = 1/2 (t_j − o_j)^2   (5.111)

The total error for the training set is

E = ∑_j E_j   (5.112)

where E_j, ∀j, is a function of the weights connected to the TLU.
Thus, for all possible weight vectors, there exists an error measure (E) for a given training set. However, since the activation function is a step function, the error measure would not be a continuous function. The value o_j must be changed to z_j in the definition of the error E_j, which means that the activation level, rather than the produced output, is used to compute the error. This yields a continuous function

E_j = 1/2 (t_j − z_j)^2   (5.113)

It can be shown that the slope of E_j with respect to the ith weight is −(t_j − z_j) a_ji; the delta learning rule is thus expressed as

Δw_i = α (t_j − z_j) a_ji  (Δx = dy/dx · α)   (5.114)
when working with the jth training pair. Thus, for a training set defined as

(\vec{x}_j, \vec{t}_j), j = 0, \ldots, m, with \vec{x}_j = (x_{j0}, x_{j1}, \ldots) and \vec{t}_j = (t_{j0}, t_{j1}, \ldots),

the back-propagation training procedure for each pair is the following:
i) Compute the output of the hidden layer using \vec{x}_j.
ii) Compute the output of the output layer using the output of the hidden layer (b_0, \ldots, b_n).
iii) Calculate the error for each output node. For the kth output node: δ_k = (t_jk − z_k), where z_k is the activation of the kth output node.
iv) Train the output nodes using the delta rule; for the mth hidden node and kth output node: Δw_{b_m c_k} = α δ_k b_m.
v) Calculate the error for each hidden node. For the mth hidden node: δ_m = ∑_{k=1}^{n} δ_k w_{b_m c_k}, where δ_k is the computed error for the kth output node.
vi) Train the hidden nodes using the delta rule; for the hth input node and mth hidden node: Δw_{a_h b_m} = α δ_m x_{jh}.
These steps are repeated for each training vector, until the ANN produces acceptable outputs for the input vectors.
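A compact sketch of steps i) to vi) for one training pair is given below. The layer sizes, initial weights, training vector, and the use of a step-function hidden layer with θ = 0 are assumptions for illustration; step v) uses the pre-update output-layer weights, a common refinement not stated explicitly in the list above.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.5

# Hypothetical sizes: 3 input nodes (a), 2 hidden nodes (b), 2 output nodes (c)
W_ab = rng.normal(scale=0.5, size=(3, 2))    # weights a_h -> b_m
W_bc = rng.normal(scale=0.5, size=(2, 2))    # weights b_m -> c_k

def step(z, theta=0.0):
    return (z >= theta).astype(float)        # TLU output

x_j = np.array([1.0, 0.0, 1.0])              # one training vector (hypothetical)
t_j = np.array([1.0, 0.0])                   # its target vector (hypothetical)

# i)   output of the hidden layer using x_j
b = step(x_j @ W_ab)
# ii)  activation of the output layer using b_0..b_n
z = b @ W_bc
# iii) error for each output node: delta_k = t_jk - z_k
delta_k = t_j - z
W_bc_old = W_bc.copy()                       # keep pre-update weights for step v
# iv)  train the output nodes: Delta w_{b_m c_k} = alpha * delta_k * b_m
W_bc += alpha * np.outer(b, delta_k)
# v)   error for each hidden node: delta_m = sum_k delta_k * w_{b_m c_k}
delta_m = W_bc_old @ delta_k
# vi)  train the hidden nodes: Delta w_{a_h b_m} = alpha * delta_m * x_jh
W_ab += alpha * np.outer(x_j, delta_m)

print(np.round(W_ab, 3), np.round(W_bc, 3), sep="\n")
```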
e) Fuzzy Neural Rule-Based Systems
The basic advantage of neural networks is that the designer does not have to program the system. Take, for example, a complex ANN whose input is an n × n bitmap, which it recognises as a process equipment model (PEM) on the AIB blackboard (assuming the ANN is capable of distinguishing between a vessel, tank and container, the input layer has n² nodes, and the output layer has three nodes, one for each PEM). In the ideal case, the designer does not have to write any specific code, and simply chooses an appropriate ANN model and trains it. The logic of each PEM is encoded in the weights and the activation functions.
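As a rough sketch of this architecture (untrained, forward pass only): the n² input nodes and the three output nodes follow the text, while the bitmap size n, the hidden-layer size and the sigmoid activation are assumptions.

```python
import numpy as np

n = 8                                   # bitmap is n x n, so the input layer has n**2 nodes (n is hypothetical)
n_hidden = 16                           # hidden-layer size: an assumption, not given in the text
n_out = 3                               # one output node per PEM class: vessel, tank, container

rng = np.random.default_rng(2)
W_in_hid = rng.normal(scale=0.1, size=(n * n, n_hidden))
W_hid_out = rng.normal(scale=0.1, size=(n_hidden, n_out))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def classify(bitmap):
    """Forward pass only: the 'logic' of each PEM would be encoded in the
    weights once the network has been trained (training not shown here)."""
    x = bitmap.reshape(-1)              # flatten the n x n bitmap to n**2 inputs
    hidden = sigmoid(x @ W_in_hid)
    return sigmoid(hidden @ W_hid_out)  # three outputs, one per PEM

print(classify(rng.integers(0, 2, size=(n, n))))
```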
However, artificial neural networks also have their drawbacks. They are fundamentally black boxes, whereby the designer does not know what part of a large designed network is responsible for a particular part of the computed output. Thus, the network cannot be modified to improve it.
ANN models are good at reaching decisions based on incomplete information (i.e. if the input vector does not match any of the training vectors, the network still computes a reasonable output, in the sense that the output will probably be close to the output vector of a training vector that, in turn, is close to the input). Fuzzy rule-based systems are good at dealing with imprecise information; however, determining their membership functions is usually difficult. The fuzzy rule-based neural network essentially constructs a membership function from the training vectors. Consider, for example, the fuzzy rules (Valluru 1995):
R_1: IF x is F_1 THEN z is H_1
R_2: IF x is F_2 THEN z is H_2
...
R_n: IF x is F_n THEN z is H_n

To teach this rule base to an ANN, the training pairs are ((F_1, H_1), ..., (F_n, H_n)).