close to reality) and associative (i.e. include typical profiles) but not descriptive. Examining the artificial neural network itself only shows meaningless numeric values. The ANN model is fundamentally a black box. On the other hand, being continuous and derivable, one can explore ANN models beyond simple statistical interrogation to determine typical profiles and explicative variables (network inputs), and apply example data to determine their associated probabilities. Artificial neural networks have the ability to account for any functional dependency by discovering (i.e. learning and then modelling) the nature of the dependency without needing to be prompted. The process goes straight from the data to the model without intermediary interpretation or problem simplification. There are no inherent conditions placed on the predicted variable, which can be a yes/no output, a continuous value, or one or more classes among n, etc. However, artificial neural networks are insensitive to unreliability in the data.
Artificial neural networks have been applied in engineering design in predictive modelling of system behaviour using simulation augmented with ANN model interpolation (Chryssolouris et al. 1989), as well as in interpolation of Taguchi robust design points so that a full factorial design can be simulated to search for optimal design parameter settings (Schmerr et al. 1991).
An artificial neural network is a set of elements (i.e. neurodes or, more commonly, neurons) linked to one another, and that transmit information to each other through connected links. Example data (a to i) are given as the inputs to the ANN model. Various values of the data are then transmitted through the connections, being modified during the process until, on arrival at the bottom of the network, they have become the predicted values, for example the pair of risk probabilities P1 and P2 indicated in Fig 5.53.
a) The Building Blocks of Artificial Neural Networks
Artificial neural networks are highly distributed interconnections of adaptive
non-linear processing elements (PEs), as illustrated below in Fig 5.54.
The connection strengths, also called the network weights, can be adapted so that the network's output matches a desired response. A more detailed view of a PE is shown in Fig 5.55.
An artificial neural network is no more than an interconnection of PEs. The form of the interconnection provides one of the key variables for dividing neural networks into families. The most general case is the fully connected neural network: by definition, any PE can feed or receive activations from any other, including itself. Therefore, when the weights are represented in matrix form (the weight matrix), the matrix will be fully populated. A fully connected network of six PEs, with its (6×6) weight matrix, is presented in Fig 5.56.
This network is called a recurrent network. In recurrent networks, some of the connections may be absent, but there are still feedback connections. An input presented to a recurrent network at time t will affect the network's output for future time steps greater than t. Therefore, recurrent networks need to be operated over time.
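As an illustrative sketch (in Python) of why a recurrent network must be operated over time: a fully populated weight matrix, applied repeatedly, lets an input presented at time t influence the PE outputs at every later step. The weight values, the input vector, and the sigmoid non-linearity below are assumptions for illustration only, not values taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

n_pe = 6                                          # six processing elements, as in Fig 5.56
W = rng.normal(scale=0.3, size=(n_pe, n_pe))      # fully populated weight matrix (hypothetical values)

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))               # a common non-linearity (assumed here)

state = np.zeros(n_pe)                            # PE outputs at time t = 0
x_t = np.array([1.0, 0.5, -0.2, 0.0, 0.0, 0.0])   # input presented at time t (hypothetical)

# Operate the recurrent network over time: the input at time t keeps
# influencing the state at t+1, t+2, ... through the feedback connections.
for t in range(5):
    state = sigma(W @ state + (x_t if t == 0 else 0.0))
    print(f"t={t + 1}, PE outputs: {np.round(state, 3)}")
```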
If the interconnection matrix is restricted to feed-forward activations (no feedback nor self-connections), the ANN is defined as a feed-forward network. Feed-forward networks are instantaneous mappers, i.e. the output is valid immediately after the presentation of an input. A special class of feed-forward networks is the layered class, also termed the multi-layer perceptron (MLP). Multi-layer perceptrons have their non-linear PEs arranged in layers, without feedback connections, whereby the layers that receive input are called the input layers, layers in contact with the outside world are called output layers, and layers without direct access to the outside world, i.e. connected only to the input or output layers, are called hidden layers (Valluru 1995).

Fig 5.53 Schematic layout of a complex artificial neural network (Valluru 1995)
Fig 5.54 The building blocks of artificial neural networks, where σ is the non-linearity, x_i the output of unit i, x_j the input to unit j, and w_ij the weights that connect unit i to unit j
Fig 5.55 Detailed view of a processing element (PE)
Fig 5.56 A fully connected ANN, and its weight matrix
Fig 5.57 Multi-layer perceptron structure
The weight matrix of a multi-layer perceptron can be developed as follows (Figs 5.57 and 5.58): in the example MLP of Fig 5.57, the input layer contains PEs 1, 2 and 3, the hidden layer contains PEs 4 and 5, and the output layer contains PE 6.

Figure 5.58 shows the MLP's weight matrix. Most entries in the weight matrix of an MLP are zero. In particular, any feed-forward network has at least the main diagonal and the elements below it populated with zeros. Feed-forward neural networks are therefore a special case of recurrent networks. Implementing partially connected topologies with the fully connected system and then zeroing weights is inefficient, but is sometimes done, depending on the requirements for the artificial neural network. A case in point would be the weight matrix of the MLP below:
Fig 5.58 Weight matrix structure for the multi-layer perceptron
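A minimal sketch of the weight matrix just described, assuming the layer assignment of Fig 5.57 (PEs 1, 2 and 3 in the input layer, PEs 4 and 5 in the hidden layer, PE 6 in the output layer). The placeholder weight value 0.1 is not from the text; only the zero/non-zero pattern matters here.

```python
import numpy as np

n_pe = 6
W = np.zeros((n_pe, n_pe))        # row i, column j: weight from PE i+1 to PE j+1

# Feed-forward connections only: input layer (PEs 1-3) -> hidden layer (PEs 4-5)
for i in (0, 1, 2):               # PEs 1, 2, 3
    for j in (3, 4):              # PEs 4, 5
        W[i, j] = 0.1             # placeholder weight value

# Hidden layer (PEs 4-5) -> output layer (PE 6)
for i in (3, 4):
    W[i, 5] = 0.1                 # placeholder weight value

print(W)
# Most entries are zero; in particular, the main diagonal and everything below it
# are zero, the feed-forward special case of the fully connected weight matrix.
```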
b) Structure of Artificial Neural Networks
A basic artificial neural network (ANN) structure thus consists of three layers: the input layer, the hidden layer, and the output layer, as indicated in Fig 5.59 (Haykin 1999).
This MLP works in the following manner: for a given input vector

\vec{x}_0 = \{a_0, \ldots, a_i\}   (5.104)

the following output vector is computed:

\vec{o}_0 = \{c_0, \ldots, c_i\}   (5.105)

The ANN implements the function f, where

f(\vec{x}_0) = \vec{o}_0   (5.106)
The basic processing element (PE) group of the MLP is termed the artificial perceptron (AP). The AP has a set of input connections from PEs of another layer, as indicated in Fig 5.60 (Haykin 1999).
Fig 5.59 Basic structure of an artificial neural network
Fig 5.60 Input connections of the artificial perceptron (a_n, b_1)
An AP computes its output in the following fashion: the output is usually a real number, and is a function of the activation z_i, where

o_i = σ(z_i)   (5.107)

The activation is computed as the weighted sum of the AP's inputs:

z_i = ∑_∀j w_ji a_j   (5.108)

where σ is the activation function. There are many different activation functions (σ) in use. ANNs that work with binary vectors usually use the step function

σ(z) = 1 if z ∈ [θ, ∞), else σ(z) = 0 (usually θ = 0)

Units with this type of activation function are called threshold logic units (TLUs), as indicated in the binary step function illustrated in Fig 5.61.
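A minimal sketch of this computation, assuming a step-function TLU; the weights, inputs and threshold below are hypothetical values chosen only for illustration.

```python
def ap_output(inputs, weights, theta=0.0):
    """Artificial perceptron: activation z = sum of weighted inputs,
    output = step function of the activation (a TLU)."""
    z = sum(w * a for w, a in zip(weights, inputs))   # activation z
    return 1 if z >= theta else 0                     # sigma(z): 1 if z in [theta, inf), else 0

# Hypothetical example: three inputs, illustrative weights, threshold 0.5
print(ap_output([1, 0, 1], [0.4, 0.3, 0.2], theta=0.5))   # z = 0.6 -> output 1
```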
Fig 5.61 The binary step-function threshold logic unit (TLU)
Fig 5.62 The non-binary sigmoid-function threshold logic unit (TLU)
Graphic examples of threshold logic units (TLUs) are given in Figs 5.61 and 5.62 (Fausett 1994).
Non-binary ANNs often use the sigmoid function as the activation function, where the parameter ρ determines the shape of the sigmoid, as indicated in Fig 5.62 and in Eq 5.109.
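Eq 5.109 itself is not reproduced in this extract, so the sketch below uses one common parameterisation of the sigmoid with a shape parameter ρ; the exact form used in the source may differ.

```python
import math

def sigmoid(z, rho=1.0):
    """Sigmoid activation; rho controls the steepness (shape) of the curve.
    This particular parameterisation is an assumption, not taken from Eq 5.109."""
    return 1.0 / (1.0 + math.exp(-rho * z))

for z in (-4, -1, 0, 1, 4):
    print(z, round(sigmoid(z, rho=1.0), 3))
```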
The most significant advantage of an MLP is that the artificial neural network is highly parallel. The MLP is also robust in the presence of noise (i.e. deviations in input), where a small amount of noise will not drastically affect the output. Furthermore, it can deal with unseen inputs, through generalisation from the learned input-output combinations. The threshold function ensures that the activation value will not go beyond certain values (generally, between 0 and 1) and protects against catastrophic evolutions (a loop effect where values become higher and higher).
c) Learning in Artificial Neural Networks
The basic operation of each AP is to multiply its input values by a weight (one per input), add these together, place the result into a threshold function, and then send the result to the neurodes downstream in the following layer. The learning mechanism of artificial neural networks is as follows: each set of example data is input to the ANN, and these values are then propagated towards the output through the basic operation of each AP.
The prediction obtained at the ANN's output(s) is most probably erroneous, especially at the beginning. The error value is then computed as the difference between the expected value and the actual output value. This error value is back-propagated by going upwards in the network and modifying the weights proportionally to each AP's contribution to the total error value. This mechanism is repeated for each set of example data in the learning set, while performance on the test set improves.
This learning mechanism is called error back propagation. The method is not unique to artificial neural nets, and is a general method (i.e. a gradient method) applicable to other evolutionary computation objects.
Fig 5.63 Boolean-function input connections of the artificial perceptron (a_n, o_0)
Table 5.26 Boolean-function input values of the artificial perceptron (a_n, o_0)
For example, consider the input connections of the AP of an artificial neural network implementing the Boolean AND function (θ = 2), as illustrated in Fig 5.63 (Haykin 1999). Consider all the possible values of the ANN implementing the Boolean AND function (θ = 2) for a_0, a_1, z, and o_0, as given in Table 5.26.
The two-dimensional pattern space of the AP can now be developed according to the values given in Table 5.26. This is illustrated in Fig 5.64. The TLU groups its input vectors into two classes: one for which the output is 0, and the other for which the output is 1. The pattern space for an n-input unit will be n-dimensional (Fausett 1994).
If the TLU uses threshold θ, then for the input vector \vec{x}_0 the output will be 1 if ∑_∀i w_i a_i ≥ θ, else 0. The equation for the decision plane is ∑_∀i w_i a_i = θ, which is a diagonal line in the two-dimensional case, as illustrated in Fig 5.64. Thus, in the case of the previous example:

w_0 a_0 + w_1 a_1 = θ  ⇔  a_1 = −(w_0/w_1) · a_0 + (θ/w_1)
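The values of Table 5.26 can be enumerated with the short sketch below. The threshold θ = 2 comes from the text; the weights w_0 = w_1 = 1 are an assumption, chosen so that the TLU realises the Boolean AND.

```python
# Boolean AND with a TLU: theta = 2 (from the text), w0 = w1 = 1 (assumed)
theta, w0, w1 = 2, 1, 1

print(" a0  a1   z  o0")
for a0 in (0, 1):
    for a1 in (0, 1):
        z = w0 * a0 + w1 * a1              # activation
        o0 = 1 if z >= theta else 0        # TLU output
        print(f"  {a0}   {a1}   {z}   {o0}")

# Decision plane (a line in two dimensions): a1 = -(w0/w1) * a0 + theta/w1
print("decision line: a1 = "
      f"{-(w0 / w1)} * a0 + {theta / w1}")
```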
Learning rules. Several learning rules are used to train threshold logic units (TLUs), such as the gradient descent technique and the delta learning rule.
Fig 5.64 Boolean-function pattern space and TLU of the artificial perceptron (a_n, o_0)
Fig 5.65 The gradient descent technique
Suppose y is a function of x (y = f(x)), f(x) is continuous, and the derivative dy/dx can be found at any point. However, if no information is available on the shape of f(x), local or global minima cannot be found using classical methods of calculus. The slope of the tangent to the curve at x_0 is [dy/dx]_{x_0}.
For small values of Δx, Δy can be approximated using the expression

y_1 − y_0 = [dy/dx]_{x_0} (x_1 − x_0)   (5.110)

where:
Δy = y_1 − y_0
Δx = x_1 − x_0.

Let:
Δx = α · dy/dx  ⇒  Δy = α (dy/dx)^2

where α is a small parameter chosen so as not to overshoot any minima or maxima.
Starting from a given point (x_0) in Fig 5.65, the local minima of the function f(x) can be found by moving down the curve (Δx = α · dy/dx), until Δy becomes negative (at that point, the curve has already started moving away from the local minima). This technique is termed gradient descent. The gradient descent technique is used to train TLUs.
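A minimal sketch of the gradient descent technique on a one-dimensional example; the function f(x), the starting point and α are illustrative choices, and the update steps against the slope (x ← x − α · dy/dx) so that y decreases at each step.

```python
def f(x):
    return (x - 2.0) ** 2 + 1.0             # example function with a minimum at x = 2

def dfdx(x):
    return 2.0 * (x - 2.0)                  # its derivative dy/dx

alpha = 0.1                                 # small step parameter, so minima are not overshot
x = 5.0                                     # starting point x0 (hypothetical)

# Step against the slope so that y decreases, homing in on the local minimum.
for step in range(25):
    x = x - alpha * dfdx(x)

print(round(x, 4), round(f(x), 4))          # x approaches 2, f(x) approaches 1
```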
d) Back Propagation in Artificial Neural Networks
Consider the ANN of Fig 5.66, and assume the neurodes are TLUs, α(x) = 0, ∀x (Haykin 1999).
The back-propagation (BP) algorithm accounts for errors in the output layer using all the weights of the ANN. Thus, if a TLU in the output layer is off, the algorithm will change weights not only between the hidden and output layer but also between the input and hidden layer. The BP algorithm uses the delta learning rule expressed as

Δw_i = α (t_j − z_j) · a_ji  (Δx = dy/dx · α)
Fig 5.66 Basic structure of an artificial neural network: back propagation
If the training set consists of the following pairs for the TLU

(\vec{x}_j, t_j), j = 0, \ldots, n, where \vec{x}_j = (a_{j0}, \ldots, a_{jm})

then the error for each pair is defined as

E_j = 1/2 (t_j − o_j)^2   (5.111)

The total error for the training set is

E = ∑_j E_j   (5.112)

where E_j, ∀j, is a function of the weights connected to the TLU.
Thus, for all possible weight vectors, there exists an error measure (E) for a given training set. However, since the activation function is a step function, the error measure would not be a continuous function. The value o_j must be changed to z_j in the definition of the error E_j, which means that the activation level, rather than the produced output, is used to compute the error. This yields a continuous function

E_j = 1/2 (t_j − z_j)^2   (5.113)

It can be shown that the slope of E_j with respect to the ith weight is −(t_j − z_j) a_ji; the delta learning rule is thus expressed as

Δw_i = α (t_j − z_j) a_ji  (Δx = dy/dx · α)   (5.114)
when working with the jth training pair. Thus, for a training set defined as

(\vec{x}_j, \vec{t}_j), j = 0, \ldots, m, with \vec{x}_j = (x_{j0}, x_{j1}, \ldots) and \vec{t}_j = (t_{j0}, t_{j1}, \ldots),

the back-propagation training procedure for each pair is the following:
i) Compute the output of the hidden layer using \vec{x}_j.
ii) Compute the output of the output layer using the output of the hidden layer (b_0, \ldots, b_n).
iii) Calculate the error for each output node. For the kth output node: δ_k = (t_jk − z_k), where z_k is the activation of the kth output node.
iv) Train the output nodes using the delta rule; for the mth hidden node and kth output node: Δw_{b_m c_k} = α δ_k b_m.
v) Calculate the error for each hidden node. For the mth hidden node: δ_m = ∑_{k=1}^{n} δ_k w_{b_m c_k}, where δ_k is the computed error for the kth output node.
vi) Train the hidden nodes using the delta rule; for the hth input node and mth hidden node: Δw_{a_h b_m} = α δ_m x_{jh}.
These steps are repeated for each training vector, until the ANN produces acceptable outputs for the input vectors.
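A compact sketch of steps i) to vi) for one training pair is given below. The layer sizes, initial weights, training vector, and the use of a step-function hidden layer with θ = 0 are assumptions for illustration; step v) uses the pre-update output-layer weights, a common refinement not stated explicitly in the list above.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.5

# Hypothetical sizes: 3 input nodes (a), 2 hidden nodes (b), 2 output nodes (c)
W_ab = rng.normal(scale=0.5, size=(3, 2))    # weights a_h -> b_m
W_bc = rng.normal(scale=0.5, size=(2, 2))    # weights b_m -> c_k

def step(z, theta=0.0):
    return (z >= theta).astype(float)        # TLU output

x_j = np.array([1.0, 0.0, 1.0])              # one training vector (hypothetical)
t_j = np.array([1.0, 0.0])                   # its target vector (hypothetical)

# i)   output of the hidden layer using x_j
b = step(x_j @ W_ab)
# ii)  activation of the output layer using b_0..b_n
z = b @ W_bc
# iii) error for each output node: delta_k = t_jk - z_k
delta_k = t_j - z
W_bc_old = W_bc.copy()                       # keep pre-update weights for step v
# iv)  train the output nodes: Delta w_{b_m c_k} = alpha * delta_k * b_m
W_bc += alpha * np.outer(b, delta_k)
# v)   error for each hidden node: delta_m = sum_k delta_k * w_{b_m c_k}
delta_m = W_bc_old @ delta_k
# vi)  train the hidden nodes: Delta w_{a_h b_m} = alpha * delta_m * x_jh
W_ab += alpha * np.outer(x_j, delta_m)

print(np.round(W_ab, 3), np.round(W_bc, 3), sep="\n")
```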
e) Fuzzy Neural Rule-Based Systems
The basic advantage of neural networks is that the designer does not have to program the system. Take, for example, a complex ANN whose input is an n × n bitmap, which it recognises as a process equipment model (PEM) on the AIB blackboard (assuming the ANN is capable of distinguishing between a vessel, tank and container, the input layer has n² nodes, and the output layer has three nodes, one for each PEM). In the ideal case, the designer does not have to write any specific code, and simply chooses an appropriate ANN model and trains it. The logic of each PEM is encoded in the weights and the activation functions.
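As a rough sketch of this architecture (untrained, forward pass only): the n² input nodes and the three output nodes follow the text, while the bitmap size n, the hidden-layer size and the sigmoid activation are assumptions.

```python
import numpy as np

n = 8                                   # bitmap is n x n, so the input layer has n**2 nodes (n is hypothetical)
n_hidden = 16                           # hidden-layer size: an assumption, not given in the text
n_out = 3                               # one output node per PEM class: vessel, tank, container

rng = np.random.default_rng(2)
W_in_hid = rng.normal(scale=0.1, size=(n * n, n_hidden))
W_hid_out = rng.normal(scale=0.1, size=(n_hidden, n_out))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def classify(bitmap):
    """Forward pass only: the 'logic' of each PEM would be encoded in the
    weights once the network has been trained (training not shown here)."""
    x = bitmap.reshape(-1)              # flatten the n x n bitmap to n**2 inputs
    hidden = sigmoid(x @ W_in_hid)
    return sigmoid(hidden @ W_hid_out)  # three outputs, one per PEM

print(classify(rng.integers(0, 2, size=(n, n))))
```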
However, artificial neural networks also have their drawbacks. They are fundamentally black boxes, whereby the designer does not know what part of a large designed network is responsible for a particular part of the computed output. Thus, the network cannot be modified to improve it.
ANN models are good at reaching decisions based on incomplete information (i.e. if the input vector does not match any of the training vectors, the network still computes a reasonable output, in the sense that the output will probably be close to the output vector of a training vector that, in turn, is close to the input). Fuzzy rule-based systems are good at dealing with imprecise information; however, determining their membership functions is usually difficult. The fuzzy rule-based neural network essentially constructs a membership function from the training vectors. Consider, for example, the fuzzy rules (Valluru 1995):
R_1: IF x is F_1 THEN z is H_1
R_2: IF x is F_2 THEN z is H_2
...
R_n: IF x is F_n THEN z is H_n

To teach this rule base to an ANN, the training pairs are ((F_1, H_1), ..., (F_n, H_n)).