Queensland University of Technology, Brisbane
Center for Information Technology Innovation
March 2004
The author hereby grants permission to the Queensland University of Technology, Brisbane to reproduce and distribute publicly paper and electronic copies of this thesis document in whole or in part.
Keywords: Artificial Neural Network, Annotated Artificial Neural Network, Rule-Extraction, Validation of Neural Network, Polyhedra, Forward-propagation, Backward-propagation, Refinement Process, Non-linear Optimization, Polyhedral Computation, Polyhedral Projection Techniques
by Stephan Breutel
Abstract
A new method is developed to determine a set of informative and refined interface assertions satisfied by functions that are represented by feed-forward neural networks. Neural networks have often been criticized for their low degree of comprehensibility. It is difficult to have confidence in software components if they have no clear and valid interface description. Precise and understandable interface assertions for a neural network based software component are required for safety critical applications and for the integration into larger software systems.

The central problem lies in computing refined interface assertions, which can be viewed as the computation of the strongest pre- and postconditions a feed-forward neural network fulfills. Unions of polyhedra (polyhedra are the generalization of convex polygons in higher dimensional spaces) are well suited for describing arbitrary regions of higher dimensional vector spaces. Additionally, polyhedra are closed under affine transformations.

Given a feed-forward neural network, our method produces an annotated neural network, where each layer is annotated with a set of valid linear inequality predicates. The main challenges for the computation of these assertions are to compute the solution of a non-linear optimization problem and the projection of a polyhedron onto a lower-dimensional subspace.
Contents

List of Figures
1 Introduction
1.1 Motivation and Significance
1.2 Notations and Definitions
1.3 Software Verification and Neural Network Validation
1.4 Annotated Artificial Neural Networks
1.5 Highlights and Organization of this Dissertation
1.6 Summary of this Chapter
2 Analysis of Neural Networks
2.1 Neural Networks
2.2 Validation of Neural Network Components
2.2.1 Propositional Rule Extraction
2.2.2 Fuzzy Rule Extraction
2.2.3 Region-based Analysis
2.3 Overview of Discussed Neural Network Validation Techniques and Validity Polyhedral Analysis
3 Polyhedra and Deformations of Polyhedral Facets under Sigmoidal Transformations
3.1 Polyhedra and their Representation
3.2 Operations on Polyhedra and Important Properties
3.3 Deformations of Polyhedral Facets under Sigmoidal Transformations
3.4 Summary of this Chapter
4 Nonlinear Transformation Phase
4.1 Mathematical Analysis of Non-Axis-parallel Splits of a Polyhedron
4.2 Mathematical Analysis of a Polyhedral Wrapping of a Region
4.2.1 Sequential Quadratic Programming
4.2.2 Maximum Slice Approach
4.2.3 Branch and Bound Approach
4.2.4 Binary Search Approach
4.3 Complexity Analysis of the Branch and Bound and the Binary Search Method
4.4 Summary of this Chapter
5 Affine Transformation Phase
5.1 Introduction to the Problem
5.2 Backward Propagation Phase
5.3 Forward Propagation Phase
5.4 Projection of a Polyhedron onto a Subspace
5.4.1 Fourier-Motzkin
5.4.1.1 A Variation of Fourier-Motzkin
5.4.2 Block Elimination
5.4.3 The S-Box Approximation
5.4.3.1 Projection of a Face
5.4.3.2 Determination of Facets
5.4.3.3 Further Improvements of the S-Box Method
5.4.4 Experiments
5.5 Further Considerations about the Approximation of the Image
5.6 Summary of this Chapter
6 Implementation Issues and Numerical Problems
6.1 The Framework
6.2 Numerical Problems
6.3 Summary of this Chapter
7 Evaluation of Validity Polyhedral Analysis
7.1 Overview and General Procedure
7.2 Circle Neural Network
7.3 Benchmark Data Sets
7.3.1 Iris Neural Network
7.3.2 Pima Neural Network
7.4 SP500 Neural Network
7.5 Summary of this Chapter
8 Conclusion and Future Work
8.1 Contributions of this Thesis
8.2 Fine Tuning of VPA
List of Figures

1.1 Annotated version of a neural network
2.1 Single neuron of a multilayer perceptron
2.2 Sigmoid and threshold activation functions and the graph of the function computed by a single neuron with a two-dimensional input
2.3 Two-layer feed-forward neural network
2.4 Overview of different validation methods for neural networks
2.5 Example for the KT-method
2.6 Example for the M-of-N method
2.7 DIBA: recursive projection of hyperplanes onto each other
2.8 DIBA: a decision region and the traversing of a line
2.9 Annotation of a neural network with validity intervals
2.10 Piece-wise linear approximation of the sigmoid function
3.1 Combination of two vectors in a two-dimensional space, from left to right: linear, non-negative, affine and convex combination
3.2 The back-propagation of a polyhedron through the transfer function layer: given the polyhedral description in the output space of a transfer function layer, the reciprocal image of this polyhedron under the non-linear transfer function
3.3 Approximation of the non-linear region
3.4 Subdivision into cells
3.5 Example for subdivision into cells
3.6 Point sampling approach
3.7 Eigenvalue and eigenvector analysis of the facet manifold
3.8 Example for convex curvature
3.9 Example for concave curvature
4.1 Non-axis-parallel split of a polyhedron
4.2 Polyhedral wrapping of the non-linear region
4.3 Application of branch and bound
4.4 Two-dimensional example for the binary search method
5.1 The forward- and backward-propagation of a polyhedron through the weight layer
5.2 The projection of a polyhedron onto a two-dimensional subspace
5.3 Relevant hinge
5.4 Possible …
5.5 The projected polyhedron is contained in the approximation
6.1 Overview of the framework
7.1 Visualisation of the behaviour of the circle neural network
7.2 Projection onto the subspace …
7.3 Projection onto the subspace …
7.4 The computed output regions for the Pima Neural Network
7.5 The computed input region for the SP500 Neural Network
List of Tables

2.1 Overview of neural network validation techniques
List of Algorithms

5.1 projExample
6.1 net-struct
6.2 mainLoop
6.3 forwardStep
6.4 mainVIA
6.5 mainVPA
6.6 numExample
The work contained in this thesis has not been previously submitted for a degree or diploma at any other higher education institution. To the best of my knowledge and belief, the thesis contains no material previously published or written by another person except where due reference is made.
Stephan Breutel
March 2004
I would like to thank my principal supervisor Frédéric Maire, without whose unbounded energy, idealism, motivation and immense enthusiasm this research would not have been possible. It was an honor to receive wisdom and guidance from all my supervisory team. A special thanks also to my associate supervisor Ross Hayward, who provided excellent support whenever I needed it.

Apart from my supervisors, I would like to thank the other panel members of my oral defense, Joaquin Sitte and Arthur ter Hofstede, for their valuable comments about this thesis.

I would also like to thank all of my colleagues in my research center for their help and encouragement during my time here, and in particular all the people of the Smart Devices Laboratory. It would take too many pages to list them all.

I was fortunate to have many friends with whom I enjoyed fantastic climbing sessions at Kangaroo Point and had many great times at the Press Club whilst listening to superb Jazz performances, all of which helped to keep me relatively sane during my time at QUT.

I would also like to thank the Coffee Coffee Coffee crew for keeping me awake by serving me literally over a thousand coffees (to be precise, 1084).

Finally, I really would like to thank all my friends and my family back home in Bavaria for supporting my endeavour. A special thanks to my parents for their moral and financial support.

This work was supported in part by an IPRS Scholarship and a QUT Faculty of Information Technology Scholarship.
The organization of this thesis is outlined in Section 1.5.
1.1 Motivation and Significance
A conclusion of the report "Industrial use of safety-related artificial neural networks" by Lisboa [Lis01] is that one of the keys to a successful transfer of neural networks to the marketplace is the integration with other systems (e.g. standard software systems, fuzzy systems, rule-based systems). This requires an analysis of the behaviour of neural network based components. Examples of products using neural networks in safety-critical areas are [Lis01]:
- Explosive detection, 1987. SNOOPE from SAIC is an explosive detector. It was motivated by the need to detect the plastic explosive Semtex. The detector irradiated suitcases with low-energy neutrons and collected an emission gamma-ray spectrum. A standard feed-forward neural network was used to classify between bulk explosive, sheet explosive and no explosive. However, there were several practical problems. For example, the 4% false-positive rate of the MLP resulted in a large number of items to be checked. This is not practical, especially for highly frequented airports such as Heathrow and Los Angeles airport, where the system was tested.
- Financial risk management. PRISM by Nestor relied on a recursive adaptive model. HNC's Falcon is based on a regularised multilayer perceptron (MLP). Both systems are still market leaders for credit card fraud detection [LVE00].
- Siemens applied neural networks for the control of steel rolling mills. A prototype neural network based model was used for strip temperature and rolling force at the hot strip mill of Hoesch, in Dortmund, in 1993. Later Siemens applied this technology at 40 rolling mills world-wide. Siemens' experience indicates that neural networks always complement, and never replace, physical models. Additionally, domain expertise is essential in the validation process. A third observation was that data requirements are severe.
- NASA and Boeing are testing a neural network-based damage recovery control system for military and commercial aircraft. This system aims to add a significant margin of safety to fly-by-wire control when the aircraft sustains major equipment or system failure, ranging from the inability to use flaps to encountering extreme icing.
- Vibration analysis monitoring in jet engines is a joint research project by Rolls-Royce and the Department of Engineering at Oxford University. The diagnostic system QUINCE combines the outputs from neural networks with template matching, statistical processing and signal processing methods. The software is designed for the pass-off test of jet engines. It includes a tracking facility for the most likely fault. Another project is a real-time in-flight monitoring system for the Trent 900 Rolls-Royce engine. The project combines different techniques, for example Kalman filters with signal processing methods and neural networks.
- In a European collaborative project involving leading car manufacturers, different control systems ranging from engine management models to physical speed control have been implemented. These control systems combined engineering expertise with non-linear interpolation by neural network architectures. The project included rule-based systems, fuzzy systems and neural networks.
- Siemens produces the FP-11 intelligent fire detector. This detector was trained from fire tests carried out over many years. According to [Lis01], this fire detector triggered one-thirtieth of the false alarms of conventional detectors. The detector is based on a digital implementation of fuzzy logic, with rules discovered by a neural network but validated by human experts.
These examples show that neural networks need to be integrated with other systems, that it is relevant to extract rules learnt by neural networks (for example for the fire detector FP-11 or for credit card fraud detection), and that it is important to provide valid statements of the neural network behaviour in safety-critical applications.

Therefore, it is necessary to describe the neural network behaviour, e.g. in the form of valid and refined rules. Additionally, it is interesting to obtain explanations for the neural network behaviour.
However, our main motivation is to compute valid statements about the neural network behaviour and as such help to prevent software faults. Software errors can cause many problems and risks, especially in safety-critical environments. Several software errors and their consequences are collected in [Huc99]. The explosion of the Ariane 5 rocket and the loss of the Mars climate orbiter [Huc99] are recent examples of the consequences of software errors.

A description of the neural network behaviour also makes it possible to:

- control the generalization of a trained neural network. Generalization expresses the ability of the neural network to produce correct output for previously unseen input data. A description of the neural network behaviour will provide a better insight into the neural network generalization capability,
- visualize corresponding regions in the input and the output space of a neural network.
1.2 Notations and Definitions
The following notation conventions are inspired by the book by Fritzke [Fri98], and most of the conventions follow the Matlab [Mat00c] notation.
To refer to a column or row vector, or to an arbitrary element within a matrix, we use Matlab-style indexing: W(i, :) denotes the i-th row vector of the matrix W, W(:, j) denotes the j-th column vector, and W([i, j], :) extracts the i-th and the j-th row vectors. [A B] denotes the horizontal concatenation of two matrices with the same number of rows.
Additionally, we use the convention that definitions and expressions which are introduced for the first time are written in emphasized style. Function names are also written in emphasized style.
An overview of all symbols and special operators is provided in Appendix A. However, the following table contains symbols which are already relevant for this chapter and for the literature review of neural network analysis techniques in Chapter 2.
The next table defines a special function symbol, one operator and the interval notation; for example, [a, b] denotes the interval of all x with a ≤ x ≤ b.
To refer to sigmoidal functions we often use the Matlab terms logsig and tansig.
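As a concrete illustration of these conventions, the following sketch of ours shows the equivalent operations in Python/NumPy rather than Matlab (the matrix W and the indices are hypothetical examples, not symbols from this thesis; note that NumPy indices start at 0):

    import numpy as np

    # A hypothetical 3x4 matrix, used only to illustrate the indexing conventions.
    W = np.arange(12.0).reshape(3, 4)

    row_i   = W[0, :]        # Matlab W(1,:): the first row vector
    col_j   = W[:, 2]        # Matlab W(:,3): the third column vector
    rows_ij = W[[0, 2], :]   # Matlab W([1 3],:): extracts the 1st and 3rd row vectors

    # Matlab [A B]: horizontal concatenation of two matrices
    # with the same number of rows.
    A  = np.ones((3, 2))
    AB = np.hstack((W, A))   # shape (3, 6)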
A glossary is not provided yet, but it will be included in the final version. We apologise for any inconvenience.
1.3 Software Verification and Neural Network Validation
We will divide software components, depending on the task and its implementation, into two classes: trainable software components and non-trainable software components.
Definition 1.1 Trainable Software Components
Software components for classification and non-linear regression, where the task is implicitly specified via a set of examples and a set of parameters is optimized according to these examples, are called trainable software components.

Typically, we use trainable software components where it is not easy or not possible to define a clear algorithm. For example, in tasks like speech recognition, image recognition or robotic control, statistical learners, like neural networks or support vector machines, are often applied.
possi-Definition 1.2 Non-trainable Software Components
Software components, where the task is precisely specified, an algorithm can be defined and the task is implemented with a programming language, are called non-trainable software components.

We also refer to non-trainable software components as standard software.
Software Verification and Validation of Neural Network Components
Standard software program verification methods take the source code as input and prove its correctness against the (formal) specification. Among others, important concepts of software verification are pre- and postconditions.
Definition 1.3 Precondition and Postcondition
Given a statement S, a precondition P on the input data and a postcondition Q on the output data, the statement

    {P} S {Q}

indicates that for every input state which fulfills P before the execution of the statement S, the resulting state fulfills Q after the execution.
We can view pre- and postconditions as specifications of the component properties. However, using pre- and postconditions does not assure that the software component fulfills these specifications. A formal proof is required.
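For illustration, a small example of ours (not taken from the cited literature): for a statement computing a square root, a natural pair of pre- and postconditions is

    {x >= 0}   y := sqrt(x)   {y >= 0 and y*y = x}

The precondition describes the input states for which the component is meant to work; the postcondition describes what is guaranteed afterwards, and a formal proof must connect the two.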
In the context of artificial neural networks we talk about validation techniques. In the following we discuss the central ideas of neural network validation techniques in a nutshell. The validation approaches for neural networks are propositional rule extraction, fuzzy rule extraction and region-based analysis. Propositional rule extraction methods take a (trained) neural network component as input, extract symbolic rules and test them against the network behaviour itself and against the test data. These methods are helpful to test neural networks.
Fuzzy rule extraction methods try to extract a set of fuzzy rules which mimics the behaviour of the neural network. The advantage of fuzzy rule extraction, compared to propositional rule extraction, is that generally fewer rules are needed to explain the neural network behaviour. In addition, with the use of linguistic expressions, easily understandable characterizations of the neural network behaviour are obtained.
Region-based analysis methods take a (trained) neural network as input and compute related regions in the input and output space. Region-based analysis techniques differ from the above methods because they have a geometric origin, are usable for a broader range of neural networks and have the ability to compute more accurate interface descriptions of a neural network component. These methods compute a region mapping between the input and the output space. In contrast to the above methods, these region-based rules agree exactly with the behaviour of the neural network. The more refined those regions are, the more information we obtain about the neural network. Validity Interval Analysis (VIA), developed by Thrun [Thr93], for example, is able to find provably correct axis-parallel rules, i.e. rules of the form: "if the input vector lies in a given axis-parallel hypercube, then the output vector lies in a corresponding axis-parallel hypercube". Therefore, region-based analysis approaches are suitable to validate the behaviour of neural network based software components.
The development of large software systems requires the interaction of different software components. As motivated with the examples, we need some kind of human-understandable description of neural network based software components (e.g. by using fuzzy rules) as well as techniques to assert important properties of the neural network behaviour (e.g. in the form of valid relations between input and output regions). Our approach, which computes corresponding regions, represented as unions of polyhedra, in the input and the output space of a neural network, is able to validate properties about the neural network behaviour.
Definition 1.4 Polyhedral Precondition and Polyhedral Postcondition

A polyhedral precondition is a precondition where the constraints on the input data are expressed as a system of linear inequalities; analogously, a polyhedral postcondition is a postcondition where the constraints on the output data are expressed as a system of linear inequalities.

We can view polyhedral pre- and postconditions as conjunctions of linear inequality predicates. We also use the terminology "polyhedral interface assertions" or "polyhedral interface description".
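To make this concrete: a polyhedral pre- or postcondition on a vector x is a predicate of the form Ax <= b. The following minimal sketch (our own illustration; the class name, matrix and test points are hypothetical) represents such a conjunction of linear inequality predicates and tests whether a data point satisfies it:

    import numpy as np

    class PolyhedralPredicate:
        """Conjunction of linear inequality predicates: holds for all x with A x <= b."""
        def __init__(self, A, b):
            self.A = np.asarray(A, dtype=float)
            self.b = np.asarray(b, dtype=float)

        def holds(self, x, tol=1e-9):
            # Every linear inequality (every row of A) must be satisfied.
            return bool(np.all(self.A @ np.asarray(x, dtype=float) <= self.b + tol))

    # The unit square in the plane, written as four linear inequalities:
    # x1 <= 1, -x1 <= 0, x2 <= 1, -x2 <= 0.
    unit_square = PolyhedralPredicate(A=[[1, 0], [-1, 0], [0, 1], [0, -1]],
                                      b=[1, 0, 1, 0])

    print(unit_square.holds([0.5, 0.25]))  # True: the point satisfies the predicate
    print(unit_square.holds([1.5, 0.25]))  # False: violates x1 <= 1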
Among others, the following properties are desirable for methods analysing the behaviour of trained neural networks:

- generality (also known in the literature as portability, e.g. see Andrews et al. [TAGD98]): algorithms that make no assumptions about the neural network architecture and the learning algorithm,
- a precise and concise description of the neural network behaviour, e.g. in the form of a small number of informative and refined rules,
- polynomial algorithmic time and space complexity; in other words, the algorithm is still applicable in higher-dimensional cases,
- usable to validate properties about the neural network behaviour.
1.4 Annotated Artificial Neural Networks
Our approach is to forward- and backward-propagate finite unions of polyhedra through all layers of the neural network. This strategy can be viewed as an extension of Validity Interval Analysis (VIA) and is consequently named Validity Polyhedral Analysis (VPA). The method is very general, as the only assumptions are that we work with feed-forward neural networks (a brief introduction to feed-forward neural networks follows in Chapter 2), and that the network has invertible and continuous transfer functions. In contrast to VIA, which works with intervals, VPA relies on polyhedra.

Our validation algorithm uses a, generally, trained neural network as input and produces an annotated version of this neural network.
Definition 1.5 Annotated Artificial Neural Network (AANN)

An artificial neural network, where the input and output of each layer is annotated via a set of valid pre- and postconditions, is named an Annotated Artificial Neural Network (AANN).
For example, VIA produces pre- and postconditions in the form of axis-parallel rules. VPA annotates a neural network via a set of linear inequality predicates.
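As an illustration of one building block of such an annotation, consider a weight layer computing y = Wx + theta. The preimage of a polyhedral postcondition {y : Ay <= b} under this affine map is again a polyhedron, obtained by substitution: {x : (AW)x <= b - A theta}. A minimal sketch of this substitution step (our own; the weights are hypothetical, and the full backward propagation phase of VPA is developed in Chapter 5):

    import numpy as np

    def backward_affine(A, b, W, theta):
        """Pull the postcondition {y : A y <= b} back through y = W x + theta.
        Substituting y gives A (W x + theta) <= b, i.e. (A W) x <= b - A theta."""
        A = np.asarray(A, float); b = np.asarray(b, float)
        W = np.asarray(W, float); theta = np.asarray(theta, float)
        return A @ W, b - A @ theta

    # Hypothetical 2x2 weight layer and the output box 0 <= y1, y2 <= 1.
    W = np.array([[2.0, 0.0], [1.0, 1.0]])
    theta = np.array([0.5, -0.5])
    A_out = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]], dtype=float)
    b_out = np.array([1, 0, 1, 0], dtype=float)

    A_in, b_in = backward_affine(A_out, b_out, W, theta)
    # {x : A_in x <= b_in} is then a valid precondition for the postcondition above.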
Figure 1.1: Annotated version of a neural network.
Generally, the behaviour of neural networks can be described more accurately with a finite union of polyhedra than with a finite union of axis-parallel hypercubes.
There is an interesting analogy between the notion of annotated neural networks and software verification for programs. One strategy to verify the correctness of a program against a given specification is to annotate the program with logical expressions and to prove the correctness of each step. In our case the "program" is a neural network and each layer is annotated with a set of valid linear inequality predicates.
A Bridge to Logic and Software Verification
The Hoare calculus provides a formal framework to verify the correctness of programs by annotating the program with assertions about the status of the program variables and the change of this status under the program execution. The Hoare calculus defines rules for the correct annotation of a program. The book by Broy [Bro97] provides a thorough introduction to the basics of the Hoare calculus. Within the scope of this thesis the rule of statement sequence, the rule of consequence and the concepts of weaker and stronger pre- and postconditions are relevant. P, Q and R denote predicate-logical pre- and postconditions with program variables as free identifiers. Statements of the program are represented with S, S1 and S2. In the following description the rule condition and the rule consequence are separated by a horizontal line.

Rule of Statement Sequence
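In standard form the rule reads as follows, with the condition above and the consequence below the horizontal line: if S1 leads from states satisfying P to states satisfying Q, and S2 from Q to R, then the sequence S1; S2 leads from P to R.

    {P} S1 {Q},   {Q} S2 {R}
    ------------------------
        {P} S1; S2 {R}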
Definition 1.7 Weaker and Stronger Postcondition

Given two postconditions Q1 and Q2 such that Q1 implies Q2, Q1 is called the stronger and Q2 the weaker postcondition.
We can denote a feed-forward neural network as a finite sequence of affine transformation layers and transfer-function layers, and we can apply the rule of statement sequence to the layers of an annotated neural network.

Rule of Statement Sequence for an Annotated Artificial Neural Network

The repeated application of the rule of statement sequence on a multilayer feed-forward neural network allows us to write

    {P} ANN {Q}

where ANN represents the sequence of computations a feed-forward neural network performs.
Trang 31of an annotated neural network are, the stronger the corresponding pre - and
postcon-ditions It turns out, that our geometrical perspective is quite useful, as it allows us
to define a precise measurement for the strength of a precondition or postcondition,
namely the volume of the corresponding region
1.5 Highlights and Organization of this Dissertation
The highlights and the organization of this thesis are as follows:
Chapter 2: Analysis of Neural Networks
In this chapter we introduce basic concepts of feed-forward neural networks and provide a literature overview of validation methods for neural network components. We classify the validation methods into propositional rule extraction, fuzzy rule extraction and region-based analysis. Finally, the different methods are compared and our approach, named Validity Polyhedral Analysis (VPA), is motivated.
Chapter 3: Polyhedral Computations and Deformations of Polyhedral Facets under Sigmoidal Transformations

Polyhedra are the generalization of convex polygons to higher dimensional spaces. This chapter presents the most important properties and concepts of polyhedral analysis to make this thesis self-contained.
To obtain refined polyhedral interface assertions, we have to propagate unions of polyhedra through all layers of a neural network. This requires computing the image of a polyhedron under a non-linear transformation. The images of non-axis-parallel polyhedra under a sigmoidal transformation are non-linear regions. In our initial investigations we analyse how polyhedral facets get twisted under a sigmoidal transformation.
Chapter 4: Mathematical Analysis of the Non-linear Transformation Phase

In this chapter we explain how to approximate the image of a polyhedron under a non-linear transformation by a finite union of polyhedra. This approximation process can be reduced to a non-linear optimization problem. Several approaches to approximate the global maximum of the corresponding optimization problem are discussed.
Chapter 5: Mathematical Analysis of the Affine Transformation Phase

The computation of the reciprocal image of a polyhedron under an affine transformation is explained in this chapter. Furthermore, this chapter discusses how to calculate the image of a polyhedron under an affine transformation, and strategies for computing or approximating the projection of a polyhedron onto a lower-dimensional subspace. Within the scope of this thesis, projection techniques are used for the computation of the image of a polyhedron under an affine transformation characterized by a non-invertible matrix.
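To give a flavour of the simplest of these projection techniques: Fourier-Motzkin elimination projects a polyhedron {x : Ax <= b} onto a subspace by eliminating one coordinate at a time, pairing every inequality with a positive coefficient in that coordinate against every inequality with a negative one. A minimal sketch of one elimination step (our own simplification, without the redundancy removal a practical implementation needs; the quadratic growth in the number of inequalities per step is one reason the thesis also studies block elimination and approximate projections):

    import numpy as np

    def eliminate_variable(A, b, k):
        """One Fourier-Motzkin step: project {x : A x <= b} onto the coordinates
        other than x_k, returning the inequality system over the remaining variables."""
        A = np.asarray(A, float); b = np.asarray(b, float)
        pos  = [i for i in range(len(A)) if A[i, k] > 0]
        neg  = [i for i in range(len(A)) if A[i, k] < 0]
        zero = [i for i in range(len(A)) if A[i, k] == 0]

        rows, rhs = [], []
        for i in zero:                    # inequalities without x_k survive unchanged
            rows.append(np.delete(A[i], k)); rhs.append(b[i])
        for i in pos:                     # each pair of an upper and a lower bound
            for j in neg:                 # on x_k yields one inequality without x_k
                rows.append(np.delete(A[i] / A[i, k] - A[j] / A[j, k], k))
                rhs.append(b[i] / A[i, k] - b[j] / A[j, k])
        return np.array(rows), np.array(rhs)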
Chapter 6: Implementation Issues and Numerical Problems

This chapter discusses the design and implementation of a general framework for any region-based refinement algorithm. The framework is successfully used for the Validity Interval Analysis (VIA) and our new Validity Polyhedral Analysis (VPA) method. It is always necessary to study the numerical properties of a mathematical algorithm when implementing the algorithm on a digital machine with finite precision. Section 6.2 is devoted to these problems, which will be referred to as the numerical problems.
Chapter 7: Evaluation of Validity Polyhedral Analysis

Validity Polyhedral Analysis computes interface assertions of a neural network in polyhedral format. We evaluated VPA on toy neural networks, on neural networks trained with benchmark data sets of the UC Irvine database [Rep], and on a neural network trained to predict the SP500 stock-market index. Additionally, the method is compared to VIA (Validity Interval Analysis) and the refinement process is discussed.
Chapter 8: Conclusion and Future Work

This chapter summarizes the main contributions, explains how to "fine tune" the introduced VPA strategy, and finally motivates future investigations to obtain validation techniques for kernel-based machines, like for example support vector machines.
Appendices
Appendix A summarizes all used symbols and notations.

Appendix B recalls the relevant knowledge and notions of linear algebra to make this thesis self-contained.
1.6 Summary of this Chapter
As throughout this thesis, a summary of the chapter and a list of new contributions is provided.
To motivate neural network validation methods, examples of neural network components in safety-critical applications have been illustrated. Notation conventions have been introduced, neural network validation techniques discussed, and criteria for neural network validation methods have been formulated. Finally, the idea of validity polyhedral analysis was introduced, which is used to obtain an annotated version of a feed-forward neural network.
Contributions of Chapter 1

- The notion of Annotated Artificial Neural Networks (AANN) and the application of the method of assertions to neural networks.
- Validity Polyhedral Analysis (VPA), as a tool to annotate a feed-forward neural network with valid pre- and postconditions in the form of linear inequality predicates.
2 Analysis of Neural Networks
Section 2.1 recalls central ideas of artificial neural networks. For a very thorough introduction, the reader is referred to the excellent book by Haykin [Hay99].

As motivated in the introduction, trained neural network components need to undergo a validation or testing procedure before their (industrial) use. Section 2.2 is devoted to the topic of neural network validation.

Finally, a short summary of the discussed validation methods is provided, and our approach, named Validity Polyhedral Analysis (VPA), is motivated (why we do it) and justified (why we do it this way).
ap-2.1 Neural Networks
Artificial Neural Networks (ANNs) are partly inspired by observations about the biological brain. These observations led to the conclusion that information in biological neural systems is processed in parallel over a network of a large number of interconnected, distributed neurons (simple computational units). However, there are a lot of differences between biological neural systems and ANNs. For example, the output of a neuron of an ANN is a single value, whereas a biological neuron communicates via a sequence of pulses.

ANNs are general function approximators, able to learn from a set of examples. Therefore ANNs are also often characterized as statistical learners. The main features of neural network based machines are [Fri98]:

- usually nonlinear transformations are used.
In this work we focus on sigmoidal feed-forward neural networks (also known as multilayer perceptrons), see Figure 2.1. The input to a neuron of a feed-forward neural network is calculated by computing the weighted sum of the activation of preceding neurons and adding a bias: net = <w, x> + theta, where w is the weight vector, x the vector of activations of the preceding neurons and theta the bias. The activation of the neuron is denoted with y = f(net), where f is the activation function.

In case of a threshold activation function, a neuron is active (has a positive value) if the weighted sum of its inputs exceeds the threshold. A hyperplane defined by <w, x> = 0 is a hyperplane through the origin, and splits the input space into two half-spaces. This geometrical interpretation also explains why, historically [Ros58], neural networks were introduced as classifiers. For example, for a linearly separable classification problem a single weight layer neural network would be sufficient to learn the task correctly. Labeled data is said to be linearly separable if the patterns lie on opposite sides of a hyperplane.
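A minimal sketch of this geometric view (our own, with hypothetical weights):

    import numpy as np

    def threshold_neuron(w, theta, x):
        """A single threshold neuron: active exactly when the weighted sum
        <w, x> + theta is positive, i.e. when x lies on the positive side of
        the hyperplane {x : <w, x> + theta = 0}."""
        return 1 if np.dot(w, x) + theta > 0 else 0

    w, theta = np.array([1.0, -1.0]), 0.0   # theta = 0: hyperplane through the origin

    print(threshold_neuron(w, theta, np.array([2.0, 1.0])))  # 1: positive half-space
    print(threshold_neuron(w, theta, np.array([1.0, 2.0])))  # 0: the other half-space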
Trang 37Figure 2.1: Single neuron of a multilayer perceptron, wherek
ÁÀÂMYKF are the weighted
Often the threshold function or the logistic sigmoid function,

    threshold(net) = 1 if net >= 0, and 0 otherwise,
    logsig(net) = 1 / (1 + e^(-net)),

are used as activation functions for feed-forward neural networks. The graphs of these functions are on the left of Figure 2.2. The figure also shows the function output of a single output neuron with a two-dimensional input space, when applying a) the logsig and b) the threshold function to the neuron input.
Trang 38Figure 2.2: Sigmoid and threshold activation functions and the graph of the function
computed by a single neuron with a two-dimensional input In this case the weight
transfer function
A feed-forward neural network architecture has an input layer, several hidden layers and an output layer. The input vector propagates through all layers of the neural network in a forward direction. The dimension (size) of the input layer, i.e. the number of input neurons, and the dimension of the output layer are defined by the application. It is difficult to determine a priori a suitable number of hidden layers and the number of hidden neurons for each of these layers. This has to be solved during the model selection process. In Figure 2.3 we show a typical multilayer perceptron architecture.
Trang 39θ θ
= W −
net 2x
2 2
connecting the previous layer
W ,θ θ2 Layer Weight matrix, and successor layer and the bias.
x2 input to the second layer input vector to the second
y the neural network output vector
and output of the first transfer−
function layer.
Neural Network Architecture
(net) output vector
Weight Layer
Weight Layer Layer
(net) input vector Layer
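The forward propagation just described can be sketched, for a two-layer network as in Figure 2.3, as follows (our own illustration with hypothetical random weights; logsig as defined above):

    import numpy as np

    def logsig(net):
        # Matlab's logsig: the logistic sigmoid 1 / (1 + exp(-net)).
        return 1.0 / (1.0 + np.exp(-net))

    def forward(x, layers):
        """Propagate an input vector through a sequence of (W, theta) weight layers,
        applying the sigmoid transfer function after each affine step."""
        for W, theta in layers:
            x = logsig(W @ x + theta)
        return x

    # A hypothetical 2-3-1 network: 2 inputs, 3 hidden neurons, 1 output neuron.
    rng = np.random.default_rng(0)
    layers = [(rng.standard_normal((3, 2)), rng.standard_normal(3)),
              (rng.standard_normal((1, 3)), rng.standard_normal(1))]

    y = forward(np.array([0.5, -1.0]), layers)   # the neural network output vector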
The most widespread learning method is supervised learning. Supervised learning assumes that the output data for the training set is available. The neural network learns by stepwise adjusting the weight parameters and biases according to the difference between its actual output and the desired output. An example of this type of learning is the backpropagation of error algorithm. This algorithm is typically used to train feed-forward neural networks.
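As a minimal illustration of this idea (our own sketch, not the full backpropagation algorithm), one learning step for a single logsig neuron under the squared error E = 0.5 * (y - target)^2:

    import numpy as np

    def learning_step(w, theta, x, target, lr=0.1):
        """Adjust the weight vector and bias of a single logsig neuron according
        to the difference between its actual output y and the desired output."""
        net = np.dot(w, x) + theta
        y = 1.0 / (1.0 + np.exp(-net))        # logsig activation
        delta = (y - target) * y * (1.0 - y)  # dE/d(net) by the chain rule
        return w - lr * delta * x, theta - lr * delta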
Other often applied machine learning strategies are unsupervised learning and reinforcement learning. Unsupervised learning is used, for example, to cluster data points with unknown class labels into groups with similar inputs.

Reinforcement learning describes a process where an autonomous agent that acts in an environment learns to choose optimal actions to achieve its goals. The reader is referred to the book by Fritzke [Fri98] for more information on unsupervised learning and to the book by Mitchell [Mit97] for reinforcement learning.
2.2 Validation of Neural Network Components
Artificial neural networks have numerous advantages, like for example universal approximation capability and the ability to learn. However, neural networks have often been criticized for their low degree of comprehensibility. For a user of a neural network it is impossible to infer how a specific output is obtained.

Validation of ANNs is important in safety-critical problem domains; for the integration of ANN components into large software environments, interface assertions describing the behaviour of the neural network are desirable.

This section gives a literature overview of methods useful to explain the function computed by an ANN. The methods are categorized into three groups, namely propositional rule extraction, fuzzy rule extraction and region-based analysis. Before describing some approaches of these classes in detail, we will define the problem of validating neural network components, introduce some useful formalism and provide an example. The excellent introductory book to computer science by Broy [Bro97] starts with the definition of the terms information, representation and interpretation.
Definition 2.1 Information and Representation
We call information the abstract content (semantics) of a document or expression; a representation is the concrete form in which this information is written down.
Definition 2.2 Interpretation
We can view an interpretation function as a mapping from a representation to the information it represents [Bro97]. Two important remarks [Bro97]: