
Analysing the behaviour of neural networks



Queensland University of Technology, Brisbane

Center for Information Technology Innovation

March 2004


The author hereby grants permission to the Queensland University of Technology, Brisbane to reproduce and distribute publicly paper and electronic copies of this thesis document in whole or in part.


Keywords: Artificial Neural Network, Annotated Artificial Neural Network, Rule-Extraction, Validation of Neural Networks, Polyhedra, Forward-propagation, Backward-propagation, Refinement Process, Non-linear Optimization, Polyhedral Computation, Polyhedral Projection Techniques

by Stephan Breutel

Abstract

A new method is developed to determine a set of informative and refined interface assertions satisfied by functions that are represented by feed-forward neural networks. Neural networks have often been criticized for their low degree of comprehensibility.

It is difficult to have confidence in software components if they have no clear and valid interface description. Precise and understandable interface assertions for a neural network based software component are required for safety-critical applications and for the integration into larger software systems.

The new method consists in computing refined interface assertions, which can be viewed as the computation of the strongest pre- and postconditions a feed-forward neural network fulfills. Unions of polyhedra (polyhedra are the generalization of convex polygons to higher dimensional spaces) are well suited for describing arbitrary regions of higher dimensional vector spaces. Additionally, polyhedra are closed under affine transformations.

Given a feed-forward neural network, our method produces an annotated neural network, where each layer is annotated with a set of valid linear inequality predicates. The main challenges for the computation of these assertions are to compute the solution of a non-linear optimization problem and the projection of a polyhedron onto a lower-dimensional subspace.


Contents

List of Figures

1 Introduction
1.1 Motivation and Significance
1.2 Notations and Definitions
1.3 Software Verification and Neural Network Validation
1.4 Annotated Artificial Neural Networks
1.5 Highlights and Organization of this Dissertation
1.6 Summary of this Chapter

2 Analysis of Neural Networks
2.1 Neural Networks
2.2 Validation of Neural Network Components
2.2.1 Propositional Rule Extraction
2.2.2 Fuzzy Rule Extraction
2.2.3 Region-based Analysis
2.3 Overview of Discussed Neural Network Validation Techniques and Validity Polyhedral Analysis

3 Polyhedra and Deformations of Polyhedral Facets under Sigmoidal Transformations
3.1 Polyhedra and their Representation
3.2 Operations on Polyhedra and Important Properties
3.3 Deformations of Polyhedral Facets under Sigmoidal Transformations
3.4 Summary of this Chapter

4 Nonlinear Transformation Phase
4.1 Mathematical Analysis of Non-Axis-parallel Splits of a Polyhedron
4.2 Mathematical Analysis of a Polyhedral Wrapping of a Region
4.2.1 Sequential Quadratic Programming
4.2.2 Maximum Slice Approach
4.2.3 Branch and Bound Approach
4.2.4 Binary Search Approach
4.3 Complexity Analysis of the Branch and Bound and the Binary Search Method
4.4 Summary of this Chapter

5 Affine Transformation Phase
5.1 Introduction to the Problem
5.2 Backward Propagation Phase
5.3 Forward Propagation Phase
5.4 Projection of a Polyhedron onto a Subspace
5.4.1 Fourier-Motzkin
5.4.1.1 A Variation of Fourier-Motzkin
5.4.2 Block Elimination
5.4.3 The S-Box Approximation
5.4.3.1 Projection of a Face
5.4.3.2 Determination of Facets
5.4.3.3 Further Improvements of the S-Box Method
5.4.4 Experiments
5.5 Further Considerations about the Approximation of the Image
5.6 Summary of this Chapter

6 Implementation Issues and Numerical Problems
6.1 The Framework
6.2 Numerical Problems
6.3 Summary of this Chapter

7 Evaluation of Validity Polyhedral Analysis
7.1 Overview and General Procedure
7.2 Circle Neural Network
7.3 Benchmark Data Sets
7.3.1 Iris Neural Network
7.3.2 Pima Neural Network
7.4 SP500 Neural Network
7.5 Summary of this Chapter

8 Conclusion and Future Work
8.1 Contributions of this Thesis
8.2 Fine Tuning of VPA


List of Figures

1.1 Annotated version of a neural network
2.1 Single neuron of a multilayer perceptron
2.2 Sigmoid and threshold activation functions and the graph of the function computed by a single neuron with a two-dimensional input
2.3 Two-layer feed-forward neural network
2.4 Overview of different validation methods for neural networks
2.5 Example for the KT-Method
2.6 Example for the M-of-N method
2.7 DIBA: recursive projection of hyperplanes onto each other
2.8 DIBA: a decision region and the traversing of a line
2.9 Annotation of a neural network with validity intervals
2.10 Piece-wise linear approximation of the sigmoid function
3.1 Combination of two vectors in a two-dimensional space, from left to right: linear, non-negative, affine and convex combination
3.2 The back-propagation of a polyhedron through the transfer function layer: given the polyhedral description $P$ in the output space of a transfer function layer, the reciprocal image of this polyhedron under the non-linear transfer function $\sigma$ is $\sigma^{-1}(P)$
3.3 Approximation of the non-linear region
3.4 Subdivision into cells
3.5 Example for subdivision into cells
3.6 Point sampling approach
3.7 Eigenvalue and eigenvector analysis of the facet manifold
3.8 Example for convex curvature
3.9 Example for concave curvature
4.1 Non-axis-parallel split of a polyhedron
4.2 Polyhedral wrapping of the non-linear region
4.3 Application of branch and bound
4.4 Two-dimensional example for the binary search method
5.1 The forward- and backward-propagation of a polyhedron through the weight layer
5.2 The projection of a polyhedron onto a two-dimensional subspace
5.3 Relevant hinge
5.4 Possible …
5.5 The projected polyhedron is contained in the approximation
6.1 Overview of the framework
7.1 Visualisation of the behaviour of the circle neural network
7.2 Projection onto the subspace …
7.3 Projection onto the subspace …
7.4 The computed output regions for the Pima Neural Network
7.5 The computed input region for the SP500 Neural Network


List of Tables

2.1 Overview of neural network validation techniques


List of Algorithms

5.1 projExample
6.1 net-struct
6.2 mainLoop
6.3 forwardStep
6.4 mainVIA
6.5 mainVPA
6.6 numExample


The work contained in this thesis has not been previously submitted for a degree or diploma at any other higher education institution. To the best of my knowledge and belief, the thesis contains no material previously published or written by another person except where due reference is made.

Stephan Breutel

March 2004


I would like to thank my principal supervisor Frédéric Maire, without whose unbounded energy, idealism, motivation and immense enthusiasm this research would not have been possible. It was an honor to receive wisdom and guidance from all my supervisory team. A special thanks also to my associate supervisor Ross Hayward, who provided excellent support whenever I needed it.

Apart from my supervisors, I would like to thank the other panel members of my oral defense, Joaquin Sitte and Arthur ter Hofstede, for their valuable comments about this thesis.

I would also like to thank all of my colleagues in my research center for their help and encouragement during my time here, and in particular all the people of the Smart Devices Laboratory. It would take too many pages to list them all.

I was fortunate to have many friends with whom I enjoyed fantastic climbing sessions at Kangaroo Point and had many great times at the Press Club whilst listening to superb Jazz performances, all of which helped to keep me relatively sane during my time at QUT.

I would also like to thank the Coffee Coffee Coffee crew for keeping me awake by serving me literally over a thousand coffees (to be precise, 1084).

Finally, I really would like to thank all my friends and my family back home in Bavaria for supporting my endeavour. A special thanks to my parents for their moral and financial support.

This work was supported in part by an IPRS Scholarship and a QUT Faculty of Information Technology Scholarship.


1 Introduction

The organization of this thesis is outlined in section 1.5.

1.1 Motivation and Significance

A conclusion of the report “Industrial use of safety-related artificial neural networks” by Lisboa [Lis01] is that one of the keys to a successful transfer of neural networks to the marketplace is the integration with other systems (e.g. standard software systems, fuzzy systems, rule-based systems). This requires an analysis of the behaviour of neural network based components. Examples of products using neural networks in safety-critical areas are [Lis01]:

Explosive detection, 1987. SNOOPE from SAIC is an explosive detector. It was motivated by the need to detect the plastic explosive Semtex. The detector irradiated suitcases with low energy neutrons and collected an emission gamma-ray spectrum. A standard feed-forward neural network was used to classify between: bulk explosive, sheet explosive and no explosive. However, there were several practical problems. For example, the 4% false-positive rate of the MLP resulted in a large number of items to be checked. This is not practical, especially for highly frequented airports such as Heathrow and Los Angeles, where the system was tested.

Financial risk management. PRISM by Nestor relied on a recursive adaptive model; HNC's Falcon is based on a regularised multilayer perceptron (MLP). Both systems are still market leaders for credit card fraud detection [LVE00].

Siemens applied neural networks to the control of steel rolling mills. A prototype neural network based model was used for strip temperature and rolling force at the hot strip mill of Hoesch, in Dortmund, in 1993. Later Siemens applied this technology at 40 rolling mills world-wide. Siemens' experience indicates that neural networks always complement, and never replace, physical models. Additionally, domain expertise is essential in the validation process. A third observation was that data requirements are severe.

NASA and Boeing are testing a neural network based damage recovery control system for military and commercial aircraft. This system aims to add a significant margin of safety to fly-by-wire control when the aircraft sustains major equipment or system failure, ranging from the inability to use flaps to encountering extreme icing.

Vibration analysis monitoring in jet engines is a joint research project by Rolls-Royce and the Department of Engineering at Oxford University. The diagnostic system QUINCE combines the outputs from neural networks with template matching, statistical processing and signal processing methods. The software is designed for the pass-off test of jet engines. It includes a tracking facility for the most likely fault. Another project is a real-time in-flight monitoring system of the Trent 900 Rolls-Royce engine. The project combines different techniques, like for example Kalman filters with signal processing methods and neural networks.

In a European collaborative project involving leading car manufacturers, different control systems ranging from engine management models to physical speed control have been implemented. These control systems combined engineering expertise with non-linear interpolation by neural network architectures. It included rule-based systems, fuzzy systems and neural networks.

Siemens produces the FP-11 intelligent fire detector. This detector was trained from fire tests carried out over many years. According to [Lis01] this fire detector triggered one-thirtieth the false alarms of conventional detectors. The detector is based on a digital implementation of fuzzy logic, with rules discovered by a neural network but validated by human experts.

These examples show that neural networks need to be integrated with other systems, that it is relevant to extract rules learnt by neural networks (for example for the fire detector FP-11 or for credit card fraud detection) and that it is important to provide valid statements of the neural network behaviour in safety-critical applications.

Therefore, it is necessary to describe the neural network behaviour, e.g. in form of valid and refined rules. Additionally, it is interesting to obtain explanations for the neural network behaviour.

However, our main motivation is to compute valid statements about the neural network behaviour and as such help to prevent software faults. Software errors can cause a lot of problems and risks, especially in safety-critical environments. Several software errors and their consequences are collected in [Huc99]. The explosion of the Ariane 5 rocket and the loss of the Mars Climate Orbiter [Huc99] are recent examples of consequences of software errors.

A description of the neural network behaviour helps, among other things, to control the generalization of a trained neural network. Generalization expresses the ability of the neural network to produce correct output for previously unseen input data; a description of the neural network behaviour will provide a better insight into the neural network generalization capability. It also helps to visualize corresponding regions in the input and the output space of a neural network.

1.2 Notations and Definitions

The following notation conventions are inspired by the book by Fritzke [Fri98], and most of the conventions follow the Matlab [Mat00c] notation.

To refer to a column or row vector or any arbitrary element within a matrix, we use the Matlab indexing notation; for example, $W([i,j],:)$ extracts the $i$-th and the $j$-th row vector of the matrix $W$.


$[X, Y]$ denotes the horizontal concatenation of two matrices with the same number of rows. Additionally, we use the convention that definitions and expressions which are introduced for the first time are written in emphasized style. Function names are also written in emphasized style.

An overview of all symbols and special operators is provided in appendix A. However, the following table contains symbols which are already relevant for this chapter and for the literature review of neural network analysis techniques in Chapter 2.

The next table defines a special function symbol, one operator and the interval notation; for example, $[a, b]$ denotes the interval $a \le x \le b$.

To refer to sigmoidal functions we often use the Matlab terms logsig and tansig
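For readers unfamiliar with the Matlab terms, here is a minimal sketch of the two sigmoidal functions (these are the standard definitions behind Matlab's logsig and tansig; the Python rendering is ours, not the thesis's implementation):

```python
import numpy as np

def logsig(x):
    # Logistic sigmoid: maps the reals onto the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def tansig(x):
    # Hyperbolic-tangent sigmoid: maps the reals onto (-1, 1).
    # Matlab defines tansig(x) = 2/(1 + exp(-2x)) - 1, which equals tanh(x).
    return np.tanh(x)
```

Both functions are continuous and invertible, which is exactly the property the analysis later in this thesis requires of transfer functions.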

1 A glossary is not provided yet, but it will be included in the final version. We apologise for any inconvenience.


1.3 Software Verification and Neural Network Validation

We will divide software components, depending on the task and its implementation, into two classes: trainable software components and non-trainable software components.

Definition 1.1 Trainable Software Components

Software components for classification and non-linear regression, where the task is implicitly specified via a set of examples and a set of parameters is optimized according to these examples, are called trainable software components.

Typically, we use trainable software components where it is not easy or not possible to define a clear algorithm. For example, in tasks like speech recognition, image recognition or robotic control, often statistical learners like neural networks or support vector machines are applied.

Definition 1.2 Non-trainable Software Components

Software components, where the task is precisely specified, an algorithm can be defined and the task is implemented with a programming language, are called non-trainable software components.

We also refer to non-trainable software components as standard software.

Software Verification and Validation of Neural Network Components

Standard software program verification methods take the source code as input and prove its correctness against the (formal) specification. Among others, important concepts of software verification are pre- and postconditions.

Definition 1.3 Precondition and Postcondition

Given a predicate pre on the input data and a predicate post on the output data, the statement {pre} S {post} indicates that for every input state which fulfills pre before the execution of the statement S, the corresponding output state fulfills post after the execution.

We can view pre- and postconditions as specifications of the component properties. However, using pre- and postconditions does not assure that the software component fulfills these specifications; a formal proof is required.
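As a concrete illustration (our own example, not one from the thesis), a Hoare triple for a square-root routine reads:

$$\{\,x \ge 0\,\}\quad y := \sqrt{x}\quad \{\,y \ge 0 \,\wedge\, y^2 = x\,\}$$

Every input state satisfying the precondition $x \ge 0$ must lead to an output state satisfying the postcondition; establishing this for all inputs is exactly what the formal proof has to deliver.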

In the context of artificial neural networks we talk about validation techniques. In the following we discuss the central ideas of neural network validation techniques in a nutshell. The validation approaches for neural networks are propositional rule extraction, fuzzy rule extraction and region-based analysis. Propositional rule extraction methods take a (trained) neural network component as input, extract symbolic rules and test them against the network behaviour itself and against the test data. These methods are helpful to test neural networks.

Fuzzy rule extraction methods try to extract a set of fuzzy rules which mimics the behaviour of a neural network. The advantage of fuzzy rule extraction, compared to propositional rule extraction, is that generally fewer rules are needed to explain the neural network behaviour. In addition, with the use of linguistic expressions, easily understandable characterizations of the neural network behaviour are obtained.

Region-based analysis methods take a (trained) neural network as input and compute related regions in the input and output space. Region-based analysis techniques differ from the above methods because they have a geometric origin, are usable for a broader range of neural networks and have the ability to compute more accurate interface descriptions of a neural network component. These methods compute a region mapping between the input and the output space of the network. In contrast to rule extraction methods, these region-based rules agree exactly with the behaviour of the neural network. The more refined those regions are, the more information we obtain about the neural network. Validity Interval Analysis (VIA), developed by Thrun [Thr93], for example, is able to find provably correct axis-parallel rules, i.e. rules of the form: “if the input lies within given intervals, then the output lies within certain intervals”. Such region-based analysis approaches are suitable to validate the behaviour of neural network based software components.

The development of large software systems requires the interaction of different software components. As motivated with the examples, we need some kind of human-understandable description of neural network based software components (e.g. by using fuzzy rules) as well as techniques to assert important properties of the neural network behaviour (e.g. in form of valid relations between input and output regions). Our approach, to compute corresponding regions, represented as unions of polyhedra, in the input and the output space of a neural network, is able to validate properties about the neural network behaviour.

Definition 1.4 Polyhedral Precondition and Polyhedral Postcondition

A polyhedral precondition is a precondition where the constraints on the input data are expressed as a system of linear inequalities; analogously, a polyhedral postcondition is a postcondition where the constraints on the output data are expressed as a system of linear inequalities.

We can view polyhedral pre- and postconditions as conjunctions of linear inequality predicates. We also use the terminology “polyhedral interface assertions” or “polyhedral interface description”.
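A minimal sketch of what such an assertion looks like computationally; the H-representation {x : Ax ≤ b} is the standard encoding of a conjunction of linear inequality predicates, but the code itself is our illustration rather than the thesis's Matlab implementation:

```python
import numpy as np

class Polyhedron:
    """Conjunction of linear inequality predicates: {x : A x <= b}."""

    def __init__(self, A, b):
        self.A = np.asarray(A, dtype=float)
        self.b = np.asarray(b, dtype=float)

    def holds(self, x, tol=1e-9):
        # The assertion holds for a point iff every inequality holds.
        return bool(np.all(self.A @ x <= self.b + tol))

# Polyhedral precondition on a 2-dimensional input space:
# x1 >= 0, x2 >= 0, x1 + x2 <= 1 (a triangle).
pre = Polyhedron(A=[[-1, 0], [0, -1], [1, 1]], b=[0, 0, 1])
print(pre.holds(np.array([0.2, 0.3])))   # True
print(pre.holds(np.array([0.8, 0.9])))   # False
```

Because polyhedra are closed under affine transformations, propagating such an assertion through a weight layer keeps it in the same format; only the non-linear transfer-function layers force the approximation machinery discussed later.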

Among others, the following properties are desirable for methods analysing the behaviour of trained neural networks:

generality (also known in the literature as portability, e.g. see Andrews et al. [TAGD98]): algorithms that make no assumptions about the neural network architecture and the learning algorithm,

precise and concise description of the neural network behaviour, e.g. in form of a small number of informative and refined rules,

polynomial algorithmic time and space complexity, in other words an algorithm that is still applicable for higher-dimensional cases,

usability to validate properties about the neural network behaviour.

1.4 Annotated Artificial Neural Networks

Our approach is to forward- and backward-propagate finite unions of polyhedra through all layers of the neural network. This strategy can be viewed as an extension of Validity Interval Analysis (VIA) and is consequently named Validity Polyhedral Analysis (VPA). The method is very general, as the only assumptions are that we work with feed-forward neural networks (a brief introduction to feed-forward neural networks follows in Chapter 2) and that the network has invertible and continuous transfer functions. The whole analysis relies on polyhedra.

Our validation algorithm takes a (generally trained) neural network as input and produces an annotated version of this neural network.


Definition 1.5 Annotated Artificial Neural Network (AANN)

An artificial neural network, where the input and output of each layer is annotated via a set of valid pre- and postconditions, is named an Annotated Artificial Neural Network (AANN).

For example, VIA produces pre- and postconditions in form of axis-parallel rules; VPA annotates a neural network via a set of linear inequality predicates.

Figure 1.1: Annotated version of a neural network.
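The difference in expressiveness between the two annotation formats can be stated schematically (generic symbols, our own):

$$\text{VIA:}\ \bigwedge_{i}\; l_i \le x_i \le u_i \qquad\qquad \text{VPA:}\ \bigwedge_{j}\; a_j^{T} x \le b_j$$

Axis-parallel rules are the special case in which every normal vector $a_j$ is a signed unit vector, which is why a finite union of polyhedra can describe the behaviour at least as accurately as a union of hypercubes.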

Generally, the behaviour of a neural network can be described more accurately with a finite union of polyhedra than with a finite union of axis-parallel hypercubes.

There is an interesting analogy between the notion of annotated neural networks and software verification for programs. One strategy to verify the correctness of a program against a given specification is to annotate the program with logical expressions and to prove the correctness of each step. In our case the “program” is a neural network, and each layer is annotated with a set of valid linear inequality predicates.


A Bridge to Logic and Software Verification

The Hoare calculus provides a formal framework to verify the correctness of programs by annotating the program with assertions about the status of the program variables and the change of this status under the program execution. The Hoare calculus defines rules for the correct annotation of a program. The book by Broy [Bro97] provides a thorough introduction to the basics of the Hoare calculus. Within the scope of this thesis, the rule of statement sequence, the rule of consequence and the concepts of weaker and stronger pre- and postconditions are relevant. In the following, P, Q and R denote predicate-logical pre- and postconditions with program variables as free identifiers. Statements of the program are represented by S, S1 and S2. In the following description the rule conditions and the rule consequence are separated by a horizontal line.

Rule of Statement Sequence
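In the standard Hoare-calculus formulation (as introduced in [Bro97]), with the rule conditions above the horizontal line and the rule consequence below it, the rule of statement sequence reads:

$$\frac{\{P\}\ S_1\ \{Q\}\qquad \{Q\}\ S_2\ \{R\}}{\{P\}\ S_1;\,S_2\ \{R\}}$$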


Definition 1.7 Weaker and Stronger Postcondition

Given two postconditions Q1 and Q2 with Q1 ⇒ Q2, Q1 is the stronger and Q2 is the weaker postcondition.

We can denote a feed-forward neural network as a finite sequence of affine transformation layers and transfer-function layers; the assertions attached to these layers become the pre- and postconditions of the layers of an annotated neural network.

Rule of Statement Sequence for an Annotated Artificial Neural Network

Application of the rule of statement sequence to a multilayer feed-forward neural network allows us to write:

$$\{P_0\}\ L_1\ \{P_1\},\ \{P_1\}\ L_2\ \{P_2\},\ \ldots,\ \{P_{k-1}\}\ L_k\ \{P_k\}\ \vdash\ \{P_0\}\ \mathrm{ANN}\ \{P_k\}$$

where ANN represents the sequence of computations a feed-forward neural network performs and the $L_i$ stand for its individual layers.


The smaller the annotated regions of an annotated neural network are, the stronger the corresponding pre- and postconditions. It turns out that our geometrical perspective is quite useful, as it allows us to define a precise measurement for the strength of a precondition or postcondition, namely the volume of the corresponding region.
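A minimal sketch of how such a volume-based strength measure could be estimated for a polyhedron in H-representation; the Monte Carlo sampling inside a bounding box is our illustrative choice, not the thesis's procedure:

```python
import numpy as np

def mc_volume(A, b, lo, hi, n=100_000, seed=0):
    """Estimate the volume of {x : A x <= b} inside the box [lo, hi]^d.

    A smaller volume corresponds to a stronger (more refined)
    pre- or postcondition.
    """
    rng = np.random.default_rng(seed)
    d = A.shape[1]
    pts = rng.uniform(lo, hi, size=(n, d))
    inside = np.all(pts @ A.T <= b, axis=1)   # membership test per sample
    box_volume = (hi - lo) ** d
    return box_volume * inside.mean()

# The triangle x1 >= 0, x2 >= 0, x1 + x2 <= 1 has volume (area) 0.5.
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]])
b = np.array([0.0, 0.0, 1.0])
print(mc_volume(A, b, lo=0.0, hi=1.0))  # approximately 0.5
```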

1.5 Highlights and Organization of this Dissertation

The highlights and the organization of this thesis are as follows:

Chapter 2: Analysis of Neural Networks

In this chapter we introduce basic concepts of feed-forward neural networks and provide a literature overview of validation methods for neural network components. We classify the validation methods into propositional rule extraction, fuzzy rule extraction and region-based analysis. Finally, the different methods are compared and our approach, named Validity Polyhedral Analysis (VPA), is motivated.

Chapter 3: Polyhedral Computations and Deformations of Polyhedral Facets under Sigmoidal Transformations

Polyhedra are the generalization of convex polygons to higher dimensional spaces. This chapter presents the most important properties and concepts of polyhedral analysis to make this thesis self-contained.

To obtain refined polyhedral interface assertions, we have to propagate unions of polyhedra through all layers of a neural network. This requires computing the image of a polyhedron under a non-linear transformation. The images of non-axis-parallel polyhedra under a sigmoidal transformation are non-linear regions. In our initial investigations we analyse how polyhedral facets get twisted under a sigmoidal transformation.

Chapter 4: Mathematical Analysis of the Non-linear Transformation Phase

In this chapter we explain how to approximate the image of a polyhedron under a non-linear transformation by a finite union of polyhedra. This approximation process can be reduced to a non-linear optimization problem. Several approaches to approximate the global maximum of the corresponding optimization problem are discussed.
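To make this concrete, here is a minimal sketch of the optimization at the core of this phase, under our own simplifications: to wrap the image of a polyhedron P = {x : Ax ≤ b} under a componentwise logsig with a halfspace of chosen normal c, one maximizes c·logsig(x) over P. We use SciPy's SLSQP, a sequential quadratic programming method (SQP is one of the approaches listed in section 4.2.1); the thesis's own implementation and the other global approaches differ:

```python
import numpy as np
from scipy.optimize import minimize

def logsig(x):
    return 1.0 / (1.0 + np.exp(-x))

def wrap_halfspace(A, b, c, x0):
    """Find d such that c . y <= d holds for all y in logsig(P),
    where P = {x : A x <= b} and x0 is a feasible starting point.

    A local maximum only yields a candidate bound; a guaranteed
    bound needs a global method (e.g. branch and bound, binary search).
    """
    res = minimize(
        lambda x: -c @ logsig(x),           # maximize c . logsig(x)
        x0,
        method="SLSQP",
        constraints=[{"type": "ineq", "fun": lambda x: b - A @ x}],
    )
    return -res.fun

# Square -2 <= x1, x2 <= 2, wrapped from direction c = (1, 1).
A = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]], dtype=float)
b = np.array([2.0, 2.0, 2.0, 2.0])
d = wrap_halfspace(A, b, c=np.array([1.0, 1.0]), x0=np.zeros(2))
print(d)  # close to 2 * logsig(2)
```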

Chapter 5: Mathematical Analysis of the Affine Transformation Phase

The computation of the reciprocal image of a polyhedron under an affine transformation is explained in this chapter. Furthermore, this chapter discusses how to calculate the image of a polyhedron under an affine transformation, and strategies for computing or approximating the projection of a polyhedron onto a lower dimensional subspace. Within the scope of this thesis, projection techniques are used for the computation of the image of a polyhedron under an affine transformation characterized by a non-invertible matrix.
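To make the projection step concrete, here is a compact sketch of Fourier-Motzkin elimination of a single variable from {x : Ax ≤ b}, the textbook algorithm named in section 5.4.1 (redundancy removal and the thesis's refinements are omitted):

```python
import numpy as np

def fm_eliminate(A, b, k):
    """Fourier-Motzkin: project {x : A x <= b} by eliminating x_k.

    Returns (A', b') over the remaining coordinates. The number of
    inequalities can grow quadratically per eliminated variable,
    which is why Chapter 5 also studies approximate projections.
    """
    A, b = np.asarray(A, float), np.asarray(b, float)
    pos = np.where(A[:, k] > 0)[0]
    neg = np.where(A[:, k] < 0)[0]
    zero = np.where(A[:, k] == 0)[0]

    new_rows, new_rhs = [], []
    for i in zero:                    # rows not mentioning x_k survive
        new_rows.append(np.delete(A[i], k))
        new_rhs.append(b[i])
    for i in pos:                     # combine each upper bound on x_k
        for j in neg:                 # with each lower bound on x_k
            row = A[i, k] * A[j] - A[j, k] * A[i]   # coefficient of x_k is 0
            new_rows.append(np.delete(row, k))
            new_rhs.append(A[i, k] * b[j] - A[j, k] * b[i])
    return np.array(new_rows), np.array(new_rhs)

# Example: project the triangle {x1>=0, x2>=0, x1+x2<=1} onto the x1-axis.
A = [[-1, 0], [0, -1], [1, 1]]
b = [0, 0, 1]
print(fm_eliminate(A, b, k=1))  # inequalities equivalent to 0 <= x1 <= 1
```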

Chapter 6: Implementation Issues and Numerical Problems

This chapter discusses the design and implementation of a general framework for any region-based refinement algorithm. The framework is successfully used for the Validity Interval Analysis (VIA) and for our new Validity Polyhedral Analysis (VPA) method.

It is always necessary to study the numerical properties of a mathematical algorithm when implementing it on a digital machine with finite precision. Section 6.2 is devoted to these problems, which will be referred to as the numerical problems.

Chapter 7: Evaluation of Validity Polyhedral Analysis

Validity Polyhedral Analysis computes interface assertions of a neural network in polyhedral format. We evaluated VPA on toy neural networks, on neural networks trained with benchmark data sets of the UC Irvine database [Rep] and on a neural network trained to predict the SP500 stock-market index. Additionally, the method is compared to VIA (Validity Interval Analysis) and the refinement process is discussed.

Chapter 8: Conclusion and Future Work

This chapter summarizes the main contributions, explains how to “fine tune” the introduced VPA strategy, and finally motivates future investigations to obtain validation techniques for kernel-based machines, like for example support vector machines.

Appendices

Appendix A summarizes all used symbols and notations.

Appendix B recalls the relevant knowledge and notions of linear algebra to make this thesis self-contained.

1.6 Summary of this Chapter

As throughout this thesis, a summary of the chapter and a list of new contributions is provided.

To motivate neural network validation methods, examples of neural network components in safety-critical applications have been illustrated. Notation conventions have been introduced, neural network validation techniques discussed, and criteria for neural network validation methods have been formulated. Finally, the idea of validity polyhedral analysis was introduced, which is used to obtain an annotated version of a feed-forward neural network.

Contributions Chapter 1

The notion of Annotated Artificial Neural Networks (AANN) and the application of the method of assertions to neural networks.

Validity Polyhedral Analysis (VPA), as a tool to annotate a feed-forward neural network with valid pre- and postconditions in form of linear inequality predicates.


2 Analysis of Neural Networks

Section 2.1 recalls central ideas of artificial neural networks. For a very thorough introduction, the reader is referred to the excellent book by Haykin [Hay99].

As motivated in the introduction, trained neural network components need to undergo a validation or testing procedure before their (industrial) use. Section 2.2 is devoted to the topic of neural network validation.

Finally, a short summary of the discussed validation methods is provided, and our approach, named Validity Polyhedral Analysis (VPA), is motivated (why we do it) and justified (why we do it this way).

2.1 Neural Networks

Artificial Neural Networks (ANNs) are partly inspired by observations about the biological brain. These observations led to the conclusion that information in biological neural systems is processed in parallel over a network of a large number of interconnected, distributed neurons (simple computational units). However, there are a lot of differences between biological neural systems and ANNs. For example, the output of a neuron of an ANN is a single value, whereas a biological neuron emits a sequence of pulses. …

ANNs are general function approximators, able to learn from a set of examples. Therefore ANNs are also often characterized as statistical learners. The main features of neural network based machines are [Fri98]: … usually nonlinear transformations are used.

In this work we focus on sigmoidal feed-forward neural networks (also known as multilayer perceptrons); a single neuron is shown in Figure 2.1. The input to a neuron of a feed-forward neural network is calculated by computing the weighted sum of the activations of the preceding neurons and adding a bias: $\mathit{net} = w^{T}x + \theta$, where $w$ is the weight vector, $x$ the vector of activations of the preceding neurons and $\theta$ the bias. The activation of the neuron is denoted by $a = g(\mathit{net})$, where $g$ is the activation function.

In case of a threshold activation function, a neuron is active (has a positive value) if $w^{T}x + \theta > 0$. The hyperplane defined by $w^{T}x + \theta = 0$ (for $\theta = 0$, a hyperplane through the origin) splits the input space into two half-spaces. This geometrical interpretation also explains why, historically [Ros58], neural networks were introduced as classifiers. For example, for a linearly separable classification problem a single weight layer neural network would be sufficient to learn the task correctly. Labeled data is said to be linearly separable if the patterns lie on opposite sides of a hyperplane.
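As a small illustration of this geometric picture (our own example, not from the thesis), a single threshold neuron classifies points by the side of the hyperplane $w^{T}x + \theta = 0$ on which they lie:

```python
import numpy as np

def neuron_threshold(w, theta, x):
    # Net input: weighted sum of the inputs plus bias.
    net = np.dot(w, x) + theta
    return 1 if net > 0 else 0  # active iff x lies on the positive side

# The hyperplane x1 + x2 - 1 = 0 separates the two labeled points.
w, theta = np.array([1.0, 1.0]), -1.0
print(neuron_threshold(w, theta, np.array([0.9, 0.8])))  # 1 (positive side)
print(neuron_threshold(w, theta, np.array([0.1, 0.2])))  # 0 (negative side)
```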


Figure 2.1: Single neuron of a multilayer perceptron.

Often the threshold function and the sigmoid functions

$$\mathrm{logsig}(x) = \frac{1}{1 + e^{-x}}, \qquad \mathrm{tansig}(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

are used as activation functions for feed-forward neural networks. The graphs of these functions are shown on the left of Figure 2.2. The figure also shows the function output of a single output neuron with a two-dimensional input space, when applying a) the logsig and b) the threshold function to the neuron input.


Figure 2.2: Sigmoid and threshold activation functions and the graph of the function computed by a single neuron with a two-dimensional input.

A feed-forward neural network architecture has an input layer, several hidden layers and an output layer. The input vector propagates through all layers of the neural network in a forward direction. The dimension (size) of the input layer, i.e. the number of input neurons, and the dimension of the output layer are defined by the application. It is difficult to determine a priori a suitable number of hidden layers and the number of hidden neurons for each of these layers; this has to be solved during the model selection process. Figure 2.3 shows a typical multilayer perceptron architecture.


Figure 2.3: Two-layer feed-forward neural network architecture. $W^{2}, \theta^{2}$: weight matrix and bias connecting the previous layer and the successor layer, with net input $\mathit{net}^{2} = W^{2}x^{2} + \theta^{2}$; $x^{2}$: input vector to the second layer and output of the first transfer-function layer; $y$: the neural network output vector.
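The computation sketched in Figure 2.3 alternates weight layers and transfer-function layers; here is a minimal forward-pass sketch (array shapes and names are our own, not the thesis's framework):

```python
import numpy as np

def logsig(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, layers):
    """Feed-forward pass through a sigmoidal multilayer perceptron.

    layers: list of (W, theta) pairs; each weight layer computes
    net = W x + theta, each transfer-function layer applies logsig.
    """
    for W, theta in layers:
        x = logsig(W @ x + theta)
    return x

# 2 inputs -> 3 hidden neurons -> 1 output neuron.
rng = np.random.default_rng(42)
layers = [(rng.normal(size=(3, 2)), rng.normal(size=3)),
          (rng.normal(size=(1, 3)), rng.normal(size=1))]
y = forward(np.array([0.5, -0.2]), layers)
print(y)  # network output vector, one value in (0, 1)
```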

The most widespread learning method is supervised learning. Supervised learning assumes that the output data for the training set is available. The neural network learns by stepwise adjusting the weight parameters and biases according to the difference between its actual output and the desired output. An example of this type of learning is the backpropagation of error algorithm. This algorithm is typically used to train feed-forward neural networks.

Other often applied machine learning strategies are unsupervised learning and reinforcement learning. Unsupervised learning is used, for example, to cluster data points with unknown class labels into groups with similar inputs.


Reinforcement learning describes a process where an autonomous agent that acts in an environment learns to choose optimal actions to achieve its goals. The reader is referred to the book by Fritzke [Fri98] for more information on unsupervised learning and to the book by Mitchell [Mit97] for reinforcement learning.

2.2 Validation of Neural Network Components

Artificial neural networks have numerous advantages, like for example universal approximation capability and the ability to learn. However, neural networks have often been criticized for their low degree of comprehensibility: for a user of a neural network it is impossible to infer how a specific output is obtained.

Validation of ANNs is important in safety-critical problem domains and for the integration of ANN components into large software environments; interface assertions describing the behaviour of the neural network are desirable.

This section gives a literature overview of methods useful to explain the function computed by an ANN. The methods are categorized into three groups, namely propositional rule extraction, fuzzy rule extraction and region-based analysis. Before describing some approaches of these classes in detail, we will define the problem of validating neural network components, introduce some useful formalism and provide an example. The excellent introductory book to computer science by Broy [Bro97] starts with the definition of the terms information, representation and interpretation.

Definition 2.1 Information and Representation

We call information the abstract content (semantics) of a document, expression, …

Definition 2.2 Interpretation

We can view an interpretation function as a mapping from a representation to the information it represents [Bro97]. Two important remarks [Bro97]: …
