Figure 5.5 The CO-OCCURRENCE array is shown (a) depicted as a sequential memory array, running from low memory to high memory in the pair order A-B, A-C, A-D, A-E, B-C, B-D, B-E, C-D, C-E, D-E, and (b) with its mapping to the connections in the Boltzmann network.
The first four entries in the CO-OCCURRENCE array for this network would be mapped to the connections between units A-B, A-C, A-D, and A-E, as shown in Figure 5.5(b). Likewise, the next three slots would be mapped to the connections between units B-C, B-D, and B-E; the next two to C-D and C-E; and the last one to D-E. By using the arrays in this manner, we can collect co-occurrence statistics about the network by starting at the first input unit and sequentially scanning all other units in the network. After completing this initial pass, we can complete the network scan by merely incrementing our array pointer to access the second unit, then the third, fourth, ..., nth units.
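To make the pair-to-slot ordering concrete, here is a small Python helper (our own illustration, not part of the simulator's pseudocode) that computes the CO-OCCURRENCE slot for any pair of unit indices:

```python
def cooccurrence_index(i, j, n):
    """Map the unit pair (i, j), with 0 <= i < j < n, to its slot in a
    linear co-occurrence array ordered A-B, A-C, ..., as in Figure 5.5."""
    if not (0 <= i < j < n):
        raise ValueError("require 0 <= i < j < n")
    # Pairs starting at unit i begin after all pairs for units 0..i-1;
    # unit k contributes (n - 1 - k) pairs to the array.
    offset = i * (n - 1) - i * (i - 1) // 2
    return offset + (j - i - 1)

# For the five-unit network A..E, the ten pairs scan out in order:
pairs = [(i, j) for i in range(5) for j in range(i + 1, 5)]
```

Scanning the pairs for the five-unit example reproduces exactly the slot ordering shown in Figure 5.5(a).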
We can now specify the remaining data structures needed to implement the Boltzmann network simulator. We begin by defining the top-level record structure used to define the Boltzmann network:
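A rough Python sketch of such a record follows; the field names are inferred from the way the simulator routines in this section access the NET record (UNITS, CLAMPED, TEMPERATURE, the INPUTS/OUTPUTS ranges, the STATISTICS arrays, and the annealing schedule), so treat the exact layout as an assumption rather than a definitive structure:

```python
from dataclasses import dataclass

@dataclass
class Range:
    first: int    # index of the first unit in the range
    length: int   # number of units in the range

@dataclass
class Statistics:
    clamped: list    # co-occurrence sums, inputs clamped (p+)
    unclamped: list  # co-occurrence sums, free-running (p-)

@dataclass
class Boltzmann:
    units: int            # total number of units in the network
    inputs: Range         # visible input units
    outputs: Range        # visible output units
    clamped: bool         # current mode of operation
    temperature: float    # current annealing temperature
    annealing: list       # schedule as [(temperature, passes), ...]
    statistics: Statistics
    layer: object = None  # the single LAYER record
```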
Figure 5.6 provides an illustration of how the values in the BOLTZMANN structure interact to specify a Boltzmann network. Here, as in other network models, the layer structure is the gateway to the network-specific data structures. All that is needed to gain access to the layer-specific data are pointers to the appropriate arrays. Thus, the structure for the layer record is
Figure 5.6 Organization of the Boltzmann network using the defined data structure is shown. In this example, the input and output units are the same, and the network is in the third step of its annealing schedule.
record LAYER =
    outs    : ^float[];   {pointer to unit outputs array}
    weights : ^^float[];  {pointer to weight_ptr array}
end record;
where outs is a pointer used to locate the beginning of the unit outputs array in memory, and weights is a pointer to the intermediate weight_ptr array, which is used in turn to locate each of the input connection arrays in the system. Since the Boltzmann network requires only one layer of PEs, we will need only one layer pointer in the BOLTZMANN record. All these low-level data structures are exactly the same as those specified in the generic simulator discussed in Chapter 1.
Boltzmann Production Algorithms. Remember that information recall in the Boltzmann network consists of a sequence of steps where we first apply an input to the network, raise the temperature to some predefined level, and anneal the network while slowly lowering the temperature. In this example, we would initially raise the temperature to 5 and would perform four stochastic signal propagations; we would then lower the temperature to 4 and would perform six signal propagations, and so on. After completing the four required signal propagations when the temperature of the network is 1, we can consider the network annealed. At this point, we simply read the output values from the visible units.
If, however, we think about the process we just described, we can decompose the information-recall problem into three lower-level subroutines:

Temperature   5   4   3   2   1
Passes        4   6   7   6   4

Table 5.1 The annealing schedule for the simulator example.
These functions, each implemented as a subroutine that is called by the parent anneal process, are described next.

apply_input: A routine used to take a user-provided or training input and apply it to the network, and to initialize the output from all unknown units at a low temperature.

set_temp: A procedure used to set the current network temperature and annealing-schedule pass count to the values specified in the overall annealing schedule.

propagate: A function used to perform one signal propagation through the entire network, using the current temperature and probabilistic unit selection. This routine should be capable of performing the signal propagation regardless of the network state (clamped or unclamped).
Signal Propagation in the Boltzmann Network. We shall now define the most basic of the needed subroutines, the propagate procedure. The algorithm for this procedure, which follows, presumes that the user-provided apply_input and not-yet-defined set_temp functions have been executed to initialize the outputs of the network's units and temperature parameter to the desired states.
procedure propagate (NET:BOLTZMANN)
{perform one signal propagation pass through network}
var unit : integer;       {randomly selected unit}
    p : float;            {probability of unit being on}
    neti : float;         {net input to unit}
    threshold : integer;  {point at which unit turns on}
    i, j : integer;       {iteration counters}
    inputs : ^float[];    {pointer to unit outputs array}
    connects : ^float[];  {pointer to unit weights array}
    unclamped : integer;  {index to first unclamped unit}
    firstc : integer;     {index to first connection}
begin
  {locate the first nonvisible unit, assuming first index = 1}
  unclamped = NET.OUTPUTS.FIRST + NET.OUTPUTS.LENGTH - 1;
  if (NET.INPUTS.FIRST = NET.OUTPUTS.FIRST)
    then firstc = NET.INPUTS.FIRST         {Boltzmann completion}
    else firstc = NET.INPUTS.LENGTH + 1;   {Boltzmann input-output}
  end if;
  for i = 1 to NET.UNITS     {for as many units in network}
  do
    if (NET.CLAMPED)         {if network is clamped}
      then                   {select an unclamped unit}
        unit = random (NET.UNITS - unclamped) + unclamped
      else                   {any unit may be updated}
        unit = random (NET.UNITS);
    end if;
    inputs = NET.LAYER^.OUTS;              {locate unit outputs}
    connects = NET.LAYER^.WEIGHTS^[unit];  {locate connections to unit}
    neti = 0;                              {clear input accumulator}
    for j = firstc to NET.UNITS            {all connections to unit}
    do                                     {compute sum of products}
      neti = neti + inputs[j] * connects[j];
    end do;
    {this next statement is used to improve performance,
     as described in the text}
    neti = neti - inputs[unit] * connects[unit];
    p = 1.0 / (1.0 + exp (-neti / NET.TEMPERATURE));
    threshold = round (p * CEILING);   {CEILING is an arbitrarily large constant}
    if (random (CEILING) <= threshold) {roll the dice}
      then inputs[unit] = 1
      else inputs[unit] = 0;
    end if;
  end do;
end procedure;
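For comparison, one way to render the complete propagation pass in Python is sketched below. It follows the same logic described in the text — skip the first N visible units when selecting a unit in clamped mode, sum all products and subtract the self-connection term afterward, then activate the unit stochastically — but it flattens the pointer structures into a simple weight matrix, so it is an illustration rather than a transcription:

```python
import math
import random

def propagate(outputs, weights, temperature, clamped, n_visible):
    """One stochastic signal-propagation pass.

    outputs   : list of unit outputs (0.0 or 1.0); first n_visible are visible
    weights   : square matrix; weights[i][j] = connection from unit j to unit i
                (the self slot weights[i][i] exists but is subtracted out)
    """
    n = len(outputs)
    for _ in range(n):
        if clamped:
            # select only among units beyond the first n_visible
            unit = n_visible + random.randrange(n - n_visible)
        else:
            unit = random.randrange(n)
        # sum over all connection slots, then remove the self term
        neti = sum(w * o for w, o in zip(weights[unit], outputs))
        neti -= weights[unit][unit] * outputs[unit]
        p = 1.0 / (1.0 + math.exp(-neti / temperature))
        outputs[unit] = 1.0 if random.random() < p else 0.0
    return outputs
```

Note that this sketch omits the firstc distinction between the completion and input-output configurations; it always processes the full connection row.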
Before we move on to the next routine, there are three aspects of the propagate procedure that bear further discussion: the selection mechanism for unit update, the computation of the neti term, and the method we have chosen for determining when a unit is or is not active.
In the first case, the Boltzmann network must be able to run with its inputs either clamped or free-running. So that we do not need to have different propagate routines for each mode, we simply use a Boolean variable in the network record to indicate the current mode of operation, and enable the propagate routine to select a unit for update accordingly. If the network is clamped, we cannot select an input or output unit for update. We account for these differences by assuming that the visible units to the network are the first N units in the layer. We thus can be assured that the visible units will not change if we simply select a random unit from the set of units that do not include the first N units. We accomplish this selection by decreasing the range of the random-number generator to the number of network units minus N, and then adding N to the result. Since we have decided that all our arrays will use the first N indices to locate the visible units, generating a random index greater than N will always select a random unit beyond the range of the visible units. However, if the network is unclamped, any unit must be available for update.
Inspection of the algorithm for propagate will reveal that these two cases are handled by the if-then-else clause at the beginning of the routine.

Second, there are two salient points regarding the computation of the neti term with respect to the propagate routine. The first point is that connections between input units are processed only when the network is configured as a Boltzmann completion network. In the Boltzmann input-output mode, connections between input units do not exist. This structure conforms to the mathematical model described earlier. The second point about the calculation of the neti term is that we have obviously wasted computer time by processing a connection from each unit to itself twice: once as part of the summation loop during the calculation of the neti value, and once to subtract it out after the total neti has been calculated. The reason we have chosen to implement the algorithm in this manner is, again, to improve performance. Even though we have consumed computer time by processing a nonexistent connection for every unit in the network, we have used far less time than would be required to disallow the computation of the missing connection selectively during every iteration of the summation loop. Furthermore, we can easily eliminate the error introduced in the input summation by processing the nonexistent connection by subtracting out just that term after completing the loop, prior to updating the output of the unit. You might also observe that we have wasted memory by allocating space for the connections between each unit and itself. We have chosen to implement the network in this fashion to simplify processing, and thus to improve performance as described.
As an example of why it is desirable to optimize the code at the expense of wasted memory, consider the alternative case where only valid connections are modeled. Since no unit has a connection to itself, but all units have outputs maintained in the same array, the code to process all input connections to a unit would have to be written as two different loops: one for those input PEs that precede the current unit, where the array indices for outputs and connections correspond one-to-one, and one loop for inputs from units that follow, where unit outputs are displaced by one array entry from the corresponding connection. This situation occurs because we have organized the unit outputs and connections as linearly sequential arrays in memory. Such a situation is illustrated in Figure 5.7.
Figure 5.7 The illustration shows array processing (a) when memory is allocated for all possible connections, and (b) when memory is not allocated for intra-unit connections. In (a), the code necessary to perform this input summation simply computes the input value for all connections, then eliminates the error introduced by processing the nonexistent connection to itself. In (b), the code must be more selective about accessing connections, since the one-to-one mapping of connections to units is lost. Obviously, approach (a) is our preferred method, since it will execute much faster than approach (b).
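The tradeoff is easy to see in code. Both functions below compute the same net input; the first mirrors approach (a) (one uniform loop plus a correction term), the second mirrors approach (b) (two loops that skip the nonexistent self-connection). This is an illustrative sketch, not the simulator's code:

```python
def net_input_subtract(outputs, weights, unit):
    # approach (a): process every slot, then remove the self term
    neti = sum(w * o for w, o in zip(weights[unit], outputs))
    return neti - weights[unit][unit] * outputs[unit]

def net_input_two_loops(outputs, weights, unit):
    # approach (b): skip the self-connection explicitly, at the cost
    # of a test (here, a loop split) on every unit
    neti = 0.0
    for j in range(unit):                     # units that precede
        neti += weights[unit][j] * outputs[j]
    for j in range(unit + 1, len(outputs)):   # units that follow
        neti += weights[unit][j] * outputs[j]
    return neti
```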
Finally, with respect to deciding when to activate the output of a unit, recall that the Boltzmann network differs from the other networks that we have studied in that PEs are activated stochastically rather than deterministically. Recall that the equation

p_k = 1 / (1 + e^(-net_k / T))

defines how we calculate the probability that a unit x_k is active with respect to its input stimulation (net_k). However, simply knowing the probability that a unit will generate an output does not guarantee that the unit will generate an output. We must therefore implement a mechanism that allows the computer to translate the calculated probability into a unit output that occurs with the same probability; in effect, we must let the computer roll the dice to determine when an output is active and when it is not.
One method for doing this is to make use of the pseudorandom-number generator available in most high-level computer languages. Here, we take advantage of the fact that the computed probability, p_k, will always be a fractional number ranging between zero and one, as illustrated by the graph depicted in Figure 5.8. We can map p_k to an integer threshold value between zero and some arbitrarily large number by simply multiplying the ceiling value by the computed probability and rounding the result into an integer. We then generate a random number between zero and the selected ceiling, and, if the random number does not exceed the threshold value just computed, the output of the unit is set to one. Assuming that the pseudorandom-number generator has a uniform probability distribution across the interval of interest, the random number produced will not exceed the threshold value with a probability equal to the specified value, p_k. Thus, we now have a means of stochastically activating unit outputs in the network.
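In code, the dice roll reduces to a few lines; the ceiling of 32767 below is an arbitrary stand-in for "some arbitrarily large number":

```python
import random

CEILING = 32767  # arbitrary large ceiling for the integer threshold

def stochastic_output(p_k):
    """Return 1 with probability (approximately) p_k, using the
    integer-threshold scheme described in the text."""
    threshold = round(p_k * CEILING)
    return 1 if random.randint(0, CEILING) <= threshold else 0
```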
Figure 5.8 Shown here is a graph of the probability, p_k, that the kth unit is on, as a function of net input, at five different temperatures, T.
Boltzmann Learning Algorithms. There are five additional functions that must be defined to train the Boltzmann network:

set_temp: A function used to update the parameters in the BOLTZMANN record to reflect the network temperature at the current step, as specified in the annealing schedule.

sum_cooccurrence: A procedure used to accumulate the co-occurrence statistics for the network after each annealing pass.

pplus: A function used to compute and average the co-occurrence probabilities for a network with clamped inputs after it has reached equilibrium at the minimum temperature.

pminus: A function similar to pplus, but used when the network is running free.

update_connections: The procedure that modifies the connection weights in the network to train the Boltzmann simulator.
The implementation of the set_temp function is straightforward, as defined here:

procedure set_temp (NET:BOLTZMANN; N:integer)
{set the temperature and schedule step}
begin
  NET.TEMPERATURE = NET.ANNEALING^.STEP[N].TEMPERATURE;
end procedure;
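A Python rendering of the same idea follows; representing the annealing schedule as a list of (temperature, passes) steps is our assumption about the ANNEALING structure, not a transcription of it:

```python
def set_temp(net, step):
    """Set the network temperature and pass count for annealing step
    `step` (1-based, as in the pseudocode)."""
    temperature, passes = net["annealing"][step - 1]
    net["temperature"] = temperature
    net["passes"] = passes
```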
We shall now turn our attention to the computation of the co-occurrence probability, p+ij, when the input to the network is clamped to an arbitrary input vector, Va. As we did with propagate, we will assume that the input pattern has been placed on the input units by an earlier call to set_inputs. Furthermore, we shall assume that the statistics arrays have been initialized by an earlier call to a user-supplied routine that we refer to as zero_statistics.

procedure sum_cooccurrence (NET:BOLTZMANN)
{accumulate co-occurrence statistics for the specified network}
var i, j, k : integer;   {loop counters}
    connect : integer;   {co-occurrence index}
    outputs : ^float[];  {pointer to unit outputs array}
    stats : ^float[];    {pointer to statistics array}
begin
  if (NET.CLAMPED)       {if network is clamped}
    then stats = NET.STATISTICS.CLAMPED
    else stats = NET.STATISTICS.UNCLAMPED;
  end if;
  outputs = NET.LAYER^.OUTS;       {locate unit outputs}
  for i = 1 to 5         {arbitrary number of cycles}
  do
    propagate (NET);     {run the network once}
    connect = 1;         {start at first pair}
    for j = 1 to NET.UNITS - 1     {scan all unit pairs}
    do
      for k = j + 1 to NET.UNITS
      do
        if (outputs[j] = 1 and outputs[k] = 1)
          then           {both units on; they co-occur}
            stats^[connect] = stats^[connect] + 1;
        end if;
        connect = next (connect);  {advance to next pair}
      end do;
    end do;
  end do;
end procedure;
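The same statistics pass can be sketched in Python. The flat list of pair slots mirrors the CO-OCCURRENCE ordering described earlier; the flat-list layout and the single-state argument are simplifications for illustration:

```python
def sum_cooccurrence(outputs_after_propagate, stats):
    """Accumulate co-occurrence counts for one annealed network state.

    outputs_after_propagate : list of unit outputs (0/1) after propagate
    stats : flat list with one slot per unit pair, scanned A-B, A-C, ...
    """
    n = len(outputs_after_propagate)
    connect = 0                       # start at first pair
    for j in range(n - 1):
        for k in range(j + 1, n):
            if outputs_after_propagate[j] == 1 and outputs_after_propagate[k] == 1:
                stats[connect] += 1   # both units on: they co-occur
            connect += 1              # advance to next pair
    return stats
```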
Before we define the algorithm needed to estimate the p+ term for the Boltzmann network, we will make a few assumptions. Since the total number of training patterns that the network must learn will depend on the application, we must write the code so that the computer will calculate the co-occurrence statistics for a variable number of training patterns. We must therefore assume that the training data are available to the simulator from some external source (such as a global array or disk file) that we will refer to as PATTERNS, and that the total number of training patterns contained in this source is obtainable through a call to an application-defined function that we will call how_many. We also presume that you will provide the routines to initialize the co-occurrence arrays to zero, and set the outputs of the input network units to the state specified by the ith pattern in the PATTERNS data source. We will refer to these procedures as initialize_arrays and set_inputs, respectively. Based on these assumptions, we shall now define our algorithm for computing pplus:

procedure pplus (NET:BOLTZMANN)
var trials : integer;    {average over trials}
    i : integer;         {loop counter}
begin
  trials = how_many (PATTERNS) * 5;               {five sums per pattern}
  for i = 1 to (NET.UNITS * (NET.UNITS - 1) / 2)  {for each unit pair}
  do
    NET.STATISTICS.CLAMPED^[i] =                  {average results}
      NET.STATISTICS.CLAMPED^[i] / trials;
  end do;
end procedure;
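In Python, the averaging step of pplus is a single pass over the clamped statistics; a pminus twin would do the same over the unclamped statistics. The list representation is again a simplification:

```python
def pplus(clamped_stats, n_patterns, cycles=5):
    """Convert accumulated clamped co-occurrence counts into probability
    estimates by averaging over all trials (cycles per pattern)."""
    trials = n_patterns * cycles   # five sums per pattern by default
    return [count / trials for count in clamped_stats]
```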
5.3.4 The Complete Boltzmann Simulator
Now that we have defined all the lower-level functions needed to implement the Boltzmann network, we shall describe the algorithms needed to tie everything together. As previously stated, the two user-provided routines (set_inputs and get_outputs) are assumed to initialize and recover input and output data to or from the simulator for an external process. However, we have yet to define the two intermediate routines that will be used to perform the network simulation given the externally provided inputs. We now begin to correct that deficiency by describing the algorithm for the anneal process.
procedure anneal (NET:BOLTZMANN)
{perform one pass through annealing schedule for current input}
var passes : integer;   {passes at current temperature}
    steps : integer;    {number of steps in schedule}
    i, j : integer;     {loop counters}
begin
  steps = NET.ANNEALING^.LENGTH;   {number of steps in the schedule}
  for i = 1 to steps               {for all steps in schedule}
  do
    passes = NET.ANNEALING^.STEP[i].PASSES;
    set_temp (NET, i);             {set current annealing step}
    for j = 1 to passes            {for all passes in step}
    do
      propagate (NET);             {perform one signal propagation}
    end do;
  end do;
end procedure;
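Rendered in Python, the annealing pass is just two nested loops over the schedule; the (temperature, passes) schedule representation and the propagate callback are illustrative assumptions:

```python
def anneal(net, propagate):
    """Run one full pass through the annealing schedule for the current input."""
    for temperature, passes in net["annealing"]:
        net["temperature"] = temperature   # set current annealing step
        for _ in range(passes):            # for all passes in this step
            propagate(net)
```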
The update_connections procedure modifies the connection weights using the co-occurrence statistics collected during the annealing process. This routine will compute and apply the Δw term for each connection in the network. To simplify the program, we assume that the ε constant contained in Eq. (5.35) will always be 0.3.
procedure update_connections (NET:BOLTZMANN)
{update all connections based on co-occurrence statistics}
var connect : ^float[];     {pointer to connection array}
    pp, pm : ^float[];      {statistics arrays}
    dupconnect : ^float[];  {pointer to duplicate connection}
    i, j, stat : integer;   {iteration indices}
begin
  pp = NET.STATISTICS.CLAMPED;     {locate p+ estimates}
  pm = NET.STATISTICS.UNCLAMPED;   {locate p- estimates}
  stat = 1;                        {start at first pair}
  for i = 1 to NET.UNITS - 1       {for all unit pairs}
  do
    connect = NET.LAYER^.WEIGHTS^[i];       {connections to unit i}
    for j = i + 1 to NET.UNITS
    do
      {apply the weight change, Eq. (5.35), with epsilon = 0.3}
      connect[j] = connect[j] + 0.3 * (pp^[stat] - pm^[stat]);
      dupconnect = NET.LAYER^.WEIGHTS^[j];  {locate the twin connection}
      dupconnect[i] = connect[j];           {copy to keep twins identical}
      stat = stat + 1;                      {next statistics slot}
    end do;
  end do;
end procedure;
Recall that each connection in the network is represented twice, in two different arrays, such that these arrays always contain the same data. The algorithm for update_connections satisfies this requirement by locating the associated twin connection during every update cycle, and copying the new value from the current connection to the twin connection, as illustrated in Figure 5.9.
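The weight update and the twin-connection copy can be sketched in Python as follows. Each pair's weight changes by ε(p+ − p−) with ε fixed at 0.3 as in the text, and the symmetric entry is overwritten with the same value; the square-matrix layout is a simplification of the twin connection arrays:

```python
EPSILON = 0.3   # the fixed epsilon from the text

def update_connections(weights, pplus_stats, pminus_stats, pair_index):
    """Update symmetric weights from co-occurrence probability estimates.

    pair_index : function mapping a pair (i, j), i < j, to its slot in
                 the statistics arrays (same ordering as CO-OCCURRENCE)
    """
    n = len(weights)
    for i in range(n - 1):
        for j in range(i + 1, n):
            s = pair_index(i, j)
            delta = EPSILON * (pplus_stats[s] - pminus_stats[s])
            weights[i][j] += delta
            weights[j][i] = weights[i][j]   # copy to the twin connection
    return weights
```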
We shall now describe the algorithm used to train the Boltzmann simulator. Here, as before, we assume that the training patterns to be learned are contained in a globally accessible storage array named PATTERNS, and that the number of patterns in this array is obtainable through a call to an application-defined routine, how_many. Notice that in this function, we call the user-supplied routine, zero_statistics, to initialize the statistics arrays.
Figure 5.9 Updating of the twin connections in the Boltzmann simulator is shown. The weights in the arrays highlighted by the darkened boxes represent connections modified by one pass through the update_connections procedure.
procedure learn (NET:BOLTZMANN)
{cause network to learn input PATTERNS}
var i : integer;              {iteration counter}
begin
  NET.CLAMPED = true;         {clamp visible units}
  zero_statistics (NET);      {init statistics arrays}
  for i = 1 to how_many (PATTERNS)
  do
    set_inputs (NET, PATTERNS, i);
    anneal (NET);             {apply annealing schedule}
    sum_cooccurrence (NET);   {collect statistics}
  end do;
  pplus (NET);                {estimate p+}
  NET.CLAMPED = false;        {unclamp visible units}
  for i = 1 to how_many (PATTERNS)
  do
    set_inputs (NET, PATTERNS, i);
    anneal (NET);             {apply annealing schedule}
    sum_cooccurrence (NET);   {collect statistics}
  end do;
  pminus (NET);               {estimate p-}
  update_connections (NET);   {modify connections}
end procedure;
The algorithm necessary to have the network recall a pattern given an input pattern (production mode) is straightforward, and now depends on only the routines defined by the user to apply the new input pattern to the network and to read the resulting output. These routines, apply_inputs and get_outputs, respectively, are combined with anneal to generate the desired output, as shown next:
procedure recall (NET:BOLTZMANN; INVEC, OUTVEC : ^float[])
{stimulate the network to generate an output from input}
begin
  apply_inputs (NET, INVEC);   {set the input}
  anneal (NET);                {generate output}
  get_outputs (NET, OUTVEC);   {return output}
end procedure;
5.4 USING THE BOLTZMANN SIMULATOR
With the exception of the backpropagation network described in Chapter 3, the Boltzmann network is probably the most general-purpose network of those discussed in this text. It can be used either as an associative memory or as a mapping network, depending only on whether the output units overlap the input units. These two operating modes encompass most of the common problems to which ANS systems have been successfully applied. Unfortunately, the Boltzmann network also has the distinction of being the slowest of all the simulators. Nevertheless, there are several applications that can be addressed using the Boltzmann network; in this section, we describe one. This application uses the Boltzmann input-output model to associate patterns from "symptom" space with patterns in "diagnosis" space.

5.4.1 Boltzmann Symptom-Diagnosis Application
Let's consider a specific example of a symptom-diagnosis application. We will use an automobile diagnostic application as the basis for our example. Specifically, we will focus on an application that will diagnose why a car will not start. We first define the various symptoms to be considered:

• Does nothing: Nothing happens when the key is turned in the ignition switch.

• Clicks: A loud clicking noise is generated when the key is turned.

• Grinds: A loud grinding noise is generated when the key is turned.

• Cranks: The engine cranks as though trying to start, but the engine does not run on its own.

• No spark: Removing one of the spark-plug wires and holding the terminal near the block while cranking the engine produces no spark.

• Cable hot: After the engine has been cranked, the cable running from the battery to the starter solenoid is hot.

• No gas: Removing the fuel line from the carburetor (fuel injector) and cranking the engine produces no gas flow out of the fuel line.
Next, we consider the possible causes of the problem, based on the symptoms:

• Battery: The battery is dead.

• Solenoid: The starter solenoid is defective.

• Starter: The starter motor is defective.

• Wires: The ignition wires are defective.

• Distributor: The distributor rotor or cap is corroded.

• Fuel pump: The fuel pump is defective.

Although our list is not a complete representation of all possible problems, any one or a combination of these problems could be indicated by the symptoms. To complete our example, we shall construct a matrix indicating the mapping between symptoms and likely causes.
Figure 5.10 The symptom-problem matrix is shown. Each X marks a symptom that indicates the corresponding likely cause.
An examination of this matrix indicates the variety of problems that can be indicated by any one symptom. The matrix also illustrates the problem we encounter when we attempt to program a system to perform the diagnostic function: There rarely is a one-to-one correspondence between symptoms and causes. To be successful, our automated diagnostic system must be able to correlate many different symptoms, and, in the event that some symptoms may be overlooked or absent, must be able to "fill in the blanks" of the problem based on just the indicated symptoms.
5.4.2 The Boltzmann Solution
We will now examine how a Boltzmann network can be applied to the symptom-diagnosis example we have created. The first step is to construct the network architecture that will solve the problem for us. Since we would like to be able to provide the network with observed symptoms, and to have it respond with probable cause, a good candidate architecture would be to map each symptom directly to an individual input PE, and each probable cause to an individual output PE. Since our application requires outputs that are different from the inputs, we select the Boltzmann input-output network as the best candidate.

Using the data from our example, we will need a network with seven input units and six output units. That leaves only the number of internal units undetermined. In this case, there is nothing to indicate how many hidden units will be required to solve the problem, and no external interface considerations that will limit the number of hidden units (as there were in the data-compression example described in Chapter 3). We therefore arbitrarily size the Boltzmann network such that it contains 14 internal units. If training indicates that we need more units in order to converge, they can be added at a later time. If we need fewer units, the extras can be eliminated later, although there is no overwhelming reason to remove them in such a small network other than improving the performance of the simulator.
Next, we must define the data sets to be used to train the network. Referring again to our example matrix, we can consider the data in the row vectors of the matrix as seven-dimensional input patterns; that is, for each probable-cause output that we would like the network to learn, there are seven possible symptoms that indicate the problem by their existence or absence. This approach will provide six training-vector pairs, each consisting of a seven-element symptom pattern and a six-element problem-indication pattern.

We let the existence of a symptom be indicated by a 1, and the absence of a symptom be represented by a 0. For any given input vector, the correct cause (or causes) is indicated by a logic 1 in the proper position of the output vector. The training-vector pairs produced by the mapping in the symptom-problem matrix for this example are illustrated in Figure 5.11. If you compare Figures 5.11 and 5.10, you will notice slight differences. You should convince yourself that the differences are justified.
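The encoding just described is easy to express in code. The example pair built below is an illustrative stand-in (a "does nothing / clicks" symptom pattern paired with a battery-or-solenoid indication, roughly in the spirit of the matrix), not a transcription of the actual training data in Figure 5.11:

```python
SYMPTOMS = ["does nothing", "clicks", "grinds", "cranks",
            "no spark", "cable hot", "no gas"]
CAUSES = ["battery", "solenoid", "starter", "wires",
          "distributor", "fuel pump"]

def encode(present_symptoms, likely_causes):
    """Build one (symptom, cause) training pair as binary vectors:
    1 marks a present symptom or indicated cause, 0 marks absence."""
    x = [1 if s in present_symptoms else 0 for s in SYMPTOMS]
    y = [1 if c in likely_causes else 0 for c in CAUSES]
    return x, y

# illustrative pair only -- not the book's Figure 5.11 data
x, y = encode({"does nothing", "clicks"}, {"battery", "solenoid"})
```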
Figure 5.11 The training-vector pairs for the symptom-diagnosis example are shown, pairing each seven-element symptom pattern with its six-element likely-causes pattern.
All that remains from this point is to train the network on these data pairs using the Boltzmann algorithms. Once trained, the network will produce an output identifying the probable cause indicated by the input symptom map. The network will do this when the input is equivalent to one of the training inputs, as expected, and it will produce an output indicating the likely cause of the problem when the input is similar to, but different from, any training input. This application illustrates the "best-guess" capability of the network and highlights the network's ability to deal with noisy or incomplete data inputs.
Programming Exercises
5.1 Develop the pseudocode design for the set_inputs, apply_inputs, and get_outputs routines.

5.2 Develop the pseudocode design for the pminus routine.

5.3 The pplus and pminus routines as described are largely redundant and can be combined into a single routine. Develop the pseudocode design for such a routine.

5.4 Implement the Boltzmann simulator and test it with the automotive diagnostic data described in Section 5.4. Compare your results with ours, and discuss reasons for any differences.

5.5 Implement the Boltzmann simulator and test it on an application of your own choosing. Describe the application and your choice of training data, and discuss reasons why the test did or did not succeed.

5.6 Modify the simulator to contain two additional variable parameters, epsilon (ε) and cycles, as part of the network record structure. Epsilon will be used to calculate the connection-weight change, instead of the hard-coded 0.3 constant described in the text, and cycles should be used to specify the number of iterations performed during the sum_cooccurrence routine (instead of the five we specified). Retrain the network using the automotive diagnostic data with a different value for epsilon, then change cycles, and then change both parameters. Describe any performance variations that you observed.
Suggested Readings

An early account of using simulated annealing to solve optimization problems is given in a paper by Kirkpatrick, Gelatt, and Vecchi [5]. The concept of using the Cauchy distribution to speed the annealing process is discussed in a paper by Szu [8]. A Byte magazine article by Hinton contains an algorithm for the Boltzmann machine that is slightly different from the one presented in this chapter [4].
[2] Stuart Geman and Donald Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. In James A. Anderson and Edward Rosenfeld, editors, Neurocomputing. MIT Press, Cambridge, MA, pages 614-634, 1988. Reprinted from IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6: 721-741, 1984.

[3] G. E. Hinton and T. J. Sejnowski. Learning and relearning in Boltzmann machines. In David E. Rumelhart and James L. McClelland, editors, Parallel Distributed Processing: Explorations in the Microstructure of Cognition. MIT Press, Cambridge, MA, pages 282-317, 1986.

[4] Geoffrey E. Hinton. Learning in parallel networks. Byte, 10(4):265-273, 1985.

[5] S. Kirkpatrick, C. D. Gelatt, Jr., and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671-680, 1983.

[6] F. Reif. Fundamentals of Statistical and Thermal Physics. McGraw-Hill Series in Fundamental Physics. McGraw-Hill, New York, 1965.

[7] C. E. Shannon. The mathematical theory of communication. In C. E. Shannon and W. Weaver, editors, The Mathematical Theory of Communication. University of Illinois Press, Urbana, IL, pages 29-125, 1963.

[8] Harold Szu. Fast simulated annealing. In John S. Denker, editor, Neural Networks for Computing. American Institute of Physics, New York, pages 420-425, 1986.
Trang 20known as a competitive network and Grossberg's outstar structure [5, 6]
Al-though the network architecture, as it appears in its originally published form
in Figure 6.1, seems rather formidable, we shall see that the operation of thenetwork is quite straightforward
Given a set of vector pairs, (x1, y1), (x2, y2), ..., (xL, yL), the CPN can learn to associate an x vector on the input layer with a y vector at the output layer. If the relationship between x and y can be described by a continuous function, Φ, such that y = Φ(x), the CPN will learn to approximate this mapping for any value of x in the range specified by the set of training vectors. Furthermore, if the inverse of Φ exists, such that x is a function of y, then the CPN will also learn the inverse mapping, x = Φ⁻¹(y).¹ For a great many cases of practical interest, the inverse function does not exist. In these situations, we can simplify the discussion of the CPN by considering only the forward-mapping case, y = Φ(x).

In Figure 6.2, we have reorganized the CPN diagram and have restricted our consideration to the forward-mapping case. The network now appears as
¹We are using the term function in its strict mathematical sense. If y is a function of x, then every value of x corresponds to one and only one value of y. Conversely, if x is a function of y, then every value of y corresponds to one and only one value of x. An example of a function whose inverse is not a function is y = x², −∞ < x < ∞. A somewhat more abstract, but perhaps more interesting, situation is a function that maps images of animals to the name of the animal. For example, "CAT" = Φ("picture of cat"). Each picture represents only one animal, but each animal corresponds to many different pictures.