Figure 5.5 The CO-OCCURRENCE array is shown (a) depicted as a sequential memory array, running from low memory to high memory in the pair order A-B, A-C, A-D, A-E, B-C, B-D, B-E, C-D, C-E, D-E, and (b) with its mapping to the connections in the Boltzmann network.
The first four entries in the CO-OCCURRENCE array for this network would be mapped to the connections between units A-B, A-C, A-D, and A-E, as shown in Figure 5.5(b). Likewise, the next three slots would be mapped to the connections between units B-C, B-D, and B-E; the next two to C-D and C-E; and the last one to D-E. By using the arrays in this manner, we can collect co-occurrence statistics about the network by starting at the first input unit and sequentially scanning all other units in the network. After completing this initial pass, we can complete the network scan by merely incrementing our array pointer to access the second unit, then the third, fourth, ..., nth units.
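To make the pair-to-slot ordering concrete, here is a small Python helper (our own illustration, not part of the simulator's pseudocode) that computes the CO-OCCURRENCE slot for any pair of unit indices:

```python
def cooccurrence_index(i, j, n):
    """Map the unit pair (i, j), with 0 <= i < j < n, to its slot in a
    linear co-occurrence array ordered A-B, A-C, ..., as in Figure 5.5."""
    if not (0 <= i < j < n):
        raise ValueError("require 0 <= i < j < n")
    # Pairs starting at unit i begin after all pairs for units 0..i-1;
    # unit k contributes (n - 1 - k) pairs to the array.
    offset = i * (n - 1) - i * (i - 1) // 2
    return offset + (j - i - 1)

# For the five-unit network A..E, the ten pairs scan out in order:
pairs = [(i, j) for i in range(5) for j in range(i + 1, 5)]
```

Scanning the pairs for the five-unit example reproduces exactly the slot ordering shown in Figure 5.5(a).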
We can now specify the remaining data structures needed to implement the Boltzmann network simulator. We begin by defining the top-level record structure used to define the Boltzmann network:
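A rough Python sketch of such a record follows; the field names are inferred from the way the simulator routines in this section access the NET record (UNITS, CLAMPED, TEMPERATURE, the INPUTS/OUTPUTS ranges, the STATISTICS arrays, and the annealing schedule), so treat the exact layout as an assumption rather than a definitive structure:

```python
from dataclasses import dataclass

@dataclass
class Range:
    first: int    # index of the first unit in the range
    length: int   # number of units in the range

@dataclass
class Statistics:
    clamped: list    # co-occurrence sums, inputs clamped (p+)
    unclamped: list  # co-occurrence sums, free-running (p-)

@dataclass
class Boltzmann:
    units: int            # total number of units in the network
    inputs: Range         # visible input units
    outputs: Range        # visible output units
    clamped: bool         # current mode of operation
    temperature: float    # current annealing temperature
    annealing: list       # schedule as [(temperature, passes), ...]
    statistics: Statistics
    layer: object = None  # the single LAYER record
```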
Figure 5.6 provides an illustration of how the values in the BOLTZMANN structure interact to specify a Boltzmann network. Here, as in other network models, the layer structure is the gateway to the network-specific data structures. All that is needed to gain access to the layer-specific data are pointers to the appropriate arrays. Thus, the structure for the layer record is
Figure 5.6 Organization of the Boltzmann network using the defined data structure is shown. In this example, the input and output units are the same, and the network is in the third step of its annealing schedule.
record LAYER =
    outs    : ^float[];   {pointer to unit outputs array}
    weights : ^^float[];  {pointer to weight_ptr array}
end record;
where outs is a pointer used to locate the beginning of the unit outputs array in memory, and weights is a pointer to the intermediate weight_ptr array, which is used in turn to locate each of the input connection arrays in the system. Since the Boltzmann network requires only one layer of PEs, we will need only one layer pointer in the BOLTZMANN record. All these low-level data structures are exactly the same as those specified in the generic simulator discussed in Chapter 1.
Boltzmann Production Algorithms. Remember that information recall in the Boltzmann network consists of a sequence of steps where we first apply an input to the network, raise the temperature to some predefined level, and anneal the network while slowly lowering the temperature. In this example, we would initially raise the temperature to 5 and would perform four stochastic signal propagations; we would then lower the temperature to 4 and would perform six signal propagations, and so on. After completing the four required signal propagations when the temperature of the network is 1, we can consider the network annealed. At this point, we simply read the output values from the visible units.
If, however, we think about the process we just described, we can decompose the information-recall problem into three lower-level subroutines:

Temperature   5   4   3   2   1
Passes        4   6   7   6   4

Table 5.1 The annealing schedule for the simulator example.
These functions, each implemented as a subroutine that is called by the parent anneal process, are described next.

apply_input: A routine used to take a user-provided or training input and apply it to the network, and to initialize the output from all unknown units at a low temperature.

set_temp: A procedure used to set the current network temperature and annealing-schedule pass count to the values specified in the overall annealing schedule.

propagate: A function used to perform one signal propagation through the entire network, using the current temperature and probabilistic unit selection. This routine should be capable of performing the signal propagation regardless of the network state (clamped or unclamped).
Signal Propagation in the Boltzmann Network. We shall now define the most basic of the needed subroutines, the propagate procedure. The algorithm for this procedure, which follows, presumes that the user-provided apply_input and not-yet-defined set_temp functions have been executed to initialize the outputs of the network's units and temperature parameter to the desired states.
procedure propagate (NET:BOLTZMANN)
{perform one signal propagation pass through network}
var unit : integer;       {randomly selected unit}
    p : float;            {probability of unit being on}
    neti : float;         {net input to unit}
    threshold : integer;  {point at which unit turns on}
    i, j : integer;       {iteration counters}
    inputs : ^float[];    {pointer to unit outputs array}
    connects : ^float[];  {pointer to unit weights array}
    unclamped : integer;  {index to first unclamped unit}
    firstc : integer;     {index to first connection}
begin
  {locate the first nonvisible unit, assuming first index = 1}
  unclamped = NET.OUTPUTS.FIRST + NET.OUTPUTS.LENGTH - 1;
  if (NET.INPUTS.FIRST = NET.OUTPUTS.FIRST)
    then firstc = NET.INPUTS.FIRST         {Boltzmann completion}
    else firstc = NET.INPUTS.LENGTH + 1;   {Boltzmann input-output}
  end if;
  for i = 1 to NET.UNITS     {for as many units in network}
  do
    if (NET.CLAMPED)         {if network is clamped}
      then                   {select an unclamped unit}
        unit = random (NET.UNITS - unclamped) + unclamped
      else                   {any unit may be updated}
        unit = random (NET.UNITS);
    end if;
    inputs = NET.LAYER^.OUTS;              {locate unit outputs}
    connects = NET.LAYER^.WEIGHTS^[unit];  {locate connections to unit}
    neti = 0;                              {clear input accumulator}
    for j = firstc to NET.UNITS            {all connections to unit}
    do                                     {compute sum of products}
      neti = neti + inputs[j] * connects[j];
    end do;
    {this next statement is used to improve performance,
     as described in the text}
    neti = neti - inputs[unit] * connects[unit];
    p = 1.0 / (1.0 + exp (-neti / NET.TEMPERATURE));
    threshold = round (p * CEILING);   {CEILING is an arbitrarily large constant}
    if (random (CEILING) <= threshold) {roll the dice}
      then inputs[unit] = 1
      else inputs[unit] = 0;
    end if;
  end do;
end procedure;
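For comparison, one way to render the complete propagation pass in Python is sketched below. It follows the same logic described in the text — skip the first N visible units when selecting a unit in clamped mode, sum all products and subtract the self-connection term afterward, then activate the unit stochastically — but it flattens the pointer structures into a simple weight matrix, so it is an illustration rather than a transcription:

```python
import math
import random

def propagate(outputs, weights, temperature, clamped, n_visible):
    """One stochastic signal-propagation pass.

    outputs   : list of unit outputs (0.0 or 1.0); first n_visible are visible
    weights   : square matrix; weights[i][j] = connection from unit j to unit i
                (the self slot weights[i][i] exists but is subtracted out)
    """
    n = len(outputs)
    for _ in range(n):
        if clamped:
            # select only among units beyond the first n_visible
            unit = n_visible + random.randrange(n - n_visible)
        else:
            unit = random.randrange(n)
        # sum over all connection slots, then remove the self term
        neti = sum(w * o for w, o in zip(weights[unit], outputs))
        neti -= weights[unit][unit] * outputs[unit]
        p = 1.0 / (1.0 + math.exp(-neti / temperature))
        outputs[unit] = 1.0 if random.random() < p else 0.0
    return outputs
```

Note that this sketch omits the firstc distinction between the completion and input-output configurations; it always processes the full connection row.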
Before we move on to the next routine, there are three aspects of the propagate procedure that bear further discussion: the selection mechanism for unit update, the computation of the neti term, and the method we have chosen for determining when a unit is or is not active.
In the first case, the Boltzmann network must be able to run with its inputs either clamped or free-running. So that we do not need to have different propagate routines for each mode, we simply use a Boolean variable in the network record to indicate the current mode of operation, and enable the propagate routine to select a unit for update accordingly. If the network is clamped, we cannot select an input or output unit for update. We account for these differences by assuming that the visible units to the network are the first N units in the layer. We thus can be assured that the visible units will not change if we simply select a random unit from the set of units that do not include the first N units. We accomplish this selection by decreasing the range of the random-number generator to the number of network units minus N, and then adding N to the result. Since we have decided that all our arrays will use the first N indices to locate the visible units, generating a random index greater than N will always select a random unit beyond the range of the visible units. However, if the network is unclamped, any unit must be available for update.
Inspection of the algorithm for propagate will reveal that these two cases are handled by the if-then-else clause at the beginning of the routine.

Second, there are two salient points regarding the computation of the neti term with respect to the propagate routine. The first point is that connections between input units are processed only when the network is configured as a Boltzmann completion network. In the Boltzmann input-output mode, connections between input units do not exist. This structure conforms to the mathematical model described earlier. The second point about the calculation of the neti term is that we have obviously wasted computer time by processing a connection from each unit to itself twice: once as part of the summation loop during the calculation of the neti value, and once to subtract it out after the total neti has been calculated. The reason we have chosen to implement the algorithm in this manner is, again, to improve performance. Even though we have consumed computer time by processing a nonexistent connection for every unit in the network, we have used far less time than would be required to disallow the computation of the missing connection selectively during every iteration of the summation loop. Furthermore, we can easily eliminate the error introduced in the input summation by processing the nonexistent connection by subtracting out just that term after completing the loop, prior to updating the output of the unit. You might also observe that we have wasted memory by allocating space for the connections between each unit and itself. We have chosen to implement the network in this fashion to simplify processing, and thus to improve performance as described.
As an example of why it is desirable to optimize the code at the expense of wasted memory, consider the alternative case where only valid connections are modeled. Since no unit has a connection to itself, but all units have outputs maintained in the same array, the code to process all input connections to a unit would have to be written as two different loops: one for those input PEs that precede the current unit, where the array indices for outputs and connections correspond one-to-one, and one loop for inputs from units that follow, where unit outputs are displaced by one array entry from the corresponding connection. This situation occurs because we have organized the unit outputs and connections as linearly sequential arrays in memory. Such a situation is illustrated in Figure 5.7.
Figure 5.7 The illustration shows array processing (a) when memory is allocated for all possible connections, and (b) when memory is not allocated for intra-unit connections. In (a), the code necessary to perform this input summation simply computes the input value for all connections, then eliminates the error introduced by processing the nonexistent connection to itself. In (b), the code must be more selective about accessing connections, since the one-to-one mapping of connections to units is lost. Obviously, approach (a) is our preferred method, since it will execute much faster than approach (b).
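The tradeoff is easy to see in code. Both functions below compute the same net input; the first mirrors approach (a) (one uniform loop plus a correction term), the second mirrors approach (b) (two loops that skip the nonexistent self-connection). This is an illustrative sketch, not the simulator's code:

```python
def net_input_subtract(outputs, weights, unit):
    # approach (a): process every slot, then remove the self term
    neti = sum(w * o for w, o in zip(weights[unit], outputs))
    return neti - weights[unit][unit] * outputs[unit]

def net_input_two_loops(outputs, weights, unit):
    # approach (b): skip the self-connection explicitly, at the cost
    # of a test (here, a loop split) on every unit
    neti = 0.0
    for j in range(unit):                     # units that precede
        neti += weights[unit][j] * outputs[j]
    for j in range(unit + 1, len(outputs)):   # units that follow
        neti += weights[unit][j] * outputs[j]
    return neti
```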
Finally, with respect to deciding when to activate the output of a unit, recall that the Boltzmann network differs from the other networks that we have studied in that PEs are activated stochastically rather than deterministically. Recall that the equation

p_k = 1 / (1 + e^(-net_k / T))

defines how we calculate the probability that a unit x_k is active with respect to its input stimulation (net_k). However, simply knowing the probability that a unit will generate an output does not guarantee that the unit will generate an output. We must therefore implement a mechanism that allows the computer to translate the calculated probability into a unit output that occurs with the same probability; in effect, we must let the computer roll the dice to determine when an output is active and when it is not.
One method for doing this is to make use of the pseudorandom-number generator available in most high-level computer languages. Here, we take advantage of the fact that the computed probability, p_k, will always be a fractional number ranging between zero and one, as illustrated by the graph depicted in Figure 5.8. We can map p_k to an integer threshold value between zero and some arbitrarily large number by simply multiplying the ceiling value by the computed probability and rounding the result into an integer. We then generate a random number between zero and the selected ceiling, and, if the random number does not exceed the threshold value just computed, the output of the unit is set to one. Assuming that the pseudorandom-number generator has a uniform probability distribution across the interval of interest, the random number produced will not exceed the threshold value with a probability equal to the specified value, p_k. Thus, we now have a means of stochastically activating unit outputs in the network.
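In code, the dice roll reduces to a few lines; the ceiling of 32767 below is an arbitrary stand-in for "some arbitrarily large number":

```python
import random

CEILING = 32767  # arbitrary large ceiling for the integer threshold

def stochastic_output(p_k):
    """Return 1 with probability (approximately) p_k, using the
    integer-threshold scheme described in the text."""
    threshold = round(p_k * CEILING)
    return 1 if random.randint(0, CEILING) <= threshold else 0
```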
Figure 5.8 Shown here is a graph of the probability, p_k, that the kth unit is on, as a function of net input, at five different temperatures, T.
Boltzmann Learning Algorithms. There are five additional functions that must be defined to train the Boltzmann network:

set_temp: A function used to update the parameters in the BOLTZMANN record to reflect the network temperature at the current step, as specified in the annealing schedule.

sum_cooccurrence: A procedure used to accumulate the co-occurrence statistics for the network after each annealing pass.

pplus: A function used to compute and average the co-occurrence probabilities for a network with clamped inputs after it has reached equilibrium at the minimum temperature.

pminus: A function similar to pplus, but used when the network is running free.

update_connections: The procedure that modifies the connection weights in the network to train the Boltzmann simulator.
The implementation of the set_temp function is straightforward, as defined here:

procedure set_temp (NET:BOLTZMANN; N:integer)
{set the temperature and schedule step}
begin
  NET.TEMPERATURE = NET.ANNEALING^.STEP[N].TEMPERATURE;
end procedure;
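A Python rendering of the same idea follows; representing the annealing schedule as a list of (temperature, passes) steps is our assumption about the ANNEALING structure, not a transcription of it:

```python
def set_temp(net, step):
    """Set the network temperature and pass count for annealing step
    `step` (1-based, as in the pseudocode)."""
    temperature, passes = net["annealing"][step - 1]
    net["temperature"] = temperature
    net["passes"] = passes
```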
We shall now turn our attention to the computation of the co-occurrence probability, p+ij, when the input to the network is clamped to an arbitrary input vector, Va. As we did with propagate, we will assume that the input pattern has been placed on the input units by an earlier call to set_inputs. Furthermore, we shall assume that the statistics arrays have been initialized by an earlier call to a user-supplied routine that we refer to as zero_statistics.

procedure sum_cooccurrence (NET:BOLTZMANN)
{accumulate co-occurrence statistics for the specified network}
var i, j, k : integer;   {loop counters}
    connect : integer;   {co-occurrence index}
    outputs : ^float[];  {pointer to unit outputs array}
    stats : ^float[];    {pointer to statistics array}
begin
  if (NET.CLAMPED)       {if network is clamped}
    then stats = NET.STATISTICS.CLAMPED
    else stats = NET.STATISTICS.UNCLAMPED;
  end if;
  outputs = NET.LAYER^.OUTS;       {locate unit outputs}
  for i = 1 to 5         {arbitrary number of cycles}
  do
    propagate (NET);     {run the network once}
    connect = 1;         {start at first pair}
    for j = 1 to NET.UNITS - 1     {scan all unit pairs}
    do
      for k = j + 1 to NET.UNITS
      do
        if (outputs[j] = 1 and outputs[k] = 1)
          then           {both units on; they co-occur}
            stats^[connect] = stats^[connect] + 1;
        end if;
        connect = next (connect);  {advance to next pair}
      end do;
    end do;
  end do;
end procedure;
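The same statistics pass can be sketched in Python. The flat list of pair slots mirrors the CO-OCCURRENCE ordering described earlier; the flat-list layout and the single-state argument are simplifications for illustration:

```python
def sum_cooccurrence(outputs_after_propagate, stats):
    """Accumulate co-occurrence counts for one annealed network state.

    outputs_after_propagate : list of unit outputs (0/1) after propagate
    stats : flat list with one slot per unit pair, scanned A-B, A-C, ...
    """
    n = len(outputs_after_propagate)
    connect = 0                       # start at first pair
    for j in range(n - 1):
        for k in range(j + 1, n):
            if outputs_after_propagate[j] == 1 and outputs_after_propagate[k] == 1:
                stats[connect] += 1   # both units on: they co-occur
            connect += 1              # advance to next pair
    return stats
```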
Before we define the algorithm needed to estimate the p+ term for the Boltzmann network, we will make a few assumptions. Since the total number of training patterns that the network must learn will depend on the application, we must write the code so that the computer will calculate the co-occurrence statistics for a variable number of training patterns. We must therefore assume that the training data are available to the simulator from some external source (such as a global array or disk file) that we will refer to as PATTERNS, and that the total number of training patterns contained in this source is obtainable through a call to an application-defined function that we will call how_many. We also presume that you will provide the routines to initialize the co-occurrence arrays to zero, and set the outputs of the input network units to the state specified by the ith pattern in the PATTERNS data source. We will refer to these procedures as initialize_arrays and set_inputs, respectively. Based on these assumptions, we shall now define our algorithm for computing pplus:

procedure pplus (NET:BOLTZMANN)
var trials : integer;    {average over trials}
    i : integer;         {loop counter}
begin
  trials = how_many (PATTERNS) * 5;               {five sums per pattern}
  for i = 1 to (NET.UNITS * (NET.UNITS - 1) / 2)  {for each unit pair}
  do
    NET.STATISTICS.CLAMPED^[i] =                  {average results}
      NET.STATISTICS.CLAMPED^[i] / trials;
  end do;
end procedure;
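In Python, the averaging step of pplus is a single pass over the clamped statistics; a pminus twin would do the same over the unclamped statistics. The list representation is again a simplification:

```python
def pplus(clamped_stats, n_patterns, cycles=5):
    """Convert accumulated clamped co-occurrence counts into probability
    estimates by averaging over all trials (cycles per pattern)."""
    trials = n_patterns * cycles   # five sums per pattern by default
    return [count / trials for count in clamped_stats]
```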
5.3.4 The Complete Boltzmann Simulator
Now that we have defined all the lower-level functions needed to implement the Boltzmann network, we shall describe the algorithms needed to tie everything together. As previously stated, the two user-provided routines (set_inputs and get_outputs) are assumed to initialize and recover input and output data to or from the simulator for an external process. However, we have yet to define the two intermediate routines that will be used to perform the network simulation given the externally provided inputs. We now begin to correct that deficiency by describing the algorithm for the anneal process.
procedure anneal (NET:BOLTZMANN)
{perform one pass through annealing schedule for current input}
var passes : integer;   {passes at current temperature}
    steps : integer;    {number of steps in schedule}
    i, j : integer;     {loop counters}
begin
  steps = NET.ANNEALING^.LENGTH;   {number of steps in the schedule}
  for i = 1 to steps               {for all steps in schedule}
  do
    passes = NET.ANNEALING^.STEP[i].PASSES;
    set_temp (NET, i);             {set current annealing step}
    for j = 1 to passes            {for all passes in step}
    do
      propagate (NET);             {perform one signal propagation}
    end do;
  end do;
end procedure;
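Rendered in Python, the annealing pass is just two nested loops over the schedule; the (temperature, passes) schedule representation and the propagate callback are illustrative assumptions:

```python
def anneal(net, propagate):
    """Run one full pass through the annealing schedule for the current input."""
    for temperature, passes in net["annealing"]:
        net["temperature"] = temperature   # set current annealing step
        for _ in range(passes):            # for all passes in this step
            propagate(net)
```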
The update_connections procedure modifies the connection weights using the co-occurrence statistics collected during the annealing process. This routine will compute and apply the Δw term for each connection in the network. To simplify the program, we assume that the ε constant contained in Eq. (5.35) will always be 0.3.
procedure update_connections (NET:BOLTZMANN)
{update all connections based on co-occurrence statistics}
var connect : ^float[];     {pointer to connection array}
    pp, pm : ^float[];      {statistics arrays}
    dupconnect : ^float[];  {pointer to duplicate connection}
    i, j, stat : integer;   {iteration indices}
begin
  pp = NET.STATISTICS.CLAMPED;     {locate p+ estimates}
  pm = NET.STATISTICS.UNCLAMPED;   {locate p- estimates}
  stat = 1;                        {start at first pair}
  for i = 1 to NET.UNITS - 1       {for all unit pairs}
  do
    connect = NET.LAYER^.WEIGHTS^[i];       {connections to unit i}
    for j = i + 1 to NET.UNITS
    do
      {apply the weight change, Eq. (5.35), with epsilon = 0.3}
      connect[j] = connect[j] + 0.3 * (pp^[stat] - pm^[stat]);
      dupconnect = NET.LAYER^.WEIGHTS^[j];  {locate the twin connection}
      dupconnect[i] = connect[j];           {copy to keep twins identical}
      stat = stat + 1;                      {next statistics slot}
    end do;
  end do;
end procedure;
Recall that each connection in the network is represented twice, in two different arrays, such that these arrays always contain the same data. The algorithm for update_connections satisfies this requirement by locating the associated twin connection during every update cycle, and copying the new value from the current connection to the twin connection, as illustrated in Figure 5.9.
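The weight update and the twin-connection copy can be sketched in Python as follows. Each pair's weight changes by ε(p+ − p−) with ε fixed at 0.3 as in the text, and the symmetric entry is overwritten with the same value; the square-matrix layout is a simplification of the twin connection arrays:

```python
EPSILON = 0.3   # the fixed epsilon from the text

def update_connections(weights, pplus_stats, pminus_stats, pair_index):
    """Update symmetric weights from co-occurrence probability estimates.

    pair_index : function mapping a pair (i, j), i < j, to its slot in
                 the statistics arrays (same ordering as CO-OCCURRENCE)
    """
    n = len(weights)
    for i in range(n - 1):
        for j in range(i + 1, n):
            s = pair_index(i, j)
            delta = EPSILON * (pplus_stats[s] - pminus_stats[s])
            weights[i][j] += delta
            weights[j][i] = weights[i][j]   # copy to the twin connection
    return weights
```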
We shall now describe the algorithm used to train the Boltzmann simulator. Here, as before, we assume that the training patterns to be learned are contained in a globally accessible storage array named PATTERNS, and that the number of patterns in this array is obtainable through a call to an application-defined routine, how_many. Notice that in this function, we call the user-supplied routine, zero_statistics, to initialize the statistics arrays.
Figure 5.9 Updating of the twin connections in the Boltzmann simulator is shown. The weights in the arrays highlighted by the darkened boxes represent connections modified by one pass through the update_connections procedure.
procedure learn (NET:BOLTZMANN)
{cause network to learn input PATTERNS}
var i : integer;              {iteration counter}
begin
  NET.CLAMPED = true;         {clamp visible units}
  zero_statistics (NET);      {init statistics arrays}
  for i = 1 to how_many (PATTERNS)
  do
    set_inputs (NET, PATTERNS, i);
    anneal (NET);             {apply annealing schedule}
    sum_cooccurrence (NET);   {collect statistics}
  end do;
  pplus (NET);                {estimate p+}
  NET.CLAMPED = false;        {unclamp visible units}
  for i = 1 to how_many (PATTERNS)
  do
    set_inputs (NET, PATTERNS, i);
    anneal (NET);             {apply annealing schedule}
    sum_cooccurrence (NET);   {collect statistics}
  end do;
  pminus (NET);               {estimate p-}
  update_connections (NET);   {modify connections}
end procedure;
The algorithm necessary to have the network recall a pattern given an input pattern (production mode) is straightforward, and now depends on only the routines defined by the user to apply the new input pattern to the network and to read the resulting output. These routines, apply_inputs and get_outputs, respectively, are combined with anneal to generate the desired output, as shown next:
procedure recall (NET:BOLTZMANN; INVEC, OUTVEC : ^float[])
{stimulate the network to generate an output from input}
begin
  apply_inputs (NET, INVEC);   {set the input}
  anneal (NET);                {generate output}
  get_outputs (NET, OUTVEC);   {return output}
end procedure;
5.4 USING THE BOLTZMANN SIMULATOR
With the exception of the backpropagation network described in Chapter 3, the Boltzmann network is probably the most general-purpose network of those discussed in this text. It can be used either as an associative memory or as a mapping network, depending only on whether the output units overlap the input units. These two operating modes encompass most of the common problems to which ANS systems have been successfully applied. Unfortunately, the Boltzmann network also has the distinction of being the slowest of all the simulators. Nevertheless, there are several applications that can be addressed using the Boltzmann network; in this section, we describe one. This application uses the Boltzmann input-output model to associate patterns from "symptom" space with patterns in "diagnosis" space.

5.4.1 Boltzmann Symptom-Diagnosis Application
Let's consider a specific example of a symptom-diagnosis application. We will use an automobile diagnostic application as the basis for our example. Specifically, we will focus on an application that will diagnose why a car will not start. We first define the various symptoms to be considered:

• Does nothing: Nothing happens when the key is turned in the ignition switch.

• Clicks: A loud clicking noise is generated when the key is turned.

• Grinds: A loud grinding noise is generated when the key is turned.

• Cranks: The engine cranks as though trying to start, but the engine does not run on its own.

• No spark: Removing one of the spark-plug wires and holding the terminal near the block while cranking the engine produces no spark.

• Cable hot: After the engine has been cranked, the cable running from the battery to the starter solenoid is hot.

• No gas: Removing the fuel line from the carburetor (fuel injector) and cranking the engine produces no gas flow out of the fuel line.
Next, we consider the possible causes of the problem, based on the symptoms:

• Battery: The battery is dead.

• Solenoid: The starter solenoid is defective.

• Starter: The starter motor is defective.

• Wires: The ignition wires are defective.

• Distributor: The distributor rotor or cap is corroded.

• Fuel pump: The fuel pump is defective.

Although our list is not a complete representation of all possible problems, any one or a combination of these problems could be indicated by the symptoms. To complete our example, we shall construct a matrix indicating the mapping between symptoms and likely causes.
Figure 5.10 The symptom-problem matrix is shown. Each X marks a symptom that indicates the corresponding likely cause.
An examination of this matrix indicates the variety of problems that can be indicated by any one symptom. The matrix also illustrates the problem we encounter when we attempt to program a system to perform the diagnostic function: There rarely is a one-to-one correspondence between symptoms and causes. To be successful, our automated diagnostic system must be able to correlate many different symptoms, and, in the event that some symptoms may be overlooked or absent, must be able to "fill in the blanks" of the problem based on just the indicated symptoms.
5.4.2 The Boltzmann Solution
We will now examine how a Boltzmann network can be applied to the symptom-diagnosis example we have created. The first step is to construct the network architecture that will solve the problem for us. Since we would like to be able to provide the network with observed symptoms, and to have it respond with probable cause, a good candidate architecture would be to map each symptom directly to an individual input PE, and each probable cause to an individual output PE. Since our application requires outputs that are different from the inputs, we select the Boltzmann input-output network as the best candidate.

Using the data from our example, we will need a network with seven input units and six output units. That leaves only the number of internal units undetermined. In this case, there is nothing to indicate how many hidden units will be required to solve the problem, and no external interface considerations that will limit the number of hidden units (as there were in the data-compression example described in Chapter 3). We therefore arbitrarily size the Boltzmann network such that it contains 14 internal units. If training indicates that we need more units in order to converge, they can be added at a later time. If we need fewer units, the extras can be eliminated later, although there is no overwhelming reason to remove them in such a small network other than improving the performance of the simulator.
Next, we must define the data sets to be used to train the network. Referring again to our example matrix, we can consider the data in the row vectors of the matrix as seven-dimensional input patterns; that is, for each probable-cause output that we would like the network to learn, there are seven possible symptoms that indicate the problem by their existence or absence. This approach will provide six training-vector pairs, each consisting of a seven-element symptom pattern and a six-element problem-indication pattern.

We let the existence of a symptom be indicated by a 1, and the absence of a symptom be represented by a 0. For any given input vector, the correct cause (or causes) is indicated by a logic 1 in the proper position of the output vector. The training-vector pairs produced by the mapping in the symptom-problem matrix for this example are illustrated in Figure 5.11. If you compare Figures 5.11 and 5.10, you will notice slight differences. You should convince yourself that the differences are justified.
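The encoding just described is easy to express in code. The example pair built below is an illustrative stand-in (a "does nothing / clicks" symptom pattern paired with a battery-or-solenoid indication, roughly in the spirit of the matrix), not a transcription of the actual training data in Figure 5.11:

```python
SYMPTOMS = ["does nothing", "clicks", "grinds", "cranks",
            "no spark", "cable hot", "no gas"]
CAUSES = ["battery", "solenoid", "starter", "wires",
          "distributor", "fuel pump"]

def encode(present_symptoms, likely_causes):
    """Build one (symptom, cause) training pair as binary vectors:
    1 marks a present symptom or indicated cause, 0 marks absence."""
    x = [1 if s in present_symptoms else 0 for s in SYMPTOMS]
    y = [1 if c in likely_causes else 0 for c in CAUSES]
    return x, y

# illustrative pair only -- not the book's Figure 5.11 data
x, y = encode({"does nothing", "clicks"}, {"battery", "solenoid"})
```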
Figure 5.11 The training-vector pairs for the symptom-diagnosis example are shown, pairing each seven-element symptom pattern with its six-element likely-causes pattern.
All that remains from this point is to train the network on these data pairs using the Boltzmann algorithms. Once trained, the network will produce an output identifying the probable cause indicated by the input symptom map. The network will do this when the input is equivalent to one of the training inputs, as expected, and it will produce an output indicating the likely cause of the problem when the input is similar to, but different from, any training input. This application illustrates the "best-guess" capability of the network and highlights the network's ability to deal with noisy or incomplete data inputs.
Programming Exercises
5.1 Develop the pseudocode design for the set_inputs, apply_inputs, and get_outputs routines.

5.2 Develop the pseudocode design for the pminus routine.

5.3 The pplus and pminus routines as described are largely redundant and can be combined into a single routine. Develop the pseudocode design for such a routine.

5.4 Implement the Boltzmann simulator and test it with the automotive diagnostic data described in Section 5.4. Compare your results with ours, and discuss reasons for any differences.

5.5 Implement the Boltzmann simulator and test it on an application of your own choosing. Describe the application and your choice of training data, and discuss reasons why the test did or did not succeed.

5.6 Modify the simulator to contain two additional variable parameters, epsilon (ε) and cycles, as part of the network record structure. Epsilon will be used to calculate the connection-weight change, instead of the hard-coded 0.3 constant described in the text, and cycles should be used to specify the number of iterations performed during the sum_cooccurrence routine (instead of the five we specified). Retrain the network using the automotive diagnostic data with a different value for epsilon, then change cycles, and then change both parameters. Describe any performance variations that you observed.
Suggested Readings

An early account of using simulated annealing to solve optimization problems is given in a paper by Kirkpatrick, Gelatt, and Vecchi [5]. The concept of using the Cauchy distribution to speed the annealing process is discussed in a paper by Szu [8]. A Byte magazine article by Hinton contains an algorithm for the Boltzmann machine that is slightly different from the one presented in this chapter [4].
[2] Stuart Geman and Donald Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. In James A. Anderson and Edward Rosenfeld, editors, Neurocomputing. MIT Press, Cambridge, MA, pages 614-634, 1988. Reprinted from IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6: 721-741, 1984.

[3] G. E. Hinton and T. J. Sejnowski. Learning and relearning in Boltzmann machines. In David E. Rumelhart and James L. McClelland, editors, Parallel Distributed Processing: Explorations in the Microstructure of Cognition. MIT Press, Cambridge, MA, pages 282-317, 1986.

[4] Geoffrey E. Hinton. Learning in parallel networks. Byte, 10(4):265-273, 1985.

[5] S. Kirkpatrick, C. D. Gelatt, Jr., and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671-680, 1983.

[6] F. Reif. Fundamentals of Statistical and Thermal Physics. McGraw-Hill Series in Fundamental Physics. McGraw-Hill, New York, 1965.

[7] C. E. Shannon. The mathematical theory of communication. In C. E. Shannon and W. Weaver, editors, The Mathematical Theory of Communication. University of Illinois Press, Urbana, IL, pages 29-125, 1963.

[8] Harold Szu. Fast simulated annealing. In John S. Denker, editor, Neural Networks for Computing. American Institute of Physics, New York, pages 420-425, 1986.
Trang 20known as a competitive network and Grossberg's outstar structure [5, 6]
Al-though the network architecture, as it appears in its originally published form
in Figure 6.1, seems rather formidable, we shall see that the operation of thenetwork is quite straightforward
Given a set of vector pairs, (x1, y1), (x2, y2), ..., (xL, yL), the CPN can learn to associate an x vector on the input layer with a y vector at the output layer. If the relationship between x and y can be described by a continuous function, Φ, such that y = Φ(x), the CPN will learn to approximate this mapping for any value of x in the range specified by the set of training vectors. Furthermore, if the inverse of Φ exists, such that x is a function of y, then the CPN will also learn the inverse mapping, x = Φ⁻¹(y).¹ For a great many cases of practical interest, the inverse function does not exist. In these situations, we can simplify the discussion of the CPN by considering only the forward-mapping case, y = Φ(x).

In Figure 6.2, we have reorganized the CPN diagram and have restricted our consideration to the forward-mapping case. The network now appears as
¹We are using the term function in its strict mathematical sense. If y is a function of x, then every value of x corresponds to one and only one value of y. Conversely, if x is a function of y, then every value of y corresponds to one and only one value of x. An example of a function whose inverse is not a function is y = x², −∞ < x < ∞. A somewhat more abstract, but perhaps more interesting, situation is a function that maps images of animals to the name of the animal. For example, "CAT" = Φ("picture of cat"). Each picture represents only one animal, but each animal corresponds to many different pictures.