Self-Organizing Maps
Figure 7.10 This illustration shows the sequence of responses from the phonotopic map resulting from the spoken Finnish word humppila. (Do not bother to look up the meaning of this word in your Finnish-English dictionary: humppila is the name of a place.) Source: Reprinted with permission from Teuvo Kohonen, "The neural phonetic typewriter." IEEE Computer, March 1988. ©1988 IEEE.
mechanism; thus, the torques that cause a particular motion must be known in advance. Figure 7.11 illustrates the simple, two-dimensional robot-arm model used in this example.
For a particular starting position, x, and a particular, desired end-effector velocity, u_desired, the required torques can be found from

    T = A(x) u_desired

where T is the vector (τ1, τ2)'.³ The tensor quantity, A (here, simply a two-dimensional matrix), is determined by the details of the arm and its configuration. Ritter and Schulten use Kohonen's SOM algorithm to learn the A(x) quantities. A mechanism for learning the A tensors would be useful in a real environment where aging effects and wear might alter the dynamics of the arm over time. The first part of the method is virtually identical to the two-dimensional mapping example discussed in Section 7.1. Recall that, in that example, units learned to map points selected at random from a two-dimensional region.
³ Torque itself is a vector quantity, defined as the time-rate of change of the angular momentum vector. Our vector T is a composite of the magnitudes of two torque vectors, τ1 and τ2. The directions of τ1 and τ2 can be accounted for by their signs: τ > 0 implies a counterclockwise torque.
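The torque computation T = A(x)u_desired can be sketched numerically as follows; the entries of A below are invented for illustration and are not values from the arm model.

```python
# Sketch of the torque computation T = A(x) * u_desired for the
# two-joint arm. The matrix entries are hypothetical; real values
# depend on the arm's masses, segment lengths, and configuration x.

def torques(A, u_desired):
    """Multiply the 2x2 configuration-dependent tensor A by the
    desired end-effector velocity to obtain the two joint torques."""
    return [A[0][0] * u_desired[0] + A[0][1] * u_desired[1],
            A[1][0] * u_desired[0] + A[1][1] * u_desired[1]]

A = [[1.5, -0.3],      # hypothetical A(x) for one arm configuration
     [0.2,  0.8]]
tau = torques(A, [1.0, 0.0])   # unit-speed movement along +x
```

With the identity tensor, the torques simply echo the desired velocity, which is a convenient sanity check on the matrix product.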
Figure 7.11 This figure shows a schematic of a simple robot arm and its space of permitted movement. The arm consists of two massless segments of length 1.0 and 0.9, with unit, point masses at its distal joint, d, and its end effector, e. The end effector begins at some randomly selected location, x, within the region R. The joint angles are θ1 and θ2. The desired movement of the arm is to have the end effector move at some randomly selected velocity, u_desired. For this movement to be accomplished, torques τ1 and τ2 must be applied at the joints.
Ritter and Schulten begin with a two-dimensional array of units identified by their integer coordinates, (i, j), within the region R of Figure 7.11. Instead of using the coordinates of a selected point as inputs, they use the corresponding values of the joint angles. Given suitable restrictions on the values of θ1 and θ2, there will be a one-to-one correspondence between the joint-angle vector, θ = (θ1, θ2)', and the coordinate vector, x = (x1, x2)'. Other than this change of variables, and the use of a different model for the Mexican-hat function, the development of the map proceeds as described in Section 7.1:
1. Select a point x within R according to a uniform random distribution.

2. Determine the corresponding joint-angle vector, θ*.
3. Select the winning unit, y*, such that

   ||θ(y*) − θ*|| = min_y ||θ(y) − θ*||

4. Update the theta vector for all units according to

   θ(y, t+1) = θ(y, t) + h1(y − y*, t)(θ* − θ(y, t))

The function h1(y − y*, t) defines the model of the Mexican-hat function:
It is a Gaussian function centered on the winning unit. Therefore, the neighborhood around the winning unit that gets to share in the victory encompasses all of the units. Unlike in the example in Section 7.1, however, the magnitude of the weight updates for the neighboring units decreases as a function of distance from the winning unit. Also, the width of the Gaussian is decreased as learning proceeds.
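Steps 1 through 4 can be sketched in Python as follows; the grid size, the Gaussian width sigma0, and its decay rate are placeholder choices, not values given by Ritter and Schulten.

```python
import math

def h1(dy, dx, t, sigma0=3.0, decay=0.01):
    """Gaussian model of the Mexican-hat function, centered on the
    winner; its width shrinks as learning proceeds."""
    sigma = sigma0 * math.exp(-decay * t)
    return math.exp(-(dy * dy + dx * dx) / (2.0 * sigma * sigma))

def som_step(theta, theta_star, t):
    """One pass of steps 3 and 4: pick the winner by smallest
    joint-angle distance, then pull every unit's theta vector toward
    theta_star by an amount that falls off with grid distance."""
    def dist2(th):
        return (th[0] - theta_star[0]) ** 2 + (th[1] - theta_star[1]) ** 2
    wi, wj = min(theta, key=lambda ij: dist2(theta[ij]))
    for (i, j), th in list(theta.items()):
        h = h1(i - wi, j - wj, t)
        theta[(i, j)] = (th[0] + h * (theta_star[0] - th[0]),
                         th[1] + h * (theta_star[1] - th[1]))
    return (wi, wj)

# a small grid of units, each holding a joint-angle pair
theta = {(i, j): (0.1 * i, 0.1 * j) for i in range(5) for j in range(5)}
winner = som_step(theta, (0.21, 0.21), t=0)
```

Because h1 is 1 at the winner itself, the winner's vector lands exactly on θ*, while more distant units take proportionally smaller steps.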
So far, we have not said anything about how the A(x) matrices are learned. That task is facilitated by association of one A tensor with each unit of the SOM network. Then, as winning units are selected according to the procedure given, A matrices are updated right along with the θ vectors. We can determine how to adjust the A matrices by using the difference between the desired motion, u_desired, and the actual motion, v, to determine successive approximations to A. In principle, we do not need a SOM to accomplish this adjustment. We could pick a starting location, then investigate all possible velocities starting from that point, and iterate A until it converges to give the expected movements. Then, we would select another starting point and repeat the exercise. We could continue this process until all starting locations have been visited and all As have been determined.
The advantage of using a SOM is that all A matrices are updated simultaneously, based on the corrections determined for only one starting location. Moreover, the magnitude of the corrections for neighboring units ensures that their A matrices are brought close to their correct values quickly, perhaps even before their associated units have been selected via the θ competition. So, to pick up the algorithm where we left off:
5. Select a desired velocity, u, with random direction and unit magnitude, ||u|| = 1. Execute an arm movement with torques computed from T = A(x)u, and observe the actual end-effector velocity, v.

6. Calculate an improved estimate of the A tensor for the winning unit:

   A(y*, t+1) = A(y*, t) + εA(y*, t)(u − v)v'

   where ε is a positive constant less than 1.

7. Finally, update the A tensor for all units according to

   A(y, t+1) = A(y, t) + h2(y − y*, t)(A(y*, t+1) − A(y, t))

where h2 is a Gaussian function whose width decreases with time.
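Steps 6 and 7 might be sketched as follows; the neighborhood function h2 and its parameters are placeholder assumptions, and each grid unit stores its own 2×2 tensor.

```python
import math

def h2(unit, winner, t, sigma0=2.0, decay=0.01):
    """Gaussian neighborhood in grid distance; width shrinks with time."""
    sigma = sigma0 * math.exp(-decay * t)
    d2 = (unit[0] - winner[0]) ** 2 + (unit[1] - winner[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma * sigma))

def update_A(A, winner, u, v, t, eps=0.1):
    """Step 6: improved estimate A* = A + eps*A(u - v)v' for the
    winner. Step 7: pull every unit's tensor toward A*."""
    Aw = A[winner]
    d = [u[0] - v[0], u[1] - v[1]]                # velocity error u - v
    Ad = [Aw[0][0] * d[0] + Aw[0][1] * d[1],      # matrix product A(u - v)
          Aw[1][0] * d[0] + Aw[1][1] * d[1]]
    A_star = [[Aw[r][c] + eps * Ad[r] * v[c]      # outer product with v'
               for c in range(2)] for r in range(2)]
    for unit in A:
        h = h2(unit, winner, t)
        A[unit] = [[A[unit][r][c] + h * (A_star[r][c] - A[unit][r][c])
                    for c in range(2)] for r in range(2)]

# two units, both starting from the identity tensor
A = {(0, 0): [[1.0, 0.0], [0.0, 1.0]], (0, 1): [[1.0, 0.0], [0.0, 1.0]]}
update_A(A, (0, 0), u=[1.0, 0.0], v=[1.0, 0.0], t=0)
```

When the observed velocity already matches the desired one (u = v), the correction term vanishes and the tensors are left unchanged.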
The result of using a SOM in this manner is a significant decrease in the convergence time for the A tensors. Moreover, the investigators reported that the system was more robust, in the sense of being less sensitive to the initial values of the A tensors.
7.3 SIMULATING THE SOM
As we have seen, the SOM is a relatively uncomplicated network in that it has only two layers of units. Therefore, the simulation of this network will not tax the capacity of the general network data structures with which we have, by now, become familiar. The SOM, however, adds at least one interesting twist to the notion of the layer structure used by most other networks; this is the first time we have dealt with a layer of units that is organized as a two-dimensional matrix, rather than as a simple one-dimensional vector. To accommodate this new dimension, we will decompose the matrix conceptually into a single vector containing all the row vectors from the original matrix. As you will see in the following discussion, this matrix decomposition allows the SOM simulator to be implemented with minimal modifications to the general data structures described in Chapter 1.
7.3.1 The SOM Data Structures
From our theoretical discussion earlier in this chapter, we know that the SOM is structured as a two-layer network, with a single vector of input units providing stimulation to a rectangular array of output units. Furthermore, units in the output layer are interconnected to allow lateral inhibition and excitation, as illustrated in Figure 7.12(a). This network structure will be rather cumbersome to simulate if we attempt to model the network precisely as illustrated, because we will have to iterate on the row and column offsets of the output units. Since we have chosen to organize our network connection structures as discrete, single-dimension arrays accessed through an intermediate array, there is no straightforward means of defining a matrix of connection arrays without modifying most of the general network structures. We can, however, reduce the complexity of the simulation task by conceptually unpacking the matrix of units in the output layer, reforming them as a single layer of units organized as a long vector composed of the concatenation of the original row vectors.
In so doing, we will have essentially restructured the network such that it resembles the more familiar two-layer structure, as shown in Figure 7.12(b). As we shall see, the benefit of restructuring the network in this manner is that it will enable us to efficiently locate, and update, the neighborhood surrounding the winning unit in the competition.

If we also observe that the connections between the units in the output layer can be simulated on the host computer system as an algorithmic determination of the winning unit (and its associated neighborhood), we can reduce the processing
Figure 7.12 The conceptual model of the SOM is shown (a) as described by the theoretical model, and (b) restructured to ease the simulation task.
model of the SOM network to a simple two-layer, feedforward structure. This reduction allows us to simulate the SOM by using exactly the data structures described in Chapter 1. The only network-specific structure needed to implement the simulator is then the top-level network-specification record. For the SOM, such a record takes the following form:
record SOM =
  ROWS    : integer;   {number of rows in output layer}
  COLS    : integer;   {ditto for columns}
  INPUTS  : ^layer;    {pointer to input layer structure}
  OUTPUTS : ^layer;    {pointer to output layer structure}
  WINNER  : integer;   {index to winning unit}
  deltaR  : integer;   {neighborhood row offset}
  deltaC  : integer;   {neighborhood column offset}
  TIME    : integer;   {discrete timestep}
end record;
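For readers following along in a high-level language, the record might translate into a Python dataclass like this; it is a sketch in which the layer pointers are flattened into plain lists, with field names mirroring the pseudocode.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SOM:
    """Python analogue of the top-level SOM specification record."""
    ROWS: int                    # number of rows in output layer
    COLS: int                    # ditto for columns
    INPUTS: List[float] = field(default_factory=list)        # input-layer outputs
    WEIGHTS: List[List[float]] = field(default_factory=list) # one weight vector per output unit
    WINNER: int = 0              # index to winning unit
    deltaR: int = 0              # neighborhood row offset
    deltaC: int = 0              # neighborhood column offset
    TIME: int = 0                # discrete timestep

net = SOM(ROWS=4, COLS=5)
net.deltaR, net.deltaC = net.ROWS // 2, net.COLS // 2
```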
7.3.2 SOM Algorithms
Let us now turn our attention to the process of implementing the SOM simulator. As in previous chapters, we shall begin by describing the algorithms needed to propagate information through the network, and shall conclude this section by describing the training algorithms. Throughout the remainder of this section, we presume that you are by now familiar with the data structures we use to simulate a layered network. Anyone not comfortable with these structures is referred to Section 1.4.
SOM Signal Propagation. In Section 6.4.4, we described a modification to the counterpropagation network that used the magnitude of the difference vector between the unnormalized input and weight vectors as the basis for determining the activation of a unit on the competitive layer. We shall now see that this approach is a viable means of implementing competition, since it is the basic method of stimulating output units in the SOM.
In the SOM, the input layer is provided only to store the input vector. For that reason, we can consider the process of forward signal propagation to be a matter of allowing the computer to visit all units in the output layer sequentially. At each output-layer unit, the computer calculates the magnitude of the difference vector between the output of the input layer and the weight vector formed by the connections between the input layer and the current unit. After completion of this calculation, the magnitude will be stored, and the computer will move on to the next unit on the layer. Once all the output-layer units have been processed, the forward signal propagation is finished, and the output of the network will be the matrix containing the magnitude of the difference vector for each unit in the output layer.

If we also consider the training process, we can allow the computer to store locally an index (or pointer) to locate the output unit that had the smallest
difference-vector magnitude during the initial pass. That index can then be used to identify the winner of the competition. By adopting this approach, we can also use the routine used to forward propagate signals in the SOM during training with no modifications.
Based on this strategy, we shall define the forward signal-propagation algorithm to be the combination of two routines: one to compute the difference-vector magnitude for a specified unit on the output layer, and one to call the first routine for every unit on the output layer. We shall call these routines prop and propagate, respectively. We begin with the definition of prop.

function prop (NET:SOM; UNIT:integer) return float
{compute the magnitude of the difference vector for UNIT}
var invec, connects : ^float[];  {locate arrays}
    sum, mag : float;            {temporary variables}
    i : integer;                 {iteration counter}
begin
  invec = NET.INPUTS^.OUTS^;     {locate input vector}
  connects = NET.OUTPUTS^.WEIGHTS^[UNIT]; {connections}
  sum = 0;                       {initialize sum}
  for i = 1 to length(invec)     {for all inputs}
  do                             {square of difference}
    sum = sum + sqr(invec[i] - connects[i]);
  end do;
  mag = sqrt(sum);               {magnitude of difference}
  return (mag);                  {return magnitude}
end function;
Now that we can compute the output value for any unit on the output layer, let us consider the routine to generate the output for the entire network. Since we have defined our SOM network as a standard, two-layer network, the pseudocode definition for propagate is straightforward.
function propagate (NET:SOM) return integer
{propagate forward through the SOM, return the index to the winner}
var outvec : ^float[];           {locate output array}
    winner, i : integer;
    mag, smallest : float;
begin
  outvec = NET.OUTPUTS^.OUTS^;   {locate output array}
  winner = 1;                    {initialize winner}
  smallest = 1.0E38;             {arbitrarily high}
  for i = 1 to (NET.ROWS * NET.COLS) {for all outputs}
  do
    mag = prop(NET, i);          {activate unit}
    outvec[i] = mag;             {save output}
    if (mag < smallest)          {if new winner is found}
    then
      winner = i;                {mark new winner}
      smallest = mag;            {save winning value}
    end if;
  end do;
  NET.WINNER = winner;           {store winning unit id}
  return (winner);               {identify winner}
end function;
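In a language with list comprehensions, prop and propagate collapse to a few lines; in this sketch, weights is one flat list of weight vectors, indexed like the unpacked output layer.

```python
def prop(inputs, weights, unit):
    """Magnitude of the difference vector between the input vector
    and the weight vector of one output unit."""
    return sum((x - w) ** 2 for x, w in zip(inputs, weights[unit])) ** 0.5

def propagate(inputs, weights):
    """Visit every output unit, save its magnitude, and return the
    index of the unit with the smallest one: the winner."""
    outvec = [prop(inputs, weights, i) for i in range(len(weights))]
    winner = min(range(len(outvec)), key=outvec.__getitem__)
    return winner, outvec

weights = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
winner, outvec = propagate([0.9, 1.1], weights)   # unit 1 is closest
```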
SOM Learning Algorithms. Now that we have developed a means for performing the forward signal propagation in the SOM, we have also solved the largest part of the problem of training the network. As described by Eq. (7.4), learning in the SOM takes place by updating of the connections to the set of output units that fall within the neighborhood of the winning unit. We have already provided the means for determining the winner as part of the forward signal propagation; all that remains to be done to train the network is to develop the processes that define the neighborhood (N_c) and update the connection weights. Unfortunately, the process of determining the neighborhood surrounding the winning unit is likely to be application dependent. For example, consider the two applications described earlier, the neural phonetic typewriter and the ballistic arm movement systems. Each implemented a SOM as the basic mechanism for solving their respective problems, but each also utilized a neighborhood-selection mechanism that was best suited to the application being addressed. It is likely that other problems would also require alternative methods better suited to determining the size of the neighborhood needed for each application. Therefore, we will not presuppose that we can define a universally acceptable function for N_c.
We will, however, develop the code necessary to describe a typical neighborhood-selection function, trusting that you will learn enough from the example to construct a function suitable for your applications. For simplicity, we will design the process as two functions: the first will return a true-false flag to indicate whether a certain unit is within the neighborhood of the winning unit at the current timestep, and the second will update the connection values at an output unit, if the unit falls within the neighborhood of the winning unit. The first of these routines, which we call neighbor, will return a true flag if the row and column coordinates of the unit given as input fall within the range of units to be updated. This process proves to be relatively easy, in that the routine needs to perform only the following two tests:

   (R_w − ΔR) ≤ R ≤ (R_w + ΔR)
   (C_w − ΔC) ≤ C ≤ (C_w + ΔC)
Figure 7.13 A simple scheme is shown for dynamically altering the size of the neighborhood surrounding the winning unit. In this diagram, W denotes the winning unit for a given input vector. The neighborhood surrounding the winning unit is then given by the values contained in the variables deltaR and deltaC contained in the SOM record. As the values in deltaR and deltaC approach zero, the neighborhood surrounding the winning unit shrinks, until the neighborhood is precisely the winning unit.
where (R_w, C_w) are the row and column coordinates of the winning unit, (ΔR, ΔC) are the row and column offsets from the winning unit that define the neighborhood, and (R, C) are the row and column coordinates of the unit being tested.

For example, consider the situation illustrated in Figure 7.13. Notice that the boundary surrounding the winner's neighborhood shrinks with successively smaller values for (ΔR, ΔC), until the neighborhood is limited to the winner when (ΔR, ΔC) = (0, 0). Thus, we need only to alter the values for (ΔR, ΔC) in order to change the size of the winner's neighborhood.
So that we can implement this mechanism of neighborhood determination, we have incorporated two variables in the SOM record, named deltaR and deltaC, which allow the network record to keep the current values for the ΔR and ΔC terms. Having made this observation, we can now define the algorithm needed to implement the neighbor function.
function neighbor (NET:SOM; R,C,W:integer) return boolean
{return true if (R,C) is in the neighborhood of W}
var row, col,               {coordinates of winner}
    dR1, dC1,               {coordinates of lower boundary}
    dR2, dC2 : integer;     {coordinates of upper boundary}
begin
  row = ((W - 1) div NET.COLS) + 1;  {winner's row}
  col = ((W - 1) mod NET.COLS) + 1;  {winner's column}
  dR1 = max(1, (row - NET.deltaR));
  dR2 = min(NET.ROWS, (row + NET.deltaR));
  dC1 = max(1, (col - NET.deltaC));
  dC2 = min(NET.COLS, (col + NET.deltaC));
  return (((dR1 <= R) and (R <= dR2)) and
          ((dC1 <= C) and (C <= dC2)));
end function;
Note that the algorithm for neighbor relies on the fact that the array indices for the winning unit (W) and the number of rows and columns in the SOM output layer are presumed to start at 1 and to run through n. If the first index is presumed to be zero, the determination of the row and col values described must be adjusted, since zero divided by anything is zero. Similarly, the min and max functions utilized in the algorithm are needed to protect against the case where the winning unit is located on an "edge" of the network output.

Now that we can determine whether or not a unit is in the neighborhood of the winning unit in the SOM, all that remains to complete the implementation
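A Python rendering of the neighbor test, with the 1-based index convention and the edge clipping described above; the row and column recovery from W is an assumption consistent with that convention.

```python
def neighbor(rows, cols, r, c, w, dR, dC):
    """True if unit (r, c) lies within the (dR, dC) rectangle around
    winning unit index w; all indices are 1-based."""
    row = (w - 1) // cols + 1            # winner's row
    col = (w - 1) % cols + 1             # winner's column
    dR1, dR2 = max(1, row - dR), min(rows, row + dR)   # clip at grid edges
    dC1, dC2 = max(1, col - dC), min(cols, col + dC)
    return dR1 <= r <= dR2 and dC1 <= c <= dC2

inside = neighbor(4, 5, 1, 1, 7, 1, 1)   # winner 7 sits at (2, 2)
```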
of the training algorithms is the function needed to update the weights to all theunits that require updating We shall design this algorithm to return the number
of units updated in the SOM, so that the calling process can determine when theneighborhood around the winning unit has shrunk to just the winning unit (i.e.,when the number of units updated is equal to 1) Also, to simplify things, we
shall assume that the a(t) term given in the weight-update equation (Eq 7.2) is
simply a small constant value, rather than a function of time In this example
algorithm, we define the a(t) parameter as the value A.
function update (NET:SOM) return integer
{update the weights to all winning units,
 returning the number of winners updated}
constant A : float = 0.3;        {simple activation constant}
var winner, unit, upd : integer; {indices to output units}
    invec : ^float[];            {locate unit output arrays}
    connect : ^float[];          {locate connection array}
    i, j, k : integer;           {iteration counters}
begin
  winner = propagate (NET);      {propagate and find winner}
  unit = 1;                      {start at first output unit}
  upd = 0;                       {no updates yet}
  for i = 1 to NET.ROWS          {for all rows}
  do
    for j = 1 to NET.COLS        {for all columns}
    do
      if (neighbor(NET, i, j, winner)) {if in neighborhood}
      then
        {first locate the appropriate connection array}
        connect = NET.OUTPUTS^.WEIGHTS^[unit];
        {then locate the input layer output array}
        invec = NET.INPUTS^.OUTS^;
        upd = upd + 1;           {count another update}
        for k = 1 to length(connect) {for all connections}
        do
          connect[k] = connect[k]
                       + (A * (invec[k] - connect[k]));
        end do;
      end if;
      unit = unit + 1;           {move to next output unit}
    end do;
  end do;
  return (upd);                  {return number updated}
end function;
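Putting the pieces together, the update cycle might look like this in Python; it uses 0-based indices, unlike the pseudocode, and alpha plays the role of the constant A.

```python
def update(inputs, weights, cols, dR, dC, alpha=0.3):
    """One training cycle: find the winner by smallest difference
    vector, then move every weight vector in the winner's (dR, dC)
    neighborhood a fraction alpha of the way toward the input.
    Returns the number of units updated."""
    mags = [sum((x - w) ** 2 for x, w in zip(inputs, wt)) for wt in weights]
    win = min(range(len(mags)), key=mags.__getitem__)
    wr, wc = divmod(win, cols)             # winner's 0-based coordinates
    upd = 0
    for u, wt in enumerate(weights):
        r, c = divmod(u, cols)
        if abs(r - wr) <= dR and abs(c - wc) <= dC:
            weights[u] = [w + alpha * (x - w) for x, w in zip(inputs, wt)]
            upd += 1
    return upd

weights = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]  # a 2x2 grid
n = update([1.1, 0.9], weights, cols=2, dR=0, dC=0)
```

With the offsets at zero, only the winner moves, so the return value of 1 signals that the neighborhood has collapsed, exactly the stopping condition the text describes.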
7.3.3 Training the SOM
Like most other networks, the SOM will be constructed so that it initially contains random information. The network will then be allowed to self-adapt by being shown example inputs that are representative of the desired topology. Our computer simulation ought to mimic the desired network behavior if we simply follow these same guidelines when constructing and training the simulator.

There are two aspects of the training process that are relatively simple to implement, and we assume that you will provide them as part of the implementation of the simulator. These functions are the ones needed to initialize the SOM (initialize) and to apply an input vector (set_inputs) to the input layer of the network.
Most of the work to be done in the simulator will be accomplished by the previously defined routines, and we need to concern ourselves now with only the notion of deciding how and when to collapse the winning neighborhood as we train the network. Here again, this aspect of the design probably will be influenced by the specific application, so, for instructional purposes, we will restrict ourselves to a fairly easy application that allows each of the output-layer units to be uniquely associated with a specific input vector.
For this example, let us assume that the SOM to be simulated has four rows of five columns of units in the output layer, and two units providing input. Such
Figure 7.14 This SOM network can be used to capture the organization of a two-dimensional image, such as a triangle, circle, or any regular polygon. In the programming exercises, we ask you to simulate this network structure to test the operation of your simulator program.
a network structure is depicted in Figure 7.14. We will code the SOM simulator so that the entire output layer is initially contained in the neighborhood, and we shall shrink the neighborhood by two rows and two columns after every 10 training patterns.
For the SOM, there are two distinct training sessions that must occur. In the first, we will train the network until the neighborhood has shrunk to the point that only one unit wins the competition. During the second phase, which occurs after all the training patterns have been allocated to a winning unit (although not necessarily different units), we will simply continue to run the training algorithm for an arbitrarily large number of additional cycles. We do this to try to ensure that the network has stabilized, although there is no absolute guarantee that it has. With this strategy in mind, we can now complete the simulator by constructing the routine to initialize and train the network.
procedure train (NET:SOM; NP:integer)
{train the network for each of NP patterns}
var i, j, dummy : integer;     {iteration counters}
begin
  NET.deltaR = NET.ROWS / 2;   {initialize the row offset}
  NET.deltaC = NET.COLS / 2;   {ditto for columns}
  NET.TIME = 0;                {reset time counter}
  for i = 1 to NP              {for all patterns}
  do
    set_inputs(NET, i);        {get training pattern}
    while (update(NET) > 1)    {loop until one winner}
    do
      NET.TIME = NET.TIME + 1; {advance training counter}
      if (NET.TIME % 10 = 0)   {if shrink time}
      then
        {shrink the neighborhood, with a floor of (0,0)}
        NET.deltaR = max (0, NET.deltaR - 1);
        NET.deltaC = max (0, NET.deltaC - 1);
      end if;
      for j = 1 to NP          {for all patterns}
      do
        set_inputs(NET, j);    {set training pattern}
        dummy = update(NET);   {train network}
      end do;
    end do;
  end do;
end procedure;
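The shrink schedule embedded in train can be isolated as a small helper; note that decrementing each offset by 1 narrows the neighborhood by two rows and two columns overall, one on each side of the winner, matching the schedule described above.

```python
def shrink_schedule(rows, cols, presentations, every=10):
    """Neighborhood offsets (deltaR, deltaC) after a given number of
    pattern presentations: start at half the grid size and shrink by
    one every `every` presentations, with a floor of (0, 0)."""
    steps = presentations // every
    return max(0, rows // 2 - steps), max(0, cols // 2 - steps)

# offsets for the 4-row, 5-column example network over time
offsets = [shrink_schedule(4, 5, p) for p in (0, 10, 20, 30)]
```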
Programming Exercises
7.1 Implement the SOM simulator. Test it by constructing a network similar to the one depicted in Figure 7.14. Train the network using the Cartesian coordinates of each unit in the output layer as training data. Experiment with different time periods to determine how many training passes are optimal before reducing the size of the neighborhood.
7.2 Repeat Programming Exercise 7.1, but this time extend the simulator to plot the network dynamics using Kohonen's method, as described in Section 7.1. If you do not have access to a graphics terminal, simply list out the connection-weight values to each unit in the output layer as a set of ordered pairs at various timesteps.
7.3 The converge algorithm given in the text is not very general, in that it will work only if the number of output units in the SOM is exactly equal to the number of training patterns to be encoded. Redesign the routine to handle the case where the number of training patterns greatly outnumbers the number of output-layer units. Test the algorithm by repeating Programming Exercise 7.1 and reducing the number of output units used to three rows of four units.
7.4 Repeat Programming Exercise 7.2, this time using three input units. Configure the output layer appropriately, and train the network to learn to map a three-dimensional cube in the first quadrant (all vertices should contain positive coordinates). Do not put the vertices of the cube at integer coordinates. Does the network do as well as the network in Programming Exercise 7.2?
Suggested Readings
The best supplement to the material in this chapter is Kohonen's text on self-organization [2]. Now in its second edition, that text also contains general background material regarding various learning methods for neural networks, as well as a review of the necessary mathematics.

Bibliography
[1] William Y. Huang and Richard P. Lippmann. Neural net and traditional classifiers. In Proceedings of the Conference on Neural Information Processing Systems, Denver, CO, November 1987.

[2] Teuvo Kohonen. Self-Organization and Associative Memory, volume 8 of Springer Series in Information Sciences. Springer-Verlag, New York, 1984.

[3] Teuvo Kohonen. The "neural" phonetic typewriter. Computer, 21(3):11-22, March 1988.

[4] H. Ritter and K. Schulten. Topology conserving mappings for learning motor tasks. In John S. Denker, editor, Neural Networks for Computing, pp. 376-380. American Institute of Physics, New York, 1986.

[5] H. Ritter and K. Schulten. Extending Kohonen's self-organizing mapping algorithm to learn ballistic movements. In Rolf Eckmiller and Christoph v.d. Malsburg, editors, Neural Computers, pp. 393-406. Springer-Verlag, Heidelberg, 1987.

[6] Helge J. Ritter, Thomas M. Martinetz, and Klaus J. Schulten. Topology-conserving maps for learning visuo-motor-coordination. Neural Networks, 2(3):159-168, 1989.
C H A P T E R  8
Adaptive Resonance Theory
One of the nice features of human memory is its ability to learn many new things without necessarily forgetting things learned in the past. A frequently cited example is the ability to recognize your parents even if you have not seen them for some time and have learned many new faces in the interim. It would be highly desirable if we could impart this same capability to an ANS. Most networks that we have discussed in previous chapters will tend to forget old information if we attempt to add new information incrementally.
When developing an ANS to perform a particular pattern-classification operation, we typically proceed by gathering a set of exemplars, or training patterns, then using these exemplars to train the system. During the training, information is encoded in the system by the adjustment of weight values. Once the training is deemed to be adequate, the system is ready to be put into production, and no additional weight modification is permitted.

This operational scenario is acceptable provided the problem domain has well-defined boundaries and is stable. Under such conditions, it is usually possible to define an adequate set of training inputs for whatever problem is being solved. Unfortunately, in many realistic situations, the environment is neither bounded nor stable.
Consider a simple example. Suppose you intend to train a BPN to recognize the silhouettes of a certain class of aircraft. The appropriate images can be collected and used to train the network, which is potentially a time-consuming task depending on the size of the network required. After the network has learned successfully to recognize all of the aircraft, the training period is ended and no further modification of the weights is allowed.
If, at some future time, another aircraft in the same class becomes operational, you may wish to add its silhouette to the store of knowledge in your network. To do this, you would have to retrain the network with the new pattern plus all of the previous patterns. Training on only the new silhouette could result in the network learning that pattern quite well, but forgetting previously learned patterns. Although retraining may not take as long as the initial training, it still could require a significant investment.
Moreover, if an ANS is presented with a previously unseen input pattern, there is generally no built-in mechanism for the network to be able to recognize the novelty of the input. The ANS doesn't know that it doesn't know the input pattern.
We have been describing what Stephen Grossberg calls the stability-plasticity dilemma [5]. This dilemma can be stated as a series of questions [6]: How can a learning system remain adaptive (plastic) in response to significant input, yet remain stable in response to irrelevant input? How does the system know to switch between its plastic and its stable modes? How can the system retain previously learned information while continuing to learn new things?
In response to such questions, Grossberg, Carpenter, and numerous colleagues developed adaptive resonance theory (ART), which seeks to provide answers. ART is an extension of the competitive-learning schemes that have been discussed in Chapters 6 and 7. The material in Section 6.1, especially, should be considered a prerequisite to the current chapter. We will draw heavily from those results, so you should review the material, if necessary, before proceeding.
In the competitive systems discussed in Chapter 6, nodes compete with one another, based on some specified criteria, and the winner is said to classify the input pattern. Certain instabilities can arise in these networks such that different nodes might respond to the same input pattern on different occasions. Moreover, later learning can wash away earlier learning if the environment is not statistically stationary or if novel inputs arise [9].
A key to solving the stability-plasticity dilemma is to add a feedback mechanism between the competitive layer and the input layer of a network. This feedback mechanism facilitates the learning of new information without destroying old information, automatic switching between stable and plastic modes, and stabilization of the encoding of the classes done by the nodes. The results from this approach are two neural-network architectures that are particularly suited for pattern-classification problems in realistic environments. These network architectures are referred to as ART1 and ART2. ART1 and ART2 differ in the nature of their input patterns. ART1 networks require that the input vectors be binary. ART2 networks are suitable for processing analog, or gray-scale, patterns.

ART gets its name from the particular way in which learning and recall interplay in the network. In physics, resonance occurs when a small-amplitude vibration of the proper frequency causes a large-amplitude vibration in an electrical or mechanical system. In an ART network, information in the form of processing-element outputs reverberates back and forth between layers. If the proper patterns develop, a stable oscillation ensues, which is the neural-network
equivalent of resonance. During this resonant period, learning, or adaptation, can occur. Before the network has achieved a resonant state, no learning takes place, because the time required for changes in the processing-element weights is much longer than the time that it takes the network to achieve resonance.

A resonant state can be attained in one of two ways. If the network has learned previously to recognize an input vector, then a resonant state will be achieved quickly when that input vector is presented. During resonance, the adaptation process will reinforce the memory of the stored pattern. If the input vector is not immediately recognized, the network will rapidly search through its stored patterns looking for a match. If no match is found, the network will enter a resonant state whereupon the new pattern will be stored for the first time. Thus, the network responds quickly to previously learned data, yet remains able to learn when novel data are presented.
Much of Grossberg's work has been concerned with modeling actual macroscopic processes that occur within the brain in terms of the average properties of collections of the microscopic components of the brain (neurons). Thus, a Grossberg processing element may represent one or more actual neurons. In keeping with our practice, we shall not dwell on the neurological implications of the theory. There exists a vast body of literature available concerning this work. Work with these theories has led to predictions about neurophysiological processes, even down to the chemical-ion level, which have subsequently been proven true through research by neurophysiologists [6]. Numerous references are listed at the end of this chapter.
The equations that govern the operation of the ART networks are quite complicated. It is easy to lose sight of the forest while examining the trees closely. For that reason, we first present a qualitative description of the processing in ART networks. Once that foundation is laid, we shall return to a detailed discussion of the equations.
8.1 ART NETWORK DESCRIPTION
The basic features of the ART architecture are shown in Figure 8.1. Patterns of activity that develop over the nodes in the two layers of the attentional subsystem are called short-term memory (STM) traces because they exist only in association with a single application of an input vector. The weights associated with the bottom-up and top-down connections between F1 and F2 are called long-term memory (LTM) traces because they encode information that remains a part of the network for an extended period.
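The STM/LTM distinction can be made concrete with a small data-structure sketch. The container below is our own illustration, not a structure from the text (field names such as `bottom_up` and `clear_stm` are invented for it): the activity vectors are wiped with every new input presentation, while the weight matrices persist across presentations.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class ARTState:
    """Sketch separating STM traces (activities) from LTM traces (weights)."""
    n_f1: int
    n_f2: int
    bottom_up: np.ndarray = None   # LTM: weights on F1 -> F2 connections
    top_down: np.ndarray = None    # LTM: weights on F2 -> F1 connections
    x: np.ndarray = None           # STM: pattern of activity across F1
    y: np.ndarray = None           # STM: pattern of activity across F2

    def __post_init__(self):
        self.bottom_up = np.zeros((self.n_f2, self.n_f1))
        self.top_down = np.ones((self.n_f1, self.n_f2))
        self.clear_stm()

    def clear_stm(self):
        # STM exists only for a single input presentation; clearing it
        # leaves the LTM weight matrices untouched.
        self.x = np.zeros(self.n_f1)
        self.y = np.zeros(self.n_f2)
```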
8.1.1 Pattern Matching in ART
To illustrate the processing that takes place, we shall describe a hypothetical sequence of events that might occur in an ART network. The scenario is a simple pattern-matching operation during which an ART network tries to determine
[Figure 8.1 appears here: a block diagram of the gain control, the attentional subsystem with its F1 and F2 layers, the orienting subsystem, the reset signal, and the input vector.]
Figure 8.1 The ART system is diagrammed. The two major subsystems are the attentional subsystem and the orienting subsystem. F1 and F2 represent two layers of nodes in the attentional subsystem. Nodes on each layer are fully interconnected to the nodes on the other layer. Not shown are interconnects among the nodes on each layer. Other connections between components are indicated by the arrows. A plus sign indicates an excitatory connection; a minus sign indicates an inhibitory connection. The function of the gain control and orienting subsystem is discussed in the text.
whether an input pattern is among the patterns previously stored in the network. Figure 8.2 illustrates the operation.
In Figure 8.2(a), an input pattern, I, is presented to the units on F1 in the same manner as in other networks: one vector component goes to each node. A pattern of activation, X, is produced across F1. The processing done by the units on this layer is a somewhat more complicated form of that done by the input layer of the CPN (see Section 6.1). The same input pattern excites both the orienting subsystem, A, and the gain control, G (the connections to G are not shown on the drawings). The output pattern, S, results in an inhibitory signal that is also sent to A. The network is structured such that this inhibitory signal exactly cancels the excitatory effect of the signal from I, so that A remains inactive. Notice that G supplies an excitatory signal to F1. The same signal is applied to each node on the layer and is therefore known as a nonspecific signal. The need for this signal will be made clear later.
The appearance of X on F1 results in an output pattern, S, which is sent through connections to F2. Each F2 unit receives the entire output vector, S,
Figure 8.2 A pattern-matching cycle in an ART network is shown. The process evolves from the initial presentation of the input pattern in (a), to a pattern-matching attempt in (b), to reset in (c), to the final recognition in (d). Details of the cycle are discussed in the text.
from F1. F2 units calculate their net-input values in the usual manner, by summing the products of the input values and the connection weights. In response to inputs from F1, a pattern of activity, Y, develops across the nodes of F2. F2 is a competitive layer that performs a contrast enhancement on the input signal, like the competitive layer described in Section 6.1. The gain-control signals to F2 are omitted here for simplicity.
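The F2 computation just described can be sketched in a few lines. The sketch is ours, with the contrast enhancement pushed to its winner-take-all extreme (a common simplification of the competitive dynamics); the weight values used in the example are illustrative placeholders, not values from the text.

```python
import numpy as np

def f2_response(s, bottom_up):
    """Net inputs and contrast enhancement on F2 (simplified sketch).

    bottom_up[j, i] is the LTM weight on the connection from F1 unit i
    to F2 unit j. Each F2 unit sums the products of the F1 output
    vector S and its weights; the competition then leaves only the
    unit with the largest net input active.
    """
    net = bottom_up @ np.asarray(s, dtype=float)  # net-input values
    y = np.zeros_like(net)
    y[np.argmax(net)] = 1.0                       # winner takes all
    return net, y
```

For example, with `bottom_up = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]` and `S = [1, 0, 1]`, the net inputs are `[0.9, 0.2]`, so only the first F2 unit remains active in Y.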
In Figure 8.2(b), the pattern of activity, Y, results in an output pattern, U, from F2. This output pattern is sent as an inhibitory signal to the gain-control system. The gain control is configured such that, if it receives any inhibitory signal from F2, it ceases activity. U also becomes a second input pattern for the F1 units. U is transformed by the LTM traces on the top-down connections from F2 to F1. We shall call this transformed pattern V.
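The transformation of U into V can be written out directly. In the simplified case where U is winner-take-all (a single active F2 unit), V is just the winning unit's column of the top-down LTM matrix; the weight values below are illustrative placeholders of our own, not values from the text.

```python
import numpy as np

# top_down[i, j]: illustrative LTM weight on the top-down connection
# from F2 unit j to F1 unit i.
top_down = np.array([[1.0, 0.0],
                     [1.0, 1.0],
                     [0.0, 1.0]])

u = np.array([1.0, 0.0])  # F2 output U: unit 0 won the competition
v = top_down @ u          # transformed pattern V arriving back at F1
```

Here V equals the first column of the matrix, the stored top-down expectation associated with the winning F2 category.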
Notice that there are three possible sources of input to F1, but that only two appear to be used at any one time. The units on F1 (and F2 as well) are constructed so that they can become active only if two out of the possible