

276 Self-Organizing Maps


Figure 7.10 This illustration shows the sequence of responses from the phonotopic map resulting from the spoken Finnish word humppila. (Do not bother to look up the meaning of this word in your Finnish-English dictionary: humppila is the name of a place.) Source: Reprinted with permission from Teuvo Kohonen, "The neural phonetic typewriter," IEEE Computer, March 1988. ©1988 IEEE.

mechanism; thus, the torques that cause a particular motion must be known in advance. Figure 7.11 illustrates the simple, two-dimensional robot-arm model used in this example.

For a particular starting position, x, and a particular desired end-effector velocity, u_desired, the required torques can be found from

    T = A(x) u_desired

where T is the vector (τ₁, τ₂)′.³ The tensor quantity, A (here, simply a two-dimensional matrix), is determined by the details of the arm and its configuration. Ritter and Schulten use Kohonen's SOM algorithm to learn the A(x) quantities. A mechanism for learning the A tensors would be useful in a real environment where aging effects and wear might alter the dynamics of the arm over time. The first part of the method is virtually identical to the two-dimensional mapping example discussed in Section 7.1. Recall that, in that example, units

³ Torque itself is a vector quantity, defined as the time-rate of change of the angular momentum vector. Our vector T is a composite of the magnitudes of two torque vectors, τ₁ and τ₂. The directions of τ₁ and τ₂ can be accounted for by their signs: τ > 0 implies a counterclockwise rotation.


7.2 Applications of Self-Organizing Maps 277

Figure 7.11 This figure shows a schematic of a simple robot arm and its space of permitted movement. The arm consists of two massless segments of length 1.0 and 0.9, with unit point masses at its distal joint, d, and its end effector, e. The end effector begins at some randomly selected location, x, within the region R. The joint angles are θ₁ and θ₂. The desired movement of the arm is to have the end effector move at some randomly selected velocity, u_desired. For this movement to be accomplished, torques τ₁ and τ₂ must be applied at the joints.

Ritter and Schulten begin with a two-dimensional array of units identified by their integer coordinates, (i, j), within the region R of Figure 7.11. Instead of using the coordinates of a selected point as inputs, they use the corresponding values of the joint angles. Given suitable restrictions on the values of θ₁ and θ₂, there will be a one-to-one correspondence between the joint-angle vector, θ = (θ₁, θ₂)′, and the coordinate vector, x = (x₁, x₂)′. Other than this change of variables, and the use of a different model for the Mexican-hat function, the development of the map proceeds as described in Section 7.1:

1. Select a point x within R according to a uniform random distribution.

2. Determine the corresponding θ* =


3. Select the winning unit, y*, such that

   ‖θ(y*) − θ*‖ = min_y ‖θ(y) − θ*‖

4. Update the theta vector for all units according to

   θ(y, t+1) = θ(y, t) + h₁(y − y*, t)(θ* − θ(y, t))

The function h₁(y − y*, t) defines the model of the Mexican-hat function:

It is a Gaussian function centered on the winning unit. Therefore, the neighborhood around the winning unit that gets to share in the victory encompasses all of the units. Unlike in the example in Section 7.1, however, the magnitude of the weight updates for the neighboring units decreases as a function of distance from the winning unit. Also, the width of the Gaussian is decreased as learning proceeds.
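The Gaussian neighborhood model just described can be sketched as follows (a Python illustration; the initial width and decay constant are assumptions chosen for demonstration, not values from the text):

```python
import math

def h1(dist, t, sigma0=3.0, tau=200.0):
    """Gaussian model of the Mexican-hat function.

    dist -- distance of a unit from the winning unit (grid coordinates)
    t    -- training timestep; the width shrinks as learning proceeds
    """
    sigma = sigma0 * math.exp(-t / tau)            # width decreases with time
    return math.exp(-dist ** 2 / (2.0 * sigma ** 2))
```

Every unit receives a nonzero update, but the amount falls off with distance from the winner, and the effective neighborhood narrows as t grows.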

So far, we have not said anything about how the A(x) matrices are learned. That task is facilitated by association of one A tensor with each unit of the SOM network. Then, as winning units are selected according to the procedure given, A matrices are updated right along with the θ vectors.

We can determine how to adjust the A matrices by using the difference between the desired motion, u_desired, and the actual motion, v, to determine successive approximations to A. In principle, we do not need a SOM to accomplish this adjustment. We could pick a starting location, then investigate all possible velocities starting from that point, and iterate A until it converges to give the expected movements. Then, we would select another starting point and repeat the exercise. We could continue this process until all starting locations have been visited and all As have been determined.

The advantage of using a SOM is that all A matrices are updated simultaneously, based on the corrections determined for only one starting location. Moreover, the magnitude of the corrections for neighboring units ensures that their A matrices are brought close to their correct values quickly, perhaps even before their associated units have been selected via the θ competition. So, to pick up the algorithm where we left off:

5. Select a desired velocity, u, with random direction and unit magnitude, ‖u‖ = 1. Execute an arm movement with torques computed from T = A(x)u, and observe the actual end-effector velocity, v.

6. Calculate an improved estimate of the A tensor for the winning unit:

   A(y*, t+1) = A(y*, t) + εA(y*, t)(u − v)v′

   where ε is a positive constant less than 1.

7. Finally, update the A tensor for all units according to

   A(y, t+1) = A(y, t) + h₂(y − y*, t)(A(y*, t+1) − A(y, t))

   where h₂ is a Gaussian function whose width decreases with time.
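Collecting steps 1 through 7, a toy version of the Ritter-Schulten training loop might look like the following Python/NumPy sketch. Everything here is illustrative: the real arm dynamics are replaced by a fixed hidden linear map v = B·T so that the loop is self-contained, θ* is sampled directly rather than computed from arm geometry, and the neighborhood function h is a fixed-width Gaussian rather than one whose width shrinks over time.

```python
import numpy as np

rng = np.random.default_rng(0)
GRID = 8
theta = rng.uniform(0.0, 1.0, (GRID, GRID, 2))   # theta vector per unit
A = rng.uniform(0.5, 1.5, (GRID, GRID, 2, 2))    # one A tensor per unit
B = np.array([[0.8, 0.1], [0.2, 0.9]])           # hidden arm dynamics: v = B @ T
eps, sigma = 0.05, 2.0
rows, cols = np.mgrid[0:GRID, 0:GRID]

for t in range(2000):
    th_star = rng.uniform(0.0, 1.0, 2)           # steps 1-2: random point -> angles
    d = np.linalg.norm(theta - th_star, axis=-1)
    ys = np.unravel_index(np.argmin(d), d.shape) # step 3: winning unit y*
    h = np.exp(-((rows - ys[0])**2 + (cols - ys[1])**2) / (2 * sigma**2))
    theta += h[..., None] * (th_star - theta)    # step 4: update all theta vectors
    u = rng.normal(size=2)
    u /= np.linalg.norm(u)                       # step 5: random unit velocity
    T = A[ys] @ u                                # execute movement ...
    v = B @ T                                    # ... observe actual velocity
    A_new = A[ys] + eps * A[ys] @ np.outer(u - v, v)   # step 6: improve winner's A
    A += h[..., None, None] * (A_new - A)        # step 7: update all A tensors
```

Because the same correction is shared (scaled by h) with neighboring units, every unit's A tensor is pulled toward a usable estimate even before that unit ever wins the competition, which is precisely the advantage argued above.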


7.3 Simulating the SOM 279

The result of using a SOM in this manner is a significant decrease in the convergence time for the A tensors. Moreover, the investigators reported that the system was more robust, in the sense of being less sensitive to the initial values of the A tensors.

7.3 SIMULATING THE SOM

As we have seen, the SOM is a relatively uncomplicated network in that it has only two layers of units. Therefore, the simulation of this network will not tax the capacity of the general network data structures with which we have, by now, become familiar. The SOM, however, adds at least one interesting twist to the notion of the layer structure used by most other networks; this is the first time we have dealt with a layer of units that is organized as a two-dimensional matrix, rather than as a simple one-dimensional vector. To accommodate this new dimension, we will decompose the matrix conceptually into a single vector containing all the row vectors from the original matrix. As you will see in the following discussion, this matrix decomposition allows the SOM simulator to be implemented with minimal modifications to the general data structures described in Chapter 1.

7.3.1 The SOM Data Structures

From our theoretical discussion earlier in this chapter, we know that the SOM is structured as a two-layer network, with a single vector of input units providing stimulation to a rectangular array of output units. Furthermore, units in the output layer are interconnected to allow lateral inhibition and excitation, as illustrated in Figure 7.12(a). This network structure will be rather cumbersome to simulate if we attempt to model the network precisely as illustrated, because we will have to iterate on the row and column offsets of the output units. Since we have chosen to organize our network connection structures as discrete, single-dimension arrays accessed through an intermediate array, there is no straightforward means of defining a matrix of connection arrays without modifying most of the general network structures. We can, however, reduce the complexity of the simulation task by conceptually unpacking the matrix of units in the output layer, reforming them as a single layer of units organized as a long vector composed of the concatenation of the original row vectors.
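Concretely, this unpacking is just a row-major index correspondence; a minimal sketch in Python (using 0-based indices for brevity, whereas the pseudocode in this section is 1-based):

```python
def to_flat(row, col, n_cols):
    """Map grid coordinates (row, col) in the output layer to the
    unit's index in the concatenated row-vector representation."""
    return row * n_cols + col

def to_grid(unit, n_cols):
    """Inverse mapping: recover (row, col) from a flat unit index."""
    return divmod(unit, n_cols)
```

This pair of mappings is all the simulator needs in order to treat the two-dimensional output layer as an ordinary one-dimensional layer of units.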

In so doing, we will have essentially restructured the network such that it resembles the more familiar two-layer structure, as shown in Figure 7.12(b). As we shall see, the benefit of restructuring the network in this manner is that it will enable us to efficiently locate, and update, the neighborhood surrounding the winning unit in the competition.

If we also observe that the connections between the units in the output layer can be simulated on the host computer system as an algorithmic determination of the winning unit (and its associated neighborhood), we can reduce the processing


Figure 7.12 The conceptual model of the SOM is shown (a) as described by the theoretical model, and (b) restructured to ease the simulation task.


model of the SOM network to a simple two-layer, feedforward structure. This reduction allows us to simulate the SOM by using exactly the data structures described in Chapter 1. The only network-specific structure needed to implement the simulator is then the top-level network-specification record. For the SOM, such a record takes the following form:

record SOM =
    ROWS    : integer;   {number of rows in output layer}
    COLS    : integer;   {ditto for columns}
    INPUTS  : ^layer;    {pointer to input layer structure}
    OUTPUTS : ^layer;    {pointer to output layer structure}
    WINNER  : integer;   {index to winning unit}
    deltaR  : integer;   {neighborhood row offset}
    deltaC  : integer;   {neighborhood column offset}
    TIME    : integer;   {discrete timestep}
end record;

7.3.2 SOM Algorithms

Let us now turn our attention to the process of implementing the SOM simulator. As in previous chapters, we shall begin by describing the algorithms needed to propagate information through the network, and shall conclude this section by describing the training algorithms. Throughout the remainder of this section, we presume that you are by now familiar with the data structures we use to simulate a layered network. Anyone not comfortable with these structures is referred to Section 1.4.

SOM Signal Propagation. In Section 6.4.4, we described a modification to the counterpropagation network that used the magnitude of the difference vector between the unnormalized input and weight vectors as the basis for determining the activation of a unit on the competitive layer. We shall now see that this approach is a viable means of implementing competition, since it is the basic method of stimulating output units in the SOM.

In the SOM, the input layer is provided only to store the input vector. For that reason, we can consider the process of forward signal propagation to be a matter of allowing the computer to visit all units in the output layer sequentially. At each output-layer unit, the computer calculates the magnitude of the difference vector between the output of the input layer and the weight vector formed by the connections between the input layer and the current unit. After completion of this calculation, the magnitude will be stored, and the computer will move on to the next unit on the layer. Once all the output-layer units have been processed, the forward signal propagation is finished, and the output of the network will be the matrix containing the magnitude of the difference vector for each unit in the output layer.
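In array form, the pass just described might be sketched as follows (a Python/NumPy illustration, not the book's pseudocode; the weight-array layout is an assumption):

```python
import numpy as np

def forward(x, W):
    """Forward signal propagation for a SOM.

    x -- input vector, shape (n,)
    W -- weight vectors, shape (rows, cols, n), one per output unit
    Returns the matrix of difference-vector magnitudes and the
    (row, col) index of the unit with the smallest magnitude.
    """
    mags = np.linalg.norm(W - x, axis=-1)   # ||x - w|| for every output unit
    winner = np.unravel_index(np.argmin(mags), mags.shape)
    return mags, winner
```

Note that the smallest magnitude, rather than the largest activation, identifies the winner; this is exactly the index the training process will need.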

If we also consider the training process, we can allow the computer tostore locally an index (or pointer) to locate the output unit that had the smallest


difference-vector magnitude during the initial pass. That index can then be used to identify the winner of the competition. By adopting this approach, we can also use the routine used to forward propagate signals in the SOM during training, with no modifications.

Based on this strategy, we shall define the forward signal-propagation algorithm to be the combination of two routines: one to compute the difference-vector magnitude for a specified unit on the output layer, and one to call the first routine for every unit on the output layer. We shall call these routines prop and propagate, respectively. We begin with the definition of prop.

function prop (NET:SOM; UNIT:integer) return float
{compute the magnitude of the difference vector for UNIT}
var invec, connects : ^float[];  {locate arrays}
    sum, mag : float;            {temporary variables}
    i : integer;                 {iteration counter}
begin
    invec = NET.INPUTS^.OUTS^;   {locate input vector}
    connects = NET.OUTPUTS^.WEIGHTS^[UNIT]; {connections}
    sum = 0;                     {initialize sum}
    for i = 1 to length(invec)   {for all inputs}
    do                           {square of difference}
        sum = sum + sqr(invec[i] - connects[i]);
    end do;
    mag = sqrt(sum);             {magnitude of difference vector}
    return (mag);
end function;

Now that we can compute the output value for any unit on the output layer, let us consider the routine to generate the output for the entire network. Since we have defined our SOM network as a standard, two-layer network, the pseudocode definition for propagate is straightforward.

function propagate (NET:SOM) return integer
{propagate forward through the SOM, return the index to the winner}
var outvec : ^float[];           {locate output array}
    winner, i : integer;
    mag, smallest : float;
begin
    outvec = NET.OUTPUTS^.OUTS^; {locate output array}
    winner = 1;                  {initialize winner}
    smallest = 10000;            {arbitrarily high}
    for i = 1 to length(outvec)  {for all outputs}
    do
        mag = prop(NET, i);      {activate unit}
        outvec[i] = mag;         {save output}
        if (mag < smallest)      {if new winner is found}
        then
            winner = i;          {mark new winner}
            smallest = mag;      {save winning value}
        end if;
    end do;
    NET.WINNER = winner;         {store winning unit id}
    return (winner);             {identify winner}
end function;

SOM Learning Algorithms. Now that we have developed a means for performing the forward signal propagation in the SOM, we have also solved the largest part of the problem of training the network. As described by Eq. (7.4), learning in the SOM takes place by updating of the connections to the set of output units that fall within the neighborhood of the winning unit. We have already provided the means for determining the winner as part of the forward signal propagation; all that remains to be done to train the network is to develop the processes that define the neighborhood (N_c) and update the connection weights. Unfortunately, the process of determining the neighborhood surrounding the winning unit is likely to be application dependent. For example, consider the two applications described earlier, the neural phonetic typewriter and the ballistic arm movement systems. Each implemented an SOM as the basic mechanism for solving their respective problems, but each also utilized a neighborhood-selection mechanism that was best suited to the application being addressed. It is likely that other problems would also require alternative methods better suited to determining the size of the neighborhood needed for each application. Therefore, we will not presuppose that we can define a universally acceptable function for N_c.

We will, however, develop the code necessary to describe a typical neighborhood-selection function, trusting that you will learn enough from the example to construct a function suitable for your applications. For simplicity, we will design the process as two functions: the first will return a true-false flag to indicate whether a certain unit is within the neighborhood of the winning unit at the current timestep, and the second will update the connection values at an output unit, if the unit falls within the neighborhood of the winning unit.

The first of these routines, which we call neighbor, will return a true flag if the row and column coordinates of the unit given as input fall within the range of units to be updated. This process proves to be relatively easy, in that the routine needs to perform only the following two tests:

(R_w − ΔR) ≤ R ≤ (R_w + ΔR)

(C_w − ΔC) ≤ C ≤ (C_w + ΔC)
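These two tests translate directly into code; for example (a sketch, with names mirroring the symbols in the text):

```python
def in_neighborhood(r, c, r_w, c_w, delta_r, delta_c):
    """True when unit (r, c) falls inside the rectangular neighborhood
    centered on the winning unit (r_w, c_w) with offsets (delta_r, delta_c)."""
    return ((r_w - delta_r) <= r <= (r_w + delta_r) and
            (c_w - delta_c) <= c <= (c_w + delta_c))
```

With offsets (0, 0), only the winning unit itself passes the test, which is what eventually terminates the first phase of training.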



Figure 7.13 A simple scheme is shown for dynamically altering the size of the neighborhood surrounding the winning unit. In this diagram, W denotes the winning unit for a given input vector. The neighborhood surrounding the winning unit is then given by the values contained in the variables deltaR and deltaC contained in the SOM record. As the values in deltaR and deltaC approach zero, the neighborhood surrounding the winning unit shrinks, until the neighborhood is precisely the winning unit.

where (R_w, C_w) are the row and column coordinates of the winning unit, (ΔR, ΔC) are the row and column offsets from the winning unit that define the neighborhood, and (R, C) are the row and column coordinates of the unit being tested.

For example, consider the situation illustrated in Figure 7.13. Notice that the boundary surrounding the winner's neighborhood shrinks with successively smaller values for (ΔR, ΔC), until the neighborhood is limited to the winner when (ΔR, ΔC) = (0, 0). Thus, we need only to alter the values for (ΔR, ΔC) in order to change the size of the winner's neighborhood.

So that we can implement this mechanism of neighborhood determination, we have incorporated two variables in the SOM record, named deltaR and deltaC, which allow the network record to keep the current values for the ΔR and ΔC terms. Having made this observation, we can now define the algorithm needed to implement the neighbor function.

function neighbor (NET:SOM; R,C,W:integer) return boolean
{return true if (R,C) is in the neighborhood of W}
var row, col,                    {coordinates of winner}
    dR1, dC1,                    {coordinates of lower boundary}
    dR2, dC2 : integer;          {coordinates of upper boundary}
begin
    row = ((W - 1) / NET.COLS) + 1;  {row of winning unit}
    col = ((W - 1) % NET.COLS) + 1;  {column of winning unit}
    dR1 = max(1, (row - NET.deltaR));
    dR2 = min(NET.ROWS, (row + NET.deltaR));
    dC1 = max(1, (col - NET.deltaC));
    dC2 = min(NET.COLS, (col + NET.deltaC));
    return (((dR1 <= R) and (R <= dR2)) and
            ((dC1 <= C) and (C <= dC2)));
end function;

Note that the algorithm for neighbor relies on the fact that the array indices for the winning unit (W) and the number of rows and columns in the SOM output layer are presumed to start at 1 and to run through n. If the first index is presumed to be zero, the determination of the row and col values described must be adjusted, since zero divided by anything is zero. Similarly, the min and max functions utilized in the algorithm are needed to protect against the case where the winning unit is located on an "edge" of the network output.

Now that we can determine whether or not a unit is in the neighborhood of the winning unit in the SOM, all that remains to complete the implementation

of the training algorithms is the function needed to update the weights to all the units that require updating. We shall design this algorithm to return the number of units updated in the SOM, so that the calling process can determine when the neighborhood around the winning unit has shrunk to just the winning unit (i.e., when the number of units updated is equal to 1). Also, to simplify things, we shall assume that the α(t) term given in the weight-update equation (Eq. 7.2) is simply a small constant value, rather than a function of time. In this example algorithm, we define the α(t) parameter as the value A.

function update (NET:SOM) return integer
{update the weights to all winning units,
 returning the number of winners updated}
constant A : float = 0.3;        {simple activation constant}
var winner, unit, upd : integer; {indices to output units}
    invec : ^float[];            {locate unit output arrays}
    connect : ^float[];          {locate connection array}
    i, j, k : integer;           {iteration counters}
begin
    winner = propagate (NET);    {propagate and find winner}
    unit = 1;                    {start at first output unit}
    upd = 0;                     {no updates yet}
    for i = 1 to NET.ROWS        {for all unit rows}
    do
        for j = 1 to NET.COLS    {for all unit columns}
        do
            if (neighbor (NET, i, j, winner))
            then                 {unit is in the neighborhood}
                {first locate the appropriate connection array}
                connect = NET.OUTPUTS^.WEIGHTS^[unit];
                {then locate the input layer output array}
                invec = NET.INPUTS^.OUTS^;
                upd = upd + 1;   {count another update}
                for k = 1 to length(connect) {for all connections}
                do
                    connect[k] = connect[k]
                               + (A * (invec[k] - connect[k]));
                end do;
            end if;
            unit = unit + 1;     {move to next unit}
        end do;
    end do;
    return (upd);                {number of units updated}
end function;

7.3.3 Training the SOM

Like most other networks, the SOM will be constructed so that it initially contains random information. The network will then be allowed to self-adapt by being shown example inputs that are representative of the desired topology. Our computer simulation ought to mimic the desired network behavior if we simply follow these same guidelines when constructing and training the simulator.

There are two aspects of the training process that are relatively simple to implement, and we assume that you will provide them as part of the implementation of the simulator. These functions are the ones needed to initialize the SOM (initialize) and to apply an input vector (set_inputs) to the input layer of the network.

Most of the work to be done in the simulator will be accomplished by the previously defined routines, and we need to concern ourselves now with only the notion of deciding how and when to collapse the winning neighborhood as we train the network. Here again, this aspect of the design probably will be influenced by the specific application, so, for instructional purposes, we will restrict ourselves to a fairly easy application that allows each of the output-layer units to be uniquely associated with a specific input vector.

For this example, let us assume that the SOM to be simulated has four rows of five columns of units in the output layer, and two units providing input. Such


Figure 7.14 This SOM network can be used to capture the organization of a two-dimensional image, such as a triangle, circle, or any regular polygon. In the programming exercises, we ask you to simulate this network structure to test the operation of your simulator program.

a network structure is depicted in Figure 7.14. We will code the SOM simulator so that the entire output layer is initially contained in the neighborhood, and we shall shrink the neighborhood by two rows and two columns after every 10 training patterns.

For the SOM, there are two distinct training sessions that must occur. In the first, we will train the network until the neighborhood has shrunk to the point that only one unit wins the competition. During the second phase, which occurs after all the training patterns have been allocated to a winning unit (although not necessarily different units), we will simply continue to run the training algorithm for an arbitrarily large number of additional cycles. We do this to try to ensure that the network has stabilized, although there is no absolute guarantee that it has. With this strategy in mind, we can now complete the simulator by constructing the routine to initialize and train the network.

procedure train (NET:SOM; NP:integer)
{train the network for each of NP patterns}
var i, j, dummy : integer;       {iteration counters}
begin
    NET.deltaR = NET.ROWS / 2;   {initialize the row offset}
    NET.deltaC = NET.COLS / 2;   {ditto for columns}
    NET.TIME = 0;                {reset time counter}
    set_inputs (NET, 1);         {get first training pattern}
    while (update (NET) > 1)     {loop until one winner}
    do
        NET.TIME = NET.TIME + 1; {advance training counter}
        if (NET.TIME % 10 = 0)   {if shrink time}
        then
            {shrink the neighborhood, with a floor of (0,0)}
            NET.deltaR = max (0, NET.deltaR - 1);
            NET.deltaC = max (0, NET.deltaC - 1);
        end if;
        for j = 1 to NP          {for all patterns}
        do
            set_inputs (NET, j); {set training pattern}
            dummy = update (NET);{train network}
        end do;
    end do;
end procedure;
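For comparison with the pseudocode above, the same two-phase scheme can be sketched compactly in Python/NumPy. The 4×5 layer, the shrink-every-10-patterns schedule, and the constant 0.3 follow the example in the text; the array layout and the fixed count of 100 stabilization cycles in phase 2 are assumptions of this sketch.

```python
import numpy as np

def train_som(patterns, rows=4, cols=5, alpha=0.3, shrink_every=10, seed=1):
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.0, 1.0, (rows, cols, patterns.shape[1]))  # random weights
    dR, dC = rows // 2, cols // 2    # neighborhood initially spans the layer
    t = 0
    while dR > 0 or dC > 0:          # phase 1: shrink until one winner remains
        for x in patterns:
            mags = np.linalg.norm(W - x, axis=-1)
            rw, cw = np.unravel_index(np.argmin(mags), mags.shape)
            for r in range(rows):
                for c in range(cols):
                    if abs(r - rw) <= dR and abs(c - cw) <= dC:
                        W[r, c] += alpha * (x - W[r, c])  # move toward input
            t += 1
            if t % shrink_every == 0:  # collapse neighborhood, floor at (0, 0)
                dR, dC = max(0, dR - 1), max(0, dC - 1)
    for _ in range(100):             # phase 2: extra cycles to stabilize
        for x in patterns:
            mags = np.linalg.norm(W - x, axis=-1)
            rw, cw = np.unravel_index(np.argmin(mags), mags.shape)
            W[rw, cw] += alpha * (x - W[rw, cw])
    return W
```

A typical call trains on one pattern per output unit, for example the scaled grid coordinates of the 4×5 layer, mirroring Programming Exercise 7.1 below.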

Programming Exercises

7.1 Implement the SOM simulator. Test it by constructing a network similar to the one depicted in Figure 7.14. Train the network using the Cartesian coordinates of each unit in the output layer as training data. Experiment with different time periods to determine how many training passes are optimal before reducing the size of the neighborhood.

7.2 Repeat Programming Exercise 7.1, but this time extend the simulator to plot the network dynamics using Kohonen's method, as described in Section 7.1. If you do not have access to a graphics terminal, simply list out the connection-weight values to each unit in the output layer as a set of ordered pairs at various timesteps.

7.3 The converge algorithm given in the text is not very general, in that it will work only if the number of output units in the SOM is exactly equal to the number of training patterns to be encoded. Redesign the routine to handle the case where the number of training patterns greatly outnumbers the number of output-layer units. Test the algorithm by repeating Programming Exercise 7.1 and reducing the number of output units used to three rows of four units.

7.4 Repeat Programming Exercise 7.2, this time using three input units. Configure the output layer appropriately, and train the network to learn to map a three-dimensional cube in the first quadrant (all vertices should contain positive coordinates). Do not put the vertices of the cube at integer coordinates. Does the network do as well as the network in Programming Exercise 7.2?

Suggested Readings

The best supplement to the material in this chapter is Kohonen's text on self-organization [2]. Now in its second edition, that text also contains general background material regarding various learning methods for neural networks, as well as a review of the necessary mathematics.

Bibliography

[1] William Y. Huang and Richard P. Lippmann. Neural net and traditional classifiers. In Proceedings of the Conference on Neural Information Processing Systems, Denver, CO, November 1987.

[2] Teuvo Kohonen. Self-Organization and Associative Memory, volume 8 of Springer Series in Information Sciences. Springer-Verlag, New York, 1984.

[3] Teuvo Kohonen. The "neural" phonetic typewriter. Computer, 21(3):11-22, March 1988.

[4] H. Ritter and K. Schulten. Topology conserving mappings for learning motor tasks. In John S. Denker, editor, Neural Networks for Computing, pp. 376-380. American Institute of Physics, New York, 1986.

[5] H. Ritter and K. Schulten. Extending Kohonen's self-organizing mapping algorithm to learn ballistic movements. In Rolf Eckmiller and Christoph v.d. Malsburg, editors, Neural Computers, pp. 393-406. Springer-Verlag, Heidelberg, 1987.

[6] Helge J. Ritter, Thomas M. Martinetz, and Klaus J. Schulten. Topology-conserving maps for learning visuo-motor-coordination. Neural Networks, 2(3):159-168, 1989.


CHAPTER 8

Adaptive Resonance Theory

One of the nice features of human memory is its ability to learn many new things without necessarily forgetting things learned in the past. A frequently cited example is the ability to recognize your parents even if you have not seen them for some time and have learned many new faces in the interim. It would be highly desirable if we could impart this same capability to an ANS. Most networks that we have discussed in previous chapters will tend to forget old information if we attempt to add new information incrementally.

When developing an ANS to perform a particular pattern-classification operation, we typically proceed by gathering a set of exemplars, or training patterns, then using these exemplars to train the system. During the training, information is encoded in the system by the adjustment of weight values. Once the training is deemed to be adequate, the system is ready to be put into production, and no additional weight modification is permitted.

This operational scenario is acceptable provided the problem domain has well-defined boundaries and is stable. Under such conditions, it is usually possible to define an adequate set of training inputs for whatever problem is being solved. Unfortunately, in many realistic situations, the environment is neither bounded nor stable.

Consider a simple example. Suppose you intend to train a BPN to recognize the silhouettes of a certain class of aircraft. The appropriate images can be collected and used to train the network, which is potentially a time-consuming task depending on the size of the network required. After the network has learned successfully to recognize all of the aircraft, the training period is ended and no further modification of the weights is allowed.

If, at some future time, another aircraft in the same class becomes operational, you may wish to add its silhouette to the store of knowledge in your


network. To do this, you would have to retrain the network with the new pattern plus all of the previous patterns. Training on only the new silhouette could result in the network learning that pattern quite well, but forgetting previously learned patterns. Although retraining may not take as long as the initial training, it still could require a significant investment.

Moreover, if an ANS is presented with a previously unseen input pattern, there is generally no built-in mechanism for the network to be able to recognize the novelty of the input. The ANS doesn't know that it doesn't know the input pattern.

We have been describing what Stephen Grossberg calls the stability-plasticity dilemma [5]. This dilemma can be stated as a series of questions [6]: How can a learning system remain adaptive (plastic) in response to significant input, yet remain stable in response to irrelevant input? How does the system know to switch between its plastic and its stable modes? How can the system retain previously learned information while continuing to learn new things?

In response to such questions, Grossberg, Carpenter, and numerous colleagues developed adaptive resonance theory (ART), which seeks to provide answers. ART is an extension of the competitive-learning schemes that have been discussed in Chapters 6 and 7. The material in Section 6.1, especially, should be considered a prerequisite to the current chapter. We will draw heavily from those results, so you should review the material, if necessary, before proceeding.

In the competitive systems discussed in Chapter 6, nodes compete with one another, based on some specified criteria, and the winner is said to classify the input pattern. Certain instabilities can arise in these networks such that different nodes might respond to the same input pattern on different occasions. Moreover, later learning can wash away earlier learning if the environment is not statistically stationary or if novel inputs arise [9].

A key to solving the stability-plasticity dilemma is to add a feedback mechanism between the competitive layer and the input layer of a network. This feedback mechanism facilitates the learning of new information without destroying old information, automatic switching between stable and plastic modes, and stabilization of the encoding of the classes done by the nodes. The results from this approach are two neural-network architectures that are particularly suited for pattern-classification problems in realistic environments. These network architectures are referred to as ART1 and ART2. ART1 and ART2 differ in the nature of their input patterns: ART1 networks require that the input vectors be binary, whereas ART2 networks are suitable for processing analog, or gray-scale, patterns.

ART gets its name from the particular way in which learning and recall interplay in the network. In physics, resonance occurs when a small-amplitude vibration of the proper frequency causes a large-amplitude vibration in an electrical or mechanical system. In an ART network, information in the form of processing-element outputs reverberates back and forth between layers. If the proper patterns develop, a stable oscillation ensues, which is the neural-network


8.1 ART Network Description 293

equivalent of resonance During this resonant period, learning, or adaptation,can occur Before the network has achieved a resonant state, no learning takes

place, because the time required for changes in the processing-element weights

is much longer than the time that it takes the network to achieve resonance

A resonant state can be attained in one of two ways. If the network has learned previously to recognize an input vector, then a resonant state will be achieved quickly when that input vector is presented. During resonance, the adaptation process will reinforce the memory of the stored pattern. If the input vector is not immediately recognized, the network will rapidly search through its stored patterns looking for a match. If no match is found, the network will enter a resonant state whereupon the new pattern will be stored for the first time. Thus, the network responds quickly to previously learned data, yet remains able to learn when novel data are presented.
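This fast-recall-or-search behavior can be caricatured in a few lines of Python. The sketch below captures only the control flow, not the ART equations; the `vigilance` threshold and the match measure are stand-ins for mechanisms introduced later in the chapter, and all names are our own:

```python
def art_recall_or_learn(input_vec, stored, vigilance=0.8):
    """Return the index of a matching stored pattern, learning if needed.

    A caricature of ART's search: a previously learned input resonates
    with an existing category quickly; a novel input, finding no match,
    is stored as a new category.
    """
    def match(a, b):
        # Fraction of the input's active (1) components preserved by
        # the stored pattern -- a stand-in for the vigilance test.
        ones = sum(a)
        common = sum(ai & bi for ai, bi in zip(a, b))
        return common / ones if ones else 1.0

    # Search the stored patterns, best match first.
    candidates = sorted(range(len(stored)),
                        key=lambda j: match(input_vec, stored[j]),
                        reverse=True)
    for j in candidates:
        if match(input_vec, stored[j]) >= vigilance:
            return j                    # resonance with an existing category
    stored.append(list(input_vec))      # no match: store the novel pattern
    return len(stored) - 1
```

A familiar input returns its existing category index immediately; a novel one grows the stored list by one entry.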

Much of Grossberg's work has been concerned with modeling actual macroscopic processes that occur within the brain in terms of the average properties of collections of the microscopic components of the brain (neurons). Thus, a Grossberg processing element may represent one or more actual neurons. In keeping with our practice, we shall not dwell on the neurological implications of the theory. There exists a vast body of literature available concerning this work. Work with these theories has led to predictions about neurophysiological processes, even down to the chemical-ion level, which have subsequently been proven true through research by neurophysiologists [6]. Numerous references are listed at the end of this chapter.

The equations that govern the operation of the ART networks are quite complicated. It is easy to lose sight of the forest while examining the trees closely. For that reason, we first present a qualitative description of the processing in ART networks. Once that foundation is laid, we shall return to a detailed discussion of the equations.

8.1 ART NETWORK DESCRIPTION

The basic features of the ART architecture are shown in Figure 8.1. Patterns of activity that develop over the nodes in the two layers of the attentional subsystem are called short-term memory (STM) traces because they exist only in association with a single application of an input vector. The weights associated with the bottom-up and top-down connections between F1 and F2 are called long-term memory (LTM) traces because they encode information that remains a part of the network for an extended period.
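In a program, the STM/LTM distinction suggests a natural split in the data structures: activity patterns are transient and reset between input presentations, while the weight matrices persist. The sketch below assumes n_f1 units on F1 and n_f2 units on F2; the field names and initial weight values are illustrative only, not those prescribed by the theory:

```python
class ARTState:
    """Separates transient STM activity from persistent LTM weights."""

    def __init__(self, n_f1, n_f2):
        # LTM traces: bottom-up (F1 -> F2) and top-down (F2 -> F1)
        # weights.  They persist across inputs and are modified only
        # during resonance.
        self.bottom_up = [[1.0 / (1 + n_f1)] * n_f1 for _ in range(n_f2)]
        self.top_down = [[1.0] * n_f1 for _ in range(n_f2)]
        # STM traces: activity patterns X (on F1) and Y (on F2).  They
        # exist only while a single input vector is applied.
        self.x = [0.0] * n_f1
        self.y = [0.0] * n_f2

    def clear_stm(self):
        """Called between input presentations; LTM is untouched."""
        self.x = [0.0] * len(self.x)
        self.y = [0.0] * len(self.y)
```

Clearing the STM between presentations leaves the LTM weight matrices, and hence everything the network has learned, intact.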

8.1.1 Pattern Matching in ART

To illustrate the processing that takes place, we shall describe a hypothetical sequence of events that might occur in an ART network. The scenario is a simple pattern-matching operation during which an ART network tries to determine whether an input pattern is among the patterns previously stored in the network. Figure 8.2 illustrates the operation.

Figure 8.1 The ART system is diagrammed. The two major subsystems are the attentional subsystem and the orienting subsystem. F1 and F2 represent two layers of nodes in the attentional subsystem. Nodes on each layer are fully interconnected to the nodes on the other layer. Not shown are interconnects among the nodes on each layer. Other connections between components are indicated by the arrows. A plus sign indicates an excitatory connection; a minus sign indicates an inhibitory connection. The function of the gain control and orienting subsystem is discussed in the text.

In Figure 8.2(a), an input pattern, I, is presented to the units on F1 in the same manner as in other networks: one vector component goes to each node. A pattern of activation, X, is produced across F1. The processing done by the units on this layer is a somewhat more complicated form of that done by the input layer of the CPN (see Section 6.1). The same input pattern excites both the orienting subsystem, A, and the gain control, G (the connections to G are not shown on the drawings). The output pattern, S, results in an inhibitory signal that is also sent to A. The network is structured such that this inhibitory signal exactly cancels the excitatory effect of the signal from I, so that A remains inactive. Notice that G supplies an excitatory signal to F1. The same signal is applied to each node on the layer and is therefore known as a nonspecific signal. The need for this signal will be made clear later.
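The exact cancellation at A can be illustrated numerically. In this toy version (our own simplification, not Grossberg's shunting equations), A's net input is the excitation from the active lines of I minus the inhibition from the active units of F1; the two balance as long as F1's activity mirrors the input:

```python
def orienting_activity(input_vec, f1_output):
    """Net input to the orienting subsystem A (toy model only).

    Excitation from the input I is exactly cancelled by inhibition
    from the F1 output S whenever F1 mirrors the active input lines,
    so A remains inactive.
    """
    excitation = sum(input_vec)   # excitatory signal from I
    inhibition = sum(f1_output)   # inhibitory signal from S
    return excitation - inhibition

# While F1's output matches the input, A receives zero net input:
# orienting_activity([1, 0, 1, 1], [1, 0, 1, 1]) -> 0
```

When a top-down mismatch later suppresses some F1 activity, the balance tips positive and A fires, which is the origin of the reset signal discussed below.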

Figure 8.2 A pattern-matching cycle in an ART network is shown. The process evolves from the initial presentation of the input pattern in (a) to a pattern-matching attempt in (b), to reset in (c), to the final recognition in (d). Details of the cycle are discussed in the text.

The appearance of X on F1 results in an output pattern, S, which is sent through connections to F2. Each F2 unit receives the entire output vector, S, from F1. F2 units calculate their net-input values in the usual manner, by summing the products of the input values and the connection weights. In response to inputs from F1, a pattern of activity, Y, develops across the nodes of F2. F2 is a competitive layer that performs a contrast enhancement on the input signal, like the competitive layer described in Section 6.1. The gain control signals to F2 are omitted here for simplicity.
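The net-input calculation and the competition on F2 can be sketched as follows. Winner-take-all is a deliberate caricature of the contrast enhancement the text describes; the actual F2 dynamics are given by differential equations later in the chapter, and the names here are ours:

```python
def f2_activity(s, bottom_up):
    """Compute F2 net inputs from F1's output S, then enhance contrast.

    Each F2 unit sums the products of its input values and connection
    weights; the competition is caricatured as winner-take-all.
    """
    # Net input of unit j: sum_i s_i * w_ij over its weight row.
    net = [sum(si * wij for si, wij in zip(s, w)) for w in bottom_up]
    winner = max(range(len(net)), key=net.__getitem__)
    return [1.0 if j == winner else 0.0 for j in range(len(net))]
```

With two F2 units whose weight vectors favor different inputs, each input pattern drives a different unit to win the competition.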

In Figure 8.2(b), the pattern of activity, Y, results in an output pattern, U, from F2. This output pattern is sent as an inhibitory signal to the gain control system. The gain control is configured such that, if it receives any inhibitory signal from F2, it ceases activity. U also becomes a second input pattern for the F1 units. U is transformed by the LTM traces on the top-down connections from F2 to F1. We shall call this transformed pattern V.
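The transformation of U into V through the top-down LTM traces is an ordinary weighted mapping, v_i = Σ_j u_j w_ji. In ART1 the top-down weights are binary, so when a single F2 unit is active, V is simply that unit's stored template. A sketch (names ours):

```python
def top_down_pattern(u, top_down):
    """Transform F2's output U into the pattern V arriving at F1.

    v_i = sum_j u_j * w_ji, using the top-down LTM traces.  With a
    winner-take-all U, V equals the winning unit's stored template row.
    """
    n_f1 = len(top_down[0])
    return [sum(u[j] * top_down[j][i] for j in range(len(u)))
            for i in range(n_f1)]
```

This returned pattern V is what F1 compares against the original input to decide whether the network has found a match.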

Notice that there are three possible sources of input to F1, but that only two appear to be used at any one time. The units on F1 (and F2 as well) are constructed so that they can become active only if two out of the possible
