Figure 9.14 A simple two-node SCAF is shown.
is zero. The mechanism for training these weights will be described later. The final assumption is that the initial value for Γ is zero.
Consider what happens when Q₁₁ is applied first. The net input to unit 1 is

I₁,net = z₁ · Q₁₁ + w₁₂x₂ − Γ(t)
       = 1 + 0 − Γ(t)

where we have explicitly shown Γ as a function of time. According to Eqs. (9.2) through (9.4), ẋ₁ = −ax₁ + b(1 − Γ), so x₁ begins to increase, since Γ and x₁ are initially zero.
The net input to unit 2 is

I₂,net = z₂ · Q₁₁ + w₂₁x₁ − Γ(t)
       = 0 + w₂₁x₁ − Γ(t)
After a short time, we remove Q₁₁ and present Q₁₂. x₁ will begin to decay, but slowly with respect to its rise time. Now, we calculate I₁,net and I₂,net again:

I₁,net = z₁ · Q₁₂ + w₁₂x₂ − Γ(t)
       = 0 + 0 − Γ(t)

I₂,net = z₂ · Q₁₂ + w₂₁x₁ − Γ(t)
       = 1 + x₁ − Γ(t)

Using Eqs. (9.2) through (9.4) again, ẋ₁ = −cax₁ and ẋ₂ = b(1 + x₁ − Γ), so x₁ continues to decay, but x₂ will continue to rise until 1 + x₁ < Γ(t). Figure 9.15(a) shows how x₁ and x₂ evolve as a function of time.
A similar analysis can be used to evaluate the network output for the opposite sequence of input vectors. When Q₁₂ is presented first, x₂ will increase. x₁ remains at zero since I₁,net = −Γ(t) and, thus, ẋ₁ = −cax₁. The total activity in the system is not sufficient to cause Γ(t) to rise.
When Q₁₁ is presented, the input to unit 1 is I₁ = 1. Even though x₂ is nonzero, the connection weight w₁₂ is zero, so x₂ does not contribute to the input to unit 1. x₁ begins to rise, and Γ(t) begins to rise in response to the increasing total activity. In this case, Γ does not increase as much as it did in the first example. Figure 9.15(b) shows the behavior of x₁ and x₂ for this example. The values of Γ(t) for both cases are shown in Figure 9.15(c). Since Γ(t) is the measure of recognition, we can conclude that Q₁₁ → Q₁₂ was recognized, but Q₁₂ → Q₁₁ was not.
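To make the preceding analysis concrete, the following is a small numerical sketch of the two-node SCAF in Python. It is not the book's simulator: the constants a, b, and c, the specific rule used to make Γ(t) grow with total activity, and the pattern timing are assumptions chosen only so that the qualitative behavior of Figure 9.15 (Γ rises for Q₁₁ → Q₁₂ but much less for Q₁₂ → Q₁₁) can be reproduced.

import numpy as np

a, b, c = 1.0, 2.0, 0.05              # attack, gain, and slow-decay constants (assumed values)
dt = 0.01                             # integration step
z = np.array([[1.0, 0.0],             # z1 matched to Q11
              [0.0, 1.0]])            # z2 matched to Q12
w = np.array([[0.0, 0.0],             # w12 = 0 (untrained direction)
              [1.0, 0.0]])            # w21 = 1 (trained direction)
Q11, Q12 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

def run(sequence, steps_per_pattern=1700):    # 1700 steps of dt = a pattern change at t = 17
    x = np.zeros(2)                   # node activities x1, x2
    gamma = 0.0                       # threshold Gamma(t)
    for Q in sequence:
        for _ in range(steps_per_pattern):
            net = z @ Q + w @ x - gamma           # I_i,net for each node
            xdot = np.where(net > 0.0,
                            -a * x + b * net,     # attack when the net input is positive
                            -c * a * x)           # slow decay otherwise
            x = np.clip(x + dt * xdot, 0.0, 1.0)  # activity hard-limited to one
            # Assumed rule: Gamma grows only when total activity exceeds
            # one saturated node's worth of output.
            gamma += dt * 0.05 * max(0.0, x.sum() - 1.0)
    return gamma

print(run([Q11, Q12]), run([Q12, Q11]))   # the recognized sequence yields the larger Gamma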
9.3.2 Training the SCAF
As mentioned earlier, we accomplish training of the weights on the connections from the inputs by methods already described for other networks. These weights encode the spatial part of the STP. We have drawn the analogy between the SOM and the spatial portion of the STN. In fact, a good method for training the spatial weights on the SCAF is with Kohonen's clustering algorithm (see Chapter 7). We shall not repeat the discussion of that training method here. We shall instead concentrate on the training of the temporal part of the SCAF.
Encoding the proper temporal order of the spatial patterns requires training the weights on the connections between the various nodes. This training uses the differential Hebbian learning law (also referred to as the Kosko-Klopf learning law):

ẇᵢⱼ = (−cwᵢⱼ + dxᵢxⱼ) U(ẋᵢ) U(−ẋⱼ)    (9.10)

where c and d are positive constants, and

U(s) = 1 if s > 0, and U(s) = 0 if s ≤ 0
Figure 9.15 These figures illustrate the output response of a two-node SCAF. (a) This graph shows the results of a numerical simulation of the two output values during the presentation of the sequence Q₁₁ → Q₁₂. The input pattern changes at t = 17. (b) This graph shows the results for the presentation of the sequence Q₁₂ → Q₁₁. (c) This figure shows how the value of Γ evolves in each case. Γ₁ is for the case shown in (a), and Γ₂ is for the case shown in (b).
Without the U factors, Eq. (9.10) resembles the Grossberg outstar law. The U factors ensure that learning can occur (ẇᵢⱼ is nonzero) only under certain conditions. These conditions are that xᵢ is increasing (ẋᵢ > 0) at the same time that xⱼ is decreasing (−ẋⱼ > 0). When these conditions are met, both U factors will be equal to one. Any other combination of ẋᵢ and ẋⱼ will cause one, or both, of the Us to be zero.
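As a concrete illustration of Eq. (9.10), the following Python sketch applies the update to a single weight. The function dW is a direct transcription of the equation; the activity traces, the constants c and d, and the time span are assumed values chosen only to show that the weight grows precisely while one activity rises and the other decays.

import numpy as np

def U(s):
    return 1.0 if s > 0.0 else 0.0             # unit step from Eq. (9.10)

def dW(w_ij, x_i, x_j, xdot_i, xdot_j, c=0.1, d=1.0):
    # Kosko-Klopf differential Hebbian rate of change for one weight:
    # learning is gated on only while x_i is rising and x_j is falling.
    return (-c * w_ij + d * x_i * x_j) * U(xdot_i) * U(-xdot_j)

dt = 0.01
t = np.arange(0.0, 10.0, dt)
x1 = np.exp(-0.2 * t)                          # unit 1 decaying (assumed trace)
x2 = 1.0 - np.exp(-0.5 * t)                    # unit 2 rising (assumed trace)
x1dot = np.gradient(x1, dt)
x2dot = np.gradient(x2, dt)

w21 = 0.0
for k in range(len(t)):                        # integrate w21; here i = 2, j = 1
    w21 += dt * dW(w21, x2[k], x1[k], x2dot[k], x1dot[k])
print(w21)                                     # nonzero: the order "unit 1 before unit 2" is encoded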
The effect of the differential Hebbian learning law is illustrated in Figure 9.16, which refers back to the two-node SCAF in Figure 9.14. We want to train the network to recognize that pattern Q₁₁ precedes pattern Q₁₂. In the example that we did, we saw that the proper response from the network was
Figure 9.16 This figure shows the results of a sequential presentation of Q₁₁ followed by Q₁₂. The net-input values of the two units are shown, along with the activity of each unit. Notice that we still consider that ẋ₁ > 0 and ẋ₂ > 0 throughout the periods indicated, even though the activity value is hard-limited to a maximum value of one. The region R indicates the time for which ẋ₁ < 0 and ẋ₂ > 0 simultaneously. During this time period, the differential Hebbian learning law causes w₂₁ to increase.
are never right for it to learn. The weight, w₂₁, does learn, resulting in the configuration shown in Figure 9.14.
to another SCAF. Since these outputs vary at a slower rate than the original input vectors, they can be sampled at a lower frequency. The output values of this second SCAF would decay even more slowly than those of the previous layer. Conceptually, this process can be continued until a layer is reached where the output patterns vary on a time scale that is equal to the total time necessary to present a complete sequence of patterns to the original network. The last output values would be essentially stationary. A single set of output values from the last slab would represent an entire series of patterns making up one complete STP. Figure 9.18 shows such a system based on a hierarchy of SCAF layers.
The stationary output vector can be used as the input vector to one of the spatial pattern-classification networks. The spatial network can learn to classify the stationary input vectors by the methods discussed previously. A complete spatiotemporal pattern-recognition and pattern-classification system can be constructed in this manner.
Exercise 9.4: No matter how fast input vectors are presented to a SCAF, the outputs can be made to linger if the parameters of the attack function are adjusted such that, once saturated, a node output decays very slowly. Such an arrangement would appear to eliminate the need for the layered SCAF architecture proposed in the previous paragraphs. Analyze the response of a SCAF to an arbitrary STP in the limiting case where saturated nodes never decay.
Figure 9.18 This hierarchy of SCAF layers is used for spatiotemporal pattern classification. The outputs from each layer are sampled at a rate slower than the rate at which inputs to that layer change. The output from the top layer, essentially a spatial pattern, can be used as an input to an associative network that classifies the original STP.
9.4 APPLICATIONS OF STNS
We suggested earlier in this chapter that STNs would be useful in areas such as speech recognition, radar analysis, and sonar-echo classification. To date, the dearth of literature indicates that little work has been done with this promising architecture.
A prototype sonar-echo classification system was built by General Dynamics Corporation using the layered STN architecture described in Section 9.2 [8]. In that study, time slices of the incoming sonar signals were converted to power spectra, which were then presented to the network in the proper time sequence. After being trained on seven civilian boats, the network was able to identify correctly each of these vessels from its passive sonar signature.

The developers of the SCAF architecture experimented with a 30-by-30 SCAF, where outputs from individual units are connected randomly to other units. Apparently, the network performance was encouraging, as the developers are reportedly working on new applications. Details of those applications are not available at the time of this writing.
9.5 STN SIMULATION
In this section, we shall describe the design of the simulator for the spatiotemporal network. We shall focus on the implementation of a one-layer STN and shall show how that STN can be extended to encompass multilayer (and multi-network) STN architectures. The implementation of the SCAF architecture is left to you as an exercise.

We begin this section, as we have all previous simulation discussions, with a presentation of the data structures used to construct the STN simulator. From there, we proceed with the development of the algorithms used to perform signal processing within the simulator. We close this section with a discussion of how a multiple-STN structure might be created to record a temporal sequence of related patterns.
9.5.1 STN Data Structures
The design of the STN simulator is reminiscent of the design we used for the CPN in Chapter 6. We therefore recommend that you review Section 6.4 prior to continuing here. The reason for the similarity between these two networks is that both networks fit precisely the processing structure we defined for performing competitive processing within a layer of units.³ The units in both the STN and the competitive layer of the CPN operate by processing normalized input vectors, and even though competition in the CPN suppresses the output from all but the winning unit(s), all network units generate an output signal that is distributed to other PEs.

³Although the STN is not competitive in the same sense that the hidden layer in the CPN is, we shall see that STN units respond actively to inputs in much the same way that CPN hidden-layer units do.
The major difference between the competitive layer in the CPN and the STN structure is related to the fact that the output from each unit in the STN becomes an input to all subsequent network units on the layer, whereas the lateral connections in the CPN simulation were handled by the host computer system, and never were actually modeled. Similarly, the interconnections between units on the layer in the STN can be accounted for by the processing algorithms performed in the host computer, so we do not need to account for those connections in the simulator design.
Let us now consider the top-level data structure needed to model an STN. As before, we will construct the network as a record containing pointers to the appropriate lower-level structures, and containing any network-specific data parameters that are used globally within the network. Therefore, we can create an STN structure through the following record declaration:
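A hedged Python sketch of the fields such a record needs, inferred from the routines later in this section (the unit outputs and stored weight vectors reached through UNITS, the a, b, c, and d terms, the output copy y, and the links used to stack networks), is given below; the names are assumptions rather than the book's identifiers.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Units:
    # layer substructure referenced as net.UNITS in the listings below
    outs: List[float]                  # one output value per STN unit
    weights: List[List[float]]         # one stored (normalized) input vector per unit

@dataclass
class STN:
    # top-level STN record; a, b, c, d are the network processing constants
    units: Units
    a: float = 1.0                     # decay term
    b: float = 1.0                     # excitation term
    c: float = 0.1                     # multiplier applied to negative rates of change
    d: float = 0.1                     # scaling for the summed outputs of earlier units
    y: float = 0.0                     # duplicate copy of the last unit's output
    upper: Optional["STN"] = None      # links for the doubly linked list of stacked STNs
    lower: Optional["STN"] = None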
Figure 9.19 The data structure of the STN simulator is shown. Notice that, in this network structure, there are pointers to other network records above and below to accommodate multiple STNs. In this manner, the same input data can be propagated efficiently through multiple STN structures.

Notice that, as illustrated in Figure 9.19, this record definition differs from all previous network record declarations in that we have included a means for stacking multiple networks through the use of a doubly linked list of network record pointers. We include this capability for two reasons:
1. As described previously, a network that recognizes only one pattern is not of much use. We must therefore consider how to integrate multiple networks as part of our simulator design.

2. When multiple STNs are used to time-dilate temporal patterns (as in the SCAF), the activity patterns of the network units can be used as input patterns to another network for further classification.
Finally, inspection of the STN record structure reveals that there is nothing about the STN that will require further modifications or extensions to the generic simulator structure we proposed in Chapter 1. We are therefore free to begin developing STN algorithms.
9.5.2 STN Algorithms
Let us begin by considering the sequence of operations that must be performed by the computer to simulate the STN. Using the speech-recognition example described in Section 9.2.1 as the basis for the processing model, we can construct a list of the operations that must be performed by the STN simulator.

1. Construct the network, and initialize the input connections to the units such that the first unit in the layer has the first normalized input pattern contained in its connections, the second unit has the second pattern, and so on.

2. Begin processing the test pattern by zeroing the outputs from all units in the network (as well as the STN.y value, since it is a duplicate copy of the output value from the last network unit), and then applying the first normalized test vector to the input of the STN.

3. Calculate the inner product between the input test vector and the weight vector for the first unprocessed unit.

4. Compute the sum of the outputs from all units on the layer, from the first to the previous unit, and multiply the result by the network d term.

5. Add the result from step 3 to the result from step 4 to produce the input activation for the unit.

6. Subtract the threshold value (Γ) from the result of step 5. If the result is greater than zero, multiply it by the network b term; otherwise, substitute zero for the result.

7. Multiply the negative of the network a term by the previous output from the unit, and add the result to the value produced in step 6.

8. If the result of step 7 was less than or equal to zero, multiply it by the network c term to produce ẋ. Otherwise, use the result of step 7 without modification as the value for ẋ.
9. Compute the attack value for the unit by multiplying the ẋ value calculated in step 8 by a small value indicating the network update rate (δt) to produce the update value for the unit output. Update the unit output by adding the computed attack value to the current unit output value.

10. Repeat steps 3 through 9 for each unit in the network.

11. Repeat steps 3 through 10 for the duration of the time step, Δt. The number of repetitions that occur during this step will be a function of the sampling frequency for the specific application.

12. Apply the next time-sequential test vector to the network input, and repeat steps 3 through 11.

13. After all the time-sequential test vectors have been applied, use the output of the last unit on the layer as the output value for the network for the given STP.
Notice that we have assumed that the network units update at a rate much more rapid than the sampling rate of the input (i.e., the value for δt is much smaller than the value of Δt). Since the actual sampling frequency (given by 1/Δt) will always be application dependent, we shall assume that the network must update itself 100 times for each input pattern. Thus, the ratio of δt to Δt is 0.01, and we can use this ratio as the value for δt in our simulations.
We shall also assume that you will provide the routines necessary to perform the first two operations in the list. We therefore begin developing the simulator algorithms with the routine needed to propagate a given input pattern vector to a specified unit on the STN. This routine will encompass the operations described in steps 3 through 5.
function activation (net:STN; unumber:integer;
                     invec:^float[]) return float;
{propagate the given input vector to the STN unit number}
var i : integer;               {iteration counter}
    sum : float;               {accumulator}
    others : float;            {unit output accumulator}
    connects : ^float[];       {locate connection array}
    unit : ^float[];           {locate unit outputs}
begin
  sum = 0;                     {initialize accumulator}
  others = 0;                  {ditto}
  unit = net.UNITS^.OUTS;      {locate unit arrays}
  connects = net.UNITS^.WEIGHTS[unumber];
  for i = 1 to length(invec)   {for all input elements}
  do                           {compute sum of products}
    sum = sum + connects[i] * invec[i];
  end do;
  for i = 1 to (unumber - 1)   {sum other units' outputs}
  do
    others = others + unit[i];
  end do;
  return (sum + others * net.d);   {add the d-scaled outputs of earlier units}
end function;
function Xdot (net:STN; unumber:integer; inval:float)
         return float;
{convert the input value for the specified unit to an
 output value}
var outval : float;
  if (outval <= 0)             {factor in decay term}
  then outval = outval * net.c;
  return (outval);             {return delta x value}
end function;
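For reference, the arithmetic of steps 6 through 9 can also be written out compactly in Python. This is a sketch, not the book's listing: the threshold is passed in explicitly as gamma, and where that value is actually stored in the STN record is an assumption.

def xdot(inval, prev_out, a, b, c, gamma):
    # steps 6-8: convert a unit's input activation into a rate of change
    excite = inval - gamma                 # step 6: subtract the threshold
    excite = b * excite if excite > 0.0 else 0.0
    rate = -a * prev_out + excite          # step 7: add the decay of the previous output
    if rate <= 0.0:                        # step 8: slow any decay by the factor c
        rate *= c
    return rate

def update_output(prev_out, rate, dt=0.01):
    # step 9: advance the unit output by one small update interval dt
    return prev_out + rate * dt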
All that remains at this point is to define a top-level procedure to tie together the signal-propagation routines, and to iterate for every unit in the network. These functions are embodied in the following procedure.
procedure propagate (net:STN; invec:^float[]);
{propagate an input vector through the STN}
const dt = 0.01;               {network update rate}
var i : integer;               {iteration counter}
    how_many : integer;        {number of units in STN}
    dx : float;                {computed Xdot value}
    inval : float;             {input activation}
    unit : ^float[];           {locate unit outputs}
begin
  unit = net.UNITS^.OUTS;      {locate the output array}
  how_many = length(unit);     {save number of units}
  for i = 1 to how_many        {for all units in the STN}
  do                           {generate output from input}
    inval = activation (net, i, invec);
    dx = Xdot (net, i, inval);
    unit[i] = unit[i] + (dx * dt);
  end do;
  net.y = unit[how_many];      {save last unit output}
end procedure;
The propagate procedure will perform a complete signal propagation of one input vector through the entire STN. For a true spatiotemporal pattern-classification operation, propagate would have to be performed many times⁴ for each of the Qᵢ patterns that compose the spatiotemporal pattern to be processed. If the network recognized the temporal pattern sequence, the value contained in the STN y slot would be relatively high after all patterns had been propagated.

⁴It would have to be performed essentially Δt/δt times, where Δt is the inverse of the sampling frequency for the application, and δt is the time that it takes the host computer to perform the propagation.
9.5.3 STN Training
In the previous discussion, we considered an STN that was trained by initialization. Training the network in this manner is fine if we know all the training vectors prior to building the network simulator. But what about those cases where it is preferable to defer training until after the network is operational? Such occurrences are common when the training environment is rather large, or when training-data acquisition is cumbersome. In such cases, is it possible to train an STN to record (and eventually to replay) data patterns collected at run time? The answer to this question is a qualified "yes." The reason it is qualified is that the STN is not undergoing training in the same sense that most of the other networks described in this text are trained. Rather, we shall take the approach that an STN can be constructed dynamically, thus simulating the effect of training. As we have seen, the standard STN is constructed and initialized to contain the normalized form of the pattern to be encoded at each timestep in the connections of the individual network units. To train an STN, we will simply cause our program to create a new STN whenever a new pattern to be learned is available. In this manner, we construct specialized STNs that can then be exercised using all of the algorithms developed previously.
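A minimal sketch of this construct-on-demand idea in Python (the helper name and the explicit normalization step are assumptions consistent with the description above):

import numpy as np

def build_stn(pattern_sequence):
    # create the weight vectors of a new STN from a run-time pattern sequence;
    # unit i stores the i-th pattern, normalized, as its weight vector
    weights = []
    for p in pattern_sequence:
        v = np.asarray(p, dtype=float)
        n = np.linalg.norm(v)
        weights.append(v / n if n > 0.0 else v)
    return weights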
The only special consideration is that, with multiple networks in the computer simultaneously, we must take care to ensure that the networks remain accessible and consistent. To accomplish this feat, we shall simply link together
the network structures in a doubly linked list that a top-level routine can then access sequentially. A side benefit to this approach is that we have now created a means of collecting a number of related STPs, and have grouped them together sequentially. Thus, we can utilize this structure to encode (and recognize) a sequence of related patterns, such as the sonar signatures of different submarines, using the output from the most active STN as an indication of the type of submarine.
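Continuing the sketch, classification then amounts to running every stored network on the same input sequence and reporting the most active one; run_stp and build_stn here are the hypothetical helpers from the earlier sketches.

def classify_by_most_active(stn_library, patterns):
    # stn_library: list of (label, weights) pairs, each built with build_stn
    best_label, best_y = None, float("-inf")
    for label, weights in stn_library:
        y = run_stp(weights, patterns)
        if y > best_y:
            best_label, best_y = label, y
    return best_label                      # e.g., the vessel type with the largest final y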
The disadvantage to the STN, as mentioned earlier, is that it will require many concurrent STN simulations to begin to tackle problems that can be considered nontrivial.⁵ There are two approaches to solving this dilemma, both of which we leave to you as exercises. The first alternative method is to eliminate redundant network elements whenever possible, as was illustrated in Figure 9.11 and described in the previous section. The second method is to implement the SCAF network, and to combine many SCAFs with an associative-memory network (such as a BPN or CPN, as described in Chapters 3 and 6, respectively) to decode the output of the final SCAF.

⁵That is not to say that the STN should be considered a trivial network. There are many applications where the STN might provide an excellent solution, such as voiceprint classification for controlling access to protected environments.
Programming Exercises
9.1 Code the STN simulator and verify its operation by constructing multiple STNs, each of which is coded to recognize a letter sequence as a word. For example, consider the sequence "N E U R A L" versus the sequence "N E U R O N." Assume that two STNs are constructed and initialized such that each can recognize one of these two sequences. At what point do the STNs begin to fail to respond when presented with the wrong letter sequence?

9.2 Create several STNs that recognize letter sequences corresponding to different words. Stack them to form simple sentences, and determine which (if any) STNs fail to respond when presented with word sequences that are similar to the encoded sequences.

9.3 Construct an STN simulator that removes the redundant nodes for the word-recognition application described in Programming Exercise 9.1. Show listings for any new (or modified) data structures, as well as for code. Draw a diagram indicating the structure of the network. Show how your new data structures lend themselves to performing this simulation.

9.4 Construct a simulator for the SCAF network. Show the data structures required, and a complete listing of code required to implement the network. Be sure to allow multiple SCAFs to feed one another, in order to stack networks. Also describe how the output from your SCAF simulator would tie into a BPN simulator to perform the associative-memory function at the output.
9.5 Describe a method for training a BPN simulator to recognize the output of a SCAF. Remember that training in a BPN is typically completed before that network is first applied to a problem.
Suggested Readings
There is not a great deal of information available about Hecht-Nielsen's STN implementation. Aside from the papers cited in the text, you can refer to his book for additional information [4].

On the subject of STP recognition in general, and speech recognition in particular, there are a number of references to other approaches. For a general review of neural networks for speech recognition, see the papers by Lippmann [5, 6, 7]. For other methods see, for example, Grajski et al. [1] and Williams and Zipser [9].
Bibliography

[2] Stephen Grossberg. Learning by neural networks. In Stephen Grossberg, editor, Studies of Mind and Brain. D. Reidel Publishing, Boston, MA, pp. 65-156, 1982.

[3] Robert Hecht-Nielsen. Nearest matched filter classification of spatiotemporal patterns. Technical report, Hecht-Nielsen Neurocomputer Corporation, San Diego, CA, June 1986.

[4] Robert Hecht-Nielsen. Neurocomputing. Addison-Wesley, Reading, MA, 1990.

[5] Richard P. Lippmann and Ben Gold. Neural-net classifiers useful for speech recognition. In Proceedings of the IEEE First International Conference on Neural Networks, San Diego, CA, pp. IV-417-IV-426, June 1987. IEEE.

[6] Richard P. Lippmann. Neural network classifiers for speech recognition. The Lincoln Laboratory Journal, 1(1):107-124, 1988.

[7] Richard P. Lippmann. Review of neural networks for speech recognition. Neural Computation, 1(1):1-38, Spring 1989.

[8] Robert L. North. Neurocomputing: Its impact on the future of defense systems. Defense Computing, 1(1), January-February 1988.

[9] Ronald J. Williams and David Zipser. A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1(2):270-280, 1989.
of their papers, Fukushima and his coworkers appear to be more interested in developing a model of the brain [4, 3].¹ To that end, their design was based on the seminal work performed by Hubel and Wiesel elucidating some of the functional architecture of the visual cortex.

¹This statement is intended not as a negative criticism, but rather as justification for the ensuing, short discussion of biology.
We could not begin to provide a complete accounting of what is known about the anatomy and physiology of the mammalian visual system. Nevertheless, we shall present a brief and highly simplified description of some of that system's features as an aid to understanding the basis of the neocognitron design. Figure 10.1 shows the main pathways for neurons leading from the retina back to the area of the brain known as the visual, or striate, cortex. This area is also known as area 17. The optic nerve is made up of axons from nerve cells called retinal ganglia. The ganglia receive stimulation indirectly from the light-receptive rods and cones through several intervening neurons.
Hubel and Wiesel used an amazing technique to discern the function of the various nerve cells in the visual system. They used microelectrodes to record the response of individual neurons in the cortex while stimulating the retina with light. By applying a variety of patterns and shapes, they were able to determine the particular stimulus to which a neuron was most sensitive.
The retinal ganglia and the cells of the lateral geniculate nucleus (LGN) appear to have circular receptive fields. They respond most strongly to circular spots of light of a particular size on a particular part of the retina. The part of the retina responsible for stimulating a particular ganglion cell is called the receptive field of the ganglion. Some of these receptive fields give an excitatory response to a centrally located spot of light, and an inhibitory response to a larger, more diffuse spot of light. These fields have an on-center off-surround response characteristic (see Chapter 6, Section 6.1). Other receptive fields have the opposite characteristic, with an inhibitory response to the centrally located spot: an off-center on-surround response characteristic.

Figure 10.1 Visual pathways from the eye to the primary visual cortex are shown. Some nerve fibers from each eye cross over into the opposite hemisphere of the brain, where they meet nerve fibers from the other eye at the LGN. From the LGN, neurons project back to area 17. From area 17, neurons project into other cortical areas, other areas deep in the brain, and also back to the LGN. Source: Reprinted with permission of Addison-Wesley Publishing Co., Reading, MA, from Martin A. Fischler and Oscar Firschein, Intelligence: The Eye, the Brain, and the Computer, © 1987 by Addison-Wesley Publishing Co.
The visual cortex itself is composed of six layers of neurons. Most of the neurons from the LGN terminate on cells in layer IV. These cells have circularly symmetric receptive fields like the retinal ganglia and the cells of the LGN. Further along the pathway, the response characteristic of the cells begins to increase in complexity. Cells in layer IV project to a group of cells directly above called simple cells. Simple cells respond to line segments having a particular orientation. Simple cells project to cells called complex cells. Complex cells respond to lines having the same orientation as their corresponding simple cells, although complex cells appear to integrate their response over a wider receptive field. In other words, complex cells are less sensitive to the position of the line on the retina than are the simple cells. Some complex cells are sensitive to line segments of a particular orientation that are moving in a particular direction. Cells in different layers of area 17 project to different locations of the brain. For example, cells in layers II and III project to cells in areas 18 and 19. These areas contain cells called hypercomplex cells. Hypercomplex cells respond to lines that form angles or corners and that move in various directions across the receptive field.

The picture that emerges from these studies is that of a hierarchy of cells with increasingly complex response characteristics. It is not difficult to extrapolate this idea of a hierarchy into one where further data abstraction takes place at higher and higher levels. The neocognitron design adopts this hierarchical structure in a layered architecture, as illustrated schematically in Figure 10.2.
Figure 10.2 The neocognitron hierarchical structure is shown. Each box represents a level in the neocognitron comprising a simple-cell layer, U_Si, and a complex-cell layer, U_Ci, where i is the layer number. U_0 represents signals originating on the retina. There is also a suggested mapping to the hierarchical structure of the brain. The network concludes with single cells that respond to complex visual stimuli. These final cells are often called grandmother cells, after the notion that there may be some cell in your brain that responds to complex visual stimuli, such as a picture of your grandmother.
We remind you that the description of the visual system that we have presented here is highly simplified. There is a great deal of detail that we have omitted. The visual system does not adhere to a strict hierarchical structure as presented here. Moreover, we do not subscribe to the notion that grandmother cells per se exist in the brain. We know from experience that strict adherence to biology often leads to a failed attempt to design a system to perform the same function as the biological prototype: Flight is probably the most significant example. Nevertheless, we do promote the use of neurobiological results if they prove to be appropriate. The neocognitron is an excellent example of how neurobiological results can be used to develop a new network architecture.

10.1 NEOCOGNITRON ARCHITECTURE
The neocognitron design evolved from an earlier model called the cognitron, and there are several versions of the neocognitron itself. The one that we shall describe has nine layers of PEs, including the retina layer. The system was designed to recognize the numerals 0 through 9, regardless of where they are placed in the field of view of the retina. Moreover, the network has a high degree of tolerance to distortion of the character and is fairly insensitive to the size of the character. This first architecture contains only feedforward connections. In Section 10.3.2, we shall describe a network that has feedback as well as feedforward connections.
10.1.1 Functional Description
The PEs of the neocognitron are organized into modules that we shall refer to as levels. A single level is shown in Figure 10.3. Each level consists of two layers: a layer of simple cells, or S-cells, followed by a layer of complex cells, or C-cells. Each layer, in turn, is divided into a number of planes, each of which consists of a rectangular array of PEs. On a given level, the S-layer and the C-layer may or may not have the same number of planes. All planes on a given layer will have the same number of PEs; however, the number of PEs on the S-planes can be different from the number of PEs on the C-planes at the same level. Moreover, the number of PEs per plane can vary from level to level. There are also PEs called Vs-cells and Vc-cells that are not shown in the figure. These elements play an important role in the processing, but we can describe the functionality of the system without reference to them.
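A rough structural sketch of this organization, in Python and with names that are assumptions rather than Fukushima's notation, is:

from dataclasses import dataclass
from typing import List

@dataclass
class Plane:
    # a rectangular array of identical feature-detecting PEs
    rows: int
    cols: int
    activity: List[List[float]]

@dataclass
class Level:
    # one neocognitron level: an S-layer followed by a C-layer;
    # the two layers may have different numbers of planes and plane sizes
    s_planes: List[Plane]
    c_planes: List[Plane]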
We construct a complete network by combining an input layer, which we shall call the retina, with a number of levels in a hierarchical fashion, as shown in Figure 10.4. That figure shows the number of planes on each layer for the particular implementation that we shall describe here. We call attention to the fact that there is nothing, in principle, that dictates a limit to the size of the network in terms of the number of levels.

Figure 10.3 A single level of a neocognitron is shown. Each level consists of two layers, and each layer consists of a number of planes. The planes contain the PEs in a rectangular array. Data pass from the S-layer to the C-layer through connections that are not shown here. In neocognitrons having feedback, there also will be connections from the C-layer to the S-layer.
The interconnection strategy is unlike that of networks that are fully interconnected between layers, such as the backpropagation network described in Chapter 3. Figure 10.5 shows a schematic illustration of the way units are connected in the neocognitron. Each layer of simple cells acts as a feature-extraction system that uses the layer preceding it as its input layer. On the first S-layer, the cells on each plane are sensitive to simple features on the retina; in this case, line segments at different orientation angles. Each S-cell on a single plane is sensitive to the same feature, but at different locations on the input layer. S-cells on different planes respond to different features.

As we look deeper into the network, the S-cells respond to features at higher levels of abstraction; for example, corners with intersecting lines at various