

Figure 9.14 A simple two-node SCAF is shown.

is zero. The mechanism for training these weights will be described later. The final assumption is that the initial value for Γ is zero.

Consider what happens when Q₁₁ is applied first. The net input to unit 1 is

I₁,net = z₁ · Q₁₁ + w₁₂x₂ − Γ(t)
       = 1 + 0 − Γ(t)

where we have explicitly shown Γ as a function of time. According to Eqs. (9.2) through (9.4), ẋ₁ = −ax₁ + b(1 − Γ), so x₁ begins to increase, since Γ and x₁ are initially zero.

The net input to unit 2 is

I₂,net = z₂ · Q₁₁ + w₂₁x₁ − Γ(t)


After a short time, we remove Q₁₁ and present Q₁₂. x₁ will begin to decay, but slowly with respect to its rise time. Now, we calculate I₁,net and I₂,net again:

I₁,net = z₁ · Q₁₂ + w₁₂x₂ − Γ(t)
       = 0 + 0 − Γ(t)

I₂,net = z₂ · Q₁₂ + w₂₁x₁ − Γ(t)

Using Eqs. (9.2) through (9.4) again, ẋ₁ = −cax₁ and ẋ₂ = b(1 + x₁ − Γ), so x₁ continues to decay, but x₂ will continue to rise until 1 + x₁ < Γ(t). Figure 9.15(a) shows how x₁ and x₂ evolve as functions of time.

A similar analysis can be used to evaluate the network output for the opposite sequence of input vectors. When Q₁₂ is presented first, x₂ will increase; x₁ remains at zero since I₁,net = −Γ(t) and, thus, ẋ₁ = −cax₁. The total activity in the system is not sufficient to cause Γ(t) to rise.

When Q₁₁ is then presented, the input to unit 1 is I₁ = 1. Even though x₂ is nonzero, the connection weight w₁₂ is zero, so x₂ does not contribute to the input to unit 1. x₁ begins to rise, and Γ(t) begins to rise in response to the increasing total activity. In this case, Γ does not increase as much as it did in the first example. Figure 9.15(b) shows the behavior of x₁ and x₂ for this example. The values of Γ(t) for both cases are shown in Figure 9.15(c). Since Γ(t) is the measure of recognition, we can conclude that Q₁₁ → Q₁₂ was recognized, but Q₁₂ → Q₁₁ was not.
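These dynamics are straightforward to reproduce numerically. The sketch below simulates the two-node SCAF of Figure 9.14 in Python; the constants and the leaky-integrator form used here for Γ(t) are illustrative assumptions rather than the book's Eqs. (9.2) through (9.4), but the qualitative result of Figure 9.15 (Γ grows for Q₁₁ → Q₁₂ and stays lower for the reverse order) comes through.

import numpy as np

# Two-node SCAF sketch (Figure 9.14).  The constants a, b, c and the
# leaky-integrator dynamics assumed for the threshold Gamma(t) are
# illustrative guesses, not values from the text.
a, b, c = 1.0, 1.0, 0.2            # attack/decay constants (assumed)
g_gain, g_decay = 0.05, 0.05       # assumed Gamma(t) dynamics
W = np.array([[0.0, 0.0],          # w12 = 0 (unit 2 -> unit 1)
              [1.0, 0.0]])         # w21 = 1 (unit 1 -> unit 2)
Z = np.eye(2)                      # row i holds z_i; z1 encodes Q11, z2 encodes Q12
Q = {"Q11": np.array([1.0, 0.0]), "Q12": np.array([0.0, 1.0])}
dt, steps = 0.01, 1700             # 17 time units per pattern, as in Figure 9.15

def run(sequence):
    x, gamma = np.zeros(2), 0.0
    for name in sequence:
        for _ in range(steps):
            i_net = Z @ Q[name] + W @ x - gamma          # net input to each unit
            xdot = -a * x + b * np.maximum(i_net, 0.0)   # rise when excited
            xdot = np.where(xdot <= 0.0, c * xdot, xdot) # slow decay otherwise
            x = np.clip(x + dt * xdot, 0.0, 1.0)         # outputs hard-limited to 1
            gamma += dt * (-g_decay * gamma + g_gain * x.sum())
    return gamma                                          # the recognition measure

print(run(["Q11", "Q12"]), run(["Q12", "Q11"]))           # the first value should be larger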

9.3.2 Training the SCAF

As mentioned earlier, we accomplish training of the weights on the connections from the inputs by methods already described for other networks. These weights encode the spatial part of the STP. We have drawn the analogy between the SOM and the spatial portion of the STN. In fact, a good method for training the spatial weights on the SCAF is with Kohonen's clustering algorithm (see Chapter 7). We shall not repeat the discussion of that training method here. We shall instead concentrate on the training of the temporal part of the SCAF.
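For concreteness, a bare winner-take-all version of that spatial-weight training might look like the sketch below; it omits the neighborhood function of a full Kohonen map, and the learning rate, epoch count, and initialization are arbitrary choices.

import numpy as np

# Competitive (Kohonen-style) clustering of the spatial weight vectors z_i.
# Input patterns are assumed to be normalized, as elsewhere in the chapter.
def train_spatial_weights(patterns, n_units, lr=0.1, epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n_units, patterns.shape[1]))
    z /= np.linalg.norm(z, axis=1, keepdims=True)
    for _ in range(epochs):
        for p in patterns:
            winner = np.argmax(z @ p)              # unit whose z_i best matches p
            z[winner] += lr * (p - z[winner])      # move the winner toward the input
            z[winner] /= np.linalg.norm(z[winner]) # keep the weight vector normalized
    return z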

Encoding the proper temporal order of the spatial patterns requires training the weights on the connections between the various nodes. This training uses the differential Hebbian learning law (also referred to as the Kosko-Klopf learning law):

ẇᵢⱼ = (−cwᵢⱼ + dxᵢxⱼ) U(ẋᵢ) U(−ẋⱼ)     (9.10)

where c and d are positive constants, and

U(s) = 1 for s > 0, and U(s) = 0 otherwise.


Figure 9.15 These figures illustrate the output response of a two-node SCAF. (a) This graph shows the results of a numerical simulation of the two output values during the presentation of the sequence Q₁₁ → Q₁₂. The input pattern changes at t = 17. (b) This graph shows the results for the presentation of the sequence Q₁₂ → Q₁₁. (c) This figure shows how the value of Γ evolves in each case. Γ₁ is for the case shown in (a), and Γ₂ is for the case shown in (b).

Without the U factors, Eq. (9.10) resembles the Grossberg outstar law. The U factors ensure that learning can occur (ẇᵢⱼ is nonzero) only under certain conditions. These conditions are that xᵢ is increasing (ẋᵢ > 0) at the same time that xⱼ is decreasing (−ẋⱼ > 0). When these conditions are met, both U factors will be equal to one. Any other combination of ẋᵢ and ẋⱼ will cause one, or both, of the Us to be zero.
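Equation (9.10) transcribes almost directly into code. In the sketch below the derivatives are estimated by finite differences, and the constants c, d, and the time step are arbitrary illustrative values.

import numpy as np

def kosko_klopf_update(w, x, x_prev, c=0.1, d=1.0, dt=0.01):
    # One Euler step of Eq. (9.10) for a full weight matrix w[i, j]; the
    # derivatives of the unit outputs are estimated by finite differences.
    xdot = (x - x_prev) / dt
    U = lambda s: (s > 0).astype(float)         # U(s) = 1 if s > 0, else 0
    gate = np.outer(U(xdot), U(-xdot))          # U(xdot_i) * U(-xdot_j)
    wdot = (-c * w + d * np.outer(x, x)) * gate
    return w + dt * wdot

x_prev = np.array([0.8, 0.1])                   # x1 was high, x2 low
x_new  = np.array([0.7, 0.3])                   # x1 decaying while x2 rises
print(kosko_klopf_update(np.zeros((2, 2)), x_new, x_prev))
# only w[1, 0] (that is, w21) becomes nonzero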

The effect of the differential Hebbian learning law is illustrated in Figure 9.16, which refers back to the two-node SCAF in Figure 9.14. We want to train the network to recognize that pattern Q₁₁ precedes pattern Q₁₂. In the example that we did, we saw that the proper response from the network was


Figure 9.16 This figure shows the results of a sequential presentation of Q₁₁ followed by Q₁₂. The net-input values of the two units are shown, along with the activity of each unit. Notice that we still consider that ẋ₁ > 0 and ẋ₂ > 0 throughout the periods indicated, even though the activity value is hard-limited to a maximum value of one. The region R indicates the time for which ẋ₁ < 0 and ẋ₂ > 0 simultaneously. During this time period, the differential Hebbian learning law causes w₂₁ to learn.

The conditions are never right for w₁₂ to learn. The weight, w₂₁, does learn, resulting in the configuration shown in Figure 9.14.


to another SCAF. Since these outputs vary at a slower rate than the original input vectors, they can be sampled at a lower frequency. The output values of this second SCAF would decay even more slowly than those of the previous layer. Conceptually, this process can be continued until a layer is reached where the output patterns vary on a time scale that is equal to the total time necessary to present a complete sequence of patterns to the original network. The last output values would be essentially stationary. A single set of output values from the last slab would represent an entire series of patterns making up one complete STP. Figure 9.18 shows such a system based on a hierarchy of SCAF layers.

The stationary output vector can be used as the input vector to one of the spatial pattern-classification networks. The spatial network can learn to classify the stationary input vectors by the methods discussed previously. A complete spatiotemporal pattern-recognition and pattern-classification system can be constructed in this manner.
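As a rough structural illustration of this layering, the sketch below drives a stack of SCAF layers, sampling each layer's outputs at a lower rate before feeding the next. Here scaf_step is a placeholder for one update of a SCAF layer (for instance, the two-node dynamics sketched earlier), and the subsampling factor is arbitrary.

def run_hierarchy(input_sequence, layers, subsample=10):
    # layers is a list of (initial_state, scaf_step) pairs, bottom layer first;
    # scaf_step(state, vec) is a stand-in for one SCAF update.
    feeds = list(input_sequence)
    for state, scaf_step in layers:
        sampled = []
        for t, vec in enumerate(feeds):
            state = scaf_step(state, vec)      # update this SCAF layer
            if (t + 1) % subsample == 0:       # sample its outputs at a slower rate
                sampled.append(state.copy())
        feeds = sampled                        # the slower stream drives the next layer
    return feeds[-1] if feeds else None        # near-stationary vector for a spatial classifier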

Exercise 9.4: No matter how fast input vectors are presented to a SCAF, the outputs can be made to linger if the parameters of the attack function are adjusted such that, once saturated, a node output decays very slowly. Such an arrangement would appear to eliminate the need for the layered SCAF architecture proposed in the previous paragraphs. Analyze the response of a SCAF to an arbitrary STP in the limiting case where saturated nodes never decay.


Figure 9.18 This hierarchy of SCAF layers is used for spatiotemporal pattern classification. The outputs from each layer are sampled at a rate slower than the rate at which inputs to that layer change. The output from the top layer, essentially a spatial pattern, can be used as an input to an associative network that classifies the original STP.

9.4 APPLICATIONS OF STNS

We suggested earlier in this chapter that STNs would be useful in areas such as speech recognition, radar analysis, and sonar-echo classification. To date, the dearth of literature indicates that little work has been done with this promising architecture.


A prototype sonar-echo classification system was built by General Dynamics Corporation using the layered STN architecture described in Section 9.2 [8]. In that study, time slices of the incoming sonar signals were converted to power spectra, which were then presented to the network in the proper time sequence. After being trained on seven civilian boats, the network was able to identify correctly each of these vessels from the latter's passive sonar signature.

The developers of the SCAF architecture experimented with a 30-by-30 SCAF, where outputs from individual units are connected randomly to other units. Apparently, the network performance was encouraging, as the developers are reportedly working on new applications. Details of those applications are not available at the time of this writing.

9.5 STN SIMULATION

In this section, we shall describe the design of the simulator for the spatiotemporal network. We shall focus on the implementation of a one-layer STN and shall show how that STN can be extended to encompass multilayer (and multinetwork) STN architectures. The implementation of the SCAF architecture is left to you as an exercise.

We begin this section, as we have all previous simulation discussions, with a presentation of the data structures used to construct the STN simulator. From there, we proceed with the development of the algorithms used to perform signal processing within the simulator. We close this section with a discussion of how a multiple-STN structure might be created to record a temporal sequence of related patterns.

9.5.1 STN Data Structures

The design of the STN simulator is reminiscent of the design we used for the CPN in Chapter 6. We therefore recommend that you review Section 6.4 prior to continuing here. The reason for the similarity between these two networks is that both networks fit precisely the processing structure we defined for performing competitive processing within a layer of units.³ The units in both the STN and the competitive layer of the CPN operate by processing normalized input vectors, and even though competition in the CPN suppresses the output from all but the winning unit(s), all network units generate an output signal that is distributed to other PEs.

³Although the STN is not competitive in the same sense that the hidden layer in the CPN is, we shall see that STN units respond actively to inputs in much the same way that CPN hidden-layer units do.

The major difference between the competitive layer in the CPN and the STN structure is related to the fact that the output from each unit in the STN becomes an input to all subsequent network units on the layer, whereas the lateral connections in the CPN simulation were handled by the host computer system, and never were actually modeled. Similarly, the interconnections between units on the layer in the STN can be accounted for by the processing algorithms performed in the host computer, so we do not need to account for those connections in the simulator design.

Let us now consider the top-level data structure needed to model an STN. As before, we will construct the network as a record containing pointers to the appropriate lower-level structures, and containing any network-specific data parameters that are used globally within the network. Therefore, we can create an STN structure through the following record declaration:
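The declaration needs, at a minimum, the unit-layer pointer, the duplicate output value y, the a, b, c, and d terms used by the algorithms below, and the upward and downward network pointers shown in Figure 9.19. The following sketch of an equivalent structure is written in Python; the field names and default values are assumptions based on the identifiers used later in this section, not the book's exact declaration.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Layer:
    outs: List[float]                     # unit output values (OUTS)
    weights: List[List[float]]            # one input-weight vector per unit (WEIGHTS)

@dataclass
class STN:
    units: Layer                          # the single layer of STN units (UNITS)
    y: float = 0.0                        # duplicate copy of the last unit's output
    a: float = 1.0                        # output-decay term (assumed default)
    b: float = 1.0                        # attack term (assumed default)
    c: float = 0.1                        # slow-decay term (assumed default)
    d: float = 1.0                        # scale on lateral input (assumed default)
    gamma: float = 0.0                    # threshold value Gamma (assumed field)
    prev: Optional["STN"] = None          # doubly linked list of stacked networks
    next: Optional["STN"] = None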

Notice that, as illustrated in Figure 9.19, this record definition differs from all previous network record declarations in that we have included a means for

Figure 9.19 The data structure of the STN simulator is shown. Notice that, in this network structure, there are pointers to other network records above and below to accommodate multiple STNs. In this manner, the same input data can be propagated efficiently through multiple STN structures.


stacking multiple networks through the use of a doubly linked list of network record pointers. We include this capability for two reasons:

1. As described previously, a network that recognizes only one pattern is not of much use. We must therefore consider how to integrate multiple networks as part of our simulator design.

2. When multiple STNs are used to time-dilate temporal patterns (as in the SCAF), the activity patterns of the network units can be used as input patterns to another network for further classification.

Finally, inspection of the STN record structure reveals that there is nothing about the STN that will require further modifications or extensions to the generic simulator structure we proposed in Chapter 1. We are therefore free to begin developing STN algorithms.

9.5.2 STN Algorithms

Let us begin by considering the sequence of operations that must be performed by the computer to simulate the STN. Using the speech-recognition example described in Section 9.2.1 as the basis for the processing model, we can construct a list of the operations that must be performed by the STN simulator.

1. Construct the network, and initialize the input connections to the units such that the first unit in the layer has the first normalized input pattern contained in its connections, the second unit has the second pattern, and so on.

2. Begin processing the test pattern by zeroing the outputs from all units in the network (as well as the STN.y value, since it is a duplicate copy of the output value from the last network unit), and then applying the first normalized test vector to the input of the STN.

3. Calculate the inner product between the input test vector and the weight vector for the first unprocessed unit.

4. Compute the sum of the outputs from all units on the layer from the first to the previous unit, and multiply the result by the network d term.

5. Add the result from step 3 to the result from step 4 to produce the input activation for the unit.

6. Subtract the threshold value (Γ) from the result of step 5. If the result is greater than zero, multiply it by the network b term; otherwise, substitute zero for the result.

7. Multiply the negative of the network a term by the previous output from the unit, and add the result to the value produced in step 6.

8. If the result of step 7 was less than or equal to zero, multiply it by the network c term to produce ẋ. Otherwise, use the result of step 7 without modification as the value for ẋ.


9. Compute the attack value for the unit by multiplying the ẋ value calculated in step 8 by a small value indicating the network update rate (δt) to produce the update value for the unit output. Update the unit output by adding the computed attack value to the current unit output value.

10. Repeat steps 3 through 9 for each unit in the network.

11. Repeat steps 3 through 10 for the duration of the time step, Δt. The number of repetitions that occur during this step will be a function of the sampling frequency for the specific application.

12. Apply the next time-sequential test vector to the network input, and repeat steps 3 through 11.

13. After all the time-sequential test vectors have been applied, use the output of the last unit on the layer as the output value for the network for the given STP.

Notice that we have assumed that the network units update at a rate much more rapid than the sampling rate of the input (i.e., the value for δt is much smaller than the value of Δt). Since the actual sampling frequency (given by 1/Δt) will always be application dependent, we shall assume that the network must update itself 100 times for each input pattern. Thus, the ratio of δt to Δt is 0.01, and we can use this ratio as the value for δt in our simulations.

We shall also assume that you will provide the routines necessary to perform the first two operations in the list. We therefore begin developing the simulator algorithms with the routine needed to propagate a given input pattern vector to a specified unit on the STN. This routine will encompass the operations described in steps 3 through 5.

function activation (net:STN; unumber:integer; invec:^float[]) return float;
{propagate the given input vector to the specified STN unit}
var i : integer;              {iteration counter}
    sum : float;              {accumulator}
    others : float;           {unit output accumulator}
    connects : ^float[];      {locate connection array}
    unit : ^float[];          {locate unit outputs}
begin
   sum = 0;                             {initialize accumulator}
   others = 0;                          {ditto}
   unit = net.UNITS^.OUTS;              {locate unit arrays}
   connects = net.UNITS^.WEIGHTS[unumber];

   for i = 1 to length(invec)           {for all input elements}
   do                                   {compute sum of products}
      sum = sum + connects[i] * invec[i];
   end do;

   for i = 1 to (unumber - 1)           {sum other units' outputs}
   do
      others = others + unit[i];
   end do;

   return (sum + others * net.d);       {combine per steps 4 and 5; closing lines reconstructed}
end function;

function Xdot (net:STN; unumber:integer; inval:float) return float;
{convert the input value for the specified unit to an output change}
var outval : float;
begin
   {the next lines are reconstructed from steps 6 and 7 of the list above;
    the gamma field name and the access to the previous output are assumed}
   outval = inval - net.gamma;                          {step 6: subtract the threshold}
   if (outval > 0)
      then outval = outval * net.b
      else outval = 0;
   outval = outval - net.a * net.UNITS^.OUTS[unumber];  {step 7: decay of previous output}

   if (outval <= 0)                                     {factor in decay term}
      then outval = outval * net.c;

   return (outval);                                     {return delta x value}
end function;

All that remains at this point is to define a top-level procedure to tie together the signal-propagation routines, and to iterate for every unit in the network. These functions are embodied in the following procedure.

procedure propagate (net:STN; invec:^float[]);
{propagate an input vector through the STN}
const dt = 0.01;                 {network update rate}
var i : integer;                 {iteration counter}
    how_many : integer;          {number of units in STN}
    dx : float;                  {computed Xdot value}
    inval : float;               {input activation}
    unit : ^float[];             {pointer to unit outputs (declaration reconstructed)}
begin
   unit = net.UNITS^.OUTS;       {locate the output array}
   how_many = length(unit);      {save number of units}

   for i = 1 to how_many         {for all units in the STN}
   do                            {generate output from input}
      inval = activation (net, i, invec);
      dx = Xdot (net, i, inval);
      unit[i] = unit[i] + (dx * dt);
   end do;

   net.y = unit[how_many];       {save last unit output}
end procedure;

The propagate procedure will perform a complete signal propagation of one input vector through the entire STN. For a true spatiotemporal pattern-classification operation, propagate would have to be performed many times⁴ for every Qᵢ pattern that composes the spatiotemporal pattern to be processed. If the network recognized the temporal pattern sequence, the value contained in the STN y slot would be relatively high after all patterns had been propagated.

⁴It would have to be performed essentially Δt/δt times, where Δt is the inverse of the sampling frequency for the application, and δt is the time that it takes the host computer to perform the propagation.
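The outer loop that drives propagate over an entire STP is not listed; the following compact sketch restates steps 3 through 13 in Python rather than calling the routines above. The 100 inner updates per pattern follow the δt-to-Δt ratio chosen earlier; the remaining constants, the threshold, and the test patterns are illustrative assumptions.

import numpy as np

def propagate_sequence(weights, sequence, a=1.0, b=1.0, c=0.1, d=0.3,
                       gamma=0.5, dt=0.01, updates_per_pattern=100):
    # weights holds one normalized pattern per unit (step 1); constants are
    # illustrative.  Returns the last unit's output, the network's response.
    outs = np.zeros(len(weights))                 # step 2: zero all outputs
    for q in sequence:                            # one normalized vector per time step
        for _ in range(updates_per_pattern):      # delta-t worth of small updates
            for i in range(len(weights)):         # steps 3 through 9 for each unit
                inval = weights[i] @ q + d * outs[:i].sum()
                xdot = b * max(inval - gamma, 0.0) - a * outs[i]
                if xdot <= 0.0:
                    xdot *= c                     # slow decay when not excited
                outs[i] += dt * xdot
    return outs[-1]                               # step 13: last unit's output

patterns = np.eye(4)                              # four orthonormal "time slices"
print(propagate_sequence(patterns, patterns))         # correct order: higher value
print(propagate_sequence(patterns, patterns[::-1]))   # reversed order: lower value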

9.5.3 STN Training

In the previous discussion, we considered an STN that was trained by initialization. Training the network in this manner is fine if we know all the training vectors prior to building the network simulator. But what about those cases where it is preferable to defer training until after the network is operational? Such occurrences are common when the training environment is rather large, or when training-data acquisition is cumbersome. In such cases, is it possible to train an STN to record (and eventually to replay) data patterns collected at run time?

The answer to this question is a qualified "yes." The reason it is qualified is that the STN is not undergoing training in the same sense that most of the other networks described in this text are trained. Rather, we shall take the approach that an STN can be constructed dynamically, thus simulating the effect of training. As we have seen, the standard STN is constructed and initialized to contain the normalized form of the pattern to be encoded at each timestep in the connections of the individual network units. To train an STN, we will simply cause our program to create a new STN whenever a new pattern to be learned is available. In this manner, we construct specialized STNs that can then be exercised using all of the algorithms developed previously.
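A sketch of this construct-on-demand idea follows: each newly collected sequence simply becomes a new STN (here just a matrix holding one normalized pattern per unit), and classification picks the most active network. The container class and its use of the propagate_sequence driver sketched earlier are illustrative choices, not routines from the text.

import numpy as np

class STNBank:
    # Stands in for the doubly linked list of network records; a Python list
    # gives the same sequential access.
    def __init__(self):
        self.networks = []

    def learn_sequence(self, label, sequence):
        # "Training" is construction: store one normalized pattern per unit.
        weights = np.array([q / np.linalg.norm(q) for q in sequence])
        self.networks.append((label, weights))

    def classify(self, sequence, driver):
        # driver is a propagation routine such as propagate_sequence above.
        scores = [(driver(w, sequence), label) for label, w in self.networks]
        return max(scores)[1]              # the most active STN names the class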

The only special consideration is that, with multiple networks in the computer simultaneously, we must take care to ensure that the networks remain accessible and consistent. To accomplish this feat, we shall simply link together the network structures in a doubly linked list that a top-level routine can then access sequentially. A side benefit to this approach is that we have now created a means of collecting a number of related STPs, and have grouped them together sequentially. Thus, we can utilize this structure to encode (and recognize) a sequence of related patterns, such as the sonar signatures of different submarines, using the output from the most active STN as an indication of the type of submarine.

The disadvantage to the STN, as mentioned earlier, is that it will require many concurrent STN simulations to begin to tackle problems that can be considered nontrivial.⁵ There are two approaches to solving this dilemma, both of which we leave to you as exercises. The first alternative method is to eliminate redundant network elements whenever possible, as was illustrated in Figure 9.11 and described in the previous section. The second method is to implement the SCAF network, and to combine many SCAFs with an associative-memory network (such as a BPN or CPN, as described in Chapters 3 and 6 respectively) to decode the output of the final SCAF.

Programming Exercises

9.1 Code the STN simulator and verify its operation by constructing multiple STNs, each of which is coded to recognize a letter sequence as a word. For example, consider the sequence "N E U R A L" versus the sequence "N E U R O N." Assume that two STNs are constructed and initialized such that each can recognize one of these two sequences. At what point do the STNs begin to fail to respond when presented with the wrong letter sequence?

9.2 Create several STNs that recognize letter sequences corresponding to different words. Stack them to form simple sentences, and determine which (if any) STNs fail to respond when presented with word sequences that are similar to the encoded sequences.

9.3 Construct an STN simulator that removes the redundant nodes for the word-recognition application described in Programming Exercise 9.1. Show listings for any new (or modified) data structures, as well as for code. Draw a diagram indicating the structure of the network. Show how your new data structures lend themselves to performing this simulation.

9.4 Construct a simulator for the SCAF network. Show the data structures required, and a complete listing of code required to implement the network. Be sure to allow multiple SCAFs to feed one another, in order to stack networks. Also describe how the output from your SCAF simulator would tie into a BPN simulator to perform the associative-memory function at the output.

⁵That is not to say that the STN should be considered a trivial network. There are many applications where the STN might provide an excellent solution, such as voiceprint classification for controlling access to protected environments.


9.5 Describe a method for training a BPN simulator to recognize the output of a SCAF. Remember that training in a BPN is typically completed before that network is first applied to a problem.

Suggested Readings

There is not a great deal of information available about Hecht-Nielsen's STN implementation. Aside from the papers cited in the text, you can refer to his book for additional information [4].

On the subject of STP recognition in general, and speech recognition in particular, there are a number of references to other approaches. For a general review of neural networks for speech recognition, see the papers by Lippmann [5, 6, 7]. For other methods see, for example, Grajski et al. [1] and Williams and Zipser [9].

Bibliography

[2] Stephen Grossberg. Learning by neural networks. In Stephen Grossberg, editor, Studies of Mind and Brain. D. Reidel Publishing, Boston, MA, pp. 65-156, 1982.

[3] Robert Hecht-Nielsen. Nearest matched filter classification of spatiotemporal patterns. Technical report, Hecht-Nielsen Neurocomputer Corporation, San Diego, CA, June 1986.

[4] Robert Hecht-Nielsen. Neurocomputing. Addison-Wesley, Reading, MA, 1990.

[5] Richard P. Lippmann and Ben Gold. Neural-net classifiers useful for speech recognition. In Proceedings of the IEEE First International Conference on Neural Networks, San Diego, CA, pp. IV-417-IV-426, June 1987. IEEE.

[6] Richard P. Lippmann. Neural network classifiers for speech recognition. The Lincoln Laboratory Journal, 1(1):107-124, 1988.

[7] Richard P. Lippmann. Review of neural networks for speech recognition. Neural Computation, 1(1):1-38, Spring 1989.

[8] Robert L. North. Neurocomputing: Its impact on the future of defense systems. Defense Computing, 1(1), January-February 1988.

[9] Ronald J. Williams and David Zipser. A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1(2):270-280, 1989.


of their papers, Fukushima and his coworkers appear to be more interested in developing a model of the brain [4, 3].¹ To that end, their design was based on the seminal work performed by Hubel and Wiesel elucidating some of the functional architecture of the visual cortex.

¹This statement is intended not as a negative criticism, but rather as justification for the ensuing, short discussion of biology.

We could not begin to provide a complete accounting of what is known about the anatomy and physiology of the mammalian visual system. Nevertheless, we shall present a brief and highly simplified description of some of that system's features as an aid to understanding the basis of the neocognitron design.

Figure 10.1 shows the main pathways for neurons leading from the retina back to the area of the brain known as the visual, or striate, cortex. This area is also known as area 17. The optic nerve is made up of axons from nerve cells called retinal ganglia. The ganglia receive stimulation indirectly from the light-receptive rods and cones through several intervening neurons.

Hubel and Wiesel used an amazing technique to discern the function of the various nerve cells in the visual system. They used microelectrodes to record the response of individual neurons in the cortex while stimulating the retina with light. By applying a variety of patterns and shapes, they were able to determine the particular stimulus to which a neuron was most sensitive.

Figure 10.1 Visual pathways from the eye to the primary visual cortex are shown. Some nerve fibers from each eye cross over into the opposite hemisphere of the brain, where they meet nerve fibers from the other eye at the LGN. From the LGN, neurons project back to area 17. From area 17, neurons project into other cortical areas, other areas deep in the brain, and also back to the LGN. Source: Reprinted with permission of Addison-Wesley Publishing Co., Reading, MA, from Martin A. Fischler and Oscar Firschein, Intelligence: The Eye, the Brain, and the Computer, © 1987 by Addison-Wesley Publishing Co.

The retinal ganglia and the cells of the lateral geniculate nucleus (LGN) appear to have circular receptive fields. They respond most strongly to circular spots of light of a particular size on a particular part of the retina. The part of the retina responsible for stimulating a particular ganglion cell is called the receptive field of the ganglion. Some of these receptive fields give an excitatory response to a centrally located spot of light, and an inhibitory response to a larger, more diffuse spot of light. These fields have an on-center off-surround response characteristic (see Chapter 6, Section 6.1). Other receptive fields have the opposite characteristic, with an inhibitory response to the centrally located spot (an off-center on-surround response characteristic).


The visual cortex itself is composed of six layers of neurons. Most of the neurons from the LGN terminate on cells in layer IV. These cells have circularly symmetric receptive fields like the retinal ganglia and the cells of the LGN. Further along the pathway, the response characteristic of the cells begins to increase in complexity. Cells in layer IV project to a group of cells directly above called simple cells. Simple cells respond to line segments having a particular orientation. Simple cells project to cells called complex cells. Complex cells respond to lines having the same orientation as their corresponding simple cells, although complex cells appear to integrate their response over a wider receptive field. In other words, complex cells are less sensitive to the position of the line on the retina than are the simple cells. Some complex cells are sensitive to line segments of a particular orientation that are moving in a particular direction.

Cells in different layers of area 17 project to different locations of the brain. For example, cells in layers II and III project to cells in areas 18 and 19. These areas contain cells called hypercomplex cells. Hypercomplex cells respond to lines that form angles or corners and that move in various directions across the receptive field.

The picture that emerges from these studies is that of a hierarchy of cells with increasingly complex response characteristics. It is not difficult to extrapolate this idea of a hierarchy into one where further data abstraction takes place at higher and higher levels. The neocognitron design adopts this hierarchical structure in a layered architecture, as illustrated schematically in Figure 10.2.

Figure 10.2 The neocognitron hierarchical structure is shown. Each box represents a level in the neocognitron comprising a simple-cell layer, U_Si, and a complex-cell layer, U_Ci, where i is the layer number. U_0 represents signals originating on the retina. There is also a suggested mapping to the hierarchical structure of the brain. The network concludes with single cells that respond to complex visual stimuli. These final cells are often called grandmother cells after the notion that there may be some cell in your brain that responds to complex visual stimuli, such as a picture of your grandmother.


We remind you that the description of the visual system that we have presented here is highly simplified. There is a great deal of detail that we have omitted. The visual system does not adhere to a strict hierarchical structure as presented here. Moreover, we do not subscribe to the notion that grandmother cells per se exist in the brain. We know from experience that strict adherence to biology often leads to a failed attempt to design a system to perform the same function as the biological prototype: Flight is probably the most significant example. Nevertheless, we do promote the use of neurobiological results if they prove to be appropriate. The neocognitron is an excellent example of how neurobiological results can be used to develop a new network architecture.

10.1 NEOCOGNITRON ARCHITECTURE

The neocognitron design evolved from an earlier model called the cognitron, and there are several versions of the neocognitron itself. The one that we shall describe has nine layers of PEs, including the retina layer. The system was designed to recognize the numerals 0 through 9, regardless of where they are placed in the field of view of the retina. Moreover, the network has a high degree of tolerance to distortion of the character and is fairly insensitive to the size of the character. This first architecture contains only feedforward connections. In Section 10.3.2, we shall describe a network that has feedback as well as feedforward connections.

10.1.1 Functional Description

The PEs of the neocognitron are organized into modules that we shall refer to as levels. A single level is shown in Figure 10.3. Each level consists of two layers: a layer of simple cells, or S-cells, followed by a layer of complex cells, or C-cells. Each layer, in turn, is divided into a number of planes, each of which consists of a rectangular array of PEs. On a given level, the S-layer and the C-layer may or may not have the same number of planes. All planes on a given layer will have the same number of PEs; however, the number of PEs on the S-planes can be different from the number of PEs on the C-planes at the same level. Moreover, the number of PEs per plane can vary from level to level. There are also PEs called Vs-cells and Vc-cells that are not shown in the figure. These elements play an important role in the processing, but we can describe the functionality of the system without reference to them.
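Because the level, layer, and plane terminology is easy to confuse, a small container sketch may help; the plane counts and array sizes used here are placeholders, not the dimensions of the network described next.

import numpy as np

def make_level(n_s_planes, s_shape, n_c_planes, c_shape):
    # One neocognitron level: an S-layer and a C-layer, each a list of planes
    # of identical rectangular shape within that layer.
    return {
        "S": [np.zeros(s_shape) for _ in range(n_s_planes)],
        "C": [np.zeros(c_shape) for _ in range(n_c_planes)],
    }

# A network is a retina plus a stack of such levels; plane sizes typically
# shrink toward the top (placeholder numbers only).
retina = np.zeros((19, 19))
network = [
    make_level(12, (19, 19), 8, (11, 11)),
    make_level(38, (11, 11), 22, (7, 7)),
    make_level(32, (7, 7), 30, (7, 7)),
    make_level(16, (3, 3), 10, (1, 1)),
]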

We construct a complete network by combining an input layer, which we shall call the retina, with a number of levels in a hierarchical fashion, as shown in Figure 10.4. That figure shows the number of planes on each layer for the particular implementation that we shall describe here. We call attention to the fact that there is nothing, in principle, that dictates a limit to the size of the network in terms of the number of levels.

Figure 10.3 A single level of a neocognitron is shown. Each level consists of two layers, and each layer consists of a number of planes. The planes contain the PEs in a rectangular array. Data pass from the S-layer to the C-layer through connections that are not shown here. In neocognitrons having feedback, there also will be connections from the C-layer to the S-layer.

The interconnection strategy is unlike that of networks that are fully interconnected between layers, such as the backpropagation network described in Chapter 3. Figure 10.5 shows a schematic illustration of the way units are connected in the neocognitron. Each layer of simple cells acts as a feature-extraction system that uses the layer preceding it as its input layer. On the first S-layer, the cells on each plane are sensitive to simple features on the retina, in this case, line segments at different orientation angles. Each S-cell on a single plane is sensitive to the same feature, but at different locations on the input layer. S-cells on different planes respond to different features.

As we look deeper into the network, the S-cells respond to features at higher levels of abstraction; for example, corners with intersecting lines at various
