Comparison Between Commonly Used Connectionist Models


[Fig. 2.8. Transition of the deterioration rate with varying numbers of new classes accommodated – ISOLET data set. Horizontal axis: number of new classes accommodated; vertical axis: deterioration rate (%); curves shown for Letter 1–2, 1–4, 1–8, and 1–16.]

with the other three data sets. This is perhaps due to the insufficient number of pattern vectors and hence the weak coverage of the pattern space.

Nevertheless, it can be stated that, by exploiting the flexible configuration property of a PNN, the separation of the pattern space can be maintained sufficiently well for each class even when new classes are added, as long as the amount of training data per class is not excessive. As discussed above, this is supported by the empirical observation that the generalisation performance did not deteriorate seriously in almost all of the cases.

It can therefore be concluded that no “catastrophic” forgetting of the previously stored data occurred due to the accommodation of new classes, which meets Criterion 4).

2.4 Comparison Between Commonly Used Connectionist Models and PNNs/GRNNs

In practice, the advantage of PNNs/GRNNs is that they are essentially free from the “baby-sitting” required for e.g. MLP-NNs or SOFMs, i.e. the necessity to tune a number of network parameters to obtain a good convergence rate, or to worry about any numerical instability such as local minima, or about the long and iterative training of the network parameters. As described earlier, by exploiting the property of PNNs/GRNNs, simple and quick incremental learning is possible due to their inherently memory-based architecture6, whereby network growing/shrinking is straightforwardly performed (Hoya and Chambers, 2001a; Hoya, 2004b).
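As an illustration of this memory-based architecture, the following minimal sketch (hypothetical Python, not code from the cited works) shows how growing or shrinking the network amounts to adding or removing a per-class store of centroid vectors, with no iterative re-training involved:

```python
import numpy as np

# Sketch only: each class keeps its own array of centroid vectors, so
# network growing and shrinking reduce to plain dictionary edits.

subnets = {}  # class label -> (n_centroids, dim) array of stored patterns

def add_class(label, patterns):
    """One-pass 'training': the pattern vectors themselves become centroids."""
    subnets[label] = np.atleast_2d(np.asarray(patterns, dtype=float))

def remove_class(label):
    """Shrinking: deleting one subnet leaves the remaining classes untouched."""
    subnets.pop(label, None)
```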

In terms of the generalisation capability within the pattern classification context, PNNs/GRNNs normally exhibit a capability similar to that of MLP-NNs; in Hoya (1998), such a comparison using the SFS dataset is made, and it is reported that a PNN/GRNN with the same number of hidden neurons as an MLP-NN yields almost identical classification performance. Related to this observation, Mak et al. (1994) also compared the classification accuracy of an RBF-NN with that of an MLP-NN for speaker identification and concluded that an RBF-NN with appropriate parameter settings could even surpass the classification performance obtained by an MLP-NN.

Moreover, as described, by virtue of the flexible network configuration property, adding new classes can be performed straightforwardly, under the assumption that the pattern space spanned by one subnet is reasonably separated from the others. This principle is particularly applicable to PNNs and GRNNs; in contrast, the training data for other widely used layered networks, such as MLP-NNs trained by a back-propagation algorithm (BP) or ordinary RBF-NNs, are encoded and stored within the network after the iterative learning.

In MLP-NNs, on the other hand, the encoded data are distributed over the weight vectors (i.e. a sparse representation of the data) between the input and hidden layers and those between the hidden and output layers (and are hence not directly accessible).

Therefore, it is generally considered that, not to mention the accommodation of new classes, achieving a flexible network configuration with an MLP-NN similar to that of a PNN/GRNN (that is, quick network growing and shrinking) is very hard. This is because even a small adjustment of the weight parameters will cause a dramatic change in the pattern space constructed, which may eventually lead to a catastrophic corruption of the pattern space (Polikar et al., 2001). For the network reconfiguration of MLP-NNs, it is thus normally necessary to restart the iterative training from scratch.

From another point of view, in MLP-NNs the separation of the pattern space is represented in terms of the hyperplanes so formed, whilst in PNNs and GRNNs it is based upon the location and spread of the RBFs in the pattern space. In PNNs/GRNNs, it is therefore considered that, since a single class is essentially represented by a cluster of RBFs, a small change in a particular cluster does not have any serious impact upon other classes, unless the spread of the RBFs pervades the neighbouring clusters.
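To make this concrete, the sketch below (hypothetical code, continuing the per-class store introduced earlier but passed in as an argument, and assuming Gaussian RBFs with a common radius sigma) shows a PNN/GRNN-style decision in which each class score depends only on its own cluster of RBFs:

```python
import numpy as np

def classify(x, subnets, sigma):
    """Sketch of a PNN/GRNN-style decision: each class is a cluster of RBFs,
    and its score is the summed activation of its own centroids only, so a
    small change to one subnet cannot disturb the pattern space spanned by
    the others (unless sigma is so large that the RBFs pervade neighbours)."""
    x = np.asarray(x, dtype=float)
    scores = {
        label: np.exp(-np.sum((centroids - x) ** 2, axis=1)
                      / (2.0 * sigma ** 2)).sum()
        for label, centroids in subnets.items()
    }
    return max(scores, key=scores.get)
```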

6 In general, the original RBF-NN scheme has already exhibited a similar property; in Poggio and Edelman (1990), it is stated that a reasonable initial performance can be obtained by merely setting the centres (i.e. the centroid vectors) to a subset of the examples.

Table 2.2. Comparison of symbol-grounding approaches and feedforward type networks – generalised regression neural networks (GRNNs), multilayered perceptron neural networks (MLP-NNs), probabilistic neural networks (PNNs), and radial basis function neural networks (RBF-NNs)

                                     Symbol Processing   GRNN/PNN           MLP-NN/RBF-NN
                                     Approaches
  Data Representation                Not Encoded         Not Encoded        Encoded
  Straightforward Network
    Growing/Shrinking                Yes                 Yes                No (Yes for RBF-NN)
  Numerical Instability              No                  No                 Yes
  Memory Space Required              Huge                Relatively Large   Moderately Large
  Capability in Accommodating
    New Classes                      Yes                 Yes                No

In Table 2.2, a comparison of commonly used layered type artificial neural networks and symbol-based connectionist models is given, i.e. symbol processing approaches as in traditional artificial intelligence (see e.g. Newell and Simon, 1997), where each node simply consists of a pattern and a symbol (label) and no further processing between the respective nodes is involved, and layered type artificial neural networks, i.e. GRNNs, MLP-NNs, PNNs, and RBF-NNs.

As indicated in Table 2.2 and the study in Hoya (2003a), the disadvantages of PNNs may, in turn, reside in 1) the necessity for a relatively large space for storing the network parameters, i.e. the centroid vectors, 2) intensive access to the stored data within the PNNs in the reference (i.e. testing) mode, 3) the determination of the radii parameters, which is related to 2), and 4) how to determine the size of the PNN (i.e. the number of hidden nodes to be used).

In respect of 1), MLP-NNs seem to have an advantage in that the distributed (or sparse) data representation obtained after the learning may yield a more compact memory space than that required for a PNN/GRNN, albeit at the expense of iterative learning and the possibility of the aforementioned numerical problems, which can be serious, especially when the size of the training set is large. However, this does not seem to give any further advantage, since, as in the pattern classification application (Hoya, 1998), an RBF-NN (GRNN) of the same size as an MLP-NN may yield a similar performance.

For 3), although some iterative tuning methods have been proposed and investigated (see e.g. Bishop, 1996; Wasserman, 1993), it is reported in Hoya and Chambers (2001a) and Hoya (2003a, 2004b) that a single setting of the radii for all the RBFs, which can also be regarded as a modified version of the heuristic suggested in Haykin (1994), still yields a reasonable performance:

$$ \sigma_j = \sigma = \theta_\sigma \times d_{\max}, \qquad (2.6) $$

where $d_{\max}$ is the maximum Euclidean distance between all the centroid vectors within a PNN/GRNN, i.e. $d_{\max} = \max(\|\mathbf{c}_l - \mathbf{c}_m\|_2)$, $(l \neq m)$, and $\theta_\sigma$ is a suitably chosen constant (for all the simulation results given in Sect. 2.3.5, the setting $\theta_\sigma = 0.1$ was employed). Therefore, this is not considered to be crucial.
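For illustration, a direct reading of (2.6) could look like the following sketch (a hypothetical helper, taking the stacked centroid vectors of a PNN/GRNN and using the non-squared Euclidean distance described in the text):

```python
import numpy as np

def common_radius(centroids, theta_sigma=0.1):
    """Sketch of (2.6): sigma = theta_sigma * d_max, where d_max is the largest
    pairwise Euclidean distance among the centroid vectors; theta_sigma = 0.1
    is the value reported for the simulations in Sect. 2.3.5."""
    c = np.atleast_2d(np.asarray(centroids, dtype=float))
    diffs = c[:, None, :] - c[None, :, :]             # all pairwise differences
    d_max = np.sqrt((diffs ** 2).sum(axis=-1)).max()  # max Euclidean distance
    return theta_sigma * d_max
```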

Point 4) still remains an open issue related to the pruning of the data points to be stored within the network (Wasserman, 1993). However, the selection of data points, i.e. the determination of the network size, is not an issue limited to GRNNs and PNNs. MacQueen's k-means method (MacQueen, 1967) or, alternatively, graph theoretic data-pruning methods (Hoya, 1998) could potentially be used for clustering in a number of practical situations. These methods have been found to provide reasonable generalisation performance (Hoya and Chambers, 2001a). Alternatively, this can be achieved by means of an intelligent approach, i.e. within the context of the evolutionary process of a hierarchically arranged GRNN (HA-GRNN) (to be described in Chap. 10), since, as reported in Hoya (2004b), the performance of a sufficiently evolved HA-GRNN is superior to that of an ordinary GRNN of exactly the same size constructed using MacQueen's k-means clustering method. (The issues related to HA-GRNNs will be discussed in more detail later in this book.)
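As a rough illustration of fixing the network size by clustering, the sketch below selects k representative centroids from one class's training vectors (hypothetical code; a simplified batch, Lloyd-style variant rather than MacQueen's original sequential update):

```python
import numpy as np

def kmeans_centroids(patterns, k, n_iter=100, seed=0):
    """Pick k representative centroids for one class, so the subnet stores
    k vectors instead of every training pattern."""
    x = np.asarray(patterns, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each pattern to its nearest centroid, then re-estimate means.
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = x[labels == j].mean(axis=0)
    return centroids
```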

Thus, the most outstanding issue pertaining to a PNN/GRNN seems to be 2). However, as described later (in Chap. 4), in the context of the self-organising kernel memory concept, this may not be such an issue, since, during the training phase, just a one-pass presentation of the input data is sufficient to self-organise the network structure. In addition, by means of the modular architecture (to be discussed in Chap. 8; the hierarchically layered long-term memory (LTM) networks concept), the problem of intensive access, i.e. the need to update the radii values, could also be solved.

In addition, given the supportive argument regarding RBF units in Vetter et al. (1995), the approach in terms of RBFs (or, in more general terms, kernels) can also be biologically appealing. It is then fair to say that the functionality of an RBF unit somewhat represents that of the so-called “grandmother” cells (Gross et al., 1972; Perrett et al., 1982)7. (We will return to this issue in Chap. 4.)

7 However, at the neuro-anatomical level, whether or not such cells actually exist in a real brain is still an open issue and beyond the scope of this book. Here, the author simply intends to highlight the importance of the neurophysiological evidence that some cells (or the column structures) may represent the functionality of the “grandmother” cells which exhibit such generalisation capability.
