A max-min learning rule for Fuzzy ART
Nong Thi Hoa, The Duy Bui
Human Machine Interaction Laboratory, University of Engineering and Technology, Vietnam National University, Hanoi
Abstract—Fuzzy Adaptive Resonance Theory (Fuzzy ART) is an unsupervised neural network that clusters data effectively by learning from training data. In the learning process, Fuzzy ART updates the weight vector of the winning category based on the current input pattern from the training data. Fuzzy ARTs, however, only learn from patterns whose values are smaller than the values of stored patterns. In this paper, we propose a max-min learning rule for Fuzzy ART that learns from all patterns of the training data and reduces the effect of abnormal training patterns. Our learning rule changes the weight vector of the winning category based on the minimal difference between the current input pattern and the old weight vector of the winning category. We have also conducted experiments on seven benchmark datasets to prove the effectiveness of the proposed learning rule. Experimental results show that the clustering results of Fuzzy ART with our learning rule (Max-min Fuzzy ART) are significantly higher than those of other models on complex datasets.
Index Terms—Fuzzy ART; Adaptive Resonance Theory; Clustering; Learning Rule; Unsupervised Neural Network
I. INTRODUCTION
Clustering is an important tool in data mining and knowledge discovery. It uncovers hidden similarities and key concepts in data through its ability to group similar items together automatically. Moreover, clustering classifies a large amount of data into a small number of groups, which serves as an invaluable tool for users to comprehend large amounts of data. Fuzzy Adaptive Resonance Theory (Fuzzy ART) is an unsupervised neural network that clusters data effectively. In Fuzzy ART, the weight vectors of categories are updated when the similarity between the input pattern and the winning category satisfies a given condition. Studies on Fuzzy ART can be divided into three categories: developing new models, studying properties of Fuzzy ART, and optimizing the computation of models. In the category of developing new models, Fuzzy ARTs are improved in the learning step in order to increase the ability of clustering. Carpenter et al. [1] proposed the first Fuzzy ART, showing the capacity for stable learning of recognition categories. Isawa et al. [2], [3] proposed an additional step, Group Learning, to represent connections between similar categories. Yousuf and Murphey [4] provided an algorithm that allowed Fuzzy ART to update multiple matching categories. However, previous Fuzzy ARTs only learn from patterns whose values are smaller than the values of stored patterns. This means that the weight vector of the winning category is modified only when the values of the input pattern are smaller than the values of that weight vector. Otherwise, the weight vector of the winning category is
not changed. In other words, some input patterns might not contribute to the learning process of Fuzzy ART, and therefore some important features of these patterns are not learned. In this paper, we propose a new learning rule for Fuzzy ART, which we name the max-min learning rule. Our learning rule allows all training patterns to contribute to the learning process and reduces the effect of abnormal training patterns. The proposed learning rule updates the weight vector of the winning category based on the minimal difference between the current input pattern and the old weight vector of the winning category.

Figure 1. Architecture of an ART network
We have conducted experiments on seven benchmark datasets from the UCI and Shape databases to prove the effectiveness of Max-min Fuzzy ART. Results from the experiments show that Max-min Fuzzy ART clusters better than other models. The rest of this paper is organized as follows. The next section is background on Fuzzy ART. Related work is presented in Section 3. In Section 4, we present the proposed learning rule. Section 5 shows experiments and compares the results of Max-min Fuzzy ART with those of other models.
II. BACKGROUND
A. ART Network
Adaptive Resonance Theory (ART) neural networks [5], [6] were developed to address the stability-plasticity dilemma. The general structure of an ART network is shown in Figure 1.
A typical ART network consists of two layers: an input layer (F1) and an output layer (F2). The input layer contains n nodes, where n is the dimension of the input pattern. The number of nodes in the output layer is decided dynamically. Every node in the output layer has a corresponding weight vector. The network dynamics are governed by two subsystems: an attention subsystem and an orienting subsystem. The attention subsystem proposes a winning neuron (or category), and the orienting subsystem decides whether to accept the winning neuron or not. The network is in a resonant state when the orienting subsystem accepts a winning category.
2013 IEEE RIVF International Conference on Computing & Communication Technologies - Research, Innovation, and Vision for the Future (RIVF)
B. Fuzzy ART Algorithm
Carpenter et al. summarize the Fuzzy ART algorithm in [1].
Input vector: Each input I is an M-dimensional vector (I1, ..., IM), where each component Ii is in the interval [0, 1].
Parameters: Fuzzy ART's dynamics are determined by a choice parameter α > 0, a learning rate parameter β ∈ [0, 1], and a vigilance parameter θ ∈ [0, 1].
The Fuzzy ART algorithm consists of the following five steps:
Step 1: Set up weight vectors.
Each category j corresponds to a vector Wj = (Wj1, ..., WjM) of adaptive weights, or LTM traces. The number of potential categories N (j = 1, ..., N) is arbitrary. Initially,
Wj1 = ... = WjM = 1 (1)
and each category is said to be uncommitted. Alternatively, initial weights Wji may be taken greater than 1. Larger weights bias the system against selection of uncommitted nodes, leading to deeper searches of previously coded categories. After a category is selected for coding, it becomes committed. As shown below, each LTM trace Wji is monotone non-increasing through time and hence converges to a limit.
Step 2: Choose a winning category.
For each input I and category j, the choice function Tj is defined by
Tj(I) = ||I ∧ Wj|| / (α + ||Wj||) (2)
where the fuzzy AND operator ∧ is defined by
(x ∧ y)i = min(xi, yi) (3)
and where the norm ||.|| is defined by
||x|| = Σ_{i=1}^{M} |xi| (4)
For notational simplicity, Tj(I) in Eq. 2 is often written as Tj when the input I is fixed. The category choice is indexed by J, where
TJ = max{Tj : j = 1, ..., N} (5)
If more than one Tj is maximal, the category j with the smallest index is chosen. In particular, nodes become committed in order j = 1, 2, 3, ...
Step 3: Test the state of Fuzzy ART.
Resonance occurs if the match function of the chosen category meets the vigilance criterion; that is, if
||I ∧ WJ|| / ||I|| ≥ θ (6)
then the learning process is performed in Step 4.
Mismatch reset occurs if
||I ∧ WJ|| / ||I|| < θ (7)
Then the value of the choice function TJ is reset to −1 for the duration of the input presentation. A new index J is chosen by Eq. 5. The search process continues until the chosen J satisfies Eq. 6 or activates a new category.
Step 4: Perform the learning process.
The weight vector of the Jth category, WJ, is updated according to the following equation:
WJ(new) = β(I ∧ WJ(old)) + (1 − β)WJ(old) (8)
Step 5: Activate a new category.
For each input I, if no existing category satisfies Eq. 6, then a new category J becomes active. Then, WJ(new) = I.
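The five steps above can be sketched in a few lines of NumPy. The class name, default parameter values, and the plain sorted search (standing in for explicit mismatch resets) are our own illustrative choices, not from the paper.

```python
import numpy as np

# Minimal sketch of Steps 1-5 of the Fuzzy ART algorithm described above.
class FuzzyART:
    def __init__(self, alpha=0.001, beta=1.0, theta=0.75):
        self.alpha, self.beta, self.theta = alpha, beta, theta
        self.W = []  # Step 1: weight vectors of committed categories

    def train(self, I):
        I = np.asarray(I, dtype=float)
        # Step 2: choice function Tj (Eq. 2) for each committed category
        T = [np.minimum(I, W).sum() / (self.alpha + W.sum()) for W in self.W]
        # Visit categories in decreasing order of Tj; skipping a category
        # plays the role of the mismatch reset (Eq. 7) in Step 3.
        for j in sorted(range(len(T)), key=lambda k: -T[k]):
            # Step 3: vigilance test (Eq. 6)
            if np.minimum(I, self.W[j]).sum() / I.sum() >= self.theta:
                # Step 4: learning update (Eq. 8)
                self.W[j] = (self.beta * np.minimum(I, self.W[j])
                             + (1 - self.beta) * self.W[j])
                return j
        # Step 5: no category resonates, so commit a new one with W = I
        self.W.append(I.copy())
        return len(self.W) - 1
```

With fast learning (β = 1), Eq. 8 reduces to WJ(new) = I ∧ WJ(old), so each weight component is monotone non-increasing, as noted in Step 1.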
C. Fuzzy ART with complement coding
Proliferation of categories is avoided in Fuzzy ART if inputs are normalized; that is, if for some γ > 0,
||I|| = γ (9)
for all inputs I. Normalization can be achieved by preprocessing each incoming vector a. Complement coding represents both a and the complement of a. The complement of a is denoted by ac, where
aci = 1 − ai (10)
The complement-coded input I to the recognition system is the 2M-dimensional vector
I = (a, ac) = (a1, ..., aM, ac1, ..., acM) (11)
After normalization, ||I|| = M, so inputs preprocessed into complement coding form are automatically normalized. Where complement coding is used, the initial condition in Eq. 1 is replaced by
Wj1 = ... = Wj,2M = 1 (12)
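A minimal sketch of complement coding as defined above (the function name is our own):

```python
import numpy as np

# Complement coding (Eqs. 10-11): every coded input has norm M, which
# prevents category proliferation without discarding amplitude information.
def complement_code(a):
    a = np.asarray(a, dtype=float)   # each component assumed in [0, 1]
    return np.concatenate([a, 1.0 - a])
```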
III. RELATED WORK
Studies on the theory of Fuzzy ART can be divided into three categories: developing new models, studying properties of Fuzzy ART, and optimizing the computation of models.
In the first category, new models of Fuzzy ART used a general learning rule. Carpenter et al. [7] proposed Fuzzy ARTMAP for incremental supervised learning of recognition categories and multidimensional maps from arbitrary sequences of input sets. This model minimized predictive error and maximized code generalization by increasing the ART vigilance parameter to correct predictive errors. Prediction was improved by training the system several times with different orderings of the input set and then voting. This voting assigns probability estimates to competing predictions for small, noisy, and incomplete data. Isawa et al. [2] proposed an additional step that was called
Group Learning. Its important feature was creating connections between similar categories. This means that the model learned not only the weight vectors of categories but also the relations among categories in a group. Then, Isawa et al. [3] designed an improved Fuzzy ART that combines overlapped categories based on these connections. This study arranged vigilance parameters for categories and varied them during the learning process. Moreover, this model avoided category proliferation. Yousuf and Murphey [4] proposed an algorithm that compared the weights of every category with the current input pattern simultaneously and allowed updating multiple matching categories. This model monitored the effects of updating wrong clusters. The weight scaling of categories depended on the closeness of the weight vectors to the current input pattern.
In the second category, important properties of Fuzzy ART were studied in order to choose suitable parameters for each Fuzzy ART. Huang et al. [8] presented some vital properties that were divided into a number of categories, including template, access, reset, and other properties for weight stabilization. Moreover, the effects of the choice parameter and the vigilance parameter on the functionality of Fuzzy ART were presented clearly. Georgiopoulos et al. [9] provided a geometrical and clearer understanding of why, and in which order, categories are chosen for various ranges of the choice parameter. This study is useful when developing properties of learning that pertain to the architecture of neural networks. Anagnostopoulos and Georgiopoulos [10] introduced geometric concepts, namely category regions, into the original framework of Fuzzy ART and Fuzzy ARTMAP. These regions had the same geometrical shape and shared many common and interesting properties. They proved properties of the learning and showed that the training and performance phases did not depend on particular choices of the vigilance parameter in one special region of the vigilance-choice parameter space.
In the third category, studies focused on ways to improve the performance of Fuzzy ART. Burwick and Joublin [11] discussed implementations of ART with a non-recursive algorithm to decrease the algorithmic complexity of Fuzzy ART. As a result, the complexity dropped from O(N·N + M·N) down to O(N·M), where N was the number of categories and M was the input dimension. Dagher et al. [12] introduced an ordering algorithm for Fuzzy ARTMAP that identified a fixed order of training pattern presentation based on the max-min clustering method. The combination of this algorithm with Fuzzy ARTMAP established an ordered Fuzzy ARTMAP that exhibited better generalization performance. Cano et al. [13] generated accurate function identifiers for noisy data. This study was supported by theorems that guaranteed the possibility of representing an arbitrary function by fuzzy systems. They proposed two neuro-fuzzy identifiers that offered a dual interpretation as a fuzzy logic system or a neural network. Moreover, these identifiers can be trained on noisy data without changing the structure of the neural networks or preprocessing the data. Kobayashi et al. [14] proposed a reinforcement learning system that used Fuzzy ART to classify observed information and construct an effective state space. Then, profit sharing was employed as the reinforcement learning method. Furthermore, this system was used to effectively solve partially observable Markov decision processes.
IV. OUR APPROACH
A. Max-min learning rule for Fuzzy ART
As discussed in Section 1, Fuzzy ARTs cannot learn from some important patterns of the training data. Therefore, we propose the max-min learning rule, which learns from all training patterns and avoids the effect of abnormal training patterns. In our learning rule, the weight vector of the winning category is updated based on the minimum difference between the input pattern and the old weight vector of the winning category. A parameter δ controls the effect of each training pattern on the winning category. We propose a procedure to find an optimized value of δ for each dataset. In this procedure, after δ is set up roughly based on the size of the dataset, it is increased or decreased until the clustering results become highest.
The max-min learning rule is presented as follows. The learning step is performed by the three following steps:
• Step 1: Determine the minimum difference between the current input pattern and the old weight vector of the winning category, namely the minimum difference of decrease (MDD) and the minimum difference of increase (MDI). MDD and MDI are formulated by the two equations:
MDD = min{ Wji(old) − Ii : Ii < Wji(old), i = 1, ..., M } (13)
MDI = min{ Ii − Wji(old) : Ii > Wji(old), i = 1, ..., M } (14)
• Step 2: Find an optimized value of δ by the procedure in the next subsection.
• Step 3: The weight vector of the winning category, Wj, is updated by the following equation:
Wji = Wji(old) − δ · MDD if Ii < Wji(old)
Wji = Wji(old) if Ii = Wji(old)
Wji = Wji(old) + δ · MDI if Ii > Wji(old)
(15)
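Under these definitions, one winning-category update can be sketched as follows; the function name and the default δ value are illustrative assumptions, not from the paper.

```python
import numpy as np

# Sketch of the max-min update (Eqs. 13-15), assuming the winning category
# has already been chosen by the standard Fuzzy ART search.
def max_min_update(W_old, I, delta=0.1):
    W_old = np.asarray(W_old, dtype=float)
    I = np.asarray(I, dtype=float)
    dec = W_old > I   # components where the input is smaller than the weight
    inc = W_old < I   # components where the input is larger than the weight
    # Eq. 13: smallest decrease over components with Ii < Wji(old)
    MDD = (W_old[dec] - I[dec]).min() if dec.any() else 0.0
    # Eq. 14: smallest increase over components with Ii > Wji(old)
    MDI = (I[inc] - W_old[inc]).min() if inc.any() else 0.0
    # Eq. 15: move every component toward the input by a step bounded by
    # the minimum difference, so one abnormal component cannot dominate.
    W_new = W_old.copy()
    W_new[dec] -= delta * MDD
    W_new[inc] += delta * MDI
    return W_new
```

Because the step is scaled by the minimum (not maximum) component-wise difference, an outlier component of an abnormal pattern changes the weights only slightly, which is the intended robustness property.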
B. Procedure for determining an optimized value of δ
We select a random subset of the dataset with a uniform distribution over categories. Then, Max-min Fuzzy ART uses this subset to test the ability of clustering for each value of δ. Our procedure consists of three steps as follows:
• Step 1: Set up δ based on the size of the dataset. Then, calculate the clustering result.
• Step 2: Do
– Step 2.1: Increase or decrease the value of δ with a small step.
– Step 2.2: Calculate the clustering result.
– Step 2.3: Test the clustering result:
IF the ability of clustering increases or decreases, THEN repeat Step 2.
IF the clustering result is highest, THEN go to Step 3.
until the clustering result is highest.
• Step 3: Return the optimized value of δ.
Table I. CHARACTERISTICS OF DATASETS
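The procedure above can be sketched as a simple hill-climbing search. The scoring callback, the initial value, and the step size below are assumptions for illustration, since the paper only specifies a rough size-based initialization and small adjustment steps.

```python
# Sketch of the three-step delta search described above.
def find_delta(score, delta0=0.1, step=0.01, max_iters=100):
    """score(delta) -> clustering accuracy on a held-out random subset."""
    best_delta, best_score = delta0, score(delta0)
    for direction in (+1, -1):          # Step 2: try increasing, then decreasing
        delta = best_delta
        for _ in range(max_iters):
            candidate = delta + direction * step
            if candidate <= 0:
                break
            s = score(candidate)
            if s <= best_score:          # stop once the result stops improving
                break
            best_delta, best_score = candidate, s
            delta = candidate
    return best_delta                    # Step 3: return the optimized value
```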
C. Discussion
The proposed learning rule has two important advantages over the original learning rule, as follows:
• All patterns from the training data are learned by Eq. 15. Moreover, the effect level is equal for each training pattern because the parameter δ is fixed.
• The effect of abnormal training patterns, addressed in Eqs. 13 and 14, is avoided because MDD and MDI are the minimum difference of decrease and the minimum difference of increase, respectively.
The clustering ability of Max-min Fuzzy ART can thus be improved through these improvements of the learning process.
V. EXPERIMENTS
We select seven benchmark datasets from the UCI database 1 and Shape database 2 for the experiments, namely MONKS, BALANCE-SCALE, D31, R15, WDBC (Wisconsin Diagnostic Breast Cancer), WINE-RED (Wine Quality of red wine), and WINE-WHITE (Wine Quality of white wine). These datasets differ from each other in the number of attributes, categories, and patterns, and in the distribution of categories. Table I shows the characteristics of the selected datasets.
Max-min Fuzzy ART is implemented as two models: a first model (Original Max-min Fuzzy ART) and a second model with normalized inputs (Complement Max-min Fuzzy ART). Similarly, the Fuzzy ART of Carpenter [1] consists of two models: Original Fuzzy ART and Complement Fuzzy ART. To prove the effectiveness of Max-min Fuzzy ART, we use the following models in the experiments: Original Max-min Fuzzy ART (OriMART), Complement Max-min Fuzzy ART (ComMART), Original Fuzzy ART (OriART), Complement Fuzzy ART (ComART), K-means [15], and Euclidean ART (EucART) [16].
The data of each dataset are normalized to values in [0, 1]. We choose a random vector from each category to be the initial weight vector of that category. The parameters of the models are tuned to obtain the highest clustering results.
For each dataset, we run sub-tests with different numbers of patterns. The percentages of successfully clustered patterns are presented in a corresponding table. Bold numbers in each table show the results of the best model among the compared models.
1 UCI database, available at: http://archive.ics.uci.edu/ml/datasets
2 Shape database, available at: http://cs.joensuu.fi/sipu/datasets
Table II. THE PERCENTAGES OF SUCCESSFUL CLUSTERING PATTERNS WITH WDBC DATASET
#Record OriMART ComMART OriART ComART EucART K-mean
569 55.89 90.51 35.85 74.17 46.92 16.17
500 51 89.2 36.4 73.6 41.8 18.4
300 26 82 32.33 67.33 21 30.67
Table III. THE PERCENTAGES OF SUCCESSFUL CLUSTERING PATTERNS WITH D31 DATASET
#Record OriMART ComMART OriART ComART EucART K-mean
3100 91.87 94.45 84.74 92.94 92.48 65
2500 91.44 94.16 84.44 91.96 91.64 64.68
2000 90.7 93.95 86.1 91.55 91.2 61.6
1500 90 93.87 85.67 91.6 90.8 59.93
1000 89.5 93.2 86.4 90.9 90.5 66.2
500 84.8 90.6 89.6 86.8 88.6 57.4
A. Testing with WDBC dataset
The distribution of the two categories is non-uniform at a medium level. Data from Table II show that the clustering ability of Complement Max-min Fuzzy ART is considerably higher than that of the other models in all sub-tests.
B. Testing with D31 dataset
The distribution of the 31 categories is uniform. The results in Table III show that Complement Max-min Fuzzy ART is the best model in every sub-test.
C. Testing with WINE-WHITE dataset
The distribution of the six categories is highly non-uniform. Data from Table IV show that the clustering ability of Original Max-min Fuzzy ART is considerably higher than that of the other models in all sub-tests.
D. Testing with BALANCE-SCALE dataset
The distribution of the three categories is highly non-uniform. The results in Table V show that the clustering ability of Original Max-min Fuzzy ART is sharply higher than that of the other models in every sub-test, except the last sub-test with the smallest number of records (100 records).
E. Testing with R15 dataset
The distribution of the 15 categories is uniform. Data from Table VI show that the clustering ability of both Complement Max-min Fuzzy ART and Original Max-min Fuzzy ART is higher than that of the other models in all sub-tests, except the last sub-test with the smallest number of records (100 records).
Table IV. THE PERCENTAGES OF SUCCESSFUL CLUSTERING PATTERNS WITH WINE-WHITE DATASET
#Record OriMART ComMART OriART ComART EucART K-mean
4898 34.73 43.32 21.95 17.78 17.07 32.24
4000 37.75 33.18 23.48 15.425 18.45 32.1
3000 41.57 12.67 28.47 18.13 16.93 30.47
2000 43.6 4.3 26.3 21.3 19.25 29.8
1000 50.8 4.1 11 23.8 23.7 20.7
Table V. THE PERCENTAGES OF SUCCESSFUL CLUSTERING PATTERNS WITH BALANCE-SCALE DATASET
#Record OriMART ComMART OriART ComART EucART K-mean
625 80.16 67.52 59.52 55.52 45.76 33.6
500 75.2 64.8 57.2 51.2 32.2 28.2
400 70.5 56.25 49.75 46.25 17 31.25
300 67 46.67 46.67 43.33 5 27.67
Table VI. THE PERCENTAGES OF SUCCESSFUL CLUSTERING PATTERNS WITH R15 DATASET
#Record OriMART ComMART OriART ComART EucART K-mean
600 98.17 98.17 91.2 97.8 97.8 76
500 97.8 97.8 89.4 97.4 97.4 71.2
400 97.25 97.25 86.8 96.8 96.8 64
300 96.33 96.33 88.3 95.7 95.7 53.7
F. Testing with MONKS dataset
The distribution of the two categories is uniform. The results in Table VII show that Original Max-min Fuzzy ART clusters better than the other models, except in the last sub-test with the smallest number of records (100 records).
G. Testing with WINE-RED dataset
The distribution of the six categories is highly non-uniform. Data from Table VIII show that Original Max-min Fuzzy ART clusters better than the other models in the sub-tests with a high number of records, and is slightly lower than Complement Fuzzy ART in the first sub-test (lower by 0.13%).
In conclusion, the results of the sub-tests in the seven experiments show that the clustering ability of Max-min Fuzzy ART improves significantly on highly complex datasets (many attributes, highly non-uniform category distribution). In particular, clustering results are high when the dataset contains many records.
VI. CONCLUSION
In this paper, we have proposed a new learning rule for Fuzzy ART that learns from training data more effectively and improves
Table VII. THE PERCENTAGES OF SUCCESSFUL CLUSTERING PATTERNS WITH MONKS DATASET
#Record OriMART ComMART OriART ComART EucART K-mean
459 67.97 41.18 42.92 41.18 65.36 45.75
400 63.25 41.5 44.25 42 60.75 45.25
300 57.67 42.67 44 37.67 47.67 48.67
Table VIII. THE PERCENTAGES OF SUCCESSFUL CLUSTERING PATTERNS WITH WINE-RED DATASET
#Record OriMART ComMART OriART ComART EucART K-mean
1599 25.39 17.32 18.26 25.52 14.26 16.77
1200 33.83 23.08 20.25 18.67 17.75 17
900 41 27.78 21.11 10.78 22 19.22
600 32 26.83 21.5 9.833 28.67 17.83
300 12.67 21.33 25.67 16.33 51.67 24.67
the ability of clustering. Our learning rule learns from every pattern of the training data and avoids the effect of abnormal training patterns. The improvement of clustering results is shown in our experiments with seven benchmark datasets. The experimental results show that the clustering ability of Max-min Fuzzy ART is higher than that of other models for complex datasets with a small number of patterns. In particular, Max-min Fuzzy ART clusters more effectively on datasets that contain a high number of patterns.
ACKNOWLEDGEMENTS
This work was supported by Vietnam's National Foundation for Science and Technology Development (NAFOSTED) under Grant Number 102.02-2011.13.
REFERENCES
[1] G. Carpenter, S. Grossberg, and D. B. Rosen, "Fuzzy ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system," Neural Networks, vol. 4, pp. 759–771, 1991.
[2] H. Isawa, M. Tomita, H. Matsushita, and Y. Nishio, "Fuzzy Adaptive Resonance Theory with Group Learning and its applications," International Symposium on Nonlinear Theory and its Applications, no. 1, pp. 292–295, 2007.
[3] H. Isawa, H. Matsushita, and Y. Nishio, "Improved Fuzzy Adaptive Resonance Theory combining overlapped category in consideration of connections," IEEE Workshop on Nonlinear Circuit Networks, pp. 8–11, 2008.
[4] Yousuf and Y. L. Murphey, "A supervised Fuzzy Adaptive Resonance Theory with distributed weight update," Proceedings of the 7th International Conference on Advances in Neural Networks, Part I, LNCS, no. 6063, pp. 430–435, 2010.
[5] S. Grossberg, "Adaptive pattern classification and universal recoding, II: Feedback, expectation, olfaction and illusions," Biological Cybernetics, vol. 23, pp. 187–212, 1976.
[6] S. Grossberg, "How does a brain build a cognitive code?," in Studies of Mind and Brain: Neural Principles of Learning, Perception, Development, Cognition, and Motor Control (Chap. I). Boston, MA: Reidel Press, 1980.
[7] G. A. Carpenter, S. Grossberg, and N. Markuzon, "Fuzzy ARTMAP: an adaptive resonance architecture for incremental learning of analog maps," International Joint Conference on Neural Networks, vol. 3, pp. 309–314, 1992.
[8] J. Huang, M. Georgiopoulos, and G. L. Heileman, "Fuzzy ART properties," Neural Networks, vol. 8, no. 2, pp. 203–213, 1995.
[9] M. Georgiopoulos, H. Fernlund, G. Bebis, and G. Heileman, "Fuzzy ART and Fuzzy ARTMAP: effects of the choice parameter," Neural Networks, vol. 9, pp. 1541–1559, 1996.
[10] G. C. Anagnostopoulos and M. Georgiopoulos, "Category regions as new geometrical concepts in Fuzzy-ART and Fuzzy-ARTMAP," Neural Networks, vol. 15, pp. 1205–1221, 2002.
[11] T. Burwick and F. Joublin, "Optimal algorithmic complexity of Fuzzy ART," Neural Processing Letters, vol. 7, pp. 37–41, 1998.
[12] Dagher, M. Georgiopoulos, G. L. Heileman, and G. Bebis, "An ordering algorithm for pattern presentation in fuzzy ARTMAP that tends to improve generalization performance," IEEE Transactions on Neural Networks, vol. 10, no. 4, pp. 768–778, 1999.
[13] M. Cano, Y. Dimitriadis, E. Gomez, and J. Coronado, "Learning from noisy information in FasArt and FasBack neuro-fuzzy systems," Neural Networks, vol. 14, pp. 407–425, 2001.
[14] K. Kobayashi, S. Mizuno, T. Kuremoto, and M. Obayashi, "A reinforcement learning system based on state space construction using Fuzzy ART," Proceedings of SICE Annual Conference, vol. 2005, no. 1, pp. 3653–3658, 2005.
[15] J. B. MacQueen, "Some methods for classification and analysis of multivariate observations," Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, no. 1, pp. 281–297, 1967.
[16] R. Kenaya and K. C. Cheok, "Euclidean ART neural networks," Proceedings of the World Congress on Engineering and Computer Science, 2008.