A max-min learning rule for Fuzzy ART
Nong Thi Hoa, The Duy Bui
Human Machine Interaction Laboratory, University of Engineering and Technology, Vietnam National University, Hanoi
Abstract—Fuzzy Adaptive Resonance Theory (Fuzzy ART) is an unsupervised neural network that clusters data effectively by learning from training data. In the learning process, Fuzzy ART updates the weight vector of the winning category based on the current input pattern from the training data. Fuzzy ARTs, however, only learn from patterns whose values are smaller than the values of stored patterns. In this paper, we propose a max-min learning rule for Fuzzy ART that learns from all patterns of the training data and reduces the effect of abnormal training patterns. Our learning rule changes the weight vector of the winning category based on the minimal difference between the current input pattern and the old weight vector of the winning category. We have also conducted experiments on seven benchmark datasets to prove the effectiveness of the proposed learning rule. Experimental results show that the clustering results of Fuzzy ART with our learning rule (Max-min Fuzzy ART) are significantly higher than those of other models on complex datasets.
Index Terms—Fuzzy ART; Adaptive Resonance Theory; Clustering; Learning Rule; Unsupervised Neural Network
I. INTRODUCTION
Clustering is an important tool in data mining and knowledge discovery. It uncovers hidden similarities and key concepts in data through its ability to group similar items together automatically. Moreover, clustering classifies a large amount of data into a small number of groups, which serves as an invaluable tool for users to comprehend large amounts of data. Fuzzy Adaptive Resonance Theory (Fuzzy ART) is an unsupervised neural network that clusters data effectively. In Fuzzy ART, the weight vectors of categories are updated when the similarity between the input pattern and the winning category satisfies a given condition. Studies on Fuzzy ART can be divided into three categories: developing new models, studying properties of Fuzzy ART, and optimizing the computation of models. In the category of developing new models, Fuzzy ARTs are improved in the learning step in order to increase the ability of clustering. Carpenter et al. [1] proposed the first Fuzzy ART, showing the capacity for stable learning of recognition categories. Isawa et al. [2], [3] proposed an additional step, Group Learning, to represent connections between similar categories. Yousuf and Murphey [4] provided an algorithm that allowed Fuzzy ART to update multiple matching categories. However, previous Fuzzy ARTs only learn from patterns whose values are smaller than the values of stored patterns. This means that the weight vector of the winning category is modified only when the values of the input pattern are smaller than the values of that weight vector. Otherwise, the weight vector of the winning category is
not changed. In other words, some input patterns might not contribute to the learning process of Fuzzy ART, and therefore some important features of these patterns are not learned. In this paper, we propose a new learning rule for Fuzzy ART, which we name the max-min learning rule. Our learning rule allows all training patterns to contribute to the learning process and reduces the effect of abnormal training patterns. The proposed learning rule updates the weight vector of the winning category based on the minimal difference between the current input pattern and the old weight vector of the winning category.

Figure 1. Architecture of an ART network
We have conducted experiments on seven benchmark datasets from the UCI and Shape databases to prove the effectiveness of Max-min Fuzzy ART. Results from the experiments show that Max-min Fuzzy ART clusters better than other models. The rest of this paper is organized as follows. The next section is background on Fuzzy ART. Related work is presented in Section 3. In Section 4, we present the proposed learning rule. Section 5 shows experiments and compares the results of Max-min Fuzzy ART with those of other models.
II. BACKGROUND
A. ART Network
Adaptive Resonance Theory (ART) neural networks [5], [6] were developed to address the stability-plasticity dilemma. The general structure of an ART network is shown in Figure 1.
A typical ART network consists of two layers: an input layer (F1) and an output layer (F2). The input layer contains n nodes, where n is the dimension of the input pattern. The number of nodes in the output layer is decided dynamically. Every node in the output layer has a corresponding weight vector. The network dynamics are governed by two subsystems: an attention subsystem and an orienting subsystem. The attention subsystem proposes a winning neuron (or category), and the orienting subsystem decides whether to accept the winning neuron or not. The network is in a resonant state when the orienting subsystem accepts a winning category.
2013 IEEE RIVF International Conference on Computing & Communication Technologies - Research, Innovation, and Vision for the Future (RIVF)
B. Fuzzy ART Algorithm
Carpenter et al. summarize the Fuzzy ART algorithm in [1].
Input vector: Each input I is an M-dimensional vector (I1, ..., IM), where each component Ii is in the interval [0, 1].
Parameters: Fuzzy ART's dynamics are determined by a choice parameter α > 0, a learning rate parameter β ∈ [0, 1], and a vigilance parameter θ ∈ [0, 1].
The Fuzzy ART algorithm consists of the following five steps:
Step 1: Set up weight vectors.
Each category j corresponds to a vector Wj = (Wj1, ..., WjM) of adaptive weights, or LTM traces. The number of potential categories N (j = 1, ..., N) is arbitrary. Initially,
Wj1 = ... = WjM = 1 (1)
and each category is said to be uncommitted. Alternatively, initial weights Wji may be taken greater than 1. Larger weights bias the system against selection of uncommitted nodes, leading to deeper searches of previously coded categories. After a category is selected for coding, it becomes committed. As shown below, each LTM trace Wji is monotone non-increasing through time and hence converges to a limit.
Step 2: Choose a winning category.
For each input I and category j, the choice function Tj is defined by
Tj(I) = ||I ∧ Wj|| / (α + ||Wj||) (2)
where the fuzzy AND operator ∧ is defined by
(x ∧ y)i = min(xi, yi) (3)
and where the norm ||.|| is defined by
||x|| = Σ_{i=1}^{M} |xi| (4)
For notational simplicity, Tj(I) in Eq. 2 is often written as Tj when the input I is fixed. The category choice is indexed by J, where
TJ = max{Tj : j = 1, ..., N} (5)
If more than one Tj is maximal, the category j with the smallest index is chosen. In particular, nodes become committed in order j = 1, 2, 3, ...
Step 3: Test the state of Fuzzy ART.
Resonance occurs if the match function of the chosen category meets the vigilance criterion; that is, if
||I ∧ WJ|| / ||I|| ≥ θ (6)
then the learning process is performed in Step 4.
Mismatch reset occurs if
||I ∧ WJ|| / ||I|| < θ (7)
Then the value of the choice function TJ is reset to −1 for the duration of the input presentation. A new index J is chosen by Eq. 5. The search process continues until the chosen J satisfies Eq. 6 or activates a new category.
Step 4: Perform the learning process.
The weight vector of the Jth category, WJ, is updated according to the following equation:
WJ(new) = β(I ∧ WJ(old)) + (1 − β)WJ(old) (8)
Step 5: Activate a new category.
For each input I, if no existing category satisfies Eq. 6, then a new category J becomes active. Then, WJ(new) = I.
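The five steps above can be sketched in a few lines of NumPy. The class name, default parameter values, and the plain sorted search (standing in for explicit mismatch resets) are our own illustrative choices, not from the paper.

```python
import numpy as np

# Minimal sketch of Steps 1-5 of the Fuzzy ART algorithm described above.
class FuzzyART:
    def __init__(self, alpha=0.001, beta=1.0, theta=0.75):
        self.alpha, self.beta, self.theta = alpha, beta, theta
        self.W = []  # Step 1: weight vectors of committed categories

    def train(self, I):
        I = np.asarray(I, dtype=float)
        # Step 2: choice function Tj (Eq. 2) for each committed category
        T = [np.minimum(I, W).sum() / (self.alpha + W.sum()) for W in self.W]
        # Visit categories in decreasing order of Tj; skipping a category
        # plays the role of the mismatch reset (Eq. 7) in Step 3.
        for j in sorted(range(len(T)), key=lambda k: -T[k]):
            # Step 3: vigilance test (Eq. 6)
            if np.minimum(I, self.W[j]).sum() / I.sum() >= self.theta:
                # Step 4: learning update (Eq. 8)
                self.W[j] = (self.beta * np.minimum(I, self.W[j])
                             + (1 - self.beta) * self.W[j])
                return j
        # Step 5: no category resonates, so commit a new one with W = I
        self.W.append(I.copy())
        return len(self.W) - 1
```

With fast learning (β = 1), Eq. 8 reduces to WJ(new) = I ∧ WJ(old), so each weight component is monotone non-increasing, as noted in Step 1.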
C. Fuzzy ART with complement coding
Proliferation of categories is avoided in Fuzzy ART if inputs are normalized; that is, if for some γ > 0,
||I|| = γ (9)
for all inputs I. Normalization can be achieved by preprocessing each incoming vector a. Complement coding represents both a and the complement of a. The complement of a is denoted by ac, where
aci = 1 − ai (10)
The complement-coded input I to the recognition system is the 2M-dimensional vector
I = (a, ac) = (a1, ..., aM, ac1, ..., acM) (11)
After normalization, ||I|| = M, so inputs preprocessed into complement coding form are automatically normalized. Where complement coding is used, the initial condition in Eq. 1 is replaced by
Wj1 = ... = Wj,2M = 1 (12)
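A minimal sketch of complement coding as defined above (the function name is our own):

```python
import numpy as np

# Complement coding (Eqs. 10-11): every coded input has norm M, which
# prevents category proliferation without discarding amplitude information.
def complement_code(a):
    a = np.asarray(a, dtype=float)   # each component assumed in [0, 1]
    return np.concatenate([a, 1.0 - a])
```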
III. RELATED WORK
Studies on the theory of Fuzzy ART can be divided into three categories: developing new models, studying properties of Fuzzy ART, and optimizing the computation of models.
In the first category, new models of Fuzzy ART used a general learning rule. Carpenter et al. [7] proposed Fuzzy ARTMAP for incremental supervised learning of recognition categories and multidimensional maps from arbitrary sequences of input sets. This model minimized predictive error and maximized code generalization by increasing the ART vigilance parameter to correct predictive errors. Prediction was improved by training the system several times with different orderings of the input set and then voting. This voting assigns probability estimates to competing predictions for small, noisy, and incomplete data. Isawa et al. [2] proposed an additional step that was called
Group Learning. Its important feature was creating connections between similar categories. This means that the model learned not only the weight vectors of categories but also the relations among categories in a group. Then, Isawa et al. [3] designed an improved Fuzzy ART that combines overlapped categories based on these connections. This study arranged vigilance parameters for categories and varied them during the learning process. Moreover, this model avoided category proliferation. Yousuf and Murphey [4] proposed an algorithm that compared the weights of every category with the current input pattern simultaneously and allowed updating multiple matching categories. This model monitored the effects of updating wrong clusters. The weight scaling of categories depended on the closeness of the weight vectors to the current input pattern.
In the second category, important properties of Fuzzy ART were studied in order to choose suitable parameters for each Fuzzy ART. Huang et al. [8] presented some vital properties that were divided into a number of categories, including template, access, reset, and other properties for weight stabilization. Moreover, the effects of the choice parameter and the vigilance parameter on the functionality of Fuzzy ART were presented clearly. Georgiopoulos et al. [9] provided a geometrical and clearer understanding of why, and in which order, categories are chosen for various ranges of the choice parameter. This study is useful when developing properties of learning that pertain to the architecture of neural networks. Anagnostopoulos and Georgiopoulos [10] introduced geometric concepts, namely category regions, into the original framework of Fuzzy ART and Fuzzy ARTMAP. These regions had the same geometrical shape and shared many common and interesting properties. They proved properties of the learning and showed that the training and performance phases did not depend on particular choices of the vigilance parameter in one special region of the vigilance-choice parameter space.
In the third category, studies focused on ways to improve the performance of Fuzzy ART. Burwick and Joublin [11] discussed implementations of ART with a non-recursive algorithm to decrease the algorithmic complexity of Fuzzy ART. As a result, the complexity dropped from O(N·N + M·N) down to O(N·M), where N was the number of categories and M was the input dimension. Dagher et al. [12] introduced an ordering algorithm for Fuzzy ARTMAP that identified a fixed order of training pattern presentation based on the max-min clustering method. The combination of this algorithm with Fuzzy ARTMAP established an ordered Fuzzy ARTMAP that exhibited better generalization performance. Cano et al. [13] generated accurate function identifiers for noisy data. This study was supported by theorems that guaranteed the possibility of representing an arbitrary function by fuzzy systems. They proposed two neuro-fuzzy identifiers that offered a dual interpretation as a fuzzy logic system or a neural network. Moreover, these identifiers can be trained on noisy data without changing the structure of the neural networks or preprocessing the data. Kobayashi et al. [14] proposed a reinforcement learning system that used Fuzzy ART to classify observed information and construct an effective state space. Then, profit sharing was employed as the reinforcement learning method. Furthermore, this system was used to effectively solve partially observable Markov decision processes.
IV. OUR APPROACH
A. Max-min learning rule for Fuzzy ART
As discussed in Section 1, Fuzzy ARTs cannot learn from some important patterns of the training data. Therefore, we propose the max-min learning rule, which learns from all training patterns and avoids the effect of abnormal training patterns. In our learning rule, the weight vector of the winning category is updated based on the minimum difference between the input pattern and the old weight vector of the winning category. A parameter δ controls the effect of each training pattern on the winning category. We propose a procedure to find an optimized value of δ for each dataset. In this procedure, after δ is set up roughly based on the size of the dataset, it is increased or decreased until the clustering results become highest.
The max-min learning rule is presented as follows. The learning step is performed by the three following steps:
• Step 1: Determine the minimum difference between the current input pattern and the old weight vector of the winning category, namely the minimum difference of decrease (MDD) and the minimum difference of increase (MDI). MDD and MDI are formulated by the two equations:
MDD = min{ Wji(old) − Ii : Ii < Wji(old), i = 1, ..., M } (13)
MDI = min{ Ii − Wji(old) : Ii > Wji(old), i = 1, ..., M } (14)
• Step 2: Find an optimized value of δ by the procedure in the next subsection.
• Step 3: The weight vector of the winning category, Wj, is updated by the following equation:
Wji = Wji(old) − δ · MDD if Ii < Wji(old)
Wji = Wji(old) if Ii = Wji(old)
Wji = Wji(old) + δ · MDI if Ii > Wji(old)
(15)
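Under these definitions, one winning-category update can be sketched as follows; the function name and the default δ value are illustrative assumptions, not from the paper.

```python
import numpy as np

# Sketch of the max-min update (Eqs. 13-15), assuming the winning category
# has already been chosen by the standard Fuzzy ART search.
def max_min_update(W_old, I, delta=0.1):
    W_old = np.asarray(W_old, dtype=float)
    I = np.asarray(I, dtype=float)
    dec = W_old > I   # components where the input is smaller than the weight
    inc = W_old < I   # components where the input is larger than the weight
    # Eq. 13: smallest decrease over components with Ii < Wji(old)
    MDD = (W_old[dec] - I[dec]).min() if dec.any() else 0.0
    # Eq. 14: smallest increase over components with Ii > Wji(old)
    MDI = (I[inc] - W_old[inc]).min() if inc.any() else 0.0
    # Eq. 15: move every component toward the input by a step bounded by
    # the minimum difference, so one abnormal component cannot dominate.
    W_new = W_old.copy()
    W_new[dec] -= delta * MDD
    W_new[inc] += delta * MDI
    return W_new
```

Because the step is scaled by the minimum (not maximum) component-wise difference, an outlier component of an abnormal pattern changes the weights only slightly, which is the intended robustness property.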
B. Procedure for determining an optimized value of δ
We select a random subset of the dataset with a uniform distribution over categories. Then, Max-min Fuzzy ART uses this subset to test the ability of clustering for each value of δ. Our procedure consists of three steps as follows:
• Step 1: Set up δ based on the size of the dataset. Then, calculate the clustering result.
• Step 2: Do
– Step 2.1: Increase or decrease the value of δ with a small step.
– Step 2.2: Calculate the clustering result.
– Step 2.3: Test the clustering result:
IF the ability of clustering increases or decreases, THEN repeat Step 2.
IF the clustering result is highest, THEN go to Step 3.
until the clustering result is highest.
• Step 3: Return the optimized value of δ.
Table I. CHARACTERISTICS OF DATASETS
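The procedure above can be sketched as a simple hill-climbing search. The scoring callback, the initial value, and the step size below are assumptions for illustration, since the paper only specifies a rough size-based initialization and small adjustment steps.

```python
# Sketch of the three-step delta search described above.
def find_delta(score, delta0=0.1, step=0.01, max_iters=100):
    """score(delta) -> clustering accuracy on a held-out random subset."""
    best_delta, best_score = delta0, score(delta0)
    for direction in (+1, -1):          # Step 2: try increasing, then decreasing
        delta = best_delta
        for _ in range(max_iters):
            candidate = delta + direction * step
            if candidate <= 0:
                break
            s = score(candidate)
            if s <= best_score:          # stop once the result stops improving
                break
            best_delta, best_score = candidate, s
            delta = candidate
    return best_delta                    # Step 3: return the optimized value
```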
C. Discussion
The proposed learning rule has two important advantages over the original learning rule, as follows:
• All patterns from the training data are learned by Eq. 15. Moreover, the effect level is equal for each training pattern because the parameter δ is fixed.
• The effect of abnormal training patterns, addressed in Eqs. 13 and 14, is avoided because MDD and MDI are the minimum difference of decrease and the minimum difference of increase, respectively.
The clustering ability of Max-min Fuzzy ART can thus be improved through these improvements of the learning process.
V. EXPERIMENTS
We select seven benchmark datasets from the UCI database 1 and Shape database 2 for the experiments, namely MONKS, BALANCE-SCALE, D31, R15, WDBC (Wisconsin Diagnostic Breast Cancer), WINE-RED (Wine Quality of red wine), and WINE-WHITE (Wine Quality of white wine). These datasets differ from each other in the number of attributes, categories, and patterns, and in the distribution of categories. Table I shows the characteristics of the selected datasets.
Max-min Fuzzy ART is implemented as two models: a first model (Original Max-min Fuzzy ART) and a second model with normalized inputs (Complement Max-min Fuzzy ART). Similarly, the Fuzzy ART of Carpenter [1] consists of two models: Original Fuzzy ART and Complement Fuzzy ART. To prove the effectiveness of Max-min Fuzzy ART, we use the following models in the experiments: Original Max-min Fuzzy ART (OriMART), Complement Max-min Fuzzy ART (ComMART), Original Fuzzy ART (OriART), Complement Fuzzy ART (ComART), K-means [15], and Euclidean ART (EucART) [16].
The data of each dataset are normalized to values in [0, 1]. We choose a random vector from each category to be the initial weight vector of that category. The parameters of the models are tuned to obtain the highest clustering results.
For each dataset, we run sub-tests with different numbers of patterns. The percentages of successfully clustered patterns are presented in a corresponding table. Bold numbers in each table show the results of the best model among the compared models.
1 UCI database, available at: http://archive.ics.uci.edu/ml/datasets
2 Shape database, available at: http://cs.joensuu.fi/sipu/datasets
Table II. THE PERCENTAGES OF SUCCESSFUL CLUSTERING PATTERNS WITH WDBC DATASET
#Record OriMART ComMART OriART ComART EucART K-mean
569 55.89 90.51 35.85 74.17 46.92 16.17
500 51 89.2 36.4 73.6 41.8 18.4
300 26 82 32.33 67.33 21 30.67
Table III. THE PERCENTAGES OF SUCCESSFUL CLUSTERING PATTERNS WITH D31 DATASET
#Record OriMART ComMART OriART ComART EucART K-mean
3100 91.87 94.45 84.74 92.94 92.48 65
2500 91.44 94.16 84.44 91.96 91.64 64.68
2000 90.7 93.95 86.1 91.55 91.2 61.6
1500 90 93.87 85.67 91.6 90.8 59.93
1000 89.5 93.2 86.4 90.9 90.5 66.2
500 84.8 90.6 89.6 86.8 88.6 57.4
A. Testing with WDBC dataset
The distribution of the two categories is non-uniform at a medium level. Data from Table II show that the clustering ability of Complement Max-min Fuzzy ART is considerably higher than that of the other models in all sub-tests.
B. Testing with D31 dataset
The distribution of the 31 categories is uniform. The results in Table III show that Complement Max-min Fuzzy ART is the best model in every sub-test.
C. Testing with WINE-WHITE dataset
The distribution of the six categories is highly non-uniform. Data from Table IV show that the clustering ability of Original Max-min Fuzzy ART is considerably higher than that of the other models in all sub-tests.
D. Testing with BALANCE-SCALE dataset
The distribution of the three categories is highly non-uniform. The results in Table V show that the clustering ability of Original Max-min Fuzzy ART is sharply higher than that of the other models in every sub-test, except the last sub-test with the smallest number of records (100 records).
E. Testing with R15 dataset
The distribution of the 15 categories is uniform. Data from Table VI show that the clustering ability of both Complement Max-min Fuzzy ART and Original Max-min Fuzzy ART is higher than that of the other models in all sub-tests, except the last sub-test with the smallest number of records (100 records).
Table IV. THE PERCENTAGES OF SUCCESSFUL CLUSTERING PATTERNS WITH WINE-WHITE DATASET
#Record OriMART ComMART OriART ComART EucART K-mean
4898 34.73 43.32 21.95 17.78 17.07 32.24
4000 37.75 33.18 23.48 15.425 18.45 32.1
3000 41.57 12.67 28.47 18.13 16.93 30.47
2000 43.6 4.3 26.3 21.3 19.25 29.8
1000 50.8 4.1 11 23.8 23.7 20.7
Table V. THE PERCENTAGES OF SUCCESSFUL CLUSTERING PATTERNS WITH BALANCE-SCALE DATASET
#Record OriMART ComMART OriART ComART EucART K-mean
625 80.16 67.52 59.52 55.52 45.76 33.6
500 75.2 64.8 57.2 51.2 32.2 28.2
400 70.5 56.25 49.75 46.25 17 31.25
300 67 46.67 46.67 43.33 5 27.67
Table VI. THE PERCENTAGES OF SUCCESSFUL CLUSTERING PATTERNS WITH R15 DATASET
#Record OriMART ComMART OriART ComART EucART K-mean
600 98.17 98.17 91.2 97.8 97.8 76
500 97.8 97.8 89.4 97.4 97.4 71.2
400 97.25 97.25 86.8 96.8 96.8 64
300 96.33 96.33 88.3 95.7 95.7 53.7
F. Testing with MONKS dataset
The distribution of the two categories is uniform. The results in Table VII show that Original Max-min Fuzzy ART clusters better than the other models, except in the last sub-test with the smallest number of records (100 records).
G. Testing with WINE-RED dataset
The distribution of the six categories is highly non-uniform. Data from Table VIII show that Original Max-min Fuzzy ART clusters better than the other models in the sub-tests with a high number of records, and is slightly lower than Complement Fuzzy ART in the first sub-test (lower by 0.13%).
In conclusion, the results of the sub-tests in the seven experiments show that the clustering ability of Max-min Fuzzy ART improves significantly on highly complex datasets (many attributes, highly non-uniform category distribution). In particular, clustering results are high when the dataset contains many records.
VI. CONCLUSION
In this paper, we have proposed a new learning rule for Fuzzy ART that learns from training data more effectively and improves
Table VII. THE PERCENTAGES OF SUCCESSFUL CLUSTERING PATTERNS WITH MONKS DATASET
#Record OriMART ComMART OriART ComART EucART K-mean
459 67.97 41.18 42.92 41.18 65.36 45.75
400 63.25 41.5 44.25 42 60.75 45.25
300 57.67 42.67 44 37.67 47.67 48.67
Table VIII. THE PERCENTAGES OF SUCCESSFUL CLUSTERING PATTERNS WITH WINE-RED DATASET
#Record OriMART ComMART OriART ComART EucART K-mean
1599 25.39 17.32 18.26 25.52 14.26 16.77
1200 33.83 23.08 20.25 18.67 17.75 17
900 41 27.78 21.11 10.78 22 19.22
600 32 26.83 21.5 9.833 28.67 17.83
300 12.67 21.33 25.67 16.33 51.67 24.67
the ability of clustering. Our learning rule learns from every pattern of the training data and avoids the effect of abnormal training patterns. The improvement of clustering results is shown in our experiments with seven benchmark datasets. The experimental results show that the clustering ability of Max-min Fuzzy ART is higher than that of other models for complex datasets with a small number of patterns. In particular, Max-min Fuzzy ART clusters more effectively on datasets that contain a high number of patterns.
ACKNOWLEDGEMENTS
This work was supported by Vietnam's National Foundation for Science and Technology Development (NAFOSTED) under Grant Number 102.02-2011.13.
REFERENCES
[1] G. Carpenter, S. Grossberg, and D. B. Rosen, "Fuzzy ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system," Neural Networks, vol. 4, pp. 759–771, 1991.
[2] H. Isawa, M. Tomita, H. Matsushita, and Y. Nishio, "Fuzzy Adaptive Resonance Theory with Group Learning and its applications," International Symposium on Nonlinear Theory and its Applications, no. 1, pp. 292–295, 2007.
[3] H. Isawa, H. Matsushita, and Y. Nishio, "Improved Fuzzy Adaptive Resonance Theory combining overlapped category in consideration of connections," IEEE Workshop on Nonlinear Circuit Networks, pp. 8–11, 2008.
[4] Yousuf and Y. L. Murphey, "A supervised Fuzzy Adaptive Resonance Theory with distributed weight update," Proceedings of the 7th International Conference on Advances in Neural Networks, Part I, LNCS, no. 6063, pp. 430–435, 2010.
[5] S. Grossberg, "Adaptive pattern classification and universal recoding, II: Feedback, expectation, olfaction and illusions," Biological Cybernetics, vol. 23, pp. 187–212, 1976.
[6] S. Grossberg, "How does a brain build a cognitive code?," in Studies of Mind and Brain: Neural Principles of Learning, Perception, Development, Cognition, and Motor Control (Chap. I). Boston, MA: Reidel Press, 1980.
[7] G. A. Carpenter, S. Grossberg, and N. Markuzon, "Fuzzy ARTMAP: an adaptive resonance architecture for incremental learning of analog maps," International Joint Conference on Neural Networks, vol. 3, pp. 309–314, 1992.
[8] J. Huang, M. Georgiopoulos, and G. L. Heileman, "Fuzzy ART properties," Neural Networks, vol. 8, no. 2, pp. 203–213, 1995.
[9] M. Georgiopoulos, H. Fernlund, G. Bebis, and G. Heileman, "Fuzzy ART and Fuzzy ARTMAP: effects of the choice parameter," Neural Networks, vol. 9, pp. 1541–1559, 1996.
[10] G. C. Anagnostopoulos and M. Georgiopoulos, "Category regions as new geometrical concepts in Fuzzy-ART and Fuzzy-ARTMAP," Neural Networks, vol. 15, pp. 1205–1221, 2002.
[11] T. Burwick and F. Joublin, "Optimal algorithmic complexity of Fuzzy ART," Neural Processing Letters, vol. 7, pp. 37–41, 1998.
[12] Dagher, M. Georgiopoulos, G. L. Heileman, and G. Bebis, "An ordering algorithm for pattern presentation in fuzzy ARTMAP that tends to improve generalization performance," IEEE Transactions on Neural Networks, vol. 10, no. 4, pp. 768–778, 1999.
[13] M. Cano, Y. Dimitriadis, E. Gomez, and J. Coronado, "Learning from noisy information in FasArt and FasBack neuro-fuzzy systems," Neural Networks, vol. 14, pp. 407–425, 2001.
[14] K. Kobayashi, S. Mizuno, T. Kuremoto, and M. Obayashi, "A reinforcement learning system based on state space construction using Fuzzy ART," Proceedings of SICE Annual Conference, vol. 2005, no. 1, pp. 3653–3658, 2005.
[15] J. B. MacQueen, "Some methods for classification and analysis of multivariate observations," Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, no. 1, pp. 281–297, 1967.
[16] R. Kenaya and K. C. Cheok, "Euclidean ART neural networks," Proceedings of the World Congress on Engineering and Computer Science, 2008.