DPFCM: A novel distributed picture fuzzy clustering method on picture
fuzzy sets
VNU University of Science, Vietnam National University, Viet Nam
Article info
Article history:
Available online 26 July 2014
Keywords:
Clustering quality
Distributed clustering
Facilitator model
Fuzzy clustering
Picture fuzzy sets
Abstract
Fuzzy clustering is considered an important tool in pattern recognition and knowledge discovery from databases, and thus has been applied broadly to various practical problems. Recent advances in data organization and processing, such as cloud computing technology, which are suitable for the management, privacy and storage of big datasets, have made a significant breakthrough in information sciences and in the enhancement of the efficiency of fuzzy clustering. Distributed fuzzy clustering is an efficient mining technique that adapts traditional fuzzy clustering to a new storage behavior where parts of the dataset are stored in different sites instead of a centralized main site. Some distributed fuzzy clustering algorithms have been presented, including the most effective one, the CDFCM of Zhou et al. (2013). Based upon the observation that the communication cost and the quality of results in CDFCM could be ameliorated through the integration of distributed picture fuzzy clustering with the facilitator model, in this paper we present a novel distributed picture fuzzy clustering method on picture fuzzy sets, so-called DPFCM. Experimental results on various datasets show that the clustering quality of DPFCM is better than those of CDFCM and relevant algorithms.
© 2014 Elsevier Ltd. All rights reserved.
1. Introduction
Fuzzy clustering is considered an important tool in pattern recognition and knowledge discovery from databases, and thus has been applied broadly to various practical problems. The first fuzzy clustering algorithm is Fuzzy C-Means (FCM), proposed by Bezdek, which iteratively updates the cluster centers and the partition matrix in each step in order to satisfy a given objective function. Bezdek proved that FCM converges to the saddle points of the objective function. Even though FCM was proposed a long time ago, this algorithm is still a popular fuzzy clustering method that has been applied to many practical problems
for rule extraction and the discovery of implicit patterns wherein fuzziness exists, such as:

- Image segmentation (…, Moriarty, 2002; Cao, Deng, & Wang, 2012; Chen, Chen, & Lu, 2011; Chuang, Tzeng, Chen, Wu, & Chen, 2006; Krinidis & Chatzis, 2010; Li, Chui, Chang, & Ong, 2011; Ma & Staunton, 2007; Pham, Xu, & Prince, 2000; Siang Tan & Mat Isa, 2011; Zhang & Chen, 2004);
- Face recognition (Agarwal, Agrawal, Jain, & Kumar, 2010; Chen & Huang, 2003; Haddadnia, Faez, & Ahmadi, 2003; Lu, Yuan, & …);
- Gesture recognition (Li, 2003; Wachs, Stern, & Edan, 2003);
- Intrusion detection (…, Chimphlee, & Srinoy, 2005; Chimphlee, Abdullah, Noor Md Sap, Srinoy, & Chimphlee, 2006; Shah, Undercoffer, & Joshi, …);
- Hot-spot spatial analysis (Di Martino, Loia, & Sessa, 2008);
- Risk analysis (Li, Li, & Kang, 2011);
- Bankruptcy prediction (Martin, Gayathri, Saranya, Gayathri, & …);
- Geo-demographic analysis (Son, 2014a, 2014b; Son, Cuong, Lanzi, & Thong, 2012, 2013, 2014; Son, Lanzi, Cuong, & Hung, 2012);
- Forecasting (…, Dhavale, & Sarkis, 2014; Chu, Liau, Lin, & Su, 2012; Egrioglu, Aladag, & Yolcu, 2013; Egrioglu, 2011; Hadavandi, Shavandi, & Ghanbari, 2011; Izakian & Abraham, 2011; Roh, Pedrycz, & Ahn, 2014; Wang, Ma, Lao, & Wang, 2014; Zhang, Huang, Ji, & Xie, 2011).
Recent advances in data organization and processing, such as cloud computing technology, which are suitable for the
management, privacy and storage of big datasets, have made a significant breakthrough in information sciences in general and in the enhancement of the efficiency of FCM in particular. For example, cloud computing is an Internet-based storage solution where ubiquitous computer resources are set up with the same configuration in order to develop and run applications as if they were constructed in a single centralized system. Users do not need to know where and how the computer resources operate, so the maintenance and running costs could be reduced, thus guaranteeing the stable expansion of applications. In the cloud computing paradigm, data mining techniques, especially fuzzy clustering, are very much needed in order to retrieve meaningful information from virtually integrated data. It has been stated that using data mining through cloud computing reduces the barriers that keep users from benefiting from data mining instruments, since they pay only for the data mining tools without handling complex hardware and data infrastructures. Examples of deploying data mining and clustering algorithms in some typical cloud computing service providers such as Amazon cloud, Google Apps, Microsoft, Salesforce and IBM could be found in (…; Wu, Guru, & Buyya, 2010) and others (Surcel & Alecu, 2008). Such algorithms are called distributed mining techniques.
Distributed fuzzy clustering is a distributed mining technique that adapts traditional fuzzy clustering to a new storage behavior where parts of the dataset are stored in different sites instead of a centralized main site. Distributed fuzzy clustering is extended from the distributed hard clustering algorithms, and several efforts on distributed hard/fuzzy clustering could be named. One work presented a distributed clustering algorithm called dSimpleGraph, based on the relation between two micro-clusters, to classify data on the local machines and generate a determined global view from local views. Another adapted a distributed version of the Support Vector Machine for large-scale datasets and presented a distributed clustering method inspired by the Multi-Agent framework, in which data are divided among different agents and the global result is formed from only limited local knowledge, for clustering static and dynamic graphs. Kwon et al. (2010) proposed a scalable, parallel algorithm.
Other studies introduced an algorithm based upon spatial data correlation among sensor nodes that maintained data accuracy for each distributed cluster at their sites; a distributed density-based clustering that both reduces the communication overheads and improves the quality of the global clusters; and an algorithm based on the aggregation of models produced locally, meaning that datasets were processed locally on each node and the results were integrated to construct global clusters hierarchically. The aim of this approach is to minimize the communications, maximize the parallelism, load-balance the work among different nodes of the system, and reduce the overhead due to extra processing. Further works addressed cluster identification and outlier detection for distributed data, based on the idea of generating independent local models and combining the local models at a central server to obtain global clusters, and presented a distributed random walk based clustering algorithm that builds a bounded-size core through a random-walk procedure.
Other approaches include a Canopy clustering algorithm, which takes only the distance function that satisfies the triangle inequality and is of sufficiently high granularity to permit the data to be partitioned into canopies of optimal size, and distributed algorithms based on k-means and k-median, whose basic ideas are to reduce the problem of finding a clustering with low cost to the problem of finding a core-set of small size and then construct a global core-set. Hai, Zhang, Zhu, and Wang (2012), Jain and colleagues gave overviews of the distributed clustering group methods, including partitioning, hierarchical, density-based, soft-computing, neural network and fuzzy clustering methods. They argued that datasets in real-world applications often consist of inconsistencies or outliers, where it is difficult to obtain homogeneous and meaningful global clusters, so that distributed hard clustering should incorporate the fuzzy set theory in order to handle the hesitancy originating from imperfect and imprecise information. A parallel version of the FCM algorithm, so-called PFCM, aimed at distributed fuzzy clustering; later works modified the PFCM algorithm with a pre-processing procedure to estimate the number of clusters, and also presented an intuitionistic fuzzy based distributed clustering algorithm including two different levels: the local level and the global level. In the local level, numerical datasets are converted into intuitionistic fuzzy data and are clustered independently from each other using a modified FCM algorithm. In the global level, the global center is computed by clustering all local cluster centers, and is then transmitted back to the local sites to update the local cluster model; the communication follows the Master–Slave model. A distributed fuzzy clustering namely CDFCM was proposed, in which cluster centers and attribute-weights are calculated at each peer and then updated by neighboring results through local communications. The process is repeated until a pre-defined stopping criterion holds, and the status quo of clusters in all peers accurately reflects the results as in centralized clustering. CDFCM was experimentally validated and had better clustering quality than other relevant algorithms such as Soft-DKM (Forero, Cano, & Giannakis, 2011) and WEFCM (Zhou & Philip Chen, 2011); it is among the most effective distributed fuzzy clustering methods available in the literature.
The motivation of this paper is described as follows. In the activities of CDFCM, this algorithm solely updates the cluster centers and attribute-weights of each peer by those of neighboring peers. This requires large communication costs, approximately P × NB communications per iteration, with P being the number of peers and NB being the average number of neighbors of a given peer. Additionally, the quality of results in each peer could not be high, since only local updates with neighboring results are conducted. Based upon the idea that the communication cost and the quality of results in CDFCM could be ameliorated through the integration of distributed picture fuzzy clustering with the facilitator model, in this paper we present a novel distributed picture fuzzy clustering method on picture fuzzy sets, so-called DPFCM. The proposed algorithm utilizes the facilitator model, meaning that all peers transfer their results to a special, unique peer called the Master peer, so that it takes only P communications to complete the update process. Employing the Master peer in the facilitator model also helps to increase the number of neighboring results used in each update, thus advancing the quality of results. In order to enhance the clustering quality as much as possible, we also deploy the distributed fuzzy clustering algorithm on picture fuzzy sets (PFS) (Cuong & Kreinovich, 2013), which in essence are a generalization of the traditional fuzzy sets (FS) used for the development of the existing CDFCM algorithm. PFS-based models can be applied to situations requiring human opinions involving more answers of the types yes, abstain, no and refusal, which cannot be accurately expressed in traditional FS. Therefore, deploying the distributed clustering algorithm on PFS could give higher clustering quality than on FS or on IFS. Our contribution in this paper is a novel distributed picture fuzzy clustering method (DPFCM) that utilizes the ideas of both the facilitator model and the deployment of clustering algorithms on PFS in order to ameliorate the clustering quality. The proposed algorithm will be implemented and validated in comparison with CDFCM and other relevant algorithms in terms of clustering quality. The significance of this research is not only the enhancement of the clustering quality of distributed fuzzy clustering algorithms but also the enrichment of the know-how of integrating picture fuzzy sets into clustering algorithms and deploying them to practical applications. Indeed, the contribution of this paper is meaningful to both the theoretical and the practical sides.
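To make the communication gap concrete, consider an illustrative configuration of our own (the numbers are not taken from the experiments): for P = 20 peers with NB = 5 neighbors each on average, CDFCM needs about
$$P \times NB = 20 \times 5 = 100$$
peer-to-peer messages per iteration, whereas the facilitator model of DPFCM needs only P = 20 messages from the Slave peers to the Master.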
The rest of the paper is organized as follows. Section 2 gives the preliminaries on the PFS set. The formulation of clustering algorithms on PFS in association with the facilitator model is described in Section 3. Section 4 validates the proposed approach through a series of experiments. The final section draws the conclusions and delineates future research directions.
2. Preliminary
In this section, we take a brief overview of some basic terms and notations in PFS, which will be used throughout the paper.
Definition 1. A picture fuzzy set (PFS) (Cuong & Kreinovich, 2013) in a non-empty set $X$ is
$$\dot{A}=\left\{\left\langle x,\mu_{\dot{A}}(x),\eta_{\dot{A}}(x),\gamma_{\dot{A}}(x)\right\rangle \mid x\in X\right\},\tag{1}$$
where $\mu_{\dot{A}}(x)$ is the positive degree of each element $x\in X$, $\eta_{\dot{A}}(x)$ is the neutral degree and $\gamma_{\dot{A}}(x)$ is the negative degree, satisfying the constraints
$$\mu_{\dot{A}}(x),\ \eta_{\dot{A}}(x),\ \gamma_{\dot{A}}(x)\in[0,1],\quad\forall x\in X,\tag{2}$$
$$0\leqslant\mu_{\dot{A}}(x)+\eta_{\dot{A}}(x)+\gamma_{\dot{A}}(x)\leqslant 1,\quad\forall x\in X.\tag{3}$$
The refusal degree of an element is $\xi_{\dot{A}}(x)=1-\left(\mu_{\dot{A}}(x)+\eta_{\dot{A}}(x)+\gamma_{\dot{A}}(x)\right)$, $\forall x\in X$. In case $\xi_{\dot{A}}(x)\equiv 0$, PFS returns to intuitionistic fuzzy sets (IFS) (Atanassov, 1986), and when both $\eta_{\dot{A}}(x)=\xi_{\dot{A}}(x)\equiv 0$, PFS returns to fuzzy sets (FS). To illustrate these notions, consider some examples below.
Example 1. In a democratic election station, the council issues 500 voting papers for a candidate. The voting results are divided into four groups accompanied by the numbers of papers: "vote for" (300), "abstain" (64), "vote against" (115) and "refusal of voting" (21). Group "abstain" means that the voting paper is a white paper rejecting both "agree" and "disagree" for the candidate but still takes the vote. Group "refusal of voting" consists of either invalid voting papers or those who did not take the vote. This example happened in reality, and IFS could not handle it since the refusal degree (group "refusal of voting") does not exist in IFS.
Example 2. A patient was given first emergency aid and diagnosed with four states after examining possible symptoms: "heart attack", "uncertain", "not heart attack" and "appendicitis". In this case, we also have a PFS set.
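As a quick numerical illustration of Definition 1, the sketch below encodes the election of Example 1 as a picture fuzzy element; the Python representation is our own, and only the vote counts come from the example:

```python
from dataclasses import dataclass

@dataclass
class PictureFuzzyElement:
    mu: float     # positive degree
    eta: float    # neutral degree
    gamma: float  # negative degree

    def refusal(self) -> float:
        # xi = 1 - (mu + eta + gamma), the refusal degree of Definition 1
        return 1.0 - (self.mu + self.eta + self.gamma)

    def is_valid(self) -> bool:
        # constraints (2) and (3)
        degrees = (self.mu, self.eta, self.gamma)
        return all(0.0 <= d <= 1.0 for d in degrees) and sum(degrees) <= 1.0

# Example 1: 500 voting papers -> "vote for" 300, "abstain" 64,
# "vote against" 115, "refusal of voting" 21
candidate = PictureFuzzyElement(mu=300/500, eta=64/500, gamma=115/500)
assert candidate.is_valid()
print(candidate.refusal())   # 21/500 = 0.042
```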
Now, we briefly present some basic picture fuzzy operations, picture distance metrics and picture fuzzy relations. Let PFS(X) denote the set of all PFS sets on the universe X.
Definition 2. For $A,B\in PFS(X)$, the union, intersection and complement operations are defined as follows:
$$A\cup B=\left\{\left\langle x,\max\{\mu_{A}(x),\mu_{B}(x)\},\min\{\eta_{A}(x),\eta_{B}(x)\},\min\{\gamma_{A}(x),\gamma_{B}(x)\}\right\rangle \mid x\in X\right\},\tag{4}$$
$$A\cap B=\left\{\left\langle x,\min\{\mu_{A}(x),\mu_{B}(x)\},\min\{\eta_{A}(x),\eta_{B}(x)\},\max\{\gamma_{A}(x),\gamma_{B}(x)\}\right\rangle \mid x\in X\right\},\tag{5}$$
$$\overline{A}=\left\{\left\langle x,\gamma_{A}(x),\eta_{A}(x),\mu_{A}(x)\right\rangle \mid x\in X\right\}.\tag{6}$$
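These operations map directly onto elementwise max/min. A minimal sketch, assuming each PFS over an N-element universe is stored as an (N, 3) array of (μ, η, γ) triples (the array layout is our own convention):

```python
import numpy as np

def pfs_union(A, B):          # Eq. (4)
    return np.stack([np.maximum(A[:, 0], B[:, 0]),
                     np.minimum(A[:, 1], B[:, 1]),
                     np.minimum(A[:, 2], B[:, 2])], axis=1)

def pfs_intersection(A, B):   # Eq. (5)
    return np.stack([np.minimum(A[:, 0], B[:, 0]),
                     np.minimum(A[:, 1], B[:, 1]),
                     np.maximum(A[:, 2], B[:, 2])], axis=1)

def pfs_complement(A):        # Eq. (6): positive and negative degrees swap
    return A[:, [2, 1, 0]]
```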
Definition 3. For $A,B\in PFS(X)$, the Cartesian products of these PFS sets are
$$A\times_{1}B=\left\{\left\langle (x,y),\mu_{A}(x)\cdot\mu_{B}(y),\eta_{A}(x)\cdot\eta_{B}(y),\gamma_{A}(x)\cdot\gamma_{B}(y)\right\rangle \mid x\in A,\ y\in B\right\},\tag{7}$$
$$A\times_{2}B=\left\{\left\langle (x,y),\mu_{A}(x)\wedge\mu_{B}(y),\eta_{A}(x)\wedge\eta_{B}(y),\gamma_{A}(x)\vee\gamma_{B}(y)\right\rangle \mid x\in A,\ y\in B\right\}.\tag{8}$$
Definition 4. The distances between $A,B\in PFS(X)$ are defined as
$$d_{p}(A,B)=\frac{1}{N}\sum_{i=1}^{N}\left(\left|\mu_{A}(x_{i})-\mu_{B}(x_{i})\right|+\left|\eta_{A}(x_{i})-\eta_{B}(x_{i})\right|+\left|\gamma_{A}(x_{i})-\gamma_{B}(x_{i})\right|\right),\tag{9}$$
$$e_{p}(A,B)=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\left(\mu_{A}(x_{i})-\mu_{B}(x_{i})\right)^{2}+\left(\eta_{A}(x_{i})-\eta_{B}(x_{i})\right)^{2}+\left(\gamma_{A}(x_{i})-\gamma_{B}(x_{i})\right)^{2}\right)}.\tag{10}$$
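A small sketch of both distance metrics, using the same (N, 3) array convention as above:

```python
import numpy as np

def picture_hamming(A: np.ndarray, B: np.ndarray) -> float:
    """Hamming-style distance d_p of Eq. (9); A, B have shape (N, 3)."""
    return np.abs(A - B).sum(axis=1).mean()

def picture_euclidean(A: np.ndarray, B: np.ndarray) -> float:
    """Euclidean-style distance e_p of Eq. (10)."""
    return np.sqrt(((A - B) ** 2).sum(axis=1).mean())

# two toy picture fuzzy sets over a universe of N = 2 elements
A = np.array([[0.6, 0.1, 0.2], [0.5, 0.3, 0.1]])
B = np.array([[0.4, 0.2, 0.3], [0.5, 0.2, 0.2]])
print(picture_hamming(A, B))    # 0.3
print(picture_euclidean(A, B))  # 0.2
```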
Definition 5. The picture fuzzy relation R is a picture fuzzy subset of $A\times B$, given by
$$R=\left\{\left\langle (x,y),\mu_{R}(x,y),\eta_{R}(x,y),\gamma_{R}(x,y)\right\rangle \mid x\in A,\ y\in B\right\},\tag{11}$$
$$\mu_{R},\eta_{R},\gamma_{R}:A\times B\rightarrow[0,1],\tag{12}$$
$$\mu_{R}(x,y)+\eta_{R}(x,y)+\gamma_{R}(x,y)\leqslant 1,\quad\forall(x,y)\in A\times B.\tag{13}$$
$PFR(A\times B)$ is the set of all picture fuzzy subsets on $A\times B$. Further properties of PFS operations, the convex combination of PFS, etc. can be found in (Cuong & Kreinovich, 2013).
3. The proposed method

3.1. The proposed distributed picture fuzzy clustering model
In this section, we propose a distributed picture fuzzy clustering model. The communication model is the facilitator, or Master–Slave, model having one Master peer and P Slave peers, where each Slave peer is allowed to communicate with the Master only. Each Slave peer has a subset $Y_{j}$ of the original dataset X consisting of N data points, with
$$\bigcup_{j=1}^{P}Y_{j}=X,\qquad \sum_{j=1}^{P}\left|Y_{j}\right|=N.$$
The number of dimensions in a subset is exactly the same as that in the original dataset. Let us divide the dataset X into C groups satisfying the objective function below:
$$J=\sum_{l=1}^{P}\sum_{k=1}^{\left|Y_{l}\right|}\sum_{j=1}^{C}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\sum_{h=1}^{r}w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}+\gamma\sum_{l=1}^{P}\sum_{j=1}^{C}\sum_{h=1}^{r}w_{ljh}\log w_{ljh}\ \rightarrow\ \min,\tag{14}$$
where $u_{lkj}$, $\eta_{lkj}$ and $\xi_{lkj}$ are the positive, the neutral and the refusal degrees of data point k to cluster j in Slave peer l. $w_{ljh}$ is the attribute-weight of attribute h to cluster j in Slave peer l. $V_{ljh}$ is the center of cluster j in Slave peer l according to attribute h. $X_{lkh}$ is the kth data point of Slave peer l according to attribute h. m and $\gamma$ are the fuzzifier and a positive scalar, respectively. The constraints for (14) are shown below:
$$u_{lkj},\ \eta_{lkj},\ \xi_{lkj}\in[0,1],\tag{15}$$
$$u_{lkj}+\eta_{lkj}+\xi_{lkj}\leqslant 1,\tag{16}$$
$$\sum_{j=1}^{C}\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}=1,\tag{17}$$
$$\sum_{j=1}^{C}\left(\eta_{lkj}+\frac{\xi_{lkj}}{C}\right)=1,\tag{18}$$
$$\sum_{h=1}^{r}w_{ljh}=1,\tag{19}$$
$$V_{ljh}=V_{ijh},\quad(\forall i\neq l;\ i,l=\overline{1,P}),\tag{20}$$
$$w_{ljh}=w_{ijh},\quad(\forall i\neq l;\ i,l=\overline{1,P}).\tag{21}$$
The proposed model in Eqs. (14)–(21) relies on the principles of the PFS set and of the facilitator model. The differences of this model from the existing ones are as follows:

- The proposed model is a generalization of the CDFCM model: when $\eta_{lkj}=\xi_{lkj}=0$, which means the PFS set degrades to the FS set, it returns to the CDFCM model in both the objective function and the constraints. Moreover, the constraints (15)–(18), which describe the relations of the degrees in the PFS set, were integrated into the optimization problem. By doing so, the new distributed picture fuzzy clustering model is totally set up according to the PFS set.
- The proposed model utilizes the facilitator model to increase the number of neighboring results used to update those of a given peer, thus giving high accuracy of the final results. This is reflected in the constraints (20) and (21), where the cluster centers and the attribute-weights of any two peers must coincide, so that these local centers and attribute-weights converge to the global ones.
Additional remarks on the distributed picture fuzzy clustering model are given below:

- The objective function (14) minimizes the dispersion within clusters and maximizes the entropy of attribute-weights, allowing important attributes to contribute greatly to the identification of clusters.
- The constraints (15) and (16) originate from the definition of PFS.
- Constraint (17) describes that the sum of memberships of a data point to all clusters in a Slave peer is equal to one. Analogously, constraint (18) states that the sum of hesitant memberships of a data point to all clusters in a Slave peer, expressed through the neutral and refusal degrees, is also equal to one.
- Constraint (19) states that the sum of attribute-weights for a given cluster in a peer is equal to one. Thus, all attributes could be normalized for the clustering.
Outputs of the distributed picture fuzzy clustering model (14)–(21) are the optimal cluster centers $\{V_{ljh}\mid l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r}\}$, the picture degrees $\{(u_{lkj},\eta_{lkj},\xi_{lkj})\mid l=\overline{1,P};\ k=\overline{1,\left|Y_{l}\right|};\ j=\overline{1,C}\}$ in all peers, showing which cluster a data point belongs to, and the attribute-weights $\{w_{ljh}\mid l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r}\}$. Based upon these results, the state of clusters in a given peer is determined, and the global results could be retrieved from the local ones according to a specific cluster.
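To make the model concrete, the sketch below evaluates the objective function (14) for one Slave peer given candidate degrees, weights and centers; the array shapes and variable names are our own convention:

```python
import numpy as np

def peer_objective(X, u, eta, xi, w, V, m=2.0, gamma_=1.0):
    """Contribution of one Slave peer l to J in Eq. (14).

    X: (N_l, r) data, u/eta/xi: (N_l, C) degrees,
    w: (C, r) attribute-weights, V: (C, r) cluster centers."""
    up = (u / (1.0 - eta - xi)) ** m                  # (u/(1-eta-xi))^m
    sq = (X[:, None, :] - V[None, :, :]) ** 2         # ||X_lkh - V_ljh||^2
    fidelity = (up * np.einsum('jh,kjh->kj', w, sq)).sum()
    entropy = gamma_ * (w * np.log(w)).sum()          # attribute-weight entropy term
    return fidelity + entropy

# the total J of Eq. (14) is the sum of peer_objective over all P peers
```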
3.2. The solutions

In this section, we use the Lagrangian method and the Picard iteration to derive the optimal solutions of the model (14)–(21) as follows.
Theorem 1. The optimal solutions of the system (14)–(21) are:
$$u_{lkj}=\frac{1-\eta_{lkj}-\xi_{lkj}}{\sum_{i=1}^{C}\left(\frac{\sum_{h=1}^{r}w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}}{\sum_{h=1}^{r}w_{lih}\left\|X_{lkh}-V_{lih}\right\|^{2}}\right)^{\frac{1}{m-1}}},\quad(\forall l=\overline{1,P};\ k=\overline{1,\left|Y_{l}\right|};\ j=\overline{1,C}),\tag{22}$$
$$h_{lijh}=h_{lijh}+\alpha_{1}\left(V_{ljh}-V_{ijh}\right),\quad(\forall i\neq l;\ i,l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r}),\tag{23}$$
$$V_{ljh}=\frac{\sum_{k=1}^{\left|Y_{l}\right|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}w_{ljh}X_{lkh}-\sum_{i=1,i\neq l}^{P}h_{lijh}}{\sum_{k=1}^{\left|Y_{l}\right|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}w_{ljh}},\quad(\forall l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r}),\tag{24}$$
$$\Delta_{lijh}=\Delta_{lijh}+\alpha_{2}\left(w_{ljh}-w_{ijh}\right),\quad(\forall i\neq l;\ i,l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r}),\tag{25}$$
$$w_{ljh}=\frac{\exp\left(-\frac{1}{\gamma}\left(\sum_{k=1}^{\left|Y_{l}\right|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\left\|X_{lkh}-V_{ljh}\right\|^{2}+\gamma+2\sum_{i=1,i\neq l}^{P}\Delta_{lijh}\right)\right)}{\sum_{h'=1}^{r}\exp\left(-\frac{1}{\gamma}\left(\sum_{k=1}^{\left|Y_{l}\right|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\left\|X_{lkh'}-V_{ljh'}\right\|^{2}+\gamma+2\sum_{i=1,i\neq l}^{P}\Delta_{lijh'}\right)\right)},\tag{26}$$
$$\eta_{lkj}=1-\xi_{lkj}-\frac{(C-1)\left(1-\frac{1}{C}\sum_{i=1}^{C}\xi_{lki}\right)}{\sum_{i=1}^{C}\left(\frac{u_{lki}^{m}\sum_{h=1}^{r}w_{lih}\left\|X_{lkh}-V_{lih}\right\|^{2}}{u_{lkj}^{m}\sum_{h=1}^{r}w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}}\right)^{\frac{1}{m+1}}},\quad(\forall l=\overline{1,P};\ k=\overline{1,\left|Y_{l}\right|};\ j=\overline{1,C}),\tag{27}$$
$$\xi_{lkj}=1-\left(u_{lkj}+\eta_{lkj}\right)-\left(1-\left(u_{lkj}+\eta_{lkj}\right)^{\alpha}\right)^{1/\alpha},\quad(\forall l=\overline{1,P};\ k=\overline{1,\left|Y_{l}\right|};\ j=\overline{1,C}).\tag{28}$$
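As a quick consistency check (ours, not in the original text): writing $S_{lkj}:=\sum_{h=1}^{r}w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}$, dividing (22) by $1-\eta_{lkj}-\xi_{lkj}$ and summing over j shows that constraint (17) holds:
$$\sum_{j=1}^{C}\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}=\sum_{j=1}^{C}\frac{S_{lkj}^{-\frac{1}{m-1}}}{\sum_{i=1}^{C}S_{lki}^{-\frac{1}{m-1}}}=1.$$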
Proof. (A) Fix $W,V,\eta,\xi$; the Lagrangian function with respect to U is:
$$L(U)=\sum_{l=1}^{P}\sum_{k=1}^{\left|Y_{l}\right|}\sum_{j=1}^{C}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\sum_{h=1}^{r}w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}+\gamma\sum_{l=1}^{P}\sum_{j=1}^{C}\sum_{h=1}^{r}w_{ljh}\log w_{ljh}-\sum_{l=1}^{P}\sum_{k=1}^{\left|Y_{l}\right|}\lambda_{lk}\left(\sum_{j=1}^{C}\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}-1\right).\tag{29}$$
Taking the derivative of L(U) with respect to $u_{lkj}$ and setting it to zero gives
$$\frac{\partial L(U)}{\partial u_{lkj}}=\frac{m}{1-\eta_{lkj}-\xi_{lkj}}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m-1}\sum_{h=1}^{r}w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}-\frac{\lambda_{lk}}{1-\eta_{lkj}-\xi_{lkj}}=0,\quad(\forall l=\overline{1,P};\ k=\overline{1,\left|Y_{l}\right|};\ j=\overline{1,C};\ h=\overline{1,r}),\tag{30}$$
$$u_{lkj}=\left(1-\eta_{lkj}-\xi_{lkj}\right)\left(\frac{\lambda_{lk}}{m\sum_{h=1}^{r}w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}}\right)^{\frac{1}{m-1}}.\tag{31}$$
From constraint (17), we have
$$\lambda_{lk}=m\left(\sum_{i=1}^{C}\left(\frac{1}{\sum_{h=1}^{r}w_{lih}\left\|X_{lkh}-V_{lih}\right\|^{2}}\right)^{\frac{1}{m-1}}\right)^{-(m-1)}.\tag{32}$$
Substituting (32) into (31), we obtain the optimal solutions for $u_{lkj}$ as follows:
$$u_{lkj}=\frac{1-\eta_{lkj}-\xi_{lkj}}{\sum_{i=1}^{C}\left(\frac{\sum_{h=1}^{r}w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}}{\sum_{h=1}^{r}w_{lih}\left\|X_{lkh}-V_{lih}\right\|^{2}}\right)^{\frac{1}{m-1}}},\quad(\forall l=\overline{1,P};\ k=\overline{1,\left|Y_{l}\right|};\ j=\overline{1,C}).\tag{33}$$
(B) We fix all degrees and the attribute-weights to calculate the cluster centers by the Lagrangian function below:
$$L(V)=\sum_{l=1}^{P}\sum_{k=1}^{\left|Y_{l}\right|}\sum_{j=1}^{C}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\sum_{h=1}^{r}w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}+\gamma\sum_{l=1}^{P}\sum_{j=1}^{C}\sum_{h=1}^{r}w_{ljh}\log w_{ljh}-\sum_{l=1}^{P}\sum_{i=1,i\neq l}^{P}\sum_{j=1}^{C}\sum_{h=1}^{r}h_{lijh}\left(V_{ljh}-V_{ijh}\right),\tag{34}$$
where $h_{lijh}$ is a Lagrangian multiplier matrix. Taking the derivative of L(V) with respect to $V_{ljh}$, we have
$$\frac{\partial L(V)}{\partial V_{ljh}}=-2\sum_{k=1}^{\left|Y_{l}\right|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}w_{ljh}\left(X_{lkh}-V_{ljh}\right)-\sum_{i=1,i\neq l}^{P}h_{lijh}+\sum_{i=1,i\neq l}^{P}h_{iljh}=0.\tag{35}$$
Noting that the multipliers can be taken anti-symmetric, $h_{iljh}=-h_{lijh}$, this yields
$$V_{ljh}=\frac{\sum_{k=1}^{\left|Y_{l}\right|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}w_{ljh}X_{lkh}-\sum_{i=1,i\neq l}^{P}h_{lijh}}{\sum_{k=1}^{\left|Y_{l}\right|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}w_{ljh}},\quad(\forall l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r}).\tag{36}$$
$h_{lijh}$ is calculated by the Picard iteration below, with $\alpha_{1}$ being a positive scalar:
$$h_{lijh}=h_{lijh}+\alpha_{1}\left(V_{ljh}-V_{ijh}\right),\quad(\forall i\neq l;\ i,l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r}).\tag{37}$$
(C) By a similar calculation to (B), we take the Lagrangian function with respect to W:
$$L(W)=\sum_{l=1}^{P}\sum_{k=1}^{\left|Y_{l}\right|}\sum_{j=1}^{C}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\sum_{h=1}^{r}w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}+\gamma\sum_{l=1}^{P}\sum_{j=1}^{C}\sum_{h=1}^{r}w_{ljh}\log w_{ljh}-\sum_{l=1}^{P}\sum_{j=1}^{C}\lambda_{lj}\sum_{h=1}^{r}\left(w_{ljh}-1\right)+\sum_{l=1}^{P}\sum_{i=1,i\neq l}^{P}\sum_{j=1}^{C}\sum_{h=1}^{r}\Delta_{lijh}\left(w_{ljh}-w_{ijh}\right).\tag{38}$$
Setting the derivative to zero,
$$\frac{\partial L}{\partial w_{ljh}}=\sum_{k=1}^{\left|Y_{l}\right|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\left\|X_{lkh}-V_{ljh}\right\|^{2}+\gamma\left(\log w_{ljh}+1\right)-\lambda_{lj}+\sum_{i=1,i\neq l}^{P}\left(\Delta_{lijh}-\Delta_{iljh}\right)=0,\quad(\forall l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r}),\tag{39}$$
and, taking the multipliers anti-symmetric ($\Delta_{iljh}=-\Delta_{lijh}$), we obtain
$$w_{ljh}=\exp\left(-\frac{1}{\gamma}\left(\sum_{k=1}^{\left|Y_{l}\right|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\left\|X_{lkh}-V_{ljh}\right\|^{2}+\gamma-\lambda_{lj}+2\sum_{i=1,i\neq l}^{P}\Delta_{lijh}\right)\right).\tag{40}$$
From constraint (19),
$$\exp\left(\frac{\lambda_{lj}}{\gamma}\right)=\frac{1}{\sum_{h'=1}^{r}\exp\left(-\frac{1}{\gamma}\left(\sum_{k=1}^{\left|Y_{l}\right|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\left\|X_{lkh'}-V_{ljh'}\right\|^{2}+\gamma+2\sum_{i=1,i\neq l}^{P}\Delta_{lijh'}\right)\right)},\tag{41}$$
so that
$$w_{ljh}=\frac{\exp\left(-\frac{1}{\gamma}\left(\sum_{k=1}^{\left|Y_{l}\right|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\left\|X_{lkh}-V_{ljh}\right\|^{2}+\gamma+2\sum_{i=1,i\neq l}^{P}\Delta_{lijh}\right)\right)}{\sum_{h'=1}^{r}\exp\left(-\frac{1}{\gamma}\left(\sum_{k=1}^{\left|Y_{l}\right|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\left\|X_{lkh'}-V_{ljh'}\right\|^{2}+\gamma+2\sum_{i=1,i\neq l}^{P}\Delta_{lijh'}\right)\right)}.\tag{42}$$
$\Delta_{lijh}$ is calculated by the Picard iteration below, with $\alpha_{2}$ being a positive scalar:
$$\Delta_{lijh}=\Delta_{lijh}+\alpha_{2}\left(w_{ljh}-w_{ijh}\right),\quad(\forall i\neq l;\ i,l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r}).\tag{43}$$
(D) Fix $W,V,u,\xi$; the Lagrangian function with respect to $\eta$ is:
$$L(\eta)=\sum_{l=1}^{P}\sum_{k=1}^{\left|Y_{l}\right|}\sum_{j=1}^{C}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\sum_{h=1}^{r}w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}+\gamma\sum_{l=1}^{P}\sum_{j=1}^{C}\sum_{h=1}^{r}w_{ljh}\log w_{ljh}-\sum_{l=1}^{P}\sum_{k=1}^{\left|Y_{l}\right|}\lambda_{lk}\left(\sum_{j=1}^{C}\left(\eta_{lkj}+\frac{\xi_{lkj}}{C}\right)-1\right).\tag{44}$$
Taking the derivative with respect to $\eta_{lkj}$,
$$\frac{\partial L(\eta)}{\partial \eta_{lkj}}=\sum_{h=1}^{r}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\frac{m}{1-\eta_{lkj}-\xi_{lkj}}\,w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}-\lambda_{lk}=0,\quad(\forall l=\overline{1,P};\ k=\overline{1,\left|Y_{l}\right|};\ j=\overline{1,C};\ h=\overline{1,r}),\tag{45}$$
$$\eta_{lkj}=1-\xi_{lkj}-\left(\frac{m\,u_{lkj}^{m}\sum_{h=1}^{r}w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}}{\lambda_{lk}}\right)^{\frac{1}{m+1}}.\tag{46}$$
Determining $\lambda_{lk}$ from constraint (18) and substituting it back into (46) gives
$$\eta_{lkj}=1-\xi_{lkj}-\frac{(C-1)\left(1-\frac{1}{C}\sum_{i=1}^{C}\xi_{lki}\right)}{\sum_{i=1}^{C}\left(\frac{u_{lki}^{m}\sum_{h=1}^{r}w_{lih}\left\|X_{lkh}-V_{lih}\right\|^{2}}{u_{lkj}^{m}\sum_{h=1}^{r}w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}}\right)^{\frac{1}{m+1}}},\quad(\forall l=\overline{1,P};\ k=\overline{1,\left|Y_{l}\right|};\ j=\overline{1,C}).\tag{47}$$
(E) Once we have $u_{lkj}$ and $\eta_{lkj}$, from constraint (16) we can calculate the refusal degrees as follows:
$$\xi_{lkj}=1-\left(u_{lkj}+\eta_{lkj}\right)-\left(1-\left(u_{lkj}+\eta_{lkj}\right)^{\alpha}\right)^{1/\alpha},\quad(\forall l=\overline{1,P};\ k=\overline{1,\left|Y_{l}\right|};\ j=\overline{1,C}).\tag{48}$$
Notice that $\alpha>0$ is an exponent coefficient used to control the refusal degree in (48). This completes the proof.
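As a quick sanity check of Eq. (48) (the numbers are ours, chosen only for illustration), if $u_{lkj}+\eta_{lkj}=0.5$ and $\alpha=0.6$:

```python
# xi = 1 - s - (1 - s**alpha)**(1/alpha), with s = u + eta (Eq. (48))
s, alpha = 0.5, 0.6
xi = 1.0 - s - (1.0 - s**alpha) ** (1.0 / alpha)
print(round(xi, 3))    # ~0.334
print(s + xi <= 1.0)   # constraint (16) holds: u + eta + xi <= 1
```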
3.3. The DPFCM algorithm

In this section, we present the DPFCM algorithm in detail. Its listing, reconstructed from the model above, is as follows (S marks operations in the Slave peers; M marks operations in the Master peer):

Distributed Picture Fuzzy Clustering Method (DPFCM)

Input:
- Subsets $Y_{l}$ ($l=\overline{1,P}$) of the dataset X, stored at the Slave peers
- Number of clusters: C
- Number of peers: P + 1
- Fuzzifier m
- Threshold $\varepsilon>0$
- Parameters: $\gamma$, $\alpha_{1}$, $\alpha_{2}$, $\alpha$, maxIter

Output: $\{V_{ljh}\mid l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r}\}$; $\{(u_{lkj},\eta_{lkj},\xi_{lkj})\mid l=\overline{1,P};\ k=\overline{1,\left|Y_{l}\right|};\ j=\overline{1,C}\}$; $\{w_{ljh}\mid l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r}\}$

1S: Set the number of iterations t = 0; set $\Delta_{lijh}(t)=h_{lijh}(t)=0$ ($\forall i\neq l;\ i,l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r}$); randomize $\{(u_{lkj}(t),\eta_{lkj}(t),\xi_{lkj}(t))\}$ satisfying (16); set $w_{ljh}(t)=1/r$ ($l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r}$).
2S: Calculate cluster centers $V_{ljh}(t)$ from $(u_{lkj}(t),\eta_{lkj}(t),\xi_{lkj}(t))$, $w_{ljh}(t)$ and $h_{lijh}(t)$ by (24).
3S: Calculate attribute-weights $w_{ljh}(t+1)$ from $(u_{lkj}(t),\eta_{lkj}(t),\xi_{lkj}(t))$, $V_{ljh}(t)$ and $\Delta_{lijh}(t)$ by (26).
4S: Send $\{\Delta_{lijh}(t),h_{lijh}(t),V_{ljh}(t),w_{ljh}(t+1)\}$ to the Master.
5M: Calculate $\{\Delta_{lijh}(t+1),h_{lijh}(t+1)\}$ by (23) and (25) and send them to the Slave peers.
6S: Calculate cluster centers $V_{ljh}(t+1)$ from $(u_{lkj}(t),\eta_{lkj}(t),\xi_{lkj}(t))$, $w_{ljh}(t+1)$ and $h_{lijh}(t+1)$ by (24).
7S: Calculate positive degrees $u_{lkj}(t+1)$ from $(\eta_{lkj}(t),\xi_{lkj}(t))$, $w_{ljh}(t+1)$ and $V_{ljh}(t+1)$ by (22).
8S: Compute neutral degrees $\eta_{lkj}(t+1)$ from $(u_{lkj}(t+1),\xi_{lkj}(t))$, $w_{ljh}(t+1)$ and $V_{ljh}(t+1)$ by (27).
9S: Calculate refusal degrees $\xi_{lkj}(t+1)$ from $(u_{lkj}(t+1),\eta_{lkj}(t+1))$ by (28).
10S: If $\max_{l}\max\{\left\|u_{lkj}(t+1)-u_{lkj}(t)\right\|,\left\|\eta_{lkj}(t+1)-\eta_{lkj}(t)\right\|,\left\|\xi_{lkj}(t+1)-\xi_{lkj}(t)\right\|\}<\varepsilon$ or t > maxIter, stop the algorithm; otherwise set t = t + 1 and return to Step 3S.
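A compact single-process sketch of this loop follows; it simulates the Master–Slave exchange in memory rather than over MPI, and all array shapes, default parameter values and helper names are our own assumptions built on Eqs. (22)–(28):

```python
import numpy as np

def dpfcm(datasets, C, m=2.0, gamma_=1.0, a1=0.01, a2=0.01, alpha=0.6,
          eps=1e-4, max_iter=100, seed=0):
    """Simulate the DPFCM loop; datasets[l] is the (N_l, r) data of Slave peer l."""
    rng = np.random.default_rng(seed)
    P, r = len(datasets), datasets[0].shape[1]
    # 1S: random degrees with u + eta + xi <= 1 (constraint (16)), w = 1/r
    deg = [rng.dirichlet(np.ones(4), size=(X.shape[0], C)) for X in datasets]
    u, eta, xi = ([d[..., c] for d in deg] for c in range(3))
    w = [np.full((C, r), 1.0 / r) for _ in range(P)]
    h = np.zeros((P, P, C, r))    # Lagrangian multipliers h_lijh, Eq. (23)
    D = np.zeros((P, P, C, r))    # Lagrangian multipliers Delta_lijh, Eq. (25)

    sq = lambda l, V: (datasets[l][:, None, :] - V[None, :, :]) ** 2
    up = lambda l: (u[l] / np.clip(1.0 - eta[l] - xi[l], 1e-12, None)) ** m

    def centers(l):               # Eq. (24)
        num = np.einsum('kj,jh,kh->jh', up(l), w[l], datasets[l]) - h[l].sum(0)
        return num / np.clip(np.einsum('kj,jh->jh', up(l), w[l]), 1e-12, None)

    def weights(l, Vl):           # Eq. (26): a softmax over the r attributes
        cost = np.einsum('kj,kjh->jh', up(l), sq(l, Vl)) + gamma_ + 2.0 * D[l].sum(0)
        z = -cost / gamma_
        e = np.exp(z - z.max(1, keepdims=True))       # numerically stable
        return e / e.sum(1, keepdims=True)

    V = [centers(l) for l in range(P)]                        # 2S
    for _ in range(max_iter):
        w = [weights(l, V[l]) for l in range(P)]              # 3S
        for l in range(P):                                    # 4S + 5M
            for i in range(P):
                if i != l:
                    h[l, i] += a1 * (V[l] - V[i])             # Eq. (23)
                    D[l, i] += a2 * (w[l] - w[i])             # Eq. (25)
        V = [centers(l) for l in range(P)]                    # 6S
        shift = 0.0
        for l in range(P):
            S = np.einsum('jh,kjh->kj', w[l], sq(l, V[l]))    # sum_h w ||X-V||^2
            S = np.clip(S, 1e-12, None)
            ratio = (S[:, :, None] / S[:, None, :]) ** (1.0 / (m - 1.0))
            un = (1.0 - eta[l] - xi[l]) / ratio.sum(2)                    # 7S, Eq. (22)
            shift = max(shift, np.abs(un - u[l]).max()); u[l] = un
            T = (np.clip(u[l], 1e-12, None) ** m * S) ** (1.0 / (m + 1.0))
            total = (C - 1.0) * (1.0 - xi[l].sum(1, keepdims=True) / C)
            en = 1.0 - xi[l] - total * T / T.sum(1, keepdims=True)        # 8S, Eq. (27)
            shift = max(shift, np.abs(en - eta[l]).max()); eta[l] = en
            s = np.clip(u[l] + eta[l], 0.0, 1.0)
            xn = 1.0 - s - (1.0 - s ** alpha) ** (1.0 / alpha)            # 9S, Eq. (28)
            shift = max(shift, np.abs(xn - xi[l]).max()); xi[l] = xn
        if shift < eps:                                       # 10S
            break
    return V, u, eta, xi, w
```

For instance, `dpfcm([Y1, Y2, Y3], C=3)` mimics one Master and three Slave peers holding subsets Y1, Y2, Y3 of a normalized dataset.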
3.4. The theoretical analyses of DPFCM
In this section, we analyze the DPFCM algorithm in terms of the advantages and disadvantages of the proposed work. As we can recognize in Section 3.1, DPFCM is built from an optimization problem aiming to derive the cluster centers accompanied by the attribute-weights and the positive, the neutral and the refusal memberships of data points from a given dataset and a facilitator system. By using the Lagrangian method and the Picard iteration, the optimal solutions of the problem are determined as in Eqs. (22)–(28). We clearly see that the cluster centers (24), the attribute-weights (26), and the positive (22), the neutral (27) and the refusal memberships (28) are affected by the facilitator model through the use of the two Lagrangian multipliers expressed in Eqs. (23) and (25): $h_{lijh}$ is updated in the Master peer from the centers of all peers and will be used in the next step of the cluster centers in (24). Similarly, $\Delta_{lijh}$ contributes greatly to the changes of the values of the attribute-weights and, through them, of all memberships and the cluster centers. Like $h_{lijh}$, $\Delta_{lijh}$ is updated in the Master peer by the results of other peers. Using the facilitator model in this case, expressed by the activities of the two Lagrangian multipliers $h_{lijh}$ and $\Delta_{lijh}$, helps the local results in a peer to be updated with those of other peers, so that the local clustering outputs could reach the global optimum. Besides the facilitator model, the utilization of the various memberships of the PFS set improves the clustering quality of the algorithm. That is to say, each quantity is calculated based upon the dataset, the previous cluster centers and memberships, thus regulating the next results accordingly; this is not just a reflection of the ideas stated in Section 1 but also an expression of the calculation process, which can be easily followed. The advantages of the proposed algorithm are threefold. Firstly, the proposed clustering algorithm could be applied to various practical problems requiring fast processing of huge datasets. In fact, since the activities of the algorithm are simultaneously performed in all peers, the total operating time is reduced as a result. The clustering quality of outputs is also better than those of the relevant distributed clustering algorithms according to our theoretical analyses in Section 1. Secondly, the proposed algorithm is easy to implement and could be adapted to many parallel processing models such as the Message Passing Interface (MPI), Open Multi-Processing (OpenMP), Local Area Multicomputer (LAM/MPI), etc. Thirdly, the design of the DPFCM algorithm in this article could be a know-how tutorial for the development of fuzzy clustering algorithms on advanced fuzzy sets like the PFS set. Besides the advantages, the proposed work still contains some limitations, as follows. Firstly, the DPFCM algorithm has large computational time in comparison with some relevant algorithms such as FCM, PFCM, Soft-DKM, WEFCM and CDFCM, due to extra computation on the membership degrees
and the results of all peers. Secondly, the number of peers could affect the clustering quality of outputs: a large number of peers may enhance the clustering quality, but it also increases the computational time of the algorithm. How many peers are enough to balance between the clustering quality and the computational time? In the experimental section, we will validate these remarks as well as find the answers to these questions.
4. Evaluation

4.1. Experimental environment
In this part, we describe the experimental environment as follows.

- Experimental tools: we have implemented the proposed DPFCM algorithm in the MPI/C programming language and executed it on a PC with an Intel Pentium 4 CPU at 3.4 GHz, 4 GB RAM and a 160 GB HDD. The experimental results are taken as the average values after 100 runs. DPFCM is compared against relevant algorithms such as WEFCM (Zhou & Philip Chen, 2011) and CDFCM (Zhou et al., 2013).
- Experimental dataset: the benchmark datasets of the UCI Machine Learning Repository (Bache & Lichman, 2013) are used, namely IRIS, GLASS, IONOSPHERE, HABERMAN and HEART. IRIS is a standard dataset consisting of 150 instances with three classes and four attributes, in which each class contains 50 instances. GLASS contains 214 instances, 6 classes and 9 attributes, which are refractive index, sodium, magnesium, aluminum, silicon, potassium, calcium, barium, and iron. IONOSPHERE contains 351 instances of radar data, 34 attributes and 2 classes, where "Good" radar shows evidence of some types of structures in the ionosphere and "Bad" returns those that do not. HABERMAN contains cases from a study that was conducted between 1958 and 1970 at the University of Chicago's Billings Hospital on the survival of patients who had undergone surgery for breast cancer; it contains 306 instances, 3 attributes and 2 classes. HEART shows the information of heart diseases, including 270 instances, 13 attributes and 2 classes. Table 1 gives an overview of those datasets.
These datasets are normalized by a simple min–max normalization over each attribute,
$$X_{i}\leftarrow\frac{X_{i}-\min_{i}\{X_{i}\}}{\max_{i}\{X_{i}\}-\min_{i}\{X_{i}\}},\tag{49}$$
and the subsets $Y_{j}$ ($j=\overline{1,P}$) are generated by random selection from the original dataset, satisfying the condition $\bigcup_{j=1}^{P}Y_{j}=X$ (a sketch of this normalization and of the validity indices below is given at the end of this subsection).
- Parameter setting: in order to make an accurate comparison with the relevant algorithms, the common parameters are kept the same across all methods, with maxIter = 1000.
- Cluster validity measurement: we use the Average Iteration Number (AIN), the Average Classification Rate (ACR) (Eq. (50)) and the Average Normalized Mutual Information (ANMI) (Eq. (51)). ACR and ANMI are the-larger-the-better validity indices, whilst AIN is the-smaller-the-better. The classification rate is
$$CR=\frac{\sum_{k=1}^{K}d_{k}}{N},\tag{50}$$
where $d_{k}$ is the number of objects correctly identified in the kth cluster and N is the total number of objects in the dataset.
$$NMI(R,Q)=\frac{\sum_{i=1}^{I}\sum_{j=1}^{J}P(i,j)\log\frac{P(i,j)}{P(i)P(j)}}{\sqrt{H(R)H(Q)}},\tag{51}$$
where R, Q are two partitions of the dataset having I and J clusters, respectively. P(i) is the probability that a randomly selected object from the dataset falls into cluster $R_{i}$ in the partition R. P(i,j) is the probability that an object belongs to cluster $R_{i}$ in R and to cluster $Q_{j}$ in Q. H(R) is the entropy associated with the probabilities P(i) ($1\leqslant i\leqslant I$) in partition R. AIN, ACR and ANMI are the average results after 100 runs.

[Table 1. The descriptions of experimental datasets.]
[Fig. 2. The initiation of peer 2.]
[Fig. 3. The initiation of peer 3.]
[Fig. 4. The communication in each iteration step.]
- Objective: (a) to illustrate the activities of DPFCM in classifying a specific benchmark dataset of the UCI Machine Learning Repository; (b) to evaluate the clustering qualities of the algorithms through validity indices; (c) to measure the effect of the number of peers on the clustering quality; (d) to investigate the computational time of all algorithms.
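The following sketch implements the normalization (49) and the two validity indices (50)–(51); the function names, the assumption of integer class labels, and the majority-label matching used for $d_k$ are our own choices:

```python
import numpy as np

def min_max_normalize(X):
    """Eq. (49): scale every attribute of X (n, r) into [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)

def classification_rate(labels, clusters):
    """CR, Eq. (50): d_k counts cluster k's objects carrying its majority class."""
    labels, clusters = np.asarray(labels), np.asarray(clusters)
    correct = sum(np.bincount(labels[clusters == k]).max()
                  for k in np.unique(clusters))
    return correct / labels.size

def nmi(r, q):
    """NMI, Eq. (51), from the joint distribution P(i, j) of two partitions."""
    r, q = np.asarray(r), np.asarray(q)
    joint = np.zeros((r.max() + 1, q.max() + 1))
    np.add.at(joint, (r, q), 1.0)
    p = joint / joint.sum()
    pi, pj = p.sum(axis=1), p.sum(axis=0)
    mask = p > 0
    mi = (p[mask] * np.log(p[mask] / np.outer(pi, pj)[mask])).sum()
    ent = lambda x: -(x[x > 0] * np.log(x[x > 0])).sum()
    return mi / np.sqrt(ent(pi) * ent(pj))
```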
4.2. An illustration of DPFCM
Firstly, we illustrate the activities of the proposed algorithm DPFCM in classifying the IRIS dataset. In this case, N = 150, r = 4, C = 3, and the number of peers is P = 3. The cardinalities of the first, second and third peers are 38, 39 and 73, respectively. The initial positive, neutral and refusal matrices of the first peer (only the first five rows are shown) are initialized in (52)–(54), respectively:
$$u(0)=\begin{pmatrix}0.082100&0.836100&0.011500\\0.722100&0.002400&0.930900\\0.365000&0.983200&0.578800\\0.002900&0.199000&0.608400\\0.116700&0.462500&0.932100\end{pmatrix},\tag{52}$$
$$\eta(0)=\begin{pmatrix}0.052229&0.123007&0.143827\\0.131697&0.878686&0.036471\\0.466915&0.002841&0.415851\\0.034799&0.450723&0.213030\\0.747537&0.094331&0.017593\end{pmatrix},\tag{53}$$
[Fig. 5. The distribution of clusters of Peer 1 in the second iteration.]
[Fig. 6. The distribution of clusters of Peer 2 in the second iteration.]
$$\xi(0)=\begin{pmatrix}0.477245&0.020119&0.244364\\0.076669&0.050943&0.026051\\0.065771&0.000965&0.001082\\0.560059&0.075800&0.085142\\0.113362&0.394597&0.014931\end{pmatrix}.\tag{54}$$
From this initialization, the distribution of clusters of the first peer in the first iteration is depicted in Fig. 1. Similarly, the distributions of clusters of the second and third peers in the first iteration are depicted in Figs. 2 and 3, respectively. Now, we illustrate the activities of the first peer. The cluster centers $V_{ljh}(0)$ calculated by Eq. (24) are expressed in Eq. (55):
$$V(0)=\begin{pmatrix}0.568&0.523&0.603&0.466\\0.563&0.564&0.608&0.470\\0.546&0.542&0.619&0.494\end{pmatrix}.\tag{55}$$
The attribute-weights $w_{ljh}(1)$ are computed from (26) and shown in (56):
$$w(1)=\begin{pmatrix}0.137&0.225&0.291&0.346\\0.181&0.268&0.253&0.298\\0.185&0.234&0.273&0.309\end{pmatrix}.\tag{56}$$
Now all Slave peers synchronize their pairs $\{\Delta_{lijh}(0),h_{lijh}(0),V_{ljh}(0),w_{ljh}(1)\}$ with the Master, as depicted in Fig. 4. The values of the Lagrangian multipliers $\Delta_{lijh}(1)$ and $h_{lijh}(1)$ in all peers are shown in (57) and (58), respectively (printed at the end of this subsection). The cluster centers $V_{ljh}(1)$ are then updated accordingly:
$$V(1)=\begin{pmatrix}0.505968&0.482254&0.573009&0.467232\\0.496019&0.529548&0.542298&0.452127\\0.495326&0.510318&0.562189&0.474282\end{pmatrix}.\tag{59}$$
Based upon $V_{ljh}(1)$, $w_{ljh}(1)$ and the updated Lagrangian multipliers, the new positive, neutral and refusal matrices of the first peer are calculated in (60)–(62), respectively:
$$u(1)=\begin{pmatrix}0.146003&0.306919&0.202825\\0.244076&0.025787&0.304908\\0.146620&0.359057&0.189975\\0.129143&0.158800&0.242726\\0.045708&0.161369&0.344092\end{pmatrix},\tag{60}$$
$$\eta(1)=\begin{pmatrix}0.293576&0.638569&0.594259\\0.740159&0.863683&0.634079\\0.821898&0.585084&0.796155\\0.232520&0.758352&0.726713\\0.640528&0.534759&0.625413\end{pmatrix},\tag{61}$$
[Fig. 7. The distribution of clusters of Peer 3 in the second iteration.]
$$\Delta(1)=\begin{pmatrix}0.000&0.000&0.000&0.000&0.171&0.180&0.160&0.191&0.139&0.359&0.214&0.284\\0.000&0.000&0.000&0.000&0.185&0.061&0.120&0.126&0.056&0.399&0.196&0.260\\0.000&0.000&0.000&0.000&0.195&0.087&0.140&0.141&0.049&0.457&0.227&0.279\end{pmatrix},\tag{57}$$
$$h(1)=\begin{pmatrix}0.000&0.000&0.000&0.000&0.159&0.157&0.181&0.099&0.065&0.081&0.044&0.113\\0.000&0.000&0.000&0.000&0.169&0.098&0.282&0.183&0.118&0.123&0.111&0.054\\0.000&0.000&0.000&0.000&0.148&0.091&0.276&0.192&0.092&0.099&0.122&0.033\end{pmatrix}.\tag{58}$$
...GLASS, IONOSPHERE, HABERMAN and HEART IRIS is a standard
data set consisting of 150 instances with three classes and four
attributes in which each class contains of 50 instances... Besides the advantages, the proposed work still contains some limitations as follows Firstly, the DPFCM algorithm has large computational time in comparison with some relevant algorithms such as FCM,... 1gives an overview
of those datasets
These datasets are normalized by a simple normalization
X
Xi
max
i