DPFCM: A novel distributed picture fuzzy clustering method on picture
fuzzy sets
VNU University of Science, Vietnam National University, Viet Nam
Article info
Article history:
Available online 26 July 2014
Keywords:
Clustering quality
Distributed clustering
Facilitator model
Fuzzy clustering
Picture fuzzy sets
Abstract
Fuzzy clustering is considered an important tool in pattern recognition and knowledge discovery from databases, and thus has been applied broadly to various practical problems. Recent advances in data organization and processing, such as cloud computing technology, which are suitable for the management, privacy and storage of big datasets, have made a significant breakthrough in information sciences and in the enhancement of the efficiency of fuzzy clustering. Distributed fuzzy clustering is an efficient mining technique that adapts traditional fuzzy clustering to a new storage behavior where parts of the dataset are stored in different sites instead of a centralized main site. Some distributed fuzzy clustering algorithms have been presented, including the most effective one, the CDFCM of Zhou et al. (2013). Based upon the observation that the communication cost and the quality of results in CDFCM could be ameliorated through the integration of distributed picture fuzzy clustering with the facilitator model, in this paper we present a novel distributed picture fuzzy clustering method on picture fuzzy sets, so-called DPFCM. Experimental results on various datasets show that the clustering quality of DPFCM is better than those of CDFCM and relevant algorithms.
© 2014 Elsevier Ltd. All rights reserved.
1. Introduction
Fuzzy clustering is considered an important tool in pattern recognition and knowledge discovery from databases, and thus has been applied broadly to various practical problems. The first fuzzy clustering algorithm is Fuzzy C-Means (FCM), proposed by Bezdek, which iteratively updates the cluster centers and the partition matrix in each step in order to satisfy a given objective function. Bezdek proved that FCM converges to the saddle points of the objective function. Even though FCM was proposed a long time ago, this algorithm is still a popular fuzzy clustering method that has been applied to many practical problems
for rule extraction and the discovery of implicit patterns wherein fuzziness exists, such as:

- Image segmentation (…, Moriarty, 2002; Cao, Deng, & Wang, 2012; Chen, Chen, & Lu, 2011; Chuang, Tzeng, Chen, Wu, & Chen, 2006; Krinidis & Chatzis, 2010; Li, Chui, Chang, & Ong, 2011; Ma & Staunton, 2007; Pham, Xu, & Prince, 2000; Siang Tan & Mat Isa, 2011; Zhang & Chen, 2004);
- Face recognition (Agarwal, Agrawal, Jain, & Kumar, 2010; Chen & Huang, 2003; Haddadnia, Faez, & Ahmadi, 2003; Lu, Yuan, & …);
- Gesture recognition (Li, 2003; Wachs, Stern, & Edan, 2003);
- Intrusion detection (…, Chimphlee, & Srinoy, 2005; Chimphlee, Abdullah, Noor Md Sap, Srinoy, & Chimphlee, 2006; Shah, Undercoffer, & Joshi, …);
- Hot-spot spatial analysis (Di Martino, Loia, & Sessa, 2008);
- Risk analysis (Li, Li, & Kang, 2011);
- Bankruptcy prediction (Martin, Gayathri, Saranya, Gayathri, & …);
- Geo-demographic analysis (Son, 2014a, 2014b; Son, Cuong, Lanzi, & Thong, 2012, 2013, 2014; Son, Lanzi, Cuong, & Hung, 2012);
- Forecasting (…, Dhavale, & Sarkis, 2014; Chu, Liau, Lin, & Su, 2012; Egrioglu, Aladag, & Yolcu, 2013; Egrioglu, 2011; Hadavandi, Shavandi, & Ghanbari, 2011; Izakian & Abraham, 2011; Roh, Pedrycz, & Ahn, 2014; Wang, Ma, Lao, & Wang, 2014; Zhang, Huang, Ji, & Xie, 2011).
Recent advances in data organization and processing, such as cloud computing technology, which are suitable for the
management, privacy and storage of big datasets, have made a significant breakthrough in information sciences in general and in the enhancement of the efficiency of FCM in particular. For example, cloud computing is an Internet-based storage solution where ubiquitous computer resources are set up with the same configuration in order to develop and run applications as if they were constructed in a single centralized system. Users do not need to know where and how the computer resources operate, so the maintenance and running costs could be reduced, thus guaranteeing the stable expansion of applications. In the cloud computing paradigm, data mining techniques, especially fuzzy clustering, are very much needed in order to retrieve meaningful information from virtually integrated data. It has been stated that using data mining through cloud computing reduces the barriers that keep users from benefiting from data mining instruments, since they pay only for the data mining tools without handling complex hardware and data infrastructures. Examples of deploying data mining and clustering algorithms in some typical cloud computing service providers such as Amazon cloud, Google Apps, Microsoft, Salesforce and IBM could be found in (…; Wu, Guru, & Buyya, 2010) and others (Surcel & Alecu, 2008). Such algorithms are called distributed mining techniques.
Distributed fuzzy clustering is a distributed mining technique that adapts traditional fuzzy clustering to a new storage behavior where parts of the dataset are stored in different sites instead of a centralized main site. Distributed fuzzy clustering is extended from the distributed hard clustering algorithms, and several efforts on distributed hard/fuzzy clustering could be named. One work presented a distributed clustering algorithm called dSimpleGraph, based on the relation between two micro-clusters, to classify data on the local machines and generate a determined global view from local views. Another adapted a distributed version of the Support Vector Machine for large-scale datasets and presented a distributed clustering method inspired by the Multi-Agent framework, in which data are divided among different agents and the global result is formed from only limited local knowledge, for clustering static and dynamic graphs. Kwon et al. (2010) proposed a scalable, parallel algorithm.
Other studies introduced an algorithm based upon spatial data correlation among sensor nodes that maintained data accuracy for each distributed cluster at their sites; a distributed density-based clustering that both reduces the communication overheads and improves the quality of the global clusters; and an algorithm based on the aggregation of models produced locally, meaning that datasets were processed locally on each node and the results were integrated to construct global clusters hierarchically. The aim of this approach is to minimize the communications, maximize the parallelism, load-balance the work among different nodes of the system, and reduce the overhead due to extra processing. Further works addressed cluster identification and outlier detection for distributed data, based on the idea of generating independent local models and combining the local models at a central server to obtain global clusters, and presented a distributed random walk based clustering algorithm that builds a bounded-size core through a random-walk procedure.
Other approaches include a Canopy clustering algorithm, which takes only the distance function that satisfies the triangle inequality and is of sufficiently high granularity to permit the data to be partitioned into canopies of optimal size, and distributed algorithms based on k-means and k-median, whose basic ideas are to reduce the problem of finding a clustering with low cost to the problem of finding a core-set of small size and then construct a global core-set. Hai, Zhang, Zhu, and Wang (2012), Jain and colleagues gave overviews of the distributed clustering group methods, including partitioning, hierarchical, density-based, soft-computing, neural network and fuzzy clustering methods. They argued that datasets in real-world applications often consist of inconsistencies or outliers, where it is difficult to obtain homogeneous and meaningful global clusters, so that distributed hard clustering should incorporate the fuzzy set theory in order to handle the hesitancy originating from imperfect and imprecise information. A parallel version of the FCM algorithm, so-called PFCM, aimed at distributed fuzzy clustering; later works modified the PFCM algorithm with a pre-processing procedure to estimate the number of clusters, and also presented an intuitionistic fuzzy based distributed clustering algorithm including two different levels: the local level and the global level. In the local level, numerical datasets are converted into intuitionistic fuzzy data and are clustered independently from each other using a modified FCM algorithm. In the global level, the global center is computed by clustering all local cluster centers, and is then transmitted back to the local sites to update the local cluster model; the communication follows the Master–Slave model. A distributed fuzzy clustering namely CDFCM was proposed, in which cluster centers and attribute-weights are calculated at each peer and then updated by neighboring results through local communications. The process is repeated until a pre-defined stopping criterion holds, and the status quo of clusters in all peers accurately reflects the results as in centralized clustering. CDFCM was experimentally validated and had better clustering quality than other relevant algorithms such as Soft-DKM (Forero, Cano, & Giannakis, 2011) and WEFCM (Zhou & Philip Chen, 2011); it is among the most effective distributed fuzzy clustering methods available in the literature.
The motivation of this paper is described as follows. In the activities of CDFCM, this algorithm solely updates the cluster centers and attribute-weights of each peer by those of neighboring peers. This requires large communication costs, approximately P × NB communications per iteration, with P being the number of peers and NB being the average number of neighbors of a given peer. Additionally, the quality of results in each peer could not be high, since only local updates with neighboring results are conducted. Based upon the idea that the communication cost and the quality of results in CDFCM could be ameliorated through the integration of distributed picture fuzzy clustering with the facilitator model, in this paper we present a novel distributed picture fuzzy clustering method on picture fuzzy sets, so-called DPFCM. The proposed algorithm utilizes the facilitator model, meaning that all peers transfer their results to a special, unique peer called the Master peer, so that it takes only P communications to complete the update process. Employing the Master peer in the facilitator model also helps to increase the number of neighboring results used in each update, thus advancing the quality of results. In order to enhance the clustering quality as much as possible, we also deploy the distributed fuzzy clustering algorithm on picture fuzzy sets (PFS) (Cuong & Kreinovich, 2013), which in essence are a generalization of the traditional fuzzy sets (FS) used for the development of the existing CDFCM algorithm. PFS-based models can be applied to situations requiring human opinions involving more answers of the types yes, abstain, no and refusal, which cannot be accurately expressed in traditional FS. Therefore, deploying the distributed clustering algorithm on PFS could give higher clustering quality than on FS or on IFS. Our contribution in this paper is a novel distributed picture fuzzy clustering method (DPFCM) that utilizes the ideas of both the facilitator model and the deployment of clustering algorithms on PFS in order to ameliorate the clustering quality. The proposed algorithm will be implemented and validated in comparison with CDFCM and other relevant algorithms in terms of clustering quality. The significance of this research is not only the enhancement of the clustering quality of distributed fuzzy clustering algorithms but also the enrichment of the know-how of integrating picture fuzzy sets into clustering algorithms and deploying them to practical applications. Indeed, the contribution of this paper is meaningful to both the theoretical and the practical sides.
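To make the communication gap concrete, consider an illustrative configuration of our own (the numbers are not taken from the experiments): for P = 20 peers with NB = 5 neighbors each on average, CDFCM needs about
$$P \times NB = 20 \times 5 = 100$$
peer-to-peer messages per iteration, whereas the facilitator model of DPFCM needs only P = 20 messages from the Slave peers to the Master.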
The rest of the paper is organized as follows. Section 2 gives the preliminaries on the PFS set. The formulation of clustering algorithms on PFS in association with the facilitator model is described in Section 3. Section 4 validates the proposed approach through a series of experiments. The final section draws the conclusions and delineates future research directions.
2. Preliminary
In this section, we take a brief overview of some basic terms and notations in PFS, which will be used throughout the paper.
Definition 1. A picture fuzzy set (PFS) (Cuong & Kreinovich, 2013) in a non-empty set $X$ is
$$\dot{A}=\left\{\left\langle x,\mu_{\dot{A}}(x),\eta_{\dot{A}}(x),\gamma_{\dot{A}}(x)\right\rangle \mid x\in X\right\},\tag{1}$$
where $\mu_{\dot{A}}(x)$ is the positive degree of each element $x\in X$, $\eta_{\dot{A}}(x)$ is the neutral degree and $\gamma_{\dot{A}}(x)$ is the negative degree, satisfying the constraints
$$\mu_{\dot{A}}(x),\ \eta_{\dot{A}}(x),\ \gamma_{\dot{A}}(x)\in[0,1],\quad\forall x\in X,\tag{2}$$
$$0\leqslant\mu_{\dot{A}}(x)+\eta_{\dot{A}}(x)+\gamma_{\dot{A}}(x)\leqslant 1,\quad\forall x\in X.\tag{3}$$
The refusal degree of an element is $\xi_{\dot{A}}(x)=1-\left(\mu_{\dot{A}}(x)+\eta_{\dot{A}}(x)+\gamma_{\dot{A}}(x)\right)$, $\forall x\in X$. In case $\xi_{\dot{A}}(x)\equiv 0$, PFS returns to intuitionistic fuzzy sets (IFS) (Atanassov, 1986), and when both $\eta_{\dot{A}}(x)=\xi_{\dot{A}}(x)\equiv 0$, PFS returns to fuzzy sets (FS). To illustrate these notions, consider some examples below.
Example 1. In a democratic election station, the council issues 500 voting papers for a candidate. The voting results are divided into four groups accompanied by the numbers of papers: "vote for" (300), "abstain" (64), "vote against" (115) and "refusal of voting" (21). Group "abstain" means that the voting paper is a white paper rejecting both "agree" and "disagree" for the candidate but still takes the vote. Group "refusal of voting" consists of either invalid voting papers or those who did not take the vote. This example happened in reality, and IFS could not handle it since the refusal degree (group "refusal of voting") does not exist in IFS.
Example 2. A patient was given first emergency aid and diagnosed with four states after examining possible symptoms: "heart attack", "uncertain", "not heart attack" and "appendicitis". In this case, we also have a PFS set.
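As a quick numerical illustration of Definition 1, the sketch below encodes the election of Example 1 as a picture fuzzy element; the Python representation is our own, and only the vote counts come from the example:

```python
from dataclasses import dataclass

@dataclass
class PictureFuzzyElement:
    mu: float     # positive degree
    eta: float    # neutral degree
    gamma: float  # negative degree

    def refusal(self) -> float:
        # xi = 1 - (mu + eta + gamma), the refusal degree of Definition 1
        return 1.0 - (self.mu + self.eta + self.gamma)

    def is_valid(self) -> bool:
        # constraints (2) and (3)
        degrees = (self.mu, self.eta, self.gamma)
        return all(0.0 <= d <= 1.0 for d in degrees) and sum(degrees) <= 1.0

# Example 1: 500 voting papers -> "vote for" 300, "abstain" 64,
# "vote against" 115, "refusal of voting" 21
candidate = PictureFuzzyElement(mu=300/500, eta=64/500, gamma=115/500)
assert candidate.is_valid()
print(candidate.refusal())   # 21/500 = 0.042
```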
Now, we briefly present some basic picture fuzzy operations, picture distance metrics and picture fuzzy relations. Let PFS(X) denote the set of all PFS sets on the universe X.
Definition 2. For $A,B\in PFS(X)$, the union, intersection and complement operations are defined as follows:
$$A\cup B=\left\{\left\langle x,\max\{\mu_{A}(x),\mu_{B}(x)\},\min\{\eta_{A}(x),\eta_{B}(x)\},\min\{\gamma_{A}(x),\gamma_{B}(x)\}\right\rangle \mid x\in X\right\},\tag{4}$$
$$A\cap B=\left\{\left\langle x,\min\{\mu_{A}(x),\mu_{B}(x)\},\min\{\eta_{A}(x),\eta_{B}(x)\},\max\{\gamma_{A}(x),\gamma_{B}(x)\}\right\rangle \mid x\in X\right\},\tag{5}$$
$$\overline{A}=\left\{\left\langle x,\gamma_{A}(x),\eta_{A}(x),\mu_{A}(x)\right\rangle \mid x\in X\right\}.\tag{6}$$
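These operations map directly onto elementwise max/min. A minimal sketch, assuming each PFS over an N-element universe is stored as an (N, 3) array of (μ, η, γ) triples (the array layout is our own convention):

```python
import numpy as np

def pfs_union(A, B):          # Eq. (4)
    return np.stack([np.maximum(A[:, 0], B[:, 0]),
                     np.minimum(A[:, 1], B[:, 1]),
                     np.minimum(A[:, 2], B[:, 2])], axis=1)

def pfs_intersection(A, B):   # Eq. (5)
    return np.stack([np.minimum(A[:, 0], B[:, 0]),
                     np.minimum(A[:, 1], B[:, 1]),
                     np.maximum(A[:, 2], B[:, 2])], axis=1)

def pfs_complement(A):        # Eq. (6): positive and negative degrees swap
    return A[:, [2, 1, 0]]
```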
Definition 3. For $A,B\in PFS(X)$, the Cartesian products of these PFS sets are
$$A\times_{1}B=\left\{\left\langle (x,y),\mu_{A}(x)\cdot\mu_{B}(y),\eta_{A}(x)\cdot\eta_{B}(y),\gamma_{A}(x)\cdot\gamma_{B}(y)\right\rangle \mid x\in A,\ y\in B\right\},\tag{7}$$
$$A\times_{2}B=\left\{\left\langle (x,y),\mu_{A}(x)\wedge\mu_{B}(y),\eta_{A}(x)\wedge\eta_{B}(y),\gamma_{A}(x)\vee\gamma_{B}(y)\right\rangle \mid x\in A,\ y\in B\right\}.\tag{8}$$
Definition 4. The distances between $A,B\in PFS(X)$ are defined as
$$d_{p}(A,B)=\frac{1}{N}\sum_{i=1}^{N}\left(\left|\mu_{A}(x_{i})-\mu_{B}(x_{i})\right|+\left|\eta_{A}(x_{i})-\eta_{B}(x_{i})\right|+\left|\gamma_{A}(x_{i})-\gamma_{B}(x_{i})\right|\right),\tag{9}$$
$$e_{p}(A,B)=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\left(\mu_{A}(x_{i})-\mu_{B}(x_{i})\right)^{2}+\left(\eta_{A}(x_{i})-\eta_{B}(x_{i})\right)^{2}+\left(\gamma_{A}(x_{i})-\gamma_{B}(x_{i})\right)^{2}\right)}.\tag{10}$$
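A small sketch of both distance metrics, using the same (N, 3) array convention as above:

```python
import numpy as np

def picture_hamming(A: np.ndarray, B: np.ndarray) -> float:
    """Hamming-style distance d_p of Eq. (9); A, B have shape (N, 3)."""
    return np.abs(A - B).sum(axis=1).mean()

def picture_euclidean(A: np.ndarray, B: np.ndarray) -> float:
    """Euclidean-style distance e_p of Eq. (10)."""
    return np.sqrt(((A - B) ** 2).sum(axis=1).mean())

# two toy picture fuzzy sets over a universe of N = 2 elements
A = np.array([[0.6, 0.1, 0.2], [0.5, 0.3, 0.1]])
B = np.array([[0.4, 0.2, 0.3], [0.5, 0.2, 0.2]])
print(picture_hamming(A, B))    # 0.3
print(picture_euclidean(A, B))  # 0.2
```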
Definition 5. The picture fuzzy relation R is a picture fuzzy subset of $A\times B$, given by
$$R=\left\{\left\langle (x,y),\mu_{R}(x,y),\eta_{R}(x,y),\gamma_{R}(x,y)\right\rangle \mid x\in A,\ y\in B\right\},\tag{11}$$
$$\mu_{R},\eta_{R},\gamma_{R}:A\times B\rightarrow[0,1],\tag{12}$$
$$\mu_{R}(x,y)+\eta_{R}(x,y)+\gamma_{R}(x,y)\leqslant 1,\quad\forall(x,y)\in A\times B.\tag{13}$$
$PFR(A\times B)$ is the set of all picture fuzzy subsets on $A\times B$. Further properties of PFS operations, the convex combination of PFS, etc. can be found in (Cuong & Kreinovich, 2013).
3. The proposed method

3.1. The proposed distributed picture fuzzy clustering model
In this section, we propose a distributed picture fuzzy clustering model. The communication model is the facilitator, or Master–Slave, model having one Master peer and P Slave peers, where each Slave peer is allowed to communicate with the Master only. Each Slave peer has a subset $Y_{j}$ of the original dataset X consisting of N data points, with
$$\bigcup_{j=1}^{P}Y_{j}=X,\qquad \sum_{j=1}^{P}\left|Y_{j}\right|=N.$$
The number of dimensions in a subset is exactly the same as that in the original dataset. Let us divide the dataset X into C groups satisfying the objective function below:
$$J=\sum_{l=1}^{P}\sum_{k=1}^{\left|Y_{l}\right|}\sum_{j=1}^{C}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\sum_{h=1}^{r}w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}+\gamma\sum_{l=1}^{P}\sum_{j=1}^{C}\sum_{h=1}^{r}w_{ljh}\log w_{ljh}\ \rightarrow\ \min,\tag{14}$$
where $u_{lkj}$, $\eta_{lkj}$ and $\xi_{lkj}$ are the positive, the neutral and the refusal degrees of data point k to cluster j in Slave peer l. $w_{ljh}$ is the attribute-weight of attribute h to cluster j in Slave peer l. $V_{ljh}$ is the center of cluster j in Slave peer l according to attribute h. $X_{lkh}$ is the kth data point of Slave peer l according to attribute h. m and $\gamma$ are the fuzzifier and a positive scalar, respectively. The constraints for (14) are shown below:
$$u_{lkj},\ \eta_{lkj},\ \xi_{lkj}\in[0,1],\tag{15}$$
$$u_{lkj}+\eta_{lkj}+\xi_{lkj}\leqslant 1,\tag{16}$$
$$\sum_{j=1}^{C}\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}=1,\tag{17}$$
$$\sum_{j=1}^{C}\left(\eta_{lkj}+\frac{\xi_{lkj}}{C}\right)=1,\tag{18}$$
$$\sum_{h=1}^{r}w_{ljh}=1,\tag{19}$$
$$V_{ljh}=V_{ijh},\quad(\forall i\neq l;\ i,l=\overline{1,P}),\tag{20}$$
$$w_{ljh}=w_{ijh},\quad(\forall i\neq l;\ i,l=\overline{1,P}).\tag{21}$$
The proposed model in Eqs. (14)–(21) relies on the principles of the PFS set and of the facilitator model. The differences of this model from the existing ones are as follows:

- The proposed model is a generalization of the CDFCM model: when $\eta_{lkj}=\xi_{lkj}=0$, which means the PFS set degrades to the FS set, it returns to the CDFCM model in both the objective function and the constraints. Moreover, the constraints (15)–(18), which describe the relations of the degrees in the PFS set, were integrated into the optimization problem. By doing so, the new distributed picture fuzzy clustering model is totally set up according to the PFS set.
- The proposed model utilizes the facilitator model to increase the number of neighboring results used to update those of a given peer, thus giving high accuracy of the final results. This is reflected in the constraints (20) and (21), where the cluster centers and the attribute-weights of any two peers must coincide, so that these local centers and attribute-weights converge to the global ones.
Additional remarks on the distributed picture fuzzy clustering model are given below:

- The objective function (14) minimizes the dispersion within clusters and maximizes the entropy of attribute-weights, allowing important attributes to contribute greatly to the identification of clusters.
- The constraints (15) and (16) originate from the definition of PFS.
- Constraint (17) describes that the sum of memberships of a data point to all clusters in a Slave peer is equal to one. Analogously, constraint (18) states that the sum of hesitant memberships of a data point to all clusters in a Slave peer, expressed through the neutral and refusal degrees, is also equal to one.
- Constraint (19) states that the sum of attribute-weights for a given cluster in a peer is equal to one. Thus, all attributes could be normalized for the clustering.
Outputs of the distributed picture fuzzy clustering model (14)–(21) are the optimal cluster centers $\{V_{ljh}\mid l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r}\}$, the picture degrees $\{(u_{lkj},\eta_{lkj},\xi_{lkj})\mid l=\overline{1,P};\ k=\overline{1,\left|Y_{l}\right|};\ j=\overline{1,C}\}$ in all peers, showing which cluster a data point belongs to, and the attribute-weights $\{w_{ljh}\mid l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r}\}$. Based upon these results, the state of clusters in a given peer is determined, and the global results could be retrieved from the local ones according to a specific cluster.
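To make the model concrete, the sketch below evaluates the objective function (14) for one Slave peer given candidate degrees, weights and centers; the array shapes and variable names are our own convention:

```python
import numpy as np

def peer_objective(X, u, eta, xi, w, V, m=2.0, gamma_=1.0):
    """Contribution of one Slave peer l to J in Eq. (14).

    X: (N_l, r) data, u/eta/xi: (N_l, C) degrees,
    w: (C, r) attribute-weights, V: (C, r) cluster centers."""
    up = (u / (1.0 - eta - xi)) ** m                  # (u/(1-eta-xi))^m
    sq = (X[:, None, :] - V[None, :, :]) ** 2         # ||X_lkh - V_ljh||^2
    fidelity = (up * np.einsum('jh,kjh->kj', w, sq)).sum()
    entropy = gamma_ * (w * np.log(w)).sum()          # attribute-weight entropy term
    return fidelity + entropy

# the total J of Eq. (14) is the sum of peer_objective over all P peers
```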
3.2. The solutions

In this section, we use the Lagrangian method and the Picard iteration to derive the optimal solutions of the model (14)–(21) as follows.
Theorem 1. The optimal solutions of the system (14)–(21) are:
$$u_{lkj}=\frac{1-\eta_{lkj}-\xi_{lkj}}{\sum_{i=1}^{C}\left(\frac{\sum_{h=1}^{r}w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}}{\sum_{h=1}^{r}w_{lih}\left\|X_{lkh}-V_{lih}\right\|^{2}}\right)^{\frac{1}{m-1}}},\quad(\forall l=\overline{1,P};\ k=\overline{1,\left|Y_{l}\right|};\ j=\overline{1,C}),\tag{22}$$
$$h_{lijh}=h_{lijh}+\alpha_{1}\left(V_{ljh}-V_{ijh}\right),\quad(\forall i\neq l;\ i,l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r}),\tag{23}$$
$$V_{ljh}=\frac{\sum_{k=1}^{\left|Y_{l}\right|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}w_{ljh}X_{lkh}-\sum_{i=1,i\neq l}^{P}h_{lijh}}{\sum_{k=1}^{\left|Y_{l}\right|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}w_{ljh}},\quad(\forall l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r}),\tag{24}$$
$$\Delta_{lijh}=\Delta_{lijh}+\alpha_{2}\left(w_{ljh}-w_{ijh}\right),\quad(\forall i\neq l;\ i,l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r}),\tag{25}$$
$$w_{ljh}=\frac{\exp\left(-\frac{1}{\gamma}\left(\sum_{k=1}^{\left|Y_{l}\right|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\left\|X_{lkh}-V_{ljh}\right\|^{2}+\gamma+2\sum_{i=1,i\neq l}^{P}\Delta_{lijh}\right)\right)}{\sum_{h'=1}^{r}\exp\left(-\frac{1}{\gamma}\left(\sum_{k=1}^{\left|Y_{l}\right|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\left\|X_{lkh'}-V_{ljh'}\right\|^{2}+\gamma+2\sum_{i=1,i\neq l}^{P}\Delta_{lijh'}\right)\right)},\tag{26}$$
$$\eta_{lkj}=1-\xi_{lkj}-\frac{(C-1)\left(1-\frac{1}{C}\sum_{i=1}^{C}\xi_{lki}\right)}{\sum_{i=1}^{C}\left(\frac{u_{lki}^{m}\sum_{h=1}^{r}w_{lih}\left\|X_{lkh}-V_{lih}\right\|^{2}}{u_{lkj}^{m}\sum_{h=1}^{r}w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}}\right)^{\frac{1}{m+1}}},\quad(\forall l=\overline{1,P};\ k=\overline{1,\left|Y_{l}\right|};\ j=\overline{1,C}),\tag{27}$$
$$\xi_{lkj}=1-\left(u_{lkj}+\eta_{lkj}\right)-\left(1-\left(u_{lkj}+\eta_{lkj}\right)^{\alpha}\right)^{1/\alpha},\quad(\forall l=\overline{1,P};\ k=\overline{1,\left|Y_{l}\right|};\ j=\overline{1,C}).\tag{28}$$
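As a quick consistency check (ours, not in the original text): writing $S_{lkj}:=\sum_{h=1}^{r}w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}$, dividing (22) by $1-\eta_{lkj}-\xi_{lkj}$ and summing over j shows that constraint (17) holds:
$$\sum_{j=1}^{C}\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}=\sum_{j=1}^{C}\frac{S_{lkj}^{-\frac{1}{m-1}}}{\sum_{i=1}^{C}S_{lki}^{-\frac{1}{m-1}}}=1.$$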
Proof. (A) Fix $W,V,\eta,\xi$; the Lagrangian function with respect to U is:
$$L(U)=\sum_{l=1}^{P}\sum_{k=1}^{\left|Y_{l}\right|}\sum_{j=1}^{C}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\sum_{h=1}^{r}w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}+\gamma\sum_{l=1}^{P}\sum_{j=1}^{C}\sum_{h=1}^{r}w_{ljh}\log w_{ljh}-\sum_{l=1}^{P}\sum_{k=1}^{\left|Y_{l}\right|}\lambda_{lk}\left(\sum_{j=1}^{C}\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}-1\right).\tag{29}$$
Taking the derivative of L(U) with respect to $u_{lkj}$ and setting it to zero gives
$$\frac{\partial L(U)}{\partial u_{lkj}}=\frac{m}{1-\eta_{lkj}-\xi_{lkj}}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m-1}\sum_{h=1}^{r}w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}-\frac{\lambda_{lk}}{1-\eta_{lkj}-\xi_{lkj}}=0,\quad(\forall l=\overline{1,P};\ k=\overline{1,\left|Y_{l}\right|};\ j=\overline{1,C};\ h=\overline{1,r}),\tag{30}$$
$$u_{lkj}=\left(1-\eta_{lkj}-\xi_{lkj}\right)\left(\frac{\lambda_{lk}}{m\sum_{h=1}^{r}w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}}\right)^{\frac{1}{m-1}}.\tag{31}$$
From constraint (17), we have
$$\lambda_{lk}=m\left(\sum_{i=1}^{C}\left(\frac{1}{\sum_{h=1}^{r}w_{lih}\left\|X_{lkh}-V_{lih}\right\|^{2}}\right)^{\frac{1}{m-1}}\right)^{-(m-1)}.\tag{32}$$
Substituting (32) into (31), we obtain the optimal solutions for $u_{lkj}$ as follows:
$$u_{lkj}=\frac{1-\eta_{lkj}-\xi_{lkj}}{\sum_{i=1}^{C}\left(\frac{\sum_{h=1}^{r}w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}}{\sum_{h=1}^{r}w_{lih}\left\|X_{lkh}-V_{lih}\right\|^{2}}\right)^{\frac{1}{m-1}}},\quad(\forall l=\overline{1,P};\ k=\overline{1,\left|Y_{l}\right|};\ j=\overline{1,C}).\tag{33}$$
(B) We fix all degrees and the attribute-weights to calculate the cluster centers by the Lagrangian function below:
$$L(V)=\sum_{l=1}^{P}\sum_{k=1}^{\left|Y_{l}\right|}\sum_{j=1}^{C}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\sum_{h=1}^{r}w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}+\gamma\sum_{l=1}^{P}\sum_{j=1}^{C}\sum_{h=1}^{r}w_{ljh}\log w_{ljh}-\sum_{l=1}^{P}\sum_{i=1,i\neq l}^{P}\sum_{j=1}^{C}\sum_{h=1}^{r}h_{lijh}\left(V_{ljh}-V_{ijh}\right),\tag{34}$$
where $h_{lijh}$ is a Lagrangian multiplier matrix. Taking the derivative of L(V) with respect to $V_{ljh}$, we have
$$\frac{\partial L(V)}{\partial V_{ljh}}=-2\sum_{k=1}^{\left|Y_{l}\right|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}w_{ljh}\left(X_{lkh}-V_{ljh}\right)-\sum_{i=1,i\neq l}^{P}h_{lijh}+\sum_{i=1,i\neq l}^{P}h_{iljh}=0.\tag{35}$$
Noting that the multipliers can be taken anti-symmetric, $h_{iljh}=-h_{lijh}$, this yields
$$V_{ljh}=\frac{\sum_{k=1}^{\left|Y_{l}\right|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}w_{ljh}X_{lkh}-\sum_{i=1,i\neq l}^{P}h_{lijh}}{\sum_{k=1}^{\left|Y_{l}\right|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}w_{ljh}},\quad(\forall l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r}).\tag{36}$$
$h_{lijh}$ is calculated by the Picard iteration below, with $\alpha_{1}$ being a positive scalar:
$$h_{lijh}=h_{lijh}+\alpha_{1}\left(V_{ljh}-V_{ijh}\right),\quad(\forall i\neq l;\ i,l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r}).\tag{37}$$
(C) By a similar calculation to (B), we take the Lagrangian function with respect to W:
$$L(W)=\sum_{l=1}^{P}\sum_{k=1}^{\left|Y_{l}\right|}\sum_{j=1}^{C}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\sum_{h=1}^{r}w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}+\gamma\sum_{l=1}^{P}\sum_{j=1}^{C}\sum_{h=1}^{r}w_{ljh}\log w_{ljh}-\sum_{l=1}^{P}\sum_{j=1}^{C}\lambda_{lj}\sum_{h=1}^{r}\left(w_{ljh}-1\right)+\sum_{l=1}^{P}\sum_{i=1,i\neq l}^{P}\sum_{j=1}^{C}\sum_{h=1}^{r}\Delta_{lijh}\left(w_{ljh}-w_{ijh}\right).\tag{38}$$
Setting the derivative to zero,
$$\frac{\partial L}{\partial w_{ljh}}=\sum_{k=1}^{\left|Y_{l}\right|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\left\|X_{lkh}-V_{ljh}\right\|^{2}+\gamma\left(\log w_{ljh}+1\right)-\lambda_{lj}+\sum_{i=1,i\neq l}^{P}\left(\Delta_{lijh}-\Delta_{iljh}\right)=0,\quad(\forall l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r}),\tag{39}$$
and, taking the multipliers anti-symmetric ($\Delta_{iljh}=-\Delta_{lijh}$), we obtain
$$w_{ljh}=\exp\left(-\frac{1}{\gamma}\left(\sum_{k=1}^{\left|Y_{l}\right|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\left\|X_{lkh}-V_{ljh}\right\|^{2}+\gamma-\lambda_{lj}+2\sum_{i=1,i\neq l}^{P}\Delta_{lijh}\right)\right).\tag{40}$$
From constraint (19),
$$\exp\left(\frac{\lambda_{lj}}{\gamma}\right)=\frac{1}{\sum_{h'=1}^{r}\exp\left(-\frac{1}{\gamma}\left(\sum_{k=1}^{\left|Y_{l}\right|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\left\|X_{lkh'}-V_{ljh'}\right\|^{2}+\gamma+2\sum_{i=1,i\neq l}^{P}\Delta_{lijh'}\right)\right)},\tag{41}$$
so that
$$w_{ljh}=\frac{\exp\left(-\frac{1}{\gamma}\left(\sum_{k=1}^{\left|Y_{l}\right|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\left\|X_{lkh}-V_{ljh}\right\|^{2}+\gamma+2\sum_{i=1,i\neq l}^{P}\Delta_{lijh}\right)\right)}{\sum_{h'=1}^{r}\exp\left(-\frac{1}{\gamma}\left(\sum_{k=1}^{\left|Y_{l}\right|}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\left\|X_{lkh'}-V_{ljh'}\right\|^{2}+\gamma+2\sum_{i=1,i\neq l}^{P}\Delta_{lijh'}\right)\right)}.\tag{42}$$
$\Delta_{lijh}$ is calculated by the Picard iteration below, with $\alpha_{2}$ being a positive scalar:
$$\Delta_{lijh}=\Delta_{lijh}+\alpha_{2}\left(w_{ljh}-w_{ijh}\right),\quad(\forall i\neq l;\ i,l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r}).\tag{43}$$
(D) Fix $W,V,u,\xi$; the Lagrangian function with respect to $\eta$ is:
$$L(\eta)=\sum_{l=1}^{P}\sum_{k=1}^{\left|Y_{l}\right|}\sum_{j=1}^{C}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\sum_{h=1}^{r}w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}+\gamma\sum_{l=1}^{P}\sum_{j=1}^{C}\sum_{h=1}^{r}w_{ljh}\log w_{ljh}-\sum_{l=1}^{P}\sum_{k=1}^{\left|Y_{l}\right|}\lambda_{lk}\left(\sum_{j=1}^{C}\left(\eta_{lkj}+\frac{\xi_{lkj}}{C}\right)-1\right).\tag{44}$$
Taking the derivative with respect to $\eta_{lkj}$,
$$\frac{\partial L(\eta)}{\partial \eta_{lkj}}=\sum_{h=1}^{r}\left(\frac{u_{lkj}}{1-\eta_{lkj}-\xi_{lkj}}\right)^{m}\frac{m}{1-\eta_{lkj}-\xi_{lkj}}\,w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}-\lambda_{lk}=0,\quad(\forall l=\overline{1,P};\ k=\overline{1,\left|Y_{l}\right|};\ j=\overline{1,C};\ h=\overline{1,r}),\tag{45}$$
$$\eta_{lkj}=1-\xi_{lkj}-\left(\frac{m\,u_{lkj}^{m}\sum_{h=1}^{r}w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}}{\lambda_{lk}}\right)^{\frac{1}{m+1}}.\tag{46}$$
Determining $\lambda_{lk}$ from constraint (18) and substituting it back into (46) gives
$$\eta_{lkj}=1-\xi_{lkj}-\frac{(C-1)\left(1-\frac{1}{C}\sum_{i=1}^{C}\xi_{lki}\right)}{\sum_{i=1}^{C}\left(\frac{u_{lki}^{m}\sum_{h=1}^{r}w_{lih}\left\|X_{lkh}-V_{lih}\right\|^{2}}{u_{lkj}^{m}\sum_{h=1}^{r}w_{ljh}\left\|X_{lkh}-V_{ljh}\right\|^{2}}\right)^{\frac{1}{m+1}}},\quad(\forall l=\overline{1,P};\ k=\overline{1,\left|Y_{l}\right|};\ j=\overline{1,C}).\tag{47}$$
(E) Once we have $u_{lkj}$ and $\eta_{lkj}$, from constraint (16) we can calculate the refusal degrees as follows:
$$\xi_{lkj}=1-\left(u_{lkj}+\eta_{lkj}\right)-\left(1-\left(u_{lkj}+\eta_{lkj}\right)^{\alpha}\right)^{1/\alpha},\quad(\forall l=\overline{1,P};\ k=\overline{1,\left|Y_{l}\right|};\ j=\overline{1,C}).\tag{48}$$
Notice that $\alpha>0$ is an exponent coefficient used to control the refusal degree in (48). This completes the proof.
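As a quick sanity check of Eq. (48) (the numbers are ours, chosen only for illustration), if $u_{lkj}+\eta_{lkj}=0.5$ and $\alpha=0.6$:

```python
# xi = 1 - s - (1 - s**alpha)**(1/alpha), with s = u + eta (Eq. (48))
s, alpha = 0.5, 0.6
xi = 1.0 - s - (1.0 - s**alpha) ** (1.0 / alpha)
print(round(xi, 3))    # ~0.334
print(s + xi <= 1.0)   # constraint (16) holds: u + eta + xi <= 1
```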
3.3. The DPFCM algorithm

In this section, we present the DPFCM algorithm in detail. Its listing, reconstructed from the model above, is as follows (S marks operations in the Slave peers; M marks operations in the Master peer):

Distributed Picture Fuzzy Clustering Method (DPFCM)

Input:
- Subsets $Y_{l}$ ($l=\overline{1,P}$) of the dataset X, stored at the Slave peers
- Number of clusters: C
- Number of peers: P + 1
- Fuzzifier m
- Threshold $\varepsilon>0$
- Parameters: $\gamma$, $\alpha_{1}$, $\alpha_{2}$, $\alpha$, maxIter

Output: $\{V_{ljh}\mid l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r}\}$; $\{(u_{lkj},\eta_{lkj},\xi_{lkj})\mid l=\overline{1,P};\ k=\overline{1,\left|Y_{l}\right|};\ j=\overline{1,C}\}$; $\{w_{ljh}\mid l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r}\}$

1S: Set the number of iterations t = 0; set $\Delta_{lijh}(t)=h_{lijh}(t)=0$ ($\forall i\neq l;\ i,l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r}$); randomize $\{(u_{lkj}(t),\eta_{lkj}(t),\xi_{lkj}(t))\}$ satisfying (16); set $w_{ljh}(t)=1/r$ ($l=\overline{1,P};\ j=\overline{1,C};\ h=\overline{1,r}$).
2S: Calculate cluster centers $V_{ljh}(t)$ from $(u_{lkj}(t),\eta_{lkj}(t),\xi_{lkj}(t))$, $w_{ljh}(t)$ and $h_{lijh}(t)$ by (24).
3S: Calculate attribute-weights $w_{ljh}(t+1)$ from $(u_{lkj}(t),\eta_{lkj}(t),\xi_{lkj}(t))$, $V_{ljh}(t)$ and $\Delta_{lijh}(t)$ by (26).
4S: Send $\{\Delta_{lijh}(t),h_{lijh}(t),V_{ljh}(t),w_{ljh}(t+1)\}$ to the Master.
5M: Calculate $\{\Delta_{lijh}(t+1),h_{lijh}(t+1)\}$ by (23) and (25) and send them to the Slave peers.
6S: Calculate cluster centers $V_{ljh}(t+1)$ from $(u_{lkj}(t),\eta_{lkj}(t),\xi_{lkj}(t))$, $w_{ljh}(t+1)$ and $h_{lijh}(t+1)$ by (24).
7S: Calculate positive degrees $u_{lkj}(t+1)$ from $(\eta_{lkj}(t),\xi_{lkj}(t))$, $w_{ljh}(t+1)$ and $V_{ljh}(t+1)$ by (22).
8S: Compute neutral degrees $\eta_{lkj}(t+1)$ from $(u_{lkj}(t+1),\xi_{lkj}(t))$, $w_{ljh}(t+1)$ and $V_{ljh}(t+1)$ by (27).
9S: Calculate refusal degrees $\xi_{lkj}(t+1)$ from $(u_{lkj}(t+1),\eta_{lkj}(t+1))$ by (28).
10S: If $\max_{l}\max\{\left\|u_{lkj}(t+1)-u_{lkj}(t)\right\|,\left\|\eta_{lkj}(t+1)-\eta_{lkj}(t)\right\|,\left\|\xi_{lkj}(t+1)-\xi_{lkj}(t)\right\|\}<\varepsilon$ or t > maxIter, stop the algorithm; otherwise set t = t + 1 and return to Step 3S.
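A compact single-process sketch of this loop follows; it simulates the Master–Slave exchange in memory rather than over MPI, and all array shapes, default parameter values and helper names are our own assumptions built on Eqs. (22)–(28):

```python
import numpy as np

def dpfcm(datasets, C, m=2.0, gamma_=1.0, a1=0.01, a2=0.01, alpha=0.6,
          eps=1e-4, max_iter=100, seed=0):
    """Simulate the DPFCM loop; datasets[l] is the (N_l, r) data of Slave peer l."""
    rng = np.random.default_rng(seed)
    P, r = len(datasets), datasets[0].shape[1]
    # 1S: random degrees with u + eta + xi <= 1 (constraint (16)), w = 1/r
    deg = [rng.dirichlet(np.ones(4), size=(X.shape[0], C)) for X in datasets]
    u, eta, xi = ([d[..., c] for d in deg] for c in range(3))
    w = [np.full((C, r), 1.0 / r) for _ in range(P)]
    h = np.zeros((P, P, C, r))    # Lagrangian multipliers h_lijh, Eq. (23)
    D = np.zeros((P, P, C, r))    # Lagrangian multipliers Delta_lijh, Eq. (25)

    sq = lambda l, V: (datasets[l][:, None, :] - V[None, :, :]) ** 2
    up = lambda l: (u[l] / np.clip(1.0 - eta[l] - xi[l], 1e-12, None)) ** m

    def centers(l):               # Eq. (24)
        num = np.einsum('kj,jh,kh->jh', up(l), w[l], datasets[l]) - h[l].sum(0)
        return num / np.clip(np.einsum('kj,jh->jh', up(l), w[l]), 1e-12, None)

    def weights(l, Vl):           # Eq. (26): a softmax over the r attributes
        cost = np.einsum('kj,kjh->jh', up(l), sq(l, Vl)) + gamma_ + 2.0 * D[l].sum(0)
        z = -cost / gamma_
        e = np.exp(z - z.max(1, keepdims=True))       # numerically stable
        return e / e.sum(1, keepdims=True)

    V = [centers(l) for l in range(P)]                        # 2S
    for _ in range(max_iter):
        w = [weights(l, V[l]) for l in range(P)]              # 3S
        for l in range(P):                                    # 4S + 5M
            for i in range(P):
                if i != l:
                    h[l, i] += a1 * (V[l] - V[i])             # Eq. (23)
                    D[l, i] += a2 * (w[l] - w[i])             # Eq. (25)
        V = [centers(l) for l in range(P)]                    # 6S
        shift = 0.0
        for l in range(P):
            S = np.einsum('jh,kjh->kj', w[l], sq(l, V[l]))    # sum_h w ||X-V||^2
            S = np.clip(S, 1e-12, None)
            ratio = (S[:, :, None] / S[:, None, :]) ** (1.0 / (m - 1.0))
            un = (1.0 - eta[l] - xi[l]) / ratio.sum(2)                    # 7S, Eq. (22)
            shift = max(shift, np.abs(un - u[l]).max()); u[l] = un
            T = (np.clip(u[l], 1e-12, None) ** m * S) ** (1.0 / (m + 1.0))
            total = (C - 1.0) * (1.0 - xi[l].sum(1, keepdims=True) / C)
            en = 1.0 - xi[l] - total * T / T.sum(1, keepdims=True)        # 8S, Eq. (27)
            shift = max(shift, np.abs(en - eta[l]).max()); eta[l] = en
            s = np.clip(u[l] + eta[l], 0.0, 1.0)
            xn = 1.0 - s - (1.0 - s ** alpha) ** (1.0 / alpha)            # 9S, Eq. (28)
            shift = max(shift, np.abs(xn - xi[l]).max()); xi[l] = xn
        if shift < eps:                                       # 10S
            break
    return V, u, eta, xi, w
```

For instance, `dpfcm([Y1, Y2, Y3], C=3)` mimics one Master and three Slave peers holding subsets Y1, Y2, Y3 of a normalized dataset.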
3.4. The theoretical analyses of DPFCM
In this section, we analyze the DPFCM algorithm in terms of the advantages and disadvantages of the proposed work. As we can recognize in Section 3.1, DPFCM is built from an optimization problem aiming to derive the cluster centers accompanied by the attribute-weights and the positive, the neutral and the refusal memberships of data points from a given dataset and a facilitator system. By using the Lagrangian method and the Picard iteration, the optimal solutions of the problem are determined as in Eqs. (22)–(28). We clearly see that the cluster centers (24), the attribute-weights (26), and the positive (22), the neutral (27) and the refusal memberships (28) are affected by the facilitator model through the use of the two Lagrangian multipliers expressed in Eqs. (23) and (25): $h_{lijh}$ is updated in the Master peer from the centers of all peers and will be used in the next step of the cluster centers in (24). Similarly, $\Delta_{lijh}$ contributes greatly to the changes of the values of the attribute-weights and, through them, of all memberships and the cluster centers. Like $h_{lijh}$, $\Delta_{lijh}$ is updated in the Master peer by the results of other peers. Using the facilitator model in this case, expressed by the activities of the two Lagrangian multipliers $h_{lijh}$ and $\Delta_{lijh}$, helps the local results in a peer to be updated with those of other peers, so that the local clustering outputs could reach the global optimum. Besides the facilitator model, the utilization of the various memberships of the PFS set improves the clustering quality of the algorithm. That is to say, each quantity is calculated based upon the dataset, the previous cluster centers and memberships, thus regulating the next results accordingly; this is not just a reflection of the ideas stated in Section 1 but also an expression of the calculation process, which can be easily followed. The advantages of the proposed algorithm are threefold. Firstly, the proposed clustering algorithm could be applied to various practical problems requiring fast processing of huge datasets. In fact, since the activities of the algorithm are simultaneously performed in all peers, the total operating time is reduced as a result. The clustering quality of outputs is also better than those of the relevant distributed clustering algorithms according to our theoretical analyses in Section 1. Secondly, the proposed algorithm is easy to implement and could be adapted to many parallel processing models such as the Message Passing Interface (MPI), Open Multi-Processing (OpenMP), Local Area Multicomputer (LAM/MPI), etc. Thirdly, the design of the DPFCM algorithm in this article could be a know-how tutorial for the development of fuzzy clustering algorithms on advanced fuzzy sets like the PFS set. Besides the advantages, the proposed work still contains some limitations, as follows. Firstly, the DPFCM algorithm has large computational time in comparison with some relevant algorithms such as FCM, PFCM, Soft-DKM, WEFCM and CDFCM, due to extra computation on the membership degrees
and the results of all peers. Secondly, the number of peers could affect the clustering quality of outputs: a large number of peers may enhance the clustering quality, but it also increases the computational time of the algorithm. How many peers are enough to balance between the clustering quality and the computational time? In the experimental section, we will validate these remarks as well as find the answers to these questions.
4. Evaluation

4.1. Experimental environment
In this part, we describe the experimental environment as follows.

- Experimental tools: we have implemented the proposed DPFCM algorithm in the MPI/C programming language and executed it on a PC with an Intel Pentium 4 CPU at 3.4 GHz, 4 GB RAM and a 160 GB HDD. The experimental results are taken as the average values after 100 runs. DPFCM is compared against relevant algorithms such as WEFCM (Zhou & Philip Chen, 2011) and CDFCM (Zhou et al., 2013).
- Experimental dataset: the benchmark datasets of the UCI Machine Learning Repository (Bache & Lichman, 2013) are used, namely IRIS, GLASS, IONOSPHERE, HABERMAN and HEART. IRIS is a standard dataset consisting of 150 instances with three classes and four attributes, in which each class contains 50 instances. GLASS contains 214 instances, 6 classes and 9 attributes, which are refractive index, sodium, magnesium, aluminum, silicon, potassium, calcium, barium, and iron. IONOSPHERE contains 351 instances of radar data, 34 attributes and 2 classes, where "Good" radar shows evidence of some types of structures in the ionosphere and "Bad" returns those that do not. HABERMAN contains cases from a study that was conducted between 1958 and 1970 at the University of Chicago's Billings Hospital on the survival of patients who had undergone surgery for breast cancer; it contains 306 instances, 3 attributes and 2 classes. HEART shows the information of heart diseases, including 270 instances, 13 attributes and 2 classes. Table 1 gives an overview of those datasets.
These datasets are normalized by a simple min–max normalization over each attribute,
$$X_{i}\leftarrow\frac{X_{i}-\min_{i}\{X_{i}\}}{\max_{i}\{X_{i}\}-\min_{i}\{X_{i}\}},\tag{49}$$
and the subsets $Y_{j}$ ($j=\overline{1,P}$) are generated by random selection from the original dataset, satisfying the condition $\bigcup_{j=1}^{P}Y_{j}=X$ (a sketch of this normalization and of the validity indices below is given at the end of this subsection).
- Parameter setting: in order to make an accurate comparison with the relevant algorithms, the common parameters are kept the same across all methods, with maxIter = 1000.
- Cluster validity measurement: we use the Average Iteration Number (AIN), the Average Classification Rate (ACR) (Eq. (50)) and the Average Normalized Mutual Information (ANMI) (Eq. (51)). ACR and ANMI are the-larger-the-better validity indices, whilst AIN is the-smaller-the-better. The classification rate is
$$CR=\frac{\sum_{k=1}^{K}d_{k}}{N},\tag{50}$$
where $d_{k}$ is the number of objects correctly identified in the kth cluster and N is the total number of objects in the dataset.
$$NMI(R,Q)=\frac{\sum_{i=1}^{I}\sum_{j=1}^{J}P(i,j)\log\frac{P(i,j)}{P(i)P(j)}}{\sqrt{H(R)H(Q)}},\tag{51}$$
where R, Q are two partitions of the dataset having I and J clusters, respectively. P(i) is the probability that a randomly selected object from the dataset falls into cluster $R_{i}$ in the partition R. P(i,j) is the probability that an object belongs to cluster $R_{i}$ in R and to cluster $Q_{j}$ in Q. H(R) is the entropy associated with the probabilities P(i) ($1\leqslant i\leqslant I$) in partition R. AIN, ACR and ANMI are the average results after 100 runs.

[Table 1. The descriptions of experimental datasets.]
[Fig. 2. The initiation of peer 2.]
[Fig. 3. The initiation of peer 3.]
[Fig. 4. The communication in each iteration step.]
- Objective: (a) to illustrate the activities of DPFCM in classifying a specific benchmark dataset of the UCI Machine Learning Repository; (b) to evaluate the clustering qualities of the algorithms through validity indices; (c) to measure the effect of the number of peers on the clustering quality; (d) to investigate the computational time of all algorithms.
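The following sketch implements the normalization (49) and the two validity indices (50)–(51); the function names, the assumption of integer class labels, and the majority-label matching used for $d_k$ are our own choices:

```python
import numpy as np

def min_max_normalize(X):
    """Eq. (49): scale every attribute of X (n, r) into [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)

def classification_rate(labels, clusters):
    """CR, Eq. (50): d_k counts cluster k's objects carrying its majority class."""
    labels, clusters = np.asarray(labels), np.asarray(clusters)
    correct = sum(np.bincount(labels[clusters == k]).max()
                  for k in np.unique(clusters))
    return correct / labels.size

def nmi(r, q):
    """NMI, Eq. (51), from the joint distribution P(i, j) of two partitions."""
    r, q = np.asarray(r), np.asarray(q)
    joint = np.zeros((r.max() + 1, q.max() + 1))
    np.add.at(joint, (r, q), 1.0)
    p = joint / joint.sum()
    pi, pj = p.sum(axis=1), p.sum(axis=0)
    mask = p > 0
    mi = (p[mask] * np.log(p[mask] / np.outer(pi, pj)[mask])).sum()
    ent = lambda x: -(x[x > 0] * np.log(x[x > 0])).sum()
    return mi / np.sqrt(ent(pi) * ent(pj))
```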
4.2. An illustration of DPFCM
Firstly, we illustrate the activities of the proposed algorithm DPFCM in classifying the IRIS dataset. In this case, N = 150, r = 4, C = 3, and the number of peers is P = 3. The cardinalities of the first, second and third peers are 38, 39 and 73, respectively. The initial positive, neutral and refusal matrices of the first peer (only the first five rows are shown) are initialized in (52)–(54), respectively:
$$u(0)=\begin{pmatrix}0.082100&0.836100&0.011500\\0.722100&0.002400&0.930900\\0.365000&0.983200&0.578800\\0.002900&0.199000&0.608400\\0.116700&0.462500&0.932100\end{pmatrix},\tag{52}$$
$$\eta(0)=\begin{pmatrix}0.052229&0.123007&0.143827\\0.131697&0.878686&0.036471\\0.466915&0.002841&0.415851\\0.034799&0.450723&0.213030\\0.747537&0.094331&0.017593\end{pmatrix},\tag{53}$$
[Fig. 5. The distribution of clusters of Peer 1 in the second iteration.]
[Fig. 6. The distribution of clusters of Peer 2 in the second iteration.]
$$\xi(0)=\begin{pmatrix}0.477245&0.020119&0.244364\\0.076669&0.050943&0.026051\\0.065771&0.000965&0.001082\\0.560059&0.075800&0.085142\\0.113362&0.394597&0.014931\end{pmatrix}.\tag{54}$$
From this initialization, the distribution of clusters of the first peer in the first iteration is depicted in Fig. 1. Similarly, the distributions of clusters of the second and third peers in the first iteration are depicted in Figs. 2 and 3, respectively. Now, we illustrate the activities of the first peer. The cluster centers $V_{ljh}(0)$ calculated by Eq. (24) are expressed in Eq. (55):
$$V(0)=\begin{pmatrix}0.568&0.523&0.603&0.466\\0.563&0.564&0.608&0.470\\0.546&0.542&0.619&0.494\end{pmatrix}.\tag{55}$$
The attribute-weights $w_{ljh}(1)$ are computed from (26) and shown in (56):
$$w(1)=\begin{pmatrix}0.137&0.225&0.291&0.346\\0.181&0.268&0.253&0.298\\0.185&0.234&0.273&0.309\end{pmatrix}.\tag{56}$$
Now all Slave peers synchronize their pairs $\{\Delta_{lijh}(0),h_{lijh}(0),V_{ljh}(0),w_{ljh}(1)\}$ with the Master, as depicted in Fig. 4. The values of the Lagrangian multipliers $\Delta_{lijh}(1)$ and $h_{lijh}(1)$ in all peers are shown in (57) and (58), respectively (printed at the end of this subsection). The cluster centers $V_{ljh}(1)$ are then updated accordingly:
$$V(1)=\begin{pmatrix}0.505968&0.482254&0.573009&0.467232\\0.496019&0.529548&0.542298&0.452127\\0.495326&0.510318&0.562189&0.474282\end{pmatrix}.\tag{59}$$
Based upon $V_{ljh}(1)$, $w_{ljh}(1)$ and the updated Lagrangian multipliers, the new positive, neutral and refusal matrices of the first peer are calculated in (60)–(62), respectively:
$$u(1)=\begin{pmatrix}0.146003&0.306919&0.202825\\0.244076&0.025787&0.304908\\0.146620&0.359057&0.189975\\0.129143&0.158800&0.242726\\0.045708&0.161369&0.344092\end{pmatrix},\tag{60}$$
$$\eta(1)=\begin{pmatrix}0.293576&0.638569&0.594259\\0.740159&0.863683&0.634079\\0.821898&0.585084&0.796155\\0.232520&0.758352&0.726713\\0.640528&0.534759&0.625413\end{pmatrix},\tag{61}$$
[Fig. 7. The distribution of clusters of Peer 3 in the second iteration.]
$$\Delta(1)=\begin{pmatrix}0.000&0.000&0.000&0.000&0.171&0.180&0.160&0.191&0.139&0.359&0.214&0.284\\0.000&0.000&0.000&0.000&0.185&0.061&0.120&0.126&0.056&0.399&0.196&0.260\\0.000&0.000&0.000&0.000&0.195&0.087&0.140&0.141&0.049&0.457&0.227&0.279\end{pmatrix},\tag{57}$$
$$h(1)=\begin{pmatrix}0.000&0.000&0.000&0.000&0.159&0.157&0.181&0.099&0.065&0.081&0.044&0.113\\0.000&0.000&0.000&0.000&0.169&0.098&0.282&0.183&0.118&0.123&0.111&0.054\\0.000&0.000&0.000&0.000&0.148&0.091&0.276&0.192&0.092&0.099&0.122&0.033\end{pmatrix}.\tag{58}$$
...GLASS, IONOSPHERE, HABERMAN and HEART IRIS is a standard
data set consisting of 150 instances with three classes and four
attributes in which each class contains of 50 instances... Besides the advantages, the proposed work still contains some limitations as follows Firstly, the DPFCM algorithm has large computational time in comparison with some relevant algorithms such as FCM,... 1gives an overview
of those datasets
These datasets are normalized by a simple normalization
X
Xi
max
i