HIFCF: An effective hybrid model between picture fuzzy clustering
and intuitionistic fuzzy recommender systems for medical diagnosis
VNU University of Science, Vietnam National University, Hanoi, Viet Nam
Article info
Article history:
Available online 31 December 2014
Keywords:
Fuzzy sets
Hybrid Intuitionistic Fuzzy Collaborative Filtering
Intuitionistic fuzzy recommender systems
Medical diagnosis
Picture fuzzy clustering
Abstract
The health care support system is a special type of recommender system that plays an important role in medical sciences nowadays. Such systems often provide a medical diagnosis function that uses the historic clinical symptoms of patients to give a list of possible diseases accompanied by membership values. The most likely disease in that list is then determined from clinicians' experience, expressed through a specific defuzzification method. An important issue in the health care support system is increasing the accuracy of the medical diagnosis function. This involves the cooperation of fuzzy systems and recommender systems, in the sense that uncertain behaviors of symptoms and the clinicians' experience are represented by fuzzy memberships, whilst the possible diseases are determined by the prediction capability of recommender systems. Intuitionistic fuzzy recommender systems (IFRS) are such a combination, and they yield better prediction accuracy than the relevant methods built on either traditional fuzzy sets or recommender systems alone. Based on the observation that the calculation of similarity in IFRS could be enhanced by integrating the possibility of patients belonging to clusters specified by a fuzzy clustering method, in this paper we propose a novel hybrid model between picture fuzzy clustering and intuitionistic fuzzy recommender systems for medical diagnosis, called HIFCF (Hybrid Intuitionistic Fuzzy Collaborative Filtering). Experimental results reveal that HIFCF obtains better accuracy than IFCF and than the standalone methods based on intuitionistic fuzzy sets, such as De, Biswas & Roy, Szmidt & Kacprzyk, and Samuel & Balamurugan, and on recommender systems, e.g. Davis et al. and Hassan & Syed. The significance and impact of the new method lie not only in the theoretical aspects of recommender systems but also in its applicability to health care support systems.
© 2014 Elsevier Ltd. All rights reserved.
1 Introduction
In recent years, the health care support system, or clinical decision support system, has emerged as an important tool in medical sciences to assist clinicians in decision making, especially in medical diagnosis: specifying which diseases could be inferred from a list of measured symptoms of a patient, as well as the most likely disease among them. Physicians, nurses and other healthcare professionals use the health care support system to prepare a diagnosis and to review the diagnosis as a means of improving the final result. According to Basu, Fevrier-Thomas, and Sartipi (2011), health care support systems are defined as computer applications that support and assist clinicians in improved decision-making by providing evidence-based knowledge with respect to patient data. This type of computer-based system consists of three components: a language system, a knowledge system and a problem processing system. It is able to handle complex problems, applying domain-specific expertise to assess the consequences of executing its recommendations. There are two main types of health care support system (Rouse, 2014). The first uses a knowledge base, applies rules to patient data using an inference engine and displays the results to the end user. Systems without a knowledge base, on the other hand, rely on machine learning to analyze clinical data (Fig. 1). Machine learning methods examine patients' medical history in conjunction with relevant clinical research, and are able to predict potential events ranging from drug interactions to disease symptoms. In the medical diagnosis process, characteristics of an individual patient are matched to a computerized clinical
http://dx.doi.org/10.1016/j.eswa.2014.12.042
0957-4174/© 2014 Elsevier Ltd. All rights reserved.
Corresponding author at: 334 Nguyen Trai, Thanh Xuan, Hanoi, Viet Nam. Tel.: +84 904 171 284.
E-mail addresses: nguyenthothongtt89@gmail.com (N.T. Thong), sonlh@vnu.edu.vn, chinhson2002@gmail.com (L.H. Son).
Expert Systems with Applications
Journal homepage: www.elsevier.com/locate/eswa
knowledge base, and patient-specific assessments and recommendations are then presented to the clinician or the patient for a decision (Rajalakshmi, Mohan, & Babu, 2011).
An important issue in the health care support system is increasing the accuracy of the medical diagnosis. Previous research concentrated on improving the machine learning methods and knowledge systems that appear in Phase 2 of the medical diagnosis process in Fig. 1:
• A hybrid evolutionary algorithm between genetic programming and genetic algorithms (Tan, Yu, Heng, & Lee, 2003)
• Genetic algorithm (Anbarasi, Anupriya, & Iyengar, 2010)
• The combination of a type-2 fuzzy logic with genetic algorithm
• An evolutionary artificial neural network approach based on the Pareto differential evolution algorithm augmented with local search (Abbass, 2002)
• Complex modular neural network (Kala, Janghel, Tiwari, &
• Bayesian networks (Gevaert, De Smet, Timmerman, Moreau, &
• C4.5 Rule-PANE, which combines an artificial neural network ensemble with rule induction (Zhou & Jiang, 2003)
• Support vector machines (Kampouraki, Vassis, Belsis, &
However, these methods often fail to achieve high prediction accuracy on real medical diagnosis datasets. This is because the relations between patients and symptoms and between symptoms and diseases (Fig. 1) are often vague, imprecise and uncertain. For instance, doctors may be faced with patients who have personal problems and/or mental disorders, so that crucial signs and symptoms are missing, incomplete or vague even though the patients' medical histories and physical examinations are available for the diagnosis. Even if patient information is clearly provided, evaluating the given symptoms and diseases accurately is another challenge that requires well-trained, highly experienced physicians. This evidence raises the need to use fuzzy sets or their extensions to model and support the techniques that improve the accuracy of diagnosis. The definition of a fuzzy set is stated below.
Definition 1. A Fuzzy Set (FS) (Zadeh, 1965) in a non-empty set X is a function
$$\mu : X \to [0, 1], \qquad x \mapsto \mu(x), \tag{1}$$
where $\mu(x)$ is the membership degree of each element $x \in X$. A fuzzy set can alternatively be defined as
$$A = \{\langle x, \mu(x)\rangle \mid x \in X\}. \tag{2}$$
An extension of FS that is widely applied to the medical prognosis problem is the Intuitionistic Fuzzy Set (IFS), which is defined as follows.
Definition 2. An Intuitionistic Fuzzy Set (IFS) (Atanassov, 1986) in a non-empty set X is
$$\tilde{A} = \{\langle x, \mu_{\tilde{A}}(x), \gamma_{\tilde{A}}(x)\rangle \mid x \in X\}, \tag{3}$$
where $\mu_{\tilde{A}}(x)$ and $\gamma_{\tilde{A}}(x)$ are the membership and non-membership degrees of each element $x \in X$, respectively, satisfying
$$\mu_{\tilde{A}}(x), \gamma_{\tilde{A}}(x) \in [0, 1], \quad \forall x \in X, \tag{4}$$
$$0 \le \mu_{\tilde{A}}(x) + \gamma_{\tilde{A}}(x) \le 1, \quad \forall x \in X. \tag{5}$$
The intuitionistic fuzzy index of an element, showing its non-determinacy, is denoted as
$$\pi_{\tilde{A}}(x) = 1 - \left(\mu_{\tilde{A}}(x) + \gamma_{\tilde{A}}(x)\right), \quad \forall x \in X. \tag{6}$$
When $\pi_{\tilde{A}}(x) = 0$ for all $x \in X$, the IFS reduces to the FS of Zadeh.
Various studies utilizing FS and IFS for the medical diagnosis process can be found in the literature. De, Biswas, and Roy (2001) extended Sanchez's approach with the notion of intuitionistic fuzzy set theory for medical diagnosis. The information on the symptoms–patients and symptoms–diseases relations is fuzzified by intuitionistic fuzzy memberships, and the possibilities of acquired diseases are calculated from those membership values and intuitionistic fuzzy relations. Szmidt and Kacprzyk (2001), Szmidt [...]
of intuitionistic fuzzy set to express new aspects of imperfect information between the sets of symptoms and diagnoses, and defined a new similarity measure between intuitionistic fuzzy sets for the [...] and intuitionistic fuzzy sets to handle uncertainty in medical pattern recognition. The experimental results showed that both fuzzy sets and intuitionistic fuzzy sets have powerful capabilities to cope with the uncertainty in medical pattern recognition problems, but intuitionistic fuzzy sets, especially with the measures of Hausdorff and Mitchell, yield a better detection rate as a result of more accurate modeling, at a higher computational cost. Own (2009) studied the switching relation between type-2 fuzzy sets and intuitionistic fuzzy sets to deal with the [...] for medical diagnosis without concern for how to calculate the best membership function for each fuzzy datum. Neog and Sut [...] extended Sanchez's approach for medical diagnosis using the notion of fuzzy soft complement. Xiao et al. (2012) proposed the concept of D–S generalized fuzzy soft sets by combining the Dempster–Shafer theory of evidence and generalized fuzzy soft sets. A new method of evaluation based on D–S generalized fuzzy soft sets [...] intuitionistic fuzzy soft set and a new scoring function to compare two
[Fig. 1: diagram with the boxes "Clinical Data (Patients-Symptoms)", "Knowledge System / Machine Learning", "Consequent Rules" and "Results (Patients-Diseases)", labeled Phase 1 to Phase 4.]
intuitionistic fuzzy numbers for multi-criteria medical diagnosis. [...] Sanchez's approach for medical diagnosis through the arithmetic mean of an interval-valued fuzzy matrix, which is a simpler technique than using intuitionistic fuzzy sets. Ahn, Han, Oh, [...] degrees based on the relation between symptoms and diseases (three types of headache), and utilized the interval-valued intuitionistic fuzzy weighted arithmetic average operator to aggregate fuzzy information from the symptoms. A measure based on distance between interval-valued intuitionistic fuzzy sets for medical [...] proposed a new technique named intuitionistic fuzzy max–min composition to study Sanchez's approach for medical [...] intuitionistic fuzzy sets and fuzzy multisets of Yager. Intuitionistic fuzzy multisets are characterized by the count membership and the count non-membership functions, and when the sum of these functions is equal to one, intuitionistic fuzzy multisets reduce to intuitionistic fuzzy sets. Intuitionistic fuzzy multisets are used to model the symptoms at various timestamps. Other recent works can be found in Ahn (2014), Bora, Bora, Neog, and Sut (2014), Bourgani, Stylios, Manis, and Georgopoulos (2014), Das and Kar (2014), Muthuvijayalakshmi, Kumar, and Venkatesan (2014), Nguyen, Khosravi, Creighton, and Nahavandi (2014), Sanz, Galar, Jurio, Brugos, Pagola, et al. (2014), Shanmugasundaram and [...]
The limitations of the relevant research utilizing FS and IFS for the medical diagnosis process are as follows. Firstly, these works calculate the relation between the patients and the diseases solely from the relations between the patients and the symptoms and between the symptoms and the diseases. In practical cases where either of these relations is missing, those works cannot be applied. This happens in reality, since clinicians sometimes cannot accurately express the membership and non-membership degrees of symptoms to diseases or vice versa. Secondly, the information on previous diagnoses of patients cannot be utilized. That is to say, a patient may already have some records in the patients–diseases database. Nevertheless, the next records of this patient are calculated solely on the basis of the relations between the patients and the symptoms and between the symptoms and the diseases. Historic diagnoses of patients are not taken into account, so the accuracy of diagnosis may suffer as a result. Thirdly, the determination of the most likely disease depends on the defuzzification method. For instance, De et al. (2001) used a hybrid function of membership and non-membership values for the defuzzification, Samuel and Balamurugan (2012) relied on the reduction matrix from WPD, and Szmidt and Kacprzyk (2001), Szmidt and Kacprzyk (2003), Szmidt and Kacprzyk (2004), Khatibi [...] distance functions. A determination that is independent of the defuzzification method should be investigated for stable performance of the algorithm.
For these reasons, a combination of fuzzy sets and a machine learning method is a good choice to eliminate the disadvantages of the relevant works using FS and IFS. Recommender Systems (RS) [...] method, which can give users information about the predicted "rating" or "preference" that they would assign to an item, thus helping them to choose the appropriate item among numerous possibilities. This kind of expert system is now popular in numerous application fields, such as personalized systems for books, documents, images, movies, music, shopping and TV programs. Recommender Systems have been applied to medical [...] proposed CARE, a Collaborative Assessment and Recommendation Engine, which relies only on a patient's medical history to predict future disease risks, and combines collaborative filtering methods with clustering to predict each patient's greatest disease risks based on their own medical history and that of similar patients. An iterative version of CARE, called ICARE, which incorporates ensemble concepts for improved performance, was also introduced. These systems required no specialized information and provided predictions for medical conditions of all kinds in a single [...] framework expressed in Eq. (7) that assessed patient risk both by matching new cases to historical records and by matching patient demographics to adverse outcomes, so that it could achieve a higher predictive accuracy for both sudden cardiac death and [...] approaches such as logistic regression and support vector machines.
$$R(a, i) = \bar{r}_a + \frac{\sum_{b \in U \setminus \{a\}} SIM(a, b)\,(r_{b,i} - \bar{r}_b)}{\sum_{b \in U \setminus \{a\}} |SIM(a, b)|}, \tag{7}$$
where $a, b$ are patients and $i$ is the considered disease. The similarity between two patients, $SIM(a, b)$, is calculated by the Pearson coefficient from the demographic information of patients. $R(a, i)$ and $r_{b,i}$ are the possibilities of acquiring disease $i$ for patients $a$ and $b$, respectively. $\bar{r}_a$ and $\bar{r}_b$ are the average possibilities of acquiring all diseases for patients $a$ and $b$, respectively. More works on the applications of RS to medical diagnosis can be found in Duan, Street, and Xu (2011), Meisamshabanpoor and Mahdavi (2012), [...] and Chau (2010), Son, Cuong, Lanzi, and Thong (2012), Son, Lanzi, Cuong, and Hung (2012), Son, Cuong, and Long (2013), Son, Linh, and Long (2014), Thong and Son (2014), Son (2014a), Son (2014b, [...]
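To make Eq. (7) concrete, the following Python sketch computes the mean-centred, similarity-weighted prediction for one patient and one disease. The dictionary layout, the function names and the toy values are assumptions made for illustration only; in particular, the similarities are supplied directly rather than computed by the Pearson coefficient, and neighbours with no value for the target disease are skipped.

```python
def mean_possibility(ratings, patient):
    """Average possibility over all diseases recorded for a patient (r-bar)."""
    values = ratings[patient].values()
    return sum(values) / len(values)


def predict(ratings, sim, a, i):
    """Eq. (7): R(a,i) = rbar_a + sum SIM(a,b)(r_bi - rbar_b) / sum |SIM(a,b)|."""
    num = 0.0
    den = 0.0
    for b in ratings:
        # Assumption: neighbours without a value for disease i are skipped.
        if b == a or i not in ratings[b]:
            continue
        s = sim[(a, b)]
        num += s * (ratings[b][i] - mean_possibility(ratings, b))
        den += abs(s)
    base = mean_possibility(ratings, a)
    return base if den == 0 else base + num / den


# Toy data (hypothetical): possibilities of diseases d1..d3 per patient.
ratings = {
    "a": {"d1": 0.2, "d2": 0.4},
    "b": {"d1": 0.6, "d2": 0.2, "d3": 0.7},
    "c": {"d1": 0.1, "d3": 0.3},
}
sim = {("a", "b"): 0.8, ("a", "c"): -0.5}
r_a_d3 = predict(ratings, sim, "a", "d3")
```

Dividing by the sum of absolute similarities keeps the correction term bounded even when some neighbours are negatively correlated with the target patient.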
The standalone RS methods, such as the works of Davis et al. [...] one. Moreover, they work only if the historic diagnoses of patients are provided for the prediction, and their diagnostic accuracy depends on the defuzzification method. Therefore, a cooperation of fuzzy systems and recommender systems is regarded as an effective strategy to avoid the drawbacks of the approaches that use FS and IFS only, in the sense that uncertain behaviors of symptoms and the clinicians' experience are represented by fuzzy memberships, whilst the possible diseases are determined by the prediction capability of recommender systems. Intuitionistic fuzzy recommender systems (IFRS) (Son & Thong, 2015) are such a combination, which results in better prediction accuracy than the relevant standalone methods built on either traditional fuzzy sets or recommender systems alone. That work was the first effort to introduce fuzzy-based recommender systems for the health care support system. In that research, new definitions of single-criterion IFRS (SC-IFRS) and multi-criteria IFRS (MC-IFRS) were proposed; they extend the definition of RS by taking into account a feature of a user and a characteristic of an item expressed by intuitionistic linguistic labels. Next, new definitions of the intuitionistic fuzzy matrix (IFM), which is a representation of SC-IFRS and MC-IFRS in matrix format, and of the intuitionistic fuzzy composition matrix (IFCM) of two IFMs with the intersection/union operation were presented and used to design new similarity degrees of IFMs, such as the intuitionistic fuzzy similarity matrix (IFSM) and the intuitionistic fuzzy similarity degree (IFSD). From these similarity functions, a novel method called Intuitionistic Fuzzy Collaborative Filtering (IFCF) was presented for the medical diagnosis problem. IFCF has been validated on benchmark medical diagnosis datasets from the UCI Machine Learning Repository in terms of diagnostic accuracy and showed better performance than the standalone methods of FS and RS.
The motivation and contributions of this paper are as follows. IFCF used IFSD to calculate the similarity between two patients. This measure is the generalization of the hard user-based, item-based and rating-based similarity degrees in RS (Ricci [...] integration with the information on the possibility of patients belonging to clusters specified by a fuzzy clustering method. That is to say, if we know which group the new patient belongs to, then the similarities of this patient with others in the group should be given a high influence in the calculation of IFSD. Therefore, in this paper we propose a novel hybrid model between picture fuzzy clustering and intuitionistic fuzzy recommender systems for medical diagnosis, called Hybrid Intuitionistic Fuzzy Collaborative Filtering (HIFCF). HIFCF makes use of a recent picture fuzzy clustering method, namely the Distributed Picture Fuzzy Clustering Method (DPFCM) (Son, 2015), to classify the patients into groups according to the relation information of patients. Then, the possibility of a patient belonging to a certain cluster is used to calculate the similarity degrees between users. These are supplemented into IFSD to give the final similarity between patients. The new hybrid algorithm HIFCF is validated experimentally on benchmark data from the UCI Machine Learning Repository and compared with the relevant methods in terms of accuracy. The rest of the paper is organized as follows. Section 2 presents the new algorithm HIFCF. Section 3 validates the proposed model by experiments. Section 4 gives the conclusions and future work.
2 The proposed method
In this section, we first recall some principal terms and algorithms of the intuitionistic fuzzy recommender system (IFRS) (Son & [...] Filtering (IFCF) algorithm in Section 2.1. Secondly, we recall one of the best recently published picture fuzzy clustering methods, namely the Distributed Picture Fuzzy Clustering Method (DPFCM) (Son, 2015), which is used to classify the patients into groups according to their relation information, in Section 2.2. Thirdly, the main contribution of the paper, a novel hybrid model between DPFCM and IFRS for medical diagnosis called Hybrid Intuitionistic Fuzzy Collaborative Filtering (HIFCF), is presented in Section 2.3. Lastly, some theoretical analyses of the new algorithm are made in Section 2.4.
2.1 Intuitionistic fuzzy recommender system
Firstly, the definition of medical diagnosis in the light of intuitionistic fuzzy sets is described as follows.
Definition 3 (Medical diagnosis (Son & Thong, 2015)). Given three lists $P = \{P_1, \dots, P_n\}$, $S = \{S_1, \dots, S_m\}$ and $D = \{D_1, \dots, D_k\}$, where P is a list of patients, S a list of symptoms and D a list of diseases. The three values $n, m, k \in \mathbb{N}^+$ are the numbers of patients, symptoms and diseases, respectively. The relation between the patients and the symptoms is characterized by the set $R_{PS} = \{R_{PS}(P_i, S_j) \mid \forall i = 1, \dots, n;\ \forall j = 1, \dots, m\}$, where $R_{PS}(P_i, S_j)$ [...] represented by either a numeric value or an (intuitionistic) fuzzy value, depending on the domain of the problem. Analogously, the relation between the symptoms and the diseases is expressed as $R_{SD} = \{R_{SD}(S_i, D_j) \mid \forall i = 1, \dots, m;\ \forall j = 1, \dots, k\}$, where $R_{SD}(S_i, D_j)$ reflects the possibility that symptom $S_i$ would lead to disease $D_j$. The medical diagnosis problem aims to determine the relation between the patients and the diseases described by the set $R_{PD} = \{R_{PD}(P_i, D_j) \mid \forall i = 1, \dots, n;\ \forall j = 1, \dots, k\}$, where $R_{PD}(P_i, D_j)$ is either 0 or 1, showing whether patient $P_i$ acquires disease $D_j$ or not. The medical diagnosis problem can be briefly represented by the implication $\{R_{PS}, R_{SD}\} \to R_{PD}$.
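Definition 4 (Single-criterion intuitionistic fuzzy recommender systems – SC-IFRS (Son & Thong, 2015)). The utility function R is a mapping specified on (X, Y) as follows:
$$R : X \times Y \to D,$$
$$\begin{pmatrix} (\mu_{1X}(x), \gamma_{1X}(x)) \\ (\mu_{2X}(x), \gamma_{2X}(x)) \\ \vdots \\ (\mu_{sX}(x), \gamma_{sX}(x)) \end{pmatrix} \times \begin{pmatrix} (\mu_{1Y}(y), \gamma_{1Y}(y)) \\ (\mu_{2Y}(y), \gamma_{2Y}(y)) \\ \vdots \\ (\mu_{sY}(y), \gamma_{sY}(y)) \end{pmatrix} \to \begin{pmatrix} (\mu_{1D}(D), \gamma_{1D}(D)) \\ (\mu_{2D}(D), \gamma_{2D}(D)) \\ \vdots \\ (\mu_{sD}(D), \gamma_{sD}(D)) \end{pmatrix}, \tag{8}$$
where $\mu_{iX}(x) \in [0, 1]$ (resp. $\gamma_{iX}(x) \in [0, 1]$), $\forall i \in \{1, \dots, s\}$, is the membership (resp. non-membership) value of the patient to the ith linguistic label of feature X; $\mu_{jY}(y) \in [0, 1]$ (resp. $\gamma_{jY}(y) \in [0, 1]$), $\forall j \in \{1, \dots, s\}$, is the membership (resp. non-membership) value of the symptom to the jth linguistic label of characteristic Y. Finally, $\mu_{lD}(D) \in [0, 1]$ (resp. $\gamma_{lD}(D) \in [0, 1]$), $\forall l \in \{1, \dots, s\}$, is the membership (resp. non-membership) value of disease D to the lth linguistic label. SC-IFRS provides two basic functions:
(a) Prediction: determine the values of $(\mu_{lD}(D), \gamma_{lD}(D))$, $\forall l \in \{1, \dots, s\}$;
(b) Recommendation: choose $i^* \in [1, s]$ satisfying $i^* = \arg\max_{i = 1, \dots, s} \{\mu_{iD}(D) + \mu_{iD}(D)(1 - \mu_{iD}(D) - \gamma_{iD}(D))\}$.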
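Definition 5 (Multi-criteria intuitionistic fuzzy recommender systems – MC-IFRS (Son & Thong, 2015)). The utility function R is a mapping specified on (X, Y) below:
$$R : X \times Y \to D_1 \times \cdots \times D_k,$$
$$\begin{pmatrix} (\mu_{1X}(x), \gamma_{1X}(x)) \\ (\mu_{2X}(x), \gamma_{2X}(x)) \\ \vdots \\ (\mu_{sX}(x), \gamma_{sX}(x)) \end{pmatrix} \times \begin{pmatrix} (\mu_{1Y}(y), \gamma_{1Y}(y)) \\ (\mu_{2Y}(y), \gamma_{2Y}(y)) \\ \vdots \\ (\mu_{sY}(y), \gamma_{sY}(y)) \end{pmatrix} \to \begin{pmatrix} (\mu_{1D}(D_1), \gamma_{1D}(D_1)) \\ (\mu_{2D}(D_1), \gamma_{2D}(D_1)) \\ \vdots \\ (\mu_{sD}(D_1), \gamma_{sD}(D_1)) \end{pmatrix} \times \cdots \times \begin{pmatrix} (\mu_{1D}(D_k), \gamma_{1D}(D_k)) \\ (\mu_{2D}(D_k), \gamma_{2D}(D_k)) \\ \vdots \\ (\mu_{sD}(D_k), \gamma_{sD}(D_k)) \end{pmatrix}. \tag{9}$$
MC-IFRS is the system that provides the two basic functions below:
(a) Prediction: determine the values of $(\mu_{lD}(D_i), \gamma_{lD}(D_i))$, $\forall l \in \{1, \dots, s\}$, $\forall i \in \{1, \dots, k\}$;
(b) Recommendation: choose $i^* \in [1, s]$ satisfying $i^* = \arg\max_{i = 1, \dots, s} \left\{\sum_{j=1}^{k} w_j \left(\mu_{iD}(D_j) + \mu_{iD}(D_j)(1 - \mu_{iD}(D_j) - \gamma_{iD}(D_j))\right)\right\}$, where $w_j \in [0, 1]$ is the weight of $D_j$ satisfying the constraint: [...]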
A representation of MC-IFRS in matrix format is given as follows.
Definition 6 (Son & Thong, 2015). An intuitionistic fuzzy matrix (IFM) Z in MC-IFRS is defined as
$$Z = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1s} \\ b_{21} & b_{22} & \cdots & b_{2s} \\ c_{31} & c_{32} & \cdots & c_{3s} \\ c_{41} & c_{42} & \cdots & c_{4s} \\ \vdots & \vdots & & \vdots \\ c_{t1} & c_{t2} & \cdots & c_{ts} \end{pmatrix}. \tag{10}$$
In Eq. (10), $t = k + 2$, where $k \in \mathbb{N}^+$ is the number of diseases in [...] labels $a_{1i}, b_{2i}, c_{hi}$, $\forall h \in \{3, \dots, t\}$, $\forall i \in \{1, \dots, s\}$, are the intuitionistic fuzzy values (IFV) consisting of the membership and non-membership values as in Definition 5. $a_{1i} = (\mu_{iX}(x), \gamma_{iX}(x))$, $\forall i \in \{1, \dots, s\}$, represents the IFV of the patient to the ith linguistic label of feature X. $b_{2i} = (\mu_{iY}(y), \gamma_{iY}(y))$, $\forall i \in \{1, \dots, s\}$, stands for the IFV of the symptom to the ith linguistic label of characteristic Y. $c_{hi} = (\mu_{iD}(D_{h-2}), \gamma_{iD}(D_{h-2}))$, $\forall i \in \{1, \dots, s\}$, $\forall h \in \{3, \dots, t\}$, is the IFV of the disease to the ith linguistic label. Each row from the third to the last in Eq. (10) relates to a given disease.
Definition 7 (Son & Thong, 2015). Suppose that $Z_1$ and $Z_2$ are two IFMs in MC-IFRS. The intuitionistic fuzzy similarity matrix (IFSM) between $Z_1$ and $Z_2$ is defined as follows:
$$\tilde{S} = \begin{pmatrix} \tilde{S}_{11} & \tilde{S}_{12} & \cdots & \tilde{S}_{1s} \\ \tilde{S}_{21} & \tilde{S}_{22} & \cdots & \tilde{S}_{2s} \\ \tilde{S}_{31} & \tilde{S}_{32} & \cdots & \tilde{S}_{3s} \\ \tilde{S}_{41} & \tilde{S}_{42} & \cdots & \tilde{S}_{4s} \\ \vdots & \vdots & & \vdots \\ \tilde{S}_{t1} & \tilde{S}_{t2} & \cdots & \tilde{S}_{ts} \end{pmatrix}, \tag{11}$$
where
$$\tilde{S}_{1i} = 1 - \frac{1 - \exp\left(-\frac{1}{2}\left(\left|\sqrt{\mu^{(1)}_{iX}(x)} - \sqrt{\mu^{(2)}_{iX}(x)}\right| + \left|\sqrt{\gamma^{(1)}_{iX}(x)} - \sqrt{\gamma^{(2)}_{iX}(x)}\right|\right)\right)}{1 - \exp(-1)}, \tag{12}$$
$$\tilde{S}_{2i} = 1 - \frac{1 - \exp\left(-\frac{1}{2}\left(\left|\sqrt{\mu^{(1)}_{iY}(y)} - \sqrt{\mu^{(2)}_{iY}(y)}\right| + \left|\sqrt{\gamma^{(1)}_{iY}(y)} - \sqrt{\gamma^{(2)}_{iY}(y)}\right|\right)\right)}{1 - \exp(-1)}, \tag{13}$$
$$\tilde{S}_{hi} = 1 - \frac{1 - \exp\left(-\frac{1}{2}\left(\left|\sqrt{\mu^{(1)}_{iD}(D_{h-2})} - \sqrt{\mu^{(2)}_{iD}(D_{h-2})}\right| + \left|\sqrt{\gamma^{(1)}_{iD}(D_{h-2})} - \sqrt{\gamma^{(2)}_{iD}(D_{h-2})}\right|\right)\right)}{1 - \exp(-1)}. \tag{14}$$
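The entries of the IFSM can be illustrated with a short Python sketch. It assumes that each entry compares two intuitionistic fuzzy values through the exponential form $1 - (1 - e^{-d/2})/(1 - e^{-1})$ with $d = |\sqrt{\mu^{(1)}} - \sqrt{\mu^{(2)}}| + |\sqrt{\gamma^{(1)}} - \sqrt{\gamma^{(2)}}|$; the function name and the tuple representation are hypothetical.

```python
import math


def ifv_similarity(a, b):
    """Similarity of two intuitionistic fuzzy values a = (mu, gamma), b = (mu, gamma).

    Assumed form: 1 - (1 - exp(-d / 2)) / (1 - exp(-1)), where d is the sum of
    absolute differences of the square roots of the two degrees.
    """
    (mu1, g1), (mu2, g2) = a, b
    d = abs(math.sqrt(mu1) - math.sqrt(mu2)) + abs(math.sqrt(g1) - math.sqrt(g2))
    return 1.0 - (1.0 - math.exp(-0.5 * d)) / (1.0 - math.exp(-1.0))


# Temperature values of Sugu and Ram coincide in Table 1, so the entry is 1.
same = ifv_similarity((0.8, 0.1), (0.8, 0.1))
# Temperature values of Sugu and Mari differ strongly, so the entry is smaller.
different = ifv_similarity((0.8, 0.1), (0.0, 0.8))
```

Under this form, identical values give similarity 1 and the two extreme values (1, 0) and (0, 1) give similarity 0, because then $d = 2$ and the numerator equals the normalizing constant.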
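Definition 8 (Son & Thong, 2015). Suppose that $Z_1$ and $Z_2$ are two IFMs in MC-IFRS. The intuitionistic fuzzy similarity degree (IFSD) between $Z_1$ and $Z_2$ is
$$SIM(Z_1, Z_2) = \alpha \sum_{i=1}^{s} w_{1i}\tilde{S}_{1i} + \beta \sum_{i=1}^{s} w_{2i}\tilde{S}_{2i} + \chi \sum_{h=3}^{t} \sum_{i=1}^{s} w_{hi}\tilde{S}_{hi}, \tag{15}$$
[...] $Z_2$. $W = (w_{ij})$ ($\forall i \in \{1, \dots, t\}$, $\forall j \in \{1, \dots, s\}$) is the weight matrix of the IFSM between $Z_1$ and $Z_2$, satisfying
$$\sum_{i=1}^{s} w_{1i} = 1; \quad \sum_{i=1}^{s} w_{2i} = 1; \quad \sum_{i=1}^{s} w_{hi} = 1, \quad \forall h \in \{3, \dots, t\}. \tag{16}$$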
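As an illustration of Eq. (15), the sketch below aggregates a precomputed similarity matrix into a single IFSD value. The list-of-rows layout (row 0 for the patient feature, row 1 for the symptom characteristic, rows 2 onward for the diseases), the function name and the example coefficients are assumptions, not values taken from the paper.

```python
def ifsd(S, W, alpha, beta, chi):
    """Eq. (15): weighted aggregation of an IFSM S (t rows, s columns).

    S[h][i] is the similarity entry and W[h][i] its weight; each row of W is
    expected to sum to 1 as in Eq. (16).
    """
    def weighted_row(h):
        return sum(w * s for w, s in zip(W[h], S[h]))

    disease_part = sum(weighted_row(h) for h in range(2, len(S)))
    return alpha * weighted_row(0) + beta * weighted_row(1) + chi * disease_part


# Hypothetical 3 x 2 example: one patient row, one symptom row, one disease row.
S = [[1.0, 0.0], [0.5, 0.5], [0.0, 0.0]]
W = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
value = ifsd(S, W, alpha=0.5, beta=0.3, chi=0.2)
```

Keeping the three coefficients separate makes it easy to see how much of the final similarity comes from patient features, symptoms and known diseases, respectively.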
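Definition 9 (Son & Thong, 2015). The formulas to predict the values of linguistic labels of patient $P_u$ ($\forall u \in \{1, \dots, n\}$) to symptom $S_j$ ($\forall j \in \{1, \dots, m\}$) according to diseases $(D_1, D_2, \dots, D_k)$ in MC-IFRS are:
$$\mu^{P_u}_{iD}(D_j) = \frac{\sum_{v=1}^{n} SIM(P_u, P_v)\, \mu^{P_v}_{iD}(D_j)}{\sum_{v=1}^{n} SIM(P_u, P_v)}, \quad \forall i \in \{1, \dots, s\},\ \forall j \in \{1, \dots, k\},\ \forall u \in \{1, \dots, n\}, \tag{18}$$
$$\gamma^{P_u}_{iD}(D_j) = \frac{\sum_{v=1}^{n} SIM(P_u, P_v)\, \gamma^{P_v}_{iD}(D_j)}{\sum_{v=1}^{n} SIM(P_u, P_v)}, \quad \forall i \in \{1, \dots, s\},\ \forall j \in \{1, \dots, k\},\ \forall u \in \{1, \dots, n\}. \tag{19}$$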
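The prediction formulas (18) and (19) are similarity-weighted averages, which the following Python sketch illustrates. It assumes that the sum runs only over patients whose diagnosis is already known, and the function name and data layout are hypothetical. The numbers are chosen for illustration and are not meant to reproduce the worked example of the paper, whose computation may involve weights and terms not shown here.

```python
def predict_ifv(sims, known):
    """Eqs. (18)-(19): similarity-weighted average of known (mu, gamma) pairs.

    sims  : {patient: SIM(P_u, patient)} for the target patient P_u
    known : {patient: (mu, gamma)} for one disease and one linguistic label
    Assumption: only patients present in `known` enter the sums.
    """
    total = sum(sims[v] for v in known)
    mu = sum(sims[v] * known[v][0] for v in known) / total
    gamma = sum(sims[v] * known[v][1] for v in known) / total
    return mu, gamma


# Illustrative inputs: two neighbours with known values for one disease.
sims = {"Ram": 0.87, "Mari": 0.57}
known = {"Ram": (0.7, 0.1), "Mari": (0.2, 0.6)}
mu, gamma = predict_ifv(sims, known)
```

Because both degrees are convex combinations with the same weights, the predicted pair still satisfies $\mu + \gamma \le 1$ whenever every known pair does.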
Table 3
The extracted SC-IFRS dataset, with * being the values to be predicted.
Patient: Ram
  Symptoms: Temperature (0.8, 0.1); Headache (0.6, 0.1); Stomach pain (0.2, 0.8); Cough (0.6, 0.1); Chest pain (0.1, 0.6)
  Diseases: Viral fever (0.4, 0.1); Malaria (0.7, 0.1); Typhoid (0.6, 0.1); Stomach problem (0.2, 0.4); Chest problem (0.2, 0.6)
Patient: Mari
  Symptoms: Temperature (0.0, 0.8); Headache (0.4, 0.4); Stomach pain (0.6, 0.1); Cough (0.1, 0.7); Chest pain (0.1, 0.8)
  Diseases: Viral fever (0.3, 0.5); Malaria (0.2, 0.6); Typhoid (0.4, 0.4); Stomach problem (0.6, 0.1); Chest problem (0.1, 0.7)
Patient: Sugu
  Symptoms: Temperature (0.8, 0.1); Headache (0.8, 0.1); Stomach pain (0.0, 0.6); Cough (0.2, 0.7); Chest pain (0.0, 0.5)
  Diseases: *
Patient: Somu
  Symptoms: Temperature (0.6, 0.1); Headache (0.5, 0.4); Stomach pain (0.3, 0.4); Cough (0.7, 0.2); Chest pain (0.3, 0.4)
  Diseases: *
Table 1
The relation between the patients and the symptoms.
P      Temperature   Headache     Stomach_pain   Cough        Chest_pain
Ram    (0.8, 0.1)    (0.6, 0.1)   (0.2, 0.8)     (0.6, 0.1)   (0.1, 0.6)
Mari   (0, 0.8)      (0.4, 0.4)   (0.6, 0.1)     (0.1, 0.7)   (0.1, 0.8)
Sugu   (0.8, 0.1)    (0.8, 0.1)   (0, 0.6)       (0.2, 0.7)   (0, 0.5)
Somu   (0.6, 0.1)    (0.5, 0.4)   (0.3, 0.4)     (0.7, 0.2)   (0.3, 0.4)
Table 2
The training dataset, with * being the values to be predicted.
P      Viral_Fever   Malaria      Typhoid      Stomach      Chest
Ram    (0.4, 0.1)    (0.7, 0.1)   (0.6, 0.1)   (0.2, 0.4)   (0.2, 0.6)
Mari   (0.3, 0.5)    (0.2, 0.6)   (0.4, 0.4)   (0.6, 0.1)   (0.1, 0.7)
Table 4
The recommended diseases.
P      Viral_Fever   Malaria      Typhoid      Stomach      Chest
Example 1. We illustrate the steps of IFCF by an example in Son [...] namely P = {Ram, Mari, Sugu, Somu}, five symptoms S = {Temperature, Headache, Stomach-pain, Cough, Chest-pain} and five diseases D = {Viral-Fever, Malaria, Typhoid, Stomach, Heart}. The relation between the patients and the symptoms is illustrated in [...] values in this table need to be predicted. Motivated by [...] $w_{3i} = 0.2$, the IFSD values between Sugu (Somu) and Ram and Mari are shown below.
$$IFSD(\text{Sugu}, \text{Ram}) = 0.87, \tag{20}$$
$$IFSD(\text{Sugu}, \text{Mari}) = 0.57, \tag{21}$$
$$IFSD(\text{Somu}, \text{Ram}) = 0.83, \tag{22}$$
$$IFSD(\text{Somu}, \text{Mari}) = 0.58. \tag{23}$$
Table 5
The pseudo-code of DPFCM.
Distributed Picture Fuzzy Clustering Method (DPFCM)
I: – Data X with N elements in r dimensions
   – Number of clusters: C
   – Number of peers: P + 1
   – Fuzzifier: m
   – Threshold: $\varepsilon > 0$
   – Parameters: $\gamma, \alpha_1, \alpha_2, \alpha$, maxIter
O: – $\{V_{ljh} \mid l = 1, \dots, P;\ j = 1, \dots, C;\ h = 1, \dots, r\}$
   – $\{(u_{lkj}, \eta_{lkj}, \xi_{lkj}) \mid l = 1, \dots, P;\ k = 1, \dots, Y_l;\ j = 1, \dots, C\}$
   – $\{w_{ljh} \mid l = 1, \dots, P;\ j = 1, \dots, C;\ h = 1, \dots, r\}$
DPFCM:
1S: – Set the number of iterations: $t = 0$
    – Set $\Delta_{lijh}(t) = \theta_{lijh}(t) = 0$ ($\forall i \ne l$; $i, l = 1, \dots, P$; $j = 1, \dots, C$; $h = 1, \dots, r$)
    – Randomize $\{(u_{lkj}(t), \eta_{lkj}(t), \xi_{lkj}(t)) \mid l = 1, \dots, P;\ k = 1, \dots, Y_l;\ j = 1, \dots, C\}$ satisfying (31)
    – Set $w_{ljh}(t) = 1/r$ ($l = 1, \dots, P$; $j = 1, \dots, C$; $h = 1, \dots, r$)
2S: Calculate cluster centers $V_{ljh}(t)$ ($l = 1, \dots, P$; $j = 1, \dots, C$; $h = 1, \dots, r$) from $(u_{lkj}(t), \eta_{lkj}(t), \xi_{lkj}(t))$, $w_{ljh}(t)$ and $\theta_{lijh}(t)$ by (39)
3S: Calculate attribute weights $w_{ljh}(t+1)$ ($l = 1, \dots, P$; $j = 1, \dots, C$; $h = 1, \dots, r$) from $(u_{lkj}(t), \eta_{lkj}(t), \xi_{lkj}(t))$, $V_{ljh}(t)$ and $\Delta_{lijh}(t)$ by (41)
4S: Send $\{\Delta_{lijh}(t), \theta_{lijh}(t), V_{ljh}(t), w_{ljh}(t+1) \mid i, l = 1, \dots, P;\ i \ne l;\ k = 1, \dots, Y_l;\ j = 1, \dots, C\}$ to the Master
5M: Calculate $\{\Delta_{lijh}(t+1), \theta_{lijh}(t+1) \mid i, l = 1, \dots, P;\ i \ne l;\ k = 1, \dots, Y_l;\ j = 1, \dots, C\}$ by (38) and (40) and send them to the Slave peers
6S: Calculate cluster centers $V_{ljh}(t+1)$ ($l = 1, \dots, P$; $j = 1, \dots, C$; $h = 1, \dots, r$) from $(u_{lkj}(t), \eta_{lkj}(t), \xi_{lkj}(t))$, $w_{ljh}(t+1)$ and $\theta_{lijh}(t+1)$ by (39)
7S: Calculate positive degrees $\{u_{lkj}(t+1) \mid l = 1, \dots, P;\ k = 1, \dots, Y_l;\ j = 1, \dots, C\}$ from $(\eta_{lkj}(t), \xi_{lkj}(t))$, $w_{ljh}(t+1)$ and $V_{ljh}(t+1)$ by (37)
8S: Compute neutral degrees $\{\eta_{lkj}(t+1) \mid l = 1, \dots, P;\ k = 1, \dots, Y_l;\ j = 1, \dots, C\}$ from $(u_{lkj}(t+1), \xi_{lkj}(t))$, $w_{ljh}(t+1)$ and $V_{ljh}(t+1)$ by (42)
9S: Calculate refusal degrees $\{\xi_{lkj}(t+1) \mid l = 1, \dots, P;\ k = 1, \dots, Y_l;\ j = 1, \dots, C\}$ from $(u_{lkj}(t+1), \eta_{lkj}(t+1))$, $w_{ljh}(t+1)$ and $V_{ljh}(t+1)$ by (43)
10S: If $\max_l \{\max\{\|u_{lkj}(t+1) - u_{lkj}(t)\|, \|\eta_{lkj}(t+1) - \eta_{lkj}(t)\|, \|\xi_{lkj}(t+1) - \xi_{lkj}(t)\|\}\} < \varepsilon$ or $t >$ maxIter, then stop the algorithm; otherwise set $t = t + 1$ and return to Step 3S.
S: Operations in Slave peers.
M: Operations in the Master peer.
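Next, use Definition 9 to calculate the predicted IFM results of Sugu and Somu:
$$\text{Disease}(\text{Sugu}) = \left\langle \begin{array}{l} \text{Viral fever}\,(0.49, 0.38);\ \text{Malaria}\,(0.52, 0.22);\ \text{Typhoid}\,(0.36, 0.52); \\ \text{Stomach problem}\,(0.40, 0.34);\ \text{Chest problem}\,(0.10, 0.68) \end{array} \right\rangle, \tag{24}$$
$$\text{Disease}(\text{Somu}) = \left\langle \begin{array}{l} \text{Viral fever}\,(0.47, 0.39);\ \text{Malaria}\,(0.52, 0.22);\ \text{Typhoid}\,(0.36, 0.51); \\ \text{Stomach problem}\,(0.39, 0.47);\ \text{Chest problem}\,(0.10, 0.68) \end{array} \right\rangle. \tag{25}$$
Based on the recommendation function of Definition 4 and Eqs. (24) [...] as in Table 4. From this table, we conclude that Sugu and Somu both suffer from Malaria.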
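The conclusion of Example 1 can be checked with a few lines of Python. The sketch applies the score $\mu + \mu(1 - \mu - \gamma)$ from the recommendation function of Definition 4 to the predicted values in Eqs. (24) and (25), treating each disease as one candidate; the function names and the dictionary layout are assumptions made for illustration.

```python
def score(mu, gamma):
    """Recommendation score of Definition 4: mu + mu * (1 - mu - gamma)."""
    return mu + mu * (1.0 - mu - gamma)


def recommend(predicted):
    """Return the disease whose (mu, gamma) pair has the highest score."""
    return max(predicted, key=lambda disease: score(*predicted[disease]))


# Predicted intuitionistic fuzzy values from Eqs. (24) and (25).
sugu = {
    "Viral fever": (0.49, 0.38), "Malaria": (0.52, 0.22), "Typhoid": (0.36, 0.52),
    "Stomach problem": (0.40, 0.34), "Chest problem": (0.10, 0.68),
}
somu = {
    "Viral fever": (0.47, 0.39), "Malaria": (0.52, 0.22), "Typhoid": (0.36, 0.51),
    "Stomach problem": (0.39, 0.47), "Chest problem": (0.10, 0.68),
}
result = {"Sugu": recommend(sugu), "Somu": recommend(somu)}
```

For both patients the Malaria pair (0.52, 0.22) scores 0.52 + 0.52 × 0.26 = 0.6552, which exceeds the score of every other disease.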
2.2 Distributed Picture Fuzzy Clustering Method
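[...] Clustering Method on picture fuzzy sets, called DPFCM. Firstly, we recall the definition of picture fuzzy sets.
Fig. 3. MAE values of algorithms by 2-fold cross validation.
Fig. 4. MAE values of algorithms by 3-fold cross validation.
Definition 10. A Picture Fuzzy Set (PFS) (Cuong & Kreinovich, 2013) in a non-empty set X is
$$\dot{A} = \{\langle x, \mu_{\dot{A}}(x), \eta_{\dot{A}}(x), \gamma_{\dot{A}}(x)\rangle \mid x \in X\}, \tag{26}$$
where $\mu_{\dot{A}}(x)$ is the positive degree of each element $x \in X$, $\eta_{\dot{A}}(x)$ is the neutral degree and $\gamma_{\dot{A}}(x)$ is the negative degree, satisfying the constraints
$$\mu_{\dot{A}}(x), \eta_{\dot{A}}(x), \gamma_{\dot{A}}(x) \in [0, 1], \quad \forall x \in X, \tag{27}$$
$$0 \le \mu_{\dot{A}}(x) + \eta_{\dot{A}}(x) + \gamma_{\dot{A}}(x) \le 1, \quad \forall x \in X. \tag{28}$$
The refusal degree of an element is $\xi_{\dot{A}}(x) = 1 - (\mu_{\dot{A}}(x) + \eta_{\dot{A}}(x) + \gamma_{\dot{A}}(x))$, $\forall x \in X$. When $\xi_{\dot{A}}(x) = 0$, the PFS reduces to an intuitionistic fuzzy set (IFS) (Atanassov, 1986), and when both $\eta_{\dot{A}}(x) = \xi_{\dot{A}}(x) = 0$, the PFS reduces to a fuzzy set (FS) (Zadeh, 1965).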
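The constraints of Definition 10 and the two special cases can be illustrated by a small Python sketch. The function names and the numerical tolerance are assumptions; the classification simply mirrors the statement that a zero refusal degree gives an IFS, and zero refusal and neutral degrees give an FS.

```python
def refusal_degree(mu, eta, gamma, tol=1e-12):
    """Refusal degree 1 - (mu + eta + gamma) of a picture fuzzy element."""
    for value in (mu, eta, gamma):
        if not 0.0 <= value <= 1.0:
            raise ValueError("each degree must lie in [0, 1]")
    total = mu + eta + gamma
    if total > 1.0 + tol:
        raise ValueError("the three degrees must sum to at most 1")
    return max(0.0, 1.0 - total)


def classify(mu, eta, gamma, tol=1e-12):
    """Name the kind of set the element behaves like: 'FS', 'IFS' or 'PFS'."""
    xi = refusal_degree(mu, eta, gamma, tol)
    if xi <= tol and eta <= tol:
        return "FS"
    if xi <= tol:
        return "IFS"
    return "PFS"


kinds = [classify(0.6, 0.0, 0.4), classify(0.5, 0.2, 0.3), classify(0.5, 0.2, 0.2)]
```

The tolerance only guards against floating-point rounding when the three degrees are meant to sum to exactly 1.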
Fig. 5. MAE values of algorithms by 4-fold cross validation.
Fig. 6. MAE values of algorithms by 5-fold cross validation.
In DPFCM, the communication model is the facilitator, or Master–Slave, model with one Master peer and P Slave peers, and each Slave peer is allowed to communicate with the Master only. Each Slave peer has a subset of the original dataset X consisting of N data points in r dimensions. We call these subsets $Y_j$ ($j = 1, \dots, P$), with $\bigcup_{j=1}^{P} Y_j = X$ and $\sum_{j=1}^{P} |Y_j| = N$. The number of dimensions in a subset is exactly the same as that in the original dataset. The clustering problem is to divide the dataset X into C groups satisfying the objective function below:
$$J = \sum_{l=1}^{P} \sum_{k=1}^{Y_l} \sum_{j=1}^{C} \left(\frac{u_{lkj}}{1 - \eta_{lkj} - \xi_{lkj}}\right)^{m} \sum_{h=1}^{r} w_{ljh} \|X_{lkh} - V_{ljh}\|^2 + \gamma \sum_{l=1}^{P} \sum_{j=1}^{C} \sum_{h=1}^{r} w_{ljh} \log w_{ljh} \to \min, \tag{29}$$
where $u_{lkj}$, $\eta_{lkj}$ and $\xi_{lkj}$ are the positive, the neutral and the refusal degrees of the kth data point to the jth cluster in the lth Slave peer. This reflects clustering on the PFS expressed through Definition 10. $w_{ljh}$ is the weight of the hth attribute for the jth cluster in the lth Slave peer. $V_{ljh}$ is the center of the jth cluster in the lth Slave peer according to the hth attribute. $X_{lkh}$ is the kth data point of the lth Slave peer according to the hth attribute. $m$ and $\gamma$ are the fuzzifier and a positive scalar, respectively. The constraints for (29) are shown below.
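$$u_{lkj}, \eta_{lkj}, \xi_{lkj} \in [0, 1], \tag{30}$$
$$u_{lkj} + \eta_{lkj} + \xi_{lkj} \le 1, \tag{31}$$
$$\sum_{j=1}^{C} \left(\frac{u_{lkj}}{1 - \eta_{lkj} - \xi_{lkj}}\right) = 1, \tag{32}$$
$$\sum_{j=1}^{C} \left(\eta_{lkj} + \frac{\xi_{lkj}}{C}\right) = 1, \tag{33}$$
$$\sum_{h=1}^{r} w_{ljh} = 1, \tag{34}$$
$$V_{ljh} = V_{ijh}, \quad (\forall i \ne l;\ i, l = 1, \dots, P), \tag{35}$$
$$w_{ljh} = w_{ijh}, \quad (\forall i \ne l;\ i, l = 1, \dots, P). \tag{36}$$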
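To show how the objective in Eq. (29) is evaluated, the following Python sketch computes J for a single Slave peer, so the peer index l is dropped. The nested-list layout, the function name and the toy values are assumptions; the sketch only evaluates the objective and does not perform any optimization.

```python
import math


def objective(X, V, W, U, Eta, Xi, m, gamma):
    """Objective of Eq. (29) restricted to one peer.

    X[k][h]  : data point k, attribute h
    V[j][h]  : center of cluster j, attribute h
    W[j][h]  : weight of attribute h in cluster j (must be > 0 for the log)
    U, Eta, Xi : positive, neutral and refusal degrees, indexed [k][j]
    """
    J = 0.0
    for k, x in enumerate(X):
        for j, v in enumerate(V):
            dist = sum(W[j][h] * (x[h] - v[h]) ** 2 for h in range(len(x)))
            ratio = U[k][j] / (1.0 - Eta[k][j] - Xi[k][j])
            J += ratio ** m * dist
    # Entropy-like regularization term on the attribute weights.
    J += gamma * sum(w * math.log(w) for row in W for w in row)
    return J


# Toy case: two 1-D points, one cluster centered between them.
J = objective(X=[[0.0], [2.0]], V=[[1.0]], W=[[1.0]],
              U=[[0.5], [0.5]], Eta=[[0.25], [0.25]], Xi=[[0.25], [0.25]],
              m=2, gamma=1.0)
```

In the toy case each point has ratio 0.5 / 0.5 = 1 and squared distance 1, and the single weight 1 contributes log 1 = 0, so J equals 2.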
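The clustering model in Eqs. (29)–(36) relies on the principles of the PFS and the facilitator model. By using the Lagrangian method and the Picard iteration, the optimal solutions of this model are as shown in Eqs. (37)–(43).
$$u_{lkj} = \frac{1 - \eta_{lkj} - \xi_{lkj}}{\sum_{i=1}^{C} \left(\dfrac{\sum_{h=1}^{r} w_{ljh} \|X_{lkh} - V_{ljh}\|^2}{\sum_{h=1}^{r} w_{lih} \|X_{lkh} - V_{lih}\|^2}\right)^{\frac{1}{m-1}}}, \quad (\forall l = 1, \dots, P;\ k = 1, \dots, Y_l;\ j = 1, \dots, C), \tag{37}$$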
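The update of the positive degrees can be sketched in Python for one data point. The sketch assumes the exponent $1/(m-1)$ familiar from fuzzy c-means, strictly positive weighted distances, and hypothetical names; `dist[j]` stands for $\sum_h w_{ljh}\|X_{lkh} - V_{ljh}\|^2$.

```python
def update_positive(dist, eta, xi, m):
    """Positive degrees u_j of one data point for all clusters j (Eq. (37)).

    dist[j] : weighted squared distance of the point to cluster j (> 0)
    eta[j], xi[j] : current neutral and refusal degrees for cluster j
    Assumption: the exponent is 1 / (m - 1), with fuzzifier m > 1.
    """
    p = 1.0 / (m - 1.0)
    clusters = range(len(dist))
    return [
        (1.0 - eta[j] - xi[j]) / sum((dist[j] / dist[i]) ** p for i in clusters)
        for j in clusters
    ]


u = update_positive(dist=[1.0, 4.0], eta=[0.1, 0.1], xi=[0.1, 0.1], m=2)
normalized = sum(u[j] / (1.0 - 0.1 - 0.1) for j in range(2))
```

With these inputs the closer cluster receives the larger degree, and the ratios $u_j / (1 - \eta_j - \xi_j)$ sum to 1 over the clusters.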
Fig 7 MAE values of algorithms by 6-fold cross validation.
Fig 8 MAE values of algorithms by 7-fold cross validation.
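$$\theta_{lijh} = \theta_{lijh} + \alpha_1 (V_{ljh} - V_{ijh}), \quad (\forall i \ne l;\ i, l = 1, \dots, P;\ j = 1, \dots, C;\ h = 1, \dots, r), \tag{38}$$
$$V_{ljh} = \frac{\sum_{k=1}^{Y_l} \left(\dfrac{u_{lkj}}{1 - \eta_{lkj} - \xi_{lkj}}\right)^{m} w_{ljh} X_{lkh} - \sum_{i=1,\, i \ne l}^{P} \theta_{lijh}}{\sum_{k=1}^{Y_l} \left(\dfrac{u_{lkj}}{1 - \eta_{lkj} - \xi_{lkj}}\right)^{m} w_{ljh}}, \quad (\forall l = 1, \dots, P;\ j = 1, \dots, C;\ h = 1, \dots, r), \tag{39}$$
$$\Delta_{lijh} = \Delta_{lijh} + \alpha_2 (w_{ljh} - w_{ijh}), \quad (\forall i \ne l;\ i, l = 1, \dots, P;\ j = 1, \dots, C;\ h = 1, \dots, r), \tag{40}$$
$$w_{ljh} = \frac{\exp\left(-\dfrac{1}{\gamma}\left(\sum_{k=1}^{Y_l} \left(\dfrac{u_{lkj}}{1 - \eta_{lkj} - \xi_{lkj}}\right)^{m} \|X_{lkh} - V_{ljh}\|^2 + \gamma + 2 \sum_{i=1,\, i \ne l}^{P} \Delta_{lijh}\right)\right)}{\sum_{h'=1}^{r} \exp\left(-\dfrac{1}{\gamma}\left(\sum_{k=1}^{Y_l} \left(\dfrac{u_{lkj}}{1 - \eta_{lkj} - \xi_{lkj}}\right)^{m} \|X_{lkh'} - V_{ljh'}\|^2 + \gamma + 2 \sum_{i=1,\, i \ne l}^{P} \Delta_{lijh'}\right)\right)}, \tag{41}$$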
glkj¼ 1 nlkjþ
C1 C
PC i¼1nlki
PC i¼1
ulkj
ulki
Pr h¼1
w lih kX lkh V lih k 2
w ljh kX lkh V ljh k 2
mþ1
;
ð8l ¼ 1; P; k ¼ 1; Yl;j ¼ 1; CÞ; ð42Þ
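$$\xi_{lkj} = 1 - (u_{lkj} + \eta_{lkj}) - \left(1 - (u_{lkj} + \eta_{lkj})^{\alpha}\right)^{1/\alpha}, \quad (\forall l = 1, \dots, P;\ k = 1, \dots, Y_l;\ j = 1, \dots, C). \tag{43}$$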
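The refusal-degree update in Eq. (43) can be tried out with the short Python sketch below. The function name is hypothetical, and the restriction of the exponent $\alpha$ to the interval (0, 1] is an assumption made here so that the result stays non-negative.

```python
def refusal_update(u, eta, alpha):
    """Eq. (43): xi = 1 - (u + eta) - (1 - (u + eta)**alpha)**(1 / alpha).

    Assumption: 0 < alpha <= 1 and 0 <= u + eta <= 1.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in (0, 1]")
    s = u + eta
    if not 0.0 <= s <= 1.0:
        raise ValueError("u + eta must lie in [0, 1]")
    return 1.0 - s - (1.0 - s ** alpha) ** (1.0 / alpha)


xi_half = refusal_update(0.5, 0.2, alpha=0.5)  # strictly between 0 and 0.3
xi_one = refusal_update(0.5, 0.2, alpha=1.0)   # the refusal degree vanishes
```

With $\alpha = 1$ the two subtracted terms cancel the remainder exactly, so the refusal degree is zero; smaller values of $\alpha$ leave a positive refusal degree that never exceeds $1 - (u + \eta)$.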
Fig 9 MAE values of algorithms by 8-fold cross validation.
Fig 10 MAE values of algorithms by 9-fold cross validation.