ACADEMY OF MILITARY SCIENCE AND TECHNOLOGY
Scientific supervisors:
1. Assoc. Prof. Dr. Le Ba Dung
2. Dr. Nguyen Doan Cuong
Reviewer 1: Assoc. Prof. Dr. Bui Thu Lam
Military Technical Academy
Reviewer 2: Assoc. Prof. Phung Trung Nghia
Thai Nguyen University
Reviewer 3: Dr. Nguyen Do Van
Academy of Military Science and Technology
The thesis was defended before the Doctoral Evaluation Council at Academy level, held at the Academy of Military Science and Technology on ……., 2019.
The thesis can be found at:
- The library of Academy of Military Science and Technology
- Vietnam National Library
INTRODUCTION
1. The necessity of the thesis
Fuzzy semi-supervised clustering is an extension of fuzzy clustering that uses prior knowledge to increase the quality of clusters. This prior information, also known as additional information, is intended to guide, monitor and control the clustering process.
The fuzzy min-max neural network (FMNN) model proposed by Patrick K. Simpson combines the advantages of fuzzy logic, artificial neural networks and fuzzy min-max theory to solve classification and clustering problems. FMNN is an incremental learning model based on fuzzy hyperboxes, with the ability to process large data sets.
Liver disease diagnosis based on data from liver enzyme test results can be formulated as a pattern recognition problem, and the use of FMNN is considered an effective approach. One reason FMNN is used in disease diagnostic support is its ability to generate very simple if-then decision rules: each FMNN hyperbox is transformed into a rule described by the min and max values of the data attributes. However, FMNN itself still has many shortcomings that limit its practical application. The main research on FMNN focuses on major directions such as improving the network structure, optimizing parameters, pruning, reducing the number of hyperboxes in the network, improving the learning method, or incorporating other methods to improve quality.
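The hyperbox-to-rule transformation described above can be sketched as follows. This is a minimal illustration assuming each hyperbox is stored as its min point, max point and class label; the function name and output format are hypothetical, not from the thesis:

```python
def extract_rules(hyperboxes):
    """Turn each hyperbox (v, w, class_label) into a fuzzy if-then rule.

    v and w are the min and max points of the hyperbox; each rule
    quantifies the [min, max] range of every data attribute.
    """
    rules = []
    for v, w, label in hyperboxes:
        conditions = [
            "%.2f <= x%d <= %.2f" % (vi, i + 1, wi)
            for i, (vi, wi) in enumerate(zip(v, w))
        ]
        rules.append("IF " + " AND ".join(conditions) + " THEN class = %s" % label)
    return rules

# Example: one 2-dimensional hyperbox with a hypothetical class label
boxes = [([0.10, 0.25], [0.40, 0.60], "hepatitis")]
print(extract_rules(boxes)[0])
# IF 0.10 <= x1 <= 0.40 AND 0.25 <= x2 <= 0.60 THEN class = hepatitis
```

The simplicity of this mapping, one readable rule per hyperbox, is what makes FMNN attractive for diagnostic support.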
Based on research into FMNN's development, and in order to improve its efficiency, the thesis focuses on proposing and improving FMNN methodology through semi-supervised learning. In the new methods presented in the thesis, additional information is defined as labels assigned to part of the data to guide and monitor the clustering process. This is a new approach that earlier methods have not addressed.
2. Objectives of the research
1) Develop an advanced fuzzy semi-supervised clustering algorithm based on label spreading, where the additional information is a small percentage of labeled samples.
2) Propose a novel combined semi-supervised clustering model that automatically defines the additional information; in our research, part of the samples of the fuzzy semi-supervised clustering algorithm are labeled.
3) Develop a fuzzy clustering algorithm that takes the distribution of the data into account.
4) Apply the fuzzy min-max neural network to the generation of fuzzy if-then decision rules in the design of a liver disease diagnostic support system, using data from liver enzyme test results.
3. Objects and scope of the research
The thesis focuses on the following issues:
- An overview of the fuzzy min-max neural network and its variations.
- Analysis of its limitations and of the solutions used by researchers to overcome them.
- Application of the fuzzy min-max neural network, with generation of fuzzy if-then decision rules, to disease diagnosis.
4. Research methods
The thesis uses theoretical research methods; in particular, it studies the FMNN model for classifying and clustering data, and on that basis focuses on the proposed semi-supervised clustering algorithms. The thesis also uses simulation-based empirical methods in combination with analysis, statistics and evaluation of empirical data.
5. Contributions of the thesis
- Develop the advanced SS-FMM algorithm for fuzzy semi-supervised clustering based on a label spreading process.
- Propose a novel model of semi-supervised clustering combining FMNN and SS-FMM; this model automatically defines additional information for semi-supervised clustering algorithms.
- Develop a fuzzy clustering algorithm that takes the distribution of the data into account.
6. Structure of the thesis
Apart from the introduction and conclusion, the main content of the thesis consists of three chapters:
- Chapter 1 presents an overview of the thesis, including the basic concepts of FMNN and its extensions. From the general characteristics and limitations of these extensions, it establishes the direction of the subsequent research. Throughout this chapter, the thesis gives an overview of the research problem and of the concepts and basic algorithms used in the research.
- Chapter 2 presents proposals for improving the learning method in FMNN using semi-supervised algorithm models for data clustering. The additional information is a labeled part of the samples in the training data set; labels from this part of the data are then spread to the unlabeled data samples. A fuzzy semi-supervised clustering model combined with FMNN automatically defines the additional information, which is then used as the input of the fuzzy semi-supervised algorithm. A data clustering model in the fuzzy min-max neural network that takes the distribution of the data into account is presented as well.
- Chapter 3 presents the application of the proposed models, with the generation of fuzzy if-then decision rules, in a liver disease diagnostic support system on a real dataset.
Chapter 1: Overview of fuzzy min-max neural network
1.1 Fundamental knowledge of fuzzy min-max neural network
* Hyperbox membership function
The membership function b_j(A_h, B_j) measures the degree to which sample A_h belongs to hyperbox B_j. It is defined by Eq. (1.2) or Eq. (1.3) below.
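The membership computation can be sketched as follows. This is a minimal illustration using the ramp-based form found in Simpson's original FMNN papers; the exact Eqs. (1.2)/(1.3) of the thesis may differ in detail (for example in the sensitivity parameter gamma, which is an assumption here):

```python
def membership(a, v, w, gamma=4.0):
    """Degree b_j(A_h, B_j) to which sample a belongs to hyperbox [v, w].

    Ramp-style formulation: 1.0 inside the box, decreasing with the
    distance outside it, controlled by the sensitivity parameter gamma.
    """
    n = len(a)
    total = 0.0
    for ai, vi, wi in zip(a, v, w):
        total += max(0.0, 1.0 - max(0.0, gamma * min(1.0, ai - wi)))
        total += max(0.0, 1.0 - max(0.0, gamma * min(1.0, vi - ai)))
    return total / (2.0 * n)

inside = membership([0.3, 0.3], v=[0.2, 0.2], w=[0.4, 0.4])
outside = membership([0.9, 0.9], v=[0.2, 0.2], w=[0.4, 0.4])
print(inside, outside)  # inside is 1.0; outside is smaller
```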
* Fuzzy min-max neural network structure
FMNN uses a straightforward neural network structure: a two-layer structure (Fig 1.4) for unsupervised learning and a three-layer structure (Fig 1.5) for supervised learning.
* Overlapping between hyperboxes
The FMNN algorithm creates and modifies hyperboxes in n-dimensional space. If an expansion creates overlap between hyperboxes, a contraction process is performed to eliminate the overlap. Overlap between B_j and B_k occurs in one of the four following cases:
- Case 1: the max point of B_j overlaps the min point of B_k
- Case 2: the min point of B_j overlaps the max point of B_k
- Case 3: B_k is contained within B_j
- Case 4: B_j is contained within B_k
If B_j and B_k overlap, a contraction of the hyperboxes is performed in the corresponding direction to eliminate the overlap:
- Case 1: If v_ji < v_ki < w_ji < w_ki, then:
  v_ki^new = w_ji^new = (v_ki^old + w_ji^old) / 2
- Case 2: If v_ki < v_ji < w_ki < w_ji, then:
  v_ji^new = w_ki^new = (v_ji^old + w_ki^old) / 2
- Case 3: If v_ji < v_ki < w_ki < w_ji, consider the following cases:
  + If (w_ki - v_ji) < (w_ji - v_ki), then: v_ji^new = w_ki^old
  + Otherwise: w_ji^new = v_ki^old
- Case 4: If v_ki < v_ji < w_ji < w_ki, consider the following cases:
  + If (w_ki - v_ji) < (w_ji - v_ki), then: w_ki^new = v_ji^old
  + Otherwise: v_ki^new = w_ji^old
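The four contraction cases can be sketched in code. This is a reconstruction based on the standard FMNN contraction rules; the `contract` helper and its calling convention are illustrative, not taken from the thesis:

```python
def contract(v_j, w_j, v_k, w_k, dim):
    """Adjust two overlapping hyperboxes along dimension `dim`.

    The overlap is split in half (cases 1-2), or one box boundary is
    moved onto the other's (cases 3-4), whichever removes less volume.
    """
    i = dim
    if v_j[i] < v_k[i] < w_j[i] < w_k[i]:          # case 1
        v_k[i] = w_j[i] = (v_k[i] + w_j[i]) / 2.0
    elif v_k[i] < v_j[i] < w_k[i] < w_j[i]:        # case 2
        v_j[i] = w_k[i] = (v_j[i] + w_k[i]) / 2.0
    elif v_j[i] < v_k[i] <= w_k[i] < w_j[i]:       # case 3: B_k inside B_j
        if w_k[i] - v_j[i] < w_j[i] - v_k[i]:
            v_j[i] = w_k[i]
        else:
            w_j[i] = v_k[i]
    elif v_k[i] < v_j[i] <= w_j[i] < w_k[i]:       # case 4: B_j inside B_k
        if w_k[i] - v_j[i] < w_j[i] - v_k[i]:
            w_k[i] = v_j[i]
        else:
            v_k[i] = w_j[i]
    return v_j, w_j, v_k, w_k

# Case 1 example: the boxes overlap on [0.3, 0.5] in dimension 0
v_j, w_j = [0.0], [0.5]
v_k, w_k = [0.3], [0.8]
contract(v_j, w_j, v_k, w_k, dim=0)
print(w_j[0], v_k[0])  # both boundaries move to the midpoint 0.4
```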
* The learning algorithm in the fuzzy min-max neural network
The learning algorithm in the fuzzy min-max neural network consists only of the creation and modification of hyperboxes in the sample space. It comprises three steps: creation and expansion of hyperboxes, overlap test, and hyperbox contraction. Each step is repeated for all samples in the dataset.
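The creation-and-expansion step can be sketched as follows, assuming the commonly used size-limit criterion with parameter theta; the function name and return convention are illustrative assumptions:

```python
def try_expand(v, w, a, theta):
    """Expansion test and update for one hyperbox and one sample.

    The hyperbox [v, w] absorbs sample a only if the expanded box
    still satisfies the size-limit criterion controlled by theta;
    otherwise the learning algorithm creates a new hyperbox instead.
    """
    n = len(a)
    new_v = [min(vi, ai) for vi, ai in zip(v, a)]
    new_w = [max(wi, ai) for wi, ai in zip(w, a)]
    if sum(nw - nv for nv, nw in zip(new_v, new_w)) <= n * theta:
        return new_v, new_w, True
    return v, w, False  # expansion rejected

v, w, ok = try_expand([0.2, 0.2], [0.3, 0.3], a=[0.35, 0.25], theta=0.2)
print(ok, v, w)  # the box grows to [0.2, 0.2]-[0.35, 0.3]
```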
1.2 Some research on improving the quality of FMNN
* Adjusting the size limit of hyperboxes
To overcome the phenomenon of hyperboxes exceeding the size limit during network training, caused by the averaging method, D. Ma proposed replacing the size limit function of formula (1.24) with one that is checked in all dimensions, calculated according to formula (1.29).
* Modifying the FMNN structure to manage overlapping areas
The FMCN (Fuzzy Min-max neural network classifier with Compensatory Neurons) and DCFMN (Data-Core-Based Fuzzy Min-Max Neural Network) models overcome the problems caused by contraction of the hyperboxes. Rather than contracting the hyperboxes, FMCN and DCFMN handle overlapping areas by using additional hyperboxes to manage each separate overlapping area.
* Improving the learning method in FMNN
The semi-supervised models GFMM (General Fuzzy Min-Max) and RFMN (Reflex Fuzzy Min-max Neural network) use additional information in the form of labels accompanying some input patterns. GFMM and RFMN use this prior knowledge to monitor and guide clustering.
1.5 Conclusion of Chapter 1
Chapter 1 presented an overview of research on FMNN and its development trends, and synthesized and compared studies on structural improvements of the FMNN algorithm.
The following chapters present proposals on issues that remain open in the development of FMNN, and the application of FMNN to support disease diagnosis.
Chapter 2: The development of semi-supervised clustering algorithm using fuzzy min-max neural network
This chapter presents three algorithms that improve the learning method, together with the experimental results used to evaluate them. The novel models include:
- An improvement of the SS-FMM semi-supervised learning method; results announced in [3].
- A novel model of semi-supervised clustering combining FMNN and SS-FMM; results announced in [5].
- A fuzzy clustering algorithm that takes the distribution of the data into account. In addition, this algorithm uses a set of additional rules in the training process; results announced in [2, 4].
2.1 SS-FMM semi-supervised fuzzy clustering algorithm
The GFMM model and its modified model (RFMN) have the advantage of using prior information to monitor the clustering process, thereby improving clustering quality. However, both GFMM and RFMN can produce hyperboxes that never receive a label: when GFMM and RFMN create a new hyperbox for a first sample without a label, the new hyperbox is unlabeled. This hyperbox then waits for labeled samples to set its label to the label of the sample. However, unlabeled hyperboxes may remain unedited due to the absence of labeled samples. Figure 2.1 illustrates a case where GFMM and RFMN produce unlabeled hyperboxes.
Hyperbox U
Hyperbox V
Fig 2.1 Unlabeled hyperboxes of GFMM and RFMN
where V is a hyperbox created from labeled samples, or whose label was adjusted by labeled samples, and U is a hyperbox created from unlabeled samples without any label adjustment.
The SS-FMM algorithm overcomes this disadvantage of GFMM and RFMN: it prevents the algorithm from producing unlabeled hyperboxes by using the β limit threshold. The initial threshold is defined by the user, but the algorithm is able to redefine the threshold automatically to fit the training process. The framework is described in Figure 2.2.
When creating a new hyperbox from an unlabeled pattern, SS-FMM only creates the hyperbox if it satisfies the β criterion defined in (2.2).
SS-FMM creates labeled hyperboxes from labeled data samples and spreads the labels from the labeled hyperboxes to the hyperboxes created from unlabeled samples. SS-FMM then merges all hyperboxes with the same label to form a full cluster.
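The label-spreading idea can be sketched as follows. This is a simplified illustration only: here an unlabeled hyperbox simply takes the label of the nearest labeled hyperbox by centroid distance, whereas SS-FMM uses the β threshold and hyperbox geometry; the data layout and function name are assumptions:

```python
def spread_labels(boxes):
    """Propagate labels from labeled hyperboxes to unlabeled ones.

    Each box is a dict with 'v', 'w', 'label' (None if unlabeled).
    Simplified rule: an unlabeled box takes the label of the labeled
    box whose centroid is closest to its own centroid.
    """
    def centroid(b):
        return [(vi + wi) / 2.0 for vi, wi in zip(b["v"], b["w"])]

    def dist2(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    labeled = [b for b in boxes if b["label"] is not None]
    for b in boxes:
        if b["label"] is None and labeled:
            c = centroid(b)
            nearest = min(labeled, key=lambda lb: dist2(centroid(lb), c))
            b["label"] = nearest["label"]
    return boxes

boxes = [
    {"v": [0.0, 0.0], "w": [0.2, 0.2], "label": 1},
    {"v": [0.7, 0.7], "w": [0.9, 0.9], "label": 2},
    {"v": [0.1, 0.1], "w": [0.3, 0.3], "label": None},  # receives label 1
]
spread_labels(boxes)
print([b["label"] for b in boxes])  # [1, 2, 1]
```

After spreading, all hyperboxes sharing a label would be merged into one full cluster.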
Fig 2.2 General diagram of the SS-FMM algorithm
* Complexity evaluation of the SS-FMM algorithm
The SS-FMM algorithm has a time complexity of O(M(M(M-1)/2 + NK)), where M is the total number of samples in the training data set, N is the number of attributes of a data sample, and K is the total number of hyperboxes generated in the SS-FMM network.
2.2 Combined fuzzy semi-supervised clustering algorithm (SCFMN)
The SS-FMM algorithm generates hyperboxes, with each hyperbox treated as a cluster, and uses many small hyperboxes to classify samples on the cluster boundaries. However, when the value of the maximum hyperbox size parameter θ_max decreases, the number of hyperboxes in the network increases and the complexity of the algorithm increases as well. SS-FMM also requires a certain rate of labeled samples in the training set.
To overcome this limitation of SS-FMM, SCFMN uses two different values of the θ_max parameter in two stages to improve clustering results with fewer hyperboxes. The values θ1_max and θ2_max are the maximum sizes of the large and small hyperboxes, respectively. In the first stage, SCFMN generates hyperboxes and labels for the samples fully attached to hyperboxes. In the second stage, SCFMN spreads labels from the hyperboxes created in the previous stage to hyperboxes created from unlabeled samples. Large and small hyperboxes with the same label form a full cluster. Figure 2.3 illustrates the idea of using large hyperboxes at the centers of clusters in conjunction with smaller hyperboxes at the boundaries. The hyperboxes are shown in 2-dimensional space for a data set consisting of two clusters. Denote by B a large hyperbox, by G a small hyperbox (dashed line) obtained from labeled samples, and by R a small hyperbox (dot-cross line) obtained from unlabeled samples.
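As a toy illustration of why a smaller maximum hyperbox size produces more hyperboxes (and hence higher complexity), the following 1-D sketch builds hyperboxes greedily under a size limit. The greedy rule and `build_boxes` are illustrative assumptions, not the SCFMN algorithm itself:

```python
def build_boxes(samples, theta):
    """Greedy 1-D hyperbox construction: each sample either expands an
    existing box (if the expanded width stays within theta) or starts
    a new one. Illustrates why a smaller theta yields more hyperboxes."""
    boxes = []  # list of [v, w] intervals
    for a in samples:
        for box in boxes:
            if max(box[1], a) - min(box[0], a) <= theta:
                box[0] = min(box[0], a)
                box[1] = max(box[1], a)
                break
        else:
            boxes.append([a, a])
    return boxes

data = [0.05, 0.1, 0.15, 0.5, 0.52, 0.9]
large = build_boxes(data, theta=0.5)   # few large hyperboxes
small = build_boxes(data, theta=0.1)   # more small hyperboxes
print(len(large), len(small))  # 2 3
```

In the same spirit, SCFMN covers cluster centers with few large hyperboxes and uses small hyperboxes only near the boundaries.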
Fig 2.3 SCFMN uses the large and small hyperboxes
2.2.2 Methodology of the SCFMN algorithm
Figure 2.5 shows the general diagram of the SCFMN algorithm.
* The complexity of the SCFMN algorithm
SCFMN has a time complexity of O(KN(M(K+1)+1) + M(M-1)/2), where M is the total number of samples in the training data set, N is the number of attributes of a data sample, and K is the total number of hyperboxes generated in the SCFMN network.
Fig 2.5 General diagram of the SCFMN algorithm
2.3 CFMNN fuzzy min-max clustering algorithm based on data cluster centers
The value of the FMNN membership function does not decrease once a sample is sufficiently far from a hyperbox. To overcome this disadvantage, CFMNN relies on the distances between samples and the centroids of the corresponding hyperboxes. The centroid-based distance is used when a sample is far from the hyperbox and its membership value is less than 0.6, the region where the membership function value no longer decreases. Apart from the min and max points, each hyperbox therefore has a center, defined as in (2.8).
For each sample A_h that satisfies the size limit condition (1.24) and whose membership function value is b_j < 0.6, its distance to each centroid is calculated and compared; the sample is assigned to the closest hyperbox.
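This assignment rule can be sketched as follows. The sketch assumes memberships have already been computed for each hyperbox; the function name, data layout, and the exact fallback condition are illustrative assumptions:

```python
def assign_by_centroid(a, boxes, b_threshold=0.6):
    """CFMNN-style assignment sketch: when a sample's membership in all
    hyperboxes falls below the threshold (0.6 in the thesis), assign it
    to the hyperbox with the closest centroid instead.

    Each box is (v, w, membership_of_a); centroid = midpoint of v and w.
    """
    # If some hyperbox already gives a high membership, keep the best one.
    best = max(range(len(boxes)), key=lambda j: boxes[j][2])
    if boxes[best][2] >= b_threshold:
        return best

    # Otherwise fall back to centroid distance.
    def dist2(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def centroid(v, w):
        return [(vi + wi) / 2.0 for vi, wi in zip(v, w)]

    return min(range(len(boxes)),
               key=lambda j: dist2(a, centroid(boxes[j][0], boxes[j][1])))

boxes = [([0.0, 0.0], [0.2, 0.2], 0.3),   # centroid (0.1, 0.1)
         ([0.6, 0.6], [0.8, 0.8], 0.4)]   # centroid (0.7, 0.7)
print(assign_by_centroid([0.55, 0.55], boxes))  # 1: closer to (0.7, 0.7)
```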
* Complexity of the CFMNN algorithm
The CFMNN algorithm has a time complexity of O(MKN), where M is the total number of samples in the training data set, N is the number of attributes of a data sample, and K is the total number of hyperboxes generated in CFMNN.
2.4 Experiment and evaluation
The Accuracy and CCC measures are used to evaluate the performance of the algorithms and compare them with other methods. Accuracy is calculated by (2.12) and CCC is calculated by (2.13).
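Assuming (2.12) is the usual clustering accuracy, the fraction of correctly labeled samples, a minimal sketch follows; the exact CCC formula (2.13) is not reproduced here:

```python
def accuracy(predicted, actual):
    """Fraction of samples whose predicted label matches the true label
    (the usual form of the Accuracy measure)."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

print(accuracy([1, 1, 2, 2, 3], [1, 1, 2, 3, 3]))  # 4 of 5 correct -> 0.8
```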