MINISTRY OF EDUCATION AND TRAINING
VIETNAM ACADEMY OF SCIENCE AND TECHNOLOGY
GRADUATE UNIVERSITY OF SCIENCE AND TECHNOLOGY
Do Thi Lan Anh
MINING DECISION LAWS ON THE DATA BLOCK
Major: Computer science
Code: 9 48 01 01
SUMMARY OF DOCTORAL THESIS IN COMPUTER SCIENCE
Ha Noi – 2020
The thesis was completed at: Graduate University of Science and Technology, Vietnam Academy of Science and Technology.
Scientific supervisor: Assoc. Prof. Dr. Trinh Dinh Thang
Reviewer 1: Assoc. Prof. Dr. Nguyen Huu Quynh
Reviewer 2: Assoc. Prof. Dr. Do Nang Toan
Reviewer 3: Assoc. Prof. Dr. Pham Van Cuong
The thesis will be defended before the Academy-level PhD Thesis Evaluation Council, meeting at the Graduate University of Science and Technology, Vietnam Academy of Science and Technology, at … o'clock, date … month … year 202…
The thesis can be found at:
- Graduate University of Science and Technology’s Library
- National Library of Vietnam
INTRODUCTION
1. The urgency of the thesis
Mining decision laws is the process of determining decision laws on a given decision table, serving the object classification problem. It is one of the popular data mining techniques and has been studied by many domestic and foreign experts, on both the relational model and its extended models.
Research in the world and in our country aims at finding meaningful knowledge, especially laws, on different data models and in different research directions. Approaching the data block model in order to track the laws arising in a process that changes over time and by period is the intended contribution of this thesis.
2. The objectives of the thesis
The thesis focuses on solving three problems:
- To find decision laws on the data block and on block slices.
- To find decision laws between object groups on a block whose index attribute values change, particularly when attribute values are smoothed or roughened.
- To find decision laws between object groups on a block when the block's elements are added or removed.
3. Layout of the thesis
The thesis consists of an introduction, three content chapters, conclusions, and references.
Chapter 1 presents the basic concepts of the data block, data mining, mining decision laws, and equivalence relations.
Chapter 2 presents two research results. The first is the MDLB algorithm, proposed to find decision laws on the block and on block slices. The second is the MDLB_VAC algorithm, proposed to find decision laws on the block when attribute values change. In addition, the chapter presents theoretical results on block mining, analyzes the complexity of the proposed algorithms, and tests them experimentally.
Chapter 3 builds a model for increasing or decreasing the object set of decision blocks, proposes two incremental algorithms, MDLB_OSC1 and MDLB_OSC2, to find decision laws when the block's object set changes, and tests them experimentally.
CHAPTER 1. SOME BASIC KNOWLEDGE
1.1 Data mining
1.1.1 Definition of data mining
Data mining is the main stage in the process of knowledge discovery in databases. The output of this process is latent knowledge extracted from data, which supports forecasting and decision making in business, management, production activities, etc.
1.1.2 Some data mining techniques
1.2.1 Information System
Definition 1.1 (Information system)
An information system is a quadruple S = (U, A, V, f), where U is a finite, non-empty set of objects (also known as the universe), A is a finite, non-empty set of attributes, and V is the set of values, with V = ⋃_{a∈A} V_a, where V_a is the value set of the attribute a ∈ A; f is the information function f: U × A → V such that ∀a ∈ A, ∀u ∈ U: f(u, a) ∈ V_a.
1.2.2 Indiscernibility Relation
Given the information system S = (U, A, V, f), each attribute subset P ⊆ A determines a binary relation on U, denoted IND(P), defined as follows:
IND(P) = {(u, v) ∈ U × U | u(a) = v(a), ∀a ∈ P}.
IND(P) is called an indiscernibility relation.
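To make the relation concrete, here is a minimal Python sketch (the dict-based data layout, helper name, and toy example are illustrative assumptions, not from the thesis) that partitions a universe into the equivalence classes of IND(P):

```python
# Minimal sketch: equivalence classes of IND(P) on a toy information system.
# The dict-based layout and the helper name are illustrative assumptions;
# the thesis defines f abstractly as f: U x A -> V.

def ind_classes(universe, f, P):
    """Group objects: u ~ v iff f(u, a) == f(v, a) for every a in P."""
    classes = {}
    for u in universe:
        signature = tuple(f(u, a) for a in P)  # u's values on P
        classes.setdefault(signature, set()).add(u)
    return list(classes.values())

# Toy example (hypothetical data).
data = {
    "u1": {"color": "red",  "size": "S"},
    "u2": {"color": "red",  "size": "M"},
    "u3": {"color": "blue", "size": "M"},
}
f = lambda u, a: data[u][a]
print(ind_classes(data, f, ["color"]))  # [{'u1', 'u2'}, {'u3'}]
```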
1.2.3 Decision table
A decision table is a special information system in which the attribute set A is divided into two disjoint non-empty sets C and D (A = C ∪ D, C ∩ D = ∅), called the conditional attribute set C and the decision attribute set D, respectively.
The decision table is denoted DS = (U, C ∪ D, V, f), or simply DS = (U, C ∪ D).
1.2.4 Decision law
Definition 1.4 (Decision law)
Given the decision table DS = (U, C ∪ D), suppose U/C = {C_1, C_2, …, C_m} and U/D = {D_1, D_2, …, D_n} are the partitions generated by C and D. For C_i ∈ U/C, D_j ∈ U/D, a decision law is presented as: C_i → D_j, i = 1..m, j = 1..n.
1.3 The data block model
1.3.1 The block
Definition 1.8
Let R = (id; A_1, A_2, …, A_n), where id is a non-empty finite index set and each A_i (i = 1..n) is an attribute with a corresponding value domain dom(A_i). A block r on R, denoted r(R), consists of a finite number of elements, each of which is a family of mappings from the index set id to the value domains of the attributes A_i (i = 1..n):
t ∈ r(R) ⟺ t = {t_i : id → dom(A_i)}, i = 1..n.
The block is denoted by r(R) or r(id; A_1, A_2, …, A_n); when there is no fear of confusion, we simply write r.
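As an illustration only (the thesis prescribes no concrete encoding), an element t = {t_i : id → dom(A_i)} can be stored as a mapping from each attribute to a mapping from index points to values; all names below are hypothetical:

```python
# Illustrative encoding (an assumption, not notation from the thesis): one
# block element t is a family of mappings t_i: id -> dom(A_i), stored here
# as a dict keyed by attribute, each value a dict keyed by index point.
id_set = ["x1", "x2"]                    # the index set id
t = {
    "A1": {"x1": 5, "x2": 7},            # t_1: id -> dom(A1)
    "A2": {"x1": "on", "x2": "off"},     # t_2: id -> dom(A2)
}
# The slice at point x1 keeps only the values of t at x1 (see Section 1.3.2).
slice_x1 = {a: {"x1": values["x1"]} for a, values in t.items()}
print(slice_x1)  # {'A1': {'x1': 5}, 'A2': {'x1': 'on'}}
```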
1.3.2 The block's slice
Let R = (id; A_1, A_2, …, A_n) and let r(R) be a block over R. For each x ∈ id, we denote by r(R_x) the slice of the block at x, which is a block with R_x = ({x}; A_1, A_2, …, A_n).
Here, for simplicity, we use the notation:
x^(i) = (x; A_i); id^(i) = {x^(i) | x ∈ id}.
We call x^(i) (x ∈ id, i = 1..n) the index attributes of the block scheme R = (id; A_1, A_2, …, A_n).
1.3.3 Relational algebra on the block
The relational algebra operations on blocks include: subtraction, Cartesian product, Cartesian product with an index set, projection, and division.
1.4 Conclusion of chapter 1
Chapter 1 of the thesis presents an overview of data mining, data mining techniques, knowledge of mining decision laws, and equivalence classes. The last part of the chapter presents the basic concepts of the data block model: blocks, block slices, and relational algebra on blocks. This knowledge is the basis for the issues presented in the next chapters.
CHAPTER 2. MINING DECISION LAWS ON A DATA BLOCK WITH VARIABLE ATTRIBUTE VALUES
2.1 Some concepts built on the block
2.1.1 Information block
Definition 2.1
Let R = (id; A_1, A_2, …, A_n) be a block scheme and r a block over R. Then an information block is a quadruple IB = (U, A, V, f), where U is the set of objects of r, called the object space; A is the set of index attributes of R; V is the set of attribute values; and f is the information function.
2.1.2 Indiscernibility relation on the block
For each index attribute set P ⊆ A, we define an equivalence relation, denoted IND(P), as follows:
IND(P) = {(u, v) ∈ U × U | ∀x^(i) ∈ P: f(u, x^(i)) = f(v, x^(i))},
called the indiscernibility relation on the block.
2.1.3 Decision block
Definition 2.5
Let IB = (U, A, V, f) be an information block, where U is the space of objects and A = ⋃_{i=1..n} id^(i). Suppose A is divided into two sets C and D such that C = ⋃_{i=1..k} id^(i) and D = ⋃_{i=k+1..n} id^(i), with C ∪ D = A and C ∩ D = ∅. Then IB is called a decision block, denoted DB = (U, C ∪ D, V, f).
2.1.4 Decision laws on the block and on the slice
Suppose U/C = {C_1, …, C_m}, U/D = {D_1, …, D_k}, and U/C_x, U/D_x are, correspondingly, the partitions generated by C, D, C_x, D_x. A decision law on the block is denoted by:
C_i → D_j, i = 1..m, j = 1..k,
and on the slice at point x it is denoted by:
C_xp → D_xq, q ∈ {1, 2, …, h_x}, x ∈ id.
Then the support, accuracy, and coverage of the decision law C_i → D_j on the block are:
Sup(C_i, D_j) = |C_i ∩ D_j|, Acc(C_i, D_j) = |C_i ∩ D_j| / |C_i|, Cov(C_i, D_j) = |C_i ∩ D_j| / |D_j|.
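Assuming the standard rough-set formulas reconstructed above, the three measures of a law C_i → D_j follow directly from the two equivalence classes; a minimal sketch with toy data:

```python
# Sketch of the three measures of a decision law Ci -> Dj, assuming the
# reconstructed definitions: Sup = |Ci ∩ Dj|, Acc = Sup/|Ci|, Cov = Sup/|Dj|.

def law_measures(Ci: set, Dj: set):
    sup = len(Ci & Dj)           # support: objects in both classes
    return sup, sup / len(Ci), sup / len(Dj)   # (Sup, Acc, Cov)

Ci = {"u1", "u2", "u3"}          # a conditional equivalence class (toy data)
Dj = {"u2", "u3", "u4"}          # a decision equivalence class (toy data)
print(law_measures(Ci, Dj))      # (2, 0.666..., 0.666...)
```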
Definition 2.9
Let DB = (U, C ∪ D) be a decision block, C_i ∈ U/C, D_j ∈ U/D the conditional and decision equivalence classes generated by C and D, respectively, and C_i → D_j a decision law on the block DB, i = 1..m, j = 1..k.
- If Acc(C_i → D_j) = 1 then C_i → D_j is called a certain decision law.
- If 0 < Acc(C_i → D_j) < 1 then C_i → D_j is called an uncertain decision law.
Definition 2.10
Let DB = (U, C ∪ D) be a decision block, with C_i ∈ U/C, D_j ∈ U/D, i = 1..m, j = 1..k, the conditional and decision equivalence classes generated by C and D, respectively, and let α, β be two given thresholds (α, β ∈ (0, 1)). If Acc(C_i, D_j) ≥ α and Cov(C_i, D_j) ≥ β, then C_i → D_j is called a meaningful decision law.
2.2 The algorithm for mining decision laws on the data block and block slices (MDLB)
The MDLB algorithm consists of the following steps (a sketch is given after the list):
- Step 1: Determine the conditional and decision equivalence classes on the block (on the slices).
- Step 2: Calculate the support matrix on the block (on the slice).
- Step 3: Calculate the accuracy matrix and the coverage matrix.
- Step 4: Find the decision laws on the block.
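The following is a hedged Python sketch of these four steps on a single block; it is not the thesis's own pseudocode, and the data layout, function names, and thresholds α, β (from Definition 2.10) are illustrative assumptions:

```python
# Hedged sketch of the four MDLB steps on one block. Assumptions: objects in
# a dict with f(u, a) giving u's value on attribute a; Sup/Acc/Cov as
# reconstructed in Section 2.1.4; alpha/beta as in Definition 2.10.

def partition(universe, f, attrs):
    """Step 1: equivalence classes of the indiscernibility relation."""
    classes = {}
    for u in universe:
        classes.setdefault(tuple(f(u, a) for a in attrs), set()).add(u)
    return list(classes.values())

def mdlb(universe, f, C, D, alpha, beta):
    UC, UD = partition(universe, f, C), partition(universe, f, D)
    laws = []
    for i, Ci in enumerate(UC):
        for j, Dj in enumerate(UD):
            sup = len(Ci & Dj)                  # Step 2: support matrix entry
            if sup == 0:
                continue
            acc, cov = sup / len(Ci), sup / len(Dj)   # Step 3
            if acc >= alpha and cov >= beta:    # Step 4: meaningful laws only
                laws.append((i, j, sup, acc, cov))
    return laws

data = {"u1": {"c": 0, "d": "yes"}, "u2": {"c": 0, "d": "yes"},
        "u3": {"c": 1, "d": "no"}}
f = lambda u, a: data[u][a]
print(mdlb(data, f, ["c"], ["d"], alpha=0.5, beta=0.5))
```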
2.3 Mining decision laws on the block when index attribute values change
Definition 2.11 (Smoothing an index attribute value on the block)
Let DB = (U, C ∪ D, V, f) be a decision block, where U is the space of objects, a ∈ C ∪ D, and V_a is the set of existing values of the index attribute a. Suppose Z = {x_s ∈ U | f(x_s, a) = z} is the set of objects whose value on the index attribute a is z. If Z is partitioned into two sets W and Y such that Z = W ∪ Y, W ∩ Y = ∅, with W = {x_p ∈ U | f(x_p, a) = w, w ∉ V_a} and Y = {x_q ∈ U | f(x_q, a) = y, y ∉ V_a}, then we say the value z of the index attribute a is smoothed into two new values w and y.
Definition 2.12 (Roughening an index attribute value on the block)
Let DB = (U, C ∪ D, V, f) be a decision block, where U is the space of objects, a ∈ C ∪ D, and V_a is the set of existing values of the index attribute a. Suppose f(x_p, a) = w and f(x_q, a) = y are, respectively, the values of x_p and x_q on the index attribute a (p ≠ q). If at some moment we have f(x_p, a) = f(x_q, a) = z (z ∉ V_a), then we say the two values w and y of a are roughened into the new value z.
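A toy illustration of the two operations (the column layout and concrete values are assumptions): smoothing splits the objects carrying z between two new values w and y, while roughening merges two existing values into one new value:

```python
# Toy illustration of Definitions 2.11 and 2.12; the column layout and the
# concrete values are assumptions. `col` holds the values of one index
# attribute a, keyed by object.
col = {"u1": "z", "u2": "z", "u3": "t"}

# Smoothing: objects with value z are split into W and Y and receive the new
# values w and y (neither may already occur in the column).
W = {"u1"}
for u in col:
    if col[u] == "z":
        col[u] = "w" if u in W else "y"
print(col)  # {'u1': 'w', 'u2': 'y', 'u3': 't'}

# Roughening: the two existing values w and y merge into one new value z2.
for u in col:
    if col[u] in ("w", "y"):
        col[u] = "z2"
print(col)  # {'u1': 'z2', 'u2': 'z2', 'u3': 't'}
```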
Theorem 2.1
Let DB = (U, C ∪ D, V, f) be a decision block, where U is the space of objects, a ∈ C ∪ D, and V_a is the set of existing values of the index attribute a. Then two equivalence classes E_p, E_q (E_p, E_q ∈ U/E, E ∈ {C, D}) are roughened into a new equivalence class E_s if and only if ∀a_j ≠ a: f(E_p, a_j) = f(E_q, a_j).
Theorem 2.2
Let DB = (U, C ∪ D, V, f) be a decision block, where U is the space of objects, a ∈ C ∪ D, and V_a is the set of existing values of the index attribute a. Then an equivalence class E_s (E_s ∈ U/E, E ∈ {C, D}) is smoothed into two new equivalence classes E_p, E_q if and only if we can set f(E_p, a) = w, f(E_q, a) = y and E_p ∪ E_q = E_s, with w, y ∉ V_a, w ≠ y.
Theorem 2.3
Let DB = (U, C ∪ D, V, f) be a decision block, and let α, β be two given thresholds (α, β ∈ (0, 1)). If C_i → D_j is a meaningful decision law on the decision block, then it is also a meaningful decision law on any slice of the decision block at x ∈ id.
2.3.1 Smoothing and roughening the conditional equivalence classes on the decision block and on the slice
Proposition 2.3
Let DB = (U, C ∪ D, V, f) be a decision block, a = x^(i) ∈ C, V_a the set of existing values of the conditional index attribute a, and suppose the value z of a is smoothed into two new values w and y.
If the conditional equivalence class C_s ∈ U/C (f(C_s, a) = z) is smoothed into two new conditional equivalence classes C_p, C_q (f(C_p, a) = w, f(C_q, a) = y, with w, y ∉ V_a), then on the slice r_x there exists an equivalence class C_xi satisfying C_s ⊆ C_xi that is also smoothed into two new conditional equivalence classes C_xi′ and C_xi′′ satisfying C_p ⊆ C_xi′, C_q ⊆ C_xi′′ (f(C_xi′, a) = w, f(C_xi′′, a) = y).
Proposition 2.5
Let DB = (U, C ∪ D, V, f) be a decision block, a = x^(i) ∈ C, V_a the set of existing values of the conditional index attribute a, and suppose the values w and y of a are roughened into the new value z.
If two conditional equivalence classes C_p, C_q ∈ U/C (f(C_p, a) = w, f(C_q, a) = y) are roughened into a new conditional equivalence class C_s ∈ U/C (f(C_s, a) = z), then on the slice r_x there exist two conditional equivalence classes C_xi, C_xj satisfying C_p ⊆ C_xi, C_q ⊆ C_xj that are also roughened into a new conditional equivalence class C_xk satisfying C_s ⊆ C_xk.
2.3.2 Smoothing and roughening the decision equivalence classes on the decision block and on the slice
Proposition 2.7
Let DB = (U, C ∪ D, V, f) be a decision block, a = x^(i) ∈ D, V_a the set of existing values of the decision index attribute a, and suppose the value z of a is smoothed into two new values w and y.
If the decision equivalence class D_s ∈ U/D (f(D_s, a) = z) is smoothed into two new decision equivalence classes D_p, D_q (f(D_p, a) = w, f(D_q, a) = y, with w, y ∉ V_a), then on the slice r_x there exists a decision equivalence class D_xi satisfying D_s ⊆ D_xi that is also smoothed into two new decision equivalence classes D_xi′ and D_xi′′ satisfying D_p ⊆ D_xi′, D_q ⊆ D_xi′′ (f(D_xi′, a) = w, f(D_xi′′, a) = y).
Proposition 2.9
Let DB = (U, C ∪ D, V, f) be a decision block, a = x^(i) ∈ D, V_a the set of existing values of the decision index attribute a, and suppose the values w and y of a are roughened into the new value z.
If two decision equivalence classes D_p, D_q ∈ U/D (f(D_p, a) = w, f(D_q, a) = y) are roughened into a new decision equivalence class D_s ∈ U/D (f(D_s, a) = z), then on the slice r_x there exist two decision equivalence classes D_xi, D_xj satisfying D_p ⊆ D_xi, D_q ⊆ D_xj that are also roughened into a new decision equivalence class D_xk satisfying D_s ⊆ D_xk.
2.3.4 The algorithm for mining decision laws when smoothing or roughening index attribute values on the block and the slice (MDLB_VAC)
The MDLB_VAC algorithm consists of the following steps (a sketch of the incremental step is given after the list):
Step 1: Calculate the support matrix Sup(C, D) of the original block.
Step 2: Incrementally calculate the support matrix Sup(C′, D′) on the block after roughening/smoothing the value of the index attribute.
Step 3: Calculate the accuracy matrix Acc(C′, D′) and the coverage matrix Cov(C′, D′) after roughening/smoothing the value of the index attribute, from the matrix Sup(C′, D′).
Step 4: Find the decision laws on the block.
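The incremental idea of Step 2 can be sketched as follows, under the reconstruction Sup[i][j] = |C_i ∩ D_j|: roughening two classes merges their support rows by addition, while smoothing recomputes only the rows of the split class. This is an illustrative sketch, not the thesis's pseudocode:

```python
# Illustrative sketch of incremental support-matrix maintenance (Step 2 of
# MDLB_VAC), under the reconstruction Sup[i][j] = |Ci ∩ Dj|. Only the rows
# touched by the value change are rebuilt; every other entry is reused.

def roughen_rows(sup, p, q):
    """Classes Cp and Cq merge: their support rows simply add up."""
    merged = [sup[p][j] + sup[q][j] for j in range(len(sup[p]))]
    return [row for k, row in enumerate(sup) if k not in (p, q)] + [merged]

def smooth_rows(sup, s, Cp, Cq, UD):
    """Class Cs splits into Cp, Cq: recompute just the two new rows."""
    kept = [row for k, row in enumerate(sup) if k != s]
    kept.append([len(Cp & Dj) for Dj in UD])
    kept.append([len(Cq & Dj) for Dj in UD])
    return kept

# Tiny example: 2 conditional classes x 2 decision classes.
sup = [[2, 0], [1, 1]]
print(roughen_rows(sup, 0, 1))  # [[3, 1]]
```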
2.4 Complexity of the support matrix algorithms on the block and on the slice
Proposition 2.13: The support matrix algorithm for the decision block and for the slice at x ∈ id has the same complexity, O(|U|²).
Proposition 2.14: The support matrix algorithm for the decision block and for the slice at x ∈ id after roughening the values of the conditional index attribute has the same complexity, O(|U|²).
Proposition 2.15: The support matrix algorithm for the decision block and for the slice at x ∈ id after smoothing the values of the conditional index attribute has the same complexity, O(|U|²).
CHAPTER 3. MINING DECISION LAWS ON A BLOCK WITH A CHANGED OBJECT SET
3.1 Model of adding and removing objects on the block and slice
Proposition 3.1
Let DB = (U, C ∪ D, V, f) be a decision block, and let AN and DM be the sets of objects added to and removed from the decision block DB. Then:
…, i = m+1..m+p, j = 1..h+q.
Proposition 3.3
Let DB = (U, C ∪ D, V, f) be a decision block, and let AN and DM be the sets of objects added to and removed from the decision block DB. Then: …
3.2.1 Adding an object x into the decision block
Case 1: The added object creates a new conditional class and a new decision class. Then Acc(C′_{m+1}, D′_{h+1}) = 1 and Cov(C′_{m+1}, D′_{h+1}) = 1;
∀j = 1..h: Acc(C′_{m+1}, D′_j) = Cov(C′_{m+1}, D′_j) = 0;
∀i = 1..m: Acc(C′_i, D′_{h+1}) = Cov(C′_i, D′_{h+1}) = 0.
Otherwise, ∀i = 1..m, j = 1..h:
Acc(C′_i, D′_j) = Acc(C_i, D_j) and Cov(C′_i, D′_j) = Cov(C_i, D_j).
Case 2: The added object creates only a new conditional class, falling into an existing decision class D_j*. Then (a sketch of this update follows):
Acc(C′_{m+1}, D′_j*) = 1 and Cov(C′_{m+1}, D′_j*) = 1 / (|D_j*| + 1).
If k ≠ j* then Acc(C′_{m+1}, D′_k) = Cov(C′_{m+1}, D′_k) = 0.
If i ≠ m+1 then Acc(C′_i, D′_j*) = Acc(C_i, D_j*) and Cov(C′_i, D′_j*) = |C_i ∩ D_j*| / (|D_j*| + 1).
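Under the reconstructed Case 2 formulas above, only column j* of the coverage matrix changes for the old classes, plus one new row for C′_{m+1}; a hedged sketch (matrix layout and names are assumptions, not the thesis's MDLB_OSC pseudocode):

```python
# Hedged sketch of the Case 2 update after adding one object, following the
# reconstructed formulas above. The m x h matrix layout is an assumption.

def add_object_case2(acc, cov, sup, j_star, d_sizes):
    """acc, cov, sup: m x h matrices; d_sizes[j] = |Dj| before the insert."""
    new_size = d_sizes[j_star] + 1
    for i in range(len(cov)):
        # Old rows: Acc is unchanged; Cov on column j* is rescaled to the
        # grown decision class: |Ci ∩ Dj*| / (|Dj*| + 1).
        cov[i][j_star] = sup[i][j_star] / new_size
    h = len(acc[0])
    acc.append([1.0 if j == j_star else 0.0 for j in range(h)])  # new class
    cov.append([1.0 / new_size if j == j_star else 0.0 for j in range(h)])
    return acc, cov

acc, cov, sup = [[1.0, 0.0]], [[0.5, 0.0]], [[1, 0]]
print(add_object_case2(acc, cov, sup, j_star=0, d_sizes=[2, 1]))
```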