MINISTRY OF EDUCATION AND TRAINING
VIETNAM ACADEMY OF SCIENCE AND TECHNOLOGY
GRADUATE UNIVERSITY OF SCIENCE AND TECHNOLOGY
Do Thi Lan Anh
MINING DECISION LAWS ON THE DATA BLOCK
Major: Computer science
Code: 9 48 01 01
SUMMARY OF DOCTORAL THESIS IN COMPUTER SCIENCE
Ha Noi – 2020
The thesis was completed at: Graduate University of Science and Technology, Vietnam Academy of Science and Technology.
Scientific supervisor: Assoc. Prof. Dr. Trinh Dinh Thang
Reviewer 1: Assoc. Prof. Dr. Nguyen Huu Quynh
Reviewer 2: Assoc. Prof. Dr. Do Nang Toan
Reviewer 3: Assoc. Prof. Dr. Pham Van Cuong
The thesis will be defended before the Academy-level PhD Thesis Evaluation Council, meeting at the Graduate University of Science and Technology, Vietnam Academy of Science and Technology, at … o'clock, date … month … year 202…
The thesis can be found at:
- Graduate University of Science and Technology’s Library
- National Library of Vietnam
INTRODUCTION
1. The urgency of the thesis
Mining decision laws is the process of determining decision laws on a given decision table, serving the object classification problem. It is one of the popular data mining techniques and has been studied by many domestic and foreign experts, on both the relational model and its extended models.
Research in the world and in our country aims at finding meaningful knowledge, especially laws, on different data models and in different research directions. Approaching the data block model in order to track the laws arising in a process that changes over time and by period is the intended contribution of this thesis.
2. The objectives of the thesis
The thesis focuses on solving three problems:
- To find decision laws on the data block and on block slices.
- To find decision laws between object groups on a block whose index attribute values change, particularly when attribute values are smoothed or roughened.
- To find decision laws between object groups on a block when the block's elements are added or removed.
3. Layout of the thesis
The thesis consists of an introduction, three content chapters, conclusions, and references.
Chapter 1 presents the basic concepts of the data block, data mining, mining decision laws, and equivalence relations.
Chapter 2 presents two research results. The first is the MDLB algorithm, proposed to find decision laws on the block and on block slices. The second is the MDLB_VAC algorithm, proposed to find decision laws on the block when attribute values change. In addition, the chapter presents theoretical results on block mining, analyzes the complexity of the proposed algorithms, and tests them experimentally.
Chapter 3 builds a model for increasing or decreasing the object set of decision blocks, proposes two incremental algorithms, MDLB_OSC1 and MDLB_OSC2, to find decision laws when the block's object set changes, and tests them experimentally.
CHAPTER 1. SOME BASIC KNOWLEDGE
1.1 Data mining
1.1.1 Definition of data mining
Data mining is the main stage in the process of knowledge discovery in databases. The output of this process is latent knowledge extracted from data, which supports forecasting and decision making in business, management, production activities, etc.
1.1.2 Some data mining techniques
1.2.1 Information System
Definition 1.1 (Information system)
An information system is a quadruple S = (U, A, V, f), where U is a finite, non-empty set of objects (also known as the universe), A is a finite, non-empty set of attributes, and V is the set of values, with V = ⋃_{a∈A} V_a, where V_a is the value set of the attribute a ∈ A; f is the information function f: U × A → V such that ∀a ∈ A, ∀u ∈ U: f(u, a) ∈ V_a.
1.2.2 Indiscernibility Relation
Given the information system S = (U, A, V, f), each attribute subset P ⊆ A determines a binary relation on U, denoted IND(P), defined as follows:
IND(P) = {(u, v) ∈ U × U | u(a) = v(a), ∀a ∈ P}.
IND(P) is called an indiscernibility relation.
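To make the relation concrete, here is a minimal Python sketch (the dict-based data layout, helper name, and toy example are illustrative assumptions, not from the thesis) that partitions a universe into the equivalence classes of IND(P):

```python
# Minimal sketch: equivalence classes of IND(P) on a toy information system.
# The dict-based layout and the helper name are illustrative assumptions;
# the thesis defines f abstractly as f: U x A -> V.

def ind_classes(universe, f, P):
    """Group objects: u ~ v iff f(u, a) == f(v, a) for every a in P."""
    classes = {}
    for u in universe:
        signature = tuple(f(u, a) for a in P)  # u's values on P
        classes.setdefault(signature, set()).add(u)
    return list(classes.values())

# Toy example (hypothetical data).
data = {
    "u1": {"color": "red",  "size": "S"},
    "u2": {"color": "red",  "size": "M"},
    "u3": {"color": "blue", "size": "M"},
}
f = lambda u, a: data[u][a]
print(ind_classes(data, f, ["color"]))  # [{'u1', 'u2'}, {'u3'}]
```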
1.2.3 Decision table
A decision table is a special information system in which the attribute set A is divided into two disjoint non-empty sets C and D (A = C ∪ D, C ∩ D = ∅), called the conditional attribute set C and the decision attribute set D, respectively.
The decision table is denoted DS = (U, C ∪ D, V, f), or simply DS = (U, C ∪ D).
1.2.4 Decision law
Definition 1.4 (Decision law)
Given the decision table DS = (U, C ∪ D), suppose U/C = {C_1, C_2, …, C_m} and U/D = {D_1, D_2, …, D_n} are the partitions generated by C and D. For C_i ∈ U/C, D_j ∈ U/D, a decision law is presented as: C_i → D_j, i = 1..m, j = 1..n.
1.3 The data block model
1.3.1 The block
Definition 1.8
Let R = (id; A_1, A_2, …, A_n), where id is a non-empty finite index set and each A_i (i = 1..n) is an attribute with a corresponding value domain dom(A_i). A block r on R, denoted r(R), consists of a finite number of elements, each of which is a family of mappings from the index set id to the value domains of the attributes A_i (i = 1..n):
t ∈ r(R) ⟺ t = {t_i : id → dom(A_i)}, i = 1..n.
The block is denoted by r(R) or r(id; A_1, A_2, …, A_n); when there is no fear of confusion, we simply write r.
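As an illustration only (the thesis prescribes no concrete encoding), an element t = {t_i : id → dom(A_i)} can be stored as a mapping from each attribute to a mapping from index points to values; all names below are hypothetical:

```python
# Illustrative encoding (an assumption, not notation from the thesis): one
# block element t is a family of mappings t_i: id -> dom(A_i), stored here
# as a dict keyed by attribute, each value a dict keyed by index point.
id_set = ["x1", "x2"]                    # the index set id
t = {
    "A1": {"x1": 5, "x2": 7},            # t_1: id -> dom(A1)
    "A2": {"x1": "on", "x2": "off"},     # t_2: id -> dom(A2)
}
# The slice at point x1 keeps only the values of t at x1 (see Section 1.3.2).
slice_x1 = {a: {"x1": values["x1"]} for a, values in t.items()}
print(slice_x1)  # {'A1': {'x1': 5}, 'A2': {'x1': 'on'}}
```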
1.3.2 The block's slice
Let R = (id; A_1, A_2, …, A_n) and let r(R) be a block over R. For each x ∈ id, we denote by r(R_x) the slice of the block at x, which is a block with R_x = ({x}; A_1, A_2, …, A_n).
Here, for simplicity, we use the notation:
x^(i) = (x; A_i); id^(i) = {x^(i) | x ∈ id}.
We call x^(i) (x ∈ id, i = 1..n) the index attributes of the block scheme R = (id; A_1, A_2, …, A_n).
1.3.3 Relational algebra on the block
The relational algebra operations on blocks include: subtraction, Cartesian product, Cartesian product with an index set, projection, and division.
1.4 Conclusion of chapter 1
Chapter 1 of the thesis presents an overview of data mining, data mining techniques, knowledge of mining decision laws, and equivalence classes. The last part of the chapter presents the basic concepts of the data block model: blocks, block slices, and relational algebra on blocks. This knowledge is the basis for the issues presented in the next chapters.
CHAPTER 2. MINING DECISION LAWS ON A DATA BLOCK WITH VARIABLE ATTRIBUTE VALUES
2.1 Some concepts built on the block
2.1.1 Information block
Definition 2.1
Let R = (id; A_1, A_2, …, A_n) be a block scheme and r a block over R. Then an information block is a quadruple IB = (U, A, V, f), where U is the set of objects of r, called the object space; A is the set of index attributes of R; V is the set of attribute values; and f is the information function.
2.1.2 Indiscernibility relation on the block
For each index attribute set P ⊆ A, we define an equivalence relation, denoted IND(P), as follows:
IND(P) = {(u, v) ∈ U × U | ∀x^(i) ∈ P: f(u, x^(i)) = f(v, x^(i))},
called the indiscernibility relation on the block.
2.1.3 Decision block
Definition 2.5
Let IB = (U, A, V, f) be an information block, where U is the space of objects and A = ⋃_{i=1..n} id^(i). Suppose A is divided into two sets C and D such that C = ⋃_{i=1..k} id^(i) and D = ⋃_{i=k+1..n} id^(i), with C ∪ D = A and C ∩ D = ∅. Then IB is called a decision block, denoted DB = (U, C ∪ D, V, f).
2.1.4 Decision laws on the block and on the slice
Suppose U/C = {C_1, …, C_m}, U/D = {D_1, …, D_k}, and U/C_x, U/D_x are, correspondingly, the partitions generated by C, D, C_x, D_x. A decision law on the block is denoted by:
C_i → D_j, i = 1..m, j = 1..k,
and on the slice at point x it is denoted by:
C_xp → D_xq, q ∈ {1, 2, …, h_x}, x ∈ id.
Then the support, accuracy, and coverage of the decision law C_i → D_j on the block are:
Sup(C_i, D_j) = |C_i ∩ D_j|, Acc(C_i, D_j) = |C_i ∩ D_j| / |C_i|, Cov(C_i, D_j) = |C_i ∩ D_j| / |D_j|.
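Assuming the standard rough-set formulas reconstructed above, the three measures of a law C_i → D_j follow directly from the two equivalence classes; a minimal sketch with toy data:

```python
# Sketch of the three measures of a decision law Ci -> Dj, assuming the
# reconstructed definitions: Sup = |Ci ∩ Dj|, Acc = Sup/|Ci|, Cov = Sup/|Dj|.

def law_measures(Ci: set, Dj: set):
    sup = len(Ci & Dj)           # support: objects in both classes
    return sup, sup / len(Ci), sup / len(Dj)   # (Sup, Acc, Cov)

Ci = {"u1", "u2", "u3"}          # a conditional equivalence class (toy data)
Dj = {"u2", "u3", "u4"}          # a decision equivalence class (toy data)
print(law_measures(Ci, Dj))      # (2, 0.666..., 0.666...)
```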
Definition 2.9
Let DB = (U, C ∪ D) be a decision block, C_i ∈ U/C, D_j ∈ U/D the conditional and decision equivalence classes generated by C and D, respectively, and C_i → D_j a decision law on the block DB, i = 1..m, j = 1..k.
- If Acc(C_i → D_j) = 1 then C_i → D_j is called a certain decision law.
- If 0 < Acc(C_i → D_j) < 1 then C_i → D_j is called an uncertain decision law.
Definition 2.10
Let DB = (U, C ∪ D) be a decision block, with C_i ∈ U/C, D_j ∈ U/D, i = 1..m, j = 1..k, the conditional and decision equivalence classes generated by C and D, respectively, and let α, β be two given thresholds (α, β ∈ (0, 1)). If Acc(C_i, D_j) ≥ α and Cov(C_i, D_j) ≥ β, then C_i → D_j is called a meaningful decision law.
2.2 The algorithm for mining decision laws on the data block and block slices (MDLB)
The MDLB algorithm consists of the following steps (a sketch is given after the list):
- Step 1: Determine the conditional and decision equivalence classes on the block (on the slices).
- Step 2: Calculate the support matrix on the block (on the slice).
- Step 3: Calculate the accuracy matrix and the coverage matrix.
- Step 4: Find the decision laws on the block.
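The following is a hedged Python sketch of these four steps on a single block; it is not the thesis's own pseudocode, and the data layout, function names, and thresholds α, β (from Definition 2.10) are illustrative assumptions:

```python
# Hedged sketch of the four MDLB steps on one block. Assumptions: objects in
# a dict with f(u, a) giving u's value on attribute a; Sup/Acc/Cov as
# reconstructed in Section 2.1.4; alpha/beta as in Definition 2.10.

def partition(universe, f, attrs):
    """Step 1: equivalence classes of the indiscernibility relation."""
    classes = {}
    for u in universe:
        classes.setdefault(tuple(f(u, a) for a in attrs), set()).add(u)
    return list(classes.values())

def mdlb(universe, f, C, D, alpha, beta):
    UC, UD = partition(universe, f, C), partition(universe, f, D)
    laws = []
    for i, Ci in enumerate(UC):
        for j, Dj in enumerate(UD):
            sup = len(Ci & Dj)                  # Step 2: support matrix entry
            if sup == 0:
                continue
            acc, cov = sup / len(Ci), sup / len(Dj)   # Step 3
            if acc >= alpha and cov >= beta:    # Step 4: meaningful laws only
                laws.append((i, j, sup, acc, cov))
    return laws

data = {"u1": {"c": 0, "d": "yes"}, "u2": {"c": 0, "d": "yes"},
        "u3": {"c": 1, "d": "no"}}
f = lambda u, a: data[u][a]
print(mdlb(data, f, ["c"], ["d"], alpha=0.5, beta=0.5))
```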
2.3 Mining decision laws on the block when index attribute values change
Definition 2.11 (Smoothing an index attribute value on the block)
Let DB = (U, C ∪ D, V, f) be a decision block, where U is the space of objects, a ∈ C ∪ D, and V_a is the set of existing values of the index attribute a. Suppose Z = {x_s ∈ U | f(x_s, a) = z} is the set of objects whose value on the index attribute a is z. If Z is partitioned into two sets W and Y such that Z = W ∪ Y, W ∩ Y = ∅, with W = {x_p ∈ U | f(x_p, a) = w, w ∉ V_a} and Y = {x_q ∈ U | f(x_q, a) = y, y ∉ V_a}, then we say the value z of the index attribute a is smoothed into two new values w and y.
Definition 2.12 (Roughening an index attribute value on the block)
Let DB = (U, C ∪ D, V, f) be a decision block, where U is the space of objects, a ∈ C ∪ D, and V_a is the set of existing values of the index attribute a. Suppose f(x_p, a) = w and f(x_q, a) = y are, respectively, the values of x_p and x_q on the index attribute a (p ≠ q). If at some moment we have f(x_p, a) = f(x_q, a) = z (z ∉ V_a), then we say the two values w and y of a are roughened into the new value z.
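A toy illustration of the two operations (the column layout and concrete values are assumptions): smoothing splits the objects carrying z between two new values w and y, while roughening merges two existing values into one new value:

```python
# Toy illustration of Definitions 2.11 and 2.12; the column layout and the
# concrete values are assumptions. `col` holds the values of one index
# attribute a, keyed by object.
col = {"u1": "z", "u2": "z", "u3": "t"}

# Smoothing: objects with value z are split into W and Y and receive the new
# values w and y (neither may already occur in the column).
W = {"u1"}
for u in col:
    if col[u] == "z":
        col[u] = "w" if u in W else "y"
print(col)  # {'u1': 'w', 'u2': 'y', 'u3': 't'}

# Roughening: the two existing values w and y merge into one new value z2.
for u in col:
    if col[u] in ("w", "y"):
        col[u] = "z2"
print(col)  # {'u1': 'z2', 'u2': 'z2', 'u3': 't'}
```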
Theorem 2.1
Let DB = (U, C ∪ D, V, f) be a decision block, where U is the space of objects, a ∈ C ∪ D, and V_a is the set of existing values of the index attribute a. Then two equivalence classes E_p, E_q (E_p, E_q ∈ U/E, E ∈ {C, D}) are roughened into a new equivalence class E_s if and only if ∀a_j ≠ a: f(E_p, a_j) = f(E_q, a_j).
Theorem 2.2
Let DB = (U, C ∪ D, V, f) be a decision block, where U is the space of objects, a ∈ C ∪ D, and V_a is the set of existing values of the index attribute a. Then an equivalence class E_s (E_s ∈ U/E, E ∈ {C, D}) is smoothed into two new equivalence classes E_p, E_q if and only if we can set f(E_p, a) = w, f(E_q, a) = y and E_p ∪ E_q = E_s, with w, y ∉ V_a, w ≠ y.
Theorem 2.3
Let DB = (U, C ∪ D, V, f) be a decision block, and let α, β be two given thresholds (α, β ∈ (0, 1)). If C_i → D_j is a meaningful decision law on the decision block, then it is also a meaningful decision law on any slice of the decision block at x ∈ id.
2.3.1 Smoothing and roughening the conditional equivalence classes on the decision block and on the slice
Proposition 2.3
Let DB = (U, C ∪ D, V, f) be a decision block, a = x^(i) ∈ C, V_a the set of existing values of the conditional index attribute a, and suppose the value z of a is smoothed into two new values w and y.
If the conditional equivalence class C_s ∈ U/C (f(C_s, a) = z) is smoothed into two new conditional equivalence classes C_p, C_q (f(C_p, a) = w, f(C_q, a) = y, with w, y ∉ V_a), then on the slice r_x there exists an equivalence class C_xi satisfying C_s ⊆ C_xi that is also smoothed into two new conditional equivalence classes C_xi′ and C_xi′′ satisfying C_p ⊆ C_xi′, C_q ⊆ C_xi′′ (f(C_xi′, a) = w, f(C_xi′′, a) = y).
Proposition 2.5
Let DB = (U, C ∪ D, V, f) be a decision block, a = x^(i) ∈ C, V_a the set of existing values of the conditional index attribute a, and suppose the values w and y of a are roughened into the new value z.
If two conditional equivalence classes C_p, C_q ∈ U/C (f(C_p, a) = w, f(C_q, a) = y) are roughened into a new conditional equivalence class C_s ∈ U/C (f(C_s, a) = z), then on the slice r_x there exist two conditional equivalence classes C_xi, C_xj satisfying C_p ⊆ C_xi, C_q ⊆ C_xj that are also roughened into a new conditional equivalence class C_xk satisfying C_s ⊆ C_xk.
2.3.2 Smoothing and roughening the decision equivalence classes on the decision block and on the slice
Proposition 2.7
Let DB = (U, C ∪ D, V, f) be a decision block, a = x^(i) ∈ D, V_a the set of existing values of the decision index attribute a, and suppose the value z of a is smoothed into two new values w and y.
If the decision equivalence class D_s ∈ U/D (f(D_s, a) = z) is smoothed into two new decision equivalence classes D_p, D_q (f(D_p, a) = w, f(D_q, a) = y, with w, y ∉ V_a), then on the slice r_x there exists a decision equivalence class D_xi satisfying D_s ⊆ D_xi that is also smoothed into two new decision equivalence classes D_xi′ and D_xi′′ satisfying D_p ⊆ D_xi′, D_q ⊆ D_xi′′ (f(D_xi′, a) = w, f(D_xi′′, a) = y).
Proposition 2.9
Let DB = (U, C ∪ D, V, f) be a decision block, a = x^(i) ∈ D, V_a the set of existing values of the decision index attribute a, and suppose the values w and y of a are roughened into the new value z.
If two decision equivalence classes D_p, D_q ∈ U/D (f(D_p, a) = w, f(D_q, a) = y) are roughened into a new decision equivalence class D_s ∈ U/D (f(D_s, a) = z), then on the slice r_x there exist two decision equivalence classes D_xi, D_xj satisfying D_p ⊆ D_xi, D_q ⊆ D_xj that are also roughened into a new decision equivalence class D_xk satisfying D_s ⊆ D_xk.
2.3.4 The algorithm for mining decision laws when smoothing or roughening index attribute values on the block and the slice (MDLB_VAC)
The MDLB_VAC algorithm consists of the following steps (a sketch of the incremental step is given after the list):
Step 1: Calculate the support matrix Sup(C, D) of the original block.
Step 2: Incrementally calculate the support matrix Sup(C′, D′) on the block after roughening/smoothing the value of the index attribute.
Step 3: Calculate the accuracy matrix Acc(C′, D′) and the coverage matrix Cov(C′, D′) after roughening/smoothing the value of the index attribute, from the matrix Sup(C′, D′).
Step 4: Find the decision laws on the block.
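The incremental idea of Step 2 can be sketched as follows, under the reconstruction Sup[i][j] = |C_i ∩ D_j|: roughening two classes merges their support rows by addition, while smoothing recomputes only the rows of the split class. This is an illustrative sketch, not the thesis's pseudocode:

```python
# Illustrative sketch of incremental support-matrix maintenance (Step 2 of
# MDLB_VAC), under the reconstruction Sup[i][j] = |Ci ∩ Dj|. Only the rows
# touched by the value change are rebuilt; every other entry is reused.

def roughen_rows(sup, p, q):
    """Classes Cp and Cq merge: their support rows simply add up."""
    merged = [sup[p][j] + sup[q][j] for j in range(len(sup[p]))]
    return [row for k, row in enumerate(sup) if k not in (p, q)] + [merged]

def smooth_rows(sup, s, Cp, Cq, UD):
    """Class Cs splits into Cp, Cq: recompute just the two new rows."""
    kept = [row for k, row in enumerate(sup) if k != s]
    kept.append([len(Cp & Dj) for Dj in UD])
    kept.append([len(Cq & Dj) for Dj in UD])
    return kept

# Tiny example: 2 conditional classes x 2 decision classes.
sup = [[2, 0], [1, 1]]
print(roughen_rows(sup, 0, 1))  # [[3, 1]]
```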
2.4 Complexity of the support matrix algorithms on the block and on the slice
Proposition 2.13: The support matrix algorithm for the decision block and for the slice at x ∈ id has the same complexity, O(|U|²).
Proposition 2.14: The support matrix algorithm for the decision block and for the slice at x ∈ id after roughening the values of the conditional index attribute has the same complexity, O(|U|²).
Proposition 2.15: The support matrix algorithm for the decision block and for the slice at x ∈ id after smoothing the values of the conditional index attribute has the same complexity, O(|U|²).
CHAPTER 3. MINING DECISION LAWS ON A BLOCK WITH A CHANGED OBJECT SET
3.1 Model of adding and removing objects on the block and slice
Proposition 3.1
Let DB = (U, C ∪ D, V, f) be a decision block, and let AN and DM be the sets of objects added to and removed from the decision block DB. Then:
…, i = m+1..m+p, j = 1..h+q.
Proposition 3.3
Let DB = (U, C ∪ D, V, f) be a decision block, and let AN and DM be the sets of objects added to and removed from the decision block DB. Then: …
3.2.1 Adding an object x into the decision block
Case 1: The added object creates a new conditional class and a new decision class. Then Acc(C′_{m+1}, D′_{h+1}) = 1 and Cov(C′_{m+1}, D′_{h+1}) = 1;
∀j = 1..h: Acc(C′_{m+1}, D′_j) = Cov(C′_{m+1}, D′_j) = 0;
∀i = 1..m: Acc(C′_i, D′_{h+1}) = Cov(C′_i, D′_{h+1}) = 0.
Otherwise, ∀i = 1..m, j = 1..h:
Acc(C′_i, D′_j) = Acc(C_i, D_j) and Cov(C′_i, D′_j) = Cov(C_i, D_j).
Case 2: The added object creates only a new conditional class, falling into an existing decision class D_j*. Then (a sketch of this update follows):
Acc(C′_{m+1}, D′_j*) = 1 and Cov(C′_{m+1}, D′_j*) = 1 / (|D_j*| + 1).
If k ≠ j* then Acc(C′_{m+1}, D′_k) = Cov(C′_{m+1}, D′_k) = 0.
If i ≠ m+1 then Acc(C′_i, D′_j*) = Acc(C_i, D_j*) and Cov(C′_i, D′_j*) = |C_i ∩ D_j*| / (|D_j*| + 1).
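Under the reconstructed Case 2 formulas above, only column j* of the coverage matrix changes for the old classes, plus one new row for C′_{m+1}; a hedged sketch (matrix layout and names are assumptions, not the thesis's MDLB_OSC pseudocode):

```python
# Hedged sketch of the Case 2 update after adding one object, following the
# reconstructed formulas above. The m x h matrix layout is an assumption.

def add_object_case2(acc, cov, sup, j_star, d_sizes):
    """acc, cov, sup: m x h matrices; d_sizes[j] = |Dj| before the insert."""
    new_size = d_sizes[j_star] + 1
    for i in range(len(cov)):
        # Old rows: Acc is unchanged; Cov on column j* is rescaled to the
        # grown decision class: |Ci ∩ Dj*| / (|Dj*| + 1).
        cov[i][j_star] = sup[i][j_star] / new_size
    h = len(acc[0])
    acc.append([1.0 if j == j_star else 0.0 for j in range(h)])  # new class
    cov.append([1.0 / new_size if j == j_star else 0.0 for j in range(h)])
    return acc, cov

acc, cov, sup = [[1.0, 0.0]], [[0.5, 0.0]], [[1, 0]]
print(add_object_case2(acc, cov, sup, j_star=0, d_sizes=[2, 1]))
```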