
The lattice-based approaches for

mining association rules: a review

Tuong Le1,2 and Bay Vo3,4*

1 Division of Data Science, Ton Duc Thang University, Ho Chi Minh City, Vietnam
2 Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam
3 Faculty of Information Technology, Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam
4 College of Electronics and Information Engineering, Sejong University, Seoul, Republic of Korea

*Correspondence to: bayvodinh@gmail.com

Conflict of interest: The authors have declared no conflicts of interest for this article.

The traditional methods for mining association rules (ARs) include two phases: mining frequent itemsets (FIs)/frequent closed itemsets (FCIs)/frequent maximal itemsets (FMIs) and generating ARs from FIs/FCIs/FMIs. Lattice-based approaches (LBAs) for mining ARs are newer approaches that also include two phases: building a frequent itemset lattice (FIL)/frequent closed itemset lattice (FCIL) and generating ARs from the lattice. The total mining time of LBAs is lower than that of the traditional methods. Besides, the most important advantage of LBAs is that the algorithms build the lattice only once and can then mine ARs with many different confidence thresholds or many different minimum supports (the thresholds have to be greater than or equal to the threshold used to build the lattice) without mining FIs/FCIs again. In this article, we describe a number of existing LBAs for mining ARs on static databases, including lattice building and rule generation. In addition, in today's online systems, the data often change through operations such as insert, delete, and update. Hence, a number of LBAs for mining ARs on dynamic databases are also covered. Finally, the complexity of the LBAs for mining ARs is discussed. © 2016 John Wiley & Sons, Ltd

How to cite this article:

WIREs Data Mining Knowl Discov 2016, 6:140–151 doi: 10.1002/widm.1181

INTRODUCTION

Data mining is the process of analyzing data to find knowledge for use in intelligent systems. Many problems have been studied in this field, such as mining association rules (ARs), classification,1–6 clustering,7,8 text mining,9 and their applications.2,10 Mining ARs, including ARs, minimal non-redundant association rules (MNARs), and most generalization association rules (MGARs), is a model widely used in market basket analysis, online e-commerce such as Amazon and Alibaba, and several other recommendation systems. Traditional approaches for mining ARs consist of two steps: mining frequent itemsets (FIs)/frequent closed itemsets (FCIs)/frequent maximal itemsets (FMIs),11,12,13 and generating rules from those itemsets. Some variants of FIs have been proposed, such as high utility itemsets (itemsets whose utility satisfies a given threshold),14–27 top-k high utility itemsets (the k itemsets with the highest utility),28 weighted patterns (patterns with weighted items),29–31 erasable itemsets (itemsets that can be eliminated without greatly affecting the factory's profit),32–34 weighted erasable patterns (erasable itemsets that consider the distinct weight of each item),35,36 and so on. Besides, several types of representations that limit the number of FIs, such as FCIs,37–41 FMIs,42–47 top-k FIs,48,49 top-rank-k FIs,50,51 and FIs with constraints,52 have also been proposed. In traditional approaches for mining ARs, researchers usually focus on the first phase (mining FIs/FCIs/FMIs). However, the second phase (rule generation) takes a lot of time when the number of FIs/FCIs/FMIs is large. Therefore, lattice-based approaches (LBAs) for mining ARs were proposed to overcome this weakness. Generally, these approaches build a frequent itemset lattice (FIL)/frequent closed itemset lattice (FCIL) in the first phase. In the next phase, they only traverse the lattice to generate ARs. Because generating rules from a lattice has lower complexity than traditional approaches such as Apriori or hash tables, the total mining time of LBAs is lower than that of the traditional methods. Moreover, the largest advantage of LBAs is that the algorithms build the lattice only once and can then mine ARs with many different confidence thresholds or many different minimum supports (the thresholds have to be greater than or equal to the threshold used to build the lattice) without mining FIs/FCIs/FMIs again. Therefore, LBAs are now extensively used to mine ARs. In addition, in today's online systems, the data often change through operations such as add, delete, and update, especially in e-commerce systems, which raises the need to adapt AR mining methods to the new requirements. There have been several studies on mining patterns/rules on dynamic databases. In this article, we review LBAs for mining ARs, including the lattice building and rule generation phases. Furthermore, a number of LBAs for mining ARs on dynamic databases are surveyed. Next, the complexity of LBAs for mining ARs is analyzed. Finally, some challenges of the LBAs and their potential applications in the near future are introduced.

The rest of the article is organized as follows. The section "Classical Approaches for Mining ARs" presents the classical approaches for mining FIs/FCIs and mining ARs/MNARs/MGARs. In the section "The FIL/FCIL Building," we report the existing approaches for building FIL and FCIL. Next, the section "LBAs for Mining ARs" presents a number of LBAs for mining ARs. Some incremental LBAs for mining ARs are subsequently presented in the section "LBAs for Mining ARs on Dynamic Databases." Then, the section "Complexity Analysis" gives the complexity analysis of LBAs for mining ARs. The conclusion is presented in the section "Conclusion and Future Researches."

CLASSICAL APPROACHES

FOR MINING ARS

Consider a database (DB) comprising n transactions, each of which contains a number of items. The transaction database DBe in Table 1 is used as an example for illustrative purposes throughout this article.

TABLE 1 | A Transaction Database (DBe) Example

Transaction  Items
1            A, C, T, W
2            C, D, W
3            A, C, T, W
4            A, C, D, W
5            A, C, D, T, W
6            C, D, T

The support of an itemset X, denoted by σ(X), is the number of transactions in DB that contain all items of X. An itemset X is an FI if and only if σ(X) ≥ ⌈minSup × n⌉, in which minSup is a user-given minimum support threshold. Currently, there are many algorithms for mining FIs, which may be divided into three main groups. (1) Methods that use a candidate generate-and-test strategy: they generate frequent 1-itemsets, which are then used to generate candidate 2-itemsets, and so on until no more candidates can be generated. Apriori53 and BitTableFI54 are exemplar algorithms. (2) Methods that adopt a divide-and-conquer strategy: they compress DB into a tree structure and mine FIs from this tree by divide and conquer. FP-Growth55 and FP-Growth*56 are exemplar algorithms. (3) Methods that use a hybrid approach: these methods use vertical data formats to compress DB and also mine FIs with a divide-and-conquer strategy. Eclat,57 dEclat,58 Index-BitTableFI,59 DBV-FI,60 PrePost,61 FIN,62 NSFI,63 and PrePost+64 are some examples.

An FI is called an FCI if none of its supersets has the same support. For instance, consider DBe and minSup = 50%. The two itemsets AW and ACW are FIs because σ(AW) = σ(ACW) = 4 > ⌈minSup × n⌉ = ⌈50% × 6⌉ = 3. However, AW is not an FCI because ACW is its superset and has the same support as AW. Of the two, only ACW is an FCI. Most of the previously proposed algorithms for mining FCIs can be categorized as (1) generate-and-test, (2) divide-and-conquer, or (3) hybrid methods. The generate-and-test (Apriori-based) approach uses a level-wise search to mine FCIs. A well-known algorithm is Close.65 The divide-and-conquer approach uses compact data structures to mine FCIs efficiently. Examples are CLOSET39 and CLOSET+.56 The hybrid approaches integrate the previous two. Typically, the database is first transformed into a vertical or compressed data format. The approach then uses pruning properties to quickly prune nonclosed itemsets. Examples are CHARM, dCHARM,58 DBV-Miner,41 DCI_PLUS,66 and NAFCP.37
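The definitions of support, FI, and FCI above can be checked with a brute-force sketch. It is not one of the cited algorithms: it simply enumerates every itemset, which is only feasible for a toy database. The `DB` literal is an assumption, chosen to be consistent with the tidsets shown for DBe in Figure 1, and the function names are illustrative.

```python
from itertools import combinations
from math import ceil

# Assumed example database, consistent with the tidsets shown in Figure 1.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}


def support(itemset, db=DB):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in db.values() if items <= set(t))


def frequent_itemsets(db=DB, min_sup=0.5):
    """Brute-force enumeration of all FIs (exponential in the number of items)."""
    threshold = ceil(min_sup * len(db))
    items = sorted(set().union(*map(set, db.values())))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = support(combo, db)
            if s >= threshold:
                result[frozenset(combo)] = s
    return result


def closed_itemsets(fis):
    """Keep the FIs that have no proper superset with the same support."""
    return {x: s for x, s in fis.items()
            if not any(x < y and s == sy for y, sy in fis.items())}


fis = frequent_itemsets()
fcis = closed_itemsets(fis)
```

With minSup = 50%, `fis` contains both AW and ACW, while `fcis` keeps ACW and drops AW, matching the example in the text.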

An AR is an implication expression of the form X → Y, where X and Y are disjoint itemsets, i.e., X ∩ Y = ∅. The strength of an AR can be measured in terms of its confidence. The confidence of a rule (c) determines how frequently items in Y appear in transactions that contain X: c(X → Y) = σ(X ∪ Y)/σ(X). Each frequent k-itemset, XY, can produce up to 2^k − 2 ARs, ignoring rules that have empty antecedents or consequents (∅ → XY or XY → ∅). An AR can be extracted by partitioning the itemset XY into two nonempty subsets, X and Y, such that X → Y satisfies the confidence threshold (minConf). Note that all such rules already meet the support threshold because they are generated from an FI. Because rule generation from FIs is simple, there are relatively few studies on this stage; many studies focus on the stage of mining FIs/FCIs. Agrawal and Srikant53 introduced the following property to reduce the search space: if c(ABC → D) < minConf or c(ABD → C) < minConf, then c(AB → CD) < minConf as well, because moving an item from the antecedent to the consequent leaves σ(ABCD) unchanged and cannot decrease the support of the antecedent. An algorithm based on this property was proposed to efficiently mine ARs from the FIs/FCIs generated in stage 1, and this method has been used to mine ARs from FIs/FCIs so far.
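As a rough illustration of rule generation from a single FI, the sketch below enumerates all 2^k − 2 partitions of an itemset and keeps those that reach minConf. It is a naive enumeration without the pruning property described above; the `DB` literal (assumed to match DBe) and the function names are illustrative.

```python
from itertools import combinations

# Assumed example database, consistent with the tidsets shown in Figure 1.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}


def support(itemset):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in DB.values() if items <= set(t))


def rules_from_itemset(itemset, min_conf):
    """Return (X, Y, confidence) for every split of `itemset` into nonempty X, Y."""
    items = frozenset(itemset)
    sup_xy = support(items)
    rules = []
    for k in range(1, len(items)):          # antecedent sizes 1 .. k-1
        for lhs in combinations(sorted(items), k):
            x = frozenset(lhs)
            conf = sup_xy / support(x)      # c(X -> Y) = sigma(X u Y) / sigma(X)
            if conf >= min_conf:
                rules.append(("".join(sorted(x)), "".join(sorted(items - x)), conf))
    return rules


all_rules = rules_from_itemset("ACW", 0.0)    # every candidate rule of ACW
exact_rules = rules_from_itemset("ACW", 1.0)  # rules with 100% confidence
```

For the 3-itemset ACW this yields 2^3 − 2 = 6 candidate rules, of which those with antecedents A, AC, and AW have confidence 100%.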

Let X be an FCI. An itemset Y is a generator of X if and only if Y ⊂ X and σ(X) = σ(Y). For example, AW is a generator of ACW, because AW ⊂ ACW and σ(AW) = σ(ACW) = 4. Similarly, A and AC are also generators of ACW. Let G(X) be the set of generators of X. A generator Y ∈ G(X) is a minimal generator if and only if no proper subset of Y is in G(X). For example, G(ACW) = {A, AC, AW}; therefore, the set of minimal generators of ACW is mG(ACW) = {A}.

An association rule R1: X1 → Y1 is an MNAR if there is no other AR R2: X2 → Y2 with σ(X1 ∪ Y1) = σ(X2 ∪ Y2), c(R1) = c(R2), X2 ⊆ X1, and Y2 ⊇ Y1. There are two kinds of MNARs: (1) exact rules (confidence = 100%), which have the form X′ → X, where X is an FCI and X′ ∈ mG(X); and (2) approximate rules (confidence < 100%), which have the form X′ → Y, in which X and Y are FCIs, X′ ∈ mG(X), and X ⊂ Y.
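The generator definitions can be tested directly on the example database. The following sketch computes G(X) and mG(X) by enumerating proper subsets; it is a brute-force check and not MG-CHARM or any other cited algorithm. The `DB` literal is assumed to match DBe, and the function names are illustrative.

```python
from itertools import combinations

# Assumed example database, consistent with the tidsets shown in Figure 1.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}


def support(itemset):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in DB.values() if items <= set(t))


def generators(closed):
    """Nonempty proper subsets of `closed` with the same support as `closed`."""
    items = sorted(closed)
    sup = support(closed)
    return {frozenset(c)
            for k in range(1, len(items))
            for c in combinations(items, k)
            if support(c) == sup}


def minimal_generators(closed):
    """Generators that have no proper subset among the generators."""
    gens = generators(closed)
    return {g for g in gens if not any(h < g for h in gens)}


g_acw = generators("ACW")
mg_acw = minimal_generators("ACW")
mg_actw = minimal_generators("ACTW")
```

Here `g_acw` is {A, AC, AW} and `mg_acw` is {A}, as in the text; for ATWC the minimal generators are AT and TW.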

Assume that there are two rules R1: X1 → Y1 and R2: X2 → Y2. Rule R1 is said to be more general than R2 (R1 ∝ R2) if and only if X1 ⊆ X2 and Y2 ⊆ Y1. Let R = {R1, R2, …, Rn} be the set of rules that satisfy the conditions of minSup and minConf. A rule Ri is said to have a higher precedence than another rule Rj, denoted as Ri > Rj, if Ri ∝ Rj and one of the following conditions holds: (1) c(Ri) > c(Rj); (2) c(Ri) = c(Rj) and σ(Ri) > σ(Rj). Let RMG be the set of the MGARs of R: RMG = {Rj ∈ R | ¬∃ Ri ∈ R: Ri > Rj}.

THE FIL/FCIL BUILDING

LBAs for mining ARs are divided into two phases: (1) building the lattice and (2) generating ARs from the lattice. This section presents the existing approaches for building lattices. Some of the existing approaches for mining ARs from the lattices are then introduced in the section "LBAs for Mining ARs."

The FIL Building

In 2009, Vo and Le67 proposed an algorithm (e.g., FIL-Building-2009) for building an FIL (e.g., FIL-2009) directly from the database (Table 2). In FIL-2009, each node in the lattice is a tuple ⟨X, Tidset, Children⟩, where X is a k-itemset, Tidset is the set of IDs of the transactions containing X, and Children = {Y | Y ∈ (k + 1)-itemsets and X ⊂ Y}. The FIL-2009 built for DBe in Table 1 with minSup = 50% is presented in Figure 1.

TABLE 2 | Existing Approaches for Building Frequent Itemset Lattice (FIL)

No  Name of Algorithm     Year  Name of FIL
1   FIL-Building-2009 67  2009  FIL-2009
2   FIL-Building-2011 68  2011  FIL-2011
3   TFIL 69               2014  FIL-2014

Although mining ARs from FIL-2009 is very effective, FIL-2009 is not an effective structure for mining MNARs. Therefore, in 2011, Vo and Le68 extended the structure of FIL-2009 (e.g., FIL-2011) by adding one field indicating whether or not a lattice node is a minimal generator, and another field indicating whether or not a lattice node is an FCI. These values are determined directly during lattice building. The structure is then used to effectively mine MNARs, as presented in the "LBAs for Mining ARs" section. For DBe in Table 1 and minSup = 50%, FIL-2011 is presented in Figure 2. In the figure, bold nodes and dashed nodes indicate FCIs and minimal generators, respectively.

When a node XA in FIL-2009 (or FIL-2011) is created, FIL-Building-2009 (or FIL-Building-2011) has to find all the nodes that are children of XA to update the lattice. This process first visits all children of X (Y ∈ X.Children). For each Y, the process visits all children of Y (YB ∈ Y.Children). For each YB, if XA ⊂ YB, the process adds YB to the children of XA (YB ∈ XA.Children). Considering FIL-2009 in Figure 1, when the algorithm creates the node TC, it has to consider all the child nodes of T, which are AT and TW. Next, the algorithm has to consider all the child nodes of AT and TW, which are {ATW, ATC} and {ATW}, respectively. However, considering the child nodes of TW does not find any node that is a child of TC: the node ATW is a duplicate, which makes the scan of the child nodes of TW unnecessary. To overcome this weakness, Vo et al.69 proposed a new structure for an FIL (e.g., FIL-2014) and the TFIL algorithm for building FIL-2014. Each node in the lattice has the form ⟨Itemset, Tidset, ChildrenEC, ChildrenL⟩, in which ChildrenEC contains the child nodes of Itemset obtained from the equivalence class feature, and ChildrenL contains the child nodes of Itemset obtained from the lattice feature. Because this algorithm does not scan all the child nodes of XA to update the lattice, TFIL needs less time to build FIL-2014 than FIL-Building-2009 needs to build FIL-2009 or FIL-Building-2011 needs to build FIL-2011. For DBe in Table 1 and minSup = 50%, FIL-2014 is presented in Figure 3.
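To make the node structure ⟨X, Tidset, Children⟩ concrete, here is a minimal sketch that builds a frequent itemset lattice for the example database. It uses a naive construction (enumerate all itemsets, then link every k-itemset to its frequent (k + 1)-supersets) and is not FIL-Building-2009, FIL-Building-2011, or TFIL. The `DB` literal is assumed to match DBe, and `Node` and `build_fil` are hypothetical names.

```python
from dataclasses import dataclass, field
from itertools import combinations
from math import ceil

# Assumed example database, consistent with the tidsets shown in Figure 1.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}


@dataclass
class Node:
    """One lattice node: itemset X, its tidset, and its (k+1)-itemset children."""
    itemset: frozenset
    tidset: frozenset
    children: list = field(default_factory=list)

    @property
    def support(self):
        return len(self.tidset)


def build_fil(db, min_sup):
    """Naive lattice construction: enumerate frequent itemsets, then link them."""
    threshold = ceil(min_sup * len(db))
    items = sorted(set().union(*map(set, db.values())))
    # The root is the empty itemset, which every transaction contains.
    nodes = {frozenset(): Node(frozenset(), frozenset(db))}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            tids = frozenset(t for t, tx in db.items() if x <= set(tx))
            if len(tids) >= threshold:
                nodes[x] = Node(x, tids)
    # Children of X are the frequent itemsets with exactly one more item.
    for x, node in nodes.items():
        for y, child in nodes.items():
            if len(y) == len(x) + 1 and x < y:
                node.children.append(child)
    return nodes


fil = build_fil(DB, 0.5)
```

The resulting lattice has the root plus 19 frequent itemsets; for instance, the node TC ends up with the children ATC and TWC.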

FIGURE 1 | FIL-2009 for DBe with minSup = 50%. Nodes, given as itemset (tidset, support): {} (root); A (1345, 4); D (2456, 4); T (1356, 4); W (12345, 5); C (123456, 6); AT (135, 3); AW (1345, 4); AC (1345, 4); DW (245, 3); DC (2456, 4); TW (135, 3); TC (1356, 4); WC (12345, 5); ATW (135, 3); ATC (135, 3); AWC (1345, 4); DWC (245, 3); TWC (135, 3); ATWC (135, 3).

FIGURE 2 | FIL-2011 for DBe with minSup = 50%.

The FCIL Building

In 2005, Zaki and Hsiao58 proposed CHARM-L to create an FCIL (e.g., FCIL-2005) (Table 3). The FCIL-2005 created by CHARM-L for DBe with minSup = 50% is shown in Figure 4. However, MNARs and MGARs cannot be generated directly from FCIL-2005: mining them requires a level-wise approach to generate the generators, which is inefficient in terms of mining time.

In 2013, Vo et al.70 proposed an approach for building an FCIL (e.g., FCIL-2013) effectively. First, FCIs with their minimal generators are mined using MG-CHARM.67 Then, an algorithm (e.g., FCIL-Building-2013) inserts the FCIs into FCIL-2013 with O(n × k) complexity, where n is the number of FCIs and k is the average number of child nodes in the lattice. Since k << n, the FCIL-Building-2013 algorithm is efficient. The FCIL-2013 created by FCIL-Building-2013 on DBe with minSup = 50% is shown in Figure 5.

In 2014, Szathmary et al.71 proposed Snow-Touch, a novel computation schema for iceberg lattices with generators. First, FCI computation is delegated to the Charm algorithm. Then, frequent generators (FGs) are extracted by Talky-G. Next, these two methods, together with an FG-to-FCI matching technique, form the Touch algorithm. Finally, the precedence relation is retrieved from the FCIs with FGs by the Snow algorithm, using a ground duality result from hypergraph theory. The result of Snow-Touch is the same as that of FCIL-Building-2013; for DBe with minSup = 50%, it is shown in Figure 5.

TABLE 3 | Existing Approaches for Building Frequent Closed Itemset Lattice (FCIL)

No  Name of Algorithm      Year  Name of FCIL
1   CHARM-L 58             2005  FCIL-2005
2   FCIL-Building-2013 70  2013  FCIL-2013

FIGURE 3 | FIL-2014 for DBe with minSup = 50%.

FIGURE 4 | FCIL-2005 for DBe with minSup = 50%.

FIGURE 5 | FCIL-2013 for DBe with minSup = 50%. Nodes, given as itemset (tidset, support; minimal generators where shown): {} (root); C (123456, 6); DC (2456, 4; D); TC (1356, 4; T); WC (12345, 5; W); DWC (245, 3; DW); AWC (1345, 4; A); ATWC (135, 3; AT, TW).
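The parent/child structure of an FCIL can be sketched by linking each closed itemset to its immediate closed supersets. The code below is a naive cubic-time construction, not CHARM-L, FCIL-Building-2013, or Snow-Touch. The `FCIS` table is an assumption, filled in with the closed itemsets and supports listed for DBe in Figure 5, and `build_fcil` is a hypothetical name.

```python
# Closed itemsets of the example database with their supports
# (assumed to match the nodes listed for Figure 5).
FCIS = {
    frozenset("C"): 6,
    frozenset("CW"): 5,
    frozenset("CD"): 4,
    frozenset("CT"): 4,
    frozenset("ACW"): 4,
    frozenset("CDW"): 3,
    frozenset("ACTW"): 3,
}


def build_fcil(fcis):
    """Map every node to its children: closed supersets with no closed itemset in between."""
    root = frozenset()
    nodes = [root, *fcis]
    children = {x: [] for x in nodes}
    for x in nodes:
        for y in fcis:
            # y is a child of x if x is a proper subset of y and no closed z lies strictly between.
            if x < y and not any(x < z < y for z in fcis):
                children[x].append(y)
    return children


fcil = build_fcil(FCIS)
```

In this lattice, C is the only child of the root, WC has the children AWC and DWC, and ATWC is reached from both AWC and TC.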

LBAS FOR MINING ARS

Besides traditional ARs, other types of ARs have been proposed, namely MNARs and MGARs. Table 4 lists the existing LBAs for mining ARs, MNARs, and MGARs.

LBA for Mining ARs

At first, LBA-ARs-200967 traverses all child nodes Lc of the root of FIL-2009 and calls a recursive function to traverse all nodes in the lattice (marking the visited nodes by turning a flag on). For each node Lc, the algorithm uses a queue (Ω) to traverse all child nodes of Lc (marking all visited nodes to reject duplicates). For each such node L, the algorithm computes the confidence of the rule Lc → L\Lc from the information stored in the nodes. If the confidence ≥ minConf, the rule is added to the results.

For example, LBA-ARs-2009 uses the FIL-2009 for DBe with minSup = 50%, shown in Figure 1, to generate ARs. Let minConf = 100%. Consider the first child node of the root, A, for which Ω = {AT, AW, AC} (Figure 6).

1. Let L = AC, the last element of Ω. We have c(A → C) = σ(AC)/σ(A) = 4/4 = 100% ≥ minConf. Hence, this rule is added to the results.
2. Let L = AW, the second element of Ω. We have c(A → W) = σ(AW)/σ(A) = 4/4 = 100% ≥ minConf. Hence, this rule is added to the results.
3. Let L = AT, the first element of Ω. We have c(A → T) = σ(AT)/σ(A) = 3/4 < minConf. Hence, the rule A → T is not added to the results.

Next, LBA-ARs-2009 proceeds recursively to generate all rules on the lattice.
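The traversal described above can be sketched as a breadth-first walk from each node Lc over its descendants, using a queue and a visited set. This is a simplified illustration under stated assumptions, not the published LBA-ARs-2009 code: the lattice is built naively, the `DB` literal is assumed to match DBe, and all names are illustrative.

```python
from collections import deque
from itertools import combinations

# Assumed example database, consistent with the tidsets shown in Figure 1.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}
MIN_COUNT = 3  # ceil(50% of 6 transactions)


def support(itemset):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in DB.values() if items <= set(t))


# Naive lattice: every frequent itemset with its support and its (k+1)-item children.
SUPPORT = {}
for k in range(1, 6):
    for combo in combinations("ACDTW", k):
        s = support(combo)
        if s >= MIN_COUNT:
            SUPPORT[frozenset(combo)] = s
CHILDREN = {x: [y for y in SUPPORT if len(y) == len(x) + 1 and x < y] for x in SUPPORT}


def rules_from_node(lc, min_conf):
    """Walk all descendants L of Lc and emit Lc -> (L minus Lc) when confident enough."""
    rules, visited = [], set()
    queue = deque(CHILDREN[lc])
    while queue:
        node = queue.popleft()
        if node in visited:          # a node can be reached through several parents
            continue
        visited.add(node)
        conf = SUPPORT[node] / SUPPORT[lc]
        if conf >= min_conf:
            rules.append((lc, node - lc, conf))
        queue.extend(CHILDREN[node])
    return rules


def mine_rules(min_conf):
    """Generate rules from every lattice node; the lattice itself is reused as is."""
    return [r for lc in SUPPORT for r in rules_from_node(lc, min_conf)]


rules_a_100 = rules_from_node(frozenset("A"), 1.0)
rules_a_75 = rules_from_node(frozenset("A"), 0.75)
```

Note that `mine_rules` can be called repeatedly with different minConf values without rebuilding `SUPPORT` and `CHILDREN`, which is the reuse advantage the article attributes to LBAs.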

LBA for Mining ARs with Interestingness Measures

After building FIL-2009, LBA-ARs-IM-201160 creates HT-FIs (a hash table of FIs) with two levels of keys: (1) the first level uses the length of the itemset as a key; (2) for itemsets sharing the same length, the algorithm uses hash tables with keys computed by Σ y over y ∈ Y (Y is the itemset whose support needs to be determined).

At first, LBA-ARs-IM-2011 traverses all child nodes Lc of the root of FIL-2009 and calls a recursive function to traverse all nodes in the lattice (marking the visited nodes by turning a flag on). The algorithm uses a queue to traverse all child nodes of Lc (marking all visited nodes to reject duplicates). For each child node L of Lc, the algorithm computes the measure value with the function vm(n, σ(Lc), λ(L\Lc), σ(L)), where n is the number of transactions, σ(Lc) is the support of Lc, σ(L) is the support of L, and λ(L\Lc) is the support of L\Lc retrieved from the hash table at level |L\Lc|, and adds the rule to ARs. A number of measures are shown in Table 5. In practice, the number of generated rules is very large. Therefore, a threshold on vm (minVM) is needed to shrink the rule set.

For example, LBA-ARs-IM-2011 uses the FIL-2009 for DBe with minSup = 50%, shown in Figure 1, to generate ARs with the lift measure, computed as (σ(L) × n)/(σ(Lc) × λ(L\Lc)). Let minVM = 1.2. Consider the first child node of the root, A, for which Ω = {AT, AW, AC} (Figure 7).

1. Let L = AC, the last element of Ω. We have vm(A → C) = (4 × 6)/(4 × 6) = 1 < minVM. Hence, this rule is not added to the results.
2. Let L = AW, the second element of Ω. We have vm(A → W) = (4 × 6)/(4 × 5) = 1.2 = minVM. Hence, this rule is added to the results.
3. Let L = AT, the first element of Ω. We have vm(A → T) = (3 × 6)/(4 × 4) = 1.125 < minVM. Hence, this rule is not added to the results.

Next, LBA-ARs-IM-2011 proceeds recursively to generate all rules on the lattice.

TABLE 4 | Existing Lattice-based Approaches (LBAs) for Mining Association Rules (ARs)

No  Name of Algorithm  Authors (Year)  Type of ARs
1   LBA-ARs-2009       Vo and Le67     ARs
2   LBA-ARs-IM-2011    Vo and Le60     ARs (with interestingness measures)

MGARs, most generalization association rules; MNARs, minimal non-redundant association rules.

FIGURE 6 | Association rules generation on node A.
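The lift values for the rules from node A can be recomputed with a few lines. In this sketch the supports are copied from the nodes of Figure 1, lift is taken in its standard form n × nXY / (nX × nY), and the function and variable names are illustrative rather than taken from the cited work.

```python
def lift(n, n_x, n_y, n_xy):
    """Lift of X -> Y from the transaction count and the three supports."""
    return (n_xy * n) / (n_x * n_y)


N = 6  # number of transactions in the example database
# Supports read off the lattice nodes in Figure 1.
SUPPORT = {"A": 4, "C": 6, "W": 5, "T": 4, "AC": 4, "AW": 4, "AT": 3}
MIN_VM = 1.2

kept = []
for rhs in "CWT":
    joint = "".join(sorted("A" + rhs))   # itemset of the lattice node L
    value = lift(N, SUPPORT["A"], SUPPORT[rhs], SUPPORT[joint])
    if value >= MIN_VM:
        kept.append(("A", rhs, value))
```

With minVM = 1.2, only A → W passes the threshold: the lift of A → C is 1 and the lift of A → T is 1.125.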

LBA for Mining MNARs

First, MNARs-FCIL traverses all child nodes Lc of the root of FIL-2011, in which each node has one field marking whether or not the node is a minimal generator (mG) and another field indicating whether or not the node is closed. Then it calls a function to traverse all nodes in the lattice recursively (marking the visited nodes by turning a flag on). The algorithm uses a queue to traverse all child nodes of Lc (marking all visited nodes to reject duplicates). For each child node L of Lc (where Lc is a minimal generator), the algorithm computes c(Lc → L\Lc); if L is an FCI and c(Lc → L\Lc) ≥ minConf, the algorithm adds Lc → L\Lc to the results.

For example, MNARs-FCIL uses the FIL-2011 for DBe with minSup = 50% (Figure 2) to generate MNARs. Let minConf = 100%. Consider the first child node of the root, A. Because A is a minimal generator, AWC is an FCI, and c(A → WC) = σ(AWC)/σ(A) = 4/4 = 100% ≥ minConf, the rule A → WC is added to the results (Figure 8). Next, MNARs-FCIL recursively performs this process to generate all of the MNARs on the lattice.

TABLE 5 | Value of Some Measures with Rule X → Y (only the last row is legible in this copy)

No  Measure          Value
7   Phi-coefficient  (n × nXY − nX × nY) / √(nX × nY × n¬X × n¬Y)

FIGURE 7 | Association rules with interestingness measures generation on node A.

LBA for Mining MGARs

In 2013, Vo et al.70 proposed an LBA for mining MGARs, based on the following theorem. Given three nodes l1, l2, and l3 in FCIL-2013, if l1 is the parent node of l2, l2 is the parent node of l3, and σ(l2)/σ(l1) < minConf, then σ(l3)/σ(l1) < minConf. The theorem holds because σ(l3) ≤ σ(l2) for any child l3 of l2. According to this theorem, if a lattice node {Y} is a child node of {X} in the FCIL and σ(Y)/σ(X) < minConf, then the child nodes of {Y} cannot form rules with {X}.

In detail, MGARs-FCIL first traverses all the FCIs. For each FCI, C, it initializes the RHS (right-hand side) to ∅ and generates rules from the minimal generators of C to C. The algorithm uses a queue (Ω) to traverse all the child nodes of {C} (marking all the visited nodes to avoid duplicates). For each child node Ls of {C}, the confidence of all rules of the form X′ → Ls\X′ (X′ ∈ mG(C)) is calculated. If the confidence satisfies minConf and Ls is not marked, then Ls is added to Ω for generating all rules from C to Ls. After Ls is added to Ω, it is marked to avoid duplicates in the future. This process is repeated to generate all rules on the lattice.

LBAs FOR MINING ARs

ON DYNAMIC DATABASES

In 2014, Vo et al.69 proposed two effective approaches for maintaining an FIL with dynamically inserted data based on the pre-large and tidset/diffset concepts. The pre-large concept72 is based on a safety threshold f = ⌊(SU − SL) × |D| / (1 − SU)⌋, which reduces the need to rescan the original database when maintaining ARs. Here, SU is the upper threshold, SL is the lower threshold, and |D| is the number of transactions in the original database D. When the number of new transactions is less than or equal to f, the algorithm does not need to rescan the original database. An FIL built with the pre-large concept is called a PFIL (pre-large FIL). In tidset-based maintenance of a PFIL (TMPFIL) and diffset-based maintenance of a PFIL (DMPFIL), each increment is processed as follows. (1) If the original database is empty, the algorithm uses the lower threshold SL to build a PFIL and recalculates the safety threshold f for the incremental database D′. (2) If the number of transactions in the incremental database D′ is larger than f, the algorithm uses SL to build a PFIL and recalculates the safety threshold f for D + D′. (3) If the number of transactions in the incremental database D′ is less than or equal to f, the algorithm updates the PFIL without scanning the database. (4) The original database is updated as D = D + D′. The experimental results69 show that DMPFIL outperforms both TMPFIL and the batch approach in terms of the execution time required to build an FIL (Table 6).
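The safety threshold and the decision made for each increment can be sketched as follows. The threshold values in the usage line are hypothetical, the function names are illustrative, and the actions are only labels for the cases listed above; the actual lattice update of TMPFIL/DMPFIL is not shown. Exact fractions are used to avoid floating-point rounding just below an integer.

```python
from fractions import Fraction
from math import floor


def safety_threshold(s_u, s_l, d):
    """f = floor((SU - SL) * |D| / (1 - SU)), computed with exact fractions.

    s_u and s_l are given as decimal strings (e.g. "0.5") or Fractions.
    """
    s_u, s_l = Fraction(s_u), Fraction(s_l)
    return floor((s_u - s_l) * d / (1 - s_u))


def maintenance_action(n_original, n_increment, f):
    """Choose the step for one increment, following the cases listed above."""
    if n_original == 0:
        return "build"      # case 1: build the PFIL with SL from the increment
    if n_increment > f:
        return "rebuild"    # case 2: rebuild with SL on D + D'
    return "update"         # case 3: update the PFIL without rescanning D


# Hypothetical thresholds: SU = 50%, SL = 40%, |D| = 100 transactions.
f = safety_threshold("0.5", "0.4", 100)
```

With these illustrative values f = 20, so an increment of up to 20 transactions is handled by updating the lattice in place, while a larger one triggers a rebuild.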

In 2014, La et al.73 proposed MFCIL-2014 for maintaining an FCIL with dynamically inserted data based on the pre-large concept. The algorithm proceeds as follows. (1) Build the initial FCIL-2005 with CHARM-L58 using SL. (2) Build an index table for the initial lattice from step 1. (3) Add transactions, one by one, to the lattice with the improved CLICL75 algorithm while the number of inserted transactions is lower than f. (4) Rescan the entire database when the number of inserted transactions reaches f, then go back to step 1. Experimental results73 show that MFCIL-2014 outperforms CLICL in terms of both execution time and memory usage.

In 2015, based on the pre-large and tidset/diffset concepts, Vo et al.74 proposed two algorithms (TiFU-FIL and DiFU-FIL) for updating a PFIL with transaction deletion. The experimental results74 show that the two approaches outperform the batch-mode algorithm in building a PFIL, with the diffset-based approach (DiFU-FIL) being more efficient than the tidset-based approach (TiFU-FIL).

Current incremental approaches are mainly based on the pre-large concept, which requires rescanning the database when the number of inserted or updated transactions exceeds a safety threshold. Thus, it is important to find methods that can mine ARs without rescanning the database, as a step toward mining ARs on data streams.

FIGURE 8 | Minimal non-redundant association rules generation on node A.

COMPLEXITY ANALYSIS

The complexity of mining FIs/FCIs in the worst case is O(2^|I|), where |I| is the number of items in the database. The complexity of building an FIL from the database67–69 is O(2^|I| × k), where k is the average number of all subsets of all FIs. In addition, the complexity of building an FCIL from the set of FCIs70 in the worst case is O(n × k), where n is the number of FCIs and k is the average number of all subsets of all FCIs. Therefore, the complexity of building an FCIL from the database70 is O(2^|I| + n × k). Fortunately, Vo et al.70 showed that k ≪ n and n ≪ 2^|I| in most databases; therefore, the overall computational complexity of FIL/FCIL building67–70 is O(2^|I|), the same as that of mining FIs/FCIs.

For the generation of ARs/MNARs/MGARs from the built lattices, the complexity is O(n × k), where n is the number of nodes in the FIL/FCIL and k is the average number of all subsets of all FCIs, with k ≪ n. Meanwhile, mining ARs from FIs/FCIs requires O(n²). Therefore, LBAs for mining ARs/MNARs/MGARs are especially effective when users need to mine ARs/MNARs/MGARs with many different confidence thresholds or many different minimum support thresholds (the thresholds have to be greater than or equal to the threshold used to build the lattice).

CONCLUSION AND FUTURE RESEARCHES

LBAs for mining ARs are approaches that comprise two phases: FIL/FCIL building and generating ARs from the lattice. The total mining time of LBAs is lower than that of the traditional methods for mining ARs, especially when the number of FIs/FCIs is large. In this article, we survey the existing LBAs for mining ARs on static and dynamic databases. First, we present some methods for building lattices on static databases, including FIL and FCIL. Next, methods using LBAs for generating traditional ARs, MNARs, and MGARs from FIL/FCIL are presented. Then, approaches for maintaining FIL/FCIL toward mining ARs on dynamic databases with inserted and deleted transactions are surveyed.

In reality, the number of FIs/FCIs is often large, and end users are interested only in a small set concerning a certain number of issues. Therefore, methods for mining FIs/FCIs with constraints have been proposed. However, building FIL/FCIL with constraints is still an open challenge.

Although methods for maintaining an FIL with inserted and deleted transactions have been proposed, a general method for maintaining an FIL with inserted, deleted, and updated data is still needed. For FCIL, methods for maintaining an FCIL with deleted and updated transactions, as well as a general method for maintaining an FCIL under all operations, are also needed.

In addition, current incremental approaches are mainly based on the pre-large concept. These methods require rescanning the database when the number of inserted or updated transactions exceeds a safety threshold. Thus, it is crucial to investigate methods that can mine ARs without rescanning the database, as a step toward mining ARs on data streams. Finally, examining LBAs for mining ARs on quantitative databases is also a potential research direction.

TABLE 6 | Existing Lattice-based Approaches (LBAs) for Mining Association Rules (ARs) on Dynamic Databases

No  Name of Algorithm        Year  Structure  Actions
1   TMPFIL and DMPFIL 69     2014  FIL        Inserted data
2   MFCIL-2014 73            2014  FCIL       Inserted data
3   TiFU-FIL and DiFU-FIL 74 2015  FIL        Deleted data

FCIL, frequent closed itemset lattice; FIL, frequent itemset lattice.

REFERENCES

1. Menardi G, Torelli N. Training and assessing classification rules with imbalanced data. Data Min Knowl Discov 2014, 28:92–122.

2. Nassirtoussi AK, Aghabozorgi SR, The YW, Ngo DCL. Text mining for market prediction: a systematic review. Expert Syst Appl 2014, 41:7653–7670.

3. Niemann U, Völzke H, Kühn JP, Spiliopoulou M. Learning and inspecting classification rules from longitudinal epidemiological data to identify predictive features on hepatic steatosis. Expert Syst Appl 2014, 41:5405–5415.

4. Nguyen TTL, Vo B, Hong TP, Hoang CT. Classification based on association rules: a lattice-based approach. Expert Syst Appl 2012, 39:11357–11366.

5. Shinde S, Kulkarni U. Extracting classification rules from modified fuzzy min–max neural network for data with mixed attributes. Appl Soft Comput 2016, 40:364–378.

6. Wang X, Liu X, Pedrycz W, Zhu X, Hu G. Mining axiomatic fuzzy set association rules for classification problems. Eur J Oper Res 2012, 218:202–210.

7. Le HS. A novel kernel fuzzy clustering algorithm for Geo-Demographic Analysis. Inform Sci 2015, 317:202–223.

8. Mai TS, He X, Feng J, Plant C, Böhm C. Anytime density-based clustering of complex data. Knowl Inf Syst 2015, 45:319–355.

9. Indurkhya N. Emerging directions in predictive text mining. Data Min Knowl Discov 2015, 5:155–164.

10. Vairavasundaram I, Ravi L. Data mining-based tag recommendation system: an overview. Data Min Knowl Discov 2015, 5:87–112.

11. Fariha A, Ahmed CF, Leung CK, Samiullah M, Pervin S, Cao L. A new framework for mining frequent interaction patterns from meeting databases. Eng Appl Artif Intel 2015, 45:103–118.

12. Fournier-Viger P, Gomariz A, Gueniche T, Soltani A, Wu CW, Tseng VS. SPMF: a Java open-source pattern mining library. J Mach Learn Res 2014, 15:3389–3393.

13. Hacene MR, Huchard M, Napoli A, Valtchev P. Relational concept analysis: mining concept lattices from multi-relational data. Ann Math Artif Intell 2013, 67:81–108.

14. Lan GC, Hong TP, Tseng VS. An efficient projection-based indexing approach for mining high utility itemsets. Knowl Inf Syst 2014, 38:85–107.

15. Lin CW, Lan GC, Hong TP. Mining high utility itemsets for transaction deletion in a dynamic database. Intell Data Anal 2015, 19:43–55.

16. Lin CW, Hong TP, Lan GC, Wong JW, Lin WY. Incrementally mining high utility patterns based on pre-large concept. Appl Intell 2014, 40:343–357.

17. Lin JCW, Gan W, Hong TP. A fast updated algorithm to maintain the discovered high-utility itemsets for transaction modification. Adv Eng Inform 2015, 29:562–574.

18. Lin JCW, Gan W, Hong TP, Tseng VS. Efficient algorithms for mining up-to-date high-utility patterns. Adv Eng Inform 2015, 29:648–661.

19. Song W, Liu Y, Li J. Mining high utility itemsets by dynamically pruning the tree structure. Appl Intell 2014, 40:29–43.

20. Song W, Liu Y, Li J. BAHUI: fast and memory efficient mining of high utility itemsets based on bitmap. Int J Data Warehouse Min 2014, 10:1–15.

21. Tseng VS, Wu CW, Fournier-Viger P, Yu PS. Efficient algorithms for mining the concise and lossless representation of high utility itemsets. IEEE Trans Knowl Data Eng 2015, 27:726–739.

22. Yun U, Ryang H. Incremental high utility pattern mining with static and dynamic databases. Appl Intell 2015, 42:323–352.

23. Zhang X, Deng ZH. Mining summarization of high utility itemsets. Knowl-Based Syst 2015, 84:67–77.

24. Kim D, Yun U. Efficient mining of high utility patterns with considering of rarity and length. Appl Intell, in press, doi:10.1007/s10489-015-0750-2.

25. Ryang H, Yun U, Ryu K. Fast algorithm for high utility pattern mining with the sum of item quantities. Intell Data Anal 2016, 20:395–415.

26. Ryang H, Yun U, Ryu K. Discovering high utility itemsets with multiple minimum supports. Intell Data Anal 2014, 18:1027–1047.

27. Yun U, Ryang H, Ryu K. High utility itemset mining with techniques for reducing overestimated utilities and pruning candidates. Expert Syst Appl 2014, 41:3861–3878.

28. Tseng VS, Wu CW, Fournier-Viger P, Yu PS. Efficient algorithms for mining top-k high utility itemsets. IEEE Trans Knowl Data Eng 2016, 28:54–67.

29. Lee G, Yun U, Ryang H. An uncertainty-based approach: frequent itemset mining from uncertain data with different item importance. Knowl-Based Syst 2015, 90:239–256.

30. Yun U, Pyun G, Yoon E. Efficient mining of robust closed weighted sequential patterns without information loss. Int J Artif Intell Tools 2015, 24:1–28.

31. Yun U, Yoon E. An efficient approach for mining weighted approximate closed frequent patterns considering noise constraints. Int J Uncertainty Fuzziness Knowl Based Syst 2014, 22:879–912.
