
Uncorrected Author Proof

Journal of Intelligent & Fuzzy Systems xx (20xx) x–xx

DOI:10.3233/JIFS-169128

IOS Press

A method for mining top-rank-k frequent closed itemsets

Loan T.T. Nguyen a,b,∗, Truc Trinh c, Ngoc-Thanh Nguyen d and Bay Vo e

a Division of Knowledge and System Engineering for ICT, Ton Duc Thang University, Ho Chi Minh City, Vietnam

b Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam

c VOV College, Ho Chi Minh City, Vietnam

d Faculty of Computer Science and Management, Wroclaw University of Science and Technology, Wrocław, Poland

e Faculty of Information Technology, Ho Chi Minh City University of Technology, Vietnam

Abstract. Mining frequent closed itemsets (FCIs) is important in mining non-redundant (minimal) association rules. Therefore, many algorithms have been developed for mining FCIs with reduced mining time and memory usage. For mining FCIs, algorithms use the minimum support threshold, minSup, to prune itemsets. However, using a fixed minSup is not suitable for mining top-rank-k FCIs. A large threshold generates few FCIs, leaving too few to answer queries when k is large. On the other hand, a small minSup generates a huge number of FCIs, leading to long runtimes and high memory usage. In this paper, we propose a method for mining top-rank-k FCIs without using a fixed minimum support threshold. A strategy is first used to eliminate 1-items that cannot generate FCIs belonging to the top-rank-k FCIs. Next, based on the set of candidate 1-items, we propose TRK-FCI, a DCI-Plus-based algorithm, for mining top-rank-k FCIs. In the process of mining top-rank-k FCIs, TRK-FCI automatically increases minSup according to the mined FCIs, efficiently pruning itemsets that cannot belong to the top-rank-k FCIs. We also modify the dynamic bit vector (DBV) structure and apply it to reduce memory usage and runtime in the TRK-FCI-DBV algorithm. Experimental results show that TRK-FCI-DBV is more efficient than TRK-FCI for various databases.

Keywords: DCI-Plus, dynamic bit vectors, frequent closed itemsets, top-rank-k frequent closed itemsets

∗Corresponding author. Loan T.T. Nguyen, Division of Knowledge and System Engineering for ICT, Ton Duc Thang University, Ho Chi Minh City, Vietnam. E-mail: nguyenthithuyloan@tdt.edu.vn.

1064-1246/16/$35.00 © 2016 – IOS Press and the authors. All rights reserved

1 Introduction

Data mining is the process of extracting interesting knowledge from data. Various methods for discovering knowledge have been proposed, such as mining traditional association rules [1–4, 6, 7, 23, 31, 36, 37], mining non-redundant association rules [8, 41], mining minimal non-redundant association rules [26, 27], mining most generalization association rules [38], classification using decision trees [13, 20, 30] or ILA [33], classification based on association rules [13, 14, 20, 21], and clustering [22]. Mining association rules has many applications in practice [3, 23]. For mining association rules, frequent itemsets [2, 11, 21, 42], frequent closed itemsets (FCIs) [15, 21, 26, 28, 31, 32, 37, 40–42], or maximal frequent itemsets [12, 19] must be mined. Mining frequent itemsets is often used for generating all association rules that satisfy the minimum support threshold (minSup) and the minimum confidence threshold (minConf) [1, 2, 35, 36], and mining FCIs is used for mining (minimal) non-redundant association rules (i.e., rules considered redundant based on certain criteria are eliminated) [26, 27, 41]. For mining maximal frequent itemsets, all frequent itemsets or FCIs (for which the database must be scanned to compute the supports of itemsets) must be generated to mine the above kinds of rules.

Mining FCIs is important for pruning redundant rules. The problem was first stated in 1999 by Pasquier et al. [26]. Since then, many algorithms have been developed to enhance the efficiency of mining FCIs, such as those based on FP-tree [11, 28, 40], IT-tree [32, 42], bit vectors [31, 37], and N-lists [15]. To mine FCIs, a minSup is set, and the FCIs that satisfy the minSup threshold are selected. With a fixed threshold, it is difficult to obtain a sufficient number of top-rank-k FCIs: an excessively high threshold leads to very few FCIs, not enough to answer the query, whereas a minSup that is too low leads to a very large number of FCIs, requiring a lot of memory and time to mine. Therefore, developing efficient algorithms for mining top-rank-k FCIs is necessary.

Some algorithms have been developed for mining top-rank-k frequent itemsets. Deng et al. [5] proposed the NTK algorithm, which uses a Node-list to mine top-rank-k frequent itemsets. The iNTK algorithm, an improved version of the NTK algorithm proposed by Le et al. [14], uses the subsume concept and the N-list structure to quickly mine top-rank-k itemsets. In addition, algorithms have been developed for mining top-k frequent itemsets [29], top-k FCIs [39], and top-k non-redundant association rules [8].

Mining top-rank-k FCIs is important for mining non-redundant association rules. However, to the best of our knowledge, no algorithm has been developed for mining top-rank-k FCIs. Moreover, algorithms developed for mining top-rank-k frequent itemsets or top-k FCIs cannot be applied to mine top-rank-k FCIs. Therefore, in this paper, we propose the TRK-FCI algorithm, which is based on DCI-Plus [31], for mining top-rank-k FCIs. First, the algorithm finds a set of candidate items that may belong to top-rank-k FCIs, where k is a given threshold. Then, it uses the DCI-Plus algorithm to generate FCIs based on these candidate items. When an FCI is generated, it is directly inserted into a table named tab_k. FCIs with the same support are stored in the same entry. The number of entries in tab_k does not exceed the threshold k. In the process of mining top-rank-k FCIs, the algorithm automatically increases minSup to reduce the number of FCI candidates that do not belong to tab_k. Because DCI-Plus uses fixed bit vectors, it has high memory usage and runtime for storing and computing the bit vector of a new itemset, checking subsets, and computing the supports of itemsets. TRK-FCI-DBV, an improved version of TRK-FCI, is therefore developed. TRK-FCI-DBV uses the dynamic bit vector (DBV) structure instead of the bit vector structure to reduce mining time and memory usage.

The rest of this paper is organized as follows. Section 2 presents definitions of FCIs and top-rank-k FCIs and states the problem of mining top-rank-k FCIs. In Section 3, we review works related to the problem of mining FCIs, top-k and top-rank-k frequent itemsets, and top-k FCIs. Section 4 describes a method for mining top-rank-k FCIs based on the DCI-Plus algorithm and an improved algorithm based on DBVs. Experimental results on standard databases for TRK-FCI and TRK-FCI-DBV are presented in Section 5. Conclusions and suggestions for future work are given in Section 6.

2 Definitions and problem statement

Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of items and $DB = \{t_1, t_2, \ldots, t_n\}$ be a set of transactions, where each $t_i$ ($1 \le i \le n$) is a transaction labeled by a unique identifier and contains a set of items in $I$.

Definition 1 (support of an itemset). Given a DB and an itemset $X$ ($X \subseteq I$), the support of $X$, denoted by $SUP_X$, is the number of transactions containing $X$ in DB.

Definition 2 (frequent itemset). Given a DB and an itemset $X$ ($X \subseteq I$), $X$ is a frequent itemset if $SUP_X \ge minSup$.

Definition 3 (FCI). Given a DB and an itemset $X$ ($X \subseteq I$), $X$ is called an FCI if no itemset $Y$ exists such that $X \subset Y$ and $SUP_X = SUP_Y$.

Definition 4 (rank of an FCI). Given the set CI of all closed itemsets from a transaction database DB and an FCI $X$ ($X \in CI$), the rank of $X$ in CI is the number of distinct support values in CI that are no less than the support of $X$. The rank of $X$ is defined as:

$$R_X = |\{SUP_Y \mid Y \in CI \wedge SUP_Y \ge SUP_X\}|$$
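As a rough illustration of Definition 4, the following Python sketch computes ranks from a collection of supports. The function name `rank_of` and the support counts (loosely modeled on the keys in the running example of Section 4, written as counts out of 10 transactions) are assumptions made for illustration only.

```python
def rank_of(support_x, closed_supports):
    """Rank per Definition 4: the number of distinct support values
    among the closed itemsets that are >= the given support."""
    return len({s for s in closed_supports if s >= support_x})


# Illustrative supports of some closed itemsets (counts, not fractions).
supports = {"EF": 8, "G": 8, "H": 7, "EFG": 6, "DEF": 5, "HG": 5, "DEFG": 4}

# Rank of every itemset, and the itemsets whose rank is at most 3.
ranks = {name: rank_of(s, supports.values()) for name, s in supports.items()}
top_rank_3 = sorted(name for name, r in ranks.items() if r <= 3)
```

Because the rank counts distinct support values rather than itemsets, itemsets with equal support share a rank, so a top-rank-k result can contain more than k itemsets.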


Definition 5 (a top-rank-k FCI). Given the set CI of all closed itemsets from a transaction database DB and a threshold $k$, an itemset $X \in CI$ is called a top-rank-k FCI if and only if $R_X$ is no greater than $k$, i.e., $R_X \le k$.

Definition 6 (mining top-rank-k FCIs). Given CI including all FCIs from a transaction database DB and a threshold $k$, the goal of mining top-rank-k FCIs is to find the complete set of FCIs whose ranks are no greater than $k$, i.e., the set $\{X \in CI \mid R_X \le k\}$.

From Definition 6, the problem of mining top-rank-k FCIs is stated as follows. Given a database DB and a threshold k, mining top-rank-k FCIs is divided into two steps:

Step 1: Mine all closed itemsets in DB, a set called CI.

Step 2: Keep the closed itemsets in CI that satisfy Definition 6.

The above approach is simple but not feasible because the number of closed itemsets in the database is often large. Therefore, finding a direct solution for mining top-rank-k FCIs without mining all closed itemsets is a challenge.
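The two-step baseline can be sketched in Python as follows. This is a brute-force illustration under stated assumptions, not the authors' method: the function names and the four-transaction toy database are hypothetical, and closedness is tested by checking that an itemset equals the intersection of all transactions containing it. The enumeration is exponential in the number of items, which is exactly why the baseline does not scale.

```python
from itertools import combinations


def closed_itemsets(db):
    """Step 1 (brute force): map every closed itemset to its support count."""
    items = sorted(set().union(*db))
    closed = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            x = frozenset(combo)
            covering = [t for t in db if x <= t]
            # x is closed iff it equals the intersection of its covering transactions.
            if covering and frozenset.intersection(*covering) == x:
                closed[x] = len(covering)
    return closed


def top_rank_k(db, k):
    """Step 2: keep closed itemsets whose support is among the k largest distinct values."""
    ci = closed_itemsets(db)
    kept = set(sorted(set(ci.values()), reverse=True)[:k])
    return {x: s for x, s in ci.items() if s in kept}


# Hypothetical toy database; each transaction is a frozenset of items.
toy_db = [frozenset("ABC"), frozenset("AB"), frozenset("AC"), frozenset("B")]
result_k1 = top_rank_k(toy_db, 1)
result_k2 = top_rank_k(toy_db, 2)
```

In this toy database the closed itemsets are A and B (support 3), AB and AC (support 2), and ABC (support 1), so k = 1 keeps A and B, and k = 2 adds AB and AC.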

3 Related works

3.1 Mining frequent closed itemsets

The problem of mining FCIs was first proposed in 1999 [26]. Many algorithms for mining FCIs have since been developed to reduce runtime and memory usage. Apriori-based algorithms for this purpose include Close [26] and A-Close [27]. These algorithms generate candidates and compute their closures to find FCIs. Algorithms based on the divide-and-conquer technique have also been developed. Closet [28] uses FP-tree to compress the database and early pruning to prune non-closed itemsets. Closet+ [40], an improved version of Closet (which uses a bottom-up projection scheme for FP-tree), uses a hybrid approach: bottom-up for dense databases and top-down for sparse databases. It uses item merging and sub-itemset pruning, which are widely used in other algorithms, and applies the subset checking strategy to quickly check closed itemsets and item skipping to eliminate items at high levels that have the same support as that of items at low levels. FPClose [11] is an improved version of Closet+ that uses FP-array to reduce the number of FP-tree scans when FP-tree is projected.

CHARM [42] is based on tidsets for quickly computing the supports of itemsets and uses subset checking to quickly prune non-closed itemsets. To check whether a generated itemset is closed, CHARM uses a hash table in which the key of each itemset is the sum of its items. dCHARM, a diffset approach for mining FCIs, was also developed [42]. CloseMiner [32] uses closed tidsets to check whether an itemset is closed. Although CHARM, dCHARM, and CloseMiner have advantages over algorithms based on the horizontal data format, such as Close, A-Close, Closet, Closet+, and FPClose, they must use hash tables to check whether a candidate itemset is closed, and thus closed itemsets must be stored in main memory for easy checking.

DCI-Closed [21] uses tidsets and a non-duplication generation strategy for mining FCIs. DCI-Plus [31], an improved version of DCI-Closed [21], generates FCIs and the minimal generators of each FCI. Because DCI-Closed is based on tidsets, when the tidsets of itemsets are long, a lot of memory is required to store the tidsets, and the runtime required to compute intersections with other tidsets is high. To reduce the length of tidsets and the computation time, DCI-Plus uses BitTable.

3.2 Mining top-rank-k frequent itemsets

Deng et al. proposed the NTK algorithm for mining top-rank-k frequent itemsets [5]. NTK uses the Node-list data structure to represent itemsets and uses a level-wise approach for mining top-rank-k frequent itemsets, i.e., t-patterns are used to form (t+1)-patterns. By using Node-lists, the algorithm does not need to rescan the database to compute the supports of itemsets. A dynamic minSup is used to efficiently prune candidates. Le et al. developed iNTK [14], an improved version of NTK. iNTK uses the subsume concept to generate fewer candidates than NTK, reducing the time required to generate candidates.

3.3 Mining top-k frequent closed itemsets

Wang et al. [39] proposed the TFP algorithm for mining top-k FCIs, where k is the number of FCIs that need to be mined. TFP uses a divide-and-conquer technique (like FP-Growth) and prunes candidates based on minSup (automatically increased in the process of updating candidates). The authors also used a threshold min_l to eliminate itemsets whose lengths are smaller than min_l.

3.4 Mining top-k association rules

In 2012, Fournier-Viger et al. [10] proposed the TopKRules algorithm for mining top-k association rules from datasets. This algorithm uses the minConf value during the mining of top-k rules. The minSup value changes according to the lowest support of the itemsets. The TopKRules algorithm is based on the principle of extending rules and on methods for the early elimination of rules that do not belong to the top-k rules. Fournier-Viger and Tseng also extended TopKRules for mining top-k non-redundant rules [8] and top-k sequential rules [9]. These algorithms are very efficient compared to post-processing methods.

3.5 Dynamic bit vectors

In 2012, Vo et al. [37] proposed the concept of dynamic bit vectors (DBVs) and used it in mining frequent closed itemsets. The DBV of an itemset is a bit vector from which the leading and trailing zero bits are removed. This saves both the memory needed to store bit vectors and the time needed to compute their intersections. Tran et al. extended this concept to mine frequent closed sequences [34]. Le et al. also used DBVs to develop an efficient algorithm for mining frequent closed inter-sequence patterns [16].
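To make the idea concrete, here is a minimal Python sketch of a DBV-like structure. It is an assumption-laden illustration rather than the structure defined in [37]: the class name `DBV`, the byte-level granularity, and the `(pos, data)` layout are choices made here for readability. The vector keeps only the bytes between the first and last non-zero byte, plus the position of the first one, and intersection only touches the overlapping byte range.

```python
class DBV:
    """Bit vector stored without leading and trailing zero bytes (illustrative)."""

    def __init__(self, pos, data):
        self.pos = pos            # index of the first stored byte in the full vector
        self.data = bytes(data)   # bytes from the first to the last non-zero byte

    @classmethod
    def from_bytes(cls, raw):
        nonzero = [i for i, b in enumerate(raw) if b]
        if not nonzero:
            return cls(0, b"")
        return cls(nonzero[0], raw[nonzero[0]:nonzero[-1] + 1])

    def intersect(self, other):
        # Only the byte range covered by both vectors can contain common bits.
        start = max(self.pos, other.pos)
        end = min(self.pos + len(self.data), other.pos + len(other.data))
        if start >= end:
            return DBV(0, b"")
        joined = bytes(
            self.data[i - self.pos] & other.data[i - other.pos]
            for i in range(start, end)
        )
        trimmed = DBV.from_bytes(joined)  # re-trim zeros created by the AND
        if not trimmed.data:
            return trimmed
        return DBV(start + trimmed.pos, trimmed.data)

    def support(self):
        # Number of set bits, i.e., number of transactions in the tidset.
        return sum(bin(b).count("1") for b in self.data)


a = DBV.from_bytes(bytes([0, 0, 0b1011, 0b0001, 0]))
b = DBV.from_bytes(bytes([0, 0b1111, 0b0010, 0b0001, 0b1000]))
c = a.intersect(b)
```

Here `a` stores two bytes instead of five, and the intersection loops over two bytes only. The saving grows with the length of the zero runs at both ends of the full bit vector.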

4 Proposed algorithms

In this section, we present the TRK-FCI algorithm for mining top-rank-k FCIs based on BitTable. TRK-FCI uses DCI-Plus [31] to generate candidate closed itemsets and applies early pruning techniques to prune candidates. First, the algorithm chooses a set of candidate items that may belong to top-rank-k FCIs, where k is a given threshold. Then, it uses the DCI-Plus algorithm to generate FCIs based on these candidate items. When an FCI is generated, it is directly inserted into a table named tab_k. FCIs with the same support are stored in the same entry. The number of entries in tab_k does not exceed the threshold k. In the process of mining top-rank-k FCIs, the algorithm automatically increases minSup to reduce the number of FCI candidates that do not belong to tab_k.
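The behaviour of tab_k described above can be sketched in Python as follows. The class `TabK`, its method names, and the use of integer support counts (out of 10 transactions) instead of fractions are assumptions made for illustration; the paper does not give this code. The insertion sequence replays the FCIs named in the worked example of this section, and the order of the last two insertions (G, then EFG) is also an assumption.

```python
class TabK:
    """Table of at most k entries; each entry groups FCIs with the same support."""

    def __init__(self, k, min_sup=0):
        self.k = k
        self.min_sup = min_sup
        self.entries = {}  # support -> list of itemsets with that support

    def insert(self, itemset, sup):
        if sup < self.min_sup:
            return False                      # pruned by the current minSup
        if sup in self.entries:
            self.entries[sup].append(itemset)
            return True
        if len(self.entries) == self.k:
            # Table is full: the new, larger key evicts the lowest-support entry.
            del self.entries[min(self.entries)]
        self.entries[sup] = [itemset]
        if len(self.entries) == self.k:
            # Raise minSup to the smallest key that is still in the table.
            self.min_sup = min(self.entries)
        return True


tab = TabK(k=5, min_sup=2)
for name, sup in [("ACEFG", 2), ("DEF", 5), ("DEFH", 3), ("DEFGH", 2),
                  ("DEFG", 4), ("H", 7), ("HEF", 5), ("HEFG", 3), ("HG", 5)]:
    tab.insert(name, sup)
min_sup_before_ef = tab.min_sup   # table holds 5 entries; lowest key is 2

tab.insert("EF", 8)               # evicts the entry with key 2
min_sup_after_ef = tab.min_sup

tab.insert("G", 8)
tab.insert("EFG", 6)              # evicts the entry with key 3
```

Once minSup has been raised, any later candidate with a smaller support is rejected on the first check, which is what allows the search to stop extending such candidates.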

Fig. 1. TRK-FCI algorithm for mining top-rank-k FCIs.

In the TRK-FCI algorithm (Fig. 1), database D is first scanned to compute the BitTable and determine the single items F1. These items are sorted in descending order of support; if two items have the same support, they are sorted in increasing lexicographical order. Next, the algorithm creates F2 by inserting items from F1 into F2 until the number of items with distinct BitTables equals k. The items in F2 are sorted in increasing order of support; if two items have the same support, they are sorted in increasing lexicographical order. POST_SET is created by computing the closure of each item in F2. The procedure DCI_CLOSED++ is called with the input CLOSED_SET = ∅, PRE_SET = ∅, POST_SET, and minSup, where minSup is the support of the first item in F2 if the number of items with distinct BitTables in F2 is equal to k; otherwise, minSup is set to 0.

Fig. 2. DCI_CLOSED++ procedure.

Consider the database in Table 1, which includes 10 transactions and 10 items.

Assume that k = 5. The process of mining top-rank-k FCIs is as follows. First, the BitTable and support of each item are obtained, as shown in Table 2. After F1 is sorted, we have F1 = {G, F, E, H, D, C, B, A, J, I}. Next, we choose items from F1 that may belong to top-rank-k FCIs and store them in F2, i.e., F2 = {A, C, D, H, E, F, G} (after sorting). Because A has the same BitTable as C and E has the same BitTable as F, they are grouped into two groups, (A, C) and (E, F), respectively. After grouping, the algorithm computes the closure of each item. The results are shown in Table 3. From Table 3, we have POST_SET = {ACEFG, CEFG, DEF, H, EF, F, G}.
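The BitTable and closure computations in this step can be sketched in Python. Since the contents of Tables 1 to 3 are not reproduced here, the sketch uses a hypothetical four-transaction database, and the helper names `bit_table` and `closure_of_item` are assumptions. Each item gets an integer whose bit t is set when transaction t contains the item; the closure of an item is then every item whose bit vector covers it.

```python
def bit_table(db):
    """Map each item to an int whose bit `tid` is 1 if transaction `tid` contains it."""
    bits = {}
    for tid, items in enumerate(db):
        for item in items:
            bits[item] = bits.get(item, 0) | (1 << tid)
    return bits


def closure_of_item(item, bits):
    """All items that occur in every transaction containing `item`."""
    own = bits[item]
    return "".join(sorted(i for i, b in bits.items() if (own & b) == own))


# Hypothetical toy database (NOT the paper's Table 1).
toy_db = ["ace", "ce", "e", "de"]
bits = bit_table(toy_db)
closures = {item: closure_of_item(item, bits) for item in sorted(bits)}
supports = {item: bin(b).count("1") for item, b in bits.items()}
```

Two items with identical bit vectors necessarily have identical closures, which is why the algorithm can group such items (as with A and C, or E and F, in the example) before computing closures.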

Table 1. Transaction database D (columns: Transaction, Items)

Table 2. Items in D with their BitTable and support

Table 3. BitTable and closure of items in D (columns: Item, BitTable, Closure, Support)

Procedure DCI_CLOSED++ is called with the input PRE_SET = ∅, POST_SET, CLOSED_SET = ∅, and minSup = supp(A) = 0.2. The first element of POST_SET (ACEFG) is assigned to I. Because PRE_SET = ∅ and supp(ACEFG) = minSup, ACEFG is an FCI, and it is put into tab_k with its key, which is its support (0.2). ACEFG is also inserted into PRE_SET. Next, itemset CEFG is processed. Because supp(CEFG) = minSup and the BitTable of CEFG is a subset of the BitTable of ACEFG in PRE_SET, CEFG is pruned. When DEF is processed, because supp(DEF) > minSup and its BitTable is not a subset of the BitTable of any itemset in PRE_SET, DEF is an FCI, and it is put into tab_k with its key, which is its support (0.5). After that, procedure DCI_CLOSED++ is called with PRE_SET = {ACEFG}, CLOSED_SET_new = DEF, and POST_SET_new = {H, G}. EF and F do not appear in POST_SET_new because they belong to CLOSED_SET. Because CLOSED_SET ≠ ∅, DEF is joined with H to create a new_gen, which is DEFH. Similarly, DEFH is an FCI and is inserted into tab_k with its key (0.3). The procedure is called recursively with parameters PRE_SET = {ACEFG}, CLOSED_SET_new = DEFH, and POST_SET_new = {G}. Because CLOSED_SET ≠ ∅, the generator is DEFHG, and there is no itemset X in PRE_SET such that the BitTable of DEFHG is a subset of the BitTable of X, DEFGH is an FCI and is inserted into tab_k with its key (0.2). Now, POST_SET = ∅, and thus DEF is added into PRE_SET. The process continues by joining DEF with G to form DEFG. DEFG is also an FCI and is inserted into tab_k with its key (0.4). The algorithm then starts with a new_gen H. H is an FCI and is inserted into tab_k with its key (0.7). Note that the number of entries in tab_k is now 5, which is equal to k. The algorithm continues to insert generated FCIs into tab_k: HEF (key 0.5), HEFG (key 0.3), and HG (key 0.5). Consider the process of inserting FCI EF (whose key is 0.8) into tab_k. Because the key of EF is greater than that of the last entry (DEFGH) in tab_k (key 0.2), DEFGH is removed, EF is inserted into tab_k, and minSup is set to 0.3 (the key of the last entry in tab_k). The algorithm continues to process other FCIs. The results are shown in Table 4.

Table 4. Top-rank-k FCIs generated according to the TRK-FCI algorithm

k    key/sup   FCIs
1    0.8       {EF}, {G}
3    0.6       {EFG},
4    0.5       {DEF}, {HEF}, {HG}
5    0.4       {DEFG}
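The pruning test used in the walk-through (a candidate is discarded when its BitTable is a subset of the BitTable of some itemset in PRE_SET) reduces to a bitwise check. The Python sketch below is illustrative only: the function names are assumptions, and the bit vectors are hypothetical values chosen to match the supports quoted in the example (0.2 and 0.5 over 10 transactions), since Table 2 is not reproduced here.

```python
def is_subset(bits_a, bits_b):
    """True if every transaction in tidset A also appears in tidset B."""
    return (bits_a & ~bits_b) == 0


def is_duplicate(gen_bits, pre_set_bits):
    """A generator is pruned if its tidset is covered by an itemset in PRE_SET."""
    return any(is_subset(gen_bits, b) for b in pre_set_bits)


# Hypothetical bit vectors over 10 transactions (bit t = transaction t).
acefg = 0b0000000011   # 2 transactions -> support 0.2
cefg = 0b0000000011    # same tidset as ACEFG
def_ = 0b0001111100    # 5 transactions -> support 0.5

pre_set = [acefg]
cefg_pruned = is_duplicate(cefg, pre_set)
def_pruned = is_duplicate(def_, pre_set)
```

With these values CEFG is pruned and DEF is kept, mirroring the two outcomes described in the text. Each check costs one AND and one comparison per itemset in PRE_SET, so no list of previously found closed itemsets has to be searched.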

The TRK-FCI algorithm is based on the DCI-Plus algorithm. Because DCI-Plus uses bit vectors to represent the tidsets of items, it requires more memory to store bit vectors and more time to compute the intersection of bit vectors when the number of transactions in the database is large. To reduce the mining time and memory usage, we develop an improved algorithm that uses DBVs instead.

Table 5 illustrates the use of DBVs for mining top-rank-k FCIs. It shows the items in F2 with their supports, closures, and DBVs.

Procedure DCI_CLOSED++ is the same as that in TRK-FCI, but the operations on BitTable are replaced by operations on DBVs. The final results are the same as those obtained with TRK-FCI.

The algorithms used in the experiments were implemented in C# 2012 on a personal computer with an i5-4200U 1.60-GHz CPU and 4 GB of RAM running Windows 8.1. The experiments were run on three databases downloaded from the UCI Machine Learning Repository (http://fimi.ua.ac.be/data). Table 6 shows the characteristics of the experimental databases.

Table 5. DBVs, closures, and supports of items in F2

Item   DBV       Closure   Support
A      {0,520}   ACEFG     0.2
C      {0,520}   CEFG      0.2
D      {0,355}   DEF       0.5
H      {0,919}   H         0.7
E      {0,879}   EF        0.8
F      {0,879}   F         0.8
G      {0,763}   G         0.8

Table 6. Characteristics of experimental databases (columns: Database, # of transactions, # of items)

The experimental databases have different features. The Pumsb and Accidents databases have many transactions (or records), whereas the Chess database is small (3196 transactions).

5.1 Execution time

The efficiency of applying BitTable and DBVs for mining top-rank-k FCIs was evaluated. The experiments were conducted with various values of threshold k for the Accidents, Chess, and Pumsb databases. With increasing threshold k, the number of FCIs increased, increasing the time required to obtain top-rank-k FCIs.

Figures 3 to 5 show that the time required for mining top-rank-k FCIs from the three databases increases with increasing k. TRK-FCI-DBV runs faster than TRK-FCI. For example, consider the Pumsb database with a threshold k of 200. The mining time of TRK-FCI is 179.8 s and that of TRK-FCI-DBV is 130.7 s. Most of the processing time for both algorithms is spent in the itemset expansion stage. TRK-FCI-DBV has a lower processing time because it uses a more compact data format (DBVs).

Fig. 3. Runtimes of TRK-FCI-DBV and TRK-FCI for the Accidents database.

Fig. 4. Runtimes of TRK-FCI-DBV and TRK-FCI for the Chess database.

Fig. 5. Runtimes of TRK-FCI-DBV and TRK-FCI for the Pumsb database.

5.2 Memory usage

Figures 6 to 8 show that the memory usage for mining top-rank-k FCIs for the three experimental databases increases with increasing threshold k. The memory required by TRK-FCI-DBV is significantly less than that required by TRK-FCI. Consider the Pumsb database with a threshold k of 120. The memory usage values of the two algorithms are similar; however, when the threshold k is increased to 200, the memory used by TRK-FCI is nearly double that used by TRK-FCI-DBV.

Fig. 6. Memory usage of TRK-FCI-DBV and TRK-FCI for the Chess database.

Fig. 7. Memory usage of TRK-FCI-DBV and TRK-FCI for the Accidents database.

Fig. 8. Memory usage of TRK-FCI-DBV and TRK-FCI for the Pumsb database.

6 Conclusion and future work

This paper proposed a method for mining top-rank-k FCIs based on DCI-Plus. Two efficient algorithms, TRK-FCI and TRK-FCI-DBV, were proposed. These two algorithms differ in the way they represent data for each itemset, which gives them different mining times and memory usage values. A strategy is used to automatically change minSup to prune candidates in the mining process. The mining time and memory usage of the two algorithms were analyzed to compare the effectiveness of DBV with that of BitTable.

In the future, we will study how to prune candidates more efficiently. Moreover, we will try to use other approaches for mining top-rank-k FCIs. We will also expand our research to quantitative databases.

References

[1] R. Agrawal, T. Imielinski and A. Swami, Mining association rules between sets of items in large databases, In Proc. of the 1993 ACM SIGMOD Conference, Washington DC, USA, 1993, pp. 207–216.
[2] R. Agrawal and R. Srikant, Fast algorithms for mining association rules in large databases, In Proc. of the 20th International Conference on Very Large Data Bases, San Francisco, CA, USA, 1994, pp. 487–499.
[3] S. Ayubi, M.K. Muyeba, A. Baraani and J. Keane, An algorithm to mine general association rules from tabular data, Information Sciences 179(20) (2009), 3520–3539.
[4] E. Baralis, L. Cagliero, T. Cerquitelli and P. Garza, Generalized association rule mining with constraints, Information Sciences 194 (2011), 68–84.
[5] Z.H. Deng, Fast mining top-rank-k frequent patterns by using Node-lists, Expert Systems with Applications 41(4)
[6] Y.J. Du and H.M. Li, Strategy for mining association rules for web pages based on formal concept analysis, Applied Soft Computing 10 (2010), 772–783.
[7] H.V. Duong and T.C. Truong, An efficient method for mining association rules based on minimum single constraints, Vietnam Journal of Computer Science 2(2) (2015), 67–83.
[8] P. Fournier-Viger and V.S. Tseng, Mining top-k non-redundant association rules, In Proc. of 20th International Symposium, ISMIS 2012, Macau, China, 7661, 2012, pp.
[9] P. Fournier-Viger and V.S. Tseng, Mining top-K sequential rules, In Proc. of ADMA 2011, Beijing, China, 7121, 2011,
[10] P. Fournier-Viger, C.W. Wu and V.S. Tseng, Mining top-K association rules, In Proc. of Canadian Conference on AI 2012, Toronto, Canada, 7310, 2011, pp. 61–73.
[11] G. Grahne and J. Zhu, Fast algorithms for frequent itemset mining using FP-trees, IEEE Transactions on Knowledge and Data Engineering 17(10) (2005), 1347–1362.
[12] K. Gouda and M.J. Zaki, GenMax: An efficient algorithm for mining maximal frequent itemsets, Data Mining and Knowledge Discovery 11(3) (2005), 223–242.
[13] T.R. Hoens, Q. Qian, N.V. Chawla and Z.H. Zhou, Building decision trees for the multiclass imbalance problem, In Proc. of PAKDD 2012, 2012, pp. 122–134.
[14] Q.H.T. Le, T. Le, B. Vo and B. Le, An efficient and effective algorithm for mining top-rank-k frequent patterns, Expert Systems with Applications 42(1) (2015), 156–164.
[15] T. Le and B. Vo, An N-list-based algorithm for mining frequent closed patterns, Expert Systems with Applications
[16] B. Le, M.T. Tran and B. Vo, Mining frequent closed inter-sequence patterns efficiently using dynamic bit vectors, Applied Intelligence 43(1) (2015), 74–84.
[17] W. Li, J. Han and J. Pei, CMAR: Accurate and efficient classification based on multiple class-association rules, In Proc. of the 1st IEEE International Conference on Data Mining, San Jose, California, USA, 2001, pp. 369–376.
[18] B. Liu, W. Hsu and Y. Ma, Integrating classification and association rule mining, In Proc. of the 4th International Conference on Knowledge Discovery and Data Mining,
[19] X.B. Liu, K. Zhai and W. Pedrycz, An improved association rules mining method, Expert Systems with Applications
[20] W.Y. Loh, Classification and regression trees, WIREs Data Mining and Knowledge Discovery 1(1) (2011), 14–23.
[21] C. Lucchese, S. Orlando and R. Perego, Fast and memory efficient mining of frequent closed itemsets, IEEE Trans. Knowledge and Data Engineering 18(1) (2006), 21–36.
[22] S.T. Mai, X. He, J. Feng, C. Plant and C. Böhm, Anytime density-based clustering of complex data, Knowledge and Information Systems 45(2) (2015), 319–355.
[23] V. Nebot and R. Berlanga, Finding association rules in semantic web data, Knowledge-Based Systems 25 (2012),
[24] L.T.T. Nguyen, B. Vo, T.P. Hong and H.C. Thanh, CAR-Miner: An efficient algorithm for mining class-association rules, Expert Systems with Applications 40(6) (2013), 2305–2311.
[25] D. Nguyen, L.T.T. Nguyen, B. Vo and T.P. Hong, A novel method for constrained class association rule mining, Information Sciences 320 (2015), 107–125.
[26] N. Pasquier, Y. Bastide, R. Taouil and L. Lakhal, Discovering frequent closed itemsets for association rules, In Proc. of the 5th International Conference on Database Theory, 1999, pp. 398–416.
[27] N. Pasquier, Y. Bastide, R. Taouil and L. Lakhal, Efficient mining of association rules using closed itemset lattices, Information Systems 24(1) (1999), 25–46.
[28] J. Pei, J. Han and R. Mao, CLOSET: An efficient algorithm for mining frequent closed itemsets, In Proc. of the 5th ACM-SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, Dallas, Texas, USA, 2000, pp. 11–20.
[29] G. Pyun and U. Yun, Mining top-k frequent patterns with combination reducing techniques, Applied Intelligence 41(1) (2014), 76–98.
[30] J.R. Quinlan, Induction of decision trees, Machine Learning 1(1) (1986), 81–106.
[31] J. Sahoo, A.K. Das and A. Goswami, An effective association rule mining scheme using a new generic basis, Knowledge and Information Systems 43(1) (2015), 127–156.
[32] N.G. Singh, S.R. Singh and A.K. Mahanta, CloseMiner: Discovering frequent closed itemsets using frequent closed tidsets, In Proc. of the 5th ICDM, Washington DC, USA, 2005, pp. 633–636.
[33] M.R. Tolun and S.M. Abu-Soud, ILA: An inductive learning algorithm for production rule discovery, Expert Systems with Applications 14(3) (1998), 361–370.
[34] M.T. Tran, B. Le and B. Vo, Combination of dynamic bit vectors and transaction information for mining frequent closed sequences efficiently, Engineering Applications of Artificial Intelligence 38 (2015), 183–189.
[35] B. Vo and B. Le, Mining traditional association rules using frequent itemsets lattice, 39th International Conference on CIE, Troyes, France, 2009, pp. 1401–1406.
[36] B. Vo and B. Le, Interestingness measures for association rules: Combination between lattice and hash tables, Expert Systems with Applications 38(9) (2011), 11630–11640.
[37] B. Vo, T.P. Hong and B. Le, DBV-Miner: A dynamic bit-vector approach for fast mining frequent closed itemsets, Expert Systems with Applications 39(8) (2012), 7196–7206.
[38] B. Vo, T.P. Hong and B. Le, A lattice-based approach for mining most generalization association rules, Knowledge-Based Systems 45 (2013), 20–30.
[39] J. Wang, J. Han, Y. Lu and P. Tzvetkov, TFP: An efficient algorithm for mining top-k frequent closed itemsets, IEEE Transactions on Knowledge and Data Engineering 17(5)
[40] J. Wang, J. Han and J. Pei, CLOSET+: Searching for the best strategies for mining frequent closed itemsets, In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003, pp. 236–245.
[41] M.J. Zaki, Mining non-redundant association rules, Data Mining and Knowledge Discovery 9(3) (2004), 223–248.
[42] M.J. Zaki and C.J. Hsiao, Efficient algorithms for mining closed itemsets and their lattice structure, IEEE Transactions on Knowledge and Data Engineering 17(4) (2005),
