Journal of Intelligent & Fuzzy Systems xx (20xx) x–xx
DOI:10.3233/JIFS-169128
IOS Press
A method for mining top-rank-k frequent closed itemsets

Loan T.T. Nguyen (a, b, ∗), Truc Trinh (c), Ngoc-Thanh Nguyen (d) and Bay Vo (e)

a Division of Knowledge and System Engineering for ICT, Ton Duc Thang University, Ho Chi Minh City, Vietnam
b Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam
c VOV College, Ho Chi Minh City, Vietnam
d Faculty of Computer Science and Management, Wroclaw University of Science and Technology, Wrocław, Poland
e Faculty of Information Technology, Ho Chi Minh City University of Technology, Vietnam
Abstract. Mining frequent closed itemsets (FCIs) is important in mining non-redundant (minimal) association rules. Therefore, many algorithms have been developed for mining FCIs with reduced mining time and memory usage. For mining FCIs, algorithms use the minimum support threshold, minSup, to prune itemsets. However, using a fixed minSup is not suitable for mining top-rank-k FCIs. A large threshold yields few FCIs, which may be too few to answer a query when k is large. On the other hand, a small minSup generates a huge number of FCIs, leading to long runtimes and high memory usage. In this paper, we propose a method for mining top-rank-k FCIs without using a fixed minimum support threshold. A strategy is first used to eliminate 1-items that cannot generate FCIs belonging to top-rank-k FCIs. Next, based on the set of candidate 1-items, we propose TRK-FCI, a DCI-Plus-based algorithm, for mining top-rank-k FCIs. In the process of mining top-rank-k FCIs, TRK-FCI automatically increases minSup according to the mined FCIs, efficiently pruning itemsets that cannot belong to top-rank-k FCIs. We also modify the dynamic bit vector (DBV) structure and apply it to reduce memory usage and runtime in the TRK-FCI-DBV algorithm. Experimental results show that TRK-FCI-DBV is more efficient than TRK-FCI for various databases.
Keywords: DCI-Plus, dynamic bit vectors, frequent closed itemsets, top-rank-k frequent closed itemsets
1 Introduction

Data mining is the process of extracting interesting knowledge from data. Various methods for discovering knowledge have been proposed, such as mining traditional association rules [1–4, 6, 7, 23, 31, 36, 37], mining non-redundant association rules [8, 41], mining minimal non-redundant association rules [26, 27], mining most generalization association rules [38], classification using decision trees [13, 20, 30] or ILA [33], classification based on association rules [13, 14, 20, 21], and clustering [22]. Mining association rules has many applications in practice [3, 23]. For mining association rules, frequent itemsets [2, 11, 21, 42], frequent closed itemsets (FCIs) [15, 21, 26, 28, 31, 32, 37, 40–42], or maximal frequent itemsets [12, 19] must be mined. Mining frequent itemsets is often used for generating all association rules that satisfy the minimum support threshold (minSup) and the minimum confidence threshold (minConf) [1, 2, 35, 36], and mining FCIs is used for mining (minimal) non-redundant association rules (i.e., rules considered redundant based on certain criteria are eliminated) [26, 27, 41]. For mining maximal frequent itemsets, all frequent itemsets or FCIs (for which the database must be scanned to compute the supports of itemsets) must be generated in order to mine the above kinds of rules.

∗Corresponding author. Loan T.T. Nguyen, Division of Knowledge and System Engineering for ICT, Ton Duc Thang University, Ho Chi Minh City, Vietnam. E-mail: nguyenthithuyloan@tdt.edu.vn.

1064-1246/16/$35.00 © 2016 – IOS Press and the authors. All rights reserved
Mining FCIs is important for pruning redundant rules. The problem was first stated in 1999 by Pasquier et al. [26]. Since then, many algorithms have been developed to enhance the efficiency of mining FCIs, such as those based on FP-tree [11, 28, 40], IT-tree [32, 42], bit vectors [31, 37], and N-lists [15]. To mine FCIs, minSup is set, and the FCIs that satisfy this threshold are selected. With a fixed threshold, it is difficult to mine a sufficient number of top-rank-k FCIs: an excessively high threshold leads to very few FCIs, not enough to answer the query. Conversely, a minSup that is too low leads to a very large number of FCIs, requiring a lot of memory and time to mine. Therefore, developing efficient algorithms for mining top-rank-k FCIs is necessary.
Some algorithms have been developed for mining top-rank-k frequent itemsets. Deng et al. [5] proposed the NTK algorithm, which uses a Node-list to mine top-rank-k frequent itemsets. The iNTK algorithm, an improved version of the NTK algorithm proposed by Le et al. [14], uses the subsume concept and the N-list structure to quickly mine top-rank-k itemsets. In addition, some algorithms have been developed for mining top-k frequent itemsets [29], top-k FCIs [39], and top-k non-redundant association rules [8].
Mining top-rank-k FCIs is important for mining non-redundant association rules. However, to the best of our knowledge, no algorithms have been developed for mining top-rank-k FCIs. Besides, algorithms developed for mining top-rank-k frequent itemsets or top-k FCIs cannot be applied to mine top-rank-k FCIs. Therefore, in this paper, we propose the TRK-FCI algorithm, which is based on DCI-Plus [31], for mining top-rank-k FCIs. First, the algorithm finds a set of candidate items that may belong to top-rank-k FCIs, where k is a given threshold. Then, it uses the DCI-Plus algorithm to generate FCIs based on these candidate items. When an FCI is generated, it is directly inserted into a table named tab_k. FCIs with the same support are stored in the same entry. The number of entries in tab_k never exceeds the threshold k. In the process of mining top-rank-k FCIs, the algorithm automatically increases minSup to reduce the number of FCI candidates that do not belong to tab_k. Because DCI-Plus uses fixed bit vectors, it has high memory usage and runtime for storing and computing the bit vector of a new itemset, checking subsets, and computing the supports of itemsets. TRK-FCI-DBV, an improved version of TRK-FCI, is then developed. TRK-FCI-DBV uses the dynamic bit vector (DBV) structure instead of the bit vector structure to reduce mining time and memory usage.
The rest of this paper is organized as follows. Section 2 presents definitions of FCIs and top-rank-k FCIs and states the problem of mining top-rank-k FCIs. In Section 3, we review works related to the problem of mining FCIs, top-k and top-rank-k frequent itemsets, and top-k FCIs. Section 4 describes a method for mining top-rank-k FCIs based on the DCI-Plus algorithm and an improved algorithm based on DBVs. Experimental results on standard databases for TRK-FCI and TRK-FCI-DBV are presented in Section 5. Conclusions and suggestions for future work are given in Section 6.
2 Definitions and problem statement

Let I = {i_1, i_2, ..., i_m} be a set of items and DB = {t_1, t_2, ..., t_n} be a set of transactions, where each t_i (1 ≤ i ≤ n) is a transaction labeled by a unique identifier and contains a set of items in I.

Definition 1 (support of an itemset). Given a DB and an itemset X (X ⊆ I), the support of X, denoted by SUP_X, is the number of transactions containing X in DB.

Definition 2 (frequent itemset). Given a DB and an itemset X (X ⊆ I), X is a frequent itemset if SUP_X ≥ minSup.

Definition 3 (FCI). Given a DB and an itemset X (X ⊆ I), X is called an FCI if X is frequent and no itemset Y exists such that X ⊂ Y and SUP_X = SUP_Y.

Definition 4 (rank of an FCI). Given a set CI including all closed itemsets from a transaction database DB and an FCI X (X ∈ CI), the rank of X in CI is the number of distinct support values in CI that are no less than the support of X. The rank of X is defined as:

$$R_X = \left|\{\, SUP_Y \mid Y \in CI \wedge SUP_Y \geq SUP_X \,\}\right|$$
Definition 5 (a top-rank-k FCI). Given a set CI including all closed itemsets from a transaction database DB and a threshold k, an itemset X ∈ CI is called a top-rank-k FCI if and only if R_X is no greater than k, i.e., R_X ≤ k.

Definition 6 (mining top-rank-k FCIs). Given CI including all FCIs from a transaction database DB and a threshold k, the goal of mining top-rank-k FCIs is to find the complete set of FCIs whose ranks are no greater than k, i.e., the set {X ∈ CI | R_X ≤ k}.

From Definition 6, the problem of mining top-rank-k FCIs is stated as follows. Given a database DB and a threshold k, mining top-rank-k FCIs is divided into two steps:

Step 1: Mine all closed itemsets in DB, a set called CI.

Step 2: Keep the closed itemsets in CI that satisfy Definition 6.

The above approach is simple but not feasible because the number of closed itemsets in the database is often large. Therefore, finding a direct solution for mining top-rank-k FCIs without mining all closed itemsets is a challenge.
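The two-step baseline described in this section can be sketched in Python as follows. This is only a brute-force illustration of Definitions 1 to 6, not the TRK-FCI algorithm; the function names `support`, `closed_itemsets`, and `top_rank_k` are chosen here for illustration, and the sketch enumerates every itemset, so it is usable only on tiny databases.

```python
from itertools import combinations


def support(itemset, db):
    """Definition 1: number of transactions in db that contain itemset."""
    return sum(1 for t in db if itemset <= t)


def closed_itemsets(db):
    """Step 1 (brute force): map every closed itemset to its support.

    An itemset is closed when it equals the intersection of all
    transactions that contain it, i.e. no proper superset has the same
    support (Definition 3).
    """
    db = [frozenset(t) for t in db]
    items = sorted(set().union(*db))
    closed = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            x = frozenset(combo)
            covering = [t for t in db if x <= t]
            if covering and frozenset.intersection(*covering) == x:
                closed[x] = len(covering)
    return closed


def top_rank_k(db, k):
    """Step 2: keep closed itemsets whose rank (Definition 4) is at most k.

    Returns a dict mapping each kept itemset to (rank, support).
    """
    closed = closed_itemsets(db)
    # The rank of a support value is its position among the distinct
    # support values sorted in descending order.
    kept = sorted(set(closed.values()), reverse=True)[:k]
    rank = {sup: pos + 1 for pos, sup in enumerate(kept)}
    return {x: (rank[s], s) for x, s in closed.items() if s in rank}
```

Because every subset of the item universe is tested, the cost grows exponentially with the number of items, which illustrates why the two-step approach is not feasible on real databases.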
3 Related works

3.1 Mining frequent closed itemsets

The problem of mining FCIs was first proposed in 1999 [26]. Many algorithms for mining FCIs have since been developed to reduce runtime and memory usage. Apriori-based algorithms for this purpose include Close [26] and A-Close [27]. These algorithms generate candidates and compute their closures to find FCIs. Algorithms based on the divide-and-conquer technique have also been developed. Closet [28] uses FP-tree to compress the database and early pruning to prune non-closed itemsets. Closet+ [40], an improved version of Closet (which uses a bottom-up projection scheme for FP-tree), uses a hybrid approach: bottom-up for dense databases and top-down for sparse databases. It uses item merging and sub-itemset pruning, which are widely used in other algorithms, and applies the subset checking strategy to quickly check closed itemsets and item skipping to eliminate items at high levels that have the same support as that of items at low levels. FPClose [11] is an improved version of Closet+ that uses FP-array to reduce the number of FP-tree scans when FP-tree is projected. CHARM [42] is based on tidsets for quickly computing the supports of itemsets and uses subset checking to quickly prune non-closed itemsets. To check whether a generated itemset is closed, CHARM uses a hash table in which the key of each itemset is the sum of the transaction identifiers in its tidset. dCHARM, a diffset approach for mining FCIs, has also been developed [42]. CloseMiner [32] uses closed tidsets to check whether an itemset is closed. Although CHARM, dCHARM, and CloseMiner have advantages over algorithms based on the horizontal data format, such as Close, A-Close, Closet, Closet+, and FPClose, they must use hash tables to check whether a candidate itemset is closed, and thus closed itemsets must be stored in main memory for easy checking. DCI-Closed [21] uses tidsets and a non-duplication generation strategy for mining FCIs. DCI-Plus [31], an improved version of DCI-Closed [21], generates FCIs and the minimal generators of each FCI. Because DCI-Closed is based on tidsets, when the tidsets of itemsets are long, a lot of memory is required to store the tidsets and the runtime required to compute the intersection with other tidsets is high. To reduce the length of tidsets and reduce computation time, DCI-Plus uses BitTable.
3.2 Mining top-rank-k frequent itemsets

Deng et al. proposed the NTK algorithm for mining top-rank-k frequent itemsets [5]. NTK uses the Node-list data structure to represent itemsets and uses a level-wise approach for mining top-rank-k frequent itemsets, i.e., t-patterns are used to form (t+1)-patterns. By using Node-lists, the algorithm does not need to rescan the database to compute the supports of itemsets. A dynamic minSup is used to efficiently prune candidates. Le et al. developed iNTK [14], an improved version of NTK. iNTK uses the subsume concept to generate fewer candidates than NTK, reducing the time required to generate candidates.
3.3 Mining top-k frequent closed itemsets

Wang et al. [39] proposed the TFP algorithm for mining top-k FCIs, where k is the number of FCIs that need to be mined. TFP uses a divide-and-conquer technique (like FP-Growth) and prunes candidates based on minSup (automatically increased in the process of updating candidates). The authors also used a threshold min_l to eliminate itemsets whose lengths are smaller than min_l.
3.4 Mining top-k association rules

In 2012, Fournier-Viger et al. [10] proposed the TopKRules algorithm for mining top-k association rules from datasets. This algorithm uses the minConf value while mining top-k rules, and the minSup value changes during mining depending on the lowest support of the itemsets. The TopKRules algorithm is based on the principle of extending rules and on some methods for eliminating early the rules that do not belong to the top-k rules. Fournier-Viger and Tseng also extended TopKRules for mining top-k non-redundant rules [8] and top-k sequential rules [9]. These algorithms are very efficient compared to post-processing methods.
3.5 Dynamic bit vectors

In 2012, Vo et al. [37] proposed the concept of dynamic bit vectors (DBVs) and used it in mining frequent closed itemsets. The DBV of an itemset is a bit vector from which the zero bits at the beginning and the end are removed. This saves both the memory needed to store bit vectors and the time needed to compute their intersection. Tran et al. extended this concept to mine frequent closed sequences [34]. Le et al. also used DBVs to develop an efficient algorithm for mining frequent closed inter-sequence patterns [16].
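The idea of trimming zero bits can be illustrated with the following minimal Python sketch. It is an assumption-laden illustration, not the DBV-Miner implementation from [37]: it trims at byte granularity, stores the position of the first non-zero byte alongside the remaining bytes, and the names `DBV`, `to_dbv`, `intersect`, and `support` are chosen here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DBV:
    start: int   # index of the first non-zero byte in the full bit vector
    data: bytes  # bytes from the first to the last non-zero byte


def to_dbv(bit_vector: bytes) -> DBV:
    """Drop leading and trailing zero bytes, remembering the offset."""
    first = 0
    while first < len(bit_vector) and bit_vector[first] == 0:
        first += 1
    last = len(bit_vector)
    while last > first and bit_vector[last - 1] == 0:
        last -= 1
    if first == last:            # every byte was zero
        return DBV(0, b"")
    return DBV(first, bit_vector[first:last])


def intersect(x: DBV, y: DBV) -> DBV:
    """AND two DBVs; only the overlapping byte range is visited."""
    lo = max(x.start, y.start)
    hi = min(x.start + len(x.data), y.start + len(y.data))
    if lo >= hi:                 # no overlap, so the intersection is empty
        return DBV(0, b"")
    anded = bytes(x.data[i - x.start] & y.data[i - y.start]
                  for i in range(lo, hi))
    trimmed = to_dbv(anded)      # the result may have new zero bytes at its ends
    if not trimmed.data:
        return DBV(0, b"")
    return DBV(lo + trimmed.start, trimmed.data)


def support(x: DBV) -> int:
    """Support = number of 1 bits left in the vector."""
    return sum(bin(b).count("1") for b in x.data)
```

In this sketch, the saving comes from two places: fewer bytes are stored per itemset, and an intersection touches only the byte range where both vectors can be non-zero.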
4 Proposed algorithms

In this section, we present the TRK-FCI algorithm for mining top-rank-k FCIs based on BitTable. TRK-FCI uses DCI-Plus [31] to generate candidate closed itemsets and applies some early pruning techniques to prune candidates. First, the algorithm chooses a set of candidate items that may belong to top-rank-k FCIs, where k is a given threshold. Then, it uses the DCI-Plus algorithm to generate FCIs based on these candidate items. When an FCI is generated, it is directly inserted into a table named tab_k. FCIs with the same support are stored in the same entry. The number of entries in tab_k never exceeds the threshold k. In the process of mining top-rank-k FCIs, the algorithm automatically increases minSup to reduce the number of FCI candidates that do not belong to tab_k.
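The behaviour of tab_k described above can be sketched as follows. This is a simplified illustration under stated assumptions rather than the authors' implementation: the class name `TopRankTable`, its attributes, and the initial value `min_sup = 0` are hypothetical, and supports are stored as integer transaction counts.

```python
class TopRankTable:
    """Sketch of tab_k: at most k entries, one per distinct support value."""

    def __init__(self, k):
        self.k = k
        self.entries = {}   # support value -> list of FCIs with that support
        self.min_sup = 0    # raised automatically once the table is full

    def insert(self, itemset, sup):
        """Insert an FCI; return False if its support is below min_sup."""
        if sup < self.min_sup:
            return False
        if sup not in self.entries and len(self.entries) == self.k:
            # The table is full and sup is a new, higher key:
            # evict the entry with the lowest support.
            del self.entries[min(self.entries)]
        self.entries.setdefault(sup, []).append(itemset)
        if len(self.entries) == self.k:
            # With k entries present, nothing below the lowest key can
            # still belong to the top-rank-k FCIs.
            self.min_sup = min(self.entries)
        return True
```

Keying the table by support makes the number of entries equal to the number of distinct ranks, so raising `min_sup` to the lowest key is the pruning bound that the mining procedure can apply to later candidates.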
Fig. 1. TRK-FCI algorithm for mining top-rank-k FCIs.

In the algorithm of Fig. 1, database D is first scanned to compute the BitTable and determine the single items F1. These items are sorted in descending order according to their supports; if two items have the same support, then they are sorted in increasing lexicographical order. Next, the algorithm creates F2 by inserting items from F1 into F2 until the number of items with distinct BitTables is equal to k. The items in F2 are sorted in increasing order according to their supports; if two items have the same support, then they are sorted in increasing lexicographical order. POST_SET is created by computing the closure of each item in F2. The procedure DCI_CLOSED++ is called with the input CLOSED_SET = ∅, PRE_SET = ∅, POST_SET, and minSup, where minSup is the support of the first item in F2 if the number of items with distinct BitTables in F2 is equal to k; otherwise, minSup is set to 0.
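One possible reading of the candidate-selection step is sketched below in Python. Several points are assumptions made for illustration: each item's BitTable row is modelled as a Python integer used as a bitset, an item is kept if its bit vector is one of the first k distinct bit vectors met in F1 order, and the function name `select_candidate_items` is hypothetical.

```python
def select_candidate_items(transactions, k):
    """Return (F2, minSup) for a list of transactions (iterables of items)."""
    # One bit vector per item: bit t is set if transaction t has the item.
    bits = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bits[item] = bits.get(item, 0) | (1 << tid)
    support = {item: bin(b).count("1") for item, b in bits.items()}

    # F1: descending support, ties broken in increasing lexicographic order.
    f1 = sorted(bits, key=lambda i: (-support[i], i))

    # F2: keep items until k distinct bit vectors have been collected;
    # an item that shares an already collected bit vector is kept as well.
    seen, f2 = set(), []
    for item in f1:
        if bits[item] in seen:
            f2.append(item)
        elif len(seen) < k:
            seen.add(bits[item])
            f2.append(item)

    # F2 is processed in increasing support order (ties lexicographic).
    f2.sort(key=lambda i: (support[i], i))
    # minSup starts at the lowest support in F2 only if k distinct
    # bit vectors were found; otherwise it starts at 0.
    min_sup = support[f2[0]] if f2 and len(seen) == k else 0
    return f2, min_sup
```

Under this reading, items that share a bit vector (such as the grouped items in the worked example) stay together in F2, while an item whose bit vector would be the (k+1)-th distinct one is dropped.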
Fig. 2. DCI_CLOSED++ procedure.
Consider the database in Table 1, which includes 10 transactions and 10 items. Assume that k = 5. The process of mining top-rank-k FCIs is as follows. First, the BitTable and support of each item are obtained, as shown in Table 2. After F1 is sorted, we have F1 = {G, F, E, H, D, C, B, A, J, I}. Next, we choose items from F1 that may belong to top-rank-k FCIs and store them in F2, i.e., F2 = {A, C, D, H, E, F, G} (after sorting). Because A has the same BitTable as C and E has the same BitTable as F, they are grouped into two groups, (A, C) and (E, F), respectively. After grouping, the algorithm computes the closure of each item. The results are shown in Table 3. From Table 3, we have POST_SET = {ACEFG, CEFG, DEF, H, EF, F, G}.

Table 1. Transaction database D (columns: Transaction, Items)

Table 2. Items in D with their BitTable and support

Table 3. BitTable and closure of items in D (columns: Item, BitTable, Closure, Support)
Procedure DCI_CLOSED++ is called with the input PRE_SET = ∅, POST_SET, CLOSED_SET = ∅, and minSup = supp(A) = 0.2. The first element of POST_SET (ACEFG) is set to I. Because PRE_SET = ∅ and supp(ACEFG) = minSup, ACEFG is an FCI, and it is put into tab_k with its key, which is its support (0.2). ACEFG is also inserted into PRE_SET. Next, itemset CEFG is processed. Because supp(CEFG) = minSup and the BitTable of CEFG is a subset of the BitTable of ACEFG in PRE_SET, CEFG is pruned. When DEF is processed, because supp(DEF) > minSup and its BitTable is not a subset of the BitTable of any itemset in PRE_SET, DEF is an FCI, and it is put into tab_k with its key, which is its support (0.5). After that, procedure DCI_CLOSED++ is called with PRE_SET = {ACEFG}, CLOSED_SET_new = DEF, and POST_SET_new = {H, G}. EF and F do not appear in POST_SET_new because they belong to CLOSED_SET. Because CLOSED_SET ≠ ∅, DEF is joined with H to create a new generator (newgen), which is DEFH. Similarly, DEFH is an FCI and is inserted into tab_k with its key (0.3). The procedure is called recursively with parameters PRE_SET = {ACEFG}, CLOSED_SET_new = DEFH, and POST_SET_new = {G}. Because CLOSED_SET ≠ ∅, the new generator is DEFGH. There is no itemset X in PRE_SET such that the BitTable of DEFGH is a subset of the BitTable of X, and thus DEFGH is an FCI and is inserted into tab_k with its key (0.2). Now, POST_SET = ∅ and thus DEF is added into PRE_SET. The process continues by joining DEF with G to form DEFG. DEFG is also an FCI and it is inserted into tab_k with its key (0.4). The algorithm then starts with a new generator, H. H is an FCI and is inserted into tab_k with its key (0.7). Note that now the number of entries in tab_k is 5, which is equal to k. The algorithm continues to insert generated FCIs into tab_k. They include HEF (key is 0.5), HEFG (key is 0.3), and HG (key is 0.5). Consider the process of inserting FCI EF (whose key is 0.8) into tab_k. Because the key of EF is greater than that of the last entry (DEFGH) in tab_k (key is 0.2), DEFGH is removed and EF is inserted into tab_k, and minSup is set to 0.3 (the key of the last entry in tab_k). The algorithm continues to process other FCIs. The results are shown in Table 4.

Table 4. Top-rank-k FCIs generated according to the TRK-FCI algorithm

k   key/sup   FCIs
1   0.8       {EF}, {G}
3   0.6       {EFG},
4   0.5       {DEF}, {HEF}, {HG}
5   0.4       {DEFG}
The TRK-FCI algorithm is based on the DCI-Plus algorithm. Because DCI-Plus uses bit vectors to represent the tidsets of items, it requires more memory to store bit vectors and more time to compute the intersection of bit vectors when the number of transactions in the database is large. To reduce the mining time and memory usage, we develop an improved algorithm, TRK-FCI-DBV, that uses DBVs instead.

Table 5 illustrates the use of DBVs for mining top-rank-k FCIs. It lists the items in F2 with their DBVs, closures, and supports.

Table 5. DBVs, closures, and supports of items in F2

Item   DBV       Closure   Support
A      {0,520}   ACEFG     0.2
C      {0,520}   CEFG      0.2
D      {0,355}   DEF       0.5
H      {0,919}   H         0.7
E      {0,879}   EF        0.8
F      {0,879}   F         0.8
G      {0,763}   G         0.8

Procedure DCI_CLOSED++ is the same as that in TRK-FCI, but the operations on BitTables are replaced by operations on DBVs. The final results are the same as those obtained with TRK-FCI.

5 Experimental results

The algorithms used in the experiments were implemented in C# 2012 on a personal computer with an i5-4200U 1.60-GHz CPU and 4 GB of RAM running Windows 8.1. The experiments were run on three databases downloaded from the Frequent Itemset Mining Dataset Repository (http://fimi.ua.ac.be/data). Table 6 shows the characteristics of the experimental databases.

Table 6. Characteristics of experimental databases (columns: Database, # of transactions, # of items)

The experimental databases have different features. The Pumsb and Accidents databases have many transactions (or records), whereas the Chess database is small (3196 transactions).
5.1 Execution time

The efficiency of applying BitTable and DBVs for mining top-rank-k FCIs was evaluated. The experiments were conducted with various values of threshold k for the Accidents, Chess, and Pumsb databases. With increasing threshold k, the number of FCIs increased, increasing the time required to obtain top-rank-k FCIs.

Fig. 3. Runtimes of TRK-FCI-DBV and TRK-FCI for Accidents database.
Fig. 4. Runtimes of TRK-FCI-DBV and TRK-FCI for Chess database.
Fig. 5. Runtimes of TRK-FCI-DBV and TRK-FCI for Pumsb database.
Fig. 6. Memory usage of TRK-FCI-DBV and TRK-FCI for Chess database.
Fig. 7. Memory usage of TRK-FCI-DBV and TRK-FCI for Accidents database.
Fig. 8. Memory usage of TRK-FCI-DBV and TRK-FCI for Pumsb database.

Figures 3 to 5 show that the time required for mining top-rank-k FCIs from the three databases increases with increasing k. TRK-FCI-DBV runs faster than TRK-FCI. For example, consider the Pumsb database with a threshold k of 200. The mining time of TRK-FCI is 179.8 s and that of TRK-FCI-DBV is 130.7 s. Most of the processing time for both algorithms is spent in the itemset expansion stage. TRK-FCI-DBV has a lower processing time because it uses a more compact data format.
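As a quick check of the Pumsb figures quoted above (179.8 s for TRK-FCI and 130.7 s for TRK-FCI-DBV at k = 200), the relative saving and the speedup can be worked out as follows. The symbols T_BT and T_DBV are introduced here only for illustration and do not appear in the original analysis.

```latex
\[
\frac{T_{\mathrm{BT}} - T_{\mathrm{DBV}}}{T_{\mathrm{BT}}}
  = \frac{179.8 - 130.7}{179.8} \approx 0.273,
\qquad
\frac{T_{\mathrm{BT}}}{T_{\mathrm{DBV}}}
  = \frac{179.8}{130.7} \approx 1.38
\]
```

For this single data point, TRK-FCI-DBV therefore needs roughly 27% less time than TRK-FCI.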
5.2 Memory usage

Figures 6 to 8 show that the memory usage for mining top-rank-k FCIs for the three experimental databases increases with increasing threshold k. The memory required by TRK-FCI-DBV is significantly less than that required by TRK-FCI. Consider the Pumsb database with a threshold k of 120. The memory usage values of the two algorithms are similar; however, when the threshold k is increased to 200, the memory used by TRK-FCI is nearly double that used by TRK-FCI-DBV.
6 Conclusion and future work

This paper proposed a method for mining top-rank-k FCIs based on DCI-Plus. Two efficient algorithms, TRK-FCI and TRK-FCI-DBV, were proposed. These two algorithms differ in the way they represent data for each itemset, which gives them different mining times and memory usage values. A strategy is used to automatically change minSup to prune candidates in the mining process. The mining time and memory usage of the two algorithms were analyzed to compare the effectiveness of DBV with that of BitTable.

In the future, we will study how to prune candidates more efficiently. Moreover, we will try to use other approaches for mining top-rank-k FCIs. We will also expand our research to quantitative databases.
References

[1] R. Agrawal, T. Imielinski and A. Swami, Mining association rules between sets of items in large databases, In Proc. of the 1993 ACM SIGMOD Conference, Washington DC, USA, 1993, pp. 207–216.
[2] R. Agrawal and R. Srikant, Fast algorithms for mining association rules in large databases, In Proc. of the 20th International Conference on Very Large Data Bases, San Francisco, CA, USA, 1994, pp. 487–499.
[3] S. Ayubi, M.K. Muyeba, A. Baraani and J. Keane, An algorithm to mine general association rules from tabular data, Information Sciences 179(20) (2009), 3520–3539.
[4] E. Baralis, L. Cagliero, T. Cerquitelli and P. Garza, Generalized association rule mining with constraints, Information Sciences 194 (2011), 68–84.
[5] Z.H. Deng, Fast mining top-rank-k frequent patterns by using Node-lists, Expert Systems with Applications 41(4)
[6] Y.J. Du and H.M. Li, Strategy for mining association rules for web pages based on formal concept analysis, Applied Soft Computing 10 (2010), 772–783.
[7] H.V. Duong and T.C. Truong, An efficient method for mining association rules based on minimum single constraints, Vietnam Journal of Computer Science 2(2) (2015), 67–83.
[8] P. Fournier-Viger and V.S. Tseng, Mining top-k non-redundant association rules, In Proc. of 20th International Symposium, ISMIS 2012, Macau, China, 7661, 2012, pp.
[9] P. Fournier-Viger and V.S. Tseng, Mining top-K sequential rules, In Proc. of ADMA 2011, Beijing, China, 7121, 2011,
[10] P. Fournier-Viger, C.W. Wu and V.S. Tseng, Mining top-K association rules, In Proc. of Canadian Conference on AI 2012, Toronto, Canada, 7310, 2011, pp. 61–73.
[11] G. Grahne and J. Zhu, Fast algorithms for frequent itemset mining using FP-trees, IEEE Transactions on Knowledge and Data Engineering 17(10) (2005), 1347–1362.
[12] K. Gouda and M.J. Zaki, GenMax: An efficient algorithm for mining maximal frequent itemsets, Data Mining and Knowledge Discovery 11(3) (2005), 223–242.
[13] T.R. Hoens, Q. Qian, N.V. Chawla and Z.H. Zhou, Building decision trees for the multiclass imbalance problem, In Proc. of PAKDD 2012, 2012, pp. 122–134.
[14] Q.H.T. Le, T. Le, B. Vo and B. Le, An efficient and effective algorithm for mining top-rank-k frequent patterns, Expert Systems with Applications 42(1) (2015), 156–164.
[15] T. Le and B. Vo, An N-list-based algorithm for mining frequent closed patterns, Expert Systems with Applications
[16] B. Le, M.T. Tran and B. Vo, Mining frequent closed inter-sequence patterns efficiently using dynamic bit vectors, Applied Intelligence 43(1) (2015), 74–84.
[17] W. Li, J. Han and J. Pei, CMAR: Accurate and efficient classification based on multiple class-association rules, In Proc. of the 1st IEEE International Conference on Data Mining, San Jose, California, USA, 2001, pp. 369–376.
[18] B. Liu, W. Hsu and Y. Ma, Integrating classification and association rule mining, In Proc. of the 4th International Conference on Knowledge Discovery and Data Mining,
[19] X.B. Liu, K. Zhai and W. Pedrycz, An improved association rules mining method, Expert Systems with Applications
[20] W.Y. Loh, Classification and regression trees, WIREs Data Mining and Knowledge Discovery 1(1) (2011), 14–23.
[21] C. Lucchese, S. Orlando and R. Perego, Fast and memory efficient mining of frequent closed itemsets, IEEE Trans. Knowledge and Data Engineering 18(1) (2006), 21–36.
[22] S.T. Mai, X. He, J. Feng, C. Plant and C. Böhm, Anytime density-based clustering of complex data, Knowledge and Information Systems 45(2) (2015), 319–355.
[23] V. Nebot and R. Berlanga, Finding association rules in semantic web data, Knowledge-Based Systems 25 (2012),
[24] L.T.T. Nguyen, B. Vo, T.P. Hong and H.C. Thanh, CAR-Miner: An efficient algorithm for mining class-association rules, Expert Systems with Applications 40(6) (2013), 2305–2311.
[25] D. Nguyen, L.T.T. Nguyen, B. Vo and T.P. Hong, A novel method for constrained class association rule mining, Information Sciences 320 (2015), 107–125.
[26] N. Pasquier, Y. Bastide, R. Taouil and L. Lakhal, Discovering frequent closed itemsets for association rules, In Proc. of the 5th International Conference on Database Theory, 1999, pp. 398–416.
[27] N. Pasquier, Y. Bastide, R. Taouil and L. Lakhal, Efficient mining of association rules using closed itemset lattices, Information Systems 24(1) (1999), 25–46.
[28] J. Pei, J. Han and R. Mao, CLOSET: An efficient algorithm for mining frequent closed itemsets, In Proc. of the 5th ACM-SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, Dallas, Texas, USA, 2000, pp. 11–20.
[29] G. Pyun and U. Yun, Mining top-k frequent patterns with combination reducing techniques, Applied Intelligence 41(1) (2014), 76–98.
[30] J.R. Quinlan, Induction of decision trees, Machine Learning 1(1) (1986), 81–106.
[31] J. Sahoo, A.K. Das and A. Goswami, An effective association rule mining scheme using a new generic basis, Knowledge and Information Systems 43(1) (2015), 127–156.
[32] N.G. Singh, S.R. Singh and A.K. Mahanta, CloseMiner: Discovering frequent closed itemsets using frequent closed tidsets, In Proc. of the 5th ICDM, Washington DC, USA, 2005, pp. 633–636.
[33] M.R. Tolun and S.M. Abu-Soud, ILA: An inductive learning algorithm for production rule discovery, Expert Systems with Applications 14(3) (1998), 361–370.
[34] M.T. Tran, B. Le and B. Vo, Combination of dynamic bit vectors and transaction information for mining frequent closed sequences efficiently, Engineering Applications of Artificial Intelligence 38 (2015), 183–189.
[35] B. Vo and B. Le, Mining traditional association rules using frequent itemsets lattice, 39th International Conference on CIE, Troyes, France, 2009, pp. 1401–1406.
[36] B. Vo and B. Le, Interestingness measures for association rules: Combination between lattice and hash tables, Expert Systems with Applications 38(9) (2011), 11630–11640.
[37] B. Vo, T.P. Hong and B. Le, DBV-Miner: A dynamic bit-vector approach for fast mining frequent closed itemsets, Expert Systems with Applications 39(8) (2012), 7196–7206.
[38] B. Vo, T.P. Hong and B. Le, A lattice-based approach for mining most generalization association rules, Knowledge-Based Systems 45 (2013), 20–30.
[39] J. Wang, J. Han, Y. Lu and P. Tzvetkov, TFP: An efficient algorithm for mining top-k frequent closed itemsets, IEEE Transactions on Knowledge and Data Engineering 17(5)
[40] J. Wang, J. Han and J. Pei, CLOSET+: Searching for the best strategies for mining frequent closed itemsets, In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003, pp. 236–245.
[41] M.J. Zaki, Mining non-redundant association rules, Data Mining and Knowledge Discovery 9(3) (2004), 223–248.
[42] M.J. Zaki and C.J. Hsiao, Efficient algorithms for mining closed itemsets and their lattice structure, IEEE Transactions on Knowledge and Data Engineering 17(4) (2005),