The lattice-based approaches for mining association rules: a review
Tuong Le1,2 and Bay Vo3,4*
The traditional methods for mining association rules (ARs) include two phases: mining frequent itemsets (FIs)/frequent closed itemsets (FCIs)/frequent maximal itemsets (FMIs) and generating ARs from the FIs/FCIs/FMIs. Lattice-based approaches (LBAs) for mining ARs are newer approaches that also comprise two phases: building a frequent itemset lattice (FIL)/frequent closed itemset lattice (FCIL) and generating ARs from the lattice. The total mining time of LBAs is lower than that of the traditional methods for mining ARs. Besides, the most important advantage of LBAs is that the algorithms build the lattice only once and can then mine ARs with many different confidences or many different minimum supports (the thresholds have to be greater than or equal to the threshold used to build the lattice) without mining FIs/FCIs again. In this article, we describe a number of existing LBAs for mining ARs on static databases, including lattice building and rule generation. In addition, in today’s online systems, the data often change through operations such as insert, delete, and update. Hence, a number of LBAs for mining ARs on dynamic databases are also described. Finally, the complexity of the LBAs for mining ARs is analyzed. © 2016 John Wiley & Sons, Ltd
How to cite this article:
WIREs Data Mining Knowl Discov 2016, 6:140–151 doi: 10.1002/widm.1181
*Correspondence to: bayvodinh@gmail.com
1 Division of Data Science, Ton Duc Thang University, Ho Chi Minh City, Vietnam
2 Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam
3 Faculty of Information Technology, Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam
4 College of Electronics and Information Engineering, Sejong University, Seoul, Republic of Korea
Conflict of interest: The authors have declared no conflicts of interest for this article.
INTRODUCTION
Data mining is the process of analyzing data to find knowledge for use in intelligent systems. Many problems have been studied, such as mining association rules (ARs), classification,1–6 clustering,7,8 text mining,9 and their applications.2,10 Mining ARs, including traditional ARs, minimal non-redundant association rules (MNARs), and most generalization association rules (MGARs), is widely used in market basket analysis, online e-commerce such as Amazon and Alibaba, and several other recommendation systems. Traditional approaches for mining ARs consist of two steps: mining frequent itemsets (FIs)/frequent closed itemsets (FCIs)/frequent maximal itemsets (FMIs),11,12,13 and generating rules from those itemsets. Some variants of FIs have been proposed, such as high utility itemsets (itemsets whose utility satisfies a given threshold),14–27 top-k high utility itemsets (the k itemsets with the highest utility),28 weighted patterns (patterns with weighted items),29–31 erasable itemsets (itemsets that can be eliminated without greatly affecting the factory’s profit),32–34 and weighted erasable patterns (erasable itemsets that take the distinct weight of each item into account).35,36 Besides, several types of representations that limit the number of FIs have also been proposed, such as FCIs,37–41 FMIs,42–47 top-k FIs,48,49 top-rank-k FIs,50,51 and FIs with constraints.52 In traditional approaches for mining ARs, researchers usually focus on the first phase (mining FIs/FCIs/FMIs). However, the second phase (rule generation) takes a lot of time when the number of FIs/FCIs/FMIs is large. Therefore, lattice-based approaches (LBAs) for mining ARs have been proposed to overcome this weakness. Generally, these approaches build a frequent itemset lattice (FIL)/frequent closed itemset lattice (FCIL) in the first phase. In the next phase, they only traverse the lattice to generate ARs. As generating rules from a lattice has lower complexity than traditional approaches such as Apriori or hash tables, the total mining time of LBAs is lower than that of the traditional methods for mining ARs. Moreover, the largest advantage of LBAs is that the algorithms build the lattice only once and can then mine ARs with many different confidences or many different minimum supports (the thresholds have to be greater than or equal to the threshold used to build the lattice) without mining FIs/FCIs/FMIs again. Therefore, LBAs are now extensively used to mine ARs. In addition, in today’s online systems, the data often change through operations such as add, delete, and update, especially in e-commerce systems, which raises the need to adapt AR mining methods to the new requirements. There have been several studies on mining patterns/rules on dynamic databases. In this article, we review LBAs for mining ARs, including the lattice building and rule generation phases. Furthermore, a number of LBAs for mining ARs on dynamic databases are also surveyed. Next, the complexity of LBAs for mining ARs is analyzed. Finally, some challenges of the LBAs and their potential applications in the near future are introduced.
The rest of the article is organized as follows. The section “Classical Approaches for Mining ARs” presents the classical approaches for mining FIs/FCIs and mining ARs/MNARs/MGARs. In the section “The FIL/FCIL Building,” we report the existing approaches for building FIL and FCIL. Next, the section “LBAs for Mining ARs” presents a number of LBAs for mining ARs. Some incremental LBAs for mining ARs are subsequently presented in the section “LBAs for Mining ARs on Dynamic Databases.” Then, the section “Complexity Analysis” gives the complexity analysis of LBAs for mining ARs. The conclusion is presented in the section “Conclusion and Future Researches.”
CLASSICAL APPROACHES FOR MINING ARs
Consider a database (DB) comprising n transactions such that each transaction contains a number of items. The transaction database DBe presented in Table 1 serves as an example and is used for illustrative purposes throughout this article.
The support of an itemset X, denoted by σ(X), is the number of transactions in DB that contain all items of X. An itemset X is an FI if and only if σ(X) ≥ ⌈minSup × n⌉, in which minSup is a user-given minimum support threshold. Currently, there are many algorithms for mining FIs, which may be divided into three main groups: (1) Methods that use a candidate generate-and-test strategy: they generate frequent 1-itemsets, which are then used to generate candidate 2-itemsets, and so on until no more candidates can be generated. Apriori53 and BitTableFI54 are exemplar algorithms. (2) Methods that adopt a divide-and-conquer strategy: they compress DB into a tree structure and mine FIs from this tree. FP-Growth55 and FP-Growth*56 are exemplar algorithms. (3) Methods that use a hybrid approach: these methods use vertical data formats to compress DB and also mine FIs by using a divide-and-conquer strategy. Eclat,57 dEclat,58 Index-BitTableFI,59 DBV-FI,60 PrePost,61 FIN,62 NSFI,63 and PrePost+64 are some examples.
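The support and FI definitions above can be illustrated with a minimal Python sketch. The transaction contents are an assumption: they are reconstructed from the tidsets that Figure 1 shows for the 1-itemsets (A: 1345, C: 123456, D: 2456, T: 1356, W: 12345). The helper names `support` and `frequent_itemsets` are illustrative only, and the brute-force enumeration is not one of the cited algorithms.

```python
from itertools import combinations
from math import ceil

# Assumption: DBe reconstructed from the tidsets shown in Figure 1.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}


def support(itemset, db=DB):
    """sigma(X): number of transactions that contain every item of X."""
    items = set(itemset)
    return sum(1 for transaction in db.values() if items <= set(transaction))


def frequent_itemsets(db=DB, min_sup=0.5):
    """Brute-force enumeration of FIs; suitable for a toy database only."""
    threshold = ceil(min_sup * len(db))  # ceil(minSup x n)
    items = sorted(set("".join(db.values())))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = support(combo, db)
            if s >= threshold:
                result["".join(combo)] = s
    return result


fis = frequent_itemsets()
```

Itemsets are stored with their items in alphabetical order, so the node written AWC in the figures appears here as `"ACW"`.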
TABLE 1 | A Transaction Database (DBe) Example
An FI is called an FCI if none of its supersets has the same support. For instance, consider DBe and minSup = 50%. The two itemsets AW and ACW are FIs because σ(AW) = σ(ACW) = 4 > ⌈minSup × n⌉ = ⌈50% × 6⌉ = 3. However, AW is not an FCI because its superset ACW has the same support. Only ACW is an FCI. Most of the previously proposed algorithms for mining FCIs can be categorized as (1) generate-and-test, (2) divide-and-conquer, or (3) hybrid methods. The generate-and-test (Apriori-based) approach uses a level-wise search to mine FCIs. A well-known algorithm is Close.65 The divide-and-conquer approach uses some compact data structures to efficiently mine FCIs. Examples are CLOSET39 and CLOSET+.56 The hybrid approaches integrate the previous two. Typically, the database is first transformed into a vertical data format or a compressed format. The approach then utilizes some pruning properties to quickly prune nonclosed itemsets. Examples are CHARM, dCHARM,58 DBV-Miner,41 DCI_PLUS,66 and NAFCP.37
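The closedness test can be sketched in Python as follows. This is a brute-force illustration, not CHARM or any other cited algorithm: an itemset is treated as closed when it equals its closure, i.e., the set of items shared by all transactions that contain it, which is equivalent to having no superset with the same support. The database is the same assumed reconstruction of DBe from the tidsets in Figure 1, and the names `tidset`, `closure`, and `frequent_closed_itemsets` are hypothetical.

```python
from itertools import combinations

# Assumption: DBe reconstructed from the tidsets shown in Figure 1.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}


def tidset(itemset):
    """IDs of the transactions that contain every item of the itemset."""
    return frozenset(tid for tid, t in DB.items() if set(itemset) <= set(t))


def closure(itemset):
    """Items shared by all transactions containing the itemset.

    Only meaningful when the itemset occurs in at least one transaction.
    """
    common = set.intersection(*(set(DB[tid]) for tid in tidset(itemset)))
    return "".join(sorted(common))


def frequent_closed_itemsets(min_count=3):
    items = sorted(set("".join(DB.values())))
    fcis = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            x = "".join(combo)
            count = len(tidset(x))
            # The frequency check runs first, so closure() never sees an
            # itemset with an empty tidset.
            if count >= min_count and closure(x) == x:
                fcis[x] = count
    return fcis


fcis = frequent_closed_itemsets()
```

On this database the sketch yields seven FCIs, the same itemsets that appear as nodes in Figure 5.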
An AR is an implication expression of the form X → Y, where X and Y are disjoint itemsets, i.e., X ∩ Y = ∅. The strength of an AR can be measured in terms of its confidence. The confidence of a rule (c) determines how frequently the items in Y appear in transactions that contain X: c(X → Y) = σ(X ∪ Y)/σ(X). Each frequent k-itemset XY can produce up to 2^k − 2 ARs, ignoring rules that have empty antecedents or consequents (∅ → XY or XY → ∅). An AR can be extracted by partitioning the itemset XY into two nonempty subsets, X and Y, such that X → Y satisfies the confidence threshold (minConf). Note that all such rules already meet the support threshold because they are generated from an FI. Because rule generation from FIs is simple, there are relatively few studies on this stage; many studies focus on the stage of mining FIs/FCIs. Agrawal and Srikant53 introduced the following property to reduce the search space: moving items from the antecedent to the consequent cannot increase the confidence, because σ(AB) ≥ σ(ABC). Hence, if c(ABC → D) < minConf, then c(AB → CD) < minConf as well; equivalently, if c(AB → CD) ≥ minConf, then c(ABC → D) and c(ABD → C) also satisfy minConf. An algorithm based on this property has been proposed to efficiently mine ARs from the FIs/FCIs generated in stage 1. This method has been used to mine ARs from FIs/FCIs so far.
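To make the rule-generation step concrete, the following Python sketch enumerates all 2^k − 2 partitions of a frequent itemset and keeps those that satisfy minConf. It is a plain enumeration without the pruning property, and it uses an assumed reconstruction of DBe from the tidsets in Figure 1; `rules_from_itemset` is a hypothetical helper name.

```python
from itertools import combinations

# Assumption: DBe reconstructed from the tidsets shown in Figure 1.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}


def support(itemset):
    items = set(itemset)
    return sum(1 for t in DB.values() if items <= set(t))


def rules_from_itemset(itemset, min_conf):
    """Rules X -> Y with X, Y nonempty and disjoint, and X u Y = itemset."""
    items = sorted(itemset)
    sup_xy = support(items)
    rules = []
    for k in range(1, len(items)):  # size of the antecedent X
        for lhs in combinations(items, k):
            rhs = [i for i in items if i not in lhs]
            conf = sup_xy / support(lhs)  # c(X -> Y) = sigma(XY) / sigma(X)
            if conf >= min_conf:
                rules.append(("".join(lhs), "".join(rhs), conf))
    return rules


all_rules = rules_from_itemset("ACW", 0.0)    # every candidate partition
exact_rules = rules_from_itemset("ACW", 1.0)  # rules with 100% confidence
```

For the 3-itemset ACW there are 2^3 − 2 = 6 candidate rules, of which the ones with antecedents A, AC, and AW reach 100% confidence.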
Let X be an FCI. An itemset Y is a generator of X if and only if Y ⊂ X and σ(X) = σ(Y). For example, AW is a generator of ACW, because AW ⊂ ACW and σ(AW) = σ(ACW) = 4. Similarly, A and AC are also generators of ACW. Let G(X) be the set of generators of X. A generator Y ∈ G(X) is a minimal generator if and only if no proper subset of Y is in G(X). For example, G(ACW) = {A, AC, AW}; therefore, the set of minimal generators of ACW is mG(ACW) = {A}. An association rule R1: X1 → Y1 is an MNAR if there is no other AR R2: X2 → Y2 with σ(X1 ∪ Y1) = σ(X2 ∪ Y2), c(R1) = c(R2), X2 ⊆ X1, and Y2 ⊇ Y1. Two kinds of MNARs are obtained: (1) exact rules (confidence = 100%), which have the form X′ → X, where X is an FCI and X′ ∈ mG(X); and (2) approximate rules (confidence < 100%), which have the form X′ → Y, in which X and Y are FCIs, X′ ∈ mG(X), and X ⊂ Y.
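The generator definitions can be checked with a short Python sketch. It enumerates proper subsets by brute force, which is only an illustration and not how MG-CHARM or the lattice algorithms compute generators. The database is an assumed reconstruction of DBe from the tidsets in Figure 1, and the function names are hypothetical.

```python
from itertools import combinations

# Assumption: DBe reconstructed from the tidsets shown in Figure 1.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}


def support(itemset):
    items = set(itemset)
    return sum(1 for t in DB.values() if items <= set(t))


def generators(fci):
    """Proper, nonempty subsets of the FCI that have the same support."""
    items = sorted(fci)
    target = support(items)
    return [
        "".join(combo)
        for k in range(1, len(items))
        for combo in combinations(items, k)
        if support(combo) == target
    ]


def minimal_generators(fci):
    """Generators that contain no other generator as a proper subset."""
    gens = generators(fci)
    return [g for g in gens if not any(set(h) < set(g) for h in gens)]
```

Calling `generators("ACW")` reproduces G(ACW) = {A, AC, AW} from the text, and `minimal_generators("ACW")` reproduces mG(ACW) = {A}.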
Assume that there are two rules R1: X1 → Y1 and R2: X2 → Y2. Rule R1 is said to be more general than R2 (R1 ∝ R2) if and only if X1 ⊆ X2 and Y2 ⊆ Y1. Let R = {R1, R2, …, Rn} be the set of rules that satisfy minSup and minConf. A rule Ri is said to have a higher precedence than another rule Rj, denoted as Ri > Rj, if Ri ∝ Rj and one of the following conditions holds: (1) c(Ri) > c(Rj); (2) c(Ri) = c(Rj) and σ(Ri) > σ(Rj). Let RMG be the set of the MGARs of R: RMG = {Rj ∈ R | ¬∃ Ri ∈ R: Ri > Rj}.
THE FIL/FCIL BUILDING
LBAs for mining ARs are divided into two phases: (1) building the lattice and (2) generating ARs from the lattice. This section presents the existing approaches for building lattices. Some of the existing approaches for mining ARs from the lattices are then introduced in the section “LBAs for Mining ARs.”
The FIL Building
In 2009, Vo and Le67 proposed an algorithm for building an FIL (referred to as FIL-2009) directly from the database (Table 2). In FIL-2009, each node in the lattice is a tuple ⟨X, Tidset, Children⟩, where X is a k-itemset, Tidset is the set of IDs of the transactions containing X, and Children = {Y | Y is a (k + 1)-itemset and X ⊂ Y}. The FIL-2009 built for DBe in Table 1 with minSup = 50% is presented in Figure 1.
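The node structure ⟨X, Tidset, Children⟩ can be sketched in Python as below. The construction is deliberately naive (enumerate all frequent itemsets, then link each node to its frequent (k + 1)-supersets) and should not be read as the FIL-Building-2009 algorithm, whose details are not given here. The database is an assumed reconstruction of DBe from the tidsets in Figure 1; `Node` and `build_fil` are hypothetical names.

```python
from dataclasses import dataclass, field
from itertools import combinations

# Assumption: DBe reconstructed from the tidsets shown in Figure 1.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}


@dataclass
class Node:
    itemset: str                                  # X, items in alphabetical order
    tidset: frozenset                             # IDs of transactions containing X
    children: list = field(default_factory=list)  # frequent (k + 1)-supersets of X


def build_fil(min_count=3):
    """Naive lattice construction for illustration only."""
    items = sorted(set("".join(DB.values())))
    nodes = {"": Node("", frozenset(DB))}  # root: the empty itemset
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            tids = frozenset(t for t, tr in DB.items() if set(combo) <= set(tr))
            if len(tids) >= min_count:
                nodes["".join(combo)] = Node("".join(combo), tids)
    for x, node in nodes.items():
        node.children = [
            y for y in nodes if len(y) == len(x) + 1 and set(x) < set(y)
        ]
    return nodes


fil = build_fil()
```

With minSup = 50% (a count of at least 3), the sketch produces the root plus 19 nodes, matching the nodes listed for Figure 1; for example, `fil["A"].children` contains AC, AT, and AW.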
Although mining ARs from FIL-2009 is very effective, FIL-2009 is not an effective structure for mining MNARs. Therefore, in 2011, Vo and Le68 extended the structure of FIL-2009 (yielding FIL-2011) by adding one field that indicates whether or not a lattice node is a minimal generator and another field that indicates whether or not a lattice node is an FCI. These values are determined directly during lattice building. The structure is then used to effectively mine MNARs, as presented in the section “LBAs for Mining ARs.” With DBe in Table 1 and minSup = 50%, FIL-2011 is presented in Figure 2. In the figure, bold nodes and dashed nodes indicate FCIs and minimal generators, respectively.
When a node XA in FIL-2009 (or FIL-2011) is created, FIL-Building-2009 (or FIL-Building-2011) has to find all the nodes that are children of XA to update the lattice. This process first visits all children of X (Y ∈ X.Children). For each Y, the process visits all children of Y (YB ∈ Y.Children). For each YB, if XA ⊂ YB, the process adds YB to the children of XA (YB ∈ XA.Children). Considering FIL-2009 in Figure 1, when the algorithm creates the node TC, it has to consider all the child nodes of T, which are AT and TW. Next, the algorithm has to consider all the child nodes of AT and TW, which are {ATW, ATC} and {ATW}, respectively. However, considering the child nodes of TW does not find any node that is a child of TC: the node ATW is a duplicate, which makes the visit to the child nodes of TW unnecessary. To overcome this weakness, Vo et al.69 proposed a new structure for an FIL (FIL-2014) and the TFIL algorithm for building it. Each node of the lattice has the form ⟨Itemset, Tidset, ChildrenEC, ChildrenL⟩, in which ChildrenEC contains the child nodes based on the equivalence class feature associated with Itemset, and ChildrenL contains the child nodes based on the lattice feature associated with Itemset. Because this algorithm does not scan all the child nodes of XA to update the lattice, the time TFIL needs to build FIL-2014 is less than that of FIL-Building-2009 to build FIL-2009 and of FIL-Building-2011 to build FIL-2011. For DBe in Table 1 and minSup = 50%, FIL-2014 is presented in Figure 3.
TABLE 2 | Existing Approaches for Building Frequent Itemset Lattice (FIL)
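FIGURE 1 | FIL-2009 for DBe with minSup = 50%. Nodes (itemset: tidset, support): {} (root); A: 1345, 4; D: 2456, 4; T: 1356, 4; W: 12345, 5; C: 123456, 6; AT: 135, 3; AW: 1345, 4; AC: 1345, 4; DW: 245, 3; DC: 2456, 4; TW: 135, 3; TC: 1356, 4; WC: 12345, 5; ATW: 135, 3; ATC: 135, 3; AWC: 1345, 4; DWC: 245, 3; TWC: 135, 3; ATWC: 135, 3.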
FIGURE 2 | FIL-2011 for DBe with minSup = 50%.
The FCIL Building
In 2005, Zaki and Hsiao58 proposed CHARM-L to create an FCIL (FCIL-2005; Table 3). The FCIL-2005 created by CHARM-L for DBe with minSup = 50% is shown in Figure 4. However, MNARs and MGARs cannot be generated directly from FCIL-2005: mining them requires a level-wise approach to generate generators; therefore, it is inefficient in terms of mining time.
In 2013, Vo et al.70 proposed FCIL-Building-2013 to build an FCIL (FCIL-2013) effectively. First, FCIs with their minimal generators are mined using MG-CHARM.67 Then, FCIL-Building-2013 inserts the FCIs into FCIL-2013 with O(n × k) complexity, where n is the number of FCIs and k is the average number of child nodes in the lattice. Since k ≪ n, the FCIL-Building-2013 algorithm is efficient. The FCIL-2013 created by FCIL-Building-2013 on DBe with minSup = 50% is shown in Figure 5.
In 2014, Szathmary et al.71 proposed Snow-Touch, a novel computation schema for iceberg lattices with generators. First, FCI computation is delegated to the Charm algorithm. Then, frequent generators (FGs) are extracted by Talky-G. Next, these two methods, together with an FG-to-FCI matching technique, form the Touch algorithm. Finally, the precedence relation is retrieved from the FCIs with FGs by the Snow algorithm using a ground duality result from hypergraph theory. The result of Snow-Touch is the same as that of FCIL-Building-2013. On DBe with minSup = 50%, this result is shown in Figure 5.
FIGURE 3 | FIL-2014 for DBe with minSup = 50%.
TABLE 3 | Existing Approaches for Building Frequent Closed Itemset Lattice (FCIL)
No.  Name of Algorithm      Year  Name of FCIL
2    FCIL-Building-2013 70  2013  FCIL-2013
FIGURE 4 | FCIL-2005 for DBe with minSup = 50%.
FIGURE 5 | FCIL-2013 for DBe with minSup = 50%.
LBAs FOR MINING ARs
Besides traditional ARs, several other types of ARs have been proposed, namely MNARs and MGARs. Table 4 lists the existing LBAs for mining ARs, MNARs, and MGARs.
LBA for Mining ARs
The LBA-ARs-2009 algorithm of Vo and Le67 first traverses all child nodes Lc of the root of FIL-2009, and then calls a recursive function to traverse all nodes in the lattice (marking the visited nodes by turning a flag on). For each Lc, the algorithm uses a queue (Ω) to traverse all child nodes of Lc (marking all visited nodes to avoid duplicates). For each such node L, the algorithm computes the confidence of the rule from the information stored in the node. If the confidence ≥ minConf, the algorithm adds the rule to the results.
For example, LBA-ARs-2009 uses the FIL-2009 for DBe with minSup = 50%, shown in Figure 1, to generate ARs. Let minConf = 100%. Considering the first child node, A, of the root, Ω = {AT, AW, AC} (Figure 6).
1. Let L = AC, the last element of Ω. We have c(A → AC) = σ(AC)/σ(A) = 4/4 = minConf. Hence, this rule is added to the results.
2. Let L = AW, the second element of Ω. We have c(A → AW) = σ(AW)/σ(A) = 4/4 = minConf. Hence, this rule is added to the results.
3. Let L = AT, the first element of Ω. We have c(A → AT) = σ(AT)/σ(A) = 3/4 < minConf. Hence, the rule A → AT is not added to the results.
Next, LBA-ARs-2009 proceeds recursively to generate all rules in the lattice.
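The queue-based traversal described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the published LBA-ARs-2009 code: the lattice is built naively from an assumed reconstruction of DBe (tidsets from Figure 1), child links connect each node to its frequent (k + 1)-supersets, and the hypothetical function `rules_from_node` walks all descendants of one node Lc in breadth-first order, using a `seen` set in place of the visited flags.

```python
from collections import deque
from itertools import combinations

# Assumption: DBe reconstructed from the tidsets shown in Figure 1.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}
ITEMS = sorted(set("".join(DB.values())))


def sup(itemset):
    return sum(1 for t in DB.values() if set(itemset) <= set(t))


# Naive lattice: node supports plus child links (built once, reused below).
SUP = {}
for k in range(1, len(ITEMS) + 1):
    for combo in combinations(ITEMS, k):
        s = sup(combo)
        if s >= 3:  # ceil(50% x 6)
            SUP["".join(combo)] = s
CHILDREN = {
    x: [y for y in SUP if len(y) == len(x) + 1 and set(x) < set(y)] for x in SUP
}


def rules_from_node(lc, min_conf):
    """Visit every descendant L of lc once; emit lc -> (L minus lc)."""
    queue = deque(CHILDREN[lc])
    seen = set(CHILDREN[lc])  # plays the role of the visited flags
    rules = []
    while queue:
        node = queue.popleft()
        conf = SUP[node] / SUP[lc]
        if conf >= min_conf:
            rhs = "".join(i for i in node if i not in lc)
            rules.append((lc, rhs, conf))
        for child in CHILDREN[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return rules


exact = rules_from_node("A", 1.0)    # minConf = 100%
looser = rules_from_node("A", 0.75)  # same lattice, different minConf
```

The two calls at the end reuse the same lattice for two confidence thresholds, which is the reuse advantage the article attributes to LBAs.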
LBA for Mining ARs with Interestingness Measures
After building FIL-2009, the LBA-ARs-IM-2011 algorithm of Vo and Le60 creates HT-FIs (a hash table of FIs) with two levels of keys: (1) the first level uses the length of the itemset as a key; (2) for itemsets sharing the same length, the algorithm uses hash tables with keys computed by a sum Σ over the items y ∈ Y (Y is the itemset whose support needs to be determined).
At first, LBA-ARs-IM-2011 traverses all child nodes Lc of the root of FIL-2009, and then it calls a recursive function to traverse all nodes in the lattice (marking the visited nodes by turning a flag on). The algorithm uses a queue to traverse all child nodes of Lc (marking all visited nodes to avoid duplicates). For each child node L (of Lc), the authors compute the measure value by using the function vm(n, σ(Lc), λ(L\Lc), σ(L)) (where n is the number of transactions, σ(Lc) is the support of Lc, σ(L) is the support of L, and λ(L\Lc) is the support of L\Lc obtained from the hash table at level |L\Lc|), and add the rule to ARs. A number of measures are shown in Table 5. In practice, the number of generated rules is very large. Therefore, a threshold on vm is needed to shrink the rule set.
TABLE 4 | Existing Lattice-based Approaches (LBAs) for Mining Association Rules (ARs)
No.  Name of Algorithm  Authors (Year)  Type of ARs
1    LBA-ARs-2009       Vo and Le67     ARs
2    LBA-ARs-IM-2011    Vo and Le60     ARs (with interestingness measures)
MGARs, most generalization association rules; MNARs, minimal non-redundant association rules.
FIGURE 6 | Association rules generation on node A.
For example, LBA-ARs-IM-2011 uses the FIL-2009 for DBe with minSup = 50%, shown in Figure 1, to generate ARs with the lift measure. Let minVM = 1.2. Considering the first child node, A, of the root, Ω = {AT, AW, AC} (Figure 7).
1. Let L = AC, the last element of Ω. We have vm(A → AC) = (4 × 6)/(4 × 6) = 1 < minVM. Hence, this rule is not added to the results.
2. Let L = AW, the second element of Ω. We have vm(A → AW) = (4 × 6)/(4 × 5) = 1.2 = minVM. Hence, this rule is added to the results.
3. Let L = AT, the first element of Ω. Because σ(AT) = 3 and σ(T) = 4, we have vm(A → AT) = (3 × 6)/(4 × 4) = 1.125 < minVM. Hence, this rule is not added to the results.
Next, LBA-ARs-IM-2011 proceeds recursively to generate all rules in the lattice.
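The lift values in this example can be recomputed with a few lines of Python. The supports are taken from the node labels of Figure 1 (in particular σ(AT) = 3 and σ(T) = 4), and the lift of a rule X → Y is assumed here to be n × σ(XY)/(σ(X) × σ(Y)), which is the form the arithmetic in the example follows; the function name `lift` is illustrative.

```python
def lift(n, sup_x, sup_y, sup_xy):
    """Lift of X -> Y from transaction count and supports."""
    return (n * sup_xy) / (sup_x * sup_y)


N = 6  # number of transactions in DBe
# Supports read from the node labels of Figure 1.
SUP = {"A": 4, "C": 6, "W": 5, "T": 4, "AC": 4, "AW": 4, "AT": 3}

# Lift of A -> C, A -> W, and A -> T (the three children of node A).
values = {rhs: lift(N, SUP["A"], SUP[rhs], SUP["A" + rhs]) for rhs in "CWT"}
kept = [rhs for rhs, v in values.items() if v >= 1.2 - 1e-9]  # minVM = 1.2
```

A small tolerance is used in the threshold comparison so that a value equal to minVM is not lost to floating-point rounding.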
LBA for Mining MNARs
First, MNARs-FCIL traverses all child nodes Lc of the root of FIL-2011, in which each node has one field marking whether or not the node is a minimal generator (mG) and another field indicating whether or not the node is closed. Then it calls a function to traverse all nodes in the lattice recursively (marking the visited nodes by turning a flag on). The algorithm uses a queue to traverse all child nodes of Lc (marking all visited nodes to avoid duplicates). For each child node L of Lc (where Lc is a minimal generator), the algorithm computes c(Lc → L\Lc); if L is an FCI and c(Lc → L\Lc) ≥ minConf, the algorithm adds Lc → L\Lc to the results.
For example, MNARs-FCIL uses the FIL-2011 for DBe with minSup = 50% (Figure 2) to generate MNARs. Let minConf = 100%. Consider the first child node, A, of the root. Because A is a minimal generator and c(A → AWC) = 4/4 = minConf, A → AWC is added to the results (Figure 8). Next, MNARs-FCIL recursively performs this process to generate all of the MNARs in the lattice.
TABLE 5 | Value of Some Measures with Rule X → Y
7  Phi-coefficient  (n × nXY − nX × nY)/√(nX × nY × (n − nX) × (n − nY))
FIGURE 7 | Association rules with interestingness measures generation on node A.
LBA for Mining MGARs
In 2013, Vo et al.70 proposed an LBA for mining MGARs based on the following theorem. Given three nodes l1, l2, and l3 in FCIL-2013, if l1 is the parent node of l2, l2 is the parent node of l3, and σ(l2)/σ(l1) < minConf, then σ(l3)/σ(l1) < minConf. According to this theorem, if a lattice node {Y} is a child node of {X} in the FCIL and σ(Y)/σ(X) < minConf, then the child nodes of {Y} cannot form rules with {X}.
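The theorem follows from the fact that a child node in the lattice is a superset of its parent, so its support cannot be larger. A short derivation, written here as a sketch in LaTeX with the same symbols as the text:

```latex
% l_1 is the parent of l_2 and l_2 is the parent of l_3,
% hence l_1 \subset l_2 \subset l_3.
\begin{align*}
l_2 \subset l_3
  &\;\Rightarrow\; \sigma(l_3) \le \sigma(l_2)
  && \text{(a superset occurs in no more transactions)} \\
  &\;\Rightarrow\; \frac{\sigma(l_3)}{\sigma(l_1)}
     \le \frac{\sigma(l_2)}{\sigma(l_1)} < \mathit{minConf}.
\end{align*}
```

For instance, with the supports of the FCIs C (6), TC (4), and ATWC (3) and an assumed minConf of 70%, σ(TC)/σ(C) = 4/6 ≈ 0.67 is below the threshold, and so is σ(ATWC)/σ(C) = 3/6 = 0.5.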
In detail, MGARs-FCIL first traverses all the FCIs. For each FCI C, it initializes the RHS (right-hand side) to ∅. It then generates rules from the minimal generators of C to C. The algorithm uses a queue (Ω) to traverse all the child nodes of {C} (marking all visited nodes to avoid duplicates). For each child node Ls of {C}, the confidence of all rules of the form X′ → Ls\X′ (X′ ∈ mG(C)) is calculated. If the confidence satisfies minConf and Ls is not marked, then Ls is added to Ω for generating all rules from C to Ls. After Ls is added to Ω, it is marked so that it is not visited again. This process is repeated to generate all rules in the lattice.
LBAs FOR MINING ARs ON DYNAMIC DATABASES
In 2014, Vo et al.69 proposed two effective approaches for maintaining an FIL with dynamically inserted data based on the pre-large and tidset/diffset concepts. The pre-large concept72 is based on a safety threshold f = ⌊(SU − SL) × |D| / (1 − SU)⌋, which reduces the need to rescan the original database when maintaining ARs. Here, SU is the upper threshold, SL is the lower threshold, and |D| is the number of transactions in the original database D. When the number of new transactions is equal to or less than f, the algorithm does not need to rescan the original database. The FIL with the pre-large concept is called PFIL (pre-large FIL). In tidset-based maintenance of a pre-large FIL (TMPFIL) and diffset-based maintenance of a PFIL (DMPFIL), each increment is processed as follows. (1) If the original database is empty, the algorithm uses the lower threshold SL to build a PFIL and recalculates the safety threshold f for the incremental database D′. (2) If the number of transactions in the incremental database D′ is larger than f, the algorithm uses SL to build a PFIL and recalculates the safety threshold f for D + D′. (3) If the number of transactions in the incremental database D′ is equal to or less than f, the algorithm updates the PFIL without scanning the database. (4) The original database is updated as D = D + D′. The experimental results69 show that DMPFIL outperforms both TMPFIL and the batch approach in terms of the execution time required to build an FIL (Table 6).
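The safety threshold and the rescan decision can be sketched in Python as follows. The function and parameter names are hypothetical, the example thresholds (SU = 50%, SL = 40%) are assumptions chosen for illustration, and exact rational arithmetic is used so that the floor is not disturbed by floating-point rounding.

```python
from fractions import Fraction
from math import floor


def safety_threshold(s_upper, s_lower, db_size):
    """f = floor((SU - SL) * |D| / (1 - SU)), computed exactly.

    Thresholds are passed as strings such as "0.5" so that Fraction
    parses them without binary rounding error.
    """
    s_u, s_l = Fraction(s_upper), Fraction(s_lower)
    return floor((s_u - s_l) * db_size / (1 - s_u))


def needs_rescan(new_transactions, f):
    """Rescan only when the number of new transactions exceeds f."""
    return new_transactions > f


f = safety_threshold("0.5", "0.4", 100)  # assumed thresholds, |D| = 100
```

With these assumed values, up to 20 inserted transactions can be absorbed by updating the PFIL, and the 21st triggers a rebuild with SL.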
In 2014, La et al.73 proposed MFCIL-2014 for maintaining an FCIL with dynamically inserted data based on the pre-large concept. The algorithm proceeds as follows. (1) Build the initial FCIL-2005 using CHARM-L58 with SL. (2) Build an index table for the initial lattice from Step 1. (3) Add transactions, one by one, to the lattice with the improved CLICL75 algorithm while the number of inserted transactions is lower than f. (4) Rescan the entire database when the number of inserted transactions reaches f, then go back to Step 1. Experimental results73 show that MFCIL-2014 outperforms CLICL in terms of both execution time and memory usage.
FIGURE 8 | Minimal non-redundant association rules generation on node A.
In 2015, based on the pre-large and tidset/diffset concepts, Vo et al.74 proposed two algorithms (TiFU-FIL and DiFU-FIL) for updating a PFIL with transaction deletion. The experimental results74 show that the two approaches outperform the batch-mode algorithm in building a PFIL, with the diffset-based approach (DiFU-FIL) being more efficient than the tidset-based approach (TiFU-FIL).
Current incremental approaches are mainly based on the pre-large concept, which requires rescanning the database when the number of inserted or updated transactions exceeds a safety threshold. Thus, it is important to find methods that can mine ARs without rescanning the database, as a step toward mining ARs on data streams.
COMPLEXITY ANALYSIS
The complexity of mining FIs/FCIs in the worst case is O(2^|I|), where |I| is the number of items in the database. The complexity of building an FIL from the database67–69 is O(2^|I| × k), where k is the average number of all subsets of all FIs. In addition, the complexity of building an FCIL from the set of FCIs70 in the worst case is O(n × k), where n is the number of FCIs and k is the average number of all subsets of all FCIs. Therefore, the complexity of building an FCIL from the database70 is O(2^|I| + n × k). Fortunately, Vo et al.70 showed that k ≪ n and n ≪ 2^|I| in most databases; therefore, the overall computational complexity of FIL/FCIL building67–70 is O(2^|I|), the same as that of mining FIs/FCIs.
For the generation of ARs/MNARs/MGARs from a built lattice, the complexity is O(n × k), where n is the number of nodes in the FIL/FCIL and k is the average number of all subsets of all FCIs, with k ≪ n. Meanwhile, mining ARs from FIs/FCIs requires O(n^2). Therefore, LBAs for mining ARs/MNARs/MGARs are especially effective when users need to mine ARs/MNARs/MGARs with many different confidences or many different minimum support thresholds (the thresholds have to be greater than or equal to the threshold used to build the lattice).
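The constraint on reusable thresholds can be illustrated with a small Python sketch. The node supports are copied from the node labels of Figure 1 (lattice built with minSup = 50%, i.e., a support count of at least 3), with itemsets written in alphabetical order; `frequent_at` is a hypothetical helper that filters the already built lattice instead of mining the database again.

```python
# Node supports copied from Figure 1 (itemsets in alphabetical order).
FIL_SUPPORT = {
    "A": 4, "C": 6, "D": 4, "T": 4, "W": 5,
    "AC": 4, "AT": 3, "AW": 4, "CD": 4, "CT": 4, "CW": 5, "DW": 3, "TW": 3,
    "ACT": 3, "ACW": 4, "ATW": 3, "CDW": 3, "CTW": 3,
    "ACTW": 3,
}
BUILD_COUNT = 3  # support count used when the lattice was built


def frequent_at(min_count):
    """Nodes that remain frequent at a stricter support count."""
    if min_count < BUILD_COUNT:
        # Itemsets below the build threshold were never stored in the lattice.
        raise ValueError("threshold is below the one used to build the lattice")
    return {x: s for x, s in FIL_SUPPORT.items() if s >= min_count}


at_four = frequent_at(4)
at_five = frequent_at(5)
```

A threshold below the build threshold cannot be served from the lattice, because the corresponding itemsets are not present in it; this is why the sketch raises an error in that case.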
CONCLUSION AND FUTURE RESEARCHES
LBAs for mining ARs are new approaches that comprise two phases: FIL/FCIL building and generating ARs from the lattice. The total mining time of LBAs is lower than that of the traditional methods for mining ARs, especially when the number of FIs/FCIs is large. In this article, we survey the existing LBAs for mining ARs on static and dynamic databases. First, we present some methods for building lattices on static databases, including FIL and FCIL. Next, methods using LBAs for generating traditional ARs, MNARs, and MGARs from FIL/FCIL are presented. Then, approaches for maintaining FIL/FCIL toward mining ARs on dynamic databases, including inserted and deleted transactions, are surveyed.
In reality, the number of FIs/FCIs is often large and end users are interested only in a small set concerning a certain number of issues. Therefore, mining FIs/FCIs with constraints has been proposed. However, building FIL/FCIL with constraints is still an open challenge.
Although methods for maintaining an FIL with inserted and deleted transactions have been proposed, a general method that maintains an FIL with inserted, deleted, and updated data is still needed. For FCIL, methods for maintaining an FCIL with deleted and updated transactions, as well as a general method for maintaining an FCIL with all operations, are needed.
In addition, current incremental approaches are mainly based on the pre-large concept. These methods require rescanning the database when the number of inserted or updated transactions exceeds a safety threshold. Thus, it is crucial to investigate methods that can mine ARs without rescanning the database, as a step toward mining ARs on data streams. Finally, examining LBA methods for mining ARs on quantitative databases is also a potential research direction.
TABLE 6 | Existing Lattice-based Approaches (LBAs) for Mining Association Rules (ARs) on Dynamic Databases
No.  Name of Algorithm        Year  Structure  Actions
1    TMPFIL and DMPFIL 69     2014  FIL        Inserted data
2    MFCIL-2014 73            2014  FCIL       Inserted data
3    TiFU-FIL and DiFU-FIL 74 2015  FIL        Deleted data
FCIL, frequent closed itemset lattice; FIL, frequent itemset lattice.
REFERENCES
1. Menardi G, Torelli N. Training and assessing classification rules with imbalanced data. Data Min Knowl Discov 2014, 28:92–122.
2. Nassirtoussi AK, Aghabozorgi SR, Teh YW, Ngo DCL. Text mining for market prediction: a systematic review. Expert Syst Appl 2014, 41:7653–7670.
3. Niemann U, Völzke H, Kühn JP, Spiliopoulou M. Learning and inspecting classification rules from longitudinal epidemiological data to identify predictive features on hepatic steatosis. Expert Syst Appl 2014, 41:5405–5415.
4. Nguyen TTL, Vo B, Hong TP, Hoang CT. Classification based on association rules: a lattice-based approach. Expert Syst Appl 2012, 39:11357–11366.
5. Shinde S, Kulkarni U. Extracting classification rules from modified fuzzy min–max neural network for data with mixed attributes. Appl Soft Comput 2016, 40:364–378.
6. Wang X, Liu X, Pedrycz W, Zhu X, Hu G. Mining axiomatic fuzzy set association rules for classification problems. Eur J Oper Res 2012, 218:202–210.
7. Le HS. A novel kernel fuzzy clustering algorithm for Geo-Demographic Analysis. Inform Sci 2015, 317:202–223.
8. Mai TS, He X, Feng J, Plant C, Böhm C. Anytime density-based clustering of complex data. Knowl Inf Syst 2015, 45:319–355.
9. Indurkhya N. Emerging directions in predictive text mining. Data Min Knowl Discov 2015, 5:155–164.
10. Vairavasundaram I, Ravi L. Data mining-based tag recommendation system: an overview. Data Min Knowl Discov 2015, 5:87–112.
11. Fariha A, Ahmed CF, Leung CK, Samiullah M, Pervin S, Cao L. A new framework for mining frequent interaction patterns from meeting databases. Eng Appl Artif Intel 2015, 45:103–118.
12. Fournier-Viger P, Gomariz A, Gueniche T, Soltani A, Wu CW, Tseng VS. SPMF: a Java open-source pattern mining library. J Mach Learn Res 2014, 15:3389–3393.
13. Hacene MR, Huchard M, Napoli A, Valtchev P. Relational concept analysis: mining concept lattices from multi-relational data. Ann Math Artif Intell 2013, 67:81–108.
14. Lan GC, Hong TP, Tseng VS. An efficient projection-based indexing approach for mining high utility itemsets. Knowl Inf Syst 2014, 38:85–107.
15. Lin CW, Lan GC, Hong TP. Mining high utility itemsets for transaction deletion in a dynamic database. Intell Data Anal 2015, 19:43–55.
16. Lin CW, Hong TP, Lan GC, Wong JW, Lin WY. Incrementally mining high utility patterns based on pre-large concept. Appl Intell 2014, 40:343–357.
17. Lin JCW, Gan W, Hong TP. A fast updated algorithm to maintain the discovered high-utility itemsets for transaction modification. Adv Eng Inform 2015, 29:562–574.
18. Lin JCW, Gan W, Hong TP, Tseng VS. Efficient algorithms for mining up-to-date high-utility patterns. Adv Eng Inform 2015, 29:648–661.
19. Song W, Liu Y, Li J. Mining high utility itemsets by dynamically pruning the tree structure. Appl Intell 2014, 40:29–43.
20. Song W, Liu Y, Li J. BAHUI: fast and memory efficient mining of high utility itemsets based on bitmap. Int J Data Warehouse Min 2014, 10:1–15.
21. Tseng VS, Wu CW, Fournier-Viger P, Yu PS. Efficient algorithms for mining the concise and lossless representation of high utility itemsets. IEEE Trans Knowl Data Eng 2015, 27:726–739.
22. Yun U, Ryang H. Incremental high utility pattern mining with static and dynamic databases. Appl Intell 2015, 42:323–352.
23. Zhang X, Deng ZH. Mining summarization of high utility itemsets. Knowl-Based Syst 2015, 84:67–77.
24. Kim D, Yun U. Efficient mining of high utility patterns with considering of rarity and length. Appl Intell, in press, doi:10.1007/s10489-015-0750-2.
25. Ryang H, Yun U, Ryu K. Fast algorithm for high utility pattern mining with the sum of item quantities. Intell Data Anal 2016, 20:395–415.
26. Ryang H, Yun U, Ryu K. Discovering high utility itemsets with multiple minimum supports. Intell Data Anal 2014, 18:1027–1047.
27. Yun U, Ryang H, Ryu K. High utility itemset mining with techniques for reducing overestimated utilities and pruning candidates. Expert Syst Appl 2014, 41:3861–3878.
28. Tseng VS, Wu CW, Fournier-Viger P, Yu PS. Efficient algorithms for mining top-k high utility itemsets. IEEE Trans Knowl Data Eng 2016, 28:54–67.
29. Lee G, Yun U, Ryang H. An uncertainty-based approach: frequent itemset mining from uncertain data with different item importance. Knowl-Based Syst 2015, 90:239–256.
30. Yun U, Pyun G, Yoon E. Efficient mining of robust closed weighted sequential patterns without information loss. Int J Artif Intell Tools 2015, 24:1–28.
31. Yun U, Yoon E. An efficient approach for mining weighted approximate closed frequent patterns considering noise constraints. Int J Uncertainty Fuzziness Knowl Based Syst 2014, 22:879–912.