Efficient strategies for parallel mining class association rules
Dang Nguyen a, Bay Vo b,*, Bac Le c
a University of Information Technology, Vietnam National University, Ho Chi Minh, Viet Nam
b Information Technology Department, Ton Duc Thang University, Ho Chi Minh, Viet Nam
c Department of Computer Science, University of Science, Vietnam National University, Ho Chi Minh, Viet Nam
Keywords:
Associative classification
Class association rule mining
Parallel computing
Data mining
Multi-core processor
Abstract
Mining class association rules (CARs) is an essential, but time-intensive task in Associative Classification (AC). A number of algorithms have been proposed to speed up the mining process. However, sequential algorithms are not efficient for mining CARs in large datasets while existing parallel algorithms require communication and collaboration among computing nodes which introduces the high cost of synchronization. This paper addresses these drawbacks by proposing three efficient approaches for mining CARs in large datasets relying on parallel computing. To date, this is the first study which tries to implement an algorithm for parallel mining CARs on a computer with the multi-core processor architecture. The proposed parallel algorithm is theoretically proven to be faster than existing parallel algorithms. The experimental results also show that our proposed parallel algorithm outperforms a recent sequential algorithm in mining time.
© 2014 Elsevier Ltd. All rights reserved.
1. Introduction
Classification is a common topic in machine learning, pattern recognition, statistics, and data mining. Therefore, numerous approaches based on different strategies have been proposed for building classification models. Among these strategies, Associative Classification (AC), which uses the associations between itemsets and class labels (called class association rules), has proven itself to be more accurate than traditional methods such as C4.5 (Quinlan, 1993) and ILA (Tolun & Abu-Soud, 1998; Tolun, Sever, Uludag, & Abu-Soud, 1999). The problem of classification based on class association rules is to find the complete set of CARs which satisfy the user-defined minimum support and minimum confidence thresholds from the training dataset. A subset of CARs is then selected to form the classifier. Since its first introduction in (Liu, Hsu, & Ma, 1998), numerous approaches have been proposed to solve this problem. Examples include the classification based on multiple association rules (Li, Han, & Pei, 2001), the classification model based on predictive association rules (Yin & Han, 2003), the classification based on the maximum entropy (Thabtah, Cowling, & Peng, 2005), the classification based on the information gain measure (Chen, Liu, Yu, Wei, & Zhang, 2006), the lazy-based approach for classification (Baralis, Chiusano, & Garza, 2008), the use of an equivalence class rule tree (Vo & Le, 2009), the classifier based on Galois connections between objects and rules (Liu, Liu, & Zhang, 2011), the lattice-based approach for classification (Nguyen, Vo, Hong, & Thanh, 2012), and the integration of taxonomy information into classifier construction (Cagliero & Garza, 2013).

However, most existing algorithms for associative classification have primarily concentrated on building an efficient and accurate classifier but have not considered carefully the runtime performance of discovering CARs in the first phase. In fact, finding all CARs is a challenging and time-consuming problem for two reasons. First, it may be hard to find all CARs in dense datasets since there are a huge number of generated rules; for example, in our experiments, some datasets can induce more than 4,000,000 rules. Second, the number of candidate rules to check is very large: assuming there are d items and k class labels in the dataset, there can be up to k × (2^d − 1) rules to consider. Very few studies, for instance (Nguyen, Vo, Hong, & Thanh, 2013; Nguyen et al., 2012; Vo & Le, 2009; Zhao, Cheng, & He, 2009), have discussed the execution time efficiency of the CAR mining process. Nevertheless, all these algorithms have been implemented with sequential strategies. Consequently, their runtime performance has not been satisfactory on large datasets, especially recently emerged dense datasets. Researchers have begun switching to parallel and distributed computing techniques to accelerate the computation. Two parallel algorithms for mining CARs were recently proposed on distributed memory systems (Mokeddem & Belbachir, 2010; Thakur & Ramesh, 2008).
http://dx.doi.org/10.1016/j.eswa.2014.01.038
0957-4174/© 2014 Elsevier Ltd. All rights reserved.
* Corresponding author. Tel.: +84 083974186.
E-mail addresses: nguyenphamhaidang@outlook.com (D. Nguyen), vdbay@it.tdt.edu.vn (B. Vo), lhbac@fit.hcmus.edu.vn (B. Le).
Along with the advent of computers with multi-core processors, more memory and processor computing power have been utilized so that larger datasets can be tackled in the main memory at lower cost in comparison with the usage of distributed or mainframe systems. Therefore, this present study aims to propose three efficient strategies for parallel mining CARs on multi-core processor computers. The proposed approaches overcome two disadvantages of existing methods for parallel mining CARs. They eliminate communication and collaboration among computing nodes, which introduces the overhead of synchronization. They also avoid data replication and do not require data transfer among processing units. As a result, the proposals significantly improve the response time compared to the sequential counterpart and existing parallel methods. The proposed parallel algorithm is theoretically proven to be more efficient than existing parallel algorithms. The experimental results also show that the proposed parallel algorithm can achieve up to a 2.1× speedup compared to a recent sequential CAR mining algorithm.
The rest of this paper is organized as follows. In Section 2, some preliminary concepts of the class association rule problem and the multi-core processor architecture are briefly given. The benefits of parallel mining on multi-core processor computers are also discussed in this section. Work related to sequential and parallel mining of class association rules is reviewed in Section 3. Our previous sequential CAR mining algorithm is summarized in Section 4 because it forms the basic framework of our proposed parallel algorithm. The primary contributions are presented in Section 5, in which three proposed strategies for efficiently mining classification rules under the high performance parallel computing context are described. The time complexity of the proposed algorithm is analyzed in Section 6. Section 7 presents the experimental results while conclusions and future work are discussed in Section 8.
2. Preliminary concepts
This section provides some preliminary concepts of the class association rule problem and the multi-core processor architecture. It also discusses the benefits of parallel mining on the multi-core processor architecture.
2.1. Class association rule
One of the main goals of data mining is to discover important relationships among items such that the presence of some items in a transaction is associated with the presence of some other items. To achieve this purpose, Agrawal and his colleagues proposed the Apriori algorithm to find association rules in a transactional dataset (Agrawal & Srikant, 1994). An association rule has the form X → Y where X and Y are frequent itemsets and X ∩ Y = ∅. The problem of mining association rules is to find all association rules in a dataset having support and confidence no less than the user-defined minimum support and minimum confidence thresholds.
A class association rule is a special case of an association rule in which only the class attribute is considered in the rule's right-hand side (consequent). Mining class association rules is to find the set of rules which satisfy the minimum support and minimum confidence thresholds specified by end-users. Let us define the CAR problem as follows.

Let D be a dataset with n attributes {A1, A2, ..., An} and |D| records (objects) where each record has an object identifier (OID). Let C = {c1, c2, ..., ck} be a list of class labels. A specific value of an attribute Ai and a class C are denoted by lower-case letters aim and c, respectively.
Definition 1. An item is described as an attribute and a specific value for that attribute, denoted by ⟨(Ai, aim)⟩, and an itemset is a set of items.
Definition 2. Let I = {⟨(A1, a11)⟩, ..., ⟨(A1, a1m1)⟩, ⟨(A2, a21)⟩, ..., ⟨(A2, a2m2)⟩, ..., ⟨(An, an1)⟩, ..., ⟨(An, anmn)⟩} be a finite set of items. Dataset D is a finite set of objects, D = {OID1, OID2, ..., OID|D|}, in which each object OIDx has the form OIDx = attr(OIDx) ∧ class(OIDx) (1 ≤ x ≤ |D|) with attr(OIDx) ⊆ I and class(OIDx) ∈ C. For example, OID1 for the dataset shown in Table 1 is {⟨(A, a1)⟩, ⟨(B, b1)⟩, ⟨(C, c1)⟩} ∧ {1}.
Definition 3. A class association rule R has the form itemset → cj, where cj ∈ C is a class label.
Definition 4. The actual occurrence ActOcc(R) of rule R in D is the number of objects of D that match R's antecedent, i.e., ActOcc(R) = |{OID | OID ∈ D ∧ itemset ⊆ attr(OID)}|.
Definition 5. The support of rule R, denoted by Supp(R), is the number of objects of D that match R's antecedent and are labeled with R's class. Supp(R) is defined as:

Supp(R) = |{OID | OID ∈ D ∧ itemset ⊆ attr(OID) ∧ cj = class(OID)}|
Definition 6. The confidence of rule R, denoted by Conf(R), is defined as:

Conf(R) = Supp(R) / ActOcc(R)
A sample dataset is shown in Table 1. It contains three objects, three attributes (A, B, and C), and two classes (1 and 2). Consider rule R: ⟨(A, a1)⟩ → 1. We have ActOcc(R) = 2 and Supp(R) = 1 since there are two objects with A = a1, of which one object (object 1) also carries class 1. We also have Conf(R) = Supp(R) / ActOcc(R) = 1/2.
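To make the definitions concrete, here is a minimal C# sketch (illustrative code with hypothetical names, not taken from the paper) that computes ActOcc, Supp, and Conf for rule R: ⟨(A, a1)⟩ → 1 over the dataset of Table 1.

```csharp
using System;
using System.Linq;

class RuleMetricsDemo
{
    static void Main()
    {
        // The three objects of Table 1: attribute values plus a class label.
        var dataset = new[]
        {
            new { A = "a1", B = "b1", C = "c1", Class = 1 },
            new { A = "a1", B = "b1", C = "c1", Class = 2 },
            new { A = "a2", B = "b1", C = "c1", Class = 2 },
        };

        // Rule R: <(A, a1)> -> 1
        var matching = dataset.Where(o => o.A == "a1").ToList();
        int actOcc = matching.Count;                  // objects matching the antecedent
        int supp = matching.Count(o => o.Class == 1); // ...that also carry class 1
        double conf = (double)supp / actOcc;          // Conf(R) = Supp(R) / ActOcc(R)

        // Prints: ActOcc = 2, Supp = 1, Conf = 0.5
        Console.WriteLine("ActOcc = {0}, Supp = {1}, Conf = {2}", actOcc, supp, conf);
    }
}
```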
2.2. Multi-core processor architecture
A multi-core processor (shown in Fig. 1) is a single computing component with two or more independent central processing units (cores) in the same physical package (Andrew, 2008). Processors were originally designed with only one core. However, multi-core processors became mainstream when Intel and AMD introduced their commercial multi-core chips in 2008 (Casali & Ernst, 2013). A multi-core processor computer has different specifications from either a computer cluster (Fig. 2) or an SMP (Symmetric Multi-Processor) system (Fig. 3): the memory is not distributed like in a cluster but rather is shared, which is similar to the SMP architecture. Many SMP systems, however, have the NUMA (Non-Uniform Memory Access) architecture: there are several memory blocks which are accessed at different speeds from each processor, depending on the distance between the memory block and the processor. On the contrary, multi-core processors are usually based on the UMA (Uniform Memory Access) architecture: there is only one memory block, so all cores have equal access time to the memory (Laurent, Négrevergne, Sicard, & Termier, 2012).

Table 1. Example of a dataset (reconstructed from the examples in the text).

OID | A  | B  | C  | Class
1   | a1 | b1 | c1 | 1
2   | a1 | b1 | c1 | 2
3   | a2 | b1 | c1 | 2
2.3. Parallel mining on the multi-core processor architecture
Obviously, the multi-core processor architecture has many desirable properties; for example, each core has direct and equal access to all the system's memory, and the multi-core chip also allows higher performance at lower energy and cost. Therefore, numerous researchers have developed parallel algorithms on the multi-core processor architecture in the data mining literature. One of the first algorithms targeting multi-core processor computers was FP-array, proposed by Liu and his colleagues in 2007 (Liu, Li, Zhang, & Tang, 2007). The authors proposed two techniques, namely a cache-conscious FP-array and a lock-free dataset tiling parallelism mechanism, for parallel discovery of frequent itemsets on multi-core processor machines. Yu and Wu (2011) proposed an efficient load balancing strategy in order to reduce massively duplicated generated candidates. Their main contribution was to enhance the task of candidate generation in the Apriori algorithm on multi-core processor computers. Schlegel, Karnagel, Kiefer, and Lehner (2013) recently adapted the well-known Eclat algorithm to a highly parallel version which runs on a multi-core processor system. They proposed three parallel approaches for Eclat: independent class, shared class, and shared itemset. Parallel mining has also been widely adopted in many other research fields, such as closed frequent itemset mining (Negrevergne, Termier, Méhaut, & Uno, 2010), gradual pattern mining (Laurent et al., 2012), correlated pattern mining (Casali & Ernst, 2013), generic pattern mining (Negrevergne, Termier, Rousset, & Méhaut, 2013), and tree-structured data mining (Tatikonda & Parthasarathy, 2009).

While much research has been devoted to developing parallel pattern mining and association rule mining algorithms relying on the multi-core processor architecture, no studies have been published regarding the parallel class association rule mining problem. Thus, this paper proposes the first algorithm for parallel mining CARs which can be executed efficiently on the multi-core processor architecture.
3. Related work

This section begins with an overview of some sequential versions of the CAR mining algorithm and then provides details about two parallel versions of it.
3.1. Sequential CAR mining algorithms

The first algorithm for mining CARs was proposed by Liu et al. (1998) based on the Apriori algorithm (Agrawal & Srikant, 1994). After its introduction, several other algorithms adopted its approach, including CAAR (Xu, Han, & Min, 2004) and PCAR (Chen, Hsu, & Hsu, 2012). However, these methods are time-consuming because they generate a lot of candidates and scan the dataset several times. Another approach for mining CARs is to build the frequent pattern tree (FP-tree) (Han, Pei, & Yin, 2000) to discover rules, which was presented in some algorithms such as CMAR (Li et al., 2001) and L3 (Baralis, Chiusano, & Garza, 2004).
[Fig. 1. Multi-core processor: one chip, two cores, two threads. (Source: http://software.intel.com/en-us/articles/multi-core-processor-architecture-explained)]
The mining process used by the FP-tree does not generate candidate rules. However, its significant weakness lies in the fact that the FP-tree does not always fit in the main memory. Several algorithms, MMAC (Thabtah, Cowling, & Peng, 2004), MCAR (Thabtah et al., 2005), and MCAR (Zhao et al., 2009), utilized the vertical layout of the dataset to improve the efficiency of the rule discovery phase by employing a method that extends the tidset intersection method mentioned in (Zaki, Parthasarathy, Ogihara, & Li, 1997). Vo and Le proposed another method for mining CARs by using an equivalence class rule tree (ECR-tree) (Vo & Le, 2009). An efficient algorithm, called ECR-CARM, was also proposed in their paper. The two strong features demonstrated by ECR-CARM are that it scans the dataset only once and uses the intersection of object identifiers to determine the support of itemsets quickly. However, it needs to generate and test a huge number of candidates because each node in the tree contains all values of a set of attributes. Nguyen et al. (2013) modified the ECR-tree structure to speed up the mining process. In their enhanced tree, named MECR-tree, each node contains only one value instead of the whole group. They also provided theorems to identify the support of child nodes and prune unnecessary nodes quickly. Based on the MECR-tree and these theorems, they presented the CAR-Miner algorithm for effectively mining CARs.

It can be seen that many sequential algorithms for CAR mining have been developed but very few parallel versions have been proposed. The next section reviews two parallel algorithms for CAR mining which have been mentioned in the associative classification literature.
3.2. Parallel CAR mining algorithms
One of the primary weaknesses of sequential versions of CAR mining is that they are unable to provide scalability in terms of data dimension, size, or runtime performance for large datasets. Consequently, some researchers have recently tried to apply parallelism to current sequential CAR mining algorithms to release the sequential bottleneck and improve the response time. Thakur and Ramesh (2008) proposed a parallel version of the CBA algorithm (Liu et al., 1998). Their proposed algorithm was implemented on a distributed memory system and based on data parallelism. The parallel CAR mining phase is an adaptation of the CD approach which was originally proposed for parallel mining of frequent itemsets (Agrawal & Shafer, 1996). The training dataset was partitioned into P parts which were computed on P processors. Each processor worked on its local data to mine CARs with the same global minimum support and minimum confidence. However, this algorithm has three big weaknesses, as follows. First, it uses a static load balance which partitions work among processors by using a heuristic cost function; this causes a high load imbalance. Second, a high synchronization cost occurs at the end of each step. Finally, each site must keep a duplicate of the entire set of candidates. Additionally, the authors did not provide any experiments to illustrate the performance of the proposed algorithm.

Mokeddem and Belbachir (2010) proposed a distributed version of FP-Growth (Han et al., 2000) to discover CARs. Their proposed algorithm was also employed on a distributed memory system and based on data parallelism. Data were partitioned into P parts which were computed on P processors for parallel discovery of the subsets of classification rules. Inter-communication was established to make global decisions. Consequently, their approach faces the big problem of high synchronization among nodes. In addition, the authors did not conduct any experiments to compare their proposed algorithm with others.

The two existing parallel algorithms for mining CARs which were employed on distributed memory systems have two significant problems: high synchronization among nodes and data replication. In this paper, a parallel CAR mining algorithm based on the multi-core processor architecture is thus proposed to solve those problems.
4. A sequential class association rule mining algorithm
In this section, we briefly summarize our previous sequential CAR mining algorithm as it forms the basic framework of our proposed parallel algorithm.

In (Nguyen & Vo, 2014), we proposed a tree structure to mine CARs quickly and directly. Each node in the tree contains one itemset along with:
(1) (Obidset1, Obidset2, ..., Obidsetk) – a list of Obidsets in which each Obidseti is a set of identifiers of objects that contain both the itemset and class ci. Note that k is the number of classes in the dataset.
(2) pos – a positive integer storing the position of the class with pos = argmax_{i∈[1,k]} {|Obidseti|}.
(3) total – a positive integer which stores the sum of the cardinalities of all Obidseti, i.e., total = Σ_{i=1}^{k} |Obidseti|.
However, for ease of programming, the itemset is converted to the form att × values, where:
(1) att – a positive integer representing a list of attributes.
(2) values – a list of values, each of which is contained in one attribute in att.
A C# sketch of this node structure is given below.
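The following is a minimal C# sketch of such a tree node (hypothetical member names; the paper does not publish its implementation in this form). pos, total, support, and confidence are derived from the Obidsets exactly as defined above.

```csharp
using System.Collections.Generic;
using System.Linq;

class TreeNode
{
    public int Att;                      // bitmask encoding the attribute set (see below)
    public List<string> Values;          // one value per attribute present in Att
    public List<HashSet<int>> Obidsets;  // Obidset_i for each class c_i, i = 1..k

    // pos = argmax_i |Obidset_i|: index of the class with the largest Obidset
    public int Pos
    {
        get
        {
            int best = 0;
            for (int i = 1; i < Obidsets.Count; i++)
                if (Obidsets[i].Count > Obidsets[best].Count) best = i;
            return best;
        }
    }

    // total = sum of the cardinalities of all Obidset_i
    public int Total { get { return Obidsets.Sum(s => s.Count); } }

    // Supp = |Obidset_pos| and Conf = |Obidset_pos| / total, as used in Section 4
    public int Supp { get { return Obidsets[Pos].Count; } }
    public double Conf { get { return (double)Supp / Total; } }
}
```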
[Fig. 3. Symmetric multi-processor system. (Source: http://en.wikipedia.org/wiki/Symmetric_multiprocessing)]
For example, itemset X = {⟨(B, b1)⟩, ⟨(C, c1)⟩} is denoted as X = 6 × b1c1. A bit representation is used for storing itemset attributes to save memory. Attributes BC can be represented as 110 in bit representation, so the value of these attributes is 6. Bitwise operations are then used to quickly join itemsets.
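A small sketch of this encoding (assuming the attribute order A, B, C maps to bits 0, 1, 2):

```csharp
class BitEncodingDemo
{
    static void Main()
    {
        int attA = 1 << 0;          // 001 = 1 (attribute A)
        int attB = 1 << 1;          // 010 = 2 (attribute B)
        int attC = 1 << 2;          // 100 = 4 (attribute C)

        int attBC = attB | attC;    // 110 = 6: the att value of itemset {B, C}
        int attABC = attA | attBC;  // 111 = 7: joining A with BC via bitwise OR

        // The combination test of the pseudo code: two nodes are combined only
        // if their attribute masks differ (e.g. a1 and a2 share att = 1, so skip).
        bool canCombine = (attA != attBC);

        System.Console.WriteLine("{0} {1} {2}", attBC, attABC, canCombine); // 6 7 True
    }
}
```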
In Table 1, itemset X = {⟨(B, b1)⟩, ⟨(C, c1)⟩} is contained in objects 1, 2, and 3. Thus, the node which contains itemset X has the form 6 × b1c1(1, 23), in which Obidset1 = {1} (or Obidset1 = 1 for short), i.e., object 1 contains both itemset X and class 1; Obidset2 = {2, 3} (or Obidset2 = 23 for short), i.e., objects 2 and 3 contain both itemset X and class 2; pos = 2 (denoted by a line under Obidset2, i.e., 23); and total = 3. pos is 2 because the cardinality of Obidset2 for class 2 is maximal (2 versus 1).

Obtaining the support and confidence of a rule then reduces to computing |Obidset_pos| and |Obidset_pos| / total, respectively. For example, node 6 × b1c1(1, 23) generates rule {⟨(B, b1)⟩, ⟨(C, c1)⟩} → 2 (i.e., if B = b1 and C = c1, then Class = 2) with Supp = |Obidset2| = |23| = 2 and Conf = 2/3.
Based on the tree structure, we also proposed a sequential algorithm for mining CARs, called Sequential-CAR-Mining. Firstly, we find all frequent 1-itemsets and add them to the root node of the tree (Line 1). Secondly, we recursively discover other frequent k-itemsets based on the Depth-First Search strategy (procedure Sequential-CAR-Mining). Thirdly, while traversing nodes in the tree, we also generate rules which satisfy the minimum confidence threshold (procedure Generate-Rule). The pseudo code of the algorithm is shown in Fig. 4.
Fig. 5 shows the tree structure generated by the sequential CAR mining algorithm for the dataset shown in Table 1. For details on the tree generation, please refer to the study by Nguyen and Vo (2014).
5. The proposed parallel class association rule mining algorithm

Although Sequential-CAR-Mining is an efficient algorithm for mining all CARs, its runtime performance degrades significantly on large datasets due to the computational complexity. As a result, we have tried to apply parallel computing techniques to the sequential algorithm to speed up the mining process.
Input: Dataset D, minSup and minConf
Output: All CARs satisfying minSup and minConf
Procedure:
1. Let Lr be the root node of the tree; Lr includes a set of nodes in which each node contains a frequent 1-itemset
Sequential-CAR-Mining(Lr, minSup, minConf)
2.  CARs = ∅;
3.  for all lx ∈ Lr.children do
4.    Generate-Rule(lx, minConf);
5.    Pi = ∅;
6.    for all ly ∈ Lr.children, with y > x do
7.      if ly.att ≠ lx.att then // two nodes are combined only if their attributes are different
8.        O.att = lx.att | ly.att; // using a bitwise operation
9.        O.values = lx.values ∪ ly.values;
10.       O.Obidseti = lx.Obidseti ∩ ly.Obidseti; // ∀i ∈ [1, k]
11.       O.pos = argmax_{i∈[1,k]} {|O.Obidseti|};
12.       O.total = Σ_{i=1}^{k} |O.Obidseti|;
13.       if |O.Obidset_pos| ≥ minSup then // node O satisfies minSup
14.         Pi = Pi ∪ {O};
15.   Sequential-CAR-Mining(Pi, minSup, minConf);
Generate-Rule(l, minConf)
16. conf = |l.Obidset_pos| / l.total;
17. if conf ≥ minConf then
18.   CARs = CARs ∪ {l.itemset → c_pos (|l.Obidset_pos|, conf)};

Fig. 4. The Sequential-CAR-Mining algorithm (partially reconstructed; missing lines inferred from the PMCAR variants in Figs. 6 and 10).
[Fig. 5. Tree generated by Sequential-CAR-Mining for the dataset in Table 1: root {}; level 1: 1×a1(1, 2), 1×a2(∅, 3), 2×b1(1, 23), 4×c1(1, 23); level 2: 3×a1b1(1, 2), 5×a1c1(1, 2), 3×a2b1(∅, 3), 5×a2c1(∅, 3), 6×b1c1(1, 23); level 3: 7×a1b1c1(1, 2), 7×a2b1c1(∅, 3).]
Input: Dataset D, minSup and minConf
Output: All CARs satisfying minSup and minConf
Procedure:
1. Let Lr be the root node of the tree; Lr includes a set of nodes in which each node contains a frequent 1-itemset
PMCAR(Lr, minSup, minConf)
2.  totalCARs = CARs = ∅;
3.  for all lx ∈ Lr.children do
4.    Generate-Rule(CARs, lx, minConf);
5.    Pi = ∅;
6.    for all ly ∈ Lr.children, with y > x do
7.      if ly.att ≠ lx.att then // two nodes are combined only if their attributes are different
8.        O.att = lx.att | ly.att; // using a bitwise operation
9.        O.values = lx.values ∪ ly.values;
10.       O.Obidseti = lx.Obidseti ∩ ly.Obidseti; // ∀i ∈ [1, k]
11.       O.pos = argmax_{i∈[1,k]} {|O.Obidseti|};
12.       O.total = Σ_{i=1}^{k} |O.Obidseti|;
13.       if |O.Obidset_pos| ≥ minSup then // node O satisfies minSup
14.         Pi = Pi ∪ {O};
15.   Task ti = new Task(() => { Sub-PMCAR(tCARs, Pi, minSup, minConf); });
16. for each task in the list of created tasks do
17.   collect the set of rules (tCARs) returned by each task;
18.   totalCARs = totalCARs ∪ tCARs;
19. totalCARs = totalCARs ∪ CARs;
Sub-PMCAR(tCARs, Lr, minSup, minConf)
20. for all lx ∈ Lr.children do
21.   Generate-Rule(tCARs, lx, minConf);
22.   Pi = ∅;
23.   for all ly ∈ Lr.children, with y > x do
24.     if ly.att ≠ lx.att then // two nodes are combined only if their attributes are different
25.       O.att = lx.att | ly.att; // using a bitwise operation
26.       O.values = lx.values ∪ ly.values;
27.       O.Obidseti = lx.Obidseti ∩ ly.Obidseti; // ∀i ∈ [1, k]
28.       O.pos = argmax_{i∈[1,k]} {|O.Obidseti|};
29.       O.total = Σ_{i=1}^{k} |O.Obidseti|;
30.       if |O.Obidset_pos| ≥ minSup then // node O satisfies minSup
31.         Pi = Pi ∪ {O};
32.   Sub-PMCAR(tCARs, Pi, minSup, minConf);

Fig. 6. PMCAR with independent branch strategy.
Schlegel et al. (2013) recently adapted the well-known Eclat algorithm to a highly parallel version which runs on a multi-core processor system. They proposed three parallel approaches for Eclat: independent class, shared class, and shared itemset. In the "independent class" strategy, each equivalence class is distributed to a single thread which mines its assigned class independently from other threads. This approach has an important advantage in that the synchronization cost is low. It, however, consumes much more memory than the sequential counterpart because all threads hold their entire tidsets at the same time. Additionally, this strategy often causes high load imbalances when a large number of threads are used: threads mining light classes often finish sooner than threads mining heavier classes. In the "shared class" strategy, a single class is assigned to multiple threads. This can reduce the memory consumption but increases the cost of synchronization since one thread has to communicate with others to obtain their tidsets. In the final strategy, "shared itemset", multiple threads concurrently perform the intersection of two tidsets for a new itemset. In this strategy, threads have to synchronize with each other at a high cost.
Basically, the proposed algorithm, Parallel Mining Class Association Rules (PMCAR), is a combination of Sequential-CAR-Mining and the parallel ideas mentioned in (Schlegel et al., 2013). It has the same core steps as Sequential-CAR-Mining: it scans the dataset once to obtain all frequent 1-itemsets along with their Obidsets, and then starts recursive mining. It also adopts the two parallel strategies "independent class" and "shared class". However, PMCAR has some differences, as follows. PMCAR is a parallel algorithm for mining class association rules while the work done by Schlegel et al. focuses on mining frequent itemsets only. Additionally, we also propose a third parallel strategy, shared Obidset, for PMCAR. PMCAR is employed on a single system with a multi-core processor where the main memory can be shared by and equally accessed from all cores. Hence, PMCAR does not require synchronization among computing nodes like other parallel CAR mining algorithms employed on distributed memory systems.

The main differences between PMCAR and Sequential-CAR-Mining in terms of parallel CAR mining strategies are discussed in the following sections.
5.1. Independent branch strategy
The first strategy, independent branch, distributes each branch of the tree to a single task, which mines the assigned branch independently from all other tasks to generate CARs. Generally speaking, this strategy is similar to the "independent class" strategy mentioned in (Schlegel et al., 2013) except that PMCAR uses a different tree structure for the purpose of CAR mining and is implemented by using tasks instead of threads. As mentioned above, this strategy has some limitations, such as high load imbalances and high memory consumption. However, its primary advantage is that each task is executed independently from other tasks without any synchronization. In our implementation, the algorithm is employed based on the parallelism model in .NET Framework 4.0. Instead of using threads, our algorithm uses tasks, which have several advantages over threads. First, a task consumes less memory than a thread. Second, while a single thread runs on a single core, tasks are designed to be aware of the multi-core processor and multiple tasks can be executed on a single core. Finally, using threads takes much time because operating systems must allocate thread data structures, initialize and destroy them, and also perform context switches between threads. Consequently, our implementation addresses two problems: high memory consumption and high load imbalance.
The pseudo code of PMCAR with independent branch strategy is shown in Fig. 6.
We apply the algorithm to the sample dataset shown in Table 1 to illustrate its basic ideas. First, PMCAR finds all frequent 1-itemsets as done in Sequential-CAR-Mining (Line 1). After this step, we have Lr = {1×a1(1, 2), 1×a2(∅, 3), 2×b1(1, 23), 4×c1(1, 23)}. Second, PMCAR calls procedure PMCAR to generate frequent 2-itemsets (Lines 3–14). For example, consider node 1×a1(1, 2). This node combines with the two nodes 2×b1(1, 23) and 4×c1(1, 23) to generate two new nodes 3×a1b1(1, 2) and 5×a1c1(1, 2). Note that node 1×a1(1, 2) does not combine with node 1×a2(∅, 3) since they have the same attribute (attribute A), which would cause the support of the new node to be zero according to Theorem 1 mentioned in (Nguyen & Vo, 2014). After these steps, we have Pi = {3×a1b1(1, 2), 5×a1c1(1, 2)}. Then, PMCAR creates a new task ti and calls procedure Sub-PMCAR inside that task with the four parameters tCARs, Pi, minSup, and minConf. The first parameter tCARs is used to store the set of rules returned by Sub-PMCAR in a task (Line 15). For instance, task t1 is created and procedure Sub-PMCAR is executed inside t1. Procedure Sub-PMCAR is recursively called inside a task to mine all CARs (Lines 20–32). For example, task t1 also generates node 7×a1b1c1(1, 2) and its rule. Finally, after all created tasks have completely mined all assigned branches, their results are collected to form the complete set of rules (Lines 16–19). In Fig. 7, three tasks t1, t2, and t3, represented by solid blocks, mine the three branches a1, a2, and b1 independently in parallel.
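A minimal C# sketch of this task-per-branch scheme follows (hypothetical names: Rule is a placeholder rule type, Mine stands in for the recursive Sub-PMCAR, and TreeNode is the node sketch from Section 4). Each first-level branch gets its own Task, and the parent waits only once, at the end, to merge the per-task rule sets.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

class Rule { /* itemset -> class label, with support and confidence */ }

static class IndependentBranch
{
    // Placeholder for the recursive Sub-PMCAR procedure (not shown here).
    static List<Rule> Mine(TreeNode branch, int minSup, double minConf)
    {
        return new List<Rule>();
    }

    public static List<Rule> MineAll(List<TreeNode> branches, int minSup, double minConf)
    {
        var tasks = new List<Task<List<Rule>>>();
        foreach (var branch in branches)
        {
            var b = branch; // per-iteration copy so each closure captures its own branch
            tasks.Add(Task.Factory.StartNew(() => Mine(b, minSup, minConf)));
        }

        // Tasks run independently; the only synchronization point is this final wait.
        Task.WaitAll(tasks.ToArray());

        var totalCARs = new List<Rule>();
        foreach (var t in tasks)
            totalCARs.AddRange(t.Result); // merge the per-branch rule sets (Lines 16-19)
        return totalCARs;
    }
}
```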
5.2. Shared branch strategy

The second strategy, shared branch, adopts the same ideas as the "shared class" strategy mentioned in Schlegel et al. (2013). In this strategy, each branch is mined in parallel by multiple tasks. The pseudo code of PMCAR with shared branch strategy is shown in Fig. 8. First, the algorithm initializes the root node Lr (Line 1). Then, the procedure PMCAR is recursively called to generate CARs. When node lx combines with node ly, the algorithm creates a new task ti and performs the combination code inside that task (Lines 7–17). Note that because multiple tasks concurrently mine the same branch, synchronization happens to collect the necessary information for the new node (Line 18). Additionally, to avoid a data race (i.e., two or more tasks performing operations that update a shared piece of data) (Netzer & Miller, 1989), we use a lock object to coordinate the tasks' access to the shared data Pi (Lines 15 and 16).
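The lock-protected update can be sketched in C# as follows (hypothetical names; Combine stands for the combination code of Lines 8–14, returning null when node O fails minSup, and TreeNode is the node sketch from Section 4). One Task is created per candidate pair, and the lock object serializes the insertions into the shared list Pi.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

static class SharedBranch
{
    static readonly object PiLock = new object();

    // Placeholder for Lines 8-14: join lx and ly, return null if O fails minSup.
    static TreeNode Combine(TreeNode lx, TreeNode ly, int minSup) { return null; }

    public static List<TreeNode> CombineChildren(List<TreeNode> children, int minSup)
    {
        var pi = new List<TreeNode>(); // shared among tasks, hence the lock below
        var tasks = new List<Task>();

        for (int x = 0; x < children.Count; x++)
        {
            for (int y = x + 1; y < children.Count; y++)
            {
                TreeNode lx = children[x], ly = children[y];
                tasks.Add(Task.Factory.StartNew(() =>
                {
                    TreeNode o = Combine(lx, ly, minSup);
                    if (o != null)
                        lock (PiLock) { pi.Add(o); } // Lines 15-16: avoid a data race on Pi
                }));
            }
        }

        Task.WaitAll(tasks.ToArray()); // Line 18: synchronize before recursing on pi
        return pi;
    }
}
```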
We also apply the algorithm to the dataset in Table 1 to demonstrate how it works. As an example, consider node 1×a1(1, 2). The algorithm creates task t1 to combine node 1×a1(1, 2) with node 2×b1(1, 23) to generate node 3×a1b1(1, 2); in parallel, it creates task t2 to combine node 1×a1(1, 2) with node 4×c1(1, 23) to generate node 5×a1c1(1, 2). However, before the algorithm continues by creating task t3 to generate node 7×a1b1c1(1, 2), it has to wait until tasks t1 and t2 finish their work. Therefore, this strategy is slower than the first one in execution time. In Fig. 9, three tasks t1, t2, and t3 mine the same branch a1 in parallel.

5.3. Shared Obidset strategy
The third strategy, shared Obidset, is different from the "shared itemset" strategy discussed in Schlegel et al. (2013). Each task is assigned a different branch, and its child tasks together process a node in the branch.
[Fig. 7. Illustration of the independent branch strategy: the tree of Fig. 5 with tasks t1, t2, and t3 (solid blocks) mining branches a1, a2, and b1 independently.]
The pseudo code of PMCAR with shared Obidset strategy is shown in Fig. 10. The algorithm first finds all frequent 1-itemsets and adds them to the root node (Line 1). It then calls procedure PMCAR to generate frequent 2-itemsets (Lines 2–14). For each branch of the tree, it creates a task and calls procedure Sub-PMCAR inside that task (Line 15). Sub-PMCAR is recursively called to generate frequent k-itemsets (k > 2) and their rules (Lines 20–34). The procedures PMCAR and Sub-PMCAR look like those in PMCAR with independent branch strategy. However, this algorithm provides a more complicated parallel strategy: in Sub-PMCAR, the algorithm creates a list of child tasks to intersect the Obidsets of two nodes in parallel (Lines 27–28). This allows the work distribution to be the most fine-grained. Nevertheless, all child tasks have to finish their work before the two properties pos and total can be calculated for the new node (Lines 29–31). Consequently, there is a high cost of synchronization among child tasks and between child tasks and their parent task.

Let us illustrate the basic ideas of the shared Obidset strategy using Fig. 11. Branch a1 is assigned to task t1. In procedure Sub-PMCAR, tasks t2 and t3, which are child tasks of t1, together process node 3×a1b1(1, 2): tasks t2 and t3 intersect, in parallel, the Obidset1 and Obidset2 of the two nodes 3×a1b1(1, 2) and 5×a1c1(1, 2), respectively. However, task t2 must wait until task t3 finishes the intersection of the two Obidset2 to obtain Obidset1 and Obidset2 of the new node 7×a1b1c1(1, 2). Additionally, parent task t1, represented by the solid block, must wait until t2, t3, and all other child tasks finish their work.
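The child-task intersection of Lines 27–29 can be sketched in C# as follows (hypothetical names, reusing the TreeNode sketch from Section 4): one child Task per class intersects the corresponding pair of Obidsets, and the parent waits for all k children before computing pos and total.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class SharedObidset
{
    public static List<HashSet<int>> IntersectObidsets(TreeNode lx, TreeNode ly)
    {
        int k = lx.Obidsets.Count;         // k = number of classes
        var result = new HashSet<int>[k];
        var children = new Task[k];

        for (int i = 0; i < k; i++)
        {
            int idx = i;                   // per-iteration copy for the closure
            children[idx] = Task.Factory.StartNew(() =>
            {
                // Obidset_idx(O) = Obidset_idx(lx) intersected with Obidset_idx(ly)
                var common = new HashSet<int>(lx.Obidsets[idx]);
                common.IntersectWith(ly.Obidsets[idx]);
                result[idx] = common;
            });
        }

        // Line 29: pos and total of the new node need all k intersections,
        // so the parent task must wait for every child here.
        Task.WaitAll(children);
        return result.ToList();
    }
}
```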
6. Time complexity analysis
In this section, we analyze the time complexities of both the sequential and the proposed parallel CAR mining algorithms. We then derive the speedup of the parallel algorithm. We also compare the time complexity of our parallel algorithm with those of existing parallel algorithms.
[Fig. 9. Illustration of the shared branch strategy: the tree of Fig. 5 with tasks t1, t2, and t3 mining the same branch a1 in parallel.]
Input: Dataset D, minSup and minConf
Output: All CARs satisfying minSup and minConf
Procedure:
1. Let Lr be the root node of the tree; Lr includes a set of nodes in which each node contains a frequent 1-itemset
PMCAR(Lr, minSup, minConf)
2.  CARs = ∅;
3.  for all lx ∈ Lr.children do
4.    Generate-Rule(CARs, lx, minConf);
5.    Pi = ∅;
6.    for all ly ∈ Lr.children, with y > x do
7.      Task ti = new Task(() => {
8.        if ly.att ≠ lx.att then
9.          O.att = lx.att | ly.att; // using a bitwise operation
10.         O.values = lx.values ∪ ly.values;
11.         O.Obidseti = lx.Obidseti ∩ ly.Obidseti; // ∀i ∈ [1, k]
12.         O.pos = argmax_{i∈[1,k]} {|O.Obidseti|};
13.         O.total = Σ_{i=1}^{k} |O.Obidseti|;
14.         if |O.Obidset_pos| ≥ minSup then // node O satisfies minSup
15.           lock
16.             Pi = Pi ∪ {O};
17.      });
18.   Task.WaitAll(the list of created tasks);
19.   PMCAR(Pi, minSup, minConf);

Fig. 8. PMCAR with shared branch strategy (partially reconstructed; missing lines inferred from the surrounding description and Figs. 6 and 10).
We can see that the sequential CAR mining algorithm described in Section 4 scans the dataset once and uses a main loop to mine all CARs. Based on the cost model in Skillicorn (1999), the time complexity of this algorithm is:
Input: Dataset D, minSup and minConf
Output: All CARs satisfying minSup and minConf
Procedure:
1. Let Lr be the root node of the tree; Lr includes a set of nodes in which each node contains a frequent 1-itemset
PMCAR(Lr, minSup, minConf)
2.  totalCARs = CARs = ∅;
3.  for all lx ∈ Lr.children do
4.    Generate-Rule(CARs, lx, minConf);
5.    Pi = ∅;
6.    for all ly ∈ Lr.children, with y > x do
7.      if ly.att ≠ lx.att then // two nodes are combined only if their attributes are different
8.        O.att = lx.att | ly.att; // using a bitwise operation
9.        O.values = lx.values ∪ ly.values;
10.       O.Obidseti = lx.Obidseti ∩ ly.Obidseti; // ∀i ∈ [1, k]
11.       O.pos = argmax_{i∈[1,k]} {|O.Obidseti|};
12.       O.total = Σ_{i=1}^{k} |O.Obidseti|;
13.       if |O.Obidset_pos| ≥ minSup then // node O satisfies minSup
14.         Pi = Pi ∪ {O};
15.   Task ti = new Task(() => { Sub-PMCAR(tCARs, Pi, minSup, minConf); });
16. for each task in the list of created tasks do
17.   collect the set of rules (tCARs) returned by each task;
18.   totalCARs = totalCARs ∪ tCARs;
19. totalCARs = totalCARs ∪ CARs;
Sub-PMCAR(tCARs, Lr, minSup, minConf)
20. for all lx ∈ Lr.children do
21.   Generate-Rule(tCARs, lx, minConf);
22.   Pi = ∅;
23.   for all ly ∈ Lr.children, with y > x do
24.     if ly.att ≠ lx.att then // two nodes are combined only if their attributes are different
25.       O.att = lx.att | ly.att; // using a bitwise operation
26.       O.values = lx.values ∪ ly.values;
27.       for i = 1 to k do // k is the number of classes
28.         Task childi = new Task(() => { O.Obidseti = lx.Obidseti ∩ ly.Obidseti; });
29.       Task.WaitAll(childi);
30.       O.pos = argmax_{i∈[1,k]} {|O.Obidseti|};
31.       O.total = Σ_{i=1}^{k} |O.Obidseti|;
32.       if |O.Obidset_pos| ≥ minSup then // node O satisfies minSup
33.         Pi = Pi ∪ {O};
34.   Sub-PMCAR(tCARs, Pi, minSup, minConf);

Fig. 10. PMCAR with shared Obidset strategy.
T_S = k_S · m + a

where T_S is the execution time of the sequential CAR mining algorithm, k_S is the number of iterations in the main loop, m is the execution time of generating nodes and rules in each iteration, and a is the execution time of accessing the dataset.
The proposed parallel algorithm distributes node and rule generation to multiple tasks executed on multiple cores. Thus, the execution time of generating nodes and rules in each iteration is m / (t · c), where t is the number of tasks and c is the number of cores. The time complexity of the parallel algorithm is:

T_P = k_P · m / (t · c) + a

where T_P is the execution time of the proposed parallel CAR mining algorithm and k_P is the number of iterations in the main loop.
The speedup is thus:

Sp = T_S / T_P = (k_S · m + a) / (k_P · m / (t · c) + a)
In our experiments, the execution time of the sequential code (for example, the code to scan the dataset) is very small. In addition, the number of iterations in the main loop in both the sequential and parallel algorithms is similar. Therefore, the speedup equation can be simplified as follows:

Sp = (k_S · m + a) / (k_P · m / (t · c) + a) ≈ (k_S · m) / (k_P · m / (t · c)) ≈ m / (m / (t · c)) = t · c

Thus, we can achieve up to a t · c speedup over the sequential algorithm.
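As a purely illustrative instance (the numbers here are assumed, not taken from the experiments): with k_S ≈ k_P, a negligible a, c = 4 cores, and t = 2 tasks per core, the bound evaluates to

Sp ≈ t · c = 2 · 4 = 8

so the model predicts at most an 8× speedup; in practice, scheduling overhead and memory contention keep the measured speedup below this bound (cf. the 2.1× reported in Section 1).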
Now we analyze the time complexity of the parallel CBA algorithm proposed in Thakur and Ramesh (2008). Since this algorithm is based on the Apriori algorithm, it must scan the dataset many times. Additionally, this algorithm was employed on a distributed memory system, which means that it needs additional computation time for communication and information exchange among nodes. Consequently, the time complexity of this algorithm is:

T_C = k_C · (m / p + a + d)

where T_C is the execution time of the parallel CBA algorithm, k_C is the number of iterations required by the parallel CBA algorithm, p is the number of processors, and d is the execution time for communication and data exchange among computing nodes.

Assume that k_P ≈ k_C and t · c ≈ p. We have:

T_C = (k_C · m / p + a) + (k_C − 1) · a + k_C · d ≈ T_P + (k_C − 1) · a + k_C · d

Obviously, T_P < T_C, which implies that our proposed algorithm is faster than the parallel version of CBA in theory.
Similarly, the time complexity of the parallel FP-Growth algorithm proposed in Mokeddem and Belbachir (2010) is as follows:

T_F = k_F · (m / p + d) + a

where T_F is the execution time of the parallel FP-Growth algorithm and k_F is the number of iterations required by the parallel FP-Growth algorithm.

The parallel FP-Growth scans the dataset once and then partitions it into P parts according to the number of processors. Each processor scans its local data partition to count the local support of each item. Therefore, the execution time of accessing the dataset in this algorithm is only a. However, computing nodes need to broadcast the local support of each item across the group so that each processor can calculate the global count. Thus, this algorithm also needs additional computation time d for data transfer. Assume that k_P ≈ k_F and t · c ≈ p. We have:

T_F = (k_F · m / p + a) + k_F · d ≈ T_P + k_F · d

It can be concluded that our proposed parallel algorithm is also faster than the parallel FP-Growth algorithm in theory, and T_P < T_F < T_C.
7. Experimental results

This section provides the results of our experiments, including the testing environment, the results of the scalability experiments of the three proposed parallel strategies, and the performance of the proposed parallel algorithm with variation in the number of objects and attributes. It finally compares the execution time of PMCAR with that of the recent sequential CAR mining algorithm CAR-Miner (Nguyen et al., 2013).
7.1. Testing environment

All experiments were conducted on a multi-core processor computer which has one Intel i7-2600 processor. The processor has 4 cores and an 8 MB L3-cache, runs at a core frequency of 3.4 GHz, and also supports Hyper-Threading. The computer has 4 GB of memory and runs Windows 7 Enterprise (64-bit) SP1. The algorithms were coded in C# using MS Visual Studio .NET 2010 Express. The parallel algorithm was implemented based on the parallelism model supported in Microsoft .NET Framework 4.0 (version 4.0.30319).
The experimental datasets were obtained from the University of California Irvine (UCI) Machine Learning Repository (http://mlearn.ics.uci.edu) and the Frequent Itemset Mining (FIM) Dataset Repository (http://fimi.ua.ac.be/data/). The four datasets used in the experiments are Poker-hand, Chess, Connect-4, and Pumsb, with the characteristics shown in Table 2. The table shows the number of attributes (including the class attribute), the number of class labels, the number of distinctive values (i.e., the total number of distinct values in all attributes), and the number of objects (or records) in each dataset. The Chess, Connect-4, and Pumsb datasets are dense and have many attributes whereas the Poker-hand dataset is sparse and has few attributes.
7.2. Scalability experiments
We evaluated the scalability of PMCAR by running it on the computer that had been configured to utilize a different number
[Fig. 11. Illustration of the shared Obidset strategy: branch a1 is assigned to task t1 (solid block); child tasks t2 and t3 intersect in parallel the Obidsets of nodes 3×a1b1(1, 2) and 5×a1c1(1, 2).]
Table 2. Characteristics of the experimental datasets.

Dataset | # Attributes | # Classes | # Distinctive values | # Objects