Efficient strategies for parallel mining class association rules
Dang Nguyen a, Bay Vo b,*, Bac Le c
a University of Information Technology, Vietnam National University, Ho Chi Minh, Viet Nam
b Information Technology Department, Ton Duc Thang University, Ho Chi Minh, Viet Nam
c Department of Computer Science, University of Science, Vietnam National University, Ho Chi Minh, Viet Nam
Keywords:
Associative classification
Class association rule mining
Parallel computing
Data mining
Multi-core processor
Abstract
Mining class association rules (CARs) is an essential, but time-intensive task in Associative Classification (AC). A number of algorithms have been proposed to speed up the mining process. However, sequential algorithms are not efficient for mining CARs in large datasets while existing parallel algorithms require communication and collaboration among computing nodes which introduces the high cost of synchronization. This paper addresses these drawbacks by proposing three efficient approaches for mining CARs in large datasets relying on parallel computing. To date, this is the first study which tries to implement an algorithm for parallel mining CARs on a computer with the multi-core processor architecture. The proposed parallel algorithm is theoretically proven to be faster than existing parallel algorithms. The experimental results also show that our proposed parallel algorithm outperforms a recent sequential algorithm in mining time.
© 2014 Elsevier Ltd. All rights reserved.
1. Introduction
Classification is a common topic in machine learning, pattern recognition, statistics, and data mining. Therefore, numerous approaches based on different strategies have been proposed for building classification models. Among these strategies, Associative Classification (AC), which uses the associations between itemsets and class labels (called class association rules), has proven itself to be more accurate than traditional methods such as C4.5 (Quinlan, 1993) and ILA (Tolun & Abu-Soud, 1998; Tolun, Sever, Uludag, & Abu-Soud, 1999). The problem of classification based on class association rules is to find the complete set of CARs which satisfy the user-defined minimum support and minimum confidence thresholds from the training dataset. A subset of CARs is then selected to form the classifier. Since its first introduction in (Liu, Hsu, & Ma, 1998), numerous approaches have been proposed to solve this problem. Examples include the classification based on multiple association rules (Li, Han, & Pei, 2001), the classification model based on predictive association rules (Yin & Han, 2003), the classification based on the maximum entropy (Thabtah, Cowling, & Peng, 2005), the classification based on the information gain measure (Chen, Liu, Yu, Wei, & Zhang, 2006), the lazy-based approach for classification (Baralis, Chiusano, & Garza, 2008), the use of an equivalence class rule tree (Vo & Le, 2009), the classifier based on Galois connections between objects and rules (Liu, Liu, & Zhang, 2011), the lattice-based approach for classification (Nguyen, Vo, Hong, & Thanh, 2012), and the integration of taxonomy information into classifier construction (Cagliero & Garza, 2013).

However, most existing algorithms for associative classification have primarily concentrated on building an efficient and accurate classifier but have not considered carefully the runtime performance of discovering CARs in the first phase. In fact, finding all CARs is a challenging and time-consuming problem for two reasons. First, it may be hard to find all CARs in dense datasets since there are a huge number of generated rules; for example, in our experiments, some datasets can induce more than 4,000,000 rules. Second, the number of candidate rules to check is very large: assuming there are d items and k class labels in the dataset, there can be up to k × (2^d − 1) rules to consider. Very few studies, for instance (Nguyen, Vo, Hong, & Thanh, 2013; Nguyen et al., 2012; Vo & Le, 2009; Zhao, Cheng, & He, 2009), have discussed the execution time efficiency of the CAR mining process. Nevertheless, all these algorithms have been implemented with sequential strategies. Consequently, their runtime performance has not been satisfactory on large datasets, especially recently emerged dense datasets. Researchers have begun switching to parallel and distributed computing techniques to accelerate the computation. Two parallel algorithms for mining CARs were recently proposed on distributed memory systems (Mokeddem & Belbachir, 2010; Thakur & Ramesh, 2008).
http://dx.doi.org/10.1016/j.eswa.2014.01.038
0957-4174/© 2014 Elsevier Ltd. All rights reserved.
* Corresponding author. Tel.: +84 083974186.
E-mail addresses: nguyenphamhaidang@outlook.com (D. Nguyen), vdbay@it.tdt.edu.vn (B. Vo), lhbac@fit.hcmus.edu.vn (B. Le).
Along with the advent of computers with multi-core processors, more memory and processor computing power have been utilized so that larger datasets can be tackled in the main memory at lower cost in comparison with the usage of distributed or mainframe systems. Therefore, this present study aims to propose three efficient strategies for parallel mining CARs on multi-core processor computers. The proposed approaches overcome two disadvantages of existing methods for parallel mining CARs. They eliminate communication and collaboration among computing nodes, which introduces the overhead of synchronization. They also avoid data replication and do not require data transfer among processing units. As a result, the proposals significantly improve the response time compared to the sequential counterpart and existing parallel methods. The proposed parallel algorithm is theoretically proven to be more efficient than existing parallel algorithms. The experimental results also show that the proposed parallel algorithm can achieve up to a 2.1× speedup compared to a recent sequential CAR mining algorithm.
The rest of this paper is organized as follows. In Section 2, some preliminary concepts of the class association rule problem and the multi-core processor architecture are briefly given. The benefits of parallel mining on multi-core processor computers are also discussed in this section. Work related to sequential and parallel mining of class association rules is reviewed in Section 3. Our previous sequential CAR mining algorithm is summarized in Section 4 because it forms the basic framework of our proposed parallel algorithm. The primary contributions are presented in Section 5, in which three proposed strategies for efficiently mining classification rules under the high performance parallel computing context are described. The time complexity of the proposed algorithm is analyzed in Section 6. Section 7 presents the experimental results while conclusions and future work are discussed in Section 8.
2. Preliminary concepts
This section provides some preliminary concepts of the class association rule problem and the multi-core processor architecture. It also discusses the benefits of parallel mining on the multi-core processor architecture.
2.1. Class association rule
One of the main goals of data mining is to discover important relationships among items such that the presence of some items in a transaction is associated with the presence of some other items. To achieve this purpose, Agrawal and his colleagues proposed the Apriori algorithm to find association rules in a transactional dataset (Agrawal & Srikant, 1994). An association rule has the form X → Y where X and Y are frequent itemsets and X ∩ Y = ∅. The problem of mining association rules is to find all association rules in a dataset having support and confidence no less than the user-defined minimum support and minimum confidence thresholds.
A class association rule is a special case of an association rule in which only the class attribute is considered in the rule's right-hand side (consequent). Mining class association rules is to find the set of rules which satisfy the minimum support and minimum confidence thresholds specified by end-users. Let us define the CAR problem as follows.

Let D be a dataset with n attributes {A1, A2, ..., An} and |D| records (objects) where each record has an object identifier (OID). Let C = {c1, c2, ..., ck} be a list of class labels. A specific value of an attribute Ai and a class C are denoted by lower-case letters aim and c, respectively.
Definition 1. An item is described as an attribute and a specific value for that attribute, denoted by ⟨(Ai, aim)⟩, and an itemset is a set of items.
Definition 2. Let I = {⟨(A1, a11)⟩, ..., ⟨(A1, a1m1)⟩, ⟨(A2, a21)⟩, ..., ⟨(A2, a2m2)⟩, ..., ⟨(An, an1)⟩, ..., ⟨(An, anmn)⟩} be a finite set of items. Dataset D is a finite set of objects, D = {OID1, OID2, ..., OID|D|}, in which each object OIDx has the form OIDx = attr(OIDx) ∧ class(OIDx) (1 ≤ x ≤ |D|) with attr(OIDx) ⊆ I and class(OIDx) ∈ C. For example, OID1 for the dataset shown in Table 1 is {⟨(A, a1)⟩, ⟨(B, b1)⟩, ⟨(C, c1)⟩} ∧ {1}.
Definition 3. A class association rule R has the form itemset → cj, where cj ∈ C is a class label.
Definition 4. The actual occurrence ActOcc(R) of rule R in D is the number of objects of D that match R's antecedent, i.e., ActOcc(R) = |{OID | OID ∈ D ∧ itemset ⊆ attr(OID)}|.
Definition 5. The support of rule R, denoted by Supp(R), is the number of objects of D that match R's antecedent and are labeled with R's class. Supp(R) is defined as:

Supp(R) = |{OID | OID ∈ D ∧ itemset ⊆ attr(OID) ∧ cj = class(OID)}|
Definition 6. The confidence of rule R, denoted by Conf(R), is defined as:

Conf(R) = Supp(R) / ActOcc(R)
A sample dataset is shown in Table 1. It contains three objects, three attributes (A, B, and C), and two classes (1 and 2). Consider rule R: ⟨(A, a1)⟩ → 1. We have ActOcc(R) = 2 and Supp(R) = 1 since there are two objects with A = a1, of which one object (object 1) also carries class 1. We also have Conf(R) = Supp(R) / ActOcc(R) = 1/2.
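To make the definitions concrete, here is a minimal C# sketch (illustrative code with hypothetical names, not taken from the paper) that computes ActOcc, Supp, and Conf for rule R: ⟨(A, a1)⟩ → 1 over the dataset of Table 1.

```csharp
using System;
using System.Linq;

class RuleMetricsDemo
{
    static void Main()
    {
        // The three objects of Table 1: attribute values plus a class label.
        var dataset = new[]
        {
            new { A = "a1", B = "b1", C = "c1", Class = 1 },
            new { A = "a1", B = "b1", C = "c1", Class = 2 },
            new { A = "a2", B = "b1", C = "c1", Class = 2 },
        };

        // Rule R: <(A, a1)> -> 1
        var matching = dataset.Where(o => o.A == "a1").ToList();
        int actOcc = matching.Count;                  // objects matching the antecedent
        int supp = matching.Count(o => o.Class == 1); // ...that also carry class 1
        double conf = (double)supp / actOcc;          // Conf(R) = Supp(R) / ActOcc(R)

        // Prints: ActOcc = 2, Supp = 1, Conf = 0.5
        Console.WriteLine("ActOcc = {0}, Supp = {1}, Conf = {2}", actOcc, supp, conf);
    }
}
```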
2.2. Multi-core processor architecture
A multi-core processor (shown in Fig. 1) is a single computing component with two or more independent central processing units (cores) in the same physical package (Andrew, 2008). Processors were originally designed with only one core. However, multi-core processors became mainstream when Intel and AMD introduced their commercial multi-core chips in 2008 (Casali & Ernst, 2013). A multi-core processor computer has different specifications from either a computer cluster (Fig. 2) or an SMP (Symmetric Multi-Processor) system (Fig. 3): the memory is not distributed like in a cluster but rather is shared, which is similar to the SMP architecture. Many SMP systems, however, have the NUMA (Non-Uniform Memory Access) architecture: there are several memory blocks which are accessed at different speeds from each processor, depending on the distance between the memory block and the processor. On the contrary, multi-core processors are usually based on the UMA (Uniform Memory Access) architecture: there is only one memory block, so all cores have equal access time to the memory (Laurent, Négrevergne, Sicard, & Termier, 2012).

Table 1. Example of a dataset (reconstructed from the examples in the text).

OID | A  | B  | C  | Class
1   | a1 | b1 | c1 | 1
2   | a1 | b1 | c1 | 2
3   | a2 | b1 | c1 | 2
2.3. Parallel mining on the multi-core processor architecture
Obviously, the multi-core processor architecture has many desirable properties; for example, each core has direct and equal access to all the system's memory, and the multi-core chip also allows higher performance at lower energy and cost. Therefore, numerous researchers have developed parallel algorithms on the multi-core processor architecture in the data mining literature. One of the first algorithms targeting multi-core processor computers was FP-array, proposed by Liu and his colleagues in 2007 (Liu, Li, Zhang, & Tang, 2007). The authors proposed two techniques, namely a cache-conscious FP-array and a lock-free dataset tiling parallelism mechanism, for parallel discovery of frequent itemsets on multi-core processor machines. Yu and Wu (2011) proposed an efficient load balancing strategy in order to reduce massively duplicated generated candidates. Their main contribution was to enhance the task of candidate generation in the Apriori algorithm on multi-core processor computers. Schlegel, Karnagel, Kiefer, and Lehner (2013) recently adapted the well-known Eclat algorithm to a highly parallel version which runs on a multi-core processor system. They proposed three parallel approaches for Eclat: independent class, shared class, and shared itemset. Parallel mining has also been widely adopted in many other research fields, such as closed frequent itemset mining (Negrevergne, Termier, Méhaut, & Uno, 2010), gradual pattern mining (Laurent et al., 2012), correlated pattern mining (Casali & Ernst, 2013), generic pattern mining (Negrevergne, Termier, Rousset, & Méhaut, 2013), and tree-structured data mining (Tatikonda & Parthasarathy, 2009).

While much research has been devoted to developing parallel pattern mining and association rule mining algorithms relying on the multi-core processor architecture, no studies have been published regarding the parallel class association rule mining problem. Thus, this paper proposes the first algorithm for parallel mining CARs which can be executed efficiently on the multi-core processor architecture.
3. Related work

This section begins with an overview of some sequential versions of the CAR mining algorithm and then provides details about two parallel versions of it.
3.1. Sequential CAR mining algorithms

The first algorithm for mining CARs was proposed by Liu et al. (1998) based on the Apriori algorithm (Agrawal & Srikant, 1994). After its introduction, several other algorithms adopted its approach, including CAAR (Xu, Han, & Min, 2004) and PCAR (Chen, Hsu, & Hsu, 2012). However, these methods are time-consuming because they generate a lot of candidates and scan the dataset several times. Another approach for mining CARs is to build the frequent pattern tree (FP-tree) (Han, Pei, & Yin, 2000) to discover rules, which was presented in some algorithms such as CMAR (Li et al., 2001) and L3 (Baralis, Chiusano, & Garza, 2004).
[Fig. 1. Multi-core processor: one chip, two cores, two threads. (Source: http://software.intel.com/en-us/articles/multi-core-processor-architecture-explained)]
The mining process used by the FP-tree does not generate candidate rules. However, its significant weakness lies in the fact that the FP-tree does not always fit in the main memory. Several algorithms, MMAC (Thabtah, Cowling, & Peng, 2004), MCAR (Thabtah et al., 2005), and MCAR (Zhao et al., 2009), utilized the vertical layout of the dataset to improve the efficiency of the rule discovery phase by employing a method that extends the tidset intersection method mentioned in (Zaki, Parthasarathy, Ogihara, & Li, 1997). Vo and Le proposed another method for mining CARs by using an equivalence class rule tree (ECR-tree) (Vo & Le, 2009). An efficient algorithm, called ECR-CARM, was also proposed in their paper. The two strong features demonstrated by ECR-CARM are that it scans the dataset only once and uses the intersection of object identifiers to determine the support of itemsets quickly. However, it needs to generate and test a huge number of candidates because each node in the tree contains all values of a set of attributes. Nguyen et al. (2013) modified the ECR-tree structure to speed up the mining process. In their enhanced tree, named MECR-tree, each node contains only one value instead of the whole group. They also provided theorems to identify the support of child nodes and prune unnecessary nodes quickly. Based on the MECR-tree and these theorems, they presented the CAR-Miner algorithm for effectively mining CARs.

It can be seen that many sequential algorithms for CAR mining have been developed but very few parallel versions have been proposed. The next section reviews two parallel algorithms for CAR mining which have been mentioned in the associative classification literature.
3.2. Parallel CAR mining algorithms
One of the primary weaknesses of sequential versions of CAR mining is that they are unable to provide scalability in terms of data dimension, size, or runtime performance for large datasets. Consequently, some researchers have recently tried to apply parallelism to current sequential CAR mining algorithms to release the sequential bottleneck and improve the response time. Thakur and Ramesh (2008) proposed a parallel version of the CBA algorithm (Liu et al., 1998). Their proposed algorithm was implemented on a distributed memory system and based on data parallelism. The parallel CAR mining phase is an adaptation of the CD approach which was originally proposed for parallel mining of frequent itemsets (Agrawal & Shafer, 1996). The training dataset was partitioned into P parts which were computed on P processors. Each processor worked on its local data to mine CARs with the same global minimum support and minimum confidence. However, this algorithm has three big weaknesses, as follows. First, it uses a static load balance which partitions work among processors by using a heuristic cost function; this causes a high load imbalance. Second, a high synchronization cost occurs at the end of each step. Finally, each site must keep a duplicate of the entire set of candidates. Additionally, the authors did not provide any experiments to illustrate the performance of the proposed algorithm.

Mokeddem and Belbachir (2010) proposed a distributed version of FP-Growth (Han et al., 2000) to discover CARs. Their proposed algorithm was also employed on a distributed memory system and based on data parallelism. Data were partitioned into P parts which were computed on P processors for parallel discovery of the subsets of classification rules. Inter-communication was established to make global decisions. Consequently, their approach faces the big problem of high synchronization among nodes. In addition, the authors did not conduct any experiments to compare their proposed algorithm with others.

The two existing parallel algorithms for mining CARs which were employed on distributed memory systems have two significant problems: high synchronization among nodes and data replication. In this paper, a parallel CAR mining algorithm based on the multi-core processor architecture is thus proposed to solve those problems.
4. A sequential class association rule mining algorithm
In this section, we briefly summarize our previous sequential CAR mining algorithm as it forms the basic framework of our proposed parallel algorithm.

In (Nguyen & Vo, 2014), we proposed a tree structure to mine CARs quickly and directly. Each node in the tree contains one itemset along with:
(1) (Obidset1, Obidset2, ..., Obidsetk) – a list of Obidsets in which each Obidseti is a set of identifiers of objects that contain both the itemset and class ci. Note that k is the number of classes in the dataset.
(2) pos – a positive integer storing the position of the class with pos = argmax_{i∈[1,k]} {|Obidseti|}.
(3) total – a positive integer which stores the sum of the cardinalities of all Obidseti, i.e., total = Σ_{i=1}^{k} |Obidseti|.
However, for ease of programming, the itemset is converted to the form att × values, where:
(1) att – a positive integer representing a list of attributes.
(2) values – a list of values, each of which is contained in one attribute in att.
A C# sketch of this node structure is given below.
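The following is a minimal C# sketch of such a tree node (hypothetical member names; the paper does not publish its implementation in this form). pos, total, support, and confidence are derived from the Obidsets exactly as defined above.

```csharp
using System.Collections.Generic;
using System.Linq;

class TreeNode
{
    public int Att;                      // bitmask encoding the attribute set (see below)
    public List<string> Values;          // one value per attribute present in Att
    public List<HashSet<int>> Obidsets;  // Obidset_i for each class c_i, i = 1..k

    // pos = argmax_i |Obidset_i|: index of the class with the largest Obidset
    public int Pos
    {
        get
        {
            int best = 0;
            for (int i = 1; i < Obidsets.Count; i++)
                if (Obidsets[i].Count > Obidsets[best].Count) best = i;
            return best;
        }
    }

    // total = sum of the cardinalities of all Obidset_i
    public int Total { get { return Obidsets.Sum(s => s.Count); } }

    // Supp = |Obidset_pos| and Conf = |Obidset_pos| / total, as used in Section 4
    public int Supp { get { return Obidsets[Pos].Count; } }
    public double Conf { get { return (double)Supp / Total; } }
}
```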
[Fig. 3. Symmetric multi-processor system. (Source: http://en.wikipedia.org/wiki/Symmetric_multiprocessing)]
For example, itemset X = {⟨(B, b1)⟩, ⟨(C, c1)⟩} is denoted as X = 6 × b1c1. A bit representation is used for storing itemset attributes to save memory. Attributes BC can be represented as 110 in bit representation, so the value of these attributes is 6. Bitwise operations are then used to quickly join itemsets.
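A small sketch of this encoding (assuming the attribute order A, B, C maps to bits 0, 1, 2):

```csharp
class BitEncodingDemo
{
    static void Main()
    {
        int attA = 1 << 0;          // 001 = 1 (attribute A)
        int attB = 1 << 1;          // 010 = 2 (attribute B)
        int attC = 1 << 2;          // 100 = 4 (attribute C)

        int attBC = attB | attC;    // 110 = 6: the att value of itemset {B, C}
        int attABC = attA | attBC;  // 111 = 7: joining A with BC via bitwise OR

        // The combination test of the pseudo code: two nodes are combined only
        // if their attribute masks differ (e.g. a1 and a2 share att = 1, so skip).
        bool canCombine = (attA != attBC);

        System.Console.WriteLine("{0} {1} {2}", attBC, attABC, canCombine); // 6 7 True
    }
}
```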
In Table 1, itemset X = {⟨(B, b1)⟩, ⟨(C, c1)⟩} is contained in objects 1, 2, and 3. Thus, the node which contains itemset X has the form 6 × b1c1(1, 23), in which Obidset1 = {1} (or Obidset1 = 1 for short), i.e., object 1 contains both itemset X and class 1; Obidset2 = {2, 3} (or Obidset2 = 23 for short), i.e., objects 2 and 3 contain both itemset X and class 2; pos = 2 (denoted by a line under Obidset2, i.e., 23); and total = 3. pos is 2 because the cardinality of Obidset2 for class 2 is maximal (2 versus 1).

Obtaining the support and confidence of a rule then reduces to computing |Obidset_pos| and |Obidset_pos| / total, respectively. For example, node 6 × b1c1(1, 23) generates rule {⟨(B, b1)⟩, ⟨(C, c1)⟩} → 2 (i.e., if B = b1 and C = c1, then Class = 2) with Supp = |Obidset2| = |23| = 2 and Conf = 2/3.
Based on the tree structure, we also proposed a sequential algorithm for mining CARs, called Sequential-CAR-Mining. Firstly, we find all frequent 1-itemsets and add them to the root node of the tree (Line 1). Secondly, we recursively discover other frequent k-itemsets based on the Depth-First Search strategy (procedure Sequential-CAR-Mining). Thirdly, while traversing nodes in the tree, we also generate rules which satisfy the minimum confidence threshold (procedure Generate-Rule). The pseudo code of the algorithm is shown in Fig. 4.
Fig. 5 shows the tree structure generated by the sequential CAR mining algorithm for the dataset shown in Table 1. For details on the tree generation, please refer to the study by Nguyen and Vo (2014).
5. The proposed parallel class association rule mining algorithm

Although Sequential-CAR-Mining is an efficient algorithm for mining all CARs, its runtime performance degrades significantly on large datasets due to the computational complexity. As a result, we have tried to apply parallel computing techniques to the sequential algorithm to speed up the mining process.
Input: Dataset D, minSup and minConf
Output: All CARs satisfying minSup and minConf
Procedure:
1. Let Lr be the root node of the tree; Lr includes a set of nodes in which each node contains a frequent 1-itemset
Sequential-CAR-Mining(Lr, minSup, minConf)
2.  CARs = ∅;
3.  for all lx ∈ Lr.children do
4.    Generate-Rule(lx, minConf);
5.    Pi = ∅;
6.    for all ly ∈ Lr.children, with y > x do
7.      if ly.att ≠ lx.att then // two nodes are combined only if their attributes are different
8.        O.att = lx.att | ly.att; // using a bitwise operation
9.        O.values = lx.values ∪ ly.values;
10.       O.Obidseti = lx.Obidseti ∩ ly.Obidseti; // ∀i ∈ [1, k]
11.       O.pos = argmax_{i∈[1,k]} {|O.Obidseti|};
12.       O.total = Σ_{i=1}^{k} |O.Obidseti|;
13.       if |O.Obidset_pos| ≥ minSup then // node O satisfies minSup
14.         Pi = Pi ∪ {O};
15.   Sequential-CAR-Mining(Pi, minSup, minConf);
Generate-Rule(l, minConf)
16. conf = |l.Obidset_pos| / l.total;
17. if conf ≥ minConf then
18.   CARs = CARs ∪ {l.itemset → c_pos (|l.Obidset_pos|, conf)};

Fig. 4. The Sequential-CAR-Mining algorithm (partially reconstructed; missing lines inferred from the PMCAR variants in Figs. 6 and 10).
[Fig. 5. Tree generated by Sequential-CAR-Mining for the dataset in Table 1: root {}; level 1: 1×a1(1, 2), 1×a2(∅, 3), 2×b1(1, 23), 4×c1(1, 23); level 2: 3×a1b1(1, 2), 5×a1c1(1, 2), 3×a2b1(∅, 3), 5×a2c1(∅, 3), 6×b1c1(1, 23); level 3: 7×a1b1c1(1, 2), 7×a2b1c1(∅, 3).]
Input: Dataset D, minSup and minConf
Output: All CARs satisfying minSup and minConf
Procedure:
1. Let Lr be the root node of the tree; Lr includes a set of nodes in which each node contains a frequent 1-itemset
PMCAR(Lr, minSup, minConf)
2.  totalCARs = CARs = ∅;
3.  for all lx ∈ Lr.children do
4.    Generate-Rule(CARs, lx, minConf);
5.    Pi = ∅;
6.    for all ly ∈ Lr.children, with y > x do
7.      if ly.att ≠ lx.att then // two nodes are combined only if their attributes are different
8.        O.att = lx.att | ly.att; // using a bitwise operation
9.        O.values = lx.values ∪ ly.values;
10.       O.Obidseti = lx.Obidseti ∩ ly.Obidseti; // ∀i ∈ [1, k]
11.       O.pos = argmax_{i∈[1,k]} {|O.Obidseti|};
12.       O.total = Σ_{i=1}^{k} |O.Obidseti|;
13.       if |O.Obidset_pos| ≥ minSup then // node O satisfies minSup
14.         Pi = Pi ∪ {O};
15.   Task ti = new Task(() => { Sub-PMCAR(tCARs, Pi, minSup, minConf); });
16. for each task in the list of created tasks do
17.   collect the set of rules (tCARs) returned by each task;
18.   totalCARs = totalCARs ∪ tCARs;
19. totalCARs = totalCARs ∪ CARs;
Sub-PMCAR(tCARs, Lr, minSup, minConf)
20. for all lx ∈ Lr.children do
21.   Generate-Rule(tCARs, lx, minConf);
22.   Pi = ∅;
23.   for all ly ∈ Lr.children, with y > x do
24.     if ly.att ≠ lx.att then // two nodes are combined only if their attributes are different
25.       O.att = lx.att | ly.att; // using a bitwise operation
26.       O.values = lx.values ∪ ly.values;
27.       O.Obidseti = lx.Obidseti ∩ ly.Obidseti; // ∀i ∈ [1, k]
28.       O.pos = argmax_{i∈[1,k]} {|O.Obidseti|};
29.       O.total = Σ_{i=1}^{k} |O.Obidseti|;
30.       if |O.Obidset_pos| ≥ minSup then // node O satisfies minSup
31.         Pi = Pi ∪ {O};
32.   Sub-PMCAR(tCARs, Pi, minSup, minConf);

Fig. 6. PMCAR with independent branch strategy.
Schlegel et al. (2013) recently adapted the well-known Eclat algorithm to a highly parallel version which runs on a multi-core processor system. They proposed three parallel approaches for Eclat: independent class, shared class, and shared itemset. In the "independent class" strategy, each equivalence class is distributed to a single thread which mines its assigned class independently from other threads. This approach has an important advantage in that the synchronization cost is low. It, however, consumes much more memory than the sequential counterpart because all threads hold their entire tidsets at the same time. Additionally, this strategy often causes high load imbalances when a large number of threads are used: threads mining light classes often finish sooner than threads mining heavier classes. In the "shared class" strategy, a single class is assigned to multiple threads. This can reduce the memory consumption but increases the cost of synchronization since one thread has to communicate with others to obtain their tidsets. In the final strategy, "shared itemset", multiple threads concurrently perform the intersection of two tidsets for a new itemset. In this strategy, threads have to synchronize with each other at a high cost.
Basically, the proposed algorithm, Parallel Mining Class Association Rules (PMCAR), is a combination of Sequential-CAR-Mining and the parallel ideas mentioned in (Schlegel et al., 2013). It has the same core steps as Sequential-CAR-Mining: it scans the dataset once to obtain all frequent 1-itemsets along with their Obidsets, and then starts recursive mining. It also adopts the two parallel strategies "independent class" and "shared class". However, PMCAR has some differences, as follows. PMCAR is a parallel algorithm for mining class association rules while the work done by Schlegel et al. focuses on mining frequent itemsets only. Additionally, we also propose a third parallel strategy, shared Obidset, for PMCAR. PMCAR is employed on a single system with a multi-core processor where the main memory can be shared by and equally accessed from all cores. Hence, PMCAR does not require synchronization among computing nodes like other parallel CAR mining algorithms employed on distributed memory systems.

The main differences between PMCAR and Sequential-CAR-Mining in terms of parallel CAR mining strategies are discussed in the following sections.
5.1. Independent branch strategy
The first strategy, independent branch, distributes each branch of the tree to a single task, which mines the assigned branch independently from all other tasks to generate CARs. Generally speaking, this strategy is similar to the "independent class" strategy mentioned in (Schlegel et al., 2013) except that PMCAR uses a different tree structure for the purpose of CAR mining and is implemented by using tasks instead of threads. As mentioned above, this strategy has some limitations, such as high load imbalances and high memory consumption. However, its primary advantage is that each task is executed independently from other tasks without any synchronization. In our implementation, the algorithm is employed based on the parallelism model in .NET Framework 4.0. Instead of using threads, our algorithm uses tasks, which have several advantages over threads. First, a task consumes less memory than a thread. Second, while a single thread runs on a single core, tasks are designed to be aware of the multi-core processor and multiple tasks can be executed on a single core. Finally, using threads takes much time because operating systems must allocate thread data structures, initialize and destroy them, and also perform context switches between threads. Consequently, our implementation addresses two problems: high memory consumption and high load imbalance.
The pseudo code of PMCAR with independent branch strategy is shown in Fig. 6.
We apply the algorithm to the sample dataset shown in Table 1 to illustrate its basic ideas. First, PMCAR finds all frequent 1-itemsets as done in Sequential-CAR-Mining (Line 1). After this step, we have Lr = {1×a1(1, 2), 1×a2(∅, 3), 2×b1(1, 23), 4×c1(1, 23)}. Second, PMCAR calls procedure PMCAR to generate frequent 2-itemsets (Lines 3–14). For example, consider node 1×a1(1, 2). This node combines with the two nodes 2×b1(1, 23) and 4×c1(1, 23) to generate two new nodes 3×a1b1(1, 2) and 5×a1c1(1, 2). Note that node 1×a1(1, 2) does not combine with node 1×a2(∅, 3) since they have the same attribute (attribute A), which would cause the support of the new node to be zero according to Theorem 1 mentioned in (Nguyen & Vo, 2014). After these steps, we have Pi = {3×a1b1(1, 2), 5×a1c1(1, 2)}. Then, PMCAR creates a new task ti and calls procedure Sub-PMCAR inside that task with the four parameters tCARs, Pi, minSup, and minConf. The first parameter tCARs is used to store the set of rules returned by Sub-PMCAR in a task (Line 15). For instance, task t1 is created and procedure Sub-PMCAR is executed inside t1. Procedure Sub-PMCAR is recursively called inside a task to mine all CARs (Lines 20–32). For example, task t1 also generates node 7×a1b1c1(1, 2) and its rule. Finally, after all created tasks have completely mined all assigned branches, their results are collected to form the complete set of rules (Lines 16–19). In Fig. 7, three tasks t1, t2, and t3, represented by solid blocks, mine the three branches a1, a2, and b1 independently in parallel.
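A minimal C# sketch of this task-per-branch scheme follows (hypothetical names: Rule is a placeholder rule type, Mine stands in for the recursive Sub-PMCAR, and TreeNode is the node sketch from Section 4). Each first-level branch gets its own Task, and the parent waits only once, at the end, to merge the per-task rule sets.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

class Rule { /* itemset -> class label, with support and confidence */ }

static class IndependentBranch
{
    // Placeholder for the recursive Sub-PMCAR procedure (not shown here).
    static List<Rule> Mine(TreeNode branch, int minSup, double minConf)
    {
        return new List<Rule>();
    }

    public static List<Rule> MineAll(List<TreeNode> branches, int minSup, double minConf)
    {
        var tasks = new List<Task<List<Rule>>>();
        foreach (var branch in branches)
        {
            var b = branch; // per-iteration copy so each closure captures its own branch
            tasks.Add(Task.Factory.StartNew(() => Mine(b, minSup, minConf)));
        }

        // Tasks run independently; the only synchronization point is this final wait.
        Task.WaitAll(tasks.ToArray());

        var totalCARs = new List<Rule>();
        foreach (var t in tasks)
            totalCARs.AddRange(t.Result); // merge the per-branch rule sets (Lines 16-19)
        return totalCARs;
    }
}
```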
5.2. Shared branch strategy

The second strategy, shared branch, adopts the same ideas as the "shared class" strategy mentioned in Schlegel et al. (2013). In this strategy, each branch is mined in parallel by multiple tasks. The pseudo code of PMCAR with shared branch strategy is shown in Fig. 8. First, the algorithm initializes the root node Lr (Line 1). Then, the procedure PMCAR is recursively called to generate CARs. When node lx combines with node ly, the algorithm creates a new task ti and performs the combination code inside that task (Lines 7–17). Note that because multiple tasks concurrently mine the same branch, synchronization happens to collect the necessary information for the new node (Line 18). Additionally, to avoid a data race (i.e., two or more tasks performing operations that update a shared piece of data) (Netzer & Miller, 1989), we use a lock object to coordinate the tasks' access to the shared data Pi (Lines 15 and 16).
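The lock-protected update can be sketched in C# as follows (hypothetical names; Combine stands for the combination code of Lines 8–14, returning null when node O fails minSup, and TreeNode is the node sketch from Section 4). One Task is created per candidate pair, and the lock object serializes the insertions into the shared list Pi.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

static class SharedBranch
{
    static readonly object PiLock = new object();

    // Placeholder for Lines 8-14: join lx and ly, return null if O fails minSup.
    static TreeNode Combine(TreeNode lx, TreeNode ly, int minSup) { return null; }

    public static List<TreeNode> CombineChildren(List<TreeNode> children, int minSup)
    {
        var pi = new List<TreeNode>(); // shared among tasks, hence the lock below
        var tasks = new List<Task>();

        for (int x = 0; x < children.Count; x++)
        {
            for (int y = x + 1; y < children.Count; y++)
            {
                TreeNode lx = children[x], ly = children[y];
                tasks.Add(Task.Factory.StartNew(() =>
                {
                    TreeNode o = Combine(lx, ly, minSup);
                    if (o != null)
                        lock (PiLock) { pi.Add(o); } // Lines 15-16: avoid a data race on Pi
                }));
            }
        }

        Task.WaitAll(tasks.ToArray()); // Line 18: synchronize before recursing on pi
        return pi;
    }
}
```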
We also apply the algorithm to the dataset in Table 1 to demonstrate how it works. As an example, consider node 1×a1(1, 2). The algorithm creates task t1 to combine node 1×a1(1, 2) with node 2×b1(1, 23) to generate node 3×a1b1(1, 2); in parallel, it creates task t2 to combine node 1×a1(1, 2) with node 4×c1(1, 23) to generate node 5×a1c1(1, 2). However, before the algorithm continues by creating task t3 to generate node 7×a1b1c1(1, 2), it has to wait until tasks t1 and t2 finish their work. Therefore, this strategy is slower than the first one in execution time. In Fig. 9, three tasks t1, t2, and t3 mine the same branch a1 in parallel.

5.3. Shared Obidset strategy
The third strategy, shared Obidset, is different from the "shared itemset" strategy discussed in Schlegel et al. (2013). Each task is assigned a different branch, and its child tasks together process a node in the branch.
[Fig. 7. Illustration of the independent branch strategy: the tree of Fig. 5 with tasks t1, t2, and t3 (solid blocks) mining branches a1, a2, and b1 independently.]
The pseudo code of PMCAR with shared Obidset strategy is shown in Fig. 10. The algorithm first finds all frequent 1-itemsets and adds them to the root node (Line 1). It then calls procedure PMCAR to generate frequent 2-itemsets (Lines 2–14). For each branch of the tree, it creates a task and calls procedure Sub-PMCAR inside that task (Line 15). Sub-PMCAR is recursively called to generate frequent k-itemsets (k > 2) and their rules (Lines 20–34). The procedures PMCAR and Sub-PMCAR look like those in PMCAR with independent branch strategy. However, this algorithm provides a more complicated parallel strategy: in Sub-PMCAR, the algorithm creates a list of child tasks to intersect the Obidsets of two nodes in parallel (Lines 27–28). This allows the work distribution to be the most fine-grained. Nevertheless, all child tasks have to finish their work before the two properties pos and total can be calculated for the new node (Lines 29–31). Consequently, there is a high cost of synchronization among child tasks and between child tasks and their parent task.

Let us illustrate the basic ideas of the shared Obidset strategy using Fig. 11. Branch a1 is assigned to task t1. In procedure Sub-PMCAR, tasks t2 and t3, which are child tasks of t1, together process node 3×a1b1(1, 2): tasks t2 and t3 intersect, in parallel, the Obidset1 and Obidset2 of the two nodes 3×a1b1(1, 2) and 5×a1c1(1, 2), respectively. However, task t2 must wait until task t3 finishes the intersection of the two Obidset2 to obtain Obidset1 and Obidset2 of the new node 7×a1b1c1(1, 2). Additionally, parent task t1, represented by the solid block, must wait until t2, t3, and all other child tasks finish their work.
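The child-task intersection of Lines 27–29 can be sketched in C# as follows (hypothetical names, reusing the TreeNode sketch from Section 4): one child Task per class intersects the corresponding pair of Obidsets, and the parent waits for all k children before computing pos and total.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class SharedObidset
{
    public static List<HashSet<int>> IntersectObidsets(TreeNode lx, TreeNode ly)
    {
        int k = lx.Obidsets.Count;         // k = number of classes
        var result = new HashSet<int>[k];
        var children = new Task[k];

        for (int i = 0; i < k; i++)
        {
            int idx = i;                   // per-iteration copy for the closure
            children[idx] = Task.Factory.StartNew(() =>
            {
                // Obidset_idx(O) = Obidset_idx(lx) intersected with Obidset_idx(ly)
                var common = new HashSet<int>(lx.Obidsets[idx]);
                common.IntersectWith(ly.Obidsets[idx]);
                result[idx] = common;
            });
        }

        // Line 29: pos and total of the new node need all k intersections,
        // so the parent task must wait for every child here.
        Task.WaitAll(children);
        return result.ToList();
    }
}
```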
6. Time complexity analysis
In this section, we analyze the time complexities of both the sequential and the proposed parallel CAR mining algorithms. We then derive the speedup of the parallel algorithm. We also compare the time complexity of our parallel algorithm with those of existing parallel algorithms.
[Fig. 9. Illustration of the shared branch strategy: the tree of Fig. 5 with tasks t1, t2, and t3 mining the same branch a1 in parallel.]
Input: Dataset D, minSup and minConf
Output: All CARs satisfying minSup and minConf
Procedure:
1. Let Lr be the root node of the tree; Lr includes a set of nodes in which each node contains a frequent 1-itemset
PMCAR(Lr, minSup, minConf)
2.  CARs = ∅;
3.  for all lx ∈ Lr.children do
4.    Generate-Rule(CARs, lx, minConf);
5.    Pi = ∅;
6.    for all ly ∈ Lr.children, with y > x do
7.      Task ti = new Task(() => {
8.        if ly.att ≠ lx.att then
9.          O.att = lx.att | ly.att; // using a bitwise operation
10.         O.values = lx.values ∪ ly.values;
11.         O.Obidseti = lx.Obidseti ∩ ly.Obidseti; // ∀i ∈ [1, k]
12.         O.pos = argmax_{i∈[1,k]} {|O.Obidseti|};
13.         O.total = Σ_{i=1}^{k} |O.Obidseti|;
14.         if |O.Obidset_pos| ≥ minSup then // node O satisfies minSup
15.           lock
16.             Pi = Pi ∪ {O};
17.      });
18.   Task.WaitAll(the list of created tasks);
19.   PMCAR(Pi, minSup, minConf);

Fig. 8. PMCAR with shared branch strategy (partially reconstructed; missing lines inferred from the surrounding description and Figs. 6 and 10).
We can see that the sequential CAR mining algorithm described in Section 4 scans the dataset once and uses a main loop to mine all CARs. Based on the cost model in Skillicorn (1999), the time complexity of this algorithm is:
Input: Dataset D, minSup and minConf
Output: All CARs satisfying minSup and minConf
Procedure:
1. Let Lr be the root node of the tree; Lr includes a set of nodes in which each node contains a frequent 1-itemset
PMCAR(Lr, minSup, minConf)
2.  totalCARs = CARs = ∅;
3.  for all lx ∈ Lr.children do
4.    Generate-Rule(CARs, lx, minConf);
5.    Pi = ∅;
6.    for all ly ∈ Lr.children, with y > x do
7.      if ly.att ≠ lx.att then // two nodes are combined only if their attributes are different
8.        O.att = lx.att | ly.att; // using a bitwise operation
9.        O.values = lx.values ∪ ly.values;
10.       O.Obidseti = lx.Obidseti ∩ ly.Obidseti; // ∀i ∈ [1, k]
11.       O.pos = argmax_{i∈[1,k]} {|O.Obidseti|};
12.       O.total = Σ_{i=1}^{k} |O.Obidseti|;
13.       if |O.Obidset_pos| ≥ minSup then // node O satisfies minSup
14.         Pi = Pi ∪ {O};
15.   Task ti = new Task(() => { Sub-PMCAR(tCARs, Pi, minSup, minConf); });
16. for each task in the list of created tasks do
17.   collect the set of rules (tCARs) returned by each task;
18.   totalCARs = totalCARs ∪ tCARs;
19. totalCARs = totalCARs ∪ CARs;
Sub-PMCAR(tCARs, Lr, minSup, minConf)
20. for all lx ∈ Lr.children do
21.   Generate-Rule(tCARs, lx, minConf);
22.   Pi = ∅;
23.   for all ly ∈ Lr.children, with y > x do
24.     if ly.att ≠ lx.att then // two nodes are combined only if their attributes are different
25.       O.att = lx.att | ly.att; // using a bitwise operation
26.       O.values = lx.values ∪ ly.values;
27.       for i = 1 to k do // k is the number of classes
28.         Task childi = new Task(() => { O.Obidseti = lx.Obidseti ∩ ly.Obidseti; });
29.       Task.WaitAll(childi);
30.       O.pos = argmax_{i∈[1,k]} {|O.Obidseti|};
31.       O.total = Σ_{i=1}^{k} |O.Obidseti|;
32.       if |O.Obidset_pos| ≥ minSup then // node O satisfies minSup
33.         Pi = Pi ∪ {O};
34.   Sub-PMCAR(tCARs, Pi, minSup, minConf);

Fig. 10. PMCAR with shared Obidset strategy.
T_S = k_S · m + a

where T_S is the execution time of the sequential CAR mining algorithm, k_S is the number of iterations in the main loop, m is the execution time of generating nodes and rules in each iteration, and a is the execution time of accessing the dataset.
The proposed parallel algorithm distributes node and rule generation to multiple tasks executed on multiple cores. Thus, the execution time of generating nodes and rules in each iteration is m / (t · c), where t is the number of tasks and c is the number of cores. The time complexity of the parallel algorithm is:

T_P = k_P · m / (t · c) + a

where T_P is the execution time of the proposed parallel CAR mining algorithm and k_P is the number of iterations in the main loop.
The speedup is thus:

Sp = T_S / T_P = (k_S · m + a) / (k_P · m / (t · c) + a)
In our experiments, the execution time of the sequential code (for example, the code to scan the dataset) is very small. In addition, the number of iterations in the main loop in both the sequential and parallel algorithms is similar. Therefore, the speedup equation can be simplified as follows:

Sp = (k_S · m + a) / (k_P · m / (t · c) + a) ≈ (k_S · m) / (k_P · m / (t · c)) ≈ m / (m / (t · c)) = t · c

Thus, we can achieve up to a t · c speedup over the sequential algorithm.
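As a purely illustrative instance (the numbers here are assumed, not taken from the experiments): with k_S ≈ k_P, a negligible a, c = 4 cores, and t = 2 tasks per core, the bound evaluates to

Sp ≈ t · c = 2 · 4 = 8

so the model predicts at most an 8× speedup; in practice, scheduling overhead and memory contention keep the measured speedup below this bound (cf. the 2.1× reported in Section 1).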
Now we analyze the time complexity of the parallel CBA algorithm proposed in Thakur and Ramesh (2008). Since this algorithm is based on the Apriori algorithm, it must scan the dataset many times. Additionally, this algorithm was employed on a distributed memory system, which means that it needs additional computation time for communication and information exchange among nodes. Consequently, the time complexity of this algorithm is:

T_C = k_C · (m / p + a + d)

where T_C is the execution time of the parallel CBA algorithm, k_C is the number of iterations required by the parallel CBA algorithm, p is the number of processors, and d is the execution time for communication and data exchange among computing nodes.

Assume that k_P ≈ k_C and t · c ≈ p. We have:

T_C = (k_C · m / p + a) + (k_C − 1) · a + k_C · d ≈ T_P + (k_C − 1) · a + k_C · d

Obviously, T_P < T_C, which implies that our proposed algorithm is faster than the parallel version of CBA in theory.
Similarly, the time complexity of the parallel FP-Growth algorithm proposed in Mokeddem and Belbachir (2010) is as follows:

T_F = k_F · (m / p + d) + a

where T_F is the execution time of the parallel FP-Growth algorithm and k_F is the number of iterations required by the parallel FP-Growth algorithm.

The parallel FP-Growth scans the dataset once and then partitions it into P parts according to the number of processors. Each processor scans its local data partition to count the local support of each item. Therefore, the execution time of accessing the dataset in this algorithm is only a. However, computing nodes need to broadcast the local support of each item across the group so that each processor can calculate the global count. Thus, this algorithm also needs additional computation time d for data transfer. Assume that k_P ≈ k_F and t · c ≈ p. We have:

T_F = (k_F · m / p + a) + k_F · d ≈ T_P + k_F · d

It can be concluded that our proposed parallel algorithm is also faster than the parallel FP-Growth algorithm in theory, and T_P < T_F < T_C.
7. Experimental results

This section provides the results of our experiments, including the testing environment, the results of the scalability experiments of the three proposed parallel strategies, and the performance of the proposed parallel algorithm with variation in the number of objects and attributes. It finally compares the execution time of PMCAR with that of the recent sequential CAR mining algorithm CAR-Miner (Nguyen et al., 2013).
7.1. Testing environment

All experiments were conducted on a multi-core processor computer which has one Intel i7-2600 processor. The processor has 4 cores and an 8 MB L3-cache, runs at a core frequency of 3.4 GHz, and also supports Hyper-Threading. The computer has 4 GB of memory and runs Windows 7 Enterprise (64-bit) SP1. The algorithms were coded in C# using MS Visual Studio .NET 2010 Express. The parallel algorithm was implemented based on the parallelism model supported in Microsoft .NET Framework 4.0 (version 4.0.30319).
The experimental datasets were obtained from the University of California Irvine (UCI) Machine Learning Repository (http://mlearn.ics.uci.edu) and the Frequent Itemset Mining (FIM) Dataset Repository (http://fimi.ua.ac.be/data/). The four datasets used in the experiments are Poker-hand, Chess, Connect-4, and Pumsb, with the characteristics shown in Table 2. The table shows the number of attributes (including the class attribute), the number of class labels, the number of distinctive values (i.e., the total number of distinct values in all attributes), and the number of objects (or records) in each dataset. The Chess, Connect-4, and Pumsb datasets are dense and have many attributes whereas the Poker-hand dataset is sparse and has few attributes.
7.2. Scalability experiments
We evaluated the scalability of PMCAR by running it on the computer that had been configured to utilize a different number
[Fig. 11. Illustration of the shared Obidset strategy: branch a1 is assigned to task t1 (solid block); child tasks t2 and t3 intersect in parallel the Obidsets of nodes 3×a1b1(1, 2) and 5×a1c1(1, 2).]
Table 2. Characteristics of the experimental datasets.

Dataset | # Attributes | # Classes | # Distinctive values | # Objects