CloSpan: Mining Closed Sequential Patterns in Large Datasets
Natural Language Processing Lab, NTU, 2006
Slide Outline: Introduction, Search Space Pruning, Cl.
Introduction
Items within an element are listed alphabetically
<a(bc)dc> is a subsequence of <a(abc)(ac)d(cf)>
Given support threshold min_sup_count = 2, <(ab)c> is a sequential pattern
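The subsequence relation above can be checked mechanically. The following is a minimal sketch, assuming single-character items and the compact notation used on the slides; the helper names `parse` and `is_subsequence` are illustrative and do not come from the presentation.

```python
def parse(text):
    """Parse compact notation such as 'a(abc)(ac)d(cf)' into a list of item sets.

    Assumes every item is a single character (an assumption of this sketch).
    """
    seq, i = [], 0
    while i < len(text):
        if text[i] == "(":
            j = text.index(")", i)
            seq.append(frozenset(text[i + 1:j]))
            i = j + 1
        else:
            seq.append(frozenset(text[i]))
            i += 1
    return seq


def is_subsequence(sub, seq):
    """True if each element of sub is contained in a distinct element of seq, in order."""
    pos = 0
    for element in sub:
        # Greedily take the leftmost element of seq that contains this one.
        while pos < len(seq) and not element <= seq[pos]:
            pos += 1
        if pos == len(seq):
            return False
        pos += 1
    return True


print(is_subsequence(parse("a(bc)dc"), parse("a(abc)(ac)d(cf)")))
```

Greedy leftmost matching is sufficient here because taking the earliest containing element never rules out a later match.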
Introduction (Cont.)
Definition
– Frequent Sequential Pattern (FS)
Includes all the sequences whose support is no less than min_sup
– Closed Frequent Sequential Pattern (CS)
Includes no sequence that has a super-sequence with the same support
CS ⊆ FS
Introduction (Cont.)
Example – FS & CS
Seq. ID   Sequence
0         (af)dea
1         eab
2         e(abf)(bde)
CS = {ea:3, (af)d:2, (af)e:2, eab:2}
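The supports listed above can be recomputed by counting how many database sequences contain each pattern. This is a minimal sketch, assuming each element is written as a string of single-character items; `contains` and `support` are hypothetical helper names.

```python
def contains(seq, pat):
    """True if pat is a subsequence of seq (elements are strings of items)."""
    pos = 0
    for element in pat:
        while pos < len(seq) and not set(element) <= set(seq[pos]):
            pos += 1
        if pos == len(seq):
            return False
        pos += 1
    return True


def support(db, pat):
    """Number of sequences in db that contain pat."""
    return sum(contains(seq, pat) for seq in db)


DB = [
    ["af", "d", "e", "a"],   # 0: (af)dea
    ["e", "a", "b"],         # 1: eab
    ["e", "abf", "bde"],     # 2: e(abf)(bde)
]

for pat in (["e", "a"], ["af", "d"], ["af", "e"], ["e", "a", "b"], ["a"]):
    print(pat, support(DB, pat))
```

In this database the single item a has the same support as its super-sequence ea, which is why a is frequent but not closed.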
Introduction (Cont.)
Definition
– Prefix and Postfix (Projection)
<a>, <aa>, <a(ab)> and <a(abc)> are prefixes of <a(abc)(ac)d(cf)>
Given sequence <a(abc)(ac)d(cf)>
Prefix    Postfix / Projection
<a>       <(abc)(ac)d(cf)>
<aa>      <(_bc)(ac)d(cf)>
<ab>      <(_c)(ac)d(cf)>
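The projection table above can be reproduced with a short routine. This is a sketch under stated assumptions: items are single characters, the prefix is non-empty, and the leftmost occurrence of the prefix is used; `parse` and `postfix` are illustrative names, not part of the presentation.

```python
def parse(text):
    """Parse 'a(abc)(ac)d(cf)' into a list of strings, one sorted string per element."""
    seq, i = [], 0
    while i < len(text):
        if text[i] == "(":
            j = text.index(")", i)
            seq.append("".join(sorted(text[i + 1:j])))
            i = j + 1
        else:
            seq.append(text[i])
            i += 1
    return seq


def postfix(seq, prefix):
    """Return the postfix of seq w.r.t. prefix as text, or None if prefix does not occur."""
    pos = -1
    for element in prefix:
        # Leftmost element after the previous match that contains this prefix element.
        pos = next((k for k in range(pos + 1, len(seq))
                    if set(element) <= set(seq[k])), None)
        if pos is None:
            return None
    # Items of the last matched element that sort after the prefix's last item.
    rest = "".join(x for x in seq[pos] if x > max(prefix[-1]))
    parts = ["(_" + rest + ")"] if rest else []
    parts += [e if len(e) == 1 else "(" + e + ")" for e in seq[pos + 1:]]
    return "<" + "".join(parts) + ">"


s = parse("a(abc)(ac)d(cf)")
for p in ("a", "aa", "ab"):
    print(p, postfix(s, parse(p)))
```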
– I-Step extension
Ex: for s = <(ed)a> and α = {e}, <(ed)(ae)> is an I-Step extension of <(ed)a>
– S-Step extension
s ◇s α = <e1, e2, …, em, {α}>
Ex: <(a)(e)> is an S-Step extension of <(a)>
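The two extension types can be sketched as small functions. This assumes a sequence is a list of elements and each element is a sorted list of items; the names `i_extend` and `s_extend` are hypothetical.

```python
def i_extend(s, item):
    """I-Step: add item to the last element of s."""
    return s[:-1] + [sorted(set(s[-1]) | {item})]


def s_extend(s, item):
    """S-Step: append a new element that contains only item."""
    return s + [[item]]


print(i_extend([["d", "e"], ["a"]], "e"))   # <(ed)a> extended with e in the last element
print(s_extend([["a"]], "e"))               # <(a)> extended with a new element (e)
```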
Search Space Pruning (Cont.)
Definition
I(D): total number of items in D
Two sequences s and s', s ⊑ s'
Ds = Ds' ⇔ I(Ds) = I(Ds')
Example
– Df = D(af) = {de, (de)}
I(D(af)) = I(Df) = 4
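The size comparison in the example can be sketched as follows, assuming a projected database is a list of sequences and each element is a string of single-character items. The name `total_items` is hypothetical, and the size test stands in for full equality only when one sequence is a subsequence of the other, as the definition requires.

```python
def total_items(db):
    """Total number of item occurrences in a (projected) database."""
    return sum(len(element) for seq in db for element in seq)


D_f = [["d", "e"], ["de"]]    # {de, (de)} as listed in the example
D_af = [["d", "e"], ["de"]]   # projected database of (af), same content

print(total_items(D_f), total_items(D_af))
# With f contained in (af), equal sizes are taken to mean equal projected databases.
print(total_items(D_f) == total_items(D_af))
```

Comparing two integers is much cheaper than comparing the projected databases sequence by sequence, which is the point of the equivalence.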
Search Space Pruning (Cont.)
Definition
Two sequences s and s', s ⊑ s'
CloSpan(s, Ds, min_sup, L)
– Input: A sequence s, a projected DB Ds, and min_sup
– Output: The prefix search lattice L
– Check whether a discovered sequence s' exists s.t. either s ⊑ s' or s' ⊑ s,
and I(Ds) = I(Ds');
– if such a super-pattern or sub-pattern exists then
Modify the link in L, return;
– else insert s into L;
– scan Ds once, find every frequent item α such that
s can be extended to (s ◇i α), or
s can be extended to (s ◇s α);
– if no valid α is available then
return;
– for each valid α do I-Step
Call CloSpan(s ◇i α, D(s ◇i α), min_sup, L);
– for each valid α do S-Step
Call CloSpan(s ◇s α, D(s ◇s α), min_sup, L);
– return;
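For comparison with the pseudocode above, here is a naive baseline in Python. It is not CloSpan: it grows prefixes with I-Step and S-Step extensions as the pseudocode does, but it recounts support against the full database instead of using projected databases, and it removes non-closed patterns in a final pass instead of pruning through the lattice L. All function names are hypothetical, and items are assumed to be single characters.

```python
def contains(seq, pat):
    """True if pat is a subsequence of seq (elements are strings of items)."""
    pos = 0
    for element in pat:
        while pos < len(seq) and not set(element) <= set(seq[pos]):
            pos += 1
        if pos == len(seq):
            return False
        pos += 1
    return True


def support(db, pat):
    return sum(contains(seq, pat) for seq in db)


def mine_frequent(db, min_sup):
    """Enumerate all frequent sequences by recursive prefix growth."""
    items = sorted({x for seq in db for element in seq for x in element})
    found = {}

    def grow(s):
        for alpha in items:
            candidates = [s + (alpha,)]                        # S-Step
            if s and alpha > s[-1][-1]:
                candidates.append(s[:-1] + (s[-1] + alpha,))   # I-Step
            for cand in candidates:
                sup = support(db, cand)
                if sup >= min_sup:
                    found[cand] = sup
                    grow(cand)

    grow(())
    return found


def closed_only(found):
    """Keep patterns that have no proper super-sequence with the same support."""
    return {p: sup for p, sup in found.items()
            if not any(sup == t and p != q and contains(q, p)
                       for q, t in found.items())}


DB = [("af", "d", "e", "a"), ("e", "a", "b"), ("e", "abf", "bde")]
frequent = mine_frequent(DB, 2)
closed = closed_only(frequent)
print(len(frequent), closed)
```

The post-filter compares every pair of frequent patterns, which is the quadratic cost that the early termination step in the pseudocode is meant to avoid.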
CloSpan (Cont.)
Example
Seq. ID   Sequence
0         (af)dea
1         eab
2         e(abf)(bde)
min_sup_count = 2
Frequent items: a:3, b:2, d:2, e:3, f:2
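The frequent-item counts above come from a single scan of the database in which each item is counted at most once per sequence. A minimal sketch, assuming elements are strings of single-character items; `frequent_items` is a hypothetical name.

```python
from collections import Counter

DB = [("af", "d", "e", "a"), ("e", "a", "b"), ("e", "abf", "bde")]


def frequent_items(db, min_sup_count):
    """Return {item: support} for items that occur in at least min_sup_count sequences."""
    counts = Counter()
    for seq in db:
        # A set ensures an item repeated inside one sequence is counted once.
        counts.update({x for element in seq for x in element})
    return {x: c for x, c in sorted(counts.items()) if c >= min_sup_count}


print(frequent_items(DB, 2))
```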
CloSpan (Cont.)
Example (Cont.)
[Figures not recovered: prefix search tree rooted at <>; surviving node labels include a_s:3, f_i:2 and e_s:3]
Experimental Results
Synthetic Data
– Parameters
D: Number of sequences (in thousands)
C: Average itemsets per sequence
T: Average items per itemset
N: Number of different items (in thousands)
S: Average itemsets in maximal sequences
I: Average items in maximal sequences
– Two data sets
D10 C10 T2.5 N10 S6 I2.5
D5 C20 T20 N10 S20 I20
Real world datasets
– KDDCup2000 – Gazelle Click Stream
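The synthetic data set names encode the generator parameters listed above, one letter followed by one number. A small sketch for decoding such a name; the function name `parse_dataset_name` is hypothetical.

```python
import re


def parse_dataset_name(name):
    """Split a name such as 'D10 C10 T2.5 N10 S6 I2.5' into {letter: number}."""
    return {k: float(v) for k, v in re.findall(r"([DCTNSI])(\d+(?:\.\d+)?)", name)}


params = parse_dataset_name("D10 C10 T2.5 N10 S6 I2.5")
# D and N are given in thousands, so D = 10 stands for 10,000 sequences.
print(params, int(params["D"] * 1000))
```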
Experimental Results (Cont.)
Synthetic Data
D10 C10 T2.5 N10 S6 I2.5 [plot not recovered]
Experimental Results (Cont.)
Synthetic Data
D5 C20 T20 N10 S20 I20 [plot not recovered]
Experimental Results (Cont.)
Real world datasets
– KDDCup2000
29,369 sequences
35,722 sessions
87,546 page views
The average number of sessions in a sequence is around 1
The average number of pageviews in a session is 2
The largest session contains 342 views
The longest sequence has 140 sessions
The largest sequence contains 651 page views
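The rounded averages quoted above can be cross-checked against the raw counts on the same slide. This is simple arithmetic on the stated totals; the variable names are illustrative.

```python
sequences, sessions, page_views = 29_369, 35_722, 87_546

sessions_per_sequence = sessions / sequences
views_per_session = page_views / sessions

print(round(sessions_per_sequence, 2), round(views_per_session, 2))
```

Both ratios fall between the rounded figure on the slide and the next integer, so the slide's values appear to be rounded down.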
CloSpan mines frequent closed sequences efficiently.
CloSpan outperforms PrefixSpan.
Lexicographic Sequence Tree
Definition
– Lexicographic Sequence Tree
<>
<(a)> <(b)>
<(ab)> <(a)(a)> <(a)(b)>
<(ab)(a)> <(ab)(b)> <(a)(bc)> <(a)(bd)>
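The tree levels above can be generated by listing, for each node, its I-Step children followed by its S-Step children. A minimal sketch, assuming a node is a tuple of strings (one sorted string per element) and `items` is a sorted string of single-character items; `children` is a hypothetical name.

```python
def children(node, items):
    """Children of a node: I-Step extensions first, then S-Step extensions."""
    kids = []
    if node:
        last = node[-1]
        # I-Step: only items that sort after the last item keep the element ordered.
        kids += [node[:-1] + (last + x,) for x in items if x > last[-1]]
    # S-Step: any item may start a new element.
    kids += [node + (x,) for x in items]
    return kids


print(children((), "ab"))
print(children(("a",), "ab"))
print(children(("ab",), "ab"))
```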
Search Space Pruning
Definition
Given a subsequence s and its projected database Ds:
if ∃α such that α is a common prefix of all the sequences with the same extension type (either itemset-extension or sequence-extension) in Ds,
then ∀β, if s ◇ β is closed, α must be a prefix of β;
hence we need not search s ◇ β and its descendants except the branch of s ◇ α
Example
– Ds = {de(af), de(fg)}
– s ◇ <de> is not closed; it is unnecessary to extend s ◇ <e>
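The common prefix in the example can be found by comparing the projected sequences element by element. This is a simplified sketch: it only detects whole elements shared by every sequence and ignores the distinction between itemset-extension and sequence-extension that the definition makes. The name `common_prefix` is hypothetical.

```python
def common_prefix(projected_db):
    """Longest run of leading elements shared by every sequence (element level only)."""
    prefix = []
    for elements in zip(*projected_db):
        if len(set(elements)) != 1:
            break
        prefix.append(elements[0])
    return prefix


D_s = [["d", "e", "af"], ["d", "e", "fg"]]   # {de(af), de(fg)}
print(common_prefix(D_s))
```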
Search Space Pruning (Cont.)
– Partial Order
Ds = Ds'