
CloSpan: Mining Closed Sequential Patterns in Large Datasets


PowerPoint presentation, Natural Language Processing Lab, NTU, 2006. Slide outline: Introduction; Search Space Pruning; Cl.

Page 1

CloSpan: Mining Closed Sequential Patterns in Large Datasets

SEQUENTIAL PATTERNS

Page 3

Items within an element are listed alphabetically.

<a(bc)dc> is a subsequence of <a(abc)(ac)d(cf)>.

Given support threshold min_sup_count = 2, <(ab)c> is a …
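The subsequence relation used on this page can be sketched in code. This is a minimal Python illustration, assuming a sequence is modelled as a list of item sets (one set per element); the function name `is_subsequence` is hypothetical and does not come from the slides.

```python
def is_subsequence(pattern, sequence):
    """Return True if `pattern` is a subsequence of `sequence`.

    Both arguments are lists of sets, one set per element. Each pattern
    element must be a subset of a distinct sequence element, in order.
    """
    i = 0  # index of the next pattern element still to be matched
    for element in sequence:
        if i < len(pattern) and pattern[i] <= element:
            i += 1  # greedy: match at the earliest possible element
    return i == len(pattern)


# <a(bc)dc> against <a(abc)(ac)d(cf)>
pattern = [{"a"}, {"b", "c"}, {"d"}, {"c"}]
sequence = [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}, {"c", "f"}]
print(is_subsequence(pattern, sequence))
```

Greedy earliest matching is sufficient here because matching a pattern element as early as possible never rules out a later match.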

Page 4

Introduction (Cont.)

Definition

Frequent Sequential Pattern (FS)

The set of all sequences whose support is no less than min_sup.

Closed Frequent Sequential Pattern (CS)

The set of frequent sequences that have no super-sequence with the same support.

$CS \subseteq FS$

Page 5

Introduction (Cont.)

Example – FS & CS

Seq ID    Sequence
0         (af)dea
1         eab
2         e(abf)(bde)

CS: ea:3, (af)d:2, (af)e:2, eab:2
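The FS and CS definitions can be checked against this example by brute force. The sketch below is not CloSpan itself: it enumerates every subsequence of the three sequences, counts support, and filters by the two definitions. It assumes min_sup_count = 2, and all function names are hypothetical.

```python
from itertools import combinations, product

DB = [
    [{"a", "f"}, {"d"}, {"e"}, {"a"}],           # 0: (af)dea
    [{"e"}, {"a"}, {"b"}],                       # 1: eab
    [{"e"}, {"a", "b", "f"}, {"b", "d", "e"}],   # 2: e(abf)(bde)
]


def contains(sequence, pattern):
    """True if `pattern` is a subsequence of `sequence` (greedy match)."""
    i = 0
    for element in sequence:
        if i < len(pattern) and set(pattern[i]).issubset(element):
            i += 1
    return i == len(pattern)


def subsequences(sequence):
    """All non-empty subsequences, as tuples of sorted item tuples."""
    per_element = [
        [c for r in range(len(e) + 1) for c in combinations(sorted(e), r)]
        for e in sequence
    ]
    # An empty choice for an element means that element is skipped.
    return {tuple(c for c in choice if c) for choice in product(*per_element)} - {()}


def frequent_and_closed(db, min_sup):
    candidates = set().union(*(subsequences(s) for s in db))
    support = {p: sum(contains(s, p) for s in db) for p in candidates}
    fs = {p: n for p, n in support.items() if n >= min_sup}
    # Closed: no other frequent pattern contains p with the same support.
    cs = {p: n for p, n in fs.items()
          if not any(q != p and fs[q] == n and contains(q, p) for q in fs)}
    return fs, cs


def show(pattern):
    return "".join(e[0] if len(e) == 1 else "(" + "".join(e) + ")" for e in pattern)


fs, cs = frequent_and_closed(DB, min_sup=2)
print(len(fs), sorted(f"{show(p)}:{n}" for p, n in cs.items()))
```

Exhaustive enumeration is only workable for a toy database like this one; it is meant to make the definitions concrete, not to be efficient.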

Page 6

Introduction (Cont.)

Definition

Prefix and Postfix (Projection)

<a>, <aa>, <a(ab)> and <a(abc)> are prefixes of the sequence <a(abc)(ac)d(cf)>.

Given sequence <a(abc)(ac)d(cf)>:

Prefix    Postfix / Projection
<a>       <(abc)(ac)d(cf)>
<aa>      <(_bc)(ac)d(cf)>
<ab>      <(_c)(ac)d(cf)>
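The projection table can be reproduced with a short sketch. It assumes single-character items, leftmost matching of the prefix, and the `(_x)` notation for the remainder of a partially consumed element; `parse` and `postfix` are hypothetical helper names.

```python
def parse(text):
    """Parse 'a(abc)(ac)d(cf)' into a list of sorted item tuples."""
    elements, i = [], 0
    while i < len(text):
        if text[i] == "(":
            j = text.index(")", i)
            elements.append(tuple(sorted(text[i + 1:j])))
            i = j + 1
        else:
            elements.append((text[i],))
            i += 1
    return elements


def postfix(sequence, prefix):
    """Postfix of `sequence` w.r.t. a non-empty `prefix`, or None if absent."""
    seq, pre = parse(sequence), parse(prefix)
    if not pre:
        raise ValueError("prefix must be non-empty")
    pos = 0
    for wanted in pre:
        # Advance to the first element that contains the wanted prefix element.
        while pos < len(seq) and not set(wanted) <= set(seq[pos]):
            pos += 1
        if pos == len(seq):
            return None
        pos += 1
    last = pos - 1
    # Items of the last matched element that sort after the prefix's last item.
    rest = [x for x in seq[last] if x > max(pre[-1])]
    parts = ["(_" + "".join(rest) + ")"] if rest else []
    for e in seq[last + 1:]:
        parts.append(e[0] if len(e) == 1 else "(" + "".join(e) + ")")
    return "<" + "".join(parts) + ">"


for p in ("a", "aa", "ab"):
    print(p, postfix("a(abc)(ac)d(cf)", p))
```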

Page 7

Ex: for s = <(ed)a> and $\alpha$ = {e}, <(ed)(ae)> is an I-Step extension of <(ed)a>.

S-Step extension

$s \diamond_s \alpha = \langle e_1, e_2, \dots, e_m, \{\alpha\} \rangle$

Ex: <(a)(e)> is an S-Step extension of <(a)>.
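The two extension types can be illustrated with a small sketch. It assumes a sequence is a tuple of sorted item tuples, so the element written (ed) on the slide appears as ("d", "e"); `i_step` and `s_step` are hypothetical names.

```python
def i_step(s, item):
    """I-Step: add `item` to the last element of a non-empty sequence s."""
    *head, last = s
    return tuple(head) + (tuple(sorted(set(last) | {item})),)


def s_step(s, item):
    """S-Step: append a new element containing only `item`."""
    return tuple(s) + ((item,),)


print(i_step((("d", "e"), ("a",)), "e"))  # <(ed)a> extended to <(ed)(ae)>
print(s_step((("a",),), "e"))             # <(a)> extended to <(a)(e)>
```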

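Page 9

Search Space Pruning (Cont.)

Definition

$\mathcal{I}(D)$: total number of items in D

For two sequences s and s' with $s \sqsubseteq s'$:

$D_s = D_{s'} \Leftrightarrow \mathcal{I}(D_s) = \mathcal{I}(D_{s'})$

Example

$D_f = D_{(af)} = \{de, (de)\}$

$\Rightarrow \mathcal{I}(D_{(af)}) = \mathcal{I}(D_f) = 4$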
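The size measure in this definition is cheap to compute, which is why it can stand in for a full comparison of two projected databases. The sketch below counts the items of the example projected database; the name `total_items` and the list-of-sets representation are assumptions made for illustration.

```python
def total_items(projected_db):
    """Total number of items over all elements of all sequences in the database."""
    return sum(len(element) for sequence in projected_db for element in sequence)


# Projected database {de, (de)} from the example above.
D_f = [[{"d"}, {"e"}], [{"d", "e"}]]
D_af = [[{"d"}, {"e"}], [{"d", "e"}]]

# When one pattern is a subsequence of the other, equal sizes imply equal
# projected databases, so comparing two integers replaces a full comparison.
print(total_items(D_f), total_items(D_f) == total_items(D_af))
```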

Page 10

Search Space Pruning (Cont.)

Definition

Two sequences s and s', $s \sqsubseteq s'$

Page 14

CloSpan(s, $D_s$, min_sup, L)

Input: A sequence s, a projected DB $D_s$, and min_sup

Output: The prefix search lattice L

Check whether a discovered sequence s' exists such that either $s \sqsubseteq s'$ or $s' \sqsubseteq s$, and $\mathcal{I}(D_s) = \mathcal{I}(D_{s'})$;
if such a super-pattern or sub-pattern exists then
    modify the link in L, return;
else insert s into L;
scan $D_s$ once, find every frequent item $\alpha$ such that
    s can be extended to ($s \diamond_i \alpha$), or
    s can be extended to ($s \diamond_s \alpha$);
if no valid $\alpha$ is available then
    return;
for each valid $\alpha$ do (I-Step)
    Call CloSpan($s \diamond_i \alpha$, $D_{s \diamond_i \alpha}$, min_sup, L);
for each valid $\alpha$ do (S-Step)
    Call CloSpan($s \diamond_s \alpha$, $D_{s \diamond_s \alpha}$, min_sup, L);
return;
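The recursive I-Step and S-Step growth in this pseudocode can be sketched in a much simplified form. The version below is not the CloSpan implementation: it keeps the full supporting sequences instead of true projected databases, omits the lattice L and the early-termination check on equal database sizes, and removes non-closed patterns in a final pass. The names `grow` and `mine_closed` are hypothetical.

```python
def contains(sequence, pattern):
    """True if `pattern` is a subsequence of `sequence` (greedy match)."""
    i = 0
    for element in sequence:
        if i < len(pattern) and set(pattern[i]).issubset(element):
            i += 1
    return i == len(pattern)


def grow(s, supporting, min_sup, found):
    """Depth-first growth of pattern s over the sequences that contain it."""
    items = sorted({x for seq in supporting for element in seq for x in element})
    # I-Step: add a larger item to the last element (keeps elements sorted).
    i_ext = [s[:-1] + (s[-1] + (x,),) for x in items if s and x > s[-1][-1]]
    # S-Step: append a new single-item element.
    s_ext = [s + ((x,),) for x in items]
    for candidate in i_ext + s_ext:
        matched = [seq for seq in supporting if contains(seq, candidate)]
        if len(matched) >= min_sup:
            found[candidate] = len(matched)
            grow(candidate, matched, min_sup, found)


def mine_closed(db, min_sup):
    found = {}
    grow((), db, min_sup, found)
    # Post-filter: drop patterns that have a super-pattern with equal support.
    closed = {p: n for p, n in found.items()
              if not any(q != p and found[q] == n and contains(q, p) for q in found)}
    return found, closed


DB = [
    [{"a", "f"}, {"d"}, {"e"}, {"a"}],           # (af)dea
    [{"e"}, {"a"}, {"b"}],                       # eab
    [{"e"}, {"a", "b", "f"}, {"b", "d", "e"}],   # e(abf)(bde)
]
frequent, closed = mine_closed(DB, min_sup=2)
print(len(frequent), len(closed))
```

The point of the lattice check in the pseudocode is to avoid exploring branches that this sketch still explores and discards afterwards.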

Page 16

CloSpan (Cont.)

Example

Seq ID    Sequence
0         (af)dea
1         eab
2         e(abf)(bde)

min_sup_count = 2

Frequent items: a:3, b:2, d:2, e:3, f:2
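The item counts in this example come from a single scan in which each item is counted at most once per sequence. A minimal sketch, with `item_supports` as a hypothetical name:

```python
from collections import Counter

DB = [
    [{"a", "f"}, {"d"}, {"e"}, {"a"}],           # (af)dea
    [{"e"}, {"a"}, {"b"}],                       # eab
    [{"e"}, {"a", "b", "f"}, {"b", "d", "e"}],   # e(abf)(bde)
]


def item_supports(db):
    """Number of sequences in which each item occurs at least once."""
    counts = Counter()
    for sequence in db:
        counts.update(set().union(*sequence))  # each item once per sequence
    return counts


supports = item_supports(DB)
frequent = {item: n for item, n in sorted(supports.items()) if n >= 2}
print(frequent)
```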

Pages 17-31

CloSpan (Cont.)

Example (Cont.)

[Figures not reproduced. Surviving labels: root <>, sequence IDs 0 to 3, nil entries, and nodes a_s:3, f_i:2, e_s:3.]

Page 38

Experimental Results

Synthetic Data

Parameters

D : Number of sequences (in thousands)
C : Average number of itemsets per sequence
T : Average number of items per itemset
N : Number of different items (in thousands)
S : Average number of itemsets in maximal sequences
I : Average number of items in maximal sequences

Two data sets

D10 C10 T2.5 N10 S6 I2.5

D5 C20 T20 N10 S20 I20

Real-world datasets

KDDCup2000 – Gazelle Click Stream
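The data set names encode the six generator parameters listed above. A small sketch of how such a name could be decoded; the function name `parse_dataset_name` is hypothetical and the values are returned as plain numbers, without applying the "in thousands" scaling.

```python
import re


def parse_dataset_name(name):
    """Decode a name such as 'D10 C10 T2.5 N10 S6 I2.5' into a dict."""
    pairs = re.findall(r"([DCTNSI])(\d+(?:\.\d+)?)", name)
    return {key: float(value) for key, value in pairs}


print(parse_dataset_name("D10 C10 T2.5 N10 S6 I2.5"))
print(parse_dataset_name("D5 C20 T20 N10 S20 I20"))
```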

Page 39

Experimental Results (Cont.)

Synthetic Data: D10 C10 T2.5 N10 S6 I2.5

Page 40

Experimental Results (Cont.)

Synthetic Data: D5 C20 T20 N10 S20 I20

Page 41

Experimental Results (Cont.)

Real-world datasets

KDDCup2000

29,369 sequences

35,722 sessions

87,546 page views

The average number of sessions in a sequence is around 1

The average number of page views in a session is 2

The largest session contains 342 page views

The longest sequence has 140 sessions

The largest sequence contains 651 page views
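The two averages follow directly from the three counts above. A quick arithmetic check, using only the figures quoted on this page:

```python
sequences, sessions, page_views = 29_369, 35_722, 87_546

sessions_per_sequence = sessions / sequences   # slide: "around 1"
views_per_session = page_views / sessions      # slide: "2"

print(round(sessions_per_sequence, 2), round(views_per_session, 2))
```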

Page 43

CloSpan mines frequent closed sequences efficiently.

CloSpan outperforms PrefixSpan.

Page 47

Lexicographic Sequence Tree

Definition

<>
  <(a)>
    <(ab)>
      <(ab)(a)>
      <(ab)(b)>
    <(a)(a)>
    <(a)(b)>
      <(a)(bc)>
      <(a)(bd)>
  <(b)>
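The tree levels shown here can be generated by listing, for each node, its I-Step children followed by its S-Step children. The sketch below assumes that ordering and a fixed item alphabet; `children` and `show` are hypothetical names.

```python
def children(s, alphabet):
    """Children of node s: I-Step extensions first, then S-Step extensions."""
    # I-Step: only items larger than the last item, so elements stay sorted.
    i_children = [s[:-1] + (s[-1] + (x,),) for x in alphabet if s and x > s[-1][-1]]
    # S-Step: any item may start a new element.
    s_children = [s + ((x,),) for x in alphabet]
    return i_children + s_children


def show(s):
    return "<" + "".join("(" + "".join(e) + ")" for e in s) + ">"


root = ()
print([show(c) for c in children(root, "ab")])
print([show(c) for c in children((("a",),), "ab")])
print([show(c) for c in children((("a", "b"),), "ab")])
```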

Page 48

Search Space Pruning

Definition

Given a subsequence s and its projected database $D_s$:

if $\exists \alpha$ such that $\alpha$ is a common prefix for all the sequences with the same extension type (either itemset-extension or sequence-extension) in $D_s$, then

$\forall \beta$, if $s \diamond \beta$ is closed, $\alpha$ must be a prefix of $\beta$

$\forall \beta$, we need not search $s \diamond \beta$ and its descendants, except the branch of $s \diamond \alpha$

Example

$D_s = \{de(af), de(fg)\}$

$s \diamond \langle de \rangle$ is not closed; it is unnecessary to extend $s \diamond \langle e \rangle$
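Finding the common prefix of a projected database is the first step of this pruning rule. The sketch below handles only whole-element prefixes and ignores the distinction between extension types and partially consumed elements, so it is a simplification; `common_prefix` is a hypothetical name.

```python
def common_prefix(projected_db):
    """Longest run of leading elements shared by every sequence."""
    prefix = []
    for elements in zip(*projected_db):
        if all(e == elements[0] for e in elements):
            prefix.append(elements[0])
        else:
            break
    return prefix


# D_s = {de(af), de(fg)}; each element is a sorted tuple of items.
D_s = [
    [("d",), ("e",), ("a", "f")],
    [("d",), ("e",), ("f", "g")],
]
print(common_prefix(D_s))
```

For this example the shared prefix is <de>, which matches the example on the slide.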

Page 50

Search Space Pruning (Cont.)

Partial Order

D s        = D s

Posted: 08/11/2022, 14:03
