An Efficient Algorithm for Mining Maximal
Co-location Pattern Using Instance-trees
Dai Phong Le
Institute of System Integration
Le Quy Don Technical University
Hanoi, Vietnam
ledaiphong.isi@lqdtu.edu.vn
Cao Dai Pham
Institute of System Integration
Le Quy Don Technical University
Hanoi, Vietnam
daipc.isi@lqdtu.edu.vn
Van Tuan Luu
Institute of System Integration
Le Quy Don Technical University
Hanoi, Vietnam
tuanlv.isi@lqdtu.edu.vn
Vanha Tran
Dept. of Information Technology Specialization
FPT University
Hanoi, Vietnam
hatv14@fe.edu.vn
Dang Hai Nguyen
Institute of System Integration
Le Quy Don Technical University
Hanoi, Vietnam
nguyendanghai.mta@gmail.com
Abstract—Prevalent co-location patterns (PCPs), which refer to groups of spatial features whose instances frequently appear together in nearby geographic space, are one of the main branches of spatial data mining. As data volumes continue to increase, discovering all patterns becomes redundant. Maximal co-location patterns (MCPs) are a compressed representation of all these patterns; they provide new insight into the interactions among different spatial features and help discover more valuable knowledge from data sets. The increasing volume of spatial data sets makes discovering MCPs very challenging. This study is dedicated to designing an efficient MCP mining algorithm. First, the features in size-2 patterns are regarded as a sparse graph, and MCP candidates are generated by enumerating the maximal cliques of this graph. Second, we design two instance-tree structures, a star neighbor-based and a sibling node-based instance-tree, to store the neighbor relationships of instances. All maximal co-location instances of the candidates are yielded efficiently from these instance-tree structures. Finally, an MCP candidate is marked as prevalent if its participation index, which is calculated based on the maximal co-location instances, is not smaller than a minimum prevalence threshold given by users. The efficiency of the proposed algorithm is demonstrated by comparison with previous algorithms on both synthetic and real data sets.
Index Terms—data mining, maximal co-location pattern, star neighbor, instance-tree
I. INTRODUCTION
With the development of global positioning system (GPS)-enabled mobile and hand-held devices, many applications are designed based on geo-location data, e.g., peer-to-peer ridesharing, ride-hailing, and food delivery. The valuable knowledge discovered from spatial data makes these application services more and more accurate and enables them to provide personalized services. Prevalent co-location patterns (PCPs), which are groups of spatial features (e.g., hotels, restaurants, and convenience stores in point-of-interest data) with their instances (e.g., a specific hotel, restaurant, or convenience store), are one of the main branches of spatial data mining.
Fig. 1 shows the distribution of a point-of-interest data set in Tokyo, Japan. As can be seen, the instances of hotels, restaurants, and convenience stores are frequently located in the neighborhoods of each other. Four PCPs are formed in this data set: {Hotel, Restaurant}, {Hotel, Convenience store}, {Restaurant, Convenience store}, and {Hotel, Restaurant, Convenience store}. {Hotel, Restaurant, Convenience store} is an MCP because it does not have any super-PCPs. PCP mining has been proved to be an effective tool for discovering valuable knowledge from spatial data sets, and it is applied in many fields such as environmental management [1], mobile communications [2], social science [3], and location-based services [4].
Fig. 1: The distribution of a point-of-interest data set.
If a PCP has no prevalent super-patterns, it is an MCP. It is challenging to discover MCPs when the numbers of features and instances are large and/or the distribution of data is dense. In this study, we focus on developing an efficient MCP mining algorithm by employing two efficient instance-tree structures. The two structures are designed to store the neighbor relationships of instances, and the co-location instances of MCP candidates can be collected efficiently from them. Therefore, the efficiency of the mining process is improved.
The remainder of this study is organized as follows. Section 2 gives the problem statement and related work. The proposed algorithms are presented in Section 3. Section 4 verifies the mining efficiency of our algorithms by experiments. Section 5 concludes this work.
2021 8th NAFOSTED Conference on Information and Computer Science (NICS)
Fig. 2: An example of co-location pattern mining (f.i denotes the i-th instance of feature f; lines denote neighbor relationships; the figure also lists, for each candidate c, its table instance T(c), participation ratios PR(c), and participation index PI(c)).
II. PROBLEM STATEMENT AND RELATED WORK
A. Problem statement
Given (1) a set of spatial features F = {f1, ..., fm} and a set of their instances I = {I1, ..., Im}, where Ii (1 ≤ i ≤ m) corresponds to the instances of feature fi and each instance in Ii is a triple ⟨feature type, instance ID, location⟩; (2) a neighbor relationship R on the instance set I; R normally uses the Euclidean distance metric with a distance threshold d: if the distance between two instances that belong to different feature types is not larger than d, the two instances have a neighbor relationship; and (3) a minimum prevalence threshold minprev to evaluate the prevalence of a pattern.
A subset of F, c = {f1, ..., fk} (1 ≤ k ≤ m), is a size-k co-location pattern. I(c) is a co-location instance of c whose instances have the neighbor relationship R with each other. The set of all I(c) is called the table instance of c, T(c). The participation ratio of feature fi in c, denoted PR(c, fi), is the fraction of the instances of fi that participate in T(c). The participation index of c is PI(c) = min{PR(c, fi) | fi ∈ c}. If PI(c) is not smaller than minprev, c is marked as a PCP. If a PCP c has no prevalent super-patterns, c is called an MCP.
Fig. 2 shows an example of co-location pattern mining. There are five features, A, B, C, D, and E; the instances of A are A.1, A.2, A.3, and A.4. Assume that c = {A, B, C, D} is a candidate and the co-location instances of c are {A.2, B.2, C.4, D.1} and {A.3, B.1, C.2, D.3}. The participation ratio of each feature in c is PR(c, A) = 2/4, PR(c, B) = 2/4, PR(c, C) = 2/4, and PR(c, D) = 2/3. Thus, PI(c) = min{2/4, 2/4, 2/4, 2/3} = 0.5. If users set minprev = 0.4, then PI(c) > minprev, and hence c is a prevalent CP. Similarly, {A, B}, {A, C}, {A, D}, {B, C}, {C, D}, {B, D}, {A, B, C}, {A, C, D}, and {B, C, D} are prevalent. Since {A, B, C, D} has no prevalent super-patterns, it is an MCP, while {A, B, C} is not an MCP since it has a prevalent super-pattern.
The problem of CP mining is to discover all PCPs from a given data set. Furthermore, to represent the patterns in the mining result compactly, the set of MCPs is required.
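The participation-index computation above can be sketched in a few lines (a minimal illustration, not the paper's C++ implementation; the counts and table instance below simply restate the Fig. 2 example):

```python
from fractions import Fraction

# Instance counts per feature and the table instance of c = {A, B, C, D},
# restated from the Fig. 2 example.
n_instances = {'A': 4, 'B': 4, 'C': 4, 'D': 3}
T_c = [['A.2', 'B.2', 'C.4', 'D.1'],
       ['A.3', 'B.1', 'C.2', 'D.3']]

def participation_index(c, table, counts):
    """PI(c) = min over the features of c of the fraction of that
    feature's instances appearing in the table instance."""
    ratios = []
    for i, f in enumerate(c):
        participating = {row[i] for row in table}   # distinct instances of f
        ratios.append(Fraction(len(participating), counts[f]))
    return min(ratios)

pi = participation_index(['A', 'B', 'C', 'D'], T_c, n_instances)
print(pi, float(pi))  # 1/2 0.5, so c is prevalent for minprev = 0.4
```

Using exact fractions avoids floating-point issues when comparing PI(c) against minprev.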
B. Related work
Join-based [5] is known as the first algorithm in the PCP mining domain. It uses an expensive join operation to collect table instances. To tackle this weakness, many algorithms that no longer use join operations have been developed [6]-[8]. However, these algorithms have difficulty handling the increase in data volume; hence, many PCP mining algorithms for big data have been proposed [9]-[11]. The mining result normally contains a large number of PCPs, which is difficult for users to absorb, understand, and apply. Hence, the notion of MCPs was proposed. Yoo et al. [12] designed an MCP mining algorithm called MAXColoc. It converts instance neighborhood transactions to feature neighborhood transactions and then builds a feature type tree to generate candidates. The table instance of each candidate is collected by a star instance mechanism. However, this mechanism becomes very time-consuming when data sets are dense or large, since it needs to examine the neighbor relationships of the instances in all subsets of each candidate.
An order-clique-based (OCB) approach for discovering MCPs was also developed [13]. The candidates are generated by using a P2-tree. For collecting co-location instances, it constructs two tree structures: a Neib-tree to save the neighbor relationships of instances and an Ins-tree to collect co-location instances. However, when data sets are dense or big, these trees become very luxuriant: it takes a lot of time to copy all sub-trees of a candidate from the Neib-tree to the Ins-tree, and a large amount of memory space must be allocated, since the two trees must remain in memory during the whole mining process.
A sparse-graph and condensed tree-based (SGCT) algorithm [14] was developed recently to mine MCPs. The candidates are generated by a maximal clique enumeration algorithm [15]. The table instance of each candidate is collected by using a hierarchical verification scheme to construct a condensed instance-tree. However, the scheme is a one-by-one inspection; it becomes very expensive when data is dense and the size of candidates is large, and the performance of SGCT drops sharply.
In summary, the mentioned MCP algorithms are concerned with two aspects: (1) reducing the number of MCP candidates; and (2) building various data structures to collect table instances efficiently. However, each algorithm has its own disadvantages when dealing with dense and/or large data sets. Regarding the first aspect, because the number of features is small in practical applications (generally within 100) [13], there is no difference in efficiency between the various methods of generating MCP candidates. We therefore focus our full attention on the second aspect by developing two instance-tree structures.
Fig. 3: The proposed mining framework (input: a data set, d, and minprev; phases: materialize neighbor relationships, find size-2 patterns, generate maximal candidates, construct instance-trees, and calculate PIs to filter maximal patterns).
Fig. 4: The relationship of features in size-2 CPs.
III. THE PROPOSED ALGORITHMS
Fig. 3 shows the framework of the proposed algorithm. The first phase requires users to input a spatial data set, a distance threshold d, and a minimum prevalence threshold minprev. The neighbor relationships of instances are materialized under d in the second phase. The third phase finds size-2 PCPs. A set of MCP candidates is generated based on the size-2 patterns in the fourth phase. The fifth phase collects the table instance of each candidate by constructing an instance-tree. The sixth phase calculates participation indexes and filters prevalent MCPs. In this study, we mainly focus on collecting table instances, for which two efficient instance-tree structures are devised.
A. Star neighbors
Definition 1: The star neighbor (SN) of an instance iq is defined as the set of instances that have the neighbor relationship with iq and come after it in the instance ordering, SN(iq) = {jp | jp > iq, p ≠ q, 1 ≤ p, q ≤ m}; iq is called the center instance.
For example, Table I lists the star neighbor of each instance in the data set shown in Fig. 2.
TABLE I: Star neighbors of instances in Fig. 2
Center instance | Star neighbor instances
A.1 | C.3, D.2
A.2 | B.2, C.4, D.1, E.2
A.3 | B.1, B.4, C.2, D.3
A.4 | B.3
∅: empty set.
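The materialization of star neighbors under a distance threshold d (the second phase of the framework) can be sketched as follows. The coordinates and threshold below are illustrative placeholders, not taken from Fig. 2; only the grouping logic follows Definition 1:

```python
from itertools import combinations
from math import dist

# Hypothetical instances: (feature, id, (x, y)); not the Fig. 2 data.
instances = [('A', 1, (0.0, 0.0)), ('B', 1, (1.0, 0.0)),
             ('B', 2, (5.0, 5.0)), ('C', 1, (0.0, 1.2))]
d = 1.5  # illustrative distance threshold

def star_neighbors(instances, d):
    """SN(i): later-ordered instances of other features within distance d."""
    order = sorted(instances)            # sort by feature type, then ID
    sn = {f + '.' + str(i): [] for f, i, _ in order}
    for (f1, i1, p1), (f2, i2, p2) in combinations(order, 2):
        # combinations() preserves the ordering, so jp > iq holds;
        # instances of the same feature never become neighbors.
        if f1 != f2 and dist(p1, p2) <= d:
            sn[f1 + '.' + str(i1)].append(f2 + '.' + str(i2))
    return sn

sn = star_neighbors(instances, d)
print(sn)  # {'A.1': ['B.1', 'C.1'], 'B.1': [], 'B.2': [], 'C.1': []}
```

Each instance thus keeps only its "later" neighbors, so every neighbor pair is stored exactly once.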
B. Generating candidates
According to the anti-monotonicity property of PCPs [6], if a size-k (k > 2) pattern c is prevalent, all size-2 patterns generated by the features in c must be prevalent. Hence, size-k candidates can be generated based on size-2 PCPs. It is easy to see that the relationship of the features in size-2 PCPs can be plotted as an undirected graph G2F whose nodes are the features in the size-2 PCPs and whose edges are these size-2 PCPs.
Definition 2: A size-2 feature graph G2F(V, E) consists of a set of vertices V = {fi | fi is a feature of a size-2 PCP} and a set of edges E = {(fi, fj) | {fi, fj} is a size-2 PCP}.
For example, for the data set in Fig. 2, if users set minprev = 0.2, the size-2 PCPs are {A, B}, {A, C}, {A, D}, {A, E}, {B, C}, {B, D}, {C, D}, and {C, E}. Fig. 4 illustrates the G2F graph constructed from these size-2 PCPs. It can be seen that the MCP candidates are exactly the maximal cliques of G2F.
To enumerate all maximal cliques of G2F, we employ the maximal clique enumeration algorithm developed in [12], [15]. Algorithm 1 describes the process of generating MCP candidates from size-2 PCPs, where Γ(fi) is the set of vertices directly connected to fi. For details of Algorithm 1, please refer to [15].
For example, running Algorithm 1 on Fig. 4, two maximal cliques are yielded, {A, B, C, D} and {A, C, E}. These two are considered as candidates for discovering prevalent MCPs.
Algorithm 1: Generating candidate maximal patterns
Input: an undirected graph constructed from size-2 prevalent patterns, G2F(V, E);
Output: a set of candidate maximal patterns, CMPs;
1 Initialize P = V, Q = ∅, X = ∅;
2 for fi in a degeneracy ordering f1, ..., fm of (V, E) do
3   P = Γ(fi) ∩ {fi+1, ..., fm};
4   X = Γ(fi) ∩ {f1, ..., fi−1};
5   BronKerboschPivot(P, {fi}, X);
7 BronKerboschPivot(P, Q, X):
8 if P ∪ X = ∅ then
9   CMPs.add(Q);
11 Choose a pivot u ∈ P ∪ X maximizing |P ∩ Γ(u)|;
12 for v ∈ P \ Γ(u) do
13   BronKerboschPivot(P ∩ Γ(v), Q ∪ {v}, X ∩ Γ(v));
14   P = P \ {v};
15   X = X ∪ {v};
17 return CMPs;
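A minimal sketch of the pivoting Bron-Kerbosch enumeration on the G2F graph of Fig. 4 (the degeneracy-ordering outer loop of Algorithm 1 is omitted here; it only affects efficiency, not the set of maximal cliques reported):

```python
def bron_kerbosch_pivot(P, Q, X, adj, out):
    """Report Q as maximal when it can no longer be extended."""
    if not P and not X:
        out.append(sorted(Q))
        return
    u = max(P | X, key=lambda w: len(P & adj[w]))   # pivot choice
    for v in list(P - adj[u]):                       # snapshot before mutating P
        bron_kerbosch_pivot(P & adj[v], Q | {v}, X & adj[v], adj, out)
        P.remove(v)
        X.add(v)

# G2F built from the size-2 PCPs of the running example (Fig. 4).
edges = [('A', 'B'), ('A', 'C'), ('A', 'D'), ('A', 'E'),
         ('B', 'C'), ('B', 'D'), ('C', 'D'), ('C', 'E')]
adj = {v: set() for e in edges for v in e}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

cliques = []
bron_kerbosch_pivot(set(adj), set(), set(), adj, cliques)
print(sorted(cliques))  # [['A', 'B', 'C', 'D'], ['A', 'C', 'E']]
```

The two maximal cliques recovered are exactly the MCP candidates of the running example.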
C. A star neighbor-based instance-tree to collect co-location instances
Definition 3: A star neighbor-based instance-tree (STN-IT) of a candidate is defined as follows: (1) the tree has one root, which is a center instance determined by an item in the SNs; (2) each node is an instance in the star neighbor of the center instance; (3) the qualified children of a node are the intersection of the star neighbor of that node with the instances in the star neighbor of the center instance; (4) the tree-depth of an STN-IT is equal to k − 1, where k is the size of the candidate.
Algorithm 2 shows the pseudocode for constructing a star neighbor-based instance-tree to collect the table instance of an MCP candidate c. The first phase initializes a star neighbor-based instance-tree STNIT by using an item it whose center instance has the feature type of the first feature in c and whose star neighbor contains all the remaining feature types in c. The instances in the star neighbor whose feature type is equal to the second feature in c are then added as children (Step 1). A variable depth is defined as the tree-depth. The second phase iterates over each leaf leaf in STNIT and gets the intersection of it and the star neighbor of leaf (Steps 2-5). The third phase adds the result of the intersection as the children of leaf (Steps 6-7). If the intersection is empty, the leaf is removed (Step 9). The fourth phase deletes all leaves whose depths are smaller than the size of c (Step 16).
For example, SN(A.2) = {B.2, C.4, D.1, E.2} is a satisfied item of c = {A, B, C, D}. The STN-IT of A.2 is plotted in Fig. 5. First, A.2 is added as the root and B.2 is added as a child of A.2 (Fig. 5a). Then, the intersection of it and the star neighbor of B.2 is required: com = it ∩ SN(B.2) = {B.2, C.4, D.1, E.2} ∩ {C.4, D.1} = {C.4, D.1}; thus C.4 and D.1 are added as children of B.2 (Fig. 5b). Fig. 5c shows the tree after appending the children of C.4. Next, the D.1 that is a child of B.2 is deleted, since it is a leaf whose depth is smaller than the size of c (k = 4). Finally, {A.2, B.2, C.4, D.1} is regarded as a co-location instance of {A, B, C, D}.
Algorithm 2: Constructing a STN-IT tree
Input: a candidate maximal pattern, c; an item it in the satisfied candidate star neighbor items (SCSNIs) of c; the star neighbors, SN;
Output: a star neighbor-based instance-tree, STNIT;
1 STNIT = initialTree(c, it);
2 depth = STNIT.getDepth;
3 while depth < k do
4   for leaf ∈ STNIT.getLeaves do
5     if leaf.getDepth == depth then
6       com = getIntersection(it, SN(leaf));
7       if com ≠ ∅ then STNIT.addChildren(leaf, com);
8       else
9         STNIT.delete(leaf);
14  depth = STNIT.getDepth;
16 STNIT = refinementTree(STNIT);
17 return STNIT;
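The recursion behind the STN-IT construction can be sketched as follows (an illustrative restatement of Algorithm 2, not the paper's implementation). The SN table restates Table I; the B- and C-entries are read off the neighbor relationships of Fig. 2 and are an assumption of this sketch:

```python
# Star neighbors: A-rows from Table I; B-/C-rows derived from Fig. 2.
SN = {'A.1': ['C.3', 'D.2'], 'A.2': ['B.2', 'C.4', 'D.1', 'E.2'],
      'A.3': ['B.1', 'B.4', 'C.2', 'D.3'], 'A.4': ['B.3'],
      'B.1': ['C.2', 'D.3'], 'B.2': ['C.4', 'D.1'],
      'B.4': ['C.3', 'D.2', 'D.3'], 'C.2': ['D.3'], 'C.4': ['D.1']}

def feat(inst):                      # 'B.2' -> 'B'
    return inst.split('.')[0]

def table_instance(c, SN):
    """Collect the co-location instances of candidate c via STN-ITs."""
    rows = []
    def grow(path, pool, depth):
        if depth == len(c):          # a full branch is one co-location instance
            rows.append(path)
            return
        for x in pool:
            if feat(x) == c[depth]:  # qualified children: pool ∩ SN(x)
                grow(path + [x],
                     [y for y in pool if y in SN.get(x, [])], depth + 1)
    for center, neigh in SN.items():  # one STN-IT per satisfied center item
        if feat(center) == c[0] and set(c[1:]) <= {feat(x) for x in neigh}:
            grow([center], [x for x in neigh if feat(x) in c[1:]], 1)
    return rows

print(table_instance(['A', 'B', 'C', 'D'], SN))
# [['A.2', 'B.2', 'C.4', 'D.1'], ['A.3', 'B.1', 'C.2', 'D.3']]
```

Intersecting the parent's pool with each node's star neighbor prunes dead branches early, which is the source of the search-space reduction the paper claims for STN-IT.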
D. A sibling node-based instance-tree to collect co-location instances
The star neighbor-based instance-tree is constructed for each instance in the satisfied candidate star neighbor items. In this section, a new instance-tree that deals with all these instances simultaneously is designed.
Definition 4: A sibling node-based instance-tree (SBN-IT) of a candidate is defined as follows: (1) the tree has one root named Root; (2) the children of the root are the center instances in the SNs; (3) the qualified children of a node are the intersection of its star neighbor with its sibling nodes; (4) the tree-depth of an SBN-IT is equal to the size of the candidate.
Fig. 5: The star neighbor-based instance-tree of A.2. (a) Initialize a STN-IT tree. (b) Add the children of B.2. (c) Add the children of C.4. (d) The final STN-IT tree.
Algorithm 3 describes the pseudocode of the process of constructing an SBN-IT. A sibling node-based instance-tree SBNIT is initialized in the first phase: the children of the root are all the center instances in the satisfied candidate star neighbor items, and all star neighbors of the center instances are added as their children (Step 1). The second phase iterates over each leaf of the SBN-IT to get the intersection between the sibling nodes and the star neighbor of the leaf (Steps 3-6). Note that only a leaf whose feature type equals the feature at index (depth − 1) of the candidate, where depth is the tree-depth of the SBN-IT, is processed in this way; the other leaves are directly deleted. In the third phase, the intersection is appended as the children of the leaf if it is not empty (Steps 7-8). Finally, a refinement function is called to delete all leaves whose depths are not equal to the size of the candidate (Step 19).
Algorithm 3: Constructing a SBN-IT tree
Input: a candidate maximal pattern, c; all items in the SCSNIs of c, items; the star neighbors, SN;
Output: a sibling node-based instance-tree, SBNIT;
1 SBNIT = initialTree(c, items);
2 depth = SBNIT.getDepth;
3 while depth ≤ k do
4   for leaf ∈ SBNIT.getLeaves do
5     if leaf.feature == c[depth − 1] then
6       sibl = getSibling(leaf);
7       com = getIntersection(sibl, SN(leaf));
8       if com ≠ ∅ then SBNIT.addChildren(leaf, com);
9       else
10        SBNIT.delete(leaf);
13    else
14      SBNIT.delete(leaf);
17  depth = SBNIT.getDepth;
19 SBNIT = refinementTree(SBNIT);
20 return SBNIT;
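The level-by-level SBN-IT construction can be sketched with explicit (branch, sibling-pool) pairs instead of tree nodes (a simplification of Algorithm 3's tree bookkeeping, not the paper's implementation; the SN entries below the A-rows are read off Fig. 2 and are an assumption of this sketch):

```python
# Star neighbors of the satisfied center items and of the instances
# reachable from them (A-rows from Table I, the rest derived from Fig. 2).
SN = {'A.2': ['B.2', 'C.4', 'D.1', 'E.2'],
      'A.3': ['B.1', 'B.4', 'C.2', 'D.3'],
      'B.1': ['C.2', 'D.3'], 'B.2': ['C.4', 'D.1'],
      'B.4': ['C.3', 'D.2', 'D.3'], 'C.2': ['D.3'], 'C.4': ['D.1']}

def feat(inst):                      # 'C.4' -> 'C'
    return inst.split('.')[0]

def sbn_it_branches(c, SN):
    """Each full branch of the SBN-IT below Root is a co-location instance."""
    k = len(c)
    # Root's children: qualified center instances, each paired with its
    # star neighbor (restricted to the features of c) as the sibling pool.
    frontier = [([ctr], [x for x in neigh if feat(x) in c])
                for ctr, neigh in SN.items()
                if feat(ctr) == c[0] and set(c[1:]) <= {feat(x) for x in neigh}]
    # At tree-depth `depth`, expand only leaves of feature c[depth-1];
    # their children are (siblings ∩ SN(leaf)); other leaves are dropped.
    for depth in range(2, k + 1):
        nxt = []
        for path, pool in frontier:
            for leaf in pool:
                if feat(leaf) == c[depth - 1]:
                    com = [y for y in pool
                           if y != leaf and y in SN.get(leaf, [])]
                    if com or depth == k:
                        nxt.append((path + [leaf], com))
        frontier = nxt
    return [path for path, _ in frontier]

print(sbn_it_branches(['A', 'B', 'C', 'D'], SN))
# [['A.2', 'B.2', 'C.4', 'D.1'], ['A.3', 'B.1', 'C.2', 'D.3']]
```

Unlike the per-center STN-IT, all center instances are processed level by level in one pass, which is the design point of Algorithm 3.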
Fig. 6: Constructing the SBN-IT for {A, B, C, D}. (a) Initialize a SBN-IT tree. (b) Add children. (c) The final SBN-IT tree.
For example, Fig. 6 presents the process of constructing the sibling node-based instance-tree for the candidate pattern c = {A, B, C, D}. First, the satisfied candidate star neighbor items are {A.2: B.2, C.4, D.1, E.2} and {A.3: B.1, B.4, C.2, D.3}. In the first phase, a SBN-IT tree is constructed as shown in Fig. 6a, with a root whose two children are A.2 and A.3. All instances in the star neighbors of A.2 and A.3 are added as children of A.2 and A.3, respectively. Then each leaf of the SBN-IT tree is iterated. For example, considering B.2, the sibling nodes of B.2 are sibl = {C.4, D.1, E.2} and the star neighbor of B.2 is SN(B.2) = {C.4, D.1}; then com = sibl ∩ SN(B.2) = {C.4, D.1}. Thus, C.4 and D.1 are added as children of B.2. In the next iteration, C.4 is considered. Since the feature type of C.4 is C and it is different from c[depth − 1] = B (the tree-depth is now 2), C.4 is directly removed. Fig. 6b shows the result when all sibling nodes of feature B have been processed. The complete SBN-IT tree is plotted in Fig. 6c. It can be seen that each branch of the tree is a co-location instance of the candidate.
IV. COMPUTATIONAL EXPERIMENTS
A set of experiments is designed to evaluate the performance of the proposed algorithm. When the framework in Fig. 3 uses the star neighbor-based instance-tree and the sibling node-based instance-tree, we name the mining algorithms MCPM-STN-IT and MCPM-SBN-IT, respectively. SGCT [14] is chosen for comparison with our algorithms, since it is the most recent MCP mining algorithm and has been proven to be superior to MAXColoc [12] and OCB [13]. All algorithms are implemented in C++ and run on an Intel Core i7-3770 3.40 GHz PC running Windows 7 with 16 GB main memory.
A. Data sets
Two synthetic data sets are generated by a synthetic data generator similar to [5]. The numbers of features and instances of the two are 50 and 20,000, respectively. The spatial areas are set to 500×500 for the dense data set and 1000×1000 for the sparse data set. Moreover, two real POI data sets are used in our experiments. They are collected from facilities such as banks, parking lots, and hotels in Guangzhou (49,566 instances, 44 features) and Shanghai (67,824 instances, 50 features), China. Their distributions are plotted in Fig. 7.
B. Performance study
Fig. 7: The distribution of (a) Guangzhou and (b) Shanghai.
a) The effectiveness: Table II lists the execution time of each phase of each algorithm. For the sparse data set, the distance and prevalence thresholds are set to 16 and 0.4, respectively; the two thresholds are set to 13 and 0.6, respectively, when the dense data set is used. As can be seen: (1) the proportion of generating MCP candidates is very small in the total cost; (2) the largest fraction of the computation time is devoted to constructing instance-trees to collect table instances. The neighbor relationships of instances in SGCT are verified one by one, so it takes more execution time than the proposed algorithms, which effectively reduce the search space. The gap in computation time becomes larger when data sets are dense.
TABLE II: The execution time of each phase of the algorithms
Algorithm                  | SGCT            | MCPM-STN-IT     | MCPM-SBN-IT
Factor (s) \ Data set      | Sparse | Dense  | Sparse | Dense  | Sparse | Dense
T gen neighbors            | 0.159  | 0.247  | 0.149  | 0.212  | 0.195  | 0.21
T find size-2 patterns     | 0.321  | 0.625  | 0.28   | 0.498  | 0.287  | 0.494
T gen candidates           | 0.003  | 0.003  | 0.003  | 0.003  | 0.003  | 0.003
T constr inst-trees        | 29.541 | 316.139| 1.137  | 124.912| 1.924  | 19.889
T calc PI, filter patterns | 0.294  | 1.537  | 0.068  | 0.403  | 0.081  | 0.324
T total                    | 30.318 | 318.551| 1.637  | 126.028| 2.490  | 20.920
Fig. 8: The execution times on different numbers of instances. (a) Sparse. (b) Dense.
Fig. 9: The scalability with different distance thresholds on (a) the synthetic sparse data, (b) the dense data, (c) Guangzhou, and (d) Shanghai.
Fig. 10: The scalability with different prevalence thresholds on (a) the sparse data, (b) the dense data, (c) Guangzhou, and (d) Shanghai.
b) The scalability: First, we compare the effect of different numbers of instances. As shown in Fig. 8, as the number of instances increases, the proposed algorithms show better performance.
Second, we evaluate the performance of the proposed algorithms with different distance thresholds. Fig. 9a and 9b show the results on the synthetic data sets when the prevalence thresholds are fixed at 0.4 and 0.6 for the sparse and dense data sets, respectively. Fig. 9c and 9d compare the computation time of the algorithms on the two real data sets with the prevalence threshold set to 0.4. As can be seen, MCPM-STN-IT and MCPM-SBN-IT show less execution time.
Third, the scalability of the proposed algorithms in terms of the minimum prevalence threshold is examined. We set the distance thresholds to 20 and 13 for the sparse and dense data sets, respectively. Fig. 10a and 10b show the results. Overall, the execution times of all algorithms decrease as the prevalence threshold increases; however, SGCT takes more execution time at small values of the prevalence threshold. When the proposed algorithms are performed on the two real data sets, the distance thresholds are set to 300 m and 250 m, respectively. The comparison of the execution times is shown in Fig. 10c and 10d. As can be seen, the proposed algorithms show better performance.
V. CONCLUSION AND FUTURE WORK
Two efficient instance-trees named STN-IT and SBN-IT are designed in this study to collect the table instances of candidate maximal co-location patterns. The two instance-tree structures effectively reduce the search space when examining the neighbor relationships of instances, and their fast construction speeds up the collection of table instances. Therefore, the performance of discovering maximal co-location patterns is improved. In experiments on both synthetic and real data sets, the proposed algorithms are more efficient than the existing algorithms.
REFERENCES
[1] W. Liu, Q. Liu, M. Deng, J. Cai, and J. Yang, "Discovery of statistically significant regional co-location patterns on urban road networks," International Journal of Geographical Information Science, pp. 1-24, 2021.
[2] V. Tran, L. Wang, and H. Chen, "Discovering spatial co-location patterns by automatically determining the instance neighbor," in Fuzzy Systems and Data Mining V. IOS Press, 2019, pp. 583-590.
[3] Z. He, M. Deng, Z. Xie, L. Wu, Z. Chen, and T. Pei, "Discovering the joint influence of urban facilities on crime occurrence using spatial co-location pattern mining," Cities, vol. 99, p. 102612, 2020.
[4] V. Tran and L. Wang, "Delaunay triangulation-based spatial colocation pattern mining without distance thresholds," Statistical Analysis and Data Mining, vol. 13, no. 3, pp. 282-304, 2020.
[5] Y. Huang, S. Shekhar, and H. Xiong, "Discovering colocation patterns from spatial data sets: a general approach," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 12, pp. 1472-1485, 2004.
[6] J. S. Yoo and S. Shekhar, "A joinless approach for mining spatial colocation patterns," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1323-1337, 2006.
[7] V. Tran, L. Wang, and L. Zhou, "A spatial co-location pattern mining framework insensitive to prevalence thresholds based on overlapping cliques," Distributed and Parallel Databases, pp. 1-38, 2021.
[8] V. Tran, L. Wang, and L. Zhou, "Mining spatial co-location patterns based on overlap maximal clique partitioning," in 20th IEEE International Conference on Mobile Data Management, 2019, pp. 467-472.
[9] A. M. Sainju and Z. Jiang, "Mining colocation from big geo-spatial event data on GPU," 2021.
[10] J. S. Yoo, D. Boulware, and D. Kimmey, "Parallel co-location mining with MapReduce and NoSQL systems," Knowledge and Information Systems, pp. 1-31, 2019.
[11] A. M. Sainju, D. Aghajarian, Z. Jiang, and S. Prasad, "Parallel grid-based colocation mining algorithms on GPUs for big spatial event data," IEEE Transactions on Big Data, vol. 6, no. 1, pp. 107-118, 2018.
[12] J. S. Yoo and M. Bow, "Mining maximal co-located event sets," in Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 2011, pp. 351-362.
[13] L. Wang, L. Zhou, J. Lu, and J. Yip, "An order-clique-based approach for mining maximal co-locations," Information Sciences, vol. 179, no. 19, pp. 3370-3382, 2009.
[14] X. Yao and L. Peng, "A fast space-saving algorithm for maximal co-location pattern mining," Expert Systems with Applications, vol. 63, pp. 310-323, 2016.
[15] D. Eppstein and D. Strash, "Listing all maximal cliques in large sparse real-world graphs," in International Symposium on Experimental Algorithms. Springer, 2011, pp. 364-375.