A Novel Clustering Method for Animal Trajectory Analysis using Wireless Sensor Network
Quang Hiep Vu Database/Bioinformatics Laboratory
Chungbuk National University
Cheongju, Korea hiep88@dblab.chungbuk.ac.kr
Meijing Li Database/Bioinformatics Laboratory
Chungbuk National University
Cheongju, Korea mjlee@dblab.chungbuk.ac.kr
Thi Hong Nhan Vu Human Machine Interaction Laboratory UET, Vietnam National University
Hanoi, Vietnam vthnhan@gmail.com
Keun Ho Ryu Database/Bioinformatics Laboratory Chungbuk National University Cheongju, Korea khryu@dblab.chungbuk.ac.kr
Abstract—Animals play an important role on Earth, and studying their movements helps us conserve rare and precious species and supports food exploration. In this paper, we employ Wireless Sensor Networks (WSNs), which offer greatly increased spatial and temporal resolution of measurement data. WSNs therefore promise enhanced tracking of animals without human intervention. To help experts make better species and habitat assessments and design conservation strategies, we propose an Extended Hierarchical Path Clustering (eHPC1) method for analyzing the mobility of wild animals. We also present a predictive mobility algorithm that helps experts solve problems in data allocation and management. A system that simulates the mobility of animals is implemented, and the performance of the proposed methods is evaluated in terms of running time and estimation accuracy.
Keywords—Clustering methods, animal trajectory analysis,
wireless sensor network
I. INTRODUCTION
The animal kingdom is very large, comprising a great variety of animals, big and small, that live in water, in the air, and on land, and that differ in shape and size. Animals are a natural resource, and this resource is not infinite, so we should do our best to conserve them. They need our care and love. To that end, animal tracking is very useful: it helps us understand how individuals and populations move within local areas, migrate across oceans and continents, and evolve over millennia. This information is being used to address environmental challenges such as climate and land-use change, biodiversity loss, invasive species, and the spread of infectious diseases.
Wireless Sensor Networks (WSNs) provide an advanced solution for tracking animals. A WSN is composed of sensor nodes, relay nodes, and base stations; cellular networks can also be used when the necessary radio range coverage is difficult to achieve. The WSN reports precise animal locations and movements: using the Received Signal Strength Indicator (RSSI), the trilateration method can locate the animals, and GPS receivers carried by the animals provide accurate position information that can be stored on the sensor node. The sensing data from the distributed relay nodes are transmitted to the base stations, which can use satellite or cellular links to forward the data to the researcher [7, 8]. In this research, we assume that the animals always move within the coverage region of the WSN and that monitoring devices can be deployed in that region.
In this paper, we employ WSNs to track animals. WSN technology can obtain the required information more effectively than other technologies while considerably reducing the researcher's intervention. The volume of automatically collected data is enormous, and these data are analyzed and used in long-term decision making [13, 14, 15]. Understanding the movement patterns of animals helps us devise strategies for rare-wildlife conservation and efficient food exploration. For that purpose, many data analysis techniques have been proposed, clustering being one of them. Conventional methods such as K-means and DBSCAN cannot be applied directly to discover hidden trajectory patterns, since they were originally designed for objects represented as points rather than as time series [1]. The HierCluster algorithm, recently proposed in [10], finds clusters of user trajectories collected from cell phones in cellular networks. It uses the edit distance to measure the similarity between two trajectories. However, computing edit distances for hundreds of sequences, which is often necessary, is extremely inefficient.
In this paper, we introduce an Extended Hierarchical Path Clustering (eHPC1) method for the mobility paths of animals. Like HierCluster, eHPC1 works in a bottom-up fashion, but the similarity between clusters is measured by the Hamming distance. The closest clusters are merged until the number of clusters equals a predefined threshold. Moreover, in managing animal mobility, we wish to know the movement direction of an animal in advance. To this end, we introduce an algorithm for Prediction of Directional Movement (PDM).
Finally, a simulator of animal mobility is developed. The performance of eHPC1 is evaluated with respect to the length of the trajectories and the number of objects, and the prediction accuracy of PDM is assessed with respect to the deviation between moving points. The discovered mobility patterns and the predicted positions can be used in animal management applications.
The rest of the paper is organized as follows. Section II overviews related work on clustering methods, and Section III presents the animal mobility model. Section IV explains the algorithms for finding clusters of mobility paths and for predicting positions. Section V shows the experimental results, and Section VI presents the conclusion and future work.
II. RELATED WORK
Fast mining of information from data warehouses has always been a significant issue in data analysis. A variety of methods have been developed for this purpose, and clustering is one of them. Clustering groups a set of physical or abstract objects into classes of similar objects.
There are many clustering approaches, and each may group a dataset differently. In general, clustering methods can be divided into two categories according to the cluster structure they produce: hierarchical clustering and partitioning clustering [1, 11, 12]. Partitioning methods (K-means, bisecting K-means, PAM, DBSCAN) produce mutually exclusive classes, whereas the less common clumping methods allow clusters to overlap. Each object is assigned to the cluster it is most similar to; however, a similarity threshold has to be defined.
Hierarchical approaches can be divided into agglomerative and divisive methods [1, 11]. Divisive (top-down) methods begin with a single cluster that contains all samples; this cluster is then repeatedly split into two or more clusters with high dissimilarity between them until the number of clusters specified by the user is reached. In contrast, agglomerative (bottom-up) methods build the hierarchy through a series of N-1 merges of pairs of objects, starting from the unclustered dataset. For N samples, an agglomerative algorithm begins with N clusters, each containing a single sample (point), and repeatedly merges the two most similar clusters until a single cluster, or the number of clusters specified by the user, remains. In this research, we extend the latter approach to moving animals.
Previous methods have mainly dealt with clustering point data. Recent improvements in WSNs and tracking facilities have made it possible to collect large amounts of path data from moving objects, and there is increasing interest in analyzing these data. A typical analysis task is to find objects that have moved in a similar way, so an efficient path clustering algorithm is essential for such tasks. The works in [3, 6] proposed model-based clustering algorithms for paths.
Recently, there has been a great deal of research on mobility management. Compared to the amount of work on location updates, little has been done on mobility prediction, and the existing works have the following weaknesses:
+ To collect movement information, most works [3, 5, 6] use sophisticated and expensive tools such as GPS, whose frequent readings drain battery power quickly and which cannot re-task the network.
+ The works in [2, 3, 5] assume that the mobility patterns are already available. They use these patterns for mobility prediction but do not attempt to discover them, and their prediction is based on the probability distribution of the speed and direction of the objects.
This paper studies a path clustering method that uses previously collected data. The algorithm builds on the hierarchical clustering approach of HierCluster [10], which employs the edit distance between two strings, defined as the minimum number of label changes, insertions, and deletions needed to transform one string into the other. Unfortunately, computing edit distances for hundreds of sequences, which is often necessary, is extremely inefficient. To solve this problem, we apply the Hamming distance [4], which is also more appropriate for comparing series of labels associated with timestamps. The discovered trajectory patterns are then used for mobility prediction.
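To make the cost argument concrete, the sketch below implements the standard dynamic-programming (Levenshtein) edit distance on label sequences. It illustrates the general technique only and is not HierCluster's code from [10]; the name `edit_distance` is our own. Comparing sequences of lengths m and n fills an (m+1) × (n+1) table, so each comparison costs O(mn) label operations, whereas the Hamming distance on two timestamp-aligned sequences of length n needs a single O(n) pass.

```python
def edit_distance(a, b):
    """Minimum number of label changes, insertions and deletions turning a into b."""
    # prev[j] holds the distance between the first i-1 labels of a and b[:j]
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete a[i-1]
                          curr[j - 1] + 1,     # insert b[j-1]
                          prev[j - 1] + cost)  # change (or keep) a label
        prev = curr
    return prev[len(b)]
```

For example, the cell sequences `["c1", "c2", "c3", "c4"]` and `["c2", "c3", "c4", "c5"]` are at edit distance 2 (delete `c1`, insert `c5`), even though they differ at all four aligned timestamps.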
III. ANIMAL MOBILITY MODEL
In this paper, we assume that the animals move in a space in which a WSN is installed. The coverage region of the WSN is partitioned into smaller areas called cells. Each cell has a base station (BS) capable of broadcasting and receiving information. The base stations are connected to each other via a fixed wired network, and each base station receives the sensing data from the distributed relay nodes.
The coverage region consists of a number of location areas. In general, a location area may consist of one or more cells, but in our work we assume that each location area consists of exactly one cell. Base stations regularly broadcast the ID of the cell in which they are located, so the cell in which an animal is located can be determined by listening to the broadcast channel. The movement of an animal from one cell to another is recorded in a database called the home location register. In addition, every base station keeps a database, called the visitor location register, that records the profiles of the animals currently located in its cell. Therefore, the movement history of an animal can be obtained from the logs in its home location register.
The mobility path of an animal is defined as Tr = <(id_1, t_1), (id_2, t_2), …, (id_k, t_k)>, where id_k is the ID of the cell that the animal enters at timestamp t_k. In this representation, two consecutive IDs must belong to two neighboring cells in the network [8, 9].
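As an illustration of the neighboring-cell constraint, the sketch below checks a path Tr = <(id_1, t_1), …, (id_k, t_k)>. It assumes, hypothetically, that the cells form a rectangular grid numbered row by row from 0 and that two cells are neighbors when they share an edge or a corner; the paper does not fix a numbering scheme, and all function names are illustrative.

```python
def cell_to_row_col(cell_id, cols):
    """Convert a row-major cell ID (hypothetical numbering) into (row, column)."""
    return divmod(cell_id, cols)


def are_neighbors(a, b, cols):
    """True if cells a and b are distinct and share an edge or a corner."""
    ra, ca = cell_to_row_col(a, cols)
    rb, cb = cell_to_row_col(b, cols)
    return a != b and abs(ra - rb) <= 1 and abs(ca - cb) <= 1


def is_valid_path(path, cols):
    """Check that consecutive entries (cell_id, timestamp) lie in neighboring cells."""
    for (id1, _t1), (id2, _t2) in zip(path, path[1:]):
        if not are_neighbors(id1, id2, cols):
            return False
    return True
```

With 15 columns, cells 14 and 15 are consecutive IDs but not neighbors, because they lie at opposite ends of two different rows; converting to (row, column) before comparing catches this.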
We call the original data recorded by the WSN Animal Actual Paths (AAPs). They are a valuable source of information because the mobility of animals contains both regular and random patterns, so we may be able to extract the regular patterns from the AAPs. If needed, the future movement direction of an animal can then be estimated from these mobility patterns.
We assume that an AAP is represented as Tr = (p1, p2, …, pn), in which each pi is a moving point, as shown in Fig. 1. A moving point is represented by a triple pi = (xi, yi, ti), in which ti is the timestamp at which the point (xi, yi) is sampled.
Because the mobility is uncertain, and because we sometimes do not need the exact coordinates of the animal, we transform the absolute position (xi, yi) into a relative position.
The smallest unit of relative position is a cell of the area covered by the WSN. Accordingly, an AAP can be represented by a series of relative positions: the AAP is mapped onto the horizontal plane, which is partitioned into cells, each cell c being a square, as shown in Fig. 1.
As a result, the mobility path can be represented by Tr = (c1, c2, …, cn), in which ci is the label (ID) of the cell of the coverage region into which the point pi falls. We use this format to represent animal paths in the proposed algorithms.
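A minimal sketch of the mapping from moving points (xi, yi, ti) to cell labels ci follows. The 60 × 60-pixel cell size and the 150-cell grid come from the simulation setting in Section V; the 10-row by 15-column orientation, the row-major numbering starting at 0, and the function names are assumptions made for illustration.

```python
CELL_SIZE = 60       # pixels per cell side (from the simulation setting)
ROWS, COLS = 10, 15  # assumed layout: 10 rows x 15 columns = 150 cells


def point_to_cell(x, y, cell_size=CELL_SIZE, rows=ROWS, cols=COLS):
    """Return the row-major ID of the cell containing the point (x, y)."""
    col = int(x // cell_size)
    row = int(y // cell_size)
    if not (0 <= row < rows and 0 <= col < cols):
        raise ValueError(f"point ({x}, {y}) lies outside the covered region")
    return row * cols + col


def to_cell_path(points):
    """Map an AAP given as [(x, y, t), ...] to its label sequence (c1, ..., cn)."""
    # One label per sampled point; the timestamp only fixes the position in the list.
    return [point_to_cell(x, y) for x, y, _t in points]
```

For example, the points (10, 10, 0), (70, 10, 1), and (70, 70, 2) map to the labels 0, 1, and 16.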
We call frequently followed mobility paths Animal Mobility Patterns (AMPs). AMPs capture the mobility rules of the animals and are useful for decisions related to food resource allocation for animals as well as for conservation and exploitation strategies.
Moreover, we sometimes wish to estimate the movement direction of an animal from these mobility rules when its trajectory up to the current moment is known. We can predict the next inter-cell movement of the animal by matching its current path against the existing mobility patterns.
IV. APPROACH TO CLUSTERING ANIMAL MOVING PATHS AND PREDICTING DIRECTIONAL MOVEMENT
This section presents a method for clustering animal trajectories that extends the hierarchical clustering approach. The discovered mobility patterns are then used to estimate the directional movement of the animals.
A. Method for Clustering Animal Trajectories with a Predefined Number of Clusters
Our algorithm is based on hierarchical agglomerative clustering (HAC); that is, it works in a bottom-up fashion. Each cluster AMP of animal mobility paths has a representative, AMP.rep, defined as the member path with the minimum total distance to the other paths in the same cluster. Figure 2 shows a pair of paths.
After mapping the two paths onto the plane, we obtain a series of cell labels for each path. To determine the distance between two such sequences, we use the Hamming distance. Assume we have two sequences
Ta = {<c1, t1>, <c2, t2>, …, <cn, tn>} and Tb = {<d1, t1>, <d2, t2>, …, <dn, tn>}.
Assuming that the sequences are compared at the same timestamps, the distance between Ta and Tb is

$$\mathrm{hammingDistance}(T_a, T_b) = \sum_{i=1}^{n} \delta(c_i, d_i), \qquad \delta(c_i, d_i) = \begin{cases} 0 & \text{if } c_i = d_i \\ 1 & \text{otherwise} \end{cases} \tag{1}$$

where ci and di are the labels of the cells occupied by the two paths at timestamp ti.
Applying the Hamming distance to the paths in Fig. 2, we obtain hammingDistance(A, B) = 6, because A and B share no label at any timestamp, and hammingDistance(A, C) = 4, because A and C share the same label at 2 timestamps.
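Equation (1) translates directly into code. In the sketch below, the paths A, B, and C are hypothetical six-cell label sequences chosen only to reproduce the distances quoted for Fig. 2 (the actual cell labels in the figure are not given in the text); each list element is the cell occupied at the corresponding timestamp.

```python
def hamming_distance(ta, tb):
    """Eq. (1): count the timestamps at which two paths occupy different cells."""
    if len(ta) != len(tb):
        raise ValueError("paths must be sampled at the same timestamps")
    return sum(1 for c, d in zip(ta, tb) if c != d)


# Hypothetical label sequences, one cell label per timestamp t1..t6.
A = [1, 2, 3, 4, 5, 6]
B = [7, 8, 9, 10, 11, 12]     # no label in common with A at any timestamp
C = [1, 2, 13, 14, 15, 16]    # same label as A at t1 and t2 only

print(hamming_distance(A, B), hamming_distance(A, C))  # prints "6 4"
```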
Fig. 1. Mobility path of an animal in 3D space
Fig. 2. The distance between a pair of paths

Algorithm eHPC1()
Input:
+ D: a set of n animal actual paths (AAPs)
+ k: the number of clusters (AMPs)
Output: k clusters of animal paths
Method:
Begin
Step 1: Initialize clusters
  Create n clusters (AMPs), each having its corresponding mobility path (AAP) in D as its representative
Step 2: Repeatedly merge the closest clusters
  while (n > k) {
    + (AMPi, AMPj) ← the two clusters i ≠ j for which hammingDistance(AMPi.rep, AMPj.rep) is minimum among the existing clusters;
    // Merge the two closest clusters
    + AAPs of AMP' ← AAPs of AMPi ∪ AAPs of AMPj;
    + Calculate the representative path of the cluster AMP';
    + n ← n − 1;
  }
  return k clusters with their representative paths;
End
Algorithm 1. Extended Hierarchical Path Clustering algorithm eHPC1() with a predefined number k of clusters
Algorithm 1 describes how eHPC1() clusters animal trajectories. It takes as input a set D of n animal actual paths and a predefined number k of clusters. Initially, every AAP forms a cluster (AMP) by itself, and this single element also serves as the cluster's representative.
At each iteration of eHPC1(), the two closest clusters AMPi and AMPj are merged into a new cluster AMP' (i.e., the AAPs of AMP' are the union of the AAPs of AMPi and AMPj). After each merge, the representative of the new cluster AMP' must be recomputed.
The merge operation is repeated until the number of AMPs reaches the predefined threshold k.
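The following Python sketch mirrors Algorithm 1 under a few assumptions of our own: all paths have the same length (so that Eq. (1) applies), the representative of a cluster is its medoid (the member with the minimum total Hamming distance to the other members, as defined above), and ties are broken by taking the first candidate found. It is a straightforward, unoptimized rendering for illustration, not the authors' implementation.

```python
from itertools import combinations


def hamming_distance(ta, tb):
    """Number of positions at which two equal-length cell sequences differ."""
    return sum(1 for c, d in zip(ta, tb) if c != d)


def representative(paths):
    """Medoid: the member path with minimum total distance to the other members."""
    return min(paths, key=lambda p: sum(hamming_distance(p, q) for q in paths))


def ehpc1(paths, k):
    """Agglomerative clustering of equal-length paths into k clusters.

    Each cluster is a dict holding its member paths ('aaps') and its medoid ('rep').
    """
    if not 1 <= k <= len(paths):
        raise ValueError("k must be between 1 and the number of paths")
    # Step 1: every path starts as its own cluster and is its own representative.
    clusters = [{"aaps": [p], "rep": p} for p in paths]
    # Step 2: merge the two clusters whose representatives are closest.
    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: hamming_distance(clusters[ij[0]]["rep"],
                                                   clusters[ij[1]]["rep"]))
        merged = clusters[i]["aaps"] + clusters[j]["aaps"]
        del clusters[j]  # j > i, so index i is still valid after the deletion
        clusters[i] = {"aaps": merged, "rep": representative(merged)}
    return clusters
```

For example, `ehpc1([[1, 2, 3, 4], [1, 2, 3, 5], [9, 9, 9, 9], [9, 9, 9, 8]], 2)` places the first two paths in one cluster and the last two in the other.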
B. Algorithm for Prediction of Directional Movement
Using the mobility patterns (AMPs) returned by eHPC1(), we can estimate the path that an animal is likely to follow in the near future.
An animal's next movement is predicted by finding the AMP that best matches the trajectory the animal has followed up to the current time. The best match is the pattern with the minimum distance to the current trajectory; if more than one cluster matches equally well, one of them is chosen at random for prediction. Figure 3 shows the predicted movement of an animal whose trajectory up to the current moment is Tr1. The path Tr1 (blue) is closest to the third cluster (rep3, black), so its future mobility is estimated from the representative rep3.
Algorithm 2 shows the process of Prediction of Directional Movement, PDM(), for an animal whose trajectory Tr1 up to the current moment is known. Among the given mobility patterns (AMPs), the one whose representative is closest to Tr1 is found, say P = {P1, P2, …, Pm, …}, whose first m cells correspond to Tr1. The next mobility path is then Tr2 = {Pm+1, Pm+2, …, Pm+r}.
If the movement Tr1 is random and very different from the mobility rules the animal usually follows, its future mobility cannot be estimated. To handle this case, a constraint maxdistance is used: the next movement of the animal is estimated only if the distance between its path Tr1 and the closest cluster representative is less than this threshold.
Algorithm PDM()
Input:
+ S: set of clusters (AMPs)
+ m: length of the current trajectory Tr1
+ r: length of the next moving path
+ maxdistance: distance threshold
Output: future path Tr2 of the animal
Method:
Begin
Step 1: Find the pattern AMPu whose representative rep is closest to Tr1
  u ← arg min over i ∈ S of hammingDistance(AMPi.rep, Tr1);
Step 2:
  If (hammingDistance(AMPu.rep, Tr1) < maxdistance)
    // Predict the future movement of Tr1 (r is the length of the future movement)
    Tr2 ← (AMPu.rep[m+1], …, AMPu.rep[m+r]);
  Else
    // Cannot predict
    Tr2 ← ∅;
  Return Tr2;
End
Algorithm 2. Predicting the future mobility path of an animal
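A minimal Python sketch of Algorithm 2 follows. Because the Hamming distance of Eq. (1) needs sequences of equal length, it assumes that Tr1 (length m) is compared with the first m cells of each representative and that a usable representative holds at least m + r cells; ties between equally close representatives are broken at random, as described above. The name `pdm` and the empty list used for "cannot predict" are illustrative choices.

```python
import random


def hamming_distance(ta, tb):
    """Eq. (1) for two equal-length label sequences."""
    if len(ta) != len(tb):
        raise ValueError("sequences must have equal length")
    return sum(1 for c, d in zip(ta, tb) if c != d)


def pdm(reps, tr1, r, maxdistance, rng=random):
    """Predict the next r cells of a path from the closest cluster representative.

    reps: representative paths of the AMPs; only those with at least m + r cells
    are usable. tr1: the m cells observed so far. Returns [] if no prediction.
    """
    m = len(tr1)
    usable = [rep for rep in reps if len(rep) >= m + r]
    if not usable:
        return []
    # Step 1: distance from Tr1 to the first m cells of each representative.
    dists = [hamming_distance(rep[:m], tr1) for rep in usable]
    best = min(dists)
    # Several equally close representatives: pick one of them at random.
    u = rng.choice([rep for rep, d in zip(usable, dists) if d == best])
    # Step 2: predict only if the best match is closer than maxdistance.
    if best < maxdistance:
        return list(u[m:m + r])
    return []
```

Passing a seeded `random.Random` instance as `rng` makes the tie-breaking reproducible in experiments.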
V. EXPERIMENT AND RESULTS
This section presents the experimental settings and the performance evaluation of the proposed algorithms.
A. Simulation Setting
To assess the performance of the proposed algorithms, we first built a system that simulates the mobility of animals. Without loss of generality, we assume that the animals travel on a 10 × 15 square grid network with a total of 150 cells (each cell corresponds to 60 × 60 pixels). To generate the Animal Actual Paths (AAPs), a number of Animal Mobility Patterns (AMPs) are first defined. The length of an AMP is drawn from a uniform distribution determined by the length l, and each AMP is a random walk over the square network. The AAPs are then generated by a function that randomly selects AMPs. We also use a corruption mechanism to distinguish the AAPs from their corresponding AMPs: random cells are inserted between consecutive cells of the AMP. To control this step, we define a deviation ratio θ, the ratio of the number of inserted random cells to the number of cells in the corresponding AMP.

Fig. 3. Example of predicted directional movement based on clusters of mobility paths
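The path generator described above can be sketched as follows. The grid size matches the 150-cell setting, but the 10-row by 15-column layout, the 8-neighbor random-walk moves, the choice of inserted cells uniformly at random over the whole grid, and all function names are assumptions; the paper does not specify these details. The caller chooses each AMP length, for example by drawing it from a uniform distribution around l.

```python
import random

ROWS, COLS = 10, 15  # 150 cells as in the simulation setting (layout assumed)


def neighbours(cell):
    """Row-major IDs of the (up to 8) cells touching `cell`."""
    r, c = divmod(cell, COLS)
    return [nr * COLS + nc
            for nr in range(r - 1, r + 2) for nc in range(c - 1, c + 2)
            if (nr, nc) != (r, c) and 0 <= nr < ROWS and 0 <= nc < COLS]


def random_walk_amp(length, rng):
    """An AMP: a random walk of `length` cells over the grid."""
    path = [rng.randrange(ROWS * COLS)]
    while len(path) < length:
        path.append(rng.choice(neighbours(path[-1])))
    return path


def corrupt(amp, theta, rng):
    """Insert round(theta * len(amp)) random cells between consecutive cells."""
    path = list(amp)
    if len(path) < 2:
        return path  # no gap between consecutive cells to insert into
    for _ in range(round(theta * len(amp))):
        gap = rng.randrange(1, len(path))  # strictly between the first and last cells
        path.insert(gap, rng.randrange(ROWS * COLS))
    return path


def generate_aaps(amps, count, theta, rng):
    """Build `count` AAPs, each a corrupted copy of a randomly chosen AMP."""
    return [corrupt(rng.choice(amps), theta, rng) for _ in range(count)]
```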
The total number of AAPs is defined by the user, and from it we construct the training and test sets: the training set contains 90% of all AAPs and the test set the remaining 10%.
To evaluate the algorithm PDM(), we use part of the original AAPs to test prediction accuracy. Assume that there are n original AAPs, of which x (x < n) are used to discover AMPs with eHPC1(). The remaining n − x paths form the test set, which is used to evaluate the accuracy of PDM(). The accuracy is defined as

$$\mathrm{Accuracy} = \frac{\#\,\text{paths predicted correctly}}{\#\,\text{test data}} \times 100\,(\%)$$
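A short sketch of the 90%/10% split and of the accuracy measure follows. What counts as a "correct" prediction is not spelled out in the text; here we assume, hypothetically, that a test path is correct when its predicted continuation equals the actual one exactly. The function names are illustrative.

```python
import random


def split_train_test(aaps, train_ratio=0.9, rng=None):
    """Shuffle the AAPs and split them into training (90%) and test (10%) sets."""
    if rng is None:
        rng = random.Random()
    shuffled = list(aaps)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]


def accuracy(predictions, truths):
    """Percentage of test paths whose predicted continuation matches the actual one."""
    if not truths:
        raise ValueError("empty test set")
    correct = sum(1 for p, t in zip(predictions, truths) if p == t)
    return 100.0 * correct / len(truths)
```

With the default of 600 trajectories, this split yields 540 training paths and 60 test paths.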
TABLE I. PARAMETERS AND DEFAULT VALUES USED IN THE EXPERIMENTS

Symbol | Description | Default value
r | Maximal number of representatives of each cluster | 1
l | Average length of each AAP | 6
c | Total number of collected trajectories | 600
B. Execution Time Assessment of the Algorithm eHPC1
We evaluate the running time of the algorithm with respect to two factors: the average path length and the number of paths.
1) Impact of Path Length
In the first experiment, we examine how the average length of the AAPs affects the performance of the algorithm. We fix the number of paths at 600 and vary the average path length.
Figure 4 shows that the execution time of eHPC1 increases with the average length l of the trajectories. This is consistent with the time complexity of eHPC1, since computing the Hamming distance between two paths takes time proportional to their length.
2) Impact of the Number of Mobility Paths
In this test, we investigate the effect of the number of paths, setting the average path length of the data sets to its default value (l = 6).
Figure 5 shows that the running time of eHPC1 also increases as the number c of mobility paths increases. The running time grows in a way similar to that observed for the path length l; however, the slope of the running-time curve in Figure 5 is steeper than in Figure 4. Therefore, eHPC1 is more strongly affected by the number of paths, and its running time increases sharply when the number of paths is very large.
In conclusion, eHPC1 works well when neither the number of paths nor the trajectory length is too large.
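A rough cost estimate may help explain why the number of paths matters more than their length. Assume a naive implementation of Algorithm 1 (our assumption, not a complexity analysis given by the authors) with n paths of length l: each iteration scans all pairs of current representatives, each Hamming distance costs O(l), and recomputing the medoid of a merged cluster of size s_{n'} ≤ n costs O(s_{n'}^2 l), where n' is the number of clusters before the merge:

```latex
T(n, l) \;=\; \sum_{n'=k+1}^{n} \Big( \binom{n'}{2}\, l \;+\; s_{n'}^{2}\, l \Big)
\;\le\; \sum_{n'=1}^{n} \big( n'^{2}\, l + n^{2}\, l \big)
\;=\; O\!\left(n^{3}\, l\right)
```

Under this estimate the running time grows linearly in l but cubically in n, which is consistent with the steeper curve in Figure 5 than in Figure 4.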
3) Accuracy of the Mobility Prediction Algorithm PDM with Respect to Deviation
Finally, we examine the effect of deviation on the accuracy of PDM, measured using Equation (3).
Even when animals follow the same daily paths, there are always some temporal and/or spatial differences among data samples. In our study, describing animal positions by cells is a way to deal with this uncertainty: when each AMP is generated as a random walk (Section V.A), many different positions of an animal can be chosen at random within a cell, but all positions in a cell are always identified by the fixed ID of that cell.

Fig. 4. Running time as a function of the average path length l
Fig. 5. Running time as a function of the number c of paths
Fig. 6. Accuracy of the algorithm PDM as a function of the deviation
In the last experiment, the position of the animal is varied according to a random parameter, the deviation θ. As shown in Figure 6, the prediction accuracy is very high, about 100%, when the deviation is zero. However, this is not a realistic case, because samples taken at different times always deviate from each other to some extent. A realistic corruption value would be smaller than 20, which is the default value used in our experiments. As the deviation increases, the accuracy of PDM decreases. Therefore, PDM works well when animals follow similar movements, and its effectiveness degrades as the randomness increases.
In brief, our algorithms work well when the average length of animal paths is not too long and the number of trajectories is not too large. We plan to improve their performance by taking the density of the data into account and removing clusters that contain very few paths.
VI. CONCLUSION
Knowing the movement direction of animals helps us design the WSN topology more appropriately. Concretely, the sensor nodes, especially the relay nodes and base stations, can be organized or re-tasked so that unnecessary nodes can be removed. At the same time, the power consumption of the nodes in use is optimized while the animals can still be observed efficiently. To address this problem, we presented an extended hierarchical clustering method, eHPC1, for mining animal movement patterns from the animal actual paths (AAPs) collected by a deployed wireless sensor network. To forecast the movement direction of an animal, we proposed the algorithm PDM, which takes as input the clusters (AMPs) discovered by eHPC1. Given the trajectory of an animal up to the current moment, we predict its next positions by finding the best-matching AMP. However, the distance between the best match and the current trajectory must be smaller than a predefined threshold (maxdistance); otherwise, the prediction would be meaningless. Finally, a simulator was implemented to generate the mobility paths of the animals, and the performance of the proposed algorithms was evaluated in terms of running time and prediction accuracy.
In this work, we have not yet taken the time domain of the application into account; we assumed that the AAPs were collected over one season of a year. However, with the rapid development of hardware technology, we are able to collect AAPs continuously and seamlessly for more than a year without problems related to sensor batteries. In ongoing work, we are classifying AAPs by specific time periods, for example month or season, because animals may change their movement habits when looking for food or a mate, or when migrating. Accordingly, different sets of AAPs should be associated with different time intervals, and we will extend our algorithms to associate a time period with each mobility pattern. In addition, the clustering algorithm will be extended to the case in which the number of clusters is not known in advance.
ACKNOWLEDGMENT
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2008-0062611).
REFERENCES
[1] J. Han and M. Kamber, Data Mining: Concepts and Techniques, 2nd ed., Morgan Kaufmann, 2006.
[2] S. Gaffney, A. Robertson, P. Smyth, S. Camargo, and M. Ghil, "Probabilistic Clustering of Extratropical Cyclones Using Regression Mixture Models," Technical Report UCI-ICS 06-02, University of California, Irvine, 2007.
[3] B. Liang and Z. Haas, "Predictive Distance-Based Mobility Management for PCS Networks," in Proc. IEEE Conference on Computer and Communications, pp. 1377-1384, 1999.
[4] A. X. Liu, K. Shen, and E. Torng, "Large Scale Hamming Distance Query Processing," in Proc. IEEE 27th International Conference on Data Engineering (ICDE), pp. 553-564, 2011.
[5] S. Rajagopal, R. B. Srinivasan, R. B. Narayan, and X. B. C. Petit, "GPS-Based Predictive Resource Allocation in Cellular Networks," in Proc. IEEE International Conference, pp. 229-234, 2002.
[6] J. L. Patino Vilchis and F. Bremond, "Incremental Learning on Trajectory Clustering," in Intelligent Paradigms in Safety and Security, Springer-Verlag, 2010.
[7] J.-M. Zogg, "GPS Basics: Introduction to the System, Application Overview," Switzerland, 2002.
[8] L. Mottola, "Programming Wireless Sensor Networks: Fundamental Concepts and State of the Art," ACM Computing Surveys, 2011.
[9] K. Heurtefeux, F. Maraninchi, and F. Valois, "AreaCast: A Communication by Area in Wireless Sensor Networks," Verimag Research Report, 2011.
[10] D. Katsaros, A. Nanopoulos, M. Karakaya, G. Yavas, Ö. Ulusoy, and Y. Manolopoulos, "Clustering Mobile Trajectories for Resource Allocation in Mobile Environments," in Proc. Intelligent Data Analysis (IDA), pp. 319-329, 2003.
[11] D. V. S. Shalini, M. Shashi, and A. M. Sowjanya, "Mining Frequent Patterns of Stock Data Using Hybrid Clustering," in Proc. Annual IEEE India Conference (INDICON), pp. 1-4, 2011.
[12] A. Krishnamurthy, S. Balakrishnan, M. Xu, and A. Singh, "Efficient Active Algorithms for Hierarchical Clustering," in Proc. International Conference on Machine Learning (ICML), 2012.
[13] Y. Guo, P. Corke, G. Poulton, T. Wark, G. Bishop-Hurley, and Swain, "Animal Behaviour Understanding Using Wireless Sensor Networks," in Proc. 31st IEEE Conference on Local Computer Networks, Florida, USA, 2006.
[14] K. Ren and J. Karlsson, "Animal Tracking in Multi-Modal Wireless Sensor Networks," SNCNW, 2012.
[15] A. Raha, S. Maity, M. K. Naskar, O. Alfandi, and D. Hogrefe, "An Optimal Sensor Deployment Scheme to Ensure Multi Level Coverage and Connectivity in Wireless Sensor Networks," in Proc. 8th International Wireless Communications & Mobile Computing Conference, 2012.
iCAST 2013
Technical Program
UMEDIA 2013 Technical Program