DSpace at VNU: A novel clustering method for animal trajectory analysis using Wireless Sensor Network


A Novel Clustering Method for Animal Trajectory Analysis using Wireless Sensor Network

Quang Hiep Vu
Database/Bioinformatics Laboratory, Chungbuk National University, Cheongju, Korea
hiep88@dblab.chungbuk.ac.kr

Meijing Li
Database/Bioinformatics Laboratory, Chungbuk National University, Cheongju, Korea
mjlee@dblab.chungbuk.ac.kr

Thi Hong Nhan Vu
Human Machine Interaction Laboratory, UET, Vietnam National University, Hanoi, Vietnam
vthnhan@gmail.com

Keun Ho Ryu
Database/Bioinformatics Laboratory, Chungbuk National University, Cheongju, Korea
khryu@dblab.chungbuk.ac.kr

Abstract—Animals play an important role on Earth, and studying their movements helps us conserve rare and precious species as well as explore food resources. In this paper, we employ Wireless Sensor Networks (WSNs), which offer the potential for greatly increased spatial and temporal resolution of measurement data. WSNs therefore promise enhanced tracking of animals without human intervention. To help experts make better species and habitat assessments as well as conservation strategies, we propose an Extended Hierarchical Path Clustering (eHPC1) method for analyzing the mobility of wild animals. A predictive mobility algorithm is also presented, which helps experts solve problems in data allocation and management. A system that simulates the mobility of animals is implemented. Finally, the performance of the proposed method is evaluated in terms of running time and estimation accuracy.

Keywords—Clustering methods, animal trajectory analysis,

wireless sensor network

I. INTRODUCTION

The animal kingdom is very large, with a variety of animals, big and small, found in water, in the air, and on land. They have different shapes and sizes. Animals are a natural resource that is not infinite, so we should do our best to conserve them. They need our care and love. To that end, animal tracking is very useful: it helps us understand how individuals and populations move within local areas, migrate across oceans and continents, and evolve through millennia. This information is being used to address environmental challenges such as climate and land use change, biodiversity loss, invasive species, and the spread of infectious diseases.

Wireless Sensor Networks (WSNs) provide an advanced solution for tracking animals. A WSN is composed of relay nodes, sensor nodes, and base stations. Cellular networks can also be used, considering the difficulty of achieving the necessary radio range coverage. A WSN can report precise animal locations and movements. Using the Received Signal Strength Indicator (RSSI), trilateration can be applied to locate the animals, and GPS receivers attached to the animals give accurate position information that can be stored on the sensor node. The sensing data are transmitted from the distributed relay nodes to the base stations, which can use satellites or cellular networks to forward the data to the researcher [7, 8]. In this research, we assume that the animals always move within the coverage region of the WSN and that monitoring devices can be installed in that region.

In this paper, we employ WSNs to track animals. WSN technology can be more effective than other technologies for obtaining the required information, with a considerable reduction in the intervention of the researcher. The volume of automatically collected data is enormous; it is analyzed and used in long-term decision making [13, 14, 15]. Understanding the movement patterns of animals helps us devise strategies for rare wildlife conservation and efficient food exploration. For that purpose, many data analysis techniques have been proposed, and clustering is one of them. Conventional methods such as K-means and DBSCAN cannot be directly applied to discover hidden trajectory patterns, since they were originally proposed for objects in the form of points, not for objects in the form of time series [1]. The algorithm HierCluster was recently proposed in [10] for finding clusters of user trajectories collected from cell phones in cellular networks. That work uses the edit distance to determine the similarity between two trajectories. However, calculating edit distances for hundreds of sequences, which is often the case, is extremely inefficient.

In this paper, we introduce an Extended Hierarchical Path Clustering (eHPC1) method for the mobility paths of animals. Similar to HierCluster, eHPC1 works in a bottom-up fashion, but the similarity or dissimilarity of clusters is determined by the Hamming distance. The closest clusters are merged until the number of clusters equals a predefined threshold. In addition, in the management of animal mobility, we wish to know the movement direction of an animal in advance. To this end, we introduce an algorithm for Prediction of Directional Movement (PDM).

Finally, a simulator system for animal mobility is developed. The performance of eHPC1 is evaluated with respect to the length of the trajectories and the number of objects. The prediction accuracy of the algorithm PDM is also assessed based on the deviation between moving points. The mobility patterns as well as the predicted positions can be used in animal management applications.

The rest of the paper is organized as follows. Section II reviews work related to clustering methods, followed by an animal path model in Section III. Section IV explains the algorithms for finding clusters of mobility paths and for predicting positions. Section V shows the experimental results. Conclusions and future work are presented in Section VI.

II. RELATED WORK

Fast mining of information from a data warehouse is always a significant issue in data analysis. A variety of methods have been developed for this purpose, and clustering is one of them. Clustering is a way of grouping a set of physical or abstract objects into classes of similar objects.

A great many clustering approaches are available, and each of them may group a dataset differently. In general, clustering methods may be divided into two categories based on the cluster structure they produce: hierarchical clustering and partitioning clustering [1, 11, 12]. Partitioning methods (K-means, bisecting K-means, PAM, DBSCAN) produce mutually exclusive classes, whereas the less common clumping methods allow overlap. Each object is a member of the cluster to which it is most similar; however, the similarity threshold has to be defined.

The hierarchical approaches can be divided into agglomerative and divisive methods [1, 11]. Divisive (top-down) methods begin with a single cluster that contains all the sample data. This cluster is then split into two or more clusters with higher dissimilarity between them, and splitting continues until the number of clusters specified by the user is obtained. In contrast, in agglomerative (bottom-up) methods the hierarchy is built up in a series of N-1 agglomerations, or fusions, of pairs of objects, beginning with the un-clustered dataset. For N samples, agglomerative algorithms begin with N clusters, each containing a single sample or point. The two most similar clusters are then merged repeatedly until the number of clusters becomes one or reaches the number specified by the user. In this research, we extend the latter approach to moving animals.

Previous methods have mainly dealt with the clustering of point data. Recent improvements in WSNs and tracking facilities have made it possible to collect a large amount of path data of moving objects, and there is increasing interest in analyzing these path data. A typical data analysis task is to find objects that have moved in a similar way. An efficient clustering algorithm for paths is therefore essential for such tasks. The works in [3, 6] proposed model-based clustering algorithms for paths.

Recently, there has been a lot of research on mobility management. Compared to the amount of work on location update, little has been done in the area of mobility prediction. The existing works have the following weaknesses:

To collect such information, most of the works [3, 5, 6] use highly sophisticated and expensive tools such as GPS, whose very frequent readings drain battery power faster and which cannot re-task the network.

The works in [2, 3, 5] assume that the mobility patterns are already available. These patterns are then used for mobility prediction, and no attempt is made to find the mobility patterns themselves. In addition, prediction is based on the probability distribution of the speed and direction of the objects.

This paper studies a path clustering method that uses previously collected data. The algorithm builds on the idea of the hierarchical clustering approach HierCluster in [10], which employs the edit distance between two strings. The edit distance is the minimum number of label changes, insertions, and deletions needed to map one string to another. Unfortunately, calculating edit distances for hundreds of sequences, which is often the case, is extremely inefficient. To solve this problem, we apply the Hamming distance [4], a measure that is more appropriate for comparing series of labels associated with timestamps. The trajectory patterns discovered are then used for the problem of predictive mobility.
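The contrast between the two measures can be illustrated with a minimal Python sketch. The function names and the two sample cell sequences are hypothetical and are not taken from [10] or [4]. The edit distance needs a dynamic-programming table whose size is the product of the two sequence lengths, while the Hamming distance is a single pass over time-aligned positions.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of substitutions, insertions, deletions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def hamming_distance(a, b):
    """Number of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical cell-ID sequences: path_b is path_a shifted by one time step.
path_a = [1, 2, 3, 4, 5, 6]
path_b = [2, 3, 4, 5, 6, 7]
print(edit_distance(path_a, path_b))     # 2
print(hamming_distance(path_a, path_b))  # 6
```

The example also shows why the two measures answer different questions: the edit distance treats the shifted path as nearly identical, whereas the Hamming distance compares what happened at the same timestamps and reports that the two animals were never in the same cell at the same time.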

III. ANIMAL MOBILITY MODEL

In this paper we assume that the animals move in a space in which a wireless sensor network (WSN) is installed. The coverage region of the WSN is partitioned into smaller areas called cells. In each cell of the WSN there is a base station (BS), which is capable of broadcasting and receiving information. The base stations are connected to each other via a fixed wired network. Each base station receives the sensing data from the distributed relay nodes.

The coverage area consists of a number of location areas. Each location area may consist of one or more cells, but in our work we assume that each location area consists of only one cell. Base stations regularly broadcast the ID of the cell in which they are located. Therefore, the animals in a cell are picked up by listening to the broadcast channel that transmits the signal. The movement of an animal from one cell to another is recorded in a database called the home location register. In addition, every base station keeps a database in which the profiles of the animals located in its cell are recorded. This database is called the visitor location register. In our system it is therefore possible to obtain the movement history of an animal from the logs in its home location register.


The mobility path of an animal is defined in the form Tr = <(id1, t1), (id2, t2), …, (idk, tk)>, where (idk, tk) denotes the ID number of the cell that the animal enters at timestamp tk. In this recording, two consecutive ID numbers must clearly be the ID numbers of two neighboring cells in the network [8, 9].

We call the original data recorded from the WSN the Animal Actual Paths (AAPs). They are considered a valuable source of information because the mobility of the animals contains both regular and random patterns. Based on the AAPs, we may therefore be able to extract the regular patterns. If needed, the future movement direction of an animal can be estimated based on the mobility patterns.

We assume that an AAP is represented as Tr = (p1, p2, …, pn), in which each pi is a moving point, as shown in Fig. 1. A moving point is represented by a tuple pi = (xi, yi, ti), in which ti is the timestamp at which the point (xi, yi) is sampled.

Owing to the uncertainty of the mobility, and because we sometimes do not need to know the exact coordinates of the animal, we can transform the absolute position (xi, yi) into a relative position.

The smallest unit of the relative position of the animal is the cell of the area covered by the WSN. Accordingly, an AAP can be represented by a series of relative positions. To do that, the AAP is mapped onto the horizontal plane, which is divided into cells. Each cell c is a square, as shown in Figure 1.

As a result, the mobility path can be represented by Tr = (c1, c2, …, cn), in which ck denotes the ID of cell k in the coverage region.

In this paper we use the format Tr = (c1, c2, …, cn) to represent an animal path in the proposed algorithms, where ci is the label of the cell on the map in which the point pi falls.
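The mapping from sampled points to cell labels can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 60-unit cell size and the 15-column grid are borrowed from the simulation setting in Section V, while the row-major numbering of cell IDs and the function names are hypothetical choices not specified in the paper.

```python
def point_to_cell(x, y, cell_size=60, n_cols=15):
    """Return the ID of the square cell that contains the point (x, y).

    Assumption: cells are numbered row by row, starting from 0.
    """
    col = int(x // cell_size)
    row = int(y // cell_size)
    return row * n_cols + col


def to_cell_path(points, cell_size=60, n_cols=15):
    """Convert an AAP given as [(x, y, t), ...] into Tr = (c1, c2, ..., cn)."""
    return [point_to_cell(x, y, cell_size, n_cols) for x, y, _t in points]


# Three hypothetical samples (x, y, t) of one animal.
points = [(10, 10, 0), (70, 20, 1), (130, 65, 2)]
print(to_cell_path(points))  # [0, 1, 17]
```

One label is produced per sampled point, so two paths sampled at the same timestamps yield label sequences of equal length.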

We call the frequently followed mobility paths Animal Mobility Patterns (AMPs). Understanding AMPs helps us understand the mobility rules of the animals. This is useful in making decisions about food resource allocation for animals as well as about conservation and exploitation strategies. In addition, we sometimes wish to estimate the movement direction of an animal based on the mobility rules when we know its trajectory up to the current moment. We can predict the next inter-cell movement of the animal by matching its actual current path to one of the existing mobility patterns.

IV. APPROACH TO CLUSTERING ANIMAL MOVING PATHS AND MOVEMENT PREDICTION

This section presents a method for clustering animal trajectories that extends the hierarchical clustering approach. The mobility patterns discovered are then applied to estimate the directional movement of the animals.

A. Method for clustering animal trajectories with a predefined number of clusters

The idea of our algorithm is based on hierarchical agglomerative clustering (HAC); in other words, it works in a bottom-up fashion. Each cluster AMP of animal mobility paths has a representative rep_AMP. The representative is the path that has the minimum total distance to the rest of the paths in the same group. Figure 2 shows the distance between a pair of paths.

After mapping the two paths onto the plane, we obtain a series of labels for each path. To determine the distance between two sequences, the Hamming distance is used. Assume we have two sequences:

$$T_a = \{\langle c_1, t_1\rangle, \langle c_2, t_2\rangle, \ldots, \langle c_n, t_n\rangle\} \quad \text{and} \quad T_b = \{\langle d_1, t_1\rangle, \langle d_2, t_2\rangle, \ldots, \langle d_n, t_n\rangle\}$$

The distance between T_a and T_b is determined by the following equation (we assume the sequences are compared at the same timestamps):

$$\mathrm{hammingDistance}(T_a, T_b) = \sum_{i=1}^{n} \delta(c_i, d_i) \qquad (1)$$

where

$$\delta(c_i, d_i) = \begin{cases} 0 & \text{if } c_i = d_i \\ 1 & \text{otherwise} \end{cases} \qquad (2)$$

Applying the Hamming distance to the pairs of paths in Figure 2, we obtain hammingDistance(A, B) = 6 because the two paths have no labels in common, and hammingDistance(A, C) = 4 because they have 2 labels in common.
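As a worked check of the two values quoted for Figure 2, the following Python sketch computes the Hamming distance over timestamped label sequences. The concrete cell labels of paths A, B, and C are hypothetical, since the figure itself is not reproduced here; they are chosen only so that A and B share no labels while A and C share two. The function name is likewise an assumption.

```python
def hamming_distance(ta, tb):
    """Hamming distance between two paths given as [(cell_id, timestamp), ...].

    Both paths must be sampled at the same timestamps.
    """
    if len(ta) != len(tb):
        raise ValueError("paths must have the same length")
    distance = 0
    for (c, t1), (d, t2) in zip(ta, tb):
        if t1 != t2:
            raise ValueError("paths must be compared at the same timestamps")
        distance += c != d  # a mismatch contributes 1, a match contributes 0
    return distance


# Hypothetical label sequences of length 6, sampled at t = 1..6.
A = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)]
B = [(7, 1), (8, 2), (9, 3), (10, 4), (11, 5), (12, 6)]
C = [(1, 1), (2, 2), (9, 3), (10, 4), (11, 5), (12, 6)]

print(hamming_distance(A, B))  # 6
print(hamming_distance(A, C))  # 4
```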

Fig. 1. Mobility path of an animal in 3D space.

Fig. 2. The distance between a pair of paths.

Algorithm eHPC1()
Input:
  + D: A set consisting of n animal actual paths AAPs
  + k: The number of clusters AMPs
Output: k clusters of animal paths
Method:
Begin
  Step 1: Initialize clusters
    Create n clusters AMPs with their corresponding n mobility paths AAPs in D as their representatives
  Step 2: Repeat merging the closest clusters
    while (n > k) {
      + (AMPi, AMPj) ← the two clusters, i ≠ j, whose hammingDistance(AMPi.rep, AMPj.rep) is minimum among the existing clusters;
      // Merge the two closest clusters
      + AAPs of AMP’ ← AAPs of AMPi ∪ AAPs of AMPj;
      + Calculate the representative path of the cluster AMP’;
      + n = n - 1;
    }
  return k clusters with their representative paths;
End

Algorithm 1. Extended Hierarchical Path Clustering algorithm eHPC1() with a predefined number k of clusters

Algorithm 1, eHPC1(), explains the mechanism for clustering the animal trajectories. It takes as input a set D of n animal actual paths and a predefined number k of clusters.

Initially, every single animal actual path (AAP) forms a cluster AMP by itself, and this single element also serves as its representative.

At each iteration of eHPC1(), the two closest clusters AMPi and AMPj are merged to form a new cluster AMP’ (i.e., the AAPs of AMP’ are the union of the AAPs of AMPi and the AAPs of AMPj). After each merge operation, the representative of the new cluster AMP’ must be determined.

The merge operation is carried out repeatedly until the number of AMPs reaches the predefined threshold k.
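The steps above can be sketched in Python as follows. This is a minimal, unoptimized illustration rather than the authors' implementation: the dictionary layout of a cluster, the helper names, and the tie-breaking (the first minimal pair found wins) are assumptions, and all paths are assumed to have equal length so that the Hamming distance is defined.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length cell sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def representative(paths):
    """Path with the minimum total distance to the other paths of the cluster."""
    return min(paths, key=lambda p: sum(hamming(p, q) for q in paths))


def ehpc1(paths, k):
    """Merge the two clusters with the closest representatives until k remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Step 1: every path starts as its own cluster and its own representative.
    clusters = [{"paths": [p], "rep": p} for p in paths]
    # Step 2: repeatedly merge the closest pair of clusters.
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: hamming(clusters[ij[0]]["rep"], clusters[ij[1]]["rep"]),
        )
        merged_paths = clusters[i]["paths"] + clusters[j]["paths"]
        merged = {"paths": merged_paths, "rep": representative(merged_paths)}
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


# Four hypothetical paths that form two obvious groups.
paths = [(1, 2, 3, 4), (1, 2, 3, 5), (9, 8, 7, 6), (9, 8, 7, 5)]
for cluster in ehpc1(paths, k=2):
    print(cluster["rep"], cluster["paths"])
```

Comparing only the representatives keeps each iteration to one distance computation per pair of clusters, at the cost of recomputing the representative after every merge.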

B. Algorithm for Prediction of Directional Movement

The mobility patterns AMPs returned by the algorithm eHPC1() can be used to estimate the path an animal is likely to follow in the near future.

An animal's next movement is predicted by finding the AMP that best matches the trajectory the animal has followed up to the current time. The best match is the one that has the minimum distance to the current trajectory. If more than one cluster matches, we randomly choose one of them for prediction. Figure 3 shows the predicted movement of an animal whose trajectory up to the current moment is Tr1. The path Tr1 (blue) is closest to the third cluster (rep3, black), so its future mobility is estimated based on the representative rep3.

Algorithm 2 shows the process of Prediction of the Directional Movement, PDM(), of an animal when its trajectory Tr1 up to the current moment is known. From the given set of mobility patterns AMPs, the one whose representative is closest to Tr1 is found, say the representative P = {P1, P2, …, Pm}. The next mobility path will be Tr2 = {Pm+1, Pm+2, …, Pm+r}.

Obviously, if the movement Tr1 is random and very different from the mobility rules the animal often follows, it is impossible to estimate the future mobility. To handle this case, a constraint maxdistance is used. The next movement of the animal can only be estimated if the distance between its path Tr1 and the closest cluster representative is less than that threshold.

Algorithm PDM()
Input:
  + S: Set of clusters AMPs
  + m: Length of current trajectory Tr1
  + r: Length of the next moving path
  + Distance threshold: maxdistance
Output: Future path Tr2 of the animal
Method:
Begin
  Step 1: Find the pattern AMPu whose representative rep is closest to Tr1
    For (i ∈ set of patterns AMPs)
      u ← minimum hammingDistance(AMPi.rep, Tr1);
  Step 2:
    If (Distance(Tr1, u) < maxdistance)
      // Predict the future movement of Tr1
      Tr2 ← (um+1 … um+r), (r is the length of the future movement of Tr1)
    Else
      // Can't predict
      Tr2 ← φ;
  Return Tr2;
End

Algorithm 2. Predicting the future mobility path of the animal
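A minimal Python sketch of the prediction step is given below. It is an interpretation under stated assumptions, not the authors' code: the current trajectory Tr1 of length m is compared with the first m cells of each representative, representatives no longer than m are skipped because they cannot supply future cells, and ties are broken at random as described in the text. The function and parameter names are hypothetical.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length cell sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def pdm(representatives, tr1, r, maxdistance, rng=random):
    """Predict the next r cells of an animal whose path so far is tr1.

    Returns an empty list when no prediction can be made.
    """
    m = len(tr1)
    # Assumption: only representatives longer than Tr1 can supply future cells.
    candidates = [rep for rep in representatives if len(rep) > m]
    if not candidates:
        return []
    # Step 1: distance from Tr1 to the first m cells of every representative.
    distances = [hamming(rep[:m], tr1) for rep in candidates]
    best = min(distances)
    # Step 2: refuse to predict when Tr1 is too far from every pattern.
    if best >= maxdistance:
        return []
    ties = [rep for rep, d in zip(candidates, distances) if d == best]
    chosen = rng.choice(ties)  # random choice among equally close patterns
    return list(chosen[m:m + r])


# Two hypothetical cluster representatives.
reps = [(1, 2, 3, 4, 5, 6), (7, 8, 9, 10, 11, 12)]
print(pdm(reps, tr1=(1, 2, 3), r=2, maxdistance=2))     # [4, 5]
print(pdm(reps, tr1=(20, 21, 22), r=2, maxdistance=2))  # []
```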

V. EXPERIMENT AND RESULTS

This section presents the experimental settings as well as the performance evaluation of the proposed algorithms.

A. Simulation Setting

Fig. 3. Example of predicted directional movement based on clusters of mobility paths.

To assess the performance of the proposed algorithms, we first build a system that simulates the mobility of the animals. Without loss of generality, it is assumed that the animals travel on a 10 by 15 square-shaped network, which gives a total of 150 cells (the area of each cell corresponds to 60x60 pixels). To generate the animal actual paths (AAPs), a number of animal mobility patterns (AMPs) are first defined. The length of an AMP is drawn from a uniform distribution with average length l. Each AMP is taken as a random walk over the square network. The AAPs are then produced by a function that selects AMPs at random to be used for the generation of AAPs. We also use a corruption mechanism to distinguish the AAPs from their corresponding AMPs: we insert random cells between the consecutive cells of the AMP. To accomplish this step, we define a deviation ratio θ, which denotes the ratio of the number of such random cells to the number of cells in the corresponding AMP.
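The generation procedure can be sketched as follows. This is a hedged illustration only: the paper does not give the simulator's code, so the 4-neighbour random walk, the row-major cell numbering, the treatment of θ as a fraction between 0 and 1, and all function names are assumptions.

```python
import random


def neighbors(cell, n_rows=10, n_cols=15):
    """IDs of the cells that share an edge with `cell` on the grid."""
    row, col = divmod(cell, n_cols)
    out = []
    if row > 0:
        out.append(cell - n_cols)
    if row < n_rows - 1:
        out.append(cell + n_cols)
    if col > 0:
        out.append(cell - 1)
    if col < n_cols - 1:
        out.append(cell + 1)
    return out


def random_walk_amp(length, rng, n_rows=10, n_cols=15):
    """Generate one AMP as a random walk over neighbouring cells."""
    cell = rng.randrange(n_rows * n_cols)
    path = [cell]
    while len(path) < length:
        cell = rng.choice(neighbors(cell, n_rows, n_cols))
        path.append(cell)
    return path


def corrupt(amp, theta, rng, n_rows=10, n_cols=15):
    """Derive an AAP by inserting round(theta * len(amp)) random cells
    between consecutive cells of the AMP."""
    aap = list(amp)
    if len(aap) < 2:
        return aap
    for _ in range(round(theta * len(amp))):
        pos = rng.randrange(1, len(aap))  # never before the first or after the last cell
        aap.insert(pos, rng.randrange(n_rows * n_cols))
    return aap


rng = random.Random(0)            # seeded only to make the sketch repeatable
amp = random_walk_amp(6, rng)     # l = 6 is the default length in Table I
aap = corrupt(amp, theta=0.5, rng=rng)
print(amp)
print(aap)
```

Note that inserting cells makes an AAP longer than its AMP. How the simulator brings paths back to a common length before the Hamming distance is applied is not stated in the text, so the sketch leaves that step out.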

The total number of AAPs is defined by the user, and from this set we construct the training and test sets. The training set consists of 90% of the AAPs and the test set of the remaining 10%.

To evaluate the algorithm PDM(), we use several paths from the original AAPs to test its accuracy. Assume that there are n original AAPs, and that x of them (x < n) are selected to discover AMPs using the algorithm eHPC1(). The remaining (n - x) paths form the test set, which is used to evaluate the accuracy of the algorithm PDM(). The accuracy is determined by the following equation:

$$\text{Accuracy} = \frac{\#\,\text{paths predicted correctly}}{\#\,\text{test data}} \times 100\,(\%) \qquad (3)$$
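The split and the accuracy measure can be written down in a few lines of Python. This is a sketch under assumptions: a prediction is counted as correct only when the predicted path equals the actual continuation exactly, which the paper does not state explicitly, and the function names are hypothetical.

```python
def split_train_test(aaps, train_fraction=0.9):
    """Split the AAPs into x training paths and n - x test paths."""
    x = round(len(aaps) * train_fraction)
    return aaps[:x], aaps[x:]


def prediction_accuracy(predicted, actual):
    """Percentage of test paths whose predicted continuation is correct."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need one prediction per test path and at least one test path")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# With the default of 600 trajectories, 540 are used for training and 60 for testing.
train, test = split_train_test(list(range(600)))
print(len(train), len(test))  # 540 60

# Four hypothetical test paths, three of them predicted correctly.
predicted = [[4, 5], [9, 8], [], [2, 3]]
actual = [[4, 5], [9, 8], [1, 2], [2, 3]]
print(prediction_accuracy(predicted, actual))  # 75.0
```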

TABLE I. PARAMETERS AND THEIR DEFAULT VALUES USED IN THE EXPERIMENT

Symbol   Description                                             Default value
r        The maximal number of representatives of each cluster   1
l        The average length of each AAP                          6
c        The total number of trajectories collected              600

B. Execution Time Assessment of the Algorithm eHPC1

We evaluate the running time of the algorithm with respect to the factors on which it depends.

1) Impact of Path Length

In the first experiment, we examine the performance of the algorithm with regard to the average length of the AAPs. We fix the number of paths at 600 and vary the average path length.

Figure 4 indicates that the execution time of the algorithm eHPC1 increases as the average length l of the trajectories increases. This is consistent with the theoretical time complexity of eHPC1.

2) Impact of the number of mobility paths

In this test, we investigate the effect of the number of paths. In this case, we assign the default value (l = 6) to the average path length of the data sets.

Figure 5 shows that as the number c of mobility paths increases, the running time of the eHPC1 algorithm also increases.

Here, the running time of the algorithm eHPC1 varies in a similar way to the one observed in the experiment on the impact of the parameter l (path length). However, the gradient of the running time in Figure 5 is greater than that in Figure 4. eHPC1 therefore tends to be affected more strongly by the number of paths, and the running time increases sharply if the number of paths is too high.

In conclusion, the algorithm eHPC1 works well when neither the number of paths nor the trajectory length is too large.

3) Assessment of the accuracy of the mobility prediction algorithm PDM based on deviation

We finally examine the effect of deviation on the accuracy of the algorithm PDM using Equation (3).

To some extent there must be time and/or space differences among data samples, even when animals follow the same daily paths. In our study, describing the position of an animal by cells is a way to deal with this uncertainty: as in Section V.A, where each AMP is taken as a random walk, many positions of an animal can be chosen at random within a cell, but all positions in a cell are always identified by the fixed ID of that cell.

Fig. 4. Running time as a function of the average path length l.

Fig. 5. Running time as a function of the number c of paths.

Fig. 6. Accuracy of the algorithm PDM as a function of the deviation.

In the last experiment, the position of the animal is varied with respect to a random parameter, the deviation θ. As shown in Figure 6, the prediction accuracy is very high, about 100%, when the deviation approaches zero. However, this is not a realistic case, because samples taken at different times always deviate to some degree. A realistic corruption value would be smaller than 20, which is the default value used in our experiments. As the deviation increases, the accuracy of the algorithm PDM decreases. Therefore, the algorithm PDM works well when animals follow similar movements, and its effectiveness degrades as the randomness increases.

In brief, our algorithms work well when the average length of the animal paths is not too long and the number of trajectories is not too large. We will improve their performance by taking the density of the data into account and by discarding clusters that contain very few paths.

VI. CONCLUSION

Awareness of the animal movement direction helps us establish the WSN topology more appropriately. Concretely, the sensor nodes in a network, especially the relay nodes and base stations, can be organized or re-tasked so that unnecessary nodes are eliminated. In the meantime, the power consumption of the nodes in use is optimized while efficient visualization of the animals is still guaranteed. To address this problem, in this study we presented an extended hierarchical clustering method, eHPC1, for mining animal movement patterns from the animal actual paths (AAPs) collected by a deployed wireless sensor network. To forecast the movement direction of an animal, we proposed the algorithm PDM, which takes as its input the clusters AMPs discovered by eHPC1. For a given trajectory of an animal up to the current moment, we can predict the next moving positions by finding the best matching AMP. Nonetheless, the distance between the best match and the current trajectory must be smaller than a predefined distance threshold; otherwise, the prediction would be meaningless. Finally, a simulator was implemented to generate the mobility paths of the animals, and the performance of the proposed algorithms was evaluated in terms of running time and prediction accuracy.

In this work, we have not yet taken the time domain of the application into account. We assumed that the AAPs were collected over one season of a year. However, with the rapid development of hardware technology, we are able to collect AAPs continuously and seamlessly for over a year without any problem related to sensor batteries. In ongoing work, we are classifying AAPs according to specific time periods, for example by month or season, because animals might change their movement habits to find food or a mate, or to migrate. Accordingly, different sets of AAPs should be associated with different time intervals. We will then improve our algorithms to associate a time period with each mobility pattern. In addition, the clustering algorithm will be upgraded for the case in which the number of clusters is not known in advance.

ACKNOWLEDGMENT

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2008-0062611).

REFERENCES

[1] Han, J. and Kamber, M., “Data Mining: Concepts and Techniques,” 2nd ed., Morgan Kaufmann, 2006.

[2] Gaffney, S., Robertson, A., Smyth, P., Camargo, S., and Ghil, M., “Probabilistic Clustering of Extratropical Cyclones Using Regression Mixture Models,” Technical Report UCI-ICS 06-02, University of California, Irvine, 2007.

[3] B. Liang and Z. Haas, “Predictive Distance-Based Mobility Management for PCS Networks,” IEEE Conference on Computer and Communications, pp. 1377-1384, 1999.

[4] Alex X. Liu, Ke Shen, Eric Torng, “Large scale Hamming distance query processing,” Data Engineering (ICDE), IEEE 27th International Conference, pp. 553-564, 2011.

[5] S. Rajagopal, R.B. Srinivasan, R.B. Narayan, and X.B.C. Petit, “GPS-Based Predictive Resource Allocation in Cellular Networks,” IEEE International Conference, pp. 229-234, 2002.

[6] Jose Luis Patino Vilchis, François Bremond, “Incremental learning on trajectory clustering,” Intelligent Paradigms in Safety and Security, Springer-Verlag, 2010.

[7] Jean-Marie Zogg, “GPS Basics - Introduction to the system, Application overview,” Switzerland, 2002.

[8] Luca Mottola, “Programming wireless sensor networks: Fundamental concepts and state of the art,” ACM Computing Surveys, USA, 2011.

[9] Karel Heurtefeux, Florence Maraninchi, Fabrice Valois, “AreaCast: a Communication by Area in Wireless Sensor Networks,” Verimag Research Report, 2011.

[10] Dimitrios Katsaros, Alexandros Nanopoulos, Murat Karakaya, Gökhan Yavas, Özgür Ulusoy, Yannis Manolopoulos, “Clustering Mobile Trajectories for Resource Allocation in Mobile Environments,” Intelligent Data Analysis - IDA, pp. 319-329, 2003.

[11] Shalini, D.V.S., Shashi, M., Sowjanya, A.M., “Mining frequent patterns of stock data using hybrid clustering,” India Conference (INDICON), Annual IEEE, pp. 1-4, 2011.

[12] Akshay Krishnamurthy, Sivaraman Balakrishnan, Min Xu, Aarti Singh, “Efficient Active Algorithms for Hierarchical Clustering,” International Conference on Machine Learning (ICML), 2012.

[13] Guo, Ying, Corke, Peter, Poulton, Geoff, Wark, Tim, Bishop-Hurley, Greg, Swain, “Animal behaviour understanding using wireless sensor networks,” 31st IEEE Conference on Local Computer Networks, Florida, U.S.A., 2006.

[14] Ren, Keni, Karlsson, Johannes, “Animal Tracking in Multi-Modal Wireless Sensor Networks,” SNCNW, 2012.

[15] Arnab Raha, Shovan Maity, Mrinal Kanti Naskar, Omar Alfandi, and Dieter Hogrefe, “An Optimal Sensor Deployment Scheme to Ensure Multi Level Coverage and Connectivity in Wireless Sensor Networks,” in 8th International Wireless Communications & Mobile Computing Conference, 2012.


iCAST 2013

Technical Program

UMEDIA 2013 Technical Program
