Data Analysis, Machine Learning and Applications (Episode 2, Part 1)

Fig. 2. Passing maneuver and corresponding HMM.

An HMM that describes this sequence could have three states, one for each step of the maneuver: q_0 = behind(R, R'), q_1 = left(R, R'), and q_2 = in_front_of(R, R'). The transition model of this HMM is depicted in Figure 2. It defines the allowed transitions between the states. Observe how the HMM specifies that when in the second state (q_1), that is, when the passing car is left of the reference car, it can only remain left (q_1) or move in front of the reference car (q_2). It is not allowed to move behind it again (q_0). Such a sequence would not be a valid passing situation according to our description.

A situation HMM consists of a tuple O = (Q, A, S), where Q = {q_0, ..., q_N} represents a finite set of N states, which are in turn abstract states as described in the previous section, A = {a_ij} is the state transition matrix, where each entry a_ij represents the probability of a transition from state q_i to state q_j, and S = {S_i} is the initial state distribution, where S_i represents the probability of state q_i being the initial state. Additionally, just as for the DBNs, there is also an observation model. In our case, this observation model is the same for every situation HMM, and will be described in detail in Section 4.1.
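As an illustration, the passing-maneuver HMM of Figure 2 can be written down directly in this form. The following sketch is only illustrative: the states match the text, but the nonzero transition probabilities are made-up placeholders, since no concrete parameters are reported here.

```python
import numpy as np

# Abstract states of the passing situation HMM (Figure 2):
# q0 = behind(R, R'), q1 = left(R, R'), q2 = in_front_of(R, R')
Q = ["behind(R,R')", "left(R,R')", "in_front_of(R,R')"]

# Transition matrix A: zeros encode the forbidden transitions,
# e.g. moving from left(R,R') back to behind(R,R').
A = np.array([
    [0.7, 0.3, 0.0],   # behind -> behind or left
    [0.0, 0.6, 0.4],   # left   -> left or in_front_of
    [0.0, 0.0, 1.0],   # in_front_of is absorbing
])

# Initial state distribution S: a passing maneuver must start behind.
S = np.array([1.0, 0.0, 0.0])
```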

4 Recognizing situations

The idea behind our approach to situation recognition is to instantiate at each time step new candidate situation HMMs and to track these over time. A situation HMM can be instantiated if it assigns a positive probability to the current state of the system. Thus, at each time step t, the algorithm keeps track of a set of active situation hypotheses, based on a sequence of relational descriptions.

The general algorithm for situation recognition and tracking is as follows. At every time step t:

1. Estimate the current state of the system x_t (see Section 2).
2. Generate relational representation o_t from x_t: From the estimated state of the system x_t, a conjunction o_t of grounded relational atoms with an associated probability is generated (see next section).
3. Update all instantiated situation HMMs according to o_t: Bayes filtering is used to update the internal state of the instantiated situation HMMs (a sketch follows after this list).


4. Instantiate all non-redundant situation HMMs consistent with o_t: Based on o_t, all situation HMMs are grounded, that is, the variables in the abstract states of the HMM are replaced by the constant terms present in o_t. If a grounded HMM assigns a non-zero probability to the current relational description o_t, the situation HMM can be instantiated. However, we must first check that no other situation of the same type and with the same grounding has an overlapping internal state. If this is the case, we keep the oldest instance, since it provides a more accurate explanation for the observed sequence.
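The update in step 3 is a standard Bayes filter over the abstract states of each instantiated HMM. A minimal sketch, assuming the NumPy arrays A and S from the earlier example and an observation likelihood vector computed as described in Section 4.1:

```python
def filter_step(belief, A, obs_lik):
    """One Bayes-filter update of an instantiated situation HMM.

    belief:  current distribution over the HMM's abstract states
    obs_lik: obs_lik[i] = P(o_t | q_i) for each abstract state q_i
    """
    predicted = belief @ A           # prediction through the transition model
    posterior = predicted * obs_lik  # correction by the observation model
    total = posterior.sum()
    if total == 0.0:
        return None                  # HMM no longer explains o_t
    return posterior / total
```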

4.1 Representing uncertainty at the relational level

At each time step t, our algorithm estimates the state x_t of the system. The estimated state is usually represented through a probability distribution which assigns a probability to each possible hypothesis about the true state. In order to be able to use the situation HMMs to recognize situation instances, we need to represent the estimated state of the system as a grounded abstract state using relational logic.

To convert the uncertainties related to the estimated state x_t into appropriate uncertainties at the relational level, we assign to each relation the probability mass associated to the interval of the state space that it represents. The resulting distribution is thus a histogram that assigns to each relation a single cumulative probability. Such a histogram can be thought of as a piecewise constant approximation of the continuous density. The relational description o_t of the estimated state of the system x_t at time t is then a grounded abstract state where each relation has an associated probability.

The probability P(o_t | q_i) of observing o_t while being in a grounded abstract state q_i is computed as the product of the matching terms in o_t and q_i. In this way, the observation probabilities needed to estimate the internal state of the situation HMMs and the likelihood of a given sequence of observations O_{1:t} = (o_1, ..., o_t) can be computed.
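Under the histogram representation above, this product is straightforward. A sketch, assuming (as a representation choice of this sketch, not of the paper) that o_t is stored as a mapping from grounded relations to probabilities and q_i as the set of relations defining the abstract state:

```python
def observation_likelihood(o_t, q_i):
    """P(o_t | q_i): product of the probabilities that the relational
    description o_t assigns to the relations required by state q_i."""
    p = 1.0
    for relation in q_i:
        p *= o_t.get(relation, 0.0)  # a missing relation contributes 0
    return p
```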

4.2 Situation model selection using Bayes factors

The algorithm for recognizing situations keeps track of a set of active situation hypotheses at each time step t. We propose to decide between models at a given time t using Bayes factors for comparing two competing situation HMMs that explain the given observation sequence. Bayes factors (Kass and Raftery (1995)) provide a way of evaluating evidence in favor of a probabilistic model as opposed to another one. The Bayes factor B_{1,2} for two competing models O_1 and O_2 is computed as the ratio of the likelihoods the two models assign to the observation sequence,

B_{1,2} = P(O_{t:t+n} | O_1) / P(O_{t:t+n} | O_2).    (1)


In order to use the Bayes factor as evaluation criterion, the observation sequence O_{t:t+n} which the models in Equation 1 are conditioned on must be the same for the two models being compared. This is, however, not always the case, since situations can be instantiated at any point in time. To solve this problem we adopt a solution used for sequence alignment in bio-informatics (Durbin et al. (1998)) and extend the situation model using a separate world model to account for the missing part of the observation sequence. This world model in our case is defined analogously to the bigram models that are learned from corpora in the field of natural language processing (Manning and Schütze (1999)). By using the extended situation model, we can use Bayes factors to evaluate two situation models even if they were instantiated at different points in time.
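The likelihoods entering Equation 1 can be accumulated for each instantiated HMM with a standard scaled forward pass; the log Bayes factor is then just the difference of two log-likelihoods. A minimal sketch under the same conventions as the earlier snippets:

```python
import math

def sequence_loglik(obs_liks, A, S):
    """Log-likelihood of an observation sequence under one situation HMM;
    obs_liks holds one per-state likelihood vector per time step."""
    alpha = S * obs_liks[0]
    loglik = math.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for lik in obs_liks[1:]:
        alpha = (alpha @ A) * lik    # predict, then correct
        loglik += math.log(alpha.sum())
        alpha = alpha / alpha.sum()  # rescale to avoid underflow
    return loglik

# log B_{1,2} = sequence_loglik(liks, A1, S1) - sequence_loglik(liks, A2, S2);
# positive values favor model 1, negative values favor model 2.
```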

5 Evaluation

Our framework was implemented and tested in a traffic scenario using a simulated 3D environment. TORCS - The Open Racing Car Simulator (Espié and Guionneau) was used as simulation environment. The scenario consisted of several autonomous vehicles with simple driving behaviors and one reference vehicle controlled by a human operator. Random noise was added to the pose of the vehicles to simulate uncertainty at the state estimation level. The goal of the experiments is to demonstrate that our framework can be used to model and successfully recognize different situations in dynamic multi-agent environments. Concretely, three different situations relative to a reference car were considered:

1. The passing situation corresponds to the reference car being passed by another car. The passing car approaches the reference car from behind, it passes it on the left, and finally ends up in front of it.
2. The aborted passing situation is similar to the passing situation, but the reference car is never fully overtaken. The passing car approaches the reference car from behind, it slows down before being abeam, and ends up behind it again.
3. The follow situation corresponds to the reference car being followed from behind by another car at a short distance and at the same velocity.

The structure and parameters of the corresponding situation HMMs were defined manually. The relations considered for these experiments were defined over the relative distance, position, and velocity of the cars.

Figure 3 (left) plots the likelihood of an observation sequence corresponding to a passing maneuver. During this maneuver, the passing car approaches the reference car from behind. Once at close distance, it maintains the distance for a couple of seconds. It then accelerates and passes the reference car on the left to finally end up in front of it. It can be observed in the figure how the algorithm correctly instantiated the different situation HMMs and tracked the different instances during the execution of the maneuver. For example, the passing and aborted passing situations were instantiated simultaneously from the start, since both situation HMMs initially describe the same sequence of observations.

Fig. 3. (Left) Likelihood of the observation sequence for a passing maneuver according to the different situation models, and (right) Bayes factor in favor of the passing situation model against the other situation models. Horizontal axis: time (s); right panel: passing vs. follow.

The follow situation HMM was instantiated, as expected, at the point where both cars were close enough and their relative velocity was almost zero. Observe too that at this point, the likelihood according to the passing and aborted passing situation HMMs starts to decrease rapidly, since these two models do not expect both cars to drive at the same speed. As the passing vehicle starts changing to the left lane, the HMM for the follow situation stops providing an explanation for the observation sequence and, accordingly, the likelihood starts to decrease rapidly until it becomes almost zero. At this point the instance of the situation is not tracked anymore and is removed from the active situation set. This happens since the follow situation HMM does not expect the vehicle to speed up and change lanes.

The Bayes factor in favor of the passing situation model compared against the follow situation model is depicted in Figure 3 (right). A positive Bayes factor value indicates that there is evidence in favor of the passing situation model. Observe that up to the point where the follow situation is actually instantiated the Bayes factor keeps increasing rapidly. At the time where both cars are equally fast, the evidence in favor of the passing situation model starts decreasing until it becomes negative. At this point there is evidence against the passing situation model, that is, there is evidence in favor of the follow situation. Finally, as the passing vehicle starts changing to the left lane, the evidence in favor of the passing situation model starts increasing again. Figure 3 (right) shows how Bayes factors can be used to make decisions between competing situation models.

6 Conclusions and further work

We presented a general framework for modeling and recognizing situations. Our approach uses a relational description of the state space and hidden Markov models to represent situations. An algorithm was presented to recognize and track situations in an online fashion. The Bayes factor was proposed as evaluation criterion between two competing models. Using our framework, many meaningful situations can be modeled. Experiments demonstrate that our framework is capable of tracking multiple situation hypotheses in a dynamic multi-agent environment.

References

ANDERSON, C. R., DOMINGOS, P. and WELD, D. A. (2002): Relational Markov models and their application to adaptive web navigation. Proc. of the International Conference on Knowledge Discovery and Data Mining (KDD).
COCORA, A., KERSTING, K., PLAGEMANN, C., BURGARD, W. and DE RAEDT, L. (2006): Learning Relational Navigation Policies. Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
COLLETT, T., MACDONALD, B. and GERKEY, B. (2005): Player 2.0: Toward a Practical Robot Programming Framework. In: Proceedings of the Australasian Conference on Robotics and Automation (ACRA 2005).
DEAN, T. and KANAZAWA, K. (1989): A Model for Reasoning about Persistence and Causation. Computational Intelligence, 5(3):142-150.
DURBIN, R., EDDY, S., KROGH, A. and MITCHISON, G. (1998): Biological Sequence Analysis. Cambridge University Press.
ESPIÉ, E. and GUIONNEAU, C.: TORCS - The Open Racing Car Simulator. http://torcs.sourceforge.net
FERN, A. and GIVAN, R. (2004): Relational sequential inference with reliable observations. Proc. of the International Conference on Machine Learning.
JEFFREYS, H. (1961): Theory of Probability (3rd ed.). Oxford University Press.
KASS, R. and RAFTERY, E. (1995): Bayes Factors. Journal of the American Statistical Association, 90(430):773-795.
KERSTING, K., DE RAEDT, L. and RAIKO, T. (2006): Logical Hidden Markov Models. Journal of Artificial Intelligence Research.
MANNING, C. D. and SCHÜTZE, H. (1999): Foundations of Statistical Natural Language Processing. The MIT Press.
RABINER, L. (1989): A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257-286.

Applying the Q_n Estimator Online

Robin Nunkesser¹, Karen Schettlinger² and Roland Fried²

¹ Department of Computer Science, Univ. Dortmund, 44221 Dortmund, Germany
Robin.Nunkesser@udo.edu
² Department of Statistics, Univ. Dortmund, 44221 Dortmund, Germany
{schettlinger,fried}@statistik.uni-dortmund.de

Abstract. Reliable automatic methods are needed for statistical online monitoring of noisy time series. Application of a robust scale estimator allows to use adaptive thresholds for the detection of outliers and level shifts. We propose a fast update algorithm for the Q_n estimator and show by simulations that it leads to more powerful tests than other highly robust scale estimators.

[...]tion of robust methods which are able to withstand some largely deviating values. However, many robust methods are computationally too demanding for real time application if efficient algorithms are not available.

Gather and Fried (2003) recommend Rousseeuw and Croux's (1993) Q_n estimator to measure the variability of the noise in robust signal extraction. The Q_n possesses a breakdown point of 50%, i.e. it can resist up to almost 50% large outliers without becoming extremely biased. Additionally, its Gaussian efficiency is 82% in large samples, which is much higher than that of other robust scale estimators: for example, the asymptotic efficiency of the median absolute deviation about the median (MAD) is only 36%. However, in an online application to moving time windows the MAD can be updated in O(log n) time (Bernholt et al. (2006)), while the fastest algorithm known so far for the Q_n needs O(n log n) time (Croux and Rousseeuw (1992)), where n is the width of the time window.

In this paper, we construct an update algorithm for the Q_n estimator which, in practice, is substantially faster than the offline algorithm and implies an advantage for online application. The algorithm is easy to implement and can also be used to compute the Hodges-Lehmann location estimator (HL) online. Additionally, we show by simulation that the Q_n leads to resistant rules for shift detection which have higher power than rules using other highly robust scale estimators. This better power can be explained by the well-known high efficiency of the Q_n for estimation of the variability.

Section 2 presents the update algorithm for the Q_n. Section 3 describes a comparative study of rules for level shift detection which apply a robust scale estimator for fixing the thresholds. Section 4 draws some conclusions.

For data x_1, ..., x_n, x_i ∈ R, and k = (⌊n/2⌋+1 choose 2), where ⌊a⌋ denotes the largest integer not larger than a, the Q_n scale estimator is defined as

σ̂(Q) = c_n(Q) · {|x_i − x_j|, 1 ≤ i < j ≤ n}_(k),

the kth order statistic of all pairwise absolute differences, corresponding to approximately the first quartile of all pairwise differences. Here, c_n(Q) denotes a finite sample correction factor for achieving unbiasedness for the estimation of the standard deviation σ at Gaussian samples of size n.

For online analysis of a time series x_1, ..., x_N, we can apply the Q_n to a moving time window x_{t−n+1}, ..., x_t of width n < N, always adding the incoming observation x_{t+1} and deleting the oldest observation x_{t−n+1} when moving the time window from t to t+1. Addition of x_{t+1} and deletion of x_{t−n+1} is called an update in the following.
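For reference, the offline definition translates directly into code. A naive O(n² log n) sketch; the constant 2.2219 is the usual asymptotic consistency factor for Gaussian data, and the finite-sample correction c_n(Q) is omitted here:

```python
from itertools import combinations

def qn_offline(x, c=2.2219):
    """Naive Q_n: k-th order statistic of all pairwise absolute
    differences, with k = binom(floor(n/2) + 1, 2)."""
    n = len(x)
    h = n // 2 + 1
    k = h * (h - 1) // 2
    diffs = sorted(abs(a - b) for a, b in combinations(x, 2))
    return c * diffs[k - 1]          # k-th smallest (1-based)
```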

It is possible to compute the Q_n as well as the HL estimator of n observations with an algorithm by Johnson and Mizoguchi (1978) in running time O(n log n), which has been proved to be optimal for offline calculation. An optimal online update algorithm therefore needs at least O(log n) time for insertion or deletion, respectively, since otherwise we could construct an algorithm faster than O(n log n) for calculating the Q_n from scratch. The O(log n) time bound was achieved for k = 1 by Bespamyatnikh (1998). For larger k - as needed for the computation of Q_n or the HL estimator - the problem gets more difficult and to our knowledge there is no online algorithm, yet. Following an idea of Smid (1991), we use a buffer of possible solutions to get an online algorithm for general k, because it is easy to implement and achieves a good running time in practice. Theoretically, the worst case amortized time per update may not be better than the offline algorithm, because k = O(n²) in our case. However, we can show that our algorithm runs substantially faster for many data sets.

Lemma 1. It is possible to compute the Q_n and the HL estimator by computing the kth order statistic in a multiset of form X + Y = {x_i + y_j | x_i ∈ X and y_j ∈ Y}.

Proof. For X = {x_1, ..., x_n}, k′ = (⌊n/2⌋+1 choose 2), and k = k′ + n + (n choose 2), we may compute the Q_n in the following way:

c_n(Q) · {|x_i − x_j|, 1 ≤ i < j ≤ n}_(k′) = c_n(Q) · {x_(i) − x_(n−j+1), 1 ≤ i, j ≤ n}_(k).

Therefore we may compute the Q_n by computing the kth order statistic in X + (−X). To compute the HL estimator ẑ = median{(x_i + x_j)/2, 1 ≤ i ≤ j ≤ n}, we only need to compute the median element in X/2 + X/2, following the convention that in multisets of form X + X exactly one of x_i + x_j and x_j + x_i appears for each i and j. □
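The identity in the proof can be checked numerically: with k = k′ + n + (n choose 2), the kth order statistic of X + (−X) skips the (n choose 2) negated pair differences and the n diagonal zeros, landing exactly on the k′-th smallest positive difference. A small sketch, agreeing with qn_offline above:

```python
def qn_via_sum_set(x, c=2.2219):
    """Q_n as the k-th order statistic of the multiset X + (-X) (Lemma 1)."""
    n = len(x)
    h = n // 2 + 1
    k_prime = h * (h - 1) // 2
    k = k_prime + n + n * (n - 1) // 2          # skip negatives and zeros
    sums = sorted(a - b for a in x for b in x)  # the multiset X + (-X)
    return c * sums[k - 1]
```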

To compute the kth order statistic in a multiset of form X + Y, we use the algorithm of Johnson and Mizoguchi (1978). Due to Lemma 1, we only consider the online version of this algorithm in the following.

[...] determined to be certainly smaller or certainly greater than this element, and parts of the matrix are excluded from further consideration according to a case differentiation. As soon as less than n elements remain for consideration, they are sorted and the sought-after element is returned. The algorithm may easily be extended to compute a buffer ℬ of size s of matrix elements b_{(k−⌊(s−1)/2⌋):n²}, ..., b_{(k+⌊s/2⌋):n²}.

To achieve a better computation time in online application, we use balanced trees, more precisely indexed AVL-trees, as the main data structure. Inserting, deleting, finding and determining the rank of an element needs O(log n) time in this data structure. We additionally use two pointers for each element in a balanced tree. In detail, we store X, Y, and ℬ in separate balanced trees and let the pointers of an element b_ij = x_(i) + y_(j) ∈ ℬ point to x_(i) ∈ X and y_(j) ∈ Y, respectively. The first and second pointer of an element x_(i) ∈ X point to the smallest and greatest element such that b_ij ∈ ℬ for 1 ≤ j ≤ n. The pointers for an element y_(j) ∈ Y are defined analogously.

Insertion and deletion of data points into the buffer ℬ correspond to the insertion and deletion of matrix rows or columns in B. We only consider insertions into and deletions from X in the following, because they are similar to insertions into and deletions from Y.

Deletion of element x_del:

1. Search in X for x_del and determine its rank i and the elements b_s and b_g pointed at.
2. Determine y_(j) and y_(ℓ) with the help of the pointers such that b_s = x_(i) + y_(j) and b_g = x_(i) + y_(ℓ).
3. Find all elements b_m = x_(i) + y_(m) ∈ ℬ with j ≤ m ≤ ℓ.
4. Delete these elements b_m from ℬ, delete x_del from X, and update the pointers accordingly.

5. Compute the new position of the kth element in ℬ.

Insertion of element x_ins:

1. Determine the smallest element b_s and the greatest element b_g in ℬ.
2. Determine with a binary search the smallest j such that x_ins + y_(j) ≥ b_s and the greatest ℓ such that x_ins + y_(ℓ) ≤ b_g.
3. Compute all elements b_m = x_ins + y_(m) with j ≤ m ≤ ℓ.
4. Insert these elements b_m into ℬ, insert x_ins into X, and update pointers to and from the inserted elements accordingly.
5. Compute the new position of the kth element in ℬ.

It is easy to see that we need a maximum of O(|deleted elements| · log n) and O(|inserted elements| · log n) time for deletion and insertion, respectively. After deletion and insertion we determine the new position of the kth element in ℬ and return the new solution, or recompute ℬ with the offline algorithm if the kth element is not in ℬ any more. We may also introduce bounds on the size of ℬ in order to maintain linear size and to recompute ℬ if these bounds are violated.
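The buffered scheme above needs the indexed AVL-trees with their pointer bookkeeping. A much simpler, asymptotically weaker alternative that illustrates the same add/delete bookkeeping is to keep the complete multiset of pairwise differences in one order-statistics structure; each update then costs O(n log n). A sketch, assuming the third-party sortedcontainers package:

```python
from collections import deque
from sortedcontainers import SortedList  # assumed dependency

class MovingQn:
    """Q_n over a moving window, maintained by updating the full
    multiset of pairwise absolute differences (no buffer)."""

    def __init__(self, width, c=2.2219):
        self.width = width
        self.c = c
        self.window = deque()
        self.diffs = SortedList()

    def update(self, x_new):
        """Delete the oldest observation if the window is full,
        then add the incoming one."""
        if len(self.window) == self.width:
            oldest = self.window.popleft()
            for y in self.window:
                self.diffs.remove(abs(oldest - y))
        for y in self.window:
            self.diffs.add(abs(x_new - y))
        self.window.append(x_new)

    def qn(self):
        n = len(self.window)
        if n < 2:
            raise ValueError("need at least two observations")
        h = n // 2 + 1
        k = h * (h - 1) // 2
        return self.c * self.diffs[k - 1]
```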

For the running time we have to consider the number of elements in the buffer that depend on the inserted or deleted element, and the amount the kth element may move in the buffer.

Theorem 1. For a constant signal with stationary noise, the expected amortized time per update is O(log n).

Proof. In a constant signal with stationary noise, data points are exchangeable in the sense that the rank of each data point in the set of all data points is equiprobable. Assume w.l.o.g. that we only insert into and delete from X. Consider for each rank i of an element in X the number of buffer elements depending on it, i.e. |{j : b_ij ∈ ℬ}|. With O(n) elements in ℬ and equiprobable ranks of the observations inserted into or deleted from X, the expected number of buffer elements depending on an observation is O(1). Thus, the expected number of buffer elements to delete or insert during an update step is also O(1), and the expected time we spend for the update is O(log n).

To calculate the amortized running time, we have to consider the number of times ℬ has to be recomputed. With equiprobable ranks, the expected amount the kth element moves in the buffer for a deletion and a subsequent insertion is 0. Thus, the expected time the buffer has to be recomputed is also 0 and, consequently, the expected amortized time per update is O(log n). □

2.2 Running time simulations

To show the good performance of the algorithm in practice, we conducted some running time simulations for online computation of the Q_n. The first data set for the simulations suits the conditions of Theorem 1, i.e. it consists of a constant signal with standard normal noise and an additional 10% outliers of size 8. The second data set is the same in the first third of the time period, before an upward shift of size 8 and a linear upward trend in the second third, and another downward shift of size 8 and a linear downward trend in the final third occur.

Fig. 2. Positions of ℬ in the matrix B for data set 1 (left) and 2 (right).

The reason to look at this data set is to analyze situations with shifts, trends and trend changes, because these are not covered by Theorem 1.

We analyzed the average number of buffer insertions and deletions needed for an update when performing 3n updates of windows of size n with 10 ≤ n ≤ 500. Recall that the insertions and deletions directly determine the running time. A variable number of updates assures similar conditions for all window widths. Additionally, we analyzed the position of ℬ over time, visualized in the matrix B, when performing 3000 updates with a window of size 1000.

We see in Figure 1 that the number of buffer insertions and deletions for the first data set seems to be constant as expected, apart from a slight increase caused by the 10% outliers. The second data set causes a stronger increase, but is still far from the theoretical worst case of 4n insertions and deletions.

Fig. 1. Insertions and deletions needed for an update with growing window size n.

Considering Figure 2 we gain some insight into the observed number of update steps. For the first data set, elements of ℬ are restricted to a small region in the matrix B. This region is recovered for the first third of the second data set in the right-hand side figure. The trends in the second data set cause ℬ to lie in an additional, even more concentrated diagonal region, which is even better for the algorithm. The cause for the increased running time is the time it takes to adapt to trend changes. After a trend change there is a short period in which parts of ℬ are situated in a wider region of the matrix B.

3 Comparative study

An important task in signal extraction is the fast and reliable detection of abrupt level shifts. Comparison of two medians calculated from different windows has been suggested for the detection of such edges in images (Bovik and Munson (1986), Hwang and Haddad (1994)). This approach has been found to give good results also in signal processing (Fried (2007)). Similar as for the two-sample t-test, an estimate of the noise variance is needed for standardization. Robust scale estimators like the Q_n can be applied for this task. Assuming that the noise variance can vary over time but is locally constant within each window, we calculate both the median and the Q_n separately from two time windows y_{t−h+1}, ..., y_t and y_{t+1}, ..., y_{t+k} for the detection of a level shift between times t and t+1. Let z̃_{t−} and z̃_{t+} be the medians from the two time windows, and σ̂_{t−} and σ̂_{t+} be the scale estimates for the left and the right window of possibly different widths h and k. An asymptotically standard normal test statistic in case of a (locally) constant signal and Gaussian noise with a constant variance is

T = (z̃_{t+} − z̃_{t−}) / √(0.5π (σ̂²_{t−}/h + σ̂²_{t+}/k)).

Critical values for small sample sizes can be derived by simulation.
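A direct transcription of this test; the 0.5π factor reflects the asymptotic variance πσ²/(2n) of the sample median under Gaussian noise. The sketch reuses qn_offline from Section 2, but any robust scale estimator can be plugged in:

```python
import math
import statistics

def shift_statistic(left, right, scale=qn_offline):
    """Two-window median comparison standardized by a robust scale
    estimate; asymptotically N(0,1) for a locally constant Gaussian
    signal. left/right are the windows before/after time t."""
    h, k = len(left), len(right)
    med_diff = statistics.median(right) - statistics.median(left)
    var = 0.5 * math.pi * (scale(left) ** 2 / h + scale(right) ** 2 / k)
    return med_diff / math.sqrt(var)
```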

Figure 3 compares the efficiencies of the Q_n, the median absolute deviation about the median (MAD) and the interquartile range (IQR), measured as the percentage variance of the empirical standard deviation as a function of the sample size n, derived from 200000 simulation runs for each n. Obviously, the Q_n is much more efficient than the other, 'classical' robust scale estimators.

The higher efficiency of the Q_n is an intuitive explanation for median comparisons standardized by the Q_n having higher power than those standardized by the MAD or the IQR if the windows are not very short. The power functions depicted in Figure 3 for the case h = k = 15 have been derived from shifts of several heights G = 0, 1, ..., 6 overlaid by standard Gaussian noise, using 10000 simulation runs each. The two-sample t-test, which is included for the reason of comparison, offers under Gaussian assumptions higher power than all the median comparisons, of course. However, Figure 3 shows that its power can drop down to zero because of a single outlier, even if the shift is huge. To see this, a shift of fixed size 10σ was generated, and a single outlier of increasing size into the opposite direction of the shift was inserted briefly after the shift. The median comparisons are not affected by a single outlier even if windows as short as h = k = 7 are used.

Fig. 3. Gaussian efficiencies (top left), power of shift detection (top right), power for a 10σ-shift in case of an outlier of increasing size (bottom left), and detection rate in case of an increasing number of deviating observations (bottom right): Q_n (solid), MAD (dashed), IQR (dotted), and S_n (dash-dotted). The two-sample t-test (thin solid) is included for the reason of comparison.

As a final exercise, we treat shift detection in case of an increasing number of deviating observations in the right-hand window. Since a few outliers should neither mask a shift nor cause false detection when the signal is constant, we would like a test to resist the deviating observations until more than half of the observations are shifted, and to detect a shift from then on. Figure 3 shows the detection rates, calculated as the percentage of cases in which a shift was detected, for h = k = 7. Median comparisons with the Q_n behave as desired, while a few outliers can mask a shift when using the IQR for standardization, similar as for the t-test. This can be explained by the IQR having a smaller breakdown point than the Q_n and the MAD.

4 Conclusions

The proposed new update algorithm for calculation of the Q_n scale estimator or the Hodges-Lehmann location estimator in a moving time window shows good running
