4
Off-line approaches
4.1 Basics
4.2 Segmentation criteria
    4.2.1 ML change time sequence estimation
    4.2.2 Information based segmentation
4.3 On-line local search for optimum
    4.3.1 Local tree search
    4.3.2 A simulation example
4.4 Off-line global search for optimum
    4.4.1 Local minima
    4.4.2 An MCMC approach
4.5 Change point estimation
    4.5.1 The Bayesian approach
    4.5.2 The maximum likelihood approach
    4.5.3 A non-parametric approach
4.6 Applications
    4.6.1 Photon emissions
    4.6.2 Altitude sensor quality
    4.6.3 Rat EEG
4.1 Basics
This chapter surveys off-line formulations of single and multiple change point estimation. Although the problem formulation is to analyze the data batch-wise, many important algorithms have natural on-line implementations. In segmentation, the goal is to find a sequence k^n = (k_1, k_2, ..., k_n) of time indices, where both the number n and the locations k_i are unknown, such that
the signal can be accurately described as piecewise constant, i.e.,

y_t = θ(i) + e_t, k_{i-1} < t ≤ k_i, i = 1, 2, ..., n, (4.1)

where θ(i) is the parameter in segment i. Models where each segment is instead described by a linear regression are other possibilities. Equation (4.1) will be the signal model used throughout this chapter, but it should be noted that an important extension to the case where the parameter is slowly varying within each segment is possible with minor modifications. However, equation (4.1) illustrates the basic ideas. One way to guarantee that the best possible solution is found is to consider all possible change point sequences and
choose the particular k^n that minimizes an optimality criterion,

k̂^n = arg min_{n ≥ 1, 0 < k_1 < ... < k_n = N} V(k^n).

The procedure is illustrated below:
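A minimal sketch of this exhaustive search, assuming a scalar signal, a per-segment least squares loss V(i), and a simple penalty on the number of change points (the convention k_n = N is left implicit, so only the interior change points are enumerated; the names are illustrative, not the book's):

    from itertools import combinations
    import numpy as np

    def sse(segment):
        # Loss V(i) of one segment: squared residuals around the segment mean
        return float(np.sum((segment - segment.mean()) ** 2))

    def exhaustive_segmentation(y, penalty=10.0):
        # Enumerate every change point vector and minimize the total criterion
        N = len(y)
        best_cost, best_k = np.inf, ()
        for n in range(N):                          # number of change points
            for k in combinations(range(1, N), n):  # 0 < k_1 < ... < k_n < N
                bounds = (0, *k, N)
                cost = sum(sse(y[a:b]) for a, b in zip(bounds, bounds[1:]))
                cost += penalty * n                 # complexity penalty
                if cost < best_cost:
                    best_cost, best_k = cost, k
        return best_k

    rng = np.random.default_rng(0)
    y = np.concatenate([np.zeros(8), 3 * np.ones(8)]) + 0.1 * rng.standard_normal(16)
    print(exhaustive_segmentation(y))               # close to (8,)

The exponential number of candidates is what makes this brute force approach infeasible for all but very short data records.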
Basically, two kinds of optimality criteria have been proposed:

• Statistical criteria, where the maximum likelihood or maximum a posteriori estimate of k^n is studied.
• Information based criteria, where the information in each segment is measured by V(i) (the sum of squared residuals), and the total information is the sum of these. Since the total information is minimized for the degenerated solution with one segment per sample, a penalty term on the number of segments is needed. Similar problems have been studied in the context of model structure selection, and from this literature Akaike's AIC and BIC criteria have been proposed for segmentation.
The main problem in segmentation is of combinatorial nature: there are 2^N possible segmentations of N data points (a change or no change at each time instant). Here, several strategies have been proposed:

• On-line local search schemes, which recursively prune the tree of hypotheses; see Section 4.3.
• Off-line global search schemes, including Markov Chain Monte Carlo (MCMC) techniques, which randomize the search over candidate change point vectors; see Section 4.4.
4.2 Segmentation criteria
This section describes the available statistical and information based optimization criteria.
4.2.1 ML change time sequence estimation

Here, the change time sequence k_1, k_2, ..., k_n is estimated from the data sequence y^t. Later, on-line algorithms will be derived from this approach. We will use the likelihood for data, given the vector of change points, p(y^t | k^n).
The key property exploited below is independence between the segments. That is, the likelihood splits into one factor per segment.
The a posteriori probability for k^n is defined by p(k^n | y^t). Bayes' rule, P(A|B) = P(A)P(B|A)/P(B), thus gives

p(k^n | y^t) = p(k^n) p(y^t | k^n) / p(y^t).

With a flat prior, p(k^n) is taken as a probability density function equal to one, and the scaling factor p(y^t) does not depend on k^n. The maximizing argument of p(k^n | y^t) is the maximum a posteriori (MAP) estimate, which is not influenced by the scaling factor p(y^t), so with a flat prior the MAP and ML estimators coincide.
With the convention k_0 = 0 and k_n = N, the independence between segments gives

p(y^N | k^n) = ∏_{i=1}^{n} p(y_{k_{i-1}+1}, ..., y_{k_i}), (4.10)

and, taking logarithms,

log p(y^N | k^n) = ∑_{i=1}^{n} log p(y_{k_{i-1}+1}, ..., y_{k_i}). (4.11)
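A sketch of how a segmentation is scored according to (4.10)-(4.11), assuming Gaussian noise with ML-fitted mean and variance in each segment (one possible case; the function names are illustrative):

    import numpy as np

    def segment_loglik(seg):
        # Gaussian log likelihood of one segment with ML mean and variance
        n = len(seg)
        var = max(float(np.var(seg)), 1e-12)  # guard against zero residuals
        return -0.5 * n * (np.log(2 * np.pi * var) + 1.0)

    def loglik(y, jumps):
        # log p(y^N | k^n): the sum over segments, cf. (4.10)-(4.11)
        bounds = (0, *jumps, len(y))
        return sum(segment_loglik(y[a:b]) for a, b in zip(bounds, bounds[1:]))

Such a function can be plugged directly into the exhaustive search sketched earlier, in place of the least squares loss.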
4.2.2 Information based segmentation

A fundamental problem with criteria based on the loss function alone is that the more change points, the smaller the loss function. The easiest way to see this is to consider the extreme case where the number of change points equals the number of data points: the loss is then zero, because there is no error. In fact, the loss function is monotonously decreasing in the number of change points. A penalty term is needed, in the spirit of the parsimonious principle, which says that the best data description is a compromise between performance (small loss function) and complexity (few parameters). This is a general idea; a classical application of the principle is the choice of model structures in system identification. Penalty terms occurring in model order selection problems can also be used in this application, for instance:
• Akaike's AIC.
• The asymptotically equivalent criteria Akaike's BIC (Akaike, 1977) and the minimum description length (MDL) criterion; see Section 5.3.2.
AIC is proposed in Kitagawa and Akaike (1978) for segmentation of auto-regressive models. Applied to the signal model used here, it would read as equation (4.12): the log likelihood penalized by twice the number of estimated parameters. The criterion assumes a constant noise variance, leading to a pooled variance estimate λ̂ over all segments. Whether the variance should instead be treated as known is not commented upon in Kitagawa and Akaike (1978).
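For reference, the generic forms of these penalized criteria from the model order selection literature, with p denoting the total number of estimated parameters (these standard expressions are stated as an assumption; the chapter's (4.12) is a specialization of the first one), are

AIC: W = N log λ̂ + 2p,
BIC/MDL: W = N log λ̂ + p log N,

where λ̂ is the pooled noise variance estimate and the segmentation minimizing W is selected.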
The MDL theory provides a nice interpretation of the segmentation problem: choose the segments such that the fewest possible data bits are used to describe the signal, when both the parameter vectors and the prediction errors are stored with finite accuracy. This is also a setting where marginalized ML works fine.
4.3 On-line local search for optimum
The optimality criteria from the previous section are used in the following search. The local search in this section decides which branches of the hypothesis tree to examine on an on-line basis. The other alternative, using global search strategies examined in the next section, decides which branches to examine on an off-line basis.

4.3.1 Local tree search

The on-line local search has much in common with the famous Viterbi algorithm in equalization. The search is over the tree of jump sequences depicted in Figure 4.1.

Figure 4.1 The tree of jump sequences. A path labeled 0 corresponds to no jump, while 1 corresponds to a jump.
Algorithm 4.1 Recursive signal segmentation

Choose an optimality criterion from Section 4.2: MAP probabilities or information-based criteria. Compute recursively the optimality criterion using a bank of least squares estimators, one for each considered change point sequence. Use the following rules for maintaining the hypotheses and keeping their number at a fixed value M:

a) Let every considered sequence split into two at each time instant: one branch assuming a jump and one assuming no jump (the labels 1 and 0 in Figure 4.1).
b) Cut off all but the M most probable sequences.
c) Assume a minimum segment length: let the most probable sequence split only if its most recent jump occurred at least a certain number of samples ago.
d) Assure that sequences are not cut off immediately after they are born: let each new sequence live for a certain minimum life-length, until only M are left.

The last two restrictions are optional, but might be useful in some cases.
In this way, the algorithm has linear, rather than exponential, complexity in the data size, which would be the consequence of keeping all branches of the tree.
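A minimal sketch of the pruning idea in Algorithm 4.1, using a running least squares cost with a fixed penalty per jump in place of the MAP probabilities; rules c) and d) are omitted, and all names are illustrative:

    import numpy as np
    from dataclasses import dataclass

    @dataclass
    class Hyp:
        jumps: tuple       # change times decided so far
        n: int = 0         # samples in the open segment
        s: float = 0.0     # running sum of the open segment
        q: float = 0.0     # running sum of squares of the open segment
        cost: float = 0.0  # cost of closed segments plus jump penalties

        def open_sse(self):
            # SSE of the open segment around its mean
            return self.q - self.s ** 2 / self.n if self.n else 0.0

    def local_search(y, M=8, penalty=10.0):
        hyps = [Hyp(jumps=())]
        for t, yt in enumerate(y, start=1):
            new = []
            for h in hyps:
                # branch 0: no jump, extend the open segment
                new.append(Hyp(h.jumps, h.n + 1, h.s + yt, h.q + yt * yt, h.cost))
                # branch 1: jump, close the segment and start a new one
                if h.n > 0:
                    new.append(Hyp(h.jumps + (t - 1,), 1, yt, yt * yt,
                                   h.cost + h.open_sse() + penalty))
            # rule b): keep only the M best hypotheses
            hyps = sorted(new, key=lambda h: h.cost + h.open_sse())[:M]
        return hyps[0].jumps

The complexity is O(MN) instead of O(2^N), at the price of possibly pruning the globally best sequence.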
4.3.2 A simulation example

Figure 4.2 A change in the mean signal with three abrupt changes of increasing magnitude (plot title: 'Measurements and real parameters').
The evolution of the hypotheses is shown in Figure 4.3. The plot mimics Figure 4.1, but is 'upside down'. Each line represents one hypothesis and shows how the number of change points evolves for that hypothesis over time. At each time instant there is one filter that performs best, and the other filters are used to evaluate candidate change points. After having lived for three samples without becoming most probable, hypotheses are cut off. At time 22 one filter reacts, and at time 23 the correct change time is found. After the last change, it takes three samples until the correct hypothesis becomes the most likely (the likelihoods are computed using the result from Appendix 7.A). Figure 4.4 shows how the hypotheses examine all branches that need to be considered.
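To indicate the flavor of such marginalization results (a standard computation under assumptions stated here, not necessarily identical to the one in Appendix 7.A): for one segment y_1, ..., y_m with unknown constant mean θ, known variance σ², and a flat prior on θ,

∫ ∏_{t=1}^{m} N(y_t; θ, σ²) dθ = (2πσ²)^{-(m-1)/2} m^{-1/2} exp(−V/(2σ²)), V = ∑_{t=1}^{m} (y_t − ȳ)²,

so the marginalized likelihood of a change hypothesis is a product of such closed-form segment terms.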
Figure 4.4 Evolution of change hypotheses for a local search with M = 5 and M = 8 (time axis in samples). A small offset is added to the number of change points for each hypothesis.
4.4 Off-line global search for optimum
4.4.1 Local minima

Example 4.1 Local minimum for one change point

The signal and its negative log likelihood as a function of the change point are shown in Figure 4.5. This example shows that we must do an exhaustive search for the change point. However, this might not improve the likelihood if there are two change points, as the following example demonstrates.
Example 4.2 Local minimum for two change points

The signal in Figure 4.6 is constant and zero, except for a small segment in the middle of the data record. The global minimum of the likelihood is attained at k² = (100, 110), but no single change point k¹ = m gives any improvement over no change at all.

Such an example might motivate an approach where a complete search over one and two change points is performed. This will work in most cases, but not always, as Example 4.4 will show.
Figure 4.5 Upper plot: signal. Lower plot: negative log likelihood as a function of k, with global minimum at k = 100 but local minimum at k = 0.

Figure 4.6 Negative log likelihood with global minimum at k² = (100, 110) but local minimum at k = 0. No improvement for k¹ = m, for any m.
Example 4.3 Global search using one and two change points

For signals like the ones in the previous examples, a complete search over all combinations of one and two change points finds the true change times generally.
Example 4.4 Counterexample of convergence of global search

Assume an exponential distribution for the noise. The negative log likelihood for no change point can then be written in closed form and compared to the segmented alternatives.
Figure 4.8 The signal in the counterexample with M = 10.

For the best hypotheses with two change points, the residuals in two of the segments are identically zero, so their likelihood contributions vanish. Even so, the total negative log likelihood is larger than that given no change at all. Thus, we will never find the global optimum by trying all combinations of one and two change points; it is only found by an exhaustive search for three change points.
4.4.2 An MCMC approach

Markov Chain Monte Carlo (MCMC) approaches are surveyed in a later chapter. The MCMC algorithm proposed in Fitzgerald et al. (1994) for signal estimation is a combination of Gibbs sampling and the Metropolis algorithm. The algorithm below is based solely on knowledge of the likelihood function for data, given the change point sequence. A random rejection step is applied, which defines the Metropolis algorithm: the candidate will be rejected with large probability if its value is unlikely.
Algorithm 4.2 MCMC signal segmentation

Fix the number of change points n, initialize the sequence k^n, and iterate the following steps:

1. Choose one of the change points at random, say k_j.
2. Draw a candidate value for it from the proposal distribution p(k_j | k^n except k_j), which can be taken as flat, or Gaussian centered around the previous estimate.
3. The candidate is accepted with probability min(1, p(y^N | candidate k^n) / p(y^N | previous k^n)), i.e., in proportion to the likelihood ratio.

Drawbacks with the approach are that the a posteriori distribution of the change points has to be computed by Monte Carlo techniques, for instance as a histogram over the visited sequences, and that one has to decide what the burn-in time is.
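A minimal sketch of Algorithm 4.2, assuming Gaussian noise with unit variance, a flat proposal, and a fixed number of change points; the helper names are illustrative, not the book's:

    import numpy as np

    def neg_log_lik(y, jumps):
        # -log p(y | k^n) up to constants, Gaussian unit-variance noise
        bounds = (0, *sorted(jumps), len(y))
        return sum(float(np.sum((y[a:b] - y[a:b].mean()) ** 2)) / 2
                   for a, b in zip(bounds, bounds[1:]))

    def mcmc_segmentation(y, n=2, iters=2000, seed=0):
        rng = np.random.default_rng(seed)
        N = len(y)
        k = list(rng.choice(np.arange(1, N), size=n, replace=False))
        best = (neg_log_lik(y, k), sorted(k))
        for _ in range(iters):
            j = rng.integers(n)                # step 1: pick one change point
            cand = k.copy()
            cand[j] = int(rng.integers(1, N))  # step 2: flat proposal
            if len(set(cand)) < n:
                continue
            # step 3: Metropolis accept/reject via the likelihood ratio
            if rng.random() < min(1.0, np.exp(neg_log_lik(y, k) - neg_log_lik(y, cand))):
                k = cand
            c = neg_log_lik(y, k)
            if c < best[0]:
                best = (c, sorted(k))
        return best[1]

Keeping track of all visited sequences, rather than only the best one, gives the kind of histogram shown in Figure 4.9.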
Example 4.5 MCMC search for two change points

The result of applying Algorithm 4.2 is illustrated in Figure 4.9.
4.5 Change point estimation
Figure 4.9 Result of the MCMC Algorithm 4.2. The left plot (a) shows the examined jump sequence in each iteration; the best encountered sequence is marked with dashed lines. The right plot (b) shows a histogram over all considered jump sequences. The burn-in time is not excluded in the histogram.
The classical single change point problem is covered in a survey from 1975, where different procedures to test H0 against H1 are described. The methods are off-line, and only one change point may exist in the data. The Bayesian and likelihood based methods are closely related to the algorithms already described. However, the non-parametric approaches below are interesting, and unique to this problem formulation.
4.5.1 The Bayesian approach

Assuming that the noise variance before and after the change is the same gives test statistics built from double sums of the data over the hypothesized segments, with summation ranges t = 2, ..., N and k = 1, ..., N − 1, t = k + 1, ..., N; the resulting statistics are denoted P1 to P4 below.
One possible estimate of the jump time (change point) is given by the maximizing argument of the statistic; this defines the estimate for P3, and similarly for P4.
4.5.2 The maximum likelihood approach

Using the ML method, the test statistics are as follows:
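As a standard point of reference (the generic form, stated as an assumption rather than the book's exact expression), the ML (GLR) statistic for a single change in the mean of a Gaussian signal with known variance σ² is

T(k) = k(N − k)/(N σ²) · (ȳ_{1:k} − ȳ_{k+1:N})², k̂ = arg max_k T(k),

where ȳ_{1:k} and ȳ_{k+1:N} are the sample means before and after the hypothesized change.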
These statistics rely on an assumed parametric distribution for the noise.

4.5.3 A non-parametric approach

Non-parametric tests for the first problem, assuming only whiteness, are based on the decision rule:
where the distance measure s_i is one of a few sign- and rank-based alternatives. Here, med denotes the median and sign is the sign function. The first method is based on the signs of the deviations from the median. In each case, the change point estimate is given by the maximizing argument.
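A sketch of one such sign-based statistic (a hypothetical variant in the spirit of this section, not one of the book's exact measures): for each candidate k, sum the signs of the deviations from the median after k, and maximize the magnitude:

    import numpy as np

    def sign_change_estimate(y):
        # T[k] = |sum over t >= k of sign(y_t - med(y))|; estimate = arg max, k >= 1
        s = np.sign(y - np.median(y))
        T = np.abs(np.cumsum(s[::-1])[::-1])
        return int(np.argmax(T[1:])) + 1

    rng = np.random.default_rng(0)
    y = np.concatenate([np.zeros(50), np.ones(50)]) + rng.standard_normal(100)
    print(sign_change_estimate(y))  # expected: near 50

Only the signs of the data enter the statistic, which is what makes it largely insensitive to the noise distribution as long as the noise is white.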
Example 4.6 Change point estimation

To get a feeling for the different test statistics, we compare them on two signals whose mean changes from zero to one, one abruptly and one slowly. Both signals have white Gaussian measurement noise. The comparison shows that the statistics peak near the true change time for the abruptly changing data (though the Bayesian statistic seems to have some probability of over-estimating the change time), and that there should be no problem in designing a good threshold.
The explicit formulas given in this section, using the maximum likelihood approach, are special cases of the more general formulas in Section 4.2, as can be verified by the reader.
4.6 Applications
4.6.1 Photon emissions

This application illustrates the flexibility of likelihood based algorithms with respect to the distribution of the noise. The exponential distribution has the nice property of offering explicit and compact expressions for the maximum generalized likelihood (MGL). The standard algorithm assumes Gaussian noise, but can still be applied.
The result is shown in Figure 4.11. Clearly, there are non-stationarities in the emission rate, and the segmentation might be used in real-time automatic surveillance.
4.6.2 Altitude sensor quality

The altitude sensor data were presented in Section 2.2.1. One problem is to detect the critical regions of variance increases.
Figure 4.13 shows the same low-pass filtered variance estimate (in logarithmic scale) and the result from the ML variance segmentation algorithm.

Figure 4.13 Low-pass filtered and segmented squared residuals for altitude data.

The estimated noise variance in each segment serves as a quality number. That is, we know precisely where the measurements are useful.
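A sketch of the quality-number idea, assuming zero-mean Gaussian residuals and already estimated change points (the names are illustrative):

    import numpy as np

    def segment_variances(residuals, jumps):
        # ML noise variance per segment: the quality number of each region
        bounds = (0, *jumps, len(residuals))
        return [float(np.mean(residuals[a:b] ** 2))
                for a, b in zip(bounds, bounds[1:])]

Segments whose variance exceeds a threshold can then be flagged as regions where the sensor should not be trusted.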
4.6.3 Rat EEG

The conventional approach to detecting changes in brain activity is based on a band pass filter and level thresholding on the output power. Here, the segmentation algorithm is instead applied to an EEG signal on a rat.
Applying the segmentation algorithm gives the change points

[1096 1543 1887 2265 2980 3455 3832 3934].

As an alternative, a second design of the algorithm was applied. This gives

[754 1058 1358 1891 2192 2492 2796 3098 3398 3699].

It can be noted that the changes are hardly abrupt for this signal.