10 Change detection based on multiple models
10.1 Basics
10.2 Examples of applications
10.3 On-line algorithms
10.3.1 General ideas
10.3.2 Pruning algorithms
10.3.3 Merging strategies
10.3.4 A literature survey
10.4 Off-line algorithms
10.4.1 The EM algorithm
10.4.2 MCMC algorithms
10.5 Local pruning in blind equalization
10.5.1 Algorithm
10.A Posterior distribution
10.A.1 Posterior distribution of the continuous state
10.A.2 Unknown noise level
10.1 Basics
This chapter addresses the most general problem formulation of detection in linear systems. Basically, all problem formulations that have been discussed so far are included in the framework considered. The main purpose is to survey multiple model algorithms, and a secondary purpose is to overview and compare the state of the art in different application areas, where similar algorithms for reducing complexity have been developed independently. The goal is to detect abrupt changes in the state space model
x_{t+1} = A_t(δ_t) x_t + v_t
y_t = C_t(δ_t) x_t + e_t                                   (10.1)
v_t ∈ N(m_{v,t}(δ_t), Q_t(δ_t))
e_t ∈ N(m_{e,t}(δ_t), R_t(δ_t))
Adaptive Filtering and Change Detection
Fredrik Gustafsson Copyright © 2000 John Wiley & Sons, Ltd ISBNs: 0-471-49287-6 (Hardback); 0-470-84161-3 (Electronic)
Here δ_t is a discrete parameter representing the mode of the system (linearized mode, faulty mode, etc.), and it takes on one of S different values (mostly we have the case S = 2). This model incorporates all previously discussed problems in this book, and is therefore the most general formulation of the estimation and detection problem. Section 10.2 gives a number of applications, including change detection and segmentation, but also model structure selection, blind and standard equalization, missing data and outliers. The common theme in these examples is that there is an unknown discrete parameter, the mode, in a linear system.
One natural strategy for choosing δ is the following:

• For each possible δ, filter the data through a Kalman filter for the (conditional) known state space model (10.1).

• Choose the particular value of δ whose Kalman filter gives the smallest prediction errors.
In fact, this is basically how the MAP estimator

δ̂^MAP = arg max_δ p(δ | y^N)                               (10.2)

works, as will be proven in Theorem 10.1. The structure is illustrated in Figure 10.1.
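As a rough illustration of this bank-of-filters strategy (a minimal sketch, not code from the book; the scalar two-mode model and all numbers below are invented for the example), one can run one conditional Kalman filter per mode and compare accumulated prediction-error costs:

```python
import math

def kalman_prediction_error_cost(y, A, C, Q, R, x0=0.0, P0=1.0):
    """Scalar conditional Kalman filter; returns the accumulated
    normalized squared prediction error (smaller = better fit)."""
    x, P = x0, P0
    cost = 0.0
    for yt in y:
        S = C * P * C + R          # innovation variance
        eps = yt - C * x           # prediction error (innovation)
        cost += eps * eps / S + math.log(S)
        K = P * C / S              # measurement update
        x = x + K * eps
        P = (1.0 - K * C) * P
        x = A * x                  # time update
        P = A * P * A + Q
    return cost

# Data roughly constant around 2. Mode 1 models a constant state;
# under mode 2 the state decays towards zero, which fits poorly.
y = [2.0, 1.9, 2.1, 2.0, 1.8, 2.2]
modes = {1: dict(A=1.0, C=1.0, Q=0.0, R=0.1),
         2: dict(A=0.5, C=1.0, Q=0.0, R=0.1)}
costs = {d: kalman_prediction_error_cost(y, **m) for d, m in modes.items()}
delta_hat = min(costs, key=costs.get)
```

With a flat prior over the modes, picking the smallest cost is exactly the MAP rule (10.2).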
The key tool in this chapter is a repeated application of Bayes' law to compute a posteriori probabilities:

p(δ^t | y^t) ∝ p(y_t | δ^t, y^{t-1}) p(δ^t | y^{t-1}),        (10.3)

where the one-step prediction density is Gaussian,

p(y_t | δ^t, y^{t-1}) = N(y_t − C_t(δ_t) x̂_{t|t-1}(δ^t); 0, R_t(δ_t) + C_t(δ_t) P_{t|t-1}(δ^t) C_t^T(δ_t)).
A proof is given in Section 10.A. The latter equation is recursive and suitable for implementation. This recursion immediately leads to a multiple model algorithm, summarized in Table 10.1. This table also serves as a summary of the chapter.
10.2 Examples of applications

A classical signal processing problem is to find a sinusoid in noise, where the phase, amplitude and frequency may change in time. Multiple model approaches are found in Caciotta and Carbone (1996) and Spanjaard and White
Table 10.1 A generic multiple model algorithm
1. Kalman filtering: conditioned on a particular sequence δ^t, the state estimation problem in (10.1) is solved by a Kalman filter. This will be called the conditional Kalman filter, and its outputs are x̂_{t|t}(δ^t), P_{t|t}(δ^t).

2. Mode evaluation: for each sequence, we can compute, up to an unknown scaling factor, the posterior probability given the measurements, using (10.3).

3. Distribution: at time t, there are S^t different sequences δ^t, which will be labeled δ^t(i), i = 1, 2, ..., S^t. It follows from the theorem of total probability that the exact posterior density of the state vector is

   p(x_t | y^t) = Σ_{i=1}^{S^t} p(δ^t(i) | y^t) p(x_t | δ^t(i), y^t).

   This distribution is a Gaussian mixture with S^t modes.

4. Pruning and merging (on-line): for on-line applications, there are two approaches to approximate the Gaussian mixture, both aiming at removing modes so that only a fixed number of modes in the Gaussian mixture is kept. The exponential growth can be interpreted as a tree, and the approximation strategies are merging and pruning. Pruning simply cuts off modes in the mixture with low probability. In merging, two or more modes are replaced by one new Gaussian distribution.

5. Numerical search (off-line): for off-line analysis, there are numerical approaches based on the EM algorithm or MCMC methods. We will detail some suggestions for how to generate sequences of δ^t which will theoretically belong to the true posterior distribution.
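Step 2 of the table can be sketched as follows (a toy illustration, not the book's code; the innovation values and variances are made up, and the unknown scaling factor is removed by normalizing at the end):

```python
import math

def gauss(eps, var):
    """Likelihood N(eps; 0, var) of a scalar innovation."""
    return math.exp(-0.5 * eps * eps / var) / math.sqrt(2 * math.pi * var)

def measurement_update_weights(weights, innovations, variances):
    """Mode evaluation: scale each branch weight by the Gaussian
    likelihood of its innovation, then normalize to sum to one."""
    w = [wi * gauss(e, S) for wi, e, S in zip(weights, innovations, variances)]
    total = sum(w)
    return [wi / total for wi in w]

# three branches with equal priors; branch 0 explains the data best
w = measurement_update_weights([1 / 3, 1 / 3, 1 / 3],
                               innovations=[0.1, 1.0, 2.5],
                               variances=[1.0, 1.0, 1.0])
```

The branch with the smallest normalized innovation receives the largest posterior weight, which is the mechanism behind (10.3).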
Figure 10.1 The multiple model approach
(1995). In Daumera and Falka (1998), multiple models are used to find the change points in biomedical time series. In Caputi (1995), the multiple model is used to model the input to a linear system as a switching Gaussian process. Actuator and sensor faults are modeled by multiple models in Maybeck and Hanlon (1995). Wheaton and Maybeck (1995) used the multiple model approach for acceleration modeling in target tracking, and Yeddanapudi et al. (1997) applied the framework to target tracking in ATC. These are just a few examples; more references can be found in Section 10.3.4. Below, important special cases of the general model are listed as examples. It should be stressed that the general algorithm and its approximations can be applied to all of them.
Example 10.1 Detection in changing mean model
Consider the case of an unknown constant in white noise. Suppose that we want to test the hypothesis that the 'constant' has changed at some unknown time instant. We can then model the signal by

y_t = θ_1 + σ(t − δ + 1) θ_2 + e_t,

where σ(t) is the step function. If all possible change instants are to be considered, the variable δ takes its value from the set {1, 2, ..., t − 1, t}, where δ = t
should be interpreted as no change (yet). This example can be interpreted as a special case of (10.1), where

δ ∈ {1, 2, ..., t},  x_t = (θ_1, θ_2)^T,  A_t(δ) = ( 1  0
                                                     0  σ(t − δ + 1) ),
C_t(δ) = (1, 1),  Q_t(δ) = 0,  R_t(δ) = λ.

The detection problem is to estimate δ.
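A brute-force sketch of this detection problem (toy data invented here; a flat prior over δ is assumed, so the MAP estimate reduces to the least-squares fit of the two segment means):

```python
def rss(seg):
    """Residual sum of squares around the segment mean."""
    if not seg:
        return 0.0
    m = sum(seg) / len(seg)
    return sum((v - m) ** 2 for v in seg)

def map_change_point(y):
    """For each hypothesized change instant delta (1-indexed; samples
    delta, delta+1, ... have mean theta1 + theta2), fit the two segment
    means by least squares and pick the delta minimizing the total
    residual sum of squares."""
    N = len(y)
    costs = {d: rss(y[:d - 1]) + rss(y[d - 1:]) for d in range(2, N + 1)}
    return min(costs, key=costs.get), costs

# mean jumps from 0 to 5 at sample 5
y = [0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0]
delta_hat, costs = map_change_point(y)
```

Each hypothesis here plays the role of one conditional filter in the bank; with noise-free data the true change instant gives zero residual.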
Example 10.2 Segmentation in changing mean model
Suppose in Example 10.1 that there can be arbitrarily many changes in the mean. The model used can be extended by including more step functions, but such a description would be rather inconvenient. A better alternative is to model the signal by
θ_{t+1} = θ_t + δ_t v_t
y_t = θ_t + e_t
δ_t ∈ {0, 1}

Here the changes are modeled by the noise v_t, and the discrete parameter δ_t is 1 if a change occurs at time t and 0 otherwise. Obviously, this is a special case of (10.1), where the discrete variable is δ^N = (δ_1, δ_2, ..., δ_N) and

δ^N ∈ {0, 1}^N,  x_t = θ_t,  A_t(δ) = 1,  C_t = 1,  Q_t(δ) = δ_t Q_t,  R_t = λ.

Here {0, 1}^N denotes all possible sequences of zeros and ones of length N. The problem of estimating the sequence δ^N is called segmentation.
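For short records, the segmentation problem can be attacked by exhaustive enumeration over all 2^(N−1) jump sequences, which mirrors the exponential tree discussed later in this chapter. The sketch below is invented for illustration: each jump sequence implies a piecewise-constant fit, scored by residual sum of squares plus a per-jump penalty (the penalty value is an assumption, standing in for the prior on δ_t):

```python
from itertools import product

def segment_cost(y, jumps, penalty=1.0):
    """Cost of a jump sequence (delta_1, ..., delta_{N-1}); delta_t = 1
    means the mean may change between samples t and t+1.  The cost is
    the residual sum of squares of the implied piecewise-constant fit
    plus a penalty per jump."""
    segs, start = [], 0
    for t, d in enumerate(jumps, start=1):
        if d:
            segs.append(y[start:t])
            start = t
    segs.append(y[start:])
    cost = 0.0
    for s in segs:
        m = sum(s) / len(s)
        cost += sum((v - m) ** 2 for v in s)
    return cost + penalty * sum(jumps)

# one jump, between samples 3 and 4
y = [0.0, 0.0, 0.0, 4.0, 4.0, 4.0]
best = min(product([0, 1], repeat=len(y) - 1),
           key=lambda j: segment_cost(y, j))
```

The exhaustive search is only feasible for small N; the pruning and merging algorithms of Section 10.3 exist precisely to avoid it.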
Example 10.3 Model structure selection
Suppose that there are two possible model structures for describing a measured signal, namely two auto-regressions with one or two parameters,

δ = 1 : y_t = −a_1 y_{t−1} + e_t
δ = 2 : y_t = −a_1 y_{t−1} − a_2 y_{t−2} + e_t.

Here, e_t is white Gaussian noise with variance λ. We want to determine from a given data set which model is the most suitable. One solution is to refer to the general problem with discrete parameters in (10.1). Here we can take

A_t(δ) = I,  Q_t(δ) = 0,  R_t(δ) = λ
and

x_t = (a_1, a_2)^T,  C_t(1) = (−y_{t−1}, 0),  C_t(2) = (−y_{t−1}, −y_{t−2}).

The problem of estimating δ is called model structure selection.
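A minimal sketch of comparing the two structures by their one-step prediction errors (the data below are generated from a noiseless AR(2), an invented example; for noisy data a complexity penalty would also be needed, since the larger model never fits worse):

```python
def ar_mse(y, order):
    """Least-squares fit of y_t = -a_1 y_{t-1} - ... - a_p y_{t-p} + e_t;
    returns the mean squared one-step prediction error."""
    rows = [[-y[t - k] for k in range(1, order + 1)]
            for t in range(order, len(y))]
    targets = y[order:]
    n = order
    # normal equations M a = v
    M = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    v = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    # Gaussian elimination (no pivoting; fine for this tiny system)
    for i in range(n):
        for j in range(i + 1, n):
            f = M[j][i] / M[i][i]
            for k in range(i, n):
                M[j][k] -= f * M[i][k]
            v[j] -= f * v[i]
    a = [0.0] * n
    for i in range(n - 1, -1, -1):
        a[i] = (v[i] - sum(M[i][k] * a[k] for k in range(i + 1, n))) / M[i][i]
    res = [t - sum(ri * ai for ri, ai in zip(r, a))
           for r, t in zip(rows, targets)]
    return sum(e * e for e in res) / len(res)

# noiseless AR(2) data: y_t = 1.5 y_{t-1} - 0.7 y_{t-2}
y = [1.0, 1.0]
for _ in range(28):
    y.append(1.5 * y[-1] - 0.7 * y[-2])
mse1, mse2 = ar_mse(y, 1), ar_mse(y, 2)
```

Here the AR(2) structure reproduces the data essentially exactly, while the AR(1) structure cannot, so δ = 2 would be selected.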
Example 10.4 Equalization
A typical digital communication problem is to estimate a binary signal, u_t, transmitted through a channel with a known characteristic and measured at the output. A simple example is an FIR channel. We refer to the problem of estimating the input sequence with a known channel as equalization.
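The idea can be made concrete by brute force. Note that the book's explicit channel equation did not survive in this copy, so the two-tap FIR channel h = [1.0, 0.5] and the data below are invented stand-ins:

```python
from itertools import product

def equalize(y, h):
    """Brute-force ML equalization: enumerate all binary input
    sequences u_t in {-1, +1} and pick the one whose channel output
    (convolution with h) best matches the measurements."""
    N = len(y)

    def output(u):
        return [sum(h[k] * (u[t - k] if t - k >= 0 else 0.0)
                    for k in range(len(h))) for t in range(N)]

    best_u, best_cost = None, float("inf")
    for u in product([-1.0, 1.0], repeat=N):
        z = output(u)
        cost = sum((yt - zt) ** 2 for yt, zt in zip(y, z))
        if cost < best_cost:
            best_u, best_cost = u, cost
    return best_u

h = [1.0, 0.5]                        # hypothetical known channel
u_true = (1.0, -1.0, -1.0, 1.0, 1.0)
y = [1.0, -0.5, -1.5, 0.5, 1.5]       # noiseless channel output of u_true
u_hat = equalize(y, h)
```

The enumeration over S^t input sequences is exactly the tree of the general framework; practical equalizers prune it, as in Section 10.5.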
Example 10.5 Blind equalization
Consider again the communication problem in Example 10.4, but assume now that both the channel model and the binary signal are unknown a priori
We can try to estimate the channel parameters as well by using the model
The problem of estimating the input sequence with an unknown channel is called blind equalization
Example 10.6 Outliers

Suppose that some of the measurements are known to be bad. One possible approach
to this problem is to model the measurement noise as a Gaussian mixture,

e_t ∈ Σ_{i=1}^{M} α_i N(μ_i, Q_i),

where Σ α_i = 1. With this notation we mean that the density function for e_t is

e_t ∈ { N(μ_1, Q_1) with probability α_1
        N(μ_2, Q_2) with probability α_2
        ...
        N(μ_M, Q_M) with probability α_M.

Hence, the noise distribution can be written

e_t ∈ N(μ_{δ_t}, Q_{δ_t}),

where δ_t ∈ {1, 2, ..., M} and the prior is chosen as p(δ_t = i) = α_i.
The simplest way to describe possible outliers is to take μ_1 = μ_2 = 0, Q_1 equal to the nominal noise variance, Q_2 much larger than Q_1, and α_2 = 1 − α_1 equal to a small number. This models the fraction α_2 of all measurements as outliers with a very large variance. The Kalman filter will then ignore these measurements, and the a posteriori probabilities are almost unchanged.
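The posterior probability that a given measurement is an outlier follows directly from Bayes' law on this two-component mixture. A small sketch (the variances and prior below are invented example values):

```python
import math

def normal_pdf(y, mean, var):
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def outlier_posterior(y, q1=1.0, q2=100.0, alpha2=0.05):
    """Posterior probability that measurement y came from the outlier
    component N(0, Q2) rather than the nominal component N(0, Q1)."""
    alpha1 = 1.0 - alpha2
    p1 = alpha1 * normal_pdf(y, 0.0, q1)
    p2 = alpha2 * normal_pdf(y, 0.0, q2)
    return p2 / (p1 + p2)

p_small = outlier_posterior(0.5)   # plausible under the nominal model
p_large = outlier_posterior(8.0)   # far outside the nominal model
```

A measurement near zero is attributed to the nominal component, while a large one is classified as an outlier with probability close to one.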
Example 10.7 Missing data
In some applications it frequently happens that measurements are missing, typically due to sensor failure. A suitable model for this situation is

y_t = (1 − δ_t) C_t x_t + e_t.                              (10.5)

This model is used in Lainiotis (1971). The model (10.5) corresponds to the choices

C_t(δ) = (1 − δ_t) C_t,  δ_t ∈ {0, 1},

in the general formulation (10.1). For a thorough treatment of missing data, see Tanaka and Katayama (1990) and Parzen (1984).
Example 10.8 Markov models
Consider again the case of missing data, modeled by (10.5). In applications, one can expect that a very low fraction, say p_11, of the data is missing. On the other hand, if one measurement is missing, there is a fairly high probability, say p_22, that the next one is missing as well. This is nothing but a prior assumption on δ, corresponding to a Markov chain. Such a state space model is commonly referred to as a jump linear model. A Markov chain is completely specified by its transition probabilities p_ij = p(δ_{t+1} = i | δ_t = j) and the initial probabilities p(δ_1 = i) = p_i. Here we must have p_12 = 1 − p_22 and p_21 = 1 − p_11. In our framework, this is only a recursive description of the prior probability of each sequence. For outliers, and especially missing data, the assumption of an underlying Markov chain is particularly logical. It is used, for instance, in MacGarty (1975).
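The recursive description of the sequence prior is just a product of transition probabilities. A small sketch (the transition matrix and initial probabilities are invented numbers; note the row convention P[i][j] = p(next mode j | current mode i), which differs from the column convention in the text):

```python
from itertools import product

def sequence_prior(seq, p_init, P):
    """Prior probability of a mode sequence (delta_1, ..., delta_N)
    under a Markov chain: p(delta_1) * prod_t p(delta_{t+1} | delta_t).
    Row convention: P[i][j] = transition probability from mode i to j."""
    p = p_init[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= P[prev][cur]
    return p

# modes: 0 = measurement present, 1 = measurement missing (toy numbers)
P = [[0.95, 0.05],
     [0.30, 0.70]]
p_init = [0.99, 0.01]
total = sum(sequence_prior(s, p_init, P) for s in product([0, 1], repeat=4))
```

Summing over all 2^N sequences gives one, confirming that the recursion defines a proper prior over the tree of sequences.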
10.3 On-line algorithms
Interpret the exponentially increasing number of discrete sequences δ^t as a growing tree, as illustrated in Figure 10.2. It is inevitable that we either prune or merge this tree. In this section, we examine how one can discard elements in S by cutting off branches in the tree, and lump sequences into subsets of S by merging branches.

Thus, the basic possibilities for pruning the tree are to cut off branches and to merge two or more branches into one; that is, two state sequences are merged and in the following treated as just one. There is also a timing question: at what instant in the time recursion should the pruning be performed? To understand this, the main steps in updating the a posteriori probabilities can be divided into a time update and a measurement update as follows:
Figure 10.2 A growing tree of discrete state sequences. In GPB(2), the sequences (1,5), (2,6), (3,7) and (4,8), respectively, are merged. In GPB(1), the sequences (1,3,5,7) and (2,4,6,8), respectively, are merged.
• Time update:

p(δ^t | y^{t−1}) = p(δ_t | δ^{t−1}) p(δ^{t−1} | y^{t−1}).     (10.6)

• Measurement update:

p(δ^t | y^t) ∝ p(y_t | δ^t, y^{t−1}) p(δ^t | y^{t−1}).        (10.7)
First, a quite general pruning algorithm is given.
1. Compute recursively the conditional Kalman filter for a bank of M sequences δ^t(i) = (δ_1(i), δ_2(i), ..., δ_t(i))^T, i = 1, 2, ..., M.

2. After the measurement update at time t, prune all but the M/S most probable branches δ^t(i).

3. At time t + 1: let the M/S considered branches split into S · M/S = M branches, δ^{t+1}(j) = (δ^t(i), δ_{t+1}), for all δ^t(i) and δ_{t+1}. Update their a posteriori probabilities according to Theorem 10.1.
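The split-reweight-prune cycle can be sketched on a toy changing-mean model (everything below is invented for illustration: the branch state is a single mean, δ_t = 0 keeps it, δ_t = 1 jumps to the new measurement, and the jump prior 0.1 is an assumed number):

```python
import math

def gauss(eps, var):
    return math.exp(-0.5 * eps * eps / var) / math.sqrt(2 * math.pi * var)

def prune_update(branches, y, M, R=0.1, jump_prior=0.1):
    """One step of the pruning recursion.  A branch is a tuple
    (weight, sequence, mean).  Every branch splits into one child per
    mode, children are reweighted by the innovation likelihood (times
    the jump prior for delta = 1), and only the M most probable
    children survive; weights are then renormalized."""
    children = []
    for w, seq, mean in branches:
        for d in (0, 1):
            new_mean = mean if d == 0 else y
            lik = gauss(y - new_mean, R)
            if d == 1:
                lik *= jump_prior
            children.append((w * lik, seq + (d,), new_mean))
    children.sort(key=lambda b: -b[0])
    kept = children[:M]
    total = sum(w for w, _, _ in kept)
    return [(w / total, s, m) for w, s, m in kept]

# mean jumps from about 0 to about 5 at the third sample
branches = [(1.0, (), 0.0)]
for y in [0.0, 0.1, 5.0, 5.1]:
    branches = prune_update(branches, y, M=4)
best = max(branches, key=lambda b: b[0])
```

The most probable surviving sequence places its single jump at the third sample, where the data actually changed.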
For change detection purposes, where δ_t = 0 is the normal outcome and δ_t ≠ 0 corresponds to different fault modes, we can save a lot of filters in the filter bank by using a local search scheme similar to that in Algorithm 7.1.
Algorithm 10.2 Local pruning for multiple models
1. Compute recursively the conditional Kalman filter for a bank of M hypotheses δ^t(i) = (δ_1(i), δ_2(i), ..., δ_t(i))^T, i = 1, 2, ..., M.

2. After the measurement update at time t, prune the S − 1 least probable branches δ^t(i).
3. At time t + 1: let only the most probable branch split into S branches, δ^{t+1}(j) = (δ^t(i), δ_{t+1}).

4. Update their posterior probabilities according to Theorem 10.1.
Some restrictions on the rules above can sometimes be useful:

• Assume a minimum segment length: let the most probable sequence split only if it is not too young.

• Assure that sequences are not cut off immediately after they are born: cut off the least probable sequences among those that are older than a certain minimum life-length, until only M remain.
The exact posterior density of the state vector is a mixture of S^t Gaussian distributions. The key point in merging is to replace, or approximate, a number of Gaussian distributions by one single Gaussian distribution in such a way that the first and second moments are matched. That is, a sum of L Gaussian distributions

Σ_{i=1}^{L} α^(i) N(x̂^(i), P^(i))

is approximated by N(x̄, P̄), where (with the weights normalized so that Σ α^(i) = 1)

x̄ = Σ_{i=1}^{L} α^(i) x̂^(i),
P̄ = Σ_{i=1}^{L} α^(i) (P^(i) + (x̂^(i) − x̄)(x̂^(i) − x̄)^T).
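The standard moment-matching merge can be sketched for the scalar case (a minimal illustration; the mixture below is an invented example):

```python
def merge_gaussians(weights, means, variances):
    """Replace a scalar Gaussian mixture sum_i w_i N(m_i, P_i) by a
    single Gaussian N(m_bar, p_bar) with the same first two moments."""
    total = sum(weights)
    w = [wi / total for wi in weights]           # normalize the weights
    m_bar = sum(wi * mi for wi, mi in zip(w, means))
    # total variance = within-component variance + between-component spread
    p_bar = sum(wi * (pi + (mi - m_bar) ** 2)
                for wi, mi, pi in zip(w, means, variances))
    return m_bar, p_bar

# symmetric two-component mixture: merged mean 0, merged variance 1 + 1
m, p = merge_gaussians([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

Note that the merged variance exceeds each component variance: the spread between the component means is absorbed into P̄, which is what makes the single-Gaussian approximation consistent in its first two moments.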
The GPB algorithm
The idea of the Generalized Pseudo-Bayesian (GPB) approach is to merge the mixture after the measurement update.
The mode parameter δ_t is an independent sequence with S outcomes, used to switch modes in a linear state space model. Decide on the sliding window memory L. Represent the posterior distribution of the state at time t by a Gaussian mixture of M = S^{L−1} distributions,

Σ_{i=1}^{M} α(i) N(x̂_{t|t}(i), P_{t|t}(i)).
Repeat the following recursion:
1. Let these split into S · M = S^L sequences by considering all S new branches at time t + 1.

2. For each i, apply the conditional Kalman filter measurement and time update, giving x̂_{t+1|t}(i), x̂_{t+1|t+1}(i), P_{t+1|t}(i), P_{t+1|t+1}(i), ε_{t+1}(i) and S_{t+1}(i).

3. Time update the weight factors α(i) according to (10.6).

4. Measurement update the weight factors α(i) according to (10.7).

5. Merge the S sequences corresponding to the same history up to time t − L. This requires S^{L−1} separate merging steps, using the moment-matching formula.