9
Change detection based on likelihood ratios
9.1 Basics
9.2 The likelihood approach
9.2.1 Notation
9.2.2 Likelihood
9.2.3 Likelihood ratio
9.3 The GLR test
9.4 The MLR test
9.4.1 Relation between GLR and MLR
9.4.2 A two-filter implementation
9.4.3 Marginalization of the noise level
9.4.4 State and variance jump
9.4.5 Summary
9.5 Simulation study
9.5.1 A Monte Carlo simulation
9.5.2 Complexity
9.A Derivation of the GLR test
9.A.1 Regression model for the jump
9.A.2 The GLR test
9.B LS-based derivation of the MLR test
This chapter is devoted to the problem of detecting additive abrupt changes in linear state space models. Sensor and actuator faults, appearing as a sudden offset or a drift, can all be modeled as additive changes. In addition, disturbances are traditionally modeled as additive state changes. The likelihood ratio formulation provides a general framework for detecting such changes and for isolating the fault/disturbance.
The state space model studied in this chapter is

$$x_{t+1} = A_t x_t + B_{u,t} u_t + B_{v,t} v_t + \sigma_{t-k} B_{\theta,t}\,\nu \qquad (9.1)$$
$$y_t = C_t x_t + e_t + D_{u,t} u_t + \sigma_{t-k} D_{\theta,t}\,\nu \qquad (9.2)$$
The additive change (fault) ν enters at time k as a step (σ_t denotes the step function). Here v_t, e_t and x_0 are assumed to be independent Gaussian variables:

v_t ∈ N(0, Q_t)
e_t ∈ N(0, R_t)
x_0 ∈ N(x_0, Π_0)

Furthermore, they are assumed to be mutually independent. The state change ν occurs at the unknown time instant k, and δ(j) is the pulse function that is one if j = 0 and zero otherwise. The set of measurements y_1, y_2, ..., y_N, each of dimension p, will be denoted y^N, and y_t^N denotes the set y_t, y_{t+1}, ..., y_N.
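To make the notation concrete, the model can be simulated directly. The following minimal sketch generates data from a scalar instance of (9.1)-(9.2); all numerical values, and the scalar setting itself, are illustrative assumptions rather than anything taken from the text.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative scalar instance of (9.1)-(9.2); every numerical value
    # here (system, noise levels, change time k, magnitude nu) is assumed.
    A, C = 0.9, 1.0              # A_t, C_t
    B_th, D_th = 1.0, 0.0        # fault couplings B_{theta,t}, D_{theta,t}
    Q, R = 0.01, 0.1             # Cov(v_t), Cov(e_t)
    k, nu, N = 50, 2.0, 100      # change time, change magnitude, data length

    x = 0.0                      # x_0
    y = np.zeros(N)
    for t in range(N):
        step = 1.0 if t >= k else 0.0                    # sigma_{t-k}
        y[t] = C * x + rng.normal(0.0, np.sqrt(R)) + step * D_th * nu
        x = A * x + rng.normal(0.0, np.sqrt(Q)) + step * B_th * nu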
This formulation of the change detection problem can be interpreted as an input observer or input estimator approach. A similar model is used in Chapter 11.
To motivate the ideas of this chapter, let us consider the augmented state space model, assuming a change at time t = k (compare with Examples 6.6 and 8.2.4):

$$\begin{pmatrix} x_{t+1} \\ \theta_{t+1} \end{pmatrix} = \begin{pmatrix} A_t & B_{\theta,t} \\ 0 & I \end{pmatrix} \begin{pmatrix} x_t \\ \theta_t \end{pmatrix} + \begin{pmatrix} B_{u,t} \\ 0 \end{pmatrix} u_t + \begin{pmatrix} B_{v,t} \\ 0 \end{pmatrix} v_t \qquad (9.3)$$
$$y_t = \begin{pmatrix} C_t & D_{\theta,t} \end{pmatrix} \begin{pmatrix} x_t \\ \theta_t \end{pmatrix} + e_t + D_{u,t} u_t$$

That is, at time t = k the parameter value changes as a step from θ_t = 0 for t < k to θ_t = ν for t ≥ k. It should be noted that ν and θ both denote the magnitude of the additive change, but the former is seen as an input and the latter as a state, or parameter.
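The augmented matrices can be assembled mechanically. A sketch follows; the partitioning mirrors the reconstruction of (9.3) above, and the function and variable names are my own.

    import numpy as np

    def augment(A, B_u, B_v, C, D_th, B_th):
        # Augmented model (9.3) with state (x_t; theta_t); a sketch of the
        # block partitioning, not the book's own code.
        n, m = A.shape[0], B_th.shape[1]
        A_bar = np.block([[A, B_th], [np.zeros((m, n)), np.eye(m)]])
        B_u_bar = np.vstack([B_u, np.zeros((m, B_u.shape[1]))])
        B_v_bar = np.vstack([B_v, np.zeros((m, B_v.shape[1]))])
        C_bar = np.hstack([C, D_th])
        return A_bar, B_u_bar, B_v_bar, C_bar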
The advantage of the re-parameterization is that we can apply the Kalman filter directly, with or without change detection, and we have an explicit fault state that can be used for fault isolation. The Kalman filter applied to the augmented state space model gives a parameter estimator

$$\hat\theta_{t+1|t} = \hat\theta_{t|t-1} + K_t^\theta \left( y_t - C_t \hat x_{t|t-1} - D_{\theta,t} \hat\theta_{t|t-1} - D_{u,t} u_t \right). \qquad (9.4)$$
Here we have split the Kalman filter quantities as

$$K_t = \begin{pmatrix} K_t^x \\ K_t^\theta \end{pmatrix}, \qquad P_{t|t-1} = \begin{pmatrix} P_t^x & P_t^{x\theta} \\ P_t^{\theta x} & P_t^{\theta} \end{pmatrix},$$

so the covariance matrix of the change (fault component) is P_t^θ. Note that K_t^θ = 0 before the change. The following alternatives directly appear:
• Kalman filter-based adaptive filtering, where the state noise covariance ...

The special structure of the state space model can be used to derive lower order filters. The basic idea is that the residuals from a Kalman filter, assuming no change, can be expressed as a linear regression.
Linear regression formulation
The nominal Kalman filter, assuming no abrupt change, is applied, and the additive change is expressed as a linear regression with the innovations as measurements, with the following notation:

Kalman filter → x̂_{t|t-1}, ε_t
Auxiliary recursion → φ_t, μ_t
Residual regression: ε_t = φ_t^T ν + e_t
Compensation: x̂_t = x̂_{t|t-1} + μ_t ν̂
The third equation indicates that we can use RLS to estimate the change ν, and the fourth equation shows how to solve the compensation problem after detection of the change and estimation (isolation) of ν.
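A sketch of the third and fourth equations follows, in batch least squares form (the batch solution is what RLS computes recursively). The weighting by the innovation covariances S_t is an assumption here, chosen to be consistent with the GLR quantities R_N(k) and f_N(k) used later in the chapter.

    import numpy as np

    def estimate_nu(phis, eps, S):
        # Least squares estimate of nu from the residual regression
        # eps_t = phi_t^T nu + e_t (third equation above); phis[t] holds
        # phi_t, eps[t] the innovation, S[t] its covariance (assumed weights).
        R_bar = sum(p @ np.linalg.inv(s) @ p.T for p, s in zip(phis, S))
        f = sum(p @ np.linalg.inv(s) @ e for p, e, s in zip(phis, eps, S))
        return np.linalg.solve(R_bar, f)

    # Compensation (fourth equation): shift the nominal estimate by the
    # detected change, x_comp = x_pred + mu_t @ nu_hat.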
Chapter 10 gives an alternative approach to this problem, where the change is not explicitly parameterized.
9.2 The likelihood approach

Some modifications of the Kalman filter equations are given, and the likelihood ratio is defined for the problem at hand.
9.2.1 Notation
The Kalman filter equations for a change ν ∈ N(0, P_ν) at a given time k follow directly from (8.34)-(8.37), by considering ν as an extra state noise component v_t = δ_{t-k} ν with covariance Q_t = δ_{t-k} P_ν.
The addressed problem is to modify these equations to the case where k and ν are unknown. The change instant k is of primary interest, but good state estimates may also be desired.
In GLR, ν is an unknown constant, while it is considered as a stochastic variable in the MLR test. To start with, the change will be assumed to have a Gaussian prior. Later on, a non-informative prior will be used, which is sometimes called a prior of ignorance; see Lehmann (1991). This prior is characterized by a constant density function, p(ν) = C.
We can use Eq. (9.1) to detect abrupt changes in the mean of a sequence of stochastic variables by letting A_t = 1, C_t = 1, Q_t = 0, B_{u,t} = 0. Furthermore, if the mean before the change is supposed to be 0, a case often considered in the literature (see Basseville and Nikiforov (1993)), we have x_0 = 0 and Π_0 = 0.
It is worth mentioning that parametric models from Part III can fit this framework as well.
By letting A_t = I and C_t = (y_{t-1}, y_{t-2}, ..., u_{t-1}, u_{t-2}, ...), a special case of equation (9.1) is obtained. We then have a linear regression description of an ARX model, where x_t is the (time-varying) parameter vector and C_t contains the regressors. In this way, we can detect abrupt changes in the transfer function of ARX models. Note that the change occurs in the dynamics of the system in this case, and not in the system's state.
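The two special cases translate directly into parameter choices. A sketch (the regressor ordering follows the text literally; the lag orders na, nb are illustrative):

    import numpy as np

    # Change-in-the-mean special case of (9.1): A_t = 1, C_t = 1, Q_t = 0,
    # B_{u,t} = 0, so x_t is a constant level and y_t = x_t + e_t.
    A, C, Q, B_u = 1.0, 1.0, 0.0, 0.0

    # ARX special case: A_t = I and C_t is the regressor row built from
    # past data; x_t then holds the (time-varying) ARX parameters.
    def arx_regressor(y, u, t, na=2, nb=2):
        return np.concatenate([y[t - na:t][::-1], u[t - nb:t][::-1]])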
9.2.2 Likelihood
The likelihood for the measurements up to time N, given the change ν at time k, is denoted p(y^N | k, ν). The same notation is used for the conditional density function for y^N, given k, ν. For simplicity, k = N is agreed to mean no change.
There are two principally different possibilities to estimate the change time k:
• Joint ML estimate of k and ν,

$$(\hat k, \hat\nu) = \arg\max_{k \in [1,N],\, \nu} p(y^N | k, \nu). \qquad (9.5)$$

Here arg max_{k∈[1,N],ν} p(y^N | k, ν) means the maximizing arguments of the likelihood p(y^N | k, ν), where k is restricted to [1, N].
• The ML estimate of just k, using marginalization of the conditional density function p(y^N | k, ν):

$$p(y^N | k) = \int p(y^N | k, \nu)\, p(\nu)\, d\nu \qquad (9.6)$$
$$\hat k = \arg\max_{k \in [1,N]} p(y^N | k). \qquad (9.7)$$

The likelihood for data given just k in (9.6) is the starting point in this approach.
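For intuition, consider (9.6) in the simplest scalar Gaussian case (an illustrative example, not from the text): with a single observation satisfying y | ν ∈ N(ν, σ²) and the prior ν ∈ N(0, P_ν),

$$p(y | k) = \int \mathrm{N}(y - \nu, \sigma^2)\, \mathrm{N}(\nu, P_\nu)\, d\nu = \mathrm{N}(y, \sigma^2 + P_\nu),$$

so marginalization simply inflates the innovation variance by the prior variance of the change.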
A tool in the derivations is the so-called flat prior, of the form p(ν) = C, which is not a proper density function. See Section 7.3.3 for a discussion and two examples for the parametric case, whose conclusions are applicable here as well.
9.2.3 Likelihood ratio
In the context of hypothesis testing, the likelihood ratios, rather than the likelihoods, are used. The LR test is a multiple hypothesis test, where the different change hypotheses are compared to the no change hypothesis pairwise. In the LR test, the change magnitude is assumed to be known. The hypotheses under consideration are

H_0: no change
H_1(k, ν): a change of magnitude ν at time k.
The test is as follows. Introduce the log likelihood ratio for the hypotheses as the test statistic:

$$l_N(k, \nu) = 2 \log \frac{p(y^N | H_1(k, \nu))}{p(y^N | H_0)}. \qquad (9.8)$$

The factor 2 is just for notational convenience. We use the convention that H_1(N, ν) = H_0, so again, k = N means no change. Then the LR estimate can be expressed as

$$\hat k = \arg\max_k\, l_N(k, \nu), \qquad (9.9)$$

when ν is known. Exactly as in (9.5) and (9.7), we have two possibilities of how to eliminate the unknown nuisance parameter ν. Double maximization gives the GLR test, proposed for change detection in Willsky and Jones (1976), and marginalization the MLR test, proposed in Gustafsson (1996).
9.3 The GLR test
Why not just use the augmented state space model (9.3) and the Kalman filter equations in (9.4)? It would be straightforward to evaluate the likelihood ratios in (9.8) for each possible k. The answer is as follows:
The GLR algorithm is mainly a computational tool that splits the Kalman filter for the full order model (9.3) into a low order Kalman filter (which is perhaps already designed and running) and a cascade coupled filter bank with least squares filters.
The GLR test proposed in Willsky and Jones (1976) utilizes this approach. GLR's general applicability has contributed to it now being a standard tool in change detection. As summarized in Kerr (1987), GLR has an appealing analytic framework, is widely understood by many researchers, and is readily applicable to systems already utilizing a Kalman filter. Another advantage of GLR is that it partially solves the isolation problem in fault detection, i.e., to locate the physical cause of the change. In Kerr (1987), a number of drawbacks of GLR are pointed out as well. Among these, we mention problems with choosing decision thresholds, and for some applications an untenable computational burden.
The use of likelihood ratios in hypothesis testing is motivated by the Neyman-Pearson lemma; see, for instance, Theorem 3.1 in Lehmann (1991).
In the application considered here, it says that the likelihood ratio is the optimal test statistic when the change magnitude is known and just one change time is considered. This is not the case here, but a sub-optimal extension is immediate: the test is computed for each possible change time, or for a restriction to a sliding window, and if several tests indicate a change, the most significant one is taken as the estimated change time. In GLR, the actual change in the state of a linear system is estimated from data and then used in the likelihood ratio. Starting with the likelihood ratio in (9.8), the GLR test is a double maximization over k and ν,

$$\hat k = \arg\max_k\, l_N(k, \hat\nu(k)), \qquad \hat\nu(k) = \arg\max_\nu\, l_N(k, \nu),$$

where ν̂(k) is the maximum likelihood estimate of ν, given a change at time k. The change candidate k̂ in the GLR test is accepted if

$$l_N(\hat k, \hat\nu(\hat k)) > h. \qquad (9.10)$$
The threshold h characterizes a hypothesis test and distinguishes the GLR test from the ML method (9.5). Note that (9.5) is a special case of (9.10), where h = 0. If the zero-change hypothesis is rejected, the state estimate can easily be compensated for the detected change.
The idea in the implementation of GLR in Willsky and Jones (1976) is to make the dependence on ν explicit. This task is solved in Appendix 9.A. The key point is that the innovations from the Kalman filter (9.4) with k = N can be expressed as a linear regression in ν,

$$\varepsilon_t = \varphi_t^T(k)\, \nu + \varepsilon_t(k), \qquad (9.11)$$

where ε_t(k) are the innovations from the Kalman filter if ν and k were known. Here and in the sequel, non-indexed quantities such as ε_t are the output from the nominal Kalman filter, assuming no change. The GLR algorithm can be implemented as follows.
Algorithm 9.1 GLR

Given the signal model (9.1):

• Calculate the innovations from the Kalman filter (9.4), assuming no change.

• Compute the regressors φ_t(k) using the recursions of Lemma 9.7, initialized by zeros at time t = k. Here φ_t is n_x × 1 and ...
• A change candidate is given by k̂ = arg max_k l_N(k, ν̂(k)). It is accepted if l_N(k̂, ν̂(k̂)) is greater than some threshold h (otherwise k̂ = N), and the corresponding estimate of the change magnitude is given by ν̂_N(k̂) = R_N^{-1}(k̂) f_N(k̂).
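A sketch of Algorithm 9.1 for one hypothesized change time k follows. Since the recursions of Lemma 9.7 are not reproduced above, the signature recursions here are a standard GLR construction reconstructed from (9.11): alpha tracks the fault's effect on the true state and beta its effect on the nominal filter's estimate, both zero-initialized at t = k. K[t] and S[t] are assumed to be the gain and innovation covariance of the nominal Kalman filter.

    import numpy as np

    def glr_statistic(A, C, B_th, D_th, K, S, eps, k):
        # Returns l_N(k, nu_hat(k)) = f_N^T(k) nu_hat(k) and nu_hat(k);
        # a sketch, not the book's exact recursions.
        n, n_nu = A.shape[0], B_th.shape[1]
        alpha = np.zeros((n, n_nu))   # fault influence on the true state
        beta = np.zeros((n, n_nu))    # fault influence on the nominal estimate
        R_bar = np.zeros((n_nu, n_nu))
        f = np.zeros(n_nu)
        for t in range(k, len(eps)):
            phiT = C @ (alpha - beta) + D_th          # phi_t^T(k), p x n_nu
            S_inv = np.linalg.inv(S[t])
            R_bar += phiT.T @ S_inv @ phiT            # builds R_t(k)
            f += phiT.T @ S_inv @ eps[t]              # builds f_t(k)
            beta = A @ beta + K[t] @ phiT             # nominal filter reacts
            alpha = A @ alpha + B_th                  # step fault accumulates
        nu_hat = np.linalg.solve(R_bar, f)
        return f @ nu_hat, nu_hat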
We now make some comments on the algorithm:
• It can be shown that the test statistic l_N(k, ν̂(k)) under the null hypothesis is χ² distributed. Thus, given the confidence level of the test, the threshold h can be found from standard statistical tables. Note that this is a multiple hypothesis test performed for each k = 1, 2, ..., N − 1, so nothing can be said about the total confidence level.
• The regressor φ_t(k) is called a failure signature matrix in Willsky and Jones (1976).
• The regressors are pre-computable. Furthermore, if the system and the Kalman filter are time-invariant, the regressor is only a function of t − k, which simplifies the calculations.
• The formulation in Algorithm 9.1 is off-line. Since the test statistic involves a matrix inversion of R_N(k), a more efficient on-line method is as follows. From (9.34) and (9.37) we get

$$l_t(k, \hat\nu_t(k)) = f_t^T(k)\, \hat\nu_t(k),$$

where t is used as time index instead of N. The Recursive Least Squares algorithm can be used to update ν̂_t(k) recursively, eliminating the matrix inversion of R_t(k). Thus, the best implementation requires t parallel RLS schemes and one Kalman filter.

The choice of threshold is difficult. It depends not only upon the system's signal-to-noise ratio, but also on the actual noise levels, as will be pointed out in Section 9.4.3.
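Given the χ² comment above, a per-test threshold for a desired confidence level can be looked up numerically rather than from tables; a sketch (taking the degrees of freedom equal to dim ν is my assumption):

    from scipy.stats import chi2

    # Per-hypothesis threshold at 99% confidence; df = dim(nu) (assumed).
    h = chi2.ppf(0.99, df=2)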
Example 9.3 DC motor: the GLR test
Consider the DC motor in Example 8.4. Assume impulsive additive state changes at times 60, 80, 100 and 120. First the angle is increased by five units, and then decreased again. Then the same fault is simulated on the angular velocity. That is,

$$\nu_1 = \begin{pmatrix} 5 \\ 0 \end{pmatrix}, \quad \nu_2 = \begin{pmatrix} -5 \\ 0 \end{pmatrix}, \quad \nu_3 = \begin{pmatrix} 0 \\ 5 \end{pmatrix}, \quad \nu_4 = \begin{pmatrix} 0 \\ -5 \end{pmatrix}.$$
[Figure: test statistic and threshold. Note, in particular, the improved angle tracking of GLR.]
Figure 9.2(a) shows how the maximum value max_{t−L<k<t} l_t(k) of the test statistics evolves in time, and how it exceeds the threshold level h = 10 four times. The delay for detection is three samples for the angular change and five samples for the velocity change.
The GLR state estimate adapts to the true state, as shown in Figure 9.2(b). The Kalman filter also comes back to the true state, but much more slowly. The change identification is not very reliable: compared to the simulated changes, the estimates look like random numbers. The explanation is that detection is so fast that there are too few data for fault estimation. To get good isolation, we have to wait and get considerably more data. The incorrect compensation explains the short transients we can see in the angular velocity estimate.
Navigation examples, and references to such, are presented in Kerr (1987). As a non-standard application, GLR is applied to noise suppression in image processing in Hong and Brzakovic (1980).
9.4 The MLR test
Another alternative is to consider the change magnitude as a stochastic nuisance parameter. This is then eliminated not by estimation, but by marginalization. Marginalization is well known in estimation theory, and is also used in other detection problems; see, for instance, Wald (1950). The resulting test will be called the Marginalized Likelihood Ratio (MLR) test. The MLR test applies to all cases where GLR does, but we point out three advantages with the former:
• Tuning. Unlike GLR, there is no sensitive threshold to choose in MLR. One interpretation is that a reasonable threshold in GLR is chosen automatically.
• Robustness to modeling errors. The performance of GLR deteriorates in the case of incorrectly chosen noise variances. The noise level in MLR can be considered as another unknown nuisance parameter, which increases the robustness of MLR.
• Complexity. GLR requires a linearly increasing number of parallel filters. An approximation involving a sliding window technique is proposed in Willsky and Jones (1976) to obtain a constant number of filters, typically equivalent to 10-20 parallel filters. For off-line processing, the MLR test can be computed exactly from only two filters. This implementation is of particular importance in the design step, where the false alarm rate, robustness properties and detectability of different changes can be evaluated quickly using Monte Carlo simulations. In fact, the computation of one single exact GLR test for a realistic data size (>1000) is already far from interactive.
9.4.1 Relation between GLR and MLR

In Appendix 9.B the MLR test is derived using the quantities from the GLR test in Algorithm 9.1. This derivation gives a nice relationship between GLR and MLR:

Theorem 9.1
If (9.1) is time invariant and ν is unknown, then the GLR test in Algorithm 9.1 gives the same estimated change time as the MLR test in Theorem 9.8 as N − k → ∞ and k → ∞, if the threshold is chosen as

$$h = p \log(2\pi) + \log\det \bar R(k) - 2 \log p_\nu(\hat\nu)$$

when the prior of the jump is ν ∈ N(ν_0, P_ν), and

$$h = \log\det \bar R(k)$$

for a flat prior. Here R̄(k) = lim_{N−k→∞, k→∞} R_N(k), and R_N(k) is defined in Algorithm 9.1.
Proof: In the MLR test a change k is detected if l_N(k) > l_N(N) = 0, and in the GLR test if l_N(k, ν̂(k)) > h. From Theorem 9.8 we have l_N(k) = l_N(k, ν̂(k)) + 2 log p_ν(ν̂) − log det R_N(k) − p log(2π). Lemma 9.9 shows that R_N(k) converges as N → ∞, and so does log det R_N(k). Since (9.1) is restricted to be time invariant, the terms of R̄(k) that depend on the system matrices and the Kalman gain are the same independently of k as k → ∞, according to (9.28). □
We now make a new derivation of the MLR test in a direct way, using a linearly increasing number of Kalman filters. This derivation enables, firstly, the efficient implementation in Section 9.4.2 and, secondly, the elimination of noise scalings in Section 9.4.3. Since the magnitudes of the likelihoods turn out to be of completely different orders, the log likelihood will be used in order to avoid possible numerical problems.
Theorem 9.2
Consider the signal model (9.1), where the covariance matrix of the Gaussian distributed jump magnitude is P_ν. For each k = 1, 2, ..., t, update the k'th Kalman filter in (9.4). The log likelihood, conditioned on a jump at time k, can be recursively computed by

$$\log p(y^t | k) = \log p(y^{t-1} | k) - \frac{p}{2}\log 2\pi - \frac{1}{2}\log\det S_t(k) - \frac{1}{2}\varepsilon_t^T(k)\, S_t^{-1}(k)\, \varepsilon_t(k),$$

where ε_t(k) = y_t − C_t x̂_{t|t−1}(k) and S_t(k) = C_t P_{t|t−1}(k) C_t^T + R_t.
Proof: It is a well-known property of the Kalman filter (see, for instance, Anderson and Moore (1979)) that

$$y_t | k \in \mathrm{N}\left( C_t \hat x_{t|t-1}(k),\; C_t P_{t|t-1}(k) C_t^T + R_t \right),$$

and the result follows from the definition of the Gaussian density function. □
This approach requires a number of Kalman filters growing linearly with N.
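One step of the Theorem 9.2 recursion can be coded directly; a sketch, where eps_k and S_k are assumed to be the innovation and its covariance from the k-th filter at the current time:

    import numpy as np

    def loglik_step(logp_prev, eps_k, S_k):
        # log p(y^t | k) = log p(y^{t-1} | k) + log N(eps_t(k); 0, S_t(k)),
        # using slogdet and solve for numerical robustness.
        p = len(eps_k)
        _, logdet = np.linalg.slogdet(S_k)
        return (logp_prev - 0.5 * p * np.log(2.0 * np.pi) - 0.5 * logdet
                - 0.5 * eps_k @ np.linalg.solve(S_k, eps_k))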
9.4.2 A two-filter implementation
To compute the likelihood ratios efficiently, two statistical tricks are needed:

• Use a flat prior on the jump magnitude ν.
• Use some of the last observations for calculating proper distributions.

The point with the former is that the measurements after the jump become independent of the measurements before the jump, and the likelihood can be computed as a product of the likelihoods before and after the jump. However, this leads to a problem: the likelihood is not uniquely defined immediately after a jump of infinite variance. Therefore, a small part of the data is used for initialization. We also have to assume that A_t in (9.1) is invertible.
The key point in the derivation is the backward model presented in Chapter 8 when discussing smoothing algorithms. The problem here, which is not apparent in smoothing, is that the 'prior' Π_N = E[x_N x_N^T] in the backward recursion generally depends upon k, so we must be careful in using a common Kalman filter for all hypotheses. For this reason, the assumption of infinite variance of the jump magnitude is needed, so Π_N is infinite for all k as well. By infinite we mean that Π^{-1} = 0. The recursion Π_{t+1} = A_t Π_t A_t^T + Q_t then gives Π_N^{-1} = 0. The backward model for non-singular A_t becomes

$$x_t = A_t^{-1} x_{t+1} - A_t^{-1} v_t = A_t^{-1} x_{t+1} + v_t^B. \qquad (9.12)$$

Here Q_t^B = E[v_t^B (v_t^B)^T] = A_t^{-1} Q_t A_t^{-T} and (Π_N^B)^{-1} = 0, where Π_N^B = E[x_N x_N^T].
We now have the backward model and can simply apply the Kalman filter to obtain the estimate x̂^B_{t|t+1} and its covariance matrix P^B_{t|t+1}.
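Forming the backward model quantities is a one-liner; a sketch (the names are my own):

    import numpy as np

    def backward_model(A, Q):
        # Backward model (9.12): x_t = A^{-1} x_{t+1} + v_t^B, with
        # Q^B = A^{-1} Q A^{-T}; requires an invertible A.
        A_inv = np.linalg.inv(A)
        return A_inv, A_inv @ Q @ A_inv.T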
The likelihoods, rather than likelihood ratios, will be derived. The last L measurements are used for normalization, which means that jumps after time N − L are not considered. This is not a serious restriction, since it suffices to choose L = dim x, and jumps supported by so little data cannot be detected with any significance in any case.
We are now ready for the main result of this section.
Trang 14Theorem 9.3
Consider the signal model (9.1) for the case o f an invertible At T h e likeli- hood for the measurements conditioned on a jump at time lc and the last L measurements, can be computed by two Kalman filters as follows First, the likelihoods axe separated,
T h e likelihoods involved are computed by
Here ?(X - p, P ) is the Gaussian probability density function The quantities
?ctpl and PGpl axe given by the Kalman filter applied to the forward model and P$+, and P$+l axe given by the Kalman filter applied on the backward model (9.12) The quantities and PN used for normalization are
given by the Kalman filter applied on the ?-l orwaxd model initiated at time
t = N - L + 1 with PN-L+~IN-L = I I N - L + ~
Proof: Bayes' law gives

$$p(y^N | k) = p(y^k | k)\, p(y_{k+1}^N | y^k, k) \qquad (9.17)$$
$$= p(y^k)\, p(y_{k+1}^N | k). \qquad (9.18)$$

The fact that the jump at time k does not affect the measurements before time k (by causality) is used in the last equality, so p(y^k | k) = p(y^k). Here, the infinite variance jump makes the measurements after the jump independent of those before.
The likelihood for a set of measurements can be expanded either forwards or backwards using Bayes' chain rule:

$$p(y_m^N) = \prod_{t=m}^{N} p(y_t | y_m^{t-1}) \qquad (9.21)$$
$$p(y_m^N) = \prod_{t=m}^{N} p(y_t | y_{t+1}^N). \qquad (9.22)$$
Now p(y^N | k = N) and p(y^k) are computed using the forward recursion (9.21), and since x_t is Gaussian, it follows immediately that y_t | y^{t−1} is Gaussian with mean C_t x̂_{t|t−1} and covariance C_t P_{t|t−1} C_t^T + R_t, and (9.14) follows.
Also, p(y$-,+,lk = N ) is computed in the same way; the difference is that
the Kalman filter is initiated at time N - L + 1 Finally, p(yr<Lly$-L+l, k )
is computed using (9.22) where ytlygl is Gaussian with mean Cti?:t+l and
As can be seen, all that is needed to compute the likelihoods is one Kalman filter running backwards in time, one running forwards in time, and one processing the normalizing data at the end. The resulting algorithm is as follows, where the log likelihoods are used because of possible numerical problems caused by very large differences in the magnitudes of the likelihoods. The notation introduced here will be used in the sequel.
Algorithm 9.2 Two-filter detection
The likelihood, given in Theorem 9.3, of a jump at time k, k = 1, 2, ..., N, is computed with two filters as follows.

Forward filter, for t = 1, 2, ..., N:

Normalization filter, for t = N − L + 1, N − L + 2, ..., N:
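Once the three filters have been run, the per-hypothesis pieces are combined additively in the log domain. A sketch follows; since the separated-likelihood equations of Theorem 9.3 are not reproduced above, the exact bookkeeping here is an assumption consistent with the proof.

    def mlr_change_time(lp_fwd, lp_bwd, lp_norm):
        # lp_fwd[k]: log p(y^k) from the forward filter,
        # lp_bwd[k]: log p(y_{k+1}^{N-L} | y_{N-L+1}^N, k) from the backward filter,
        # lp_norm:   log p(y_{N-L+1}^N) from the normalization filter
        #            (constant over k; kept so the values are proper log likelihoods).
        # Jumps after N - L are not considered (see the text).
        logp = {k: lp_fwd[k] + lp_bwd[k] + lp_norm for k in lp_fwd}
        return max(logp, key=logp.get)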