9
Change detection based on likelihood ratios
9.1 Basics
9.2 The likelihood approach
9.2.1 Notation
9.2.2 Likelihood
9.2.3 Likelihood ratio
9.3 The GLR test
9.4 The MLR test
9.4.1 Relation between GLR and MLR
9.4.2 A two-filter implementation
9.4.3 Marginalization of the noise level
9.4.4 State and variance jump
9.4.5 Summary
9.5 Simulation study
9.5.1 A Monte Carlo simulation
9.5.2 Complexity
9.A Derivation of the GLR test
9.A.1 Regression model for the jump
9.A.2 The GLR test
9.B LS-based derivation of the MLR test
This chapter is devoted to the problem of detecting additive abrupt changes in linear state space models. Sensor and actuator faults, appearing as a sudden offset or a drift, can all be modeled as additive changes. In addition, disturbances are traditionally modeled as additive state changes. The likelihood ratio formulation provides a general framework for detecting such changes and for isolating the fault/disturbance.
The state space model studied in this chapter is

$$x_{t+1} = A_t x_t + B_{u,t} u_t + B_{v,t} v_t + \sigma_{t-k} B_{\theta,t}\,\nu \qquad (9.1)$$
$$y_t = C_t x_t + e_t + D_{u,t} u_t + \sigma_{t-k} D_{\theta,t}\,\nu \qquad (9.2)$$
The additive change (fault) ν enters at time k as a step (σ_t denotes the step function). Here v_t, e_t and x_0 are assumed to be independent Gaussian variables:

v_t ∈ N(0, Q_t)
e_t ∈ N(0, R_t)
x_0 ∈ N(x_0, Π_0)

Furthermore, they are assumed to be mutually independent. The state change ν occurs at the unknown time instant k, and δ(j) is the pulse function that is one if j = 0 and zero otherwise. The set of measurements y_1, y_2, ..., y_N, each of dimension p, will be denoted y^N, and y_t^N denotes the set y_t, y_{t+1}, ..., y_N.
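To make the notation concrete, the model can be simulated directly. The following minimal sketch generates data from a scalar instance of (9.1)-(9.2); all numerical values, and the scalar setting itself, are illustrative assumptions rather than anything taken from the text.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative scalar instance of (9.1)-(9.2); every numerical value
    # here (system, noise levels, change time k, magnitude nu) is assumed.
    A, C = 0.9, 1.0              # A_t, C_t
    B_th, D_th = 1.0, 0.0        # fault couplings B_{theta,t}, D_{theta,t}
    Q, R = 0.01, 0.1             # Cov(v_t), Cov(e_t)
    k, nu, N = 50, 2.0, 100      # change time, change magnitude, data length

    x = 0.0                      # x_0
    y = np.zeros(N)
    for t in range(N):
        step = 1.0 if t >= k else 0.0                    # sigma_{t-k}
        y[t] = C * x + rng.normal(0.0, np.sqrt(R)) + step * D_th * nu
        x = A * x + rng.normal(0.0, np.sqrt(Q)) + step * B_th * nu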
This formulation of the change detection problem can be interpreted as an input observer or input estimator approach. A similar model is used in Chapter 11.
To motivate the ideas of this chapter, let us consider the augmented state space model, assuming a change at time t = k (compare with Examples 6.6 and 8.2.4):

$$\begin{pmatrix} x_{t+1} \\ \theta_{t+1} \end{pmatrix} = \begin{pmatrix} A_t & B_{\theta,t} \\ 0 & I \end{pmatrix} \begin{pmatrix} x_t \\ \theta_t \end{pmatrix} + \begin{pmatrix} B_{u,t} \\ 0 \end{pmatrix} u_t + \begin{pmatrix} B_{v,t} \\ 0 \end{pmatrix} v_t \qquad (9.3)$$
$$y_t = \begin{pmatrix} C_t & D_{\theta,t} \end{pmatrix} \begin{pmatrix} x_t \\ \theta_t \end{pmatrix} + e_t + D_{u,t} u_t$$

That is, at time t = k the parameter value changes as a step from θ_t = 0 for t < k to θ_t = ν for t ≥ k. It should be noted that ν and θ both denote the magnitude of the additive change, but the former is seen as an input and the latter as a state, or parameter.
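The augmented matrices can be assembled mechanically. A sketch follows; the partitioning mirrors the reconstruction of (9.3) above, and the function and variable names are my own.

    import numpy as np

    def augment(A, B_u, B_v, C, D_th, B_th):
        # Augmented model (9.3) with state (x_t; theta_t); a sketch of the
        # block partitioning, not the book's own code.
        n, m = A.shape[0], B_th.shape[1]
        A_bar = np.block([[A, B_th], [np.zeros((m, n)), np.eye(m)]])
        B_u_bar = np.vstack([B_u, np.zeros((m, B_u.shape[1]))])
        B_v_bar = np.vstack([B_v, np.zeros((m, B_v.shape[1]))])
        C_bar = np.hstack([C, D_th])
        return A_bar, B_u_bar, B_v_bar, C_bar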
The advantage of the re-parameterization is that we can apply the Kalman filter directly, with or without change detection, and we have an explicit fault state that can be used for fault isolation. The Kalman filter applied to the augmented state space model gives a parameter estimator

$$\hat\theta_{t+1|t} = \hat\theta_{t|t-1} + K_t^\theta \left( y_t - C_t \hat x_{t|t-1} - D_{\theta,t} \hat\theta_{t|t-1} - D_{u,t} u_t \right). \qquad (9.4)$$
Here we have split the Kalman filter quantities as

$$K_t = \begin{pmatrix} K_t^x \\ K_t^\theta \end{pmatrix}, \qquad P_{t|t-1} = \begin{pmatrix} P_t^x & P_t^{x\theta} \\ P_t^{\theta x} & P_t^{\theta} \end{pmatrix},$$

so the covariance matrix of the change (fault component) is P_t^θ. Note that K_t^θ = 0 before the change. The following alternatives directly appear:
• Kalman filter-based adaptive filtering, where the state noise covariance ...

The special structure of the state space model can be used to derive lower order filters. The basic idea is that the residuals from a Kalman filter, assuming no change, can be expressed as a linear regression.
Linear regression formulation
The nominal Kalman filter, assuming no abrupt change, is applied, and the additive change is expressed as a linear regression with the innovations as measurements, with the following notation:

Kalman filter → x̂_{t|t-1}, ε_t
Auxiliary recursion → φ_t, μ_t
Residual regression: ε_t = φ_t^T ν + e_t
Compensation: x̂_t = x̂_{t|t-1} + μ_t ν̂
The third equation indicates that we can use RLS to estimate the change ν, and the fourth equation shows how to solve the compensation problem after detection of the change and estimation (isolation) of ν.
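A sketch of the third and fourth equations follows, in batch least squares form (the batch solution is what RLS computes recursively). The weighting by the innovation covariances S_t is an assumption here, chosen to be consistent with the GLR quantities R_N(k) and f_N(k) used later in the chapter.

    import numpy as np

    def estimate_nu(phis, eps, S):
        # Least squares estimate of nu from the residual regression
        # eps_t = phi_t^T nu + e_t (third equation above); phis[t] holds
        # phi_t, eps[t] the innovation, S[t] its covariance (assumed weights).
        R_bar = sum(p @ np.linalg.inv(s) @ p.T for p, s in zip(phis, S))
        f = sum(p @ np.linalg.inv(s) @ e for p, e, s in zip(phis, eps, S))
        return np.linalg.solve(R_bar, f)

    # Compensation (fourth equation): shift the nominal estimate by the
    # detected change, x_comp = x_pred + mu_t @ nu_hat.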
Chapter 10 gives an alternative approach to this problem, where the change is not explicitly parameterized.
9.2 The likelihood approach

Some modifications of the Kalman filter equations are given, and the likelihood ratio is defined for the problem at hand.
9.2.1 Notation
The Kalman filter equations for a change ν ∈ N(0, P_ν) at a given time k follow directly from (8.34)-(8.37), by considering ν as an extra state noise component v_t = δ_{t-k} ν with covariance Q_t = δ_{t-k} P_ν.
The addressed problem is to modify these equations to the case where k and ν are unknown. The change instant k is of primary interest, but good state estimates may also be desired.
In GLR, ν is an unknown constant, while it is considered as a stochastic variable in the MLR test. To start with, the change will be assumed to have a Gaussian prior. Later on, a non-informative prior will be used, which is sometimes called a prior of ignorance; see Lehmann (1991). This prior is characterized by a constant density function, p(ν) = C.
We can use Eq. (9.1) to detect abrupt changes in the mean of a sequence of stochastic variables by letting A_t = 1, C_t = 1, Q_t = 0, B_{u,t} = 0. Furthermore, if the mean before the change is supposed to be 0, a case often considered in the literature (see Basseville and Nikiforov (1993)), we have x_0 = 0 and Π_0 = 0.
It is worth mentioning that parametric models from Part III can fit this framework as well.
By letting A_t = I and C_t = (y_{t-1}, y_{t-2}, ..., u_{t-1}, u_{t-2}, ...), a special case of equation (9.1) is obtained. We then have a linear regression description of an ARX model, where x_t is the (time-varying) parameter vector and C_t contains the regressors. In this way, we can detect abrupt changes in the transfer function of ARX models. Note that the change occurs in the dynamics of the system in this case, and not in the system's state.
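The two special cases translate directly into parameter choices. A sketch (the regressor ordering follows the text literally; the lag orders na, nb are illustrative):

    import numpy as np

    # Change-in-the-mean special case of (9.1): A_t = 1, C_t = 1, Q_t = 0,
    # B_{u,t} = 0, so x_t is a constant level and y_t = x_t + e_t.
    A, C, Q, B_u = 1.0, 1.0, 0.0, 0.0

    # ARX special case: A_t = I and C_t is the regressor row built from
    # past data; x_t then holds the (time-varying) ARX parameters.
    def arx_regressor(y, u, t, na=2, nb=2):
        return np.concatenate([y[t - na:t][::-1], u[t - nb:t][::-1]])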
9.2.2 Likelihood
The likelihood for the measurements up to time N, given the change ν at time k, is denoted p(y^N | k, ν). The same notation is used for the conditional density function for y^N, given k, ν. For simplicity, k = N is agreed to mean no change.
There are two principally different possibilities to estimate the change time k:
• Joint ML estimate of k and ν,

$$(\hat k, \hat\nu) = \arg\max_{k \in [1,N],\, \nu} p(y^N | k, \nu). \qquad (9.5)$$

Here arg max_{k∈[1,N],ν} p(y^N | k, ν) means the maximizing arguments of the likelihood p(y^N | k, ν), where k is restricted to [1, N].
• The ML estimate of just k, using marginalization of the conditional density function p(y^N | k, ν):

$$p(y^N | k) = \int p(y^N | k, \nu)\, p(\nu)\, d\nu \qquad (9.6)$$
$$\hat k = \arg\max_{k \in [1,N]} p(y^N | k). \qquad (9.7)$$

The likelihood for data given just k in (9.6) is the starting point in this approach.
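For intuition, consider (9.6) in the simplest scalar Gaussian case (an illustrative example, not from the text): with a single observation satisfying y | ν ∈ N(ν, σ²) and the prior ν ∈ N(0, P_ν),

$$p(y | k) = \int \mathrm{N}(y - \nu, \sigma^2)\, \mathrm{N}(\nu, P_\nu)\, d\nu = \mathrm{N}(y, \sigma^2 + P_\nu),$$

so marginalization simply inflates the innovation variance by the prior variance of the change.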
A tool in the derivations is the so-called flat prior, of the form p(ν) = C, which is not a proper density function. See Section 7.3.3 for a discussion and two examples for the parametric case, whose conclusions are applicable here as well.
9.2.3 Likelihood ratio
In the context of hypothesis testing, the likelihood ratios, rather than the likelihoods, are used. The LR test is a multiple hypothesis test, where the different change hypotheses are compared to the no change hypothesis pairwise. In the LR test, the change magnitude is assumed to be known. The hypotheses under consideration are

H_0: no change
H_1(k, ν): a change of magnitude ν at time k.
The test is as follows. Introduce the log likelihood ratio for the hypotheses as the test statistic:

$$l_N(k, \nu) = 2 \log \frac{p(y^N | H_1(k, \nu))}{p(y^N | H_0)}. \qquad (9.8)$$

The factor 2 is just for notational convenience. We use the convention that H_1(N, ν) = H_0, so again, k = N means no change. Then the LR estimate can be expressed as

$$\hat k = \arg\max_k\, l_N(k, \nu), \qquad (9.9)$$

when ν is known. Exactly as in (9.5) and (9.7), we have two possibilities of how to eliminate the unknown nuisance parameter ν. Double maximization gives the GLR test, proposed for change detection in Willsky and Jones (1976), and marginalization the MLR test, proposed in Gustafsson (1996).
9.3 The GLR test
Why not just use the augmented state space model (9.3) and the Kalman filter equations in (9.4)? It would be straightforward to evaluate the likelihood ratios in (9.8) for each possible k. The answer is as follows:
The GLR algorithm is mainly a computational tool that splits the Kalman filter for the full order model (9.3) into a low order Kalman filter (which is perhaps already designed and running) and a cascade coupled filter bank with least squares filters.
The GLR test proposed in Willsky and Jones (1976) utilizes this approach. GLR's general applicability has contributed to it now being a standard tool in change detection. As summarized in Kerr (1987), GLR has an appealing analytic framework, is widely understood by many researchers, and is readily applicable to systems already utilizing a Kalman filter. Another advantage of GLR is that it partially solves the isolation problem in fault detection, i.e., to locate the physical cause of the change. In Kerr (1987), a number of drawbacks of GLR are pointed out as well. Among these, we mention problems with choosing decision thresholds, and for some applications an untenable computational burden.
The use of likelihood ratios in hypothesis testing is motivated by the Neyman-Pearson lemma; see, for instance, Theorem 3.1 in Lehmann (1991).
In the application considered here, it says that the likelihood ratio is the optimal test statistic when the change magnitude is known and just one change time is considered. This is not the case here, but a sub-optimal extension is immediate: the test is computed for each possible change time, or for a restriction to a sliding window, and if several tests indicate a change, the most significant one is taken as the estimated change time. In GLR, the actual change in the state of a linear system is estimated from data and then used in the likelihood ratio. Starting with the likelihood ratio in (9.8), the GLR test is a double maximization over k and ν,

$$\hat k = \arg\max_k\, l_N(k, \hat\nu(k)), \qquad \hat\nu(k) = \arg\max_\nu\, l_N(k, \nu),$$

where ν̂(k) is the maximum likelihood estimate of ν, given a change at time k. The change candidate k̂ in the GLR test is accepted if

$$l_N(\hat k, \hat\nu(\hat k)) > h. \qquad (9.10)$$
The threshold h characterizes a hypothesis test and distinguishes the GLR test from the ML method (9.5). Note that (9.5) is a special case of (9.10), where h = 0. If the zero-change hypothesis is rejected, the state estimate can easily be compensated for the detected change.
The idea in the implementation of GLR in Willsky and Jones (1976) is to make the dependence on ν explicit. This task is solved in Appendix 9.A. The key point is that the innovations from the Kalman filter (9.4) with k = N can be expressed as a linear regression in ν,

$$\varepsilon_t = \varphi_t^T(k)\, \nu + \varepsilon_t(k), \qquad (9.11)$$

where ε_t(k) are the innovations from the Kalman filter if ν and k were known. Here and in the sequel, non-indexed quantities such as ε_t are the output from the nominal Kalman filter, assuming no change. The GLR algorithm can be implemented as follows.
Algorithm 9.1 GLR

Given the signal model (9.1):

• Calculate the innovations from the Kalman filter (9.4), assuming no change.

• Compute the regressors φ_t(k) using the recursions of Lemma 9.7, initialized by zeros at time t = k. Here φ_t is n_x × 1 and ...
• A change candidate is given by k̂ = arg max_k l_N(k, ν̂(k)). It is accepted if l_N(k̂, ν̂(k̂)) is greater than some threshold h (otherwise k̂ = N), and the corresponding estimate of the change magnitude is given by ν̂_N(k̂) = R_N^{-1}(k̂) f_N(k̂).
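A sketch of Algorithm 9.1 for one hypothesized change time k follows. Since the recursions of Lemma 9.7 are not reproduced above, the signature recursions here are a standard GLR construction reconstructed from (9.11): alpha tracks the fault's effect on the true state and beta its effect on the nominal filter's estimate, both zero-initialized at t = k. K[t] and S[t] are assumed to be the gain and innovation covariance of the nominal Kalman filter.

    import numpy as np

    def glr_statistic(A, C, B_th, D_th, K, S, eps, k):
        # Returns l_N(k, nu_hat(k)) = f_N^T(k) nu_hat(k) and nu_hat(k);
        # a sketch, not the book's exact recursions.
        n, n_nu = A.shape[0], B_th.shape[1]
        alpha = np.zeros((n, n_nu))   # fault influence on the true state
        beta = np.zeros((n, n_nu))    # fault influence on the nominal estimate
        R_bar = np.zeros((n_nu, n_nu))
        f = np.zeros(n_nu)
        for t in range(k, len(eps)):
            phiT = C @ (alpha - beta) + D_th          # phi_t^T(k), p x n_nu
            S_inv = np.linalg.inv(S[t])
            R_bar += phiT.T @ S_inv @ phiT            # builds R_t(k)
            f += phiT.T @ S_inv @ eps[t]              # builds f_t(k)
            beta = A @ beta + K[t] @ phiT             # nominal filter reacts
            alpha = A @ alpha + B_th                  # step fault accumulates
        nu_hat = np.linalg.solve(R_bar, f)
        return f @ nu_hat, nu_hat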
We now make some comments on the algorithm:
• It can be shown that the test statistic l_N(k, ν̂(k)) under the null hypothesis is χ² distributed. Thus, given the confidence level of the test, the threshold h can be found from standard statistical tables. Note that this is a multiple hypothesis test performed for each k = 1, 2, ..., N − 1, so nothing can be said about the total confidence level.
• The regressor φ_t(k) is called a failure signature matrix in Willsky and Jones (1976).
• The regressors are pre-computable. Furthermore, if the system and the Kalman filter are time-invariant, the regressor is only a function of t − k, which simplifies the calculations.
• The formulation in Algorithm 9.1 is off-line. Since the test statistic involves a matrix inversion of R_N(k), a more efficient on-line method is as follows. From (9.34) and (9.37) we get

$$l_t(k, \hat\nu_t(k)) = f_t^T(k)\, \hat\nu_t(k),$$

where t is used as time index instead of N. The Recursive Least Squares algorithm can be used to update ν̂_t(k) recursively, eliminating the matrix inversion of R_t(k). Thus, the best implementation requires t parallel RLS schemes and one Kalman filter.

The choice of threshold is difficult. It depends not only upon the system's signal-to-noise ratio, but also on the actual noise levels, as will be pointed out in Section 9.4.3.
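Given the χ² comment above, a per-test threshold for a desired confidence level can be looked up numerically rather than from tables; a sketch (taking the degrees of freedom equal to dim ν is my assumption):

    from scipy.stats import chi2

    # Per-hypothesis threshold at 99% confidence; df = dim(nu) (assumed).
    h = chi2.ppf(0.99, df=2)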
Example 9.3 DC motor: the GLR test
Consider the DC motor in Example 8.4. Assume impulsive additive state changes at times 60, 80, 100 and 120. First the angle is increased by five units, and then decreased again. Then the same fault is simulated on the angular velocity. That is,

$$\nu_1 = \begin{pmatrix} 5 \\ 0 \end{pmatrix}, \quad \nu_2 = \begin{pmatrix} -5 \\ 0 \end{pmatrix}, \quad \nu_3 = \begin{pmatrix} 0 \\ 5 \end{pmatrix}, \quad \nu_4 = \begin{pmatrix} 0 \\ -5 \end{pmatrix}.$$
[Figure: test statistic and threshold. Note, in particular, the improved angle tracking of GLR.]
Figure 9.2(a) shows how the maximum value max_{t−L<k<t} l_t(k) of the test statistics evolves in time, and how it exceeds the threshold level h = 10 four times. The delay for detection is three samples for the angular change and five samples for the velocity change.
The GLR state estimate adapts to the true state, as shown in Figure 9.2(b). The Kalman filter also comes back to the true state, but much more slowly. The change identification is not very reliable: compared to the simulated changes, the estimates look like random numbers. The explanation is that detection is so fast that there are too few data for fault estimation. To get good isolation, we have to wait and get considerably more data. The incorrect compensation explains the short transients we can see in the angular velocity estimate.
Navigation examples, and references to such, are presented in Kerr (1987). As a non-standard application, GLR is applied to noise suppression in image processing in Hong and Brzakovic (1980).
9.4 The MLR test
Another alternative is to consider the change magnitude as a stochastic nuisance parameter. This is then eliminated not by estimation, but by marginalization. Marginalization is well known in estimation theory, and is also used in other detection problems; see, for instance, Wald (1950). The resulting test will be called the Marginalized Likelihood Ratio (MLR) test. The MLR test applies to all cases where GLR does, but we point out three advantages with the former:
• Tuning. Unlike GLR, there is no sensitive threshold to choose in MLR. One interpretation is that a reasonable threshold in GLR is chosen automatically.
• Robustness to modeling errors. The performance of GLR deteriorates in the case of incorrectly chosen noise variances. The noise level in MLR can be considered as another unknown nuisance parameter, which increases the robustness of MLR.
• Complexity. GLR requires a linearly increasing number of parallel filters. An approximation involving a sliding window technique is proposed in Willsky and Jones (1976) to obtain a constant number of filters, typically equivalent to 10-20 parallel filters. For off-line processing, the MLR test can be computed exactly from only two filters. This implementation is of particular importance in the design step, where the false alarm rate, robustness properties and detectability of different changes can be evaluated quickly using Monte Carlo simulations. In fact, the computation of one single exact GLR test for a realistic data size (>1000) is already far from interactive.
9.4.1 Relation between GLR and MLR

In Appendix 9.B the MLR test is derived using the quantities from the GLR test in Algorithm 9.1. This derivation gives a nice relationship between GLR and MLR:

Theorem 9.1
If (9.1) is time invariant and ν is unknown, then the GLR test in Algorithm 9.1 gives the same estimated change time as the MLR test in Theorem 9.8 as N − k → ∞ and k → ∞, if the threshold is chosen as

$$h = p \log(2\pi) + \log\det \bar R(k) - 2 \log p_\nu(\hat\nu)$$

when the prior of the jump is ν ∈ N(ν_0, P_ν), and

$$h = \log\det \bar R(k)$$

for a flat prior. Here R̄(k) = lim_{N−k→∞, k→∞} R_N(k), and R_N(k) is defined in Algorithm 9.1.
Proof: In the MLR test a change k is detected if l_N(k) > l_N(N) = 0, and in the GLR test if l_N(k, ν̂(k)) > h. From Theorem 9.8 we have l_N(k) = l_N(k, ν̂(k)) + 2 log p_ν(ν̂) − log det R_N(k) − p log(2π). Lemma 9.9 shows that R_N(k) converges as N → ∞, and so does log det R_N(k). Since (9.1) is restricted to be time invariant, the terms of R̄(k) that depend on the system matrices and the Kalman gain are the same independently of k as k → ∞, according to (9.28). □
We now make a new derivation of the MLR test in a direct way, using a linearly increasing number of Kalman filters. This derivation enables, firstly, the efficient implementation in Section 9.4.2 and, secondly, the elimination of noise scalings in Section 9.4.3. Since the magnitudes of the likelihoods turn out to be of completely different orders, the log likelihood will be used in order to avoid possible numerical problems.
Theorem 9.2
Consider the signal model (9.1), where the covariance matrix of the Gaussian distributed jump magnitude is P_ν. For each k = 1, 2, ..., t, update the k'th Kalman filter in (9.4). The log likelihood, conditioned on a jump at time k, can be recursively computed by

$$\log p(y^t | k) = \log p(y^{t-1} | k) - \frac{p}{2}\log 2\pi - \frac{1}{2}\log\det S_t(k) - \frac{1}{2}\varepsilon_t^T(k)\, S_t^{-1}(k)\, \varepsilon_t(k),$$

where ε_t(k) = y_t − C_t x̂_{t|t−1}(k) and S_t(k) = C_t P_{t|t−1}(k) C_t^T + R_t.
Proof: It is a well-known property of the Kalman filter (see, for instance, Anderson and Moore (1979)) that

$$y_t | k \in \mathrm{N}\left( C_t \hat x_{t|t-1}(k),\; C_t P_{t|t-1}(k) C_t^T + R_t \right),$$

and the result follows from the definition of the Gaussian density function. □
This approach requires a number of Kalman filters growing linearly with N.
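One step of the Theorem 9.2 recursion can be coded directly; a sketch, where eps_k and S_k are assumed to be the innovation and its covariance from the k-th filter at the current time:

    import numpy as np

    def loglik_step(logp_prev, eps_k, S_k):
        # log p(y^t | k) = log p(y^{t-1} | k) + log N(eps_t(k); 0, S_t(k)),
        # using slogdet and solve for numerical robustness.
        p = len(eps_k)
        _, logdet = np.linalg.slogdet(S_k)
        return (logp_prev - 0.5 * p * np.log(2.0 * np.pi) - 0.5 * logdet
                - 0.5 * eps_k @ np.linalg.solve(S_k, eps_k))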
9.4.2 A two-filter implementation
To compute the likelihood ratios efficiently, two statistical tricks are needed:

• Use a flat prior on the jump magnitude ν.
• Use some of the last observations for calculating proper distributions.

The point with the former is that the measurements after the jump become independent of the measurements before the jump, and the likelihood can be computed as a product of the likelihoods before and after the jump. However, this leads to a problem: the likelihood is not uniquely defined immediately after a jump of infinite variance. Therefore, a small part of the data is used for initialization. We also have to assume that A_t in (9.1) is invertible.
The key point in the derivation is the backward model presented in Chapter 8 when discussing smoothing algorithms. The problem here, which is not apparent in smoothing, is that the 'prior' Π_N = E[x_N x_N^T] in the backward recursion generally depends upon k, so we must be careful in using a common Kalman filter for all hypotheses. For this reason, the assumption of infinite variance of the jump magnitude is needed, so Π_N is infinite for all k as well. By infinite we mean that Π^{-1} = 0. The recursion Π_{t+1} = A_t Π_t A_t^T + Q_t then gives Π_N^{-1} = 0. The backward model for non-singular A_t becomes

$$x_t = A_t^{-1} x_{t+1} - A_t^{-1} v_t = A_t^{-1} x_{t+1} + v_t^B. \qquad (9.12)$$

Here Q_t^B = E[v_t^B (v_t^B)^T] = A_t^{-1} Q_t A_t^{-T} and (Π_N^B)^{-1} = 0, where Π_N^B = E[x_N x_N^T].
We now have the backward model and can simply apply the Kalman filter to obtain the estimate x̂^B_{t|t+1} and its covariance matrix P^B_{t|t+1}.
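Forming the backward model quantities is a one-liner; a sketch (the names are my own):

    import numpy as np

    def backward_model(A, Q):
        # Backward model (9.12): x_t = A^{-1} x_{t+1} + v_t^B, with
        # Q^B = A^{-1} Q A^{-T}; requires an invertible A.
        A_inv = np.linalg.inv(A)
        return A_inv, A_inv @ Q @ A_inv.T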
The likelihoods, rather than likelihood ratios, will be derived. The last L measurements are used for normalization, which means that jumps after time N − L are not considered. This is not a serious restriction, since it suffices to choose L = dim x, and jumps supported by so little data cannot be detected with any significance in any case.
We are now ready for the main result of this section.
Trang 14Theorem 9.3
Consider the signal model (9.1) for the case o f an invertible At T h e likeli- hood for the measurements conditioned on a jump at time lc and the last L measurements, can be computed by two Kalman filters as follows First, the likelihoods axe separated,
T h e likelihoods involved are computed by
Here ?(X - p, P ) is the Gaussian probability density function The quantities
?ctpl and PGpl axe given by the Kalman filter applied to the forward model and P$+, and P$+l axe given by the Kalman filter applied on the backward model (9.12) The quantities and PN used for normalization are
given by the Kalman filter applied on the ?-l orwaxd model initiated at time
t = N - L + 1 with PN-L+~IN-L = I I N - L + ~
Proof: Bayes' law gives

$$p(y^N | k) = p(y^k | k)\, p(y_{k+1}^N | y^k, k) \qquad (9.17)$$
$$= p(y^k)\, p(y_{k+1}^N | k). \qquad (9.18)$$

The fact that the jump at time k does not affect the measurements before time k (by causality) is used in the last equality, so p(y^k | k) = p(y^k). Here, the infinite variance jump makes the measurements after the jump independent of those before.
The likelihood for a set of measurements can be expanded either forwards or backwards using Bayes' chain rule:

$$p(y_m^N) = \prod_{t=m}^{N} p(y_t | y_m^{t-1}) \qquad (9.21)$$
$$p(y_m^N) = \prod_{t=m}^{N} p(y_t | y_{t+1}^N). \qquad (9.22)$$
Now p(y^N | k = N) and p(y^k) are computed using the forward recursion (9.21), and since x_t is Gaussian, it follows immediately that y_t | y^{t−1} is Gaussian with mean C_t x̂_{t|t−1} and covariance C_t P_{t|t−1} C_t^T + R_t, and (9.14) follows.
Also, p(y$-,+,lk = N ) is computed in the same way; the difference is that
the Kalman filter is initiated at time N - L + 1 Finally, p(yr<Lly$-L+l, k )
is computed using (9.22) where ytlygl is Gaussian with mean Cti?:t+l and
As can be seen, all that is needed to compute the likelihoods is one Kalman filter running backwards in time, one running forwards in time, and one processing the normalizing data at the end. The resulting algorithm is as follows, where the log likelihoods are used because of possible numerical problems caused by very large differences in the magnitudes of the likelihoods. The notation introduced here will be used in the sequel.
Algorithm 9.2 Two-filter detection
The likelihood, given in Theorem 9.3, of a jump at time k, k = 1, 2, ..., N, is computed with two filters as follows.

Forward filter, for t = 1, 2, ..., N:

Normalization filter, for t = N − L + 1, N − L + 2, ..., N:
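Once the three filters have been run, the per-hypothesis pieces are combined additively in the log domain. A sketch follows; since the separated-likelihood equations of Theorem 9.3 are not reproduced above, the exact bookkeeping here is an assumption consistent with the proof.

    def mlr_change_time(lp_fwd, lp_bwd, lp_norm):
        # lp_fwd[k]: log p(y^k) from the forward filter,
        # lp_bwd[k]: log p(y_{k+1}^{N-L} | y_{N-L+1}^N, k) from the backward filter,
        # lp_norm:   log p(y_{N-L+1}^N) from the normalization filter
        #            (constant over k; kept so the values are proper log likelihoods).
        # Jumps after N - L are not considered (see the text).
        logp = {k: lp_fwd[k] + lp_bwd[k] + lp_norm for k in lp_fwd}
        return max(logp, key=logp.get)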