
Adaptive Filtering and Change Detection, Part 3


DOCUMENT INFORMATION

Title: Adaptive Filtering and Change Detection
Author: Fredrik Gustafsson
Publisher: John Wiley & Sons, Ltd
Field: Signal Processing
Type: Book
Year: 2000
City: Hoboken
Pages: 32
Size: 1.05 MB


Contents



Part II: Signal estimation

ISBNs: 0-471-49287-6 (Hardback); 0-470-84161-3 (Electronic)


On-line approaches

3.1 Introduction
3.2 Filtering approaches
3.3 Summary of least squares approaches
3.3.1 Recursive least squares
3.3.2 The least squares over sliding window
3.3.3 Least mean square
3.3.4 The Kalman filter
3.4 Stopping rules and the CUSUM test
3.4.1 Distance measures
3.4.2 One-sided tests
3.4.3 Two-sided tests
3.4.4 The CUSUM adaptive filter
3.5 Likelihood based change detection
3.5.1 Likelihood theory
3.5.2 ML estimation of nuisance parameters
3.5.3 ML estimation of a single change time
3.5.4 Likelihood ratio ideas
3.5.5 Model validation based on sliding windows
3.6 Applications
3.6.1 Fuel monitoring
3.6.2 Paper refinery
3.A Derivations
3.A.1 Marginalization of likelihoods
3.A.2 Likelihood ratio approaches

The basic assumption in this part, signal estimation, is that the measurements y_t consist of a deterministic component θ_t (the signal) and additive white noise e_t:

y_t = θ_t + e_t. (3.1)


For change detection, this will be labeled as a change in the mean model. The task of determining θ_t from y_t will be referred to as estimation, and change detection, or alarming, is the task of finding abrupt, or rapid, changes in θ_t, which are assumed to start at time k, referred to as the change time. Surveillance comprises all these aspects, and a typical application is to monitor levels, flows and so on in industrial processes and alarm for abnormal values. The basic assumptions about model (3.1) in change detection are:

- The deterministic component θ_t undergoes an abrupt change at time t = k. Once this change is detected, the procedure starts all over again to detect the next change. The alternative is to consider θ_t as piecewise constant and focus on a sequence of change times k_1, k_2, ..., k_n, as shown in Chapter 4. This sequence is denoted k^n, where both k_i and n are free parameters. The segmentation problem is to find both the number and the locations of the change times in k^n.

- In the statistical approaches, it will be assumed that the noise is white and Gaussian, e_t ∈ N(0, R). However, the formulas can be generalized to other distributions, as will be pointed out in Section 3.A.

The change magnitude for a change at time k is defined as ν = θ_{k+1} − θ_k.

Change detection approaches can be divided into hypothesis tests and estimation/information approaches. Algorithms belonging to the class of hypothesis tests can be split into the parts shown in Figure 3.1.

For the change in the mean model, one or more of these blocks become trivial, but the picture is useful to keep in mind for the general model-based case. Estimation and information approaches do everything in one step, and do not suit the framework of Figure 3.1.

Figure 3.1 The steps in change detection based on hypothesis tests. The stopping rule can be seen as an averaging filter and a thresholding decision device.

The alternative to the non-parametric approach in this chapter is to model the deterministic component of y_t as a parametric model, and this issue will be dealt with in Part III. It must be noted that the signal model (3.1), and thus all methods in this part, are special cases of what will be covered in Part III.

This chapter presents a review of averaging strategies, stopping rules, and change detection ideas. Most of the ideas to follow in subsequent chapters are introduced here.

3.2 Filtering approaches

The standard approach in signal processing for separating the signal θ_t and the noise e_t is by (typically low-pass) filtering:

θ̂_t = H(q) y_t. (3.2)

The filter can be of Finite Impulse Response (FIR) or Infinite Impulse Response (IIR) type, and designed by any standard method (Butterworth, Chebyshev, etc.). An alternative interpretation of filters is data windowing:

θ̂_t = Σ_{k=0}^{∞} w_k y_{t−k}, (3.3)

where the weights should satisfy Σ_k w_k = 1. This is equal to the filtering approach if the weights are interpreted as the impulse response of the (low-pass) filter H(q), i.e. w_k = h_k.

An important special case of these is the exponential forgetting window, or Geometric Moving Average (GMA),

w_k = (1 − λ) λ^k,  0 ≤ λ < 1. (3.4)
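As an aside, applying the GMA window (3.4) as an explicit data window and realizing it as a first-order recursive (IIR) low-pass filter give identical estimates. A minimal sketch, assuming a zero initial condition; the function names are illustrative, not from the book:

```python
# Sketch: the GMA window w_k = (1 - lam)*lam**k applied as a data
# window equals a one-pole recursive low-pass filter of y_t.
lam = 0.9  # forgetting factor, 0 <= lam < 1

def gma_window(y, lam):
    """Windowed form (3.3): theta_t = sum_k w_k * y_{t-k}."""
    est = []
    for t in range(len(y)):
        est.append(sum((1 - lam) * lam ** k * y[t - k] for k in range(t + 1)))
    return est

def gma_recursive(y, lam):
    """Recursive form: theta_t = lam*theta_{t-1} + (1-lam)*y_t."""
    theta, est = 0.0, []
    for yt in y:
        theta = lam * theta + (1 - lam) * yt
        est.append(theta)
    return est

y = [1.0, 2.0, 3.0, 2.0, 1.0]
assert all(abs(u - v) < 1e-12 for u, v in zip(gma_window(y, lam), gma_recursive(y, lam)))
```

The recursive form is the one that reappears below as the RLS update (3.12).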

A natural, and for change detection fundamental, principle is to use a sliding window, defined by the weights w_k = 1/L for k = 0, 1, ..., L − 1 and w_k = 0 otherwise, so that θ̂_t is the mean of the last L measurements.

A more general approach, which can be labeled Finite Moving Average (FMA), is obtained by using arbitrary weights w_k in the sliding window with the constraint Σ_k w_k = 1, which is equivalent to a FIR filter.

3.3 Summary of least squares approaches

This section offers a summary of the adaptive filters presented in a more general form in Chapter 5.


A common framework for the most common estimation approaches is to let the signal estimate be the minimizing argument (arg min) of a certain loss function:

θ̂_t = arg min_θ V_t(θ).

In the next four subsections, a number of loss functions are given, and the corresponding estimators are derived. For simplicity, the noise variance is assumed to be constant, E e_t² = R. We are interested in the signal estimate, and also its theoretical variance P_t ≜ E(θ̂_t − θ_t)² and its estimate P̂_t. For adaptive methods, the parameter variance is defined under the assumption that the parameter θ_t is time invariant.

3.3.1 Recursive least squares

To start with, the basic idea in least squares is to minimize the sum of squared errors,

V_t(θ) = Σ_{i=1}^{t} (y_i − θ)².

This is an off-line approach which assumes that the parameter is time invariant. If θ_t is time-varying, adaptivity can be obtained by forgetting old measurements using the following loss function:

V_t(θ) = Σ_{i=1}^{t} λ^{t−i} (y_i − θ)².


Here λ is referred to as the forgetting factor. This formula yields the recursive least squares (RLS) estimate. Note that the estimate θ̂_t is unbiased only if the true parameter is time invariant. A recursive version of the RLS estimate is

θ̂_t = λ θ̂_{t−1} + (1 − λ) y_t
    = θ̂_{t−1} + (1 − λ) ε_t, (3.12)

where ε_t = y_t − θ̂_{t−1} is the prediction error. This latter formulation of RLS will be used frequently in the sequel, and a general derivation is presented in Chapter 5.

3.3.2 The least squares over sliding window

Computing the least squares loss function over a sliding window of size L gives:

V_t(θ) = Σ_{i=t−L+1}^{t} (y_i − θ)²,

whose minimizing argument is the mean of the last L measurements.
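For the signal model, the sliding-window least squares estimate is simply the mean of the last L samples. A small sketch under that assumption; the helper name is hypothetical:

```python
# Sketch: least squares over a sliding window of size L reduces, for
# y_t = theta_t + e_t, to the mean of the most recent L measurements.
def sliding_window_mean(y, L):
    est = []
    for t in range(len(y)):
        window = y[max(0, t - L + 1): t + 1]  # shorter at the start
        est.append(sum(window) / len(window))
    return est

# A step at t = 2 is tracked with a delay of L - 1 samples.
assert sliding_window_mean([0.0, 0.0, 4.0, 4.0], 2) == [0.0, 0.0, 2.0, 4.0]
```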

3.3.3 Least mean square

In the Least Mean Square (LMS) approach, the objective is to minimize

V(θ) = E (y_t − θ)² (3.13)

by a stochastic gradient algorithm defined by

θ̂_t = θ̂_{t−1} − (μ/2) dV(θ̂_{t−1})/dθ. (3.14)

Here, μ is the step size of the algorithm. The expectation in (3.13) cannot be evaluated, so the standard approach is to just ignore it. Differentiation then gives the LMS algorithm:

θ̂_t = θ̂_{t−1} + μ ε_t. (3.15)

That is, for signal estimation, LMS and RLS coincide with μ = 1 − λ. This is not true in the general case in Chapter 5.
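The coincidence of LMS (3.15) and RLS (3.12) for signal estimation when μ = 1 − λ can be checked numerically. A minimal sketch with illustrative function names and zero initial estimates:

```python
# Sketch: for the scalar signal model, the LMS update (3.15) with
# step size mu equals the RLS update (3.12) when mu = 1 - lam.
def lms(y, mu):
    theta, est = 0.0, []
    for yt in y:
        theta = theta + mu * (yt - theta)  # theta_t = theta_{t-1} + mu*eps_t
        est.append(theta)
    return est

def rls(y, lam):
    theta, est = 0.0, []
    for yt in y:
        theta = lam * theta + (1 - lam) * yt  # exponential forgetting
        est.append(theta)
    return est

lam = 0.9
y = [1.0, 3.0, 2.0, 5.0, 4.0]
assert all(abs(u - v) < 1e-12 for u, v in zip(lms(y, 1 - lam), rls(y, lam)))
```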

3.3.4 The Kalman filter

One further alternative is to explicitly model the parameter time-variations as a so-called random walk,

θ_{t+1} = θ_t + v_t. (3.16)

Let the variance of the noise v_t be Q. Then the Kalman filter, as will be derived in Chapter 13, applies:

θ̂_t = θ̂_{t−1} + (P_{t−1} + Q)/(P_{t−1} + Q + R) · (y_t − θ̂_{t−1}), (3.17)
P_t = (P_{t−1} + Q) R/(P_{t−1} + Q + R). (3.18)

If the assumption (3.16) holds, the Kalman filter is the optimal estimator in the following senses:

- It is the minimum variance estimator if v_t and e_t (and also the initial knowledge of θ_0) are independent and Gaussian. That is, there is no other estimator that gives a smaller variance error Var(θ̂_t − θ_t).

- Among all linear estimators, it gives the minimum variance error independently of the distribution of the noises.

- Since minimum variance is related to the least squares criterion, we also have a least squares optimality under the model assumption (3.16).

- It is the conditional expectation of θ_t, given the observed values of y^t. This subject will be thoroughly treated in Chapter 8.


Example 3.1 Signal estimation using linear filters

Figure 3.2 shows an example of a signal, and the estimates from RLS, LMS and the Kalman filter, respectively. The design parameters are λ = 0.9, μ = 0.1 and Q = 0.02, respectively. RLS and LMS are identical, and the Kalman filter is very similar for these settings.

3.4 Stopping rules and the CUSUM test

A stopping rule gives an alarm when a test statistic exceeds a certain threshold. Often, a stopping rule is used as a part of a change detection algorithm. It can be characterized as follows:

- The definition of a stopping rule here is that, in contrast to change detection, no statistical assumptions on its input are given.

- The change from θ_t = 0 to a positive value may be abrupt, linear or incipient, whereas in change detection the theoretical assumption in the derivation is that the change is abrupt.

- There is prior information on how large the threshold is.

An auxiliary test statistic g_t is introduced, which is used for alarm decisions using a threshold h. The purpose of the stopping rule is to give an alarm when g_t exceeds a certain value,

Alarm if g_t > h. (3.19)

Stopping rules will be used frequently when discussing change detection based on filter residual whiteness tests and model validation.

3.4.1 Distance measures

The input to a stopping rule is, as illustrated in Figure 3.3, a distance measure s_t. Several possibilities exist:

- A simple approach is to take the residuals,

  s_t = ε_t = y_t − θ̂_{t−1}, (3.20)

  where θ̂_{t−1} (based on measurements up to time t − 1) is any estimate from Sections 3.2 or 3.3. This is suitable for the change in the mean problem, which should be robust to variance changes. A good alternative is to normalize to unit variance. The variance of the residuals will be shown to equal R + P_t, so use instead

  s_t = ε_t / sqrt(R + P_t). (3.21)

  This scaling facilitates the design somewhat, in that approximately the same design parameters can be used for different applications.

- An alternative is to square the residuals.

Figure 3.3 Structure of a stopping rule.


To average the distance measure, FMA or GMA filters can be used. As an example of a well-known combination, we can take the squared residual as distance measure from the no-change hypothesis, s_t = ε_t², and average over a sliding window. Then we get a χ² test, where the distribution of g_t is χ² under ideal assumptions. Particular named algorithms are obtained for the exponential forgetting window and finite moving average filters. In this way, stopping rules based on FMA or GMA are obtained.

The methods so far have been linear in data, or for the χ² test quadratic in data. We now turn our attention to a fundamental and historically very important class of non-linear stopping rules. First, the Sequential Probability Ratio Test (SPRT) is given.

Algorithm 3.1 SPRT

g_t = g_{t−1} + s_t − ν
If g_t < a < 0: reset g_t = 0
If g_t > h > 0: alarm and report t_a = t

Design parameters: drift ν, threshold h and reset level a.

In words, the test statistic g_t sums up its input s_t, with the idea to give an alarm when the sum exceeds a threshold h. With a white noise input, the test statistic will drift away similarly to a random walk. There are two mechanisms to prevent this natural fluctuation. To prevent positive drifts, eventually yielding a false alarm, a small drift term ν is subtracted at each time instant. To prevent a negative drift, which would increase the time to detection after a change, the test statistic is reset to 0 each time it becomes less than a negative constant a.

The level crossing parameter a should be chosen to be small in magnitude, and it has been thoroughly explained in the literature why a = 0 is a good choice. This important special case yields the cumulative sum (CUSUM) algorithm.


Algorithm 3.2 CUSUM

g_t = g_{t−1} + s_t − ν, (3.27)
g_t = 0 if g_t < 0, (3.28)
g_t = 0 and t_a = t, and alarm, if g_t > h > 0. (3.29)

Both algorithms were originally derived in the context of quality control (Page, 1954). A more recent reference with a few variations analyzed is Malladi and Speyer (1999).

In both SPRT and CUSUM, the alarm time t_a is the primary output, and the drift should be chosen as half of the critical level that must not be exceeded by the physical variable θ_t. A non-standard, but very simple, suggestion for how to estimate the change time is included as well, but remember that the change is not necessarily abrupt when using stopping rules in general. The estimate of the change time is logical for the following reason (although the change location problem does not seem to be dealt with in the literature in this context). When θ_t = 0, the test statistic will be reset to zero at almost every time instant (depending on the noise level and whether a < −ν is used). After a change to θ_t > ν, g_t will start to grow and will not be reset until the alarm comes, in which case the last reset time is close to the correct change time. As a rule of thumb, the drift should be chosen as one half of the expected change magnitude. Robustness and a decreased false alarm rate may be achieved by requiring several g_t > h in a row; in quality control this is called a run test.

Example 3.2 Surveillance using the CUSUM test

Suppose we want to make surveillance of a signal to detect if its level reaches or exceeds 1. The CUSUM test with ν = 0.5 and h = 5 gives the output illustrated in Figure 3.4. Shortly after the level of the signal exceeds 0.5, the test statistic starts to grow until it reaches the threshold, where we get an alarm. After this, we continuously get alarms for level crossing. A run test where five stops in the CUSUM test generates an alarm would give an alarm at sample 150.


Figure 3.4 A signal observed with noise. The lower plot shows the test statistic g_t from (3.27)-(3.29) in the CUSUM test.

3.4.3 Two-sided tests

The tests in the previous section are based on the assumption that θ_t is positive. A two-sided test is obtained as follows:

- For the averaging and estimation approaches where g_t is a linear function of data, simply test if g_t > h_1 or g_t < −h_2.

- For the non-linear (in data) methods CUSUM and SPRT, apply two tests in parallel. The second one can be seen as having −y_t as the input and h_2 as the threshold. We get an alarm when one of the single tests signals an alarm.

In a fault detection context, we here get a very basic diagnosis based on the sign of the change.
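The two-sided scheme can be sketched as two one-sided CUSUM tests run in parallel on s_t and −s_t; names are illustrative, and both thresholds are taken equal here for brevity:

```python
# Two-sided CUSUM sketch: the sign of the alarming branch gives a
# basic diagnosis of the change direction.
def two_sided_cusum(s, nu, h):
    g_pos = g_neg = 0.0
    alarms = []                              # list of (time, direction)
    for t, st in enumerate(s):
        g_pos = max(0.0, g_pos + st - nu)    # test for positive changes
        g_neg = max(0.0, g_neg - st - nu)    # test for negative changes
        if g_pos > h:
            alarms.append((t, +1))
            g_pos = 0.0
        if g_neg > h:
            alarms.append((t, -1))
            g_neg = 0.0
    return alarms

# A downward level change only triggers the negative branch.
assert two_sided_cusum([0.0] * 5 + [-1.0] * 10, nu=0.5, h=2.0) == [(9, -1), (14, -1)]
```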

3.4.4 The CUSUM adaptive filter

To illustrate one of the main themes in this book, we here combine adaptive filters with the CUSUM test as a change detector, according to the general picture shown in Figure 1.18. The first idea is to consider the signal as piecewise constant and update the least squares estimate in between the alarm times. After an alarm, the LS algorithm is restarted.


Algorithm 3.3 The CUSUM LS filter

Example 3.3 Surveillance using the CUSUM test

Consider the same signal as in Example 3.2. Algorithm 3.3 gives the signal estimate and the test statistics in Figure 3.5. We only get one alarm, at time 79, and the parameter estimate quickly adapts afterwards.
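The body of Algorithm 3.3 is not reproduced in this extract; the following is a minimal sketch of the idea it describes, a least squares mean estimate restarted whenever a two-sided CUSUM test on the prediction errors alarms. All names and the exact reset logic are illustrative assumptions:

```python
# Sketch of the CUSUM LS filter idea: LS mean between alarms,
# restarted when a two-sided CUSUM on the prediction errors fires.
def cusum_ls_filter(y, nu, h):
    n, s = 0, 0.0            # sample count and sum since the last alarm
    g_pos = g_neg = 0.0
    theta = 0.0
    est, alarms = [], []
    for t, yt in enumerate(y):
        eps = yt - theta if n > 0 else 0.0   # prediction error
        g_pos = max(0.0, g_pos + eps - nu)
        g_neg = max(0.0, g_neg - eps - nu)
        if g_pos > h or g_neg > h:
            alarms.append(t)
            n, s = 0, 0.0                    # restart the LS estimate
            g_pos = g_neg = 0.0
        n, s = n + 1, s + yt
        theta = s / n                        # LS estimate since restart
        est.append(theta)
    return est, alarms

# A noise-free step at t = 20 gives one alarm and instant adaptation.
est, alarms = cusum_ls_filter([0.0] * 20 + [5.0] * 20, nu=0.5, h=3.0)
assert alarms == [20] and est[-1] == 5.0
```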

A variant is obtained by including a forgetting factor in the least squares estimation. Technically, this corresponds to an assumption of a signal that normally changes slowly (caught by the forgetting factor) but sometimes undergoes abrupt changes.


Algorithm 3.4 The CUSUM RLS filter

After an alarm, reset g_t^(1) = 0, g_t^(2) = 0 and θ̂_t = y_t.

Design parameters: ν, h.
Output: θ̂_t.

The point of the reset θ̂_t = y_t is that the algorithm forgets all old information instantaneously, while, at the same time, avoiding bias and a transient. Finally, some general advice for tuning these algorithms, which are defined by combinations of the CUSUM test and adaptive filters, is given.


Tuning of CUSUM filtering algorithms

Start with a very large threshold h. Choose ν as one half of the expected change, or adjust ν such that g_t = 0 more than 50% of the time. Then set the threshold so that the required number of false alarms (this can be done automatically) or delay for detection is obtained.

- If faster detection is sought, try to decrease ν.

- If fewer false alarms are wanted, try to increase ν.

- If there is a subset of the change times that does not make sense, try to increase ν.

3.5 Likelihood based change detection

This section provides a compact presentation of the methods derived in Chapters 6 and 10, for the special case of signal estimation.

3.5.1 Likelihood theory

Likelihood is a measure of the likeliness of what we have observed, given the assumptions we have made. In this way, we can compare, on the basis of observed data, different assumptions on the change time. For the model

y_t = θ + e_t,  e_t ∈ N(0, R), (3.30)

the likelihood is denoted p(y^t|θ, R) or l_t(θ, R). This should be read as "the likelihood for the data y^t given the parameters θ, R". Independence and Gaussianity give

l_t(θ, R) = (2πR)^{−t/2} exp( −Σ_{i=1}^{t} (y_i − θ)² / (2R) ).

The parameters are here nuisance, which means they are irrelevant for change detection. There are two ways to eliminate them if they are unknown: estimation or marginalization.


3.5.2 ML estimation of nuisance parameters

The Maximum Likelihood (ML) estimate of θ (or any parameter) is formally

θ̂ = arg max_θ l_t(θ, R),

which for the model (3.30) is the sample mean ȳ = (1/t) Σ_{i=1}^{t} y_i. By taking the logarithm, we get

−2 log l_t(θ̂, R) = t log(2π) + t log R + (1/R) Σ_{i=1}^{t} (y_i − ȳ)².

Setting the derivative with respect to R equal to zero gives

R̂ = (1/t) Σ_{i=1}^{t} (y_i − ȳ)². (3.35)
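The ML estimates of the nuisance parameters can be checked numerically; a small sketch with an illustrative function name:

```python
# ML estimates for y_t = theta + e_t, e_t ~ N(0, R): theta_hat is the
# sample mean and, from (3.35), R_hat is the (biased) sample variance.
def ml_estimates(y):
    t = len(y)
    theta_hat = sum(y) / t
    R_hat = sum((v - theta_hat) ** 2 for v in y) / t
    return theta_hat, R_hat

theta_hat, R_hat = ml_estimates([1.0, 2.0, 3.0])
assert abs(theta_hat - 2.0) < 1e-12
assert abs(R_hat - 2.0 / 3.0) < 1e-12
```

Note that R̂ divides by t, not t − 1: the ML variance estimate is biased for small samples, which is harmless here since it is only a nuisance parameter.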
