Part V: Theory
12 Evaluation theory
12.1 Filter evaluation
12.1.1 Filter definitions
12.1.2 Performance measures
12.1.3 Monte Carlo simulations
12.1.4 Bootstrap
12.1.5 MCMC and Gibbs sampler
12.2 Evaluation of change detectors
12.2.1 Basics
12.2.2 The ARL function
12.3 Performance optimization
12.3.1 The MDL criterion
12.3.2 Auto-tuning and optimization
12.1.1 Filter definitions
Consider a general linear filter (or estimator)

\hat{x}_t = \sum_{i=t_1}^{t_2} a_{t,i} z_{t-i},    (12.1)
as illustrated in Figure 12.1. Typically, the estimated quantity x_t is either the parameter vector θ in a parametric model, or the state x_t in a state space model. It might also be the signal component s_t of the measurement y_t = s_t + e_t. The measurements z_t consist of the measured outputs y_t and, when appropriate, the inputs u_t.
The basic definitions that will be used for a linear filter are:

• A time-invariant filter has a_{t,i} = a_i for all t and i.
• A non-causal filter has t_1 < 0. This is used in smoothing. If t_1 ≥ 0, the filter is causal.
Figure 12.1 An estimator takes the observed signal z_t and transforms it to estimates x̂_t.
• A causal filter is IIR (Infinite Impulse Response) if t_2 = ∞, otherwise it is FIR (Finite Impulse Response).
• x̂_t is a k-step ahead prediction if t_1 = k > 0. Most filters use all past data, so t_2 = ∞, and the notation x̂_{t|t−t_1} is sometimes useful for highlighting that the estimator is a predictor (t_1 > 0), a smoother (t_1 < 0) or a filter (t_1 = 0).
• A scalar measure of performance is often preferable when evaluating different filters, such as the square root of the mean value of the norm of the estimation error,

\sqrt{\mathrm{E}\,\|x_t - \hat{x}_t\|_2^2} = \sqrt{\operatorname{tr}\mathrm{E}\,(x_t - \hat{x}_t)(x_t - \hat{x}_t)^T},    (12.3)

where the subindex 2 stands for the 2-norm and tr for trace, which is the sum of the diagonal elements of the matrix argument. This is a measure of the length of the estimation error; one can think of it as the standard deviation of the estimation error.
• Sometimes the second-order properties are not enough, and the complete Probability Density Function (PDF) needs to be estimated.
Confidence intervals for the parameters and hypothesis tests are other related applications of variability measures.
12.1.3 Monte Carlo simulations

Monte Carlo simulations offer a means for estimating these measures. Before proceeding, let us consider the causes of variability in more detail:
• Variance error is caused by the variability of different noise realizations, which gives a variability in x̂_t − E(x̂_t).
• Bias and tracking error are caused by an error in the signal model (bias) and the inability of the filter to track fast variations in x_t (tracking error), respectively. The combined effect is a deviation in x_t − E(x̂_t). Determination of the bias error is in general a difficult task and will not be discussed further in this section. We will assume that there is no bias, only tracking error.
Hence, the covariance matrix can be seen as a sum of two terms,

\mathrm{E}(x_t - \hat{x}_t)(x_t - \hat{x}_t)^T = \underbrace{\mathrm{E}(\hat{x}_t - \mathrm{E}\hat{x}_t)(\hat{x}_t - \mathrm{E}\hat{x}_t)^T}_{\text{variance error}} + \underbrace{(x_t - \mathrm{E}\hat{x}_t)(x_t - \mathrm{E}\hat{x}_t)^T}_{\text{tracking error}}.    (12.4)
Suppose we can generate M realizations of the data z_t, by means of simulation or data acquisition under the same conditions. Denote them by z_t^{(j)}, j = 1, 2, ..., M. Apply the same estimator (12.1) to all of them,

\hat{x}_t^{(j)} = \sum_{i=t_1}^{t_2} a_{t,i} z_{t-i}^{(j)}, \quad j = 1, 2, \dots, M.    (12.5)

From these M sets of estimates, we can estimate all kinds of statistics. For instance, if the dimension of x is only one, we can make a histogram over x̂_t^{(j)}, which graphically approximates the PDF.
The scalar measure (12.3) is estimated by the Root Mean Square Error (RMSE),

\mathrm{RMSE}(t) = \sqrt{\frac{1}{M}\sum_{j=1}^{M} \|x_t - \hat{x}_t^{(j)}\|_2^2}.    (12.6)
This equation comprises both tracking and variance errors. If the true value x_t is not known for some reason, or if one wants to measure only the variance error, x_t can be replaced by its Monte Carlo mean, while at the same time changing M to M − 1 in the normalization (to get an unbiased estimate of the standard deviation), and we get

\overline{\mathrm{RMSE}}(t) = \sqrt{\frac{1}{M-1}\sum_{j=1}^{M} \Big\|\frac{1}{M}\sum_{k=1}^{M}\hat{x}_t^{(k)} - \hat{x}_t^{(j)}\Big\|_2^2}.    (12.7)
Equation (12.6) is an estimate of the standard deviation of the estimation error norm at each time instant. A scalar measure for the whole data sequence is

\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\frac{1}{M}\sum_{j=1}^{M} \|x_t - \hat{x}_t^{(j)}\|_2^2}.    (12.8)
Note that the time average is placed inside the square root, in order to get an unbiased estimate of the standard deviation. Some authors propose to time average RMSE(t), but this is not an estimate of the standard deviation of the error anymore.¹
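As a minimal sketch of how (12.6)–(12.8) can be computed, assume the M Monte Carlo estimates of a scalar quantity are collected in an M-by-N array Xhat and the true values in a 1-by-N vector x (both variable names are illustrative):

    % Xhat: M-by-N matrix of Monte Carlo estimates, x: 1-by-N true values
    [M, N] = size(Xhat);
    RMSEt  = sqrt(mean((Xhat - x).^2, 1));            % (12.6), scalar case
    xbar   = mean(Xhat, 1);                           % Monte Carlo mean
    RMSEvt = sqrt(sum((Xhat - xbar).^2, 1)/(M - 1));  % (12.7), variance error only
    RMSE   = sqrt(mean(RMSEt.^2));                    % (12.8), time average inside the root

For vector-valued x_t, the squared 2-norm in (12.6)–(12.8) additionally sums over the components of the estimation error.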
Example 12.1 Signal estimation
The first subplot in Figure 12.2 shows a signal, which includes a ramp change, and a noisy measurement y_t = s_t + e_t of it. To recover the signal from the measurements, a low-pass filter of Butterworth type is applied. Using many different realizations of the measurements, the Monte Carlo mean is illustrated in the last two subplots, where an estimated confidence bound is also marked. This bound is the ±σ level, where σ is replaced by RMSE(t).
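A sketch of a Monte Carlo study along the lines of Example 12.1; the signal shape, filter order, cut-off frequency, and noise level below are illustrative assumptions, not the values behind Figure 12.2:

    N = 200; M = 1000;                                % data length, Monte Carlo runs
    s = [zeros(1,80), linspace(0,1,40), ones(1,80)];  % signal with a ramp change
    [b, a] = butter(4, 0.1);                          % low-pass Butterworth filter
    Shat = zeros(M, N);
    for j = 1:M
        y = s + 0.1*randn(1, N);                      % realization of y_t = s_t + e_t
        Shat(j,:) = filter(b, a, y);                  % filter estimate for this run
    end
    mc_mean = mean(Shat, 1);                          % Monte Carlo mean
    RMSEt = sqrt(mean((Shat - s).^2, 1));             % RMSE(t) as in (12.6)
    plot(1:N, mc_mean, 1:N, mc_mean + RMSEt, '--', 1:N, mc_mean - RMSEt, '--')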
In some applications, including safety-critical ones, it is the peak error that is crucial,

\mathrm{PE}(t) = \max_{j=1,\dots,M} \|x_t - \hat{x}_t^{(j)}\|_2.    (12.9)
From this, a scalar peak measure can be defined as the total peak (max over time) or a time RMSE (time average of the peak value). Note that this measure is non-decreasing with the number of Monte Carlo runs, so the absolute value should be interpreted with a little care.
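Continuing the sketch above, the peak error (12.9) and the derived scalar measures could be computed as:

    PEt = max(abs(Shat - s), [], 1);   % (12.9): peak over the M runs at each time t
    total_peak = max(PEt);             % total peak (max over time)
    mean_peak  = mean(PEt);            % time average of the peak value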
¹According to Jensen's inequality, E(√X) ≤ √(E(X)), so this incorrect procedure would produce an under-biased estimate of the standard deviation. In general, standard deviations should not be averaged. Compare the outcomes from mean(std(randn(10,10000))) with sqrt(mean(std(randn(10,10000)).^2))!
Figure 12.2 The first plot shows the signal (dashed line) and the measurements (solid line) from one realization. The second and third plots show the Monte Carlo mean estimate and the one-sigma confidence interval using the Monte Carlo standard deviation, for 50 and 1000 Monte Carlo runs, respectively.
The scalar performance measures can be used for auto-tuning. Suppose the filter is parameterized in a scalar design parameter. As argued in Chapter 1, all linear filters have such a scalar to trade off variance and tracking errors. We can then optimize the filter design with respect to this measure, for instance by using a line search. The procedure can be generalized to non-scalar design parameters, at the cost of using more sophisticated and computer-intensive optimization routines.
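A sketch of auto-tuning by a simple line (grid) search over a scalar design parameter, here the cut-off frequency of the Butterworth filter from the sketch above (the grid and the Monte Carlo setup are illustrative):

    Wn = 0.02:0.02:0.5;                       % candidate cut-off frequencies
    score = zeros(size(Wn));
    for k = 1:length(Wn)
        [b, a] = butter(4, Wn(k));
        mse = 0;
        for j = 1:M                           % Monte Carlo estimate of (12.8)
            y = s + 0.1*randn(1, N);
            mse = mse + mean((filter(b, a, y) - s).^2);
        end
        score(k) = sqrt(mse/M);               % scalar RMSE for this design
    end
    [~, kbest] = min(score);
    Wn_best = Wn(kbest);                      % design minimizing the Monte Carlo RMSE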
Example 12.2 Target tracking
Consider the target tracking example in Example 1.1. For a particular filter (better tuned than the one in Example 1.1), the filter estimates lie on top of the measurements, so Figure 12.3(a) is not very practical for filter evaluation, because it is hard to see any error at all.
The RMSE position error in Figure 12.3(b) is a much better tool for evaluation. Here we can clearly see the three important phases in adaptive filtering: transient (for sample numbers 1–7), variance error (8–15 and 27–32) and tracking error (16–26 and 35–45).
Figure 12.3 Radar trajectory (circles) and filter estimated positions (crosses) in (a). The RMSE position error in (b).

All in all, as long as it is possible to generate several data sets under the same premises, Monte Carlo techniques offer a solution to filter evaluation and design. However, this might not be possible, for instance when collecting data during one single test trial, where it is either impossible or too expensive to repeat the same experiment. One can then try resampling techniques, as described in the next subsections:
• Bootstrap offers a way to reorder the filter residuals and make artificial simulations that are similar to Monte Carlo simulations. The procedure includes iterative filtering and simulation.
• Gibbs resampling is useful when certain marginal distributions can be formulated mathematically. The solution consists of iterative filtering and random number generation.
12.1.4 Bootstrap
Static case
As a non-dynamic example of the bootstrap technique, consider the case of estimating distribution parameters in a sequence of independent identically distributed (i.i.d.) stochastic variables,

y_t \sim p(y; \theta), \quad t = 1, 2, \dots, N,

where θ are the parameters in the distribution. For instance, consider the location θ and scale σ parameters in a scalar Gaussian distribution. We know from Section 3.5.2 how to estimate them, and there are also quite simple theoretical expressions for the uncertainty in the estimates. In more general cases, it might be difficult, if not impossible, to compute the sample variance of the point estimate, since most known results are asymptotic. As detailed in Section 12.1.3, the standard approach in a simulation environment is to perform Monte Carlo simulations. However, real data sets are often impossible or too costly to reproduce under identical conditions, and there could be too few data points for the asymptotic expressions to hold.
The key idea in bootstrap is to produce new artificial data sets by picking samples at random from the set y^N with replacement. That is, from the measured values 4, 2, 3, 5, 1 we might generate 2, 2, 5, 1, 5. Denote the new data set by (y^N)^{(i)}. Each new data set is used to compute a point estimate θ̂^{(i)}. Finally, these estimates are treated as independent outcomes from Monte Carlo simulations, and we can obtain different variability measures such as the standard deviation and confidence intervals.
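A minimal sketch of this procedure for the setup of Example 12.3 below (10 samples from N(10, 25); the number of bootstrap replicates is an arbitrary choice):

    N = 10; B = 1000;
    y = 10 + 5*randn(N, 1);               % one observed data set from N(10,25)
    theta_hat = mean(y);                  % point estimate of the location
    theta_b = zeros(B, 1);
    for i = 1:B
        idx = randi(N, N, 1);             % draw N indices with replacement
        theta_b(i) = mean(y(idx));        % point estimate from the bootstrap set
    end
    P_boot = var(theta_b);                % bootstrap estimate of Var(theta_hat)
    ths = sort(theta_b);
    ci = ths([round(0.025*B), round(0.975*B)]);   % approximate 95% confidence interval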
Example 12.3 Bootstrap
Following the example in Zoubir and Boashash (1998), we generate 10 samples from N(10, 25). We want to estimate the mean θ and its variance P in the unknown distribution for the measurements y_t. The point estimate of θ is the sample mean. The resulting estimates of E(θ̂) are compared below:

Statistics   Theoretical   Bootstrap   Monte Carlo   Point estimate
E(θ̂)         10            8.8114      9.9806        8.7932
Finally, one might ask what the performance is on average. Twenty realizations of the tables above are generated, and the mean values of the variability are:
Statistics   Theoretical   Bootstrap   Monte Carlo   Point estimate
E Var(θ̂)     2.5           2.40        2.53          2.65
In conclusion, bootstrap offers a good alternative to Monte Carlo simulations or analytical point estimates.

From the example, we can conclude the following:

• The bootstrap result is as good as the natural point estimate of variability.
• The result very much depends upon the realization.

That is:
Bootstrap

Bootstrap cannot create more information than is contained in the measurements, but it can compute variability measures numerically, which are as good as analytically derived point estimates.
There are certain applications where there is no analytical expression for the point estimate of variability. As a simple example, the variance of the sample median is very hard to compute analytically, and thus a point estimator of the median variance is also hard to find.
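For the median, the same resampling loop yields a numerical variability measure directly; a sketch, reusing the data vector y from the sketch above:

    B = 1000; N = length(y);
    med_b = zeros(B, 1);
    for i = 1:B
        med_b(i) = median(y(randi(N, N, 1)));   % median of a resampled data set
    end
    var_median = var(med_b);                    % bootstrap variance of the sample median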
Dynamic time-invariant case
For more realistic signal processing applications, we start by outlining the time-invariant case. Suppose that the measured data are generated by a dynamical model parametrized by a parameter θ,

y_t = f(y_{t-1}, y_{t-2}, \dots, u_{t-1}, u_{t-2}, \dots, e_t; \theta).
The bootstrap idea is now as follows:
1. Compute a point estimate θ̂ from the original data set {y_t, u_t}_{t=1}^N.
2. Apply the inverse model (filter) and get the noise (or, more precisely, residual) sequence ê^N.
3. Generate bootstrap noise sequences (ê^N)^{(i)} by picking random samples from ê^N with replacement.
4. Simulate the estimated system with these artificial noise sequences:
   y_t^{(i)} = f(y_{t-1}^{(i)}, y_{t-2}^{(i)}, \dots, u_{t-1}, \dots, \hat{e}_t^{(i)}; \hat{\theta}).
5. Compute point estimates θ̂^{(i)}, and treat them as independent outcomes from Monte Carlo simulations.
Example 12.4 Bootstrap
Consider the auto-regressive (AR(1)) model

y_t + \theta y_{t-1} = e_t.
N = 10 samples are simulated. From these, the ML estimate of the parameter θ in the AR model y_t + θ y_{t-1} = e_t is computed (in MATLAB™ this is done with a standard system identification routine).
Generate bootstrap sequences (ê^N)^{(i)} and simulate new data by

y_t^{(i)} + \hat{\theta} y_{t-1}^{(i)} = \hat{e}_t^{(i)}.

Finally, estimate θ for each sequence (y^N)^{(i)}.
A histogram over 1000 bootstrap estimates is shown in Figure 12.4. For comparison, the Monte Carlo estimates are used to approximate the true PDF of the estimate. Note that the PDF of the estimate is well predicted using only 10 samples and bootstrap techniques.
The table below summarizes the accuracy of the point estimates for the different methods:

Statistics   Point estimate   Monte Carlo   Bootstrap   Theoretical
Figure 12.4 Histograms for point estimates of an AR parameter using (a) Monte Carlo simulations and (b) bootstrap.
As in the previous example, we can average over, say, 20 tables to average out the effect of the short data realization of only 10 samples on which the bootstrap estimate is based. The table below shows that bootstrap gives almost as reliable an estimate as a Monte Carlo simulation:
Statistics   Theoretical   Bootstrap   Monte Carlo   Point estimate
Std(θ̂)       0.22583       0.31        0.28          0.25
The example shows a case where the theoretical variance and standard deviation can be computed with little effort. However, for higher-order AR models, finite-data covariance expressions are hard to compute analytically, and one has to rely on asymptotic expressions or resampling techniques such as bootstrap.
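A sketch of the bootstrap in Example 12.4. The true parameter value is not stated above; θ = −0.7 is assumed here, since it reproduces the quoted theoretical standard deviation √((1 − 0.7²)/10) ≈ 0.22583. The least squares estimate is used, which coincides with the ML estimate for Gaussian noise conditionally on the first sample:

    N = 10; B = 1000; theta0 = -0.7;           % assumed true value (see text)
    y = filter(1, [1 theta0], randn(N, 1));    % simulate y_t + theta*y_{t-1} = e_t
    ls = @(y) -(y(2:end)'*y(1:end-1))/(y(1:end-1)'*y(1:end-1));  % LS estimate of theta
    theta_hat = ls(y);
    e_hat = filter([1 theta_hat], 1, y);       % inverse filter: residual sequence
    theta_b = zeros(B, 1);
    for i = 1:B
        eb = e_hat(randi(N, N, 1));            % resample residuals with replacement
        yb = filter(1, [1 theta_hat], eb);     % simulate the estimated system
        theta_b(i) = ls(yb);                   % re-estimate theta for each set
    end
    std_boot = std(theta_b);                   % compare with the table above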
Dynamical time-varying case
A quite challenging problem for adaptive filtering is to compute the RMSE as a function of time. As a general problem formulation, consider a state space model

x_{t+1} = A_t x_t + B_{u,t} u_t + B_{v,t} v_t,
y_t = C_t x_t + D_t u_t + e_t,

where P_t = Cov(x̃_t), with x̃_t = x_t − x̂_t, is sought. Other measures such as RMSE(t) can be expressed in terms of P_t. For many applications, it is quite obvious that the covariance matrix P_t delivered by the Kalman filter is not reliable. Consider the target tracking example: the Kalman filter innovations are certainly not white after the manoeuvres. A natural generalization of the bootstrap principle is as follows:
1. Inverse filter the state space model to get estimates of the two noise sequences {v̂_t} and {ê_t}. This includes running the Kalman filter to get the filtered estimate x̂_t, and then computing

   \hat{e}_t = y_t - C_t \hat{x}_t - D_t u_t,
   \hat{v}_t = B_{v,t}^{\dagger}(\hat{x}_{t+1} - A_t \hat{x}_t - B_{u,t} u_t).

   The estimate of v_t might be in the least squares sense if B_{v,t} is not full rank.

2. If an estimate of the variance contribution to RMSE(t) is to be found, resample only the measurement noise sequence {ê_t}^{(i)}. Otherwise, resample both sequences. However, here one has to be careful: if the sequence {v̂_t} is not independent and identically distributed (i.i.d.), which is probably the case when using real data, the bootstrap idea does not apply.

3. Simulate the system for each set of bootstrap sequences.

4. Apply the Kalman filter and treat the state estimates as Monte Carlo outcomes.
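A sketch of these four steps for a scalar random walk x_{t+1} = x_t + v_t, y_t = x_t + e_t, with illustrative noise variances; the stationary Kalman gain is used to keep the filter to one line:

    N = 200; B = 200; Q = 0.01; R = 1;        % illustrative model and noise variances
    x = cumsum(sqrt(Q)*randn(N, 1));          % simulate the state (random walk)
    y = x + sqrt(R)*randn(N, 1);              % measurements
    P = Q/2 + sqrt(Q^2/4 + Q*R);              % stationary Riccati solution
    K = P/(P + R);                            % stationary Kalman gain
    kf = @(y) filter(K, [1, -(1-K)], y);      % xhat_t = (1-K)*xhat_{t-1} + K*y_t
    xhat = kf(y);
    ehat = y - xhat;                          % step 1: measurement residuals
    vhat = diff(xhat);                        % step 1: process noise residuals
    Xb = zeros(B, N);
    for i = 1:B
        vb = vhat(randi(N-1, N, 1));          % step 2: resample both sequences
        eb = ehat(randi(N,   N, 1));
        xb = cumsum(vb);                      % step 3: simulate the system
        Xb(i,:) = kf(xb + eb)';               % step 4: Kalman filter each data set
    end
    RMSEvt = sqrt(sum((Xb - mean(Xb,1)).^2, 1)/(B - 1));   % variance error as in (12.7)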
Literature

An introduction to the mathematical aspects of bootstrap can be found in Politis (1998), and a survey of signal processing applications in Zoubir and Boashash (1998). An overview for system identification and an application to uncertainty estimation is presented in Tjärnström (2000).
12.1.5 MCMC and Gibbs sampler

As a quite general change detection and Kalman filter example, consider the state space model

x_{t+1} = A x_t + B_u u_t + B_v v_t + \sum_j \delta_{t-k_j} B_f f_j,
y_t = C x_t + D u_t + e_t,

where f_j is a fault, assumed to have a known (Gaussian) distribution, occurring at times k_j. Let K be the random vector with change times, X the random