13 Linear estimation
13.1 Projections
13.1.1 Linear algebra
13.1.2 Functional analysis
13.1.3 Linear estimation
13.1.4 Example: derivation of the Kalman filter
13.2 Conditional expectations
13.2.1 Basics
13.2.2 Alternative optimality interpretations
13.2.3 Derivation of marginal distribution
13.2.4 Example: derivation of the Kalman filter
13.3 Wiener filters
13.3.1 Basics
13.3.2 The non-causal Wiener filter
13.3.3 The causal Wiener filter
13.3.4 Wiener signal predictor
13.3.5 An algorithm
13.3.6 Wiener measurement predictor
13.3.7 The stationary Kalman smoother as a Wiener filter
13.3.8 A numerical example
13.1 Projections
The purpose of this section is to get a geometric understanding of linear estimation. First, we outline how projections are computed in linear algebra for finite-dimensional vectors. Functional analysis generalizes this procedure to some infinite-dimensional spaces (so-called Hilbert spaces), and finally we point out that linear estimation is a special case of an infinite-dimensional space. As an example, we derive the Kalman filter.
13.1.1 Linear algebra
The theory presented here can be found in any textbook on linear algebra. Suppose that $x, y$ are two vectors in $\mathbb{R}^m$. We need the following definitions:
• The scalar product is defined by $(x, y) = \sum_{i=1}^{m} x_i y_i$. The scalar product is a linear operation in the data $y$.
• Length is defined by the Euclidean norm $\|x\| = \sqrt{(x, x)}$.
• Orthogonality of $x$ and $y$ is defined by $(x, y) = 0$, written $x \perp y$.
• The projection $x_p$ of $x$ on $y$ is defined by
$$x_p = \frac{(x, y)}{(y, y)}\, y.$$
Note that $x_p - x$ is orthogonal to $y$, $(x_p - x, y) = 0$. This is the projection theorem, graphically illustrated below:
[Figure: the projection $x_p$ of $x$ onto $y$; the error $x_p - x$ is orthogonal to $y$.]
The fundamental idea in linear estimation is to project the quantity to be estimated onto the plane $\Pi_y$ spanned by the measurements. The projection $x_p$ is characterized by the error $x_p - x$ being orthogonal to all $y_i$ spanning the plane $\Pi_y$:
[Figure: the projection of $x$ onto the plane $\Pi_y$ spanned by the measurements.]
We distinguish two different cases for how to compute $x_p$:
1. Suppose $(\varepsilon_1, \varepsilon_2, \dots, \varepsilon_N)$ is an orthogonal basis for $\Pi_y$. That is, $(\varepsilon_i, \varepsilon_j) = 0$ for all $i \neq j$ and $\mathrm{span}(\varepsilon_1, \varepsilon_2, \dots, \varepsilon_N) = \Pi_y$. Later on, $\varepsilon_t$ will be interpreted as the innovations, or prediction errors. The projection is computed by
$$x_p = \sum_{i=1}^{N} \frac{(x, \varepsilon_i)}{(\varepsilon_i, \varepsilon_i)}\, \varepsilon_i \triangleq \sum_{i=1}^{N} f_i \varepsilon_i.$$
Note that the coefficients $f_i$ can be interpreted as a filter. The projection theorem $(x_p - x, \varepsilon_j) = 0$ for all $j$ now follows, since $(x_p, \varepsilon_j) = (x, \varepsilon_j)$.
2. Suppose that the vectors $(y_1, y_2, \dots, y_N)$ are linearly independent, but not necessarily orthogonal, and span the plane $\Pi_y$. Then, Gram-Schmidt orthogonalization gives an orthogonal basis by the following recursion, initiated with $\varepsilon_1 = y_1$:
$$\varepsilon_k = y_k - \sum_{i=1}^{k-1} \frac{(y_k, \varepsilon_i)}{(\varepsilon_i, \varepsilon_i)}\, \varepsilon_i, \quad k = 2, \dots, N,$$
and we are back in case 1 above.
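As a concrete illustration of the two cases above, the following minimal NumPy sketch (not from the book; the function and variable names are chosen here for illustration) orthogonalizes a set of measurement vectors with Gram-Schmidt and then projects $x$ onto the plane they span:

import numpy as np

def gram_schmidt(Y):
    """Orthogonalize the columns of Y (assumed linearly independent)."""
    eps = []
    for k in range(Y.shape[1]):
        v = Y[:, k].copy()
        for e in eps:
            v -= (v @ e) / (e @ e) * e   # subtract projection on earlier basis vectors
        eps.append(v)
    return np.column_stack(eps)

def project(x, Y):
    """Project x onto the plane spanned by the columns of Y (case 2 reduced to case 1)."""
    E = gram_schmidt(Y)
    # case 1: sum of projections onto the orthogonal basis vectors
    return sum((x @ e) / (e @ e) * e for e in E.T)

# The error x_p - x is orthogonal to every column of Y (projection theorem)
rng = np.random.default_rng(0)
Y = rng.normal(size=(5, 3))
x = rng.normal(size=5)
x_p = project(x, Y)
print(Y.T @ (x_p - x))   # numerically close to zero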
13.1.2 Functional analysis
A nice fact in functional analysis is that the geometric relations in the previous section can be generalized from vectors in $\mathbb{R}^m$ to infinite-dimensional spaces, which (although a bit sloppily) can be denoted $\mathbb{R}^\infty$. This holds for so-called Hilbert spaces, where the scalar product satisfies the following postulates:

1. $(x, x) > 0$ for all $x \neq 0$. That is, there is a length measure, or norm, that can be defined as $\|x\| \triangleq (x, x)^{1/2}$.
2. $(x, y) = (y, x)$ (symmetry; for complex-valued spaces, $(x, y) = \overline{(y, x)}$).
3. $(\alpha x + \beta y, z) = \alpha (x, z) + \beta (y, z)$, i.e., the scalar product is linear in its first argument.

From these properties, one can prove the triangle inequality $(x + y, x + y)^{1/2} \leq (x, x)^{1/2} + (y, y)^{1/2}$ and the Schwarz inequality $|(x, y)| \leq \|x\|\,\|y\|$. See, for instance, Kreyszig (1978) for more details.
13.1.3 Linear estimation
In linear estimation, the elements $x$ and $y$ are stochastic variables, or vectors of stochastic variables. It can easily be checked that the covariance defines a scalar product (here assuming zero mean),
$$(x, y) \triangleq \mathrm{E}(x y),$$
which satisfies the three postulates for a Hilbert space.
A linear filter that is optimal in the sense of minimizing the 2-norm implied by this scalar product can be implemented recursively as a Gram-Schmidt orthogonalization combined with a projection. For scalar $y_t$ and vector-valued $x$, the recursion becomes
$$\varepsilon_t = y_t - \mathrm{Proj}(y_t \mid y_1, \dots, y_{t-1}), \qquad \hat{x}_t = \hat{x}_{t-1} + \frac{(x, \varepsilon_t)}{(\varepsilon_t, \varepsilon_t)}\, \varepsilon_t.$$
Remarks:
• This is not a recursive algorithm in the sense that the number of computations and the memory are limited in each time step. Further application-specific simplifications are needed to achieve this.
• To get expressions for the expectations, a signal model is needed. Basically, this model is the only difference between different algorithms.
13.1.4 Example: derivation of the Kalman filter
As an illustration of how to use projections, an inductive derivation of the Kalman filter will be given for the state space model, with scalar $y_t$,
$$x_{t+1} = A x_t + B_v v_t,$$
$$y_t = C x_t + e_t,$$
where $v_t$ and $e_t$ are mutually uncorrelated white noises with covariance matrices $Q$ and $R$, respectively.
1. Let the filter be initialized by $\hat{x}_{0|0}$ with an auxiliary matrix $P_{0|0}$.
2. Suppose that the projection of $x_t$ on the observations $y_s$ up to time $t$ is $\hat{x}_{t|t}$, and assume that the matrix $P_{t|t}$ is the covariance matrix of the estimation error, $P_{t|t} = \mathrm{E}(\tilde{x}_{t|t} \tilde{x}_{t|t}^T)$, where $\tilde{x}_{t|t} = x_t - \hat{x}_{t|t}$.
3. Time update. Define the linear projection operator $\mathrm{Proj}(x \mid y^t)$ as the projection of $x$ onto the plane spanned by $y_1, \dots, y_t$. Then
$$\hat{x}_{t+1|t} = \mathrm{Proj}(x_{t+1} \mid y^t) = A\,\mathrm{Proj}(x_t \mid y^t) + \underbrace{\mathrm{Proj}(B_v v_t \mid y^t)}_{=0} = A \hat{x}_{t|t}.$$
Define the estimation error as
$$\tilde{x}_{t+1|t} = x_{t+1} - \hat{x}_{t+1|t} = A \tilde{x}_{t|t} + B_v v_t,$$
which gives
$$P_{t+1|t} = \mathrm{E}(\tilde{x}_{t+1|t} \tilde{x}_{t+1|t}^T) = A P_{t|t} A^T + B_v Q B_v^T.$$
4. Measurement update. Recall the projection figure and the projection formula for an orthogonal basis. With the scalar innovation $\varepsilon_t = y_t - C \hat{x}_{t|t-1}$ as the new basis vector, the update is
$$\hat{x}_{t|t} = \hat{x}_{t|t-1} + \frac{\mathrm{E}(x_t \varepsilon_t)}{\mathrm{E}(\varepsilon_t^2)}\, \varepsilon_t.$$
The correlation between $x_t$ and $\varepsilon_t$ is examined separately, using (according to the projection theorem) $\mathrm{E}(\hat{x}_{t|t-1} \tilde{x}_{t|t-1}^T) = 0$ and $\varepsilon_t = y_t - C \hat{x}_{t|t-1} = C \tilde{x}_{t|t-1} + e_t$:
$$\mathrm{E}(x_t \varepsilon_t) = \mathrm{E}\big((\hat{x}_{t|t-1} + \tilde{x}_{t|t-1})(C \tilde{x}_{t|t-1} + e_t)^T\big) = P_{t|t-1} C^T.$$
Here we assume that $x_t$ is uncorrelated with $e_t$. We also need
$$\mathrm{E}(\varepsilon_t^2) = \mathrm{E}\big((C \tilde{x}_{t|t-1} + e_t)^2\big) = C P_{t|t-1} C^T + R.$$
The measurement update of the covariance matrix is similar. Altogether, this gives
$$\hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t (y_t - C \hat{x}_{t|t-1}), \qquad K_t = P_{t|t-1} C^T (C P_{t|t-1} C^T + R)^{-1},$$
$$P_{t|t} = P_{t|t-1} - K_t C P_{t|t-1}.$$
The induction is completed.
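The resulting recursion is easy to state in code. The following NumPy sketch (not from the book; the function and variable names are chosen here for illustration) performs one time update and one measurement update for the model above:

import numpy as np

def kalman_filter_step(x_hat, P, y, A, Bv, C, Q, R):
    """One recursion of the Kalman filter derived above.

    Given x_hat = x_{t-1|t-1}, P = P_{t-1|t-1} and a new measurement y = y_t,
    the function returns x_{t|t} and P_{t|t}."""
    # Time update: project x_t onto y^{t-1}
    x_pred = A @ x_hat
    P_pred = A @ P @ A.T + Bv @ Q @ Bv.T
    # Measurement update: project onto the innovation eps_t = y_t - C x_{t|t-1}
    eps = y - C @ x_pred
    S = C @ P_pred @ C.T + R                  # E(eps_t eps_t^T)
    K = P_pred @ C.T @ np.linalg.inv(S)       # gain E(x_t eps_t^T) E(eps_t eps_t^T)^{-1}
    x_filt = x_pred + K @ eps
    P_filt = P_pred - K @ C @ P_pred
    return x_filt, P_filt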
13.2 Conditional expectations
In this section, we use arguments and results from mathematical statistics. Stochastic variables (scalar or vector valued) are denoted by capital letters, to distinguish them from the observations. This overview is basically taken from Anderson and Moore (1979).
13.2.1 Basics

Suppose the vectors $X$ and $Y$ are simultaneously Gaussian distributed,
$$\begin{pmatrix} X \\ Y \end{pmatrix} \in \mathrm{N}\!\left( \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \begin{pmatrix} P_{xx} & P_{xy} \\ P_{yx} & P_{yy} \end{pmatrix} \right).$$
Then, the conditional distribution for $X$, given the observed $Y = y$, is Gaussian distributed:
$$(X \mid Y = y) \in \mathrm{N}\big(\mu_x + P_{xy} P_{yy}^{-1}(y - \mu_y),\; P_{xx} - P_{xy} P_{yy}^{-1} P_{yx}\big). \tag{13.2}$$
This follows directly from Bayes' rule,
$$p_{X|Y}(x \mid y) = \frac{p_{X,Y}(x, y)}{p_Y(y)},$$
by rather tedious computations. The complete derivation is given in Section 13.2.3.
The Conditional Mean (CM) estimator, seen as a stochastic variable, can be denoted
$$\hat{X}^{\mathrm{CM}}(Y) = \mathrm{E}(X \mid Y),$$
while the conditional mean estimate, given the observed $y$, is
$$\hat{x}^{\mathrm{CM}}(y) = \mathrm{E}(X \mid Y = y) = \mu_x + P_{xy} P_{yy}^{-1}(y - \mu_y).$$
Note that the estimate is a linear function of $y$ (or rather, affine).
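As a numerical illustration of (13.2), the conditional mean and covariance follow directly from the joint moments. The sketch below is not from the book; the names are chosen for illustration, and real-valued jointly Gaussian vectors are assumed so that $P_{yx} = P_{xy}^T$:

import numpy as np

def conditional_gaussian(mu_x, mu_y, Pxx, Pxy, Pyy, y):
    """Mean and covariance of X | Y = y for jointly Gaussian (X, Y), real-valued case."""
    G = Pxy @ np.linalg.inv(Pyy)       # "gain" P_xy P_yy^{-1}
    x_cm = mu_x + G @ (y - mu_y)       # conditional mean estimate
    P_cond = Pxx - G @ Pxy.T           # P_xx - P_xy P_yy^{-1} P_yx
    return x_cm, P_cond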
13.2.2 Alternative optimality interpretations
The Maximum A Posteriori (MAP) estimator, which maximizes the Probability Density Function (PDF) with respect to $x$, coincides with the CM estimator for Gaussian distributions.
Another possible estimate is given by the Conditional Minimum Variance (CMV) principle,
$$\hat{x}^{\mathrm{CMV}}(y) = \arg\min_{x(y)} \mathrm{E}\big(\|X - x(y)\|^2 \mid Y = y\big).$$
It is fairly easy to see that the CMV estimate also coincides with the CM estimate. Expanding the square gives
$$\mathrm{E}\big(\|X - x(y)\|^2 \mid Y = y\big) = \mathrm{E}\big(\|X\|^2 \mid Y = y\big) - 2\, x(y)^T \hat{x}(y) + \|x(y)\|^2.$$
This expression is minimized for $x(y) = \hat{x}(y)$, and the minimum variance is given by the remaining two terms, $\mathrm{E}(\|X\|^2 \mid Y = y) - \|\hat{x}(y)\|^2$.
The closely related (unconditional) Minimum Variance (MV) principle defines an estimator (note the difference between estimator and estimate here):
$$\hat{X}^{\mathrm{MV}}(Y) = \arg\min_{X(Y)} \mathrm{E}_Y\Big(\mathrm{E}_X\big(\|X - X(Y)\|^2 \mid Y\big)\Big).$$
Here we explicitly marked which variable the expectation operates on. Now, the CM estimate minimizes the second (inner) expectation for all values of $Y$. Thus, the weighted version, defined by the expectation with respect to $Y$, must also be minimized by the CM estimator for each $Y = y$. That is, as an estimator, the unconditional MV and CM also coincide.
13.2.3 Derivation of marginal distribution

Start with the easily checked formula for the joint covariance matrix $P$,
$$P = \begin{pmatrix} P_{xx} & P_{xy} \\ P_{yx} & P_{yy} \end{pmatrix} = \begin{pmatrix} I & P_{xy} P_{yy}^{-1} \\ 0 & I \end{pmatrix} \begin{pmatrix} P_{xx} - P_{xy} P_{yy}^{-1} P_{yx} & 0 \\ 0 & P_{yy} \end{pmatrix} \begin{pmatrix} I & 0 \\ P_{yy}^{-1} P_{yx} & I \end{pmatrix}, \tag{13.3}$$
and Bayes' rule
$$p_{X|Y}(x \mid y) = \frac{p_{X,Y}(x, y)}{p_Y(y)}.$$
From (13.3) we get
$$\begin{pmatrix} x - \mu_x \\ y - \mu_y \end{pmatrix}^T P^{-1} \begin{pmatrix} x - \mu_x \\ y - \mu_y \end{pmatrix} = (x - \bar{x})^T (P_{xx} - P_{xy} P_{yy}^{-1} P_{yx})^{-1} (x - \bar{x}) + (y - \mu_y)^T P_{yy}^{-1} (y - \mu_y),$$
and the ratio of determinants can be simplified, since (13.3) gives $\det P = \det(P_{xx} - P_{xy} P_{yy}^{-1} P_{yx}) \det(P_{yy})$. We note that the new Gaussian distribution must have $P_{xx} - P_{xy} P_{yy}^{-1} P_{yx}$ as covariance matrix,
where
$$\bar{x} = \mu_x + P_{xy} P_{yy}^{-1} (y - \mu_y).$$
From this, we can conclude that
$$p_{X|Y}(x \mid y) = \frac{1}{(2\pi)^{n_x/2} \det(P_{xx} - P_{xy} P_{yy}^{-1} P_{yx})^{1/2}} \exp\Big(-\tfrac{1}{2} (x - \bar{x})^T (P_{xx} - P_{xy} P_{yy}^{-1} P_{yx})^{-1} (x - \bar{x})\Big),$$
which is a Gaussian distribution (here $n_x = \dim X$) with mean and covariance as given in (13.2).
13.2.4 Example: derivation of the Kalman filter
As an illustration of conditional expectation, an inductive derivation of the Kalman filter will be given for the state space model
$$x_{t+1} = A x_t + B_v v_t, \qquad v_t \in \mathrm{N}(0, Q),$$
$$y_t = C x_t + e_t, \qquad e_t \in \mathrm{N}(0, R).$$
Induction implies that $x_t$, given the measurements $y^t$, is normally distributed.
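The inductive step can be sketched as follows (a reconstruction of the standard argument, not copied from the book, since the details are not reproduced above). Given $x_t \mid y^{t-1} \in \mathrm{N}(\hat{x}_{t|t-1}, P_{t|t-1})$, the model implies that $x_t$ and $y_t$ are jointly Gaussian given $y^{t-1}$,
$$\begin{pmatrix} x_t \\ y_t \end{pmatrix} \Big|\, y^{t-1} \in \mathrm{N}\!\left( \begin{pmatrix} \hat{x}_{t|t-1} \\ C\hat{x}_{t|t-1} \end{pmatrix},\; \begin{pmatrix} P_{t|t-1} & P_{t|t-1}C^T \\ C P_{t|t-1} & C P_{t|t-1} C^T + R \end{pmatrix} \right),$$
so (13.2) with $X = x_t$ and $Y = y_t$ gives
$$\hat{x}_{t|t} = \hat{x}_{t|t-1} + P_{t|t-1} C^T (C P_{t|t-1} C^T + R)^{-1}(y_t - C\hat{x}_{t|t-1}),$$
$$P_{t|t} = P_{t|t-1} - P_{t|t-1} C^T (C P_{t|t-1} C^T + R)^{-1} C P_{t|t-1}.$$
The time update $\hat{x}_{t+1|t} = A\hat{x}_{t|t}$, $P_{t+1|t} = A P_{t|t} A^T + B_v Q B_v^T$ then follows from the linearity of the state equation, which completes the induction.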
13.3 Wiener filters
The derivation and interpretations of the Wiener filter follow Hayes (1996).

13.3.1 Basics

Consider the signal model
$$y_t = s_t + e_t. \tag{13.4}$$
The fundamental signal processing problem is to separate the signal $s_t$ from the noise $e_t$ using the measurements $y_t$. The signal model used in Wiener's approach is to assume that the second-order properties of all signals are known. When $s_t$ and $e_t$ are independent, sufficient knowledge is contained in the correlation coefficients
$$r_{ss}(k) = \mathrm{E}(s_t s_{t-k}^*), \qquad r_{ee}(k) = \mathrm{E}(e_t e_{t-k}^*),$$
and similarly for a possible correlation $r_{se}(k)$. Here we have assumed that the signals might be complex valued and vector valued, so $*$ denotes complex conjugate transpose. The correlation coefficients (or covariance matrices) may
in turn be defined by parametric signal models. For example, for a state space model, the Wiener filter provides a solution to the stationary Kalman filter, as will be shown in Section 13.3.7.
The non-causal Wiener filter is defined by
$$\hat{s}_t = \sum_{i=-\infty}^{\infty} h_i y_{t-i} = (h * y)_t. \tag{13.5}$$
In the next subsection, we study causal and predictive Wiener filters, but the principle is the same. The underlying idea is to minimize a least squares criterion,
$$\hat{h} = \arg\min_h V(h) = \arg\min_h \mathrm{E}(\varepsilon_t)^2 = \arg\min_h \mathrm{E}\big(s_t - \hat{s}_t(h)\big)^2 \tag{13.6}$$
$$= \arg\min_h \mathrm{E}\big(s_t - (y * h)_t\big)^2 = \arg\min_h \mathrm{E}\Big(s_t - \sum_{i=-\infty}^{\infty} h_i y_{t-i}\Big)^2, \tag{13.7}$$
where the residual $\varepsilon_t = s_t - \hat{s}_t$ and the least squares cost $V(h)$ are defined in a standard manner. Straightforward differentiation and equating to zero gives
$$\mathrm{E}\Big(\big(s_t - \sum_i h_i y_{t-i}\big)\, y_{t-k}^*\Big) = 0 \quad \text{for all } k. \tag{13.8}$$
This is the projection theorem, see Section 13.1. Using the definition of the correlation coefficients gives
$$\sum_{i=-\infty}^{\infty} h_i r_{yy}(k - i) = r_{sy}(k), \quad -\infty < k < \infty. \tag{13.9}$$
These are the Wiener-Hopf equations, which are fundamental for Wiener filtering. There are several special cases of the Wiener-Hopf equations, basically corresponding to different summation indices and intervals for $k$:
• The FIR Wiener filter $H(q) = h_0 + h_1 q^{-1} + \dots + h_{n-1} q^{-(n-1)}$ corresponds to
$$\sum_{i=0}^{n-1} h_i r_{yy}(k - i) = r_{sy}(k), \quad k = 0, 1, \dots, n-1. \tag{13.10}$$
• The causal (IIR) Wiener filter $H(q) = h_0 + h_1 q^{-1} + \dots$ corresponds to
$$\sum_{i=0}^{\infty} h_i r_{yy}(k - i) = r_{sy}(k), \quad 0 \leq k < \infty. \tag{13.11}$$
• The one-step ahead predictive (IIR) Wiener filter $H(q) = h_1 q^{-1} + h_2 q^{-2} + \dots$ corresponds to
$$\sum_{i=1}^{\infty} h_i r_{yy}(k - i) = r_{sy}(k), \quad 1 \leq k < \infty. \tag{13.12}$$
The FIR Wiener filter is a special case of the linear regression framework studied in Part III, and the non-causal, causal and predictive Wiener filters are derived in the next two subsections. The example in Section 13.3.8 summarizes the performance for a particular case.
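For the FIR case, (13.10) is a finite Toeplitz system of equations. A minimal sketch (not from the book; SciPy's Toeplitz solver is used as one possible tool, and real-valued signals are assumed so that $r_{yy}(-k) = r_{yy}(k)$):

import numpy as np
from scipy.linalg import solve_toeplitz

def fir_wiener(r_yy, r_sy, n):
    """Solve the FIR Wiener-Hopf equations (13.10) for h_0, ..., h_{n-1}.

    r_yy[k] and r_sy[k] hold the correlation coefficients for lags k = 0, ..., n-1."""
    # Toeplitz system: sum_i h_i r_yy(k - i) = r_sy(k), k = 0, ..., n-1
    return solve_toeplitz(r_yy[:n], r_sy[:n])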
An expression for the estimation error variance is easy to derive from the projection theorem (used in the second equality below):
$$\mathrm{Var}(s_t - \hat{s}_t) = \mathrm{E}(s_t - \hat{s}_t)^2 = \mathrm{E}\big((s_t - \hat{s}_t)\, s_t\big) = r_{ss}(0) - \sum_i h_i r_{sy}(i). \tag{13.13}$$
This expression holds for all cases, the only difference being the summation interval.
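Continuing the FIR sketch after (13.10) (the arrays r_yy, r_sy and the order n are as assumed there, and r_ss is a hypothetical array of signal autocorrelations), the minimal error variance (13.13) follows directly from the solved coefficients:

h = fir_wiener(r_yy, r_sy, n)
mmse = r_ss[0] - h @ r_sy[:n]    # equation (13.13), real-valued signals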
13.3.2 The non-causal Wiener filter

To get an easily computable expression for the non-causal Wiener filter, write (13.9) as a convolution, $(r_{yy} * h)(k) = r_{sy}(k)$. The Fourier transform of a convolution is a multiplication, and the correlation coefficients become spectral densities, $H(e^{i\omega}) \Phi_{yy}(e^{i\omega}) = \Phi_{sy}(e^{i\omega})$. Thus, the Wiener filter is
$$H(e^{i\omega}) = \frac{\Phi_{sy}(e^{i\omega})}{\Phi_{yy}(e^{i\omega})}, \tag{13.14}$$
or in the $z$ domain
$$H(z) = \frac{\Phi_{sy}(z)}{\Phi_{yy}(z)}. \tag{13.15}$$
Here the $z$-transform is defined as $F(z) = \sum_k f_k z^{-k}$, so that stability of causal filters corresponds to poles in $|z| < 1$. The filter (13.15) is a filter where the poles occur in pairs reflected in the unit circle. Its implementation requires either a factorization or a partial fraction decomposition, and backward filtering of the unstable part.
Figure 13.1 The causal Wiener filter $H(z) = G_+(z) F(z)$ can be seen as a cascade of a whitening filter $F(z)$ and the causal part $G_+(z)$ of the Wiener filter for the white noise input $\varepsilon_t$.
13.3.3 The causal Wiener filter
The causal Wiener filter is defined as in (13.6), with the restriction that $h_k = 0$ for $k < 0$, so that future measurements are not used when forming $\hat{s}_t$. The immediate idea of truncating the non-causal Wiener filter for $k < 0$ does not work. The reason is that the information in future measurements can be partially recovered from past measurements, due to signal correlation. However, the optimal solution comes close to this argument, when a part of the causal Wiener filter is interpreted as a whitening filter. The basic idea is that the causal Wiener filter is the causal part of the non-causal Wiener filter if the measurements are white noise!
Therefore, consider the filter structure depicted in Figure 13.1. If $y_t$ has a rational spectral density, spectral factorization provides the sought whitening filter. For real-valued signals, it holds on the unit circle that the spectrum can be written
$$\Phi_{yy}(z) = \sigma_\varepsilon^2\, Q(z) Q(1/z), \tag{13.16}$$
where $Q(z)$ is a monic ($q(0) = 1$), stable, minimum phase and causal filter. A stable and causal whitening filter is then given as
$$F(z) = \frac{1}{\sigma_\varepsilon Q(z)}. \tag{13.17}$$
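As a small worked example (a hypothetical spectrum, not taken from the book): if $\Phi_{yy}(z) = 2.5 + z + z^{-1}$, then
$$\Phi_{yy}(z) = 2\,(1 + 0.5 z^{-1})(1 + 0.5 z),$$
so $\sigma_\varepsilon^2 = 2$ and $Q(z) = 1 + 0.5 z^{-1}$, which is monic, stable and minimum phase as required.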
Now the correlation function of the whitened noise is $r_{\varepsilon\varepsilon}(k) = \delta_k$, so the Wiener-Hopf equation (13.9) becomes
$$\sum_{i=0}^{\infty} g_i\, r_{\varepsilon\varepsilon}(k - i) = g_k = r_{s\varepsilon}(k), \quad 0 \leq k < \infty, \tag{13.18}$$
where $\{g_k\}$ denotes the impulse response of the white noise Wiener filter in Figure 13.1. Let us define the causal part of a sequence $\{x_k\}_{k=-\infty}^{\infty}$ in the $z$ domain as $[X(z)]_+$. Then, in the $z$ domain, (13.18) can be written as
$$G_+(z) = \big[\Phi_{s\varepsilon}(z)\big]_+. \tag{13.19}$$
It remains to express the spectral density for the cross correlation $\mathrm{E}(s_t \varepsilon_{t-k}^*)$ in terms of the signals in (13.4). Since $\varepsilon_t$ is obtained by filtering $y_t$ through $F(z)$, the cross spectrum becomes
$$\Phi_{s\varepsilon}(z) = \Phi_{sy}(z)\, F^*(1/z^*) = \frac{\Phi_{sy}(z)}{\sigma_\varepsilon\, Q^*(1/z^*)}. \tag{13.20}$$
To summarize, the causal Wiener filter is
$$H(z) = F(z)\, G_+(z) = \frac{1}{\sigma_\varepsilon^2\, Q(z)} \left[\frac{\Phi_{sy}(z)}{Q^*(1/z^*)}\right]_+. \tag{13.21}$$
It is well worth noting that the non-causal Wiener filter can be written in a similar way:
$$H(z) = \frac{1}{\sigma_\varepsilon^2\, Q(z)} \cdot \frac{\Phi_{sy}(z)}{Q^*(1/z^*)}. \tag{13.22}$$
That is, both the causal and non-causal Wiener filters can be interpreted as a cascade of a whitening filter and a second filter giving the Wiener solution for the whitened signal. The second filter's impulse response is simply truncated when the causal filter is sought.
Finally, to actually compute the causal part of a filter which has poles both inside and outside the unit circle, a partial fraction decomposition is needed, where the fraction corresponding to the causal part has all poles inside the unit circle and contains the direct term, while the fraction with poles outside the unit circle is discarded.
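A sketch of this computation (not from the book; scipy.signal.residuez and invresz are used here as one possible tool for filters given as polynomial ratios in $z^{-1}$):

import numpy as np
from scipy.signal import residuez, invresz

def causal_part(b, a, radius=1.0):
    """Causal part [.]_+ of a rational filter b(z)/a(z) in powers of z^{-1}.

    Keeps the partial fractions whose poles lie inside the unit circle together
    with the direct term, and discards the fractions with poles outside."""
    r, p, k = residuez(b, a)                  # residues, poles, direct term
    keep = np.abs(p) < radius                 # poles inside the unit circle
    b_c, a_c = invresz(r[keep], p[keep], k)   # recombine causal fractions + direct term
    return b_c, a_c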
13.3.4 Wiener signal predictor
The Wiener $m$-step signal predictor is easily derived from the causal Wiener filter above. The simplest derivation is to truncate the impulse response of the causal Wiener filter for a whitened input at another time instant. Figure 13.2(c) gives an elegant presentation and the relation to the causal Wiener filter. The same line of argument holds for the Wiener fixed-lag smoother as well; just use a negative value of the prediction horizon $m$.
13.3.5 An algorithm
The general algorithm below computes the Wiener filter for both cases of smoothing and prediction.
Algorithm 13.1 Causal, predictive and smoothing Wiener filter
Given the signal and noise spectra. The prediction horizon is $m$, that is, measurements up to time $t - m$ are used. For fixed-lag smoothing, $m$ is negative.