18 KALMAN FILTER REVISITED
18.1 INTRODUCTION
In Section 2.6 we developed the Kalman filter as the minimization of a quadratic error function. In Chapter 9 we developed the Kalman filter from the minimum variance estimate for the case where there is no driving noise present in the target dynamics model. In this chapter we develop the Kalman filter for the more general case [5, pp. 603–618]. The concept of the Kalman filter as a fading-memory filter shall be presented. Also its use for eliminating bias error buildup will be presented. Finally, the use of the Kalman filter driving noise to prevent instabilities in the filter is discussed.
18.2 KALMAN FILTER TARGET DYNAMIC MODEL
The target model considered by Kalman [19, 20] is given by [5, p. 604]
$$\frac{d}{dt}X(t) = A(t)X(t) + D(t)U(t) \tag{18.2-1}$$
where $A(t)$ is as defined for the time-varying target dynamic model given in (15.2-1), $D(t)$ is a time-varying matrix, and $U(t)$ is a vector consisting of random variables to be defined shortly. The term $U(t)$ is known as the process noise or forcing function. Its inclusion has beneficial properties to be indicated later. The matrix $D(t)$ need not be square, and as a result $U(t)$ need not have the same dimension as $X(t)$. The solution to the above linear differential equation is [5, p. 605]
$$X(t) = \Phi(t, t_{n-1})X(t_{n-1}) + \int_{t_{n-1}}^{t} \Phi(t, \tau)D(\tau)U(\tau)\,d\tau \tag{18.2-2}$$
where $\Phi$ is the transition matrix obtained from the homogeneous part of (18.2-1), that is, the differential equation without the driving-noise term $D(t)U(t)$, which is the random part of the target dynamic model. Consequently, $\Phi$ satisfies (15.3-1).
The time-discrete form of (18.2-1) is given by [5, p. 606]
$$X(t_n) = \Phi(t_n, t_{n-1})X(t_{n-1}) + V(t_n, t_{n-1}) \tag{18.2-3}$$
where
$$V(t, t_{n-1}) = \int_{t_{n-1}}^{t} \Phi(t, \tau)D(\tau)U(\tau)\,d\tau \tag{18.2-4}$$
The model process noise $U(t)$ is white noise, that is,
$$E[U(t)] = 0 \tag{18.2-5}$$
and
$$E[U(t)U(t')^T] = K(t)\,\delta(t - t') \tag{18.2-6}$$
where $K(t)$ is a nonnegative definite matrix dependent on time and $\delta(t)$ is the Dirac delta function given by
$$\delta(t - t') = 0 \qquad t' \neq t \tag{18.2-7}$$
with
$$\int_{a}^{b} \delta(t - t')\,dt = 1 \qquad a < t' < b \tag{18.2-8}$$
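To make the discrete-time model concrete, the following is a minimal simulation sketch of (18.2-3); the constant-velocity transition matrix, sampling interval, and noise intensity are assumed values for illustration and are not taken from the text.

```python
import numpy as np

# Minimal sketch (assumed values): simulate the discrete-time model of
# (18.2-3), X_n = Phi(t_n, t_{n-1}) X_{n-1} + V(t_n, t_{n-1}), for a
# hypothetical constant-velocity target.  The process noise V is drawn as a
# zero-mean Gaussian vector with covariance Q, the discrete-time counterpart
# of the white driving noise U(t) in (18.2-1).
T = 1.0                                    # sampling interval (assumed)
phi = np.array([[1.0, T],
                [0.0, 1.0]])               # transition matrix (position, velocity)
q = 0.01                                   # assumed driving-noise intensity
Q = q * np.array([[T**3 / 3, T**2 / 2],
                  [T**2 / 2, T       ]])   # covariance of V(t_n, t_{n-1})

rng = np.random.default_rng(0)
x = np.array([0.0, 1.0])                   # initial state: position 0, velocity 1
track = [x]
for n in range(50):
    v = rng.multivariate_normal(np.zeros(2), Q)   # process-noise sample
    x = phi @ x + v                               # equation (18.2-3)
    track.append(x)
track = np.array(track)
print(track[-1])                           # final simulated state
```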
18.3 KALMAN’S ORIGINAL RESULTS
By way of history, as mentioned previously, the least-squares and minimum-variance estimates developed in Sections 4.1 and 4.5 have their origins in the work done by Gauss in 1795. The least mean-square error estimate, which obtains the minimum of the ensemble expected value of the squared difference between the true and estimated values, was independently developed by Kolmogorov [125] and Wiener [126] in 1941 and 1942, respectively. Next came the Kalman filter [19, 20], which provides an estimate of a random variable that satisfies a linear differential equation driven by white noise [see (18.2-1)]. In this section the Kalman filter as developed in [19] is summarized together with other results obtained in that study. The least mean-square error criterion was used by Kalman, and when the driving noise is not present the results are consistent with those obtained using the least-squares error estimate and the minimum-variance estimate given previously.
Kalman [19] defines the optimal estimate as that which (if it exists) minimizes the expected value of a loss function $L(\varepsilon)$, that is, it minimizes the expected loss $E[L(\varepsilon)]$, where
$$\varepsilon = x_n - x^*_{n,n} \tag{18.3-1}$$
and $x^*_{n,n}$ is an estimate of $x_n$, the parameter to be estimated, based on the $n+1$ observations given by
$$Y^{(n)} = (y_0, y_1, y_2, \ldots, y_n)^T \tag{18.3-2}$$
It is assumed that the above random variables have a joint probability density function given by $p(x_n, Y^{(n)})$. A scalar function $L(\varepsilon)$ is a loss function if it satisfies
$$\text{(i)}\quad L(0) = 0 \tag{18.3-3a}$$
$$\text{(ii)}\quad L(\varepsilon') > L(\varepsilon'') > 0 \quad \text{if } \varepsilon' > \varepsilon'' > 0 \tag{18.3-3b}$$
$$\text{(iii)}\quad L(\varepsilon) = L(-\varepsilon) \tag{18.3-3c}$$
Example loss functions are $L(\varepsilon) = \varepsilon^2$ and $L(\varepsilon) = |\varepsilon|$. Kalman [19] gives the following very powerful optimal estimate theorem.
Theorem 1 [5, pp. 610–611] The optimum estimate $x^*_{n,n}$ of $x_n$ based on the observation $Y^{(n)}$ is given by
$$x^*_{n,n} = E[x_n \mid Y^{(n)}] \tag{18.3-4}$$
if the conditional density function for $x_n$ given $Y^{(n)}$, represented by $p(x_n \mid Y^{(n)})$, is (a) unimodal and (b) symmetric about its conditional expectation $E[x_n \mid Y^{(n)}]$.
The above theorem gives the amazing result that the optimum estimate (18.3-4) is independent of the loss function as long as (18.3-3a) to (18.3-3c) apply; it depends only on $p(x_n \mid Y^{(n)})$. An example of a conditional density function that satisfies conditions (a) and (b) is the Gaussian distribution.
In general, the conditional expectation $E[x_n \mid Y^{(n)}]$ is nonlinear and difficult to compute. If the loss function is assumed to be the quadratic loss function $L(\varepsilon) = \varepsilon^2$, then conditions (a) and (b) above can be relaxed, it now only being necessary for the conditional density function to have a finite second moment in order for (18.3-4) to be optimal.
Before proceeding to Kalman's second powerful theorem, the concept of orthogonal projection for random variables must be introduced. Let $\xi_i$ and $\xi_j$ be two random variables. In vector terms these two random variables are independent of each other if $\xi_i$ is not just a constant multiple of $\xi_j$. Furthermore, if [5, p. 611]
$$\eta = a_i \xi_i + a_j \xi_j \tag{18.3-5}$$
is a linear combination of $\xi_i$ and $\xi_j$, then $\eta$ is said to lie in the two-dimensional space defined by $\xi_i$ and $\xi_j$. A basis for this space can be formed using the Gram–Schmidt orthogonalization procedure. Specifically, let [5, p. 611]
$$e_i = \xi_i \tag{18.3-6}$$
and
$$e_j = \xi_j - \frac{E\{\xi_i \xi_j\}}{E\{\xi_i^2\}}\,\xi_i \tag{18.3-7}$$
It is seen that
$$E\{e_i e_j\} = 0 \qquad i \neq j \tag{18.3-8}$$
The above equation represents the orthogonality condition. (The idea of orthogonal projection for random variables follows by virtue of the one-for-one analogy with the theory of linear vector spaces. Note that whereas in linear algebra an inner product is used, here the expected value of the product of the random variables is used.) If we normalize $e_i$ and $e_j$ by dividing by their respective standard deviations, then we have "unit length" random variables and form an orthonormal basis for the space defined by $\xi_i$ and $\xi_j$. Let $e_i$ and $e_j$ now designate these orthonormal variables. Then
$$E\{e_i e_j\} = \delta_{ij} \tag{18.3-9}$$
where $\delta_{ij}$ is the Kronecker delta function, which equals 1 when $i = j$ and equals 0 otherwise.
Let $\eta$ be any random variable that is not necessarily a linear combination of $\xi_i$ and $\xi_j$. Then the orthogonal projection of $\eta$ onto the $\xi_i, \xi_j$ space is defined by [5, p. 612]
$$\bar{\eta} = e_i E\{\eta e_i\} + e_j E\{\eta e_j\} \tag{18.3-10}$$
Let
$$\tilde{\eta} = \eta - \bar{\eta} \tag{18.3-11}$$
Then it is easy to see that [5, p. 612]
$$E\{\tilde{\eta} e_i\} = 0 = E\{\tilde{\eta} e_j\} \tag{18.3-12}$$
which indicates that $\tilde{\eta}$ is orthogonal to the space $\xi_i, \xi_j$. Thus $\eta$ has been broken up into two parts: the part $\bar{\eta}$ in the space $\xi_i, \xi_j$, called the orthogonal projection of $\eta$ onto the $\xi_i, \xi_j$ space, and the part $\tilde{\eta}$ orthogonal to this space. The above concept of orthogonality for random variables can be generalized to an $n$-dimensional space. (A less confusing label than "orthogonal projection" would probably be just "projection.")
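The projection machinery of (18.3-5) to (18.3-12) can be checked numerically by replacing the expectations with sample averages over many realizations. The sketch below is purely illustrative; the particular random variables, their correlation, and the sample size are all assumed.

```python
import numpy as np

# Illustrative check of (18.3-6) to (18.3-12) with expectations replaced by
# sample averages over a large number of realizations (all values assumed).
rng = np.random.default_rng(1)
N = 200_000
xi_i = rng.standard_normal(N)                        # first random variable
xi_j = 0.6 * xi_i + rng.standard_normal(N)           # correlated second random variable
eta  = 2.0 * xi_i - xi_j + rng.standard_normal(N)    # variable to be projected

E = lambda z: z.mean()                               # sample-average stand-in for E{.}

# Gram-Schmidt step of (18.3-7), then normalize to "unit length" (unit variance).
e_i = xi_i
e_j = xi_j - (E(xi_i * xi_j) / E(xi_i**2)) * xi_i
e_i = e_i / np.sqrt(E(e_i**2))
e_j = e_j / np.sqrt(E(e_j**2))

# Orthogonal projection of eta onto the (xi_i, xi_j) space, equation (18.3-10),
# and the part orthogonal to that space, equation (18.3-11).
eta_bar   = e_i * E(eta * e_i) + e_j * E(eta * e_j)
eta_tilde = eta - eta_bar

print(E(e_i * e_j))                                  # ~0, orthogonality (18.3-8)/(18.3-9)
print(E(eta_tilde * e_i), E(eta_tilde * e_j))        # both ~0, equation (18.3-12)
```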
We are now ready to give Kalman's important Theorem 2.
Theorem 2 [5, pp. 612–613] The optimum estimate $x^*_{n,n}$ of $x_n$ based on the measurements $Y^{(n)}$ is equal to the orthogonal projection of $x_n$ onto the space defined by $Y^{(n)}$ if

1. the random variables $x_n, y_0, y_1, \ldots, y_n$ all have zero mean, and either
2. (a) $x_n$ and $Y^{(n)}$ are jointly Gaussian or (b) the estimate is restricted to being a linear function of the measurements $Y^{(n)}$ and $L(\varepsilon) = \varepsilon^2$.
The above optimum estimate is linear for the Gaussian case. This is because the projection of $x_n$ onto $Y^{(n)}$ is a linear combination of the elements of $Y^{(n)}$. But in the class of linear estimates the orthogonal projection always minimizes the expected quadratic loss given by $E[\varepsilon^2]$. Note that the more general estimate given by Kalman's Theorem 1 will not in general be linear.
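For the jointly Gaussian, zero-mean case the linearity can be seen explicitly from the standard formula for the conditional mean (a well-known result, stated here for reference rather than taken from the text):
$$E[x_n \mid Y^{(n)}] = \operatorname{Cov}(x_n, Y^{(n)})\,\big[\operatorname{Cov}(Y^{(n)}, Y^{(n)})\big]^{-1}\, Y^{(n)}$$
which is a linear combination of the elements of $Y^{(n)}$, in agreement with Theorem 2.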
Up till now the observations $y_i$ and the variable $x_n$ to be estimated were assumed to be scalar. Kalman actually gives his results for the case where they are vectors, and hence Kalman's Theorem 1 and Theorem 2 apply when these variables are vectors. We shall now apply Kalman's Theorem 2 to obtain the form of the Kalman filter given by him.
Let the target dynamics model be given by (18.2-1) and let the observation scheme be given by [5, p. 613]
$$Y(t) = M(t)X(t) \tag{18.3-13}$$
Note that Kalman, in giving (18.3-13), does not include any measurement noise term $N(t)$. Because of this, the Kalman filter form he gives is different from that given previously in this book (see Section 2.4). We shall later show that his form can be transformed to be identical to the forms given earlier in this book. The measurement $Y(t)$ given in (18.3-13) is assumed to be a vector. Let us assume that observations are made at times $i = 0, 1, \ldots, n$ and can be represented by the measurement vector given by
$$Y^{(n)} = \begin{bmatrix} Y_n \\ Y_{n-1} \\ \vdots \\ Y_0 \end{bmatrix} \tag{18.3-14}$$
We seek the optimum estimate $X^*_{n+1,n}$ of $X_{n+1}$, which minimizes $E[L(\varepsilon)]$. Applying Kalman's Theorem 2, we find that the optimum estimate is given by the projection of $X_{n+1}$ onto $Y^{(n)}$ of (18.3-14). In reference 19 Kalman shows that this solution is given by the recursive relationships [5, p. 614]
$$\Delta^*_n = \Phi(n+1, n)P^*_n M_n^T (M_n P^*_n M_n^T)^{-1} \tag{18.3-15a}$$
$$\Phi^*(n+1, n) = \Phi(n+1, n) - \Delta^*_n M_n \tag{18.3-15b}$$
$$X^*_{n+1,n} = \Phi^*(n+1, n)X^*_{n,n-1} + \Delta^*_n Y_n \tag{18.3-15c}$$
$$P^*_{n+1} = \Phi^*(n+1, n)P^*_n \Phi(n+1, n)^T + Q_{n+1,n} \tag{18.3-15d}$$
The above form of the Kalman filter has essentially the notation used by Kalman in reference 19; see also reference 5. Physically, $\Phi(n+1, n)$ is the transition matrix of the unforced system as specified by (18.2-3). As defined earlier, $M_n$ is the observation matrix, $Q_{n+1,n}$ is the covariance matrix of the vector $V(t_{n+1}, t_n)$, and $P^*_{n+1}$ is the covariance matrix of the estimate $X^*_{n+1,n}$.
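As a sanity check on the recursion just given, here is a minimal sketch of a single cycle of (18.3-15a) to (18.3-15d) as reconstructed above; the transition matrix, observation matrix, covariances, and measurement are made-up numbers, not values from the text.

```python
import numpy as np

# One cycle of Kalman's original recursion, (18.3-15a)-(18.3-15d).
# All numerical values are assumed, chosen only for illustration.
phi = np.array([[1.0, 1.0],
                [0.0, 1.0]])          # Phi(n+1, n)
M   = np.array([[1.0, 0.0]])          # observation matrix M_n (position only)
Q   = 0.01 * np.eye(2)                # covariance of V(t_{n+1}, t_n)

x_pred = np.array([0.0, 0.0])         # X*_{n,n-1}, prior one-step prediction
P      = np.eye(2)                    # P*_n, covariance of that prediction
y      = np.array([1.2])              # measurement Y_n

# (18.3-15a): gain
delta = phi @ P @ M.T @ np.linalg.inv(M @ P @ M.T)
# (18.3-15b): transition matrix of the estimator
phi_star = phi - delta @ M
# (18.3-15c): updated one-step prediction X*_{n+1,n}
x_pred = phi_star @ x_pred + delta @ y
# (18.3-15d): covariance of X*_{n+1,n}
P = phi_star @ P @ phi.T + Q

print(x_pred, P, sep="\n")
```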
We will now put the Kalman filter given by (18.3-15a) to (18.3-15d) in the form of (2.4-4a) to (2.4-4j) or, basically, (9.3-1) to (9.3-1d). The discrete version of the target dynamics model of (18.2-3) can be written as [5, p. 614]
$$X_{n+1} = \Phi(n+1, n)X_n + V_{n+1,n} \tag{18.3-16}$$
The observation equation with the measurement noise included can be written as
$$Y_n = M_n X_n + N_n \tag{18.3-17}$$
instead of (18.3-13), which does not include the measurement noise. Define an augmented state vector [5, p. 614]
$$X'_n = \begin{bmatrix} X_n \\ N_n \end{bmatrix} \tag{18.3-18}$$
and an augmented driving noise vector [5, p. 615]
$$V'_{n+1,n} = \begin{bmatrix} V_{n+1,n} \\ N_{n+1} \end{bmatrix} \tag{18.3-19}$$
Define also the augmented transition matrix [5, p. 615]
$$\Phi'(n+1, n) = \begin{bmatrix} \Phi(n+1, n) & 0 \\ 0 & 0 \end{bmatrix} \tag{18.3-20}$$
and the augmented observation matrix
$$M'_n = (\,M_n \mid I\,) \tag{18.3-21}$$
It then follows that (18.3-16) can be written as [5, p. 615]
$$X'_{n+1} = \Phi'(n+1, n)X'_n + V'_{n+1,n} \tag{18.3-22}$$
and (18.3-17) as [5, p. 615]
$$Y_n = M'_n X'_n \tag{18.3-23}$$
which have identical forms to (18.2-3) and (18.3-13), respectively, to which Kalman's Theorem 2 was applied to obtain (18.3-15). Replacing the unprimed parameters of (18.3-15) with the above primed parameters yields [5, p. 616]
$$X^*_{n,n} = X^*_{n,n-1} + H_n(Y_n - M_n X^*_{n,n-1}) \tag{18.3-24a}$$
$$H_n = S^*_{n,n-1}M_n^T(R_n + M_n S^*_{n,n-1}M_n^T)^{-1} \tag{18.3-24b}$$
$$S^*_{n,n} = (I - H_n M_n)S^*_{n,n-1} \tag{18.3-24c}$$
$$S^*_{n,n-1} = \Phi(n, n-1)S^*_{n-1,n-1}\Phi(n, n-1)^T + Q_{n,n-1} \tag{18.3-24d}$$
$$X^*_{n,n-1} = \Phi(n, n-1)X^*_{n-1,n-1} \tag{18.3-24e}$$
where $Q_{n+1,n}$ is the covariance matrix of $V_{n+1,n}$ and $R_{n+1}$ is the covariance matrix of $N_{n+1}$. The above form of the Kalman filter given by (18.3-24a) to (18.3-24e) is essentially exactly that given by (2.4-4a) to (2.4-4j) and (9.3-1) to (9.3-1d) when the latter two are extended to the case of a time-varying dynamics model.
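The predict/update structure of (18.3-24a) to (18.3-24e) can be summarized in a short sketch; all matrices and measurements below are assumed values chosen only to show how the five equations chain together, not a production implementation.

```python
import numpy as np

def kalman_cycle(x_filt, S_filt, y, phi, M, Q, R):
    """One predict/update cycle of (18.3-24a)-(18.3-24e) (illustrative sketch).

    x_filt, S_filt : X*_{n-1,n-1} and S*_{n-1,n-1} from the previous cycle
    y              : current measurement Y_n
    phi, M, Q, R   : Phi(n, n-1), M_n, Q_{n,n-1}, R_n
    """
    # (18.3-24e) and (18.3-24d): one-step prediction and its covariance
    x_pred = phi @ x_filt
    S_pred = phi @ S_filt @ phi.T + Q
    # (18.3-24b): filter gain
    H = S_pred @ M.T @ np.linalg.inv(R + M @ S_pred @ M.T)
    # (18.3-24a): filtered estimate
    x_filt = x_pred + H @ (y - M @ x_pred)
    # (18.3-24c): filtered covariance (as reconstructed above)
    S_filt = (np.eye(len(x_filt)) - H @ M) @ S_pred
    return x_filt, S_filt

# Example use with made-up numbers.
phi = np.array([[1.0, 1.0], [0.0, 1.0]])
M = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.25]])
x, S = np.zeros(2), np.eye(2)
for y in ([1.1], [2.0], [3.2]):
    x, S = kalman_cycle(x, S, np.array(y), phi, M, Q, R)
print(x)
```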
Comparing (9.3-1) to (9.3-1d), developed using the minimum-variance estimate, with (18.3-24a) to (18.3-24e), developed using the Kalman filter projection theorem for minimizing the loss function, we see that they differ by the presence of the Q term, the variance of the driving noise vector. It is gratifying to see that the two radically different approaches lead to essentially the same algorithms. Moreover, when the driving noise vector V goes to 0, (18.3-24a) to (18.3-24e) is essentially the same as given by (9.3-1) to (9.3-1d), the Q term in (18.3-24d) dropping out. With V present, $X_n$ is no longer completely determined by $X_{n-1}$. The larger the variance of V, the lower the dependence of $X_n$ on $X_{n-1}$, and as a result the less the Kalman filter estimate $X^*_{n,n}$ should and will depend on the past measurements. Put another way, the larger V is, the smaller the Kalman filter memory. The Kalman filter thus in effect has a fading memory built into it. Viewed from another point of view, the larger $Q_{n,n-1}$ is, the larger $S^*_{n,n-1}$ is, and hence the less the prediction $X^*_{n,n-1}$ is weighted relative to the new measurement $Y_n$ in forming $X^*_{n,n}$, which means that the filter memory is fading faster.
The matrix Q is often introduced for purely practical reasons even if the presence of a process-noise term in the target dynamics model cannot be justified. It can be used to counter the buildup of a bias error. The shorter the filter memory, the lower the bias error will be. The filter fading rate can be controlled adaptively to prevent bias error buildup or to respond to a target maneuver. This is done by observing the filter residual given by either
$$r_n = (Y_n - M_n X^*_{n,n})^T(Y_n - M_n X^*_{n,n}) \tag{18.3-25}$$
or
$$r_n = (Y_n - M_n X^*_{n,n})^T(S^*_{n,n})^{-1}(Y_n - M_n X^*_{n,n}) \tag{18.3-26}$$
The quantity
$$s_n = Y_n - M_n X^*_{n,n} \tag{18.3-27}$$
in the above two equations is often called the innovation process, or just the innovation, in the literature [7, 127]. The innovation process is white noise when the optimum filter is being used.
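A crude illustration of the adaptive fading idea described above: monitor the residual $r_n$ of (18.3-25) and inflate Q when it grows, thereby shortening the filter memory. The threshold and inflation factor are arbitrary assumptions, not values from the text.

```python
import numpy as np

def adapt_Q(residual, Q, threshold=9.0, boost=10.0):
    """Crude illustration of adaptive fading: if the scalar residual r_n of
    (18.3-25) exceeds a threshold, inflate Q to shorten the filter memory.
    Threshold and boost factor are arbitrary choices, not from the text."""
    r_n = float(residual.T @ residual)          # equation (18.3-25)
    return (boost * Q, r_n) if r_n > threshold else (Q, r_n)

# Example use: a large residual triggers the Q inflation.
Q = 0.01 * np.eye(2)
Q, r_n = adapt_Q(np.array([3.5]), Q)
print(r_n, Q[0, 0])
```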
Another benefit of including the process noise Q is that it prevents the predictor covariance matrix $S^*_{n,n-1}$ from staying singular once it becomes singular for any reason at any given time. The covariance matrix can become singular when the observations being made at one instant of time are perfect [5]. If this occurs, then the elements of H in (18.3-24a) become 0, and H becomes singular. When this occurs, the Kalman filter without process noise stops functioning; it no longer accepts new data, all new data being given a zero weight by H = 0. This is prevented when Q is present because if, for example, $S^*_{n-1,n-1}$ is singular at time $n-1$, the presence of $Q_{n,n-1}$ in (18.3-24d) will make $S^*_{n,n-1}$ nonsingular.
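A small numerical illustration of this last point (all numbers assumed): after a perfect observation of the full state the filtered covariance is singular, and only the Q term in (18.3-24d) makes the next predicted covariance nonsingular again.

```python
import numpy as np

# Illustration (assumed numbers): a perfect observation of both state
# components drives the filtered covariance to zero (singular); the Q term
# in (18.3-24d) is what makes the next predicted covariance nonsingular.
phi = np.array([[1.0, 1.0], [0.0, 1.0]])
S_filt = np.zeros((2, 2))                     # S*_{n,n} after a perfect observation
Q = 0.01 * np.eye(2)

S_pred_no_Q   = phi @ S_filt @ phi.T          # stays singular without Q
S_pred_with_Q = phi @ S_filt @ phi.T + Q      # (18.3-24d): nonsingular again

print(np.linalg.matrix_rank(S_pred_no_Q))     # 0
print(np.linalg.matrix_rank(S_pred_with_Q))   # 2
```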