Mendel, J.M. "Estimation Theory and Algorithms: From Gauss to Wiener to Kalman." Digital Signal Processing Handbook. Ed. Vijay K. Madisetti and Douglas B. Williams. Boca Raton: CRC Press LLC, 1999.
15.9 State Estimation for the Basic State-Variable Model
    Prediction • Filtering (the Kalman Filter) • Smoothing
15.10 Digital Wiener Filtering
15.11 Linear Prediction in DSP, and Kalman Filtering
15.12 Iterated Least Squares
15.13 Extended Kalman Filter
Acknowledgment
References
Further Information
15.1 Introduction
Estimation is one of four modeling problems. The other three are representation (how something should be modeled), measurement (which physical quantities should be measured and how they should be measured), and validation (demonstrating confidence in the model). Estimation, which fits in between the problems of measurement and validation, deals with the determination of those physical quantities that cannot be measured from those that can be measured. We shall cover a wide range of estimation techniques including weighted least squares, best linear unbiased, maximum-likelihood, mean-squared, and maximum a posteriori. These techniques are for parameter or state estimation or a combination of the two, as applied to either linear or nonlinear models.
The discrete-time viewpoint is emphasized in this chapter because: (1) much real data is collected in a digitized manner, so it is in a form ready to be processed by discrete-time estimation algorithms; and (2) the mathematics associated with discrete-time estimation theory is simpler than with continuous-time estimation theory. We view (discrete-time) estimation theory as the extension of classical signal processing to the design of discrete-time (digital) filters that process uncertain data in an optimal manner. Estimation theory can, therefore, be viewed as a natural adjunct to digital signal processing theory. Mendel [12] is the primary reference for all the material in this chapter.
Estimation algorithms process data and, as such, must be implemented on a digital computer. Our computation philosophy is, whenever possible, leave it to the experts. Many of our chapter's algorithms can be used with MATLAB™ and appropriate toolboxes (MATLAB is a registered trademark of The MathWorks, Inc.). See [12] for specific connections between MATLAB™ and toolbox M-files and the algorithms of this chapter.
The main model that we shall direct our attention to is linear in the unknown parameters, namely

Z(k) = H(k)θ + V(k)    (15.1)
In this model, which we refer to as a "generic linear model," Z(k) = col(z(k), z(k − 1), ..., z(k − N + 1)), which is N × 1, is called the measurement vector; its elements are z(j) = h'(j)θ + v(j), where the prime denotes transposition. θ, which is n × 1, is called the parameter vector, and contains the unknown deterministic or random parameters that will be estimated using one or more of this chapter's techniques; H(k), which is N × n, is called the observation matrix; and V(k), which is N × 1, is called the measurement noise vector. By convention, the argument "k" of Z(k), H(k), and V(k) denotes the fact that the last measurement used to construct (15.1) is the kth.
Examples of problems that can be cast into the form of the generic linear model are: identifying the impulse response coefficients in the convolutional summation model for a linear time-invariant system from noisy output measurements; identifying the coefficients of a linear time-invariant finite-difference equation model for a dynamical system from noisy output measurements; function approximation; state estimation; estimating parameters of a nonlinear model using a linearized version of that model; deconvolution; and identifying the coefficients in a discretized Volterra series representation of a nonlinear system.
The following estimation notation is used throughout this chapter: θ̂(k) denotes an estimate of θ, and θ̃(k) denotes the error in estimation, i.e., θ̃(k) = θ − θ̂(k). The generic linear model is the starting point for the derivation of many classical parameter estimation techniques, and the estimation model for Z(k) is Ẑ(k) = H(k)θ̂(k). In the rest of this chapter we develop specific structures for θ̂(k). These structures are referred to as estimators. Estimates are obtained whenever data are processed by an estimator.
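To make the first of these examples concrete, the sketch below builds Z(k), H(k), and θ for impulse-response identification from noisy output measurements, so that the data obey Z(k) = H(k)θ + V(k) as in (15.1). It is only an illustration: NumPy is assumed, and all numerical values and variable names are hypothetical rather than taken from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical FIR system: theta holds the n unknown impulse-response coefficients.
n, N = 3, 50
theta_true = np.array([1.0, -0.5, 0.25])

u = rng.standard_normal(N + n)        # known input sequence
v = 0.1 * rng.standard_normal(N)      # measurement noise V(k)

# Row j of H(k) is h'(j) = [u(j), u(j-1), ..., u(j-n+1)], so that each element of
# Z(k) is z(j) = h'(j) theta + v(j), i.e., a noisy convolutional-summation output.
H = np.array([[u[j + n - 1 - i] for i in range(n)] for j in range(N)])
Z = H @ theta_true + v                # measurement vector Z(k), here N x 1
```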
15.2 Least-Squares Estimation

The weighted least-squares estimator (WLSE) of θ, θ̂_WLS(k), minimizes the objective function J[θ̂(k)] = Z̃'(k)W(k)Z̃(k), where Z̃(k) = Z(k) − H(k)θ̂(k) is the vector of measurement residuals and W(k) is a symmetric positive definite weighting matrix. This weighting matrix can be used to weight recent measurements more (or less) heavily than past measurements.
If W(k) = cI, so that all measurements are weighted the same, then weighted least-squares reduces to least squares, in which case we obtain θ̂_LS(k). Setting dJ[θ̂(k)]/dθ̂(k) = 0, we find that

θ̂_WLS(k) = [H'(k)W(k)H(k)]^{-1} H'(k)W(k)Z(k)    (15.2)

and, consequently,

θ̂_LS(k) = [H'(k)H(k)]^{-1} H'(k)Z(k)    (15.3)

Note, also, that J[θ̂_WLS(k)] = Z'(k)W(k)Z(k) − θ̂'_WLS(k)H'(k)W(k)H(k)θ̂_WLS(k).
Matrix H'(k)W(k)H(k) must be nonsingular for its inverse in (15.2) to exist. This is true if W(k) is positive definite, as assumed, and H(k) is of maximum rank. We know that θ̂_WLS(k) minimizes J[θ̂(k)] because d²J[θ̂(k)]/dθ̂²(k) = 2H'(k)W(k)H(k) > 0, since H'(k)W(k)H(k) is invertible. Estimator θ̂_WLS(k) processes the measurements Z(k) linearly; hence, it is referred to as a linear estimator. In practice, we do not compute θ̂_WLS(k) using (15.2), because computing the inverse of H'(k)W(k)H(k) is fraught with numerical difficulties. Instead, the so-called normal equations, [H'(k)W(k)H(k)]θ̂_WLS(k) = H'(k)W(k)Z(k), are solved using stable algorithms from numerical linear algebra (e.g., [3]); one approach is to convert the original least-squares problem into an equivalent, easy-to-solve problem using orthogonal transformations such as Householder or Givens transformations. Note, also, that (15.2) and (15.3) apply to the estimation of either deterministic or random parameters, because nowhere in the derivation of θ̂_WLS(k) did we have to assume that θ was or was not random. Finally, note that WLSEs may not be invariant under changes of scale. One way to circumvent this difficulty is to use normalized data.
Least-squares estimates can also be computed using the singular-value decomposition (SVD) of matrix H(k). This computation is valid for both the overdetermined (N > n) and underdetermined (N < n) situations, and for the situation when H(k) may or may not be of full rank. The SVD of an N × n matrix A is

A = UΣV'    (15.4)

where U (N × N) and V (n × n) are unitary matrices and Σ is an N × n matrix whose only nonzero entries are the leading diagonal elements σ1 ≥ σ2 ≥ ... ≥ σr > 0. The σi's are the singular values of A, and r is the rank of A. Let the SVD of H(k) be given by (15.4). Even if H(k) is not of maximum rank, θ̂_LS(k) can then be computed as θ̂_LS(k) = VΣ^+U'Z(k), where the pseudoinverse Σ^+ is obtained by transposing Σ and inverting its nonzero singular values. Similar formulas exist for computing θ̂_WLS(k).
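As a rough numerical companion to (15.2), (15.3), and the remarks above, the sketch below (NumPy assumed; not from the chapter) solves the normal equations with a stable linear solver instead of forming an explicit inverse, and uses np.linalg.lstsq, which is SVD based and therefore remains well defined when H(k) is rank deficient. With W = I the two routines should agree, which is one quick sanity check.

```python
import numpy as np

def wlse(H, Z, W):
    """Weighted least-squares estimate, Eq. (15.2), computed from the normal
    equations [H' W H] theta = H' W Z with a stable solve instead of an inverse."""
    A = H.T @ W @ H
    b = H.T @ W @ Z
    return np.linalg.solve(A, b)

def lse_svd(H, Z):
    """Least-squares estimate, Eq. (15.3), via an SVD-based solver; the
    pseudoinverse is used internally, so rank-deficient H is also handled."""
    theta, *_ = np.linalg.lstsq(H, Z, rcond=None)
    return theta
```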
Equations (15.2) and (15.3) are batch equations, because they process all of the measurements at one time. These formulas can be made recursive in time by using simple vector and matrix partitioning techniques. The information form of the recursive WLSE is:

θ̂_WLS(k+1) = θ̂_WLS(k) + K_w(k+1)[z(k+1) − h'(k+1)θ̂_WLS(k)]    (15.7)

K_w(k+1) = P(k+1)h(k+1)w(k+1)    (15.8)

P^{-1}(k+1) = P^{-1}(k) + h(k+1)w(k+1)h'(k+1)    (15.9)

Equations (15.8) and (15.9) require the inversion of the n × n matrix P. If n is large, then this will be a costly computation. Applying a matrix inversion lemma to (15.9), one obtains the following alternative covariance form of the recursive WLSE: Equation (15.7), and

K_w(k+1) = P(k)h(k+1)[h'(k+1)P(k)h(k+1) + 1/w(k+1)]^{-1}    (15.10)

P(k+1) = [I − K_w(k+1)h'(k+1)]P(k)    (15.11)

Both forms of the recursive WLSE are initialized by θ̂_WLS(n) and P(n) = [H'(n)W(n)H(n)]^{-1}, and are used for k = n, n+1, ..., N − 1.
Equation (15.7) can be expressed as

θ̂_WLS(k+1) = [I − K_w(k+1)h'(k+1)]θ̂_WLS(k) + K_w(k+1)z(k+1)    (15.12)

which demonstrates that the recursive WLSE is a time-varying digital filter that is excited by random inputs (i.e., the measurements), one whose plant matrix [I − K_w(k+1)h'(k+1)] may itself be random because K_w(k+1) and h(k+1) may be random, depending upon the specific application. The random natures of these matrices make the analysis of this filter exceedingly difficult.
Two recursions are present in the recursive WLSEs. The first is the vector recursion for θ̂_WLS given by (15.7). Clearly, θ̂_WLS(k+1) cannot be computed from this expression until measurement z(k+1) is available. The second is the matrix recursion for either P^{-1}, given by (15.9), or P, given by (15.11). Observe that values for these matrices can be precomputed before measurements are made. A digital computer implementation of (15.7)–(15.9) is P^{-1}(k+1) → P(k+1) → K_w(k+1) → θ̂_WLS(k+1), whereas for (15.7), (15.10), and (15.11), it is P(k) → K_w(k+1) → θ̂_WLS(k+1) → P(k+1).
Finally, the recursive WLSEs can even be used for k = 0, 1, ..., N − 1. Often z(0) = 0, or there is no measurement made at k = 0, so that we can set z(0) = 0. In this case we can set w(0) = 0, and the recursive WLSEs can be initialized by setting θ̂_WLS(0) = 0 and P(0) to a diagonal matrix of very large numbers. This is very commonly done in practice. Fast fixed-order recursive least-squares algorithms that are based on the Givens rotation [3] and can be implemented using systolic arrays are described in [5] and the references therein.
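The covariance form, Eqs. (15.7), (15.10), and (15.11), together with the large-P(0) initialization just described, can be sketched as follows (Python/NumPy; scalar weights w(k) are assumed and the variable names are illustrative). With all weights equal to one, the final estimate should agree with the batch θ̂_LS(k), apart from the small bias introduced by the finite P(0).

```python
import numpy as np

def recursive_wlse(h_rows, z, w, P0_scale=1e6):
    """Covariance form of the recursive WLSE, Eqs. (15.7), (15.10), (15.11).
    h_rows[k] is h(k+1) as a 1-D array, z[k] is z(k+1), w[k] is a scalar weight."""
    n = h_rows.shape[1]
    theta = np.zeros(n)                        # theta_WLS(0) = 0
    P = P0_scale * np.eye(n)                   # P(0): diagonal matrix of very large numbers
    for h, zk, wk in zip(h_rows, z, w):
        Ph = P @ h
        K = Ph / (h @ Ph + 1.0 / wk)           # gain K_w(k+1), Eq. (15.10)
        theta = theta + K * (zk - h @ theta)   # Eq. (15.7)
        P = P - np.outer(K, h) @ P             # Eq. (15.11): [I - K h'] P
    return theta, P
```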
15.3 Properties of Estimators
How do we know whether or not the results obtained from the WLSE, or for that matter any estimator, are good? To answer this question, we must make use of the fact that all estimators represent transformations of random data; hence, θ̂(k) is itself random, so that its properties must be studied from a statistical viewpoint. This fact, and its consequences, which seem so obvious to us today, are due to the eminent statistician R.A. Fisher.
It is common to distinguish between small-sample and large-sample properties of estimators. The term "sample" refers to the number of measurements used to obtain θ̂, i.e., the dimension of Z. The phrase "small sample" means any number of measurements (e.g., 1, 2, 100, 10^4, or even an infinite number), whereas the phrase "large sample" means "an infinite number of measurements." Large-sample properties are also referred to as asymptotic properties. If an estimator possesses a small-sample property, it also possesses the associated large-sample property; but the converse is not always true. Although large sample means an infinite number of measurements, estimators begin to enjoy large-sample properties for much fewer than an infinite number of measurements. How few usually depends on the dimension of θ, n, the memory of the estimators, and in general on the underlying, albeit unknown, probability density function.
A thorough study of θ̂ would mean determining its probability density function p(θ̂). Usually, it is too difficult to obtain p(θ̂) for most estimators (unless θ̂ is multivariate Gaussian); thus, it is customary to emphasize the first- and second-order statistics of θ̂ (or its associated error θ̃ = θ − θ̂), the mean and the covariance.
Small-sample properties of an estimator are unbiasedness and efficiency. An estimator is unbiased if its mean value tracks the unknown parameter at every value of time, i.e., the mean value of the estimation error is zero at every value of time. Dispersion about the mean is measured by error variance. Efficiency is related to how small the error variance will be. Associated with efficiency is the very famous Cramer-Rao inequality (Fisher information matrix, in the case of a vector of parameters), which places a lower bound on the error variance, a bound that does not depend on a particular estimator.
Large-sample properties of an estimator are asymptotic unbiasedness, consistency, asymptotic normality, and asymptotic efficiency. Asymptotic unbiasedness and efficiency are limiting forms of their small-sample counterparts, unbiasedness and efficiency. The importance of an estimator being asymptotically normal (Gaussian) is that its entire probabilistic description is then known, and it can be entirely characterized just by its asymptotic first- and second-order statistics. Consistency is a form of convergence of θ̂(k) to θ; it is synonymous with convergence in probability. One of the reasons for the importance of consistency in estimation theory is that any continuous function of a consistent estimator is itself a consistent estimator, i.e., "consistency carries over." It is also possible to examine other types of stochastic convergence for estimators, such as mean-squared convergence and convergence with probability 1. A general carry-over property does not exist for these two types of convergence; it must be established case-by-case (e.g., [11]).
Generally speaking, it is very difficult to establish small-sample or large-sample properties for least-squares estimators, except in the very special case when H(k) and V(k) are statistically independent. While this condition is satisfied in the application of identifying an impulse response, it is violated in the important application of identifying the coefficients in a finite-difference equation, as well as in many other important engineering applications. Many large-sample properties of LSEs are determined by establishing that the LSE is equivalent to another estimator for which it is known that the large-sample property holds true. We pursue this below.
Least-squares estimators require no assumptions about the statistical nature of the generic linear model. Consequently, the formula for the WLSE is easy to derive. The price paid for not making assumptions about the statistical nature of the generic linear model is great difficulty in establishing small- or large-sample properties of the resulting estimator.
15.4 Best Linear Unbiased Estimation
Our second estimator is both unbiased and efficient by design, and is a linear function of the measurements Z(k). It is called a best linear unbiased estimator (BLUE), θ̂_BLU(k). As in the derivation of the WLSE, we begin with our generic linear model; but now we make two assumptions about this model, namely: (1) H(k) must be deterministic, and (2) V(k) must be zero mean with positive definite known covariance matrix R(k). The derivation of the BLUE is more complicated than the derivation of the WLSE because of the design constraints; however, its performance analysis is much easier because we build good performance into its design.
We begin by assuming the following linear structure for θ̂_BLU(k): θ̂_BLU(k) = F(k)Z(k). Matrix F(k) is designed such that: (1) θ̂_BLU(k) is an unbiased estimator of θ, and (2) the error variance for each of the n parameters is minimized. In this way, θ̂_BLU(k) will be unbiased and efficient (within the class of linear estimators) by design. The resulting BLUE is:

θ̂_BLU(k) = [H'(k)R^{-1}(k)H(k)]^{-1} H'(k)R^{-1}(k)Z(k)    (15.13)
A very remarkable connection exists between the BLUE and WLSE, namely, the BLUE of θ is the special case of the WLSE of θ when W(k) = R^{-1}(k). Consequently, all results obtained in the section above for θ̂_WLS(k) can be applied to θ̂_BLU(k) by setting W(k) = R^{-1}(k). Matrix R^{-1}(k) weights the contributions of precise measurements heavily and deemphasizes the contributions of imprecise measurements. The best linear unbiased estimation design technique has led to a weighting matrix that is quite sensible.
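Since the BLUE is the WLSE with W(k) = R^{-1}(k), it can be computed with the same machinery. Below is a minimal sketch (NumPy; illustrative only) that also returns P(k) = [H'(k)R^{-1}(k)H(k)]^{-1}, the error-covariance matrix discussed later in this section for the recursive BLUE.

```python
import numpy as np

def blue(H, Z, R):
    """BLUE of theta, Eq. (15.13): the WLSE with W(k) = R^{-1}(k).
    Also returns P = [H' R^{-1} H]^{-1}, the covariance of theta - theta_BLU."""
    Rinv = np.linalg.inv(R)        # adequate for illustration; factor R in practice
    P = np.linalg.inv(H.T @ Rinv @ H)
    theta = P @ (H.T @ Rinv @ Z)
    return theta, P
```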
If H(k) is deterministic and R(k) = σ_ν²I, then θ̂_BLU(k) = θ̂_LS(k). This result, known as the Gauss-Markov theorem, is important because we have connected two seemingly different estimators, one of which, θ̂_BLU(k), has the properties of unbiasedness and minimum variance by design; hence, in this case θ̂_LS(k) inherits these properties.
In a recursive WLSE, matrix P(k) has no special meaning. In a recursive BLUE [which is obtained by substituting W(k) = R^{-1}(k) into (15.7)–(15.9), or (15.7), (15.10), and (15.11)], matrix P(k) is the covariance matrix for the error between θ and θ̂_BLU(k), i.e., P(k) = [H'(k)R^{-1}(k)H(k)]^{-1} = cov[θ̃_BLU(k)]. Hence, every time P(k) is calculated in the recursive BLUE, we obtain a quantitative measure of how well we are estimating θ.
Recall that we stated that WLSEs may change in numerical value under changes in scale. BLUEs are invariant under changes in scale. This is accomplished automatically by setting W(k) = R^{-1}(k) in the WLSE.
The fact that H(k) must be deterministic severely limits the applicability of BLUEs in engineering applications.
15.5 Maximum-Likelihood Estimation
Probability is associated with a forward experiment in which the probability model, p(Z(k)|θ), is specified, including values for the parameters, θ, in that model (e.g., mean and variance in a Gaussian density function), and data (i.e., realizations) are generated using this model. Likelihood, l(θ|Z(k)), is proportional to probability. In likelihood, the data are given, as well as the nature of the probability model; but the parameters of the probability model are not specified. They must be determined from the given data. Likelihood is, therefore, associated with an inverse experiment.
The maximum-likelihood method is based on the relatively simple idea that different (statistical) populations generate different samples and that any given sample (i.e., set of data) is more likely to have come from some populations than from others.
In order to determine the maximum-likelihood estimate (MLE) of deterministic θ, θ̂_ML, we need to determine a formula for the likelihood function and then maximize that function. Because likelihood is proportional to probability, we need to know the entire joint probability density function of the measurements in order to determine a formula for the likelihood function. This, of course, is much more information about Z(k) than was required in the derivation of the BLUE. In fact, it is the most information that we can ever expect to know about the measurements. The price we pay for knowing so much information about Z(k) is complexity in maximizing the likelihood function. Generally, mathematical programming must be used in order to determine θ̂_ML.
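As an illustration of that last point, the sketch below numerically maximizes a Gaussian log-likelihood for the generic linear model by minimizing its negative with scipy.optimize.minimize (SciPy assumed; the model, names, and starting point are hypothetical). For this particular model the maximizer also has the closed form discussed below, so the numerical route is shown only to make the idea concrete.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(theta, Z, H, R):
    """Negative Gaussian log-likelihood (constants dropped) for Z = H theta + V,
    with V zero-mean Gaussian noise of covariance R."""
    r = Z - H @ theta
    return 0.5 * r @ np.linalg.solve(R, r)

def ml_estimate(Z, H, R, theta0):
    """Maximize the likelihood by minimizing its negative with a quasi-Newton method."""
    res = minimize(neg_log_likelihood, theta0, args=(Z, H, R), method="BFGS")
    return res.x
```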
Maximum-likelihood estimates are very popular and widely used because they enjoy very good large-sample properties. They are consistent, asymptotically Gaussian with mean θ and covariance matrix (1/N)J^{-1}, in which J is the Fisher information matrix, and are asymptotically efficient. Functions of maximum-likelihood estimates are themselves maximum-likelihood estimates, i.e., if g(θ) is a vector function mapping θ into an interval in r-dimensional Euclidean space, then g(θ̂_ML) is a MLE of g(θ). This "invariance" property is usually not enjoyed by WLSEs or BLUEs.
In one special case it is very easy to compute θ̂_ML, i.e., for our generic linear model in which H(k) is deterministic and V(k) is Gaussian. In this case θ̂_ML = θ̂_BLU. These estimators are: unbiased, because θ̂_BLU is unbiased; efficient (within the class of linear estimators), because θ̂_BLU is efficient; consistent, because θ̂_ML is consistent; and Gaussian, because they depend linearly on Z(k), which is Gaussian. If, in addition, R(k) = σ_ν²I, then θ̂_ML(k) = θ̂_BLU(k) = θ̂_LS(k), and these estimators are unbiased, efficient (within the class of linear estimators), consistent, and Gaussian.
The method of maximum-likelihood is limited to deterministic parameters. In the case of random parameters, we can still use the WLSE or the BLUE, or, if additional information is available, we can use either a mean-squared or maximum a posteriori estimator, as described below. The former does not use statistical information about the random parameters, whereas the latter does.
15.6 Mean-Squared Estimation of Random Parameters
Given measurements z(1), z(2), ..., z(k), the mean-squared estimator (MSE) of random θ, θ̂_MS(k) = φ[z(i), i = 1, 2, ..., k], minimizes the mean-squared error J[θ̃_MS(k)] = E{θ̃'_MS(k)θ̃_MS(k)} [where θ̃_MS(k) = θ − θ̂_MS(k)]. The function φ[z(i), i = 1, 2, ..., k] may be nonlinear or linear. Its exact structure is determined by minimizing J[θ̃_MS(k)].
The solution to this mean-squared estimation problem, which is known as the fundamental theorem of estimation theory, is:

θ̂_MS(k) = E{θ|Z(k)}    (15.14)

As it stands, (15.14) is not terribly useful for computing θ̂_MS(k). In general, we must first compute p[θ|Z(k)] and then perform the requisite number of integrations of θp[θ|Z(k)] to obtain θ̂_MS(k). It is useful to separate this computation into two major cases: (1) θ and Z(k) are jointly Gaussian (the Gaussian case), and (2) θ and Z(k) are not jointly Gaussian (the non-Gaussian case).
When θ and Z(k) are jointly Gaussian, the estimator that minimizes the mean-squared error is

θ̂_MS(k) = m_θ + P_θz(k)P_z^{-1}(k)[Z(k) − m_z(k)]    (15.15)

where m_θ is the mean of θ, m_z(k) is the mean of Z(k), P_z(k) is the covariance matrix of Z(k), and P_θz(k) is the cross-covariance between θ and Z(k). Of course, to compute θ̂_MS(k) using (15.15), we must somehow know all of these statistics, and we must be sure that θ and Z(k) are jointly Gaussian.
For the generic linear model, Z(k) = H(k)θ + V(k), in which H(k) is deterministic, V(k) is Gaussian noise with known invertible covariance matrix R(k), θ is Gaussian with mean m_θ and covariance matrix P_θ, and θ and V(k) are statistically independent, θ and Z(k) are jointly Gaussian, and (15.15) becomes

θ̂_MS(k) = m_θ + P_MS(k)H'(k)R^{-1}(k)[Z(k) − H(k)m_θ]    (15.18)

where P_MS(k) = cov[θ̃_MS(k)] = [P_θ^{-1} + H'(k)R^{-1}(k)H(k)]^{-1} is the covariance of the estimation error.

Suppose θ and Z(k) are not jointly Gaussian and that we know m_θ, m_z(k), P_z(k), and P_θz(k). In this case, the estimator that is constrained to be an affine transformation of Z(k) and that minimizes the mean-squared error is also given by (15.15).
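For the generic linear Gaussian model, (15.18) is simple to program once m_θ, P_θ, and R(k) are known. A minimal sketch (NumPy; all names illustrative) that returns both θ̂_MS(k) and the error covariance P_MS(k):

```python
import numpy as np

def ms_estimate(Z, H, R, m_theta, P_theta):
    """Mean-squared (conditional-mean) estimate for the linear Gaussian model,
    Eq. (15.18), with P_MS = [P_theta^{-1} + H' R^{-1} H]^{-1}."""
    Rinv = np.linalg.inv(R)
    P_ms = np.linalg.inv(np.linalg.inv(P_theta) + H.T @ Rinv @ H)
    theta_ms = m_theta + P_ms @ H.T @ Rinv @ (Z - H @ m_theta)
    return theta_ms, P_ms
```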
We now know the answer to the following important question: When is the linear (affine) mean-squared estimator the same as the mean-squared estimator? The answer is when θ and Z(k) are jointly Gaussian. If θ and Z(k) are not jointly Gaussian, then θ̂_MS(k) = E{θ|Z(k)}, which, in general, is a nonlinear function of measurements Z(k), i.e., it is a nonlinear estimator.
Associated with mean-squared estimation theory is the orthogonality principle: suppose f[Z(k)] is any function of the data Z(k); then the error in the mean-squared estimator is orthogonal to f[Z(k)] in the sense that E{[θ − θ̂_MS(k)]f'[Z(k)]} = 0. A frequently encountered special case of this occurs when f[Z(k)] = θ̂_MS(k), in which case E{θ̃_MS(k)θ̂'_MS(k)} = 0.
When θ and Z(k) are jointly Gaussian, θ̂_MS(k) in (15.15) has the following properties: (1) it is unbiased; (2) each of its components has the smallest error variance; (3) it is a "linear" (affine) estimator; (4) it is unique; and (5) both θ̂_MS(k) and θ̃_MS(k) are multivariate Gaussian, which means that these quantities are completely characterized by their first- and second-order statistics. Tremendous simplifications occur when θ and Z(k) are jointly Gaussian!
Many of the results presented in this section are applicable to objective functions other than the mean-squared objective function. See the supplementary material at the end of Lesson 13 in [12] for discussions on a wide range of objective functions that lead to E{θ|Z(k)} as the optimal estimator of θ, as well as discussions on a full-blown nonlinear estimator of θ.
There is a connection between the BLUE and the MSE. The connection requires a slightly different BLUE, one that incorporates the a priori statistical information about random θ. To do this, we treat m_θ as an additional measurement that is augmented to Z(k). The additional measurement equation is obtained by adding and subtracting θ in the identity m_θ = m_θ, i.e., m_θ = θ + (m_θ − θ). Quantity (m_θ − θ) is now treated as zero-mean measurement noise with covariance P_θ. The augmented model therefore stacks the generic linear model on top of this extra measurement equation: its observation matrix is H(k) stacked on top of the n × n identity matrix, and its measurement noise is V(k) stacked on top of (m_θ − θ). Let θ̂^a_BLU(k) denote the BLUE of θ for this augmented model. Then it is always true that θ̂_MS(k) = θ̂^a_BLU(k). Note that the weighted least-squares objective function that is associated with θ̂^a_BLU(k) is J^a[θ̂^a(k)] = [m_θ − θ̂^a(k)]'P_θ^{-1}[m_θ − θ̂^a(k)] + Z̃'(k)R^{-1}(k)Z̃(k).
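The augmentation just described is easy to mimic numerically: append m_θ to Z(k) as an extra "measurement" of θ whose noise covariance is P_θ, and apply the BLUE formula to the stacked model. A sketch under those assumptions (NumPy; illustrative names), whose output should agree with θ̂_MS(k) from (15.18):

```python
import numpy as np

def augmented_blue(Z, H, R, m_theta, P_theta):
    """BLUE for the augmented model: m_theta is appended to Z(k) as a measurement
    of theta with 'noise' covariance P_theta; the augmented noise covariance is
    block diagonal because (m_theta - theta) is independent of V(k)."""
    N, n = H.shape
    Za = np.concatenate([Z, m_theta])
    Ha = np.vstack([H, np.eye(n)])
    Ra = np.block([[R, np.zeros((N, n))],
                   [np.zeros((n, N)), P_theta]])
    Rinv = np.linalg.inv(Ra)
    P = np.linalg.inv(Ha.T @ Rinv @ Ha)
    return P @ (Ha.T @ Rinv @ Za)     # equals theta_MS(k) for this model
```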
15.7 Maximum A Posteriori Estimation of Random Parameters
Maximum a posteriori (MAP) estimation is also known as Bayesian estimation. Recall Bayes's rule: p(θ|Z(k)) = p(Z(k)|θ)p(θ)/p(Z(k)), in which density function p(θ|Z(k)) is known as the a posteriori (or posterior) conditional density function, and p(θ) is the prior density function for θ. Observe that p(θ|Z(k)) is related to likelihood function l{θ|Z(k)}, because l{θ|Z(k)} ∝ p(Z(k)|θ). Additionally, because p(Z(k)) does not depend on θ, p(θ|Z(k)) ∝ p(Z(k)|θ)p(θ). In MAP estimation, values of θ are found that maximize p(Z(k)|θ)p(θ). Obtaining a MAP estimate involves specifying both p(Z(k)|θ) and p(θ) and finding the value of θ that maximizes p(θ|Z(k)). It is the knowledge of the a priori probability model for θ, p(θ), that distinguishes the problem formulation for MAP estimation from MS estimation.
If θ1, θ2, ..., θn are uniformly distributed, then p(θ|Z(k)) ∝ p(Z(k)|θ), and the MAP estimator of θ equals the ML estimator of θ. Generally, MAP estimates are quite different from ML estimates. For example, the invariance property of MLEs usually does not carry over to MAP estimates. One reason for this can be seen from the formula p(θ|Z(k)) ∝ p(Z(k)|θ)p(θ). Suppose, for example, that φ = g(θ) and we want to determine φ̂_MAP by first computing θ̂_MAP. Because p(θ) depends on the Jacobian matrix of g^{-1}(φ), φ̂_MAP ≠ g(θ̂_MAP). Usually θ̂_MAP and θ̂_ML(k) are asymptotically identical to one another, since in the large-sample case the knowledge of the observations tends to swamp the knowledge of the prior distribution [10].
Generally speaking, optimization must be used to compute θ̂_MAP(k). In the special but important case when Z(k) and θ are jointly Gaussian, θ̂_MAP(k) = θ̂_MS(k). This result is true regardless of the nature of the model relating θ to Z(k). Of course, in order to use it, we must first establish that Z(k) and θ are jointly Gaussian. Except for the generic linear model, this is very difficult to do.

When H(k) is deterministic, V(k) is white Gaussian noise with known covariance matrix R(k), and θ is multivariate Gaussian with known mean m_θ and covariance P_θ, then θ̂_MAP(k) = θ̂^a_BLU(k); hence, for the generic linear Gaussian model, MS, MAP, and BLUE estimates of θ are all the same, i.e., θ̂_MS(k) = θ̂^a_BLU(k) = θ̂_MAP(k).
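For the generic linear Gaussian model the MAP estimate can also be obtained by direct numerical maximization of log p(Z(k)|θ) + log p(θ), which is one way to check the equalities above. A sketch (SciPy assumed; illustrative only):

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_posterior(theta, Z, H, R, m_theta, P_theta):
    """-log p(Z|theta) - log p(theta), up to constants, for the linear Gaussian model."""
    r = Z - H @ theta
    d = theta - m_theta
    return 0.5 * (r @ np.linalg.solve(R, r) + d @ np.linalg.solve(P_theta, d))

def map_estimate(Z, H, R, m_theta, P_theta):
    """MAP estimate by numerical optimization, started from the prior mean m_theta."""
    res = minimize(neg_log_posterior, m_theta,
                   args=(Z, H, R, m_theta, P_theta), method="BFGS")
    return res.x
```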
15.8 The Basic State-Variable Model
In the rest of this chapter we shall describe a variety of mean-squared state estimators for a linear, (possibly) time-varying, discrete-time, dynamical system, which we refer to as the basic state-variable model. This system is characterized by the n × 1 state vector x(k) and the m × 1 measurement vector z(k), and is:

x(k+1) = Φ(k+1, k)x(k) + Γ(k+1, k)w(k) + Ψ(k+1, k)u(k)    (15.20)
z(k+1) = H(k+1)x(k+1) + v(k+1)    (15.21)

where k = 0, 1, .... In this model w(k) and v(k) are p × 1 and m × 1 mutually uncorrelated (possibly nonstationary) jointly Gaussian white noise sequences; i.e., E{w(i)w'(j)} = Q(i)δ_ij, E{v(i)v'(j)} = R(i)δ_ij, and E{w(i)v'(j)} = S = 0, for all i and j. Covariance matrix Q(i) is positive semidefinite and R(i) is positive definite [so that R^{-1}(i) exists]. Additionally, u(k) is an l × 1 vector of known system inputs, and initial state vector x(0) is multivariate Gaussian, with mean m_x(0) and covariance P_x(0), and x(0) is not correlated with w(k) and v(k). The dimensions of matrices Φ, Γ, Ψ, H, Q, and R are n × n, n × p, n × l, m × n, p × p, and m × m, respectively. The double arguments in matrices Φ, Γ, and Ψ may not always be necessary, in which case we replace (k+1, k) by k.
Disturbance w(k) is often used to model disturbance forces acting on the system, errors in modeling the system, or errors due to actuators in the translation of the known input, u(k), into physical signals. Vector v(k) is often used to model errors in measurements made by sensing instruments, or unavoidable disturbances that act directly on the sensors.
Not all systems are described by this basic model. In general, w(k) and v(k) may be correlated, some measurements may be made so accurate that, for all practical purposes, they are "perfect" (i.e., no measurement noise is associated with them), and either w(k) or v(k), or both, may be nonzero mean or colored noise processes. How to handle these situations is described in Lesson 22 of [12].
When x(0) and {w(k), k = 0, 1, ...} are jointly Gaussian, then {x(k), k = 0, 1, ...} is a Gauss-Markov sequence. Note that if x(0) and w(k) are individually Gaussian and statistically independent, they will be jointly Gaussian. Consequently, the mean and covariance of the state vector completely characterize it. Let m_x(k) denote the mean of x(k). For our basic state-variable model, m_x(k) can be computed from the vector recursive equation

m_x(k+1) = Φ(k+1, k)m_x(k) + Ψ(k+1, k)u(k)    (15.22)

where k = 0, 1, ..., and m_x(0) initializes (15.22). Let P_x(k) denote the covariance matrix of x(k). For our basic state-variable model, P_x(k) can be computed from the matrix recursive equation

P_x(k+1) = Φ(k+1, k)P_x(k)Φ'(k+1, k) + Γ(k+1, k)Q(k)Γ'(k+1, k)    (15.23)

where k = 0, 1, ..., and P_x(0) initializes (15.23). Equations (15.22) and (15.23) are easily programmed for a digital computer.
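For instance, under the (hypothetical) assumption of a time-invariant model, a propagation loop for (15.22) and (15.23) might look like the following NumPy sketch; the measurement statistics given in the next paragraph follow the same pattern.

```python
import numpy as np

def propagate_state_stats(Phi, Gamma, Psi, Q, m_x0, P_x0, u):
    """Propagate the mean and covariance of x(k), Eqs. (15.22) and (15.23),
    for a time-invariant basic state-variable model; u[k] is the known input."""
    m_x, P_x = m_x0.copy(), P_x0.copy()
    means, covs = [m_x], [P_x]
    for uk in u:
        m_x = Phi @ m_x + Psi @ uk                        # Eq. (15.22)
        P_x = Phi @ P_x @ Phi.T + Gamma @ Q @ Gamma.T     # Eq. (15.23)
        means.append(m_x)
        covs.append(P_x)
    return means, covs
```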
For our basic state-variable model, when x(0), w(k), and v(k) are jointly Gaussian, then {z(k), k = 1, 2, ...} is Gaussian, and

m_z(k+1) = H(k+1)m_x(k+1)    (15.24)

and

P_z(k+1) = H(k+1)P_x(k+1)H'(k+1) + R(k+1)    (15.25)

where m_x(k+1) and P_x(k+1) are computed from (15.22) and (15.23), respectively.
For our basic state-variable model to be stationary, it must be time-invariant, and the probability density functions of w(k) and v(k) must be the same for all values of time. Because w(k) and v(k) are zero-mean and Gaussian, this means that Q(k) must equal the constant matrix Q and R(k) must equal the constant matrix R. Additionally, either x(0) = 0 or Φ(k, 0)x(0) ≈ 0 when k > k_0; in both cases x(k) will be in its steady-state regime, so stationarity is possible.
If the basic state-variable model is time-invariant and stationary, and if Φ is associated with an asymptotically stable system (i.e., one whose poles all lie within the unit circle), then [1] matrix P_x(k) reaches a limiting (steady-state) solution P̄_x, and P̄_x is the solution of the following steady-state version of (15.23): P̄_x = ΦP̄_xΦ' + ΓQΓ'. This equation is called a discrete-time Lyapunov equation.
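The steady-state covariance can be computed directly; scipy.linalg.solve_discrete_lyapunov solves exactly this equation. A small sketch (SciPy assumed; the numerical values are hypothetical, chosen only so that Φ is asymptotically stable):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Hypothetical stable 2-state model (both poles of Phi lie inside the unit circle).
Phi = np.array([[0.9, 0.1],
                [0.0, 0.8]])
Gamma = np.array([[1.0],
                  [0.5]])
Q = np.array([[0.04]])

# Solves P = Phi P Phi' + Gamma Q Gamma', the discrete-time Lyapunov equation.
P_bar = solve_discrete_lyapunov(Phi, Gamma @ Q @ Gamma.T)
```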