Mendel, J.M. "Estimation Theory and Algorithms: From Gauss to Wiener to Kalman." Digital Signal Processing Handbook. Ed. Vijay K. Madisetti and Douglas B. Williams. Boca Raton: CRC Press LLC, 1999.
15.9 State Estimation for the Basic State-Variable Model
    Prediction • Filtering (the Kalman Filter) • Smoothing
15.10 Digital Wiener Filtering
15.11 Linear Prediction in DSP, and Kalman Filtering
15.12 Iterated Least Squares
15.13 Extended Kalman Filter
Acknowledgment
References
Further Information
15.1 Introduction
Estimation is one of four modeling problems. The other three are representation (how something should be modeled), measurement (which physical quantities should be measured and how they should be measured), and validation (demonstrating confidence in the model). Estimation, which fits in between the problems of measurement and validation, deals with the determination of those physical quantities that cannot be measured from those that can be measured. We shall cover a wide range of estimation techniques including weighted least squares, best linear unbiased, maximum-likelihood, mean-squared, and maximum a posteriori. These techniques are for parameter or state estimation or a combination of the two, as applied to either linear or nonlinear models.
The discrete-time viewpoint is emphasized in this chapter because: (1) much real data is collected in a digitized manner, so it is in a form ready to be processed by discrete-time estimation algorithms; and (2) the mathematics associated with discrete-time estimation theory is simpler than with continuous-time estimation theory. We view (discrete-time) estimation theory as the extension of classical signal processing to the design of discrete-time (digital) filters that process uncertain data in an optimal manner. Estimation theory can, therefore, be viewed as a natural adjunct to digital signal processing theory. Mendel [12] is the primary reference for all the material in this chapter.
Estimation algorithms process data and, as such, must be implemented on a digital computer. Our computation philosophy is, whenever possible, leave it to the experts. Many of our chapter's algorithms can be used with MATLAB™ and appropriate toolboxes (MATLAB is a registered trademark of The MathWorks, Inc.). See [12] for specific connections between MATLAB™ and toolbox M-files and the algorithms of this chapter.
The main model that we shall direct our attention to is linear in the unknown parameters, namely

Z(k) = H(k)θ + V(k)    (15.1)
In this model, which we refer to as a "generic linear model," Z(k) = col(z(k), z(k − 1), ..., z(k − N + 1)), which is N × 1, is called the measurement vector; its elements are z(j) = h'(j)θ + v(j), where the prime denotes transposition. θ, which is n × 1, is called the parameter vector, and contains the unknown deterministic or random parameters that will be estimated using one or more of this chapter's techniques; H(k), which is N × n, is called the observation matrix; and V(k), which is N × 1, is called the measurement noise vector. By convention, the argument "k" of Z(k), H(k), and V(k) denotes the fact that the last measurement used to construct (15.1) is the kth.
Examples of problems that can be cast into the form of the generic linear model are: identifying the impulse response coefficients in the convolutional summation model for a linear time-invariant system from noisy output measurements; identifying the coefficients of a linear time-invariant finite-difference equation model for a dynamical system from noisy output measurements; function approximation; state estimation; estimating parameters of a nonlinear model using a linearized version of that model; deconvolution; and identifying the coefficients in a discretized Volterra series representation of a nonlinear system.
The following estimation notation is used throughout this chapter: θ̂(k) denotes an estimate of θ, and θ̃(k) denotes the error in estimation, i.e., θ̃(k) = θ − θ̂(k). The generic linear model is the starting point for the derivation of many classical parameter estimation techniques, and the estimation model for Z(k) is Ẑ(k) = H(k)θ̂(k). In the rest of this chapter we develop specific structures for θ̂(k). These structures are referred to as estimators. Estimates are obtained whenever data are processed by an estimator.
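To make the first of these examples concrete, the sketch below builds Z(k), H(k), and θ for impulse-response identification from noisy output measurements, so that the data obey Z(k) = H(k)θ + V(k) as in (15.1). It is only an illustration: NumPy is assumed, and all numerical values and variable names are hypothetical rather than taken from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical FIR system: theta holds the n unknown impulse-response coefficients.
n, N = 3, 50
theta_true = np.array([1.0, -0.5, 0.25])

u = rng.standard_normal(N + n)        # known input sequence
v = 0.1 * rng.standard_normal(N)      # measurement noise V(k)

# Row j of H(k) is h'(j) = [u(j), u(j-1), ..., u(j-n+1)], so that each element of
# Z(k) is z(j) = h'(j) theta + v(j), i.e., a noisy convolutional-summation output.
H = np.array([[u[j + n - 1 - i] for i in range(n)] for j in range(N)])
Z = H @ theta_true + v                # measurement vector Z(k), here N x 1
```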
15.2 Least-Squares Estimation

The weighted least-squares estimator (WLSE) of θ, θ̂_WLS(k), minimizes the objective function J[θ̂(k)] = Z̃'(k)W(k)Z̃(k), where Z̃(k) = Z(k) − H(k)θ̂(k) is the vector of measurement residuals and W(k) is a symmetric positive definite weighting matrix. This weighting matrix can be used to weight recent measurements more (or less) heavily than past measurements.
If W(k) = cI, so that all measurements are weighted the same, then weighted least-squares reduces to least squares, in which case we obtain θ̂_LS(k). Setting dJ[θ̂(k)]/dθ̂(k) = 0, we find that

θ̂_WLS(k) = [H'(k)W(k)H(k)]^{-1} H'(k)W(k)Z(k)    (15.2)

and, consequently,

θ̂_LS(k) = [H'(k)H(k)]^{-1} H'(k)Z(k)    (15.3)

Note, also, that J[θ̂_WLS(k)] = Z'(k)W(k)Z(k) − θ̂'_WLS(k)H'(k)W(k)H(k)θ̂_WLS(k).
Matrix H'(k)W(k)H(k) must be nonsingular for its inverse in (15.2) to exist. This is true if W(k) is positive definite, as assumed, and H(k) is of maximum rank. We know that θ̂_WLS(k) minimizes J[θ̂(k)] because d²J[θ̂(k)]/dθ̂²(k) = 2H'(k)W(k)H(k) > 0, since H'(k)W(k)H(k) is invertible. Estimator θ̂_WLS(k) processes the measurements Z(k) linearly; hence, it is referred to as a linear estimator. In practice, we do not compute θ̂_WLS(k) using (15.2), because computing the inverse of H'(k)W(k)H(k) is fraught with numerical difficulties. Instead, the so-called normal equations, [H'(k)W(k)H(k)]θ̂_WLS(k) = H'(k)W(k)Z(k), are solved using stable algorithms from numerical linear algebra (e.g., [3]); one approach is to convert the original least-squares problem into an equivalent, easy-to-solve problem using orthogonal transformations such as Householder or Givens transformations. Note, also, that (15.2) and (15.3) apply to the estimation of either deterministic or random parameters, because nowhere in the derivation of θ̂_WLS(k) did we have to assume that θ was or was not random. Finally, note that WLSEs may not be invariant under changes of scale. One way to circumvent this difficulty is to use normalized data.
Least-squares estimates can also be computed using the singular-value decomposition (SVD) of matrix H(k). This computation is valid for both the overdetermined (N > n) and underdetermined (N < n) situations, and for the situation when H(k) may or may not be of full rank. The SVD of an N × n matrix A is

A = UΣV'    (15.4)

where U (N × N) and V (n × n) are unitary matrices and Σ is an N × n matrix whose only nonzero entries are the leading diagonal elements σ1 ≥ σ2 ≥ ... ≥ σr > 0. The σi's are the singular values of A, and r is the rank of A. Let the SVD of H(k) be given by (15.4). Even if H(k) is not of maximum rank, θ̂_LS(k) can then be computed as θ̂_LS(k) = VΣ^+U'Z(k), where the pseudoinverse Σ^+ is obtained by transposing Σ and inverting its nonzero singular values. Similar formulas exist for computing θ̂_WLS(k).
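As a rough numerical companion to (15.2), (15.3), and the remarks above, the sketch below (NumPy assumed; not from the chapter) solves the normal equations with a stable linear solver instead of forming an explicit inverse, and uses np.linalg.lstsq, which is SVD based and therefore remains well defined when H(k) is rank deficient. With W = I the two routines should agree, which is one quick sanity check.

```python
import numpy as np

def wlse(H, Z, W):
    """Weighted least-squares estimate, Eq. (15.2), computed from the normal
    equations [H' W H] theta = H' W Z with a stable solve instead of an inverse."""
    A = H.T @ W @ H
    b = H.T @ W @ Z
    return np.linalg.solve(A, b)

def lse_svd(H, Z):
    """Least-squares estimate, Eq. (15.3), via an SVD-based solver; the
    pseudoinverse is used internally, so rank-deficient H is also handled."""
    theta, *_ = np.linalg.lstsq(H, Z, rcond=None)
    return theta
```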
Equations (15.2) and (15.3) are batch equations, because they process all of the measurements at one time. These formulas can be made recursive in time by using simple vector and matrix partitioning techniques. The information form of the recursive WLSE is:

θ̂_WLS(k+1) = θ̂_WLS(k) + K_w(k+1)[z(k+1) − h'(k+1)θ̂_WLS(k)]    (15.7)

K_w(k+1) = P(k+1)h(k+1)w(k+1)    (15.8)

P^{-1}(k+1) = P^{-1}(k) + h(k+1)w(k+1)h'(k+1)    (15.9)

Equations (15.8) and (15.9) require the inversion of the n × n matrix P. If n is large, then this will be a costly computation. Applying a matrix inversion lemma to (15.9), one obtains the following alternative covariance form of the recursive WLSE: Equation (15.7), and

K_w(k+1) = P(k)h(k+1)[h'(k+1)P(k)h(k+1) + 1/w(k+1)]^{-1}    (15.10)

P(k+1) = [I − K_w(k+1)h'(k+1)]P(k)    (15.11)

Both forms of the recursive WLSE are initialized by θ̂_WLS(n) and P(n) = [H'(n)W(n)H(n)]^{-1}, and are used for k = n, n+1, ..., N − 1.
Equation (15.7) can be expressed as

θ̂_WLS(k+1) = [I − K_w(k+1)h'(k+1)]θ̂_WLS(k) + K_w(k+1)z(k+1)    (15.12)

which demonstrates that the recursive WLSE is a time-varying digital filter that is excited by random inputs (i.e., the measurements), one whose plant matrix [I − K_w(k+1)h'(k+1)] may itself be random because K_w(k+1) and h(k+1) may be random, depending upon the specific application. The random natures of these matrices make the analysis of this filter exceedingly difficult.
Two recursions are present in the recursive WLSEs. The first is the vector recursion for θ̂_WLS given by (15.7). Clearly, θ̂_WLS(k+1) cannot be computed from this expression until measurement z(k+1) is available. The second is the matrix recursion for either P^{-1}, given by (15.9), or P, given by (15.11). Observe that values for these matrices can be precomputed before measurements are made. A digital computer implementation of (15.7)–(15.9) is P^{-1}(k+1) → P(k+1) → K_w(k+1) → θ̂_WLS(k+1), whereas for (15.7), (15.10), and (15.11), it is P(k) → K_w(k+1) → θ̂_WLS(k+1) → P(k+1).
Finally, the recursive WLSEs can even be used for k = 0, 1, ..., N − 1. Often z(0) = 0, or there is no measurement made at k = 0, so that we can set z(0) = 0. In this case we can set w(0) = 0, and the recursive WLSEs can be initialized by setting θ̂_WLS(0) = 0 and P(0) to a diagonal matrix of very large numbers. This is very commonly done in practice. Fast fixed-order recursive least-squares algorithms that are based on the Givens rotation [3] and can be implemented using systolic arrays are described in [5] and the references therein.
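The covariance form, Eqs. (15.7), (15.10), and (15.11), together with the large-P(0) initialization just described, can be sketched as follows (Python/NumPy; scalar weights w(k) are assumed and the variable names are illustrative). With all weights equal to one, the final estimate should agree with the batch θ̂_LS(k), apart from the small bias introduced by the finite P(0).

```python
import numpy as np

def recursive_wlse(h_rows, z, w, P0_scale=1e6):
    """Covariance form of the recursive WLSE, Eqs. (15.7), (15.10), (15.11).
    h_rows[k] is h(k+1) as a 1-D array, z[k] is z(k+1), w[k] is a scalar weight."""
    n = h_rows.shape[1]
    theta = np.zeros(n)                        # theta_WLS(0) = 0
    P = P0_scale * np.eye(n)                   # P(0): diagonal matrix of very large numbers
    for h, zk, wk in zip(h_rows, z, w):
        Ph = P @ h
        K = Ph / (h @ Ph + 1.0 / wk)           # gain K_w(k+1), Eq. (15.10)
        theta = theta + K * (zk - h @ theta)   # Eq. (15.7)
        P = P - np.outer(K, h) @ P             # Eq. (15.11): [I - K h'] P
    return theta, P
```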
15.3 Properties of Estimators
How do we know whether or not the results obtained from the WLSE, or for that matter any estimator, are good? To answer this question, we must make use of the fact that all estimators represent transformations of random data; hence, θ̂(k) is itself random, so that its properties must be studied from a statistical viewpoint. This fact, and its consequences, which seem so obvious to us today, are due to the eminent statistician R.A. Fisher.
It is common to distinguish between small-sample and large-sample properties of estimators. The term "sample" refers to the number of measurements used to obtain θ̂, i.e., the dimension of Z. The phrase "small sample" means any number of measurements (e.g., 1, 2, 100, 10^4, or even an infinite number), whereas the phrase "large sample" means "an infinite number of measurements." Large-sample properties are also referred to as asymptotic properties. If an estimator possesses a small-sample property, it also possesses the associated large-sample property; but the converse is not always true. Although large sample means an infinite number of measurements, estimators begin to enjoy large-sample properties for much fewer than an infinite number of measurements. How few usually depends on the dimension of θ, n, the memory of the estimators, and in general on the underlying, albeit unknown, probability density function.
A thorough study of θ̂ would mean determining its probability density function p(θ̂). Usually, it is too difficult to obtain p(θ̂) for most estimators (unless θ̂ is multivariate Gaussian); thus, it is customary to emphasize the first- and second-order statistics of θ̂ (or its associated error θ̃ = θ − θ̂), the mean and the covariance.
Small-sample properties of an estimator are unbiasedness and efficiency. An estimator is unbiased if its mean value tracks the unknown parameter at every value of time, i.e., the mean value of the estimation error is zero at every value of time. Dispersion about the mean is measured by error variance. Efficiency is related to how small the error variance will be. Associated with efficiency is the very famous Cramer-Rao inequality (Fisher information matrix, in the case of a vector of parameters), which places a lower bound on the error variance, a bound that does not depend on a particular estimator.
Large-sample properties of an estimator are asymptotic unbiasedness, consistency, asymptotic normality, and asymptotic efficiency. Asymptotic unbiasedness and efficiency are limiting forms of their small-sample counterparts, unbiasedness and efficiency. The importance of an estimator being asymptotically normal (Gaussian) is that its entire probabilistic description is then known, and it can be entirely characterized just by its asymptotic first- and second-order statistics. Consistency is a form of convergence of θ̂(k) to θ; it is synonymous with convergence in probability. One of the reasons for the importance of consistency in estimation theory is that any continuous function of a consistent estimator is itself a consistent estimator, i.e., "consistency carries over." It is also possible to examine other types of stochastic convergence for estimators, such as mean-squared convergence and convergence with probability 1. A general carry-over property does not exist for these two types of convergence; it must be established case-by-case (e.g., [11]).
Generally speaking, it is very difficult to establish small-sample or large-sample properties for least-squares estimators, except in the very special case when H(k) and V(k) are statistically independent. While this condition is satisfied in the application of identifying an impulse response, it is violated in the important application of identifying the coefficients in a finite-difference equation, as well as in many other important engineering applications. Many large-sample properties of LSEs are determined by establishing that the LSE is equivalent to another estimator for which it is known that the large-sample property holds true. We pursue this below.
Least-squares estimators require no assumptions about the statistical nature of the generic linear model. Consequently, the formula for the WLSE is easy to derive. The price paid for not making assumptions about the statistical nature of the generic linear model is great difficulty in establishing small- or large-sample properties of the resulting estimator.
15.4 Best Linear Unbiased Estimation
Our second estimator is both unbiased and efficient by design, and is a linear function of the measurements Z(k). It is called a best linear unbiased estimator (BLUE), θ̂_BLU(k). As in the derivation of the WLSE, we begin with our generic linear model; but now we make two assumptions about this model, namely: (1) H(k) must be deterministic, and (2) V(k) must be zero mean with positive definite known covariance matrix R(k). The derivation of the BLUE is more complicated than the derivation of the WLSE because of the design constraints; however, its performance analysis is much easier because we build good performance into its design.
We begin by assuming the following linear structure for θ̂_BLU(k): θ̂_BLU(k) = F(k)Z(k). Matrix F(k) is designed such that: (1) θ̂_BLU(k) is an unbiased estimator of θ, and (2) the error variance for each of the n parameters is minimized. In this way, θ̂_BLU(k) will be unbiased and efficient (within the class of linear estimators) by design. The resulting BLUE is:

θ̂_BLU(k) = [H'(k)R^{-1}(k)H(k)]^{-1} H'(k)R^{-1}(k)Z(k)    (15.13)
A very remarkable connection exists between the BLUE and WLSE, namely, the BLUE of θ is the special case of the WLSE of θ when W(k) = R^{-1}(k). Consequently, all results obtained in the section above for θ̂_WLS(k) can be applied to θ̂_BLU(k) by setting W(k) = R^{-1}(k). Matrix R^{-1}(k) weights the contributions of precise measurements heavily and deemphasizes the contributions of imprecise measurements. The best linear unbiased estimation design technique has led to a weighting matrix that is quite sensible.
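Since the BLUE is the WLSE with W(k) = R^{-1}(k), it can be computed with the same machinery. Below is a minimal sketch (NumPy; illustrative only) that also returns P(k) = [H'(k)R^{-1}(k)H(k)]^{-1}, the error-covariance matrix discussed later in this section for the recursive BLUE.

```python
import numpy as np

def blue(H, Z, R):
    """BLUE of theta, Eq. (15.13): the WLSE with W(k) = R^{-1}(k).
    Also returns P = [H' R^{-1} H]^{-1}, the covariance of theta - theta_BLU."""
    Rinv = np.linalg.inv(R)        # adequate for illustration; factor R in practice
    P = np.linalg.inv(H.T @ Rinv @ H)
    theta = P @ (H.T @ Rinv @ Z)
    return theta, P
```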
If H(k) is deterministic and R(k) = σ_ν²I, then θ̂_BLU(k) = θ̂_LS(k). This result, known as the Gauss-Markov theorem, is important because we have connected two seemingly different estimators, one of which, θ̂_BLU(k), has the properties of unbiasedness and minimum variance by design; hence, in this case θ̂_LS(k) inherits these properties.
In a recursive WLSE, matrix P(k) has no special meaning. In a recursive BLUE [which is obtained by substituting W(k) = R^{-1}(k) into (15.7)–(15.9), or (15.7), (15.10), and (15.11)], matrix P(k) is the covariance matrix for the error between θ and θ̂_BLU(k), i.e., P(k) = [H'(k)R^{-1}(k)H(k)]^{-1} = cov[θ̃_BLU(k)]. Hence, every time P(k) is calculated in the recursive BLUE, we obtain a quantitative measure of how well we are estimating θ.
Recall that we stated that WLSEs may change in numerical value under changes in scale. BLUEs are invariant under changes in scale. This is accomplished automatically by setting W(k) = R^{-1}(k) in the WLSE.
The fact that H(k) must be deterministic severely limits the applicability of BLUEs in engineering applications.
15.5 Maximum-Likelihood Estimation
Probability is associated with a forward experiment in which the probability model, p(Z(k)|θ), is specified, including values for the parameters, θ, in that model (e.g., mean and variance in a Gaussian density function), and data (i.e., realizations) are generated using this model. Likelihood, l(θ|Z(k)), is proportional to probability. In likelihood, the data are given, as well as the nature of the probability model; but the parameters of the probability model are not specified. They must be determined from the given data. Likelihood is, therefore, associated with an inverse experiment.
The maximum-likelihood method is based on the relatively simple idea that different (statistical) populations generate different samples and that any given sample (i.e., set of data) is more likely to have come from some populations than from others.
In order to determine the maximum-likelihood estimate (MLE) of deterministic θ, θ̂_ML, we need to determine a formula for the likelihood function and then maximize that function. Because likelihood is proportional to probability, we need to know the entire joint probability density function of the measurements in order to determine a formula for the likelihood function. This, of course, is much more information about Z(k) than was required in the derivation of the BLUE. In fact, it is the most information that we can ever expect to know about the measurements. The price we pay for knowing so much information about Z(k) is complexity in maximizing the likelihood function. Generally, mathematical programming must be used in order to determine θ̂_ML.
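As an illustration of that last point, the sketch below numerically maximizes a Gaussian log-likelihood for the generic linear model by minimizing its negative with scipy.optimize.minimize (SciPy assumed; the model, names, and starting point are hypothetical). For this particular model the maximizer also has the closed form discussed below, so the numerical route is shown only to make the idea concrete.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(theta, Z, H, R):
    """Negative Gaussian log-likelihood (constants dropped) for Z = H theta + V,
    with V zero-mean Gaussian noise of covariance R."""
    r = Z - H @ theta
    return 0.5 * r @ np.linalg.solve(R, r)

def ml_estimate(Z, H, R, theta0):
    """Maximize the likelihood by minimizing its negative with a quasi-Newton method."""
    res = minimize(neg_log_likelihood, theta0, args=(Z, H, R), method="BFGS")
    return res.x
```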
Maximum-likelihood estimates are very popular and widely used because they enjoy very good large-sample properties. They are consistent, asymptotically Gaussian with mean θ and covariance matrix (1/N)J^{-1}, in which J is the Fisher information matrix, and are asymptotically efficient. Functions of maximum-likelihood estimates are themselves maximum-likelihood estimates, i.e., if g(θ) is a vector function mapping θ into an interval in r-dimensional Euclidean space, then g(θ̂_ML) is a MLE of g(θ). This "invariance" property is usually not enjoyed by WLSEs or BLUEs.
In one special case it is very easy to compute θ̂_ML, i.e., for our generic linear model in which H(k) is deterministic and V(k) is Gaussian. In this case θ̂_ML = θ̂_BLU. These estimators are: unbiased, because θ̂_BLU is unbiased; efficient (within the class of linear estimators), because θ̂_BLU is efficient; consistent, because θ̂_ML is consistent; and Gaussian, because they depend linearly on Z(k), which is Gaussian. If, in addition, R(k) = σ_ν²I, then θ̂_ML(k) = θ̂_BLU(k) = θ̂_LS(k), and these estimators are unbiased, efficient (within the class of linear estimators), consistent, and Gaussian.
The method of maximum-likelihood is limited to deterministic parameters. In the case of random parameters, we can still use the WLSE or the BLUE, or, if additional information is available, we can use either a mean-squared or maximum a posteriori estimator, as described below. The former does not use statistical information about the random parameters, whereas the latter does.
15.6 Mean-Squared Estimation of Random Parameters
Given measurements z(1), z(2), ..., z(k), the mean-squared estimator (MSE) of random θ, θ̂_MS(k) = φ[z(i), i = 1, 2, ..., k], minimizes the mean-squared error J[θ̃_MS(k)] = E{θ̃'_MS(k)θ̃_MS(k)} [where θ̃_MS(k) = θ − θ̂_MS(k)]. The function φ[z(i), i = 1, 2, ..., k] may be nonlinear or linear. Its exact structure is determined by minimizing J[θ̃_MS(k)].
The solution to this mean-squared estimation problem, which is known as the fundamental theorem of estimation theory, is:

θ̂_MS(k) = E{θ|Z(k)}    (15.14)

As it stands, (15.14) is not terribly useful for computing θ̂_MS(k). In general, we must first compute p[θ|Z(k)] and then perform the requisite number of integrations of θp[θ|Z(k)] to obtain θ̂_MS(k). It is useful to separate this computation into two major cases: (1) θ and Z(k) are jointly Gaussian (the Gaussian case), and (2) θ and Z(k) are not jointly Gaussian (the non-Gaussian case).
When θ and Z(k) are jointly Gaussian, the estimator that minimizes the mean-squared error is

θ̂_MS(k) = m_θ + P_θz(k)P_z^{-1}(k)[Z(k) − m_z(k)]    (15.15)

where m_θ is the mean of θ, m_z(k) is the mean of Z(k), P_z(k) is the covariance matrix of Z(k), and P_θz(k) is the cross-covariance between θ and Z(k). Of course, to compute θ̂_MS(k) using (15.15), we must somehow know all of these statistics, and we must be sure that θ and Z(k) are jointly Gaussian.
For the generic linear model, Z(k) = H(k)θ + V(k), in which H(k) is deterministic, V(k) is Gaussian noise with known invertible covariance matrix R(k), θ is Gaussian with mean m_θ and covariance matrix P_θ, and θ and V(k) are statistically independent, θ and Z(k) are jointly Gaussian, and (15.15) becomes

θ̂_MS(k) = m_θ + P_MS(k)H'(k)R^{-1}(k)[Z(k) − H(k)m_θ]    (15.18)

where P_MS(k) = cov[θ̃_MS(k)] = [P_θ^{-1} + H'(k)R^{-1}(k)H(k)]^{-1} is the covariance of the estimation error.

Suppose θ and Z(k) are not jointly Gaussian and that we know m_θ, m_z(k), P_z(k), and P_θz(k). In this case, the estimator that is constrained to be an affine transformation of Z(k) and that minimizes the mean-squared error is also given by (15.15).
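For the generic linear Gaussian model, (15.18) is simple to program once m_θ, P_θ, and R(k) are known. A minimal sketch (NumPy; all names illustrative) that returns both θ̂_MS(k) and the error covariance P_MS(k):

```python
import numpy as np

def ms_estimate(Z, H, R, m_theta, P_theta):
    """Mean-squared (conditional-mean) estimate for the linear Gaussian model,
    Eq. (15.18), with P_MS = [P_theta^{-1} + H' R^{-1} H]^{-1}."""
    Rinv = np.linalg.inv(R)
    P_ms = np.linalg.inv(np.linalg.inv(P_theta) + H.T @ Rinv @ H)
    theta_ms = m_theta + P_ms @ H.T @ Rinv @ (Z - H @ m_theta)
    return theta_ms, P_ms
```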
We now know the answer to the following important question: When is the linear (affine) mean-squared estimator the same as the mean-squared estimator? The answer is when θ and Z(k) are jointly Gaussian. If θ and Z(k) are not jointly Gaussian, then θ̂_MS(k) = E{θ|Z(k)}, which, in general, is a nonlinear function of measurements Z(k), i.e., it is a nonlinear estimator.
Associated with mean-squared estimation theory is the orthogonality principle: suppose f[Z(k)] is any function of the data Z(k); then the error in the mean-squared estimator is orthogonal to f[Z(k)] in the sense that E{[θ − θ̂_MS(k)]f'[Z(k)]} = 0. A frequently encountered special case of this occurs when f[Z(k)] = θ̂_MS(k), in which case E{θ̃_MS(k)θ̂'_MS(k)} = 0.
When θ and Z(k) are jointly Gaussian, θ̂_MS(k) in (15.15) has the following properties: (1) it is unbiased; (2) each of its components has the smallest error variance; (3) it is a "linear" (affine) estimator; (4) it is unique; and (5) both θ̂_MS(k) and θ̃_MS(k) are multivariate Gaussian, which means that these quantities are completely characterized by their first- and second-order statistics. Tremendous simplifications occur when θ and Z(k) are jointly Gaussian!
Many of the results presented in this section are applicable to objective functions other than the mean-squared objective function. See the supplementary material at the end of Lesson 13 in [12] for discussions on a wide range of objective functions that lead to E{θ|Z(k)} as the optimal estimator of θ, as well as discussions on a full-blown nonlinear estimator of θ.
There is a connection between the BLUE and the MSE. The connection requires a slightly different BLUE, one that incorporates the a priori statistical information about random θ. To do this, we treat m_θ as an additional measurement that is augmented to Z(k). The additional measurement equation is obtained by adding and subtracting θ in the identity m_θ = m_θ, i.e., m_θ = θ + (m_θ − θ). Quantity (m_θ − θ) is now treated as zero-mean measurement noise with covariance P_θ. The augmented model therefore stacks the generic linear model on top of this extra measurement equation: its observation matrix is H(k) stacked on top of the n × n identity matrix, and its measurement noise is V(k) stacked on top of (m_θ − θ). Let θ̂^a_BLU(k) denote the BLUE of θ for this augmented model. Then it is always true that θ̂_MS(k) = θ̂^a_BLU(k). Note that the weighted least-squares objective function that is associated with θ̂^a_BLU(k) is J^a[θ̂^a(k)] = [m_θ − θ̂^a(k)]'P_θ^{-1}[m_θ − θ̂^a(k)] + Z̃'(k)R^{-1}(k)Z̃(k).
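The augmentation just described is easy to mimic numerically: append m_θ to Z(k) as an extra "measurement" of θ whose noise covariance is P_θ, and apply the BLUE formula to the stacked model. A sketch under those assumptions (NumPy; illustrative names), whose output should agree with θ̂_MS(k) from (15.18):

```python
import numpy as np

def augmented_blue(Z, H, R, m_theta, P_theta):
    """BLUE for the augmented model: m_theta is appended to Z(k) as a measurement
    of theta with 'noise' covariance P_theta; the augmented noise covariance is
    block diagonal because (m_theta - theta) is independent of V(k)."""
    N, n = H.shape
    Za = np.concatenate([Z, m_theta])
    Ha = np.vstack([H, np.eye(n)])
    Ra = np.block([[R, np.zeros((N, n))],
                   [np.zeros((n, N)), P_theta]])
    Rinv = np.linalg.inv(Ra)
    P = np.linalg.inv(Ha.T @ Rinv @ Ha)
    return P @ (Ha.T @ Rinv @ Za)     # equals theta_MS(k) for this model
```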
15.7 Maximum A Posteriori Estimation of Random Parameters
Maximum a posteriori (MAP) estimation is also known as Bayesian estimation. Recall Bayes's rule: p(θ|Z(k)) = p(Z(k)|θ)p(θ)/p(Z(k)), in which density function p(θ|Z(k)) is known as the a posteriori (or posterior) conditional density function, and p(θ) is the prior density function for θ. Observe that p(θ|Z(k)) is related to likelihood function l{θ|Z(k)}, because l{θ|Z(k)} ∝ p(Z(k)|θ). Additionally, because p(Z(k)) does not depend on θ, p(θ|Z(k)) ∝ p(Z(k)|θ)p(θ). In MAP estimation, values of θ are found that maximize p(Z(k)|θ)p(θ). Obtaining a MAP estimate involves specifying both p(Z(k)|θ) and p(θ) and finding the value of θ that maximizes p(θ|Z(k)). It is the knowledge of the a priori probability model for θ, p(θ), that distinguishes the problem formulation for MAP estimation from MS estimation.
If θ1, θ2, ..., θn are uniformly distributed, then p(θ|Z(k)) ∝ p(Z(k)|θ), and the MAP estimator of θ equals the ML estimator of θ. Generally, MAP estimates are quite different from ML estimates. For example, the invariance property of MLEs usually does not carry over to MAP estimates. One reason for this can be seen from the formula p(θ|Z(k)) ∝ p(Z(k)|θ)p(θ). Suppose, for example, that φ = g(θ) and we want to determine φ̂_MAP by first computing θ̂_MAP. Because p(θ) depends on the Jacobian matrix of g^{-1}(φ), φ̂_MAP ≠ g(θ̂_MAP). Usually θ̂_MAP and θ̂_ML(k) are asymptotically identical to one another, since in the large-sample case the knowledge of the observations tends to swamp the knowledge of the prior distribution [10].
Generally speaking, optimization must be used to compute θ̂_MAP(k). In the special but important case when Z(k) and θ are jointly Gaussian, θ̂_MAP(k) = θ̂_MS(k). This result is true regardless of the nature of the model relating θ to Z(k). Of course, in order to use it, we must first establish that Z(k) and θ are jointly Gaussian. Except for the generic linear model, this is very difficult to do.

When H(k) is deterministic, V(k) is white Gaussian noise with known covariance matrix R(k), and θ is multivariate Gaussian with known mean m_θ and covariance P_θ, then θ̂_MAP(k) = θ̂^a_BLU(k); hence, for the generic linear Gaussian model, MS, MAP, and BLUE estimates of θ are all the same, i.e., θ̂_MS(k) = θ̂^a_BLU(k) = θ̂_MAP(k).
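For the generic linear Gaussian model the MAP estimate can also be obtained by direct numerical maximization of log p(Z(k)|θ) + log p(θ), which is one way to check the equalities above. A sketch (SciPy assumed; illustrative only):

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_posterior(theta, Z, H, R, m_theta, P_theta):
    """-log p(Z|theta) - log p(theta), up to constants, for the linear Gaussian model."""
    r = Z - H @ theta
    d = theta - m_theta
    return 0.5 * (r @ np.linalg.solve(R, r) + d @ np.linalg.solve(P_theta, d))

def map_estimate(Z, H, R, m_theta, P_theta):
    """MAP estimate by numerical optimization, started from the prior mean m_theta."""
    res = minimize(neg_log_posterior, m_theta,
                   args=(Z, H, R, m_theta, P_theta), method="BFGS")
    return res.x
```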
15.8 The Basic State-Variable Model
In the rest of this chapter we shall describe a variety of mean-squared state estimators for a linear, (possibly) time-varying, discrete-time, dynamical system, which we refer to as the basic state-variable model. This system is characterized by the n × 1 state vector x(k) and the m × 1 measurement vector z(k), and is:

x(k+1) = Φ(k+1, k)x(k) + Γ(k+1, k)w(k) + Ψ(k+1, k)u(k)    (15.20)
z(k+1) = H(k+1)x(k+1) + v(k+1)    (15.21)

where k = 0, 1, .... In this model w(k) and v(k) are p × 1 and m × 1 mutually uncorrelated (possibly nonstationary) jointly Gaussian white noise sequences; i.e., E{w(i)w'(j)} = Q(i)δ_ij, E{v(i)v'(j)} = R(i)δ_ij, and E{w(i)v'(j)} = S = 0, for all i and j. Covariance matrix Q(i) is positive semidefinite and R(i) is positive definite [so that R^{-1}(i) exists]. Additionally, u(k) is an l × 1 vector of known system inputs, and initial state vector x(0) is multivariate Gaussian, with mean m_x(0) and covariance P_x(0), and x(0) is not correlated with w(k) and v(k). The dimensions of matrices Φ, Γ, Ψ, H, Q, and R are n × n, n × p, n × l, m × n, p × p, and m × m, respectively. The double arguments in matrices Φ, Γ, and Ψ may not always be necessary, in which case we replace (k+1, k) by k.
Disturbance w(k) is often used to model disturbance forces acting on the system, errors in modeling the system, or errors due to actuators in the translation of the known input, u(k), into physical signals. Vector v(k) is often used to model errors in measurements made by sensing instruments, or unavoidable disturbances that act directly on the sensors.
Not all systems are described by this basic model. In general, w(k) and v(k) may be correlated, some measurements may be made so accurate that, for all practical purposes, they are "perfect" (i.e., no measurement noise is associated with them), and either w(k) or v(k), or both, may be nonzero mean or colored noise processes. How to handle these situations is described in Lesson 22 of [12].
When x(0) and {w(k), k = 0, 1, ...} are jointly Gaussian, then {x(k), k = 0, 1, ...} is a Gauss-Markov sequence. Note that if x(0) and w(k) are individually Gaussian and statistically independent, they will be jointly Gaussian. Consequently, the mean and covariance of the state vector completely characterize it. Let m_x(k) denote the mean of x(k). For our basic state-variable model, m_x(k) can be computed from the vector recursive equation

m_x(k+1) = Φ(k+1, k)m_x(k) + Ψ(k+1, k)u(k)    (15.22)

where k = 0, 1, ..., and m_x(0) initializes (15.22). Let P_x(k) denote the covariance matrix of x(k). For our basic state-variable model, P_x(k) can be computed from the matrix recursive equation

P_x(k+1) = Φ(k+1, k)P_x(k)Φ'(k+1, k) + Γ(k+1, k)Q(k)Γ'(k+1, k)    (15.23)

where k = 0, 1, ..., and P_x(0) initializes (15.23). Equations (15.22) and (15.23) are easily programmed for a digital computer.
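For instance, under the (hypothetical) assumption of a time-invariant model, a propagation loop for (15.22) and (15.23) might look like the following NumPy sketch; the measurement statistics given in the next paragraph follow the same pattern.

```python
import numpy as np

def propagate_state_stats(Phi, Gamma, Psi, Q, m_x0, P_x0, u):
    """Propagate the mean and covariance of x(k), Eqs. (15.22) and (15.23),
    for a time-invariant basic state-variable model; u[k] is the known input."""
    m_x, P_x = m_x0.copy(), P_x0.copy()
    means, covs = [m_x], [P_x]
    for uk in u:
        m_x = Phi @ m_x + Psi @ uk                        # Eq. (15.22)
        P_x = Phi @ P_x @ Phi.T + Gamma @ Q @ Gamma.T     # Eq. (15.23)
        means.append(m_x)
        covs.append(P_x)
    return means, covs
```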
For our basic state-variable model, when x(0), w(k), and v(k) are jointly Gaussian, then {z(k), k = 1, 2, ...} is Gaussian, and

m_z(k+1) = H(k+1)m_x(k+1)    (15.24)

and

P_z(k+1) = H(k+1)P_x(k+1)H'(k+1) + R(k+1)    (15.25)

where m_x(k+1) and P_x(k+1) are computed from (15.22) and (15.23), respectively.
For our basic state-variable model to be stationary, it must be time-invariant, and the probability density functions of w(k) and v(k) must be the same for all values of time. Because w(k) and v(k) are zero-mean and Gaussian, this means that Q(k) must equal the constant matrix Q and R(k) must equal the constant matrix R. Additionally, either x(0) = 0 or Φ(k, 0)x(0) ≈ 0 when k > k_0; in both cases x(k) will be in its steady-state regime, so stationarity is possible.
If the basic state-variable model is time-invariant and stationary, and if Φ is associated with an asymptotically stable system (i.e., one whose poles all lie within the unit circle), then [1] matrix P_x(k) reaches a limiting (steady-state) solution P̄_x, and P̄_x is the solution of the following steady-state version of (15.23): P̄_x = ΦP̄_xΦ' + ΓQΓ'. This equation is called a discrete-time Lyapunov equation.
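The steady-state covariance can be computed directly; scipy.linalg.solve_discrete_lyapunov solves exactly this equation. A small sketch (SciPy assumed; the numerical values are hypothetical, chosen only so that Φ is asymptotically stable):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Hypothetical stable 2-state model (both poles of Phi lie inside the unit circle).
Phi = np.array([[0.9, 0.1],
                [0.0, 0.8]])
Gamma = np.array([[1.0],
                  [0.5]])
Q = np.array([[0.04]])

# Solves P = Phi P Phi' + Gamma Q Gamma', the discrete-time Lyapunov equation.
P_bar = solve_discrete_lyapunov(Phi, Gamma @ Q @ Gamma.T)
```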