To be more specific, we consider the problem of learning both the hidden states x_k and parameters w of a discrete-time nonlinear dynamical system,
    x_{k+1} = F(x_k, u_k, w) + v_k,        (5.1)
    y_k = H(x_k, w) + n_k,                 (5.2)
where both the system states x_k and the set of model parameters w for the dynamical system must be simultaneously estimated from only the observed noisy signal y_k. The process noise v_k drives the dynamical system, observation noise is given by n_k, and u_k corresponds to observed exogenous inputs. The model structure, F(·) and H(·), may represent multilayer neural networks, in which case w are the weights.
The problem of dual estimation can be motivated either from the need for a model to estimate the signal or (in other applications) from the need for good signal estimates to estimate the model. In general, applications can be divided into the tasks of modeling, estimation, and prediction. In estimation, all noisy data up to the current time are used to approximate the current value of the clean state. Prediction is concerned with using all available data to approximate a future value of the clean state. Modeling (sometimes referred to as identification) is the process of approximating the underlying dynamics that generated the states, again given only the noisy observations. Specific applications may include noise reduction (e.g., speech or image enhancement), or prediction of financial and economic time series. Alternatively, the model may correspond to the explicit equations derived from first principles of a robotic or vehicle system. In this case, w corresponds to a set of unknown parameters. Applications include adaptive control, where parameters are used in the design process and the estimated states are used for feedback.
Heuristically, dual estimation methods work by alternating between using the model to estimate the signal, and using the signal to estimate the model. This process may be either iterative or sequential. Iterative schemes work by repeatedly estimating the signal using the current model and all available data, and then estimating the model using the estimates and all the data (see Fig. 5.1a). Iterative schemes are necessarily restricted to off-line applications, where a batch of data has been previously collected for processing. In contrast, sequential approaches use each individual measurement as soon as it becomes available to update both the signal and model estimates. This characteristic makes these algorithms useful in either on-line or off-line applications (see Fig. 5.1b).
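Purely as an illustration of the two control flows in Figure 5.1 (the update routines here are placeholders passed in by the caller, not functions defined in this chapter), the distinction can be sketched as:

```python
def iterative_dual_estimation(y, w0, estimate_signal, estimate_model, n_iters=10):
    """Batch scheme of Fig. 5.1a: repeated full passes over all collected data.

    estimate_signal(y, w) and estimate_model(y, x_hat) are placeholders
    supplied by the caller (e.g., a smoother and a batch training routine).
    """
    w = w0
    x_hat = None
    for _ in range(n_iters):
        x_hat = estimate_signal(y, w)    # estimate the signal with the current model
        w = estimate_model(y, x_hat)     # re-estimate the model from those estimates
    return x_hat, w


def sequential_dual_estimation(y, x0, w0, update_signal, update_model):
    """Sequential scheme of Fig. 5.1b: each new measurement updates both estimates."""
    x_hat, w = x0, w0
    x_path = []
    for y_k in y:
        x_hat = update_signal(x_hat, y_k, w)   # e.g., one state-filter step
        w = update_model(w, x_hat, y_k)        # e.g., one weight-filter step
        x_path.append(x_hat)
    return x_path, w
```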
The vast majority of work on dual estimation has been for linear models. In fact, one of the first applications of the EKF combines both the state vector x_k and unknown parameters w in a joint bilinear state-space representation. An EKF is then applied to the resulting nonlinear estimation problem [1, 2]; we refer to this approach as the joint extended Kalman filter. Additional improvements and analysis of this approach are provided in [3, 4]. An alternative approach, proposed in [5], uses two separate Kalman filters: one for signal estimation, and another for model estimation. The signal filter uses the current estimate of w, and the weight filter uses the signal estimates x̂_k to minimize a prediction error cost. In [6], this dual Kalman approach is placed in a general family of recursive prediction error algorithms. Apart from these sequential approaches, some iterative methods developed for linear models include maximum-likelihood approaches [7-9] and expectation-maximization (EM) algorithms [10-13]. These algorithms are suitable only for off-line applications, although sequential EM methods have been suggested.
Fewer papers have appeared in the literature that are explicitly concerned with dual estimation for nonlinear models. One algorithm (proposed in [14]) alternates between applying a robust form of the
Figure 5.1 Two approaches to the dual estimation problem. (a) Iterative approaches use large blocks of data repeatedly. (b) Sequential approaches are designed to pass over the data one point at a time.
EKF to estimate the time series and using these estimates to train a neural network via gradient descent. A joint EKF is used in [15] to model partially unknown dynamics in a model-reference adaptive control framework. Furthermore, iterative EM approaches to the dual estimation problem have been investigated for radial basis function networks [16] and other nonlinear models [17]; see also Chapter 6. Errors-in-variables (EIV) models appear in the nonlinear statistical regression literature [18], and are used for regressing on variables related by a nonlinear function, but measured with some error. However, errors-in-variables is an iterative approach involving batch computation; it tends not to be practical for dynamical systems because the computational requirements increase in proportion to N^2, where N is the length of the data. A heuristic method known as clearning minimizes a simplified approximation to the EIV cost function. While it allows for sequential estimation, the simplification can lead to severely biased results [19]. The dual EKF [19] is a nonlinear extension of the linear dual Kalman approach of [5] and of the recursive prediction error algorithm of [6]. Application of the algorithm to speech enhancement appears in [20], while extensions to other cost functions have been developed in [21] and [22]. The crucial, but often overlooked, issue of sequential variance estimation is also addressed in [22].
Overview. The goal of this chapter is to present a unified probabilistic and algorithmic framework for nonlinear dual estimation methods. In the next section, we start with the basic dual EKF prediction error method. This approach is the most intuitive, and involves simply running two EKFs in parallel. The section also provides a quick review of the EKF for both state and weight estimation, and introduces some of the complications in coupling the two. An example in noisy time-series prediction is also given. In Section 5.3, we develop a general probabilistic framework for dual estimation. This allows us to relate the various methods that have been presented in the literature, and also provides a general algorithmic approach leading to a number of different dual EKF algorithms. Results on additional example data sets are presented in Section 5.5.
5.2 DUAL EKF–PREDICTION ERROR
In this section, we present the basic dual EKF prediction error algorithm. For completeness, we start with a quick review of the EKF for state estimation, followed by a review of EKF weight estimation (see Chapters 1 and 2 for more details). We then discuss coupling the state and weight filters to form the dual EKF algorithm.
5.2.1 EKF–State Estimation

Another interpretation of Kalman filtering is that of an optimization algorithm that recursively determines the state x_k in order to minimize a cost function. It can be shown that this cost function consists of weighted prediction error and estimation error components. This interpretation of the state filter is used again in Section 5.3.3.
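A representative form of such a cost, under the Gaussian noise assumptions used throughout the chapter (given here as an illustrative sketch rather than as the chapter's exact expression), is

    J(x_k) = [y_k - H(x_k, w)]^T R_n^{-1} [y_k - H(x_k, w)] + (x_k - x̂_k^-)^T (P_k^-)^{-1} (x_k - x̂_k^-),

where x̂_k^- and P_k^- are the predicted state and its error covariance, and R_n is the measurement-noise covariance. The Kalman measurement update returns the value of x_k minimizing this quadratic; the EKF minimizes its linearized approximation.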
5.2.2 EKF–Weight Estimation
As proposed initially in [30], and further developed in [31] and [32], the EKF can also be used for estimating the parameters of nonlinear models (i.e., training neural networks) from clean data. Consider the general problem of learning a mapping using a parameterized nonlinear function G(x_k, w). Typically, a training set is provided with sample pairs consisting of known input and desired output, {x_k, d_k}. The error in the model is defined as e_k = d_k - G(x_k, w), and the goal of learning involves solving for the parameters w in order to minimize the expected squared error. The EKF may be used to estimate the parameters by writing a new state-space representation

    w_{k+1} = w_k + r_k,              (5.11)
    d_k = G(x_k, w_k) + e_k,          (5.12)

where the parameters w_k correspond to a stationary process driven by process noise r_k, and the desired output d_k
corresponds to a nonlinear observation on w_k. The EKF can then be applied directly, with the equations given in Table 5.2. In the linear case, the relationship between the Kalman filter (KF) and the popular recursive least-squares (RLS) algorithm is given in [33] and [34]. In the nonlinear case, EKF training corresponds to a modified Newton optimization method [22].
As an optimization approach, the EKF minimizes the prediction error cost

    J(w) = Σ_{k=1}^N [d_k - G(x_k, w)]^T (R^e)^{-1} [d_k - G(x_k, w)].    (5.20)
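As a concrete illustration of the recursion, the following minimal NumPy sketch runs the weight filter on clean input/output pairs. The quadratic model G, its Jacobian, and all noise settings are illustrative assumptions, not the chapter's experimental setup.

```python
import numpy as np

def G(x, w):                       # parameterized nonlinear model: w0*x + w1*x^2
    return w[0] * x + w[1] * x ** 2

def G_jacobian(x, w):              # dG/dw, evaluated at the current weight estimate
    return np.array([[x, x ** 2]])

def ekf_weight_step(w, P, x_k, d_k, R_r, R_e):
    """One weight-filter step: random-walk time update + EKF measurement update."""
    P_minus = P + R_r                                   # time update of the covariance
    C = G_jacobian(x_k, w)                              # linearization, shape (1, n_w)
    K = P_minus @ C.T @ np.linalg.inv(C @ P_minus @ C.T + R_e)
    w_new = w + (K @ np.atleast_1d(d_k - G(x_k, w))).ravel()
    P_new = (np.eye(len(w)) - K @ C) @ P_minus
    return w_new, P_new

# Example usage on synthetic clean training pairs {x_k, d_k}
rng = np.random.default_rng(0)
w_true = np.array([0.5, -0.2])
w_est, P = np.zeros(2), np.eye(2)
R_r, R_e = 1e-4 * np.eye(2), np.array([[0.01]])
for _ in range(500):
    x_k = rng.uniform(-1.0, 1.0)
    d_k = G(x_k, w_true) + 0.1 * rng.standard_normal()
    w_est, P = ekf_weight_step(w_est, P, x_k, d_k, R_r, R_e)
```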
Trang 8Set Rrk ¼ ðl11ÞPwk, where l 2 ð0; 1 is often referred to as the
‘‘forgetting factor.’’ This provides for an approximate exponentiallydecaying weighting on past data and is described more fully in [22].Set Rr
k ¼ ð1 aÞRr
k1þaKw
k½dk Gðxk; ^wwÞ½dkGðxk; ^wwÞTðKwkÞT,which is a Robbins–Monro stochastic approximation scheme forestimating the innovations [6] The method assumes that the covari-ance of the Kalman update model is consistent with the actual updatemodel
Typically, R^r_k is also constrained to be a diagonal matrix, which implies an independence assumption on the parameters.

Study of the various trade-offs between these different approaches is still an area of open research. For the experiments performed in this chapter, the forgetting-factor approach is used.
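The two data-dependent choices just described might be coded as follows (a hedged NumPy sketch; variable names are assumptions, and the diagonal constraint mentioned above is applied only in the Robbins–Monro helper):

```python
import numpy as np

def Rr_forgetting(P_w, lam=0.999):
    """Forgetting-factor choice: R^r_k = (1/lam - 1) P^w_k, with lam in (0, 1]."""
    return (1.0 / lam - 1.0) * P_w

def Rr_robbins_monro(Rr_prev, K_w, innovation, alpha=0.01):
    """Robbins-Monro choice:
    R^r_k = (1 - alpha) R^r_{k-1} + alpha K^w_k e_k e_k^T (K^w_k)^T,
    with e_k = d_k - G(x_k, w_hat); constrained to a diagonal matrix here."""
    e = np.atleast_2d(innovation)                 # row vector of innovations
    Rr = (1.0 - alpha) * Rr_prev + alpha * (K_w @ e.T @ e @ K_w.T)
    return np.diag(np.diag(Rr))                   # diagonal (independence) constraint
```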
Returning to the dynamic system of Eq. (5.1), the EKF weight filter can be used to estimate the model parameters for either F or H. To learn the state dynamics, we simply make the substitutions G → F and d_k → x_{k+1}. To learn the measurement function, we make the substitutions G → H and d_k → y_k. Note that for both cases, it is assumed that the noise-free state x_k is available for training.
5.2.3 Dual Estimation
When the clean state is not available, a dual estimation approach is required. In this section, we introduce the basic dual EKF algorithm, which combines the Kalman state and weight filters. Recall that the task is to estimate both the state and the model from only noisy observations. Essentially, two EKFs are run concurrently. At every time step, an EKF state filter estimates the state using the current model estimate ŵ_k, while the EKF weight filter estimates the weights using the current state estimate x̂_k. The system is shown schematically in Figure 5.2. In order to simplify the presentation of the equations, we consider the slightly less general state-space model:
    x_{k+1} = F(x_k, u_k, w) + v_k,                    (5.22)
    y_k = C x_k + n_k,    C = [1  0  ⋯  0],            (5.23)
in which we take the scalar observation y_k to be one of the states. Thus, we only need to consider estimating the parameters associated with a single nonlinear function F. The dual EKF equations for this system are presented in Table 5.3. Note that for clarity, we have specified the equations for the additive white-noise case. The case of colored measurement noise n_k is treated in Appendix B.
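To make the coupling concrete, the following NumPy sketch runs the two filters in lock-step for the model of Eqs. (5.22)-(5.23). It is a simplified illustration rather than the full algorithm of Table 5.3: the weight filter uses a static linearization C^w = C ∂F/∂w (the recurrent-derivative refinement is discussed next), and the function and variable names are assumptions.

```python
import numpy as np

# Simplified dual EKF step for x_{k+1} = F(x_k, w) + v_k,  y_k = C x_k + n_k.
# F, its Jacobians, and the noise statistics are assumed to be supplied by
# the caller; C is a (1, n) array, rn and Re are scalar noise variances.

def dual_ekf_step(x, Px, w, Pw, y_k, F, F_jac_x, F_jac_w, C, Rv, rn, Rr, Re):
    # --- EKF_x time update, using the current weight estimate w ---
    x_pred = F(x, w)
    A = F_jac_x(x, w)                              # dF/dx, shape (n, n)
    Px_pred = A @ Px @ A.T + Rv
    # --- EKF_w time update (random-walk weight model) ---
    Pw_pred = Pw + Rr
    # --- EKF_x measurement update ---
    Sx = C @ Px_pred @ C.T + rn                    # innovation variance, shape (1, 1)
    Kx = Px_pred @ C.T / Sx
    innov = y_k - C @ x_pred                       # prediction error, shape (1,)
    x_new = x_pred + (Kx * innov).ravel()
    Px_new = Px_pred - np.outer(Kx, C @ Px_pred)
    # --- EKF_w measurement update, static linearization C^w = C dF/dw ---
    Cw = C @ F_jac_w(x, w)                         # shape (1, n_w)
    Kw = Pw_pred @ Cw.T @ np.linalg.inv(Cw @ Pw_pred @ Cw.T + Re)
    w_new = w + (Kw @ innov).ravel()
    Pw_new = Pw_pred - Kw @ Cw @ Pw_pred
    return x_new, Px_new, w_new, Pw_new
```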
Recurrent Derivative Computation. While the dual EKF equations appear to be a simple concatenation of the previous state and weight EKF equations, there is actually a necessary modification of the linearization C^w_k = C ∂x̂_k/∂ŵ_k associated with the weight filter. This is due to the fact that the signal filter, whose parameters are being estimated by the weight filter, has a recurrent architecture; that is, x̂_k is a function of x̂_{k-1}, and both are functions of w.¹ Thus, the linearization must be computed using recurrent derivatives, with a routine similar to real-time recurrent learning (RTRL) [35].
[Figure 5.2 Schematic of the dual extended Kalman filter: at each time step, the time update and measurement update of the state filter (EKF_x) and of the weight filter (EKF_w) are run in parallel, coupled through the measurement y_k and the current estimates x̂_k and ŵ_k.]
¹ Note that a linearization is also required for the state EKF, but this derivative, ∂F(x̂_{k-1}, ŵ)/∂x̂_{k-1}, is a static linearization and does not require a recurrent computation.
Taking the derivative of the signal filter equations results in the following system of recursive equations:

    ∂x̂_{k+1}^-/∂ŵ = ∂F(x̂_k, ŵ)/∂ŵ + (∂F(x̂_k, ŵ)/∂x̂_k)(∂x̂_k/∂ŵ),                  (5.35)

    ∂x̂_k/∂ŵ = (I - K_k^x C)(∂x̂_k^-/∂ŵ) + (∂K_k^x/∂ŵ)(y_k - C x̂_k^-),               (5.36)

where ∂F(x̂, ŵ)/∂x̂_k and ∂F(x̂, ŵ)/∂ŵ_k are evaluated at ŵ_k and contain static linearizations of the nonlinear function.

[Table 5.3 The dual extended Kalman filter equations. The definitions of ε_k and C_k^w depend on the particular form of the weight filter being used; see the text for details.]
The last term in Eq. (5.36) may be dropped if we assume that the Kalman gain K_k^x is independent of w. Although this greatly simplifies the algorithm, the exact computation of ∂K_k^x/∂ŵ is possible, as shown in Appendix A. Whether the computational expense of calculating the recursive derivatives (especially that of calculating ∂K_k^x/∂ŵ) is worth the improvement in performance is clearly a design issue. Experimentally, the recursive derivatives appear to be more critical when the signal is highly nonlinear, or is corrupted by a high level of noise.
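A sketch of one step of this recursion, under the simplifying assumption that ∂K_k^x/∂ŵ is neglected (i.e., the last term of Eq. (5.36) is dropped), is given below; the names are illustrative.

```python
import numpy as np

def recurrent_weight_derivative(dxdw, x_prev, w, Kx, C, F_jac_x, F_jac_w):
    """One step of the recursion of Eqs. (5.35)-(5.36), with the dK^x/dw term
    of Eq. (5.36) neglected (the simplifying assumption discussed above).

    dxdw : d x_hat_{k-1} / d w, shape (n_x, n_w)
    Kx   : state-filter Kalman gain, shape (n_x, 1);  C : shape (1, n_x)
    """
    # Eq. (5.35): d x_hat_k^- / dw = dF/dw + (dF/dx) (d x_hat_{k-1} / dw)
    dxpred_dw = F_jac_w(x_prev, w) + F_jac_x(x_prev, w) @ dxdw
    # Eq. (5.36) without its last term: d x_hat_k / dw = (I - Kx C) d x_hat_k^- / dw
    dxfilt_dw = (np.eye(dxdw.shape[0]) - Kx @ C) @ dxpred_dw
    return dxpred_dw, dxfilt_dw
```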
Example. As an example application, consider the noisy time series {x_k}_1^N generated by a nonlinear autoregression:

    x_k = f(x_{k-1}, …, x_{k-M}, w) + v_k,
    y_k = x_k + n_k,    ∀ k ∈ {1, …, N}.               (5.37)
The observations of the series y_k contain measurement noise n_k in addition to the signal. The dual EKF requires reformulating this model into a state-space representation. One such representation is given by
    [x_k, x_{k-1}, …, x_{k-M+1}]^T = [f(x_{k-1}, …, x_{k-M}, w), x_{k-1}, …, x_{k-M+1}]^T + [1, 0, …, 0]^T v_k,    (5.38)

    y_k = [1, 0, …, 0] [x_k, x_{k-1}, …, x_{k-M+1}]^T + n_k,    (5.39)

in which the state vector contains the current value of the signal together with its M - 1 most recent lagged values.
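In code, the shift structure of Eqs. (5.38)-(5.39) might be set up as in the following sketch (names are illustrative; the scalar nonlinearity f, e.g. a small neural network, is assumed to be supplied). The resulting F and C could be used directly with the dual EKF sketch given earlier.

```python
import numpy as np

def make_ar_state_space(f, M):
    """Build F and C of Eqs. (5.38)-(5.39) for the window state
    x_k = [x_k, x_{k-1}, ..., x_{k-M+1}]^T, given a scalar predictor
    f(window, w) of the next value from the previous M values."""
    def F(x, w):
        x_next = np.empty_like(x)
        x_next[0] = f(x, w)        # new value from the nonlinear autoregression
        x_next[1:] = x[:-1]        # shift the remaining lagged values down
        return x_next
    C = np.zeros((1, M))
    C[0, 0] = 1.0                  # the observation picks out the current value
    return F, C
```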
Figure 5.3a shows a time series generated by a neural network (10-5-1) with chaotic dynamics, driven by white Gaussian process noise (σ_v^2 = 0.36). Colored noise generated by a linear autoregressive model is added at 3 dB signal-to-noise ratio (SNR) to produce the noisy data indicated by the + symbols.
Figure 5.3 The dual EKF estimate (heavy curve) of a signal generated by a neural network (thin curve) and corrupted by adding colored noise at 3 dB (+). For clarity, the last 150 points of a 20,000-point series are shown. Only the noisy data are available: both the signal and weights are estimated by the dual EKF. (a) Clean neural network signal and noisy measurements. (b) Dual EKF estimates versus EKF estimates. (c) Estimates with full and static derivatives. (d) MSE profiles of EKF versus dual EKF.
Figure 5.3b shows the time series estimated by the dual EKF. The algorithm estimates both the clean time series and the neural network weights. The algorithm is run sequentially over 20,000 points of data; for clarity, only the last 150 points are shown. For comparison, the estimates using an EKF with the known neural network model are also shown. The MSE for the dual EKF, computed over the final 1000 points of the series, is 0.2171, whereas the EKF produces an MSE of 0.2153, indicating that the dual algorithm has successfully learned both the model and the state estimates.²

Figure 5.3c shows the estimate when the static approximation to the recursive derivatives is used. In this example, the static derivative actually provides a slight advantage, with an MSE of 0.2122. The difference, however, is not statistically significant. Finally, Figure 5.3d assesses the convergence behavior of the algorithm. The mean-squared error (MSE) is computed over 500-point segments of the time series at 50-point intervals to produce the MSE profile (dashed line). For comparison, the solid line is the MSE profile of the EKF signal estimation algorithm, which uses the true neural network model. The dual EKF appears to converge to the optimal solution after only about 2000 points.

² A surprising result is that the dual EKF sometimes actually outperforms the EKF, even though the EKF appears to have an unfair advantage of knowing the true model. Our explanation is that the EKF, even with the known model, is still an approximate estimation algorithm. While the dual EKF also learns an approximate model, this model can actually be better matched to the state estimation approximation.
5.3 A PROBABILISTIC PERSPECTIVE
In this section, we present a unified framework for dual estimation. We start by developing a probabilistic perspective, which leads to a number of possible cost functions that can be used in the estimation process. Various approaches in the literature, which may differ in their actual optimization procedure, can then be related based on the underlying cost function. We then show how a Kalman-based optimization procedure can be used to provide a common algorithmic framework for minimizing each of the cost functions.
MAP Estimation. Dual estimation can be cast as a maximum a posteriori (MAP) solution. The statistical information contained in the sequence of data {y_k}_1^N about the signal and parameters is embodied by the joint conditional probability density of the sequence of states {x_k}_1^N and weights w, given the noisy data {y_k}_1^N. For notational convenience, define the column vectors x_1^N and y_1^N, with elements from {x_k}_1^N and {y_k}_1^N, respectively. The joint conditional density function is written as

    ρ_{x_1^N w | y_1^N}(X = x_1^N, W = w | Y = y_1^N),    (5.40)

where X, Y, and W are the vectors of random variables associated with x_1^N, y_1^N, and w, respectively. This joint density is abbreviated as ρ_{x_1^N w | y_1^N}. The MAP estimation approach consists of determining the instances of the states and weights that maximize this conditional density. For Gaussian distributions, the MAP estimate also corresponds to the minimum mean-squared error (MMSE) estimator. More generally, as long as the density is unimodal and symmetric around the mean, the MAP estimate provides the Bayes estimate for a broad class of loss functions [36].
Taking MAP as the starting point allows dual estimation approaches to be divided into two basic classes. The first, referred to here as joint estimation methods, attempts to maximize ρ_{x_1^N w | y_1^N} with respect to both the states and the weights. The second, referred to as marginal estimation methods, maximizes the marginal density ρ_{w | y_1^N} with respect to the weights alone, with the signal estimated as an intermediate step. These two classes are developed in the next two subsections.

5.3.1 Joint Estimation Methods
Using Bayes' rule, the joint conditional density can be expressed as

    ρ_{x_1^N w | y_1^N} = ρ_{y_1^N x_1^N | w} ρ_w / ρ_{y_1^N}.

Although {y_k}_1^N is statistically dependent on {x_k}_1^N and w, the prior ρ_{y_1^N} is nonetheless functionally independent of {x_k}_1^N and w. Therefore, ρ_{x_1^N w | y_1^N} can be maximized by maximizing the terms in the numerator alone. Furthermore, if no prior information is available on the weights, ρ_w can be dropped, leaving the maximization of ρ_{y_1^N x_1^N | w} with respect to {x_k}_1^N and w.
To derive the corresponding cost function, we assume that v_k and n_k are both zero-mean white Gaussian noise processes. It can then be shown (see [22]) that

    ρ_{y_1^N x_1^N | w} = (1 / √((2π)^N |R_v|^N (2π σ_n^2)^N)) exp{ -(1/2) Σ_{k=1}^N [ (y_k - C x_k)^2/σ_n^2 + (x_k - x̂_k^-)^T R_v^{-1} (x_k - x̂_k^-) ] }.
Here we have used the structure given in Eq. (5.37) to compute the prediction x̂_k^- using the model F(·, w). Taking the logarithm, the corresponding cost function is given by

    J(x_1^N, w) = Σ_{k=1}^N [ (y_k - C x_k)^2/σ_n^2 + (x_k - x̂_k^-)^T R_v^{-1} (x_k - x̂_k^-) ] + N log|R_v| + N log σ_n^2,

up to additive constants. This cost function can be minimized with respect to any of the unknown quantities (including the variances, which we will consider in Section 5.4). For the time being, consider only the optimization of {x_k}_1^N and w. Because the log terms in the above cost are independent of the signal and weights, they can be dropped, providing a more specialized cost function:

    J_j(x_1^N, w) = Σ_{k=1}^N [ (y_k - C x_k)^2/σ_n^2 + (x_k - x̂_k^-)^T R_v^{-1} (x_k - x̂_k^-) ],    (5.50)

where x̂_k^- = F(x_{k-1}, w) and C = [1 0 ⋯ 0].
The first term is a soft constraint keeping {x_k}_1^N close to the observations {y_k}_1^N. The smaller the measurement noise variance σ_n^2, the stronger this constraint will be. The second term keeps the state estimates and model estimates mutually consistent with the model structure. This constraint will be strong when the state is highly deterministic (i.e., R_v is small).

J_j(x_1^N, w) should be minimized with respect to both {x_k}_1^N and w to find the estimates that maximize the joint density ρ_{y_1^N x_1^N | w}. This is a difficult optimization problem because of the high degree of coupling between the unknown quantities {x_k}_1^N and w. In general, we can classify approaches as being either direct or decoupled. In direct approaches, both the signal and the weights are determined jointly as a multivariate optimization problem. Decoupled approaches optimize one variable at a time while the other variable is fixed, and then alternate. Direct algorithms include the joint EKF algorithm (see Section 5.1), which attempts to minimize the cost sequentially by combining the signal and weights into a single (joint) state vector. The decoupled approaches are elaborated below.
Decoupled Estimation. To minimize J_j(x_1^N, w) with respect to the signal, the cost function is evaluated using the current estimate ŵ of the weights to generate the predictions. The simplest approach is to substitute the predictions x̂_k^- ≜ F(x_{k-1}, ŵ) directly into Eq. (5.50), obtaining the signal cost

    J_j(x_1^N, ŵ) = Σ_{k=1}^N [ (y_k - C x_k)^2/σ_n^2 + (x_k - F(x_{k-1}, ŵ))^T R_v^{-1} (x_k - F(x_{k-1}, ŵ)) ].    (5.51)

Likewise, to minimize the cost with respect to the weights, the current signal estimates x̂_k are used to form the predictions x̂_k^- ≜ F(x̂_{k-1}, w). Again, this results in a straightforward substitution in Eq. (5.50):

    J_j(x̂_1^N, w) = Σ_{k=1}^N [ (y_k - C x̂_k)^2/σ_n^2 + (x̂_k - F(x̂_{k-1}, w))^T R_v^{-1} (x̂_k - F(x̂_{k-1}, w)) ].    (5.52)

A cost of this form is used in [14] for robust prediction of time series containing outliers.
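For a batch (off-line) realization of this alternating scheme, the two substitutions can be carried out with a generic optimizer. The sketch below assumes a toy scalar model f(x, w) = w0·tanh(w1·x) and is only an illustration of the decoupled idea, not the chapter's EIV or dual EKF implementations.

```python
import numpy as np
from scipy.optimize import minimize

def joint_cost(x, w, y, var_n, var_v):
    pred = w[0] * np.tanh(w[1] * x[:-1])             # model predictions F(x_{k-1}, w)
    return (np.sum((y[1:] - x[1:]) ** 2) / var_n     # soft constraint toward the data
            + np.sum((x[1:] - pred) ** 2) / var_v)   # consistency with the model

rng = np.random.default_rng(1)
w_true, N, var_v, var_n = np.array([0.9, 1.5]), 100, 0.05, 0.1
x_true = np.zeros(N)
for k in range(1, N):
    x_true[k] = (w_true[0] * np.tanh(w_true[1] * x_true[k - 1])
                 + np.sqrt(var_v) * rng.standard_normal())
y = x_true + np.sqrt(var_n) * rng.standard_normal(N)

x_est, w_est = y.copy(), np.array([0.5, 0.5])
for _ in range(10):                                  # alternate the decoupled steps
    x_est = minimize(lambda x: joint_cost(x, w_est, y, var_n, var_v),
                     x_est, method="L-BFGS-B").x     # signal step: w fixed
    w_est = minimize(lambda w: joint_cost(x_est, w, y, var_n, var_v),
                     w_est, method="L-BFGS-B").x     # weight step: signal fixed
```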
In the decoupled approach to joint estimation, by separately minimizing each cost with respect to its argument, the values are found that maximize (at least locally) the joint conditional density function. Algorithms that fall into this class include a sequential two-observation form of the dual EKF algorithm [21], and the errors-in-variables (EIV) method applied to batch-style minimization [18, 19]. An alternative approach, referred to as error coupling, makes the extra step of taking the errors in the estimates into account. However, this error-coupled approach (investigated in [22]) does not appear to perform reliably, and is not described further in this chapter.
5.3.2 Marginal Estimation Methods
Recall that in marginal estimation, the joint density function is expanded as

    ρ_{x_1^N w | y_1^N} = ρ_{x_1^N | w y_1^N} · ρ_{w | y_1^N}.

The first factor (ρ_{x_1^N | w y_1^N}) is dependent on the state. Hence, maximizing this factor for the state will yield the same solution as when maximizing the joint density (assuming the optimal weights have been found). However, because both factors also depend on w, maximizing the second factor (ρ_{w | y_1^N}) alone with respect to w is not the same as maximizing the joint density ρ_{x_1^N w | y_1^N} with respect to w. Nonetheless, the resulting estimates ŵ are consistent and unbiased if conditions of sufficient excitation are met [37]. The marginal estimation approach is exemplified by the maximum-likelihood approaches [8, 9] and EM approaches [11, 12]. Motivation for these methods usually comes from considering only the marginal density ρ_{w | y_1^N}.
If there is no prior information on w, maximizing this posterior density is equivalent to maximizing the likelihood function ρ_{y_1^N | w}. Assuming Gaussian statistics, the chain rule for conditional probabilities can be used to express this likelihood function as

    ρ_{y_1^N | w} = Π_{k=1}^N (1/√(2π σ_{e_k}^2)) exp{ -(y_k - x̂_k^-)^2 / (2 σ_{e_k}^2) },

where x̂_k^- is the prediction of the signal at time k given the data up to time k - 1, and σ_{e_k}^2 is the variance of the corresponding prediction error e_k = y_k - x̂_k^-.
These quantities must be determined as a step to weight estimation. For linear models, this can be done exactly using an ordinary Kalman filter. For nonlinear models, however, the expectation is approximated by an extended Kalman filter, which equivalently attempts to minimize the joint cost J_j(x_1^k, ŵ) defined in Section 5.3.1 by Eq. (5.51).

An iterative maximum-likelihood approach for linear models is described in [7] and [8]; this chapter presents a sequential maximum-likelihood approach for nonlinear models, developed in [21].
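For reference, both quantities are produced by the state filter itself: a standard Kalman-filter identity (stated here for convenience; it is not quoted from the chapter) gives, for the observation model y_k = C x_k + n_k,

    e_k = y_k - C x̂_k^-,    σ_{e_k}^2 = C P_k^- C^T + σ_n^2,

where x̂_k^- and P_k^- are the predicted state and its error covariance obtained by running the (extended) Kalman filter with the weights w. This is what makes a sequential, filter-based evaluation of the likelihood cost possible.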
Prediction Error Cost. Often the variance σ_{e_k}^2 in the likelihood cost is assumed (incorrectly) to be independent of the weights w and the time index k. Under this assumption, the log-likelihood can be maximized by minimizing the squared prediction error cost function

    J_pe(w) = Σ_{k=1}^N (y_k - x̂_k^-)^2.    (5.59)

Prediction error methods of this form have been found in the literature to be quite useful. In addition, they benefit from reduced computational cost, because the derivative of the variance σ_{e_k}^2 with respect to w is not computed.
EM Algorithm. Another approach to maximizing ρ_{w | y_1^N} is offered by the expectation-maximization (EM) algorithm [10, 12, 38]. The EM algorithm can be derived by first expanding the log-likelihood as

    log ρ_{y_1^N | w} = E_X[ log ρ_{y_1^N X | w} | y_1^N, ŵ ] - E_X[ log ρ_{X | y_1^N w} | y_1^N, ŵ ],

where the expectation over X of the left-hand side has no effect, because X does not appear in log ρ_{y_1^N | w}. Note that the expectation is conditional on a previous estimate of the weights, ŵ. The second term on the right is concave by Jensen's inequality [39],³ so choosing w to maximize the first term on the right-hand side alone will always increase the log-likelihood on the left-hand side. Thus, the EM algorithm repeatedly maximizes the conditional expectation E_X[ log ρ_{y_1^N X | w} | y_1^N, ŵ ] with respect to w, recomputing the expectation after each update of the weights.
³ Jensen's inequality states that E[g(x)] ≤ g(E[x]) for a concave function g(·).
For the white-noise case, it can be shown (see [12, 22]) that the resulting EM cost function takes the form of an expectation of the joint cost; this expression can be evaluated for the special case of time-series estimation, represented in Eq. (5.37). As shown in [22], the expectation evaluates to an expression involving the smoothed signal estimates x̂_{k|k} and their error variances; the quantities required for its computation are discussed in Section 5.3.3.
Summary of Cost Functions. The various cost functions given in this section are summarized in Table 5.4. No explicit signal estimation cost is given for the marginal estimation methods, because signal estimation is only an implicit step of the marginal approach, and uses the joint cost J_j(x_1^N, ŵ). These cost functions, combined with specific optimization methods, lead to the variety of algorithms that appear in the literature.
Table 5.4 Summary of dual estimation cost functions
In the next section, we shall show how each of these cost functions can be minimized using a general dual EKF-based approach.
5.3.3 Dual EKF Algorithms
In this section, we show how the dual EKF algorithm can be modified to minimize any of the cost functions discussed earlier. Recall that the basic dual EKF as presented in Section 5.2.3 minimized the prediction error cost of Eq. (5.59). As was shown in the last section, all approaches use the same joint cost function for the state-estimation component. Thus, the state EKF remains unchanged; only the weight EKF must be modified. We shall show that this involves simply redefining the error term ε_k.
To develop the method, consider again the general state-space representation for weight estimation (Eq. (5.11)). In the observed-error form of the weight filter, the measurement update is

    ŵ_k = ŵ_k^- + K_k^w ε_k,    (5.68)

and the inverse of the weight error covariance can be written as

    (P_k^w)^{-1} = (λ^{-1} P_{k-1}^w)^{-1} + (C_k^w)^T (R^e)^{-1} C_k^w.    (5.72)

The weight update in Eq. (5.68) is of the form

    ŵ_k = ŵ_k^- - S_k [∇_w J(ŵ)]^T,    (5.73)
where ∇_w J is the gradient of the cost J with respect to w, and S_k is a symmetric matrix that approximates the inverse Hessian of the cost. Both the gradient and Hessian are evaluated at the previous value of the weight estimate. Thus, we see that by using the observed-error formulation, it is possible to redefine the error term ε_k, which in turn allows us to minimize an arbitrary cost function that can be expressed as a sum of instantaneous terms J_k = ε_k^T ε_k. This basic idea was presented by Puskorius and Feldkamp [40] for minimizing an entropic cost function; see also Chapter 2. Note that J_k = ε_k^T ε_k does not uniquely specify ε_k, which can be vector-valued. The error must be chosen such that the gradient and inverse Hessian approximations (Eqs. (5.70) and (5.72)) are consistent with the desired batch cost.

In the following sections, we give the exact specification of the error term (and corresponding gradient C_k^w) necessary to modify the dual EKF algorithm to minimize the different cost functions. The original set of dual EKF equations given in Table 5.3 remains the same, with only ε_k being redefined. Note that for each case, the full evaluation of C_k^w requires taking recursive gradients. The procedure for this is analogous to that taken in Section 5.2.3. Furthermore, we restrict ourselves to the autoregressive time-series model with state-space representation given in Eqs. (5.38) and (5.39).
Joint Estimation Forms. The corresponding weight cost function (see also Eq. (5.52)) and error terms are given in Table 5.5. Note that this represents a special two-observation form of the weight filter, where

    x̂_k^- = f(x̂_{k-1}, w),    e_k ≜ (y_k - x̂_k),    x̃_k ≜ (x̂_k - x̂_k^-).

[Table 5.5 Joint cost function and observed error terms for the dual EKF weight filter.]
Note that this dual EKF algorithm represents a sequential form of the decoupled approach to joint optimization; that is, the two EKFs minimize the overall joint cost function by alternately optimizing one argument at a time while the other argument is fixed. A direct approach using the joint EKF is described later in Section 5.3.4.
Marginal Estimation Forms–Maximum-Likelihood Cost. The corresponding weight cost function (see Eq. (5.58)) and error terms are given in Table 5.6; the cost is

    J_ml(w) = Σ_{k=1}^N [ log(2π σ_{e_k}^2) + (y_k - x̂_k^-)^2 / σ_{e_k}^2 ].

[Table 5.6 Maximum-likelihood cost function and observed error terms for the dual EKF weight filter.]
Marginal Estimation Forms–Prediction Error Cost. If σ_{e_k}^2 is assumed to be independent of w, then we are left with the formulas corresponding to the original basic dual EKF algorithm (for the time-series case); see Table 5.7.
Marginal Estimation Forms–EM Cost. The dual EKF can be modified to implement a sequential EM algorithm. Note that the M-step, which relates to the weight filter, corresponds to a generalized M-step, in which the cost function is decreased (but not necessarily minimized) at each iteration. The formulation is given in Table 5.8, where x̃_{k|k} ≜ x̂_k - x̂_{k|k}. Note that J_k^em(w) was specified by dropping terms in Eq. (5.63) that are independent of the weights (see [22]). While the estimates x̂_k are found by the usual state EKF, the variance terms p^y_{k|k} and p_{k|k}, as well as x̂_{k|k} (a noncausal prediction), are not typically computed in the normal implementation of the state EKF. To compute these, the state vector is augmented by one additional lagged value of the signal.