Williamson, G.A. “Adaptive IIR Filters.”
Digital Signal Processing Handbook.
Ed. Vijay K. Madisetti and Douglas B. Williams. Boca Raton: CRC Press LLC, 1999.
23 Adaptive IIR Filters

23.1 Introduction
23.2 The Equation Error Approach
    The LMS and LS Equation Error Algorithms • Instrumental Variable Algorithms • Equation Error Algorithms with Unit Norm Constraints
23.3 The Output Error Approach
    Gradient-Descent Algorithms • Output Error Algorithms Based on Stability Theory
23.4 Equation-Error/Output-Error Hybrids
    The Steiglitz-McBride Family of Algorithms
23.5 Alternate Parametrizations
23.6 Conclusions
23.1 Introduction

Adaptive infinite impulse response (IIR) filters offer potential performance and computational advantages over their FIR counterparts, owing to the “pole-zero” form of the IIR structure, compared to the “all-zero” form of the FIR structure.
However, adapting an IIR filter brings with it a number of challenges in obtaining stable and optimal behavior of the algorithms used to adjust the filter parameters. Since the 1970s, there has been much active research focused on adaptive IIR filters, but many of these challenges to date have not been completely resolved. As a consequence, adaptive IIR filters are not found in commercial practice anywhere near as frequently as adaptive FIR filters are. Nonetheless, recent advances in adaptive IIR filter research have provided new results and insights into the behavior of several methods for adapting the filter parameters, and new algorithms have been proposed that address some of the problems and open issues in these systems. Hence, this class of adaptive filter continues to hold promise as a potentially effective and efficient adaptive filtering option.
In this section, we provide an up-to-date overview of the different approaches to the adaptive IIR filtering problem. Due to the extensive literature on the subject, many readers may wish to peruse several earlier general treatments of the topic. Johnson’s 1984 paper [11] and Shynk’s 1989 paper [23] are still current in the sense that a number of open issues cited therein remain open today. More recently, Regalia’s 1995 book [19] provides a comprehensive view of the subject.
23.1.1 The System Identification Framework for Adaptive IIR Filtering
The spread of issues associated with adaptive IIR filters is most easily understood if one adopts a system identification perspective on the filtering problem. To this end, consider the diagram presented in Fig. 23.1. Available to the adaptive filter are two external signals: the input signal x(n) and the desired output signal d(n). The adaptive filtering problem is to adjust the parameters of the filter acting on x(n) so that its output y(n) approximates d(n). From the system identification perspective, the task at hand is to adjust the parameters of the filter generating y(n) from x(n) in Fig. 23.1 so that the filtering operation itself matches, in some sense, the system generating d(n) from x(n). These two viewpoints are closely related because if the systems are the same, then their outputs will be close. However, by adopting the convention that there is a system generating d(n) from x(n), clearer insights into the behavior and design of adaptive algorithms are obtained. This insight is useful even if the “system” generating d(n) from x(n) has only a statistical, and not a physical, basis in reality.
FIGURE 23.1: System identification configuration of the adaptive IIR filter.
The standard adaptive IIR filter is described by the difference equation

    y(n) + a_1(n) y(n-1) + ... + a_N(n) y(n-N)
        = b_0(n) x(n) + b_1(n) x(n-1) + ... + b_M(n) x(n-M) ,     (23.1)

which we write compactly as

    A(q^{-1}, n) y(n) = B(q^{-1}, n) x(n) ,     (23.2)

or equivalently as

    y(n) = ( B(q^{-1}, n) / A(q^{-1}, n) ) x(n) ,     (23.3)

where B(q^{-1}, n) and A(q^{-1}, n) are the time-dependent polynomials in the delay operator q^{-1} appearing in (23.2). The parameters that are updated by the adaptive algorithm are the coefficients of these polynomials. Note that the polynomial A(q^{-1}, n) is constrained to be monic, such that a_0(n) = 1.
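To make the recursion (23.1) concrete, the following minimal NumPy sketch computes one output sample; the function name and argument layout are our own illustration, not notation from the chapter.

```python
import numpy as np

def adaptive_iir_output(x_hist, y_hist, a, b):
    """One step of the adaptive IIR filter of (23.1).

    x_hist : array [x(n), x(n-1), ..., x(n-M)] of current and past inputs
    y_hist : array [y(n-1), ..., y(n-N)] of past outputs
    a      : array [a_1(n), ..., a_N(n)]  (a_0(n) = 1 is implicit: A is monic)
    b      : array [b_0(n), ..., b_M(n)]
    Returns y(n).
    """
    return np.dot(b, x_hist) - np.dot(a, y_hist)
```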
We adopt a rather more general description for the unknown system, assuming that d(n) is generated from the input signal x(n) via some linear time-invariant system H(q^{-1}), with the addition of a noise signal v(n) to reflect components in d(n) that are independent of x(n):

    d(n) = H(q^{-1}) x(n) + v(n) .     (23.4)

We further break down H(q^{-1}) into a transfer function H_m(q^{-1}) that is explicitly modeled by the adaptive filter, and a transfer function H_u(q^{-1}) that is unmodeled. In this way, we view d(n) as a sum of three components: the signal y_m(n) that is modeled by the adaptive filter, the signal y_u(n) that is unmodeled but that depends on the input signal, and the signal v(n) that is independent of the input. Hence,

    d(n) = y_m(n) + y_u(n) + v(n) ,     (23.5)

where y_u(n) = H_u(q^{-1}) x(n) and

    y_m(n) = H_m(q^{-1}) x(n) = ( B_opt(q^{-1}) / A_opt(q^{-1}) ) x(n) ,     (23.6)

with B_opt(q^{-1}) = \sum_{i=0}^{M} b_{i,opt} q^{-i} and A_opt(q^{-1}) = 1 + \sum_{i=1}^{N} a_{i,opt} q^{-i}. Note that (23.6) has the same form as (23.3). The parameters {a_{i,opt}} and {b_{i,opt}} are considered to be the optimal values for the adaptive filter parameters, in a manner that we describe shortly.
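As an illustration of this three-component decomposition, the sketch below generates synthetic data d(n) = y_m(n) + y_u(n) + v(n). All coefficient values, the choice of H_u as a small delayed tap, and the noise level are hypothetical examples, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def filter_iir(b, a, x):
    """Filter x through B(q^-1)/A(q^-1), with a = [1, a_1, ..., a_N] monic."""
    y = np.zeros_like(x)
    for n in range(len(x)):
        acc = sum(b[i] * x[n - i] for i in range(len(b)) if n - i >= 0)
        acc -= sum(a[i] * y[n - i] for i in range(1, len(a)) if n - i >= 0)
        y[n] = acc
    return y

# Hypothetical modeled part H_m = B_opt/A_opt, unmodeled part H_u, noise v(n)
b_opt, a_opt = [1.0, 0.5], [1.0, -0.8]            # assumed example coefficients
x = rng.standard_normal(10_000)
y_m = filter_iir(b_opt, a_opt, x)                  # modeled component
y_u = 0.05 * np.concatenate(([0.0, 0.0, 0.0], x[:-3]))  # small unmodeled tap
v = 0.1 * rng.standard_normal(len(x))              # noise independent of x
d = y_m + y_u + v                                  # the three-component view
```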
Figure 23.1 shows two error signals: e_e(n), termed the equation error, and e_o(n), termed the output error. The parameters of the adaptive filter are usually adjusted so as to minimize some positive function of one or the other of these error signals. However, the figure of merit for judging adaptive filter performance that we will apply throughout this section is the mean-square output error E{e_o^2(n)}. In most adaptive filtering applications, the desired signal d(n) is available only during a “training phase” in which the filter parameters are adapted. At the conclusion of the training phase, the filter is operated to produce the output signal y(n) as shown in the figure, with the difference between the filter output y(n) and the (now unmeasurable) system output d(n) the error. Thus, we adopt the convention that {a_{i,opt}} and {b_{i,opt}} are defined such that when a_i(n) ≡ a_{i,opt} and b_i(n) ≡ b_{i,opt}, E{e_o^2(n)} is minimized, with A_opt(q^{-1}) constrained to be stable.
At this point it is convenient to set down some notation and terminology. Define the regressor vectors

    U_e(n) = [x(n) ... x(n-M)  -d(n-1) ... -d(n-N)]^T ,     (23.7)
    U_o(n) = [x(n) ... x(n-M)  -y(n-1) ... -y(n-N)]^T ,     (23.8)
    U_m(n) = [x(n) ... x(n-M)  -y_m(n-1) ... -y_m(n-N)]^T .     (23.9)

These vectors are the equation error regressor, output error regressor, and modeled system regressor vectors, respectively. Define a noise regressor vector

    V(n) = [0 ... 0  -v(n-1) ... -v(n-N)]^T ,     (23.10)

with M + 1 leading zeros corresponding to the x(n-i) values in the preceding regressors. Furthermore, define the parameter vectors

    W(n) = [b_0(n) ... b_M(n)  a_1(n) ... a_N(n)]^T ,     (23.11)
    W_opt = [b_{0,opt} ... b_{M,opt}  a_{1,opt} ... a_{N,opt}]^T .     (23.12)

We will have occasion to use W to refer to the adaptive filter parameter vector when the parameters are considered to be held at fixed values. With this notation, we may for instance write y_m(n) = U_m^T(n) W_opt and y(n) = U_o^T(n) W(n).
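In code, any of the regressors (23.7) through (23.9) can be built by the same helper, differing only in which signal supplies the delayed, negated entries. The sketch below (with zero initial conditions assumed) is our own illustration; with W(n) laid out as in (23.11), predictions such as U_e^T(n) W(n) are then plain dot products.

```python
import numpy as np

def regressor(x, s, n, M, N):
    """U(n) = [x(n) ... x(n-M)  -s(n-1) ... -s(n-N)]^T, cf. (23.7)-(23.9).

    Passing s = d gives the equation error regressor U_e(n); s = y gives
    the output error regressor U_o(n); s = y_m gives U_m(n).  Samples
    before time 0 are taken as 0 (zero initial conditions).
    """
    past_x = [x[n - i] if n - i >= 0 else 0.0 for i in range(M + 1)]
    past_s = [-s[n - i] if n - i >= 0 else 0.0 for i in range(1, N + 1)]
    return np.array(past_x + past_s)
```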
The situation in which y_u(n) ≡ 0 is referred to as the sufficient order case. The situation in which y_u(n) ≢ 0 is termed the undermodeled case.
23.1.2 Algorithms and Performance Issues
A number of different algorithms for the adaptation of the parameter vector W(n) in (23.11) have been suggested. These may be characterized with respect to the form of the error criterion employed by the algorithm. Each algorithm attempts to drive to zero either the equation error, the output error, or some combination or hybrid of these two error criteria. Major algorithm classes that we consider for the equation error approach include the standard least-squares (LS) and least mean-square (LMS) algorithms, which parallel the algorithms used in adaptive FIR filtering. For equation error methods, we also examine the instrumental variables (IV) algorithm, as well as algorithms that constrain the parameters in the denominator of the adaptive filter’s transfer function to improve estimation properties. In the output error class, we examine gradient algorithms and hyperstability-based algorithms. Within the equation and output error hybrid algorithm class, we focus predominantly on the Steiglitz-McBride (SM) algorithm, though there are several algorithms that are more straightforward combinations of equation and output error approaches.
In general, we desire that the adaptive filtering algorithm adjust the parameter vector W(n) so that it converges to W_opt, the parameters that minimize the mean-square output error. The major issues for adaptive IIR filtering on which we will focus herein are

1. conditions for the stability and convergence of the algorithm used to adapt W(n), and
2. the asymptotic value of the adapted parameter vector W_∞, and its relationship to W_opt.

This latter issue relates to the minimum mean-square error achievable by the algorithm, as noted above. Other issues of importance include the convergence speed of the algorithm, its ability to track time variations of the “true” parameter values, and numerical properties, but these will receive less attention here. Of these, convergence speed is of particular concern to practitioners, especially as adaptive IIR filters tend to converge at a far slower rate than their FIR counterparts. However, we emphasize the stability and nature of convergence over the speed because, if the algorithm fails to converge or converges to an undesirable solution, the rate at which it does so is of less concern. Furthermore, convergence speed is difficult to characterize for adaptive IIR filters due to a number of factors, including complicated dependencies on algorithm initializations, input signal characteristics, and the relationship between x(n) and d(n).
23.1.3 Some Preliminaries
Unless otherwise indicated, we assume in our discussion that all signals in Fig. 23.1 are stationary, zero mean, random signals with finite variance. In particular, the properties we ascribe to the various algorithms are stated with this assumption and are presumed to be valid. Results that are based on a deterministic framework are similar to those developed here; see [1] for an example.

We shall also make use of the following definitions.
DEFINITION 23.1  A (scalar) signal x(n) is persistently exciting (PE) of order L if, with

    X(n) = [x(n) x(n-1) ... x(n-L+1)]^T ,     (23.15)

there exist α and β satisfying 0 < α < β < ∞ such that αI < E{X(n) X^T(n)} < βI. The (vector) signal X(n) is then also said to be PE.
If x(n) contains at least L/2 distinct sinusoidal components, then x(n) is PE of order L. Any random signal x(n) whose power spectrum is nonzero over an interval of nonzero width will be PE for any value of L in (23.15). Such is the case, for example, if x(n) is uncorrelated or if x(n) is modeled as an AR, MA, or ARMA process driven by uncorrelated noise. PE conditions are required of all adaptive algorithms to ensure good behavior, because if there is inadequate excitation to provide information to the algorithm, convergence of the adapted parameter estimates will not necessarily follow [22].
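In practice a PE condition can be checked only empirically. A minimal NumPy sketch along the lines of Definition 23.1 follows; the sample-average estimate of E{X(n)X^T(n)} and the tolerance are our own choices.

```python
import numpy as np

def is_pe(x, L, tol=1e-8):
    """Empirical check of persistence of excitation of order L.

    Estimates E{X(n) X^T(n)} for X(n) = [x(n) ... x(n-L+1)]^T by a sample
    average and tests that its eigenvalues are bounded away from zero.
    """
    X = np.array([x[n - L + 1:n + 1][::-1] for n in range(L - 1, len(x))])
    R = X.T @ X / len(X)                 # sample estimate of E{X(n) X^T(n)}
    eigs = np.linalg.eigvalsh(R)
    return eigs.min() > tol and np.isfinite(eigs.max())
```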
DEFINITION 23.2  A transfer function H(q^{-1}) is said to be strictly positive real (SPR) if H(q^{-1}) is stable and the real part of its frequency response is positive at all frequencies.

An SPR condition will be required to ensure convergence for a few of the algorithms that we discuss. Note that such a condition cannot be guaranteed in practice when H(q^{-1}) is an unknown transfer function, or when H(q^{-1}) depends on an unknown transfer function.
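Although SPR of an unknown transfer function cannot be verified in practice, for a known candidate H(q^{-1}) = B(q^{-1})/A(q^{-1}) the condition is easy to test numerically on a frequency grid. This sketch is our own; it assumes the stability of A has been checked separately (e.g., via a root test).

```python
import numpy as np

def is_spr(b, a, n_grid=1024):
    """Numerical SPR test: Re{H} > 0 on a dense frequency grid.

    b, a are the coefficient lists of B(q^-1) and A(q^-1) in powers of q^-1.
    """
    w = np.linspace(0.0, np.pi, n_grid)
    qinv = np.exp(-1j * w)                          # q^-1 on the unit circle
    H = np.polyval(b[::-1], qinv) / np.polyval(a[::-1], qinv)
    return bool(np.all(H.real > 0))
```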
23.2 The Equation Error Approach
To motivate the equation error approach, consider again Fig. 23.1. Suppose that y(n) in the figure were actually equal to d(n). Then the system relationship A(q^{-1}, n) y(n) = B(q^{-1}, n) x(n) would imply that A(q^{-1}, n) d(n) = B(q^{-1}, n) x(n). But of course this last equation does not hold exactly, and we term its error the “equation error” e_e(n). Hence, we define

    e_e(n) = A(q^{-1}, n) d(n) - B(q^{-1}, n) x(n) .     (23.16)

Using the notation developed in (23.7) through (23.14), we find that

    e_e(n) = d(n) - U_e^T(n) W(n) .     (23.17)
Equation error methods for adaptive IIR filtering typically adjust W(n) so as to minimize the mean-squared error (MSE) J_MSE(n) = E{e_e^2(n)}, where E{·} denotes statistical expectation, or the exponentially weighted least-squares (LS) error J_LS(n) = \sum_{k=0}^{n} λ^{n-k} e_e^2(k).
23.2.1 The LMS and LS Equation Error Algorithms
The equation error e_e(n) of (23.17) is the difference between d(n) and a prediction of d(n) given by U_e^T(n) W(n). Noting that U_e(n) does not depend on W(n), we see that equation error adaptive IIR filtering is a type of linear prediction, and in particular the form of the prediction is identical to that arising in adaptive FIR filtering. One would suspect that many adaptive FIR filter algorithms would then apply directly to adaptive IIR filters with an equation error criterion, and this is in fact the case. Two adaptive algorithms applicable to equation error adaptive IIR filtering are the LMS algorithm given by

    W(n+1) = W(n) + µ(n) U_e(n) e_e(n) ,     (23.18)

and the RLS algorithm given by

    W(n+1) = W(n) + P(n) U_e(n) e_e(n) ,     (23.19)

    P(n) = (1/λ) [ P(n-1) - ( P(n-1) U_e(n) U_e^T(n) P(n-1) ) / ( λ + U_e^T(n) P(n-1) U_e(n) ) ] ,     (23.20)

where the above expression for P(n) is a recursive implementation of

    P(n) = [ \sum_{k=0}^{n} λ^{n-k} U_e(k) U_e^T(k) ]^{-1} .     (23.21)

These algorithms behave well when the step size µ(n) in (23.18) and the forgetting factor λ in (23.20) are chosen appropriately; a common choice for the step size is the normalized form µ(n) = µ̄/(ε + U_e^T(n) U_e(n)). With the normalized step size, we require 0 < µ̄ < 2 and ε > 0 for stability, with typical choices of µ̄ = 0.1 and ε = 0.001. In (23.20), we require that λ satisfy 0 < λ ≤ 1, with λ typically close to or equal to one, and we initialize P(0) = γI with γ a large, positive number. These results are analogous to the FIR filter cases considered in the earlier sections of this chapter.
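A minimal NumPy sketch of the equation error LMS recursion (23.18), using the normalized step size described above and zero initial conditions, follows; the function name and default values are our own.

```python
import numpy as np

def ee_lms(x, d, M, N, mu_bar=0.1, eps=1e-3):
    """Equation error LMS, per (23.17)-(23.18), with normalized step size.

    Returns the trajectory of W(n) = [b_0 ... b_M, a_1 ... a_N]^T.
    """
    W = np.zeros(M + 1 + N)
    traj = []
    for n in range(len(x)):
        xs = [x[n - i] if n - i >= 0 else 0.0 for i in range(M + 1)]
        ds = [-d[n - i] if n - i >= 0 else 0.0 for i in range(1, N + 1)]
        U_e = np.array(xs + ds)               # equation error regressor (23.7)
        e_e = d[n] - U_e @ W                  # equation error (23.17)
        mu = mu_bar / (eps + U_e @ U_e)       # normalized step size
        W = W + mu * U_e * e_e                # LMS update (23.18)
        traj.append(W.copy())
    return np.array(traj)
```

The RLS recursion (23.19) and (23.20) replaces the scalar step with the matrix gain P(n)U_e(n), at O((N+M)^2) cost per sample.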
These algorithms possess nice convergence properties, as we now discuss.
Property 1: Given that x(n) is PE of order N + M + 1, then under (23.18), or under (23.19) and (23.20), with algorithm parameters chosen to satisfy the conditions noted above, E{W(n)} converges as n → ∞ to a value W_∞ minimizing J_MSE(n) or J_LS(n), respectively.
This property is desirable in that global convergence to parameter values optimal for the equation error cost function is guaranteed, just as with adaptive FIR filters. The convergence result holds whether the filter is operating in the sufficient order case or the undermodeled case. This is an important advantage of the equation error approach over other approaches. The reader is referred to Chapters 19, 20, and 21 for further details on the convergence behaviors of these algorithms and their variations. As in the FIR case, the eigenvalues of the matrix R = E{U_e(n) U_e^T(n)} determine the rates of convergence for the LMS algorithm. A large eigenvalue disparity in R engenders slow convergence in the LMS algorithm, and ill-conditioning, with the attendant numerical instabilities, in the RLS algorithm. For adaptive IIR filters, compared to the FIR case, the presence of d(n) in U_e(n) tends to increase the eigenvalue disparity, so that slower convergence is typically observed for these algorithms.
Of importance is the value of the convergence points for the LMS and RLS algorithms with respect to the modeling assumptions of the system identification configuration of Fig. 23.1. For simplicity, let us first assume that the adaptive filter is capable of modeling the unknown system exactly; that is, H_u(q^{-1}) = 0. One may readily show that the parameter vector W that minimizes the mean-square equation error (or, equivalently, the asymptotic least squares equation error, given ergodic stationary signals) is

    W = E{U_e(n) U_e^T(n)}^{-1} E{U_e(n) d(n)}     (23.22)
      = [ E{U_m(n) U_m^T(n)} + E{V(n) V^T(n)} ]^{-1} [ E{U_m(n) y_m(n)} + E{V(n) v(n)} ] .     (23.23)

Clearly, if v(n) ≡ 0, the W so obtained must equal W_opt, so that we have

    W_opt = E{U_m(n) U_m^T(n)}^{-1} E{U_m(n) y_m(n)} .     (23.24)

By comparing (23.23) and (23.24), we can easily see that when v(n) ≢ 0, W ≠ W_opt. That is, the parameter estimates provided by (23.18) through (23.20) are, in general, biased from the desired values, even when the noise term v(n) is uncorrelated.
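The bias is easy to observe numerically. The following sketch, with hypothetical first-order system values, solves a sample version of the normal equations in (23.22); the estimated denominator coefficient is pulled toward zero by the E{V(n)V^T(n)} term in (23.23).

```python
import numpy as np

# Fit a first-order model (N = 1, M = 0) to data from a matching system,
# with white output noise v(n); all numbers are assumed example values.
rng = np.random.default_rng(1)
a1_opt, b0_opt, sigma_v = -0.9, 1.0, 0.5
x = rng.standard_normal(200_000)
y_m = np.zeros_like(x)
for n in range(1, len(x)):                 # y_m(n) = b0 x(n) - a1 y_m(n-1)
    y_m[n] = b0_opt * x[n] - a1_opt * y_m[n - 1]
d = y_m + sigma_v * rng.standard_normal(len(x))   # sufficient order + noise

U = np.column_stack([x[1:], -d[:-1]])      # rows are U_e(n) = [x(n), -d(n-1)]
W = np.linalg.solve(U.T @ U, U.T @ d[1:])  # sample version of (23.22)
print(W)  # b_0 is near 1.0, but a_1 is biased from -0.9 toward 0
```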
What effect on adaptive filter performance does this bias impose? Since the parameters that minimize the mean-square equation error are not the same as W_opt, the values that minimize the mean-square output error, the adaptive filter performance will not be optimal. Situations can arise in which this bias is severe, with correspondingly significant degradation of performance.
Furthermore, a critical issue with regard to the parameter bias is the input-output stability of the resulting IIR filter. Because the equation error is formed as A(q^{-1}) d(n) - B(q^{-1}) x(n), a difference of two FIR filtered signals, there are no built-in constraints to keep the roots of A(q^{-1}) within the unit circle in the complex plane. Clearly, if an unstable polynomial results from the adaptation, then the filter output y(n) can grow unboundedly in operational mode, so that the adaptive filter fails. An example of such a situation is given in [25]. An important feature of this example is that the adaptive filter is capable of precisely modeling the unknown system, and that interactions of the noise process within the algorithm are all that is needed to destabilize the resulting model.
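Monitoring for this failure mode is straightforward: after each update (or block of updates) one can test whether the adapted denominator is stable, as in this sketch of a root check (the margin parameter is our own addition).

```python
import numpy as np

def denominator_stable(a, margin=0.0):
    """True if all roots of A(q^-1) = 1 + a_1 q^-1 + ... + a_N q^-N lie
    strictly inside the unit circle (optionally with a safety margin).

    a : array [a_1, ..., a_N] of adapted denominator coefficients.
    """
    roots = np.roots(np.concatenate(([1.0], np.asarray(a))))
    return bool(np.all(np.abs(roots) < 1.0 - margin))
```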
Nonetheless, under certain operating conditions, this kind of instability can be shown not to occur, as described in the following.

Property 2: [18] Consider the adaptive filter depicted in Fig. 23.1, where y(n) is given by (23.2). If x(n) is an autoregressive process of order no more than N, and v(n) is independent of x(n) and of finite variance, then the adaptive filter parameters minimizing the mean-square equation error E{e_e^2(n)} are such that A(q^{-1}) is stable.
For instance, if x(n) is an uncorrelated signal, then the convergence point of the equation error algorithms corresponds to a stable filter.

To summarize, for LMS and RLS adaptation in an equation error setting, we have guaranteed global convergence, but bias in the presence of additive noise even in the exact modeling case, and an estimated model guaranteed to be stable only under a limited set of conditions.
23.2.2 Instrumental Variable Algorithms
A number of different approaches to adaptive IIR filtering have been proposed with the intention of mitigating the undesirable bias properties of the LMS- and RLS-based equation error adaptive IIR filters. One such approach, still within the equation error context, is the instrumental variables (IV) method. Observe that the bias problem illustrated above stems from the presence of v(n) in both U_e(n) and e_e(n) in the update terms in (23.18) and (23.19), so that second order terms in v(n) then appear in (23.23). This simultaneous presence creates, in expectation, a nonzero, noise-dependent driving term to the adaptation. The IV algorithm approach addresses this by replacing U_e(n) in these algorithms with a vector U_iv(n) of instrumental variables that are independent of v(n). If U_iv(n) remains correlated with U_m(n), the noiseless regressor, convergence to unbiased filter parameters is then possible. An RLS-style IV algorithm takes the form

    W(n+1) = W(n) + µ(n) P(n) U_iv(n) e_e(n) ,     (23.25)

    P(n) = (1/λ(n)) [ P(n-1) - ( µ(n) P(n-1) U_iv(n) U_e^T(n) P(n-1) ) / ( λ(n) + µ(n) U_e^T(n) P(n-1) U_iv(n) ) ] ,     (23.26)
with λ(n) = 1 - µ(n). Common choices for λ(n) are to set λ(n) ≡ λ_0, a fixed constant in the range 0 < λ_0 < 1 and usually chosen between 0.9 and 0.99, or to choose µ(n) = 1/n and λ(n) = 1 - µ(n). As with RLS methods, P(0) = γI with γ a large, positive number. The vector U_iv(n) is typically chosen as
    U_iv(n) = [x(n) ... x(n-M)  -z(n-1) ... -z(n-N)]^T ,     (23.27)

with either

    z(n) = -x(n-M)   or   z(n) = ( B̄(q^{-1}) / Ā(q^{-1}) ) x(n) .     (23.28)
In the first case, U_iv(n) is then simply an extended regressor in the input x(n), while the second choice may be viewed as a regressor parallel to U_m(n), with z(n) playing the role of y_m(n). For this choice, one may think of Ā(q^{-1}) and B̄(q^{-1}) as fixed filters chosen to approximate A_opt(q^{-1}) and B_opt(q^{-1}), but the exact choice of Ā(q^{-1}) and B̄(q^{-1}) is not critical to the qualitative behavior of the algorithm. In both cases, note that U_iv(n) is independent of v(n), since d(n) is not employed in its construction.
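As a sketch, the two instrument choices in (23.27) and (23.28) can be realized as follows (zero initial conditions assumed; for the second choice, z would instead be x filtered through the fixed B̄/Ā):

```python
import numpy as np

def iv_regressor(x, z, n, M, N):
    """U_iv(n) of (23.27): current/past inputs plus negated, delayed z."""
    xs = [x[n - i] if n - i >= 0 else 0.0 for i in range(M + 1)]
    zs = [-z[n - i] if n - i >= 0 else 0.0 for i in range(1, N + 1)]
    return np.array(xs + zs)

def z_delayed(x, M):
    """First choice in (23.28): z(n) = -x(n - M), so the last N entries of
    U_iv(n) become the further-delayed inputs x(n-M-1), ..., x(n-M-N)."""
    z = np.zeros_like(x)
    z[M:] = -x[:len(x) - M]
    return z
```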
The convergence of the IV algorithm is described by the following property, derived in [15].

Property 3: In the sufficient order case with x(n) PE of order at least N + M + 1, the IV algorithm in (23.25) and (23.26), with U_iv(n) chosen according to (23.27) or (23.28), causes E{W(n)} to converge to W_∞ = W_opt.
There are a few additional technical conditions on A_opt(q^{-1}), B_opt(q^{-1}), Ā(q^{-1}), and B̄(q^{-1}) that are required for the property to hold. These conditions will be satisfied in almost all circumstances; for details, the reader is referred to [15]. This convergence property demonstrates that the IV algorithm does in fact achieve unbiased parameter estimates in the sufficient order case.
In the undermodeled case, little has been said regarding the behavior and performance of the IV algorithm. A convergence point W_∞ must satisfy E{U_iv(n) (d(n) - U_e^T(n) W_∞)} = 0, but no characterization of such points exists if N and M are not of sufficient order. Furthermore, it is possible for the IV algorithm to converge to a point such that 1/A(q^{-1}) is unstable [9].
Notice that (23.25) and (23.26) are similar in form to the RLS algorithm. One may postulate an “LMS-style” IV algorithm as

    W(n+1) = W(n) + µ(n) U_iv(n) e_e(n) ,     (23.29)

which is computationally much simpler than the “RLS-style” IV algorithm of (23.25) and (23.26). However, the guarantee of convergence to W_opt in the sufficient order case, available for the RLS-style algorithm, is now complicated by an additional requirement on U_iv(n) for convergence of the algorithm in (23.29). In particular, all eigenvalues of the matrix

    E{ U_iv(n) U_e^T(n) }     (23.30)

must lie strictly in the right half of the complex plane. Since the properties of U_e(n) depend on the unknown relationship between x(n) and d(n), one is generally unable to guarantee a priori satisfaction of such conditions. This situation has parallels with the stability-theory approach to output error algorithms, as discussed later in this section.
Summarizing the IV algorithm properties, we have that in the sufficient order case, the RLS-style IV algorithm is guaranteed to converge to unbiased parameter values. However, an understanding and characterization of its behavior in the undermodeled case is yet incomplete, and the IV algorithm may produce unstable filters.
23.2.3 Equation Error Algorithms with Unit Norm Constraints
A different approach to mitigating the parameter bias in equation error methods arises as follows. Consider modifying the equation error of (23.17) to

    e_e(n) = A(q^{-1}, n) d(n) - B(q^{-1}, n) x(n) ,     (23.31)

where now

    A(q^{-1}, n) = a_0(n) + a_1(n) q^{-1} + ... + a_N(n) q^{-N} ,     (23.32)

and allowing for adaptation of the new parameter a_0(n). One can view the equation error algorithms that we have already discussed as adapting the coefficients of this version of A(q^{-1}, n), but with a monic constraint that imposes a_0(n) = 1. Recently, several algorithms have been proposed that consider instead equation error methods with a unit norm constraint. In these schemes, one adapts W(n) and a_0(n) subject to the constraint

    a_0^2(n) + a_1^2(n) + ... + a_N^2(n) = 1 .     (23.33)
Property 4: [18] Consider the adaptive filter in Fig. 23.1 with A(q^{-1}, n) given by (23.32), with v(n) an uncorrelated signal and with H_u(q^{-1}) = 0 (the sufficient order case). Then the parameter values W and a_0 that minimize E{e_e^2(n)} subject to the unit norm constraint (23.33) satisfy W/a_0 = W_opt.
That is, the parameter estimates are unbiased in the sufficient order case with uncorrelated output noise. Note that normalizing the coefficients in W by a_0 recovers the monic character of the denominator for W_opt:

    B(q^{-1}) / A(q^{-1}) = ( b_0 + b_1 q^{-1} + ... + b_M q^{-M} ) / ( a_0 + a_1 q^{-1} + ... + a_N q^{-N} )     (23.34)
                          = ( (b_0/a_0) + (b_1/a_0) q^{-1} + ... + (b_M/a_0) q^{-M} ) / ( 1 + (a_1/a_0) q^{-1} + ... + (a_N/a_0) q^{-N} ) .     (23.35)
In the undermodeled case, we have the following.

Property 5: [18] Consider the adaptive filter in Fig. 23.1 with A(q^{-1}, n) given by (23.32). If x(n) is an autoregressive process of order no more than N, and v(n) is independent of x(n) and of finite variance, then the parameter values W and a_0 that minimize E{e_e^2(n)} subject to the unit norm constraint (23.33) are such that A(q^{-1}) is stable. Furthermore, at those minimizing parameter values, if x(n) is an uncorrelated input, then

    E{e_e^2(n)} ≤ σ_{N+1}^2 E{x^2(n)} + E{v^2(n)} ,     (23.36)

where σ_{N+1} is the (N+1)st Hankel singular value of H(z).
Notice that Property 5 is similar to Property 2, except that we have the added bonus of a bound on the mean-square equation error in terms of the Hankel singular values of H(q^{-1}). Note that the (N+1)st Hankel singular value of H(q^{-1}) is related to the achievable modeling error in an Nth order, reduced order approximation to H(q^{-1}) (see [19, Ch. 4] for details). This bound thus indicates that the optimal unit norm constrained equation error filter will in fact do about as well as can be expected with an Nth order filter. However, this adaptive filter will suffer, just as with the equation error approaches with the monic constraint on the denominator, from a possibly unstable denominator if the input x(n) is not an autoregressive process.
An adaptive algorithm for minimizing the mean-square equation error subject to the unit norm constraint can be found in [4]. The algorithm of [4] is formulated as a recursive total least squares algorithm using a two-channel, fast transversal filter implementation. The connection between total least squares and the unit norm constrained equation error adaptive filter implies that the correlation matrices embedded within the adaptive algorithm will be more poorly conditioned than the correlation matrices arising in the RLS algorithm. Consequently, convergence will be slower for the unit norm constrained approach than for the standard, monic constraint approach.
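The recursive total least squares filter of [4] is beyond the scope of a short example, but a batch sketch shows the structure of the unit norm constrained problem: eliminating the unconstrained numerator coefficients leaves a Rayleigh quotient in [a_0 ... a_N], minimized by an eigenvector. The function below is our own offline stand-in, not the algorithm of [4].

```python
import numpy as np

def unit_norm_ee_fit(x, d, M, N):
    """Batch sketch of the unit norm constrained equation error fit.

    Minimizes the sample mean square of
        e_e(n) = sum_i a_i d(n-i) - sum_i b_i x(n-i)           cf. (23.31)
    over ||[a_0 ... a_N]|| = 1, with b unconstrained.  Eliminating b for
    fixed a leaves a Rayleigh quotient, so the optimal a is the minimum
    eigenvector of a Schur complement -- the total least squares
    connection noted above.
    """
    n0 = max(M, N)
    D = np.column_stack([d[n0 - i:len(d) - i] for i in range(N + 1)])
    X = np.column_stack([x[n0 - i:len(x) - i] for i in range(M + 1)])
    Rdd, Rdx, Rxx = D.T @ D, D.T @ X, X.T @ X
    S = Rdd - Rdx @ np.linalg.solve(Rxx, Rdx.T)   # Schur complement
    eigvals, eigvecs = np.linalg.eigh(S)
    a = eigvecs[:, 0]                             # minimum-eigenvalue eigenvector
    if a[0] < 0:                                  # fix sign so that a_0 > 0
        a = -a
    b = np.linalg.solve(Rxx, Rdx.T @ a)           # optimal b for this a
    return a, b   # divide both by a[0] to recover the monic form, cf. (23.35)
```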