Adaptive Filtering
As discussed in previous chapters, filtering refers to a linear process designed to alter the spectral content of an input signal in a specified manner. In Chapters 5 and 6, we introduced techniques for designing and implementing FIR and IIR filters for given specifications. Conventional FIR and IIR filters are time-invariant: they perform linear operations on an input signal to generate an output signal based on fixed coefficients. Adaptive filters are time varying; filter characteristics such as bandwidth and frequency response change with time. Thus the filter coefficients cannot be determined when the filter is implemented. The coefficients of the adaptive filter are adjusted automatically by an adaptive algorithm based on incoming signals. This has the important effect of enabling adaptive filters to be applied in areas where the exact filtering operation required is unknown or is non-stationary.
In Section 8.1, we will review the concepts of random processes that are useful in the development and analysis of various adaptive algorithms. The most popular least-mean-square (LMS) algorithm will be introduced in Section 8.2, and its important properties will be analyzed in Section 8.3. Two widely used modified adaptive algorithms, the normalized and leaky LMS algorithms, will be introduced in Section 8.4. In this chapter, we introduce and analyze the LMS algorithm following the derivation and analysis given in [8]. In Section 8.5, we will briefly introduce some important applications of adaptive filtering. The implementation considerations will be discussed in Section 8.6, and the DSP implementations using the TMS320C55x will be presented in Section 8.7.
8.1 Introduction to Random Processes
A signal is called a deterministic signal if it can be described preciselyand be reproducedexactlyand repeatedly However, the signals encountered in practice are not necessarily
of this type A signal that is generated in a random fashion and cannot be described bymathematical expressions or rules is called a random (or stochastic) signal The signals
in the real world are often random in nature Some common examples of randomsignals are speech, music, and noises These signals cannot be reproduced and need to
be modeled and analyzed using statistical techniques We have briefly introducedprobabilityand random variables in Section 3.3 In this section, we will review theimportant properties of the random processes and introduce fundamental techniquesfor processing and analyzing them
A random process may be defined as a set of random variables. We associate a time function x(n) = x(n, A) with every possible outcome A of an experiment. Each time function is called a realization of the random process, or a random signal. The ensemble of all these time functions (called sample functions) constitutes the random process x(n). If we sample this process at some particular time n0, we obtain a random variable. Thus a random process is a family of random variables.

We may consider the statistics of a random process in two ways. If we fix the time n at n0 and consider the random variable x(n0), we obtain statistics over the ensemble. For example, E[x(n0)] is the ensemble average, where E[·] is the expectation operation introduced in Chapter 3. If we fix A and consider a particular sample function, we have a time function, and the statistics we obtain are temporal. For example, E[x(n, Ai)] is the time average. If the time average is equal to the ensemble average, we say that the process is ergodic. The property of ergodicity is important because in practice we often have access to only one sample function. Since we generally work only with temporal statistics, it is important to be sure that the temporal statistics we obtain are a true representation of the process as a whole.
8.1.1 Correlation Functions
For many applications, one signal is often compared with another in order to determine the similarity between the pair, and to determine additional information based on that similarity. Autocorrelation is used to quantify the similarity between two segments of the same signal. The autocorrelation function of the random process x(n) is defined as

rxx(n, k) = E[x(n)x(k)].

This function specifies the statistical relation of two samples at the different time indices n and k, and gives the degree of dependence between two random variables (n − k) units apart. For example, consider a digital white noise x(n) consisting of uncorrelated random variables with zero mean and variance σx². Its autocorrelation function is

rxx(n, k) = σx² δ(n − k).
The crosscorrelation function is used to measure the degree to which two different signals are similar. The crosscorrelation and crosscovariance functions between two random processes x(n) and y(n) are defined as

rxy(n, k) = E[x(n)y(k)]

and

γxy(n, k) = E{[x(n) − mx(n)][y(k) − my(k)]} = rxy(n, k) − mx(n)my(k). (8.1.5)

Correlation is a very useful DSP tool for detecting signals that are corrupted by additive random noise, measuring the time delay between two signals, determining the impulse response of a system (such as obtaining the room impulse response used in Section 4.5.2), and many others. Signal correlation is often used in radar, sonar, digital communications, and other engineering areas. For example, in CDMA digital communications, data symbols are represented with a set of unique key sequences. If one of these sequences is transmitted, the receiver compares the received signal with every possible sequence from the set to determine which sequence has been received. In radar and sonar applications, the received signal reflected from the target is a delayed version of the transmitted signal. By measuring the round-trip delay, one can determine the location of the target.
Both correlation and covariance functions are extensively used in analyzing random processes. In general, the statistical properties of a random signal, such as the mean, variance, autocorrelation, and autocovariance functions, are time-varying functions. A random process is said to be stationary if its statistics do not change with time. The most useful and relaxed form of stationarity is the wide-sense stationary (WSS) process. A random process is called WSS if the following two conditions are satisfied:
1. The mean of the process is independent of time. That is, mx(n) = E[x(n)] = mx, a constant.

2. The autocorrelation function depends only on the time difference (lag) k. That is, rxx(k) = E[x(n + k)x(n)].

For a WSS process, the autocorrelation function is an even function of the lag, rxx(k) = rxx(−k), and is bounded by |rxx(k)| ≤ rxx(0), where rxx(0) = E[x²(n)] is equal to the mean-squared value, or the power, in the random process.
In addition, if x(n) is a zero-mean random process, we have

rxx(0) = E[x²(n)] = σx².
Thus the autocorrelation function of a signal has its maximum value at zero lag. If x(n) has a periodic component, then rxx(k) will contain the same periodic component.

Example 8.1: Given the sequence x(n) = a^n u(n), 0 < a < 1, the autocorrelation function can be computed as

rxx(k) = a^|k| / (1 − a²).

Example 8.2: Consider the sinusoidal signal expressed as
x(n) = cos(ωn). Find the mean and the autocorrelation function of x(n).

(a) mx = E[cos(ωn)] = 0.

(b) rxx(k) = E[x(n + k)x(n)] = E[cos(ωn + ωk) cos(ωn)]
    = (1/2)E[cos(2ωn + ωk)] + (1/2)cos(ωk) = (1/2)cos(ωk).
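As a quick numerical check of this result, the following MATLAB fragment (a sketch, not taken from the book's software package) estimates rxx(k) of x(n) = cos(ωn) by time averaging over one long realization and compares it with (1/2)cos(ωk); the frequency, record length, and maximum lag are illustrative choices.

% Estimate the autocorrelation of x(n) = cos(w0*n) by time averaging
w0 = 0.2*pi;                                 % assumed test frequency
N  = 10000;                                  % assumed record length
n  = 0:N-1;
x  = cos(w0*n);
maxlag = 20;
rxx = zeros(1, maxlag+1);
for k = 0:maxlag
    rxx(k+1) = mean(x(1+k:N).*x(1:N-k));     % time-average estimate of E[x(n+k)x(n)]
end
disp([rxx(1:5); 0.5*cos(w0*(0:4))]);         % compare with 0.5*cos(w0*k)

The two rows printed by disp should agree to within the small error caused by the finite record length.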
The crosscorrelation function of two WSS processes x(n) and y(n) is defined as

rxy(k) = E[x(n + k)y(n)].
In practice, we only have one sample sequence {x(n)} available for analysis. As discussed earlier, a stationary random process x(n) is ergodic if all its statistics can be determined from a single realization of the process, provided that the realization is long enough. Therefore time averages are equal to ensemble averages when the record length is infinite. Since we do not have data of infinite length, the averages we compute differ from the true values. In dealing with a finite-duration sequence, the sample mean of x(n) is defined as

m̂x = (1/N) Σ_{n=0}^{N−1} x(n),

where N is the length of the sequence x(n). Similarly, the sample autocorrelation function can be estimated as

r̂xx(k) = (1/N) Σ_{n=0}^{N−1−k} x(n + k)x(n), k = 0, 1, ..., N − 1. (8.1.15)

Note that for a given sequence of length N, Equation (8.1.15) generates values for up to N different lags. In practice, we can only expect good results for lags of no more than 5 to 10 percent of the length of the signals.
The autocorrelation and crosscorrelation functions introduced in this section can be computed using the MATLAB function xcorr in the Signal Processing Toolbox. The crosscorrelation function rxy(k) of the two sequences x(n) and y(n) can be computed using the statement
c = xcorr(x, y);
where x and y are length-N vectors and the crosscorrelation vector c has length 2N − 1. The autocorrelation function rxx(k) of the sequence x(n) can be computed using the statement

c = xcorr(x);
See the Signal Processing Toolbox User's Guide for details.
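For instance, the round-trip delay estimation mentioned above can be sketched with xcorr. The fragment below is an illustrative example only; the signal, noise level, and true delay are assumed values, not taken from the text.

% Estimate the delay between a signal and its noisy, delayed echo using xcorr
N = 1000; D = 25;                            % assumed record length and true delay
x = randn(1, N);                             % transmitted (white) signal
y = [zeros(1, D) x(1:N-D)] + 0.5*randn(1, N);% delayed, noisy received signal
[c, lags] = xcorr(y, x);                     % crosscorrelation and corresponding lags
[~, imax] = max(c);                          % the peak occurs at the delay
Dhat = lags(imax);
fprintf('estimated delay = %d samples\n', Dhat);

The lag at which the crosscorrelation peaks is the estimate of the delay, which should equal the assumed value D for a reasonable signal-to-noise ratio.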
The correlation functions represent the time-domain description of the statistics of a random process. The frequency-domain statistics are represented by the power density spectrum (PDS), or autopower spectrum. The PDS is the DTFT (or the z-transform) of the autocorrelation function rxx(k) of a WSS signal x(n), defined as

Pxx(ω) = Σ_{k=−∞}^{∞} rxx(k) e^{−jωk}. (8.1.16)

The same windowing considerations as in (7.3.16) and (7.3.17) apply if the DFT is used in computing the PDS of random signals. Equation (8.1.16) implies that the autocorrelation function is the inverse DTFT of the PDS, which is expressed as

rxx(k) = (1/2π) ∫_{−π}^{π} Pxx(ω) e^{jωk} dω,

or, in terms of the z-transform,

rxx(k) = (1/2πj) ∮ Pxx(z) z^{k−1} dz.
The DTFT of the crosscorrelation function rxy(k) of two WSS signals x(n) and y(n) is given by

Pxy(ω) = Σ_{k=−∞}^{∞} rxy(k) e^{−jωk}.

This function is called the cross-power spectrum.
Example 8.3: The autocorrelation function of a WSS white random process can be defined as

rxx(k) = σx² δ(k),

where σx² is the variance of the process. Taking the DTFT of rxx(k), the corresponding PDS is

Pxx(ω) = σx²,

which is of constant value for all frequencies ω.
Consider a linear and time-invariant digital filter defined by the impulse response h(n), or the transfer function H(z). The input of the filter is a WSS random signal x(n) with the PDS Pxx(ω). As illustrated in Figure 8.1, the PDS of the filter output y(n) can be expressed as

Pyy(ω) = |H(ω)|² Pxx(ω), (8.1.28)

or

Pyy(z) = H(z)H(z⁻¹)Pxx(z), (8.1.29)
Figure 8.1 Linear filtering of random processes (input x(n) with PDS Pxx(ω); filter h(n) with frequency response H(ω); output y(n) with PDS Pyy(ω))
where H(ω) is the frequency response of the filter. Therefore the value of the output PDS at frequency ω depends on the squared magnitude response of the filter and the input PDS at the same frequency.
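The relationship (8.1.28) can be illustrated numerically. The following MATLAB sketch (an assumed example, not from the book's software package) filters unit-variance white noise with a simple FIR filter and compares an averaged FFT-based estimate of Pyy(ω) with |H(ω)|²; the filter coefficients, FFT size, and number of averaged segments are arbitrary choices.

% Verify Pyy(w) = |H(w)|^2 * Pxx(w) for white noise input (Pxx = 1)
b = [1 0.5 0.25];                            % assumed FIR filter coefficients of H(z)
N = 512; M = 200;                            % FFT size and number of averaged segments
Pyy = zeros(1, N);
for m = 1:M
    x = randn(1, N);                         % zero-mean, unit-variance white noise
    y = filter(b, 1, x);
    Pyy = Pyy + abs(fft(y, N)).^2 / N;       % periodogram of the output segment
end
Pyy = Pyy / M;                               % averaged output power spectrum
H = fft(b, N);                               % filter frequency response on the same grid
plot(0:N-1, Pyy, 0:N-1, abs(H).^2);          % the two curves should nearly overlap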
Other important relationships between x(n) and y(n) are

my = E[y(n)] = Σ_{l=−∞}^{∞} h(l) E[x(n − l)] = mx Σ_{l=−∞}^{∞} h(l), (8.1.30)

and

ryx(k) = Σ_{l=−∞}^{∞} h(l) rxx(k − l) = h(k) * rxx(k). (8.1.31)

Taking the z-transform of both sides of (8.1.31), we obtain

Pyx(z) = H(z)Pxx(z).
Example 8.4: Let the system shown in Figure 8.1 be a second-order FIR filter whose input x(n) is the zero-mean white noise given in Example 8.3. Using the I/O equation of the filter and the fact that rxx(k) = σx² δ(k), the output autocorrelation ryy(k) is nonzero only for lags no greater than the filter order, and each nonzero value is proportional to σx².
8.2 Adaptive Filters

Many practical applications involve the reduction of noise and distortion for extraction of information from the received signal. The signal degradation in some physical systems is time varying, unknown, or possibly both. Adaptive filters provide a useful approach for these applications. Adaptive filters modify their characteristics to achieve certain objectives, and usually accomplish the modification (adaptation) automatically. For example, consider a high-speed modem for transmitting and receiving data over telephone channels. It employs a filter called a channel equalizer to compensate for the channel distortion. Since dial-up communication channels have different characteristics on each connection and are time varying, the channel equalizers must be adaptive.

Adaptive filters have received considerable attention from many researchers over the past 30 years. Many adaptive filter structures and adaptation algorithms have been developed for different applications. This chapter presents the most widely used adaptive filter based on the FIR filter with the LMS algorithm. Adaptive filters in this class are relatively simple to design and implement, and are well understood with regard to convergence speed, steady-state performance, and finite-precision effects.
8.2.1 Introduction to Adaptive Filtering
An adaptive filter consists of two distinct parts: a digital filter to perform the desired signal processing, and an adaptive algorithm to adjust the coefficients (or weights) of that filter. A general form of adaptive filter is illustrated in Figure 8.2, where d(n) is a desired signal (or primary input signal), y(n) is the output of a digital filter driven by a reference input signal x(n), and an error signal e(n) is the difference between d(n) and y(n). The function of the adaptive algorithm is to adjust the digital filter coefficients to minimize the mean-square value of e(n).
Figure 8.2 Block diagram of adaptive filter

Figure 8.3 Block diagram of FIR filter for adaptive filtering
Therefore the filter weights are updated so that the error is progressively minimized on a sample-by-sample basis.

In general, there are two types of digital filters that can be used for adaptive filtering: FIR and IIR filters. The choice of an FIR or an IIR filter is determined by practical considerations. The FIR filter is always stable and can provide a linear-phase response. On the other hand, the IIR filter involves both zeros and poles. Unless they are properly controlled, the poles in the filter may move outside the unit circle and make the filter unstable. Because the filter is required to be adaptive, the stability problems are much more difficult to handle. Thus the FIR adaptive filter is widely used for real-time applications. The discussions in the following sections will be restricted to the class of adaptive FIR filters.
The most widely used adaptive FIR filter is depicted in Figure 8.3. Given a set of L coefficients, wl(n), l = 0, 1, ..., L − 1, and a data sequence {x(n), x(n − 1), ..., x(n − L + 1)}, the filter output signal is computed as

y(n) = Σ_{l=0}^{L−1} wl(n) x(n − l). (8.2.1)

Defining the input vector at time n as

x(n) = [x(n) x(n − 1) ... x(n − L + 1)]^T, (8.2.2)

and the weight vector at time n as

w(n) = [w0(n) w1(n) ... wL−1(n)]^T, (8.2.3)

the output signal y(n) in (8.2.1) can be expressed using the vector operation

y(n) = w^T(n)x(n) = x^T(n)w(n). (8.2.4)

The filter output y(n) is compared with the desired response d(n), which results in the error signal

e(n) = d(n) − y(n) = d(n) − w^T(n)x(n). (8.2.5)
In the following sections, we assume that d(n) and x(n) are stationary, and our objective is to determine the weight vector so that the performance (or cost) function is minimized.
8.2.2 Performance Function
The general block diagram of the adaptive filter shown in Figure 8.2 updates the coefficients of the digital filter to optimize some predetermined performance criterion. The most commonly used performance measurement is based on the mean-square error (MSE), defined as

ξ(n) = E[e²(n)]. (8.2.6)
For an adaptive FIR filter, ξ(n) will depend on the L filter weights w0(n), w1(n), ..., wL−1(n). The MSE function can be determined by substituting (8.2.5) into (8.2.6), and is expressed as

ξ(n) = E[d²(n)] − 2p^T w(n) + w^T(n)Rw(n), (8.2.7)

where p is the crosscorrelation vector defined as

p = E[d(n)x(n)] = [rdx(0) rdx(1) ... rdx(L − 1)]^T, (8.2.8)

and

R = E[x(n)x^T(n)] =
[ rxx(0)       rxx(1)       ...  rxx(L − 1)
  rxx(1)       rxx(0)       ...  rxx(L − 2)
  ...          ...          ...  ...
  rxx(L − 1)   rxx(L − 2)   ...  rxx(0)    ]   (8.2.10)

is the autocorrelation matrix of x(n).
Example 8.5: Given an optimum filter illustrated in the following figure:
The optimum filter wo minimizes the MSE cost function ξ(n). Vector differentiation of (8.2.7) gives wo as the solution to

Rwo = p. (8.2.12)

This system of equations defines the optimum filter coefficients in terms of two correlation functions: the autocorrelation function of the filter input and the crosscorrelation function between the filter input and the desired response. Equation (8.2.12) provides a solution to the adaptive filtering problem in principle. However, in many applications the signal may be non-stationary. The linear algebraic solution, wo = R⁻¹p, requires continuous estimation of R and p, a considerable amount of computation. In addition, when the dimension of the autocorrelation matrix is large, the calculation of R⁻¹ may present a significant computational burden. Therefore a more useful algorithm is obtained by developing a recursive method for computing wo, which will be discussed in the next section.
To obtain the minimum MSE, we substitute the optimum weight vector wo = R⁻¹p for w(n) in (8.2.7), resulting in

ξmin = E[d²(n)] − p^T wo. (8.2.13)
Since R is positive semidefinite, the quadratic form on the right-hand side of (8.2.7) indicates that any departure of the weight vector w(n) from the optimum wo would increase the error above its minimum value. In other words, the error surface is concave and possesses a unique minimum. This feature is very useful when we utilize search techniques in seeking the optimum weight vector. In such cases, our objective is to develop an algorithm that can automatically search the error surface to find the optimum weights that minimize ξ(n) using the input signal x(n) and the error signal e(n).

Example 8.6: Consider a second-order FIR filter with two coefficients w0 and
w1, the desired signal d(n) = √2 sin(nω0), n ≥ 0, and the reference signal x(n) = d(n − 1). Find wo and ξmin.

Similar to Example 8.2, we can obtain rxx(0) = E[x²(n)] = E[d²(n)] = 1, rxx(1) = cos ω0, rxx(2) = cos 2ω0, rdx(0) = rxx(1), and rdx(1) = rxx(2). From (8.2.12), we have

wo = R⁻¹p, where R = [ 1  cos ω0 ; cos ω0  1 ] and p = [ cos ω0  cos 2ω0 ]^T,

which gives wo = [ 2 cos ω0  −1 ]^T. From (8.2.13), the minimum MSE is

ξmin = E[d²(n)] − p^T wo = 1 − (2 cos²ω0 − cos 2ω0) = 0.
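The closed-form result of Example 8.6 can be verified numerically. In the following MATLAB sketch (an illustrative fragment; the value of ω0 is chosen arbitrarily), R and p are formed from the correlation values above and wo and ξmin are computed directly.

% Numerical check of Example 8.6: wo = R\p and xi_min = E[d^2] - p'*wo
w0 = 0.3*pi;                                 % assumed normalized frequency
R  = [1 cos(w0); cos(w0) 1];                 % autocorrelation matrix of x(n)
p  = [cos(w0); cos(2*w0)];                   % crosscorrelation vector between d(n) and x(n)
wo = R \ p;                                  % optimum weights, should equal [2*cos(w0); -1]
ximin = 1 - p.'*wo;                          % minimum MSE, should be (numerically) zero
disp(wo.'); disp(ximin);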
Equation (8.2.7) is the general expression for the performance function of an adaptive FIR filter with given weights. That is, the MSE is a function of the filter coefficient vector w(n). It is important to note that the MSE is a quadratic function because the weights appear only to the first and second degrees in (8.2.7). For each coefficient vector w(n), there is a corresponding (scalar) value of MSE. Therefore the MSE values associated with w(n) form an (L + 1)-dimensional space, which is commonly called the MSE surface, or the performance surface.

For L = 2, this corresponds to an error surface in a three-dimensional space. The height of ξ(n) corresponds to the power of the error signal e(n) that results from filtering the signal x(n) with the coefficients w(n). If the filter coefficients change, the power in the error signal will also change. This is indicated by the changing height of the surface above the (w0, w1) plane as the component values of w(n) are varied. Since the error surface is quadratic, a unique filter setting w(n) = wo will produce the minimum MSE, ξmin. In this two-weight case, the error surface is an elliptic paraboloid. If we cut the paraboloid with planes parallel to the (w0, w1) plane, we obtain concentric ellipses of constant mean-square error. These ellipses are called the error contours of the error surface.
Example 8.7: Consider a second-order FIR filter with two coefficients w0 and w1. The reference signal x(n) is a zero-mean white noise with unit variance. The desired signal is given as

d(n) = b0 x(n) + b1 x(n − 1).

Plot the error surface and error contours.
From Equation (8.2.10), we obtain

R = [ rxx(0)  rxx(1) ; rxx(1)  rxx(0) ] = [ 1  0 ; 0  1 ],

and from (8.2.8),

p = [ rdx(0)  rdx(1) ]^T = [ b0  b1 ]^T.

From (8.2.7), we get

ξ = E[d²(n)] − 2p^T w + w^T Rw = (b0² + b1²) − 2b0 w0 − 2b1 w1 + w0² + w1².

Letting b0 = 0.3 and b1 = 0.5, we have

ξ = 0.34 − 0.6w0 − w1 + w0² + w1².

The MATLAB script exam8_7a.m (in the software package) is used to plot the error surface shown in Figure 8.4(a), and the script exam8_7b.m is used to plot the error contours shown in Figure 8.4(b).
Figure 8.4 Performance surface and error contours, L = 2
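The scripts exam8_7a.m and exam8_7b.m mentioned above are supplied in the book's software package; the fragment below is only an assumed sketch of what such plotting code might look like for the MSE function ξ = 0.34 − 0.6w0 − w1 + w0² + w1² derived in Example 8.7 (the grid range and number of contour levels are arbitrary).

% Sketch of plotting the error surface and error contours of Example 8.7
[w0g, w1g] = meshgrid(-4:0.1:4, -4:0.1:4);          % grid of weight values
xi = 0.34 - 0.6*w0g - w1g + w0g.^2 + w1g.^2;        % MSE at each grid point
figure; mesh(w0g, w1g, xi);                         % error (performance) surface
xlabel('w_0'); ylabel('w_1'); zlabel('MSE');
figure; contour(w0g, w1g, xi, 20);                  % error contours (concentric ellipses)
xlabel('w_0'); ylabel('w_1');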
One of the most important properties of the MSE surface is that it has only one global minimum point. At that minimum point, the tangents to the surface must be zero. Minimizing the MSE is the objective of many current adaptive methods such as the LMS algorithm.
8.2.3 Method of Steepest Descent
As shown in Figure 8.4, the MSE of (8.2.7) is a quadratic function of the weights that can be pictured as a positive-concave hyperparabolic surface. Adjusting the weights to minimize the error involves descending along this surface until reaching the 'bottom of the bowl.' Various gradient-based algorithms are available. These algorithms are based on making local estimates of the gradient and moving downward toward the bottom of the bowl. The selection of an algorithm is usually decided by the speed of convergence, the steady-state performance, and the computational complexity.

The steepest-descent method reaches the minimum by following the direction in which the performance surface has the greatest rate of decrease. Specifically, the algorithm follows a path along the negative gradient of the performance surface. The steepest-descent method is an iterative (recursive) technique that starts from some initial (arbitrary) weight vector and improves with an increasing number of iterations. Geometrically, it is easy to see that with successive corrections of the weight vector in the direction of the steepest descent on the concave performance surface, we should arrive at its minimum, ξmin, at which point the weight vector components take on their optimum values. Let ξ(0) represent the value of the MSE at time n = 0 with an arbitrary choice of the weight vector w(0). The steepest-descent technique enables us to descend to the bottom of the bowl, wo, in a systematic way. The idea is to move on the error surface in the direction of the tangent at that point. The weights of the filter are updated at each iteration in the direction of the negative gradient of the error surface.

The mathematical development of the method of steepest descent is easily seen from the viewpoint of a geometric approach using the MSE surface. Each selection of a filter weight vector w(n) corresponds to only one point on the MSE surface, [w(n), ξ(n)]. Suppose that an initial filter setting w(0) on the MSE surface, [w(0), ξ(0)], is arbitrarily chosen. A specific orientation to the surface is then described using the directional derivatives of the surface at that point. These directional derivatives quantify the rate of change of the MSE surface with respect to the w(n) coordinate axes. The gradient of the error surface, ∇ξ(n), is defined as the vector of these directional derivatives.
The concept of steepest descent can be implemented as the following algorithm:

w(n + 1) = w(n) − (μ/2)∇ξ(n), (8.2.14)

where μ is a convergence factor (or step size) that controls stability and the rate of descent to the bottom of the bowl. The larger the value of μ, the faster the speed of descent. The vector ∇ξ(n) denotes the gradient of the error function with respect to w(n), and the negative sign increments the adaptive weight vector in the negative gradient direction. The successive corrections to the weight vector in the direction of the steepest descent of the performance surface should eventually lead to the minimum mean-square error ξmin, at which point the weight vector reaches its optimum value wo. When w(n) has converged to wo, that is, when it reaches the minimum point of the performance surface, the gradient ∇ξ(n) = 0. At this time, the adaptation in (8.2.14) stops and the weight vector stays at its optimum solution. The convergence can be viewed as a ball placed on the 'bowl-shaped' MSE surface at the point [w(0), ξ(0)]. If the ball is released, it will roll toward the minimum of the surface, initially rolling in the direction opposite to the gradient, which can be interpreted as rolling toward the bottom of the bowl.
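As an illustration of (8.2.14), the following MATLAB sketch (an assumed example reusing the R and p of Example 8.6, with an arbitrary frequency and step size) iterates the steepest-descent recursion from w(0) = 0; for the quadratic MSE surface of (8.2.7), the gradient is ∇ξ = 2(Rw − p).

% Steepest-descent iterations w(n+1) = w(n) - (mu/2)*grad, with grad = 2*(R*w - p)
w0f = 0.3*pi;                                % assumed frequency used to build R and p
R   = [1 cos(w0f); cos(w0f) 1];
p   = [cos(w0f); cos(2*w0f)];
mu  = 0.1;                                   % step size, must satisfy 0 < mu < 2/lambda_max
w   = [0; 0];                                % initial weight vector w(0)
for n = 1:200
    grad = 2*(R*w - p);                      % gradient of the MSE surface at w(n)
    w = w - (mu/2)*grad;                     % steepest-descent update (8.2.14)
end
disp(w.');                                   % should be close to [2*cos(w0f), -1]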
The steepest-descent algorithm of (8.2.14) requires knowledge of the true gradient ∇ξ(n) at each iteration, which is not available in practice. The LMS algorithm instead uses the instantaneous squared error, ξ̂(n) = e²(n), as an estimate of the MSE. Therefore the gradient estimate used by the LMS algorithm is

∇̂ξ(n) = 2[∇e(n)]e(n). (8.2.16)

Since e(n) = d(n) − w^T(n)x(n), we have ∇e(n) = −x(n), and the gradient estimate becomes

∇̂ξ(n) = −2x(n)e(n). (8.2.17)

Substituting this gradient estimate into the steepest-descent algorithm of (8.2.14), we have

w(n + 1) = w(n) + μx(n)e(n). (8.2.18)
This is the well-known LMS algorithm, or stochastic gradient algorithm. The algorithm is simple and does not require squaring, averaging, or differentiating. The LMS algorithm provides an alternative method for determining the optimum filter coefficients without explicitly computing the matrix inversion suggested in (8.2.12).

Widrow's LMS algorithm is illustrated in Figure 8.5 and is summarized as follows (a MATLAB sketch of these steps is given after step 4):
1. Determine L, μ, and w(0), where L is the order of the filter, μ is the step size, and w(0) is the initial weight vector at time n = 0.
2. Compute the adaptive filter output

y(n) = Σ_{l=0}^{L−1} wl(n) x(n − l). (8.2.19)
Figure 8.5 Block diagram of an adaptive filter with the LMS algorithm
3. Compute the error signal

e(n) = d(n) − y(n). (8.2.20)
4. Update the adaptive weight vector from w(n) to w(n + 1) by using the LMS algorithm

wl(n + 1) = wl(n) + μ x(n − l)e(n), l = 0, 1, ..., L − 1. (8.2.21)
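The four steps translate almost directly into code. The following MATLAB fragment is an illustrative sketch (the filter order, step size, number of iterations, and sinusoid frequency are assumed values, not taken from the book); it applies the LMS algorithm to the same prediction problem as Example 8.6.

% Minimal LMS adaptive filter following steps 1-4
L  = 2;  mu = 0.01;  w = zeros(L, 1);        % step 1: order, step size, and w(0)
Nit = 5000;  w0f = 0.3*pi;
nvec = 0:Nit-1;
d = sqrt(2)*sin(w0f*nvec);                   % desired signal
x = [0 d(1:end-1)];                          % reference signal x(n) = d(n-1)
xbuf = zeros(L, 1);                          % holds [x(n) x(n-1) ... x(n-L+1)]'
for n = 1:Nit
    xbuf = [x(n); xbuf(1:L-1)];              % update the signal buffer
    y = w.' * xbuf;                          % step 2: filter output, (8.2.19)
    e = d(n) - y;                            % step 3: error signal, (8.2.20)
    w = w + mu * xbuf * e;                   % step 4: LMS update, (8.2.21)
end
disp(w.');                                   % should approach [2*cos(w0f), -1]

For this choice of step size, the weights converge to approximately [2 cos ω0, −1], the optimum solution found in Example 8.6.

8.3 Performance Analysis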
A detailed discussion of the performance of the LMS algorithm is available in many textbooks. In this section, we present some important properties of the LMS algorithm, such as stability, convergence rate, and the excess mean-square error due to gradient estimation error.
8.3.1 Stability Constraint
As shown in Figure 8.5, the LMS algorithm involves the presence of feedback. Thus the algorithm is subject to the possibility of becoming unstable. From (8.2.18), we observe that the parameter μ controls the size of the incremental correction applied to the weight vector as we adapt from one iteration to the next. The mean weight convergence of the LMS algorithm from the initial condition w(0) to the optimum filter wo must satisfy

0 < μ < 2/λmax, (8.3.1)

where λmax is the largest eigenvalue of the autocorrelation matrix R defined in (8.2.10). Applying the stability constraint on μ given in (8.3.1) is difficult because of the computation of λmax when L is large.
In practical applications, it is desirable to estimate λmax using a simple method. From (8.2.10), we have

λmax ≤ Σ_{l=0}^{L−1} λl = tr[R] = L rxx(0) = L Px,

where tr[·] denotes the trace of the matrix and

Px = rxx(0) = E[x²(n)] (8.3.4)

denotes the power of x(n). Therefore setting

0 < μ < 2/(L Px) (8.3.5)

assures that (8.3.1) is satisfied.
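In practice the bound of (8.3.5) can be estimated directly from a block of reference data. A small illustrative MATLAB fragment (the filter order and the white test signal are assumptions):

% Estimate the upper bound on the LMS step size from the measured signal power
L  = 8;                                      % assumed filter order
x  = randn(1, 4096);                         % a record of the reference signal
Px = mean(x.^2);                             % power estimate Px = rxx(0)
mu_max = 2/(L*Px);                           % any 0 < mu < mu_max satisfies (8.3.5)
fprintf('choose 0 < mu < %.4f\n', mu_max);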
Equation (8.3.5) provides some important information on how to select μ, summarized as follows:

1. Since the upper bound on μ is inversely proportional to L, a small μ is used for large-order filters.

2. Since μ is made inversely proportional to the input signal power, weaker signals use a larger μ and stronger signals use a smaller μ. One useful approach is to normalize μ with respect to the input signal power Px. The resulting algorithm is called the normalized LMS algorithm, which will be discussed in Section 8.4.
8.3.2 Convergence Speed
In the previous section, we saw that w(n) converges to wo if the selection of μ satisfies (8.3.1). Convergence of the weight vector w(n) from w(0) to wo corresponds to the convergence of the MSE from ξ(0) to ξmin. Therefore convergence of the MSE toward its minimum value is a commonly used performance measurement in adaptive systems because of its simplicity. During adaptation, the squared error e²(n) is non-stationary as the weight vector w(n) adapts toward wo. The corresponding MSE can thus be defined only based on ensemble averages. A plot of the MSE versus time n is referred to as the learning curve for a given adaptive algorithm. Since the MSE is the performance criterion of LMS algorithms, the learning curve is a natural way to describe the transient behavior.
Each adaptive mode has its own time constant, which is determined by the overall adaptation constant μ and the eigenvalue λl associated with that mode. Overall convergence is clearly limited by the slowest mode. Thus the overall MSE time constant can be approximated as

τmse ≈ 1/(μ λmin). (8.3.6)

Because the upper bound of τmse is inversely proportional to λmin, a small λmin can result in a large time constant (i.e., a slow convergence rate). Unfortunately, if λmax is also very large, the selection of μ will be limited by (8.3.1) such that only a small μ can satisfy the stability constraint. Therefore, if λmax is very large and λmin is very small, from (8.3.6), the time constant can be very large, resulting in very slow convergence. As previously noted, the fastest convergence of the dominant mode occurs for μ = 1/λmax. Substituting this step size into (8.3.6) results in

τmse ≈ λmax/λmin.
For stationary input and sufficiently small μ, the speed of convergence of the algorithm is dependent on the eigenvalue spread (the ratio of the maximum to minimum eigenvalues) of the matrix R.

As mentioned in the previous section, the eigenvalues λmax and λmin are very difficult to compute. However, there is an efficient way to estimate the eigenvalue spread from the spectral dynamic range. That is,

λmax/λmin ≤ max_ω Pxx(ω) / min_ω Pxx(ω).
8.3.3 Excess Mean-Square Error
The steepest-descent algorithm of (8.2.14) requires knowledge of the gradient ∇ξ(n), which must be estimated at each iteration. The estimated gradient ∇̂ξ(n) introduces gradient estimation noise. After the algorithm converges, i.e., when w(n) is close to wo, the true gradient ∇ξ(n) = 0. However, the gradient estimate ∇̂ξ(n) ≠ 0. As indicated by the update Equation (8.2.14), perturbing the gradient will cause the weight vector w(n + 1) to move away from the optimum solution wo. Thus the gradient estimation noise prevents w(n + 1) from staying at wo in steady state. The result is that w(n) varies randomly about wo. Because wo corresponds to the minimum MSE, when w(n) moves away from wo it causes ξ(n) to be larger than its minimum value, ξmin, thus producing excess noise at the filter output.
The excess MSE, which is caused by random noise in the weight vector after convergence, is defined as the average increase of the MSE above ξmin. For the LMS algorithm, it can be approximated as

ξexcess ≈ (μ/2) L Px ξmin. (8.3.9)

This approximation shows that the excess MSE is directly proportional to μ. The larger the value of μ, the worse the steady-state performance after convergence. However, Equation (8.3.6) shows that a larger μ results in faster convergence. There is a design trade-off between the excess MSE and the speed of convergence.
The optimal step size μ is difficult to determine. Improper selection of μ might make the convergence speed unnecessarily slow or introduce excess MSE. If the signal is non-stationary and real-time tracking capability is crucial for a given application, use a larger μ. If the signal is stationary and convergence speed is not important, use a smaller μ to achieve better performance in steady state. In some practical applications, we can use a larger μ at the beginning of the operation for faster convergence, and then use a smaller μ to achieve better steady-state performance.
The excess MSE, ξexcess, in (8.3.9) is also proportional to the filter order L, which means that a larger L results in larger algorithm noise. From (8.3.5), a larger L implies a smaller μ, resulting in slower convergence. On the other hand, a large L also implies better filter characteristics, such as a sharp cutoff. There exists an optimum order L for any given application. The selection of L and μ also affects the finite-precision error, which will be discussed in Section 8.6.

In a stationary environment, the signal statistics are unknown but fixed. The LMS algorithm gradually learns the required input statistics. After convergence to a steady state, the filter weights jitter around the desired fixed values. The algorithm performance is determined by both the speed of convergence and the weight fluctuations in steady state. In the non-stationary case, the algorithm must continuously track the time-varying statistics of the input, and performance is more difficult to assess.
8.4 Modified LMS Algorithms
The LMS algorithm described in the previous section is the most widely used adaptive algorithm for practical applications. In this section, we present two modified algorithms that are direct variants of the basic LMS algorithm.
8.4.1 Normalized LMS Algorithm
The stability, convergence speed, and fluctuation of the LMS algorithm are governed by the step size μ and the reference signal power. As shown in (8.3.5), the maximum stable step size μ is inversely proportional to the filter order L and the power of the reference signal x(n). One important technique to optimize the speed of convergence while maintaining the desired steady-state performance, independent of the reference signal power, is known as the normalized LMS algorithm (NLMS). The NLMS algorithm is expressed as

w(n + 1) = w(n) + μ(n)x(n)e(n), (8.4.1)

where the time-varying step size is normalized by an estimate P̂x(n) of the reference signal power,

μ(n) = α / [L P̂x(n)], (8.4.2)

and α is a normalized step size with 0 < α < 2. The power estimate P̂x(n) can be computed recursively from the incoming reference samples, for example by exponentially weighting past values of x²(n). Two practical considerations are:
1. Choose P̂x(0) as the best a priori estimate of the reference signal power.
2. Since it is not desirable for the power estimate P̂x(n) to be zero or very small, a software constraint is required to ensure that μ(n) is bounded even if P̂x(n) becomes very small when the signal is absent for a long time. This can be achieved by modifying (8.4.2) as

μ(n) = α / [c + L P̂x(n)],

where c is a small positive constant. A MATLAB sketch of the normalized update follows this list.
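The fragment below is a minimal MATLAB sketch of the NLMS update applied to the prediction problem of Example 8.6. The normalized step size alpha, the smoothing constant beta of the recursive power estimator, and the safety constant c are illustrative assumptions consistent with (8.4.1), (8.4.2), and the modification above; they are not values from the text.

% Normalized LMS: the step size is scaled by a running estimate of the input power
L = 2;  alpha = 0.5;  beta = 0.01;  c = 1e-6;    % assumed parameters
Nit = 5000;  w0f = 0.3*pi;  nvec = 0:Nit-1;
d = sqrt(2)*sin(w0f*nvec);                       % desired signal
x = [0 d(1:end-1)];                              % reference signal x(n) = d(n-1)
w = zeros(L,1);  xbuf = zeros(L,1);
Px = 1;                                          % step 1: a priori estimate of the signal power
for n = 1:Nit
    xbuf = [x(n); xbuf(1:L-1)];
    Px   = beta*x(n)^2 + (1-beta)*Px;            % recursive power estimate
    mun  = alpha / (c + L*Px);                   % normalized step size mu(n)
    e    = d(n) - w.'*xbuf;
    w    = w + mun*xbuf*e;                       % NLMS coefficient update
end
disp(w.');                                       % should approach [2*cos(w0f), -1]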
8.4.2 Leaky LMS Algorithm

The leakage technique modifies the LMS weight update of (8.2.21) by multiplying each coefficient by a factor slightly less than one before the correction term is added. That is,

w(n + 1) = νw(n) + μx(n)e(n), (8.4.5)

where ν is the leakage factor with 0 < ν ≤ 1. It can be shown that leakage is the deterministic equivalent of adding low-level white noise.
Therefore this approach results in some degradation in adaptive filter performance. The value of the leakage factor is determined by the designer on an experimental basis as a compromise between robustness and loss of performance of the adaptive filter. The leakage factor introduces a bias on the long-term coefficient estimation. The excess error power due to the leakage is proportional to [(1 − ν)/μ]². Therefore (1 − ν) should be kept smaller than μ in order to maintain an acceptable level of performance. For fixed-point hardware realization, multiplication of each coefficient by ν, as shown in (8.4.5), can lead to the introduction of roundoff noise, which adds to the excess MSE. Therefore the leakage effects must be incorporated into the design procedure for determining the required coefficient and internal data wordlengths. The leaky LMS algorithm not only prevents unconstrained weight overflow, but also limits the output power in order to avoid nonlinear distortion.

8.5 Applications
The desirable features of an adaptive filter are the ability to operate in an unknown environment and to track time variations of the input signals, making it a powerful tool for DSP applications. The essential difference between the various applications of adaptive filtering lies in how the signals x(n), d(n), y(n), and e(n) are connected. There are four basic classes of adaptive filtering applications: identification, inverse modeling, prediction, and interference canceling.
8.5.1 Adaptive System Identification
System identification is an experimental approach to the modeling of a process or a plant. The basic idea is to measure the signals produced by the system and to use them to construct a model. The paradigm of system identification is illustrated in Figure 8.6, where P(z) is an unknown system to be identified and W(z) is a digital filter used to model P(z). By exciting both the unknown system P(z) and the digital model W(z) with the same excitation signal x(n), and measuring the output signals y(n) and d(n), we can determine the characteristics of P(z) by adjusting the digital model W(z) to minimize the difference between these two outputs. The digital model W(z) can be an FIR filter or an IIR filter.
Figure 8.6 Block diagram of adaptive system identification using the LMS algorithm
Adaptive system identification is a technique that uses an adaptive filter for the model W(z). This section presents the application of adaptive estimation techniques for direct system modeling. This technique has been widely applied in echo cancellation, which will be introduced in Sections 9.4 and 9.5. A further application of system modeling is to estimate various transfer functions in active noise control systems [8].

Adaptive system identification is a very important procedure that is used frequently in the fields of control systems, communications, and signal processing. The modeling of a single-input/single-output dynamic system (or plant) is shown in Figure 8.6, where x(n), which is usually white noise, is applied simultaneously to the adaptive filter and the unknown system. The output of the unknown system then becomes the desired signal, d(n), for the adaptive filter. If the input signal x(n) provides sufficient spectral excitation, the adaptive filter output y(n) will approximate d(n) in an optimum sense after convergence.

Identification could mean that a set of data is collected from the system and that a separate procedure is used to construct a model. Such a procedure is usually called off-line (or batch) identification. In many practical applications, however, the model is sometimes needed on-line during the operation of the system. That is, it is necessary to identify the model at the same time that the data set is collected. The model is updated at each time instant that a new data set becomes available. The updating is performed with a recursive adaptive algorithm such as the LMS algorithm.
As shown in Figure 8.6, it is desired to learn the structure of the unknown system from knowledge of its input x(n) and output d(n). If the unknown time-invariant system P(z) can be modeled using an FIR filter of order L, the estimation error is given as

e(n) = d(n) − y(n) = Σ_{l=0}^{L−1} [p(l) − wl(n)] x(n − l), (8.5.1)

where p(l) is the impulse response of the unknown plant. By choosing each wl(n) close to each p(l), the error will be made small. For white-noise input, the converse also holds: minimizing e(n) will force the wl(n) to approach p(l), thus identifying the system. That is,

wl(n) → p(l), l = 0, 1, ..., L − 1. (8.5.2)

The basic concept is that the adaptive filter adjusts itself, intending to cause its output to match that of the unknown system. When the difference between the physical system response d(n) and the adaptive model response y(n) has been minimized, the adaptive model approximates P(z). In actual applications, there will be additive noise present at the adaptive filter input, and so the filter structure will not exactly match that of the unknown system. When the plant is time varying, the adaptive algorithm has the task of keeping the modeling error small by continually tracking the time variations of the plant dynamics.
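The following MATLAB sketch illustrates the system identification configuration of Figure 8.6 for an assumed unknown FIR plant; the plant coefficients, filter order, step size, and data length below are illustrative choices, not values from the text.

% Adaptive system identification of an unknown FIR plant with the LMS algorithm
p  = [0.1 -0.3 0.5 0.2];                     % assumed unknown plant impulse response p(l)
L  = 4;  mu = 0.01;  Nit = 20000;
x  = randn(1, Nit);                          % white-noise excitation
d  = filter(p, 1, x);                        % plant output = desired signal d(n)
w  = zeros(L, 1);  xbuf = zeros(L, 1);
for n = 1:Nit
    xbuf = [x(n); xbuf(1:L-1)];
    e = d(n) - w.'*xbuf;                     % modeling error e(n) of (8.5.1)
    w = w + mu*xbuf*e;                       % LMS update
end
disp([w.'; p]);                              % w(l) should approach p(l), as in (8.5.2)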
8.5.2 Adaptive Linear Prediction
Linear prediction is a classic signal processing technique that provides an estimate of the value of an input process at a future time where no measured data is yet available. The technique has been successfully applied to a wide range of applications such as speech coding and separating signals from noise. As illustrated in Figure 8.7, the time-domain predictor consists of a linear prediction filter in which the coefficients wl(n) are updated with the LMS algorithm. The predictor output y(n) is expressed as

y(n) = Σ_{l=0}^{L−1} wl(n) x(n − Δ − l),

where the delayed input x(n − Δ) serves as the reference signal for the adaptive filter.

Now consider the adaptive predictor for enhancing an input of M sinusoids embedded in white noise, which is of the form
x(n) = s(n) + v(n) = Σ_{m=0}^{M−1} Am sin(ωm n + φm) + v(n), (8.5.5)

where v(n) is white noise with uniform noise power σv². In this application, the structure shown in Figure 8.7 is called the adaptive line enhancer, which provides an efficient means for the adaptive tracking of the sinusoidal components of a received signal x(n) and separates these narrowband signals s(n) from the broadband noise v(n). This technique has been shown to be effective in practical applications where there is insufficient a priori knowledge of the signal and noise parameters.
As shown in Figure 8.7, we want the highly correlated components of x(n) to appear in y(n). This is accomplished by adjusting the weights to minimize the expected mean-square value of the error signal e(n). This causes the adaptive filter W(z) to form a passband around the frequencies of the sinusoidal components of x(n).
Figure 8.7 Block diagram of an adaptive predictor (y(n) is the narrowband output)
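A minimal MATLAB sketch of the adaptive line enhancer is given below. It is an illustrative example only; the sinusoid frequency, noise level, filter length, step size, and delay Δ = 1 are assumptions, not values from the text.

% Adaptive line enhancer: separate a sinusoid s(n) from broadband noise v(n)
Nit = 20000;  nvec = 0:Nit-1;
s = sin(0.1*pi*nvec);                        % narrowband (sinusoidal) component
v = 0.5*randn(1, Nit);                       % broadband white noise
dsig = s + v;                                % noisy input, used as the desired signal d(n)
x = [0 dsig(1:end-1)];                       % reference input x(n) = d(n-1), delay of one sample
L = 32;  mu = 0.002;  w = zeros(L,1);  xbuf = zeros(L,1);
y = zeros(1, Nit);
for n = 1:Nit
    xbuf = [x(n); xbuf(1:L-1)];
    y(n) = w.'*xbuf;                         % narrowband output (enhanced sinusoid)
    e = dsig(n) - y(n);                      % broadband output
    w = w + mu*xbuf*e;                       % LMS update
end
plot(nvec(end-200:end), [s(end-200:end); y(end-200:end)]');  % compare s(n) and y(n)

After convergence, the filter output y(n) closely tracks the sinusoid s(n), while most of the broadband noise remains in the error signal.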