W. Kenneth Jenkins et al., "Transform Domain Adaptive Filtering," 2000 CRC Press LLC, <http://www.engnetbase.com>.
Transform Domain Adaptive Filtering
MIT Lincoln Laboratory

22.1 LMS Adaptive Filter Theory
22.2 Orthogonalization and Power Normalization
22.3 Convergence of the Transform Domain Adaptive Filter
22.4 Discussion and Examples
22.5 Quasi-Newton Adaptive Algorithms
     A Fast Quasi-Newton Algorithm • Examples
22.6 The 2-D Transform Domain Adaptive Filter
22.7 Block-Based Adaptive Filters
     Comparison of the Constrained and Unconstrained Frequency Domain Block-LMS Adaptive Algorithms • Examples and Discussion
References
One of the earliest works on transform domain adaptive filtering was published in 1978 by Dentino et al. [4], in which the concept of adaptive filtering in the frequency domain was proposed. Many publications have since appeared that further develop the theory and expand the current understanding of performance characteristics for this class of adaptive filters. In addition to the discrete Fourier transform (DFT), other orthogonal transforms such as the discrete cosine transform (DCT) and the Walsh-Hadamard transform (WHT) can also be used effectively as a means to improve the LMS algorithm without adding too much computational complexity. For this reason, the general term transform domain adaptive filtering is used in the following discussion to mean that the input signal is preprocessed by decomposing the input vector into orthogonal components, which are in turn used as inputs to a parallel bank of simpler adaptive subfilters. With an orthogonal transformation, the adaptation takes place in the transform domain, as it is possible to show that the adjustable parameters are indeed related to an equivalent set of time domain filter coefficients by means of the same transformation that is used for the real time processing [5, 14, 17].
A direct form FIR digital filter structure is shown in Fig. 22.1. The direct form requires N − 1 delays, N multiplications, and N − 1 additions for each output sample that is produced. The amount of hardware (as well as power) required to implement the direct form structure depends on the degree of hardware multiplexing that can be utilized within the speed demands of the application. A fully parallel implementation consisting of N delay registers, N multipliers, and a tree of two-input adders would be needed for very high-frequency applications. At the opposite end of the performance spectrum, a sequential implementation consisting of a length N delay line and a single time-multiplexed multiplier and accumulation adder would provide the cheapest (and slowest) implementation. This latter structure would be characteristic of a filter that is implemented in software on one of the many commercially available DSP chips.

FIGURE 22.1: The direct form adaptive filter structure.

Regardless of the hardware complexity that results from a particular implementation, the computational complexity of the filter is determined by the requirements of the algorithm and, as such, remains invariant with respect to different hardware structures. In particular, the computational complexity of the direct form FIR filter is O[N], since N multiplications and (N − 1) additions must be performed at each iteration. When designing an adaptive filter, it seems reasonable to seek an adaptive algorithm whose order of complexity is no greater than the order of complexity of the basic filter structure itself. This goal is achieved by the LMS algorithm, which is the major contributing factor to the enormous success of that algorithm. Extending this principle to 2-D adaptive filters implies that desirable 2-D adaptive algorithms have an order of complexity of O[N²], since a 2-D FIR direct form filter has O[N²] complexity inherent in its basic structure [11, 21].
The transform domain adaptive filter is a generalization of the LMS FIR structure, in which a linear transformation is performed on the input signal and each transformed "channel" is power normalized to improve the convergence rate of the adaptation process. The linear transform is characterized throughout the following discussions as a sliding window operator that consists of a transformation matrix multiplying an input vector [14]. At each iteration n the input vector includes one new input sample x(n) and N − 1 past input samples x(n − k), k = 1, ..., N − 1. As the window slides forward sample by sample, filtered outputs are produced continuously at each value of the index n.

Since the input transformation is represented by a matrix-vector product, it might appear that the computational complexity of the transform domain filter is at least O[N²]. However, many transformations can be implemented with fast algorithms that have complexities less than O[N²]. For example, the discrete Fourier transform can be implemented by the FFT algorithm, resulting in a complexity of O[N log₂ N] per iteration. Some transformations can be implemented recursively in a bank of parallel filters, resulting in a net complexity of O[N] per iteration. The main point to be made here is that the complexity of the transform domain filter typically falls between O[N] and O[N²], with the actual complexity depending on the specific algorithm that is used to compute the sliding window transform operator [17].
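To make the complexity remark concrete, the sketch below (an illustration of my own, not code from the chapter) updates a sliding-window DFT recursively so that each new sample costs O[N] operations instead of an O[N log₂ N] FFT recomputation. The window here is stored oldest-sample-first, which may differ from the newest-first vector x(n) defined above only by a fixed per-bin phase convention.

```python
import numpy as np

def sliding_dft_demo(x, N=8, n_steps=32):
    """Recursive sliding-window DFT: O(N) work per new sample,
    versus O(N log N) for recomputing an FFT at every iteration."""
    V = np.zeros(N, dtype=complex)            # current DFT of the window
    buf = np.zeros(N)                         # last N input samples, oldest first
    twiddle = np.exp(2j * np.pi * np.arange(N) / N)
    for n in range(n_steps):
        x_new, x_old = x[n], buf[0]
        V = twiddle * (V + x_new - x_old)     # per-bin recursive update
        buf = np.append(buf[1:], x_new)       # slide the window forward
        # sanity check against a direct FFT of the same window
        assert np.allclose(V, np.fft.fft(buf), atol=1e-8)
    return V

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sliding_dft_demo(rng.standard_normal(64))
```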
22.1 LMS Adaptive Filter Theory
The LMS algorithm is derived as an approximation to the steepest descent optimization strategy. The fact that the field of adaptive signal processing is based on an elementary principle from optimization theory suggests that more advanced adaptive algorithms can be developed by incorporating other results from the field of optimization [22]. This point of view recurs throughout this discussion, as concepts are borrowed from the field of optimization and modified for adaptive filtering as needed. In particular, one of the borrowed ideas that appears later is the quasi-Newton optimization strategy. It will be shown that transform domain adaptive filtering algorithms are closely related to quasi-Newton algorithms, but have computational complexity that is closer to the simple requirements of the LMS algorithm.
For a length N FIR filter with the input expressed as a column vector x(n) = [x(n), x(n − 1), ..., x(n − N + 1)]^T, the filter output y(n) is easily expressed as

y(n) = w^T(n) x(n),   (22.1)

where w(n) = [w_0(n), w_1(n), ..., w_{N−1}(n)]^T is the time varying vector of filter coefficients (tap weights), and the superscript "T" denotes vector transpose. The output error is formed as the difference between the filter output and a training signal d(n), i.e., e(n) = d(n) − y(n). Strategies for obtaining an appropriate d(n) vary from one application to another. In many cases the availability of a suitable training signal determines whether an adaptive filtering solution will be successful in a particular application. The ideal cost function is defined by the mean squared error (MSE) criterion, E[|e(n)|²]. The LMS algorithm is derived by approximating the ideal cost function by the instantaneous squared error, resulting in J_LMS(n) = |e(n)|². While this appears to be a rather crude approximation at first, it results in an unbiased estimator. In many applications the LMS algorithm is quite robust and is able to converge rapidly to a small neighborhood of the optimum Wiener solution.
The steepest descent optimization strategy is given by

w(n + 1) = w(n) − µ ∇E[|e|²](n),   (22.2)

where ∇E[|e|²](n) is the gradient of the cost function with respect to the coefficient vector w(n). When the gradient is formed using the LMS cost function J_LMS(n) = |e(n)|², the conventional LMS algorithm results:

w(n + 1) = w(n) + µ e(n) x(n).   (22.3)
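As a concrete illustration of Eqs. (22.1)–(22.3), the following sketch (my own code, not from the chapter) runs the direct form LMS filter in a simple system identification setup; the model order, step size, and test signals are illustrative choices.

```python
import numpy as np

def lms(x, d, N=8, mu=0.01):
    """Direct form LMS: y(n) = w^T(n) x(n), e(n) = d(n) - y(n),
    w(n+1) = w(n) + mu * e(n) * x(n)."""
    w = np.zeros(N)                        # tap weights
    e = np.zeros(len(x))
    for n in range(N - 1, len(x)):
        xn = x[n - N + 1:n + 1][::-1]      # [x(n), x(n-1), ..., x(n-N+1)]
        y = w @ xn                         # filter output, Eq. (22.1)
        e[n] = d[n] - y                    # output error
        w = w + mu * e[n] * xn             # LMS update, Eq. (22.3)
    return w, e

if __name__ == "__main__":
    # hypothetical system identification test: unknown 8-tap FIR model
    rng = np.random.default_rng(1)
    h = rng.standard_normal(8)
    x = rng.standard_normal(20000)         # white input: ideal conditioning
    d = np.convolve(x, h)[:len(x)]
    w, e = lms(x, d)
    print(np.max(np.abs(w - h)))           # small weight error after convergence
```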
Several properties of the LMS algorithm are worth noting:

1. Assume that all of the signals and filter variables are real-valued. The filter itself requires N multiplications and N − 1 additions to produce y(n) at each value of n. The coefficient update algorithm requires 2N multiplications and N additions, resulting in a total computational burden of 3N multiplications and 2N − 1 additions per iteration. Since N is generally much larger than the factor of three, the order of complexity of the LMS algorithm is O[N].

2. The cost function given for the LMS algorithm is a simplified form of the one used for the RLS algorithm. This implies that the LMS algorithm is a simplified version of the RLS algorithm, where averages are replaced by single instantaneous terms.

3. The (power normalized) LMS algorithm is also a simplified form of the transform domain adaptive filter, which results by setting the transform matrix equal to the identity matrix.

4. The LMS algorithm is also a simplified form of the Gauss-Newton optimization strategy, which introduces second order statistics (the input autocorrelation function) to accelerate the rate of convergence. In order to obtain the LMS algorithm from the Gauss-Newton algorithm, two approximations must be made: (i) the gradient must be approximated by the instantaneous error squared, and (ii) the inverse of the input autocorrelation matrix must be crudely approximated by the identity matrix.
These observations suggest that many of the seemingly distinct adaptive filtering algorithms that appear scattered about in the literature are indeed closely related, and can be considered to be members of a family whose hereditary characteristics have their origins in Gauss-Newton optimization theory [15, 16]. The different members of this family inherit their individual characteristics from approximations that are made on the pure Gauss-Newton algorithm at various stages of their derivations. However, after the individual derivations are complete and each algorithm is packaged in its own algorithmic form, the algorithms look considerably different from one another. Unless a conscious effort is made to reveal their commonality, the fact that they have evolved from common roots may be entirely obscured.
The convergence behavior of the LMS algorithm, as applied to a direct form FIR filter structure, is controlled by the autocorrelation matrix R_x of the input process, where

R_x ≡ E[x*(n) x^T(n)].   (22.4)

The eigenvalue spread of R_x is measured by the condition number, defined as κ = λ_max/λ_min, where λ_min is the minimum eigenvalue of R_x. Ideal conditioning occurs when κ = 1 (white noise); as this ratio increases, slower convergence results. The eigenvalue spread (condition number) depends on the spectral distribution of the input signal and can be shown to be related to the maximum and minimum values of the input power spectrum. From this line of reasoning it becomes clear that white noise is the ideal input signal for rapidly training an LMS adaptive filter. The adaptive process becomes slower and requires more computation for input signals that are more severely colored [6].

Convergence properties are reflected in the geometry of the MSE surface, which is simply the mean squared output error E[|e(n)|²] expressed as a function of the N adaptive filter coefficients in (N + 1)-space. An expression for the error surface of the direct form filter is

J(z) ≡ E[|e(n)|²] = J_min + z*^T R_x z,   (22.5)

with R_x defined in (22.4) and z ≡ w − w_opt, where w_opt is the vector of optimum filter coefficients in the sense of minimizing the mean squared error (w_opt is known as the Wiener solution).
An example of an error surface for a simple two-tap filter is shown in Fig. 22.2, in which x(n) was specified to be a colored noise input signal with a prescribed 2 × 2 autocorrelation matrix. Figure 22.2 shows three equal-error contours on the three dimensional surface. The term z*^T R_x z in Eq. (22.5) is a quadratic form that describes the bowl shape of the FIR error surface. When R_x is positive definite, the equal-error contours of the surface are hyperellipses (N dimensional ellipses) centered at the origin of the coefficient parameter space. Furthermore, the principal axes of these hyperellipses are the eigenvectors of R_x, and their lengths are proportional to the eigenvalues of R_x. Since the convergence rate of the LMS algorithm is inversely related to the ratio of the maximum to the minimum eigenvalues of R_x, large eccentricity of the equal-error contours implies slow convergence of the adaptive system. In the case of an ideal white noise input, R_x has a single eigenvalue of multiplicity N, so that the equal-error contours are hyperspheres [8].
FIGURE 22.2: Example of an error surface for a simple two-tap filter.
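As a quick numerical illustration of the role of the eigenvalue spread (my own example, using a hypothetical AR(1)-type coloring rather than the signal used for Fig. 22.2), the sketch below builds Toeplitz autocorrelation matrices for a white and a colored input and compares their condition numbers κ = λ_max/λ_min.

```python
import numpy as np

def condition_number(r):
    """kappa = lambda_max / lambda_min of the Toeplitz autocorrelation
    matrix R_x built from the lag values r[0], ..., r[N-1]."""
    N = len(r)
    R = r[np.abs(np.subtract.outer(np.arange(N), np.arange(N)))]
    eig = np.linalg.eigvalsh(R)
    return eig[-1] / eig[0]

if __name__ == "__main__":
    N = 8
    r_white = np.r_[1.0, np.zeros(N - 1)]         # white input: R_x = I, kappa = 1
    r_colored = 0.9 ** np.arange(N, dtype=float)  # hypothetical AR(1)-like coloring
    print(condition_number(r_white))              # 1.0
    print(condition_number(r_colored))            # much larger than 1: slow LMS convergence
```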
22.2 Orthogonalization and Power Normalization
The transform domain adaptive filter (TDAF) structure is shown in Fig. 22.3. The input x(n) and desired signal d(n) are assumed to be zero mean and jointly stationary. The input to the filter is a vector of N current and past input samples, defined in the previous section and denoted as x(n). This vector is processed by a unitary transform, such as the DFT. Once the filter order N is fixed, the transform is simply an N × N matrix T, which is in general complex, with orthonormal rows. The transformed outputs form a vector v(n) which is given by

v(n) = [v_0(n), v_1(n), ..., v_{N−1}(n)]^T = T x(n).   (22.6)

With an adaptive tap vector defined as

W(n) = [W_0(n), W_1(n), ..., W_{N−1}(n)]^T,   (22.7)

the filter output is given by

y(n) = W^T(n) v(n).   (22.8)

The instantaneous output error

e(n) = d(n) − y(n)   (22.9)
is then formed and used to update the adaptive filter taps using a modified form of the LMS algorithm:

W(n + 1) = W(n) + µ e(n) Λ⁻² v*(n),   (22.10)

where

Λ² ≡ diag[σ_0², σ_1², ..., σ_{N−1}²],   σ_i² = E[|v_i(n)|²].   (22.11)

FIGURE 22.3: The transform domain adaptive filter structure.

If σ_i² becomes too small due to an insufficient amount of energy in the i-th channel, the update mechanism becomes ill-conditioned due to a very large effective step size. In some cases the process will become unstable, and register overflow will cause the adaptation to catastrophically fail. So the algorithm given by (22.10) should have the update mechanism disabled for the i-th orthogonal channel if σ_i² falls below a critical threshold.

Alternatively, the transform domain algorithm may be stabilized by adding small positive constants ε to the diagonal elements of Λ² according to

Λ̂² = Λ² + ε I.   (22.12)

Then Λ̂² is used in place of Λ² in Eq. (22.10). For most input signals σ_i² ≫ ε, and the inclusion of the stabilization factors is transparent to the performance of the algorithm. However, whenever σ_i² ≈ ε, the stabilization terms begin to have a significant effect. Within this operating region the power in the channels will not be uniformly normalized, and the convergence rate of the filter will begin to degrade, but catastrophic failure will be avoided.
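A minimal sketch of the TDAF update of Eqs. (22.6)–(22.12), assuming a unitary DFT transform, an exponentially weighted running estimate of the channel powers σ_i², and ε-stabilized power normalization; the smoothing constant, step size, and test signals are illustrative choices of mine, not values from the chapter.

```python
import numpy as np

def tdaf_dft(x, d, N=8, mu=0.01, beta=0.99, eps=1e-6):
    """Transform domain adaptive filter: v(n) = T x(n), y(n) = W^T(n) v(n),
    W(n+1) = W(n) + mu * e(n) * conj(v(n)) / (sigma_i^2 + eps)."""
    T = np.fft.fft(np.eye(N)) / np.sqrt(N)    # unitary DFT matrix (orthonormal rows)
    W = np.zeros(N, dtype=complex)            # transform domain tap weights
    sigma2 = np.ones(N)                       # running channel power estimates
    e = np.zeros(len(x))
    for n in range(N - 1, len(x)):
        xn = x[n - N + 1:n + 1][::-1]         # [x(n), ..., x(n-N+1)]
        v = T @ xn                            # orthogonal channels, Eq. (22.6)
        sigma2 = beta * sigma2 + (1 - beta) * np.abs(v) ** 2   # power tracking
        y = np.real(W @ v)                    # filter output, Eq. (22.8)
        e[n] = d[n] - y
        W = W + mu * e[n] * np.conj(v) / (sigma2 + eps)        # Eqs. (22.10), (22.12)
    return W, e

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    h = rng.standard_normal(8)
    x = np.convolve(rng.standard_normal(20000), [1, 0.9, 0.5], mode="same")  # colored input
    d = np.convolve(x, h)[:len(x)]
    W, e = tdaf_dft(x, d)
    print(np.mean(e[-1000:] ** 2))            # small residual MSE after adaptation
```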
The motivation for using the TDAF adaptive system instead of a simpler LMS based system is to achieve rapid convergence of the filter's coefficients when the input signal is not white, while maintaining a reasonably low computational complexity requirement. In the following section this convergence rate improvement of the TDAF will be explained geometrically.
22.3 Convergence of the Transform Domain Adaptive Filter
In this section the convergence rate improvement of the TDAF is described in terms of the mean squared error surface. From Eqs. (22.4) and (22.6) it is found that R_v = T* R_x T^T, so that for the transform structure without power normalization Eq. (22.5) becomes

J(z) ≡ E[|e(n)|²] = J_min + z*^T [T* R_x T^T] z.   (22.13)

The difference between (22.5) and (22.13) is the presence of T in the quadratic term of (22.13). When T is a unitary matrix, its presence in (22.13) gives a rotation and/or a reflection of the surface. The eccentricity of the surface is unaffected by the transform, so the convergence rate of the system is unchanged by the transformation alone.
However, the signal power levels at the adaptive coefficients are changed by the transformation. Consider the intersection of the equal-error contours with the rotated axes: letting z = [0 ··· z_i ··· 0]^T, with z_i in the i-th position, Eq. (22.13) becomes

J(z) = J_min + σ_i² |z_i|².   (22.14)

If the equal-error contours are hyperspheres (the ideal case), then for a fixed value of the error J, (22.14) must give |z_i| = |z_j| for all i and j, since all points on a hypersphere are equidistant from the origin. When the filter input is not white, this will not hold in general. But since the power levels σ_i² are easily estimated, the rotated axes can be scaled to have this property. Let Λ⁻¹ ẑ = z, where Λ is defined in (22.10). Then the error surface of the TDAF, with transform T and including power normalization, is given by

J(ẑ) = J_min + ẑ*^T [Λ⁻¹ T* R_x T^T Λ⁻¹] ẑ.   (22.15)

The main diagonal entries of Λ⁻¹ T* R_x T^T Λ⁻¹ are all equal to one, so (22.14) becomes J(ẑ) − J_min = ẑ_i², which has the property described above.
Thus, the action of the TDAF system is to rotate the axes of the filter coefficient space using a
unitary rotation matrix T, and then to scale these axes so that the error surface contours become
approximately hyperspherical at the points where they can be easily observed, i.e., the points of intersection with the new (rotated) axes. Usually the actual eccentricity of the error surface contours is reduced by this scaling, and faster convergence is obtained.
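The two geometric claims above, that a unitary transform alone leaves the eigenvalue spread unchanged while the added power normalization produces a unit main diagonal and usually a smaller spread, can be checked numerically. The sketch below uses a hypothetical colored-input autocorrelation matrix of my own choosing and the unitary DFT as T.

```python
import numpy as np

def kappa(M):
    """Eigenvalue spread (condition number) of a Hermitian matrix."""
    eig = np.linalg.eigvalsh((M + M.conj().T) / 2)
    return eig[-1] / eig[0]

if __name__ == "__main__":
    N = 8
    # hypothetical colored-input autocorrelation, r(k) = 0.9^|k|
    R = 0.9 ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
    T = np.fft.fft(np.eye(N)) / np.sqrt(N)            # unitary DFT
    Rv = T.conj() @ R @ T.T                           # R_v = T* R_x T^T
    Lam_inv = np.diag(1.0 / np.sqrt(np.real(np.diag(Rv))))
    Rn = Lam_inv @ Rv @ Lam_inv                       # quadratic-form matrix of Eq. (22.15)
    assert np.allclose(np.diag(Rn), 1.0)              # unit main diagonal after scaling
    print(kappa(R))   # conditioning seen by the direct form LMS filter
    print(kappa(Rv))  # unchanged by the unitary transform alone
    print(kappa(Rn))  # reduced by the transform plus power normalization
```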
As a second example, transform domain processing is now added to the previous example, as illustrated in Figs. 22.4 and 22.5. The error surface of Fig. 22.4 was created by applying an (arbitrary) 2 × 2 transform T to the error surface shown in Fig. 22.2, which produces a clockwise rotation of the ellipsoidal contours so that the major and minor axes more closely align with the coordinate axes than they did without the transform. Power normalization was then applied using the normalization matrix Λ⁻¹ as shown in Fig. 22.5, which represents the transformed and power normalized error surface. Note that the elliptical contours after transform domain processing are nearly circular in shape, and in fact they would have been perfectly circular if the rotation of Fig. 22.4 had brought the contours into precise alignment with the coordinate axes. Perfect alignment did not occur in this example because T was not able to perfectly diagonalize the input autocorrelation matrix for this particular x(n). Since T is a fixed transform in the TDAF structure, it clearly cannot properly diagonalize R_x for an arbitrary x(n); hence the surface rotation (orthogonalization) will be less than perfect for most input signals.

FIGURE 22.4: Error surface for the TDAF with transform T.

FIGURE 22.5: Error surface with transform and power normalization.

It should be noted here that a well-known conventional algorithm called recursive least squares (RLS) is known to achieve near optimum convergence rates by forming an estimate of R_x⁻¹, the inverse of the autocorrelation matrix. This type of algorithm automatically adjusts to whiten any input signal, and it also varies over time if the input signal is a nonstationary process. Unfortunately, the computation required for the RLS algorithm is large and is not easily carried out in real time within the resource limitations of many practical applications. The RLS algorithm falls into the general class of quasi-Newton optimization techniques, which are thoroughly treated in numerous places throughout the literature.
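For comparison, a standard textbook form of the exponentially weighted RLS recursion (not code from this chapter) shows where the O[N²] per-iteration cost arises: maintaining P(n), a running estimate of R_x⁻¹.

```python
import numpy as np

def rls(x, d, N=8, lam=0.99, delta=100.0):
    """Exponentially weighted RLS: maintains P(n), an estimate of the inverse
    input autocorrelation matrix, at O(N^2) cost per sample."""
    w = np.zeros(N)
    P = delta * np.eye(N)                      # initial inverse correlation estimate
    e = np.zeros(len(x))
    for n in range(N - 1, len(x)):
        xn = x[n - N + 1:n + 1][::-1]
        k = P @ xn / (lam + xn @ P @ xn)       # gain vector
        e[n] = d[n] - w @ xn                   # a priori error
        w = w + k * e[n]                       # coefficient update
        P = (P - np.outer(k, xn @ P)) / lam    # inverse correlation update
    return w, e
```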
There are two different ways to interpret the mechanism that brings about the improved convergence rates achieved through transform domain processing [16]. The first point of view considers the combined operations of orthogonalization and power normalization to be the effective transformation Λ⁻¹T, an interpretation that is implied by Eq. (22.15). This line of thinking leads to an understanding of the transformed error surfaces as illustrated by example in Figs. 22.4 and 22.5, and leads to the logical conclusion that the faster learning rate is due to the conventional LMS algorithm operating on an improved error surface that has been rendered more properly oriented and more symmetrical via the transformation. While this point of view is useful in understanding the principles of transform domain processing, it is not generally implementable from a practical point of view. This is because for an arbitrary input signal, the power normalization factors that constitute the Λ⁻¹ part of the input transformation are not known a priori, and must be estimated after T is used to decompose the input signal into orthogonal channels.
The second point of view interprets the transform domain equations as operating on the transformed error surface (without power normalization) with a modified LMS algorithm in which the step sizes are adjusted differently in the various channels according to µ(n) = µΛ⁻², where µ(n) = diag[µ_i(n)] is a diagonal matrix that contains the step size for the i-th channel at location (i, i). The dependence of the µ_i(n)'s on the iteration (time) index n acknowledges that the step sizes are a function of the power normalization factors, which are updated in real time as part of the on-line algorithm. This suggests that the TDAF should be able to track nonstationary input statistics, within the limited abilities of the transformation T to orthogonalize the input and within the accuracy limits of the power normalization factors. Furthermore, when the input signal is white, all of the σ_i²'s are identical and each is equal to the power in the input signal. In this case the TDAF with power normalization becomes the conventional normalized LMS algorithm.
It is straightforward to show mathematically that the above two points of view are indeed compatible [10]. Let v̂(n) ≡ Λ⁻¹Tx(n) = Λ⁻¹v(n), and let the filter tap vector be denoted Ŵ(n) when the matrix Λ⁻¹T is treated as the effective transformation. For the resulting filter to have the same response as the filter in Fig. 22.3 we must have

Ŵ^T(n) v̂(n) = W^T(n) v(n),   (22.16)

which is satisfied when

Ŵ(n) = Λ W(n).   (22.17)
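The compatibility of the two viewpoints can also be verified numerically. In the sketch below (my own check, assuming the channel powers, and hence Λ, are fixed and known rather than estimated on-line), plain LMS on v̂(n) = Λ⁻¹v(n) produces exactly the same error sequence as the power-normalized update (22.10), and the tap vectors satisfy Ŵ(n) = ΛW(n) at every step.

```python
import numpy as np

def equivalence_check(N=4, steps=200, mu=0.05, seed=3):
    """Viewpoint 1: plain LMS on v_hat = Lambda^-1 v with taps W_hat.
    Viewpoint 2: power-normalized update on v with taps W.
    With a fixed Lambda the two give identical errors and W_hat = Lambda W."""
    rng = np.random.default_rng(seed)
    T = np.fft.fft(np.eye(N)) / np.sqrt(N)     # unitary transform
    Lam = np.diag(rng.uniform(0.5, 2.0, N))    # fixed (assumed known) channel power factors
    Lam_inv = np.linalg.inv(Lam)
    W_hat = np.zeros(N, dtype=complex)
    W = np.zeros(N, dtype=complex)
    x = rng.standard_normal(steps + N)
    d = rng.standard_normal(steps + N)
    for n in range(N - 1, steps + N):
        xn = x[n - N + 1:n + 1][::-1]
        v = T @ xn
        v_hat = Lam_inv @ v
        e1 = d[n] - np.real(W_hat @ v_hat)     # viewpoint 1 error
        e2 = d[n] - np.real(W @ v)             # viewpoint 2 error
        assert np.isclose(e1, e2)
        W_hat = W_hat + mu * e1 * np.conj(v_hat)              # plain LMS on v_hat
        W = W + mu * e2 * (Lam_inv @ Lam_inv @ np.conj(v))    # Eq. (22.10), fixed Lambda
    assert np.allclose(W_hat, Lam @ W)         # Eq. (22.17)
    return True

if __name__ == "__main__":
    print(equivalence_check())
```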
22.4 Discussion and Examples
It is clear from the above development that the power estimates σ_i² are the optimum scale factors, as opposed to |σ_i| or some other statistic. Also, it is significant to note that no convergence rate improvement can be realized without power normalization. This is the same conclusion that was reached in [6], where the frequency domain LMS algorithm was analyzed with a constant convergence factor. From the error surface description of the TDAF's operation, it is seen that an optimal transform rotates the axes of the hyperellipsoidal equal-error contours into alignment with the coordinate axes. The prescribed power normalization scheme then gives the ideal hyperspherical contours, and the convergence rate becomes the same as if the input were white. The optimal transform is composed of the orthonormal eigenvectors of the input autocorrelation matrix and is known in the literature as the Karhunen-Loève transform (KLT). The KLT is signal dependent and usually cannot be easily computed in real time. Note that real signals have real KLTs, suggesting the use of real transforms in the TDAF (in contrast to complex transforms such as the DFT).
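To make the KLT statement concrete, the short sketch below (my own illustration, using a hypothetical colored-input autocorrelation) computes the KLT as the orthonormal eigenvectors of R_x and verifies that it diagonalizes R_x exactly, so that power normalization alone then gives ideal conditioning.

```python
import numpy as np

def klt(R):
    """Karhunen-Loeve transform of a process with autocorrelation matrix R:
    the rows are the orthonormal eigenvectors of R (real for a real process)."""
    eigvals, eigvecs = np.linalg.eigh(R)
    return eigvecs.T, eigvals

if __name__ == "__main__":
    N = 8
    R = 0.9 ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))  # colored input
    T, lam = klt(R)
    Rv = T @ R @ T.T                       # transform domain autocorrelation
    print(np.allclose(Rv, np.diag(lam)))   # the KLT diagonalizes R_x exactly
```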
Since the optimal transform for the TDAF is signal dependent, a universally optimal fixed parameter transform can never be found. It is also clear that once the filter order has been chosen, any unitary matrix of the correct dimensions is a possible choice for the transform; there is no need to restrict attention to classes of known transforms. In fact, if a prototype input power spectrum is available, its KLT can be constructed and used. One factor that must be considered in choosing a transform for real-time applications is computational complexity. In this respect, real transforms are superior to complex ones, transforms with fast algorithms are superior to those without, and transforms whose elements are all powers of two are attractive since only additions and shifts are needed to compute them. Throughout the literature the discrete Fourier transform (DFT), the discrete cosine transform (DCT), and the Walsh-Hadamard transform (WHT) have received considerable attention as possible candidates for use in the TDAF [14]. In spite of the fact that the DFT is a complex transform and not computationally optimal from that point of view, it is often used in practice because of the availability of efficient FFT algorithms.
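As one example of a multiplier-free transform, a compact fast Walsh-Hadamard transform is sketched below (a standard textbook routine, not code from the chapter); it uses only O[N log₂ N] additions and subtractions per length-N block.

```python
import numpy as np

def fwht(a):
    """In-place fast Walsh-Hadamard transform: additions and subtractions only,
    N must be a power of two."""
    a = np.array(a, dtype=float)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

if __name__ == "__main__":
    print(fwht([1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]))
```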
Figure 22.6 shows learning characteristics for computer-generated TDAF examples using six different orthogonal transforms to decorrelate the input signal. The examples presented are for system identification experiments, where the desired signal was derived by passing the input through an 8-tap FIR filter, which serves as the model system to be identified. Computer-generated white pseudo-noise, uncorrelated with the input signal, was added to the output of the model system, creating a −100 dB noise floor. The filter inputs were generated by filtering white pseudo-noise with a 32-tap linear phase FIR noise-coloring filter to produce an input autocorrelation eigenvalue ratio of 681. Experiments were then performed using the discrete Fourier transform (DFT), the discrete cosine transform (DCT), the Walsh-Hadamard transform (WHT), the discrete Hartley transform (DHT), and a specially designed computationally efficient "power-of-2" (PO2) transform, as listed in Fig. 22.6. The eigenvalue ratios that result from transform processing with each of these transforms are shown in Fig. 22.6, where it is seen that the PO2 transform with power normalization reduces the input condition number from 681 to 128, resulting in the most effective transform for this particular input coloring. All of the transforms used in this experiment are able to reduce the input condition number and greatly improve convergence rates, although some transforms are seen to be more effective than others for the coloring chosen for these examples.
22.5 Quasi-Newton Adaptive Algorithms
The dependence of the adaptive system's convergence rate on the input power spectrum can be reduced by using second-order statistics via the Gauss-Newton method [9, 10, 21]. The Gauss-Newton algorithm is well known in the field of optimization as one of the basic accelerated search techniques. In recent years it has also appeared in various forms in publications on adaptive filtering. In this section a brief introduction to quasi-Newton adaptive filtering methods is presented. When the quasi-Newton concept is integrated into the LMS algorithm, the resulting adaptive strategy is closely related to the transform domain adaptive filter, except that the transform is computed on-line as an approximation to the Hessian acceleration matrix. For FIR structures it turns out that the Hessian is equivalent to the input autocorrelation matrix inverse, and therefore the quasi-Newton LMS algorithm effectively implements a transform that adjusts to the statistics of the input signal and is capable of tracking slowly varying nonstationary input signals.
The basic Gauss-Newton coefficient update algorithm for an FIR adaptive filter is given by

w(n + 1) = w(n) − µ H(n) ∇E[e²](n),   (22.18)

where H(n) is the Hessian matrix and ∇E[e²](n) is the gradient of the cost function at iteration n. For an FIR adaptive filter with a stationary input the Hessian is equal to R_x⁻¹. If the gradient is estimated with the instantaneous error squared, as in the LMS algorithm, the result is

w(n + 1) = w(n) + µ e(n) R̂_x⁻¹(n) x(n),   (22.19)

where R̂_x(n) denotes a time varying estimate of the input autocorrelation matrix.
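A minimal sketch of the update (22.19), assuming R̂_x⁻¹(n) is maintained with a rank-one (matrix inversion lemma) recursion; the forgetting factor, step size, and initialization are illustrative choices of mine, not values prescribed by the chapter.

```python
import numpy as np

def quasi_newton_lms(x, d, N=8, mu=0.5, alpha=0.99, delta=100.0):
    """Quasi-Newton LMS: w(n+1) = w(n) + mu * e(n) * Rinv(n) @ x(n),
    with Rinv(n) a running estimate of the inverse input autocorrelation
    matrix, updated by the matrix inversion lemma."""
    w = np.zeros(N)
    Rinv = delta * np.eye(N)                  # initial inverse correlation estimate
    e = np.zeros(len(x))
    for n in range(N - 1, len(x)):
        xn = x[n - N + 1:n + 1][::-1]
        # rank-one update of Rinv for R(n) = alpha*R(n-1) + (1-alpha)*xn*xn^T
        g = Rinv @ xn
        Rinv = (Rinv - np.outer(g, g) * (1 - alpha)
                / (alpha + (1 - alpha) * (xn @ g))) / alpha
        e[n] = d[n] - w @ xn
        w = w + mu * e[n] * (Rinv @ xn)       # quasi-Newton update, Eq. (22.19)
    return w, e
```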