
Digital Signal Processing Handbook P21


DOCUMENT INFORMATION

Basic information

Title: Recursive Least-Squares Adaptive Filters
Authors: Ali H. Sayed, Thomas Kailath
Institution: University of California, Los Angeles
Field: Digital Signal Processing
Type: Chapter
Year: 2000
City: Los Angeles
Pages: 40
Size: 290.91 KB


Ali H. Sayed and Thomas Kailath, "Recursive Least-Squares Adaptive Filters," Digital Signal Processing Handbook, CRC Press LLC, 2000. <http://www.engnetbase.com>


Recursive Least-Squares Adaptive Filters

21.1 Array Algorithms
     Elementary Circular Rotations • Elementary Hyperbolic Rotations • Square-Root-Free and Householder Transformations • A Numerical Example

21.2 The Least-Squares Problem
     Geometric Interpretation • Statistical Interpretation

21.3 The Regularized Least-Squares Problem
     Geometric Interpretation • Statistical Interpretation

21.4 The Recursive Least-Squares Problem
     Reducing to the Regularized Form • Time Updates

21.5 The RLS Algorithm
     Estimation Errors and the Conversion Factor • Update of the Minimum Cost

21.6 RLS Algorithms in Array Forms
     Motivation • A Very Useful Lemma • The Inverse QR Algorithm • The QR Algorithm

21.7 Fast Transversal Algorithms
     The Prewindowed Case • Low-Rank Property • A Fast Array Algorithm • The Fast Transversal Filter

21.8 Order-Recursive Filters
     Joint Process Estimation • The Backward Prediction Error Vectors • The Forward Prediction Error Vectors • A Nonunity Forgetting Factor • The QRD Least-Squares Lattice Filter • The Filtering or Joint Process Array

21.9 Concluding Remarks

References

The central problem in estimation is to recover, to good accuracy, a set of unobservable parameters from corrupted data. Several optimization criteria have been used for estimation purposes over the years, but the most important, at least in the sense of having had the most applications, are criteria that are based on quadratic cost functions. The most striking among these is the linear least-squares criterion, which was perhaps first developed by Gauss (ca. 1795) in his work on celestial mechanics. Since then, it has enjoyed widespread popularity in many diverse areas as a result of its attractive computational and statistical properties. Among these attractive properties, the most notable are the facts that least-squares solutions:

• can be explicitly evaluated in closed forms;

• can be recursively updated as more input data is made available, and


• are maximum likelihood estimators in the presence of Gaussian measurement noise.

The aim of this chapter is to provide an overview of adaptive filtering algorithms that result when the least-squares criterion is adopted. Over the last several years, a wide variety of algorithms in this class has been derived. They all basically fall into the following main groups (or variations thereof): recursive least-squares (RLS) algorithms and the corresponding fast versions (known as FTF and FAEST), QR and inverse QR algorithms, least-squares lattice (LSL), and QR decomposition-based least-squares lattice (QRD-LSL) algorithms.

Table 21.1 lists these different variants and classifies them into order-recursive and fixed-order algorithms. The acronyms and terminology are not important at this stage and will be explained as the discussion proceeds. Also, the notation O(M) is used to indicate that each iteration of an algorithm requires of the order of M floating point operations (additions and multiplications). In this sense, some algorithms are fast (requiring only O(M)), while others are slow (requiring O(M²)). The value of M is the filter order that will be introduced in due time.

TABLE 21.1 Most Common RLS Adaptive Schemes

  Adaptive Algorithm      Order Recursive   Fixed Order   Cost per Iteration
  RLS                                            x            O(M²)
  QR and inverse QR                              x            O(M²)

Here we wish to stress that, apart from introducing the reader to the fundamentals of RLS filtering, one of our goals in this exposition is to present the different versions of the RLS algorithm in computationally convenient so-called array forms. In these forms, an algorithm is described as a sequence of elementary operations on arrays of numbers. Usually, a prearray of numbers has to be triangularized by a rotation, or a sequence of elementary rotations, in order to yield a postarray of numbers. The quantities needed to form the next prearray can then be read off from the entries of the postarray, and the procedure can be repeated. The explicit forms of the rotation matrices are not needed in most cases.

Such array descriptions are more truly algorithms in the sense that they operate on sets of numbers and provide other sets of numbers, with no explicit equations involved. The rotations themselves can be implemented in a variety of well-known ways: as a sequence of elementary circular or hyperbolic rotations, in square-root- and/or division-free forms, as Householder transformations, etc. These may differ in computational complexity, numerical behavior, and ease of hardware (VLSI) implementation. But, if preferred, explicit expressions for the rotation matrices can also be written down, thus leading to explicit sets of equations in contrast to the array forms.

For this reason, and although the different RLS algorithms that we consider here have already been derived in many different ways in earlier places in the literature, the derivation and presentation in this chapter are intended to provide an alternative unifying exposition that we hope will help a reader get a deeper appreciation of this class of adaptive algorithms.


We use small boldface letters to denote column vectors (e.g., w) and capital boldface letters to denote matrices (e.g., A). The symbol I_n denotes the identity matrix of size n × n, while 0 denotes a zero column. The symbol ᵀ denotes transposition. This chapter deals with real-valued data. The case of complex-valued data is essentially identical and is treated in many of the references at the end of this chapter.

Square-Root Factors

A symmetric positive-definite matrix A is one that satisfies A = Aᵀ and xᵀ A x > 0 for all nonzero column vectors x. Any such matrix admits a factorization (also known as eigen-decomposition) of the form A = U Σ Uᵀ, where U is an orthogonal matrix, namely a square matrix that satisfies U Uᵀ = Uᵀ U = I, and Σ is a diagonal matrix with real positive entries. In particular, note that A U = U Σ, which shows that the columns of U are the right eigenvectors of A and the entries of Σ are the corresponding eigenvalues.

Note also that we can write A = U Σ^{1/2} (Σ^{1/2})ᵀ Uᵀ, where Σ^{1/2} is a diagonal matrix whose entries are the (positive) square-roots of the diagonal entries of Σ. Since Σ^{1/2} is diagonal, (Σ^{1/2})ᵀ = Σ^{1/2}. If we introduce the matrix notation A^{1/2} = U Σ^{1/2}, then we can alternatively write A = (A^{1/2})(A^{1/2})ᵀ. This can be regarded as a square-root factorization of the positive-definite matrix A. Here, the notation A^{1/2} is used to denote one such square-root factor, namely the one constructed from the eigen-decomposition of A.

Note, however, that square-root factors are not unique. For example, we may multiply the diagonal entries of Σ^{1/2} by ±1's and obtain a new square-root factor for Σ and, consequently, a new square-root factor for A.

Also, given any square-root factor A^{1/2}, and any orthogonal matrix Θ (satisfying Θ Θᵀ = I), we can define a new square-root factor for A as A^{1/2} Θ since

( A^{1/2} Θ )( A^{1/2} Θ )ᵀ = A^{1/2} ( Θ Θᵀ ) ( A^{1/2} )ᵀ = A .

Hence, square-root factors are highly nonunique. We shall employ the notation A^{1/2} to denote any such square-root factor. They can be made unique, e.g., by insisting that the factors be symmetric or that they be triangular (with positive diagonal elements). In most applications, the triangular form is preferred. For convenience, we also write (A^{1/2})ᵀ = A^{T/2} and (A^{1/2})⁻¹ = A^{−1/2}, so that, for example, A = A^{1/2} A^{T/2}.
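As a quick numerical illustration (not part of the original chapter; it only uses standard numpy routines on a made-up matrix), the sketch below forms a square-root factor from the eigen-decomposition, compares it with the triangular Cholesky factor, and confirms that post-multiplying by any orthogonal Θ yields yet another valid factor.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random symmetric positive-definite matrix A.
X = rng.standard_normal((4, 4))
A = X @ X.T + 4 * np.eye(4)

# Square-root factor from the eigen-decomposition: A^{1/2} = U Sigma^{1/2}.
eigvals, U = np.linalg.eigh(A)
A_half_eig = U @ np.diag(np.sqrt(eigvals))

# Triangular square-root factor (Cholesky), unique with positive diagonal.
A_half_chol = np.linalg.cholesky(A)

# Any orthogonal Theta gives yet another factor A^{1/2} Theta.
Theta, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A_half_rot = A_half_chol @ Theta

for S in (A_half_eig, A_half_chol, A_half_rot):
    assert np.allclose(S @ S.T, A)   # each factor satisfies A = S S^T
```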

The array form is so important that it will be worthwhile to explain its generic form here. An array algorithm is described via rotation operations on a prearray of numbers, chosen to obtain a certain zero pattern in a postarray. Schematically, we write

[ prearray ] Θ = [ postarray ] ,

where Θ is any rotation matrix that triangularizes the prearray. In general, Θ is required to be a J-orthogonal matrix in the sense that it should satisfy the normalization Θ J Θᵀ = J, where J is a given signature matrix with ±1's on the diagonal and zeros elsewhere. The orthogonal case corresponds to J = I since then Θ Θᵀ = I.

A rotation Θ that transforms a prearray to triangular form can be achieved in a variety of ways: by using a sequence of elementary Givens and hyperbolic rotations, Householder transformations, or square-root-free versions of such rotations. Here we only explain the elementary forms. The other choices are discussed in some of the references at the end of this chapter.
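For the orthogonal case (J = I), the rotation Θ never has to be formed by hand: a library QR factorization of the transposed prearray already produces a triangular postarray. The sketch below illustrates this under that assumption only (it does not cover the J-orthogonal, hyperbolic case), on a made-up prearray with no particular meaning.

```python
import numpy as np

rng = np.random.default_rng(1)

# A generic prearray (2 rows, 3 columns); the values are arbitrary.
prearray = rng.standard_normal((2, 3))

# QR of the transpose: prearray^T = Q R, so prearray @ Q = R^T is lower triangular.
Q, R = np.linalg.qr(prearray.T, mode="complete")
postarray = prearray @ Q            # = R^T: the desired triangular postarray

assert np.allclose(Q @ Q.T, np.eye(3))             # Q plays the role of the rotation Theta
assert np.allclose(np.triu(postarray, k=1), 0.0)   # zero pattern above the diagonal
```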

21.1.1 Elementary Circular Rotations

An elementary 2 × 2 orthogonal rotation Θ (also known as Givens or circular rotation) takes a row vector [ a   b ] and rotates it to lie along the basis vector [ 1   0 ]. More precisely, it performs the transformation

[ a   b ] Θ = [ ±√(|a|² + |b|²)   0 ] .   (21.1)

An expression for Θ is given by

Θ = (1 / √(1 + ρ²)) [ 1   −ρ ; ρ   1 ] ,   ρ = b/a ,  a ≠ 0 .   (21.2)

The name circular rotation is justified by the effect of Θ on a vector: it rotates the vector along a circle of radius √(|a|² + |b|²), by an angle θ = tan⁻¹[ρ], in order to align it with the basis vector [ 1   0 ]. The rotation is in the clockwise (if b > 0) or counterclockwise (if b < 0) direction.
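A minimal sketch of such a circular rotation follows; the helper name and the sample numbers are made up, and the construction is the ρ-parameterized form quoted above. It annihilates the second entry of a row vector [a b] while preserving its Euclidean norm.

```python
import numpy as np

def circular_rotation(a: float, b: float) -> np.ndarray:
    """2x2 Givens (circular) rotation Theta with [a b] @ Theta = [±sqrt(a^2+b^2), 0]."""
    if b == 0.0:
        return np.eye(2)
    if a == 0.0:
        # Degenerate pivot: a plain quarter-turn swaps the two entries.
        return np.array([[0.0, -1.0], [1.0, 0.0]])
    rho = b / a
    return np.array([[1.0, -rho], [rho, 1.0]]) / np.sqrt(1.0 + rho**2)

row = np.array([3.0, -4.0])
Theta = circular_rotation(*row)
post = row @ Theta

assert np.allclose(Theta @ Theta.T, np.eye(2))        # Theta is orthogonal
assert np.isclose(abs(post[0]), np.linalg.norm(row))  # radius sqrt(a^2+b^2) preserved
assert np.isclose(post[1], 0.0)                       # second entry annihilated
```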

21.1.2 Elementary Hyperbolic Rotations

An elementary 2 × 2 hyperbolic rotation Θ takes a row vector [ a   b ] and rotates it to lie either along the basis vector [ 1   0 ] (if |a| > |b|) or along the basis vector [ 0   1 ] (if |a| < |b|). More precisely, it performs either of the transformations

[ a   b ] Θ = [ ±√(|a|² − |b|²)   0 ]   if |a| > |b| ,   (21.3)

[ a   b ] Θ = [ 0   ±√(|b|² − |a|²) ]   if |a| < |b| .   (21.4)

The quantity √(±(|a|² − |b|²)) that appears on the right-hand side of the above expressions is consistent with the fact that the prearray, [ a   b ], and the postarrays must have equal hyperbolic "norms." By the hyperbolic "norm" of a row vector xᵀ we mean the indefinite quantity xᵀ J x, which can be positive or negative. Here, J is the 2 × 2 signature matrix J = diag{1, −1}, and an expression for Θ is

Θ = (1 / √(1 − ρ²)) [ 1   −ρ ; −ρ   1 ] ,   (21.5)

where ρ = b/a when a ≠ 0 and |a| > |b|, or ρ = a/b when b ≠ 0 and |b| > |a|.

The hyperbolic rotation (21.5) can also be expressed in the alternative form

Θ = [ cosh θ   −sinh θ ; −sinh θ   cosh θ ] ,

with cosh θ = 1/√(1 − ρ²) and sinh θ = ρ/√(1 − ρ²). The name hyperbolic rotation for Θ is again justified by its effect on a vector; it rotates the original vector along the hyperbola of equation x² − y² = |a|² − |b|², by an angle θ determined by the inverse of the above hyperbolic cosine and/or sine parameters, θ = tanh⁻¹[ρ], in order to align it with the appropriate basis vector. Note also that the special case |a| = |b| corresponds to a row vector [ a   b ] with zero hyperbolic norm since |a|² − |b|² = 0. It is then easy to see that there does not exist a hyperbolic rotation that will rotate the vector to lie along the direction of one basis vector or the other.
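Similarly, a minimal sketch of the hyperbolic rotation for the case |a| > |b| is given below (helper name and numbers again made up); it checks J-orthogonality and the preservation of the hyperbolic norm a² − b².

```python
import numpy as np

def hyperbolic_rotation(a: float, b: float) -> np.ndarray:
    """2x2 hyperbolic rotation Theta with [a b] @ Theta = [±sqrt(a^2-b^2), 0], assuming |a| > |b|."""
    if abs(a) <= abs(b):
        raise ValueError("requires |a| > |b|; the case |a| = |b| admits no such rotation")
    rho = b / a
    return np.array([[1.0, -rho], [-rho, 1.0]]) / np.sqrt(1.0 - rho**2)

J = np.diag([1.0, -1.0])
row = np.array([5.0, 3.0])
Theta = hyperbolic_rotation(*row)
post = row @ Theta

assert np.allclose(Theta @ J @ Theta.T, J)                                  # Theta is J-orthogonal
assert np.isclose(post[0] ** 2 - post[1] ** 2, row[0] ** 2 - row[1] ** 2)   # hyperbolic norm preserved
assert np.isclose(post[1], 0.0)                                             # second entry annihilated
```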

21.1.3 Square-Root-Free and Householder Transformations

We remark that the above expressions for the circular and hyperbolic rotations involve square-root operations. In many situations, it may be desirable to avoid the computation of square-roots because it is usually expensive. For this and other reasons, square-root- and division-free versions of the above elementary rotations have been developed and constitute an attractive alternative.

One could also use orthogonal or J-orthogonal Householder reflections (for a given J) to simultaneously annihilate several entries in a row, e.g., to transform [ x   x   x   x ] directly to the form [ x′   0   0   0 ]. Combinations of rotations and reflections can also be used.

We omit the details here, but the idea is clear: there are many different ways in which a prearray of numbers can be rotated into a postarray of numbers.
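The sketch below illustrates the orthogonal (J = I) case only: a Householder reflection that maps a 1 × 4 row [x x x x] to the form [x′ 0 0 0] in a single step. The helper and the sample vector are made up; the J-orthogonal variant is not shown.

```python
import numpy as np

def householder_annihilate(x: np.ndarray) -> np.ndarray:
    """Orthogonal Householder reflection H with x @ H = [±||x||, 0, ..., 0]."""
    target = np.zeros_like(x)
    target[0] = -np.sign(x[0]) * np.linalg.norm(x) if x[0] != 0 else np.linalg.norm(x)
    v = x - target
    if np.allclose(v, 0.0):
        return np.eye(x.size)
    return np.eye(x.size) - 2.0 * np.outer(v, v) / (v @ v)

x = np.array([2.0, -1.0, 3.0, 0.5])
H = householder_annihilate(x)
post = x @ H

assert np.allclose(H @ H.T, np.eye(4))              # H is orthogonal (and symmetric)
assert np.allclose(post[1:], 0.0)                   # all trailing entries annihilated at once
assert np.isclose(abs(post[0]), np.linalg.norm(x))  # norm preserved
```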


This can be obtained, among several different possibilities, as follows. We start by annihilating the (1, 3) entry of the prearray (21.6) by pivoting with its (1, 1) entry. According to expression (21.2), the orthogonal transformation Θ₁ that achieves this result is an elementary rotation acting on the first and third columns, with ρ₁ equal to the ratio of the (1, 3) and (1, 1) entries and with scaling factor 1/√(1 + ρ₁²).

We now annihilate the (1, 2) entry of the resulting matrix in the above equation by pivoting with its (1, 1) entry. This requires that we choose Θ₂ in the same way, acting on the first and second columns, with scaling factor 1/√(1 + ρ₂²).

We finally annihilate the (2, 3) entry of the resulting matrix in (21.10) by pivoting with its (2, 2) entry. In principle this requires that we choose Θ₃ analogously, with scaling factor 1/√(1 + ρ₃²).

Alternatively, this last step could have been implemented without explicitly forming Θ₃. We simply replace the row vector [ −0.2557   0.1788 ], which contains the (2, 2) and (2, 3) entries of the prearray in (21.12), by the row vector [ ±√((−0.2557)² + (0.1788)²)   0.0000 ], which is equal to [ ±0.3120   0.0000 ]. We choose the positive sign in order to conform with our earlier convention that the diagonal entries of triangular square-root factors are taken to be positive. This yields the resulting postarray.

It will become clear throughout our discussion that the different adaptive RLS schemes can be described in array forms, where the necessary operations are elementary rotations as described above. Such array descriptions lend themselves rather directly to parallelizable and modular implementations. Indeed, once a rotation matrix is chosen, then all the rows of the prearray undergo the same rotation transformation and can thus be processed in parallel. Returning to the above example, where we started with the prearray A, we see that once the first rotation is determined, both rows of A are then transformed by it, and can thus be processed in parallel, and by the same functional (rotation) block, to obtain the desired postarray. The same remark holds for prearrays with multiple rows.
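The same three-step annihilation pattern can be mimicked in a few lines. In the sketch below the 2 × 3 prearray is made up (it is not the chapter's (21.6)); each elementary rotation is the ρ-parameterized form of (21.2), embedded so that it acts on the appropriate pair of columns, and the entries (1, 3), (1, 2), and (2, 3) are annihilated in turn.

```python
import numpy as np

def embedded_rotation(A: np.ndarray, row: int, piv: int, col: int) -> np.ndarray:
    """Orthogonal matrix acting on columns (piv, col) that zeroes A[row, col] by pivoting on A[row, piv]."""
    a, b = A[row, piv], A[row, col]
    rho = b / a
    c = 1.0 / np.sqrt(1.0 + rho**2)
    Theta = np.eye(A.shape[1])
    Theta[piv, piv], Theta[piv, col] = c, -c * rho
    Theta[col, piv], Theta[col, col] = c * rho, c
    return Theta

# A made-up 2x3 prearray to be reduced to lower-triangular form.
A = np.array([[0.8, 0.6, -0.4],
              [0.3, -0.2557, 0.1788]])

Theta1 = embedded_rotation(A, row=0, piv=0, col=2)    # annihilate the (1,3) entry
A1 = A @ Theta1
Theta2 = embedded_rotation(A1, row=0, piv=0, col=1)   # annihilate the (1,2) entry
A2 = A1 @ Theta2
Theta3 = embedded_rotation(A2, row=1, piv=1, col=2)   # annihilate the (2,3) entry
post = A2 @ Theta3

assert np.allclose(np.triu(post, k=1), 0.0)           # postarray is lower triangular
# Both rows undergo the same orthogonal rotations, so their norms are preserved.
assert np.allclose(np.linalg.norm(post, axis=1), np.linalg.norm(A, axis=1))
```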

21.2 The Least-Squares Problem

Now that we have explained the generic form of an array algorithm, we return to the main topic of this chapter and formulate the least-squares problem and its regularized version. Once this is done, we shall then proceed to describe the different variants of the recursive least-squares solution in compact array forms.

Let w denote a column vector of n unknown parameters that we wish to estimate, and consider a set of (N + 1) noisy measurements {d(i)} that are assumed to be linearly related to w via the additive noise model

d(j) = u_jᵀ w + v(j) ,

where the {u_j} are given column vectors. The (N + 1) measurements can be grouped together into a single matrix expression, or, more compactly, d = A w + v. Because of the noise component v, the observed vector d does not lie in the column space of the matrix A. The objective of the least-squares problem is to determine the vector in the column space of A that is closest to d in the least-squares sense.

More specifically, any vector in the range space of A can be expressed as a linear combination of its columns, say A ŵ for some ŵ. It is therefore desired to determine the particular ŵ that minimizes the distance between d and A ŵ,

min_{ŵ} ‖ d − A ŵ ‖² .   (21.14)


The resulting ŵ is called the least-squares solution and it provides an estimate for the unknown w. The term A ŵ is called the linear least-squares estimate (l.l.s.e.) of d.

21.2.1 Geometric Interpretation

The solution to (21.14) always exists and it follows from a simple geometric argument. The orthogonal projection of d onto the column span of A yields a vector d̂ that is the closest to d in the least-squares sense. This is because the resulting error vector (d − d̂) will be orthogonal to the column span of A. When A has full column rank, it is given by

d̂ = A ŵ = A ( AᵀA )⁻¹ Aᵀ d = P_A d ,   (21.15)

where P_A denotes the projector onto the range space of A. Figure 21.1 is a schematic representation of this geometric construction, where R(A) denotes the column span of A.

FIGURE 21.1: Geometric interpretation of the least-squares solution
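The geometric picture can be checked numerically. The sketch below (synthetic data and plain numpy calls, none of it from the chapter) verifies that A ŵ coincides with the projection P_A d and that the residual d − A ŵ is orthogonal to the columns of A.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic linear model d = A w + v with more measurements than unknowns.
N, n = 50, 4
A = rng.standard_normal((N, n))
w_true = rng.standard_normal(n)
d = A @ w_true + 0.1 * rng.standard_normal(N)

# Least-squares solution and the corresponding projection of d onto R(A).
w_hat, *_ = np.linalg.lstsq(A, d, rcond=None)
d_hat = A @ w_hat

P_A = A @ np.linalg.inv(A.T @ A) @ A.T      # projector onto the range space of A
assert np.allclose(d_hat, P_A @ d)          # A w_hat is the orthogonal projection of d
assert np.allclose(A.T @ (d - d_hat), 0.0)  # residual is orthogonal to the columns of A
```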

21.2.2 Statistical Interpretation

The least-squares solution also admits an important statistical interpretation. For this purpose, assume that the noise vector v is a realization of a vector-valued random variable that is normally distributed with zero mean and identity covariance matrix, written v ∼ N[0, I]. In this case, the observation vector d will be a realization of a vector-valued random variable that is also normally distributed with mean Aw and covariance matrix equal to the identity I. This is because the random vectors are related via the additive model d = Aw + v. The probability density function of the observation process d is then a Gaussian with mean Aw; maximizing it over w amounts to minimizing ‖d − Aw‖², so the least-squares solution is also the maximum likelihood estimate in this case.

21.3 The Regularized Least-Squares Problem

A more general optimization criterion that is often used instead of (21.14) is the following:

min_w [ ( w − w̄ )ᵀ Π₀⁻¹ ( w − w̄ ) + ‖ d − A w ‖² ] ,   (21.17)

where Π₀ is a given positive-definite (weighting) matrix and w̄ is also a given vector. Choosing Π₀ = ∞ · I leads us back to the original expression (21.14).

A motivation for (21.17) is that the freedom in choosing Π₀ allows us to incorporate additional a priori knowledge into the statement of the problem. Indeed, different choices for Π₀ would indicate how confident we are about the closeness of the unknown w to the given vector w̄.

Assume, for example, that we set Π₀ = ε · I, where ε is a very small positive number. Then the first term in the cost function (21.17) becomes dominant. It is then not hard to see that, in this case, the cost will be minimized if we choose the estimate ŵ close enough to w̄ in order to annihilate the effect of the first term. In simple words, a "small" Π₀ reflects a high confidence that w̄ is a good and close enough guess for w. On the other hand, a "large" Π₀ indicates a high degree of uncertainty in the initial guess w̄.

One way of solving the regularized optimization problem (21.17) is to reduce it to the standard least-squares problem (21.14). This can be achieved by introducing the change of variables w′ = w − w̄ and d′ = d − A w̄. Then (21.17) becomes

min_{w′} [ w′ᵀ Π₀⁻¹ w′ + ‖ d′ − A w′ ‖² ] .


TABLE 21.2 Linear Least-Squares Estimation

  Optimization problem   min_w [ (w − w̄)ᵀ Π₀⁻¹ (w − w̄) + ‖d − A w‖² ],  Π₀ positive-definite
  Solution               ŵ = w̄ + [ Π₀⁻¹ + AᵀA ]⁻¹ Aᵀ ( d − A w̄ )
  Minimum value          (d − A w̄)ᵀ [ I + A Π₀ Aᵀ ]⁻¹ (d − A w̄)

Comparing with the earlier expression (21.15), we see that instead of requiring the invertibility of AᵀA, we now require the invertibility of the matrix [ Π₀⁻¹ + AᵀA ]. This is yet another reason in favor of the modified criterion (21.17) because it allows us to relax the full rank condition on A.
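A small sketch of the regularized solution follows, again on synthetic data; the helper (a made-up name) simply evaluates ŵ = w̄ + [Π₀⁻¹ + AᵀA]⁻¹ Aᵀ(d − A w̄) and checks that, for a very "large" Π₀, the estimate approaches the plain least-squares solution, as argued above.

```python
import numpy as np

rng = np.random.default_rng(3)

N, n = 30, 3
A = rng.standard_normal((N, n))
d = rng.standard_normal(N)
w_bar = np.zeros(n)             # prior guess for w
Pi0 = 0.5 * np.eye(n)           # positive-definite weighting matrix

def regularized_ls(A, d, w_bar, Pi0):
    # w_hat = w_bar + [Pi0^{-1} + A^T A]^{-1} A^T (d - A w_bar)
    Phi = np.linalg.inv(Pi0) + A.T @ A
    s = A.T @ (d - A @ w_bar)
    return w_bar + np.linalg.solve(Phi, s)

w_reg = regularized_ls(A, d, w_bar, Pi0)

# For a very "large" Pi0 the regularization fades and plain least-squares is recovered.
w_big = regularized_ls(A, d, w_bar, 1e9 * np.eye(n))
w_ls, *_ = np.linalg.lstsq(A, d, rcond=None)
assert np.allclose(w_big, w_ls, atol=1e-5)
```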

The solution (21.18) can also be reexpressed as the solution of the following linear system of equations:

[ Π₀⁻¹ + AᵀA ] ( ŵ − w̄ ) = Aᵀ ( d − A w̄ ) ,   (21.19)

where we have denoted, for convenience, the coefficient matrix by Φ and the right-hand side by s.

Moreover, it further follows that the value of (21.17) at the minimizing solution (21.18), denoted by E_min, is given by either of the following two equivalent expressions:

E_min = ( d − A w̄ )ᵀ [ I + A Π₀ Aᵀ ]⁻¹ ( d − A w̄ ) = ( d − A w̄ )ᵀ ( d − A w̄ ) − sᵀ ( ŵ − w̄ ) .   (21.20)

A statistical interpretation for the regularized problem can be obtained as follows. Given two vector-valued zero-mean random variables w and d, the minimum-variance unbiased (MVU) estimator of w given an observation of d is ŵ = E(w|d), the conditional expectation of w given d. If the random variables (w, d) are jointly Gaussian, then the MVU estimator for w given d can be shown to collapse to

ŵ = ( E w dᵀ ) ( E d dᵀ )⁻¹ d .   (21.22)

Therefore, if (w, d) are further linearly related, say

d = A w + v ,   v ∼ N(0, I) ,   w ∼ N(0, Π₀) ,   (21.23)

with a zero-mean noise vector v that is uncorrelated with w (E w vᵀ = 0), then the expressions for (E w dᵀ) and (E d dᵀ) can be evaluated as

E w dᵀ = E w ( A w + v )ᵀ = Π₀ Aᵀ ,   E d dᵀ = A Π₀ Aᵀ + I .

This shows that (21.22) evaluates to

ŵ = Π₀ Aᵀ ( I + A Π₀ Aᵀ )⁻¹ d .   (21.24)

By invoking the useful matrix inversion formula (for arbitrary matrices of appropriate dimensions and invertible E and C):

( E + B C D )⁻¹ = E⁻¹ − E⁻¹ B ( D E⁻¹ B + C⁻¹ )⁻¹ D E⁻¹ ,

we can rewrite expression (21.24) in the equivalent form

ŵ = ( Π₀⁻¹ + AᵀA )⁻¹ Aᵀ d .   (21.25)

This expression coincides with the regularized solution (21.18) for w̄ = 0 (the case w̄ ≠ 0 follows from similar arguments by assuming a nonzero-mean random variable w).

Therefore, the regularized least-squares solution is the minimum variance unbiased (MVU) estimate of w given observations d that are corrupted by additive Gaussian noise as in (21.23).
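Both the quoted matrix inversion formula and the resulting equivalence of (21.24) and (21.25) are easy to confirm numerically; the snippet below does so on small random matrices (nothing here comes from the chapter's data).

```python
import numpy as np

rng = np.random.default_rng(4)

n, N = 3, 8
A = rng.standard_normal((N, n))
d = rng.standard_normal(N)
Pi0 = np.diag(rng.uniform(0.5, 2.0, size=n))    # covariance of w, positive-definite

# Matrix inversion formula: (E + B C D)^{-1} = E^{-1} - E^{-1} B (D E^{-1} B + C^{-1})^{-1} D E^{-1}
E, B, C, D = np.eye(N), A, Pi0, A.T
lhs = np.linalg.inv(E + B @ C @ D)
Ei = np.linalg.inv(E)
rhs = Ei - Ei @ B @ np.linalg.inv(D @ Ei @ B + np.linalg.inv(C)) @ D @ Ei
assert np.allclose(lhs, rhs)

# The two forms of the estimator coincide:
w1 = Pi0 @ A.T @ np.linalg.inv(np.eye(N) + A @ Pi0 @ A.T) @ d      # (21.24)
w2 = np.linalg.solve(np.linalg.inv(Pi0) + A.T @ A, A.T @ d)        # (21.25)
assert np.allclose(w1, w2)
```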

21.4 The Recursive Least-Squares Problem

The recursive least-squares formulation deals with the problem of updating the solution ŵ of a least-squares problem (regularized or not) when new data are added to the matrix A and to the vector d. This is in contrast to determining afresh the least-squares solution of the new problem. The distinction will become clear as we proceed in our discussions. In this section, we formulate the recursive least-squares problem as it arises in the context of adaptive filtering.

Consider a sequence of (N + 1) scalar data points, {d(j)}_{j=0}^{N}, also known as reference or desired signals, and a sequence of (N + 1) row vectors {u_jᵀ}_{j=0}^{N}, also known as input signals. Each input vector u_jᵀ is a 1 × M row vector.

Consider also a known column vector w̄ and a positive-definite weighting matrix Π₀. The objective is to determine an M × 1 column vector w, also known as the weight vector, so as to minimize the weighted error sum

E(N) = ( w − w̄ )ᵀ [ λ^{−(N+1)} Π₀ ]⁻¹ ( w − w̄ ) + Σ_{j=0}^{N} λ^{N−j} | d(j) − u_jᵀ w |² ,   (21.27)


where λ is a positive scalar that is less than or equal to one (usually 0 ≪ λ ≤ 1). It is often called the forgetting factor since past data is exponentially weighted less than the more recent data. The special case λ = 1 is known as the growing memory case, since, as the length N of the data grows, the effect of past data is not attenuated. In contrast, the exponentially decaying memory case (λ < 1) is more suitable for time-variant environments.

Also, and in principle, the factor λ^{−(N+1)} that multiplies Π₀ in the error-sum expression (21.27) can be incorporated into the weighting matrix Π₀. But it is left explicit for convenience of exposition.

We further denote the individual entries of the column vector w by {w(j)}_{j=1}^{M},

w = col{ w(1), w(2), ..., w(M) } .

A schematic description of the problem is shown in Fig. 21.2. At each time instant j, the inputs of the M channels are linearly combined via the coefficients of the weight vector and the resulting signal is compared with the desired signal d(j). This results in a residual error e(j) = d(j) − u_jᵀ w, for every j, and the objective is to find a weight vector w in order to minimize the (exponentially weighted and regularized) squared-sum of the residual errors over an interval of time, say from j = 0 up to j = N.

The linear combiner is said to be of order M since it is determined by M coefficients {w(j)}_{j=1}^{M}.

FIGURE 21.2: A linear combiner

21.4.1 Reducing to the Regularized Form

The expression for the weighted error-sum (21.27) is a special case of the regularized cost function (21.17). To clarify this, we introduce the residual vector e_N, the reference vector d_N, the data matrix A_N, and a diagonal weighting matrix Λ_N = diag{ λ^N, λ^{N−1}, ..., λ, 1 }; the cost (21.27) then takes the form (21.17) with {d, A} replaced by {Λ_N^{1/2} d_N, Λ_N^{1/2} A_N}, respectively, and with λ^{−(N+1)} Π₀ replacing Π₀.

We therefore conclude from (21.19) that the optimal solution ŵ of (21.27) satisfies the linear system of equations

[ λ^{N+1} Π₀⁻¹ + A_Nᵀ Λ_N A_N ] ( ŵ − w̄ ) = A_Nᵀ Λ_N ( d_N − A_N w̄ ) ,   (21.30)

whose coefficient matrix we denote by Φ_N and whose right-hand side we denote by s_N. The solution ŵ obtained by solving (21.30) is the optimal weight estimate based on the available data from time i = 0 up to time i = N. We shall denote it from now on by w_N,

Φ_N ( w_N − w̄ ) = s_N .

The subscript N in w_N indicates that the data up to, and including, time N were used. This is to differentiate it from the estimate obtained by using a different number of data points.

This notational change is necessary because the main objective of the recursive least-squares (RLS) problem is to show how to update the estimate w_N, which is based on the data up to time N, to the estimate w_{N+1}, which is based on the data up to time (N + 1), without the need to solve afresh a new set of linear equations of the form

Φ_{N+1} ( w_{N+1} − w̄ ) = s_{N+1} .

Such a recursive update of the weight estimate should be possible since the coefficient matrices λ Φ_N and Φ_{N+1} of the associated linear systems differ only by a rank-one matrix. In fact, a wide variety of algorithms has been devised for this end and our purpose in this chapter is to provide an overview of the different schemes.

Before describing these different variants, we note in passing that it follows from (21.20) that the minimum value of E(N) can also be expressed in closed form; this expression, (21.35), is used further ahead.

21.5 The RLS Algorithm

Let w_{i−1} be the solution of an optimization problem of the form (21.27) that uses input data up to time (i − 1) [that is, for N = (i − 1)]. Likewise, let w_i be the solution of the same optimization problem but with input data up to time i [N = i].

The recursive least-squares (RLS) algorithm provides a recursive procedure that computes w_i from w_{i−1}. A classical derivation follows by noting from (21.30) that the new solution w_i should satisfy

w_i − w̄ = Φ_i⁻¹ s_i = [ λ Φ_{i−1} + u_i u_iᵀ ]⁻¹ ( λ s_{i−1} + u_i [ d(i) − u_iᵀ w̄ ] ) ,

where we have also used the time-updates for {Φ_i, s_i},

Φ_i = λ Φ_{i−1} + u_i u_iᵀ ,   s_i = λ s_{i−1} + u_i [ d(i) − u_iᵀ w̄ ] .   (21.34)

Introduce the quantities

P_i = Φ_i⁻¹ ,   g_i = Φ_i⁻¹ u_i .   (21.36)

Expanding the inverse of [ λ Φ_{i−1} + u_i u_iᵀ ] by using the matrix inversion formula [stated after (21.24)], and grouping terms, leads after some straightforward algebra to the RLS procedure:

• Initial conditions: w_{−1} = w̄ and P_{−1} = Π₀.

• Repeat for each time instant i ≥ 0:

   w_i = w_{i−1} + g_i [ d(i) − u_iᵀ w_{i−1} ] ,   (21.37)

   g_i = λ⁻¹ P_{i−1} u_i / ( 1 + λ⁻¹ u_iᵀ P_{i−1} u_i ) ,   (21.38)

   P_i = λ⁻¹ [ P_{i−1} − g_i u_iᵀ P_{i−1} ] .   (21.39)

• The computational complexity of the algorithm is O(M²) per iteration.
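A compact sketch of these recursions in code is given below (not from the chapter; the variable names simply mirror the text and the data are synthetic). After processing all samples it checks that the recursively propagated estimate matches the batch solution of (21.30) with the same exponential weighting.

```python
import numpy as np

rng = np.random.default_rng(5)

M, T, lam = 4, 200, 0.98
w_true = rng.standard_normal(M)
U = rng.standard_normal((T, M))                     # rows are the input vectors u_i^T
d = U @ w_true + 0.05 * rng.standard_normal(T)      # reference signals d(i)

w_bar = np.zeros(M)
Pi0 = 10.0 * np.eye(M)

# RLS recursions (21.37)-(21.39).
w, P = w_bar.copy(), Pi0.copy()
for i in range(T):
    u = U[i]
    e_a = d[i] - u @ w                              # a priori error
    g = (P @ u) / (lam + u @ P @ u)                 # gain vector, equivalent to (21.38)
    w = w + g * e_a                                 # weight update (21.37)
    P = (P - np.outer(g, u @ P)) / lam              # update of P_i (21.39)

# Batch regularized solution with the same exponential weighting, for comparison.
weights = lam ** np.arange(T - 1, -1, -1)
Phi = lam**T * np.linalg.inv(Pi0) + (U * weights[:, None]).T @ U
s = (U * weights[:, None]).T @ (d - U @ w_bar)
w_batch = w_bar + np.linalg.solve(Phi, s)
assert np.allclose(w, w_batch)
```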

21.5.1 Estimation Errors and the Conversion Factor

With the RLS problem we associate two residuals at each time instant i: the a priori estimation error e_a(i), defined by

e_a(i) = d(i) − u_iᵀ w_{i−1} ,

and the a posteriori estimation error e_p(i), defined by

e_p(i) = d(i) − u_iᵀ w_i .

Comparing the expressions for e_a(i) and e_p(i), we see that the latter employs the most recent weight vector estimate.

If we replace w_i in the definition for e_p(i) by its update expression (21.37), say

e_p(i) = d(i) − u_iᵀ ( w_{i−1} + g_i [ d(i) − u_iᵀ w_{i−1} ] ) ,

some straightforward algebra will show that we can relate e_p(i) and e_a(i) via a factor γ(i) known as the conversion factor:

e_p(i) = γ(i) e_a(i) ,

where γ(i) is equal to

γ(i) = 1 / ( 1 + λ⁻¹ u_iᵀ P_{i−1} u_i ) = 1 − u_iᵀ P_i u_i .   (21.40)

That is, the a posteriori error is a scaled version of the a priori error. The scaling factor γ(i) is defined in terms of {u_i, P_{i−1}} or {u_i, P_i}. Note that 0 ≤ γ(i) ≤ 1.

Note further that the expression for γ(i) appears in the definition of the so-called gain vector g_i in (21.38) and, hence, we can alternatively rewrite (21.38) and (21.39) in the forms:

g_i = λ⁻¹ γ(i) P_{i−1} u_i ,

P_i = λ⁻¹ P_{i−1} − γ⁻¹(i) g_i g_iᵀ .
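The conversion-factor relation is easy to verify inside a running RLS loop. The self-contained sketch below (synthetic data, made-up sizes) checks at every step that e_p(i) = γ(i) e_a(i) and that the two expressions for γ(i) in (21.40) agree.

```python
import numpy as np

rng = np.random.default_rng(6)

M, lam = 3, 0.95
w = np.zeros(M)
P = 5.0 * np.eye(M)

for i in range(100):
    u = rng.standard_normal(M)
    d = rng.standard_normal()
    e_a = d - u @ w                                  # a priori error
    gamma = 1.0 / (1.0 + (u @ P @ u) / lam)          # conversion factor, first form of (21.40)
    g = (P @ u) / (lam + u @ P @ u)                  # gain vector
    w = w + g * e_a
    P = (P - np.outer(g, u @ P)) / lam
    e_p = d - u @ w                                  # a posteriori error
    assert np.isclose(e_p, gamma * e_a)              # e_p(i) = gamma(i) e_a(i)
    assert np.isclose(gamma, 1.0 - u @ P @ u)        # second form of (21.40), with the updated P_i
```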

21.5.2 Update of the Minimum Cost

Let E_min(i) denote the value of the minimum cost of the optimization problem (21.27) with data up to time i. It is given by an expression of the form (21.35) with N replaced by i.

Using the RLS update (21.37) for w_i in terms of w_{i−1}, as well as the time-update (21.34) for s_i in terms of s_{i−1}, we can derive the following time-update for the minimum cost:

E_min(i) = λ E_min(i − 1) + e_p(i) e_a(i) ,   (21.43)

where E_min(i − 1) denotes the value of the minimum cost of the same optimization problem (21.27) but with data up to time (i − 1).

21.6 RLS Algorithms in Array Forms

As mentioned in the introduction, we intend to stress the array formulations of the RLS solution due to their intrinsic advantages:

• They are easy to implement as a sequence of elementary rotations on arrays of numbers

• They are modular and parallelizable

• They have better numerical properties than the classical RLS description


21.6.1 Motivation

Note from (21.39) that the RLS solution propagates the variable P_i as the difference of two quantities. This variable should be positive-definite. Due to roundoff errors, however, the update (21.39) may not guarantee the positive-definiteness of P_i at all times i. This problem can be ameliorated by using the so-called array formulations. These alternative forms propagate square-root factors of either P_i or P_i⁻¹, namely, P_i^{1/2} or P_i^{−1/2}, rather than P_i itself. By squaring P_i^{1/2}, for example, we can always recover a matrix P_i that is more likely to be positive-definite than the matrix obtained via (21.39),

P_i = P_i^{1/2} P_i^{T/2} .

21.6.2 A Very Useful Lemma

The derivation of the array variants of the RLS algorithm relies on a very useful matrix result that encounters applications in many other scenarios as well. For this reason, we not only state the result but also provide one simple proof.

LEMMA 21.1 Given two n × m (n ≤ m) matrices A and B, then A Aᵀ = B Bᵀ if, and only if, there exists an m × m orthogonal matrix Θ (Θ Θᵀ = I_m) such that A = B Θ.

PROOF 21.1 One implication is immediate. If there exists an orthogonal matrix Θ such that A = B Θ then

A Aᵀ = ( B Θ )( B Θ )ᵀ = B ( Θ Θᵀ ) Bᵀ = B Bᵀ .

One proof for the converse implication follows by invoking the singular value decompositions of the matrices A and B,

A = U_A [ Σ_A   0 ] V_Aᵀ ,   B = U_B [ Σ_B   0 ] V_Bᵀ ,

where U_A and U_B are n × n orthogonal matrices, V_A and V_B are m × m orthogonal matrices, and Σ_A and Σ_B are n × n diagonal matrices with nonnegative (ordered) entries.

The squares of the diagonal entries of Σ_A (Σ_B) are the eigenvalues of A Aᵀ (B Bᵀ). Moreover, U_A (U_B) are constructed from an orthonormal basis for the right eigenvectors of A Aᵀ (B Bᵀ).

Hence, it follows from the identity A Aᵀ = B Bᵀ that we have Σ_A = Σ_B and we can choose U_A = U_B. Let Θ = V_B V_Aᵀ. We then obtain Θ Θᵀ = I_m and B Θ = A.
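The constructive part of the proof can be replayed numerically. In the sketch below (random matrices, not chapter data) B is built so that B Bᵀ = A Aᵀ by construction, and Θ = V_B V_Aᵀ is recovered from the two SVDs; a sign alignment of the singular vectors is included to realize the choice U_A = U_B made in the proof.

```python
import numpy as np

rng = np.random.default_rng(7)

n, m = 3, 5
A = rng.standard_normal((n, m))
Q, _ = np.linalg.qr(rng.standard_normal((m, m)))
B = A @ Q                                       # then B B^T = A A^T by construction

# Full SVDs of A and B (V_A and V_B are m x m orthogonal).
U_A, s_A, Vt_A = np.linalg.svd(A, full_matrices=True)
U_B, s_B, Vt_B = np.linalg.svd(B, full_matrices=True)
assert np.allclose(s_A, s_B)                    # Sigma_A = Sigma_B since A A^T = B B^T

# Resolve the sign ambiguity of the singular vectors so that U_A = U_B.
signs = np.sign(np.sum(U_A * U_B, axis=0))
U_B = U_B * signs
Vt_B[:n, :] = signs[:, None] * Vt_B[:n, :]

Theta = Vt_B.T @ Vt_A                           # Theta = V_B V_A^T
assert np.allclose(Theta @ Theta.T, np.eye(m))  # Theta is orthogonal
assert np.allclose(B @ Theta, A)                # and it maps B back to A
```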

21.6.3 The Inverse QR Algorithm

We now employ the above result to derive an array form of the RLS algorithm that is known as the inverse QR algorithm.

Note that the RLS recursions (21.38) and (21.39) can be expressed in factored form as follows: the prearray

A = [ 1                  λ^{−1/2} u_iᵀ P_{i−1}^{1/2} ]
    [ 0                  λ^{−1/2} P_{i−1}^{1/2}      ]   (21.44)

and the postarray

B = [ γ^{−1/2}(i)        0ᵀ         ]
    [ g_i γ^{−1/2}(i)    P_i^{1/2}  ]   (21.45)

satisfy A Aᵀ = B Bᵀ.

That is, there should exist an orthogonal Θ_i that transforms the prearray A into the postarray B. Note that the prearray contains quantities that are available at step i, namely {u_i, P_{i−1}^{1/2}}, while the postarray provides the (normalized) gain vector g_i γ^{−1/2}(i), which is needed to update the weight vector estimate w_{i−1} into w_i, as well as the square-root factor of the variable P_i, which is needed to form the prearray for the next iteration.

But how do we determine Θ_i? The answer highlights a remarkable property of array algorithms. We do not really need to know or determine Θ_i explicitly!

To clarify this point, we first remark from the expressions (21.44) and (21.45) for the pre- and postarrays that Θ_i is an orthogonal matrix that takes an array of numbers of the form (21.44) (assuming a lower-triangular square-root factor P_{i−1}^{1/2}) and reduces it to a lower-triangular form. That is, Θ_i annihilates all the entries of the top row of the prearray (except for the left-most entry).

Now assume we form the prearray A in (21.44) and choose any Θ_i (say as a sequence of elementary rotations) so as to reduce A to the triangular form (21.47), that is, in order to annihilate the desired entries in the top row.

Let us denote the resulting entries of the postarray arbitrarily as:

[ 1    λ^{−1/2} u_iᵀ P_{i−1}^{1/2} ]  Θ_i   =   [ a    0ᵀ ]
[ 0    λ^{−1/2} P_{i−1}^{1/2}      ]            [ b    C  ]

where {a, b, C} are quantities that we wish to identify [a is a scalar, b is a column vector, and C is a lower triangular matrix]. The claim is that by constructing Θ_i in this way (i.e., by simply requiring that it achieves the desired zero pattern in the postarray), the resulting quantities {a, b, C} will be meaningful and can in fact be identified with the quantities in the postarray B.

To verify that the quantities {a, b, C} can indeed be identified with {γ^{−1/2}(i), g_i γ^{−1/2}(i), P_i^{1/2}}, we square both sides of the above equality, use the orthogonality of Θ_i, and compare terms with the factored form of (21.44) and (21.45).

In summary, we have established the validity of an array alternative to the RLS algorithm, known as the inverse QR algorithm (also as square-root RLS). It is listed in Table 21.3. The recursions are known as inverse QR since they propagate P_i^{1/2}, which is a square-root factor of the inverse of the coefficient matrix Φ_i.

TABLE 21.3 The Inverse QR Algorithm

Initialization: start with w_{−1} = w̄ and P_{−1}^{1/2} = Π₀^{1/2}.

For each time instant i ≥ 0, form the prearray below and triangularize it by an orthogonal rotation Θ_i:

[ 1    λ^{−1/2} u_iᵀ P_{i−1}^{1/2} ]  Θ_i   =   [ γ^{−1/2}(i)       0ᵀ        ]
[ 0    λ^{−1/2} P_{i−1}^{1/2}      ]            [ g_i γ^{−1/2}(i)   P_i^{1/2} ]

where the quantities {γ^{−1/2}(i), g_i γ^{−1/2}(i)} are read from the entries of the postarray. The weight estimate is then updated via

w_i = w_{i−1} + [ g_i γ^{−1/2}(i) ] [ γ^{−1/2}(i) ]⁻¹ [ d(i) − u_iᵀ w_{i−1} ] .

The computational cost is O(M²) per iteration.
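One update of the inverse QR recursion can be sketched with a library QR factorization standing in for the sequence of elementary rotations, which is legitimate here because any orthogonal Θ_i that produces the required zero pattern is acceptable. The quantities below are synthetic, and the final assertions cross-check the postarray entries against the explicit RLS relations quoted earlier.

```python
import numpy as np

rng = np.random.default_rng(8)

M, lam = 3, 0.97
u = rng.standard_normal(M)
P_prev = 2.0 * np.eye(M)                       # P_{i-1}, positive-definite
P_half = np.linalg.cholesky(P_prev)            # lower-triangular P_{i-1}^{1/2}

# Form the (M+1) x (M+1) prearray of Table 21.3.
pre = np.zeros((M + 1, M + 1))
pre[0, 0] = 1.0
pre[0, 1:] = u @ P_half / np.sqrt(lam)
pre[1:, 1:] = P_half / np.sqrt(lam)

# Any orthogonal Theta_i that lower-triangularizes the prearray will do; here it
# comes from a QR factorization of the transposed prearray.
Q, _ = np.linalg.qr(pre.T, mode="complete")
post = pre @ Q
post *= np.sign(np.diag(post))                 # enforce positive diagonal entries

gamma_sqrt_inv = post[0, 0]                    # gamma^{-1/2}(i)
g = post[1:, 0] / gamma_sqrt_inv               # gain vector g_i
P_new = post[1:, 1:] @ post[1:, 1:].T          # P_i recovered from its square-root factor

# Cross-check against the explicit relations (21.38)-(21.40).
gamma = 1.0 / (1.0 + (u @ P_prev @ u) / lam)
assert np.isclose(gamma_sqrt_inv, 1.0 / np.sqrt(gamma))
assert np.allclose(g, gamma * (P_prev @ u) / lam)
assert np.allclose(P_new, P_prev / lam - np.outer(g, g) / gamma)
```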


21.6.4 The QR Algorithm

The RLS recursion (21.39) and the inverse QR recursion of Table 21.3 propagate the variable P_i or a square-root factor of it. The starting condition for both algorithms is therefore dependent on the weighting matrix Π₀ or its square-root factor Π₀^{1/2}. This situation becomes inconvenient when the initial condition Π₀ assumes relatively large values, say Π₀ = σ I with σ ≫ 1. A particular instance arises, for example, when we take σ → ∞, in which case the regularized least-squares problem (21.27) reduces to a standard least-squares problem of the form

min_w  Σ_{j=0}^{N} λ^{N−j} | d(j) − u_jᵀ w |² .

For such problems, it is preferable to propagate the inverse of the variable P_i rather than P_i itself. Recall that the inverse of P_i is Φ_i since we have defined earlier P_i = Φ_i⁻¹.

The QR algorithm is a recursive procedure that propagates a square-root factor of Φ_i. Its validity can be verified in much the same way as we did for the inverse QR algorithm. We form a prearray of numbers and then choose a sequence of rotations that induces a desired zero pattern in the postarray. Then by squaring and comparing terms on both sides of an equality we can identify the resulting entries of the postarray as meaningful quantities in the RLS context. For this reason, we shall be brief and only highlight the main points.

Let Φ_{i−1}^{1/2} denote a square-root factor (preferably lower-triangular) of Φ_{i−1}, Φ_{i−1} = Φ_{i−1}^{1/2} Φ_{i−1}^{T/2}, and define, for notational convenience, the quantity q_{i−1} = Φ_{i−1}^{−1/2} s_{i−1}.

At time (i − 1) we form a prearray of numbers whose block rows include [ √λ Φ_{i−1}^{1/2}   u_i ] and [ √λ q_{i−1}ᵀ   d(i) ].
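The square-root propagation that the QR algorithm performs can be sketched as follows, under two simplifying assumptions not taken from the text: w̄ = 0, and a library QR factorization stands in for the rotations. The sketch time-updates Φ_i^{1/2} and q_i from a prearray built out of {√λ Φ_{i−1}^{1/2}, u_i} and {√λ q_{i−1}ᵀ, d(i)}, and recovers the weight estimate by back-substitution as an added check.

```python
import numpy as np

rng = np.random.default_rng(9)

M, T, lam = 3, 100, 0.98
U = rng.standard_normal((T, M))
d = U @ rng.standard_normal(M) + 0.1 * rng.standard_normal(T)

# Square-root (QR) propagation of Phi_i^{1/2} and q_i, with w_bar = 0.
L = np.sqrt(0.01) * np.eye(M)          # Phi_{-1}^{1/2}, i.e., Phi_{-1} = 0.01 I
q = np.zeros(M)                        # q_{-1}
Phi, s = L @ L.T, np.zeros(M)          # explicitly propagated copies, for checking only

for i in range(T):
    pre = np.zeros((M + 1, M + 1))
    pre[:M, :M] = np.sqrt(lam) * L
    pre[:M, M] = U[i]
    pre[M, :M] = np.sqrt(lam) * q
    pre[M, M] = d[i]

    Q, _ = np.linalg.qr(pre.T, mode="complete")
    post = pre @ Q                     # lower-triangular postarray
    post *= np.sign(np.diag(post))     # positive-diagonal convention
    L, q = post[:M, :M], post[M, :M]

    Phi = lam * Phi + np.outer(U[i], U[i])    # reference time-updates (21.34), w_bar = 0
    s = lam * s + U[i] * d[i]

assert np.allclose(L @ L.T, Phi)              # L is a square-root factor of Phi_i
w_sqrt = np.linalg.solve(L.T, q)              # back-substitution: Phi_i w = s_i
assert np.allclose(w_sqrt, np.linalg.solve(Phi, s))
```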


References
[1] Haykin, S., Adaptive Filter Theory, 3rd ed., Prentice-Hall, Englewood Cliffs, NJ, 1996.
[2] Proakis, J.G., Rader, C.M., Ling, F., and Nikias, C.L., Advanced Digital Signal Processing, Macmillan, New York, 1992.
[3] Honig, M.L. and Messerschmitt, D.G., Adaptive Filters — Structures, Algorithms and Applications, Kluwer Academic Publishers, 1984.
[4] Orfanidis, S.J., Optimum Signal Processing, 2nd ed., McGraw-Hill, New York, 1988.
[5] Kalouptsidis, N. and Theodoridis, S., Adaptive System Identification and Signal Processing Algorithms, Prentice-Hall, Englewood Cliffs, NJ, 1993.

The array formulation that we emphasized in this chapter is motivated by the state-space approach developed in

[7] Morf, M. and Kailath, T., Square root algorithms for least squares estimation, IEEE Trans. Automatic Control, AC-20(4), 487–497, Aug. 1975.
[8] Lee, D.T.L., Morf, M., and Friedlander, B., Recursive least-squares ladder estimation algorithms, IEEE Trans. Circuits and Systems, CAS-28(6), 467–481, June 1981.
[9] Friedlander, B., Lattice filters for adaptive processing, Proc. IEEE, 70(8), 829–867, Aug. 1982.
[10] Lev-Ari, H., Kailath, T., and Cioffi, J., Least squares adaptive lattice and transversal filters: a unified geometrical theory, IEEE Trans. Information Theory, IT-30(2), 222–236, March 1984.

The fast fixed-order recursive least-squares algorithms (FTF and FAEST) were independently derived in
