Fig 42 The clean speech obtained at the output of our proposed ANC (Fig 30) by reducing
the time scale
8 The ultra high speed LMS algorithm implemented on parallel architecture
There are many problems that require enormous computational capacity to solve, and therefore the success of computational science in accurately describing and modelling the real world has helped to fuel the ever increasing demand for cheap computing power. Scientists are eager to find ways to test the limits of theories, using high performance computing to allow them to simulate more realistic systems in greater detail. Parallel computing offers a way to address these problems in a cost effective manner: it deals with the development of programs where multiple concurrent processes cooperate in the fulfilment of a common task. Finally, in this section we will develop the theory of the parallel computation of the widely used least-mean-square (LMS) algorithm¹, so named by its originators, Widrow and Hoff (1960) [2].
8.1 The spatial radix-r factorization
This section is devoted to proving that a discrete signal can be decomposed into r partial signals whose statistical properties remain invariant.
¹ M. Jaber, “Method and apparatus for enhancing processing speed for performing a least mean square operation by parallel processing”, US Patent No. 7,533,140, 2009.
Given a discrete signal x_n of length N, its radix-r decomposition yields the r partial signals x_l[n] = x_{rn+l}, for l = 0, 1, …, r − 1 and n = 0, 1, …, N/r − 1; this is the product of the identity matrix of size r by r sets of vectors of size N/r, where the l-th element of the n-th product is stored into the memory address location rn + l. The mean of x_n is

E[x] = \frac{1}{N}\sum_{n=0}^{N-1} x_n,   (101)

which can be factorized as

E[x] = \frac{1}{N}\sum_{l=0}^{r-1}\sum_{n=0}^{N/r-1} x_{rn+l} = \frac{1}{r}\sum_{l=0}^{r-1} E[x_l].

Similarly to the mean, the variance of the signal x_n can be expressed in terms of the variances of its r partial signals according to

\sigma_x^2 = \frac{1}{N}\sum_{n=0}^{N-1}\big(x_n - E[x]\big)^2 = \frac{1}{r}\sum_{l=0}^{r-1}\sigma_{x_l}^2.
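As a quick numerical check of this invariance, the following sketch (a minimal illustration, assuming the partial signals x_l[n] = x_{rn+l} defined above; the signal, its length N and the radix r are arbitrary choices) splits a random signal into r partial signals and compares the global mean and variance with the values recovered from the partial signals.

```python
import numpy as np

# Numerical check of the radix-r decomposition described above, assuming the
# partial signals are x_l[n] = x[r*n + l] (illustrative signal and parameters).
N, r = 1024, 4
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=N)

partials = [x[l::r] for l in range(r)]         # r partial signals of length N/r

# The global mean is exactly the average of the r partial means (Eq. (101) factorized).
print("global mean          :", x.mean())
print("average partial mean :", np.mean([p.mean() for p in partials]))

# The global variance equals the average of the partial second moments taken
# about the global mean; each individual partial variance is also close to it.
print("global variance      :", x.var())
print("from partial signals :", np.mean([np.mean((p - x.mean()) ** 2) for p in partials]))
print("individual partial variances:", [round(p.var(), 3) for p in partials])
```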
8.2 The parallel implementation of the least squares method
The method of least squares assumes that the best-fit curve of a given type is the curve that has the minimal sum of the squared deviations (least square error) from a given set of data. Suppose that the N data points are (x_0, y_0), (x_1, y_1), …, (x_{N−1}, y_{N−1}), where x is the independent variable and y is the dependent variable. The fitting curve d has the deviation (error) σ from each data point, i.e., σ_0 = d_0 − y_0, σ_1 = d_1 − y_1, …, σ_{N−1} = d_{N−1} − y_{N−1}, which can be re-ordered over the r subdivided data sets as σ_{rn+j_0} = d_{rn+j_0} − y_{rn+j_0}, for j_0 = 0, 1, …, r − 1 and n = 0, 1, …, N/r − 1.
According to the method of least squares, the best fitting curve has the property that the sum of the squared deviations of each subdivided data set,

\sum_{n=0}^{N/r-1}\sigma_{rn+j_0}^2,

is a minimum for j_0 = 0, 1, …, r − 1. In order to pick the line which best fits the data, we need a criterion to determine which linear estimator is the “best”. The sum of square errors (also called the mean square error (MSE)) is a widely utilized performance criterion:
J = \frac{1}{2N}\sum_{n=0}^{N-1}\sigma_n^2 = \sum_{j_0=0}^{r-1} J_{j_0}, \qquad J_{j_0} = \frac{1}{2N}\sum_{n=0}^{N/r-1}\sigma_{rn+j_0}^2,

where J_{j_0} is the partial MSE applied on the subdivided data.
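The decomposition of the MSE into partial MSEs can be checked numerically. The sketch below is illustrative only: it assumes the subdivision by residue class modulo r and the 1/(2N) scaling used above, fits a line to a hypothetical noisy data set with numpy.polyfit, and verifies that the total criterion J equals the sum of the r partial criteria J_{j0}.

```python
import numpy as np

# Illustration of J = sum of the partial MSEs J_{j0}, assuming the data are
# subdivided by residue class modulo r (hypothetical noisy linear data set).
N, r = 240, 3
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, N)
y = 2.5 * x + 0.7 + 0.1 * rng.normal(size=N)

# Least-squares line fit d_n = w*x_n + b and its deviations.
w, b = np.polyfit(x, y, 1)
d = w * x + b
sigma = d - y

# Total MSE and the r partial MSEs computed on the subdivided data.
J = np.sum(sigma ** 2) / (2 * N)
J_parts = [np.sum(sigma[j0::r] ** 2) / (2 * N) for j0 in range(r)]

print(f"J = {J:.6f}, sum of partial MSEs = {sum(J_parts):.6f}")
```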
Our goal is to minimize J analytically, which according to Gauss can be done by taking its partial derivatives with respect to the unknowns and equating the resulting equations to zero:

\frac{\partial J_{j_0}}{\partial b} = 0, \qquad \frac{\partial J_{j_0}}{\partial w} = 0,

for j_0 = 0, 1, …, r − 1.
For a linear estimator with several weights, the deviation of each subdivided data set is formed from a weighted combination of the inputs, and the criterion becomes

J = \frac{1}{2N}\sum_{n=0}^{N-1}\Big(d_n - \sum_{k} w_k\, x_{n,k}\Big)^2 = \sum_{j_0=0}^{r-1} J_{j_0}, \qquad J_{j_0} = \frac{1}{2N}\sum_{n=0}^{N/r-1}\Big(d_{rn+j_0} - \sum_{k} w_{j_0,k}\, x_{rn+j_0,k}\Big)^2,

where J_{j_0} is the partial MSE applied on the subdivided data.
The solution to the extreme (minimum) of this equation can be found in exactly the same way as before, that is, by taking the derivatives of J_{j_0} with respect to the unknowns (w_k) and equating the result to zero.
Instead of solving equations 110 and 111 analytically, a gradient adaptive system can be used, in which the derivative is estimated with the difference operator. This estimation is given by

\nabla J_{j_0} = \frac{\partial J_{j_0}}{\partial w} \approx \frac{\Delta J_{j_0}}{\Delta w},

where in this case the bias b is set to zero.
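Such a gradient adaptive system can be sketched in a few lines. The example below is a minimal illustration, with the bias b set to zero as stated above; the data, the weight value and the perturbation Δw are arbitrary choices. It estimates ∂J/∂w with the difference operator and compares the result with the analytic derivative.

```python
import numpy as np

# Difference-operator estimate of the derivative of a partial MSE J(w),
# with the bias b set to zero (illustrative data and step sizes).
rng = np.random.default_rng(2)
x = rng.normal(size=200)
d = 1.8 * x + 0.05 * rng.normal(size=200)   # hypothetical desired signal

def J(w):
    """Partial MSE of the linear estimator y = w*x (b = 0)."""
    eps = d - w * x
    return np.mean(eps ** 2) / 2.0

w, dw = 0.3, 1e-4
grad_estimate = (J(w + dw) - J(w)) / dw          # difference operator
grad_analytic = -np.mean((d - w * x) * x)        # exact derivative of J

print(f"difference estimate: {grad_estimate:.6f}")
print(f"analytic derivative: {grad_analytic:.6f}")
```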
8.3 Search of the performance surface with steepest descent
The method of steepest descent (also known as the gradient method) is the simplest example
of a gradient based method for minimizing a function of several variables [12] In this section we
will be elaborating the linear case
Since the performance surfaces for the linear case implemented in parallel are r paraboloids, each of which has a single minimum, an alternate procedure to find the best value of the coefficient w_{j_0} is to search the performance surface in parallel instead of computing the best coefficient analytically by Eq. 110. The search for the minimum of a function can be done efficiently using a broad class of methods that use gradient information. The gradient has two main advantages for search:
The gradient can be computed locally.
The gradient always points in the direction of maximum change.
If the goal is to reach the minimum in each parallel segment, the search must be in the direction opposite to the gradient. So, the overall method of search can be stated in the following way:
Start the search with an arbitrary initial weight w_{j_0}(0), where the iteration is denoted by the index in parentheses (Fig 43). Then compute the gradient of the performance surface at w_{j_0}(0), and modify the initial weight proportionally to the negative of the gradient at w_{j_0}(0). This changes the operating point to w_{j_0}(1). Then compute the gradient at the new position w_{j_0}(1), and apply the same procedure again, i.e.

w_{j_0}(k+1) = w_{j_0}(k) - \eta\,\nabla J_{j_0}(k),   (113)

where \nabla J_{j_0}(k) denotes the gradient of the performance surface at the k-th iteration of the j_0-th parallel segment. The constant η is used to maintain stability in the search by ensuring that the operating point does not move too far along the performance surface. This search procedure is called the steepest descent method (Fig 43).
Fig 43 The search using the gradient information [13]
If one traces the path of the weights from iteration to iteration, intuitively we see that if the constant η is small, eventually the best value for the coefficient w* will be found. Whenever w > w*, we decrease w, and whenever w < w*, we increase w.
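A one-weight steepest descent iteration over a parabolic performance surface follows directly from this description. The sketch below is illustrative only; the quadratic J(w), the location of the optimum w*, the step size η and the initial weight are arbitrary choices.

```python
# Steepest descent on a single parabolic performance surface J(w) = (w - w*)^2,
# using w(k+1) = w(k) - eta * dJ/dw (illustrative constants).
w_star = 1.5          # location of the minimum
eta = 0.1             # step size / learning rate
w = -2.0              # arbitrary initial weight w(0)

for k in range(25):
    grad = 2.0 * (w - w_star)   # gradient of the performance surface at w(k)
    w = w - eta * grad          # move opposite to the gradient
    if k % 5 == 0:
        print(f"iteration {k:2d}: w = {w:.5f}")

print(f"final w = {w:.5f}, optimum w* = {w_star}")
```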
8.4 The radix-r parallel LMS algorithm
Based on what was proposed in [2], using the instantaneous value of the gradient as the estimator for the true quantity, i.e., dropping the summation in equation 108 and then taking the derivative with respect to w, yields

\nabla J_{j_0}(k) \approx \frac{\partial}{\partial w_{j_0}}\Big(\frac{1}{2}\,\varepsilon_{j_0}^2(k)\Big) = -\,\varepsilon_{j_0}(k)\, x_{j_0}(k).   (114)
What this equation tells us is that an instantaneous estimate of the gradient is simply the product of the input to the weight times the error at iteration k. This means that the gradient can be estimated with one multiplication per weight. This is the gradient estimate that leads to the famous Least Mean Square (LMS) algorithm (Fig 44).
If the estimator of Eq. 114 is substituted in Eq. 113, the steepest descent equation becomes

w_{j_0}(k+1) = w_{j_0}(k) + \eta\,\varepsilon_{j_0}(k)\, x_{j_0}(k).

This equation is the r-parallel LMS algorithm which, used as a predictive filter, is illustrated in Figure 45. The small constant η is called the step size or the learning rate.
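A minimal sketch of this update rule is given below. It assumes the per-segment form w_{j0}(k+1) = w_{j0}(k) + η ε_{j0}(k) x_{j0}(k) with the data subdivided by residue class modulo r; the signal model, the step size and the radix are illustrative choices, not the authors' settings.

```python
import numpy as np

# r-parallel LMS with one adaptive weight per segment, assuming the update
# w_j0 <- w_j0 + eta * e_j0 * x_j0 on data subdivided modulo r (illustrative).
rng = np.random.default_rng(3)
r, eta, n_samples = 2, 0.05, 4000
w_true = 0.9                                   # unknown gain to be identified
x = rng.normal(size=n_samples)
d = w_true * x + 0.01 * rng.normal(size=n_samples)

w = np.zeros(r)                                # one weight per parallel segment
for k in range(n_samples):
    j0 = k % r                                 # segment handling this sample
    e = d[k] - w[j0] * x[k]                    # instantaneous error
    w[j0] += eta * e * x[k]                    # LMS update for that segment

print("converged weights per segment:", np.round(w, 4))
```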
Fig 45 r-Parallel LMS Algorithm Used in a Predictive Filter
8.5 Simulation results
The notion of a mathematical model is fundamental to sciences and engineering. In the class of applications dealing with identification, an adaptive filter is used to provide a linear model that represents the best fit (in some sense) to an unknown signal. The widely used LMS algorithm is an extremely simple and elegant algorithm that is able to minimize the external cost function by using local information available to the system parameters. Because of its computational burden, and in order to speed up the process, this chapter has presented an efficient way to compute the LMS algorithm in parallel, where it follows from the simulation results that the stability of our models relies on the stability of our r parallel adaptive filters. It follows from figures 47 and 48 that the stability of the r parallel LMS filters (in this case r = 2) has been achieved, and the convergence performance of the overall model is illustrated in figure 49. The complexity of the proposed method is reduced by a factor of r in comparison to the direct method illustrated in figure 46. Furthermore, the simulation result of the channel equalization is illustrated in figure 50, in which the blue curve represents our parallel implementation (2 LMS filters implemented in parallel) compared to the conventional method, whose curve is shown in a different colour.
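The two-branch arrangement behind Figures 47-49 can be sketched as follows. This is an illustrative reconstruction only: the filter order, the step size and the test signal are assumptions rather than the authors' simulation settings. The input is split into its even and odd partial signals, each partial signal drives its own LMS predictor, and the two partial error sequences are interleaved to reconstruct the overall error signal.

```python
import numpy as np

def lms_predictor(x, order=4, eta=0.01):
    """One-step LMS predictor: predict x[n] from the previous `order` samples."""
    w = np.zeros(order)
    err = np.zeros(len(x))
    for n in range(order, len(x)):
        u = x[n - order:n][::-1]       # most recent samples first
        y = w @ u                      # predicted sample
        err[n] = x[n] - y
        w += eta * err[n] * u          # LMS weight update
    return err

# Illustrative test signal: a sinusoid in white noise (not the authors' data).
rng = np.random.default_rng(4)
n = np.arange(8000)
x = np.sin(2 * np.pi * 0.01 * n) + 0.05 * rng.normal(size=len(n))

# r = 2 parallel branches on the even and odd partial signals.
e0 = lms_predictor(x[0::2])
e1 = lms_predictor(x[1::2])

# Interleave the partial errors to reconstruct the overall error signal.
e = np.empty(len(x))
e[0::2], e[1::2] = e0, e1
print("error power, first vs last quarter:",
      np.mean(e[: len(e)//4] ** 2), np.mean(e[-len(e)//4:] ** 2))
```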
Fig 47 Simulation Result of the first partial LMS Algorithm
Fig 48 Simulation Result of the second partial LMS Algorithm
Fig 49 Simulation Result of the Overall System (reconstructed error signal)
References
[1] S. Haykin, Adaptive Filter Theory, Prentice-Hall, Englewood Cliffs, NJ, 1991.
[2] B. Widrow and S. D. Stearns, Adaptive Signal Processing, Prentice-Hall, 1985.
[3] K. Mayyas and T. Aboulnasr, “A Robust Variable Step Size LMS-Type Algorithm: Analysis and Simulations”, IEEE, 1995, pp. 1408-1411.
[4] T. Aboulnasr and K. Mayyas, “Selective Coefficient Update of Gradient-Based Adaptive Algorithms”, IEEE, 1997, pp. 1929-1932.
[5] E. Bjarnason, “Analysis of the Filtered-X LMS Algorithm”, IEEE, 1993, pp. III-511 to III-514.
[6] E. A. Wan, “Adjoint LMS: An Efficient Alternative to the Filtered-X LMS and Multiple Error LMS Algorithms”, Oregon Graduate Institute of Science & Technology, Department of Electrical Engineering and Applied Physics, P.O. Box 91000, Portland, OR 97291.
[7] B. Farhang-Boroujeny, Adaptive Filters: Theory and Applications, Wiley, 1999.
[8] N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series, New York: Wiley, 1949. ISBN 0-262-73005-7.
[9] M. Jaber, “Noise Suppression System with Dual Microphone Echo Cancellation”, US Patent No. 6,738,482.
[10] M. Jaber, “Voice Activity Detection Algorithm for Voiced/Unvoiced Decision and Pitch Estimation in a Noisy Speech Feature Extraction”, US Patent Application No. 60/771,167, 2007.
[11] M. Jaber and D. Massicotte, “A Robust Dual Predictive Line Acoustic Noise Canceller”, International Conference on Digital Signal Processing (DSP 2009), Santorini, Greece, 2009.
[12] M. Jaber and D. Massicotte, “A New FFT Concept for Efficient VLSI Implementation: Part I – Butterfly Processing Element”, 16th International Conference on Digital Signal Processing (DSP’09), Santorini, Greece, 5-7 July 2009.
[13] J. C. Principe, W. C. Lefebvre and N. R. Euliano, Neural Systems: Fundamentals through Simulation, 1996.
An LMS Adaptive Filter Using Distributed Arithmetic - Algorithms and Architectures
Kyo Takahashi¹, Naoki Honma² and Yoshitaka Tsunekawa²
¹Iwate Industrial Research Institute
(… & Koizumi, 1988). Therefore, implementations of very high order adaptive filters are required. In order to satisfy these requirements, highly-efficient algorithms and architectures are desired. The adaptive filter is generally constructed by using multipliers, adders, memories, and so on, whereas a structure without multipliers has been proposed.
The LMS adaptive filter using distributed arithmetic can be realized with adders and memories but without multipliers, that is, it can be achieved with a small amount of hardware. Distributed Arithmetic (DA) is an efficient calculation method for the inner product of constant vectors, and it has been used in DCT realizations. Furthermore, it is suitable for the time-varying coefficient vector of the adaptive filter. Cowan and others proposed a Least Mean Square (LMS) adaptive filter using the DA on an offset binary coding (Cowan & Mavor, 1981; Cowan et al., 1983). However, it was found that the convergence speed of this method is extremely degraded (Tsunekawa et al., 1999). This degradation results from an offset bias added to an input signal coded on the offset binary coding. To overcome this problem, an update algorithm generalized with 2’s complement representation has been proposed (Tsunekawa et al., 1999), and the convergence condition has been analyzed (Takahashi et al., 2002). Effective architectures for the LMS adaptive filter using the DA have also been proposed (Tsunekawa et al., 1999; Takahashi et al., 2001). The LMS adaptive filter using distributed arithmetic is denoted DA-ADF. The DA is applied to the output calculation, i.e., the inner product of the input signal vector and the coefficient vector. The output signal is obtained by the shift and addition of the partial-products specified by the bit patterns of the N-th order input signal vector. This process is performed from LSB to MSB at every sampling instance, where B indicates the word length. The B partial-products used to obtain the output signal are updated from LSB to MSB as well. There exist 2^N partial-products, and the set including all the partial-products is called the Whole Adaptive Function Space (WAFS). Furthermore, the DA-ADF using a multi-memory block structure that uses a divided WAFS (MDA-ADF) (Wei & Lou, 1986; Tsunekawa et al., 1999) and the MDA-ADF using a half-memory algorithm based on the pseudo-odd symmetry property of the WAFS (HMDA-ADF) have been proposed (Takahashi et al., 2001). The divided WAFS is denoted DWAFS.
In this chapter, a new algorithm and an effective architecture for the MDA-ADF are discussed. The objectives are improvements of the MDA-ADF while permitting an increase in the amount of hardware and power dissipation. The convergence properties of the new algorithm are evaluated by computer simulations, and the efficiency of the proposed VLSI architecture is also discussed.
The output signal of an adaptive filter is represented as

y(k) = \sum_{i=0}^{N-1} w_i(k)\, s(k-i),   (1)

where s(k) is the filter input signal and w_i(k) is the i-th tap coefficient of the adaptive filter. Widrow’s LMS algorithm (Widrow et al., 1975) is represented as

\mathbf{w}(k+1) = \mathbf{w}(k) + 2\mu\, e(k)\, \mathbf{s}(k),

where \mathbf{w}(k) = [w_0(k), …, w_{N-1}(k)]^T, \mathbf{s}(k) = [s(k), …, s(k-N+1)]^T, and e(k), μ and d(k) are the error signal, the step-size parameter and the desired signal, respectively. The step-size parameter determines the convergence speed and the accuracy of the estimation. The error signal is obtained by

e(k) = d(k) - y(k).
The fundamental structure of the LMS adaptive filter is shown in Fig 1. The filter input signal s(k) is fed into the delay-line and shifted to the right at every sampling instance. The taps of the delay-line provide the delayed input signals corresponding to the depth of the delay elements. The tap outputs are multiplied by the corresponding coefficients, and the sum of these products is the output of the LMS adaptive filter. The error signal is defined as the difference between the desired signal and the filter output signal. The tap coefficients are updated using the products of the input signals and the scaled error signal.
Fig 1 Fundamental Structure of the 4-tap LMS adaptive filter
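The structure of Fig 1 can be sketched directly from Eq. (1), the error definition and the Widrow update given above. The example below is a minimal system-identification simulation with an assumed unknown 4-tap system and an illustrative step size; it is not the authors' code.

```python
import numpy as np

# A 4-tap LMS adaptive filter following the structure of Fig. 1:
# output y(k) per Eq. (1), error e(k) = d(k) - y(k), and the Widrow update
# w(k+1) = w(k) + 2*mu*e(k)*s(k). Unknown system and constants are illustrative.
N, mu, n_samples = 4, 0.01, 5000
rng = np.random.default_rng(5)

h = np.array([0.6, -0.3, 0.2, 0.1])        # assumed unknown system to identify
s_in = rng.normal(size=n_samples)          # filter input signal s(k)
d = np.convolve(s_in, h)[:n_samples]       # desired signal from the unknown system

w = np.zeros(N)                            # adaptive tap coefficients
s = np.zeros(N)                            # delay line (tap input vector)
for k in range(n_samples):
    s = np.roll(s, 1)                      # shift the delay line by one sample
    s[0] = s_in[k]                         # newest input sample enters the line
    y = w @ s                              # filter output, Eq. (1)
    e = d[k] - y                           # error signal
    w = w + 2 * mu * e * s                 # Widrow LMS coefficient update

print("identified taps:", np.round(w, 3))
print("true taps      :", h)
```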
3 LMS adaptive filter using distributed arithmetic
In the following discussions, the fundamentals of the DA on the 2’s complement representation and the derivation of the DA-ADF are explained. The degradation of the convergence property and the drastic increase of the amount of hardware in the DA-ADF are the serious problems for its higher order implementation. As the solutions to overcome these problems, the multi-memory block structure and the half-memory algorithm based on the pseudo-odd symmetry property of the WAFS are explained.
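To see why the amount of hardware grows so quickly, consider the memory needed to hold the partial-products. The sketch below uses simple arithmetic only: a single WAFS needs 2^N words, and the multi-memory block structure is assumed here to split the N inputs into M blocks of N/M inputs each (an assumption about the partitioning, not a figure taken from this chapter).

```python
# Rough memory comparison: a single WAFS needs 2**N words, while dividing the
# N inputs into M blocks (assumed N/M inputs per block) needs M * 2**(N/M) words.
def wafs_words(N: int) -> int:
    return 2 ** N

def divided_wafs_words(N: int, M: int) -> int:
    assert N % M == 0, "assume the taps divide evenly into blocks"
    return M * 2 ** (N // M)

for N in (8, 16, 32):
    print(f"N={N:2d}: WAFS={wafs_words(N):>14,} words, "
          f"divided (M=4)={divided_wafs_words(N, 4):>8,} words")
```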
3.1 Distributed arithmetic
The DA is an efficient calculation method for an inner product by a table lookup method (Peled & Liu, 1974). Now, let’s consider the inner product

y = \mathbf{a}^T \mathbf{v} = \sum_{i=1}^{N} a_i v_i,   (6)

where \mathbf{a} = [a_1, a_2, …, a_N]^T is a constant vector and \mathbf{v} = [v_1, v_2, …, v_N]^T is an input vector, and the 2’s complement representation of each element

v_i = -v_i^0 + \sum_{k=1}^{B-1} v_i^k\, 2^{-k}.   (9)
In Eq. (9), v_i^k indicates the k-th bit of v_i, i.e., 0 or 1. By substituting Eq. (9) into Eq. (6), we obtain

y = -\sum_{i=1}^{N} a_i v_i^0 + \sum_{k=1}^{B-1}\Big(\sum_{i=1}^{N} a_i v_i^k\Big) 2^{-k}.   (10)

Eq. (10) indicates that the inner product y is obtained as the weighted sum of the partial-products. The first term of the right side is weighted by −1, i.e., the sign bit, and the following terms are weighted by 2^{-k}. Fig 2 shows the fundamental structure of the FIR filter using the DA (DA-FIR). The function table is realized using a Read Only Memory (ROM), and the right-shift and addition operation is realized using an adder and a register. The ROM holds the partial-products determined in advance by the tap coefficient vector and addressed by the bit-pattern of the input signal vector. From the above discussion, the operation time depends only on the word length B, not on the number of terms N, fundamentally. This means that the output latency depends only on the word length B. The FIR filter using the DA can be implemented without multipliers, that is, it is possible to reduce the amount of hardware.
Fig 2 Fundamental structure of the FIR filter using distributed arithmetic
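The bit-serial mechanism of Fig 2 can be modelled in software. The example below is an illustrative sketch, not a hardware description: it assumes B-bit fractional 2's complement inputs in [-1, 1), precomputes the 2^N-entry function table from the constant coefficients as in Eq. (10), and forms the inner product by shift-and-add over the B bit-slices, with the sign-bit slice weighted by -1.

```python
import numpy as np

def da_inner_product(a, v, B=16):
    """Distributed-arithmetic inner product of a constant vector a with input v.

    Each v_i is quantized to a B-bit fractional 2's complement value in [-1, 1);
    the partial-products are read from a 2**N entry function table (the ROM).
    """
    N = len(a)
    # Quantize: v_i = -b0 + sum_{k=1..B-1} b_k * 2**-k  (bits as in Eq. (9)).
    q = np.clip(np.round(np.asarray(v) * 2 ** (B - 1)),
                -2 ** (B - 1), 2 ** (B - 1) - 1).astype(np.int64)
    codes = q & ((1 << B) - 1)                 # B-bit 2's complement codes

    # Function table: entry `addr` holds the sum of a_i over the set bits of addr.
    table = [sum(a[i] for i in range(N) if (addr >> i) & 1)
             for addr in range(1 << N)]

    # Bit-serial shift-and-add over the B bit-slices; the sign slice is weighted by -1.
    y = 0.0
    for k in range(B):
        bits = (codes >> (B - 1 - k)) & 1      # k-th bit of every v_i
        addr = int(np.sum(bits << np.arange(N)))
        y += (-1.0 if k == 0 else 2.0 ** (-k)) * table[addr]
    return y

a = [0.25, -0.5, 0.125, 0.75]                  # constant coefficient vector
v = [0.3, -0.2, 0.9, -0.6]                     # input vector (illustrative values)
print("DA result     :", da_inner_product(a, v))
print("direct product:", float(np.dot(a, v)))
```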
3.2 Derivation of LMS adaptive algorithm using distributed arithmetic
The derivation of the LMS algorithm using the DA on the 2’s complement representation is as follows. The N-th order input signal vector in Eq. (1) is defined as

\mathbf{s}(k) = [\,s(k),\; s(k-1),\; …,\; s(k-N+1)\,]^T.
In Eq. (12) and Eq. (13), an address matrix A(k), which is determined by the bit pattern of the input signal vector, is represented. P(k) is a subset of the WAFS including the elements specified by the row vectors (access vectors) of the address matrix. Now, multiplying both sides by A^T(k), Eq. (4) becomes
To overcome this problem, the simplification of the term A^T(k)A(k)F in Eq. (21) has also been achieved on the 2’s complement representation (Tsunekawa et al., 1999). By using the relation