
10 Fast Matrix Computations


DOCUMENT INFORMATION

Title: 10 Fast Matrix Computations
Author: Andrew E. Yagle
Editors: Vijay K. Madisetti, Douglas B. Williams
Institution: University of Michigan
Field: Electrical Engineering
Document type: Book chapter
Year of publication: 1999
City: Boca Raton
Number of pages: 10
File size: 177.96 KB

Contents



Yagle, A.E. “Fast Matrix Computations.”

Digital Signal Processing Handbook.

Ed. Vijay K. Madisetti and Douglas B. Williams. Boca Raton: CRC Press LLC, 1999.


Fast Matrix Computations

Andrew E. Yagle

University of Michigan

10.1 Introduction
10.2 Divide-and-Conquer Fast Matrix Multiplication
Strassen Algorithm • Divide-and-Conquer • Arbitrary Precision Approximation (APA) Algorithms • Number Theoretic Transform (NTT) Based Algorithms
10.3 Wavelet-Based Matrix Sparsification
Overview • The Wavelet Transform • Wavelet Representations of Integral Operators • Heuristic Interpretation of Wavelet Sparsification
References

10.1 Introduction

This chapter presents two major approaches to fast matrix multiplication. We restrict our attention to matrix multiplication, excluding matrix addition and matrix inversion, since matrix addition admits no fast algorithm structure (save for the obvious parallelization), and matrix inversion (i.e., solution of large linear systems of equations) is generally performed by iterative algorithms that require repeated matrix-matrix or matrix-vector multiplications. Hence, matrix multiplication is the real problem of interest.

The first approach is the divide-and-conquer strategy made possible by Strassen's [1] remarkable reformulation of non-commutative 2 × 2 matrix multiplication. We also present the APA (arbitrary precision approximation) algorithms, which improve on Strassen's result at the price of approximation, and a recent result that reformulates matrix multiplication as convolution and applies number theoretic transforms. The second approach is to use a wavelet basis to sparsify the representation of Calderon-Zygmund operators as matrices. Since electromagnetic Green's functions are Calderon-Zygmund operators, this has proven to be useful in solving integral equations in electromagnetics. The sparsified matrix representation is used in an iterative algorithm to solve the linear system of equations associated with the integral equations, greatly reducing the computation. We also present some new insights that make the wavelet-induced sparsification seem less mysterious.

10.2 Divide-and-Conquer Fast Matrix Multiplication

10.2.1 Strassen Algorithm

It is not obvious that there should be any way to perform matrix multiplication other than using the definition of matrix multiplication, for which multiplying two N × N matrices requires N^3 multiplications and additions (N for each of the N^2 elements of the resulting matrix). However, in 1969 Strassen [1] made the remarkable observation that the product of two 2 × 2 matrices



$$
\begin{bmatrix} a_{1,1} & a_{1,2}\\ a_{2,1} & a_{2,2}\end{bmatrix}
\begin{bmatrix} b_{1,1} & b_{1,2}\\ b_{2,1} & b_{2,2}\end{bmatrix}
=
\begin{bmatrix} c_{1,1} & c_{1,2}\\ c_{2,1} & c_{2,2}\end{bmatrix}
\qquad (10.1)
$$

may be computed using only seven multiplications (fewer than the obvious eight), as

$$
\begin{aligned}
m_1 &= (a_{1,2} - a_{2,2})(b_{2,1} + b_{2,2}); & m_3 &= (a_{1,1} - a_{2,1})(b_{1,1} + b_{1,2})\\
m_2 &= (a_{1,1} + a_{2,2})(b_{1,1} + b_{2,2}) & &\\
m_4 &= (a_{1,1} + a_{1,2})\,b_{2,2}; & m_7 &= (a_{2,1} + a_{2,2})\,b_{1,1}\\
m_5 &= a_{1,1}(b_{1,2} - b_{2,2}); & m_6 &= a_{2,2}(b_{2,1} - b_{1,1})\\
c_{1,1} &= m_1 + m_2 - m_4 + m_6; & c_{1,2} &= m_4 + m_5\\
c_{2,1} &= m_6 + m_7; & c_{2,2} &= m_2 - m_3 + m_5 - m_7
\end{aligned}
\qquad (10.2)
$$

A vital feature of (10.2) is that it is non-commutative, i.e., it does not depend on the commutative property of multiplication. This can be seen easily by noting that each of the m_i is the product of a linear combination of the elements of A by a linear combination of the elements of B, in that order, so that it is never necessary to use, say, a_{2,2} b_{2,1} = b_{2,1} a_{2,2}. We note there exist commutative algorithms for 2 × 2 matrix multiplication that require even fewer operations, but they are of little practical use.

The significance of noncommutativity is that the noncommutative algorithm (10.2) may be applied as is to block matrices. That is, if the a_{i,j}, b_{i,j}, and c_{i,j} in (10.1) and (10.2) are replaced by block matrices, (10.2) is still true. Since matrix multiplication can be subdivided into block submatrix operations (i.e., (10.1) is still true if a_{i,j}, b_{i,j}, and c_{i,j} are replaced by block matrices), this immediately leads to a divide-and-conquer fast algorithm.
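As a quick numerical sketch (Python with NumPy; the helper name strassen_2x2 and the test matrices are ours, not from [1]), the following applies (10.2) directly to the 2 × 2 blocks of a 4 × 4 product and checks the result against ordinary multiplication. Note that every product formed is an A-block times a B-block, in that order.

```python
import numpy as np

def strassen_2x2(a, b):
    """Apply (10.2) to 2x2 arrays of blocks; each block is a NumPy matrix."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    m1 = (a12 - a22) @ (b21 + b22)
    m2 = (a11 + a22) @ (b11 + b22)
    m3 = (a11 - a21) @ (b11 + b12)
    m4 = (a11 + a12) @ b22
    m5 = a11 @ (b12 - b22)
    m6 = a22 @ (b21 - b11)
    m7 = (a21 + a22) @ b11
    return [[m1 + m2 - m4 + m6, m4 + m5],
            [m6 + m7, m2 - m3 + m5 - m7]]

# 4x4 example: each block is itself a 2x2 matrix, so the factors do not commute.
rng = np.random.default_rng(0)
A = rng.integers(-5, 5, (4, 4))
B = rng.integers(-5, 5, (4, 4))
to_blocks = lambda M: [[M[:2, :2], M[:2, 2:]], [M[2:, :2], M[2:, 2:]]]
C = np.block(strassen_2x2(to_blocks(A), to_blocks(B)))
assert np.array_equal(C, A @ B)
```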

10.2.2 Divide-and-Conquer

To see this, consider the 2^n × 2^n matrix multiplication AB = C, where A, B, C are all 2^n × 2^n matrices. Using the usual definition, this requires (2^n)^3 = 8^n multiplications and additions. But if A, B, C are subdivided into 2^{n-1} × 2^{n-1} blocks a_{i,j}, b_{i,j}, c_{i,j}, then AB = C becomes (10.1), which can be implemented with (10.2), since (10.2) does not require the products of subblocks of A and B to commute. Thus the 2^n × 2^n matrix multiplication AB = C can actually be implemented using only seven matrix multiplications of 2^{n-1} × 2^{n-1} subblocks of A and B. And these subblock multiplications can in turn be broken down by using (10.2) to implement them as well. The end result is that the 2^n × 2^n matrix multiplication AB = C can be implemented using only 7^n multiplications, instead of 8^n.
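A minimal sketch of the resulting recursion (Python/NumPy; the routine name and the crossover size below which the definition is used are our own choices) looks like this:

```python
import numpy as np

def strassen(A, B, leaf=64):
    """Strassen multiplication of square matrices whose size is a power of two."""
    n = A.shape[0]
    if n <= leaf:                        # below the crossover, use the definition
        return A @ B
    h = n // 2
    a11, a12, a21, a22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    b11, b12, b21, b22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    m1 = strassen(a12 - a22, b21 + b22, leaf)
    m2 = strassen(a11 + a22, b11 + b22, leaf)
    m3 = strassen(a11 - a21, b11 + b12, leaf)
    m4 = strassen(a11 + a12, b22, leaf)
    m5 = strassen(a11, b12 - b22, leaf)
    m6 = strassen(a22, b21 - b11, leaf)
    m7 = strassen(a21 + a22, b11, leaf)
    return np.block([[m1 + m2 - m4 + m6, m4 + m5],
                     [m6 + m7, m2 - m3 + m5 - m7]])

A = np.random.rand(256, 256)
B = np.random.rand(256, 256)
assert np.allclose(strassen(A, B), A @ B)
```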

The computational savings grow as the matrix size increases. For n = 5 (32 × 32 matrices) the savings is about 50%. For n = 12 (4096 × 4096 matrices) the savings is about 80%. The savings as a fraction can be made arbitrarily close to unity by taking sufficiently large matrices. Another way of looking at this is to note that N × N matrix multiplication requires O(N^{log_2 7}) = O(N^{2.807}) < N^3 multiplications using Strassen.

Of course, we are not limited to subdividing into 2 × 2 = 4 subblocks. Fast non-commutative algorithms for 3 × 3 matrix multiplication requiring only 23 < 3^3 = 27 multiplications were found by exhaustive search in [2] and [3]; 23 seems to be optimal. Repeatedly subdividing AB = C into 3 × 3 = 9 subblocks computes a 3^n × 3^n matrix multiplication in 23^n < 27^n multiplications; N × N matrix multiplication requires O(N^{log_3 23}) = O(N^{2.854}) multiplications, so this is not quite as good as using (10.2). A fast noncommutative algorithm for 5 × 5 matrix multiplication requiring only 102 < 5^3 = 125 multiplications was found in [4]; this also seems to be optimal. Using this algorithm, N × N matrix multiplication requires O(N^{log_5 102}) = O(N^{2.874}) multiplications, so this is even worse. Of course, the idea is to write N = 2^a 3^b 5^c for some a, b, c and subdivide into 2 × 2 = 4 subblocks a times, then subdivide into 3 × 3 = 9 subblocks b times, etc. The total number of multiplications is then 7^a 23^b 102^c < 8^a 27^b 125^c = N^3.
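The bookkeeping is easy to verify numerically; a small sketch (multiplication counts only, additions ignored):

```python
# Multiplication counts for N = 2^a * 3^b * 5^c under the mixed subdivision,
# versus the N^3 of the definition.
def fast_mults(a, b, c):
    return 7**a * 23**b * 102**c

for a, b, c in [(5, 0, 0), (2, 2, 0), (0, 0, 2)]:
    N = 2**a * 3**b * 5**c
    print(N, fast_mults(a, b, c), N**3)   # e.g. N = 32: 16807 vs 32768
```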

Note that we have not mentioned additions. Readers familiar with nesting fast convolution algorithms will know why; now we review why reducing multiplications is much more important than reducing additions when nesting algorithms. The reason is that at each nesting stage (reversing the divide-and-conquer to build up algorithms for multiplying large matrices from (10.2)), each scalar addition is replaced by a matrix addition (which requires N^2 additions for N × N matrices), and each scalar multiplication is replaced by a matrix multiplication (which requires N^3 multiplications and additions for N × N matrices). Although we are reducing N^3 to about N^{2.8}, it is clear that each multiplication will produce more multiplications and additions as we nest than each addition. So reducing the number of multiplications from eight to seven in (10.2) is well worth the extra additions incurred. In fact, the number of additions is also O(N^{2.807}).

The design of these base algorithms has been based on the theory of bilinear and trilinear forms. The review paper [5] and book [6] of Pan are good introductions to this theory. We note that reducing the exponent of N in N × N matrix multiplication is an area of active research. This exponent has been reduced to below 2.5; a known lower bound is two. However, the resulting algorithms are too complicated to be useful.

10.2.3 Arbitrary Precision Approximation (APA) Algorithms

APA algorithms are noncommutative algorithms for 2 × 2 and 3 × 3 matrix multiplication that require even fewer multiplications than the Strassen-type algorithms, but at the price of requiring longer word lengths. Proposed by Bini [7], the APA algorithm for multiplying two 2 × 2 matrices is this:

$$
\begin{aligned}
p_1 &= (a_{2,1} + \epsilon a_{1,2})(b_{2,1} + \epsilon b_{1,2})\\
p_2 &= (-a_{2,1} + \epsilon a_{1,1})(b_{1,1} + \epsilon b_{1,2})\\
p_3 &= (a_{2,2} - \epsilon a_{1,2})(b_{2,1} + \epsilon b_{2,2})\\
p_4 &= a_{2,1}(b_{1,1} - b_{2,1})\\
p_5 &= (a_{2,1} + a_{2,2})\,b_{2,1}\\
c_{1,1} &= (p_1 + p_2 + p_4)/\epsilon - \epsilon\,(a_{1,1} + a_{1,2})\,b_{1,2}\\
c_{2,1} &= p_4 + p_5\\
c_{2,2} &= (p_1 + p_3 - p_5)/\epsilon - \epsilon\,a_{1,2}(b_{1,2} - b_{2,2})
\end{aligned}
\qquad (10.3)
$$

If we now let ε → 0, the second terms in (10.3) become negligible next to the first terms, and so they need not be computed. Hence, three of the four elements of C = AB may be computed using only five multiplications. c_{1,2} may be computed using a sixth multiplication, so that, in fact, two 2 × 2 matrices may be multiplied to arbitrary accuracy using only six multiplications. The APA 3 × 3 matrix multiplication algorithm requires 21 multiplications. Note that APA algorithms improve on the exact Strassen-type algorithms (6 < 7, 21 < 23).
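A small numerical sketch of (10.3) (Python/NumPy; the value of ε and the test matrices are arbitrary choices): the three entries c_{1,1}, c_{2,1}, and c_{2,2} are recovered to within O(ε) from the five products p_1, ..., p_5, with the second (error) terms of (10.3) simply dropped.

```python
import numpy as np

def apa_three_entries(A, B, eps=1e-4):
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    p1 = (a21 + eps * a12) * (b21 + eps * b12)
    p2 = (-a21 + eps * a11) * (b11 + eps * b12)
    p3 = (a22 - eps * a12) * (b21 + eps * b22)
    p4 = a21 * (b11 - b21)
    p5 = (a21 + a22) * b21
    c11 = (p1 + p2 + p4) / eps      # error term eps*(a11 + a12)*b12 dropped
    c21 = p4 + p5                   # exact
    c22 = (p1 + p3 - p5) / eps      # error term eps*a12*(b12 - b22) dropped
    return c11, c21, c22

A = np.array([[2.0, 4.0], [3.0, 5.0]])
B = np.array([[9.0, 8.0], [7.0, 6.0]])
print(apa_three_entries(A, B))      # approximately (46, 62, 54); exact C = [[46,40],[62,54]]
```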

The APA algorithms are often described as being numerically unstable, due to roundoff error as ε → 0. We believe that an electrical engineering perspective on these algorithms puts them in a light different from that of the mathematical perspective. In fixed-point implementation, the computation AB = C can be scaled to operations on integers, and the p_i can be bounded. Then it is easy to set ε to a sufficiently small (negative) power of two to ensure that the second terms in (10.3) do not overlap the first terms, provided that the wordlength is long enough. Thus, the reputation for instability is undeserved. However, the requirement of large wordlengths to be multiplied seems also to have escaped notice; this may be a more serious problem in some architectures.

The divide-and-conquer and resulting nesting of APA algorithms work the same way as for the Strassen-type algorithms. N × N matrix multiplication using (10.3) requires O(N^{log_2 6}) = O(N^{2.585}) multiplications, which improves on the O(N^{2.807}) multiplications using (10.2). But the wordlengths are longer.

A design methodology for fast matrix multiplication algorithms by grouping terms has been proposed in a series of papers by Pan (see References [5] and [6]). While this has proven quite fruitful, the methodology of grouping terms becomes somewhat ad hoc.

10.2.4 Number Theoretic Transform (NTT) Based Algorithms

An approach similar in flavor to the APA algorithms, but more flexible, has been taken recently in [8]. First, matrix multiplication is reformulated as a linear convolution, which can be implemented as the multiplication of two polynomials using the z-transform. Second, the variable z is scaled, producing a scaled convolution, which is then made cyclic. This aliases some quantities, but they are separated by a power of the scaling factor. Third, the scaled convolution is computed using pseudo-number-theoretic transforms. Finally, the various components of the product matrix are read off of the convolution, using the fact that the elements of the product matrix are bounded. This can be done without error if the scaling factor is sufficiently large.

This approach yields algorithms that require the same number of multiplications as APA, or fewer, for 2 × 2 and 3 × 3 matrices. The multiplicands are again sums of scaled matrix elements as in APA. However, the design methodology is quite simple and straightforward, and the reason why the fast algorithm exists is now clear, unlike for the APA algorithms. Also, the integer computations inherent in this formulation make possible the engineering insights into APA noted above.

We reformulate the product of two N × N matrices as the linear convolution of a sequence of length N^2 and a sparse sequence of length N^3 − N + 1. This results in a sequence of length N^3 + N^2 − N, from which elements of the product matrix may be obtained. For convenience, we write the linear convolution as the product of two polynomials. This result (of [8]) seems to be new, although a similar result is briefly noted in ([3], p. 197). Define

$$
\left( \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} a_{i+jN}\, x^{\,i+jN} \right)
\left( \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} b_{(N-1-i)+jN}\, x^{\,N(i+jN)} \right)
= \sum_{i=0}^{N^3+N^2-N-1} c_i\, x^i ,
$$
$$
c_{i,j} = c_{N^2-N+i+jN^2}\, ; \qquad 0 \le i, j \le N-1 \qquad (10.4)
$$

Note that the coefficients of all three polynomials are read off of the matrices A, B, C column-by-column (each column of B is reversed), and the result is noncommutative. For example, the 2 × 2 matrix multiplication (10.1) becomes

$$
\left( a_{1,1} + a_{2,1} x + a_{1,2} x^2 + a_{2,2} x^3 \right)
\left( b_{2,1} + b_{1,1} x^2 + b_{2,2} x^4 + b_{1,2} x^6 \right)
$$
$$
= \ast + \ast x + c_{1,1} x^2 + c_{2,1} x^3 + \ast x^4 + \ast x^5 + c_{1,2} x^6 + c_{2,2} x^7 + \ast x^8 + \ast x^9 , \qquad (10.5)
$$


where ∗ denotes an irrelevant quantity. In (10.5), substitute x = sz and take the result mod (z^6 − 1). This gives



$$
\left( a_{1,1} + a_{2,1} sz + a_{1,2} s^2 z^2 + a_{2,2} s^3 z^3 \right)
\left( (b_{2,1} + b_{1,2} s^6) + b_{1,1} s^2 z^2 + b_{2,2} s^4 z^4 \right)
$$
$$
= (\ast + c_{1,2} s^6) + (\ast s + c_{2,2} s^7) z + (c_{1,1} s^2 + \ast s^8) z^2
+ (c_{2,1} s^3 + \ast s^9) z^3 + \ast z^4 + \ast z^5 \; ; \quad \mathrm{mod}\,(z^6 - 1) \qquad (10.6)
$$

If |c_{i,j}|, |∗| < s^6, then the ∗ and the c_{i,j} may be separated without error, since both are known to be integers. If s is a power of two, c_{1,2} may be obtained by discarding the 6 log_2 s least significant bits in the binary representation of ∗ + c_{1,2} s^6. The polynomial multiplication mod (z^6 − 1) can be computed using number-theoretic transforms [9] using six multiplications. Hence, 2 × 2 matrix multiplication requires six multiplications. Similarly, 3 × 3 matrices may be multiplied using 21 multiplications. Note that these are the same numbers required by the APA algorithms, that the quantities multiplied are again sums of scaled matrix elements, and that the results are again sums in which one quantity is partitioned from another quantity which is of no interest.
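The reformulation is easy to exercise numerically. The sketch below (Python/NumPy; the function name is ours) builds the two sequences of (10.4) for an N × N product, convolves them, and reads the product entries off the result. No scaling or modular reduction is applied, so this is the plain linear-convolution form of (10.5) rather than the NTT implementation of [8].

```python
import numpy as np

def matmul_by_convolution(A, B):
    """Form C = AB by the linear convolution of (10.4)-(10.5)."""
    N = A.shape[0]
    a_seq = A.flatten(order="F")                       # A scanned column by column
    b_seq = np.zeros(N**3 - N + 1, dtype=A.dtype)      # sparse sequence for B
    for j in range(N):
        for i in range(N):
            b_seq[N * (i + j * N)] = B[N - 1 - i, j]   # columns of B reversed
    c_seq = np.convolve(a_seq, b_seq)                  # length N^3 + N^2 - N
    C = np.empty_like(A)
    for j in range(N):
        for i in range(N):
            C[i, j] = c_seq[N * N - N + i + j * N * N]
    return C

A = np.array([[2, 4], [3, 5]])
B = np.array([[9, 8], [7, 6]])
print(matmul_by_convolution(A, B))                     # [[46 40] [62 54]]
```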

However, this approach is more flexible than the APA approach (see [8]). As an extreme case, setting z = 1 in (10.6) computes a 2 × 2 matrix multiplication using ONE (very long wordlength) multiplication! For example, using s = 100,



$$
\begin{bmatrix} 2 & 4\\ 3 & 5 \end{bmatrix}
\begin{bmatrix} 9 & 8\\ 7 & 6 \end{bmatrix}
=
\begin{bmatrix} 46 & 40\\ 62 & 54 \end{bmatrix}
\qquad (10.7)
$$

becomes the single scalar multiplication

$$
(5{,}040{,}302)\,(8{,}000{,}600{,}090{,}007) = 40{,}325{,}440{,}634{,}862{,}462{,}114 \qquad (10.8)
$$

This is useful in optical computing architectures for multiplying large numbers.
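In code, this extreme case is just one integer multiplication (a sketch using Python's arbitrary-precision integers; s = 100 suffices here because every coefficient of the product polynomial is below 100):

```python
s = 100
A = [[2, 4], [3, 5]]
B = [[9, 8], [7, 6]]

a_int = A[0][0] + A[1][0] * s + A[0][1] * s**2 + A[1][1] * s**3      # 5 040 302
b_int = B[1][0] + B[0][0] * s**2 + B[1][1] * s**4 + B[0][1] * s**6   # 8 000 600 090 007
prod = a_int * b_int                      # the single multiplication of (10.8)

digits = [(prod // s**k) % s for k in range(10)]   # base-s digits of the product
C = [[digits[2], digits[6]],
     [digits[3], digits[7]]]              # positions 2, 3, 6, 7 as in (10.5)
print(C)                                  # [[46, 40], [62, 54]]
```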

10.3 Wavelet-Based Matrix Sparsification

10.3.1 Overview

A common application of solving large linear systems of equations is the solution of integral equations arising in, say, electromagnetics. The integral equation is transformed into a linear system of equations using Galerkin's method, so that entries in the matrix and vectors of knowns and unknowns are coefficients of basis functions used to represent the continuous functions in the integral equation. Intelligent selection of the basis functions results in a sparse (mostly zero entries) system matrix. The sparse linear system of unknowns is then usually solved using an iterative algorithm, which is where the sparseness becomes an advantage (iterative algorithms require repeated multiplication of the system matrix by the current approximation to the vector of unknowns).

Recently, wavelets have been recognized as a good choice of basis function for a wide variety of applications, especially in electromagnetics. This is true because in electromagnetics the kernel of the integral equation is a 2-D or 3-D Green's function for the wave equation, and these are Calderon-Zygmund operators. Using wavelets as basis functions makes the matrix representation of the kernel drop off rapidly away from the main diagonal, more rapidly than direct discretization of the integral equation would produce.

Here we quickly review the wavelet transform as a representation of continuous functions and show how it sparsifies Calderon-Zygmund integral operators. We also provide some insight into why this happens and present some alternatives that make the sparsification less mysterious. We present our results in terms of continuous (integral) operators, rather than discrete matrices, since this is the proper presentation for applications, and also since similar results can be obtained for the explicitly discrete case.


10.3.2 The Wavelet Transform

We will not attempt to present even an overview of the rich subject of wavelets. The reader is urged to consult the many papers and textbooks (e.g., [10]) now being published on the subject. Instead, we restrict our attention to aspects of wavelets essential to sparsification of matrix operator representations.

The wavelet transform of an L^2 function f(x) is defined as

$$
f_i(n) = 2^{i/2} \int_{-\infty}^{\infty} f(x)\,\psi(2^i x - n)\,dx ; \qquad
f(x) = \sum_i \sum_n f_i(n)\,\psi(2^i x - n)\,2^{i/2} \qquad (10.9)
$$

where {ψ(2^i x − n), i, n ∈ Z} is a complete orthonormal basis for L^2. That is, L^2 (the space of square-integrable functions) is spanned by dilations (scalings) and translations of a wavelet basis function ψ(x). Constructing this ψ(x) is nontrivial, but has been done extensively in the literature.

Since the summations must be truncated to finite intervals in practice, we define the wavelet scaling function φ(x), whose translations on a given scale span the space spanned by the wavelet basis function ψ(x) at all translations and at scales coarser than the given scale. Then we can write

$$
f(x) = 2^{I/2} \sum_n c_I(n)\,\phi(2^I x - n) + \sum_{i=I}^{\infty} \sum_n f_i(n)\,\psi(2^i x - n)\,2^{i/2}
$$
$$
c_I(n) = 2^{I/2} \int_{-\infty}^{\infty} f(x)\,\phi(2^I x - n)\,dx \qquad (10.10)
$$

So the projection c_I(n) of f(x) on the scaling function φ(x) at scale I replaces the projections f_i(n) on the basis function ψ(x) at scales coarser (smaller) than I. The scaling function φ(x) is orthogonal to its translations, but (unlike the basis function ψ(x)) is not orthogonal between scales. Truncating the summation at the upper end approximates f(x) at the resolution defined by the finest (largest) scale i; this is somewhat analogous to truncating Fourier series expansions and neglecting high-frequency components.
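As a concrete, if very simple, sketch of the decomposition (10.10), the following Python/NumPy fragment uses the Haar wavelet on a sampled signal: each stage splits the current scaling coefficients into coarser scaling (approximation) coefficients and wavelet (detail) coefficients, and the samples are recovered exactly from the coarsest scaling coefficients plus all the details. (The function names and the sign convention for the detail coefficients are our own choices.)

```python
import numpy as np

def haar_analysis(f, levels):
    """Return coarse-scale scaling coefficients and per-scale wavelet coefficients."""
    c, details = np.asarray(f, dtype=float), []
    for _ in range(levels):
        even, odd = c[0::2], c[1::2]
        details.append((even - odd) / np.sqrt(2))   # wavelet (detail) coefficients
        c = (even + odd) / np.sqrt(2)               # scaling (approximation) coefficients
    return c, details

def haar_synthesis(c, details):
    for d in reversed(details):
        up = np.empty(2 * c.size)
        up[0::2] = (c + d) / np.sqrt(2)
        up[1::2] = (c - d) / np.sqrt(2)
        c = up
    return c

x = np.random.rand(64)
c, details = haar_analysis(x, levels=3)
assert np.allclose(haar_synthesis(c, details), x)   # perfect reconstruction
```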

We also define the 2-D wavelet transform of f(x, y) as

$$
f_{i,j}(m,n) = 2^{i/2}\,2^{j/2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,\psi(2^i x - m)\,\psi(2^j y - n)\,dx\,dy
$$
$$
f(x,y) = \sum_{i,j,m,n} f_{i,j}(m,n)\,\psi(2^i x - m)\,\psi(2^j y - n)\,2^{i/2}\,2^{j/2} \qquad (10.11)
$$

However, it is more convenient to use the 2-D counterpart of (10.10), which is

$$
\begin{aligned}
c_I(m,n) &= 2^{I} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,\phi(2^I x - m)\,\phi(2^I y - n)\,dx\,dy\\
f^{1}_i(m,n) &= 2^{i} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,\phi(2^i x - m)\,\psi(2^i y - n)\,dx\,dy\\
f^{2}_i(m,n) &= 2^{i} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,\psi(2^i x - m)\,\phi(2^i y - n)\,dx\,dy\\
f^{3}_i(m,n) &= 2^{i} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,\psi(2^i x - m)\,\psi(2^i y - n)\,dx\,dy\\
f(x,y) &= \sum_{m,n} c_I(m,n)\,\phi(2^I x - m)\,\phi(2^I y - n)\,2^{I}\\
&\quad + \sum_{i=I}^{\infty} \sum_{m,n} f^{1}_i(m,n)\,\phi(2^i x - m)\,\psi(2^i y - n)\,2^{i}\\
&\quad + \sum_{i=I}^{\infty} \sum_{m,n} f^{2}_i(m,n)\,\psi(2^i x - m)\,\phi(2^i y - n)\,2^{i}\\
&\quad + \sum_{i=I}^{\infty} \sum_{m,n} f^{3}_i(m,n)\,\psi(2^i x - m)\,\psi(2^i y - n)\,2^{i}
\end{aligned}
\qquad (10.12)
$$

Once again, the projection c_I(m, n) on the scaling function at scale I replaces all projections on the basis functions at scales coarser than I.

Some examples of wavelet families, each with its own scaling and basis functions, are the Haar, Battle-Lemarie, Paley-Littlewood, Meyer, and Daubechies wavelets.

An important property of the wavelet basis function ψ(x) is that its first k moments can be made zero, for any integer k [10]:

$$
\int_{-\infty}^{\infty} x^m\,\psi(x)\,dx = 0 , \qquad m = 0, 1, \ldots, k-1 \qquad (10.13)
$$

10.3.3 Wavelet Representations of Integral Operators

We wish to use wavelets to sparsify the L^2 integral operator K(x, y) in

$$
g(x) = \int_{-\infty}^{\infty} K(x, y)\,f(y)\,dy \qquad (10.14)
$$

A common situation: (10.14) is an integral equation with known kernel K(x, y) and known g(x), in which the goal is to compute an unknown function f(y). Often the kernel K(x, y) is the Green's function (spatial impulse response) relating the observed wave field or signal g(x) to the unknown source field or signal f(y).

For example, the Green's function for Laplace's equation in free space is

$$
G(r) = -\frac{1}{2\pi}\,\log r \;\; \text{(2-D)}; \qquad G(r) = \frac{1}{4\pi r} \;\; \text{(3-D)} \qquad (10.15)
$$

where r is the distance separating the points of source and observation. Now consider a line source in an infinite 2-D homogeneous medium, with observations made along the same line. The observed field strength g(x) at position x is

$$
g(x) = -\frac{1}{2\pi} \int_{-\infty}^{\infty} \log|x - y|\; f(y)\,dy \qquad (10.16)
$$

where f(y) is the source strength at position y.

Using Galerkin's method, we expand f(y) and g(x) as in (10.9) and K(x, y) as in (10.11). Using the orthogonality of the basis functions yields

$$
\sum_j \sum_n K_{i,j}(m,n)\, f_j(n) = g_i(m) \qquad (10.17)
$$

Expanding f(y) and g(x) as in (10.10) and K(x, y) as in (10.12) leads to another system of equations, which is difficult notationally to write out in general, but can clearly be done in individual applications.


We note here that the entries in the system matrix in this latter case can be rapidly generated using the fast wavelet algorithm of Mallat (see [10]).

The point of using wavelets is as follows. K(x, y) is a Calderon-Zygmund operator if

$$
\left| \frac{\partial^k}{\partial x^k} K(x,y) \right| + \left| \frac{\partial^k}{\partial y^k} K(x,y) \right| \le \frac{C_k}{|x - y|^{k+1}} \qquad (10.18)
$$

for some k ≥ 1. Note in particular that the Green's functions in (10.15) are Calderon-Zygmund operators. Then the representation (10.12) of K(x, y) has the property [11]

$$
\left| f^{1}_i(m,n) \right| + \left| f^{2}_i(m,n) \right| + \left| f^{3}_i(m,n) \right| \le \frac{C_k}{1 + |m - n|^{k+1}} , \qquad |m - n| > 2k \qquad (10.19)
$$

if the wavelet basis function ψ(x) has its first k moments zero, as in (10.13).

This means that using wavelets satisfying (10.13) sparsifies the matrix representation of the kernel K(x, y). For example, a direct discretization of the 3-D Green's function in (10.15) decays as 1/|m − n| as one moves away from the main diagonal m = n in its matrix representation. However, using wavelets, we can attain the much faster decay rate 1/(1 + |m − n|^{k+1}) far away from the main diagonal.

By neglecting matrix entries less than some threshold (typically 1% of the largest entry), a sparse and mostly banded matrix is obtained. This greatly speeds up the following matrix computations:

1. Multiplication by the matrix, for solving the forward problem of computing the response to a given excitation (as in (10.16));

2. Fast solution of the linear system of equations, for solving the inverse problem of reconstructing the source from a measured response (solving (10.16) as an integral equation). This is typically performed using an iterative algorithm such as the conjugate gradient method. Sparsification is essential for convergence in a reasonable time.

A typical sparsified matrix from an electromagnetics application is shown in Figure 6 of [12]. Battle-Lemarie wavelet basis functions were used to sparsify the Galerkin method matrix in an integral equation for planar dielectric millimeter-wave waveguides, and a 1% threshold was applied (see [12] for details). Note that the matrix is not only sparse but (mostly) banded.
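A rough numerical sketch of the effect (Python/NumPy; this is our own toy construction, not the computation of [12]): a kernel with smooth 1/(1 + |x − y|) off-diagonal decay is expressed in an orthonormal Haar basis and thresholded at 1% of its largest entry. Typically far fewer entries survive the threshold in the wavelet basis than in the original discretization.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal multilevel Haar transform matrix, n a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        top = np.kron(H, [1.0, 1.0]) / np.sqrt(2)                    # averaging rows
        bot = np.kron(np.eye(H.shape[0]), [1.0, -1.0]) / np.sqrt(2)  # detail rows
        H = np.vstack([top, bot])
    return H

n = 256
x = np.arange(n)
K = 1.0 / (1.0 + np.abs(x[:, None] - x[None, :]))   # smooth off-diagonal decay
W = haar_matrix(n) @ K @ haar_matrix(n).T           # kernel in the Haar basis
for name, M in [("original", K), ("Haar basis", W)]:
    kept = np.count_nonzero(np.abs(M) > 0.01 * np.abs(M).max())
    print(name, kept, "of", n * n, "entries above the 1% threshold")
```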

10.3.4 Heuristic Interpretation of Wavelet Sparsification

Why does this sparsification happen? Considerable insight can be gained using (10.13). Let ψ̂(ω) be the Fourier transform of the wavelet basis function ψ(x). Since the first k moments of ψ(x) are zero by (10.13), we can expand ψ̂(ω) in a power series around ω = 0:

$$
\hat{\psi}(\omega) = a_k\,\omega^k + a_{k+1}\,\omega^{k+1} + \cdots
$$

This shows that for small |ω|, taking the wavelet transform of f(x) is roughly equivalent to taking the k-th derivative of f(x). This is confirmed by the observation that many wavelet basis functions bear a striking resemblance to the impulse responses of regularized differentiators. Since K(x, y) is assumed to be a Calderon-Zygmund operator, its k-th derivatives in x and y drop off as 1/|x − y|^{k+1}. Thus, it is not surprising that the wavelet transform of K(x, y), which roughly amounts to taking k-th derivatives, should drop off as 1/|m − n|^{k+1}. Of course there is more to it, but this is why it happens.

It is not surprising that K(x, y) can be sparsified by taking advantage of its derivatives being small. To see a more direct way of accomplishing this, apply integration by parts to (10.14) and take the partial derivative with respect to x. This gives

$$
\frac{dg(x)}{dx} = -\int_{-\infty}^{\infty} \left( \frac{\partial}{\partial x}\frac{\partial}{\partial y} K(x,y) \right) \left( \int_{-\infty}^{y} f(y')\,dy' \right) dy
$$


which will likely sparsify a smooth K(x, y). Of course, higher derivatives can be used until a condition like (10.18) is reached. The operations of integrating f(y) k times and integrating ∂^k g/∂x^k k times (to get back g(x)) can be accomplished using nk ≪ n^2 additions, so considerable savings can result. This is different from using wavelets, but in the same spirit.

References

[1] Strassen, V., Gaussian elimination is not optimal, Numer. Math., 13: 354–356, 1969.

[2] Laderman, J.D., A noncommutative algorithm for multiplying 3 × 3 matrices using 23 multiplications, Bull. Am. Math. Soc., 82: 127–128, 1976.

[3] Johnson, R.W. and McLoughlin, A.M., Noncommutative bilinear algorithms for 3 × 3 matrix multiplication, SIAM J. Comput., 15: 595–603, 1976.

[4] Makarov, O.M., A noncommutative algorithm for multiplying 5 × 5 matrices using 102 multiplications, Inform. Proc. Lett., 23: 115–117, 1986.

[5] Pan, V., How can we speed up matrix multiplication?, SIAM Rev., 26(3): 393–415, 1984.

[6] Pan, V., How Can We Multiply Matrices Faster?, Springer-Verlag, New York, 1984.

[7] Bini, D., Capovani, M., Lotti, G., and Romani, F., O(n^{2.7799}) complexity for matrix multiplication, Inform. Proc. Lett., 8: 234–235, 1979.

[8] Yagle, A.E., Fast algorithms for matrix multiplication using pseudo number theoretic transforms, IEEE Trans. Signal Process., 43: 71–76, 1995.

[9] Nussbaumer, H.J., Fast Fourier Transforms and Convolution Algorithms, Springer-Verlag, Berlin, 1982.

[10] Daubechies, I., Ten Lectures on Wavelets, SIAM, Philadelphia, PA, 1992.

[11] Beylkin, G., Coifman, R., and Rokhlin, V., Fast wavelet transforms and numerical algorithms I, Comm. Pure Appl. Math., 44: 141–183, 1991.

[12] Sabetfakhri, K. and Katehi, L.P.B., Analysis of integrated millimeter wave and submillimeter wave waveguides using orthonormal wavelet expansions, IEEE Trans. Microwave Theor. Technol., 42: 2412–2422, 1994.
