Parallel Programming: for Multicore and Cluster Systems (Part 40)


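where $I$ denotes the $N \times N$ unit matrix, which has the value 1 in the diagonal elements and the value 0 in all other entries. The matrix $B$ has the structure

$$
B = \begin{pmatrix}
4 & -1 & & & \\
-1 & 4 & -1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & -1 & 4 & -1 \\
 & & & -1 & 4
\end{pmatrix}.
$$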

Figure 7.9 illustrates the two-dimensional mesh with five-point stencil (above) and the sparsity structure of the corresponding coefficient matrix $A$ of Formula (7.17).

In summary, Formulas (7.15) and (7.17) represent a linear equation system with a sparse coefficient matrix, which has non-zero elements in the main diagonal and its direct neighbors as well as in the diagonals in distance $N$. Thus, the linear equation system resulting from the Poisson equation has a banded structure, which should be exploited when solving the system. In the following, we present solution methods for linear equation systems with banded structure and start the description with tridiagonal systems. These systems have only three non-zero diagonals: the main diagonal and its two neighbors. A tridiagonal system results, for example, when discretizing the one-dimensional Poisson equation.
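To make the banded structure concrete, the following Python sketch assembles an unscaled five-point coefficient matrix for a small mesh and lists the offsets of its non-zero diagonals. It is only an illustration under stated assumptions: mesh points are numbered row by row, the diagonal entries are 4 and the neighbor entries are -1, and any scaling factor of Formula (7.17) is omitted. The names `five_point_matrix` and `nonzero_offsets` are hypothetical helpers, not taken from the text.

```python
def five_point_matrix(N):
    """Unscaled five-point stencil matrix for an N x N mesh (assumed row-wise numbering)."""
    n = N * N
    A = [[0] * n for _ in range(n)]
    for row in range(N):
        for col in range(N):
            i = row * N + col          # 0-based index of mesh point (row, col)
            A[i][i] = 4                # assumed diagonal value of the stencil
            if col > 0:
                A[i][i - 1] = -1       # left neighbor in the same mesh row
            if col < N - 1:
                A[i][i + 1] = -1       # right neighbor in the same mesh row
            if row > 0:
                A[i][i - N] = -1       # neighbor in the previous mesh row
            if row < N - 1:
                A[i][i + N] = -1       # neighbor in the next mesh row
    return A


def nonzero_offsets(A):
    """Sorted list of diagonal offsets j - i that contain a non-zero entry."""
    return sorted({j - i
                   for i, row in enumerate(A)
                   for j, value in enumerate(row)
                   if value != 0})


print(nonzero_offsets(five_point_matrix(4)))  # [-4, -1, 0, 1, 4]
```

For N = 4 the non-zero entries lie on the main diagonal, its two direct neighbors, and the two diagonals at distance N, which is the structure summarized above.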

7.2.2 Tridiagonal Systems

For the solution of a linear equation system $Ax = y$ with a banded or tridiagonal coefficient matrix $A \in \mathbb{R}^{n \times n}$, specific solution methods can exploit the sparse matrix structure. A matrix $A = (a_{ij})_{i,j=1,\ldots,n} \in \mathbb{R}^{n \times n}$ is called banded when its structure takes the form of a band of non-zero elements around the principal diagonal. More precisely, a matrix $A$ is a banded matrix if there exists $r \in \mathbb{N}$, $r \le n$, with

$$a_{ij} = 0 \quad \text{for } |i - j| > r.$$

The number $r$ is called the semi-bandwidth of $A$. For $r = 1$ a banded matrix is called a tridiagonal matrix. We first consider the solution of tridiagonal systems, which are linear equation systems with a tridiagonal coefficient matrix.
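The definition can be checked mechanically. The sketch below computes the smallest $r$ that satisfies the condition for a dense matrix stored as a list of rows; the function name `semi_bandwidth` and the example matrix `T` are illustrative assumptions.

```python
def semi_bandwidth(A):
    """Smallest r such that A[i][j] == 0 whenever |i - j| > r."""
    r = 0
    for i, row in enumerate(A):
        for j, value in enumerate(row):
            if value != 0:
                r = max(r, abs(i - j))
    return r


# A small tridiagonal example: non-zeros only on the main diagonal and its neighbors.
T = [[2, -1, 0, 0],
     [-1, 2, -1, 0],
     [0, -1, 2, -1],
     [0, 0, -1, 2]]

print(semi_bandwidth(T))  # 1
```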

7.2.2.1 Gaussian Elimination for Tridiagonal Systems

For the solution of a linear equation system $Ax = y$ with tridiagonal matrix $A$, Gaussian elimination can be used. Step $k$ of the forward elimination (without pivoting) results in the following computations, see also Sect. 7.1:

1. Compute $l_{ik} := a_{ik}^{(k)} / a_{kk}^{(k)}$ for $i = k+1, \ldots, n$.

2. Subtract $l_{ik}$ times the $k$th row from the rows $i = k+1, \ldots, n$, i.e., compute

$$a_{ij}^{(k+1)} = a_{ij}^{(k)} - l_{ik} \cdot a_{kj}^{(k)} \quad \text{for } k \le j \le n \text{ and } k < i \le n.$$

Fig. 7.9 Rectangular mesh in the x–y plane of size $N \times N$ and the $n \times n$ coefficient matrix with $n = N^2$ of the corresponding linear equation system of the five-point formula. The sparsity structure of the matrix corresponds to the adjacency relation of the mesh points. The mesh can be considered as adjacency graph of the non-zero elements of the matrix.

The vector $y$ is changed analogously.

Because of the tridiagonal structure of $A$, all matrix elements $a_{ik}$ with $i \ge k+2$ are zero elements, i.e., $a_{ik} = 0$. Thus, in each step $k$ of the Gaussian elimination only one elimination factor $l_{k+1} := l_{k+1,k}$ and only one row with only one new element have to be computed. Using the notation

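$$
A = \begin{pmatrix}
b_1 & c_1 & & & 0 \\
a_2 & b_2 & c_2 & & \\
 & a_3 & b_3 & \ddots & \\
 & & \ddots & \ddots & c_{n-1} \\
0 & & & a_n & b_n
\end{pmatrix} \tag{7.20}
$$

for the matrix elements and starting with $u_1 = b_1$, these computations are

$$l_{k+1} = a_{k+1} / u_k, \qquad u_{k+1} = b_{k+1} - l_{k+1} \cdot c_k. \tag{7.21}$$

After $n - 1$ steps an LU decomposition $A = LU$ of matrix (7.20) with

$$
L = \begin{pmatrix}
1 & & & 0 \\
l_2 & 1 & & \\
 & \ddots & \ddots & \\
0 & & l_n & 1
\end{pmatrix}
\quad \text{and} \quad
U = \begin{pmatrix}
u_1 & c_1 & & 0 \\
 & \ddots & \ddots & \\
 & & u_{n-1} & c_{n-1} \\
0 & & & u_n
\end{pmatrix}
$$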

results. The right-hand side $y$ is transformed correspondingly according to

$$\tilde{y}_{k+1} = y_{k+1} - l_{k+1} \cdot \tilde{y}_k.$$

The solution $x$ is computed from the upper triangular matrix $U$ by a backward substitution, starting with $x_n = \tilde{y}_n / u_n$ and solving the equations $u_i x_i + c_i x_{i+1} = \tilde{y}_i$ one after another, resulting in

$$x_i = \frac{\tilde{y}_i}{u_i} - \frac{c_i}{u_i} x_{i+1} \quad \text{for } i = n-1, \ldots, 1.$$

The computational complexity of the Gaussian elimination is reduced to $O(n)$ for tridiagonal systems. However, the elimination phase computing $l_k$ and $u_k$ according to Eq. (7.21) is inherently sequential, since the computation of $l_{k+1}$ depends on $u_k$ and the computation of $u_{k+1}$ depends on $l_{k+1}$. Thus, in this form the Gaussian elimination or LU decomposition has to be computed sequentially and is not suitable for a parallel implementation.
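The sequential scheme described in this subsection can be sketched in a few lines of Python. This is a minimal illustration without pivoting, assuming that no $u_k$ becomes zero; the function name `solve_tridiagonal`, the 0-based list layout (with `a[0]` and `c[n-1]` unused), and the convention $\tilde{y}_1 = y_1$ are assumptions of the sketch.

```python
def solve_tridiagonal(a, b, c, y):
    """Gaussian elimination for a tridiagonal system (no pivoting).

    a: sub-diagonal (a[0] unused), b: main diagonal,
    c: super-diagonal (c[n-1] unused), y: right-hand side.
    """
    n = len(b)
    u = [0.0] * n          # diagonal of U
    z = [0.0] * n          # transformed right-hand side (y tilde)
    u[0] = b[0]
    z[0] = y[0]
    # Forward elimination: each step needs u[k] from the previous step,
    # which is the sequential dependency discussed in the text.
    for k in range(n - 1):
        l = a[k + 1] / u[k]
        u[k + 1] = b[k + 1] - l * c[k]
        z[k + 1] = y[k + 1] - l * z[k]
    # Backward substitution, starting with the last unknown.
    x = [0.0] * n
    x[n - 1] = z[n - 1] / u[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = (z[i] - c[i] * x[i + 1]) / u[i]
    return x


# Example: tridiag(-1, 2, -1) with exact solution (1, 2, 3, 4).
x = solve_tridiagonal([0, -1, -1, -1], [2, 2, 2, 2], [-1, -1, -1, 0], [0, 0, 0, 5])
```

Both loops perform a constant amount of work per index, which matches the $O(n)$ complexity stated above.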

7.2.2.2 Recursive Doubling for Tridiagonal Systems

An alternative approach for solving a linear equation system with tridiagonal matrix is the method of recursive doubling or cyclic reduction. The methods of recursive doubling and cyclic reduction also use elimination steps but contain potential parallelism [72, 71]. Both techniques can be applied if the coefficient matrix is either symmetric and positive definite or diagonally dominant [115]. The elimination steps in both methods are applied to linear equation systems $Ax = y$ with the matrix structure shown in (7.20), i.e.,

$$
\begin{aligned}
b_1 x_1 + c_1 x_2 &= y_1, \\
a_i x_{i-1} + b_i x_i + c_i x_{i+1} &= y_i \quad \text{for } i = 2, \ldots, n-1, \\
a_n x_{n-1} + b_n x_n &= y_n.
\end{aligned}
$$

The method, which was first introduced by Hockney and Golub in [91], uses two equations $i - 1$ and $i + 1$ to eliminate the variables $x_{i-1}$ and $x_{i+1}$ from equation $i$. This results in a new equivalent equation system with a coefficient matrix with three non-zero diagonals where the diagonals are moved to the outside. Recursive doubling and cyclic reduction can be considered as two implementation variants for the same numerical idea of the method of Hockney and Golub. The implementation of recursive doubling repeats the elimination step, which finally results in a matrix structure in which only the elements in the principal diagonal are non-zero and the solution vector $x$ can be computed easily. Cyclic reduction is a variant of recursive doubling which also eliminates variables using neighboring rows. But in each step the elimination is only applied to half of the equations and, thus, fewer computations are performed. On the other hand, the computation of the solution vector $x$ requires a substitution phase.

We would like to mention that the terms recursive doubling and cyclic reduction are used in different ways in the literature. Cyclic reduction is sometimes used for the numerical method of Hockney and Golub in both implementation variants, see [60, 115]. On the other hand, the term recursive doubling (or full recursive doubling) is sometimes used for a different method, the method of Stone [168]. This method applies the implementation variants sketched above in Eq. (7.21) resulting from the Gaussian elimination, see [61, 173]. In the following, we start the description of recursive doubling for the method of Hockney and Golub according to [61] and [13].

Recursive doubling considers three neighboring equations $i - 1$, $i$, $i + 1$ of the equation system $Ax = y$ with coefficient matrix $A$ in the form (7.20) for $i = 3, 4, \ldots, n - 2$. These equations are

$$
\begin{aligned}
a_{i-1} x_{i-2} + b_{i-1} x_{i-1} + c_{i-1} x_i &= y_{i-1}, \\
a_i x_{i-1} + b_i x_i + c_i x_{i+1} &= y_i, \\
a_{i+1} x_i + b_{i+1} x_{i+1} + c_{i+1} x_{i+2} &= y_{i+1}.
\end{aligned}
$$

Equation $i - 1$ is used to eliminate $x_{i-1}$ from the $i$th equation and equation $i + 1$ is used to eliminate $x_{i+1}$ from the $i$th equation. This is done by reformulating equations $i - 1$ and $i + 1$ to

$$
\begin{aligned}
x_{i-1} &= \frac{y_{i-1}}{b_{i-1}} - \frac{a_{i-1}}{b_{i-1}} x_{i-2} - \frac{c_{i-1}}{b_{i-1}} x_i, \\
x_{i+1} &= \frac{y_{i+1}}{b_{i+1}} - \frac{a_{i+1}}{b_{i+1}} x_i - \frac{c_{i+1}}{b_{i+1}} x_{i+2}
\end{aligned}
$$

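and inserting those descriptions of $x_{i-1}$ and $x_{i+1}$ into equation $i$. The resulting new equation $i$ is

$$a_i^{(1)} x_{i-2} + b_i^{(1)} x_i + c_i^{(1)} x_{i+2} = y_i^{(1)} \tag{7.22}$$

with coefficients

$$
\begin{aligned}
a_i^{(1)} &= \alpha_i^{(1)} \cdot a_{i-1}, \\
b_i^{(1)} &= b_i + \alpha_i^{(1)} \cdot c_{i-1} + \beta_i^{(1)} \cdot a_{i+1}, \\
c_i^{(1)} &= \beta_i^{(1)} \cdot c_{i+1}, \\
y_i^{(1)} &= y_i + \alpha_i^{(1)} \cdot y_{i-1} + \beta_i^{(1)} \cdot y_{i+1},
\end{aligned} \tag{7.23}
$$

and

$$\alpha_i^{(1)} := -a_i / b_{i-1}, \qquad \beta_i^{(1)} := -c_i / b_{i+1}.$$

For the special cases $i = 1, 2, n-1, n$, the coefficients are given by

$$
\begin{aligned}
b_1^{(1)} &= b_1 + \beta_1^{(1)} \cdot a_2, & y_1^{(1)} &= y_1 + \beta_1^{(1)} \cdot y_2, \\
b_n^{(1)} &= b_n + \alpha_n^{(1)} \cdot c_{n-1}, & y_n^{(1)} &= y_n + \alpha_n^{(1)} \cdot y_{n-1}, \\
a_1^{(1)} &= a_2^{(1)} = 0, & c_{n-1}^{(1)} &= c_n^{(1)} = 0.
\end{aligned}
$$

The values for $a_{n-1}^{(1)}$, $a_n^{(1)}$, $b_2^{(1)}$, $b_{n-1}^{(1)}$, $c_1^{(1)}$, $c_2^{(1)}$, $y_2^{(1)}$, and $y_{n-1}^{(1)}$ are defined as in Eq. (7.23). Equation (7.22) forms a linear equation system $A^{(1)} x = y^{(1)}$ with a coefficient matrix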

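$$
A^{(1)} = \begin{pmatrix}
b_1^{(1)} & 0 & c_1^{(1)} & & & 0 \\
0 & b_2^{(1)} & 0 & c_2^{(1)} & & \\
a_3^{(1)} & 0 & b_3^{(1)} & \ddots & \ddots & \\
 & a_4^{(1)} & \ddots & \ddots & \ddots & c_{n-2}^{(1)} \\
 & & \ddots & \ddots & \ddots & 0 \\
0 & & & a_n^{(1)} & 0 & b_n^{(1)}
\end{pmatrix}.
$$

Comparing the structure of $A^{(1)}$ with the structure of $A$, it can be seen that the diagonals are moved to the outside.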
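As a check on the coefficients in Eq. (7.23), the substitution can be written out explicitly. The following derivation only rearranges the equations given above for an inner equation $i$ and introduces no new symbols.

```latex
% Insert the reformulated equations i-1 and i+1 into equation i:
a_i \left( \frac{y_{i-1}}{b_{i-1}} - \frac{a_{i-1}}{b_{i-1}} x_{i-2} - \frac{c_{i-1}}{b_{i-1}} x_i \right)
  + b_i x_i
  + c_i \left( \frac{y_{i+1}}{b_{i+1}} - \frac{a_{i+1}}{b_{i+1}} x_i - \frac{c_{i+1}}{b_{i+1}} x_{i+2} \right)
  = y_i

% With \alpha_i^{(1)} = -a_i / b_{i-1} and \beta_i^{(1)} = -c_i / b_{i+1},
% collecting the terms in x_{i-2}, x_i, x_{i+2} and moving the y terms to the right gives
\alpha_i^{(1)} a_{i-1} \, x_{i-2}
  + \left( b_i + \alpha_i^{(1)} c_{i-1} + \beta_i^{(1)} a_{i+1} \right) x_i
  + \beta_i^{(1)} c_{i+1} \, x_{i+2}
  = y_i + \alpha_i^{(1)} y_{i-1} + \beta_i^{(1)} y_{i+1}
```

Reading off the coefficients of $x_{i-2}$, $x_i$, and $x_{i+2}$ and the right-hand side reproduces $a_i^{(1)}$, $b_i^{(1)}$, $c_i^{(1)}$, and $y_i^{(1)}$.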

In the next step, this method is applied to the equations $i - 2$, $i$, $i + 2$ of the equation system $A^{(1)} x = y^{(1)}$ for $i = 5, 6, \ldots, n - 4$. Equation $i - 2$ is used to eliminate $x_{i-2}$ from the $i$th equation and equation $i + 2$ is used to eliminate $x_{i+2}$ from the $i$th equation. This results in a new $i$th equation

$$a_i^{(2)} x_{i-4} + b_i^{(2)} x_i + c_i^{(2)} x_{i+4} = y_i^{(2)},$$

which contains the variables $x_{i-4}$, $x_i$, and $x_{i+4}$. The cases $i = 1, \ldots, 4, n-3, \ldots, n$ are treated separately as shown for the first elimination step. Altogether a next equation system $A^{(2)} x = y^{(2)}$ results in which the diagonals are further moved to the outside. The structure of $A^{(2)}$ is

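$$
A^{(2)} = \begin{pmatrix}
b_1^{(2)} & 0 & 0 & 0 & c_1^{(2)} & & 0 \\
0 & b_2^{(2)} & 0 & 0 & 0 & \ddots & \\
0 & 0 & \ddots & & & \ddots & c_{n-4}^{(2)} \\
0 & 0 & & \ddots & & & 0 \\
a_5^{(2)} & 0 & & & \ddots & & 0 \\
 & \ddots & \ddots & & & \ddots & 0 \\
0 & & a_n^{(2)} & 0 & 0 & 0 & b_n^{(2)}
\end{pmatrix}.
$$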

The following steps of the recursive doubling algorithm apply the same method to the modified equation system of the last step. Step $k$ transfers the side diagonals $2^k - 1$ positions away from the main diagonal, compared to the original coefficient matrix. This is reached by considering equations $i - 2^{k-1}$, $i$, $i + 2^{k-1}$:

$$
\begin{aligned}
a_{i-2^{k-1}}^{(k-1)} x_{i-2^k} + b_{i-2^{k-1}}^{(k-1)} x_{i-2^{k-1}} + c_{i-2^{k-1}}^{(k-1)} x_i &= y_{i-2^{k-1}}^{(k-1)}, \\
a_i^{(k-1)} x_{i-2^{k-1}} + b_i^{(k-1)} x_i + c_i^{(k-1)} x_{i+2^{k-1}} &= y_i^{(k-1)}, \\
a_{i+2^{k-1}}^{(k-1)} x_i + b_{i+2^{k-1}}^{(k-1)} x_{i+2^{k-1}} + c_{i+2^{k-1}}^{(k-1)} x_{i+2^k} &= y_{i+2^{k-1}}^{(k-1)}.
\end{aligned}
$$

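Equation $i - 2^{k-1}$ is used to eliminate $x_{i-2^{k-1}}$ from the $i$th equation and equation $i + 2^{k-1}$ is used to eliminate $x_{i+2^{k-1}}$ from the $i$th equation. Again, the elimination is performed by computing the coefficients for the next equation system. These coefficients are

$$
\begin{aligned}
a_i^{(k)} &= \alpha_i^{(k)} \cdot a_{i-2^{k-1}}^{(k-1)} && \text{for } i = 2^k + 1, \ldots, n, \text{ and } a_i^{(k)} = 0 \text{ otherwise}, \\
c_i^{(k)} &= \beta_i^{(k)} \cdot c_{i+2^{k-1}}^{(k-1)} && \text{for } i = 1, \ldots, n - 2^k, \text{ and } c_i^{(k)} = 0 \text{ otherwise}, \\
b_i^{(k)} &= \alpha_i^{(k)} \cdot c_{i-2^{k-1}}^{(k-1)} + b_i^{(k-1)} + \beta_i^{(k)} \cdot a_{i+2^{k-1}}^{(k-1)} && \text{for } i = 1, \ldots, n, \\
y_i^{(k)} &= \alpha_i^{(k)} \cdot y_{i-2^{k-1}}^{(k-1)} + y_i^{(k-1)} + \beta_i^{(k)} \cdot y_{i+2^{k-1}}^{(k-1)} && \text{for } i = 1, \ldots, n
\end{aligned} \tag{7.24}
$$

with

$$
\begin{aligned}
\alpha_i^{(k)} &:= -a_i^{(k-1)} / b_{i-2^{k-1}}^{(k-1)} && \text{for } i = 2^{k-1} + 1, \ldots, n, \\
\beta_i^{(k)} &:= -c_i^{(k-1)} / b_{i+2^{k-1}}^{(k-1)} && \text{for } i = 1, \ldots, n - 2^{k-1}.
\end{aligned} \tag{7.25}
$$

The modified equation $i$ results by multiplying equation $i - 2^{k-1}$ from step $k - 1$ with $\alpha_i^{(k)}$, multiplying equation $i + 2^{k-1}$ from step $k - 1$ with $\beta_i^{(k)}$, and adding both to equation $i$. The resulting $i$th equation is

$$a_i^{(k)} x_{i-2^k} + b_i^{(k)} x_i + c_i^{(k)} x_{i+2^k} = y_i^{(k)} \tag{7.26}$$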

with the coefficients (7.24). The cases $k = 1, 2$ are special cases of this formula. The initialization for $k = 0$ is the following:

$$
\begin{aligned}
a_i^{(0)} &= a_i && \text{for } i = 2, \ldots, n, \\
b_i^{(0)} &= b_i && \text{for } i = 1, \ldots, n, \\
c_i^{(0)} &= c_i && \text{for } i = 1, \ldots, n - 1, \\
y_i^{(0)} &= y_i && \text{for } i = 1, \ldots, n
\end{aligned}
$$

and $a_1^{(0)} = 0$, $c_n^{(0)} = 0$. Also, for the steps $k = 0, \ldots, \lceil \log n \rceil$ and $i \in \mathbb{Z} \setminus \{1, \ldots, n\}$ the values

$$a_i^{(k)} = c_i^{(k)} = y_i^{(k)} = 0, \qquad b_i^{(k)} = 1, \qquad x_i = 0$$

are set. After $N = \lceil \log n \rceil$ steps, the original matrix $A$ is transformed into a diagonal matrix $A^{(N)}$,

$$A^{(N)} = \mathrm{diag}(b_1^{(N)}, \ldots, b_n^{(N)}),$$

in which only the main diagonal contains non-zero elements. The solution $x$ of the linear equation system can be directly computed using this matrix and the correspondingly modified vector $y^{(N)}$:

$$x_i = y_i^{(N)} / b_i^{(N)} \quad \text{for } i = 1, 2, \ldots, n.$$

To summarize, the recursive doubling algorithm consists of two main phases:

1. Elimination phase: Compute the values $a_i^{(k)}$, $b_i^{(k)}$, $c_i^{(k)}$, and $y_i^{(k)}$ for $k = 1, \ldots, \lceil \log n \rceil$ and $i = 1, \ldots, n$ according to Eqs. (7.24) and (7.25).

2. Solution phase: Compute $x_i = y_i^{(N)} / b_i^{(N)}$ for $i = 1, \ldots, n$ with $N = \lceil \log n \rceil$.

The first phase consists of $\lceil \log n \rceil$ steps where in each step $O(n)$ values are computed. The sequential asymptotic runtime of the algorithm is therefore $O(n \cdot \log n)$, which is asymptotically slower than the $O(n)$ runtime for the Gaussian elimination approach described earlier. The advantage is that the computations in each step of the elimination phase and of the solution phase are independent and can be performed in parallel. Figure 7.10 illustrates the computations of the recursive doubling algorithm and the data dependencies between different steps.
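The two phases can be sketched sequentially in Python as follows. This is a minimal illustration of Eqs. (7.24) and (7.25), assuming base-2 logarithms and that no $b_i^{(k)}$ becomes zero; the function name `recursive_doubling`, the helper `get`, and the 0-based list layout are assumptions of the sketch. Indices outside $1, \ldots, n$ use the padding values stated above ($a = c = y = 0$, $b = 1$).

```python
def recursive_doubling(a, b, c, y):
    """Recursive doubling for a tridiagonal system (sequential sketch).

    a: sub-diagonal (a[0] ignored), b: main diagonal,
    c: super-diagonal (c[n-1] ignored), y: right-hand side.
    """
    n = len(b)
    a = [0.0] + [float(v) for v in a[1:]]      # a_1 = 0
    b = [float(v) for v in b]
    c = [float(v) for v in c[:-1]] + [0.0]     # c_n = 0
    y = [float(v) for v in y]
    steps = (n - 1).bit_length()               # equals ceil(log2(n)) for n >= 1

    def get(values, i, outside):
        # Padding convention for indices outside the system.
        return values[i] if 0 <= i < n else outside

    for k in range(1, steps + 1):
        d = 2 ** (k - 1)
        na, nb, nc, ny = [], [], [], []
        # All n updates of one step only read values of the previous step,
        # so they could be executed in parallel.
        for i in range(n):
            alpha = -a[i] / get(b, i - d, 1.0)
            beta = -c[i] / get(b, i + d, 1.0)
            na.append(alpha * get(a, i - d, 0.0))
            nc.append(beta * get(c, i + d, 0.0))
            nb.append(b[i] + alpha * get(c, i - d, 0.0) + beta * get(a, i + d, 0.0))
            ny.append(y[i] + alpha * get(y, i - d, 0.0) + beta * get(y, i + d, 0.0))
        a, b, c, y = na, nb, nc, ny

    # Solution phase: the matrix is diagonal now.
    return [y[i] / b[i] for i in range(n)]


# Example: tridiag(-1, 2, -1) with exact solution (1, 2, 3, 4).
x = recursive_doubling([0, -1, -1, -1], [2, 2, 2, 2], [-1, -1, -1, 0], [0, 0, 0, 5])
```

The sketch keeps two copies of the coefficient arrays per step so that every update reads only values from step $k - 1$, mirroring the independence of the computations within a step.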

7.2.2.3 Cyclic Reduction for Tridiagonal Systems

The recursive doubling algorithm offers a large degree of potential parallelism but has a larger computational complexity than the Gaussian elimination caused by computational redundancy.

Fig. 7.10 Dependence graph for the computation steps of the recursive doubling algorithm in the case of three computation steps and eight equations. The computations of step $k$ are shown in column $k$ of the illustration. Column $k$ contains one node for each equation $i$, thus representing the computation of all coefficients needed in step $k$. Column 0 represents the data of the coefficient matrix of the linear system. An edge from a node $i$ in step $k$ to a node $j$ in step $k + 1$ means that the computation at node $j$ needs at least one coefficient computed at node $i$.

The cyclic reduction algorithm is a modification of recursive doubling which reduces the amount of computations to be performed. In each step, half the variables in the equation system are eliminated, which means that only half of the values $a_i^{(k)}$, $b_i^{(k)}$, $c_i^{(k)}$, and $y_i^{(k)}$ are computed. A substitution phase is needed to compute the solution vector $x$. The elimination and the substitution phases of cyclic reduction are described by the following two phases:

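1. Elimination phase: For $k = 1, \ldots, \lfloor \log n \rfloor$ compute $a_i^{(k)}$, $b_i^{(k)}$, $c_i^{(k)}$, and $y_i^{(k)}$ with $i = 2^k, \ldots, n$ and step size $2^k$. The number of equations of the form (7.26) is reduced by a factor of $1/2$ in each step. In step $k = \lfloor \log n \rfloor$ there is only one equation left for $i = 2^N$ with $N = \lfloor \log n \rfloor$.

2. Substitution phase: For $k = \lfloor \log n \rfloor, \ldots, 0$ compute $x_i$ according to Eq. (7.26) for $i = 2^k, \ldots, n$ with step size $2^{k+1}$:

$$x_i = \frac{y_i^{(k)} - a_i^{(k)} \cdot x_{i-2^k} - c_i^{(k)} \cdot x_{i+2^k}}{b_i^{(k)}}. \tag{7.27}$$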
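The two phases above can be sketched sequentially in Python. The sketch assumes base-2 logarithms, that no division by zero occurs, and the padding values $a = c = y = 0$, $b = 1$, $x = 0$ outside the index range; the function name `cyclic_reduction` and the in-place storage scheme are assumptions of this illustration. In-place storage works here because step $k$ only overwrites rows that are multiples of $2^k$ and only reads neighbors that are odd multiples of $2^{k-1}$.

```python
def cyclic_reduction(a, b, c, y):
    """Cyclic reduction for a tridiagonal system (sequential sketch).

    a: sub-diagonal (a[0] ignored), b: main diagonal,
    c: super-diagonal (c[n-1] ignored), y: right-hand side.
    """
    n = len(b)
    # 1-based working copies; index 0 holds the padding values.
    A = [0.0] + [float(v) for v in a]
    B = [1.0] + [float(v) for v in b]
    C = [0.0] + [float(v) for v in c]
    Y = [0.0] + [float(v) for v in y]
    A[1] = 0.0
    C[n] = 0.0
    levels = n.bit_length() - 1            # floor(log2(n))

    # Elimination phase: rows i = 2^k, 2*2^k, ... are updated in step k.
    for k in range(1, levels + 1):
        d = 2 ** (k - 1)
        for i in range(2 ** k, n + 1, 2 ** k):
            lo, hi = i - d, i + d          # lo >= 1 always holds here
            alpha = -A[i] / B[lo]
            new_a = alpha * A[lo]
            new_b = B[i] + alpha * C[lo]
            new_y = Y[i] + alpha * Y[lo]
            new_c = 0.0
            if hi <= n:                    # otherwise the padding values apply
                beta = -C[i] / B[hi]
                new_c = beta * C[hi]
                new_b += beta * A[hi]
                new_y += beta * Y[hi]
            A[i], B[i], C[i], Y[i] = new_a, new_b, new_c, new_y

    # Substitution phase: rows i = 2^k, 3*2^k, 5*2^k, ... are solved in step k.
    x = [0.0] * (n + 1)                    # x[0] is the padding value 0
    for k in range(levels, -1, -1):
        d = 2 ** k
        for i in range(d, n + 1, 2 * d):
            right = x[i + d] if i + d <= n else 0.0
            x[i] = (Y[i] - A[i] * x[i - d] - C[i] * right) / B[i]
    return x[1:]


# Example: tridiag(-1, 2, -1) of size 8 with exact solution (1, ..., 8).
x = cyclic_reduction([0] + [-1] * 7, [2] * 8, [-1] * 7 + [0], [0] * 7 + [9])
```

The inner loops of both phases touch every second, fourth, eighth, ... row, which is where the reduction in work compared with recursive doubling comes from.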

Figure 7.11 illustrates the computations of the elimination and the substitution phases of cyclic reduction represented by nodes and their dependencies represented by arrows. In each computation step $k$, $k = 1, \ldots, \lfloor \log n \rfloor$, of the elimination phase, there are $n/2^k$ nodes representing the computations for the coefficients of one equation. This results in

$$\frac{n}{2} + \frac{n}{4} + \frac{n}{8} + \cdots + \frac{n}{2^N} = n \cdot \sum_{i=1}^{N} \frac{1}{2^i} \le n$$

computation nodes with $N = \lfloor \log n \rfloor$ and, therefore, the execution time of cyclic reduction is $O(n)$. Thus, the computational complexity is the same as for the Gaussian elimination; however, the cyclic reduction offers potential parallelism which can be exploited in a parallel implementation as described in the following.

Fig. 7.11 Dependence graph illustrating the dependencies between neighboring computation steps of the cyclic reduction algorithm for the case of three computation steps and eight equations in analogy to the representation in Fig. 7.10. The first four columns represent the computations of the coefficients. The last columns in the graph represent the computation of the solution vector $x$ in the second phase of the cyclic reduction algorithm, see (7.27).

The computations of the numbers $\alpha_i^{(k)}$, $\beta_i^{(k)}$ require a division by $b_i^{(k)}$ and, thus, cyclic reduction as well as recursive doubling is not possible if any number $b_i^{(k)}$ is zero. This can happen even when the original matrix is invertible and has non-zero diagonal elements or when the Gaussian elimination can be applied without pivoting. However, for many classes of matrices it can be shown that a division by zero is never encountered. Examples are matrices $A$ which are symmetric and positive definite or invertible and diagonally dominant, see [61] or [115] (using the name odd–even reduction). (A matrix $A$ is symmetric if $A = A^T$ and positive definite if $x^T A x > 0$ for all $x \neq 0$. A matrix is diagonally dominant if in each row the absolute value of the diagonal element exceeds the sum of the absolute values of the other elements in the row.)
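The diagonal dominance condition is easy to test before applying either method. The following sketch checks it for a tridiagonal matrix given by its three diagonals, using the strict inequality from the definition above; the function name `is_diagonally_dominant` and the 0-based layout (with `a[0]` and `c[n-1]` ignored) are assumptions of the sketch.

```python
def is_diagonally_dominant(a, b, c):
    """True if |b_i| > |a_i| + |c_i| holds in every row of the tridiagonal matrix."""
    n = len(b)
    for i in range(n):
        off_diagonal = 0.0
        if i > 0:
            off_diagonal += abs(a[i])      # sub-diagonal entry of row i
        if i < n - 1:
            off_diagonal += abs(c[i])      # super-diagonal entry of row i
        if abs(b[i]) <= off_diagonal:
            return False
    return True


print(is_diagonally_dominant([0, -1, -1], [4, 4, 4], [-1, -1, 0]))  # True
print(is_diagonally_dominant([0, -1, -1], [2, 2, 2], [-1, -1, 0]))  # False
```

The second matrix fails the strict test in its middle row, yet it is symmetric and positive definite, so it still belongs to the other class of matrices named above.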


7.2.2.4 Parallel Implementation of Cyclic Reduction

We consider a parallel algorithm for the cyclic reduction for $p$ processors. For the description of the phases we assume $n = p \cdot q$ for $q \in \mathbb{N}$ and $q = 2^Q$ for $Q \in \mathbb{N}$. Each processor stores a block of rows of size $q$, i.e., processor $P_i$ stores the rows of $A$ with the numbers $(i-1)q + 1, \ldots, i \cdot q$ for $1 \le i \le p$. We describe the parallel algorithm with data exchange operations that are needed for an implementation with a distributed address space. As data distribution a row-blockwise distribution of the matrix $A$ is used to reduce the interaction between processors as much as possible. The parallel algorithm for the cyclic reduction comprises three phases: the elimination phase stopping earlier than described above, an additional recursive doubling phase, and a substitution phase.

Phase 1: Parallel reduction of the cyclic reduction in $\log q$ steps: Each processor computes the first $Q = \log q$ steps of the cyclic reduction algorithm, i.e., processor $P_i$ computes for $k = 1, \ldots, Q$ the values

$$a_j^{(k)}, \ b_j^{(k)}, \ c_j^{(k)}, \ y_j^{(k)} \quad \text{for } j = (i-1) \cdot q + 2^k, \ldots, i \cdot q \text{ with step size } 2^k.$$

After each computation step, processor $P_i$ receives four data values from $P_{i-1}$ (if $i > 1$) and from processor $P_{i+1}$ (if $i < p$) computed in the previous step. Since each processor owns a block of rows of size $q$, no communication with any other processor is required. The size of data to be exchanged with the neighboring processors is a multiple of 4 since four coefficients ($a_j^{(k)}$, $b_j^{(k)}$, $c_j^{(k)}$, $y_j^{(k)}$) are transferred. Only one data block is received from a neighbor per step, so each processor receives at most $2Q$ messages of size 4 in this phase.
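The index ranges of Phase 1 can be enumerated with a short Python sketch. It only lists which rows $j$ each processor updates in each step and performs no communication or arithmetic; the function name `phase1_rows` and the dictionary layout are hypothetical, and $q$ is assumed to be a power of two as stated above.

```python
def phase1_rows(p, q):
    """Rows updated in Phase 1, keyed by (processor i, step k), all 1-based."""
    Q = q.bit_length() - 1
    if 2 ** Q != q:
        raise ValueError("q must be a power of two")
    plan = {}
    for i in range(1, p + 1):
        for k in range(1, Q + 1):
            first = (i - 1) * q + 2 ** k   # first row of the block updated in step k
            plan[(i, k)] = list(range(first, i * q + 1, 2 ** k))
    return plan


plan = phase1_rows(p=2, q=4)
print(plan[(1, 1)], plan[(1, 2)])  # [2, 4] [4]
print(plan[(2, 1)], plan[(2, 2)])  # [6, 8] [8]
```

After the last step $k = Q$, each processor $P_i$ is left with the single row $i \cdot q$, which is the row that enters the reduced system of the next phase.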

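Phase 2: Parallel recursive doubling for tridiagonal systems of size $p$: Processor $P_i$ is responsible for the $i$th equation of the following $p$-dimensional tridiagonal system

$$\tilde{a}_i \tilde{x}_{i-1} + \tilde{b}_i \tilde{x}_i + \tilde{c}_i \tilde{x}_{i+1} = \tilde{y}_i \quad \text{for } i = 1, \ldots, p$$

with

$$
\left.
\begin{aligned}
\tilde{a}_i &= a_{i \cdot q}^{(Q)} \\
\tilde{b}_i &= b_{i \cdot q}^{(Q)} \\
\tilde{c}_i &= c_{i \cdot q}^{(Q)} \\
\tilde{y}_i &= y_{i \cdot q}^{(Q)} \\
\tilde{x}_i &= x_{i \cdot q}
\end{aligned}
\right\} \quad \text{for } i = 1, \ldots, p.
$$

For the solution of this system, we use recursive doubling. Each processor is assigned one equation. Processor $P_i$ performs $\lceil \log p \rceil$ steps of the recursive doubling algorithm. In step $k$, $k = 1, \ldots, \lceil \log p \rceil$, processor $P_i$ computes

$$\tilde{a}_i^{(k)}, \ \tilde{b}_i^{(k)}, \ \tilde{c}_i^{(k)}, \ \tilde{y}_i^{(k)}$$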

Title	Direct Methods for Linear Systems with Banded Structure
Institution	University of Example
Field	Computer Science
Type	Article
Year of publication	2023
City	Example City
