7 SOLUTION OF LARGE SYSTEMS OF EQUATIONS

As we saw from the preceding sections, both the straightforward spatial discretization of a steady-state problem and the implicit time discretization of a transient problem will yield a large system of coupled equations of the form

K · u = f. (7.1)
There are two basic approaches to the solution of this problem:

(a) directly, by some form of Gaussian elimination; or

(b) iteratively.
7.1 Direct solvers

7.1.1 GAUSSIAN ELIMINATION

Suppose that the objective is to obtain vanishing entries for all matrix elements located in the jth column below the diagonal K_jj entry. This can be achieved by adding to the kth row (k > j) an appropriate fraction of the jth row, resulting in

(K_kl + α_k K_jl) u_l = f_k + α_k f_j, k > j. (7.4)
Figure 7.1 Direct solvers: (a) Gaussian elimination; (b) Crout decomposition; (c) Cholesky
Such an addition of rows will not change the final result for u and is therefore allowable. For the elements located in the jth column below the diagonal K_jj entry to vanish, we must have

α_k = −K_kj / K_jj. (7.5)
The matrix triangularization requires O(N) multiplications for each column, i.e. O(N²) operations for all columns. As this has to be repeated for each row, the total estimate is O(N³) operations. The solution phase requires O(N) operations for each row, i.e. O(N²) operations for all unknowns. If the matrix has a banded structure with bandwidth N_ba, these estimates reduce to O(N · N_ba²) for the matrix triangularization and O(N · N_ba) for the solution phase. Gaussian elimination is seldom used in practice, as the transformation of the matrix changes the RHS vector, thereby rendering it inefficient for systems with multiple RHS.
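To make (7.4) concrete, here is a minimal dense Gaussian elimination sketch in Python (no pivoting; the function name gauss_solve and the dense-array setting are illustrative, not from the book). Note how the RHS f is transformed together with the matrix, which is precisely why multiple RHS vectors force a complete re-run.

```python
import numpy as np

def gauss_solve(K, f):
    """Solve K u = f by Gaussian elimination, following (7.4)-(7.5)."""
    K = K.astype(float).copy()
    f = f.astype(float).copy()
    N = len(f)
    # Forward elimination: zero out column j below the diagonal
    for j in range(N):
        for k in range(j + 1, N):
            alpha = -K[k, j] / K[j, j]   # fraction of row j added to row k
            K[k, j:] += alpha * K[j, j:]
            f[k] += alpha * f[j]         # the RHS is transformed as well
    # Back substitution on the resulting upper-triangular system
    u = np.zeros(N)
    for i in range(N - 1, -1, -1):
        u[i] = (f[i] - K[i, i + 1:] @ u[i + 1:]) / K[i, i]
    return u
```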
7.1.2 CROUT DECOMPOSITION

Here the matrix is decomposed into the product K = L · U of a lower triangular matrix L and an upper triangular matrix U. Assume that the matrix has been decomposed up to entry (i − 1, i − 1), i.e. the entries 1 : i − 1, 1 : i − 1 of L and U are known. The entries along row i and column i are given by

L_ki = K_ki − Σ_{m<i} L_km U_mi, k ≥ i,   U_ij = (K_ij − Σ_{m<i} L_im U_mj) / L_ii, j > i,

where U has been normalized to a unit diagonal.
This completes the decomposition of the ith row and column. The process is started with the first row and column, and repeated for all remaining ones. Once the decomposition is complete, the system is solved in two steps:
- Forward substitution: L · v = f, followed by
- Backward substitution: U · u = v.
Observe that the RHS is not affected by the decomposition process. This allows the simple solution of multiple RHS.
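The following sketch (illustrative names; U normalized to a unit diagonal, matching the formulas above) shows the Crout decomposition and the two-step solution. Since f enters only in the substitution phase, additional RHS vectors reuse the stored L and U.

```python
import numpy as np

def crout(K):
    """Crout decomposition K = L . U, with U carrying a unit diagonal.
    Column i of L and row i of U are completed in turn."""
    N = K.shape[0]
    L = np.zeros((N, N)); U = np.eye(N)
    for i in range(N):
        for k in range(i, N):            # column i of L
            L[k, i] = K[k, i] - L[k, :i] @ U[:i, i]
        for j in range(i + 1, N):        # row i of U
            U[i, j] = (K[i, j] - L[i, :i] @ U[:i, j]) / L[i, i]
    return L, U

def lu_solve(L, U, f):
    """Forward substitution L v = f, then backward substitution U u = v."""
    N = len(f)
    v = np.zeros(N); u = np.zeros(N)
    for i in range(N):
        v[i] = (f[i] - L[i, :i] @ v[:i]) / L[i, i]
    for i in range(N - 1, -1, -1):
        u[i] = v[i] - U[i, i + 1:] @ u[i + 1:]
    return u
```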
7.1.3 CHOLESKY ELIMINATION
This special decomposition is only applicable to symmetric matrices. The algorithm is almost the same as the Crout decomposition, except that square roots are taken for the diagonal elements. This seemingly innocuous change has a very beneficial effect on rounding errors (Zurmühl (1964)).
All direct solvers have a storage and operation count that grow rapidly with the bandwidth of the matrix. Considerable gains can therefore be realized from the reduction of the bandwidth N_ba. This is an optimization problem that is NP-complete, i.e. many heuristic solutions can be obtained that give the same or nearly the same cost function (bandwidth in this case), but the optimum solution is practically impossible to obtain. Moreover, the optimum solution may not be unique. As an example, consider a square domain discretized by N × N quadrilateral elements. Suppose further that Poisson's equation with Dirichlet boundary conditions is to be solved numerically, and that the spatial discretization consists of bilinear finite elements. Starting any numbering in the same way from each of the four corners will give the same bandwidth, storage and CPU requirements, and hence the same cost function. Bandwidth reduction implies a renumbering of the nodes, with the aim of bringing all matrix entries closer to the diagonal. The main techniques used to accomplish this are (Pissanetzky (1984)):
- Cuthill–McKee (CMK), and reverse CMK, which order the points according to lowest connectivity with surrounding points at each level of the corresponding graph (Cuthill and McKee (1969));
- wavefront, whereby the mesh is renumbered according to an advancing front; and
- nested dissection, where the argument of bandwidth reduction due to recursive subdivision of domains is employed (George and Liu (1981)).

The first two approaches have been used extensively in structural finite element analysis. Many variations have been reported, particularly for the 'non-exact' parameters such as starting point, depth of search and trees, data structures, etc. Renumbering strategies reappear when trying to minimize cache-misses, and are considered in more depth in Chapter 15.
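A minimal sketch of the CMK idea (Python; the adjacency-dictionary input and the function name are illustrative assumptions, and a single connected mesh graph is assumed):

```python
from collections import deque

def cuthill_mckee(adj):
    """Cuthill-McKee renumbering: breadth-first traversal of the mesh
    graph, visiting the neighbours of each point in order of increasing
    connectivity. adj maps point -> set of neighbouring points."""
    start = min(adj, key=lambda p: len(adj[p]))  # lowest-connectivity seed
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        p = queue.popleft()
        order.append(p)
        for q in sorted(adj[p] - seen, key=lambda q: len(adj[q])):
            seen.add(q)
            queue.append(q)
    return order      # order[::-1] gives the reverse CMK numbering
```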
7.2 Iterative solvers
When (7.1) is solved iteratively, the matrix K is not inverted directly, but the original problem is replaced by a sequence of solutions of the form

K̃ · (u^{n+1} − u^n) = K̃ · Δu = τ r = τ(f − K · u^n). (7.13)
The vector r is called the residual vector, and K̃ the preconditioning matrix. The case K̃ = K corresponds to a direct solution, and the sequence of solutions stops after one iteration. The aim is to approximate K by some low-cost, yet 'good' K̃. 'Good' in this context means that:
(a) K̃ is inexpensive to decompose or solve for;

(b) K̃ contains relevant information (eigenvalues, eigenvectors) about K.
Unfortunately, these requirements are contradictory. What tends to happen is that the low-eigenvalue (i.e. long-wavelength eigenmode) information is lost when K is approximated with K̃. To circumvent this deficiency, most practical iterative solvers employ a 'globalization' procedure that counteracts this loss of low-eigenvalue information. Both the approximation and the globalization algorithms employed may be grouped into three families: operator-based, grid-based and matrix-based. We just mention some examples here:
G1 Operator-based: Tchebicheff, supersteps, etc.;
G2 Grid-based: projection (one coarser grid), multigrid (n coarser grids);
G3 Matrix-based: dominant eigenvalue extrapolation, conjugate gradient (CG), generalized minimal residuals (GMRES), algebraic multigrid (AMG).
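Before turning to specific choices of K̃, a minimal sketch of the basic iteration (7.13) may help fix ideas (Python; iterate, precond_solve and the dense setting are illustrative names and assumptions, with precond_solve applying K̃^{-1} to a vector):

```python
import numpy as np

def iterate(K, f, precond_solve, tau=1.0, tol=1e-8, itmax=500):
    """Basic iterative scheme (7.13): Ktilde . du = tau * r."""
    u = np.zeros(len(f))
    for it in range(itmax):
        r = f - K @ u                    # residual vector
        if np.linalg.norm(r) < tol * np.linalg.norm(f):
            break
        u += precond_solve(tau * r)      # du = Ktilde^{-1} (tau r)
    return u
```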
7.2.1 MATRIX PRECONDITIONING
In order to be more specific about the different techniques, we rewrite the matrix K as a sum of lower, diagonal and upper parts,

K = L + D + U. (7.14)
7.2.1.1 Diagonal preconditioning

The simplest preconditioners are obtained by neglecting all off-diagonal matrix entries, resulting in diagonal preconditioning,

K̃ = D. (7.15)

The physical implication of this simplification is that any transfer of information between points or elements can only be accomplished on the RHS during the iterations (equation (7.13)). This implies that information can only travel one element per iteration, and is similar to explicit timestepping with local timesteps.
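In the iteration sketch above, diagonal preconditioning reduces precond_solve to a pointwise division by the diagonal of K (dense K assumed, as before; iterate is the hypothetical helper from the previous sketch):

```python
# Diagonal (point Jacobi) preconditioning: Ktilde = D
u = iterate(K, f, lambda r: r / np.diag(K))
```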
7.2.1.2 Block-diagonal preconditioning

A natural extension is to retain, for the coupled unknowns of each point, the corresponding block of matrix entries, resulting in block-diagonal preconditioning. For the compressible Navier–Stokes equations with a k–ε turbulence model, 7 × 7 blocks are obtained. As before, the propagation of information between gridpoints can only occur on the RHS during the iterations, at a maximum speed of one element per iteration. The advantage of block-diagonal preconditioning is that it removes the stiffness that may result from equations with different time scales. A typical class of problems for which block-diagonal preconditioning is commonly used is chemically reacting flows, where the time scales of chemical reactions may be orders of magnitude smaller than the (physically interesting) advection time scale of the fluid.
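A sketch of the corresponding preconditioner application (Python/numpy; the array layout and names are illustrative assumptions). Since the blocks decouple point by point, a batched solve suffices:

```python
import numpy as np

def block_diag_solve(Kdiag, r):
    """Block-diagonal preconditioning: Ktilde consists of the
    neqns x neqns diagonal blocks only (e.g. 7 x 7 blocks for
    compressible flow with a k-epsilon model).
    Kdiag: (npoin, neqns, neqns) diagonal blocks; r: (npoin, neqns)."""
    return np.linalg.solve(Kdiag, r)   # batched solve, one block per point
```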
7.2.1.3 LU preconditioning
Although point preconditioners are extremely fast, all the inter-element propagation of information occurs on the RHS, resulting in slow convergence rates. Faster information transfer can only be achieved by neglecting fewer entries from K in K̃, albeit at higher CPU and storage costs. If we recall that the solution of a lower (or upper) triangular matrix by itself is simple, a natural way to obtain better preconditioners is to attempt a preconditioner of the form K̃ = K̃_L · K̃_U, with K̃_L = L + D and K̃_U = D + U. Expansion of this product shows that it does not approximate the original matrix K well: an extra diagonal term has appeared. This may be remedied by interposing the inverse of the diagonal between the lower and upper matrices, resulting in

K̃ = K̃_L · D^{-1} · K̃_U = (L + D) · D^{-1} · (D + U) = K + L · D^{-1} · U. (7.21)

The error may also be mitigated by adding, for subsequent iterations, a correction with the latest information of the unknowns. This leads to two commonly used schemes, Gauss–Seidel (GS) and lower-upper symmetric Gauss–Seidel (LU-SGS).
The LU-SGS scheme may be written as

K · u = (L + D + U) · u = r + L · D^{-1} · U · (u^0 − u). (7.24)
In most cases u^0 = 0. GS and LU-SGS have been used extensively in CFD, both as solvers and as preconditioners. In this context, very elaborate techniques that combine physical insight, local eigenvalue decompositions and scheme switching have produced very fast and robust preconditioners (Sharov and Nakahashi (1998), Luo et al. (1998), Sharov et al. (2000a), Luo et al. (2001)).
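A sketch of one LU-SGS preconditioner application (7.21) in the dense setting used above (illustrative names; real CFD implementations sweep edge- or point-wise over sparse data):

```python
import numpy as np

def lusgs_solve(K, r):
    """Apply the LU-SGS preconditioner (7.21):
    (L + D) . D^{-1} . (D + U) du = r,
    via one forward and one backward triangular sweep."""
    N = len(r)
    D = np.diag(K)
    v = np.zeros(N)
    for i in range(N):                   # forward sweep: (L + D) v = r
        v[i] = (r[i] - K[i, :i] @ v[:i]) / D[i]
    v *= D                               # interpose D^{-1}: (D + U) du = D v
    du = np.zeros(N)
    for i in range(N - 1, -1, -1):       # backward sweep
        du[i] = (v[i] - K[i, i + 1:] @ du[i + 1:]) / D[i]
    return du
```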
Diagonal +1 preconditioning
Consider a structured grid of m × n points. Furthermore, assume that a discretization of the Laplacian using the standard stencil

−u_{i−1,j} − u_{i,j−1} + 4 u_{i,j} − u_{i+1,j} − u_{i,j+1} = r_{i,j} (7.25)

is being performed.
Figure 7.2 Matrix resulting from m × n structured grid
The resulting K matrix for the numbering shown in Figure 7.2(a) is depicted in Figure 7.2(b). As one can see, K consists of a tridiagonal core D and regular outlying bands. Retaining only the tridiagonal core yields the so-called diagonal+1 preconditioner, which can be solved for directly at low cost and has been used repeatedly in practice (Hassan et al. (1990), Martin and Löhner (1992), Mavriplis (1995), Soto et al. (2003)). For cases where no discernible spatial direction for stiffness exists, several point renumberings should be employed, with the aim of covering as many i, j, . . . directions as possible.
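The tridiagonal core can be solved directly in O(N) operations with the Thomas algorithm; a sketch (illustrative names; the three bands stored as 1-D arrays):

```python
import numpy as np

def thomas(lower, diag, upper, r):
    """Thomas algorithm: direct O(N) solve of the tridiagonal core
    retained by the diagonal+1 preconditioner.
    lower[i] multiplies u[i-1], upper[i] multiplies u[i+1]."""
    n = len(diag)
    d = diag.astype(float).copy()
    r = r.astype(float).copy()
    for i in range(1, n):                # forward elimination
        w = lower[i] / d[i - 1]
        d[i] -= w * upper[i - 1]
        r[i] -= w * r[i - 1]
    u = np.zeros(n)
    u[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):       # back substitution
        u[i] = (r[i] - upper[i] * u[i + 1]) / d[i]
    return u
```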
Diagonal +1 Gauss–Seidel
As before, the unknowns already obtained during the solution of K̃ = D can be re-used with minor additional effort, resulting in diagonal+1 GS preconditioning. For structured grids, this type of preconditioning is referred to as line GS relaxation. The resulting preconditioning matrices are of the form

K̃_L = L + D,  K̃_U = D + U, (7.28)

where D again denotes the tridiagonal core and L, U the remaining lower and upper bands.
7.2.1.4 Incomplete lower-upper preconditioning
All the preconditioners described so far avoided the large operation count and storage requirements of a direct inversion of K by staying close to the diagonal when operating with K̃. For incomplete lower-upper (ILU) preconditioning, the product decomposition K̃ = L̃ · Ũ of the Crout solver is carried out, but all entries of L̃ and Ũ that would fall outside the sparsity pattern of K (the so-called fill-in) are discarded.
If K is tridiagonal, then K̃ = K, implying perfect preconditioning. The observation often made is that the quality of K̃ depends strongly on the bandwidth, which in turn depends on the point numbering (Duff and Meurant (1989), Venkatakrishnan and Mavriplis (1993, 1995)). The smaller the bandwidth, the closer K̃ is to K, and the better the preconditioning. This is to be expected for problems with no discernible stiffness direction. If, on the other hand, a predominant stiffness direction exists, the point numbering should be aligned with it. This may or may not result in small bandwidths (see Figure 7.3 for a counterexample), but is certainly the most advisable way to renumber the points.
Figure 7.3 Counterexample
Before going on, the reader should consider the storage requirements of ILU preconditioners. Assuming the lower bound of no allowed fill-in (nfilr=0), a discretization of space using linear tetrahedra and neqns unknowns per point, we require nstor=2*neqns*neqns*nedge storage locations for L̃, Ũ, which for the Euler or laminar Navier–Stokes equations with neqns=5 and on a typical mesh with nedge=7*npoin translates into nstor=350*npoin storage locations. Given that a typical explicit Euler solver on the same type of grid only requires nstor=90*npoin storage locations, it is not difficult to see why even one more layer of fill-in is seldom used.
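A dense-matrix sketch of ILU(0), i.e. no allowed fill-in (illustrative only; production codes operate on edge-based sparse storage, as the estimates above indicate):

```python
import numpy as np

def ilu0(K):
    """ILU(0): factorization in which every entry that would create
    fill-in (i.e. lies outside the sparsity pattern of K) is discarded."""
    A = K.astype(float).copy()
    pattern = K != 0                     # allowed (no-fill-in) entries
    N = A.shape[0]
    for i in range(1, N):
        for k in range(i):               # eliminate row i against row k
            if not pattern[i, k]:
                continue
            A[i, k] /= A[k, k]           # multiplier: strictly lower part of Ltilde
            for j in range(k + 1, N):
                if pattern[i, j]:        # update only pre-existing entries
                    A[i, j] -= A[i, k] * A[k, j]
    return A   # Ltilde (unit diagonal) and Utilde stored in one array
```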
7.2.1.5 Block methods
Considering that the cost of direct solvers scales with the square of the bandwidth, another possibility is to decompose K into blocks. These blocks are then solved for directly. The reduction in cost is a result of neglecting all matrix entries outside the block, leading to lower bandwidths. With the notation of Figure 7.4, we may decompose K additively as

K = L_b + D_b + U_b, (7.32)

where D_b contains the diagonal blocks and L_b, U_b the remaining lower and upper blocks.
Figure 7.4 Decomposition of a matrix
For the additive decomposition, one can either operate without reusing the unknowns at the solution stage, i.e. just on the diagonal level,

K̃ = D_b, (7.33)

or, analogous to Gauss–Seidel, by reusing the unknowns at the solution stage,
K̃_L = L_b + D_b,  K̃_U = U_b + D_b. (7.34)

For the product decomposition, the preconditioner is of the form

K̃ = (I + E_L) · D_b · (I + E_U),  E_L = L_b · D_b^{-1},  E_U = D_b^{-1} · U_b, (7.35)

where I denotes the identity matrix, and the E-matrices contain the off-diagonal block entries, scaled by D_b.
As before, the propagation of information is determined by the numbering of the blocks. Typical examples of this type of preconditioning are element-by-element (Hughes et al. (1983a,c)) or group-by-group (Tezduyar and Liou (1989), Tezduyar et al. (1992a), Liou and Tezduyar (1992)) techniques.
7.2.2 GLOBALIZATION PROCEDURES
As seen from the previous section, any form of preconditioning neglects some information from the original matrix K. The result is that, after an initially fast convergence, a very slow rate of convergence sets in. In order to avert this behaviour, a number of acceleration or globalization procedures have been devised. The description that follows starts with the analytical ones, and then proceeds to matrix-based and grid-based acceleration. Let us recall the basic iterative scheme (7.13): K̃ · Δu = τ(f − K · u^n). If a discretization of the Laplacian on a structured grid with mesh spacings h_x, h_y, h_z is employed, the resulting discretization at node i, j, k for the Jacobi iterations with K̃ = D and τ = Δt takes the form
4(1 + a² + b²) Δu_{i,j,k} = Δt [(u_{i−1,j,k} − 2u_{i,j,k} + u_{i+1,j,k}) + a² (u_{i,j−1,k} − 2u_{i,j,k} + u_{i,j+1,k}) + b² (u_{i,j,k−1} − 2u_{i,j,k} + u_{i,j,k+1})], (7.38)

with a = h_x/h_y, b = h_x/h_z. Inserting the Fourier mode

u = g^p_{m,n,l} exp(iπx/(m h_x)) exp(iπy/(n h_y)) exp(iπz/(l h_z)), (7.39)

where i here denotes the imaginary unit, into (7.38) yields a decay factor per timestep of the form g = 1 − Δt f(a, b, m, n, l).
Note that we have lumped the constant portions of the grid and the mode into the function f(a, b, m, n, l). After p timesteps with varying Δt, the decay factor will be given by

g_p = ∏_{q=1}^{p} (1 − Δt_q f(a, b, m, n, l)). (7.42)

Two timestep sequences that have proven effective are:
(a) Tchebicheff sequence (Löhner and Morgan (1987)):

Δt_q = Δt_0 / (1 + cos[π(q − 1)/p]), q = 1, . . . , p; (7.43a)

(b) superstep sequence (Gentzsch and Schlüter (1978), Gentzsch (1980)):

Δt_q = Δt_0 / (1 + (R/p²) + cos[π(2q − 1)/(2p)]), q = 1, . . . , p, R = 2.5. (7.43b)

Observe that in both cases the maximum timestep is of order Δt = O(p²), which is outside the stability range. The overall procedure nevertheless remains stable, as the smaller timesteps
'rescue' the stability. Figure 7.5 compares the performance of the two formulas for the 1-D case with that of uniform timesteps. The improvement in residual reduction achieved by the use of non-uniform timesteps is clearly apparent.
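A small sketch generating both sequences and evaluating the compound decay factor (7.42). The normalization Δt_q = Δt_0/(·), with Δt_0 the base timestep, follows the reconstruction of (7.43a,b) above, so treat it as an assumption:

```python
import numpy as np

def tchebicheff(dt0, p):
    # Eqn. (7.43a), q = 1, ..., p
    q = np.arange(1, p + 1)
    return dt0 / (1.0 + np.cos(np.pi * (q - 1) / p))

def superstep(dt0, p, R=2.5):
    # Eqn. (7.43b), q = 1, ..., p
    q = np.arange(1, p + 1)
    return dt0 / (1.0 + R / p**2 + np.cos(np.pi * (2 * q - 1) / (2 * p)))

def decay(dts, f):
    # Compound decay factor (7.42) of a mode with lumped function value f
    return np.prod(1.0 - dts * f)
```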
Figure 7.5 Damping curves for the Laplacian: Eqn. (7.43a), Eqn. (7.43b) and uniform timestep τ = 0.8
Returning to (7.42), let us determine the magnitude of Δt required to eliminate a certain mode. For any given mode g_{m,n,l}, the mode can be eliminated by choosing a timestep of magnitude

Δt_{m,n,l} = 2(1 + c²) / [(1 + cos(π/m)) + a²(1 + cos(π/n)) + b²(1 + cos(π/l))], (7.44)

with c² = a² + b². The timesteps required to eliminate the three highest modes have been summarized in Table 7.1. Two important trends may be discerned immediately.
(a) The magnitude of Δt or, equivalently, the number of iterations required to eliminate the three highest modes, increases with the dimensionality of the problem. This is the case even for uniform grids (a = b = 1).
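A small sketch evaluating (7.44) as reconstructed above (the numerator 2(1 + c²) is an assumption; the denominators follow the text). It illustrates trend (a): for the same mode, the required Δt grows with the dimensionality even on uniform grids:

```python
import numpy as np

def dt_mode(m, n, l, a=1.0, b=1.0):
    """Timestep (7.44) required to annihilate mode (m, n, l);
    the numerator 2(1 + c^2) is a reconstruction, not from the book."""
    c2 = a**2 + b**2
    denom = ((1 + np.cos(np.pi / m))
             + a**2 * (1 + np.cos(np.pi / n))
             + b**2 * (1 + np.cos(np.pi / l)))
    return 2.0 * (1.0 + c2) / denom

print(dt_mode(2, 1, 1, a=0.0, b=0.0))   # 1-D case: ~2.0
print(dt_mode(2, 1, 1, a=1.0, b=1.0))   # 3-D uniform grid: ~6.0
```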