7 SOLUTION OF LARGE SYSTEMS OF EQUATIONS

As we saw from the preceding sections, both the straightforward spatial discretization of a steady-state problem and the implicit time discretization of a transient problem will yield a large system of coupled equations of the form

K · u = f. (7.1)
There are two basic approaches to the solution of this problem:

(a) directly, by some form of Gaussian elimination; or

(b) iteratively.
7.1 Direct solvers

7.1.1 GAUSSIAN ELIMINATION

Suppose that the objective is to obtain vanishing entries for all matrix elements located in the jth column below the diagonal K_jj entry. This can be achieved by adding to the kth row (k > j) an appropriate fraction of the jth row, resulting in

(K_kl + α_k K_jl) u_l = f_k + α_k f_j, k > j. (7.4)
Figure 7.1 Direct solvers: (a) Gaussian elimination; (b) Crout decomposition; (c) Cholesky
Such an addition of rows will not change the final result for u and is therefore allowable. For the elements located in the jth column below the diagonal K_jj entry to vanish, we must have

α_k = −K_kj / K_jj. (7.5)
The matrix triangularization requires O(N) multiplications for each column, i.e. O(N²) operations for all columns. As this has to be repeated for each row, the total estimate is O(N³) operations. The solution phase requires O(N) operations for each row, i.e. O(N²) operations for all unknowns. If the matrix has a banded structure with bandwidth N_ba, these estimates reduce to O(N · N_ba²) for the matrix triangularization and O(N · N_ba) for the solution phase. Gaussian elimination is seldom used in practice, as the transformation of the matrix changes the RHS vector, thereby rendering it inefficient for systems with multiple RHS.
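To make (7.4) concrete, here is a minimal dense Gaussian elimination sketch in Python (no pivoting; the function name gauss_solve and the dense-array setting are illustrative, not from the book). Note how the RHS f is transformed together with the matrix, which is precisely why multiple RHS vectors force a complete re-run.

```python
import numpy as np

def gauss_solve(K, f):
    """Solve K u = f by Gaussian elimination, following (7.4)-(7.5)."""
    K = K.astype(float).copy()
    f = f.astype(float).copy()
    N = len(f)
    # Forward elimination: zero out column j below the diagonal
    for j in range(N):
        for k in range(j + 1, N):
            alpha = -K[k, j] / K[j, j]   # fraction of row j added to row k
            K[k, j:] += alpha * K[j, j:]
            f[k] += alpha * f[j]         # the RHS is transformed as well
    # Back substitution on the resulting upper-triangular system
    u = np.zeros(N)
    for i in range(N - 1, -1, -1):
        u[i] = (f[i] - K[i, i + 1:] @ u[i + 1:]) / K[i, i]
    return u
```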
7.1.2 CROUT DECOMPOSITION

Here the matrix is decomposed into the product K = L · U of a lower triangular matrix L and an upper triangular matrix U. Assume that the matrix has been decomposed up to entry (i − 1, i − 1), i.e. the entries 1 : i − 1, 1 : i − 1 of L and U are known. The entries along row i and column i are given by

L_ki = K_ki − Σ_{m<i} L_km U_mi, k ≥ i,   U_ij = (K_ij − Σ_{m<i} L_im U_mj) / L_ii, j > i,

where U has been normalized to a unit diagonal.
This completes the decomposition of the ith row and column. The process is started with the first row and column, and repeated for all remaining ones. Once the decomposition is complete, the system is solved in two steps:
- Forward substitution: L · v = f, followed by
- Backward substitution: U · u = v.
Observe that the RHS is not affected by the decomposition process. This allows the simple solution of multiple RHS.
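The following sketch (illustrative names; U normalized to a unit diagonal, matching the formulas above) shows the Crout decomposition and the two-step solution. Since f enters only in the substitution phase, additional RHS vectors reuse the stored L and U.

```python
import numpy as np

def crout(K):
    """Crout decomposition K = L . U, with U carrying a unit diagonal.
    Column i of L and row i of U are completed in turn."""
    N = K.shape[0]
    L = np.zeros((N, N)); U = np.eye(N)
    for i in range(N):
        for k in range(i, N):            # column i of L
            L[k, i] = K[k, i] - L[k, :i] @ U[:i, i]
        for j in range(i + 1, N):        # row i of U
            U[i, j] = (K[i, j] - L[i, :i] @ U[:i, j]) / L[i, i]
    return L, U

def lu_solve(L, U, f):
    """Forward substitution L v = f, then backward substitution U u = v."""
    N = len(f)
    v = np.zeros(N); u = np.zeros(N)
    for i in range(N):
        v[i] = (f[i] - L[i, :i] @ v[:i]) / L[i, i]
    for i in range(N - 1, -1, -1):
        u[i] = v[i] - U[i, i + 1:] @ u[i + 1:]
    return u
```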
7.1.3 CHOLESKY ELIMINATION
This special decomposition is only applicable to symmetric matrices. The algorithm is almost the same as the Crout decomposition, except that square roots are taken for the diagonal elements. This seemingly innocuous change has a very beneficial effect on rounding errors (Zurmühl (1964)).
All direct solvers have a storage and operation count that grow rapidly with the bandwidth of the matrix. Considerable gains can therefore be realized from the reduction of the bandwidth N_ba. This is an optimization problem that is NP-complete, i.e. many heuristic solutions can be obtained that give the same or nearly the same cost function (bandwidth in this case), but the optimum solution is practically impossible to obtain. Moreover, the optimum solution may not be unique. As an example, consider a square domain discretized by N × N quadrilateral elements. Suppose further that Poisson's equation with Dirichlet boundary conditions is to be solved numerically, and that the spatial discretization consists of bilinear finite elements. Starting any numbering in the same way from each of the four corners will give the same bandwidth, storage and CPU requirements, and hence the same cost function. Bandwidth reduction implies a renumbering of the nodes, with the aim of bringing all matrix entries closer to the diagonal. The main techniques used to accomplish this are (Pissanetzky (1984)):
- Cuthill–McKee (CMK), and reverse CMK, which order the points according to lowest connectivity with surrounding points at each level of the corresponding graph (Cuthill and McKee (1969));
- wavefront, whereby the mesh is renumbered according to an advancing front; and
- nested dissection, where the argument of bandwidth reduction due to recursive subdivision of domains is employed (George and Liu (1981)).

The first two approaches have been used extensively in structural finite element analysis. Many variations have been reported, particularly for the 'non-exact' parameters such as starting point, depth of search and trees, data structures, etc. Renumbering strategies reappear when trying to minimize cache-misses, and are considered in more depth in Chapter 15.
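A minimal sketch of the CMK idea (Python; the adjacency-dictionary input and the function name are illustrative assumptions, and a single connected mesh graph is assumed):

```python
from collections import deque

def cuthill_mckee(adj):
    """Cuthill-McKee renumbering: breadth-first traversal of the mesh
    graph, visiting the neighbours of each point in order of increasing
    connectivity. adj maps point -> set of neighbouring points."""
    start = min(adj, key=lambda p: len(adj[p]))  # lowest-connectivity seed
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        p = queue.popleft()
        order.append(p)
        for q in sorted(adj[p] - seen, key=lambda q: len(adj[q])):
            seen.add(q)
            queue.append(q)
    return order      # order[::-1] gives the reverse CMK numbering
```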
7.2 Iterative solvers
When (7.1) is solved iteratively, the matrix K is not inverted directly, but the original problem is replaced by a sequence of solutions of the form

K̃ · (u^{n+1} − u^n) = K̃ · Δu = τ r = τ(f − K · u^n). (7.13)
The vector r is called the residual vector, and K̃ the preconditioning matrix. The case K̃ = K corresponds to a direct solution, and the sequence of solutions stops after one iteration. The aim is to approximate K by some low-cost, yet 'good' K̃. 'Good' in this context means that:
(a) K̃ is inexpensive to decompose or solve for;

(b) K̃ contains relevant information (eigenvalues, eigenvectors) about K.
Unfortunately, these requirements are contradictory. What tends to happen is that the low-eigenvalue (i.e. long-wavelength eigenmode) information is lost when K is approximated with K̃. To circumvent this deficiency, most practical iterative solvers employ a 'globalization' procedure that counteracts this loss of low-eigenvalue information. Both the approximation and the globalization algorithms employed may be grouped into three families: operator-based, grid-based and matrix-based. We just mention some examples here:
G1 Operator-based: Tchebicheff, supersteps, etc.;
G2 Grid-based: projection (one coarser grid), multigrid (n coarser grids);
G3 Matrix-based: dominant eigenvalue extrapolation, conjugate gradient (CG), generalized minimal residuals (GMRES), algebraic multigrid (AMG).
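Before turning to specific choices of K̃, a minimal sketch of the basic iteration (7.13) may help fix ideas (Python; iterate, precond_solve and the dense setting are illustrative names and assumptions, with precond_solve applying K̃^{-1} to a vector):

```python
import numpy as np

def iterate(K, f, precond_solve, tau=1.0, tol=1e-8, itmax=500):
    """Basic iterative scheme (7.13): Ktilde . du = tau * r."""
    u = np.zeros(len(f))
    for it in range(itmax):
        r = f - K @ u                    # residual vector
        if np.linalg.norm(r) < tol * np.linalg.norm(f):
            break
        u += precond_solve(tau * r)      # du = Ktilde^{-1} (tau r)
    return u
```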
7.2.1 MATRIX PRECONDITIONING
In order to be more specific about the different techniques, we rewrite the matrix K as a sum of lower, diagonal and upper parts,

K = L + D + U. (7.14)
7.2.1.1 Diagonal preconditioning

The simplest preconditioners are obtained by neglecting all off-diagonal matrix entries, resulting in diagonal preconditioning,

K̃ = D. (7.15)

The physical implication of this simplification is that any transfer of information between points or elements can only be accomplished on the RHS during the iterations (equation (7.13)). This implies that information can only travel one element per iteration, and is similar to explicit timestepping with local timesteps.
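In the iteration sketch above, diagonal preconditioning reduces precond_solve to a pointwise division by the diagonal of K (dense K assumed, as before; iterate is the hypothetical helper from the previous sketch):

```python
# Diagonal (point Jacobi) preconditioning: Ktilde = D
u = iterate(K, f, lambda r: r / np.diag(K))
```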
7.2.1.2 Block-diagonal preconditioning

A natural extension is to retain, for the coupled unknowns of each point, the corresponding block of matrix entries, resulting in block-diagonal preconditioning. For the compressible Navier–Stokes equations with a k–ε turbulence model, 7 × 7 blocks are obtained. As before, the propagation of information between gridpoints can only occur on the RHS during the iterations, at a maximum speed of one element per iteration. The advantage of block-diagonal preconditioning is that it removes the stiffness that may result from equations with different time scales. A typical class of problems for which block-diagonal preconditioning is commonly used is chemically reacting flows, where the time scales of chemical reactions may be orders of magnitude smaller than the (physically interesting) advection time scale of the fluid.
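A sketch of the corresponding preconditioner application (Python/numpy; the array layout and names are illustrative assumptions). Since the blocks decouple point by point, a batched solve suffices:

```python
import numpy as np

def block_diag_solve(Kdiag, r):
    """Block-diagonal preconditioning: Ktilde consists of the
    neqns x neqns diagonal blocks only (e.g. 7 x 7 blocks for
    compressible flow with a k-epsilon model).
    Kdiag: (npoin, neqns, neqns) diagonal blocks; r: (npoin, neqns)."""
    return np.linalg.solve(Kdiag, r)   # batched solve, one block per point
```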
7.2.1.3 LU preconditioning
Although point preconditioners are extremely fast, all the inter-element propagation of information occurs on the RHS, resulting in slow convergence rates. Faster information transfer can only be achieved by neglecting fewer entries from K in K̃, albeit at higher CPU and storage costs. If we recall that the solution of a lower (or upper) triangular matrix by itself is simple, a natural way to obtain better preconditioners is to attempt a preconditioner of the form K̃ = K̃_L · K̃_U, with K̃_L = L + D and K̃_U = D + U. Expansion of this product shows that it does not approximate the original matrix K well: an extra diagonal term has appeared. This may be remedied by interposing the inverse of the diagonal between the lower and upper matrices, resulting in

K̃ = K̃_L · D^{-1} · K̃_U = (L + D) · D^{-1} · (D + U) = K + L · D^{-1} · U. (7.21)

The error may also be mitigated by adding, for subsequent iterations, a correction with the latest information of the unknowns. This leads to two commonly used schemes, Gauss–Seidel (GS) and lower-upper symmetric Gauss–Seidel (LU-SGS).
The LU-SGS scheme may be written as

K · u = (L + D + U) · u = r + L · D^{-1} · U · (u^0 − u). (7.24)
In most cases u^0 = 0. GS and LU-SGS have been used extensively in CFD, both as solvers and as preconditioners. In this context, very elaborate techniques that combine physical insight, local eigenvalue decompositions and scheme switching have produced very fast and robust preconditioners (Sharov and Nakahashi (1998), Luo et al. (1998), Sharov et al. (2000a), Luo et al. (2001)).
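A sketch of one LU-SGS preconditioner application (7.21) in the dense setting used above (illustrative names; real CFD implementations sweep edge- or point-wise over sparse data):

```python
import numpy as np

def lusgs_solve(K, r):
    """Apply the LU-SGS preconditioner (7.21):
    (L + D) . D^{-1} . (D + U) du = r,
    via one forward and one backward triangular sweep."""
    N = len(r)
    D = np.diag(K)
    v = np.zeros(N)
    for i in range(N):                   # forward sweep: (L + D) v = r
        v[i] = (r[i] - K[i, :i] @ v[:i]) / D[i]
    v *= D                               # interpose D^{-1}: (D + U) du = D v
    du = np.zeros(N)
    for i in range(N - 1, -1, -1):       # backward sweep
        du[i] = (v[i] - K[i, i + 1:] @ du[i + 1:]) / D[i]
    return du
```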
Diagonal +1 preconditioning
Consider a structured grid of m × n points. Furthermore, assume that a discretization of the Laplacian using the standard stencil

−u_{i−1,j} − u_{i,j−1} + 4 u_{i,j} − u_{i+1,j} − u_{i,j+1} = r_{i,j} (7.25)

is being performed.
Figure 7.2 Matrix resulting from m × n structured grid
The resulting K matrix for the numbering shown in Figure 7.2(a) is depicted in Figure 7.2(b). As one can see, K consists of a tridiagonal core D and regular outlying bands. Retaining only the tridiagonal core yields the so-called diagonal+1 preconditioner, which can be solved for directly at low cost and has been used repeatedly in practice (Hassan et al. (1990), Martin and Löhner (1992), Mavriplis (1995), Soto et al. (2003)). For cases where no discernible spatial direction for stiffness exists, several point renumberings should be employed, with the aim of covering as many i, j, . . . directions as possible.
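The tridiagonal core can be solved directly in O(N) operations with the Thomas algorithm; a sketch (illustrative names; the three bands stored as 1-D arrays):

```python
import numpy as np

def thomas(lower, diag, upper, r):
    """Thomas algorithm: direct O(N) solve of the tridiagonal core
    retained by the diagonal+1 preconditioner.
    lower[i] multiplies u[i-1], upper[i] multiplies u[i+1]."""
    n = len(diag)
    d = diag.astype(float).copy()
    r = r.astype(float).copy()
    for i in range(1, n):                # forward elimination
        w = lower[i] / d[i - 1]
        d[i] -= w * upper[i - 1]
        r[i] -= w * r[i - 1]
    u = np.zeros(n)
    u[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):       # back substitution
        u[i] = (r[i] - upper[i] * u[i + 1]) / d[i]
    return u
```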
Diagonal +1 Gauss–Seidel
As before, the unknowns already obtained during the solution of K̃ = D can be re-used with minor additional effort, resulting in diagonal+1 GS preconditioning. For structured grids, this type of preconditioning is referred to as line GS relaxation. The resulting preconditioning matrices are of the form

K̃_L = L + D,  K̃_U = D + U, (7.28)

where D again denotes the tridiagonal core and L, U the remaining lower and upper bands.
7.2.1.4 Incomplete lower-upper preconditioning
All the preconditioners described so far avoided the large operation count and storage requirements of a direct inversion of K by staying close to the diagonal when operating with K̃. For incomplete lower-upper (ILU) preconditioning, the product decomposition K̃ = L̃ · Ũ of the Crout solver is carried out, but all entries of L̃ and Ũ that would fall outside the sparsity pattern of K (the so-called fill-in) are discarded.
If K is tridiagonal, then K̃ = K, implying perfect preconditioning. The observation often made is that the quality of K̃ depends strongly on the bandwidth, which in turn depends on the point numbering (Duff and Meurant (1989), Venkatakrishnan and Mavriplis (1993, 1995)). The smaller the bandwidth, the closer K̃ is to K, and the better the preconditioning. This is to be expected for problems with no discernible stiffness direction. If, on the other hand, a predominant stiffness direction exists, the point numbering should be aligned with it. This may or may not result in small bandwidths (see Figure 7.3 for a counterexample), but is certainly the most advisable way to renumber the points.
Figure 7.3 Counterexample
Before going on, the reader should consider the storage requirements of ILU preconditioners. Assuming the lower bound of no allowed fill-in (nfilr=0), a discretization of space using linear tetrahedra and neqns unknowns per point, we require nstor=2*neqns*neqns*nedge storage locations for L̃, Ũ, which for the Euler or laminar Navier–Stokes equations with neqns=5 and on a typical mesh with nedge=7*npoin translates into nstor=350*npoin storage locations. Given that a typical explicit Euler solver on the same type of grid only requires nstor=90*npoin storage locations, it is not difficult to see why even one more layer of fill-in is seldom used.
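A dense-matrix sketch of ILU(0), i.e. no allowed fill-in (illustrative only; production codes operate on edge-based sparse storage, as the estimates above indicate):

```python
import numpy as np

def ilu0(K):
    """ILU(0): factorization in which every entry that would create
    fill-in (i.e. lies outside the sparsity pattern of K) is discarded."""
    A = K.astype(float).copy()
    pattern = K != 0                     # allowed (no-fill-in) entries
    N = A.shape[0]
    for i in range(1, N):
        for k in range(i):               # eliminate row i against row k
            if not pattern[i, k]:
                continue
            A[i, k] /= A[k, k]           # multiplier: strictly lower part of Ltilde
            for j in range(k + 1, N):
                if pattern[i, j]:        # update only pre-existing entries
                    A[i, j] -= A[i, k] * A[k, j]
    return A   # Ltilde (unit diagonal) and Utilde stored in one array
```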
7.2.1.5 Block methods
Considering that the cost of direct solvers scales with the square of the bandwidth, another possibility is to decompose K into blocks. These blocks are then solved for directly. The reduction in cost is a result of neglecting all matrix entries outside the block, leading to lower bandwidths. With the notation of Figure 7.4, we may decompose K additively as

K = L_b + D_b + U_b, (7.32)

where D_b contains the diagonal blocks and L_b, U_b the remaining lower and upper blocks.
Figure 7.4 Decomposition of a matrix
For the additive decomposition, one can either operate without reusing the unknowns at the solution stage, i.e. just on the diagonal level,

K̃ = D_b, (7.33)

or, analogous to Gauss–Seidel, by reusing the unknowns at the solution stage,
K̃_L = L_b + D_b,  K̃_U = U_b + D_b. (7.34)

For the product decomposition, the preconditioner is of the form

K̃ = (I + E_L) · D_b · (I + E_U),  E_L = L_b · D_b^{-1},  E_U = D_b^{-1} · U_b, (7.35)

where I denotes the identity matrix, and the E-matrices contain the off-diagonal block entries, scaled by D_b.
As before, the propagation of information is determined by the numbering of the blocks. Typical examples of this type of preconditioning are element-by-element (Hughes et al. (1983a,c)) or group-by-group (Tezduyar and Liou (1989), Tezduyar et al. (1992a), Liou and Tezduyar (1992)) techniques.
7.2.2 GLOBALIZATION PROCEDURES
As seen from the previous section, any form of preconditioning neglects some information from the original matrix K. The result is that, after an initially fast convergence, a very slow rate of convergence sets in. In order to avert this behaviour, a number of acceleration or globalization procedures have been devised. The description that follows starts with the analytical ones, and then proceeds to matrix-based and grid-based acceleration. Let us recall the basic iterative scheme (7.13): K̃ · Δu = τ(f − K · u^n). If a discretization of the Laplacian on a structured grid with mesh spacings h_x, h_y, h_z is employed, the resulting discretization at node i, j, k for the Jacobi iterations with K̃ = D and τ = Δt takes the form
4(1 + a² + b²) Δu_{i,j,k} = Δt [(u_{i−1,j,k} − 2u_{i,j,k} + u_{i+1,j,k}) + a² (u_{i,j−1,k} − 2u_{i,j,k} + u_{i,j+1,k}) + b² (u_{i,j,k−1} − 2u_{i,j,k} + u_{i,j,k+1})], (7.38)

with a = h_x/h_y, b = h_x/h_z. Inserting the Fourier mode

u = g^p_{m,n,l} exp(iπx/(m h_x)) exp(iπy/(n h_y)) exp(iπz/(l h_z)), (7.39)

where i here denotes the imaginary unit, into (7.38) yields a decay factor per timestep of the form g = 1 − Δt f(a, b, m, n, l).
Note that we have lumped the constant portions of the grid and the mode into the function f(a, b, m, n, l). After p timesteps with varying Δt, the decay factor will be given by

g_p = ∏_{q=1}^{p} (1 − Δt_q f(a, b, m, n, l)). (7.42)

Two timestep sequences that have proven effective are:
(a) Tchebicheff sequence (Löhner and Morgan (1987)):

Δt_q = Δt_0 / (1 + cos[π(q − 1)/p]), q = 1, . . . , p; (7.43a)

(b) superstep sequence (Gentzsch and Schlüter (1978), Gentzsch (1980)):

Δt_q = Δt_0 / (1 + (R/p²) + cos[π(2q − 1)/(2p)]), q = 1, . . . , p, R = 2.5. (7.43b)

Observe that in both cases the maximum timestep is of order Δt = O(p²), which is outside the stability range. The overall procedure nevertheless remains stable, as the smaller timesteps
'rescue' the stability. Figure 7.5 compares the performance of the two formulas for the 1-D case with that of uniform timesteps. The improvement in residual reduction achieved by the use of non-uniform timesteps is clearly apparent.
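A small sketch generating both sequences and evaluating the compound decay factor (7.42). The normalization Δt_q = Δt_0/(·), with Δt_0 the base timestep, follows the reconstruction of (7.43a,b) above, so treat it as an assumption:

```python
import numpy as np

def tchebicheff(dt0, p):
    # Eqn. (7.43a), q = 1, ..., p
    q = np.arange(1, p + 1)
    return dt0 / (1.0 + np.cos(np.pi * (q - 1) / p))

def superstep(dt0, p, R=2.5):
    # Eqn. (7.43b), q = 1, ..., p
    q = np.arange(1, p + 1)
    return dt0 / (1.0 + R / p**2 + np.cos(np.pi * (2 * q - 1) / (2 * p)))

def decay(dts, f):
    # Compound decay factor (7.42) of a mode with lumped function value f
    return np.prod(1.0 - dts * f)
```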
Figure 7.5 Damping curves for the Laplacian: Eqn. (7.43a), Eqn. (7.43b) and uniform timestep τ = 0.8
Returning to (7.42), let us determine the magnitude of Δt required to eliminate a certain mode. For any given mode g_{m,n,l}, the mode can be eliminated by choosing a timestep of magnitude

Δt_{m,n,l} = 2(1 + c²) / [(1 + cos(π/m)) + a²(1 + cos(π/n)) + b²(1 + cos(π/l))], (7.44)

with c² = a² + b². The timesteps required to eliminate the three highest modes have been summarized in Table 7.1. Two important trends may be discerned immediately.
(a) The magnitude of Δt or, equivalently, the number of iterations required to eliminate the three highest modes, increases with the dimensionality of the problem. This is the case even for uniform grids (a = b = 1).
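A small sketch evaluating (7.44) as reconstructed above (the numerator 2(1 + c²) is an assumption; the denominators follow the text). It illustrates trend (a): for the same mode, the required Δt grows with the dimensionality even on uniform grids:

```python
import numpy as np

def dt_mode(m, n, l, a=1.0, b=1.0):
    """Timestep (7.44) required to annihilate mode (m, n, l);
    the numerator 2(1 + c^2) is a reconstruction, not from the book."""
    c2 = a**2 + b**2
    denom = ((1 + np.cos(np.pi / m))
             + a**2 * (1 + np.cos(np.pi / n))
             + b**2 * (1 + np.cos(np.pi / l)))
    return 2.0 * (1.0 + c2) / denom

print(dt_mode(2, 1, 1, a=0.0, b=0.0))   # 1-D case: ~2.0
print(dt_mode(2, 1, 1, a=1.0, b=1.0))   # 3-D uniform grid: ~6.0
```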