Applied Computational Fluid Dynamics Techniques - Wiley Episode 2 Part 3 pps



(c) Figure 14.9 Continued


14.5.2 SHOCK–OBJECT INTERACTION IN TWO DIMENSIONS

Figures 14.9(a)–(c) show a case taken from Baum and Löhner (1992). They show classic h-refinement for strongly unsteady flows at its best. For this class of problems a new mesh is required every five to seven timesteps, strict conservation of mass, momentum and energy during refinement is critical, and the introduction of dissipation due to information loss during interpolation when remeshing proves disastrous for accuracy. A maximum of six levels of refinement was specified for this case, yielding meshes that on average have 300 000 triangles and 100 000 points. Figures 14.9(a) and (b) show the mesh, mesh refinement levels and pressures for different times.

Figure 14.10 Shock–object interaction in three dimensions: panels (a) and (b)

Observe the detail in the physics that is achievable through adaptation. Notice furthermore the small extent of the regions that require refinement as compared to the overall domain. The equivalent uniform mesh run would have required more than two orders of magnitude more elements, CPU time and memory, pushing the limits of available supercomputers. The comparison to experimental results, given in Figure 14.9(c), reveals that indeed very accurate results with a minimum of degrees of freedom are achieved using adaptive grid refinement for this class of problems.

Figure 14.11 Shock–structure interaction: (a) building definition; (b) surface mesh and pressure; (c) mesh and pressure in plane

Figure 14.12 Object falling into supersonic free stream

14.5.3 SHOCK–OBJECT INTERACTION IN THREE DIMENSIONS

Figures 14.10(a)–(b) show a case taken from Baum and Löhner (1991) and Löhner and Baum (1992). The object under consideration is a common main battlefield tank. A maximum of two layers of refinement was specified close to the tank, whereas only one level of refinement was employed farther away. The original, unrefined, but strongly graded mesh consisted of approximately 100 000 tetrahedra and 20 000 points. During the run, a mesh change (refinement and coarsening) occurred every five timesteps, and the mesh size increased to approximately 1.6 million tetrahedra and 280 000 points. This represents an increase factor of 1:16. Although seemingly high, the corresponding global h-refinement would have resulted in a 1:64 size increase, since each level of regular refinement splits a tetrahedron into eight and two levels therefore give 8 × 8 = 64. A second important factor is that most of the elements of the original mesh are close to the body, where most of the refinement is going to take place. Figures 14.10(a) and (b) show surface gridding and pressure contours at two selected times during the run. The extent of mesh refinement is clearly discernible, as well as the location and interaction of shocks.


14.5.4 SHOCK–STRUCTURE INTERACTION

Figures 14.11(a)–(c) show a typical shock–structure interaction case. The building under consideration is shown in Figure 14.11(a). One layer of refinement was specified wherever the physics required it. The pressures and grids obtained at the surface and at planes at a given time are shown in Figures 14.11(b) and (c). The mesh had approximately 60 million tetrahedra.

14.5.5 OBJECT FALLING INTO SUPERSONIC FREE STREAM IN TWO DIMENSIONS

The problem statement is as follows: an object is placed in a cavity surrounded by a free stream at M∞ = 1.5. After the steady-state solution is reached (time T = 0.0), a body motion is prescribed, and the resulting flowfield disturbance is computed. Adaptive remeshing was performed every 100 timesteps initially, while at later times the grid was modified every 50 timesteps. One level of global h-refinement was used to accelerate the grid regeneration. The maximum stretching ratio specified was S = 5.0. Figure 14.12 shows different stages during the computation at times T = 60 and T = 160. One can clearly see how the location and strength of the shocks change due to the motion of the object. Notice how the directionality of the flow features is reflected in the mesh.


15 EFFICIENT USE OF COMPUTER HARDWARE

However clever an algorithm may be, it has to run efficiently on the available computer hardware. Each type of computer, from the PC to the fastest massively parallel machine, has its own shortcomings that must be accounted for when developing both the algorithms and the simulation code. The present section assumes that the algorithm has been selected, and identifies the main issues that must be addressed in order to achieve good performance on the most common types of computers. The main types of computer platforms currently being used are as follows.

(a) Personal computers. Although perhaps not considered a serious analysis tool even a decade ago, personal computers can already be used cost-effectively for 3-D simulations. In fact, many applications where CPU time is not a constraining factor are currently being carried out on PCs. Most CFD software companies report higher revenues from PC platforms than from all other platforms combined. High-end PCs (4 Gbytes of RAM, 120 GFLOPS graphics card) are ideal tools for simulations. We see this as one more proof of the theme that has been repeated so often in this book: a CFD run is more than just CPU; if it were only CPU, vector machines would have become the dominant type of computer. Rather, it consists of problem definition, grid generation, flow solver execution and visualization. High-end PCs combine a relatively fast CPU with good visualization hardware, allowing one to cut down the most expensive cost component of any run: man-hours.

(b) Vector machines. These machines achieve higher speeds by splitting up arithmetic operations (fetch, align, add, multiply, store, etc.), performing each on different data items concurrently. The assumption made is that the same basic operation(s) have to be performed on a relatively large number of data items. These data items can be thought of as vectors, hence the name. As an example, consider the operation D=C*(A+B). While the central CPU fetches the data from memory for the ith item, it may align the data for item i + 1, add two numbers for item i + 2, multiply numbers for item i + 3 and store the results for item i + 4. This would yield a speedup of 1:4. In practice, many more operations than the ones described above are required even to add two numbers. Hence, speedups of about one order of magnitude are achievable (1:14 on the Cray-X or NEC-SX series).

(c) Single instruction multiple data (SIMD) machines. Here the assumption made is that all data items (e.g. elements, points, etc.) will be subject to the same arithmetic operations. In order to go beyond the one order of magnitude speedup of vector machines, thousands of processors are combined. Each processor performs the same task on a different piece of data. While this type of machine did not succeed when based on conventional chips, high-end graphics cards are increasingly being used in this mode (Hagen et al. (2006), LeGresley et al. (2007)).

Applied Computational Fluid Dynamics Techniques: An Introduction Based on Finite Element Methods, Second Edition.


(d) Multiple instruction multiple data (MIMD) machines. In this case different arithmetic operations may be performed on different processors. This circumvents some of the restrictions posed by the SIMD assumption that all processors are performing the same arithmetic operation. On the other hand, the operating system software required to keep these machines functioning is much more involved and sensitive than that required for SIMD machines.

The emerging architecture for future machines is a generalization of the MIMD machine, where some processors may be based on commodity, general-purpose chips, others on reduced instruction set chips (RISC-chips), others on powerful vector-processors, and some have SIMD architecture. An example of such a machine is the Cray-T3E, which combines a Cray-T90 vector supercomputer with up to 2056 Alpha-Chip-based processors. An architecture like this, which combines scalar, vector and distributed memory parallel processing, requires the programmer to take into consideration all the individual aspects encountered in each of these architectures.

If the data required by the CPU for subsequent arithmetic operations is not close enough to fit into the cache, this piece of information will have to be fetched from memory or disk. This is called a cache-miss. Depending on the frequency of cache-misses versus CPU, a serious degradation in performance, often in excess of 1:10, can take place. The relative number of cache-misses invariably increases with problem size. The aim of the renumbering strategies considered in the present section is to minimize the frequency of cache-misses, i.e. to retard the degradation of performance with problem size. The main techniques considered are:

- array access in loops;

- renumbering of points to reduce the spread in memory of the items fetched by a single element or edge;

- reordering of the nodes in each element so that data is accessed in as uniform a way as possible within each element; and

- renumbering of elements, faces and edges so that data is accessed in as uniform a way as possible when looping over them.

15.1.1 ARRAY ACCESS IN LOOPS

Storing all the arrays required (elements, coordinates, unknowns, edges, etc.) in a way that is compatible with the way they are accessed within loops reduces cache-misses appreciably. To see why, consider the array containing the coordinates of the points. It may be stored horizontally (flat storage, à la coord(ndimn,npoin)) or, as would be the preferred choice on some Crays, vertically, à la coord(npoin,ndimn).


Suppose that the difference vector (dx, dy, dz) of the two endpoints of an edge is required. This implies fetching six items and performing three arithmetic operations. For flat storage, the jump in memory is given by

whereas for vertical storage the jumps are

The difference in the number of large jumps is clearly visible from this comparison. For this reason, flat storage is recommended for any machine with cache. Note that, for codes written in C, the opposite holds, as the second index moves faster than the first one.
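The comparison of memory jumps can be illustrated with a small model. The following is a minimal sketch under stated assumptions, not the book's own listing: Fortran-style column-major storage (first index fastest), 1-based indices, and a fetch order of x, y, z for the first endpoint followed by x, y, z for the second. The function names and the sample values of `ip1`, `ip2` and `npoin` are hypothetical.

```python
def flat_offset(idim, ipoin, ndimn, npoin):
    """Linear offset of coord(idim, ipoin) for flat storage coord(ndimn, npoin)."""
    return (ipoin - 1) * ndimn + (idim - 1)


def vertical_offset(idim, ipoin, ndimn, npoin):
    """Linear offset of coord(ipoin, idim) for vertical storage coord(npoin, ndimn)."""
    return (idim - 1) * npoin + (ipoin - 1)


def edge_jumps(offset, ip1, ip2, ndimn, npoin):
    """Jumps in memory between the successive fetches needed for one edge."""
    # Assumed fetch order: all components of ip1, then all components of ip2.
    order = [(idim, ip) for ip in (ip1, ip2) for idim in range(1, ndimn + 1)]
    addrs = [offset(idim, ip, ndimn, npoin) for idim, ip in order]
    return [b - a for a, b in zip(addrs, addrs[1:])]


ndimn, npoin = 3, 100_000   # assumed problem size
ip1, ip2 = 1_000, 1_050     # assumed endpoints of one edge

flat_jumps = edge_jumps(flat_offset, ip1, ip2, ndimn, npoin)
vertical_jumps = edge_jumps(vertical_offset, ip1, ip2, ndimn, npoin)
print("flat    :", flat_jumps)
print("vertical:", vertical_jumps)
```

Under these assumptions flat storage incurs a single non-unit jump of ndimn*(ip2-ip1)-2, whereas vertical storage incurs a jump of size npoin on almost every fetch.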

15.1.2 POINT RENUMBERING

Consider the evaluation of an edge RHS (the same basic principle applies to element-based or face-based solvers), given by a loop over the edges with the following steps:

(a) gather point information into the edge;

(b) perform the required mathematical operations at edge level;

(c) scatter-add the edge RHS to the assembled point RHS.
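The three steps can be sketched as a short loop. This is only an illustration: the edge-level operation used here (a coefficient times the difference of the unknowns) is an assumed stand-in for whatever the solver actually computes, and the names `assemble_edge_rhs`, `edges`, `unknowns` and `edge_coeff` are hypothetical.

```python
def assemble_edge_rhs(edges, unknowns, edge_coeff):
    """Assemble a point RHS from edge contributions."""
    rhs = [0.0] * len(unknowns)
    for (ip1, ip2), coeff in zip(edges, edge_coeff):
        # (a) gather point information into the edge
        u1, u2 = unknowns[ip1], unknowns[ip2]
        # (b) perform the operations at edge level (assumed: a simple flux)
        flux = coeff * (u2 - u1)
        # (c) scatter-add the edge RHS to the assembled point RHS
        rhs[ip1] += flux
        rhs[ip2] -= flux
    return rhs


edges = [(0, 1), (1, 2), (2, 3)]      # 0-based point numbers of each edge
unknowns = [0.0, 1.0, 4.0, 9.0]
edge_coeff = [1.0, 1.0, 1.0]
rhs = assemble_edge_rhs(edges, unknowns, edge_coeff)
print(rhs)
```

The indirect accesses `unknowns[ip1]`, `unknowns[ip2]` and `rhs[...]` are where the memory traffic of steps (a) and (c) arises, so the spacing of `ip1` and `ip2` in memory is what renumbering tries to control.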


The transfer of information to and from memory required in steps (a) and (c) is proportional to the number of nodes in the edge (element, face) and the number of unknowns per node. If the nodes within each edge (element, face) are widely spaced in memory, cache-misses are likely to occur. If, on the other hand, all the points within an element are ‘close’ in memory, cache-misses are minimized. From these considerations, it becomes clear that cache-misses are directly linked to the bandwidth of the equivalent matrix system (or graph). Point renumbering to reduce bandwidths has been an important theme for many years in traditional finite element applications (Pissanetzky (1984), Zienkiewicz (1991)). The aim was to reduce the cost of the matrix inversion, which was considered to be the most expensive part of any finite element simulation.

Figure 15.1 Ordering of points for 2-D mesh: (a) horizontal numbering; (b) vertical numbering

The optimal renumbering of points in such a way that spatial (or near-neighbour) locality is mirrored in memory is a problem of formidable algorithmic complexity. Fortunately, most of the benefits of renumbering points are already obtained from near-optimal heuristic renumbering techniques. To see how most of these fast, near-optimal techniques work, consider the rectangular domain with a structured mesh shown in Figure 15.1. Numbering the points in the horizontal (Figure 15.1(a)) and vertical (Figure 15.1(b)) directions yields an average bandwidth of nx and ny, respectively. One should therefore aim to number the points in the direction normal to the longest graph depth. Based on this observation, several point renumbering techniques have been developed. To exemplify these techniques, the simple mesh shown in Figure 15.2 is considered.
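The effect of the numbering direction on a structured mesh can be checked with a few lines of code. The sketch below is an illustration under assumptions: it builds the point-to-point edges of an nx-by-ny grid of points and measures the maximum (rather than the average) difference in point numbers across an edge; the helper names are hypothetical.

```python
def structured_edges(nx, ny, number):
    """Edges of an nx-by-ny structured grid of points under a given numbering."""
    edges = []
    for j in range(ny):
        for i in range(nx):
            if i + 1 < nx:
                edges.append((number(i, j), number(i + 1, j)))
            if j + 1 < ny:
                edges.append((number(i, j), number(i, j + 1)))
    return edges


def bandwidth(edges):
    """Largest difference in point numbers across any edge."""
    return max(abs(a - b) for a, b in edges)


nx, ny = 20, 5   # assumed grid: long in x, short in y


def horizontal(i, j):
    return i + nx * j   # x index runs fastest


def vertical(i, j):
    return j + ny * i   # y index runs fastest


bw_h = bandwidth(structured_edges(nx, ny, horizontal))
bw_v = bandwidth(structured_edges(nx, ny, vertical))
print(bw_h, bw_v)
```

For this elongated grid, letting the index in the short direction run fastest gives the smaller bandwidth, in line with the recommendation to number normal to the longest graph depth.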

15.1.2.1 Directional ordering

If the direction of maximal graph depth is known, one can simply order the points in this direction. This is perhaps the simplest (and fastest) possible renumbering, but implies that the problem class being addressed has a clear maximal graph depth direction that can easily be identified. Renumbering in the x-direction, this yields the numbering shown in Figure 15.3.
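Directional ordering amounts to a sort along one coordinate. The following minimal sketch assumes that the direction of maximal graph depth is the x-axis and that points are numbered from zero; `directional_renumbering` and the sample data are hypothetical.

```python
def directional_renumbering(coords, axis=0):
    """Return new_of_old: the new number of each point after sorting along one axis."""
    order = sorted(range(len(coords)), key=lambda ip: coords[ip][axis])
    new_of_old = [0] * len(coords)
    for new, old in enumerate(order):
        new_of_old[old] = new
    return new_of_old


coords = [(2.0, 0.0), (0.0, 1.0), (3.0, 1.0), (1.0, 0.0)]
edges = [(0, 3), (1, 3), (0, 2)]

new_of_old = directional_renumbering(coords)
# The connectivity has to be translated to the new point numbers as well.
new_edges = [(new_of_old[a], new_of_old[b]) for a, b in edges]
print(new_of_old, new_edges)
```

After renumbering, every edge in this small example connects consecutive point numbers, which is the locality the technique aims for.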

Figure 15.2 Original mesh

Figure 15.3 Renumbering in the x-direction

15.1.2.2 Bin ordering

Given an arbitrary distribution of points, one may first place the points in bins of uniform size h. One can then identify, by ordering the number of bins in the x, y, z directions in ascending size i, j, k, the plane k that traverses space yielding the lowest bandwidth, i.e. the closest proximity in memory. Bins offer the advantage of high speed (very few operations are required, and most of these are easy to vectorize/parallelize) and simplicity. After obtaining the overall dimensions of the computational domain, bin ordering may be realized in two ways:

(1) Obtain the bin each point falls into; store the points into bins (e.g. using a linked list); retrieve the points from the bins, renumbering points;

(2) Obtain the bin each point falls into; assign a number to the point based on the bin it falls into (e.g. inumb=ibinx+nbinx*(ibiny-1)+nbinx*nbiny*(ibinz-1)); store the points in a heap list (based on the assigned number); retrieve the points from the heap list, renumbering points.

Bins are mostly used for grids with modest changes in element size. Figure 15.4 shows the bin ordering of points for the mesh from Figure 15.2.
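The second variant can be sketched as follows. This is an illustration under assumptions: the bin number uses the formula quoted above with 1-based bin indices, Python's standard `heapq` module stands in for the heap list, and the directions are taken as already ordered so that x runs fastest (the reordering of the directions by number of bins is not shown). The function name and sample points are hypothetical.

```python
import heapq


def bin_renumbering(coords, h):
    """Renumber 3-D points by the number of the bin of size h they fall into."""
    lo = [min(c[d] for c in coords) for d in range(3)]
    hi = [max(c[d] for c in coords) for d in range(3)]
    nbinx, nbiny, _ = (int((hi[d] - lo[d]) / h) + 1 for d in range(3))

    heap = []
    for ip, c in enumerate(coords):
        # 1-based bin indices, as in the formula quoted in the text
        ibinx, ibiny, ibinz = (int((c[d] - lo[d]) / h) + 1 for d in range(3))
        inumb = ibinx + nbinx * (ibiny - 1) + nbinx * nbiny * (ibinz - 1)
        heapq.heappush(heap, (inumb, ip))

    # Retrieve the points in ascending bin number, renumbering as we go.
    new_of_old = [0] * len(coords)
    for new in range(len(coords)):
        _, old = heapq.heappop(heap)
        new_of_old[old] = new
    return new_of_old


coords = [(0.9, 0.1, 0.0), (0.1, 0.1, 0.0), (0.1, 0.9, 0.0), (0.9, 0.9, 0.0)]
new_of_old = bin_renumbering(coords, h=0.5)
print(new_of_old)
```

Points that share a bin keep their original relative order here, because ties in `inumb` are broken by the old point number.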


Figure 15.4 Renumbering using bins

subdivided space). This is easily accomplished using quadtrees (two dimensions) or octrees (three dimensions). These data structures have already been described in Chapter 2. Having stored all the points, the quad/octree is traversed as shown in Figure 15.5, renumbering the points. One can see that in this way spatial proximity is mirrored in memory in a near-optimal way.

Figure 15.5 Renumbering using quadtree

15.1.2.4 Space-filling curves

A very similar effect to that of quad/octree ordering can be achieved by using so-called space-filling curves. A typical curve that is often employed is the Peano–Hilbert–Morton curve shown in Figure 15.6 for two dimensions. Any point in space can be thought of as lying on this curve. This implies that, once the coordinate ξ along this line has been established for each point, the points can be renumbered in ascending order of ξ. One can see from Figure 15.6 the similarity with quad/octree renumbering, as well as the effectiveness of the procedure.
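Renumbering along a space-filling curve can be sketched in two dimensions as follows. The sketch assumes the Hilbert variant of the curve and uses the widely known bitwise conversion from integer cell indices to a curve position; the grid resolution `order`, the quantization of the coordinates onto cells and all function names are assumptions made for illustration.

```python
def hilbert_index(order, ix, iy):
    """Position of integer cell (ix, iy) along a 2-D Hilbert curve on a 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if ix & s else 0
        ry = 1 if iy & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                      # rotate/reflect the quadrant
            if rx == 1:
                ix, iy = n - 1 - ix, n - 1 - iy
            ix, iy = iy, ix
        s >>= 1
    return d


def curve_renumbering(coords, order=4):
    """Renumber 2-D points in ascending order of their curve coordinate."""
    n = 1 << order
    lo = [min(c[d] for c in coords) for d in range(2)]
    hi = [max(c[d] for c in coords) for d in range(2)]

    def xi(c):
        cell = []
        for d in range(2):
            span = (hi[d] - lo[d]) or 1.0    # guard against zero extent
            cell.append(min(n - 1, int(n * (c[d] - lo[d]) / span)))
        return hilbert_index(order, cell[0], cell[1])

    ranked = sorted(range(len(coords)), key=lambda ip: xi(coords[ip]))
    new_of_old = [0] * len(coords)
    for new, old in enumerate(ranked):
        new_of_old[old] = new
    return new_of_old


coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
new_of_old = curve_renumbering(coords, order=1)
print(new_of_old)
```

Consecutive positions along the Hilbert curve always belong to neighbouring cells, which is why points that are close on the curve also tend to be close in space.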

Figure 15.6 Renumbering using space-filling curves

15.1.2.5 Wave renumbering

All of the techniques discussed so far have only required the spatial location of points to achieve near-optimal renumberings. However, if a mesh is given, one can obtain from the connectivity table the nearest-neighbours for each point and construct renumbering techniques with this information. One of the most useful techniques is the so-called wave renumbering or advancing-front renumbering technique. Starting from a given point, new points are added in layers according to the smallest connectivity. The ‘front’ of renumbered points is advanced through the grid until all points have been covered (see Figure 15.7).

Figure 15.7 Wave front renumbering

The choice of the seed-point can have a significant effect on the total bandwidth obtained. Unfortunately, choosing the optimal starting point may be more expensive than the whole subsequent simulation. A very effective heuristic approach (all of the bandwidth minimization strategies are heuristic by nature) is to choose the last point of the renumbered mesh as the starting point for a new renumbering pass. This procedure is repeated until no further reduction in the bandwidth is achieved. Convergence is obtained in a relatively small number of passes, typically fewer than five, even for complex 3-D meshes.
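The wave renumbering and the restart heuristic can be sketched as follows. This is a minimal illustration, assuming a connected mesh given as a point-to-neighbour table; the tie-breaking, the `max_passes` limit and the function names are assumptions, not part of the description above.

```python
from collections import deque


def wave_renumbering(neighbours, seed):
    """Number points in layers, advancing a front from the seed point."""
    new_of_old = {seed: 0}
    order = [seed]
    front = deque([seed])
    while front:
        ip = front.popleft()
        # Add unnumbered neighbours, smallest connectivity first.
        for jp in sorted(neighbours[ip], key=lambda p: len(neighbours[p])):
            if jp not in new_of_old:
                new_of_old[jp] = len(order)
                order.append(jp)
                front.append(jp)
    return new_of_old, order


def bandwidth(neighbours, new_of_old):
    """Largest difference in new point numbers between neighbouring points."""
    return max(abs(new_of_old[a] - new_of_old[b])
               for a in neighbours for b in neighbours[a])


def repeated_wave_renumbering(neighbours, seed, max_passes=10):
    """Restart from the last renumbered point until the bandwidth stops improving."""
    best, order = wave_renumbering(neighbours, seed)
    for _ in range(max_passes):
        trial, trial_order = wave_renumbering(neighbours, order[-1])
        if bandwidth(neighbours, trial) >= bandwidth(neighbours, best):
            break
        best, order = trial, trial_order
    return best


# A chain of five points, deliberately seeded in the middle.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
first, _ = wave_renumbering(neighbours, seed=2)
best = repeated_wave_renumbering(neighbours, seed=2)
print(bandwidth(neighbours, first), bandwidth(neighbours, best))
```

In this small example the poorly chosen seed gives a bandwidth of 2, and a single restart from the last renumbered point (an end of the chain) reduces it to 1.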

An improvement on the wave renumbering technique is the Cuthill–McKee (Cuthill and McKee 1969) or reverse Cuthill–McKee (RCM) reordering. At each stage, the node with the smallest number of surrounding unrenumbered nodes is added to the renumbering table. For meshes, which are characterized by having a bounded number of nearest-neighbours for each point, the improvement of RCM versus wave front is not considerable.
