Procedia Computer Science 101, 2016, Pages 8–17
YSC 2016: 5th International Young Scientist Conference on Computational Science
doi: 10.1016/j.procs.2016.11.003
© 2016 The Authors. Published by Elsevier B.V.
Peer-review under responsibility of the organizing committee of the 5th International Young Scientist Conference on Computational Science
Fast parallel integration for three dimensional Discontinuous Petrov Galerkin method

Maciej Woźniak¹, Marcin Łoś¹, Maciej Paszyński¹, and Leszek Demkowicz²

¹ AGH University of Science and Technology, Kraków, Poland
macwozni@agh.edu.pl, los@agh.edu.pl, paszynsk@agh.edu.pl
² The University of Texas at Austin, Austin, Texas, U.S.A.
leszek@ices.utexas.edu
Abstract
The Finite Element Method comes with the challenge of constructing test functions that provide better stability. The Discontinuous Petrov-Galerkin method constructs optimal test functions "on the fly". However, this method comes with a relatively high computational cost. In this paper we show a parallelization method that reduces the computation time.
Keywords: Finite Element Method, Discontinuous Petrov-Galerkin, parallel, shared memory
In this paper we present a parallelization of the algorithm for the generation of the element matrices for the Discontinuous Petrov-Galerkin (DPG) method [3, 4, 5]. The DPG method is a new, rapidly growing method for solving numerical problems. It enables automatic control of the stability of the numerical formulation. We have parallelized the element routines of the hp3d DPG code developed by the group of Prof. Demkowicz. We are aware of other parallel FEM packages supporting adaptive computations for DPG, including CAMELLIA [14] and DUNE-DPG [10]. However, the hp3d framework is unique in the following ways:
• It supports hexahedral, tetrahedral, prism and pyramid elements in 3D. To the best of our knowledge, CAMELLIA supports only hexahedral elements, and DUNE supports only triangular elements.
• It enables parallel anisotropic refinements over a computational domain discretized with different kinds of finite elements, including tetrahedra, hexahedra, prisms and pyramids. The CAMELLIA and DUNE packages do not allow for anisotropic refinements, and thus the exponential convergence of the numerical solution is not possible there.
• CAMELLIA and DUNE do not support complex H(curl) discretizations. Our framework will enable parallel automatic hp-adaptive computations for different classes of problems, including H1, H(div), and H(curl).
Our preliminary work presented in this paper concerns the parallelization of the element matrices for the elliptic problem. However, our future work will involve parallelization of the H(div) and H(curl) element routines.
Let us focus on the model elliptic problem. In the Sobolev space

H1(Ω) = {u ∈ L2(Ω) : D^α u ∈ L2(Ω), |α| ≤ 1, tr u = 0 on ∂Ω}    (1)

we introduce the classical weak formulation for the Poisson problem in H1(Ω). We seek u ∈ H1(Ω) such that

∫_Ω ∇u · ∇v dx = ∫_Ω f v dx    ∀ v ∈ H1(Ω)
We may also express the above problem in abstract notation:

b(u, v) = l(v)    ∀ v ∈ V    (2)

where in our model problem we have

b(u, v) = ∫_Ω ∇u · ∇v dx,    l(v) = ∫_Ω f v dx
We project the weak problem onto the finite dimensional subspace V_h ⊂ H1(Ω):

∫_Ω ∇u_h · ∇v_h dx = ∫_Ω f v_h dx    ∀ v_h ∈ V_h ⊂ H1(Ω)
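Writing u_h = Σ_j u_j e_j in terms of a basis {e_j} of V_h and testing with each basis function e_i, the projected problem is equivalent to the linear system

Σ_j ( ∫_Ω ∇e_j · ∇e_i dx ) u_j = ∫_Ω f e_i dx,    i = 1, …, dim V_h,

whose matrix is the standard stiffness matrix assembled element by element.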
The mathematical theory concerning the stability of the numerical method for the general weak formulation (2) is based on the famous "Babuška-Brezzi condition" (BBC), developed in the years 1971-1974 independently by Ivo Babuška and Franco Brezzi [13, 12, 9]. The condition states that the problem (2) is stable when

sup_{v ∈ V} |b(u, v)| / ‖v‖_V ≥ γ ‖u‖_U    ∀ u ∈ U    (7)

for some constant γ > 0.
However, the inf-sup condition in the above form concerns the abstract formulation, where we consider all test functions v ∈ V and look for the solution u ∈ U (e.g. U = V). The above condition is also satisfied if we restrict u to the space of trial functions u_h ∈ U_h:

sup_{v ∈ V} |b(u_h, v)| / ‖v‖_V ≥ γ ‖u_h‖_U
However, if we use test functions from the finite dimensional test space V_h = span{v_h} and consider

sup_{v_h ∈ V_h} |b(u_h, v_h)| / ‖v_h‖_{V_h},    (9)

we do not have a guarantee that the supremum (9) will be equal to the original supremum in (7), since we have restricted V to V_h. The optimality of the method depends on the quality
of the polynomial test functions defining the space V_h = span{v_h}, and on how far they are from the supremum realized in (7). Many scientists have devoted years to constructing test functions that provide better stability of the method for a given class of problems [11, 7, 8, 1], and several techniques for stabilizing different kinds of problems have been developed. In 2010 the DPG method was proposed; a modern summary of the method is given in [2]. The key idea of the DPG method is to construct the optimal test functions "on the fly", element by element. The DPG method automatically guarantees the numerical stability of difficult computational problems, thanks to the automatic selection of the optimal test functions.
The DPG method can be derived using one of the following three approaches [2, 6]:
a) Minimum residuum method
b) Petrov-Galerkin with optimal test functions
c) Special mixed methods
The minimum residuum method is good for illustrating the idea of the DPG method; however, it results in an optimal test functions problem that is as expensive as the original problem itself. The construction of the DPG method with the Petrov-Galerkin approach is relatively difficult, so we will only illustrate it here as a tool for making the optimization problem local over each finite element. The special mixed method formulation of the DPG method is the most suitable for efficient implementation, since it results in a modification of the classical finite element method. We will present this method in order to discuss the implementation issues.
Ad a) Let us start with the derivation of the DPG method using the minimum residuum method. For our weak problem (2) we construct the operator

B : U → V'

such that

⟨Bu, v⟩_{V'×V} = b(u, v)    ∀ u ∈ U, v ∈ V,

so we can reformulate the problem as Bu = l in V'. We wish to minimize the residual

u_h = argmin_{w_h ∈ U_h} (1/2) ‖B w_h − l‖_{V'}^2
We introduce the Riesz operator, being the isometric isomorphism

R_V : V → V',    ⟨R_V v, w⟩_{V'×V} = (v, w)_V,

which allows us to project the problem back to V. Since R_V is an isometry, the minimization is equivalent to

u_h = argmin_{w_h ∈ U_h} (1/2) ‖R_V^{-1}(B w_h − l)‖_V^2
The minimum is attained at u_h when the Gâteaux derivative is equal to 0 in all directions:

( R_V^{-1}(B u_h − l), R_V^{-1}(B δu_h) )_V = 0    ∀ δu_h ∈ U_h    (16)
Using again the definition of the Riesz map we get

⟨B u_h − l, R_V^{-1}(B δu_h)⟩ = 0    ∀ δu_h ∈ U_h,

which is equivalent to our original residuum problem

b(u_h, v_{δu_h}) = l(v_{δu_h})

with optimal test functions

v_{δu_h} = R_V^{-1} B δu_h    for each trial function δu_h.    (19)
In other words, with the help of the Riesz operator it is possible to construct the optimal test functions [2]. However, with the traditional weak formulation, the numerical solution of this optimization problem is as expensive as the solution of the original problem itself [2]. In order to make the test function optimization problem local over particular finite elements, we need to break the test spaces and reformulate the weak formulation.
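In the discrete setting, (19) turns into a local linear algebra problem. Denote by {φ_k} a basis of the discrete test space (in practice an enriched, broken space, as discussed below), by G the Gram matrix G_kl = (φ_k, φ_l)_V, and by B the rectangular matrix B_kj = b(e_j, φ_k) for the trial basis {e_j}. The coefficient vector g_j of the optimal test function associated with e_j then solves

G g_j = B(:, j),    j = 1, …, dim U_h,

i.e. one symmetric positive definite solve with the Gram matrix and one right-hand side per trial function. A minimal sketch of this local solve, assuming LAPACK is available (the subroutine and variable names below are illustrative and are not taken from hp3d):

   ! Sketch only: computes the coefficients of the optimal test functions
   ! by solving G T = B, where G is the Gram matrix (m x m) and B holds
   ! b(e_j, phi_k) column by column (m x n).
   subroutine optimal_test_functions(m, n, G, B, info)
      implicit none
      integer, intent(in)    :: m, n
      real(8), intent(inout) :: G(m,m)  ! overwritten by its Cholesky factor
      real(8), intent(inout) :: B(m,n)  ! on exit: optimal test function coefficients
      integer, intent(out)   :: info
      ! G is symmetric positive definite, so a Cholesky-based solver applies
      call dposv('U', m, n, G, m, B, m, info)
   end subroutine optimal_test_functions

Once the test space is broken (as described next), G becomes block-diagonal over the elements, so these solves remain local and comparatively cheap.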
Ad b) In other words, we switch now to the Petrov-Galerkin method, where we use the original trial functions and broken test functions. In order to allow for a local, element-wise solution of the test function optimization problem, we derive the weak formulation with broken test spaces. We introduce the space of broken test functions
H1(Ω_h) = {u ∈ L2(Ω) : u|_K ∈ H1(K) ∀ K ∈ T_h}    (20)

and we introduce the weak formulation with broken test functions. We seek u ∈ H1(Ω), as well as fluxes t ∈ tr_{Γ_h} H(div, Ω), such that

Σ_K ∫_K ∇u · ∇v dx + Σ_K ∫_{∂K} t v ds = ∫_Ω f v dx    for all v ∈ H1(Ω_h)
Everything is now computed element-wise and summed up. In particular, the boundary term can be rewritten as a sum over the faces of the elements,

Σ_K ∫_{∂K} t v ds = Σ_f ∫_f t [v] ds,

where f denotes the faces of the elements, n_f denotes the face normal vector, and [v] denotes the jump of v across the face f (for element faces located on the boundary of the domain we just take the normal component of v, while for element faces located inside the domain we consider the difference of the normal components v^+ and v^- resulting from the two elements sharing the common face). Over a
single element, we now have the following contributions: the element frontal matrices ∫_K ∇u · ∇v dx + ∫_{∂K} t v ds and the right-hand sides ∫_K f v dx. Notice that the first term comes from the integration inside the element, while the second term results from the integration over the boundary of the element. Again, we express the above weak formulation with broken test functions in the following abstract form:
b((u, t), v) = l(v)    ∀ v ∈ H1(Ω_h),

where

b((u, t), v) = Σ_K ∫_K ∇u · ∇v dx + Σ_K ∫_{∂K} t v ds,    l(v) = ∫_Ω f v dx.    (25)

When we compare the standard weak formulation and the formulation with broken test spaces, we can see that in the latter we have more unknowns, namely (u, t). For such a weak formulation with broken test functions, we can compute the optimal test functions automatically. This is the main idea of the DPG (Discontinuous Petrov-Galerkin) method: to construct, on the fly, element by element, the optimal test functions that ensure the numerical stability of the method. However, the implementation based on this Petrov-Galerkin formulation is relatively difficult, and we will rather switch to the formulation of the DPG method as a special mixed method.
Ad c) We will now focus on the formulation of the DPG method using the special mixed method, with the error representation function [6]. The error representation function, given by

Ψ = R_V^{-1}(B u_h − l),

allows us to develop an alternative formulation of the DPG method: Find Ψ ∈ V, u_h ∈ U_h, t ∈ tr_{Γ_h} H(div, K) such that

(Ψ, v)_V − b((u_h, t), v) = −l(v)    ∀ v ∈ V,
b((δu_h, δt), Ψ) = 0    ∀ (δu_h, δt) ∈ U_h.    (29)

Based on the above formulation, we can now construct the element matrices for this problem in the DPG method:

⎡  G     −B1   −B2 ⎤ ⎡ Ψ ⎤   ⎡ −l ⎤
⎢ B1^T    0      0 ⎥ ⎢ t ⎥ = ⎢  0 ⎥    (30)
⎣ B2^T    0      0 ⎦ ⎣ u ⎦   ⎣  0 ⎦
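Reading (30) row by row: the first block row states G Ψ = B1 t + B2 u − l, i.e. Ψ is the Riesz representation of the residual of the broken weak formulation, while the remaining block rows, B1^T Ψ = 0 and B2^T Ψ = 0, are the discrete form of the condition b((δu_h, δt), Ψ) = 0 in (29): the residual representation is orthogonal, in the sense of the bilinear form, to the whole trial space.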
The formulation of the DPG method as a special mixed method results in the structure of the element local matrices presented in equation (30) for the case of approximation in H1(K) spaces. Analogous formulations for H(div, K) and H(curl, K) approximations also result in a similar structure of the element local matrix, but with a different distribution of basis functions and degrees of freedom over element vertices, edges, faces and interiors. In the following estimates we assume three-dimensional hexahedral finite elements with vertex, edge, face and interior nodes. Notice that u ≈ Σ_i u_i e_i is approximated over the element with polynomials of order p from the space u_h ∈ U_h ⊂ H1(K). This means that we have one degree of freedom per vertex node, p − 1 degrees of freedom per edge node, (p − 1)^2 degrees of freedom per face node and (p − 1)^3 degrees of freedom per interior node. The traces t ≈ Σ_i t_i f_i are approximated with polynomials of order p from the space tr_{Γ_h} H(div, K), on the boundary of the elements only. This means that we have one degree of freedom per vertex node, p − 1 degrees of freedom per edge node, and (p − 1)^2 degrees of freedom per face node. The error representation function Ψ ≈ Σ_i Ψ_i e_i is approximated with polynomials of order p + Δp (from the enriched space), also forming a subspace of H1(K). This means that we have one degree of freedom per vertex node, p + Δp − 1 degrees of freedom per edge node, (p + Δp − 1)^2 degrees of freedom per face node and (p + Δp − 1)^3 degrees of freedom per interior node.
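As a consistency check on these counts: a hexahedral element has 8 vertex nodes, 12 edge nodes, 6 face nodes and 1 interior node, so a scalar H1 field of order p has in total

8 · 1 + 12 · (p − 1) + 6 · (p − 1)^2 + (p − 1)^3 = (p + 1)^3

degrees of freedom, which is the O(p^3) count used below; the enriched field of order p + Δp accordingly has (p + Δp + 1)^3 degrees of freedom.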
Summing up:
• G is the Gram matrix, and it is a block-diagonal matrix (one block per element).
• u ≈ Σ_i u_i e_i is approximated with polynomials of order p, which means that over the 3D element there are O(p^3) unknowns related to u, since they are defined over element vertices, edges, faces and interiors.
• t ≈ Σ_i t_i f_i is approximated with polynomials of order p, which means that over the 3D element there are O(p^2) unknowns related to t, since the fluxes are defined over the element boundary (vertices, edges and faces) only.
• Ψ ≈ Σ_i Ψ_i e_i is approximated with polynomials of order p + Δp, which means that over the 3D element there are O((p + Δp)^3) unknowns related to Ψ.
Thus, the Gram matrix has a square shape of size O((p + Δp)^3) × O((p + Δp)^3), the matrix B1 has a rectangular shape of size O((p + Δp)^3) × O(p^2), and the matrix B2 has a rectangular shape of size O((p + Δp)^3) × O(p^3). The cost of generation of the Gram matrix with Gaussian quadrature is of the order of O((p + Δp)^9), since each of the O((p + Δp)^6) entries is integrated over O((p + Δp)^3) quadrature points.
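As a concrete illustration, take the cubic elements used in our experiments, p = 3, and assume an enrichment of Δp = 2 (the value of Δp here is chosen only for the sake of the example). The enriched test space on a hexahedron then has (3 + 2 + 1)^3 = 216 basis functions, while the trial H1 field has (3 + 1)^3 = 64. The Gram matrix therefore has 216 × 216 ≈ 4.7 · 10^4 entries, and with a quadrature rule of roughly (p + Δp + 1)^3 = 216 Gauss points, its generation alone performs on the order of 216^3 ≈ 10^7 accumulation steps per element. This is why the element integration routines dominate the element-level cost and are worth parallelizing.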
In general, the generation of the DPG matrices involves nested loops, starting from the Gauss integration points, through the test basis functions, to the trial basis functions. The generation involves the Gram matrix and the so-called extended HH matrices.
      use omp_lib
c     loop over integration points
!$OMP PARALLEL DO
!$OMP& DEFAULT(SHARED)
!$OMP& PRIVATE(l, xi, wa, weight, k1, v1, dv1, k2, v2, dv2, k, gradH,
!$OMP& nrdofH, shapH, x, dxdxi, zfval, iflag, rjac, dxidx)
!$OMP& FIRSTPRIVATE(nrdofHH)
!$OMP& REDUCTION(+:BLOADH)
!$OMP& REDUCTION(+:AP)
!$OMP& REDUCTION(+:STIFFHH)
      do l=1,nint
c        prepare common data (e.g. integration points, weights)
Figure 1: Execution time of the parallel integration algorithm for a single DPG element, for an increasing number of cores. 3D hexahedral element with cubic polynomials.
Figure 2: Parallel efficiency of the parallel integration algorithm for a single DPG element. 3D hexahedral element with cubic polynomials.
c        first loop through enriched H1 test functions
         do k1=1,nrdofHH
c           (evaluation of the test function value v1 and its gradient dv1 goes here)
c           compute the RHS entry
            BLOADH(k1) = BLOADH(k1) + zfval*v1*weight
c           second loop through enriched H1 test functions (Gram matrix)
            do k2=1,nrdofHH
               dv2(1:3) = gradHH(1,k2)*dxidx(1,1:3)
     .                  + gradHH(2,k2)*dxidx(2,1:3)
     .                  + gradHH(3,k2)*dxidx(3,1:3)
Figure 3: Parallel speedup of the parallel integration algorithm for a single DPG element. 3D hexahedral element with cubic polynomials.
c              -- GRAM MATRIX -- (stored in triangular format)
c              determine index in triangular format
               k = nk(k1,k2)
               AP(k) = AP(k) + (dv1(1)*dv2(1)+dv1(2)*dv2(2)
     .                         +dv1(3)*dv2(3))*weight
            enddo
c           loop through H1 trial functions
            do k2=1,nrdofH
               v2 = shapH(k2)
               dv2(1:3) = gradH(1,k2)*dxidx(1,1:3)
     .                  + gradH(2,k2)*dxidx(2,1:3)
     .                  + gradH(3,k2)*dxidx(3,1:3)
c              Poisson equation
               STIFFHH(k1,k2) = STIFFHH(k1,k2)
     .                        + (dv1(1)*dv2(1)+dv1(2)*dv2(2)
     .                        +  dv1(3)*dv2(3))*weight
            enddo
         enddo
      enddo
!$OMP END PARALLEL DO
We have parallelized the two outer loops: over the Gauss integration points and over the rows (test functions).
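The essential pattern is that each thread accumulates its own private copy of the element arrays over a subset of the integration points, and OpenMP sums the copies when the parallel loop ends, through the REDUCTION clauses. The following self-contained sketch (with made-up sizes and dummy quadrature data, not taken from hp3d) shows the same pattern for a single Gram-like matrix:

   program reduction_sketch
      implicit none
      integer, parameter :: n = 64, nint = 216   ! illustrative sizes only
      real(8) :: gram(n,n), dv(3,n), w
      integer :: l, k1, k2
      gram = 0.d0
   !$omp parallel do default(shared) private(l, k1, k2, w, dv) &
   !$omp& reduction(+:gram)
      do l = 1, nint
         ! in the real code the quadrature weight and the shape function
         ! gradients at point l would be evaluated here
         w  = 1.d0 / nint
         dv = 1.d0
         do k1 = 1, n
            do k2 = 1, n
               gram(k1,k2) = gram(k1,k2) &
                  + (dv(1,k1)*dv(1,k2) + dv(2,k1)*dv(2,k2) &
                  +  dv(3,k1)*dv(3,k2)) * w
            end do
         end do
      end do
   !$omp end parallel do
      print *, 'gram(1,1) =', gram(1,1)
   end program reduction_sketch

Using REDUCTION on the accumulated arrays avoids both race conditions and the serialization that ATOMIC or CRITICAL updates would introduce; the price is one private copy of each reduced array per thread, which is acceptable for element-sized matrices.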
We summarize our paper with a parallel implementation of the DPG element matrix generator. We focused on the simple Poisson problem, with the pseudo-code described in the previous section. The numerical experiments have been performed for hexahedral DPG elements with cubic polynomials, integrated on a Linux cluster node equipped with 14 cores. We observe the parallel efficiency going from 70% on 8 cores, through 60% on 11 cores, down to 50% on 14 cores. The corresponding speedup reaches up to 6.5 on 10 cores (see Figures 1-3). All computations were performed on an 8 × 8 × 4 element mesh.
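For reference, if T_1 denotes the single-core execution time and T_c the time on c cores, the speedup is T_1/T_c and the parallel efficiency is the speedup divided by c; the reported speedup of 6.5 on 10 cores thus corresponds to a parallel efficiency of 65%.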
In this paper we presented the scalability of the parallel OpenMP integration of the DPG matrices. The parallel integrator has been obtained through OpenMP parallelization of the sequential 3D DPG code developed for the model Poisson problem by the group of Prof. Demkowicz. We observe a speedup of up to 6.5 on 10 cores. Future work will include a domain decomposition based parallelization of the DPG code, with hybrid parallelism including MPI on the level of particular elements and OpenMP on the level of matrices.
The work of MW was supported by the Dean's grant no. 15.11.230.270.
References
[1] Franco Brezzi, Marie-Odile Bristeau, Leopoldo P. Franca, Michel Mallet, and Gilbert Rogé. A relationship between stabilized finite element methods and the Galerkin method with bubble functions. Computer Methods in Applied Mechanics and Engineering, 96:117–129, 1992.
[2] Leszek Demkowicz and Jay Gopalakrishnan. An overview of the DPG method. In Recent Developments in Discontinuous Galerkin Finite Element Methods for Partial Differential Equations, IMA Volumes in Mathematics and its Applications, 157:149–180, 2014.
[3] Leszek Demkowicz and Jay Gopalakrishnan. A class of discontinuous Petrov-Galerkin methods. Part I: The transport equation. Computer Methods in Applied Mechanics and Engineering, 199:1558–1572, 2010.
[4] Leszek Demkowicz and Jay Gopalakrishnan. A class of discontinuous Petrov-Galerkin methods. Part II: Optimal test functions. Numerical Methods for Partial Differential Equations, 27:70–105, 2011.
[5] Leszek Demkowicz, Jay Gopalakrishnan, and Antti Niemi. A class of discontinuous Petrov-Galerkin methods. Part III: Adaptivity. 62:396–427, 2012.
[6] Tim Ellis, Leszek Demkowicz, and Jesse Chan. Locally conservative discontinuous Petrov-Galerkin finite elements for fluid problems. 68:1530–1549, 2014.
[7] Leopoldo P. Franca, Sergio L. Frey, and Thomas J.R. Hughes. Stabilized finite element methods: I. Application to the advective-diffusive model. Computer Methods in Applied Mechanics and Engineering, 95:253–276, 1992.
[8] Leopoldo P. Franca and Sérgio L. Frey. Stabilized finite element methods: II. The incompressible Navier-Stokes equations. Computer Methods in Applied Mechanics and Engineering, 99:209–233, 1992.
[9] Franco Brezzi. On the existence, uniqueness and approximation of saddle-point problems arising from Lagrange multipliers. ESAIM: Mathematical Modelling and Numerical Analysis - Modélisation Mathématique et Analyse Numérique, 8:129–151, 1974.
[10] F. Gruber, A. Klewinghaus, and O. Mula. The DUNE-DPG library for solving PDEs with discontinuous Petrov-Galerkin finite elements. http://arxiv.org/abs/1602.08338, 2016.
[11] Thomas J.R. Hughes, Guglielmo Scovazzi, and Tayfun Tezduyar. Stabilized methods for compressible flows. Journal of Scientific Computing, 43:343–368, 2010.
[12] Ivo Babuška. Error bounds for finite element method. Numerische Mathematik, 16:322–333, 1971.
[13] Leszek Demkowicz. Babuška ↔ Brezzi. Technical Report 0608, The University of Texas at Austin, 2006.
[14] Nathan Roberts. Camellia: A software framework for discontinuous Petrov-Galerkin methods. https://github.com/CamelliaDPG/Camellia, 2016.