Procedia Computer Science 101, 2016, Pages 8–17
YSC 2016: 5th International Young Scientist Conference on Computational Science
doi: 10.1016/j.procs.2016.11.003
© 2016 The Authors. Published by Elsevier B.V.
Peer-review under responsibility of the organizing committee of the 5th International Young Scientist Conference on Computational Science
Fast parallel integration for three dimensional Discontinuous Petrov Galerkin method

Maciej Woźniak¹, Marcin Łoś¹, Maciej Paszyński¹, and Leszek Demkowicz²

¹ AGH University of Science and Technology, Kraków, Poland
macwozni@agh.edu.pl, los@agh.edu.pl, paszynsk@agh.edu.pl
² The University of Texas at Austin, Austin, Texas, U.S.A.
leszek@ices.utexas.edu
Abstract
The Finite Element Method comes with the challenge of constructing test functions that provide better stability. The Discontinuous Petrov-Galerkin method constructs optimal test functions "on the fly". However, this method comes with a relatively high computational cost. In this paper we show a parallelization method that reduces the computation time.
Keywords: Finite Element Method, Discontinuous Petrov-Galerkin, parallel, shared memory
In this paper we present a parallelization of the algorithm for the generation of the element matrices for the Discontinuous Petrov-Galerkin (DPG) method [3, 4, 5]. The DPG method is a new, rapidly growing method for solving numerical problems. It enables automatic control of the stability of the numerical formulation. We have parallelized the element routines of the hp3d DPG code developed by the group of Prof. Demkowicz. We are aware of other parallel FEM packages supporting adaptive computations for DPG, including CAMELLIA [14] and DUNE-DPG [10]. However, the hp3d framework is unique in the following ways:
• It supports hexahedral, tetrahedral, prism and pyramid elements in 3D. To the best of our knowledge, CAMELLIA supports only hexahedral elements, and DUNE supports only triangular elements.
• It enables parallel anisotropic refinements over a computational domain discretized with different kinds of finite elements, including tetrahedra, hexahedra, prisms and pyramids. The CAMELLIA and DUNE packages do not allow for anisotropic refinements, and thus the exponential convergence of the numerical solution is not possible there.
• CAMELLIA and DUNE do not support complex H(curl) discretizations. Our framework will enable parallel automatic hp-adaptive computations for different classes of problems, including H1, H(div), and H(curl).
Our preliminary work presented in this paper concerns the parallelization of the element matrices for the elliptic problem. However, our future work will involve parallelization of the H(div) and H(curl) element routines.
Let us focus on the model elliptic problem. In the Sobolev space

H1(Ω) = {u ∈ L2(Ω) : D^α u ∈ L2(Ω), |α| ≤ 1, tr u = 0 on ∂Ω}    (1)

we introduce the classical weak formulation for the Poisson problem in H1(Ω). We seek u ∈ H1(Ω) such that

∫_Ω ∇u · ∇v dx = ∫_Ω f v dx    ∀ v ∈ H1(Ω)
We may also express the above problem in abstract notation:

b(u, v) = l(v)    ∀ v ∈ V    (2)

where in our model problem we have

b(u, v) = ∫_Ω ∇u · ∇v dx,    l(v) = ∫_Ω f v dx
We project the weak problem onto the finite dimensional subspace V_h ⊂ H1(Ω):

∫_Ω ∇u_h · ∇v_h dx = ∫_Ω f v_h dx    ∀ v_h ∈ V_h ⊂ H1(Ω)
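Writing u_h = Σ_j u_j e_j in terms of a basis {e_j} of V_h and testing with each basis function e_i, the projected problem is equivalent to the linear system

Σ_j ( ∫_Ω ∇e_j · ∇e_i dx ) u_j = ∫_Ω f e_i dx,    i = 1, …, dim V_h,

whose matrix is the standard stiffness matrix assembled element by element.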
The mathematical theory concerning the stability of the numerical method for the general weak formulation (2) is based on the famous "Babuška-Brezzi condition" (BBC), developed in the years 1971-1974 independently by Ivo Babuška and Franco Brezzi [13, 12, 9]. The condition states that the problem (2) is stable when

sup_{v ∈ V} |b(u, v)| / ‖v‖_V ≥ γ ‖u‖_U    ∀ u ∈ U    (7)

for some constant γ > 0.
However, the inf-sup condition in the above form concerns the abstract formulation, where we consider all test functions v ∈ V and look for the solution u ∈ U (e.g. U = V). The above condition is also satisfied if we restrict u to the space of trial functions u_h ∈ U_h:

sup_{v ∈ V} |b(u_h, v)| / ‖v‖_V ≥ γ ‖u_h‖_U
However, if we use test functions from the finite dimensional test space V_h = span{v_h} and consider

sup_{v_h ∈ V_h} |b(u_h, v_h)| / ‖v_h‖_{V_h},    (9)

we do not have a guarantee that the supremum (9) will be equal to the original supremum in (7), since we have restricted V to V_h. The optimality of the method depends on the quality
of the polynomial test functions defining the space V_h = span{v_h}, and on how far they are from the supremum realized in (7). Many scientists have devoted years to constructing test functions that provide better stability of the method for a given class of problems [11, 7, 8, 1], and several techniques for stabilizing different kinds of problems have been developed. In 2010 the DPG method was proposed; a modern summary of the method is given in [2]. The key idea of the DPG method is to construct the optimal test functions "on the fly", element by element. The DPG method automatically guarantees the numerical stability of difficult computational problems, thanks to the automatic selection of the optimal test functions.
The DPG method can be derived using one of the following three approaches [2, 6]:
a) Minimum residuum method
b) Petrov-Galerkin with optimal test functions
c) Special mixed methods
The minimum residuum method is good for illustrating the idea of the DPG method; however, it results in an optimal test functions problem that is as expensive as the original problem itself. The construction of the DPG method with the Petrov-Galerkin approach is relatively difficult, so we will only illustrate it here as a tool for making the optimization problem local over each finite element. The special mixed method formulation of the DPG method is the most suitable for efficient implementation, since it results in a modification of the classical finite element method. We will present this method in order to discuss the implementation issues.
Ad a) Let us start with the derivation of the DPG method using the minimum residuum method. For our weak problem (2) we construct the operator

B : U → V'

such that

⟨Bu, v⟩_{V'×V} = b(u, v)    ∀ u ∈ U, v ∈ V,

so we can reformulate the problem as Bu = l in V'. We wish to minimize the residual

u_h = argmin_{w_h ∈ U_h} (1/2) ‖B w_h − l‖_{V'}^2
We introduce the Riesz operator, being the isometric isomorphism

R_V : V → V',    ⟨R_V v, w⟩_{V'×V} = (v, w)_V,

which allows us to project the problem back to V. Since R_V is an isometry, the minimization is equivalent to

u_h = argmin_{w_h ∈ U_h} (1/2) ‖R_V^{-1}(B w_h − l)‖_V^2
The minimum is attained at u_h when the Gâteaux derivative is equal to 0 in all directions:

( R_V^{-1}(B u_h − l), R_V^{-1}(B δu_h) )_V = 0    ∀ δu_h ∈ U_h    (16)
Using again the definition of the Riesz map we get

⟨B u_h − l, R_V^{-1}(B δu_h)⟩ = 0    ∀ δu_h ∈ U_h,

which is equivalent to our original residuum problem

b(u_h, v_{δu_h}) = l(v_{δu_h})

with optimal test functions

v_{δu_h} = R_V^{-1} B δu_h    for each trial function δu_h.    (19)
In other words, with the help of the Riesz operator it is possible to construct the optimal test functions [2]. However, with the traditional weak formulation, the numerical solution of this optimization problem is as expensive as the solution of the original problem itself [2]. In order to make the test function optimization problem local over particular finite elements, we need to break the test spaces and reformulate the weak formulation.
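In the discrete setting, (19) turns into a local linear algebra problem. Denote by {φ_k} a basis of the discrete test space (in practice an enriched, broken space, as discussed below), by G the Gram matrix G_kl = (φ_k, φ_l)_V, and by B the rectangular matrix B_kj = b(e_j, φ_k) for the trial basis {e_j}. The coefficient vector g_j of the optimal test function associated with e_j then solves

G g_j = B(:, j),    j = 1, …, dim U_h,

i.e. one symmetric positive definite solve with the Gram matrix and one right-hand side per trial function. A minimal sketch of this local solve, assuming LAPACK is available (the subroutine and variable names below are illustrative and are not taken from hp3d):

   ! Sketch only: computes the coefficients of the optimal test functions
   ! by solving G T = B, where G is the Gram matrix (m x m) and B holds
   ! b(e_j, phi_k) column by column (m x n).
   subroutine optimal_test_functions(m, n, G, B, info)
      implicit none
      integer, intent(in)    :: m, n
      real(8), intent(inout) :: G(m,m)  ! overwritten by its Cholesky factor
      real(8), intent(inout) :: B(m,n)  ! on exit: optimal test function coefficients
      integer, intent(out)   :: info
      ! G is symmetric positive definite, so a Cholesky-based solver applies
      call dposv('U', m, n, G, m, B, m, info)
   end subroutine optimal_test_functions

Once the test space is broken (as described next), G becomes block-diagonal over the elements, so these solves remain local and comparatively cheap.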
Ad b) In other words, we switch now to the Petrov-Galerkin method, where we use the original trial functions and broken test functions. In order to allow for a local, element-wise solution of the test function optimization problem, we derive the weak formulation with broken test spaces. We introduce the space of broken test functions
H1(Ω_h) = {u ∈ L2(Ω) : u|_K ∈ H1(K) ∀ K ∈ T_h}    (20)

and we introduce the weak formulation with broken test functions. We seek u ∈ H1(Ω), as well as fluxes t ∈ tr_{Γ_h} H(div, Ω), such that

Σ_K ∫_K ∇u · ∇v dx + Σ_K ∫_{∂K} t v ds = ∫_Ω f v dx    for all v ∈ H1(Ω_h)
Everything is now computed element-wise and summed up. In particular, the boundary term can be rewritten as a sum over the faces of the elements,

Σ_K ∫_{∂K} t v ds = Σ_f ∫_f t [v] ds,

where f denotes the faces of the elements, n_f denotes the face normal vector, and [v] denotes the jump of v across the face f (for element faces located on the boundary of the domain we just take the normal component of v, while for element faces located inside the domain we consider the difference of the normal components v^+ and v^- resulting from the two elements sharing the common face). Over a
single element, we now have the following contributions: the element frontal matrices ∫_K ∇u · ∇v dx + ∫_{∂K} t v ds and the right-hand sides ∫_K f v dx. Notice that the first term comes from the integration inside the element, while the second term results from the integration over the boundary of the element. Again, we express the above weak formulation with broken test functions in the following abstract form:
b((u, t), v) = l(v)    ∀ v ∈ H1(Ω_h),

where

b((u, t), v) = Σ_K ∫_K ∇u · ∇v dx + Σ_K ∫_{∂K} t v ds,    l(v) = ∫_Ω f v dx.    (25)

When we compare the standard weak formulation and the formulation with broken test spaces, we can see that in the latter we have more unknowns, namely (u, t). For such a weak formulation with broken test functions, we can compute the optimal test functions automatically. This is the main idea of the DPG (Discontinuous Petrov-Galerkin) method: to construct, on the fly, element by element, the optimal test functions that ensure the numerical stability of the method. However, the implementation based on this Petrov-Galerkin formulation is relatively difficult, and we will rather switch to the formulation of the DPG method as a special mixed method.
Ad c) We will now focus on the formulation of the DPG method using the special mixed method, with the error representation function [6]. The error representation function, given by

Ψ = R_V^{-1}(B u_h − l),

allows us to develop an alternative formulation of the DPG method: Find Ψ ∈ V, u_h ∈ U_h, t ∈ tr_{Γ_h} H(div, K) such that

(Ψ, v)_V − b((u_h, t), v) = −l(v)    ∀ v ∈ V,
b((δu_h, δt), Ψ) = 0    ∀ (δu_h, δt) ∈ U_h.    (29)

Based on the above formulation, we can now construct the element matrices for this problem in the DPG method:

⎡  G     −B1   −B2 ⎤ ⎡ Ψ ⎤   ⎡ −l ⎤
⎢ B1^T    0      0 ⎥ ⎢ t ⎥ = ⎢  0 ⎥    (30)
⎣ B2^T    0      0 ⎦ ⎣ u ⎦   ⎣  0 ⎦
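Reading (30) row by row: the first block row states G Ψ = B1 t + B2 u − l, i.e. Ψ is the Riesz representation of the residual of the broken weak formulation, while the remaining block rows, B1^T Ψ = 0 and B2^T Ψ = 0, are the discrete form of the condition b((δu_h, δt), Ψ) = 0 in (29): the residual representation is orthogonal, in the sense of the bilinear form, to the whole trial space.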
The formulation of the DPG method as a special mixed method results in the structure of the element local matrices presented in equation (30) for the case of approximation in H1(K) spaces. Analogous formulations for H(div, K) and H(curl, K) approximations also result in a similar structure of the element local matrix, but with a different distribution of basis functions and degrees of freedom over element vertices, edges, faces and interiors. In the following estimates we assume three-dimensional hexahedral finite elements with vertex, edge, face and interior nodes. Notice that u ≈ Σ_i u_i e_i is approximated over the element with polynomials of order p from the space u_h ∈ U_h ⊂ H1(K). This means that we have one degree of freedom per vertex node, p − 1 degrees of freedom per edge node, (p − 1)^2 degrees of freedom per face node and (p − 1)^3 degrees of freedom per interior node. The traces t ≈ Σ_i t_i f_i are approximated with polynomials of order p from the space tr_{Γ_h} H(div, K), on the boundary of the elements only. This means that we have one degree of freedom per vertex node, p − 1 degrees of freedom per edge node, and (p − 1)^2 degrees of freedom per face node. The error representation function Ψ ≈ Σ_i Ψ_i e_i is approximated with polynomials of order p + Δp (from the enriched space), also forming a subspace of H1(K). This means that we have one degree of freedom per vertex node, p + Δp − 1 degrees of freedom per edge node, (p + Δp − 1)^2 degrees of freedom per face node and (p + Δp − 1)^3 degrees of freedom per interior node.
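As a consistency check on these counts: a hexahedral element has 8 vertex nodes, 12 edge nodes, 6 face nodes and 1 interior node, so a scalar H1 field of order p has in total

8 · 1 + 12 · (p − 1) + 6 · (p − 1)^2 + (p − 1)^3 = (p + 1)^3

degrees of freedom, which is the O(p^3) count used below; the enriched field of order p + Δp accordingly has (p + Δp + 1)^3 degrees of freedom.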
Summing up:
• G is the Gram matrix, and it is a block-diagonal matrix (one block per element).
• u ≈ Σ_i u_i e_i is approximated with polynomials of order p, which means that over the 3D element there are O(p^3) unknowns related to u, since they are defined over element vertices, edges, faces and interiors.
• t ≈ Σ_i t_i f_i is approximated with polynomials of order p, which means that over the 3D element there are O(p^2) unknowns related to t, since the fluxes are defined over the element boundary (vertices, edges and faces) only.
• Ψ ≈ Σ_i Ψ_i e_i is approximated with polynomials of order p + Δp, which means that over the 3D element there are O((p + Δp)^3) unknowns related to Ψ.
Thus, the Gram matrix has a square shape of size O((p + Δp)^3) × O((p + Δp)^3), the matrix B1 has a rectangular shape of size O((p + Δp)^3) × O(p^2), and the matrix B2 has a rectangular shape of size O((p + Δp)^3) × O(p^3). The cost of generation of the Gram matrix with Gaussian quadrature is of the order of O((p + Δp)^9), since each of the O((p + Δp)^6) entries is integrated over O((p + Δp)^3) quadrature points.
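As a concrete illustration, take the cubic elements used in our experiments, p = 3, and assume an enrichment of Δp = 2 (the value of Δp here is chosen only for the sake of the example). The enriched test space on a hexahedron then has (3 + 2 + 1)^3 = 216 basis functions, while the trial H1 field has (3 + 1)^3 = 64. The Gram matrix therefore has 216 × 216 ≈ 4.7 · 10^4 entries, and with a quadrature rule of roughly (p + Δp + 1)^3 = 216 Gauss points, its generation alone performs on the order of 216^3 ≈ 10^7 accumulation steps per element. This is why the element integration routines dominate the element-level cost and are worth parallelizing.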
In general, the generation of the DPG matrices involves nested loops, starting from the Gauss integration points, through the test basis functions, to the trial basis functions. The generation involves the Gram matrix and the so-called extended HH matrices.
      use omp_lib
c     loop over integration points
!$OMP PARALLEL DO
!$OMP& DEFAULT(SHARED)
!$OMP& PRIVATE(l, xi, wa, weight, k1, v1, dv1, k2, v2, dv2, k, gradH,
!$OMP& nrdofH, shapH, x, dxdxi, zfval, iflag, rjac, dxidx)
!$OMP& FIRSTPRIVATE(nrdofHH)
!$OMP& REDUCTION(+:BLOADH)
!$OMP& REDUCTION(+:AP)
!$OMP& REDUCTION(+:STIFFHH)
      do l=1,nint
c        prepare common data (e.g. integration points, weights)
Figure 1: Execution time of the parallel integration algorithm for a single DPG element, for an increasing number of cores. 3D hexahedral element with cubic polynomials.
Figure 2: Parallel efficiency of the parallel integration algorithm for a single DPG element. 3D hexahedral element with cubic polynomials.
c        first loop through enriched H1 test functions
         do k1=1,nrdofHH
c           (evaluation of the test function value v1 and its gradient dv1 goes here)
c           compute the RHS entry
            BLOADH(k1) = BLOADH(k1) + zfval*v1*weight
c           second loop through enriched H1 test functions (Gram matrix)
            do k2=1,nrdofHH
               dv2(1:3) = gradHH(1,k2)*dxidx(1,1:3)
     .                  + gradHH(2,k2)*dxidx(2,1:3)
     .                  + gradHH(3,k2)*dxidx(3,1:3)
Figure 3: Parallel speedup of the parallel integration algorithm for a single DPG element. 3D hexahedral element with cubic polynomials.
c              -- GRAM MATRIX -- (stored in triangular format)
c              determine index in triangular format
               k = nk(k1,k2)
               AP(k) = AP(k) + (dv1(1)*dv2(1)+dv1(2)*dv2(2)
     .                         +dv1(3)*dv2(3))*weight
            enddo
c           loop through H1 trial functions
            do k2=1,nrdofH
               v2 = shapH(k2)
               dv2(1:3) = gradH(1,k2)*dxidx(1,1:3)
     .                  + gradH(2,k2)*dxidx(2,1:3)
     .                  + gradH(3,k2)*dxidx(3,1:3)
c              Poisson equation
               STIFFHH(k1,k2) = STIFFHH(k1,k2)
     .                        + (dv1(1)*dv2(1)+dv1(2)*dv2(2)
     .                        +  dv1(3)*dv2(3))*weight
            enddo
         enddo
      enddo
!$OMP END PARALLEL DO
We have parallelized the two outer loops: over the Gauss integration points and over the rows (test functions).
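The essential pattern is that each thread accumulates its own private copy of the element arrays over a subset of the integration points, and OpenMP sums the copies when the parallel loop ends, through the REDUCTION clauses. The following self-contained sketch (with made-up sizes and dummy quadrature data, not taken from hp3d) shows the same pattern for a single Gram-like matrix:

   program reduction_sketch
      implicit none
      integer, parameter :: n = 64, nint = 216   ! illustrative sizes only
      real(8) :: gram(n,n), dv(3,n), w
      integer :: l, k1, k2
      gram = 0.d0
   !$omp parallel do default(shared) private(l, k1, k2, w, dv) &
   !$omp& reduction(+:gram)
      do l = 1, nint
         ! in the real code the quadrature weight and the shape function
         ! gradients at point l would be evaluated here
         w  = 1.d0 / nint
         dv = 1.d0
         do k1 = 1, n
            do k2 = 1, n
               gram(k1,k2) = gram(k1,k2) &
                  + (dv(1,k1)*dv(1,k2) + dv(2,k1)*dv(2,k2) &
                  +  dv(3,k1)*dv(3,k2)) * w
            end do
         end do
      end do
   !$omp end parallel do
      print *, 'gram(1,1) =', gram(1,1)
   end program reduction_sketch

Using REDUCTION on the accumulated arrays avoids both race conditions and the serialization that ATOMIC or CRITICAL updates would introduce; the price is one private copy of each reduced array per thread, which is acceptable for element-sized matrices.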
We summarize our paper with a parallel implementation of the DPG element matrix generator. We focused on the simple Poisson problem, with the pseudo-code described in the previous section. The numerical experiments have been performed for hexahedral DPG elements with cubic polynomials, integrated on a Linux cluster node equipped with 14 cores. We observe the parallel efficiency going from 70% on 8 cores, through 60% on 11 cores, down to 50% on 14 cores. The corresponding speedup reaches up to 6.5 on 10 cores (see Figures 1-3). All computations were performed on an 8 × 8 × 4 element mesh.
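For reference, if T_1 denotes the single-core execution time and T_c the time on c cores, the speedup is T_1/T_c and the parallel efficiency is the speedup divided by c; the reported speedup of 6.5 on 10 cores thus corresponds to a parallel efficiency of 65%.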
In this paper we presented the scalability of the parallel OpenMP integration of the DPG matrices. The parallel integrator has been obtained through OpenMP parallelization of the sequential 3D DPG code developed for the model Poisson problem by the group of Prof. Demkowicz. We observe a speedup of up to 6.5 on 10 cores. Future work will include a domain decomposition based parallelization of the DPG code, with hybrid parallelism including MPI on the level of particular elements and OpenMP on the level of matrices.
The work of MW was supported by the Dean's grant no. 15.11.230.270.
References
[1] Franco Brezzi, Marie-Odile Bristeau, Leopoldo P. Franca, Michel Mallet, and Gilbert Rogé. A relationship between stabilized finite element methods and the Galerkin method with bubble functions. Computer Methods in Applied Mechanics and Engineering, 96:117–129, 1992.
[2] Leszek Demkowicz and Jay Gopalakrishnan. An overview of the DPG method. In Recent Developments in Discontinuous Galerkin Finite Element Methods for Partial Differential Equations, IMA Volumes in Mathematics and its Applications, 157:149–180, 2014.
[3] Leszek Demkowicz and Jay Gopalakrishnan. A class of discontinuous Petrov-Galerkin methods. Part I: The transport equation. Computer Methods in Applied Mechanics and Engineering, 199:1558–1572, 2010.
[4] Leszek Demkowicz and Jay Gopalakrishnan. A class of discontinuous Petrov-Galerkin methods. Part II: Optimal test functions. Numerical Methods for Partial Differential Equations, 27:70–105, 2011.
[5] Leszek Demkowicz, Jay Gopalakrishnan, and Antti Niemi. A class of discontinuous Petrov-Galerkin methods. Part III: Adaptivity. 62:396–427, 2012.
[6] Tim Ellis, Leszek Demkowicz, and Jesse Chan. Locally conservative discontinuous Petrov-Galerkin finite elements for fluid problems. 68:1530–1549, 2014.
[7] Leopoldo P. Franca, Sergio L. Frey, and Thomas J.R. Hughes. Stabilized finite element methods: I. Application to the advective-diffusive model. Computer Methods in Applied Mechanics and Engineering, 95:253–276, 1992.
[8] Leopoldo P. Franca and Sérgio L. Frey. Stabilized finite element methods: II. The incompressible Navier-Stokes equations. Computer Methods in Applied Mechanics and Engineering, 99:209–233, 1992.
[9] Franco Brezzi. On the existence, uniqueness and approximation of saddle-point problems arising from Lagrange multipliers. ESAIM: Mathematical Modelling and Numerical Analysis - Modélisation Mathématique et Analyse Numérique, 8:129–151, 1974.
[10] F. Gruber, A. Klewinghaus, and O. Mula. The DUNE-DPG library for solving PDEs with discontinuous Petrov-Galerkin finite elements. http://arxiv.org/abs/1602.08338, 2016.
[11] Thomas J.R. Hughes, Guglielmo Scovazzi, and Tayfun Tezduyar. Stabilized methods for compressible flows. Journal of Scientific Computing, 43:343–368, 2010.
[12] Ivo Babuška. Error bounds for finite element method. Numerische Mathematik, 16:322–333, 1971.
[13] Leszek Demkowicz. Babuška ↔ Brezzi. Technical Report 0608, The University of Texas at Austin, 2006.
[14] Nathan Roberts. Camellia: A software framework for discontinuous Petrov-Galerkin methods. https://github.com/CamelliaDPG/Camellia, 2016.