Journal homepage: www.elsevier.com/locate/jocs
architectures
Christian T Jacobs∗, Satya P Jammy, Neil D Sandham
Article info
Keywords:
Abstract
Exascale computing will feature novel and potentially disruptive hardware architectures. Exploiting these to their full potential is non-trivial. Numerical modelling frameworks involving finite difference methods are currently limited by the 'static' nature of the hand-coded discretisation schemes and may repeatedly have to be re-written to run efficiently on new hardware. In contrast, OpenSBLI uses code generation to derive the model's code from a high-level specification. Users focus on the equations to solve, whilst not concerning themselves with the detailed implementation. Source-to-source translation is used to tailor the code and enable its execution on a variety of hardware.
© 2016 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
1 Introduction
High Performance Computing (HPC) systems and architectures are evolving rapidly. Traditional single processor-based CPU clusters are moving towards multi-core/multi-threaded CPUs. At the same time new architectures based on many-core processors such as graphics processing units (GPUs) and Intel's Xeon Phi are emerging as important systems and further developments are expected with energy-efficient designs from ARM and IBM. According to the IT industry, such advances are expected to deliver compute hardware capable of exascale performance (i.e. 10^18 floating-point operations per second) by 2018 [1]. Yet many frameworks aimed at computational/numerical modelling are currently not ready to exploit such new and potentially disruptive technologies.
Traditional approaches to numerical model development involve the production of static, hand-written code to perform the numerical discretisation and solution of the governing equations. Normally this is written in a language such as C or Fortran that is considerably less abstract when compared to a near-mathematical domain specific language. Explicitly inserting the necessary calls to MPI or OpenMP libraries enables the execution of the code on multi-core or multi-thread hardware. However, should a user wish to run the code on alternative platforms such as GPUs, they would likely need to re-write large sections of the code, including calls to new libraries such as CUDA or OpenCL, and optimise it for that particular hardware backend [2]. As HPC hardware evolves, an increasing burden faced by computational scientists becomes apparent; in order to keep up with trends in HPC, not only must a model developer be a domain specialist in their area of study, but also an expert in numerical algorithms, software engineering, and parallel computing paradigms [3,4].
One way to address this issue is to introduce a separation of concerns using high-level abstractions, such as domain specific languages (DSLs) and active libraries [4–8]. This paradigm shift allows a domain specialist to describe their problem as a high-level, near-mathematical specification. The task of taking this specification and transforming it into executable computer code can then be handled in the subsequent abstraction layer; unlike the traditional approach of hand-writing the C/Fortran code that discretises the governing equations, this layer generates the code automatically from the problem specification. Finally, the generated code can be readily targeted towards a specific hardware platform through source-to-source translation. Hence, domain specialists focus on the equations they wish to solve and the setup of their problem, whilst the parallel computing experts can introduce support for
http://dx.doi.org/10.1016/j.jocs.2016.11.001
have to undergo a fundamental re-write if the desired backend changes. Use of such strategies can have significant benefits for the productivity of both the user and developer, by removing the need to spend time re-writing code and/or the problem specification [5].
Given the motivation for the use of automated solution techniques, in this paper we present a new framework, OpenSBLI, for the automated derivation and parallel execution of finite difference-based models. This is an open-source release of the recent developments in the SBLI codebase developed at the University of Southampton, involving the replacement of SBLI's Fortran-based core with flexible Python-based code generation capabilities, and the coupling of SBLI to the OPS active library [9–12] which targets the generated code towards a particular backend using source-to-source translation. Currently, OpenSBLI can generate OPS-compliant C code to discretise and solve the governing equations, using arbitrary-order central finite difference schemes and a choice of either the forward Euler scheme or a third-order Runge-Kutta time-stepping scheme. OpenSBLI then uses OPS to produce code targeted towards different backends. It is worth noting that backend APIs such as OpenMP (version 4.0 and above) are also capable of running on CPU, GPU and Intel Xeon Phi architectures, for example. However, currently OPS has no support for OpenMP version 4.0 and above. Moreover, codes that are written by hand in OpenMP would still potentially need to be re-written if different algorithms or equations were to be considered. Thus, the benefits of code generation still play a crucial role here, regardless of which backend is chosen.
The application of SBLI has so far concentrated on problems in aeronautics and aeroacoustics, in particular looking at shock-boundary layer interactions (see e.g. [13–16] and the references therein for more details). While such applications entail solving the 3D compressible Navier-Stokes equations, in principle other equations expressible in Einstein notation and solved using finite differences are also supported by the new code generation functionality, highlighting another advantage of such a flexible approach to numerical model development. Note also that while OpenSBLI does not yet feature shock-capturing schemes and Large Eddy Simulation models (unlike the legacy SBLI code), these will be implemented in the future as part of the project's roadmap. The main purpose of this initial release is to present the algorithmic changes to legacy SBLI's core.
Details of the abstraction and design principles employed by OpenSBLI are given in Section 2. Section 3 details three verification and validation test cases that were used to check the correctness of the implementation. The paper finishes with some concluding remarks in Section 4.
2 Design
Legacy versions of SBLI comprise static hand-written Fortran code, parallelised with MPI, that implements a fourth-order central differencing scheme and a low-storage, third or fourth-order Runge-Kutta timestepping routine. It is capable of solving the compressible Navier-Stokes equations coupled with various turbulence parameterisations (e.g. Large Eddy Simulation models) and diagnostic routines. In contrast, OpenSBLI is written in Python, and by replacing the legacy core with modern code generation techniques, the existing functionality of SBLI is enriched with new flexibility; the compressible Navier-Stokes equations can still be solved in OpenSBLI for the sake of continuity, but the set of equations that can be readily solved essentially becomes a superset of that of the legacy code. Furthermore, the use of the OPS library allows the generated code to easily be targeted towards sequential, MPI, or an MPI+OpenMP hybrid backend (for CPU parallel execution), CUDA and OpenCL (for GPU parallel execution), and OpenACC (for parallel execution on accelerators), without the need to re-write the model code. OPS is readily extensible in terms of new backends, making the code generation technique an attractive way of future-proofing the codebase and preparing the framework for exascale-capable hardware when it arrives. The main achievement of OpenSBLI is the ability to express model equations at a high level with the help of the SymPy library [17], expanding the equations based on the index notation, and coupling this functionality with the generation of OPSC-based model code and also with the OPS library which performs code targeting. OpenSBLI's focus on the generation of computational kernels essentially forms a bridge between the high-level equations and the computational parallel loops ('par-loops') that iterate over the grid points to solve the governing equations.
For any given simulation that is to be performed with OpenSBLI, the problem (comprising the equations to be solved, the grid to solve them on, their associated boundary and initial conditions, etc.) must be defined in a setup file, which is nothing but a Python file which instantiates the various relevant components of the OpenSBLI framework. All components follow the principle of object-oriented design, and each class is explained in detail throughout the subsections that follow. An overview of the class relationships is also provided in Fig. 1.
2.1 Equation specification
In a similar fashion to other problem solving environments such as OpenFOAM [18], Firedrake [4], FEniCS [5,6], OPESCI-FD [19], Devito [20,21], deal.II [22] and FreeFEM++ [23], OpenSBLI comprises a high-level interface for specifying the differential equations that are to be solved. These equations (and any accompanying formulas for temperature-dependent viscosity, for example) can be expressed in Einstein notation, also known as index notation. The adoption of such an abstraction is advantageous since it removes the need for the user to expand the equations by hand, which can be an error-prone task. Furthermore, much like the Devito domain specific language (DSL) [20,21] for finite difference stencil compilation, OpenSBLI makes use of the SymPy symbolic algebra library that supplies the basic components required for the modelling functionality that has been implemented in the present work. This functionality includes the automatic expansion of indices based on their contraction structure, such that repeated indices are expanded into a sum about that index, and the implementation of various types of differential operator.
2.1.1 Expressing
Consider the conservation of mass equation

$$\frac{\partial \rho}{\partial t} + \frac{\partial}{\partial x_j}\left[\rho u_j\right] = 0, \qquad (1)$$

where u_j is the j-th component of the velocity vector u, ρ is the density field, and x_j is the coordinate field in the j-th dimension. In an OpenSBLI problem setup file, the user would specify this as a string, giving the left-hand side and right-hand side of the equation in the following format:

mass = "Eq(Der(rho, t), -Conservative(rho*u_j, x_j))"
The functions Der and Conservative here are OpenSBLI-specific derivative operators, each defined in their own class derived from SymPy's Function class. Other high-level interfaces such as OpenFOAM offer similar differential operators such as div and grad, for example [18]. General derivatives are represented using the Der operator, whereas the Conservative operator ensures that the derivative will not be expanded using the product rule. A skew-symmetric form of the derivative is also available using the Skew function, discussed later in Section 3.3. All of these are essentially 'handler'/placeholder objects that OpenSBLI uses for spatial/temporal discretisation after parsing and expanding the equations about the Einstein indices. Special functions such as the Kronecker delta function and the Levi-Civita symbol are also available, derived from SymPy's LeviCivita and KroneckerDelta classes in order to handle Einstein expansion; these too are expanded later by OpenSBLI.
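The idea of a placeholder operator derived from SymPy's Function class can be illustrated with a minimal sketch. The two classes below are hypothetical stand-ins written for this example and are not OpenSBLI's own implementations; they only show that a Function subclass without an eval method records its arguments and stays unevaluated until some later processing step acts on it.

```python
from sympy import Function, Symbol


class Der(Function):
    """Hypothetical placeholder for a general derivative."""


class Conservative(Function):
    """Hypothetical placeholder for a derivative kept in conservative form."""


rho, t = Symbol("rho"), Symbol("t")
u_j, x_j = Symbol("u_j"), Symbol("x_j")

# Neither class defines eval(), so both calls simply store their arguments.
time_term = Der(rho, t)
flux_term = Conservative(rho * u_j, x_j)
```

Because the objects are inert, a later stage is free to decide how each one is turned into a derivative, which is the role the text assigns to the expansion and discretisation steps.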
2.1.2 Parsing
Once all of the governing equations have been expressed by the user in string format, they are collected together in OpenSBLI's Problem class (see Appendix A). This class also accepts substitutions, formulas, and constants. For long equations, such optional substitutions (such as the definition of the stress tensor) can be written as a separate string (in the same way as the governing equations) to allow better equation readability, and then automatically substituted into the equations (such as the conservation of momentum and energy equations) at expansion-time instead of performing such error-prone manipulations by hand. The constitutive equations which define a relationship between the prognostic and non-prognostic variables are given as formulas, for example temperature-dependent viscosity relations, and an equation of state for pressure. The constants are the spatially and temporally independent variables which are represented as strings. Upon instantiation of the Problem class, the process is invoked to transform the equations into their final expanded form.

For each equation in string form, a new OpenSBLI Equation object is created. During its initialisation, SymPy's parse_expr function converts the equation string into a SymPy Eq data type. Any of the OpenSBLI derivative operators such as Der and Conservative (currently in string format) are replaced by actual instances of the Der and Conservative classes. Similarly, any substitutions given in the Problem are parsed and substituted directly into the expression using SymPy's xreplace function. All other terms in the parsed expression are represented by OpenSBLI's EinsteinTerm class, derived from SymPy's Symbol class, which contains its own methods and attributes for determining/expanding Einstein indices. For example, the class's initialisation method __init__ splits up the term u_j where there are underscore markers, and stores the Einstein index j in a list as a SymPy Idx object. The get_expanded method later replaces the alphabetical Einstein indices with actual numerical indices, replacing j with 0 and 1, in the 2D case. Finally, any constants in the Problem object are also represented as an EinsteinTerm object, but are flagged as constant terms in OpenSBLI, so that they are not spatially or temporally-dependent. The coordinate vector components x_j (and the time term t) are a special case of an EinsteinTerm; these are marked with an is_coordinate flag so that, during the expansion phase, the EinsteinTerms are made dependent on the coordinate field (and time, if appropriate) to ensure that differentiation is performed correctly.
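The parsing steps described above can be sketched with plain SymPy. This is an illustration under stated assumptions rather than OpenSBLI's code: the Der and Conservative classes are hypothetical placeholders, the helper names parse_equation and split_indices are invented for this example, and the substitution "Eq(m_j, rho*u_j)" is a made-up case chosen only to show xreplace at work.

```python
from sympy import Function, Idx
from sympy.parsing.sympy_parser import parse_expr


class Der(Function):
    """Hypothetical placeholder operator (not OpenSBLI's class)."""


class Conservative(Function):
    """Hypothetical placeholder operator (not OpenSBLI's class)."""


OPERATORS = {"Der": Der, "Conservative": Conservative}


def parse_equation(text, substitutions=()):
    """Parse an equation string, then apply substitutions written as 'Eq(lhs, rhs)'."""
    equation = parse_expr(text, local_dict=dict(OPERATORS))
    for sub_text in substitutions:
        sub = parse_expr(sub_text, local_dict=dict(OPERATORS))
        # xreplace performs an exact structural replacement of lhs by rhs.
        equation = equation.xreplace({sub.lhs: sub.rhs})
    return equation


def split_indices(name):
    """Split a term name such as 'u_j' at its underscores into (base, [Idx, ...])."""
    base, *labels = name.split("_")
    return base, [Idx(label) for label in labels]


mass = parse_equation("Eq(Der(rho, t), -Conservative(m_j, x_j))",
                      substitutions=["Eq(m_j, rho*u_j)"])
direct = parse_equation("Eq(Der(rho, t), -Conservative(rho*u_j, x_j))")
```

Here mass and direct are structurally identical, which is the point of the substitution mechanism: the long form never has to be typed by hand.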
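2.1.3 Expanding
After the parsing and substitution stage, the equations are expanded about repeated indices. Note that this process is performed by OpenSBLI, although various SymPy classes underpin the functionality. Following the example, (1) would be expanded as

$$\frac{\partial \rho}{\partial t} + \frac{\partial}{\partial x_0}\left[\rho u_0\right] + \frac{\partial}{\partial x_1}\left[\rho u_1\right] = 0. \qquad (2)$$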
OpenSBLI loops over each EinsteinTerm stored in the parsed Equation object, and maps it to a SymPy Indexed object. For example, the term u_k would first be mapped to u[k]. The index k in the term is then expanded over 0, ..., d−1 (where d is the dimension of the problem) by replacing it with each integer dimension, yielding a SymPy MutableDenseNDimArray array of size d (for a vector function, or d×d for a tensor of rank 2) of expanded variables which is stored as a class attribute. For example, expanding the vector u[k] yields the expansion array [u0, u1] in 2D. Upon expansion, the terms are also made spatially-dependent (i.e. indexed by x0, x1, x2 coordinates, depending on the dimension) and, if applicable, temporally-dependent (i.e. indexed also by t). The only exceptions to this are constants such as the Reynolds number Re. The expansion array from the previous example then becomes [u0[x0, x1, t], u1[x0, x1, t]] (and [x0, x1] for the constant coordinate field).

Each equation is expanded by locating any repeated indices and then summing over them as appropriate. For example, after mapping each EinsteinTerm (e.g. u_k) to an Indexed object (e.g. u[k]), the mass equation is represented internally as

Eq(Der(rho, t), -Conservative(rho*u[k], x[k]))

Since the index k is repeated, the expansion arrays are used to expand this expression to

Eq(Der(rho[x0,x1,t], t),
   -Conservative(rho[x0,x1,t]*u0[x0,x1,t], x0[x0,x1,t])
   -Conservative(rho[x0,x1,t]*u1[x0,x1,t], x1[x0,x1,t]))
Fig. 2. The regular grid of solution points upon which the governing equations
Finally, the Der and Conservative functions are applied, with the expression becoming

Eq(Derivative(rho[x0,x1,t], t),
   -Derivative(rho[x0,x1,t]*u0[x0,x1,t], x0)
   -Derivative(rho[x0,x1,t]*u1[x0,x1,t], x1))

which is equivalent to (2). Similar expansion can also be applied for any other equations involving e.g. diagnostic fields. Note how the calls to Der and Conservative have been replaced by calls to SymPy's Derivative class (which in turn uses SymPy's diff function); while it is SymPy that handles the differentiation, it is OpenSBLI that handles the exact formulation of the derivative (i.e. OpenSBLI has ensured that the derivative has not been expanded using the product rule here).
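The summation over a repeated index can be reproduced in a few lines of SymPy. The sketch below is not OpenSBLI's expansion machinery: it uses applied Function objects instead of Indexed objects to carry the dependence on the coordinates and time, and the function name expand_mass_equation is hypothetical. It does show the two properties the text relies on, namely one term per dimension and an unevaluated Derivative that keeps the conservative form.

```python
from sympy import Derivative, Eq, Function, symbols


def expand_mass_equation(ndim):
    """Expand d(rho)/dt = -d(rho*u_j)/dx_j by summing over j = 0, ..., ndim-1."""
    t = symbols("t")
    coords = symbols(f"x0:{ndim}")                 # (x0, x1, ...)
    rho = Function("rho")(*coords, t)              # rho(x0, x1, ..., t)
    u = [Function(f"u{j}")(*coords, t) for j in range(ndim)]
    # Derivative(...) is left unevaluated, so the product rule is not applied.
    rhs = -sum(Derivative(rho * u[j], coords[j]) for j in range(ndim))
    return Eq(Derivative(rho, t), rhs)


mass_2d = expand_mass_equation(2)
```

Calling doit() on the right-hand side would apply the product rule, which is exactly what the Conservative operator is described as preventing.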
Any nested derivatives are also handled here. It is not currently possible to specify, for example, diff(diff(u_j, x_i), x_j) using SymPy's diff function directly because the fact that u_j is dependent on x_i and x_j is not taken into account. In contrast, the use of Der and EinsteinTerms like u_j in OpenSBLI allows the derivative to be computed correctly since the terms are made dependent through the use of Indexed objects as previously described. OpenSBLI users must instead use the Der function Der(Der(u_j, x_i), x_j). For each nested derivative (or nested function in general), the inner function is evaluated first along with all other non-nested functions. Only then is the outer function applied.
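The difficulty with calling diff directly on bare symbols can be seen in a short SymPy experiment. In this sketch the dependence of u_j on the coordinates is supplied through an applied Function, which is an assumption made for illustration; the text states that OpenSBLI achieves the same effect through Indexed objects.

```python
from sympy import Derivative, Function, Symbol, diff

x_i, x_j = Symbol("x_i"), Symbol("x_j")

# A bare Symbol carries no dependence on x_i or x_j, so the result is zero.
naive = diff(diff(Symbol("u_j"), x_i), x_j)

# Making the dependence explicit keeps the nested derivative alive.
u_j = Function("u_j")(x_i, x_j)
nested = diff(diff(u_j, x_i), x_j)
```

The first result collapses to zero, whereas the second remains a mixed second derivative that can later be replaced by a finite difference stencil.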
For the purposes of debugging, OpenSBLI includes a LatexWriter class that takes the expanded equations as input and writes them out in LaTeX format so developers can more easily spot errors, for example where indices have been expanded incorrectly.
2.2 Grid
The governing equations are discretised on a regular grid of solution points that span the domain of interest; an example is provided in Fig. 2. All grid-related functionality is handled by the Grid class, which must be instantiated by the user in the problem setup file. The dimensionality of the problem d, the number of points in each dimension, and the grid spacing must all be supplied. A problem of dimension d would generate a grid of $N_{x_0} \times \cdots \times N_{x_{d-1}}$ solution points in total, where $N_{x_i}$ represents the user-defined number of grid points in direction $x_i$.

For the sake of looping over each solution point and computing the necessary derivatives via the finite difference method, each (non-constant) term is processed further by OpenSBLI; the index of each spatial coordinate (e.g. x0) is mapped onto an index over the grid points in that spatial direction (e.g. i0) which will iterate from 0 to $N_{x_i} - 1$ (for a given direction $x_i$) when the computational kernel is eventually generated.

In addition to the solution points within the physical domain, a set of halo points (or 'ghost' points), which border the outer-most grid points, are also created automatically depending on the boundary conditions and the spatial order of accuracy. These halo points are necessary to ensure that the derivatives near the boundary can be computed with the same stencil as the 'inner' points. The exact number of halo points required therefore depends on the number of stencil points; for example, in Fig. 2 the stencil for a second-order central difference (using 3 points in each direction) would require one halo point at each end of the domain. The values that these halo points hold depend on the type of boundary condition applied, and this is discussed in more detail in Section 2.6.
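The relationship between stencil width and halo depth can be written down as a small sketch. It assumes the standard result that a central difference of even order n spans n + 1 points, so that n/2 halo points are needed on each side; the function names are hypothetical and the way OpenSBLI itself computes halo depths is not given in the text.

```python
def central_stencil_offsets(order):
    """Grid offsets used by a central difference of the given (even) order."""
    if order < 2 or order % 2:
        raise ValueError("central differences need an even order >= 2")
    half_width = order // 2
    return list(range(-half_width, half_width + 1))


def halo_points(order):
    """Halo points needed at each end so boundary points use the same stencil."""
    return max(abs(offset) for offset in central_stencil_offsets(order))


def padded_shape(shape, order):
    """Shape of a work array once halos are added at both ends of every direction."""
    halo = halo_points(order)
    return tuple(n + 2 * halo for n in shape)
```

For the second-order case in the text this gives the offsets -1, 0, 1 and a single halo point at each end, so a 4 × 5 grid would be stored in a 6 × 7 array.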
Every field/term in the governing equations that is represented by the grid indices holds a so-called 'work array' which essentially contains the field's numerical value at each of the grid points, including the halos. The implementation of initial and boundary conditions is done by accessing and modifying this work array, as will be described in Sections 2.5 and 2.6.
2.3 Computational kernels
The Kernel class defines a sequence of computational steps that should be performed to solve the governing equations. For instance, one kernel may be created to compute the spatial derivative of a field, while another kernel handles the initialisation of the field values based on a given initial condition, and another handles the enforcement of boundary conditions that involve computations. During the instantiation of a kernel, the relevant variables and fields are classified as inputs, outputs and input/outputs (i.e. both an input and an output), and the kernel's range of evaluation (i.e. the range of grid indices over which the kernel is applied) is determined. This helps to minimise data transfer, since only those variables/fields required to perform the computation are passed to the generated kernel code.
2.4 Discretisation schemes
Once a grid is created, the equations are discretised upon that grid. For spatial discretisation purposes OpenSBLI offers a central differencing scheme for first and second-order derivatives; all the stencil coefficients are computed using SymPy, which allows stencils of an arbitrary order of accuracy to be created. For temporal discretisation purposes, OpenSBLI features the (first-order) forward Euler scheme as well as the same low-storage, third-order Runge-Kutta timestepping scheme [24] present in the legacy SBLI code.
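One way to obtain central stencil coefficients of arbitrary order with SymPy is its finite_diff_weights function, sketched below. Whether OpenSBLI uses this particular routine is not stated in the text, so this is only an assumption for illustration; the helper name central_weights is hypothetical, and the weights are for unit grid spacing (divide by the spacing raised to the derivative order in practice).

```python
from sympy import Integer, finite_diff_weights


def central_weights(derivative, order):
    """Central stencil weights for a first or second derivative on a unit grid."""
    if derivative not in (1, 2) or order < 2 or order % 2:
        raise ValueError("expects derivative 1 or 2 and an even order >= 2")
    half_width = order // 2
    points = [Integer(i) for i in range(-half_width, half_width + 1)]
    # The last entry for the requested derivative uses all of the points.
    return finite_diff_weights(derivative, points, 0)[derivative][-1]


second_order_first_derivative = central_weights(1, 2)
fourth_order_first_derivative = central_weights(1, 4)
second_order_second_derivative = central_weights(2, 2)
```

Because the weights are exact rationals, they can be printed into generated code without any rounding until the final floating-point evaluation.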
To use a particular scheme, one should instantiate a discretisation scheme derived from the generic base class called Scheme, which essentially stores the finite difference stencil coefficients or the weights used in a particular time-stepping scheme. Spatial and temporal schemes should be instantiated separately.

For the purpose of spatial discretisation, handled by the OpenSBLI SpatialDiscretisation class, an Evaluations object is created for each of the formulas, and the derivatives in the equations. Each object automatically finds and stores the
dependencies A and B). Once all the Evaluations have been created, they are sorted with respect to their dependencies being evaluated (e.g. if B depends on A, then A should be evaluated first). The next step involves defining the range of grid point indices over which each evaluation should be performed, and also assigning a temporary work array for each evaluation. All of the evaluations are then described by a Kernel object (see Section 2.3). It is here, while creating the kernels, that the (continuous) spatial derivatives are automatically replaced by their discrete counterparts. It should be noted that, for the evaluation of formulas, these kernels are fused together if they have no inter-dependencies to avoid race conditions when running on threaded architectures. Finally, to evaluate the residual for the purposes of temporal discretisation, the derivatives in the expanded equations (represented by an Evaluations object) are substituted by their temporary work arrays, and a Kernel is created for evaluating the residual of each equation.
The temporal discretisation, handled by the TemporalDiscretisation class, involves applying the various stages of the time-stepping scheme supplied using the residuals computed by the spatial discretisation process. Similarly, a Kernel object is created for the evaluations in the time-stepping scheme.
2.5 Initial conditions
In order for the prognostic fields to be advanced forward in time, initial conditions can be applied using the GridBasedInitialisation class. This is accomplished in much the same way as specifying equations, but involves assignment of grid variables and work arrays of grid point values. For example, in the simulation setup file the x0 coordinate can be defined using the grid point index and Δx0:

x0 = "Eq(grid.grid_variable(x0), grid.Idx[0]*grid.deltas[0])",

which in turn defines the initial value for each prognostic variable, by assigning this to the array of values at each grid point (also known as the variable's work array), e.g.:

rho = "Eq(grid.work_array(rho), 2.0*sin(x0))"
2.6 Boundary conditions
OpenSBLI currently comprises two types of boundary condition, implemented in the classes PeriodicBoundaryCondition and SymmetryBoundaryCondition. Users may apply different boundary conditions in different directions if they so wish. Periodic boundaries are defined such that, for each prognostic field φ, φ(x_0) = φ(x_N) where N is the number of points in the domain. This condition is achieved via the exchange of halo point data at each end of the domain. Symmetry boundary conditions enforce the condition that φ(x_N) = φ(x_{N−1}) for scalar fields and φ_i(x_N) = −φ_i(x_{N−1}) for vector fields (in the direction i), which is achieved using a computational kernel.
2.7 Input and output
The state of the prognostic fields can be written to disk every n iterations as defined by the user, or only at the end of the simulation. This functionality is handled by the FileIO class. OpenSBLI adopts the HDF5 format [25,26] as it features parallel read/write capabilities and therefore has the potential to overcome the serial input/output bottleneck currently plaguing many large-scale parallel applications [27,28]. Future releases of OpenSBLI will come with the ability to read in mesh files and the state fields from an HDF5 file, enabling the restarting of simulations from 'checkpoints' as well as the assignment of initial conditions that cannot be simply defined by a formula.
2.8 Code generation
OpenSBLI currently generates code in the OPSC language which performs the simulation; this is essentially standard C++ code that includes calls to the OPS library. Such functionality is accomplished using the OpenSBLI OPSCCodePrinter class (derived from SymPy's CCodePrinter class, used to perform the generation of OPSC code statements) and the OPSC class (which agglomerates the literal strings of OPSC statements and kernel functions and writes them to file). The generated code's structure follows a generic template that maps out the order in which the simulation steps/computations are to be called. The template is represented as a multi-line Python string template, with each line containing a place-holder for the code that performs a particular step. Examples include $header which is replaced by any generic boilerplate header code (e.g. #include <stdlib.h> and kernel function prototypes), $initialisation which is replaced by the grid and field setup (e.g. by declaring an OPS block using the ops_decl_block function), and $bc_calls which is replaced by calls to the boundary condition kernel(s). This template can be readily changed to incorporate additional functionality, such as the inclusion of turbulence models. Once all component place-holders have been replaced by OPSC code, the code is written out to disk. For the case of the OPSC language, two files are written; one is a C++ header file containing the computational kernels, and the other is the C++ source file containing various constant definitions (e.g. the timestep size deltat, and the constants of the Butcher tableau for the time-stepping scheme), OPS data structures, and calls to the kernels specified in the header file.
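The template mechanism can be illustrated with Python's standard string.Template class. The template text and the replacement strings below are hypothetical and far shorter than anything OpenSBLI would generate; only the three place-holder names are taken from the description above, and the C comments stand in for real OPS calls, whose exact arguments are not given in the text.

```python
from string import Template

# Hypothetical, heavily shortened template; the real one has many more steps.
CODE_TEMPLATE = Template("""$header
int main(int argc, char **argv)
{
    $initialisation
    $bc_calls
    return 0;
}
""")


def render(components):
    """Fill every place-holder; raises KeyError if a component is missing."""
    return CODE_TEMPLATE.substitute(components)


code = render({
    "header": "#include <stdlib.h>",
    "initialisation": "/* grid and field setup goes here */",
    "bc_calls": "/* boundary condition kernel calls go here */",
})
```

Using substitute rather than safe_substitute means a forgotten component fails loudly instead of leaving a stray place-holder in the generated source.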
OpenSBLI's local Python objects (most pertinently, the kernel objects that describe the computations to be performed on the grid) are essentially translated to OPSC data structures and function calls during the preparation of the code. For instance, when declaring computational stencils that define a particular central differencing scheme, the local grid indices stored in the Central scheme object are used to write out an ops_stencil definition during code generation. Similarly, ops_halo structures and calls to ops_halo_transfer are produced to facilitate the implementation of the periodic boundary conditions. All fields are declared as ops_dat datasets; for an example of where these are used, see the function ops_argument_call in the file opsc.py which generates/accumulates calls to the OPS function ops_arg_dat through the use of 'printf'-style string formatting, filling in the 'placeholder' arguments (e.g. %s in Python) with values from the local OpenSBLI objects. Finally, calls to OpenSBLI Kernel objects are represented in OPSC as regular C++ functions (see Fig. 3) which are passed to the ops_par_loop function (see Fig. 4), which executes the function efficiently over the range of grid points within the desired block; OpenSBLI is currently a single-block code so only one block, containing all the grid points, is used. Further details on the OPS data structures and functionality can be found in the work by Reguly et al. [10].

Some optimisations are performed during the code generation stage by OpenSBLI to avoid unnecessary and expensive division operations in the kernels; rational numbers (e.g. finite difference stencil weights that are rational) and constant EinsteinTerms raised to negative powers (e.g. Re^{-1}) are evaluated and stored (e.g. by over-riding the _print_Rational method in the OPSCCodePrinter class).
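The effect of this optimisation can be imitated at the expression level with SymPy. The sketch below does not override a code printer as the text describes; instead it uses xreplace to swap each non-integer rational and each negative power of a constant for a named symbol, and returns the values that would be computed once and stored. The function name and the rc/rinv naming scheme are hypothetical, and for simplicity the sketch treats every non-integer Rational in the expression alike, including any that appear as exponents.

```python
from sympy import Pow, Rational, Symbol


def hoist_divisions(expr, constants):
    """Replace rationals and negative powers of constants by named symbols.

    Returns the rewritten expression and a dict mapping each new symbol to the
    value that would be evaluated once and stored.
    """
    replacements = {}
    rationals = sorted((n for n in expr.atoms(Rational) if n.q != 1), key=str)
    for index, number in enumerate(rationals):
        replacements[number] = Symbol(f"rc{index}")
    powers = sorted((p for p in expr.atoms(Pow)
                     if p.base in constants and p.exp.is_negative), key=str)
    for index, power in enumerate(powers):
        replacements[power] = Symbol(f"rinv{index}")
    stored = {symbol: value for value, symbol in replacements.items()}
    return expr.xreplace(replacements), stored


Re, u = Symbol("Re"), Symbol("u")
original = Rational(1, 2) * u / Re
kernel_expr, stored = hoist_divisions(original, constants={Re})
```

The rewritten expression contains only multiplications, so a kernel that evaluates it at every grid point no longer performs a division per point.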
Once the code generation process is complete, the OPS library is called to target the code towards various backends. These include the sequential code, MPI and hybrid MPI+OpenMP parallelised versions of the code for CPUs, CUDA and OpenCL versions of the code for GPUs, and an OpenACC version for accelerators. The test cases presented in this paper (see Section 3) consider the sequential, MPI, and CUDA backends. Targeting 'hand-written'/manually-generated model code towards a particular architecture is something that is well-known as a time-consuming, error-prone and often unsustainable activity; often numerical models have to be completely re-written, involving many if-else statements and #ifdef-style pragmas to ensure that the correct branch of the code is followed for a given backend. As the number of backends grows, the code becomes unsustainable. In contrast, with the abstraction introduced here through code generation, support for a new backend only needs to be added to the OPS library; the top-level, abstract definition of the equations and their implementation need not be modified due to the separation of concerns, thereby highlighting one of the key advantages of automated model development.

Fig. 3. Code snippet showing two kernels from a 2D 'method of manufactured solutions' (MMS) simulation (see Section 3.2) using second-order central differences. The first
When comparing the number of lines and the complexity of the code that gets generated by OpenSBLI, another advantage of automated model development becomes clear; in the case of the 3D Taylor-Green vortex test case, the problem specification file containing ∼100 lines generates OPSC code that is approximately 1500 lines long (excluding blank lines and comments). As more parameterisations (e.g. Large Eddy Simulation turbulence models) and diagnostic field computations are added, it is expected that this number would grow even further relative to the number of lines required in the setup file.
3 Verification and Validation
In order to verify the correctness of OpenSBLI and be confident in the ability of the solution algorithms to accurately represent the underlying physics, three representative test cases covering 1, 2 and 3 dimensions were created and are presented here.
3.1 Propagation of a wave
This 1D test case considers the first-order wave equation, given by

$$\frac{\partial \phi}{\partial t} + c\frac{\partial \phi}{\partial x} = 0, \qquad (3)$$

where φ is the quantity that is transported at constant speed c. The expected behaviour is that an arbitrary initial profile at time t = 0 is displaced by a distance d_t = ct, such that φ(x, t = 0) = φ(x = d_T, t = T) for some finish time T. The constant c was set to 0.5 m s^-1 in this case, and the equation was solved on the line 0 ≤ x ≤ 1 m. Eighth-order central differencing was used to discretise the domain in space in conjunction with a third-order Runge-Kutta scheme for temporal discretisation. The grid spacing Δx was set to 0.001 m, and the time step size Δt was set to 4×10^-4 s, yielding a Courant number of 0.2. A smooth, periodic initial condition φ(x, t = 0) = sin(2πx) was used, and periodic boundary conditions were enforced at both ends of the domain.

The simulation was run in serial (on an Intel® Core™ i7-4790 CPU) until a finish time of t = 1 s. The initial and final states of the solution field are shown in Fig. 5. As desired, the error in the solution is very small at O(10^-10), and provides some confidence in the implementation of the solution method and the periodic boundary conditions.
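The quoted Courant number and the expected final state can be checked with a few lines of arithmetic. The sketch below uses the parameter values stated above and the standard exact solution of the constant-speed wave equation, φ(x, t) = φ(x − ct, 0); the function names are hypothetical.

```python
import math


def courant_number(speed, dt, dx):
    """Courant number c * dt / dx for a 1D advection problem."""
    return speed * dt / dx


def exact_solution(x, t, speed=0.5):
    """Initial profile sin(2*pi*x) transported a distance speed*t to the right."""
    return math.sin(2.0 * math.pi * (x - speed * t))


cfl = courant_number(speed=0.5, dt=4e-4, dx=0.001)
displacement = 0.5 * 1.0            # c * T for the finish time T = 1 s
```

With c = 0.5 m s^-1 and T = 1 s the profile moves half a wavelength, so the exact final state is the negative of the initial sine wave, which gives a simple reference against which a numerical error can be measured.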
Fig. 5. Results from the 1D wave propagation simulation. Left: The solution field at time t = 0 s and t = 1 s. Right: The error between the analytical solution and the numerical
3.2 Method of manufactured solutions
The method of manufactured solutions (MMS) is a rigorous way to check the correctness of a numerical method's implementation [29–31]. The overall algorithm involves constructing a manufactured solution φ_m for the prognostic variable(s) and substituting this into the governing equation. Since the manufactured solution will not, in general, be the exact solution to the equation, a non-zero residual term will be present. This residual term is then subtracted from the RHS such that the manufactured solution essentially becomes the exact/analytical solution of the modified equation (i.e. the one with the source term). A suite of simulations can then be performed using increasingly fine grids to check that the numerical solution converges to the manufactured solution at the expected rate determined by the discretisation scheme.
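For this test, the 2D advection-diffusion equation (with a source term S) given by

$$\frac{\partial \phi}{\partial t} + \frac{\partial}{\partial x_j}\left[\phi u_j - k\frac{\partial \phi}{\partial x_j}\right] = S \qquad (4)$$

is considered.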
The constant k is the diffusivity coefficient which is set to 0.75 m^2 s^-1 here. The prescribed field u_i is the i-th velocity component, with u_0 = 1.0 m s^-1 and u_1 = −0.5 m s^-1. The prognostic field φ is to be determined and has an initial condition of φ(x, t = 0) = 0. In a similar fashion to the works of [29–31], the manufactured/'analytical' solution φ_m = sin(x_0)cos(x_1) employs a mixture of sine and cosine functions since these are continuous and infinitely differentiable. The SAGE framework [32] was used to symbolically determine the residual/source term S.
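The source term can also be derived symbolically with SymPy, as sketched below. This swaps in SymPy for the SAGE framework used by the authors and is only an illustration: it assumes the manufactured solution is independent of time, so that the time derivative vanishes and S equals the spatial part of the operator applied to φ_m. The function name source_term is hypothetical.

```python
from sympy import Integer, Rational, cos, diff, sin, symbols

x0, x1 = symbols("x0 x1")
k = Rational(3, 4)                       # diffusivity, 0.75
u = (Integer(1), Rational(-1, 2))        # prescribed velocity components
phi_m = sin(x0) * cos(x1)                # manufactured solution


def source_term(phi, velocity, diffusivity, coords):
    """S = d/dx_j [phi*u_j - k*d(phi)/dx_j], summed over j, for steady phi."""
    return sum(diff(phi * u_j - diffusivity * diff(phi, x_j), x_j)
               for u_j, x_j in zip(velocity, coords))


S = source_term(phi_m, u, k, (x0, x1))
```

Working by hand, the two advective terms give cos(x0)cos(x1) + sin(x0)sin(x1)/2 and the two diffusive terms together give 3 sin(x0)cos(x1)/2, which is what the symbolic result reduces to.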
The domain is a 2D square with dimensions 0 ≤ x_0 ≤ 2π m and 0 ≤ x_1 ≤ 2π m such that the manufactured solution is periodic. Furthermore, periodic boundary conditions are applied on all sides of the domain. Six central differencing schemes of order 2, 4, 6, 8, 10 and 12 are considered for the spatial discretisation, and a third-order Runge-Kutta scheme is used throughout to advance the equation in time. To perform the convergence analysis, the grid spacing was halved for each successive case such that Δx = Δy = π/2, π/4, π/8, π/16 and π/32. The time step size Δt was also halved for each case to maintain a maximum bound of 0.025 on the Courant number; this was purposefully kept small and near-constant to minimise the influence of temporal discretisation error [33]. All simulations were run in serial (on an Intel® Core™ i7-4790 CPU) until a finish time of T = 100 s to ensure that a steady-state solution was attained.

Fig. 7. The absolute error (in the L2 norm) between the numerical solution and
Fig. 6 demonstrates how φ converges towards the manufactured solution φ_m as the grid is refined. The convergence rate for each order of the central difference scheme is illustrated in Fig. 7. The anomaly in the twelfth-order convergence plot was likely caused by reaching the limit of machine precision. Overall, these results provide confidence in the correctness of the automatically-generated code/model.
3.3 3D Taylor-Green vortex
The Taylor-Green vortex is a well-known hydrodynamic problem [34–36] characterised by transition to turbulence, decay of turbulence, and the energy dissipation during its evolution. It is frequently used to evaluate the ability of a numerical method to capture the underlying physical processes. During the initial stages of evolution, the dynamics display structural changes (rolling up, stretching and interaction of the vortices). This process is inviscid in nature. Later the vortices break down and transition into fully-turbulent dynamics. As there are no external forces or turbulence-generating mechanisms, the small-scale structures dissipate all the energy, and the fluid eventually comes to rest [34]. The numerical method employed should be able to capture each of these stages accurately.
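The 3D compressible Navier-Stokes equations were solved in non-dimensional form, written in Einstein notation as

\[ \frac{\partial \rho}{\partial t} + \frac{\partial}{\partial x_j}\left[\rho u_j\right] = 0, \tag{5} \]

\[ \frac{\partial \rho u_i}{\partial t} + \frac{\partial}{\partial x_j}\left[\rho u_i u_j + p\,\delta_{ij} - \tau_{ij}\right] = 0, \tag{6} \]

and

\[ \frac{\partial \rho E}{\partial t} + \frac{\partial}{\partial x_j}\left[\rho E u_j + u_j p - q_j - u_i \tau_{ij}\right] = 0 \tag{7} \]

for the conservation of mass, momentum and energy, respectively.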
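The (dimensionless) quantity $\rho$ is the fluid density, $u_i$ is the $i$-th (scalar) component of the velocity vector $\mathbf{u}$, $p$ is the pressure field, and $E$ is the total energy. The components of the stress tensor $\tau_{ij}$ are given by

\[ \tau_{ij} = \frac{1}{Re}\left(\frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i} - \frac{2}{3}\delta_{ij}\frac{\partial u_k}{\partial x_k}\right), \tag{8} \]

where $\delta_{ij}$ is the Kronecker delta function and $Re$ is the Reynolds number. The components of the heat flux term $q$ are given by

\[ q_j = \frac{1}{(\gamma - 1) M^2 Pr\, Re}\frac{\partial T}{\partial x_j}, \tag{9} \]

where $T$ is the temperature field, $\gamma$ is the ratio of specific heats, $M$ is the Mach number, and $Pr$ is the Prandtl number. The various quantities are non-dimensionalised using the reference velocity $u_{ref}$, the reference length $L$, the reference density $\rho_{ref}$, and the reference temperature $T_{ref}$.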
The equation of state linking $p$, $\rho$ and $T$ is defined by

\[ p = \frac{1}{\gamma M^2}\rho T, \tag{10} \]

and the total energy is given by

\[ \rho E = \frac{p}{\gamma - 1} + \frac{1}{2}\rho u_j u_j. \tag{11} \]

The pressure $p$ is non-dimensionalised by $\rho_{ref} u_{ref}^2$.
Central finite difference schemes are non-dissipative and are therefore suitable for accurately capturing turbulent dynamics. However, the lack of dissipation can make the scheme unstable. To improve the stability, a skew-symmetric formulation [37–40] was applied to the convective terms in (5)–(7); the convective term then becomes
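\[ \frac{\partial}{\partial x_j}\left[\rho \phi u_j\right] = \frac{1}{2}\left[\frac{\partial}{\partial x_j}\left(\rho \phi u_j\right) + \rho u_j \frac{\partial \phi}{\partial x_j} + \phi \frac{\partial}{\partial x_j}\left(\rho u_j\right)\right], \tag{12} \]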
where $\phi$ should be set to 1, $u_i$ and $E$ for the continuity, momentum and energy equations, respectively. It should also be noted that both the convective and viscous terms are discretised using the same spatial order. In all of the simulations performed, the Laplacian in the viscous term is expanded using a finite difference representation of the second derivative (i.e. not treated by successive first derivatives).
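The skew-symmetric form in (12) is analytically identical to the conservative divergence; it differs only once the derivatives are replaced by finite differences. A short derivation makes this explicit, writing $\rho$ for the density and $\phi$ for the transported quantity (these symbol choices are assumed here for the sketch):

```latex
\begin{align}
\frac{\partial}{\partial x_j}\left(\rho \phi u_j\right)
  &= \rho u_j \frac{\partial \phi}{\partial x_j}
   + \phi \frac{\partial}{\partial x_j}\left(\rho u_j\right)
  && \text{(product rule)} \\
\frac{\partial}{\partial x_j}\left(\rho \phi u_j\right)
  &= \frac{1}{2}\frac{\partial}{\partial x_j}\left(\rho \phi u_j\right)
   + \frac{1}{2}\left[\rho u_j \frac{\partial \phi}{\partial x_j}
   + \phi \frac{\partial}{\partial x_j}\left(\rho u_j\right)\right]
  && \text{(average of both forms)}
\end{align}
```

For the continuity equation, $\phi = 1$ makes the middle term vanish and the expression reduces to $\partial(\rho u_j)/\partial x_j$, as expected.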
As per the work of DeBonis [35] and Bull and Jameson [36], the equations were solved in a 3D cube, with $0 \le x_0 \le 2\pi L$, $0 \le x_1 \le 2\pi L$, and $0 \le x_2 \le 2\pi L$. Periodic boundary conditions were applied on all surfaces. The following initial conditions were imposed at time $t = 0$:
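\[ u_0(x_0, x_1, x_2, t=0) = \sin\left(\frac{x_0}{L}\right)\cos\left(\frac{x_1}{L}\right)\cos\left(\frac{x_2}{L}\right), \tag{13} \]

\[ u_1(x_0, x_1, x_2, t=0) = -\cos\left(\frac{x_0}{L}\right)\sin\left(\frac{x_1}{L}\right)\cos\left(\frac{x_2}{L}\right), \tag{14} \]

\[ u_2(x_0, x_1, x_2, t=0) = 0, \tag{15} \]

\[ p(x_0, x_1, x_2, t=0) = \frac{1}{\gamma M^2} + \frac{1}{16}\left[\cos\left(\frac{2x_0}{L}\right) + \cos\left(\frac{2x_1}{L}\right)\right]\left[2 + \cos\left(\frac{2x_2}{L}\right)\right]. \tag{16} \]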
In all the simulations, $Re = 1600$, $Pr = 0.71$, $M = 0.1$, and $\gamma = 1.4$. The reference quantities $L$, $u_{ref}$ and $\rho_{ref}$ were set to 1.0, and the reference temperature $T_{ref}$ was evaluated using the equation of state (10).
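The initial conditions (13)–(16) can be evaluated pointwise with a few lines of Python. This is a minimal sketch rather than the setup file used for the simulations: the function name `taylor_green_initial` is hypothetical, and the background pressure is assumed to be $1/(\gamma M^2)$, consistent with the equation of state.

```python
import math

GAMMA = 1.4  # ratio of specific heats (value quoted in the text)
MACH = 0.1   # Mach number (value quoted in the text)
L_REF = 1.0  # reference length L (value quoted in the text)


def taylor_green_initial(x0, x1, x2, L=L_REF, gamma=GAMMA, mach=MACH):
    """Return (u0, u1, u2, p) at t = 0 for the Taylor-Green vortex."""
    u0 = math.sin(x0 / L) * math.cos(x1 / L) * math.cos(x2 / L)
    u1 = -math.cos(x0 / L) * math.sin(x1 / L) * math.cos(x2 / L)
    u2 = 0.0
    # Assumed background pressure 1/(gamma * M^2) plus the periodic perturbation.
    p = 1.0 / (gamma * mach**2) + (1.0 / 16.0) * (
        math.cos(2.0 * x0 / L) + math.cos(2.0 * x1 / L)
    ) * (2.0 + math.cos(2.0 * x2 / L))
    return u0, u1, u2, p


if __name__ == "__main__":
    print(taylor_green_initial(0.5 * math.pi, 0.0, 0.0))
```

Because every term depends on the coordinates only through sines and cosines of $x_i/L$, the fields repeat with period $2\pi L$ in each direction, which is what makes the periodic boundary conditions consistent with the initial state.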
A fourth-order accurate central differencing scheme was used to spatially discretise the domain, and a third-order Runge-Kutta time-stepping scheme was used to march the equations forward in time. A set of simulations was performed over a range of resolutions, namely $64^3$, $128^3$, $256^3$ and $512^3$ uniformly-spaced grid points. For the $64^3$ case, a non-dimensional time-step size $\Delta t$ of $3.385 \times 10^{-3}$ [35] was used. Each time the number of grid points was doubled, the time-step size was halved to maintain a constant upper bound on the Courant number. The generated code was targeted towards the CUDA backend using OPS and executed on an NVIDIA Tesla K40 GPU until a non-dimensional time of $t = 20$, except for the $512^3$ case; this was targeted towards the MPI backend and run in parallel over 1440 processes on the UK National Supercomputing Service (ARCHER) due to lack of available memory on the GPU, and provided a good example of how the backend can be readily changed.
Fig. 8. Visualisations of the non-dimensional vorticity (z-component) iso-contours, …
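The time-step halving rule can be written down directly. The sketch below is illustrative only: it assumes a periodic box of length $2\pi L$ with $L = 1$, uses the reference velocity as the only wave speed (so acoustic contributions to the Courant number are ignored), and the helper names `time_step` and `convective_courant` are hypothetical.

```python
import math

BASE_POINTS = 64               # grid points per direction in the coarsest case
BASE_DT = 3.385e-3             # non-dimensional time-step size for the 64^3 case
DOMAIN_LENGTH = 2.0 * math.pi  # assumed box length (2*pi*L with L = 1)
U_REF = 1.0                    # reference velocity


def time_step(points):
    """Halve the time step each time the number of points per direction doubles."""
    return BASE_DT * BASE_POINTS / points


def convective_courant(points):
    """Convective Courant estimate u_ref * dt / dx (acoustic speed ignored)."""
    dx = DOMAIN_LENGTH / points
    return U_REF * time_step(points) / dx


for n in (64, 128, 256, 512):
    print(n, time_step(n), convective_courant(n))
```

Since both the time step and the grid spacing scale as one over the number of points per direction, their ratio, and hence this Courant estimate, stays the same across all four resolutions.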
The z-component of the vorticity field at various times can be found in Fig. 8. At non-dimensional time $t = 2.5$ vortex evolution and stretching are clearly visible, progressing on to highly turbulent dynamics where the relatively smooth structures roll up and eventually break down at around $t = 9$. This point is characterised by peak enstrophy in the system. The final stage of the simulation features the decay of the turbulent structures such that the enstrophy tends towards its initial value.
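Following the definitions of DeBonis [35], the integrals of the kinetic energy

\[ E_k = \frac{1}{\rho_{ref}\,\Omega}\int_{\Omega} \frac{1}{2}\rho\, u_i u_i \, d\Omega \tag{17} \]

and enstrophy

\[ \varepsilon = \frac{1}{\rho_{ref}\,\Omega}\int_{\Omega} \frac{1}{2}\rho \left(\epsilon_{ijk}\frac{\partial u_k}{\partial x_j}\right)^2 d\Omega \tag{18} \]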
were computed throughout the simulations. Note that $\Omega$ is the whole domain and $\epsilon_{ijk}$ is the Levi-Civita function. These quantities are shown in Figs. 9 and 10 for the various grid resolutions, and are plotted against the reference data from a spectral element simulation by Wang et al. [41] using a $512^3$ grid for comparison. Fig. 10 highlights the inviscid nature of the Taylor-Green vortex problem for $t < {\sim}3$–$4$. The transition to turbulence occurs from ${\sim}3 < t < 9$ (which is associated with the peak in enstrophy in Fig. 9). Finally, dissipation occurs at $t > 9$. The results show a clear agreement with the reference data, and represent a solid first step towards the validation of OpenSBLI.
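To make the volume-averaged kinetic energy concrete, the sketch below evaluates it for the initial Taylor-Green velocity field on a small uniform periodic grid. It is an illustration under stated assumptions, not the diagnostic code used in the simulations: the density is taken as uniform and equal to the reference density, the integral divided by the volume is replaced by a plain mean over grid points, and `initial_kinetic_energy` is a hypothetical helper.

```python
import math


def initial_kinetic_energy(n, L=1.0, rho=1.0, rho_ref=1.0):
    """Mean kinetic energy of the t = 0 velocity field on an n^3 periodic grid.

    Assumes uniform density rho; the mean over grid points stands in for
    the volume integral divided by the domain volume.
    """
    h = 2.0 * math.pi * L / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                x0, x1, x2 = i * h, j * h, k * h
                u0 = math.sin(x0 / L) * math.cos(x1 / L) * math.cos(x2 / L)
                u1 = -math.cos(x0 / L) * math.sin(x1 / L) * math.cos(x2 / L)
                # u2 is zero initially, so it does not contribute.
                total += 0.5 * rho * (u0 * u0 + u1 * u1)
    return total / (rho_ref * n**3)


if __name__ == "__main__":
    print(initial_kinetic_energy(8))
```

Under the uniform-density assumption the analytical value is 1/8, since each squared velocity component averages to 1/8 over the periodic box; this gives a simple sanity check on the starting point of a kinetic energy curve.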
4 Conclusion
Advances in compute hardware are driving a need to change the current state of numerical model development. By developing a new modelling framework based on automated solution techniques, we have effectively future-proofed the core of the SBLI codebase; no longer does a computational scientist need to re-write significant portions of code in order to get it up and running on a new piece of hardware. Instead, the model is derived from a high-level specification independent of the architecture that it will run on, and the underlying code is automatically generated and tailored to a particular backend, the responsibility for which would rest with computer scientists who are experts in parallel programming paradigms. Furthermore, the ease with which the governing equations can be changed is a fundamental advantage of using such abstract
… cases, each of which comprised a different set of equations. The discretisation, code generation and code targeting are performed automatically, thereby reducing development costs and potentially avoiding errors, bugs, and non-performant/non-optimal operations. In addition, code that solves the different variants of the same governing equations can be easily generated. For example, in the compressible Navier-Stokes equations, viscosity can be treated either as a constant or as a spatially-varying term. In static, hand-written codes this flexibility comes at the cost of writing different routines for the various formulations, unlike with automated code generation techniques. This is particularly useful when wanting to switch between Cartesian and generalised coordinates. This particular framework also facilitates fast and efficient switching between different spatial orders of accuracy, and reduces the development time and effort when wishing to try out new numerical formulations of the equations (or a new spatial/temporal scheme) on a wide variety of test cases.
4.1 Future work
Explicit schemes such as the one implemented here can be readily extended to a range of application areas such as computational aeroacoustics, aero-thermodynamics, problems involving shocks, and hypersonic flow. Incompressible flows may also be handled with the explicit, compressible solver in OpenSBLI so long as
Fig. A.11. A cut-down version of the 3D Taylor-Green vortex setup/configuration file (67 lines long including whitespace), showing the key components and classes available