Shandong University, China
A THESIS SUBMITTED FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
DEPARTMENT OF COMPUTER SCIENCE
NATIONAL UNIVERSITY OF SINGAPORE
2014
have been used in the thesis.
This thesis has also not been submitted for any degree in any university previously.
Qi Meng
November 2014
I would like to express my gratitude to all those who helped me during the writing of this thesis.
My deepest gratitude goes first and foremost to Dr Tan Tiow-Seng, my supervisor, who has offered me valuable suggestions in my academic studies. He has spent much time discussing my research with me and has provided me with inspiring guidance. Without his illuminating, insightful and expert instruction, the completion of this thesis would not have been possible.
I am very grateful to Cao Thanh-Tung and Gao for selflessly sharing their knowledge and time, for the hundreds of deep discussions we have had, and for sitting beside me to help me out of the endless code optimizations.
I would like to thank Prof Low Kok-Lim, Prof Alan Cheng Ho-Lun, and Prof Huang Zhiyong for their comments and suggestions on my research during the weekly meetings of the G3 Lab.
I would also like to thank all my fellow lab-mates, who offered me great company and help in many ways. Thank you, my friends Hua Binh-Son, Cao Thanh-Tung, Gao, Guo Jiayan, Li Ruoru, Yan Ke, Liu Linlin, Tang Ke, Su Jun, Alvin Chia, Lai Kuan, Poonna Yospanya, Kang Juan, Sang Ngo Le, Li Yunzhen, and Du Cheng. You have enriched my life in NUS, making it more enjoyable and fun.
Last but not least, my gratitude goes to my beloved family for their love and for supporting me without any complaint.
There are several triangle mesh generators that generate the Delaunay triangulation, the constrained Delaunay triangulation, the conforming Delaunay triangulation and the quality triangulation on the CPU. However, there is no similar generator for the GPU. Owing to its enormous parallel power, the GPU has been used not only for graphics processing tasks but also for general purpose computing. In computational geometry, early works on the GPU compute the digital Voronoi diagram and the Delaunay triangulation. There has been no prior research on generating other triangle meshes efficiently with the parallel power of the GPU, such as the constrained Delaunay triangulation, the conforming Delaunay triangulation and the quality triangulation. The GPU is massively multithreaded, with hundreds of processors. In order to fully utilize the GPU hardware, a parallel algorithm usually needs to have regularized work and localized data access. However, it is not even clear how to achieve these properties while adapting traditional parallel techniques to the mesh generation problem. So it is not clear how to apply traditional parallel algorithms efficiently on the GPU.
In this thesis, we focus on designing mesh generating algorithms in 2D on the GPU. We propose two algorithms, termed GPU-QM and GPU-CDT. GPU-QM improves the quality of the Delaunay triangulation of a point set, and GPU-CDT computes the constrained Delaunay triangulation of a set of points and constraints. Both are the first GPU algorithms proposed so far for their respective problems. In our experiments on both synthetic and real-world data, our GPU algorithms are numerically robust and run faster than the fastest sequential algorithm. Compared with the fastest sequential implementation, GPU-QM gains up to 5.5 times speedup and GPU-CDT gains up to two orders of magnitude speedup. Furthermore, we obtain the first GPU mesh generator by integrating GPU-QM and GPU-CDT with an existing work, GPU-DT, which computes the Delaunay triangulation of a point set on the GPU. Our mesh generator computes digital Voronoi diagrams, Delaunay triangulations, constrained Delaunay triangulations, conforming Delaunay triangulations and high-quality triangle meshes in 2D on the GPU. To handle numerical error and degenerate input, we implement robust geometric predicates and the simulation of simplicity method on the GPU, based on the sequential implementations. As a result, our generator can handle degenerate input and is numerically robust.
List of Figures
3.2 Issues for a GPU Delaunay Refinement Algorithm
3.4 Mechanisms for GPU-QM to Handle Boundary Refining
3.6.1 Dealing with variable-size arrays
1.1 (a) An example of DT of a set of points. (b) No point is inside the circumcircle of any triangle in the triangulation.
1.2 (a) An example of CDT of a set of points and constraints. (b) No visible point is inside the circumcircle of any triangle in the triangulation.
1.4 Two kinds of skinny triangles whose circumradii are much larger than their shortest edges. (a) Needle, whose longest edge is much longer than its shortest edge. (b) Cap, whose maximum angle is close to $180^\circ$.
2.2 A PSLG of points and segments. The radius of each disk illustrated here is the local feature size of the point at its center.
2.3 An example of DT of a set of points. No visible point is inside the circumcircle of any triangle in the triangulation.
2.4 (a) Star of p; the bold edges bounding the shaded triangles form the link of the star.
2.5 Flipping ab to cd. Before the flip, ab is not locally Delaunay, and the union of the two triangles abc and abd is convex. After the flip, cd is locally Delaunay.
2.6 An example of CDT of a PSLG consisting of points and one constraint (red line). No visible point is inside the circumcircle of any triangle in the triangulation.
2.7 (a) A triangulation. (b) Data structure of the triangulation. An orientated triangle.
2.8 Possible conflicts in parallel processing when inserting points into the triangulation.
3.1 Any triangle whose radius-edge ratio is larger than B is split by inserting its circumcenter. Every new edge has length at least B times that of the shortest edge of the bad triangle. Left: before point insertion. Right: after point insertion.
3.3 The off-center and the circumcenter of triangle pqr are labeled as $c$ and $c_1$, respectively. (a) If $|c_1 p| \le B|pq|$, then $c = c_1$. (b) Otherwise, $c \ne c_1$ and $c$ is obtained by setting $|c_2 p| = B|pq|$, where $c_2$ is the circumcenter of triangle $pqc$.
3.4 The circumcenter c is the sink. The arrow shows how to find the sink from a bad triangle.
3.5 1-3 flip (inserting d into the triangle abc) and 3-1 flip (removing d).
3.7 Given a non-extreme point v, to remove v we flip ve and vc to reduce the degree of v.
3.8 An insertion which generates a shorter edge and leads to an infinite loop.
3.9 1-2 flip (inserting midpoint d on the edge ab) and 2-1 flip (removing the midpoint d).
3.10 An example showing that if p does not encroach upon boundary edge ab, other points
and b′; ab is an edge shared by the star of x and y.
3.17 Our solution is to re-use empty slots in the point/triangle list.
3.18 distribution: (a) Input DT mesh. (b) Output quality mesh.
3.19 Total running time and speedup over Triangle on different distributions by using
3.20 distribution: one intermediate result after several iterations. Bad triangles
3.21 Mesh quality comparison between Triangle (left) and GPU-QM (right) on different distributions. (a)(b) Uniform distribution; (c)(d) Gaussian distribution; (e)(f) Disk distribution; (g)(h) distribution; (i)(j) Grid
3.22 Comparison on the number of output points among Triangle with priority queue, GPU-QM with priority, and GPU-QM without priority on uniform (left) and grid
3.23 The running time of different steps of the GPU-QM algorithm for 1 million points
3.24 Comparison on the number of flips between GPU-QM and Triangle on different distributions. (a) Uniform (b) Gaussian (c) Disk (d) (e) Grid
3.25 Comparison between GPU-QM and GPU-QM-V on different distributions. (a) Uniform (b) Gaussian (c) Disk (d) (e) Grid
3.26 Comparison on the number of flips for three strategies, GPU-QM, orthogonal-test (i.e., GPU-QM-V) and no-Deletion, on different distributions. (a) Uniform (b) Gaussian
3.27 The running time and speedup of GPU-QM-V over Triangle for different
3.28 Comparison for the uniform distribution with and without
3.30 A raster image and its quality mesh, in which the red points are the input points
4.1 Steps of the GPU-CDT algorithm. (a) Input PSLG; (b) Step 1: Triangulation; (c) Step 2: Constraints insertion; (d) Step 3: Edge flipping
4.2 (a) Find the first triangle A intersected by the constraint (red line); the yellow triangle is the first triangle incident to a in the vertex array. (b) How to find
4.3 Configurations of a triangle pair in a constraint (drawn in dashed line)
4.4 Flipping of a triangle pair involving A. The constraint pq intersects the triangles from left to right. (a) Case 1a (b) Case 1b (c) Case 2 (d) Case
4.5 (a) When the triangle pairs in the constraint $c_i = ab$ are only either double-in or concave, there exists a flippable pair (A, C). (b) B, A and C fulfill Case 3a. (c) B, A and C fulfill Case 2. (d) B, A and C fulfill Case 3b.
4.7 Push the concave pair towards the right end of the constraint using a flipping
4.8 A synthetic dataset (left) and its constrained Delaunay triangulation (right)
4.9 Speedup over Triangle when computing the CDT, (a) with 1M constraints and varying the number of points, and (b) with 10M points and varying the number
4.10 Total number of constraint-triangle intersections with different grid sizes, (a) with 1M constraints and varying the number of points, and (b) with 10M points
4.11 Running time for different steps of computing the CDT, (a) with 1M constraints and varying the number of points, and (b) with 10M points and varying the
4.12 Comparison with Triangle on the total number of flippings when inserting constraints, (a) with 1M constraints and varying the number of points, and (b) with
4.15 A raster image (top) and the CDT for its edge map (bottom)
2.1 Primitive operations to the triangulation in Figure 2.7a
3.1 redundant point, non-Delaunay and Delaunay flips
3.2 Mark triangles for redundant point, and do edge-flip for triangle pairs
Meshes composed of triangles are used in various applications such as interpolation, the finite element method, and terrain databases. Several triangle mesh generators exist on the CPU, such as Triangle and CGAL, which generate the Delaunay triangulation (DT), the constrained Delaunay triangulation (CDT), the conforming Delaunay triangulation and the quality triangulation. As far as we know, however, there is no such generator for the graphics processing unit (GPU). Recently, the GPU, with its enormous parallel power, has been used widely in many applications for general purpose computing. The GPU uses a massively parallel architecture with hundreds to thousands of processing elements to run thousands to millions of threads simultaneously. Common issues in parallel programming, such as cooperation among threads and access to shared data, therefore become more serious problems. In order to fully utilize the GPU hardware, a parallel algorithm usually needs to have regularized work and localized data access. It is hard, and not even clear, how to achieve these properties while adapting traditional parallel techniques to the mesh generating problem. In this thesis, we aim to design 2D CDT and quality mesh generating algorithms that are suitable for parallel computing, especially on the GPU. Furthermore, by integrating an existing DT algorithm, we obtain a 2D mesh generator on the GPU. In our experiments on both synthetic and real-world data, our mesh generator is numerically robust and runs much faster than existing sequential algorithms. Compared with the fastest sequential implementation, our quality mesh algorithm runs up to 5.5 times faster, while our CDT algorithm runs up to two orders of magnitude faster.
For the DT, there are several GPU algorithms and implementations, such as [RTCS08, QCT12, QCT13, CNGT14]. The generation of the conforming Delaunay triangulation is usually combined with quality mesh generation, because all edges in the mesh are Delaunay edges. So we do not discuss the conforming Delaunay triangulation here.
In this thesis, we address the other two mesh generation problems listed above, i.e., the CDT and the quality triangulation. Our CDT generation algorithm can handle any planar straight line graph (PSLG) as input. For our quality mesh generation algorithm, the input is a segment-bounded DT of a set of points, and there is no small angle between any two adjacent segments. We restrict the input of the quality mesh generation algorithm because, in both theory and practice, no algorithm or implementation can guarantee to terminate with good quality for all input data. Even for sequential algorithms, handling small angles that come from the input is still a big challenge. All existing algorithms are likely to be foiled by input data with small features, small angles or complex topologies. Before a quality triangulation algorithm for any PSLG can be brought to the GPU, thorough preparation in the algorithms of both the CPU and the GPU is needed. The quality mesh generation algorithm for any PSLG is therefore left to future work. In this thesis, we only consider the quality mesh generation algorithm for a segment-bounded DT of a set of points, where no angle between any two input segments is less than $60^\circ$.
In the following, we introduce background knowledge related to mesh generators and GPU computing. At the end of this chapter, we highlight the contributions of the thesis.
The DT is named after Boris Delaunay for his work on this topic from 1934. A DT of a set of points S in a plane is a triangulation such that no point in S is inside the circumcircle of any triangle in the triangulation (see Figure 1.1). This special property of the Delaunay triangulation is also called the empty circumcircle property. The DT maximizes the minimum angle over all the angles of the triangles in the triangulation. In other words, among all triangulations of a given set of points, the DT has the largest minimum angle.
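The empty circumcircle property can be checked directly with the classic orientation and in-circle determinants. The following is a minimal brute-force sketch in Python. It assumes triangles are given as index triples into a list of (x, y) tuples. The names `orient2d`, `in_circle` and `is_delaunay` are illustrative and are not the thesis's code. The sketch uses plain floating point, so, unlike robust predicates, it can misjudge nearly cocircular points.

```python
def orient2d(a, b, c):
    """Twice the signed area of triangle abc: > 0 iff a, b, c are counterclockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circle(a, b, c, d):
    """> 0 iff d lies strictly inside the circumcircle of the counterclockwise triangle abc."""
    adx, ady = a[0] - d[0], a[1] - d[1]
    bdx, bdy = b[0] - d[0], b[1] - d[1]
    cdx, cdy = c[0] - d[0], c[1] - d[1]
    ad = adx * adx + ady * ady
    bd = bdx * bdx + bdy * bdy
    cd = cdx * cdx + cdy * cdy
    return (adx * (bdy * cd - bd * cdy)
            - ady * (bdx * cd - bd * cdx)
            + ad * (bdx * cdy - bdy * cdx))


def is_delaunay(points, triangles):
    """Brute-force check that no point lies strictly inside any triangle's circumcircle."""
    for i, j, k in triangles:
        if orient2d(points[i], points[j], points[k]) < 0:
            j, k = k, j  # reorder so the in-circle sign convention applies
        for m, p in enumerate(points):
            if m not in (i, j, k) and in_circle(points[i], points[j], points[k], p) > 0:
                return False
    return True
```

Take the points (0,0), (2,0), (1,0.5), (1,-0.5). The two triangles sharing the long edge (0,0)-(2,0) fail the check. The two triangles sharing the short edge (1,0.5)-(1,-0.5) pass, which matches the max-min angle property.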
Because it avoids long, skinny triangles, the DT has many applications in different fields. For example, in geographic information systems (GIS), one way to model the terrain is to interpolate the data points based on the DT [Gol94]. In path planning, the DT can be used to compute the minimum spanning tree of a set of
Figure 1.1: (a) An example of DT of a set of points. (b) No point is inside the circumcircle of any triangle in the triangulation.
used to build quality meshes for finite element analysis [HDSB01].
The CDT is an extension of the DT in which some edges of the output are prescribed beforehand [Che89a]; these edges are referred to as constraints. Given a set S of n points (or sites) in the 2D plane and a set of constraints, the CDT is a triangulation of S that contains all the constraints while being as close to the DT of S as possible (see Figure 1.2). The CDT thus inherits the DT's optimality: among all possible triangulations of a set of points that include all the constraints, the CDT maximizes the minimum angle. Corresponding to the empty circumcircle property of the DT, the CDT must fulfill the weaker constrained empty circumcircle property. To state this property, it is convenient to think of constraint edges as walls that block the view. In a CDT, no visible point is inside the circumcircle of any triangle in the triangulation.
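Testing visibility directly is awkward. The following sketch therefore relies on the constrained Delaunay lemma, a standard result that this section does not state. By that lemma, a triangulation containing all constraints is a CDT if and only if every interior edge that is not a constraint is locally Delaunay, in the sense of Definition 2.1.7. Constraint edges act as the walls and are never tested. The function name and the input layout (index triples, constraints as index pairs) are assumptions for illustration, and the arithmetic is plain floating point.

```python
from collections import defaultdict


def orient2d(a, b, c):
    """Twice the signed area of triangle abc: > 0 iff counterclockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circle(a, b, c, d):
    """> 0 iff d is strictly inside the circumcircle of the counterclockwise triangle abc."""
    adx, ady = a[0] - d[0], a[1] - d[1]
    bdx, bdy = b[0] - d[0], b[1] - d[1]
    cdx, cdy = c[0] - d[0], c[1] - d[1]
    ad, bd, cd = adx * adx + ady * ady, bdx * bdx + bdy * bdy, cdx * cdx + cdy * cdy
    return (adx * (bdy * cd - bd * cdy) - ady * (bdx * cd - bd * cdx)
            + ad * (bdx * cdy - bdy * cdx))


def is_constrained_delaunay(points, triangles, constraints):
    """Every constraint must be an edge; every other interior edge must be locally Delaunay."""
    fixed = {frozenset(e) for e in constraints}
    apices = defaultdict(list)  # undirected edge -> vertices opposite to it
    for tri in triangles:
        for t in range(3):
            a, b, c = tri[t], tri[(t + 1) % 3], tri[(t + 2) % 3]
            apices[frozenset((a, b))].append(c)
    if not fixed <= set(apices):
        return False  # some constraint is missing from the triangulation
    for edge, opp in apices.items():
        if len(opp) != 2 or edge in fixed:
            continue  # hull edges and constraints (the "walls") need no test
        a, b = tuple(edge)
        c, d = opp
        if orient2d(points[a], points[b], points[c]) < 0:
            a, b = b, a  # make triangle abc counterclockwise
        if in_circle(points[a], points[b], points[c], points[d]) > 0:
            return False
    return True
```

With the four points used for the DT check, the pair sharing the long edge fails as an unconstrained triangulation. It becomes acceptable once that long edge is declared a constraint, because the constraint hides the opposite vertex.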
Figure 1.2: (a) An example of CDT of a set of points and constraints. (b) No visible point is inside the circumcircle of any triangle in the triangulation.
Figure 1.3: An edge map of an image and its CDT.
are contours of the body's skull; and in modeling, they are other boundary features [Boi88, Tre95, Kal10]. In short, the CDT extends the DT and is a very useful tool in many fields. Figure 1.3 shows an example of how to apply the CDT to raster images (details can be found in Section 4.5.3). The figure shows an edge map of a raster image and the CDT computed for it. Depending on the resolution of the image, the edge map might consist of hundreds of thousands of line segments, or constraints.
computational geometry. Because they avoid long, skinny triangles, they have many applications in different fields. However, in real-world applications such as interpolation, the finite element method, and the finite volume method, the triangle meshes produced by the mesh generator should satisfy guaranteed bounds on angles, edge lengths, the number of triangles, and the grading of triangles from small to large size. Neither the DT nor the CDT satisfies such requirements, in theory or in practice.
For example, most mesh generation algorithms take a PSLG as their input. A PSLG is a set of vertices and segments (constraints), as shown in Figure 1.5a. Although the DT maximizes the minimum angle over all the angles of the triangles in the triangulation, the mesh can still contain skinny triangles. Furthermore, the DT of the vertices may not respect the domain that a user wishes to triangulate. The CDT, as an extension of the DT, conforms to the domain's boundary, but it still cannot eliminate skinny triangles from the mesh.
Both problems, conformity and skinny triangle elimination, can be solved by inserting additional points, called Steiner points after Jakob Steiner. The main difficulty is where to place the additional points. The usual technique for generating triangle meshes is Delaunay refinement. The main problem of Delaunay refinement is to find a triangulation that fits arbitrarily shaped domains and contains only triangles of appropriate size and shape.
Figure 1.4: Two kinds of skinny triangles whose circumradii are much larger than their shortest edges. (a) Needle, whose longest edge is much longer than its shortest edge. (b) Cap, whose maximum angle is close to $180^\circ$.
2. All triangles should be relatively "round" in shape, with no angles that are too small or too large. Triangles with such angles are skinny triangles (low quality or bad triangles; see Figure 1.4). Large angles lead to large errors in the gradients of the interpolated surface and to large errors in the finite element method. Small angles cause the systems of equations that the finite element method yields to be ill-conditioned.
3. The algorithm should offer as much control as possible over the sizes of triangles, which govern both speed and accuracy. Small triangles usually offer more accuracy than larger ones, but the computation time is proportional to the number of triangles, so choosing triangle sizes entails trading off speed against accuracy. For a given mesh, it is not difficult to refine it into another mesh that contains more triangles, but the reverse process is relatively hard [MTT97].
Most mesh generation algorithms take a PSLG as their input. During the Delaunay refinement process, a segment is usually divided into smaller edges. The domain of interest, or triangulation domain, is the region that needs to be triangulated. Such a domain should be segment-bounded, meaning that the segments cover the boundary of the triangulation domain. The domain may be either convex or concave, and it may contain holes, but the holes must be bounded by segments.
Figure 1.5: (a) A PSLG, and (b) a mesh generated by Ruppert's Delaunay refinement algorithm.
The main process of Delaunay refinement algorithms is to insert Steiner points until the mesh meets the constraints on triangle quality and size while maintaining a DT or CDT. The Delaunay refinement algorithm has several advantages. First, it maintains the DT, which maximizes the minimum angle among all possible triangulations of a point set and is thus optimal in this sense. Second, inserting a vertex into a DT is a local operation, which is inexpensive except in unusual cases.
In applications such as the finite element method, several measures are used for the quality of a triangle. A good quality mesh usually means that all triangles are non-obtuse, or that all triangles have bounded aspect ratio. The aspect ratio of a triangle is the length of its longest edge divided by the length of its shortest altitude. A fairly general measure of a triangle is its minimum angle $\alpha$. This measure bounds the maximum angle by $\pi - 2\alpha$ and guarantees an aspect ratio between $\frac{1}{\sin\alpha}$ and $\frac{2}{\sin\alpha}$. The most natural and elegant measure for analyzing Delaunay refinement algorithms is the circumradius-to-shortest edge ratio ($r/l$) of a triangle [MTTW95]. The circumcenter and circumradius of a triangle are the center and radius of its circumcircle, respectively. By the law of sines, the shortest edge $l$ lies opposite the smallest angle $\theta_{\min}$ and satisfies $l = 2r\sin\theta_{\min}$. Hence a triangle's ratio $r/l$ is related to its smallest angle by $r/l = \frac{1}{2\sin\theta_{\min}}$. The smaller a triangle's ratio, the larger its smallest angle. In other words, given an upper bound $B$ on the circumradius-to-shortest edge ratio of all triangles in a mesh, there is no angle smaller than $\arcsin\frac{1}{2B}$, and vice versa. Clearly, $B$ should be as small as possible. For example, if $B = \sqrt{2}$ is employed, all angles are bounded between $20.7^\circ$ and $138.6^\circ$.
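The relation between the radius-edge ratio and the smallest angle can be checked numerically. This small Python sketch computes both quantities for one triangle and turns a ratio bound $B$ into the guaranteed angle interval. The function names `triangle_quality` and `angle_bounds` are illustrative assumptions.

```python
import math


def triangle_quality(a, b, c):
    """Return (smallest angle in degrees, circumradius-to-shortest-edge ratio r/l)."""
    la, lb, lc = math.dist(b, c), math.dist(c, a), math.dist(a, b)  # la is opposite a, etc.
    area = abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])) / 2
    r = la * lb * lc / (4 * area)  # circumradius from R = abc / (4 * area)

    def angle(opposite, s1, s2):
        # law of cosines, clamped against rounding just outside [-1, 1]
        cos = (s1 * s1 + s2 * s2 - opposite * opposite) / (2 * s1 * s2)
        return math.acos(max(-1.0, min(1.0, cos)))

    smallest = min(angle(la, lb, lc), angle(lb, la, lc), angle(lc, la, lb))
    return math.degrees(smallest), r / min(la, lb, lc)


def angle_bounds(B):
    """Angle interval (degrees) guaranteed when every triangle has r/l <= B."""
    theta = math.degrees(math.asin(1 / (2 * B)))
    return theta, 180 - 2 * theta
```

For an equilateral triangle, the sketch gives a smallest angle of $60^\circ$ and a ratio of $1/\sqrt{3}$. `angle_bounds(math.sqrt(2))` reproduces the $20.7^\circ$ to $138.6^\circ$ interval quoted above.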
Since the quality of the mesh depends solely on how the points that define the DT are distributed, the central question of any Delaunay refinement algorithm is how to choose Steiner points that lie as far from other points as possible. If a newly inserted point is too close to another point, the resulting short edge will engender thin triangles. One way to eliminate a bad triangle t dates back at least to the mid-1980s [Fre87]. It inserts a Steiner point at the circumcenter of t and uses Lawson's algorithm [Law77] or the Bowyer-Watson algorithm [Bow81] to keep the triangulation a DT. After the circumcenter is inserted, the bad triangle cannot survive, because its circumcircle is no longer empty. The literature offers other techniques for choosing the next Steiner point, and all of them will be discussed later.
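The classic circumcenter rule described above can be sketched as follows. The sketch finds the triangles whose radius-edge ratio exceeds $B$ and returns their circumcenters as candidate Steiner points. It only selects the points. Inserting them and restoring the DT, whether by Lawson flips or Bowyer-Watson, is not shown, and the names are hypothetical. GPU-QM's own choice of Steiner points may differ; Figure 3.3, for instance, mentions off-centers.

```python
import math


def circumcenter(a, b, c):
    """Center of the circle through a, b, c (the points must not be collinear)."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy


def steiner_points_for_bad_triangles(triangles, B):
    """Circumcenters of the triangles whose circumradius-to-shortest-edge ratio exceeds B."""
    result = []
    for a, b, c in triangles:
        cc = circumcenter(a, b, c)
        r = math.dist(cc, a)
        shortest = min(math.dist(a, b), math.dist(b, c), math.dist(c, a))
        if r / shortest > B:  # a "bad" (skinny) triangle
            result.append(cc)
    return result
```

For the thin triangle (0,0), (10,0), (5,0.5), the circumcenter lies far outside at (5, -24.75), which illustrates why a needle-shaped triangle is destroyed once that point is inserted.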
Another big challenge for the Delaunay refinement algorithm is to ensure termination while still obtaining a good quality mesh. Because of small angles in the input, the Delaunay refinement algorithm cannot always terminate. Small angles inherent in the input geometry cannot be removed, and it is not possible to triangulate a domain without creating any new small angles. As pointed out by Shewchuk [She00], for any angle bound $\theta > 0$ there exists a PSLG X such that it is not possible to triangulate X without creating a new corner (not present in X) whose angle is smaller than $\theta$. This statement imposes a fundamental limitation on any triangle mesh generation algorithm. If no input angle is smaller than $60^\circ$, Ruppert's algorithm [Rup95] is guaranteed to terminate and to produce no angle smaller than a bound that is no larger than $\arcsin\frac{1}{2\sqrt{2}} \approx 20.7^\circ$. Similarly, for the same inputs, Chew's Delaunay refinement algorithm [Che89b] also terminates, with all angles between $30^\circ$ and $120^\circ$.
In short, most existing algorithms work well in practice for many inputs, but all of them fail on some triangulation domains. Therefore, in this thesis, we only focus on Delaunay refinement of the DT mesh of a point set, where the mesh is segment-bounded.
Driven by the demand for real-time, high-definition 3D graphics, the programmable graphics processing unit, or GPU, has evolved into a highly parallel, multithreaded, many-core processor with tremendous computational horsepower and very high memory bandwidth, as illustrated by Figure 1.6. The CPU and the GPU differ in floating-point capability because the GPU devotes more transistors to data processing rather than to data caching and flow control, as illustrated by Figure 1.7. Researchers and developers are increasingly interested in general purpose computing on GPUs. In this setting, GPUs are used not only for graphics processing but also for applications ranging from physics simulation to data mining and computational geometry.
Figure 1.6: (a) Computational power and (b) memory bandwidth of the CPU and GPU (NVIDIA 2012 [NVI13]).
Figure 1.7: The GPU devotes more transistors to data processing.
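The GPU is especially well suited to data-parallel computations, in which the same program runs on many data elements in parallel, particularly when a large amount of arithmetic is involved. This trend has been reinforced by the recent introduction of CUDA, a general purpose parallel computing architecture from NVIDIA, which lets developers easily use the full power of GPUs for more complex tasks.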
In the field of computational geometry, the GPU has been employed to solve some problems. Early works compute the digital Voronoi diagram (VD) [HKL+99, FG06, CTMT10], a structure that is related to the DT. These works also mention that the DT could be obtained from the digital VD in a straightforward way. However, we note that the Voronoi diagram in a digital space (of a texture) is not the dual of the DT in a continuous space. Only recently did Rong et al. [RTCS08] present a serious attempt to derive the DT from the digital VD. Their algorithm, however, is hybrid: parallel computation is used only in the first part, and the rest is left to a sequential algorithm. After that, Qi et al.
In this thesis, we refer to the algorithm that generates the DT of a set of points in 2D as GPU-DT.
As for the CDT and quality mesh problems, as far as we know there is no efficient GPU solution. This is partly because such problems do not lend themselves readily to parallel computing. To fully utilize the GPU hardware, a parallel algorithm usually needs to have regularized work and localized data access. It is not clear how to achieve these properties while adapting traditional parallel techniques.
This work is motivated by three factors: the rapid growth in the power of GPUs, the flexibility of the CUDA parallel programming model, and the absence of any known algorithm or implementation of a 2D mesh generator on the GPU. Its objective is to develop a 2D mesh generator on the GPU. The major contributions of this work are as follows:
1. Algorithm GPU-QM: A new and efficient approach to 2D quality triangle mesh generation on the GPU, and the first GPU solution to this problem. In our experiments, this algorithm handles both synthetic and real-world data very well. Compared with Triangle, our algorithm generates meshes of similar quality and runs up to 5.5 times faster. Furthermore, it offers both termination and quality guarantees.
2. Algorithm GPU-CDT: The first GPU solution to the 2D CDT of a PSLG of points and edges. Our implementation runs up to two orders of magnitude faster than the best sequential implementations on the CPU. We obtained this result in experiments with both randomly generated PSLGs and real-world GIS data having millions of points and edges. Furthermore, we prove that the algorithm is guaranteed to terminate for any given PSLG, and that the total number of flips performed by the algorithm is $\Theta(n^2)$, where n is the number of input points.
3. A software package combining GPU-DT, GPU-QM and GPU-CDT, which is the first GPU mesh generator so far. Like the Triangle software, it can generate DTs, CDTs, conforming Delaunay triangulations, digital Voronoi diagrams, and high-quality triangle meshes. Unlike Triangle, our algorithms are parallel and are implemented with the CUDA programming model on NVIDIA GPUs. Experimental results show that the program is numerically robust and runs much faster than existing CPU programs such as Triangle and CGAL. The code for all the algorithms has also been made freely available.
4. Several key GPU techniques, such as handling degeneracies and robustness, handling cooperation among threads, and dealing with conflicts using multiple iterations. These techniques are documented in the thesis and provide valuable references for future research.
The rest of the thesis is organized as follows. Chapter 2 introduces some definitions, data structures and GPU programming concepts. Chapters 3 and 4 introduce the algorithms GPU-QM and GPU-CDT, respectively. Finally, Chapter 5 concludes the thesis.
This chapter prepares for the algorithms presented in the following chapters. Section 2.1 introduces some definitions and properties. Section 2.2 describes the data structure used in our mesh generator. As all the algorithms proposed in the thesis run on the GPU, Section 2.3 introduces some related programming concepts. Finally, Section 2.4 presents the experimental environment used in the thesis.
2.1 Terminology and Definition
Definition 2.1.1 (Voronoi diagram). Let S be a set of n sites in the Euclidean space ℜ². For each site p of S, the Voronoi region R(p) of p is the set of points that are closer to p than to any other site of S. The Voronoi diagram V(S) is the partition of space induced by the Voronoi regions. The elements of S are also called the sites of this Voronoi diagram. The line segments shared by the boundaries of two Voronoi regions are called Voronoi edges, and the points shared by the boundaries of three or more Voronoi regions are called Voronoi vertices.
In the digital space, instead of a continuous plane, we only consider the set of all integer grid points. If a grid point p lies inside the Voronoi region of the site $x_i$, we say that p is colored by $x_i$. If p is equidistant from $x_i$ and $x_j$ with $i < j$, we color p by $x_i$. The set of colored grid points forms the digital Voronoi diagram D(S) of S (see Figure 2.1).
Figure 2.1: Digital Voronoi diagram
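The definition of the digital Voronoi diagram, including its tie-breaking rule, can be written as a brute-force reference. The sketch checks every site for every grid point. The GPU works cited in Chapter 1 compute the same coloring far more efficiently, so this only makes the definition concrete. The function name and the row-major `grid[y][x]` layout are assumptions.

```python
def digital_voronoi(sites, width, height):
    """Color each integer grid point (x, y) with the index of its nearest site.

    Ties go to the site with the smaller index, as in the definition above.
    Squared distances are compared, which is exact for integer coordinates.
    """
    grid = [[-1] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            best, best_d = -1, None
            for i, (sx, sy) in enumerate(sites):
                d = (x - sx) ** 2 + (y - sy) ** 2
                if best_d is None or d < best_d:  # strict '<' keeps the smaller index on ties
                    best, best_d = i, d
            grid[y][x] = best
    return grid
```

Swapping the order of two sites changes which site wins the equidistant grid points, but it does not change any other grid point.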
Definition 2.1.2 (Planar graph). Let V be a finite set of n vertices in ℜ², and let E be a set of edges determined by the vertices of V, where an edge ab denotes the open line segment between its endpoints a and b. A planar graph is a pair G = (V, E) that satisfies:
(i) for each edge ab ∈ E, $ab \cap V = \emptyset$, and
(ii) for each edge pair $ab \neq cd$ in E, $ab \cap cd = \emptyset$.
Definition 2.1.3 (Planar Straight Line Graph, PSLG). A PSLG is a graph whose vertices are embedded as points in the Euclidean plane and whose edges are embedded as non-crossing line segments.
By definition, a PSLG must contain both endpoints of every segment, and a segment may intersect vertices and other segments only at its endpoints. Figure 2.2 shows an example of a PSLG.
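The PSLG conditions translate into a simple quadratic-time validity check, sketched below. The check rejects any vertex lying in the middle of a segment and any two segments that meet anywhere other than a shared endpoint. Segments are index pairs into the vertex list, so the requirement that both endpoints are vertices holds by construction. The names are illustrative, and the tests are exact only for integer coordinates.

```python
def orient2d(a, b, c):
    """Twice the signed area of triangle abc (0 when collinear)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def on_segment(a, b, p):
    """True if p lies on the closed segment ab."""
    return (orient2d(a, b, p) == 0
            and min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def segments_intersect(a, b, c, d):
    """True if the closed segments ab and cd share at least one point."""
    o1, o2 = orient2d(a, b, c), orient2d(a, b, d)
    o3, o4 = orient2d(c, d, a), orient2d(c, d, b)
    if o1 * o2 < 0 and o3 * o4 < 0:
        return True  # proper crossing
    return (on_segment(a, b, c) or on_segment(a, b, d)
            or on_segment(c, d, a) or on_segment(c, d, b))


def is_valid_pslg(vertices, segments):
    """Segments may meet vertices and each other only at their endpoints."""
    for i, j in segments:
        if i == j:
            return False
        for k, v in enumerate(vertices):
            if k not in (i, j) and on_segment(vertices[i], vertices[j], v):
                return False  # a vertex lies in the middle of a segment
    for x in range(len(segments)):
        for y in range(x + 1, len(segments)):
            (i, j), (k, l) = segments[x], segments[y]
            if {i, j} & {k, l}:
                continue  # shared endpoint; collinear overlaps were caught above
            if segments_intersect(vertices[i], vertices[j], vertices[k], vertices[l]):
                return False
    return True
```

A unit square with its four sides is a valid PSLG. Adding both diagonals makes it invalid, because the diagonals cross at the center without a vertex there.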
Figure 2.2: A PSLG of points and segments. The radius of each disk illustrated here is the local feature size of the point at its center.
Definition 2.1.4 (Local feature size). Given a PSLG X, the local feature size lfs(p) at any point p is the radius of the smallest disk centered at p that intersects two nonincident vertices or segments of X. Two features, each a vertex or a segment, are said to be incident if they intersect.
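Figure 2.2 illustrates the notion of local feature size with examples for a variety of points; the radius of each disk is the local feature size of the point at its center.
Lemma. Given a PSLG X and any two points u and v in the plane, $\mathrm{lfs}(v) \le \mathrm{lfs}(u) + |uv|$.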
Proof. The disk of radius lfs(u) centered at u intersects two nonincident features of X. The disk of radius lfs(u) + |uv| centered at v contains the former disk, and thus also intersects the same two features. Hence the smallest disk centered at v that intersects two nonincident features of X has radius no larger than lfs(u) + |uv|.
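The lemma can be checked empirically in the simplest case, a PSLG with vertices only and no segments. There, any two distinct vertices are nonincident. The smallest disk centered at p that reaches two of them has a radius equal to the distance from p to its second-nearest vertex. This special case is an assumption of the sketch, and general PSLGs would also need point-to-segment distances and incidence tests. The helper names are hypothetical.

```python
import math
import random


def lfs_vertices_only(p, vertices):
    """Local feature size at p for a PSLG made of at least two vertices and no segments."""
    if len(vertices) < 2:
        raise ValueError("need at least two vertices")
    distances = sorted(math.dist(p, v) for v in vertices)
    return distances[1]  # the disk must reach a second, distinct vertex


def check_lemma(vertices, trials=1000, seed=0):
    """Empirically test lfs(v) <= lfs(u) + |uv| on random pairs of points."""
    rng = random.Random(seed)
    for _ in range(trials):
        u = (rng.uniform(-1, 2), rng.uniform(-1, 2))
        v = (rng.uniform(-1, 2), rng.uniform(-1, 2))
        if lfs_vertices_only(v, vertices) > lfs_vertices_only(u, vertices) + math.dist(u, v) + 1e-12:
            return False
    return True
```

At a vertex p itself, the nearest vertex is p at distance 0. In that case lfs(p) is the distance to the nearest other vertex, which agrees with the definition.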
Definition 2.1.5 (Triangulation). A triangulation is a PSLG G = (V, E) such that E is maximal. A special case of triangulation is the Delaunay triangulation, as shown in Figure 2.3.
Definition 2.1.6 (Delaunay Triangulation). A triangulation G = (V, E) is Delaunay if every edge ab ∈ E satisfies the so-called empty circle property with respect to the point set V. That is, there is a circle passing through a and b such that all the other points of V are exterior to the circle.
Figure 2.3: An example of DT of a set of points. No visible point is inside the circumcircle of any triangle in the triangulation.
In the non-degenerate case, i.e., when no four or more points are cocircular, the Delaunay triangulation of a given set of points is unique. In degenerate cases, the Delaunay triangulation is not unique: there is more than one Delaunay triangulation, and any of them is acceptable. Among all triangulations of a finite point set $S \subseteq \Re^2$, the Delaunay triangulation maximizes the minimum angle over all the angles of the triangles in the triangulation.
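The star of a vertex p ($St_p$) consists of all triangles that contain p. The link of p consists of all edges of triangles in the star that are disjoint from p (see Figure 2.4a). Let $p \notin S$ be a point in the interior of the convex hull of S, and assume that $S \cup \{p\}$ is non-degenerate. Let D be the DT of S and $D_p$ be the DT of $S \cup \{p\}$. The prestar of p ($Pt_p$) consists of the triangles of D that are destroyed by inserting p, i.e., those whose circumcircles contain p. $D_p$ is obtained from D by substituting the star for the prestar: $D_p = (D - Pt_p) \cup St_p$.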
Definition 2.1.7 (Locally Delaunay). Let S be a point set and T a triangulation of S. An edge ab ∈ T is locally Delaunay if
(i) it belongs to only one triangle and therefore bounds the convex hull of S, or
(ii) it belongs to two triangles, abc and abd, and d lies outside the circumcircle of abc.
Definition 2.1.8 (Delaunay Lemma). If every edge of T is locally Delaunay, then T is the DT of S.
The edge flip is a local operation proposed in [Law77]. If ab belongs to two triangles, abc and abd, whose union is a convex quadrangle, then ab can be flipped to cd; see Figure 2.5. By the Delaunay Lemma, we can use edge flips as elementary operations to convert an arbitrary triangulation T into the DT.
Figure 2.5: Flipping ab to cd. Before the flip, ab is not locally Delaunay, and the union of the two triangles abc and abd is convex. After the flip, cd is locally Delaunay.
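Definition 2.1.7 and the flip in Figure 2.5 combine into Lawson's algorithm. The algorithm keeps flipping any edge that is not locally Delaunay, and when none remain, the Delaunay Lemma says the result is the DT. The sketch below is a simple sequential version that rebuilds its edge table after every flip. It shows the logic and is not an efficient or parallel implementation. The names are hypothetical, and the arithmetic is plain floating point.

```python
def orient2d(a, b, c):
    """Twice the signed area of triangle abc: > 0 iff counterclockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circle(a, b, c, d):
    """> 0 iff d is strictly inside the circumcircle of the counterclockwise triangle abc."""
    adx, ady = a[0] - d[0], a[1] - d[1]
    bdx, bdy = b[0] - d[0], b[1] - d[1]
    cdx, cdy = c[0] - d[0], c[1] - d[1]
    ad, bd, cd = adx * adx + ady * ady, bdx * bdx + bdy * bdy, cdx * cdx + cdy * cdy
    return (adx * (bdy * cd - bd * cdy) - ady * (bdx * cd - bd * cdx)
            + ad * (bdx * cdy - bdy * cdx))


def is_locally_delaunay(points, a, b, c, d):
    """Edge ab is shared by triangles abc and abd: test d against the circumcircle of abc."""
    if orient2d(points[a], points[b], points[c]) < 0:
        a, b = b, a  # make abc counterclockwise
    return in_circle(points[a], points[b], points[c], points[d]) <= 0


def lawson_flip(points, triangles):
    """Flip edges that are not locally Delaunay until none remain."""
    tris = [tuple(t) for t in triangles]
    while True:
        owners = {}  # undirected edge -> [(triangle index, opposite vertex)]
        for idx, t in enumerate(tris):
            for k in range(3):
                edge = frozenset((t[k], t[(k + 1) % 3]))
                owners.setdefault(edge, []).append((idx, t[(k + 2) % 3]))
        for edge, own in owners.items():
            if len(own) != 2:
                continue  # hull edge: locally Delaunay by definition
            (i1, c), (i2, d) = own
            a, b = tuple(edge)
            if is_locally_delaunay(points, a, b, c, d):
                continue
            pa, pb, pc, pd = points[a], points[b], points[c], points[d]
            if orient2d(pc, pd, pa) * orient2d(pc, pd, pb) >= 0:
                continue  # union of the two triangles is not convex: cannot flip
            tris[i1], tris[i2] = (c, d, a), (d, c, b)  # replace edge ab by cd
            break  # the edge table is stale after a flip; rebuild it
        else:
            return tris  # a full pass without flips: every edge is locally Delaunay
```

On the four points (0,0), (2,0), (1,0.5), (1,-0.5), starting from the triangles that share the long edge, a single flip replaces that edge by the short one, exactly as in Figure 2.5.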
Many sequential algorithms have been developed to compute the DT on the CPU [Aur91, For97, SSD97]. In general, they all follow one of three well-known paradigms: divide-and-conquer [Dwy87], sweep-line [For87] and incremental insertion [GKS92].
a. Divide-and-Conquer. Algorithms based on this strategy recursively divide a set of points into two smaller sets until each set is small enough for its DT to be computed trivially. They then recursively merge the results of two small adjacent sets into that of a bigger one, until all the results are merged into the final triangulation. With this approach, the DT can be built in optimal $O(n \log n)$ time [SH75, Dwy87].
b. Sweep-line. The Voronoi diagram and the DT are dual to each other. Fortune [For87] uses a sweep-line algorithm to compute the Voronoi diagram, from which the DT is obtained. The algorithm first sorts the input points by their x-coordinates. A vertical line, the sweep-line, is then swept from left to right. Points behind the sweep-line have already been added to the Voronoi diagram, while points ahead of it are waiting to be processed. As the sweep-line progresses, the Voronoi edges are generated incrementally. The running time of this algorithm is also $O(n \log n)$.
c. Incremental Insertion. A natural way to compute the DT incrementally is to add points one at a time and re-triangulate the affected parts of the triangulation. To insert a point, we first locate the triangle or edge containing it. If the point lies inside a triangle, it splits that triangle into three. If it lies on an edge, it splits the two triangles adjacent to that edge into four. We then perform edge flipping to keep the triangulation a DT. Although this incremental insertion approach runs in $O(n^2)$ time in the worst case, the expected time is still $O(n \log n)$, provided that the points are inserted in random order [GKS92].
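The sketch below illustrates incremental insertion in its cavity form, the Bowyer-Watson algorithm [Bow81]. This is a swapped-in variant, not the flip-based procedure described above. Each insertion applies $D_p = (D - Pt_p) \cup St_p$ from Section 2.1 directly: it removes the prestar (the triangles whose circumcircles contain p) and connects p to the boundary of the resulting cavity. Several simplifications are assumptions of the sketch. It scans all triangles instead of performing point location. It assumes distinct points in general position. It also starts from a large artificial super-triangle, and for some inputs that can drop thin triangles next to the convex hull.

```python
def in_circle(a, b, c, d):
    """> 0 iff d is strictly inside the circumcircle of the counterclockwise triangle abc."""
    adx, ady = a[0] - d[0], a[1] - d[1]
    bdx, bdy = b[0] - d[0], b[1] - d[1]
    cdx, cdy = c[0] - d[0], c[1] - d[1]
    ad, bd, cd = adx * adx + ady * ady, bdx * bdx + bdy * bdy, cdx * cdx + cdy * cdy
    return (adx * (bdy * cd - bd * cdy) - ady * (bdx * cd - bd * cdx)
            + ad * (bdx * cdy - bdy * cdx))


def bowyer_watson(points):
    """Incremental Delaunay triangulation; returns counterclockwise index triples."""
    n = len(points)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    big = 100 * max(max(xs) - min(xs), max(ys) - min(ys), 1.0)
    # Three artificial vertices (indices n, n+1, n+2) forming a CCW super-triangle.
    pts = list(points) + [(cx - big, cy - big), (cx + big, cy - big), (cx, cy + big)]
    tris = {(n, n + 1, n + 2)}
    for i in range(n):
        p = pts[i]
        # Prestar: every triangle whose circumcircle strictly contains p.
        bad = {t for t in tris if in_circle(pts[t[0]], pts[t[1]], pts[t[2]], p) > 0}
        # Directed edges of the cavity; an edge whose twin is absent is on its boundary.
        edges = set()
        for a, b, c in bad:
            edges.update(((a, b), (b, c), (c, a)))
        boundary = [(a, b) for (a, b) in edges if (b, a) not in edges]
        tris -= bad
        # Star: p lies to the left of each boundary edge (a, b), so (a, b, i) is CCW.
        tris |= {(a, b, i) for (a, b) in boundary}
    # Discard the triangles that still touch the super-triangle.
    return [t for t in tris if max(t) < n]
```

For the unit square with an extra point at (0.5, 0.4), the sketch returns the four triangles that join each side of the square to the inner point. This matches the count $2n - h - 2 = 4$ for n = 5 points with h = 4 on the hull.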
For the GPU, there are a few recent works, such as [RTCS08, CNGT14]. Since the algorithms proposed in the following chapters are GPU algorithms, we can choose any one of them.
Definition 2.1.9 (Visibility). Two vertices a and b of a graph G = (V, E) are visible from each other (in G) if the line segment ab does not intersect any of the edges of E.
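Definition 2.1.10 (Constrained Delaunay Triangulation). Let G = (V, E) be a planar graph with $E \neq \emptyset$. A triangulation $(V, E \cup E')$ is a constrained Delaunay triangulation of G if the circumcircle of each of its triangles contains no vertex of V that is visible from the interior of that triangle. The constrained Delaunay triangulation is thus a triangulation that is as close as possible to the Delaunay triangulation. Figure 2.6 shows an example of a constrained Delaunay triangulation.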
2.2 Data Structure
Figure 2.7b shows the data structure of the triangulation shown in Figure 2.7a. Throughout the DT, CDT and quality mesh phases, we frequently need to walk from triangle to triangle, or around the triangle fan of a vertex. We therefore maintain two data structures. First, for each triangle, we always maintain the links to its three neighbors. The three vertices of a triangle are indexed 0, 1 and 2. The neighbors of a triangle are also indexed 0, 1 and 2, such that the neighbor with index i is opposite the vertex with the same index. Second, for each vertex, we maintain a linked list of all triangles incident to that vertex.
Whenever the triangulation changes, we have to keep these data structures up to date.
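The two structures described above can be built from a plain triangle list in a single pass. The sketch follows the stated convention that neighbor i lies across the edge opposite vertex i. Some details are assumptions for illustration. Plain Python lists stand in for the GPU arrays and linked lists, -1 marks a missing neighbor on the boundary, and the function name is hypothetical.

```python
def build_adjacency(triangles, num_points):
    """Return (neighbors, vertex_triangles) for a list of (v0, v1, v2) triangles.

    neighbors[t][i] is the triangle across the edge opposite vertex i of t,
    or -1 when that edge is on the boundary; vertex_triangles[v] lists the
    triangles incident to vertex v (its triangle fan).
    """
    neighbors = [[-1, -1, -1] for _ in triangles]
    vertex_triangles = [[] for _ in range(num_points)]
    pending = {}  # undirected edge -> (triangle, local index) still awaiting its twin
    for t, tri in enumerate(triangles):
        for i in range(3):
            vertex_triangles[tri[i]].append(t)
            edge = frozenset((tri[(i + 1) % 3], tri[(i + 2) % 3]))  # edge opposite vertex i
            if edge in pending:
                u, j = pending.pop(edge)
                neighbors[t][i] = u
                neighbors[u][j] = t
            else:
                pending[edge] = (t, i)
    return neighbors, vertex_triangles
```

For two triangles (0, 1, 2) and (1, 3, 2) sharing edge 1-2, each triangle becomes the other's neighbor at the index of the vertex opposite the shared edge. Vertices 1 and 2 each list both triangles in their fan.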
Like Triangle, all triangles in our algorithm are oriented triangles. The three vertices of an oriented triangle are its origin vertex, destination vertex and apex vertex, respectively. An oriented triangle includes a pointer to a triangle and an orientation. There are three possible orientations for a triangle abc, that is, abc, bca and cab (all are defined in counterclockwise order). Suppose the orientation of triangle abc is abc; that means a is the apex vertex of the triangle, b is the origin vertex, and c is the destination vertex (see Figure ). In addition, we use an edge to denote the orientation of a triangle, and use the oriented triangle to denote an edge of the triangle. For example, the oriented triangle abc denotes the edge bc of the triangle, and edge bc denotes the oriented triangle abc.
Figure 2.6: An example of a CDT of a PSLG which consists of points and one segment (red line). No visible point is inside the circumcircle of any triangle in the triangulation.
Figure 2.7: (a) A triangulation. (b) Data structure of the triangulation.
Operation        Result  Usage
sym(abc)         bad     Find the abutting triangle sharing the edge ab/ba
lnext(abc)       bca     Find the next edge (counterclockwise) of a triangle
lprev(abc)       cab     Find the previous edge (clockwise) of a triangle
onext(abc)       acf     Find the next edge counterclockwise with the same origin
oprev(abc)       adb     Find the next edge clockwise with the same origin
dnext(abc)       dba     Find the next edge counterclockwise with the same destination
org(abc)         a       Find the origin vertex
dest(abc)        b       Find the destination vertex
apex(abc)        c       Find the apex vertex
bond(abc, bad)   -       Bond two triangles sharing an edge ab/ba
Table 2.1: Primitive operations applied to the triangulation in Figure 2.7a.
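As an illustration of the primitives in Table 2.1, the sketch below encodes an oriented triangle as a pair (t, k) of a triangle index and an orientation k in {0, 1, 2}, and follows the table's convention that org, dest and apex of abc are a, b and c. The pair encoding, the `Mesh` class and the `label` helper are assumptions made for this example, loosely modeled on Triangle's oriented triangles rather than on its actual source. onext, oprev and dnext are composed from sym, lnext and lprev.

```python
class Mesh:
    """Counterclockwise triangles tris[t] = [v0, v1, v2] with neighbor links.

    nbr[t][i] is the triangle across the edge opposite vertex i, or -1.
    """
    def __init__(self, tris):
        self.tris = tris
        self.nbr = [[-1, -1, -1] for _ in tris]

# Oriented triangle (t, k): org = v[k], dest = v[k+1], apex = v[k+2].
def org(m, ot):
    t, k = ot
    return m.tris[t][k]

def dest(m, ot):
    t, k = ot
    return m.tris[t][(k + 1) % 3]

def apex(m, ot):
    t, k = ot
    return m.tris[t][(k + 2) % 3]

def lnext(ot):
    t, k = ot
    return (t, (k + 1) % 3)

def lprev(ot):
    t, k = ot
    return (t, (k + 2) % 3)

def sym(m, ot):
    """Abutting oriented triangle on edge org-dest, traversed in reverse."""
    t, k = ot
    s = m.nbr[t][(k + 2) % 3]  # the neighbor opposite the apex
    if s < 0:
        return None
    a, b = org(m, ot), dest(m, ot)
    for j in range(3):
        if m.tris[s][j] == b and m.tris[s][(j + 1) % 3] == a:
            return (s, j)
    raise ValueError("inconsistent adjacency")

def onext(m, ot):
    return sym(m, lprev(ot))

def oprev(m, ot):
    s = sym(m, ot)
    return None if s is None else lnext(s)

def dnext(m, ot):
    s = sym(m, ot)
    return None if s is None else lprev(s)

def bond(m, ot1, ot2):
    """Glue two oriented triangles sharing an edge (ab in ot1, ba in ot2)."""
    (t1, k1), (t2, k2) = ot1, ot2
    m.nbr[t1][(k1 + 2) % 3] = t2
    m.nbr[t2][(k2 + 2) % 3] = t1

def label(m, ot):
    return f"{org(m, ot)}{dest(m, ot)}{apex(m, ot)}"

# Triangles abc, bad (across ab) and acf (across ca), as in Table 2.1.
m = Mesh([["a", "b", "c"], ["b", "a", "d"], ["a", "c", "f"]])
bond(m, (0, 0), (1, 0))  # abc and bad share ab/ba
bond(m, (0, 2), (2, 0))  # cab and acf share ca/ac
abc = (0, 0)
print(label(m, sym(m, abc)), label(m, onext(m, abc)),
      label(m, oprev(m, abc)), label(m, dnext(m, abc)))  # bad acf adb dba
```

Keeping the orientation inside the handle makes lnext and lprev pure index arithmetic; only sym has to consult the neighbor links.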
In the algorithms introduced in the next two chapters, every point, constraint and triangle has a unique index. For example, if there are n points in all, then each point has an index number ranging from 0 to n − 1.
CUDA is a parallel programming model and a software environment for parallel computing. In the CUDA model, the CPU is the host and is connected to one or more GPU devices, each consisting of several streaming multiprocessors (SMs). Each SM is composed of many streaming processors (SPs), for example, eight SPs per SM. The CUDA programming language is an extension to C and C++ with some extra syntax. Programmers define parallel kernels on the GPU using the CUDA programming language.
A typical workflow of a CUDA program is as follows:
1. Allocate memory on the GPU.
2. Copy the data from the CPU to the GPU.
3. Configure the threads: choose the block and grid dimensions for the problem.
4. Launch the threads.
5. Synchronize the CUDA threads to ensure that the GPU has completed all its tasks before doing further operations on the GPU memory.
6. After the threads have finished, copy the data back from the GPU to the CPU.
7. Free the GPU memory.
In all, the GPU is massively multithreaded, with hundreds of processors. To effectively utilize the GPU, it is desirable to have tens of thousands of threads at any given time. As such, we keep in mind the following two design principles in developing algorithms for the GPU.
First, the GPU is most suitable for data-parallel computation, in which the same operation is performed on multiple elements of data by many threads. Therefore, we need to make our thread code (or algorithm) as simple (with little control flow) and as uniform (with a similar amount of work across the various threads) as possible.
Second, with so many threads, conflicting operations among threads, data dependencies and synchronization are serious problems. To mitigate this, we usually employ some simple tricks to break the set of tasks into several groups, within which the tasks can be done independently with no or little conflict. Figure 2.8 shows a possible conflict in parallel processing when inserting points into the triangulation. In this figure, point a is supposed to be inserted on the edge adjacent to triangles A and C; point b is supposed to be inserted on the edge adjacent to triangles A and B; and point c should be inserted on the edge adjacent to triangles B and C. Obviously, if more than one point needs to be inserted into the same triangle, only one point can be processed at a time. Moreover, if a point is inserted on an edge, it changes two triangles in the triangulation, so we need to make sure that no other thread is trying to update these triangles. In order to resolve the conflicts between insertions, we break the insertions into several rounds. In each round, we want to insert as many points as possible without any conflict. An array X with one entry per triangle of the triangulation is needed, and in every round we do the following two steps:
Figure 2.8: Possible conflict in parallel processing when inserting points into the triangulation.
1. Locate the triangle or edge which contains the point p (here p also denotes the index of the point). If p is contained in a triangle t, mark the triangle with X[t] = min(X[t], p). If p lies on an edge shared by two triangles t1 and t2, mark both triangles with X[t1] = min(X[t1], p) and X[t2] = min(X[t2], p). For the example shown in Figure 2.8, we mark three triangles as: X[A] = min(X[A], a), X[A] = min(X[A], b), X[B] = min(X[B], b), X[B] = min(X[B], c), X[C] = min(X[C], c), X[C] = min(X[C], a). In CUDA, when several threads try to write to X[t], it is guaranteed that one of them will succeed. The marking is done using the atomic minimum operation, which is readily available on GPUs.
2. Use the array X to decide whether to insert p in this round or not: p can be processed in this round if all the marks written by it (either one or two) are not overwritten. For the example shown in Figure 2.8, only the point with the smallest index among a, b and c will be inserted in this round.
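The two steps can be simulated sequentially to see which points survive a round. In this sketch, which is our own illustration rather than the thesis's kernel code, each pending point is given by its index together with the tuple of triangles it would modify; Python's `min` stands in for CUDA's `atomicMin`, and the name `select_round` is hypothetical.

```python
import math

def select_round(num_triangles, insertions):
    """Simulate one round of the two-step conflict resolution.

    insertions maps a point index p to the triangles that inserting p modifies:
    one triangle if p lies inside it, two if p lies on their shared edge.
    Returns the sorted indices of the points that may be inserted this round.
    """
    X = [math.inf] * num_triangles
    # Step 1: every point marks its triangles with the minimum point index.
    for p, tris in insertions.items():
        for t in tris:
            X[t] = min(X[t], p)  # atomicMin(&X[t], p) on the GPU
    # Step 2: p wins only if none of its marks was overwritten by a smaller index.
    return sorted(p for p, tris in insertions.items()
                  if all(X[t] == p for t in tris))

# Figure 2.8: triangles A, B, C = 0, 1, 2; points a, b, c = 0, 1, 2,
# each lying on an edge shared by two of the triangles.
A, B, C = 0, 1, 2
print(select_round(3, {0: (A, C), 1: (A, B), 2: (B, C)}))  # [0]
```

Losing points are simply retried in the next round. Since the smallest pending index always keeps all of its marks, every round inserts at least one point, so the rounds make progress.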
We implement our algorithms using the CUDA programming model by NVIDIA [NBGS08].
DDR3 RAM and an NVIDIA GTX580 Fermi with 3GB of video memory. The CPU and GPU used here were both top-of-the-line at the time of the experiments. Visual Studio 2008 and the CUDA 5.5 Toolkit are used to compile all the programs, with all optimizations enabled. In the thesis we compare the results from our algorithms with the results from the best-known sequential triangle mesh generators, Triangle [She96a] and CGAL [CGA11].
Triangle: A 2D mesh generator which can generate DT, CDT, conforming Delaunay triangulations, Voronoi diagrams, and high-quality triangle meshes. It has thousands of users, with applications ranging from radiosity rendering and terrain databases to stereo vision and image orientation, as well as dozens of variants of finite element methods.
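CGAL: The goal of the CGAL Open Source Project is to provide easy access to efficient and reliable geometric algorithms in the form of a C++ library. CGAL is used in various areas needing geometric computation, such as computer graphics, scientific visualization, computer aided design and modeling, geographic information systems, molecular biology, medical imaging, robotics and motion planning, mesh generation, numerical methods, and so on.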
According to our experiment results, Triangle usually runs faster than CGAL, especially on the Delaunay refinement problem and the CDT problem. So in the thesis, we always compare our algorithms with Triangle unless otherwise stated.
A GPU Algorithm for Delaunay Refinement
In this chapter, a Delaunay refinement algorithm termed GPU-QM is proposed, which is, so far, the first GPU solution for improving the mesh quality of the DT of a set of points. According to the experiments on synthetic and real-world datasets, our GPU algorithm achieves a 2 to 3 times speedup over Triangle. The quality of the mesh generated by our algorithm is very similar to that of the mesh generated by Triangle: less than 1% more Steiner points are inserted by our algorithm. Furthermore, we propose a variant algorithm, GPU-QM-V, based on the GPU-QM algorithm. In this variant, a new definition for the minimum separation bound between any two Steiner points is used. With the GPU-QM-V algorithm, we almost double the performance (4 to 5.5 times speedup) with no more than 4% more Steiner points compared to Triangle. In addition, both the GPU-QM and GPU-QM-V algorithms are guaranteed to terminate and are numerically robust.
In this chapter, we use B as the upper bound of the edge ratio (the circumradius-to-shortest-edge ratio), and any triangle whose edge ratio is bigger than B is a bad triangle. In the following, we first introduce some related works in Section 3.1. In Section 3.3, a basic version of the GPU-QM algorithm is presented. In this algorithm, we ignore the boundary refining problem, because handling boundary refining requires several special mechanisms which would divert attention away from the main framework of the GPU-QM algorithm. In Section 3.4, we add several mechanisms to the basic version, such that the boundary edges can be handled. In Section 3.5, GPU-QM-V, the variant of GPU-QM, is presented. All implementation details and key GPU techniques used in the algorithms are shown in Section 3.6. In Section 3.7, we compare our results with the fastest sequential implementation on both synthetic and real-world data. Finally, Section 3.8 concludes the chapter.
3.1 Related Works
As mentioned before, the central question of any Delaunay refinement algorithm is where the next point should be inserted. A reasonable answer is that the new point should be inserted as far from other vertices as possible. According to the technique used to choose Steiner points, there are three different strategies/algorithms, termed circumcenter-insertion, off-center-insertion and sink-insertion, respectively. These three strategies are introduced in the following. In addition, we review some parallel Delaunay refinement algorithms at the end.
In this kind of algorithm, the main operation is to insert a point at the circumcenter of a bad triangle (see Figure 3.1), and to maintain the DT by using Bowyer-Watson's algorithm to re-triangulate the prestar of the new point, or by using Lawson's algorithm to do edge-flips on non-locally Delaunay edges.
Figure 3.1: Any triangle whose edge ratio is larger than B is split by inserting its circumcenter. Every new edge has length at least B times that of the shortest edge of the bad triangle. Left: before point insertion. Right: after point insertion.
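The quantities used above can be computed directly. The sketch below uses hypothetical helper names and plain floating point without exact predicates. It computes a triangle's circumcenter and its circumradius-to-shortest-edge ratio, tests the ratio against B, and converts B into the minimum angle it guarantees. By the law of sines the shortest edge is ℓ_min = 2R sin θ_min, so a ratio of at most B is equivalent to θ_min ≥ arcsin(1/(2B)). For B = 1 this gives 30°.

```python
import math

def circumcenter(a, b, c):
    """Circumcenter of triangle abc (points as (x, y) tuples)."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy)

def edge_ratio(a, b, c):
    """Circumradius-to-shortest-edge ratio of triangle abc."""
    radius = math.dist(circumcenter(a, b, c), a)
    shortest = min(math.dist(a, b), math.dist(b, c), math.dist(c, a))
    return radius / shortest

def is_bad(a, b, c, B):
    """A triangle is bad when its edge ratio exceeds the bound B."""
    return edge_ratio(a, b, c) > B

def min_angle_for(B):
    """Smallest angle (degrees) guaranteed when every edge ratio is at most B."""
    return math.degrees(math.asin(1.0 / (2.0 * B)))

print(is_bad((0.0, 0.0), (1.0, 0.0), (0.5, 10.0), 1.0))  # True: a skinny triangle
print(round(min_angle_for(1.0), 6))                      # 30.0
```

The same reasoning explains the caption of Figure 3.1: no vertex lies inside the circumcircle of the bad triangle, so every new edge from the circumcenter has length at least the circumradius, which exceeds B times the shortest edge.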
There are many Delaunay refinement algorithms in the literature that use this strategy to insert Steiner points. Ruppert's Delaunay refinement algorithm [Rup95] and Chew's Delaunay refinement algorithm [Che89b] are the best-known ones; they perform well in practice and have provable guarantees on both mesh quality and termination. Most other algorithms ([She00, Ü09, MPW03, EG01]) follow the ideas of these two algorithms. Here we only detail Ruppert's algorithm.
A segment is said to be encroached if a point other than its endpoints lies on or inside its diametral circle. Any encroached segment should be split into two segments by inserting a point at its midpoint. For example, in Figure 3.2, point p is inside the diametral circle of segment s, i.e., s is encroached by p. After splitting s at its midpoint with point v1, the segment s becomes two segments. If the left segment of s is still encroached by p, the left segment should be split again by inserting a point at its midpoint v2.
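The encroachment test and the repeated midpoint split of Figure 3.2 can be sketched as follows. By Thales' theorem, a point p lies on or inside the diametral circle of segment ab exactly when the angle apb is at least 90°, i.e. when (a − p)·(b − p) ≤ 0. The sketch only considers a single encroaching point p and does not update any triangulation; the function names and the `max_splits` guard are hypothetical.

```python
def encroaches(p, a, b):
    """True if p (not an endpoint) lies on or inside the diametral circle of ab."""
    if p == a or p == b:
        return False
    return (a[0] - p[0]) * (b[0] - p[0]) + (a[1] - p[1]) * (b[1] - p[1]) <= 0

def split_until_unencroached(a, b, p, max_splits=64):
    """Split segment ab at midpoints until no subsegment is encroached by p.

    Assumes p does not lie on ab itself; max_splits guards against that case.
    Returns the subsegments in order from a to b.
    """
    stack, result, splits = [(a, b)], [], 0
    while stack:
        s, e = stack.pop()
        if encroaches(p, s, e):
            splits += 1
            if splits > max_splits:
                raise RuntimeError("too many splits; does p lie on the segment?")
            m = ((s[0] + e[0]) / 2, (s[1] + e[1]) / 2)
            stack.append((m, e))  # right half, processed after the left half
            stack.append((s, m))
        else:
            result.append((s, e))
    return result

# A Figure 3.2-like case: p encroaches s, and after one split also its left half.
segs = split_until_unencroached((0.0, 0.0), (4.0, 0.0), (0.5, 0.6))
print(segs)  # subsegments [0, 1], [1, 2] and [2, 4] along the x-axis
```

Only the halves that remain encroached are split again, which is why s ends up with a short left part near p and a long untouched right part.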
Ruppert's algorithm starts from the DT for a given PSLG. If there is any encroached segment, it is split at its midpoint and the DT is updated immediately, until no segment is encroached. Then we need to check all triangles in the mesh. If there is any bad triangle, it is split by inserting a point at its circumcenter, and the DT is updated. Encroached segments have higher priority than bad triangles, which means that if the circumcenter of a bad triangle encroaches one or more segments, the circumcenter should be rejected and the encroached segments should be split first. The order in which segments or bad triangles are split is arbitrary. Ruppert's algorithm uses Lawson's algorithm [Law77] to maintain the Delaunay property of the triangulation.
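The priority of encroached segments over bad triangles can be sketched for a single triangle. The function name `refine_action`, its tuple-based return value and the brute-force scan over all segments are our own simplifications. A real implementation would find nearby segments through the mesh, and would update the DT after each split or insertion.

```python
import math

def circumcenter(a, b, c):
    """Circumcenter of triangle abc (points as (x, y) tuples)."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy)

def edge_ratio(a, b, c):
    """Circumradius-to-shortest-edge ratio of triangle abc."""
    radius = math.dist(circumcenter(a, b, c), a)
    return radius / min(math.dist(a, b), math.dist(b, c), math.dist(c, a))

def encroaches(p, a, b):
    """True if p (not an endpoint) lies on or inside the diametral circle of ab."""
    if p == a or p == b:
        return False
    return (a[0] - p[0]) * (b[0] - p[0]) + (a[1] - p[1]) * (b[1] - p[1]) <= 0

def refine_action(tri, segments, B):
    """Decide how one triangle is treated under Ruppert's priority rule.

    Returns ("ok", None) if the triangle is not bad; ("split", segs) if its
    circumcenter encroaches segments, which are then split and the
    circumcenter rejected; otherwise ("insert", c) for circumcenter c.
    """
    if edge_ratio(*tri) <= B:
        return ("ok", None)
    c = circumcenter(*tri)
    hit = [s for s in segments if encroaches(c, *s)]
    if hit:
        return ("split", hit)
    return ("insert", c)

seg = [((0.0, 0.0), (10.0, 0.0))]
print(refine_action(((4.0, 0.5), (5.0, 0.5), (4.5, 6.0)), seg, 1.0)[0])    # split
print(refine_action(((4.0, 20.5), (5.0, 20.5), (4.5, 26.0)), seg, 1.0)[0])  # insert
```

Both sample triangles have the same shape. Only the one lying close to the segment has its circumcenter inside the segment's diametral circle, so only that one defers to a segment split.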
Figure 3.2: A segment is split repeatedly until no segment is encroached.
For input in which no angle is less than 60°, …
Besides that, the order of inserting Steiner points is very important. As pointed out in [She96a], a heap of bad triangles indexed by their smallest angle confers a 35% reduction in mesh size over a first-in-first-out queue.
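The heap ordering can be sketched with Python's `heapq` module. Each bad triangle is keyed by its smallest angle, so the worst triangle is split first. The names are hypothetical, and the sketch only computes the initial order; a real refiner would also push the new bad triangles created by each insertion. The example uses a 30° threshold, which corresponds to B = 1.

```python
import heapq
import math

def smallest_angle(a, b, c):
    """Smallest interior angle of triangle abc, in degrees."""
    def angle_at(p, q, r):
        ux, uy = q[0] - p[0], q[1] - p[1]
        vx, vy = r[0] - p[0], r[1] - p[1]
        cosine = (ux * vx + uy * vy) / (math.hypot(ux, uy) * math.hypot(vx, vy))
        return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))
    return min(angle_at(a, b, c), angle_at(b, c, a), angle_at(c, a, b))

def bad_triangle_order(triangles, min_angle):
    """Indices of triangles whose smallest angle is below min_angle, worst first."""
    heap = [(smallest_angle(*tri), tid) for tid, tri in enumerate(triangles)]
    heap = [item for item in heap if item[0] < min_angle]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

tris = [((0, 0), (1, 0), (0.5, math.sqrt(3) / 2)),  # equilateral: not bad
        ((0, 0), (1, 0), (0.5, 3)),                 # smallest angle about 18.9 degrees
        ((0, 0), (1, 0), (0.5, 10))]                # smallest angle about 5.7 degrees
print(bad_triangle_order(tris, 30.0))  # [2, 1]
```

A first-in-first-out queue would instead process triangle 1 before triangle 2, simply because it was found first.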
Recently, Üngör [Ü09] proposed a new type of Steiner point, called the off-center, for Delaunay refinement (see Figure 3.3). Given a bad triangle pqr with shortest edge pq and circumcenter c1, we define the off-center to be the circumcenter c1 if the circumradius-to-shortest-edge ratio of pqc1 is smaller than or equal to the upper bound B. Otherwise, the off-center is the point c on the bisector of pq (and inside the circumcircle of pqr) which makes the circumradius-to-shortest-edge ratio of pqc exactly B. The main difference between this algorithm and Ruppert's algorithm is that this algorithm always splits a low-quality triangle at its off-center rather than at its circumcenter.
The author proves that the new algorithm has the same quality and size optimality guarantees as the best-known Delaunay refinement algorithms. In practice, the new algorithm inserts about 40% fewer Steiner points (and thus runs faster) and generates triangulations that have about 30% fewer elements compared with the best previous algorithms.
Figure 3.3: The off-center and the circumcenter of triangle pqr are labeled c and c1, respectively. (a) If |c1p| ≤ B|pq|, then c = c1. (b) Otherwise, c ≠ c1 and c is obtained by setting |c2p| = B|pq|, where c2 is the circumcenter of triangle pqc.
Additionally, this technique is employed by Triangle, except that Triangle applies a constant factor of 0.95 in the formula of the off-center to improve robustness, i.e., the formula for c in Figure 3.3 is modified to |c2p| = 0.95 × B × |pq|, where B = √2.
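The off-center can be computed in closed form. If triangle pqc has circumradius R = B|pq| and c lies on the bisector of pq, then its circumcenter c2 lies on that bisector at distance h = √(R² − (|pq|/2)²) from the midpoint of pq, and c lies at distance h + R from that midpoint. The off-center is therefore whichever of c1 and this point is closer to the midpoint along the bisector. The sketch below is our own derivation of that geometry, not code taken from Üngör or from Triangle. The `shrink` parameter is an assumption that mimics the 0.95 factor described above, and the function assumes pq is the shortest edge of a bad triangle pqr.

```python
import math

def circumcenter(a, b, c):
    """Circumcenter of triangle abc (points as (x, y) tuples)."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy)

def off_center(p, q, r, B, shrink=1.0):
    """Off-center of a bad triangle pqr whose shortest edge is pq.

    Returns the circumcenter c1 of pqr if the circumradius of triangle p-q-c1
    is at most shrink * B * |pq|; otherwise returns the point c on the
    bisector of pq for which the circumradius of pqc is exactly that value.
    """
    c1 = circumcenter(p, q, r)
    mx, my = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2
    L = math.dist(p, q)
    t1 = math.dist((mx, my), c1)                    # midpoint-to-c1 distance
    dx, dy = (c1[0] - mx) / t1, (c1[1] - my) / t1   # unit vector along the bisector
    R = shrink * B * L                              # target circumradius of pqc
    if R < L / 2:
        raise ValueError("shrink * B must be at least 1/2")
    h = math.sqrt(R * R - (L / 2) ** 2)             # midpoint-to-c2 distance
    t = min(t1, h + R)                              # c1 if close enough, else off-center
    return (mx + t * dx, my + t * dy)

p, q = (0.0, 0.0), (1.0, 0.0)
c = off_center(p, q, (0.5, 10.0), 1.0)   # skinny triangle: the off-center is used
print(c)                                  # about (0.5, 1.866)
print(off_center(p, q, (0.5, 2.5), 1.0))  # c1 is close enough: about (0.5, 1.2)
```

For the skinny triangle the circumcenter c1 ≈ (0.5, 4.99) would create a new triangle with edge ratio well above B. The off-center instead stops at the point where triangle pqc meets the bound exactly.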
3.1.3 Sink-insertion
Edelsbrunner and Guoy [EG01] propose sink-insertion as a new technique to improve the mesh quality of a DT. In 2D, a triangle is a sink triangle if its circumcenter is contained in the interior of the triangle. In addition, the circumcenter of a sink triangle is called a sink. The idea of the algorithm is to eliminate bad triangles by adding sinks as new points to the DT.
Starting from any bad triangle, we need to find a sink to be inserted. For example, for an existing bad triangle t0, we start a walk in the direction of t0's circumcenter to the next triangle t1. Then, starting from t1, we do a similar walk until we find a sink triangle ti. In the end, the circumcenter c of ti is the sink to be inserted by the algorithm. Figure 3.4 illustrates how to find a sink from the bad triangle t0.
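The sink search can be sketched as a walk over the adjacency structure. While the current triangle does not contain its own circumcenter, the walk steps into the neighbor across the edge that faces that circumcenter. This is a minimal sequential illustration under our own assumptions: counterclockwise vertex triples, neighbor i opposite vertex i, floating-point predicates, and a step limit instead of proper handling of degenerate cocircular configurations. `find_sink` is a hypothetical name.

```python
def orient(a, b, c):
    """Twice the signed area of abc; positive if a, b, c are counterclockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def circumcenter(a, b, c):
    """Circumcenter of triangle abc (points as (x, y) tuples)."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy)

def find_sink(pts, tris, nbr, t0, max_steps=10000):
    """Walk from triangle t0 to a sink triangle; return (t, sink) or None."""
    t = t0
    for _ in range(max_steps):
        tri = tris[t]
        c = circumcenter(*(pts[v] for v in tri))
        # side[i] > 0 means c is strictly on the inner side of the edge opposite vertex i.
        side = [orient(pts[tri[(i + 1) % 3]], pts[tri[(i + 2) % 3]], c) for i in range(3)]
        i = min(range(3), key=side.__getitem__)
        if side[i] > 0:
            return t, c          # c is inside t, so t is a sink triangle
        if nbr[t][i] < 0:
            return None          # the walk would leave the mesh boundary
        t = nbr[t][i]            # step across the edge facing the circumcenter
    raise RuntimeError("walk did not terminate")

# An obtuse triangle t0 whose circumcenter lies inside its neighbor t1.
pts = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.3), (1.0, -2.0)]
tris = [(0, 1, 2), (0, 3, 1)]
nbr = [[-1, -1, 1], [-1, 0, -1]]
print(find_sink(pts, tris, nbr, 0))  # (1, (1.0, -0.75))
```

In the example, triangle 0 is obtuse, so its circumcenter (1, −1.52) falls outside it and the walk moves into triangle 1. Triangle 1 contains its own circumcenter (1, −0.75), so that point is the sink.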
The authors claim that sink-insertion achieves about the same mesh quality as the circumcenter-insertion algorithm, but does so in a more economical manner. This is because bad triangles tend to cluster and share sinks. Instead of dealing with a large number of circumcenters, this algorithm only works with a small number of sinks. In addition, the sinks tend to be well separated and thus exhibit fewer dependencies, which is desirable in a parallel implementation.
This sink-insertion algorithm studies the effect of using sinks instead of circumcenters in Delaunay refinement for improving the mesh quality. However, their experiment results show that the switch to sinks surprisingly has little effect on the mesh quality. The authors expected the sink-insertion algorithm to be more efficient than the circumcenter-insertion algorithm, because bad triangles tend to cluster and share sinks; instead of dealing with a large number of circumcenters, they therefore work with a small number of sinks. However, according to our experiment results on both synthetic and real-world data, in practice there are few bad triangles sharing a sink triangle. Even though some bad triangles cluster at the beginning, after one or two point insertion iterations few bad triangles still cluster together. Additionally, when the circumcenters are inserted one by one, one circumcenter insertion may remove more than one bad triangle. In other words, circumcenter-insertion may have the same effect as the sink-insertion method, so the sink-insertion method has little effect on the number of Steiner points. Indeed, we have tried to implement this method in parallel on the GPU, and the experiment results show that there is little difference between this method and circumcenter-insertion.
Figure 3.4: Circumcenter c is the sink. The arrow shows how to find the sink from a bad triangle t0.
LC99, LT01, CN99, STU04] and so on. All of these algorithms employ Ruppert's or Chew's Delaunay refinement algorithm. Although the details of these algorithms are different, most of them employ a simple strategy: at each iteration, they choose a set of independent points to insert into the domain, and then update the DT or CDT. The criteria for choosing the independent set differ for each algorithm, but all of them show that the number of iterations is a function of L and s, where L is the diameter of the domain and s is the length of the smallest edge in the output mesh. Each independent set can be handled by Ruppert's or Chew's method. Furthermore, some of these algorithms can be generalized to three dimensions. All these algorithms follow the same paradigm.
However, in order to fully utilize the GPU hardware, a parallel algorithm usually needs regularized work and localized data access. It is not clear how to achieve these while adopting the same strategy.
Recently, Nasre et al. [NBP13] claimed that their GPU algorithm for Delaunay refinement gains up to 80 times speedup over the sequential implementation. In their algorithm, they try to insert the circumcenters of bad triangles in parallel. Before a circumcenter c is inserted, all triangles whose circumcircles contain c are found and marked. Then they delete the marked triangles and re-triangulate the deleted region after inserting c. When all circumcenters are inserted at the same time, one triangle may be marked by more than one circumcenter. In this situation, only one circumcenter can be inserted, and the other circumcenters have to wait. However, we could not reproduce the results they reported when trying their code from LonestarGPU. We ran their code on uniformly distributed points, as their code requires, but found that the output triangulation is not a DT at all.
3.2 Issues for a GPU Delaunay Refinement Algorithm
Generally, there are several issues that need to be considered when designing a GPU algorithm for Delaunay refinement:
1. How to choose Steiner points in parallel? Should we simply insert the circumcenters of all bad triangles, or only a part of them?
2. How to ensure that all Steiner points obtained from the first question are pairwise independent? Or is it too aggressive to require all Steiner points to be pairwise independent? Is there any other way to define the minimum separation between any two Steiner points?
3. How to efficiently simulate on the GPU the priority queue used in the sequential implementation?
4. How to handle degenerate cases to get robust results?
5. How to handle boundary refining, and make sure the algorithm terminates?
The last but not least issue is how to make the GPU algorithm run fast. In the next section, we present the basic algorithm of GPU-QM, which is a framework of the whole algorithm and solves all the issues mentioned above except the fifth one. Handling the fifth issue requires several mechanisms, which are introduced in Section 3.4. In Section 3.5, we revisit the second issue to make the GPU-QM algorithm run faster.
In this section we focus on the basic algorithm, i.e., the framework of the GPU-QM algorithm. The input is a stable subcomplex of a DT for a set of points S, which by definition is a subcomplex whose circumcenters all lie in the interior of its underlying space. We use B = 1 as the upper bound of the edge ratio, so in the output DT all angles are larger than or equal to 30°.
Although there are many algorithms we could use to generate the initial DT of S, since our algorithm is a GPU algorithm, we choose the GPU algorithm of [QCT12] to generate the DT for S. The mechanisms for refining the boundary are introduced in Section 3.4.
3.3.1 Motivation and algorithm overview
Recall the three steps of a sequential Delaunay refinement algorithm. Let us consider the normal case with no boundary. For a bad triangle t, we first compute its circumcenter c, and then find the container triangle of c (point location). Then we insert c into the mesh and maintain the mesh as a DT by using edge-flips. These steps are repeated until no more bad triangles exist. If we simulate the sequential algorithm directly on the GPU, one naïve method can be designed as follows:
Naïve method 1: Let one thread handle one triangle. For any bad triangle t, compute its circumcenter c, and do point location for c to find its container triangle. Then insert all these circumcenters into the mesh, using atomic operations to avoid inserting more than one Steiner point into the same triangle. In the end, do edge-flips in parallel to maintain the DT.
When a Steiner point p is inserted, a region near p must be modified; this region is called the influence region of p. For a Steiner point p, its influence region is the interior of its star in the DT after inserting p, or equivalently the interior of its prestar in the DT before inserting p. When many Steiner points are inserted in parallel, their influence regions may overlap. Points whose influence regions are pairwise disjoint are pairwise independent.
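To make the independence condition concrete, the sketch below represents the influence region of a candidate point p by its prestar. The prestar is taken to be the set of triangles whose circumcircles strictly contain p, which is the cavity that Bowyer-Watson's algorithm re-triangulates. The sketch then checks that the prestars of a group of candidates are pairwise disjoint. The function names are hypothetical. The in-circle test is the standard determinant evaluated in plain floating point without exact arithmetic, and comparing triangle sets is a simplification of comparing regions.

```python
def in_circle(a, b, c, d):
    """Positive iff d is strictly inside the circumcircle of counterclockwise abc."""
    adx, ady = a[0] - d[0], a[1] - d[1]
    bdx, bdy = b[0] - d[0], b[1] - d[1]
    cdx, cdy = c[0] - d[0], c[1] - d[1]
    return ((adx * adx + ady * ady) * (bdx * cdy - cdx * bdy)
            + (bdx * bdx + bdy * bdy) * (cdx * ady - adx * cdy)
            + (cdx * cdx + cdy * cdy) * (adx * bdy - bdx * ady))

def prestar(pts, tris, p):
    """Indices of the triangles whose circumcircles strictly contain p."""
    return {t for t, (i, j, k) in enumerate(tris)
            if in_circle(pts[i], pts[j], pts[k], p) > 0}

def pairwise_independent(pts, tris, steiner):
    """True if the prestars of all candidate points are pairwise disjoint."""
    covered = set()
    for p in steiner:
        region = prestar(pts, tris, p)
        if region & covered:
            return False
        covered |= region
    return True

# Two adjacent 4x4 squares, each split into two counterclockwise triangles.
pts = [(0, 0), (4, 0), (4, 4), (0, 4), (8, 0), (8, 4)]
tris = [(0, 1, 2), (0, 2, 3), (1, 4, 5), (1, 5, 2)]
print(pairwise_independent(pts, tris, [(2, 2), (6, 2)]))  # True
print(pairwise_independent(pts, tris, [(2, 2), (4, 2)]))  # False
```

The points (2, 2) and (6, 2) each disturb only the triangles of their own square, so they can be inserted in the same round. The point (4, 2) lies inside every circumcircle and therefore conflicts with both.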
This method may lead to many more Steiner points than the sequential implementation produces, and even to infinite loops. For example, assume p and q are two Steiner points inserted in the same round for eliminating bad triangles tp and tq. If p and q are not independent of each other, their prestars partially or fully overlap. In an extreme case, when their prestars fully overlap, inserting