
A 2D GPU Mesh Generator


Document information: 104 pages, 5.65 MB.


Shandong University, China

A THESIS SUBMITTED FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

DEPARTMENT OF COMPUTER SCIENCE

NATIONAL UNIVERSITY OF SINGAPORE

2014


... have been used in the thesis.

This thesis has also not been submitted for any degree in any university previously.

Qi Meng

November 2014


I would like to express my gratitude to all those who helped me during the writing of this thesis.

My deepest gratitude goes first and foremost to Dr. Tan Tiow-Seng, my supervisor, who has offered me valuable suggestions in my studies. He has spent much time with me and provided me with inspiring advice. Without his illuminating instruction, insightful criticism and expert guidance, the completion of this thesis would not have been possible.

I am very grateful to Cao Thanh-Tung and Gao for selflessly sharing their knowledge and time, for the hundreds of deep discussions we have had, and for sitting beside me to help me out of the endless code optimizations.

I would like to thank Prof. Low Kok-Lim, Prof. Alan Cheng Ho-Lun, and Prof. Huang Zhiyong for their comments and suggestions on my research during the weekly meetings of the G3 Lab.

Not forgetting to thank all my fellow lab-mates who offered me great help in many ways. Thank you, my friends Hua Binh-Son, Cao Thanh-Tung, Gao, Guo Jiayan, Li Ruoru, Yan Ke, Liu Linlin, Tang Ke, Su Jun, Alvin Chia, Lai Kuan, Poonna Yospanya, Kang Juan, Sang Ngo Le, Li Yunzhen and Du Cheng; you have enriched my life in NUS, making it more enjoyable and fun.

Last but not least, my gratitude goes to my beloved family for their love and for supporting me without any complaint.

There are several triangle mesh generators that generate Delaunay triangulations, constrained Delaunay triangulations, conforming Delaunay triangulations and quality triangulations on the CPU. However, there is no similar generator for the GPU.

The GPU has been used not only for graphics processing tasks but also for general-purpose computation, owing to its enormous parallel power. In computational geometry, early works on the GPU computed the digital Voronoi diagram and the Delaunay triangulation. There has been no prior research on efficiently generating other triangle meshes, such as constrained Delaunay triangulations, conforming Delaunay triangulations and quality triangulations, using the parallel power of the GPU. The GPU is massively multithreaded, with hundreds of processors. In order to fully utilize the GPU hardware, a parallel algorithm usually needs to have regularized work and localized data access. However, it is not even clear how to achieve these while adapting traditional parallel techniques to the mesh generation problem, so it is not clear how to efficiently apply traditional parallel algorithms on the GPU.

In this thesis, we focus on designing 2D mesh generation algorithms on the GPU. Two algorithms, termed GPU-QM and GPU-CDT, are proposed: GPU-QM improves the quality of the Delaunay triangulation of a point set, and GPU-CDT computes the constrained Delaunay triangulation of a set of points and constraints. Both are the first GPU algorithms proposed so far for these problems. According to our experiments on both synthetic and real-world data, our GPU algorithms are numerically robust and run faster than the fastest sequential algorithm. Compared to the fastest sequential implementation, GPU-QM gains up to 5.5 times speedup; GPU-CDT gains up to

two orders of magnitude speedup. Furthermore, we obtain the first GPU mesh generator by integrating the GPU-QM and GPU-CDT algorithms with an existing work, GPU-DT, which computes the Delaunay triangulation of a point set using the GPU. Our mesh generator can compute digital Voronoi diagrams, Delaunay triangulations, constrained Delaunay triangulations, conforming Delaunay triangulations and high-quality triangle meshes in 2D on the GPU. In order to handle numerical error and degenerate input, we implement the simulation of simplicity method on the GPU, based on the sequential implementations. Thus our generator can handle degeneracies and is numerically robust.

List of Figures viii

3.2 Issues for a GPU Delaunay Refinement Algorithm 26

3.4 Mechanisms for GPU-QM to Handle Boundary Refining 36

3.6.1 Dealing with variable-size arrays 46

1.1 (a) An example of DT of a set of points. (b) No point is inside the circumcircle of any triangle in the triangulation

1.2 (a) An example of CDT of a set of points and constraints. (b) No visible point is inside the circumcircle of any triangle in the triangulation 3

1.4 Two kinds of skinny triangles whose circumradii are much larger than their shortest edges. (a) Needle, whose longest edge is much longer than its shortest edge. (b) Cap, whose maximum angle is close to 180°

2.2 A PSLG of points and segments. The radius of each disk illustrated here is the local feature size of the point at its center

2.3 An example of DT of a set of points. No visible point is inside the circumcircle of any triangle in the triangulation

2.4 (a) Star of p; the bold edges bounding the shaded triangles form the link of the star

2.5 Flipping ab to cd. Before the flip, ab is not locally Delaunay, and the union of the two triangles abc and abd is convex. After the flip, cd is locally Delaunay 14

2.6 An example of CDT of a PSLG consisting of points and one constraint (red line). No visible point is inside the circumcircle of any triangle in the triangulation 16

2.7 (a) A triangulation. (b) Data structure of the triangulation. An oriented

2.8 Possible conflicts in parallel processing when inserting points into the triangulation 19

3.1 Any triangle whose circumradius-to-shortest-edge ratio is larger than B is split by inserting its circumcenter. Every new edge has length at least B times that of the shortest edge of the bad triangle. Left: before point insertion. Right: after

3.3 The off-center and the circumcenter of triangle pqr are labeled c and $c_1$, respectively. (a) If $|c_1 p| \le B|pq|$, then $c = c_1$. (b) Otherwise, $c \ne c_1$, and c is obtained by setting $|c_2 p| = B|pq|$, where $c_2$ is the circumcenter of triangle pqc 24

3.4 Off-center c is the sink. The arrow shows how to find the sink from a bad

3.5 1-3 flip (inserting d into the triangle abc) and 3-1 flip (removing d from the

3.7 Given a non-extreme point v: to remove v, we flip ve and vc to reduce the degree

3.8 A … generates a shorter edge, and leads to an infinite loop 37

3.9 1-2 flip (inserting midpoint d on the edge ab) and 2-1 flip (removing the

3.10 An example shows that if p does not encroach upon boundary edge ab, other points

… and b′; ab is an edge shared by the star of x and y 45

3.17 Our solution is to re-use empty slots in the point/triangle list 47

3.18 … distribution. (a) Input DT mesh. (b) Output quality mesh 50

3.19 Total running time and speedup over Triangle on different distributions by using

3.20 … distribution: one intermediate result after several iterations. Bad triangles

3.21 Mesh quality comparison between Triangle (left) and GPU-QM (right) on different distributions. (a)(b) Uniform distribution (c)(d) Gaussian distribution (e)(f) Disk distribution (g)(h) Circle distribution (i)(j) Grid

3.22 Comparison on the number of output points among Triangle with priority queue, GPU-QM with priority, and GPU-QM without priority on uniform (left) and grid

3.23 The running time of different steps of the GPU-QM algorithm for 1 million points

3.24 Comparison on the number of flips between GPU-QM and Triangle on different distributions. (a) Uniform (b) Gaussian (c) Disk (d) Circle (e) Grid 56

3.25 Comparison on the number of … between GPU-QM and GPU-QM-V on different distributions. (a) Uniform (b) Gaussian (c) Disk (d) Circle (e) Grid 58

3.26 Comparison on the number of flips for three strategies, GPU-QM, orthogonal-test (i.e., GPU-QM-V) and no-Deletion, on different distributions. (a) Uniform (b) Gaussian

3.27 The running time and speedup of GPU-QM-V over Triangle for different

3.28 Comparison on the number of … for uniform distribution with and without

3.30 A raster image and its quality mesh, in which the red points are the input points

4.1 Steps of the GPU-CDT algorithm. (a) Input PSLG; (b) Step 1: Triangulation; (c) Step 2: Constraints insertion; (d) Step 3: Edge flipping

4.2 (a) Find the first triangle A intersected by the constraint (red line); the yellow triangle is the first triangle incident to a in the vertex array. (b) How to find

4.3 Configurations of a triangle pair in a constraint (drawn in dashed line)

4.4 Flipping of a triangle pair involving A. The constraint pq intersects the triangles from left to right. (a) Case 1a (b) Case 1b (c) Case 2 (d) Case

4.5 (a) When the triangle pairs in the constraint $c_i = ab$ are only either double-in or concave, there exists a flippable pair (A, C). (b) B, A and C fulfill Case 3a (c) B, A and C fulfill Case 2 (d) B, A and C fulfill Case 3b 74

4.7 Push the concave pair towards the right end of the constraint using a flipping

4.8 A synthetic dataset (left) and its Delaunay triangulation (right) 77

4.9 Speedup over Triangle when computing the CDT, (a) with 1M constraints and varying the number of points, and (b) with 10M points and varying the number

4.10 Total number of constraint-triangle intersections with different grid sizes, (a) with 1M constraints and varying the number of points, and (b) with 10M points

4.11 Running time for different steps for CDT, (a) with 1M constraints and varying the number of points, and (b) with 10M points and varying the

4.12 Comparison with Triangle on the total number of flips when inserting constraints, (a) with 1M constraints and varying the number of points, and (b) with

4.15 A raster image (top) and the CDT for its edge map (bottom) 83

2.1 Primitive operations on the triangulation in Figure 2.7a 17

3.1 … redundant point, non-Delaunay and Delaunay flips 33

3.2 Mark … triangles for redundant point, and do edge-flip for triangle pairs

Meshes composed of triangles are used in various applications, such as interpolation, the finite element method, and terrain databases. Although there are several triangle mesh generators on the CPU, such as Triangle, CGAL and so on, which generate Delaunay triangulations (DT), constrained Delaunay triangulations (CDT), conforming Delaunay triangulations and quality triangulations, as far as we know there is no such generator for the graphics processing unit (GPU). Recently, the GPU, with its enormous parallel power, has been widely used in many applications for general-purpose computing. The GPU uses a massively parallel architecture with hundreds to thousands of processing elements to run thousands to millions of threads simultaneously, so issues in parallel programming, such as conflicting operations among threads, become more serious problems. In order to fully utilize the GPU hardware, a parallel algorithm usually needs to have regularized work and localized data access. It is hard, and not even clear how, to achieve this while adapting traditional parallel techniques to the mesh generation problem. In this thesis, we design 2D CDT and quality mesh generation algorithms that are suitable for parallel computing, especially on the GPU.

Furthermore, we obtain a 2D mesh generator on the GPU by integrating an existing DT algorithm. According to our experimental results on both synthetic and real-world data, our mesh generator is numerically robust and runs much faster than existing sequential algorithms. Compared to the fastest sequential implementation, our quality mesh algorithm runs up to 5.5 times faster, while our CDT algorithm runs up to two orders of magnitude faster.

For the DT, there are several GPU algorithms and implementations, such as [RTCS08, QCT12, QCT13, CNGT14]. Usually, the generation of a conforming Delaunay triangulation is combined with quality mesh generation by requiring all edges in the mesh to be Delaunay edges, so we do not consider conforming Delaunay triangulation separately here.

In this thesis, we address the other two mesh generation problems listed above, i.e., CDT and quality triangulation. Our CDT algorithm can handle any planar straight line graph (PSLG) as input, while the input to our quality mesh generation algorithm is a segment-bounded DT of a set of points with no small angle between any two adjacent segments. We restrict the input of the quality mesh generation algorithm because the general problem is hard in both theory and practice: no algorithm or implementation is guaranteed to terminate with good quality for all input data. Even for sequential algorithms, it is still a big challenge to handle small angles that come from the input, and all existing algorithms are likely to be foiled by input data with small features, small angles or complicated topologies. Before a general quality triangulation algorithm can be applied to the GPU, thorough preparation in both CPU and GPU techniques is needed, so a quality mesh generation algorithm for an arbitrary PSLG is left to future work. In this thesis, we only consider quality mesh generation for a segment-bounded DT of a set of points in which no angle between any two input segments is less than 60°.

In the following, background knowledge related to mesh generators and the GPU architecture is introduced. At the end of this chapter, the contributions of the thesis are highlighted.

The DT is named after Boris Delaunay for his work on this topic from 1934. A DT of a set of points S in a plane is a triangulation such that no point in S is inside the circumcircle of any triangle in the triangulation (see Figure 1.1). This special property of the Delaunay triangulation is also known as the empty circle property. The DT maximizes the minimum angle of all the angles of the triangles in the triangulation. In other words, among all triangulations of a given set of points, the DT has the largest minimum angle.

Due to its property of avoiding long, skinny triangles, the DT has many applications in different fields. For example, in geographic information systems (GIS), one way to model the terrain is to interpolate the data points based on the DT [Gol94]. In path planning, the DT can be used to compute the minimum spanning tree of a set of


Figure 1.1: (a) An example of DT of a set of points. (b) No point is inside the circumcircle of any triangle in the triangulation.

used to build quality meshes for finite element analysis [HDSB01].

The CDT is an extension of the DT in which some edges in the output are prescribed beforehand [Che89a]; these edges are referred to as constraints. Given a set S of n points (or sites) in the 2D plane and a set of constraints, the CDT is a triangulation of S that contains all the constraints while being as close to the DT of S as possible (see Figure 1.2). So the CDT inherits the DT's optimality: among all possible triangulations of a set of points that include all the constraints, the CDT maximizes the minimum angle.

Similar to the empty circle property of the DT, a CDT must fulfill the weaker constrained empty circle property. To state this property, it is convenient to think of constraint edges as walls that block the view. For a CDT, no visible point is inside the circumcircle of any triangle in the triangulation.

Figure 1.2: (a) An example of CDT of a set of points and constraints. (b) No visible point is inside the circumcircle of any triangle in the triangulation.

Figure 1.3: An edge map of an image and its CDT.

are contours in the … of the body's skull; and in … modeling, they are … [Boi88, Tre95, Kal10]. In short, the CDT extends the DT and is very useful in many fields. Figure 1.3 shows an example of applying the CDT to image vectorization (details can be found in Section 4.5.3): an edge map of a raster image (depending on the resolution of the image, the edge map might consist of hundreds of thousands of line segments, i.e., constraints) and the CDT computed for it.

computational geometry. Due to their property of avoiding long, skinny triangles, they have many applications in different fields. However, in real-world applications such as interpolation, the finite element method, and the finite volume method, the triangle meshes produced by the mesh generator should satisfy guaranteed bounds on angles, edge lengths, the number of triangles, and the grading of triangles from small to large size. Neither the DT nor the CDT satisfies such requirements, in theory or in practice.

For example, most mesh generation algorithms take a PSLG as their input. A PSLG is a set of vertices and segments (constraints), as shown in Figure 1.5a. Although the DT maximizes the minimum angle of all the angles of the triangles in the triangulation, there are still some skinny triangles in the mesh. Furthermore, the DT of the vertices may not respect the domain that a user wishes to triangulate. On the other hand, the CDT, as an extension of the DT, conforms to the domain's boundary, but still cannot eliminate skinny triangles in the mesh.

Conformity and skinny triangle deletion: both of these problems can be solved by inserting additional points (or Steiner points, named after Jakob Steiner). The main difficulty is where to place the additional points. Usually, the technique called Delaunay refinement is used to generate triangle meshes. The main problem of Delaunay refinement is to find a triangulation that fits arbitrarily shaped domains and contains only triangles of appropriate size and shape.

Figure 1.4: Two kinds of skinny triangles whose circumradii are much larger than their shortest edges. (a) Needle, whose longest edge is much longer than its shortest edge. (b) Cap, whose maximum angle is close to 180°.

2. All triangles should be relatively "round" in shape: there should be no angles that are too small or too large. Triangles with too small or too large angles are skinny triangles (low-quality or bad triangles; see Figure 1.4). Large angles lead to large errors in the gradients of the interpolated surface and large errors in the finite element method. Small angles can cause the systems of equations that the finite element method yields to be ill-conditioned.

3. The algorithm should offer as much control as possible over the sizes of triangles, since size affects both speed and accuracy. Usually, small triangles offer more accuracy than larger ones, but the computation time is proportional to the number of triangles, so choosing triangle sizes entails trading off speed and accuracy. For a given mesh, it is not difficult to refine it to generate another mesh that contains more triangles, but the reverse process is relatively difficult [MTT97].

Most mesh generation algorithms take a PSLG as their input. During the Delaunay refinement process, a segment is usually divided into smaller edges. The domain of interest, or triangulation domain, is the region that needs to be triangulated. Such a domain should be segment-bounded, meaning that the segments cover the boundary of the triangulation domain. The domain can be either convex or concave, and it may contain holes, but the holes should be bounded by segments.

Figure 1.5: (a) A PSLG, and (b) a mesh generated by Ruppert's Delaunay refinement algorithm.

The main process of Delaunay refinement algorithms is to insert Steiner points until the mesh meets constraints on triangle quality and size, while maintaining a DT or CDT. The Delaunay refinement algorithm has several advantages. First, it maintains the DT, which maximizes the minimum angle among all possible triangulations of a point set and is therefore optimal in this sense. Second, inserting a vertex into a DT is a local operation, which is inexpensive except in unusual cases.

In applications such as the finite element method, there are several measures in use for the quality of a triangle. Usually, a good-quality mesh means that all triangles are non-obtuse, or that all have bounded aspect ratio. The aspect ratio of a triangle is the length of its longest edge divided by the length of its shortest altitude. A fairly general quality measure of a triangle is its minimum angle α: this gives a bound of $\pi - 2\alpha$ on the maximum angle and guarantees an aspect ratio between $1/\sin\alpha$ and $2/\sin\alpha$. The most natural and elegant measure for analyzing Delaunay refinement algorithms is the circumradius-to-shortest-edge ratio (r/l) of a triangle [MTTW95]. The circumcenter and circumradius of a triangle are the center and radius of its circumcircle, respectively. A triangle's circumradius-to-shortest-edge ratio r/l is related to its smallest angle $\theta_{\min}$ by the formula $r/l = 1/(2\sin\theta_{\min})$; this follows from the law of sines, because the shortest edge l is opposite the smallest angle, so $l = 2r\sin\theta_{\min}$. The smaller a triangle's ratio, the larger its smallest angle; i.e., given an upper bound B on the circumradius-to-shortest-edge ratio of all triangles in a mesh, there is no angle smaller than $\arcsin\frac{1}{2B}$ (and vice versa). Clearly, we want to make B as small as possible. For example, if $B = \sqrt{2}$ is employed, all angles are bounded between 20.7° and 138.6°.
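A short numerical check of these relations may help. The C++ sketch below is our own illustration under stated assumptions (plain floating point; hypothetical names `circumcenter`, `radiusEdgeRatio`, `minAngle`, `isBad` and `angleBoundDegrees`). It computes the circumcenter, the circumradius-to-shortest-edge ratio r/l and the smallest angle of a triangle, so that $r/l = 1/(2\sin\theta_{\min})$ and the 20.7° bound for $B = \sqrt{2}$ can be verified directly. A triangle with r/l > B is the kind of bad triangle that Delaunay refinement splits by inserting its circumcenter.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Point { double x, y; };

double dist(Point p, Point q) { return std::hypot(p.x - q.x, p.y - q.y); }

// Circumcenter of a non-degenerate triangle abc (computed relative to a).
Point circumcenter(Point a, Point b, Point c) {
    double bx = b.x - a.x, by = b.y - a.y;
    double cx = c.x - a.x, cy = c.y - a.y;
    double d = 2.0 * (bx * cy - by * cx);
    assert(d != 0.0 && "collinear points have no circumcenter");
    double b2 = bx * bx + by * by, c2 = cx * cx + cy * cy;
    return {a.x + (cy * b2 - by * c2) / d, a.y + (bx * c2 - cx * b2) / d};
}

// Circumradius-to-shortest-edge ratio r/l.
double radiusEdgeRatio(Point a, Point b, Point c) {
    double r = dist(circumcenter(a, b, c), a);
    double l = std::min({dist(a, b), dist(b, c), dist(c, a)});
    return r / l;
}

// Interior angle at p of triangle pqr (law of cosines), in radians.
double angleAt(Point p, Point q, Point r) {
    double pq = dist(p, q), pr = dist(p, r), qr = dist(q, r);
    double cosv = (pq * pq + pr * pr - qr * qr) / (2.0 * pq * pr);
    return std::acos(std::max(-1.0, std::min(1.0, cosv)));
}

double minAngle(Point a, Point b, Point c) {
    return std::min({angleAt(a, b, c), angleAt(b, c, a), angleAt(c, a, b)});
}

// Smallest angle (degrees) guaranteed by the bound r/l <= B: arcsin(1/(2B)).
double angleBoundDegrees(double B) {
    const double kPi = 3.14159265358979323846;
    return std::asin(1.0 / (2.0 * B)) * 180.0 / kPi;
}

// A "bad" triangle in the sense of the text: its ratio exceeds B.
bool isBad(Point a, Point b, Point c, double B) {
    return radiusEdgeRatio(a, b, c) > B;
}
```

For $B = \sqrt{2}$, `angleBoundDegrees` gives about 20.70°, and 180° − 2 × 20.70° ≈ 138.6° is the matching bound on the largest angle.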

the quality of the mesh depends solely on how the points that define the DT are distributed, so the central question of any Delaunay refinement algorithm is how to choose


far from other points as possible. If a newly inserted point is too close to another point, the resulting small edge will engender thin triangles. For a bad triangle t, one way to eliminate it (dating back at least to the mid-1980s [Fre87]) is to insert a Steiner point at its circumcenter and use Lawson's algorithm [Law77] or the Bowyer-Watson algorithm [Bow81] to keep the triangulation a DT. After the circumcenter's insertion, the bad triangle cannot survive, because its circumcircle is no longer empty. In the literature, there are other kinds of techniques for choosing the next Steiner point. All of them will be

Another big challenge for Delaunay refinement algorithms is to ensure termination while still obtaining a good-quality mesh. Due to small angles in the input, the Delaunay refinement algorithm may not always terminate. Small angles inherent in the input geometry cannot be removed, and it is not always possible to triangulate a domain without creating new small angles, as pointed out by Shewchuk [She00]: for any angle bound θ > 0, there exists a PSLG X such that it is not possible to triangulate X without creating a new angle (not present in X) that is smaller than θ. This statement imposes a fundamental limitation on any triangle mesh generation algorithm. If no input angle is smaller than 60°, Ruppert's algorithm [Rup95] is guaranteed to terminate with no angle smaller than a chosen bound, which can be as large as $\arcsin\frac{1}{2\sqrt{2}} \approx 20.7^\circ$. Similarly, for the same inputs, Chew's Delaunay refinement algorithm [Che89b] also terminates, and all angles are between 30° and 120°.

In summary, most existing algorithms work well in practice for many inputs, but all of them fail on some triangulation domains. Therefore, in this thesis, we only focus on Delaunay refinement of the DT mesh of a point set where the mesh is segment-bounded.

Driven by the demand for real-time, high-definition 3D graphics, the programmable graphics processing unit, or GPU, has evolved into a highly parallel, multithreaded, many-core processor with tremendous computational horsepower and very high memory bandwidth, as illustrated by Figure 1.6. The reason behind the discrepancy in floating-point capability between the CPU and the GPU is that, on the GPU, more transistors are devoted to data processing rather than data caching and flow control, as illustrated by Figure 1.7.

Researchers and developers are becoming more and more interested in general-purpose computation on GPUs, where GPUs are used not only for graphics processing tasks but also for applications ranging from physics simulation to data mining and computational geometry.

Figure 1.6: (a) Computational power and (b) memory bandwidth of the CPU and GPU (NVIDIA 2012 [NVI13]).

Figure 1.7: The GPU devotes more transistors to data processing.

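computations, i.e., the same program is executed on many data elements in parallel, especially when a large amount of arithmetic is involved. In addition, this trend has been accelerated by the recent introduction of CUDA, a general-purpose parallel computing architecture by NVIDIA, with which developers can now easily harness the full power of GPUs for more complex tasks.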

In the field of computational geometry, the GPU has been employed to solve some problems. Early works computed the digital Voronoi diagram (VD) [HKL+99, FG06, CTMT10], a structure related to the DT. These works also mentioned the possibility of obtaining the latter from the former straightforwardly. However, we note that the Voronoi diagram in a digital space (of a texture) is not the dual of the DT in a continuous space, and only recently did Rong et al. [RTCS08] present a serious attempt to derive the DT from the digital VD. Their algorithm, however, is hybrid: parallel computing is only used in the first part, while the rest is left to a sequential algorithm. After that, Qi et al.

In this thesis, we refer to the algorithm that generates the DT of a set of points in 2D as GPU-DT.

As for the CDT and quality mesh problems, there is, as far as we know, no efficient GPU solution. This is partly because such problems do not lend themselves readily to parallel computing. A parallel algorithm, in order to fully utilize the GPU hardware, usually needs to have regularized work and localized data access. It is not clear how to achieve this while adapting traditional parallel techniques, such as

Motivated by the rapid growth of GPU performance, the flexibility of the CUDA parallel programming model, and the observation that there is no known algorithm or implementation of a 2D mesh generator on the GPU, the objective of this work is to develop a 2D mesh generator on the GPU. The major contributions of this work are as follows:

1. Algorithm GPU-QM: a new and efficient approach for 2D quality triangle mesh generation on the GPU. It is the first GPU solution to this problem. According to our experimental results, this algorithm handles both synthetic and real-world data very well. Compared to Triangle, our algorithm generates meshes of similar quality and runs up to 5.5 times faster. Furthermore, this algorithm offers both termination and quality guarantees.

2. Algorithm GPU-CDT: the first GPU solution to the 2D CDT of a PSLG of points and edges. In our experiments with both randomly generated PSLGs and real-world GIS data having millions of points and edges, our implementation runs up to two orders of magnitude faster than the best sequential implementations on the CPU. Furthermore, we prove that the algorithm is guaranteed to terminate for any given PSLG, and that the total number of flips performed by the algorithm is $\Theta(n^2)$, where n is the number of input points.

3. A software package combining GPU-DT, GPU-QM and GPU-CDT, which is the first GPU mesh generator so far. Similar to the Triangle software, it can generate DTs, CDTs, conforming Delaunay triangulations, digital Voronoi diagrams, and high-quality triangle meshes, but our algorithms are parallel and are implemented using the CUDA programming model on NVIDIA GPUs. The experimental results show that the program is numerically robust and runs much faster than existing CPU programs such as Triangle and CGAL. The source code for all the algorithms has also been made freely available to the

4. Several key GPU techniques, such as handling degeneracy and robustness, handling conflicting operations among threads, and dealing with … using multiple iterations. These GPU techniques are documented in the thesis and provide valuable references for future research.

The rest of the thesis is organized as follows. Chapter 2 introduces some definitions, data structures and GPU programming. The algorithms GPU-QM and GPU-CDT are introduced in Chapter 3 and Chapter 4, respectively. Finally, Chapter 5 concludes the thesis.

Before introducing our algorithms in the following chapters, in this chapter we introduce some definitions and properties in Section 2.1. Section 2.2 describes the data structures used in our mesh generator. Since all the algorithms proposed in the thesis are based on the GPU, some related programming concepts are introduced in Section 2.3. Finally, Section 2.4 presents the experimental environment used in the thesis.

2.1 Terminology and Definition

Definition 2.1.1 (Voronoi diagram). Let S be a set of n sites in the Euclidean space $\Re^2$. For each site p of S, the Voronoi region R(p) of p is the set of points that are closer to p than to any other site of S. The Voronoi diagram V(S) is the space partition induced by the Voronoi regions. The elements of S are also called the sites of this Voronoi diagram. The line segments shared by the boundaries of two Voronoi regions are called Voronoi edges, and the points shared by the boundaries of three or more Voronoi regions are called Voronoi vertices.

In the digital setting, instead of a continuous plane, we only consider the set of all integer grid points. If a grid point p lies inside the Voronoi region of the site $x_i$, then we say that p is assigned to $x_i$. In case p is equidistant from $x_i$ and $x_j$ with i < j, we assign p to $x_i$. The set of assigned grid points forms the digital Voronoi diagram D(S) of S (see Figure 2.1). We … this process.

Figure 2.1: Digital Voronoi diagram
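To make the definition concrete, the C++ sketch below assigns every grid point of a W × H grid to its nearest site by brute force, using exact integer squared distances and the smaller-index rule for ties. It is our own reference illustration, not one of the parallel algorithms cited above; the function name `digitalVoronoi` and the row-major layout of the result are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Site { int x, y; };

// Returns owner[y * W + x] = index of the site nearest to grid point (x, y).
// Ties go to the smaller index, because a later site must be strictly closer
// to take over a grid point. Brute force, O(W * H * n).
std::vector<int> digitalVoronoi(int W, int H, const std::vector<Site>& sites) {
    assert(W >= 0 && H >= 0);
    std::vector<int> owner(static_cast<std::size_t>(W) * H, -1);
    for (int y = 0; y < H; ++y) {
        for (int x = 0; x < W; ++x) {
            long long best = -1;  // squared distance to the current owner
            for (int i = 0; i < static_cast<int>(sites.size()); ++i) {
                long long dx = x - sites[i].x, dy = y - sites[i].y;
                long long d2 = dx * dx + dy * dy;  // exact in integer arithmetic
                if (best < 0 || d2 < best) {
                    best = d2;
                    owner[y * W + x] = i;
                }
            }
        }
    }
    return owner;
}
```

On a 5 × 1 grid with sites at (0,0) and (4,0), the middle point (2,0) is equidistant from both and is therefore assigned to site 0; listing the same two sites in the opposite order assigns it to the site at (4,0), which is then site 0.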

Definition 2.1.2 (Planar graph). Let V be a finite set of n vertices in $\Re^2$, and let E be a set of edges determined by the vertices of V. A planar graph is a pair G = (V, E) that satisfies the following, where each edge is treated as an open segment (without its endpoints):
(i) for each edge ab ∈ E, $ab \cap V = \emptyset$, and
(ii) for each edge pair ab ≠ cd in E, $ab \cap cd = \emptyset$.

Definition 2.1.3 (Planar Straight Line Graph, PSLG). A PSLG is a graph in which the vertices are embedded as points in the Euclidean plane and the edges are embedded as non-crossing line segments.

By definition, a PSLG must contain both endpoints of every segment, and a segment may intersect vertices and other segments only at its endpoints. Figure 2.2 shows an example of a PSLG.

Figure 2.2: A PSLG of points and segments. The radius of each disk illustrated here is the local feature size of the point at its center.

Definition 2.1.4 (Local feature size). Given a PSLG X, the local feature size lfs(p) at any point p is the radius of the smallest disk centered at p that intersects two nonincident vertices or segments of X. Two features, each a vertex or a segment, are said to be incident if they intersect.

Figure 2.2 illustrates the notion of local feature size by giving examples for a variety of points.

… feature size of the point.

Proof. The disk of radius lfs(u) centered at u intersects two nonincident features of X. The disk of radius lfs(u) + |uv| centered at v contains the prior disk, and thus also intersects the same two features. Hence the smallest disk centered at v that intersects two nonincident features of X has radius no larger than lfs(u) + |uv|.

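Definition 2.1.5 (Triangulation). A triangulation is a PSLG G = (V, E) in which E is maximal, i.e., no further edge can be added without crossing an existing one. A special kind of triangulation is the Delaunay triangulation, as shown in Figure 2.3.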

Definition 2.1.6 (Delaunay Triangulation). A triangulation G = (V, E) is Delaunay if all edges ab ∈ E satisfy the so-called empty circle property (with respect to the set of points V); that is to say, there is a circle that passes through a and b such that all other points of V are exterior to the circle.

Figure 2.3: An example of DT of a set of points. No visible point is inside the circumcircle of any triangle in the triangulation.

In the non-degenerate case, i.e., when no four or more points are cocircular, the Delaunay triangulation of a given set of points is unique. In degenerate cases, the Delaunay triangulation is not unique, i.e., there is more than one Delaunay triangulation, and any of them is acceptable. Among all triangulations of a finite point set $S \subseteq \Re^2$, the Delaunay triangulation maximizes the minimum angle of all the angles of the triangles in the triangulation.

The star of a vertex p ($St_p$) consists of all triangles that contain p. The link of p consists of all edges of triangles in the star that are disjoint from p (see Figure 2.4a). Let $p \notin S$ be a point in the interior of the convex hull of S, and assume that $S \cup \{p\}$ is non-degenerate. Let D be the DT of S and $D_p$ be the DT of $S \cup \{p\}$. The prestar of p ($Pt_p$) consists of …

… substituting the star for the prestar, $D_p = (D - Pt_p) \cup St_p$.

Definition 2.1.7 (Locally Delaunay). Let S be a point set and T a triangulation of S. An edge ab ∈ T is locally Delaunay if
(i) it belongs to only one triangle and therefore bounds the convex hull of S, or
(ii) it belongs to two triangles, abc and abd, and d lies outside the circumcircle of abc.

Definition 2.1.8 (Delaunay Lemma). If every edge of T is locally Delaunay, then T is the DT of S.

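The edge flip is a local operation proposed in [Law77]. If ab belongs to two triangles, abc and abd, whose union is a convex quadrangle, then we can flip ab to cd (see Figure 2.5). Guided by the Delaunay Lemma, we use edge flips as elementary operations to convert an arbitrary triangulation T into the DT: any edge that is not locally Delaunay is flipped, and once every edge is locally Delaunay, T is the DT.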
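The flip-based conversion can be sketched sequentially as follows. This C++ code is our own illustration, not the thesis's parallel flipping: it assumes plain floating-point predicates, uses hypothetical names `lawsonFlip` and `indexOf`, and finds shared edges by brute-force search instead of the neighbor links of Section 2.2. It repeatedly looks for an edge ab shared by counterclockwise triangles abc and bad such that d lies inside the circumcircle of abc (so ab is not locally Delaunay) and flips it to cd, until no such edge remains. Such an edge always bounds a convex quadrangle, so each flip is valid.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { double x, y; };
using Tri = std::array<int, 3>;  // counterclockwise vertex indices

// Positive iff d is strictly inside the circumcircle of counterclockwise abc.
double incircle(Point a, Point b, Point c, Point d) {
    double adx = a.x - d.x, ady = a.y - d.y;
    double bdx = b.x - d.x, bdy = b.y - d.y;
    double cdx = c.x - d.x, cdy = c.y - d.y;
    double ad = adx * adx + ady * ady, bd = bdx * bdx + bdy * bdy;
    double cd = cdx * cdx + cdy * cdy;
    return adx * (bdy * cd - bd * cdy) - ady * (bdx * cd - bd * cdx) +
           ad * (bdx * cdy - bdy * cdx);
}

// Position of vertex v in triangle t, or -1 if absent.
int indexOf(const Tri& t, int v) {
    for (int k = 0; k < 3; ++k)
        if (t[k] == v) return k;
    return -1;
}

// Flips edges that are not locally Delaunay until none is left.
// Returns the number of flips performed.
int lawsonFlip(const std::vector<Point>& p, std::vector<Tri>& tris) {
    for (const Tri& t : tris)
        for (int v : t) assert(v >= 0 && v < static_cast<int>(p.size()));
    int flips = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t i = 0; i < tris.size() && !changed; ++i) {
            for (int e = 0; e < 3 && !changed; ++e) {
                int a = tris[i][e], b = tris[i][(e + 1) % 3], c = tris[i][(e + 2) % 3];
                for (std::size_t j = 0; j < tris.size() && !changed; ++j) {
                    if (j == i) continue;
                    // In a consistently oriented mesh the neighbor lists the edge as b -> a.
                    int ka = indexOf(tris[j], a), kb = indexOf(tris[j], b);
                    if (ka < 0 || kb < 0 || (kb + 1) % 3 != ka) continue;
                    int d = tris[j][(ka + 1) % 3];
                    if (incircle(p[a], p[b], p[c], p[d]) > 0) {  // ab not locally Delaunay
                        tris[i] = {c, a, d};  // both new triangles stay counterclockwise
                        tris[j] = {d, b, c};
                        ++flips;
                        changed = true;
                    }
                }
            }
        }
    }
    return flips;
}
```

On the points (0,0), (2,−1), (4,0), (2,1) triangulated with the long diagonal, one flip replaces it by the short diagonal, and a second call performs no flips.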

Figure 2.5: Flipping ab to cd. Before the flip, ab is not locally Delaunay, and the union of the two triangles abc and abd is convex. After the flip, cd is locally Delaunay.

There are many sequential algorithms developed for the CPU to compute the DT [Aur91, For97, SSD97]. In general, all these algorithms follow one of three well-known paradigms: divide-and-conquer [Dwy87], sweep-line [For87] and incremental insertion [GKS92].

a. Divide-and-Conquer. Algorithms based on this strategy recursively divide a set of points into two smaller sets, until a set is small enough for its DT to be computed trivially. They then recursively merge the results of two small adjacent sets into that of a bigger one, until the results of all sets are merged into the final triangulation. Using this approach, the DT can be built in optimal O(n log n) time [SH75, Dwy87].

b. Sweep-line. The Voronoi diagram and the DT are dual to each other. Fortune [For87] uses a sweep-line algorithm to compute the Voronoi diagram, from which the DT is obtained. First, the algorithm sorts the input points according to their x-coordinates; then a vertical line, the sweep-line, is swept from left to right. Points behind the sweep-line have already been added to the Voronoi diagram, while points ahead of the sweep-line are waiting to be processed. As the sweep-line progresses, the Voronoi edges are generated incrementally. The running time of this algorithm is also O(n log n).

c. Incremental Insertion. A natural way to compute the DT is to repeatedly add points one at a time and re-triangulate the affected parts of the triangulation. To insert a point, we first locate the triangle or edge containing the point. The new point splits the triangle containing it into three triangles, or the two triangles adjacent to the edge containing it into four triangles. Subsequently, we perform edge flipping to keep the triangulation a DT. Although this incremental insertion approach runs in $O(n^2)$ time in the worst case, the expected time can still be $O(n \log n)$, provided that the points are inserted in a random order [GKS92].
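The locate-and-split step of incremental insertion can be sketched as follows. The C++ code is our own minimal illustration (hypothetical names `locate` and `insertPoint`; brute-force point location by orientation tests; floating-point predicates). It handles only a point strictly inside a triangle, not the four-triangle case of a point on an edge, and it omits the subsequent edge flipping that restores the DT.

```cpp
#include <array>
#include <cassert>
#include <vector>

struct Point { double x, y; };
using Tri = std::array<int, 3>;  // counterclockwise vertex indices

// Twice the signed area of abc; positive when a, b, c are counterclockwise.
double orient2d(Point a, Point b, Point c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Brute-force point location: index of a triangle whose interior strictly
// contains q, or -1 (q outside, or exactly on an edge or vertex).
int locate(const std::vector<Point>& p, const std::vector<Tri>& tris, Point q) {
    for (int i = 0; i < static_cast<int>(tris.size()); ++i) {
        const Tri& t = tris[i];
        if (orient2d(p[t[0]], p[t[1]], q) > 0 &&
            orient2d(p[t[1]], p[t[2]], q) > 0 &&
            orient2d(p[t[2]], p[t[0]], q) > 0)
            return i;
    }
    return -1;
}

// 1-to-3 split: q becomes vertex v, and triangle abc is replaced by abv,
// bcv and cav (all still counterclockwise). Edge flipping to restore the
// Delaunay property would follow here; it is omitted in this sketch.
bool insertPoint(std::vector<Point>& p, std::vector<Tri>& tris, Point q) {
    int i = locate(p, tris, q);
    if (i < 0) return false;
    int v = static_cast<int>(p.size());
    p.push_back(q);
    Tri t = tris[i];
    assert(orient2d(p[t[0]], p[t[1]], p[t[2]]) > 0);
    tris[i] = {t[0], t[1], v};
    tris.push_back({t[1], t[2], v});
    tris.push_back({t[2], t[0], v});
    return true;
}
```

Inserting (1,1) into the triangle (0,0), (4,0), (0,4) produces three counterclockwise triangles that share the new vertex; a point outside every triangle is rejected.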

On the GPU, there are a few recent works, such as [RTCS08, CNGT14]. Since the algorithms proposed in the following chapters are GPU algorithms, we can choose any one of them.

Definition 2.1.9 (Visibility). Two vertices a and b of a graph G = (V, E) are visible from each other (in G) if the line segment ab does not intersect any of the edges of E.
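A visibility test reduces to segment intersection tests. The C++ sketch below is our own illustration with two stated assumptions: edges incident to a or b are skipped (read literally, the definition would otherwise let every such edge block ab at its shared endpoint), and plain floating-point orientation tests are used. The names `segmentsIntersect` and `visible` are hypothetical. A collinear overlap between ab and an edge incident to a or b is not detected by this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

struct Point { double x, y; };

// Twice the signed area of abc; positive when a, b, c are counterclockwise.
double orient2d(Point a, Point b, Point c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// r is known to be collinear with p and q: is it within their bounding box?
bool onSegment(Point p, Point q, Point r) {
    return std::min(p.x, q.x) <= r.x && r.x <= std::max(p.x, q.x) &&
           std::min(p.y, q.y) <= r.y && r.y <= std::max(p.y, q.y);
}

// Do the closed segments p1p2 and q1q2 share at least one point?
bool segmentsIntersect(Point p1, Point p2, Point q1, Point q2) {
    double d1 = orient2d(q1, q2, p1), d2 = orient2d(q1, q2, p2);
    double d3 = orient2d(p1, p2, q1), d4 = orient2d(p1, p2, q2);
    if (((d1 > 0 && d2 < 0) || (d1 < 0 && d2 > 0)) &&
        ((d3 > 0 && d4 < 0) || (d3 < 0 && d4 > 0)))
        return true;  // proper crossing
    // Touching and collinear-overlap cases.
    return (d1 == 0 && onSegment(q1, q2, p1)) || (d2 == 0 && onSegment(q1, q2, p2)) ||
           (d3 == 0 && onSegment(p1, p2, q1)) || (d4 == 0 && onSegment(p1, p2, q2));
}

// a and b (indices into V) see each other if segment ab meets no edge of E;
// edges incident to a or b are skipped by assumption (see the lead-in).
bool visible(const std::vector<Point>& V,
             const std::vector<std::pair<int, int>>& E, int a, int b) {
    assert(a >= 0 && b >= 0);
    assert(a < static_cast<int>(V.size()) && b < static_cast<int>(V.size()));
    for (const auto& e : E) {
        int u = e.first, w = e.second;
        if (u == a || u == b || w == a || w == b) continue;
        if (segmentsIntersect(V[a], V[b], V[u], V[w])) return false;
    }
    return true;
}
```

With a single vertical constraint from (1,−1) to (1,1), the vertices (0,0) and (2,0) are not visible from each other, while (0,0) and (0,3) are.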

Definition 2.1.10 (Constrained Delaunay Triangulation). Let G = (V, E) be a planar graph with $E \ne \emptyset$. A triangulation $(V, E \cup E')$ is a constrained Delaunay triangulation of G if the …

… Delaunay triangulation is a triangulation that is as close as possible to the Delaunay triangulation. Figure 2.6 shows an example of a constrained Delaunay triangulation.

2.2 Data Structure

Figure 2.7b shows the data structure of the triangulation shown in Figure 2.7a. Throughout the DT, CDT and quality mesh phases, we frequently need to walk from triangle to triangle, or around the triangle fan of a vertex. Therefore, we maintain two data structures. First, for each triangle we always maintain the links to its three neighbors. The three vertices of a triangle are indexed 0, 1 and 2, and the neighbors of a triangle are also indexed 0, 1 and 2, in such a way that the neighbor indexed by i is opposite the vertex with the same index. Second, for each vertex, we maintain a linked list of all triangles incident to that vertex.
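The two structures can be built on the CPU as follows. This C++ code is a sequential sketch under our own assumptions, not the GPU implementation: the names `Triangle`, `Mesh` and `build` are hypothetical, a `std::map` keyed by sorted vertex pairs is used to find shared edges, and `std::vector` stands in for the per-vertex linked lists. It fills `nei[i]` with the triangle across the edge opposite `v[i]`, following the indexing convention above.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <map>
#include <utility>
#include <vector>

struct Triangle {
    std::array<int, 3> v;                  // vertex indices 0, 1, 2
    std::array<int, 3> nei{{-1, -1, -1}};  // nei[i]: triangle opposite v[i], -1 if none
};

struct Mesh {
    std::vector<Triangle> tris;
    std::vector<std::vector<int>> vertexTris;  // vertex -> incident triangles
};

Mesh build(int numVertices, const std::vector<std::array<int, 3>>& faces) {
    Mesh m;
    m.vertexTris.resize(numVertices);
    // Sorted edge (min, max) -> (first triangle seen, local index of its opposite vertex).
    std::map<std::pair<int, int>, std::pair<int, int>> firstOwner;
    for (int t = 0; t < static_cast<int>(faces.size()); ++t) {
        Triangle tri;
        tri.v = faces[t];
        m.tris.push_back(tri);
        for (int i = 0; i < 3; ++i) {
            int vi = faces[t][i];
            assert(vi >= 0 && vi < numVertices);
            m.vertexTris[vi].push_back(t);
            // The edge opposite vertex i joins the other two vertices.
            int a = faces[t][(i + 1) % 3], b = faces[t][(i + 2) % 3];
            std::pair<int, int> key(std::min(a, b), std::max(a, b));
            auto it = firstOwner.find(key);
            if (it == firstOwner.end()) {
                firstOwner[key] = std::make_pair(t, i);
            } else {  // second triangle on this edge: link the two neighbors
                m.tris[t].nei[i] = it->second.first;
                m.tris[it->second.first].nei[it->second.second] = t;
            }
        }
    }
    return m;
}
```

For the two triangles (0,1,2) and (1,3,2), which share the edge between vertices 1 and 2, triangle 0 records triangle 1 as its neighbor 0 (opposite vertex 0), and triangle 1 records triangle 0 as its neighbor 1 (opposite vertex 3).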

Whenever the triangulation is changed, these data structures have to be updated.

As in Triangle, all triangles in our algorithm are oriented triangles. The three vertices of an oriented triangle are the origin vertex, the destination vertex and the apex vertex, respectively. An oriented triangle consists of a pointer to a triangle and an orientation. There are three possible orientations for a triangle abc, namely abc, bca and cab (all defined in counterclockwise order). Suppose the orientation of triangle abc is abc; that means a is the apex vertex of the triangle, b is the origin vertex, and c is the destination vertex (see Figure 2.7). In other words, we use an edge to denote the orientation of a triangle, and use the oriented triangle to denote an edge of the triangle. For example, the oriented

Figure 2.6: An example of CDT of a PSLG h of points and one t

(red line) No visiblepointis insidethe of anytriangleinthetriangulation

Trang 30

(a) (b)

Figure 2.7: (a)Atriangulation (b)Data ofthetriangulation Anorientated

triangle

Operation Result Usage

sym(abc) bad Findtheabuttingtrianglesharing an edgeab/ba

lnext(abc) bca Findthenext edge kwise)of a triangle

lprev(abc) cab Findthepreviousedge kwise) ofa triangle

onext(abc) acf Find the next edge kwise with the same

origin

oprev(abc) adb Findthenext edge kwise withthesame origin

dnext(abc) dba Find the next edge kwise with the same

org(abc) a Findtheoriginvertex

dest(abc) b Findthedestination vertex

apex(abc) c Findtheapexvertex

bond(abc,bad) - twotriangles sharingan edgeab/ba

Table 2.1: Primitiveoperations to the triangulationinFigure2.7a

triangleabcdenotestheedgebcof thetriangle, andedgebc denotestheorientatedtriangle
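A sketch of the Table 2.1 primitives on top of a neighbor-based layout, assuming the convention of Section 2.2 (neighbor i is opposite vertex i). Following Triangle's convention, orientation k picks `v[k]` as the apex, so the denoted edge org→dest is exactly the edge shared with `nei[k]`. The names `Tri`, `OTri`, and the function set are illustrative assumptions, and `bond` is omitted.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical layout: v[] counterclockwise, nei[i] opposite v[i] (-1 = none).
struct Tri { std::array<int, 3> v; std::array<int, 3> nei; };

// Oriented triangle: orientation k makes v[k] the apex; the edge is org -> dest.
struct OTri { int tri; int orient; };

int org (const std::vector<Tri>& m, OTri o) { return m[o.tri].v[(o.orient + 1) % 3]; }
int dest(const std::vector<Tri>& m, OTri o) { return m[o.tri].v[(o.orient + 2) % 3]; }
int apex(const std::vector<Tri>& m, OTri o) { return m[o.tri].v[o.orient]; }

OTri lnext(OTri o) { return {o.tri, (o.orient + 1) % 3}; } // abc -> bca
OTri lprev(OTri o) { return {o.tri, (o.orient + 2) % 3}; } // abc -> cab

// abc -> bad: the same edge seen from the abutting triangle (tri = -1 if none).
OTri sym(const std::vector<Tri>& m, OTri o) {
    int n = m[o.tri].nei[o.orient];
    if (n < 0) return {-1, 0};
    int a = org(m, o), b = dest(m, o);
    for (int j = 0; j < 3; ++j) {
        int x = m[n].v[j];
        if (x != a && x != b) return {n, j}; // vertex opposite ab becomes the apex
    }
    return {-1, 0}; // inconsistent mesh
}

OTri onext(const std::vector<Tri>& m, OTri o) { return sym(m, lprev(o)); } // abc -> acf
OTri oprev(const std::vector<Tri>& m, OTri o) { return lnext(sym(m, o)); } // abc -> adb
OTri dnext(const std::vector<Tri>& m, OTri o) { return lprev(sym(m, o)); } // abc -> dba
```

The composite operations only chain `sym`, `lnext`, and `lprev`; a result with `tri == -1` means the walk hit the boundary, so callers should check it before using the result.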


In the algorithms introduced in the next two chapters, every point, constraint, and triangle has a unique index. For example, if there are n points in all, then each point has an index ranging from 0 to n − 1.

CUDA is a parallel programming model and a software environment for parallel computing. In the CUDA model, the CPU is the host and is connected to a GPU device consisting of one or more streaming multiprocessors (SMs). Each SM is composed of many streaming processors (SPs), with at least eight SPs per SM. The CUDA programming language is an extension to C and C++ with some extra syntax. Programmers define parallel kernels that run on the GPU using the CUDA programming language.

A typical flow of a CUDA program is as follows:

1. Allocate memory on the GPU.

2. Copy the data from the CPU to the GPU.

3. Configure the thread hierarchy: choose the block and grid dimensions for the problem.

4. Launch the threads.

5. Synchronize the CUDA threads to ensure that the device has completed all its tasks before doing further operations on the GPU memory.

6. Once the threads have completed, copy the data back from the GPU to the CPU.

7. Free the GPU memory.
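Steps 3 and 4 amount to choosing enough blocks to cover the problem size and guarding the surplus threads in the last block. The sketch below emulates a one-dimensional launch on the CPU under that assumption; in real CUDA host code, the other steps map to runtime calls such as `cudaMalloc`, `cudaMemcpy`, a `kernel<<<grid, block>>>` launch, `cudaDeviceSynchronize`, and `cudaFree`. `gridDimFor` and `launch1D` are hypothetical helpers, and their sequential loops only mimic the index arithmetic, not the concurrency.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Number of blocks needed so that gridDim * blockDim >= n (ceiling division).
int gridDimFor(int n, int blockDim) { return (n + blockDim - 1) / blockDim; }

// Sequential stand-in for kernel<<<gridDim, blockDim>>>: calls the kernel body
// once per (blockIdx, threadIdx) pair. On a GPU these run concurrently.
void launch1D(int n, int blockDim, const std::function<void(int, int)>& kernel) {
    int gridDim = gridDimFor(n, blockDim);
    for (int b = 0; b < gridDim; ++b)
        for (int t = 0; t < blockDim; ++t)
            kernel(b, t);
}
```

With n = 1000 and 256 threads per block, 4 blocks are launched and the kernel body must skip global indices 1000 to 1023, which is why GPU kernels typically begin with an `if (i < n)` guard.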

In all, the GPU is massively multithreaded, with hundreds of processors. To effectively utilize the GPU, it is desirable to have tens of thousands of threads at any given time. As such, we keep in mind the following two design principles in developing algorithms for the GPU.

First, the GPU is most suitable for data-parallel computation, in which the same operation is performed on multiple pieces of data by many threads. Therefore, we need to make our thread code (or algorithm) as simple (with little control flow) and as uniform (with a similar amount of work across threads) as possible.

Second, with so many threads, concurrent operations among threads, data dependencies and conflicts are serious problems. To mitigate this, we usually employ some simple checks to break the set of tasks into several groups, within which the tasks can be done independently with no or little conflict. Figure 2.8 shows a possible conflict in parallel processing when inserting points into the triangulation. In this figure, point a is to be inserted on the edge shared by triangles A and C; point b is to be inserted on the edge shared by triangles A and B; and point c is to be inserted on the edge shared by triangles B and C. Clearly, if more than one point needs to be inserted into the same triangle, only one of them can be processed at a time. Moreover, a point inserted on an edge changes two triangles in the triangulation, so we need to make sure that no other thread is trying to update these triangles. To resolve the conflicts between insertions, we break the insertions into several rounds. In each round, we want to insert as many points as possible without any conflict. An array X, with one entry per triangle in the triangulation, is needed, and in every round we do the following two steps:

Figure 2.8: Possible conflicts in parallel processing when inserting points into the triangulation.

1. Locate the triangle or edge that contains the point p. If p is contained in a triangle t, mark the triangle with X[t] = min(X[t], p). If p lies on an edge shared by two triangles t1 and t2, mark both triangles with X[t1] = min(X[t1], p) and X[t2] = min(X[t2], p). For the case shown in Figure 2.8, we mark the three triangles as: X[A] = min(X[A], a), X[A] = min(X[A], b), X[B] = min(X[B], b), X[B] = min(X[B], c), X[C] = min(X[C], c), X[C] = min(X[C], a). In CUDA, when several threads try to write to X[t] at the same time, it is guaranteed that one of them will succeed, but not which one. Hence the marking is done using the atomic minimum operation, which is readily available on GPUs.

2. Use the array X to check whether to insert p in this round or not: p can be processed in this round if none of the marks written by it (either one or two) has been overwritten. For the case shown in Figure 2.8, only the point with the smallest index among a, b and c will be inserted in this round.
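The two-step round can be emulated sequentially as below. This is a minimal sketch that assumes point location has already produced, for each point, the one or two triangles it touches (`Target` and `selectRound` are hypothetical names). On the GPU, step 1 would run one thread per point and use `atomicMin` on `X`; here `std::min` is applied in a plain loop, which gives the same final marks because the minimum does not depend on the order of the writes.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

// Triangles touched by one point: t2 == -1 means the point is inside t1,
// otherwise it lies on the edge shared by t1 and t2. Point IDs are indices.
struct Target { int t1, t2; };

// One round of the marking scheme; returns the IDs of points inserted now.
std::vector<int> selectRound(const std::vector<Target>& pts, int numTris) {
    std::vector<int> X(numTris, INT_MAX);
    // Step 1: every touched triangle keeps the smallest point ID (atomicMin on a GPU).
    for (int p = 0; p < (int)pts.size(); ++p) {
        X[pts[p].t1] = std::min(X[pts[p].t1], p);
        if (pts[p].t2 >= 0) X[pts[p].t2] = std::min(X[pts[p].t2], p);
    }
    // Step 2: p wins if none of its one or two marks was overwritten.
    std::vector<int> winners;
    for (int p = 0; p < (int)pts.size(); ++p) {
        bool ok = X[pts[p].t1] == p && (pts[p].t2 < 0 || X[pts[p].t2] == p);
        if (ok) winners.push_back(p);
    }
    return winners;
}
```

Encoding Figure 2.8 with A = 0, B = 1, C = 2 and points a, b, c as IDs 0, 1, 2 yields a single winner, point a; the losing points are simply retried in the next round, after the triangles they touch have been updated.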

We implement our algorithms using the CUDA programming model by NVIDIA [NBGS08].


DDR3 RAM and an NVIDIA GTX580 Fermi with 3GB of video memory.

The CPU and GPU used here were both top-of-the-line at the time of the experiments. Visual Studio 2008 and the CUDA 5.5 Toolkit are used to compile all the programs, with all optimizations enabled. In the thesis, we compare the results from our algorithms with the results from the best-known sequential triangle mesh generators, Triangle [She96a] and CGAL [CGA11].

Triangle: A 2D mesh generator, which can generate DTs, CDTs, conforming Delaunay triangulations, Voronoi diagrams, and high-quality triangle meshes. It has thousands of users, with applications ranging from radiosity rendering and terrain databases to stereo vision and image orientation, as well as dozens of variants of finite element methods.

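CGAL: The goal of the CGAL Open Source Project is to provide easy access to efficient and reliable geometric algorithms in the form of a C++ library. CGAL is used in various areas needing geometric computation, such as computer graphics, scientific visualization, computer aided design and modeling, geographic information systems, molecular biology, medical imaging, robotics and motion planning, mesh generation, numerical methods and so on.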

According to our experiment results, Triangle usually runs faster than CGAL, especially on the Delaunay refinement problem and the CDT problem. So, in the thesis, we always compare our algorithms with Triangle unless otherwise stated.


A GPU Algorithm for Delaunay Refinement

In this chapter, a Delaunay refinement algorithm termed GPU-QM is proposed, which is, so far, the first GPU solution for improving the mesh quality of the DT of a set of points. According to the experiments on synthetic and real-world datasets, our GPU algorithm outperforms Triangle with a 2 to 3 times speedup. The quality of the mesh generated by our algorithm is very similar to that of the mesh generated by Triangle: less than 1% more Steiner points are inserted by our algorithm. Furthermore, we propose a variant algorithm, GPU-QM-V, based on the GPU-QM algorithm. In this variant algorithm, a new criterion for the minimum separation bound between any two Steiner points is used. With the GPU-QM-V algorithm, we almost double the performance (4 to 5.5 times speedup) with no more than 4% more Steiner points compared to Triangle. In addition, both the GPU-QM and GPU-QM-V algorithms are guaranteed to terminate and are numerically robust.

In this chapter, we use B as the upper bound of the edge ratio, and any triangle whose edge ratio is bigger than B is a bad triangle. In the following, we first introduce some related works in Section 3.1. In Section 3.3, a version of the GPU-QM algorithm without boundary handling is presented. In this algorithm, we ignore the boundary refining problem, because handling boundary refining requires several special mechanisms, which would divert attention away from the main framework of the GPU-QM algorithm. In Section 3.4, we add these mechanisms to that version, such that the boundary edges can be handled. In Section 3.5, GPU-QM-V, the variant of the GPU-QM algorithm, is presented. All implementation details and key GPU techniques used in the algorithms are shown in Section 3.6. In Section 3.7, we compare our results with the fastest sequential implementation on both synthetic and real-world data. Finally, Section 3.8 concludes the chapter.

3.1 Related Works

As mentioned before, the central question of any Delaunay refinement algorithm is where the next point should be inserted. A reasonable answer is that the new point should be inserted as far from other vertices as possible. According to the technique used to choose Steiner points, there are three different strategies/algorithms, termed circumcenter-insertion, off-center-insertion, and sink-insertion, respectively. These three strategies are introduced in the following. In addition, we review some parallel Delaunay refinement algorithms at the end.

3.1.1 Circumcenter-insertion

In this kind of algorithm, the main operation is to insert a point at the circumcenter of a bad triangle (see Figure 3.1), and to maintain the DT either by using Bowyer-Watson's algorithm to re-triangulate the prestar of the new point, or by using Lawson's algorithm to flip non-locally-Delaunay edges.

Figure 3.1: Any triangle whose edge ratio is larger than B is split by inserting its circumcenter. Every new edge has length at least B times that of the shortest edge of the bad triangle. Left: before point insertion. Right: after point insertion.
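To make the bad-triangle test of Figure 3.1 concrete, the sketch below computes the circumcenter from the standard closed-form formula and the edge ratio (circumradius over shortest edge). It uses plain floating point and assumes a non-degenerate triangle; a production mesher would use robust predicates. The helper names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Pt { double x, y; };

// Circumcenter of triangle abc (assumes a, b, c are not collinear).
Pt circumcenter(Pt a, Pt b, Pt c) {
    double d = 2.0 * (a.x * (b.y - c.y) + b.x * (c.y - a.y) + c.x * (a.y - b.y));
    double a2 = a.x * a.x + a.y * a.y, b2 = b.x * b.x + b.y * b.y, c2 = c.x * c.x + c.y * c.y;
    return {(a2 * (b.y - c.y) + b2 * (c.y - a.y) + c2 * (a.y - b.y)) / d,
            (a2 * (c.x - b.x) + b2 * (a.x - c.x) + c2 * (b.x - a.x)) / d};
}

double dist(Pt p, Pt q) { return std::hypot(p.x - q.x, p.y - q.y); }

// Circumradius divided by the shortest edge length (the edge ratio).
double edgeRatio(Pt a, Pt b, Pt c) {
    double R = dist(circumcenter(a, b, c), a);
    double shortest = std::min({dist(a, b), dist(b, c), dist(c, a)});
    return R / shortest;
}

// A triangle is bad if its edge ratio exceeds the bound B.
bool isBad(Pt a, Pt b, Pt c, double B) { return edgeRatio(a, b, c) > B; }
```

For the right isosceles triangle (0,0), (1,0), (0,1), the circumcenter is (0.5, 0.5) and the ratio is √0.5 ≈ 0.707, so it is not bad for B = 1, whereas a sliver such as (0,0), (1,0), (0.5, 0.05) has a ratio of about 5.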

There are many Delaunay refinement algorithms in the literature that use this strategy to insert Steiner points. Ruppert's Delaunay refinement algorithm [Rup95] and Chew's Delaunay refinement algorithm [Che89b] are the best-known ones; they perform well in practice and have provable guarantees on both mesh quality and termination. Most other algorithms ([She00, Ü09, MPW03, EG01]) follow the ideas of these two algorithms. Here we only detail Ruppert's algorithm.

A segment is said to be encroached if a point other than its endpoints lies on or inside its diametral circle. Any encroached segment should be split into two segments by inserting a point at its midpoint. For example, in Figure 3.2, point p is inside the diametral circle of segment s, i.e., s is encroached by p. After s is split at its midpoint by point v1, s becomes two segments. If the left segment of s is still encroached by p, it should be split again by inserting a point at its midpoint v2.
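The encroachment test reduces to a sign check: p lies on or inside the diametral circle of ab exactly when the angle apb is at least 90°, i.e., (a − p)·(b − p) ≤ 0. The recursive helper below is a simplified illustration of Figure 3.2 that only considers the single point p and ignores the new midpoints as encroachers; `splitUntilFree` and its `maxDepth` guard are hypothetical. Callers must exclude the segment's own endpoints, which this test would otherwise report as encroaching.

```cpp
#include <cassert>

struct Pt { double x, y; };

// p encroaches segment ab iff p is on or inside the diametral circle of ab,
// i.e. the angle apb is at least 90 degrees.
bool encroaches(Pt p, Pt a, Pt b) {
    return (a.x - p.x) * (b.x - p.x) + (a.y - p.y) * (b.y - p.y) <= 0.0;
}

// Splits ab at midpoints until no subsegment is encroached by p and returns
// the number of resulting subsegments. maxDepth stops the recursion if p
// lies on the segment itself.
int splitUntilFree(Pt a, Pt b, Pt p, int maxDepth = 50) {
    if (maxDepth == 0 || !encroaches(p, a, b)) return 1;
    Pt m{(a.x + b.x) / 2, (a.y + b.y) / 2};
    return splitUntilFree(a, m, p, maxDepth - 1) + splitUntilFree(m, b, p, maxDepth - 1);
}
```

For segment (0,0)-(2,0) and p = (0.2, 0.3), the segment is split at (1,0), the left half is still encroached and is split at (0.5,0), and the process stops with three subsegments.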

Ruppert's algorithm starts from the DT for a given PSLG. If there is any encroached segment, split it at its midpoint and update the DT immediately, until no segment is encroached. Then we need to check all triangles in the mesh. If there is any bad triangle, split it by inserting a point at its circumcenter, and update the DT. Encroached segments have higher priority than bad triangles, which means that if the circumcenter of a bad triangle encroaches one or more segments, the circumcenter should be rejected and the encroached segments should be split in advance. The order in which segments or bad triangles are split is arbitrary. Ruppert's algorithm uses Lawson's algorithm [Law77] to maintain the Delaunay property of the triangulation.

Figure 3.2: Segment s is split repeatedly until no segment is encroached.

For input in which no angle is less than 60°, … Besides that, the order of inserting Steiner points is very important. As pointed out in [She96a], a heap of bad triangles indexed by their smallest angle yields a 35% reduction in mesh size over a first-in-first-out queue.
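The ordering idea from [She96a] can be sketched with `std::priority_queue` and a comparator that keeps the triangle with the smallest minimum angle on top, so the worst-shaped triangle is refined first. This is a generic binary heap for illustration, not necessarily how Triangle organizes its queue; `BadTri`, `WorstFirst`, and `minAngle` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <queue>
#include <vector>

struct Pt { double x, y; };

// Interior angle at vertex a of triangle abc, in degrees.
double angleAt(Pt a, Pt b, Pt c) {
    const double kPi = std::acos(-1.0);
    double ux = b.x - a.x, uy = b.y - a.y, vx = c.x - a.x, vy = c.y - a.y;
    return std::atan2(std::fabs(ux * vy - uy * vx), ux * vx + uy * vy) * 180.0 / kPi;
}

double minAngle(Pt a, Pt b, Pt c) {
    return std::min({angleAt(a, b, c), angleAt(b, c, a), angleAt(c, a, b)});
}

struct BadTri { int id; double minAng; };

// Inverted comparison turns the default max-heap into a min-heap on minAng.
struct WorstFirst {
    bool operator()(const BadTri& x, const BadTri& y) const { return x.minAng > y.minAng; }
};

using BadTriHeap = std::priority_queue<BadTri, std::vector<BadTri>, WorstFirst>;
```

Pushing a 45° right triangle, a roughly 5.7° sliver, and a triangle whose smallest angle is about 56° pops them in the order sliver, right triangle, then the last one.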

3.1.2 Off-center-insertion

Recently, Üngör [Ü09] proposed a new type of Steiner point, called the off-center, for Delaunay refinement (see Figure 3.3). Given a bad triangle pqr with shortest edge pq and circumcenter c1, we define the off-center to be the circumcenter c1 if the radius-to-shortest-edge ratio of pqc1 is smaller than or equal to the upper bound B. Otherwise, the off-center is the point c on the perpendicular bisector of pq (and inside the circumcircle of pqr) for which the circumradius of triangle pqc equals B|pq| (see Figure 3.3b). The main difference of this algorithm from Ruppert's algorithm is that it always splits a low-quality triangle at its off-center rather than at its circumcenter.

The author proves that the new algorithm has the same quality and size optimality guarantees as the best known Delaunay refinement algorithms. In experiments, the new algorithm inserts about 40% fewer Steiner points (and hence runs faster) and generates triangulations that have about 30% fewer elements compared with the best previous algorithms.


Figure 3.3: The off-center and the circumcenter of triangle pqr are labeled as c and c1, respectively. (a) If |c1p| ≤ B|pq|, then c = c1. (b) Otherwise, c ≠ c1 and c is obtained by setting |c2p| = B|pq|, where c2 is the circumcenter of triangle pqc.

In practice, this technique is employed by Triangle, except that Triangle adds a coefficient of 0.95 to the formula of the off-centers to improve robustness; i.e., the formula of c in Figure 3.3 is modified to |c2p| = 0.95 × B × |pq|, where B = √2.
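Under the definition above, the off-center lies on the perpendicular bisector of pq at a height h for which the isosceles triangle formed by pq and that point has circumradius B|pq|. With base l = |pq| and height h, that circumradius is (h² + l²/4)/(2h), so h is the larger root of h² − 2Bl·h + l²/4 = 0. The sketch below assumes pq is the shortest edge of a bad triangle pqr and, as a simplification, also treats pq as the shortest edge of pqc1; the optional factor `k = 0.95` imitates Triangle's adjustment. `offCenter` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>

struct Pt { double x, y; };

Pt circumcenter(Pt a, Pt b, Pt c) {
    double d = 2.0 * (a.x * (b.y - c.y) + b.x * (c.y - a.y) + c.x * (a.y - b.y));
    double a2 = a.x * a.x + a.y * a.y, b2 = b.x * b.x + b.y * b.y, c2 = c.x * c.x + c.y * c.y;
    return {(a2 * (b.y - c.y) + b2 * (c.y - a.y) + c2 * (a.y - b.y)) / d,
            (a2 * (c.x - b.x) + b2 * (a.x - c.x) + c2 * (b.x - a.x)) / d};
}

// Off-center of a bad triangle pqr with shortest edge pq. Assumes B >= sqrt(2)/2:
// then the angle at r is below 45 degrees and c1 lies more than |pq|/2 from pq,
// where the circumradius grows with the height along the bisector.
// k = 1 follows the definition; k = 0.95 imitates Triangle's adjustment.
Pt offCenter(Pt p, Pt q, Pt r, double B, double k = 1.0) {
    Pt c1 = circumcenter(p, q, r);
    double l = std::hypot(q.x - p.x, q.y - p.y);
    Pt m{(p.x + q.x) / 2, (p.y + q.y) / 2};          // midpoint of pq
    double h1 = std::hypot(c1.x - m.x, c1.y - m.y);   // height of c1 above pq
    double R1 = (h1 * h1 + l * l / 4) / (2 * h1);     // circumradius of pq c1
    double target = k * B * l;
    if (R1 <= target) return c1;                      // the circumcenter is good enough
    double h = target + std::sqrt(target * target - l * l / 4); // larger root
    return {m.x + (c1.x - m.x) * h / h1, m.y + (c1.y - m.y) * h / h1};
}
```

For p = (0,0), q = (1,0), r = (0.5,5), the circumcenter is (0.5, 2.475) and triangle pqc1 has circumradius about 1.288: with B = √2 the off-center is the circumcenter itself, while with B = 1 it moves down the bisector to height 1 + √0.75 ≈ 1.866.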

3.1.3 Sink-insertion

Edelsbrunner and Guoy [EG01] propose sink-insertion as a new technique to improve the mesh quality of a DT. In 2D, a triangle is a sink triangle if its circumcenter is contained in the interior of the triangle, and the circumcenter of a sink triangle is called a sink. The idea of the algorithm is to eliminate bad triangles by adding sinks as new points to the DT.

Starting from any bad triangle, we need to find a sink to be inserted. For example, for an existing bad triangle t0, we walk from t0 in the direction of its circumcenter into the next triangle t1. Then, starting from t1, we do a similar walk until we find a sink triangle ti. In the end, the circumcenter c of ti is the sink to be inserted by the algorithm. Figure 3.4 illustrates how to find a sink from the bad triangle t0.
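The walk of Figure 3.4 can be sketched as below, assuming the neighbor layout of Section 2.2 (`nei[i]` opposite `v[i]`, vertices counterclockwise). At each step, it leaves through the edge whose outer side contains the current circumcenter; for a non-sink triangle, that edge is the one opposite the largest angle. `findSink` and its step limit are illustrative assumptions, not the implementation of [EG01].

```cpp
#include <array>
#include <cassert>
#include <vector>

struct Pt { double x, y; };
struct Tri { std::array<int, 3> v; std::array<int, 3> nei; }; // v ccw, nei[i] opposite v[i]

// Twice the signed area of abc: > 0 if c is to the left of a -> b.
double orient(Pt a, Pt b, Pt c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

Pt circumcenter(Pt a, Pt b, Pt c) {
    double d = 2.0 * (a.x * (b.y - c.y) + b.x * (c.y - a.y) + c.x * (a.y - b.y));
    double a2 = a.x * a.x + a.y * a.y, b2 = b.x * b.x + b.y * b.y, c2 = c.x * c.x + c.y * c.y;
    return {(a2 * (b.y - c.y) + b2 * (c.y - a.y) + c2 * (a.y - b.y)) / d,
            (a2 * (c.x - b.x) + b2 * (a.x - c.x) + c2 * (b.x - a.x)) / d};
}

// Walks from triangle t towards circumcenters until it reaches a triangle that
// strictly contains its own circumcenter. Returns that sink triangle, or -1 if
// the walk leaves the mesh or exceeds maxSteps.
int findSink(const std::vector<Pt>& P, const std::vector<Tri>& T, int t, int maxSteps = 1000) {
    for (int step = 0; step < maxSteps && t >= 0; ++step) {
        const Tri& tr = T[t];
        Pt c = circumcenter(P[tr.v[0]], P[tr.v[1]], P[tr.v[2]]);
        int crossEdge = -1;
        for (int i = 0; i < 3 && crossEdge < 0; ++i)   // edge opposite v[i]
            if (orient(P[tr.v[(i + 1) % 3]], P[tr.v[(i + 2) % 3]], c) <= 0) crossEdge = i;
        if (crossEdge < 0) return t;                   // c strictly inside: a sink triangle
        t = tr.nei[crossEdge];                         // step across the edge facing c
    }
    return -1;
}
```

In a two-triangle mesh with A(0,0), B(4,0), C(2,3), D(2,−1.5), the obtuse triangle ADB has its circumcenter (2, 7/12) beyond edge AB, so the walk moves into ABC, whose circumcenter (2, 5/6) lies inside it; ABC is the sink triangle.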

The authors claim that sink-insertion achieves about the same mesh quality as circumcenter-insertion, but does so in a more economical manner. This is because bad triangles tend to cluster and share sinks: instead of dealing with a large number of circumcenters, this algorithm only works with a small number of sinks. In addition, the sinks tend to be well separated and thus exhibit fewer dependencies, which is desirable in a parallel implementation.

This sink-insertion algorithm studies the effect of using sinks instead of circumcenters in the Delaunay refinement algorithm for improving the mesh quality. However, their experiment results show that the switch to sinks surprisingly has little effect on the mesh quality. The authors expected the sink-insertion algorithm to be more efficient than the circumcenter-insertion algorithm, because bad triangles tend to cluster and share sinks; instead of dealing with a large number of circumcenters, they therefore work with a small number of sinks. However, according to our experiment results on both synthetic and real-world data, few bad triangles share a sink triangle in practice. Even if some bad triangles cluster at the beginning, after one or two point insertion iterations few of them still cluster together. Also, when the circumcenters are inserted one by one, one circumcenter insertion may remove more than one bad triangle. In other words, circumcenter-insertion may have the same effect as the sink-insertion method, so the sink-insertion method has little effect on the number of Steiner points. In fact, we have tried to implement this method in parallel on the GPU, and the experiment results show that there is little difference between this method and circumcenter-insertion.

Figure 3.4: The circumcenter c is the sink. The arrow shows how to find the sink from a bad triangle t0.

… LC99, LT01, CN99, STU04] and so on. All of these algorithms employ Ruppert's or Chew's Delaunay refinement algorithm. Although the details of these algorithms differ, most of them employ a simple strategy: at each iteration, they choose a set of independent points to insert into the domain, and then update the DT or CDT. The criteria for choosing the independent set differ for each algorithm, but all of them show that the number of iterations is a function of L and s, where L is the diameter of the domain and s is the length of the smallest edge in the output mesh. Each independent set can be handled by Ruppert's or Chew's method. Furthermore, some of these algorithms can be generalized to three dimensions. All these algorithms follow the same paradigm.

However, in order to fully utilize the GPU hardware, a parallel algorithm usually needs to have regularized work and localized data access. It is not clear how to achieve these while adopting this strategy.

Recently, Nasre et al. [NBP13] reported that their GPU algorithm for Delaunay refinement gains up to 80 times speedup over the sequential implementation. In their algorithm, they try to insert the circumcenters of bad triangles in parallel. Before a circumcenter c is inserted, all triangles whose circumcircles contain c are found and marked. Then they delete the marked triangles, and re-triangulate the deleted region after inserting c. When all circumcenters are inserted at the same time, one triangle can be marked by more than one circumcenter. In this situation, only one circumcenter can be inserted, and the other circumcenters have to wait. However, we could not reproduce the results they reported when trying their code in LonestarGPU. We ran their code on uniformly distributed points, as required by their code, and found that the output triangulation is not a DT at all.

3.2 Issues for a GPU Delaunay Refinement Algorithm

Generally, there are several issues that need to be addressed when designing a GPU algorithm for Delaunay refinement:

1. How to choose Steiner points in parallel? Should we simply insert a Steiner point for every bad triangle, or only for a part of them?

2. How to ensure that all Steiner points obtained from the first question are pairwise independent? Or is it too aggressive to require all Steiner points to be pairwise independent? Is there any other way to define the minimum separation between any two Steiner points?

3. How to efficiently simulate, on the GPU, the priority queue used in the sequential implementation?

4. How to handle numerical non-robustness and degenerate cases to obtain robust results?

5. How to handle boundary refining, and make sure the algorithm terminates?

Last but not least, there is the issue of how to make the GPU algorithm run fast. In the next section, we present the GPU-QM algorithm, which is a framework of the whole algorithm and solves all the issues mentioned above except for the fifth one. In order to solve the fifth issue, several special mechanisms are needed, which will be introduced in Section 3.4. In Section 3.5, we will revisit the second issue to make the GPU-QM algorithm run faster.

In this section, we focus on the algorithm/framework of the GPU-QM algorithm. The input is a stable subcomplex of a DT for a set of points S, which by definition is a subcomplex whose circumcenters all lie in the interior of its underlying space. We use B = 1 as the upper bound of the edge ratio, so in the output DT, all angles are larger than or equal to 30°.
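The 30° bound follows from the law of sines: every edge has length 2R sin θ, where θ is the opposite angle, and the shortest edge is opposite the smallest angle. Hence, for any B ≥ 1/2 (the value B = √2 is added only for comparison with Triangle's setting in Section 3.1):

```latex
\ell_{\min} = 2R\sin\theta_{\min}
\;\Longrightarrow\;
\frac{R}{\ell_{\min}} = \frac{1}{2\sin\theta_{\min}} \le B
\;\Longleftrightarrow\;
\theta_{\min} \ge \arcsin\frac{1}{2B},
\qquad
B = 1:\ \theta_{\min} \ge 30^\circ,
\qquad
B = \sqrt{2}:\ \theta_{\min} \ge \arcsin\frac{1}{2\sqrt{2}} \approx 20.7^\circ .
```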

Although there are many algorithms that we can use to generate the initial DT of S, since our algorithm is a GPU algorithm, we choose the GPU algorithm described in [QCT12] to generate the DT for S. The mechanisms for refining the boundary are deferred to Section 3.4.

3.3.1 Motivation and algorithm overview

Recall the three steps of a sequential Delaunay refinement algorithm, and let us consider the normal case with no boundary. For a bad triangle t, we first compute its circumcenter c and find the container of c (point location). Then we insert c into the mesh and maintain the mesh as a DT by using edge flips. These steps are repeated until no more bad triangles exist. If we directly simulate the sequential algorithm on the GPU, a naïve method can be designed as follows:

Naïve method 1: Let one thread handle one triangle. For any bad triangle t, compute its circumcenter c, and do point location for c to find its container triangle. Then insert all such c into the mesh, using an atomic operation to avoid inserting more than one Steiner point into the same triangle. In the end, do edge flips in parallel to maintain the DT.

When a Steiner point p is inserted, a region near p has to be modified; this region is called the influence region of p. For a Steiner point p, its influence region is the union of the triangles in its star in the DT after inserting p, or, equivalently, of the triangles in its prestar in the DT before inserting p. When many Steiner points are inserted in parallel, their influence regions may overlap. Points whose influence regions are pairwise disjoint are pairwise independent.
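A brute-force sketch of these notions, assuming plain floating-point predicates: the prestar of p is taken to be every triangle whose circumcircle strictly contains p (tested with the standard incircle determinant, which is positive for a counterclockwise triangle), and two points are treated as independent when their prestars share no triangle. The thesis may intend a stricter test, for example one that also rejects prestars sharing an edge; `prestar` and `independent` are hypothetical helpers.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

struct Pt { double x, y; };
struct Tri { std::array<int, 3> v; }; // vertex indices in counterclockwise order

// > 0 iff d lies strictly inside the circumcircle of counterclockwise abc
// (plain floating point; a robust mesher would use exact predicates).
double incircle(Pt a, Pt b, Pt c, Pt d) {
    double adx = a.x - d.x, ady = a.y - d.y;
    double bdx = b.x - d.x, bdy = b.y - d.y;
    double cdx = c.x - d.x, cdy = c.y - d.y;
    double alift = adx * adx + ady * ady;
    double blift = bdx * bdx + bdy * bdy;
    double clift = cdx * cdx + cdy * cdy;
    return alift * (bdx * cdy - cdx * bdy) + blift * (cdx * ady - adx * cdy) +
           clift * (adx * bdy - bdx * ady);
}

// Prestar of p: all triangles whose circumcircle strictly contains p
// (a full scan of the mesh, for clarity rather than speed).
std::vector<int> prestar(const std::vector<Pt>& P, const std::vector<Tri>& T, Pt p) {
    std::vector<int> out;
    for (int t = 0; t < (int)T.size(); ++t)
        if (incircle(P[T[t].v[0]], P[T[t].v[1]], P[T[t].v[2]], p) > 0) out.push_back(t);
    return out;
}

// Simplified independence test: the two prestars share no triangle.
bool independent(const std::vector<int>& a, const std::vector<int>& b) {
    for (int t : a)
        if (std::find(b.begin(), b.end(), t) != b.end()) return false;
    return true;
}
```

In the two-triangle mesh A(0,0), B(4,0), C(2,3), D(2,−1.5) with triangles ADB and ABC, the point (2, 2.8) conflicts only with ABC, (2, −1.4) only with ADB, and (2, −1.0) with both; so the first two points are independent under this test, while the first and third are not.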

This method may lead to many more Steiner points than the sequential implementation produces, and even to infinite loops. For example, assume p and q are two Steiner points inserted in the same round to eliminate bad triangles tp and tq. If p and q are not independent of each other, their prestars partially or fully overlap. In an extreme case, when their prestars fully overlap, inserting
