
VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY

HO CHI MINH UNIVERSITY OF TECHNOLOGY

FACULTY OF COMPUTER SCIENCE AND ENGINEERING

——————– * ——————–

CAPSTONE PROJECT REPORT

3D POINT CLOUD RECONSTRUCTION

Major: COMPUTER SCIENCE

THESIS COMMITTEE: COMPUTER SCIENCE - 02 CLC

SUPERVISOR: NGUYEN DUC DUNG

REVIEWER: LE THANH SACH

——o0o——

STUDENT: NGUYEN PHUOC NGUYEN PHUC - 2053342

HO CHI MINH CITY, JUNE 2024


As the author of this work, Nguyen Phuoc Nguyen Phuc - 2053342, I hereby declare that the thesis entitled "3D Point Cloud Reconstruction" represents my original work and findings, conducted under the supervision of Dr. Nguyen Duc Dung.

I acknowledge that:

• The work presented in this thesis, including the introduction, current work survey, baseline development, and experimental analysis, is based on my independent research conducted at Ho Chi Minh University of Technology.

• All references, sources of information, and contributions from other researchers or sources have been appropriately cited and acknowledged in accordance with academic conventions and citation guidelines.

• Any assistance, technical support, or guidance received during the course of this research project has been duly acknowledged in the thesis.

• The results, discussions, and conclusions drawn in this thesis are the outcome of my analysis and interpretation of the data gathered during this study.

• The thesis has not been previously submitted for any degree or qualification at this or any other institution.

Ho Chi Minh City, June 2024

Author

Nguyen Phuoc Nguyen Phuc

Acknowledgments

We would like to express our heartfelt gratitude for your unwavering support throughout the journey of this project. Your guidance and mentorship have been invaluable, and your constant inspiration, feedback, and direction have truly made a significant difference. We are deeply thankful for your dedication to our success, and we could not have accomplished this without your help.

We also want to extend our appreciation to all the other professors and teachers who have equipped us with the knowledge and skills necessary to carry out this project. Your dedication to imparting knowledge and fostering a love for learning has been instrumental in our academic growth.

Additionally, we want to thank our friends who have provided valuable insights and ideas, contributing to the development of this project. Your willingness to share your thoughts and collaborate with us has been a source of encouragement and inspiration. This project would not have been possible without the collective efforts, encouragement, and guidance of all these individuals. We are sincerely grateful for their support, and we look forward to continuing to learn and grow under their mentorship and guidance.

Once again, thank you from the bottom of our hearts.


Content Summary

Introduction (Chapter 1): This chapter motivates the research on point cloud completion, outlining the importance of accurate 3D object reconstruction from incomplete data. It defines point cloud missingness and discusses its causes, highlighting the real-world applications where this problem arises.

Background Knowledge (Chapter 2): This chapter provides the essential background for understanding 3D point cloud completion. This foundation prepares the reader for the deeper dive into the mechanisms and advancements of point cloud completion in the following sections.

Survey (Chapter 3 + Chapter 4): These two chapters provide a comprehensive survey of existing point cloud completion methods. They cover the main datasets and evaluation metrics (Chapter 3) and previously proposed approaches (Chapter 4). A comparative analysis of the various methods highlights their strengths and weaknesses, offering insights into the current state of the art.

Baseline Method (Chapter 5): This chapter focuses on the chosen baseline method, AdaPoinTr, for further investigation. It provides a detailed description of the architecture, including the encoder-decoder structure, multi-head attention mechanism, positional encoding, and other key components.

Improving the Baseline Method (Chapter 6 + Chapter 7): These two chapters shed light on the limitations of existing baseline models for 3D point cloud completion. We explore the challenges these models face when dealing with incomplete and noisy data, which are hallmarks of real-world scenarios. To bridge this gap and enable application to real-life problems, we then propose advancements that focus on enhancing the model's ability to handle imperfect data and generalize effectively to unseen real-world situations. These advancements aim to pave the way for robust and reliable point cloud completion in practical applications.

Experiments and Results (Chapter 8): This chapter presents the experimental results of applying AdaPoinTr to various datasets. It analyzes the performance of the model in terms of accuracy and robustness, comparing it to other methods and highlighting its strengths and limitations. The results offer valuable insights into the effectiveness of AdaPoinTr and pave the way for further research and development.

Conclusion (Chapter 9): This chapter summarizes the key findings of the thesis, emphasizing the contributions of the point cloud completion research. It discusses the limitations of the current approach and outlines potential avenues for future exploration. The chapter concludes by reiterating the importance and significance of point cloud completion in various computer vision applications.

Contents

1 Introduction
1.1 The Rise of 3D Data and Point Clouds
1.2 Challenges of Raw Point Cloud Data
1.3 3D Point Cloud Reconstruction: A Solution
1.4 Project Scope: Point Cloud Completion
1.5 Project Goals and Deliverables

2 Background Knowledge
2.1 Fundamentals of 3D Point Clouds
2.1.1 What is 3D Point Cloud
2.1.2 Data Representation
2.1.3 Point Cloud Processing Techniques
2.2 Machine Learning and Deep Learning in 3D Point Cloud
2.2.1 Machine Learning
2.2.2 Deep Learning

3 Datasets and Metrics for 3D Point Cloud
3.1 Datasets
3.1.1 ShapeNet [14] - A Large-Scale Dataset for 3D Shape Understanding
3.1.2 ModelNet [18]: A Clean and Categorized Dataset for 3D Point Cloud Analysis
3.1.3 PCN [21] Dataset: A Benchmark for Point Cloud Completion
3.1.4 S3DIS [19]: Unveiling the Structure of Indoor Scenes
3.1.5 KITTI [20]
3.2 Metrics
3.2.1 Chamfer Distance
3.2.2 Earth Mover's Distance
3.2.3 F-Score

4 Literature Survey
4.1 Methods
4.1.1 Point-based methods
4.1.2 Convolution-based methods
4.1.3 Graph-based methods
4.1.4 GAN-based methods
4.1.5 Transformer-based methods
4.2 Comparison
4.3 Survey Conclusion

5 AdaPoinTr: Diverse Point Cloud Completion with Adaptive Geometry-Aware Transformers
5.1 Why Choosing AdaPoinTr for 3D Point Cloud Completion
5.2 Related Knowledge
5.3 AdaPoinTr's Architecture
5.3.1 Set-to-Set Translation with Transformers
5.3.2 Point Proxy
5.3.3 Geometry-aware Transformer Block
5.3.4 Query Generator
5.3.5 Multi-Scale Point Cloud Generation
5.3.6 Adaptive Denoising Queries
5.3.7 Optimization

6 Enhancing the Existing Loss
6.1 Drawbacks of Chamfer Distance Loss
6.2 Solution: InfoCD [55] - A Contrastive Chamfer Distance Loss
6.2.1 Core Concept of InfoCD
6.3 InfoCD Loss
6.3.1 Preliminaries
6.3.2 Chamfer Distance Loss
6.3.3 InfoCD Loss Function
6.3.4 Analysis

7 Multi-Object Completion
7.1 The Limitation of Synthetic Training
7.1.1 ShapeNet Normalization
7.1.2 AdaPoinTr Initial Setting
7.1.3 Limited Generalizability
7.2 A More General Learning
7.2.1 The Partial Normalize
7.2.2 Anti One Direction
7.2.3 Uniform Random Sampling
7.3 Multi-ShapeNet Dataset
7.3.1 Steps to Create ShapeNet Room
7.4 Transfer Learning on the S3DIS Dataset

8 Experiments
8.1 Object Point Cloud Completion
8.1.1 Benchmark for Diverse Point Completion
8.1.2 Results on PCN dataset
8.1.3 Results on ShapeNet34
8.1.4 Results on ShapeNet55 dataset
8.2 Multi-Object Completion
8.2.1 Setting for Multi-Object Completion
8.2.2 Experiment results on Multi-ShapeNet
8.2.3 Experiment on S3DIS

9 Conclusion and Future Research Direction
9.1 Conclusion
9.2 Future Research Direction


List of Tables

3.1 Summary of existing datasets for point cloud completion [5]
4.1 Results on the PCN dataset [15]
4.2 Results on the ShapeNet55 dataset [15]
4.3 Results on the ShapeNet34 dataset [15]
4.4 Complexity analysis of existing methods
8.1 Result on PCN dataset after 100 epochs
8.2 Result on PCN dataset (InfoCD) after 100 epochs
8.3 Result on ShapeNet34 dataset (CD) after 10 epochs
8.4 Result on ShapeNet Unseen 21 (CD) after 10 epochs
8.5 Result on ShapeNet34 dataset (InfoCD) after 10 epochs
8.6 Result on ShapeNet21 Unseen (InfoCD) after 10 epochs
8.7 Result on ShapeNet55 dataset (CD) after 10 epochs (part 1)
8.8 Result on ShapeNet55 dataset (CD) after 10 epochs (part 2)
8.9 Result on ShapeNet55 dataset (InfoCD) after 10 epochs (part 1)
8.10 Result on ShapeNet55 dataset (InfoCD) after 10 epochs (part 2)
8.11 Test result on Multi-ShapeNet, 200 epochs, with Chamfer Distance
8.12 Test result on Multi-ShapeNet, 200 epochs, with Info Chamfer Distance

List of Figures

1.1 3D Point Cloud for Autonomous Driving Car
1.2 3D Point Cloud for Robotics
1.3 Example building point cloud after floor partitioning [4]
1.4 Schematic of complete point cloud and missing 70% point cloud [5]
1.5 Reasons for incomplete point clouds [5]
2.1 Convolutional Neural Network for Images [10]
2.2 Encoder-Decoder Transformer Architecture [14]
2.3 An illustration of a graph-based network [17]
3.1 Some categories in the ShapeNet dataset [15]
4.1 End-to-end network for point cloud completion; N represents the dimension of the latent space [5]
4.2 Description of two-step folding decoding
4.3 The architecture of FoldingNet [35]
5.1 PoinTr [26] architecture
5.2 Comparison of the vanilla Transformer block and the proposed geometry-aware Transformer block [14]
5.3 Three types of queries for the transformer decoder [15]
5.4 Improvements made by AdaPoinTr compared to PoinTr [15]
6.1 Illustration of comparison among CD, MI, and InfoCD with different numbers of samples [55]
6.2 Illustration of the moving directions of matched points
7.1 Some examples from ShapeNet; all objects, despite their size, share the same scale and orientation
7.2 Examples of applying the original model to objects that are out of distribution
7.3 Some room examples from Multi-ShapeNet
7.4 Some rooms collected from the S3DIS dataset
7.5 Some objects collected from the S3DIS dataset
8.1 Inference on PCN after 100 epochs
8.2 Inference results on ShapeNet34 (easy missing)
8.3 Inference results on ShapeNet21 (easy missing)
8.4 Inference results on ShapeNet34 and 21 with 25% missing points
8.5 Point spreading comparison between a) CD and b) InfoCD
8.6 Inference results on ShapeNet55 (easy missing)
8.7 Multi-ShapeNet room, view 1: a) full room before cutting, b) after removing 50% of points, and c) after completion with AdaPoinTr
8.8 Multi-ShapeNet room, view 2: a) full room before cutting, b) after removing 50% of points, and c) after completion with AdaPoinTr
8.9 Multi-ShapeNet inference results
8.10 Transfer inference on S3DIS; the left image is the raw object, the right one is after completion


Chapter 1

Introduction

1.1 The Rise of 3D Data and Point Clouds

In recent years, the world of technology has witnessed an incredible surge in the development of 3D technologies, leading to remarkable advancements in various fields. The applications of 3D technology have transcended the realm of imagination, permeating our daily lives and revolutionizing industries such as autonomous vehicles, architecture, gaming, and more. These innovations have transformed the way we interact with the world and opened new avenues for creativity and problem-solving.

(a) Illustration of 3D point cloud segmentation following the road slope; ground points are green, obstacles are pink [1]. (b) Examples of autonomous vehicles; in all models, the LiDAR sensor can be seen on the roof of the car [2].

Figure 1.1: 3D Point Cloud for Autonomous Driving Car


Figure 1.2: 3D point cloud (colored) with pose estimates (grey) of OP-Net AP on real-world data without ICP refinement for ring screws. The gripper (blue) is visualized at the chosen grasp pose for execution, together with the joint configuration of the robot (white) [3].

The emergence of 3D technology has been particularly influential in the development of autonomous vehicles, where it enables precise environmental mapping and navigation. Architects and urban planners now utilize 3D models to visualize and design structures with unparalleled precision. Gamers are immersed in lifelike virtual worlds, thanks to the realistic 3D environments created for their enjoyment. These are just a few examples of how 3D technology has redefined our experiences and possibilities.

Within the ever-evolving landscape of 3D technology, point clouds stand as a revolutionary force, silently gathering and transforming the intricate details of our physical world. These dense collections of data points act like digital detectives, meticulously capturing the nuances of objects, spaces, and environments. They are more than just dots on a screen; they are the raw material for a new era of understanding and interacting with the world around us.

One of the most potent applications of point clouds lies in 3D scanning. Imagine holding a wand that, instead of casting spells, captures the essence of reality in a digital tapestry of points. This is the power of 3D scanners equipped with laser or LiDAR technology. They sweep their beams across objects and landscapes, transforming every bump, curve, and crevice into a precise numerical map. From the weathered facade of an ancient temple to the intricate machinery of a modern factory, point clouds breathe digital life into the physical world.

Figure 1.3: Example building point cloud after floor partitioning [4]

But point clouds are not mere static representations. They are the building blocks for a dynamic digital realm. Object recognition algorithms dance across these data points, identifying furniture in a room, cars on a highway, or even individual trees in a forest. This unlocks a world of possibilities: robots navigating complex environments, autonomous vehicles making split-second decisions, and engineers analyzing the structural integrity of aging infrastructure.

The reach of point clouds extends beyond the tangible realm. In the domain of medicine, they transform CT scans and MRIs into intricate 3D models of bones, organs, and even blood vessels. This allows surgeons to virtually explore the human body, plan delicate procedures with pinpoint accuracy, and even print custom implants that perfectly match a patient's anatomy.

From the bustling streets of modern cities to the depths of archaeological digs, point clouds are weaving a digital tapestry of our world. They are not just capturing reality; they are transforming it, enabling us to analyze, manipulate, and even recreate the spaces we inhabit. As this technology continues to evolve, the possibilities seem endless. So, the next time you look around, remember: amidst the familiar sights, a silent symphony of data points is playing out, capturing the essence of our world and paving the way for a future where the boundaries between the physical and digital blur beautifully.

1.2 Challenges of Raw Point Cloud Data

While 3D point clouds offer a valuable representation of object surfaces, their raw form presents several challenges that hinder their direct use in various applications. During data acquisition, the 3D laser scanner is affected by the characteristics of the measured object, the processing method, and the environment, inevitably leading to missing points (Fig. 1.4). As shown in Fig. 1.5, the main reasons can be attributed to specular reflection, signal absorption, occlusion by external objects, self-occlusion of objects, and blind spots. The former two are due to the surface material of the objects, which might absorb or reflect the LiDAR signal in an unexpected way. The latter three are mainly due to occlusion, which can be completed with the aid of other parts of the objects or by utilizing multi-source data. Moreover, the stability of the 3D scanner during the scanning process also has a particular influence on the scanning quality. Here is a closer look at these limitations:

• Sparsity: One of the most significant challenges is sparsity. Raw point clouds often lack a sufficient number of data points to capture the complete and intricate details of an object's surface. This sparsity can lead to an incomplete picture of the object, limiting its usefulness in tasks like detailed object recognition or high-fidelity 3D model generation.

• Noise: Real-world sensor limitations and environmental factors can introduce noise into the point cloud data. This noise manifests as inaccurate or outlier points that deviate from the actual object's surface. The presence of noise can significantly impact the accuracy of downstream applications that rely on precise measurements and clean data.

• Irregularity: Unlike a structured grid, raw point clouds exhibit an irregular distribution of points. This irregularity arises from the nature of the data acquisition process, where points are captured at varying densities and distances depending on the sensor and object characteristics. This irregular distribution can make it difficult to perform analysis and processing tasks on the point cloud data using conventional methods designed for structured formats.

• Incomplete Information: Raw point clouds often lack additional information beyond the 3D coordinates of each point. This missing information could include color data (RGB values), surface normals, or material properties. The absence of such details can limit the usability of the point cloud in applications like object classification or scene understanding, where this additional information plays a crucial role.

Figure 1.4: Schematic of complete point cloud and missing 70% point cloud [5]

Figure 1.5: Reasons for incomplete point clouds [5]

After data collection is completed, the point clouds still need to undergo a series of processing steps, such as denoising, smoothing, registration, and fusion. At the same time, these operations can significantly exacerbate the missing points. This will not only affect data integrity and lead to topology errors, but also affect the quality of point cloud refactoring, 3D model reconstruction, local spatial information extraction, and subsequent processing.

1.3 3D Point Cloud Reconstruction: A Solution

The limitations of raw point cloud data (sparsity, noise, and irregularity) become significant hurdles in tasks like object recognition, scene understanding, and 3D model generation. 3D point cloud reconstruction tackles these issues by transforming the data into a more complete and accurate representation of the underlying object or scene. This refined representation can be achieved through various techniques, each addressing specific aspects of the raw data:

• Point Cloud Completion: This type of reconstruction focuses on filling in the missing information present in sparse point clouds. Techniques like interpolation, nearest-neighbor search, and advanced algorithms like deep learning can be employed to estimate the missing points and create a denser representation of the object's surface. This is crucial for applications where a complete 3D model is necessary, such as 3D printing or detailed object analysis.

• Point Cloud Denoising: Raw point cloud data can be corrupted by noise due to sensor limitations or environmental factors. Denoising techniques aim to remove these inaccuracies while preserving the true geometry of the object. Common approaches include statistical filtering methods and machine learning algorithms trained to distinguish between valid points and noise. Denoised point clouds are essential for tasks where accurate geometric measurements are required, such as robotic manipulation or scene understanding in autonomous vehicles.

• Point Cloud Smoothing: Irregularities in the distribution of points within a cloud can lead to a rough and uneven surface representation. Smoothing techniques aim to address this by creating a smoother and more continuous surface while maintaining the overall shape of the object. This can be achieved through geometric averaging or advanced filtering methods. Smoothing is particularly beneficial for applications like visualization and generating high-quality meshes for 3D models used in virtual reality or augmented reality experiences.

• Topological Correction: In some cases, raw point clouds may contain topological errors, such as holes or inconsistencies in the surface. Topological correction methods aim to identify and address these errors by filling holes, reconnecting disconnected components, and ensuring a valid surface representation. This is crucial for tasks like watertight mesh generation or 3D object analysis that relies on a clean and well-defined surface.

By addressing these specific aspects of raw point cloud data, reconstruction techniques pave the way for a wide range of applications in various fields. The choice of specific reconstruction methods depends on the desired outcome and the nature of the data itself.

1.4 Project Scope: Point Cloud Completion

This capstone project tackles the challenge of point cloud completion within the realm of 3D point cloud reconstruction. Raw point clouds often suffer from sparsity, meaning they lack a sufficient number of data points to accurately capture an object's complete surface. This sparsity can significantly hinder downstream applications like 3D object recognition, detailed model generation for 3D printing, and scene understanding in robotics. Our project focuses on developing and evaluating a method for effective point cloud completion. We aim to achieve this by exploring techniques that can analyze the existing points within a sparse cloud and utilize that information to estimate and fill in the missing data points. This process should result in a denser and more comprehensive representation of the object's surface.

1.5 Project Goals and Deliverables

This project tackles the challenge of incomplete 3D point clouds. These point clouds, like digital sketches, lack the full picture. Our goal is to transform them!

We’ll achieve this by:

• Finding the Best Completion Method: We'll explore existing techniques, from classic methods to cutting-edge AI, to find the one that most accurately fills in the missing data.

• Pushing the Boundaries: We'll build upon this chosen method, adding our own innovative touches to make it even better at revealing the missing details.

• Testing in the Real World: Theories need proof! We'll unleash our improved method on various datasets of incomplete point clouds, like solving puzzles. This will help us refine it and contribute valuable knowledge to the field.

Following an introduction that ignites interest in point cloud completion and its applications, Chapter 2 equips readers with foundational knowledge on 3D point clouds and reconstruction techniques. Chapters 3 and 4 then delve into existing datasets, metrics, and research through a literature review, building a strong base for our proposed approach. The heart of the report lies in Chapters 5, 6, and 7, where we introduce our chosen method for point cloud completion, discuss its limitations, and detail the innovative improvements we implemented. Chapter 8 details the experimentation process, presenting the datasets used and the achieved results in terms of accuracy and effectiveness. Finally, Chapter 9 summarizes our key findings, the impact of this research, and exciting possibilities for future exploration, while the last part provides comprehensive references.


Chapter 2

Background Knowledge

This chapter lays the groundwork for understanding the project by establishing some key concepts. We'll explore fundamental technical aspects relevant to the chosen field, providing a foundation upon which the later chapters will build. This background knowledge equips you with the necessary context to grasp the technical details and complexities presented throughout the report.

2.1 Fundamentals of 3D Point Clouds

2.1.1 What is 3D Point Cloud

Point clouds serve as the foundation for this project. They represent 3D objects or environments as collections of discrete points, where each point holds a specific location in space. These locations are typically defined by their X, Y, and Z coordinates, akin to a three-dimensional grid reference system. Imagine a vast collection of dots in space, meticulously positioned to form a digital representation of the 3D world.

While seemingly simple, this discrete representation offers a powerful and efficient way to store and process complex 3D shapes. It is important to note that some point clouds may contain additional information beyond just spatial coordinates. Depending on the data acquisition method, points might also store intensity values, reflecting the light captured at that specific location. This additional data can be valuable for further analysis and processing.
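As a minimal illustration of this layout (the coordinate and intensity values below are arbitrary toy data, not taken from any dataset used in this thesis), a point cloud can be held in a plain NumPy array, with one optional intensity column:

```python
import numpy as np

# A point cloud is just an (N, 3) array of XYZ coordinates.
# Some sensors add a per-point intensity, giving an (N, 4) array.
points = np.array([
    [0.12, 1.30, 0.05],   # x, y, z of point 0
    [0.15, 1.28, 0.07],
    [0.90, 0.44, 1.10],
], dtype=np.float32)

intensity = np.array([0.8, 0.7, 0.2], dtype=np.float32)  # optional channel
cloud = np.hstack([points, intensity[:, None]])           # (N, 4) layout

print(cloud.shape)  # (3, 4)
```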

2.1.2 Data Representation

• Octrees: Octrees are hierarchical data structures that partition 3D space into cubic regions (octants) at multiple resolutions. They enable efficient spatial indexing and querying of point cloud data, particularly for spatial partitioning and neighbor-search operations. Octrees offer adaptive resolution, allowing finer representation of dense regions and coarser representation of sparse regions.

• Voxel Grids: Voxel grids divide 3D space into equally sized cubic volumes (voxels), providing a uniform representation of space. They simplify spatial operations and computations and are suitable for voxel-based reconstruction and analysis algorithms.

Considerations for efficient storage, transmission, and processing of point cloud data include selecting the appropriate data representation based on factors such as data size, computational requirements, and specific application needs. Additionally, compression techniques and optimization algorithms can be employed to minimize storage and transmission overhead while maintaining data fidelity and accessibility. Understanding the advantages and disadvantages of each representation method is crucial for optimizing the performance and scalability of point cloud processing workflows.

2.1.3 Point Cloud Processing Techniques

Point clouds, despite their simplicity, often require processing before they can be effectively utilized. This section delves into some general techniques used to manipulate and analyze point cloud data:

• Filtering: Imagine sifting through a bucket of sand to find seashells. Filtering removes unwanted points from a point cloud, such as noise caused by sensor limitations or environmental factors. This can involve techniques like statistical outlier removal or applying distance thresholds to eliminate points outside a specific range (see the sketch after this list).

• Segmentation: Have you ever tried untangling a knot of Christmas lights? Segmentation tackles a similar challenge for point clouds. It involves grouping points that belong to the same object, separating individual objects within a scene. This allows for focused analysis and processing of specific objects captured in the point cloud.

• Registration: Imagine stitching together multiple photographs to create a panorama. Point cloud registration achieves a similar goal. It aligns multiple point clouds of the same scene captured from different viewpoints, creating a unified and complete representation of the environment.

• Feature Extraction: Just as fingerprints provide unique identification, point clouds hold valuable features that can be extracted for further analysis. This might involve identifying geometric properties like edges, planes, or curvature within the point cloud data. These features can be crucial for tasks like object recognition, classification, or robot navigation.
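Below is a minimal sketch of the statistical outlier removal mentioned in the Filtering bullet, assuming SciPy's cKDTree for the neighbor queries; the parameter values (k = 16, std_ratio = 2.0) are illustrative defaults, not values used in this thesis:

```python
import numpy as np
from scipy.spatial import cKDTree

def remove_statistical_outliers(points, k=16, std_ratio=2.0):
    """Drop points whose mean distance to their k nearest neighbors is
    unusually large (mean + std_ratio * std over the whole cloud)."""
    tree = cKDTree(points)
    # Query k+1 neighbors because the closest neighbor is the point itself.
    dists, _ = tree.query(points, k=k + 1)
    mean_dists = dists[:, 1:].mean(axis=1)       # skip the self-distance
    threshold = mean_dists.mean() + std_ratio * mean_dists.std()
    keep = mean_dists < threshold
    return points[keep]

cloud = np.random.rand(5000, 3)
noise = np.random.rand(50, 3) * 5.0              # far-away outliers
filtered = remove_statistical_outliers(np.vstack([cloud, noise]))
print(filtered.shape)
```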

These are just a few of the common processing techniques used on point clouds. While we won't delve into intricate details here, understanding their existence provides a foundation for appreciating the challenges addressed by point cloud completion, which will be the focus of later chapters.


2.2 Machine Learning and Deep Learning in 3D Point Cloud

The field of 3D point cloud reconstruction has been revolutionized by the transformative power of Artificial Intelligence (AI), particularly through the advancements in Machine Learning (ML) and Deep Learning (DL). These techniques have become powerful tools for unlocking the potential of point cloud data. While traditional methods laid the groundwork, Deep Learning, a subfield of ML, has emerged as a game-changer. This chapter delves into the core concepts of Machine Learning and how it empowers computers to "learn" from data, specifically focusing on its role in point cloud processing. We'll then explore Deep Learning in more detail, highlighting its ability to capture intricate patterns within the data, leading to groundbreaking results in reconstruction tasks. Additionally, we'll touch upon some common ML and DL methods frequently encountered in this field, equipping you with foundational knowledge for further exploration.

2.2.1 Machine Learning

Machine Learning (ML) encompasses a broad range of algorithms and techniques that empower computers to learn from data. These algorithms can analyze vast datasets and identify patterns and relationships, allowing them to make data-driven predictions or decisions without explicit programming for each specific task. In the realm of 3D point cloud reconstruction, ML provides valuable tools for various tasks.

Some commonly used techniques for 3D point clouds include:

• K-Nearest Neighbors (KNN) [6]: K-Nearest Neighbors is a straightforward and intuitive algorithm used for both classification and regression tasks in machine learning. Instead of training a model on the data, KNN stores the entire dataset and makes predictions by finding the K closest data points to the query point based on a distance metric, typically Euclidean distance. For classification tasks, KNN assigns the most common class label among the K nearest neighbors, while for regression tasks, it calculates the average of the target values of the K nearest neighbors. KNN is easy to understand and implement, and it is widely used in 3D point cloud tasks when extracting local features. KNN identifies the k closest points surrounding a point, acting like a detective examining its immediate surroundings. By analyzing the properties (location, intensity) of these neighboring points, we can extract valuable local features of the point cloud in that specific region.

• Clustering: Clustering is a fundamental task in point cloud processing, aiming to group points into distinct clusters. This helps us segment the point cloud into meaningful objects or regions. Clustering can also be used to identify and remove outlier points, effectively denoising the point cloud. Points that fall outside of established clusters are more likely to be noise or artifacts, and can be filtered out after the clustering process. One common clustering method is DBSCAN [7].

• Multi-Layer Perceptron (MLP) [8]: MLPs can analyze the entire point cloud. Imagine a team of analysts working together to understand a complex scene. MLPs function similarly, but with interconnected layers of artificial neurons instead of analysts. These neurons, inspired by the structure of the human brain, process information layer by layer, allowing the MLP to learn complex patterns within the data. In the context of point clouds, MLPs excel at feature extraction. They can analyze the entire set of points and extract high-level features that capture the overall shape, curvature, and other geometric properties of the scene. In fact, the MLP is the key processing method in many point cloud models (a minimal sketch follows this list), and it is the bridge leading to a "deeper" concept: Deep Learning.
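The following is a minimal PointNet-style sketch of how a shared MLP can process an unordered point cloud: the same MLP is applied to every point, and a max-pooling step aggregates a permutation-invariant global feature. This is a generic illustration of the shared-MLP idea, not the architecture used later in this report; all layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class SharedMLPEncoder(nn.Module):
    """Apply the same small MLP to every point, then max-pool to a
    single global feature describing the whole cloud."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, feat_dim),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (B, N, 3) -> per-point features: (B, N, feat_dim)
        per_point = self.mlp(points)
        # Max-pooling is permutation-invariant, matching the unordered
        # nature of point clouds.
        global_feat, _ = per_point.max(dim=1)     # (B, feat_dim)
        return global_feat

encoder = SharedMLPEncoder()
print(encoder(torch.rand(2, 1024, 3)).shape)      # torch.Size([2, 256])
```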

2.2.2 Deep Learning

While Machine Learning offers valuable tools for point cloud processing, Deep Learning unlocks a new level of sophistication. The field of 3D point cloud processing has undergone a revolution with the emergence of Deep Learning (DL). Unlike traditional methods, DL leverages the power of complex artificial neural networks (MLPs). These networks, inspired by the human brain, can learn intricate patterns hidden within massive datasets of point clouds. Most models for 3D point cloud completion apply Deep Learning throughout their structure. These methods include:

• Convolutional Neural Networks (CNNs) [9]: CNNs are a game-changer in Deep Learning, particularly adept at analyzing visual data like images and videos. Unlike standard neural networks, CNNs focus on identifying patterns within localized regions of the data. Imagine dissecting an image into smaller tiles: CNNs work similarly, with each layer analyzing these tiles to extract specific features. Through a series of convolutional filters acting like specialized lenses, CNNs progressively learn increasingly complex features. This allows them to excel at tasks like image recognition (identifying objects within an image) and object detection (locating and classifying objects even if partially hidden). Their ability to learn intricate spatial relationships within visual data makes CNNs a cornerstone of Deep Learning advancements across various computer vision applications.

Figure 2.1: Convolutional Neural Network for Images [10]

• Variational Autoencoders (VAEs) [11]: VAEs are a type of Deep Learning architecture that falls under the umbrella of generative models. Unlike some Deep Learning models designed for specific tasks like image recognition, VAEs delve into the fascinating world of generating new data. Imagine having a magic machine that can learn the underlying patterns of an image and not only describe them, but also create entirely new images that share those same characteristics! That's the core idea behind VAEs. VAEs work in a two-part process: encoding and decoding.

– Encoding: VAEs act like a skilled artist compressing a complex image into a simpler sketch. This compressed version captures the essence of the original image but in a lower-dimensional space, often referred to as the latent space.

– Decoding: Just like the artist uses the sketch to recreate the full image, the VAE utilizes the latent space representation. It "decodes" this compressed information to generate a new image that resembles the original but might contain variations or slight modifications.

• Generative Adversarial Networks (GANs) [12]: Unlike VAEs, which focus on compressing data and generating similar variations, GANs push the boundaries of data creation. Imagine two rival artists, one a forger (generator) and the other a detective (discriminator). GANs pit these two models against each other in a continuous learning process.

The Art of Generator and Discriminator:

– The Generator: This model acts like a creative forger, constantly striving to produce realistic new data samples that closely resemble the real data. In the context of images, the generator might create new photos of cats that look indistinguishable from real photographs. As the discriminator catches on to the generator's forgeries, the generator is forced to adapt and create even more realistic counterfeits. Over time, the generator becomes adept at producing data that can fool the discriminator.

– The Discriminator: This model acts as a vigilant detective, constantly sharpening its skills to distinguish between real data and the forgeries produced by the generator. The continuous stream of increasingly realistic forgeries from the generator keeps the discriminator on its toes. It learns to identify even the subtlest inconsistencies between real and fake data.

• Transformer [13]: Transformers have emerged as a powerful architecture for sequential data. Imagine a team of experts analyzing a long sentence, considering not just individual words but also the relationships between them. Transformers function similarly, but with artificial neural networks specifically designed for sequential data like text, speech, and even point clouds with a sequential order.

– Attention Mechanism: Unlike traditional neural networks that process data sequentially, Transformers rely on a powerful concept called the attention mechanism. This mechanism allows the network to focus on the most relevant parts of the sequence when processing each element (a minimal sketch follows Figure 2.2). Think of it as the team of experts not just reading the sentence but also dynamically referencing specific words to understand the overall meaning.

– Long-Range Dependencies: Transformers can capture long-range dependencies within sequences, something traditional methods struggled with. This is crucial for tasks like machine translation, where understanding the relationship between words at the beginning and end of a sentence is essential.

– Parallelization: The architecture of Transformers allows for parallel processing of information, making them computationally efficient for handling large datasets.


Figure 2.2: Encoder-Decoder Transformer Architecture [14].
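A minimal sketch of the scaled dot-product attention at the core of this mechanism is shown below. The shapes and sizes are illustrative; real Transformer blocks add learned query/key/value projections, multiple heads, and residual connections.

```python
import torch

def scaled_dot_product_attention(q, k, v):
    """q, k, v: (B, N, d). Every element attends to every other element,
    weighted by query-key similarity."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5   # (B, N, N) similarities
    weights = torch.softmax(scores, dim=-1)       # each row sums to 1
    return weights @ v                            # weighted mix of values

x = torch.rand(2, 128, 64)        # e.g. 128 point proxies with 64-dim features
out = scaled_dot_product_attention(x, x, x)       # self-attention
print(out.shape)                  # torch.Size([2, 128, 64])
```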

• Graph Neural Networks (GNNs) [16]: Unlike CNNs and Transformers, which focus on grids or sequences, GNNs excel at processing data structured as graphs. Imagine a complex network of interconnected objects, where the connections themselves hold valuable information. Social networks, road networks, and even molecules can be represented as graphs. GNNs are specifically designed to analyze these interconnected structures.

GNNs don't just process individual nodes (data points) within the graph. They delve deeper, analyzing the relationships between these nodes. Imagine a social network where the nodes represent people and the connections represent friendships. GNNs can not only learn about each person's profile but also understand how friendships influence them and the overall network.

Figure 2.3: An illustration of a graph-based network [17]

How GNNs Work: GNNs operate in iterative rounds (a toy sketch follows):

– Message Passing: Nodes exchange information with their neighbors, like friends sharing updates on a social network. This allows each node to gain insights not just from its own data but also from the data of its connected neighbors.

– Node Update: After receiving messages, each node updates its own internal representation, incorporating the newly acquired information about its connections.
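The following toy sketch illustrates one message-passing round with mean aggregation. The graph, feature sizes, and update rule (adding aggregated messages to the node state) are simplified illustrations rather than any specific published GNN:

```python
import torch

def message_passing_round(node_feats, edge_index):
    """One round of mean-aggregation message passing.
    node_feats: (N, d); edge_index: (2, E) with rows (source, target)."""
    src, dst = edge_index
    n, d = node_feats.shape
    agg = torch.zeros(n, d)
    # Each target node sums the features of its source neighbors...
    agg.index_add_(0, dst, node_feats[src])
    deg = torch.zeros(n).index_add_(0, dst, torch.ones(dst.shape[0]))
    agg = agg / deg.clamp(min=1).unsqueeze(1)     # ...then averages them.
    # Node update: combine own state with the aggregated messages.
    return node_feats + agg

feats = torch.rand(5, 8)                             # 5 nodes, 8-dim features
edges = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])   # 4 directed edges
print(message_passing_round(feats, edges).shape)     # torch.Size([5, 8])
```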


Chapter 3

Datasets and Metrics for 3D Point Cloud

3.1 Datasets

The availability of high-quality datasets is crucial for training and evaluating Deep Learning models. Here are some prominent datasets frequently used in point cloud research:

3.1.1 ShapeNet [14] - A Large-Scale Dataset for 3D Shape Understanding

ShapeNet stands as a cornerstone dataset within the realm of 3D point cloud processing research. This meticulously curated collection offers a rich tapestry of 3D shapes, encompassing a diverse range of object categories.

• ShapeNet Parts: This sub-category delves deeper, providing a more granular view of 3D shapes. It includes the same 55 object categories from ShapeNetCore, but each model is segmented into its constituent parts. This allows researchers to explore part-level recognition and understand the relationships between different parts within a complex object. The size of ShapeNet Parts varies depending on object complexity, but it offers a rich dataset for part segmentation tasks.


Data Format:

ShapeNet provides the data in two primary formats:

• Point Clouds: Each 3D model is converted into a point cloud representation, where each point corresponds to a specific location in 3D space. This format is most commonly used for Deep Learning applications in point cloud processing.

• Meshes: The original watertight mesh representation of the 3D models is also available. This format can be useful for visualization purposes or for tasks requiring access to the complete surface information of the object.

Figure 3.1: Some categories in ShapeNet dataset [15]

3.1.2 ModelNet [18]: A Clean and Categorized Dataset for 3D Point Cloud Analysis

Alongside ShapeNet, ModelNet stands as another prominent dataset within the realm of 3D point cloud processing. While ShapeNet offers a vast collection with diverse categories, ModelNet focuses on providing a clean and well-structured dataset specifically designed for tasks like object recognition and classification.

Composition of ModelNet:

ModelNet comprises a collection of 12,311 pre-aligned 3D shapes categorized into 40 distinct object classes. These classes encompass a variety of everyday objects, including furniture (chairs, tables), electronic devices (laptops, screens), and vehicles (cars, airplanes).

Key Characteristics:


• Clean and Watertight Meshes: Unlike raw point cloud data, ModelNet provides 3D models represented as clean and watertight meshes. This ensures consistent data quality and simplifies pre-processing steps for tasks like object recognition.

• Consistent Alignment: All models within ModelNet are meticulously aligned, ensuring a standardized orientation for each object. This consistency facilitates easier feature extraction and model training for classification tasks.

• Balanced Representation: Each object class within ModelNet contains a relatively balanced number of models, minimizing bias towards specific categories during training.

3.1.3 PCN [21] Dataset: A Benchmark for Point Cloud Completion

The PCN (Point Completion Network) dataset serves as a benchmark for evaluating machine learning models designed for 3D point cloud completion tasks. This dataset focuses on providing incomplete point cloud representations of objects, challenging models to predict the missing parts and reconstruct the complete object.

Data Composition:

The PCN dataset is derived from a subset of the ShapeNet dataset, a widely recognized collection of 3D models categorized into various object classes. Each sample within the PCN dataset consists of two point clouds:

• Partial Point Cloud: This represents a portion of the complete object, containing a subset of the original points. The percentage of missing points can be customized depending on the desired difficulty of the completion task.

• Ground Truth: This represents the full, complete point cloud of the object, serving as the reference solution for evaluating model performance.

Both the partial point cloud and the ground truth are represented with spatial coordinates (x, y, z) for each point.

3.1.4 S3DIS [19]: Unveiling the Structure of Indoor Scenes

Shifting our focus from object-centric datasets to broader environments, we encounter the Stanford 3D Indoor Scenes Dataset (S3DIS). Unlike ModelNet and ShapeNet, which concentrate on individual objects, S3DIS delves into the realm of capturing the complete structure and semantic labeling of indoor scenes.

Composition of S3DIS:

• Six Areas: The dataset encompasses six diverse indoor areas, including office spaces, conference rooms, and hallways. This variety allows researchers to develop models that generalize well across different indoor settings.

• Point Clouds: Each area within S3DIS is represented by a meticulously captured point cloud. These point clouds contain rich information about the 3D structure of the scene, including the location and intensity of each point.


• Semantic Labels: Beyond just geometry, S3DIS provides semantic labels for each point in the scene. These labels categorize points into 13 distinct classes, such as walls, floors, ceilings, furniture, and doors. This semantic information allows researchers to tackle tasks like scene understanding and object detection within indoor environments.

3.1.5 KITTI [20]

Transitioning from indoor scenes to the complexities of outdoor environments, we encounter the KITTI Vision Benchmark Suite (KITTI). This dataset stands as a prominent resource for researchers developing and evaluating algorithms crucial for autonomous driving applications.

KITTI offers a rich tapestry of data captured from a car driving in various outdoor environments. Here is what it provides:

• Highly Realistic Data: The dataset comprises real-world sensor data, including:

– Calibrated Stereo Camera Images: Capturing the visual scene from two perspectives, similar to human vision.

– Velodyne LiDAR Scans: Providing high-fidelity 3D point cloud data of the surroundings, crucial for object detection and depth perception.

– GPS/IMU Data: Offering information about the car's location, orientation, and movement, essential for tasks like localization and mapping.

• Diverse Scenarios: KITTI encompasses various driving scenarios, including urban streets, highways, and rural roads. This diversity allows researchers to develop models that can handle a wide range of real-world driving conditions.

• Ground Truth Annotations: The dataset provides meticulously labeled annotations for objects within the scene. These annotations typically include bounding boxes, 3D bounding boxes, and object classifications (e.g., car, pedestrian, cyclist).

3.2 Metrics

For 3D point cloud completion, the Chamfer Distance (CD) [24] and Earth Mover's Distance (EMD) [24] are the most frequently used performance criteria. CD measures the mismatch between two point sets via nearest-neighbor distances, while EMD evaluates the reconstruction quality of the point clouds through an optimal one-to-one matching.


| Name | Year | Classes | Sensors or origin | Type | Views | Resolutions |
|---|---|---|---|---|---|---|
| PCN [21] | 2015 | 8 | CAD | Synthetic | 8 | 2048/4096/8192/16384 |
| ShapeNet55 [14] | 2021 | 55 | CAD | Synthetic | all possible views | 8192 |
| ShapeNet34 [14] | 2021 | 34 | CAD | Synthetic | all possible views | 8192 |
| KITTI [20] | 2021 | 8 | RGB & LiDAR | Urban (driving) | - | depends on the network |
| ModelNet [18] | 2015 | 10 or 40 | CAD | Synthetic | 12 | 2048/16384 |
| Completion3D [22] | 2019 | 8 | CAD | Synthetic | - | 1024/2048/16384 |
| Multi-View Partial Point Cloud (MVP) [23] | 2021 | 16 | CAD | Synthetic | 26 | 2048/4096/16384 |
| Single-View Point Cloud Completion [23] | 2021 | 16 | CAD | Synthetic | 1 | 2048/4096/16384 |

Table 3.1: Summary of existing datasets for point cloud completion [5]

3.2.1 Chamfer Distance

There are two variants of CD: CD-T (CD-$\ell_1$) and CD-P (CD-$\ell_2$). Both average nearest-neighbor distances in both directions between two point clouds $S_1$ and $S_2$; the $\ell_1$ form uses the Euclidean distance itself, while the $\ell_2$ form uses its square:

$$d_{CD\text{-}\ell_1}(S_1, S_2) = \frac{1}{|S_1|} \sum_{x \in S_1} \min_{y \in S_2} \lVert x - y \rVert_2 + \frac{1}{|S_2|} \sum_{y \in S_2} \min_{x \in S_1} \lVert y - x \rVert_2$$

$$d_{CD\text{-}\ell_2}(S_1, S_2) = \frac{1}{|S_1|} \sum_{x \in S_1} \min_{y \in S_2} \lVert x - y \rVert_2^2 + \frac{1}{|S_2|} \sum_{y \in S_2} \min_{x \in S_1} \lVert y - x \rVert_2^2$$

3.2.2 Earth Mover's Distance

EMD aims to find a bijection $\phi : S_1 \to S_2$ that minimizes the average distance between paired points, one from the partial (or predicted) set and the other from the complete set. Unlike CD, the sizes of $S_1$ and $S_2$ need to be the same:

$$d_{EMD}(S_1, S_2) = \min_{\phi : S_1 \to S_2} \frac{1}{|S_1|} \sum_{x \in S_1} \lVert x - \phi(x) \rVert_2$$
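As an illustration of both metrics, here is a small NumPy/SciPy sketch. It follows the definitions above, using a k-d tree for the nearest-neighbor terms of CD and the Hungarian algorithm (SciPy's linear_sum_assignment) for the optimal bijection in EMD; note that exact EMD is O(n^3), so practical implementations rely on approximations.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.distance import cdist
from scipy.optimize import linear_sum_assignment

def chamfer_distance(s1, s2, squared=False):
    """CD between two (N, 3) clouds: average nearest-neighbor distance
    in both directions (squared=True gives the CD-l2 variant)."""
    d12, _ = cKDTree(s2).query(s1)   # for each point in s1, nearest in s2
    d21, _ = cKDTree(s1).query(s2)
    if squared:
        d12, d21 = d12 ** 2, d21 ** 2
    return d12.mean() + d21.mean()

def earth_movers_distance(s1, s2):
    """EMD: cost of the optimal one-to-one matching (|s1| must equal |s2|)."""
    cost = cdist(s1, s2)                          # pairwise distances
    rows, cols = linear_sum_assignment(cost)      # optimal bijection phi
    return cost[rows, cols].mean()

a, b = np.random.rand(512, 3), np.random.rand(512, 3)
print(chamfer_distance(a, b), earth_movers_distance(a, b))
```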


This limitation highlights the need for approaches that can effectively utilize the rich geometric information embedded within point clouds. More sophisticated methods are needed to capture not only individual point features but also the relationships and interactions between them, leading to more accurate and faithful reconstructions.

Preliminary works. Pioneered by PointNet [27], a few works used MLPs for the processing and recovery of point clouds due to their concise yet powerful representation ability. PointNet++ [28] and TopNet [29] incorporated a hierarchical structure to take geometric information into consideration. PointNet++ introduced two innovative set abstraction layers that cleverly combine multi-level information for point cloud processing. In contrast, TopNet presented a novel decoder capable of generating structured point clouds without requiring any pre-defined structure or topology.

To alleviate the loss of structural information often caused by Multilayer Perceptrons (MLPs), AtlasNet [30] and MSN [31] propose novel methods for reconstructing complete point clouds. Both approaches leverage parametric surface elements from which the final point cloud is generated.

AtlasNet [30] utilizes an additional input, a 2D point within a unit square, to produce a single point on the surface of the object. Essentially, this method generates a continuous image of a plane, allowing for iterative reconstruction of a 3D shape by combining numerous surface elements.

MSN [31], on the other hand, introduces a morphing-based decoder that addresses the issue of structural loss. This decoder morphs unit squares, representing individual surface elements, into the coarse point cloud, preserving the overall structure of the object.

Both AtlasNet and MSN offer innovative solutions for mitigating structure loss in point cloud completion tasks, providing valuable alternatives to traditional MLP-based approaches.

PCN-derived methods. Hebert et al. [32] pioneered learning-based shape completion with their Point Completion Network (PCN). Unlike traditional methods, PCN directly operates on raw point clouds without relying on structural assumptions (e.g., symmetry) or manual shape annotations (e.g., semantic class). Its decoder design enables fine-grained completion with a limited number of parameters.

End-to-end mechanism. In point-based methods, an end-to-end architecture within the encoder-decoder scheme (Fig. 4.1) is prevalent. The encoder extracts both global shape features and regional features for each point, while the decoder generates and refines the completed point cloud. Lake-Net [33] takes a topology-aware approach to point cloud completion. It localizes keypoints and follows a novel Keypoints-Skeleton-Shape prediction pipeline.

Figure 4.1: End-to-end network for point cloud completion. N represents the dimension of the latent space [5].

Attention-assisted methods. Attention is a flexible medium for learning information adaptively, with the accumulated important information weighted heavily. PointAttN [34] leverages cross-attention and self-attention mechanisms to tackle the point completion task in a coarse-to-fine manner. It mainly comprises three modules: a feature extractor block for capturing local geometric structure and global shape features, a seed generator block for coarse point cloud generation, and a point generator block to produce the fine-grained point cloud.

Folding-derived methods. Folding-based decoders, pioneered by Yang et al. [35], have demonstrated impressive capabilities in reconstructing intricate point clouds from 2D grids with low error rates (Fig. 4.2, Fig. 4.3). FoldingNet operates by applying a "virtual force" that deforms, cuts, and stretches a 2D grid onto the target 3D surface. This force is modulated by the interconnections between adjacent meshes.


Figure 4.2: Two-step folding decoding. The second column shows the initial 2D grid that undergoes folding transformations. The third column displays the intermediate results after applying the first folding operation. Finally, the fourth column presents the final point cloud reconstruction after applying both folds. The color gradient visually links the corresponding points between the initial 2D grid and the reconstructed point cloud, providing a clear understanding of the folding process and its impact on the final output [35].


The intermediate folding steps and training processes are visualized through reconstruction points, providing intuitive feedback on the gradual variation of the folding force. Folding-based methods, such as MSN [31] and PoinTr [14], typically sample 2D grids from a fixed-size plane and combine them with global shape features extracted by the encoder. AtlasNet [30], MSN [31], and SA-Net [36] all rely on evaluating a set of parametric surface elements to reconstruct the complete object.

Figure 4.3: The architecture of FoldingNet [35].

FoldingNet has firmly established itself as the most popular choice for decoding in point cloud completion networks. However, despite its widespread use, FoldingNet suffers from a significant limitation: its folding operation applies the same 2D grid to every parent point, neglecting the unique local shape characteristics present at each individual point. This oversight hinders the ability to accurately capture intricate details and nuances in the reconstructed shape (see the sketch below).
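To make the folding idea concrete, here is a minimal FoldingNet-style decoder sketch in PyTorch. The layer widths, grid size, and two-step structure follow the general recipe described above, but this is an illustrative reimplementation, not the exact FoldingNet [35] architecture. Note how the same 2D grid is shared across all shapes, which is exactly the limitation discussed above.

```python
import torch
import torch.nn as nn

class FoldingDecoder(nn.Module):
    """Fold a fixed 2D grid onto a 3D surface, conditioned on a global
    shape code, in two successive steps as in FoldingNet-style decoders."""
    def __init__(self, code_dim=256, grid_size=45):
        super().__init__()
        # The same 2D grid is shared by all shapes (the limitation noted above).
        u = torch.linspace(-1, 1, grid_size)
        grid = torch.stack(torch.meshgrid(u, u, indexing="ij"), dim=-1)
        self.register_buffer("grid", grid.reshape(-1, 2))   # (M, 2)
        self.fold1 = nn.Sequential(
            nn.Linear(code_dim + 2, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 3))
        self.fold2 = nn.Sequential(
            nn.Linear(code_dim + 3, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 3))

    def forward(self, code):
        # code: (B, code_dim) global feature from the encoder.
        B, M = code.size(0), self.grid.size(0)
        code_rep = code.unsqueeze(1).expand(B, M, -1)        # (B, M, code_dim)
        grid = self.grid.unsqueeze(0).expand(B, M, 2)
        mid = self.fold1(torch.cat([code_rep, grid], -1))    # first fold: 2D -> 3D
        out = self.fold2(torch.cat([code_rep, mid], -1))     # second fold refines
        return out                                           # (B, M, 3) points

dec = FoldingDecoder()
print(dec(torch.rand(2, 256)).shape)   # torch.Size([2, 2025, 3])
```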

4.1.2 Convolution-based methods

Encouraged by the great success of convolutional neural networks (CNNs) on 2D images, several works try to utilize 3D CNNs to learn a volumetric representation of three-dimensional point clouds. Nevertheless, transforming a point cloud into a 3D volume brings quantization effects: (1) loss of details; (2) insufficient capacity to represent fine-grained information. Therefore, some research efforts have focused on applying CNNs directly to irregular, partial, and defective point clouds for 3D shape completion. This approach avoids the quantization issues associated with 3D volume conversion and potentially allows for more accurate and detailed reconstruction of the complete shape.

Preliminary works. In terms of the processing of point clouds, several contributions developed CNNs acting on discrete 3D grids derived from point cloud transformation. Hua et al. [37] defined convolution kernels on regular 3D grids, where points are given the same weights when falling into the same grid. PointCNN [38] implements permutation invariance through an X-conv transformation. In addition to CNNs on discrete spaces, several methods define convolution kernels on continuous space. A rigid and deformable kernel convolution (KPConv) module was devised by Thomas et al. [39] to utilize a collection of learnable kernel points for 3D point clouds. The dynamic filter was extended into a convolution operator dubbed PointConv by Tao et al. [40]. This operator could be employed to fulfill the deep convolution architecture.

Convolutional encoder. In this line of work, the point cloud is first voxelized as input to 3D CNNs. Xie et al. [41] introduced a Gridding Residual Network (GRNet) and took 3D grids as intermediate representations to process irregular point clouds. GRNet's Gridding and Gridding Reverse methods seamlessly transform point clouds to 3D grids, preserving crucial structural information. The Cubic Feature Sampling layer then extracts information from surrounding points, capturing essential contextual knowledge. This unique combination allows GRNet to leverage the power of 3D convolutions while simultaneously maintaining the structural integrity and context inherent to point clouds.

Deconvolutional decoder. Beyond feature learning, convolutions play a key role in point cloud reconstruction. Wang et al. [42] proposed SoftPoolNet, a novel approach that leverages "soft pooling" to organize extracted features based on their activation. This technique is combined with "regional convolutions" at the decoding stage, maximizing global activation entropy for accurate reconstruction.

4.1.3 Graph-based methods

Due to their non-Euclidean structure, point clouds and graphs share a fundamental similarity. This allows us to explore relationships between points or local regions by representing them as vertices within a graph (Fig. 2.3). By defining edges based on point adjacency, we can readily construct graphs from point clouds. This opens the door to applying graph convolutions for effective point cloud processing.

Graph-based methods leverage the strengths of graph convolutions, typically involving neighborhood aggregation and generating a new graph enriched with information from local regions. Compared to point-based approaches, this strategy explicitly considers regional geometric details, leading to potentially more accurate processing.

One pioneering work in this space is DGCNN [43], which introduced the concept of dynamic graph convolution. Here, adjacency matrices are calculated based on relationships between vertices in the latent space, allowing for a dynamic and adaptive graph structure within the network. Additionally, EdgeConv offers another approach that dynamically constructs graphs at each network layer, making it readily integrable with existing architectures.

These innovative methods demonstrate the potential of leveraging graph convolutions for effective point cloud processing. By capturing both local and global relationships within the point cloud data, graph-based approaches offer promising solutions for various tasks such as 3D object recognition, shape completion, and scene understanding.

4.1.4 GAN-based method

Generative Adversarial Networks (GANs) [44] are a powerful class of machine learning models employing a pair of competing neural networks: a generator and a discriminator. The generator aims to create new data samples, while the discriminator strives to discern real data from generated data.

In the context of 3D point cloud completion using GANs, a two-stage approach is commonly adopted. During the first stage, the generator takes a random latent code as input and generates a sparse point cloud. This initial point cloud serves as the foundation for the second stage, where the generator further processes it and outputs a denser, more complete representation of the object. This iterative refinement process allows the GAN to progressively reconstruct the missing portions of the 3D object while maintaining consistency with the original data.
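The following sketch shows this two-stage generation in schematic PyTorch: a coarse generator decodes a latent code into a sparse cloud, a second stage refines it, and a point-wise discriminator provides the adversarial signal. All module shapes and names are hypothetical stand-ins rather than a published architecture, and the second stage here only refines points instead of densifying them, for brevity.

import torch
import torch.nn.functional as F

coarse_gen = torch.nn.Linear(128, 1024 * 3)                   # latent -> sparse cloud
refine_gen = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.ReLU(),
                                 torch.nn.Linear(64, 3))      # per-point offsets
disc = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.ReLU(),
                           torch.nn.Linear(64, 1))            # per-point critic

def generate(z):
    coarse = coarse_gen(z).view(z.size(0), 1024, 3)   # stage 1: global shape
    return coarse + refine_gen(coarse)                # stage 2: local refinement

z = torch.randn(8, 128)
real = torch.rand(8, 1024, 3)                         # complete reference clouds
fake = generate(z)
# non-saturating GAN losses; the critic scores a cloud by its mean point logit
d_loss = F.softplus(-disc(real).mean()) + F.softplus(disc(fake.detach()).mean())
g_loss = F.softplus(-disc(fake).mean())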

This two-stage strategy leverages the unique strengths of GANs. The first stage capitalizes on the generator's ability to capture the underlying structure and global features of the object, even from limited information. The second stage then utilizes the combined power of the generator and discriminator to refine the details and ensure the generated point cloud faithfully reflects the real object.

End-to-end mechanism. End-to-end learning has become a prevalent approach in 3D point cloud completion. One such method, 3D-ED-GAN, employs an encoder network to map voxelized 3D shapes into a probabilistic latent space. This latent representation is then fed to a generative adversarial network that facilitates the decoder in generating the entire volumetric shape. This two-pronged approach offers a powerful framework for reconstructing missing portions of 3D objects.

However, incomplete point clouds often suffer from noise and geometrical inconsistencies. To address these challenges, PF-Net [45] adopts a novel strategy that leverages the spatial arrangement of the incomplete input data, allowing it to recover even complex geometries within the missing portions. This is achieved through a multi-level generation process based on feature points, which progressively predicts the missing parts in a hierarchical manner.

By combining end-to-end learning with strategies designed to handle incomplete and noisy data, 3D-ED-GAN and PF-Net offer promising solutions for 3D point cloud completion. These methods enable the reconstruction of missing object parts with high accuracy and fidelity, even with limited or corrupted input data.

Refinement. Beyond the core GAN architecture, several refinement strategies have emerged to further improve the accuracy and quality of 3D point cloud completion. These methods often leverage techniques like feature alignment [46] to integrate shape priors into the reconstruction process, resulting in more realistic and faithful representations. Additionally, cascaded refinement networks (CRN) [47] offer a powerful coarse-to-fine approach, progressively refining the generated point cloud from coarse structure to fine detail with each iteration. By incorporating these strategies, GAN-based methods achieve even greater fidelity in reconstructing missing parts of 3D objects, paving the way for wider applications in diverse fields like robotics, autonomous driving, and 3D content creation.

4.1.5 Transformer-based methods

Transformers, initially introduced for natural language processing (NLP) [13], have made significant waves in 2D computer vision (CV) [48], [49]. This success has now translated to the realm of 3D point cloud processing, pioneered by PCT [50], Pointformer [51], and PointTransformer [52].

Yu et al. [14] recognized the powerful representation learning capabilities of transformers and applied them to point cloud completion. Their approach frames the problem as a set-to-set translation task, employing a transformer encoder-decoder architecture.


They represent the point cloud as a set of unordered points with position embeddings,enabling the transformer to process them as a series of point proxies.
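As a sketch of the point-proxy idea, the snippet below samples local centres with farthest point sampling, adds a coordinate-based position embedding, and feeds the resulting proxy sequence to a stock transformer encoder. The linear feature embedding is a stand-in for the lightweight local feature extractor used in the paper, and all dimensions are illustrative.

import torch

def farthest_point_sample(xyz, m):
    # greedy farthest point sampling; returns indices of m centres
    B, N, _ = xyz.shape
    idx = torch.zeros(B, m, dtype=torch.long, device=xyz.device)
    dist = torch.full((B, N), float('inf'), device=xyz.device)
    farthest = torch.zeros(B, dtype=torch.long, device=xyz.device)
    for i in range(m):
        idx[:, i] = farthest
        centre = xyz[torch.arange(B), farthest].unsqueeze(1)      # (B, 1, 3)
        dist = torch.minimum(dist, (xyz - centre).norm(dim=2))
        farthest = dist.argmax(dim=1)                             # next centre
    return idx

xyz = torch.rand(2, 2048, 3)
centres = xyz[torch.arange(2).unsqueeze(1), farthest_point_sample(xyz, 128)]
pos_embed = torch.nn.Linear(3, 256)      # position embedding of each centre
feat_embed = torch.nn.Linear(3, 256)     # stand-in for a local feature extractor
proxies = feat_embed(centres) + pos_embed(centres)                # (2, 128, 256)
encoder = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
    num_layers=2)
encoded = encoder(proxies)               # order-free sequence of encoded proxies

Since attention is permutation-equivariant and order information enters only through the position embeddings, the unordered nature of the point set is respected.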

Further enhancing the approach, Yu et al. introduced a geometry-aware block that explicitly captures local geometric relationships, allowing the transformer to better leverage the inherent 3D structure of the point cloud.

Xiang et al. [53] took a different approach with SnowflakeNet, utilizing a transformer-based structure for decoding rather than representation learning. Their Snowflake Point Deconvolution (SPD) models the generation of complete point clouds as a snowflake-like growth of points in 3D space, demonstrating the versatility of transformers in point cloud processing.

These advancements highlight the transformative role transformers are playing in 3D point cloud processing. By leveraging their unique strengths, researchers are unlocking new possibilities for tasks like 3D object recognition, shape completion, and scene understanding.

AdaPoinTr [15] takes the lead. Building upon the success of PoinTr [14], AdaPoinTr further advances the state of the art in transformer-based point cloud completion. While PoinTr effectively utilizes transformers for set-to-set translation and introduces a geometry-aware block, AdaPoinTr introduces two key innovations that significantly improve performance and efficiency:

• Adaptive Query Generation: this novel mechanism dynamically generates queries during training, allowing the model to focus on specific regions of the point cloud and learn more nuanced details.

• Denoising Task: by incorporating a denoising task alongside the completion task, AdaPoinTr improves training stability and reduces sensitivity to noise in the input data.

4.2 Performance comparison

This section delves into the performance of cutting-edge point cloud completion methods across various datasets. We dissect the efficacy of these approaches and offer insights for future research. It is crucial to note that these results are sourced directly from the original research papers, leading to variations in resolution and dataset configurations; however, comparisons between methods under identical resolution settings remain valid unless otherwise specified. By examining these performance metrics, we can gain valuable insight into the strengths and weaknesses of existing methods and identify promising avenues for further improvement.

A Summary of the performance on PCN

PCN is the most commonly utilized dataset for 3D shape completion. To compare methods, we selected experiments on this dataset following the standard protocol and evaluation metrics proposed in AdaPoinTr [15] (Table 6 therein). The results are shown in Table 4.1.
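For reference, the two metrics reported in Table 4.1 can be sketched as follows: CD-l1 is the L1 Chamfer distance, averaging nearest-neighbour distances in both directions, and F-Score@1% is the harmonic mean of precision and recall at a distance threshold of 1% of the side length. Papers differ in small conventions (whether the two Chamfer terms are halved, and the x1000 scaling), so treat this as a schematic.

import torch

def chamfer_l1(pred, gt):
    # pred: (B, Np, 3), gt: (B, Ng, 3)
    d = torch.cdist(pred, gt)                  # pairwise distances (B, Np, Ng)
    return (d.min(dim=2).values.mean(dim=1)    # pred -> gt direction
            + d.min(dim=1).values.mean(dim=1)) # gt -> pred direction

def f_score(pred, gt, tau=0.01):
    d = torch.cdist(pred, gt)
    precision = (d.min(dim=2).values < tau).float().mean(dim=1)
    recall = (d.min(dim=1).values < tau).float().mean(dim=1)
    return 2 * precision * recall / (precision + recall + 1e-8)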

B Summary of the performance on ShapeNet55/34

ShapeNet55 and ShapeNet34 are proposed to measure generalization performance: models are trained on the 34 seen categories and evaluated on the 21 unseen categories. As we can see in the last two columns of Tables 4.2 and 4.3, AdaPoinTr [15] performs well on both ShapeNet55 and ShapeNet34, demonstrating its remarkable generalization ability.


Methods          Air   Cab    Car    Cha    Lam    Sof    Tab    Wat    CD-l1 Avg   F-Score@1%
FoldingNet[35]   9.49  15.80  12.61  15.55  16.41  15.97  13.65  14.99  14.31       0.322
AtlasNet[30]     6.37  11.94  10.10  12.06  12.37  12.99  10.33  10.61  10.85       0.616
PCN[32]          5.50  22.70  10.63   8.70  11.00  11.34  11.68   8.59   9.64       0.695
TopNet[29]       7.61  13.31  10.90  13.82  14.44  14.78  11.22  11.12  12.15       0.503
MSN[31]          5.60  11.90  10.30  10.20  10.70  11.60   9.60   9.90  10.00       0.705
GRNet[41]        6.45  10.37   9.45   9.41   7.96  10.51   8.44   8.04   8.83       0.708
CRN[47]          4.79   9.97   8.31   9.49   8.94  10.69   7.81   8.05   8.51       -
SnowFlake[53]    4.29   9.16   8.08   7.89   6.07   9.23   6.55   6.40   7.21       -
LakeNet[33]      4.17   9.78   8.56   7.45   5.88   9.39   6.43   5.98   7.23       -
PoinTr[14]       4.75  10.47   8.68   9.39   7.75  10.93   7.78   7.29   8.38       0.745
AdaPoinTr[15]    3.68   8.82   7.47   6.85   5.47   8.35   5.80   5.76   6.53       0.845

Table 4.1: Results on the PCN dataset [15]

Table 4.2: Results on the ShapeNet55 dataset [15]

Table 4.3: Results on the ShapeNet34 dataset [15]

D Complexity analysis and generalization performance

To gain further insight into model performance, we also analyze the number of parameters (Params) and the theoretical computation cost (FLOPs) to compare model complexity and runtime cost, as provided in [5]. As shown in the second and third columns of Table VI in [5] (a fragment is reproduced below), FoldingNet owns the smallest number of parameters, while GRNet and PF-Net possess relatively large parameter counts due to their complex architectures. The numbers of parameters in SnowflakeNet and PoinTr are also relatively high because of the attention mechanism.
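Parameter counts such as the 2.30M reported for FoldingNet below are easy to reproduce; a minimal sketch, with a toy model standing in for an actual completion network, is:

import torch

# toy stand-in model; FLOPs would additionally require a profiler such as
# the open-source pytorch-OpCounter ('thop'), since they depend on input size
model = torch.nn.Sequential(torch.nn.Linear(3, 128), torch.nn.ReLU(),
                            torch.nn.Linear(128, 1024))
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{n_params / 1e6:.2f}M trainable parameters")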

Based on the results on PCN and ShapeNet55/34, we can draw the following conclusions:

• Transformer-based methods such as SnowflakeNet, SeedFormer, PoinTr and AdaPoinTr achieve better performance on all benchmarks, showing that the Transformer is not only suitable for natural language processing tasks but also performs well on computer vision tasks.

• However, the number of parameters of transformer-based methods is relatively high because of the attention mechanism.


Methods          Params  FLOPs
FoldingNet[35]   2.30M   27.58G


References
[2] Hoang Justin Lev, "A Study of 3D Point Cloud Features for Shape Retrieval," Graphics, Université Grenoble Alpes, 2020.
[3] Kleeberger, K., Völk, M., Bormann, R., & Huber, M. (2021). Investigations on Output Parameterizations of Neural Networks for Single Shot 6D Object Pose Estimation.
[4] Chen, J., Fang, Y., & Cho, Y. (2017). Unsupervised Recognition of Volumetric Structural Components from Building Point Clouds, 34–42. doi:10.1061/9780784480823.005.
[5] Fei, B., Yang, W., Chen, W.-M., Li, Z., Li, Y., Ma, T., Hu, X., & Ma, L. (2022). Comprehensive Review of Deep Learning-Based 3D Point Cloud Completion Processing and Analysis. IEEE Transactions on Intelligent Transportation Systems, 23(12), 22862–22883. https://doi.org/10.1109/tits.2022.3195555
[6] Cunningham, P., & Delany, S. J. (2020). k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples). CoRR, abs/2004.04523. https://arxiv.org/abs/2004.04523
[7] Rehman, S., Asghar, S., & Fong, S. (2014). DBSCAN: Past, present and future. doi:10.1109/ICADIWT.2014.6814687.
[8] Popescu, M.-C., Balas, V., Perescu-Popescu, L., & Mastorakis, N. (2009). Multilayer perceptron and neural networks. WSEAS Transactions on Circuits and Systems, 8.
[9] Albawi, S., Mohammed, T. A., & Al-Zawi, S. (2017). Understanding of a convolutional neural network. 2017 International Conference on Engineering and Technology (ICET), 1–6. https://doi.org/10.1109/ICEngTechnol.2017.8308186
[10] Superiorities of Deep Extreme Learning Machines against Convolutional Neural Networks, Scientific Figure on ResearchGate. Available from: https://www.researchgate.net/figure/Structure-of-the-CNN-with-two-convolutional-layers-Tensorflow-2018_fig1_330260074 [accessed 18 May 2024].
[11] Kingma, D. P., & Welling, M. (2019). An Introduction to Variational Autoencoders. Foundations and Trends in Machine Learning, 12(4), 307–392.
[13] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.
[14] X. Yu, Y. Rao, Z. Wang, Z. Liu, J. Lu, and J. Zhou, "PoinTr: Diverse point cloud completion with geometry-aware transformers," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 12498–12507.
[15] X. Yu, Y. Rao, Z. Wang, J. Lu, and J. Zhou, "AdaPoinTr: Diverse Point Cloud Completion with Adaptive Geometry-Aware Transformers," 2023.
[17] Y. Guo, H. Wang, Q. Hu, H. Liu, L. Liu, and M. Bennamoun, "Deep learning for 3D point clouds: A survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
[18] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao, "3D ShapeNets: A deep representation for volumetric shapes," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1912–1920.
[19] I. Armeni, A. Sax, A. R. Zamir, and S. Savarese, "Joint 2D-3D-Semantic Data for Indoor Scene Understanding," arXiv e-prints, 2017.
[20] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, "Vision meets robotics: The KITTI dataset," The International Journal of Robotics Research, vol. 32, no. 11, pp. 1231–1237, 2013.
[21] W. Yuan, T. Khot, D. Held, C. Mertz, and M. Hebert, "PCN: Point completion network," in 2018 International Conference on 3D Vision (3DV), IEEE, 2018, pp. 728–737.
[22] L. P. Tchapmi, V. Kosaraju, H. Rezatofighi, I. Reid, and S. Savarese, "TopNet: Structural point cloud decoder," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 383–392.
[34] J. Wang, Y. Cui, D. Guo, J. Li, Q. Liu, and C. Shen, "PointAttN: You only need attention for point cloud completion," arXiv preprint arXiv:2203.08485, 2022.
