MINISTRY OF EDUCATION AND TRAINING
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND EDUCATION
FACULTY FOR HIGH QUALITY TRAINING

GRADUATION PROJECT
COMPUTER ENGINEERING TECHNOLOGY

RESEARCH, DESIGN AND CONSTRUCT
AN AUTONOMOUS GOLF CART
USING MULTISENSOR FUSION

ADVISOR: DR. LE MY HA
STUDENTS: PHAN THANH DANH
NGUYEN TAN THIEN NIEN

S K L 0 0 9 8 3 0
Ho Chi Minh City, January 2022
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND EDUCATION
FACULTY FOR HIGH QUALITY TRAINING

GRADUATION PROJECT

RESEARCH, DESIGN AND CONSTRUCT AN
AUTONOMOUS GOLF CART USING MULTISENSOR FUSION
THE SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom – Happiness
Ho Chi Minh City, July 27, 2022
GRADUATION PROJECT ASSIGNMENT
Student name: Phan Thanh Danh Student ID: 18119214
Student name: Nguyễn Tấn Thiên Niên Student ID: 18119033
Major: Computer Engineering Technology Class: 18119CLA1
Advisor: Assoc. Prof. Lê Mỹ Hà Phone number: 0938811201
Date of assignment: Date of submission:
1. Project title: Research, design, and construct an autonomous golf cart using multisensor fusion
2. Initial materials provided by the advisor:
- Image processing and machine learning documents such as papers and books
- Related theses of previous students
- The hardware specifications and their reviews
3. Content of the project:
- Refer to documents, survey, read and summarize to determine the project directions
- Calculate parameters and design block diagram for steering system using DC servo
- Try and handle errors in the wheeling system (mechanical and electrical)
- Collect and visualize data of sensors
- Choose models and algorithms for the car’s perception
- Write programs for microcontrollers and processors
- Test and evaluate the completed system
- Write a report
- Prepare slides for presenting
4. Final product: A golf cart model using a multisensor combination with two modes: Automatic and Manual. The golf cart can operate well on the HCMUTE campus in scenarios that are not overly complex.
CHAIR OF THE PROGRAM
(Sign with full name)
ADVISOR
(Sign with full name)
ADVISOR'S EVALUATION SHEET
Student name: Phan Thanh Danh Student ID: 18119214
Student name: Nguyễn Tấn Thiên Niên Student ID: 18119033
Major: Computer Engineering Technology
Project title: Research, design, and construct an autonomous golf cart using multisensor fusion
Advisor: Assoc. Prof. Lê Mỹ Hà
EVALUATION
1. Content of the project:
- The thesis has a total of six chapters with 73 pages.
- The construction and design of an autonomous golf cart that can run on the HCMUTE campus with two modes: Manual and Automatic.
- The system works with a combination of different sensors and their corresponding algorithms.
- The real system was successfully completed following the objectives in the proposal.
- The thesis forms a basic foundation for the next generations of HCMUTE students in the field of practical autonomous cars.
2. Strengths:
- The system can support UTE students and lecturers in moving around campus.
- The golf cart is designed with image processing and machine learning algorithms combined with control techniques and mechanism design.
- All sensors used in this project are low-cost.
- The execution time of the automatic mode is suitable for practical application.
- The accuracy and safety of the system are guaranteed.
6. Mark: 10 (In words: Ten)
Ho Chi Minh City, July 27, 2022
ADVISOR
(Sign with full name)
APPENDIX 5: (Pre-Defense Evaluation Sheet)
THE SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom – Happiness
Ho Chi Minh City, January 20, 2020
PRE-DEFENSE EVALUATION SHEET
Student name: Phan Thanh Danh Student ID: 18119214
Student name: Nguyễn Tấn Thiên Niên Student ID: 18119033
Major: Computer Engineering Technology
Project title: Research, design, and construct an autonomous golf cart using multisensor fusion
Name of Reviewer:
EVALUATION
1. Content and workload of the project:
2. Strengths:
3. Weaknesses:
4. Approval for oral defense? (Approved or denied)
5. Overall evaluation: (Excellent, Good, Fair, Poor)
6. Mark: ……… (In words: )
Ho Chi Minh City, month day, year
REVIEWER
(Sign with full name)
APPENDIX 6: (Evaluation Sheet of Defense Committee Member)
THE SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom – Happiness
EVALUATION SHEET OF DEFENSE COMMITTEE MEMBER
Student name: Phan Thanh Danh Student ID: 18119214
Student name: Nguyễn Tấn Thiên Niên Student ID: 18119033
Major: Computer Engineering Technology
Project title: Research, design, and construct an autonomous golf cart using multisensor fusion
Name of Defense Committee Member:
EVALUATION
1. Content and workload of the project:
2. Strengths:
3. Weaknesses:
4. Overall evaluation: (Excellent, Good, Fair, Poor)
5. Mark: ……… (In words: )
Ho Chi Minh City, month day, year
COMMITTEE MEMBER
(Sign with full name)
ACKNOWLEDGEMENT
We would like to sincerely thank Professor Le My Ha for his thorough instruction, which gave us the knowledge necessary to complete this thesis. Throughout the whole process, even though we did our best, mistakes are still inevitable. We look forward to our advisor's focused assistance and direction to help us gain more experience and successfully complete the project.
On the other hand, we would like to express our sincere thanks to the Faculty for High Quality Training and the Faculty of Electrical and Electronics Engineering, where we obtained our basic knowledge and experience. In particular, we were honored to receive the golf cart from the FEEE.
Moreover, we would also like to thank the ISLab members who helped us with the details of this project. They shared valuable experience and knowledge with us.
Ultimately, we would like to express our gratitude to our families for their support of our team throughout the implementation of this thesis.
Sincere thanks for everything!
A GUARANTEE
We hereby formally declare that this thesis is the result of our own study and implementation. We did not plagiarize from any published article without the authors' consent. We take full responsibility for any violations that may have occurred.
Authors
Phan Thanh Danh
Nguyễn Tấn Thiên Niên
ABSTRACT
Autonomous cars are able to analyze and manage themselves on the ongoing path, depending on scene understanding and observation of the surroundings. In particular, the automobile needs to interpret all the information surrounding it. Inspired by that idea, in this thesis we propose a multi-sensor fusion method for an autonomous car operating on the HCMUTE campus. The fusion method comprises three types of sensors: a camera, GPS, and 2D LiDAR. To begin with, we utilized and enhanced two deep learning models for lane-line detection and semantic segmentation. Both models are pre-trained and fine-tuned on our self-labeled dataset. As for the GPS signal, we used a Kalman Filter to reduce environmental noise and then continuously checked the destination using a circular equation. Additionally, we took advantage of the 2D LiDAR as a safety term during the obstacle avoidance process. Last but not least, we combined all these programs using threading and a distributed system whose components communicate with each other via the User Datagram Protocol.
The system uses our laptop with an NVIDIA GTX 1650 graphics card as the primary processing unit. To reduce the execution burden on the laptop, we added a Jetson TX2 for processing the GPS and LiDAR sensors. In terms of the control system, the main microcontrollers are two Arduino boards (a Mega and a Nano); one is for steering control, and the other is for speed control. Furthermore, we designed a simple interface using PyQt to display the on-road information.
Experimental results reveal that the whole system works well on several campus roads. Furthermore, the lowest frame rate of the system is 20 frames per second, which satisfies real-time practical applications.
CONTENTS
ACKNOWLEDGEMENT v
A GUARANTEE vi
ABSTRACT vii
LIST OF FIGURES x
LIST OF TABLES xii
CHAPTER 1 INTRODUCTION 1
1.1 OVERVIEW AND RELATED RESEARCH 1
1.2 RESEARCH OBJECTIVE 1
1.3 LIMITATION 1
1.4 RESEARCH CONTENT 2
1.5 THESIS SUMMARY 2
CHAPTER 2 LITERATURE REVIEW 4
2.1 DEEP LEARNING 4
2.1.1 Convolutional Neural Network 4
2.1.2 Image Segmentation 4
2.1.3 Common backbones 6
2.1.4 Optimization method 9
2.2 2D-LIDAR PROCESSING 12
2.2.1 Point cloud clustering. 12
2.2.2 Feature Extraction. 13
2.3 GPS PROCESSING 14
2.3.1 Kalman Filter 14
2.4 PID CONTROLLER 15
CHAPTER 3 HARDWARE PLATFORM 20
3.1 OVERALL SYSTEM 20
3.2 DETAILS 21
3.2.1 Ezgo Golf Cart 21
CHAPTER 4 SOFTWARE DESIGN 33
4.1 RGB IMAGE-BASED ALGORITHMS 34
4.1.1 Enhanced Semantic Segmentation 34
4.1.2 Lane-Line Detection 36
4.2 ALGORITHMS OF GPS DATA 37
4.3 ALGORITHMS ON 2D LiDAR 39
4.4 FUSION OF CAMERA AND 2D LIDAR 41
4.5 ESTIMATION AND CONTROL OF STEERING ANGLE 43
4.6 WHEELING STRATEGY 44
CHAPTER 5 EXPERIMENTAL RESULTS 47
5.1 EXPERIMENTAL ENVIRONMENT 47
5.1.1 Environment 47
5.1.2 Dataset 47
5.2 TRAINING PROCESS 51
5.3 RESULTS 53
5.4 COMPARISONS AND EVALUATION 59
CHAPTER 6 CONCLUSION AND FUTURE WORK 62
APPENDIX 66
LIST OF FIGURES
Figure 2 1 Basic CNN architecture [6] 4
Figure 2 2 The difference between three types of segmentation 5
Figure 2 3.The residual block 6
Figure 2 4 Performance of Residual block with different layers testing 7
Figure 2 5 Resnet-18 architecture 7
Figure 2 6 Depthwise convolution, uses 3 kernels to transform a 12×12×3 image to an 8×8×3 image 8
Figure 2 7 Pointwise convolution, transforms an image of 3 channels into an image of 1 channel 8
Figure 2 8 Pointwise convolution with 256 kernels, outputting an image with 256 channels 9
Figure 2 9 Optimizer algorithms development 10
Figure 2 10 Momentum Idea 10
Figure 2 11 Adaptive Breakpoint Detection algorithm 12
Figure 2 12 RANSAC algorithm 13
Figure 2 13 PID controller 15
Figure 2 14 Function and Operation of Pytorch 16
Figure 2 15 TensorRT mechanism 17
Figure 2 16 OpenStreetMap Interface 18
Figure 2 17 Multithreading for the application 19
Figure 3 1: The overall hardware platform 20
Figure 3 2 The EZGO golf cart 21
Figure 3 3 Ezi servo and driver 22
Figure 3 4 Absolute 8-bit encoder 23
Figure 3 5:The receiver (left) and Devo7 transmitter (right) 24
Figure 3 6 Astra camera 25
Figure 3 7 RP-Lidar A1 25
Figure 3 8 Module GPS Ublox M8N 26
Figure 3 9 Waveshare touch Screen 27
Figure 3 10 Module WIFI router 28
Figure 3 11 Laptop Acer Nitro 5 29
Figure 3 12 Jetson TX2 30
Figure 3 13 The PCB (left) and the steering unit (right) 31
Figure 3 14 The connected diagram between devices (better to see in zoom) 32
Figure 3 15: The connection between relays and actuators to control speed 32
Figure 4 1 The overall software design 33
Figure 4 2 The concurrently executed thread 34
Figure 4 3 Convolutional Block Attention Module with two submodules 34
Figure 4 4 The modified LiteSeg model with CSP and CBAM 35
Figure 4 5 The Ultra-Fast lane line detection model 36
Figure 4 6 Kalman filter flowchart 38
Figure 4 7 The circular checking idea 39
Figure 4 8 Point cloud cluster from ABD algorithm 40
Figure 4 9 Distance from LiDAR to the estimated straight line 40
Figure 4 10 Range and position of the camera as well as LiDAR 41
Figure 4 11 The desired trajectory through three phases 41
Figure 4 12 Angle estimation idea using PID The left is the error estimation (yellow arrow) using the expected middle point 44
Figure 4 13 Car model with detail parameter 45
Figure 5 1 Testing environments 47
Figure 5 2 The self-collected dataset 48
Figure 5 3 Tusimple dataset 48
Figure 5 4 The data labeling process (a,c) and the collected dataset (b,d) 49
Figure 5 5 The Mapilliary Vitas dataset 50
Figure 5 6 Labeling process using Hasty tool 50
Figure 5 7 GPS data Orange and green are the GPS data from module, and blue is the path from API 51
Figure 5 8 Training mIoU and loss graph of lane-line detection model 52
Figure 5 9 Training mIoU and loss graph of the segmentation detection model 53
Figure 5 10 Segmentation output with the campus road 54
Figure 5 11 Results of lane line detection Original image (left), the model prediction (right) 55
Figure 5 12 Results of the clustering algorithm in short and long distance 56
Figure 5 13 Position data in UTE map without Kalman filter and use Kalman filter 56
Figure 5 14 Steering angle visualization The red line is the steering angle sent to the microprocessor, and the blue line is the encoder angle's feedback 57
Figure 5 15 Position of a car through the trajectory in different cases 58
Figure 5 16 Position of the car during the avoiding process 59
Figure 5 17 LiteSeg model with different versions a) Original LiteSeg, b) CSP LiteSeg, c) CBAM LiteSeg, d) CSP CBAM LiteSeg 60
LIST OF TABLES
Table 2 1 Common modules of Pytorch 16
Table 3 1 Specifications of the EZGO golf cart 21
Table 3 2: Specifications of Ezi servo and driver 22
Table 3 3 Specifications of absolute encoder 23
Table 3 4 Specifications of Devo7 transmitter 24
Table 3 5 Specifications of RX601 24
Table 3 6 Specifications of Astra camera 25
Table 3 7 Specifications of RPLidar A1 26
Table 3 8 Specifications of GPS Module (NEO-M8N) 26
Table 3 9 Specifications of Touch Screen 27
Table 3 10 Specifications of WIFI Module 28
Table 3 11 Specifications of Laptop Acer Nitro 5 29
Table 3 12 Specifications of Jetson TX2 30
Table 4 1 Kalman variables 37
Table 5 1 Training parameters of the lane-line detection model 51
Table 5 2 Training parameters of the segmentation model 52
Table 5 3 Comparison Liteseg model with different versions 60
Table 5 4 Comparison between the original model with TensorRT model 61
Table 5 5 Execution time after and before applying the threading technique 61
ABBREVIATIONS
CHAPTER 1 INTRODUCTION
1.1 OVERVIEW AND RELATED RESEARCH
For more than 50 years, the automobile sector has existed and flourished. In the past few years, self-driving cars have been considered the most favorable top-notch expectation, with the potential to be the largest technological revolution in the near future. Predicting the next scenario behavior for an autonomous car has been developed to improve traffic efficiency while ensuring vehicle stability and driver safety. These things derive from awareness of the surrounding environment, which requires the combination of many sensors.
Nowadays, several methods have been introduced to navigate a vehicle using multisensor techniques. One of the earliest approaches [10] involved placing cameras, ultrasonic sensors (HC-SR04), and infrared sensors on the robot's various sides. However, these sensor ranges cannot capture the entire geometry of objects on actual roadways. In [1], the authors presented the advantages of sensors that can optimize the perceived information of the surrounding environment. An Extended Kalman Filter was introduced to combine multiple sensors [2] in order to localize the robot's position on the track. The sensors used in that paper comprised an encoder, compass, IMU, and GPS. In other research [3], the Lyapunov function and Kalman Filter were utilized to plan a trajectory for the robot with the aid of odometry. More recently, combining 3D LiDAR with cameras has become a common method for developing self-driving automobiles [4] or ADAS. However, the high cost of 3D LiDAR is a barrier for students. In contrast, 2D LiDAR is a low-cost alternative option for constructing autonomous systems. On the other hand, as for lateral control and the steering part, the paper [5] used a chain-driven DC motor controlled by a PID controller on an actual golf cart. There is a control problem as well as specific faults caused by the chain-driven transmission.
In this thesis, our team developed a self-driving system for a golf cart that can run well in basic scenarios. The golf cart works with the aid of multiple sensors, which makes it easier for the central processing block to execute the program. Moreover, we proposed a lateral control pipeline for a self-driving car without EPS (Electronic Power Steering), which is applicable in cases with a poor steering mechanism.
1.2 RESEARCH OBJECTIVE
Research, design, and construct an autonomous golf cart that can operate well on the HCMUTE campus. For this purpose, our team has to build the hardware and research suitable algorithms that can operate the car. At the same time, this project serves as a fundamental investigation of the operation of a self-driving car in reality.
1.3 LIMITATION
Because the whole project is long-term, there are many remaining shortcomings. In this thesis, some obvious limitations are presented as follows:
- The automobile can operate only in environments that are not very complicated; it cannot handle scenarios such as crowded pedestrians, sudden changes, and so on.
- In our project, the self-driving car can operate on the HCMUTE campus only.
- Due to the restriction of the camera angle, the car cannot run well on large roads.
- The low-priced sensors do not work well in all outdoor conditions.
- With the limitations of the hardware, this project prioritizes light yet efficient algorithms. Therefore, some of the most precise methods are not utilized in this thesis.
1.4 RESEARCH CONTENT
The implemented contents are described as follows:
Content 1: Refer to documents, survey, read and summarize to determine the project directions
Content 2: Calculate parameters and design block diagram for steering system using DC servo
Content 3: Implement the printed circuit board and wire all modules together
Content 4: Try and handle errors in the wheeling system (mechanical and electrical)
Content 5: Construct the wheeling algorithm suited for the entire system
Content 6: Collect and visualize data of sensors
Content 7: Choose models and algorithms for the car’s perception
Content 8: Write programs for microcontrollers and processors
Content 9: Test and evaluate the completed system
Content 10: Write a report
Content 11: Prepare slides for presenting
1.5 THESIS SUMMARY
The structure of this thesis is arranged as follows:
Chapter 1: Introduction
This chapter introduced the topic, the objectives, the limitations, the related works of the research, and the layout of this thesis.
Chapter 2: Literature review
This chapter gives the fundamental theory, the framework, and the algorithms for the implementation of the thesis, using the relevant studies as a source of reference
Chapter 3: The hardware platform
This chapter presents an overall description of the model's hardware, including the crucial elements, the equipped sensors, and the connections between them.
Chapter 4: Software design
This chapter presents an overall description of the model's software, including the algorithms, parameters, and operation flow.
Chapter 5: Experimental results, comparison, and evaluation
This chapter describes the experiments, including the experimental setting, data collection, and the initially predefined parameters. Additionally, the results and comparisons are also presented.
Chapter 6: Conclusion and future works
This chapter gives the conclusion and some future works which will be conducted
CHAPTER 2 LITERATURE REVIEW
2.1 DEEP LEARNING
2.1.1 Convolutional Neural Network
The Convolutional Neural Network (CNN) is a fundamental concept in the deep learning field. Compared with handcrafted features, a CNN can extract features deeply and concretely. In detail, the features are extracted by CNN layers, which may describe clearer rules of the whole data. Essentially, the main function of a CNN is to exploit deep features through its kernels. The input data is convolved with kernels of predefined shapes; thus, the features become more and more complex through each layer. Besides, a CNN must be combined with other layers, such as max pooling or average pooling, to extract the features better. Finally, the exploited features are classified by "Fully Connected Layers (FCN)" after the flattening layer. Nowadays, networks share some common CNN structures. A CNN is just a series of layers, with the output of the former layer becoming the input of the next. These CNNs have the typical structure shown in Figure 2.1. It comprises 2 main parts: (1) Backbone (feature extraction block), and (2) FCN (Classifier).
Figure 2 1 Basic CNN architecture [6]
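As an illustration only (not code from this thesis), a small network with this backbone-plus-classifier structure can be written in PyTorch, the framework introduced in Section 2.5; all layer sizes below are assumed values.

import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Toy backbone (convolution + pooling) followed by a fully connected classifier."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # feature extraction
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                             # downsample
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                # flattening layer
            nn.Linear(32 * 8 * 8, num_classes),          # fully connected classifier
        )

    def forward(self, x):
        return self.classifier(self.backbone(x))

# Example: a batch of 4 RGB images of size 32x32 -> logits of shape (4, 10)
logits = SimpleCNN()(torch.randn(4, 3, 32, 32))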
2.1.2 Image Segmentation
Image segmentation (IM) assigns precise prediction labels to each pixel. There are three kinds of IM: Instance Segmentation, Semantic Segmentation, and Panoptic Segmentation.
+ Instance Segmentation: Instance segmentation is the process of finding and distinguishing each separate object of interest in a picture
+ Semantic Segmentation: Semantic Segmentation is the process of partitioning a digital picture into several segmented pieces [7] (sets of pixels, also known as image objects), with each pixel tagged to the relevant class displayed on the image
+ Panoptic Segmentation: Panoptic segmentation is the combination of instance and semantic segmentation
Figure 2.2 illustrates the differences between semantic segmentation, instance segmentation, and panoptic segmentation. The execution time scales with the number of classification tasks. That means that, with the same architecture, Instance Segmentation is the lightest task and Panoptic Segmentation is the heaviest task.
Figure 2 2 The difference between three types of segmentation
2.1.3 Common backbones
2.1.3.1 ResNet
ResNet (Residual Network) was first introduced to the public in 2015, and it went on to win first place in the 2015 ILSVRC competition with a top-5 error rate of only 3.57 percent [8]. It also won first place in the ILSVRC and COCO 2015 competitions for ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation. Currently, there are several ResNet architectural versions available, each with a different number of layers, such as ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-152, and so on.
A problem that occurs when building a CNN with many convolutional layers is the Vanishing Gradient phenomenon. ResNet is the solution, using a "skip" connection to cross one or more layers, as shown in Figure 2.3. ResNet is similar to other networks in that it uses convolution, pooling, activation, and fully-connected layers. The residual block utilized in the network is seen in Figure 2.3. A curved arrow appears from the beginning and ends at the end of the residual block. In other words, it adds the input X to the output of the layer; this addition counteracts zero derivatives, since X is still added.
Figure 2 3 The residual block
The ResNet structure was presented as a simpler alternative that focused on improving the flow of information through the gradients of the network. As shown in Figure 2.4 [9], the idea of the residual block solves the Vanishing Gradient problem efficiently. Following ResNet, some modifications to this architecture were introduced. Experiments demonstrate that neural networks with hundreds of layers of depth can be trained with these structures, and they rapidly became the most common architecture in Computer Vision.
Figure 2 4 Performance of the residual block tested with different numbers of layers
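As a minimal sketch (not the exact ResNet-18 block; the channel count is an assumption, and stride/downsampling are omitted), the residual block of Figure 2.3 can be expressed in PyTorch as follows.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions whose output is added to the input (skip connection)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # adding the input X counteracts vanishing gradients

x = torch.randn(1, 64, 56, 56)
assert ResidualBlock(64)(x).shape == x.shape   # the block preserves the tensor shape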
Depthwise Convolutional Strategy:
To begin, we must define Depthwise Convolution. The input tensor3D block in Figure 2.6 is divided into depth matrix slices. Then, convolution is performed on each slice, as indicated in Figure 2.6. Each 5x5x1 kernel iterates across one channel of the image, obtaining the scalar product of every group of 25 pixels and producing an 8x8x1 image. When these images are stacked, we get an 8x8x3 image; the convolution results are concatenated along the depth dimension. As a result, the output is a tensor3D block of size hꞌ×wꞌ×c.
Figure 2 6 Depthwise convolution uses 3 kernels to transform a 12×12×3 image into an 8×8×3 image
Pointwise Convolutional Strategy:
Afterward, the 12x12x3 picture has been converted to an 8x8x3 image using depthwise convolution. The number of channels in the picture must now be increased. The pointwise convolution gets its name from the fact that it uses a 1x1 kernel, which iterates across every single point. This kernel has a depth equal to the number of channels in the input picture; in our instance, there are three. To generate an 8x8x1 image, we iterate a 1x1x3 kernel through our 8x8x3 image, as shown in Figure 2.7.
Figure 2 7 Pointwise convolution transforms an image of 3 channels into an image of 1 channel
Finally, we can create 256 channels for the output. We just convolve the input with 256 kernels of size 1×1×3, as in Figure 2.8. This type of convolution greatly reduces the number of parameters in the model.
Figure 2 8 Pointwise convolution with 256 kernels, outputting an image with 256 channels
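The depthwise and pointwise steps above can be sketched with PyTorch's groups argument; the shapes follow the 12×12×3 example, and the code is illustrative rather than part of the thesis implementation.

import torch
import torch.nn as nn

x = torch.randn(1, 3, 12, 12)                     # 12x12 image with 3 channels

# Depthwise: one 5x5 kernel per input channel (groups = in_channels)
depthwise = nn.Conv2d(3, 3, kernel_size=5, groups=3)
# Pointwise: 256 kernels of size 1x1x3 to expand the channel dimension
pointwise = nn.Conv2d(3, 256, kernel_size=1)

y = depthwise(x)                                  # -> (1, 3, 8, 8)
z = pointwise(y)                                  # -> (1, 256, 8, 8)
print(y.shape, z.shape)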
2.1.4 Optimization method
In this section, optimizers are presented with the aim of pushing the performance of deep learning models. However, trading off accuracy against model size and speed is often necessary.
2.1.4.1 Gradient Descent
Gradient descent (GD) is a first-order iterative optimization process used in mathematics to determine a local minimum of a differentiable function [11]. Because this is the direction of steepest descent, the objective is to take repeated steps in the opposite direction of the gradient of the function at the current position. There are several variants of GD that optimize the training process and minimize the loss function, which are listed as follows:
Stochastic gradient descent (SGD): It is a stochastic approximation of gradient descent optimization since it substitutes the real gradient with an estimate of it
 Batch gradient descent: In Batch Gradient Descent, all of the training data is used to take a single step.
 Mini-batch gradient descent: A mini-batch is a batch with a predefined number of training samples, fewer than the full dataset.
2.1.4.2 Adam optimizer
Adam (Adaptive Moment Estimation) [12] is an optimization algorithm for gradient descent. With its modest memory requirements, the Adam optimizer is efficient when working on large problems involving huge amounts of data or parameters. The Adam optimizer is a combination of two gradient descent methodologies: momentum and RMSprop.
Figure 2 9 Optimizer algorithms development
Momentum
This method utilizes the "exponentially weighted average" of the gradients to provide a certain acceleration and make the gradient descent algorithm faster. Using averages accelerates the algorithm's convergence to the minimum.
Figure 2 10 Momentum Idea
$$\omega_{t+1} = \omega_t - \alpha m_t \qquad (2.1)$$

where

$$m_t = \beta m_{t-1} + (1 - \beta)\left[\frac{\partial L}{\partial \omega_t}\right] \qquad (2.2)$$
 RMSprop
RMSprop, often known as root mean square propagation, is an adaptive learning-rate technique that is an improved version of AdaGrad. It uses the "exponential moving average" rather than the cumulative sum of squared gradients, as in AdaGrad.
$$\omega_{t+1} = \omega_t - \frac{\alpha}{(v_t + \varepsilon)^{1/2}}\left[\frac{\partial L}{\partial \omega_t}\right] \qquad (2.3)$$

where $v_t$ is the exponential moving average of the squared gradients. Adam combines this term with the momentum term, whose first moment estimate is

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\left[\frac{\partial L}{\partial \omega_t}\right] \qquad (2.6)$$

Because both $\beta_1$ and $\beta_2$ are close to 1, the moment estimates tend towards 0 at the beginning of training. By calculating "bias-corrected" $m_t$ and $v_t$, this optimizer resolves the issue. This also manages the weights as they approach the global minimum, in order to avoid large oscillations when close to it. The formulas employed are:

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \qquad (2.7)$$

Including them in our fundamental equation yields:

$$\omega_{t+1} = \omega_t - \hat{m}_t \left(\frac{\alpha}{\sqrt{\hat{v}_t} + \varepsilon}\right) \qquad (2.8)$$
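In practice these update rules are rarely implemented by hand; with PyTorch (Section 2.5) they are available as ready-made optimizers. The sketch below is illustrative only, and the hyperparameter values are common defaults, not the training settings reported in Chapter 5.

import torch

model = torch.nn.Linear(10, 2)            # any nn.Module stands in here
criterion = torch.nn.CrossEntropyLoss()

# SGD with momentum implements equations (2.1)-(2.2);
# Adam combines momentum and RMSprop with bias correction, eqs. (2.6)-(2.8).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999), eps=1e-8)

for x, target in [(torch.randn(8, 10), torch.randint(0, 2, (8,)))]:
    optimizer.zero_grad()
    loss = criterion(model(x), target)
    loss.backward()                       # computes dL/dw for all parameters
    optimizer.step()                      # applies the update rule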
2.2 2D-LIDAR PROCESSING
2.2.1 Point cloud clustering
Figure 2 11 Adaptive Breakpoint Detection algorithm
The clustering procedure divides the raw LiDAR data into groups corresponding to real-world objects. Proposed in [13], the algorithm utilized to categorize all the 2D data points is called the Adaptive Breakpoint Detector (ABD). Traditionally, to cluster a set of point clouds containing n points, the Euclidean distance between the spatially consecutive points $P_n$ and $P_{n-1}$ is calculated. If the distance between these points is lower than a threshold $D_{max}$, they belong to the same cluster. Equation (2.9) depicts the breakpoint condition, under which two consecutive points are separated into different clusters by the predefined threshold:

$$\lVert P_n - P_{n-1} \rVert > D_{max} \qquad (2.9)$$
In fact, the raw 2D LiDAR data becomes sparser as objects get farther away. Therefore, $D_{max}$ must be changeable depending on the distance between the LiDAR and the surroundings. Adapting the threshold distance $D_{max}$ is one approach to get around this issue. Geovany [13] introduced a methodology for estimating the adaptive threshold with respect to the scanning range $r_n$, as shown in Figure 2.11. In this algorithm, a predetermined line passes through the point $P_{n-1}$ and makes an angle $\lambda$ with respect to the $(n-1)^{th}$ scan. This straight line attempts to delimit the allowable range of the next point. According to Figure 2.11, in $\Delta ABC$, the law of sines was applied to estimate $D_{max}$ as follows:
After obtaining $D_{max}$ from equation (2.11), we measure the physical distance between two consecutive points by using the law of cosines in a triangle, which is shown in equation (2.12). If the physical distance $\lVert P_n - P_{n-1} \rVert$ is greater than $D_{max}$, the two consecutive points are assigned to different clusters.

$$\lVert P_n - P_{n-1} \rVert = \sqrt{(r_n)^2 + (r_{n-1})^2 - 2 r_{n-1} r_n \cos(\Delta\phi)} \qquad (2.12)$$
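A minimal Python sketch of this clustering step is given below; the angle λ and the noise margin σ are assumed, tunable parameters, and the scan is taken as range/angle pairs ordered by angle. This illustrates the idea rather than the exact implementation used on the cart.

import math

def abd_clusters(ranges, angles, lam=math.radians(10), sigma=0.01):
    """Adaptive Breakpoint Detector: split an ordered 2D scan into clusters.

    ranges, angles: lists of r_n and phi_n from the LiDAR, ordered by angle.
    lam: the predefined lambda angle; sigma: sensor noise margin (both assumed).
    """
    clusters, current = [], [0]
    for n in range(1, len(ranges)):
        d_phi = angles[n] - angles[n - 1]
        # Adaptive threshold from the law of sines, plus a noise term
        d_max = ranges[n - 1] * math.sin(d_phi) / math.sin(lam - d_phi) + 3 * sigma
        # Physical distance between consecutive points (law of cosines, eq. 2.12)
        dist = math.sqrt(ranges[n] ** 2 + ranges[n - 1] ** 2
                         - 2 * ranges[n] * ranges[n - 1] * math.cos(d_phi))
        if dist > d_max:          # breakpoint: start a new cluster
            clusters.append(current)
            current = [n]
        else:
            current.append(n)
    clusters.append(current)
    return clusters               # lists of point indices, one list per object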
2.2.2 Feature Extraction
To extract the features of obstacles, in this work we utilized Random Sample Consensus (RANSAC) [14] to fit the straight line of the car body on the right. The RANSAC method is a learning approach that uses random sampling of observed data to estimate model parameters. RANSAC employs a voting mechanism to determine the best fitting result given a dataset with both inliers and outliers. The elements of the dataset are utilized to vote for one or more models. This voting technique is based on two assumptions: that the noisy features will not consistently vote for any particular model (few outliers), and that there are enough features to agree on a decent model (little missing data). Thanks to these properties, RANSAC can be used as a straight-line estimator in 2D space with little noise. Figure 2.12 shows the RANSAC algorithm used in line prediction based on the voting technique.
Figure 2 12 RANSAC algorithm
The RANSAC technique can be illustrated by the following algorithm [15]:
Algorithm 1: RANSAC algorithm
Input:
data – A set of observations
model – A model to explain observed data points
n – Minimum number of data points required to estimate model parameters
k – Maximum number of iterations allowed in the algorithm
t – Threshold value to determine data points that fit well with the model
d – Number of close data points required to assert that a model fits well to data
Outputs: bestFit – model parameters that best fit the data (or null if no good model is found)
Begin
iterations := 0
bestFit := null
bestErr := +infinity
while iterations < k do
maybeInliers := n randomly selected values from data
maybeModel := model parameters fitted to maybeInliers
alsoInliers := empty set
for every point in data not in maybeInliers do
if point fits maybeModel with an error smaller than t
add point to alsoInliers
end if
end for
if the number of elements in alsoInliers is > d then
betterModel := model parameters fitted to all points in maybeInliers and alsoInliers
thisErr := a measure of how well betterModel fits these points
if thisErr < bestErr then
bestFit := betterModel
bestErr := thisErr
end if
end if
increment iterations
end while
return bestFit
End
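For reference, a compact NumPy version of the same idea, specialized to fitting a 2D line to LiDAR points, might look like the sketch below; the refit step of Algorithm 1 is omitted for brevity, and the parameter values are illustrative.

import numpy as np

def ransac_line(points, k=100, t=0.05, d=20, rng=np.random.default_rng(0)):
    """Fit a line a*x + b*y + c = 0 to an Nx2 point array with RANSAC."""
    best_fit, best_err = None, np.inf
    for _ in range(k):
        p1, p2 = points[rng.choice(len(points), 2, replace=False)]
        direction = p2 - p1
        normal = np.array([-direction[1], direction[0]])   # perpendicular to the sampled pair
        norm = np.linalg.norm(normal)
        if norm == 0:
            continue
        a, b = normal / norm
        c = -(a * p1[0] + b * p1[1])
        dist = np.abs(points @ np.array([a, b]) + c)       # point-to-line distances
        inlier_dist = dist[dist < t]
        if len(inlier_dist) > d:                           # enough points voted for this model
            err = inlier_dist.mean()
            if err < best_err:
                best_fit, best_err = (a, b, c), err
    return best_fit                                        # None if no good model was found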
2.3 GPS PROCESSING
2.3.1 Kalman Filter
The Kalman Filter is a linear-Gaussian state-space model used for time-series prediction. It was first introduced by Kalman and has since been widely studied and utilized.
The Kalman filter [16] is mostly used to combine low-level data. If the system can be characterized as a linear model and the error as Gaussian noise, the recursive Kalman filter will generate optimal statistical estimates. The Kalman filter calculates the system's state as a weighted average of the predicted state and the new measurement. The weights are intended to ensure that values with lower estimated uncertainty are "trusted" more. The weights are derived from the covariance, which is a measure of the estimated uncertainty in predicting the system's state. The weighted average yields a new state estimate that falls somewhere between the predicted and observed states and has a lower estimated uncertainty than either alone. At each time step, this process is repeated, with the current estimate and its covariance informing the prediction used in the next iteration. This means that the Kalman filter works recursively, using only the most recent "best estimate" of the system's state to generate a new state rather than the complete history.
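A minimal sketch of a linear Kalman filter for smoothing 2D GPS positions is shown below, assuming a constant-velocity model; the noise covariances Q and R are placeholder values that would need tuning, and this is not the exact filter used in Chapter 4.

import numpy as np

class GPSKalmanFilter:
    """Constant-velocity Kalman filter for 2D position measurements."""
    def __init__(self, dt=0.1, q=1e-3, r=1e-2):
        self.x = np.zeros(4)                       # state: [px, py, vx, vy]
        self.P = np.eye(4)                         # state covariance
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = dt   # motion model
        self.H = np.zeros((2, 4)); self.H[0, 0] = self.H[1, 1] = 1.0
        self.Q = q * np.eye(4)                     # process noise (assumed)
        self.R = r * np.eye(2)                     # measurement noise (assumed)

    def update(self, z):
        # Predict the next state from the motion model
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Correct with the new GPS measurement z = [px, py]
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain (the "weights")
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]                          # filtered position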
2.4 PID CONTROLLER
A Proportional-Integral-Derivative (PID) controller [17], shown in Figure 2.13, is a common type of feedback mechanism in control loops. PID controllers are the most often utilized feedback controllers in industrial control systems. The "error" value is calculated by the PID controller as the difference between the measured value of the controlled variable and the intended set value. By modifying the control input, the controller reduces the error. Simply put, the three terms of PID are described as follows:
 P: the proportional term generates an adjustment signal proportional to the current error of the input at each sampling time.
 I: the integral of the error over the sampling time. Integral control generates a tuning signal so that the error is reduced to 0. It reflects the total instantaneous error over time, i.e., the accumulated past error. The smaller the integral time, the stronger the integral adjustment effect, and the smaller the remaining deviation.
 D: the derivative of the error. Derivative control creates an adjustment signal proportional to the rate of change of the input error. The greater the derivative time, the greater the differential tuning range, and hence the faster the regulator responds to input changes.
Figure 2 13 PID controller
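A discrete PID loop of the form in Figure 2.13 can be sketched as follows; the gains and sampling time are placeholders and would have to be tuned for the actual steering or speed loop.

class PID:
    """Discrete PID controller: u = Kp*e + Ki*integral(e) + Kd*de/dt."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt                   # I term accumulates past error
        derivative = (error - self.prev_error) / self.dt   # D term reacts to the error rate
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Hypothetical usage: drive the measured steering angle toward the desired angle
controller = PID(kp=1.2, ki=0.05, kd=0.1, dt=0.02)
command = controller.step(setpoint=15.0, measurement=12.5)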
2.5 PYTORCH FRAMEWORK
PyTorch is an open-source machine learning framework based on the Torch library that is used for applications such as computer vision and natural language processing [18]. PyTorch is developed under an open-source license, hence it has a wide community. Though Google's TensorFlow is a well-established ML/DL framework with a devoted following, PyTorch has established a stronghold due to its dynamic graph approach and flexible debugging strategy [19]. The tensor is the primary data structure in PyTorch. A tensor, like a NumPy array, is a multi-dimensional array with elements of the same data type.
PyTorch packages many modules to support the training process, as shown in Figure 2.14. These modules help the user conduct training more easily on different hardware platforms.
Figure 2 14 Function and operation of PyTorch
Table 2.1 lists some of the most-used modules for different phases, such as loading datasets, configuring models, and so on.
Table 2 1 Common modules of PyTorch
torch.nn – Defines basic blocks such as layers in models
torch.optim – Defines optimizers for deep learning / machine learning
torch.utils – Commonly used for loading datasets with different loaders
torch.autograd – PyTorch's automatic differentiation engine that powers neural network training
torch.Tensor.backward – Computes the gradient of the current tensor
We can also convert PyTorch-trained models into formats like ONNX, allowing them to be used in other "deep learning frameworks" like MXNet, CNTK, and Caffe2. ONNX models can in turn be converted to TensorFlow or TensorRT format.
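As a brief, hedged example of this conversion path, exporting a PyTorch model to ONNX (which TensorRT can then parse) typically looks like the sketch below; the ResNet-18 stand-in model and the input shape are assumptions for illustration, and a recent torchvision version is assumed.

import torch
import torchvision

model = torchvision.models.resnet18(weights=None).eval()   # stand-in model
dummy_input = torch.randn(1, 3, 224, 224)                  # assumed input shape

torch.onnx.export(
    model, dummy_input, "resnet18.onnx",
    input_names=["input"], output_names=["output"],
    opset_version=11,
)
# The resulting .onnx file can be converted to a TensorRT engine,
# e.g. with NVIDIA's trtexec tool: trtexec --onnx=resnet18.onnx --fp16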
2.6 TENSOR-RT PATTERN
TensorRT is an NVIDIA library designed to increase inference performance and minimize latency on NVIDIA graphics hardware (GPUs) [20]. It can increase inference speed by 2–4 times compared to real-time services and by up to 30 times compared to CPU-only performance. TensorRT uses five optimization methods to improve inference speed, which are shown in Figure 2.15.
Figure 2 15 TensorRT mechanism
Precision calibration
The parameters and activations trained in FP32 (32-bit floating point) precision are converted to FP16 or INT8 precision. This optimization decreases latency and boosts inference speed, albeit at the price of slightly decreasing model accuracy. In real-time recognition, a trade-off between accuracy and inference speed is sometimes essential.
 Dynamic Tensor Memory
Memory is allocated for each tensor only for the time that it is used, which minimizes the memory footprint and encourages memory re-use.
 Layer and Tensor Fusion
TensorRT merges nodes vertically, horizontally, or both ways to minimize GPU memory usage and bandwidth.
 Kernel Auto-Tuning
Several kernels dedicated to optimization are evaluated during the model optimization process, and the best-performing ones for the target GPU are selected.
 Multi-Stream Execution
Enables concurrent processing of multiple input streams.
2.7 OPEN STREET MAP
OpenStreetMap (OSM) [21] is an online world map created by many volunteer amateurs in the spirit of Wikipedia. The map is released under a free license to encourage users to contribute to the global map data. OSM is widely used because of the following advantages:
- The global map data is available online and saved in vector form
- Runs in many programs, such as JOSM
- Easy to change the data format
Figure 2 16 OpenStreetMap Interface
2.8 MULTITHREADING
In a computer system, a program needs at least one thread to execute, which is called the main thread. The word "multithreading" [22] describes a program execution that uses more than one thread. In the real world, a computer usually has many CPU cores, and the threads of a program are executed back and forth on those cores by the scheduler (a part of the operating system).
Figure 2 17 Multithreading for the application
The advantages of multithreading are full utilization of CPU resources and an improved user experience.
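A minimal sketch of the pattern used in this project (concurrent threads that exchange results over UDP, as described in the Abstract) is shown below; the port number, message format, and the placeholder obstacle distance are assumptions for illustration.

import socket
import threading

PORT = 5005  # placeholder port

def lidar_worker(stop_event):
    """Hypothetical worker: would process LiDAR scans and publish results over UDP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while not stop_event.is_set():
        nearest_obstacle_m = 1.8                       # placeholder for a real computation
        sock.sendto(str(nearest_obstacle_m).encode(), ("127.0.0.1", PORT))
        stop_event.wait(0.1)                           # roughly 10 Hz

# The main thread plays the role of the fusion/control program: it listens on the port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", PORT))

stop = threading.Event()
threading.Thread(target=lidar_worker, args=(stop,), daemon=True).start()

for _ in range(5):                                      # read a few messages, then stop
    data, _ = receiver.recvfrom(1024)
    print("obstacle distance:", data.decode())
stop.set()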
CHAPTER 3 HARDWARE PLATFORM
3.1 OVERALL SYSTEM
Figure 3 1 The overall hardware platform
This section presents the functions of each hardware component and the relations between them. Overall, the whole hardware system is illustrated in Figure 3.1. The functions of each component are described as follows:
- Inputs: There are mainly 4 input devices: a camera, a 2D LiDAR, a GPS module, and an encoder. These devices collect input from the surrounding environment.
- Processors: Two main processors are utilized: our laptop and an NVIDIA Jetson TX2 board. These processors receive data from the peripherals, process it, and send signals to the microcontrollers.
- Microcontrollers: There are two microcontrollers on the PCB. One is for controlling the steering angle and the other is for controlling the speed.
- Actuators: The parts that execute commands from the microcontrollers.
3.2 DETAILS
3.2.1 Ezgo Golf Cart
Figure 3 2 The EZGO golf cart
Table 3 1 Specifications of the EZGO golf cart
Function and feature:
- The golf cart is the model used for implementation
- Dimensions of the real car
- Has no Electronic Power Steering (EPS) support
3.2.2 Ezi Servo and Driver
Figure 3 3 Ezi servo and driver
Table 3 2 Specifications of Ezi servo and driver
Function and feature:
- Supplies pulses for the steering servo
- Protects the servo in cases such as overpower, circuit errors, overcurrent, and so on
- Receives pulses from the microcontroller and controls the servo
Specifications:
- Control method: Closed-loop control with a 23-bit MCU
- Current consumption: Max 500 mA (excluding motor current)
- Max input pulse:
- I/O signals:
  + Input signals: Position command pulse, Servo On/Off, Alarm Reset (photocoupler input)
  + Output signals: In-Position, Alarm (photocoupler output); Encoder signal (A+, A-, B+, B-, Z+, Z-, 26C31 or equivalent) (line driver output); Brake
3.2.3 Absolute Encoder
Figure 3 4 Absolute 8-bit encoder
Table 3 3 Specifications of absolute encoder
Function and feature:
- Used for measuring the steering angle directly
- The resolution of the encoder is 256 pulses
3.2.4 Devo7 and RX 601
Figure 3 5 The receiver (left) and Devo7 transmitter (right)
Table 3 4 Specifications of Devo7 transmitter
Function and feature:
- Used for the manual control mode; the Devo7 is a transmitter with seven channels
- The maximum range can reach 80 m