MINISTRY OF EDUCATION AND TRAINING
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND EDUCATION
FACULTY FOR HIGH QUALITY TRAINING

GRADUATION PROJECT
COMPUTER ENGINEERING TECHNOLOGY

RESEARCH, DESIGN AND CONSTRUCT
AN AUTONOMOUS GOLF CART
USING MULTISENSOR FUSION

ADVISOR: DR. LE MY HA
STUDENTS: PHAN THANH DANH
NGUYEN TAN THIEN NIEN

S K L 0 0 9 8 3 0
Ho Chi Minh City, January 2022
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND EDUCATION
FACULTY FOR HIGH QUALITY TRAINING

GRADUATION PROJECT

RESEARCH, DESIGN AND CONSTRUCT AN
AUTONOMOUS GOLF CART USING MULTISENSOR FUSION
THE SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom – Happiness
Ho Chi Minh City, July 27, 2022
GRADUATION PROJECT ASSIGNMENT
Student name: Phan Thanh Danh Student ID: 18119214
Student name: Nguyễn Tấn Thiên Niên Student ID: 18119033
Major: Computer Engineering Technology Class: 18119CLA1
Advisor: Assoc. Prof. Lê Mỹ Hà Phone number: 0938811201
Date of assignment: Date of submission:
1. Project title: Research, design, and construct an autonomous golf cart using multisensor fusion
2. Initial materials provided by the advisor:
- Image processing and machine learning documents such as papers and books
- Related theses of previous students
- The hardware specifications and their reviews
3. Content of the project:
- Refer to documents, survey, read and summarize to determine the project directions
- Calculate parameters and design block diagram for steering system using DC servo
- Try and handle errors in the wheeling system (mechanical and electrical)
- Collect and visualize data of sensors
- Choose models and algorithms for the car’s perception
- Write programs for microcontrollers and processors
- Test and evaluate the completed system
- Write a report
- Prepare slides for presenting
4. Final product: A golf cart model using a multisensor combination with two modes: Automatic and Manual. The golf cart can operate well on the HCMUTE campus in scenarios that are not overly complex.
CHAIR OF THE PROGRAM
(Sign with full name)
ADVISOR
(Sign with full name)
ADVISOR'S EVALUATION SHEET
Student name: Phan Thanh Danh Student ID: 18119214
Student name: Nguyễn Tấn Thiên Niên Student ID: 18119033
Major: Computer Engineering Technology
Project title: Research, design, and construct an autonomous golf cart using multisensor fusion
Advisor: Assoc. Prof. Lê Mỹ Hà
EVALUATION
1. Content of the project:
- The thesis has a total of six chapters with 73 pages.
- The construction and design of an autonomous golf cart that can run on the HCMUTE campus with two modes: Manual and Automatic.
- The system works with a combination of different sensors and their corresponding algorithms.
- The real system was successfully completed following the objectives in the proposal.
- The thesis forms a basic foundation for the next generations of HCMUTE students in the field of practical autonomous cars.
2. Strengths:
- The system can support UTE students and lecturers in moving around campus.
- The golf cart is designed with image processing and machine learning algorithms combined with control techniques and mechanism design.
- All sensors used in this project are low-cost.
- The execution time of the automatic mode is suitable for practical application.
- The accuracy and safety of the system are guaranteed.
6. Mark: 10 (In words: Ten)
Ho Chi Minh City, July 27, 2022
ADVISOR
(Sign with full name)
APPENDIX 5: (Pre-Defense Evaluation Sheet)
THE SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom – Happiness
Ho Chi Minh City, January 20, 2020
PRE-DEFENSE EVALUATION SHEET
Student name: Phan Thanh Danh Student ID: 18119214
Student name: Nguyễn Tấn Thiên Niên Student ID: 18119033
Major: Computer Engineering Technology
Project title: Research, design, and construct an autonomous golf cart using multisensor fusion
Name of Reviewer:
EVALUATION
1. Content and workload of the project:
2. Strengths:
3. Weaknesses:
4. Approval for oral defense? (Approved or denied)
5. Overall evaluation: (Excellent, Good, Fair, Poor)
6. Mark: ……… (In words: )
Ho Chi Minh City, month day, year
REVIEWER
(Sign with full name)
APPENDIX 6: (Evaluation Sheet of Defense Committee Member)
THE SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom – Happiness
EVALUATION SHEET OF DEFENSE COMMITTEE MEMBER
Student name: Phan Thanh Danh Student ID: 18119214
Student name: Nguyễn Tấn Thiên Niên Student ID: 18119033
Major: Computer Engineering Technology
Project title: Research, design, and construct an autonomous golf cart using multisensor fusion
Name of Defense Committee Member:
EVALUATION
1. Content and workload of the project:
2. Strengths:
3. Weaknesses:
4. Overall evaluation: (Excellent, Good, Fair, Poor)
5. Mark: ……… (In words: )
Ho Chi Minh City, month day, year
COMMITTEE MEMBER
(Sign with full name)
ACKNOWLEDGEMENT
We would like to sincerely thank Professor Le My Ha for his thorough instruction, which gave us the knowledge necessary to complete this thesis. Throughout the whole process, even though we did our best, mistakes are still inevitable. We look forward to our advisor's focused assistance and direction to help us gain more experience and successfully complete the project.
On the other hand, we would like to express our sincere thanks to the Faculty for High Quality Training and the Faculty of Electrical and Electronics Engineering, where we obtained our basic knowledge and experience. In particular, we were honored to receive the golf cart from the FEEE.
Moreover, we would also like to thank the ISLab members who helped us with the details of this project. They shared valuable experience and knowledge with us.
Ultimately, we would like to express our gratitude to our families for their support of our team throughout the implementation of this thesis.
Sincere thanks for everything!
A GUARANTEE
We hereby formally declare that this thesis is the result of our own study and implementation. We did not plagiarize from any published article without the authors' consent. We take full responsibility for any violations that may have occurred.
Authors
Phan Thanh Danh
Nguyễn Tấn Thiên Niên
ABSTRACT
Autonomous cars are able to analyze and manage themselves on the ongoing path, depending on scene understanding and observation of the surroundings. In particular, the automobile needs to interpret all the information surrounding it. Inspired by that idea, in this thesis we propose a multi-sensor fusion method for an autonomous car operating on the HCMUTE campus. The fusion method comprises three types of sensors: a camera, GPS, and 2D LiDAR. To begin with, we utilized and enhanced two deep learning models for lane-line detection and semantic segmentation. Both models are pre-trained and fine-tuned on our self-labeled dataset. As for the GPS signal, we used a Kalman Filter to reduce environmental noise and then continuously checked the destination using a circular equation. Additionally, we took advantage of the 2D LiDAR as a safety term during the obstacle avoidance process. Last but not least, we combined all these programs using threading and a distributed system whose components communicate with each other via the User Datagram Protocol.
The system uses our laptop with an NVIDIA GTX 1650 graphics card as the primary processing unit. To reduce the execution burden on the laptop, we added a Jetson TX2 for processing the GPS and LiDAR sensors. In terms of the control system, the main microcontrollers are two Arduino boards (a Mega and a Nano); one is for steering control, and the other is for speed control. Furthermore, we designed a simple interface using PyQt to display the on-road information.
Experimental results reveal that the whole system works well on several campus roads. Furthermore, the lowest frame rate of the system is 20 frames per second, which satisfies real-time practical applications.
CONTENTS
ACKNOWLEDGEMENT v
A GUARANTEE vi
ABSTRACT vii
LIST OF FIGURES x
LIST OF TABLES xii
CHAPTER 1 INTRODUCTION 1
1.1 OVERVIEW AND RELATED RESEARCH 1
1.2 RESEARCH OBJECTIVE 1
1.3 LIMITATION 1
1.4 RESEARCH CONTENT 2
1.5 THESIS SUMMARY 2
CHAPTER 2 LITERATURE REVIEW 4
2.1 DEEP LEARNING 4
2.1.1 Convolutional Neural Network 4
2.1.2 Image Segmentation 4
2.1.3 Common backbones 6
2.1.4 Optimization method 9
2.2 2D-LIDAR PROCESSING 12
2.2.1 Point cloud clustering. 12
2.2.2 Feature Extraction. 13
2.3 GPS PROCESSING 14
2.3.1 Kalman Filter 14
2.4 PID CONTROLLER 15
CHAPTER 3 HARDWARE PLATFORM 20
3.1 OVERALL SYSTEM 20
3.2 DETAILS 21
3.2.1 Ezgo Golf Cart 21
CHAPTER 4 SOFTWARE DESIGN 33
4.1 RGB IMAGE-BASED ALGORITHMS 34
4.1.1 Enhanced Semantic Segmentation 34
4.1.2 Lane-Line Detection 36
4.2 ALGORITHMS OF GPS DATA 37
4.3 ALGORITHMS ON 2D LiDAR 39
4.4 FUSION OF CAMERA AND 2D LIDAR 41
4.5 ESTIMATION AND CONTROL OF STEERING ANGLE 43
4.6 WHEELING STRATEGY 44
CHAPTER 5 EXPERIMENTAL RESULTS 47
5.1 EXPERIMENTAL ENVIRONMENT 47
5.1.1 Environment 47
5.1.2 Dataset 47
5.2 TRAINING PROCESS 51
5.3 RESULTS 53
5.4 COMPARISONS AND EVALUATION 59
CHAPTER 6 CONCLUSION AND FUTURE WORK 62
APPENDIX 66
LIST OF FIGURES
Figure 2 1 Basic CNN architecture [6] 4
Figure 2 2 The difference between three types of segmentation 5
Figure 2 3.The residual block 6
Figure 2 4 Performance of Residual block with different layers testing 7
Figure 2 5 Resnet-18 architecture 7
Figure 2 6 Depthwise convolution, uses 3 kernels to transform a 12×12×3 image to an 8×8×3 image 8
Figure 2 7 Pointwise convolution, transforms an image of 3 channels into an image of 1 channel 8
Figure 2 8 Pointwise convolution with 256 kernels, outputting an image with 256 channels 9
Figure 2 9 Optimizer algorithms development 10
Figure 2 10 Momentum Idea 10
Figure 2 11 Adaptive Breakpoint Detection algorithm 12
Figure 2 12 RANSAC algorithm 13
Figure 2 13 PID controller 15
Figure 2 14 Function and Operation of Pytorch 16
Figure 2 15 TensorRT mechanism 17
Figure 2 16 OpenStreetMap Interface 18
Figure 2 17 Multithreading for the application 19
Figure 3 1: The overall hardware platform 20
Figure 3 2 The EZGO golf cart 21
Figure 3 3 Ezi servo and driver 22
Figure 3 4 Absolute 8-bit encoder 23
Figure 3 5:The receiver (left) and Devo7 transmitter (right) 24
Figure 3 6 Astra camera 25
Figure 3 7 RP-Lidar A1 25
Figure 3 8 Module GPS Ublox M8N 26
Figure 3 9 Waveshare touch Screen 27
Figure 3 10 Module WIFI router 28
Figure 3 11 Laptop Acer Nitro 5 29
Figure 3 12 Jetson TX2 30
Figure 3 13 The PCB (left) and the steering unit (right) 31
Figure 3 14 The connected diagram between devices (better to see in zoom) 32
Figure 3 15: The connection between relays and actuators to control speed 32
Figure 4 1 The overall software design 33
Figure 4 2 The concurrently executed thread 34
Figure 4 3 Convolutional Block Attention Module with two submodules 34
Figure 4 4 The modified LiteSeg model with CSP and CBAM 35
Figure 4 5 The Ultra-Fast lane line detection model 36
Figure 4 6 Kalman filter flowchart 38
Figure 4 7 The circular checking idea 39
Figure 4 8 Point cloud cluster from ABD algorithm 40
Figure 4 9 Distance from LiDAR to the estimated straight line 40
Figure 4 10 Range and position of the camera as well as LiDAR 41
Figure 4 11 The desired trajectory through three phases 41
Figure 4 12 Angle estimation idea using PID The left is the error estimation (yellow arrow) using the expected middle point 44
Figure 4 13 Car model with detail parameter 45
Figure 5 1 Testing environments 47
Figure 5 2 The self-collected dataset 48
Figure 5 3 Tusimple dataset 48
Figure 5 4 The data labeling process (a,c) and the collected dataset (b,d) 49
Figure 5 5 The Mapilliary Vitas dataset 50
Figure 5 6 Labeling process using Hasty tool 50
Figure 5 7 GPS data Orange and green are the GPS data from module, and blue is the path from API 51
Figure 5 8 Training mIoU and loss graph of lane-line detection model 52
Figure 5 9 Training mIoU and loss graph of the segmentation detection model 53
Figure 5 10 Segmentation output with the campus road 54
Figure 5 11 Results of lane line detection Original image (left), the model prediction (right) 55
Figure 5 12 Results of the clustering algorithm in short and long distance 56
Figure 5 13 Position data in UTE map without Kalman filter and use Kalman filter 56
Figure 5 14 Steering angle visualization The red line is the steering angle sent to the microprocessor, and the blue line is the encoder angle's feedback 57
Figure 5 15 Position of a car through the trajectory in different cases 58
Figure 5 16 Position of the car during the avoiding process 59
Figure 5 17 LiteSeg model with different versions a) Original LiteSeg, b) CSP LiteSeg, c) CBAM LiteSeg, d) CSP CBAM LiteSeg 60
LIST OF TABLES
Table 2 1 Common modules of Pytorch 16
Table 3 1 Specifications of the EZGO golf cart 21
Table 3 2: Specifications of Ezi servo and driver 22
Table 3 3 Specifications of absolute encoder 23
Table 3 4 Specifications of Devo7 transmitter 24
Table 3 5 Specifications of RX601 24
Table 3 6 Specifications of Astra camera 25
Table 3 7 Specifications of RPLidar A1 26
Table 3 8 Specifications of GPS Module (NEO-M8N) 26
Table 3 9 Specifications of Touch Screen 27
Table 3 10 Specifications of WIFI Module 28
Table 3 11 Specifications of Laptop Acer Nitro 5 29
Table 3 12 Specifications of Jetson TX2 30
Table 4 1 Kalman variables 37
Table 5 1 Training parameters of the lane-line detection model 51
Table 5 2 Training parameters of the segmentation model 52
Table 5 3 Comparison Liteseg model with different versions 60
Table 5 4 Comparison between the original model with TensorRT model 61
Table 5 5 Execution time after and before applying the threading technique 61
ABBREVIATIONS
CHAPTER 1 INTRODUCTION
1.1 OVERVIEW AND RELATED RESEARCH
For more than 50 years, the automobile sector has existed and flourished. In the past few years, self-driving cars have been considered the most favorable top-notch expectation, with the potential to be the largest technological revolution in the near future. Predicting the next scenario behavior for an autonomous car has been developed to improve traffic efficiency while ensuring vehicle stability and driver safety. These things derive from awareness of the surrounding environment, which requires the combination of many sensors.
Nowadays, several methods have been introduced to navigate a vehicle using multisensor techniques. One of the earliest approaches [10] involved placing cameras, ultrasonic sensors (HC-SR04), and infrared sensors on the robot's various sides. However, these sensor ranges cannot capture the entire geometry of objects on actual roadways. In [1], the authors presented the advantages of sensors that can optimize the perceived information of the surrounding environment. An Extended Kalman Filter was introduced to combine multiple sensors [2] in order to localize the robot's position on the track. The sensors used in that paper comprised an encoder, compass, IMU, and GPS. In other research [3], the Lyapunov function and Kalman Filter were utilized to plan a trajectory for the robot with the aid of odometry. More recently, combining 3D LiDAR with cameras has become a common method for developing self-driving automobiles [4] or ADAS. However, the high cost of 3D LiDAR is a barrier for students. In contrast, 2D LiDAR is a low-cost alternative option for constructing autonomous systems. On the other hand, as for lateral control and the steering part, the paper [5] used a chain-driven DC motor controlled by a PID controller on an actual golf cart. There is a control problem as well as specific faults caused by the chain-driven transmission.
In this thesis, our team developed a self-driving system for a golf cart that can run well in basic scenarios. The golf cart works with the aid of multiple sensors, which makes it easier for the central processing block to execute the program. Moreover, we proposed a lateral control pipeline for a self-driving car without EPS (Electronic Power Steering), which is applicable in cases with a poor steering mechanism.
1.2 RESEARCH OBJECTIVE
Research, design, and construct an autonomous golf cart that can operate well on the HCMUTE campus. For this purpose, our team has to build the hardware and research suitable algorithms that can operate the car. At the same time, this project serves as a fundamental investigation of the operation of a self-driving car in reality.
1.3 LIMITATION
Because the whole project is long-term, there are many remaining shortcomings. In this thesis, some obvious limitations are presented as follows:
- The automobile can operate only in environments that are not very complicated; it cannot handle scenarios such as crowded pedestrians, sudden changes, and so on.
- In our project, the self-driving car can operate on the HCMUTE campus only.
- Due to the restriction of the camera angle, the car cannot run well on large roads.
- The low-priced sensors do not work well in all outdoor conditions.
- With the limitations of the hardware, this project prioritizes light yet efficient algorithms. Therefore, some of the most precise methods are not utilized in this thesis.
1.4 RESEARCH CONTENT
The implemented contents are described as follows:
Content 1: Refer to documents, survey, read and summarize to determine the project directions
Content 2: Calculate parameters and design block diagram for steering system using DC servo
Content 3: Implement the printed circuit board and wire all modules together
Content 4: Try and handle errors in the wheeling system (mechanical and electrical)
Content 5: Construct the wheeling algorithm suited for the entire system
Content 6: Collect and visualize data of sensors
Content 7: Choose models and algorithms for the car’s perception
Content 8: Write programs for microcontrollers and processors
Content 9: Test and evaluate the completed system
Content 10: Write a report
Content 11: Prepare slides for presenting
1.5 THESIS SUMMARY
The structure of this thesis is arranged as follows:
Chapter 1: Introduction
This chapter introduced the topic, the objectives, the limitations, the related works of the research, and the layout of this thesis.
Chapter 2: Literature review
This chapter gives the fundamental theory, the framework, and the algorithms for the implementation of the thesis, using the relevant studies as a source of reference
Chapter 3: The hardware platform
This chapter presents an overall description of the model's hardware, including the crucial elements, the equipped sensors, and the connections between them.
Chapter 4: Software design
This chapter presents an overall description of the model's software, including the algorithms, parameters, and operation flow.
Chapter 5: Experimental results, comparison, and evaluation
This chapter describes the experiments, including the experimental setting, data collection, and the initially predefined parameters. Additionally, the results and comparisons are also presented.
Chapter 6: Conclusion and future works
This chapter gives the conclusion and some future works which will be conducted
CHAPTER 2 LITERATURE REVIEW
2.1 DEEP LEARNING
2.1.1 Convolutional Neural Network
The Convolutional Neural Network (CNN) is a fundamental concept in the deep learning field. Compared with handcrafted features, a CNN can extract features deeply and concretely. In detail, the features are extracted by CNN layers, which may describe clearer rules of the whole data. Essentially, the main function of a CNN is to exploit deep features through its kernels. The input data is convolved with kernels of predefined shapes; thus, the features become more and more complex through each layer. Besides, a CNN must be combined with other layers, such as max pooling or average pooling, to extract the features better. Finally, the exploited features are classified by "Fully Connected Layers (FCN)" after the flattening layer. Nowadays, networks share some common CNN structures. A CNN is just a series of layers, with the output of the former layer becoming the input of the next. These CNNs have the typical structure shown in Figure 2.1. It comprises 2 main parts: (1) Backbone (feature extraction block), and (2) FCN (Classifier).
Figure 2 1 Basic CNN architecture [6]
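As an illustration only (not code from this thesis), a small network with this backbone-plus-classifier structure can be written in PyTorch, the framework introduced in Section 2.5; all layer sizes below are assumed values.

import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Toy backbone (convolution + pooling) followed by a fully connected classifier."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # feature extraction
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                             # downsample
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                # flattening layer
            nn.Linear(32 * 8 * 8, num_classes),          # fully connected classifier
        )

    def forward(self, x):
        return self.classifier(self.backbone(x))

# Example: a batch of 4 RGB images of size 32x32 -> logits of shape (4, 10)
logits = SimpleCNN()(torch.randn(4, 3, 32, 32))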
2.1.2 Image Segmentation
Image segmentation (IM) assigns precise prediction labels to each pixel. There are three kinds of IM: Instance Segmentation, Semantic Segmentation, and Panoptic Segmentation.
+ Instance Segmentation: Instance segmentation is the process of finding and distinguishing each separate object of interest in a picture
+ Semantic Segmentation: Semantic Segmentation is the process of partitioning a digital picture into several segmented pieces [7] (sets of pixels, also known as image objects), with each pixel tagged to the relevant class displayed on the image
+ Panoptic Segmentation: Panoptic segmentation is the combination of instance and semantic segmentation
Figure 2.2 illustrates the differences between semantic segmentation, instance segmentation, and panoptic segmentation. The execution time scales with the number of classification tasks. That means that, with the same architecture, Instance Segmentation is the lightest task and Panoptic Segmentation is the heaviest task.
Figure 2 2 The difference between three types of segmentation
2.1.3 Common backbones
2.1.3.1 ResNet
ResNet (Residual Network) was first introduced to the public in 2015, and it went on to win first place in the 2015 ILSVRC competition with a top-5 error rate of only 3.57 percent [8]. It also won first place in the ILSVRC and COCO 2015 competitions for ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation. Currently, there are several ResNet architectural versions available, each with a different number of layers, such as ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-152, and so on.
A problem that occurs when building a CNN with many convolutional layers is the Vanishing Gradient phenomenon. ResNet is the solution, using a "skip" connection to cross one or more layers, as shown in Figure 2.3. ResNet is similar to other networks in that it uses convolution, pooling, activation, and fully-connected layers. The residual block utilized in the network is seen in Figure 2.3. A curved arrow appears from the beginning and ends at the end of the residual block. In other words, it adds the input X to the output of the layer; this addition counteracts zero derivatives, since X is still added.
Figure 2 3 The residual block
The ResNet structure was presented as a simpler alternative that focused on improving the flow of information through the gradients of the network. As shown in Figure 2.4 [9], the idea of the residual block solves the Vanishing Gradient problem efficiently. Following ResNet, some modifications to this architecture were introduced. Experiments demonstrate that neural networks with hundreds of layers of depth can be trained with these structures, and they rapidly became the most common architecture in Computer Vision.
Figure 2 4 Performance of the residual block tested with different numbers of layers
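As a minimal sketch (not the exact ResNet-18 block; the channel count is an assumption, and stride/downsampling are omitted), the residual block of Figure 2.3 can be expressed in PyTorch as follows.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions whose output is added to the input (skip connection)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # adding the input X counteracts vanishing gradients

x = torch.randn(1, 64, 56, 56)
assert ResidualBlock(64)(x).shape == x.shape   # the block preserves the tensor shape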
Depthwise Convolutional Strategy:
To begin, we must define Depthwise Convolution. The input tensor3D block in Figure 2.6 is divided into depth matrix slices. Then, convolution is performed on each slice, as indicated in Figure 2.6. Each 5x5x1 kernel iterates across one channel of the image, obtaining the scalar product of every group of 25 pixels and producing an 8x8x1 image. When these images are stacked, we get an 8x8x3 image; the convolution results are concatenated along the depth dimension. As a result, the output is a tensor3D block of size hꞌ×wꞌ×c.
Figure 2 6 Depthwise convolution uses 3 kernels to transform a 12×12×3 image into an 8×8×3 image
Pointwise Convolutional Strategy:
Afterward, the 12x12x3 picture has been converted to an 8x8x3 image using depthwise convolution. The number of channels in the picture must now be increased. The pointwise convolution gets its name from the fact that it uses a 1x1 kernel, which iterates across every single point. This kernel has a depth equal to the number of channels in the input picture; in our instance, there are three. To generate an 8x8x1 image, we iterate a 1x1x3 kernel through our 8x8x3 image, as shown in Figure 2.7.
Figure 2 7 Pointwise convolution transforms an image of 3 channels into an image of 1 channel
Finally, we can create 256 channels for the output. We just convolve the input with 256 kernels of size 1×1×3, as in Figure 2.8. This type of convolution greatly reduces the number of parameters in the model.
Figure 2 8 Pointwise convolution with 256 kernels, outputting an image with 256 channels
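The depthwise and pointwise steps above can be sketched with PyTorch's groups argument; the shapes follow the 12×12×3 example, and the code is illustrative rather than part of the thesis implementation.

import torch
import torch.nn as nn

x = torch.randn(1, 3, 12, 12)                     # 12x12 image with 3 channels

# Depthwise: one 5x5 kernel per input channel (groups = in_channels)
depthwise = nn.Conv2d(3, 3, kernel_size=5, groups=3)
# Pointwise: 256 kernels of size 1x1x3 to expand the channel dimension
pointwise = nn.Conv2d(3, 256, kernel_size=1)

y = depthwise(x)                                  # -> (1, 3, 8, 8)
z = pointwise(y)                                  # -> (1, 256, 8, 8)
print(y.shape, z.shape)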
2.1.4 Optimization method
In this section, optimizers are presented with the aim of pushing the performance of deep learning models. However, trading off accuracy against model size and speed is often necessary.
2.1.4.1 Gradient Descent
Gradient descent (GD) is a first-order iterative optimization process used in mathematics to determine a local minimum of a differentiable function [11]. Because this is the direction of steepest descent, the objective is to take repeated steps in the opposite direction of the gradient of the function at the current position. There are several variants of GD that optimize the training process and minimize the loss function, which are listed as follows:
Stochastic gradient descent (SGD): It is a stochastic approximation of gradient descent optimization since it substitutes the real gradient with an estimate of it
 Batch gradient descent: In Batch Gradient Descent, all of the training data is used to take a single step.
 Mini-batch gradient descent: A mini-batch is a batch with a predefined number of training samples, fewer than the full dataset.
2.1.4.2 Adam optimizer
Adam (Adaptive Moment Estimation) [12] is an optimization algorithm for gradient descent. With its modest memory requirements, the Adam optimizer is efficient when working on large problems involving huge amounts of data or parameters. The Adam optimizer is a combination of two gradient descent methodologies: momentum and RMSprop.
Figure 2 9 Optimizer algorithms development
Momentum
This method utilizes the "exponentially weighted average" of the gradients to provide a certain acceleration and make the gradient descent algorithm faster. Using averages accelerates the algorithm's convergence to the minimum.
Figure 2 10 Momentum Idea
$$\omega_{t+1} = \omega_t - \alpha m_t \qquad (2.1)$$

where

$$m_t = \beta m_{t-1} + (1 - \beta)\left[\frac{\partial L}{\partial \omega_t}\right] \qquad (2.2)$$
 RMSprop
RMSprop, often known as root mean square propagation, is an adaptive learning-rate technique that is an improved version of AdaGrad. It uses the "exponential moving average" rather than the cumulative sum of squared gradients, as in AdaGrad.
$$\omega_{t+1} = \omega_t - \frac{\alpha}{(v_t + \varepsilon)^{1/2}}\left[\frac{\partial L}{\partial \omega_t}\right] \qquad (2.3)$$

where $v_t$ is the exponential moving average of the squared gradients. Adam combines this term with the momentum term, whose first moment estimate is

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\left[\frac{\partial L}{\partial \omega_t}\right] \qquad (2.6)$$

Because both $\beta_1$ and $\beta_2$ are close to 1, the moment estimates tend towards 0 at the beginning of training. By calculating "bias-corrected" $m_t$ and $v_t$, this optimizer resolves the issue. This also manages the weights as they approach the global minimum, in order to avoid large oscillations when close to it. The formulas employed are:

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \qquad (2.7)$$

Including them in our fundamental equation yields:

$$\omega_{t+1} = \omega_t - \hat{m}_t \left(\frac{\alpha}{\sqrt{\hat{v}_t} + \varepsilon}\right) \qquad (2.8)$$
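In practice these update rules are rarely implemented by hand; with PyTorch (Section 2.5) they are available as ready-made optimizers. The sketch below is illustrative only, and the hyperparameter values are common defaults, not the training settings reported in Chapter 5.

import torch

model = torch.nn.Linear(10, 2)            # any nn.Module stands in here
criterion = torch.nn.CrossEntropyLoss()

# SGD with momentum implements equations (2.1)-(2.2);
# Adam combines momentum and RMSprop with bias correction, eqs. (2.6)-(2.8).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999), eps=1e-8)

for x, target in [(torch.randn(8, 10), torch.randint(0, 2, (8,)))]:
    optimizer.zero_grad()
    loss = criterion(model(x), target)
    loss.backward()                       # computes dL/dw for all parameters
    optimizer.step()                      # applies the update rule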
2.2 2D-LIDAR PROCESSING
2.2.1 Point cloud clustering
Figure 2 11 Adaptive Breakpoint Detection algorithm
The clustering procedure divides the raw LiDAR data into groups corresponding to real-world objects. Proposed in [13], the algorithm utilized to categorize all the 2D data points is called the Adaptive Breakpoint Detector (ABD). Traditionally, to cluster a set of point clouds containing n points, the Euclidean distance between the spatially consecutive points $P_n$ and $P_{n-1}$ is calculated. If the distance between these points is lower than a threshold $D_{max}$, they belong to the same cluster. Equation (2.9) depicts the breakpoint condition, under which two consecutive points are separated into different clusters by the predefined threshold:

$$\lVert P_n - P_{n-1} \rVert > D_{max} \qquad (2.9)$$
In fact, the raw 2D LiDAR data becomes sparser as objects get farther away. Therefore, $D_{max}$ must be changeable depending on the distance between the LiDAR and the surroundings. Adapting the threshold distance $D_{max}$ is one approach to get around this issue. Geovany [13] introduced a methodology for estimating the adaptive threshold with respect to the scanning range $r_n$, as shown in Figure 2.11. In this algorithm, a predetermined line passes through the point $P_{n-1}$ and makes an angle $\lambda$ with respect to the $(n-1)^{th}$ scan. This straight line attempts to delimit the allowable range of the next point. According to Figure 2.11, in $\Delta ABC$, the law of sines was applied to estimate $D_{max}$ as follows:
After obtaining $D_{max}$ from equation (2.11), we measure the physical distance between two consecutive points by using the law of cosines in a triangle, which is shown in equation (2.12). If the physical distance $\lVert P_n - P_{n-1} \rVert$ is greater than $D_{max}$, the two consecutive points are assigned to different clusters.

$$\lVert P_n - P_{n-1} \rVert = \sqrt{(r_n)^2 + (r_{n-1})^2 - 2 r_{n-1} r_n \cos(\Delta\phi)} \qquad (2.12)$$
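A minimal Python sketch of this clustering step is given below; the angle λ and the noise margin σ are assumed, tunable parameters, and the scan is taken as range/angle pairs ordered by angle. This illustrates the idea rather than the exact implementation used on the cart.

import math

def abd_clusters(ranges, angles, lam=math.radians(10), sigma=0.01):
    """Adaptive Breakpoint Detector: split an ordered 2D scan into clusters.

    ranges, angles: lists of r_n and phi_n from the LiDAR, ordered by angle.
    lam: the predefined lambda angle; sigma: sensor noise margin (both assumed).
    """
    clusters, current = [], [0]
    for n in range(1, len(ranges)):
        d_phi = angles[n] - angles[n - 1]
        # Adaptive threshold from the law of sines, plus a noise term
        d_max = ranges[n - 1] * math.sin(d_phi) / math.sin(lam - d_phi) + 3 * sigma
        # Physical distance between consecutive points (law of cosines, eq. 2.12)
        dist = math.sqrt(ranges[n] ** 2 + ranges[n - 1] ** 2
                         - 2 * ranges[n] * ranges[n - 1] * math.cos(d_phi))
        if dist > d_max:          # breakpoint: start a new cluster
            clusters.append(current)
            current = [n]
        else:
            current.append(n)
    clusters.append(current)
    return clusters               # lists of point indices, one list per object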
2.2.2 Feature Extraction
To extract the features of obstacles, in this work we utilized Random Sample Consensus (RANSAC) [14] to fit the straight line of the car body on the right. The RANSAC method is a learning approach that uses random sampling of observed data to estimate model parameters. RANSAC employs a voting mechanism to determine the best fitting result given a dataset with both inliers and outliers. The elements of the dataset are utilized to vote for one or more models. This voting technique is based on two assumptions: that the noisy features will not consistently vote for any particular model (few outliers), and that there are enough features to agree on a decent model (little missing data). Thanks to these properties, RANSAC can be used as a straight-line estimator in 2D space with little noise. Figure 2.12 shows the RANSAC algorithm used in line prediction based on the voting technique.
Figure 2 12 RANSAC algorithm
The RANSAC technique can be illustrated by the following algorithm [15]:
Algorithm 1: RANSAC algorithm
Input:
data – A set of observations
model – A model to explain observed data points
n – Minimum number of data points required to estimate model parameters
k – Maximum number of iterations allowed in the algorithm
t – Threshold value to determine data points that fit well with the model
d – Number of close data points required to assert that a model fits well to data
Outputs: bestFit – model parameters that best fit the data (or null if no good model is found)
Begin
iterations := 0
bestFit := null
bestErr := +infinity
while iterations < k do
maybeInliers := n randomly selected values from data
maybeModel := model parameters fitted to maybeInliers
alsoInliers := empty set
for every point in data not in maybeInliers do
if point fits maybeModel with an error smaller than t
add point to alsoInliers
end if
end for
if the number of elements in alsoInliers is > d then
betterModel := model parameters fitted to all points in maybeInliers and alsoInliers
thisErr := a measure of how well betterModel fits these points
if thisErr < bestErr then
bestFit := betterModel
bestErr := thisErr
end if
end if
increment iterations
end while
return bestFit
End
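For reference, a compact NumPy version of the same idea, specialized to fitting a 2D line to LiDAR points, might look like the sketch below; the refit step of Algorithm 1 is omitted for brevity, and the parameter values are illustrative.

import numpy as np

def ransac_line(points, k=100, t=0.05, d=20, rng=np.random.default_rng(0)):
    """Fit a line a*x + b*y + c = 0 to an Nx2 point array with RANSAC."""
    best_fit, best_err = None, np.inf
    for _ in range(k):
        p1, p2 = points[rng.choice(len(points), 2, replace=False)]
        direction = p2 - p1
        normal = np.array([-direction[1], direction[0]])   # perpendicular to the sampled pair
        norm = np.linalg.norm(normal)
        if norm == 0:
            continue
        a, b = normal / norm
        c = -(a * p1[0] + b * p1[1])
        dist = np.abs(points @ np.array([a, b]) + c)       # point-to-line distances
        inlier_dist = dist[dist < t]
        if len(inlier_dist) > d:                           # enough points voted for this model
            err = inlier_dist.mean()
            if err < best_err:
                best_fit, best_err = (a, b, c), err
    return best_fit                                        # None if no good model was found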
2.3 GPS PROCESSING
2.3.1 Kalman Filter
The Kalman Filter is a linear-Gaussian state-space model used for time-series prediction. It was first introduced by Kalman and has since been widely studied and utilized.
The Kalman filter [16] is mostly used to combine low-level data. If the system can be characterized as a linear model and the error as Gaussian noise, the recursive Kalman filter will generate optimal statistical estimates. The Kalman filter calculates the system's state as a weighted average of the predicted state and the new measurement. The weights are intended to ensure that values with lower estimated uncertainty are "trusted" more. The weights are derived from the covariance, which is a measure of the estimated uncertainty in predicting the system's state. The weighted average yields a new state estimate that falls somewhere between the predicted and observed states and has a lower estimated uncertainty than either alone. At each time step, this process is repeated, with the current estimate and its covariance informing the prediction used in the next iteration. This means that the Kalman filter works recursively, using only the most recent "best estimate" of the system's state to generate a new state rather than the complete history.
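A minimal sketch of a linear Kalman filter for smoothing 2D GPS positions is shown below, assuming a constant-velocity model; the noise covariances Q and R are placeholder values that would need tuning, and this is not the exact filter used in Chapter 4.

import numpy as np

class GPSKalmanFilter:
    """Constant-velocity Kalman filter for 2D position measurements."""
    def __init__(self, dt=0.1, q=1e-3, r=1e-2):
        self.x = np.zeros(4)                       # state: [px, py, vx, vy]
        self.P = np.eye(4)                         # state covariance
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = dt   # motion model
        self.H = np.zeros((2, 4)); self.H[0, 0] = self.H[1, 1] = 1.0
        self.Q = q * np.eye(4)                     # process noise (assumed)
        self.R = r * np.eye(2)                     # measurement noise (assumed)

    def update(self, z):
        # Predict the next state from the motion model
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Correct with the new GPS measurement z = [px, py]
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain (the "weights")
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]                          # filtered position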
2.4 PID CONTROLLER
A Proportional-Integral-Derivative (PID) controller [17], shown in Figure 2.13, is a common type of feedback mechanism in control loops. PID controllers are the most often utilized feedback controllers in industrial control systems. The "error" value is calculated by the PID controller as the difference between the measured value of the controlled variable and the intended set value. By modifying the control input, the controller reduces the error. Simply put, the three terms of PID are described as follows:
 P: the proportional term generates an adjustment signal proportional to the current error of the input at each sampling time.
 I: the integral of the error over the sampling time. Integral control generates a tuning signal so that the error is reduced to 0. It reflects the total instantaneous error over time, i.e., the accumulated past error. The smaller the integral time, the stronger the integral adjustment effect, and the smaller the remaining deviation.
 D: the derivative of the error. Derivative control creates an adjustment signal proportional to the rate of change of the input error. The greater the derivative time, the greater the differential tuning range, and hence the faster the regulator responds to input changes.
Figure 2 13 PID controller
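A discrete PID loop of the form in Figure 2.13 can be sketched as follows; the gains and sampling time are placeholders and would have to be tuned for the actual steering or speed loop.

class PID:
    """Discrete PID controller: u = Kp*e + Ki*integral(e) + Kd*de/dt."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt                   # I term accumulates past error
        derivative = (error - self.prev_error) / self.dt   # D term reacts to the error rate
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Hypothetical usage: drive the measured steering angle toward the desired angle
controller = PID(kp=1.2, ki=0.05, kd=0.1, dt=0.02)
command = controller.step(setpoint=15.0, measurement=12.5)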
2.5 PYTORCH FRAMEWORK
PyTorch is an open-source machine learning framework based on the Torch library that is used for applications such as computer vision and natural language processing [18]. PyTorch is developed under an open-source license, hence it has a wide community. Though Google's TensorFlow is a well-established ML/DL framework with a devoted following, PyTorch has established a stronghold due to its dynamic graph approach and flexible debugging strategy [19]. The tensor is the primary data structure in PyTorch. A tensor, like a NumPy array, is a multi-dimensional array with elements of the same data type.
PyTorch packages many modules to support the training process, as shown in Figure 2.14. These modules help the user conduct training more easily on different hardware platforms.
Figure 2 14 Function and operation of PyTorch
Table 2.1 lists some of the most-used modules for different phases, such as loading datasets, configuring models, and so on.
Table 2 1 Common modules of PyTorch
torch.nn – Defines basic blocks such as layers in models
torch.optim – Defines optimizers for deep learning / machine learning
torch.utils – Commonly used for loading datasets with different loaders
torch.autograd – PyTorch's automatic differentiation engine that powers neural network training
torch.Tensor.backward – Computes the gradient of the current tensor
We can also convert PyTorch-trained models into formats like ONNX, allowing them to be used in other "deep learning frameworks" like MXNet, CNTK, and Caffe2. ONNX models can in turn be converted to TensorFlow or TensorRT format.
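As a brief, hedged example of this conversion path, exporting a PyTorch model to ONNX (which TensorRT can then parse) typically looks like the sketch below; the ResNet-18 stand-in model and the input shape are assumptions for illustration, and a recent torchvision version is assumed.

import torch
import torchvision

model = torchvision.models.resnet18(weights=None).eval()   # stand-in model
dummy_input = torch.randn(1, 3, 224, 224)                  # assumed input shape

torch.onnx.export(
    model, dummy_input, "resnet18.onnx",
    input_names=["input"], output_names=["output"],
    opset_version=11,
)
# The resulting .onnx file can be converted to a TensorRT engine,
# e.g. with NVIDIA's trtexec tool: trtexec --onnx=resnet18.onnx --fp16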
2.6 TENSOR-RT PATTERN
TensorRT is an NVIDIA library designed to increase inference performance and minimize latency on NVIDIA graphics hardware (GPUs) [20]. It can increase inference speed by 2–4 times compared to real-time services and by up to 30 times compared to CPU-only performance. TensorRT uses five optimization methods to improve inference speed, which are shown in Figure 2.15.
Figure 2 15 TensorRT mechanism
Precision calibration
The parameters and activations trained in FP32 (32-bit floating point) precision are converted to FP16 or INT8 precision. This optimization decreases latency and boosts inference speed, albeit at the price of slightly decreasing model accuracy. In real-time recognition, a trade-off between accuracy and inference speed is sometimes essential.
 Dynamic Tensor Memory
Memory is allocated for each tensor only for the time that it is used, which minimizes the memory footprint and encourages memory re-use.
 Layer and Tensor Fusion
TensorRT merges nodes vertically, horizontally, or both ways to minimize GPU memory usage and bandwidth.
 Kernel Auto-Tuning
Several kernels dedicated to optimization are evaluated during the model optimization process, and the best-performing ones for the target GPU are selected.
 Multi-Stream Execution
Enables concurrent processing of multiple input streams.
2.7 OPEN STREET MAP
OpenStreetMap (OSM) [21] is an online world map created by many volunteer amateurs in the spirit of Wikipedia. The map is released under a free license to encourage users to contribute to the global map data. OSM is widely used because of the following advantages:
- The global map data is available online and saved in vector form
- Runs in many programs, such as JOSM
- Easy to change the data format
Figure 2 16 OpenStreetMap Interface
2.8 MULTITHREADING
In a computer system, a program needs at least one thread to execute, which is called the main thread. The word "multithreading" [22] describes a program execution that uses more than one thread. In the real world, a computer usually has many CPU cores, and the threads of a program are executed back and forth on those cores by the scheduler (a part of the operating system).
Figure 2 17 Multithreading for the application
The advantages of multithreading are full utilization of CPU resources and an improved user experience.
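A minimal sketch of the pattern used in this project (concurrent threads that exchange results over UDP, as described in the Abstract) is shown below; the port number, message format, and the placeholder obstacle distance are assumptions for illustration.

import socket
import threading

PORT = 5005  # placeholder port

def lidar_worker(stop_event):
    """Hypothetical worker: would process LiDAR scans and publish results over UDP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while not stop_event.is_set():
        nearest_obstacle_m = 1.8                       # placeholder for a real computation
        sock.sendto(str(nearest_obstacle_m).encode(), ("127.0.0.1", PORT))
        stop_event.wait(0.1)                           # roughly 10 Hz

# The main thread plays the role of the fusion/control program: it listens on the port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", PORT))

stop = threading.Event()
threading.Thread(target=lidar_worker, args=(stop,), daemon=True).start()

for _ in range(5):                                      # read a few messages, then stop
    data, _ = receiver.recvfrom(1024)
    print("obstacle distance:", data.decode())
stop.set()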
CHAPTER 3 HARDWARE PLATFORM
3.1 OVERALL SYSTEM
Figure 3 1 The overall hardware platform
This section presents the functions of each hardware component and the relations between them. Overall, the whole hardware system is illustrated in Figure 3.1. The functions of each component are described as follows:
- Inputs: There are mainly 4 input devices: a camera, a 2D LiDAR, a GPS module, and an encoder. These devices collect input from the surrounding environment.
- Processors: Two main processors are utilized: our laptop and an NVIDIA Jetson TX2 board. These processors receive data from the peripherals, process it, and send signals to the microcontrollers.
- Microcontrollers: There are two microcontrollers on the PCB. One is for controlling the steering angle and the other is for controlling the speed.
- Actuators: The parts that execute commands from the microcontrollers.
3.2 DETAILS
3.2.1 Ezgo Golf Cart
Figure 3 2 The EZGO golf cart
Table 3 1 Specifications of the EZGO golf cart
Function and feature:
- The golf cart is the model used for implementation
- Dimensions of the real car
- Has no Electronic Power Steering (EPS) support
3.2.2 Ezi Servo and Driver
Figure 3 3 Ezi servo and driver
Table 3 2 Specifications of Ezi servo and driver
Function and feature:
- Supplies pulses for the steering servo
- Protects the servo in cases such as overpower, circuit errors, overcurrent, and so on
- Receives pulses from the microcontroller and controls the servo
Specifications:
- Control method: Closed-loop control with a 23-bit MCU
- Current consumption: Max 500 mA (excluding motor current)
- Max input pulse:
- I/O signals:
  + Input signals: Position command pulse, Servo On/Off, Alarm Reset (photocoupler input)
  + Output signals: In-Position, Alarm (photocoupler output); Encoder signal (A+, A-, B+, B-, Z+, Z-, 26C31 or equivalent) (line driver output); Brake
3.2.3 Absolute Encoder
Figure 3 4 Absolute 8-bit encoder
Table 3 3 Specifications of absolute encoder
Function and feature:
- Used for measuring the steering angle directly
- The resolution of the encoder is 256 pulses
3.2.4 Devo7 and RX 601
Figure 3 5 The receiver (left) and Devo7 transmitter (right)
Table 3 4 Specifications of Devo7 transmitter
Function and feature:
- Used for the manual control mode; the Devo7 is a transmitter with seven channels
- The maximum range can reach 80 m