CONTENTS
THESIS TASKS i
SCHEDULE ii
ASSURANCE STATEMENT iv
ACKNOWLEDGEMENT v
ADVISOR’S COMMENT SHEET vi
REVIEWER’S COMMENT SHEET vii
CONTENTS viii
ABBREVIATIONS AND ACRONYMS x
LIST OF FIGURES xii
LIST OF TABLES xv
ABSTRACT xvi
CHAPTER 1: OVERVIEW 1
1.1 INTRODUCTION 1
1.2 BACKGROUND AND RELATED WORK 2
1.2.1 Overview of Autonomous Cars 2
1.2.2 Literature Review and Other Study 3
1.3 OBJECTIVES OF THE THESIS 7
1.4 RESEARCH OBJECT AND SCOPE 7
1.5 RESEARCH METHOD 8
1.6 THE CONTENT OF THE THESIS 9
CHAPTER 2: THE PRINCIPLE OF SELF – DRIVING CARS 10
2.1 INTRODUCTION OF SELF – DRIVING CARS 10
2.2 DIFFERENT TECHNOLOGIES USED IN SELF-DRIVING CARS 11
2.2.1 Laser 11
2.2.2 Lidar 12
2.2.3 Radar 15
2.2.4 GPS 16
2.2.5 Camera 16
2.2.6 Ultrasonic Sensors 17
2.3 OVERVIEW OF ARTIFICIAL INTELLIGENCE 18
2.3.1 Artificial Intelligence 18
2.3.2 Machine Learning 19
2.3.3 Deep Learning 21
CHAPTER 3: CONVOLUTIONAL NEURAL NETWORK 24
3.1 INTRODUCTION 24
3.2 STRUCTURE OF CONVOLUTIONAL NEURAL NETWORKS 24
3.2.1 Convolution Layer 25
3.2.2 Activation function 27
3.2.3 Stride and Padding 28
3.2.4 Pooling Layer 29
3.2.5 Fully-Connected layer 30
3.3 NETWORK ARCHITECTURE AND PARAMETER OPTIMIZATION 31
3.4 OBJECT DETECTION 32
3.4.1 Single Shot Detection framework 32
3.4.2 MobileNet Architecture 34
3.4.3 Non-Maximum Suppression 38
3.5 OPTIMIZE NEURAL NETWORKS 39
3.5.1 Types of Gradient Descent 39
3.5.2 Types of Optimizer 40
CHAPTER 4: HARDWARE DESIGN OF SELF-DRIVING CAR PROTOTYPE 43
4.1 HARDWARE COMPONENTS 43
4.1.1 1/10 Scale 4WD Off Road Remote Control Car Buggy Desert 43
4.1.2 Brushed Motor RC-540PH 44
4.1.3 Motor control module BTS7960 45
4.1.4 RC Servo MG996 47
4.1.5 Raspberry Pi 3 Model B+ 47
4.1.6 NVIDIA Jetson Nano Developer Kit 50
4.1.7 Camera Logitech C270 53
4.1.8 Arduino Nano 54
4.1.9 Lipo Battery 2S-30C 2200mAh 56
4.1.10 Voltage reduction module 57
4.1.11 USB UART PL2303 59
4.2 HARDWARE WIRING DIAGRAM 59
4.2.1 Construct The Hardware Platform 60
4.2.2 PCB Of Hardware 61
CHAPTER 5: CONTROL ALGORITHMS OF SELF-DRIVING CAR PROTOTYPE 62
5.1 CONTROL THEORY 62
5.1.1 Servo Control Theory 62
5.1.2 UART Communication Theory 64
5.2 FLOWCHART OF COLLECTING TRAINING DATA 67
5.3 FLOWCHART OF NAVIGATING THE CAR USING TRAINED MODEL 68
CHAPTER 6: EXPERIMENTS 69
6.1 EXPERIMENTAL ENVIRONMENTS 69
6.2 COLLECT DATA 70
6.3 DATA AUGMENTATIONS 71
6.4 TRAINING PROCESS 72
6.5 OUTDOOR EXPERIMENTS RESULTS 75
CHAPTER 7: CONCLUSION AND FUTURE WORK 77
REFERENCES 79
ABBREVIATIONS AND ACRONYMS
ADAS : Advanced Driver Assistance System
CCD : Charge Coupled Device
CMOS : Complementary Metal-Oxide Semiconductor
CNN : Convolutional Neural Network
DSP : Digital Signal Processing
FOV : Field of View
FPGA : Field-Programmable Gate Array
GPS : Global Positioning System
GPIO : General Purpose Input-Output
GPU : Graphics Processing Unit
IMU : Inertial Measurement Unit
LIDAR : Light Detection And Ranging
PAS : Parking Assistance System
PCB : Printed Circuit Board
PWM : Pulse Width Modulation
RADAR : Radio Detection And Ranging
SSD : Single Shot Detection
YOLO : You Only Look Once
SGD : Stochastic Gradient Descent
RMSProp : Root Mean Square Propagation
NAG : Nesterov Accelerated Gradient
Adagrad : Adaptive Gradient Algorithm
Adam : Adaptive Moment Estimation
LIST OF FIGURES
Figure 1.1 Google’s fully Self-Driving Car design introduced in May 2014 3
Figure 1.2 Features of Google Self-Driving Car 4
Figure 1.3 Uber Self-driving Car 5
Figure 1.4 Self-driving Car of HCMUTE 7
Figure 2.1 How Cars are getting smarter 10
Figure 2.2 Important components of a self-driving car 11
Figure 2.3 A laser sensor on the roof constantly scans the surroundings 12
Figure 2.4 The map was drawn by LIDAR 13
Figure 2.5 Structure and functionality of LIDAR 14
Figure 2.6 Comparison between LIDAR and RADAR 16
Figure 2.7 Camera on Autonomous Car 17
Figure 2.8 Placement of ultrasonic sensors for PAS 17
Figure 2.9 Ultrasonic Sensor on Autonomous Car 18
Figure 2.10 Relation Between AI, Machine Learning, and Deep Learning 20
Figure 2.11 Neural networks, which are organized in layers consisting of a set of interconnected nodes. Networks can have tens or hundreds of hidden layers 22
Figure 2.12 A simple Neural Network or a Perceptron 22
Figure 2.13 Comparison of performance between DL and other learning algorithms 23
Figure 3.1 CNN architecture 24
Figure 3.2 The input data, filter and result of a convolution layer 25
Figure 3.3 The convolutional operation of a CNN. I is an input array, K is a kernel, I*K is the output of the convolution operation 26
Figure 3.4 The result of a convolution operation (a) is input image 26
Figure 3.5 Perform multiple convolutions on an input 27
Figure 3.6 The convolution operation for each filter 27
Figure 3.7 Apply zero Padding for input matrix 29
Figure 3.8 The max pooling operation 29
Figure 3.9 The deep Neural Networks to classify multiple classes 30
Figure 3.10 Network architecture 31
Figure 3.11 Classifying multiple classes in an image (a) Image with generated boxes (b) Default boxes with 8x8 feature map (c) Default boxes with 4x4 feature map 32
Figure 3.12 The standard convolutional filters in (a) are replaced by two layers: depthwise convolution in (b) and pointwise convolution in (c) to build a depthwise separable filter 36
Figure 3.13 The left side is a convolutional layer with batchnorm and ReLU. The right side is a depthwise separable convolution with depthwise and pointwise layers followed by batchnorm and ReLU 37
Figure 3.14 Detected Images before applying Non-max suppression (a) Left traffic sign, (b) Car object 38
Figure 3.15 Detected Images after applying Non-max suppression (a) Left traffic
sign, (b) Car object 38
Figure 4.1 Block diagram of RC self-driving car platform 43
Figure 4.2 1/10 Scale 4WD Off Road Remote Control Car Desert Buggy 43
Figure 4.3 Brushed Motor RC-540PH 45
Figure 4.4 Outline drawing of Brushed motor RC-540 45
Figure 4.5 Motor control module BTS7960 46
Figure 4.6 Digital RC Servo FR-1501MG 47
Figure 4.7 Raspberry Pi 3 Model B+ 47
Figure 4.8 Raspberry Pi block function 49
Figure 4.9 Raspberry Pi 3 B+ Pinout 49
Figure 4.10 Location of connectors and main ICs on Raspberry Pi 3 50
Figure 4.11 NVIDIA Jetson Nano Developer Kit 50
Figure 4.12 Jetson Nano compute module with 260-pin edge connector 51
Figure 4.13 NVIDIA Jetson Nano Pinout 52
Figure 4.14 Performance of various deep learning inference networks with Jetson Nano and TensorRT, using FP16 precision and batch size 1 53
Figure 4.15 Camera Logitech C270 53
Figure 4.16 Arduino Nano 54
Figure 4.17 Arduino Nano Pinout 56
Figure 4.18 Lipo Battery 2S-30C 2200mAh 56
Figure 4.19 LM2596S Dual USB Type 57
Figure 4.20 DC XH-M404 XL4016E1 8A 58
Figure 4.21 USB UART PL2303 59
Figure 4.22 Hardware Wiring Diagram 59
Figure 4.23 Hardware Platform (a) Front, (b) Side 60
Figure 4.24 Circuit diagram 61
Figure 4.25 Layout of the PCB 61
Figure 5.1 Inside of RC Servo 63
Figure 5.2 Variable Pulse width control servo position 63
Figure 5.3 UART Wiring 64
Figure 5.4 UART receives data in parallel from the data bus 65
Figure 5.5 Data Frame of Transmitting UART 66
Figure 5.6 Transmitting and Receiving UART 66
Figure 5.7 Data Frame of Receiving UART 66
Figure 5.8 Converts the serial data back into parallel 66
Figure 5.9 Flowchart of collect image training 67
Figure 5.10 Flowchart of Navigating the Car using Trained Model 68
Figure 6.1 The overall oval-shaped lined track 69
Figure 6.2 Lined track and traffic signs recognition 69
Figure 6.3 Traffic signs (a) Left, (b) Right, (c) Stop 70
Figure 6.4 Some typical images of the Dataset 70
Figure 6.5 Horizontal Flip (a) Original image, (b) Horizontally flipped image 71
Figure 6.6 Brightness Augmentation (a) Original image, (b) Brighter image and
(c) Darker image 72
Figure 6.7 GUI of Training App 72
Figure 6.8 Model is under training 73
Figure 6.9 The visualization output of convolutional layers (a) is an originally
selected frame.(b), (c), (d), (e) and (f) are the feature maps at first five
convolutional layers 74
Figure 6.10 Change in loss value throughout training 74
Figure 6.11 Change in accuracy value throughout training 75
Figure 6.12 Experimental Results: The top is images and the bottom is outputs after
passing through the Softmax function of the model (a) Steer is 100, (b) Steer is 110, (c) Steer is
120, (d) Steer is 130, (e) Steer is 140, (f) Steer is 150, (g) Steer is 160 75
Figure 6.13 The actual and predicted steering wheel angles of the models 76
Figure 6.14 The outputs of the object detection model 76
LIST OF TABLES
Table 3.1 MobileNet body Architecture 37
Table 4.1 Specification of Brushed motor RC-540 45
Table 4.2 Specifications of Raspberry Pi 3 Model B+ 48
Table 4.3 Specifications of NVIDIA Jetson Nano 51
Table 4.4 Technical specifications of Arduino Nano 55
as output. The experimental results demonstrate the effectiveness and robustness of the autopilot model in the lane keeping task and traffic sign detection. The speed is about 5-6 km/h in a wide variety of driving conditions, regardless of whether lane markings are present or not.
Keywords: Computer Vision, Real-time Navigation, Self-driving Car, Object Detection Model, Classification Model
CHAPTER 1: OVERVIEW
1.1 INTRODUCTION
In recent years, technology companies have been discussing autonomous cars and trucks. Promises of life-changing safety and ease have been hung on these vehicles. Now some of these promises are beginning to come to fruition, as cars with more and more autonomous features hit the market each year.
Although autonomous cars are most likely still years away from being available to consumers, they are closer than many people think. Current estimates predict that by 2025 the world will see over 600,000 self-driving cars on the road, and by 2035 that number will jump to almost 21 million. Trials of self-driving car services have actually begun in some cities in the United States. And even though fully self-driving cars are not on the market yet, current technology already allows vehicles to be more autonomous than ever before, using intricate systems of cameras, lasers, radar, GPS, and interconnected communication between vehicles.
Since their introduction in the early 1990s, Convolutional Neural Networks (CNNs) [1] have been the most popular deep learning architecture due to the effectiveness of conv-nets on image-related problems such as handwriting recognition, facial recognition and cancer cell classification. The breakthrough of CNNs is that features are learned automatically from training examples. Although their primary disadvantage is the requirement of very large amounts of training data, recent studies have shown that excellent performance can be achieved with networks trained using "generic" data. For the last few years, CNNs have achieved state-of-the-art performance in most of the important classification tasks [2], object detection tasks [3] and Generative Adversarial Networks [4].
Besides, with the increase in computational capacities, we are presently able to train complex neural networks to understand the vehicle's environment and decide its behavior. For example, the Tesla Model S was known to use a specialized chip (Mobileye EyeQ), which used a deep neural network for vision-based real-time obstacle detection and avoidance. More recently, researchers are investigating DNN-based end-to-end control of cars and other robots [5]. Executing a CNN on an embedded computing platform has several challenges. First, despite the large amount of computation, strict real-time requirements are demanded. For instance, latency in a vision-based object detection task may be directly linked to the safety of the vehicle. On the other hand, the computing hardware platform must also satisfy cost, size, weight, and power constraints. These two conflicting requirements complicate the platform selection process, as observed in related work. There are already several relatively low-cost RC-car based prototypes, such as MIT's RaceCar [6] and UPenn's F1/10 racecar.
Encouraged by these positive results, in this thesis we develop a real-time end-to-end deep learning based RC-car platform. In terms of hardware, it includes a Raspberry Pi 3 Model B+ quad-core computer, a Jetson Nano Developer Kit, two Logitech cameras, an Arduino Nano and a 1/10 scale RC car. The research target is to train two models, a vision-oriented end-to-end model and an object detection model, to autonomously navigate the car in real time in outdoor environments with a wide variety of driving conditions.
1.2 BACKGROUND AND RELATED WORK
1.2.1 Overview of Autonomous Cars
An autonomous car is a car that can drive itself; it combines sensors and software to control, navigate, and drive the vehicle. Currently, there are no legally operating, fully-autonomous vehicles in the United States. However, there are partially-autonomous vehicles: cars and trucks with varying amounts of automation, from conventional cars with brake and lane assistance to highly-independent, self-driving prototypes. Though still in its infancy, self-driving technology is becoming increasingly common and could radically transform our transportation system (and by extension, our economy and society). Based on automaker and technology company estimates, level 4 self-driving cars could be for sale in the next several years (see the callout box for details on autonomy levels). Various self-driving technologies have been developed by Google, Uber, Tesla, Nissan, and other major automakers, researchers, and technology companies.
While design details vary, most self-driving systems create and maintain an internal map of their surroundings, based on a wide array of sensors, like radar. Uber's self-driving prototypes use sixty-four laser beams, along with other sensors, to construct their internal map; Google's prototypes have, at various stages, used lasers, radar, high-powered cameras, and sonar. Software then processes those inputs, plots a path, and sends instructions to the vehicle's "actuators," which control acceleration, braking, and steering. Hard-coded rules, obstacle avoidance algorithms, predictive modeling, and "smart" object discrimination (e.g., knowing the difference between a bicycle and a motorcycle) help the software follow traffic rules and navigate obstacles. Partially-autonomous vehicles may require a human driver to intervene if the system encounters uncertainty; fully-autonomous vehicles may not even offer a steering wheel. Self-driving cars can be further distinguished as being "connected" or not, indicating whether they can communicate with other vehicles and/or infrastructure, such as next-generation traffic lights. Most prototypes do not currently have this capability.
1.2.2 Literature Review and Other Study
1.2.2.1 Google Self-driving Car Project
Although many companies are racing to be the ones to bring a fully autonomous, commercially viable vehicle to the market, including Lyft, Ford, Uber, Honda, Toyota, Tesla and many others, it is Waymo, the autonomous vehicle division of Alphabet, Google's parent company, that has been the first to reach many milestones along the journey.
On November 7, 2017, the Waymo team announced:
“Starting now, Waymo's fully self-driving vehicles—our safest, most advanced vehicles on the road today—are test-driving on public roads, without anyone in the driver's seat.”
These vehicles have been on the roads of Chandler, AZ, a suburb of Phoenix, since mid-October without a safety driver behind the wheel, although until further notice there is a Waymo employee in the back seat. Waymo vehicles are equipped with powerful sensors that provide them with 360-degree views of the world, something a human behind the wheel never gets. There are short-range lasers and those that can see up to 300 meters away.
These vehicles don't have free rein to drive wherever they want quite yet; they are "geofenced" within a 100-square-mile area. As the cars collect more data and acquire more driving experience, that area will expand. Waymo has an Early Rider program that allows those who might be interested in using the autonomous vehicles to transport them around town to apply.
Figure 1.1 Google’s fully Self-Driving Car design introduced in May 2014
Figure 1.2 Features of Google Self-Driving Car
Technology: Google's robotic cars have about $150,000 in equipment, as shown in Figure 1.2, including a LIDAR system that itself costs $70,000. The Velodyne 64-beam laser range finder mounted on top allows the vehicle to generate a detailed 3D map of its environment. The car takes these generated maps and combines them with high-resolution maps of the world, producing different types of data models that are then used for driving itself. Some of these computations are performed on remote computer farms, in addition to on-board systems.
Limitations: As of August 28, 2014, the latest prototype had not been tested in heavy rain or snow due to safety concerns. The car still relies primarily on pre-programmed route data; it does not obey temporary traffic signals and, in some situations, reverts to a slower "extra cautious" mode in complex unmapped intersections. The lidar technology cannot spot some potholes or discern when humans, such as a police officer, are signaling the car to stop. However, Google is having these issues fixed by 2020.
1.2.2.2 Uber Self-driving Car Project
Uber thought it would have 75,000 autonomous vehicles on the roads this year and be operating driverless taxi services in 13 cities by 2022, according to court documents unsealed last week. To reach those ambitious goals, the ridesharing company, which hopes to go public later this year, was spending $20 million a month on developing self-driving technologies.
The figures, dating back to 2016, paint a picture of a company desperate to meet over-ambitious autonomy targets and one that is willing to spend freely, even recklessly, to get there. As Uber prepares for its IPO later this year, the new details could prove an embarrassing reminder that the company is still trailing in its efforts to develop technology that founder Travis Kalanick called "existential" to Uber's future.
Figure 1.3 Uber Self-driving Car
The report was written for Uber as part of last year's patent and trade secret theft lawsuit with rival Waymo, which accused engineer Anthony Levandowski of taking technical secrets with him when he left Google to found self-driving truck startup Otto. Uber acquired Otto in 2016. Uber hired Walter Bratic, the author of the report, as an expert witness to question Waymo's valuation of the economic damages it had suffered, a whopping $1.85 billion. Bratic's report capped the cost to independently develop Waymo's purported trade secrets at $605,000.
Waymo eventually settled for 0.34 percent of Uber's equity, which could be worth around $300 million after an IPO if a recent $90 billion valuation of the company is accurate. Bratic's report provides details of internal analyses and reports, codenamed Project Rubicon, that Uber carried out during 2016. A presentation in January that year projected that driverless cars could become profitable for Uber in 2018, while a May report said Uber might have 13,000 self-driving taxis by 2019. Just four months later, that estimate had jumped to 75,000 vehicles.
The current head of Uber's self-driving technologies, Eric Meyhofer, testified that Uber's original estimates of having tens of thousands of AVs in a dozen cities by 2022 were "highly speculative" "assumptions and estimates." Although Meyhofer declined to provide any other numbers, he did say, "They probably ran a lot of scenarios beyond 13 cities. Maybe they assumed two in another scenario, or one, or three hundred. It's a set of knobs you turn to try to understand parameters that you need to try to meet."
One specific goal, set by John Bares, the engineer then in charge of Uber's autonomous vehicles, was for Uber to be able to forgo human safety drivers by 2020. The company's engineers seemed certain that acquiring Otto and Levandowski would supercharge its progress.
1.2.2.3 Embedded Computing Platforms for Real-Time Inferencing
Real-time embedded systems, such as an autonomous vehicle, present unique challenges for deep learning, as the computing platforms of such systems must satisfy two often conflicting goals [7]: the platform must provide enough computing capacity for real-time processing of computationally expensive AI workloads (deep neural networks), and the platform must also satisfy various constraints such as cost, size, weight, and power consumption limits.
Accelerating AI workloads, especially inferencing operations, has received a lot of attention from academia and industry in recent years, as applications of deep learning are broadening to areas of real-time embedded systems such as autonomous vehicles. These efforts include the development of various heterogeneous architecture-based systems-on-a-chip (SoCs) that may include multiple cores, a GPU, a DSP, an FPGA, and neural-network-optimized ASIC hardware. Consolidating multiple tasks on SoCs with many shared hardware resources while guaranteeing real-time performance is also an active research area, which is orthogonal to improving raw performance. Consolidation is necessary for efficiency, but unmanaged interference can nullify its benefits. For these reasons, finding a good computing platform is a non-trivial task, one that requires a deep understanding of the workloads and the hardware platform being utilized.
1.2.2.4 Real-Time Self-Driving Car Navigation Using Deep Neural Network (Paper)
This paper was published at the 4th International Conference on Green Technology and Sustainable Development (GTSD 2018).
This paper presented a monocular vision-based self-driving car prototype using a Deep Neural Network. First, the CNN model parameters were trained using data collected from a vehicle platform built with a 1/10 scale RC car, a Raspberry Pi 3 Model B computer and a front-facing camera. The training data were road images paired with the time-synchronized steering angle generated by manually driving. Second, the trained model was road-tested on the Raspberry Pi to drive itself in the outdoor environment around oval-shaped and 8-shaped lined tracks with traffic signs.
Figure 1.4 Self-driving Car of HCMUTE
The experimental results demonstrate the effectiveness and robustness of the autopilot model in the lane keeping task. The vehicle's top speed is about 5-6 km/h in a wide variety of driving conditions, regardless of whether lane markings are present or not [8]. In this paper, a classification model was used and the training accuracy was 92.38%.
1.3 OBJECTIVES OF THE THESIS
The thesis concentrated on several key goals:
Research will be conducted into the theory of an autonomous vehicle and the various considerations that may be necessary during the construction of a self-driving car prototype.
Researching and using suitable and compatible components such as sensors, microcontrollers, motors, drivers, power supplies and communication. In detail, this thesis will mainly focus on using the Raspberry Pi 3 Model B+, NVIDIA Jetson Nano Kit, Camera Module, Arduino Nano and some drivers.
Using a Convolutional Neural Network that directly maps raw input images to a predicted steering angle and detects objects, namely traffic signs (Left, Right, Stop) and the "Car" object, as output.
The target is that the car can drive itself in real time in the outdoor environment on the map and obey the traffic signs.
1.4 RESEARCH OBJECT AND SCOPE
In the scope of this thesis, a real-time end-to-end deep learning based RC-car platform was developed using a 1/10 scale RC car chassis, Raspberry Pi 3 Model B+, NVIDIA Jetson Nano Kit, Logitech camera and Arduino Nano. A Convolutional Neural Network (CNN) directly maps raw input images to a predicted steering angle as output. The trained model is road-tested on the Raspberry Pi and Jetson Nano to drive the car in real time in the outdoor environment around the map with a traffic-sign lined track. Then the effectiveness and robustness of the autopilot model in the lane keeping task is evaluated based on the experimental results.
1.5 RESEARCH METHOD
The research project was divided into chapters, each a sequential step in the process of developing and building the self-driving car prototype. This approach was utilized in an attempt to progress the project from one task to the next as it was undertaken. Each task is defined so that it builds on the previous one, thus evolving the robot within the goals and requirements generated. This ultimately led to the completion of the car that met the objectives within the timeframe available.
The first step is to define the key points and objectives to deeply understand what a real-time end-to-end deep learning based self-driving car and a convolutional neural network actually are. Besides that, it is critical to determine plans for conducting research and performing the suitable design and programming.
The second step is to refer to previous projects. It is useful to have appropriate approaches, and this established the foundations for making an informed decision based on previous experiences to avoid mistakes and obsolete designs.
The third step, in the theoretical method, was to apply the knowledge studied to the control system. This approach provides valuable data on the likely stability capability of the control system by finding the suitable parameters over many days.
The next step is to design and manufacture the PCB, chassis, drive shaft, etc., and assemble all hardware components. In parallel, to make a good system, the following step is to analyze the performance of the system. This also provides the chance to calibrate and perform additional changes to make the final system. The practical method is to directly design the mechanism and PCB. This approach allows for testing on the real system and making the final assessment from the response of the self-driving car prototype.
The trained model is road-tested on the Raspberry Pi to drive itself in real time in the outdoor environment around oval-shaped and 8-shaped lined tracks with traffic signs. Then the effectiveness and robustness of the autopilot model in the lane keeping task is evaluated based on the experimental results.
1.6 THE CONTENT OF THE THESIS
The thesis “Research, Design and Construct Real–Time Self–Driving Car using Deep Neural Network” includes the following chapters:
Chapter I: Overview: This chapter provides a brief overview of the requirements of the report, including the introduction, goals, scope and content of the thesis.
Chapter II: The principle of self-driving cars: This chapter provides the basic knowledge behind this thesis, such as the principles of self-driving cars, artificial intelligence, machine learning, deep learning and convolutional neural networks.
Chapter III: Convolutional Neural Network: This chapter gives the knowledge about Convolutional Neural Networks and the structure of a CNN.
Chapter IV: Hardware Design of Self-driving car Prototype: This chapter provides the car's design and hardware selection. After that, the car platform is constructed based on the design.
Chapter V: Control Algorithms of Self-driving car Prototype: This chapter provides the algorithms and diagrams of the software.
Chapter VI: Experiment: This chapter shows the experimental results of this thesis.
Chapter VII: Conclusion and Future Work: This chapter provides the conclusion in terms of the advantages and limitations of this thesis. It also summarizes the contributions and proposes ideas and orientations for future work.
CHAPTER 2: THE PRINCIPLE OF SELF – DRIVING CARS
In this section, the overview of self-driving cars and related technologies is presented. This includes the introduction of autonomous cars, the different technologies used in self-driving cars, an overview of Artificial Intelligence, Machine Learning and Deep Learning, and the basics of the Convolutional Neural Network used to navigate the car prototype in this thesis.
2.1 INTRODUCTION OF SELF – DRIVING CARS
A self-driving car (driverless, autonomous, robotic car) is a vehicle that is capable of sensing its environment and navigating without human input. Self-driving cars can detect environments using a variety of techniques such as radar, GPS and computer vision. Advanced control systems interpret sensory information to identify appropriate navigational paths, as well as obstacles and relevant signage. Self-driving cars have control systems that are capable of analyzing sensory data to distinguish between different cars on the road. This is very useful in planning a path to the desired destination. Autonomous car technology aims to achieve:
- The benefits of technology by processing large amounts of data and using it to make intelligent decisions.
- The human ability to adapt to known or unknown environments.
Autonomy still implies personal ownership. Looking into the future, some believe that steering wheels will disappear completely and the vehicle will do all the driving using the same system of sensors, radar, and GPS mapping that today's driverless cars employ. This will be something that is ultimately up to the self-driving car companies currently building the future of this technology. The answer for how cars are getting smarter is illustrated in Figure 2.1.
Figure 2.1 How Cars are getting smarter
Figure 2.2 Important components of a self-driving car
2.2 DIFFERENT TECHNOLOGIES USED IN SELF-DRIVING CARS
Self-driving autonomous cars use various automotive technologies to provide an effortless mode of transportation. Providing this type of transportation requires a harmonious synchronization of advanced sensors gathering information about the surrounding environment, sophisticated algorithms processing that data and controlling the vehicles, and computational power processing it all in real time. The software can recognize objects, people, cars, road markings, signs and traffic lights, obeying the rules of the road and allowing for multiple unpredictable hazards, including cyclists. It can even detect road works and safely navigate around them. Figure 2.2 shows the important components of a self-driving vehicle. The list of parts and their functionalities will be discussed in this section.
2.2.1 Laser
A laser is a device that emits light through a process of optical amplification based on the stimulated emission of electromagnetic radiation. The term "laser" originated as an acronym for light amplification by stimulated emission of radiation. A new system, developed by researchers at the University of California, Berkeley, can remotely sense objects across distances as long as 30 feet, 10 times farther than what could be done with comparable current low-power laser systems. With further development, the technology could be used to make smaller, cheaper 3D imaging systems that offer exceptional range for use in self-driving cars.
Figure 2.3 A laser sensor on the roof constantly scans the surroundings
2.2.2 Lidar
LIDAR, which stands for Light Detection and Ranging, is a remote sensing method that uses light in the form of a pulsed laser to measure ranges (variable distances) to the Earth. These light pulses, combined with other data recorded by the airborne system, generate precise, three-dimensional information about the shape of the Earth and its surface characteristics.
A LIDAR instrument principally consists of a laser, a scanner, and a specialized GPS receiver. Airplanes and helicopters are the most commonly used platforms for acquiring LIDAR data over broad areas. Two types of LIDAR are topographic and bathymetric. Topographic LIDAR typically uses a near-infrared laser to map the land, while bathymetric lidar uses water-penetrating green light to also measure seafloor and riverbed elevations. LIDAR systems allow scientists and mapping professionals to examine both natural and manmade environments with accuracy, precision, and flexibility.
Lidar uses ultraviolet, visible or near-infrared light to image objects. It can target a wide range of materials, including nonmetallic objects, rocks, rain, chemical compounds, aerosols, clouds and even single molecules. A narrow laser beam can map physical features with very high resolution.
NASA has identified lidar as a key technology for enabling autonomous precision safe landing of future robotic and crewed lunar-landing vehicles.
Figure 2.4 The map was drawn by LIDAR
Wavelengths of lidar vary from 10 micrometers to approximately 250 nm (UV). The backscattering property of light is the key to its functionality. Different types of scattering are used for different lidar applications: most commonly Rayleigh scattering, Mie scattering, Raman scattering, and fluorescence. Suitable combinations of wavelengths can allow for remote mapping of atmospheric contents by identifying wavelength-dependent changes in the intensity of the returned signal.
The structure and functionality of LIDAR is shown in Figure 2.5. In general, there are two kinds of lidar detection schemes:
- Incoherent or direct energy detection: Principally it is an amplitude measurement.
- Coherent detection: Coherent systems generally use optical heterodyne detection, which, being more sensitive than direct detection, allows them to operate at a much lower power, but at the expense of more complex transceiver requirements. This is best for Doppler, or phase sensitive, measurements.
In both kinds of lidar, there are two types of pulse models:
- Micropulse lidar systems: Micropulse systems have developed as a result of the ever-increasing amount of computer power available combined with advances in laser technology. They use considerably less energy in the laser, typically on the order of one microjoule, and are often "eye-safe," meaning they can be used without safety precautions.
- High energy systems: High-power systems are common in atmospheric research, where they are widely used for measuring many atmospheric parameters: the height, layering and densities of clouds, cloud particle properties, temperature, pressure, wind, humidity, and trace gas concentration.
Figure 2.5 Structure and functionality of LIDAR
There are several major components of a lidar system:
- Laser: 600-1000 nm lasers are most common for non-scientific applications. They are inexpensive, but since they can be focused and easily absorbed by the eye, the maximum power is limited by the need to make them eye-safe. Eye-safety is often a requirement for most applications. A common alternative, 1550 nm lasers, are eye-safe at much higher power levels since this wavelength is not focused by the eye, but the detector technology is less advanced, and so these wavelengths are generally used at longer ranges and lower accuracies. They are also used for military applications, as 1550 nm is not visible in night vision goggles, unlike the shorter 1000 nm infrared laser. Airborne topographic mapping lidars generally use 1064 nm diode-pumped YAG lasers, while bathymetric systems generally use 532 nm frequency-doubled diode-pumped YAG lasers, because 532 nm penetrates water with much less attenuation than 1064 nm does. Laser settings include the laser repetition rate, which controls the data collection speed. Pulse length is generally an attribute of the laser cavity length, the number of passes required through the gain material (YAG, YLF, etc.), and Q-switch speed.
- Scanner and optics: How fast images can be developed is also affected by the speed at which they are scanned. There are several options to scan the azimuth and elevation, including dual oscillating plane mirrors, a combination with a polygon mirror, and a dual-axis scanner (see laser scanning). Optic choices affect the angular resolution and the range that can be detected. A hole mirror or a beam splitter are options to collect a return signal.
- Photodetector and receiver electronics: Two main photodetector technologies are used in lidars: solid-state photodetectors, such as silicon avalanche photodiodes, and photomultipliers. The sensitivity of the receiver is another parameter that has to be balanced in a lidar design.
- Position and navigation systems: Lidar sensors that are mounted on mobile platforms such as airplanes or satellites require instrumentation to determine the absolute position and orientation of the sensor. Such devices generally include a Global Positioning System receiver and an Inertial Measurement Unit (IMU).
- 3D Imaging: 3D imaging can be achieved using both scanning and non-scanning systems. "3D gated viewing laser radar" is a non-scanning laser ranging system that applies a pulsed laser and a fast gated camera. Imaging lidar can also be performed using arrays of high-speed detectors and modulation-sensitive detector arrays, typically built on single chips using CMOS and hybrid CMOS/CCD fabrication techniques. In these devices each pixel performs some local processing, such as demodulation or gating at high speed, downconverting the signals to video rate so that the array may be read like a camera. Using this technique many thousands of pixels/channels may be acquired simultaneously. High-resolution 3D lidar cameras use homodyne detection with an electronic CCD or CMOS shutter.
A coherent imaging lidar uses synthetic array heterodyne detection to enable a staring single-element receiver to act as though it were an imaging array.
There are a wide variety of applications for lidar in agriculture, archaeology, autonomous vehicles, biology and conservation, geology and soil science, atmospheric remote sensing and meteorology, law enforcement, military, mining, physics and astronomy, robotics, spaceflight, surveying, transport, wind farm optimization, solar photovoltaic deployment optimization and much more.
2.2.3 Radar
The RADAR system works in much the same way as LiDAR, with the only difference being that it uses radio waves instead of laser light. In a RADAR instrument, the antenna doubles up as a radar receiver as well as a transmitter. However, radio waves are absorbed less than light waves when contacting objects. Thus, they can work over a relatively long distance. The most well-known use of RADAR technology is for military purposes. Airplanes and battleships are often equipped with RADAR to measure altitude and detect other transport devices and objects in the vicinity.
The RADAR system, on the other hand, is relatively less expensive. Cost is one of the reasons why Tesla has chosen this technology over LiDAR. It also works equally well in all weather conditions such as fog, rain, snow, and dust. However, it is less angularly accurate than LiDAR, as it loses sight of the target vehicle on curves. It may get confused if multiple objects are placed very close to each other. For example, it may consider two small cars in the vicinity as one large vehicle and send a wrong proximity signal. Unlike the LiDAR system, RADAR can determine relative traffic speed or the velocity of a moving object accurately using the Doppler frequency shift.
Figure 2.6 Comparison between LIDAR and RADAR
2.2.4 GPS
GPS is a constellation of satellites that provides a user with an accurate position on the surface of the earth. This satellite-based navigation system was developed by the U.S. Department of Defense (DoD) in the early 1970s. It was first intended for military use, but later it was made available to civilian users. GPS can provide precise position and time information to a user anywhere in the world. It is a one-way system, i.e., a user can only receive signals but cannot send signals to the satellite. This type of configuration is needed for security reasons as well as to serve an unlimited number of users. Cars with these systems or with on-board navigation systems have global positioning (GPS) systems installed. GPS uses satellites to assess the exact location of a car. Using GPS-based techniques alone, a lot of the challenges in the development of autonomous cars can be overcome.
2.2.5 Camera
Cameras are already commonplace on modern cars. Since 2018, all new vehicles in the US have been required to fit reversing cameras as standard. Any car with a lane departure warning system (LDW) will use a front-facing camera to detect painted markings on the road. Autonomous vehicles are no different. Almost all development vehicles today feature some sort of visible light camera for detecting road markings; many feature multiple cameras for building a 360-degree view of the vehicle's environment. Cameras are very good at detecting and recognizing objects, so the image data they produce can be fed to AI-based algorithms for object classification.
Figure 2.7 Camera on Autonomous Car
Some companies, such as Mobileye, rely on cameras for almost all of their sensing. However, cameras are not without their drawbacks. Just like your own eyes, visible light cameras have limited capabilities in conditions of low visibility. Additionally, using multiple cameras generates a lot of video data to process, which requires substantial computing hardware.
Beyond visible light cameras, there are also infrared cameras, which offer superior performance in darkness and additional sensing capabilities.
2.2.6 Ultrasonic Sensors
Figure 2.8 Placement of ultrasonic sensors for PAS
Figure 2.9 Ultrasonic Sensor on Autonomous Car
In an ADAS system, ultrasonic sensors play an important role in the parking of vehicles, avoiding obstacles in blind spots, and detecting pedestrians. One of the companies providing ultrasound sensors for ADAS systems is Murata. They provide ultrasonic sensors with a range of up to 10 meters, which are optimum for a Parking Assistance System (PAS). Figure 2.8 and Figure 2.9 show where ultrasonic sensors are placed on a car.
2.3 OVERVIEW OF ARTIFICIAL INTELLIGENCE
2.3.1 Artificial Intelligence
Artificial intelligence (AI) is a way of making a computer, a computer-controlled robot, or software think intelligently, in a manner similar to the way intelligent humans think. AI is accomplished by studying how the human brain thinks, and how humans learn, decide, and work while trying to solve a problem, and then using the outcomes of this study as a basis for developing intelligent software and systems.
Artificial intelligence is a science and technology based on disciplines such as Computer Science, Biology, Psychology, Linguistics, Mathematics, and Engineering. A major thrust of AI is the development of computer functions associated with human intelligence, such as reasoning, learning, and problem solving.
To further explain the goals of Artificial Intelligence, researchers extended this primary goal into the following main areas:
- Planning, Scheduling and Optimization: Enable a computer to set goals and achieve them. Such systems need a way to visualize the future (a representation of the state of the world), be able to make predictions about how their actions will change it, and be able to make choices that maximize the utility of the available choices.
- Natural language processing: Gives machines the ability to read and understand human language. A sufficiently powerful natural language processing system would enable natural-language user interfaces and the acquisition of knowledge directly from human-written sources, such as newswire texts.
- Speech processing: The study of speech signals and their processing methods. The signals are usually processed in a digital representation, so speech processing can be regarded as a special case of digital signal processing, applied to speech signals. Aspects of speech processing include the acquisition, manipulation, storage, transfer and output of speech signals. The input side is called speech recognition and the output side is called speech synthesis.
- Machine Learning: A fundamental concept of AI research since the field's inception, it is the study of computer algorithms that improve automatically through experience.
- Robotics: Advanced robotic arms and other industrial robots, widely used in modern factories, can learn from experience how to move efficiently despite the presence of friction and gear slippage.
- Vision: A field of study that seeks to develop techniques to help computers "see" and understand the content of digital images such as photographs and videos.
- Expert systems: A computer system that emulates the decision-making ability of a human expert. Expert systems are designed to solve complex problems by reasoning through bodies of knowledge, represented mainly as if-then rules rather than through conventional procedural code.
2.3.2 Machine Learning
Machine Learning (ML) is a subset of Artificial Intelligence. ML is the scientific study of algorithms and statistical models that computer systems use to effectively perform a specific task without using explicit instructions, relying on patterns and inference instead. Machine learning algorithms build a mathematical model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task. Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is infeasible to develop an algorithm of specific instructions for performing the task. Machine Learning is closely related to computational statistics, which focuses on making predictions using computers. The study of mathematical optimization delivers methods, theory and application domains to the field of machine learning.
Every machine learning algorithm has three components:
- Representation: how to represent knowledge. Examples include decision trees, sets of rules, instances, graphical models, neural networks, support vector machines, model ensembles and others.
- Evaluation: the way to evaluate candidate programs (hypotheses). Examples include accuracy, precision and recall, squared error, likelihood, posterior probability, cost, margin, entropy, K-L divergence and others.
- Optimization: the way candidate programs are generated, known as the search process. Examples include combinatorial optimization, convex optimization and constrained optimization.
There are four types of machine learning:
- Supervised Learning: The learning algorithm falls under this category if the desired output for the network is also provided with the input while training the network. By providing the neural network with both an input and output pair, it is possible to calculate an error based on its target output and actual output. It can then use that error to make corrections to the network by updating its weights (a minimal sketch of this update appears after this list).
- Unsupervised Learning: In this paradigm, the neural network is only given a set of inputs, and it is the neural network's responsibility to find some kind of pattern within the inputs provided, without any external aid. This type of learning paradigm is often used in data mining and is also used by many recommendation algorithms due to their ability to predict a user's preferences based on the preferences of other similar users it has grouped together.
- Semi-supervised Learning: Training data includes a few desired outputs.
- Reinforcement Learning: Reinforcement learning is similar to supervised learning in that some feedback is given; however, instead of providing a target output, a reward is given based on how well the system performed. The aim of reinforcement learning is to maximize the reward the system receives through trial and error. This paradigm relates strongly to how learning works in nature; for example, an animal might remember the actions it previously took which helped it to find food (the reward).
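The error-driven weight update described for supervised learning above can be illustrated with a minimal sketch. This is not code from the thesis; it assumes a single linear neuron trained with a squared-error loss, purely to show how the difference between the target output and the actual output drives the weight corrections.

```python
import numpy as np

# Toy supervised learning: one linear neuron, squared-error loss.
# x holds the inputs, t the desired (target) outputs provided with the training data.
x = np.array([[0.5, 1.0], [1.0, 0.2], [0.1, 0.9]])
t = np.array([1.0, 0.4, 0.8])

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.1          # learning rate

for epoch in range(100):
    y = x @ w + b            # actual output of the network
    error = y - t            # difference between actual and target output
    # Gradient of the mean squared error with respect to w and b
    w -= lr * (x.T @ error) / len(t)
    b -= lr * error.mean()

print(w, b)                  # weights corrected to reduce the error
```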
Supervised learning is the most mature and most studied type of learning, and it is the type used by most machine learning algorithms. Learning with supervision is much easier than learning without supervision. Inductive learning is where we are given examples of a function in the form of data (x) and the output of the function (f(x)); the goal of inductive learning is to learn the function for new data (x):
- Classification: when the function being learned is discrete.
- Regression: when the function being learned is continuous.
- Probability Estimation: when the output of the function is a probability.
2.3.3 Deep Learning
Deep learning is a machine learning technique that teaches computers to do what comes naturally to humans: learn by example. Deep learning is a key technology behind driverless cars, enabling them to recognize a stop sign or to distinguish a pedestrian from a lamppost. It is the key to voice control in consumer devices like phones, tablets, TVs, and hands-free speakers. Deep learning is getting lots of attention lately, and for good reason: it is achieving results that were not possible before.
Figure 2.11 Neural networks, which are organized in layers consisting of a set of interconnected nodes. Networks can have tens or hundreds of hidden layers
In deep learning, a computer model learns to perform classification tasks directly from images, text, or sound. Deep learning models can achieve state-of-the-art accuracy, sometimes exceeding human-level performance. Models are trained by using a large set of labeled data and neural network architectures that contain many layers. Most deep learning methods use neural network architectures, which is why deep learning models are often referred to as deep neural networks. The term "deep" usually refers to the number of hidden layers in the neural network. Traditional neural networks only contain 2-3 hidden layers, while deep networks can have as many as 150.
A Neural Network is a concept at the heart of Deep Learning, which can be thought of as similar to a function in a programming language. The simplest unit of a Neural Network, a Perceptron, takes in inputs (like the parameters of a function), runs them through a process (the function steps), and finally provides a response (the output of the function).
Figure 2.12 A simple Neural Network or a Perceptron
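As a concrete illustration of the perceptron just described, the short sketch below (not taken from the thesis; the weights and inputs are arbitrary example values) computes the response of a single perceptron: a weighted sum of the inputs plus a bias, passed through a step activation.

```python
import numpy as np

def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, followed by a step activation."""
    z = np.dot(weights, inputs) + bias
    return 1 if z > 0 else 0

x = np.array([0.7, 0.3])            # inputs (the "parameters" of the function)
w = np.array([0.6, -0.4])           # weights of the perceptron
print(perceptron(x, w, bias=0.1))   # response (the output of the function)
```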
The Neural Networks generate solutions that can classify data based on the factors that affect the classification. Also, Perceptrons are just an encoding of our solutions in a graphical format, depicted similarly to the neurons in the brain.
Figure 2.13 Comparison of performance between DL and other learning algorithms
CHAPTER 3: CONVOLUTIONAL NEURAL NETWORK
3.1 INTRODUCTION
A Convolutional Neural Network (CNN) is a Deep Learning algorithm which can take an input image, assign importance (learnable weights and biases) to various aspects or objects in the image, and differentiate one from the other. The pre-processing required in a CNN is much lower compared to other classification algorithms. While in primitive methods filters are hand-engineered, with enough training a CNN has the ability to learn these filters or characteristics. The breakthrough of CNNs is that features are learned automatically from training examples.
CNNs evaluate inputs through convolutions: the input is convolved with a filter. This convolution leads the network to detect edges and lower-level features in earlier layers and more complex features in deeper layers of the network. CNNs are used in combination with pooling layers, and they often have fully connected layers at the end, as you can see in the picture below. Run forward propagation as you would in a vanilla neural network and minimize the loss function through backpropagation to train the CNN.
3.2 STRUCTURE OF CONVOLUTIONAL NEURAL NETWORKS
All CNN models follow a similar architecture. A simple CNN is a sequence of layers, and every layer of a CNN transforms one volume of activations to another through a differentiable function. We use three main types of layers to build a CNN architecture: Convolution Layer, Pooling Layer and Fully-Connected Layer (exactly as seen in regular Neural Networks). These layers are also called Hidden Layers. We stack these layers to form a full CNN architecture.
Figure 3.1 CNN architecture
3.2.1 Convolution Layer
The Convolution Layer performs an operation called a "convolution". A convolution is a linear operation on two functions that produces a third function expressing how the shape of one is modified by the other. The two input functions are the input data and a correlation kernel array (called a filter). Both of them are combined to produce an output (the third function) through a cross-correlation operation.
The filter is smaller than the input data. It is first positioned at the top-left corner of the input array. Then, it slides across the input array, both from left to right and top to bottom. Every time the filter slides to a position on the input array, the input subarray contained in that window and the kernel array are multiplied element-wise and the resulting array is summed up, yielding a single scalar value. As the filter is applied multiple times to the input array, the result is a two-dimensional array of output values that represents a filtering of the input. As such, the two-dimensional output array from this operation is called a "feature map".
Using a filter that is smaller than the input data is intentional, as it allows the same filter (set of weights) to be multiplied by the input array multiple times at different points on the input. Specifically, the filter is applied systematically to each overlapping, filter-sized patch of the input data, from left to right and from top to bottom. If the filter is designed to detect a specific type of feature in the input, then applying that filter systematically across the entire input image gives the filter an opportunity to discover that feature anywhere in the image. This capability is commonly referred to as translation invariance: the general interest is in whether the feature is present rather than where it is present.
Figure 3.2 The input data, filter and result of a convolution layer
Figure 3.3 The convolutional operation of a CNN. I is an input array, K is a kernel, and I*K is the output of the convolution operation
Figure 3.4 The result of a convolution operation. (a) is the input image, (b) is the feature map of the image after a convolution layer
The mathematical formula of the convolution operation is illustrated in Equation 3.1:

$(I * K)_{xy} = \sum_{i=1}^{h} \sum_{j=1}^{w} K_{ij} \cdot I_{x+i-1,\,y+j-1}$ (3.1)

where I is an input image and K is a filter of size h x w.
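For clarity, a minimal NumPy sketch of Equation 3.1 is given below. It is not the thesis code; it simply slides a filter K over an input I with stride 1 and no padding (the cross-correlation described above) and returns the resulting feature map.

```python
import numpy as np

def conv2d(I, K):
    """Cross-correlation of input I with kernel K (stride 1, no padding), as in Equation 3.1."""
    h, w = K.shape
    out_h = I.shape[0] - h + 1
    out_w = I.shape[1] - w + 1
    out = np.zeros((out_h, out_w))
    for x in range(out_h):
        for y in range(out_w):
            out[x, y] = np.sum(K * I[x:x + h, y:y + w])
    return out

I = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 input
K = np.array([[1.0, 0.0], [0.0, -1.0]])        # toy 2x2 filter
print(conv2d(I, K))                            # 4x4 feature map
```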
The output size of the feature map after passing the input data through the convolution operation is illustrated in Equation 3.2.
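For reference, the standard formula for the output size of a convolution, which is presumably what Equation 3.2 states, is O = (N - F + 2P)/S + 1, where N is the input size, F is the filter size, P is the padding and S is the stride.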
Figure 3.5 Perform multiple convolutions on an input
Figure 3.6 The convolution operation for each filter
3.2.2 Activation function
Activation functions are mathematical equations that determine the output of a neural network. The function is attached to each neuron in the network and determines whether it should be activated ("fired") or not, based on whether each neuron's input is relevant for the model's prediction. They help remove abnormal noise and turn a linear network into a non-linear one. Activation functions also help normalize the output of each neuron to a range between 0 and 1, between -1 and 1, or some other range.
After every convolutional layer, we usually apply an activation function. If no activation function is applied, the problem is that the Neural Network would behave just like a single perceptron, because the sum of all the layers would still be a linear network, meaning the output could be calculated as a linear combination of the outputs. A linear equation is easy to solve, but it is limited in its complexity and has less power to learn complex functional mappings from data. A Neural Network without an activation function would simply be a linear regression model, which has limited power and does not perform well most of the time. We want our neural network to not only learn and compute a linear function but also something more complicated than that; thus, a non-linear activation function helps with that.
The most popular activation functions are Sigmoid, Tanh and ReLU (a short code sketch of these functions follows the list below):
- Sigmoid: The Sigmoid activation function has the form f(x) = 1 / (1 + exp(-x)). Its range is between 0 and 1 and it is an S-shaped curve. It is easy to understand and apply, but the major reasons it has fallen out of popularity are the vanishing gradient problem, the fact that its output is not zero-centered (it lies between 0 and 1), which makes optimization harder, and that it takes time to converge.
- Tanh: Tanh's mathematical form is f(x) = (1 - exp(-2x)) / (1 + exp(-2x)). Its output is zero-centered (between -1 and 1), hence optimization is easier, but it still suffers from the vanishing gradient problem.
- ReLU (Rectified Linear Unit): ReLU has become very popular in the past couple of years. It was recently shown to give a six-times improvement in convergence over the Tanh function. It is simply f(x) = max(0, x), i.e., if x < 0, f(x) = 0 and if x >= 0, f(x) = x. Hence, as seen from its mathematical form, this function is very simple and efficient. A lot of times in machine learning and computer science we notice that the most simple and consistent techniques and methods are preferred and work best. ReLU avoids and rectifies the vanishing gradient problem, and almost all deep learning models use ReLU nowadays. Its limitation is that it should only be used within the hidden layers of a neural network model.
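A minimal sketch of these three activation functions (standard definitions, not code from the thesis) is shown below.

```python
import numpy as np

def sigmoid(x):
    # Output squashed into (0, 1); not zero-centered
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Output squashed into (-1, 1); zero-centered
    return (1.0 - np.exp(-2 * x)) / (1.0 + np.exp(-2 * x))

def relu(x):
    # f(x) = max(0, x): zero for negative inputs, identity for positive inputs
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x), tanh(x), relu(x))
```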
3.2.3 Stride and Padding
Stride is the number of pixels the filter shifts over the input matrix. If the stride is one, we move the filter by one pixel at a time, from left to right and top to bottom. If the stride is two, we move the filter by two pixels. There are two types of results of the operation: one in which the convolved feature is reduced in dimensionality compared to the input, and the other in which the dimensionality is either increased or remains the same.
If the edge of the input array includes some necessary information, we should apply Same Padding, which keeps the size of the output array equal to or bigger than the size of the input array. On the other hand, we should apply Valid Padding to remove the unnecessary information on the edge of the input array. Valid Padding means no padding.
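To make the effect of stride and padding concrete, the short sketch below (illustrative only) applies the standard output-size formula to a 28x28 input with a 3x3 filter, comparing valid padding (no padding) with same padding and different strides.

```python
def conv_output_size(n, f, p, s):
    """Output size for input size n, filter size f, padding p and stride s."""
    return (n - f + 2 * p) // s + 1

n, f = 28, 3
print(conv_output_size(n, f, p=0, s=1))  # valid padding, stride 1 -> 26
print(conv_output_size(n, f, p=1, s=1))  # same padding, stride 1 -> 28
print(conv_output_size(n, f, p=0, s=2))  # valid padding, stride 2 -> 13
```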
Figure 3.7 Apply zero Padding for input matrix
3.2.4 Pooling Layer
Figure 3.8 The max pooling operation
Similar to the Convolution Layer, the Pooling Layer is responsible for reducing the spatial size of the convolved feature. This decreases the computational power required to process the data through dimensionality reduction. Dimensionality reduction helps decrease the computed volume and extract low-level features from neighborhood pixels. Furthermore, it is useful for extracting dominant features which are rotationally and positionally invariant, thus maintaining effective training of the model.
There are two types of Pooling: Max Pooling and Average Pooling. Max Pooling returns the maximum value from the portion of the image covered by the kernel. On the other hand, Average Pooling returns the average of all the values from the portion of the image covered by the kernel.
Max Pooling extracts the most important features, like edges. On the other hand, Average Pooling extracts features more smoothly. Thus, the choice between Max Pooling and Average Pooling depends on the type of dataset.
In CNN architectures, pooling is typically performed with 2x2 windows, a stride of 2 and no padding.
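The 2x2, stride-2 max pooling described above can be sketched in a few lines of NumPy (illustrative only; any odd edge of the feature map is simply cropped):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 and no padding; x is a 2-D feature map."""
    h, w = x.shape
    x = x[:h - h % 2, :w - w % 2]                       # crop odd edges
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1.0, 3.0, 2.0, 4.0],
                 [5.0, 6.0, 1.0, 0.0],
                 [7.0, 2.0, 9.0, 8.0],
                 [3.0, 4.0, 6.0, 5.0]])
print(max_pool_2x2(fmap))   # [[6. 4.] [7. 9.]]
```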
3.2.5 Fully-Connected layer
Adding a Fully-Connected Layer is a cheap way of learning non-linear combinations of the high-level features represented by the output of the convolutional layers. The Fully-Connected Layer learns a possibly non-linear function in that space. To classify images, we must convert the feature map into a suitable form for a Multi-Level Perceptron; thus, we flatten the feature map into a column vector. The flattened output is fed to a feed-forward neural network, and backpropagation is applied at every training iteration. Over a series of epochs, the model becomes able to distinguish between dominating and certain low-level features in images and classify them using the Softmax function or the Sigmoid function. When we want to classify multiple classes (more than two), we must use the Softmax function. On the other hand, we just use the Sigmoid function to classify two classes.
Figure 3.9 The deep Neural Networks to classify multiple classes
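A brief sketch of the two output functions mentioned above (standard definitions, not thesis code): softmax turns a vector of class scores into probabilities that sum to one for the multi-class case, while sigmoid maps a single score to a probability for the two-class case.

```python
import numpy as np

def softmax(scores):
    # Subtract the max for numerical stability, then normalize to probabilities
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

def sigmoid(score):
    # Two-class case: probability of the positive class
    return 1.0 / (1.0 + np.exp(-score))

print(softmax(np.array([2.0, 1.0, 0.1])))  # approx. [0.66 0.24 0.10]
print(sigmoid(0.8))                        # approx. 0.69
```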
3.3 NETWORK ARCHITECTURE AND PARAMETER OPTIMIZATION
The network architecture comprises 9 layers, including 5 convolutional layers and 4 fully connected ones. The input image is 66x200x3 (height x width x depth). The convolutional layers were designed to perform feature extraction and were chosen empirically through a series of experiments with varied layer configurations. The first three convolutional layers use a 7x7 kernel size and a stride of 2x2. The respective depths of the layers are 8, 16, 32 and 64 to push the network deeper. The local features continue to be processed in the last two convolutional layers with a kernel size of 3x3 and a depth of 64. After the convolutional layers, the output is flattened and then followed by a series of fully connected layers of gradually decreasing sizes: 100, 50, 20 and 7. All hidden layers are equipped with the rectified linear unit (ReLU) to improve convergence. From the feature vectors, we apply softmax to calculate the steering wheel angle probability. The whole network has roughly 194,173 parameters and offers good training performance on modest hardware.
Figure 3.10 Network architecture
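The description above leaves some ambiguity about the exact depths and padding of the five convolutional layers, so the Keras sketch below is only one plausible reading, not the thesis's actual code: it assumes conv depths of 8, 16 and 32 for the three 7x7 stride-2 layers and 64 for the two 3x3 layers (with same padding), followed by fully connected layers of 100, 50 and 20 units and a 7-way softmax output for the discretized steering angles. The exact parameter count of this sketch will differ from the reported 194,173.

```python
# Hypothetical sketch of the described architecture; layer depths and padding are assumptions.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(66, 200, 3)),                          # height x width x depth
    layers.Conv2D(8, (7, 7), strides=2, activation="relu"),
    layers.Conv2D(16, (7, 7), strides=2, activation="relu"),
    layers.Conv2D(32, (7, 7), strides=2, activation="relu"),
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.Flatten(),
    layers.Dense(100, activation="relu"),
    layers.Dense(50, activation="relu"),
    layers.Dense(20, activation="relu"),
    layers.Dense(7, activation="softmax"),                     # steering angle classes
])
model.summary()
```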