CONTENTS
THESIS TASKS i
SCHEDULE ii
ASSURANCE STATEMENT iv
ACKNOWLEDGEMENT v
ADVISOR’S COMMENT SHEET vi
REVIEWER’S COMMENT SHEET vii
CONTENTS viii
ABBREVIATIONS AND ACRONYMS x
LIST OF FIGURES xii
LIST OF TABLES xv
ABSTRACT xvi
CHAPTER 1: OVERVIEW 1
1.1 INTRODUCTION 1
1.2 BACKGROUND AND RELATED WORK 2
1.2.1 Overview of Autonomous Cars 2
1.2.2 Literature Review and Other Study 3
1.3 OBJECTIVES OF THE THESIS 7
1.4 RESEARCH OBJECT AND SCOPE 7
1.5 RESEARCH METHOD 8
1.6 THE CONTENT OF THE THESIS 9
CHAPTER 2: THE PRINCIPLE OF SELF – DRIVING CARS 10
2.1 INTRODUCTION OF SELF – DRIVING CARS 10
2.2 DIFFERENT TECHNOLOGIES USED IN SELF-DRIVING CARS 11
2.2.1 Laser 11
2.2.2 Lidar 12
2.2.3 Radar 15
2.2.4 GPS 16
2.2.5 Camera 16
2.2.6 Ultrasonic Sensors 17
2.3 OVERVIEW OF ARTIFICIAL INTELLIGENCE 18
2.3.1 Artificial Intelligence 18
2.3.2 Machine Learning 19
2.3.3 Deep Learning 21
CHAPTER 3: CONVOLUTIONAL NEURAL NETWORK 24
3.1 INTRODUCTION 24
3.2 STRUCTURE OF CONVOLUTIONAL NEURAL NETWORKS 24
3.2.1 Convolution Layer 25
3.2.2 Activation function 27
3.2.3 Stride and Padding 28
3.2.4 Pooling Layer 29
3.2.5 Fully-Connected layer 30
3.3 NETWORK ARCHITECTURE AND PARAMETER OPTIMIZATION 31
3.4 OBJECT DETECTION 32
3.4.1 Single Shot Detection framework 32
3.4.2 MobileNet Architecture 34
3.4.3 Non-Maximum Suppression 38
3.5 OPTIMIZE NEURAL NETWORKS 39
3.5.1 Types of Gradient Descent 39
3.5.2 Types of Optimizer 40
CHAPTER 4: HARDWARE DESIGN OF SELF-DRIVING CAR PROTOTYPE 43
4.1 HARDWARE COMPONENTS 43
4.1.1 1/10 Scale 4WD Off Road Remote Control Car Buggy Desert 43
4.1.2 Brushed Motor RC-540PH 44
4.1.3 Motor control module BTS7960 45
4.1.4 RC Servo MG996 47
4.1.5 Raspberry Pi 3 Model B+ 47
4.1.6 NVIDIA Jetson Nano Developer Kit 50
4.1.7 Camera Logitech C270 53
4.1.8 Arduino Nano 54
4.1.9 Lipo Battery 2S-30C 2200mAh 56
4.1.10 Voltage reduction module 57
4.1.11 USB UART PL2303 59
4.2 HARDWARE WIRING DIAGRAM 59
4.2.1 Construct The Hardware Platform 60
4.2.2 PCB Of Hardware 61
CHAPTER 5: CONTROL ALGORITHMS OF SELF-DRIVING CAR PROTOTYPE 62
5.1 CONTROL THEORY 62
5.1.1 Servo Control Theory 62
5.1.2 UART Communication Theory 64
5.2 FLOWCHART OF COLLECTING TRAINING DATA 67
5.3 FLOWCHART OF NAVIGATING THE CAR USING TRAINED MODEL 68
CHAPTER 6: EXPERIMENTS 69
6.1 EXPERIMENTAL ENVIRONMENTS 69
6.2 COLLECT DATA 70
6.3 DATA AUGMENTATIONS 71
6.4 TRAINING PROCESS 72
6.5 OUTDOOR EXPERIMENTS RESULTS 75
CHAPTER 7: CONCLUSION AND FUTURE WORK 77
REFERENCES 79
ABBREVIATIONS AND ACRONYMS
ADAS : Advanced Driver Assistance System
CCD : Charge Coupled Device
CMOS : Complementary Metal-Oxide Semiconductor
CNN : Convolutional Neural Network
DSP : Digital Signal Processing
FOV : Field of View
FPGA : Field-Programmable Gate Array
GPS : Global Positioning System
GPIO : General Purpose Input-Output
GPU : Graphics Processing Unit
IMU : Inertial Measurement Unit
LIDAR : Light Detection And Ranging
PAS : Parking Assistance System
PCB : Printed Circuit Board
PWM : Pulse Width Modulation
RADAR : Radio Detection And Ranging
SSD : Single Shot Detection
YOLO : You Only Look Once
SGD : Stochastic Gradient Descent
RMSProp : Root Mean Square Propagation
NAG : Nesterov Accelerated Gradient
Adagrad : Adaptive Gradient Algorithm
Adam : Adaptive Moment Estimation
LIST OF FIGURES
Figure 1.1 Google’s fully Self-Driving Car design introduced in May 2014 3
Figure 1.2 Features of Google Self-Driving Car 4
Figure 1.3 Uber Self-driving Car 5
Figure 1.4 Self-driving Car of HCMUTE 7
Figure 2.1 How Cars are getting smarter 10
Figure 2.2 Important components of a self-driving car 11
Figure 2.3 A laser sensor on the roof constantly scans the surroundings 12
Figure 2.4 The map was drawn by LIDAR 13
Figure 2.5 Structure and functionality of LIDAR 14
Figure 2.6 Comparison between LIDAR and RADAR 16
Figure 2.7 Camera on Autonomous Car 17
Figure 2.8 Placement of ultrasonic sensors for PAS 17
Figure 2.9 Ultrasonic Sensor on Autonomous Car 18
Figure 2.10 Relation Between AI, Machine Learning, and Deep Learning 20
Figure 2.11 Neural networks, which are organized in layers consisting of a set of interconnected nodes. Networks can have tens or hundreds of hidden layers 22
Figure 2.12 A simple Neural Network or a Perceptron 22
Figure 2.13 Comparison of performance between DL and other learning algorithms 23
Figure 3.1 CNN architecture 24
Figure 3.2 The input data, filter and result of a convolution layer 25
Figure 3.3 The convolutional operation of a CNN. I is an input array, K is a kernel, I*K is the output of the convolution operation 26
Figure 3.4 The result of a convolution operation (a) is input image 26
Figure 3.5 Perform multiple convolutions on an input 27
Figure 3.6 The convolution operation for each filter 27
Figure 3.7 Apply zero Padding for input matrix 29
Figure 3.8 The max pooling operation 29
Figure 3.9 The deep Neural Networks to classify multiple classes 30
Figure 3.10 Network architecture 31
Figure 3.11 Classifying multiple classes in an image (a) Image with generated boxes (b) Default boxes with 8x8 feature map (c) Default boxes with 4x4 feature map 32
Figure 3.12 The standard convolutional filters in (a) are replaced by two layers: depthwise convolution in (b) and pointwise convolution in (c) to build a depthwise separable filter 36
Figure 3.13 The left side is a convolutional layer with batchnorm and ReLU. The right side is a depthwise separable convolution with depthwise and pointwise layers followed by batchnorm and ReLU 37
Figure 3.14 Detected Images before applying Non-max suppression (a) Left traffic sign, (b) Car object 38
Figure 3.15 Detected Images after applying Non-max suppression (a) Left traffic
sign, (b) Car object 38
Figure 4.1 Block diagram of RC self-driving car platform 43
Figure 4.2 1/10 Scale 4WD Off Road Remote Control Car Desert Buggy 43
Figure 4.3 Brushed Motor RC-540PH 45
Figure 4.4 Outline drawing of Brushed motor RC-540 45
Figure 4.5 Motor control module BTS7960 46
Figure 4.6 Digital RC Servo FR-1501MG 47
Figure 4.7 Raspberry Pi 3 Model B+ 47
Figure 4.8 Raspberry Pi block function 49
Figure 4.9 Raspberry Pi 3 B+ Pinout 49
Figure 4.10 Location of connectors and main ICs on Raspberry Pi 3 50
Figure 4.11 NVIDIA Jetson Nano Developer Kit 50
Figure 4.12 Jetson Nano compute module with 260-pin edge connector 51
Figure 4.13 NVIDIA Jetson Nano Pinout 52
Figure 4.14 Performance of various deep learning inference networks with Jetson Nano and TensorRT, using FP16 precision and batch size 1 53
Figure 4.15 Camera Logitech C270 53
Figure 4.16 Arduino Nano 54
Figure 4.17 Arduino Nano Pinout 56
Figure 4.18 Lipo Battery 2S-30C 2200mAh 56
Figure 4.19 LM2596S Dual USB Type 57
Figure 4.20 DC XH-M404 XL4016E1 8A 58
Figure 4.21 USB UART PL2303 59
Figure 4.22 Hardware Wiring Diagram 59
Figure 4.23 Hardware Platform (a) Front, (b) Side 60
Figure 4.24 Circuit diagram 61
Figure 4.25 Layout of the PCB 61
Figure 5.1 Inside of RC Servo 63
Figure 5.2 Variable Pulse width control servo position 63
Figure 5.3 UART Wiring 64
Figure 5.4 UART receives data in parallel from the data bus 65
Figure 5.5 Data Frame of Transmitting UART 66
Figure 5.6 Transmitting and Receiving UART 66
Figure 5.7 Data Frame of Receiving UART 66
Figure 5.8 Converts the serial data back into parallel 66
Figure 5.9 Flowchart of collect image training 67
Figure 5.10 Flowchart of Navigating the Car using Trained Model 68
Figure 6.1 The overall oval-shaped lined track 69
Figure 6.2 Lined track and traffic signs recognition 69
Figure 6.3 Traffic signs (a) Left, (b) Right, (c) Stop 70
Figure 6.4 Some typical images of the Dataset 70
Figure 6.5 Horizontal Flip (a) Original image, (b) Horizontally flipped image 71
Figure 6.6 Brightness Augmentation (a) Original image, (b) Brighter image and
(c) Darker image 72
Figure 6.7 GUI of Training App 72
Figure 6.8 Model is under training 73
Figure 6.9 The visualization output of convolutional layers (a) is an originally
selected frame.(b), (c), (d), (e) and (f) are the feature maps at first five
convolutional layers 74
Figure 6.10 Change in loss value throughout training 74
Figure 6.11 Change in accuracy value throughout training 75
Figure 6.12 Experimental Results: The top is images and the bottom is outputs after
passing through the Softmax function of the model (a) Steer is 100, (b) Steer is 110, (c) Steer is
120, (d) Steer is 130, (e) Steer is 140, (f) Steer is 150, (g) Steer is 160 75
Figure 6.13 The actual and predicted steering wheel angles of the models 76
Figure 6.14 The outputs of the object detection model 76
LIST OF TABLES
Table 3.1 MobileNet body Architecture 37
Table 4.1 Specification of Brushed motor RC-540 45
Table 4.2 Specifications of Raspberry Pi 3 Model B+ 48
Table 4.3 Specifications of NVIDIA Jetson Nano 51
Table 4.4 Technical specifications of Arduino Nano 55
as output. The experimental results demonstrate the effectiveness and robustness of the autopilot model in the lane keeping task and traffic sign detection. The speed is about 5-6 km/h in a wide variety of driving conditions, regardless of whether lane markings are present or not.
Keywords: Computer Vision, Real-time Navigation, Self-driving Car, Object Detection Model, Classification Model
CHAPTER 1: OVERVIEW
1.1 INTRODUCTION
In recent years, technology companies have been discussing autonomous cars and trucks. Promises of life-changing safety and ease have been hung on these vehicles. Now some of these promises are beginning to come to fruition, as cars with more and more autonomous features hit the market each year.
Although autonomous cars are most likely still years away from being available to consumers, they are closer than many people think. Current estimates predict that by 2025 the world will see over 600,000 self-driving cars on the road, and by 2035 that number will jump to almost 21 million. Trials of self-driving car services have actually begun in some cities in the United States. And even though fully self-driving cars are not on the market yet, current technology already allows vehicles to be more autonomous than ever before, using intricate systems of cameras, lasers, radar, GPS, and interconnected communication between vehicles.
Since their introduction in the early 1990s, Convolutional Neural Networks (CNNs) [1] have been the most popular deep learning architecture due to the effectiveness of conv-nets on image-related problems such as handwriting recognition, facial recognition and cancer cell classification. The breakthrough of CNNs is that features are learned automatically from training examples. Although their primary disadvantage is the requirement of very large amounts of training data, recent studies have shown that excellent performance can be achieved with networks trained using "generic" data. For the last few years, CNNs have achieved state-of-the-art performance in most of the important classification tasks [2], object detection tasks [3] and Generative Adversarial Networks [4].
Besides, with the increase in computational capacities, we are presently able to train complex neural networks to understand the vehicle's environment and decide its behavior. For example, the Tesla Model S was known to use a specialized chip (Mobileye EyeQ), which used a deep neural network for vision-based real-time obstacle detection and avoidance. More recently, researchers are investigating DNN-based end-to-end control of cars and other robots [5]. Executing a CNN on an embedded computing platform has several challenges. First, despite the large amount of computation, strict real-time requirements are demanded. For instance, latency in a vision-based object detection task may be directly linked to the safety of the vehicle. On the other hand, the computing hardware platform must also satisfy cost, size, weight, and power constraints. These two conflicting requirements complicate the platform selection process, as observed in related work. There are already several relatively low-cost RC-car based prototypes, such as MIT's RaceCar [6] and UPenn's F1/10 racecar.
Encouraged by these positive results, in this thesis we develop a real-time end-to-end deep learning based RC-car platform. In terms of hardware, it includes a Raspberry Pi 3 Model B+ quad-core computer, a Jetson Nano Developer Kit, two Logitech cameras, an Arduino Nano and a 1/10 scale RC car. The research target is to train two models, a vision-oriented end-to-end model and an object detection model, to autonomously navigate the car in real time in outdoor environments with a wide variety of driving conditions.
1.2 BACKGROUND AND RELATED WORK
1.2.1 Overview of Autonomous Cars
An autonomous car is a car that can drive itself; it combines sensors and software to control, navigate, and drive the vehicle. Currently, there are no legally operating, fully-autonomous vehicles in the United States. However, there are partially-autonomous vehicles: cars and trucks with varying amounts of automation, from conventional cars with brake and lane assistance to highly-independent, self-driving prototypes. Though still in its infancy, self-driving technology is becoming increasingly common and could radically transform our transportation system (and by extension, our economy and society). Based on automaker and technology company estimates, level 4 self-driving cars could be for sale in the next several years (see the callout box for details on autonomy levels). Various self-driving technologies have been developed by Google, Uber, Tesla, Nissan, and other major automakers, researchers, and technology companies.
While design details vary, most self-driving systems create and maintain an internal map of their surroundings, based on a wide array of sensors, like radar. Uber's self-driving prototypes use sixty-four laser beams, along with other sensors, to construct their internal map; Google's prototypes have, at various stages, used lasers, radar, high-powered cameras, and sonar. Software then processes those inputs, plots a path, and sends instructions to the vehicle's "actuators," which control acceleration, braking, and steering. Hard-coded rules, obstacle avoidance algorithms, predictive modeling, and "smart" object discrimination (e.g., knowing the difference between a bicycle and a motorcycle) help the software follow traffic rules and navigate obstacles. Partially-autonomous vehicles may require a human driver to intervene if the system encounters uncertainty; fully-autonomous vehicles may not even offer a steering wheel. Self-driving cars can be further distinguished as being "connected" or not, indicating whether they can communicate with other vehicles and/or infrastructure, such as next-generation traffic lights. Most prototypes do not currently have this capability.
1.2.2 Literature Review and Other Study
1.2.2.1 Google Self-driving Car Project
Although many companies are racing to be the ones to bring a fully autonomous, commercially viable vehicle to the market, including Lyft, Ford, Uber, Honda, Toyota, Tesla and many others, it is Waymo, the autonomous vehicle division of Alphabet, Google's parent company, that has been the first to reach many milestones along the journey.
On November 7, 2017, the Waymo team announced:
“Starting now, Waymo's fully self-driving vehicles—our safest, most advanced vehicles on the road today—are test-driving on public roads, without anyone in the driver's seat.”
These vehicles have been on the roads of Chandler, AZ, a suburb of Phoenix, since mid-October without a safety driver behind the wheel, although until further notice there is a Waymo employee in the back seat. Waymo vehicles are equipped with powerful sensors that provide them with 360-degree views of the world, something a human behind the wheel never gets. There are short-range lasers and those that can see up to 300 meters away.
These vehicles don't have free rein to drive wherever they want quite yet; they are "geofenced" within a 100-square-mile area. As the cars collect more data and acquire more driving experience, that area will expand. Waymo has an Early Rider program that allows those who might be interested in using the autonomous vehicles to transport them around town to apply.
Figure 1.1 Google’s fully Self-Driving Car design introduced in May 2014
Figure 1.2 Features of Google Self-Driving Car
Technology: Google's robotic cars have about $150,000 in equipment, as shown in Figure 1.2, including a LIDAR system that itself costs $70,000. The Velodyne 64-beam laser range finder mounted on top allows the vehicle to generate a detailed 3D map of its environment. The car takes these generated maps and combines them with high-resolution maps of the world, producing different types of data models that are then used for driving itself. Some of these computations are performed on remote computer farms, in addition to on-board systems.
Limitations: As of August 28, 2014, the latest prototype had not been tested in heavy rain or snow due to safety concerns. The car still relies primarily on pre-programmed route data; it does not obey temporary traffic signals and, in some situations, reverts to a slower "extra cautious" mode in complex unmapped intersections. The lidar technology cannot spot some potholes or discern when humans, such as a police officer, are signaling the car to stop. However, Google is having these issues fixed by 2020.
1.2.2.2 Uber Self-driving Car Project
Uber thought it would have 75,000 autonomous vehicles on the roads this year and be operating driverless taxi services in 13 cities by 2022, according to court documents unsealed last week. To reach those ambitious goals, the ridesharing company, which hopes to go public later this year, was spending $20 million a month on developing self-driving technologies.
The figures, dating back to 2016, paint a picture of a company desperate to meet over-ambitious autonomy targets and one that is willing to spend freely, even recklessly, to get there. As Uber prepares for its IPO later this year, the new details could prove an embarrassing reminder that the company is still trailing in its efforts to develop technology that founder Travis Kalanick called "existential" to Uber's future.
Figure 1.3 Uber Self-driving Car
The report was written for Uber as part of last year's patent and trade secret theft lawsuit with rival Waymo, which accused engineer Anthony Levandowski of taking technical secrets with him when he left Google to found self-driving truck startup Otto. Uber acquired Otto in 2016. Uber hired Walter Bratic, the author of the report, as an expert witness to question Waymo's valuation of the economic damages it had suffered, a whopping $1.85 billion. Bratic's report capped the cost to independently develop Waymo's purported trade secrets at $605,000.
Waymo eventually settled for 0.34 percent of Uber's equity, which could be worth around $300 million after an IPO if a recent $90 billion valuation of the company is accurate. Bratic's report provides details of internal analyses and reports, codenamed Project Rubicon, that Uber carried out during 2016. A presentation in January that year projected that driverless cars could become profitable for Uber in 2018, while a May report said Uber might have 13,000 self-driving taxis by 2019. Just four months later, that estimate had jumped to 75,000 vehicles.
The current head of Uber's self-driving technologies, Eric Meyhofer, testified that Uber's original estimates of having tens of thousands of AVs in a dozen cities by 2022 were "highly speculative" "assumptions and estimates." Although Meyhofer declined to provide any other numbers, he did say, "They probably ran a lot of scenarios beyond 13 cities. Maybe they assumed two in another scenario, or one, or three hundred. It's a set of knobs you turn to try to understand parameters that you need to try to meet."
One specific goal, set by John Bares, the engineer then in charge of Uber's autonomous vehicles, was for Uber to be able to forgo human safety drivers by 2020. The company's engineers seemed certain that acquiring Otto and Levandowski would supercharge its progress.
1.2.2.3 Embedded Computing Platforms for Real-Time Inferencing
Real-time embedded systems, such as an autonomous vehicle, present unique challenges for deep learning, as the computing platforms of such systems must satisfy two often conflicting goals [7]: the platform must provide enough computing capacity for real-time processing of computationally expensive AI workloads (deep neural networks), and the platform must also satisfy various constraints such as cost, size, weight, and power consumption limits.
Accelerating AI workloads, especially inferencing operations, has received a lot of attention from academia and industry in recent years, as applications of deep learning are broadening to areas of real-time embedded systems such as autonomous vehicles. These efforts include the development of various heterogeneous architecture-based systems-on-a-chip (SoCs) that may include multiple cores, a GPU, a DSP, an FPGA, and neural-network-optimized ASIC hardware. Consolidating multiple tasks on SoCs with many shared hardware resources while guaranteeing real-time performance is also an active research area, which is orthogonal to improving raw performance. Consolidation is necessary for efficiency, but unmanaged interference can nullify its benefits. For these reasons, finding a good computing platform is a non-trivial task, one that requires a deep understanding of the workloads and the hardware platform being utilized.
1.2.2.4 Real-Time Self-Driving Car Navigation Using Deep Neural Network (Paper)
This paper was published at the 4th International Conference on Green Technology and Sustainable Development (GTSD 2018).
This paper presented a monocular vision-based self-driving car prototype using a Deep Neural Network. First, the CNN model parameters were trained using data collected from a vehicle platform built with a 1/10 scale RC car, a Raspberry Pi 3 Model B computer and a front-facing camera. The training data were road images paired with the time-synchronized steering angle generated by manually driving. Second, the trained model was road-tested on the Raspberry Pi to drive itself in the outdoor environment around oval-shaped and 8-shaped lined tracks with traffic signs.
Figure 1.4 Self-driving Car of HCMUTE
The experimental results demonstrate the effectiveness and robustness of the autopilot model in the lane keeping task. The vehicle's top speed is about 5-6 km/h in a wide variety of driving conditions, regardless of whether lane markings are present or not [8]. In this paper, a classification model was used and the training accuracy was 92.38%.
1.3 OBJECTIVES OF THE THESIS
The thesis concentrated on several key goals:
Research will be conducted into the theory of an autonomous vehicle and the various considerations that may be necessary during the construction of a self-driving car prototype.
Researching and using suitable and compatible components such as sensors, microcontrollers, motors, drivers, power supplies and communication. In detail, this thesis will mainly focus on using the Raspberry Pi 3 Model B+, NVIDIA Jetson Nano Kit, Camera Module, Arduino Nano and some drivers.
Using a Convolutional Neural Network that directly maps raw input images to a predicted steering angle and detects objects, namely traffic signs (Left, Right, Stop) and the "Car" object, as output.
The target is that the car can drive itself in real time in the outdoor environment on the map and obey the traffic signs.
1.4 RESEARCH OBJECT AND SCOPE
In the scope of this thesis, a real-time end-to-end deep learning based RC-car platform was developed using a 1/10 scale RC car chassis, Raspberry Pi 3 Model B+, NVIDIA Jetson Nano Kit, Logitech camera and Arduino Nano. A Convolutional Neural Network (CNN) directly maps raw input images to a predicted steering angle as output. The trained model is road-tested on the Raspberry Pi and Jetson Nano to drive the car in real time in the outdoor environment around the map with a traffic-sign lined track. Then the effectiveness and robustness of the autopilot model in the lane keeping task is evaluated based on the experimental results.
1.5 RESEARCH METHOD
The research project was divided into chapters, each a sequential step in the process of developing and building the self-driving car prototype. This approach was utilized in an attempt to progress the project from one task to the next as it was undertaken. Each task is defined so that it builds on the previous one, thus evolving the robot within the goals and requirements generated. This ultimately led to the completion of the car that met the objectives within the timeframe available.
The first step is to define the key points and objectives to deeply understand what a real-time end-to-end deep learning based self-driving car and a convolutional neural network actually are. Besides that, it is critical to determine plans for conducting research and performing the suitable design and programming.
The second step is to refer to previous projects. It is useful to have appropriate approaches, and this established the foundations for making an informed decision based on previous experiences to avoid mistakes and obsolete designs.
The third step, in the theoretical method, was to apply the knowledge studied to the control system. This approach provides valuable data on the likely stability capability of the control system by finding the suitable parameters over many days.
The next step is to design and manufacture the PCB, chassis, drive shaft, etc., and assemble all hardware components. In parallel, to make a good system, the following step is to analyze the performance of the system. This also provides the chance to calibrate and perform additional changes to make the final system. The practical method is to directly design the mechanism and PCB. This approach allows for testing on the real system and making the final assessment from the response of the self-driving car prototype.
The trained model is road-tested on the Raspberry Pi to drive itself in real time in the outdoor environment around oval-shaped and 8-shaped lined tracks with traffic signs. Then the effectiveness and robustness of the autopilot model in the lane keeping task is evaluated based on the experimental results.
1.6 THE CONTENT OF THE THESIS
The thesis “Research, Design and Construct Real–Time Self–Driving Car using Deep Neural Network” includes the following chapters:
Chapter I: Overview: This chapter provides a brief overview of the requirements of the report, including the introduction, goals, scope and content of the thesis.
Chapter II: The principle of self-driving cars: This chapter provides the basic knowledge behind this thesis, such as the principles of self-driving cars, artificial intelligence, machine learning, deep learning and convolutional neural networks.
Chapter III: Convolutional Neural Network: This chapter gives the knowledge about Convolutional Neural Networks and the structure of a CNN.
Chapter IV: Hardware Design of Self-driving car Prototype: This chapter provides the car's design and hardware selection. After that, the car platform is constructed based on the design.
Chapter V: Control Algorithms of Self-driving car Prototype: This chapter provides the algorithms and diagrams of the software.
Chapter VI: Experiment: This chapter shows the experimental results of this thesis.
Chapter VII: Conclusion and Future Work: This chapter provides the conclusion in terms of the advantages and limitations of this thesis. It also summarizes the contributions and proposes ideas and orientations for future work.
CHAPTER 2: THE PRINCIPLE OF SELF – DRIVING CARS
In this section, the overview of self-driving cars and related technologies is presented. This includes the introduction of autonomous cars, the different technologies used in self-driving cars, an overview of Artificial Intelligence, Machine Learning and Deep Learning, and the basics of the Convolutional Neural Network used to navigate the car prototype in this thesis.
2.1 INTRODUCTION OF SELF – DRIVING CARS
A self-driving car (driverless, autonomous, robotic car) is a vehicle that is capable of sensing its environment and navigating without human input. Self-driving cars can detect environments using a variety of techniques such as radar, GPS and computer vision. Advanced control systems interpret sensory information to identify appropriate navigational paths, as well as obstacles and relevant signage. Self-driving cars have control systems that are capable of analyzing sensory data to distinguish between different cars on the road. This is very useful in planning a path to the desired destination. Autonomous car technology aims to achieve:
- The benefits of technology by processing large amounts of data and using it to make intelligent decisions.
- The human ability to adapt to known or unknown environments.
Autonomy still implies personal ownership. Looking into the future, some believe that steering wheels will disappear completely and the vehicle will do all the driving using the same system of sensors, radar, and GPS mapping that today's driverless cars employ. This will be something that is ultimately up to the self-driving car companies currently building the future of this technology. The answer for how cars are getting smarter is illustrated in Figure 2.1.
Figure 2.1 How Cars are getting smarter
Figure 2.2 Important components of a self-driving car
2.2 DIFFERENT TECHNOLOGIES USED IN SELF-DRIVING CARS
Self-driving autonomous cars use various automotive technologies to provide an effortless mode of transportation. Providing this type of transportation requires a harmonious synchronization of advanced sensors gathering information about the surrounding environment, sophisticated algorithms processing that data and controlling the vehicles, and computational power processing it all in real time. The software can recognize objects, people, cars, road markings, signs and traffic lights, obeying the rules of the road and allowing for multiple unpredictable hazards, including cyclists. It can even detect road works and safely navigate around them. Figure 2.2 shows the important components of a self-driving vehicle. The list of parts and their functionalities will be discussed in this section.
2.2.1 Laser
A laser is a device that emits light through a process of optical amplification based on the stimulated emission of electromagnetic radiation. The term "laser" originated as an acronym for light amplification by stimulated emission of radiation. A new system, developed by researchers at the University of California, Berkeley, can remotely sense objects across distances as long as 30 feet, 10 times farther than what could be done with comparable current low-power laser systems. With further development, the technology could be used to make smaller, cheaper 3D imaging systems that offer exceptional range for use in self-driving cars.
Figure 2.3 A laser sensor on the roof constantly scans the surroundings
2.2.2 Lidar
LIDAR, which stands for Light Detection and Ranging, is a remote sensing method that uses light in the form of a pulsed laser to measure ranges (variable distances) to the Earth. These light pulses, combined with other data recorded by the airborne system, generate precise, three-dimensional information about the shape of the Earth and its surface characteristics.
A LIDAR instrument principally consists of a laser, a scanner, and a specialized GPS receiver. Airplanes and helicopters are the most commonly used platforms for acquiring LIDAR data over broad areas. Two types of LIDAR are topographic and bathymetric. Topographic LIDAR typically uses a near-infrared laser to map the land, while bathymetric lidar uses water-penetrating green light to also measure seafloor and riverbed elevations. LIDAR systems allow scientists and mapping professionals to examine both natural and manmade environments with accuracy, precision, and flexibility.
Lidar uses ultraviolet, visible or near-infrared light to image objects. It can target a wide range of materials, including nonmetallic objects, rocks, rain, chemical compounds, aerosols, clouds and even single molecules. A narrow laser beam can map physical features with very high resolution.
NASA has identified lidar as a key technology for enabling autonomous precision safe landing of future robotic and crewed lunar-landing vehicles.
Figure 2.4 The map was drawn by LIDAR
Wavelengths of lidar vary from 10 micrometers to approximately 250 nm (UV). The backscattering property of light is the key to its functionality. Different types of scattering are used for different lidar applications: most commonly Rayleigh scattering, Mie scattering, Raman scattering, and fluorescence. Suitable combinations of wavelengths can allow for remote mapping of atmospheric contents by identifying wavelength-dependent changes in the intensity of the returned signal.
The structure and functionality of LIDAR is shown in Figure 2.5. In general, there are two kinds of lidar detection schemes:
- Incoherent or direct energy detection: Principally it is an amplitude measurement.
- Coherent detection: Coherent systems generally use optical heterodyne detection, which, being more sensitive than direct detection, allows them to operate at a much lower power, but at the expense of more complex transceiver requirements. This is best for Doppler, or phase sensitive, measurements.
In both kinds of lidar, there are two types of pulse models:
- Micropulse lidar systems: Micropulse systems have developed as a result of the ever-increasing amount of computer power available combined with advances in laser technology. They use considerably less energy in the laser, typically on the order of one microjoule, and are often "eye-safe," meaning they can be used without safety precautions.
- High energy systems: High-power systems are common in atmospheric research, where they are widely used for measuring many atmospheric parameters: the height, layering and densities of clouds, cloud particle properties, temperature, pressure, wind, humidity, and trace gas concentration.
Figure 2.5 Structure and functionality of LIDAR
There are several major components of a lidar system:
- Laser: 600-1000 nm lasers are most common for non-scientific applications. They are inexpensive, but since they can be focused and easily absorbed by the eye, the maximum power is limited by the need to make them eye-safe. Eye-safety is often a requirement for most applications. A common alternative, 1550 nm lasers, are eye-safe at much higher power levels since this wavelength is not focused by the eye, but the detector technology is less advanced, and so these wavelengths are generally used at longer ranges and lower accuracies. They are also used for military applications, as 1550 nm is not visible in night vision goggles, unlike the shorter 1000 nm infrared laser. Airborne topographic mapping lidars generally use 1064 nm diode-pumped YAG lasers, while bathymetric systems generally use 532 nm frequency-doubled diode-pumped YAG lasers, because 532 nm penetrates water with much less attenuation than 1064 nm does. Laser settings include the laser repetition rate, which controls the data collection speed. Pulse length is generally an attribute of the laser cavity length, the number of passes required through the gain material (YAG, YLF, etc.), and Q-switch speed.
- Scanner and optics: How fast images can be developed is also affected by the speed at which they are scanned. There are several options to scan the azimuth and elevation, including dual oscillating plane mirrors, a combination with a polygon mirror, and a dual-axis scanner (see laser scanning). Optic choices affect the angular resolution and the range that can be detected. A hole mirror or a beam splitter are options to collect a return signal.
- Photodetector and receiver electronics: Two main photodetector technologies are used in lidars: solid-state photodetectors, such as silicon avalanche photodiodes, and photomultipliers. The sensitivity of the receiver is another parameter that has to be balanced in a lidar design.
- Position and navigation systems: Lidar sensors that are mounted on mobile platforms such as airplanes or satellites require instrumentation to determine the absolute position and orientation of the sensor. Such devices generally include a Global Positioning System receiver and an Inertial Measurement Unit (IMU).
- 3D Imaging: 3D imaging can be achieved using both scanning and non-scanning systems. "3D gated viewing laser radar" is a non-scanning laser ranging system that applies a pulsed laser and a fast gated camera. Imaging lidar can also be performed using arrays of high-speed detectors and modulation-sensitive detector arrays, typically built on single chips using CMOS and hybrid CMOS/CCD fabrication techniques. In these devices each pixel performs some local processing, such as demodulation or gating at high speed, downconverting the signals to video rate so that the array may be read like a camera. Using this technique many thousands of pixels/channels may be acquired simultaneously. High-resolution 3D lidar cameras use homodyne detection with an electronic CCD or CMOS shutter.
A coherent imaging lidar uses synthetic array heterodyne detection to enable a staring single-element receiver to act as though it were an imaging array.
There are a wide variety of applications for lidar in agriculture, archaeology, autonomous vehicles, biology and conservation, geology and soil science, atmospheric remote sensing and meteorology, law enforcement, military, mining, physics and astronomy, robotics, spaceflight, surveying, transport, wind farm optimization, solar photovoltaic deployment optimization and much more.
2.2.3 Radar
The RADAR system works in much the same way as LiDAR, with the only difference being that it uses radio waves instead of laser light. In a RADAR instrument, the antenna doubles up as a radar receiver as well as a transmitter. However, radio waves are absorbed less than light waves when contacting objects. Thus, they can work over a relatively long distance. The most well-known use of RADAR technology is for military purposes. Airplanes and battleships are often equipped with RADAR to measure altitude and detect other transport devices and objects in the vicinity.
The RADAR system, on the other hand, is relatively less expensive. Cost is one of the reasons why Tesla has chosen this technology over LiDAR. It also works equally well in all weather conditions such as fog, rain, snow, and dust. However, it is less angularly accurate than LiDAR, as it loses sight of the target vehicle on curves. It may get confused if multiple objects are placed very close to each other. For example, it may consider two small cars in the vicinity as one large vehicle and send a wrong proximity signal. Unlike the LiDAR system, RADAR can determine relative traffic speed or the velocity of a moving object accurately using the Doppler frequency shift.
Figure 2.6 Comparison between LIDAR and RADAR
2.2.4 GPS
GPS is a constellation of satellites that provides a user with an accurate position on the surface of the earth. This satellite-based navigation system was developed by the U.S. Department of Defense (DoD) in the early 1970s. It was first intended for military use, but later it was made available to civilian users. GPS can provide precise position and time information to a user anywhere in the world. It is a one-way system, i.e., a user can only receive signals but cannot send signals to the satellite. This type of configuration is needed for security reasons as well as to serve an unlimited number of users. Cars with these systems or with on-board navigation systems have global positioning (GPS) systems installed. GPS uses satellites to assess the exact location of a car. Using GPS-based techniques alone, a lot of the challenges in the development of autonomous cars can be overcome.
2.2.5 Camera
Cameras are already commonplace on modern cars. Since 2018, all new vehicles in the US have been required to fit reversing cameras as standard. Any car with a lane departure warning system (LDW) will use a front-facing camera to detect painted markings on the road. Autonomous vehicles are no different. Almost all development vehicles today feature some sort of visible light camera for detecting road markings; many feature multiple cameras for building a 360-degree view of the vehicle's environment. Cameras are very good at detecting and recognizing objects, so the image data they produce can be fed to AI-based algorithms for object classification.
Figure 2.7 Camera on Autonomous Car
Some companies, such as Mobileye, rely on cameras for almost all of their sensing. However, cameras are not without their drawbacks. Just like your own eyes, visible light cameras have limited capabilities in conditions of low visibility. Additionally, using multiple cameras generates a lot of video data to process, which requires substantial computing hardware.
Beyond visible light cameras, there are also infrared cameras, which offer superior performance in darkness and additional sensing capabilities.
2.2.6 Ultrasonic Sensors
Figure 2.8 Placement of ultrasonic sensors for PAS
Figure 2.9 Ultrasonic Sensor on Autonomous Car
In an ADAS system, ultrasonic sensors play an important role in the parking of vehicles, avoiding obstacles in blind spots, and detecting pedestrians. One of the companies providing ultrasound sensors for ADAS systems is Murata. They provide ultrasonic sensors with a range of up to 10 meters, which are optimum for a Parking Assistance System (PAS). Figure 2.8 and Figure 2.9 show where ultrasonic sensors are placed on a car.
2.3 OVERVIEW OF ARTIFICIAL INTELLIGENCE
2.3.1 Artificial Intelligence
Artificial intelligence (AI) is a way of making a computer, a computer-controlled robot, or software think intelligently, in a manner similar to the way intelligent humans think. AI is accomplished by studying how the human brain thinks, and how humans learn, decide, and work while trying to solve a problem, and then using the outcomes of this study as a basis for developing intelligent software and systems.
Artificial intelligence is a science and technology based on disciplines such as Computer Science, Biology, Psychology, Linguistics, Mathematics, and Engineering. A major thrust of AI is the development of computer functions associated with human intelligence, such as reasoning, learning, and problem solving.
To further explain the goals of Artificial Intelligence, researchers extended this primary goal into the following main areas:
- Planning, Scheduling and Optimization: Enable a computer to set goals and achieve them. Such systems need a way to visualize the future (a representation of the state of the world), be able to make predictions about how their actions will change it, and be able to make choices that maximize the utility of the available choices.
- Natural language processing: Gives machines the ability to read and understand human language. A sufficiently powerful natural language processing system would enable natural-language user interfaces and the acquisition of knowledge directly from human-written sources, such as newswire texts.
- Speech processing: The study of speech signals and their processing methods. The signals are usually processed in a digital representation, so speech processing can be regarded as a special case of digital signal processing, applied to speech signals. Aspects of speech processing include the acquisition, manipulation, storage, transfer and output of speech signals. The input side is called speech recognition and the output side is called speech synthesis.
- Machine Learning: A fundamental concept of AI research since the field's inception, it is the study of computer algorithms that improve automatically through experience.
- Robotics: Advanced robotic arms and other industrial robots, widely used in modern factories, can learn from experience how to move efficiently despite the presence of friction and gear slippage.
- Vision: A field of study that seeks to develop techniques to help computers "see" and understand the content of digital images such as photographs and videos.
- Expert systems: A computer system that emulates the decision-making ability of a human expert. Expert systems are designed to solve complex problems by reasoning through bodies of knowledge, represented mainly as if-then rules rather than through conventional procedural code.
2.3.2 Machine Learning
Machine Learning (ML) is a subset of Artificial Intelligence. ML is the scientific study of algorithms and statistical models that computer systems use to effectively perform a specific task without using explicit instructions, relying on patterns and inference instead. Machine learning algorithms build a mathematical model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task. Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is infeasible to develop an algorithm of specific instructions for performing the task. Machine Learning is closely related to computational statistics, which focuses on making predictions using computers. The study of mathematical optimization delivers methods, theory and application domains to the field of machine learning.
Every machine learning algorithm has three components:
- Representation: how to represent knowledge. Examples include decision trees, sets of rules, instances, graphical models, neural networks, support vector machines, model ensembles and others.
- Evaluation: the way to evaluate candidate programs (hypotheses). Examples include accuracy, precision and recall, squared error, likelihood, posterior probability, cost, margin, entropy, K-L divergence and others.
- Optimization: the way candidate programs are generated, known as the search process. Examples include combinatorial optimization, convex optimization and constrained optimization.
There are four types of machine learning:
- Supervised Learning: The learning algorithm falls under this category if the desired output for the network is also provided with the input while training the network. By providing the neural network with both an input and output pair, it is possible to calculate an error based on its target output and actual output. It can then use that error to make corrections to the network by updating its weights (a minimal sketch of this update appears after this list).
- Unsupervised Learning: In this paradigm, the neural network is only given a set of inputs, and it is the neural network's responsibility to find some kind of pattern within the inputs provided, without any external aid. This type of learning paradigm is often used in data mining and is also used by many recommendation algorithms due to their ability to predict a user's preferences based on the preferences of other similar users it has grouped together.
- Semi-supervised Learning: Training data includes a few desired outputs.
- Reinforcement Learning: Reinforcement learning is similar to supervised learning in that some feedback is given; however, instead of providing a target output, a reward is given based on how well the system performed. The aim of reinforcement learning is to maximize the reward the system receives through trial and error. This paradigm relates strongly to how learning works in nature; for example, an animal might remember the actions it previously took which helped it to find food (the reward).
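The error-driven weight update described for supervised learning above can be illustrated with a minimal sketch. This is not code from the thesis; it assumes a single linear neuron trained with a squared-error loss, purely to show how the difference between the target output and the actual output drives the weight corrections.

```python
import numpy as np

# Toy supervised learning: one linear neuron, squared-error loss.
# x holds the inputs, t the desired (target) outputs provided with the training data.
x = np.array([[0.5, 1.0], [1.0, 0.2], [0.1, 0.9]])
t = np.array([1.0, 0.4, 0.8])

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.1          # learning rate

for epoch in range(100):
    y = x @ w + b            # actual output of the network
    error = y - t            # difference between actual and target output
    # Gradient of the mean squared error with respect to w and b
    w -= lr * (x.T @ error) / len(t)
    b -= lr * error.mean()

print(w, b)                  # weights corrected to reduce the error
```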
Supervised learning is the most mature and most studied type of learning, and it is the type used by most machine learning algorithms. Learning with supervision is much easier than learning without supervision. Inductive learning is where we are given examples of a function in the form of data (x) and the output of the function (f(x)); the goal of inductive learning is to learn the function for new data (x):
- Classification: when the function being learned is discrete.
- Regression: when the function being learned is continuous.
- Probability Estimation: when the output of the function is a probability.
2.3.3 Deep Learning
Deep learning is a machine learning technique that teaches computers to do what comes naturally to humans: learn by example. Deep learning is a key technology behind driverless cars, enabling them to recognize a stop sign or to distinguish a pedestrian from a lamppost. It is the key to voice control in consumer devices like phones, tablets, TVs, and hands-free speakers. Deep learning is getting lots of attention lately, and for good reason: it is achieving results that were not possible before.
Figure 2.11 Neural networks, which are organized in layers consisting of a set of interconnected nodes. Networks can have tens or hundreds of hidden layers
In deep learning, a computer model learns to perform classification tasks directly from images, text, or sound. Deep learning models can achieve state-of-the-art accuracy, sometimes exceeding human-level performance. Models are trained by using a large set of labeled data and neural network architectures that contain many layers. Most deep learning methods use neural network architectures, which is why deep learning models are often referred to as deep neural networks. The term "deep" usually refers to the number of hidden layers in the neural network. Traditional neural networks only contain 2-3 hidden layers, while deep networks can have as many as 150.
A Neural Network is a concept at the heart of Deep Learning, which can be thought of as similar to a function in a programming language. The simplest unit of a Neural Network, a Perceptron, takes in inputs (like the parameters of a function), runs them through a process (the function steps), and finally provides a response (the output of the function).
Figure 2.12 A simple Neural Network or a Perceptron
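As a concrete illustration of the perceptron just described, the short sketch below (not taken from the thesis; the weights and inputs are arbitrary example values) computes the response of a single perceptron: a weighted sum of the inputs plus a bias, passed through a step activation.

```python
import numpy as np

def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, followed by a step activation."""
    z = np.dot(weights, inputs) + bias
    return 1 if z > 0 else 0

x = np.array([0.7, 0.3])            # inputs (the "parameters" of the function)
w = np.array([0.6, -0.4])           # weights of the perceptron
print(perceptron(x, w, bias=0.1))   # response (the output of the function)
```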
The Neural Networks generate solutions that can classify data based on the factors that affect the classification. Also, Perceptrons are just an encoding of our solutions in a graphical format, depicted similarly to the neurons in the brain.
Figure 2.13 Comparison of performance between DL and other learning algorithms
CHAPTER 3: CONVOLUTIONAL NEURAL NETWORK
3.1 INTRODUCTION
A Convolutional Neural Network (CNN) is a Deep Learning algorithm which can take an input image, assign importance (learnable weights and biases) to various aspects or objects in the image, and differentiate one from the other. The pre-processing required in a CNN is much lower compared to other classification algorithms. While in primitive methods filters are hand-engineered, with enough training a CNN has the ability to learn these filters or characteristics. The breakthrough of CNNs is that features are learned automatically from training examples.
CNNs evaluate inputs through convolutions: the input is convolved with a filter. This convolution leads the network to detect edges and lower-level features in earlier layers and more complex features in deeper layers of the network. CNNs are used in combination with pooling layers, and they often have fully connected layers at the end, as you can see in the picture below. Run forward propagation as you would in a vanilla neural network and minimize the loss function through backpropagation to train the CNN.
3.2 STRUCTURE OF CONVOLUTIONAL NEURAL NETWORKS
All CNN models follow a similar architecture. A simple CNN is a sequence of layers, and every layer of a CNN transforms one volume of activations to another through a differentiable function. We use three main types of layers to build a CNN architecture: Convolution Layer, Pooling Layer and Fully-Connected Layer (exactly as seen in regular Neural Networks). These layers are also called Hidden Layers. We stack these layers to form a full CNN architecture.
Figure 3.1 CNN architecture
3.2.1 Convolution Layer
The Convolution Layer performs an operation called a "convolution". A convolution is a linear operation on two functions that produces a third function expressing how the shape of one is modified by the other. The two input functions are the input data and a correlation kernel array (called a filter). Both of them are combined to produce an output (the third function) through a cross-correlation operation.
The filter is smaller than the input data. It is first positioned at the top-left corner of the input array. Then, it slides across the input array, both from left to right and top to bottom. Every time the filter slides to a position on the input array, the input subarray contained in that window and the kernel array are multiplied element-wise and the resulting array is summed up, yielding a single scalar value. As the filter is applied multiple times to the input array, the result is a two-dimensional array of output values that represents a filtering of the input. As such, the two-dimensional output array from this operation is called a "feature map".
Using a filter that is smaller than the input data is intentional, as it allows the same filter (set of weights) to be multiplied by the input array multiple times at different points on the input. Specifically, the filter is applied systematically to each overlapping, filter-sized patch of the input data, from left to right and from top to bottom. If the filter is designed to detect a specific type of feature in the input, then applying that filter systematically across the entire input image gives the filter an opportunity to discover that feature anywhere in the image. This capability is commonly referred to as translation invariance: the general interest is in whether the feature is present rather than where it is present.
Figure 3.2 The input data, filter and result of a convolution layer
Figure 3.3 The convolutional operation of a CNN. I is an input array, K is a kernel, and I*K is the output of the convolution operation
Figure 3.4 The result of a convolution operation. (a) is the input image, (b) is the feature map of the image after a convolution layer
The mathematical formula of the convolution operation is illustrated in Equation 3.1:

$(I * K)_{xy} = \sum_{i=1}^{h} \sum_{j=1}^{w} K_{ij} \cdot I_{x+i-1,\,y+j-1}$ (3.1)

where I is an input image and K is a filter of size h x w.
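For clarity, a minimal NumPy sketch of Equation 3.1 is given below. It is not the thesis code; it simply slides a filter K over an input I with stride 1 and no padding (the cross-correlation described above) and returns the resulting feature map.

```python
import numpy as np

def conv2d(I, K):
    """Cross-correlation of input I with kernel K (stride 1, no padding), as in Equation 3.1."""
    h, w = K.shape
    out_h = I.shape[0] - h + 1
    out_w = I.shape[1] - w + 1
    out = np.zeros((out_h, out_w))
    for x in range(out_h):
        for y in range(out_w):
            out[x, y] = np.sum(K * I[x:x + h, y:y + w])
    return out

I = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 input
K = np.array([[1.0, 0.0], [0.0, -1.0]])        # toy 2x2 filter
print(conv2d(I, K))                            # 4x4 feature map
```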
The output size of the feature map after passing the input data through the convolution operation is illustrated in Equation 3.2.
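For reference, the standard formula for the output size of a convolution, which is presumably what Equation 3.2 states, is O = (N - F + 2P)/S + 1, where N is the input size, F is the filter size, P is the padding and S is the stride.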
Figure 3.5 Perform multiple convolutions on an input
Figure 3.6 The convolution operation for each filter
3.2.2 Activation function
Activation functions are mathematical equations that determine the output of a neural network. The function is attached to each neuron in the network and determines whether it should be activated ("fired") or not, based on whether each neuron's input is relevant for the model's prediction. They help remove abnormal noise and turn a linear network into a non-linear one. Activation functions also help normalize the output of each neuron to a range between 0 and 1, between -1 and 1, or some other range.
After every convolutional layer, we usually apply an activation function. If no activation function is applied, the problem is that the Neural Network would behave just like a single perceptron, because the sum of all the layers would still be a linear network, meaning the output could be calculated as a linear combination of the outputs. A linear equation is easy to solve, but it is limited in its complexity and has less power to learn complex functional mappings from data. A Neural Network without an activation function would simply be a linear regression model, which has limited power and does not perform well most of the time. We want our neural network to not only learn and compute a linear function but also something more complicated than that; thus, a non-linear activation function helps with that.
The most popular activation functions are Sigmoid, Tanh and ReLU (a short code sketch of these functions follows the list below):
- Sigmoid: The Sigmoid activation function has the form f(x) = 1 / (1 + exp(-x)). Its range is between 0 and 1 and it is an S-shaped curve. It is easy to understand and apply, but the major reasons it has fallen out of popularity are the vanishing gradient problem, the fact that its output is not zero-centered (it lies between 0 and 1), which makes optimization harder, and that it takes time to converge.
- Tanh: Tanh's mathematical form is f(x) = (1 - exp(-2x)) / (1 + exp(-2x)). Its output is zero-centered (between -1 and 1), hence optimization is easier, but it still suffers from the vanishing gradient problem.
- ReLU (Rectified Linear Unit): ReLU has become very popular in the past couple of years. It was recently shown to give a six-times improvement in convergence over the Tanh function. It is simply f(x) = max(0, x), i.e., if x < 0, f(x) = 0 and if x >= 0, f(x) = x. Hence, as seen from its mathematical form, this function is very simple and efficient. A lot of times in machine learning and computer science we notice that the most simple and consistent techniques and methods are preferred and work best. ReLU avoids and rectifies the vanishing gradient problem, and almost all deep learning models use ReLU nowadays. Its limitation is that it should only be used within the hidden layers of a neural network model.
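A minimal sketch of these three activation functions (standard definitions, not code from the thesis) is shown below.

```python
import numpy as np

def sigmoid(x):
    # Output squashed into (0, 1); not zero-centered
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Output squashed into (-1, 1); zero-centered
    return (1.0 - np.exp(-2 * x)) / (1.0 + np.exp(-2 * x))

def relu(x):
    # f(x) = max(0, x): zero for negative inputs, identity for positive inputs
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x), tanh(x), relu(x))
```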
3.2.3 Stride and Padding
Stride is the number of pixels the filter shifts over the input matrix. If the stride is one, we move the filter by one pixel at a time, from left to right and top to bottom. If the stride is two, we move the filter by two pixels. There are two types of results of the operation: one in which the convolved feature is reduced in dimensionality compared to the input, and the other in which the dimensionality is either increased or remains the same.
If the edge of the input array includes some necessary information, we should apply Same Padding, which keeps the size of the output array equal to or bigger than the size of the input array. On the other hand, we should apply Valid Padding to remove the unnecessary information on the edge of the input array. Valid Padding means no padding.
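To make the effect of stride and padding concrete, the short sketch below (illustrative only) applies the standard output-size formula to a 28x28 input with a 3x3 filter, comparing valid padding (no padding) with same padding and different strides.

```python
def conv_output_size(n, f, p, s):
    """Output size for input size n, filter size f, padding p and stride s."""
    return (n - f + 2 * p) // s + 1

n, f = 28, 3
print(conv_output_size(n, f, p=0, s=1))  # valid padding, stride 1 -> 26
print(conv_output_size(n, f, p=1, s=1))  # same padding, stride 1 -> 28
print(conv_output_size(n, f, p=0, s=2))  # valid padding, stride 2 -> 13
```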
Figure 3.7 Apply zero Padding for input matrix
3.2.4 Pooling Layer
Figure 3.8 The max pooling operation
Similar to the Convolution Layer, the Pooling Layer is responsible for reducing the spatial size of the convolved feature. This decreases the computational power required to process the data through dimensionality reduction. Dimensionality reduction helps decrease the computed volume and extract low-level features from neighborhood pixels. Furthermore, it is useful for extracting dominant features which are rotationally and positionally invariant, thus maintaining effective training of the model.
There are two types of Pooling: Max Pooling and Average Pooling. Max Pooling returns the maximum value from the portion of the image covered by the kernel. On the other hand, Average Pooling returns the average of all the values from the portion of the image covered by the kernel.
Max Pooling extracts the most important features, like edges. On the other hand, Average Pooling extracts features more smoothly. Thus, the choice between Max Pooling and Average Pooling depends on the type of dataset.
In CNN architectures, pooling is typically performed with 2x2 windows, a stride of 2 and no padding.
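The 2x2, stride-2 max pooling described above can be sketched in a few lines of NumPy (illustrative only; any odd edge of the feature map is simply cropped):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 and no padding; x is a 2-D feature map."""
    h, w = x.shape
    x = x[:h - h % 2, :w - w % 2]                       # crop odd edges
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1.0, 3.0, 2.0, 4.0],
                 [5.0, 6.0, 1.0, 0.0],
                 [7.0, 2.0, 9.0, 8.0],
                 [3.0, 4.0, 6.0, 5.0]])
print(max_pool_2x2(fmap))   # [[6. 4.] [7. 9.]]
```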
3.2.5 Fully-Connected layer
Adding a Fully-Connected Layer is a cheap way of learning non-linear combinations of the high-level features represented by the output of the convolutional layers. The Fully-Connected Layer learns a possibly non-linear function in that space. To classify images, we must convert the feature map into a suitable form for a Multi-Level Perceptron; thus, we flatten the feature map into a column vector. The flattened output is fed to a feed-forward neural network, and backpropagation is applied at every training iteration. Over a series of epochs, the model becomes able to distinguish between dominating and certain low-level features in images and classify them using the Softmax function or the Sigmoid function. When we want to classify multiple classes (more than two), we must use the Softmax function. On the other hand, we just use the Sigmoid function to classify two classes.
Figure 3.9 The deep Neural Networks to classify multiple classes
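A brief sketch of the two output functions mentioned above (standard definitions, not thesis code): softmax turns a vector of class scores into probabilities that sum to one for the multi-class case, while sigmoid maps a single score to a probability for the two-class case.

```python
import numpy as np

def softmax(scores):
    # Subtract the max for numerical stability, then normalize to probabilities
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

def sigmoid(score):
    # Two-class case: probability of the positive class
    return 1.0 / (1.0 + np.exp(-score))

print(softmax(np.array([2.0, 1.0, 0.1])))  # approx. [0.66 0.24 0.10]
print(sigmoid(0.8))                        # approx. 0.69
```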
3.3 NETWORK ARCHITECTURE AND PARAMETER OPTIMIZATION
The network architecture comprises 9 layers, including 5 convolutional layers and 4 fully connected ones. The input image is 66x200x3 (height x width x depth). The convolutional layers were designed to perform feature extraction and were chosen empirically through a series of experiments with varied layer configurations. The first three convolutional layers use a 7x7 kernel size and a stride of 2x2. The respective depths of the layers are 8, 16, 32 and 64 to push the network deeper. The local features continue to be processed in the last two convolutional layers with a kernel size of 3x3 and a depth of 64. After the convolutional layers, the output is flattened and then followed by a series of fully connected layers of gradually decreasing sizes: 100, 50, 20 and 7. All hidden layers are equipped with the rectified linear unit (ReLU) to improve convergence. From the feature vectors, we apply softmax to calculate the steering wheel angle probability. The whole network has roughly 194,173 parameters and offers good training performance on modest hardware.
Figure 3.10 Network architecture
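The description above leaves some ambiguity about the exact depths and padding of the five convolutional layers, so the Keras sketch below is only one plausible reading, not the thesis's actual code: it assumes conv depths of 8, 16 and 32 for the three 7x7 stride-2 layers and 64 for the two 3x3 layers (with same padding), followed by fully connected layers of 100, 50 and 20 units and a 7-way softmax output for the discretized steering angles. The exact parameter count of this sketch will differ from the reported 194,173.

```python
# Hypothetical sketch of the described architecture; layer depths and padding are assumptions.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(66, 200, 3)),                          # height x width x depth
    layers.Conv2D(8, (7, 7), strides=2, activation="relu"),
    layers.Conv2D(16, (7, 7), strides=2, activation="relu"),
    layers.Conv2D(32, (7, 7), strides=2, activation="relu"),
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.Flatten(),
    layers.Dense(100, activation="relu"),
    layers.Dense(50, activation="relu"),
    layers.Dense(20, activation="relu"),
    layers.Dense(7, activation="softmax"),                     # steering angle classes
])
model.summary()
```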