MINISTRY OF EDUCATION AND TRAINING
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND EDUCATION

DESIGN START-STOP CIRCUIT THROUGH TRAFFIC DETECTION
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND EDUCATION
FACULTY FOR HIGH QUALITY TRAINING
GRADUATION PROJECT
Ho Chi Minh City, 24 December 2022
DESIGN START STOP CIRCUIT THROUGH TRAFFIC DETECTION
Students: Le Chan Pham ID: 18145045
Vo Huy Vu ID: 18145080
Advisor: Assoc. Prof. Do Van Dung
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND EDUCATION
FACULTY FOR HIGH QUALITY TRAINING
Advisor: Assoc. Prof. Do Van Dung
Ho Chi Minh City, 24 December 2022
THE SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom – Happiness
Ho Chi Minh City, December, 2022
GRADUATION PROJECT ASSIGNMENT
Student name: LE CHAN PHAM Student ID: 18145045
Student name: VO HUY VU Student ID: 18145080
Major: Automotive engineering technology
Advisor: Assoc. Prof. DO VAN DUNG Phone number: 0966879932
Date of assignment: October 2022 Date of submission: December 2022
1. Project title: Design Start-Stop circuit through object detection
2. Equipment: Laptop with GPU, HD camera, Arduino UNO
3. Content of the project: Research convolutional neural networks and the YOLO algorithm model, train a YOLO model, evaluate the results, and use the output to control an Arduino
4. Final product: Traffic light detection system working on a webcam, videos, and images
CHAIR OF THE PROGRAM                    ADVISOR
(Sign with full name)                   (Sign with full name)
THE SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom – Happiness
Ho Chi Minh City, December, 2022
ADVISOR’S EVALUATION SHEET
Student name: LE CHAN PHAM Student ID: 18145045
Student name: VO HUY VU Student ID: 18145080
Major: Automotive engineering technology
Project title: Design Start Stop circuit through object detection
6 Mark: ……… (In words: )
Ho Chi Minh City, … month, … day, … year
ADVISOR
(Sign with full name)
THE SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom – Happiness
Ho Chi Minh City, December, 2022
PRE-DEFENSE EVALUATION SHEET
Student name: LE CHAN PHAM Student ID: 18145045
Student name: VO HUY VU Student ID: 18145080
Major: Automotive engineering technology
Project title: Design Start Stop circuit through object detection
6 Mark: ……… (In words: )
Ho Chi Minh City, … month, … day, … year
REVIEWER
(Sign with full name)
THE SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom – Happiness
Ho Chi Minh City, December, 2022
EVALUATION SHEET OF DEFENSE COMMITTEE MEMBER
Student name: LE CHAN PHAM Student ID: 18145045
Student name: VO HUY VU Student ID: 18145080
Major: Automotive engineering technology
Project title: Design Start Stop circuit through object detection
Name of Defense Committee Member: ………
6 Mark: ……… (In words: )
Ho Chi Minh City, … month, … day, … year
COMMITTEE MEMBER
(Sign with full name)
ACKNOWLEDGMENTS
Throughout our studies and the graduation process, our team was always cared for, guided, and assisted by teachers from the Faculty for High Quality Training, and received support and help from friends and colleagues.

First and foremost, we want to thank the Board of Directors of Ho Chi Minh City University of Technology and Education for providing excellent facilities, modern equipment, and a library system with a wide variety of documents, which makes it convenient for students to research information.

We would like to express our gratitude to our advisor, Assoc. Prof. Do Van Dung, for assisting and leading us in completing this project.

Because of the team's limited experience, this study may contain errors. We look forward to receiving feedback and advice from our professors to help us improve our report.

Sincerely, thank you!
Ho Chi Minh City, 24th December 2022
CONTENTS
DISCLAIMER
ACKNOWLEDGMENTS
CONTENTS
ABSTRACT
CHAPTER 1: INTRODUCTION
1.1 Reason for choosing the topic
1.2 Scope of research
1.3 Project structure
1.4 Thesis limitations
CHAPTER 2: FUNDAMENTALS
2.1 Overview of the traffic lights system
2.2 Overview of the Engine Start-Stop system
2.2.1 How does the engine start-stop system work?
2.2.2 What are the benefits of Stop-Start?
2.2.3 What are the downsides of Stop-Start?
2.3 Introduction to Deep Learning
2.3.1 What is Deep Learning?
2.3.2 The difference between Machine Learning and Deep Learning
2.3.3 Some neural networks in Deep Learning
2.4 Overview of Convolutional Neural Networks in image classification
2.4.1 What is a Convolutional Neural Network?
2.4.2 Convolutional Neural Network architecture
2.5 ResNet
2.6 How to detect an object
2.7 Introduction to some object detection algorithms
2.7.1 R-CNN (Region-based Convolutional Neural Networks)
2.7.2 Fast R-CNN
2.7.3 Faster R-CNN
2.7.4 SSD (Single Shot Multi-box Detector)
CHAPTER 3: YOLO ALGORITHM MODEL
3.1 What is YOLO?
3.2 YOLO algorithm model
3.2.1 Prediction output in YOLO
3.2.2 Anchor box
3.2.3 Multi-label image classification
3.2.4 Non-maximum suppression (NMS)
3.2.5 Intersection over Union (IoU)
3.2.6 YOLO network architecture
3.2.7 Loss function
3.2.8 Training YOLO
3.2.9 Mean Average Precision (mAP)
CHAPTER 4: DESIGN IDEAL ENGINE START-STOP SYSTEM MODEL AND ALTERNATIVE ENGINE START-STOP SYSTEM MODEL
4.1 Design of the ideal engine start-stop system model
4.1.1 Changes in the classic engine start-stop system model
4.1.2 Components of the ideal engine start-stop system model
4.1.3 Process of the ideal engine start-stop system model
4.2 Alternative engine start-stop system model
4.2.1 Model overview
4.2.2 Hardware
4.2.3 Software
CHAPTER 5: OPERATION RESULTS AND FUTURE DEVELOPMENT
5.1 Operation
5.1.1 Labeling images for training
5.1.2 Training YOLO on Google Colab
5.1.3 Operating YOLO on Windows
5.1.4 Connecting Arduino to Python
5.2 Final result
5.3 Conclusion
5.3.1 Strengths
5.3.2 Weaknesses
5.4 Future development
5.4.1 Hardware
5.4.2 Software
REFERENCES
LIST OF FIGURES AND TABLES
Figure 2.1 Diagram of the engine start-stop circuit
Figure 2.2 Engine Start-Stop button on a Mercedes
Figure 2.3 Comparison between Machine Learning and Deep Learning
Figure 2.4 The typical structure of an ANN
Figure 2.5 Perceptron
Figure 2.6 A looping constraint on the hidden layer of an ANN turns it into an RNN
Figure 2.7 Operation example of an RNN
Figure 2.8 Unrolled RNN
Figure 2.9 Output of convolution
Figure 2.10 CNN – image classification
Figure 2.11 Comparing the differences between ANN, RNN, and CNN
Figure 2.12 Layers in a CNN network
Figure 2.13 CNN network model – AlexNet
Figure 2.14 The image that the computer sees
Figure 2.15 Convolution between the input and a kernel to generate data for a hidden-layer neuron
Figure 2.16 Example of a convolutional layer
Figure 2.17 Graph of the Sigmoid function
Figure 2.18 Graph of ReLU
Figure 2.19 Graph of the Leaky ReLU function
Figure 2.20 Example of a pooling layer
Figure 2.21 Fully-connected layer
Figure 2.22 The relationship between network depth and performance
Figure 2.23 Residual Block model
Figure 2.24 Object detection in computer vision
Figure 2.25 Image processing diagram
Figure 2.26 R-CNN model
Figure 2.27 R-CNN
Figure 2.28 Fast R-CNN
Figure 2.29 Faster R-CNN model
Figure 2.30 SSD model
Figure 3.1 YOLO input image divided into a 7×7 grid
Figure 3.2 Example of calculating bounding box coordinates at 448×448 size
Figure 3.3 Output of YOLOv3
Figure 3.4 Output of YOLOv3
Figure 3.5 Anchor boxes solve the problem of detecting multiple objects that appear in the same output image area
Figure 3.6 Example of multi-object recognition (person and vehicle) appearing in the same area
Figure 3.7 YOLOv3 can detect objects with similar characteristics, such as "woman" and "person"
Figure 3.8 Ratio between area of overlap and area of union
Figure 3.9 General architecture of YOLO
Figure 3.10 YOLOv1 architecture
Figure 3.11 YOLOv2 architecture
Figure 3.12 YOLOv3 architecture
Figure 3.13 How the classification loss function works
Figure 3.14 Formula to estimate the bounding box from the anchor box
Figure 3.15 MS COCO object detection
Figure 3.16 An object detection model
Figure 3.17 Dense block layers
Figure 3.18 DenseNet
Figure 3.19 Cross-Stage-Partial connection
Figure 3.20 Object detection process
Figure 3.21 Applying SPP in YOLO (without DC block)
Figure 3.22 YOLO with SPP (with DC block)
Figure 3.23 Path Aggregation Network (PAN)
Figure 3.24 The design of the Neck
Figure 3.25 In YOLOv4, the researchers changed the add function to a concatenation function
Figure 3.26 In YOLOv4, the researchers changed the add function to a concatenation function
Figure 3.27 Spatial Attention Module
Figure 3.28 Convolutional Block Attention Module
Figure 3.29 YOLOv4-SAM
Figure 3.30 CutMix data augmentation
Figure 3.31 Mosaic data augmentation
Figure 3.32 DropBlock regularization
Figure 3.33 DropBlock algorithm
Figure 3.34 Class label smoothing
Figure 3.35 Mish activation
Figure 3.36 Output landscape comparison for Mish
Figure 3.37 Multi-input weighted residual connections
Figure 3.38 Depthwise Conv block
Figure 3.39 MobileNetV2 convolution
Figure 3.40 Inverted Residual Block
Figure 3.41 CmBN
Figure 3.42 Net layer in the cfg file
Figure 3.43 The [yolo] layers and the [convolutional] layers before them must be configured when you want to detect selected objects
Figure 3.44 Illustration of TP and FP
Figure 4.1 Classic Engine Start-Stop system
Figure 4.2 Ideal Engine Start-Stop system
Figure 4.3 HD camera
Figure 4.4 NVIDIA Jetson Nano A02
Figure 4.5 What's on the NVIDIA Jetson Nano
Figure 4.6 Battery VARTA AGM LN6 605901053 12V 105Ah
Figure 4.7 Ideal Start-Stop system workflow
Figure 4.8 Alternative Start-Stop workflow
Figure 4.9 HD camera
Figure 4.10 Laptop Dell G3 Gaming with NVIDIA GTX 1050 Ti GPU
Figure 4.11 Laptop specifications
Figure 4.12 Arduino UNO R3
Figure 5.1 Model diagram
Figure 5.2 Training data folder
Figure 5.3 Predefined-classes file
Figure 5.4 A label file of an image
Figure 5.5 Clone YOLOv7 from GitHub
Figure 5.6 Install necessary libraries
Figure 5.7 Trial detection with pretrained weights
Figure 5.8 Unzip training data from Drive
Figure 5.9 Reorganize the training data folder
Figure 5.10 Start training YOLO
Figure 5.11 Trial detection after training, with printed result
Figure 5.12 Open the YOLO path and import libraries
Figure 5.13 Start the detection model by running the detect.py file
Figure 5.14 Real-time detection with an image on a mobile phone
Figure 5.15 Upload example to Arduino
Figure 5.16 Connect Arduino to Python
Figure 5.17 Logic code for connecting Arduino
Figure 5.18 Detect an image from the internet
Figure 5.19 Detect an image from the internet
Figure 5.20 Detect a small, faraway object
Figure 5.21 Detection in low-brightness conditions
Figure 5.22 Detection with lens flare
Figure 5.23 Detect video
Figure 5.24 Detect video with an obstacle
Figure 5.25 Detect in real time
LIST OF ABBREVIATIONS
AI: Artificial Intelligence
CNN: Convolutional Neural Network
RNN: Recurrent Neural Network
ANN: Artificial Neural Network
IoT: Internet of Things
ResNet: Residual Neural Network
R-CNN: Region-based Convolutional Neural Network
Fast R-CNN: Fast Region-based Convolutional Neural Network
Faster R-CNN: Faster Region-based Convolutional Neural Network
SSD: Single Shot Multi-box Detector
YOLO: You Only Look Once
mAP: Mean Average Precision
IOU: Intersection Over Union
GPU: Graphics Processing Unit
CPU: Central Processing Unit
DNN: Deep Neural Network
CUDA: Compute Unified Device Architecture
cuDNN: NVIDIA CUDA Deep Neural Network library
ReLU: Rectified Linear Unit
RoI: Region of Interest
FPS: Frames Per Second
ABSTRACT
The world is witnessing rapid change driven by artificial intelligence. Automobile brands are investing millions of dollars in developing information technology, and thanks to object detection we can build many different automatic systems.

For that reason, we decided to improve the efficiency of the traditional engine start-stop system by adding a traffic light detection system. First, we had to learn about object detection algorithms, as well as the working principle of the engine start-stop system and its electric circuit. In this project, we develop an object detection model based on the YOLOv7 model and use the Python language for operation.
CHAPTER 1: INTRODUCTION
1.1 Reason for choosing the topic:
The working principle of traditional engine start-stop technology is that the engine stops once the brake pedal has been depressed for 2 seconds and runs again when the brake pedal is depressed again, which helps save energy. However, this trigger technology has two important disadvantages:
− When a vehicle stops at a red light for less than 5 seconds, the fuel consumed by reactivating the engine is more than the fuel the engine would have used idling through the red light.
− It only considers the vehicle status (stopping or running) and neglects the road status, especially road congestion, which leads to frequent start-stop activation and in turn affects both vehicle stability and driving comfort.

The main reason for the above disadvantages is the unintelligent trigger of the engine start-stop system. To solve this problem, this project combines the traditional engine start-stop system with traffic light detection using the YOLO algorithm model. The system can effectively improve the driving experience, reduce engine fuel consumption, and help promote traditional engine start-stop technology.

In recent years, the wave of artificial intelligence has been exploding and its applications are endless. AI can be applied in many fields such as healthcare, self-driving cars, smart homes, social media, and space exploration. However, applying AI in the real world requires not only high accuracy but also fast response speed. In object detection, many advanced models have been developed to address this, but most of them cannot be used in real time due to their large computational resource requirements.

For this reason, it is necessary to study computer vision. Nowadays, many countries have been applying computer vision to daily life, such as China's Skynet or BKAV's AI View camera for counting people, social distancing, and face recognition. Thus, it can be seen that using AI to exploit image data has become a trend.

Besides that, YOLO is one of the most advanced object detection models available today. YOLO is an end-to-end model that uses a single deep neural network to train, label, and determine the position of each object appearing in the frame (instead of using two neural networks and training each network separately to give the same predictions, like previous models). It can be said that YOLO built the first approach that makes object detection really feasible in everyday life. Based on the above statements and with the suggestion of our lecturer, we decided to choose the topic "Research object detection technology for vehicles by using Python" as the research topic for our graduation project.
1.2 Scope of research:
The thesis "Design engine start-stop circuit through traffic detection" is carried out with the following aims:
− Learn about the engine start-stop system.
− Learn about CNN networks and the YOLO algorithm.
− Learn how to use (train and detect with) YOLO and its application in different coding environments.
− Detect images and videos in real time using the OpenCV library and the Python programming language.
− Explore the potential of an engine start-stop system using object detection.
− Design an engine start-stop system with an object detection model.
− Simulate an engine start-stop system with an object detection model.
1.3 Project structure:
Chapter 1: Introduction
Chapter 2: Fundamentals
Chapter 3: YOLO algorithm model
Chapter 4: Design ideal engine start-stop model and alternative start-stop model
Chapter 5: Operation, results and future development
1.4 Thesis limitations:
In this project, our team had only 11 weeks to study, research, construct, improve, and develop the model. Because of the time limitation, we did not have enough time to design an ideal engine start-stop system and connect it to a real-life vehicle. In addition, a lack of equipment prevented us from building a complete start-stop circuit: the price of an Engine Start-Stop ECU is too high for us to afford, and the ECU is not available in school resources. Therefore, we are only able to export the digital signal to control an LED instead of sending it to the ECU. Our main goal in this thesis is thus to design, create, and simulate how an object detection system can connect and transmit a signal to the ECU, demonstrated by controlling an LED signal.
CHAPTER 2: FUNDAMENTALS
2.1 Overview of traffic lights system:
Traffic lights are signaling devices positioned at road intersections, pedestrian crossings, and other locations in order to control the flow of traffic. A traffic light normally consists of three signals, transmitting meaningful information to drivers and riders through colors and symbols. The regular traffic light colors are red, yellow, and green, arranged vertically or horizontally in that order. Although this is internationally standardized, variations exist on national and local scales in traffic light sequences and laws.

The method was first introduced in December 1868 on Parliament Square in London to reduce the need for police officers to control traffic. Since then, electricity and computerized control have advanced traffic light technology and increased intersection capacity. The system is also used for other purposes, for example, to control pedestrian movements, variable lane control (such as tidal flow systems or smart motorways), and railway level crossings.

A set of lights, known as a signal head, may have one, two, three, or more aspects. The most common signal type has three aspects facing the oncoming traffic: red on top, amber below, and green below that. Additional aspects may be fitted to the signal, usually to indicate specific restrictions or filter movements.
2.2 Overview of Engine Start Stop system:
A vehicle start-stop system (or stop-start system) automatically shuts down and restarts the internal combustion engine to reduce the amount of time the engine spends idling, thereby reducing fuel consumption and emissions. This is most advantageous for vehicles that spend significant amounts of time waiting at traffic lights or frequently come to a stop in traffic jams. Start-stop technology may become more common with more stringent government fuel economy and emissions regulations.
Figure 2.1 Diagram of the engine start-stop circuit
Figure 2.2 Engine Start-Stop button on a Mercedes
2.2.1 How does the engine start-stop system work?
The start-stop system detects when the car is stationary and, on the basis of sensors, determines a series of other factors about the operating mode of the vehicle. If the driver has stopped at a traffic light and sets the transmission to neutral, the start-stop system stops the engine (the fuel supply system and the engine ignition system temporarily stop working). In some more recent models, the engine even switches off when the speed falls below a certain value.

Although the engine, and therefore the primary source of power for all systems, is switched off, all of the electrical consumers and assistants are still supplied with power by the vehicle's battery. As soon as the clutch is actuated, the automatic start-stop system restarts the engine.

For vehicles with automatic or dual-clutch transmissions, the automatic start-stop system responds to actuation of the brake alone. If the vehicle is braked to a standstill and the driver's foot remains on the brake pedal, the automatic start-stop system stops the engine. When the brake is released, the automatic system starts the engine again.

At this time, the battery drives a current to the starter, which turns the flywheel; fuel continues to be pumped and the engine works again. This process takes less than 1 second.

This system is controlled by the ECU through a main relay. Parameters are received and calculated through the accelerator pedal position sensor, the speed sensor, and the brake pedal position sensor.

When the engine resumes, it drives the electrical systems back to normal operation, and the generator recharges the battery if the battery charge is below the specified level.
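The decision logic described above (for the automatic-transmission variant, which reacts to the brake pedal alone) can be sketched as a simple state machine. This is an illustrative sketch only, not actual ECU code; the class name, signal names, and conditions are assumptions for demonstration:

```python
class StartStopController:
    """Illustrative state machine for an automatic start-stop system
    (automatic-transmission variant: reacts to the brake pedal alone)."""

    def __init__(self):
        self.engine_running = True

    def update(self, speed_kmh, brake_pressed):
        # Stop the engine when the vehicle is at a standstill
        # and the driver keeps a foot on the brake pedal.
        if self.engine_running and speed_kmh == 0 and brake_pressed:
            self.engine_running = False   # fuel supply and ignition paused
        # Restart the engine as soon as the brake is released.
        elif not self.engine_running and not brake_pressed:
            self.engine_running = True    # battery drives the starter
        return self.engine_running
```

Calling `update(0, True)` while driving normally before would stop the engine, and a subsequent `update(0, False)` (brake released) would restart it, mirroring the sequence described in the text.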
2.2.2 What are the benefits of Stop-Start?
The idea behind the start-stop system is simple: if the engine is stopped for short periods, for example while waiting at traffic lights, fuel consumption and emissions are reduced. In this way, the automatic start-stop system helps to save fuel and protect the climate. With this technology, CO2 emissions can be reduced by 3–8%, and fuel consumption by about the same amount. The benefits to the environment and the improved efficiency have caused a rapid spread of automatic start-stop systems to all classes of vehicle. It was estimated that by 2020 this technology could save up to 1.6 billion gallons of fuel (1 gallon = 3.785 liters) and help reduce 8 tons of CO2 emissions at 14 locations around the world.

The main benefits are threefold. Firstly, it reduces pollution: an idling car creates pointless pollution, and by turning the engine off you will not be producing any at all; it also keeps the engine temperature from becoming too hot while the vehicle is stationary. Pollution is an increasing problem in many towns and cities, so every little reduction helps. Secondly, there is a fuel saving to be had. Granted, it is not a huge amount, but if much of your driving is in stop-start traffic it all adds up. A third, minor benefit is that it is quieter and more relaxing sitting in a car that is not thrumming away at idle.
2.2.3 What are the downsides of Stop-Start?
Sadly, it is not a perfect system, and there are some downsides. The primary one is that while the main intention of the device is to lower emissions, you have to wonder if we are robbing Peter to pay Paul: how much pollution is caused by the manufacture of the extra components required, and how much more waste is created at the vehicle's end of life?
Car manufacturers are being pushed to meet ever stricter emissions guidelines, and stop-start technology helps them achieve these targets, but there do not appear to be any studies that take into consideration the levels of pollution caused during production.

As stop-start places extra demand on components, you need specific, powerful batteries and more robust starters and engine mounts. While these should not have lifespans any shorter than those on a regular car, the cost of replacement can be substantially higher than on cars without stop-start, and the added complexity is likely to make labor charges higher on cars undergoing work in these areas.

But the main downside is that a lot of people simply do not like the sensation of their car automatically turning off, and manufacturers have identified that many owners just turn off the feature when they get in the car. It is something they are not used to and do not really understand or fully trust. Our advice, however, is to always leave stop-start engaged if your car is equipped with it.
2.3 Introduction to Deep Learning:
2.3.1 What is Deep Learning:
Deep learning is a subset of machine learning; it is essentially a neural network with three or more layers. These neural networks attempt to simulate the behavior of the human brain (albeit far from matching its ability), allowing them to "learn" from large amounts of data. While a neural network with a single layer can still make approximate predictions, additional hidden layers help to optimize and refine for accuracy.

Deep learning drives many artificial intelligence (AI) applications and services that improve automation, performing analytical and physical tasks without human intervention. Deep learning technology lies behind everyday products and services (such as digital assistants, voice-enabled TV remotes, and credit card fraud detection) as well as emerging technologies (such as self-driving cars).
2.3.2 The difference between Machine Learning and Deep Learning:

- Human intervention: Machine learning requires more ongoing human intervention to get results. Deep learning is more complex to set up but requires minimal intervention thereafter.
- Time: Machine learning systems can be set up and operated quickly, but may be limited in the power of their results. Deep learning systems take more time to set up but can generate results instantaneously.
- Approach: Machine learning requires structured data and uses traditional algorithms like linear regression. Deep learning employs neural networks and is built to accommodate large volumes of unstructured data.
- Applications: Machine learning is used in your email inbox, bank, etc. Deep learning enables more complex and autonomous programs, like self-driving cars or robots that perform advanced surgery.

Table 2.1 The difference between Machine Learning and Deep Learning
Figure 2.3 Comparison between Machine Learning and Deep Learning
2.3.3 Some neural network in Deep Learning:
2.3.3.1 Artificial Neural Network (ANN):
An ANN consists of 3 layers: input, hidden, and output. The input layer accepts the inputs, the hidden layer processes the inputs, and the output layer produces the result. Essentially, each layer tries to learn certain weights.
ANN can be used to solve problems related to:
- Tabular data
- Image data
- Text data
Figure 2.4 The typical structure of an ANN
One of the main reasons behind universal approximation is the activation function. Activation functions introduce nonlinear properties to the network. This helps the network learn any complex relationship between input and output.
For example, consider a perceptron with three inputs a, b, and c, and weights w1, w2, w3:

a × w1 = 0.7
b × w2 = 0.6
c × w3 = 1.4

where w1, w2, w3 are the weights. The weighted sum is 0.7 + 0.6 + 1.4 = 2.7, so the output of the neuron is y = a(x) = a(2.7 + bias), where a(·) denotes the activation function.
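The worked example above can be reproduced in a few lines of Python. The choice of sigmoid as the activation and a zero bias are illustrative assumptions, not part of the original example:

```python
import math

def sigmoid(x):
    # Classic S-shaped activation, squashing x into (0, 1)
    return 1 / (1 + math.exp(-x))

def perceptron(inputs, weights, bias, activation=sigmoid):
    # Weighted sum of inputs plus bias, then the nonlinear activation
    x = sum(i * w for i, w in zip(inputs, weights)) + bias
    return activation(x)

# With unit inputs, the weighted sum is 0.7 + 0.6 + 1.4 = 2.7,
# matching the example in the text.
y = perceptron([1.0, 1.0, 1.0], [0.7, 0.6, 1.4], bias=0.0)
```

Swapping `activation` for the identity function returns the raw weighted sum 2.7, showing that without a nonlinearity the neuron is purely linear.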
2.3.3.2 Recurrent Neural Network (RNN):
An RNN has a recurrent connection on the hidden state. This looping constraint ensures that sequential information is captured in the input data.
We can use recurrent neural networks to solve the problems related to:
- Time series data
- Text data
- Audio data
Figure 2.6 A looping constraint on the hidden layer of an ANN turns it into an RNN
An RNN captures the sequential information present in the input data, i.e., the dependency between the words in a text, while making predictions. As you can see in the figure above, the output (o1, o2, o3, o4) at each time step depends not only on the current word but also on the previous words.

RNNs share their parameters across different time steps. This is popularly known as parameter sharing. It results in fewer parameters to train and decreases the computational cost.
Figure 2.8 Unrolled RNN
As shown in the figure above, the three weight matrices U, W, and V are shared across all the time steps.
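A minimal sketch of one unrolled RNN makes the parameter sharing concrete: the same three matrices U, W, and V are reused at every time step. The dimensions and the tanh nonlinearity here are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 4, 8, 3

# The only parameters of the network, shared across ALL time steps
U = rng.standard_normal((d_hidden, d_in))      # input  -> hidden
W = rng.standard_normal((d_hidden, d_hidden))  # hidden -> hidden
V = rng.standard_normal((d_out, d_hidden))     # hidden -> output

def rnn_forward(xs):
    h = np.zeros(d_hidden)
    outputs = []
    for x in xs:                      # one step per sequence element
        h = np.tanh(U @ x + W @ h)    # hidden state carries the past
        outputs.append(V @ h)         # o_t depends on x_t AND history
    return outputs

seq = [rng.standard_normal(d_in) for _ in range(5)]
outs = rnn_forward(seq)               # 5 outputs from only 3 matrices
```

However long the input sequence is, the parameter count stays fixed at the sizes of U, W, and V, which is exactly the saving the text describes.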
2.3.3.3 Convolutional Neural Network (CNN):
Convolutional neural networks (CNNs) are all the rage in the deep learning community right now. These CNN models are being used across different applications and domains, and they are especially prevalent in image and video processing projects.

The building blocks of CNNs are filters, a.k.a. kernels. Kernels are used to extract the relevant features from the input using the convolution operation. Let's try to grasp the importance of filters using images as input data. Convolving an image with filters results in a feature map:
Figure 2.9 Output of convolution
A CNN learns the filters automatically, without them being specified explicitly. These filters help in extracting the right and relevant features from the input data.
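The convolution of an image with a filter to produce a feature map can be sketched with NumPy. The image values and the vertical-edge kernel below are illustrative; this is a "valid" convolution (no padding, stride 1):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image and take the element-wise
    product-and-sum at every position ('valid' convolution)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y+kh, x:x+kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge_kernel = np.array([[1., 0., -1.],
                        [1., 0., -1.],
                        [1., 0., -1.]])   # simple vertical-edge filter
feature_map = conv2d(image, edge_kernel)  # 5x5 input -> 3x3 feature map
```

A 5×5 input convolved with a 3×3 kernel yields a 3×3 feature map, and in training a CNN it is the kernel values themselves that are learned.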
Figure 2.10 CNN – image classification
A CNN captures the spatial features of an image. Spatial features refer to the arrangement of pixels and the relationships between them in an image. They help us identify an object accurately, as well as its location and its relation to other objects in the image.
Figure 2.11 Comparing the differences between ANN, RNN, and CNN
2.4 Overview of Convolutional Neural Network in image classification:
2.4.1 What is Convolutional Neural Network?
A convolutional neural network (CNN) is a special kind of feedforward network. The CNN is the most popular and advanced deep learning model today: most current image recognition and processing systems use CNNs because of their fast processing speed and high accuracy. In a traditional neural network the layers are considered one-dimensional, while in a CNN the layers are three-dimensional: height, width, and depth. The CNN has two important concepts, the local receptive field and parameter sharing, which contribute to reducing the number of weights that need to be trained, thereby increasing the speed of computation.
Figure 2.12 Layers in a CNN network
CNN is typically composed of three types of layers (or building blocks):
- Convolution
- Pooling
- Fully connected layers
The fully connected layer is like those in regular neural networks, and the convolutional layer performs multiple convolutions on top of the previous layer. The pooling layer reduces the sample size per block of the previous layer. In CNNs, the network architecture typically stacks these three layer types to build the full architecture.
Figure 2.13 CNN network model – AlexNet
Computers see images differently than humans. The image seen by the computer is represented as an array containing the values of its pixels. This array can be a 2-D array for grayscale images or a 3-D array for RGB color images. The term tensor is used for arrays with dimension greater than or equal to 3: a 1-dimensional tensor is an array, and a 2-dimensional tensor is a matrix. A color image of size 512 × 512 pixels is a 3-dimensional tensor (512, 512, 3), where 3 represents the depth, i.e., the R, G, B color channels.

Usually, image processing involves a very large number of parameters. A tensor image (512, 512, 3) has 512 × 512 × 3 = 786,432 input values. For a neural network with 2 hidden layers of 16 neurons each and 2 neurons in the output, the number of parameters to be calculated would be 786,432 × 16 + 16 × 16 + 16 × 2 = 12,583,200. Building a good recognition model would need even more hidden layers and more neurons per layer, so the number of parameters would be larger still. A large number of computations can slow down the model and requires expensive, modern computers; in many cases, the amount of computation can exceed the ability of current computers.
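The arithmetic above can be checked directly (bias terms are omitted, as in the text):

```python
# Fully-connected parameter count for a 512x512 RGB image fed into
# two hidden layers of 16 neurons and an output layer of 2 neurons.
inputs = 512 * 512 * 3                     # 786,432 input values
params = inputs * 16 + 16 * 16 + 16 * 2    # weights per layer, summed
print(params)                              # 12,583,200 weights to train
```

Over twelve million weights for a tiny two-hidden-layer network is exactly the blow-up that motivates the local receptive field and parameter sharing described next.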
Figure 2.14 The image that the computer sees
2.4.2 Convolutional Neural Network Architecture:
2.4.2.1 Local Receptive Field:
Through several studies of image processing, researchers found that features in an image are often local, and pixels that are close together are often interconnected. Thus, the network architecture can be changed from a fully connected network to a locally connected one, reducing computational complexity. This is one of the main ideas behind CNNs.
Figure 2.15 Convolution between input and a kernel to generate data for a hidden layer neuron
A kernel, also known as a filter, is often used to extract features contained in an image. The kernel can be a matrix, or a 3-dimensional tensor if the input is a color image; the depth of the kernel matches the depth of the input. The kernel traverses the entire input and performs a scalar product (elementwise multiply-and-sum) over each region it passes through. These regions are called local receptive fields. The kernel moves over the input like the sliding-window technique in image processing, going from left to right, top to bottom. The result is a feature map containing the results of these scalar products. The depth of the output equals the number of kernels used in that convolutional layer. So a convolutional layer scans the entire input of that layer with its kernels and performs a scalar product on each region a kernel passes through; the end result is the output of the convolutional layer.
Figure 2.16 Example of a convolutional layer
The input to the convolutional layer above is an image of size 32×32×3. The layer uses 17 kernels of size 5×5×3 with stride 1. Each kernel contains 5×5×3 = 75 weights, which are adjusted during the learning process.

In general, for an input of size H×W×D, kernels of size k×k×D, N kernels, stride s, and padding p, the output of a convolutional layer has size:

((H − k + 2p) / s + 1) × ((W − k + 2p) / s + 1) × N

With stride 1 and no padding, the output here is 28×28×17. If, instead of sharing weights, every position the kernel visits across the 32×32 input had its own 5×5×3 weights for each of the 17 kernels, the layer would need 32×32 × (5×5×3) × 17 = 1,305,600 parameters. However, each neuron connects locally to a region of the image through the 5×5×3 kernel, and the same kernel scans the entire 32×32 input. So the actual number of parameters is:

(size of kernel) × (number of kernels) = (5×5×3) × 17 = 1,275

Thanks to CNN's ability to share parameters, the number of parameters to be calculated during training is significantly reduced.
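The output-size formula and the shared-parameter count can be sketched as:

```python
# Output size and parameter count of a convolutional layer, following the
# formula in the text: out = (in - k + 2p) / s + 1 per spatial dimension.

def conv_output_size(h, w, k, n_kernels, stride=1, padding=0):
    """Spatial output size (out_h, out_w, depth) of a conv layer."""
    out_h = (h - k + 2 * padding) // stride + 1
    out_w = (w - k + 2 * padding) // stride + 1
    return out_h, out_w, n_kernels

def conv_param_count(k, depth, n_kernels):
    """Shared weights: one k x k x depth kernel per output channel."""
    return k * k * depth * n_kernels

# The 32x32x3 input with 17 kernels of size 5x5x3, stride 1, from the text:
print(conv_output_size(32, 32, 5, 17))   # (28, 28, 17)
print(conv_param_count(5, 3, 17))        # 1275
```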
2.4.2.3 Activation Function:
In a general neural network, the activation function provides the nonlinear component at the output of the neurons. In classification and recognition problems, the data points are discrete. Without nonlinear activation functions, a neural network of even multiple layers is still only as expressive as a single linear layer, which makes it inapplicable to classification or recognition problems.
Suppose the input is X, the output is Y, and the weights are W. In the first layer, the weighted-sum function gives:
Z1 = W1·X
Z1 is then passed into a linear "activation" g(x) = cx, where c is a real number:
a1 = g(Z1) = c·Z1
Similarly, the output a1 of the first layer is the input of the second layer:
Z2 = W2·a1 = W2·c·Z1 = W2·c·W1·X
So the two layers collapse into a single linear transformation of X, which is why a nonlinear activation is needed.
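The collapse of two layers with a linear activation into one linear map can be checked numerically (the layer sizes below are arbitrary, chosen only for the demonstration):

```python
# Demonstration that stacking layers with a linear activation g(z) = c*z
# collapses to a single linear map: W2 @ (c * (W1 @ x)) == (c * W2 @ W1) @ x.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 8))   # first layer weights
W2 = rng.standard_normal((4, 16))   # second layer weights
x = rng.standard_normal(8)          # input vector
c = 0.5                             # linear "activation" g(z) = c * z

two_layer = W2 @ (c * (W1 @ x))     # forward pass through both layers
collapsed = (c * W2 @ W1) @ x       # equivalent single linear layer

print(np.allclose(two_layer, collapsed))  # True
```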
Sigmoid function
Figure 2.17 Graph of Sigmoid function
The Sigmoid function takes a real number as input and maps it to a value in the range (0, 1). A very small negative input gives an output asymptotic to 0; conversely, a large positive input gives an output asymptotic to 1. In the past, the Sigmoid function was often used because it has a very convenient derivative. However, at present, the Sigmoid function is rarely used, because of the following disadvantages:
- The Sigmoid function saturates and kills gradients: a noticeable disadvantage is that when the input has a large absolute value (negative or positive), the gradient of this function is very close to 0. This means the corresponding weights remain almost unchanged during training (the Vanishing Gradient phenomenon).
- The Sigmoid function is not zero-centered, which makes convergence more difficult.
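The saturation described above can be seen directly from the Sigmoid's derivative, sigmoid'(x) = sigmoid(x)·(1 − sigmoid(x)):

```python
# Sigmoid and its derivative, illustrating saturation: for large |x| the
# gradient sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)) is nearly zero.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

print(round(sigmoid(0.0), 4))       # 0.5
print(round(sigmoid_grad(0.0), 4))  # 0.25  (the maximum gradient)
print(sigmoid_grad(10.0) < 1e-4)    # True  (saturated: gradient vanishes)
```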
ReLU (Rectified Linear Unit) function
Figure 2.18 Graph of ReLU
The ReLU function has been used widely in recent years when training neural networks. ReLU simply clamps values less than 0 to 0. The advantages of ReLU are:
- Much faster convergence: ReLU converges about 6 times faster than Sigmoid, probably because ReLU does not suffer from the Vanishing Gradient problem the way Sigmoid does.
- Faster computation: Sigmoid uses the exponential function and its formula is much more complex than ReLU's, so it costs more to compute.
Leaky ReLU function:
Figure 2.19 Graph of Leaky ReLU function
Leaky ReLU is an attempt to eliminate the "dying ReLU" problem. Instead of returning zero for inputs less than zero, Leaky ReLU returns a slightly sloped line. Many reports find Leaky ReLU more effective than ReLU, but this effect is not clear and consistent.
In addition to Leaky ReLU, there is a well-known variant of ReLU called PReLU. PReLU is similar to Leaky ReLU but allows the neuron to automatically learn the best α coefficient.
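The three ReLU variants differ only in how they treat negative inputs; a minimal sketch (the α values here are common defaults, assumed for illustration):

```python
# ReLU, Leaky ReLU, and PReLU. In PReLU, alpha is a learned parameter;
# here it is passed in as a fixed value just for the demonstration.

def relu(x):
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    return x if x > 0 else alpha * x

def prelu(x, alpha):
    # Same shape as Leaky ReLU, but alpha is learned during training.
    return x if x > 0 else alpha * x

print(relu(-3.0))                      # 0.0
print(round(leaky_relu(-3.0), 4))      # -0.03
print(prelu(-3.0, 0.25))               # -0.75
```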
2.4.2.4 Pooling layer:
The pooling layer reduces the size of the image immediately after a convolution, helping to retain the most prominent features and properties of the image. This reduces the amount of computation when the image is large, while not losing the image's important features.
Although locally connected networks and shared parameters are used, the number of parameters in the neural network is still large; compared to a relatively small data set, this can cause overfitting. Therefore, artificial neural networks often insert pooling layers into the network. The pooling layer gradually reduces the number of parameters to improve computation time. It applies downsampling to the previous layer, typically using the max function, and operates independently on each depth slice of the previous layer. As with the convolutional layer, it is also possible to set the number of pixels the window moves each step, i.e. a stride, for example equal to 2.
Figure 2.20 Example of pooling layer
In the above example, the kernel size is 2×2 and the stride is 2. In each window, the max function takes the maximum value to represent that region in the next layer. There are two types of pooling: if the kernel size equals the stride, it is traditional pooling; if the sliding window is larger than the stride, it is overlapping pooling. In practice, neural networks often use a 2×2 kernel with a stride of 2, or a 3×3 kernel with a stride of 2, because a larger window very easily loses the characteristics of the data. Besides the max function, other functions can be used for pooling; for example, taking the average of the window to compute the value for the next layer is called average pooling.
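The 2×2, stride-2 max pooling described above can be sketched in pure Python (the 4×4 input values are made up for the example):

```python
# 2x2 max pooling with stride 2 on a small grayscale image, as in the
# example above.

def max_pool(image, k=2, stride=2):
    """Max pooling over a 2-D list of pixel values."""
    h, w = len(image), len(image[0])
    out = []
    for i in range(0, h - k + 1, stride):
        row = []
        for j in range(0, w - k + 1, stride):
            # Take the maximum value inside each k x k window.
            window = [image[i + di][j + dj]
                      for di in range(k) for dj in range(k)]
            row.append(max(window))
        out.append(row)
    return out

image = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 1],
    [3, 4, 0, 8],
]
print(max_pool(image))  # [[6, 4], [7, 9]]
```

Replacing `max(window)` with `sum(window) / len(window)` would give average pooling.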
2.4.2.5 Fully-Connected Layer:
The third layer type in a CNN network is the fully connected layer. This layer is like a traditional neural network: the neurons in the previous layer connect to neurons in the next layer, and the last layer is the output. To feed in the feature maps from the previous layers, the data must first be flattened into a one-dimensional vector. Finally, a SoftMax function is used to perform object classification.
Figure 2.21 Fully-Connected Layer
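The final flatten-and-SoftMax step can be sketched as follows (the score values are arbitrary, for illustration only):

```python
# Flattening nested feature maps into a vector and applying SoftMax to
# turn raw class scores into probabilities, as described above.
import math

def flatten(feature_maps):
    """Turn nested lists (e.g. H x W x D feature maps) into a flat vector."""
    flat = []
    for item in feature_maps:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(flatten([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]))  # [1, 2, 3, 4, 5, 6, 7, 8]
probs = softmax([2.0, 1.0, 0.1])
print(round(sum(probs), 6))      # 1.0
print(probs.index(max(probs)))   # 0  (the class with the highest score)
```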
2.5 ResNet:
ResNet (short for Residual Network) is a deep learning network with a CNN architecture that attracted attention after the ILSVRC 2015 competition and became popular in the field of machine vision. ResNet makes it possible and efficient to train networks with hundreds or even thousands of layers.
Since AlexNet, CNN architectures have become deeper and deeper. While AlexNet has only 5 convolutional layers, the VGG and GoogleNet (aka Inception_v1) networks have 19 and 22 layers respectively. However, increasing network depth is more than simply stacking layers together. Deep networks are difficult to train because of the vanishing gradient problem: as the gradient is propagated back to earlier layers, repeated multiplication can make it extremely small. As a result, the network's performance degrades rapidly.
Figure 2.22 The relationship between network depth and performance
The main idea of ResNet is to use an identity shortcut connection that skips one or more layers. Such a block is called a Residual Block, as shown in the following
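The residual block computes y = F(x) + x, where F(x) is the output of the skipped weight layers and x passes through the shortcut unchanged. A minimal sketch, with tiny hypothetical elementwise "layers" standing in for real convolutions:

```python
# Sketch of a residual block: output = F(x) + x. The identity shortcut
# guarantees the input is passed through even if F learns to output ~0,
# which is what eases gradient flow in very deep networks.
# The elementwise-scale "layers" here are hypothetical, for illustration.

def relu_vec(v):
    return [max(0.0, x) for x in v]

def residual_block(x, layer1, layer2):
    """y = layer2(relu(layer1(x))) + x, with elementwise-scale layers."""
    hidden = relu_vec([w * v for w, v in zip(layer1, x)])
    fx = [w * v for w, v in zip(layer2, hidden)]
    return [f + v for f, v in zip(fx, x)]   # add the identity shortcut

# Even if both layers output zeros, the block still passes x through:
x = [1.0, -2.0]
print(residual_block(x, [0.0, 0.0], [0.0, 0.0]))  # [1.0, -2.0]
```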