

VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY

HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY

NGUYEN NGOC TRUC

STUDY ON CAMERA-BASED REAL-TIME CAR SPEED MONITOR USING YOLOv5 MULTIPLE OBJECT DETECTION MODEL

Major: Vehicle Engineering

Major code: 8520116

MASTER’S THESIS

HO CHI MINH CITY, July 2023


THIS THESIS IS COMPLETED AT

HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY – VNU-HCM

Supervisor: Trần Đăng Long, Ph.D

Examiner 1: Trần Hữu Nhân, Ph.D

Examiner 2: Nguyễn Văn Trạng, Ph.D

This master’s thesis was defended at Ho Chi Minh City University of Technology, VNU-HCM, on July 15th, 2023.

Master’s Thesis Committee:

1. Chairman: Lê Tất Hiển, Assoc. Prof., Ph.D
2. Member: Võ Tấn Châu, Ph.D
3. Secretary: Hồng Đức Thông, Ph.D
4. Reviewer 1: Trần Hữu Nhân, Ph.D
5. Reviewer 2: Nguyễn Văn Trạng, Ph.D

Approval of the Chairman of the Master’s Thesis Committee and the Dean of the Faculty of Transportation Engineering after the thesis has been corrected (if any).

CHAIRMAN OF THESIS COMMITTEE

HEAD OF FACULTY OF TRANSPORTATION ENGINEERING


VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY

HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY

THE TASK SHEET OF MASTER’S THESIS

Full name: Nguyễn Ngọc Trực Student code: 2170108

Date of Birth: 30/07/1996 Place of birth: Đăk Lăk

Major: Vehicle Engineering Major code: 8520116

I. THESIS TOPIC: Study on camera-based real-time car speed monitor using YOLOv5 multiple object detection model

II. TASKS AND CONTENTS:

- Develop a traffic sign recognition system, specifically for speed limit signs, from images captured by cameras on the road.

- Employ the Jetson Nano embedded computer as the central processing unit to run the YOLOv5 model for detecting speed limit signs. Simultaneously, compare the sign recognition results with the vehicle's current speed accessed from the OBD-II system. The system then provides a direct alert to the driver on the screen if the speed limit is exceeded.

III. TASKS STARTING DATE: February 6th, 2023

IV. TASKS ENDING DATE: June 12th, 2023

V. INSTRUCTOR: Trần Đăng Long, Ph.D

Ho Chi Minh City, July 15th, 2023.

INSTRUCTOR

(Full name & Signature)

HEAD OF DEPARTMENT

(Full name & Signature)

DEAN - FACULTY OF TRANSPORTATION ENGINEERING

(Full name & Signature)


ACKNOWLEDGEMENT

I would like to express my heartfelt gratitude to my thesis advisor, Tran Dang Long, Ph.D, for his invaluable guidance, unwavering support, and continuous encouragement throughout the entire duration of this thesis. His expertise, insightful feedback, and constructive criticism have immensely contributed to the success of this research endeavor.

I am also deeply grateful to Ho Nam Hoa for his assistance and collaboration in helping me establish the OBD-II CAN communication. His technical knowledge, dedication, and willingness to share his expertise have been instrumental in overcoming challenges and achieving significant milestones in this project.

Furthermore, I extend my sincere appreciation to my friend, Bui Huu Nghia, for his valuable contribution in collecting the dataset. His commitment, attention to detail, and assistance in data acquisition have greatly enriched the quality of this research.

I would also like to acknowledge the support and encouragement received from my family and friends throughout this academic journey.

Lastly, I am grateful to all the individuals who have directly or indirectly contributed to the completion of this thesis. Their support, guidance, and encouragement have been indispensable in shaping my research and personal growth.

Ho Chi Minh City, 15th July, 2023

Researcher,

Nguyen Ngoc Truc


ABSTRACT

This study aims at two objectives. Firstly, the design of a real-time traffic sign detection system for automobiles, with a specific focus on speed limit signs, using the YOLOv5 model. Secondly, the study includes an assessment of the practical implementation of the traffic sign detection system by integrating it with a speed warning system that can be installed in vehicles.

This study includes several key tasks. Firstly, extensive research was conducted to identify real-time detection methods suitable for traffic signs. Subsequently, a comprehensive dataset of speed limit traffic signs was prepared, consisting of 3,200 images. The next step involved training a model for the detection of these speed limit signs, with a resulting mAP of 0.922 across 10 classes. The model was then implemented on a Jetson Nano embedded computer. In parallel, an ESP32 microcontroller was utilized to extract actual vehicle speed data from the OBD-II system. Lastly, the speed limit traffic sign detection system and the actual vehicle speed information were integrated to develop a speed warning system.

The experimental results demonstrate the efficiency of the proposed traffic sign detection system. The YOLOv5 model achieves a real-time detection speed of 4 frames per second (FPS) on the Jetson Nano computer. Moreover, by integrating the speed limit sign detection system with real-time monitoring of the actual vehicle speed, the system enables timely warnings to the driver in the event of exceeding the speed limit.

Additionally, the experimental results showed limitations of the speed limit traffic sign detection system. One such limitation is its inability to detect the number of lanes on the road, which affects its accuracy in providing the precise speed limit, particularly in residential areas. Furthermore, there were instances where untrained traffic signs were mistakenly detected as speed limit signs. To address these issues, it is recommended to expand the training dataset to include a wider range of traffic signs, not limited to speed limit signs alone.

In summary, the developed system exhibits significant potential for applications in the automotive industry, particularly in the field of Advanced Driver Assistance Systems (ADAS).




THE COMMITMENT OF THE THESIS’ AUTHOR

I am Nguyen Ngoc Truc, a Master's student of the Department of Vehicle Engineering, Faculty of Transportation Engineering, class of 2021, at Ho Chi Minh City University of Technology.

I guarantee that the information below is accurate:

(i) I conducted all of the work for this research study by myself.

(ii) This thesis uses actual, reliable, and highly precise sources for its references and citations.

(iii) The information and findings of this study were produced independently and honestly by me.

Ho Chi Minh City, 15th July, 2023

Researcher,

Nguyen Ngoc Truc


Contents

1.1 Background 2

1.2 Literature Review 4

1.2.1 Speed Warning Systems 4

1.2.2 Traffic Sign Detection 7

1.2.3 Object Detectors 8

1.3 Research Objectives 9

1.4 Research Methodology 10

1.5 Research Contents 10

1.6 Scope of Research 12

1.7 Research Contributions 12

1.8 Research Outline 12

2 Fundamentals 14

2.1 Convolutional Neural Networks 15

2.1.1 Convolutional Layer 15

2.1.2 Pooling Layer 17

2.1.3 Fully Connected Layer 18

2.1.4 Activation Function 19

2.2 YOLOv5 21

2.2.1 Introduce to YOLO 21


2.2.2 YOLOv5 Architecture 22

2.3 Evaluation Metrics 26

2.3.1 Confusion Matrix 26

2.3.2 Intersection over Union 28

2.3.3 Precision and Recall 29

2.3.4 Mean Average Precision 30

2.3.5 F1 Score 30

2.4 Toolchain 30

2.4.1 Roboflow 31

2.4.2 Google Colaboratory 31

2.5 Conclusion 32

3 Design A Speed Limit Signs Detection Model 33

3.1 Prepare Dataset 34

3.1.1 Dataset Requirement 35

3.1.2 Dataset Classes 35

3.1.3 Dataset Collection 37

3.1.4 Data Annotation 38

3.1.5 Data Augmentation 38

3.1.6 Dataset Structure 40

3.2 Training Model 41

3.2.1 Install dependencies 41

3.2.2 Download Dataset 41

3.2.3 Training Model Parameters 42

3.2.4 Training Results 44

4 Experimental Evaluations 48

4.1 Experimental Preparation 49

4.1.1 Hardware Circuit Diagram 49

4.1.2 Software Algorithm Flowchart 50


4.1.3 Speed Limit Caching Algorithm 51

4.1.4 Finite State Machine Based Speed Warning Algorithm 53

4.2 Experimental Apparatus 55

4.2.1 Jetson Nano 55

4.2.2 Camera Raspberry Pi V2 57

4.2.3 ESP32 59

4.2.4 CAN Transceiver 60

4.2.5 DC-DC Converter 61

4.2.6 OBD-II Adapter 61

4.3 Deploy on Jetson Nano 62

4.3.1 Build Model Engine 62

4.3.2 Run Model Engine 63

4.4 Experiment Conditions 64

5 Results and Discussions 66

5.1 System Setup 67

5.2 Speed Limit Detection 69

5.2.1 Results 69

5.2.2 Error Cases 72

5.3 Speed Warning Applications 75


List of Figures

1.1 Types of ADAS 2

1.2 GSpeed, based on GPS and developed by iCar 5

1.3 Concept of Smart Road Signs communicate to vehicles 6

1.4 The comparison of YOLOv3 on performance 9

1.5 Research Contents and Workflows 10

2.1 An example of CNN architecture to classify handwritten digits 15

2.2 The Convolution Operation 16

2.3 An example of convolution with stride equal to 2 16

2.4 An example of padding in convolutional 17

2.5 An example of max pooling and average pooling 17

2.6 An example of the fully connected layer’s input multiplied by the weights matrix to receive the output vector 18

2.7 Plot of sigmoid activation function 19

2.8 Plot of tanh activation function 20

2.9 Plot of ReLU activation function 20

2.10 How YOLO works 21

2.11 Darknet-53 Architecture 23

2.12 (a) DenseNet and (b) Cross Stage Partial DenseNet 24

2.13 YOLOv5 Network Architecture 25

2.14 Confusion Matrix Definition 27

2.15 Computing the Intersection over Union 28


2.16 Define TP, FP base on IoU 29

2.17 The computer vision workflow on Roboflow 31

3.1 Dataset Preparation Workflows 34

3.2 Recorded Traffic Signs at Day and Night 38

3.3 Data annotating on roboflow 39

3.4 Image before and after augmentation 39

3.5 Dataset Health Check before Augment 40

3.6 Export Dataset with Download Code 42

3.7 The YOLOv5s Model Training Architecture 43

3.8 Training Results over 100 Epochs 44

3.9 Confusion Matrix 45

3.10 Precision and Recall Curve 46

3.11 F1-Confidence Curve 47

4.1 Concept of Experimental System 48

4.2 Hardware Circuit Diagram 49

4.3 Software Algorithm Flowchart 50

4.4 Speed Limit Caching Algorithm 52

4.5 FSM based speed warning algorithm 53

4.6 Jetson Nano Developer Kit B01 55

4.7 Camera Raspberry Pi V2 58

4.8 Microcontroller ESP32 59

4.9 Module CAN Transceiver SN65HVD230 60

4.10 DC-DC Buck Converter 61

4.11 OBD-II Male Adapter 62

5.1 The system setup for experiment 67

5.2 The system implemented on vehicle 67

5.3 The system was tested on vehicle 68

5.4 Speed detection system being tested in afternoon environments 69


5.5 Speed detection system being tested in nighttime environments 70

5.6 Speed detection system being tested in various environments 70

5.7 Speed detection system being tested in various environments 71

5.8 The width limit sign mistaken for speed limit 50 km/h 72

5.9 Speed limit 80 km/h mistaken for speed limit 60 km/h in a few frames 73

5.10 Warning in case speed exceeds the limit by 1-5 km/h 75

5.11 Warning in case speed exceeds the limit by more than 5 km/h 76

5.12 Warning in case speed falls below the minimum by 1-5 km/h 76

5.13 Warning in case speed falls below the minimum by over 5 km/h 77


List of Tables

2.1 An example to calculate dimension of output activation map 24

3.1 Traffic sign classes 36

4.1 Jetson Nano GPIO 56

4.2 Technical specifications of Jetson Nano Developer Kit B01 57

4.3 Technical specifications of Raspberry Pi Camera Module V2 58

4.4 ESP32 GPIO 59

4.5 Experiment Conditions 64


List of Abbreviations

ADAS Advanced Driver Assistance Systems

ACC Adaptive Cruise Control

LDW Lane Departure Warning

GPS Global Positioning System

AI Artificial Intelligence

CV Computer Vision

CNN Convolutional Neural Networks

YOLO You Only Look Once

OBD-II On-Board Diagnostics II

SWS Speed Warning Systems

ROI Regions of Interest

mAP Mean Average Precision

FPS Frames Per Second

R-CNN Region-Based Convolutional Neural Network


SSD Single Shot MultiBox Detector

RPN Region Proposal Network

tanh hyperbolic tangent

ReLU Rectified Linear Unit

SPP Spatial Pyramid Pooling

ECU Electronic Control Unit

CAN Controller Area Network

FSM Finite State Machine


Chapter 1

Introduction

This introductory chapter of the study consists of 8 sections. It begins with a background explanation, highlighting the motivation behind selecting traffic sign detection as the topic of study. The chapter then delves into an overview of Advanced Driver Assistance Systems (ADAS) and emphasizes the importance of speed warning systems within this context. It explains that the development of a speed warning system necessitates a reliable model for detecting speed limit signs. The objectives of the thesis are subsequently presented, outlining the specific goals to be achieved. The research methodology and the contributions of the study are discussed, along with the defined scope of investigation. Finally, the chapter provides an outline of each subsequent chapter, giving readers a preview of the topics covered in the thesis.


1.1 Background

Figure 1.1: Types of ADAS [1]

In recent years, Advanced Driver Assistance Systems (ADAS) have emerged as a promising approach to enhance driving safety and reduce the number of accidents on the road. ADAS utilize various technologies, such as sensors, cameras, and communication systems, to provide drivers with advanced warning and assistance in critical driving situations.

One of the most common ADAS features is Adaptive Cruise Control (ACC), which helps drivers maintain a safe distance from the vehicle in front by automatically adjusting the speed of the vehicle. Another important ADAS feature is Lane Departure Warning (LDW), which alerts drivers when they are drifting out of their lane. In addition, ADAS can also assist drivers in parking, with features such as parking sensors and automatic parking systems. Blind spot detection systems can also provide drivers with visual or auditory warnings when there is a vehicle or obstacle in their blind spot.

Speed Warning Systems (SWS) are also an important feature of ADAS, as speeding is a common cause of accidents. These systems can be implemented using various technologies, such as Global Positioning System (GPS), camera-based object detection, and communication with roadside infrastructure. Some studies have shown that speed limit warning systems can be effective in reducing speeding behavior and improving road safety [2].

Recent advancements in Artificial Intelligence (AI) and Computer Vision (CV) have led to significant improvements in the accuracy and reliability of ADAS. Deep learning based approaches, such as Convolutional Neural Networks (CNN), have shown promising results in object detection and recognition tasks, which are important for ADAS.

Camera-based object detection has emerged as a promising technology for speed limit warning systems. The You Only Look Once (YOLO) object detection model is a state-of-the-art algorithm that has been shown to be effective in detecting and tracking objects in real time [3]. By using the YOLO model to detect and track speed limit signs on the road, a speed limit warning system can provide accurate and reliable information to the driver about the current speed limit.

In addition to object detection, another key component of a speed limit warning system is the ability to determine the vehicle's current speed. The On-Board Diagnostics II (OBD-II) system is a standard feature in modern vehicles that provides real-time information about the vehicle's performance. By feeding data from the OBD-II system into the speed limit warning system, the system can accurately compare the vehicle speed with the detected speed limit signs and warn the driver if they are exceeding the limit. Several studies have evaluated the effectiveness of speed warning systems in real-world driving environments. These studies have shown that speed warning systems can effectively reduce speeding behavior and improve driver safety.
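The speed-reading step described above can be sketched in a few lines. Under the standard OBD-II request/response format (SAE J1979), mode 0x01 with PID 0x0D returns the vehicle speed as a single byte in km/h; the helpers below are an illustrative decoder, not the ESP32 firmware used in this study.

```python
def build_speed_request() -> bytes:
    """OBD-II mode 0x01 ("show current data") request for PID 0x0D (vehicle speed).

    Layout follows ISO-TP single-frame encoding as sent on the functional
    request CAN ID 0x7DF: [length, mode, pid, padding...].
    """
    return bytes([0x02, 0x01, 0x0D, 0x00, 0x00, 0x00, 0x00, 0x00])


def parse_speed_response(frame: bytes) -> int:
    """Decode an OBD-II response frame to vehicle speed in km/h.

    A positive response echoes the mode + 0x40 (i.e. 0x41) and the PID,
    followed by one data byte A, where speed = A km/h.
    """
    mode, pid = frame[1], frame[2]
    if mode != 0x41 or pid != 0x0D:
        raise ValueError("not a vehicle speed (PID 0x0D) response")
    return frame[3]  # speed in km/h


# Example: an ECU reporting data byte 0x3C (decimal 60)
speed = parse_speed_response(bytes([0x03, 0x41, 0x0D, 0x3C, 0x00, 0x00, 0x00, 0x00]))
```

In the actual system, an ESP32 with a CAN transceiver issues this request on the vehicle bus; the sketch only shows the byte-level encoding and decoding.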


1.2 Literature Review

Speeding is a major cause of road accidents and poses significant risks to both drivers and pedestrians. To address this issue, researchers and engineers have developed various speed warning systems for automobiles. These systems aim to alert drivers when they exceed the speed limit, thereby promoting safer driving behavior. In this literature review, we explore three commonly used methods for implementing SWS: GPS-based systems, systems that communicate with roadside infrastructure, and camera-based systems.

1.2.1 Speed Warning Systems

The SWS comprises two primary components. Firstly, it detects the speed limit corresponding to the specific road infrastructure. Secondly, it continuously monitors the actual speed of the vehicle. By comparing the detected speed limit with the actual vehicle speed, the system determines whether the driver is exceeding the speed limit. If a violation is detected, the system generates appropriate speed warning messages to alert the driver.
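This comparison step can be sketched as a small decision function. The warning bands below (exceeding by up to 5 km/h versus more than 5 km/h) mirror the cases shown in the result figures of Chapter 5; the function itself is an illustrative sketch rather than the system's actual code.

```python
def speed_warning(vehicle_speed: float, speed_limit: float) -> str:
    """Compare the actual vehicle speed (e.g. from OBD-II) against the
    detected speed limit and return a warning level."""
    excess = vehicle_speed - speed_limit
    if excess <= 0:
        return "OK"                  # at or below the limit
    elif excess <= 5:
        return "MILD_WARNING"        # exceeding by 1-5 km/h
    else:
        return "STRONG_WARNING"      # exceeding by more than 5 km/h

# e.g. a detected "speed limit 60" sign with the vehicle at 67 km/h
state = speed_warning(67, 60)
```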

GPS-based SWS rely on GPS technology to determine the vehicle's current location and calculate the corresponding speed limit. These systems typically use data from GPS satellites to determine the vehicle's location and speed, and cross-reference this information with a digital map to determine the speed limit for that particular stretch of road [4]. Recent studies have evaluated the effectiveness of SWS in improving driver behavior and reducing the number of accidents on the road. A study conducted by Song Wang et al. [5] found that SWS were effective in reducing speeding behavior among drivers, and were particularly effective in areas with high accident rates. Furthermore, there are several popular SWS available in different countries, such as the speed limit warning feature in Google Maps [6], which is currently available in over 40 countries excluding Vietnam. In the Vietnamese market, there are also speed warning systems like Vietmap [7], which utilize GPS and are directly integrated into their dash cameras. Another recently introduced software is GSpeed by iCar [8], which was launched in June 2023 and can be integrated into the car's monitor. The utilization of GPS-based methods is widespread, but it necessitates a substantial database. This approach also has certain drawbacks, such as the lack of real-time updates: in some instances, the speed limit may have changed, but the system still relies on outdated information from its database. Additionally, when two parallel routes exist, the system may struggle to accurately detect the correct road being traveled on.

Figure 1.2: GSpeed, based on GPS and developed by iCar [8]
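The map cross-referencing idea behind GPS-based SWS can be illustrated with a minimal nearest-point lookup; the road database and coordinates below are hypothetical, and a production system would match full road segments rather than single reference points.

```python
from math import radians, sin, cos, asin, sqrt

# Hypothetical digital map: (lat, lon) reference point -> speed limit in km/h
ROAD_DB = [
    ((10.7769, 106.7009), 50),   # urban street
    ((10.8231, 106.6297), 80),   # ring road
]

def haversine_km(p, q):
    """Great-circle distance between two (lat, lon) points in km."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def lookup_speed_limit(position):
    """Return the speed limit of the nearest mapped road point."""
    return min(ROAD_DB, key=lambda entry: haversine_km(position, entry[0]))[1]
```

The parallel-road ambiguity mentioned above arises precisely in this matching step: two nearby entries with different limits can be almost equidistant from the measured position.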

Another approach to SWS involves communication between the vehicle and roadside infrastructure. These systems rely on the exchange of information between the vehicle and infrastructure, such as traffic signs or intelligent transportation systems. By receiving speed limit data from the infrastructure, such as smart road signs, the system can promptly warn the driver if they are driving above the prescribed limit.

In 2016, Sharpe et al. [9] implemented wireless communication between road signs and vehicles in order to determine the speed limit and issue warning messages if the driver exceeds the limit. The system involved multiple microcontrollers that communicated with each other: one microcontroller was integrated into the traffic signs, while another was installed in the vehicle, allowing data to be exchanged between them. This approach enabled accurate monitoring of the speed limit for the vehicle. However, it should be noted that this method requires extensive investigation and implementation of infrastructure on the road, which may not be feasible in the current traffic conditions in Vietnam.

Figure 1.3: Concept of Smart Road Signs communicate to vehicles [9]

Camera-based SWS utilize computer vision techniques to detect and recognize speed limit signs. In a study conducted by Chang et al. in 2015 [10], they developed a speed warning system for automobiles using computer vision techniques on a mobile device. Their approach involved extracting red color pixels to define Regions of Interest (ROI) and utilizing pre-defined template numbers for pattern matching. However, this method had limitations. One drawback was that traffic signs can vary in their fonts, requiring a diverse range of template numbers for accurate detection. Additionally, environmental conditions such as rain or nighttime can cause blurriness in the traffic signs, posing further challenges for detection.
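The red-pixel ROI idea from Chang et al. can be sketched with plain NumPy. The color thresholds here are illustrative guesses, not the criteria from the cited study, and the hypothetical helper simply returns the bounding box of red-dominant pixels as a crude sign-region proposal.

```python
import numpy as np

def red_roi(image: np.ndarray):
    """Return (top, left, bottom, right) of red-dominant pixels in an
    RGB uint8 image, or None if no red pixels are found."""
    r = image[..., 0].astype(int)
    g = image[..., 1].astype(int)
    b = image[..., 2].astype(int)
    # Illustrative "red" test: strong red channel dominating green and blue
    mask = (r > 120) & (r - g > 50) & (r - b > 50)
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max())

# A 10x10 black image with a red 3x3 patch at rows/cols 2..4
img = np.zeros((10, 10, 3), dtype=np.uint8)
img[2:5, 2:5, 0] = 200
```

Such fixed thresholds break down exactly in the conditions noted above (rain, nighttime, faded signs), which is part of what motivates the learned detector used in this thesis.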

Overall, SWS play a crucial role in promoting safe driving practices and reducing the occurrence of speeding-related accidents. This literature review examined three popular methods for implementing speed warning systems: camera-based systems, GPS-based systems, and systems that communicate with roadside infrastructure. Each method offers unique advantages and has been the subject of extensive research. Camera-based systems leverage computer vision techniques to accurately detect speed limit signs, while GPS-based systems utilize GPS technology to determine the vehicle's position and calculate the corresponding speed limit. Communication-based systems enable real-time information exchange between the vehicle and roadside infrastructure. Further research and advancements in these areas can contribute to the development of more robust and effective SWS, ultimately enhancing road safety and reducing the risks associated with speeding.

1.2.2 Traffic Sign Detection

Since the 2010s, there has been a growing trend in utilizing camera-based object detection systems for the purpose of traffic sign detection. This approach involves the application of deep learning algorithms, particularly CNN, which have shown remarkable capabilities in accurately detecting and recognizing various types of traffic signs [11, 12]. These systems can be trained to recognize a variety of traffic signs including speed limit signs, and are able to work in a variety of lighting and weather conditions. In 2022, a comparative experiment was conducted on the German Traffic Sign Recognition Benchmark dataset [13] with 43 classes, specifically comparing the performance of two popular object detection algorithms: Faster Region-Based Convolutional Neural Network (R-CNN) [14] and YOLOv4 [15]. The results of this experiment revealed that Faster R-CNN achieved a Mean Average Precision (mAP) of 43.26% while operating at a speed of 6 Frames Per Second (FPS). On the other hand, YOLOv4 exhibited superior performance with an mAP of 59.88% at a significantly higher detection speed of 35 FPS. These findings highlight the suitability of YOLOv4 for real-time traffic sign detection, offering a combination of higher precision and faster detection speed.

In summary, the use of deep learning for traffic sign detection has extensive applications and contributions. However, there is currently a gap in the implementation of this technology for speed warning systems. Therefore, this study aims to fill this gap by evaluating the application of deep learning algorithms in traffic sign detection for speed warning purposes.


1.2.3 Object Detectors

Object detection is a fundamental problem in computer vision, with many applications such as autonomous driving and intelligent transportation systems. The two main categories of object detection methods are one-stage and two-stage detectors. One-stage detectors such as YOLO [3] and Single Shot MultiBox Detector (SSD) [16] can detect objects in a single pass, while two-stage detectors such as Faster R-CNN [14] and Mask R-CNN [17] first propose object regions before detecting the objects within those regions.

Faster R-CNN is a two-stage object detection method that first proposes object regions and then classifies objects within those regions. It uses a Region Proposal Network (RPN) to propose regions that might contain objects and then uses a second network to classify objects within those regions. Faster R-CNN has high accuracy but is slower than one-stage detectors such as YOLO and SSD [18].

Mask R-CNN extends Faster R-CNN by adding a branch to predict object masks in addition to object classes and bounding boxes. It achieves state-of-the-art accuracy in object detection and instance segmentation tasks, but it is computationally expensive and has a slow detection speed.

YOLO is a popular one-stage object detection method that uses a single neural network to predict bounding boxes and class probabilities. It divides the input image into a grid of cells and predicts the class probabilities and bounding boxes for each cell. YOLO has a fast detection speed and can achieve real-time performance on low-power devices [19].

SSD is another one-stage object detection method that predicts object classes and bounding boxes from feature maps of different resolutions. It uses convolutional filters of different sizes to detect objects at different scales. SSD is faster than Faster R-CNN, but its accuracy is slightly lower, especially for small objects [18].
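The grid-cell scheme described for YOLO can be illustrated as follows: the input image is divided into an S x S grid, and the cell containing a box's center is the one responsible for predicting it. This is a schematic sketch of the idea, not YOLOv5's implementation.

```python
def responsible_cell(box_center, image_size, grid=13):
    """Map a bounding-box center (x, y) in pixels to the (row, col) of the
    grid cell responsible for predicting it, for a square input image
    divided into grid x grid cells."""
    x, y = box_center
    cell_size = image_size / grid
    return int(y // cell_size), int(x // cell_size)

# For a 416x416 input and a 13x13 grid, each cell covers 32x32 pixels,
# so a box centered at (208, 208) falls in cell (6, 6).
```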


Figure 1.4: The comparison of YOLOv3 on performance [20]

Figure 1.4 illustrates the performance comparison between YOLOv3 and other methods. It can be observed that YOLOv3 outperforms them in terms of detection speed, indicating its suitability for real-time object detection applications.

In summary, one-stage detectors such as YOLO and SSD are faster but have lower accuracy compared to two-stage detectors such as Faster R-CNN and Mask R-CNN. The choice of which method to use depends on the specific application requirements, such as speed and accuracy.

1.3 Research Objectives

This study has two objectives. The primary objective is to develop a real-time traffic sign detection system, focusing specifically on the detection of speed limit signs, utilizing the YOLOv5 model. The aim is to design and implement an efficient and accurate algorithm that can detect and recognize speed limit signs in real-time scenarios.


Secondly, the study aims to evaluate the practical application of the developed traffic sign detection system by integrating it with a speed warning system. This involves utilizing the trained speed limit sign detection model to monitor and compare the actual vehicle speed with the speed limit. By implementing the speed warning system, the study aims to provide timely warnings to the driver in the event of exceeding the speed limit, promoting safer driving practices.

Overall, this research seeks to contribute to the field of Computer Vision (CV) and traffic safety by designing and implementing an effective traffic sign detection system and demonstrating its practical application in a Speed Warning System (SWS).

1.4 Research Methodology

The research methodology employed in this study is empirical research, which involves gathering real-world data and conducting experiments to test the effectiveness and performance of the developed system. The research focuses on training a model and implementing the speed warning system on an embedded device, followed by testing and evaluation in various real-world scenarios.

1.5 Research Contents

Figure 1.5: Research Contents and Workflows

The research consists of four main sections: researching ADAS and object detection methods, preparing the training dataset, training the traffic sign detection model, and evaluating its performance through experimentation to validate its detection capabilities.

The first section focuses on researching ADAS and various object detection methods. It involves studying the existing literature, analyzing different approaches, and understanding the principles behind ADAS and object detection technologies.

The second section of the research centers on preparing the training dataset for the traffic sign detection model. It involves collecting relevant data, such as images or videos of traffic signs, and annotating them with appropriate labels. The dataset needs to be diverse and representative to ensure effective training of the model.

The model training section involves training the traffic sign detection model using the prepared dataset. It includes selecting an appropriate model architecture, configuring the training parameters, and optimizing the model through the training process.

The final section of the research entails running the traffic sign detection model on an embedded device and integrating it with OBD-II communication to read the vehicle speed data. This allows for conducting experiments in real-world scenarios. The testing procedures are designed to evaluate the system's performance, including its detection accuracy, real-time capabilities, and overall effectiveness in providing speed warnings.

In conclusion, the study aims to gain insights into the object detection methods suitable for camera-based speed warning systems. It also aims to thoroughly understand and utilize the traffic sign detection model. Finally, the research aims to evaluate the implemented system by conducting experiments in real-world scenarios and analyzing the results using appropriate evaluation metrics. The findings from this research will contribute to advancing the field of speed warning systems and provide valuable insights for future enhancements and applications.


CHAPTER 1 INTRODUCTION

1.6 Scope of Research

This study has a defined scope of research that is guided by certain limitations. The study is conducted within the boundaries set by these limitations.

The traffic sign detection covers 10 classes: speed limit 50 km/h, 60 km/h, 70 km/h, 80 km/h, 100 km/h, 120 km/h, start of residential area, end of residential area, and end of speed limit.

The system does not include the ability to recognize auxiliary signs attached to the main speed limit signs. The system is also not able to set the priority between temporary signs and permanent signs.

1.7 Research Contributions

This study makes two significant contributions. Firstly, in terms of scientific contribution, it provides an evaluation of and insights into the practical applications of the YOLOv5 object detection model specifically for traffic sign detection.

Secondly, in terms of practical significance, it extends the application of artificial intelligence to the field of ADAS, bringing advancements and potential benefits to the automotive industry.

1.8 Research Outline

Chapter 1 provides an introduction to the objectives and scope of the thesis and presents a comprehensive review of the relevant literature and previous studies related to traffic sign detection and speed warning systems.

Chapter 2 describes the theory of the deep learning model architecture and the other fundamentals needed to run the model on embedded devices.


Chapter 3 describes the data collection, pre-processing, and model training, and evaluates the model training results.

Chapter 4 presents the experimental setup to evaluate the traffic sign detection model and the speed warning system application.

Chapter 5 shows the experimental results, demonstrating both the detection capabilities and the instances of errors.

Chapter 6 presents the conclusions of the study and areas for future research.


Chapter 2

Fundamentals

This chapter aims to explain Convolutional Neural Networks (CNNs), a deep learning algorithm. This knowledge serves as the foundation for comprehending the YOLO model, which will be utilized for traffic sign detection in this study. The chapter explains the details of the YOLOv5 architecture and explores the evaluation metrics used to assess the model's capabilities. Additionally, it provides an overview of the toolchain required for training the YOLO model.


2.1 Convolutional Neural Networks

Convolutional Neural Networks (CNNs) have emerged as a powerful and widely used deep learning algorithm in various domains, including computer vision. CNNs have revolutionized the field of image recognition and analysis by demonstrating superior performance in tasks such as object detection, image classification, and semantic segmentation. This introduction aims to provide an explanation of CNNs, their underlying principles, and their significance in deep learning.

Figure 2.1: An example of CNN architecture to classify handwritten digits [21]

The convolutional layers play a crucial role in extracting features from the input data. During the convolution operation, a convolution kernel is applied to the input matrix of the layer. The kernel performs a dot product with the input matrix, typically using the Frobenius inner product. Convolutional layers perform convolutions on the input data and transmit the output to the subsequent layer.

The result of a convolutional layer is also called a feature map.

Figure 2.2: The Convolution Operation [21]

Using the example illustrated in Figure 2.2, with an input image of a single channel (K = 1) and dimensions of 6x6 (W = 6), a filter (or kernel) of size 3x3 (F = 3) applied to the input image with a stride of 1 (S = 1) and no padding (P = 0), the dimension of the output feature map (also known as the activation map) can be calculated using the following formula:

O = (W − F + 2P) / S + 1 = (6 − 3 + 2·0) / 1 + 1 = 4

Figure 2.3: An example of convolution with stride equal to 2 [21]
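To make the output-size formula concrete, the following is a minimal NumPy sketch of a single-channel convolution with no padding. It is an illustrative toy implementation, not the code used in the model; the function names are chosen for this example only.

```python
import numpy as np

def conv2d_output_size(W, F, S=1, P=0):
    """Output side length of a square convolution: (W - F + 2P) / S + 1."""
    return (W - F + 2 * P) // S + 1

def conv2d(image, kernel, stride=1):
    """Naive 2D convolution of a single-channel image, no padding."""
    W, F = image.shape[0], kernel.shape[0]
    out = conv2d_output_size(W, F, S=stride)
    result = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            patch = image[i * stride:i * stride + F, j * stride:j * stride + F]
            result[i, j] = np.sum(patch * kernel)  # Frobenius inner product
    return result

image = np.arange(36, dtype=float).reshape(6, 6)  # 6x6 input (W = 6)
kernel = np.ones((3, 3)) / 9.0                    # 3x3 averaging filter (F = 3)
feature_map = conv2d(image, kernel)               # stride 1, no padding
print(feature_map.shape)  # (4, 4), matching (6 - 3 + 0)/1 + 1 = 4
```

Running the sketch with the values from the example above yields a 4x4 feature map, in agreement with the formula.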

Padding refers to the technique of adding extra pixels to the borders of an image or feature map before applying a convolution operation. The purpose of padding is to preserve the spatial dimensions of the input while preventing information loss at the borders.

Figure 2.4: An example of padding in convolution [21]

Figure 2.5: An example of max pooling and average pooling [21]
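The max and average pooling operations illustrated in Figure 2.5 can be sketched in a few lines of NumPy. This is a toy illustration with non-overlapping windows, not the pooling implementation used in the model.

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling: stride equals the window size."""
    H, W = x.shape
    # Group the image into size x size blocks, then reduce each block.
    blocks = x[:H - H % size, :W - W % size].reshape(
        H // size, size, W // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))  # average pooling

x = np.array([[1., 3., 2., 4.],
              [5., 7., 6., 8.],
              [4., 2., 1., 3.],
              [0., 6., 5., 7.]])
print(pool2d(x, mode="max"))  # [[7. 8.] [6. 7.]]
print(pool2d(x, mode="avg"))  # [[4. 5.] [3. 4.]]
```

Each 2x2 window is reduced to a single value, halving both spatial dimensions, which is the usual role of pooling between convolutional stages.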


A fully connected layer, also known as a dense layer or a fully connected neural network layer, is a type of layer in a neural network where each neuron is connected to every neuron in the previous layer. In other words, the outputs of all neurons in the previous layer serve as inputs to every neuron in the fully connected layer.

In a fully connected layer, each neuron performs a weighted sum of its inputs, followed by the application of an activation function. The weights and biases associated with each connection are learned during the training process of the neural network.

The purpose of fully connected layers is to enable the network to learn complex nonlinear relationships between the input data and the target output. These layers are often added at the end of the network, following a series of convolutional or pooling layers, to perform high-level feature extraction and classification.

Figure 2.6: An example of the fully connected layer's input multiplied by the weights matrix to produce the output vector [22]

In Figure 2.6, a specific example of a fully connected layer is illustrated. The input to this layer is a vector with dimensions 1x9, meaning it has 9 elements. The output vector of the fully connected layer is obtained by performing a dot product between the input vector and the weights matrix, followed by a non-linear transformation using an activation function. The resulting output vector has dimensions 1x4, representing four learned features or aspects.
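The 1x9-to-1x4 computation described above can be sketched as follows. This is a toy NumPy example with random weights; the choice of ReLU as the activation is for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=(1, 9))  # input vector, dimensions 1x9
W = rng.normal(size=(9, 4))  # weights matrix, one column per output neuron
b = np.zeros((1, 4))         # bias vector

def relu(z):
    return np.maximum(0.0, z)

# Each output neuron computes a weighted sum of all 9 inputs,
# followed by a non-linear activation.
y = relu(x @ W + b)
print(y.shape)  # (1, 4) -> four learned features
```

In a trained network, W and b would be learned during training rather than drawn at random.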

An activation function in a neural network is a mathematical function that introduces non-linearity to the network's output. It is applied to the weighted sum of the inputs in a neuron, determining whether the neuron should be activated (fire) or not. The activation function adds non-linearity to the network, enabling it to learn complex patterns and make more accurate predictions. Without an activation function, the neural network would simply be a linear combination of the input values, which limits its representation and learning capabilities.

Commonly used activation functions include the sigmoid function, hyperbolic tangent (tanh) function, and Rectified Linear Unit (ReLU) function.

The sigmoid function maps the input to a value between 0 and 1:

σ(x) = 1 / (1 + e^(−x))

Figure 2.7: Plot of the sigmoid activation function [23]


The tanh function maps the input to a value between −1 and 1.

Figure 2.8: Plot of the tanh activation function [23]

The ReLU function sets negative inputs to zero and keeps positive inputs unchanged, i.e., f(x) = max(0, x).

Figure 2.9: Plot of the ReLU activation function [23]
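The three activation functions can be written directly from their definitions; the following is a small NumPy sketch for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # maps input to (0, 1)

def tanh(x):
    return np.tanh(x)                 # maps input to (-1, 1)

def relu(x):
    return np.maximum(0.0, x)         # zero for negatives, identity for positives

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))  # approximately [0.119 0.5 0.881]
print(relu(x))     # [0. 0. 2.]
```

Note how each function squashes or clips its input into the range described above, which is what introduces the non-linearity into the network.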


The YOLO method uses CNNs, the deep learning algorithm described above, to recognize objects in real time. As the name indicates, the technique needs only one forward propagation through a neural network in order to detect objects.

Figure 2.10: How YOLO works [3]

YOLO first divides the image into grids before performing object detection in a single step (Figure 2.10). These grids are of the same S × S size and are used to find and locate any objects that might be present in each of these areas. For each grid cell, bounding box coordinates, B, are predicted for any prospective objects along with their object labels and a probability score for their presence. These predictions are encoded as an S × S × (B × 5 + C) tensor, where C is the number of classes.
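The size of this prediction tensor follows directly from S, B, and C. In the sketch below, the grid size S = 7 and B = 2 boxes per cell are illustrative values only; C = 10 matches the number of traffic sign classes in this study.

```python
# Shape of the YOLO prediction tensor: S x S x (B * 5 + C).
# Each of the S*S grid cells predicts B boxes (x, y, w, h, confidence = 5
# numbers per box) plus C class probabilities shared by the cell.

def yolo_output_shape(S, B, C):
    return (S, S, B * 5 + C)

print(yolo_output_shape(S=7, B=2, C=10))  # (7, 7, 20)
```

Increasing the grid size S allows more objects to be localized, at the cost of a quadratically larger prediction tensor.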

Another important aspect of YOLOv5 is its flexibility and scalability. The model is available in various sizes, ranging from the small YOLOv5n model to the large YOLOv5x model, which can handle complex object detection tasks. Additionally, YOLOv5 supports a wide range of input image sizes, making it suitable for a variety of applications.

The normal YOLO network consists of three main parts:

• Backbone is composed of convolutional layers that extract high-level features from the input image.

• Neck helps to fuse the features from the backbone network and improve the detection accuracy.

• Head is responsible for predicting the bounding boxes and class probabilities for the detected objects.


In YOLOv5, the backbone consists of CSPDarknet-53 (Cross-Stage Partial Network [24]), which is a deeper and wider version of Darknet-53 used in YOLOv3 [20].

Figure 2.11: Darknet-53 Architecture [20]

CSPDarknet-53 is 53 layers deep and introduces the concept of cross-stage partial connections, where the input feature maps are split into two paths. One path goes through a convolutional block while the other path bypasses the block. The outputs of both paths are then concatenated, creating a fused representation. The architecture leverages the cross-stage partial connections to improve the information flow and promote better feature learning. This architectural modification has been shown to bring performance gains over Darknet-53, resulting in improved object detection accuracy.
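The split-transform-concatenate pattern of a cross-stage partial connection can be sketched on raw arrays. This is a toy illustration of the connection pattern only; `conv_block` here is a shape-preserving stand-in, not an actual CSPDarknet block.

```python
import numpy as np

def conv_block(x):
    """Stand-in for a convolutional block: any transform keeping H x W x C."""
    return np.maximum(0.0, x * 0.5 + 0.1)

def csp_block(x):
    """Cross-stage partial connection: split the channels into two paths,
    transform one path, bypass the other, then concatenate both."""
    c = x.shape[-1] // 2
    path_a, path_b = x[..., :c], x[..., c:]  # split the feature maps
    path_a = conv_block(path_a)              # only one path is processed
    return np.concatenate([path_a, path_b], axis=-1)  # fused representation

x = np.random.default_rng(0).normal(size=(20, 20, 64))  # H x W x channels
y = csp_block(x)
print(y.shape)  # (20, 20, 64): same shape, but only half the channels processed
```

Because only half of the channels pass through the convolutional block while the other half are carried through unchanged, the pattern reduces computation and preserves gradient flow across the stage.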

In the Darknet-53 architecture [20], there are five convolutional layers that modify the size of the input image. Depending on the size of the input image, the final activation map will have dimensions equal to the input image size divided by 32. Table 2.1 provides an illustrative example.

Figure 2.12: (a) DenseNet and (b) Cross Stage Partial DenseNet [24]

Table 2.1: An example of calculating the dimensions of the output activation maps

Convolution Layer     Input size     Output size
P1 [64, 6, 2, 2]      640x640x3      320x320x64
P2 [128, 3, 2]        320x320x64     160x160x128
P3 [256, 3, 2]        160x160x128    80x80x256
P4 [512, 3, 2]        80x80x256      40x40x512
P5 [1024, 3, 2]       40x40x512      20x20x1024

One observation is that as the width and height of the output activation map decrease, the depth or number of channels will increase.
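The output sizes in Table 2.1 follow from repeatedly halving the spatial dimensions; the short sketch below reproduces the last column of the table (the channel counts are taken directly from the table).

```python
# The five stride-2 convolutions in Table 2.1 each halve the spatial size,
# so a 640x640 input ends up at 640 / 2**5 = 640 / 32 = 20.
channels = [64, 128, 256, 512, 1024]  # output channels of P1..P5

size = 640
sizes = []
for i, c in enumerate(channels, start=1):
    size //= 2  # each Pi layer halves height and width (stride 2)
    sizes.append(size)
    print(f"P{i}: {size}x{size}x{c}")
# Last line printed: P5: 20x20x1024, i.e. the input size divided by 32
```

This also makes the observation above explicit: the spatial size shrinks by a factor of 2 per stage while the channel count grows.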

Neck

The neck network consists of two convolutional layers that reduce the spatial dimensions of the feature maps and merge them to produce a single feature map with higher resolution.

In the YOLOv5 object detection architecture, the "neck" refers to the set of layers that follow the backbone and precede the head. The purpose of the neck is to process the feature maps produced by the backbone and extract higher-level representations that are more suitable for the detection task.

In YOLOv5, the neck is composed of a single module called Spatial Pyramid


References

[1] "What is ADAS (advanced driver assistance systems)? – overview of ADAS applications," Jun. 2023. [Online]. Available: https://www.synopsys.com/automotive/what-is-adas.html
[2] L. Masello, G. Castignani, B. Sheehan, F. Murphy, and K. McDonnell, "On the road safety benefits of advanced driver assistance systems in different driving contexts," Transportation Research Interdisciplinary Perspectives, vol. 15, p. 100670, 2022. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2590198222001300
[3] J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," CoRR, vol. abs/1506.02640, 2015. [Online]. Available: http://arxiv.org/abs/1506.02640
[4] S. Yogarayan, A. Azman, S. F. Abdul Razak, M. F. A. Abdullah, S. Z. Ibrahim, and K. J. Raman, "Design and development of android based speed limit warning application," Journal of Telecommunication, Electronic and Computer Engineering (JTEC), vol. 13, no. 1, pp. 83–87, Mar. 2021. [Online]. Available: https://jtec.utem.edu.my/jtec/article/view/5942
[5] S. Wang, Y. Wang, Q. Zheng, and Z. Li, "Guidance-oriented advanced curve speed warning system in a connected vehicle environment," Accident Analysis & Prevention, vol. 148, p. 105801, 2020.
[6] A. Krok, "Google maps adds speed limit, speed camera data for more than 40 countries," May 2019. [Online]. Available: https://www.cnet.com/roadshow/news/google-maps-speed-limit-camera-data-more-countries/
[7] "Camera hanh trinh o to," Jun. 2023. [Online]. Available: https://vietmap.vn/camera-hanh-trinh
[8] "Pham mem canh bao gioi han toc do gspeed," Jun. 2023. [Online]. Available: https://icar.vn/san-pham/canh-bao-toc-do-gioi-han-gspeed/
[12] S.-K. Tai, C. Dewi, R.-C. Chen, Y.-T. Liu, X. Jiang, and H. Yu, "Deep learning for traffic sign recognition based on spatial pyramid pooling with scale analysis," Applied Sciences, vol. 10, no. 19, 2020. [Online]. Available: https://www.mdpi.com/2076-3417/10/19/6997
[13] J. Stallkamp, M. Schlipsing, J. Salmen, and C. Igel, "The german traffic sign recognition benchmark: a multi-class classification competition," in The 2011 International Joint Conference on Neural Networks (IJCNN), 2011.
[14] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," Advances in Neural Information Processing Systems, vol. 28, 2015.
[15] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "YOLOv4: Optimal speed and accuracy of object detection," ArXiv, vol. abs/2004.10934, 2020.
[16] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single shot multibox detector," in European Conference on Computer Vision. Springer, 2016, pp. 21–37.
[19] T. Diwan, G. Anirudh, and J. V. Tembhurne, "Object detection using YOLO: challenges, architectural successors, datasets and applications," Multimedia Tools and Applications, vol. 82, no. 6, pp. 9243–9275, Mar. 2023. [Online]. Available: https://doi.org/10.1007/s11042-022-13644-y
[21] L. S. P. D. Prijono, "Student notes: Convolutional neural networks (CNN) introduction," Mar. 2018. [Online]. Available: https://indoml.com/2018/03/07/
[23] Wikipedia contributors, "Activation function," Jul. 2023. [Online]. Available: https://en.wikipedia.org/w/index.php?title=Activation_function&oldid=1163058295
[24] C. Wang, H. M. Liao, I. Yeh, Y. Wu, P. Chen, and J. Hsieh, "CSPNet: A new backbone that can enhance learning capability of CNN," CoRR, vol. abs/1911.11929, 2019. [Online]. Available: http://arxiv.org/abs/1911.11929
[25] R. Xu, H. Lin, K. Lu, L. Cao, and Y. Liu, "A forest fire detection system based on ensemble learning," Forests, vol. 12, p. 217, Feb. 2021.
[26] D. Shah, "Mean average precision (mAP) explained: Everything you need to know," Apr. 2023. [Online]. Available: https://www.v7labs.com/blog/mean-average-precision
[27] "Roboflow: Give your software the power to see objects in images and video," Jun. 2023. [Online]. Available: https://roboflow.com/
[28] S. Tomar, "Converting video formats with ffmpeg," Linux Journal, vol. 2006, no. 146, p. 10, 2006.
