MINISTRY OF EDUCATION AND TRAINING
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND EDUCATION
FACULTY FOR HIGH QUALITY TRAINING

GRADUATION PROJECT
COMPUTER ENGINEERING TECHNOLOGY

DESIGN AND IMPLEMENTATION OF CLASSIFICATION AND DELIVERY BASED ON COMPUTER VISION
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND EDUCATION
FACULTY FOR HIGH QUALITY TRAINING

GRADUATION PROJECT

DESIGN AND IMPLEMENTATION OF CLASSIFICATION AND DELIVERY BASED ON COMPUTER VISION

PHẠM MINH QUÂN, Student ID: 18161031
NGUYỄN HOÀI PHƯƠNG UYÊN, Student ID: 18119053
Major: COMPUTER ENGINEERING TECHNOLOGY
Advisor: TRƯƠNG QUANG PHÚC, M.Eng.

Ho Chi Minh City, December 2022
THE SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom – Happiness

Ho Chi Minh City, December 25, 2022
GRADUATION PROJECT ASSIGNMENT
Student name: Phạm Minh Quân
Student name: Nguyễn Hoài Phương Uyên
Major: Computer Engineering Technology
Advisor: Trương Quang Phúc, M.Eng.
Student ID: 18161031
Student ID: 18119053
Class: 18119CLA
Phone number: _
Date of assignment: _
Date of submission: _
1. Project title: Design and Implementation of Classification and Delivery Based on Computer Vision
2. Initial materials provided by the advisor: _
3. Content of the project: _
4. Final product:
CHAIR OF THE PROGRAM ADVISOR
(Sign with full name) (Sign with full name)
Trương Quang Phúc
THE SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom – Happiness

Ho Chi Minh City, December 25, 2022
ADVISOR’S EVALUATION SHEET
Student name: Phạm Minh Quân
Student name: Nguyễn Hoài Phương Uyên
Student ID: 18161031
Student ID: 18119053
Major: Computer Engineering Technology
Project title: Design and Implementation of Classification and Delivery Based on Computer Vision
Advisor: Trương Quang Phúc, M.Eng.
EVALUATION
1. Content of the project:
2. Strengths:
3. Weaknesses:
4. Approval for oral defense? (Approved or denied)
5. Overall evaluation: (Excellent, Good, Fair, Poor) Good
6. Mark: 9.0 (in words: )
Ho Chi Minh City, December 25, 2022
ADVISOR
(Sign with full name)
Trương Quang Phúc
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND EDUCATION
FACULTY FOR HIGH QUALITY TRAINING
THE SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom – Happiness

Ho Chi Minh City, January 13, 2023
MODIFYING EXPLANATION OF THE GRADUATION PROJECT
MAJOR: COMPUTER ENGINEERING
1. Project title: Design and Implementation of Classification and Delivery Based on Computer Vision
2. Student name: Phạm Minh Quân, Student ID: 18161031
   Student name: Nguyễn Hoài Phương Uyên, Student ID: 18119053
3. Advisor: Trương Quang Phúc, M.Eng.
4. Defending Council: Council 2, Room A3-404, January 3, 2023
5. Modifying explanation of the graduation project:
No. | Council comment | Editing result
1 | Many figures in chapter 2 are reused from other sources without providing the related references. | The related references have been provided for the figures in chapter 2.
2 | The visual quality of many figures in chapter 3 is very low and hard to follow. | Many figures in chapter 3 have been modified to improve their visual quality.
3 | In the conclusion, the authors should clearly point out which objectives have been accomplished instead of giving a general summarization. | The conclusion section now clearly points out which objectives have been accomplished.
4 | The flowchart in figure 3.6 must have a "Begin" point and an "End" point in a terminator shape. | The flowchart in figure 3.6 has been modified accordingly.
Head of Department Advisor Students
(Sign with full name) (Sign with full name) (Sign with full name)
Trương Quang Phúc
THE SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom – Happiness

Ho Chi Minh City, December 25, 2022
PRE-DEFENSE EVALUATION SHEET
Student name: Phạm Minh Quân
Student name: Nguyễn Hoài Phương Uyên
Student ID: 18161031
Student ID: 18119053
Major: Computer Engineering Technology
Project title:
Name of Reviewer:
EVALUATION
1. Content and workload of the project:
2. Strengths:
3. Weaknesses:
4. Approval for oral defense? (Approved or denied)
THE SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom – Happiness

EVALUATION SHEET OF DEFENSE COMMITTEE MEMBER
Student name: Phạm Minh Quân
Student name: Nguyễn Hoài Phương Uyên
Student ID: 18161031
Student ID: 18119053
Major: Computer Engineering Technology
Project title:
Name of Defense Committee Member:
EVALUATION
1. Content and workload of the project:
2. Strengths:
3. Weaknesses:
4. Overall evaluation: (Excellent, Good, Fair, Poor)
SUPERVISOR APPROVAL
Over the course of undertaking this project, our group received plenty of valuable support that encouraged us to overcome all the problems and challenges and to complete this demanding and meaningful project.

Firstly, we would like to thank the School Board of the Ho Chi Minh City University of Technology and Education and the Faculty for High Quality Training for creating wonderful conditions for us to carry out our project.

Secondly, we sincerely thank Mr. Trương Quang Phúc, our advisor, who gave us useful guidance and instruction that helped us finish our project successfully. From his advice we were able to improve the content of our project and correct our mistakes.

Thirdly, we are grateful to all of our classmates in class 18119CLA, who contributed advice and warm guidance whenever we needed support.

Last but not least, due to our limited knowledge and implementation time, we could not avoid all errors. We look forward to receiving your comments and suggestions to improve this topic.

In short, we sincerely thank all the people who are a part of our achievement.

Ho Chi Minh City, December 23, 2022
Phạm Minh Quân
Nguyễn Hoài Phương Uyên
Currently, both national and international industries are growing quickly, and manufacturers are interested in combining industry with automation. With the advancement of digital technology, automatic lines are becoming more widely used in manufacturing. Manufacturing companies are constantly improving their own technology and machinery systems in order to produce high-quality products at the most competitive prices. That is the foundation for improving their competitive position and helping businesses stand firm in a competitive market. This report delves deeper into the role of automation in modern manufacturing, because the logistics industry provides numerous benefits, such as increased productivity, lower labor costs, improved product quality, and lower raw material costs.

Our team decided to implement the topic "Automatic parcel classification and delivery through image processing" after gathering information and researching the automation industry. A convolutional neural network, one of the most widely used networks in the field for character recognition and barcode recognition, is employed. We chose a convolutional neural network because its deep architecture and large number of parameters make it good enough for object recognition. We additionally prepared a sufficiently large dataset, including a few particular examples, to give the training procedure excellent outcomes.

To optimize inference and training time, we also chose NVIDIA's Jetson Nano hardware to exploit the GPU's processing capability. JPG files with the collected data are used for both testing and training.
TABLE OF CONTENTS
LIST OF PICTURES vi
LIST OF TABLES viii
ABBREVIATIONS ix
CHAPTER 1: OVERVIEW 1
1.1 Introduction 1
1.2 Objective 2
1.3 Limitation 2
1.4 Research Method 2
1.5 Object and Scope of the study 2
1.5.1 Object of the study 3
1.5.2 Scope of the study 3
1.6 Outline 4
CHAPTER 2: BACKGROUND 6
2.1 AI Technology 6
2.1.1 Overview of CNN 6
2.1.2 Yolo network 9
2.1.3 Yolov7 12
2.1.4 OCR Theory 16
2.1.5 Tesseract model 17
2.2 Barcode Technology 18
2.2.1 Introduction to barcode 18
2.2.2 Barcode types 19
2.2.3 The methods of Barcode scanning 20
2.2.4 Code 128 22
2.3 The overview of AGV 24
2.3.1 The introduction of AGV 24
2.3.2 The fundamental architecture of an AGV system 25
2.4 PYQT5 Platform 29
2.5 Firebase 31
2.5.1 Introduction to Firebase 31
2.5.2 Some features of Firebase 32
2.5.3 The pros and cons of firebase 33
2.6 Other techniques used in the project 33
2.6.1 The working principle of the infrared sensor circuit in the vehicle's line detector 33
2.6.2 Pulse width modulation (PWM) 34
2.6.3 General operating principles of Automatic Traction Robot 35
2.6.4 The method for establishing the robot's location in relation to the line 36
2.6.5 Serial Peripheral Interface (SPI) 39
CHAPTER 3: DESIGN AND IMPLEMENTATION 43
3.1 System requirements 43
3.2 Block diagram 43
3.3 AI System 45
3.3.1 Hardware Design 45
3.3.2 Detail Software Design 47
3.4 AGV SYSTEM 54
3.4.1 Mechanical design 55
3.4.2 The detail hardware design 60
3.4.3 The schematic diagram of AGV system 71
3.4.4 Software Design 72
3.5 User interface of the delivery application 74
3.6 Firebase Realtime database 75
CHAPTER 4: RESULT 77
4.1 Introduction: 77
4.2 Hardware implementation 77
4.3 System Operation 80
4.4 Software System 82
4.4.1 The barcode generation process 82
4.4.2 Result of labelling process 83
4.4.3 Annotations for data 84
4.4.4 Training process 85
4.4.5 The text detecting process by Tesseract OCR model 86
4.4.6 Interface of delivery app 87
4.5 Evaluation and comparison 88
4.5.1 Comparison of Yolov7 with other versions of Yolo 88
4.5.2 Comparison between Yolo and other CNNs 92
CHAPTER 5: CONCLUSION AND FUTURE WORK 94
5.1 Conclusion 94
5.2 Future Work 94
APPENDIX 95
REFERENCE 95
LIST OF PICTURES
Figure 2.1: Architecture of CNN network 6
Figure 2.2: This is an image of sliding kernel through input’s matrix 7
Figure 2.3: Architecture of Yolo network 10
Figure 2.4: Identify anchor box of an object 11
Figure 2.5: The active way of YOLO 12
Figure 2.6: Architecture of E-ELAN 14
Figure 2.7: Architecture of Compound Model Scaling in YOLOv7 15
Figure 2.8: Architecture of Planned Re-parameterized Convolution 16
Figure 2.9: Architecture of Coarse for Auxiliary and Fine for Lead Loss 16
Figure 2.10: CCD Scanner 21
Figure 2.11: Laser Scanner 21
Figure 2.12: Read barcodes with Camera Software 22
Figure 2.13: Code 128 23
Figure 2.14: The part of a Code 128 Barcode 24
Figure 2.15: AGV vehicle operation diagram 25
Figure 2.16: The basic structure of an AGV system 25
Figure 2.17: Towing type 27
Figure 2.18: Cargo type 28
Figure 2.19: Forklift 28
Figure 2.20: Inference of Qt Designer Software 29
Figure 2.21: Image of QMainWindow 31
Figure 2.22: Firebase Database 32
Figure 2.23: Working principle 34
Figure 2.24: Principle diagram of infrared sensor 34
Figure 2.25: Time diagram of the PWM pulse 35
Figure 2.26: General structure of line detection robot 36
Figure 2.27: The robot is in the middle of the line 36
Figure 2.28: The robot is moving to the right level 1 37
Figure 2.29: The robot is moving to the right level 2 37
Figure 2.30: Robot turns left 38
Figure 2.31: Sensor deviating from line 38
Figure 2.32: Communication between 1 master and 1 slave 39
Figure 2.33: Independent mode in SPI protocol 40
Figure 2.34: Daisy mode in SPI protocol 40
Figure 2.35: SPI protocol operation mode 41
Figure 2.36: The communication process between master and slave uses SPI protocol 42
Figure 3.1: Block diagram of automatic classification and transformation system 44
Figure 3.2: The detailed block diagram of the AI system 45
Figure 3.3: Top view of Jetson Nano 46
Figure 3.4: Top view of Logitech C310 HD Webcam 47
Figure 3.5: The pipeline of AI system 48
Figure 3.6: Flowchart of generating bar code 49
Figure 3.7: Pipeline of pre-processing image 50
Figure 3.8: Format to export dataset 50
Figure 3.9: Pipeline of training data using Yolov7 model 51
Figure 3.10: Command Line to train dataset on Google Colab 52
Figure 3.11: Command Line to test image from dataset 52
Figure 3.12: Pipeline of processing of Tesseract 53
Figure 3.13: The flowchart of detecting barcode 54
Figure 3.14: Top view of a V1 reducer wheel 56
Figure 3.15: Coordinate system of robot AGV 56
Figure 3.16: Model of forces acting on the wheel 57
Figure 3.17: Dual Shaft Plastic Geared TT Motor 58
Figure 3.18: Calculation model and force analysis when the car is cornering 59
Figure 3.19: The 3D design of robot AGV 60
Figure 3.20: The block diagram of AGV system 61
Figure 3.21: The topview of ESP32 63
Figure 3.22: RFID MFRC 522 Module 64
Figure 3.23: Line Detection and Obstacle Avoidance Sensor 5 LED BFD-1000 65
Figure 3.24: Top view of the SG90 Servo Motor 66
Figure 3.25: The detail schematic for buzzer block 67
Figure 3.26: Top view of module L298N driver motor 68
Figure 3.27: Top view of V1 geared DC motor 68
Figure 3.28: Top view of Lithium-ion 18650 battery 69
Figure 3.29: Top view of LM2586HVS 3A DC to DC Step Down Buck Converter 70
Figure 3.30: The schematic of power supply block 70
Figure 3.31: The schematic diagram of AGV system 71
Figure 3.32: The operating diagram of AGV system 72
Figure 3.33: The flowchart of main program 73
Figure 3.34: The operating diagram of delivery application 74
Figure 3.35: The database of system 75
Figure 4.1: An automatic parcel classify and delivery system model 77
Figure 4.2: AI system model 78
Figure 4.3: Model of vehicle system AGV 79
Figure 4.4: Line structure 80
Figure 4.5: The shape of the cargo block 80
Figure 4.6: AI system successfully recognizes and predicts barcodes 81
Figure 4.7: AGV vehicle at the delivery location 82
Figure 4.8: AGV vehicle returns to the starting position 82
Figure 4.9: Result of generating bar-code folder process 83
Figure 4.10: Result of each bar code in Code128 format 83
Figure 4.11: Result of labelling each image 84
Figure 4.12: Result of add annotation process 85
Figure 4.13: Yolov7 dataset training results were successful 86
Figure 4.14: The model's performance when tested with the input image 86
Figure 4.15: Accuracy of input image after testing process 87
Figure 4.16: The interface of delivery app 88
Figure 4.17: Comparison of mAP and FPS between Yolov7 with Yolov5, Yolov6 on CPU 89
Figure 4.18: Comparison of mAP and FPS between Yolov7 with Yolov5, Yolov6 on GPU 89
Figure 4.19: Comparison of mAP and FPS between Yolov7 with Yolov5, Yolov6 on TESLA P100 90
Figure 4.20: Comparison of AP and inference time between Yolov7 models and models of other Yolo versions 92
LIST OF TABLES
Table 2.1: Describe the types of 1D barcodes used in industries 19
Table 2.2: The pros and cons of firebase 33
Table 3.1: Specifications Dual Shaft Geared Plastic TT Motor 58
Table 3.2: Power consumption estimate of AGV system 70
Table 4.1: Comparison of mAP and FPS between Yolov7 and other versions of Yolo 91
Table 4.2: Comparison between Yolo and other CNNs 92
LIST OF ABBREVIATIONS
YOLO You only look once
OCR Optical Character Recognition
CNN Convolutional Neural Network
SSD Single Shot MultiBox Detector
RCNN Region-based Convolutional Neural Network
IoT Internet of Things
IoU Intersection over Union
mAP Mean Average Precision
FPS Frames per second
AGV Automated Guided Vehicle
CHAPTER 1: OVERVIEW
1.1 Introduction
In the current period of economic opening, Vietnam's economy is facing many development opportunities, and the logistics service industry is one of the economic prospects bringing positive results for the country. With a developing economy, multimodal transport (logistics) services have become a service industry that integrates many high value-added activities and brings great economic benefits. Vietnam's favorable business environment and high development opportunities promise strong growth of this service market in the coming time. Developing logistics in low- and middle-income countries can boost trade growth and benefit both businesses and consumers through cheaper prices and guaranteed service quality. However, our country's logistics service industry currently has many limitations; in order to thrive, it is necessary to consider many factors and development directions. For example, at present, Vietnam's small logistics businesses are still at a low level in applying technology to their business activities. In particular, many shipping businesses today still classify products manually, which has limitations such as a high error rate, time-consuming sorting, and high labor costs when processing large volumes of products. Therefore, applying science and technology to logistics activities is essential to overcome the remaining limitations in small and medium enterprises in Vietnam.
Artificial Intelligence (AI) is a global technology trend, attracting investment from businesses that apply it to their business, production, and management processes. Any business that knows how to make the most of the superior benefits of AI technology will certainly have a strong advantage in the growth race of the digital transformation era. One of the breakthroughs of AI is Deep Learning, whereby businesses can easily apply artificial intelligence to solve problems in everyday life.
Due to the outstanding advantages of Deep Learning compared to current algorithms, our team decided to apply Deep Learning to a model that performs automatic parcel classification, in order to optimize the product classification process, reduce the rate of errors caused by humans, reduce labor costs, and shorten delivery times in the field of logistics. This can help us better understand how such a system operates and form clearer judgments about its advantages and disadvantages. From the above considerations, we decided to carry out the project "Automatic parcel classification and delivery with AI technology" as our graduation project.
1.2 Objective

The objectives of this project are:

• Building an AGV (Automated Guided Vehicle) controlled by an AI system through a WIFI connection to transport parcels to each fixed compartment.

• Building tracking software that displays the location of the autonomous vehicle, the location of each dispatch box, and the number of parcels in each dispatch box.
1.3 Limitation
The project has the following limitation:
• The topic mainly focuses on identifying and classifying parcels with Code 128 barcodes; identification of Code 128 on packages is performed in good lighting conditions with a direct shooting angle.

• The parcels to be classified are small in size and weight.

• The AGV vehicle system for product sorting has a low load capacity and operates indoors, away from direct sunlight.

• The tracking software can only be used on computers.
1.4 Research Method
Analyzing and evaluating the energy efficiency, processing speed, and performance of neural network models for barcode recognition on embedded systems.

Studying the parameters of the neural network model, then designing the network model and training the system to execute barcode recognition.

Analyzing and evaluating the system's functions, then selecting the hardware for the AI system and the AGV (Automated Guided Vehicle).
1.5 Object and Scope of the study
1.5.1 Object of the study

To make it easier to approach the problems, the group researched the following subjects to better understand how the topic could be implemented:
• Nvidia Jetson Nano Developer Kit as the hardware for AI application deployment: The product is a small but powerful embedded computer that can run modern AI algorithms quickly, with a 64-bit quad-core ARM CPU, an onboard 128-core NVIDIA GPU, and 4 GB of LPDDR4 memory. It can run multiple neural networks in parallel and process several high-resolution sensors simultaneously.
• Neural networks: Yolo (You Only Look Once) and OCR (Optical Character Recognition).
• MCU ESP32 as the control device of the AGV vehicle system for parcel sorting: The product is a WiFi transceiver kit based on the ESP32 WiFi SoC and the powerful CP2102 communication chip. It is used in applications that need to connect, collect data, and control devices over WiFi, especially IoT applications.
• PyQt5 framework for building monitoring and operating programs: Qt is a cross-platform application framework developed in the C++ programming language that is used to create desktop, embedded, and mobile apps. Linux, OS X, Windows, VxWorks, QNX, Android, iOS, BlackBerry, Sailfish OS, and many other platforms are supported. PyQt is the Python interface for the Qt library, which is a collection of control interface components (widgets, graphical control elements).
• Firebase Realtime Database for communication between the Jetson Nano, the ESP32, and the monitoring software via a WIFI connection: Firebase is a Google-owned platform that helps developers build web and mobile apps. It provides many useful tools and services for developing a quality application, which shortens development time and helps the app become available to users sooner. The Firebase Realtime Database is a cloud-hosted, NoSQL real-time database that allows you to store and sync data. The data is stored as a JSON tree and is synchronized in real time across all connections.
1.5.2 Scope of the study
The topic is limited in scope according to its purpose. In this report, the team analyzes the advantages of product classification by barcode using Deep Learning compared to traditional methods, in the form of a descriptive hardware design analysis. At the same time, the group implements a model that performs automatic parcel classification, including:

• An AI system with the function of recognizing barcodes and classifying them according to each corresponding compartment.

• An AGV vehicle system controlled by the AI system through a WIFI connection to deliver parcels to each respective compartment.

• Tracking and operating software that displays the location of the autonomous vehicle, the number of parcels at each receiving location, and the name of each receiving location.
1.6 Outline
In this report, the research team has tried to present the material logically so that readers can easily understand the knowledge, methods, and operation of the topic. The report is divided into five chapters as follows:
Chapter 1: Overview. In this chapter, the group presents the current research status and development trends of artificial intelligence. At the same time, we raise the urgency of applying artificial intelligence to the classification of goods in the field of logistics, and from there propose an automatic parcel sorting system that applies AI technology to solve the limitations of manual product classification. Finally, the group sets out the goal, objects, and scope of research for implementing this system.
Chapter 2: Background. This chapter focuses on theories related to the topic, including knowledge of the neural networks, electronic components, and software used in the system.
Chapter 3: Design and Implementation. This chapter presents the model of the system in detail, including the block diagram and the operating principle of the system. It then describes the system design: which modules, electronic components, and neural network model are selected to achieve the highest efficiency, and the connection diagram between those modules and components. Finally, based on the system design, the hardware and software are built, and the operating procedure of the system is given.
Chapter 4: Results. This chapter presents the implementation results and makes comments and evaluations against the theory presented in Chapter 2.
Chapter 5: Conclusion and Future Work. This chapter summarizes what has been done and the remaining limitations, and evaluates the system so that solutions and new development directions can be proposed for the topic.
CHAPTER 2: BACKGROUND

In this chapter, we provide an overview of the technologies and methods employed in this field, including AI technology, the AGV vehicle system, barcodes, and others.

2.1 AI Technology

2.1.1 Overview of CNN

A CNN is typically built from an input layer followed by convolutional, pooling, and fully-connected layers.

Figure 2.1: Architecture of CNN network [1]
Input layer: since CNN is inspired by the ANN model, its input is an image, which holds the raw pixel values.
Convolutional layer: through the calculation of the scalar product between their weights and the region connected to the input volume, this layer determines the output of neurons that are connected to local regions of the input.

Pooling layer: simply downscales the input along its spatial dimensions, significantly lowering the number of parameters within that activation.

Fully connected layer: then carries out the ANN's normal functions and attempts to produce class scores from the activations for categorization. Additionally, it is proposed that ReLU be applied between these layers to enhance performance. The rectified linear unit, also known as ReLU, applies an element-wise activation function to the output of the activation produced by the previous layer. Below, we specifically analyze the convolutional layer, the pooling layer, and the fully connected layer.
Convolutional layer: as its name suggests, the convolutional layer is crucial to how CNNs work. The layer's parameters center on the use of learnable kernels.

These kernels usually have a small spatial dimension but extend through the entire depth of the input. When the data enters the convolutional layer, the layer convolves each filter across the spatial dimensions of the input, creating a 2D activation map.

The input matrix is processed with a matrix called the kernel to create a feature map for the following layer. We carry out the convolution operation by sliding the kernel matrix over the input matrix: at each position, we perform an element-by-element matrix multiplication and sum the results into the feature map.
Figure 2.2: This is an image of sliding kernel through input’s matrix [2]
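The sliding-kernel operation described above can be sketched in a few lines of NumPy. The function name and the toy matrices below are ours, chosen only for illustration (stride 1, no padding):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding); at each position,
    multiply element-by-element and sum the products into the feature map."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

image = np.array([[1., 2., 3.],
                  [4., 5., 6.],
                  [7., 8., 9.]])
kernel = np.array([[1., 0.],
                   [0., 1.]])  # adds the two diagonal pixels of each 2x2 patch
print(conv2d(image, kernel))   # [[ 6.  8.] [12. 14.]]
```

Each entry of the 2x2 output is the sum of products of one 2x2 patch of the input with the kernel, exactly the operation shown in Figure 2.2.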
Convolution can be applied over more than one axis. If we have a two-dimensional image input, I, and a two-dimensional kernel filter, K, the convolved image is calculated as:

S(i, j) = Σm Σn I(i + m, j + n) K(m, n)

For example, if the network's input is an image of size 64x64x3 (an RGB color image with a spatial dimensionality of 64x64) and the receptive field size is set to 6x6, each neuron in the convolutional layer would have a total of 108 weights (6x6x3, where 3 is the magnitude of connectivity across the volume's depth). To put this in context, a standard neuron in other forms of ANN would have 12,288 weights.
Convolutional layers can also significantly reduce the model's complexity through output optimization. The output is tuned using three hyperparameters: depth, stride, and zero-padding.

The depth of the output volume produced by the convolutional layers can be set manually through the number of neurons within the layer that connect to the same region of the input. This can be seen in other types of ANNs, where all of the neurons in the hidden layer are directly connected to every single neuron in the previous layer. Reducing this hyperparameter significantly reduces the total number of neurons in the network, but it also significantly reduces the model's pattern recognition capabilities.
We can also define the stride, which is the step we take across the spatial dimensions of the input when placing the receptive field. For example, if we set the stride to 1, we will have heavily overlapping receptive fields and extremely large activations. Alternatively, increasing the stride reduces the amount of overlap and produces an output with lower spatial dimensions.
Zero-padding is the simple process of padding the input's border, and it is an effective way to give more control over the dimensionality of the output volumes.
It is critical to understand that by employing these techniques, we change the spatial dimensionality of the output of the convolutional layers. The following formula is provided by the author to calculate it:

(V − R + 2Z) / S + 1
where V denotes the input volume size (height, width, depth), R the receptive field size, Z the amount of zero-padding set, and S the stride. If the calculated result of this equation is not a whole integer, the stride has been set incorrectly, and the neurons will not fit neatly across the given input.
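The formula and the 64x64x3 example above can be checked with a short helper (the function name is ours):

```python
def conv_output_size(v, r, z, s):
    """Spatial output size of a convolutional layer: (V - R + 2Z) / S + 1.
    Raises if the result would not be a whole integer (invalid stride)."""
    size, remainder = divmod(v - r + 2 * z, s)
    if remainder:
        raise ValueError("stride is set incorrectly: neurons do not fit the input")
    return size + 1

# 64x64 input, 6x6 receptive field, no zero-padding, stride 2 -> 30x30 output
print(conv_output_size(64, 6, 0, 2))  # 30

# each neuron in that layer carries 6 * 6 * 3 = 108 weights for an RGB input
print(6 * 6 * 3)  # 108
```

With stride 1 and zero-padding of 1, a 3x3 kernel preserves the input size, e.g. (7 − 3 + 2)/1 + 1 = 7.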
Despite our best efforts, if we use an image input of any realistic dimensionality, our models will still be enormous. However, methods have been developed for greatly reducing the overall number of parameters within the convolutional layer.

The assumption behind parameter sharing is that if one feature is useful to compute at one spatial position, it is likely to be useful at another.

If we constrain each individual activation map within the output volume to the same weights and bias, the number of parameters produced by the convolutional layer will be drastically reduced.

As a result, when the back-propagation stage occurs, each neuron in the output will represent the overall gradient, which can be totaled across the depth, so only a single set of weights is updated rather than all of them.
Pooling layer: The goal of pooling layers is to gradually reduce the dimensionality of the representation, and thus the number of parameters and the computational complexity of the model. The pooling layer runs over each activation map and scales its dimensionality using the "MAX" function. Most CNNs use max-pooling layers with kernels of dimensionality 2x2 applied with a stride of 2 along the spatial dimensions of the input. This reduces each activation map to 25% of its original size while keeping the depth volume at its original size. Because of the pooling layer's destructive nature, only two methods of max-pooling are commonly observed; typically, the pooling layer's stride and filter size are both set to 2.
Fully connected layer: The neurons in the fully-connected layer are directly connected to the neurons in the two adjacent layers, but there are no connections between neurons within the same layer.
2.1.2 Yolo network
The overview of Yolo
Yolo (You Only Look Once) is a CNN network model for object detection, classification, and recognition. Yolo combines convolutional layers and connected layers: the convolutional layers extract features from images, while the connected layers predict the probabilities and coordinates of objects.

Yolo is not the most accurate algorithm, but it is the fastest among object recognition models. It can reach near-real-time speeds, while its accuracy is not significantly lower than that of the top models. Because Yolo is an object detection technique, the model's goal is both to predict labels for objects, as in classification tasks, and to locate the objects. As a result, instead of assigning a single label to an image, Yolo can detect multiple objects with different labels in a single pass. Yolo has the advantage of taking in information from the entire image at once and predicting the whole bounding box containing each object. Because the model is built end to end, it can be trained entirely using gradient descent.
The architecture of Yolo

According to the author, the Yolo network is inspired by the GoogLeNet model for image classification. The network consists of 24 convolutional layers followed by two fully connected layers. Instead of the GoogLeNet inception modules, it simply uses 1x1 reduction layers followed by 3x3 convolutional layers. The entire network's architecture is shown below.

Figure 2.3: Architecture of Yolo network [3]

The author also trains a fast version of Yolo to test the limits of fast object detection. Fast Yolo employs a neural network with fewer convolutional layers (9 as opposed to 24) and fewer filters in those layers. Except for the network size, all training and testing parameters are the same between Yolo and Fast Yolo.
The final output is a 7x7x30 tensor.
Anchor box: YOLO needs anchor boxes as the basis of its estimation to find the bounding box for an object. These anchor boxes are predefined and closely surround the object. Each anchor box is later refined by the bounding-box regression algorithm to create a predicted bounding box for the object. In a YOLO model, each object in the training image is assigned to an anchor box. If two or more anchor boxes surround the object, we choose the one with the highest IoU with the ground truth bounding box.
Figure 2.4: Identify anchor box of an object [3]
Each object in the training image is assigned to the cell on the feature map that contains the object's midpoint. So, in order to identify an object, we must identify two components associated with it: the cell and the anchor box. It is not just the cell or the anchor box alone.
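The anchor-selection rule above relies on computing IoU between boxes. A minimal, self-contained sketch (the box coordinates are illustrative, not from the source):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle (zero area if the boxes are disjoint).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# Pick the anchor with the highest IoU against the ground-truth box.
gt = (2, 2, 6, 6)
anchors = [(0, 0, 4, 4), (1, 1, 5, 5), (3, 3, 7, 7)]
best = max(anchors, key=lambda a: iou(a, gt))
```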
Bounding box: Each grid cell predicts B bounding boxes, and every bounding box has five predictions: x, y, w, h, and confidence. The (x, y) coordinates represent the center of the box relative to the grid cell bounds. The width and height are calculated relative to the entire image. Finally, the confidence prediction represents the IoU between the predicted box and any ground-truth box.
In addition, each grid cell predicts C conditional class probabilities, Pr(Class_i | Object). These probabilities are conditioned on the grid cell containing an object. Regardless of the number of boxes B, only one set of class probabilities is predicted per grid cell. At test time, the conditional class probabilities are multiplied by the individual box confidence predictions. This provides a confidence score for each box for each class. These scores encode both the likelihood of that class appearing in the box and how well the predicted box fits the object.
Figure 2.5: How YOLO operates [3]
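The test-time multiplication described above can be sketched with toy numbers (the probabilities below are hypothetical, purely for illustration):

```python
# Class-specific confidence score at test time:
#   Pr(Class_i | Object) * Pr(Object) * IoU
# Toy values, not taken from the source.
class_probs = {"person": 0.7, "dog": 0.2, "car": 0.1}  # Pr(Class_i | Object)
box_confidence = 0.8                                    # Pr(Object) * IoU
scores = {c: p * box_confidence for c, p in class_probs.items()}
print(scores)  # {'person': 0.56, 'dog': 0.16, 'car': 0.08}
```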
2.1.3 Yolov7
The Yolov7’s theory:
The YOLO (You Only Look Once) v7 model is the most recent addition to the YOLO family. YOLO models are single-stage object detectors. In a YOLO model, image frames are featurized by a backbone. These features are combined and mixed in the neck before being passed to the network's head, where YOLO predicts the locations and classes of the objects and draws bounding boxes around them. YOLOv7 outperforms all known object detectors in both speed and accuracy in the 5 FPS to 160 FPS range, and has the highest accuracy (56.8% AP) among all known real-time object detectors running at 30 FPS or higher on a V100 GPU.
The architecture of Yolov7:
YOLOv4, Scaled YOLOv4, and YOLO-R were used as the basis of the architecture. Using these models as a foundation, additional experiments were carried out in order to develop the new and improved YOLOv7.
YOLOv7 performs the same recognition task as previous Yolo versions, but it is faster and has a shorter inference time. In general, YOLOv7 is designed with more convolutional layers than other versions, and its architecture also differs from the previous ones. When designing a network architecture, researchers commonly prioritize fundamental requirements such as keeping the number of parameters, the amount of computation, and the computational density lower than before.
In this version, the authors not only consider the above conditions but also the number of elements in the convolution layers' output tensors. On this basis, they created the CSPVoVNet network, which was inspired by the earlier VoVNet network.
E-ELAN (Extended Efficient Layer Aggregation Network): The E-ELAN is the computational block in the YOLOv7 backbone. It is inspired by previous research on network efficiency and was created by analyzing the following factors that influence speed and accuracy:
• Memory access cost
• The ratio of I/O channels
• Element-wise operations
• Activation
• The gradient path
Simply put, the E-ELAN architecture allows the framework to learn more effectively. It is built around the ELAN computational block. At the time of writing, the ELAN paper had not yet been published; this section will be updated when ELAN information becomes available.
Figure 2.6: Architecture of E-ELAN [4]
Compound Model Scaling in YOLOv7: Different applications require different models. Some require highly accurate models, while others prioritize speed. Model scaling is done to meet these requirements and to fit the model onto different computing devices.
The following parameters are taken into account when scaling a model size:
• Resolution (size of the input image)
• Width (number of channels)
• Depth (number of layers)
• Stage (number of feature pyramids)
A common model scaling method is NAS (Network Architecture Search). Researchers use it to iterate through the parameters in order to find the best scaling factors. However, methods such as NAS perform parameter-specific scaling; in this case, the scaling factors are unrelated to one another.
The YOLOv7 paper's authors demonstrate that the model can be further optimized using a compound model scaling approach: for concatenation-based models, width and depth are scaled in coherence.
Figure 2.7: Architecture of Compound Model Scaling in YOLOv7 [4]
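The idea of scaling width and depth together can be sketched as applying joint factors; the factor values and the divisible-by-8 channel rounding below are assumptions for the example, not values from the paper:

```python
import math

def compound_scale(depth, width, depth_factor=1.5, width_factor=1.25):
    """Scale depth (number of layers) and width (number of channels)
    together, as concatenation-based models require coherent scaling.
    Factor values here are illustrative assumptions."""
    new_depth = max(1, round(depth * depth_factor))
    # Keep the channel count divisible by 8, a common hardware-friendly choice.
    new_width = int(math.ceil(width * width_factor / 8) * 8)
    return new_depth, new_width

print(compound_scale(depth=4, width=64))  # (6, 80)
```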
Trainable Bag of Freebies in YOLOv7:
Planned Re-parameterized Convolution
Re-parameterization techniques average a set of model weights to create a model that is more robust to the general patterns it is trying to model. Recent research has focused on module-level re-parameterization, where each component of the network has its own re-parameterization strategy.
The YOLOv7 authors use gradient flow propagation paths to determine which network modules should and should not use re-parameterization strategies.
Figure 2.8: Architecture of Planned Re-parameterized Convolution [4]
In the diagram above, the RepConv layer replaces the E-ELAN computational block's 3×3 convolution layer. Experiments were conducted by switching or replacing the positions of the RepConv, 3×3 Conv, and Identity connection (which is simply a 1×1 convolutional layer) to see which configurations work and which do not. More information about RepConv can be found in the RepVGG paper.
In addition to RepConv, YOLOv7 re-parameterizes Conv-BN (Convolution Batch Normalization), OREPA (Online Convolutional Re-parameterization), and YOLO-R to achieve the best results.
Coarse for Auxiliary and Fine for Lead Loss
The YOLO network head makes the final network predictions, but because it is so far downstream in the network, it may be advantageous to add an auxiliary head somewhere in the middle. During training, both this auxiliary detection head and the head that actually makes the predictions are supervised.
Because there is less network between the auxiliary head and the predictions, it does not train as efficiently as the final head, so the YOLOv7 authors experiment with different levels of supervision for this head, settling on a coarse-to-fine scheme in which supervision is passed back from the lead head at different granularities.
Figure 2.9: Architecture of Coarse for Auxiliary and Fine for Lead Loss [4]
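The idea of supervising both heads can be sketched as a weighted sum of the two losses; the weight below is an illustrative assumption, not necessarily the value used in the paper:

```python
def total_loss(lead_loss, aux_loss, aux_weight=0.25):
    """Combine the lead-head loss with a down-weighted auxiliary-head loss.
    The 0.25 weight is an illustrative choice for this sketch."""
    return lead_loss + aux_weight * aux_loss

print(total_loss(1.0, 2.0))  # 1.5
```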
2.1.4 OCR Theory
The Overview of OCR:
OCR (optical character recognition) is the use of technology to distinguish printed or handwritten text characters within digital images of physical documents, such as a scanned paper document. The fundamental process of OCR is to examine the text of a document and translate the characters into code that can be used for data processing. Text recognition is another term for optical character recognition.
The Principle work of OCR:
The first step in OCR is to process the physical form of a document with a scanner. After all pages have been copied, the OCR software converts the document to a two-color, black-and-white version. The scanned-in image or bitmap is analyzed for light and dark areas, with dark areas identified as characters that must be recognized and light areas as background.
The dark areas are then further processed to identify alphabetic letters or numeric digits. OCR programs use a variety of techniques, but most focus on one character, word, or block of text at a time. Characters are then identified using one of two algorithms:
Recognition of patterns: OCR programs are fed text examples in various fonts and formats, which are then compared against and recognized in the scanned document.
Detection of features: To recognize characters in a scanned document, OCR programs apply rules based on the characteristics of a specific letter or number. Features for comparison could include the number of angled lines, crossed lines, or curves in a character. For example, the capital letter "A" could be represented by two diagonal lines intersected by a horizontal line in the middle.
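The feature-based idea above can be sketched as a toy matcher; the feature names and templates are hypothetical, purely for illustration:

```python
# Hypothetical feature-based matcher: each character is described by a
# small set of stroke-feature counts, and an extracted feature set is
# compared against the templates.
TEMPLATES = {
    "A": {"diagonal": 2, "horizontal": 1, "vertical": 0, "curve": 0},
    "H": {"diagonal": 0, "horizontal": 1, "vertical": 2, "curve": 0},
    "O": {"diagonal": 0, "horizontal": 0, "vertical": 0, "curve": 1},
}

def classify(features):
    """Return the template whose feature counts differ the least."""
    def distance(tpl):
        return sum(abs(tpl[k] - features.get(k, 0)) for k in tpl)
    return min(TEMPLATES, key=lambda c: distance(TEMPLATES[c]))

print(classify({"diagonal": 2, "horizontal": 1}))  # "A"
```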
2.1.5 Tesseract model
Text recognition is a difficult task in computer vision with many practical applications. Optical character recognition (OCR) enables a variety of automation applications. This project focuses on word detection and recognition in natural images, a problem significantly more difficult than reading text in scanned documents. Because of the limited variation in the images, the use case in focus makes it possible to detect the text area in natural scenes with greater accuracy. This is accomplished by mounting a camera on a truck and continuously capturing similar images. The Tesseract OCR engine is then used to recognize the detected text area.
Line Finding:
The line finding algorithm is designed to recognize a skewed page without having to de-skew it, preserving image quality. Blob filtering and line construction are the critical steps in the process. Assuming that page layout analysis has already provided text regions of roughly uniform text size, a simple percentile height filter removes drop-caps and characters that are vertically touching. Because the median height approximates the text size in the region, it is safe to filter out blobs smaller than some fraction of the median height, which are most likely punctuation, diacritical marks, and noise.
The filtered blobs are more likely to fit a model of parallel but sloping text lines. Sorting and processing the blobs by x-coordinate makes it possible to assign each blob to a unique text line while tracking the slope across the page, with a much lower risk of assigning a blob to the wrong text line in the presence of skew. Once the filtered blobs have been assigned to lines, the baselines are estimated using a least-median-of-squares fit, and the filtered-out blobs are fitted back into the appropriate lines. The final step of line creation merges blobs that overlap by at least half horizontally, joining parts of broken characters and associating diacritical marks with the correct base.
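The median-height filtering step described above can be sketched as follows (the blob heights are illustrative, and the 0.5 fraction is an assumed parameter):

```python
from statistics import median

def filter_blobs(heights, fraction=0.5):
    """Drop blobs shorter than a fraction of the median height -
    likely punctuation, diacritics, or noise. A sketch of the idea only."""
    m = median(heights)
    return [h for h in heights if h >= fraction * m]

# Heights 3 and 2 fall below half the median (20) and are filtered out.
print(filter_blobs([20, 22, 21, 3, 19, 2, 23]))  # [20, 22, 21, 19, 23]
```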
Baseline Fitting:
To fit the baselines, the blobs are partitioned into groups with a reasonably continuous displacement from the original straight baseline. A quadratic spline is fitted to the most populous partition (assumed to be the baseline) by a least squares fit. The quadratic spline has the advantage of being reasonably stable in this calculation, but the disadvantage of causing discontinuities when multiple spline segments are required. A more traditional cubic spline might be preferable.
Chopping and Fixed Pitch Detection:
Tesseract examines the text lines to determine whether they are fixed pitch. When Tesseract encounters fixed-pitch text, it chops the words into characters based on the pitch and disables the chopper and associator on these words for the word recognition step.
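The fixed-pitch chopping idea can be sketched as slicing a line into equal-width character cells; the pixel values below are illustrative and not Tesseract's actual implementation:

```python
def chop_fixed_pitch(line_width_px, pitch_px):
    """Chop a fixed-pitch text line into equal-width character cells.
    Returns (start, end) pixel ranges. A sketch of the idea only."""
    cells = []
    x = 0
    while x < line_width_px:
        cells.append((x, min(x + pitch_px, line_width_px)))
        x += pitch_px
    return cells

print(chop_fixed_pitch(50, 12))
# [(0, 12), (12, 24), (24, 36), (36, 48), (48, 50)]
```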
2.2 Barcode Technology
2.2.1 Introduction to barcode
Nowadays, automation in production and management has become a leading trend, not only within individual countries but worldwide. The use of automatic data acquisition (ADC) technology in general, and barcode technology in particular, has brought many obvious benefits to commerce and management. One of the most apparent benefits is that inventory, payment, and export management are carried out quickly and accurately. Barcodes are more widely used than other ADC technologies because of their economic advantages and high efficiency.
Norman Joseph Woodland and Bernard Silver developed the idea of barcode technology. In 1948, as students at Drexel University, they pursued the idea after learning that a food-chain president wanted a way to check out products automatically at the register.
2.2.2 Barcode types
1D barcodes are standard linear barcodes with alternating parallel black and white stripes. A 1D code is called a "one-dimensional barcode" because the encoded data varies along only one dimension: the width (horizontal). 1D barcodes come in many different types. Depending on the amount of information, whether the encoded information consists of characters or digits, and the industry or field of use, they are divided into many types, of which the common ones on the market include UPC, EAN, Code 128, and Code 39.
Table 2.1: Types of 1D barcodes used in industry

UPC
- Fields: used to label and check consumer goods in retail businesses and the food industry all over the world; currently, UPC codes are most common in North America and Canada.
- Reason: suited to fields where only numeric codes are needed and no alphanumeric encoding is required; reliable, with a check code for error checking.

Code 128
- Fields: applied in the distribution of goods in logistics and transportation, the retail supply chain, and the manufacturing industry.
- Reason: highly appreciated and well known for its applications because of its advantages: a compact barcode, diverse information storage, and the ability to encode more characters, including uppercase and lowercase letters, digits, standard ASCII characters, and control codes.

EAN
- Fields: a type of barcode with many similarities to the UPC code, commonly used in European countries.
- Reason: suited to fields where only numeric codes are required and no alphanumeric encoding is needed; reliable, with a check code for error checking.

Code 39
- Fields: widely used by the Ministry of National Defense, the health sector, administrative agencies, and book publishing.
- Reason: overcomes the biggest drawback of the EAN and UPC types: it has unlimited capacity and can encode uppercase characters, natural numbers, and some special characters.
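The "code for error checking" mentioned for UPC and EAN is a check digit appended to the number. A sketch of the standard UPC-A check digit calculation, using the widely cited example number 036000291452:

```python
def upc_check_digit(digits11):
    """Check digit for an 11-digit UPC-A body: digits in odd positions
    (1st, 3rd, ...) are weighted by 3, even positions by 1, and the
    check digit is the complement of the total modulo 10."""
    digits = [int(d) for d in digits11]
    total = 3 * sum(digits[0::2]) + sum(digits[1::2])
    return (10 - total % 10) % 10

print(upc_check_digit("03600029145"))  # 2, completing 036000291452
```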
2.2.3 The methods of Barcode scanning
Currently, there are many barcode recognition devices, and each device has its own identification method, though all of them recognize barcodes. However, no method is considered the best; each identification method has its own advantages and disadvantages, and these methods are constantly being researched and developed.
CCD Scanner:
Figure 2.10: CCD Scanner
The CCD scanner consists of an array of LEDs arranged so that the emitted light rays form a straight horizontal line of light that cuts across the surface of the symbol. The reflected light captured by the CCD scanner lens is then converted from a light signal into a digital signal.
• Advantages: low cost
• Cons: this type can only scan barcodes on flat surfaces at close range, and cannot read barcodes on curved surfaces
Laser Scanner:
Figure 2.11: Laser Scanner
Laser scanners consist of a reader that emits a red laser and uses a reflector to create a light trail that cuts across the surface of the barcode, without using a light-collecting lens.
• Advantages: no light-collecting lens needed, very sensitive laser scanning, highly accurate results, the ability to scan barcodes on work surfaces, and long-range scanning capability
• Cons: the reading eye is not durable and may weaken after a period of use
Read barcodes with Camera Software:
Figure 2.12: Read barcodes with Camera Software
Cameras are of great interest and are mainly used for applications that run on smartphones and for jobs that need to handle multiple barcodes simultaneously. A high-resolution camera with autofocus captures the input images, which are then processed by pre-programmed software.
• Advantages: gives users a more intuitive view of the processing and reading of barcodes in images; a sensitive camera with good focus and accuracy; can process multiple barcodes simultaneously; highly portable and suitable for use on small, compact mobile devices
• Cons: affected by ambient light; the camera's resolution and focus must be high and appropriate; the ability to read barcodes on curved surfaces is poor; and the reading distance is not far
2.2.4 Code 128
Introduction to Code 128