MINISTRY OF EDUCATION AND TRAINING
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND EDUCATION
FACULTY FOR HIGH QUALITY TRAINING

GRADUATION PROJECT
COMPUTER ENGINEERING TECHNOLOGY

DESIGN AND IMPLEMENTATION OF CLASSIFICATION AND DELIVERY BASED ON COMPUTER VISION
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND EDUCATION
FACULTY FOR HIGH QUALITY TRAINING

GRADUATION PROJECT

DESIGN AND IMPLEMENTATION OF CLASSIFICATION AND DELIVERY BASED ON COMPUTER VISION

PHẠM MINH QUÂN, Student ID: 18161031
NGUYỄN HOÀI PHƯƠNG UYÊN, Student ID: 18119053
Major: COMPUTER ENGINEERING TECHNOLOGY
Advisor: TRƯƠNG QUANG PHÚC, M.Eng.

Ho Chi Minh City, December 2022
THE SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom – Happiness

Ho Chi Minh City, December 25, 2022
GRADUATION PROJECT ASSIGNMENT
Student name: Phạm Minh Quân
Student name: Nguyễn Hoài Phương Uyên
Major: Computer Engineering Technology
Advisor: Trương Quang Phúc, M.Eng.
Student ID: 18161031
Student ID: 18119053
Class: 18119CLA
Phone number: _
Date of assignment: _
Date of submission: _
1. Project title: Design and Implementation of Classification and Delivery Based on Computer Vision
2. Initial materials provided by the advisor: _
3. Content of the project: _
4. Final product:
CHAIR OF THE PROGRAM ADVISOR
(Sign with full name) (Sign with full name)
Trương Quang Phúc
THE SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom – Happiness

Ho Chi Minh City, December 25, 2022
ADVISOR’S EVALUATION SHEET
Student name: Phạm Minh Quân
Student name: Nguyễn Hoài Phương Uyên
Student ID: 18161031
Student ID: 18119053
Major: Computer Engineering Technology
Project title: Design and Implementation of Classification and Delivery Based on Computer Vision
Advisor: Trương Quang Phúc, M.Eng.
EVALUATION
1. Content of the project:
2. Strengths:
3. Weaknesses:
4. Approval for oral defense? (Approved or denied)
5. Overall evaluation: (Excellent, Good, Fair, Poor) Good
6. Mark: 9.0 (in words: )
Ho Chi Minh City, December 25, 2022
ADVISOR
(Sign with full name)
Trương Quang Phúc
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND EDUCATION
FACULTY FOR HIGH QUALITY TRAINING
THE SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom – Happiness

Ho Chi Minh City, January 13, 2023
MODIFYING EXPLANATION OF THE GRADUATION PROJECT
MAJOR: COMPUTER ENGINEERING
1. Project title: Design and Implementation of Classification and Delivery Based on Computer Vision
2. Student name: Phạm Minh Quân, Student ID: 18161031
   Student name: Nguyễn Hoài Phương Uyên, Student ID: 18119053
3. Advisor: Trương Quang Phúc, M.Eng.
4. Defending Council: Council 2, Room A3-404, January 3, 2023
5. Modifying explanation of the graduation project:
No. | Council comment | Editing result
1 | Many figures in chapter 2 are reused from other sources without providing the related references. | The related references have been provided for the figures in chapter 2.
2 | The visual quality of many figures in chapter 3 is very low and hard to follow. | Many figures in chapter 3 have been modified to improve their visual quality.
3 | In the conclusion, the authors should clearly point out which objectives have been accomplished instead of giving a general summarization. | The conclusion section now clearly points out which objectives have been accomplished.
4 | The flowchart in figure 3.6 must have a "Begin" point and an "End" point in a terminator shape. | The flowchart in figure 3.6 has been modified accordingly.
Head of Department Advisor Students
(Sign with full name) (Sign with full name) (Sign with full name)
Trương Quang Phúc
THE SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom – Happiness

Ho Chi Minh City, December 25, 2022
PRE-DEFENSE EVALUATION SHEET
Student name: Phạm Minh Quân
Student name: Nguyễn Hoài Phương Uyên
Student ID: 18161031
Student ID: 18119053
Major: Computer Engineering Technology
Project title:
Name of Reviewer:
EVALUATION
1. Content and workload of the project:
2. Strengths:
3. Weaknesses:
4. Approval for oral defense? (Approved or denied)
THE SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom – Happiness

EVALUATION SHEET OF DEFENSE COMMITTEE MEMBER
Student name: Phạm Minh Quân
Student name: Nguyễn Hoài Phương Uyên
Student ID: 18161031
Student ID: 18119053
Major: Computer Engineering Technology
Project title:
Name of Defense Committee Member:
EVALUATION
1. Content and workload of the project:
2. Strengths:
3. Weaknesses:
4. Overall evaluation: (Excellent, Good, Fair, Poor)
SUPERVISOR APPROVAL
Over the course of undertaking this project, our group received plenty of valuable support that encouraged us to overcome all the problems and challenges and to complete this demanding and meaningful project.

Firstly, we would like to thank the School Board of the Ho Chi Minh City University of Technology and Education and the Faculty for High Quality Training for creating wonderful conditions for us to carry out our project.

Secondly, we sincerely thank Mr. Trương Quang Phúc, our advisor, who gave us useful guidance and instruction that helped us finish our project successfully. From his advice we were able to improve the content of our project and correct our mistakes.

Thirdly, we are grateful to all of our classmates in class 18119CLA, who contributed advice and warm guidance whenever we needed support.

Last but not least, due to our limited knowledge and implementation time, we could not avoid all errors. We look forward to receiving your comments and suggestions to improve this topic.

In short, we sincerely thank all the people who are a part of our achievement.

Ho Chi Minh City, December 23, 2022
Phạm Minh Quân
Nguyễn Hoài Phương Uyên
Currently, both national and international industries are growing quickly, and manufacturers are interested in combining industry with automation. With the advancement of digital technology, automatic lines are becoming more widely used in manufacturing. Manufacturing companies are constantly improving their own technology and machinery systems in order to produce high-quality products at the most competitive prices. That is the foundation for improving their competitive position and helping businesses stand firm in a competitive market. This report delves deeper into the role of automation in modern manufacturing, because the logistics industry provides numerous benefits, such as increased productivity, lower labor costs, improved product quality, and lower raw material costs.

Our team decided to implement the topic "Automatic parcel classification and delivery through image processing" after gathering information and researching the automation industry. A convolutional neural network, one of the most widely used networks in the field for character recognition and barcode recognition, is employed. We chose a convolutional neural network because its deep architecture and large number of parameters make it good enough for object recognition. We additionally prepared a sufficiently large dataset, including a few particular examples, to give the training procedure excellent outcomes.

To optimize inference and training time, we also chose NVIDIA's Jetson Nano hardware to exploit the GPU's processing capability. JPG files with the collected data are used for both testing and training.
TABLE OF CONTENTS
LIST OF PICTURES vi
LIST OF TABLES viii
ABBREVIATIONS ix
CHAPTER 1: OVERVIEW 1
1.1 Introduction 1
1.2 Objective 2
1.3 Limitation 2
1.4 Research Method 2
1.5 Object and Scope of the study 2
1.5.1 Object of the study 3
1.5.2 Scope of the study 3
1.6 Outline 4
CHAPTER 2: BACKGROUND 6
2.1 AI Technology 6
2.1.1 Overview of CNN 6
2.1.2 Yolo network 9
2.1.3 Yolov7 12
2.1.4 OCR Theory 16
2.1.5 Tesseract model 17
2.2 Barcode Technology 18
2.2.1 Introduction to barcode 18
2.2.2 Barcode types 19
2.2.3 The methods of Barcode scanning 20
2.2.4 Code 128 22
2.3 The overview of AGV 24
2.3.1 The introduction of AGV 24
2.3.2 The fundamental architecture of an AGV system 25
2.4 PYQT5 Platform 29
2.5 Firebase 31
2.5.1 Introduction to Firebase 31
2.5.2 Some features of Firebase 32
2.5.3 The pros and cons of firebase 33
2.6 Other techniques used in the project 33
2.6.1 The working principle of the infrared sensor circuit in the vehicle's line detector 33
2.6.2 Pulse width modulation (PWM) 34
2.6.3 General operating principles of Automatic Traction Robot 35
2.6.4 The method for establishing the robot's location in relation to the line 36
2.6.5 Serial Peripheral Interface (SPI) 39
CHAPTER 3: DESIGN AND IMPLEMENTATION 43
3.1 System requirements 43
3.2 Block diagram 43
3.3 AI System 45
3.3.1 Hardware Design 45
3.3.2 Detail Software Design 47
3.4 AGV SYSTEM 54
3.4.1 Mechanical design 55
3.4.2 The detail hardware design 60
3.4.3 The schematic diagram of AGV system 71
3.4.4 Software Design 72
3.5 User interface of the delivery application 74
3.6 Firebase Realtime database 75
CHAPTER 4: RESULT 77
4.1 Introduction: 77
4.2 Hardware implementation 77
4.3 System Operation 80
4.4 Software System 82
4.4.1 The barcode generation process 82
4.4.2 Result of labelling process 83
4.4.3 Annotations for data 84
4.4.4 Training process 85
4.4.5 The text detecting process by Tesseract OCR model 86
4.4.6 Interface of delivery app 87
4.5 Evaluation and comparison 88
4.5.1 Comparison of Yolov7 with other versions of Yolo 88
4.5.2 Comparison between Yolo and other CNNs 92
CHAPTER 5: CONCLUSION AND FUTURE WORK 94
5.1 Conclusion 94
5.2 Future Work 94
APPENDIX 95
REFERENCE 95
LIST OF PICTURES
Figure 2.1: Architecture of CNN network 6
Figure 2.2: This is an image of sliding kernel through input’s matrix 7
Figure 2.3: Architecture of Yolo network 10
Figure 2.4: Identify anchor box of an object 11
Figure 2.5: The active way of YOLO 12
Figure 2.6: Architecture of E-ELAN 14
Figure 2.7: Architecture of Compound Model Scaling in YOLOv7 15
Figure 2.8: Architecture of Planned Re-parameterized Convolution 16
Figure 2.9: Architecture of Coarse for Auxiliary and Fine for Lead Loss 16
Figure 2.10: CCD Scanner 21
Figure 2.11: Laser Scanner 21
Figure 2.12: Read barcodes with Camera Software 22
Figure 2.13: Code 128 23
Figure 2.14: The part of a Code 128 Barcode 24
Figure 2.15: AGV vehicle operation diagram 25
Figure 2.16: The basic structure of an AGV system 25
Figure 2.17: Towing type 27
Figure 2.18: Cargo type 28
Figure 2.19: Forklift 28
Figure 2.20: Inference of Qt Designer Software 29
Figure 2.21: Image of QMainWindow 31
Figure 2.22: Firebase Database 32
Figure 2.23: Working principle 34
Figure 2.24: Principle diagram of infrared sensor 34
Figure 2.25: Time diagram of the PWM pulse 35
Figure 2.26: General structure of line detection robot 36
Figure 2.27: The robot is in the middle of the line 36
Figure 2.28: The robot is moving to the right level 1 37
Figure 2.29: The robot is moving to the right level 2 37
Figure 2.30: Robot turns left 38
Figure 2.31: Sensor deviating from line 38
Figure 2.32: Communication between 1 master and 1 slave 39
Figure 2.33: Independent mode in SPI protocol 40
Figure 2.34: Daisy mode in SPI protocol 40
Figure 2.35: SPI protocol operation mode 41
Figure 2.36: The communication process between master and slave uses SPI protocol 42
Figure 3.1: Block diagram of automatic classification and transformation system 44
Figure 3.2: The detailed block diagram of the AI system 45
Figure 3.3: Top view of Jetson Nano 46
Figure 3.4: Top view of Logitech C310 HD Webcam 47
Figure 3.5: The pipeline of AI system 48
Figure 3.6: Flowchart of generating bar code 49
Figure 3.7: Pipeline of pre-processing image 50
Figure 3.8: Format to export dataset 50
Figure 3.9: Pipeline of training data using Yolov7 model 51
Figure 3.10: Command Line to train dataset on Google Colab 52
Figure 3.11: Command Line to test image from dataset 52
Figure 3.12: Pipeline of processing of Tesseract 53
Figure 3.13: The flowchart of detecting barcode 54
Figure 3.14: Top view of a V1 reducer wheel 56
Figure 3.15: Coordinate system of robot AGV 56
Figure 3.16: Model of forces acting on the wheel 57
Figure 3.17: Dual Shaft Plastic Geared TT Motor 58
Figure 3.18: Calculation model and force analysis when the car is cornering 59
Figure 3.19: The 3D design of robot AGV 60
Figure 3.20: The block diagram of AGV system 61
Figure 3.21: The topview of ESP32 63
Figure 3.22: RFID MFRC 522 Module 64
Figure 3.23: Line Detection and Obstacle Avoidance Sensor 5 LED BFD-1000 65
Figure 3.24: Top view of the SG90 Servo Motor 66
Figure 3.25: The detail schematic for buzzer block 67
Figure 3.26: Top view of module L298N driver motor 68
Figure 3.27: Top view of V1 geared DC motor 68
Figure 3.28: Top view of Lithium-ion 18650 battery 69
Figure 3.29: Top view of LM2586HVS 3A DC to DC Step Down Buck Converter 70
Figure 3.30: The schematic of power supply block 70
Figure 3.31: The schematic diagram of AGV system 71
Figure 3.32: The operating diagram of AGV system 72
Figure 3.33: The flowchart of main program 73
Figure 3.34: The operating diagram of delivery application 74
Figure 3.35: The database of system 75
Figure 4.1: An automatic parcel classify and delivery system model 77
Figure 4.2: AI system model 78
Figure 4.3: Model of vehicle system AGV 79
Figure 4.4: Line structure 80
Figure 4.5: The shape of the cargo block 80
Figure 4.6: AI system successfully recognizes and predicts barcodes 81
Figure 4.7: AGV vehicle at the delivery location 82
Figure 4.8: AGV vehicle returns to the starting position 82
Figure 4.9: Result of generating bar-code folder process 83
Figure 4.10: Result of each bar code in Code128 format 83
Figure 4.11: Result of labelling each image 84
Figure 4.12: Result of add annotation process 85
Figure 4.13: Yolov7 dataset training results were successful 86
Figure 4.14: The model's performance when tested with the input image 86
Figure 4.15: Accuracy of input image after testing process 87
Figure 4.16: The interface of delivery app 88
Figure 4.17: Comparison of mAP and FPS between Yolov7 with Yolov5, Yolov6 on CPU 89
Figure 4.18: Comparison of mAP and FPS between Yolov7 with Yolov5, Yolov6 on GPU 89
Figure 4.19: Comparison of mAP and FPS between Yolov7 with Yolov5, Yolov6 on TESLA P100 90
Figure 4.20: Comparison of AP and inference time between Yolov7 models and models of other Yolo versions 92
LIST OF TABLES
Table 2.1: Describe the types of 1D barcodes used in industries 19
Table 2.2: The pros and cons of firebase 33
Table 3.1: Specifications Dual Shaft Geared Plastic TT Motor 58
Table 3.2: Power consumption estimate of AGV system 70
Table 4.1: Comparison of mAP and FPS between Yolov7 and other versions of Yolo 91
Table 4.2: Comparison between Yolo and other CNNs 92
LIST OF ABBREVIATIONS
YOLO You only look once
OCR Optical Character Recognition
CNN Convolutional Neural Network
SSD Single Shot MultiBox Detector
RCNN Region-based Convolutional Neural Network
IoT Internet of Things
IoU Intersection over Union
mAP Mean Average Precision
FPS Frames per second
AGV Automated Guided Vehicle
CHAPTER 1: OVERVIEW
1.1 Introduction
In the current period of economic opening, Vietnam's economy is facing many development opportunities, and the logistics service industry is one of the economic prospects bringing positive results for the country. With a developing economy, multimodal transport (logistics) services have become a service industry that integrates many high value-added activities and brings great economic benefits. Vietnam's favorable business environment and high development opportunities promise strong growth of this service market in the coming time. Developing logistics in low- and middle-income countries can boost trade growth and benefit both businesses and consumers through cheaper prices and guaranteed service quality. However, our country's logistics service industry currently has many limitations; in order to thrive, it is necessary to consider many factors and development directions. For example, at present, Vietnam's small logistics businesses are still at a low level in applying technology to their business activities. In particular, many shipping businesses today still classify products manually, which has limitations such as a high error rate, time-consuming sorting, and high labor costs when processing large volumes of products. Therefore, applying science and technology to logistics activities is essential to overcome the remaining limitations in small and medium enterprises in Vietnam.
Artificial Intelligence (AI) is a global technology trend, attracting investment from businesses that apply it to their business, production, and management processes. Any business that knows how to make the most of the superior benefits of AI technology will certainly have a strong advantage in the growth race of the digital transformation era. One of the breakthroughs of AI is Deep Learning, whereby businesses can easily apply artificial intelligence to solve problems in everyday life.
Due to the outstanding advantages of Deep Learning compared to current algorithms, our team decided to apply Deep Learning to a model that performs automatic parcel classification, in order to optimize the product classification process, reduce the rate of errors caused by humans, reduce labor costs, and shorten delivery times in the field of logistics. This can help us better understand how such a system operates and form clearer judgments about its advantages and disadvantages. From the above considerations, we decided to carry out the project "Automatic parcel classification and delivery with AI technology" as our graduation project.
1.2 Objective

The objectives of this project are:

• Building an AGV (Automated Guided Vehicle) controlled by an AI system through a WIFI connection to transport parcels to each fixed compartment.

• Building tracking software that displays the location of the autonomous vehicle, the location of each dispatch box, and the number of parcels in each dispatch box.
1.3 Limitation
The project has the following limitation:
• The topic mainly focuses on identifying and classifying parcels with Code 128 barcodes; identification of Code 128 on packages is performed in good lighting conditions with a direct shooting angle.

• The parcels to be classified are small in size and weight.

• The AGV vehicle system for product sorting has a low load capacity and operates indoors, away from direct sunlight.

• The tracking software can only be used on computers.
1.4 Research Method
Analyzing and evaluating the energy efficiency, processing speed, and performance of neural network models for barcode recognition on embedded systems.

Studying the parameters of the neural network model, then designing the network model and training the system to execute barcode recognition.

Analyzing and evaluating the system's functions, then selecting the hardware for the AI system and the AGV (Automated Guided Vehicle).
1.5 Object and Scope of the study
1.5.1 Object of the study

To make it easier to approach the problems, the group researched the following subjects to better understand how the topic could be implemented:
• Nvidia Jetson Nano Developer Kit as the hardware for AI application deployment: The product is a small but powerful embedded computer that can run modern AI algorithms quickly, with a 64-bit quad-core ARM CPU, an onboard 128-core NVIDIA GPU, and 4 GB of LPDDR4 memory. It can run multiple neural networks in parallel and process several high-resolution sensors simultaneously.
• Neural networks: Yolo (You Only Look Once) and OCR (Optical Character Recognition).
• MCU ESP32 as the control device of the AGV vehicle system for parcel sorting: The product is a WiFi transceiver kit based on the ESP32 WiFi SoC and the powerful CP2102 communication chip. It is used in applications that need to connect, collect data, and control devices over WiFi, especially IoT applications.
• PyQt5 framework for building monitoring and operating programs: Qt is a cross-platform application framework developed in the C++ programming language that is used to create desktop, embedded, and mobile apps. Linux, OS X, Windows, VxWorks, QNX, Android, iOS, BlackBerry, Sailfish OS, and many other platforms are supported. PyQt is the Python interface for the Qt library, which is a collection of control interface components (widgets, graphical control elements).
• Firebase Realtime Database for communication between the Jetson Nano, the ESP32, and the monitoring software via a WIFI connection: Firebase is a Google-owned platform that helps developers build web and mobile apps. It provides many useful tools and services for developing a quality application, which shortens development time and helps the app become available to users sooner. The Firebase Realtime Database is a cloud-hosted, NoSQL real-time database that allows you to store and sync data. The data is stored as a JSON tree and is synchronized in real time across all connections.
1.5.2 Scope of the study
The topic is limited in scope according to its purpose. In this report, the team analyzes the advantages of product classification by barcode using Deep Learning compared to traditional methods, in the form of a descriptive hardware design analysis. At the same time, the group implements a model that performs automatic parcel classification, including:

• An AI system with the function of recognizing barcodes and classifying them according to each corresponding compartment.

• An AGV vehicle system controlled by the AI system through a WIFI connection to deliver parcels to each respective compartment.

• Tracking and operating software that displays the location of the autonomous vehicle, the number of parcels at each receiving location, and the name of each receiving location.
1.6 Outline
In this report, the research team has tried to present the material logically so that readers can easily understand the knowledge, methods, and operation of the topic. The report is divided into five chapters as follows:
Chapter 1: Overview. In this chapter, the group presents the current research status and development trends of artificial intelligence. At the same time, we raise the urgency of applying artificial intelligence to the classification of goods in the field of logistics, and from there propose an automatic parcel sorting system that applies AI technology to solve the limitations of manual product classification. Finally, the group sets out the goal, objects, and scope of research for implementing this system.
Chapter 2: Background. This chapter focuses on theories related to the topic, including knowledge of the neural networks, electronic components, and software used in the system.
Chapter 3: Design and Implementation. This chapter presents the model of the system in detail, including the block diagram and the operating principle of the system. It then describes the system design: which modules, electronic components, and neural network model are selected to achieve the highest efficiency, and the connection diagram between those modules and components. Finally, based on the system design, the hardware and software are built, and the operating procedure of the system is given.
Chapter 4: Results. This chapter presents the implementation results and makes comments and evaluations against the theory presented in Chapter 2.
Chapter 5: Conclusion and Future Work. This chapter summarizes what has been done and the remaining limitations, and evaluates the system so that solutions and new development directions can be proposed for the topic.
CHAPTER 2: BACKGROUND

In this chapter, we provide an overview of the technologies and methods employed in this field, including AI technology, the AGV vehicle system, barcodes, and others.

2.1 AI Technology

2.1.1 Overview of CNN

A CNN is typically built from an input layer followed by convolutional, pooling, and fully-connected layers.

Figure 2.1: Architecture of CNN network [1]
Input layer: since CNN is inspired by the ANN model, its input is an image, which holds the raw pixel values.
Convolutional layer: through the calculation of the scalar product between their weights and the region connected to the input volume, this layer determines the output of neurons that are connected to local regions of the input.

Pooling layer: simply downscales the input along its spatial dimensions, significantly lowering the number of parameters within that activation.

Fully connected layer: then carries out the ANN's normal functions and attempts to produce class scores from the activations for categorization. Additionally, it is proposed that ReLU be applied between these layers to enhance performance. The rectified linear unit, also known as ReLU, applies an element-wise activation function to the output of the activation produced by the previous layer. Below, we specifically analyze the convolutional layer, the pooling layer, and the fully connected layer.
Convolutional layer: as its name suggests, the convolutional layer is crucial to how CNNs work. The layer's parameters center on the use of learnable kernels.

These kernels usually have a small spatial dimension but extend through the entire depth of the input. When the data enters the convolutional layer, the layer convolves each filter across the spatial dimensions of the input, creating a 2D activation map.

The input matrix is processed with a matrix called the kernel to create a feature map for the following layer. We carry out the convolution operation by sliding the kernel matrix over the input matrix: at each position, we perform an element-by-element matrix multiplication and sum the results into the feature map.
Figure 2.2: This is an image of sliding kernel through input’s matrix [2]
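The sliding-kernel operation described above can be sketched in a few lines of NumPy. The function name and the toy matrices below are ours, chosen only for illustration (stride 1, no padding):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding); at each position,
    multiply element-by-element and sum the products into the feature map."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

image = np.array([[1., 2., 3.],
                  [4., 5., 6.],
                  [7., 8., 9.]])
kernel = np.array([[1., 0.],
                   [0., 1.]])  # adds the two diagonal pixels of each 2x2 patch
print(conv2d(image, kernel))   # [[ 6.  8.] [12. 14.]]
```

Each entry of the 2x2 output is the sum of products of one 2x2 patch of the input with the kernel, exactly the operation shown in Figure 2.2.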
Convolution can be applied over more than one axis. If we have a two-dimensional image input, I, and a two-dimensional kernel filter, K, the convolved image is calculated as:

S(i, j) = Σm Σn I(i + m, j + n) K(m, n)

For example, if the network's input is an image of size 64x64x3 (an RGB color image with a spatial dimensionality of 64x64) and the receptive field size is set to 6x6, each neuron in the convolutional layer would have a total of 108 weights (6x6x3, where 3 is the magnitude of connectivity across the volume's depth). To put this in context, a standard neuron in other forms of ANN would have 12,288 weights.
Convolutional layers can also significantly reduce the model's complexity through output optimization. The output is tuned using three hyperparameters: depth, stride, and zero-padding.

The depth of the output volume produced by the convolutional layers can be set manually through the number of neurons within the layer that connect to the same region of the input. This can be seen in other types of ANNs, where all of the neurons in the hidden layer are directly connected to every single neuron in the previous layer. Reducing this hyperparameter significantly reduces the total number of neurons in the network, but it also significantly reduces the model's pattern recognition capabilities.
We can also define the stride, which is the step we take across the spatial dimensions of the input when placing the receptive field. For example, if we set the stride to 1, we will have heavily overlapping receptive fields and extremely large activations. Alternatively, increasing the stride reduces the amount of overlap and produces an output with lower spatial dimensions.
Zero-padding is the simple process of padding the input's border, and it is an effective way to give more control over the dimensionality of the output volumes.
It is critical to understand that by employing these techniques, we change the spatial dimensionality of the output of the convolutional layers. The following formula is provided by the author to calculate it:

(V − R + 2Z) / S + 1
where V denotes the input volume size (height, width, depth), R the receptive field size, Z the amount of zero-padding set, and S the stride. If the calculated result of this equation is not a whole integer, the stride has been set incorrectly, and the neurons will not fit neatly across the given input.
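The formula and the 64x64x3 example above can be checked with a short helper (the function name is ours):

```python
def conv_output_size(v, r, z, s):
    """Spatial output size of a convolutional layer: (V - R + 2Z) / S + 1.
    Raises if the result would not be a whole integer (invalid stride)."""
    size, remainder = divmod(v - r + 2 * z, s)
    if remainder:
        raise ValueError("stride is set incorrectly: neurons do not fit the input")
    return size + 1

# 64x64 input, 6x6 receptive field, no zero-padding, stride 2 -> 30x30 output
print(conv_output_size(64, 6, 0, 2))  # 30

# each neuron in that layer carries 6 * 6 * 3 = 108 weights for an RGB input
print(6 * 6 * 3)  # 108
```

With stride 1 and zero-padding of 1, a 3x3 kernel preserves the input size, e.g. (7 − 3 + 2)/1 + 1 = 7.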
Despite our best efforts, if we use an image input of any realistic dimensionality, our models will still be enormous. However, methods have been developed for greatly reducing the overall number of parameters within the convolutional layer.

The assumption behind parameter sharing is that if one feature is useful to compute at one spatial position, it is likely to be useful at another.

If we constrain each individual activation map within the output volume to the same weights and bias, the number of parameters produced by the convolutional layer will be drastically reduced.

As a result, when the back-propagation stage occurs, each neuron in the output will represent the overall gradient, which can be totaled across the depth, so only a single set of weights is updated rather than all of them.
Pooling layer: The goal of pooling layers is to gradually reduce the dimensionality of the representation, and thus the number of parameters and the computational complexity of the model. The pooling layer runs over each activation map and scales its dimensionality using the "MAX" function. Most CNNs use max-pooling layers with kernels of dimensionality 2x2 applied with a stride of 2 along the spatial dimensions of the input. This reduces each activation map to 25% of its original size while keeping the depth volume at its original size. Because of the pooling layer's destructive nature, only two methods of max-pooling are commonly observed; typically, the pooling layer's stride and filter size are both set to 2.
Fully connected layer: The neurons in the fully-connected layer are directly connected to the neurons in the two adjacent layers, but there are no connections between neurons within the same layer.
2.1.2 Yolo network
The overview of Yolo
Yolo (You Only Look Once) is a CNN network model for object detection, classification, and recognition. Yolo combines convolutional layers and connected layers: the convolutional layers extract features from images, while the connected layers predict the probabilities and coordinates of objects.

Yolo is not the most accurate algorithm, but it is the fastest among object recognition models. It can reach near-real-time speeds, while its accuracy is not significantly lower than that of the top models. Because Yolo is an object detection technique, the model's goal is both to predict labels for objects, as in classification tasks, and to locate the objects. As a result, instead of assigning a single label to an image, Yolo can detect multiple objects with different labels in a single pass. Yolo has the advantage of taking in information from the entire image at once and predicting the whole bounding box containing each object. Because the model is built end to end, it can be trained entirely using gradient descent.
The architecture of Yolo

According to the author, the Yolo network is inspired by the GoogLeNet model for image classification. The network consists of 24 convolutional layers followed by two fully connected layers. Instead of the GoogLeNet inception modules, it simply uses 1x1 reduction layers followed by 3x3 convolutional layers. The entire network's architecture is shown below.

Figure 2.3: Architecture of Yolo network [3]

The author also trains a fast version of Yolo to test the limits of fast object detection. Fast Yolo employs a neural network with fewer convolutional layers (9 as opposed to 24) and fewer filters in those layers. Except for the network size, all training and testing parameters are the same between Yolo and Fast Yolo.
The final output is a 7x7x30 tensor.
Anchor box: YOLO needs anchor boxes as the basis of its estimation to find the bounding box for an object. These anchor boxes are predefined and closely surround the object. Each anchor box is later refined by the bounding-box regression algorithm to create a predicted bounding box for the object. In a YOLO model, each object in the training image is assigned to an anchor box. If two or more anchor boxes surround the object, we choose the one with the highest IoU with the ground truth bounding box.
Figure 2.4: Identify anchor box of an object [3]
Each object in the training image is assigned to the cell on the feature map that contains the object's midpoint. So, in order to identify an object, we must identify two components associated with it: the cell and the anchor box. It is not just the cell or the anchor box alone.
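The anchor-selection rule above relies on computing IoU between boxes. A minimal, self-contained sketch (the box coordinates are illustrative, not from the source):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle (zero area if the boxes are disjoint).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# Pick the anchor with the highest IoU against the ground-truth box.
gt = (2, 2, 6, 6)
anchors = [(0, 0, 4, 4), (1, 1, 5, 5), (3, 3, 7, 7)]
best = max(anchors, key=lambda a: iou(a, gt))
```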
Bounding box: Each grid cell predicts B bounding boxes, and every bounding box has five predictions: x, y, w, h, and confidence. The (x, y) coordinates represent the center of the box relative to the grid cell bounds. The width and height are calculated relative to the entire image. Finally, the confidence prediction represents the IoU between the predicted box and any ground-truth box.
In addition, each grid cell predicts C conditional class probabilities, Pr(Class_i | Object). These probabilities are conditioned on the grid cell containing an object. Regardless of the number of boxes B, only one set of class probabilities is predicted per grid cell. At test time, the conditional class probabilities are multiplied by the individual box confidence predictions. This provides a confidence score for each box for each class. These scores encode both the likelihood of that class appearing in the box and how well the predicted box fits the object.
Figure 2.5: How YOLO operates [3]
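The test-time multiplication described above can be sketched with toy numbers (the probabilities below are hypothetical, purely for illustration):

```python
# Class-specific confidence score at test time:
#   Pr(Class_i | Object) * Pr(Object) * IoU
# Toy values, not taken from the source.
class_probs = {"person": 0.7, "dog": 0.2, "car": 0.1}  # Pr(Class_i | Object)
box_confidence = 0.8                                    # Pr(Object) * IoU
scores = {c: p * box_confidence for c, p in class_probs.items()}
print(scores)  # {'person': 0.56, 'dog': 0.16, 'car': 0.08}
```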
2.1.3 Yolov7
The Yolov7’s theory:
The YOLO (You Only Look Once) v7 model is the most recent addition to the YOLO family. YOLO models are single-stage object detectors. In a YOLO model, image frames are featurized by a backbone. These features are combined and mixed in the neck before being passed to the network's head, where YOLO predicts the locations and classes of the objects and draws bounding boxes around them. YOLOv7 outperforms all known object detectors in both speed and accuracy in the 5 FPS to 160 FPS range, and has the highest accuracy (56.8% AP) among all known real-time object detectors running at 30 FPS or higher on a V100 GPU.
The architecture of Yolov7:
YOLOv4, Scaled YOLOv4, and YOLO-R were used as the basis of the architecture. Using these models as a foundation, additional experiments were carried out in order to develop the new and improved YOLOv7.
YOLOv7 performs the same recognition task as previous Yolo versions, but it is faster and has a shorter inference time. In general, YOLOv7 is designed with more convolutional layers than other versions, and its architecture also differs from the previous ones. When designing a network architecture, researchers commonly prioritize fundamental requirements such as keeping the number of parameters, the amount of computation, and the computational density lower than before.
In this version, the authors not only consider the above conditions but also the number of elements in the convolution layers' output tensors. On this basis, they created the CSPVoVNet network, which was inspired by the earlier VoVNet network.
E-ELAN (Extended Efficient Layer Aggregation Network): The E-ELAN is the computational block in the YOLOv7 backbone. It is inspired by previous research on network efficiency and was created by analyzing the following factors that influence speed and accuracy:
• Memory access cost
• The ratio of I/O channels
• Element-wise operations
• Activation
• The gradient path
Simply put, the E-ELAN architecture allows the framework to learn more effectively. It is built around the ELAN computational block. At the time of writing, the ELAN paper had not yet been published; this section will be updated when ELAN information becomes available.
Figure 2.6: Architecture of E-ELAN [4]
Compound Model Scaling in YOLOv7: Different applications require different models. Some require highly accurate models, while others prioritize speed. Model scaling is done to meet these requirements and to fit the model onto different computing devices.
The following parameters are taken into account when scaling a model size:
• Resolution (size of the input image)
• Width (number of channels)
• Depth (number of layers)
• Stage (number of feature pyramids)
A common model scaling method is NAS (Network Architecture Search). Researchers use it to iterate through the parameters in order to find the best scaling factors. However, methods such as NAS perform parameter-specific scaling; in this case, the scaling factors are unrelated to one another.
The YOLOv7 paper's authors demonstrate that the model can be further optimized using a compound model scaling approach: for concatenation-based models, width and depth are scaled in coherence.
Figure 2.7: Architecture of Compound Model Scaling in YOLOv7 [4]
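The idea of scaling width and depth together can be sketched as applying joint factors; the factor values and the divisible-by-8 channel rounding below are assumptions for the example, not values from the paper:

```python
import math

def compound_scale(depth, width, depth_factor=1.5, width_factor=1.25):
    """Scale depth (number of layers) and width (number of channels)
    together, as concatenation-based models require coherent scaling.
    Factor values here are illustrative assumptions."""
    new_depth = max(1, round(depth * depth_factor))
    # Keep the channel count divisible by 8, a common hardware-friendly choice.
    new_width = int(math.ceil(width * width_factor / 8) * 8)
    return new_depth, new_width

print(compound_scale(depth=4, width=64))  # (6, 80)
```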
Trainable Bag of Freebies in YOLOv7:
Planned Re-parameterized Convolution
Re-parameterization techniques average a set of model weights to create a model that is more robust to the general patterns it is trying to model. Recent research has focused on module-level re-parameterization, where each component of the network has its own re-parameterization strategy.
The YOLOv7 authors use gradient flow propagation paths to determine which network modules should and should not use re-parameterization strategies.
Figure 2.8: Architecture of Planned Re-parameterized Convolution [4]
In the diagram above, the RepConv layer replaces the E-ELAN computational block's 3×3 convolution layer. Experiments were conducted by switching or replacing the positions of the RepConv, 3×3 Conv, and Identity connection (which is simply a 1×1 convolutional layer) to see which configurations work and which do not. More information about RepConv can be found in the RepVGG paper.
In addition to RepConv, YOLOv7 re-parameterizes Conv-BN (Convolution Batch Normalization), OREPA (Online Convolutional Re-parameterization), and YOLO-R to achieve the best results.
Coarse for Auxiliary and Fine for Lead Loss
The YOLO network head makes the final network predictions, but because it is so far downstream in the network, it may be advantageous to add an auxiliary head somewhere in the middle. During training, both this auxiliary detection head and the head that actually makes the predictions are supervised.
Because there is less network between the auxiliary head and the predictions, it does not train as efficiently as the final head, so the YOLOv7 authors experiment with different levels of supervision for this head, settling on a coarse-to-fine scheme in which supervision is passed back from the lead head at different granularities.
Figure 2.9: Architecture of Coarse for Auxiliary and Fine for Lead Loss [4]
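The idea of supervising both heads can be sketched as a weighted sum of the two losses; the weight below is an illustrative assumption, not necessarily the value used in the paper:

```python
def total_loss(lead_loss, aux_loss, aux_weight=0.25):
    """Combine the lead-head loss with a down-weighted auxiliary-head loss.
    The 0.25 weight is an illustrative choice for this sketch."""
    return lead_loss + aux_weight * aux_loss

print(total_loss(1.0, 2.0))  # 1.5
```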
2.1.4 OCR Theory
The Overview of OCR:
OCR (optical character recognition) is the use of technology to distinguish printed or handwritten text characters within digital images of physical documents, such as a scanned paper document. The fundamental process of OCR is to examine the text of a document and translate the characters into code that can be used for data processing. Text recognition is another term for optical character recognition.
The Principle work of OCR:
The first step in OCR is to process the physical form of a document with a scanner. After all pages have been copied, the OCR software converts the document to a two-color, black-and-white version. The scanned-in image or bitmap is analyzed for light and dark areas, with dark areas identified as characters that must be recognized and light areas as background.
The dark areas are then further processed to identify alphabetic letters or numeric digits. OCR programs use a variety of techniques, but most focus on one character, word, or block of text at a time. Characters are then identified using one of two algorithms:
Recognition of patterns: OCR programs are fed text examples in various fonts and formats, which are then compared against and recognized in the scanned document.
Detection of features: To recognize characters in a scanned document, OCR programs apply rules based on the characteristics of a specific letter or number. Features for comparison could include the number of angled lines, crossed lines, or curves in a character. For example, the capital letter "A" could be represented by two diagonal lines intersected by a horizontal line in the middle.
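The feature-based idea above can be sketched as a toy matcher; the feature names and templates are hypothetical, purely for illustration:

```python
# Hypothetical feature-based matcher: each character is described by a
# small set of stroke-feature counts, and an extracted feature set is
# compared against the templates.
TEMPLATES = {
    "A": {"diagonal": 2, "horizontal": 1, "vertical": 0, "curve": 0},
    "H": {"diagonal": 0, "horizontal": 1, "vertical": 2, "curve": 0},
    "O": {"diagonal": 0, "horizontal": 0, "vertical": 0, "curve": 1},
}

def classify(features):
    """Return the template whose feature counts differ the least."""
    def distance(tpl):
        return sum(abs(tpl[k] - features.get(k, 0)) for k in tpl)
    return min(TEMPLATES, key=lambda c: distance(TEMPLATES[c]))

print(classify({"diagonal": 2, "horizontal": 1}))  # "A"
```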
2.1.5 Tesseract model
Text recognition is a difficult task in computer vision with many practical applications. Optical character recognition (OCR) enables a variety of automation applications. This project focuses on word detection and recognition in natural images, a problem significantly more difficult than reading text in scanned documents. Because of the limited variation in the images, the use case in focus makes it possible to detect the text area in natural scenes with greater accuracy. This is accomplished by mounting a camera on a truck and continuously capturing similar images. The Tesseract OCR engine is then used to recognize the detected text area.
Line Finding:
The line finding algorithm is designed to recognize a skewed page without having to de-skew it, preserving image quality. Blob filtering and line construction are the critical steps in the process. Assuming that page layout analysis has already provided text regions of roughly uniform text size, a simple percentile height filter removes drop-caps and characters that are vertically touching. Because the median height approximates the text size in the region, it is safe to filter out blobs smaller than some fraction of the median height, which are most likely punctuation, diacritical marks, and noise.
The filtered blobs are more likely to fit a model of parallel but sloping text lines. Sorting and processing the blobs by x-coordinate makes it possible to assign each blob to a unique text line while tracking the slope across the page, with a much lower risk of assigning a blob to the wrong text line in the presence of skew. Once the filtered blobs have been assigned to lines, the baselines are estimated using a least-median-of-squares fit, and the filtered-out blobs are fitted back into the appropriate lines. The final step of line creation merges blobs that overlap by at least half horizontally, joining parts of broken characters and associating diacritical marks with the correct base.
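The median-height filtering step described above can be sketched as follows (the blob heights are illustrative, and the 0.5 fraction is an assumed parameter):

```python
from statistics import median

def filter_blobs(heights, fraction=0.5):
    """Drop blobs shorter than a fraction of the median height -
    likely punctuation, diacritics, or noise. A sketch of the idea only."""
    m = median(heights)
    return [h for h in heights if h >= fraction * m]

# Heights 3 and 2 fall below half the median (20) and are filtered out.
print(filter_blobs([20, 22, 21, 3, 19, 2, 23]))  # [20, 22, 21, 19, 23]
```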
Baseline Fitting:
To fit the baselines, the blobs are partitioned into groups with a reasonably continuous displacement from the original straight baseline. A quadratic spline is fitted to the most populous partition (assumed to be the baseline) by a least squares fit. The quadratic spline has the advantage of being reasonably stable in this calculation, but the disadvantage of causing discontinuities when multiple spline segments are required. A more traditional cubic spline might be preferable.
Chopping and Fixed Pitch Detection:
Tesseract examines the text lines to determine whether they are fixed pitch. When Tesseract encounters fixed-pitch text, it chops the words into characters based on the pitch and disables the chopper and associator on these words for the word recognition step.
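The fixed-pitch chopping idea can be sketched as slicing a line into equal-width character cells; the pixel values below are illustrative and not Tesseract's actual implementation:

```python
def chop_fixed_pitch(line_width_px, pitch_px):
    """Chop a fixed-pitch text line into equal-width character cells.
    Returns (start, end) pixel ranges. A sketch of the idea only."""
    cells = []
    x = 0
    while x < line_width_px:
        cells.append((x, min(x + pitch_px, line_width_px)))
        x += pitch_px
    return cells

print(chop_fixed_pitch(50, 12))
# [(0, 12), (12, 24), (24, 36), (36, 48), (48, 50)]
```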
2.2 Barcode Technology
2.2.1 Introduction to barcode
Nowadays, automation in production and management has become a leading trend, not only within individual countries but worldwide. The use of automatic data acquisition (ADC) technology in general, and barcode technology in particular, has brought many obvious benefits to commerce and management. One of the most apparent benefits is that inventory, payment, and export management are carried out quickly and accurately. Barcodes are more widely used than other ADC technologies because of their economic advantages and high efficiency.
Norman Joseph Woodland and Bernard Silver developed the idea of barcode technology. In 1948, as students at Drexel University, they pursued the idea after learning that a food-chain president wanted a way to check out products automatically at the register.
2.2.2 Barcode types
1D barcodes are standard linear barcodes with alternating parallel black and white stripes. A 1D code is called a "one-dimensional barcode" because the encoded data varies along only one dimension: the width (horizontal). 1D barcodes come in many different types. Depending on the amount of information, whether the encoded information consists of characters or digits, and the industry or field of use, they are divided into many types, of which the common ones on the market include UPC, EAN, Code 128, and Code 39.
Table 2.1: Types of 1D barcodes used in industry

UPC
- Fields: used to label and check consumer goods in retail businesses and the food industry all over the world; currently, UPC codes are most common in North America and Canada.
- Reason: suited to fields where only numeric codes are needed and no alphanumeric encoding is required; reliable, with a check code for error checking.

Code 128
- Fields: applied in the distribution of goods in logistics and transportation, the retail supply chain, and the manufacturing industry.
- Reason: highly appreciated and well known for its applications because of its advantages: a compact barcode, diverse information storage, and the ability to encode more characters, including uppercase and lowercase letters, digits, standard ASCII characters, and control codes.

EAN
- Fields: a type of barcode with many similarities to the UPC code, commonly used in European countries.
- Reason: suited to fields where only numeric codes are required and no alphanumeric encoding is needed; reliable, with a check code for error checking.

Code 39
- Fields: widely used by the Ministry of National Defense, the health sector, administrative agencies, and book publishing.
- Reason: overcomes the biggest drawback of the EAN and UPC types: it has unlimited capacity and can encode uppercase characters, natural numbers, and some special characters.
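The "code for error checking" mentioned for UPC and EAN is a check digit appended to the number. A sketch of the standard UPC-A check digit calculation, using the widely cited example number 036000291452:

```python
def upc_check_digit(digits11):
    """Check digit for an 11-digit UPC-A body: digits in odd positions
    (1st, 3rd, ...) are weighted by 3, even positions by 1, and the
    check digit is the complement of the total modulo 10."""
    digits = [int(d) for d in digits11]
    total = 3 * sum(digits[0::2]) + sum(digits[1::2])
    return (10 - total % 10) % 10

print(upc_check_digit("03600029145"))  # 2, completing 036000291452
```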
2.2.3 The methods of Barcode scanning
Currently, there are many barcode recognition devices, and each device has its own identification method, though all of them recognize barcodes. However, no method is considered the best; each identification method has its own advantages and disadvantages, and these methods are constantly being researched and developed.
CCD Scanner:
Figure 2.10: CCD Scanner
The CCD scanner consists of an array of LEDs arranged so that the emitted light rays form a straight horizontal line of light that cuts across the surface of the symbol. The reflected light captured by the CCD scanner lens is then converted from a light signal into a digital signal.
• Advantages: low cost
• Cons: this type can only scan barcodes on flat surfaces at close range, and cannot read barcodes on curved surfaces
Laser Scanner:
Figure 2.11: Laser Scanner
Laser scanners consist of a reader that emits a red laser and uses a reflector to create a light trail that cuts across the surface of the barcode, without using a light-collecting lens.
• Advantages: no light-collecting lens needed, very sensitive laser scanning, highly accurate results, the ability to scan barcodes on work surfaces, and long-range scanning capability
• Cons: the reading eye is not durable and may weaken after a period of use
Read barcodes with Camera Software:
Figure 2.12: Read barcodes with Camera Software
Cameras are of great interest and are mainly used for applications that run on smartphones and for jobs that need to handle multiple barcodes simultaneously. A high-resolution camera with autofocus captures the input images, which are then processed by pre-programmed software.
• Advantages: gives users a more intuitive view of the processing and reading of barcodes in images; a sensitive camera with good focus and accuracy; can process multiple barcodes simultaneously; highly portable and suitable for use on small, compact mobile devices
• Cons: affected by ambient light; the camera's resolution and focus must be high and appropriate; the ability to read barcodes on curved surfaces is poor; and the reading distance is not far
2.2.4 Code 128
Introduction to Code 128