Design of an automated plant disease detection and diagnosis system


DOCUMENT INFORMATION

Basic information

Title: Design of an Automated Plant Disease Detection and Diagnosis System
Authors: Nguyen Huu Thang, Tran Binh Trong
Advisor: Tran Vu Hoang, Ph.D.
University: Ho Chi Minh City University of Technology and Education
Major: Automatic and Control Engineering
Document type: Graduation Project
Year: 2023
City: Ho Chi Minh City
Format:
Pages: 87
Size: 9.58 MB


Contents



MINISTRY OF EDUCATION AND TRAINING

HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND EDUCATION

GRADUATION THESIS

AUTOMATION AND CONTROL ENGINEERING TECHNOLOGY

DESIGN OF AN AUTOMATED PLANT DISEASE DETECTION AND DIAGNOSIS SYSTEM

SKL011308

LECTURER: TRAN VU HOANG, Ph.D.
STUDENTS: NGUYEN HUU THANG
          TRAN BINH TRONG

Ho Chi Minh City, July 2023


HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND EDUCATION

FACULTY FOR HIGH QUALITY TRAINING

Major: AUTOMATIC AND CONTROL ENGINEERING

Advisor: TRAN VU HOANG, Ph.D.

Ho Chi Minh City, July 2023


ACKNOWLEDGEMENTS

Firstly, we would like to express our gratitude to our families for giving us the opportunity to study at HCMC University of Technology and Education. Next, we would like to thank the professors at the HCMC University of Technology and Education for their invaluable guidance and for imparting both academic and social knowledge. Their unwavering support has been instrumental in overcoming challenges throughout our educational journey and in expanding our understanding. We extend our heartfelt gratitude to our classmates, both within and outside the classroom, as well as to the senior students who generously provided assistance and support during this time. The solid foundation of our knowledge was established through the guidance of our teaching faculty, friends, and seniors, which greatly facilitated the development of a successful graduation project under the supervision of our advisor.

We would like to express our sincere gratitude to Dr. Tran Vu Hoang, a lecturer at the Department of Electrical and Electronics Engineering, HCMC University of Technology and Education. Dr. Hoang provided invaluable guidance throughout the completion of this graduation project, offering insightful suggestions and imparting previously unknown knowledge, which greatly contributed to its successful completion. Additionally, we extend our thanks to the members of the UTE-AI Laboratory for their wholehearted support during the finalization of this project.

The process of researching and developing this project is not immune to mistakes, so we sincerely hope to receive valuable feedback from esteemed faculty members, the thesis defense committee, and our peers. This feedback will enable us to further improve the project and apply its findings in practical scenarios, with the ultimate aim of alleviating the hardships faced by farmers.

We express our heartfelt appreciation to everyone!


ABSTRACT

The greenhouse tomato farming model was developed to address low crop yields caused by external environmental impacts from climate change. However, the model still faces several challenges, such as the need for a large horticultural workforce and the suboptimal use of pesticides. Assigning laborers to care for each plant area in the garden is not effective, because diseases on tomato plants, especially brown spot disease, spread very rapidly. To solve this labor problem, researchers have proposed solutions such as real-time disease recognition algorithms integrated into drones; however, the practical implementation of these solutions on tomato plants has so far not proven effective.

According to the research team's survey, previous real-time disease identification projects on tomato plants mainly used different CNN architectures for research and development purposes. The disease identification ability of those network structures is still limited in terms of recognition speed, and no specific practical model has been proposed for an application.

To tackle the challenges related to recognition speed, accuracy, and practicality in a greenhouse garden environment, our research team presents a novel system that leverages mechanically moving axes, similar to a CNC machine. These axes are seamlessly integrated with a webcam, enabling flexible movement and monitoring across multiple locations for automated real-time disease identification and warnings. The system is specifically designed to pinpoint the affected areas of plants accurately and display the corresponding disease types. Notably, one of the primary advantages of this system is its capability to operate continuously and at high speed over prolonged durations.

The system employs a YOLOv8n-based detection model trained on a dataset of 10 common tomato plant diseases from Kaggle, supplemented with manually collected data. The achieved results demonstrate high accuracy in detecting tomato leaf diseases, with an average precision of 92.6% and a speed of 18-24 FPS on the Jetson Nano developer board (128-core NVIDIA Maxwell architecture GPU).


CONTENT PAGE

ACKNOWLEDGEMENTS 1

ABSTRACT 2

CONTENT PAGE 3

LIST OF ABBREVIATIONS 5

LIST OF TABLES 6

LIST OF FIGURES, GRAPHS 7

Chapter 1 SUMMARY 10

1.1 Introduction 10

1.2 The Objective of the Research Project 16

1.3 Research and Contents 16

1.4 Limitation of Project 17

1.5 Subject and Scope of Research 17

1.6 Research Methodology 17

1.7 Structure of Project 18

Chapter 2 OVERALL RESEARCHING 19

2.1 Growing conditions of tomato plants in a greenhouse model 19

2.1.1 Soil Quality and Composition 19

2.1.2 Temperature and climate 19

2.1.3 Water 19

2.1.4 Nutrient management 19

2.1.5 Pest and Disease Control 20

2.2 Introduction Object Detection 20

2.2.1 Overall of YOLO (You Only Look Once) 22

2.2.2 YOLOv1 23

2.2.3 YOLOv2 25

2.2.4 YOLOv3 26

2.2.5 YOLOv4 29

2.2.6 YOLOv5 30


2.2.7 YOLOv6 32

2.2.8 YOLOv7 33

2.2.9 YOLOv8 35

2.3 Mechanical Structure and Controller Methods 37

Chapter 3 SYSTEM DESIGN 40

3.1 The requirements imposed on the system 40

3.1.1 Block diagram of the system and explanation 40

3.1.2 Functions and Hardware used in CAMERA block 41

3.1.3 Main Processor and Communication Unit 42

3.1.4 Detection Software, and Hardware used 49

3.1.5 Actuators 53

3.1.6 HMI Block, and Hardware 57

3.2 Overall Operation of System 59

3.3 Electrical Cabinet 61

Chapter 4 EXPERIMENTAL RESULTS 64

4.1 Evaluation Criteria 64

4.2 Testing Environment 64

4.3 Evaluation Methodology 66

4.3.1 Evaluating on Kaggle Dataset and Fine-Tune Dataset 66

4.3.2 Assessing The Autonomous Operation of the Mechanical Model Without Object Detection 68

4.3.3 Assessing The Autonomous Operation of the Mechanical Model with Object Detection 68

4.4 Experimental Procedure 68

4.4.1 Evaluate the YOLOv8n Object Detection Model on Two Datasets 68

4.4.2 Evaluation of The Response of the Mechanical System 69

4.4.3 HMI Design 72

4.4.4 Evaluating All Functions of the System during Operation 74

Chapter 5 CONCLUSION AND DEVELOPMENT 77

5.1 Conclusion 77

5.2 Development 77

REFERENCES 79


LIST OF ABBREVIATIONS

CNN: Convolutional Neural Network

ANN: Artificial Neural Network

HSV: Hue, Saturation and Value

MDC: Maximum Distance to Centroids

YOLO: You Only Look Once, a popular object detection algorithm that has revolutionized the field of computer vision; it is fast and efficient, making it an excellent choice for real-time object detection tasks (https://www.v7labs.com/)

FPS: Frame rate (expressed in frames per second, or FPS) is typically the frequency (rate) at which consecutive images (frames) are captured or displayed (Wikipedia)

CNC: A CNC machine is a motorized maneuverable tool, and often a motorized maneuverable platform, both controlled by a computer according to specific input instructions (Wikipedia)


LIST OF TABLES

Table 3.1: Comparison Between Webcams 41

Table 3.2: CPU and GPU comparison between Jetson Nano and Raspberry Pi4 42

Table 3.3: Controller for Actuators 44

Table 3.4: Comparison between YOLO versions 50

Table 3.5: Z-T Axis with Drivers and Step Motor size 42 choice 54

Table 3.6: X, Y axis Step Motor 56

Table 3.7: Electrical Components Selection 62

Table 4.1: Performance on Kaggle Dataset on all classes 69

Table 4.2: Performance on Fine-Tune Dataset on all classes 69

Table 4.3: Mechanical System Evaluation 69


LIST OF FIGURES, GRAPHS

Figure 1.1 Statistic Climate Change: Global Temperature 10

Figure 1.2 Evolution of Global Export, by sectors 11

Figure 1.3: The tomato plants are cultivated within a greenhouse model 12

Figure 1.4: Production between 2003 and 2017 12

Figure 1.5: Vented heater 13

Figure 1.6: Automatic watering system in greenhouse 13

Figure 1.7: DCS System 14

Figure 1.8: Example of Comparison between Disease and Dirty Leaf 15

Figure 1.9: Drone in Agriculture 16

Figure 2.1: Regions with CNN features 21

Figure 2.2: Benchmarking between YOLO and R-CNN family 23

Figure 2.3: YOLO v1 Architecture 23

Figure 2.4: Architecture of YOLOv2 26

Figure 2.5: Comparison between YOLOv2 and R-CNN Family 26

Figure 2.6: YOLO v3 Architecture 27

Figure 2.7: Feature Pyramid Network 27

Figure 2.8: YOLOv4 architecture 30

Figure 2.9: Comparison between CSPResBlock YOLOv4 and YOLOv5 30

Figure 2.10: SPPF (SPP-Fast) 31

Figure 2.11: Auto Anchor of YOLOv5 31

Figure 2.12: General architecture of the model 32

Figure 2.13: The effect of Implicit Knowledge according to the placement 32

Figure 2.14: Differences of couple head and decouple head 33

Figure 2.15: Improvement of YOLOv7 Speed 34

Figure 2.16: YOLOv8 architecture 36

Figure 2.17: Compare speed and accuracy between YOLO versions 37

Figure 2.18: Column Robot 38

Figure 2.19: CNC 3 Axis Model 38


Figure 3.1: Blocks of System Flowchart 41

Figure 3.2: Essager C3 Webcam 42

Figure 3.3: Jetson Nano 43

Figure 3.4: Arduino MEGA 2560 Communication Pinout 44

Figure 3.5: Arduino MEGA 2560 between Jetson Nano and Tools 44

Figure 3.6: Serial Communication and Parallel Interface Example 45

Figure 3.7: TTL Logic of Serial Communication 46

Figure 3.8: Frame Transmitter of Serial Communication 46

Figure 3.9: PWM Signals 47

Figure 3.10: PWM Generator Model 48

Figure 3.11: Sub-PCB Arduino MEGA 2560 49

Figure 3.12: Processing Block 50

Figure 3.13: The discretization of the ground truth box using the General Distribution function 52

Figure 3.14: X, Y, Z Axis 53

Figure 3.15: Webcam View in Mechanical System 53

Figure 3.16: Pulley GT2-20teeth 54

Figure 3.17: A4988 Green Step Motor Driver 55

Figure 3.18: Step Motor size 42 55

Figure 3.19: Fastech Driver and Motor 56

Figure 3.20: Parameter of Fastech Driver Motor 57

Figure 3.21: Setup Parameter for Motor 57

Figure 3.22: Starting Display 58

Figure 3.23: Main Display 59

Figure 3.24: Overall Operation of System 60

Figure 3.25: Wiring Diagram of System 63

Figure 4.1: Dataset 65

Figure 4.2: Dataset Collection Information 66

Figure 4.3: Practice Model 70

Figure 4.4: Surface of Electrical Cabinet 71


Figure 4.5: Inside of Electrical Cabinet 71

Figure 4.6: Start Screen 72

Figure 4.7: Main Control Screen 73

Figure 4.8: Auto Mode 73

Figure 4.9: Manual Mode 74

Figure 4.10: Confusion Matrix on Fine-Tune Dataset 75

Figure 4.11: Operating System on HMI 75


One prime example is the tomato plant, which is extensively cultivated worldwide owing to its versatility and high global consumption rates. According to market research conducted between 2010 and 2021 by Tomatonews.com, a research site focused on the tomato industry [3], the consumption of tomato-derived products has shown an increasing trend across all three charts presented in Figure 1.2.

Figure 1.1 Statistic Climate Change: Global Temperature


Figure 1.2 Evolution of Global Export, by sectors

To address the challenges posed by climate change in agriculture in general, and tomato cultivation in particular, numerous proposals have been made worldwide to increase productivity and quality while adapting to recent climate change. In traditional agriculture, when faced with climate change, farmers tend to reduce the number of planting cycles per year. However, this approach poses a significant problem, as it fails to meet market demand and leads to escalated tomato production costs.

In line with the global trend, the implementation of a greenhouse model, as depicted in Figure 1.3, has been adopted to enhance the growing conditions for tomato plants. This model enables effective management of the physical factors that directly influence plant growth, as supported by research [4]. The efficacy of this model in quality management and environmental temperature control within the greenhouse is evidenced by the graphs provided in Figure 1.4: the annual output (represented by the green line) gradually increases, which demonstrates the effectiveness of the method in [4].


Figure 1.3: The tomato plants are cultivated within a greenhouse model

Figure 1.4: Production between 2003 and 2017

Although the aforementioned solution facilitates temperature regulation in greenhouse environments, it still faces unresolved challenges such as high investment and implementation costs, requiring substantial technical expertise and human resources. To tackle these difficulties, researchers at the University of Tennessee proposed an automated heating system and natural air system [4] to regulate temperature during cold weather and low nighttime temperatures. The research paper provides detailed information on all factors involved in constructing a comprehensive temperature control system for greenhouse models, including fuel type for operating the system, system size, and heating equipment specifications, as shown in Figure 1.5. This comprehensive approach has made the installation process and investment cost calculation more flexible.

Figure 1.5: Vented heater

Alongside the temperature control system, the research [4] proposes air quality management, which includes humidity control and an automated irrigation system. This system typically employs sensors to measure soil moisture and air humidity, enabling watering decisions to be made based on demand or a pre-established plan, as depicted in Figure 1.6. With this system, relevant parameters are continuously displayed and monitored in real time, allowing prompt responses to necessary conditions and the ability to make predictions. As a result, it minimizes errors inherent in human experience and reduces the labor required to operate the system.

Figure 1.6: Automatic watering system in greenhouse
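Such a demand-based watering decision can be sketched in a few lines. The function name and both thresholds below are illustrative assumptions, not values from the thesis or from [4]:

```python
def should_water(soil_moisture_pct, air_humidity_pct,
                 soil_threshold=35.0, humidity_threshold=50.0):
    """Decide whether to irrigate from sensor readings (illustrative thresholds).

    Waters when the soil is clearly dry, or when it is borderline dry
    and the surrounding air is also dry.
    """
    if soil_moisture_pct < soil_threshold:
        return True
    if soil_moisture_pct < soil_threshold + 10 and air_humidity_pct < humidity_threshold:
        return True
    return False

# Dry soil: water regardless of air humidity.
decision = should_water(soil_moisture_pct=20, air_humidity_pct=80)
```

In a real installation this rule would be one branch of the pre-established watering plan, with the thresholds tuned per crop and per greenhouse zone.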


In summary, the issue of environmental impact on crop productivity has been addressed by implementing an automated system that actively controls temperature and humidity in real time [4]. Additionally, soil moisture management and an automatic irrigation system have been incorporated into the model. This system utilizes sensors and a distributed control system (DCS), as depicted in Figure 1.7, and relies on PLC technology to manage all parameters within the model and control its operation efficiently.

Figure 1.7: DCS System

The DCS has several advantages when applied to this model, including the ability to cluster the parameters of each area in the greenhouse for calculation and information processing. This means the processing unit can be located away from the sensors, reducing investment costs and allowing the operator to work with a single processor.

In addition to these advantages, it is important to consider the susceptibility of tomato plants to disease. According to statistics from Wikipedia [5], there are six main disease groups harmful to tomato plants, and within each group there are numerous different diseases, estimated at over 70 in total. The manual solutions previously employed relied on human labor to treat plant diseases, which incurred significant labor costs, and the use of plant protection chemicals was not optimized for each region, resulting in wastage. With the rapid spread


of leaf spot diseases on tomato plants, the previous manual approach was not truly effective. To address this issue, solutions were developed to identify diseased leaves and classify them based on long-established techniques.

The initial step was conducting experiments with classification in MATLAB through image processing techniques, followed by using artificial neural networks (ANN) [6] for classification. The accuracy achieved was very high, but the processing time was slow, and many limitations still prevented practical implementation. There could be confusion between diseased plants and plants with stains caused by factors such as soil sticking to the leaves during watering, as depicted in Figure 1.8, because the color difference between soil and healthy leaves is significant.

Figure 1.8: Example of Comparison between Disease and Dirty Leaf

One of the primary limitations of the model in [6] is its feature extraction capability for precise plant identification and classification. To address this challenge, the convolutional neural network (CNN) has emerged as a commonly adopted approach. By leveraging a CNN model, features can be effectively extracted and diseases in tomato plants can be accurately classified.

Utilizing a CNN model to accomplish both tasks is considered highly effective. To meet the requirements of speed and accuracy, the YOLO (You Only Look Once) algorithm is proposed for detection. While traditional hand-treatment methods face various challenges, CNN-based object detection algorithms, as proposed by numerous researchers, have shown promise. However, when it comes to hardware implementation, limited solutions are available. Some previous proposals have suggested using drones, as shown in


Fig 1.9, for disease detection and classification. Unfortunately, the soft structure of tomato plants and the airflow caused by drones often lead to inaccurate and inefficient results.

Figure 1.9: Drone in Agriculture

Therefore, we propose the implementation of a mechanical structure equipped with direction axes to move and position the camera at different locations within the greenhouse model. By combining this setup with the YOLOv8 algorithm, which is known for its high accuracy and fast processing speed, we anticipate meeting the specific requirements for cultivating this plant species in the greenhouse model.
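The camera-positioning idea can be sketched as generating a serpentine list of (x, y) stops across the growing area, which a CNC-style axis controller would then visit one by one. The dimensions, step size, and function name here are illustrative assumptions, not the thesis's actual parameters:

```python
def scan_waypoints(x_max_mm, y_max_mm, step_mm):
    """Serpentine (boustrophedon) sweep: the camera visits every grid stop
    without retracing a full row on the way back."""
    xs = list(range(0, x_max_mm + 1, step_mm))
    waypoints = []
    for row, y in enumerate(range(0, y_max_mm + 1, step_mm)):
        # Alternate row direction so consecutive stops stay adjacent.
        row_xs = xs if row % 2 == 0 else list(reversed(xs))
        waypoints.extend((x, y) for x in row_xs)
    return waypoints

path = scan_waypoints(400, 200, 200)
# path = [(0, 0), (200, 0), (400, 0), (400, 200), (200, 200), (0, 200)]
```

At each waypoint the controller would pause, let the webcam capture a frame, and run the detector before moving on; minimizing travel between stops is what keeps continuous long-duration scanning feasible.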

1.2 The Objective of the Research Project

This project proposes a real-time crop monitoring model inspired by CNC hardware, combined with the deep learning YOLOv8 model. The monitoring model relies on the state of the plant leaves to detect various crop diseases and provide timely alerts, so that growers can take preventative or treatment measures in time to avoid damage to crop yield and quality.

1.3 Research and Contents

To address urgent issues and demonstrate the research topic outlined in section 1.2, the team carried out the following tasks:

- Task 1: Surveying related topics and real-world issues that currently lack optimal solutions, and proposing an approach to address them

- Task 2: Summarizing the stated requirements, designing the block diagram of the entire system, and providing detailed explanations for each block


- Task 6: Designing the wiring diagram for the entire electrical system

- Task 7: Creating operational flowcharts for the overall system and its related functions

- Task 8: Executing the hardware implementation practice

- Task 9: Developing the code for the hardware to control the movement of the webcam and facilitate communication with the central processing unit

- Task 10: Designing a remote control interface (HMI)

- Task 11: Conducting experimental testing of the system, collecting errors during disease detection to calibrate the system and supplement data for the dataset

- Task 12: Writing the thesis report and presenting a comprehensive overview of the graduation project

1.4 Limitation of Project

The research focuses on developing a detection and alert system for harmful diseases in tomato plants on a small-scale model. The system operates automatically and identifies nine diseases and one healthy status, recognizing the specific affected regions and the types of diseases present, in order to enable timely intervention through a remotely controlled monitoring screen managed by a central server for the entire system.

1.5 Subject and Scope of Research

Subjects of Research:

- Object Detection and Classification Model

- A practical model using three linear axes and one rotation motor to change the webcam's view

- A human-machine interface (HMI) for monitoring the operation of the system

Scope of Research:

The system operates in real time over an extended period, ensuring timely detection and containment of diseased plants. Alerts are conveyed through a monitoring screen, enabling the manager to promptly devise solutions and administer treatment for the affected plants.

1.6 Research Methodology

Analysis of the remaining problems in tomato cultivation


Survey and analysis of methods for identifying and classifying diseases on tomato plants

Analysis of an optimal mechanical model for monitoring plant growth in an optimized greenhouse model

Evaluation of the system and directions for development

1.7 Structure of Project

Chapter 1: SUMMARY

In this chapter, our team presents the remaining issues of the research topic and the methods to address those problems. We outline the objectives, tasks, limitations, research subjects, and scope to provide an overview of the project's content.

Chapter 2: OVERALL RESEARCHING

This chapter explores current methods for identifying the condition of plants and classifying common diseases.

Chapter 3: SYSTEM DESIGN

In this chapter, we conduct research and design a block diagram, followed by an analysis. Based on the analysis results, we select the main processing model for detecting plant states. We then design the hardware model and carry out its construction to ensure the system meets the specified requirements. Additionally, we design the control and monitoring interface (HMI) with the necessary functions.

Chapter 4: EXPERIMENTAL RESULTS

This chapter demonstrates the results achieved by the system, applies them to real-world scenarios, and provides evaluations and directions for the development of the topic based on those results.

Chapter 5: CONCLUSION AND FUTURE DEVELOPMENT

This chapter presents the conclusions drawn from the evaluation and discusses directions for further development based on the strengths and weaknesses identified in Chapter 4.


Chapter 2 OVERALL RESEARCHING

2.1 Growing conditions of tomato plants in a greenhouse model

This section provides an overview of the conditions necessary for tomato plants to grow, and of the remaining problems.

2.1.1 Soil Quality and Composition

The quality and composition of the soil play an important role in the growth of tomato plants. Tomatoes thrive in well-drained, loamy soil with a pH between 6.0 and 6.8. The soil must contain sufficient organic matter to provide nutrients and retain moisture. In addition, soil fertility, which is determined by the presence of essential nutrients such as nitrogen, phosphorus, and potassium, significantly affects the overall health and yield of tomato plants. The conditions mentioned above can be tightly controlled in a greenhouse model, although changes in soil nutrient composition take a long time to test.

2.1.2 Temperature and climate

Tomatoes are warmth-loving plants that require specific temperatures and climatic conditions for optimal growth. The ideal temperature range for growing tomatoes is 20°C to 30°C (68°F to 86°F), maintained by a temperature and airflow management system [4]. Lower temperatures can hinder plant growth and development, while higher temperatures have an adverse effect on fruit set. Furthermore, tomato plants thrive with moderate humidity and adequate sunlight, usually 6 to 8 hours of direct sunlight per day, both of which are well controlled in a greenhouse environment.

2.1.3 Water

A consistent and continuous water supply is essential for tomato plants. Adequate moisture is required for proper nutrient absorption, photosynthesis, and overall plant health. However, overwatering or poor drainage can cause root rot and other diseases. Timely irrigation, considering factors such as soil moisture and weather conditions, is critical to maintaining optimal soil moisture and preventing water shortages in tomato plants.

2.1.4 Nutrient management

Tomatoes have specific nutritional needs for their growth and development. Essential nutrients including nitrogen (N), phosphorus (P), potassium (K), calcium (Ca), magnesium (Mg), and others must be supplied in appropriate amounts and ratios. Proper fertilization techniques, such as soil testing, nutrient analysis, and organic or synthetic fertilization, are critical to maintaining nutrient balance and preventing deficiencies or excesses that can hinder the growth of tomato plants.


2.1.5 Pest and Disease Control

Tomato plants are susceptible to various pests and diseases that can significantly affect their growth and yield. Implementing integrated pest management strategies, using mechanical, biological, and chemical methods, is critical to minimizing damage caused by pests. Regular monitoring, early detection, and timely intervention are essential to ensure the health of tomato plants. Previously, this required a great deal of farm labor, and managing crops over a large area remained difficult. The biggest obstacle is detection time: leaf spot disease on tomato plants develops very quickly, and a spot can grow to 1 cm in diameter after only one day. Moreover, the greenhouse model allows the disease to spread quickly and uncontrollably.

Therefore, it is necessary to propose a method that identifies diseases on tomato plants at high speed and operates continuously, in order to promptly detect the disease and warn farmers.

Alongside the disease identification method is the mechanical structure. In the past, some researchers have suggested using drones to detect diseases in tomato plants, but this is no longer suitable in greenhouse environments. First, this environment does not have an open space above, so controlling a drone faces many difficulties and risks damaging the device. Second, drones are not durable when operating for long periods, and the investment cost is high. To solve this mechanical problem, a proposed system must overcome the disadvantages of drones in detecting diseases in tomato plants.

2.2 Introduction Object Detection

In the early days, object detection was a problem of extracting features based on differences between objects that humans perceive and define (hand-crafted feature extraction). However, researchers found many limitations in this approach, such as the inability to generalize features and the complexity of the algorithms used for feature extraction. The main challenges were handling noise and the fact that feature extraction algorithms designed for one problem could not be extended to other research and development tasks.

To address these major drawbacks and achieve a generalized feature extraction method, the convolutional neural network (CNN) was introduced. It extracts features by using filters with different parameters that slide over the input image to generate feature maps. These feature maps are then fed into max pooling layers and, ultimately, fully connected layers for classification [7].
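The sliding-filter and max-pooling operations described here can be illustrated with a minimal NumPy sketch. This uses a single hand-written filter for clarity, not a trained network, and the function names are ours:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a filter over the image (valid padding, stride 1) to build a feature map."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output value is the dot product of the filter with one image patch.
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Downsample the feature map by taking the max of each size x size window."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(16, dtype=float).reshape(4, 4)
edge_kernel = np.array([[1.0, -1.0], [1.0, -1.0]])  # a simple vertical-edge filter
features = max_pool(conv2d(image, edge_kernel))
```

In a real CNN, many such filters are learned from data rather than hand-written, and the pooled maps feed further convolutional layers before the fully connected classifier.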

In 1998, Yann LeCun utilized a CNN for handwritten digit classification and achieved strong results. However, limitations in datasets and hardware hindered the further development of CNNs. From around 2012 onwards, AlexNet, the first widely successful CNN implementation, overcame the barrier of hand-crafted feature extraction, and from this point CNNs could replace traditional feature extraction methods. In 2013, the introduction of Regions with CNN features (R-CNN) marked the development of two-stage detection, which improved the performance of object detection and classification tasks. Instead of sliding filters over the whole input image, the first stage of R-CNN determines regions likely to contain objects and passes that information on. The second stage uses a CNN to classify the object, along with an improvement in the accuracy of the bounding box (a colored rectangular shape surrounding the object), as shown in Figure 2.1.

Although the subsequent versions, Fast R-CNN [8] and Faster R-CNN [9], improved the processing speed, they still failed to fully meet the requirements of practical applications. Ultimately, to achieve high accuracy with significantly improved speed compared to Faster R-CNN [9], the YOLO model was introduced in 2016. Thanks to its one-stage detection structure, the problem is adequately resolved.

Figure 2.1: Regions with CNN features

Two-stage Object Detection

Two-stage object detection methods, such as Faster R-CNN (Region-based Convolutional Neural Network) and R-FCN (Region-based Fully Convolutional Networks), follow a two-step process. In the first stage, these methods generate a set of region proposals (potential object locations) using techniques like selective search or region proposal networks (RPN). In the second stage, the region proposals are classified and refined to obtain the final object detections. In terms of accuracy, two-stage detectors generally outperform one-stage detectors, primarily because the separate region proposal and classification stages allow more precise localization and better handling of object variations.

One-stage Object Detection

One-stage methods, such as YOLO (You Only Look Once) and SSD (Single Shot MultiBox Detector), aim to detect objects directly in a single pass through the network. These methods use a grid-based approach in which a fixed set of bounding boxes with different aspect ratios and sizes are


Issues of Two-stage Compared to One-stage and Result of Choice

- Computational complexity: Two-stage detectors are computationally more demanding, as they require generating a large number of region proposals, which can significantly impact the inference time.

- Slower speed: Due to the two-step process, two-stage detectors are generally slower than one-stage detectors, making them less suitable for real-time applications.

- Design complexity: The architecture of two-stage detectors is more complex, involving multiple stages and components, making them harder to design, train, and optimize.

To address the initial requirements of speed and accuracy for the object detection model, we surveyed and researched the development of the one-stage detector YOLO (You Only Look Once).

2.2.1 Overall of YOLO (You Only Look Once)

The first version of YOLO was introduced in 2016 by Joseph Redmon et al., using a single artificial neural network to perform object detection and classification simultaneously in a single pass. There are currently approximately 8 main versions of YOLO, along with smaller variants suited to different applications depending on the required speed or accuracy. As depicted in Figure 2.2, the initial version of YOLO already shows significant speed improvements. Shortly afterwards, the second version of this one-stage approach achieved far higher speeds than the R-CNN family while maintaining accuracy comparable to certain smaller v2 variants.


Figure 2.3: YOLO v1 Architecture

Within each cell, YOLO predicts B bounding boxes and C class probabilities. Each bounding box carries the following information: the center coordinates of the bounding box (x, y), its width and height (w, h), and a confidence score. The confidence score indicates whether the bounding box contains an object, and YOLO discards all bounding boxes that do not contain objects.
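As an illustrative sketch of this output layout (random values stand in for a trained model's predictions; S = 7, B = 2, C = 20 follow the original paper, not this thesis's configuration):

```python
import numpy as np

# Illustrative sketch of how a YOLOv1 output tensor is organized.
# S, B, C follow the original paper (7x7 grid, 2 boxes, 20 classes);
# the prediction array here is random, not a trained model's output.
S, B, C = 7, 2, 20
pred = np.random.rand(S, S, B * 5 + C)  # each cell: B*(x, y, w, h, conf) + C class probs

def decode_cell(cell, conf_thresh=0.5):
    """Extract boxes whose confidence exceeds the threshold from one grid cell."""
    boxes = []
    for b in range(B):
        x, y, w, h, conf = cell[b * 5: b * 5 + 5]
        if conf >= conf_thresh:  # discard low-confidence boxes, as described above
            cls = int(np.argmax(cell[B * 5:]))  # most probable class for this cell
            boxes.append((x, y, w, h, conf, cls))
    return boxes

all_boxes = [box for i in range(S) for j in range(S)
             for box in decode_cell(pred[i, j])]
print(pred.shape)  # (7, 7, 30)
```

Note that the cell's class probabilities are shared by both boxes, which is exactly why YOLOv1 struggles with multiple objects in one cell, as discussed below.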


The following is the Loss function of YOLOv1:

In the objective function above, the localization loss (the loss on the object's position) is given in formulas 2.1 and 2.2 and is only computed when the cell contains an object; when there is no object, the localization loss equals zero. In this formula, the width and height enter through their square roots, so that bounding boxes of different sizes are treated differently: an error of a few pixels in a large bounding box is less serious than the same error in a small one. The hatted symbols, such as 𝑥̂ and 𝑦̂, denote the center coordinates predicted by the model, while the unmarked symbols denote the ground truth; 𝑤̂ and ℎ̂ denote the predicted width and height, and 𝑐̂ the predicted confidence score.

The confidence loss over the entire image is shown in formulas 2.3 and 2.4 and consists of two parts. Formula 2.3 handles the case where an object is present in the i-th cell. Specifically, each cell predicts B = 2 bounding boxes; considering all B predicted boxes at the i-th cell, if the j-th bounding box has the highest overlap


with the ground truth box (the labeled rectangle containing the object), then the indicator 1_ij^obj = 1; otherwise it is 0. Formula 2.4, analogously, handles the i-th cell when no object is present in that cell.

Similarly to the localization loss, the classification loss (the loss on object classification) is computed only when the cell contains an object; when there is no object, the classification loss equals zero, as expressed in formula 2.5.
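As an illustrative numeric sketch of these loss terms (made-up prediction and ground-truth values for a single responsible cell; the λ weights come from the original YOLOv1 paper, not from this thesis):

```python
import numpy as np

# Minimal single-box sketch of the YOLOv1 loss terms described above.
# All values are invented ground-truth / prediction pairs for one cell that
# contains an object; lambda weights follow the original paper.
lambda_coord, lambda_noobj = 5.0, 0.5

gt   = dict(x=0.5, y=0.5, w=0.4, h=0.6, c=1.0, cls=np.array([0.0, 1.0, 0.0]))
pred = dict(x=0.45, y=0.55, w=0.35, h=0.65, c=0.8, cls=np.array([0.1, 0.8, 0.1]))

# Localization loss (formulas 2.1 and 2.2): the square roots of w and h make
# errors on small boxes count more than the same error on large boxes.
loc = lambda_coord * ((gt["x"] - pred["x"]) ** 2 + (gt["y"] - pred["y"]) ** 2
                      + (np.sqrt(gt["w"]) - np.sqrt(pred["w"])) ** 2
                      + (np.sqrt(gt["h"]) - np.sqrt(pred["h"])) ** 2)

# Confidence loss (formula 2.3): an object is present in this cell.
conf = (gt["c"] - pred["c"]) ** 2

# Classification loss (formula 2.5): computed only because the cell has an object.
cls = float(np.sum((gt["cls"] - pred["cls"]) ** 2))

total = loc + conf + cls
print(round(total, 4))  # ≈ 0.1383
```

The no-object terms (formula 2.4, weighted by lambda_noobj) are zero here because this sketch considers only a responsible cell.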

The emergence of YOLO was a new breakthrough in object recognition, but the model still had many issues to address and improve. YOLOv1 struggled to recognize more than one object in the same cell. With only 2 bounding boxes per cell, it also ran into trouble when an object had an aspect ratio unseen during training, and its localization loss function produced poor bounding boxes. These limitations motivated YOLOv2.

2.2.3 YOLOv2

To resolve YOLOv1's issues of detecting only a few objects per cell and of accurately determining bounding box locations, YOLOv2 was introduced with significant improvements.

YOLOv2 utilizes the Darknet-19 network architecture, consisting of 19 convolutional layers. It also adds batch normalization layers for faster and more stable training; with batch normalization, the model avoids overfitting even without dropout. The changes in the architecture of YOLOv2 are illustrated in Figure 2.4.

The next improvement in the YOLOv2 model is the use of a 13x13 grid instead of the 7x7 grid in YOLOv1, making it easier to predict small objects Additionally, YOLOv2 employs

a skip connection block, shown in Figure 2.4, to combine information from the feature map

of the previous layer and the last layer, enabling better object recognition and the ability to detect objects of varying sizes

Borrowing the idea from Faster R-CNN, YOLOv2 utilizes anchor boxes to detect multiple objects, but poorly chosen anchors reduce detection accuracy; the root cause is the quality of the anchor boxes generated at the outset. To address this, YOLOv2 introduced a step that selects appropriate dimensions for the initial anchor boxes using the K-means algorithm.


Figure 2.4: Architecture of YOLOv2

Figure 2.5: Comparison between YOLOv2 and R-CNN Family

YOLOv2 has many strengths, such as fast processing speed compared to two-stage networks, as shown in Figure 2.5 In addition, it achieves higher accuracy than YOLOv1 thanks to the use of Anchor Box and performs well with high-resolution images However, the weaknesses of YOLOv2 are also noteworthy, including poor quality of predicted bounding boxes and confusion when objects appear or disappear in frames

2.2.4 YOLOv3

To address its predecessor's shortcomings in detecting small objects, the third iteration of YOLO, illustrated in Figure 2.6, adopted Darknet-53 together with a Feature Pyramid Network (FPN), shown in Figure 2.7. The backbone of this version gives the model deeper learning capacity with 53 convolutional layers, and combining it with the FPN lets the model perform detection at various scales, addressing the object-size issue mentioned above.


In Figure 2.7: (a) a featurized image pyramid with progressively smaller layers computes features independently at each image scale, which slows the model down; (b) some detectors use only a single deep feature map to increase recognition speed; (c) another approach reuses the feature hierarchy already computed by the CNN as a feature-rich pyramid, but this hurts the predictions made from the initial, shallow layers; (d) the Feature Pyramid Network (FPN) is as fast as (b) and (c) but more accurate, because the feature maps at the final layers and the predictions at the first layers exchange information with each other, augmenting the accuracy of the model.
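A minimal PyTorch sketch of the FPN idea described above (channel counts are illustrative, not those of any particular YOLO backbone): lateral 1x1 convolutions unify channel counts, and a top-down path upsamples deeper maps and adds them in.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal FPN sketch: every output level carries both high-resolution detail
# (from shallow maps) and deep semantic information (from the top-down path).
class TinyFPN(nn.Module):
    def __init__(self, in_channels=(64, 128, 256), out_channels=128):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        self.smooth = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, feats):  # feats ordered shallow -> deep
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        for i in range(len(laterals) - 2, -1, -1):  # top-down: deep to shallow
            laterals[i] = laterals[i] + F.interpolate(laterals[i + 1], scale_factor=2)
        return [s(p) for s, p in zip(self.smooth, laterals)]

fpn = TinyFPN()
feats = [torch.randn(1, 64, 32, 32), torch.randn(1, 128, 16, 16), torch.randn(1, 256, 8, 8)]
outs = fpn(feats)
print([tuple(o.shape) for o in outs])  # all levels now have 128 channels
```

Each of the three output maps can then feed a detection head at its own scale, which is how YOLOv3 detects objects of varying sizes.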

Figure 2.6: YOLO v3 Architecture

Figure 2.7: Feature Pyramid Network


Among the changes in this version is the use of the sigmoid function at the output: because real-world datasets can contain overlapping labels for the same object, each class is scored independently rather than through a softmax. The retained concepts include anchor boxes, K-means, and fine-tuning of some minor details for optimal performance.
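A small numeric sketch of why independent sigmoids suit overlapping labels (the logits are invented):

```python
import numpy as np

# With overlapping labels (e.g. "person" and "pedestrian"), softmax forces the
# classes to compete for a shared probability mass, while independent sigmoids
# let several classes be active at once.
logits = np.array([3.0, 2.8, -2.0])  # scores for 3 possibly overlapping classes

softmax = np.exp(logits) / np.exp(logits).sum()
sigmoid = 1.0 / (1.0 + np.exp(-logits))

print(np.round(softmax, 3))  # probabilities forced to sum to 1 -> classes compete
print(np.round(sigmoid, 3))  # each class judged independently; two exceed 0.5
```

With softmax the two strong classes split the probability mass and neither reaches a high score, whereas the sigmoids report both as confidently present.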

In this version, YOLO generates 9 anchor boxes instead of the previous version's 5, with 3 detection scales, so the 9 distinct anchor boxes are evenly distributed across the 3 scales. The bounding box parameters in YOLOv3 are computed from the anchor box as follows:

- cx, cy are the coordinates of the top-left corner of the grid cell
- pw, ph are the width and height of the anchor box
- tx, ty, tw, th are the predicted offsets of the coordinates, width, and height of the bounding box relative to the anchor box
- bx, by, bw, bh are, in order, the coordinates, width, and height of the bounding box

Another change in this version is the loss function: formula 2.10 represents the localization loss, formula 2.11 the confidence loss, and formula 2.12 the classification loss.
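The decoding that these variable definitions describe can be sketched as follows (these are the standard published YOLOv3 equations, reproduced from the original design rather than from this thesis's figures):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Standard YOLOv3 decoding: offsets (tx, ty) are squashed with a sigmoid and
# added to the grid cell corner (cx, cy), while (tw, th) scale the anchor
# dimensions (pw, ph) exponentially so width and height stay positive.
def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    bx = sigmoid(tx) + cx   # center x, kept inside the predicting cell
    by = sigmoid(ty) + cy   # center y
    bw = pw * np.exp(tw)    # width as a positive multiple of the anchor width
    bh = ph * np.exp(th)    # height
    return bx, by, bw, bh

# Example: cell (3, 4), anchor 2.5 x 3.0 (all in grid units), raw offsets of zero
bx, by, bw, bh = decode_box(0.0, 0.0, 0.0, 0.0, cx=3, cy=4, pw=2.5, ph=3.0)
print(bx, by, bw, bh)  # 3.5 4.5 2.5 3.0
```

With zero offsets, the box sits at the center of its cell with exactly the anchor's dimensions, which is why well-chosen anchors make the offsets small and easy to learn.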

In expression 2.11, if one of the 9 anchor boxes has the highest intersection over union with the ground truth, it is updated by the term on the left side of the "+" sign; otherwise, it is updated by the term on the right side. Expression 2.12 in this version employs Binary Cross Entropy (BCE) [10] instead of squared loss.


2.2.5 YOLOv4

The emergence of YOLOv4 marks a breakthrough in one-stage object detectors, with a series of significant improvements yielding excellent performance and outstanding processing speed compared to its predecessors. YOLOv4 accepts inputs of flexible size, and its object detection ability is strengthened by the backbone-neck-head architecture demonstrated in Figure 2.8.

YOLOv4 introduces two new concepts that need to be understood. First, Bag of Freebies techniques change the training strategy or training cost to improve accuracy without affecting inference. Second, Bag of Specials methods add small modules to the model and processing techniques for the output; they slightly increase the computational cost but account for the majority of the accuracy improvement.

The Bag of Freebies (BoF) techniques applied in YOLOv4 include data augmentation, especially mosaic augmentation, which combines four training images into one so that the model learns to detect objects outside their usual context. DropBlock and class label smoothing [11] are also applied to prevent overfitting and improve accuracy. In addition, YOLOv4 introduces a new objective function, the CIoU loss, and Cross mini-Batch Normalization (CmBN), which collects normalization statistics across mini-batches rather than from a single mini-batch as in standard batch normalization.
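A rough numpy sketch of mosaic augmentation (solid-color stand-in images and an illustrative canvas size; real pipelines also remap the bounding box labels, which is omitted here):

```python
import numpy as np

# Mosaic augmentation sketch: four training images are pasted into the four
# quadrants of one canvas around a random center, so objects appear at unusual
# scales and outside their usual context.
def mosaic(images, size=64, seed=0):
    rng = np.random.default_rng(seed)
    canvas = np.zeros((size, size, 3), dtype=images[0].dtype)
    # random mosaic center, kept away from the borders so no quadrant is empty
    cx = rng.integers(size // 4, 3 * size // 4)
    cy = rng.integers(size // 4, 3 * size // 4)
    quads = [(slice(0, cy), slice(0, cx)),        # top-left
             (slice(0, cy), slice(cx, size)),     # top-right
             (slice(cy, size), slice(0, cx)),     # bottom-left
             (slice(cy, size), slice(cx, size))]  # bottom-right
    for img, (ys, xs) in zip(images, quads):
        h = ys.stop - ys.start
        w = xs.stop - xs.start
        canvas[ys, xs] = img[:h, :w]  # crop each image to fit its quadrant
    return canvas

imgs = [np.full((64, 64, 3), v, dtype=np.uint8) for v in (50, 100, 150, 200)]
out = mosaic(imgs)
print(out.shape)  # one canvas containing pieces of all four images
```

Because the center is random, each quadrant crops its image differently every time, multiplying the variety of contexts the detector sees.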

The Bag of Specials (BoS) techniques applied in YOLOv4 include keeping the Darknet-53 architecture of the previous version but replacing the residual blocks with CSPBlocks and changing the activation function to Mish. For the neck, the authors use a path aggregation network (PANet) together with spatial pyramid pooling (SPP). With these advancements, YOLOv4 far exceeds its predecessors' accuracy and object detection capacity.

However, while YOLOv4 has many advantages in terms of accuracy, the model still has limitations. Firstly, it requires high computing power, making it difficult to run on devices with limited resources. Secondly, it struggles to detect small or low-contrast objects, leading to missed or incorrect results. Thirdly, training can be time-consuming and requires a large amount of high-quality labeled data. Finally, the model may not perform well on images with complex backgrounds or occluded objects.


2.2.6 YOLOv5

In YOLOv5, speed and accuracy trade off against each other: higher accuracy comes at the cost of a heavier model.

The first improvement is in the backbone. In this version, the CSPResBlock of the previous version is refined into a new module called C3, which has one fewer convolution layer. This is depicted in Figure 2.9, with the upper part showing the previous version and the lower part the YOLOv5 version.

Figure 2.9: Comparison between CSPResBlock YOLOv4 and YOLOv5

Next is the activation function, which is changed from Mish or LeakyReLU for lightweight versions of YOLOv4 to SiLU for the YOLOv5 version


In the neck of the model, the fifth iteration utilizes SPP-Fast (SPPF), which roughly doubles the speed; its architecture is depicted in Figure 2.10.

Figure 2.10: SPPF (SPP-Fast)

Instead of running its max-pooling layers in parallel as SPP does, SPPF runs them sequentially, with a kernel size of 5 for all layers instead of [5, 9, 13] as in the previous SPP.
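The equivalence between the two layouts can be checked with a small PyTorch sketch (illustrative functions, not the actual YOLOv5 implementation): three chained 5x5 max-pools have receptive fields of 5, 9, and 13, so they reproduce the parallel [5, 9, 13] pools while reusing intermediate work.

```python
import torch
import torch.nn as nn

def spp(x):
    """SPP layout: independent max-pools with kernels 5, 9, 13, run in parallel."""
    pools = [nn.MaxPool2d(k, stride=1, padding=k // 2) for k in (5, 9, 13)]
    return torch.cat([x] + [p(x) for p in pools], dim=1)

def sppf(x):
    """SPPF layout: one 5x5 max-pool applied three times in sequence."""
    pool = nn.MaxPool2d(5, stride=1, padding=2)
    y1 = pool(x)   # receptive field 5
    y2 = pool(y1)  # two chained 5x5 pools cover a 9x9 window
    y3 = pool(y2)  # three chained pools cover 13x13
    return torch.cat([x, y1, y2, y3], dim=1)

x = torch.randn(1, 8, 16, 16)
print(torch.allclose(spp(x), sppf(x)))  # True: same output, fewer pooling ops
```

The max of a max over nested windows equals the max over the union, which is why the sequential form is exact rather than an approximation.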

Another change lies in loss scaling. In this version, the three outputs of the PAN neck detect objects at three different scales, but objects at each scale influence the objectness loss differently, so the loss is weighted as in formula 2.13:

L_obj = 4.0 · L_small + 1.0 · L_medium + 0.4 · L_large (2.13)

Improved user support is clearly demonstrated in Figure 2.11 through the utilization

of Auto Anchor, a technique that applies a Genetic Algorithm (GA) to the anchor boxes in a step after K-means. This enhancement lets the anchor boxes perform better on custom user datasets, rather than only performing well on the COCO dataset.

Figure 2.11: Auto Anchor of YOLOv5


2.2.7 YOLOv6

This generation covers two models, YOLOR and YOLOX, whose general architecture is shown in Figure 2.12. YOLOR was released in 2021; it applies implicit knowledge learning to improve accuracy, along with some structural changes to the model to improve its speed. The main idea behind YOLOR is to use implicit knowledge to improve the model's accuracy.

In a neural network, features obtained from shallow layers are called explicit knowledge, while features obtained from deep layers are called implicit knowledge. In the YOLOR paper, explicit knowledge is defined as the knowledge extracted from the input image, and implicit knowledge as the knowledge not extracted from the input image. Moreover, using implicit knowledge at different positions in the network serves different objectives, allowing users to improve tasks according to their needs, as shown in Figure 2.13.

Figure 2.12: General architecture of the model

Figure 2.13: The effect of implicit knowledge according to its placement


YOLOX uses a decoupled head instead of the coupled head of previous versions. The author's experiments show that separating the classification and regression tasks into different branches increases performance compared to processing both on the same branch: decoupling allows more efficient resource allocation, a more flexible architecture design, and less interference between the two tasks, as shown in Figure 2.14.

Figure 2.14: Differences between a coupled head and a decoupled head

YOLOX also introduces the multi-positives technique to address the problem of assigning only one positive grid cell per object, which can reduce the accuracy of the model. Other cells that can successfully detect the object contribute useful gradients and reduce the imbalance between positive and negative cells during training.
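A minimal PyTorch sketch of such a decoupled head (channel sizes and class count are invented for illustration, not YOLOX's actual configuration):

```python
import torch
import torch.nn as nn

# Decoupled-head sketch in the spirit of YOLOX: after a shared 1x1 "stem",
# classification runs on one branch while box regression and objectness share
# a second branch, instead of one coupled convolution producing everything.
class DecoupledHead(nn.Module):
    def __init__(self, in_ch=256, num_classes=3):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, 128, 1)
        self.cls_branch = nn.Sequential(nn.Conv2d(128, 128, 3, padding=1), nn.SiLU(),
                                        nn.Conv2d(128, num_classes, 1))
        self.reg_branch = nn.Sequential(nn.Conv2d(128, 128, 3, padding=1), nn.SiLU())
        self.reg_out = nn.Conv2d(128, 4, 1)  # box offsets
        self.obj_out = nn.Conv2d(128, 1, 1)  # objectness

    def forward(self, x):
        x = self.stem(x)
        reg_feat = self.reg_branch(x)
        return self.cls_branch(x), self.reg_out(reg_feat), self.obj_out(reg_feat)

head = DecoupledHead()
cls, reg, obj = head(torch.randn(1, 256, 20, 20))
print(tuple(cls.shape), tuple(reg.shape), tuple(obj.shape))
```

Each branch can now specialize: classification features need not compromise with localization features, which is the interference the decoupling removes.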

In terms of architecture, the backbone of YOLOv6 is EfficientRep, which is designed to handle parallel processing efficiently with modest memory use and high accuracy. The neck uses Rep-PAN, compared with the PANet of YOLOv5.

2.2.8 YOLOv7

In July 2022, the seventh version of the YOLO model was introduced, with several improvements over the previous version aimed at superior speed, as shown in Figure 2.15. One of the key elements retained is the use of anchor boxes.


Figure 2.15: Improvement of YOLOv7 Speed

Anchor boxes are a set of predefined boxes with different aspect ratios; this version employs 9 of them, which helps reduce the number of misidentified boxes. The changes in the YOLOv7 architecture include the Extended Efficient Layer Aggregation Network (E-ELAN) [12], a structure that allows more effective deep learning and convergence, particularly in the backbone. Model scaling, which enlarges the model to increase processing performance, is adapted for concatenation-based models; techniques such as depth scaling change the ratio between the input and output channels of a transition layer to alleviate hardware load.

YOLOv7 proposes a novel strategy for model scaling based on concatenation, where the depth and width of the block are scaled by the same coefficient to maintain the optimal structure of the model

The changes in this version of the Bag-of-Freebies (BoF) include re-parameterized convolutions. As in the previous version, the structure is inspired by Re-parameterized Convolution (RepConv); however, the identity connection in RepConv disrupts the residual connections of ResNet [13] and the concatenations of DenseNet [14]. Therefore, the identity connection is removed, and the module is renamed RepConvN. Coarse labels are assigned to the Auxiliary Head, while fine labels are assigned to the Lead Head: the Lead Head is responsible for the final output, while the Auxiliary Head assists the training process. Batch normalization is incorporated into the conv-bn-activation topology by integrating the batch norm's mean and variance into the bias and weights of the convolution layer during the inference stage. Implicit knowledge, inspired by YOLOR, is also incorporated.

References
[1] R. Lindsey and L. Dahlman, "Climate Change: Global Temperature," National Centers for Environmental Information (accessed Jul. 5, 2023).
[2] R. Bhandari, N. Neupane, and D. P. Adhikari, "Climatic change and its impact on tomato (Lycopersicum esculentum L) production in plain area of Nepal," ENVC, Aug. 2021, doi: 10.1016/j.envc.2021.100129.
[3] F. Branthôme, "Consumption: 2021 in the wake of 2020," Tomato News. https://www.tomatonews.com/en/consumption-2021-in-the-wake-of-2020_2_1618.html (accessed Jul. 5, 2023).
[4] M. J. Buschermohle and G. F. Grandle, Controlling the Environment in Greenhouses Used for Tomato Production, The University of Tennessee, Sep. 2018.
[5] Wikipedia, the free online encyclopedia, hosted by the Wikimedia Foundation.
[6] Ch. Usha Kumari, S. Jeevan Prasad, and G. Mounika, "Leaf Disease Detection: Feature Extraction with K-means Clustering and Classification with ANN," ICCMC, Aug. 2019, doi: 10.1109/ICCMC.2019.8819750.
[7] M. Lechner, R. Hasani, A. Amini, T. A. Henzinger, D. Rus, and R. Grosu, "Deep Computer Vision," arXiv, Jan. 20, 2021.
[8] R. Girshick, "Fast R-CNN," ICCV, Sep. 27, 2015, doi: 10.48550/arXiv.1504.08083.
[9] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," extended tech report, Jan. 6, 2016, doi: 10.48550/arXiv.1506.01497.
[10] T. AI, "How did Binary Cross-Entropy Loss Come into Existence?," 2023.
[11] M. A. Islam, S. Naha, M. Rochan, N. Bruce, and Y. Wang, "Label Refinement Network for Coarse-to-Fine Semantic Segmentation," arXiv, Mar. 2017, doi: 10.48550/arXiv.1703.00551.
[12] C.-Y. Wang, "Designing Network Design Strategies Through Gradient Path Analysis," arXiv, Nov. 2022, doi: 10.48550/arXiv.2211.04800.
[13] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," CVPR, Dec. 2016, doi: 10.1109/CVPR.2016.90.
[14] G. Huang and Z. Liu, "Densely Connected Convolutional Networks," CVPR, Nov. 2017, doi: 10.1109/CVPR.2017.243.
