QUALITY TRAINING
GRADUATION PROJECT
COMPUTER ENGINEERING TECHNOLOGY
DESIGN AND IMPLEMENTATION OF A
BABY MONITORING SYSTEM
ADVISOR: TRƯƠNG NGỌC SƠN, Assoc. Prof.
STUDENTS: LE QUANG TRUNG
MAI DUONG QUYEN
SKL009670
Ho Chi Minh City, December, 2022
Student name: Lê Quang Trung, Student ID: 18119048
Student name: Mai Dương Quyền, Student ID: 18119039
ADVISOR: Trương Ngọc Sơn, Assoc. Prof.
HO CHI MINH CITY - 12/2022
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND EDUCATION
FACULTY FOR HIGH QUALITY TRAINING
GRADUATION PROJECT
DESIGN AND IMPLEMENTATION OF A
BABY MONITORING SYSTEM
MAJOR: COMPUTER ENGINEERING TECHNOLOGY
THE SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom – Happiness
Ho Chi Minh City, December 24, 2022
GRADUATION PROJECT ASSIGNMENT
Student name: Lê Quang Trung
Major: Computer Engineering Technology Student ID: 18119048
Class: 18119CLA
Advisor: Trương Ngọc Sơn Phone number: 0837975783
Date of assignment: 1/10/2022 Date of submission: 25/10/2022
The monitoring system's primary operating environment is homes with infants older than six months. The final product is tested for monitoring performance on a Jetson Nano. The results are transmitted as Telegram notifications and as spoken announcements through a loudspeaker.
CHAIR OF THE PROGRAM ADVISOR
(Sign with full name) (Sign with full name)
THE SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom – Happiness
Ho Chi Minh City, December 24, 2022
ADVISOR’S EVALUATION SHEET
Student name: Lê Quang Trung Student ID: 18119048
Major: Computer Engineering Technology
Project title: Design and implementation of a baby monitoring system
Advisor: Trương Ngọc Sơn
EVALUATION
1 Content of the project:
(Sign with full name)
THE SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom– Happiness
Ho Chi Minh City, January 1st, 2023
PRE-DEFENSE EVALUATION SHEET
Major: Computer Engineering
Project title: Design and implementation of a baby monitoring system
Name of Reviewer: Pham Van Khoa
EVALUATION
- Please refer to existing products and provide the proposed specifications in detail
Ho Chi Minh City, January 1st, 2023
REVIEWER
(Sign with full name)
Pham Van Khoa
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND EDUCATION
SOCIALIST REPUBLIC OF VIETNAM
Ho Chi Minh City, January 10, 2023
MODIFYING EXPLANATION OF THE GRADUATION PROJECT
MAJOR: COMPUTER ENGINEERING TECHNOLOGY
1 Project title:
2 Student name: Mai Dương Quyền ID: 18119039
Student name: Lê Quang Trung ID: 18119048
3 Advisor:
4 Defending council: Council 2, Room: A3-404, 3rd January 2023
5 Modifying explanation of the graduation project:
No | Council comments | Editing results | Note
1 | Refer to the existing products and provide the proposed specification in detail | Completed additional modifications regarding existing products and provided detailed recommended specifications in Table 4.3, page 41 |
(Sign with full name) (Sign with full name) (Sign with full name)
THE SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom – Happiness
Ho Chi Minh City, , 2023
EVALUATION SHEET OF DEFENSE COMMITTEE MEMBER
Major: Computer Engineering
Project title: Design and implementation of a baby monitoring system
Advisor: Trương Ngọc Sơn
The implementation group hereby formally declares that the research and application that went into this thesis are our own work. No published work has been copied without citing its source. Any infractions that may have occurred are entirely our responsibility.
Students
Le Quang Trung
Mai Duong Quyen
During the project, the implementation team received many positive comments and much support, which enabled us to complete the project successfully.
First of all, the team would like to express our sincere thanks to the teachers of Ho Chi Minh City University of Technology and Education and the Faculty for High Quality Training for the graduation thesis. Drawing on the knowledge gained over four years of study, the team was able to apply it and find direction in this project, improving our understanding and helping us to approach future projects with more confidence.
In addition, the team would like to express our deep thanks to Dr. Truong Ngoc Son, who guided and devotedly supported the team during the implementation and completion of the project. The team also wishes to thank the students of class 18119CLA for their support, advice, and encouragement.
During the project, the team gained more knowledge from teachers, textbooks, and reference materials. However, due to our limited expertise and experience, shortcomings could not be avoided. The team looks forward to receiving attention and feedback from the teachers so that we can improve the project.
Finally, the team would like to wish all teachers of the Faculty of Electrical and Electronics Science and the Faculty for High Quality Training at Ho Chi Minh City University of Technology and Education, together with all students of the faculty, good health and much success.
Sincerely!
STUDENTS
Artificial intelligence (AI) is no longer an unfamiliar term in modern life. It has become an essential part of the technology industry in most sectors (healthcare, industry, surveillance, and manufacturing lines). Monitoring a baby is one of the concerns of parents with busy lives. In this project, the implementation group proposes ideas and implements solutions for monitoring a baby through a camera, tracking the baby's movement during sleep and on waking.
In this project, an LSTM network is built and combined with the MediaPipe library to automatically identify the baby's facial features and skeleton in order to detect three situations: (a) the baby waking up, (b) the baby moving and showing signs of moving out of bed, and (c) the baby leaving the monitoring range the parent has chosen. As soon as one of the three is detected, the system sends a notification to the parent's phone to report the baby's condition. Caregivers therefore do not need to be present at all times or to check on the baby constantly.
The system provides a low-cost monitoring solution that keeps track of the baby's condition, so that parents can have more peace of mind while spending time on other tasks.
Table of Contents
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
CHAPTER 1 OVERVIEW
1.1 INTRODUCTION
1.2 PROJECT OBJECTIVES
1.3 RESEARCH METHODOLOGY
1.4 THESIS OUTLINE
CHAPTER 2 BACKGROUND
2.1 INTRODUCTION TO DEEP LEARNING
2.1.1 Recurrent Neural Network (RNN)
2.2 LONG SHORT-TERM MEMORY NETWORK (LSTM)
2.2.1 Introduction to LSTM
2.2.2 Detail structure memory cell of LSTM
2.2.3 Type of advance LSTM model
2.2.4 Strengths and limitations of LSTM-RNN
2.3 OpenCV
2.3.1 Introduction to OpenCV
2.3.2 OpenCV Structure and Module
2.4 MediaPipe
2.4.1 Introduction to MediaPipe
2.4.2 MediaPipe solutions
2.4.2.1 Face Detection
2.4.2.2 Face Mesh
2.4.2.3 Hands Detection
2.4.2.4 Human Pose Estimation
2.5 Botogram (Telegram bot framework)
CHAPTER 3 DESIGN AND IMPLEMENTATION OF BABY MONITORING SYSTEM
3.1 HARDWARE DESIGN
3.1.1 Central Processing Block
3.1.2 Input Block
3.1.3 Output Block
3.2 SOFTWARE DESIGN
3.2.1 The overview of the software system
3.2.2 Dataset
3.2.3 Flowchart of data collection algorithm
3.2.3.1 Algorithm Flowchart
3.2.3.2 Training Flowchart
3.2.3.3 Flowchart of detection algorithm
CHAPTER 4 RESULTS AND DISCUSSIONS
4.1 Results of the practical model
4.2 System results and evaluation
4.2.1 Results
4.2.2 System evaluation
CHAPTER 5 CONCLUSION AND FURTHER WORK
5.1 Conclusion
5.2 Future work
REFERENCE
LIST OF FIGURES
Figure 2.1: The training process between Machine Learning and Deep Learning
Figure 2.2: Traditional Neural Network model
Figure 2.3: The structure Recurrent Neural Network have loops
Figure 2.4: Equivalence performance Recurrent Neural Network
Figure 2.5: Types of issues in RNN
Figure 2.6: In a typical RNN, the repeating module just has one layer
Figure 2.7: An LSTM has a repeating module with four interconnected layers
Figure 2.8: The chart present Tanh function
Figure 2.9: Details of cell state structure
Figure 2.10: Gate structure consists of a sigmoid layer and a multiplication
Figure 2.11: The chart present Sigmoid function
Figure 2.12: Forget Gate detail structure in first step
Figure 2.13: Input gate layer for second steps
Figure 2.14: The present process update Ct for internal cell at third step
Figure 2.15: The last step to filter information in output
Figure 2.16: Type of advance LSTM model
Figure 2.17: OpenCV timeline
Figure 2.18: The basic structures of OpenCV
Figure 2.19: MediaPipe is used for object detection
Figure 2.20: Face detection with MediaPipe
Figure 2.21: Output from landmark detection and segmentation
Figure 2.22: The Face Mesh created by 468 Landmark points on the face
Figure 2.23: Hand landmark
Figure 2.24: BlazePose Topology
Figure 2.25: Visual Translator Bot
Figure 3.1: System diagram
Figure 3.2: Pinout of Jetson Nano
Figure 3.3: Black 4MP USB Web Camera
Figure 3.4: The Z121 stereo speaker
Figure 3.5: The Glowy 19inch computer screen
Figure 3.6: Overview of connected hardware devices
Figure 3.7: Block diagram of software system
Figure 3.8: Illustration of the video in the dataset
Figure 3.9: Flowchart to get data of baby wake up detection
Figure 3.10: The image depicts 12 selected points in the model
Figure 3.11: Feature points of the eyes
Figure 3.12: Flowchart of getting data of body motion detection
Figure 3.13: Flowchart of the training algorithm
Figure 3.14: Flowchart of the algorithm to detect the baby waking up
Figure 3.15: Flowchart of algorithm to detect moving baby
Figure 3.16: Flowchart of algorithm to detect baby outside
Figure 4.1: Image depict the actual model
Figure 4.2: The test result on three case (a-b) wake up, (c-d) moving, and (e-f) outside
Figure 4.3: The results of the notification are to be sent to the user's phone through the Telegram app
LIST OF TABLES
Table 4.1: The table describes the accuracy of "Baby wake-up detection"
Table 4.2: The table describes the accuracy of "Moving baby detection"
Table 4.3: Comparison table with previous models [15]
LIST OF ABBREVIATIONS
AI: Artificial Intelligence
RNN: Recurrent Neural Network
LSTM: Long Short-Term Memory
... on the aforementioned issue. The system can monitor the baby and send a notification directly to the parent's phone to report the baby's condition when the baby wakes up or shows signs of moving out of a designated area within the camera's field of view. Many baby-monitoring solutions already exist, each using different algorithms and identification methods, but the common goal of all of them is to achieve accuracy in real time. Because a baby's state changes frequently during sleep, the system must continuously update the baby's behavior status so that notifications reach the parents quickly. Therefore, in this project, the group aims at a system that integrates the identification of several characteristics of the baby's behavior.
Firstly, to determine whether the baby is asleep or awake, the group calculates the EAR value from 12 points marked on the baby's eyes. Next, the MediaPipe library is used to identify whether the baby is moving, through the points it marks on the skeleton. Finally, the system lets the user draw and select a monitored region, and then uses an algorithm to check whether the baby is still inside that region, so that the baby cannot leave the observed area unnoticed. If any one of the above events is detected, the system sends a photo of the baby with a notification directly to the parent via Telegram.
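The wake-up check described above can be illustrated with a minimal sketch. It assumes that EAR denotes the eye aspect ratio in its commonly used six-points-per-eye form (12 points for both eyes); the landmark ordering, the toy coordinates, and the threshold value of 0.2 are illustrative assumptions and are not taken from the thesis.

```python
import math

def eye_aspect_ratio(eye):
    """Return the EAR for six (x, y) eye landmarks ordered p1..p6.

    Assumed ordering: p1 and p4 are the horizontal eye corners;
    (p2, p6) and (p3, p5) are the two upper/lower eyelid pairs.
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)

# Hypothetical threshold; the value used in the thesis is not stated here.
EAR_THRESHOLD = 0.2

def eyes_open(left_eye, right_eye, threshold=EAR_THRESHOLD):
    """Average the EAR of both eyes (12 points) and compare it to the threshold."""
    ear = (eye_aspect_ratio(left_eye) + eye_aspect_ratio(right_eye)) / 2.0
    return ear > threshold

# Toy coordinates: a wide-open eye and a nearly closed eye.
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.1), (2, 0.1), (3, 0), (2, -0.1), (1, -0.1)]

print(eye_aspect_ratio(open_eye))
print(eyes_open(open_eye, open_eye), eyes_open(closed_eye, closed_eye))
```

A ratio is used rather than a raw eyelid distance so that the value stays comparable when the baby's face is closer to or farther from the camera.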
1.2 PROJECT OBJECTIVES
Analyzing and collecting data on the baby's behavior during sleep is one of the first steps of the project. It aims to provide solutions suited to the users (parents and other caregivers) and to ensure applicability and high accuracy.
Design and implement an LSTM network model suited to the monitoring system and meeting the requirements set out. Selectively compute points on the skeleton through the MediaPipe library and the EAR of the eyes to identify when the baby wakes up. The last goal is to use Telegram to send notification data to, and receive it on, the parent's phone.
A summary of the system design and implementation process includes the following steps:
1. Collect data on the actions that occur as the baby goes from sleeping to waking up (crawling, moving arms or legs, rolling). From these, select the actions and body parts that tend to change between sleeping and waking.
2. Calculate the actions and body parts in preparation for training and program execution. Develop the training model and apply the algorithms and image-processing libraries (MediaPipe, OpenCV, Shapely.Geometry) needed to test and run a demo of the system in software.
3. Test and optimize on test sets to ensure stable execution, achieve high accuracy, and limit errors during execution.
Design and implement a baby monitoring system based on the baby's behavior and skeleton during sleep. According to existing reports and the theoretical basis, similar systems have been described in previous scientific papers and research projects.
The baby's parameters during awakening are calculated and determined with algorithms that mark landmarks on both eyes, combined with recognition of the baby's skeleton to determine behaviors and movements during sleep monitoring.
The system's ability to recognize and monitor three baby behaviors simultaneously is one of its key benefits:
- The baby waking up
- The baby moving and showing signs of getting out of bed
- The baby staying inside the monitoring range selected by the parent
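The third behavior, keeping the baby inside a parent-selected range, amounts to a point-in-polygon test. The project lists Shapely.Geometry for this kind of geometry; the sketch below swaps in a plain ray-casting test instead so that it runs with the standard library only. The function names, the rule that every tracked landmark must be inside, and the pixel coordinates are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices in order; the last vertex
    connects back to the first.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going right from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def baby_in_region(landmarks, region):
    """Assumed rule: the baby is inside only if every tracked landmark is inside."""
    return all(point_in_polygon(p, region) for p in landmarks)

# Hypothetical monitored region (pixel coordinates) drawn by the parent.
region = [(100, 100), (500, 100), (500, 400), (100, 400)]

print(baby_in_region([(200, 200), (300, 250)], region))
print(baby_in_region([(200, 200), (550, 250)], region))
```

Requiring all landmarks to be inside makes the check strict; a more tolerant variant could alert only when a chosen fraction of landmarks has left the region.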
The results will be compared with those of similar systems in order to improve on the shortcomings of other approaches.
1.4 THESIS OUTLINE
The project consists of 5 main chapters, detailed as follows:
CHAPTER 1 - OVERVIEW: This chapter briefly introduces the issue, the solutions, and the goals and scope of the research.
CHAPTER 2 - BACKGROUND: In this chapter, the implementation group discusses the theory of neural networks (RNN, LSTM), as well as the Python programming language and the libraries used in the project: MediaPipe (support for landmarks on the skeleton) and OpenCV (computer vision).
CHAPTER 3 - SYSTEM DESIGN: This chapter presents the analysis of the system's block diagram, the solutions proposed by the team, and the details of the functional components of each block.
CHAPTER 4 - EXPERIMENTAL RESULTS: This chapter covers running the system on the hardware, presenting the execution results, and building a complete system model in order to evaluate it in all aspects.
CHAPTER 5 - CONCLUSION: This chapter presents the results achieved after completing the system and gives directions for developing and expanding the application of the system in the future.
CHAPTER 2 BACKGROUND
This chapter covers the theory of deep learning networks, as well as the RNN and LSTM artificial neural network models. In addition, it provides an overview of the Python programming language and the framework tools that support deep learning applications.
2.1 INTRODUCTION TO DEEP LEARNING
2.1.1 Recurrent Neural Network (RNN)
The new technologies that computer science delivers are evolving swiftly and improving with each passing day. Artificial Intelligence, and more specifically Machine Learning and Deep Learning, are no longer unknown terms in today's society. Deep learning has helped computers with tasks that were long considered difficult, such as identifying countless distinct objects in photographs and recognizing speech and handwriting, allowing them to interact with people.
In essence, Deep Learning is a subset of Machine Learning, a very large and computationally intensive field that includes a wide range of approaches and methods applied to many problems:
- Linear Regression
- Logistic Regression
- Neural Network
- Support Vector Machine
Figure 2.1: The training process between Machine Learning and Deep Learning
It is hard to discuss deep learning without mentioning the Recurrent Neural Network (RNN) for sequence problems. A conventional neural network model has three main parts: the input layer, the hidden layer, and the output layer. Because of this structure, the inputs and outputs of a traditional neural network are independent of one another. This is one of the main shortcomings of the traditional neural network: it is not suitable for sequence or time-series problems in which each prediction depends on the data and predictions that came before.
Figure 2.2: Traditional Neural Network model
To solve the above problem, the RNN model was introduced. Its main idea is to use internal loops that let the network store information from earlier computational steps and use it to make the prediction at the current step.
RNNs (recurrent neural networks) [1, 2] are dynamic systems with internal states that change with each classification time step. This is due to circular connections between neurons in upper and lower layers, as well as optional self-feedback connections. Thanks to these feedback links, RNNs can convey data from previous events to the current processing stage. As a result, RNNs build up a memory of time-series events.
The recurrent layers, or hidden layers, in RNNs are made up of recurrent cells with feedback connections, whose states are influenced by both past and present input. Different RNNs may be created by arranging the recurrent layers in different ways. RNNs may therefore be characterized primarily by their network architecture and recurrent cell design. Distinct cells and inner connections give RNNs different capabilities [3].
Figure 2.3: The structure Recurrent Neural Network have loops
Basically, a recurrent neural network A takes an input $x_t$ and produces an output $h_t$; a loop allows information to be passed from one step of the network to the next. Unrolling this loop gives a chain of copies of the same network, each passing its memory to its successor.
Figure 2.4: Equivalence performance Recurrent Neural Network
The model above describes the computation inside the RNN:
- $x_0, x_1, x_2, \dots, x_t$: the inputs at steps 0 to $t$, given as one-hot vectors.
- $A_t$: the hidden state at step $t$. This is the memory of the network; it is calculated from the previous hidden state $A_{t-1}$ and the input at that step:
$$A_t = f(U x_t + W A_{t-1})$$
The function $f$ is usually a nonlinear function such as the hyperbolic tangent (tanh) or ReLU.
- $h_t$: the output at step $t$. In this case, $h_t$ is a probability vector, computed from the network's memory, that is used to estimate the network's next state:
$$h_t = \mathrm{softmax}(V A_t)$$
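The two equations above can be traced with a small sketch of a single recurrent step, written with plain Python lists. It assumes tanh for the function f; the matrix sizes and weight values are arbitrary illustrative numbers, not parameters from the project.

```python
import math

def matvec(M, v):
    """Multiply a list-of-rows matrix M by a vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def softmax(z):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def rnn_step(x_t, a_prev, U, W, V):
    """One step: A_t = tanh(U x_t + W A_{t-1}), h_t = softmax(V A_t)."""
    pre = [u + w for u, w in zip(matvec(U, x_t), matvec(W, a_prev))]
    a_t = [math.tanh(p) for p in pre]
    h_t = softmax(matvec(V, a_t))
    return a_t, h_t

# Toy sizes: 3-dimensional one-hot input, 2 hidden units, 3 output classes.
U = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
W = [[0.1, 0.4], [-0.3, 0.2]]
V = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]

a = [0.0, 0.0]  # initial hidden state
for x in ([1, 0, 0], [0, 1, 0], [0, 0, 1]):
    a, h = rnn_step(x, a, U, W, V)
    print([round(p, 3) for p in h])
```

The same U, W, and V are reused at every step; only the hidden state changes, which is how information from earlier inputs reaches later outputs.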
Softmax Function:
The softmax function is a normalized exponential function. In general, it gives the likelihood of each class out of all possible classes. This probability is then used to determine the target class for the input.
Specifically, the softmax function transforms a k-dimensional vector of arbitrary real values into a k-dimensional vector of real values that sum to 1. Input values can be positive, negative, zero, or greater than 1, but the softmax function always turns them into values between 0 and 1.
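The description above can be written compactly as a formula. The block below gives the standard definition of softmax for a k-dimensional input z, followed by a small worked example; the input values (1, 2, 3) are illustrative and do not come from the project.

```latex
\[
\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{k} e^{z_j}}, \qquad i = 1, \dots, k
\]
% Worked example with z = (1, 2, 3):
\[
e^{1} \approx 2.718, \quad e^{2} \approx 7.389, \quad e^{3} \approx 20.086, \quad
\sum_{j} e^{z_j} \approx 30.193
\]
\[
\mathrm{softmax}(z) \approx (0.090,\ 0.245,\ 0.665), \qquad 0.090 + 0.245 + 0.665 = 1.000
\]
```

The largest input receives the largest probability, but every class keeps a non-zero share, which is why the output can be read as a probability distribution over classes.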
Figure 2.5: Types of issues in RNN
- One to one: the simplest case, for problems with 1 input and 1 output, often seen with the Neural Network (NN) and the Convolutional Neural Network (CNN).
- One to many: problems with one input but many outputs.
- Many to one: the opposite of one to many, used in problems with many inputs but only 1 output.
- Many to many: problems with many inputs and many outputs.
RNN is now used in deep learning to solve issues involving sequence data or time-series data Typical examples of these applications include:
2.2 LONG SHORT-TERM MEMORY NETWORK (LSTM)
2.2.1 Introduction to LSTM
The use of recurrent neural networks (RNNs) has become widespread in fields of study that use sequential data, such as text, audio, and video. However, when the gap between relevant inputs is wide, RNNs made up of sigma cells or tanh cells are unable to learn the pertinent information from the input data. Long short-term memory (LSTM) deals with long-term dependencies effectively by including gate functions in the cell structure [3].
The Long Short-Term Memory network, commonly known as LSTM, is a special type of RNN structure. In theory, an RNN can carry information from one step to the next, but in practice it can only carry information across a handful of nearby states, because the gradient vanishes. In other words, an RNN can only learn from states that are close to one another (short-term memory). The LSTM network was created to overcome this limitation of the RNN structure.
LSTM was first presented by Hochreiter & Schmidhuber in 1997. Owing to its efficacy, it has since undergone significant improvement and become very popular in machine learning and deep learning. LSTM is designed to avoid the long-term dependency problem from which the RNN structure suffers. Remembering information over long periods is practically its default behavior rather than something it must struggle to learn, which distinguishes LSTM from typical neural network designs.
Figure 2.6: In a typical RNN, the repeating module just has one layer.
Like other recurrent networks, LSTM has a chain-like architecture, but its repeating module has a different structure from that of the standard RNN. Instead of a single neural network layer (tanh), an LSTM module has four layers interacting with one another, as shown below.
Figure 2.7: An LSTM has a repeating module with four interconnected layers
Tanh Function
The tanh function is frequently encountered in deep learning. Like the sigmoid function, it takes a real number as input, but it maps the input to a value in the range (-1, 1). The tanh function saturates at both ends, resembling the sigmoid function. However, its output is symmetric about zero, which alleviates one of the drawbacks of the sigmoid function.
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
Figure 2.8: The chart present Tanh function
To overcome the short-term memory problem of the RNN structure, the LSTM uses the cell state to carry information along the chain (the network nodes). This is possible because information can flow along the cell state easily without being changed.
Figure 2.9: Details of cell state structure
Besides keeping information throughout the chain, the LSTM can also select (discard or add) the information required for the cell. The cell state is adjusted through gates, which screen the information passing through them. Each gate combines a sigmoid network layer, whose output lies in the range (0, 1), with a multiplication. A value of 0 means that no information passes through, whereas a value of 1 means that all information does. An LSTM typically has three gates to maintain and control the cell state.
Figure 2.10: Gate structure consists of a sigmoid layer and a multiplication
Sigmoid Function (Logistic Function)
The sigmoid function takes a real number as input and converts it into a value in the range (0, 1), which can be interpreted as a probability. If the input is a large negative real number, the output is close to zero; conversely, if the input is a large positive real number, the output is close to one. Thanks to this property, a small change in the input does not change the output much, so the output is stable and continuous with respect to the input.
Equation: $\sigma(x) = \frac{1}{1 + e^{-x}}$
Vanishing gradients caused by saturated sigmoid neurons are another significant drawback of the sigmoid function and one of the reasons it is less common today. The problem is readily apparent when the input has a large absolute value (negative or positive): the gradient of the function is then very close to zero, so the coefficients of the units under consideration are almost not updated (vanishing gradient).
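The saturation effect described above can be checked numerically. The sketch below evaluates the sigmoid and its derivative, which equals sigmoid(x) * (1 - sigmoid(x)), at a few illustrative inputs; the chosen inputs are arbitrary.

```python
import math

def sigmoid(x):
    """Logistic function 1 / (1 + e^(-x)); adequate for moderate |x|."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: s * (1 - s), where s = sigmoid(x)."""
    s = sigmoid(x)
    return s * (1.0 - s)

for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"x={x:6.1f}  sigmoid={sigmoid(x):.5f}  gradient={sigmoid_grad(x):.5f}")
```

The gradient peaks at 0.25 for x = 0 and falls towards zero as |x| grows, so a weight update that passes through a saturated sigmoid unit is scaled down to almost nothing.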
Figure 2.11: The chart present Sigmoid function
2.2.2 Detail structure memory cell of LSTM
In the first step, information is selectively removed from the internal state of the cell at the previous time step, $C_{t-1}$. The activation value $f_t$ of the forget gate at time $t$ is calculated from the current input $x_t$, the output $h_{t-1}$ of the previous LSTM cell, and the forget gate bias $b_f$. The sigmoid function, introduced above, maps every activation value into the range (0, 1).
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$
Figure 2.12: Forget Gate detail structure in first step.
In the second step, the LSTM cell decides which information is to be stored in the internal state $C_t$. This step involves two calculations, for $i_t$ and $\tilde{C}_t$. The sigmoid function at the input gate layer produces $i_t$, which selects the values to update, and $\tilde{C}_t$ represents the candidate information that may be added to the cell state.
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$$
The tanh layer builds the vector of candidate values $\tilde{C}_t$, which is combined with $i_t$ to provide the update for the cell state.
Figure 2.13: Input gate layer for second steps
In the third step, the previous cell state $C_{t-1}$ is updated to the new state $C_t$ using the results calculated previously.
Figure 2.14: The present process update Ct for internal cell at third step.
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$
Here the old state $C_{t-1}$ is multiplied element-wise (Hadamard product, $*$) by $f_t$ to remove the information we decided to forget earlier, and then $i_t * \tilde{C}_t$ is added. The new state therefore depends on how strongly each value is to be updated.
As a final step, the cell output is calculated and filtered. First, a sigmoid layer determines which parts of the cell state will be output. Then the cell state $C_t$ is passed through the tanh function to limit its values to the range (-1, 1), and this value is multiplied by the output of the sigmoid layer to obtain the desired output.
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t * \tanh(C_t)$$
Figure 2.15: The last step to filter information in output
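The four steps of this section can be combined into one cell update. The sketch below is a minimal version using plain Python lists and the standard LSTM gate equations (a sigmoid layer for each of the forget, input, and output gates, and a tanh layer for the candidate); the parameter names Wf, Wi, Wc, Wo and their values are hypothetical and chosen only to make the example run.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, b, v):
    """Compute W v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM cell step following the four steps in the text."""
    z = h_prev + x_t  # concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(v) for v in affine(p["Wf"], p["bf"], z)]      # step 1: forget gate
    i_t = [sigmoid(v) for v in affine(p["Wi"], p["bi"], z)]      # step 2: input gate
    c_hat = [math.tanh(v) for v in affine(p["Wc"], p["bc"], z)]  # step 2: candidate
    # Step 3: element-wise update of the cell state.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    o_t = [sigmoid(v) for v in affine(p["Wo"], p["bo"], z)]      # step 4: output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

# Toy sizes: 2 hidden units and 1 input feature, so each W is 2 x 3.
params = {
    "Wf": [[0.1, 0.2, 0.3], [0.0, -0.1, 0.2]], "bf": [0.0, 0.0],
    "Wi": [[0.2, -0.3, 0.1], [0.4, 0.1, -0.2]], "bi": [0.0, 0.0],
    "Wc": [[0.3, 0.1, -0.4], [-0.2, 0.5, 0.1]], "bc": [0.0, 0.0],
    "Wo": [[-0.1, 0.2, 0.4], [0.3, -0.3, 0.2]], "bo": [0.0, 0.0],
}

h, c = [0.0, 0.0], [0.0, 0.0]
for x in ([1.0], [0.5], [-1.0]):
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

Because the cell state is updated only by an element-wise multiplication and an addition, information can pass through many steps with little change when the forget gate stays close to 1.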
2.2.3 Type of advance LSTM model
From the description above, the architecture presented is one of the most basic LSTM architectures. There are now many improved variants of the original, intended to be more useful in training and to boost performance.
Figure 2.16: Type of advance LSTM model
One LSTM variant, type (1), introduced by Gers & Schmidhuber, adds pathways (peephole connections) to the gates, which lets the gate layers receive the cell state as input.
Another variant, type (2), couples the forget gate and the input gate. Instead of deciding separately what to discard and what to add, as in the traditional LSTM architecture, the two gates are connected so that information is discarded only when it is replaced with new information.
Finally, a more complex but equally interesting variant of LSTM is the Gated Recurrent Unit (GRU), type (3). This variant combines the forget and input gates into a single update gate. It also merges the cell state and the hidden state, and makes some further changes.
2.2.4 Strengths and limitations of LSTM-RNN
According to [4], LSTM performs exceptionally well when a small amount of information must be retained for a long period. This characteristic is credited to the use of memory blocks. Memory blocks are an interesting architectural design: input and output gates provide access control for them, preventing the entry or exit of unrelated information. Memory blocks also feature a forget gate that weights the information stored in each cell. When a cell's prior information is no longer relevant, the forget gate can reset the state of that cell in the block. Because forget gates can make cells entirely forget their former state, which prevents prediction bias, they also make continuous prediction possible [5].
Like other algorithms, LSTM requires the network's topology to be specified in advance. Network memory is ultimately limited, since the number of memory blocks in a network does not change dynamically. Additionally, [4] notes that this restriction cannot be overcome by uniformly expanding the network size and proposes that modularization encourages efficient learning. The modularization method, however, is "not typically apparent".
2.3 OpenCV
2.3.1 Introduction to OpenCV
Computer vision, or simply "making a computer see," is the science of programming a computer to analyze and eventually comprehend pictures and video [6]. Solving even small parts of certain computer vision problems opens up exciting new possibilities in technology, engineering, and even entertainment. A library of programming functions with efficient and portable code, ideally provided for free, is essential for advancing vision research and disseminating vision knowledge.
This was one of the Intel team's primary objectives when OpenCV (Open Source Computer Vision Library) was formally introduced in 1999. Many programmers have since contributed to the library. A significant upgrade, known as OpenCV 2, was made in 2009 and mostly involved the C++ interface. The most recent library release can be found on the official OpenCV website.
The package now contains almost 2500 optimized algorithms. It has more than 2.5 million downloads and more than 40 thousand users worldwide. OpenCV is released under a BSD license and is used for both business and academic purposes. Several books on OpenCV should be consulted in order to learn every aspect of the library. Nevertheless, with a fundamental understanding of OpenCV from this paper, reading such in-depth material should be simpler. In fact, the material supplied here purposefully follows one of the most recent OpenCV sources [9] closely, to make this even more convenient.
Figure 2.17: OpenCV timeline
In terms of the features offered by OpenCV, it can be divided into the following groups:
- Object detection (objdetect, features2d, nonfree)
- Image/video I/O, processing, and display (core, imgproc, highgui)
- CUDA acceleration support (gpu)
- Clustering and machine learning
2.3.2 OpenCV Structure and Module
OpenCV has a structure with 5 main sections, shown in the figure below. The CV component contains basic image processing techniques and higher-level computer vision techniques; ML is the machine learning library, containing a large number of statistical classifiers and clustering tools. HighGUI includes I/O capabilities for saving and loading video and photos, and CXCore includes the basic content and data structures.
Figure 2.18: The basic structures of OpenCV
Besides, OpenCV is organized into modules; in other words, it consists of a number of static or shared libraries.
Some popular modules are now supported in OpenCV:
- Core functionality: a compact module used to define basic data structures
including
- Image Processing (imgproc): module for processing images, including linear and non-linear image filtering, geometric transformations (resizing, perspective), color space conversion, and histograms.
- Video Analysis: module for analyzing video, including motion estimation, background subtraction, and other algorithms depending on the problem.
- 2D Features Framework (features2d): module for detecting salient features and computing their descriptors, used to retrieve parameters.
- Object Detection (objdetect): detects objects and instances of predefined classes (people, animals, vehicles, ...).
- Video I/O (videoio): easy-to-use interface for video capture and encoding.
2.4 MediaPipe
2.4.1 Introduction to MediaPipe
MediaPipe is a framework for creating pipelines that carry out inference operations over any type of sensory data. Model inference, media processing algorithms, data transformations, and other modular components may all be combined to create a perception