
Design and implementation of a baby monitoring system


DOCUMENT INFORMATION

Basic information

Title: Design and Implementation of a Baby Monitoring System
Advisor: Trương Ngọc Sơn, Assoc. Prof.
University: Ho Chi Minh City University of Technology and Education
Major: Computer Engineering Technology
Type: graduation project
Year of publication: 2022
City: Ho Chi Minh City
Number of pages: 66
File size: 4.37 MB



FACULTY FOR HIGH QUALITY TRAINING

GRADUATION PROJECT
COMPUTER ENGINEERING TECHNOLOGY

DESIGN AND IMPLEMENTATION OF A
BABY MONITORING SYSTEM

ADVISOR: TRƯƠNG NGỌC SƠN, Assoc. Prof.
STUDENTS: LE QUANG TRUNG, MAI DUONG QUYEN

SKL009670

Ho Chi Minh City, December 2022


Student name: Lê Quang Trung    Student ID: 18119048
Student name: Mai Dương Quyền    Student ID: 18119039
Advisor: Trương Ngọc Sơn, Assoc. Prof.

HO CHI MINH CITY - 12/2022


HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND EDUCATION
FACULTY FOR HIGH QUALITY TRAINING

GRADUATION PROJECT

DESIGN AND IMPLEMENTATION OF A
BABY MONITORING SYSTEM

MAJOR: COMPUTER ENGINEERING TECHNOLOGY



THE SOCIALIST REPUBLIC OF VIETNAM

Independence – Freedom – Happiness

Ho Chi Minh City, December 24, 2022

GRADUATION PROJECT ASSIGNMENT

Student name: Lê Quang Trung

Major: Computer Engineering Technology Student ID: 18119048

Class: 18119CLA

Advisor: Trương Ngọc Sơn Phone number: 0837975783

Date of assignment: 1/10/2022 Date of submission: 25/10/2022

The monitoring system's primary operating environment is homes with infants older than six months. The final product is tested for monitoring performance on a Jetson Nano. The results are sent as notifications via Telegram and announced through a loudspeaker.

CHAIR OF THE PROGRAM ADVISOR

(Sign with full name) (Sign with full name)


THE SOCIALIST REPUBLIC OF VIETNAM

Independence – Freedom – Happiness

Ho Chi Minh City, December 24, 2022

ADVISOR’S EVALUATION SHEET

Student name: Lê Quang Trung Student ID: 18119048

Major: Computer Engineering Technology

Project title: Design and implementation of a baby monitoring system

Advisor: Trương Ngọc Sơn

EVALUATION

1 Content of the project:



(Sign with full name)

THE SOCIALIST REPUBLIC OF VIETNAM

Independence – Freedom – Happiness

Ho Chi Minh City, January 1st, 2023

PRE-DEFENSE EVALUATION SHEET

Major: Computer Engineering

Project title: Design and implementation of a baby monitoring system

Name of Reviewer: Pham Van Khoa

EVALUATION

- Please refer to the existing products and provide the proposed specification in detail.

Ho Chi Minh City, January 1st, 2023

REVIEWER

(Sign with full name)

Pham Van Khoa


HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND EDUCATION
SOCIALIST REPUBLIC OF VIETNAM

Ho Chi Minh City, January 10, 2023

EXPLANATION OF MODIFICATIONS TO THE GRADUATION PROJECT
MAJOR: COMPUTER ENGINEERING TECHNOLOGY

1 Project title:

2 Student name: Mai Dương Quyền, ID: 18119039; Student name: Lê Quang Trung, ID: 18119048

3 Advisor:

4 Defending council: Council 2, Room: A3-404, 3rd January 2023

5 Explanation of modifications to the graduation project:

No. | Council comments | Editing results | Note
1 | Refer to the existing products and provide the proposed specification in detail | Completed additional modifications regarding existing products and provided the detailed recommended specifications in Table 4.3, page 41 |

(Sign with full name) (Sign with full name) (Sign with full name)


THE SOCIALIST REPUBLIC OF VIETNAM

Independence – Freedom – Happiness

Ho Chi Minh City, 2023

EVALUATION SHEET OF DEFENSE COMMITTEE MEMBER

Major: Computer Engineering

Project title: Design and implementation of a baby monitoring system

Advisor: Trương Ngọc Sơn


We hereby formally declare that the research and application presented in this thesis are our own work. No published work has been copied without being cited as a source. We take full responsibility for any violations that may have occurred.

Students

Le Quang Trung, Mai Duong Quyen


During the project, our team received a great deal of constructive feedback and support, which enabled us to complete it successfully.

First of all, we would like to sincerely thank the teachers of Ho Chi Minh City University of Technology and Education and the Faculty for High Quality Training for guiding our graduation thesis. Building on the knowledge gained over four years of study, we were able to apply it to our project; through the project, we deepened our understanding and found our own direction, which will make us more confident in future projects.

In addition, we would like to express our deep gratitude to Dr. Truong Ngoc Son, who guided and dedicatedly supported the team throughout the implementation and completion of the project. We also thank the students of class 18119CLA for their support, advice, and encouragement.

During the project, the team gained further knowledge from teachers, textbooks, and reference materials. However, due to our limited expertise and experience, some shortcomings were unavoidable. We look forward to receiving feedback from the teachers so that we can improve the project.

Finally, we wish all teachers of the Faculty of Electrical and Electronics Science and the Faculty for High Quality Training at Ho Chi Minh City University of Technology and Education, together with all students of the faculty, good health and great success.

Sincerely!

STUDENTS


Artificial intelligence (AI) is no longer an unfamiliar term in modern life. It has become an essential part of the technology industry in most sectors (healthcare, industry, surveillance, and manufacturing lines). Monitoring a baby is a major concern for parents with busy lives. In this topic, our team proposes and implements a solution that monitors a baby's movements during sleep and after waking up through a camera.

In this project, an LSTM network is built and combined with the MediaPipe library to automatically identify the baby's facial features and skeleton in order to detect the following behaviors: (a) the baby waking up; (b) the baby moving and showing signs of getting out of bed; and (c) the baby leaving the monitoring area chosen by the parent. As soon as any one of these three conditions is detected, the system sends a notification to the parent's phone to report the baby's condition. Therefore, caregivers do not need to be present at all times to check on the baby.

The system offers a low-cost monitoring solution that continuously tracks the baby's condition, so that parents can spend time on other tasks with greater peace of mind.


Table of Contents

LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
CHAPTER 1 OVERVIEW
1.1 INTRODUCTION
1.2 PROJECT OBJECTIVES
1.3 RESEARCH METHODOLOGY
1.4 THESIS OUTLINE
CHAPTER 2 BACKGROUND
2.1 INTRODUCTION TO DEEP LEARNING
2.1.1 Recurrent Neural Network (RNN)
2.2 LONG SHORT-TERM MEMORY NETWORK (LSTM)
2.2.1 Introduction to LSTM
2.2.2 Detail structure memory cell of LSTM
2.2.3 Type of advance LSTM model
2.2.4 Strengths and limitations of LSTM-RNN
2.3 OpenCV
2.3.1 Introduction to OpenCV
2.3.2 OpenCV Structure and Module
2.4 MediaPipe
2.4.1 Introduction to MediaPipe
2.4.2 MediaPipe solutions
2.4.2.1 Face Detection
2.4.2.2 Face Mesh
2.4.2.3 Hands Detection
2.4.2.4 Human Pose Estimation
2.5 Botogram (Telegram bot framework)
CHAPTER 3 DESIGN AND IMPLEMENTATION OF BABY MONITORING SYSTEM
3.1 HARDWARE DESIGN
3.1.1 Central Processing Block
3.1.2 Input Block
3.1.3 Output Block
3.2 SOFTWARE DESIGN
3.2.1 The overview of the software system
3.2.2 Dataset
3.2.3 Flowchart of data collection algorithm
3.2.3.1 Algorithm Flowchart
3.2.3.2 Training Flowchart
3.2.3.3 Flowchart of detection algorithm
CHAPTER 4 RESULTS AND DISCUSSIONS
4.1 Results of the practical model
4.2 System results and evaluation
4.2.1 Results
4.2.2 System evaluation
CHAPTER 5 CONCLUSION AND FURTHER WORK
5.1 Conclusion
5.2 Future work
REFERENCE


LIST OF FIGURES

Figure 2.1: The training process between Machine Learning and Deep Learning
Figure 2.2: Traditional Neural Network model
Figure 2.3: The structure Recurrent Neural Network have loops
Figure 2.4: Equivalence performance Recurrent Neural Network
Figure 2.5: Types of issues in RNN
Figure 2.6: In a typical RNN, the repeating module just has one layer
Figure 2.7: An LSTM has a repeating module with four interconnected layers
Figure 2.8: The chart present Tanh function
Figure 2.9: Details of cell state structure
Figure 2.10: Gate structure consists of a sigmoid layer and a multiplication
Figure 2.11: The chart present Sigmoid function
Figure 2.12: Forget Gate detail structure in first step
Figure 2.13: Input gate layer for second steps
Figure 2.14: The present process update Ct for internal cell at third step
Figure 2.15: The last step to filter information in output
Figure 2.16: Type of advance LSTM model
Figure 2.17: OpenCV timeline
Figure 2.18: The basic structures of OpenCV
Figure 2.19: MediaPipe is used for object detection
Figure 2.20: Face detection with MediaPipe
Figure 2.21: Output from landmark detection and segmentation
Figure 2.22: The Face Mesh created by 468 Landmark points on the face
Figure 2.23: Hand landmark
Figure 2.24: BlazePose Topology
Figure 2.25: Visual Translator Bot
Figure 3.1: System diagram
Figure 3.2: Pinout of Jetson Nano
Figure 3.3: Black 4MP USB Web Camera
Figure 3.4: The Z121 stereo speaker
Figure 3.5: The Glowy 19inch computer screen
Figure 3.6: Overview of connected hardware devices
Figure 3.7: Block diagram of software system
Figure 3.8: Illustration of the video in the dataset
Figure 3.9: Flowchart to get data of baby wake up detection
Figure 3.10: The image depicts 12 selected points in the model
Figure 3.11: Feature points of the eyes
Figure 3.12: Flowchart of getting data of body motion detection
Figure 3.13: Flowchart of the training algorithm
Figure 3.14: Flowchart of the algorithm to detect the baby waking up
Figure 3.15: Flowchart of algorithm to detect moving baby
Figure 3.16: Flowchart of algorithm to detect baby outside
Figure 4.1: Image depict the actual model
Figure 4.2: The test result on three case (a-b) wake up, (c-d) moving, and (e-f) outside
Figure 4.3: The results of the notification are to be sent to the user's phone through the Telegram app


LIST OF TABLES

Table 4.1: The table describes the accuracy of "Baby wake-up detection"
Table 4.2: The table describes the accuracy of "Moving baby detection"
Table 4.3: Comparison table with previous models [15]


LIST OF ABBREVIATIONS

AI    Artificial Intelligence
RNN   Recurrent Neural Network
LSTM  Long Short-Term Memory


on the aforementioned issue. The system monitors the baby and sends a notification directly to the parent's phone when the baby wakes up or shows signs of moving out of a designated area within the camera's field of view. Many baby monitoring solutions already exist, and each uses different algorithms and identification methods, but they share a common goal: accurate detection in real time. Because a sleeping baby changes state frequently, the system must continuously update the baby's behavioral status so that it can notify the parents quickly. Therefore, in this topic, the group aims to build a system that integrates the identification of several characteristics of the baby's behavior.

First, to determine whether the baby is asleep or awake, the group calculates the eye aspect ratio (EAR) from 12 points marked on the baby's eyes. Next, the MediaPipe library is used to determine whether the baby tends to be moving, based on the skeleton points it marks. Finally, the system lets the user draw and select a monitored region; an algorithm then checks whether the baby is initially inside that area, so that the baby cannot leave the observed area unnoticed. If any one of these conditions is detected, the system sends a photo of the baby with a notification directly to the parent via Telegram.
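The EAR step can be illustrated with the commonly used six-point eye aspect ratio. This is a minimal sketch, assuming each eye is described by six (x, y) landmarks ordered p1..p6 (corners p1 and p4, upper lid p2 and p3, lower lid p6 and p5), which gives the 12 points for both eyes. The function names and the 0.2 threshold are hypothetical, not values from this project, and the mapping from MediaPipe Face Mesh indices to p1..p6 is left out.

```python
import math


def eye_aspect_ratio(eye):
    """Eye aspect ratio for one eye from six (x, y) landmarks p1..p6.

    p1 and p4 are the horizontal eye corners; (p2, p6) and (p3, p5) are
    the upper/lower eyelid pairs. The EAR drops towards 0 as the eye closes.
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


def eyes_open(left_eye, right_eye, threshold=0.2):
    """Average the EAR of both eyes (12 points in total) and compare it
    with a threshold. The 0.2 default is a hypothetical value to be tuned."""
    ear = (eye_aspect_ratio(left_eye) + eye_aspect_ratio(right_eye)) / 2.0
    return ear > threshold, ear


# Synthetic landmarks (pixel coordinates) for an open and a nearly closed eye.
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.1), (2, 0.1), (3, 0), (2, -0.1), (1, -0.1)]

print(eyes_open(open_eye, open_eye))  # (True, 0.6666666666666666)
print(eyes_open(closed_eye, closed_eye))
```

Using a ratio rather than a raw eyelid distance keeps the measure roughly independent of how far the baby is from the camera; in practice a wake-up decision would usually also require the eyes to stay open over several consecutive frames.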

1.2 PROJECT OBJECTIVES

Analyzing and collecting data on baby behavior during sleep is one of the first steps of the topic; it aims to provide solutions suitable for users (parents and caregivers) with high applicability and accuracy.

Design and implement an LSTM network model suited to the monitoring system that meets the stated requirements. Select and compute points on the skeleton using the MediaPipe library, and compute the EAR of the eyes to identify when the baby wakes up. The last goal is to use Telegram to transmit notifications to the parent's phone.

A summary of the system design and implementation process includes the following steps:

1. Collect data on the actions a baby performs while sleeping and waking up (crawling, moving arms/legs, rolling). From these, select the actions and body parts that tend to change between sleeping and waking.

2. Calculate the selected actions and body parts in preparation for training and program execution. Develop the training model and apply image-processing algorithms (MediaPipe/OpenCV/Shapely.Geometry) to test and run a system demo in software.

3. Test and optimize on test sets to ensure stable execution, achieve high accuracy, and limit errors during execution.

Design and implement a baby monitoring system based on the baby's behavior and skeleton during sleep, drawing on reports, theoretical foundations, and similar systems from previous scientific papers and research groups.

The baby's waking state is determined by algorithms that mark landmarks on both eyes, combined with recognition of the baby's skeleton to determine the baby's behaviors and movements during sleep monitoring.
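Skeleton-based behavior recognition with an LSTM usually feeds the network a fixed-length sequence of per-frame keypoint vectors. A minimal sketch of that preprocessing, assuming MediaPipe Pose's 33 landmarks are each reduced to an (x, y, z, visibility) tuple; the window length of 30 frames, the zero-filling of frames without a detection, and all names here are hypothetical choices, not taken from this project.

```python
from collections import deque

NUM_POSE_LANDMARKS = 33   # MediaPipe Pose returns 33 body landmarks
VALUES_PER_LANDMARK = 4   # x, y, z, visibility
SEQUENCE_LENGTH = 30      # hypothetical number of frames per LSTM input


def flatten_landmarks(landmarks):
    """Turn [(x, y, z, visibility), ...] into one flat feature vector.

    Frames with no detected pose (None) become all zeros so that every
    frame has the same length.
    """
    if landmarks is None:
        return [0.0] * (NUM_POSE_LANDMARKS * VALUES_PER_LANDMARK)
    return [float(v) for lm in landmarks for v in lm]


class SequenceBuffer:
    """Sliding window over the most recent frames, ready to feed an LSTM."""

    def __init__(self, length=SEQUENCE_LENGTH):
        self.frames = deque(maxlen=length)  # old frames drop out automatically

    def push(self, landmarks):
        self.frames.append(flatten_landmarks(landmarks))

    def ready(self):
        return len(self.frames) == self.frames.maxlen

    def window(self):
        """Return a (time_steps, features) list of lists, oldest frame first."""
        return [list(frame) for frame in self.frames]


# Usage with synthetic data: one fake pose per frame, plus one missed detection.
buffer = SequenceBuffer()
fake_pose = [(0.5, 0.5, 0.0, 1.0)] * NUM_POSE_LANDMARKS
for frame_index in range(35):
    buffer.push(None if frame_index == 10 else fake_pose)

if buffer.ready():
    window = buffer.window()
    print(len(window), len(window[0]))  # 30 132
```

A sliding window lets the classifier produce a new prediction on every frame once the buffer is full, instead of waiting for a fresh block of 30 frames.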

One of the system's key benefits is its ability to recognize and monitor three baby behaviors simultaneously:

- The baby waking up

- The baby moving and showing signs of getting out of bed

- The baby staying inside the monitoring range selected by the parent
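The third behavior reduces to a point-in-polygon test on the camera frame. The project names Shapely.Geometry for this (in Shapely, `Polygon(vertices).contains(Point(x, y))` performs such a test); the sketch below instead uses a dependency-free ray-casting test so the logic is visible. Treating the baby as inside only when every given keypoint is inside the region is a hypothetical rule, not necessarily the one used in this project.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point lies inside polygon.

    polygon is a list of (x, y) vertices in drawing order, e.g. the pixel
    coordinates the parent selected on the camera frame. Points exactly on
    an edge may be classified either way.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter
        # (this also guarantees y2 != y1, so the division below is safe).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # the ray to the right of the point crosses this edge
                inside = not inside
    return inside


def baby_inside_region(keypoints, region):
    """Hypothetical rule: inside only if every detected keypoint is inside."""
    return all(point_in_polygon(p, region) for p in keypoints)


region = [(100, 100), (500, 100), (500, 400), (100, 400)]  # parent-selected area (pixels)
print(baby_inside_region([(250, 200), (300, 260)], region))  # True
print(baby_inside_region([(250, 200), (620, 260)], region))  # False
```

Ray casting handles any simple polygon, so the parent is not limited to drawing rectangles.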

The results will be compared with those of similar systems to identify and overcome the shortcomings of other approaches.

1.4 THESIS OUTLINE

The thesis consists of five main chapters:

CHAPTER 1 - OVERVIEW: briefly introduces the problem, the proposed solutions, and the goals and scope of the research.

CHAPTER 2 - BACKGROUND: presents the theory of neural networks (RNN, LSTM), together with an overview of the Python programming language and the libraries used in the project: MediaPipe (landmarks on the skeleton) and OpenCV (computer vision).

CHAPTER 3 - SYSTEM DESIGN: analyzes the system block diagram, the team's proposed solutions, and the functional components of each block.

CHAPTER 4 - EXPERIMENTAL RESULTS: implements the system on hardware, presents the execution results, and evaluates the complete system model in all aspects.

CHAPTER 5 - CONCLUSION: presents the results achieved with the complete system and outlines directions for developing and extending its applications in the future.


CHAPTER 2 BACKGROUND

This chapter covers the theory of deep learning networks, including the RNN and LSTM neural network models. It also gives an overview of the Python programming language and the framework tools that support deep learning applications.

2.1 INTRODUCTION TO DEEP LEARNING

2.1.1 Recurrent Neural Network (RNN)

The technologies that computer science delivers are evolving rapidly and improving every day. In particular, artificial intelligence (AI), and more specifically machine learning and deep learning, are no longer unfamiliar terms in today's society. Deep learning has helped computers perform tasks long considered difficult, such as identifying countless distinct objects in photographs and recognizing speech and handwriting, allowing them to interact with people.

Deep learning is a subset of machine learning, a very large and computationally intensive field that includes a wide range of approaches applied to many problems, such as:

- Linear Regression

- Logistic Regression

- Neural Network

- Support Vector Machine

Figure 2.1: The training process between Machine Learning and Deep Learning

It is hard to discuss deep learning without mentioning the recurrent neural network (RNN) for sequence problems. Conventional neural network models have three main parts: an input layer, hidden layers, and an output layer. Because of this structure, the inputs and outputs of a traditional neural network are treated as independent of each other. This is one of the main shortcomings of the traditional neural network: it is not suitable for sequence or time-series problems in which later predictions depend on earlier data and predictions.

Figure 2.2: Traditional Neural Network model

To solve this problem, the RNN model was developed. Its main idea is to use internal loops that let the network store information from previous computational steps and use it to make the prediction for the current step.

RNNs (recurrent neural networks) [1, 2] are dynamic systems whose internal states change at each classification time step. This is due to recurrent connections between neurons in upper and lower layers, as well as optional self-feedback connections. These feedback links allow RNNs to carry information from previous events to the current processing stage; as a result, RNNs build a memory of time-series events.

The recurrent (hidden) layers of an RNN are made up of recurrent cells with feedback connections, whose states are influenced by both past and present inputs. Different RNNs can be created by arranging the recurrent layers in different ways, so RNNs are characterized primarily by their network structure and recurrent cell design. Different cells and internal connections give RNNs different capabilities [3].


Figure 2.3: The structure Recurrent Neural Network have loops

Basically, a recurrent neural network block A takes an input x_t and produces an output h_t. A loop allows information to be passed from one step of the network to the next. Unrolling this loop yields a chain of copies of the same network, each passing information to its successor.

Figure 2.4: Equivalence performance Recurrent Neural Network

The model above describes the computation inside the RNN:

- x_0, x_1, ..., x_t: the inputs at steps 0 to t, represented as one-hot vectors.
- A_t: the hidden state at step t, which acts as the memory of the network. It is computed from the previous hidden state A_{t-1} and the input at the current step:

$$A_t = f(U x_t + W A_{t-1})$$

The function f is usually a nonlinear function such as the hyperbolic tangent (tanh) or ReLU.
- h_t: the output at step t, a probability vector computed from the current hidden state (for example, when predicting the next word in a sentence, a probability distribution over the vocabulary):

$$h_t = \mathrm{softmax}(V A_t)$$
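The two RNN equations can be checked with a minimal NumPy sketch of the forward pass, assuming f = tanh and randomly initialized U, W, V; the dimensions are arbitrary illustration values and no training is shown.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


def rnn_forward(xs, U, W, V, a0=None):
    """Run a vanilla RNN over a sequence.

    xs: (T, input_dim); U: (hidden_dim, input_dim);
    W: (hidden_dim, hidden_dim); V: (output_dim, hidden_dim).
    Returns hidden states A (T, hidden_dim) and outputs H (T, output_dim).
    """
    a_prev = np.zeros(W.shape[0]) if a0 is None else a0
    states, outputs = [], []
    for x_t in xs:
        a_t = np.tanh(U @ x_t + W @ a_prev)  # A_t = f(U x_t + W A_{t-1}), f = tanh
        h_t = softmax(V @ a_t)               # h_t = softmax(V A_t)
        states.append(a_t)
        outputs.append(h_t)
        a_prev = a_t                         # the same U, W, V are reused at every step
    return np.array(states), np.array(outputs)


rng = np.random.default_rng(0)
T, input_dim, hidden_dim, output_dim = 5, 4, 8, 3
xs = rng.normal(size=(T, input_dim))
U = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
V = rng.normal(scale=0.1, size=(output_dim, hidden_dim))

A, H = rnn_forward(xs, U, W, V)
print(A.shape, H.shape)  # (5, 8) (5, 3)
```

Sharing U, W, and V across all time steps is what lets the same network process sequences of any length.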

Softmax Function:

The softmax function is a normalized exponential function. It computes the probability of each class relative to all possible classes; this probability is then used to determine the target class for the input.

Specifically, the softmax function transforms a k-dimensional vector of arbitrary real values into a k-dimensional vector of real values that sum to 1. The input values can be positive, negative, zero, or greater than 1, but softmax always maps each of them to a value in the interval (0, 1):

$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{k} e^{z_j}}, \quad i = 1, \dots, k$$
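A minimal pure-Python sketch of softmax. Subtracting the maximum before exponentiating is a standard numerical-stability trick that does not change the result, because softmax is unchanged when the same constant is added to every input.

```python
import math


def softmax(values):
    """Map a list of real numbers to probabilities in (0, 1) that sum to 1."""
    m = max(values)  # shift by the max so exp() cannot overflow
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
print([round(p, 3) for p in probs])  # [0.09, 0.245, 0.665]
```

The largest input always receives the largest probability, and negative inputs still map to positive probabilities.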

Figure 2.5: Types of issues in RNN

- One to one: the simplest case, with one input and one output, typical of standard neural networks (NN) and convolutional neural networks (CNN).

- One to many: one input but many outputs.

- Many to one: the opposite of one to many, used for problems with many inputs but only one output.

- Many to many: problems with many inputs and many outputs.

RNNs are now used in deep learning to solve problems involving sequence or time-series data. Typical examples of these applications include:


2.2 LONG SHORT-TERM MEMORY NETWORK (LSTM)

2.2.1 Introduction to LSTM

Recurrent neural networks (RNNs) are widely used in research fields that involve sequential data, such as text, audio, and video. However, when the gap between relevant inputs is large, RNNs composed of sigmoid or tanh cells are unable to learn the relevant information from the input data. Long short-term memory (LSTM) handles such long-term dependencies effectively by incorporating gate functions into the cell structure [3].

The Long Short-Term Memory network, commonly known as LSTM, is a special type of RNN (Recurrent Neural Network). In theory, an RNN can carry information from one step to the next, but in practice it can only carry information across a handful of nearby states, because the gradient vanishes over longer spans. In other words, an RNN can only learn from states that are close to one another (short-term memory). The LSTM network was created to overcome this limitation of the RNN structure.

LSTM was introduced by Hochreiter & Schmidhuber in 1997. Because of the effectiveness of this architecture, it has since been significantly refined and has become very popular in machine learning and deep learning. LSTM is explicitly designed to avoid the long-term dependency problem of the RNN structure: remembering information over long periods is essentially its default behavior rather than something it struggles to learn. This is what distinguishes LSTM from other network designs.

Figure 2.6: In a typical RNN, the repeating module just has one layer.

Like other recurrent networks, the LSTM has a chain-like architecture, but instead of a single neural network layer (tanh), each repeating module has a different internal structure from the standard RNN: it contains four layers that interact with one another, as shown below.


Figure 2.7: An LSTM has a repeating module with four interconnected layers

Tanh Function

The tanh function is one of the functions most frequently encountered in deep learning. It takes a real number as input and maps it to a value in the range (-1, 1), whereas the sigmoid maps to (0, 1). Like the sigmoid, tanh saturates at both ends. However, because tanh is symmetric about zero (zero-centered), it avoids the sigmoid's drawback of producing only positive outputs.


Figure 2.8: The chart present Tanh function
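For reference, the standard definition of tanh and two of its properties, where σ is the sigmoid function σ(x) = 1/(1 + e^{-x}): the odd symmetry behind the zero-centered output, and a slope of 1 at the origin, four times the sigmoid's maximum slope of 0.25.

```latex
\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = 2\sigma(2x) - 1, \qquad \tanh(x) \in (-1, 1)
\tanh(-x) = -\tanh(x), \qquad \tanh(0) = 0
\frac{d}{dx}\tanh(x) = 1 - \tanh^{2}(x), \qquad \left.\frac{d}{dx}\tanh(x)\right|_{x=0} = 1
```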

To overcome the short-term memory problem of the RNN structure, the LSTM uses the cell state to carry information along the chain (the network nodes). This works because information can flow along the cell state largely unchanged, so it is easily preserved across many steps.


Figure 2.9: Details of cell state structure

In addition to carrying information throughout the chain, the LSTM can selectively remove or add information to the cell state. This is regulated by gates, which screen the information passing through them: each gate combines a sigmoid neural network layer, whose output lies in the range (0, 1), with a pointwise multiplication. A value of 0 means that no information passes through, whereas a value of 1 means that all information passes through. A standard LSTM has three gates to protect and control the cell state.

Figure 2.10: Gate structure consists of a sigmoid layer and a multiplication
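A gate's effect can be shown in a few lines: each value is multiplied by a sigmoid output between 0 and 1. This sketch uses plain Python lists and hypothetical names; a real LSTM computes the gate logits from learned weights rather than taking them as given.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def apply_gate(values, gate_logits):
    """Scale each value by sigmoid(logit): near 0 blocks it, near 1 lets it through."""
    return [v * sigmoid(z) for v, z in zip(values, gate_logits)]


gated = apply_gate([2.0, 2.0, 2.0], [-10.0, 0.0, 10.0])
print([round(g, 3) for g in gated])  # [0.0, 1.0, 2.0]
```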

Sigmoid Function (Logistic Function)

The sigmoid function takes a real number as input and converts it to a value in the range (0, 1), which can be interpreted as a probability. A large negative input gives an output close to 0, and a large positive input gives an output close to 1. Because the function is smooth, a small change in the input does not change the output much, so the output varies stably and continuously with the input.

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$


Another significant drawback of the sigmoid function, and one reason it is less common today, is the vanishing gradient caused by saturated sigmoid neurons. When the input has a large absolute value (negative or positive), the gradient of the function is very close to zero. As a result, the weights of the corresponding units are barely updated during training (vanishing gradient).

Figure 2.11: The chart present Sigmoid function
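The vanishing-gradient effect follows from the sigmoid's derivative, σ'(x) = σ(x)(1 − σ(x)), which never exceeds 0.25. A small sketch that prints the derivative at a few inputs and the best-case product over 10 stacked sigmoid units (ignoring the weights, which is a simplification for illustration):

```python
import math


def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + e^-z), overflow-safe for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_grad(z):
    """Derivative sigma'(z) = sigma(z) * (1 - sigma(z)); its maximum is 0.25 at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)


for z in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"z = {z:6.1f}   sigmoid = {sigmoid(z):.5f}   gradient = {sigmoid_grad(z):.2e}")

# Backpropagating through 10 stacked sigmoid units multiplies 10 such factors;
# even in the best case (0.25 each) the product is tiny.
print(f"best-case product over 10 layers: {0.25 ** 10:.2e}")  # 9.54e-07
```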

2.2.2 Detail structure memory cell of LSTM

In the first step, the LSTM cell selectively removes information from its internal state at the previous time step, C_{t-1}. The activation f_t of the forget gate at time t is computed from the current input x_t and the output h_{t-1} of the previous LSTM cell, together with the forget-gate bias b_f. As introduced before, the sigmoid function maps every activation to a value between 0 and 1.

Figure 2.12: Forget Gate detail structure in first step.
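Written as an equation, the forget-gate activation takes the standard form below, where [h_{t-1}, x_t] is the concatenation of the previous output and the current input, W_f is the forget-gate weight matrix (a symbol introduced here by analogy with W_c), and n is the size of the cell state. An entry of f_t near 0 discards the corresponding entry of C_{t-1}; an entry near 1 keeps it.

```latex
f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right), \qquad f_t \in (0, 1)^{n}
```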


In the second step, the LSTM cell decides which new information to store in the internal state C_t. This step involves two computations, i_t and Č_t. First, a sigmoid layer, the input gate layer, computes i_t to select which values will be updated. The candidate values are computed as:

$$\check{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$$

This tanh layer builds the vector of candidate values Č_t that could be added to the state; combined with i_t, it provides the update for the cell state.

Figure 2.13: Input gate layer for second steps

In the third step, the old cell state C_{t-1} is updated to the new state C_t using the results of the previous steps:

Figure 2.14: The present process update Ct for internal cell at third step.

$$C_t = f_t * C_{t-1} + i_t * \check{C}_t$$

Here * denotes Hadamard (element-wise) multiplication. Multiplying the old state C_{t-1} by f_t removes the information we decided to forget earlier; then the term i_t * Č_t adds the new candidate values, scaled by how much we decided to update each state value.


In the final step, the cell output is computed by filtering the cell state. First, a sigmoid layer (the output gate) decides which parts of the cell state are output. Then the cell state C_t is passed through tanh to limit its values to the range (-1, 1), and the result is multiplied by the output of the sigmoid layer to obtain the desired output value:

$$h_t = o_t * \tanh(C_t)$$

Figure 2.15: The last step to filter information in output
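The forget, input, update, and output steps of this subsection can be combined into a single LSTM time step. This is a minimal NumPy sketch, assuming separate weight matrices applied to the concatenation [h_{t-1}, x_t] (W_f, W_i, W_o and their biases are named by analogy with W_c and b_c), random initialization, and no training; it is not the model trained in this project.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_cell_step(x_t, h_prev, c_prev, params):
    """One LSTM time step.

    params holds W_f, W_i, W_c, W_o of shape (hidden_dim, hidden_dim + input_dim)
    and biases b_f, b_i, b_c, b_o of shape (hidden_dim,).
    """
    z = np.concatenate([h_prev, x_t])                    # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])     # step 1: forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])     # step 2: input gate
    c_hat = np.tanh(params["W_c"] @ z + params["b_c"])   # step 2: candidate values
    c_t = f_t * c_prev + i_t * c_hat                     # step 3: new cell state
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])     # step 4: output gate
    h_t = o_t * np.tanh(c_t)                             # step 4: new output
    return h_t, c_t


def init_params(input_dim, hidden_dim, rng, scale=0.1):
    params = {}
    for gate in ("f", "i", "c", "o"):
        params[f"W_{gate}"] = rng.normal(scale=scale, size=(hidden_dim, hidden_dim + input_dim))
        params[f"b_{gate}"] = np.zeros(hidden_dim)
    return params


rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 6
params = init_params(input_dim, hidden_dim, rng)
h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x_t in rng.normal(size=(10, input_dim)):  # a sequence of 10 time steps
    h, c = lstm_cell_step(x_t, h, c, params)
print(h.shape, c.shape)  # (6,) (6,)
```

Because o_t lies in (0, 1) and tanh in (-1, 1), every entry of h_t stays strictly between -1 and 1, while the cell state C_t itself is not bounded in this way.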

2.2.3 Type of advance LSTM model

The architecture described above is one of the most basic forms of the LSTM. Compared with the original, there are now many improved versions intended to make training more effective and boost performance.

Figure 2.16: Type of advance LSTM model

Figure 2.16 shows LSTM variants developed later. Variant (1), introduced by Gers & Schmidhuber, adds peephole connections, which let the gate layers also take the cell state as input.

Another variant, form (2), couples the forget and input gates. Instead of deciding separately what information to discard and what new information to add, as in the traditional LSTM architecture, the two decisions are made together: information is discarded only when new information is written in its place.

Finally, a more radical but equally interesting variant is the Gated Recurrent Unit (GRU), shown as (3). It combines the forget and input gates into a single update gate, and it also merges the cell state and the hidden state, along with a few other changes.
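A minimal NumPy sketch of one GRU step, following the commonly cited GRU equations with biases omitted for brevity, shows how the update gate z_t plays the roles of both the forget and input gates and how a single state h_t replaces the separate cell state. The function and weight names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU step (biases omitted).

    Each weight matrix has shape (hidden, hidden + input) and acts on a
    concatenation of the previous state and the input; names are illustrative.
    """
    hx = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ hx)                                        # update gate (merged forget + input)
    r_t = sigmoid(W_r @ hx)                                        # reset gate
    h_tilde = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))   # candidate state
    return (1.0 - z_t) * h_prev + z_t * h_tilde                    # single state, no separate C_t

rng = np.random.default_rng(1)
hidden, input_size = 4, 3
W_z, W_r, W_h = (rng.normal(size=(hidden, hidden + input_size)) for _ in range(3))
h = np.zeros(hidden)
for x in rng.normal(size=(5, input_size)):   # a short sequence of 5 inputs
    h = gru_step(x, h, W_z, W_r, W_h)
print(h.shape)  # (4,)
```

Compared with the LSTM step, the GRU needs three weight matrices instead of four and carries only one state vector between time steps.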

2.2.4 Strengths and limitations of LSTM-RNN

According to [4], LSTM performs exceptionally well when a small amount of information must be retained over a long period. This property is attributed to its memory blocks. Input and output gates control access to each memory block, preventing unrelated information from entering or leaving it. The memory block is an interesting architectural design: in addition to these gates, it contains a forget gate that regulates the information stored in each cell. When a cell's earlier information is no longer relevant, the forget gate can reset the state of that cell. Because forget gates allow cells to completely forget their previous state, which prevents prediction bias, they also make continuous prediction possible [5].

Like other neural network algorithms, LSTM requires the network topology to be specified in advance. Because the number of memory blocks in a network does not change dynamically, the network's memory is ultimately limited. Moreover, [4] note that this limitation cannot be overcome by uniformly enlarging the network, and propose modularization as a way to encourage efficient learning. However, how to modularize a network is "not typically apparent."

2.3 OpenCV

2.3.1 Introduction to OpenCV

Computer vision, or simply "making a computer see," is the science of programming a computer to analyze and eventually understand images and video [6]. Solving even small parts of certain computer vision problems opens exciting new possibilities in technology, engineering, and even entertainment. A library of programming functions with efficient and portable code, ideally available for free, is essential for advancing vision research and spreading knowledge of the field.

Providing such a library was one of the Intel team's primary objectives when OpenCV (Open Source Computer Vision Library) was formally introduced in 1999. Many programmers have since contributed to the library's development. A major upgrade, OpenCV 2, was released in 2009 and mainly introduced the C++ interface. The latest release of the library can be found on the official OpenCV website.


The package now contains about 2,500 optimized algorithms. It has more than 2.5 million downloads and more than 40 thousand users worldwide. Released under a BSD license, OpenCV is used for both commercial and academic purposes. Learning every aspect of the library requires consulting one of the several books devoted to OpenCV. Nevertheless, once you have a basic understanding of OpenCV from this section, such in-depth material should be easier to read. To make this even more convenient, the material presented here deliberately follows one of the recent OpenCV references [9] closely.

Figure 2.17: OpenCV timeline

The features offered by OpenCV can be divided into the following groups:

- Object detection (objdetect, features2d, nonfree)

- Image and video processing, I/O, and display (core, imgproc, highgui)

- GPU acceleration with CUDA (gpu)

- Clustering and machine learning

2.3.2 OpenCV Structure and Module

OpenCV is broadly structured into five main components, four of which are shown in the figure below. The CV component contains basic image processing and higher-level computer vision algorithms; ML is the machine learning library, which includes many statistical classifiers and clustering tools; HighGUI contains I/O routines and functions for loading and saving images and video; and CXCore contains the basic data structures and content. The fifth component, CvAux, holds experimental and deprecated algorithms.


Figure 2.18: The basic structures of OpenCV

In addition, OpenCV has a modular structure: the package consists of several shared or static libraries.

Commonly used OpenCV modules include:

- Core functionality (core): a compact module that defines the basic data structures, including the dense multi-dimensional array Mat, and basic functions used by all other modules.

- Image Processing (imgproc): a module for linear and non-linear image filtering, geometric image transformations (resizing, affine and perspective warping), color space conversion, histograms, and so on.

- Video Analysis (video): a module for motion estimation, background subtraction, and object tracking algorithms.

- 2D Features Framework (features2d): detectors of salient image features, feature descriptors, and descriptor matchers.

- Object Detection (objdetect): detection of objects and instances of predefined classes (people, animals, vehicles, and so on).

- Video I/O (videoio): an easy-to-use interface for video capture and video codecs.

2.4 MediaPipe

2.4.1 Introduction to MediaPipe

MediaPipe is a framework for building pipelines that perform inference over arbitrary sensory data. Model inference, media processing algorithms, data transformations, and other modular components can all be combined to create a perception
