Service robot for students based on computer vision and natural language processing


Ho Chi Minh City, August, 2022


GRADUATION PROJECT

SERVICE ROBOT FOR STUDENTS BASED ON

COMPUTER VISION AND NATURAL LANGUAGE

PROCESSING

Major: AUTOMATION AND CONTROL ENGINEERING TECHNOLOGY

Advisor: Assoc Prof PhD LE MY HA

NGUYỄN TUẤN THANH Student ID: 17151028

Ho Chi Minh City, August 2022


Independence – Freedom – Happiness

Ho Chi Minh City, August 6th, 2022

GRADUATION PROJECT ASSIGNMENT

Student name: Nguyen Tuan Thanh Student ID: 17151028

Major: Automation and Control Engineering Technology

Class: 17151CLA1

Advisor: Assoc Prof PhD Le My Ha Phone number: 0938811201

Date of assignment: February 21st, 2022 Date of submission: August 6th, 2022

1 Project title: Service robot for students based on computer vision and natural language processing

2 Initial materials provided by the advisor: References, reference programs, data sets, expected parameters of the robot

3 Content of the project:

- Design and implement a service robot with two functions: chatting and talking

- Apply computer vision to detect whether the user is wearing a mask and to identify user information

- Apply natural language processing in a virtual voice assistant to communicate with humans

- Apply the Natural Language Toolkit (NLTK) to build a chatbot to communicate with humans

- Build a database and collect more data when communicating with humans

4 Final product: A complete service robot that is able to recognize humans with high accuracy and to communicate with them using a given knowledge database

CHAIR OF THE PROGRAM

(Sign with full name) ADVISOR

(Sign with full name)


Faculty for High Quality Training – HCMC University of Technology and Education

THE SOCIALIST REPUBLIC OF VIETNAM

Ho Chi Minh City, August 6, 2022

ADVISOR'S EVALUATION SHEET

Student name: Nguyen Tuan Thanh Student ID: 17151028

Major: Automation and Control Engineering Technology

Project title: Service robot for students based on computer vision and natural language processing

Advisor: Assoc Prof PhD Le My Ha

EVALUATION

1 Content of the project:

- Design and implement a service robot with two functions: chatting and talking

- Apply computer vision to detect whether the user is wearing a mask and to identify user information

- Apply natural language processing in a virtual voice assistant to communicate with humans

- Apply the Natural Language Toolkit (NLTK) to build a chatbot to communicate with humans

- Build a database and collect more data when communicating with humans

2 Strengths:

3 Weaknesses:

4 Approval for oral defense? (Approved or denied)

5 Overall evaluation: (Excellent, Good, Fair, Poor)

6 Mark: ………… (in words: )

Ho Chi Minh City, August 6th, 2022

ADVISOR

Ho Chi Minh City, August 6, 2022

PRE-DEFENSE EVALUATION SHEET

Student name: Nguyen Tuan Thanh Student ID: 17151028

Major: Automation and Control Engineering Technology

Project title: Service robot for students based on computer vision and natural language processing

Name of Reviewer:

EVALUATION

1 Content of the project:

- Design and implement a service robot with two functions: chatting and talking

- Apply computer vision to detect whether the user is wearing a mask and to identify user information

- Apply natural language processing in a virtual voice assistant to communicate with humans

- Apply the Natural Language Toolkit (NLTK) to build a chatbot to communicate with humans

- Build a database and collect more data when communicating with humans

2 Strengths:

3 Weaknesses:

4 Approval for oral defense? (Approved or denied)

6 Mark: ………… (in words: )

REVIEWER

EVALUATION SHEET OF DEFENSE COMMITTEE MEMBER

Student name: Nguyen Tuan Thanh Student ID: 17151028

Major: Automation and Control Engineering Technology

Project title: Service robot for students based on computer vision and natural language processing

Name of Defense Committee Member:

EVALUATION

1 Content of the project:

- Design and implement a service robot with two functions: chatting and talking

- Apply computer vision to detect whether the user is wearing a mask and to identify user information

- Apply natural language processing in a virtual voice assistant to communicate with humans

- Apply the Natural Language Toolkit (NLTK) to build a chatbot to communicate with humans

- Build a database and collect more data when communicating with humans

2 Strengths:

3 Weaknesses:

5 Mark: ………… (in words: )

COMMITTEE MEMBER


Graduation Thesis

ACKNOWLEDGEMENT

In the process of completing the graduation project, in addition to my own understanding, I have received a lot of support and dedicated help.

First, I would like to express my deep gratitude to Associate Professor Dr Le My Ha, who has been a teacher, a supporter, and an inspiration for me in completing this thesis. He oriented me to the right topic and how to approach it, and gave objective feedback to help me when defending the thesis in front of the council. Therefore, I feel very fortunate to have worked with him.

Next, I would like to thank the Faculty of Electrical and Electronic Engineering as well as the Faculty for High Quality Training for imparting useful knowledge during my four years at the university. This knowledge plays a fundamental role in the implementation of my graduation thesis.

In addition, I would also like to thank the Intelligent Systems Laboratory (ISLAB) of the Faculty of Electrical and Electronic Engineering for supporting me in terms of facilities as well as useful knowledge during the completion of the project. I also owe deep thanks to my friend Tran Thanh Hung, who supported and guided me in developing this topic.

Finally, I would like to thank my family for always supporting, caring for, and motivating me to complete the project in the best possible way.

Ho Chi Minh City, August 6th, 2022

Student


Table of Contents

CHAPTER 1: INTRODUCTION

1.1 Define a problem

1.2 Project objectives

1.3 Project task

1.4 Project scopes

1.5 Approach and research

1.6 Project description

CHAPTER 2: LITERATURE REVIEW

2.1 Survey of robots being used in service industry

2.1.1 Mission of robots in the service industry

2.1.2 Pepper robot

2.2 Background of face recognition system

2.2.1 Concept

2.2.2 Structure and procedure for face recognition

2.2.3 Face Detection

2.3 Color spaces in image processing

2.3.1 RGB color space (Red-Green-Blue)

2.3.2 HSV color space (Hue-Saturation-Value)

2.4 Histogram of Oriented Gradients algorithm

2.5 Support Vector Machine algorithm

2.6 Background of speech recognition system

2.6.1 Concept

2.6.2 Speech Recognition

2.6.3 Applications

2.7 Framework and libraries

2.7.1 Framework Pytorch

2.7.2 Pandas

2.7.3 Numpy

2.8 Voice Assistant

2.9 ChatBot

CHAPTER 3: SYSTEM DESIGN AND CONSTRUCTION

3.1 Requirements of the system

3.2 System description

3.2.1 The block diagram of the system

3.2.2 The function of each block

3.3 System design

3.3.1 Face detection

3.3.2 Face recognition and identification

3.3.3 Face mask detection

3.3.4 Speech recognition and voice assistant

3.3.5 Chatbot

CHAPTER 4: EXPERIMENT RESULTS, FINDINGS AND ANALYSIS

4.1 Face detection

4.2 Face recognition and identification

4.2.1 Training image data

4.2.2 Performing the face recognition

4.3 Face mask detection

4.4 Speech recognition and voice assistant

4.5 Chatbot

4.5.1 Create Training Data

4.5.2 NLP Basics

4.5.3 Complete chatbot

4.6 User interface

CHAPTER 5: CONCLUSIONS AND DIRECTIONS OF DEVELOPMENT

5.1 Conclusion

5.2 Direction of development

REFERENCES


ABBREVIATIONS

NLP: Natural Language Processing

OpenCV: Open Source Computer Vision Library

HOG: Histogram of Oriented Gradients

SVM: Support Vector Machine

Q&A: Question and answer


List of figures

Figure 2.1 Pepper robot working in a mobile store

Figure 2.2 A typical procedure for a face recognition model

Figure 2.3 Face and eye detection

Figure 2.4 RGB color space (Red-Green-Blue)

Figure 2.5 HSV color space (Hue-Saturation-Value)

Figure 2.6 Applications of HOG

Figure 2.7 An example of support vector in 2-Dimensional data

Figure 2.8 Margins described in a plane

Figure 2.9 An example of linearly non separable dataset

Figure 2.10 Speech Recognition

Figure 2.11 Implementation of Speech Recognition

Figure 2.12 Interface of Windows Speech Recognition

Figure 2.13 Interface of Voice-To-Text Facebook Messenger

Figure 2.14 Interface of Google Speech to Text

Figure 2.15 Pytorch and TensorFlow Frameworks from 2017 to 2021 [10]

Figure 2.16 Reading CSV files with Pandas [11]

Figure 2.17 Example illustrating some functions in Numpy

Figure 2.18 Market share of voice assistants in the US, May 2018 [12]

Figure 2.19 Illustration for Chatbot

Figure 3.1 Block diagram of service robot designing by student

Figure 3.2 Face detection with 6 landmarks and multi-face support [17]

Figure 3.3 Training Process of face recognition

Figure 3.4 Five features of Haar cascade method [18] (a) Edge features (b) Line features (c) Four-rectangle feature

Figure 3.5 Cascade structure for Haar classifiers [18]

Figure 3.6 Sliding window in grayscale image [19]

Figure 3.7 Image meshing and histogram calculation [19]

Figure 3.8 Face recognition and identification processing

Figure 3.9 Face mask detection process

Figure 3.10 Plotting all the milestone central issues of an individual's face on a white foundation can provide us with a best guess of the shape [20]

Figure 3.11 The structure of training data [22]

Figure 3.12 Example of training data bag of words [22]

Figure 3.13 Example of NLP preprocessing pipeline [22]

Figure 3.14 Structure of Feed Forward Neural Network [23]

Figure 3.15 The simplest form of perceptron [23]

Figure 3.16 Chatbot training structure [22]

Figure 4.1 Six facial features are displayed when human face is detected and frame rate is measured

Figure 4.2 Detecting multiple faces in the same frame

Figure 4.3 The process of training image data

Figure 4.4 Username recognition and display

Figure 4.5 Detect 68 landmarks on user's face

Figure 4.6 Bounding the mouth and warning when the user is not wearing a mask

Figure 4.7 When the user wears a mask, the system will not give an alert

Figure 4.8 Identify and answer questions from users when the question is in the data set

Figure 4.9 Identify and answer questions from users when the question is not in the data set

Figure 4.10 Save unknown questions to unknown question sheet in excel

Figure 4.11 Relative calculation of response speed of gtts library

Figure 4.12 Relative calculation of response speed of pyttsx3 library

Figure 4.13 Training data made by the student

Figure 4.14 Tokenize all questions from data file

Figure 4.15 Lowercase all word tokenized and remove characters

Figure 4.16 All words after remove duplicate word and sorted

Figure 4.17 Example of the bag of words for all patterns

Figure 4.18 Chatbot interface

Figure 4.19 User interface designed by the student


List of Tables

Table 2.1 Specifications of Pepper robot

Table 2.2 The speech recognition package in Python

Table 4.1 Sample collects data from students


ABSTRACT

With the advancement of science and technology, robots are gradually replacing humans at work or helping in daily life. To bring the same convenience to answering students' everyday questions, this project designs a service robot that combines computer vision and natural language processing. Traditionally, students go to school personnel or post on student forums to ask about the problems they are facing. These channels are often slow because response times are long, the number of staff is limited, and the number of students asking questions is large. Therefore, this thesis proposes replacing the traditional question-answering channels with a robot capable of advising students and answering their questions through two forms of communication: talking and chatting. In talking mode, the robot recognizes the user, recognizes the question by voice, and processes it to give an appropriate answer. In chatting mode, the user enters a question into the chat box, and the robot processes it and gives an appropriate answer. This approach answers students' questions quickly, saves human resources for the school, and at the same time objectively captures the questions students ask.

Keywords: service robot, computer vision, natural language processing.

CHAPTER 1: INTRODUCTION

1.1 Define a problem

In the field of education, in addition to imparting useful knowledge, it is also necessary to listen to and answer students' questions in the most effective way. Usually, a school sets up counseling teams or online forums for students to give their opinions or ask about unclear issues. For face-to-face Q&A, the school sets up a team of counselors to take charge of this task. The advantage of this form is that students can communicate easily and receive focused, accurate answers. For online Q&A through forums, the university also has to hire staff to reply to students' messages. This form is convenient because students can get answers without having to go to school. On the other hand, both forms have disadvantages such as long waiting times, inflexible counseling hours, and a limited number of consultants for a large number of students. Figure 1.1 shows students queuing to receive advice from the school.

Figure 1.1 Students line up to wait for their turn for advice from the school

In addition, due to the impact of the Covid-19 pandemic, human-to-human communication has become increasingly difficult. Given the above problems, a robot is a well-suited solution for reducing the limitations of the two forms above. Such a device can handle students' inquiries effectively through two forms, talking and chatting, by using computer vision and natural language processing. Therefore, this thesis is proposed with the name "Service robot for students based on computer vision and natural language processing".


1.2 Project objectives

To meet students' need for answers to the problems they encounter, this thesis builds a service robot with two functions: talking and chatting. The robot is capable of detecting and warning when the user is not wearing a mask, storing user information, and communicating by voice or text depending on the user's preference.

1.3 Project task

The project is implemented with the following main contents:

● Task 1: Collecting inquiries from students in the university

● Task 2: Surveying methods for face detection and face recognition

● Task 3: Surveying methods for speech recognition and processing

● Task 4: Researching virtual assistants and chatbots

● Task 5: Researching natural language processing methods

● Task 6: Writing the outline to summarize the requirements of the project, designing the block diagram of the system, and explaining the functions of the blocks

● Task 7: Designing software interfaces to interact with users

● Task 8: Testing, evaluating, and calibrating the entire system

● Task 9: Writing the project report

1.4 Project scopes

This project serves only the questions of students on campus, in the Vietnamese language, through a software interface. The accuracy of the answers depends on the variety of the data collected, and the system is suited to a low-noise environment.

1.5 Approach and research

● Approach:

– Reach out to the research subject

– List the challenges that can be encountered when solving the problem

– Survey, evaluate, and select algorithms, thereby forming a suitable system

1.6 Project description

The project is presented in 5 chapters as follows:

● Chapter 1: INTRODUCTION

Introduces the research content of the topic, sets out the objectives and tasks that the topic needs to achieve, and clearly identifies the specific subject and scope of the research.

● Chapter 2: LITERATURE REVIEW

Gives a general presentation of the subject of study, the algorithms used, and the knowledge involved in the system training process.

● Chapter 3: SYSTEM DESIGN AND CONSTRUCTION

Details the functionality of each working block and explains the improvements used in system development and the functionality of the interface and software.

● Chapter 4: EXPERIMENT RESULTS, FINDINGS AND ANALYSIS

Gives the test results achieved, demonstrating the system's ability to complete its work.

● Chapter 5: CONCLUSIONS AND DIRECTIONS OF DEVELOPMENT

Summarizes the problems solved and the problems remaining, and gives directions for solving them.


CHAPTER 2: LITERATURE REVIEW

In this chapter, the student introduces the application of robots in the service industry, the theory of face recognition and speech recognition, and their applications. The student also introduces the PyTorch framework, a popular framework for machine learning problems, and some other libraries.

2.1 Survey of robots being used in service industry

To discuss the purposes of robots, it is first necessary to describe them. A robot is, in the simplest terms, a machine designed to carry out difficult actions or jobs automatically. Some robots are designed to resemble humans, and these are called androids, but many robots do not take such a form.

Modern robots may employ artificial intelligence (AI) and speech recognition technologies, and they may be fully or partially autonomous. Most robots are programmed to carry out specific jobs with remarkable precision; the industrial robots used in factories or on production lines are an example.

2.1.1 Mission of robots in the service industry

Robots have been a prominent technology trend in the hospitality sector, in part because self-service and automation concepts are becoming more and more important to the client experience. The use of robots can improve efficiency, accuracy, and even speed.

For example, chatbots allow a hotel or travel company to provide 24/7 support through online chat or instant messaging services, even when staff would be unavailable, delivering extremely swift response times. Meanwhile, a robot used during check-in can speed up the entire process, reducing congestion.

2.1.2 Pepper robot

Pepper is a semi-humanoid robot manufactured by SoftBank Robotics (formerly Aldebaran Robotics), designed with the ability to read emotions. It was introduced at a conference on 5 June 2014 and was showcased in SoftBank Mobile phone stores in Japan beginning the next day. Pepper's ability to recognize emotion is based on detection and analysis of facial expressions and voice tones. To do so, Pepper has been equipped with capabilities and hardware such as:

● 20 degrees of freedom for normal and expressive movements

● Speech recognition and voice assistant in 15 languages

● Perception modules

● Touch sensors, LEDs and microphones

● Infrared sensors, bumpers, an inertial unit, 2D and 3D cameras, and sonars

Figure 2.1 shows a Pepper robot working in a mobile store [1].

Figure 2.1 Pepper robot working in a mobile store

● Specifications:

The robot's head has four microphones, two HD cameras (in the mouth and forehead), and a 3-D depth sensor (behind the eyes). There is a gyroscope in the torso, and there are touch sensors in the head and hands. The mobile base has two sonars, six lasers, three bumper sensors, and a gyroscope.

It is able to run the existing content in the app store designed for SoftBank's Nao robot. Some key information about the robot is shown in Table 2.1.

Table 2.1 Specifications of Pepper robot

| Item | Specification |
|---|---|
| Dimensions | Height: 1.20 meters (4 ft); Depth: 425 millimeters (17 in); Width: 485 millimeters (19 in) |
| Capacity | 30.0 Ah / 795 Wh |
| Display | 10.1-inch touch display |
| Head | Mic × 4, RGB camera × 2, 3D sensor × 1, Touch sensor × 3 |
| Legs | Sonar sensor × 2, Laser sensor × 6, Bumper sensor × 3, Gyro sensor × 1 |
| Moving parts | Degrees of motion: Head (2°), Shoulder (2° L&R), Elbow (2 rotations L&R), Wrist (1° L&R), Hand with 5 fingers (1° L&R), Hip (2°), Knee (1°), Base (3°) |

2.2.2 Structure and procedure for face recognition

Generally, a face recognition system is described as a process that involves four stages, as shown in Figure 2.2: face detection, face alignment, feature extraction, and finally face recognition (feature matching).

Figure 2.2 A typical procedure for a face recognition model

From the figure above, a face recognition model contains four stages, as described in detail below.

Face detection: As can be seen from the chart, the input of face detection is a sequence of images captured from a video stream. The detected faces may need to be tracked across multiple frames using a face tracking component. While face detection provides a coarse estimate of the location and scale of the face, face landmarking localizes facial landmarks (e.g., eyes, nose, mouth, and facial outline). This may be accomplished by a landmarking module or face alignment module. In short, face detection locates one or more faces in the image and marks them with a bounding box [2].

Face alignment: This stage normalizes the face geometrically and photometrically. This is necessary because state-of-the-art recognition methods are expected to recognize face images with varying pose and illumination. The geometrical normalization process transforms the face into a standard frame by face cropping. Warping or morphing may be used for more elaborate geometric normalization. The photometric normalization process normalizes the face based on properties such as illumination and gray scale [2].

Feature extraction: This stage is vital for face recognition. Face feature extraction is performed on the normalized face to extract salient information that is useful for distinguishing faces of different persons and is robust with respect to geometric and photometric variations. The extracted face features are used for face matching, which is described in the next stage [2].

Feature matching: The final stage matches the face against one or more known faces in a prepared database. The matcher outputs 'yes' or 'no' for 1:1 verification. In the case of 1:N identification, the output is the identity of the input face when the top match is found with sufficient confidence, or unknown when the top match score is below a threshold. The main challenge in this stage of face recognition is to find a suitable similarity metric for comparing facial features [2].
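The matching stage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Euclidean distance stands in for the similarity metric (a smaller distance means a better match, so the threshold test is inverted compared with a match score), and the three-dimensional vectors, the names `student_a` and `student_b`, and the threshold value are hypothetical, not taken from this thesis.

```python
import math


def euclidean_distance(a, b):
    """Distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def verify(probe, reference, threshold):
    """1:1 verification: True ('yes') if the probe is close enough to the reference."""
    return euclidean_distance(probe, reference) <= threshold


def identify(probe, gallery, threshold):
    """1:N identification: name of the closest enrolled face, or 'unknown'."""
    best_name, best_distance = "unknown", float("inf")
    for name, reference in gallery.items():
        distance = euclidean_distance(probe, reference)
        if distance < best_distance:
            best_name, best_distance = name, distance
    # Reject the top match when it is not confident enough.
    return best_name if best_distance <= threshold else "unknown"


# Hypothetical feature vectors; real face descriptors are much longer.
GALLERY = {"student_a": [0.1, 0.9, 0.3], "student_b": [0.8, 0.2, 0.5]}
THRESHOLD = 0.25  # illustrative value; in practice tuned on validation data

print(identify([0.12, 0.88, 0.31], GALLERY, THRESHOLD))
print(identify([0.5, 0.5, 0.5], GALLERY, THRESHOLD))
```

Choosing the threshold trades off false accepts against false rejects, which is one reason the choice of similarity metric is described as the main challenge of this stage.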

Figure 2.3 Face and eye detection

The algorithms must be trained on huge data sets with hundreds of thousands of both positive and negative images in order to help ensure accuracy. Training improves the algorithms' capacity to determine whether there are faces in a picture and where they are.

The methods used in face detection:

● Knowledge-based, or rule-based, methods describe a face based on rules. The challenge of this approach is the difficulty of coming up with well-defined rules.

● Feature-invariant methods use features such as a person's eyes or nose to detect a face.

● Template-matching methods are based on comparing images with standard face patterns or features that have been stored previously and correlating the two to detect a face. Unfortunately, these methods do not address variations in pose, scale, and shape.

● Appearance-based methods employ statistical analysis and machine learning to find the relevant characteristics of face images. This method, also used in feature extraction for face recognition, is divided into sub-methods.

2.3 Color spaces in image processing

2.3.1 RGB color space (Red-Green-Blue)

The RGB color model is an additive model in which red, green, and blue light are combined in different ways to form other colors. Each color is represented by integer values, one per channel. The RGB color model is shown in Figure 2.4.

Figure 2.4 RGB color space (Red-Green-Blue)

If each color channel is encoded with 1 byte (8 bits), with values in the interval [0, 255], then we have a 24-bit color image, and $2^8 \times 2^8 \times 2^8 = 16{,}777{,}216$ colors can be encoded (about 16.8 million colors). For example, some of the basic colors in the RGB color space are: [0; 0; 0] is Black, [255; 255; 255] is White, [255; 0; 0] is Red, [0; 255; 0] is Green, [0; 0; 255] is Blue.
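The arithmetic above can be checked with a short Python sketch. The dictionary of basic colors follows the values listed in the text; the helper name `to_hex` is an illustrative addition, not part of the thesis software.

```python
BITS_PER_CHANNEL = 8
LEVELS_PER_CHANNEL = 2 ** BITS_PER_CHANNEL   # 256 values, from 0 to 255
TOTAL_COLORS = LEVELS_PER_CHANNEL ** 3       # three channels: R, G, B

# Basic colors as (R, G, B) triples, as listed in the text.
BASIC_COLORS = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
}


def to_hex(rgb):
    """Pack an (R, G, B) triple into the usual 24-bit hexadecimal notation."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)


print(TOTAL_COLORS)
for name, rgb in BASIC_COLORS.items():
    print(name, rgb, to_hex(rgb))
```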

2.3.2 HSV color space (Hue-Saturation-Value)

The HSV color space is closely related to HSI (Hue-Saturation-Intensity) and HSL (Hue-Saturation-Lightness). It is based on visual color properties such as tint, shade, and tone; in other words, color, purity, and brightness. Figure 2.5 gives a brief description of the HSV color space.

Figure 2.5 HSV color space (Hue-Saturation-Value)

● Hue: the color tone, which runs from 0 to 360 degrees.

● Saturation: the degree of purity of the color, which indicates how much white is added to the pure color. The value of S is in the interval [0, 255], where S = 255 is the purest color, with no white added. In other words, the larger the S, the purer the color.

● Value: sometimes loosely called intensity or lightness. The value ranges over [0, 255], where V = 0 is completely dark (black) and V = 255 is completely bright. In other words, the larger the V, the brighter the color.
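As a small illustration of these ranges, the sketch below converts 8-bit RGB triples to HSV with Python's standard `colorsys` module and rescales the result to H in [0, 360] and S, V in [0, 255], matching the convention used above. The function name `rgb_to_hsv_scaled` is an illustrative choice; image libraries may use other scalings for the same quantities.

```python
import colorsys


def rgb_to_hsv_scaled(r, g, b):
    """Convert 8-bit RGB to (H in degrees, S in 0..255, V in 0..255)."""
    # colorsys works on floats in [0, 1] and returns h, s, v in [0, 1].
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return round(h * 360), round(s * 255), round(v * 255)


for name, rgb in [("red", (255, 0, 0)), ("green", (0, 255, 0)),
                  ("blue", (0, 0, 255)), ("white", (255, 255, 255)),
                  ("black", (0, 0, 0))]:
    print(name, rgb_to_hsv_scaled(*rgb))
```

Pure red, green, and blue share S = 255 and V = 255 and differ only in hue, while white has S = 0 and black has V = 0, which is why thresholding in HSV is often more convenient than in RGB.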

2.4 Histogram of Oriented Gradients algorithm

HOG (Histogram of Oriented Gradients) [5] is an algorithm that generates a feature descriptor for detecting objects. From an image, we compute two important matrices that capture the image information: gradient magnitude and gradient orientation. These two pieces of information are combined into a histogram in which the gradient magnitudes are accumulated into bins according to gradient orientation. Finally, we obtain the HOG feature vector representing the histogram. Some applications of HOG are shown in Figure 2.6.

Figure 2.6 Applications of HOG
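To make the histogram step concrete, the following is a simplified sketch for a single cell of a grayscale image. It assumes central-difference gradients, unsigned orientations in [0, 180) degrees, 9 bins, and hard assignment of each pixel to one bin; full HOG implementations additionally interpolate between neighboring bins and normalize over blocks of cells, which is omitted here. The function name `cell_histogram` is illustrative.

```python
import math


def cell_histogram(cell, n_bins=9):
    """Orientation histogram of one grayscale cell, weighted by gradient magnitude."""
    rows, cols = len(cell), len(cell[0])
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    # Interior pixels only, so that both neighbors exist for the central difference.
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]
            gy = cell[r + 1][c] - cell[r - 1][c]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


# An 8x8 cell whose intensity increases from left to right:
# every gradient points along the x-axis, so only the first bin is filled.
horizontal_ramp = [[col for col in range(8)] for _ in range(8)]
print(cell_histogram(horizontal_ramp))
```

Concatenating such histograms over all cells of a detection window gives the feature vector that is then passed to a classifier.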

2.5 Support Vector Machine algorithm

Supervised learning algorithms such as SVM are used to solve both classification and regression problems. However, SVM is mostly employed for classification problems in machine learning. Using a technique known as the kernel trick, SVMs can efficiently perform non-linear classification in addition to linear classification by implicitly mapping their inputs into high-dimensional feature spaces.

SVMs are based on the idea of finding a hyperplane that best divides a dataset into two classes, as shown in Figure 2.7.

Figure 2.7 An example of support vector in 2-Dimensional data

For 1-dimensional data, the support vector classifier is a point. Similarly, for 2-dimensional data, the support vector classifier is a line, and for 3-dimensional data, the support vector classifier is a plane. For data of 4 or more dimensions, the support vector classifier is a hyperplane.

In geometry, a hyperplane is a subspace whose dimension is one less than that of its ambient space. If the space is 3-dimensional, then its hyperplanes are the 2-dimensional planes, while if the space is 2-dimensional, its hyperplanes are the 1-dimensional lines. This notion can be used in any general space in which the concept of the dimension of a subspace is defined [3].

Figure 2.8 Margins described in a plane

The distance between the hyperplane and the nearest data point from either set is known as the margin. The goal is to choose a hyperplane with the greatest possible margin between the hyperplane and any point within the training set, giving a greater chance of new data being classified correctly, as shown in Figure 2.8.

However, data is rarely as clean as in the simple example above. A dataset will often look more like the jumbled points in Figure 2.9, which represent a linearly non-separable dataset. To classify such a dataset, it is necessary to move from a 2D view of the data to a 3D view.

Figure 2.9 An example of linearly non separable dataset

Because we are now in three dimensions, our hyperplane can no longer be a line. It must now be a plane, as shown in the example above. The idea is that the data continues to be mapped into higher and higher dimensions until a hyperplane can be formed to separate it.
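The idea of lifting data into a higher dimension can be illustrated with a toy example. The sketch below is not a trained SVM: the two rings of points, the lifting function z = x² + y², and the separating plane z = 5 are all chosen by hand for illustration. It only shows that points which no line can separate in 2D become separable by a plane in 3D, and how the margin of that plane is measured.

```python
import math


def ring(radius, n=8):
    """n points evenly spaced on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]


def lift(point):
    """Map a 2D point (x, y) to 3D by adding the feature z = x^2 + y^2."""
    x, y = point
    return (x, y, x * x + y * y)


inner = ring(1.0)   # class -1: surrounded by the other class, so no line separates them
outer = ring(3.0)   # class +1

# Hand-picked plane w . p + b = 0 in the lifted space, i.e. the plane z = 5.
W, B = (0.0, 0.0, 1.0), -5.0


def decision(p3):
    return sum(wi * pi for wi, pi in zip(W, p3)) + B


def classify(point):
    return 1 if decision(lift(point)) > 0 else -1


# Margin: distance from the plane to the nearest lifted training point.
norm_w = math.sqrt(sum(wi * wi for wi in W))
margin = min(abs(decision(lift(p))) for p in inner + outer) / norm_w

print([classify(p) for p in inner], [classify(p) for p in outer], margin)
```

A kernel SVM reaches the same effect without computing the lifted coordinates explicitly, and it selects the plane with the largest margin automatically instead of by hand.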

SVM works well, with high accuracy, on smaller and cleaner datasets. Because its decision function uses only a subset of the training points (the support vectors), it is also efficient. Despite these advantages, there are some drawbacks when applying the SVM algorithm. First, the algorithm is not well suited to larger datasets, because the training time becomes long. Second, it is less effective on noisier datasets with overlapping classes.

2.6 Background of speech recognition system

2.6.1 Concept

Speech recognition is an intricate process. The voice signal is analog; it is sampled, quantized, and coded to create a digital signal, from which features are extracted. These features serve as the input to the recognition process, and the recognition system produces the recognition result.
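The sampling, quantizing, and coding steps can be sketched as follows. This is a minimal illustration under assumed parameters: a 440 Hz sine wave stands in for the analog voice signal, the sample rate is 8 kHz, and each sample is quantized to a signed 16-bit integer and packed as little-endian bytes. None of these values are taken from the thesis.

```python
import math
import struct

SAMPLE_RATE = 8000   # samples per second (assumed)
TONE_HZ = 440.0      # stand-in for a voice signal
DURATION_S = 0.01    # 10 milliseconds
MAX_INT16 = 32767


def analog_signal(t):
    """Continuous-time signal with amplitude in [-1, 1]."""
    return math.sin(2 * math.pi * TONE_HZ * t)


def sample(signal, sample_rate, duration_s):
    """Sampling: read the signal at evenly spaced instants."""
    n = round(sample_rate * duration_s)
    return [signal(k / sample_rate) for k in range(n)]


def quantize(samples):
    """Quantizing: map each amplitude to the nearest 16-bit integer level."""
    return [max(-MAX_INT16, min(MAX_INT16, round(x * MAX_INT16))) for x in samples]


def encode(levels):
    """Coding: pack the integer levels as little-endian 16-bit PCM bytes."""
    return struct.pack("<%dh" % len(levels), *levels)


levels = quantize(sample(analog_signal, SAMPLE_RATE, DURATION_S))
pcm_bytes = encode(levels)
print(len(levels), len(pcm_bytes), levels[:5])
```

The resulting byte string is the kind of raw digital audio from which a recognizer then extracts features.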

Some factors that make speech recognition difficult:

- Speakers pronounce words at different speeds, sometimes fast and sometimes slow.

- Spoken words often differ in length.

- The same person may say the same word with different pronunciations and endings, which leads to different analysis results.

- Each person has their own voice, expressed through pitch, loudness, intensity, and timbre. Noise from the environment and the receiving equipment also significantly affects recognition performance.

Speech-to-text recognition and conversion systems are widely researched and developed by domestic and international scientists.

a suitable form, as shown in Figure 2.10

Figure 2.10 Speech Recognition

Although speech recognition appears quite futuristic, it is already commonplace. We can speak our question or request on automated phone calls, and speech recognition is also used by virtual assistants such as Siri or Alexa to converse with us naturally.

Python's speech recognition uses algorithms that model speech in terms of both language and sound. Acoustic modeling is used to distinguish the phonemes in our speech, so that the more important parts of speech, such as words and sentences, can be extracted.

Figure 2.11 Implementation of Speech Recognition

As shown in Figure 2.11, speech recognition begins by using a microphone to transform the sound energy produced by the speaker into electrical energy. This electrical signal is subsequently converted from analog to digital and eventually to text.

The system separates the audio data into sounds and then uses algorithms to analyze the sounds and determine which word is most likely to fit the audio. Neural networks and natural language processing [6, 7] are used for all of this. The accuracy of speech recognition can be increased by identifying temporal patterns using hidden Markov models.

For the speech recognition task, Python supports many packages. Table 2.2 outlines some of these packages and highlights their specialties.

Table 2.2 Speech recognition packages in Python

Package                  Functionality
Apiai                    Includes natural language processing for identifying a speaker's intent
Google-cloud-speech      Offers basic speech-to-text conversion
Speech Recognition       Offers easy audio processing and microphone accessibility
Watson-developer-cloud   An Artificial Intelligence API that makes creating, debugging, running, and deploying APIs easy; it can be used to perform basic speech recognition tasks

Table 2.2 lists packages available in Python. Among them, Speech Recognition stands out for its ease of use [6, 8].

Recognizing speech requires audio input, and Speech Recognition makes retrieving this input easy. Instead of having to build scripts for accessing microphones and processing audio files from scratch, Speech Recognition can have us up and running in just a few minutes.

The Speech Recognition package offers the following advantages [6, 8]:

- Easy speech recognition from the microphone

- Makes it easy to transcribe an audio file

- It also lets us save audio data into an audio file

- It also shows us recognition results in an easy-to-understand format
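As an illustration of transcribing an audio file with this package, the following is a minimal sketch and not the project's actual code. It assumes the package is installed under its PyPI name SpeechRecognition, uses the package's `Recognizer`, `AudioFile`, and `recognize_google` calls, and requires an internet connection. The function name `transcribe_file`, the file name in the usage comment, and the default language code `"vi-VN"` are assumptions.

```python
try:
    import speech_recognition as sr  # third-party package: SpeechRecognition
except ImportError:
    sr = None  # lets the sketch load even where the package is not installed


def transcribe_file(path, language="vi-VN"):
    """Transcribe a WAV/AIFF/FLAC file with Google's web speech service.

    Returns the recognized text, or None if the speech was unintelligible.
    """
    if sr is None:
        raise RuntimeError("Install the SpeechRecognition package first")
    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return None  # audio was received but could not be understood
    except sr.RequestError as err:
        raise RuntimeError(f"Speech service request failed: {err}") from err


# Example call (hypothetical file name):
# text = transcribe_file("question.wav")
```

For live input, the same recognizer can read from `sr.Microphone()` with `recognizer.listen(source)` instead of `recognizer.record(source)`; microphone access additionally requires the PyAudio package.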

2.6.3 Applications

● Windows Speech Recognition

As seen in Figure 2.12, the "Windows Speech Recognition" application, introduced in 2009 and built into Microsoft Windows 7, Windows 8, and Windows 10, recognizes speech in order to manage and control software and apps on the Windows operating system and so speed up the user experience.

Figure 2.12 Interface of Windows Speech Recognition

The application's key capabilities are managing and controlling computer software and applications and producing text from voice. However, this recognizer still has many issues to fix: it must be trained before use, it has trouble separating voices accurately, and it is ineffective and cannot yet recognize Vietnamese.

● Voice-To-Text Facebook Messenger

Facebook Messenger added the "Voice-To-Text" feature in 2013. As seen in Figure 2.13, this feature recognizes the voice and turns it into a text message that is sent to the receiver through the text message input of the Facebook Messenger application.

Its benefits include fairly precise recognition, and because it relies on Facebook's own data warehouse, no prior training is required. However, this tool does not support Vietnamese and only converts voice to text.

Figure 2.13 Interface of Voice-To-Text Facebook Messenger

● Google Speech to Text

Google created Google Speech to Text around two years ago. The program integrates into the Chrome browser, works on a variety of platforms, including Windows, iOS, and Android, and can recognize lengthy texts. The Google Speech to Text API is displayed in Figure 2.14.

Figure 2.14 Interface of Google Speech to Text

This tool was launched in 2017 and has significantly improved on the disadvantages of its predecessors, offering good recognition and conversion as well as support for Vietnamese. The well-known tools shown in Figures 2.12 to 2.14 illustrate how useful voice recognition is around the world. Based on the above research, the student decided to use Google's voice recognition tool in this project.

2.7 Framework and libraries

2.7.1 PyTorch framework

PyTorch [9] is a Python-based library for creating Deep Learning models and using them in various applications. PyTorch is not just a Deep Learning library but also a package for scientific computing, as the official PyTorch documentation [9] states:

“It's a Python-based scientific computing package targeted at two sets of audiences:

1. A replacement for NumPy to use the power of GPUs

2. A deep learning research platform that provides maximum flexibility and speed.”

PyTorch [9] feels similar to Python itself: it is designed with a focus on ease of use, so even users with very basic programming knowledge can use it in Deep Learning projects. Figure 2.15 compares two popular Machine Learning frameworks, TensorFlow and PyTorch, over the period from 2017 to 2021.

Figure 2.15 PyTorch and TensorFlow frameworks from 2017 to 2021 [10]

Title	Service robot for students based on computer vision and natural language processing
Author	Nguyen Tuan Thanh
Advisor	Assoc. Prof. PhD. Le My Ha
University	Ho Chi Minh City University of Technology and Education
Major	Automation and Control Engineering
Type	Graduation project
Year of publication	2022
City	Ho Chi Minh City
