Graduation thesis: Building Vietnamese Celebrity Face Recognition System for LOTUS - Vietnam Social Network


VIETNAM NATIONAL UNIVERSITY - HO CHI MINH CITY

UNIVERSITY OF INFORMATION TECHNOLOGY

FACULTY OF SOFTWARE ENGINEERING

LÊ THỊ PHƯƠNG NGÂN, NGUYỄN TIẾN TRUNG

GRADUATION THESIS

BUILDING VIETNAMESE CELEBRITY FACE RECOGNITION SYSTEM FOR LOTUS - VIETNAM SOCIAL NETWORK

ENGINEER OF SOFTWARE ENGINEERING

HO CHI MINH CITY, 2021

THESIS EXAMINATION COMMITTEE INFORMATION

The thesis examination committee was established under Decision No. ……… dated ……… of the Rector of the University of Information Technology.

1 ……… – Chair

2 ……… – Secretary

3 ……… – Member

4 ……… – Member

VIETNAM NATIONAL UNIVERSITY - HO CHI MINH CITY

UNIVERSITY OF INFORMATION TECHNOLOGY

SOCIALIST REPUBLIC OF VIETNAM

Independence - Freedom - Happiness

Ho Chi Minh City, date: …/…/……

THESIS EVALUATION (BY THE ADVISOR/REVIEWER)

Thesis title:

BUILDING VIETNAMESE CELEBRITY FACE RECOGNITION SYSTEM FOR LOTUS - VIETNAM SOCIAL NETWORK

Students: Lê Thị Phương Ngân (16520792), Nguyễn Tiến Trung (16521321)

Advisor: MSc. Đỗ Văn Tiến

Thesis assessment

1 About the report:

Number of pages _ Number of chapters _

Number of data tables _ Number of figures _

Number of references _ Product _

Some comments on the form of the report:

2 About the research content:

3 About the application:

4 About the students' working attitude:

Overall assessment: The thesis meets/does not meet the requirements of an engineer's/bachelor's graduation thesis, graded Excellent/Good/Average

Score for each student:

Lê Thị Phương Ngân: ……… /10

Nguyễn Tiến Trung: ……… /10

Evaluator

(Signature and full name)

ĐỖ VĂN TIẾN

VIETNAM NATIONAL UNIVERSITY - HO CHI MINH CITY

UNIVERSITY OF INFORMATION TECHNOLOGY

SOCIALIST REPUBLIC OF VIETNAM

Independence - Freedom - Happiness

Ho Chi Minh City, date: …/…/……

THESIS EVALUATION

(BY THE REVIEWER)

Thesis title:

BUILDING VIETNAMESE CELEBRITY FACE RECOGNITION SYSTEM FOR LOTUS - VIETNAM SOCIAL NETWORK

Students: Lê Thị Phương Ngân (16520792), Nguyễn Tiến Trung (16521321)

Thesis assessment

1 About the report:

Number of pages _ Number of chapters _

Number of data tables _ Number of figures _

Number of references _ Product _

Some comments on the form of the report:

2 About the research content:

3 About the application:

4 About the students' working attitude:

Overall assessment: The thesis meets/does not meet the requirements of an engineer's/bachelor's graduation thesis, graded Excellent/Good/Average

Score for each student:

Lê Thị Phương Ngân: ……… /10

Nguyễn Tiến Trung: ……… /10

Evaluator

(Signature and full name)

We dedicate this thesis to those who always helped us and taught us useful knowledge while we completed it, to those who inspired us whenever we faced difficult problems, and to our beloved families, who made it possible for us to complete our studies.

Firstly, we would especially like to express our appreciation and thanks to M.S. Do Van Tien for instructing and helping us and for giving us useful advice. Our conscientious instructors have taught us a great deal of knowledge, along with various skills, to complete our undergraduate thesis.

Lastly, we thank our families and our friends in the KTPM2016 class for always inspiring us during our time at the University of Information Technology.

Ho Chi Minh City, 1 - 2021

Le Thi Phuong Ngan - Nguyen Tien Trung

1.1 Problem statements
1.2 Goals and scope
1.2.1 Goals
1.2.2 Scope
1.3 Contributions
1.4 Outline
2 RELATED WORKS
2.1 Introduction
2.2 Image processing face recognition
2.3 The methods face recognition
2.3.1 Classical face recognition algorithms
2.3.2 Deep learning for face recognition
2.4 Face recognition applications
3.1 Background
3.1.1 Artificial Neural Networks (ANNs)
3.1.2 Convolutional Neural Networks (CNNs)
3.2 Face Detection
3.2.1 MTCNN
3.2.2 SSH
3.2.3 Retinaface
3.3 Face Recognition
3.3.1 Softmax Loss
3.3.2 Centre Loss
3.3.3 Triplet Loss
3.3.4 SphereFace Loss
3.3.5 CosFace Loss
3.3.6 Arcface
3.4 Searching Embedding Vector
3.4.1 Similarity Searching
3.4.2 Evaluating similarity search
3.4.3 Faiss
4 EXPERIMENT
4.1 Dataset
4.1.1 Create celebrity list
4.1.2 Preprocess data
4.2 Evaluation
4.2.1 Some evaluation measures
4.2.1.1
4.2.1.2 Precision and Recall
4.2.1.3 F1-Score
4.2.1.4 IoU
4.2.1.5 mAP
4.3 Results and Evaluations
4.3.1 Results
4.3.1.1 Face detection stage
4.3.2 Evaluations
4.4 Discussion
5 DEMONSTRATION
5.1 Preamble
5.2 Use-case diagram
5.2.1 Actor
5.2.2 Usecase diagram
5.2.3 Usecase Specification
5.2.3.1 The use case description celebrity identity
5.2.3.2 The use case description looking for celebrity
5.2.3.3 The use case description add celebrity
5.3 Sequence diagram
5.4 Activity diagram
5.5 Processing flow
5.6 System architecture
5.6.1 The celebrity recognition system API in Vietnam
5.7 Interface
5.7.1 Screen Details Description
5.7.1.1 Home Page
5.7.1.2 Predict Page
5.7.1.3 Search Page
5.7.1.4 Celebrity Add Page
5.8 Consequence
5.9 Conclusion
6.2 Development

List of Figures

1.1 Some celebrity people in Vietnam
2.1 Face recognition process flow
2.2 A mesh consists of vertices plus triangles
2.3 A mesh consists of vertices plus triangles
3.1 The illustration of a simple architecture of Artificial neural network
3.2 A cartoon drawing of a biological neuron (left) and a common mathematical model (right)
3.3 Sigmoid function
3.4 Tanh function
3.5 ReLU function
3.6 Activation function of ANN
3.7 An example of the receptive field
3.8 The ReLU activation function
3.9 An example for fully connected layer (FC)
3.10 Pipeline of cascaded framework
3.11 The architectures of P-Net
3.12 The architectures of R-Net
3.13 The architectures of O-net
3.14 The network architecture of SSH
3.15 Detection Module: Set of conv layers for detecting and localizing faces
3.19 Common loss functions for Face Recognition: Softmax Loss
3.20 Decision margins of Softmax loss function under binary classification case
3.21 Based on the centre and feature normalisation, all identities are distributed on a hypersphere
3.22 Triplet loss example
3.23 Geometry Interpretation of Euclidean margin loss
3.24 Decision margins of different loss functions under binary classification case
3.25 Training a DCNN for face recognition supervised by the ArcFace loss
3.26 Toy examples under the Softmax and ArcFace loss on 8 identities with 2D features
4.1 Query list data from https://query.wikidata.org/
4.2 Crawl image for Son Tung MTP singer from selenium
4.3 Illustration for Confusion matrix (Internet)
4.4 Formula Precision and Recall (Internet)
4.5 Formula F1-Score (Internet)
4.6 Illustration for IoU (Internet)
4.7 Formula mAP (Internet)
4.8 Representation formula mAP (Internet)
4.9 Statistical chart mAP
4.10 Statistical chart Average inference time
5.1 Usecase diagram
5.2 Sequence diagram Celebrity identity
5.3 Sequence diagram looking for celebrity
5.4 Sequence diagram add celebrity
5.5 Activity diagram celebrity identity
5.6 Activity diagram looking for celebrity
5.7 Activity diagram add celebrity
5.8 Processing flow system
5.9 Client-server architecture
5.10 The face recognition celebrity system architecture
5.11 Screens flow diagram of the application that celebrity recognition in images
5.12 Home Page
5.13 Predict Page
5.14 Search Page
5.15 Celebrity Add Page Steps 1
5.16 Celebrity Add Page Steps 2

List of Tables

3.1 Verification performance (%) of different methods on LFW and YTF
4.1 Celebrity data statistics
5.1 List actor
5.2 Table usecase
5.3 Input parameter API: /getcelebs
5.4 Results API: /getcelebs
5.5 Input parameter API: /predict
5.6 Results API: /predict
5.7 Input parameter API: /search
5.8 Results API: /search
5.9 Input parameter API: /checkname
5.10 Results API: /checkname
5.11 Input parameter API: /addceleb
5.12 Results API: /addceleb
5.13 Table List Screen
5.14 Parameters in Home Page
5.15 Parameters in Predict Page
5.16 Parameters in Search Page
5.17 Parameters in Celebrity Add Page

Abbreviations

ANN(s) Artificial Neural Networks

CNN(s) Convolutional Neural Networks

Conv Convolutional

LBP Local binary patterns

LMCL Large Margin Cosine Loss

MTCNN Multi-task Cascaded Convolutional Networks

NMS Non-maximum suppression

PCA Principal component analysis

ReLU Rectified Linear Unit

SSH Single Stage Headless

SVM Support Vector Machine

With advances in technology and engineering, the computing fields are developing rapidly, computer vision in particular. One of the problems in computer vision is face recognition, which has applications in many different fields: attendance tracking, identification, security, etc.

There has been much research on the face recognition problem in images. However, the methods applied deal primarily with images in which the faces are frontal, upright, and well lit. In real problems, images need additional processing because they are subject to many environmental effects and distortions. Celebrity recognition is especially complicated, because it requires recognizing and classifying the celebrities of an entire country. We therefore decided to research and solve the problem of recognizing the faces of celebrities in Vietnam.

In computer vision there are quite a few methods for solving these problems; the approaches currently considered to achieve good results use deep learning. Balancing precision against processing speed when building the application is also a big challenge for us.

Accordingly, this thesis covers the following:

• Get an overview of machine learning and its basics.

• Get an overview of Deep Learning and explore today’s most advanced facial recognition methods based on Deep Learning.

• Build a training data set that includes images for face detection and images for face recognition.

• Build a model evaluation data set that includes images for face detection and images for face recognition, used to evaluate the respective models in each stage.

• Build a web application that recognizes the faces of Vietnamese celebrities in images.

Chapter 1 INTRODUCTION

1.1 Problem statements

Nowadays, with the development of information technology, data is distributed everywhere: every second, millions of images and videos are uploaded to the Internet and spread quickly. Social networks in particular are where people regularly post pictures, usually to share memories, experiences, products, and personal art with others. Images of celebrities often attract many shares and interactions on social networks, which is an advantage for celebrities in product advertising.

Currently, many domestic and foreign companies are solving face recognition problems: Apple uses face recognition to unlock mobile devices; Facebook uses a friend face tagging system to connect with the community; financial companies use face recognition to authenticate payments instead of physical cards; airports and terminals use face recognition for security control; and schools and companies want automatic attendance systems based on facial authentication. However, few people have paid attention to recognizing the faces of famous people in images, especially celebrities in Vietnam, even as more and more pictures are posted on social networks.

Based on our limited understanding during the survey period, we found that there have been many studies on the face recognition problem: many models have been proposed, and many pre-trained models have been published along with free public face datasets. The results achieved on these datasets are very good, but many observations show that applying them to real problems in Vietnam does not work as well. With the aims of doing research, keeping up with the technology, and optimizing face recognition for Vietnamese people, we decided to research and build a celebrity face recognition system for Vietnam. That is the reason the Vietnamese celebrity recognition system for images came into being. With this system, we can extract information from images so that it can be applied in many different fields, especially in recommendation systems.

(a) My Tam (b) Chi Pu

Figure 1.1: Some celebrity people in Vietnam.

In recent years, more and more data has become available, which, together with improved computing speed, has contributed to the rapid development of Deep Learning and made it a trend. Deep Learning methods and algorithms achieve better results than handcrafted approaches, which is why the research community is interested in them. As a result, the set of Deep Learning algorithms and methods has become increasingly rich and diverse. To know which algorithm is suitable for this recognition problem, we surveyed and evaluated many methods to choose the most suitable one. We then apply this method to build the recognizer that will be used to develop applications in the future.

1.2 Goals and scope

1.2.1 Goals

To solve the problem of face recognition in images, we set the following goals for the thesis:

• Get an overview of machine learning and its basics.

• Get an overview of Deep Learning and explore today’s most advanced facial recognition methods based on Deep Learning.

• Build a training data set that includes images for face detection and images for face recognition.

• Build a model evaluation data set that includes images for face detection and images for face recognition, used to evaluate the respective models in each stage.

• Build a web application that recognizes the faces of Vietnamese celebrities in images.

1.2.2 Scope

The scope of our thesis consists of:

• Face recognition of celebrities in Vietnam.

• Evaluating the algorithms of each stage: face detection and face recognition.

• Building a list of celebrities in Vietnam and a data set of these celebrities for the face recognition phase; the data is collected from Google Images.

• Building a demonstration application for recognizing the faces of Vietnamese celebrities in images.

1.3 Contributions

These are the contributions we made in this thesis:

• We study the knowledge and approaches relevant to face recognition, especially approaches based on Deep Learning.

• We build a dataset exclusively for research purposes, to optimize face recognition for Vietnamese people.

• We evaluate state-of-the-art face detection approaches on different aspects, including execution time, accuracy, and resource usage, along with the trade-offs among a variety of inputs and base networks. Based on the results we achieved, we analyze how to choose a suitable model for face detection.

• We build a demonstration application that users can use to recognize celebrities in images.

1.4 Outline

Chapter 1: General introduction

Chapter 2: Related works

Chapter 3: Face recognition with deep learning

Chapter 4: Experiment

Chapter 5: Demonstration


Chapter 2 RELATED WORKS

In this chapter, we introduce the face recognition problem, its processing flow, face recognition methods, and applications of the problem.

2.1 Introduction

Face recognition, as one of the most successful applications of computer vision, has recently gained significant attention, owing to the availability of feasible technologies, including mobile solutions. Research in automatic face recognition has been conducted since the 1960s, but the problem is still largely unsolved. The last decade has brought significant progress in this area owing to advances in face modelling and analysis techniques. Although systems have been developed for face detection and tracking, reliable face recognition still poses a great challenge to computer vision and pattern recognition researchers. There are several reasons for the recent increased interest in face recognition, including rising public concern for security, the need for identity verification in the digital world, and the use of face analysis and modelling techniques in multimedia data management and computer entertainment. In this chapter, we discuss face recognition processing, including major components such as face detection, tracking, alignment, and feature extraction, and point out the technical challenges of building a face recognition system. We focus on the most successful solutions available so far. The final part of the chapter describes selected face recognition methods and applications and their potential use in areas not related to face recognition.

2.2 Image processing face recognition

Face recognition is often described as a process that involves four steps: face detection, face alignment, feature extraction, and finally face recognition.

• Face Detection: Locate one or more faces in the image and mark each with a bounding box.

• Face Alignment: Normalize the face to be consistent with the database, for example in geometry and photometrics.

• Feature Extraction: Extract features from the face that can be used for the recognition task.

• Face Recognition: Match the face against one or more known faces in a prepared database.

Figure 2.1: Face recognition process flow.
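The last step of the flow above, matching an extracted feature vector against a prepared database, can be illustrated with a minimal sketch. This is only an illustration under stated assumptions: cosine similarity as the matching score, the 0.5 threshold, the function names, and the toy 3-dimensional vectors are all hypothetical and are not taken from the system described in this thesis.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recognize(query, database, threshold=0.5):
    """Return the best-matching name, or None if no score reaches the threshold.

    `database` maps a name to one reference feature vector (hypothetical layout).
    """
    best_name, best_score = None, threshold
    for name, embedding in database.items():
        score = cosine_similarity(query, embedding)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Toy 3-dimensional feature vectors; real face features are much longer.
database = {"person_a": [1.0, 0.0, 0.0], "person_b": [0.0, 1.0, 0.0]}
print(recognize([0.9, 0.1, 0.0], database))  # closest to person_a
print(recognize([0.0, 0.0, 1.0], database))  # no known face is similar enough
```

Returning `None` below the threshold lets the caller treat a face as unknown instead of forcing it onto the nearest known identity.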


2.3 The methods face recognition

2.3.1 Classical face recognition algorithms

These techniques were developed long ago. Provided that the input image is of good quality, traditional techniques manually extract features from the image, which are then used for classification.

Traditional face recognition algorithms can be divided into two categories: holistic feature approaches and local feature approaches. As representatives of the two, we take PCA [1] and LBP [2].

PCA [1]: Principal component analysis is one of the oldest and most popular methods in face recognition research. The main idea of the algorithm is to reduce the number of dimensions of a data set whose variables are highly correlated. In essence, PCA [1] solves the problem of finding the eigenvalues and eigenvectors of a symmetric matrix.

Figure 2.2: A mesh consists of vertices plus triangles
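To make the PCA description above concrete, here is a minimal sketch that finds the leading eigenvalue and eigenvector of a covariance matrix (which is symmetric) and projects 2-D data onto that direction. It is an illustrative assumption, not the method used in [1]: power iteration is swapped in as a simple eigen-solver, and the toy data and function names are hypothetical.

```python
import math


def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance_matrix(centered):
    """Sample covariance matrix (symmetric) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(matrix, v):
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]


def leading_eigenpair(matrix, iterations=200):
    """Power iteration: largest eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v gives the eigenvalue for a unit vector v.
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v


# Toy 2-D data lying close to the line y = x, so the two variables are highly correlated.
data = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8], [5.0, 5.0]]
centered = center(data)
cov = covariance_matrix(centered)
eigenvalue, direction = leading_eigenpair(cov)

# Reduce each sample from 2 dimensions to 1 by projecting onto the principal direction.
projections = [sum(c * d for c, d in zip(row, direction)) for row in centered]
print(eigenvalue / (cov[0][0] + cov[1][1]))  # share of total variance kept
```

Because the two variables are strongly correlated, a single principal direction keeps almost all of the variance, which is the dimensionality reduction the paragraph describes.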

LBP [2]: This method divides the image into a grid of blocks, the features on each […]

Figure 2.3: A mesh consists of vertices plus triangles

2.3.2 Deep learning for face recognition

With the advancement of hardware, deep learning techniques have gradually prevailed, with greater precision than traditional techniques. The biggest difference is that feature extraction takes place automatically, without requiring predefined convolution filters. This helps deep learning techniques solve the problem at a more general level and avoids too much dependence on each data set.

The biggest advantage of deep learning techniques is that they learn by themselves which features are best for classification, and their accuracy is much superior to that of traditional techniques. However, training deep learning models requires a large amount of data, is very expensive, and takes a lot of time.

2.4 Face recognition applications

The potential application areas of face recognition technology can be outlined as follows:

• Automated surveillance, where the objective is to recognise and track people.

• Face recognition can be used to look for lost children or other missing persons, or to track known or suspected criminals.

• At airplane boarding gates, face recognition may be used in place of random checks to screen passengers for further investigation.

• Scanning your face to unlock your phone, as on Apple devices.

• Payment assistance: instead of paying with cash or a credit card, you simply show your face to the scanner to make a purchase.

The above are some popular applications of face recognition, which can also be applied in many other fields: military, medical, entertainment, etc.

Many applications have been envisaged for face recognition, but most of them exploit the great potential of this technology only superficially. Most applications are notably limited in their ability to handle pose, lighting changes, or aging.

2.5 Conclusion

Face recognition is still a challenging problem in the field of computer vision. It has received a great deal of attention over the past years because of its applications in various domains. Although there is strong research effort in this area, face recognition systems are still far from performing adequately in all real-world situations. Much work remains to be done to realise methods that reflect how humans recognise faces and that make optimal use of the temporal evolution of the appearance of the face for recognition.

Through this chapter, we have covered the methods and stages of the face recognition problem and the difficulties encountered in different settings, in order to understand the context of the problem we need to solve. This also shows how to choose an approach and a suitable solution for the problem at hand. From there, we can select the corresponding methods for each subproblem.

Chapter 3 FACE RECOGNITION ON DEEP LEARNING

In this part, we present the foundations we studied for our proposal, including Artificial Neural Networks and Convolutional Neural Networks.

3.1 Background

3.1.1 Artificial Neural Networks (ANNs)

Figure 3.1: The illustration of a simple architecture of Artificial neural network.1

An Artificial Neural Network (ANN) is a learning algorithm that emerged and evolved from the idea of the biological neural networks of the human brain. It is an attempt to simulate the workings of the human brain: an ANN works similarly to a biological neural network but does not exactly replicate it. By feeding training examples (“experience”) to an ANN and adjusting the weights accordingly, an ANN learns complex functions much like a biological brain. A neural network with a single layer is called a perceptron. A multi-layer perceptron is called an Artificial Neural Network.

A neural network can have any number of layers, and each layer can have one or more neurons (units). Neurons in adjacent layers are interconnected. For regular neural networks, the most common layer type is the fully-connected layer, in which neurons in two adjacent layers are fully pairwise connected but neurons within a single layer share no connections. Each layer can also have a different activation function. The most common type of artificial neural network consists of three groups, or layers, of units: a layer of “input” units is connected to a layer of “hidden” units, which is connected to a layer of “output” units, as in Figure 3.1.

1 http://cs231n.github.io/neural-networks-1/

Figure 3.2: A cartoon drawing of a biological neuron (left) and a common mathematical model (right).

The above network takes numerical inputs $x_1, x_2$ and has weights $w_1, w_2$ associated with those inputs. Additionally, there is another input 1 with weight $b$ (called the bias) associated with it.

Each biological neuron receives input signals from its dendrites and produces output signals along its (single) axon. The axon eventually branches out and connects via synapses to the dendrites of other neurons. The basic unit of computation in a neural network is the neuron, often called a node or unit. It receives input from other nodes or from an external source and computes an output. Each input has an associated weight ($w_i$), which is assigned on the basis of its importance relative to the other inputs. The node applies a function $f$ (defined below) to the weighted sum of its inputs, as in Figure 3.2.

The purpose of the activation function is to introduce non-linearity into the output of a neuron. This is important because most real-world data is non-linear, and we want neurons to learn these non-linear representations.

Specifically, each neuron in the network produces an output determined by an activation function acting on its inputs. Several activation functions are in common use:

• Sigmoid: The sigmoid non-linearity has the mathematical form $f(x) = \sigma(x) = \frac{1}{1 + e^{-x}}$. It takes a real-valued input and squashes it to the range $[0, 1]$.

Figure 3.3: Sigmoid function.3

• Tanh: Like the sigmoid neuron, the tanh neuron's activations saturate, but unlike the sigmoid neuron its output is zero-centered. It takes a real-valued input and squashes it to the range $[-1, 1]$. Function formula: $f(x) = \tanh(x) = 2\sigma(2x) - 1$.

• ReLU: ReLU stands for Rectified Linear Unit. It takes a real-valued input and thresholds it at zero (replaces negative values with zero). Function formula: $f(x) = \mathrm{ReLU}(x) = \max(0, x)$.
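As a small illustration of the three activation functions and of how a single neuron uses one, consider the sketch below. It assumes the standard definitions, with tanh written through the identity $\tanh(x) = 2\sigma(2x) - 1$; the `neuron` helper and its example weights are hypothetical and chosen only so that the result is easy to check by hand.

```python
import math


def sigmoid(x):
    """Squashes a real input into the interval between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Zero-centered squashing into the interval between -1 and 1."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


def relu(x):
    """Replaces negative values with zero."""
    return max(0.0, x)


def neuron(inputs, weights, bias, activation):
    """Apply an activation function to the weighted sum of the inputs plus a bias."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Weighted sum: 0.5 * 1.0 + 0.25 * 2.0 - 1.0 = 0.0, and sigmoid(0) = 0.5.
print(neuron([1.0, 2.0], [0.5, 0.25], -1.0, sigmoid))
print(tanh(0.7), relu(-3.0), relu(2.5))
```

The example keeps inputs small; for very large negative inputs this direct form of `sigmoid` would overflow in `math.exp`, so production code usually relies on a numerically stable library routine.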

An overview of ANNs helps us understand the theory, learning rules, and applications of the most important neural network models, as well as their definitions and types of computation. The mathematical model of an ANN gives insight into the definitions of input, weight, summing function, activation function, and output.

3 http://cs231n.github.io/neural-networks-1/

Figure 3.4: Tanh function

Figure 3.5: ReLU function.

After that, the ANN learns by adjusting its weights. Inside the ANN, each node (neuron) performs a simple computation and passes its result to the nodes of the next layer through full connections between nodes in different layers; nodes within the same layer are not connected. As information is passed along, the weights are updated so that the signal is amplified or diminished, and this information is propagated through the network by the forward and backpropagation algorithms.

Many tasks in Computer Vision and Pattern Recognition are extremely complicated for computers yet easy for humans. For instance, a human quickly detects a number of objects from visual information without much effort. Computers therefore need to simulate how humans perform these tasks within the limits of the physical hardware. Studying ANN models is effective and necessary here, because they describe how neurons convey information to one another in a network, which helps researchers improve their contributions and work on artificial intelligence systems.

To succeed, an ANN must go through a training stage in which it learns weights and updates them suitably for a particular task. One such training method is backpropagation, which is usually used with an optimization method such as gradient descent. Backpropagation adjusts the weights toward a well-optimized set of values (minimum error). Before training, the ANN must initialize a set of weights and choose a learning rate α. The weights are created randomly and are usually small values between -1 and 1. The learning rate is a parameter that controls how much the weights change as the network's loss function moves toward a minimum during training. A small learning rate means the weights change only slightly at each step, so the loss function reaches its minimum slowly. Conversely, if the learning rate is large, the weights change widely, and training can become unstable because of these large changes.
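The effect of the learning rate can be seen on a one-parameter toy problem. The sketch below is an assumption made purely for illustration: it minimises the hypothetical loss $f(w) = w^2$ (gradient $2w$) with plain gradient descent rather than training a real network, and the three learning rates are arbitrary examples.

```python
def gradient_descent(learning_rate, steps=50, w=1.0):
    """Minimise f(w) = w**2, whose gradient is 2*w, starting from w."""
    for _ in range(steps):
        gradient = 2.0 * w
        w = w - learning_rate * gradient  # update rule: w <- w - alpha * gradient
    return w


# The minimum is at w = 0; compare where 50 steps end up for three learning rates.
for alpha in (0.01, 0.4, 1.1):
    print(alpha, gradient_descent(alpha))
```

With the small rate the weight is still far from the minimum after 50 steps, the moderate rate reaches it almost exactly, and the large rate makes every step overshoot so that the weight grows without bound.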

3.1.2 Convolutional Neural Networks (CNNs)

A CNN is a deep learning model whose neurons are inspired by and developed from the organization of the visual cortex. The architecture of a CNN is similar to that of an ordinary neural network: it is built from neurons that have learnable weights and biases. Each neuron takes an input and transforms it with a non-linearity. The entire CNN still expresses a single differentiable score function, from the raw image pixels at one end to the class scores at the other. A CNN still has a loss function such as SVM or Softmax on the last fully connected layer, and the methods developed for learning in ANNs still apply to CNNs.

A CNN differs from an ANN in that it makes the explicit assumption that the inputs are images, which allows certain properties of images to be encoded into the network architecture. This makes the forward pass more efficient to implement and significantly reduces the number of parameters. The layers used in a CNN include the Convolutional layer, Pooling layer, Rectified Linear Unit layer, Normalization layer, Fully-Connected layer, and Dropout layer, and these layers are stacked in a certain order to form the full architecture of a CNN. The construction of a CNN architecture is therefore flexible, and the layers are arranged differently depending on the problem. To make the network clearer, here are some descriptions of the layers of a CNN:

- Convolutional layer (Conv layer): the convolutional layer is the core layer of a convolutional network. Its parameters are a set of convolution kernels (filters), as in Figure 3.7. Each filter is a small matrix, and the number of parameters is reduced by sliding (striding) the same filter across the entire image to compute results. Each region that a filter covers is called a receptive field.

Figure 3.7: An example of the receptive field.7
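The sliding of a filter across an image can be sketched in a few lines. This is a simplified illustration under stated assumptions: a single-channel image, no padding, and the cross-correlation form that CNN libraries commonly call convolution; the image and kernel values are arbitrary toy numbers.

```python
def conv2d(image, kernel, stride=1):
    """Slide a kernel over a 2-D image (no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride
            # The kh x kw patch starting at (top, left) is this output's receptive field.
            row.append(sum(image[top + a][left + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        output.append(row)
    return output


image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2],
         [0, 1, 2, 3]]
kernel = [[1, 0],
          [0, -1]]

print(conv2d(image, kernel))            # 3 x 3 feature map
print(conv2d(image, kernel, stride=2))  # 2 x 2 feature map
```

The same four kernel weights are reused at every position, so the layer has far fewer parameters than a fully connected layer over the same image.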

- Pooling layer: the pooling layer is another layer of a CNN that usually comes immediately after a convolutional layer. The pooling layer simplifies the results from the convolutional layer. […] Several kinds of computation can be used in a pooling layer, such as average, stochastic, min, and max, but max pooling is the most widely used by researchers.
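Max pooling, the most widely used variant mentioned above, can be sketched as follows. The 2 x 2 window with stride 2 is a common choice but is an assumption here, and the feature map values are arbitrary.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep only the largest value in each size x size window."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [[max(feature_map[i * stride + a][j * stride + b]
                 for a in range(size) for b in range(size))
             for j in range(out_w)]
            for i in range(out_h)]


feature_map = [[1, 3, 2, 4],
               [5, 6, 1, 0],
               [7, 2, 9, 8],
               [0, 1, 3, 4]]

pooled = max_pool2d(feature_map)
print(pooled)  # the 4 x 4 map is reduced to 2 x 2
```

Each output value summarises one window, so the spatial size shrinks while the strongest responses are kept.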

- Rectified Linear Unit layer (ReLU): the ReLU layer computes the activation function $f(x) = \max(0, x)$, thresholding at zero. This leaves the size of the image unchanged. ReLUs have both advantages and disadvantages. Researchers have found that ReLU accelerates the convergence of stochastic gradient descent compared to the sigmoid or tanh functions. In addition, whereas tanh and sigmoid neurons involve expensive operations, ReLU can be implemented by simply thresholding a matrix of activations at zero. Unfortunately, ReLU units can be fragile during training. For example, a large gradient flowing through a ReLU neuron could cause the weights to update in such a way that the neuron never activates on any data point again. If this happens, the gradient flowing through the unit will be zero from that point on. That is, ReLU units can irreversibly die during training, since they can get knocked off the data manifold.

Figure 3.8: The ReLU activation function.

- Fully-connected layer (FC layer): the fully connected layer is the classical type of neural network layer, in which each neuron is connected to all activations of the previous layer. Specifically, the layer takes a vector of numbers as input, each of its neurons is directly connected to every output of the previous layer, and its outputs also form a vector of numbers. Each of those connections has a weight that represents how important it is.

Figure 3.9: An example for fully connected layer (FC)

- Dropout: Dropout is a simple, very effective, and recently introduced regularization method that helps the network reduce over-fitting. During training, dropout is implemented by keeping a neuron active only with some probability p (a hyperparameter) and setting it to zero otherwise.

7 http://cs231n.github.io/convolutional-networks/

9 https://www.quora.com/What-is-an-intuitive-explanation-of-Convolutional-Neural-Networks
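The dropout rule described above can be sketched as below. One assumption goes beyond the text: the kept activations are divided by the keep probability ("inverted dropout", a common variant), so that the expected activation stays the same and nothing needs to change at test time. The function name and the fixed random seed are illustrative choices.

```python
import random


def dropout(activations, keep_prob, rng, training=True):
    """Keep each activation with probability keep_prob, otherwise set it to zero.

    Kept values are scaled by 1 / keep_prob (inverted dropout, an assumed variant).
    """
    if not training:
        return list(activations)  # at test time every neuron stays active
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)  # fixed seed so the example is repeatable
dropped = dropout([1.0] * 1000, keep_prob=0.8, rng=rng)
print(sum(1 for v in dropped if v == 0.0))  # roughly 200 of 1000 are zeroed
print(sum(dropped) / len(dropped))          # mean stays close to 1.0
```

Because a different random subset of neurons is silenced at every training step, the network cannot rely on any single neuron, which is what reduces over-fitting.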

3.2 Face Detection

3.2.1 MTCNN

MTCNN is a framework consisting of three stages that performs face detection and facial landmark detection simultaneously. In the first stage, it quickly proposes several candidate windows through a shallow CNN. The second network then refines the windows, rejecting a large number of non-face windows through a more complex CNN. Finally, a more powerful CNN refines the result and outputs the facial landmark positions.

Figure 3.10: Pipeline of cascaded framework

[…] face. The notation $y_i^{det} \in \{0, 1\}$ denotes the ground-truth label.

Given an image, the authors build an image pyramid so that the image is available at multiple scales. The pyramid is then given as input to the following three-stage cascaded framework:

1 In the first stage, a fully convolutional network called the Proposal Network (P-Net) is used to obtain proposed regions and their bounding box regression vectors. The regression vectors are used to calibrate the proposed regions, and non-maximum suppression (NMS) is then applied to merge highly overlapped regions.

Figure 3.11: The architectures of P-Net

2 All proposed regions are fed to another CNN, called the Refine Network (R-Net), which rejects a large number of false candidates, performs another calibration with bounding box regression, and merges candidates with NMS.

Figure 3.12: The architectures of R-Net

3 The last stage is similar to the second stage and is called the Output Network (O-Net). To describe the face in more detail, it also outputs the positions of five facial landmarks.

Figure 3.13: The architectures of O-net
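The non-maximum suppression step used after the stages above can be sketched as follows. This is the generic greedy form of NMS based on intersection over union (IoU), not code from MTCNN itself; the `(x1, y1, x2, y2)` box format, the 0.5 threshold, and the example boxes and scores are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score, drop those overlapping a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep  # indices of the surviving boxes


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # the second box overlaps the first and is suppressed
```

The first two boxes cover almost the same region, so only the higher-scoring one survives, while the distant third box is kept as a separate detection.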

3.2.2 SSH

Unlike two-stage proposal-classification detectors, SSH [4] detects faces in a single stage, directly from the early convolutional layers of a classification network. SSH [4] is headless: it works without the fully connected "head" of its underlying classification network.

The overall architecture is split into 3 parts:

1 Detection Module, which detects the faces

2 Context Module, part of the detection module
