VIETNAM NATIONAL UNIVERSITY OF HO CHI MINH CITY
HO CHI MINH UNIVERSITY OF TECHNOLOGY
---o0o---
NGUYỄN ĐỨC MINH
FACE RECOGNITION PERFORMANCE
COMPARISON BETWEEN K-NEAREST NEIGHBORS ALGORITHM
AND SELF-ORGANIZED MAP
SO SÁNH NHẬN DIỆN KHUÔN MẶT
SỬ DỤNG GIẢI THUẬT K GẦN NHẤT VỚI MẠNG NƠ-RON TỰ CẤU TRÚC
Major: Control Engineering and Automation. Major code: 60520216
MASTER THESIS
HO CHI MINH CITY, September 2020
THIS THESIS WAS COMPLETED AT HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY - VNU-HCM
Scientific supervisor: Prof. Dr. Hồ Phạm Huy Ánh
Examiner 1: Assoc. Prof. Dr. Huỳnh Thái Hoàng
Examiner 2: Assoc. Prof. Dr. Nguyễn Tấn Lũy
This master thesis was defended at Ho Chi Minh City University of Technology, VNU-HCM, on September 04, 2020.
The thesis assessment committee consisted of:
1. Chairman: Assoc. Prof. Dr. Nguyễn Thanh Phương
2. Secretary: Dr. Trần Ngọc Huy
3. Reviewer 1: Assoc. Prof. Dr. Huỳnh Thái Hoàng
4. Reviewer 2: Assoc. Prof. Dr. Nguyễn Tấn Lũy
5. Member: Dr. Nguyễn Hoàng Giáp
Confirmation of the Chairman of the assessment committee and the Dean of the faculty in charge of the major, after the thesis has been revised (if any).
CHAIRMAN OF THE COMMITTEE: Assoc. Prof. Dr. NGUYỄN THANH PHƯƠNG
DEAN OF THE FACULTY OF ELECTRICAL AND ELECTRONICS ENGINEERING: Dr. HUỲNH PHÚ MINH CƯỜNG
VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY
SOCIALIST REPUBLIC OF VIETNAM
Independence - Freedom - Happiness
MASTER THESIS ASSIGNMENT
Full name: NGUYỄN ĐỨC MINH. Student ID: 1770217
Date of birth: 01/11/1994. Place of birth: Ho Chi Minh City
Major: Control Engineering and Automation. Major code: 60520216
I. THESIS TITLE: So Sánh Nhận Diện Khuôn Mặt Sử Dụng Giải Thuật K Gần Nhất với Mạng Nơ-ron Tự Cấu Trúc (Face Recognition Performance Comparison between K-Nearest Neighbors Algorithm and Self-Organized Map)
II. TASKS AND CONTENTS: Build two different face recognition systems using the k-Nearest Neighbors algorithm and the Self-Organizing Map neural network, then compare the two methods in terms of theory and application, face recognition accuracy, and their respective advantages and disadvantages.
III. ASSIGNMENT DATE: 19/08/2019
IV. COMPLETION DATE: 03/08/2020
V. SUPERVISOR (full academic title and name): Prof. Dr. Hồ Phạm Huy Ánh
Ho Chi Minh City, September 23, 2020
SUPERVISOR (name and signature): Prof. Dr. Hồ Phạm Huy Ánh
HEAD OF THE TRAINING DEPARTMENT (name and signature): Dr. Nguyễn Vĩnh Hảo
DEAN OF THE FACULTY (name and signature): Dr. Huỳnh Phú Minh Cường
ACKNOWLEDGMENTS
First and foremost, I would like to express my sincere gratitude and respect to my senior project supervisor, Assoc. Prof. Dr. Ho Pham Huy Anh, for his guidance, advice, supervision and patience. His enthusiastic support and encouragement gave me the motivation to research in this field.
Also, I would like to thank my lecturers at Ho Chi Minh City University of Technology (HCMUT), who imparted valuable knowledge and shared their experiences and advice with me over the past years. These things are very meaningful for my further studies and my future career.
Besides, I also want to give profound thanks to my parents for their understanding, encouragement and support during my studies at HCMUT. They always motivate me to strive on my learning path.
Last but not least, I would like to thank all my friends, who also played an important role in my studies with their support and encouragement throughout our time studying together at the Ho Chi Minh City University of Technology.
Ho Chi Minh City, August, 2020
Student
NGUYEN DUC MINH
ABSTRACT
In recent years, automatic subject and object recognition has become not only a new trend but also a challenging technology that attracts a lot of attention due to its various applications in different fields. Face recognition is one of those functions. Currently there are many techniques that can provide a robust solution in various situations, adapting to the environmental conditions and factors that affect recognition ability.
Nowadays, automating the face recognition process is a very practical task due to its wide range of applications, including surveillance, human-machine interaction, security systems, video compression, video indexing of large databases and a whole range of other multimedia applications. Therefore, many designs and developments of face recognition systems that serve at least one of the possibilities above can be found everywhere, from mobile phone cameras to security surveillance. Yet a detailed comparison between methods has not attracted much interest from researchers, as until now not many articles have addressed it. This might prevent newcomers to the face recognition field from getting an overview of the current advantages and disadvantages of the available technologies, while experienced researchers might also be affected by focusing too much on one approach without noticing different methods. Therefore, a performance comparison between two applicable face recognition methods is our main goal.
In this document, we describe the work completed for this project and provide the design of an efficient high-speed face recognition system. As a further step from my university thesis [1], this project includes research on some of the existing methods for face recognition and the development of two algorithms for two face recognition systems that follow the image-based and the feature-based approach respectively. The first system uses the 29-layer Residual Neural Network (ResNet-29) for encoding facial features in a normal picture, followed by the k-Nearest Neighbors (KNN) algorithm for training and recognition. This method is implemented in the Python language and is thus applicable to various operating systems, provides a user-friendly GUI, and has been tested many times in different working conditions to prove that it achieves a good success rate in real-life applications. From that, we perform a performance comparison between this new method and the result of my previous work, which used Illumination Normalization (IN), 2D-DCT and a Self-Organized Map (SOM) written in the MATLAB environment. Due to the different algorithms and environments, we will focus on the final results of the output applications.
THESIS SUMMARY
In recent years, object recognition applications have become more and more common and have attracted much attention from researchers, thanks to the large potential they offer. Face recognition is one of those applications. Up to now, quite a few face recognition methods have produced very good results, even under harsh image and lighting conditions, which are the factors that degrade recognition results in traditional methods.
Nowadays, face recognition applications have become familiar, for example face unlock on mobile phones, security systems, extraction of identification information from video, and many other applications. To reach such wide applicability, many different face recognition methods have been proposed. However, comparing the algorithms and results of these methods has received little attention from researchers, and at present not many scientific articles address it. As a consequence, newcomers to face recognition find it hard to identify the advantages and disadvantages of the different methods, while experienced researchers tend to focus on the few methods they know and overlook the potential of other recognition algorithms. For that reason, the goal of this thesis is to present a comparison of both the algorithms and the accuracy of two common face recognition methods.
In this thesis, I describe the work carried out to build and compare two different face recognition systems. Developed from my undergraduate thesis [1], this master thesis covers the following tasks: studying current popular face recognition methods, and building two face recognition systems based on two selected algorithms, one image-based and one feature-based. The first system uses a 29-layer ResNet (ResNet-29) to encode image features, combined with the k-Nearest Neighbors (KNN) algorithm for training and recognition. The Python programming language was chosen so that the system can run on many operating systems, with an easy-to-use GUI, and it was tested many times under different lighting conditions to demonstrate its applicability. I then compare it with the second system, taken from my undergraduate thesis, which uses IN and 2D-DCT image processing with a Self-Organizing Map (SOM) recognizer in the MATLAB environment. Because the algorithms and languages differ, we focus the comparison on the output results of the two methods in practical use.
DECLARATION
I declare that this thesis is an original report of my research, has been written by me and has not been submitted anywhere else. The experimental work is almost entirely my own work; the collaborative contributions have been indicated clearly and acknowledged. Due references have been provided for all supporting literature and resources.
I declare that this thesis was composed by myself, that the work contained herein is my own except where explicitly stated otherwise in the text, and that this work has not been submitted for any other publication or professional qualification.
Ho Chi Minh City, August, 2020
Student
NGUYEN DUC MINH
CONTENTS
I INTRODUCTION
1 OVERVIEW
1.1 Pattern Recognition
1.2 Face Recognition
2 ABOUT THIS PROJECT
2.1 Project Overview
2.2 Problem Statement
2.3 Project Objective
2.4 Project Methodology
2.4.1 Study and Research
2.4.2 Design and Implementation
2.4.3 Performance Comparison
II RELATED THEORY
3 MACHINE LEARNING AND ARTIFICIAL NEURAL NETWORK
3.1 Introduction
3.2 Historical Background
3.2.1 Origins of Machine Learning
3.2.2 Origins of Neural Networks
3.3 Machine Learning Algorithms
3.3.1 An overview
3.3.2 Machine Learning Models
3.3.2.1 Artificial Neural Networks
3.3.2.2 Decision Trees
3.3.2.3 Linear Regression
3.3.2.4 Support Vector Machine
3.3.2.5 k-Nearest Neighbors
3.3.2.6 Bayesian Networks
3.4 k-Nearest Neighbors
3.4.1 KNN Algorithms
3.4.1.1 Determine value of K
3.4.1.2 Distance calculation
3.4.1.3 Output class measurement
3.4.2 Application of KNN
3.5 Neural Network Algorithms
3.5.1 Biological and Artificial Neurons
3.5.1.1 Biological Neurons
3.5.1.2 Artificial Neurons
3.5.1.2.1 Firing Rules
3.5.1.2.2 Simple Artificial Neuron
3.5.1.2.3 Complicated Artificial Neuron
3.5.2 Architecture of Neural Networks
3.5.2.1 Feed-forward Networks
3.5.2.1.1 Single-Layer Perceptron
3.5.2.1.2 Multi-layer Perceptron
3.5.2.1.3 ADALINE
3.5.2.1.4 Radial Basis Function Network
3.5.2.1.5 Convolutional Neural Network (CNN)
3.5.2.1.6 Residual Neural Network (ResNet)
3.5.2.1.7 Kohonen Self-Organizing Map (SOM)
3.5.2.2 Recurrent Networks
3.5.2.2.1 Simple Recurrent Network
3.5.2.2.2 Hopfield Network
3.5.2.2.3 Echo State Network
3.5.2.2.4 Long Short-term Memory Network
3.5.2.3 Stochastic Neural Networks
3.5.2.4 Boltzmann Machine
3.5.2.5 Modular Neural Networks
3.5.3 Neural Network Training
3.5.3.1 Definition of Training
3.5.3.2 Selection of Cost Function
3.5.3.3 Memorization of Inputs
3.5.3.3.1 Associative Mapping
3.5.3.3.1.1 Auto-association
3.5.3.3.1.2 Hetero-association
3.5.3.3.2 Regularity Detection
3.5.3.4 Determination of Weight
3.5.4 Learning Paradigms
3.5.4.1 Supervised Learning
3.5.4.2 Unsupervised Learning
3.5.4.3 Reinforcement Learning
3.5.4.4 Training Function
3.5.5 Learning Algorithm
3.5.6 Employing Artificial Neural Networks
3.5.6.1 Selection of Model
3.5.6.2 Selection of Learning Algorithm
3.5.6.3 Robustness
3.5.7 Applications of Artificial Neural Networks
3.6 Kohonen Self-Organizing Map
3.6.1 SOM Network Architecture
3.6.2 Training Process of SOM
3.6.2.1 The Competitive Process
3.6.2.2 The Cooperative Process
3.6.2.3 The Adaptive Process
3.6.2.4 Ordering and Convergence
3.6.3 SOM Applications
3.7 Conclusion
3.7.1 Different between ML and ANN
3.7.2 Applying ML and ANN in this project
4 IMAGE COMPRESSION
4.1 Discrete Cosine Transform
4.1.1 Introduction
4.1.2 Properties of DCT
4.1.3 Definition of DCT
4.1.3.1 Overview
4.1.3.2 One-dimensional type-2 DCT
4.1.3.3 Two-Dimensional Type-2 DCT
4.1.3.4 2D-DCT in Image compression
4.1.3.4.1 2D-DCT basis functions
4.1.3.4.2 DCT coefficients matrix
4.1.4 DCT Image Compression in JPEG format
4.1.4.1 Overview
4.1.4.2 Example with detailed process
4.1.5 Other Applications of DCT
4.1.6 Conclusion
4.2 Illumination Normalization
4.2.1 An Engineer Approach
4.2.2 IN Techniques
4.2.2.1 Introduction
4.2.2.2 Finding least error IN technique
4.2.2.3 Applying and testing AS with DCT in image compression
4.2.3 Conclusion
4.2.3.1 Research Analysis
4.2.3.2 Disadvantages of the Proposed Method
4.3 Residual Network for Image Data Encoding
4.3.1 Motivation
4.3.2 Deviation with other Neural Networks
4.3.2.1 Identity block
4.3.2.2 Convolutional block
4.3.3 Applying ResNet in image encoding
4.3.4 Conclusion
III DESIGN AND IMPLEMENTATION
5 FACE IMAGE PROCESSING
5.1 Data Gathering
5.1.1 High Resolution Images
5.1.2 Online Database of Images
5.1.3 Low Resolution Images
5.2 Image Pre-processing
5.2.1 High Resolution Images
5.2.2 Online Database Images
5.2.3 Low Resolution Images
5.3 Image Data Compression
5.3.1 Input design for SOM system
5.3.2 Input design for KNN system
5.4 Reshape and Save Image Data
6 SYSTEM DESIGN FOR SELF-ORGANIZED MAP
6.1 Network Architecture
6.2 Network Size
6.3 Training Time
6.4 Other Parameters
7 SYSTEM DESIGN FOR K-NEAREST NEIGHBORS
7.1 System Architecture
7.2 Determine value of k
7.3 Other Parameters
IV TESTING AND EXPERIMENTAL WORK
8 NEURAL NETWORK VALIDATION
8.1 Trained Data
8.2 Untrained Data
8.3 SOM Recognition Program
8.4 KNN Recognition Program
9.1 Calibrate SOM System
9.1.1 Optimal Number of Neurons
9.1.2 Optimal Number of Epochs
9.2 Calibrate KNN System
9.2.1 Optimal Number of K
9.2.2 Optimal Number of Nearest Distance
10 GUI IMPLEMENTATION
10.1 GUI Design for SOM System
10.1.1 Main Window
10.1.2 Database Camera Window
10.1.3 Database Modifying Window
10.1.4 Input Modifying Window
10.1.5 Input Camera Window
10.2 GUI Design for KNN System
10.2.1 Main Window
10.2.2 Database Acquisition Windows
10.2.3 Training Data Encoding Wizard
10.2.4 Test Face Recognition Wizard
11 PERFORMANCE COMPARISON
V RECOMMENDATIONS AND CONCLUSION
12 CONCLUSION
12.1 Significance of the Project
12.2 Remaining Limitations
VI REFERENCES
LIST OF FIGURES
Figure 1.0.1 General steps of a facial recognition system
Figure 1.1.1 Block diagram of a pattern recognition system
Figure 1.2.1 Block diagram of a face recognition system
Figure 3.2.1 Model comparison between normal programmed code and a Machine Learning program
Figure 3.3.1 Illustration on learning from experience [2]
Figure 3.3.2 Basic contents of a decision tree
Figure 3.3.3 Basic concepts of how SVM works
Figure 3.3.4 A Bayesian network with 9 nodes and their conditional probability distributions
Figure 3.5.1 A simple biological neuron
Figure 3.5.2 Artificial neuron with a similar structure compared to a human neuron
Figure 3.5.3 Structure of a node with its inputs (xi), weights (wi), activation function σ(z) and output function ν
Figure 3.5.4 Structure of a node with its inputs (xi), weights (wi), weighted inputs (wixi), activation function σ(z) and output function ν
Figure 3.5.5 General structure of a feed-forward neural network
Figure 3.5.6 A single-layer perceptron network capable of calculating XOR
Figure 3.5.7 Architecture of an RBF network
Figure 3.5.8 CNN architecture with reference to the Multi-layer perceptron
Figure 3.5.9 Key feature analysis using filters in CNN's convolutional layer
Figure 3.5.10 Overview of the CNN algorithm
Figure 3.5.11 Overview of the ResNet-50 structure
Figure 3.5.12 General structure of a feedback neural network
Figure 3.5.13 General structure of a simple recurrent neural network
Figure 3.6.1 Kohonen SOM structure
Figure 3.7.1 Relation between AI, ML and NN
Figure 4.1.1 DCT transforms an image from the spatial domain to the elementary frequency domain
Figure 4.1.2 Three frequency regions of a DCT matrix and its histogram, applied in a sample image block containing 8 × 8 pixels
Figure 4.1.3 Amplitude spectra of the gray-scaled image above, under the DFT and the DCT
Figure 4.1.4 The first step to compute a 2D-DCT from an N1 × N2 matrix is to apply the 1D-DCT to the elements in each of the N2 rows
Figure 4.1.5 Next, we compute the 1D-DCT of each column in the matrix obtained from the above step
Figure 4.1.6 A more generic 2D-DCT architecture for an 8 × 8 matrix
Figure 4.1.7 The DCT transforms an 8 × 8 block of input values to a linear combination of these 64 patterns. The patterns are referred to as the 2D-DCT basis functions
Figure 4.1.8 A more generic view of the usage of 2D-DCT basis functions. The DC coefficient is the mean of each corresponding block; the AC coefficients are simply the remaining coefficients in the block
Figure 4.1.9 Comparison between a pixel data matrix and DCT coefficients
Figure 4.1.10 Illustration of 2D blocked DCT of a face image with 8 × 8 blocks. These blocks were randomly picked from the image [3]
Figure 4.1.11 Illustration of JPEG image compression processes using DCT and IDCT [4]
Figure 4.1.12 Illustration of blocked 2D-DCT of a face image with 8 × 8 blocks
Figure 4.1.13 An example of the 8 × 8 matrix used as the mask, with the first 10 coefficients selected
Figure 4.1.14 Zigzag ordering of DCT coefficients converting a 2D matrix into a 1D array of integers, as used in JPEG image compression. The frequency (horizontal and vertical) increases in this order, and the coefficient variance (average of magnitude square) decreases in this order
Figure 4.1.15 2D-DCT and IDCT applied on a sample face image in the JPEG compression example
Figure 4.1.16 An example of the above process from an image block's pixel data to quantized DCT coefficients
Figure 4.1.17 An example of the above process from quantized DCT coefficients to a quantized image block, using IDCT. Using quantized DCT to store data saves a lot of bits, but we no longer have an exact replica of the original image block
Figure 4.2.1 Results showing the effect of 21 IN techniques applied on a sample image
Figure 4.2.2 The JPEG default normalization matrix (Q). The DCT coefficients matrix is rounded-divided by each corresponding element of Q to get the normalized DCT coefficients
Figure 4.2.3 Algorithm implementation procedures of the proposed DCT using JPEG coding of an image (called ASDCT, meaning AS and DCT)
Figure 4.2.4 Receiver operating characteristic (ROC) curve of ASDCT with Mahalanobis Cosine distance (MAHCOS [5]) on a sample image set [6]
Figure 4.2.5 Cumulative match characteristic (CMC) curve of ASDCT with Mahalanobis Cosine distance (MAHCOS [5]) on a sample image set [6]
Figure 4.3.1 General structure of FaceNet and its learning objective
Figure 4.3.2 Identity block taken from Figure 3.5.11
Figure 4.3.3 Convolutional block taken from Figure 3.5.11
Figure 4.3.4 Proposed ResNet architectures from section [7] and their performance
Figure 5.1.1 A high resolution face image (3920×2204 pixels)
Figure 5.1.2 Sample images from the ORL database
Figure 5.1.3 Images taken from a built-in webcam
Figure 5.2.1 (a) Original image (b) Preprocessed image
Figure 5.3.1 (a) Original image (b) DCT compressed image
Figure 5.3.2 Zoomed result image after resize
Figure 5.3.3 (a) Original image (b) Encoded array
Figure 5.4.1 SOM toolbox data selection screen
Figure 5.4.2 (a) MATLAB's reshape command (b) Reshaped image
Figure 6.1.1 General architecture of our SOM
Figure 6.2.1 Database of images loaded into the MATLAB workspace
Figure 6.2.2 The starting screen of the SOM generating tool
Figure 6.2.3 An SOM topology map with 128 neurons (8×16)
Figure 6.3.1 (a) Good sample hit (b) Bad sample hit
Figure 7.1.1 KNN working principle
Figure 8.1.1 (a) Trained database (b) Input subjects for the initial test
Figure 8.2.1 (a) Trained database (b) Input subject for the second test
Figure 8.3.1 General process of the face recognition system using SOM
Figure 8.3.2 (a) Trained database (b) Input subjects for the program
Figure 8.4.1 General process of the face recognition system using KNN
Figure 9.1.1 Effect of number of neurons vs accuracy
Figure 9.1.2 Effect of number of epochs vs accuracy
Figure 9.2.1 Effect of number of K vs accuracy
Figure 9.2.2 Effect of number of D vs accuracy
Figure 10.1.1 SOM GUI main window
Figure 10.1.2 SOM GUI database camera window
Figure 10.1.3 SOM GUI database modifying window
Figure 10.1.4 SOM GUI input modifying window
Figure 10.1.5 SOM GUI input camera window
Figure 10.2.1 KNN GUI main window
Figure 10.2.2 (a) "File" drop down list (b) "Options" drop down list (c) "Options - Change Camera ID" window (d) Auto save location
Figure 10.2.3 (a) "Data Input" tab contents (b) "Recognition Input" tab contents
Figure 10.2.4 User Data Input Wizard
Figure 10.2.5 (a) Camera feed window with manual snapshot button at the bottom (b) Snapshot support function window
Figure 10.2.6 Face Recognition Wizard
LIST OF TABLES
Table 1 Truth table of the above example
Table 2 Truth table of the above example with firing rule applied
Table 3 Comparison of 21 proposed techniques' error [6]
Table 4 Structure of the ResNet-29 model used for image data
Table 5 Description of the UCI data sets used
Table 6 Result of k-NN with different k values
Table 7 Comparison of different numbers of neurons vs simulation time and accuracy
Table 8 Comparison of different numbers of epochs vs simulation time and accuracy
Table 9 Comparison of different numbers of k vs simulation time and accuracy
Table 10 Comparison of different distance thresholds vs accuracy
Table 11 General system deviation between the designed SOM and KNN
ACRONYMS
PCA Principal Component Analysis
LDA Linear Discriminant Analysis
LFA Local Feature Analysis
ADALINE ADAptive LINear Elements
ANN Artificial Neural Network
RBF Radial Basis Function
ASIC Application Specific Integrated Circuit
MCP McCulloch and Pitts model
SVM Support Vector Machine
CNN Convolutional Neural Network
SLP Single-Layer Perceptron
MLP Multi-Layer Perceptron
SRN Simple Recurrent Network
KDD Knowledge Discovery in Databases
DCT Discrete Cosine Transform
IN Illumination Normalization
SSR Single Scale Retinex
ASR Adaptive Single scale Retinex
SSQ Single scale Self-Quotient image
MSQ Multi scale Self-Quotient image
MAS Modified Anisotropic diffusion
WEB Single scale WEBerfaces
LSSF Large and Small Scale Features
DOG Difference Of Gaussian filtering
JPEG Joint Photographic Experts Group
PSNR Peak Signal to Noise Ratio
RMSE Root Mean Square Error
ASDCT AniSotropic diffusion and Discrete Cosine Transform
ROC Receiver Operating Characteristic
CMC Cumulative Match Characteristic
MAHCOS MAHalanobis COSine
DFT Discrete Fourier Transform
MPEG Moving Picture Experts Group
MJPEG Motion Joint Photographic Experts Group
DCT-II Type-2 Discrete Cosine Transform
1D-DCT One-Dimensional Type-2 Discrete Cosine Transform
2D-DCT Two-Dimensional Type-2 Discrete Cosine Transform
IDCT Inverse Discrete Cosine Transform
2D-IDCT Two-Dimensional Inverse Type-2 Discrete Cosine Transform
Part I
INTRODUCTION
OVERVIEW
Human communication has two main aspects: verbal and non-verbal. Examples of the second aspect are physiological reactions, facial expressions, body movements, etc., all of which provide essential information regarding the state of an individual.
Considerable research in social psychology has indicated that facial expressions help arrange conversations and have more significant effects on the listeners' attitude toward the speaker than the speaker's actual spoken words. Mehrabian pointed out that the verbal part of a message contributes only 7% to the effect of that message as a whole. The vocal part contributes 38%, while the facial expression of the speaker contributes 55% to the effect of the speech. Therefore face recognition is an important addition not only to computer vision research but also to other fields of science as well [8].
Recent advances in image analysis and pattern recognition have been opening up new branches for automatic detection and classification of emotional and conversational facial signals. Automating facial expression analysis could bring facial expressions into human-machine interaction as a new modality and could make the interaction neater and more efficient. Such a system could also make classification of facial expressions widely accessible as a tool for research in behavioral science and medicine. In Figure 1.0.1, the outline of a typical face recognition system is given. This outline also takes into account the characteristics of a typical pattern recognition system [9].
Figure 1.0.1: General steps of a facial recognition system
1.1 PATTERN RECOGNITION
Pattern recognition is a branch of artificial intelligence associated with the classification or description of observations, defining points in an appropriate multidimensional space. Pattern recognition aims to categorize data (patterns) based on either a priori knowledge or on statistical information extracted from the patterns. The patterns to be classified are usually groups of measurements or observations. A complete pattern recognition system consists of a sensor that gathers the observations; a feature extraction mechanism that computes numeric or symbolic information from those observations; and a classification or description scheme that classifies or describes observations based on the extracted features [10]. Pattern recognition has extensive applications in astronomy, medicine, robotics, and remote sensing by satellites.
Although some of the barriers that hindered such automated pattern recognition systems have been removed due to advances in computer hardware in recent years, providing machines capable of faster and more sophisticated computation, the field of pattern recognition is still pretty much in its early stages [11].
In summary, pattern recognition can be described as the categorization of input data into recognizable classes via the extraction of significant features or attributes of the data from a background of irrelevant details. The functional block diagram of an adaptive pattern recognition system is shown in Figure 1.1.1.
Figure 1.1.1: Block diagram of a pattern recognition system
1.2 FACE RECOGNITION
Face recognition, although considered a casual task for the human brain, has proved to be extremely difficult to simulate artificially, because although similarities exist between faces, they can vary considerably in terms of age, skin color, angle, facial expressions or facial details such as glasses or beards. The problem is further complicated by varying lighting conditions, image qualities and geometries, as well as the possibility of partial occlusion and disguise [9].
For basic face recognition systems, some of these effects can be avoided by assuming and ensuring a uniform background and lighting condition. This assumption is acceptable for some applications such as automated separation of goods on a production line, where the lighting condition can be controlled and the image background is uniform. For many applications, however, this is not suitable, and systems must be designed to accurately classify images subjected to a variety of unpredictable conditions. Figure 1.2.1 outlines the block diagram of a typical face recognition system.
Figure 1.2.1: Block diagram of a face recognition system
ABOUT THIS PROJECT
2.1 PROJECT OVERVIEW
This project includes the research, design, development and performance comparison of two efficient facial recognition systems. Each system has a completely different algorithm, but both aim for the same goal of recognizing facial identity and are based on the general architecture of facial recognition systems. The first system follows the image-based approach and is programmed in MATLAB. The other system applies the remaining approach, the feature-based one, with Python as the programming language.
2.2 PROBLEM STATEMENT
While this technology has received a lot of attention in foreign countries, especially developed countries where research in computer vision has become an important field due to its increasing potential in commerce and law enforcement, it is no surprise to see face recognition appear in domestic applications. Yet we rarely found any article that mentions a performance comparison between some popular methods, as public usage only focuses on applying them to a specific task. This phenomenon blocks a real usage overview for those who are interested in this field but are not experienced in all aspects of it.
Therefore, our project focuses on developing two different techniques, both of which can provide a solution for an efficient high-speed face recognition system in surveillance applications, and then performs a speed and accuracy comparison between them in different conditions. From it, we can find out which method is good for which situation, as well as their advantages.
2.3 PROJECT OBJECTIVE
The goal of this project is to study and design two efficient high-speed face recognition systems, one in MATLAB and the other in Python, with different popular algorithms, and then perform a real application performance analysis on the same hardware. The specific objectives are listed below.
• To study and understand simple pattern recognition using face images
• To design two different models for a face recognition system
• To enhance the models for a high-speed system to serve surveillance purposes
• To develop a program in MATLAB based on the first designed model
• To develop a program in Python based on the second designed model
• To create a database set of face images and use them to validate both systems
• To perform tests for program optimization for both systems
• To perform performance analysis for each system
• To demonstrate the effectiveness of each system
2.4 PROJECT METHODOLOGY
The design and implementation of our project consists of three main phases.
2.4.1 Study and Research
In this phase, we focus on the following missions:
• Research different approaches on face recognition
• Study about neural networks, machine learning and image compression methods
• Focus our study on the selected solutions
• Propose a general design to visualize our research
2.4.2 Design and Implementation
In this phase, we apply what we’ve studied to:
• Design the system working principle
• Execute the design by developing two different software programs in MATLAB and Python
• Test the theory against our calculations and adjust the program parameters
• Optimize the performance and methods of computation
• Create a simple Graphical User Interface (GUI) of the program
2.4.3 Performance Comparison
In this phase, we apply the implementation to real life usage:
• Perform accuracy check for both implemented systems
• Perform speed check for both implemented systems
• Ensure the test environments are balanced for both systems
• Specify the most suitable applications, advantages and disadvantages of both systems
Part II
RELATED THEORY
MACHINE LEARNING AND ARTIFICIAL NEURAL NETWORK
3.1 INTRODUCTION
ML is a field of Artificial Intelligence (AI) concerned with training algorithms to interpret large amounts of data with as little human guidance as possible. The algorithms process data, then use what they "learned" from those calculations to adjust themselves in order to make better decisions on the next batch of data.
ANNs, or neural networks in general, are electronic models based on the neural structure of the brain. It goes without saying that some problems which are beyond the capability of computers are actually solvable by small efforts of an animal mind, since the brain learns from experience, a feature that casual machines do not possess. That aspect also represents ANN as a branch of the ML field, since they share the same goal: make programs carry out tasks continuously with no human interaction, or as little as possible.
Biologically inspired methods of computing are thought to be the next major leap in the computing industry. Computers do rote tasks very well, like keeping orders or performing complex mathematical calculations. However, they have trouble recognizing even simple patterns, much less generalizing those patterns of the past into actions of the future [3]. Nowadays, a new field of programming that uses words which are more "human" than traditional computation, like behave, react, self-organize, learn, generalize, and forget, is developing at a significant rate. From the troubled early years of development to the unbelievable advances in the field, neural networks have been a fascinating source of intellectual enjoyment for computer scientists around the world.
3.2 HISTORICAL BACKGROUND
3.2.1 Origins of Machine Learning
During the early stage of AI development, in 1955, researchers defined AI as "making intelligent machines that have the ability to achieve goals like humans do". To achieve that target, many fundamental methods have been proposed. Machine Learning (ML) is one of those approaches, and it started the huge changes in the world we are living in.
Defined by Arthur Samuel in 1959, ML is a large field of AI which gives computers the ability to learn without being explicitly programmed. This means a single program, once created, will be able to learn how to do some intelligent activities outside the notion of programming. He created a game of checkers to visualize his concept, later known as the first computer learning program. The program "improved" its skills the more it played, studying which moves can lead to winning strategies and parsing those moves into its program automatically.
Figure 3.2.1: Model comparison between normal programmed code and a Machine Learning program
Samuel also designed a number of mechanisms allowing his program to become better. In what Samuel called rote learning, his program recorded/remembered all positions it had already seen and combined this with the values of the reward function.
In 1957, Frank Rosenblatt combined Donald Hebb's model of brain cell interaction with Arthur Samuel's Machine Learning efforts and created the perceptron program. This software was installed in a machine called the Mark 1 perceptron and was created for image recognition purposes. This made the software and the algorithms transferable and available for other machines.
Described as the first successful neuro-computer, the Mark I perceptron developed some problems with broken expectations. Although the perceptron seemed promising, it could not recognize many kinds of visual patterns (such as faces), causing frustration and stalling neural network research. It would be several years before the frustrations of investors and researchers subsided.
Recently, ML has become popular and used widespread in many fields, responsiblefor some of the most significant technology advancements such as new industry of selfdriving vehicles or automated drones ML built a standard concepts, brings AI researchand application flourish since 1990s to this day, including supervised and unsupervisedlearning, robotics algorithm, Internet of Things (IoT) and many more Technically, MLdesigned to be adaptive by time, continuously learning which makes them increasinglygive accurate decisions the longer they operate However, in real life situations, sometasks requires absolute accuracy and need detailed explanation such as medical diagnosis,disease analysis and pharmaceutical development This affects directly to the problem
of ML: being considered as a black-box algorithm after significant training time Thesemake machine decisions in transparent and non-understandable, even to the eyes of experts,which reduces trust in ML specifically and AI generally
Back to the above concept, researcher defines an Artificial Neural Network (ANN) whichhas hidden layers used to respond to more complicated tasks than the earlier perceptronscould We can see that ANNs are primary tool used for Machine Learning Neural networksuse input and output layers and include a hidden layer designed to transform input into datathat can be used the by output layer The hidden layers are excellent for finding patterns toocomplex for a human programmer to detect, meaning a human could not find the patternand then teach the device to recognize it
We will discuss the detailed historical background about neural networks in next section
Trang 313.2 H I S T O R I C A L B A C K G R O U N D 12
3.2.2 Origins of Neural Networks
In 1943, neuro physiologist Warren McCulloch and mathematician Walter Pitts wrote apaper on how neurons might work In order to describe how neurons in the brain mightwork, they modeled a simple neural network using electrical circuits
In 1949, Donald Hebb wrote The Organization of Behavior, a work which pointedout the fact that neural pathways are strengthened each time they are used, a conceptfundamentally essential to the ways in which humans learn If two nerves fire at the sametime, he argued, the connection between them is enhanced
As computers became more advanced in the 1950’s, it was finally possible to simulate ahypothetical neural network The first step towards this was made by Nathanial Rochesterfrom the IBM research laboratories Unfortunately for him, the first attempt to do so failed
In 1959, Bernard Widrow and Marcian Hoff of Stanford developed models called
"ADALINE" and "MADALINE." In a typical display of Stanford’s love for acronyms,the names come from their use of Multiple ADAptive LINear Elements ADALINE wasdeveloped to recognize binary patterns so that if it was reading streaming bits from a phoneline, it could predict the next bit MADALINE was the first neural network applied to areal world problem, using an adaptive filter that eliminates echoes on phone lines Whilethe system is as ancient as air traffic control systems, like air traffic control systems, it isstill in commercial use
In 1962, Widrow and Hoff developed a learning procedure that examines the valuebefore the weight adjusts it (i.e 0 or 1) according to the rule:
Weight Change=Pre-weight line value x Error
Number of Inputs
It is based on the idea that while one active perceptron may have a big error, one can adjustthe weight values to distribute it across the network, or at least to adjacent perceptrons.Applying this rule still results in an error if the line before the weight is 0, although thiswill eventually correct itself If the error is conserved so that all of it is distributed to all ofthe weights than the error is eliminated
Despite the later success of the neural network, traditional Von Neumann architecturetook over the computing scene, and neural research was left behind Ironically, John vonNeumann himself suggested the imitation of neural functions by using telegraph relays orvacuum tubes
In the same time period, a paper was written that suggested there could not be anextension from the single layered neural network to a multiple layered neural network Inaddition, many people in the field were using a learning function that was fundamentallyflawed because it was not differentiated across the entire line As a result, research andfunding went drastically down
Trang 323.2 H I S T O R I C A L B A C K G R O U N D 13
This was coupled with the fact that the early successes of some neural networks led to
an exaggeration of the potential of neural networks, especially considering the practicaltechnology at the time Promises went unfulfilled, and at times greater philosophicalquestions led to fear Writers pondered the effect that the so-called "thinking machines"would have on humans, ideas which are still around today
The idea of a computer which programs itself is very appealing If Microsoft’s Windows
2000 could reprogram itself, it might be able to repair the thousands of bugs that theprogramming staff made Such ideas were appealing but very difficult to implement Inaddition, von Neumann architecture was gaining in popularity There were a few advances
in the field, but for the most part research was few and far between
In 1972, Kohonen and Anderson developed a similar network independently of oneanother, which we will discuss more about later They both used matrix mathematics todescribe their ideas but did not realize that what they were doing was creating an array ofanalog ADALINE circuits The neurons are supposed to activate a set of outputs instead ofjust one
The first multilayered network was developed in 1975, an unsupervised network
In 1982, interest in the field was renewed John Hopfield of Caltech presented a paper tothe National Academy of Sciences His approach was to create more useful machines byusing bidirectional lines Previously, the connections between neurons was only one way.That same year, Reilly and Cooper used a "Hybrid network" with multiple layers, eachlayer using a different problem-solving strategy
Also in 1982, there was a joint US-Japan conference on Cooperative/CompetitiveNeural Networks Japan announced a new Fifth Generation effort on neural networks,and US papers generated worry that the US could be left behind in the field (Fifthgeneration computing involves artificial intelligence (AI) First generation used switchesand wires, second generation used the transistor, third state used solid-state technology likeintegrated circuits and higher level programming languages, and the fourth generation iscode generators.) As a result, there was more funding and thus more research in the field
In 1986, with multiple layered neural networks in the news, the problem was how toextend the Widrow-Hoff rule to multiple layers Three independent groups of researchers,one of which included David Rumelhart, a former member of Stanford’s psychologydepartment, came up with similar ideas which are now called back propagation networksbecause it distributes pattern recognition errors throughout the network Hybrid networksused just two layers, these back-propagation networks use many The result is that back-propagation networks are "slow learners," needing possibly thousands of iterations tolearn [13]
Today, neural networks discussions are occurring everywhere Yet, its future lies inhardware development, the most important key to the whole technology Currently, most
Trang 333.3 M A C H I N E L E A R N I N G A L G O R I T H M S 14
neural network development is simply proving that the principal works Researchersare developing neural networks that take weeks to learn due to processing limitations.Hence, specialized chips are required to realize these prototypes out of the lab and putthem into use Many companies are working on three main types of neuro chips: digital,analog, and optical Some companies are even working on creating a "silicon compiler" togenerate a neural network Application Specific Integrated Circuit (ASIC) These ASICsand neuron-like digital chips appear to be a trend of the near future Ultimately, opticalchips look very promising However, it may be years before optical chips see the light ofday in commercial applications
in Figure3.3.1for general cases applied in human individual
Practically, because training sets are finite and the future is uncertain, learning theoryusually does not yield guarantees of the performance of algorithms Instead, probabilisticbounds on the performance are quite common The bias–variance decomposition is oneway to quantify generalization error [15] Details about properties of machine learningwill be discussed in the next part
Aside from software developments, compatible hardware that can provide sufficientperformance for applying in real application is also an important topic Since 2010s,advances in both machine learning algorithms and computer hardware have led to moreefficiency
Although machine learning has been transformative in some fields, machine-learningprograms often fail to deliver expected results [16] There are many reasons for thisdrawback: lack of (suitable) data, lack of access to the data, data bias, privacy problems,badly chosen tasks and algorithms, wrong tools and people, lack of resources, and eval-uation problems For most applications, the problem about data bias is the main cause,especially when there are large set of training data A machine learning system trained
Trang 343.3 M A C H I N E L E A R N I N G A L G O R I T H M S 15
Figure 3.3.1: Illustration on learning from experience [2]
on current objects only, may not be able to predict the necessary steps when challenged
by a new object that are not represented in the training data When trained on man-madedata, machine learning is likely to pick up the same constitutional and unconscious biasesalready present in society In facial recognition applications, we also suffer the same effect:
ML programs usually trying to predict a new face that is not in the trained set as a closestface that have some minor similarities
Because of such challenges, the effective use of machine learning may take longer to
be adopted in other domains Concern for fairness in machine learning is reducing bias
in machine learning and propelling its use for human good, which has been expressed byartificial intelligence scientists
3.3.2 Machine Learning Models
After several years of research, currently ML algorithms can be categorized into learningtypes, each learning types can apply to some of the models An implemented model willthen become feasible software and integrated to a hardware, which is now called a completesystem The types of machine learning algorithms differ in their approach, the type of datathey input and output as well as the type of task or problem that they are intended to solve.Broadly, there are 3 types of Machine Learning Algorithms: Supervised Learning,Unsupervised Learning and Reinforcement Learning These are the most common anddefined basis in this field As Neural Networks are also following the same trait as asub-type of Machine Learning, we can discuss about learning paradigms in section3.5.4
on page42for more detailed examples
Trang 353.3 M A C H I N E L E A R N I N G A L G O R I T H M S 16
In this section we will focus on machine learning models, which are the skeletons of thesystem Each models can use different approaches as long as they can perform trainingand process trained data with additional parameters to make predictions Various types ofmodels have been used and researched for machine learning systems
3.3.2.1 Artificial Neural Networks
Because ANN is the kind of model we selected to implement for our project, it will bediscussed later in a separate section3.5.1on page25
3.3.2.2 Decision Trees
Decision Trees (DTs) are a non-parametric supervised learning method used for sification and regression The goal is to create a model that predicts the value of a targetvariable by learning simple decision rules inferred from the data features [17]
clas-Specifically, a decision tree is a decision support tool that uses a tree-like graph ormodel of decisions and their possible consequences, including chance event outcomes,resource costs, and utility Represented in figure3.3.2, a decision tree will have followingcontents: each internal node represents a “test” on an attribute, each branch represents theoutcome of the test and each leaf node represents a class label as well as decision takenafter computing all attributes The paths from root to leaf represent classification rules It
is one way to display an algorithm that only contains conditional control statements
Figure 3.3.2: Basic contents of a decision tree
Decision trees have a natural “if then else” construction that makes it fit easily into
a programmatic structure They also are well suited to categorization problems whereattributes or features are systematically checked to determine a final category However, inpractice we might face one problem: how do we know when to split the branch As wecan see, the decision of making strategic splits heavily affects a tree’s accuracy Decisiontrees use multiple algorithms to decide to split a node in two or more sub-nodes The
Trang 363.3 M A C H I N E L E A R N I N G A L G O R I T H M S 17
creation of sub-nodes increases the homogeneity of resultant sub-nodes In other words,
we can say that purity of the node increases with respect to the target variable Decisiontree splits the nodes on all available variables and then selects the split which results inmost homogeneous sub-nodes This approach will then lead to over fitting problem
To reduce the possibility of over-fitting, researchers have provided some decision tree algorithms to follow. Algorithms for constructing decision trees usually work top-down, by choosing a variable at each step that best splits the set of items [18]. Below is the list of the most common DT algorithms:
• ID3 (Iterative Dichotomiser 3) and C4.5 (the successor of ID3), which both use Information Gain to represent the entropy of the information contents.
• CART (Classification And Regression Tree), which uses Gini impurity to measure how often a randomly chosen element from the set would be incorrectly labeled. In regression, CART uses Variance reduction, which requires discretization of the metric before being applied. The variance reduction of a node N is defined as the total reduction of the variance of the target variable x due to the split at this node.
• Chi-square automatic interaction detection (CHAID), which performs multi-level splits when computing classification trees.
• Multivariate adaptive regression splines (MARS), which extends decision trees to handle numerical data better.
From the above properties of DTs, we can observe some advantages of this method and why it is not applicable for complicated programs such as facial recognition.
The first impression of DTs must be that they are very easy to understand, even for people from a non-analytical background. It does not require any statistical knowledge to read and interpret them. Their graphical representation is very intuitive and users can easily relate it to their hypothesis. Secondly, decision trees require relatively little effort from users for data preparation, since the decision choices are mainly based on yes/no options or simplified parameters. Furthermore, being a non-parametric method helps decision trees reduce assumptions about the space distribution and the classifier structure.
But there are many drawbacks, of which the most common problem of DTs is the high possibility of over-fitting. Decision-tree learners can create over-complex trees that do not generalize the data well. This problem gets solved by setting constraints on model parameters and by pruning, using the above algorithms. Aside from over-fitting, DTs also create biased trees if some classes dominate. It is therefore recommended to balance the data set prior to fitting the decision tree. Generally, it gives lower prediction accuracy for a data set as compared to other machine learning algorithms [17].
3.3.2.3 Linear Regression
Linear regression (LR) is another supervised-learning method. It attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable and the other is considered to be a dependent variable. In practice, linear regression is used to estimate real values such as the cost of houses, number of calls or total sales, as long as the data are based on continuous variables. Here, we establish the relationship between the independent and dependent variables by fitting a best line. This best-fit line is known as the regression line and is represented by the linear equation below. The coefficients a and b are derived by minimizing the sum of squared differences of the distances between the data points and the regression line.
$$y = a \cdot x + b \quad (3.1)$$
where:
• x is an n-dimensional column vector, often collected into the regression matrix $X_{mn}$;
• a is a vector of regression coefficients with the same structure as y, with an additional dimension applied for the first column of x;
• b is the intercept coefficient of the fitted line.
The main extensions of linear regression include:
• Simple and multiple linear regression, where the simple version applies for the single scalar x and y given in equation (3.1), and the multiple version is used when matrix or multi-dimensional parameters are given.
• Generalized linear models, which are broadly used in situations such as modeling positive quantities at large scale or modeling categorical data.
• Hierarchical linear models, which organize the data into a hierarchy of regressions, for example where A is regressed on B and B is regressed on C.
Aside from extensions, researchers also specify many estimation methods which increase the effectiveness of the parameters a, b and x [19], namely:
• Least-squares estimation
• Maximum-likelihood estimation
• Bayesian linear regression, adapted from a probabilistic graphical model that represents a set of random variables and their conditional independence, discussed in section 3.3.2.6.
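To make the least-squares estimation above concrete, here is a brief, hedged Python sketch (illustrative data, not taken from the thesis) that fits the coefficients of equation (3.1) in closed form with NumPy:

```python
import numpy as np

# Illustrative data generated from y = 2x + 1 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=50)

# Least-squares estimate of a (slope) and b (intercept):
# stack [x, 1] as the design matrix and solve min ||X @ [a, b] - y||^2
X = np.column_stack([x, np.ones_like(x)])
(a_hat, b_hat), *_ = np.linalg.lstsq(X, y, rcond=None)
print(a_hat, b_hat)   # close to 2.0 and 1.0
```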
3.3.2.4 Support Vector Machine
Support vector machine (SVM) is a supervised machine learning model that uses classification algorithms for two- or multiple-group problems. It is well known as a fast and dependable classification method that performs very well with a limited amount of data. After giving an SVM model sets of labeled training data for each category, the model can be used to solve specific categorizing tasks.
Developed at AT&T Bell Laboratories by Vapnik and colleagues (Boser et al., 1992, Guyon et al., 1993, Vapnik et al., 1997), it presents one of the most robust prediction methods, based on the statistical learning framework or VC theory proposed by Vapnik and Chervonenkis (1974) and Vapnik (1982, 1995).
In this project, linear SVM is planned to be used for the feature extraction process, where the specific implementation for facial pictures will be discussed in Part III. Therefore, in this section we will get an overview of the SVM architecture and algorithm to strengthen the reason for applying it in our project.
The basics of Support Vector Machines and how they work are best understood with a simple example. Let's imagine we have two outputs, red and blue, and our data has two features, x and y. We want a classifier that, given a pair of (x, y) coordinates, checks whether it is red or blue. Classifying data is a common task in machine learning. A data point is viewed as a p-dimensional vector, which is a list of p numbers, and we want to know whether we can separate such points with a (p − 1)-dimensional hyperplane. In this case, p = 2 and thus a data point consists of the x and y coordinates. This is called a linear classifier.
A support vector machine takes these data points and outputs the hyperplane, which in two dimensions is simply a line, that best separates the tags. For SVM, this hyperplane is the one that maximizes the margins from both tags. In other words, the distance from any place on the hyperplane to the nearest element of each output is the largest. This line is the decision boundary: anything that falls to one side of it we will classify as blue, and anything that falls to the other as red. Figure 3.3.3 gives an overview of the above process, where the left-most graph shows how groups of red and blue outputs are set at different (x, y) coordinates. The middle and right-most figures describe how the "best hyperplane" should be determined.
Figure 3.3.3: Basic concepts of how SVM works
In the above example, we can rewrite the input data in mathematical form as $(\vec{x}_1, y_1), \dots, (\vec{x}_n, y_n)$, where $y_i$ indicates the class to which the point $\vec{x}_i$ belongs. This should be either 1 or −1, as for now we have 2 classes. A desired hyperplane is needed, so that the distance between it and the nearest point $\vec{x}_i$ from either group is maximized. This hyperplane can be written as the set of points $\vec{x}$ satisfying $\vec{w} \cdot \vec{x} - b = 0$, where the parameter $\frac{b}{\|\vec{w}\|}$ determines the offset of the hyperplane from the origin along the normal vector $\vec{w}$.
The above applies to linear classification, where the data can be separated using a hyperplane. In addition to performing linear classification, SVMs can efficiently perform a non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces. However, here we only focus on the linear method, so there is no need to go into details about the non-linear one.
Regarding SVM properties, it can be regarded as an extension of the perceptron, since they share the same method of generalized linear classifiers. A special property is that they simultaneously minimize the empirical classification error and maximize the geometric margin; hence they are also known as maximum margin classifiers. Observable advantages of linear SVM include its memory efficiency, thanks to the usage of a subset of training points in the decision function. In a facial detection application, we can specify two classes: face and non-face. Obviously, non-face images are richer and broader than face images.
The disadvantage of SVM is that if the number of features is much greater than the number of samples, the method is likely to give poor performance. SVM gives efficient results for small training samples as compared to large ones. Furthermore, it does not directly provide probability estimates, so these must be calculated using indirect techniques.
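As a minimal, hedged illustration of the linear classifier described above (not code from the thesis), the following sketch fits a linear SVM on a toy two-class data set with scikit-learn; the data and parameter values are invented for demonstration.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy data: two clusters in the (x, y) plane, labeled +1 ("red") and -1 ("blue")
rng = np.random.default_rng(1)
red = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(50, 2))
blue = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(50, 2))
X = np.vstack([red, blue])
y = np.array([1] * 50 + [-1] * 50)

# Fit a maximum-margin linear separator w.x - b = 0
clf = LinearSVC(C=1.0)
clf.fit(X, y)

print(clf.coef_, clf.intercept_)                  # the learned w and b
print(clf.predict([[1.5, 2.5], [-1.0, -2.0]]))    # classify new points
```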
3.3.2.5 k-Nearest Neighbors
As introduced earlier, KNN is the main learning method that we are applying in this project, and the output system from KNN is going to be compared with the SOM-approach system. Therefore, it shall be discussed in a separate chapter.
3.3.2.6 Bayesian Networks
Bayesian networks are a type of Probabilistic Graphical Model that can be used to build models from data and/or expert opinion. At a glance, they take a similar approach to Decision Trees, both using a graphical model with branches and nodes.
They can be used for a wide range of tasks including prediction, anomaly detection, diagnostics, automated insight, reasoning, time series prediction and decision making under uncertainty. These network capabilities can be divided in terms of the four major analytic disciplines: descriptive analytics, diagnostic analytics, predictive analytics and prescriptive analytics.
Before going into exactly what a Bayesian network is, we have to recall probability theory. First, remember that the joint probability distribution of random variables $A_1, \dots, A_n$ is denoted $P(A_1, \dots, A_n)$ and is equal to $P(A_1 \mid A_2, \dots, A_n) \cdot P(A_2 \mid A_3, \dots, A_n) \cdots P(A_n)$ by the chain rule of probability. Secondly, recall that conditional independence of two random variables $A$ and $B$ given $C$ is equivalent to the property $P(A, B \mid C) = P(A \mid C) \cdot P(B \mid C)$, or $P(A \mid B, C) = P(A \mid C)$. In other words, as long as the value of $C$ is known and fixed, $A$ and $B$ are independent.
Now we can start the formal definition of a Bayesian network. It is a directed graph $G = (V, E)$, together with a random variable $x_i$ for each node $i \in V$ and a conditional probability distribution $P(x_i \mid x_{E_i})$ per node, specifying the probability of $x_i$ conditioned on its parents' values.
Thus, a Bayesian network defines a probability distribution $p$. Conversely, we say that a probability distribution $p$ factorizes over a directed acyclic graph $G$ as long as it can be decomposed into a product of factors specified by $G$.
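As a small, hedged illustration of this factorization (not taken from the thesis), the sketch below evaluates the joint probability of a hypothetical three-node chain A -> B -> C from its per-node conditional tables:

```python
# Hypothetical DAG: A -> B -> C, all variables binary
p_A = {1: 0.3, 0: 0.7}                      # P(A)
p_B_given_A = {1: {1: 0.8, 0: 0.2},         # P(B | A)
               0: {1: 0.1, 0: 0.9}}
p_C_given_B = {1: {1: 0.6, 0: 0.4},         # P(C | B)
               0: {1: 0.05, 0: 0.95}}

def joint(a, b, c):
    """P(A=a, B=b, C=c) = P(A=a) * P(B=b | A=a) * P(C=c | B=b)."""
    return p_A[a] * p_B_given_A[a][b] * p_C_given_B[b][c]

# The factorized joint sums to 1 over all assignments
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(joint(1, 1, 1), total)   # e.g. 0.144, 1.0
```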
Typical uses of Bayesian networks include:
• Model and explain a domain
• Predict probabilities for the states of certain variables, which can support decision making under uncertainty