An effective facial expression recognition approach for intelligent game systems
Nhan Thi Cao School of Media, Soongsil University,
511, Sangdo-Dong, Dongjak-Gu, Seoul, 156-743, Korea
Email: ctnhen@yahoo.com
An Hoa Ton-That University of Information Technology, Vietnam National University,
Km 20, Hanoi Highway, Linh Trung Ward, Thu Duc District, Ho Chi Minh City, Vietnam Email: an_tth@yahoo.com
Hyung-Il Choi*
School of Media, Soongsil University,
511, Sangdo-Dong, Dongjak-Gu, Seoul, 156-743, Korea
Email: hic@ssu.ac.kr
*Corresponding author
Abstract: This paper presents a novel facial expression recognition approach based on an improved model of the completed local binary pattern and support vector machine classification, proposing a method that can be applied to intelligent game applications as well as intelligent communication systems. Capturing the emotions of players can serve various purposes in interactive games, such as transferring a player's emotions to his or her avatar, or triggering suitable actions to communicate with players so as to foster a positive attitude in educational games. Our experiments on two databases, JAFFE (213 images) and CK (2,040 images), show the effectiveness of the proposed method in comparison with several other methods. The recognition accuracy is 96.28% on the JAFFE database and 99.85% on the CK database. The advantages of this technique are its simplicity, speed and high accuracy.
Keywords: facial expression recognition; completed local binary pattern;
CLBP; intelligence game systems; support vector machine; SVM
Reference to this paper should be made as follows: Cao, N.T., Ton-That, A.H. and Choi, H.-I. (2016) ‘An effective facial expression recognition approach for intelligent game systems’, Int. J. Computational Vision and Robotics, Vol. 6, No. 3, pp.223–234.
Biographical notes: Nhan Thi Cao is a PhD candidate at the Computer Vision Lab in the School of Media at Soongsil University. She received her BS (1998) in Information Technology from Dalat University and her MS (2004) in Computer Science from the University of Natural Science, Vietnam National University, Ho Chi Minh City.
An Hoa Ton-That is with the Computer Science Department at the University of Information Technology, which belongs to Vietnam National University, Ho Chi Minh City, Vietnam. His research interests include computer vision, pattern recognition, fuzzy systems and artificial intelligence. He received his BS (2005) in Information Technology and his MS (2009) in Computer Science from Vietnam National University, Ho Chi Minh City, Vietnam, and his PhD (2014) in Computer Science from Soongsil University, Korea.
Hyung-Il Choi is a Professor in the School of Media at Soongsil University. His research interests include computer vision, pattern recognition, and artificial intelligence. He received his BS (1979) in Electronic Engineering from Yonsei University, and his MS (1983) and PhD (1987) degrees in Electrical Engineering and Computer Science from the University of Michigan.
This paper is a revised and expanded version of a paper entitled ‘A facial expression recognition method for intelligent game applications’ presented at the Serious Games & Social Connect Community Conference and the International Symposium on Simulation & Serious Games 2014, Kintex Convention Center, South Korea, 23–24 May 2014.
1 Introduction
In recent years, with the development of intelligent communication systems, data-driven animation and intelligent game applications, facial expression recognition has attracted much attention, as in Ahsan et al. (2013), Cao et al. (2013), Liao et al. (2006), Priya and Banu (2012), Shan et al. (2005, 2009) and Zhao and Zhang (2012), for example. In this paper, we propose a novel method for recognising facial expressions based on an improvement of the completed modelling of the local binary pattern. Our experiments on both the Japanese Female Facial Expression (JAFFE) database, as in Lyons et al. (1999), and the Cohn-Kanade (CK) database, as in Kanade et al. (2000) and Lucey et al. (2010), show the effectiveness of the proposed method. The accuracy obtained is high compared with several other methods on both databases with seven classes of facial expression.
In intelligent games, recognising players’ emotions through their facial expressions can be used in many ways. For example, in interactive and multiplayer games, the emotions of players can be transferred to the players’ avatars on the screen. In educational games, recognising players’ emotions can help the system behave in a better manner: if the player is sleepy, the system may wake him or her up; if the player is happy after doing something well, the system may cheer him or her on; and so on. Thus, when facial expression recognition of players is applied, intelligent game systems can become more interactive, vivid and attractive.
The rest of the paper is organised as follows: Section 2 describes the face region cropping; Section 3 presents the completed local binary pattern (CLBP) for facial expression recognition; Section 4 reports the experiments and results; and Section 5 gives the conclusions.
2 Face region cropping
Face image preprocessing is the process of obtaining normalised face images from input face images captured by a camera or taken from a database. The normalised face images are used for extracting facial expression features. The process can be divided into two steps: a basic step and an enhancement step. The basic step detects the face region of an input face image and eliminates redundant regions; it can be carried out manually or by a real-time face detector. The enhancement step optimises the face region for extracting facial expression features; it can be performed by cropping methods, image normalisation or image filtering. The face images are then rescaled and used for feature extraction.
Figure 1 shows the process of face image preprocessing.

Figure 1 The process of face image preprocessing (flow: database/camera → input face images → basic processing → enhancement processing → feature extraction)
In this paper, the image preprocessing is implemented as in Cao et al. (2013). It includes two preprocessing steps: a basic process and an enhancement process. Normally, human face images from a camera or a database contain much redundant information, e.g., background or non-face regions. So, to detect the face region in a face image, the robust real-time face detection algorithm developed by Viola and Jones (2004) is applied.
However, the face images obtained still contain some redundant areas that can affect recognition accuracy and processing speed, so in the enhancement step a cropping technique is used, as shown in Figure 2.
Figure 2 Face region cropped by the cropping method (square S of side w2 is placed at coordinate P(x, y) inside the human face image of width w1 and height h obtained from the robust real-time face detector, with O(0, 0) the top-left corner, y = h/6 and x = (w1 − w2)/2)
The cropping method can be described as follows:
• First, the size of square S used for cropping the human face in images is determined. The side w2 of square S equals the width of the human face. The size of square S depends on each database and even on each image. However, based on the results of testing some databases with the image preprocessing method of Cao et al. (2013), the width of the human face accounts for 75% to 85% of the width of the face images obtained from the robust real-time face detector. Values of w2 are therefore treated as experimental parameters.
• The next step is to determine the coordinate P(x, y), measured from the top-left corner of the image, at which to crop square S. Let O(0, 0) be the top-left corner of the human face image obtained from the robust real-time face detector, h the height of the face image, w1 the width of the face image and w2 the width of square S. The coordinates are then y = h/6 and x = (w1 − w2)/2. The expression y = h/6 is based on face images with a neutral facial expression: normally, the forehead occupies one quarter of the human face height, so it takes up a sizeable part of the face region while containing little information essential to facial expressions. For this reason, the upper two-thirds (2/3) of the forehead region is trimmed and the lower one-third (1/3), down to the eyebrows, is retained.
Finally, the human face image obtained from the robust real-time face detector is cropped by square S at coordinate P(x, y). Figure 3 shows the cropping technique applied to a face image.
Figure 3 Face region cropped by the cropping technique (the small square) versus the face region cropped by the robust real-time face detector (the large square) (see online version for colours)
This cropping method aims to reduce the processing time of the feature extraction and facial expression recognition steps and, most importantly, to improve the facial expression recognition rate. It is suitable for real-time systems such as intelligent human-machine systems or intelligent game applications.
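As an illustrative sketch of the cropping geometry described above (the function and variable names are ours, not from the paper), the crop can be computed directly from the detector output:

```python
import numpy as np

def crop_face_region(face_img, ratio=0.8):
    """Crop square S from a detector-cropped face image.

    face_img : h x w1 greyscale array from the real-time face detector.
    ratio    : assumed w2/w1 fraction; the paper reports 75%-85% works
               well and uses 80% in its experiments.
    """
    h, w1 = face_img.shape[:2]
    w2 = int(round(ratio * w1))     # side of square S
    x = (w1 - w2) // 2              # centre S horizontally
    y = h // 6                      # trim the upper 2/3 of the forehead
    return face_img[y:y + w2, x:x + w2]

face = np.arange(100 * 100).reshape(100, 100)  # stand-in 100 x 100 image
crop = crop_face_region(face, ratio=0.8)       # 80 x 80 square region
```

The square region returned here would then be rescaled to the working resolution before feature extraction.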
3 The CLBP for facial expression recognition
3.1 Local binary pattern
The local binary pattern (LBP) operator was first introduced as a complementary measure for local image contrast, as in Ojala et al. (1996). An LBP code is computed for a pixel in an image by comparing it with its neighbours, as in equation (1):

LBP_{P,R} = \sum_{p=0}^{P-1} s(g_p − g_c) 2^p, where s(x) = 1 if x ≥ 0 and s(x) = 0 if x < 0, (1)

where g_c is the grey value of the central pixel, g_p is the grey value of its neighbours, P is the total number of involved neighbours and R is the radius of the neighbourhood. Based on this operator, each pixel of the image is labelled with an LBP code.
For facial expression recognition, the uniform LBP code is usually used. An LBP code is called uniform if it contains at most two bitwise transitions from 0 to 1 or vice versa when the binary string is considered circular, as in Ojala et al. (2002). For example, 00000000, 00111000 and 11100001 are uniform patterns. A uniform LBP operator is denoted LBP^{u2}_{P,R}.
A histogram of a labelled image f_k(x, y) can be defined as follows:

H_i = \sum_{x,y} I(f_k(x, y) = i), i = 0, …, n − 1, (2)

where n is the number of different labels produced by the LBP operator and

I(A) = 1 if A is true, and I(A) = 0 if A is false. (3)
This histogram contains information about the distribution of local micro-patterns, such as spots, edges, corners or flat areas, over the whole image.
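The LBP labelling and histogram above can be sketched as follows (a minimal illustration with our own function names; the neighbour ordering is a convention not fixed by equation (1)):

```python
import numpy as np

def lbp_code(block):
    """LBP code (P = 8, R = 1) of the centre pixel of a 3 x 3 block.

    Neighbours are read in a fixed circular order starting from the
    top-left corner; each is thresholded against the centre pixel.
    """
    gc = block[1, 1]
    neighbours = [block[0, 0], block[0, 1], block[0, 2], block[1, 2],
                  block[2, 2], block[2, 1], block[2, 0], block[1, 0]]
    return sum((1 if gp >= gc else 0) << p for p, gp in enumerate(neighbours))

def lbp_histogram(labelled, n_labels=256):
    """Histogram of a labelled image: H_i counts pixels with label i."""
    return np.bincount(labelled.ravel(), minlength=n_labels)

block = np.array([[25, 48, 76],
                  [19, 32, 41],
                  [36, 87,  9]])   # the sample block of Figure 4
code = lbp_code(block)            # → 110
```

In practice each pixel of the image is labelled this way and the histogram of labels forms the local texture descriptor.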
3.2 Local difference sign-magnitude transform
According to Guo et al. (2010), given a central pixel g_c and its P circularly and evenly spaced neighbours g_p, p = 0, 1, …, P − 1, the difference between g_c and g_p can be calculated as d_p = g_p − g_c. The local difference vector [d_0, …, d_{P−1}] describes the image local structure at g_c and can be decomposed into two components:

d_p = s_p · m_p, with s_p = sign(d_p) and m_p = |d_p|, (4)

where s_p, which equals 1 if d_p ≥ 0 and −1 otherwise, is the sign of d_p, and m_p is the magnitude of d_p. Equation (4) is called the local difference sign-magnitude transform; it transforms the local difference vector [d_0, …, d_{P−1}] into a sign vector [s_0, …, s_{P−1}] and a magnitude vector [m_0, …, m_{P−1}] (in the binary coding, the sign −1 is written as 0). Figure 4 shows an example of the transformation.
Figure 4 (a) A 3 × 3 sample block; (b) local difference; (c) sign component; (d) magnitude component (the centre position, marked ·, carries no difference value)

(a) 25 48 76 | 19 32 41 | 36 87 9
(b) −7 16 44 | −13 · 9 | 4 55 −23
(c) 0 1 1 | 0 · 1 | 1 1 0
(d) 7 16 44 | 13 · 9 | 4 55 23
3.3 Completed LBP with CLBP_S and CLBP_M operators
The transformation shows that the original LBP uses only the sign vector to code the local pattern, because it has been proved that d_p can be more accurately approximated by the sign component s_p than by the magnitude component m_p. However, it has also been found that the magnitude component may contribute additional discriminative information for pattern recognition if it is properly used.

The sign component is the same as the original LBP operator defined in equation (1). In CLBP, this component is denoted the CLBP_S operator, whereas the magnitude component takes continuous values in place of the binary ‘1’ and ‘0’ values. To code this component in a format consistent with that of the sign component, so as to exploit their additional information, the magnitude component is denoted the CLBP_M operator and defined as in equation (5):
CLBP_M_{P,R} = \sum_{p=0}^{P-1} t(m_p, c) 2^p, where t(x, c) = 1 if x ≥ c and t(x, c) = 0 if x < c, (5)
where the threshold c is determined adaptively and set as the mean value of m_p over the whole image. As with the uniform LBP operator, the uniform CLBP_M operator is denoted CLBP_M^{u2}_{P,R}.

The CLBP_S and CLBP_M operators have the same binary string format, so they can be used together for pattern recognition. In the proposed method, to form a CLBP descriptor, the histograms of the CLBP_S and CLBP_M codes of the image are computed separately and then concatenated into a single histogram. This CLBP scheme is denoted ‘CLBP_S_M’.
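A hedged sketch of the CLBP_M coding of equation (5) and the CLBP_S_M concatenation (our own helper names; the adaptive threshold here is taken over one block for illustration):

```python
import numpy as np

def clbp_m_code(block, c):
    """CLBP_M code of the centre pixel of a 3 x 3 block: each magnitude
    m_p = |g_p - g_c| is thresholded against c, as in equation (5)."""
    gc = int(block[1, 1])
    neighbours = [block[0, 0], block[0, 1], block[0, 2], block[1, 2],
                  block[2, 2], block[2, 1], block[2, 0], block[1, 0]]
    return sum((1 if abs(int(gp) - gc) >= c else 0) << p
               for p, gp in enumerate(neighbours))

def clbp_s_m(hist_s, hist_m):
    """Form the 'CLBP_S_M' descriptor by concatenating the separately
    computed CLBP_S and CLBP_M histograms."""
    return np.concatenate([hist_s, hist_m])

block = np.array([[25, 48, 76],
                  [19, 32, 41],
                  [36, 87,  9]])
mags = [7, 16, 44, 13, 9, 4, 55, 23]   # the m_p values of this block
c = float(np.mean(mags))               # adaptive threshold (here 21.375)
code = clbp_m_code(block, c)
```

With 59-bin uniform-pattern histograms, the concatenated CLBP_S_M descriptor of a region would have 118 bins.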
3.4 Extracting CLBP feature for facial expression recognition
In facial expression recognition applications, in order to represent the face efficiently, the extracted features should retain spatial information. For this reason, the face image can be divided into small regions before feature extraction. Several resolutions and region divisions have been proposed, for example, 110 × 150 pixels with 6 × 7 regions, shown in Figure 5(a), as in Shan et al. (2005, 2009) and Zhao and Zhang (2012); 256 × 256 pixels with 3 × 5 regions, shown in Figure 5(b), as in Ying et al. (2009); or 64 × 64 pixels with eight regions, shown in Figure 5(c), as in Liao et al. (2006).
Figure 5 Proposed methods for resolution and region division

After the face images are cropped, they are resized to a resolution of 64 × 64 pixels, as in Cao et al. (2014). The resized face images are then divided into non-overlapping regions of 8 × 8 pixels for feature extraction. Next, the CLBP histogram, or CLBP feature, of each region is calculated, as in Figure 6. The CLBP features extracted from the regions are concatenated from left to right and top to bottom into a single feature vector for the face image.
Figure 6 Calculating the CLBP histogram for a face image
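The region division and concatenation can be sketched as follows; a plain intensity histogram stands in for the per-region CLBP histogram (a simplification of ours, with 59 bins as for uniform patterns at P = 8):

```python
import numpy as np

def region_feature_vector(face_img, region=8, n_bins=59):
    """Divide a 64 x 64 face image into non-overlapping 8 x 8 regions,
    compute one histogram per region, and concatenate them left-to-right,
    top-to-bottom into a single feature vector."""
    h, w = face_img.shape
    feats = []
    for r in range(0, h, region):
        for col in range(0, w, region):
            patch = face_img[r:r + region, col:col + region]
            hist, _ = np.histogram(patch, bins=n_bins, range=(0, 256))
            feats.append(hist)
    return np.concatenate(feats)

face = np.zeros((64, 64), dtype=np.uint8)  # stand-in normalised face
vec = region_feature_vector(face)          # 64 regions x 59 bins each
```

With the CLBP_S_M descriptor of 118 bins per region instead, the same loop would yield a 64 × 118 = 7,552-dimensional face vector.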
3.5 Choosing effective threshold for CLBP_M
Originally, CLBP was developed from LBP to obtain better results in texture classification, especially rotation-invariant texture classification. Both have recently been used effectively for facial expression recognition. In facial expression recognition, the face image is divided into regions before the feature vector is extracted. Since the textures of the regions of a face image differ, we tested several different thresholds c:

• the mean value of m_p over the whole image
• the mean value of m_p over the region
• the mean value of m_p over the uniform patterns LBP^{u2}_{P,R} of the region.

Experimental results on both the JAFFE and CK databases show that the last choice of threshold gives the best accuracy in facial expression recognition.
4 Experiments and results
We applied the proposed method to two databases. The first is the Japanese Female Facial Expression (JAFFE) database, as in Lyons et al. (1999), which includes 213 grey images of ten Japanese female subjects. The original images from the database have a resolution of 256 × 256 pixels. In our experiments, we selected all 213 images as experimental samples. The second is the CK database, as in Kanade et al. (2000) and Lucey et al. (2010). The CK database consists of 100 university students aged from 18 to 30 years, of whom 65% were female, 15% were African-American, and 3% were Asian or Latino. Subjects were instructed to perform a series of 23 facial displays, six of which were based on descriptions of basic emotions (anger, disgust, fear, joy, sadness, and surprise). Image sequences from neutral to target display were digitised into 640 × 490 pixel arrays with eight-bit precision for greyscale values. In the CK database, many subjects do not express all six primary emotions. For our experiments, we chose subjects who expressed at least three emotions (including the neutral state), so 86 subjects (56 females and 30 males) were selected from the database. Each primary emotion of a subject contributes six images with expression degrees from less to more, and the neutral emotion is taken from the first few images of each sequence. In total, 2,040 images (234 anger, 276 disgust, 150 fear, 390 joy, 474 neutral, 156 sadness, and 360 surprise images) were selected for the experiments.
In the classification step, a support vector machine (SVM) classifier is applied, since many applications have confirmed that SVMs obtain high results in classifying facial expressions, as in Priya and Banu (2012) and Ahsan et al. (2013). We used the SVM functions with a radial basis function kernel from OpenCV 2.1. To choose optimal parameters, we carried out a grid-search approach as in Hsu et al. (2010). Three-fold cross-validation was applied in the experiments, implemented in C++, for both databases.
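The paper's experiments used the OpenCV 2.1 SVM in C++; as a hedged stand-in, the same grid-search-with-cross-validation scheme can be sketched in scikit-learn on synthetic features (the data, grid values and names here are ours, not the paper's):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for CLBP_S_M feature vectors of two expression classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (30, 10)),
               rng.normal(3.0, 1.0, (30, 10))])
y = np.array([0] * 30 + [1] * 30)

# Coarse grid over C and gamma for an RBF-kernel SVM, scored by
# three-fold cross-validation, following the Hsu et al. (2010) recipe.
grid = GridSearchCV(SVC(kernel="rbf"),
                    param_grid={"C": [1, 10, 100],
                                "gamma": [0.01, 0.1, 1.0]},
                    cv=3)
grid.fit(X, y)
best_acc = grid.best_score_
```

In the real pipeline, X would hold the concatenated region histograms and y the seven expression labels; the best (C, gamma) pair found by the grid search is then used to train the final classifier.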
The confusion matrices of the JAFFE database and the CK database are shown in Tables 1 and 2, respectively. In these experiments, the percentage w2/w1 of the cropped image in the preprocessing step is 80%, and the threshold c for CLBP_M^{u2}_{P,R} is the mean value of m_p over the uniform patterns LBP^{u2}_{P,R} of the region.
Table 1 Confusion matrix of the JAFFE database at w2/w1 = 80%
Anger (%) Disgust (%) Fear (%) Joy (%) Neutral (%) Sadness (%) Surprise (%)
Table 2 Confusion matrix of the CK database at w2/w1 = 80%
Anger (%) Disgust (%) Fear (%) Joy (%) Neutral (%) Sadness (%) Surprise (%)
As presented in Section 3.5, there are several ways to choose the threshold c for the CLBP_M operator; our experiments show that setting this value to the mean value of m_p over the uniform patterns LBP^{u2}_{P,R} of the region gives the best results. Table 3 presents the recognition rates on the two databases using the various thresholds, and Figure 7 illustrates the results of the three threshold choices in a chart.
Table 3 Recognition rate using various thresholds on the CK and JAFFE databases (CK % / JAFFE %)

The mean value of m_p from the whole image: 99.72 / 95.32
The mean value of m_p from the uniform patterns LBP^{u2}_{P,R} of the region

Figure 7 The chart of the results comparing various thresholds (see online version for colours)
It is almost impossible to cover all of the published works. However, for comparison, we present several typical papers that represent state-of-the-art methods of facial expression recognition, thereby giving an overview of the existing methods. The comparison of a number of state-of-the-art methods with the proposed approach on the JAFFE database and the CK database is presented in Tables 4 and 5, respectively.
Table 4 Comparison of state-of-the-art methods with the proposed method on the JAFFE database

Methods compared: Feng et al. (2007), Shih et al. (2008), Lina and Pan (2009), Zhao and Zhang (2012), and the proposed method
No. of facial expressions: seven classes for the proposed method
Cross-validation test: 10-fold, 10-fold, 10-fold, 10-fold and 3-fold, respectively

Notes: a LPT: linear programming technique; b 2D-LDA: 2D linear discriminant analysis; c 1-NN: 1-nearest-neighbour; d DKLLE: discriminant kernel locally linear embedding
Table 5 Comparison of state-of-the-art methods with the proposed method on the CK database

Methods compared: Ahsan et al. (2013), Shan et al. (2009), Zhao and Zhang (2012), Khan et al. (2013), and the proposed method
Kind of feature: Gabor wavelet for Ahsan et al. (2013); the remaining entries are abbreviated in the notes below
No. of facial sequences: 2,040 images for the proposed method
Cross-validation test: 7-fold, 10-fold, 10-fold, 10-fold and 3-fold, respectively

Notes: a LTP: local transitional pattern; b BLBP: boosted-LBP; c 1-NN: 1-nearest-neighbour; d DKLLE: discriminant kernel locally linear embedding; e PLBP: pyramid of LBP
5 Conclusions
We presented a novel experimental method for facial expression recognition based on the proposed image preprocessing technique and an improved CLBP model. Our experiments showed that a suitable threshold selected in the CLBP computation can yield a better recognition rate in facial expression recognition applications. Based on the