An effective facial expression recognition approach for intelligent game systems
Nhan Thi Cao School of Media, Soongsil University,
511, Sangdo-Dong, Dongjak-Gu, Seoul, 156-743, Korea
Email: ctnhen@yahoo.com
An Hoa Ton-That University of Information Technology, Vietnam National University,
Km 20, Hanoi Highway, Linh Trung Ward, Thu Duc District, Ho Chi Minh City, Vietnam Email: an_tth@yahoo.com
Hyung-Il Choi*
School of Media, Soongsil University,
511, Sangdo-Dong, Dongjak-Gu, Seoul, 156-743, Korea
Email: hic@ssu.ac.kr
*Corresponding author
Abstract: This paper presents a novel facial expression recognition approach based on an improved model of the completed local binary pattern and support vector machine classification, proposing a method that can be applied to intelligent game applications as well as intelligent communication systems. Capturing the emotions of players can serve various purposes in interactive games, such as transferring a player's emotions to his or her avatar, or triggering suitable actions to communicate with players so as to foster a positive attitude in educational games. Our experiments on two databases, JAFFE (213 images) and CK (2,040 images), show the effectiveness of the proposed method in comparison with several other methods. The recognition accuracy is 96.28% on the JAFFE database and 99.85% on the CK database. The advantages of this technique are its simplicity, speed and high accuracy.
Keywords: facial expression recognition; completed local binary pattern;
CLBP; intelligence game systems; support vector machine; SVM
Reference to this paper should be made as follows: Cao, N.T., Ton-That, A.H. and Choi, H.-I. (2016) ‘An effective facial expression recognition approach for intelligent game systems’, Int. J. Computational Vision and Robotics, Vol. 6, No. 3, pp.223–234.
Biographical notes: Nhan Thi Cao is a PhD candidate at the Computer Vision Lab in the School of Media at Soongsil University. She received her BS (1998) in Information Technology from Dalat University and her MS (2004) in Computer Science from the University of Natural Science, Vietnam National University, Ho Chi Minh City.
An Hoa Ton-That is with the Computer Science Department at the University of Information Technology, which belongs to Vietnam National University, Ho Chi Minh City, Vietnam. His research interests include computer vision, pattern recognition, fuzzy systems and artificial intelligence. He received his BS (2005) in Information Technology and his MS (2009) in Computer Science from Vietnam National University, Ho Chi Minh City, Vietnam, and his PhD (2014) in Computer Science from Soongsil University, Korea.
Hyung-Il Choi is a Professor in the School of Media at Soongsil University. His research interests include computer vision, pattern recognition, and artificial intelligence. He received his BS (1979) in Electronic Engineering from Yonsei University, and his MS (1983) and PhD (1987) degrees in Electrical Engineering and Computer Science from the University of Michigan.
This paper is a revised and expanded version of a paper entitled ‘A facial expression recognition method for intelligent game applications’ presented at the Serious Games & Social Connect Community Conference and the International Symposium on Simulation & Serious Games 2014, Kintex Convention Center, South Korea, 23–24 May 2014.
1 Introduction
In recent years, with the development of intelligent communication systems, data-driven animation and intelligent game applications, facial expression recognition has attracted much attention, as in Ahsan et al. (2013), Cao et al. (2013), Liao et al. (2006), Priya and Banu (2012), Shan et al. (2005, 2009) and Zhao and Zhang (2012), for example. In this paper, we propose a novel method for recognising facial expressions based on an improvement of the completed modelling of the local binary pattern. Our experiments on both the Japanese Female Facial Expression (JAFFE) database, as in Lyons et al. (1999), and the Cohn-Kanade (CK) database, as in Kanade et al. (2000) and Lucey et al. (2010), show the effectiveness of the proposed method. The accuracy obtained is high compared with several other methods on both databases with seven classes of facial expression.
In intelligent games, recognising players’ emotions through their facial expressions can be used in many ways. For example, in interactive and multiplayer games, the emotions of players can be transferred to the players’ avatars on the screen. In educational games, recognising players’ emotions can help the system behave in a better manner: if the player is sleepy, the system may wake him or her up; if the player is happy after doing something well, the system may cheer him or her on; and so on. Thus, when facial expression recognition of players is applied, intelligent game systems can become more interactive, vivid and attractive.
The rest of the paper is organised as follows: Section 2 describes the face region cropping; Section 3 presents the completed local binary pattern (CLBP) for facial expression recognition; Section 4 reports the experiments and results; and Section 5 gives the conclusions.
2 Face region cropping
Face image preprocessing is the process of obtaining normalised face images from input face images captured by a camera or taken from a database. The normalised face images are used for extracting facial expression features. The process can be divided into two steps: a basic step and an enhancement step. The basic step detects the face region of an input face image and eliminates redundant regions; it can be carried out manually or by a real-time face detector. The enhancement step optimises the face region for extracting facial expression features; it can be performed by cropping methods, image normalisation or image filtering. The face images are then rescaled and used for feature extraction.
Figure 1 shows the process of face image preprocessing.

Figure 1 The process of face image preprocessing (flow: database/camera → input face images → basic processing → enhancement processing → feature extraction)
In this paper, the image preprocessing is implemented as in Cao et al. (2013). It includes two preprocessing steps: a basic process and an enhancement process. Normally, human face images from a camera or a database contain much redundant information, e.g., background or non-face regions. So, to detect the face region in a face image, the robust real-time face detection algorithm developed by Viola and Jones (2004) is applied.
However, the face images obtained still contain some redundant areas that can affect recognition accuracy and processing speed, so in the enhancement step a cropping technique is used, as shown in Figure 2.
Figure 2 Face region cropped by the cropping method (square S of side w2 is placed at coordinate P(x, y) inside the human face image of width w1 and height h obtained from the robust real-time face detector, with O(0, 0) the top-left corner, y = h/6 and x = (w1 − w2)/2)
The cropping method can be described as follows:
• First, the size of square S used for cropping the human face in images is determined. The side w2 of square S equals the width of the human face. The size of square S depends on each database and even on each image. However, based on the results of testing some databases with the image preprocessing method of Cao et al. (2013), the width of the human face accounts for 75% to 85% of the width of the face images obtained from the robust real-time face detector. Values of w2 are therefore treated as experimental parameters.
• The next step is to determine the coordinate P(x, y), measured from the top-left corner of the image, at which to crop square S. Let O(0, 0) be the top-left corner of the human face image obtained from the robust real-time face detector, h the height of the face image, w1 the width of the face image and w2 the width of square S. The coordinates are then y = h/6 and x = (w1 − w2)/2. The expression y = h/6 is based on face images with a neutral facial expression: normally, the forehead occupies one quarter of the human face height, so it takes up a sizeable part of the face region while containing little information essential to facial expressions. For this reason, the upper two-thirds (2/3) of the forehead region is trimmed and the lower one-third (1/3), down to the eyebrows, is retained.
Finally, the human face image obtained from the robust real-time face detector is cropped by square S at coordinate P(x, y). Figure 3 shows the cropping technique applied to a face image.
Figure 3 Face region cropped by the cropping technique (the small square) versus the face region cropped by the robust real-time face detector (the large square) (see online version for colours)
This cropping method aims to reduce the processing time of the feature extraction and facial expression recognition steps and, most importantly, to improve the facial expression recognition rate. It is suitable for real-time systems such as intelligent human-machine systems or intelligent game applications.
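As an illustrative sketch of the cropping geometry described above (the function and variable names are ours, not from the paper), the crop can be computed directly from the detector output:

```python
import numpy as np

def crop_face_region(face_img, ratio=0.8):
    """Crop square S from a detector-cropped face image.

    face_img : h x w1 greyscale array from the real-time face detector.
    ratio    : assumed w2/w1 fraction; the paper reports 75%-85% works
               well and uses 80% in its experiments.
    """
    h, w1 = face_img.shape[:2]
    w2 = int(round(ratio * w1))     # side of square S
    x = (w1 - w2) // 2              # centre S horizontally
    y = h // 6                      # trim the upper 2/3 of the forehead
    return face_img[y:y + w2, x:x + w2]

face = np.arange(100 * 100).reshape(100, 100)  # stand-in 100 x 100 image
crop = crop_face_region(face, ratio=0.8)       # 80 x 80 square region
```

The square region returned here would then be rescaled to the working resolution before feature extraction.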
3 The CLBP for facial expression recognition
3.1 Local binary pattern
The local binary pattern (LBP) operator was first introduced as a complementary measure for local image contrast, as in Ojala et al. (1996). An LBP code is computed for a pixel in an image by comparing it with its neighbours, as in equation (1):

LBP_{P,R} = \sum_{p=0}^{P-1} s(g_p − g_c) 2^p, where s(x) = 1 if x ≥ 0 and s(x) = 0 if x < 0, (1)

where g_c is the grey value of the central pixel, g_p is the grey value of its neighbours, P is the total number of involved neighbours and R is the radius of the neighbourhood. Based on this operator, each pixel of the image is labelled with an LBP code.
For facial expression recognition, the uniform LBP code is usually used. An LBP code is called uniform if it contains at most two bitwise transitions from 0 to 1 or vice versa when the binary string is considered circular, as in Ojala et al. (2002). For example, 00000000, 00111000 and 11100001 are uniform patterns. A uniform LBP operator is denoted LBP^{u2}_{P,R}.
A histogram of a labelled image f_k(x, y) can be defined as follows:

H_i = \sum_{x,y} I(f_k(x, y) = i), i = 0, …, n − 1, (2)

where n is the number of different labels produced by the LBP operator and

I(A) = 1 if A is true, and I(A) = 0 if A is false. (3)
This histogram contains information about the distribution of local micro-patterns, such as spots, edges, corners or flat areas, over the whole image.
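The LBP labelling and histogram above can be sketched as follows (a minimal illustration with our own function names; the neighbour ordering is a convention not fixed by equation (1)):

```python
import numpy as np

def lbp_code(block):
    """LBP code (P = 8, R = 1) of the centre pixel of a 3 x 3 block.

    Neighbours are read in a fixed circular order starting from the
    top-left corner; each is thresholded against the centre pixel.
    """
    gc = block[1, 1]
    neighbours = [block[0, 0], block[0, 1], block[0, 2], block[1, 2],
                  block[2, 2], block[2, 1], block[2, 0], block[1, 0]]
    return sum((1 if gp >= gc else 0) << p for p, gp in enumerate(neighbours))

def lbp_histogram(labelled, n_labels=256):
    """Histogram of a labelled image: H_i counts pixels with label i."""
    return np.bincount(labelled.ravel(), minlength=n_labels)

block = np.array([[25, 48, 76],
                  [19, 32, 41],
                  [36, 87,  9]])   # the sample block of Figure 4
code = lbp_code(block)            # → 110
```

In practice each pixel of the image is labelled this way and the histogram of labels forms the local texture descriptor.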
3.2 Local difference sign-magnitude transform
According to Guo et al. (2010), given a central pixel g_c and its P circularly and evenly spaced neighbours g_p, p = 0, 1, …, P − 1, the difference between g_c and g_p can be calculated as d_p = g_p − g_c. The local difference vector [d_0, …, d_{P−1}] describes the image local structure at g_c and can be decomposed into two components:

d_p = s_p · m_p, with s_p = sign(d_p) and m_p = |d_p|, (4)

where s_p, which equals 1 if d_p ≥ 0 and −1 otherwise, is the sign of d_p, and m_p is the magnitude of d_p. Equation (4) is called the local difference sign-magnitude transform; it transforms the local difference vector [d_0, …, d_{P−1}] into a sign vector [s_0, …, s_{P−1}] and a magnitude vector [m_0, …, m_{P−1}] (in the binary coding, the sign −1 is written as 0). Figure 4 shows an example of the transformation.
Figure 4 (a) A 3 × 3 sample block; (b) local difference; (c) sign component; (d) magnitude component (the centre position, marked ·, carries no difference value)

(a) 25 48 76 | 19 32 41 | 36 87 9
(b) −7 16 44 | −13 · 9 | 4 55 −23
(c) 0 1 1 | 0 · 1 | 1 1 0
(d) 7 16 44 | 13 · 9 | 4 55 23
3.3 Completed LBP with CLBP_S and CLBP_M operators
The transformation shows that the original LBP uses only the sign vector to code the local pattern, because it has been proved that d_p can be more accurately approximated by the sign component s_p than by the magnitude component m_p. However, it has also been found that the magnitude component may contribute additional discriminative information for pattern recognition if it is properly used.

The sign component is the same as the original LBP operator defined in equation (1). In CLBP, this component is denoted the CLBP_S operator, whereas the magnitude component takes continuous values in place of the binary ‘1’ and ‘0’ values. To code this component in a format consistent with that of the sign component, so as to exploit their additional information, the magnitude component is denoted the CLBP_M operator and defined as in equation (5):
CLBP_M_{P,R} = \sum_{p=0}^{P-1} t(m_p, c) 2^p, where t(x, c) = 1 if x ≥ c and t(x, c) = 0 if x < c, (5)
where the threshold c is determined adaptively and set as the mean value of m_p over the whole image. As with the uniform LBP operator, the uniform CLBP_M operator is denoted CLBP_M^{u2}_{P,R}.

The CLBP_S and CLBP_M operators have the same binary string format, so they can be used together for pattern recognition. In the proposed method, to form a CLBP descriptor, the histograms of the CLBP_S and CLBP_M codes of the image are computed separately and then concatenated into a single histogram. This CLBP scheme is denoted ‘CLBP_S_M’.
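A hedged sketch of the CLBP_M coding of equation (5) and the CLBP_S_M concatenation (our own helper names; the adaptive threshold here is taken over one block for illustration):

```python
import numpy as np

def clbp_m_code(block, c):
    """CLBP_M code of the centre pixel of a 3 x 3 block: each magnitude
    m_p = |g_p - g_c| is thresholded against c, as in equation (5)."""
    gc = int(block[1, 1])
    neighbours = [block[0, 0], block[0, 1], block[0, 2], block[1, 2],
                  block[2, 2], block[2, 1], block[2, 0], block[1, 0]]
    return sum((1 if abs(int(gp) - gc) >= c else 0) << p
               for p, gp in enumerate(neighbours))

def clbp_s_m(hist_s, hist_m):
    """Form the 'CLBP_S_M' descriptor by concatenating the separately
    computed CLBP_S and CLBP_M histograms."""
    return np.concatenate([hist_s, hist_m])

block = np.array([[25, 48, 76],
                  [19, 32, 41],
                  [36, 87,  9]])
mags = [7, 16, 44, 13, 9, 4, 55, 23]   # the m_p values of this block
c = float(np.mean(mags))               # adaptive threshold (here 21.375)
code = clbp_m_code(block, c)
```

With 59-bin uniform-pattern histograms, the concatenated CLBP_S_M descriptor of a region would have 118 bins.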
3.4 Extracting CLBP feature for facial expression recognition
In facial expression recognition applications, in order to represent the face efficiently, the extracted features should retain spatial information. For this reason, the face image can be divided into small regions before feature extraction. Several resolutions and region divisions have been proposed, for example, 110 × 150 pixels with 6 × 7 regions, shown in Figure 5(a), as in Shan et al. (2005, 2009) and Zhao and Zhang (2012); 256 × 256 pixels with 3 × 5 regions, shown in Figure 5(b), as in Ying et al. (2009); or 64 × 64 pixels with eight regions, shown in Figure 5(c), as in Liao et al. (2006).
Figure 5 Proposed methods for resolution and region division

After the face images are cropped, they are resized to a resolution of 64 × 64 pixels, as in Cao et al. (2014). The resized face images are then divided into non-overlapping regions of 8 × 8 pixels for feature extraction. Next, the CLBP histogram, or CLBP feature, of each region is calculated, as in Figure 6. The CLBP features extracted from the regions are concatenated from left to right and top to bottom into a single feature vector for the face image.
Figure 6 Calculating the CLBP histogram for a face image
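The region division and concatenation can be sketched as follows; a plain intensity histogram stands in for the per-region CLBP histogram (a simplification of ours, with 59 bins as for uniform patterns at P = 8):

```python
import numpy as np

def region_feature_vector(face_img, region=8, n_bins=59):
    """Divide a 64 x 64 face image into non-overlapping 8 x 8 regions,
    compute one histogram per region, and concatenate them left-to-right,
    top-to-bottom into a single feature vector."""
    h, w = face_img.shape
    feats = []
    for r in range(0, h, region):
        for col in range(0, w, region):
            patch = face_img[r:r + region, col:col + region]
            hist, _ = np.histogram(patch, bins=n_bins, range=(0, 256))
            feats.append(hist)
    return np.concatenate(feats)

face = np.zeros((64, 64), dtype=np.uint8)  # stand-in normalised face
vec = region_feature_vector(face)          # 64 regions x 59 bins each
```

With the CLBP_S_M descriptor of 118 bins per region instead, the same loop would yield a 64 × 118 = 7,552-dimensional face vector.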
3.5 Choosing effective threshold for CLBP_M
Originally, CLBP was developed from LBP to obtain better results in texture classification, especially rotation-invariant texture classification. Both have recently been used effectively for facial expression recognition. In facial expression recognition, the face image is divided into regions before the feature vector is extracted. Since the textures of the regions of a face image differ, we tested several different thresholds c:

• the mean value of m_p over the whole image
• the mean value of m_p over the region
• the mean value of m_p over the uniform patterns LBP^{u2}_{P,R} of the region.

Experimental results on both the JAFFE and CK databases show that the last choice of threshold gives the best accuracy in facial expression recognition.
4 Experiments and results
We applied the proposed method to two databases. The first is the Japanese Female Facial Expression (JAFFE) database, as in Lyons et al. (1999), which includes 213 grey images of ten Japanese female subjects. The original images from the database have a resolution of 256 × 256 pixels. In our experiments, we selected all 213 images as experimental samples. The second is the CK database, as in Kanade et al. (2000) and Lucey et al. (2010). The CK database consists of 100 university students aged from 18 to 30 years, of whom 65% were female, 15% were African-American, and 3% were Asian or Latino. Subjects were instructed to perform a series of 23 facial displays, six of which were based on descriptions of basic emotions (anger, disgust, fear, joy, sadness, and surprise). Image sequences from neutral to target display were digitised into 640 × 490 pixel arrays with eight-bit precision for greyscale values. In the CK database, many subjects do not express all six primary emotions. For our experiments, we chose subjects who expressed at least three emotions (including the neutral state), so 86 subjects (56 females and 30 males) were selected from the database. Each primary emotion of a subject contributes six images with expression degrees from less to more, and the neutral emotion is taken from the first few images of each sequence. In total, 2,040 images (234 anger, 276 disgust, 150 fear, 390 joy, 474 neutral, 156 sadness, and 360 surprise images) were selected for the experiments.
In the classification step, a support vector machine (SVM) classifier is applied, since many applications have confirmed that SVMs obtain high results in classifying facial expressions, as in Priya and Banu (2012) and Ahsan et al. (2013). We used the SVM functions with a radial basis function kernel from OpenCV 2.1. To choose optimal parameters, we carried out a grid-search approach as in Hsu et al. (2010). Three-fold cross-validation was applied in the experiments, implemented in C++, for both databases.
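The paper's experiments used the OpenCV 2.1 SVM in C++; as a hedged stand-in, the same grid-search-with-cross-validation scheme can be sketched in scikit-learn on synthetic features (the data, grid values and names here are ours, not the paper's):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for CLBP_S_M feature vectors of two expression classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (30, 10)),
               rng.normal(3.0, 1.0, (30, 10))])
y = np.array([0] * 30 + [1] * 30)

# Coarse grid over C and gamma for an RBF-kernel SVM, scored by
# three-fold cross-validation, following the Hsu et al. (2010) recipe.
grid = GridSearchCV(SVC(kernel="rbf"),
                    param_grid={"C": [1, 10, 100],
                                "gamma": [0.01, 0.1, 1.0]},
                    cv=3)
grid.fit(X, y)
best_acc = grid.best_score_
```

In the real pipeline, X would hold the concatenated region histograms and y the seven expression labels; the best (C, gamma) pair found by the grid search is then used to train the final classifier.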
The confusion matrices of the JAFFE database and the CK database are shown in Tables 1 and 2, respectively. In these experiments, the percentage w2/w1 of the cropped image in the preprocessing step is 80%, and the threshold c for CLBP_M^{u2}_{P,R} is the mean value of m_p over the uniform patterns LBP^{u2}_{P,R} of the region.
Table 1 Confusion matrix of the JAFFE database at w2/w1 = 80%
Anger (%) Disgust (%) Fear (%) Joy (%) Neutral (%) Sadness (%) Surprise (%)
Table 2 Confusion matrix of the CK database at w2/w1 = 80%
Anger (%) Disgust (%) Fear (%) Joy (%) Neutral (%) Sadness (%) Surprise (%)
As presented in Section 3.5, there are several ways to choose the threshold c for the CLBP_M operator; our experiments show that setting this value to the mean value of m_p over the uniform patterns LBP^{u2}_{P,R} of the region gives the best results. Table 3 presents the recognition rates on the two databases using the various thresholds, and Figure 7 illustrates the results of the three threshold choices in a chart.
Table 3 Recognition rate using various thresholds on the CK and JAFFE databases (CK % / JAFFE %)

The mean value of m_p from the whole image: 99.72 / 95.32
The mean value of m_p from the uniform patterns LBP^{u2}_{P,R} of the region

Figure 7 The chart of the results comparing various thresholds (see online version for colours)
It is almost impossible to cover all of the published works. However, for comparison, we present several typical papers that represent state-of-the-art methods of facial expression recognition, thereby giving an overview of the existing methods. The comparison of a number of state-of-the-art methods with the proposed approach on the JAFFE database and the CK database is presented in Tables 4 and 5, respectively.
Table 4 Comparison of state-of-the-art methods with the proposed method on the JAFFE database

Methods compared: Feng et al. (2007), Shih et al. (2008), Lina and Pan (2009), Zhao and Zhang (2012), and the proposed method
No. of facial expressions: seven classes for the proposed method
Cross-validation test: 10-fold, 10-fold, 10-fold, 10-fold and 3-fold, respectively

Notes: a LPT: linear programming technique; b 2D-LDA: 2D linear discriminant analysis; c 1-NN: 1-nearest-neighbour; d DKLLE: discriminant kernel locally linear embedding
Table 5 Comparison of state-of-the-art methods with the proposed method on the CK database

Methods compared: Ahsan et al. (2013), Shan et al. (2009), Zhao and Zhang (2012), Khan et al. (2013), and the proposed method
Kind of feature: Gabor wavelet for Ahsan et al. (2013); the remaining entries are abbreviated in the notes below
No. of facial sequences: 2,040 images for the proposed method
Cross-validation test: 7-fold, 10-fold, 10-fold, 10-fold and 3-fold, respectively

Notes: a LTP: local transitional pattern; b BLBP: boosted-LBP; c 1-NN: 1-nearest-neighbour; d DKLLE: discriminant kernel locally linear embedding; e PLBP: pyramid of LBP
5 Conclusions
We presented a novel experimental method for facial expression recognition based on the proposed image preprocessing technique and an improved CLBP model. Our experiments showed that a suitable threshold selected in the CLBP computation can yield a better recognition rate in facial expression recognition applications. Based on the