Edited by Kresimir Delac and Mislav Grgic
I-TECH Education and Publishing
Published by I-Tech Education and Publishing, Vienna, Austria
Abstracting and non-profit use of the material is permitted with credit to the source. Statements and opinions expressed in the chapters are those of the individual contributors and not necessarily those of the editors or publisher. No responsibility is accepted for the accuracy of information contained in the published articles. The publisher assumes no responsibility or liability for any damage or injury to persons or property arising out of the use of any materials, instructions, methods or ideas contained inside. After this work has been published by the Advanced Robotic Systems International, authors have the right to republish it, in whole or part, in any publication of which they are an author or editor, and to make other personal use of the work.
© 2007 I-Tech Education and Publishing
A catalog record for this book is available from the Austrian Library
Face Recognition, Edited by Kresimir Delac and Mislav Grgic
p. cm.
ISBN 3-86611-283-1
1. Face Recognition 2. Face Synthesis 3. Applications
Face recognition is a task humans perform remarkably easily and successfully. This apparent simplicity was shown to be dangerously misleading, as automatic face recognition seems to be a problem that is still far from solved. In spite of more than 20 years of extensive research and a large number of papers published in journals and conferences dedicated to this area, we still cannot claim that artificial systems can measure up to human performance. Automatic face recognition is intricate primarily because of difficult imaging conditions (lighting and viewpoint changes induced by body movement) and because of various other effects like aging, facial expressions, occlusions etc. Researchers from computer vision, image analysis and processing, pattern recognition, machine learning and other areas are working jointly, motivated largely by a number of possible practical applications.
The goal of this book is to give a clear picture of the current state-of-the-art in the field of
automatic face recognition across three main areas of interest: biometrics, cognitive models and human-computer interaction. Face recognition has an important advantage over other biometric technologies: it is a nonintrusive and easy to use method. As such, it became one of three identification methods used in e-passports and a biometric of choice for many other security applications. Cognitive and perception models constitute an important platform for inter-disciplinary research, connecting scientists from seemingly incompatible areas and enabling them to exchange methodologies and results on a common problem. Evidence from neuro-biological, psychological, perceptual and cognitive experiments provides potentially useful insights into how our visual system codes, stores and recognizes faces. These insights can then be connected to artificial solutions. On the other hand, it is generally believed that the success or failure of automatic face recognition systems might inform the cognitive and perception science community about which models have the potential to be candidates for those used by humans. Making robots and computers more "human" (through human-computer interaction) will improve the quality of human-robot co-existence in the same space and thus ease their adoption into our everyday lives. In order to achieve this, robots must be able to identify faces, expressions and emotions while interacting with humans.
Hopefully, this book will serve as a handbook for students, researchers and practitioners in the area of automatic (computer) face recognition and inspire some future research ideas by identifying potential research directions. The book consists of 28 chapters, each focusing on a certain aspect of the problem. Within every chapter the reader will be given an overview of background information on the subject at hand and in many cases a description of the authors' original proposed solution. The chapters in this book are sorted alphabetically, according to the first author's surname. They should give the reader a general idea where the current research efforts are heading, both within the face recognition area itself and in inter-disciplinary approaches.
Chapter 1 describes a face recognition system based on 3D features, with applications in an Ambient Intelligence Environment. The system is placed within a framework of home automation: a community of smart objects powered by high user-friendliness. Chapter 2 addresses one of the most intensely researched problems in face recognition: the problem of achieving illumination invariance. The authors deal with this problem through a novel framework based on simple image filtering techniques. In chapter 3 a novel method for precise automatic localization of certain characteristic points in a face (such as the centers and the corners of the eyes, tip of the nose, etc.) is presented. An interesting analysis of the recognition rate as a function of eye localization precision is also given. Chapter 4 gives a detailed introduction to wavelets and their application in face recognition as tools for image preprocessing and feature extraction.
Chapter 5 reports on an extensive experiment performed in order to analyze the effects of JPEG and JPEG2000 compression on face recognition performance. It is shown that the tested recognition methods are remarkably robust to compression, and the conclusions are statistically confirmed using McNemar's hypothesis testing. Chapter 6 introduces a feed-forward neural network architecture combined with PCA and LDA into a novel approach. Chapter 7 addresses the multi-view recognition problem by using a variant of SVM and decomposing the problem into a series of easier two-class problems. Chapter 8 describes three different hardware platforms dedicated to face recognition and brings us one step closer to real-world implementation. In chapter 9 the authors combine face and gesture recognition in a human-robot interaction framework.
Chapter 10 considers a fuzzy-geometric approach and symbolic data analysis for modeling the uncertainty of information about facial features. Chapter 11 reviews some known approaches (e.g. PCA, LDA, LPP, LLE, etc.) and presents a case study of intelligent face recognition using global pattern averaging. A theoretical analysis and application suggestion of the compact optical parallel correlator for face recognition is presented in chapter 12. Improving the quality of co-existence of humans and robots in the same space through another merge of face and gesture recognition is presented in chapter 13, and spontaneous facial action recognition is addressed in chapter 14.
Based on lessons learned from human visual system research and contrary to the traditional practice of focusing recognition on internal face features (eyes, nose, and mouth), in chapter 15 the possibility of using external features (hair, forehead, laterals, ears, jaw line and chin) is explored. In chapter 16 a hierarchical neural network architecture is used to define a common framework for higher level cognitive functions. Simulation is performed indicating that both face recognition and facial expression recognition can be realized efficiently using the presented framework. Chapter 17 gives a detailed mathematical overview of some traditional and modern subspace analysis methods, and chapter 18 reviews in depth some nearest feature classifiers and introduces dissimilarity representations as a recognition tool. In chapter 19 the authors present a security system in which an image of a known person is matched against multiple images extracted from a video fragment of a person approaching a protected entrance.
Chapter 20 presents recent advances in machine analysis of facial expressions, with special attention devoted to several techniques recently proposed by the authors. 3D face recognition is covered in chapter 21. Basic approaches are discussed and an extensive list of references is given, making this chapter an ideal starting point for researchers new to the area. After a multi-modal human verification system using face and speech is presented in chapter 22, the same authors present a new face detection and recognition method using optimized 3D information from stereo images in chapter 23. A far-field, unconstrained, video-to-video face recognition system is proposed in chapter 24.
Chapter 25 examines the results of research on humans in order to come up with some hints for designs of artificial systems for face recognition. Frequency domain processing and representation of faces is reviewed in chapter 26, along with a thorough analysis of a family of advanced frequency domain matching algorithms collectively known as advanced correlation filters. Chapter 27 addresses the problem of class-based image synthesis and recognition with varying illumination conditions. Chapter 28 presents a mixed reality virtual system within a framework using stereo video and a 3D computer graphics model.
Mislav Grgic
University of Zagreb, Faculty of Electrical Engineering and Computing, Department of Wireless Communications, Unska 3/XII, HR-10000 Zagreb, Croatia
Preface
1 3D Face Recognition in an Ambient Intelligence Environment Scenario
Andrea F. Abate, Stefano Ricciardi and Gabriele Sabatino
2 Achieving Illumination Invariance using Image Filters
Ognjen Arandjelovic and Roberto Cipolla
3 Automatic Facial Feature Extraction for Face Recognition
Paola Campadelli, Raffaella Lanzarotti and Giuseppe Lipori
4 Wavelets and Face Recognition
Dao-Qing Dai and Hong Yan
5 Image Compression Effects in Face Recognition Systems
Kresimir Delac, Mislav Grgic and Sonja Grgic
6 PCA and LDA based Neural Networks for Human Face Recognition
Alaa Eleyan and Hasan Demirel
7 Multi-View Face Recognition with Min-Max Modular Support Vector Machines
Zhi-Gang Fan and Bao-Liang Lu
8 Design, Implementation and Evaluation of Hardware Vision Systems dedicated to Real-Time Face Recognition
Ginhac Dominique, Yang Fan and Paindavoine Michel
9 Face and Gesture Recognition for Human-Robot Interaction
Md Hasanuzzaman and Haruki Ueno
10 Modelling Uncertainty in Representation of Facial Features for Face Recognition
Hiremath P.S., Ajit Danti and Prabhakar C.J.
11 Intelligent Global Face Recognition
Adnan Khashman
12 Compact Parallel Optical Correlator for Face Recognition and its Application
Kashiko Kodate and Eriko Watanabe
13 Human Detection and Gesture Recognition Based on Ambient Intelligence
Naoyuki Kubota
14 Investigating Spontaneous Facial Action Recognition through AAM Representations of the Face
Simon Lucey, Ahmed Bilal Ashraf and Jeffrey F. Cohn
15 Measuring External Face Appearance for Face Classification
David Masip, Agata Lapedriza and Jordi Vitria
16 Selection and Efficient Use of Local Features for Face and Facial Expression Recognition in a Cortical Architecture
Masakazu Matsugu
17 Image-based Subspace Analysis for Face Recognition
Vo Dinh Minh Nhat and SungYoung Lee
18 Nearest Feature Rules and Dissimilarity Representations for Face Recognition Problems
Mauricio Orozco-Alzate and German Castellanos-Dominguez
19 Improving Face Recognition by Video Spatial Morphing
Armando Padilha, Jorge Silva and Raquel Sebastiao
20 Machine Analysis of Facial Expressions
Maja Pantic and Marian Stewart Bartlett
21 3D Face Recognition
Theodoros Papatheodorou and Daniel Rueckert
22 Multi-Modal Human Verification using Face and Speech
Changhan Park and Joonki Paik
23 Face Recognition Using Optimized 3D Information from Stereo Images
Changhan Park and Joonki Paik
24 Far-Field, Multi-Camera, Video-to-Video Face Recognition
Aristodemos Pnevmatikakis and Lazaros Polymenakos
25 Facing Visual Tasks Based on Different Cognitive Architectures
Marcos Ruiz-Soler and Francesco S. Beltran
26 Frequency Domain Face Recognition
Marios Savvides, Ramamurthy Bhagavatula, Yung-hui Li and Ramzi Abiantun
27 From Canonical Face to Synthesis: An Illumination Invariant Face Recognition Approach
Tele Tan
28 A Feature-level Fusion of Appearance and Passive Depth Information for Face Recognition
Jian-Gang Wang, Kar-Ann Toh, Eric Sung and Wei-Yun Yau
3D Face Recognition in an Ambient Intelligence Environment Scenario
Andrea F. Abate, Stefano Ricciardi and Gabriele Sabatino
Dip. di Matematica e Informatica, Università degli Studi di Salerno
Italy
1 Introduction
Information and Communication Technologies are increasingly entering all aspects of our life and all sectors, opening a world of unprecedented scenarios where people interact with electronic devices embedded in environments that are sensitive and responsive to the presence of users. Indeed, since the first examples of "intelligent" buildings featuring computer aided security and fire safety systems, the request for more sophisticated services, provided according to each user's specific needs, has characterized the new tendencies within domotic research. The result of the evolution of the original concept of home automation is known as Ambient Intelligence (Aarts & Marzano, 2003), referring to an environment viewed as a "community" of smart objects powered by computational capability and high user-friendliness, capable of recognizing and responding to the presence of different individuals in a seamless, non-intrusive and often invisible way. As adaptivity here is the key to providing customized services, the role of person sensing and recognition becomes of fundamental importance.
This scenario offers the opportunity to exploit the potential of the face as a non-intrusive biometric identifier, not just to regulate access to the controlled environment but to adapt the provided services to the preferences of the recognized user. Biometric recognition (Maltoni et al., 2003) refers to the use of distinctive physiological (e.g., fingerprints, face, retina, iris) and behavioural (e.g., gait, signature) characteristics, called biometric identifiers, for automatically recognizing individuals. Because biometric identifiers cannot be easily misplaced, forged, or shared, they are considered more reliable for person recognition than traditional token or knowledge-based methods. Other typical objectives of biometric recognition are user convenience (e.g., service access without a Personal Identification Number) and better security (e.g., difficult to forge access). All these reasons make biometrics very well suited for Ambient Intelligence applications, and this is especially true for a biometric identifier such as the face, which is one of the most common methods of recognition that humans use in their visual interactions, and which allows the user to be recognized in a non-intrusive way without any physical contact with the sensor.
A generic biometric system can operate either in verification or identification modality, better known as one-to-one and one-to-many recognition (Perronnin & Dugelay, 2003). In the proposed Ambient Intelligence application we are interested in one-to-one recognition, as we want to recognize authorized users accessing the controlled environment or requesting a specific service.
We present a face recognition system based on 3D features to verify the identity of subjects accessing the controlled Ambient Intelligence Environment and to customize all the services accordingly. In other terms, it adds a social dimension to man-machine communication and thus may help to make such environments more attractive to the human user. The proposed approach relies on stereoscopic face acquisition and 3D mesh reconstruction to avoid highly expensive and non-automated 3D scanning, typically not suited for real time applications. For each subject enrolled, a bidimensional feature descriptor is extracted from its 3D mesh and compared to the previously stored corresponding template. This descriptor is a normal map, namely a color image in which the RGB components represent the normals to the face geometry. A weighting mask, automatically generated for each authorized person, improves recognition robustness to a wide range of facial expressions.
This chapter is organized as follows. In section 2 related works are presented and the proposed method is introduced. In section 3 the proposed face recognition method is presented in detail. In section 4 the Ambient Intelligence framework is briefly discussed and experimental results are shown and commented. The chapter concludes in section 5 showing directions for future research and conclusions.
2 Related Works
In their survey of the state of the art in 3D and multi-modal face recognition, Bowyer et al. (Bowyer et al., 2004) describe the most recent results and research trends, showing that "the variety and sophistication of algorithmic approaches explored is expanding". The main challenges in this field turn out to be the improvement of recognition accuracy, a greater robustness to facial expressions and, more recently, the efficiency of algorithms. Many methods are based on Principal Component Analysis (PCA); such is the case of Hester et al. (Hester et al., 2003), who tested the potential and the limits of PCA varying the number of eigenvectors and the size of range images. Pan et al. (Pan et al., 2005) apply PCA to a novel mapping of the 3D data to a range, or depth, image, while Xu et al. (Xu et al., 2004) divide the face in sub-regions using the nose as the anchor, PCA to reduce feature space dimensionality and minimum distance for matching. Another major research trend is based on the Iterative Closest Point (ICP) algorithm, which has been exploited in many variations for 3D shape aligning, matching or both. The first example of this kind of approach to face recognition was presented by Medioni and Waupotitsch (Medioni & Waupotitsch, 2003); then Lu and Jain (Lu & Jain, 2005) developed an extended version aimed to cope with expressive variations, whereas Chang et al. (Chang et al., 2005) proposed to apply ICP not to the whole face but to a set of selected subregions instead.
As a real face is fully described by its 3D shape and its texture, it is reasonable to use both kinds of data (geometry and color or intensity) to improve recognition reliability: this is the idea behind Multi-Modal or (3D+2D) face recognition. The work by Tsalakanidou et al. (Tsalakanidou et al., 2003) is based on PCA to compare both the probe's range image and intensity/color image to the gallery; Papatheodorou and Rueckert (Papatheodorou & Rueckert, 2004) presented a 4D registration method based on the Iterative Closest Point (ICP) algorithm, augmented with texture data. Bronstein et al. (Bronstein et al., 2003) propose a multi-modal 3D + 2D recognition using eigen decomposition of flattened textures and canonical images. Other authors combine 3D and 2D similarity scores obtained comparing 3D and 2D profiles (Beumier & Acheroy, 2000), or extract a feature vector combining Gabor filter responses in 2D and point signatures in 3D (Wang et al., 2003).
3 Description of the Facial Recognition System
The basic idea behind the proposed system is to represent the user's facial surface by a digital signature called a normal map. A normal map is an RGB color image providing a 2D representation of the 3D facial surface, in which the normal to each polygon of a given mesh is represented by an RGB color pixel. To this aim, we project the 3D geometry onto 2D space through spherical mapping. The result is a bidimensional representation of the original face geometry which retains the spatial relationships between facial features. Color information coming from the face texture is used to mask possible beard-covered regions according to their relevance, resulting in an 8 bit greyscale filter mask (Flesh Mask). Then, a variety of facial expressions is generated from the neutral pose through a rig-based animation technique, and the corresponding normal maps are used to compute a further 8 bit greyscale mask (Expression Weighting Mask) aimed to cope with expression variations. The two greyscale masks are then multiplied and the resulting map is used to augment the normal map with an extra 8 bits per pixel, resulting in a 32 bit RGBA bitmap (Augmented Normal Map). The whole process (see Figure 1) is discussed in depth in the following subsections 3.1 to 3.4.
Figure 1 Facial and Facial Expression Recognition workflow
3.1 Face Capturing
As the proposed method works on 3D polygonal meshes, we first need to acquire actual faces and represent them as polygonal surfaces. The Ambient Intelligence context in which we are implementing face recognition requires fast user enrollment to avoid annoying waiting times. Usually, most 3D face recognition methods work on a range image of the face, captured with a laser or structured light scanner. These kinds of devices offer high resolution in the captured data, but they are too slow for real time face acquisition. Unwanted face motion during capture can be another issue, while laser scanning may not be harmless to the eyes.
For all these reasons we opted for 3D mesh reconstruction from stereoscopic images, based on (Enciso et al., 1999), as it requires simple equipment more likely to be adopted in a real application: a couple of digital cameras shooting at high shutter speed from two slightly different angles with strobe lighting. Though the resulting face shape accuracy is inferior compared to real 3D scanning, it proved to be sufficient for recognition yet much faster, with a total time required for mesh reconstruction of about 0.5 sec on a P4/3.4 GHz based PC, offering additional advantages, such as precise mesh alignment in 3D space thanks to the warp based approach, facial texture generation from the two captured orthogonal views and its automatic mapping onto the reconstructed face geometry.
3.2 Building a Normal Map
As the 3D polygonal mesh resulting from the reconstruction process is an approximation of the actual face shape, polygon normals describe the local curvature of the captured face, which can be viewed as its signature. As shown in Figure 2, we intend to represent these normals by a color image, transferring the face's 3D features into a 2D space. We also want to preserve the spatial relationships between facial features, so we project the vertices' 3D coordinates onto a 2D space using a spherical projection. We can now store the normals of mesh M in a bidimensional array N using mapping coordinates; in this way each pixel represents a normal as RGB values. We refer to the resulting array as the Normal Map N of mesh M, and this is the signature we intend to use for identity verification.
Figure 2 (a) 3D mesh model, (b) wireframe model, (c) projection in 2D spatial coordinates, (d) normal map
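As an illustration of this construction, the Python sketch below (our own simplified rendition, not the authors' code) projects per-vertex normals through a spherical mapping and encodes them as RGB pixels; a full implementation would rasterize each polygon's normal over its projected footprint rather than splatting single vertices.

```python
import numpy as np

def spherical_uv(vertices):
    """Spherical projection of 3D vertex coordinates to (u, v) in [0, 1].
    Assumes the mesh is centered on the origin, as guaranteed by the
    normalization performed during reconstruction."""
    x, y, z = vertices[:, 0], vertices[:, 1], vertices[:, 2]
    r = np.linalg.norm(vertices, axis=1) + 1e-12
    u = np.arctan2(x, z) / (2.0 * np.pi) + 0.5        # longitude
    v = np.arccos(np.clip(y / r, -1.0, 1.0)) / np.pi  # colatitude
    return u, v

def build_normal_map(vertices, normals, size=128):
    """Rasterize per-vertex unit normals into a size x size RGB image:
    each normal component in [-1, 1] is remapped to an 8-bit channel."""
    u, v = spherical_uv(vertices)
    cols = np.clip((u * (size - 1)).astype(int), 0, size - 1)
    rows = np.clip((v * (size - 1)).astype(int), 0, size - 1)
    nmap = np.zeros((size, size, 3), dtype=np.uint8)
    nmap[rows, cols] = ((normals + 1.0) * 127.5).astype(np.uint8)
    return nmap
```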
3.3 Normal Map Comparison
To compare the normal map $N_A$ from the input subject to another normal map $N_B$ previously stored in the reference database, we compute through:

$$\theta = \arccos\left( r_{N_A} \cdot r_{N_B} + g_{N_A} \cdot g_{N_B} + b_{N_A} \cdot b_{N_B} \right) \qquad (1)$$

the angle included between each pair of normals represented by the colors of pixels with corresponding mapping coordinates, and store it in a new Difference Map D, with components r, g and b opportunely normalized from the spatial domain to the color domain, so that $0 \leq r_{N_A}, g_{N_A}, b_{N_A} \leq 1$ and $0 \leq r_{N_B}, g_{N_B}, b_{N_B} \leq 1$. The value $\theta$, with $0 \leq \theta < \pi$, is the angular difference between the pixels with coordinates $(x_{N_A}, y_{N_A})$ in $N_A$ and $(x_{N_B}, y_{N_B})$ in $N_B$, and it is stored in D as a gray-scale color. At this point, the histogram H is analyzed to estimate the
similarity score between $N_A$ and $N_B$. On the X axis we represent the resulting angles between each pair of comparisons (sorted from 0° to 180°), while on the Y axis we represent the total number of differences found. The curvature of H represents the angular distance distribution between meshes $M_A$ and $M_B$; thus two similar faces feature very high values on small angles, whereas two unlike faces have more distributed
differences (see Figure 3). We define a similarity score through a weighted sum between H and a Gaussian function G, as in:

$$\text{similarity score} = \sum_{x=0}^{k} H(x) \cdot \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}} \qquad (2)$$

where by varying σ and k it is possible to change the recognition sensitivity. To reduce the effects of residual face misalignment during the acquisition and sampling phases, we calculate the angle θ using a k × k (usually 3 × 3 or 5 × 5) matrix of neighbour pixels.
Figure 3 Example of histogram H representing the angular distances: (a) shows a typical histogram between two similar Normal Maps, while (b) one between two different Normal Maps
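The following Python sketch is our own illustrative rendition of eqs. (1) and (2), under the assumption that the sum in eq. (2) runs over the first k one-degree histogram bins, which is consistent with the values σ = 4.5 and k = 50 reported in section 4.1; it operates on normal maps stored as float RGB arrays.

```python
import numpy as np

def difference_map(na, nb):
    """Per-pixel angular difference (eq. 1) between two normal maps,
    given as (H, W, 3) float arrays with r, g, b normalized to [0, 1]."""
    dot = np.clip(np.sum(na * nb, axis=2), -1.0, 1.0)
    return np.arccos(dot)  # theta in [0, pi] for each pixel pair

def similarity_score(na, nb, sigma=4.5, k=50):
    """Gaussian-weighted histogram sum (eq. 2). Small angular differences
    (similar faces) dominate the score; larger ones are suppressed."""
    theta = np.degrees(difference_map(na, nb)).ravel()
    hist, _ = np.histogram(theta, bins=180, range=(0.0, 180.0))
    x = np.arange(k)
    gauss = np.exp(-x**2 / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))
    return float(np.sum(hist[:k] * gauss))
```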
3.4 Addressing Beard and Facial Expressions via an 8 bit Alpha Channel
The presence of a beard of variable length covering a portion of the face surface in a subject previously enrolled without it (or vice-versa) could lead to a measurable difference in the overall or local 3D shape of the face mesh (see Figure 4). In this case the recognition accuracy could be affected, resulting, for instance, in a higher False Rejection Rate (FRR). To improve the robustness to this kind of variable facial features we rely on color data from the captured face texture to mask the non-skin regions, possibly disregarding them during the comparison.
Figure 4 Normal maps of the same subject enrolled in two different sessions with and without beard
We exploit flesh hue characterization in the HSB color space to discriminate between skin and beard/moustaches/eyebrows. Indeed, the hue component of each given texel is much less affected by lighting conditions during capture than its corresponding RGB value. Nevertheless, there can be a wide range of hue values within each skin region due to factors like facial morphology, skin conditions and pathologies, race, etc., so we need to define this range on a case by case basis to obtain a valid mask. To this aim we use a set of specific hue sampling spots located over the face texture at absolute coordinates, selected to be representative of the flesh's full tonal range and possibly distant enough from eyes, lips and typical beard and hair covered regions.
Figure 5 Flesh Hue sampling points (a), Flesh Hue Range (b), non-skin regions in white (c)
This is possible because each face mesh and its texture are centered and normalized during the image based reconstruction process (i.e. the face's median axis is always centered on the origin of 3D space with horizontal mapping coordinates equal to 0.5), otherwise normal map comparison would not be possible. We could use a 2D or 3D technique to locate the main facial features (eyes, nose and lips) and position the sampling spots relative to these features, but even these approaches are not safe under all conditions. For each sampling spot we sample not just that texel but a 5 x 5 matrix of neighbour texels, averaging them to minimize the effect of local image noise. As any sampling spot could casually pick wrong values due to local skin color anomalies such as moles, scars or even improper positioning, we calculate the median of all resulting hue values from all sampling spots, resulting in a main Flesh Hue Value FHV which is the center of the valid flesh hue range. We therefore consider as belonging to the skin region all the texels whose hue value is within the range [FHV − t, FHV + t], where t is a hue tolerance which we experimentally found could be set below 10° (see Figure 5-b). After the skin region has been selected, it is filled with pure white while the remaining pixels are converted to a greyscale value depending on their distance from the selected flesh hue range (the more the distance the darker the value).
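A possible implementation of this masking step is sketched below in Python (our own illustration; the exact sampling spot coordinates and the grey falloff rate outside the tolerance band are assumptions, as the chapter does not specify them).

```python
import numpy as np

def rgb_to_hue(texture):
    """Hue channel in degrees [0, 360) from an RGB texture in [0, 1]."""
    r, g, b = texture[..., 0], texture[..., 1], texture[..., 2]
    mx = texture.max(axis=-1)
    mn = texture.min(axis=-1)
    d = np.where(mx == mn, 1e-12, mx - mn)
    h = np.where(mx == r, ((g - b) / d) % 6.0,
        np.where(mx == g, (b - r) / d + 2.0, (r - g) / d + 4.0))
    return (h * 60.0) % 360.0

def flesh_mask(texture, sampling_spots, t=10.0, falloff=2.0):
    """8-bit Flesh Mask: pure white over skin texels, darker elsewhere in
    proportion to the hue distance from the range [FHV - t, FHV + t].

    sampling_spots: (row, col) coordinates of the fixed hue sampling points
    on the normalized texture; a 5 x 5 neighbourhood is averaged at each.
    The linear falloff rate outside the tolerance band is an assumption."""
    hue = rgb_to_hue(texture)
    samples = [hue[r - 2:r + 3, c - 2:c + 3].mean() for r, c in sampling_spots]
    fhv = np.median(samples)                            # main Flesh Hue Value
    dist = np.abs((hue - fhv + 180.0) % 360.0 - 180.0)  # circular hue distance
    grey = np.clip(255.0 - (dist - t) * falloff, 0.0, 255.0)
    return np.where(dist <= t, 255.0, grey).astype(np.uint8)
```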
Trang 19To improve the facial recognition system and to address facial expressions we opt to the use
of expression weighting mask, a subject specific pre-calculated mask aimed to assign different relevance to different face regions This mask, which shares the same size of normal map and difference map, contains for each pixel an 8 bit weight encoding the local rigidity of the face surface based on the analysis of a pre-built set of facial expressions of the same subject Indeed, for each subject enrolled, each of expression variations (see Figure 6) is compared to the neutral face resulting in difference maps
Figure 6 An example of normal maps of the same subject featuring a neutral pose (leftmost face) and different facial expressions
The average of this set of difference maps specific to the same individual represent its expression weighting mask More precisely, given a generic face with its normal map N0
(neutral face) and the set of normal maps N1, N2, …, Nn (the expression variations), we first calculate the set of difference map D1, D2, …, Dn resulting from {N0 - N1, N0 - N2, …, N0 –
Nn} The average of set {D1, D2, …, Dn} is the expression weighting mask which is multiplied
by the difference map in each comparison between two faces
We generate the expression variations through a parametric rig based deformation system previously applied to a prototype face mesh, morphed to fit the reconstructed face mesh (Enciso et al., 1999). This fitting is achieved via a landmark-based volume morphing where the transformation and deformation of the prototype mesh is guided by the interpolation of a set of landmark points with a radial basis function. To improve the accuracy of this rough mesh fitting we need a surface optimization, obtained by minimizing a cost function based on the Euclidean distance between vertices.
So we can augment each 24 bit normal map with the product of the Flesh Mask and the Expression Weighting Mask normalized to 8 bits (see Figure 7). The resulting 32 bit per pixel RGBA bitmap can be conveniently managed via various image formats like the Portable Network Graphics format (PNG), which is typically used to store for each pixel 24 bits of colour and 8 bits of alpha channel (transparency). When comparing any two faces, the difference map is computed on the first 24 bits of color info (normals) and multiplied by the alpha channel (filtering mask).
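The two masking steps can be combined as sketched below (again an illustrative Python fragment; in particular, the rescaling of the averaged difference maps to 8-bit weights, inverted so that rigid regions receive larger weights, is our assumption about a detail the chapter leaves implicit).

```python
import numpy as np
# difference_map as defined in the comparison sketch above

def expression_weighting_mask(neutral, variations):
    """Average the difference maps between the neutral normal map and each
    expression variation (eq. 1 per pair), then rescale to 8-bit weights.
    Regions that move a lot across expressions (large average difference)
    get low weights, so they count less in later comparisons (assumption)."""
    avg = np.mean([difference_map(neutral, v) for v in variations], axis=0)
    return (255.0 * (1.0 - avg / np.pi)).astype(np.uint8)

def augmented_normal_map(nmap_rgb, flesh, expr):
    """Pack a 32-bit RGBA Augmented Normal Map: normals in RGB, and the
    product of Flesh Mask and Expression Weighting Mask in the alpha channel."""
    alpha = (flesh.astype(np.float32) * expr.astype(np.float32)) / 255.0
    return np.dstack([nmap_rgb, alpha.astype(np.uint8)])
```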
4 Testing the Face Recognition System in an Ambient Intelligence Framework
Ambient Intelligence (AmI) worlds offer exciting potential for rich interactive experiences. The metaphor of AmI envisages the future as intelligent environments where humans are surrounded by smart devices that make the ambient itself perceptive to humans' needs or wishes. The Ambient Intelligence Environment can be defined as the set of actuators and sensors composing the system, together with the domotic interconnection protocol. People interact with electronic devices embedded in environments that are sensitive and responsive to the presence of users. This objective is achievable if the environment is capable of learning, building and manipulating user profiles, considering on one side the need to clearly identify the human attitude; in other terms, on the basis of the physical and emotional user status captured from a set of biometric features.
Figure 7 Comparison of two Normal Maps using the Flesh Mask, and the resulting Difference Map (c)
Figure 8 Ambient Intelligence Architecture
To design Ambient Intelligent Environments, many methodologies and techniques have to be merged together, originating the many approaches reported in recent literature (Basten & Geilen, 2003). We opt for a framework aimed at gathering biometrical and environmental data, described in (Acampora et al., 2005), to test the effectiveness of face recognition in aiding security and recognizing the emotional user status. This AmI system's architecture is organized in several sub-systems, as depicted in Figure 8, and it is based on the following sensors and actuators: internal and external temperature sensors and an internal temperature actuator, internal and external luminosity sensors and an internal luminosity actuator, an indoor presence sensor, an infrared camera to capture thermal images of the user and a set of color cameras to capture information about gait and facial features. Firstly, Biometric Sensors are used to gather the user's biometrics (temperature, gait, position, facial expression, etc.) and part of this information is handled by Morphological Recognition Subsystems (MRS) able to organize it semantically. The resulting description, together with the remaining biometrics previously captured, is organized in a hierarchical structure based on XML technology in order to create a new markup language, called H2ML (Human to Markup Language), representing the user status at a given time. Considering a sequence of H2ML descriptions, the Behavioral Recognition Engine (BRE) tries to recognize a particular user behaviour for which the system is able to provide suitable services. The available services are regulated by means of the Service Regulation System (SRS), an array of fuzzy controllers coded in FML (Acampora & Loia, 2004), aimed to achieve hardware transparency and to minimize the fuzzy inference time.
This architecture is able to distribute personalized services on the basis of the physical and emotional user status captured from a set of biometric features and modelled by means of a mark-up language based on XML. This approach is particularly suited to exploit biometric technologies to capture the user's physical info, gathered in a semantic representation describing a human in terms of morphological features.
4.1 Experimental Results
As one of the aims of the experiments was to test the performance of the proposed method in a realistic operative environment, we decided to build a 3D face database from the face capture station used in the domotic system described above. The capture station featured two digital cameras with external electronic strobes shooting simultaneously with a shutter speed of 1/250 sec while the subject was looking at a blinking led to reduce posing issues. More precisely, every face model in the gallery has been created by deforming a pre-aligned prototype polygonal face mesh to closely fit a set of facial features extracted from front and side images of each individual enrolled in the system.
Indeed, for each enrolled subject a set of corresponding facial features extracted by a structured snake method from the two orthogonal views are correlated first and then used to guide the prototype mesh warping, performed through a Dirichlet Free Form Deformation. The two captured face images are aligned, combined and blended, resulting in a color texture precisely fitting the reconstructed face mesh through the feature points previously extracted. The prototype face mesh used in the dataset has about 7K triangular facets, and even if it is possible to use meshes with a higher level of detail we found this resolution to be adequate for face recognition. This is mainly due to the optimized tessellation, which privileges key areas such as eyes, nose and lips, whereas a typical mesh produced by a 3D scanner features almost evenly spaced vertices. Another remarkable advantage involved in the warp based mesh generation is the ability to reproduce a broad range of face variations through a rig based deformation system. This technique is commonly used in computer graphics for facial animation (Lee et al., 1995; Blanz & Vetter, 1999) and is easily applied to the prototype mesh by linking the rig system to specific subsets of vertices on the face surface. Any facial expression can be mimicked by opportunely combining the effect of the rigs controlling lips, mouth shape, eye closing or opening, nose tip or bridge, cheek shape, eyebrows shape, etc. The facial deformation model we used is based on (Lee et al., 1995) and the resulting expressions are anatomically correct.
We augmented the 3D dataset of each enrolled subject through the synthesis of fifteen additional expressions, selected to represent typical face shape deformations due to the facial expressive muscles, each one included in the weighting mask. The fifteen variations of the neutral face are grouped in three different classes: "good-mood", "normal-mood" and "bad-mood" emotional status (see Figure 9).
We acquired three sets of front-side pairs of face images from 235 different persons, in three subjective facial expressions representing the "normal-mood", "good-mood" and "bad-mood" emotional status respectively (137 males and 98 females, age ranging from 19 to 65).
Figure 9 Facial Expressions grouped in normal-mood (first row), good-mood (second row), bad-mood (third row)
For the first group of experiments, we obtained a database of 235 3D face models in neutral pose (represented by the "normal-mood" status), each one augmented with fifteen expressive variations. Experimental results are generally good in terms of accuracy, showing a Recognition Rate of 100% using the expression weighting mask and flesh mask, the Gaussian function with σ = 4.5 and k = 50 and normal maps sized 128 × 128 pixels. These results are generally better than those obtained by many 2D algorithms, but a more meaningful comparison would require a face dataset featuring both 2D and 3D data. To this aim we experimented with a PCA-based 2D face recognition algorithm (Moon & Phillips, 1998; Martinez & Kak, 2001) on the same subjects. We trained the PCA-based recognition system with frontal face images acquired during several enrolment sessions (from 11 to 13 images for each subject), while the probe set is obtained from the same frontal images used to generate the 3D face mesh for the proposed method. This experiment has shown that our method produces better results than a typical PCA-based recognition algorithm on the same subjects. More precisely, the PCA-based method reached a recognition rate of 88.39% on gray-scale images sized 200 × 256 pixels, proving that the face dataset was really challenging.
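For reference, a minimal eigenfaces-style PCA baseline of the kind used in this comparison can be sketched as follows (an illustrative reconstruction, not the exact configuration of (Moon & Phillips, 1998); the number of components is an arbitrary choice).

```python
import numpy as np

def train_pca(train_images, n_components=40):
    """Eigenfaces-style PCA training. train_images: (N, H*W) array of
    flattened grey-scale images; returns the mean face and the top
    principal components (eigenfaces)."""
    mean = train_images.mean(axis=0)
    centered = train_images - mean
    # right singular vectors of the centered data are the eigenfaces
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def project(images, mean, eigenfaces):
    """Project images into the PCA subspace."""
    return (images - mean) @ eigenfaces.T

def identify(probe_features, gallery_features):
    """Nearest-neighbour identification by Euclidean distance."""
    d = np.linalg.norm(probe_features[:, None, :] - gallery_features[None, :, :], axis=2)
    return np.argmin(d, axis=1)
```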
Figure 10 shows the precision/recall improvement provided by the expression weighting mask and flesh mask. The results shown in Figure 10-a were achieved comparing, in one-to-many modality, a query set with one expressive variation to an answer set composed of one neutral face plus ten expression variations and one face with beard. In Figure 10-b the results of a one-to-many comparison between a subject with beard and an answer set composed of one neutral face and ten expressive variations are shown. Finally, for the test reported in Figure 10-c, the query was an expression variation or a face with beard, while the answer set could contain a neutral face plus ten associated expressive variations or a face with beard. The three charts clearly show the benefits involved with the use of both the expressive and flesh masks, especially when combined together.
The second group of experiments was conducted on the FRGC dataset rel. 2/Experiment 3s (only shape considered) to test the method's performance with respect to the Receiver Operating Characteristic (ROC) curve, which plots the False Acceptance Rate (FAR) against the Verification Rate (1 − False Rejection Rate, or FRR) for various decision thresholds. The 4007 faces provided in the dataset have undergone a pre-processing stage to allow our method to work effectively. The typical workflow included: mesh alignment using the embedded info provided by the FRGC dataset, such as outer eye corners, nose tip and chin prominence; mesh subsampling to one fourth of the original resolution; mesh cropping to eliminate unwanted detail (hair, neck, ears, etc.); normal map filtering by a 5 × 5 median filter to reduce capture noise and artifacts. Figure 11 shows the resulting ROC curves with typical ROC values at FAR = 0.001. The Equal Error Rate (EER) measured on the two galleries reaches 5.45% on our gallery and 6.55% on the FRGC dataset.
Figure 11 Comparison of ROC curves and Verification Rate at FAR=0.001
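To make the evaluation protocol concrete, a minimal sketch of how FAR, Verification Rate and EER can be computed from genuine and impostor score sets follows (our own illustration; variable names are ours).

```python
import numpy as np

def roc_points(genuine, impostor, thresholds):
    """FAR and Verification Rate (1 - FRR) for each decision threshold,
    given genuine and impostor similarity scores."""
    genuine, impostor = np.asarray(genuine), np.asarray(impostor)
    far = np.array([(impostor >= t).mean() for t in thresholds])
    vr = np.array([(genuine >= t).mean() for t in thresholds])
    return far, vr

def equal_error_rate(far, vr):
    """EER: the operating point where FAR equals FRR (= 1 - VR)."""
    frr = 1.0 - vr
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0
```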
Finally, we have tested the method in order to statistically evaluate its behaviour in recognizing the "emotional" status of the user. To this aim, we have performed a one-to-many comparison of a probe set of 3D face models representing real subjective mood status captured by camera (three facial expressions per person) with three gallery sets of artificial mood status generated automatically by the control rig based deformation system (fifteen facial expressions per person, grouped as shown in Figure 9). As shown in Table 1, the results are very interesting, because the mean recognition rate on the "good-mood" status gallery is 100%, while on the "normal-mood" and "bad-mood" status galleries it is 98.3% and 97.8% respectively (probably because of the propensity of people to make similar facial expressions for the "normal-mood" and "bad-mood" status).
5 Conclusions
This 3D face recognition method based on 3D geometry and color texture is aimed at improving robustness to the presence/absence of beard and to expressive variations. It proved to be simple and fast, and the experiments conducted showed a high average recognition rate and a measurable effectiveness of both the flesh mask and the expression weighting mask. Ongoing research will implement a true multi-modal version of the basic algorithm, with a second recognition engine dedicated to the color info (texture) which could further enhance the discriminating power.
6 References
Aarts, E. & Marzano, S. (2003). The New Everyday: Visions of Ambient Intelligence, 010 Publishing, Rotterdam, The Netherlands.
Acampora, G. & Loia, V. (2004). Fuzzy Control Interoperability for Adaptive Domotic Framework, Proceedings of the 2nd IEEE International Conference on Industrial Informatics (INDIN04), pp. 184-189, 24-26 June 2004, Berlin, Germany.
Acampora, G.; Loia, V.; Nappi, M. & Ricciardi, S. (2005). Human-Based Models for Smart Devices in Ambient Intelligence, Proceedings of the IEEE International Symposium on Industrial Electronics (ISIE 2005), pp. 107-112, June 20-23, 2005.
Basten, T. & Geilen, M. (2003). Ambient Intelligence: Impact on Embedded System Design, H. de Groot (Eds.), Kluwer Academic Pub., 2003.
Beumier, C. & Acheroy, M. (2000). Automatic Face Verification from 3D and Grey Level Cues, Proceedings of the 11th Portuguese Conference on Pattern Recognition (RECPAD 2000), May 2000, Porto, Portugal.
Blanz, V. & Vetter, T. (1999). A Morphable Model for the Synthesis of 3D Faces, Proceedings of SIGGRAPH 99, Los Angeles, CA, ACM, pp. 187-194, Aug. 1999.
Bowyer, K.W.; Chang, K. & Flynn, P.A. (2004). A Survey of 3D and Multi-Modal 3D+2D Face Recognition, Proceedings of the International Conference on Pattern Recognition (ICPR), 2004.
Bronstein, A.M.; Bronstein, M.M. & Kimmel, R. (2003). Expression-Invariant 3D Face Recognition, Proceedings of Audio and Video-Based Person Authentication (AVBPA 2003), LNCS 2688, J. Kittler and M.S. Nixon (Eds.), pp. 62-70, 2003.
Chang, K.I.; Bowyer, K. & Flynn, P. (2003). Face Recognition Using 2D and 3D Facial Data, Proceedings of the ACM Workshop on Multimodal User Authentication, pp. 25-32, December 2003.
Chang, K.I.; Bowyer, K.W. & Flynn, P.J. (2005). Adaptive Rigid Multi-Region Selection for Handling Expression Variation in 3D Face Recognition, Proceedings of the IEEE Workshop on Face Recognition Grand Challenge Experiments, June 2005.
Enciso, R.; Li, J.; Fidaleo, D.A.; Kim, T-Y.; Noh, J-Y. & Neumann, U. (1999). Synthesis of 3D Faces, Proceedings of the International Workshop on Digital and Computational Video (DCV'99), December 1999.
Hester, C.; Srivastava, A. & Erlebacher, G. (2003). A Novel Technique for Face Recognition Using Range Images, Proceedings of the Seventh Int'l Symposium on Signal Processing and Its Applications, 2003.
Lee, Y.; Terzopoulos, D. & Waters, K. (1995). Realistic Modeling for Facial Animation, Proceedings of SIGGRAPH 95, Los Angeles, CA, ACM, pp. 55-62, Aug. 1995.
Maltoni, D.; Maio, D.; Jain, A.K. & Prabhakar, S. (2003). Handbook of Fingerprint Recognition, Springer, New York.
Medioni, G. & Waupotitsch, R. (2003). Face Recognition and Modeling in 3D, Proceedings of the IEEE International Workshop on Analysis and Modeling of Faces and Gestures (AMFG 2003), pp. 232-233, October 2003.
Pan, G.; Han, S.; Wu, Z. & Wang, Y. (2005). 3D Face Recognition Using Mapped Depth Images, Proceedings of the IEEE Workshop on Face Recognition Grand Challenge Experiments, June 2005.
Papatheodorou, T. & Rueckert, D. (2004). Evaluation of Automatic 4D Face Recognition Using Surface and Texture Registration, Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 321-326, May 2004, Seoul, Korea.
Perronnin, G. & Dugelay, J.L. (2003). An Introduction to Biometrics and Face Recognition, Proceedings of IMAGE 2003: Learning, Understanding, Information Retrieval, Medical, Cagliari, Italy, June 2003.
Tsalakanidou, F.; Tzovaras, D. & Strintzis, M.G. (2003). Use of Depth and Colour Eigenfaces for Face Recognition, Pattern Recognition Letters, Vol. 24, No. 9-10, pp. 1427-1435, 2003.
Xu, C.; Wang, Y.; Tan, T. & Quan, L. (2004). Automatic 3D Face Recognition Combining Global Geometric Features with Local Shape Variation Information, Proceedings of the Sixth International Conference on Automated Face and Gesture Recognition, pp. 308-313, May 2004.
Wang, Y.; Chua, C. & Ho, Y. (2002). Facial Feature Detection and Face Recognition from 2D and 3D Images, Pattern Recognition Letters, Vol. 23, pp. 1191-1202, 2002.
Achieving Illumination Invariance using Image Filters
Ognjen Arandjelovic and Roberto Cipolla
Department of Engineering, University of Cambridge
UK
1 Introduction
In this chapter we are interested in accurately recognizing human faces in the presence of large and unpredictable illumination changes. Our aim is to do this in a setup realistic for most practical applications, that is, without overly constraining the conditions in which image data is acquired. Specifically, this means that people's motion and head poses are largely uncontrolled, the amount of available training data is limited to a single short sequence per person, and image quality is low.
In conditions such as these, invariance to changing lighting is perhaps the most significant practical challenge for face recognition algorithms. The illumination setup in which recognition is performed is in most cases impractical to control, its physics difficult to accurately model, and face appearance differences due to changing illumination are often larger than the differences between individuals [1]. Additionally, the nature of most real-world applications is such that prompt, often real-time system response is needed, demanding appropriately efficient as well as robust matching algorithms.
In this chapter we describe a novel framework for rapid recognition under varying illumination, based on simple image filtering techniques. The framework is very general and we demonstrate that it offers a dramatic performance improvement when used with a wide range of filters and different baseline matching algorithms, without sacrificing their computational efficiency.
1.1 Previous work and its limitations
The choice of representation, that is, the model used to describe a person's face, is central to the problem of automatic face recognition. Consider the components of a generic face recognition system schematically shown in Figure 1.
A number of approaches in the literature use relatively complex facial and scene models that explicitly separate extrinsic and intrinsic variables which affect appearance. In most cases, the complexity of these models makes it impossible to compute model parameters as a closed-form expression ("Model parameter recovery" in Figure 1). Rather, model fitting is performed through an iterative optimization scheme. In the 3D Morphable Model of Blanz and Vetter [7], for example, the shape and texture of a novel face are recovered through gradient descent by minimizing the discrepancy between the observed and predicted appearance. Similarly, in Elastic Bunch Graph Matching [8, 23], gradient descent is used to recover the placements of fiducial features, corresponding to bunch graph nodes and the locations of local texture descriptors. In contrast, the Generic Shape-Illumination Manifold method uses a genetic algorithm to perform a manifold-to-manifold mapping that preserves pose.
Figure 1 A diagram of the main components of a generic face recognition system. The "Model parameter recovery" and "Classification" stages can be seen as mutually complementary: (i) a complex model that explicitly separates extrinsic and intrinsic appearance variables places most of the workload on the former stage, while the classification of the representation becomes straightforward; in contrast, (ii) simplistic models have to resort to more statistically sophisticated approaches to matching
Figure 2 (a) The simplest generative model used for face recognition: images are assumed to consist of the low-frequency band that mainly corresponds to illumination changes, the mid-frequency band which contains most of the discriminative, personal information, and white noise. (b) The results of several of the most popular image filters operating under the assumption of the frequency model
One of the main limitations of this group of methods arises due to the existence of local minima, of which there are usually many. The key problem is that if the fitted model parameters correspond to a local minimum, classification is performed not merely on noise-contaminated but rather on entirely incorrect data. An additional unappealing feature of these methods is that it is also not possible to determine if model fitting failed in such a manner.
The alternative approach is to employ a simple face appearance model and put greater emphasis on the classification stage. This general direction has several advantages which make it attractive from a practical standpoint. Firstly, model parameter estimation can now be performed as a closed-form computation, which is not only more efficient, but also void of the issue of fitting failure such as can happen in an iterative optimization scheme. This allows for more powerful statistical classification, thus clearly separating well understood and explicitly modelled stages in the image formation process from those that are more easily learnt implicitly from training exemplars. This is the methodology followed in this chapter. The sections that follow describe the method in detail, followed by a report of experimental results.
2 Method details
2.1 Image processing filters
Most relevant to the material presented in this chapter are illumination-normalization methods that can be broadly described as quasi illumination-invariant image filters. These include high-pass [5] and locally-scaled high-pass filters [21], directional derivatives [1, 10, 13, 18], Laplacian-of-Gaussian filters [1], region-based gamma intensity correction filters [2, 17] and edge-maps [1], to name a few. These are most commonly based on very simple image formation models, for example modelling illumination as a spatially low-frequency band of the Fourier spectrum and identity-based information as high-frequency [5, 11], see Figure 2. Methods of this group can be applied in a straightforward manner to either single or multiple-image face recognition and are often extremely efficient. However, due to the simplistic nature of the underlying models, in general they do not perform well in the presence of extreme illumination changes.
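As a concrete illustration of two such filters, the Python sketch below (our own minimal rendition; the Self-Quotient Image as originally proposed uses an anisotropic, weighted smoothing kernel rather than the plain Gaussian assumed here) implements a simple high-pass filter and a self-quotient style locally-scaled variant.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def high_pass(image, sigma=2.0):
    """High-pass filter: remove the low-frequency band, assumed to carry
    mostly illumination rather than identity information."""
    return image - gaussian_filter(image, sigma)

def self_quotient(image, sigma=2.0, eps=1e-3):
    """Locally-scaled high-pass (self-quotient style): divide the image by
    its own smoothed estimate, so the result is approximately invariant to
    slowly varying multiplicative illumination."""
    return image / (gaussian_filter(image, sigma) + eps)
```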
2.2 Adapting to data acquisition conditions
The framework proposed in this chapter is motivated by our previous research and the findings first published in [3]. Four face recognition algorithms, the Generic Shape-Illumination method [3], the Constrained Mutual Subspace Method [12], the commercial system FaceIt and a Kullback-Leibler Divergence-based matching method, were evaluated on a large database using (i) raw greyscale imagery, (ii) high-pass (HP) filtered imagery and (iii) the Self-Quotient Image (QI) representation [21]. Both the high-pass and, even further, the Self-Quotient Image representations produced an improvement in recognition for all methods over raw greyscale, as shown in Figure 3, which is consistent with previous findings in the literature [1, 5, 11, 21].
Of importance to this work is that it was also examined in which cases these filters help, and how much, depending on the data acquisition conditions. It was found that recognition rates using greyscale and either the HP or the QI filter were negatively correlated (with ρ ≈ −0.7), as illustrated in Figure 4. This finding was observed consistently across the results of the four algorithms, all of which employ mutually drastically different underlying models.
Trang 30a) b) Figure 3 Performance of the (a) Mutual Subspace Method and the (b) Constrained Mutual Subspace Method using raw grey scale imagery, high-pass (HP) filtered imagery and the
Self-Quotient Image (QI), evaluated on over 1300 video sequences with extreme
illumination, pose and head motion variation (as reported in [3]) Shown are the average
performance and ± one standard deviation intervals
Figure 4 A plot of the performance improvement with HP and QI filters against the performance of unprocessed, raw imagery across different illumination combinations used in training and test. The tests are shown in the order of increasing raw data performance for easier visualization
This is an interesting result: it means that while on average both representations increase the recognition rate, they actually worsen it in "easy" recognition conditions when no normalization is needed. The observed phenomenon is well understood in the context of the energy of intrinsic and extrinsic image differences and noise (see [22] for a thorough discussion). Higher than average recognition rates for raw input correspond to small changes in imaging conditions between training and test, and hence lower energy of extrinsic variation. In this case, the two filters decrease the signal-to-noise ratio, worsening the performance, see Figure 5 (a). On the other hand, when the imaging conditions between training and test are very different, normalization of extrinsic variation is the dominant factor and performance is improved, see Figure 5 (b).
Figure 5 A conceptual illustration of the distribution of intrinsic, extrinsic and noise signal energies across frequencies in the cases when training and test data acquisition conditions are (a) similar and (b) different, before (left) and after (right) band-pass filtering
This is an important observation: it suggests that the performance of a method that uses either of the representations can be increased further by detecting the difficulty of the recognition conditions. In this chapter we propose a novel learning framework to do exactly this.
2.2.1 Adaptive framework
Our goal is to implicitly learn how similar the novel and training (or gallery) illumination conditions are, in order to appropriately emphasize either the raw input guided face comparisons or those of its filtered output.
Let $\{\mathcal{X}_i\}$ be a database of known individuals, $\mathcal{X}$ novel input corresponding to one of the gallery classes, and $\rho(\cdot)$ and $F(\cdot)$, respectively, a given similarity function and a quasi illumination-invariant filter. We then express the degree of belief $\mu$ that two face sets $\mathcal{X}_1$ and $\mathcal{X}_2$ belong to the same person as a weighted combination of similarities between the corresponding unprocessed and filtered image sets:

$$\mu = (1 - \alpha)\,\rho(\mathcal{X}_1, \mathcal{X}_2) + \alpha\,\rho\big(F(\mathcal{X}_1), F(\mathcal{X}_2)\big) \qquad (1)$$
In the light of the previous discussion, we want $\alpha^*$ to be small (closer to 0.0) when the novel and the corresponding gallery data have been acquired in similar illuminations, and large (closer to 1.0) when in very different ones. We show that $\alpha^*$ can be learnt as a function

$$\alpha^* = \alpha^*(\mu), \qquad (2)$$

where $\mu$ is the confusion margin, the difference between the similarities of the two gallery individuals most similar to the novel input. The value of $\alpha^*(\mu)$ can then be interpreted as statistically the optimal choice of the mixing coefficient $\alpha$ given the confusion margin $\mu$. Formalizing this we can write

$$\alpha^*(\mu) = \arg\max_{\alpha}\, p(\alpha, \mu) \qquad (6)$$
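In code, the resulting decision rule is straightforward; the sketch below is illustrative only, with alpha_star standing for the learnt α-function of Section 2.2.2, and fuses the raw and filtered similarity scores according to eq. (1).

```python
def adaptive_similarity(rho_raw, rho_filtered, alpha_star, confusion_margin):
    """Adaptive fusion of eq. (1): the learnt alpha-function, evaluated at
    the observed confusion margin, weights filtered vs. raw similarities."""
    a = alpha_star(confusion_margin)  # ~0 for similar illumination, ~1 otherwise
    return (1.0 - a) * rho_raw + a * rho_filtered
```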
2.2.2 Learning the α-function
To learn the α-function $\alpha^*(\mu)$ as defined in (3), we first need an estimate of the joint probability density $p(\alpha, \mu)$ as per (6). The main difficulty of this problem is of a practical nature: in order to obtain an accurate estimate using one of many off-the-shelf density estimation techniques, a prohibitively large training database would be needed to ensure a well sampled distribution of the variable $\mu$. Instead, we propose a heuristic alternative which, we will show, allows us to do this from a small training corpus of individuals imaged in various illumination conditions. The key idea that makes such a drastic reduction in the amount of training data possible is to use domain specific knowledge of the properties of $p(\alpha, \mu)$ in the estimation process.
Our algorithm is based on an iterative incremental update of the density, initialized as a uniform density over the domain $\alpha, \mu \in [0, 1]$, see Figure 7. Given a training corpus, we iteratively simulate matching of an "unknown" person against a set of provisional gallery individuals. In each iteration of the algorithm, these are randomly drawn from the offline training database. Since the ground truth identities of all persons in the offline database are known, we can compute the confusion margin $\mu(\alpha)$ for each $\alpha = k\,\Delta\alpha$, using the inter-personal similarity score defined in (1). The density is then incremented at each $(k\,\Delta\alpha, \mu(0))$ proportionally to $\mu(k\,\Delta\alpha)$, to reflect the goodness of a particular weighting in the simulated recognition.
The proposed offline learning algorithm is summarized in Figure 6, with a typical evolution of p(α, μ) shown in Figure 7.
The final stage of the offline learning in our method involves imposing the monotonicity constraint on α*(μ) and smoothing the result, see Figure 8.
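The learning procedure described above can be summarized in the following Python sketch; the simulation routine, grid resolutions, iteration count and smoothing kernel are illustrative assumptions rather than the authors' exact choices.

    import numpy as np

    def learn_alpha_function(simulate_margins, n_iters=550, n_bins=50, seed=0):
        # simulate_margins(rng, alphas) -> confusion margins mu(alpha) from one
        # simulated matching of an "unknown" person against provisional gallery
        # individuals drawn from the offline training set (caller-supplied).
        rng = np.random.default_rng(seed)
        alphas = np.linspace(0.0, 1.0, n_bins)
        mus = np.linspace(0.0, 1.0, n_bins)
        p = np.ones((n_bins, n_bins))           # uniform initial density on [0,1]^2
        for _ in range(n_iters):
            m = np.clip(simulate_margins(rng, alphas), 0.0, 1.0)
            j = min(np.searchsorted(mus, m[0]), n_bins - 1)  # column of mu(0)
            p[:, j] += m                        # increment (k d_alpha, mu(0)) in
                                                # proportion to mu(k d_alpha)
        a_star = alphas[np.argmax(p, axis=0)]   # optimal alpha per margin, eq. (6)
        a_star = np.maximum.accumulate(a_star)  # impose the monotonicity constraint
        a_star = np.convolve(a_star, np.ones(5) / 5, mode="same")  # smooth
        return mus, np.clip(a_star, 0.0, 1.0)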
Figure 7 The estimate of the joint density p(α, μ) through 550 iterations, for the band-pass filter used for the evaluation of the proposed framework in Section 3.1
Figure 8 A typical estimate of the α-function plotted against the confusion margin μ. The estimate shown was computed using 40 individuals in 5 illumination conditions for a Gaussian high-pass filter. As expected, α* assumes low values for small confusion margins and high values for large confusion margins (see (1))
3 Empirical evaluation
We evaluated the proposed framework on four databases:
• CamFace, with 100 individuals of varying age and ethnicity, and equally represented genders. For each person in the database we collected 7 video sequences of the person in arbitrary motion (significant translation, yaw and pitch, negligible roll), each in a different illumination setting, at 10 fps and 320 × 240 pixel resolution (face size ≈ 60 pixels), see Figures 9 (a) and 10.
• ToshFace, kindly provided to us by Toshiba Corp. This database contains 60 individuals of varying age, mostly male Japanese, and 10 sequences per person. Each sequence corresponds to a different illumination setting, at 10 fps and 320 × 240 pixel resolution (face size ≈ 60 pixels), see Figure 9 (b).
• Face Video, freely available and described in [14]. Briefly, it contains 11 individuals and 2 sequences per person, with little variation in illumination but extreme and uncontrolled variations in pose and motion, acquired at 25 fps and 160 × 120 pixel resolution (face size ≈ 45 pixels), see Figure 9 (c).
• Faces96, the most challenging subset of the University of Essex face database, freely available from http://cswww.essex.ac.uk/mv/allfaces/faces96.html. It contains 152 individuals, most 18-20 years old, and a single 20-frame sequence per person at 196 × 196 pixel resolution (face size ≈ 80 pixels). The users were asked to approach the camera while performing arbitrary head motion. Although the illumination was kept constant throughout each sequence, there is some variation in the manner in which faces were lit due to the change in the relative position of the user with respect to the lighting sources, see Figure 9 (d).
For each database except Faces96, we trained our algorithm using a single sequence per person and tested against a single other sequence per person, acquired in a different session (for CamFace and ToshFace, different sessions correspond to different illumination conditions). Since the Faces96 database contains only a single sequence per person, we used frames 1-10 of each sequence for training and frames 11-20 for testing. Since each video sequence in this database corresponds to a person walking towards the camera, this maximizes the variation in illumination, scale and pose between training and test, thus maximizing the recognition challenge.
Offline training, that is, the estimation of the α-function (see Section 2.2.2), was performed using 40 individuals and 5 illuminations from the CamFace database. We emphasize that these were not used as test input for the evaluations reported in the following section.
Data acquisition. The discussion so far has focused on recognition using fixed-scale face images. Our system uses a cascaded detector [20] for localization of faces in cluttered images, which are then rescaled to the uniform resolution of 50 × 50 pixels (approximately the average size of detected faces in our data set).
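A minimal sketch of this extraction step follows, using the standard OpenCV cascade detector as a stand-in for the detector of [20]; the cascade file and detection parameters are assumptions.

    import cv2

    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def extract_face(frame_gray):
        # Localize the face in a cluttered greyscale frame and rescale it to
        # the uniform 50 x 50 resolution used by all subsequent filters.
        faces = detector.detectMultiScale(frame_gray, scaleFactor=1.1,
                                          minNeighbors=5)
        if len(faces) == 0:
            return None
        x, y, w, h = faces[0]
        return cv2.resize(frame_gray[y:y + h, x:x + w], (50, 50))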
Methods and representations. The proposed framework was evaluated using the following filters, illustrated in Figure 11 (a code sketch follows the list):
• Gaussian high-pass filtered images [5, 11] (HP):
X_H = X − (X ∗ G_σ), (7)
• local intensity-normalized high-pass filtered images, similar to the Self-Quotient Image [21] (QI):
X_Q = X_H / (X ∗ G_σ), (8)
the division being element-wise,
• distance-transformed edge maps [3, 9] (ED):
X_E = DistanceTransform(Canny(X)), (9)
• Laplacian-of-Gaussian filtered images [1] (LG):
X_L = X ∗ ∇²G_σ, (10) and
• directional grey-scale derivatives [1, 10] (DX, DY):
X_x = X ∗ ∂G_σ/∂x, X_y = X ∗ ∂G_σ/∂y.
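The following sketch implements plausible versions of these representations; the Gaussian scale, Canny thresholds and derivative kernels are illustrative choices, as the chapter does not restate the exact parameters.

    import numpy as np
    import cv2

    def representations(X, sigma=1.5):
        # X: greyscale face image with values in [0, 255].
        X = X.astype(np.float32)
        low = cv2.GaussianBlur(X, (0, 0), sigma)        # smoothed image X * G_sigma
        hp = X - low                                    # (7) Gaussian high-pass
        qi = hp / (low + 1e-6)                          # (8) locally normalized HP,
                                                        #     division element-wise
        edges = cv2.Canny(X.astype(np.uint8), 50, 150)  # edge map
        ed = cv2.distanceTransform(255 - edges,         # (9) distance-transformed
                                   cv2.DIST_L2, 3)      #     edge map
        lg = cv2.Laplacian(low, cv2.CV_32F)             # (10) Laplacian-of-Gaussian
        dx = cv2.Sobel(X, cv2.CV_32F, 1, 0)             # DX: horizontal derivative
        dy = cv2.Sobel(X, cv2.CV_32F, 0, 1)             # DY: vertical derivative
        return {"HP": hp, "QI": qi, "ED": ed, "LG": lg, "DX": dx, "DY": dy}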
Figure 9 Frames from typical video sequences from the four databases used for evaluation: (a) Cambridge Face Database, (b) Toshiba Face Database, (c) Face Video Database and (d) Faces96 Database
For baseline classification, we used two canonical correlation-based [15] methods:
• Constrained MSM (CMSM) [12], used in the state-of-the-art commercial system FacePass® [19], and
• the Mutual Subspace Method (MSM) [12].
These were chosen as fitting the main premise of the chapter, owing to their efficiency, numerical stability and generalization robustness [16]. Specifically, we (i) represent each head-motion video sequence as a linear subspace, estimated using PCA from appearance images, and (ii) compare two such subspaces by computing the first three canonical correlations between them using the method of Björck and Golub [6], that is, as the singular values of the matrix B1^T B2, where B1 and B2 are orthonormal basis matrices of the two linear subspaces.
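A compact sketch of this comparison follows, under the assumption of, say, 9-dimensional PCA subspaces (the chapter does not restate the dimensionality here).

    import numpy as np

    def canonical_correlations(A, B, n_pc=9, n_cc=3):
        # A, B: (n_frames, n_pixels) arrays of rasterized face images.
        def basis(S, d):
            S = S - S.mean(axis=0)                 # PCA via SVD of centred data
            _, _, Vt = np.linalg.svd(S, full_matrices=False)
            return Vt[:d].T                        # orthonormal basis (n_pixels, d)
        B1, B2 = basis(A, n_pc), basis(B, n_pc)
        sv = np.linalg.svd(B1.T @ B2, compute_uv=False)
        return np.clip(sv[:n_cc], 0.0, 1.0)        # first canonical correlations

The mean of the returned correlations can then serve as the set similarity ρ(·, ·) of eq. (1).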
Figure 10 (a) Illuminations 1-7 from database FaceDB100 and (b) illuminations 1-10 from database FaceDB60
Figure 11 Examples of the evaluated face representations: raw grey-scale input (RW), high-pass filtered data (HP), the Quotient Image (QI), distance-transformed edge map (ED), Laplacian-of-Gaussian filtered data (LG) and the two principal-axis derivatives (DX and DY)
3.1 Results
To establish baseline performance, we first performed recognition with both MSM and CMSM using raw data. A summary is shown in Table 1. As these results illustrate, the CamFace and ToshFace data sets were found to be very challenging, primarily due to extreme variations in illumination. The performance on the Face Video and Faces96 databases was significantly better. This can be explained by noting that the first major source of appearance variation present in these sets, the scale, is normalized for in the data extraction stage; the remaining appearance variation is dominated by pose changes, to which MSM and CMSM are particularly robust [4, 16].
Next, we evaluated the two methods with each of the 6 filter-based face representations. The recognition results for the CamFace, ToshFace and Faces96 databases are shown in blue in Figure 12, while the results on the Face Video data set are shown separately in Table 2 for ease of visualization. Confirming the first premise of this work, as well as previous research findings, all of the filters produced an improvement in average recognition rates. Little interaction between method/filter combinations was found, with Laplacian-of-Gaussian and the horizontal intensity derivative producing the best results, bringing the average and best recognition error rates down to 12% and 9%, respectively.
Figure 12 Error rate statistics for (a) CamFace, (b) ToshFace and (c) Faces96. The proposed framework (-AD suffix) dramatically improved recognition performance on all method/filter combinations, as witnessed by the reduction in both error rate averages and their standard deviations. The results of CMSM on Faces96 are not shown as it performed perfectly on this data set
Table 1 Mean error rates (%) of the baseline methods; columns: CamFace, ToshFace, FaceVideoDB, Faces96, Average
Table 2 FaceVideoDB, mean error (%)
Finally, in the last set of experiments, we employed each of the 6 filters in the proposed data-adaptive framework. The recognition results are shown in red in Figure 12 and in Table 2 for the Face Video database. The proposed method produced a dramatic performance improvement in the case of all filters, reducing the average recognition error rate to only 3% for the CMSM/Laplacian-of-Gaussian combination. This is a very high recognition rate for such unconstrained conditions (see Figure 9), such a small amount of training data per gallery individual, and such a degree of illumination, pose and motion pattern variation between different sequences. An improvement in robustness to illumination changes can also be seen in the significantly reduced standard deviation of the recognition error rates, as shown in Figure 12. Finally, it should be emphasized that the demonstrated improvement is obtained with a negligible increase in computational cost, as all time-demanding learning is performed offline.
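To make the overall flow concrete, a hypothetical end-to-end matching sketch ties the pieces together; evaluating the confusion margin on the raw similarities mirrors the use of μ(0) during offline learning, but the exact online procedure here is our assumption.

    import numpy as np

    def recognize(novel, gallery, mu_grid, alpha_tab, rho, F):
        # Raw and filtered similarities against every gallery individual.
        raw = np.array([rho(novel, g) for g in gallery])
        flt = np.array([rho(F(novel), F(g)) for g in gallery])
        top2 = np.sort(raw)[-2:]
        mu = top2[1] - top2[0]                        # confusion margin, raw scores
        a = float(np.interp(mu, mu_grid, alpha_tab))  # learnt alpha*(mu)
        fused = (1.0 - a) * raw + a * flt             # fused belief, eq. (1)
        return int(np.argmax(fused)), fused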
4 Conclusions
In this chapter we described a novel framework for automatic face recognition in the presence of varying illumination, primarily applicable to matching face sets or sequences. The framework is based on simple image processing filters that compete with unprocessed greyscale input to yield a single matching score between individuals. By performing all numerically demanding computation offline, our method (i) retains the matching efficiency of simple image filters, while (ii) achieving greatly increased robustness, as all online processing is performed in closed form. Evaluated on a large, real-world data corpus, the proposed framework was shown to be successful in video-based recognition across a wide range of illumination, pose and face motion pattern changes.