Edited by Kresimir Delac and Mislav Grgic
I-TECH Education and Publishing
Published by I-Tech Education and Publishing, Vienna, Austria
Abstracting and non-profit use of the material is permitted with credit to the source. Statements and opinions expressed in the chapters are those of the individual contributors and not necessarily those of the editors or publisher. No responsibility is accepted for the accuracy of information contained in the published articles. The publisher assumes no responsibility or liability for any damage or injury to persons or property arising out of the use of any materials, instructions, methods or ideas contained inside. After this work has been published by the Advanced Robotic Systems International, authors have the right to republish it, in whole or part, in any publication of which they are an author or editor, and to make other personal use of the work.
© 2007 I-Tech Education and Publishing
A catalog record for this book is available from the Austrian Library
Face Recognition, Edited by Kresimir Delac and Mislav Grgic
p. cm.
ISBN 3-86611-283-1
1. Face Recognition 2. Face Synthesis 3. Applications
Face recognition is a task humans perform remarkably easily and successfully. This apparent simplicity was shown to be dangerously misleading, as automatic face recognition seems to be a problem that is still far from solved. In spite of more than 20 years of extensive research and a large number of papers published in journals and conferences dedicated to this area, we still cannot claim that artificial systems can measure up to human performance. Automatic face recognition is intricate primarily because of difficult imaging conditions (lighting and viewpoint changes induced by body movement) and because of various other effects like aging, facial expressions, occlusions etc. Researchers from computer vision, image analysis and processing, pattern recognition, machine learning and other areas are working jointly, motivated largely by a number of possible practical applications.
The goal of this book is to give a clear picture of the current state-of-the-art in the field of
automatic face recognition across three main areas of interest: biometrics, cognitive models and human-computer interaction. Face recognition has an important advantage over other biometric technologies: it is a nonintrusive and easy to use method. As such, it became one of three identification methods used in e-passports and a biometric of choice for many other security applications. Cognitive and perception models constitute an important platform for inter-disciplinary research, connecting scientists from seemingly incompatible areas and enabling them to exchange methodologies and results on a common problem. Evidence from neuro-biological, psychological, perceptual and cognitive experiments provides potentially useful insights into how our visual system codes, stores and recognizes faces. These insights can then be connected to artificial solutions. On the other hand, it is generally believed that the success or failure of automatic face recognition systems might inform the cognitive and perception science community about which models have the potential to be candidates for those used by humans. Making robots and computers more "human" (through human-computer interaction) will improve the quality of human-robot co-existence in the same space and thus ease their adoption into our everyday lives. In order to achieve this, robots must be able to identify faces, expressions and emotions while interacting with humans.
Hopefully, this book will serve as a handbook for students, researchers and practitioners in the area of automatic (computer) face recognition and inspire some future research ideas by identifying potential research directions. The book consists of 28 chapters, each focusing on a certain aspect of the problem. Within every chapter the reader will be given an overview of background information on the subject at hand and in many cases a description of the authors' original proposed solution. The chapters in this book are sorted alphabetically, according to the first author's surname. They should give the reader a general idea where the current research efforts are heading, both within the face recognition area itself and in inter-disciplinary approaches.
Chapter 1 describes a face recognition system based on 3D features, with applications in an Ambient Intelligence Environment. The system is placed within a framework of home automation: a community of smart objects powered by high user-friendliness. Chapter 2 addresses one of the most intensely researched problems in face recognition: the problem of achieving illumination invariance. The authors deal with this problem through a novel framework based on simple image filtering techniques. In chapter 3 a novel method for precise automatic localization of certain characteristic points in a face (such as the centers and the corners of the eyes, tip of the nose, etc.) is presented. An interesting analysis of the recognition rate as a function of eye localization precision is also given. Chapter 4 gives a detailed introduction to wavelets and their application in face recognition as tools for image preprocessing and feature extraction.
Chapter 5 reports on an extensive experiment performed in order to analyze the effects of JPEG and JPEG2000 compression on face recognition performance. It is shown that the tested recognition methods are remarkably robust to compression, and the conclusions are statistically confirmed using McNemar's hypothesis testing. Chapter 6 introduces a feed-forward neural network architecture combined with PCA and LDA into a novel approach. Chapter 7 addresses the multi-view recognition problem by using a variant of SVM and decomposing the problem into a series of easier two-class problems. Chapter 8 describes three different hardware platforms dedicated to face recognition and brings us one step closer to real-world implementation. In chapter 9 the authors combine face and gesture recognition in a human-robot interaction framework.
Chapter 10 considers a fuzzy-geometric approach and symbolic data analysis for modeling the uncertainty of information about facial features. Chapter 11 reviews some known approaches (e.g. PCA, LDA, LPP, LLE, etc.) and presents a case study of intelligent face recognition using global pattern averaging. A theoretical analysis and application suggestion of the compact optical parallel correlator for face recognition is presented in chapter 12. Improving the quality of co-existence of humans and robots in the same space through another merge of face and gesture recognition is presented in chapter 13, and spontaneous facial action recognition is addressed in chapter 14.
Based on lessons learned from human visual system research and contrary to the traditional practice of focusing recognition on internal face features (eyes, nose, and mouth), in chapter 15 the possibility of using external features (hair, forehead, laterals, ears, jaw line and chin) is explored. In chapter 16 a hierarchical neural network architecture is used to define a common framework for higher level cognitive functions. Simulation is performed indicating that both face recognition and facial expression recognition can be realized efficiently using the presented framework. Chapter 17 gives a detailed mathematical overview of some traditional and modern subspace analysis methods, and chapter 18 reviews in depth some nearest feature classifiers and introduces dissimilarity representations as a recognition tool. In chapter 19 the authors present a security system in which an image of a known person is matched against multiple images extracted from a video fragment of a person approaching a protected entrance.
Chapter 20 presents recent advances in machine analysis of facial expressions, with special attention devoted to several techniques recently proposed by the authors. 3D face recognition is covered in chapter 21. Basic approaches are discussed and an extensive list of references is given, making this chapter an ideal starting point for researchers new to the area. After a multi-modal human verification system using face and speech is presented in chapter 22, the same authors present a new face detection and recognition method using optimized 3D information from stereo images in chapter 23. A far-field, unconstrained, video-to-video face recognition system is proposed in chapter 24.
Chapter 25 examines the results of research on humans in order to come up with some hints for designs of artificial systems for face recognition. Frequency domain processing and representation of faces is reviewed in chapter 26, along with a thorough analysis of a family of advanced frequency domain matching algorithms collectively known as advanced correlation filters. Chapter 27 addresses the problem of class-based image synthesis and recognition with varying illumination conditions. Chapter 28 presents a mixed reality virtual system within a framework using stereo video and a 3D computer graphics model.
Mislav Grgic
University of Zagreb, Faculty of Electrical Engineering and Computing, Department of Wireless Communications, Unska 3/XII, HR-10000 Zagreb, Croatia
Preface
1 3D Face Recognition in an Ambient Intelligence Environment Scenario
Andrea F. Abate, Stefano Ricciardi and Gabriele Sabatino
2 Achieving Illumination Invariance using Image Filters
Ognjen Arandjelovic and Roberto Cipolla
3 Automatic Facial Feature Extraction for Face Recognition
Paola Campadelli, Raffaella Lanzarotti and Giuseppe Lipori
4 Wavelets and Face Recognition
Dao-Qing Dai and Hong Yan
5 Image Compression Effects in Face Recognition Systems
Kresimir Delac, Mislav Grgic and Sonja Grgic
6 PCA and LDA based Neural Networks for Human Face Recognition
Alaa Eleyan and Hasan Demirel
7 Multi-View Face Recognition with Min-Max Modular Support Vector Machines
Zhi-Gang Fan and Bao-Liang Lu
8 Design, Implementation and Evaluation of Hardware Vision Systems dedicated to Real-Time Face Recognition
Ginhac Dominique, Yang Fan and Paindavoine Michel
9 Face and Gesture Recognition for Human-Robot Interaction
Md Hasanuzzaman and Haruki Ueno
10 Modelling Uncertainty in Representation of Facial Features for Face Recognition
Hiremath P.S., Ajit Danti and Prabhakar C.J.
11 Intelligent Global Face Recognition
Adnan Khashman
12 Compact Parallel Optical Correlator for Face Recognition and its Application
Kashiko Kodate and Eriko Watanabe
13 Human Detection and Gesture Recognition Based on Ambient Intelligence
Naoyuki Kubota
14 Investigating Spontaneous Facial Action Recognition through AAM Representations of the Face
Simon Lucey, Ahmed Bilal Ashraf and Jeffrey F. Cohn
15 Measuring External Face Appearance for Face Classification
David Masip, Agata Lapedriza and Jordi Vitria
16 Selection and Efficient Use of Local Features for Face and Facial Expression Recognition in a Cortical Architecture
Masakazu Matsugu
17 Image-based Subspace Analysis for Face Recognition
Vo Dinh Minh Nhat and SungYoung Lee
18 Nearest Feature Rules and Dissimilarity Representations for Face Recognition Problems
Mauricio Orozco-Alzate and German Castellanos-Dominguez
19 Improving Face Recognition by Video Spatial Morphing
Armando Padilha, Jorge Silva and Raquel Sebastiao
20 Machine Analysis of Facial Expressions
Maja Pantic and Marian Stewart Bartlett
21 3D Face Recognition
Theodoros Papatheodorou and Daniel Rueckert
22 Multi-Modal Human Verification using Face and Speech
Changhan Park and Joonki Paik
23 Face Recognition Using Optimized 3D Information from Stereo Images
Changhan Park and Joonki Paik
24 Far-Field, Multi-Camera, Video-to-Video Face Recognition
Aristodemos Pnevmatikakis and Lazaros Polymenakos
25 Facing Visual Tasks Based on Different Cognitive Architectures
Marcos Ruiz-Soler and Francesco S. Beltran
26 Frequency Domain Face Recognition
Marios Savvides, Ramamurthy Bhagavatula, Yung-hui Li and Ramzi Abiantun
27 From Canonical Face to Synthesis: An Illumination Invariant Face Recognition Approach
Tele Tan
28 A Feature-level Fusion of Appearance and Passive Depth Information for Face Recognition
Jian-Gang Wang, Kar-Ann Toh, Eric Sung and Wei-Yun Yau
3D Face Recognition in an Ambient Intelligence Environment Scenario
Andrea F. Abate, Stefano Ricciardi and Gabriele Sabatino
Dip. di Matematica e Informatica, Università degli Studi di Salerno
Italy
1 Introduction
Information and Communication Technologies are increasingly entering all aspects of our life and all sectors, opening a world of unprecedented scenarios where people interact with electronic devices embedded in environments that are sensitive and responsive to the presence of users. Indeed, since the first examples of "intelligent" buildings featuring computer aided security and fire safety systems, the request for more sophisticated services, provided according to each user's specific needs, has characterized the new tendencies within domotic research. The result of the evolution of the original concept of home automation is known as Ambient Intelligence (Aarts & Marzano, 2003), referring to an environment viewed as a "community" of smart objects powered by computational capability and high user-friendliness, capable of recognizing and responding to the presence of different individuals in a seamless, non-intrusive and often invisible way. As adaptivity here is the key to providing customized services, the role of person sensing and recognition becomes of fundamental importance.
This scenario offers the opportunity to exploit the potential of the face as a non-intrusive biometric identifier, not just to regulate access to the controlled environment but to adapt the provided services to the preferences of the recognized user. Biometric recognition (Maltoni et al., 2003) refers to the use of distinctive physiological (e.g., fingerprints, face, retina, iris) and behavioural (e.g., gait, signature) characteristics, called biometric identifiers, for automatically recognizing individuals. Because biometric identifiers cannot be easily misplaced, forged, or shared, they are considered more reliable for person recognition than traditional token or knowledge-based methods. Other typical objectives of biometric recognition are user convenience (e.g., service access without a Personal Identification Number) and better security (e.g., difficult to forge access). All these reasons make biometrics very well suited for Ambient Intelligence applications, and this is especially true for a biometric identifier such as the face, which is one of the most common methods of recognition that humans use in their visual interactions, and which allows the user to be recognized in a non-intrusive way without any physical contact with the sensor.
A generic biometric system can operate either in verification or identification modality, better known as one-to-one and one-to-many recognition (Perronnin & Dugelay, 2003). In the proposed Ambient Intelligence application we are interested in one-to-one recognition, as we want to recognize authorized users accessing the controlled environment or requesting a specific service.
We present a face recognition system based on 3D features to verify the identity of subjects accessing the controlled Ambient Intelligence Environment and to customize all the services accordingly. In other terms, it adds a social dimension to man-machine communication and thus may help to make such environments more attractive to the human user. The proposed approach relies on stereoscopic face acquisition and 3D mesh reconstruction to avoid highly expensive and non-automated 3D scanning, typically not suited for real time applications. For each subject enrolled, a bidimensional feature descriptor is extracted from its 3D mesh and compared to the previously stored corresponding template. This descriptor is a normal map, namely a color image in which the RGB components represent the normals to the face geometry. A weighting mask, automatically generated for each authorized person, improves recognition robustness to a wide range of facial expressions.
This chapter is organized as follows. In section 2 related works are presented and the proposed method is introduced. In section 3 the proposed face recognition method is presented in detail. In section 4 the Ambient Intelligence framework is briefly discussed and experimental results are shown and commented. The chapter concludes in section 5 showing directions for future research and conclusions.
2 Related Works
In their survey of the state of the art in 3D and multi-modal face recognition, Bowyer et al. (Bowyer et al., 2004) describe the most recent results and research trends, showing that "the variety and sophistication of algorithmic approaches explored is expanding". The main challenges in this field turn out to be the improvement of recognition accuracy, a greater robustness to facial expressions and, more recently, the efficiency of algorithms. Many methods are based on Principal Component Analysis (PCA); such is the case of Hester et al. (Hester et al., 2003), who tested the potential and the limits of PCA varying the number of eigenvectors and the size of range images. Pan et al. (Pan et al., 2005) apply PCA to a novel mapping of the 3D data to a range, or depth, image, while Xu et al. (Xu et al., 2004) divide the face in sub-regions using the nose as the anchor, PCA to reduce feature space dimensionality and minimum distance for matching. Another major research trend is based on the Iterative Closest Point (ICP) algorithm, which has been exploited in many variations for 3D shape aligning, matching or both. The first example of this kind of approach to face recognition was presented by Medioni and Waupotitsch (Medioni & Waupotitsch, 2003); then Lu and Jain (Lu & Jain, 2005) developed an extended version aimed to cope with expressive variations, whereas Chang et al. (Chang et al., 2005) proposed to apply ICP not to the whole face but to a set of selected subregions instead.
As a real face is fully described by its 3D shape and its texture, it is reasonable to use both kinds of data (geometry and color or intensity) to improve recognition reliability: this is the idea behind Multi-Modal or (3D+2D) face recognition. The work by Tsalakanidou et al. (Tsalakanidou et al., 2003) is based on PCA to compare both the probe's range image and intensity/color image to the gallery; Papatheodorou and Rueckert (Papatheodorou & Rueckert, 2004) presented a 4D registration method based on the Iterative Closest Point (ICP) algorithm, augmented with texture data. Bronstein et al. (Bronstein et al., 2003) propose a multi-modal 3D + 2D recognition using eigen decomposition of flattened textures and canonical images. Other authors combine 3D and 2D similarity scores obtained comparing 3D and 2D profiles (Beumier & Acheroy, 2000), or extract a feature vector combining Gabor filter responses in 2D and point signatures in 3D (Wang et al., 2003).
3 Description of the Facial Recognition System
The basic idea behind the proposed system is to represent the user's facial surface by a digital signature called a normal map. A normal map is an RGB color image providing a 2D representation of the 3D facial surface, in which the normal to each polygon of a given mesh is represented by an RGB color pixel. To this aim, we project the 3D geometry onto 2D space through spherical mapping. The result is a bidimensional representation of the original face geometry which retains the spatial relationships between facial features. Color information coming from the face texture is used to mask possible beard-covered regions according to their relevance, resulting in an 8 bit greyscale filter mask (Flesh Mask). Then, a variety of facial expressions is generated from the neutral pose through a rig-based animation technique, and the corresponding normal maps are used to compute a further 8 bit greyscale mask (Expression Weighting Mask) aimed to cope with expression variations. The two greyscale masks are then multiplied and the resulting map is used to augment the normal map with an extra 8 bits per pixel, resulting in a 32 bit RGBA bitmap (Augmented Normal Map). The whole process (see Figure 1) is discussed in depth in the following subsections 3.1 to 3.4.
Figure 1 Facial and Facial Expression Recognition workflow
3.1 Face Capturing
As the proposed method works on 3D polygonal meshes, we first need to acquire actual faces and represent them as polygonal surfaces. The Ambient Intelligence context in which we are implementing face recognition requires fast user enrollment to avoid annoying waiting times. Usually, most 3D face recognition methods work on a range image of the face, captured with a laser or structured light scanner. These kinds of devices offer high resolution in the captured data, but they are too slow for real time face acquisition. Unwanted face motion during capture can be another issue, while laser scanning may not be harmless to the eyes.
For all these reasons we opted for 3D mesh reconstruction from stereoscopic images, based on (Enciso et al., 1999), as it requires simple equipment more likely to be adopted in a real application: a couple of digital cameras shooting at high shutter speed from two slightly different angles with strobe lighting. Though the resulting face shape accuracy is inferior compared to real 3D scanning, it proved to be sufficient for recognition yet much faster, with a total time required for mesh reconstruction of about 0.5 sec on a P4/3.4 GHz based PC, offering additional advantages, such as precise mesh alignment in 3D space thanks to the warp based approach, facial texture generation from the two captured orthogonal views and its automatic mapping onto the reconstructed face geometry.
3.2 Building a Normal Map
As the 3D polygonal mesh resulting from the reconstruction process is an approximation of the actual face shape, polygon normals describe the local curvature of the captured face, which can be viewed as its signature. As shown in Figure 2, we intend to represent these normals by a color image, transferring the face's 3D features into a 2D space. We also want to preserve the spatial relationships between facial features, so we project the vertices' 3D coordinates onto a 2D space using a spherical projection. We can now store the normals of mesh M in a bidimensional array N using mapping coordinates; in this way each pixel represents a normal as RGB values. We refer to the resulting array as the Normal Map N of mesh M, and this is the signature we intend to use for identity verification.
Figure 2 (a) 3D mesh model, (b) wireframe model, (c) projection in 2D spatial coordinates, (d) normal map
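As an illustration of this construction, the Python sketch below (our own simplified rendition, not the authors' code) projects per-vertex normals through a spherical mapping and encodes them as RGB pixels; a full implementation would rasterize each polygon's normal over its projected footprint rather than splatting single vertices.

```python
import numpy as np

def spherical_uv(vertices):
    """Spherical projection of 3D vertex coordinates to (u, v) in [0, 1].
    Assumes the mesh is centered on the origin, as guaranteed by the
    normalization performed during reconstruction."""
    x, y, z = vertices[:, 0], vertices[:, 1], vertices[:, 2]
    r = np.linalg.norm(vertices, axis=1) + 1e-12
    u = np.arctan2(x, z) / (2.0 * np.pi) + 0.5        # longitude
    v = np.arccos(np.clip(y / r, -1.0, 1.0)) / np.pi  # colatitude
    return u, v

def build_normal_map(vertices, normals, size=128):
    """Rasterize per-vertex unit normals into a size x size RGB image:
    each normal component in [-1, 1] is remapped to an 8-bit channel."""
    u, v = spherical_uv(vertices)
    cols = np.clip((u * (size - 1)).astype(int), 0, size - 1)
    rows = np.clip((v * (size - 1)).astype(int), 0, size - 1)
    nmap = np.zeros((size, size, 3), dtype=np.uint8)
    nmap[rows, cols] = ((normals + 1.0) * 127.5).astype(np.uint8)
    return nmap
```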
3.3 Normal Map Comparison
To compare the normal map $N_A$ from the input subject to another normal map $N_B$ previously stored in the reference database, we compute through:

$$\theta = \arccos\left( r_{N_A} \cdot r_{N_B} + g_{N_A} \cdot g_{N_B} + b_{N_A} \cdot b_{N_B} \right) \qquad (1)$$

the angle included between each pair of normals represented by the colors of pixels with corresponding mapping coordinates, and store it in a new Difference Map D, with components r, g and b opportunely normalized from the spatial domain to the color domain, so that $0 \leq r_{N_A}, g_{N_A}, b_{N_A} \leq 1$ and $0 \leq r_{N_B}, g_{N_B}, b_{N_B} \leq 1$. The value $\theta$, with $0 \leq \theta < \pi$, is the angular difference between the pixels with coordinates $(x_{N_A}, y_{N_A})$ in $N_A$ and $(x_{N_B}, y_{N_B})$ in $N_B$, and it is stored in D as a gray-scale color. At this point, the histogram H is analyzed to estimate the
similarity score between $N_A$ and $N_B$. On the X axis we represent the resulting angles between each pair of comparisons (sorted from 0° to 180°), while on the Y axis we represent the total number of differences found. The curvature of H represents the angular distance distribution between meshes $M_A$ and $M_B$; thus two similar faces feature very high values on small angles, whereas two unlike faces have more distributed
differences (see Figure 3). We define a similarity score through a weighted sum between H and a Gaussian function G, as in:

$$\text{similarity score} = \sum_{x=0}^{k} H(x) \cdot \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}} \qquad (2)$$

where by varying σ and k it is possible to change the recognition sensitivity. To reduce the effects of residual face misalignment during the acquisition and sampling phases, we calculate the angle θ using a k × k (usually 3 × 3 or 5 × 5) matrix of neighbour pixels.
Figure 3 Example of histogram H representing the angular distances: (a) shows a typical histogram between two similar Normal Maps, while (b) one between two different Normal Maps
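The following Python sketch is our own illustrative rendition of eqs. (1) and (2), under the assumption that the sum in eq. (2) runs over the first k one-degree histogram bins, which is consistent with the values σ = 4.5 and k = 50 reported in section 4.1; it operates on normal maps stored as float RGB arrays.

```python
import numpy as np

def difference_map(na, nb):
    """Per-pixel angular difference (eq. 1) between two normal maps,
    given as (H, W, 3) float arrays with r, g, b normalized to [0, 1]."""
    dot = np.clip(np.sum(na * nb, axis=2), -1.0, 1.0)
    return np.arccos(dot)  # theta in [0, pi] for each pixel pair

def similarity_score(na, nb, sigma=4.5, k=50):
    """Gaussian-weighted histogram sum (eq. 2). Small angular differences
    (similar faces) dominate the score; larger ones are suppressed."""
    theta = np.degrees(difference_map(na, nb)).ravel()
    hist, _ = np.histogram(theta, bins=180, range=(0.0, 180.0))
    x = np.arange(k)
    gauss = np.exp(-x**2 / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))
    return float(np.sum(hist[:k] * gauss))
```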
3.4 Addressing Beard and Facial Expressions via an 8 bit Alpha Channel
The presence of a beard of variable length covering a portion of the face surface in a subject previously enrolled without it (or vice-versa) could lead to a measurable difference in the overall or local 3D shape of the face mesh (see Figure 4). In this case the recognition accuracy could be affected, resulting, for instance, in a higher False Rejection Rate (FRR). To improve the robustness to this kind of variable facial features we rely on color data from the captured face texture to mask the non-skin regions, possibly disregarding them during the comparison.
Figure 4 Normal maps of the same subject enrolled in two different sessions with and without beard
We exploit flesh hue characterization in the HSB color space to discriminate between skin and beard/moustaches/eyebrows. Indeed, the hue component of each given texel is much less affected by lighting conditions during capture than its corresponding RGB value. Nevertheless, there can be a wide range of hue values within each skin region due to factors like facial morphology, skin conditions and pathologies, race, etc., so we need to define this range on a case by case basis to obtain a valid mask. To this aim we use a set of specific hue sampling spots located over the face texture at absolute coordinates, selected to be representative of the flesh's full tonal range and possibly distant enough from eyes, lips and typical beard and hair covered regions.
Figure 5 Flesh Hue sampling points (a), Flesh Hue Range (b), non-skin regions in white (c)
This is possible because each face mesh and its texture are centered and normalized during the image based reconstruction process (i.e. the face's median axis is always centered on the origin of 3D space with horizontal mapping coordinates equal to 0.5), otherwise normal map comparison would not be possible. We could use a 2D or 3D technique to locate the main facial features (eyes, nose and lips) and position the sampling spots relative to these features, but even these approaches are not safe under all conditions. For each sampling spot we sample not just that texel but a 5 x 5 matrix of neighbour texels, averaging them to minimize the effect of local image noise. As any sampling spot could casually pick wrong values due to local skin color anomalies such as moles, scars or even improper positioning, we calculate the median of all resulting hue values from all sampling spots, resulting in a main Flesh Hue Value FHV which is the center of the valid flesh hue range. We therefore consider as belonging to the skin region all the texels whose hue value is within the range [FHV − t, FHV + t], where t is a hue tolerance which we experimentally found could be set below 10° (see Figure 5-b). After the skin region has been selected, it is filled with pure white while the remaining pixels are converted to a greyscale value depending on their distance from the selected flesh hue range (the more the distance the darker the value).
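A possible implementation of this masking step is sketched below in Python (our own illustration; the exact sampling spot coordinates and the grey falloff rate outside the tolerance band are assumptions, as the chapter does not specify them).

```python
import numpy as np

def rgb_to_hue(texture):
    """Hue channel in degrees [0, 360) from an RGB texture in [0, 1]."""
    r, g, b = texture[..., 0], texture[..., 1], texture[..., 2]
    mx = texture.max(axis=-1)
    mn = texture.min(axis=-1)
    d = np.where(mx == mn, 1e-12, mx - mn)
    h = np.where(mx == r, ((g - b) / d) % 6.0,
        np.where(mx == g, (b - r) / d + 2.0, (r - g) / d + 4.0))
    return (h * 60.0) % 360.0

def flesh_mask(texture, sampling_spots, t=10.0, falloff=2.0):
    """8-bit Flesh Mask: pure white over skin texels, darker elsewhere in
    proportion to the hue distance from the range [FHV - t, FHV + t].

    sampling_spots: (row, col) coordinates of the fixed hue sampling points
    on the normalized texture; a 5 x 5 neighbourhood is averaged at each.
    The linear falloff rate outside the tolerance band is an assumption."""
    hue = rgb_to_hue(texture)
    samples = [hue[r - 2:r + 3, c - 2:c + 3].mean() for r, c in sampling_spots]
    fhv = np.median(samples)                            # main Flesh Hue Value
    dist = np.abs((hue - fhv + 180.0) % 360.0 - 180.0)  # circular hue distance
    grey = np.clip(255.0 - (dist - t) * falloff, 0.0, 255.0)
    return np.where(dist <= t, 255.0, grey).astype(np.uint8)
```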
Trang 19To improve the facial recognition system and to address facial expressions we opt to the use
of expression weighting mask, a subject specific pre-calculated mask aimed to assign different relevance to different face regions This mask, which shares the same size of normal map and difference map, contains for each pixel an 8 bit weight encoding the local rigidity of the face surface based on the analysis of a pre-built set of facial expressions of the same subject Indeed, for each subject enrolled, each of expression variations (see Figure 6) is compared to the neutral face resulting in difference maps
Figure 6 An example of normal maps of the same subject featuring a neutral pose (leftmost face) and different facial expressions
The average of this set of difference maps specific to the same individual represent its expression weighting mask More precisely, given a generic face with its normal map N0
(neutral face) and the set of normal maps N1, N2, …, Nn (the expression variations), we first calculate the set of difference map D1, D2, …, Dn resulting from {N0 - N1, N0 - N2, …, N0 –
Nn} The average of set {D1, D2, …, Dn} is the expression weighting mask which is multiplied
by the difference map in each comparison between two faces
We generate the expression variations through a parametric rig based deformation system previously applied to a prototype face mesh, morphed to fit the reconstructed face mesh (Enciso et al., 1999). This fitting is achieved via a landmark-based volume morphing where the transformation and deformation of the prototype mesh is guided by the interpolation of a set of landmark points with a radial basis function. To improve the accuracy of this rough mesh fitting we need a surface optimization, obtained by minimizing a cost function based on the Euclidean distance between vertices.
So we can augment each 24 bit normal map with the product of the Flesh Mask and the Expression Weighting Mask normalized to 8 bits (see Figure 7). The resulting 32 bit per pixel RGBA bitmap can be conveniently managed via various image formats like the Portable Network Graphics format (PNG), which is typically used to store for each pixel 24 bits of colour and 8 bits of alpha channel (transparency). When comparing any two faces, the difference map is computed on the first 24 bits of color info (normals) and multiplied by the alpha channel (filtering mask).
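The two masking steps can be combined as sketched below (again an illustrative Python fragment; in particular, the rescaling of the averaged difference maps to 8-bit weights, inverted so that rigid regions receive larger weights, is our assumption about a detail the chapter leaves implicit).

```python
import numpy as np
# difference_map as defined in the comparison sketch above

def expression_weighting_mask(neutral, variations):
    """Average the difference maps between the neutral normal map and each
    expression variation (eq. 1 per pair), then rescale to 8-bit weights.
    Regions that move a lot across expressions (large average difference)
    get low weights, so they count less in later comparisons (assumption)."""
    avg = np.mean([difference_map(neutral, v) for v in variations], axis=0)
    return (255.0 * (1.0 - avg / np.pi)).astype(np.uint8)

def augmented_normal_map(nmap_rgb, flesh, expr):
    """Pack a 32-bit RGBA Augmented Normal Map: normals in RGB, and the
    product of Flesh Mask and Expression Weighting Mask in the alpha channel."""
    alpha = (flesh.astype(np.float32) * expr.astype(np.float32)) / 255.0
    return np.dstack([nmap_rgb, alpha.astype(np.uint8)])
```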
4 Testing the Face Recognition System in an Ambient Intelligence Framework
Ambient Intelligence (AmI) worlds offer exciting potential for rich interactive experiences. The metaphor of AmI envisages the future as intelligent environments where humans are surrounded by smart devices that make the ambient itself perceptive to humans' needs or wishes. The Ambient Intelligence Environment can be defined as the set of actuators and sensors composing the system, together with the domotic interconnection protocol. People interact with electronic devices embedded in environments that are sensitive and responsive to the presence of users. This objective is achievable if the environment is capable of learning, building and manipulating user profiles, considering on one side the need to clearly identify the human attitude; in other terms, on the basis of the physical and emotional user status captured from a set of biometric features.
Figure 7 Comparison of two Normal Maps using the Flesh Mask, and the resulting Difference Map (c)
Figure 8 Ambient Intelligence Architecture
To design Ambient Intelligent Environments, many methodologies and techniques have to be merged together, originating the many approaches reported in recent literature (Basten & Geilen, 2003). We opt for a framework aimed at gathering biometrical and environmental data, described in (Acampora et al., 2005), to test the effectiveness of face recognition in aiding security and recognizing the emotional user status. This AmI system's architecture is organized in several sub-systems, as depicted in Figure 8, and it is based on the following sensors and actuators: internal and external temperature sensors and an internal temperature actuator, internal and external luminosity sensors and an internal luminosity actuator, an indoor presence sensor, an infrared camera to capture thermal images of the user and a set of color cameras to capture information about gait and facial features. Firstly, Biometric Sensors are used to gather the user's biometrics (temperature, gait, position, facial expression, etc.) and part of this information is handled by Morphological Recognition Subsystems (MRS) able to organize it semantically. The resulting description, together with the remaining biometrics previously captured, is organized in a hierarchical structure based on XML technology in order to create a new markup language, called H2ML (Human to Markup Language), representing the user status at a given time. Considering a sequence of H2ML descriptions, the Behavioral Recognition Engine (BRE) tries to recognize a particular user behaviour for which the system is able to provide suitable services. The available services are regulated by means of the Service Regulation System (SRS), an array of fuzzy controllers coded in FML (Acampora & Loia, 2004), aimed to achieve hardware transparency and to minimize the fuzzy inference time.
This architecture is able to distribute personalized services on the basis of the physical and emotional user status captured from a set of biometric features and modelled by means of a mark-up language based on XML. This approach is particularly suited to exploit biometric technologies to capture the user's physical info, gathered in a semantic representation describing a human in terms of morphological features.
4.1 Experimental Results
As one of the aims of the experiments was to test the performance of the proposed method in a realistic operative environment, we decided to build a 3D face database from the face capture station used in the domotic system described above. The capture station featured two digital cameras with external electronic strobes shooting simultaneously with a shutter speed of 1/250 sec while the subject was looking at a blinking led to reduce posing issues. More precisely, every face model in the gallery has been created by deforming a pre-aligned prototype polygonal face mesh to closely fit a set of facial features extracted from front and side images of each individual enrolled in the system.
Indeed, for each enrolled subject a set of corresponding facial features extracted by a structured snake method from the two orthogonal views are correlated first and then used to guide the prototype mesh warping, performed through a Dirichlet Free Form Deformation. The two captured face images are aligned, combined and blended, resulting in a color texture precisely fitting the reconstructed face mesh through the feature points previously extracted. The prototype face mesh used in the dataset has about 7K triangular facets, and even if it is possible to use meshes with a higher level of detail we found this resolution to be adequate for face recognition. This is mainly due to the optimized tessellation, which privileges key areas such as eyes, nose and lips, whereas a typical mesh produced by a 3D scanner features almost evenly spaced vertices. Another remarkable advantage involved in the warp based mesh generation is the ability to reproduce a broad range of face variations through a rig based deformation system. This technique is commonly used in computer graphics for facial animation (Lee et al., 1995; Blanz & Vetter, 1999) and is easily applied to the prototype mesh by linking the rig system to specific subsets of vertices on the face surface. Any facial expression can be mimicked by opportunely combining the effect of the rigs controlling lips, mouth shape, eye closing or opening, nose tip or bridge, cheek shape, eyebrows shape, etc. The facial deformation model we used is based on (Lee et al., 1995) and the resulting expressions are anatomically correct.
We augmented the 3D dataset of each enrolled subject through the synthesis of fifteen additional expressions, selected to represent typical face shape deformations due to the facial expressive muscles, each one included in the weighting mask. The fifteen variations of the neutral face are grouped in three different classes: "good-mood", "normal-mood" and "bad-mood" emotional status (see Figure 9).
We acquired three sets of front-side pairs of face images from 235 different persons, in three subjective facial expressions representing the "normal-mood", "good-mood" and "bad-mood" emotional status respectively (137 males and 98 females, age ranging from 19 to 65).
Figure 9 Facial Expressions grouped in normal-mood (first row), good-mood (second row), bad-mood (third row)
For the first group of experiments, we obtained a database of 235 3D face models in neutral pose (represented by the "normal-mood" status), each one augmented with fifteen expressive variations. Experimental results are generally good in terms of accuracy, showing a Recognition Rate of 100% using the expression weighting mask and flesh mask, the Gaussian function with σ = 4.5 and k = 50 and normal maps sized 128 × 128 pixels. These results are generally better than those obtained by many 2D algorithms, but a more meaningful comparison would require a face dataset featuring both 2D and 3D data. To this aim we experimented with a PCA-based 2D face recognition algorithm (Moon & Phillips, 1998; Martinez & Kak, 2001) on the same subjects. We trained the PCA-based recognition system with frontal face images acquired during several enrolment sessions (from 11 to 13 images for each subject), while the probe set is obtained from the same frontal images used to generate the 3D face mesh for the proposed method. This experiment has shown that our method produces better results than a typical PCA-based recognition algorithm on the same subjects. More precisely, the PCA-based method reached a recognition rate of 88.39% on gray-scale images sized 200 × 256 pixels, proving that the face dataset was really challenging.
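For reference, a minimal eigenfaces-style PCA baseline of the kind used in this comparison can be sketched as follows (an illustrative reconstruction, not the exact configuration of (Moon & Phillips, 1998); the number of components is an arbitrary choice).

```python
import numpy as np

def train_pca(train_images, n_components=40):
    """Eigenfaces-style PCA training. train_images: (N, H*W) array of
    flattened grey-scale images; returns the mean face and the top
    principal components (eigenfaces)."""
    mean = train_images.mean(axis=0)
    centered = train_images - mean
    # right singular vectors of the centered data are the eigenfaces
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def project(images, mean, eigenfaces):
    """Project images into the PCA subspace."""
    return (images - mean) @ eigenfaces.T

def identify(probe_features, gallery_features):
    """Nearest-neighbour identification by Euclidean distance."""
    d = np.linalg.norm(probe_features[:, None, :] - gallery_features[None, :, :], axis=2)
    return np.argmin(d, axis=1)
```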
Figure 10 shows the precision/recall improvement provided by the expression weighting mask and flesh mask. The results shown in Figure 10-a were achieved comparing, in one-to-many modality, a query set with one expressive variation to an answer set composed of one neutral face plus ten expression variations and one face with beard. In Figure 10-b the results of a one-to-many comparison between a subject with beard and an answer set composed of one neutral face and ten expressive variations are shown. Finally, for the test reported in Figure 10-c, the query was an expression variation or a face with beard, while the answer set could contain a neutral face plus ten associated expressive variations or a face with beard. The three charts clearly show the benefits involved with the use of both the expressive and flesh masks, especially when combined together.
The second group of experiments was conducted on the FRGC dataset rel. 2/Experiment 3s (only shape considered) to test the method's performance with respect to the Receiver Operating Characteristic (ROC) curve, which plots the False Acceptance Rate (FAR) against the Verification Rate (1 − False Rejection Rate, or FRR) for various decision thresholds. The 4007 faces provided in the dataset have undergone a pre-processing stage to allow our method to work effectively. The typical workflow included: mesh alignment using the embedded info provided by the FRGC dataset, such as outer eye corners, nose tip and chin prominence; mesh subsampling to one fourth of the original resolution; mesh cropping to eliminate unwanted detail (hair, neck, ears, etc.); normal map filtering by a 5 × 5 median filter to reduce capture noise and artifacts. Figure 11 shows the resulting ROC curves with typical ROC values at FAR = 0.001. The Equal Error Rate (EER) measured on the two galleries reaches 5.45% on our gallery and 6.55% on the FRGC dataset.
Figure 11 Comparison of ROC curves and Verification Rate at FAR=0.001
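To make the evaluation protocol concrete, a minimal sketch of how FAR, Verification Rate and EER can be computed from genuine and impostor score sets follows (our own illustration; variable names are ours).

```python
import numpy as np

def roc_points(genuine, impostor, thresholds):
    """FAR and Verification Rate (1 - FRR) for each decision threshold,
    given genuine and impostor similarity scores."""
    genuine, impostor = np.asarray(genuine), np.asarray(impostor)
    far = np.array([(impostor >= t).mean() for t in thresholds])
    vr = np.array([(genuine >= t).mean() for t in thresholds])
    return far, vr

def equal_error_rate(far, vr):
    """EER: the operating point where FAR equals FRR (= 1 - VR)."""
    frr = 1.0 - vr
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0
```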
Finally, we have tested the method in order to statistically evaluate its behaviour in recognizing the "emotional" status of the user. To this aim, we have performed a one-to-many comparison of a probe set of 3D face models representing real subjective mood status captured by camera (three facial expressions per person) with three gallery sets of artificial mood status generated automatically by the control rig based deformation system (fifteen facial expressions per person, grouped as shown in Figure 9). As shown in Table 1, the results are very interesting, because the mean recognition rate on the "good-mood" status gallery is 100%, while on the "normal-mood" and "bad-mood" status galleries it is 98.3% and 97.8% respectively (probably because of the propensity of people to make similar facial expressions for the "normal-mood" and "bad-mood" status).
5 Conclusions
This 3D face recognition method based on 3D geometry and color texture is aimed at improving robustness to the presence/absence of beard and to expressive variations. It proved to be simple and fast, and the experiments conducted showed a high average recognition rate and a measurable effectiveness of both the flesh mask and the expression weighting mask. Ongoing research will implement a true multi-modal version of the basic algorithm, with a second recognition engine dedicated to the color info (texture) which could further enhance the discriminating power.
6 References
Aarts, E. & Marzano, S. (2003). The New Everyday: Visions of Ambient Intelligence, 010 Publishing, Rotterdam, The Netherlands.
Acampora, G. & Loia, V. (2004). Fuzzy Control Interoperability for Adaptive Domotic Framework, Proceedings of the 2nd IEEE International Conference on Industrial Informatics (INDIN04), pp. 184-189, 24-26 June 2004, Berlin, Germany.
Acampora, G.; Loia, V.; Nappi, M. & Ricciardi, S. (2005). Human-Based Models for Smart Devices in Ambient Intelligence, Proceedings of the IEEE International Symposium on Industrial Electronics (ISIE 2005), pp. 107-112, June 20-23, 2005.
Basten, T. & Geilen, M. (2003). Ambient Intelligence: Impact on Embedded System Design, H. de Groot (Eds.), Kluwer Academic Pub., 2003.
Beumier, C. & Acheroy, M. (2000). Automatic Face Verification from 3D and Grey Level Cues, Proceedings of the 11th Portuguese Conference on Pattern Recognition (RECPAD 2000), May 2000, Porto, Portugal.
Blanz, V. & Vetter, T. (1999). A Morphable Model for the Synthesis of 3D Faces, Proceedings of SIGGRAPH 99, Los Angeles, CA, ACM, pp. 187-194, Aug. 1999.
Bowyer, K.W.; Chang, K. & Flynn, P.A. (2004). A Survey of 3D and Multi-Modal 3D+2D Face Recognition, Proceedings of the International Conference on Pattern Recognition (ICPR), 2004.
Bronstein, A.M.; Bronstein, M.M. & Kimmel, R. (2003). Expression-Invariant 3D Face Recognition, Proceedings of Audio and Video-Based Person Authentication (AVBPA 2003), LNCS 2688, J. Kittler and M.S. Nixon (Eds.), pp. 62-70, 2003.
Chang, K.I.; Bowyer, K. & Flynn, P. (2003). Face Recognition Using 2D and 3D Facial Data, Proceedings of the ACM Workshop on Multimodal User Authentication, pp. 25-32, December 2003.
Chang, K.I.; Bowyer, K.W. & Flynn, P.J. (2005). Adaptive Rigid Multi-Region Selection for Handling Expression Variation in 3D Face Recognition, Proceedings of the IEEE Workshop on Face Recognition Grand Challenge Experiments, June 2005.
Enciso, R.; Li, J.; Fidaleo, D.A.; Kim, T-Y.; Noh, J-Y. & Neumann, U. (1999). Synthesis of 3D Faces, Proceedings of the International Workshop on Digital and Computational Video (DCV'99), December 1999.
Hester, C.; Srivastava, A. & Erlebacher, G. (2003). A Novel Technique for Face Recognition Using Range Images, Proceedings of the Seventh Int'l Symposium on Signal Processing and Its Applications, 2003.
Lee, Y.; Terzopoulos, D. & Waters, K. (1995). Realistic Modeling for Facial Animation, Proceedings of SIGGRAPH 95, Los Angeles, CA, ACM, pp. 55-62, Aug. 1995.
Maltoni, D.; Maio, D.; Jain, A.K. & Prabhakar, S. (2003). Handbook of Fingerprint Recognition, Springer, New York.
Medioni, G. & Waupotitsch, R. (2003). Face Recognition and Modeling in 3D, Proceedings of the IEEE International Workshop on Analysis and Modeling of Faces and Gestures (AMFG 2003), pp. 232-233, October 2003.
Pan, G.; Han, S.; Wu, Z. & Wang, Y. (2005). 3D Face Recognition Using Mapped Depth Images, Proceedings of the IEEE Workshop on Face Recognition Grand Challenge Experiments, June 2005.
Papatheodorou, T. & Rueckert, D. (2004). Evaluation of Automatic 4D Face Recognition Using Surface and Texture Registration, Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 321-326, May 2004, Seoul, Korea.
Perronnin, G. & Dugelay, J.L. (2003). An Introduction to Biometrics and Face Recognition, Proceedings of IMAGE 2003: Learning, Understanding, Information Retrieval, Medical, Cagliari, Italy, June 2003.
Tsalakanidou, F.; Tzovaras, D. & Strintzis, M.G. (2003). Use of Depth and Colour Eigenfaces for Face Recognition, Pattern Recognition Letters, Vol. 24, No. 9-10, pp. 1427-1435, 2003.
Xu, C.; Wang, Y.; Tan, T. & Quan, L. (2004). Automatic 3D Face Recognition Combining Global Geometric Features with Local Shape Variation Information, Proceedings of the Sixth International Conference on Automated Face and Gesture Recognition, pp. 308-313, May 2004.
Wang, Y.; Chua, C. & Ho, Y. (2002). Facial Feature Detection and Face Recognition from 2D and 3D Images, Pattern Recognition Letters, Vol. 23, pp. 1191-1202, 2002.
Achieving Illumination Invariance using Image Filters
Ognjen Arandjelovic and Roberto Cipolla
Department of Engineering, University of Cambridge
UK
1 Introduction
In this chapter we are interested in accurately recognizing human faces in the presence of large and unpredictable illumination changes. Our aim is to do this in a setup realistic for most practical applications, that is, without overly constraining the conditions in which image data is acquired. Specifically, this means that people's motion and head poses are largely uncontrolled, the amount of available training data is limited to a single short sequence per person, and image quality is low.
In conditions such as these, invariance to changing lighting is perhaps the most significant practical challenge for face recognition algorithms. The illumination setup in which recognition is performed is in most cases impractical to control, its physics difficult to accurately model, and face appearance differences due to changing illumination are often larger than the differences between individuals [1]. Additionally, the nature of most real-world applications is such that prompt, often real-time system response is needed, demanding appropriately efficient as well as robust matching algorithms.
In this chapter we describe a novel framework for rapid recognition under varying illumination, based on simple image filtering techniques. The framework is very general and we demonstrate that it offers a dramatic performance improvement when used with a wide range of filters and different baseline matching algorithms, without sacrificing their computational efficiency.
1.1 Previous work and its limitations
The choice of representation, that is, the model used to describe a person's face, is central to the problem of automatic face recognition. Consider the components of a generic face recognition system schematically shown in Figure 1.
A number of approaches in the literature use relatively complex facial and scene models that explicitly separate extrinsic and intrinsic variables which affect appearance. In most cases, the complexity of these models makes it impossible to compute model parameters as a closed-form expression ("Model parameter recovery" in Figure 1). Rather, model fitting is performed through an iterative optimization scheme. In the 3D Morphable Model of Blanz and Vetter [7], for example, the shape and texture of a novel face are recovered through gradient descent by minimizing the discrepancy between the observed and predicted appearance. Similarly, in Elastic Bunch Graph Matching [8, 23], gradient descent is used to recover the placements of fiducial features, corresponding to bunch graph nodes and the locations of local texture descriptors. In contrast, the Generic Shape-Illumination Manifold method uses a genetic algorithm to perform a manifold-to-manifold mapping that preserves pose.
Figure 1 A diagram of the main components of a generic face recognition system. The "Model parameter recovery" and "Classification" stages can be seen as mutually complementary: (i) a complex model that explicitly separates extrinsic and intrinsic appearance variables places most of the workload on the former stage, while the classification of the representation becomes straightforward; in contrast, (ii) simplistic models have to resort to more statistically sophisticated approaches to matching
Figure 2 (a) The simplest generative model used for face recognition: images are assumed to consist of the low-frequency band that mainly corresponds to illumination changes, the mid-frequency band which contains most of the discriminative, personal information, and white noise. (b) The results of several of the most popular image filters operating under the assumption of the frequency model
One of the main limitations of this group of methods arises due to the existence of local minima, of which there are usually many. The key problem is that if the fitted model parameters correspond to a local minimum, classification is performed not merely on noise-contaminated but rather on entirely incorrect data. An additional unappealing feature of these methods is that it is also not possible to determine if model fitting failed in such a manner.
The alternative approach is to employ a simple face appearance model and put greater emphasis on the classification stage. This general direction has several advantages which make it attractive from a practical standpoint. Firstly, model parameter estimation can now be performed as a closed-form computation, which is not only more efficient, but also void of the issue of fitting failure such as can happen in an iterative optimization scheme. This allows for more powerful statistical classification, thus clearly separating well understood and explicitly modelled stages in the image formation process from those that are more easily learnt implicitly from training exemplars. This is the methodology followed in this chapter. The sections that follow describe the method in detail, followed by a report of experimental results.
2 Method details
2.1 Image processing filters
Most relevant to the material presented in this chapter are illumination-normalization methods that can be broadly described as quasi illumination-invariant image filters. These include high-pass [5] and locally-scaled high-pass filters [21], directional derivatives [1, 10, 13, 18], Laplacian-of-Gaussian filters [1], region-based gamma intensity correction filters [2, 17] and edge-maps [1], to name a few. These are most commonly based on very simple image formation models, for example modelling illumination as a spatially low-frequency band of the Fourier spectrum and identity-based information as high-frequency [5, 11], see Figure 2. Methods of this group can be applied in a straightforward manner to either single or multiple-image face recognition and are often extremely efficient. However, due to the simplistic nature of the underlying models, in general they do not perform well in the presence of extreme illumination changes.
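As a concrete illustration of two such filters, the Python sketch below (our own minimal rendition; the Self-Quotient Image as originally proposed uses an anisotropic, weighted smoothing kernel rather than the plain Gaussian assumed here) implements a simple high-pass filter and a self-quotient style locally-scaled variant.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def high_pass(image, sigma=2.0):
    """High-pass filter: remove the low-frequency band, assumed to carry
    mostly illumination rather than identity information."""
    return image - gaussian_filter(image, sigma)

def self_quotient(image, sigma=2.0, eps=1e-3):
    """Locally-scaled high-pass (self-quotient style): divide the image by
    its own smoothed estimate, so the result is approximately invariant to
    slowly varying multiplicative illumination."""
    return image / (gaussian_filter(image, sigma) + eps)
```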
2.2 Adapting to data acquisition conditions
The framework proposed in this chapter is motivated by our previous research and the findings first published in [3]. Four face recognition algorithms, the Generic Shape-Illumination method [3], the Constrained Mutual Subspace Method [12], the commercial system FaceIt and a Kullback-Leibler Divergence-based matching method, were evaluated on a large database using (i) raw greyscale imagery, (ii) high-pass (HP) filtered imagery and (iii) the Self-Quotient Image (QI) representation [21]. Both the high-pass and, even further, the Self-Quotient Image representations produced an improvement in recognition for all methods over raw greyscale, as shown in Figure 3, which is consistent with previous findings in the literature [1, 5, 11, 21].
Of importance to this work is that it was also examined in which cases these filters help, and how much, depending on the data acquisition conditions. It was found that recognition rates using greyscale and either the HP or the QI filter were negatively correlated (with ρ ≈ −0.7), as illustrated in Figure 4. This finding was observed consistently across the results of the four algorithms, all of which employ mutually drastically different underlying models.
Trang 30a) b) Figure 3 Performance of the (a) Mutual Subspace Method and the (b) Constrained Mutual Subspace Method using raw grey scale imagery, high-pass (HP) filtered imagery and the
Self-Quotient Image (QI), evaluated on over 1300 video sequences with extreme
illumination, pose and head motion variation (as reported in [3]) Shown are the average
performance and ± one standard deviation intervals
Figure 4 A plot of the performance improvement with HP and QI filters against the performance of unprocessed, raw imagery across different illumination combinations used in training and test. The tests are shown in the order of increasing raw data performance for easier visualization
This is an interesting result: it means that while on average both representations increase the recognition rate, they actually worsen it in "easy" recognition conditions when no normalization is needed. The observed phenomenon is well understood in the context of the energy of intrinsic and extrinsic image differences and noise (see [22] for a thorough discussion). Higher than average recognition rates for raw input correspond to small changes in imaging conditions between training and test, and hence lower energy of extrinsic variation. In this case, the two filters decrease the signal-to-noise ratio, worsening the performance, see Figure 5 (a). On the other hand, when the imaging conditions between training and test are very different, normalization of extrinsic variation is the dominant factor and performance is improved, see Figure 5 (b).
Figure 5 A conceptual illustration of the distribution of intrinsic, extrinsic and noise signal energies across frequencies in the cases when training and test data acquisition conditions are (a) similar and (b) different, before (left) and after (right) band-pass filtering
This is an important observation: it suggests that the performance of a method that uses either of the representations can be increased further by detecting the difficulty of the recognition conditions. In this chapter we propose a novel learning framework to do exactly this.
2.2.1 Adaptive framework
Our goal is to implicitly learn how similar the novel and training (or gallery) illumination conditions are, in order to appropriately emphasize either the raw input guided face comparisons or those of its filtered output.
Let $\{\mathcal{X}_i\}$ be a database of known individuals, $\mathcal{X}$ novel input corresponding to one of the gallery classes, and $\rho(\cdot)$ and $F(\cdot)$, respectively, a given similarity function and a quasi illumination-invariant filter. We then express the degree of belief $\mu$ that two face sets $\mathcal{X}_1$ and $\mathcal{X}_2$ belong to the same person as a weighted combination of similarities between the corresponding unprocessed and filtered image sets:

$$\mu = (1 - \alpha)\,\rho(\mathcal{X}_1, \mathcal{X}_2) + \alpha\,\rho\big(F(\mathcal{X}_1), F(\mathcal{X}_2)\big) \qquad (1)$$
In the light of the previous discussion, we want $\alpha^*$ to be small (closer to 0.0) when the novel and the corresponding gallery data have been acquired in similar illuminations, and large (closer to 1.0) when in very different ones. We show that $\alpha^*$ can be learnt as a function

$$\alpha^* = \alpha^*(\mu), \qquad (2)$$

where $\mu$ is the confusion margin, the difference between the similarities of the two gallery individuals most similar to the novel input. The value of $\alpha^*(\mu)$ can then be interpreted as statistically the optimal choice of the mixing coefficient $\alpha$ given the confusion margin $\mu$. Formalizing this we can write

$$\alpha^*(\mu) = \arg\max_{\alpha}\, p(\alpha, \mu) \qquad (6)$$
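In code, the resulting decision rule is straightforward; the sketch below is illustrative only, with alpha_star standing for the learnt α-function of Section 2.2.2, and fuses the raw and filtered similarity scores according to eq. (1).

```python
def adaptive_similarity(rho_raw, rho_filtered, alpha_star, confusion_margin):
    """Adaptive fusion of eq. (1): the learnt alpha-function, evaluated at
    the observed confusion margin, weights filtered vs. raw similarities."""
    a = alpha_star(confusion_margin)  # ~0 for similar illumination, ~1 otherwise
    return (1.0 - a) * rho_raw + a * rho_filtered
```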
2.2.2 Learning the α-function
To learn the α-function $\alpha^*(\mu)$ as defined in (3), we first need an estimate of the joint probability density $p(\alpha, \mu)$ as per (6). The main difficulty of this problem is of a practical nature: in order to obtain an accurate estimate using one of many off-the-shelf density estimation techniques, a prohibitively large training database would be needed to ensure a well sampled distribution of the variable $\mu$. Instead, we propose a heuristic alternative which, we will show, allows us to do this from a small training corpus of individuals imaged in various illumination conditions. The key idea that makes such a drastic reduction in the amount of training data possible is to use domain specific knowledge of the properties of $p(\alpha, \mu)$ in the estimation process.
Our algorithm is based on an iterative incremental update of the density, initialized as a uniform density over the domain $\alpha, \mu \in [0, 1]$, see Figure 7. Given a training corpus, we iteratively simulate matching of an "unknown" person against a set of provisional gallery individuals. In each iteration of the algorithm, these are randomly drawn from the offline training database. Since the ground truth identities of all persons in the offline database are known, we can compute the confusion margin $\mu(\alpha)$ for each $\alpha = k\,\Delta\alpha$, using the inter-personal similarity score defined in (1). The density is then incremented at each $(k\,\Delta\alpha, \mu(0))$ proportionally to $\mu(k\,\Delta\alpha)$, to reflect the goodness of a particular weighting in the simulated recognition.
The proposed offline learning algorithm is summarized in Figure 6, with a typical evolution of p(α, μ) shown in Figure 7.
The final stage of the offline learning in our method involves imposing the monotonicity constraint on α*(μ) and smoothing the result, see Figure 8.
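The learning procedure described above can be summarized in the following Python sketch; the simulation routine, grid resolutions, iteration count and smoothing kernel are illustrative assumptions rather than the authors' exact choices.

    import numpy as np

    def learn_alpha_function(simulate_margins, n_iters=550, n_bins=50, seed=0):
        # simulate_margins(rng, alphas) -> confusion margins mu(alpha) from one
        # simulated matching of an "unknown" person against provisional gallery
        # individuals drawn from the offline training set (caller-supplied).
        rng = np.random.default_rng(seed)
        alphas = np.linspace(0.0, 1.0, n_bins)
        mus = np.linspace(0.0, 1.0, n_bins)
        p = np.ones((n_bins, n_bins))           # uniform initial density on [0,1]^2
        for _ in range(n_iters):
            m = np.clip(simulate_margins(rng, alphas), 0.0, 1.0)
            j = min(np.searchsorted(mus, m[0]), n_bins - 1)  # column of mu(0)
            p[:, j] += m                        # increment (k d_alpha, mu(0)) in
                                                # proportion to mu(k d_alpha)
        a_star = alphas[np.argmax(p, axis=0)]   # optimal alpha per margin, eq. (6)
        a_star = np.maximum.accumulate(a_star)  # impose the monotonicity constraint
        a_star = np.convolve(a_star, np.ones(5) / 5, mode="same")  # smooth
        return mus, np.clip(a_star, 0.0, 1.0)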
Figure 7 The estimate of the joint density p(α, μ) through 550 iterations, for the band-pass filter used for the evaluation of the proposed framework in Section 3.1
Figure 8 A typical estimate of the α-function plotted against the confusion margin μ. The estimate shown was computed using 40 individuals in 5 illumination conditions for a Gaussian high-pass filter. As expected, α* assumes low values for small confusion margins and high values for large confusion margins (see (1))
3 Empirical evaluation
We evaluated the proposed framework on four databases:
• CamFace, with 100 individuals of varying age and ethnicity, and equally represented genders. For each person in the database we collected 7 video sequences of the person in arbitrary motion (significant translation, yaw and pitch, negligible roll), each in a different illumination setting, at 10 fps and 320 × 240 pixel resolution (face size ≈ 60 pixels), see Figures 9 (a) and 10.
• ToshFace, kindly provided to us by Toshiba Corp. This database contains 60 individuals of varying age, mostly male Japanese, and 10 sequences per person. Each sequence corresponds to a different illumination setting, at 10 fps and 320 × 240 pixel resolution (face size ≈ 60 pixels), see Figure 9 (b).
• Face Video, freely available and described in [14]. Briefly, it contains 11 individuals and 2 sequences per person, with little variation in illumination but extreme and uncontrolled variations in pose and motion, acquired at 25 fps and 160 × 120 pixel resolution (face size ≈ 45 pixels), see Figure 9 (c).
• Faces96, the most challenging subset of the University of Essex face database, freely available from http://cswww.essex.ac.uk/mv/allfaces/faces96.html. It contains 152 individuals, most 18-20 years old, and a single 20-frame sequence per person at 196 × 196 pixel resolution (face size ≈ 80 pixels). The users were asked to approach the camera while performing arbitrary head motion. Although the illumination was kept constant throughout each sequence, there is some variation in the manner in which faces were lit due to the change in the relative position of the user with respect to the lighting sources, see Figure 9 (d).
For each database except Faces96, we trained our algorithm using a single sequence per person and tested against a single other sequence per person, acquired in a different session (for CamFace and ToshFace, different sessions correspond to different illumination conditions). Since the Faces96 database contains only a single sequence per person, we used frames 1-10 of each sequence for training and frames 11-20 for testing. Since each video sequence in this database corresponds to a person walking towards the camera, this maximizes the variation in illumination, scale and pose between training and test, thus maximizing the recognition challenge.
Offline training, that is, the estimation of the α-function (see Section 2.2.2), was performed using 40 individuals and 5 illuminations from the CamFace database. We emphasize that these were not used as test input for the evaluations reported in the following section.
Data acquisition. The discussion so far has focused on recognition using fixed-scale face images. Our system uses a cascaded detector [20] for localization of faces in cluttered images, which are then rescaled to the uniform resolution of 50 × 50 pixels (approximately the average size of detected faces in our data set).
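A minimal sketch of this extraction step follows, using the standard OpenCV cascade detector as a stand-in for the detector of [20]; the cascade file and detection parameters are assumptions.

    import cv2

    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def extract_face(frame_gray):
        # Localize the face in a cluttered greyscale frame and rescale it to
        # the uniform 50 x 50 resolution used by all subsequent filters.
        faces = detector.detectMultiScale(frame_gray, scaleFactor=1.1,
                                          minNeighbors=5)
        if len(faces) == 0:
            return None
        x, y, w, h = faces[0]
        return cv2.resize(frame_gray[y:y + h, x:x + w], (50, 50))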
Methods and representations. The proposed framework was evaluated using the following filters, illustrated in Figure 11 (a code sketch follows the list):
• Gaussian high-pass filtered images [5, 11] (HP):
X_H = X − (X ∗ G_σ), (7)
• local intensity-normalized high-pass filtered images, similar to the Self-Quotient Image [21] (QI):
X_Q = X_H / (X ∗ G_σ), (8)
the division being element-wise,
• distance-transformed edge maps [3, 9] (ED):
X_E = DistanceTransform(Canny(X)), (9)
• Laplacian-of-Gaussian filtered images [1] (LG):
X_L = X ∗ ∇²G_σ, (10) and
• directional grey-scale derivatives [1, 10] (DX, DY):
X_x = X ∗ ∂G_σ/∂x, X_y = X ∗ ∂G_σ/∂y.
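The following sketch implements plausible versions of these representations; the Gaussian scale, Canny thresholds and derivative kernels are illustrative choices, as the chapter does not restate the exact parameters.

    import numpy as np
    import cv2

    def representations(X, sigma=1.5):
        # X: greyscale face image with values in [0, 255].
        X = X.astype(np.float32)
        low = cv2.GaussianBlur(X, (0, 0), sigma)        # smoothed image X * G_sigma
        hp = X - low                                    # (7) Gaussian high-pass
        qi = hp / (low + 1e-6)                          # (8) locally normalized HP,
                                                        #     division element-wise
        edges = cv2.Canny(X.astype(np.uint8), 50, 150)  # edge map
        ed = cv2.distanceTransform(255 - edges,         # (9) distance-transformed
                                   cv2.DIST_L2, 3)      #     edge map
        lg = cv2.Laplacian(low, cv2.CV_32F)             # (10) Laplacian-of-Gaussian
        dx = cv2.Sobel(X, cv2.CV_32F, 1, 0)             # DX: horizontal derivative
        dy = cv2.Sobel(X, cv2.CV_32F, 0, 1)             # DY: vertical derivative
        return {"HP": hp, "QI": qi, "ED": ed, "LG": lg, "DX": dx, "DY": dy}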
Figure 9 Frames from typical video sequences from the four databases used for evaluation: (a) Cambridge Face Database, (b) Toshiba Face Database, (c) Face Video Database and (d) Faces96 Database
For baseline classification, we used two canonical correlation-based [15] methods:
• Constrained MSM (CMSM) [12], used in the state-of-the-art commercial system FacePass® [19], and
• the Mutual Subspace Method (MSM) [12].
These were chosen as fitting the main premise of the chapter, owing to their efficiency, numerical stability and generalization robustness [16]. Specifically, we (i) represent each head-motion video sequence as a linear subspace, estimated using PCA from appearance images, and (ii) compare two such subspaces by computing the first three canonical correlations between them using the method of Björck and Golub [6], that is, as the singular values of the matrix B1^T B2, where B1 and B2 are orthonormal basis matrices of the two linear subspaces.
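A compact sketch of this comparison follows, under the assumption of, say, 9-dimensional PCA subspaces (the chapter does not restate the dimensionality here).

    import numpy as np

    def canonical_correlations(A, B, n_pc=9, n_cc=3):
        # A, B: (n_frames, n_pixels) arrays of rasterized face images.
        def basis(S, d):
            S = S - S.mean(axis=0)                 # PCA via SVD of centred data
            _, _, Vt = np.linalg.svd(S, full_matrices=False)
            return Vt[:d].T                        # orthonormal basis (n_pixels, d)
        B1, B2 = basis(A, n_pc), basis(B, n_pc)
        sv = np.linalg.svd(B1.T @ B2, compute_uv=False)
        return np.clip(sv[:n_cc], 0.0, 1.0)        # first canonical correlations

The mean of the returned correlations can then serve as the set similarity ρ(·, ·) of eq. (1).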
Figure 10 (a) Illuminations 1-7 from database FaceDB100 and (b) illuminations 1-10 from database FaceDB60
Figure 11 Examples of the evaluated face representations: raw grey-scale input (RW), high-pass filtered data (HP), the Quotient Image (QI), distance-transformed edge map (ED), Laplacian-of-Gaussian filtered data (LG) and the two principal-axis derivatives (DX and DY)
3.1 Results
To establish baseline performance, we first performed recognition with both MSM and CMSM using raw data. A summary is shown in Table 1. As these results illustrate, the CamFace and ToshFace data sets were found to be very challenging, primarily due to extreme variations in illumination. The performance on the Face Video and Faces96 databases was significantly better. This can be explained by noting that the first major source of appearance variation present in these sets, the scale, is normalized for in the data extraction stage; the remaining appearance variation is dominated by pose changes, to which MSM and CMSM are particularly robust [4, 16].
Next, we evaluated the two methods with each of the 6 filter-based face representations. The recognition results for the CamFace, ToshFace and Faces96 databases are shown in blue in Figure 12, while the results on the Face Video data set are shown separately in Table 2 for ease of visualization. Confirming the first premise of this work, as well as previous research findings, all of the filters produced an improvement in average recognition rates. Little interaction between method/filter combinations was found, with Laplacian-of-Gaussian and the horizontal intensity derivative producing the best results, bringing the average and best recognition error rates down to 12% and 9%, respectively.
Figure 12 Error rate statistics for (a) CamFace, (b) ToshFace and (c) Faces96. The proposed framework (-AD suffix) dramatically improved recognition performance on all method/filter combinations, as witnessed by the reduction in both error rate averages and their standard deviations. The results of CMSM on Faces96 are not shown as it performed perfectly on this data set
Table 1 Mean error rates (%) of the baseline methods; columns: CamFace, ToshFace, FaceVideoDB, Faces96, Average
Table 2 FaceVideoDB, mean error (%)
Finally, in the last set of experiments, we employed each of the 6 filters in the proposed data-adaptive framework. The recognition results are shown in red in Figure 12 and in Table 2 for the Face Video database. The proposed method produced a dramatic performance improvement in the case of all filters, reducing the average recognition error rate to only 3% for the CMSM/Laplacian-of-Gaussian combination. This is a very high recognition rate for such unconstrained conditions (see Figure 9), such a small amount of training data per gallery individual, and such a degree of illumination, pose and motion pattern variation between different sequences. An improvement in robustness to illumination changes can also be seen in the significantly reduced standard deviation of the recognition error rates, as shown in Figure 12. Finally, it should be emphasized that the demonstrated improvement is obtained with a negligible increase in computational cost, as all time-demanding learning is performed offline.
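To make the overall flow concrete, a hypothetical end-to-end matching sketch ties the pieces together; evaluating the confusion margin on the raw similarities mirrors the use of μ(0) during offline learning, but the exact online procedure here is our assumption.

    import numpy as np

    def recognize(novel, gallery, mu_grid, alpha_tab, rho, F):
        # Raw and filtered similarities against every gallery individual.
        raw = np.array([rho(novel, g) for g in gallery])
        flt = np.array([rho(F(novel), F(g)) for g in gallery])
        top2 = np.sort(raw)[-2:]
        mu = top2[1] - top2[0]                        # confusion margin, raw scores
        a = float(np.interp(mu, mu_grid, alpha_tab))  # learnt alpha*(mu)
        fused = (1.0 - a) * raw + a * flt             # fused belief, eq. (1)
        return int(np.argmax(fused)), fused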
4 Conclusions
In this chapter we described a novel framework for automatic face recognition in the presence of varying illumination, primarily applicable to matching face sets or sequences. The framework is based on simple image processing filters that compete with unprocessed greyscale input to yield a single matching score between individuals. By performing all numerically demanding computation offline, our method (i) retains the matching efficiency of simple image filters, while (ii) achieving greatly increased robustness, as all online processing is performed in closed form. Evaluated on a large, real-world data corpus, the proposed framework was shown to be successful in video-based recognition across a wide range of illumination, pose and face motion pattern changes.