Active Appearance Models for Face Recognition
Paul Ivan ivan.paul@gmail.com
Supervisor: dr. Sandjai Bhulai
April 4, 2007
Vrije Universiteit Amsterdam Faculteit der Exacte Wetenschappen Business Mathematics & Informatics
De Boelelaan 1081a
1081 HV Amsterdam
A growing number of applications are starting to use face recognition as the initial step towards interpreting human actions, intentions, and behaviour, and as a central part of next-generation smart environments. Recognition of facial expressions is an important example of face-recognition techniques used in these smart environments. In order to be able to recognize faces, there are some difficulties to overcome. Faces are highly variable, deformable objects, and can have very different appearances in images depending on pose, lighting, expression, and the identity of the person. Besides that, face images can have different backgrounds and differences in image resolution, contrast, brightness, sharpness, and colour balance.

This paper describes a model-based approach, called Active Appearance Models, for the interpretation of face images, capable of overcoming these difficulties. This method is capable of 'explaining' the appearance of a face in terms of a compact set of model parameters. Once derived, this model gives various applications the opportunity to use it for further investigations of the modelled face (such as characterising the pose, expression, or identity of a face). The second part of this paper describes some variations on Active Appearance Models aimed at increasing their performance and computational speed.
This paper was written as part of the master Business Mathematics and Informatics at the Vrije Universiteit, Amsterdam. The main goal of this assignment is to write a clear and concise paper on a certain scientific problem, with a knowledgeable manager as the target audience.
I want to thank dr. Sandjai Bhulai for helping me define a good subject for this paper and for his comments during the writing process.

Paul Ivan
Amsterdam, April 4, 2007
Contents

1 Introduction
2 Active Appearance Models
  2.1 Statistical Shape Models
  2.2 Statistical Texture Models
  2.3 The Combined Appearance Model
  2.4 The Active Appearance Search Algorithm
  2.5 Multi-resolution Implementation
  2.6 Example of a Run
3 Variations on the AAMs
  3.1 Sub-sampling during Search
  3.2 Search Using Shape Parameters
  3.3 Direct AAMs
  3.4 Compositional Approach
4 Experimental Results
  4.1 Sub-sampling vs Shape vs Basic
  4.2 Comparative Performance
Chapter 1

Introduction
Researchers today are actively building smart environments. These environments, such as rooms, cars, offices, and stores, are equipped with smart visual, audio, and touch-sensitive applications. The key goal of these applications is usually to give machines perceptual abilities that allow them to function naturally with people: to recognize the people and remember their preferences and characteristics, to know what they are looking at, and to interpret their words, gestures, and unconscious cues such as vocal prosody and body language [7].
A growing number of applications are starting to use face recognition as the initial step towards interpreting human actions, intentions, and behaviour, and as a central part of next-generation smart environments. Many of the actions and behaviours humans display can only be interpreted if you also know the person's identity, and the identity of the people around them.
Recognition of facial expressions is an important example of face-recognition techniques used in these smart environments. It can, for example, be useful for a smart system to know whether the user looks impatient because information is being presented too slowly, or confused because it is going too fast. Facial expressions provide clues for identifying and distinguishing between these different moods. In recent years, much effort has been put into the area of recognizing facial expressions, a capability that is critical for a variety of human-machine interfaces, with the hope of creating person-independent expression-recognition capability. Other examples of face-recognition techniques are recognizing the identity of a face/person or characterizing the pose of a face.
Various fields could benefit from systems capable of automatically extracting this kind of information from images (or sequences of images, like a video stream). For example, a store equipped with a smart system capable of expression recognition could benefit from this information in several ways.
Such a system could monitor the reaction of people to certain advertisements or products in the store or, the other way around, adjust the in-store advertisements based on the expressions of the customers. In the same manner, marketing research could be done with cameras monitoring the reaction of people to products. Face-recognition techniques aimed at recognizing the identity of a person could help such a store when a valued repeat customer enters the store.
Other examples are behaviour monitoring in an eldercare or childcare facility, and command-and-control interfaces in a military or industrial setting. In each of these applications, identity information is crucial in order to provide machines with the background knowledge needed to interpret measurements and observations of human actions.
Goals and Overview. In order to be able to recognize faces, there are some difficulties to overcome. Faces are highly variable, deformable objects, and can have very different appearances in images depending on pose, lighting, expression, and the identity of the person. Besides that, face images can have different backgrounds and differences in image resolution, contrast, brightness, sharpness, and colour balance. This means that the interpretation of such images/faces requires the ability to understand this variability in order to extract useful information, and this extracted information must be of some manageable size, because a typical face image is far too large to use directly for any classification task.
Another important feature of face-recognition techniques is real-time applicability. For an application in a store, as described above, to be successful, the system must be fast enough to capture all the relevant information derived from video images. If the computation takes too long, the person might be gone, or might have a different expression. The need for real-time applicability thus demands high performance and efficiency of applications for face recognition.

This paper describes a model-based approach for the interpretation of face images, capable of overcoming these difficulties. This method is capable of 'explaining' the appearance of a face in terms of a compact set of model parameters. The created models are realistic-looking faces, closely resembling the original face depicted in the face image. Once derived, this model gives various applications the opportunity to use it for further investigations of the modelled face (such as characterising the pose, expression, or identity of a face).
This method, called Active Appearance Models, is described in its basic form in Chapter 2. Because of the need for real-time applications using this technology, variations on the basic form aimed at increasing the performance and the computational speed are discussed in Chapter 3. Some experimental results of comparative tests between the basic form and the variations are presented in Chapter 4. Finally, a general conclusion/discussion is given in Chapter 5.
Chapter 2

Active Appearance Models
The Active Appearance Model, as described by Cootes, Taylor, and Edwards (see [1] and [6]), requires a combination of statistical shape and texture models to form a combined appearance model. This combined appearance model is then trained with a set of example images. After training the model, new images can be interpreted using the Active Appearance Search Algorithm. This chapter describes these models in detail, mostly following the work of [1], [6], and [5].
2.1 Statistical Shape Models
The statistical shape model is used to represent objects in images. A shape is described by a set of n points, called landmarks, often in 2D or 3D space. The goal of the statistical shape model is to derive a model which allows us both to analyze new shapes and to synthesize shapes similar to those in the training set. The training set is often generated by hand annotation of a set of training images; an example of such a hand-annotated image can be seen in Figure 2.1. By analyzing the variations in shape over the training set, a model is built which can mimic this variation.
If, in the two-dimensional case, a shape is defined by n points, we represent the shape by a 2n-element vector formed by concatenating the elements of the individual point positions:

x = (x_1, ..., x_n, y_1, ..., y_n)^T.
If we have a training set of s training examples, we generate s such vectors x_i, in which x_i is the shape vector of shape i. Now, because faces in the images in the training set can be at different positions, of different sizes, and with different orientations, we wish to align the training set before we perform
Figure 2.1: A hand-annotated face.
statistical analysis. The most popular approach is to align the shapes by minimizing the sum of distances of each shape to the mean shape vector, x̄, over all s shape vectors:
1. Translate each example so that its center of gravity is at the origin.
2. Choose one example as an initial estimate of the mean shape, and scale it so that ||x̄|| = 1.
3. Record the first estimate as x̄_i, with i = 0, to define the default reference frame.

4. Align all the shapes with the current estimate of the mean shape.

5. Re-estimate the mean from the aligned shapes.

6. Apply constraints to the current estimate of the mean by aligning it with x̄_0 and scaling so that ||x̄_{i+1}|| = 1; set i = i + 1 and record this estimate as x̄_i.

7. If not converged, return to step 4. (Convergence is declared if the estimate of the mean does not change significantly after an iteration.)
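The iterative alignment above can be sketched in code. This is a simplified illustration (Python with NumPy is assumed, since the paper prescribes no implementation language): alignment here consists only of centring and scaling, with rotation alignment omitted for brevity, and the function name is illustrative.

```python
import numpy as np

def align_shapes(shapes, max_iter=100, tol=1e-10):
    """Iteratively align a set of shapes (an s x n x 2 array) to their mean.

    Simplified sketch of the algorithm in the text: translation and
    scale alignment only; rotation alignment is omitted.
    """
    shapes = np.asarray(shapes, dtype=float)
    # Step 1: translate each shape so its center of gravity is at the origin.
    shapes -= shapes.mean(axis=1, keepdims=True)
    # Steps 2-3: initial estimate of the mean, scaled to unit norm.
    mean = shapes[0] / np.linalg.norm(shapes[0])
    for _ in range(max_iter):
        # Step 4: align all shapes to the current mean (scale only here).
        aligned = np.array([s / np.linalg.norm(s) for s in shapes])
        # Step 5: re-estimate the mean from the aligned shapes.
        new_mean = aligned.mean(axis=0)
        # Step 6: constrain the new mean estimate to unit norm.
        new_mean /= np.linalg.norm(new_mean)
        # Step 7: declare convergence if the mean no longer changes.
        if np.linalg.norm(new_mean - mean) < tol:
            break
        mean = new_mean
    return aligned, mean
```

With rotation included, each step 4 would solve a small Procrustes problem per shape; the overall loop structure stays the same.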
We now have a set of s shape vectors x_i, aligned into a common co-ordinate frame. These vectors form a distribution in the 2n-dimensional space in which they live. We wish to model this distribution to be able to generate new examples similar to those in the training set, and to be able to examine new shapes to decide whether they are plausible examples.

We would like to have a parametrized model M of the form x = M(b), where b is a vector of the parameters of the model. To be able to derive such a model, we first reduce the dimensionality of the data from 2n to a more manageable size. This is done by applying Principal Component Analysis (PCA). PCA extracts the main features of the data by seeking the direction in the feature space which accounts for the largest amount of variance in the data set, allowing for possible correlations between variables. This direction (the first principal component) becomes the first axis of the new feature space. The process is repeated to derive the second principal component, and so on, until either all variance is explained in the new feature space or the total explained variance exceeds a certain threshold, determining the number of retained components l. The approach is as follows:
1. Compute the mean of the data,

   x̄ = (1/s) Σ_{i=1}^{s} x_i.

2. Compute the covariance matrix of the data,

   S = (1/(s−1)) Σ_{i=1}^{s} (x_i − x̄)(x_i − x̄)^T.

3. Compute the eigenvectors φ_i and the corresponding eigenvalues λ_{s,i} of S (sorted such that λ_{s,i} ≥ λ_{s,i+1}).
Then, if P_s contains the l eigenvectors corresponding to the largest eigenvalues, we can approximate any shape vector x of the training set using:

x ≈ x̄ + P_s b_s,   with b_s = P_s^T (x − x̄).

Now we have the parametrized form, in which the vector b_s defines the set of parameters of the model. By the use of Principal Component Analysis we have reduced the number of parameters per shape from 2n to l, with l < 2n. Depending on l this can be a significant reduction in dimensionality. By varying the elements of b_s we can vary the shape. The variance of the i-th parameter b_i across the training set is given by λ_{s,i}. By applying limits of ±3√λ_{s,i} to the parameters of b_s, we ensure that the generated shape is similar to those in the original training set. The number of parameters in b_s is defined as the number of modes of variation of the shape model.
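The steps above can be sketched as follows (Python/NumPy assumed; the function names and the variance threshold are illustrative choices, not prescribed by the text):

```python
import numpy as np

def build_shape_model(X, var_threshold=0.98):
    """Build a PCA shape model from aligned shape vectors.

    X is an (s x 2n) matrix, one 2n-dimensional shape vector per row.
    Returns the mean shape, the l retained eigenvectors P_s, and their
    eigenvalues, with l chosen so the retained variance fraction
    exceeds var_threshold.
    """
    mean = X.mean(axis=0)
    S = np.cov(X - mean, rowvar=False)          # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]           # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    frac = np.cumsum(eigvals) / eigvals.sum()   # explained-variance fraction
    l = int(np.searchsorted(frac, var_threshold)) + 1
    return mean, eigvecs[:, :l], eigvals[:l]

def shape_from_params(mean, P_s, eigvals, b_s):
    """Synthesize a shape x = mean + P_s b_s, clamping each parameter
    to +/- 3 sqrt(lambda_i) so the result stays plausible."""
    b_s = np.clip(b_s, -3 * np.sqrt(eigvals), 3 * np.sqrt(eigvals))
    return mean + P_s @ b_s
```

Setting b_s = 0 reproduces the mean shape; varying one element of b_s within its ±3√λ limits sweeps out one mode of variation.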
2.2 Statistical Texture Models
To be able to synthesize a complete image of an object, we would like to include the texture information of an image. By 'texture' we mean the pattern of intensities or colours across an image patch.
Given an annotated training set, we can generate a statistical model of shape variation from the points. Given a mean shape, we can warp each training image into the mean shape to obtain a 'shape-free' patch. From that we can build a statistical model of the texture variation in this patch. Warping a training image means changing the image so that its control points match the mean shape (using a triangulation algorithm, see Appendix F of [6]). This is done to remove spurious texture variations due to shape differences. We then sample the intensity information from the shape-normalized image over the region covered by the mean shape to form a texture vector, g_image.
To minimize the effect of global lighting, the shape-free patches should be photometrically aligned; in other words, the shape-free patches should be normalized. This is done by minimizing the sum of squared distances E_g between each texture vector and the mean of the aligned vectors ḡ, using offsetting (changing the brightness) and scaling (changing the contrast) of the entire shape-free patch.

E_g is minimized using the transformation g_i = (g_image − β·1)/α, where α is the scaling factor and β is the offset:

α = g_image · ḡ,   (2.7)
β = (g_image · 1)/n,   (2.8)

where n is the number of elements in the vector and 1 is a vector of ones.
Obtaining the mean of the normalized data is a recursive process, as the normalization is defined in terms of the mean. This can be solved by an iterative algorithm: use one of the examples as the first estimate of the mean, align the others to it (using 2.7 and 2.8), re-estimate the mean, calculate E_g, and keep iterating until E_g has converged (does not get smaller anymore).
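This iteration can be sketched as follows (Python/NumPy assumed). Note one added assumption not spelled out in the text: the mean estimate is re-normalized to zero mean and unit norm at each step, which keeps the scale and offset of the solution fixed; the function name is illustrative.

```python
import numpy as np

def normalize_textures(G, max_iter=50, tol=1e-12):
    """Photometrically align texture vectors (rows of G) to their mean.

    Each g_image is mapped to (g_image - beta*1)/alpha with
    alpha = g_image . g_bar and beta = (g_image . 1)/n, as in
    equations (2.7) and (2.8); the mean g_bar is re-estimated
    until the error E_g stops shrinking.
    """
    G = np.asarray(G, dtype=float)
    n = G.shape[1]
    # First estimate of the mean: one example, zero mean and unit norm.
    g_bar = G[0] - G[0].mean()
    g_bar /= np.linalg.norm(g_bar)
    prev_err = np.inf
    for _ in range(max_iter):
        alpha = G @ g_bar               # per-example scaling (2.7)
        beta = G.sum(axis=1) / n        # per-example offset (2.8)
        aligned = (G - beta[:, None]) / alpha[:, None]
        # Re-estimate the mean and re-normalize it (added assumption).
        new_mean = aligned.mean(axis=0)
        new_mean -= new_mean.mean()
        new_mean /= np.linalg.norm(new_mean)
        err = np.sum((aligned - new_mean) ** 2)  # E_g
        if prev_err - err < tol:
            break
        g_bar, prev_err = new_mean, err
    return aligned, g_bar
```

On real face textures, which are strongly correlated, the scaling factors α stay well away from zero; on arbitrary synthetic data this need not hold.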
The next step is to apply PCA to the normalized data, in a similar manner as with the shape models. This results in:

g = ḡ + P_g b_g,   (2.9)

in which P_g contains the k eigenvectors corresponding to the largest eigenvalues λ_{g,i}, and b_g holds the grey-level parameters of the model. The number of parameters is called the number of texture modes. The elements of b_g are again bound by limits of ±3√λ_{g,i}.
If we represent the normalization parameters α and β in a vector u = (α − 1, β)^T, write u = (u_1, u_2)^T, and recall that g = (g_image − β·1)/α, the transformation from g to g_image is:

g_image = T_u(g) = (1 + u_1)g + u_2·1.

Now we can generate the texture in the image in the following manner:

g_image ≈ T_u(ḡ + P_g b_g) = (1 + u_1)(ḡ + P_g b_g) + u_2·1.   (2.12)
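Equation (2.12) is a one-liner in code (Python/NumPy assumed; the function name is illustrative):

```python
import numpy as np

def generate_texture(g_bar, P_g, b_g, u):
    """Generate an image texture via equation (2.12):
    g_image = (1 + u1) * (g_bar + P_g b_g) + u2 * 1,
    where u = (alpha - 1, beta) collects the normalization parameters."""
    g = g_bar + P_g @ b_g            # model texture in the normalized frame
    return (1.0 + u[0]) * g + u[1]   # undo the scaling and the offset
```

With u = (0, 0), i.e. α = 1 and β = 0, the generated texture is just the model texture ḡ + P_g b_g.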
2.3 The Combined Appearance Model
The appearance model combines both the shape model and the texture model. It does this by combining the parameter vectors b_s and b_g to form a combined parameter vector b_sg. Because these vectors are of a different nature, and thus of a different relevance, one of them is weighted:

b_sg = (W_s b_s, b_g)^T,

where W_s is a diagonal matrix of weights for the shape parameters.
A simpler alternative is to set W_s = rI, where r² is the ratio of the total intensity variation to the total shape variation (in the normalized frames). Note that we have already calculated the intensity variation and the shape variation in the form of the eigenvalues λ_{s,i} and λ_{g,i} of the covariance matrices of the shape vectors and the intensity vectors. Thus:

r² = (Σ_i λ_{g,i}) / (Σ_i λ_{s,i}).

A further PCA is then applied to the set of combined vectors b_sg, giving

b_sg = P_c c,
where P_c contains the eigenvectors belonging to the m largest eigenvalues of the covariance matrix of the combined and weighted texture and shape modes b_sg. The vector c is a vector of appearance parameters controlling both the shape and the grey-levels of the model, defined as the appearance modes of variation. Note that the dimension of the vector c is smaller, since m ≤ l + k. Now from this model we can extract an approximation of the original shape and texture information by calculating:

x = x̄ + P_s W_s^{-1} P_cs c,   g = ḡ + P_g P_cg c,   (2.17)

where P_cs and P_cg are the shape and texture parts of P_c.
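Equation (2.17) can be sketched as follows (Python/NumPy assumed; the argument names mirror the symbols in the equation, and all inputs here are illustrative):

```python
import numpy as np

def split_appearance(c, mean_x, mean_g, P_s, P_g, W_s, P_cs, P_cg):
    """Recover shape and texture from the appearance parameters c,
    following equation (2.17):
        x = mean_x + P_s W_s^-1 P_cs c
        g = mean_g + P_g P_cg c
    P_cs and P_cg are the shape and texture sub-blocks of the combined
    eigenvector matrix P_c."""
    x = mean_x + P_s @ np.linalg.inv(W_s) @ P_cs @ c
    g = mean_g + P_g @ P_cg @ c
    return x, g
```

Setting c = 0 reproduces the mean shape and the mean texture, so a single appearance vector c drives both halves of the model at once.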
Figure 2.2: Result of varying the first three appearance modes
The elements of the appearance vector c are referred to as the appearance modes.
Figure 2.2 (taken from [5]) shows the effect of varying the first three (the most significant) appearance modes. Note that the image in the middle, at zero, is the mean face (derived from a particular training set). From this image we can clearly see how the first two modes affect both the shape and the texture information of the face model. Note that the composition of the training set, the amount of variance retained in each step, and the weighting of shape versus texture information determine which appearance modes are most significant and what these modes look like (or what their influence is).