
MINISTRY OF EDUCATION AND TRAINING

Duy Tan University

Le Nguyen Bao

The Online Video Contextual

Advertisement User-Oriented System

using Video-based Recognition

Doctor of Philosophy of Computer Science

Da Nang - 2017


MINISTRY OF EDUCATION AND TRAINING

Duy Tan University

Le Nguyen Bao

The Online Video Contextual

Advertisement User-Oriented System

using Video-based Recognition

Major: Computer Science
Code: 62.46.01.10

Doctor of Philosophy of Computer Science

Scientific supervisor: Associate Professor Do Nang Toan

Da Nang - 2017


Declaration of Original Work

I, Le Nguyen Bao, hereby declare that the work entitled The Online Video Contextual Advertisement User-Oriented System using Video-based Recognition is my original work. I have not copied from any other postgraduate's work or from any other sources except where due reference or acknowledgment is made explicitly in the text, nor has any part been written for me by another person.

Ph.D candidate’s signature

Le Nguyen Bao

Contents

1.1 Face detection and tracking 4

1.1.1 Face detection 4

1.1.2 Face tracking 5

1.2 Face recognition 7

1.3 Video-based face recognition 10

1.3.1 Haar cascade classifiers 12

1.3.2 Kalman filter 16

1.3.3 Discrete cosine transform 18

1.3.4 K-Means clustering 21

1.3.5 K-Nearest neighbors classification 22

1.4 Ant Colony Optimization 24

1.4.1 The Ant Colony Optimization Meta-heuristic 25

1.4.1.1 Construct Ant Solutions 27

1.4.1.2 Apply Local Search 27

1.4.1.3 Update Pheromones 28

1.4.2 Main ACO Algorithms 29

1.4.2.1 Ant System 29

1.4.2.2 Ant Colony System 30

1.4.2.3 MAX-MIN Ant System 31

1.4.3 Applications of Ant Colony Optimization 32

1.4.3.1 Applications to NP-Hard Problems 32

1.4.3.2 Applications to Telecommunication Networks 33

1.4.3.3 Applications to Industrial Problems 33

1.5 Summary 34

2 The Contextual Advertising based on Face Recognition Overview 36


2.1 Online Advertisement 37

2.1.1 What is online advertising? 37

2.1.1.1 Definitions 38

2.1.1.2 Traditional advertising vs Online advertising 40

2.1.2 Advertising Metrics 42

2.1.3 Key Elements 42

2.2 Contextual Advertising 43

2.2.1 Purchase Funnel 44

2.2.2 Types of Advertising 45

2.2.3 Payment Models 46

2.2.4 Research Challenges and Opportunities 48

2.3 Display Advertising 51

2.3.1 Research Challenges and Opportunities in Display Advertising 52
2.3.2 New Trends and Issues 54

2.4 Elevator Advertising 55

2.4.1 Creative Elevator Advertisements 55

2.4.2 Elevator Screen 56

2.5 Related Works 59

2.6 Summary 62

3 Some Techniques Improve the Efficiency of Face Recognition 63
3.1 Video-based face recognition used FS problem 63

3.1.1 Video-based face recognition problem 63

3.1.2 Feature selection problem 64

3.2 Our framework of face recognition 65

3.2.1 Pseudo Zernike Moment Invariant 66

3.2.2 Discrete Wavelet Transform 68

3.2.3 k-Nearest Neighbor Classifier 69

3.3 MMAS proposed for feature selection problem 71

3.3.1 Construct Ant solutions 71

3.3.2 Update pheromones 71

3.3.3 Our algorithm implementation 74

3.4 Experiment and Results 75

3.4.1 Experiment implementation 75

3.4.1.1 ORL database 75

3.4.1.2 AR database 76

3.4.1.3 FERET database 77

3.4.1.4 GEORGIA TECH database 78

3.4.1.5 LFW database 79

3.4.2 Case study 1 80

3.4.3 Case study 2 82

3.4.4 Case study 3 82

3.4.5 Case study 4 85

3.5 Summary 86


4 The Online Video Contextual Advertisement User-Oriented System using Video-based Recognition Elevator 87

4.1 Framework for online video contextual advertisement user-oriented in elevator system 87

4.1.1 Identifying and classifying objects based on images captured from the Camera 88

4.1.2 Accessing video database under classified objects 90

4.1.3 Transferring video content 91

4.2 Real Time Multimedia Protocol 95

4.2.1 Streaming 96

4.2.1.1 Traditional Streaming 97

4.2.1.2 Progressive Download 97

4.2.1.3 Adaptive streaming 97

4.2.2 Real-Time Networked Multimedia 98

4.2.3 Real-time Streaming Media Protocols 99

4.2.3.1 Basics of streaming protocols 99

4.2.3.2 Datagram Protocol 101

4.2.3.3 Multicast IP protocol 102

4.2.3.4 Real-Time Streaming Protocol 103

4.2.3.5 SMIL Protocol 106

4.3 Experiment and results 107

4.3.1 Experiment implementation 107

4.3.1.1 Honda/UCSD database 107

4.3.1.2 MoBo Dataset 107

4.3.2 Case study 1 108

4.3.3 Case study 2 109

4.4 Summary 111


List of Tables

1.1 Features face detection approaches 5

1.2 The challenges of boosting learning for face detection 6

1.3 The learning schemes for face detection 6

1.4 Face tracking approaches 7

1.5 Video based face recognition approaches 11

1.6 A non-exhaustive list of successful ACO algorithms 25

1.7 List of applications of ACO algorithms grouped by problem type 35

2.1 Some differences between traditional and online advertising 40

2.2 Summary of approaches and methods for face recognition problem 60

2.3 Summary of approaches and methods in feature selection problem 61

3.1 The ORL face database properties 76

3.2 The AR face database properties 77

3.3 The FERET face database properties 78

3.4 The GEORGIA TECH face database properties 79

3.5 The LFW face database properties 80

3.6 The comparison performance of meta-heuristic algorithms with PZMI feature subsets in Case study 1 81

3.7 Evaluation recognition rate (%) of meta-heuristic algorithms with PZMI feature subset sizes in Case study 1 81

3.8 The comparison performance of algorithms with 1D-DWT feature subsets in Case study 2 84

3.9 Evaluation recognition rate (%) of each algorithm with 1D-DWT feature subset sizes in Case study 2 84

3.10 Comparison of recognition accuracy between PZMI and MMAS-DWT algorithm with different face data sets 85


List of Figures

1.1 Face recognition processing flow 8

1.2 Face recognition process 8

1.3 Process of face recognition in video 11

1.4 Examples of Haar-like features. Their values represent the intensity differences between the black and the white areas 12

1.5 Integral image. (a) The integral at (x, y) is the sum of all pixels up to (x, y) in the original image. (b) The area of rectangle D results to ii(A+B+C+D) − ii(A+C) − ii(A+B) + ii(A). Each ii() can be determined with a single array reference 13

1.6 Structure of the classifier cascade. "Yes" and "no" denote if the sub-window successfully passed the previous stage 14

1.7 Two out of four features evaluated in the first stage of the face detection cascade used in this work Both features embody the observation that the forehead and cheeks of a person are usually brighter than the eye region because the eyes are located further aback 15

1.8 Overview of the Kalman filter as a predictor-corrector algorithm 16

1.9 Cosine basis functions of the discrete cosine transform for input size 8×8. The frequency of the basis functions increases from top left (0, 0) to bottom right (8, 8) 19

1.10 Discrete cosine transform of an 8×8 pixel image patch. The coefficients represent the basis functions depicted in Figure 1.9 20

1.11 Average energy of all 64 blocks of the image in Figure 1.10(a) The DC coefficient has been removed to allow meaningful scaling 20

1.12 The Discrete cosine transform coefficients are serialized according to a zig-zag pattern 21

2.1 Creative Elevator Advertisements 57

2.2 Elevator Screen 58

2.3 Elevator Advertisements Screen 58

2.4 The integrated LCD screen play video system 58

2.5 Application of the integrated LCD screen play video system 59

3.1 Framework of face recognition used MMAS algorithm 65

3.2 Digraph G = (E , V ) for feature set F 71

3.3 Sample images of the ORL face database 75

3.4 Sample images of the AR face database 76

3.5 Sample images of the FERET face database 78

3.6 Sample images of the GEORGIA TECH face database 79

3.7 Sample images of the LFW face database 80

3.8 Assessment of the recognition rate of PZMI features 82

3.9 Assessment of the recognition rate of 1D-DWT features 83

3.10 Evaluation performance of proposed algorithms 85


3.11 Assessment of VbFR approaches execution on Honda/UCSD and CMU-MoBo datasets 86

4.1 The online video contextual advertisement system proposed 88

4.2 Object recognition model from the camera 89

4.3 Video database access model based metadata structure 91

4.4 Transferring video content protocols 93

4.5 Sample images of the MoBo face database 107

4.6 Some face images extracted from the Honda/UCSD Video Database 108

4.7 Comparison of the recognition rate (%) of approaches on Honda/UCSD and CMU MoBo datasets 108

4.8 The proposed system automatically detects an object and stores it in the database 109

4.9 The system plays the default video when no object is detected 109

4.10 Automatically detecting and identifying the object and choosing the most suitable video to play 110

4.11 Detecting multiple object identifiers concurrently 110


List of Acronyms

Acronym What It Stands For

AS Ant System

ACO Ant Colony Optimization

ACS Ant Colony System

CPC Cost Per Click

CPM Cost per Mille

CPA Cost Per Action

DWT Discrete Wavelet Transform

DCC Discriminative Canonical Correlations

FS Feature Selection

ICA Independent Component Analysis

KNN K-nearest neighbors

KPA Kernel Principal Angles

HTML Hypertext Markup Language

HDS HTTP Dynamic Streaming

LDA Linear Discriminant Analysis

LPP Locality Preserving Projections

LLE Locally Linear Embedding

LNMF Local Non-negative Matrix Factorization

LNDM Learning Neighborhood Discriminative Manifolds
MDS Multidimensional Scaling

MFA Marginal Fisher Analysis

MMAS MAX-MIN Ant System

NNC Nearest Neighbor Classifier

NDMP Neighborhood Discriminative Manifold Projection
P2P Peer-to-peer

PCA Principal component analysis

PZMI Pseudo Zernike Moment Invariant

QoS Quality of Service

SVM Support vector machine

SMIL Synchronized Multimedia Integration Language
RTSP Real-Time Streaming Protocol

RTMP Real Time Messaging Protocol

RTP Real-time Transport Protocol

RTCP Real-time Transport Control Protocol

TCP Transmission Control Protocol
UDP User Datagram Protocol
XML Extensible Markup Language

Acknowledgments

This thesis would not have been possible without the sincere help and contributions of many individuals. Therefore, I would like to use this opportunity to express my deep gratitude to them.

First of all, I would like to express my deepest and sincerest gratitude to my supervisor, Associate Professor Do Nang Toan, for his valuable advice, insightful guidance, encouragement, inspiration and continuous support throughout this research. His advice and guidance helped me throughout the research and mean a lot for my future career endeavors.

Last but not least, I am deeply indebted to my parents and my family for their encouragement and unconditional support, which helped me overcome the challenges I had to face in my life.


Advertising is everywhere. Media that were once largely commercial-free, from movies to the Internet, now come replete with commercial messages. Not so long ago, most musicians were reluctant to see their work used to endorse shampoo or sneakers. In recent years, the types of advertising have grown strongly in both width and depth. Regarding width, we see advertising everywhere, from developed capitalist countries to developing socialist countries. Advertising has become an important economic activity in the market economy. This is an inevitable trend, as advertising is one of the effective means of showing commercial competitiveness and a driving force stimulating economic development. Regarding depth, advertising not only affects the way users live but also changes their thinking, deeply influencing the culture of many classes of people in society. In this thesis, we want to recognize a third dimension: the technical development of the advertising model. Initially, advertising techniques went from primitive means such as rumors, word of mouth between friends, and explanations from a salesperson, to new forms such as gifts and awards, and then took advantage of the power of mass media such as newspapers, magazines, radio, television, and cinema.

Today, advertising has reached a new stage of convergence and development, moving to a multimedia communication model with high interoperability via the Internet. This is also a challenge for dynamic new forms of advertising, in contrast with the unilateral form of direct advertising, as people have become passive toward advertising systems. Advertising customization, with the ability to update in real time and to change the advertising text according to the full context of a search or of the website someone is viewing (for example, Google AdWords and Google AdSense), is contextual display advertising, or targeted advertising technology, and it is very effective. With common forms of advertising such as logos, banners, and occasional pop-ups, users accessing the site can see the ads. However, the drawback of this method is its dependence on irregular traffic, the difficulty of controlling it, and its low orientation: advertising content must try harder to hit the target objects, because the ad is only located on a fixed website and also appears in irrelevant articles and columns. The next technology displays contextual advertisement content on the website, or on other media such as mobile phones, to approach the user, based on the context of the article, the geographic location and time of the user accessing the ad, and the accessing habits of potential customers. Contextual ad display technology determines the contents of the website and identifies the accessing context of the user in order to deliver the ads most appropriate to the user's needs. The main features of contextually customized display ad technology are automatic analysis of the users, selection of the objects to display ads to, geographical location, budget management, and an online bidding system. At the same time, a flexible charging system, which charges by impressions (Cost Per Mille), by the value of each click (Cost Per Click), or by the interactive activities of users on the advertising (Cost Per Action), has helped enterprises to optimize advertising costs while ensuring effectiveness.

In this research, we focus on analyzing and proposing a model of an automatically customized advertising system oriented to customer objects in a real-time context, applied to the customization of advertising video content, with the aim of making the content of the ads relevant and truly useful for customers in each specific context through the analysis and identification of objects acquired from the camera. The system is a combination of image processing and identification with an MCN streaming architecture that uses a metadata structure to store video data, plus machine learning models, to deliver maximum efficiency.

The structure of the thesis is organized into four chapters:

• In the first chapter, the thesis focuses on introducing the basic knowledge. In particular, we introduce an overview of the face detection and tracking problem, and the theoretical basis of ant colony algorithms and their applications. This is the basis for building the system model proposed in the next chapters.

• In the second chapter, we present an overview of contextual advertising based on face recognition. We analyze the differences between traditional advertising and online advertising. The approaches to the online advertising problem and its challenges are also discussed. This is the foundation on which we propose our online video contextual advertisement user-oriented system using video-based recognition.

• In the third chapter, the author proposes the MMAS-PZMI and MMAS-DWT algorithms to solve the FS problem using PZMI and DWT features for video-based face recognition. The feature set is represented by a digraph G(E, V): each node represents a feature, and the possibility of choosing a combination of features is represented by the edges connecting two adjacent nodes. The heuristic information is extracted from the selected feature vector as the ants' pheromone. The optimal feature subset is selected by the shortest feature length and the best presentation of the classifier, and the best subset is then used for face recognition classification. The experiments on face recognition and feature selection show that our algorithm can be easily applied without a priori information about the features, and that its evaluated performance is better than previous approaches to feature selection for video-based recognition.

• In the last chapter, we present the proposed model of an advertising system automatically customized according to the customer objects in an elevator. From the actual images received from the camera, the system analyzes the objects based on the given characteristic set to determine the appropriate class of customer. Based on the defined customer class, the system accesses the database of multimedia advertising and automatically selects and delivers appropriate content. For each stage, we focus on the analysis and evaluation of the techniques used to enhance the viability and effectiveness of the system. The experimental results show that our approach can exactly detect faces from standard video databases. Our system can accurately identify the object and choose a suitable video directly from the camera in real time.

The results of this study have been published in three international journals indexed by SCIE/SSCI/Scopus and four papers in proceedings of international conferences indexed by Scopus and ISI.


However, face detection from a single image is a challenging task because of variability in scale, location, orientation (upright, rotated) and pose (frontal, profile). Facial expression, occlusion and lighting conditions also change the overall appearance of faces [83]. We now give a definition of face detection: given an arbitrary image, the goal of face detection is to determine whether or not there are any faces in the image and, if present, return the image location and extent of each face. In this subsection, we introduce the typical techniques of face detection in video, and real-time and multi-view methods. There are two key issues in this process: what features to extract, and which learning algorithm to apply [83, 86, 89]. A summary of the recent advances in feature extraction for face detection is shown in Table 1.1.

Table 1.1: Features face detection approaches

Haar-like and its variations | Haar-like features; Rotated Haar-like features [53]; Rectangular features with structure [67]; Haar-like features on motion filtered image [3]
Pixel-based | Pixel pairs; Control point set [2]; Local Binary Pattern (LBP) features [3]; Locally assembled binary feature [50]
Generic linear features | Anisotropic Gaussian filters [69]; Local Non-negative Matrix Factorization (LNMF) [22]; Generic linear features with Kullback-Leibler boosting [69]; Recursive Nonparametric Discriminant Analysis (RNDA) [9]
Statistics-based | Edge orientation histograms; Spectral histogram [47]; Histogram of oriented Gradients (HoG) and LBP [9]

in an image. Face recognition or face identification compares an input image (probe) against a database (gallery) and reports a match, if any. The purpose of face authentication is to verify the claim of the identity of an individual in an input image, while face tracking methods continuously estimate the location and possibly the orientation of a face in an image sequence in real time [29, 71, 83].

In addition to exploring better features, another avenue to improve the detector's performance is through improving the boosting learning algorithm, particularly under the cascade decision structure. The challenges of boosting learning for face detection are shown in Table 1.2, and other learning schemes are shown in Table 1.3.

Table 1.2: The challenges of boosting learning for face detection

Challenges | The boosting learning algorithm
General boosting schemes | AdaBoost, RealBoost, GentleBoost, FloatBoost [69]
Reuse previous nodes' results | Nested cascade, Boosting chain [30]; Linear asymmetric classifier [87]
Set intermediate thresholds during training | Fixed node performance [58]; WaldBoost, based on validation data [71]
Set intermediate thresholds after training | Greedy search [83]
Speed up training | Soft cascade; Multiple instance pruning [30]; Greedy search in feature space [22]; Random feature subset; Forward feature selection [72]
Speed up testing | Reduce number of weak classifiers [12]; Feature centric evaluation; Caching attention [69]
Multiview face detection | Parallel cascade; Pyramid structure [49]; Decision tree; Vector valued boosting [22]
Learn without subcategory labels | Cluster and then train; Exemplar-based learning [75]; Probabilistic boosting tree [54]; Cluster with selected features [82]; Multiple classifier/category boosting [29]

Table 1.3: The learning schemes for face detection

SVM speed up | Reduced set vectors and approximation [82]; Resolution based SVM cascade [45]
SVM multi-view face detection | SVR based pose estimator [45]; SVR fusion of multiple SVMs [45]; Local and global kernels [82]
Neural networks | Constrained generative model [58]; Convolutional neural network [72]
Part-based approaches | SVM component detectors adaptively trained [29]; Overlapping part detectors [15]

1.1.2 Face tracking

Object tracking is an important task within the field of computer vision. The proliferation of high-powered computers, the availability of high-quality and inexpensive video cameras, and the increasing need for automated video analysis have generated a great deal of interest in object tracking algorithms. There are three key steps in video analysis: detection of interesting moving objects, tracking of such objects from frame to frame, and analysis of object tracks to recognize their behavior. Therefore, the use of object tracking is pertinent in the tasks of: motion-based recognition, that is, human identification based on gait, automatic object detection, etc.; automated surveillance, that is, monitoring a scene to detect suspicious activities or unlikely events; video indexing, that is, automatic annotation and retrieval of the videos in multimedia databases; human-computer interaction, that is, gesture recognition, eye gaze tracking for data input to computers, etc.; traffic monitoring, that is, real-time gathering of traffic statistics to direct traffic flow; and vehicle navigation, that is, video-based path planning and obstacle avoidance capabilities.

In general, face tracking can be divided into three categories: head tracking (color-based, model-based and shape-based), facial feature tracking, and combination of head and facial feature tracking [71, 90]. For video processing, real-time performance is the foremost feature to be considered. A summary of the approaches to face tracking is listed in Table 1.4.

Table 1.4: Face tracking approaches

Model based | Active Model; Adaptive template tracking [64]
Color and shape based | Combined skin color with facial shape [85]; A new color-space method based on LDA [64]; Bilateral Learning (BL) approach [71]; An enhanced mean-shift tracking approach [90]
Facial feature tracking | Local structural details of human faces [91]; Fitted into Kalman filtering framework [22]; Local structure features within Kalman filter [15]
Combination of head and feature | A non-planar target undergoing an arbitrary 3D movement [10]; Evolutionary method of SSGA and PSO [32]; Head tracking based on the color and edge cues [15]; Boosted Adaptive Particle Filter (BAPF) [69]

1.2 Face recognition

Biometrics is an emerging area of bioengineering; it is the automated method of recognizing a person based on a physiological or behavioral characteristic. There exist several biometric systems such as signature, fingerprints, voice, iris, retina, hand geometry, ear geometry, and face. Among these systems, facial recognition appears to be one of the most universal, collectable, and accessible systems. Biometric face recognition, otherwise known as Automatic Face Recognition (AFR), is a particularly attractive biometric approach, since it focuses on the same identifier that humans primarily use to distinguish one person from another: their faces. One of its main goals is the understanding of the complex human visual system and the knowledge of how humans represent faces in order to discriminate different identities with high accuracy [64, 67].

The face recognition problem can be divided into two main stages: face verification (or authentication) and face identification (or recognition). The detection stage is the first stage; it includes identifying and locating a face in an image. The recognition stage is the second stage; it includes feature extraction, where important information for discrimination is saved, and matching, where the recognition result is given with the aid of a face database. Several face recognition methods have been proposed.

Figure 1.1: Face recognition processing flow

In the vast literature on the topic there are different classifications of the existing techniques. The following is one possible high-level classification:

• Holistic Methods: The whole face image is used as the raw input to the recognition system. An example is the well-known PCA-based technique introduced by Kirby and Sirovich, followed by Turk and Pentland.

• Local Feature-based Methods: Local features are extracted, such as eyes, nose and mouth. Their locations and local statistics (appearance) are the input to the recognition stage. An example of this method is Elastic Bunch Graph Matching (EBGM).

Regardless of the algorithm used, facial recognition is accomplished in a five-step process.

Figure 1.2: Face recognition process

Image acquisition: Image acquisition can be accomplished by digitally scanning an existing photograph or by using an electro-optical camera to acquire a live picture of a subject. Video can also be used as a source of facial images. Most existing facial recognition systems consist of a single camera. The recognition rate is relatively low when face images are of various poses and expressions and under different illumination. With increasing pose angle, the recognition rate decreases, and it decreases greatly when the pose angle is larger than 30 degrees. Different illumination is not a problem for some algorithms like LDA, which can still recognize faces under varying illumination, but this is not true for PCA. To overcome this problem, we can generate face images with a frontal view, moderate facial expression and the same illumination if the PCA algorithm is used.

Image Preprocessing: Face recognition algorithms have to deal with significant amounts of illumination variation between gallery and probe images. For this reason, an image preprocessing algorithm that compensates for illumination variations in images is used prior to recognition. The images used are gray-scaled. Histogram equalization is used here to enhance important features by modifying the contrast of the image, reducing the noise and thus improving the quality of an image and improving face recognition. It is usually done on too dark or too bright images. The idea behind image enhancement techniques is to bring out detail that is obscured, or simply to highlight certain features of interest in an image. Images are enhanced to improve the recognition performance of the system.

Face Detection: Face detection is a computer technology that determines the locations and sizes of human faces in arbitrary images. It detects facial features and ignores anything else, such as buildings, trees and bodies. Face detection can be regarded as a specific case of object-class recognition, a major task in computer vision. Software is employed to detect the location of any faces in the acquired image. Generalized patterns of what a face looks like are employed to pick out the faces. The method devised by Viola and Jones, which is used here, uses Haar-like features. Even for a small image, the number of Haar-like features is very large; for a 24×24 pixel window one can generate more than 180,000 features. AdaBoost is used to train a classifier, which allows for feature selection. The final classifier only uses a few hundred Haar-like features, yet it achieves a very good hit rate with a relatively low false detection rate.

Feature Extraction: This module is responsible for composing a feature vector that is good enough to represent the face image. Its goal is to extract the relevant data from the captured sample. Feature extraction is divided into two categories, the holistic feature category and the local feature category. Local feature based approaches try to automatically locate specific facial features such as eyes, nose and mouth based on known distances between them. The holistic feature category deals with the input face image as a whole. Different methods are used to extract the identifying features of a face. The most popular method is called Principal Component Analysis (PCA), which is commonly referred to as the eigenface method. Another method used here is called Linear Discriminant Analysis (LDA), which is referred to as the Fisherface method. Both LDA and PCA algorithms belong to the holistic feature category. Template generation is the result of the feature extraction process. A template is a reduced set of data that represents the unique features of an enrollee's face, consisting of weights for each image in the database. The projected space can be seen as a feature space where each component is seen as a feature.

Declaring a match: The last step is to compare the template generated in step four with those in a database of known faces. In an identification application, the biometric device reads a sample and compares that sample against every record or template in the database; this process returns a match or a candidate list of potential matches that are close to the generated templates in the database. In a verification application, the generated template is only compared with one template in the database, that of the claimed identity, which is faster. The closest match is found by using the Euclidean distance, which finds the minimum difference between the weights of the input image and all images in the database.
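To make these five steps concrete, the following is a minimal sketch of such a pipeline in Python with OpenCV and NumPy. It is an illustration rather than the system developed in this thesis: the cascade file, the crop size, the PCA gallery (mean, eigenvectors, weights, labels) and the matching threshold are assumed inputs that would come from training.

```python
import cv2
import numpy as np

# Steps 1-2: acquisition and preprocessing (grayscale + histogram equalization)
def preprocess(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.equalizeHist(gray)

# Step 3: face detection with a Haar cascade (file path is an assumption)
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(gray):
    # Returns a list of (x, y, w, h) rectangles
    return detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Step 4: feature extraction with a PCA (eigenface) projection.
# `mean` and `eigenvectors` would come from training, e.g. cv2.PCACompute
# applied to a gallery of aligned 100x100 face crops.
def extract_features(face_img, mean, eigenvectors):
    vec = cv2.resize(face_img, (100, 100)).flatten().astype(np.float32)
    return eigenvectors @ (vec - mean)          # projection weights

# Step 5: declare a match by minimum Euclidean distance to the gallery
def match(weights, gallery_weights, gallery_labels, threshold=2500.0):
    dists = np.linalg.norm(gallery_weights - weights, axis=1)
    best = int(np.argmin(dists))
    return gallery_labels[best] if dists[best] < threshold else None
```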

1.3 Video-based face recognition

These linear methods are clearly effective in learning data with a simple Euclidean structure. PCA [67] learns a projection that maximizes its variance, while MDS [9] preserves pairwise distances between data points in the new projected space. With additional class information, LDA [81] learns a linear projection that maximizes the ratio of the between-class scatter to the within-class scatter. The emergence of nonlinear dimensionality reduction methods such as LLE [51] and Isomap [41, 80] signaled the beginning of a new paradigm of manifold learning. These methods are able to discover the underlying high-dimensional nonlinear structure of the manifold in a lower dimensional space. The main disadvantage of these methods is that they cannot deal with the out-of-sample problem, where new data points cannot be projected onto the embedded space. The learning methods resolve this limitation by deriving optimal linear approximations to the embedding using neighborhood information in the form of neighborhood adjacency graphs [63, 71, 86]. To present a comprehensive survey, we categorize existing video-based face recognition approaches in Table 1.5.

Table 1.5: Video based face recognition approaches

Linear dimensionality learning | Principal Component Analysis (PCA) [67]; Learning Neighborhood Discriminative Manifolds (LNDM) [41]
Nonlinear dimensionality learning | Locally Linear Embedding (LLE) [51]; Locality Preserving Projections (LPP), Orthogonal LPP [39]; Marginal Fisher Analysis (MFA) [71]; Neighborhood Preserving Embedding (NPE) [62]; Discriminative Canonical Correlations (DCC) [89]; Neighborhood Discriminative Manifold Projection [40]

Nowadays, face recognition based on 3D is a hot research topic. Generally, comprehensive methods can be divided into three main categories, namely 2D image based, 3D image based, and multimodal systems. The differences among these three categories are as follows: the first category includes approaches which use 2D images and a 3D generic face model to improve the robustness and recognition rate; for the second one, the methods work directly on 3D datasets; while the last group comprises those which utilize both 2D and 3D information [1, 15]. Video-image face recognition can be seen as an extension of still-image-based face recognition. The input of the system is videos, while the database contains still face images. Due to its importance and difficulties, many research institutes have focused on video-based face recognition, with all kinds of approaches proposed, such as the Massachusetts Institute of Technology, Carnegie Mellon University, University of Illinois at Urbana-Champaign, University of Maryland, University of Cambridge, Toshiba, and the Institute of Automation of the Chinese Academy of Sciences [86]. The whole procedure of video-based face recognition is shown in Figure 1.3.

Figure 1.3: Process of face recognition in video

Faces in videos are tracked by tracking algorithms, and those high-quality face images of better resolution, pose and clarity are selected for matching based on still-image-based methods [89]. A lot of research has been done in this area, but most of it is efficient and effective for still images only and so cannot be applied to video sequences directly. In video scenes, human faces can have unlimited orientations and positions, so their detection poses a variety of challenges to researchers [71, 86, 89]. We have been motivated to improve face recognition in video by exploiting temporal information, an intrinsic property only available in videos. Some of these methods represent video as a complex face manifold to extract a variety of features, such as exemplars and image sets represented as local models or subspaces.

1.3.1 Haar cascade classifiers

Haar cascade classifiers represent a framework for rapid object detection in images, as proposed by Viola and Jones [69]. This framework is based on a set of Haar-like rectangular features which can be efficiently computed using an image representation called the integral image. A cascaded architecture trained with the AdaBoost boosting algorithm allows rapid evaluation of these features in order to detect learned objects or, in this case, faces and eyes.

Haar-like features: As mentioned above, the detection framework makes use of a large number of rectangular features which are reminiscent of Haar basis functions. Some examples of these features are depicted in Figure 1.4.

Figure 1.4: Examples of Haar-like features. Their values represent the intensity differences between the black and the white areas.

Each feature is basically computed as an intensity difference between adjacent regions of an image. Although not invariant to rotation, a single feature can easily be evaluated at an arbitrary location or scale. This is made possible by representing the image as an integral image.

Integral image: Similar to an integral in mathematics, pixel ii(x, y) in the integral image represents the sum of the pixels above and to the left of pixel i(x, y) in the original image. Using the integral image, the sum of the pixels within any rectangular area can be computed with only four array references, as illustrated in Figure 1.5.

Figure 1.5: Integral image. (a) The integral at (x, y) is the sum of all pixels up to (x, y) in the original image. (b) The area of rectangle D results to ii(A+B+C+D) − ii(A+C) − ii(A+B) + ii(A). Each ii() can be determined with a single array reference.

Considering the example features in Figure 1.4, it is obvious that the most complex one, the center one, can be determined with as few as nine accesses to the integral image. Hence, based on this image representation, the Haar-like features described above can be evaluated at any location or scale in constant time. In comparison, computation of a feature of size X0×Y0 in the original image requires X0 · Y0 accesses.
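A minimal NumPy sketch of this idea (not taken from the thesis): the integral image is obtained with two cumulative sums, after which any rectangular sum, and hence any Haar-like feature, needs only a handful of array references.

```python
import numpy as np

def integral_image(img):
    # ii(x, y) = sum of all pixels above and to the left of (x, y), inclusive.
    # Padding with a leading row/column of zeros simplifies the corner lookups.
    ii = np.cumsum(np.cumsum(img.astype(np.int64), axis=0), axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))

def rect_sum(ii, x, y, w, h):
    # Sum over the rectangle with top-left corner (x, y), width w, height h,
    # using four array references: ii(D) - ii(B) - ii(C) + ii(A).
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_two_rect(ii, x, y, w, h):
    # A two-rectangle Haar-like feature is just a difference of two such sums.
    left = rect_sum(ii, x, y, w // 2, h)
    right = rect_sum(ii, x + w // 2, y, w // 2, h)
    return right - left
```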

Classifier training: Even though any single feature can be computed very rapidly, the exhaustive set of possible features in an image is very large. In the work of Viola and Jones, it consists of approximately 160,000 features per 24×24 pixel sub-window. Since the input image is scanned with a sliding window at different scales, evaluation of the full feature set leads to a very high computational effort [69]. To reduce the number of features and to obtain an efficient detection algorithm, the most discriminant ones are selected using a modified version of the AdaBoost boosting algorithm. The thresholded single features are considered as weak learners which are then weighted and combined to form a stronger classifier, which takes the form of a perceptron. Within this classifier, discriminating features, i.e., good classification functions, obtain a high weight, whereas less differencing features, and therefore ones with poor classification performance, get a low weight. In the framework by Viola and Jones, AdaBoost is used to greedily select a small number of distinctive features from the vast set of available ones.

It is obvious that the number of features in the strong classifier directly affects computation time as well as the correct and false detection rates. A smaller number of features leads to a faster classifier with fewer correct and more false detections. In order to keep the number of evaluated features small but still obtain good detection results, a cascade of several of the strong classifiers outlined above is constructed. A cascade is essentially a degenerate decision tree as depicted in Figure 1.6. Each stage hands on its detections, both correct and false, to its successor, which is trained to discriminate these more difficult cases using additional features. Negative sub-windows are discarded immediately. A sub-window which successfully passes the whole cascade is considered a correct detection.

Figure 1.6: Structure of the classifier cascade. "Yes" and "no" denote if the sub-window successfully passed the previous stage.

Consequently, the entire set of selected features only has to be evaluated for the small number of positive sub-windows, compared to the overall number of sub-windows in an image. The majority of negative sub-windows is discarded early in the detection process using only a small subset of the selected features. Figure 1.7 shows two sample features of the four features in the first stage of the face detector that was used in this work.

Figure 1.7: Two out of four features evaluated in the first stage of the face detection cascade used in this work. Both features embody the observation that the forehead and cheeks of a person are usually brighter than the eye region because the eyes are located further aback.

To train this system, all sub-windows that pass one stage are used as training samples for the next one. This stage is then trained to discriminate these more difficult cases using a different set of features. Each strong classifier has to solve a harder problem than its predecessor. For each stage, limits for acceptable correct and false detection rates are defined. Features are added to these classifiers until these requirements are met. If the overall detection rates are not yet satisfying, another classifier is trained and added to the cascade. Given its sequential structure, the correct detection rate D and the false detection rate F of the final cascade with K stages can be computed using

D = \prod_{i=1}^{K} d_i \qquad \text{and} \qquad F = \prod_{i=1}^{K} f_i

where d_i and f_i denote the correct and false detection rates of stage i. In the detector used in this work, already the first stage is able to reject a large share of the negative sub-windows while detecting most of the faces with as few as two features.
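The cascade decision logic itself is short; the sketch below is purely illustrative (the stage classifiers and thresholds are placeholders that AdaBoost training would supply) and shows how a sub-window is rejected as soon as one stage says "no", which is why D and F multiply over the stages.

```python
def evaluate_cascade(subwindow, stages):
    """stages: list of (strong_classifier, threshold) pairs obtained from
    AdaBoost training; each strong_classifier maps a sub-window to a score."""
    for strong_classifier, threshold in stages:
        if strong_classifier(subwindow) < threshold:
            return False          # rejected early, remaining stages are skipped
    return True                   # passed every stage: report a detection
```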


1.3.2 Kalman filter

The Kalman filter (KF) is a linear state estimator and was initially introduced by Kalman. Since then, the KF and several variants, like the Extended Kalman filter for nonlinear estimation, have been commonly used in tracking tasks. A detailed introduction to these topics can be found in [73].

Figure 1.8: Overview of the Kalman filter as a predictor-corrector algorithm

In this work, the system contents itself with the basic implementation of the Kalman filter for linear prediction. It is based on a discrete-time dynamic system which is described by a process model

x(t) = A(t)\,x(t-1) + B(t)\,u(t) + v(t) \qquad (1.6)

and an observation model

z(t) = H(t)\,x(t) + w(t) \qquad (1.7)

The system state at time t is denoted by x(t). A(t) and H(t) stand for the known state transition and measurement matrices, while matrix B(t) allows to model the influence of some optional control input u(t). The vectors v(t) and w(t) represent the process noise and the observation or measurement noise, respectively. They are assumed to be independent, white Gaussian random processes with zero mean and covariances Q(t) and R(t), respectively.

Equations (1.6) and (1.7) allow to infer the usually not directly observable current system state x(t) from a sequence of measurements z(t). The recursive form of Equation (1.6) is a key property of the Kalman filter, as it avoids the need to process all measurements Z_t = \{z(i)\}_{i=0}^{t} in every time step.

When estimating the system state, let \hat{x}(t \mid Z_{t-1}) denote the a priori or predicted state estimate at time t taking into account the measurements Z_{t-1} = \{z(i)\}_{i=0}^{t-1} up to time t-1, and \hat{x}(t \mid Z_t) the a posteriori or filtered state estimate derived from all measurements Z_t. The predicted state estimate is given by

\hat{x}(t \mid Z_{t-1}) = A(t)\,\hat{x}(t-1 \mid Z_{t-1}) + B(t)\,u(t) \qquad (1.8)

and the resulting state prediction error is

\tilde{x}(t \mid Z_{t-1}) = x(t) - \hat{x}(t \mid Z_{t-1}) \qquad (1.9)

with the corresponding prediction error covariance

P(t \mid Z_{t-1}) = A(t)\,P(t-1 \mid Z_{t-1})\,A^{T}(t) + Q(t) \qquad (1.10)

The predicted measurement follows from the observation model as

\hat{z}(t \mid Z_{t-1}) = H(t)\,\hat{x}(t \mid Z_{t-1}) \qquad (1.11)

which leads to the innovation

\alpha(t) = \tilde{z}(t) = z(t) - \hat{z}(t \mid Z_{t-1}) \qquad (1.12)

and its covariance

S(t) = E[\tilde{z}(t)\,\tilde{z}^{T}(t)] = H(t)\,P(t \mid Z_{t-1})\,H^{T}(t) + R(t) \qquad (1.13)

The innovation describes the difference between the predicted measurement and the actual observation. Together with the Kalman gain, which is defined as

K(t) = P(t \mid Z_{t-1})\,H^{T}(t)\,S^{-1}(t) \qquad (1.14)

the filtered state estimate can be updated following the state update equation

\hat{x}(t \mid Z_t) = \hat{x}(t \mid Z_{t-1}) + K(t)\,\alpha(t) \qquad (1.15)

and the filtered state error covariance according to the covariance update equation

P(t \mid Z_t) = [I - K(t)\,H(t)]\,P(t \mid Z_{t-1}) \qquad (1.16)

Since a Kalman filter uses measurements to correct its state estimates, it can be thought of as a predictor-corrector algorithm, as it is commonly used to numerically integrate differential equations. The set of equations concerning the prediction of the current state, and therefore the next measurement, is made up of Equations (1.8) and (1.10). After the actual observation has been made, the correction step leads to an update of the filter state according to this observation using the measurement update equations (1.12), (1.14), (1.15) and (1.16). Figure 1.8 summarizes this process.
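The predictor-corrector cycle of Equations (1.6)-(1.16) translates almost line by line into code. The following is a generic linear Kalman filter sketch in NumPy using the same notation (A, B, H, Q, R); it is not the implementation used in this thesis, and the constant-velocity tracking model mentioned at the end is only an example.

```python
import numpy as np

class KalmanFilter:
    def __init__(self, A, H, Q, R, x0, P0, B=None):
        self.A, self.H, self.Q, self.R, self.B = A, H, Q, R, B
        self.x, self.P = x0, P0          # filtered state estimate and covariance

    def predict(self, u=None):
        # Time update, Eqs. (1.8) and (1.10)
        self.x = self.A @ self.x
        if self.B is not None and u is not None:
            self.x = self.x + self.B @ u
        self.P = self.A @ self.P @ self.A.T + self.Q
        return self.x

    def correct(self, z):
        # Measurement update, Eqs. (1.12)-(1.16)
        alpha = z - self.H @ self.x                      # innovation
        S = self.H @ self.P @ self.H.T + self.R          # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)         # Kalman gain
        self.x = self.x + K @ alpha
        self.P = (np.eye(len(self.x)) - K @ self.H) @ self.P
        return self.x

# Example use: tracking a face centre with a constant-velocity model,
# state = [x, y, vx, vy], where only the position (x, y) is measured.
```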

1.3.3 Discrete cosine transform

High-dimensional data can pose many challenges. Analysis of images of size X×Y on the pixel level, for example, would result in a feature space with X · Y dimensions. This grows easily into thousands of dimensions, which makes it very difficult to model the data, since many traditional statistical methods break down due to the enormous number of variables. Furthermore, larger feature vectors both require more memory and increase processing time. The good news is that, most of the time, not all dimensions are necessary in order to build a model which captures the underlying characteristics of the data. In fact, those can often be suitably represented using only a small fraction of the initial number of dimensions. Unfortunately, the essential dimensions are usually not axially parallel to the dimensions of the original data, as the variables can be highly correlated. Therefore, it is crucial to move the data to a different representation which is more appropriate in these terms.

One of the methods to achieve this is the discrete cosine transform (DCT). It is widely used in signal processing, especially in image processing, where it is well known as the basis of the widespread JPEG still image compression standard. It interprets the data as a superimposition of cosine oscillations and transforms it to the frequency domain. Since this work deals with computer vision problems, the input signal is considered to be 2-dimensional image data. For a 2-dimensional signal f(x, y), the DCT is defined as

F(u, v) = C(u)\,C(v) \sum_{x=0}^{X-1} \sum_{y=0}^{Y-1} f(x, y)\,\cos\!\left(\frac{(2x+1)\,u\,\pi}{2X}\right) \cos\!\left(\frac{(2y+1)\,v\,\pi}{2Y}\right) \qquad (1.17)

where the input is of size X \times Y and C(\cdot) is defined as

C(u) = \begin{cases} \sqrt{1/X} & \text{if } u = 0 \\ \sqrt{2/X} & \text{otherwise} \end{cases}

(and analogously for C(v) with Y).

The DCT has several advantages that make its use appealing:

Orthonormality: The DCT is orthonormal and therefore loss-less. This way, one has full control over which part of the signal is to be discarded to reduce the dimensionality. No information is inherently lost by the transformation itself. As a consequence, it is fully invertible.

Compactness of representation: The DCT approximates the Karhunen-Loeve transform (KLT), which is optimal in terms of representational compactness under certain conditions. Applying the DCT to images generally leads to high-valued coefficients for low-frequency basis functions and to low-valued coefficients for high-frequency ones, as can be seen in Figure 1.10. Obviously, the major part of the signal energy is encoded in a small number of low-frequency coefficients and therefore dimensions. This is the key to reducing the dimensionality of the data. The DCT itself is loss-less, as mentioned above, and the dimensionality of the transformed signal is still the same as that of the input signal. But high-frequency coefficients can be removed without any, or at most negligible, effects on the input signal, thus reducing its dimensionality. Essentially, this low-pass filters the original data.

Figure 1.10: Discrete cosine transform of an 8×8 pixel image patch. The coefficients represent the basis functions depicted in Figure 1.9.

Figure 1.11 visualizes this compaction by showing the average energy of all 64 blocks in an input image of size 64×64 pixels. The image has been split into blocks of 8×8 pixels to allow the DCT to capture enough local detail while still providing sufficient compaction. This size is based on the JPEG standard.

Figure 1.11: Average energy of all 64 blocks of the image in Figure 1.10(a). The DC coefficient has been removed to allow meaningful scaling.

Data-independency: The basis functions of the DCT are independent from the data to be transformed. This is in contrast to PCA, the discrete realization of the KLT. Since these transforms rely on the covariance of the data, the basis of the new vector space has to be computed from a representative training set. This leads to additional effort both in terms of computation and construction of the training set. The DCT always uses the basis functions shown in Figure 1.9 for input of size 8×8. Hence, the representation of already processed data does not change as it would with PCA if new and unforeseen data arrived due to a non-representative training set, which would make re-computation of the basis functions necessary.

In order to represent the coefficients of a 2-dimensional DCT as a 1-dimensional vector, the transformed signal is scanned following a zig-zag pattern as shown in Figure 1.12.

Figure 1.12: The discrete cosine transform coefficients are serialized according to a zig-zag pattern.
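As a concrete illustration (a sketch, not the thesis code), a block-wise DCT and the zig-zag serialization of Figure 1.12 can be written as follows; keeping only the first few serialized coefficients of each block implements the dimensionality reduction described above.

```python
import numpy as np
from scipy.fftpack import dct

def block_dct(img, block=8):
    """2-D type-II DCT applied to each non-overlapping block x block patch."""
    h, w = img.shape
    out = np.zeros_like(img, dtype=np.float64)
    for y in range(0, h - h % block, block):
        for x in range(0, w - w % block, block):
            patch = img[y:y + block, x:x + block].astype(np.float64)
            out[y:y + block, x:x + block] = dct(dct(patch, axis=0, norm='ortho'),
                                                axis=1, norm='ortho')
    return out

def zigzag(block):
    """Serialize an N x N coefficient block along anti-diagonals (zig-zag order)."""
    n = block.shape[0]
    order = sorted(((i, j) for i in range(n) for j in range(n)),
                   key=lambda p: (p[0] + p[1],
                                  p[0] if (p[0] + p[1]) % 2 else p[1]))
    return np.array([block[i, j] for i, j in order])

# Feature vector for one block: drop the DC term, keep the k lowest frequencies,
# e.g. features = zigzag(coeffs)[1:1 + k]
```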

1.3.4 K-Means clustering

K-means, introduced by MacQueen [88], is an unsupervised learning method which partitions the data into k clusters. Each cluster is represented by its centroid. The approach uses complete and hard clustering, which means that each sample belongs to exactly one cluster. It is widely used for its simplicity and efficiency. The basic outline of this algorithm is rather simple. To determine cluster association, an appropriate distance metric d(x_n, k_m) is necessary. Common ones are, for example, the city block and Euclidean distances. Furthermore, the number of clusters k has to be chosen in advance. This has to be done carefully in order to achieve meaningful clusters.

The system is initialized by selecting k samples as initial cluster centroids. Depending on the available knowledge of the data, these can be chosen randomly from the data, by iteratively selecting data points that lie maximally apart, or by running the algorithm on a small subset with random initialization and using the resulting centroids to cluster the complete data set. Afterward, each point is assigned to the closest centroid according to d(·, ·). Subsequently, the centroids are recomputed as the mean vector of all assigned points. These two steps, assignment and centroid update, are repeated until the cluster means do not change any more.

K-means can be regarded as a hill-climbing algorithm which minimizes an objective function. For city block and Euclidean distances, these are commonly the Sum of Errors (SE) and the Sum of Squared Errors (SSE), respectively. They are defined as

SE = \sum_{m=1}^{k} \sum_{x_n \in K_m} d(x_n, k_m) \qquad \text{and} \qquad SSE = \sum_{m=1}^{k} \sum_{x_n \in K_m} \lVert x_n - k_m \rVert^2

Several extensions of the basic algorithm exist. Automatic determination of the number of clusters, and selecting the medoid, i.e., the data point which is closest to the cluster center, instead of the mean, increases robustness against outliers. Zhang et al. proposed to use the harmonic mean to determine soft cluster assignments. As a result, they report a higher robustness against bad initialization [86].
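A compact NumPy version of the two alternating steps, assuming the Euclidean/SSE setting discussed above (an illustrative sketch, not the clustering code used elsewhere in this work):

```python
import numpy as np

def kmeans(X, k, n_iter=100, rng=np.random.default_rng(0)):
    # Initialization: pick k samples as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: recompute each centroid as the mean of its points
        new_centroids = np.array([X[labels == m].mean(axis=0)
                                  if np.any(labels == m) else centroids[m]
                                  for m in range(k)])
        if np.allclose(new_centroids, centroids):
            break                       # cluster means no longer change
        centroids = new_centroids
    # Final assignment and the SSE objective being minimized
    labels = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
    sse = ((X - centroids[labels]) ** 2).sum()
    return labels, centroids, sse
```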

1.3.5 K-Nearest neighbors classification

The k-Nearest neighbors (KNN) approach is a type of discriminative model. This family of learning techniques derives the classification result directly from a set of training samples instead of from an abstract model of the characteristics of the data.

Nearest neighbor: The elements of the training data are called representatives or prototypes. Representatives with n features are considered as points in an n-dimensional vector space. A new sample x ∈ R^n is labeled with the class of the closest representative, the nearest neighbor, according to a distance metric d(x, y). Although the resulting error will be greater than the optimal Bayes error rate, it is never more than twice as large, given an unlimited number of representatives. The proof is omitted here but can be found in [38]. Please note that differently scaled features can bias the distance metric to overemphasize large-valued features at the cost of small-valued ones. This effect can be mitigated with appropriate normalization techniques.

K-nearest neighbors: If the class of x depends on a single prototype only, it is easily affected by noise. This can be avoided by selecting the k nearest neighbors and deriving the classification decision from their class labels. The simplest way to do this is a majority vote which assigns the most common among the class labels in question. This leads to an equal contribution of every neighbor, independent of its distance to x, and thus renders the approach unnecessarily sensitive to the choice of k. Individual weights w_i for every selected prototype k_i, i = 1, ..., k, can be derived by taking the actual distance d(x, k_i) into account. For example, using

w_i = \frac{1}{d(x, k_i)^2}

greatly reduces the influence of distant training samples.

A case is classified by a majority vote of its neighbors, with the case being assigned to the class most common amongst its K nearest neighbors measured by a distance function. If K = 1, then the case is simply assigned to the class of its nearest neighbor.

• Euclidean distance function: d_E(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}

• Hamming distance (for categorical variables): D_H = \sum_{i=1}^{n} |x_i - y_i|, where |x_i - y_i| is 0 if the two values are equal and 1 otherwise.
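A distance-weighted k-NN classifier following this description can be sketched as below; the weighting w_i = 1/d(x, k_i)^2 and the Euclidean metric come from the text, while everything else (array layout, tie handling) is an implementation choice of this example.

```python
import numpy as np
from collections import defaultdict

def knn_predict(x, prototypes, labels, k=5, eps=1e-12):
    # Euclidean distance from x to every prototype
    d = np.linalg.norm(prototypes - x, axis=1)
    nearest = np.argsort(d)[:k]
    # Distance-weighted vote: w_i = 1 / d(x, k_i)^2
    votes = defaultdict(float)
    for i in nearest:
        votes[labels[i]] += 1.0 / (d[i] ** 2 + eps)
    return max(votes, key=votes.get)
```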


1.4 Ant Colony Optimization

In the early 1990s, ant colony optimization (ACO) [21] was introduced by M. Dorigo and colleagues as a novel nature-inspired meta-heuristic for the solution of hard combinatorial optimization (CO) problems. The inspiring source of ACO is the pheromone trail laying and following behavior of real ants, which use pheromones as a communication medium. In analogy to the biological example, ACO is based on indirect communication within a colony of simple agents, called (artificial) ants, mediated by (artificial) pheromone trails. The pheromone trails in ACO serve as distributed, numerical information, which the ants use to probabilistically construct solutions to the problem being solved and which the ants adapt during the algorithm's execution to reflect their search experience.

The first example of such an algorithm is Ant System (AS), which was proposed using the well-known traveling salesman problem (TSP) as an example application. Despite encouraging initial results, AS could not compete with state-of-the-art algorithms for the TSP. Nevertheless, it had the important role of stimulating further research both on algorithmic variants, which obtain much better computational performance, and on applications to a large variety of different problems. In fact, there exist now a considerable number of applications of such algorithms where world-class performance is obtained. Examples are applications of ACO algorithms to problems such as sequential ordering, scheduling, assembly line balancing, probabilistic TSP, 2D-HP protein folding, DNA sequencing, protein-ligand docking, packet-switched routing in Internet-like networks, and so on. The ACO meta-heuristic provides a common framework for the existing applications and algorithmic variants. Algorithms which follow the ACO meta-heuristic are called ACO algorithms [18, 19].

The (artificial) ants in ACO implement a randomized construction heuristic which makes probabilistic decisions as a function of artificial pheromone trails and possibly available heuristic information based on the input data of the problem to be solved. As such, ACO can be interpreted as an extension of traditional construction heuristics, which are readily available for many combinatorial optimization problems. Yet, an important difference with construction heuristics is the adaptation of the pheromone trails during algorithm execution to take into account the cumulated search experience.

In ACO, a number of artificial ants build solutions to the considered optimization problem at hand and exchange information on the quality of these solutions via a communication scheme that is reminiscent of the one adopted by real ants. Different ant colony optimization algorithms have been proposed. The original ant colony optimization algorithm is known as Ant System and was proposed in the early nineties. Since then, a number of other ACO algorithms were introduced, as listed in Table 1.6. All ant colony optimization algorithms share the same idea, which is best illustrated through an example of how ACO algorithms can be applied [19, 20].

Table 1.6: A non-exhaustive list of successful ACO algorithms

Algorithm | Acronyms | Authors | Years
Ant System | AS | Dorigo, Maniezzo and Colorni | 1991
Elitist Ant System | EAS | Dorigo, Maniezzo and Colorni | 1992
Ant-Q | AQ | Gambardella and Dorigo | 1995
Ant Colony System | ACS | Dorigo and Gambardella | 1996
Max-Min Ant System | MMAS | Stutzle and Hoos | 1996
Rank-Based Ant System | RBAS | Bullnheimer, Hartl and Strauss | 1997
AntNet | ANTN | Di Caro and Dorigo | 1998
Best-Worst Ant System | BWAS | Cordón, Viana and Herrera | 2000
Hyper-Cube Ant System | HCAS | Blum, Roli and Dorigo | 2001
Population-based Ant Colony System | PACS | Guntsch and Middendorf | 2002
Ant Colony Optimization | ACO | Dorigo and Stutzle | 2004
Beam-Ant Colony Optimization | BACO | Blum | 2004
Parallel-Ant Colony Optimization | PACO | Manfrin, Birattari, Stutzle and Dorigo | 2006
cAnt-Miner | CAM | Otero, Freitas and Johnson | 2008
Multi-colony Ant Colony System | MACS | Melo, Pereira and Costa | 2009

1.4.1 The Ant Colony Optimization Meta-heuristic

Given a combinatorial optimization problem (COP), the first step for the application of ACO to its solution consists in defining an adequate model. This is then used to define the central component of ACO: the pheromone model.

The first step for the application of ACO to a COP consists in defining a model of the COP as a triplet (S, Ω, f), where: S is a search space defined over a finite set of discrete decision variables; Ω is a set of constraints among the variables; and f : S → R_0^+ is an objective function to be minimized (as maximizing over f is the same as minimizing over −f, every COP can be described as a minimization problem).

The search space S is defined as follows. A set of discrete variables X_i, i = 1, ..., n, with values v_i^j ∈ D_i = {v_i^1, ..., v_i^{|D_i|}}, is given. Elements of S are full assignments, that is, assignments in which each variable X_i has a value v_i^j assigned from its domain D_i. The set of feasible solutions S_Ω is given by the elements of S that satisfy all the constraints in the set Ω.

A solution s* ∈ S_Ω is called a global optimum if and only if f(s*) ≤ f(s) for all s ∈ S_Ω. The model of the COP is used to define the pheromone model of ACO: a solution component c_ij denotes the assignment X_i = v_i^j, and the construction graph G_C(V, E) is obtained by associating the set of solution components C either with the set of vertices V or with the set of edges E.

A pheromone trail value τ_ij is associated with each component c_ij. (Note that pheromone values are in general a function of the algorithm's iteration t: τ_ij = τ_ij(t).) Pheromone values allow the probability distribution of different components of the solution to be modeled. Pheromone values are used and updated by the ACO algorithm during the search.

The ants move from vertex to vertex along the edges of the construction graph, exploiting information provided by the pheromone values and in this way incrementally building a solution. Additionally, the ants deposit a certain amount of pheromone on the components, that is, either on the vertices or on the edges that they traverse. The amount Δτ of pheromone deposited may depend on the quality of the solution found. Subsequent ants utilize the pheromone information as a guide towards more promising regions of the search space.

The ACO meta-heuristic is:

Algorithm 1.1 Ant colony optimization meta-heuristic

BEGIN
Set parameters, initialize pheromone trails
while termination conditions not met do
Construct Ant Solutions;
Apply Local Search (optional);
Update Pheromones;
end while
END

After initialization, the meta-heuristic iterates over three phases: (i) the construction of solutions by all ants, (ii) the (optional) improvement of these solutions via the use of a local search algorithm, and (iii) the update of the pheromones. These three components are now explained in more detail.
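Algorithm 1.1 maps onto a short skeleton such as the following sketch. The solution representation and the construct, local_search, update_pheromones and evaluate functions are placeholders that a concrete problem, for example the feature selection task addressed later in this thesis, would have to supply.

```python
def aco(components, n_ants, n_iterations, construct, local_search,
        update_pheromones, evaluate):
    # Initialize pheromone trails uniformly on all solution components
    tau = {c: 1.0 for c in components}
    best, best_cost = None, float("inf")
    for _ in range(n_iterations):                # termination condition
        solutions = [construct(tau) for _ in range(n_ants)]   # ConstructAntSolutions
        solutions = [local_search(s) for s in solutions]      # ApplyLocalSearch (optional)
        for s in solutions:
            cost = evaluate(s)
            if cost < best_cost:
                best, best_cost = s, cost
        update_pheromones(tau, solutions, best)               # UpdatePheromones
    return best, best_cost
```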

1.4.1.1 Construct Ant Solutions

A set of m artificial ants construct solutions from elements of a finite set of available solution components C = {c_ij}, i = 1, ..., n, j = 1, ..., |D_i|. A solution construction starts with an empty partial solution s^p = ∅. Then, at each construction step, the current partial solution s^p is extended by adding a feasible solution component from the set of feasible neighbors N(s^p) ⊆ C.

The process of constructing solutions can be regarded as a path on the construction graph G_C(V, E). The allowed paths in G_C are implicitly defined by the solution construction mechanism that defines the set N(s^p) with respect to a partial solution s^p.

The choice of a solution component from N(s^p) is done probabilistically at each construction step. The exact rules for the probabilistic choice of solution components vary across different ACO variants. The best known rule is the one of ant system (AS):

p(c_{ij} \mid s^{p}) = \frac{\tau_{ij}^{\alpha} \cdot [\eta(c_{ij})]^{\beta}}{\sum_{c_{il} \in N(s^{p})} \tau_{il}^{\alpha} \cdot [\eta(c_{il})]^{\beta}}, \qquad \forall\, c_{ij} \in N(s^{p})

where τ_ij is the pheromone value associated with component c_ij, η(·) assigns a heuristic value to each feasible solution component, and α and β are positive parameters that determine the relative importance of pheromone versus heuristic information.
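In code, the AS rule amounts to a roulette-wheel choice over the feasible components. The sketch below assumes a user-supplied heuristic function eta and feasible set N(s^p); it is an illustration of the rule, not code from this thesis.

```python
import random

def choose_component(partial_solution, feasible, tau, eta, alpha=1.0, beta=2.0):
    """Pick the next solution component c_ij with probability proportional to
    tau_ij^alpha * eta(c_ij)^beta, as in the Ant System rule."""
    weights = [(tau[c] ** alpha) * (eta(c, partial_solution) ** beta)
               for c in feasible]
    total = sum(weights)
    r = random.uniform(0.0, total)
    acc = 0.0
    for c, w in zip(feasible, weights):
        acc += w
        if acc >= r:
            return c
    return feasible[-1]              # numerical safety fallback
```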

1.4.1.2 Apply Local Search

Once solutions have been constructed, and before updating pheromones, some optional actions may often be required. These are often called daemon actions, and can be used to implement problem-specific and/or centralized actions which cannot be performed by single ants. The most used daemon action consists in the application of local search to the constructed solutions: the locally optimized solutions are then used to decide which pheromones to update.

1.4.1.3 Update Pheromones

The aim of the pheromone update is to increase the pheromone values associated with good solutions, and to decrease those that are associated with bad ones. Usually, this is achieved (i) by decreasing all the pheromone values through pheromone evaporation, and (ii) by increasing the pheromone levels associated with a chosen set of good solutions S_upd:

\tau_{ij} \leftarrow (1-\rho)\,\tau_{ij} + \rho \sum_{s \in S_{upd} \,\mid\, c_{ij} \in s} F(s)

where ρ ∈ (0, 1] is a parameter called the evaporation rate and F : S → R_0^+ is a function such that f(s) < f(s') implies F(s) ≥ F(s'). F(·) is commonly called the fitness function.

Pheromone evaporation implements a useful form of forgetting, favoring the exploration of new areas in the search space. Different ACO algorithms, for example Ant Colony System (ACS) or MAX-MIN Ant System (MMAS), differ in the way they update the pheromone.

Instantiations of the update rule given above are obtained by different specifications of S_upd, which in many cases is a subset of S_iter ∪ {s^bs}, where S_iter is the set of solutions that were constructed in the current iteration, and s^bs is the best-so-far solution, that is, the best solution found since the first algorithm iteration. A well-known example is the AS-update rule, that is, the update rule of ant system, S_upd ← S_iter.

An example of a pheromone update rule that is more often used in practice is the IB-update rule (where IB stands for iteration-best):

S_{upd} \leftarrow \arg\max_{s \in S_{iter}} F(s)

The IB-update rule introduces a much stronger bias towards the good solutions found than the AS-update rule. Although this increases the speed with which good solutions are found, it also increases the probability of premature convergence. An even stronger bias is introduced by the BS-update rule, where BS refers to the use of the best-so-far solution s^bs. In this case, S_upd is set to {s^bs}.
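For completeness, the sketch below shows an evaporation-plus-deposit update with the MAX-MIN bounds that MMAS (the variant adopted later in this thesis) imposes on the trails. The particular bound values, evaporation rate and choice of update solution are assumptions of this example rather than prescriptions.

```python
def update_pheromones_mmas(tau, s_update, fitness, rho=0.1,
                           tau_min=0.01, tau_max=10.0):
    """Evaporate all trails, deposit on the components of the chosen solution
    (iteration-best or best-so-far), and clamp to [tau_min, tau_max]."""
    for c in tau:
        tau[c] *= (1.0 - rho)                 # pheromone evaporation
    for c in s_update:                        # deposit proportional to quality
        tau[c] += rho * fitness(s_update)
    for c in tau:                             # MAX-MIN bounds against stagnation
        tau[c] = min(max(tau[c], tau_min), tau_max)
```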
