BACK-VIEW CAR MODEL RECOGNITION
LE THANH SACH
A THESIS SUBMITTED IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF ENGINEERING IN COMPUTER ENGINEERING
SCHOOL OF GRADUATE STUDIES KING MONGKUT’S INSTITUTE OF TECHNOLOGY LADKRABANG
2007
COPYRIGHT 2007
This thesis proposes a new approach for classifying the model and manufacturer of a car from a still image of the car's back view. First, the red color of the tail lights is detected and tested against a color density model constructed from samples of tail-light colors. Red regions that may be tail lights are detected by comparing pixels with the learned red color density model. After that, geometric properties of the car back-view image are verified to find the positions where the tail lights should be. For classifying the make and model of cars, the Eigen technique is used together with Fisher linear discriminant analysis.
In the experiments, data on 17 popular models from several manufacturers were collected for analysis and testing of the system. The classification accuracy was approximately 93 percent. The experimental results show that this research can be developed further to distinguish a larger number of models and manufacturers.
Thesis Title Back-View Car Model Recognition
Degree Master
Year 2007
Thesis Advisor Dr Watchara Chatwiriya
Thesis Co-Advisor Prof Dr Shozo Kondo
Acknowledgements
I would like to thank Dr. Watchara Chatwiriya, my advisor, for his enthusiastic guidance and expansive discussion during the past 24 months. I am also thankful to my co-advisor, Prof. Dr. Shozo Kondo of Tokai University, for his encouragement and practical suggestions.
I am especially grateful to all the members of my family; they are always a motivation for me to better myself.
It is also noted that I received all kinds of help from the members of my laboratory; I could study well at KMITL thanks to the friendly working environment they created for me. Finally, I would like to mention that this thesis could not have been realized without the support of the JICA project for AUN/SEED-Net.
May, 2007
Contents
Page
Abstract (Thai) I
Abstract II
Acknowledgements III
Contents IV
List of Tables VII
List of Figures VIII
Chapter 1 Introduction 1
1.1 Background 1
1.2 Objective of the Study 2
1.3 Statement of the Thesis 3
1.4 Assumption of this Study 4
1.5 Theory or Concept to be Used in this Research 4
Chapter 2 Literature Survey 6
2.1 Vehicle Recognition 6
2.1.1 Sensor Selection 6
2.1.2 Vehicle Detection 7
2.1.3 Feature Extraction 10
2.1.4 Recognition 12
2.2 Color Image Segmentation 13
2.3 Eigen-Technique 14
2.3.1 Principal Component Analysis (PCA) 15
2.3.2 Fisher Discriminant Analysis 18
Chapter 3 System Architecture and Data Collection 25
3.1 System Architecture 25
3.2 Dataset Collection 26
Contents (cont.)
Page
3.2.1 Conditions for Capturing Image 26
3.2.2 The Number of Car Makes and Models under Consideration 29
3.3 Sample Reference Color Collection 30
Chapter 4 Car Back-View Image Segmentation 32
4.1 Introduction 32
4.2 Reference Color Learning 33
4.2.1 Color Density Modeling 33
4.2.2 Density Level Selection 46
4.3 Segmentation and Normalization 47
4.3.1 Segmentation 47
4.3.2 Normalization 59
Chapter 5 Feature Selection 61
5.1 Introduction 61
5.2 Image Space and Eigencar 62
5.3 Car Space and Car Feature 64
5.4 Fisher Car Space and Fisher Car Feature 65
Chapter 6 Recognition 69
6.1 Recognition with Quadratic Discriminant Functions 69
6.2 Recognition with Linear Discriminant Functions 72
6.3 Recognition with Nearest Neighborhood Rule 73
Chapter 7 Result and Discussion 76
7.1 Segmentation 76
7.1.1 Learning Reference Color 76
7.1.2 Separating Car back-views 77
Contents (cont.)
Page
Chapter 8 Conclusion 86
Bibliography 88
Appendix A Sample Car Images 93
Appendix B Publication List 102
List of Tables
Table Page
3.1 Makes, Models, Years and Number of Sample Images 30
4.1 Color Prototype Learning Algorithm 37
4.2 Color Combination Algorithm 43
4.3 Likelihood Optimization Algorithm 46
4.4 Formulation for Geometric Measurements 50
7.1 Parameters used in HPL, SA and EM algorithms 76
7.2 Recognition Performance 80
7.3 Recognition Performance Using LDF 83
7.4 Recognition Performance Using QDF 84
7.5 Recognition Performance Using K-NN (K=5) 85
List of Figures
Figure Page
1.1 Objects being Recognized by Vehicle Detection, Vehicle Type Recognition, and Car Model Recognition 3
1.2 Name of Several Components in Car Back-Sides 4
2.1 Directions of Projections versus Scale Factors; (a-c) are Projections onto the Same Direction with Different Scale Factors; (d) is the Projection onto the Direction Discovered by Fisher Mapping with a Scale Factor Equal to 1 20
3.1 Proposed System Architecture 26
3.2 System Configuration 28
3.3 Slanted Angle of Car Back-Side 28
3.4 A Typical Image in the dataset 29
3.5 (a) An Example Distribution of Sample Colors in RGB Color Space (b) Several Red Images Sliced from Tail Light Locations 31
4.1 (a) An Input Image (b) Its Separated Back-View Image 32
4.2 Steps in Color Density Modeling 35
4.3 (a) A Sample Set of Reference Colors Projected onto 2D-plane (u*v*) (b) An Example of Approximation Using Circular Prototypes 36
4.4 Other Approximations for The Distribution in The Previous Figure; (a) Using Big Size Circular Prototypes and (b) Using Small Size Prototypes 38
4.5 A Simple Distribution and Its Possible Approximations; (a) and (b) Use Spherical Covariance Matrices with Different Orders of Sample Colors; (c) Uses Full Covariance Matrix 39
4.6 Interested Regions and Their Boundaries Defined by Several Density Levels 47
4.7 A Simple Case of Pixel Classification (a) Original Image (b) Filtered Image 48
4.8 Definition of Car Back-View Parameters 50
4.9 Variances and Loci of Gravity Centers for Several Car Back-View Images 52
4.10 Location and Size of Car Back-View Image Candidate 53
4.11 Some Complicated Results of Pixel Classification 54
4.12 Examples of Histograms and Lanes 55
4.13 Removing Noisy Areas in Filtered Images (a) After Filtering (b) After Removing 56
List of Figures (cont.)
Figure Page
4.14 H-lanes Detection 56
4.15 Bounding Rectangles for Red Areas 57
4.16 Rectangular Regions of Several Candidates 57
4.17 Symmetric Rule Verification 58
4.18 Car Back-View Separation Flowchart 59
5.1 Steps in Selecting Representative Features 62
5.2 Flowchart for Obtaining Eigencars 64
5.3 Representation of Data 66
5.4 Discrimination Between Classes 67
5.5 Algorithm for Obtaining Discrimination Directions 68
6.1 Recognition Steps 72
6.2 K-NN Example 73
6.3 K-NN Algorithm 75
7.1 Boundaries of Reference Regions defined by HPL and the Proposed Method 77
7.2 Failure Situations in Detecting Red Areas 79
7.3 The Impact of Number of Dimensions (a) for Car Space (b) for Fisher Car Space 82
Car Model Recognition deals only with a subset of vehicles called "cars". Although there are a considerable number of existing works in vehicle recognition, most of them address either vehicle detection or vehicle type classification. No serious research has classified vehicles into subclasses of types, such as the makes and models of vehicles. The following text in this section presents a brief introduction to vision-based vehicle recognition (VBVR) and several existing approaches. The details of VBVR can be found in the first section of Chapter 2. Generally, a VBVR system is composed of three tasks: vehicle detection, representative feature extraction and recognition. All three of these tasks are important in the sense of making the system accurate and usable.
Vehicle detection, sometimes called vehicle segmentation, is the task of locating vehicles in images. Although locating objects is very simple work for humans, it is really challenging for machines. There are several existing approaches [4] for detecting vehicles; such approaches can be classified into two groups, which are called "exhaustive detection" and "selective detection" in this thesis. In exhaustive detection, vehicles are searched for at every pixel in the image; meanwhile, selective detection focuses the search around only the most likely locations by using specific information. Obviously, exhaustive detection is time-consuming and prohibitive in real-time applications [4].
The step following detection is to obtain representative features for classes. In the view of recognition, the term "classes" is used to refer to the groups of objects being distinguished by the system [5]. For example, in the case of vehicle detection, a class can be either the group of vehicles under investigation or the group of backgrounds and other obstacles; meanwhile, classes can be "Bus", "Truck", "Car" and so forth in vehicle type recognition. A typical way of obtaining features of vehicles is to measure vehicle properties such as length [2], height, width [6] and color [7]. Recent research shows that other features, obtained by Principal Component Analysis (PCA) [8] [9] or in transform domains (e.g. Wavelet, Gabor filter) [10], are also efficient in discriminating data between classes. The greatest challenge in feature extraction is to obtain features that are discriminative and robust to noise, distortion and modification of vehicles.
The last step in VBVR is to classify unknown vehicles into classes; this step is also called recognition. The method for recognition can be as simple as a comparison in some applications [7] [11]. However, the majority of the literature uses classical pattern classification methods such as the Quadratic Discriminant Function (QDF), K-Nearest Neighbor (K-NN), the Probabilistic Neural Network (PNN) and the Support Vector Machine (SVM) [8] [9] [10] for recognizing unknown objects.
1.2 Objective of the Study
It can be seen from the previous section that the most challenging tasks in VBVR are to detect vehicles in images accurately and quickly, and to increase the recognition complexity, which is defined as the number of classes being recognized and the amount of information that a VBVR system can answer. Actually, expanding all the vehicle types in the world into subclasses is likely impossible for a time-bound work. For these reasons, this thesis selects a subset of vehicles called "cars" and aims to achieve the following tasks.
1. It utilizes color and geometric properties of car back-sides in order to speed up the segmentation of cars from images captured from the back view of cars in near-field view.
2. It increases the recognition complexity by trying to recognize car makes and models, as shown in Figure 1.1.
Figure 1.1 Objects being Recognized by Vehicle Detection, Vehicle Type Recognition,
and Car Model Recognition
1.3 Statement of the Thesis
An investigation of many cars shows that car back-sides contain red colors at the tail lights and have some other geometric properties, such as symmetry and correlation between the components inside. From these observations, the thesis tackles the objectives above as follows.
1. It proposes a method for describing the region of red colors in color spaces. As can be seen in Section 3.3, the red areas at tail lights do not contain only one pure red color, i.e. [255, 0, 0] in RGB color space; actually, they contain all the colors in a certain region inside the color space. Therefore, the thesis proposes a statistical approach for approximating such distributions.
2. The thesis uses red colors to limit the region in images for searching for car back-views, and can thereby speed up the segmentation task. Car back-views are detected in this thesis by verifying geometric properties of candidate car back-views.
3. The Eigen-technique is used for selecting representative and discriminating features of car models. These features are used to recognize car models by the linear discriminant function, the quadratic discriminant function and the nearest neighbor rule.
1.4 Assumption of this Study
In order to realize the ideas above, the thesis makes several assumptions.
1.5 Theory or Concept to be Used in this Research
Definition 1.1: The terms car make and model are used to refer to sub-classes of vehicles, as shown in Figure 1.1.
Definition 1.2: The names of several components in car back-sides that are referred
to in this thesis are given in Figure 1.2
Definition 1.3: The term red color in this thesis does not mean the pure red color in a color space, i.e. [255, 0, 0] in RGB color space. It can be any color that can appear in the red areas of tail lights.
Figure 1.2 Name of Several Components in Car Back-Sides
Definition 1.4: In this thesis, we use the colors that appear in the red areas of tail lights as reference objects for segmenting car back-view images. Such colors we name reference colors or, interchangeably, interested colors. Regions in a color space that contain such colors we name reference color regions or interested color regions.
Definition 1.5: Rather than specifying each color in a color space as a reference color, we should collect a set of such colors and then seek a way to infer reference color regions from this set. The colors collected for this goal are called sample reference colors, or sample colors for short.
Chapter 2 Literature Survey
2.1 Vehicle Recognition
2.1.1 Sensor Selection
Generally, the first step in designing vehicle recognition systems is to select suitable sensor types for acquiring the input data. The steps thereafter in vehicle recognition depend strongly on the selected sensors.
Sensors can be classified into two types [1], active and passive. The term "active" is used to mean that the sensors detect the distance of objects by measuring the travel time of a signal emitted by the sensors and reflected by the objects. Radar-based, laser-based and acoustic-based sensors are examples of this category. Meanwhile, optical sensors such as normal cameras are classified as passive sensors; sometimes they are also called vision-based sensors. Vehicle recognition that uses vision-based sensors is called vision-based vehicle recognition (VBVR), which is the context of the study in this thesis.
Although vision-based sensors are less robust than radar-based and laser-based sensors in rain, fog, night and direct sunshine, they are inexpensive and able to create a broad field of view for vehicles (up to 360 degrees around the vehicle). Moreover, they can be used for some other specific applications, such as lane marking detection and obstacle identification, without requiring any modification to the road infrastructure. Vision-based sensors also avoid interference between sensors of the same type, which can be critical when a large number of vehicles using active sensors move simultaneously in the same environment. These reasons explain the fact that the vision-based approach has received much attention from researchers in vehicle recognition in recent years.
The three following sections present a survey of existing approaches for the other steps in vehicle recognition using vision-based sensors.
2.1.2 Vehicle Detection
Vehicle detection is the step in vehicle recognition that locates vehicles within whole images. The locations of vehicles are usually described by rectangular regions in the images. Such regions are called regions of interest, or ROIs, in some applications [10]. Although detecting ROIs is straightforward in systems which use active sensors, it is a complicated task in vision-based systems.
Generally, the framework for detecting ROIs contains two basic steps, as follows.
1. The first step is to generate candidates for ROIs inside the whole image. Basically, there are two approaches, which are called exhaustive and selective detection in this thesis.
2. The second step is to verify the candidates, to decide whether a candidate is a real ROI of a vehicle. Because the majority of the systems under investigation, such as those in [2] [8], treat the verification as a two-class recognition problem, the description of candidate verification is delayed until the section "Recognition" below.
2.1.2.1 Exhaustive Detection
Existing studies in this approach assume that no a priori knowledge is available for detection. Hence, in order to detect vehicles, several windows of different sizes are slid over the whole image to generate candidates [8] [9] [12]. Research in this approach is able to detect vehicles at every pixel of the input images. However, a tremendous number of candidates will clearly be generated this way; therefore, this approach needs powerful computing resources and seems to be prohibitive for real-time applications. Usually, the inputs for systems developed using this approach are still images.
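The cost of this exhaustive search can be seen in a small sketch (function name, window sizes and stride are illustrative assumptions, not values from the surveyed works):

```python
import numpy as np

def sliding_window_candidates(image, window_sizes, stride):
    """Generate candidate ROIs (x, y, w, h) by sliding windows of
    several sizes over the whole image (exhaustive detection)."""
    H, W = image.shape[:2]
    candidates = []
    for (wh, ww) in window_sizes:
        for y in range(0, H - wh + 1, stride):
            for x in range(0, W - ww + 1, stride):
                candidates.append((x, y, ww, wh))
    return candidates

# Even a small 100x100 image with two window sizes and stride 4
# already yields hundreds of candidates, each of which must be verified.
img = np.zeros((100, 100), dtype=np.uint8)
cands = sliding_window_candidates(img, [(32, 32), (48, 64)], stride=4)
print(len(cands))
```

With a realistic image size, a stride of 1 and many scales, the candidate count grows into the millions, which is exactly why this approach is prohibitive in real time.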
2.1.2.2 Selective Detection
This approach generates candidates around only the most likely regions by utilizing some specific information, and can therefore speed up the detection process. The specific information can come from many sources, which are summarized as follows.
2.1.2.2.1 Subtraction-based Method
Most research that uses vision-based sensors alone follows this method of candidate generation. Candidates are generated by a subtraction between the input image and the background, or between two consecutive images in an image sequence [2] [11] [13] [14] [15]. The former is used only in cases where the background can be modeled or collected reliably, while the latter is usually used for detecting moving objects in image sequences.
A typical background subtraction was studied in [2] [11]; because stationary vision-based sensors were used in a controllable environment, the background image, called Ibg, could be modeled reliably upon program execution. To detect vehicles in an image I, a binary image Ib was formed as in equation (2.1), where θ was a threshold value for transforming the difference between the two images into the binary image. White pixels in Ib that were inside sufficiently large regions were considered as pixels of an ROI.

Ib(x, y) = 1 if |I(x, y) − Ibg(x, y)| > θ, and Ib(x, y) = 0 otherwise   (2.1)
On the other hand, the studies in [13] and [15] could adapt the background to changes in the environment by an algorithm called self-adaptive background subtraction. The principle of the method in those studies is to modify the background image (CB) by using an instantaneous background (IB) and applying an appropriate weighting α as follows:

CBk+1 = (1 − α) CBk + α IBk

where k is the frame index in the image sequence. The instantaneous background is defined as IBk = Mk • CBk + (~Mk) • Ik, where Ik is the current frame and Mk is the binary vehicle mask, similar to Ib above.
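Both subtraction rules can be sketched in a few lines of NumPy (a hedged illustration; the function names, the toy image and the parameter values θ = 50, α = 0.1 are assumptions for the demonstration, not values from [2] [11] [13] [15]):

```python
import numpy as np

def binarize_difference(I, I_bg, theta):
    """Equation (2.1): mark pixels whose absolute difference from the
    background exceeds the threshold theta."""
    return (np.abs(I.astype(int) - I_bg.astype(int)) > theta).astype(np.uint8)

def update_background(CB, I, M, alpha):
    """Self-adaptive update: IB = M*CB + (~M)*I, then
    CB_{k+1} = (1 - alpha)*CB + alpha*IB."""
    IB = np.where(M == 1, CB, I)       # keep old background under the mask
    return (1 - alpha) * CB + alpha * IB

I_bg = np.full((4, 4), 100.0)
I = I_bg.copy()
I[1:3, 1:3] = 200.0                    # a bright 2x2 "vehicle"
M = binarize_difference(I, I_bg, theta=50)
CB = update_background(I_bg, I, M, alpha=0.1)
print(M.sum())
```

Because the mask excludes the vehicle pixels from the update, the background estimate is not contaminated by the moving object.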
2.1.2.2.2 Knowledge-based Method
Knowledge-based methods utilize properties of vehicles, such as their symmetry, colors, edges and textures, to hypothesize vehicle locations in images.
1 Symmetry
Symmetry is one of the main signatures of man-made objects and is very useful for detecting and recognizing vehicles [4]. Images of vehicles observed from the back view are generally symmetric about a vertical axis.
In [17], a symmetric measure S_A(x_s, w) was computed for each scan-line of the image, where x_s was the position of a potential symmetry axis within an interval w inside the scan-line. The symmetric measures of all scan-lines were accumulated to form a symmetry histogram for the image. ROI candidates were then derived from the symmetry histogram and the edge map of the image.
On the other hand, the work in [19] used the symmetry property as a criterion for validating ROI candidates, and in [20] symmetry detection was formulated as an optimization problem which was solved using neural networks.
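The accumulation idea can be sketched roughly as follows (the actual measure S_A in [17] is more elaborate; the mirror-difference score below is a simplified stand-in, and the toy image is an assumption):

```python
import numpy as np

def symmetry_score(line, xs, w):
    """Simplified stand-in for S_A(xs, w): negative total mirror
    difference about axis xs within half-width w (0 = perfectly symmetric)."""
    d = np.arange(1, w + 1)
    return -np.abs(line[xs - d].astype(int) - line[xs + d].astype(int)).sum()

def symmetry_histogram(img, w):
    """Accumulate per-scan-line scores over all rows, one bin per
    candidate axis position xs."""
    H, W = img.shape
    hist = np.full(W, -np.inf)
    for xs in range(w, W - w):
        hist[xs] = sum(symmetry_score(img[r], xs, w) for r in range(H))
    return hist

# A toy image with two mirror pairs about column 8: the histogram
# peaks at the true symmetry axis.
img = np.zeros((5, 16), dtype=np.uint8)
img[:, 5] = img[:, 11] = 255
img[:, 6] = img[:, 10] = 255
hist = symmetry_histogram(img, w=4)
print(int(np.argmax(hist)))
```

On a real back-view image, the peak of this histogram suggests the lateral position of the vehicle, which is then cross-checked against the edge map as in [17].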
2 Color
Although color information is very useful in face detection [21] [22] and in other applications in vehicle recognition, such as lane and road detection [23], only a few existing systems use color for detecting vehicles.
In [23], a set of sample colors for roads was collected; after that, regions in color space that contain road colors were approximated using spheres by a density-based learning algorithm. The Lu*v* color space was used in order to achieve the best uniformity of perception. Roads were detected by checking each pixel in the input images to decide whether it was inside or outside the approximated region.
A typical research work that uses color for detecting vehicles was presented in [24]. In that research, colors of cars and the background were collected and normalized by a method proposed in that work. Both the normalized colors of cars and of backgrounds were assumed to follow Gaussian models; thereby, all pixels in images could be classified as foreground (cars) or background according to a Bayesian classifier. Pixels that were classified as foreground were good suggestions for the locations of cars in images.
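The Bayesian decision used in [24] can be sketched as follows (the means, covariances and prior below are invented for illustration, not parameters from that work):

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate normal with full covariance."""
    d = len(mean)
    diff = x - mean
    inv = np.linalg.inv(cov)
    logdet = np.linalg.slogdet(cov)[1]
    return -0.5 * (diff @ inv @ diff + logdet + d * np.log(2 * np.pi))

def classify_pixel(color, fg_model, bg_model, prior_fg=0.5):
    """Bayesian decision between foreground (car) and background,
    each modeled as a single Gaussian as assumed in [24]."""
    lf = gaussian_logpdf(color, *fg_model) + np.log(prior_fg)
    lb = gaussian_logpdf(color, *bg_model) + np.log(1 - prior_fg)
    return "foreground" if lf > lb else "background"

fg = (np.array([200.0, 30.0, 30.0]), 400 * np.eye(3))   # reddish car colors
bg = (np.array([90.0, 90.0, 90.0]), 400 * np.eye(3))    # grey road colors
result = classify_pixel(np.array([190.0, 40.0, 35.0]), fg, bg)
print(result)
```

Applying this decision independently to every pixel yields the foreground mask from which car location hypotheses are drawn.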
3 Shadow
According to [25], the shadows underneath vehicles can be used as a sign for detecting vehicles, because these regions are darker and cooler than others in the image. It is obvious that such signs are very useful in the sense of locating the position of a vehicle in an image. However, it is difficult to choose a suitable threshold value to segment shadows from other regions. Moreover, the shadow depends heavily on the illumination conditions and the moving direction of the vehicle.
4 Vertical/Horizontal Edge and Corners
The boundaries of vehicle back-sides are nearly rectangular; moreover, vehicle back-view images usually contain many horizontal and vertical lines. From these observations, the studies in [16] [18] [26] [27] [28] have proposed several ways of using edges and corners to hypothesize the locations of vehicles in images.
The method presented in [16] and [18] generated candidates for vehicles by combining symmetry properties, corners and edges obtained from the edge maps of images. On the other hand, the method proposed in [26] segmented images into four regions (pavement, sky and two lateral regions) using edge grouping. After that, groups of horizontal edges on the detected pavement were considered for hypothesizing the presence of vehicles.
2.1.3 Feature Extraction
Feature extraction is the step that obtains the characteristic features which will be used for verifying the candidates generated in the detection step above, or for recognizing vehicles in vehicle recognition applications. Finding robust and discriminative features is the greatest challenge in this step. The following sections present several ways of extracting features that were used in the majority of the literature.
2.1.3.1 Vehicle Features
Studies in this group aim to extract features that are properties of vehicles, e.g. length, height, width, the number of axles and wheels, and colors. Except for the number of axles and wheels, which have usually been measured by active sensors [6] [29], the other properties have been estimated from images, as in [2] [13] [15] [30]. In those studies, lengths and widths were estimated as the width and height of vehicle regions in 2-D images respectively; meanwhile, heights were computed from two images by a stereo-based approach in [6].
In another way, in [7], the distributions of colors in several areas inside car back-view images, such as the tail lights, license plate and windshield, were employed as features for characterizing cars in a car detection application.
2.1.3.2 Statistical Features
In this approach, images containing vehicles (ROIs) are converted into 1-D vectors. Features are obtained by projecting these vectors onto pre-computed directions. Such directions are eigenvectors derived from a set of training images. This method is called Principal Component Analysis (PCA) and is presented in detail later in this chapter. Typical works that follow this approach are [8] and [9].
2.1.3.3 Transform Domain Features
Features extracted by this approach are computed as the results of a transformation, such as the Gabor filter [10] [11] [31] or the Wavelet transform [31]. Gabor filter responses for an image I(x, y) of size N×N are computed by filtering the image with a bank of Gabor kernels at several orientations and scales.
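A Gabor response can be sketched with the common cosine-modulated Gaussian form of the kernel (this standard form and the parameter values below are assumptions; the surveyed works use their own parameterizations):

```python
import numpy as np

def gabor_kernel(size, theta, lam, sigma):
    """Real-valued Gabor kernel: a Gaussian envelope modulated by a
    cosine of wavelength lam along orientation theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)

def gabor_response(img, kernel):
    """Filter response by direct 2-D correlation (valid region only)."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

img = np.zeros((16, 16))
img[:, 8:] = 1.0                               # a vertical edge
k = gabor_kernel(7, theta=0.0, lam=4.0, sigma=2.0)
resp = gabor_response(img, k)
print(resp.shape)
```

Feature vectors are then formed from the response magnitudes of a whole bank of such kernels over the ROI.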
2.1.3.4 Generic Features
The term "generic" is used to imply that the methods in this approach use general algorithms in image processing, such as edge detection [32] and histograms [33], for extracting features.
Xiaoxu et al. proposed in [32] a method for extracting features by combining the following steps:
1. Extract edge points using edge detection methods.
2. Use SIFT [34] as a local descriptor to extract local features for each edge point.
3. Segment the edge points into groups based on their similarity.
4. Form features from the edge point segments.
On the other hand, features were obtained in [33] by forming the histogram of a distance map, which was the map of the distances from each pixel in the input image to the corresponding pixel in the mean image of the class.
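The distance-map histogram of [33] reduces to a few lines (the bin count, image sizes and pixel values are illustrative assumptions):

```python
import numpy as np

def distance_map_histogram(img, mean_img, bins=8, max_dist=256):
    """Sketch of the feature in [33]: the normalized histogram of
    per-pixel absolute distances between an image and its class mean."""
    dist = np.abs(img.astype(int) - mean_img.astype(int))
    hist, _ = np.histogram(dist, bins=bins, range=(0, max_dist))
    return hist / hist.sum()

mean_img = np.full((8, 8), 100, dtype=np.uint8)
img = mean_img.copy()
img[0, 0] = 228                    # one pixel far from the class mean
h = distance_map_histogram(img, mean_img)
print(h[0], h[4])
```

An image close to its class mean concentrates mass in the low-distance bins, which is what makes the histogram discriminative.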
2.1.4 Recognition
Recognition is the step that labels unknown objects with class names [5]; however, there is confusion in VBVR in the use of the terms "recognition" and "detection". This is probably because detection can be seen as a recognition problem with two classes: vehicles versus background and other obstacles [2] [8].
The recognition step usually depends on the kind of representative features used for vehicles. For example, in [7], the colors of several components in car back-views, such as the tail lights, license plate and windshield, were modeled by a Gaussian Mixture Model (GMM). In order to recognize vehicles, a likelihood ratio, defined as the quotient of the likelihood of the testing image over the likelihood of the training images, was computed. This value was compared to a pre-defined range to yield the recognition result, which was the detection result in that study.
Similarly to [7], in order to recognize an unknown object in [11], Gabor jets for each pixel of the testing image were computed and compared with those derived from the training images. The unknown object was labeled with the label of the class that best matched the testing image.
Compared to the specific methods above, most existing studies utilize classical pattern classification methods such as QDF, K-NN, PNN and SVM [8] [9] [10] for recognizing unknown objects.
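Of these classifiers, the K-NN rule is the simplest to sketch (the toy 2-D data and the value k = 5 are assumptions for illustration):

```python
import numpy as np

def knn_classify(x, train_X, train_y, k=5):
    """K-nearest-neighbor rule: vote among the k training samples
    closest to x in Euclidean distance."""
    d = np.linalg.norm(train_X - x, axis=1)
    nearest = train_y[np.argsort(d)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
result = knn_classify(np.array([4.8, 5.2]), X, y, k=5)
print(result)
```

In a VBVR system, `x` would be the feature vector of an unknown vehicle and `train_X` the features of the labeled training images.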
2.2 Color Image Segmentation
Image segmentation plays a central role in vision-based recognition tasks such as vehicle recognition and face recognition; it is a process that partitions an image into meaningful regions. Such a partition is the first obligatory step of a vision system, and its quality deeply impacts the performance and accuracy of the overall system. Recent studies favor techniques for segmenting color images, which naturally carry more features than monochrome images.
The colors of interested objects in images are characterized by their chromaticity and brightness [35] and are therefore affected by the lighting conditions. For this reason, interested colors are distributed randomly, with an unknown probability density function, in color spaces. Despite the fact that several color spaces have been employed to make the perception of colors more uniform, the unknown form of the distribution has not been removed completely. Hence, modeling the density of interested colors is still a problematic task.
In some applications where the lighting condition is controllable, and the form or parameters of the color density functions can be acquired or simply estimated, interested color regions in color spaces can be described using cubes [21] [36], spheres [37] or ellipses [22]. Generally speaking, the assumption in those approaches is rarely satisfied in broader cases of color image segmentation. Moreover, several of the afore-mentioned works extracted their parameters manually.
The research in [24] approached color distribution modeling in a statistical way. It required the acquisition of both groups of interested and background colors, and utilized a Bayesian classifier to segment incoming image pixels. Generally, this approach can work well when the distributions of the interested and background colors are in normal form and separable from each other. However, that requirement is likely impractical. Based on the assumption that any distribution of points in a multi-dimensional space can be approximated by a GMM with enough mixing components [38], the works in [39], [40] and [41] utilized GMMs as the underlying model for the distribution of interested colors, or of combinations of colors, textures, depths and positions of pixels. However, finding a suitable starting point for learning the parameters of a GMM and choosing a reasonable number of mixing components are still drawbacks of those approaches.
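A bare-bones EM fit of a spherical GMM illustrates both the model and the sensitivity to initialization noted above (the function, the toy 2-D data and the starting means are assumptions; the works [39] [40] [41] operate on real color vectors):

```python
import numpy as np

def em_spherical_gmm(X, means, iters=50):
    """Minimal EM for a K-component spherical GMM, started from the
    given initial means; the need to choose those means well is the
    initialization drawback mentioned in the text."""
    N, d = X.shape
    K = len(means)
    var = np.full(K, X.var())
    w = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: responsibilities under spherical Gaussians
        d2 = ((X[:, None, :] - means[None]) ** 2).sum(-1)          # N x K
        logp = np.log(w) - 0.5 * (d * np.log(2 * np.pi * var) + d2 / var)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights, means and per-component variances
        nk = r.sum(axis=0)
        w = nk / N
        means = (r.T @ X) / nk[:, None]
        d2 = ((X[:, None, :] - means[None]) ** 2).sum(-1)
        var = (r * d2).sum(axis=0) / (d * nk)
    return means, var, w

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (100, 2)), rng.normal(4, 0.5, (100, 2))])
means, var, w = em_spherical_gmm(X, means=X[[0, -1]].copy())
print(np.sort(means[:, 0]).round(1))
```

Here one initial mean is taken from each cluster, so EM recovers the two components; a poor initialization or a wrong K can leave the fit far from the true distribution.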
2.3 Eigen-Technique
The eigenvalue problem has widespread use in engineering; it aims to find the eigenvectors and eigenvalues that satisfy the equation below [42]:

Ax = λx   (2.2)

where A is a square matrix, and x and λ are an eigenvector and an eigenvalue respectively.
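Numerically, such eigenpairs are what any linear-algebra library returns; a quick check (the matrix is an arbitrary example):

```python
import numpy as np

# Verify Ax = λx numerically for a small symmetric matrix.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
vals, vecs = np.linalg.eig(A)
x, lam = vecs[:, 0], vals[0]
print(np.allclose(A @ x, lam * x))
```

Each column of `vecs` paired with the corresponding entry of `vals` satisfies equation (2.2) up to floating-point error.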
Eigenvectors and eigenvalues have been demonstrated by much research to be very useful for obtaining both representing [9] [43] [44] [45] and discriminating [46] [47] features in vehicle and face recognition. In this thesis, the term eigen-technique is used to refer to a way of using eigenvectors and eigenvalues.
In the studies [9] [43] [44] [45], images were converted to 1-D vectors and then projected onto the K selected eigenvectors having the largest eigenvalues. The eigenvectors in those works were computed by solving the traditional eigen problem as in equation (2.2), where S, the scatter matrix of the training samples, takes the place of A. This process is explained in detail in the next sub-section.
On the other hand, the studies in [46] and [47] obtained features by solving different generalized eigen problems, given in equations (2.3) and (2.4) respectively, where SB and SW were the between-class and within-class scatter matrices respectively, and X, L and D were matrices formed in specific ways in those studies.
The following two sub-sections present the ways of obtaining representing and discriminating features based on the eigen-technique.
2.3.1 Principal Component Analysis (PCA)
PCA is a mathematical tool with a long history. The roots of PCA originate from the efforts of Pearson in 1901 and Hotelling in 1933 [48]. However, an efficient way of computing eigenvectors and eigenvalues did not appear until the 1940s, with Karhunen and Loeve.
Mathematically, PCA is a linear transformation; it transforms a vector x in a d-dimensional space into a vector y in a d'-dimensional space, as shown in equations (2.5) and (2.6), where d' is usually considerably smaller than d and Wpca is the transformation matrix. This kind of transformation can be achieved by projecting the vector x onto the column vectors of Wpca.
The underlying idea of PCA is that it aims to find a projection which maps high-dimensional data to a low-dimensional space while maximizing the variance of the projected data. Equivalently, PCA minimizes the sum of squared distances between the original data and the projected data. The way to find such projections can be seen in the statement below.
Statement 2.1: Given a set of N samples in a d-dimensional space, X = {x1, x2, ..., xN}, the transformation matrix Wpca of PCA is formed from the column eigenvectors of the covariance matrix C of X whose associated eigenvalues are the largest among all the eigenvalues.
Proof:
Let m and S be the mean and the scatter matrix of X; they can be computed by equations (2.7) and (2.8) respectively:

m = (1/N) Σk xk   (2.7)

S = Σk (xk − m)(xk − m)^t   (2.8)
To begin, assume that we need to reduce d dimensions to one dimension; the projection is defined by a unit vector e passing through the sample mean m. Let yk be the image of xk under the projection defined by e; yk can be expressed as in equation (2.9), where ak is a scalar value corresponding to the distance from yk to the sample mean m:

yk = m + ak e   (2.9)
An optimal set of ak can be obtained by minimizing the sum of squared distances as shown in equation (2.10), where J1(a1, ..., aN, e) is expanded in equation (2.11):

J1(a1, ..., aN, e) = Σk ||(m + ak e) − xk||^2   (2.10)

J1(a1, ..., aN, e) = Σk ak^2 ||e||^2 − 2 Σk ak e^t (xk − m) + Σk ||xk − m||^2   (2.11)

Recognizing that ||e|| = 1, partially differentiating with respect to ak, and setting the derivative to zero, we obtain ak = e^t (xk − m).
Substituting ak back into J1 gives J1(e) = −e^t S e + Σk ||xk − m||^2, where the term Σk ||xk − m||^2 is independent of e; minimizing J1 is therefore equivalent to maximizing e^t S e. Using the method of Lagrange multipliers with an undetermined multiplier λ, we can maximize e^t S e subject to the constraint that ||e|| = 1.
Differentiating the Lagrangian e^t S e − λ(e^t e − 1) with respect to e and setting the result to zero yields

Se = λe   (2.12)

Equation (2.12) means that e and λ are a corresponding eigenvector and eigenvalue of S. Moreover, because e^t S e = λ e^t e = λ, in order to maximize e^t S e we select the eigenvector corresponding to the largest eigenvalue of S. In other words, to find the best one-dimensional projection of the data, we project the data onto a line through the sample mean in the direction of the eigenvector of the scatter matrix having the largest eigenvalue.
This result can be extended from a one-dimensional projection to a d'-dimensional projection. In place of equation (2.9), we write

y_k = m + \sum_{i=1}^{d'} a_{ki} e_i

When J_{d'} is minimized, the vectors e_1, e_2, …, e_{d'} are the d' eigenvectors of the scatter matrix having the largest eigenvalues. In other words, the projection matrix of PCA can be formed from the eigenvectors of S as W_{pca} = [e_1, e_2, \ldots, e_{d'}], where each e_i is a column vector.
Because S is merely the sample covariance matrix C, defined in (2.13), multiplied by N, maximizing e^t S e has the same effect as maximizing e^t C e.
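The procedure above can be sketched in NumPy. This is a minimal illustration, not the thesis's implementation; the data layout (one sample per row) and the choice of d' are assumptions:

```python
import numpy as np

def pca(X, d_prime):
    """Compute the PCA projection matrix Wpca from samples.

    X       : array of shape (N, d), one sample per row
    d_prime : number of principal directions to keep (d' << d)
    Returns (Wpca, mean), where Wpca has shape (d, d_prime).
    """
    m = X.mean(axis=0)                  # sample mean, equation (2.7)
    Xc = X - m                          # centered data
    C = Xc.T @ Xc / X.shape[0]          # covariance matrix C = S / N
    # eigh is appropriate because C is symmetric; eigenvalues come out ascending
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]   # sort descending by eigenvalue
    Wpca = eigvecs[:, order[:d_prime]]  # keep the top-d' eigenvectors as columns
    return Wpca, m
```

Projecting a sample is then `y = Wpca.T @ (x - m)`, in the spirit of equations (2.5) and (2.6).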
At this point, we already have a tool to obtain the best directions for representing data. However, the d-dimensional spaces that PCA works on are usually huge, and therefore the size of C or S is very large. For example, if the d-space is formed from images of size 100x150, then the size of C or S is 15000x15000. In situations where the number of training samples is small, or at least smaller than d, we can compute the eigenvectors more efficiently via a smaller covariance matrix, as shown in Statement 2.2 below.
Before the statement is given, we need some notation, defined as follows. Let S = {s_1, s_2, …, s_N} be a set of N samples in d dimensions from which we want to find the transformation matrix W_{pca}; the mean of S is given by

m = \frac{1}{N} \sum_{i=1}^{N} s_i    (2.14)

Let X = {x_1, x_2, …, x_N} be the set of N samples obtained by subtracting m from each element in S, i.e., x_i = s_i - m, i = 1…N. Arranging these samples as the columns of a d-by-N matrix, also denoted X, the covariance matrix of S can be written as

C = \frac{1}{N} X X^t    (2.15)
Let \hat{C} be the matrix defined in equation (2.16); when the number of samples N is smaller than the number of dimensions d, the size of \hat{C} is also smaller than the size of C:

\hat{C} = \frac{1}{N} X^t X    (2.16)

Statement 2.2: Given C and \hat{C} defined in (2.15) and (2.16) above, the eigenvectors of C can be computed via the eigenvectors of \hat{C}.
Proof:
Let v_i and μ_i be an eigenvector and eigenvalue of \hat{C}, that is,

\hat{C} v_i = μ_i v_i    (2.17)

Multiplying both sides on the left by X gives

C (X v_i) = μ_i (X v_i)    (2.18)

Equation (2.18) means that X v_i and μ_i are, respectively, an eigenvector and eigenvalue of C. Therefore, we can conclude that the eigenvectors e_i of C can be computed from the eigenvectors v_i of \hat{C} by the transformation e_i = X v_i; meanwhile, the eigenvalues of the two
matrices are the same
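Statement 2.2 can be checked numerically with a small sketch. The names and sizes below are illustrative; samples are assumed to be stored as columns of X:

```python
import numpy as np

rng = np.random.default_rng(1)
d, N = 50, 8                        # many dimensions, few samples (N < d)
S = rng.normal(size=(d, N))         # raw samples as columns
m = S.mean(axis=1, keepdims=True)
X = S - m                           # x_i = s_i - m, as columns of X

C     = X @ X.T / N                 # (2.15): d x d, large
C_hat = X.T @ X / N                 # (2.16): N x N, small

mu, V = np.linalg.eigh(C_hat)       # eigenpairs of the small matrix, ascending
E = X @ V                           # e_i = X v_i maps them to eigenvectors of C
E /= np.linalg.norm(E, axis=0)      # normalize to unit length

# Verify: C e_i = mu_i e_i for every recovered eigenvector.
# The near-zero eigenvalue is skipped: centering makes the columns of X sum
# to zero, so one direction of C_hat is degenerate.
for i in range(N):
    if mu[i] > 1e-8:
        assert np.allclose(C @ E[:, i], mu[i] * E[:, i], atol=1e-8)
```

The eigendecomposition now costs work on an N-by-N matrix instead of a d-by-d one, which is the point of the statement.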
2.3.2 Fisher Discriminant Analysis
PCA is a very useful tool for capturing the variance of data samples, and therefore it can be used to obtain a representation of the data. In recognition, however, finding the most discriminative features is very important. Fortunately, one solution to this challenge can be found in [49].
Basically, finding discriminative features by Fisher's method in [49] can be seen as a mapping from d'-space to d''-space, as shown in equation (2.19), by the transformation matrix

W_{fld} = [w_1, w_2, \ldots, w_{d''}]    (2.19)

The result Z of the mapping for a set of data points Y can be computed as the product in equation (2.20) below:

Z = W_{fld}^t Y    (2.20)
One intuitive way to achieve the target of "best for discriminating" is to find W_{fld} such that it maximizes the difference between the class means of the projected data samples. However, Figure 2.1 demonstrates that the magnitude of the direction vector w strongly affects the distance between the projected means, yet it is not as important as the direction of w. Therefore, to obtain good separation, the distance between the class means should be large relative to some measure of the variance of each class. The criterion function can be written as in equation (2.21):

J = \frac{(\text{distance between the projected means})^2}{\text{variance for each class}}    (2.21)
Figure 2.1 Directions of Projections versus Scale Factors. (a)-(c) are Projections onto the Same Direction with Different Scale Factors; (d) is the Projection onto the Direction Discovered by Fisher Mapping with Scale Factor Equal to 1
As a simple case, we consider the problem of finding W_{fld} for data of two classes, ω_1 and ω_2, and we want to project the data from d' dimensions onto a straight line. Of course, even if the samples form well-separated, compact clusters in d'-space, projection onto an arbitrary line will usually produce a confused mixture of samples from all of the classes and thus poor recognition performance. However, by moving the line around, we might be able to find an orientation for which the projected samples are well separated. This is exactly the goal of classical discriminant analysis.
Suppose that Y contains N d'-dimensional samples, Y = {y_1, y_2, …, y_N}, of which N_1 samples are in the subset Y_1 and the remaining N_2 samples are in the subset Y_2, with N = N_1 + N_2. Similarly, we denote Z = {z_1, z_2, …, z_N}; Z_1 and Z_2 are the subsets of Z containing the projected samples of Y_1 and Y_2, respectively. Let w be the unit vector indicating the direction of projection that we want to find. To obtain the projected sample z ∈ Z for a sample y ∈ Y, we compute the dot product

z = w^t y

Let m_i = \frac{1}{N_i} \sum_{y \in Y_i} y be the mean of class i. Let \tilde{m}_1 and \tilde{m}_2 be the means of the two classes in the projected space; they are related to m_1 and m_2 as shown in equation (2.22):

\tilde{m}_i = w^t m_i    (2.22)
On the other hand, rather than forming the projected sample variance, we define the scatter for the projected samples as

\tilde{s}_i^2 = \sum_{z \in Z_i} (z - \tilde{m}_i)^2    (2.23)

Thus \frac{1}{N}(\tilde{s}_1^2 + \tilde{s}_2^2) is an estimate of the variance of the pooled projected samples, and (\tilde{s}_1^2 + \tilde{s}_2^2) is called the total within-class scatter of the projected samples.
From the derivations in equations (2.22) and (2.23), we can rewrite the criterion function of equation (2.21) as

J(w) = \frac{|\tilde{m}_1 - \tilde{m}_2|^2}{\tilde{s}_1^2 + \tilde{s}_2^2}
With some further rearrangement, we can rewrite J(w) in another form that generalizes better to the case of more than two classes. Let S_i and S_W be the scatter matrices defined by

S_i = \sum_{y \in Y_i} (y - m_i)(y - m_i)^t, \qquad S_W = S_1 + S_2    (2.24)

and let

S_B = (m_1 - m_2)(m_1 - m_2)^t    (2.25)

We call S_W the within-class scatter matrix. It is proportional to the sample covariance matrix; moreover, it is symmetric and positive semidefinite, and it is usually nonsingular if N > d', where d' is the number of dimensions of the car space. Likewise, S_B is called the between-class scatter matrix
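For the two-class case, the scatter matrices and the resulting direction can be sketched on toy data as follows. The sketch uses the well-known closed form w ∝ S_W^{-1}(m_1 - m_2) for the maximizing direction; the class locations and scales are made-up illustrations:

```python
import numpy as np

rng = np.random.default_rng(2)
Y1 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(40, 2))  # class 1 samples
Y2 = rng.normal(loc=[3.0, 1.0], scale=0.5, size=(60, 2))  # class 2 samples

m1, m2 = Y1.mean(axis=0), Y2.mean(axis=0)

# (2.24): within-class scatter S_W = S_1 + S_2
S1 = (Y1 - m1).T @ (Y1 - m1)
S2 = (Y2 - m2).T @ (Y2 - m2)
Sw = S1 + S2

# (2.25): between-class scatter S_B = (m1 - m2)(m1 - m2)^t
diff = (m1 - m2).reshape(-1, 1)
Sb = diff @ diff.T

# Closed-form maximizing direction: w proportional to Sw^{-1} (m1 - m2)
w = np.linalg.solve(Sw, m1 - m2)
w /= np.linalg.norm(w)

# Projected means should be well separated relative to the projected scatter
z1, z2 = Y1 @ w, Y2 @ w
J = (z1.mean() - z2.mean()) ** 2 / (len(z1) * z1.var() + len(z2) * z2.var())
```

Here `J` is the criterion of equation (2.21) evaluated at the found direction; any other unit vector would give a value no larger.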
From the definitions in equations (2.24) and (2.25), J(w) can be written as

J(w) = \frac{w^t S_B w}{w^t S_W w}

A vector w that maximizes J must satisfy

S_B w = λ S_W w    (2.26)

Equation (2.26) shows that, if S_W is a nonsingular matrix, the direction that best discriminates the data of the two classes can be computed as the eigenvector having the largest eigenvalue of the matrix S_W^{-1} S_B.
For the C-class problem, the natural generalization of Fisher's linear discriminant involves C-1 discriminant functions. Thus, the projection is from a d-dimensional space to a (C-1)-dimensional space, and it is tacitly assumed that d ≥ C. The generalizations of the within-class and between-class scatter matrices are defined in equations (2.27) and (2.28):

S_W = \sum_{i=1}^{C} \sum_{y \in Y_i} (y - m_i)(y - m_i)^t    (2.27)

S_B = \sum_{i=1}^{C} N_i (m_i - m)(m_i - m)^t    (2.28)

where m is the mean of all samples and N_i is the number of samples in class i. The vector w is now generalized to W_{fld}, whose columns are the directions that we want to seek; thus its size is d-by-(C-1). The criterion function J(w) is rewritten as
J(W_{fld}) = \frac{|W_{fld}^t S_B W_{fld}|}{|W_{fld}^t S_W W_{fld}|}

where |·| denotes the determinant.
Each column w_i of W_{fld} can be obtained by solving the conventional eigenvalue problem shown in equation (2.26). However, this is actually undesirable, since it requires an unnecessary computation of the inverse of S_W and also raises the singularity problem of that matrix. Instead, one can find the eigenvalues λ_i as the roots of the characteristic polynomial

|S_B - λ S_W| = 0

and then solve for the corresponding w_i in the equation below:

(S_B - λ_i S_W) w_i = 0
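A numerical sketch of the C-class case follows; the data are illustrative, and a robust implementation would use a generalized symmetric eigensolver rather than the plain `eig` call shown here:

```python
import numpy as np

rng = np.random.default_rng(3)
C = 3                                            # number of classes
centers = np.array([[0, 0, 0, 0], [4, 0, 1, 0], [0, 4, 0, 1]], dtype=float)
classes = [rng.normal(loc=c, scale=0.4, size=(30, 4)) for c in centers]

m = np.vstack(classes).mean(axis=0)              # overall mean

# (2.27)/(2.28): generalized within- and between-class scatter matrices
Sw = sum((Y - Y.mean(0)).T @ (Y - Y.mean(0)) for Y in classes)
Sb = sum(len(Y) * np.outer(Y.mean(0) - m, Y.mean(0) - m) for Y in classes)

# Solve Sw^{-1} Sb w = lambda w via a linear solve (avoids forming the inverse)
eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
order = np.argsort(eigvals.real)[::-1]
W_fld = eigvecs[:, order[:C - 1]].real           # C-1 discriminant directions

# Each kept column satisfies (Sb - lambda_i Sw) w_i = 0
for i in range(C - 1):
    lam = eigvals.real[order[i]]
    w_i = W_fld[:, i]
    assert np.allclose(Sb @ w_i, lam * (Sw @ w_i), atol=1e-6)
```

Only C-1 eigenvalues are nonzero, because S_B in (2.28) is a sum of C rank-one terms constrained by the overall mean, which matches the C-1 discriminant functions mentioned above.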
Chapter 3
System Architecture and Data Collection
3.1 System Architecture
The developed system can run in either the training or the recognizing phase. Its structure can be seen in Figure 3.1. The term "training" refers to the tasks of learning reference color regions from samples and training the classifiers used in the recognizing phase. The recognizing phase, in turn, decides which make and model of car appears in incoming images.
Before entering these two phases, reference images in the dataset, color samples, and incoming images are transformed into an appropriate color space on which the algorithm for learning reference colors works. In this thesis, the CIE L*u*v* color space is used due to its perceptual uniformity; the transformation from the RGB color space can be found in [50].
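As a concrete sketch, the standard sRGB (D65) to CIE L*u*v* conversion can be written as below. The exact variant in [50] may differ (for instance in the assumed white point or gamma handling), so the constants here are assumptions, not the thesis's values:

```python
import numpy as np

# sRGB (D65) -> XYZ matrix and reference white
M = np.array([[0.4124, 0.3576, 0.1805],
              [0.2126, 0.7152, 0.0722],
              [0.0193, 0.1192, 0.9505]])
Xn, Yn, Zn = M.sum(axis=1)             # white point = conversion of (1, 1, 1)
un = 4 * Xn / (Xn + 15 * Yn + 3 * Zn)  # u' of the white point
vn = 9 * Yn / (Xn + 15 * Yn + 3 * Zn)  # v' of the white point

def rgb_to_luv(r, g, b):
    """Convert an 8-bit sRGB triple to CIE L*u*v* (D65 white)."""
    srgb = np.array([r, g, b]) / 255.0
    # undo the sRGB gamma to get linear RGB
    lin = np.where(srgb <= 0.04045, srgb / 12.92, ((srgb + 0.055) / 1.055) ** 2.4)
    X, Y, Z = M @ lin
    t = Y / Yn
    L = 116 * t ** (1 / 3) - 16 if t > (6 / 29) ** 3 else (29 / 3) ** 3 * t
    denom = X + 15 * Y + 3 * Z
    if denom == 0:                     # pure black
        return 0.0, 0.0, 0.0
    u = 13 * L * (4 * X / denom - un)
    v = 13 * L * (9 * Y / denom - vn)
    return L, u, v
```

For white, `rgb_to_luv(255, 255, 255)` gives approximately (100, 0, 0), and saturated reds land at large positive u*, which is what makes tail lights stand out in this space.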
The mission of the reference color learning module is to define the boundary of the reference color regions in color space. This means that the colors in the red areas of tail lights are described precisely, so that they can be used by the car back-view segmentation module to generate candidates for tail light locations and, consequently, for the position of the car back-view.
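One simple way to realize such a learned color region, sketched here as an assumption rather than the thesis's exact model, is to fit a Gaussian to the L*u*v* tail-light samples and accept pixels within a Mahalanobis-distance threshold:

```python
import numpy as np

def learn_reference_color(samples):
    """Fit mean and inverse covariance to color samples (rows = L*u*v* triples)."""
    mu = samples.mean(axis=0)
    # small ridge keeps the covariance invertible for tight sample clusters
    cov = np.cov(samples, rowvar=False) + 1e-6 * np.eye(samples.shape[1])
    return mu, np.linalg.inv(cov)

def is_reference_color(pixel, mu, cov_inv, threshold=3.0):
    """Accept a pixel whose Mahalanobis distance to the learned region is small."""
    d = pixel - mu
    return float(d @ cov_inv @ d) <= threshold ** 2

# Hypothetical tail-light samples clustered around one red tone in L*u*v*
rng = np.random.default_rng(4)
samples = rng.normal(loc=[45.0, 150.0, 30.0], scale=2.0, size=(200, 3))
mu, cov_inv = learn_reference_color(samples)
```

The threshold of 3.0 (in standard deviations) is a placeholder; in practice it would be tuned on held-out tail-light and background pixels.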
After that, geometric properties of car back-views are verified at these suggested positions. Examples of such properties are the ratio between the width and the height of real car back-view images, the typical distance between the two tail lights, and symmetry. The candidates that best satisfy the verification criteria are considered car back-view images, and are therefore separated and normalized.
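The geometric checks can be sketched as a simple scoring function. The ranges used below (aspect ratio, light distance relative to width, symmetry tolerance) are illustrative guesses, not the thesis's calibrated values:

```python
def verify_back_view(width, height, left_light_x, right_light_x, image_center_x):
    """Score a candidate back-view region by simple geometric properties.

    Returns a score in [0, 3]; higher means a more plausible back-view.
    All thresholds are hypothetical placeholders.
    """
    score = 0
    aspect = width / height
    if 1.2 <= aspect <= 2.5:                           # typical back-view ratio
        score += 1
    light_distance = right_light_x - left_light_x
    if 0.6 * width <= light_distance <= 1.0 * width:   # lights near the car's edges
        score += 1
    midpoint = (left_light_x + right_light_x) / 2
    if abs(midpoint - image_center_x) <= 0.1 * width:  # symmetry about the center
        score += 1
    return score
```

A plausible candidate (wide region, lights near its edges, centered) scores 3, while a tall or asymmetric candidate scores 0 and can be discarded.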
In the training phase, all car back-view images extracted from the dataset are gathered before representative features for each car model are selected by the feature selection module. After that, the selected representative features are used
to train classifiers such as linear and quadratic discriminant functions.
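A minimal linear discriminant classifier in the spirit of the one described here can be sketched as follows. This is an illustration with a shared (pooled) covariance and equal priors; the thesis's classifiers may differ in detail:

```python
import numpy as np

class LinearDiscriminant:
    """Gaussian classifier with a pooled covariance, giving linear
    discriminant functions g_i(x) = w_i^t x + b_i."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        n, d = X.shape
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # pooled within-class covariance estimate, with a small ridge
        pooled = sum(
            (X[y == c] - X[y == c].mean(0)).T @ (X[y == c] - X[y == c].mean(0))
            for c in self.classes_
        ) / (n - len(self.classes_))
        inv = np.linalg.inv(pooled + 1e-6 * np.eye(d))
        self.w_ = self.means_ @ inv                         # one w_i per class
        self.b_ = -0.5 * np.einsum('ij,ij->i', self.w_, self.means_)
        return self

    def predict(self, X):
        scores = X @ self.w_.T + self.b_                    # g_i(x) for each class
        return self.classes_[np.argmax(scores, axis=1)]
```

A quadratic discriminant function differs only in keeping a separate covariance per class, which makes the decision boundaries quadratic instead of linear.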
In the recognizing phase, in order to obtain the characteristic features of cars, incoming images are processed through the car back-view separation and feature selection modules.
The extracted features are then tested against the trained classifiers mentioned above to decide which makes and models of car appear in the incoming images.
Figure 3.1 Proposed System Architecture
3.2 Dataset Collection
3.2.1 Conditions for Capturing Image
To evaluate the approaches proposed in this thesis, a set of car images has been recorded, referred to as the dataset. In order to capture the images, several assumptions have been made, explained as follows.
We assume that our recognition system makes decisions about the makes and models of cars that are standing still against a cluttered background. Therefore, motion blur can be avoided when building the system, and the dataset's images can be acquired by capturing cars standing on the road.
Compared to the front view and the side view, the car back-view probably exhibits the most significant characteristic features of car makes and models. From the back-view, not only indicative parts such as the make's logo and the license plate, but also the form of the tail lights, the spoilers, and the bumpers bear useful information. In particular, the tail lights and their surrounding areas can help the recognition system distinguish car makes and models rather accurately. For those reasons, the dataset is built from images captured from the back side of cars.
We also assume that the developed system is installed so that images can be captured in a near-field view; a typical camera setup can be seen in Figure 3.2, where the distance D from the camera to the back side of the cars is around 4 to 6 meters. In the horizontal direction, the camera position can vary within 1 meter around the vertical middle plane, whereas in the vertical direction the camera can be placed 1.5 to 2.5 meters above the ground. Moreover, car back sides can be slanted within ±3.5 degrees around the horizontal line, as shown in Figure 3.3.
As shown in the next chapter, the colors in the red areas of tail lights are used as reference elements from which car back-view images are separated and normalized. Therefore, there is another assumption: the developed system can be installed
in cluttered scenery where only the car's tail lights may be red. This seems a very tight constraint. Fortunately, we are still able to satisfy such a requirement in real applications; moreover, geometric properties of car back-views, such as symmetry and the ratios between inside components, can help relax that condition. For the experiments in this thesis, the dataset's images were captured so that the scenery surrounding the cars could contain several red areas, but these were small and unconnected to the tail lights; the cars themselves were also not red.
Figure 3.2 System Configuration
Figure 3.3 Slanted Angle of Car Back-Side
Effects such as rain, snow, and strong distortion are also not considered in the
developed dataset. A typical image can be seen in Figure 3.4 below or in Appendix A.
The dataset's images are normalized to a size of 400x300 (Width x Height) so that they can be processed in acceptable time. All images in the dataset are 24-bit color and are stored in JPEG format.
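Normalizing an image to 400x300 can be done with any imaging library; a dependency-free nearest-neighbor sketch is given below for illustration (it is not the tool actually used to build the dataset):

```python
import numpy as np

def resize_nearest(img, out_w=400, out_h=300):
    """Nearest-neighbor resize of an HxWx3 uint8 image to out_h x out_w."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return img[rows[:, None], cols]

# Example: shrink a synthetic 600x800 capture to the dataset size
img = np.zeros((600, 800, 3), dtype=np.uint8)
normalized = resize_nearest(img)           # shape (300, 400, 3)
```

In practice a library resampler with interpolation (bilinear or better) would be preferred, since nearest-neighbor sampling can alias the thin edges that the geometric verification relies on.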