BACK-VIEW CAR MODEL RECOGNITION
LE THANH SACH
A THESIS SUBMITTED IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF ENGINEERING IN COMPUTER ENGINEERING
SCHOOL OF GRADUATE STUDIES KING MONGKUT’S INSTITUTE OF TECHNOLOGY LADKRABANG
2007
COPYRIGHT 2007
This thesis proposes a new approach for classifying the model and manufacturer of a car from a still image of the car's back view. First, the red color of the tail lights is detected and tested against a color density model constructed from samples of tail-light colors. Red regions that may be tail lights are detected by comparing pixels with the learned red color density model. After that, geometric properties of the car back-view image are verified to find the positions where the tail lights should be. For classifying the make and model of cars, the Eigen technique is used together with Fisher linear discriminant analysis.
In the experiments, data on 17 popular models from several manufacturers were collected for analysis and testing of the system. The classification accuracy was approximately 93 percent. The experimental results show that this research can be developed further to distinguish a larger number of models and manufacturers.
Thesis Title Back-View Car Model Recognition
Degree Master
Year 2007
Thesis Advisor Dr Watchara Chatwiriya
Thesis Co-Advisor Prof Dr Shozo Kondo
Acknowledgements
I would like to thank Dr. Watchara Chatwiriya, my advisor, for his enthusiastic guidance and expansive discussion during the past 24 months. I am also thankful to my co-advisor, Prof. Dr. Shozo Kondo of Tokai University, for his encouragement and practical suggestions.
I am especially grateful to all the members of my family; they are always a motivation for me to better myself.
It is also noted that I received all kinds of help from the members of my laboratory; I could study well at KMITL thanks to the friendly working environment they created for me. Finally, I would like to mention that this thesis could not have been realized without the support of the JICA project for AUN/SEED-Net.
May, 2007
Contents
Page
Abstract (Thai) I
Abstract II
Acknowledgements III
Contents IV
List of Tables VII
List of Figures VIII
Chapter 1 Introduction 1
1.1 Background 1
1.2 Objective of the Study 2
1.3 Statement of the Thesis 3
1.4 Assumption of this Study 4
1.5 Theory or Concept to be Used in this Research 4
Chapter 2 Literature Survey 6
2.1 Vehicle Recognition 6
2.1.1 Sensor Selection 6
2.1.2 Vehicle Detection 7
2.1.3 Feature Extraction 10
2.1.4 Recognition 12
2.2 Color Image Segmentation 13
2.3 Eigen-Technique 14
2.3.1 Principal Component Analysis (PCA) 15
2.3.2 Fisher Discriminant Analysis 18
Chapter 3 System Architecture and Data Collection 25
3.1 System Architecture 25
3.2 Dataset Collection 26
Contents (cont.)
Page
3.2.1 Conditions for Capturing Image 26
3.2.2 The Number of Car Makes and Models under Consideration 29
3.3 Sample Reference Color Collection 30
Chapter 4 Car Back-View Image Segmentation 32
4.1 Introduction 32
4.2 Reference Color Learning 33
4.2.1 Color Density Modeling 33
4.2.2 Density Level Selection 46
4.3 Segmentation and Normalization 47
4.3.1 Segmentation 47
4.3.2 Normalization 59
Chapter 5 Feature Selection 61
5.1 Introduction 61
5.2 Image Space and Eigencar 62
5.3 Car Space and Car Feature 64
5.4 Fisher Car Space and Fisher Car Feature 65
Chapter 6 Recognition 69
6.1 Recognition with Quadratic Discriminant Functions 69
6.2 Recognition with Linear Discriminant Functions 72
6.3 Recognition with Nearest Neighborhood Rule 73
Chapter 7 Result and Discussion 76
7.1 Segmentation 76
7.1.1 Learning Reference Color 76
7.1.2 Separating Car back-views 77
Contents (cont.)
Page
Chapter 8 Conclusion 86
Bibliography 88
Appendix A Sample Car Images 93
Appendix B Publication List 102
List of Tables
Table Page
3.1 Makes, Models, Years and Number of Sample Images 30
4.1 Color Prototype Learning Algorithm 37
4.2 Color Combination Algorithm 43
4.3 Likelihood Optimization Algorithm 46
4.4 Formulation for Geometric Measurements 50
7.1 Parameters used in HPL, SA and EM algorithms 76
7.2 Recognition Performance 80
7.3 Recognition Performance Using LDF 83
7.4 Recognition Performance Using QDF 84
7.5 Recognition Performance Using K-NN (K=5) 85
List of Figures
Figure Page
1.1 Objects being Recognized by Vehicle Detection, Vehicle Type Recognition, and Car Model Recognition 3
1.2 Name of Several Components in Car Back-Sides 4
2.1 Directions of Projections versus Scale Factors; (a-c) are Projections onto the Same Direction with Different Scale Factors; (d) is the Projection onto the Direction Discovered by Fisher Mapping with a Scale Factor Equal to 1 20
3.1 Proposed System Architecture 26
3.2 System Configuration 28
3.3 Slanted Angle of Car Back-Side 28
3.4 A Typical Image in the dataset 29
3.5 (a) An Example Distribution of Sample Colors in RGB Color Space (b) Several Red Images Sliced from Tail Light Locations 31
4.1 (a) An Input Image (b) Its Separated Back-View Image 32
4.2 Steps in Color Density Modeling 35
4.3 (a) A Sample Set of Reference Colors Projected onto 2D-plane (u*v*) (b) An Example of Approximation Using Circular Prototypes 36
4.4 Other Approximations for The Distribution in The Previous Figure; (a) Using Big Size Circular Prototypes and (b) Using Small Size Prototypes 38
4.5 A Simple Distribution and Its Possible Approximations; (a) and (b) Use Spherical Covariance Matrices with Different Orders of Sample Colors; (c) Uses Full Covariance Matrix 39
4.6 Interested Regions and Their Boundaries Defined by Several Density Levels 47
4.7 A Simple Case of Pixel Classification (a) Original Image (b) Filtered Image 48
4.8 Definition of Car Back-View Parameters 50
4.9 Variances and Loci of Gravity Centers for Several Car Back-View Images 52
4.10 Location and Size of Car Back-View Image Candidate 53
4.11 Some Complicated Results of Pixel Classification 54
4.12 Examples of Histograms and Lanes 55
4.13 Removing Noisy Areas in Filtered Images (a) After Filtering (b) After Removing 56
List of Figures (cont.)
Figure Page
4.14 H-lanes Detection 56
4.15 Bounding Rectangles for Red Areas 57
4.16 Rectangular Regions of Several Candidates 57
4.17 Symmetric Rule Verification 58
4.18 Car Back-View Separation Flowchart 59
5.1 Steps in Selecting Representative Features 62
5.2 Flowchart for Obtaining Eigencars 64
5.3 Representation of Data 66
5.4 Discrimination Between Classes 67
5.5 Algorithm for Obtaining Discrimination Directions 68
6.1 Recognition Steps 72
6.2 K-NN Example 73
6.3 K-NN Algorithm 75
7.1 Boundaries of Reference Regions defined by HPL and the Proposed Method 77
7.2 Failure Situations in Detecting Red Areas 79
7.3 The Impact of Number of Dimensions (a) for Car Space (b) for Fisher Car Space 82
Car Model Recognition deals only with a subset of vehicles called "cars". Although there are a considerable number of existing works in vehicle recognition, most of them address either vehicle detection or vehicle type classification. No serious research has classified vehicles into subclasses of types, such as the makes and models of vehicles. The following text in this section presents a brief introduction to vision-based vehicle recognition (VBVR) and several existing approaches. The details of VBVR can be found in the first section of Chapter 2. Generally, a VBVR system is composed of three tasks: vehicle detection, representative feature extraction and recognition. All three of these tasks are important in the sense of making the system accurate and usable.
Vehicle detection, sometimes called vehicle segmentation, is the task of locating vehicles in images. Although locating objects is very simple work for humans, it is really challenging for machines. There are several existing approaches [4] for detecting vehicles; such approaches can be classified into two groups, which are called "exhaustive detection" and "selective detection" in this thesis. In exhaustive detection, vehicles are searched for at every pixel in the image; meanwhile, selective detection focuses the search around only the most likely locations by using specific information. Obviously, exhaustive detection is time-consuming and prohibitive in real-time applications [4].
The step following detection is to obtain representative features for classes. In the view of recognition, the term "classes" is used to refer to the groups of objects being distinguished by the system [5]. For example, in the case of vehicle detection, a class can be either the group of vehicles under investigation or the group of backgrounds and other obstacles; meanwhile, classes can be "Bus", "Truck", "Car" and so forth in vehicle type recognition. A typical way of obtaining features of vehicles is to measure vehicle properties such as length [2], height, width [6] and color [7]. Recent research shows that other features, obtained by Principal Component Analysis (PCA) [8] [9] or in transform domains (e.g. Wavelet, Gabor filter) [10], are also efficient in discriminating data between classes. The greatest challenge in feature extraction is to obtain features that are discriminative and robust to noise, distortion and modification of vehicles.
The last step in VBVR is to classify unknown vehicles into classes; this step is also called recognition. The method for recognition can be as simple as a comparison in some applications [7] [11]. However, the majority of the literature uses classical pattern classification methods such as the Quadratic Discriminant Function (QDF), K-Nearest Neighbor (K-NN), the Probabilistic Neural Network (PNN) and the Support Vector Machine (SVM) [8] [9] [10] for recognizing unknown objects.
1.2 Objective of the Study
It can be seen from the previous section that the most challenging tasks in VBVR are to detect vehicles in images accurately and quickly, and to increase the recognition complexity, which is defined as the number of classes being recognized and the amount of information that a VBVR system can answer. Actually, expanding all the vehicle types in the world into subclasses is likely impossible for a time-bound work. For these reasons, this thesis selects a subset of vehicles called "cars" and aims to achieve the following tasks.
1. It utilizes color and geometric properties of car back-sides in order to speed up the segmentation of cars from images captured from the back view of cars in near-field view.
2. It increases the recognition complexity by trying to recognize car makes and models, as shown in Figure 1.1.
Figure 1.1 Objects being Recognized by Vehicle Detection, Vehicle Type Recognition,
and Car Model Recognition
1.3 Statement of the Thesis
An investigation of many cars shows that car back-sides contain red colors at the tail lights and have some other geometric properties, such as symmetry and correlation between the components inside. From these observations, the thesis tackles the objectives above as follows.
1. It proposes a method for describing the region of red colors in color spaces. As can be seen in Section 3.3, the red areas at tail lights do not contain only one pure red color, i.e. [255, 0, 0] in RGB color space; actually, they contain all the colors in a certain region inside the color space. Therefore, the thesis proposes a statistical approach for approximating such distributions.
2. The thesis uses red colors to limit the region in images for searching for car back-views, and can thereby speed up the segmentation task. Car back-views are detected in this thesis by verifying geometric properties of candidate car back-views.
3. The Eigen-technique is used for selecting representative and discriminating features of car models. These features are used to recognize car models by the linear discriminant function, the quadratic discriminant function and the nearest neighbor rule.
1.4 Assumption of this Study
In order to realize the ideas above, the thesis makes several assumptions.
1.5 Theory or Concept to be Used in this Research
Definition 1.1: The terms car make and model are used to refer to sub-classes of vehicles, as shown in Figure 1.1.
Definition 1.2: The names of several components in car back-sides that are referred
to in this thesis are given in Figure 1.2
Definition 1.3: The term red color in this thesis does not mean the pure red color in a color space, i.e. [255, 0, 0] in RGB color space. It can be any color that can appear in the red areas of tail lights.
Figure 1.2 Name of Several Components in Car Back-Sides
Definition 1.4: In this thesis, we use the colors that appear in the red areas of tail lights as reference objects for segmenting car back-view images. Such colors we name reference colors or, interchangeably, interested colors. Regions in a color space that contain such colors we name reference color regions or interested color regions.
Definition 1.5: Rather than specifying each color in a color space as a reference color, we should collect a set of such colors and then seek a way to infer reference color regions from this set. The colors collected for this goal are called sample reference colors, or sample colors for short.
Chapter 2 Literature Survey
2.1 Vehicle Recognition
2.1.1 Sensor Selection
Generally, the first step in designing vehicle recognition systems is to select suitable sensor types for acquiring the input data. The steps thereafter in vehicle recognition depend strongly on the selected sensors.
Sensors can be classified into two types [1], active and passive. The term "active" is used to mean that the sensors detect the distance of objects by measuring the travel time of a signal emitted by the sensors and reflected by the objects. Radar-based, laser-based and acoustic-based sensors are examples of this category. Meanwhile, optical sensors such as normal cameras are classified as passive sensors; sometimes they are also called vision-based sensors. Vehicle recognition that uses vision-based sensors is called vision-based vehicle recognition (VBVR), which is the context of the study in this thesis.
Although vision-based sensors are less robust than radar-based and laser-based sensors in rain, fog, night and direct sunshine, they are inexpensive and able to create a broad field of view for vehicles (up to 360 degrees around the vehicle). Moreover, they can be used for some other specific applications, such as lane marking detection and obstacle identification, without requiring any modification to the road infrastructure. Vision-based sensors also avoid interference between sensors of the same type, which can be critical when a large number of vehicles using active sensors move simultaneously in the same environment. These reasons explain the fact that the vision-based approach has received much attention from researchers in vehicle recognition in recent years.
The three following sections present a survey of existing approaches for the other steps in vehicle recognition using vision-based sensors.
2.1.2 Vehicle Detection
Vehicle detection is the step in vehicle recognition that locates vehicles within whole images. The locations of vehicles are usually described by rectangular regions in the images. Such regions are called regions of interest, or ROIs, in some applications [10]. Although detecting ROIs is straightforward in systems which use active sensors, it is a complicated task in vision-based systems.
Generally, the framework for detecting ROIs contains two basic steps, as follows.
1. The first step is to generate candidates for ROIs inside the whole image. Basically, there are two approaches, which are called exhaustive and selective detection in this thesis.
2. The second step is to verify the candidates, to decide whether a candidate is a real ROI of a vehicle. Because the majority of the systems under investigation, such as those in [2] [8], treat the verification as a two-class recognition problem, the description of candidate verification is delayed until the section "Recognition" below.
2.1.2.1 Exhaustive Detection
Existing studies in this approach assume that no a priori knowledge is available for detection. Hence, in order to detect vehicles, several windows of different sizes are slid over the whole image to generate candidates [8] [9] [12]. Research in this approach is able to detect vehicles at every pixel of the input images. However, a tremendous number of candidates will clearly be generated this way; therefore, this approach needs powerful computing resources and seems to be prohibitive for real-time applications. Usually, the inputs for systems developed using this approach are still images.
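The cost of this exhaustive search can be seen in a small sketch (function name, window sizes and stride are illustrative assumptions, not values from the surveyed works):

```python
import numpy as np

def sliding_window_candidates(image, window_sizes, stride):
    """Generate candidate ROIs (x, y, w, h) by sliding windows of
    several sizes over the whole image (exhaustive detection)."""
    H, W = image.shape[:2]
    candidates = []
    for (wh, ww) in window_sizes:
        for y in range(0, H - wh + 1, stride):
            for x in range(0, W - ww + 1, stride):
                candidates.append((x, y, ww, wh))
    return candidates

# Even a small 100x100 image with two window sizes and stride 4
# already yields hundreds of candidates, each of which must be verified.
img = np.zeros((100, 100), dtype=np.uint8)
cands = sliding_window_candidates(img, [(32, 32), (48, 64)], stride=4)
print(len(cands))
```

With a realistic image size, a stride of 1 and many scales, the candidate count grows into the millions, which is exactly why this approach is prohibitive in real time.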
2.1.2.2 Selective Detection
This approach generates candidates around only the most likely regions by utilizing some specific information, and can therefore speed up the detection process. The specific information can come from many sources, which are summarized as follows.
2.1.2.2.1 Subtraction-based Method
Most research that uses vision-based sensors alone follows this method of candidate generation. Candidates are generated by a subtraction between the input image and the background, or between two consecutive images in an image sequence [2] [11] [13] [14] [15]. The former is used only in cases where the background can be modeled or collected reliably, while the latter is usually used for detecting moving objects in image sequences.
A typical background subtraction was studied in [2] [11]; because stationary vision-based sensors were used in a controllable environment, the background image, called Ibg, could be modeled reliably upon program execution. To detect vehicles in an image I, a binary image Ib was formed as in equation (2.1), where θ was a threshold value for transforming the difference between the two images into the binary image. White pixels in Ib that were inside sufficiently large regions were considered as pixels of an ROI.

Ib(x, y) = 1 if |I(x, y) − Ibg(x, y)| > θ, and Ib(x, y) = 0 otherwise   (2.1)
On the other hand, the studies in [13] and [15] could adapt the background to changes in the environment by an algorithm called self-adaptive background subtraction. The principle of the method in those studies is to modify the background image (CB) by using an instantaneous background (IB) and applying an appropriate weighting α as follows:

CBk+1 = (1 − α) CBk + α IBk

where k is the frame index in the image sequence. The instantaneous background is defined as IBk = Mk • CBk + (~Mk) • Ik, where Ik is the current frame and Mk is the binary vehicle mask, similar to Ib above.
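Both subtraction rules can be sketched in a few lines of NumPy (a hedged illustration; the function names, the toy image and the parameter values θ = 50, α = 0.1 are assumptions for the demonstration, not values from [2] [11] [13] [15]):

```python
import numpy as np

def binarize_difference(I, I_bg, theta):
    """Equation (2.1): mark pixels whose absolute difference from the
    background exceeds the threshold theta."""
    return (np.abs(I.astype(int) - I_bg.astype(int)) > theta).astype(np.uint8)

def update_background(CB, I, M, alpha):
    """Self-adaptive update: IB = M*CB + (~M)*I, then
    CB_{k+1} = (1 - alpha)*CB + alpha*IB."""
    IB = np.where(M == 1, CB, I)       # keep old background under the mask
    return (1 - alpha) * CB + alpha * IB

I_bg = np.full((4, 4), 100.0)
I = I_bg.copy()
I[1:3, 1:3] = 200.0                    # a bright 2x2 "vehicle"
M = binarize_difference(I, I_bg, theta=50)
CB = update_background(I_bg, I, M, alpha=0.1)
print(M.sum())
```

Because the mask excludes the vehicle pixels from the update, the background estimate is not contaminated by the moving object.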
2.1.2.2.2 Knowledge-based Method
Knowledge-based methods utilize properties of vehicles, such as their symmetry, colors, edges and textures, to hypothesize vehicle locations in images.
1 Symmetry
Symmetry is one of the main signatures of man-made objects and is very useful for detecting and recognizing vehicles [4]. Images of vehicles observed from the back view are generally symmetric about a vertical axis.
In [17], a symmetric measure S_A(x_s, w) was computed for each scan-line of the image, where x_s was the position of a potential symmetry axis within an interval w inside the scan-line. The symmetric measures of all scan-lines were accumulated to form a symmetry histogram for the image. ROI candidates were then derived from the symmetry histogram and the edge map of the image.
On the other hand, the work in [19] used the symmetry property as a criterion for validating ROI candidates, and in [20] symmetry detection was formulated as an optimization problem which was solved using neural networks.
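The accumulation idea can be sketched roughly as follows (the actual measure S_A in [17] is more elaborate; the mirror-difference score below is a simplified stand-in, and the toy image is an assumption):

```python
import numpy as np

def symmetry_score(line, xs, w):
    """Simplified stand-in for S_A(xs, w): negative total mirror
    difference about axis xs within half-width w (0 = perfectly symmetric)."""
    d = np.arange(1, w + 1)
    return -np.abs(line[xs - d].astype(int) - line[xs + d].astype(int)).sum()

def symmetry_histogram(img, w):
    """Accumulate per-scan-line scores over all rows, one bin per
    candidate axis position xs."""
    H, W = img.shape
    hist = np.full(W, -np.inf)
    for xs in range(w, W - w):
        hist[xs] = sum(symmetry_score(img[r], xs, w) for r in range(H))
    return hist

# A toy image with two mirror pairs about column 8: the histogram
# peaks at the true symmetry axis.
img = np.zeros((5, 16), dtype=np.uint8)
img[:, 5] = img[:, 11] = 255
img[:, 6] = img[:, 10] = 255
hist = symmetry_histogram(img, w=4)
print(int(np.argmax(hist)))
```

On a real back-view image, the peak of this histogram suggests the lateral position of the vehicle, which is then cross-checked against the edge map as in [17].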
2 Color
Although color information is very useful in face detection [21] [22] and in other applications in vehicle recognition, such as lane and road detection [23], only a few existing systems use color for detecting vehicles.
In [23], a set of sample colors for roads was collected; after that, regions in color space that contain road colors were approximated using spheres by a density-based learning algorithm. The Lu*v* color space was used in order to achieve the best uniformity of perception. Roads were detected by checking each pixel in the input images to decide whether it was inside or outside the approximated region.
A typical research work that uses color for detecting vehicles was presented in [24]. In that research, colors of cars and the background were collected and normalized by a method proposed in that work. Both the normalized colors of cars and of backgrounds were assumed to follow Gaussian models; thereby, all pixels in images could be classified as foreground (cars) or background according to a Bayesian classifier. Pixels that were classified as foreground were good suggestions for the locations of cars in images.
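The Bayesian decision used in [24] can be sketched as follows (the means, covariances and prior below are invented for illustration, not parameters from that work):

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate normal with full covariance."""
    d = len(mean)
    diff = x - mean
    inv = np.linalg.inv(cov)
    logdet = np.linalg.slogdet(cov)[1]
    return -0.5 * (diff @ inv @ diff + logdet + d * np.log(2 * np.pi))

def classify_pixel(color, fg_model, bg_model, prior_fg=0.5):
    """Bayesian decision between foreground (car) and background,
    each modeled as a single Gaussian as assumed in [24]."""
    lf = gaussian_logpdf(color, *fg_model) + np.log(prior_fg)
    lb = gaussian_logpdf(color, *bg_model) + np.log(1 - prior_fg)
    return "foreground" if lf > lb else "background"

fg = (np.array([200.0, 30.0, 30.0]), 400 * np.eye(3))   # reddish car colors
bg = (np.array([90.0, 90.0, 90.0]), 400 * np.eye(3))    # grey road colors
result = classify_pixel(np.array([190.0, 40.0, 35.0]), fg, bg)
print(result)
```

Applying this decision independently to every pixel yields the foreground mask from which car location hypotheses are drawn.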
3 Shadow
According to [25], the shadows underneath vehicles can be used as a sign for detecting vehicles, because these regions are darker and cooler than others in the image. It is obvious that such signs are very useful in the sense of locating the position of a vehicle in an image. However, it is difficult to choose a suitable threshold value to segment shadows from other regions. Moreover, the shadow depends heavily on the illumination conditions and the moving direction of the vehicle.
4 Vertical/Horizontal Edge and Corners
The boundaries of vehicle back-sides are nearly rectangular; moreover, vehicle back-view images usually contain many horizontal and vertical lines. From these observations, the studies in [16] [18] [26] [27] [28] have proposed several ways of using edges and corners to hypothesize the locations of vehicles in images.
The method presented in [16] and [18] generated candidates for vehicles by combining symmetry properties, corners and edges obtained from the edge maps of images. On the other hand, the method proposed in [26] segmented images into four regions (pavement, sky and two lateral regions) using edge grouping. After that, groups of horizontal edges on the detected pavement were considered for hypothesizing the presence of vehicles.
2.1.3 Feature Extraction
Feature extraction is the step that obtains the characteristic features which will be used for verifying the candidates generated in the detection step above, or for recognizing vehicles in vehicle recognition applications. Finding robust and discriminative features is the greatest challenge in this step. The following sections present several ways of extracting features that were used in the majority of the literature.
2.1.3.1 Vehicle Features
Studies in this group aim to extract features that are properties of vehicles, e.g. length, height, width, the number of axles and wheels, and colors. Except for the number of axles and wheels, which have usually been measured by active sensors [6] [29], the other properties have been estimated from images, as in [2] [13] [15] [30]. In those studies, lengths and widths were estimated as the width and height of vehicle regions in 2-D images respectively; meanwhile, heights were computed from two images by a stereo-based approach in [6].
In another way, in [7], the distributions of colors in several areas inside car back-view images, such as the tail lights, license plate and windshield, were employed as features for characterizing cars in a car detection application.
2.1.3.2 Statistical Features
In this approach, images containing vehicles (ROIs) are converted into 1-D vectors. Features are obtained by projecting these vectors onto pre-computed directions. Such directions are eigenvectors derived from a set of training images. This method is called Principal Component Analysis (PCA) and is presented in detail later in this chapter. Typical works that follow this approach are [8] and [9].
2.1.3.3 Transform Domain Features
Features extracted by this approach are computed as the results of a transformation, such as the Gabor filter [10] [11] [31] or the Wavelet transform [31]. Gabor filter responses for an image I(x, y) of size N×N are computed by filtering the image with a bank of Gabor kernels at several orientations and scales.
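A Gabor response can be sketched with the common cosine-modulated Gaussian form of the kernel (this standard form and the parameter values below are assumptions; the surveyed works use their own parameterizations):

```python
import numpy as np

def gabor_kernel(size, theta, lam, sigma):
    """Real-valued Gabor kernel: a Gaussian envelope modulated by a
    cosine of wavelength lam along orientation theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)

def gabor_response(img, kernel):
    """Filter response by direct 2-D correlation (valid region only)."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

img = np.zeros((16, 16))
img[:, 8:] = 1.0                               # a vertical edge
k = gabor_kernel(7, theta=0.0, lam=4.0, sigma=2.0)
resp = gabor_response(img, k)
print(resp.shape)
```

Feature vectors are then formed from the response magnitudes of a whole bank of such kernels over the ROI.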
2.1.3.4 Generic Features
The term "generic" is used to imply that the methods in this approach use general algorithms in image processing, such as edge detection [32] and histograms [33], for extracting features.
Xiaoxu et al. proposed in [32] a method for extracting features by combining the following steps:
1. Extract edge points using edge detection methods.
2. Use SIFT [34] as a local descriptor to extract local features for each edge point.
3. Segment the edge points into groups based on their similarity.
4. Form features from the edge point segments.
On the other hand, features were obtained in [33] by forming the histogram of a distance map, which was the map of the distances from each pixel in the input image to the corresponding pixel in the mean image of the class.
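The distance-map histogram of [33] reduces to a few lines (the bin count, image sizes and pixel values are illustrative assumptions):

```python
import numpy as np

def distance_map_histogram(img, mean_img, bins=8, max_dist=256):
    """Sketch of the feature in [33]: the normalized histogram of
    per-pixel absolute distances between an image and its class mean."""
    dist = np.abs(img.astype(int) - mean_img.astype(int))
    hist, _ = np.histogram(dist, bins=bins, range=(0, max_dist))
    return hist / hist.sum()

mean_img = np.full((8, 8), 100, dtype=np.uint8)
img = mean_img.copy()
img[0, 0] = 228                    # one pixel far from the class mean
h = distance_map_histogram(img, mean_img)
print(h[0], h[4])
```

An image close to its class mean concentrates mass in the low-distance bins, which is what makes the histogram discriminative.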
2.1.4 Recognition
Recognition is the step that labels unknown objects with class names [5]; however, there is confusion in VBVR in the use of the terms "recognition" and "detection". This is probably because detection can be seen as a recognition problem with two classes: vehicles versus background and other obstacles [2] [8].
The recognition step usually depends on the kind of representative features used for vehicles. For example, in [7], the colors of several components in car back-views, such as the tail lights, license plate and windshield, were modeled by a Gaussian Mixture Model (GMM). In order to recognize vehicles, a likelihood ratio, defined as the quotient of the likelihood of the testing image over the likelihood of the training images, was computed. This value was compared to a pre-defined range to yield the recognition result, which was the detection result in that study.
Similarly to [7], in order to recognize an unknown object in [11], Gabor jets for each pixel of the testing image were computed and compared with those derived from the training images. The unknown object was labeled with the label of the class that best matched the testing image.
Compared to the specific methods above, most existing studies utilize classical pattern classification methods such as QDF, K-NN, PNN and SVM [8] [9] [10] for recognizing unknown objects.
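Of these classifiers, the K-NN rule is the simplest to sketch (the toy 2-D data and the value k = 5 are assumptions for illustration):

```python
import numpy as np

def knn_classify(x, train_X, train_y, k=5):
    """K-nearest-neighbor rule: vote among the k training samples
    closest to x in Euclidean distance."""
    d = np.linalg.norm(train_X - x, axis=1)
    nearest = train_y[np.argsort(d)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
result = knn_classify(np.array([4.8, 5.2]), X, y, k=5)
print(result)
```

In a VBVR system, `x` would be the feature vector of an unknown vehicle and `train_X` the features of the labeled training images.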
2.2 Color Image Segmentation
Image segmentation plays a central role in vision-based recognition tasks such as vehicle recognition and face recognition; it is a process that partitions an image into meaningful regions. Such a partition is the first obligatory step of a vision system, and its quality deeply impacts the performance and accuracy of the overall system. Recent studies favor techniques for segmenting color images, which naturally carry more features than monochrome images.
The colors of interested objects in images are characterized by their chromaticity and brightness [35] and are therefore affected by the lighting conditions. For this reason, interested colors are distributed randomly, with an unknown probability density function, in color spaces. Despite the fact that several color spaces have been employed to make the perception of colors more uniform, the unknown form of the distribution has not been removed completely. Hence, modeling the density of interested colors is still a problematic task.
In some applications where the lighting condition is controllable, and the form or parameters of the color density functions can be acquired or simply estimated, interested color regions in color spaces can be described using cubes [21] [36], spheres [37] or ellipses [22]. Generally speaking, the assumption in those approaches is rarely satisfied in broader cases of color image segmentation. Moreover, several of the afore-mentioned works extracted their parameters manually.
The research in [24] approached color distribution modeling in a statistical way. It required the acquisition of both groups of interested and background colors, and utilized a Bayesian classifier to segment incoming image pixels. Generally, this approach can work well when the distributions of the interested and background colors are in normal form and separable from each other. However, that requirement is likely impractical. Based on the assumption that any distribution of points in a multi-dimensional space can be approximated by a GMM with enough mixing components [38], the works in [39], [40] and [41] utilized GMMs as the underlying model for the distribution of interested colors, or of combinations of colors, textures, depths and positions of pixels. However, finding a suitable starting point for learning the parameters of a GMM and choosing a reasonable number of mixing components are still drawbacks of those approaches.
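A bare-bones EM fit of a spherical GMM illustrates both the model and the sensitivity to initialization noted above (the function, the toy 2-D data and the starting means are assumptions; the works [39] [40] [41] operate on real color vectors):

```python
import numpy as np

def em_spherical_gmm(X, means, iters=50):
    """Minimal EM for a K-component spherical GMM, started from the
    given initial means; the need to choose those means well is the
    initialization drawback mentioned in the text."""
    N, d = X.shape
    K = len(means)
    var = np.full(K, X.var())
    w = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: responsibilities under spherical Gaussians
        d2 = ((X[:, None, :] - means[None]) ** 2).sum(-1)          # N x K
        logp = np.log(w) - 0.5 * (d * np.log(2 * np.pi * var) + d2 / var)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights, means and per-component variances
        nk = r.sum(axis=0)
        w = nk / N
        means = (r.T @ X) / nk[:, None]
        d2 = ((X[:, None, :] - means[None]) ** 2).sum(-1)
        var = (r * d2).sum(axis=0) / (d * nk)
    return means, var, w

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (100, 2)), rng.normal(4, 0.5, (100, 2))])
means, var, w = em_spherical_gmm(X, means=X[[0, -1]].copy())
print(np.sort(means[:, 0]).round(1))
```

Here one initial mean is taken from each cluster, so EM recovers the two components; a poor initialization or a wrong K can leave the fit far from the true distribution.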
2.3 Eigen-Technique
The eigenvalue problem has widespread use in engineering; it aims to find the eigenvectors and eigenvalues that satisfy the equation below [42]:

Ax = λx   (2.2)

where A is a square matrix, and x and λ are an eigenvector and an eigenvalue respectively.
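Numerically, such eigenpairs are what any linear-algebra library returns; a quick check (the matrix is an arbitrary example):

```python
import numpy as np

# Verify Ax = λx numerically for a small symmetric matrix.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
vals, vecs = np.linalg.eig(A)
x, lam = vecs[:, 0], vals[0]
print(np.allclose(A @ x, lam * x))
```

Each column of `vecs` paired with the corresponding entry of `vals` satisfies equation (2.2) up to floating-point error.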
Eigenvectors and eigenvalues have been demonstrated by much research to be very useful for obtaining both representing [9] [43] [44] [45] and discriminating [46] [47] features in vehicle and face recognition. In this thesis, the term eigen-technique is used to refer to a way of using eigenvectors and eigenvalues.
In the studies [9] [43] [44] [45], images were converted to 1-D vectors and then projected onto the K selected eigenvectors having the largest eigenvalues. The eigenvectors in those works were computed by solving the traditional eigen problem as in equation (2.2), where S, the scatter matrix of the training samples, takes the place of A. This process is explained in detail in the next sub-section.
On the other hand, the studies in [46] and [47] obtained features by solving different generalized eigen problems, given in equations (2.3) and (2.4) respectively, where SB and SW were the between-class and within-class scatter matrices respectively, and X, L and D were matrices formed in specific ways in those studies.
The following two sub-sections present the ways of obtaining representing and discriminating features based on the eigen-technique.
2.3.1 Principal Component Analysis (PCA)
PCA is a mathematical tool with a long history. The roots of PCA originate from the efforts of Pearson in 1901 and Hotelling in 1933 [48]. However, an efficient way of computing eigenvectors and eigenvalues did not appear until the 1940s, with Karhunen and Loeve.
Mathematically, PCA is a linear transformation; it transforms a vector x in a d-dimensional space into a vector y in a d'-dimensional space, as shown in equations (2.5) and (2.6), where d' is usually considerably smaller than d and Wpca is the transformation matrix. This kind of transformation can be achieved by projecting the vector x onto the column vectors of Wpca.
The underlying idea of PCA is that it aims to find a projection which maps high-dimensional data to a low-dimensional space while maximizing the variance of the projected data. Equivalently, PCA minimizes the sum of squared distances between the original data and the projected data. The way to find such projections can be seen in the statement below.
Statement 2.1: Given a set of N samples in a d-dimensional space, X = {x1, x2, ..., xN}, the transformation matrix Wpca of PCA is formed from the column eigenvectors of the covariance matrix C of X whose associated eigenvalues are the largest among all the eigenvalues.
Proof:
Let m and S be the mean and the scatter matrix of X; they can be computed by equations (2.7) and (2.8) respectively:

m = (1/N) Σk xk   (2.7)

S = Σk (xk − m)(xk − m)^t   (2.8)
To begin, assume that we need to reduce d dimensions to one dimension; the projection is defined by a unit vector e passing through the sample mean m. Let yk be the image of xk under the projection defined by e; yk can be expressed as in equation (2.9), where ak is a scalar value corresponding to the distance from yk to the sample mean m:

yk = m + ak e   (2.9)
An optimal set of ak can be obtained by minimizing the sum of squared distances as shown in equation (2.10), where J1(a1, ..., aN, e) is expanded in equation (2.11):

J1(a1, ..., aN, e) = Σk ||(m + ak e) − xk||^2   (2.10)

J1(a1, ..., aN, e) = Σk ak^2 ||e||^2 − 2 Σk ak e^t (xk − m) + Σk ||xk − m||^2   (2.11)

Recognizing that ||e|| = 1, partially differentiating with respect to ak, and setting the derivative to zero, we obtain ak = e^t (xk − m).
Substituting ak back into J1 gives J1(e) = −e^t S e + Σk ||xk − m||^2, where the term Σk ||xk − m||^2 is independent of e; minimizing J1 is therefore equivalent to maximizing e^t S e. Using the method of Lagrange multipliers with an undetermined multiplier λ, we can maximize e^t S e subject to the constraint that ||e|| = 1.
Differentiating the Lagrangian e^t S e − λ(e^t e − 1) with respect to e and setting the result to zero yields

Se = λe   (2.12)

Equation (2.12) means that e and λ are a corresponding eigenvector and eigenvalue of S. Moreover, because e^t S e = λ e^t e = λ, in order to maximize e^t S e we select the eigenvector corresponding to the largest eigenvalue of S. In other words, to find the best one-dimensional projection of the data, we project the data onto a line through the sample mean in the direction of the eigenvector of the scatter matrix having the largest eigenvalue.
This result can be extended from a one-dimensional projection to a d'-dimensional projection. In place of equation (2.9), we write

y_k = m + \sum_{i=1}^{d'} a_{ki} e_i

When J_{d'} is minimized, the vectors e_1, e_2, …, e_{d'} are the d' eigenvectors of the scatter matrix having the largest eigenvalues. In other words, the projection matrix of PCA can be formed from the eigenvectors of S as W_{pca} = [e_1, e_2, \ldots, e_{d'}], where each e_i is a column vector.
Because S is merely the sample covariance matrix C, defined in (2.13), multiplied by N, maximizing e^t S e has the same effect as maximizing e^t C e.
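The procedure above can be sketched in NumPy. This is a minimal illustration, not the thesis's implementation; the data layout (one sample per row) and the choice of d' are assumptions:

```python
import numpy as np

def pca(X, d_prime):
    """Compute the PCA projection matrix Wpca from samples.

    X       : array of shape (N, d), one sample per row
    d_prime : number of principal directions to keep (d' << d)
    Returns (Wpca, mean), where Wpca has shape (d, d_prime).
    """
    m = X.mean(axis=0)                  # sample mean, equation (2.7)
    Xc = X - m                          # centered data
    C = Xc.T @ Xc / X.shape[0]          # covariance matrix C = S / N
    # eigh is appropriate because C is symmetric; eigenvalues come out ascending
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]   # sort descending by eigenvalue
    Wpca = eigvecs[:, order[:d_prime]]  # keep the top-d' eigenvectors as columns
    return Wpca, m
```

Projecting a sample is then `y = Wpca.T @ (x - m)`, in the spirit of equations (2.5) and (2.6).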
At this point, we already have a tool to obtain the best directions for representing data. However, the d-dimensional spaces that PCA works on are usually huge, and therefore the size of C or S is very large. For example, if the d-space is formed from images of size 100x150, then the size of C or S is 15000x15000. In situations where the number of training samples is small, or at least smaller than d, we can compute the eigenvectors more efficiently via a smaller covariance matrix, as shown in Statement 2.2 below.
Before the statement is given, we need some notation, defined as follows. Let S = {s_1, s_2, …, s_N} be a set of N samples in d dimensions from which we want to find the transformation matrix W_{pca}; the mean of S is given by

m = \frac{1}{N} \sum_{i=1}^{N} s_i    (2.14)

Let X = {x_1, x_2, …, x_N} be the set of N samples obtained by subtracting m from each element in S, i.e., x_i = s_i - m, i = 1…N. Arranging these samples as the columns of a d-by-N matrix, also denoted X, the covariance matrix of S can be written as

C = \frac{1}{N} X X^t    (2.15)
Let \hat{C} be the matrix defined in equation (2.16); when the number of samples N is smaller than the number of dimensions d, the size of \hat{C} is also smaller than the size of C:

\hat{C} = \frac{1}{N} X^t X    (2.16)

Statement 2.2: Given C and \hat{C} defined in (2.15) and (2.16) above, the eigenvectors of C can be computed via the eigenvectors of \hat{C}.
Proof:
Let v_i and μ_i be an eigenvector and eigenvalue of \hat{C}, that is,

\hat{C} v_i = μ_i v_i    (2.17)

Multiplying both sides on the left by X gives

C (X v_i) = μ_i (X v_i)    (2.18)

Equation (2.18) means that X v_i and μ_i are, respectively, an eigenvector and eigenvalue of C. Therefore, we can conclude that the eigenvectors e_i of C can be computed from the eigenvectors v_i of \hat{C} by the transformation e_i = X v_i; meanwhile, the eigenvalues of the two
matrices are the same
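Statement 2.2 can be checked numerically with a small sketch. The names and sizes below are illustrative; samples are assumed to be stored as columns of X:

```python
import numpy as np

rng = np.random.default_rng(1)
d, N = 50, 8                        # many dimensions, few samples (N < d)
S = rng.normal(size=(d, N))         # raw samples as columns
m = S.mean(axis=1, keepdims=True)
X = S - m                           # x_i = s_i - m, as columns of X

C     = X @ X.T / N                 # (2.15): d x d, large
C_hat = X.T @ X / N                 # (2.16): N x N, small

mu, V = np.linalg.eigh(C_hat)       # eigenpairs of the small matrix, ascending
E = X @ V                           # e_i = X v_i maps them to eigenvectors of C
E /= np.linalg.norm(E, axis=0)      # normalize to unit length

# Verify: C e_i = mu_i e_i for every recovered eigenvector.
# The near-zero eigenvalue is skipped: centering makes the columns of X sum
# to zero, so one direction of C_hat is degenerate.
for i in range(N):
    if mu[i] > 1e-8:
        assert np.allclose(C @ E[:, i], mu[i] * E[:, i], atol=1e-8)
```

The eigendecomposition now costs work on an N-by-N matrix instead of a d-by-d one, which is the point of the statement.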
2.3.2 Fisher Discriminant Analysis
PCA is a very useful tool for capturing the variance of data samples, and therefore it can be used to obtain a representation of the data. In recognition, however, finding the most discriminative features is very important. Fortunately, one solution to this challenge can be found in [49].
Basically, finding discriminative features by Fisher's method in [49] can be seen as a mapping from d'-space to d''-space, as shown in equation (2.19), by the transformation matrix

W_{fld} = [w_1, w_2, \ldots, w_{d''}]    (2.19)

The result Z of the mapping for a set of data points Y can be computed as the product in equation (2.20) below:

Z = W_{fld}^t Y    (2.20)
One intuitive way to achieve the target of "best for discriminating" is to find W_{fld} such that it maximizes the difference between the class means of the projected data samples. However, Figure 2.1 demonstrates that the magnitude of the direction vector w strongly affects the distance between the projected means, yet it is not as important as the direction of w. Therefore, to obtain good separation, the distance between the class means should be large relative to some measure of the variance of each class. The criterion function can be written as in equation (2.21):

J = \frac{(\text{distance between the projected means})^2}{\text{variance for each class}}    (2.21)
Figure 2.1 Directions of Projections versus Scale Factors. (a)-(c) are Projections onto the Same Direction with Different Scale Factors; (d) is the Projection onto the Direction Discovered by Fisher Mapping with Scale Factor Equal to 1
As a simple case, we consider the problem of finding W_{fld} for data of two classes, ω_1 and ω_2, and we want to project the data from d' dimensions onto a straight line. Of course, even if the samples form well-separated, compact clusters in d'-space, projection onto an arbitrary line will usually produce a confused mixture of samples from all of the classes and thus poor recognition performance. However, by moving the line around, we might be able to find an orientation for which the projected samples are well separated. This is exactly the goal of classical discriminant analysis.
Suppose that Y contains N d'-dimensional samples, Y = {y_1, y_2, …, y_N}, of which N_1 samples are in the subset Y_1 and the remaining N_2 samples are in the subset Y_2, with N = N_1 + N_2. Similarly, we denote Z = {z_1, z_2, …, z_N}; Z_1 and Z_2 are the subsets of Z containing the projected samples of Y_1 and Y_2, respectively. Let w be the unit vector indicating the direction of projection that we want to find. To obtain the projected sample z ∈ Z for a sample y ∈ Y, we compute the dot product

z = w^t y

Let m_i = \frac{1}{N_i} \sum_{y \in Y_i} y be the mean of class i. Let \tilde{m}_1 and \tilde{m}_2 be the means of the two classes in the projected space; they are related to m_1 and m_2 as shown in equation (2.22):

\tilde{m}_i = w^t m_i    (2.22)
On the other hand, rather than forming the projected sample variance, we define the scatter for the projected samples as

\tilde{s}_i^2 = \sum_{z \in Z_i} (z - \tilde{m}_i)^2    (2.23)

Thus \frac{1}{N}(\tilde{s}_1^2 + \tilde{s}_2^2) is an estimate of the variance of the pooled projected samples, and (\tilde{s}_1^2 + \tilde{s}_2^2) is called the total within-class scatter of the projected samples.
From the derivations in equations (2.22) and (2.23), we can rewrite the criterion function of equation (2.21) as

J(w) = \frac{|\tilde{m}_1 - \tilde{m}_2|^2}{\tilde{s}_1^2 + \tilde{s}_2^2}
With some further rearrangement, we can rewrite J(w) in another form that generalizes better to the case of more than two classes. Let S_i and S_W be the scatter matrices defined by

S_i = \sum_{y \in Y_i} (y - m_i)(y - m_i)^t, \qquad S_W = S_1 + S_2    (2.24)

and let

S_B = (m_1 - m_2)(m_1 - m_2)^t    (2.25)

We call S_W the within-class scatter matrix. It is proportional to the sample covariance matrix; moreover, it is symmetric and positive semidefinite, and it is usually nonsingular if N > d', where d' is the number of dimensions of the car space. Likewise, S_B is called the between-class scatter matrix
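For the two-class case, the scatter matrices and the resulting direction can be sketched on toy data as follows. The sketch uses the well-known closed form w ∝ S_W^{-1}(m_1 - m_2) for the maximizing direction; the class locations and scales are made-up illustrations:

```python
import numpy as np

rng = np.random.default_rng(2)
Y1 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(40, 2))  # class 1 samples
Y2 = rng.normal(loc=[3.0, 1.0], scale=0.5, size=(60, 2))  # class 2 samples

m1, m2 = Y1.mean(axis=0), Y2.mean(axis=0)

# (2.24): within-class scatter S_W = S_1 + S_2
S1 = (Y1 - m1).T @ (Y1 - m1)
S2 = (Y2 - m2).T @ (Y2 - m2)
Sw = S1 + S2

# (2.25): between-class scatter S_B = (m1 - m2)(m1 - m2)^t
diff = (m1 - m2).reshape(-1, 1)
Sb = diff @ diff.T

# Closed-form maximizing direction: w proportional to Sw^{-1} (m1 - m2)
w = np.linalg.solve(Sw, m1 - m2)
w /= np.linalg.norm(w)

# Projected means should be well separated relative to the projected scatter
z1, z2 = Y1 @ w, Y2 @ w
J = (z1.mean() - z2.mean()) ** 2 / (len(z1) * z1.var() + len(z2) * z2.var())
```

Here `J` is the criterion of equation (2.21) evaluated at the found direction; any other unit vector would give a value no larger.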
From the definitions in equations (2.24) and (2.25), J(w) can be written as

J(w) = \frac{w^t S_B w}{w^t S_W w}

A vector w that maximizes J must satisfy

S_B w = λ S_W w    (2.26)

Equation (2.26) shows that, if S_W is a nonsingular matrix, the direction that best discriminates the data of the two classes can be computed as the eigenvector having the largest eigenvalue of the matrix S_W^{-1} S_B.
For the C-class problem, the natural generalization of Fisher's linear discriminant involves C-1 discriminant functions. Thus, the projection is from a d-dimensional space to a (C-1)-dimensional space, and it is tacitly assumed that d ≥ C. The generalizations of the within-class and between-class scatter matrices are defined in equations (2.27) and (2.28):

S_W = \sum_{i=1}^{C} \sum_{y \in Y_i} (y - m_i)(y - m_i)^t    (2.27)

S_B = \sum_{i=1}^{C} N_i (m_i - m)(m_i - m)^t    (2.28)

where m is the mean of all samples and N_i is the number of samples in class i. The vector w is now generalized to W_{fld}, whose columns are the directions that we want to seek; thus its size is d-by-(C-1). The criterion function J(w) is rewritten as
J(W_{fld}) = \frac{|W_{fld}^t S_B W_{fld}|}{|W_{fld}^t S_W W_{fld}|}

where |·| denotes the determinant.
Each column w_i of W_{fld} can be obtained by solving the conventional eigenvalue problem shown in equation (2.26). However, this is actually undesirable, since it requires an unnecessary computation of the inverse of S_W and also raises the singularity problem of that matrix. Instead, one can find the eigenvalues λ_i as the roots of the characteristic polynomial

|S_B - λ S_W| = 0

and then solve for the corresponding w_i in the equation below:

(S_B - λ_i S_W) w_i = 0
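A numerical sketch of the C-class case follows; the data are illustrative, and a robust implementation would use a generalized symmetric eigensolver rather than the plain `eig` call shown here:

```python
import numpy as np

rng = np.random.default_rng(3)
C = 3                                            # number of classes
centers = np.array([[0, 0, 0, 0], [4, 0, 1, 0], [0, 4, 0, 1]], dtype=float)
classes = [rng.normal(loc=c, scale=0.4, size=(30, 4)) for c in centers]

m = np.vstack(classes).mean(axis=0)              # overall mean

# (2.27)/(2.28): generalized within- and between-class scatter matrices
Sw = sum((Y - Y.mean(0)).T @ (Y - Y.mean(0)) for Y in classes)
Sb = sum(len(Y) * np.outer(Y.mean(0) - m, Y.mean(0) - m) for Y in classes)

# Solve Sw^{-1} Sb w = lambda w via a linear solve (avoids forming the inverse)
eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
order = np.argsort(eigvals.real)[::-1]
W_fld = eigvecs[:, order[:C - 1]].real           # C-1 discriminant directions

# Each kept column satisfies (Sb - lambda_i Sw) w_i = 0
for i in range(C - 1):
    lam = eigvals.real[order[i]]
    w_i = W_fld[:, i]
    assert np.allclose(Sb @ w_i, lam * (Sw @ w_i), atol=1e-6)
```

Only C-1 eigenvalues are nonzero, because S_B in (2.28) is a sum of C rank-one terms constrained by the overall mean, which matches the C-1 discriminant functions mentioned above.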
Chapter 3
System Architecture and Data Collection
3.1 System Architecture
The developed system can run in either the training or the recognizing phase. Its structure can be seen in Figure 3.1. The term "training" refers to the tasks of learning reference color regions from samples and training the classifiers used in the recognizing phase. The recognizing phase, in turn, decides which make and model of car appears in incoming images.
Before entering these two phases, reference images in the dataset, color samples, and incoming images are transformed into an appropriate color space on which the algorithm for learning reference colors works. In this thesis, the CIE L*u*v* color space is used due to its perceptual uniformity; the transformation from the RGB color space can be found in [50].
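As a concrete sketch, the standard sRGB (D65) to CIE L*u*v* conversion can be written as below. The exact variant in [50] may differ (for instance in the assumed white point or gamma handling), so the constants here are assumptions, not the thesis's values:

```python
import numpy as np

# sRGB (D65) -> XYZ matrix and reference white
M = np.array([[0.4124, 0.3576, 0.1805],
              [0.2126, 0.7152, 0.0722],
              [0.0193, 0.1192, 0.9505]])
Xn, Yn, Zn = M.sum(axis=1)             # white point = conversion of (1, 1, 1)
un = 4 * Xn / (Xn + 15 * Yn + 3 * Zn)  # u' of the white point
vn = 9 * Yn / (Xn + 15 * Yn + 3 * Zn)  # v' of the white point

def rgb_to_luv(r, g, b):
    """Convert an 8-bit sRGB triple to CIE L*u*v* (D65 white)."""
    srgb = np.array([r, g, b]) / 255.0
    # undo the sRGB gamma to get linear RGB
    lin = np.where(srgb <= 0.04045, srgb / 12.92, ((srgb + 0.055) / 1.055) ** 2.4)
    X, Y, Z = M @ lin
    t = Y / Yn
    L = 116 * t ** (1 / 3) - 16 if t > (6 / 29) ** 3 else (29 / 3) ** 3 * t
    denom = X + 15 * Y + 3 * Z
    if denom == 0:                     # pure black
        return 0.0, 0.0, 0.0
    u = 13 * L * (4 * X / denom - un)
    v = 13 * L * (9 * Y / denom - vn)
    return L, u, v
```

For white, `rgb_to_luv(255, 255, 255)` gives approximately (100, 0, 0), and saturated reds land at large positive u*, which is what makes tail lights stand out in this space.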
The mission of the reference color learning module is to define the boundary of the reference color regions in color space. This means that the colors in the red areas of tail lights are described precisely, so that they can be used by the car back-view segmentation module to generate candidates for tail light locations and, consequently, for the position of the car back-view.
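One simple way to realize such a learned color region, sketched here as an assumption rather than the thesis's exact model, is to fit a Gaussian to the L*u*v* tail-light samples and accept pixels within a Mahalanobis-distance threshold:

```python
import numpy as np

def learn_reference_color(samples):
    """Fit mean and inverse covariance to color samples (rows = L*u*v* triples)."""
    mu = samples.mean(axis=0)
    # small ridge keeps the covariance invertible for tight sample clusters
    cov = np.cov(samples, rowvar=False) + 1e-6 * np.eye(samples.shape[1])
    return mu, np.linalg.inv(cov)

def is_reference_color(pixel, mu, cov_inv, threshold=3.0):
    """Accept a pixel whose Mahalanobis distance to the learned region is small."""
    d = pixel - mu
    return float(d @ cov_inv @ d) <= threshold ** 2

# Hypothetical tail-light samples clustered around one red tone in L*u*v*
rng = np.random.default_rng(4)
samples = rng.normal(loc=[45.0, 150.0, 30.0], scale=2.0, size=(200, 3))
mu, cov_inv = learn_reference_color(samples)
```

The threshold of 3.0 (in standard deviations) is a placeholder; in practice it would be tuned on held-out tail-light and background pixels.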
After that, geometric properties of car back-views are verified at these suggested positions. Examples of such properties are the ratio between the width and the height of real car back-view images, the typical distance between the two tail lights, and symmetry. The candidates that best satisfy the verification criteria are considered car back-view images, and are therefore separated and normalized.
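The geometric checks can be sketched as a simple scoring function. The ranges used below (aspect ratio, light distance relative to width, symmetry tolerance) are illustrative guesses, not the thesis's calibrated values:

```python
def verify_back_view(width, height, left_light_x, right_light_x, image_center_x):
    """Score a candidate back-view region by simple geometric properties.

    Returns a score in [0, 3]; higher means a more plausible back-view.
    All thresholds are hypothetical placeholders.
    """
    score = 0
    aspect = width / height
    if 1.2 <= aspect <= 2.5:                           # typical back-view ratio
        score += 1
    light_distance = right_light_x - left_light_x
    if 0.6 * width <= light_distance <= 1.0 * width:   # lights near the car's edges
        score += 1
    midpoint = (left_light_x + right_light_x) / 2
    if abs(midpoint - image_center_x) <= 0.1 * width:  # symmetry about the center
        score += 1
    return score
```

A plausible candidate (wide region, lights near its edges, centered) scores 3, while a tall or asymmetric candidate scores 0 and can be discarded.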
In the training phase, all car back-view images extracted from the dataset are gathered before representative features for each car model are selected by the feature selection module. After that, the selected representative features are used
to train classifiers such as linear and quadratic discriminant functions.
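A minimal linear discriminant classifier in the spirit of the one described here can be sketched as follows. This is an illustration with a shared (pooled) covariance and equal priors; the thesis's classifiers may differ in detail:

```python
import numpy as np

class LinearDiscriminant:
    """Gaussian classifier with a pooled covariance, giving linear
    discriminant functions g_i(x) = w_i^t x + b_i."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        n, d = X.shape
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # pooled within-class covariance estimate, with a small ridge
        pooled = sum(
            (X[y == c] - X[y == c].mean(0)).T @ (X[y == c] - X[y == c].mean(0))
            for c in self.classes_
        ) / (n - len(self.classes_))
        inv = np.linalg.inv(pooled + 1e-6 * np.eye(d))
        self.w_ = self.means_ @ inv                         # one w_i per class
        self.b_ = -0.5 * np.einsum('ij,ij->i', self.w_, self.means_)
        return self

    def predict(self, X):
        scores = X @ self.w_.T + self.b_                    # g_i(x) for each class
        return self.classes_[np.argmax(scores, axis=1)]
```

A quadratic discriminant function differs only in keeping a separate covariance per class, which makes the decision boundaries quadratic instead of linear.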
In the recognizing phase, in order to obtain the characteristic features of cars, incoming images are processed through the car back-view separation and feature selection modules.
The extracted features are then tested against the trained classifiers mentioned above to decide which makes and models of car appear in the incoming images.
Figure 3.1 Proposed System Architecture
3.2 Dataset Collection
3.2.1 Conditions for Capturing Image
To evaluate the approaches proposed in this thesis, a set of car images has been recorded, referred to as the dataset. In order to capture the images, several assumptions have been made, explained as follows.
We assume that our recognition system makes decisions about the makes and models of cars that are standing still against a cluttered background. Therefore, motion blur can be avoided when building the system, and the dataset's images can be acquired by capturing cars standing on the road.
Compared to the front view and the side view, the car back-view probably exhibits the most significant characteristic features of car makes and models. From the back-view, not only indicative parts such as the make's logo and the license plate, but also the form of the tail lights, the spoilers, and the bumpers bear useful information. In particular, the tail lights and their surrounding areas can help the recognition system distinguish car makes and models rather accurately. For those reasons, the dataset is built from images captured from the back side of cars.
We also assume that the developed system is installed so that images can be captured in a near-field view; a typical camera setup can be seen in Figure 3.2, where the distance D from the camera to the back side of the cars is around 4 to 6 meters. In the horizontal direction, the camera position can vary within 1 meter around the vertical middle plane, whereas in the vertical direction the camera can be placed 1.5 to 2.5 meters above the ground. Moreover, car back sides can be slanted within ±3.5 degrees around the horizontal line, as shown in Figure 3.3.
As shown in the next chapter, the colors in the red areas of tail lights are used as reference elements from which car back-view images are separated and normalized. Therefore, there is another assumption: the developed system can be installed
in cluttered scenery where only the car's tail lights may be red. This seems a very tight constraint. Fortunately, we are still able to satisfy such a requirement in real applications; moreover, geometric properties of car back-views, such as symmetry and the ratios between inside components, can help relax that condition. For the experiments in this thesis, the dataset's images were captured so that the scenery surrounding the cars could contain several red areas, but these were small and unconnected to the tail lights; the cars themselves were also not red.
Figure 3.2 System Configuration
Figure 3.3 Slanted Angle of Car Back-Side
Effects such as rain, snow, and strong distortion are also not considered in the
developed dataset. A typical image can be seen in Figure 3.4 below or in Appendix A.
The dataset's images are normalized to a size of 400x300 (Width x Height) so that they can be processed in acceptable time. All images in the dataset are 24-bit color and are stored in JPEG format.
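Normalizing an image to 400x300 can be done with any imaging library; a dependency-free nearest-neighbor sketch is given below for illustration (it is not the tool actually used to build the dataset):

```python
import numpy as np

def resize_nearest(img, out_w=400, out_h=300):
    """Nearest-neighbor resize of an HxWx3 uint8 image to out_h x out_w."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return img[rows[:, None], cols]

# Example: shrink a synthetic 600x800 capture to the dataset size
img = np.zeros((600, 800, 3), dtype=np.uint8)
normalized = resize_nearest(img)           # shape (300, 400, 3)
```

In practice a library resampler with interpolation (bilinear or better) would be preferred, since nearest-neighbor sampling can alias the thin edges that the geometric verification relies on.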