A novel distribution-based feature for rapid object detection

Jifeng Shen a,*, Changyin Sun a, Wankou Yang a, Zhenyu Wang a, Zhongxi Sun a,b

a School of Automation, Southeast University, Nanjing 210096, China
b College of Science, Hohai University, Nanjing 210098, China
Article history:
Received 6 June 2010
Received in revised form 16 February 2011
Accepted 17 March 2011
Communicated by Tao Mei
Available online 23 May 2011
Keywords:
Object detection
LBP
Adaptive projection-MBLBP
Asymmetric Gentle Adaboost
Abstract
The discriminative power of a feature has an impact on the convergence rate in training and the running speed in evaluating an object detector. In this paper, a novel distribution-based discriminative feature is proposed to distinguish objects of rigid object categories from background. It fully exploits the advantage of the local binary pattern (LBP), which specializes in encoding local structures, together with statistical distribution information from the training data, which is used to obtain an optimal separating hyperplane. The proposed feature maintains the merit of simplicity in calculation and has powerful discriminative ability to distinguish objects from background patches. Three LBP-based features are extended to adaptive projection versions, which are more discriminative than the original ones. An asymmetric Gentle Adaboost organized in a nested cascade structure constructs the final detector. The proposed features are evaluated on two different object categories: frontal human faces and side-view cars. Experimental results demonstrate that the proposed features are more discriminative than traditional Haarlike features and multi-block LBP (MBLBP) features. Furthermore, they are also robust to monotonic variations of illumination.
1 Introduction
In the last few years, the problem of localizing specified objects in still images or video frames has received a lot of attention. It is widely used in biometric verification, video surveillance and automatic driver assistance systems [1,2], etc. Object detection in arbitrary scenes is rather challenging since objects vary greatly in appearance even within the same category. The main factors can be roughly divided into three aspects: photometric variation, viewpoint variation and intra-class variability. For example, faces can have different expressions and appear under different illumination. Among the many object detection methods, the appearance-based method using a boosting algorithm is the most popular one, which is attributed to Viola and Jones [3], who proposed the first real-time face detection system.
The object detection research topics can be divided into two categories. The first category is feature representation. The most famous feature is the Haar wavelet [3,4], which encodes the difference between joint areas located at specified positions. But it only reflects changes of intensity in the horizontal, vertical and diagonal directions. The extended Haarlike feature [5] is proposed to enrich the feature set, which decreases the false positives by about 10% at the same recall rate. After that, the disjoint Haarlike feature [6,7],
which breaks up the connected sub-regions, is proposed to deal with multi-view face detection. Huang et al. [8] propose a new granular feature, which is also applied to multi-view face detection and proves to be the current state of the art. Recently, it has also been applied to the human detection field [9] and achieves considerable results. Another important feature, which is worth mentioning, is the LBP-based feature. The LBP [10] is first used in face recognition and later extended to the multi-block LBP (MBLBP) [11] for face detection, which decreases the feature number in training a detector. A similar feature is proposed in [12], using a different coding rule to define features. More recently, more variants [13] have appeared; they are more robust to illumination variations. The covariance feature [14], first used in human detection [15], is also applied to face detection. This feature outperforms the Haarlike feature when the sample size is small but has high calculation complexity. The histogram of oriented gradients (HOG) [16] exploits the histogram of gradient orientation weighted by amplitude to model the human silhouette and has become the state-of-the-art algorithm in the human detection field. It derives from EOH [17] and SIFT [18], which are widely used in object detection and interest point detection, respectively. Others use color information [19,20] to aid object detection.
The other category is related to classifier construction. There are many kinds of classifiers, such as SVM [20,22,23], NN [24], SNoW [25], etc., in the earlier research, but most of them cannot be applied to real-time applications and have a high false positive rate. The first real-time object detector comes from the cascade structure [3], which derives from the coarse-to-fine strategy. Its stages are designed
empirically and use thousands of features to exclude a large amount of the negative samples. After that, more research works are concerned with how to improve detection speed and adaptively choose optimal parameters, such as the boosting chain [27], WaldBoost [28], soft cascade [29], dynamic cascade [30], multi-exit boosting [31], MSL [32], etc.
Many of the state-of-the-art object detection systems suffer from two main issues. First, it is rather time consuming to train a robust detector. Second, the extracted features are not discriminative enough, especially in the later cascade nodes, to exclude hard negative samples. To overcome these problems, we propose a new distribution-based feature, which possesses strong discriminative ability, is very efficient to train and can be applied to real-time detection. Partial work on this idea was published in [33], but was not evaluated as thoroughly.
The remainder of this paper is organized as follows. Section 2 presents an overview of LBP-based methods exploited in current object detection. Section 3 describes our proposed new features in detail. Section 4 demonstrates the hierarchy of our detector. Experimental results and performance analysis are presented in Section 5. Finally, conclusions are given in Section 6.
2 Related works
2.1 LBP-based features
Ojala and Pietikäinen [10] propose the local binary pattern (LBP), which is widely used in texture classification. It encodes the differences between the center pixel and its surrounding ones in a circular sequence manner. It characterizes the local spatial structure of the image as in Eq. (1):
$$f_{R,N} = \sum_{i=0}^{N-1} s(p_i - p_c)\,2^i, \qquad s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$
where p_i is one of the N neighbor pixels around the center pixel p_c, on a circle or square of radius R. LBP is favored as a feature descriptor because of its tolerance against illumination changes and its computational simplicity. It has been successfully applied to many computer vision and pattern recognition fields, such as face recognition [10,34], face detection [11], facial expression recognition [35,36], background subtraction [37], dynamic texture analysis [38,39], gender classification [40,41] and so on.
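For concreteness, the following minimal Python/NumPy sketch evaluates Eq. (1) for the 8 neighbours on a square ring of radius R around one pixel; the function name and the neighbour ordering are our own illustrative choices, not prescribed by the paper.

```python
import numpy as np

def lbp_code(img, r, c, R=1):
    """Minimal LBP of Eq. (1): compare the 8 neighbours on a square ring
    of radius R with the centre pixel and pack the signs into one byte."""
    pc = img[r, c]
    # clockwise neighbour offsets starting at the top-left corner (assumed ordering)
    offsets = [(-R, -R), (-R, 0), (-R, R), (0, R), (R, R), (R, 0), (R, -R), (0, -R)]
    code = 0
    for i, (dr, dc) in enumerate(offsets):
        if img[r + dr, c + dc] >= pc:       # s(p_i - p_c) = 1 when p_i >= p_c
            code |= (1 << i)                # weight the bit by 2^i
    return code

img = np.array([[52, 60, 61],
                [55, 57, 59],
                [50, 58, 63]], dtype=np.int32)
print(lbp_code(img, 1, 1))   # 8-bit code of the centre pixel
```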
The census transform (CT) is a non-parametric local transform that summarizes the local spatial structure. It defines an ordered set of comparisons of pixel intensities in a local area, indicating which pixels have a lesser intensity than the pixel in the center. The coding rule is defined as follows:

$$C(x) = \bigotimes_{y \in N(x)} \zeta(I(x), I(y)), \qquad \zeta(I(x), I(y)) = \begin{cases} 1, & I(y) < I(x) \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$

where ⊗ denotes the concatenation operation, I(x) is the intensity of pixel x, N(x) is the neighborhood of pixel x and ζ(x, y) is the comparison function.
Froba and Ernst [12] extend the original work and propose the modified census transform (MCT), which is applied to face detection. It differs from the census transform by taking the center pixel intensity and the average pixel intensity into consideration, which generates a longer binary string. The formula for encoding the MCT is as follows:

$$\Gamma(x) = \bigotimes_{y \in N'(x)} \zeta(\bar{I}(x), I(y)) \qquad (3)$$

where N'(x) is the neighborhood of x including the pixel x itself and Ī(x) is the average intensity over N'(x).
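The following sketch illustrates the CT and MCT coding rules of Eqs. (2) and (3) on a single 3×3 patch; it is only a minimal reading of the definitions above, with the function names and the row-major bit ordering assumed for illustration.

```python
import numpy as np

def census_transform(patch3x3):
    """CT of Eq. (2): concatenate, over the 8 neighbours, the bits
    indicating whether each neighbour is darker than the centre pixel."""
    centre = patch3x3[1, 1]
    bits = []
    for r in range(3):
        for c in range(3):
            if (r, c) == (1, 1):
                continue
            bits.append(1 if patch3x3[r, c] < centre else 0)
    return bits                      # 8-bit string

def modified_census_transform(patch3x3):
    """MCT of Eq. (3): compare all 9 pixels against the mean of the
    3x3 neighbourhood N'(x), yielding a 9-bit string."""
    mean = patch3x3.mean()
    return [1 if v > mean else 0 for v in patch3x3.ravel()]

patch = np.array([[10, 20, 30],
                  [40, 25, 15],
                  [35, 45,  5]], dtype=np.float64)
print(census_transform(patch))           # -> [1, 1, 0, 0, 1, 0, 0, 1]
print(modified_census_transform(patch))  # 9 bits against the patch mean
```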
Recently, Heikkilä et al. [43] propose a center-symmetric LBP (CSLBP) descriptor for interest region description, which combines the good properties of SIFT and LBP. It also uses only half of the coding length of the traditional LBP descriptor. CSLBP is defined as follows:
$$f_{R,N,T} = \sum_{i=0}^{N/2-1} s(p_i - p_{i+N/2})\,2^i, \qquad s(x) = \begin{cases} 1, & x > T \\ 0, & \text{otherwise} \end{cases} \qquad (4)$$
where p_i and p_{i+N/2} correspond to the gray values of center-symmetric pairs of pixels among the N equally spaced pixels on a circle of radius R.
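As a small illustration of Eq. (4), the sketch below computes the 4-bit CSLBP code from the four centre-symmetric pixel pairs of an 8-pixel ring; the ring ordering and the function name are our own assumptions.

```python
import numpy as np

def cslbp_code(img, r, c, R=1, T=0.0):
    """CSLBP of Eq. (4): compare the N/2 centre-symmetric pixel pairs
    and pack the results into an N/2-bit code (here N = 8, so 4 bits)."""
    # assumed ordering of the 8 ring pixels; pair i is (p_i, p_{i+4})
    ring = [(-R, -R), (-R, 0), (-R, R), (0, R), (R, R), (R, 0), (R, -R), (0, -R)]
    code = 0
    for i in range(4):
        dr1, dc1 = ring[i]
        dr2, dc2 = ring[i + 4]
        if img[r + dr1, c + dc1] - img[r + dr2, c + dc2] > T:
            code |= (1 << i)
    return code

img = np.array([[52, 60, 61],
                [55, 57, 59],
                [50, 58, 63]], dtype=np.float64)
print(cslbp_code(img, 1, 1, T=1.0))   # 4-bit centre-symmetric code
```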
The main difference between the aforementioned non-parametric local binary encoding features lies in their coding sequences. All of these features use a binary string to represent the local feature instead of using pixel intensities directly.
More recently, Zhang et al. [11] propose the MBLBP feature, which extends the coding from single pixels to blocks and is successfully used in face detection. An MBLBP feature is composed of 3×3 cells and each cell contains a set of pixels. The feature is an 8-bit encoding string in a circular manner from number 1 to 8 (see Fig. 1(a)), where each bit reflects whether the average intensity in a cell is larger or smaller than that of the center one. Its multi-scale version is shown in Fig. 1(b) and (c). MBLBP is less sensitive to intensity variations in a local area than the original LBP. The MBLBP feature used in the face detector uses only 10% of the number of Haarlike features and is more discriminative than Haarlike features at the same time. Yan et al. [44] propose a similar feature called LAB and utilize a feature-centric method to improve the efficiency of calculation.
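MBLBP works on cell averages rather than single pixels, and those averages are typically obtained in constant time from an integral image. The sketch below is our own minimal illustration (the paper gives no code); the helper names, the clockwise cell numbering and the integral-image shortcut are assumptions.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero first row/column for easy lookups."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def cell_sum(ii, top, left, h, w):
    """Sum of the h x w rectangle whose top-left corner is (top, left)."""
    return ii[top + h, left + w] - ii[top, left + w] - ii[top + h, left] + ii[top, left]

def mblbp_code(ii, top, left, ch, cw):
    """MBLBP: compare the 8 surrounding cell means with the centre cell mean
    (all cells are ch x cw) and pack the comparisons into an 8-bit code."""
    means = [[cell_sum(ii, top + i * ch, left + j * cw, ch, cw) / (ch * cw)
              for j in range(3)] for i in range(3)]
    centre = means[1][1]
    # clockwise cell order starting at the top-left cell (assumed numbering)
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for k, (i, j) in enumerate(order):
        if means[i][j] >= centre:
            code |= (1 << k)
    return code

img = np.random.rand(24, 24)
ii = integral_image(img)
print(mblbp_code(ii, top=3, left=3, ch=2, cw=2))
```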
2.2 Distribution-based features
Pavani et al. [45] propose optimally weighted rectangles for face and heart detection; they integrate the distribution information of positive and negative training samples into Haarlike features in order to find an optimal separating hyperplane, which maximizes the margin between the positive class and the negative class. It can be formulated as follows:
$$w_{opt} = \arg\max_{w} \frac{w^{T} S_b\, w}{w^{T} S_w\, w}, \qquad S_b = (m^{+} - m^{-})(m^{+} - m^{-})^{T}, \qquad S_w = S^{+} + S^{-}$$

$$S^{+} = \sum_{\forall i,\; y_i = 1} (m_i - m^{+})(m_i - m^{+})^{T}, \qquad S^{-} = \sum_{\forall i,\; y_i = -1} (m_i - m^{-})(m_i - m^{-})^{T} \qquad (5)$$
where w_opt is the optimal projection that maximizes the between-class distance and minimizes the within-class distance, m+ and m− are the mean feature vectors of the positive and negative samples, respectively, and S_b and S_w are the between-class and within-class scatter matrices, respectively.
Fig. 2(b) shows the two-dimensional distribution of positive and negative samples with the default projection direction and the optimal projection direction. This kind of feature fully makes use of the distribution information of the training samples in order to improve the discriminative power of the classifiers and decrease the evaluation time needed to exclude negative patches. The Haarlike feature
comprising k connected areas can be formulated as follows:

$$f = \sum_{i=1}^{k} w_i u_i \qquad (6)$$
The default Haarlike feature, which is shown in Fig. 2(a), can be represented as [1, −1][u_1, u_2]^T, where u_1 and u_2 are the average intensities in the white and black areas, respectively. The optimally weighted feature has a projection direction different from the vector (1, −1), which is the key improvement in performance.
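A minimal sketch of how the optimal projection of Eq. (5) can be computed for a two-rectangle feature, using the closed form w = S_w^{-1}(m+ − m−) stated later in Section 3.4; the toy data and the function names are ours, purely for illustration.

```python
import numpy as np

def scatter(feats):
    """Class scatter: sum_i (x_i - mean)(x_i - mean)^T, as in Eq. (5)."""
    d = feats - feats.mean(axis=0)
    return d.T @ d

def fisher_weights(pos_feats, neg_feats):
    """Optimal projection of Eq. (5) in closed form, w = S_w^{-1}(m+ - m-).
    pos_feats, neg_feats: (n_samples, K) arrays of rectangle mean intensities."""
    s_w = scatter(pos_feats) + scatter(neg_feats)
    return np.linalg.solve(s_w, pos_feats.mean(axis=0) - neg_feats.mean(axis=0))

rng = np.random.default_rng(0)
# toy 2-rectangle feature vectors [u1, u2] for positives and negatives
pos = rng.normal(loc=[0.6, 0.3], scale=0.05, size=(500, 2))
neg = rng.normal(loc=[0.4, 0.4], scale=0.15, size=(500, 2))
w = fisher_weights(pos, neg)
print(w / np.abs(w).max())    # adapted projection, generally not the default (1, -1)
```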
2.3 Asymmetric Gentle Adaboost
The asymmetric characteristic inherently exists in many object detection fields, where positive targets need to be distinguished from an enormous number of background patterns. Although symmetric Adaboost can achieve good performance in object detection, ignoring the distribution imbalance between positive and negative samples makes the boosting algorithm converge much more slowly than asymmetric ones. Asymmetric boosting [46,47] utilizes a cost function to penalize false negatives more than false positives, so as to get a better separating hyperplane. Recently, Gentle Adaboost has been extended to an asymmetric version, which further improves the performance in classification problems [48]. The goal function of asymmetric Gentle Adaboost, which uses an asymmetric additive logistic regression model, is formulated as follows:
$$J_{asym}(F) = E\left[ I(y=1)\, e^{-y C_1 F(x)} + I(y=-1)\, e^{-y C_2 F(x)} \right] \qquad (7)$$
where C_1 and C_2 are the cost factors of false negatives and false positives, respectively, and F(x) is a signed real confidence value. Using the Newton update technique, the optimal weak classifier can be worked out as follows:

$$f_{opt}(x) = C_1 P_w(y=1\,|\,x) - C_2 P_w(y=-1\,|\,x) \qquad (8)$$

where P_w(y = 1 | x) and P_w(y = −1 | x) are the weighted conditional probabilities of the positive and negative classes, respectively. The asymmetric Gentle Adaboost algorithm used in this paper is shown in Fig. 3.
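Since Fig. 3 is not reproduced here, the sketch below shows only our reading of Eqs. (7) and (8): one round of asymmetric Gentle Adaboost with a lookup-table weak classifier over a discrete feature code (256 bins, matching the weak classifier mentioned in Section 5.1). The helper names and the exact re-weighting form are assumptions consistent with the loss in Eq. (7), not the authors' algorithm verbatim.

```python
import numpy as np

def fit_lut_weak(codes, labels, weights, c1=1.0, c2=0.25, n_bins=256):
    """One weak learner per Eq. (8): a lookup table over the feature code with
    confidence f(bin) = C1*Pw(y=+1|bin) - C2*Pw(y=-1|bin)."""
    lut = np.zeros(n_bins)
    for b in range(n_bins):
        idx = (codes == b)
        wp = weights[idx & (labels == 1)].sum()
        wn = weights[idx & (labels == -1)].sum()
        if wp + wn > 0:
            lut[b] = (c1 * wp - c2 * wn) / (wp + wn)
    return lut

def boost_round(codes, labels, weights, c1=1.0, c2=0.25):
    """Fit one weak classifier and re-weight the samples asymmetrically,
    w_i <- w_i * exp(-C_{y_i} * y_i * f(x_i)), consistent with Eq. (7)."""
    lut = fit_lut_weak(codes, labels, weights, c1, c2)
    f = lut[codes]
    cost = np.where(labels == 1, c1, c2)
    weights = weights * np.exp(-cost * labels * f)
    return lut, weights / weights.sum()

rng = np.random.default_rng(1)
codes = rng.integers(0, 256, size=1000)          # stand-in LBP-style feature codes
labels = np.where(rng.random(1000) < 0.5, 1, -1)
weights = np.full(1000, 1.0 / 1000)
lut, weights = boost_round(codes, labels, weights)
```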
3 Adaptive projection of block-based local binary features

Inspired by the effectiveness of the optimal weights embedded in Haarlike features, we combine block-based LBP with an adaptive projection strategy, which merges the distribution information of the training samples into the descriptors to enhance their discriminative ability. The experimental results indicate that the adaptive projection (AP)-based MBLBP is more discriminative than the weighted Haarlike and MBLBP features, inheriting the advantages of both features at the same time. In addition, we also introduce block-based versions of the modified census transform and CSLBP and their AP versions. AP-MBLBP, AP-MBMCT and AP-MBCSLBP will be explained in detail in Sections 3.1, 3.2 and 3.3, respectively.
3.1 AP-MBLBP
The idea of AP-MBLBP [33] is to get the optimal feature in each direction so as to maximize the margin between positive and negative feature vectors. In order to combine these feature ensembles, we encode the rectangular regions of every direction in a manner similar to LBP. The AP-MBLBP operator is defined as follows:

$$f_{apmblbp} = \sum_{i=0}^{N-1} g(p_i, p_c, w_i, w_c)\,2^i \qquad (9)$$

where p_c is the average intensity of the center area, which is the gray area shown in Fig. 4(a); p_i is the average intensity of the ith direction around p_c; w_i is the weight of p_i; and g(p_i, p_c, w_i, w_c) is a piecewise function reflecting the relationship between p_i and p_c, which is defined as follows:

$$g(p_i, p_c, w_i, w_c) = \begin{cases} 1, & w_i p_i > w_c p_c \\ 0, & w_i p_i \le w_c p_c \end{cases} \qquad (10)$$
The weight of each cell is obtained by Eq. (5), which maximizes the between-class variation and minimizes the within-class variation. Fig. 4 shows the details of the process used to generate AP-MBLBP features. First, a block composed of 3×3 cells is shown in Fig. 4(a). Then the block is separated into 8 center-adjacent features (CAFs), as shown in Fig. 4(b). Next, the optimal weight of each cell is calculated by Eq. (5). Finally, the encoding of all the cells is determined by Eq. (10) and concatenated into an 8-bit operator, which is similar to LBP.

Fig. 2. Distributions between positive and negative samples.
Fig. 3. Asymmetric Gentle Adaboost.
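A minimal sketch of Eqs. (9) and (10): given the nine cell means of a block and the learned weights, the AP-MBLBP code is obtained by eight weighted comparisons against the centre cell. We assume a single centre weight w_c and an arbitrary toy weight vector purely for illustration.

```python
import numpy as np

def ap_mblbp_code(cell_means, weights):
    """AP-MBLBP of Eqs. (9)-(10): cell_means and weights are length-9 arrays
    ordered as [8 surrounding cells ..., centre cell]; each bit is
    g(p_i, p_c, w_i, w_c) = 1 iff w_i*p_i > w_c*p_c."""
    p_c, w_c = cell_means[8], weights[8]
    code = 0
    for i in range(8):
        if weights[i] * cell_means[i] > w_c * p_c:
            code |= (1 << i)
    return code

# toy example: weights of the kind that would be learned per cell via Eq. (5)
cells = np.array([0.42, 0.55, 0.61, 0.40, 0.38, 0.47, 0.52, 0.44, 0.50])
w = np.array([-0.6, 0.3, -0.6, -0.2, 0.2, -0.1, 0.2, -0.1, 1.0])
print(ap_mblbp_code(cells, w))   # 8-bit adaptive-projection code
```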
As can be seen in Fig. 4(b), the eight two-adjacent cells reflect the variations in eight directions between the center cell and the adjacent ones, which encode the local intensity variation information of different directions. Instead of utilizing only one optimal direction as Haarlike features do, the other seven less discriminative features, which would otherwise be neglected, can also contribute local variation information to enhance the total discriminative power of AP-MBLBP features.
The key step in Fig. 4(c) fully makes use of the distribution of the samples to maximize the margin between positive and negative samples for each CAF in order to promote the global discriminative power of AP-MBLBP. It is worth mentioning that AP-MBLBP is different from MBLBP, since the former has different weights for the surrounding rectangular regions. The weights of the former encode the discriminative information between positive and negative samples. AP-MBLBP can be considered a generalized MBLBP. If we denote the weight vector w = (w_1, w_2, ..., w_8), then when w = (1, 1, ..., 1), AP-MBLBP degenerates to MBLBP. AP-MBLBP has all the advantages of MBLBP; furthermore, it is more discriminative than MBLBP.
In order to find the intrinsic reason for its strong discriminative ability, we visualize the first three AP-MBLBP features from our trained detector. From Fig. 5(b), we can see that the first feature is symmetric along the center of the face, and it covers the most important regions in the face area such as the eyes and nose. It encodes the symmetric context information of the human face, which inherently exists in nature. The features in Fig. 5(c) and (d) encode the geometric information of the eyes, nose, mouth and face silhouette, which is very discriminative for excluding negative patches. More discussion will be given in Section 5.2.
3.2 MBMCT and AP-MBMCT
We propose a block-based version of the modified census transform (MBMCT) and its AP version (AP-MBMCT), which are similar to MBLBP and AP-MBLBP. MBMCT encodes not only the cells surrounding the center cell but also uses the average intensity of all cells in the block, so the total length of the encoded bit string is longer than that of MBLBP. The coding method is defined as follows:
$$f_{mbmct} = \sum_{i=0}^{N} g(p_i, \bar{p}, w_i)\,2^i, \qquad g(p_i, \bar{p}, w_i) = \begin{cases} 1, & w_i p_i > \bar{p} \\ 0, & w_i p_i \le \bar{p} \end{cases}, \qquad \bar{p} = \frac{1}{N}\sum_{i=0}^{N-1} p_i \qquad (11)$$
The process of generating AP-MBMCT is shown in Fig. 6. It is similar to generating AP-MBLBP, except that the final bit string is longer and the average block intensity is used instead of the center block.
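A corresponding sketch of Eq. (11); here we take the reference value p̄ as the mean of the cell means of the block (the exact normalisation in Eq. (11) may differ), and with unit weights the code reduces to plain MBMCT. Function names are our own.

```python
import numpy as np

def ap_mbmct_code(cell_means, weights):
    """AP-MBMCT of Eq. (11): all nine cell means are compared, after weighting,
    against the mean cell intensity p_bar, giving a 9-bit code instead of the
    8-bit MBLBP code."""
    p_bar = cell_means.mean()
    code = 0
    for i, (p, w) in enumerate(zip(cell_means, weights)):
        if w * p > p_bar:
            code |= (1 << i)
    return code

cells = np.array([0.42, 0.55, 0.61, 0.40, 0.38, 0.47, 0.52, 0.44, 0.50])
w = np.ones(9)          # with unit weights this reduces to plain MBMCT
print(ap_mbmct_code(cells, w))
```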
3.3 MBCSLBP and AP-MBCSLBP
We also propose a block-based version of CSLBP (MBCSLBP) and its AP version (AP-MBCSLBP), defined as follows:

$$f_{mbcslbp} = \sum_{i=0}^{N/2-1} g(p_i, p_{i+N/2}, w_i, w_{i+N/2})\,2^i, \qquad g(p_i, p_j, w_i, w_j) = \begin{cases} 1, & w_i p_i > w_j p_j \\ 0, & w_i p_i \le w_j p_j \end{cases} \qquad (12)$$
The process of generating AP-MBCSLBP is demonstrated in Fig. 7, where the length of the final bit string is half that of AP-MBLBP. The four pairs of sub-regions are centrally symmetric with respect to the center block, which is different from MBLBP and MBMCT.
Fig. 4. AP-MBLBP features.
Fig. 5. Selected features on the average face: (a) average face on the training data, (b) first selected feature, (c) second selected feature, (d) third selected feature.
Fig. 6. AP-MBMCT features.

3.4 Time complexity of generating adaptive projection features

The adaptive projection features utilize N training samples; each sample is of size d×d, and the dimension of the feature vector is K. The final optimal weight of a feature is w = S_w^{-1}(m+ − m−), where S_w is the K×K within-class scatter matrix, and m+ and m− are the K-dimensional
positive and negative average feature vectors, respectively. We use the LU decomposition algorithm to compute the inverse of the matrix, so the complexity is (2/3)K^3. The complexity of computing each feature is O(K^3 + K^2 + NK), where K equals 2 in this paper. So it is very efficient to calculate, though slightly slower than the original MBLBP features in the training phase.
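A rough sketch (ours, not the authors' code) of the per-feature cost described above: for every candidate feature the K-dimensional class means and the K×K scatter are accumulated over the N samples (O(NK)) and one K×K system is solved (O(K^3)); with K = 2 this is a constant-time 2×2 solve per feature. Here np.linalg.solve stands in for the explicit LU factorisation.

```python
import numpy as np

def adaptive_weights_for_all_features(pos, neg):
    """pos, neg: arrays of shape (n_features, n_samples, K) holding the K cell
    means of every candidate feature for every sample. Returns one K-dimensional
    adaptive weight vector per feature, w = S_w^{-1}(m+ - m-)."""
    weights = np.empty((pos.shape[0], pos.shape[2]))
    for f in range(pos.shape[0]):                     # O(NK) accumulation ...
        dp = pos[f] - pos[f].mean(axis=0)
        dn = neg[f] - neg[f].mean(axis=0)
        s_w = dp.T @ dp + dn.T @ dn                   # K x K within-class scatter
        diff = pos[f].mean(axis=0) - neg[f].mean(axis=0)
        weights[f] = np.linalg.solve(s_w, diff)       # ... plus one O(K^3) solve
    return weights

rng = np.random.default_rng(2)
pos = rng.normal(0.6, 0.1, size=(100, 400, 2))        # 100 features, 400 samples, K = 2
neg = rng.normal(0.5, 0.2, size=(100, 400, 2))
print(adaptive_weights_for_all_features(pos, neg).shape)   # (100, 2)
```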
4 Object detection architecture
The proposed feature is designed to detect multiple object instances at different scales and locations in an input image. The object detection architecture is shown in Fig. 8. The process of constructing an object detector comprises two phases: one is training based on a large amount of training data, and the other is testing on a multi-scale image pyramid. In the training phase, the adaptive projection feature set is calculated based on the original feature set and the training data; then a nested cascade is constructed using asymmetric Gentle Adaboost. It is worth mentioning that the adaptive projection feature set has to be recalculated whenever the training data change, for example after bootstrapping negative samples in each stage. The resulting object detector is shown in the bottom part of Fig. 8. In the testing phase, the trained object detector is evaluated on the test database. First, an image pyramid is constructed for every test image; second, a sliding window technique is utilized to check whether every possible window contains an object; third, a non-maximum suppression method is used to merge multiple detection windows around one object. This procedure is shown in the top part of Fig. 8.
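A schematic sketch of the test-time loop just described (image pyramid, sliding window, cascade evaluation, non-maximum suppression). All function names, step sizes and thresholds here are illustrative assumptions, not values from the paper, and the pyramid is built by naive integer sub-sampling.

```python
import numpy as np

def non_max_suppression(dets, min_dist):
    """Greedy merge: keep a detection unless its corner lies close to one already kept."""
    kept = []
    for top, left, size in sorted(dets, key=lambda d: -d[2]):
        if all(abs(top - t) > min_dist or abs(left - l) > min_dist for t, l, _ in kept):
            kept.append((top, left, size))
    return kept

def detect(image, cascade, win=24, shift=2, min_dist=12):
    """Schematic test-time loop: build a coarse image pyramid by sub-sampling,
    scan each level with a sliding window, keep windows that pass every cascade
    stage, and merge nearby hits."""
    detections = []
    factor = 1
    while min(image.shape) // factor >= win:
        level = image[::factor, ::factor]                 # one pyramid level
        for top in range(0, level.shape[0] - win + 1, shift):
            for left in range(0, level.shape[1] - win + 1, shift):
                patch = level[top:top + win, left:left + win]
                if all(stage(patch) for stage in cascade):     # nested cascade stages
                    detections.append((top * factor, left * factor, win * factor))
        factor += 1
    return non_max_suppression(detections, min_dist)

# toy usage with a trivial one-stage "cascade"
image = np.random.rand(100, 100)
cascade = [lambda patch: patch.mean() > 0.55]
print(len(detect(image, cascade)))
```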
5 Experimental results and analysis
We evaluate our proposed features on three well-known public datasets: the MIT+CMU frontal face dataset, the UIUC side-view car dataset and the PASCAL VOC dataset, all of which are available on the Internet.
5.1 Experimental setup
In the face detection experiment, the training set consists of approximately 10 000 faces, which cover out-of-plane rotation and in-plane rotation in the range of [−20°, 20°]. Through mirroring and random shifting operations, the size of the positive sample set is enlarged to 40 000. In addition, more than 20 000 large images that do not contain any faces are used for bootstrapping. All samples are normalized to 24×24 pixels. We use N = 8 in Eqs. (9), (11) and (12). The number of MBLBP features generated in the face detection experiment is 8464. The size of the MBLBP varies from 3 to 24 pixels in width and height, respectively. The cost factors are set to C_1 = 1 and C_2 = 0.25 in asymmetric Gentle Adaboost. A 256-bin histogram is used as the weak classifier for the AP-MBLBP feature, and corresponding histogram weak classifiers are used for the AP-MBMCT and AP-MBCSLBP features. The minimum detection rate and maximum false positive rate are set to 0.9995 and 0.4, respectively.
In the car detection experiment, we take the training data from the UIUC side-view car dataset, whose training set contains 550 car images and 500 negative images. The original training samples have a resolution of 100×40; we manually crop and resize them to 64×32 in order to reduce the negative effect of the background. The number of MBLBP features generated in the car detection experiment is 30 513. The size of the MBLBP varies from 3 to 63 pixels in width and from 3 to 30 pixels in height. The minimum detection rate and maximum false positive rate are set to 0.995 and 0.4, respectively. The other parameter settings are the same as those mentioned above.
Fig. 7. AP-MBCSLBP features.
Fig. 8. Object detection architecture.

5.2 Features comparison

In order to evaluate the effectiveness of our proposed features, we conduct a comparison among Haar [3], Flda+Haar [45], LBP, MBLBP, MBMCT, MBCSLBP, AP-MBLBP, AP-MBMCT and AP-MBCSLBP features on the face dataset. The result is shown
in Fig. 9. In this experiment, we randomly choose 20 000 face samples and 20 000 negative samples for training, and the validation set uses the remaining 20 000 faces for cross-validation.
Fig. 9. Feature comparison: (a) false negative rate vs. number of weak classifiers, (b) false positive rate vs. number of weak classifiers, (c) error rate vs. number of weak classifiers.

Eight single-node detectors, each containing 50 weak classifiers, are trained and evaluated on the validation set. From Fig. 9, we conclude that AP-MBMCT, AP-MBLBP and AP-MBCSLBP are more discriminative than MBMCT, MBLBP and MBCSLBP, respectively. AP-MBMCT and AP-MBLBP have the lowest false negative rate and false positive rate for the same number of weak classifiers, where AP-MBMCT is slightly better than AP-MBLBP in the first 30 rounds and converges to the same level after that. MBCSLBP is less discriminative than LBP, MBMCT and MBLBP, but better than the Haar and Flda+Haar features (by about 10%). From Fig. 9(a), we can see that Haarlike features cannot distinguish faces from difficult negative samples, but LBP-based features are able to discriminate them effectively. This is due to the advantage of LBP, which possesses good discriminative power for classifying weakly textured objects.
In Fig. 9(b), we can see that both the Haarlike features and the LBP-based features can exclude negative patches effectively (within about 2% of each other). There are slight differences among AP-MBLBP, AP-MBMCT and AP-MBCSLBP in excluding negative samples, and AP-MBLBP and AP-MBMCT perform better than AP-MBCSLBP. Fig. 9(c) shows the total error rate, which comprises false positives and false negatives. AP-MBLBP and AP-MBMCT show little difference in discriminative power, and all distribution-based features (AP versions) perform better than the original ones. So features that make use of the distribution of the training samples can improve classification performance.
We visualize the top 12 FS2 features in Fig. 10. The most discriminative features are located at the eyes, the corners of the mouth, the nose and the face silhouette, so the symmetric context of the human face is utilized implicitly. The symmetric geometric relationship of the two eyes in the first feature can exclude more than 30% of the negative patches in the initial training set with zero false negatives. Furthermore, the left half part (top 2) and the right half part (top 8) features describe the co-occurrence of one eye, a corner of the mouth and half of the face silhouette. So the feature implicitly makes use of both the symmetric context information and the geometric relationships of components in the face, which improves the discriminative ability. To sum up, most of the discriminative features lie in the face area or overlap the face and background, so the silhouette of the face and the eyes are the most important areas for distinguishing faces from non-faces. That is why our proposed feature is superior to others.

Fig. 10. Top 12 features on the FS2 set selected by our detector.
5.3 Evaluation on MIT+CMU dataset
5.3.1 Comparison of different features
We evaluate our detector on the CMU+MIT frontal face database, which consists of 130 images containing 507 faces. Due to the similarity in discriminative power among MBLBP, MBMCT and MBCSLBP, we evaluate these features as sets instead of one by one. One feature set (FS-1) comprises MBLBP, MBMCT and MBCSLBP; the other set (FS-2) includes AP-MBLBP, AP-MBMCT and AP-MBCSLBP. The ROC curves are displayed in Fig. 11.
The data for the red solid curve in Fig. 11 are adopted from the original paper and the others are our implementations. The final trained detectors based on LBP-like features contain 9 nodes with about 400 weak classifiers. Some of the test results on the MIT+CMU database are shown in Fig. 13.
From Fig. 11, we can see that the discriminative ability of FS-2 is much better than that of FS-1. This is due to the distribution information, which is embedded into the features to enhance their discriminative ability. The ROC curve of our FS-2 detector is also higher than that of Yan [44], which is the state of the art. During the experiments, we observed that it is very effective to use the MSL [44] method to generate a well-distributed positive training set; the final positive training set (about 9000 samples), which is generated by bootstrapping in each stage, is a very good representation of the whole 40 000 original training samples. We also trained on these distilled 9000 samples and obtained nearly the same result. We also find that a well-distributed training set has an important effect on the accuracy of the detector. So it is very important to collect well-distributed positive and negative training sets in order to train a robust detector.
5.3.2 Comparison of different training methods

In this section, we compare the nested cascade with the MSL method for training a face detector. Both of them use real-valued inheriting techniques to decrease the number of features. The difference between them is that the MSL method needs to bootstrap positives incrementally to get a well-distributed face subset, which effectively represents all the face images in a huge validation set.
In our method, we use a pre-selected face dataset, which is much smaller than the former one. In order to use the MSL method to train a face detector, we collect a huge face dataset, which contains about 390 000 faces from many well-known face databases; this quantity is even larger than that used in [44].
We have implemented five different detectors (MSL with different features, including LAB, FS1 and FS2, and the nested cascade with FS1 and FS2), and the comparison results are shown in Fig. 12. From Fig. 12, we can see that when training with the FS1 feature set, both MSL and the nested cascade give very similar results. When training with the FS2 feature set, the nested cascade is slightly better than the MSL method. We think that the MSL method with the FS2 feature set, which needs to bootstrap positives in each round of every stage in an incremental manner, is more sensitive to the changing distribution of positive and negative data, especially in the earlier stages. Furthermore, much of the computational load focuses on recalculating the adaptive projection feature set after the positive samples are bootstrapped, so it is more time consuming to train with the MSL structure.
From Fig. 12, we can see that the ROC curve of our implementation of MSL with the LAB feature is slightly lower than its published result [44]. We think this may be due to different training data, because how to generate a well-distributed face manifold as training data is still a difficult problem [51], which is beyond the scope of this paper. Furthermore, the MSL method is more complicated to implement, whereas our method is easy to implement and achieves reasonable performance.
5.4 Evaluation on UIUC dataset
We also evaluate our features on the UIUC side-view car dataset, which consists of a single-scale test set with 170 images containing 200 cars and a multi-scale test set with 108 images containing 139 cars. We denote them Test Set I and Test Set II and use the procedure provided with this dataset to evaluate the detection results. The car detector is trained on the UIUC car training set, which comprises 550 car images and 500 background images, using the same method as for faces. Two detectors with different feature sets are evaluated on the UIUC datasets. The final trained detector contains only about 90 weak classifiers. The recall-precision curves for the single-scale and multi-scale test sets are shown in Fig. 14. We can see that the FS-2 feature set has a higher recall rate at the same precision; it is superior to the FS-1 feature set, which is
consistent with the results on the face dataset. We compare our approach with previous approaches using the equal precision and recall rate (EPR) criterion. The results are listed in Table 1. From the table, we can see that the FS-1 and FS-2 feature sets achieve moderate detection rates when detecting low-resolution images with complex backgrounds.
Fig. 13. Face detector output on test images.
Fig. 14. 1-precision vs. recall curves of the test results.
We visualize the top 12 features from the FS2 detector in Fig. 15. We can see that most of the discriminative features focus on the bottom of the car, especially in the area of the wheels. In this experiment, we find that nearly no features lie on the top of the car; we think this is because the training data contain both left-side views and right-side views mixed together. So we manually separate the training set into two datasets (left-side view and right-side view) and train on these two datasets. We observe a slight improvement in detection rate, so it is effective to detect left and right side views separately.
From Table 1, we can see that our method does not achieve the state-of-the-art result. We think this is because a side-view car has a smaller discriminative area than a face, and the important positions mainly lie on the silhouette of the car, so shape information predominates in feature importance; that is why shape-based descriptors achieve the state-of-the-art results. Some of the car detection results are displayed in Fig. 16.
Table 1
EPR rates of different methods on the UIUC dataset.

Methods                     Single-scale (Test Set I) (%)    Multi-scale (Test Set II) (%)
Agarwal et al. [49]         76.5                             39.6
Mutch and Lowe [52]         99.94                            90.6
Wu and Nevatia [53]         97.5                             93.5
Fritz et al. [54]           88.6                             87.8
Lampert et al. [55]         98.5                             98.6
MB-(LBP,MCT,CSLBP)          85.5                             81.3
AP-MB-(LBP,MCT,CSLBP)       90.5                             87
Fig. 15. Top 12 features selected by our detector.
Table 2
Face detection results on the PASCAL VOC dataset.

Dataset name         VOC 2007 test set          VOC 2008 test set          VOC 2009 test set
Description          4952 images                4336 images                6925 images
Feature              Haar   FS1    FS2          Haar   FS1    FS2          Haar   FS1    FS2
False positives      94     87     85           66     77     57           112    125    103
True positives       853    1083   1135         1026   1246   1312         1352   1531   1597

The bold numbers in Table 2 indicate the numbers of false positives and true positives found by our detector with the FS2 feature set on the VOC 2007, 2008 and 2009 test sets, respectively. For false positives, fewer is better; for true positives, more is better.