A novel distribution-based feature for rapid object detection

Jifeng Shen a,*, Changyin Sun a, Wankou Yang a, Zhenyu Wang a, Zhongxi Sun a,b

a School of Automation, Southeast University, Nanjing 210096, China
b College of Science, Hohai University, Nanjing 210098, China
Article history:
Received 6 June 2010
Received in revised form 16 February 2011
Accepted 17 March 2011
Communicated by Tao Mei
Available online 23 May 2011
Keywords:
Object detection
LBP
Adaptive projection-MBLBP
Asymmetric Gentle Adaboost
Abstract
The discriminative power of a feature has an impact on the convergence rate in training and the running speed in evaluating an object detector. In this paper, a novel distribution-based discriminative feature is proposed to distinguish objects of rigid object categories from background. It fully exploits the advantage of the local binary pattern (LBP), which specializes in encoding local structures, together with statistical distribution information from the training data, which is used to obtain an optimal separating hyperplane. The proposed feature maintains the merit of simplicity in calculation and has powerful discriminative ability to distinguish objects from background patches. Three LBP-based features are extended to adaptive projection versions, which are more discriminative than the original ones. An asymmetric Gentle Adaboost organized in a nested cascade structure constructs the final detector. The proposed features are evaluated on two different object categories: frontal human faces and side-view cars. Experimental results demonstrate that the proposed features are more discriminative than traditional Haarlike features and multi-block LBP (MBLBP) features. Furthermore, they are also robust to monotonic variations of illumination.
1 Introduction
In the last few years, the problem of localizing specified objects in still images or video frames has received a lot of attention. It is widely used in biometric verification, video surveillance and automatic driver assistance systems [1,2], etc. Object detection in arbitrary scenes is rather challenging since objects vary greatly in appearance even within the same category. The main factors can be roughly divided into three aspects: photometric variation, viewpoint variation and intra-class variability. For example, faces can have different expressions and appear under different illumination. Among the many object detection methods, the appearance-based method using a boosting algorithm is the most popular one, which is attributed to Viola and Jones [3], who proposed the first real-time face detection system.
The object detection research topics can be divided into two categories. The first category is feature representation. The most famous feature is the Haar wavelet [3,4], which encodes the difference between joint areas located at specified positions. But it only reflects changes of intensity in the horizontal, vertical and diagonal directions. The extended Haarlike feature [5] is proposed to enrich the feature set, which decreases the false positives by about 10% at the same recall rate. After that, the disjoint Haarlike feature [6,7],
which breaks up the connected sub-regions, is proposed to deal with multi-view face detection. Huang et al. [8] propose a new granular feature, which is also applied to multi-view face detection and proves to be the current state of the art. Recently, it has also been applied to the human detection field [9] and achieves considerable results. Another important feature, which is worth mentioning, is the LBP-based feature. The LBP [10] is first used in face recognition and later extended to the multi-block LBP (MBLBP) [11] for face detection, which decreases the feature number in training a detector. A similar feature is proposed in [12], using a different coding rule to define features. More recently, more variants [13] have appeared; they are more robust to illumination variations. The covariance feature [14], first used in human detection [15], is also applied to face detection. This feature outperforms the Haarlike feature when the sample size is small but has high calculation complexity. The histogram of oriented gradients (HOG) [16] exploits the histogram of gradient orientation weighted by amplitude to model the human silhouette and has become the state-of-the-art algorithm in the human detection field. It derives from EOH [17] and SIFT [18], which are widely used in object detection and interest point detection, respectively. Others use color information [19,20] to aid object detection.
The other category is related to classifier construction. There are many kinds of classifiers, such as SVM [20,22,23], NN [24], SNoW [25], etc., in the earlier research, but most of them cannot be applied to real-time applications and have a high false positive rate. The first real-time object detector comes from the cascade structure [3], which derives from the coarse-to-fine strategy. Its stages are designed
empirically and use thousands of features to exclude a large amount of the negative samples. After that, more research works are concerned with how to improve detection speed and adaptively choose optimal parameters, such as the boosting chain [27], WaldBoost [28], soft cascade [29], dynamic cascade [30], multi-exit boosting [31], MSL [32], etc.
Many of the state-of-the-art object detection systems suffer from two main issues. First, it is rather time consuming to train a robust detector. Second, the extracted features are not discriminative enough, especially in the later cascade nodes, to exclude hard negative samples. To overcome these problems, we propose a new distribution-based feature, which possesses strong discriminative ability, is very efficient to train and can be applied to real-time detection. Partial work on this idea was published in [33], but was not evaluated as thoroughly.
The remainder of this paper is organized as follows. Section 2 presents an overview of LBP-based methods exploited in current object detection. Section 3 describes our proposed new features in detail. Section 4 demonstrates the hierarchy of our detector. Experimental results and performance analysis are presented in Section 5. Finally, conclusions are given in Section 6.
2 Related works
2.1 LBP-based features
Ojala and Pietikäinen [10] propose the local binary pattern (LBP), which is widely used in texture classification. It encodes the differences between the center pixel and its surrounding ones in a circular sequence manner. It characterizes the local spatial structure of the image as in Eq. (1):
$$f_{R,N} = \sum_{i=0}^{N-1} s(p_i - p_c)\,2^i, \qquad s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$
where p_i is one of the N neighbor pixels around the center pixel p_c, on a circle or square of radius R. LBP is favored as a feature descriptor because of its tolerance against illumination changes and its computational simplicity. It has been successfully applied to many computer vision and pattern recognition fields, such as face recognition [10,34], face detection [11], facial expression recognition [35,36], background subtraction [37], dynamic texture analysis [38,39], gender classification [40,41] and so on.
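For concreteness, the following minimal Python/NumPy sketch evaluates Eq. (1) for the 8 neighbours on a square ring of radius R around one pixel; the function name and the neighbour ordering are our own illustrative choices, not prescribed by the paper.

```python
import numpy as np

def lbp_code(img, r, c, R=1):
    """Minimal LBP of Eq. (1): compare the 8 neighbours on a square ring
    of radius R with the centre pixel and pack the signs into one byte."""
    pc = img[r, c]
    # clockwise neighbour offsets starting at the top-left corner (assumed ordering)
    offsets = [(-R, -R), (-R, 0), (-R, R), (0, R), (R, R), (R, 0), (R, -R), (0, -R)]
    code = 0
    for i, (dr, dc) in enumerate(offsets):
        if img[r + dr, c + dc] >= pc:       # s(p_i - p_c) = 1 when p_i >= p_c
            code |= (1 << i)                # weight the bit by 2^i
    return code

img = np.array([[52, 60, 61],
                [55, 57, 59],
                [50, 58, 63]], dtype=np.int32)
print(lbp_code(img, 1, 1))   # 8-bit code of the centre pixel
```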
The census transform (CT) is a non-parametric local transform that summarizes the local spatial structure. It defines an ordered set of comparisons of pixel intensities in a local area, indicating which pixels have a lesser intensity than the pixel in the center. The coding rule is defined as follows:

$$C(x) = \bigotimes_{y \in N(x)} \zeta(I(x), I(y)), \qquad \zeta(I(x), I(y)) = \begin{cases} 1, & I(y) < I(x) \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$

where ⊗ denotes the concatenation operation, I(x) is the intensity of pixel x, N(x) is the neighborhood of pixel x and ζ(x, y) is the comparison function.
Froba and Ernst [12] extend the original work and propose the modified census transform (MCT), which is applied to face detection. It differs from the census transform by taking the center pixel intensity and the average pixel intensity into consideration, which generates a longer binary string. The formula for encoding the MCT is as follows:

$$\Gamma(x) = \bigotimes_{y \in N'(x)} \zeta(\bar{I}(x), I(y)) \qquad (3)$$

where N'(x) is the neighborhood of x including the pixel x itself and Ī(x) is the average intensity over N'(x).
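The following sketch illustrates the CT and MCT coding rules of Eqs. (2) and (3) on a single 3×3 patch; it is only a minimal reading of the definitions above, with the function names and the row-major bit ordering assumed for illustration.

```python
import numpy as np

def census_transform(patch3x3):
    """CT of Eq. (2): concatenate, over the 8 neighbours, the bits
    indicating whether each neighbour is darker than the centre pixel."""
    centre = patch3x3[1, 1]
    bits = []
    for r in range(3):
        for c in range(3):
            if (r, c) == (1, 1):
                continue
            bits.append(1 if patch3x3[r, c] < centre else 0)
    return bits                      # 8-bit string

def modified_census_transform(patch3x3):
    """MCT of Eq. (3): compare all 9 pixels against the mean of the
    3x3 neighbourhood N'(x), yielding a 9-bit string."""
    mean = patch3x3.mean()
    return [1 if v > mean else 0 for v in patch3x3.ravel()]

patch = np.array([[10, 20, 30],
                  [40, 25, 15],
                  [35, 45,  5]], dtype=np.float64)
print(census_transform(patch))           # -> [1, 1, 0, 0, 1, 0, 0, 1]
print(modified_census_transform(patch))  # 9 bits against the patch mean
```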
Recently, Heikkilä et al. [43] propose a center-symmetric LBP (CSLBP) descriptor for interest region description, which combines the good properties of SIFT and LBP. It also uses only half of the coding length of the traditional LBP descriptor. CSLBP is defined as follows:
$$f_{R,N,T} = \sum_{i=0}^{N/2-1} s(p_i - p_{i+N/2})\,2^i, \qquad s(x) = \begin{cases} 1, & x > T \\ 0, & \text{otherwise} \end{cases} \qquad (4)$$
where p_i and p_{i+N/2} correspond to the gray values of center-symmetric pairs of pixels among the N equally spaced pixels on a circle of radius R.
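As a small illustration of Eq. (4), the sketch below computes the 4-bit CSLBP code from the four centre-symmetric pixel pairs of an 8-pixel ring; the ring ordering and the function name are our own assumptions.

```python
import numpy as np

def cslbp_code(img, r, c, R=1, T=0.0):
    """CSLBP of Eq. (4): compare the N/2 centre-symmetric pixel pairs
    and pack the results into an N/2-bit code (here N = 8, so 4 bits)."""
    # assumed ordering of the 8 ring pixels; pair i is (p_i, p_{i+4})
    ring = [(-R, -R), (-R, 0), (-R, R), (0, R), (R, R), (R, 0), (R, -R), (0, -R)]
    code = 0
    for i in range(4):
        dr1, dc1 = ring[i]
        dr2, dc2 = ring[i + 4]
        if img[r + dr1, c + dc1] - img[r + dr2, c + dc2] > T:
            code |= (1 << i)
    return code

img = np.array([[52, 60, 61],
                [55, 57, 59],
                [50, 58, 63]], dtype=np.float64)
print(cslbp_code(img, 1, 1, T=1.0))   # 4-bit centre-symmetric code
```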
The main difference between the aforementioned non-parametric local binary encoding features lies in their coding sequences. All of these features use a binary string to represent the local feature instead of using pixel intensities directly.
More recently, Zhang et al. [11] propose the MBLBP feature, which extends the coding from single pixels to blocks and is successfully used in face detection. An MBLBP feature is composed of 3×3 cells and each cell contains a set of pixels. The feature is an 8-bit encoding string in a circular manner from number 1 to 8 (see Fig. 1(a)), where each bit reflects whether the average intensity in a cell is larger or smaller than that of the center one. Its multi-scale version is shown in Fig. 1(b) and (c). MBLBP is less sensitive to intensity variations in a local area than the original LBP. The MBLBP feature used in the face detector uses only 10% of the number of Haarlike features and is more discriminative than Haarlike features at the same time. Yan et al. [44] propose a similar feature called LAB and utilize a feature-centric method to improve the efficiency of calculation.
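MBLBP works on cell averages rather than single pixels, and those averages are typically obtained in constant time from an integral image. The sketch below is our own minimal illustration (the paper gives no code); the helper names, the clockwise cell numbering and the integral-image shortcut are assumptions.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero first row/column for easy lookups."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def cell_sum(ii, top, left, h, w):
    """Sum of the h x w rectangle whose top-left corner is (top, left)."""
    return ii[top + h, left + w] - ii[top, left + w] - ii[top + h, left] + ii[top, left]

def mblbp_code(ii, top, left, ch, cw):
    """MBLBP: compare the 8 surrounding cell means with the centre cell mean
    (all cells are ch x cw) and pack the comparisons into an 8-bit code."""
    means = [[cell_sum(ii, top + i * ch, left + j * cw, ch, cw) / (ch * cw)
              for j in range(3)] for i in range(3)]
    centre = means[1][1]
    # clockwise cell order starting at the top-left cell (assumed numbering)
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for k, (i, j) in enumerate(order):
        if means[i][j] >= centre:
            code |= (1 << k)
    return code

img = np.random.rand(24, 24)
ii = integral_image(img)
print(mblbp_code(ii, top=3, left=3, ch=2, cw=2))
```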
2.2 Distribution-based features
Pavani et al. [45] propose optimally weighted rectangles for face and heart detection; they integrate the distribution information of positive and negative training samples into Haarlike features in order to find an optimal separating hyperplane, which maximizes the margin between the positive class and the negative class. It can be formulated as follows:
$$w_{opt} = \arg\max_{w} \frac{w^{T} S_b\, w}{w^{T} S_w\, w}, \qquad S_b = (m^{+} - m^{-})(m^{+} - m^{-})^{T}, \qquad S_w = S^{+} + S^{-}$$

$$S^{+} = \sum_{\forall i,\; y_i = 1} (m_i - m^{+})(m_i - m^{+})^{T}, \qquad S^{-} = \sum_{\forall i,\; y_i = -1} (m_i - m^{-})(m_i - m^{-})^{T} \qquad (5)$$
where w_opt is the optimal projection that maximizes the between-class distance and minimizes the within-class distance, m+ and m− are the mean feature vectors of the positive and negative samples, respectively, and S_b and S_w are the between-class and within-class scatter matrices, respectively.
Fig. 2(b) shows the two-dimensional distribution of positive and negative samples with the default projection direction and the optimal projection direction. This kind of feature fully makes use of the distribution information of the training samples in order to improve the discriminative power of the classifiers and decrease the evaluation time needed to exclude negative patches. The Haarlike feature
comprising k connected areas can be formulated as follows:

$$f = \sum_{i=1}^{k} w_i u_i \qquad (6)$$
The default Haarlike feature, which is shown in Fig. 2(a), can be represented as [1, −1][u_1, u_2]^T, where u_1 and u_2 are the average intensities in the white and black areas, respectively. The optimally weighted feature has a projection direction different from the vector (1, −1), which is the key improvement in performance.
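A minimal sketch of how the optimal projection of Eq. (5) can be computed for a two-rectangle feature, using the closed form w = S_w^{-1}(m+ − m−) stated later in Section 3.4; the toy data and the function names are ours, purely for illustration.

```python
import numpy as np

def scatter(feats):
    """Class scatter: sum_i (x_i - mean)(x_i - mean)^T, as in Eq. (5)."""
    d = feats - feats.mean(axis=0)
    return d.T @ d

def fisher_weights(pos_feats, neg_feats):
    """Optimal projection of Eq. (5) in closed form, w = S_w^{-1}(m+ - m-).
    pos_feats, neg_feats: (n_samples, K) arrays of rectangle mean intensities."""
    s_w = scatter(pos_feats) + scatter(neg_feats)
    return np.linalg.solve(s_w, pos_feats.mean(axis=0) - neg_feats.mean(axis=0))

rng = np.random.default_rng(0)
# toy 2-rectangle feature vectors [u1, u2] for positives and negatives
pos = rng.normal(loc=[0.6, 0.3], scale=0.05, size=(500, 2))
neg = rng.normal(loc=[0.4, 0.4], scale=0.15, size=(500, 2))
w = fisher_weights(pos, neg)
print(w / np.abs(w).max())    # adapted projection, generally not the default (1, -1)
```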
2.3 Asymmetric Gentle Adaboost
The asymmetric characteristic inherently exists in many object detection fields, where positive targets need to be distinguished from an enormous number of background patterns. Although symmetric Adaboost can achieve good performance in object detection, ignoring the distribution imbalance between positive and negative samples makes the boosting algorithm converge much more slowly than asymmetric ones. Asymmetric boosting [46,47] utilizes a cost function to penalize false negatives more than false positives, so as to get a better separating hyperplane. Recently, Gentle Adaboost has been extended to an asymmetric version, which further improves the performance in classification problems [48]. The goal function of asymmetric Gentle Adaboost, which uses an asymmetric additive logistic regression model, is formulated as follows:
$$J_{asym}(F) = E\left[ I(y=1)\, e^{-y C_1 F(x)} + I(y=-1)\, e^{-y C_2 F(x)} \right] \qquad (7)$$
where C_1 and C_2 are the cost factors of false negatives and false positives, respectively, and F(x) is a signed real confidence value. Using the Newton update technique, the optimal weak classifier can be worked out as follows:

$$f_{opt}(x) = C_1 P_w(y=1\,|\,x) - C_2 P_w(y=-1\,|\,x) \qquad (8)$$

where P_w(y = 1 | x) and P_w(y = −1 | x) are the weighted conditional probabilities of the positive and negative classes, respectively. The asymmetric Gentle Adaboost algorithm used in this paper is shown in Fig. 3.
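Since Fig. 3 is not reproduced here, the sketch below shows only our reading of Eqs. (7) and (8): one round of asymmetric Gentle Adaboost with a lookup-table weak classifier over a discrete feature code (256 bins, matching the weak classifier mentioned in Section 5.1). The helper names and the exact re-weighting form are assumptions consistent with the loss in Eq. (7), not the authors' algorithm verbatim.

```python
import numpy as np

def fit_lut_weak(codes, labels, weights, c1=1.0, c2=0.25, n_bins=256):
    """One weak learner per Eq. (8): a lookup table over the feature code with
    confidence f(bin) = C1*Pw(y=+1|bin) - C2*Pw(y=-1|bin)."""
    lut = np.zeros(n_bins)
    for b in range(n_bins):
        idx = (codes == b)
        wp = weights[idx & (labels == 1)].sum()
        wn = weights[idx & (labels == -1)].sum()
        if wp + wn > 0:
            lut[b] = (c1 * wp - c2 * wn) / (wp + wn)
    return lut

def boost_round(codes, labels, weights, c1=1.0, c2=0.25):
    """Fit one weak classifier and re-weight the samples asymmetrically,
    w_i <- w_i * exp(-C_{y_i} * y_i * f(x_i)), consistent with Eq. (7)."""
    lut = fit_lut_weak(codes, labels, weights, c1, c2)
    f = lut[codes]
    cost = np.where(labels == 1, c1, c2)
    weights = weights * np.exp(-cost * labels * f)
    return lut, weights / weights.sum()

rng = np.random.default_rng(1)
codes = rng.integers(0, 256, size=1000)          # stand-in LBP-style feature codes
labels = np.where(rng.random(1000) < 0.5, 1, -1)
weights = np.full(1000, 1.0 / 1000)
lut, weights = boost_round(codes, labels, weights)
```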
3 Adaptive projection of block-based local binary features

Inspired by the effectiveness of the optimal weights embedded in Haarlike features, we combine block-based LBP with an adaptive projection strategy, which merges the distribution information of the training samples into the descriptors to enhance their discriminative ability. The experimental results indicate that the adaptive projection (AP)-based MBLBP is more discriminative than the weighted Haarlike and MBLBP features, inheriting the advantages of both features at the same time. In addition, we also introduce block-based versions of the modified census transform and CSLBP and their AP versions. AP-MBLBP, AP-MBMCT and AP-MBCSLBP will be explained in detail in Sections 3.1, 3.2 and 3.3, respectively.
3.1 AP-MBLBP
The idea of AP-MBLBP [33] is to get the optimal feature in each direction so as to maximize the margin between positive and negative feature vectors. In order to combine these feature ensembles, we encode the rectangular regions of every direction in a manner similar to LBP. The AP-MBLBP operator is defined as follows:

$$f_{apmblbp} = \sum_{i=0}^{N-1} g(p_i, p_c, w_i, w_c)\,2^i \qquad (9)$$

where p_c is the average intensity of the center area, which is the gray area shown in Fig. 4(a); p_i is the average intensity of the ith direction around p_c; w_i is the weight of p_i; and g(p_i, p_c, w_i, w_c) is a piecewise function reflecting the relationship between p_i and p_c, which is defined as follows:

$$g(p_i, p_c, w_i, w_c) = \begin{cases} 1, & w_i p_i > w_c p_c \\ 0, & w_i p_i \le w_c p_c \end{cases} \qquad (10)$$
The weight of each cell is obtained by Eq. (5), which maximizes the between-class variation and minimizes the within-class variation. Fig. 4 shows the details of the process used to generate AP-MBLBP features. First, a block composed of 3×3 cells is shown in Fig. 4(a). Then the block is separated into 8 center-adjacent features (CAFs), as shown in Fig. 4(b). Next, the optimal weight of each cell is calculated by Eq. (5). Finally, the encoding of all the cells is determined by Eq. (10) and concatenated into an 8-bit operator, which is similar to LBP.

Fig. 2. Distributions between positive and negative samples.
Fig. 3. Asymmetric Gentle Adaboost.
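A minimal sketch of Eqs. (9) and (10): given the nine cell means of a block and the learned weights, the AP-MBLBP code is obtained by eight weighted comparisons against the centre cell. We assume a single centre weight w_c and an arbitrary toy weight vector purely for illustration.

```python
import numpy as np

def ap_mblbp_code(cell_means, weights):
    """AP-MBLBP of Eqs. (9)-(10): cell_means and weights are length-9 arrays
    ordered as [8 surrounding cells ..., centre cell]; each bit is
    g(p_i, p_c, w_i, w_c) = 1 iff w_i*p_i > w_c*p_c."""
    p_c, w_c = cell_means[8], weights[8]
    code = 0
    for i in range(8):
        if weights[i] * cell_means[i] > w_c * p_c:
            code |= (1 << i)
    return code

# toy example: weights of the kind that would be learned per cell via Eq. (5)
cells = np.array([0.42, 0.55, 0.61, 0.40, 0.38, 0.47, 0.52, 0.44, 0.50])
w = np.array([-0.6, 0.3, -0.6, -0.2, 0.2, -0.1, 0.2, -0.1, 1.0])
print(ap_mblbp_code(cells, w))   # 8-bit adaptive-projection code
```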
As can be seen in Fig. 4(b), the eight two-adjacent cells reflect the variations in eight directions between the center cell and the adjacent ones, which encode the local intensity variation information of different directions. Instead of utilizing only one optimal direction as Haarlike features do, the other seven less discriminative features, which would otherwise be neglected, can also contribute local variation information to enhance the total discriminative power of AP-MBLBP features.
The key step in Fig. 4(c) fully makes use of the distribution of the samples to maximize the margin between positive and negative samples for each CAF in order to promote the global discriminative power of AP-MBLBP. It is worth mentioning that AP-MBLBP is different from MBLBP, since the former has different weights for the surrounding rectangular regions. The weights of the former encode the discriminative information between positive and negative samples. AP-MBLBP can be considered a generalized MBLBP. If we denote the weight vector w = (w_1, w_2, ..., w_8), then when w = (1, 1, ..., 1), AP-MBLBP degenerates to MBLBP. AP-MBLBP has all the advantages of MBLBP; furthermore, it is more discriminative than MBLBP.
In order to find the intrinsic reason for its strong discriminative ability, we visualize the first three AP-MBLBP features from our trained detector. From Fig. 5(b), we can see that the first feature is symmetric along the center of the face, and it covers the most important regions in the face area such as the eyes and nose. It encodes the symmetric context information of the human face, which inherently exists in nature. The features in Fig. 5(c) and (d) encode the geometric information of the eyes, nose, mouth and face silhouette, which is very discriminative for excluding negative patches. More discussion will be given in Section 5.2.
3.2 MBMCT and AP-MBMCT
We propose a block-based version of the modified census transform (MBMCT) and its AP version (AP-MBMCT), which are similar to MBLBP and AP-MBLBP. MBMCT encodes not only the cells surrounding the center cell but also uses the average intensity of all cells in the block, so the total length of the encoded bit string is longer than that of MBLBP. The coding method is defined as follows:
$$f_{mbmct} = \sum_{i=0}^{N} g(p_i, \bar{p}, w_i)\,2^i, \qquad g(p_i, \bar{p}, w_i) = \begin{cases} 1, & w_i p_i > \bar{p} \\ 0, & w_i p_i \le \bar{p} \end{cases}, \qquad \bar{p} = \frac{1}{N}\sum_{i=0}^{N-1} p_i \qquad (11)$$
The process of generating AP-MBMCT is shown in Fig. 6. It is similar to generating AP-MBLBP, except that the final bit string is longer and the average block intensity is used instead of the center block.
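A corresponding sketch of Eq. (11); here we take the reference value p̄ as the mean of the cell means of the block (the exact normalisation in Eq. (11) may differ), and with unit weights the code reduces to plain MBMCT. Function names are our own.

```python
import numpy as np

def ap_mbmct_code(cell_means, weights):
    """AP-MBMCT of Eq. (11): all nine cell means are compared, after weighting,
    against the mean cell intensity p_bar, giving a 9-bit code instead of the
    8-bit MBLBP code."""
    p_bar = cell_means.mean()
    code = 0
    for i, (p, w) in enumerate(zip(cell_means, weights)):
        if w * p > p_bar:
            code |= (1 << i)
    return code

cells = np.array([0.42, 0.55, 0.61, 0.40, 0.38, 0.47, 0.52, 0.44, 0.50])
w = np.ones(9)          # with unit weights this reduces to plain MBMCT
print(ap_mbmct_code(cells, w))
```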
3.3 MBCSLBP and AP-MBCSLBP
We also propose a block-based version of CSLBP (MBCSLBP) and its AP version (AP-MBCSLBP), defined as follows:

$$f_{mbcslbp} = \sum_{i=0}^{N/2-1} g(p_i, p_{i+N/2}, w_i, w_{i+N/2})\,2^i, \qquad g(p_i, p_j, w_i, w_j) = \begin{cases} 1, & w_i p_i > w_j p_j \\ 0, & w_i p_i \le w_j p_j \end{cases} \qquad (12)$$
The process of generating AP-MBCSLBP is demonstrated in Fig. 7, where the length of the final bit string is half that of AP-MBLBP. The four pairs of sub-regions are centrally symmetric with respect to the center block, which is different from MBLBP and MBMCT.
Fig. 4. AP-MBLBP features.
Fig. 5. Selected features on the average face: (a) average face on the training data, (b) first selected feature, (c) second selected feature, (d) third selected feature.
Fig. 6. AP-MBMCT features.

3.4 Time complexity of generating adaptive projection features

The adaptive projection features utilize N training samples; each sample is of size d×d, and the dimension of the feature vector is K. The final optimal weight of a feature is w = S_w^{-1}(m+ − m−), where S_w is the K×K within-class scatter matrix, and m+ and m− are the K-dimensional
positive and negative average feature vectors, respectively. We use the LU decomposition algorithm to compute the inverse of the matrix, so the complexity is (2/3)K^3. The complexity of computing each feature is O(K^3 + K^2 + NK), where K equals 2 in this paper. So it is very efficient to calculate, though slightly slower than the original MBLBP features in the training phase.
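A rough sketch (ours, not the authors' code) of the per-feature cost described above: for every candidate feature the K-dimensional class means and the K×K scatter are accumulated over the N samples (O(NK)) and one K×K system is solved (O(K^3)); with K = 2 this is a constant-time 2×2 solve per feature. Here np.linalg.solve stands in for the explicit LU factorisation.

```python
import numpy as np

def adaptive_weights_for_all_features(pos, neg):
    """pos, neg: arrays of shape (n_features, n_samples, K) holding the K cell
    means of every candidate feature for every sample. Returns one K-dimensional
    adaptive weight vector per feature, w = S_w^{-1}(m+ - m-)."""
    weights = np.empty((pos.shape[0], pos.shape[2]))
    for f in range(pos.shape[0]):                     # O(NK) accumulation ...
        dp = pos[f] - pos[f].mean(axis=0)
        dn = neg[f] - neg[f].mean(axis=0)
        s_w = dp.T @ dp + dn.T @ dn                   # K x K within-class scatter
        diff = pos[f].mean(axis=0) - neg[f].mean(axis=0)
        weights[f] = np.linalg.solve(s_w, diff)       # ... plus one O(K^3) solve
    return weights

rng = np.random.default_rng(2)
pos = rng.normal(0.6, 0.1, size=(100, 400, 2))        # 100 features, 400 samples, K = 2
neg = rng.normal(0.5, 0.2, size=(100, 400, 2))
print(adaptive_weights_for_all_features(pos, neg).shape)   # (100, 2)
```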
4 Object detection architecture
The proposed feature is designed to detect multiple object instances at different scales and locations in an input image. The object detection architecture is shown in Fig. 8. The process of constructing an object detector comprises two phases: one is training based on a large amount of training data, and the other is testing on a multi-scale image pyramid. In the training phase, the adaptive projection feature set is calculated based on the original feature set and the training data; then a nested cascade is constructed using asymmetric Gentle Adaboost. It is worth mentioning that the adaptive projection feature set has to be recalculated whenever the training data change, for example after bootstrapping negative samples in each stage. The resulting object detector is shown in the bottom part of Fig. 8. In the testing phase, the trained object detector is evaluated on the test database. First, an image pyramid is constructed for every test image; second, a sliding window technique is utilized to check whether every possible window contains an object; third, a non-maximum suppression method is used to merge multiple detection windows around one object. This procedure is shown in the top part of Fig. 8.
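A schematic sketch of the test-time loop just described (image pyramid, sliding window, cascade evaluation, non-maximum suppression). All function names, step sizes and thresholds here are illustrative assumptions, not values from the paper, and the pyramid is built by naive integer sub-sampling.

```python
import numpy as np

def non_max_suppression(dets, min_dist):
    """Greedy merge: keep a detection unless its corner lies close to one already kept."""
    kept = []
    for top, left, size in sorted(dets, key=lambda d: -d[2]):
        if all(abs(top - t) > min_dist or abs(left - l) > min_dist for t, l, _ in kept):
            kept.append((top, left, size))
    return kept

def detect(image, cascade, win=24, shift=2, min_dist=12):
    """Schematic test-time loop: build a coarse image pyramid by sub-sampling,
    scan each level with a sliding window, keep windows that pass every cascade
    stage, and merge nearby hits."""
    detections = []
    factor = 1
    while min(image.shape) // factor >= win:
        level = image[::factor, ::factor]                 # one pyramid level
        for top in range(0, level.shape[0] - win + 1, shift):
            for left in range(0, level.shape[1] - win + 1, shift):
                patch = level[top:top + win, left:left + win]
                if all(stage(patch) for stage in cascade):     # nested cascade stages
                    detections.append((top * factor, left * factor, win * factor))
        factor += 1
    return non_max_suppression(detections, min_dist)

# toy usage with a trivial one-stage "cascade"
image = np.random.rand(100, 100)
cascade = [lambda patch: patch.mean() > 0.55]
print(len(detect(image, cascade)))
```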
5 Experimental results and analysis
We evaluate our proposed features on three well-known public datasets: the MIT+CMU frontal face dataset, the UIUC side-view car dataset and the PASCAL VOC dataset, all of which are available on the Internet.
5.1 Experimental setup
In the face detection experiment, the training set consists of approximately 10 000 faces, which cover out-of-plane rotation and in-plane rotation in the range of [−20°, 20°]. Through mirroring and random shifting operations, the size of the positive sample set is enlarged to 40 000. In addition, more than 20 000 large images that do not contain any faces are used for bootstrapping. All samples are normalized to 24×24 pixels. We use N = 8 in Eqs. (9), (11) and (12). The number of MBLBP features generated in the face detection experiment is 8464. The size of the MBLBP varies from 3 to 24 pixels in width and height, respectively. The cost factors are set to C_1 = 1 and C_2 = 0.25 in asymmetric Gentle Adaboost. A 256-bin histogram is used as the weak classifier for the AP-MBLBP feature, and corresponding histogram weak classifiers are used for the AP-MBMCT and AP-MBCSLBP features. The minimum detection rate and maximum false positive rate are set to 0.9995 and 0.4, respectively.
In the car detection experiment, we take the training data from the UIUC side-view car dataset, whose training set contains 550 car images and 500 negative images. The original training samples have a resolution of 100×40; we manually crop and resize them to 64×32 in order to reduce the negative effect of the background. The number of MBLBP features generated in the car detection experiment is 30 513. The size of the MBLBP varies from 3 to 63 pixels in width and from 3 to 30 pixels in height. The minimum detection rate and maximum false positive rate are set to 0.995 and 0.4, respectively. The other parameter settings are the same as those mentioned above.
Fig. 7. AP-MBCSLBP features.
Fig. 8. Object detection architecture.

5.2 Features comparison

In order to evaluate the effectiveness of our proposed features, we conduct a comparison among Haar [3], Flda+Haar [45], LBP, MBLBP, MBMCT, MBCSLBP, AP-MBLBP, AP-MBMCT and AP-MBCSLBP features on the face dataset. The result is shown
in Fig. 9. In this experiment, we randomly choose 20 000 face samples and 20 000 negative samples for training, and the validation set uses the remaining 20 000 faces for cross-validation.
Fig. 9. Feature comparison: (a) false negative rate vs. number of weak classifiers, (b) false positive rate vs. number of weak classifiers, (c) error rate vs. number of weak classifiers.

Eight single-node detectors, each containing 50 weak classifiers, are trained and evaluated on the validation set. From Fig. 9, we conclude that AP-MBMCT, AP-MBLBP and AP-MBCSLBP are more discriminative than MBMCT, MBLBP and MBCSLBP, respectively. AP-MBMCT and AP-MBLBP have the lowest false negative rate and false positive rate for the same number of weak classifiers, where AP-MBMCT is slightly better than AP-MBLBP in the first 30 rounds and converges to the same level after that. MBCSLBP is less discriminative than LBP, MBMCT and MBLBP, but better than the Haar and Flda+Haar features (by about 10%). From Fig. 9(a), we can see that Haarlike features cannot distinguish faces from difficult negative samples, but LBP-based features are able to discriminate them effectively. This is due to the advantage of LBP, which possesses good discriminative power for classifying weakly textured objects.
In Fig. 9(b), we can see that both the Haarlike features and the LBP-based features can exclude negative patches effectively (within about 2% of each other). There are slight differences among AP-MBLBP, AP-MBMCT and AP-MBCSLBP in excluding negative samples, and AP-MBLBP and AP-MBMCT perform better than AP-MBCSLBP. Fig. 9(c) shows the total error rate, which comprises false positives and false negatives. AP-MBLBP and AP-MBMCT show little difference in discriminative power, and all distribution-based features (AP versions) perform better than the original ones. So features that make use of the distribution of the training samples can improve classification performance.
We visualize the top 12 FS2 features in Fig. 10. The most discriminative features are located at the eyes, the corners of the mouth, the nose and the face silhouette, so the symmetric context of the human face is utilized implicitly. The symmetric geometric relationship of the two eyes in the first feature can exclude more than 30% of the negative patches in the initial training set with zero false negatives. Furthermore, the left half part (top 2) and the right half part (top 8) features describe the co-occurrence of one eye, a corner of the mouth and half of the face silhouette. So the feature implicitly makes use of both the symmetric context information and the geometric relationships of components in the face, which improves the discriminative ability. To sum up, most of the discriminative features lie in the face area or overlap the face and background, so the silhouette of the face and the eyes are the most important areas for distinguishing faces from non-faces. That is why our proposed feature is superior to others.

Fig. 10. Top 12 features on the FS2 set selected by our detector.
5.3 Evaluation on MIT+CMU dataset
5.3.1 Comparison of different features
We evaluate our detector on the CMU+MIT frontal face database, which consists of 130 images containing 507 faces. Due to the similarity in discriminative power among MBLBP, MBMCT and MBCSLBP, we evaluate these features as sets instead of one by one. One feature set (FS-1) comprises MBLBP, MBMCT and MBCSLBP; the other set (FS-2) includes AP-MBLBP, AP-MBMCT and AP-MBCSLBP. The ROC curves are displayed in Fig. 11.
The data for the red solid curve in Fig. 11 are adopted from the original paper and the others are our implementations. The final trained detectors based on LBP-like features contain 9 nodes with about 400 weak classifiers. Some of the test results on the MIT+CMU database are shown in Fig. 13.
From Fig. 11, we can see that the discriminative ability of FS-2 is much better than that of FS-1. This is due to the distribution information, which is embedded into the features to enhance their discriminative ability. The ROC curve of our FS-2 detector is also higher than that of Yan [44], which is the state of the art. During the experiments, we observed that it is very effective to use the MSL [44] method to generate a well-distributed positive training set; the final positive training set (about 9000 samples), which is generated by bootstrapping in each stage, is a very good representation of the whole 40 000 original training samples. We also trained on these distilled 9000 samples and obtained nearly the same result. We also find that a well-distributed training set has an important effect on the accuracy of the detector. So it is very important to collect well-distributed positive and negative training sets in order to train a robust detector.
5.3.2 Comparison of different training methods

In this section, we compare the nested cascade with the MSL method for training a face detector. Both of them use real-valued inheriting techniques to decrease the number of features. The difference between them is that the MSL method needs to bootstrap positives incrementally to get a well-distributed face subset, which effectively represents all the face images in a huge validation set.
In our method, we use a pre-selected face dataset, which is much smaller than the former one. In order to use the MSL method to train a face detector, we collect a huge face dataset, which contains about 390 000 faces from many well-known face databases; this quantity is even larger than that used in [44].
We have implemented five different detectors (MSL with different features, including LAB, FS1 and FS2, and the nested cascade with FS1 and FS2), and the comparison results are shown in Fig. 12. From Fig. 12, we can see that when training with the FS1 feature set, both MSL and the nested cascade give very similar results. When training with the FS2 feature set, the nested cascade is slightly better than the MSL method. We think that the MSL method with the FS2 feature set, which needs to bootstrap positives in each round of every stage in an incremental manner, is more sensitive to the changing distribution of positive and negative data, especially in the earlier stages. Furthermore, much of the computational load focuses on recalculating the adaptive projection feature set after the positive samples are bootstrapped, so it is more time consuming to train with the MSL structure.
From Fig. 12, we can see that the ROC curve of our implementation of MSL with the LAB feature is slightly lower than its published result [44]. We think this may be due to different training data, because how to generate a well-distributed face manifold as training data is still a difficult problem [51], which is beyond the scope of this paper. Furthermore, the MSL method is more complicated to implement, whereas our method is easy to implement and achieves reasonable performance.
5.4 Evaluation on UIUC dataset
We also evaluate our features on the UIUC side-view car dataset, which consists of a single-scale test set with 170 images containing 200 cars and a multi-scale test set with 108 images containing 139 cars. We denote them Test Set I and Test Set II and use the procedure provided with this dataset to evaluate the detection results. The car detector is trained on the UIUC car training set, which comprises 550 car images and 500 background images, using the same method as for faces. Two detectors with different feature sets are evaluated on the UIUC datasets. The final trained detector contains only about 90 weak classifiers. The recall-precision curves for the single-scale and multi-scale test sets are shown in Fig. 14. We can see that the FS-2 feature set has a higher recall rate at the same precision; it is superior to the FS-1 feature set, which is
consistent with the results on the face dataset. We compare our approach with previous approaches using the equal precision and recall rate (EPR) criterion. The results are listed in Table 1. From the table, we can see that the FS-1 and FS-2 feature sets achieve moderate detection rates when detecting low-resolution images with complex backgrounds.
Fig. 13. Face detector output on test images.
Fig. 14. 1-precision vs. recall curves of the test results.
We visualize the top 12 features from the FS2 detector in Fig. 15. We can see that most of the discriminative features focus on the bottom of the car, especially in the area of the wheels. In this experiment, we find that nearly no features lie on the top of the car; we think this is because the training data contain both left-side views and right-side views mixed together. So we manually separate the training set into two datasets (left-side view and right-side view) and train on these two datasets. We observe a slight improvement in detection rate, so it is effective to detect left and right side views separately.
From Table 1, we can see that our method does not achieve the state-of-the-art result. We think this is because a side-view car has a smaller discriminative area than a face, and the important positions mainly lie on the silhouette of the car, so shape information predominates in feature importance; that is why shape-based descriptors achieve the state-of-the-art results. Some of the car detection results are displayed in Fig. 16.
Table 1
EPR rates of different methods on the UIUC dataset.

Methods                     Single-scale (Test Set I) (%)    Multi-scale (Test Set II) (%)
Agarwal et al. [49]         76.5                             39.6
Mutch and Lowe [52]         99.94                            90.6
Wu and Nevatia [53]         97.5                             93.5
Fritz et al. [54]           88.6                             87.8
Lampert et al. [55]         98.5                             98.6
MB-(LBP,MCT,CSLBP)          85.5                             81.3
AP-MB-(LBP,MCT,CSLBP)       90.5                             87
Fig. 15. Top 12 features selected by our detector.
Table 2
Face detection results on the PASCAL VOC dataset.

Dataset name         VOC 2007 test set          VOC 2008 test set          VOC 2009 test set
Description          4952 images                4336 images                6925 images
Feature              Haar   FS1    FS2          Haar   FS1    FS2          Haar   FS1    FS2
False positives      94     87     85           66     77     57           112    125    103
True positives       853    1083   1135         1026   1246   1312         1352   1531   1597

The bold numbers in Table 2 indicate the numbers of false positives and true positives found by our detector with the FS2 feature set on the VOC 2007, 2008 and 2009 test sets, respectively. For false positives, fewer is better; for true positives, more is better.