Study on New Approaches for Vehicle Detection
Using Stereoscopic Information
Advisor: Prof. Dr. Shozo KONDO
Course of Science and Technology
Graduate School of Science and Technology
Tokai University
Le Thanh Sach
Abstract

Vision-based vehicle detection is a useful, active, and challenging research field. It is helpful for many applications, for example, Intelligent Transportation Systems and Robot Vision. Generally, a detection framework consists of two steps: one step generates candidate vehicle regions from the input images, and the other verifies the generated candidates. For the verification step, many existing studies have demonstrated not only that vehicles can be discriminated from the background scene but also that vehicles can be classified into hierarchical sub-classes. For example, vehicles can be classified into Car, Bus, and Truck, and Car can then be subdivided into Honda, Toyota, and so on. However, efficient and reliable techniques for obtaining candidate vehicle regions are still missing. In addition, estimating the distance from the detected vehicles to the host vehicle is another challenge. In order to solve those problems, this thesis proposes to use stereoscopic information for vehicle detection. The proposed vehicle detection approach accepts stereo images captured by stereo cameras as its input, and it performs three main tasks as follows. First, disparity images are computed from the input stereo images. There are two challenging problems in this task: reducing the computation time and increasing the quality of the disparity images. In order to solve these two problems, we propose new approaches that can efficiently and reliably perform stereo matching. Second, ground planes are estimated from the computed disparity images. For this task, we propose a method named the Dynamic Programming Based method for ground plane estimation. The proposed method is able to work even when roads are not flat. In this task, U- and V-disparity images are reliably produced from the disparity images as well. Finally, on-road vehicles are located in the input images by combining the estimated ground planes and the U- and V-disparity images with a proposed differential-based method. The distance and the location of the located vehicles with respect to the stereo cameras' coordinate system are derived from the stereo camera parameters and the disparities of the located vehicles. Experimental results on both artificial stereo images and real stereo images demonstrate that disparity images can be efficiently and reliably computed even in low-texture conditions and that vehicles can be straightforwardly located by the proposed differential-based method.
Contents

1 Introduction
1.1 Motivation
1.2 Problem
1.2.1 Active Sensor
1.2.1.1 Radar sensor
1.2.1.2 LIDAR sensor
1.2.1.3 Summary
1.2.2 Passive Sensor
1.2.2.1 Single Camera
1.2.2.2 Stereo Cameras
1.2.2.3 Multiple Cameras
1.2.2.4 Summary
1.2.3 Fusion of Sensors
1.2.3.1 Radar + Camera
1.2.3.2 LIDAR + Camera
1.2.3.3 Summary
1.3 Scope of the Thesis
1.4 Structure of the Thesis

2 Related Works
2.1 Stereo Matching
2.1.1 Introduction
2.1.2 Procedure
2.1.2.1 Pre-processing
2.1.2.2 Pixel-Matching Computation
2.1.2.3 Cost Aggregation
2.1.2.4 Disparity Computation
2.1.2.5 Refinement
2.1.3 Validation
2.1.4 Efficiency of Stereo Matching
2.1.5 Stereo Matching Evaluation
2.1.6 Stereo Matching Summary
2.2 Stereo-Based Ground Plane Estimation
2.2.1 Introduction
2.2.2 Camera Geometry
2.2.3 Hough Transform Based Method
2.2.4 Fitting-Based Method
2.2.4.1 Least-Squares Method
2.2.4.2 Iteratively Reweighted Least-Squares Method
2.2.4.3 RANSAC Method
2.2.5 Dynamic Programming Based Method
2.2.6 Parallel-Based Method
2.2.7 Polar Histogram Based Method
2.2.8 Ground Plane Estimation Summary
2.3 Stereo-Based Vehicle Detection
2.3.1 Introduction
2.3.2 U- and V-Disparity Images Combination
2.3.3 Disparity Grouping
2.3.4 Vehicle Detection Summary
3 A New Coarse-To-Fine Method for Computing Disparity Images
3.1 Outline of the Proposal
3.2 Sampling of Disparity Spaces
3.3 Coarsest Map Estimation
3.4 Active Region Determination
3.5 Cost Aggregation
3.6 Disparity Map Computation
3.7 Summary

4 Robust Approaches for Stereo Matching
4.1 Outline of the Proposal
4.2 Pixel-Matching Cost Computation
4.3 Cost Aggregation
4.3.1 Relationship: cost volumes and edge maps
4.3.2 Horizontal Aggregation
4.3.3 Vertical Aggregation
4.4 Disparity Image Computation
4.4.1 GCP Selection
4.4.2 GCP-DP
4.5 Summary

5 Stereo-Based Ground Plane Estimation and Vehicle Detection
5.1 Outline of the Proposal
5.2 Ground Plane Estimation
5.2.1 V-Disparity Image and V-D Cost Image Computation
5.2.2 DP-Based Estimation
5.2.3 Ground Plane's Parameters
5.3 Vehicle Detection
5.3.1 Vehicle's Disparity and Lower Mark Determination
5.3.2 Upper Mark Determination
5.3.3 Left and Right Mark Determination
5.3.3.1 Creating Vehicle-Disparity Image
5.3.3.2 Creating Density Histogram
5.3.3.3 Computing Differentials
5.3.3.4 Generating Pairs of Left Mark and Right Mark
5.3.3.5 Merging Pairs of Left Mark and Right Mark
5.3.4 Vehicle Verification
5.4 Summary

6 Experimental Results and Discussions
6.1 Outline of the Experiments
6.2 A New Coarse-To-Fine Method for Computing Disparity Images
6.3 Robust Approaches for Stereo Matching
6.3.1 Qualitative Comparisons
6.3.2 Quantitative Comparisons
6.4 Ground Plane Estimation
6.5 Vehicle Detection
6.6 Summary

7 Conclusion
7.1 Summary of Results
7.2 Future Research Directions
List of Figures

1.1 Overview of existing approaches
1.2 The demonstration of Inverse Perspective Mapping
2.1 An example of stereo cameras
2.2 Geometry of stereo cameras
2.3 Ground Plane in Disparity Space
2.4 Demonstration of U- and V-disparity image combination
3.1 The coarse-to-fine method's idea
3.2 An active region's example
3.3 Outline of the proposed method
3.4 An original image and its low resolution image
3.5 Matching low resolution stereo images
3.6 The sampled disparity space for the coarsest map
3.7 Active region
3.8 Modified active region
3.9 Description of active region
4.1 Examples of homogeneous regions
4.2 The proposed stereo matching system
4.3 Horizontal correspondence between cost volumes and edge maps
4.4 Vertical correspondence between cost volumes and edge maps
4.5 Examples of edge-segments and mid-segments
4.6 Proposed data structures
5.1 Examples of V-Disparity image and V-D cost image
5.2 Example of the ground plane's profile when the road is empty
5.3 Example of the ground plane's profile when there is a large on-road vehicle
5.4 Examples of Ground Plane's Parameters
5.5 Five parameters of a frontal vehicle
5.6 An example of a D-costs array
5.7 An example of a V-costs array
5.8 Determination procedure of left and right marks
5.9 Illustration for the left and the right mark determination
5.10 Vehicle-disparity image formulation
5.11 Fitting to the vertical edge on the vehicle's rear view
6.1 Qualitative illustration of the proposed coarse-to-fine method 1
6.2 Qualitative illustration of the proposed coarse-to-fine method 2
6.3 Quantitative comparison for the computation time
6.4 Quantitative comparison for the accuracy of disparity images
6.5 Qualitative comparisons for the artificial and the real sequence
6.6 Quantitative comparisons for the artificial sequence
6.7 Quantitative comparisons for the artificial and the real sequence
6.8 Classification of road pixels
6.9 Artificial Sequence, examples of ground planes' parameters
6.10 Real Sequence, examples of ground planes' parameters
6.11 Artificial Sequence, ground planes' parameters
6.12 Artificial Sequence, Comparison of ground planes' parameters
6.13 Real Sequence, ground planes' parameters
6.14 Illustration of ground plane estimation
6.15 Examples of vehicle detection
List of Tables

1.1 List of major research groups in the field
1.2 List of major conferences and transactions in the field
3.1 Outline of cost computation
4.1 Cost aggregation guided by edge maps: pseudo code
4.2 Cost aggregation guided by edge maps: explanation
6.1 Coarse-to-Fine Method: Parameters used to compute disparity maps
6.2 Robust Approaches: Parameters used to compute disparity images
6.3 Parameters used for vehicle detection
To my parents, my wife, and Ms. Quynh-Anh.
They always give me endless love and strong motivation.
List of Symbols

ΔU, ΔV, Δd : Sample steps in the U-, V-, and D-directions in disparity spaces.

C_aggr(u, v, d) : A 3D cost volume that contains aggregated pixel-matching costs. C_aggr(u, v, d) can be either C_aggr^L(u, v, d) or C_aggr^R(u, v, d).

C_aggr^L(u, v, d) : A 3D cost volume that contains aggregated pixel-matching costs. In the computation, the reference image is the left image of the input pair of stereo images.

C_aggr^R(u, v, d) : A 3D cost volume that contains aggregated pixel-matching costs. In the computation, the reference image is the right image of the input pair of stereo images.

C_pix(u, v, d) : A 3D cost volume that contains pixel-matching costs. C_pix(u, v, d) can be either C_pix^L(u, v, d) or C_pix^R(u, v, d).

C_pix^L(u, v, d) : A 3D cost volume that contains pixel-matching costs. In the computation, the reference image is the left image of the input pair of stereo images.

C_pix^R(u, v, d) : A 3D cost volume that contains pixel-matching costs. In the computation, the reference image is the right image of the input pair of stereo images.

candi_mark_left and candi_mark_right : Candidate left mark and candidate right mark of detected vehicles.

candi_mark_lower and candi_mark_upper : Candidate lower mark and candidate upper mark of detected vehicles.

cost(v, d) : V-D cost image.

d_inf : The disparity of objects at infinity.

d_vhd : The disparity of detected vehicles.

densi(u) : Density histogram.

diff(u) and diff_acc(u) : Differentials and accumulated differentials, respectively.

I_vdi(u, v) : Vehicle-disparity image.

I_vd(v, d) : V-disparity image.

map(u, v) : Disparity image. map(u, v) represents either map_L(u, v) or map_R(u, v).

map_L(u, v) : Left disparity image.

map_R(u, v) : Right disparity image.

mark_lower, mark_upper, mark_left, and mark_right : Lower mark, upper mark, left mark, and right mark of detected vehicles.

path_DP : The path in the V-D cost image that is discovered by the proposed dynamic programming based method.

path_gpDP : The ground plane's profile that is extracted from path_DP.

path_gpLF : The ground plane's profile that is obtained by fitting path_gpDP to a straight line.

W_U × W_V : The size of aggregation windows.

D : The width of the disparity search range.

DP : Dynamic Programming.

FPGA : Field Programmable Gate Array.

GPU : Graphics Processing Unit.

IPM : Inverse Perspective Mapping.

MAE : Mean of Absolute Errors.

SSE : Streaming SIMD (Single Instruction, Multiple Data) Extensions.

U and V : The width and the height of input images.

WTA : Winner-Takes-All, a method for determining the disparity of pixels.
1 Introduction

1.1 Motivation

The vehicles and the obstacles in front of the host vehicles are referred to as the frontal vehicles and the frontal obstacles, respectively, in this thesis. The terms "preceding vehicles/obstacles" and "frontal vehicles/obstacles" are used interchangeably to avoid tedious repetition. The techniques and the technologies developed in Vehicle Detection enable the host vehicles to sense the surrounding environment, so they are important and useful for many applications, for example, intelligent vehicles, autonomous vehicles, robot navigation systems, and traffic management systems.
Vehicle Detection and its related research in Intelligent Transportation Systems are currently active research fields, as demonstrated by the large number of participants and research groups around the world. Some major groups are listed in Table 1.1. Research groups have proposed different ways to sense the environment in front of host vehicles. Some groups used active sensors, e.g., radar and LIDAR, while other groups used passive sensors, i.e., normal cameras, in combination with computer vision techniques. Notably, the research group in the first row of Table 1.1 demonstrated that frontal obstacles can be detected by using only vision information and that the distance from the detected obstacles to the host vehicles can also be estimated.
Table 1.1: List of major research groups in the field

PARMA: This research group is from Parma University, Italy. The researchers in this group are the founders of GOLD (Generic Obstacle and Lane Detection System). This group was also responsible for building the stereo-vision system of TerraMax, an autonomous vehicle that participated in the DARPA Grand Challenge.

CMU: This is one of the largest and earliest research groups in the field, based at Carnegie Mellon University, USA. It is well known for its autonomous vehicles named NavLab. The researchers in this group are the founders of SCARF (Supervised Classification Applied to Road Following), UNSCARF (UNSupervised Classification Applied to Road Following), YARF (Yet Another Road Follower), and so on. In the DARPA Grand Challenge, their team (the Red Team) was one of the leaders in the competition.

LIVIC: This group is from INRETS, France. The researchers in this group are the founders of the term "V-Disparity Image".
Vehicle Detection is strongly encouraged by both academia and industry. In academia, the research is supported by a system of dedicated conferences and transactions, as shown in Table 1.2, and by a crowded competition, e.g., the DARPA Grand Challenge¹. In industry, the hardware used for vehicle detection and related research in Intelligent Transportation Systems has improved substantially during the last few years. For example, recent laser scanners can perform the scanning task with high spatial resolution at high speed, and recent stereo cameras can produce high-resolution images at high frame rates. In addition, in order to help process the massive amount of visual information in images, special instruction sets² and special hardware³ have also been released. In recent years, many prototypes of intelligent or autonomous vehicles have been developed and demonstrated in the DARPA Grand Challenge, reflecting a great deal of effort by researchers in Intelligent Transportation Systems. However, the ability to locate vehicles/obstacles and to estimate their distances to host vehicles is still a challenging problem.

Because of the usefulness, the activeness, and the challenge of vehicle detection, it is selected as the subject of this thesis. Based on the promising results of experiments on using visual information for detecting vehicles and obstacles, stereo images and computer vision techniques are used to obtain 3D information about the environment in front of the host vehicles. This information is then used to detect vehicles and obstacles in this thesis. In particular, the whole rear views of frontal vehicles are detected and separated in such a way that they can be used in further applications like vehicle classification.
Table 1.2: List of major conferences and transactions in the field

ITSC: IEEE Conference on Intelligent Transportation Systems
IVS: IEEE Intelligent Vehicles Symposium
ITST: IEEE Transactions on Intelligent Transportation Systems
ICIP: IEEE Conference on Image Processing
IPT: IEEE Transactions on Image Processing
CVPR: IEEE Conference on Computer Vision and Pattern Recognition
PAMI: IEEE Transactions on Pattern Analysis and Machine Intelligence
¹ Defense Advanced Research Projects Agency: http://www.darpa.mil/grandchallenge/
² Intel: SSE; AMD: 3DNow
³ GPU: Graphics Processing Units
1.2 Problem

This section presents an overview of existing approaches in vehicle detection. The advantages and disadvantages of each approach are also identified. This section creates a basis for selecting the problems that are solved in this thesis; the selected problems are outlined in the next section.
As shown in Figure 1.1, existing approaches can be classified by the sensor that is used to detect vehicles, so we have approaches corresponding to using an active sensor, a passive sensor, or a fusion of sensors for vehicle detection. The term "active sensor" means that, in order to estimate the distance of frontal objects, the sensor actively sends a probing signal and measures the elapsed time of the reflected signal caused by the frontal objects [10]. The elapsed time is then used to infer the distance of the frontal objects. Meanwhile, the term "passive sensor" indicates optical sensors, i.e., normal cameras, which produce images by focusing light on photoactive regions. The captured images are then used to detect the frontal objects. The way vehicles are detected in each of the approaches mentioned above is introduced in the following subsections. There are also short discussions of the advantages and the disadvantages at the end of each approach.
1.2.1 Active Sensor

… is shorter than the LIDAR sensor signal's. Radar uses radio waves or microwaves; meanwhile, LIDAR uses signals having higher frequencies, for example, ultraviolet, visible, or near-infrared light. Some typical research works using active sensors are given in the following subsections.
1.2.1.1 Radar sensor
Yiguang et al. [1] mounted a pulse radar sensor on a bar hung 6 meters over a lane so that the sensor was in the middle of the lane. The area right below the radar was swept every 0.01 seconds to measure the height of vehicles. The output of the radar, which was the correlation between the probing signal and the reflected signals within 1.1 to 6.9 meters, was converted into a 2D image, called a height image, for every 150 measurements. If there are three reflecting points in the height image at a certain height, then there is a vehicle at that height.
The distance measured by a radar sensor alone is not enough for reliably detecting vehicles, so many existing approaches combined the distance with visual information captured by a vision sensor to detect vehicles, for example [2–4]. The fusion of radar sensor and camera is introduced in Section 1.2.3.1.
1.2.1.2 LIDAR sensor
LIDAR sensors are widely used for measuring 3D information in the scene surrounding the host vehicle. For example, all of the first three leading competitors in the DARPA Grand Challenge 2005 [5–7] used LIDAR sensors for obtaining 3D information.
In [5], range information was obtained by using only LIDAR sensors. The sensors were angled downward to scan the terrain in front of the host vehicle as it moved. The five sensors were mounted at five different angles. Each sensor generates a vector of 181 range measurements spaced 0.5 degrees apart. Projecting the measurements into the global coordinate frame according to the estimated pose of the vehicle results in a 3-D point cloud for each laser. Obstacle detection on the measured points can then be formulated as a classification problem.
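To make the projection step concrete, the following is a minimal sketch of how one tilted 181-beam scan could be placed into the global frame; the function interface and the sweep geometry are illustrative assumptions, not code from [5].

```python
import numpy as np

def scan_to_points(ranges, pose_R, pose_t, tilt_deg):
    """Project one 181-beam LIDAR scan into the global frame (illustrative).

    ranges:         181 range measurements spaced 0.5 degrees apart.
    pose_R, pose_t: estimated vehicle rotation (3x3) and translation (3,).
    tilt_deg:       downward mounting angle of this particular sensor.
    Accumulating such scans while the vehicle moves yields the 3-D point
    cloud on which obstacle detection is formulated as classification.
    """
    bearings = np.deg2rad(np.arange(181) * 0.5 - 45.0)  # 90-degree sweep
    tilt = np.deg2rad(tilt_deg)
    x = ranges * np.cos(bearings) * np.cos(tilt)        # forward
    y = ranges * np.sin(bearings)                       # lateral
    z = -ranges * np.cos(bearings) * np.sin(tilt)       # below the sensor
    points = np.stack([x, y, z], axis=1)                # sensor-frame points
    return points @ pose_R.T + pose_t                   # into the global frame
```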
In [6], the Red Team used 7 LIDAR sensors for sensing the scene: one sensor for obtaining long range, four sensors for detecting obstacles at short range, and two sensors for detecting obstacles and for mapping the terrain topology. In combination with one radar sensor, 3D information could be obtained and used for detecting obstacles.
In [7], 3D information was obtained by combining LIDAR sensors and stereo vision. For the LIDAR sensors, three SICK LMS-291 LIDARs and an IBEO ALASCA LIDAR were used. The three SICK LMS-291 LIDARs were used for positive and negative obstacle detection. The two forward-facing SICK LIDARs were mounted on the outermost edges of the front rollbar. They were pointed 10 degrees down and 25 degrees outward from the truck so that there was good coverage on extreme turns. The rear-facing LIDAR was mounted near the cargo bed height in the middle of the truck and was pointed down for negative obstacle detection. The IBEO ALASCA LIDAR was a 4-plane scanner that was used for positive obstacle detection. The IBEO LIDAR was mounted level in the front bumper and had two planes that scan toward the ground and two planes that scan toward the sky. With a range of 80 meters and a resolution of 0.25 degrees, the IBEO can detect obstacles accurately at long and close range. The 240-degree scan area allows the IBEO to see obstacles around upcoming turns.
1.2.1.3 Summary

LIDAR provides excellent range information to different objects. However, it is difficult to recognize these objects as vehicles from range information alone. Therefore, many research works fused LIDAR sensors with passive sensors [8, 9].
The greatest advantage of LIDAR sensors is that they can produce highly accurate 3D information in clear weather. However, LIDAR sensors are very sensitive to bad weather, for example, fog, rain, or snow. The spatial resolution and the price of a LIDAR sensor depend on its type. Nowadays, there are some 3D laser scanners that produce very high resolution with very high accuracy; however, such sensors are also very expensive, from 70 to 150 thousand USD as of 2008 [11]. In order to overcome the cost, some intelligent vehicle prototypes used only laser line scanners [5, 6] or multi-plane laser scanners (four planes supported by the IBEO scanner) [7].
Both radar and LIDAR sensors provide only the distance of frontal objects; they cannot provide rich information, such as color, texture, and edges, as a vision sensor can. Using only distance cannot produce a high-accuracy classification system in the further steps of Intelligent Transportation Systems [9].
1.2.2 Passive Sensor
A passive sensor is used as a tool to capture images of the scene surrounding the host vehicle, i.e., the scene in front of, behind, on the left side of, or on the right side of the host vehicle. The detection of vehicles and obstacles is based on the obtained images, using techniques from Image Processing, Computer Vision, and Pattern Recognition.
Generally, the detection of vehicles contains one step to generate vehicle candidates and another step to verify the generated candidates. The purpose of the first step is to generate several regions of interest that are likely to be images of vehicles. An important criterion for the first step is that the vehicles should be among the generated candidates, to avoid missed detections. For frontal vehicle detection, it is also a challenge to align the detected regions of interest with the horizontal and the vertical boundary edges of the rear view of the vehicles. Such alignment is important because it can improve the performance of the verification step.
The existing approaches for generating vehicle candidates, which are explained in the following subsections, depend on the number of cameras and the information that is available for the detection.

Based on the candidates generated in the first step, the verification step decides whether each candidate is a vehicle or not. This task can be done by techniques from Pattern Recognition, for example, K-NN, Bayesian classifiers, and Support Vector Machines.
1.2.2.1 Single Camera
1. Exhaustive Search: Based on the fact that vehicles appear as rectangular regions at some locations in the input image, this approach moves several rectangular windows of different sizes from left to right and from top to bottom of the input image. Each region produced by a move is considered as a candidate and is verified in the verification step. Several windows of different sizes are used to enable the system to detect the same vehicle at different distances and to detect different vehicles of different sizes. Some typical research works that performed the detection by this approach are [13], [14], and [15]. This approach is simple to understand. However, in practice, it needs very strong computing resources both to generate a large number of candidates and to verify them.
2. Selective Search: In order to avoid the computational problem of the exhaustive search, this approach uses specific information that is supposed to be known in advance to limit the regions that are searched for vehicles. There are several kinds of such specific information; they are introduced as follows.
2.1 Symmetry:

Generally, the rear and the front views of vehicles are usually symmetric around a vertical middle axis. This characteristic has been utilized by many existing research works to detect vehicles in the input image. The works that use the symmetry property differ in the way the property is used and in the way it is measured; the measurement is referred to as the symmetry degree. More specifically, the symmetry property can be used to generate vehicle candidates from the input image [16, 17, 21, 25, 26, 39, 45] or to verify candidates [18–20, 22] that are generated in other ways. There are several ways to measure the symmetry degree, as follows.

Michael et al. [16, 17], among the earliest researchers to use the symmetry property, considered the w pixels centered around a position x_s in a certain scan-line of the input image as a function f_{x_s,w}(x) of intensities, i.e., gray levels. This function was decomposed into an even and an odd function. After that, the symmetry degree was measured in such a way that it is proportional to the ratio of the energy of the even function to that of the odd function. The symmetry degree was measured for every pixel in the input image, for every possible symmetry center x_s, and for every possible symmetry interval w. The symmetry degrees of a number of scan-lines were also accumulated to achieve a reliable measurement. Large symmetry degrees were selected by thresholding; their symmetry axes and intervals, i.e., the pairs (x_s, w), define the left and the right borders of the detected vehicles. The contours of the detected vehicles were extracted by using the Symmetry Enhancing Edge Detector [17].
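To make the even/odd decomposition concrete, the following minimal sketch scores a single candidate center x_s on one scan-line. It is an illustrative reconstruction of the idea in [16, 17], not the authors' implementation; the DC-removal detail is an assumption.

```python
import numpy as np

def symmetry_degree(scanline, xs, w):
    """Score the symmetry of the window of half-width w centered at xs.

    The windowed intensity function f(x) is split into an even and an odd
    part around xs; a symmetric pattern concentrates its energy in the
    even part, so the score is E_even / (E_even + E_odd), in [0, 1].
    """
    f = scanline[xs - w : xs + w + 1].astype(np.float64)
    f = f - f.mean()                    # remove DC so flat regions do not score high
    even = (f + f[::-1]) / 2.0          # even component around the center
    odd = (f - f[::-1]) / 2.0           # odd component around the center
    e_even, e_odd = np.sum(even ** 2), np.sum(odd ** 2)
    return e_even / (e_even + e_odd + 1e-12)
```

Accumulating such scores over several scan-lines, centers x_s, and interval widths w, as described above, yields the symmetry measurements from which axes and borders are selected by thresholding.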
In [45], Sach et al. used the symmetry property for verifying vehicle candidates. They measured the symmetry degree of a candidate by the variance of the symmetry centers determined for the horizontal scan-lines in the candidate: the smaller the variance of the symmetry centers, the larger the symmetry degree of the candidate.
In [21], Broggi et al. used the symmetry property for generating candidates. They computed the symmetry degree on edge maps from s, the number of symmetric points, and n, the total number of white pixels. Similar to [16, 17], the symmetry degree was measured for every pixel and for every possible pair (x_s, w). The left and the right borders of the detected vehicles were determined from the pairs (x_s, w) that have large symmetry degrees. The shadow property was used to detect the bottom border of the detected vehicles. The top border was determined by the aspect ratio of the width to the height of the detected vehicles.
Tie et al. [22] divided the candidate region into a left part and a right part separated by a vertical middle axis. The symmetry degree was measured in such a way that it is proportional to the ratio of the average of the left and the right parts to the difference between the left and the right parts.
The symmetry property has been widely used in Vehicle Detection. There are several existing ways of measuring the symmetry degree, and all of them can perform the measurement on a single candidate quickly. However, in order to generate candidates, it is necessary to perform the measurement for every pixel, for every possible symmetry axis, and for every possible vehicle width. In this way, the detection becomes a time-consuming task. Moreover, the symmetry approach also has problems when there are low-texture regions in the input image and when background objects also possess the symmetry property.
2.2 Color:
In Intelligent Transportation Systems, using color for detecting lanes and roads is more popular than for detecting vehicles. This trend is due to the fact that road colors are distributed in certain regions of a color space, while vehicle colors are dispersed randomly in the color space. Therefore, the most important characteristic of a color approach is the way it models the distribution of the colors of interest, which are road colors in Road Detection and vehicle colors in Vehicle Detection.
In [50], Crisman et al. proposed two methods for determining the distribution of road colors. The first method is SCARF (Supervised Classification Applied to Road Following), and the other is UNSCARF (UNSupervised Classification Applied to Road Following). In order to obtain road colors, they used two color cameras separated by a short distance. They treated the two images captured from the two cameras as if they were taken from a single camera. The objective of using two different cameras for capturing the same scene is to enlarge the variation of colors in pixels, which now have 6 dimensions. In SCARF, sample road colors, now in a 6-dimensional color space, were assumed to follow a Gaussian model, so they were used to estimate the unknown parameters of the model. After that, incoming pixels were classified into the road or the non-road class. In UNSCARF, the positions of pixels, i.e., the x-index and the y-index, were also combined with the color information to form vectors of 5 components (if only a single image was used) or 8 components (if both images were used). Roads were classified by a clustering technique.
in-In [51], a set of sample colors for road was collected After that, Guo
et al used spheres to model the regions in the color space that containthe sample road colors Color space Lu∗v∗ was used in order to achievethe uniformity of perception Roads were detected by checking eachpixel in input images to decide whether it was inside or outside of theapproximated region
Luo et al. [48] proposed a method to normalize the sample colors of cars and the background. The colors of cars and the colors of the background were assumed to follow a Gaussian model. Thereby, all pixels in the input image could be classified as foreground (cars) or background according to a Bayesian classifier. The pixels classified as foreground are good suggestions for hypothesizing the locations of cars in the image.
Obviously, in order to estimate the likelihood of vehicle colors, it is hard to assume that a certain color is more probable than the others in the color space. Therefore, in [31], Wang et al. assumed that vehicle colors have a uniform distribution, i.e., all of the colors inside the color space have the same probability of being vehicle colors. Based on that assumption, they used a Hidden Markov Model to learn the colors of the foreground, shadows, and the background. After that, vehicles could be detected.
In another approach, Sach et al. [45] focused on the red colors of the tail lights of cars. They obtained sample red colors and fit them to a Gaussian mixture model. After that, the red areas of the tail lights were detected and grouped together to generate car back-view candidates.

2.3 Shadow:
Under sunny conditions, a vehicle creates a shadow on the road. There are two signs associated with the shadow. The first is that the shadow is darker than the road colors. The second is that the shadow is next to the vehicle. These two signs are used to detect vehicles in many existing research works.
In [27], one of the simplest and earliest methods for detecting shadows and vehicles, shadows were detected from gray images. The idea is that road intensities were assumed to have a Gaussian distribution. Hence, the vehicle detection system contains the following steps. First, the road intensities were collected and fitted to a Gaussian model. Next, the threshold expected to separate shadow pixels from road pixels was determined by T_sh = (μ − 3σ), where μ and σ are the mean and the standard deviation of the sample road intensities, respectively. After that, the shadows were detected by thresholding the input image with T_sh. Finally, the shadows were assumed to be located underneath the detected vehicles, so the bounding boxes of the detected vehicles were generated so that they lie above the detected shadows and their aspect ratios satisfy a predefined constraint. This way of detecting shadows is widely used in many other research works [18, 20, 32, 36].
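A minimal sketch of this thresholding step is given below; the Gaussian fit and the T_sh = μ − 3σ rule follow the description of [27], while the function interface and the source of the road samples are assumptions.

```python
import numpy as np

def shadow_mask(gray, road_samples):
    """Mark pixels darker than the road model as candidate shadow pixels.

    gray:         2D array of gray levels.
    road_samples: 1D array of intensities collected from a known road region.
    """
    mu, sigma = road_samples.mean(), road_samples.std()
    t_sh = mu - 3.0 * sigma      # threshold separating shadow from road
    return gray < t_sh           # True for underneath-vehicle shadow candidates
```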
In [30], a gray image sequence was used as the input. The intensities of the background (road) and the shadows were assumed to have Gaussian distributions, while the distribution of foreground objects was assumed to be uniform. Every pixel belongs to one of three classes: the background, the foreground, and the shadow. Jien et al. proposed to use an HMM with 3 states, corresponding to the three classes, for modeling the transition of the class of pixels from frame to frame. After training, the trained HMMs were used to classify pixels in incoming frames into one of the above three classes. A modification of [30] that can work with RGB color images is given in [31].

Similar to [27], the thresholding technique was also used in [28, 29]. However, the threshold values were defined in the RGB [28] and YCbCr [29] color spaces rather than on gray intensities.
2.4 Vertical/Horizontal Edges:
Generally, the rear and the front views of vehicles that have more than 2 wheels contain many vertical and horizontal edges. The edges are caused by components of the vehicles, for example, the license plate, bumper, and spoiler, and by the difference between the color and the depth of the vehicles and the objects surrounding them. Thus, edge information is useful for detecting vehicles, and in fact it is used by many existing research works. The existing works differ in the way this information is utilized. There are several ways, as follows.
In [23, 24], vertical and horizontal edges were extracted and thresholded, and edge-based symmetry degrees were computed on those domains. The final symmetry degree, obtained by combining the edge-based and the intensity-based symmetry degrees, was reliable and able to solve the uniformity problem that yields high symmetry degrees for homogeneous regions. Based on the reliable symmetry degree, vehicles could be detected from the input image.
Goerick et al. [37] proposed a method called Local Orientation Coding (LOC) to extract edge information. An image obtained by this method consists of strings of binary code representing the directional gray-level variation in a pixel's neighborhood. These codes carry essentially edge information. Handmann et al. [38] also used LOC, together with shadow information, for vehicle detection.
Mathews et al. [36] generated candidates by the following procedure. First, a gradient image used to detect vertical edges was obtained from the input gray image. Next, the gradient image was projected onto a horizontal line to obtain a histogram. Local maxima that indicate the locations of vertical edges were determined, so the left and the right borders of the region of interest could be identified. After that, the bottom border was determined by detecting shadows. Finally, the top border was inferred so that the width and the height of the region of interest are the same.
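The following sketch illustrates this projection idea; the gradient and peak thresholds are illustrative values, not parameters from [36].

```python
import numpy as np

def candidate_borders(gray, grad_thresh=30.0, peak_frac=0.5):
    """Suggest left/right vehicle border columns by projecting vertical edges.

    1. Vertical edges from the horizontal intensity gradient.
    2. Column-wise projection of the edge map onto a horizontal line.
    3. Local maxima of the projection mark candidate border columns.
    """
    gx = np.abs(np.diff(gray.astype(np.float64), axis=1))  # horizontal gradient
    edges = gx > grad_thresh                               # vertical-edge map
    hist = edges.sum(axis=0)                               # projection histogram
    level = peak_frac * hist.max()
    return [u for u in range(1, len(hist) - 1)
            if hist[u] >= level
            and hist[u] >= hist[u - 1] and hist[u] >= hist[u + 1]]
```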
The left and right borders of vehicles that are far from the camera are hard to detect with [36] because their lengths are so small. Therefore, Gwang et al. [39] mapped the input image to a top view by using Inverse Perspective Mapping (IPM). After the mapping, the left and the right borders are longer, so they can be detected from the edge map of the remapped image. The edge map of the original image was projected onto a vertical line to detect the bottom border of vehicles. The top border was inferred so that the regions of interest have the same width and height. A method similar to [36, 39], i.e., borders detected by projecting edge maps, was used in [41] to generate candidates for vehicles.
Parodi and Piccioli [40] proposed to extract the general structure of a traffic scene by first segmenting an image into four regions: pavement, sky, and two lateral regions, using edge grouping. Groups of horizontal edges on the detected pavement were then considered for hypothesizing the presence of vehicles.
2.5 Corners:
A corner can be defined as the intersection of two edges, or as a point where there are two dominant and different edge orientations in a local neighbourhood of the point. Generally, the rear and the front views of vehicles have several corners that are caused by the boundary of the vehicles (e.g., the upper-left, lower-left, lower-right, and upper-right corners) or by many sub-regions inside the vehicle's …

On the other hand, the Harris detector [47] was used in [48, 49] to detect corners in the input image. The set of the detected corners was then used to verify vehicle candidates. For example, in [48], if a candidate had fewer than 0.5λ_C corners, where λ_C was the average number of corners in a vehicle class, then it was rejected. Meanwhile, in [49], the Hausdorff distance between set A and every set B in the vehicle database was computed, where A was the set of the detected corners in the candidate, and B was representative of the set of corners detected for a vehicle class in the database. The candidate was then classified into the class with the minimum distance.
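As an illustration of the corner-set comparison in [49], the sketch below computes a symmetric Hausdorff distance and picks the nearest class; it is a generic formulation, not the cited work's exact matching procedure.

```python
import numpy as np

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two corner sets (N x 2 arrays)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)  # pairwise distances
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def nearest_vehicle_class(candidate_corners, class_corner_sets):
    """Index of the vehicle class whose corner set is closest to the candidate's."""
    return int(np.argmin([hausdorff(candidate_corners, b)
                          for b in class_corner_sets]))
```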
2.6 Texture:

… by using the shadow information. Talinke et al. [33] used four texture measures (Energy, Contrast, Entropy, and Correlation) to generate candidates in combination with the shadow information. Meanwhile, Hartwig et al. [35] used 6 measures for generating vehicle candidates.
2.7 Vehicle Lights:
One of the most recognizable features of 4-wheel vehicles is that the rear and the front of the vehicles have two lights: two head-lights in the front view and two tail-lights in the rear view. Generally, the two vehicle lights have the following characteristics. First, the shape and the size of the two vehicle lights are similar. Second, the distance between the two vehicle lights satisfies a certain constraint that depends on the vehicle type. Third, the tail-lights usually contain red colors. Fourth, the vehicle lights are very bright at night-time. Based on the above recognizable characteristics, vehicles can be detected.
In [42], Cucchiara et al. proposed a method for detecting head-lights at night-time as follows. First, the scene outside the road was removed by masking with some predefined masks; the analysis of the resulting image is simpler because street-lamps were also removed. Next, a binary image was computed by thresholding. Finally, pairs of head-lights were detected by utilizing the features of head-lights, for example, the shape, the size, and the minimum distance between the two lights in a pair. The minimum rectangular box containing each pair of detected head-lights was generated as a vehicle candidate.
In [43–45], tail-lights were detected by analyzing the spatial relationship between the two tail-lights, e.g., the shape, the size, and the distance, and by utilizing the red colors in the tail-lights as well.
3. Motion-Based Search:
3.1 Subtraction-Based Features:
Most research works that use vision-based sensors alone follow this method of candidate generation. Candidates are generated by a subtraction between input images and a background image or between two consecutive images in an image sequence [52–60]. The former is used only when the background can be modeled or collected reliably, while the latter is usually used for detecting moving objects in image sequences. A typical background subtraction has been studied in [56, 57]; because stationary vision-based sensors were used in a controllable environment, the background image, called I_bg(x, y), could be modeled reliably before program execution. To detect vehicles in an image I(x, y), a binary image I_b(x, y) was formed as in the following equation:

I_b(x, y) = 1 if |I(x, y) − I_bg(x, y)| ≥ θ, and 0 otherwise, (1.1)

where θ is a threshold value that transforms the difference between the two images into the binary image. White pixels in I_b(x, y) that were inside sufficiently large regions were used to generate vehicle candidates.
On the other hand, studies in [58–60] could adapt the background to changes in the environment by an algorithm called self-adaptive background subtraction. The principle of the method in those studies is to modify the background image (CB) by using an instantaneous background (IB) and applying an appropriate weight α as follows:

CB_{k+1} = (1 − α) CB_k + α IB_k , (1.2)

where k is the frame index in the image sequence. The instantaneous background is defined as IB_k = M_k • CB_k + (∼M_k) • I_k, where I_k is the current frame and M_k is the binary vehicle mask, similar to I_b above.
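Equations (1.1) and (1.2) combine into the following one-step update; θ and α are illustrative values, not parameters reported in [58–60].

```python
import numpy as np

def update_background(cb, frame, theta=25.0, alpha=0.05):
    """One step of self-adaptive background subtraction.

    cb:    current background image CB_k (float array).
    frame: current frame I_k.
    Returns (vehicle mask M_k, updated background CB_{k+1}).
    """
    mask = np.abs(frame - cb) >= theta          # Eq. (1.1): binary vehicle mask
    ib = np.where(mask, cb, frame)              # IB_k: keep CB_k under vehicles,
                                                # take I_k on visible background
    cb_next = (1.0 - alpha) * cb + alpha * ib   # Eq. (1.2): blend toward IB_k
    return mask, cb_next
```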
3.2 Optical Flow:
The optical flow approach utilizes the relationship between consecutive frames in the input image sequence to detect vehicles. Let us represent the image intensity at location (x, y) at time t by E(x, y, t). Pixels in the images appear to be moving due to the relative motion between the sensor and the scene. The vector field o(x, y) of this motion is referred to as optical flow.

Optical flow can provide strong information for generating vehicle candidates. Vehicles approaching from the opposite direction produce a diverging flow, which can be quantitatively distinguished from the flow caused by the car's ego-motion [61]. On the other hand, departing or overtaking vehicles produce a converging flow. To take advantage of these observations in obstacle detection, the image is first subdivided into small sub-images, and an average speed is estimated in every sub-image. Sub-images with a large speed difference from the global speed estimate are labeled as possible obstacles.
can-The performance of several methods for recovering optical flow o(x, y)from the intensity E(x, y, t) have been compared in [62] using someselected image sequences from (mostly fixed) cameras Most of thesemethods compute temporal and spatial derivatives of the intensity pro-files and, therefore, are referred to as differential techniques Getting
Trang 28Chapter 1 Introduction 17
a reliable dense optical flow estimate under a moving-camera scenario
is not an easy task Giachetti et al [61] developed some of the bestfirst-order and second-order differential methods in the literature andapplied them to a typical image sequence taken from a moving vehiclealong a flat and straight road In particular, they managed to remapthe corresponding points between two consecutive frames, by minimiz-ing the following distance measure:
a less dense grid to reduce computational cost
Kruger et al. [63] estimated optical flow from spatiotemporal derivatives of the gray-value images using a local approach. They further clustered the estimated optical flow to eliminate outliers. Assuming a calibrated camera and known ego-motion, they detected both moving and stationary objects. Generating a displacement vector for each pixel (i.e., dense optical flow) is time consuming and also impractical for a real-time system. In contrast to dense optical flow, sparse optical flow is less time consuming because it utilizes image features, such as corners [64], local minima and maxima [65], or color blobs [66]. Although they can only produce a sparse flow, feature-based methods can provide sufficient information for generating vehicle candidates. Moreover, in contrast to pixel-based optical flow estimation methods, where pixels are processed independently, feature-based methods utilize high-level information. Consequently, they are less sensitive to noise.
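The sub-image labeling idea can be sketched as follows; OpenCV's Farneback estimator stands in for the differential methods discussed above, and the cell size and deviation factor are assumptions.

```python
import cv2
import numpy as np

def flow_obstacle_cells(prev_gray, cur_gray, cell=32, k=2.0):
    """Label sub-images whose average flow speed deviates from the global one."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    speed = np.linalg.norm(flow, axis=2)       # per-pixel flow magnitude
    global_speed, spread = speed.mean(), speed.std()
    h, w = speed.shape
    cells = []
    for y in range(0, h - cell + 1, cell):
        for x in range(0, w - cell + 1, cell):
            if abs(speed[y:y + cell, x:x + cell].mean() - global_speed) > k * spread:
                cells.append((x, y))           # possible obstacle sub-image
    return cells
```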
1.2.2.2 Stereo Cameras
1. Inverse Perspective Mapping:

Inverse perspective mapping (IPM) was first introduced in [67]. The idea of IPM is to reverse the perspective mapping that has been performed by a camera. We consider the camera as a perspective mapping. Mathematically, a point P in the real-world coordinate system that is defined with respect to the camera's coordinate system can be perspectively mapped onto a point p in the image plane of the camera by p = M × P, where M is the camera matrix, whose parameters can be estimated by calibration. Because of the perspective mapping, the road and objects are distorted in the image; for example, two parallel borders of the road in the real-world coordinate system intersect at the horizontal line of vanishing points. The inverse perspective mapping reverses that effect. Of course, P cannot be obtained simply by P = M^{-1} × p, because the perspective mapping is not mathematically invertible, i.e., M is not an invertible matrix. However, if we add some more constraints, then we can perform the inverse perspective mapping. For example, if we assume that P originally lies in a known horizontal plane, then P can be determined mathematically. Existing research works on IPM usually assumed that P lay in the road plane and that the position of the road plane with respect to the camera was known in advance. Usually, the position of the road plane was defined via the extrinsic parameters of the camera, i.e., the height and the pitch, yaw, and roll angles.
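The ground-plane constraint can be written out as a ray-plane intersection. The sketch below back-projects a pixel onto a road plane taken as z = 0 in world coordinates; the plane choice and the interface are assumptions for illustration.

```python
import numpy as np

def pixel_to_road(u, v, K, R, t):
    """Back-project pixel (u, v) onto the road plane z = 0.

    p = M P is not invertible on its own; constraining P to a known plane
    makes it recoverable. K: intrinsics (3x3); R, t: extrinsics such that
    x_cam = R X + t, all assumed known from calibration.
    """
    ray = R.T @ np.linalg.inv(K) @ np.array([u, v, 1.0])  # viewing ray (world)
    cam = -R.T @ t                                        # camera center (world)
    s = -cam[2] / ray[2]                                  # scale where ray meets z = 0
    return cam + s * ray                                  # 3D point on the road
```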
Figure 1.2: The demonstration of Inverse Perspective Mapping. Upper row: an example of the scene. Lower row: (a) left image, (b) right image, (c) left remapped image, (d) right remapped image, and (e) difference image, in which the gray area represents the region of the road not seen by both cameras (from [72]).
In fact, IPM is not a technique reserved for pairs of stereo images; it can be applied to every single image whenever the parameters of the camera are known. There exist many research works that performed IPM on single images for lane detection [68–71] and vehicle detection [39].
An extension of IPM to stereo cameras was introduced by Bertozzi et al. in [72], and they also used IPM in many research works [24, 73, 74]. The idea of the extension is shown in Figure 1.2. Assuming that the parameters (both intrinsic and extrinsic) of the stereo cameras were known in advance (from a calibration task), the left and the right images could be remapped onto the road plane. The difference between the two remapped images was then computed. In the ideal case, the differences for road pixels in the difference image are zero, and every obstacle results in two triangles in the difference image, as shown in Figure 1.2 (e). The appearance of the two triangles is a recognizable feature of the obstacles, and it was used for the detection.
Based on the difference image, the obstacle detection problem becomes the detection of the two triangles. Let C_L(x, y) and C_R(x, y) denote the two points that are the images of the centers of the left and the right camera on the road plane, respectively. The orientation of the two triangles is shown in Figure 1.2 (e): the left edges of the two triangles pass through C_R(x, y), while the right edges pass through C_L(x, y). In [73], Bertozzi et al. assumed that the distance between C_L(x, y) and C_R(x, y) is small, so they unified C_L(x, y) and C_R(x, y) into the middle point C_LR(x, y) = (C_L(x, y) + C_R(x, y)) / 2.
A polar histogram was built by rotating a scan-line around C_LR(x, y). Local maxima were detected from the polar histogram and then associated to generate vehicle candidates.
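A minimal sketch of the polar-histogram step is shown below; the binary difference mask and the bin count are assumptions.

```python
import numpy as np

def polar_histogram(diff_mask, c_lr, n_bins=360):
    """Histogram of difference-image pixels over the angle around C_LR.

    diff_mask: boolean image, True where the left/right remaps disagree.
    c_lr:      (x, y) of the unified projection of the camera centers.
    The edges of an obstacle's triangles produce peaks at their angles.
    """
    ys, xs = np.nonzero(diff_mask)
    angles = np.arctan2(ys - c_lr[1], xs - c_lr[0])   # angle of each pixel
    hist, _ = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi))
    return hist                                       # local maxima ~ triangle edges
```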
In practice, the detection of the two triangles is a difficult task because of the following problems:

• Because frontal objects contain textures and non-homogeneous regions, and because their shapes are irregular, the two triangles are very noisy.

• If there is more than one frontal object, then there are more than two triangles. In this case, it is hard to associate the local maxima in the polar histogram.

• When a frontal object is only partially visible, there exists only one local maximum. This is another difficult situation.
Ki et al. [75] relied on the fact that a vertical line in the input image is mapped onto a straight line in the remapped image, and that this straight line passes through the point that is the projection of the center of the camera onto the road plane. Moreover, the straight line in the remapped image is longer than the original vertical line. Hence, obstacles were detected by the following procedure. First, IPM was applied to both images of the input stereo pair to obtain a pair of remapped images. After that, the remapped images were used to obtain edge maps by edge detection. Polar histograms were computed for both edge maps using the technique in [73]. Finally, obstacles were detected by associating local maxima in the two computed polar histograms. The relationship between local maxima in consecutive frames of the input image sequence was also utilized to detect obstacles more accurately.
2. U- and V-Disparity Image:

U- and V-disparity images were first introduced by Labayrade et al. in [76, 77], and they are analyzed in detail in [78, 79]. The detection of vehicles and obstacles by this approach usually consists of the following tasks:

(a) Computing disparity images: Disparity images are computed from the input stereo pair. In the computation, one of the two images in the input stereo pair is used as a reference image. The disparity image computed for the reference image has the same size as the reference image. Each pixel in the disparity image encodes the 3D information associated with that pixel. This step uses techniques from stereo matching, one of the research areas in computer vision. Stereo matching is introduced in detail in Section 2.1.

(b) Computing U- and V-disparity images: A disparity image is actually a surface in a 3D volume whose width, height, and length dimensions are U, V, and D, respectively, where U × V is the size of the images in the input stereo pair and D is the disparity search range. Given a disparity surface, the U-disparity image is obtained by projecting the disparity surface onto the U × D plane and accumulating the number of points of the disparity surface that fall onto each cell of that plane. If the projection and accumulation are done for the V × D plane, then we obtain the V-disparity image (a minimal sketch is given after this list). Further information is given in Section 2.2.

(c) Estimating the ground plane: Because the frontal objects are on the ground plane, it is reasonable to detect the ground plane first and the frontal objects later. An overview of the methods for this task is given in Section 2.2. The method proposed in this thesis for this task is presented in Section 5.2.

(d) Locating vehicles: Vehicles have the following recognizable features that are utilized for the detection. First, vehicles appear as vertical straight lines in the V-disparity image. Second, the disparity of the straight lines keeps the depth information of the vehicles. Third, the lower end of the vertical straight lines contacts the slanted line of the ground plane. Fourth, the vehicles also appear as straight lines in the U-disparity image. By combining the above features, the vehicles can be located in the image and in the world coordinate system.
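As referenced in task (b), the accumulation reduces to a pair of histograms, as in the following sketch; an integer disparity map with negative values marking invalid pixels is assumed.

```python
import numpy as np

def u_v_disparity(disp, d_max):
    """Build U- and V-disparity images from an integer disparity image.

    disp: H x W disparity map with values in [0, d_max); negatives = invalid.
    u_disp[u, d] counts the pixels of column u having disparity d, and
    v_disp[v, d] counts the pixels of row v having disparity d. A vehicle
    shows up as a line at its disparity; the ground plane appears as a
    slanted line in v_disp.
    """
    h, w = disp.shape
    u_disp = np.zeros((w, d_max), dtype=np.int32)
    v_disp = np.zeros((h, d_max), dtype=np.int32)
    for v in range(h):
        for u in range(w):
            d = disp[v, u]
            if 0 <= d < d_max:
                u_disp[u, d] += 1   # accumulate over the column
                v_disp[v, d] += 1   # accumulate over the row
    return u_disp, v_disp
```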
1.2.2.3 Multiple Cameras
According to [80, 81], although 3D information can be recovered from pairs of stereo images by stereo matching, there are several advantages to using more than two cameras, as follows:

1. Repeating texture in the input stereo pair makes the determination of corresponding pixels more difficult and creates large matching errors. Such errors can be reduced by using more than two cameras, with a careful arrangement of the cameras.

2. Stereo cameras with a short baseline produce small matching errors, but they cannot determine the distance of far objects. In order to detect far objects, stereo cameras with a long baseline are preferred. However, long-baseline stereo cameras produce large matching errors. Therefore, it is reasonable to combine a long and a short baseline together.

3. Occlusion is one of the causes of large matching errors. Adding a new camera is expected to reduce such errors, because points that are occluded for the original cameras can be observed by the new camera.
In the case of using more than two cameras, the arrangement of the locations of the cameras is important. In [81], Todd et al. used three cameras in an "L"-shaped arrangement: the first two cameras had a baseline of about 1.2 m, and the third camera was displaced about 50 cm horizontally and 30 cm vertically. For performing multi-baseline stereo, the matching cost of one pixel in the image selected as the reference was computed by accumulating the cost of matching the point in the reference image with each candidate corresponding point in the remaining images. They reported that the system could detect objects as small as 14 cm at ranges in excess of 100 m.
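The cost-accumulation idea can be sketched as follows for rectified, horizontally displaced cameras; the SSD measure, the scaling of a disparity hypothesis by the baseline ratio, and all parameters are illustrative assumptions rather than the setup of [81].

```python
import numpy as np

def multi_baseline_cost(ref, others, ratios, d_max):
    """Accumulate per-pixel matching costs over several baselines.

    ref:    reference image (H x W, float).
    others: images from the remaining cameras (rectified with ref).
    ratios: baseline of each camera divided by the unit baseline; a disparity
            hypothesis d for the unit baseline scales to d * ratio.
    Summing costs suppresses ambiguities (e.g., repeating texture) that a
    single camera pair cannot resolve.
    """
    h, w = ref.shape
    cost = np.zeros((h, w, d_max))
    for img, ratio in zip(others, ratios):
        for d in range(d_max):
            shift = int(round(d * ratio))      # disparity for this baseline
            layer = np.full((h, w), 1e6)       # out-of-view: large cost
            if shift < w:
                layer[:, shift:] = (ref[:, shift:] - img[:, :w - shift]) ** 2
            cost[:, :, d] += layer             # accumulate over camera pairs
    return cost                                # pick argmin over d per pixel
```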
In [82], Alberto et al. used three cameras mounted on the same horizontal bar. The first two cameras had a baseline of 1.5 m, and the third camera was located in between the first two, displaced about 0.5 m from the first camera. With such an arrangement, they had three baselines: 0.5 m, 1 m, and 1.5 m. In order to simplify the computation, traditional stereo matching [83] was used for each baseline. They used this vision system for their autonomous vehicle, named TerraMax, which participated in the DARPA Grand Challenge 2005 competition. During the competition, TerraMax selected the viewing baseline based on its speed.
1.2.2.4 Summary
1. Single Camera:
Without using any a priori knowledge about vehicles, for example, color, symmetry, or texture, an exhaustive search has to move rectangular windows around the input image to probe for vehicles. The idea of the exhaustive search method is quite simple, and its accuracy totally depends on the robustness of the verification module. Obviously, the exhaustive search method is time consuming, especially when several windows of different sizes are used to detect vehicles at different distances and with different sizes. Moreover, several overlapping windows, slightly different in location or size, can be detected for a single vehicle in the input image.

Symmetry has been used by many research works. However, in order to use only the symmetry information for generating vehicle candidates, it is necessary to have a fast method for computing symmetry maps, and this is one of the most challenging tasks in using the symmetry information. Moreover, even when the symmetry maps have been computed from edge maps, they are still not reliable, because of symmetric objects in the background and because of homogeneous regions. In addition, the symmetry information can only help to determine the left and the right borders of vehicles; the location of the top and the bottom borders is still another problem.
Color has been used by a limited number of research works because the colors of vehicles are unpredictable, except for the red colors of tail-lights. Moreover, the color of every object depends on many factors, for example, illumination, reflectance properties, viewing geometry, and sensor parameters.
Shadow was exploited by many studies. The advantage of the shadow information is that it is simple to compute and can be utilized to generate vehicle candidates. However, using the shadow information incurs at least two problems. The first problem is that it is difficult to detect the shadow reliably, because the color of the shadow depends on the weather, the color of the road, and the colors of other objects. The second problem is that, even when the shadow can be detected, it is still difficult to generate vehicle candidates using only the shadow information, because the relative locations of the detected shadow and the vehicle associated with it depend on the direction of the sunlight.
infor-Corner and edges are useful information for generating and verifying vehicles.However, it is hard to select threshold values for obtaining the edges and thecorners In addition, the background objects may contain corners, and edgessimilar to vehicles’ Therefore, it is more reasonable to detect vehicles byfirst locating the road and then using the corners and the edges inside theroad area only
Using texture does not incur the problem of threshold determination as using corners and edges does. However, the textures of vehicles and of other background objects may be similar, and it is difficult to separate the vehicles from the other background objects by using only the texture information.
Vehicle lights are easily recognizable features for detecting vehicles, especially at night-time. In addition, the red color of tail-lights is another useful feature. However, associating the left and the right vehicle lights is a challenging problem, because too many other background objects have red colors or a brightness similar to vehicle lights'.
The advantage of subtraction-based features is that they can be computed quickly and provide a good cue for generating vehicle candidates, especially when the vehicles have a relatively high velocity. However, it is imperative to have a reliable method for modelling the background image.
Although optical flow provides rich information for generating vehicle candidates, its computation is time-consuming. It is also difficult to obtain reliable optical flow because of the following situations: the large displacement of corresponding pixels in consecutive frames caused by a high velocity of the host vehicle, the lack of texture in the input image, and the shock and vibration of the camera during the movement of the host vehicle. In addition, neither optical flow nor subtraction-based features can be used for detecting parked vehicles.
More importantly, none of the single-camera approaches mentioned above is able to estimate the distance from the host vehicle to frontal objects. Unfortunately, this capability is necessary for building intelligent and autonomous vehicles.
2 Stereo Camera:
Both of the approaches using stereo cameras mentioned above are able to estimate the distance of frontal objects. Compared to radar sensors, stereo cameras can provide a wider field of view, better lateral accuracy, and lower cost.
In IPM, the accuracy of vehicle detection totally depends on the accuracy of the detection of triangles in the difference image. In the ideal situation, IPM produces well-separated triangles in the difference image, so the triangles are easily detected from a polar-based histogram. However, IPM is very sensitive to the height and the pitch of the stereo cameras, and these two parameters change continuously while the vehicle moves. In combination with the texture and homogeneous regions in the rear view of the vehicles, the variation of the height and the pitch of the stereo cameras adds more noise to the difference image, so it becomes difficult to detect the triangles. In addition, although IPM provides a tool to determine the distance and the left, the right, and the bottom borders of the vehicles in the input image, detecting the top border of the vehicles is another challenge. Performing IPM is also a time-consuming task.
The advantages of the U- and V-disparity images are as follows. First, they provide information about the ground plane, namely its profile and the disparities associated with it. Second, the bottom and the top borders of frontal vehicles can be accurately determined by considering the V-disparity image. However, it is difficult to determine the left and the right borders of the vehicles, because the profile of the vehicles in the U-disparity image is discontinuous. (A minimal sketch of how U- and V-disparity images are constructed is given after this list.)
However, the U- and V-disparity image approach still has some challenges, as follows:
• In theory, the longer the baseline of a stereovision system, the farther the range the system can achieve. However, a longer baseline also means a larger disparity search range, since the disparity of a point at depth Z is d = fB/Z for focal length f and baseline B. As a consequence, the time for computing disparity images from stereo pairs captured with a long baseline increases. This problem is one of the challenges in using stereovision.
• The detection of vehicles by using U- and V-disparity images totally depends on disparity images, so increasing the quality of disparity images is another challenge, especially in the case of homogeneous regions in the input stereo pairs, which are quite common in real conditions.
• The bottoms of vehicles are determined at their contact points with the ground plane, so it is necessary to have a reliable method for estimating the ground plane, especially in the case of non-flat roads.
• Accurately locating the left and the right borders of vehicles is important because it affects the accuracy of the verification step. None of the investigated studies provided a way of locating and extracting the whole view of the vehicles.
3 Multiple Camera:
Generally, using more cameras can provide better accuracy in recovering 3D information, except in some cases where the additional cameras still cannot see occluded object points. However, adding more cameras also means that a higher cost has to be paid for the vision system and that the computation
time increases as well. Even in recent studies, stereo matching can be performed in real time only for stereo images of fairly small sizes and with small disparity search ranges. Therefore, using multiple cameras is impractical at present.
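Because U- and V-disparity images recur throughout this comparison, a minimal construction sketch may help make the discussion concrete: each is simply a histogram of the disparity image, taken column-wise (U-disparity) or row-wise (V-disparity). The NumPy code below is illustrative only; the names and the explicit loops are chosen for clarity rather than speed, and it is not the implementation proposed in this thesis.

```python
import numpy as np

def u_v_disparity(disp, max_d):
    """Build U- and V-disparity images from a disparity map.

    U-disparity: for each column u, a histogram of the disparities in that
    column.  V-disparity: for each row v, a histogram of the disparities in
    that row.  The ground plane appears as a slanted line in the V-disparity
    image, while vertical obstacles such as vehicles appear as horizontal
    segments in the U-disparity image."""
    h, w = disp.shape
    u_disp = np.zeros((max_d, w), dtype=np.int32)
    v_disp = np.zeros((h, max_d), dtype=np.int32)
    for v in range(h):
        for u in range(w):
            d = int(disp[v, u])
            if 0 < d < max_d:            # skip invalid/zero disparities
                u_disp[d, u] += 1
                v_disp[v, d] += 1
    return u_disp, v_disp
```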
1.2.3 Fusion of Sensors
1.2.3.1 Radar + Camera
Giancarlo et al. [2] used a radar to roughly estimate the distance from the host vehicle to frontal objects and a camera to capture the scene in front of the host vehicle. Both the radar and the camera were calibrated, so whenever a frontal object was detected by the radar, its location was converted to the image captured by the camera. Regions of interest in the image were then generated and verified using the symmetry information. Using radar information to limit the area to be searched for frontal vehicles was also employed in [3].
On the other hand, in [4], whenever a frame was captured by a camera, the corresponding radar information at that time was obtained and used to model a probability function P(Z), which gives the probability that a vehicle appears at a depth Z. Another probability function P_Z(x, y) was also computed for each frame; P_Z(x, y) gives the probability that a vehicle appears at location (x, y) for a given depth Z, and it was computed from image data. Local maxima were determined from the product of the two probability functions, i.e., P_Z(x, y) * P(Z), and the locations of vehicles were detected from these local maxima.
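A minimal sketch of this fusion step, under the assumption that P(Z) has been discretized into depth bins and that P_Z(x, y) is available as one map per bin; the function name, array layout, and detection threshold are illustrative, and SciPy's maximum_filter is used here merely as a convenient local-maximum test.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def fuse_radar_and_image(p_z, p_xy_given_z):
    """Combine a radar-derived depth prior with image-based location maps.

    p_z          : 1-D array, P(Z) for each discretized depth bin Z
    p_xy_given_z : 3-D array of shape (Z, H, W), one P_Z(x, y) map per bin
    Returns the (z, y, x) indices where the product P_Z(x, y) * P(Z) is a
    local maximum above a threshold, i.e. candidate vehicle locations."""
    product = p_xy_given_z * p_z[:, None, None]      # P_Z(x, y) * P(Z)
    local_max = maximum_filter(product, size=5)      # neighborhood maxima
    # Threshold (half the global peak) is an arbitrary illustrative choice.
    peaks = (product == local_max) & (product > 0.5 * product.max())
    return np.argwhere(peaks)
```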
1.2.3.2 LIDAR + Camera
Most of the studies on the fusion of LIDAR and camera [8, 9, 12] performed vehicle detection with the following steps. LIDAR sensors were used to measure the 3D information of frontal objects. The measured 3D information was converted to image coordinates; before this conversion, both the LIDAR sensors and the camera were calibrated. Vehicle candidates were generated by using the 3D information from the LIDAR sensors. After that, the candidates were verified by using pattern recognition techniques, for example, an Adaboost classifier.
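A minimal sketch of the conversion step these studies share, assuming a standard pinhole camera model with known LIDAR-to-camera extrinsics (R, t) and intrinsic matrix K; the names and array layout are illustrative assumptions.

```python
import numpy as np

def project_lidar_to_image(points_lidar, R, t, K):
    """Project LIDAR points into the image plane of a calibrated camera.

    points_lidar : (N, 3) 3-D points in the LIDAR frame
    R, t         : rotation (3x3) and translation (3,) from LIDAR to camera
    K            : camera intrinsic matrix (3x3)
    Returns (M, 2) pixel coordinates; points behind the camera are dropped."""
    pts_cam = points_lidar @ R.T + t          # LIDAR frame -> camera frame
    in_front = pts_cam[:, 2] > 0              # keep points in front of camera
    pts_cam = pts_cam[in_front]
    proj = pts_cam @ K.T                      # apply intrinsics
    return proj[:, :2] / proj[:, 2:3]         # perspective divide -> pixels
```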
1.2.3.3 Summary
Obviously, combining active sensors and passive sensors can provide a better ability of detection and classification. However, it also requires extra cost. In addition, a workable calibration method is required for relating the points detected by the active sensors to the corresponding positions in the images captured by the passive sensors.
1.3 Scope of the Thesis

As discussed in Section 1.2, the LIDAR sensor, especially the 3D laser scanner, is the best method for producing high-quality 3D information about the scene surrounding the host vehicle. However, passive sensors are selected in this thesis for the following reasons. First, passive sensors that can produce high-resolution images at a high frame rate are now available at a reasonable price, whereas high-quality 3D laser scanners still cost several thousands of US dollars. Second, passive sensors provide plenty of visual information that is useful for further steps such as vehicle classification and pedestrian detection. Third, passive sensors do not cause the interference that is a problem when using active sensors.
For the passive-sensor approach, Inverse Perspective Mapping is not selected, because it works well only with long-baseline stereo cameras and because it requires the correct pitch angle of the stereo cameras.
The selected approach for detecting vehicles, i.e., combining V-disparity images and U-disparity images, needs to solve three tasks: (1) computing disparity images, (2) estimating ground planes, and (3) locating vehicles. The problems that are solved for each task are given in the following.
1 For computing disparity images: Reducing the time for computing disparity images and enhancing the quality of the disparity images are two important problems, and both are selected to be solved in this thesis. The method for reducing the computation time is proposed and explained in Chapter 3. The approaches for enhancing the disparity images are proposed and presented in Chapter 4.
2 For estimating ground planes: Based on the disparity images computed by the proposed approaches, a new method is proposed that is able to reliably estimate ground planes from low-texture stereo images. The proposed method is also able to work with non-flat roads, which is a difficult case in the ground plane estimation task. The proposed method for estimating the ground planes is presented in Chapter 5.
3 For locating vehicles: The view of frontal vehicles should be extracted as exactly as possible in order to increase the performance of the vehicle verification task in vehicle detection. This challenge is solved in this thesis, and the proposed method is presented in Chapter 5. (A short note on recovering a located vehicle's range from its disparity follows this list.)
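As a short note on task (3): once a vehicle has been located, its range follows from the standard stereo relation Z = fB/d. This is general stereo geometry rather than anything specific to the proposed method, and the numbers in the sketch below are illustrative only.

```python
def depth_from_disparity(d_px, focal_px, baseline_m):
    """Standard stereo range equation Z = f * B / d: depth is proportional
    to the baseline and inversely proportional to the disparity."""
    if d_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / d_px

# Illustrative numbers only: with f = 800 px and B = 0.5 m, a vehicle
# observed at a disparity of 10 px lies at Z = 800 * 0.5 / 10 = 40 m.
print(depth_from_disparity(10.0, 800.0, 0.5))   # -> 40.0
```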
1.4 Structure of the Thesis

This thesis contains seven chapters. In Chapter 1, the author defines the problems that are selected to be solved in the thesis. The related works are introduced in Chapter 2. The next three chapters, i.e., Chapters 3, 4, and 5, are reserved for explaining the proposed techniques. The experimental results and comparisons are presented in Chapter 6. Conclusions and some further improvements are given in Chapter 7. The content of each chapter is as follows:
Chapter 1: First, the author introduces the reasons that motivated the choice of "stereo-based vehicle detection" as the topic of this thesis. Next, challenging problems in the selected topic are identified. Finally, the thesis's scope is defined by listing the selected problems that are solved in the thesis. In addition, the structure of the thesis is also presented in this chapter.
Chapter 2: This chapter introduces the related works of the selected topic. The proposed vehicle detection framework consists of three main tasks: computing disparity images from incoming stereo pairs, estimating the ground plane from the computed disparity images, and locating vehicles in the input images. Each of these tasks corresponds to a research area. Therefore, the content of this chapter is as follows. In the first section, the terminology, the procedure, and the challenges of stereo matching are introduced. In the second section, the geometry of stereo cameras and U- and V-disparity images are first presented.
After that, existing methods for estimating the ground plane are reviewed. In the third section, the way to combine the estimated ground plane with U- and V-disparity images is given. Another stereo-based method, which is based on grouping disparities, is also explained in this chapter.
Chapter 3: This chapter thoroughly explains the proposed coarse-to-fine approach for efficiently computing disparity images. The chapter begins with a short introduction to the idea of the proposal. Next, new concepts related to "sampling the disparity space" that are proposed in this thesis are explained and illustrated. After that, four sections explain all of the steps in the proposed approach. Some discussions are also added as the summary of this chapter.

Chapter 4: This chapter explains the proposed approaches for reliably computing disparity images from low-texture stereo images. The chapter begins with a short introduction to the proposal's idea. After that, an explanation of each task in the proposed approaches is given. The three main tasks are as follows: (a) computing pixel-matching costs, (b) aggregating costs, and (c) computing disparity images. In (a), a new cost function proposed in the thesis is presented. In (b), a new cost aggregation method that uses edge maps during aggregation to achieve robustness against low-texture regions is thoroughly explained. In (c), a new disparity image computation based on the dynamic programming technique and ground control points is introduced. This chapter ends with a short discussion as its summary.
Chapter 5: This chapter explains the proposed method for stereo-based ground plane estimation and vehicle detection. The chapter begins with a short introduction to the proposal's idea. After that, ground plane estimation and vehicle detection are presented. For ground plane estimation, the procedure for obtaining U- and V-disparity images is first described, and the dynamic programming based method for estimating the ground plane is then introduced. Finally, the method for combining U- and V-disparity images to locate vehicles is explained. This chapter ends with a short discussion as its summary.
Chapter 6: This chapter presents the experimental results obtained in this thesis. The results are organized into three parts, corresponding to each proposal in the thesis. First, the proposed coarse-to-fine approach for computing disparity images is evaluated by comparison with a well-known test-bed in stereo matching. Next, the robustness of the proposed approaches for computing disparity images