Study on New Approaches for Vehicle Detection
Using Stereoscopic Information
Advisor: Prof. Dr. Shozo KONDO
Course of Science and Technology
Graduate School of Science and Technology
Tokai University
Le Thanh Sach
Abstract

Vision-based vehicle detection is a useful, active, and challenging research field. It is helpful for many applications, for example, Intelligent Transportation Systems and Robot Vision. Generally, a detection framework consists of two steps: one step generates candidate vehicle regions from the input images, and the other verifies the generated candidates. For the verification step, many existing studies have demonstrated not only that vehicles can be discriminated from the background scene but also that vehicles can be classified into hierarchical sub-classes. For example, vehicles can be classified into Car, Bus, and Truck, and Car can then be subdivided into Honda, Toyota, and so on. However, efficient and reliable techniques for obtaining candidate vehicle regions are still missing. In addition, estimating the distance from the detected vehicles to the host vehicle is another challenge. In order to solve those problems, this thesis proposes to use stereoscopic information for vehicle detection. The proposed vehicle detection approach accepts stereo images captured by stereo cameras as its input, and it performs three main tasks as follows. First, disparity images are computed from the input stereo images. There are two challenging problems in this task: reducing the computation time and increasing the quality of the disparity images. In order to solve these two problems, we propose new approaches that can efficiently and reliably perform stereo matching. Second, ground planes are estimated from the computed disparity images. For this task, we propose a method named the Dynamic Programming Based method for ground plane estimation. The proposed method is able to work even when roads are not flat. In this task, U- and V-disparity images are reliably produced from the disparity images as well. Finally, on-road vehicles are located in the input images by combining the estimated ground planes and the U- and V-disparity images with a proposed differential-based method. The distance and the location of the located vehicles with respect to the stereo cameras' coordinate system are derived from the stereo camera parameters and the disparities of the located vehicles. Experimental results on both artificial stereo images and real stereo images demonstrate that disparity images can be efficiently and reliably computed even in low-texture conditions and that vehicles can be straightforwardly located by the proposed differential-based method.
Contents

1 Introduction
1.1 Motivation
1.2 Problem
1.2.1 Active Sensor
1.2.1.1 Radar sensor
1.2.1.2 LIDAR sensor
1.2.1.3 Summary
1.2.2 Passive Sensor
1.2.2.1 Single Camera
1.2.2.2 Stereo Cameras
1.2.2.3 Multiple Cameras
1.2.2.4 Summary
1.2.3 Fusion of Sensors
1.2.3.1 Radar + Camera
1.2.3.2 LIDAR + Camera
1.2.3.3 Summary
1.3 Scope of the Thesis
1.4 Structure of the Thesis

2 Related Works
2.1 Stereo Matching
2.1.1 Introduction
2.1.2 Procedure
2.1.2.1 Pre-processing
2.1.2.2 Pixel-Matching Computation
2.1.2.3 Cost Aggregation
2.1.2.4 Disparity Computation
2.1.2.5 Refinement
2.1.3 Validation
2.1.4 Efficiency of Stereo Matching
2.1.5 Stereo Matching Evaluation
2.1.6 Stereo Matching Summary
2.2 Stereo-Based Ground Plane Estimation
2.2.1 Introduction
2.2.2 Camera Geometry
2.2.3 Hough Transform Based Method
2.2.4 Fitting-Based Method
2.2.4.1 Least-Squares Method
2.2.4.2 Iteratively Reweighted Least-Squares Method
2.2.4.3 RANSAC Method
2.2.5 Dynamic Programming Based Method
2.2.6 Parallel-Based Method
2.2.7 Polar Histogram Based Method
2.2.8 Ground Plane Estimation Summary
2.3 Stereo-Based Vehicle Detection
2.3.1 Introduction
2.3.2 U- and V-Disparity Images Combination
2.3.3 Disparity Grouping
2.3.4 Vehicle Detection Summary
3 A New Coarse-To-Fine Method for Computing Disparity Images
3.1 Outline of the Proposal
3.2 Sampling of Disparity Spaces
3.3 Coarsest Map Estimation
3.4 Active Region Determination
3.5 Cost Aggregation
3.6 Disparity Map Computation
3.7 Summary

4 Robust Approaches for Stereo Matching
4.1 Outline of the Proposal
4.2 Pixel-Matching Cost Computation
4.3 Cost Aggregation
4.3.1 Relationship: cost volumes and edge maps
4.3.2 Horizontal Aggregation
4.3.3 Vertical Aggregation
4.4 Disparity Image Computation
4.4.1 GCP Selection
4.4.2 GCP-DP
4.5 Summary

5 Stereo-Based Ground Plane Estimation and Vehicle Detection
5.1 Outline of the Proposal
5.2 Ground Plane Estimation
5.2.1 V-Disparity Image and V-D Cost Image Computation
5.2.2 DP-Based Estimation
5.2.3 Ground Plane's Parameters
5.3 Vehicle Detection
5.3.1 Vehicle's Disparity and Lower Mark Determination
5.3.2 Upper Mark Determination
5.3.3 Left and Right Mark Determination
5.3.3.1 Creating Vehicle-Disparity Image
5.3.3.2 Creating Density Histogram
5.3.3.3 Computing Differentials
5.3.3.4 Generating Pairs of Left Mark and Right Mark
5.3.3.5 Merging Pairs of Left Mark and Right Mark
5.3.4 Vehicle Verification
5.4 Summary

6 Experimental Results and Discussions
6.1 Outline of the Experiments
6.2 A New Coarse-To-Fine Method for Computing Disparity Images
6.3 Robust Approaches for Stereo Matching
6.3.1 Qualitative Comparisons
6.3.2 Quantitative Comparisons
6.4 Ground Plane Estimation
6.5 Vehicle Detection
6.6 Summary

7 Conclusion
7.1 Summary of Results
7.2 Future Research Directions
List of Figures

1.1 Overview of existing approaches
1.2 The demonstration of Inverse Perspective Mapping
2.1 An example of stereo cameras
2.2 Geometry of stereo cameras
2.3 Ground Plane in Disparity Space
2.4 Demonstration of U- and V-disparity image combination
3.1 The coarse-to-fine method's idea
3.2 An active region's example
3.3 Outline of the proposed method
3.4 An original image and its low resolution image
3.5 Matching low resolution stereo images
3.6 The sampled disparity space for the coarsest map
3.7 Active region
3.8 Modified active region
3.9 Description of active region
4.1 Examples of homogeneous regions
4.2 The proposed stereo matching system
4.3 Horizontal correspondence between cost volumes and edge maps
4.4 Vertical correspondence between cost volumes and edge maps
4.5 Examples of edge-segments and mid-segments
4.6 Proposed data structures
5.1 Examples of V-Disparity image and V-D cost image
5.2 Example of the ground plane's profile when the road is empty
5.3 Example of the ground plane's profile when there is a large on-road vehicle
5.4 Examples of Ground Plane's Parameters
5.5 Five parameters of a frontal vehicle
5.6 An example of a D-costs array
5.7 An example of a V-costs array
5.8 Determination procedure of left and right marks
5.9 Illustration for the left and the right mark determination
5.10 Vehicle-disparity image formulation
5.11 Fitting to the vertical edge on the vehicle's rear view
6.1 Qualitative illustration of the proposed coarse-to-fine method 1
6.2 Qualitative illustration of the proposed coarse-to-fine method 2
6.3 Quantitative comparison for the computation time
6.4 Quantitative comparison for the accuracy of disparity images
6.5 Qualitative comparisons for the artificial and the real sequence
6.6 Quantitative comparisons for the artificial sequence
6.7 Quantitative comparisons for the artificial and the real sequence
6.8 Classification of road pixels
6.9 Artificial Sequence, examples of ground planes' parameters
6.10 Real Sequence, examples of ground planes' parameters
6.11 Artificial Sequence, ground planes' parameters
6.12 Artificial Sequence, Comparison of ground planes' parameters
6.13 Real Sequence, ground planes' parameters
6.14 Illustration of ground plane estimation
6.15 Examples of vehicle detection
List of Tables

1.1 List of major research groups in the field
1.2 List of major conferences and transactions in the field
3.1 Outline of cost computation
4.1 Cost aggregation guided by edge maps: pseudo code
4.2 Cost aggregation guided by edge maps: explanation
6.1 Coarse-to-Fine Method: Parameters used to compute disparity maps
6.2 Robust Approaches: Parameters used to compute disparity images
6.3 Parameters used for vehicle detection
To my parents, my wife, and Ms. Quynh-Anh.
They always give me endless love and strong motivation.
List of Symbols

ΔU, ΔV, Δd : Sample steps in the U-, V-, and D-directions in disparity spaces.

C_aggr(u, v, d) : A 3D cost volume that contains aggregated pixel-matching costs. C_aggr(u, v, d) can be either C_aggr^L(u, v, d) or C_aggr^R(u, v, d).

C_aggr^L(u, v, d) : A 3D cost volume that contains aggregated pixel-matching costs. In the computation, the reference image is the left image of the input pair of stereo images.

C_aggr^R(u, v, d) : A 3D cost volume that contains aggregated pixel-matching costs. In the computation, the reference image is the right image of the input pair of stereo images.

C_pix(u, v, d) : A 3D cost volume that contains pixel-matching costs. C_pix(u, v, d) can be either C_pix^L(u, v, d) or C_pix^R(u, v, d).

C_pix^L(u, v, d) : A 3D cost volume that contains pixel-matching costs. In the computation, the reference image is the left image of the input pair of stereo images.

C_pix^R(u, v, d) : A 3D cost volume that contains pixel-matching costs. In the computation, the reference image is the right image of the input pair of stereo images.

candi_mark_left and candi_mark_right : Candidate left mark and candidate right mark of detected vehicles.

candi_mark_lower and candi_mark_upper : Candidate lower mark and candidate upper mark of detected vehicles.

cost(v, d) : V-D cost image.

d_inf : The disparity of objects at infinity.

d_vhd : The disparity of detected vehicles.

densi(u) : Density histogram.

diff(u) and diff_acc(u) : Differentials and accumulated differentials, respectively.

I_vdi(u, v) : Vehicle-disparity image.

I_vd(v, d) : V-disparity image.

map(u, v) : Disparity image. map(u, v) represents either map_L(u, v) or map_R(u, v).

map_L(u, v) : Left disparity image.

map_R(u, v) : Right disparity image.

mark_lower, mark_upper, mark_left, and mark_right : Lower mark, upper mark, left mark, and right mark of detected vehicles.

path_DP : The path in the V-D cost image that is discovered by the proposed dynamic programming based method.

path_gpDP : The ground plane's profile that is extracted from path_DP.

path_gpLF : The ground plane's profile that is obtained by fitting path_gpDP to a straight line.

W_U × W_V : The size of aggregation windows.

D : The width of the disparity search range.

DP : Dynamic Programming.

FPGA : Field Programmable Gate Array.

GPU : Graphics Processing Unit.

IPM : Inverse Perspective Mapping.

MAE : Mean of Absolute Errors.

SSE : Streaming SIMD (Single Instruction, Multiple Data) Extensions.

U and V : The width and the height of input images.

WTA : Winner-Takes-All, a method for determining the disparity of pixels.
1 Introduction

1.1 Motivation

The vehicles and the obstacles in front of the host vehicles are referred to as the frontal vehicles and the frontal obstacles, respectively, in this thesis. The terms "preceding vehicles/obstacles" and "frontal vehicles/obstacles" are used interchangeably to avoid tedious repetition. The techniques and the technologies developed in Vehicle Detection enable the host vehicles to sense the surrounding environment, so they are important and useful for many applications, for example, intelligent vehicles, autonomous vehicles, robot navigation systems, and traffic management systems.
Vehicle Detection and its related research in Intelligent Transportation Systems are currently active research fields, as demonstrated by the large number of participants and research groups around the world. Some major groups are listed in Table 1.1. Research groups have proposed different ways to sense the environment in front of host vehicles. Some groups used active sensors, e.g., radar and LIDAR, while other groups used passive sensors, i.e., normal cameras, in combination with computer vision techniques. Notably, the research group in the first row of Table 1.1 demonstrated that frontal obstacles can be detected by using only vision information and that the distance from the detected obstacles to the host vehicles can also be estimated.
Table 1.1: List of major research groups in the field

PARMA: This research group is from Parma University, Italy. The researchers in this group are the founders of GOLD (Generic Obstacle and Lane Detection System). This group was also responsible for building the stereo-vision system of TerraMax, an autonomous vehicle that participated in the DARPA Grand Challenge.

CMU: This is one of the largest and earliest research groups in the field, based at Carnegie Mellon University, USA. It is well known for its autonomous vehicles named NavLab. The researchers in this group are the founders of SCARF (Supervised Classification Applied to Road Following), UNSCARF (UNSupervised Classification Applied to Road Following), YARF (Yet Another Road Follower), and so on. In the DARPA Grand Challenge, their team (the Red Team) was one of the leaders in the competition.

LIVIC: This group is from INRETS, France. The researchers in this group are the founders of the term "V-Disparity Image".
Vehicle Detection is strongly encouraged by both academia and industry. In academia, the research is supported by a system of dedicated conferences and transactions, as shown in Table 1.2, and by a crowded competition, e.g., the DARPA Grand Challenge¹. In industry, the hardware used for vehicle detection and related research in Intelligent Transportation Systems has improved substantially during the last few years. For example, recent laser scanners can perform the scanning task with high spatial resolution at high speed, and recent stereo cameras can produce high-resolution images at high frame rates. In addition, in order to help process the massive amount of visual information in images, special instruction sets² and special hardware³ have also been released. In recent years, many prototypes of intelligent or autonomous vehicles have been developed and demonstrated in the DARPA Grand Challenge, reflecting a great deal of effort by researchers in Intelligent Transportation Systems. However, the ability to locate vehicles/obstacles and to estimate their distances to host vehicles is still a challenging problem.

Because of the usefulness, the activeness, and the challenge of vehicle detection, it is selected as the subject of this thesis. Based on the promising results of experiments on using visual information for detecting vehicles and obstacles, stereo images and computer vision techniques are used to obtain 3D information about the environment in front of the host vehicles. This information is then used to detect vehicles and obstacles in this thesis. In particular, the whole rear views of frontal vehicles are detected and separated in such a way that they can be used in further applications like vehicle classification.
Table 1.2: List of major conferences and transactions in the field

ITSC: IEEE Conference on Intelligent Transportation Systems
IVS: IEEE Intelligent Vehicles Symposium
ITST: IEEE Transactions on Intelligent Transportation Systems
ICIP: IEEE Conference on Image Processing
IPT: IEEE Transactions on Image Processing
CVPR: IEEE Conference on Computer Vision and Pattern Recognition
PAMI: IEEE Transactions on Pattern Analysis and Machine Intelligence
¹ Defense Advanced Research Projects Agency: http://www.darpa.mil/grandchallenge/
² Intel: SSE; AMD: 3DNow
³ GPU: Graphics Processing Units
1.2 Problem

This section presents an overview of existing approaches in vehicle detection. The advantages and disadvantages of each approach are also identified. This section creates a basis for selecting the problems that are solved in this thesis; the selected problems are outlined in the next section.
As shown in Figure 1.1, existing approaches can be classified by the sensor that is used to detect vehicles, so we have approaches corresponding to using an active sensor, a passive sensor, or a fusion of sensors for vehicle detection. The term "active sensor" means that, in order to estimate the distance of frontal objects, the sensor actively sends a probing signal and measures the elapsed time of the reflected signal caused by the frontal objects [10]. The elapsed time is then used to infer the distance of the frontal objects. Meanwhile, the term "passive sensor" indicates optical sensors, i.e., normal cameras, which produce images by focusing light on photoactive regions. The captured images are then used to detect the frontal objects. The way vehicles are detected in each of the approaches mentioned above is introduced in the following subsections. There are also short discussions of the advantages and the disadvantages at the end of each approach.
1.2.1 Active Sensor

… is shorter than the LIDAR sensor signal's. Radar uses radio waves or microwaves; meanwhile, LIDAR uses signals having higher frequencies, for example, ultraviolet, visible, or near-infrared light. Some typical research works using active sensors are given in the following subsections.
1.2.1.1 Radar sensor
Yiguang et al. [1] mounted a pulse radar sensor on a bar hung 6 meters over a lane so that the sensor was in the middle of the lane. The area right below the radar was swept every 0.01 seconds to measure the height of vehicles. The output of the radar, which was the correlation between the probing signal and the reflected signals within 1.1 to 6.9 meters, was converted into a 2D image, called a height image, for every 150 measurements. If there are three reflecting points in the height image at a certain height, then there is a vehicle at that height.
The distance measured by a radar sensor alone is not enough for reliably detecting vehicles, so many existing approaches combined the distance with visual information captured by a vision sensor to detect vehicles, for example [2–4]. The fusion of radar sensor and camera is introduced in Section 1.2.3.1.
1.2.1.2 LIDAR sensor
LIDAR sensors are widely used for measuring 3D information in the scene surrounding the host vehicle. For example, all of the first three leading competitors in the DARPA Grand Challenge 2005 [5–7] used LIDAR sensors for obtaining 3D information.
In [5], range information was obtained by using only LIDAR sensors. The sensors were angled downward to scan the terrain in front of the host vehicle as it moved. The five sensors were mounted at five different angles. Each sensor generates a vector of 181 range measurements spaced 0.5 degrees apart. Projecting the measurements into the global coordinate frame according to the estimated pose of the vehicle results in a 3-D point cloud for each laser. Obstacle detection on the measured points can then be formulated as a classification problem.
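To make the projection step concrete, the following is a minimal sketch of how one tilted 181-beam scan could be placed into the global frame; the function interface and the sweep geometry are illustrative assumptions, not code from [5].

```python
import numpy as np

def scan_to_points(ranges, pose_R, pose_t, tilt_deg):
    """Project one 181-beam LIDAR scan into the global frame (illustrative).

    ranges:         181 range measurements spaced 0.5 degrees apart.
    pose_R, pose_t: estimated vehicle rotation (3x3) and translation (3,).
    tilt_deg:       downward mounting angle of this particular sensor.
    Accumulating such scans while the vehicle moves yields the 3-D point
    cloud on which obstacle detection is formulated as classification.
    """
    bearings = np.deg2rad(np.arange(181) * 0.5 - 45.0)  # 90-degree sweep
    tilt = np.deg2rad(tilt_deg)
    x = ranges * np.cos(bearings) * np.cos(tilt)        # forward
    y = ranges * np.sin(bearings)                       # lateral
    z = -ranges * np.cos(bearings) * np.sin(tilt)       # below the sensor
    points = np.stack([x, y, z], axis=1)                # sensor-frame points
    return points @ pose_R.T + pose_t                   # into the global frame
```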
In [6], the Red Team used 7 LIDAR sensors for sensing the scene: one sensor for obtaining long range, four sensors for detecting obstacles at short range, and two sensors for detecting obstacles and for mapping the terrain topology. In combination with one radar sensor, 3D information could be obtained and used for detecting obstacles.
In [7], 3D information was obtained by combining LIDAR sensors and stereo vision. For the LIDAR sensors, three SICK LMS-291 LIDARs and an IBEO ALASCA LIDAR were used. The three SICK LMS-291 LIDARs were used for positive and negative obstacle detection. The two forward-facing SICK LIDARs were mounted on the outermost edges of the front rollbar. They were pointed 10 degrees down and 25 degrees outward from the truck so that there was good coverage on extreme turns. The rear-facing LIDAR was mounted near the cargo bed height in the middle of the truck and was pointed down for negative obstacle detection. The IBEO ALASCA LIDAR was a 4-plane scanner that was used for positive obstacle detection. The IBEO LIDAR was mounted level in the front bumper and had two planes that scan toward the ground and two planes that scan toward the sky. With a range of 80 meters and a resolution of 0.25 degrees, the IBEO can detect obstacles accurately at long and close range. The 240-degree scan area allows the IBEO to see obstacles around upcoming turns.
1.2.1.3 Summary

LIDAR provides excellent range information to different objects. However, it is difficult to recognize these objects as vehicles from range information alone. Therefore, many research works fused LIDAR sensors with passive sensors [8, 9].
The greatest advantage of LIDAR sensors is that they can produce highly accurate 3D information in clear weather. However, LIDAR sensors are very sensitive to bad weather, for example, fog, rain, or snow. The spatial resolution and the price of a LIDAR sensor depend on its type. Nowadays, there are some 3D laser scanners that produce very high resolution with very high accuracy; however, such sensors are also very expensive, from 70 to 150 thousand USD as of 2008 [11]. In order to overcome the cost, some intelligent vehicle prototypes used only laser line scanners [5, 6] or multi-plane laser scanners (four planes supported by the IBEO scanner) [7].
Both radar and LIDAR sensors provide only the distance of frontal objects; they cannot provide rich information, such as color, texture, and edges, as a vision sensor can. Using only distance cannot produce a high-accuracy classification system in the further steps of Intelligent Transportation Systems [9].
1.2.2 Passive Sensor
A passive sensor is used as a tool to capture images of the scene surrounding the host vehicle, i.e., the scene in front of, behind, on the left side of, or on the right side of the host vehicle. The detection of vehicles and obstacles is based on the obtained images, using techniques from Image Processing, Computer Vision, and Pattern Recognition.
Generally, the detection of vehicles contains one step to generate vehicle candidates and another step to verify the generated candidates. The purpose of the first step is to generate several regions of interest that are likely to be images of vehicles. An important criterion for the first step is that the vehicles should be among the generated candidates, to avoid missed detections. For frontal vehicle detection, it is also a challenge to align the detected regions of interest with the horizontal and the vertical boundary edges of the rear view of the vehicles. Such alignment is important because it can improve the performance of the verification step.
The existing approaches for generating vehicle candidates, which are explained in the following subsections, depend on the number of cameras and the information that is available for the detection.

Based on the candidates generated in the first step, the verification step decides whether each candidate is a vehicle or not. This task can be done by techniques from Pattern Recognition, for example, K-NN, Bayesian classifiers, and Support Vector Machines.
1.2.2.1 Single Camera
1. Exhaustive Search: Based on the fact that vehicles appear as rectangular regions at some locations in the input image, this approach moves several rectangular windows of different sizes from left to right and from top to bottom of the input image. Each region produced by a move is considered as a candidate and is verified in the verification step. Several windows of different sizes are used to enable the system to detect the same vehicle at different distances and to detect different vehicles of different sizes. Some typical research works that performed the detection by this approach are [13], [14], and [15]. This approach is simple to understand. However, in practice, it needs very strong computing resources both to generate a large number of candidates and to verify them.
2. Selective Search: In order to avoid the computational problem of the exhaustive search, this approach uses specific information that is supposed to be known in advance to limit the regions that are searched for vehicles. There are several kinds of such specific information; they are introduced as follows.
2.1 Symmetry:

Generally, the rear and the front views of vehicles are usually symmetric around a vertical middle axis. This characteristic has been utilized by many existing research works to detect vehicles in the input image. The works that use the symmetry property differ in the way the property is used and in the way it is measured; the measurement is referred to as the symmetry degree. More specifically, the symmetry property can be used to generate vehicle candidates from the input image [16, 17, 21, 25, 26, 39, 45] or to verify candidates [18–20, 22] that are generated in other ways. There are several ways to measure the symmetry degree, as follows.

Michael et al. [16, 17], among the earliest researchers to use the symmetry property, considered the w pixels centered around a position x_s in a certain scan-line of the input image as a function f_{x_s,w}(x) of intensities, i.e., gray levels. This function was decomposed into an even and an odd function. After that, the symmetry degree was measured in such a way that it is proportional to the ratio of the energy of the even function to that of the odd function. The symmetry degree was measured for every pixel in the input image, for every possible symmetry center x_s, and for every possible symmetry interval w. The symmetry degrees of a number of scan-lines were also accumulated to achieve a reliable measurement. Large symmetry degrees were selected by thresholding; their symmetry axes and intervals, i.e., the pairs (x_s, w), define the left and the right borders of the detected vehicles. The contours of the detected vehicles were extracted by using the Symmetry Enhancing Edge Detector [17].
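To make the even/odd decomposition concrete, the following minimal sketch scores a single candidate center x_s on one scan-line. It is an illustrative reconstruction of the idea in [16, 17], not the authors' implementation; the DC-removal detail is an assumption.

```python
import numpy as np

def symmetry_degree(scanline, xs, w):
    """Score the symmetry of the window of half-width w centered at xs.

    The windowed intensity function f(x) is split into an even and an odd
    part around xs; a symmetric pattern concentrates its energy in the
    even part, so the score is E_even / (E_even + E_odd), in [0, 1].
    """
    f = scanline[xs - w : xs + w + 1].astype(np.float64)
    f = f - f.mean()                    # remove DC so flat regions do not score high
    even = (f + f[::-1]) / 2.0          # even component around the center
    odd = (f - f[::-1]) / 2.0           # odd component around the center
    e_even, e_odd = np.sum(even ** 2), np.sum(odd ** 2)
    return e_even / (e_even + e_odd + 1e-12)
```

Accumulating such scores over several scan-lines, centers x_s, and interval widths w, as described above, yields the symmetry measurements from which axes and borders are selected by thresholding.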
In [45], Sach et al. used the symmetry property for verifying vehicle candidates. They measured the symmetry degree of a candidate by the variance of the symmetry centers determined for the horizontal scan-lines in the candidate: the smaller the variance of the symmetry centers, the larger the symmetry degree of the candidate.
In [21], Broggi et al. used the symmetry property for generating candidates. They computed the symmetry degree on edge maps from s, the number of symmetric points, and n, the total number of white pixels. Similar to [16, 17], the symmetry degree was measured for every pixel and for every possible pair (x_s, w). The left and the right borders of the detected vehicles were determined from the pairs (x_s, w) that have large symmetry degrees. The shadow property was used to detect the bottom border of the detected vehicles. The top border was determined by the aspect ratio of the width to the height of the detected vehicles.
Tie et al. [22] divided the candidate region into a left part and a right part separated by a vertical middle axis. The symmetry degree was measured in such a way that it is proportional to the ratio of the average of the left and the right parts to the difference between the left and the right parts.
The symmetry property has been widely used in Vehicle Detection. There are several existing ways of measuring the symmetry degree, and all of them can perform the measurement on a single candidate quickly. However, in order to generate candidates, it is necessary to perform the measurement for every pixel, for every possible symmetry axis, and for every possible vehicle width. In this way, the detection becomes a time-consuming task. Moreover, the symmetry approach also has problems when there are low-texture regions in the input image and when background objects also possess the symmetry property.
2.2 Color:
In Intelligent Transportation Systems, using color for detecting lanes and roads is more popular than for detecting vehicles. This trend is due to the fact that road colors are distributed in certain regions of a color space, while vehicle colors are dispersed randomly in the color space. Therefore, the most important characteristic of a color approach is the way it models the distribution of the colors of interest, which are road colors in Road Detection and vehicle colors in Vehicle Detection.
In [50], Crisman et al. proposed two methods for determining the distribution of road colors. The first method is SCARF (Supervised Classification Applied to Road Following), and the other is UNSCARF (UNSupervised Classification Applied to Road Following). In order to obtain road colors, they used two color cameras separated by a short distance. They treated the two images captured from the two cameras as if they were taken from a single camera. The objective of using two different cameras for capturing the same scene is to enlarge the variation of colors in pixels, which now have 6 dimensions. In SCARF, sample road colors, now in a 6-dimensional color space, were assumed to follow a Gaussian model, so they were used to estimate the unknown parameters of the model. After that, incoming pixels were classified into the road or the non-road class. In UNSCARF, the positions of pixels, i.e., the x-index and the y-index, were also combined with the color information to form vectors of 5 components (if only a single image was used) or 8 components (if both images were used). Roads were classified by a clustering technique.
in-In [51], a set of sample colors for road was collected After that, Guo
et al used spheres to model the regions in the color space that containthe sample road colors Color space Lu∗v∗ was used in order to achievethe uniformity of perception Roads were detected by checking eachpixel in input images to decide whether it was inside or outside of theapproximated region
Luo et al. [48] proposed a method to normalize the sample colors of cars and the background. The colors of cars and the colors of the background were assumed to follow a Gaussian model. Thereby, all pixels in the input image could be classified as foreground (cars) or background according to a Bayesian classifier. The pixels classified as foreground are good suggestions for hypothesizing the locations of cars in the image.
Obviously, in order to estimate the likelihood of vehicle colors, it is hard to assume that a certain color is more probable than the others in the color space. Therefore, in [31], Wang et al. assumed that vehicle colors have a uniform distribution, i.e., all of the colors inside the color space have the same probability of being vehicle colors. Based on that assumption, they used a Hidden Markov Model to learn the colors of the foreground, shadows, and the background. After that, vehicles could be detected.
In another approach, Sach et al. [45] focused on the red colors of the tail lights of cars. They obtained sample red colors and fit them to a Gaussian mixture model. After that, the red areas of the tail lights were detected and grouped together to generate car back-view candidates.

2.3 Shadow:
Under sunny conditions, a vehicle creates a shadow on the road. There are two signs associated with the shadow. The first is that the shadow is darker than the road colors. The second is that the shadow is next to the vehicle. These two signs are used to detect vehicles in many existing research works.
In [27], one of the simplest and earliest methods for detecting shadows and vehicles, shadows were detected from gray images. The idea is that road intensities were assumed to have a Gaussian distribution. Hence, the vehicle detection system contains the following steps. First, the road intensities were collected and fitted to a Gaussian model. Next, the threshold expected to separate shadow pixels from road pixels was determined by T_sh = (μ − 3σ), where μ and σ are the mean and the standard deviation of the sample road intensities, respectively. After that, the shadows were detected by thresholding the input image with T_sh. Finally, the shadows were assumed to be located underneath the detected vehicles, so the bounding boxes of the detected vehicles were generated so that they lie above the detected shadows and their aspect ratios satisfy a predefined constraint. This way of detecting shadows is widely used in many other research works [18, 20, 32, 36].
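A minimal sketch of this thresholding step is given below; the Gaussian fit and the T_sh = μ − 3σ rule follow the description of [27], while the function interface and the source of the road samples are assumptions.

```python
import numpy as np

def shadow_mask(gray, road_samples):
    """Mark pixels darker than the road model as candidate shadow pixels.

    gray:         2D array of gray levels.
    road_samples: 1D array of intensities collected from a known road region.
    """
    mu, sigma = road_samples.mean(), road_samples.std()
    t_sh = mu - 3.0 * sigma      # threshold separating shadow from road
    return gray < t_sh           # True for underneath-vehicle shadow candidates
```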
In [30], a gray image sequence was used as the input. The intensities of the background (road) and the shadows were assumed to have Gaussian distributions, while the distribution of foreground objects was assumed to be uniform. Every pixel belongs to one of three classes: the background, the foreground, and the shadow. Jien et al. proposed to use an HMM with 3 states, corresponding to the three classes, for modeling the transition of the class of pixels from frame to frame. After training, the trained HMMs were used to classify pixels in incoming frames into one of the above three classes. A modification of [30] that can work with RGB color images is given in [31].

Similar to [27], the thresholding technique was also used in [28, 29]. However, the threshold values were defined in the RGB [28] and YCbCr [29] color spaces rather than on gray intensities.
2.4 Vertical/Horizontal Edges:
Generally, the rear and the front views of vehicles that have more than 2 wheels contain many vertical and horizontal edges. The edges are caused by components of the vehicles, for example, the license plate, bumper, and spoiler, and by the difference between the color and the depth of the vehicles and the objects surrounding them. Thus, edge information is useful for detecting vehicles, and in fact it is used by many existing research works. The existing works differ in the way this information is utilized. There are several ways, as follows.
In [23, 24], vertical and horizontal edges were extracted and thresholded, and edge-based symmetry degrees were computed on those domains. The final symmetry degree, obtained by combining the edge-based and the intensity-based symmetry degrees, was reliable and able to solve the uniformity problem that yields high symmetry degrees for homogeneous regions. Based on the reliable symmetry degree, vehicles could be detected from the input image.
Goerick et al. [37] proposed a method called Local Orientation Coding (LOC) to extract edge information. An image obtained by this method consists of strings of binary code representing the directional gray-level variation in a pixel's neighborhood. These codes carry essentially edge information. Handmann et al. [38] also used LOC, together with shadow information, for vehicle detection.
Mathews et al. [36] generated candidates by the following procedure. First, a gradient image used to detect vertical edges was obtained from the input gray image. Next, the gradient image was projected onto a horizontal line to obtain a histogram. Local maxima that indicate the locations of vertical edges were determined, so the left and the right borders of the region of interest could be identified. After that, the bottom border was determined by detecting shadows. Finally, the top border was inferred so that the width and the height of the region of interest are the same.
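The following sketch illustrates this projection idea; the gradient and peak thresholds are illustrative values, not parameters from [36].

```python
import numpy as np

def candidate_borders(gray, grad_thresh=30.0, peak_frac=0.5):
    """Suggest left/right vehicle border columns by projecting vertical edges.

    1. Vertical edges from the horizontal intensity gradient.
    2. Column-wise projection of the edge map onto a horizontal line.
    3. Local maxima of the projection mark candidate border columns.
    """
    gx = np.abs(np.diff(gray.astype(np.float64), axis=1))  # horizontal gradient
    edges = gx > grad_thresh                               # vertical-edge map
    hist = edges.sum(axis=0)                               # projection histogram
    level = peak_frac * hist.max()
    return [u for u in range(1, len(hist) - 1)
            if hist[u] >= level
            and hist[u] >= hist[u - 1] and hist[u] >= hist[u + 1]]
```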
The left and right borders of vehicles that are far from the camera are hard to detect with [36] because their lengths are so small. Therefore, Gwang et al. [39] mapped the input image to a top view by using Inverse Perspective Mapping (IPM). After the mapping, the left and the right borders are longer, so they can be detected from the edge map of the remapped image. The edge map of the original image was projected onto a vertical line to detect the bottom border of vehicles. The top border was inferred so that the regions of interest have the same width and height. A method similar to [36, 39], i.e., borders detected by projecting edge maps, was used in [41] to generate candidates for vehicles.
Parodi and Piccioli [40] proposed to extract the general structure of a traffic scene by first segmenting an image into four regions: pavement, sky, and two lateral regions, using edge grouping. Groups of horizontal edges on the detected pavement were then considered for hypothesizing the presence of vehicles.
2.5 Corners:
A corner can be defined as the intersection of two edges, or as a point where there are two dominant and different edge orientations in a local neighbourhood of the point. Generally, the rear and the front views of vehicles have several corners that are caused by the boundary of the vehicles (e.g., the upper-left, lower-left, lower-right, and upper-right corners) or by many sub-regions inside the vehicle's …

On the other hand, the Harris detector [47] was used in [48, 49] to detect corners in the input image. The set of the detected corners was then used to verify vehicle candidates. For example, in [48], if a candidate had fewer than 0.5λ_C corners, where λ_C was the average number of corners in a vehicle class, then it was rejected. Meanwhile, in [49], the Hausdorff distance between set A and every set B in the vehicle database was computed, where A was the set of the detected corners in the candidate, and B was representative of the set of corners detected for a vehicle class in the database. The candidate was then classified into the class with the minimum distance.
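As an illustration of the corner-set comparison in [49], the sketch below computes a symmetric Hausdorff distance and picks the nearest class; it is a generic formulation, not the cited work's exact matching procedure.

```python
import numpy as np

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two corner sets (N x 2 arrays)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)  # pairwise distances
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def nearest_vehicle_class(candidate_corners, class_corner_sets):
    """Index of the vehicle class whose corner set is closest to the candidate's."""
    return int(np.argmin([hausdorff(candidate_corners, b)
                          for b in class_corner_sets]))
```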
2.6 Texture:

… by using the shadow information. Talinke et al. [33] used four texture measures (Energy, Contrast, Entropy, and Correlation) to generate candidates in combination with the shadow information. Meanwhile, Hartwig et al. [35] used 6 measures for generating vehicle candidates.
2.7 Vehicle Lights:
One of the most recognizable features of 4-wheel vehicles is that the rear and the front of the vehicles have two lights: two head-lights in the front view and two tail-lights in the rear view. Generally, the two vehicle lights have the following characteristics. First, the shape and the size of the two vehicle lights are similar. Second, the distance between the two vehicle lights satisfies a certain constraint that depends on the vehicle type. Third, the tail-lights usually contain red colors. Fourth, the vehicle lights are very bright at night-time. Based on the above recognizable characteristics, vehicles can be detected.
In [42], Cucchiara et al. proposed a method for detecting head-lights at night-time as follows. First, the scene outside the road was removed by masking with some predefined masks; the analysis of the resulting image is simpler because street-lamps were also removed. Next, a binary image was computed by thresholding. Finally, pairs of head-lights were detected by utilizing the features of head-lights, for example, the shape, the size, and the minimum distance between the two lights in a pair. The minimum rectangular box containing each pair of detected head-lights was generated as a vehicle candidate.
In [43–45], tail-lights were detected by analyzing the spatial relationship between the two tail-lights, e.g., the shape, the size, and the distance, and by utilizing the red colors in the tail-lights as well.
3. Motion-Based Search:
3.1 Subtraction-Based Features:
Most research works that use vision-based sensors alone follow this method of candidate generation. Candidates are generated by a subtraction between input images and a background image or between two consecutive images in an image sequence [52–60]. The former is used only when the background can be modeled or collected reliably, while the latter is usually used for detecting moving objects in image sequences. A typical background subtraction has been studied in [56, 57]; because stationary vision-based sensors were used in a controllable environment, the background image, called I_bg(x, y), could be modeled reliably before program execution. To detect vehicles in an image I(x, y), a binary image I_b(x, y) was formed as in the following equation:

I_b(x, y) = 1 if |I(x, y) − I_bg(x, y)| ≥ θ, and 0 otherwise, (1.1)

where θ is a threshold value that transforms the difference between the two images into the binary image. White pixels in I_b(x, y) that were inside sufficiently large regions were used to generate vehicle candidates.
On the other hand, studies in [58–60] could adapt the background to changes in the environment by an algorithm called self-adaptive background subtraction. The principle of the method in those studies is to modify the background image (CB) by using an instantaneous background (IB) and applying an appropriate weight α as follows:

CB_{k+1} = (1 − α) CB_k + α IB_k , (1.2)

where k is the frame index in the image sequence. The instantaneous background is defined as IB_k = M_k • CB_k + (∼M_k) • I_k, where I_k is the current frame and M_k is the binary vehicle mask, similar to I_b above.
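Equations (1.1) and (1.2) combine into the following one-step update; θ and α are illustrative values, not parameters reported in [58–60].

```python
import numpy as np

def update_background(cb, frame, theta=25.0, alpha=0.05):
    """One step of self-adaptive background subtraction.

    cb:    current background image CB_k (float array).
    frame: current frame I_k.
    Returns (vehicle mask M_k, updated background CB_{k+1}).
    """
    mask = np.abs(frame - cb) >= theta          # Eq. (1.1): binary vehicle mask
    ib = np.where(mask, cb, frame)              # IB_k: keep CB_k under vehicles,
                                                # take I_k on visible background
    cb_next = (1.0 - alpha) * cb + alpha * ib   # Eq. (1.2): blend toward IB_k
    return mask, cb_next
```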
3.2 Optical Flow:
The optical flow approach utilizes the relationship between consecutive frames in the input image sequence to detect vehicles. Let us represent the image intensity at location (x, y) at time t by E(x, y, t). Pixels in the images appear to be moving due to the relative motion between the sensor and the scene. The vector field o(x, y) of this motion is referred to as optical flow.

Optical flow can provide strong information for generating vehicle candidates. Vehicles approaching from the opposite direction produce a diverging flow, which can be quantitatively distinguished from the flow caused by the car's ego-motion [61]. On the other hand, departing or overtaking vehicles produce a converging flow. To take advantage of these observations in obstacle detection, the image is first subdivided into small sub-images, and an average speed is estimated in every sub-image. Sub-images with a large speed difference from the global speed estimate are labeled as possible obstacles.
can-The performance of several methods for recovering optical flow o(x, y)from the intensity E(x, y, t) have been compared in [62] using someselected image sequences from (mostly fixed) cameras Most of thesemethods compute temporal and spatial derivatives of the intensity pro-files and, therefore, are referred to as differential techniques Getting
Trang 28Chapter 1 Introduction 17
a reliable dense optical flow estimate under a moving-camera scenario
is not an easy task Giachetti et al [61] developed some of the bestfirst-order and second-order differential methods in the literature andapplied them to a typical image sequence taken from a moving vehiclealong a flat and straight road In particular, they managed to remapthe corresponding points between two consecutive frames, by minimiz-ing the following distance measure:
a less dense grid to reduce computational cost
Kruger et al. [63] estimated optical flow from spatiotemporal derivatives of the gray-value images using a local approach. They further clustered the estimated optical flow to eliminate outliers. Assuming a calibrated camera and known ego-motion, they detected both moving and stationary objects. Generating a displacement vector for each pixel (i.e., dense optical flow) is time consuming and also impractical for a real-time system. In contrast to dense optical flow, sparse optical flow is less time consuming because it utilizes image features, such as corners [64], local minima and maxima [65], or color blobs [66]. Although they can only produce a sparse flow, feature-based methods can provide sufficient information for generating vehicle candidates. Moreover, in contrast to pixel-based optical flow estimation methods, where pixels are processed independently, feature-based methods utilize high-level information. Consequently, they are less sensitive to noise.
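The sub-image labeling idea can be sketched as follows; OpenCV's Farneback estimator stands in for the differential methods discussed above, and the cell size and deviation factor are assumptions.

```python
import cv2
import numpy as np

def flow_obstacle_cells(prev_gray, cur_gray, cell=32, k=2.0):
    """Label sub-images whose average flow speed deviates from the global one."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    speed = np.linalg.norm(flow, axis=2)       # per-pixel flow magnitude
    global_speed, spread = speed.mean(), speed.std()
    h, w = speed.shape
    cells = []
    for y in range(0, h - cell + 1, cell):
        for x in range(0, w - cell + 1, cell):
            if abs(speed[y:y + cell, x:x + cell].mean() - global_speed) > k * spread:
                cells.append((x, y))           # possible obstacle sub-image
    return cells
```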
1.2.2.2 Stereo Cameras
1. Inverse Perspective Mapping:

Inverse perspective mapping (IPM) was first introduced in [67]. The idea of IPM is to reverse the perspective mapping that has been performed by a camera. We consider the camera as a perspective mapping. Mathematically, a point P in the real-world coordinate system that is defined with respect to the camera's coordinate system can be perspectively mapped onto a point p in the image plane of the camera by p = M × P, where M is the camera matrix, whose parameters can be estimated by calibration. Because of the perspective mapping, the road and objects are distorted in the image; for example, two parallel borders of the road in the real-world coordinate system intersect at the horizontal line of vanishing points. The inverse perspective mapping reverses that effect. Of course, P cannot be obtained simply by P = M^{-1} × p, because the perspective mapping is not mathematically invertible, i.e., M is not an invertible matrix. However, if we add some more constraints, then we can perform the inverse perspective mapping. For example, if we assume that P originally lies in a known horizontal plane, then P can be determined mathematically. Existing research works on IPM usually assumed that P lay in the road plane and that the position of the road plane with respect to the camera was known in advance. Usually, the position of the road plane was defined via the extrinsic parameters of the camera, i.e., the height and the pitch, yaw, and roll angles.
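The ground-plane constraint can be written out as a ray-plane intersection. The sketch below back-projects a pixel onto a road plane taken as z = 0 in world coordinates; the plane choice and the interface are assumptions for illustration.

```python
import numpy as np

def pixel_to_road(u, v, K, R, t):
    """Back-project pixel (u, v) onto the road plane z = 0.

    p = M P is not invertible on its own; constraining P to a known plane
    makes it recoverable. K: intrinsics (3x3); R, t: extrinsics such that
    x_cam = R X + t, all assumed known from calibration.
    """
    ray = R.T @ np.linalg.inv(K) @ np.array([u, v, 1.0])  # viewing ray (world)
    cam = -R.T @ t                                        # camera center (world)
    s = -cam[2] / ray[2]                                  # scale where ray meets z = 0
    return cam + s * ray                                  # 3D point on the road
```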
Figure 1.2: The demonstration of Inverse Perspective Mapping. Upper row: an example of the scene. Lower row: (a) left image, (b) right image, (c) left remapped image, (d) right remapped image, and (e) difference image, in which the gray area represents the region of the road not seen by both cameras (from [72]).
In fact, IPM is not a technique reserved for pairs of stereo images; it can be applied to every single image whenever the parameters of the camera are known. There exist many research works that performed IPM on single images for lane detection [68–71] and vehicle detection [39].
An extension of IPM to stereo cameras was introduced by Bertozzi et al. in [72], and they also used IPM in many research works [24, 73, 74]. The idea of the extension is shown in Figure 1.2. Assuming that the parameters (both intrinsic and extrinsic) of the stereo cameras were known in advance (from a calibration task), the left and the right images could be remapped onto the road plane. The difference between the two remapped images was then computed. In the ideal case, the differences for road pixels in the difference image are zero, and every obstacle results in two triangles in the difference image, as shown in Figure 1.2 (e). The appearance of the two triangles is a recognizable feature of the obstacles, and it was used for the detection.
Based on the difference image, the obstacle detection problem becomes the detection of the two triangles. Let C_L(x, y) and C_R(x, y) denote the two points that are the images of the centers of the left and the right camera on the road plane, respectively. The orientation of the two triangles is shown in Figure 1.2 (e): the left edges of the two triangles pass through C_R(x, y), while the right edges pass through C_L(x, y). In [73], Bertozzi et al. assumed that the distance between C_L(x, y) and C_R(x, y) is small, so they unified C_L(x, y) and C_R(x, y) into the middle point C_LR(x, y) = (C_L(x, y) + C_R(x, y)) / 2.
A polar histogram was built by rotating a scan-line around C_LR(x, y). Local maxima were detected from the polar histogram and then associated to generate vehicle candidates.
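A minimal sketch of the polar-histogram step is shown below; the binary difference mask and the bin count are assumptions.

```python
import numpy as np

def polar_histogram(diff_mask, c_lr, n_bins=360):
    """Histogram of difference-image pixels over the angle around C_LR.

    diff_mask: boolean image, True where the left/right remaps disagree.
    c_lr:      (x, y) of the unified projection of the camera centers.
    The edges of an obstacle's triangles produce peaks at their angles.
    """
    ys, xs = np.nonzero(diff_mask)
    angles = np.arctan2(ys - c_lr[1], xs - c_lr[0])   # angle of each pixel
    hist, _ = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi))
    return hist                                       # local maxima ~ triangle edges
```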
In practice, the detection of the two triangles is a difficult task because of the following problems:

• Because frontal objects contain textures and non-homogeneous regions, and because their shapes are irregular, the two triangles are very noisy.

• If there is more than one frontal object, then there are more than two triangles. In this case, it is hard to associate the local maxima in the polar histogram.

• When a frontal object is only partially visible, there exists only one local maximum. This is another difficult situation.
Ki et al. [75] relied on the fact that a vertical line in the input image is mapped onto a straight line in the remapped image, and that this straight line passes through the point that is the projection of the center of the camera onto the road plane. Moreover, the straight line in the remapped image is longer than the original vertical line. Hence, obstacles were detected by the following procedure. First, IPM was applied to both images of the input stereo pair to obtain a pair of remapped images. After that, the remapped images were used to obtain edge maps by edge detection. Polar histograms were computed for both edge maps using the technique in [73]. Finally, obstacles were detected by associating local maxima in the two computed polar histograms. The relationship between local maxima in consecutive frames of the input image sequence was also utilized to detect obstacles more accurately.
2. U- and V-Disparity Image:

U- and V-disparity images were first introduced by Labayrade et al. in [76, 77], and they are analyzed in detail in [78, 79]. The detection of vehicles and obstacles by this approach usually consists of the following tasks:

(a) Computing disparity images: Disparity images are computed from the input stereo pair. In the computation, one of the two images in the input stereo pair is used as a reference image. The disparity image computed for the reference image has the same size as the reference image. Each pixel in the disparity image encodes the 3D information associated with that pixel. This step uses techniques from stereo matching, one of the research areas in computer vision. Stereo matching is introduced in detail in Section 2.1.

(b) Computing U- and V-disparity images: A disparity image is actually a surface in a 3D volume whose width, height, and length dimensions are U, V, and D, respectively, where U × V is the size of the images in the input stereo pair and D is the disparity search range. Given a disparity surface, the U-disparity image is obtained by projecting the disparity surface onto the U × D plane and accumulating the number of points of the disparity surface that fall onto each cell of that plane. If the projection and accumulation are done for the V × D plane, then we obtain the V-disparity image (a minimal sketch is given after this list). Further information is given in Section 2.2.

(c) Estimating the ground plane: Because the frontal objects are on the ground plane, it is reasonable to detect the ground plane first and the frontal objects later. An overview of the methods for this task is given in Section 2.2. The method proposed in this thesis for this task is presented in Section 5.2.

(d) Locating vehicles: Vehicles have the following recognizable features that are utilized for the detection. First, vehicles appear as vertical straight lines in the V-disparity image. Second, the disparity of the straight lines keeps the depth information of the vehicles. Third, the lower end of the vertical straight lines contacts the slanted line of the ground plane. Fourth, the vehicles also appear as straight lines in the U-disparity image. By combining the above features, the vehicles can be located in the image and in the world coordinate system.
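As referenced in task (b), the accumulation reduces to a pair of histograms, as in the following sketch; an integer disparity map with negative values marking invalid pixels is assumed.

```python
import numpy as np

def u_v_disparity(disp, d_max):
    """Build U- and V-disparity images from an integer disparity image.

    disp: H x W disparity map with values in [0, d_max); negatives = invalid.
    u_disp[u, d] counts the pixels of column u having disparity d, and
    v_disp[v, d] counts the pixels of row v having disparity d. A vehicle
    shows up as a line at its disparity; the ground plane appears as a
    slanted line in v_disp.
    """
    h, w = disp.shape
    u_disp = np.zeros((w, d_max), dtype=np.int32)
    v_disp = np.zeros((h, d_max), dtype=np.int32)
    for v in range(h):
        for u in range(w):
            d = disp[v, u]
            if 0 <= d < d_max:
                u_disp[u, d] += 1   # accumulate over the column
                v_disp[v, d] += 1   # accumulate over the row
    return u_disp, v_disp
```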
1.2.2.3 Multiple Cameras
According to [80, 81], although 3D information can be recovered from pairs of stereo images by stereo matching, there are several advantages to using more than two cameras, as follows:

1. Repeating texture in the input stereo pair makes the determination of corresponding pixels more difficult and creates large matching errors. Such errors can be reduced by using more than two cameras, with a careful arrangement of the cameras.

2. Stereo cameras with a short baseline produce small matching errors, but they cannot determine the distance of far objects. In order to detect far objects, stereo cameras with a long baseline are preferred. However, long-baseline stereo cameras produce large matching errors. Therefore, it is reasonable to combine a long and a short baseline together.

3. Occlusion is one of the causes of large matching errors. Adding a new camera is expected to reduce such errors, because points that are occluded for the original cameras can be observed by the new camera.
In the case of using more than two cameras, the arrangement of the locations of the cameras is important. In [81], Todd et al. used three cameras in an "L"-shaped arrangement: the first two cameras had a baseline of about 1.2 m, and the third camera was displaced about 50 cm horizontally and 30 cm vertically. For performing multi-baseline stereo, the matching cost of one pixel in the image selected as the reference was computed by accumulating the cost of matching the point in the reference image with each candidate corresponding point in the remaining images. They reported that the system could detect objects as small as 14 cm at ranges in excess of 100 m.
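The cost-accumulation idea can be sketched as follows for rectified, horizontally displaced cameras; the SSD measure, the scaling of a disparity hypothesis by the baseline ratio, and all parameters are illustrative assumptions rather than the setup of [81].

```python
import numpy as np

def multi_baseline_cost(ref, others, ratios, d_max):
    """Accumulate per-pixel matching costs over several baselines.

    ref:    reference image (H x W, float).
    others: images from the remaining cameras (rectified with ref).
    ratios: baseline of each camera divided by the unit baseline; a disparity
            hypothesis d for the unit baseline scales to d * ratio.
    Summing costs suppresses ambiguities (e.g., repeating texture) that a
    single camera pair cannot resolve.
    """
    h, w = ref.shape
    cost = np.zeros((h, w, d_max))
    for img, ratio in zip(others, ratios):
        for d in range(d_max):
            shift = int(round(d * ratio))      # disparity for this baseline
            layer = np.full((h, w), 1e6)       # out-of-view: large cost
            if shift < w:
                layer[:, shift:] = (ref[:, shift:] - img[:, :w - shift]) ** 2
            cost[:, :, d] += layer             # accumulate over camera pairs
    return cost                                # pick argmin over d per pixel
```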
In [82], Alberto et al. used three cameras mounted on the same horizontal bar. The first two cameras had a baseline of 1.5 m, and the third camera was located in between the first two, displaced about 0.5 m from the first camera. With such an arrangement, they had three baselines: 0.5 m, 1 m, and 1.5 m. In order to simplify the computation, traditional stereo matching [83] was used for each baseline. They used this vision system for their autonomous vehicle, named TerraMax, which participated in the DARPA Grand Challenge 2005 competition. During the competition, TerraMax selected the viewing baseline based on its speed.
1.2.2.4 Summary
1. Single Camera:
Without using any a priori knowledge about vehicles, for example, color, symmetry, or texture, an exhaustive search has to move rectangular windows around the input image to probe for vehicles. The idea of the exhaustive search method is quite simple, and its accuracy totally depends on the robustness of the verification module. Obviously, the exhaustive search method is time consuming, especially when several windows of different sizes are used to detect vehicles at different distances and with different sizes. Moreover, several overlapping windows, slightly different in location or size, can be detected for a single vehicle in the input image.

Symmetry has been used by many research works. However, in order to use only the symmetry information for generating vehicle candidates, it is necessary to have a fast method for computing symmetry maps, and this is one of the most challenging tasks in using the symmetry information. Moreover, even when the symmetry maps have been computed from edge maps, they are still not reliable, because of symmetric objects in the background and because of homogeneous regions. In addition, the symmetry information can only help to determine the left and the right borders of vehicles; the location of the top and the bottom borders is still another problem.
Color has been used by a limited number of research works because the colors of vehicles are unpredictable, except for the red colors of tail-lights. Moreover, the color of every object depends on many factors, for example, illumination, reflectance properties, viewing geometry, and sensor parameters.
Shadow was exploited by many studies. The advantage of the shadow information is that it is simple to compute and can be utilized to generate vehicle candidates. However, using the shadow information incurs at least two problems. The first problem is that it is difficult to detect the shadow reliably, because the color of the shadow depends on the weather, the color of the road, and the colors of other objects. The second problem is that, even when the shadow can be detected, it is still difficult to generate vehicle candidates using only the shadow information, because the relative locations of the detected shadow and the vehicle associated with it depend on the direction of the sunlight.
infor-Corner and edges are useful information for generating and verifying vehicles.However, it is hard to select threshold values for obtaining the edges and thecorners In addition, the background objects may contain corners, and edgessimilar to vehicles’ Therefore, it is more reasonable to detect vehicles byfirst locating the road and then using the corners and the edges inside theroad area only
Using texture does not incur the problem of threshold determination as using corners and edges does. However, the textures of vehicles and of other background objects may be similar, and it is difficult to separate the vehicles from the other background objects by using only the texture information.
Vehicle lights are easily recognizable features for detecting vehicles, especially at night-time. In addition, the red color of tail-lights is another useful feature. However, associating the left and the right vehicle lights is a challenging problem, because too many other background objects have red colors or a brightness similar to vehicle lights'.
The advantage of subtraction-based features is that they can be computed quickly and provide a good cue for generating vehicle candidates, especially when the vehicles have a relatively high velocity. However, it is imperative to have a reliable method for modelling the background image.
Although optical flow provides rich information for generating vehicle candidates, its computation is time-consuming. It is also difficult to obtain reliable optical flow because of the following situations: the large displacement of corresponding pixels in consecutive frames caused by a high velocity of the host vehicle, the lack of texture in the input image, and the shock and vibration of the camera during the movement of the host vehicle. In addition, neither optical flow nor subtraction-based features can be used for detecting parked vehicles.
More importantly, none of the single-camera approaches mentioned above is able to estimate the distance from the host vehicle to frontal objects. Unfortunately, this capability is necessary for building intelligent and autonomous vehicles.
2 Stereo Camera:
Both of the approaches using stereo cameras mentioned above are able to estimate the distance of frontal objects. Compared to radar sensors, stereo cameras can provide a wider field of view, better lateral accuracy, and lower cost.
In IPM, the accuracy of vehicle detection totally depends on the accuracy of the detection of triangles in the difference image. In the ideal situation, IPM produces well-separated triangles in the difference image, so the triangles are easily detected from a polar-based histogram. However, IPM is very sensitive to the height and the pitch of the stereo cameras, and these two parameters change continuously while the vehicle moves. In combination with the texture and homogeneous regions in the rear view of the vehicles, the variation of the height and the pitch of the stereo cameras adds more noise to the difference image, so it becomes difficult to detect the triangles. In addition, although IPM provides a tool to determine the distance and the left, the right, and the bottom borders of the vehicles in the input image, detecting the top border of the vehicles is another challenge. Performing IPM is also a time-consuming task.
The advantages of the U- and V-disparity images are as follows. First, they provide information about the ground plane, namely its profile and the disparities associated with it. Second, the bottom and the top borders of frontal vehicles can be accurately determined by considering the V-disparity image. However, it is difficult to determine the left and the right borders of the vehicles, because the profile of the vehicles in the U-disparity image is discontinuous. (A minimal sketch of how U- and V-disparity images are constructed is given after this list.)
However, the U- and V-disparity image approach still has some challenges, as follows:
• In theory, the longer the baseline of a stereovision system, the farther the range the system can achieve. However, a longer baseline also means a larger disparity search range, since the disparity of a point at depth Z is d = fB/Z for focal length f and baseline B. As a consequence, the time for computing disparity images from stereo pairs captured with a long baseline increases. This problem is one of the challenges in using stereovision.
• The detection of vehicles by using U- and V-disparity images totally depends on disparity images, so increasing the quality of disparity images is another challenge, especially in the case of homogeneous regions in the input stereo pairs, which are quite common in real conditions.
• The bottoms of vehicles are determined at their contact points with the ground plane, so it is necessary to have a reliable method for estimating the ground plane, especially in the case of non-flat roads.
• Accurately locating the left and the right borders of vehicles is important because it affects the accuracy of the verification step. None of the investigated studies provided a way of locating and extracting the whole view of the vehicles.
3 Multiple Camera:
Generally, using more cameras can provide better accuracy in recovering 3D information, except in some cases where the additional cameras still cannot see occluded object points. However, adding more cameras also means that a higher cost has to be paid for the vision system and that the computation
time increases as well. Even in recent studies, stereo matching can be performed in real time only for stereo images of fairly small sizes and with small disparity search ranges. Therefore, using multiple cameras is impractical at present.
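Because U- and V-disparity images recur throughout this comparison, a minimal construction sketch may help make the discussion concrete: each is simply a histogram of the disparity image, taken column-wise (U-disparity) or row-wise (V-disparity). The NumPy code below is illustrative only; the names and the explicit loops are chosen for clarity rather than speed, and it is not the implementation proposed in this thesis.

```python
import numpy as np

def u_v_disparity(disp, max_d):
    """Build U- and V-disparity images from a disparity map.

    U-disparity: for each column u, a histogram of the disparities in that
    column.  V-disparity: for each row v, a histogram of the disparities in
    that row.  The ground plane appears as a slanted line in the V-disparity
    image, while vertical obstacles such as vehicles appear as horizontal
    segments in the U-disparity image."""
    h, w = disp.shape
    u_disp = np.zeros((max_d, w), dtype=np.int32)
    v_disp = np.zeros((h, max_d), dtype=np.int32)
    for v in range(h):
        for u in range(w):
            d = int(disp[v, u])
            if 0 < d < max_d:            # skip invalid/zero disparities
                u_disp[d, u] += 1
                v_disp[v, d] += 1
    return u_disp, v_disp
```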
1.2.3 Fusion of Sensors
1.2.3.1 Radar + Camera
Giancarlo et al. [2] used a radar to roughly estimate the distance from the host vehicle to frontal objects and a camera to capture the scene in front of the host vehicle. Both the radar and the camera were calibrated, so whenever a frontal object was detected by the radar, its location was converted to the image captured by the camera. Regions of interest in the image were then generated and verified using the symmetry information. Using radar information to limit the area to be searched for frontal vehicles was also employed in [3].
On the other hand, in [4], whenever a frame was captured by a camera, the corresponding radar information at that time was obtained and used to model a probability function P(Z), which gives the probability that a vehicle appears at a depth Z. Another probability function P_Z(x, y) was also computed for each frame; P_Z(x, y) gives the probability that a vehicle appears at location (x, y) for a given depth Z, and it was computed from image data. Local maxima were determined from the product of the two probability functions, i.e., P_Z(x, y) * P(Z), and the locations of vehicles were detected from these local maxima.
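A minimal sketch of this fusion step, under the assumption that P(Z) has been discretized into depth bins and that P_Z(x, y) is available as one map per bin; the function name, array layout, and detection threshold are illustrative, and SciPy's maximum_filter is used here merely as a convenient local-maximum test.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def fuse_radar_and_image(p_z, p_xy_given_z):
    """Combine a radar-derived depth prior with image-based location maps.

    p_z          : 1-D array, P(Z) for each discretized depth bin Z
    p_xy_given_z : 3-D array of shape (Z, H, W), one P_Z(x, y) map per bin
    Returns the (z, y, x) indices where the product P_Z(x, y) * P(Z) is a
    local maximum above a threshold, i.e. candidate vehicle locations."""
    product = p_xy_given_z * p_z[:, None, None]      # P_Z(x, y) * P(Z)
    local_max = maximum_filter(product, size=5)      # neighborhood maxima
    # Threshold (half the global peak) is an arbitrary illustrative choice.
    peaks = (product == local_max) & (product > 0.5 * product.max())
    return np.argwhere(peaks)
```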
1.2.3.2 LIDAR + Camera
Most of the studies on the fusion of LIDAR and camera [8, 9, 12] performed vehicle detection with the following steps. LIDAR sensors were used to measure the 3D information of frontal objects. The measured 3D information was converted to image coordinates; before this conversion, both the LIDAR sensors and the camera were calibrated. Vehicle candidates were generated by using the 3D information from the LIDAR sensors. After that, the candidates were verified by using pattern recognition techniques, for example, an Adaboost classifier.
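A minimal sketch of the conversion step these studies share, assuming a standard pinhole camera model with known LIDAR-to-camera extrinsics (R, t) and intrinsic matrix K; the names and array layout are illustrative assumptions.

```python
import numpy as np

def project_lidar_to_image(points_lidar, R, t, K):
    """Project LIDAR points into the image plane of a calibrated camera.

    points_lidar : (N, 3) 3-D points in the LIDAR frame
    R, t         : rotation (3x3) and translation (3,) from LIDAR to camera
    K            : camera intrinsic matrix (3x3)
    Returns (M, 2) pixel coordinates; points behind the camera are dropped."""
    pts_cam = points_lidar @ R.T + t          # LIDAR frame -> camera frame
    in_front = pts_cam[:, 2] > 0              # keep points in front of camera
    pts_cam = pts_cam[in_front]
    proj = pts_cam @ K.T                      # apply intrinsics
    return proj[:, :2] / proj[:, 2:3]         # perspective divide -> pixels
```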
1.2.3.3 Summary
Obviously, combining active sensors and passive sensors can provide a better ability of detection and classification. However, it also requires extra cost. In addition, a workable calibration method is required for relating the points detected by the active sensors to the corresponding positions in the images captured by the passive sensors.
1.3 Scope of the Thesis

As discussed in Section 1.2, the LIDAR sensor, especially the 3D laser scanner, is the best method for producing high-quality 3D information about the scene surrounding the host vehicle. However, passive sensors are selected in this thesis for the following reasons. First, passive sensors that can produce high-resolution images at a high frame rate are now available at a reasonable price, whereas high-quality 3D laser scanners still cost several thousands of US dollars. Second, passive sensors provide plenty of visual information that is useful for further steps such as vehicle classification and pedestrian detection. Third, passive sensors do not cause the interference that is a problem when using active sensors.
For the passive-sensor approach, Inverse Perspective Mapping is not selected, because it works well only with long-baseline stereo cameras and because it requires the correct pitch angle of the stereo cameras.
The selected approach for detecting vehicles, i.e., combining V-disparity images and U-disparity images, needs to solve three tasks: (1) computing disparity images, (2) estimating ground planes, and (3) locating vehicles. The problems that are solved for each task are given in the following.
1 For computing disparity images: Reducing the time for computing disparity images and enhancing the quality of the disparity images are two important problems, and both are selected to be solved in this thesis. The method for reducing the computation time is proposed and explained in Chapter 3. The approaches for enhancing the disparity images are proposed and presented in Chapter 4.
2 For estimating ground planes: Based on the disparity images computed by the proposed approaches, a new method is proposed that is able to reliably estimate ground planes from low-texture stereo images. The proposed method is also able to work with non-flat roads, which is a difficult case in the ground plane estimation task. The proposed method for estimating the ground planes is presented in Chapter 5.
3 For locating vehicles: The view of frontal vehicles should be extracted as exactly as possible in order to increase the performance of the vehicle verification task in vehicle detection. This challenge is solved in this thesis, and the proposed method is presented in Chapter 5. (A short note on recovering a located vehicle's range from its disparity follows this list.)
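As a short note on task (3): once a vehicle has been located, its range follows from the standard stereo relation Z = fB/d. This is general stereo geometry rather than anything specific to the proposed method, and the numbers in the sketch below are illustrative only.

```python
def depth_from_disparity(d_px, focal_px, baseline_m):
    """Standard stereo range equation Z = f * B / d: depth is proportional
    to the baseline and inversely proportional to the disparity."""
    if d_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / d_px

# Illustrative numbers only: with f = 800 px and B = 0.5 m, a vehicle
# observed at a disparity of 10 px lies at Z = 800 * 0.5 / 10 = 40 m.
print(depth_from_disparity(10.0, 800.0, 0.5))   # -> 40.0
```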
1.4 Structure of the Thesis

This thesis contains seven chapters. In Chapter 1, the author defines the problems that are selected to be solved in the thesis. The related works are introduced in Chapter 2. The next three chapters, i.e., Chapters 3, 4, and 5, are reserved for explaining the proposed techniques. The experimental results and comparisons are presented in Chapter 6. Conclusions and some further improvements are given in Chapter 7. The content of each chapter is as follows:
Chapter 1: First, the author introduces the reasons that motivated the choice of "stereo-based vehicle detection" as the topic of this thesis. Next, challenging problems in the selected topic are identified. Finally, the thesis's scope is defined by listing the selected problems that are solved in the thesis. In addition, the structure of the thesis is also presented in this chapter.
Chapter 2: This chapter introduces the related works of the selected topic. The proposed vehicle detection framework consists of three main tasks: computing disparity images from incoming stereo pairs, estimating the ground plane from the computed disparity images, and locating vehicles in the input images. Each of these tasks corresponds to a research area. Therefore, the content of this chapter is as follows. In the first section, the terminology, the procedure, and the challenges of stereo matching are introduced. In the second section, the geometry of stereo cameras and U- and V-disparity images are first presented.
After that, existing methods for estimating the ground plane are reviewed. In the third section, the way to combine the estimated ground plane with U- and V-disparity images is given. Another stereo-based method, which is based on grouping disparities, is also explained in this chapter.
Chapter 3: This chapter thoroughly explains the proposed coarse-to-fine approach for efficiently computing disparity images. The chapter begins with a short introduction to the idea of the proposal. Next, new concepts related to "sampling the disparity space" that are proposed in this thesis are explained and illustrated. After that, four sections explain all of the steps in the proposed approach. Some discussions are also added as the summary of this chapter.

Chapter 4: This chapter explains the proposed approaches for reliably computing disparity images from low-texture stereo images. The chapter begins with a short introduction to the proposal's idea. After that, an explanation of each task in the proposed approaches is given. The three main tasks are as follows: (a) computing pixel-matching costs, (b) aggregating costs, and (c) computing disparity images. In (a), a new cost function proposed in the thesis is presented. In (b), a new cost aggregation method that uses edge maps during aggregation to achieve robustness against low-texture regions is thoroughly explained. In (c), a new disparity image computation based on the dynamic programming technique and ground control points is introduced. This chapter ends with a short discussion as its summary.
Chapter 5: This chapter explains the proposed method for stereo-based ground plane estimation and vehicle detection. The chapter begins with a short introduction to the proposal's idea. After that, ground plane estimation and vehicle detection are presented. For ground plane estimation, the procedure for obtaining U- and V-disparity images is first described, and the dynamic programming based method for estimating the ground plane is then introduced. Finally, the method for combining U- and V-disparity images to locate vehicles is explained. This chapter ends with a short discussion as its summary.
Chapter 6: This chapter presents the experimental results obtained in this thesis. The results are organized into three parts, corresponding to each proposal in the thesis. First, the proposed coarse-to-fine approach for computing disparity images is evaluated by comparison with a well-known test-bed in stereo matching. Next, the robustness of the proposed approaches for computing disparity images