Geometrical environment understanding by building recognition for the intelligent transportation and robot systems



By Hoang-Hon Trinh

SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

AT UNIVERSITY OF ULSAN ULSAN, KOREA DECEMBER 2008

© Copyright by Hoang-Hon Trinh, 2008


requirements for the degree of Doctor of Philosophy.

Dated: December 2008


Department: Graduate School of Electrical Engineering

Degree: Ph.D Convocation: December Year: 2008

Permission is herewith granted to University of Ulsan to circulate and to have copied for non-commercial purposes, at its discretion, the above title upon the request of individuals or institutions.

IS CLEARLY ACKNOWLEDGED.


Acknowledgements xvi

0.1 Introduction of ITRS 1

0.2 Building and Environment (A Good Landmark for ITRS) 2

0.3 Building Recognition for Localization 4

0.4 3D Reconstruction Environment for Navigation, Mapping and Exploring 6

0.5 Proposed Method for ITRS 6

0.6 Data Sets 8

0.6.1 ZuBuD Data Set 9

0.6.2 UlBuD01 data set 9

0.6.3 UlBuD02 data set 10

0.7 Unification of Words, Phrases and Definitions in This Dissertation 10

1 Building Detection 12

1.1 Introduction 12

1.2 Line Segment Detection 14

1.2.1 Detecting Line Segment 14

1.2.2 Model of Line Segment (MLS) 15

1.3 MSAC-based Calculation of Dominant Vanishing Points (DVPs) 19


2 Building Recognition 40

2.1 Introduction 40

2.2 Area of Building Facet 43

2.2.1 Wall Color Histogram (WCH) 43

2.2.2 Localized Color Histogram [81] 47

2.3 Local Features of Building Facet 50

2.3.1 SIFT Description 51

2.3.2 Rectangular Shape and Local Features of Building Facet 53

2.4 Data Training 56

2.4.1 Matching and Constraints 57

2.4.2 Canonical RANSAC and Hough Transform-based Verification of Correspondences of Image Pairs 58

2.4.3 Cross Ratio-based Verification of Correspondences of Image Pairs 64

2.4.4 Geometric Normalization 67

2.4.5 SVD-based Method for Calculating the Approximate Vectors 71

2.4.6 A Common Model 74

3 Geometric Analysis for 3D Reconstruction of Building 77

3.1 Introduction 77

3.2 Principal Component (PCs) Detection 78

4 Experiments 81

4.1 Experimental Building Detection 81

4.2 Experimental Building Recognition 86

4.2.1 Experimental Recognition of ZuBuD data 87

4.2.2 Experimental Recognition of UlBuD01 data 89

4.2.3 Experimental Recognition of UlBuD02 data 93


2.1 The distances of histograms (d_i, i = 1, 2, ..., 5) between the test and the stored images from left to right, respectively 50

2.2 Corresponding parameters for estimating DVP and homography matrix 59

2.3 Explanation for Fig.2.17 67

2.4 Selected γ for updating wall color histogram and local features 72

3.1 Estimated size of buildings in Fig.3.2 80

4.1 Test conditions for detecting building’s facets 83

4.2 Summary of building detection 86

4.3 Explanation for each sub-image in Fig.4.7 89

4.4 Results of building recognition 90

4.5 Comparing the size of database for each building 95


1.2 An example of line segment detection: (a) Original image in ZuBuD data set; (b) 554 detected line segments, the red lines overwrite on the original image; for easy vision, in (c), the black lines are overwritten on the blurred image with linear transformation [min, max] (values of pixels) → [125, 255] 16

1.3 MLS: Example image is taken from ZuBuD data set; (a) Neighbored regions; (b) line segment detection; (c) building and non-building segments are selected by handling 17

1.4 Distribution of ΔI, σ_m of the 200 first sampled segments and selected thresholds 18

1.5 Using MLS to reduce the noise from natural object regions or images 19

1.6 The angle between the segment and the line which lies through the segment's middle point and vanishing point 24

1.7 Retrieval of broken edges: (a) the vertical segments which create an acute angle 20° in maximum with y-axis; (b) the right edge split into four segments replacing by the blue one 25

to low priority 28

1.10 The single line: (a) around region of line segment; (b) relation between length of line segment and diameter of semicircles; (b-f) survived segments 31

1.11 Co-existences and extended line segments 32

1.12 Candidate line: (a) vertical lines and horizontal segments (ZuBuD data image); (b,c) the intersections and estimated height at each vertical line; (d,e) candidate lines of horizontal groups, H1 and H2 respectively; magenta lines are the candidate for both of horizontal groups 35

1.13 Other examples for roughly detecting of building facets in step by step: (a-h) come from one building; (i) comes from another one; Blue group in (h) is rejected because it is not passed the thresholds N and h0 36

1.14 Facet detection: (a) four roughly detected facets; (b) three facets survive after rejecting the ambiguities; (c) result of facet detection 37

1.15 Accuracy of Facet's Boundaries: the first row are the results of rough detection of facets; the second row is final results of facet detection 39

2.1 Convex quadrangle as the boundary of detected facet 43

2.2 Illustration for detecting wall region; (g) final wall color histogram 44

2.3 Extracted wall region of corresponding facets of one building in ZuBuD data set 45


a keypoint and its approximate region is represented by the size of circle 53

2.9 Selected keypoints and features: (d) is zoomed in regions of yellow rectangles of (c) 54

2.10 (a) two detected and corresponding transformed facets; detected keypoints (red marks); (b,c) correct matches with the original and transformed images, respectively 55

2.11 Repeated features of building: the green circles are correct match and five repeated features; their distances are approximate together; the images are in ZuBuD data set 56

2.12 d0 threshold selection by statistics 58

2.13 Illustration of a drawback of canonical RANSAC for verification: (a) 104 matches of building facets; (b) the best sample given by using homography matrix; (c) the correct sample 60

2.14 Examples of object recognition by using Hough transform-based method; in each image, the small one is training image and the big one is test image 62


2.18 Relation between factor α and maximum area of detected facets, A 70

2.19 Automatically reducing the noise by SVD-based update of features 71

2.20 The observing distances are reduced following the updated times with different γ 73

2.21 Building images, facet detection, wall regions, wall histogram and common models 75

3.1 Step-by-step illustration of PC detection process 79

3.2 Typical results of detection of principal component 79

4.1 Examples of facet detection in general test conditions: two first rows are illustrated results of ZuBuD data ; two last rows are the test images of our data 82

4.2 The detection by movement of ITRS: (a) and (b) are undetected and detected building, respectively 83

4.3 Examples of non-building images 84

4.4 Confused detection of building images 84

4.5 Examples of detection of non-planar buildings 85

4.6 Several worst results of building detection 85


descending rank from left to right 92

4.11 Two cases of incorrect results: the left are the test images and right are their correspondent images with five models of views; the shown facets includes the ambiguous detections 93

4.12 From left to right, The results of without, 10 and 20 times of update, respectively 94

4.13 Without Transforming into rectangular shape, from left to right, The results of without, 10 and 20 times of update, respectively 95


city. The outer surface of a building has some special properties of man-made objects, such as rectangular shapes, doors, windows, walls and columns. These characteristics provide the information for distinguishing buildings from other objects and from each other. The dissertation comprises three major parts that form a hierarchical system for environment understanding in intelligent transportation and robot systems (ITRS).

The first part of this thesis is for detecting landmarks. Buildings are separated from other objects like sky, trees, bushes and roads. Firstly, line segments and their two neighborhood regions are extracted. A model of line segment (MLS) is constructed from the color information of the neighborhood regions. MLS is used to reduce the line segments of non-building patterns. Secondly, the remaining line segments are clustered into groups of parallel lines which have a common vanishing point (VP) by the MSAC (m-estimator sample consensus) algorithm. The maximum numbers of VPs calculated for the vertical and horizontal directions are one and five, respectively. The vertical cluster and one of the horizontal clusters create a mesh of convex quadrangles (skew parallelograms) as a candidate face of a building. Geometrical properties like the distribution density of line segments and the number of intersections are analyzed and used as criteria to refine the building face. Finally, the building facets are detected and represented by a boundary

the wall color histogram and the area of a test facet, a small number of the stored facets in the database is chosen as a set of candidates (sub-candidates). The second step refines the correct match. The local feature vectors of the test facet are compared against the stored vectors of the sub-candidates. The correct match is then used to update the database. Here, we propose a new method, cross ratio-based refinement, for exactly verifying the correspondences of image pairs. By updating the database from the correspondences, we reduce the size of the database and decrease the noise in the matching process, so that the method increases the recognition rate. With the learned database and exact recognition process, the ITRS can localize and navigate in the environment.

In the last part, the geometrical information of the building's principal components (PCs) such as doors, windows and wall regions is analyzed in more detail. Here, the PCs are detected and then the edges and lengths of windows are accurately estimated. The relative geometrical information of the PCs, like quantity, height and alignment, is computed. This geometrical information is available for reconstructing a 3D model of the building, so that the system gains a more semantic understanding of the environment that the ITRS explores.


teacher and good professor.

I would like to thank Ulsan Metropolitan City and the MKE and MEST of the Korean Government, which partly supported this research through the NARC and post BK21 project at University of Ulsan.

I was so fortunate to spend a long time abroad devoted only to my research. This was made possible by my family's consistent help. They have always been praying for me to achieve the goal. In particular, I could not have completed the full course of the Ph.D. program without the persistent patience and encouragement of my wife, Thi-DoanTrinh Vo. She has managed and brought up our young son alone.

Of course, I am grateful to my parents for their patience and love. Without them this work would never have come into existence (literally).

Finally, I wish to thank Taeho Kim, Hyun-Deok Kang, Dae-Nyeon Kim, Sung Lee, Woong Park, Phuong Trinh Pham Ngoc, Andrey Vavilin, Heechul Lim, Hyun-Uk Chae, Suk-Ju Kang, Sang Hee Lee, Sung-Woo Song, and my laboratory friends. I cannot forget their kind friendship.

December 26, 2008


mapping, navigation and exploration. These functions form a hierarchical system for the running and working of the ITRS. They are closely related to each other, so it is better if one function can inherit from and be helped by the others. For example, to determine where it is, the ITRS usually uses three functions, landmark detection, recognition and localization, in two steps [15, 20, 40, 70, 76]. The first step is for learning the environment: the system takes some images and detects the landmarks as special objects or regions. Each landmark is encoded into a special feature (normally a vector or a list of vectors) and closely connected by the designer to the specific information of the relevant location. The features and information are stored as a database in the memory of the system. The second step is for working: the system takes a new image. The features of the new image are detected and matched against the database. Then the information of the best match helps the system answer where it is. In this dissertation, we build a database of outdoor scenes with the expectation that it can be shared and used by every function. In particular, three major functions, landmark detection, recognition and 3D reconstruction, are discussed in detail.


outer surfaces, rectangular shape of surface, windows and doors, many orthogonal and parallel line segments and so on. Therefore, most researchers of intelligent vision systems select the building as a principal element for analyzing the outdoor scene, in particular the urban environment. All functions of ITRS can be based on buildings, for example, building-based localization [16, 20, 70], building-based navigation [19, 29, 32, 51, 55], building-based 3D reconstruction of the environment [9, 33, 34, 65] and so on.

A building is a man-made object with a rich geometrical structure, so it is easily detected (classified from other objects) in the scene. For example, many line segments are located on the building regions. The line segments are distributed into two or three principal directions. The principal directions are orthogonal to each other. In the 2D image, the parallel segments have a common vanishing point. Major features such as parallel lines, “L” junctions, “U” junctions, “significant” parallel groups, etc. [25, 26], or varying slopes, horizontal, vertical and cross edges [29, 32], or inter-cluster and intra-cluster relations [36] are used to describe the distribution and relation of line segments located on the building, so that the building is detected.

Major challenges of building and environment analysis are listed as follows,

• Buildings appear densely in the urban environment, so each image usually contains multiple buildings, Fig. 1(a).


is sometimes confused.

• Buildings and outdoor scenes are affected by the sunlight and sky, so the illumination changes largely. In particular, some components of a building change their color by reflection, as the windows in Fig. 1(d).

(a) Multiple buildings in each image (b) Appearance is sensitive to the viewpoint

(c) Other manmade objects (d) Large change of illumination by the sunlight

Figure 1: Examples of challenges of building and environment analysis


Based on the technique of feature extraction, recognition is separated into three individual methods: appearance-based, geometry-based and local feature-based methods [42]. The geometry-based method [26, 43, 48, 71, 77] represents the object by geometrical features such as shape, lines, segments, circles, etc. and/or their junctions. It is effective for distinguishing a building from other objects (classification or detection of buildings, more detail in section 0.7), but not highly effective for identifying buildings from each other, because almost all buildings have similar geometrical components such as doors, windows, columns and walls. The appearance-based method [56, 57] represents an image or sub-image (window, region) by a feature vector, normally a histogram. The histogram is encoded from texture, pixel color, etc. The recognition result of the appearance-based method is limited in the cases of occlusion, cluttered background or when a part of the object changes its color (as mentioned in the 4th challenge of section 0.2). When the system works with a large database, recognition is performed in two or more steps. The first step selects a set of sub-candidates for matching. The last step refines the best match from the sub-candidates. The appearance-based method is suitable for the first step [80, 81]. The local feature-based method describes the object by a list of detectors and their descriptors. A detector is an interest point or a segment which is invariant with scale,


methods, the scale invariant feature transform (SIFT) algorithm [38, 39] achieved good results for most of the test conditions [46]. Therefore, in this dissertation, the SIFT descriptor is used for representing the building.

A major drawback of the local feature-based method is that many descriptors are stored in the database. One object appears in several poses (e.g. 5 poses [79]). For each pose, hundreds or thousands of descriptors are detected and stored. This causes several problems such as increased mismatches and random noise, long computational time and so on, so the recognition rate decreases. To overcome those problems, many techniques were proposed, such as using constraints for matching [7, 39, 74, 81], selecting strong identifications for storing [15, 16, 20], compressing the database by the principal component analysis (PCA) method [28], using geometrical position (ordered distribution) for verifying the correct matches [39], and clustering similar features for training the database (orderless distribution) [24, 41, 73]. In our case, we used geometrical position for training the database. It is suitable for ITRS to not only overcome the limitation of the local feature-based method but also adapt itself to changes of the environment, for example, different times of day or seasons.


considered as a core function of these functions. The reconstruction function has been deeply analyzed and researched by many methods. The effect and strategy of each method depend on exploiting the geometrical analysis and the supporting data from the camera. For example, Fruh et al. [17] used laser scans and camera images; Cornelis et al. [9], Goedeme et al. [19] and Werner et al. [72] used multiple-camera systems.

Recently, a trend toward using a single camera or a single image has been proposed [1, 11, 35]. This method needs to deeply analyze the geometrical information like line segments (edges), vanishing points, geometric constraints (parallelism, perpendicularity), the ground plane, the height of a certain point in the scene and so on. Fortunately, when we constructed the data for the detection and recognition functions, we collected much of this geometrical information. Therefore, the last discussed method is the one used in this dissertation for 3D reconstruction of buildings.

In this dissertation, we concentrate on solving several problems for the core functions of the ITRS, such as how to exactly detect buildings, recognize buildings and analyze the geometrical structure of buildings. We construct a database for the ITRS which can be used by almost all of the functions. The effectiveness of the core functions is also a major concern. In the landmark detection function, building images are classified from non-building images and,


Figure 2: Building detection and

of building for training the database. Here, the exact verification of correspondences between image pairs is necessary for updating the database. To do so, a new technique


face. A building is described by several faces or facets. Each facet is represented by a model including an area, a wall color histogram and a list of SIFT descriptors. Given a training image (as a new image), if the matching process succeeds then the correspondences of the image pair are verified by the cross ratio-based method. The correspondences are used to update the common model. At the beginning, the common model is the first appearance of each building facet. After training, the database is used for recognizing a new image when the ITRS is working. The recognition rate is checked from the best match by the user.

This dissertation uses three data sets from different CCD cameras with small distortion; that is, a line in the real world is also a line in the image when it is projected by the cameras. The first data set, the ZuBuD data set, was made by H. Shao et al. [60]. Here, each image tacitly contains a single building. Each building is covered by a subset of images. The subset contains several training images and may have one or several test images under different conditions. The last two data sets were made by ourselves. In our case, each image may contain several buildings including a building of interest. The building of interest must appear in all images of its subset. The images in all data sets are taken under general conditions such as scales,


images, every 5 images is used for representing each building. Several buildings do not appear upright; they are rotated 90° before processing. Several images have a size of 320 × 240 (or 240 × 320) pixels; they are resized to 640 × 480 (or 480 × 640) pixels. The ZuBuD data set is used for the building detection and recognition processes.

0.6.2 UlBuD01 data set

The UlBuD01 data set contains 680 building images and 200 non-building images. All images were taken in Ulsan metropolitan city in South Korea. Here, 580 building images are taken from 100 buildings of interest and their neighbors. Among them, 500 images, 5 images for each building of interest, are used for training the database as in the ZuBuD data set, and 80 images are used for the test set. The remaining 100 building images are taken under harder conditions, especially in the scale of appearance and illumination change. The 200 non-building images focus on many other objects of the outdoor scene like the sky, trees, roads and so on. Compared with the ZuBuD data set, the conditions of the UlBuD01 data set are harder, such as the presence of non-planar and smooth building surfaces, the reflection of glass faces and so on. The UlBuD01 data set is used for the building detection and recognition processes.

selected from 100 buildings in UlBuD01.

The size of all images in the UlBuD01 and UlBuD02 data sets is 640×480 (or 480×640) pixels. Several images of the UlBuD01 and UlBuD02 data sets are used for analyzing geometrical information for the 3D reconstruction of buildings. The UlBuD02 data set is used only for the recognition process.

0.7 Unification of Words, Phrases and Definitions in This Dissertation

In computer vision, several words, phrases and definitions have a relative meaning. For example, “building recognition” sometimes means classification between building images and non-building images [36], and sometimes means identification of buildings from each other [81]. To easily discuss and compare our method with the others, we unify the commonly used words, phrases and definitions as follows,

• Data set: A set of images containing a training set and a test set.

• Database: The database is the specific information of the training set after training and extracting the features of objects. Each image or object in the database is described by a set of features, for example, wall color histogram, local features and so on. The features of each object are indexed to the training images. Several


region by a convex quadrangular boundary.

• Building recognition: A process for identifying buildings from each other. Given a test image, the features are extracted and then matched against the features of the database.

• Best match: The best match (or the closest pose) is the result of the recognition process with the strongest response to the matching constraints or cost functions.

• Correct match: The correct match is a result of the recognition process which is checked by the user.


A building is a man-made object with rich geometrical structures, so most researchers of intelligent vision systems select the building as a principal element for analyzing the outdoor scene of the urban environment. The depth and width of the analyzed geometrical features directly affect the application of vision-based intelligent systems.

Some previous works concerned a dominant appearance of buildings like facades, saliency and so on. Garcin et al. [18] detected buildings by a Markov object process for aerial photography, where the roof is used to represent the buildings. Yanyun et al. [75] relied on the saliency of the building against the sky to detect the building. This approach works well for salient buildings such as a single skyscraper. Contrary to [75], Madhavan et al. [40] used the ground surface as prior knowledge to detect buildings with LADAR data. The works of Iqbal et al. [25, 26] used line segments and their junctions to retrieve building images. By using the density probability of junctions, building and non-building images are distinguished from each other. A limitation of the above methods is that the analyzed data and features are used only


function can be used to combine with another function. One of the major applications of this direction is the combination of detection and 3D model reconstruction. The building is considered as a cubic structure with two orthogonal surfaces, so three DVPs are estimated [1, 2, 11]. With the orthogonality of building surfaces, the methods can work with both single and multiple images. In particular, the work of Rother [52] used the cubic structure to improve the method for detecting vanishing points. When the buildings appear in images with one surface, the reconstruction function is solved with multiple images and/or with user assistance [8, 23, 33]. A major drawback of these approaches is that each image tacitly contains a single building.

In our case, buildings are detected and identified by one or several faces or facets. Each facet is described by a quadrangular boundary in the 2D image. The line segments are detected and coarsely selected by MLS. Then, the DVPs of the segments are estimated by the MSAC method, and they are verified by the natural geometry of building components (detail in 1.4). The intersections between line segments are enhanced and used as criteria to detect the boundaries. To solve the problem of multiple buildings in the images, a real case in the urban environment, the maximum numbers of detected DVPs are one and five for the vertical and horizontal directions, respectively.


1. The number of connected edge pixels (L) is larger than a given threshold T1.

2. The maximum distance (D) from the outer points to the straight line which passes through the end points must be less than a certain length T2. The illustration of L and D is shown in Fig.1.1(a).

In the general case, the process of line segment detection is as follows:

• Calculating and labeling the edges of the image by using the Canny detector.

• Rejecting the labels whose L is smaller than T1.

• Searching for furcation points and dividing each label into several parts without any furcation inside (Fig.1.1(b)). Special cases such as polygonal shapes, circles, etc. are divided into two parts whose ends have the maximum and minimum value of √(x² + y²), respectively. In practice, one segment may be broken into several small ones, but it will be recovered later (detail in section 1.3.2).

• For each part, calculating L and D. If they do not satisfy Eq.1.2.1, dividing this part into two new ones at the cut-off point with maximum distance D (Fig.1.1(c)), and calculating L and D again.


(a) L and D (b) Furcation point (C) (c) Cut-off point (B)

Figure 1.1: Illustration of line segment detection

A part of an edge is a line segment ⇔

\[
\begin{cases}
L \ge T_1 \\
D \le T_2
\end{cases}
\tag{1.2.1}
\]

Within a certain limit, the detection of far or close building appearances is controlled by the threshold T1. If T1 is shorter then we can detect farther buildings. When a building is too far from the ITRS, its information is of no service to localization and navigation, so this building is considered as environment. In the experiments, we choose T1 and T2 as 10 and √2 pixels, respectively. Fig.1.2(a) is an original image in the ZuBuD data set [60]. Figs.1.2(b,c) are the detection result with 554 line segments.
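The acceptance test of Eq.1.2.1 and the recursive cut at the farthest pixel can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: it assumes each edge part is already available as an ordered list of (x, y) pixels with no furcation inside, and the function names `chord_deviation` and `split_into_segments` are hypothetical. The defaults follow the reported values T1 = 10 and T2 = √2.

```python
import math


def chord_deviation(points):
    """Return (D, index): the largest distance from any pixel of the chain to
    the straight line through its two end points, and where it occurs."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    best_d, best_i = 0.0, 0
    for i, (x, y) in enumerate(points):
        if length == 0.0:
            # Closed chain: fall back to the distance to the shared end point.
            d = math.hypot(x - x0, y - y0)
        else:
            d = abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) / length
        if d > best_d:
            best_d, best_i = d, i
    return best_d, best_i


def split_into_segments(points, t1=10, t2=math.sqrt(2)):
    """Split an ordered chain of edge pixels until every part satisfies
    L >= t1 and D <= t2; parts that become shorter than t1 are dropped."""
    if len(points) < t1:          # L is the number of connected edge pixels
        return []
    d, i = chord_deviation(points)
    if d <= t2:
        return [(points[0], points[-1])]
    # Cut at the farthest pixel; that pixel is shared by both new parts.
    return (split_into_segments(points[:i + 1], t1, t2)
            + split_into_segments(points[i:], t1, t2))
```

For an L-shaped chain the sketch cuts once at the corner and returns the two straight parts as end-point pairs.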

1.2.2 Model of Line Segment (MLS)

One property of man-made and natural objects is used to coarsely reject the line segments located on natural patterns. In natural object patterns, the line segments and pixel values are randomly distributed. In man-made object patterns, a line segment is an edge of the boundary of their components where the two


(a) Original image (b) 554 segments (c) 554 segments

Figure 1.2: An example of line segment detection: (a) Original image in ZuBuD data set; (b) 554 detected line segments, the red lines overwrite on the original image; for easy vision, in (c), the black lines are overwritten on the blurred image with linear transformation [min, max] (values of pixels) → [125, 255].

components beside this segment have contrasting colors; for example, the contrast between windows and walls, doors and walls and so on. Or the line segment is an edge between two surfaces with different directions. The reflection of different surfaces toward the camera causes the intensity values in the image to differ as well. Furthermore, the pixel values of a man-made object are closely related to those of their neighbors; for example, all pixels of a wall region have similar color. This property is described by the average value and variance of the RGB color intensity.

For each segment, two neighbored regions are extracted as in Fig. 1.3(a). Four sample lines are drawn parallel to, and with the same length as, the segment of interest. They are located symmetrically on both sides of the segment of interest at distances d_min = 3 and d_max = 4 pixels, respectively. On each sample line, a set of sample points is calculated at every pixel of length. Then their coordinates are rounded to integers to get the two neighbored regions (Ω1,2).

The color information of the neighbored regions is used to learn the property of line


(a) Neighbored regions (b) Line segments (c) Sampled segments

Figure 1.3: MLS: Example image is taken from ZuBuD data set; (a) Neighbored regions; (b) line segment detection; (c) building and non-building segments are selected by handling

segment. A model of line segment (MLS) is represented by two parameters, the difference of average intensities (ΔI) and the minimum of the variance of RGB color intensity (σ_m), as follows,

segments of non-building patterns by selected thresholds ΔI ≥ ΔI0 and σ_m ≤ σ0.

Thresholds ΔI0 and σ0 are selected by empirical experiments. Here, thousands of line segments located on building and non-building patterns are manually selected from many images of the ZuBuD data set. Figs. 1.3(b,c) are one example, where the red lines are located on the edges of the building and the green ones are on non-building patterns. 90.16% of non-building segments are ruled out while 8.7% of building segments are rejected if ΔI0 and σ0 are both set to 20. Fig. 1.4 shows the distribution of ΔI and σ_m of the first 200 sampled segments.
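The sampling of the two neighbored regions and the two MLS parameters can be illustrated with the sketch below. The defining equations of ΔI and σ_m are not reproduced in this excerpt, so the formulas in the code are assumptions: ΔI is taken as the absolute difference of the mean intensities of the two regions, and σ_m as the smaller of the two regions' standard deviations averaged over the RGB channels. The function names are hypothetical, and the image is indexed as `image[y, x]` with the usual top-left array origin rather than the bottom-left origin used in the text.

```python
import numpy as np


def neighbor_regions(p0, p1, offsets=(3, 4)):
    """Integer pixel coordinates of the two regions beside segment p0-p1.
    Sample lines run parallel to the segment at the given offsets (pixels),
    with one sample point per pixel of length."""
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    length = np.linalg.norm(p1 - p0)
    direction = (p1 - p0) / length
    normal = np.array([-direction[1], direction[0]])
    steps = np.arange(0.0, length + 1e-9, 1.0)[:, None]
    base = p0 + steps * direction            # points along the segment itself
    regions = []
    for side in (+1, -1):
        pts = np.vstack([base + side * d * normal for d in offsets])
        regions.append(np.unique(np.rint(pts).astype(int), axis=0))
    return regions


def mls_features(image, p0, p1):
    """Return (delta_i, sigma_m) for an H x W x 3 image (ASSUMED definitions)."""
    h, w = image.shape[:2]
    stats = []
    for region in neighbor_regions(p0, p1):
        x, y = region[:, 0], region[:, 1]
        inside = (x >= 0) & (x < w) & (y >= 0) & (y < h)
        pixels = image[y[inside], x[inside]].astype(float)
        # (mean intensity, standard deviation averaged over R, G, B)
        stats.append((pixels.mean(), pixels.std(axis=0).mean()))
    delta_i = abs(stats[0][0] - stats[1][0])
    sigma_m = min(stats[0][1], stats[1][1])
    return delta_i, sigma_m
```

Under these assumptions, a segment would be treated as building-like when `delta_i >= 20 and sigma_m <= 20`, mirroring the thresholds ΔI0 = σ0 = 20 reported above.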


Figure 1.4: Distribution of ΔI, σ_m of the first 200 sampled segments and selected thresholds.

Then the thresholds ΔI0 and σ0 are used to roughly reject the detected line segments. In Fig.1.5, the first column shows several original images. The middle columns illustrate the detected line segments and the last column shows the surviving segments after rejecting natural object segments. On average, 75% of the line segments are ruled out in natural object images like the one in the first row of Fig.1.5. Similarly, 50% are ruled out in man-made object images. Almost all surviving segments of building images are located on man-made object regions.


(a) Original image (b) 813 segments (c) 813 segments (d) 243 segments

(e) Original image (f) 562 segments (g) 562 segments (h) 259 segments

(i) Original image (j) 1179 segments (k) 1179 segments (l) 512 segments

(m) Original image (n) 597 segments (o) 597 segments (p) 138 segments

Figure 1.5: Using MLS to reduce the noise from natural object regions or images

Throughout this dissertation, we assume that the origin of the 2D coordinate system is at the bottom left corner of the image. The axes coincide with the image boundaries.


robustly estimate the dominant vanishing points.

Suppose that a line segment in the image plane is described as l = (a, b, c)^T where a² + b² = 1 (normalization); given two lines, a common point is determined by v = l_i × l_j, where v = (v1, v2, v3)^T. So v satisfies

\[
l_k^T v = 0, \quad k = i, j \tag{1.3.1}
\]
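As a small worked illustration of Eq.1.3.1, the sketch below builds normalized homogeneous lines from segment end points and intersects them with a cross product. The helper names `line_through` and `intersection` are hypothetical and NumPy is assumed.

```python
import numpy as np


def line_through(p, q):
    """Homogeneous line l = (a, b, c) through points p and q,
    scaled so that a^2 + b^2 = 1."""
    l = np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])
    return l / np.hypot(l[0], l[1])


def intersection(l_i, l_j):
    """Common point v = l_i x l_j in homogeneous coordinates."""
    return np.cross(l_i, l_j)


# Two lines that meet at (1, 1): y = x and y = -x + 2.
l1 = line_through((0, 0), (1, 1))
l2 = line_through((0, 2), (2, 0))
v = intersection(l1, l2)
print(v / v[2])          # homogeneous point scaled so the last entry is 1
print(l1 @ v, l2 @ v)    # both residuals l_k^T v are zero up to rounding
```

When the two lines are parallel in the image, v3 is zero and the common point lies at infinity, so the division by `v[2]` should be skipped in that case.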

If two lines or segments are parallel in the 3D world, then they intersect each other in the 2D image. The intersection is called a vanishing point. From a set of N line segments in the 2D image, many intersections can be found. Among them, the intersection through which the maximum number of line segments pass is dubbed a dominant vanishing point (DVP). In practice, an error always exists when we detect line segments. Therefore, the intersection v of more than two lines is described by

\[
e_k = l_k^T v \simeq 0 \tag{1.3.2}
\]

The total of squared error,


• Input data: N segments

where t is a certain threshold. The set N_i is the consensus set of the sample and defines the inliers of N.

3. If the size of N_i (the number of inliers) is greater than some threshold T, re-estimate v by using the least squares technique for all the segments in N_i with Eq.1.3.3 and terminate.

4. If the size of N_i is less than T, select a new subset and repeat the above.

5. After K trials the largest consensus set N_i is selected, and the intersection v is re-estimated using all the points of the subset N_i.

• Output data: The DVP v and the inliers N_i
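A compact sketch of the sample-and-score loop is given below. Because the first steps of the listing and the robust cost function are not reproduced in this excerpt, the code follows the generic MSAC formulation (each segment contributes its squared error when below t², and t² otherwise) and should be read as an assumption, not as the exact procedure of this work. It also assumes a finite vanishing point, so samples whose intersection lies at infinity are skipped. The names `refit_point` and `msac_vanishing_point` are hypothetical.

```python
import numpy as np


def refit_point(lines):
    """Least-squares point minimizing sum_k (a_k x + b_k y + c_k)^2."""
    xy, *_ = np.linalg.lstsq(lines[:, :2], -lines[:, 2], rcond=None)
    return np.array([xy[0], xy[1], 1.0])


def msac_vanishing_point(lines, t=1.0, trials=200, seed=0):
    """lines: (N, 3) array of normalized lines (a^2 + b^2 = 1).
    Returns the re-estimated point v = (x, y, 1) and the inlier mask."""
    lines = np.asarray(lines, float)
    rng = np.random.default_rng(seed)
    best_cost, best_mask = np.inf, None
    for _ in range(trials):
        i, j = rng.choice(len(lines), size=2, replace=False)  # sample, n = 2
        v = np.cross(lines[i], lines[j])
        if abs(v[2]) < 1e-9:          # parallel sample: point at infinity
            continue
        v = v / v[2]
        err2 = (lines @ v) ** 2       # squared point-to-line distances
        cost = np.minimum(err2, t ** 2).sum()   # MSAC: outliers pay t^2
        if cost < best_cost:
            best_cost, best_mask = cost, err2 < t ** 2
    if best_mask is None:
        raise ValueError("no finite intersection was sampled")
    return refit_point(lines[best_mask]), best_mask
```

A fixed number of trials keeps the sketch short; in practice the trial count K would be chosen adaptively from the observed outlier ratio.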

The number K of trials is selected sufficiently high to ensure with a probability, p, that at least one of the random samples of n segments is free from outliers. Suppose


pseudo-code of RANSAC algorithm.

Adaptive estimation of K

• Input: K = ∞ and sample count = 0;

• While K ≥ sample count Do

1. Select a sample and calculate the number of inliers

2. Compute the proportion of outliers ε

\[
\varepsilon = 1 - \frac{\text{number of inliers}}{\text{total number of segments}} \tag{1.3.6}
\]

3. Compute K from ε and a given p (Eq.1.3.5)

4. Increase the sample count by 1.

• Output: sample count
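The adaptive loop can be sketched as follows. Eq.1.3.5 is not reproduced in this excerpt, so the code assumes it is the standard RANSAC relation K = log(1 − p) / log(1 − (1 − ε)^n). The callable `count_inliers` is a hypothetical stand-in for "draw one random sample and count its inliers", and keeping the smallest K seen so far is a common implementation choice rather than something stated in the listing.

```python
import math


def required_trials(outlier_ratio, p=0.99, n=2):
    """Smallest K such that, with probability p, at least one of K random
    samples of n segments is free from outliers (standard RANSAC bound)."""
    good = (1.0 - outlier_ratio) ** n   # chance that one sample is outlier-free
    if good <= 0.0:
        return math.inf
    if good >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - good))


def adaptive_sample_count(count_inliers, total, p=0.99, n=2):
    """Run the adaptive loop and return the number of samples drawn."""
    k, sample_count = math.inf, 0
    while k >= sample_count:            # loop condition as written in the listing
        eps = 1.0 - count_inliers() / total
        k = min(k, required_trials(eps, p, n))
        sample_count += 1
    return sample_count
```

For example, with half of the segments being outliers, p = 0.99 and n = 2, `required_trials(0.5)` evaluates log(0.01) / log(0.75) ≈ 16.01 and rounds it up to 17 trials.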

From the total error Eq.1.3.3, a robust cost function is defined as


i); n = 2 (size of subset, sample) (1.3.10)

To improve the result of DVP estimation, the iteratively reweighted least squares technique [54, 68] is used for the set of inliers N_i. The minimization of Eq.1.3.3 is replaced by Eq.1.3.11.

In our experiments, the number of iterations and τ are set to 10 and 1, respectively.

Most of the line segments which come from the PC edges are very short, and the Canny edge detector works with integer operations. So the directions of the detected lines probably do not coincide with the directions of the PC edges. The error from Eq.1.3.3 varies over a large range. In particular, when the DVP is located far from the segments, the orthogonal Euclidean distance from the DVP to the segments increases. In other
