An Efficient Framework for Pixel-wise Building Segmentation from Aerial Images


Nguyen Tien Quang

Hanoi University of Science and Technology

octagon9x@gmail.com

Nguyen Thi Thuy

Faculty of Information Technology, Vietnam National University of Agriculture

ntthuy@vnua.edu.vn

Dinh Viet Sang

sangdv@soict.hust.edu.vn

Huynh Thi Thanh Binh

binhht@soict.hust.edu.vn

ABSTRACT

Detection of buildings in aerial images is an important and challenging task in computer vision and aerial image interpretation. This paper presents an efficient approach that combines a random forest (RF) and a fully connected conditional random field (CRF) on various features for the detection and segmentation of buildings at pixel level. RF allows one to learn extremely fast on big aerial image data. The unary potentials given by RF are then combined in a fully connected conditional random field model for pixel-wise classification. The use of a high-dimensional Gaussian filter for pairwise potentials makes the inference tractable while obtaining high classification accuracy. Experiments have been conducted on a challenging aerial image dataset from a recent ISPRS Semantic Labeling Contest [9]. We obtained state-of-the-art accuracy with a reasonable computation time.

CCS Concepts

•Computing methodologies → Image segmentation;

Supervised learning by classification; Latent variable

models; Deep belief networks;

Keywords

Aerial image, building detection, random forest, fully connected CRF, semantic segmentation, feature extraction

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

SoICT 2015, December 03-04, 2015, Hue City, Viet Nam

DOI: http://dx.doi.org/10.1145/2833258.2833311

Detection and segmentation of building objects from aerial images is important for aerial image analysis and interpretation. Applications include cartography, 3D city modeling, land cover classification, and Internet applications. The topic has been widely researched over the last decades. The problem is challenging because of the natural complexity of terrestrial scenes and the demand for efficient processing of big image data sets.

The problem of building segmentation is difficult for many reasons. Buildings are mostly located in urban scenes, with various other objects in close proximity, such as parking lots, vehicles, streets, and trees. Some objects are occluded or cluttered. Buildings may appear in complex shapes with various architectural details. Building roofs show varying reflectance, and gray rooftops are very similar to the street layer.

With the advance of aerial imaging technology, high-resolution aerial images can be produced and made available for various tasks [8,9,18]. Aerial images are usually taken over large areas on the ground, typically a city or an urban area of hundreds of square kilometers. The ground sampling distance of aerial imagery may be as fine as 10 cm per pixel, and such a large urban area may be covered by thousands of large-format aerial photographs at high overlaps [28]. The high resolution of the images makes it convenient to analyse small objects in detail; however, processing such big image data is computationally demanding.

In this paper, we aim at a concrete task: to detect buildings at pixel level, i.e., building footprint extraction. The detection and segmentation of buildings is necessary for many tasks, such as change detection for map revision, or providing building footprints for the subsequent steps of building extraction and reconstruction [4,11,31].

Over the years, automated building detection from aerial images has been an active research topic. Many methods have been proposed in the literature for solving the problem of building detection [11,21]. These approaches differ in the data sources used, the models used, and the evaluation methods [23,26,34]. However, how to exploit and integrate multiple sources of data in an efficient learning framework, so as to obtain satisfactory performance in the detection and segmentation of buildings at pixel level, is still an open problem.

This paper proposes an efficient approach that combines a random forest (RF) and a fully connected conditional random field (CRF) on various features for the detection and segmentation of building footprints. Six informative feature types are extracted from a rich source of image data. The random forest can learn very fast on these feature sets and outputs high probabilities for pixels belonging to the building class. The CRF is then employed to exploit the potential interactions between neighboring pixels, aiming to improve the classification results given by the RF. A CRF with Gaussian kernels can perform inference efficiently, which reduces the computational time on big data sets.

Building detection and extraction is an active research topic in photogrammetry and computer vision [23,24,28,33]. The approaches typically differ in the type of image data and the methods used. Some works use a single intensity image only [19]. Some works use data from multiple aerial images, including color and height field data [11,39,43]. Early works mainly used geometric image features for feature extraction [7,27]. These approaches often fail when the building structures are complex [6]. In some works, rooftops were used as evidence of a building's presence. A perceptual grouping method or a geometry-based method is then employed to detect and reconstruct buildings. This approach allows the detection and reconstruction to be done at the same time. However, the system is usually complicated, and human user interaction is needed in many cases [19,30].

Matikainen et al. [22] proposed a system for building detection from laser scanning data and aerial colour images. The data from the DSM is classified into ground and building-or-tree objects. Buildings are then separated from trees. Champion et al. [5] showed the feasibility of a classification-based method in the building detection process and the possible automation of the approach. Rottensteiner [32] proposed an approach to per-pixel classification for building change detection for map revision. Xu et al. [40] proposed a three-step point-based method for detecting changes to buildings and trees using airborne light detection and ranging (LiDAR) data.

Some approaches have employed graphical models for integrating contextual information to improve the classification result, cf. Kumar and Hebert [15], Verbeek and Triggs [36]. Korc and Förstner [14] used a Markov random field model and showed that parameter learning methods can be improved. There have been attempts to use conditional random fields to model contextual information for the detection of urban areas [42] or objects from aerial images [41].

Meng et al. [25] used a multi-directional ground filter on LiDAR data to obtain bare ground points, and then NDVI was employed to remove trees. A supervised C4.5 decision tree analysis was then applied to separate building pixels from non-building pixels. As a result, about 2.55 percent of tree pixels were misclassified as buildings.

Recently, the ISPRS benchmark data set for urban object detection has been released [9], which provides ground truth for the evaluation of methods. The results of very recent works reported in Rottensteiner et al. [33] show the efforts of many researchers in developing efficient methods for automated object detection and 3D building reconstruction from aerial imagery. Despite that, the problem of how to effectively detect and segment building footprints at pixel level from high-resolution aerial images remains a challenge, especially with respect to computational time.

Our framework consists of three steps: feature extraction, RF learning, and CRF inference. For feature extraction, powerful feature extraction techniques are employed to extract representative features from the given sources of aerial image data, namely a true orthophoto (TOP) and a Digital Surface Model (DSM) [9]. The feature types are NDVI, NDSM, texton, color, saturation, and entropy. An RF is then learnt on these features. Finally, CRF inference is performed on the classification output of the RF. Details of each step are presented in the following.

We use the following features for the description of image data.

NDVI: the normalized difference vegetation index, computed from the first (IR) and the second (R) channels of the CIR true orthophoto (TOP):

$$\mathrm{NDVI} = \frac{IR - R}{IR + R} \tag{1}$$

The use of the NDVI is based on the fact that green vegetation has low reflectance in the red spectrum (R) due to chlorophyll and much higher reflectance in the infrared spectrum (IR) due to its cell structure. Hence, this is a good feature to distinguish green vegetation from other classes.

NDSM: the difference between the DSM and the derived digital terrain model (DTM), which classifies pixels into ground and off-ground. This feature helps to distinguish the high object classes from the low object classes.

Texton: a texton is a unit of texture, reflecting the human perception of textured images. It has been shown to be effective in image segmentation. Therefore, when images are represented in the form of textons, the pixels carry more useful information than in the form of plain color [38].

Color: in this work we use the CIELab color space. Unlike the RGB and CMYK color models, Lab color is designed to approximate human vision. It aspires to perceptual uniformity, and its L component closely matches human perception of lightness.

Saturation of the CIR image: some previous works have shown that saturation helps to further support the separation of vegetation and impervious surfaces.

Entropy: gathered over a 9 × 9 neighborhood from the DSM to exploit the spatial context information of a pixel (its neighborhood).
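To make three of these features concrete, the following is a minimal sketch in plain Python of the NDVI as the normalized difference (IR − R)/(IR + R), the NDSM as DSM minus DTM, and a windowed entropy. The function names, the small `eps` guard, and the quantization settings (`bins`, `lo`, `hi`) are illustrative assumptions; the paper does not state how values are binned before the entropy is computed.

```python
import math
from collections import Counter


def ndvi(ir, r, eps=1e-9):
    """Normalized difference vegetation index of one pixel; eps avoids 0/0."""
    return (ir - r) / (ir + r + eps)


def ndsm(dsm, dtm):
    """Height above ground: surface model minus terrain model."""
    return dsm - dtm


def local_entropy(grid, row, col, size=9, bins=16, lo=0.0, hi=1.0):
    """Shannon entropy (bits) of quantized values in a size x size window.

    The window is clipped at the image border. `bins`, `lo` and `hi` are
    assumed quantization settings, not values taken from the paper.
    """
    half = size // 2
    counts = Counter()
    for i in range(max(0, row - half), min(len(grid), row + half + 1)):
        for j in range(max(0, col - half), min(len(grid[0]), col + half + 1)):
            v = min(max(grid[i][j], lo), hi)
            counts[min(int((v - lo) / (hi - lo) * bins), bins - 1)] += 1
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())
```

A flat DSM window gives an entropy of zero, while a window that straddles a roof edge mixes two height levels and gives a higher value, which is why this feature carries spatial context.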

3.2.1 Random Forest

With those extracted features, we use a random forest classifier to build the unary potentials for the CRF model. The random forest used in this work is Breiman's CART-RF [3]. The training algorithm for random forests applies the general technique of bootstrap aggregating (bagging) to tree learners. Given a training set $I = \{i_1, i_2, \dots, i_n\}$, where $i_j$ is the feature vector at pixel $j$, with responses $X = \{x_1, x_2, \dots, x_n\}$, where $x_j \in L = \{1, \dots, l\}$, bagging repeatedly selects a random sample with replacement from the training set and fits trees to these samples:

    for b = 1, ..., ntree do
        Sample with replacement n training samples (I_b, X_b) from (I, X)
        Train a classification tree f_b on (I_b, X_b)
    end for

After training, predictions for an unseen sample $i'$ can be made by averaging the predictions from all the individual classification trees on $i'$:

$$\hat{f} = \frac{1}{n_{tree}} \sum_{b=1}^{n_{tree}} f_b(i') \tag{3}$$

For classification trees, this corresponds to taking the majority vote. The use of random forests has several advantages, including computational efficiency in both training and classification, probabilistic output, seamless handling of a large variety of visual features, and the inherent feature sharing of a multi-class classifier. However, with this technique the image pixels are labeled independently, without regard to the interrelations between them. Therefore, in a later step, we can further improve the segmentation results by employing an efficient inference model (CRF) that exploits the interrelations between image pixels.
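The bagging loop and the averaging in Eq. (3) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: a one-split stump stands in for a full CART tree, labels are binary (1 = building), and the negative-log mapping from the averaged probability to a unary energy is a common convention that the text does not spell out. All function names are hypothetical.

```python
import math
import random


def train_stump(samples, labels, rng):
    """Stand-in for a CART tree: one random split, leaf = building fraction."""
    f = rng.randrange(len(samples[0]))      # random feature index
    thr = rng.choice(samples)[f]            # random threshold taken from the data
    left = [y for x, y in zip(samples, labels) if x[f] <= thr]
    right = [y for x, y in zip(samples, labels) if x[f] > thr]
    p_left = sum(left) / len(left)          # never empty: holds the thr sample
    p_right = sum(right) / len(right) if right else p_left
    return lambda x: p_left if x[f] <= thr else p_right


def train_forest(samples, labels, ntree=25, seed=0):
    """Bagging: each learner is fit on n samples drawn with replacement."""
    rng = random.Random(seed)
    n = len(samples)
    forest = []
    for _ in range(ntree):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(train_stump([samples[i] for i in idx],
                                  [labels[i] for i in idx], rng))
    return forest


def predict_proba(forest, x):
    """Eq. (3): average the outputs of all learners for one sample."""
    return sum(tree(x) for tree in forest) / len(forest)


def unary_energy(p_building, eps=1e-6):
    """Assumed mapping psi_u = -log P(label); clipping avoids log(0)."""
    p = min(max(p_building, eps), 1.0 - eps)
    return {0: -math.log(1.0 - p), 1: -math.log(p)}
```

Calling `unary_energy(predict_proba(forest, x))` for every pixel yields the per-pixel costs that a CRF stage would then smooth.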

3.2.2 Fully Connected Conditional Random Field

In this subsection we provide a brief overview of fully connected conditional random fields (full-CRF) for pixel-wise labelling and introduce the technique used in this paper. A full-CRF, used in the context of pixel-wise label prediction, models pixel labels as random variables that are conditioned upon a global observation and obey the Markov property. Here the global observation is usually taken to be the whole image.

Let $X$ be a random field over the set of random variables $X = \{X_1, X_2, \dots, X_N\}$, where $N$ is the number of pixels in the image, and $X_i$ is the random variable associated with pixel $i$, which represents the label assigned to pixel $i$ and can take any value from a predefined set of labels $L = \{1, 2, \dots, l\}$. Let $I$ be an image observation, which represents the features corresponding to the pixels. The pair $(I, X)$ can be seen as a CRF model characterized by a Gibbs distribution:

$$P(X = x \mid I) = \frac{1}{Z(I)} \exp(-E(x \mid I)), \tag{4}$$

where $E(x \mid I)$, written $E(x)$ for brevity, is called the energy of the label assignment $x \in L^N$ and $Z(I)$ is the normalization function [16]. In the fully connected pairwise CRF model [13], the energy of a label assignment $x$ is given by:

$$E(x) = \underbrace{\sum_{i=1}^{N} \psi_u(x_i)}_{\text{unary}} + \underbrace{\sum_{i<j} \psi_p(x_i, x_j)}_{\text{pairwise}} \tag{5}$$

where the unary energy components $\psi_u(x_i)$ measure the cost of assigning the label $x_i$ to pixel $i$, and the pairwise energy components $\psi_p(x_i, x_j)$ measure the cost of assigning the labels $x_i$, $x_j$ to pixels $i$, $j$ simultaneously. In our model, the unary energies are obtained from an RF classifier, which predicts labels for pixels separately, without considering the smoothness and consistency of the label assignments. The pairwise energies provide an image data-dependent smoothing term, which encourages assigning the same label to pixels with similar properties, such as similar color and nearby positions. As was done in [13], we model pairwise potentials as weighted Gaussians:

$$\psi_p(x_i, x_j) = \mu(x_i, x_j) \sum_{m=1}^{M} w^{(m)} k_G^{(m)}(f_i, f_j) \tag{6}$$

where each $k_G^{(m)}$, for $m = 1, \dots, M$, is a Gaussian kernel applied to feature vectors. The feature vector of pixel $i$, denoted by $f_i$, is computed from image features such as spatial location and color values [13]. The function $\mu(\cdot, \cdot)$, called the label compatibility function, introduces a penalty for nearby similar pixels that are assigned different labels.

Inference Algorithm: Minimizing the above CRF energy $E(x)$ yields the most probable label assignment $x$ for the given image, which is equivalent to maximum a posteriori (MAP) inference. Since exact minimization is intractable, mean-field inference computes a distribution $Q(X)$ that best approximates the probability distribution $P(X)$ of the model. $Q(X) = \prod_i Q_i(X_i)$ is a product of independent marginals over each of the variables. Each of the marginals is constrained to be a proper probability distribution: $\sum_{x_i} Q_i(X_i = x_i) = 1$ and $Q_i(X_i) \ge 0$. The mean-field approximation minimizes the KL-divergence:

$$\begin{aligned} D(Q \,\|\, P) &= \sum_{x} Q(x) \log \frac{Q(x)}{P(x \mid I)} \\ &= \sum_i \sum_{x_i} Q_i(x_i) \log Q_i(x_i) + \sum_i \sum_{x_i} Q_i(x_i)\, \psi_u(x_i) \\ &\quad + \sum_{i<j} \sum_{x_i, x_j} Q_i(x_i)\, Q_j(x_j)\, \psi_p(x_i, x_j) + \log Z(I) \end{aligned} \tag{7}$$

Traditional mean-field inference [12] performs the following message passing update on each marginal $Q_i$ in turn until all marginal probabilities have converged:

$$Q_i(x_i) = \frac{1}{Z_i} \exp\Big\{ -\psi_u(x_i) - \sum_{j \neq i} \sum_{x_j} \psi_p(x_i, x_j)\, Q_j(x_j) \Big\} \tag{8}$$

where $Z_i$ is the marginal normalization function. Each iteration is guaranteed to decrease the KL-divergence, so this inference algorithm is guaranteed to converge to a local optimum [12,37]. In message passing, the computational bottleneck is the evaluation of the sum $\sum_{j \neq i} \sum_{x_j} \psi_p(x_i, x_j)\, Q_j(x_j)$. The computational complexity of a single update of a marginal $Q_i(X_i)$ is $O(N)$, and the complexity of updating all the marginals is $O(N^2)$. Fortunately, Krähenbühl and Koltun [13] observed that a high-dimensional Gaussian filter can be used to update all the mean-field marginals concurrently in time $O(N)$, which makes inference tractable.
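For intuition, the naive update of Eq. (8) can be written directly as nested loops. The sketch below is the O(N²) form, not the filter-based acceleration of [13], and it makes several simplifying assumptions that are not taken from the paper: a single Gaussian kernel (M = 1), a Potts label compatibility (cost only when labels differ), and hypothetical parameters `weight` and `theta`.

```python
import math


def softmax(scores):
    """Normalize exp(scores) to a probability vector (the 1/Z_i factor)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def pairwise(xi, xj, fi, fj, weight=1.0, theta=1.0):
    """Eq. (6) with M = 1: Potts compatibility times one Gaussian kernel."""
    if xi == xj:
        return 0.0
    d2 = sum((a - b) ** 2 for a, b in zip(fi, fj))
    return weight * math.exp(-d2 / (2.0 * theta ** 2))


def mean_field(unary, feats, n_iter=10, weight=1.0, theta=1.0):
    """unary[i][l] is psi_u(x_i = l); feats[i] is f_i. Returns marginals Q."""
    n, n_labels = len(unary), len(unary[0])
    q = [softmax([-u for u in unary[i]]) for i in range(n)]
    for _ in range(n_iter):
        for i in range(n):                  # update each marginal in turn
            scores = []
            for xi in range(n_labels):
                # Expected pairwise cost under the current neighbors' marginals.
                msg = sum(
                    pairwise(xi, xj, feats[i], feats[j], weight, theta) * q[j][xj]
                    for j in range(n) if j != i
                    for xj in range(n_labels)
                )
                scores.append(-unary[i][xi] - msg)
            q[i] = softmax(scores)
    return q
```

With three pixels that share the same feature vector, two confident "building" unaries and one weakly "non-building" unary in between, the middle marginal is pulled toward "building". This is the denoising effect described for the CRF stage.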

In this section, we present our experimental results and evaluate the proposed building segmentation approach on a benchmark image dataset. We then compare it with state-of-the-art methods.

We conducted experiments on a challenging benchmark dataset recently released by the International Society for Photogrammetry and Remote Sensing (ISPRS) Working Group III/4 for the evaluation of newly proposed methods, the ISPRS Semantic Labeling Benchmark [9]. This test dataset was acquired over the city of Vaihingen in Germany. The dataset contains 33 large image patches, each of which consists of a true orthophoto (TOP) extracted from a larger TOP mosaic and a Digital Surface Model (DSM). The average size of such a patch is about 15 MB, while the resolution of a patch varies from 2336 × 1281 up to 3816 × 2550. In total, the patches contain over 168 million pixels. The ground sampling distance of both the TOP and the DSM is 9 cm. Labeled ground truth was provided for 16 of the patches, which are divided into training and validation sets. The training set consists of 11 patches (1, 3, 5, 7, 13, 17, 21, 23, 26, 32, 37) and the validation set consists of 5 patches (11, 15, 28, 30 and 40). Normalized DSMs were provided by [20] and were generated using the lasground tool [1], which computes the normalized height based on the off-ground pixels.

In the experiments, the system was run 20 times for each test set. All programs were implemented in R and C++ and run on a machine with an Intel Core i7-4770K CPU (8 CPUs), 16 GB of DDR3 1600 MHz RAM, and Windows 8.1.

In this section, we compare the results of our proposed framework to other methods reported in [10] on the ISPRS Semantic Labeling Benchmark dataset. The evaluation is based on different measures including precision (correctness), recall (completeness) and F1-score, which are defined as follows:

$$\mathrm{Precision} = \frac{\#\text{true positive}}{\#\text{true positive} + \#\text{false positive}} \tag{9}$$

$$\mathrm{Recall} = \frac{\#\text{true positive}}{\#\text{true positive} + \#\text{false negative}} \tag{10}$$

$$\mathrm{F1\text{-}score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{11}$$
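As a quick check of Eqs. (9) to (11), the three measures can be computed from flat lists of predicted and ground-truth labels. This is a minimal sketch that assumes binary labels with 1 = building. The function name and the convention of returning 0 when a denominator is empty are illustrative choices.

```python
def building_scores(pred, truth):
    """Return (precision, recall, f1) for the building class (label 1)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t != 1)
    fn = sum(1 for p, t in zip(pred, truth) if p != 1 and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2.0 * precision * recall / (precision + recall)
```

For example, with `pred = [1, 1, 1, 0, 0, 1]` and `truth = [1, 1, 0, 0, 1, 1]` there are 3 true positives, 1 false positive and 1 false negative, so precision, recall and F1-score are all 0.75.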

As described by the ISPRS contest committee, the boundaries between classes are eroded by a circular disc of 3-pixel radius. Those eroded areas are then ignored during evaluation. The motivation is to reduce the impact of uncertain border definitions on the evaluation. The experimental results are shown in Table 1.
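One possible reading of this boundary rule is sketched below: a pixel is ignored if any pixel within a Euclidean distance of 3 carries a different ground-truth label. This is an assumption for illustration only. The contest's reference implementation may define the eroded band differently, and the function name is hypothetical.

```python
def boundary_ignore_mask(labels, radius=3):
    """True where a pixel lies within `radius` of a differently labeled pixel."""
    h, w = len(labels), len(labels[0])
    # Offsets covered by a disc of the given radius.
    offsets = [(dy, dx)
               for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)
               if dy * dy + dx * dx <= radius * radius]
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in offsets:
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w and labels[yy][xx] != labels[y][x]:
                    mask[y][x] = True
                    break
    return mask
```

Pixels where the mask is true would simply be skipped when counting true positives, false positives and false negatives.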

Table 1: Building segmentation results

Method    Precision (%)    Recall (%)    F1-score (%)

As one can see from Table 1, the proposed framework achieves state-of-the-art recall while maintaining high precision. This means that our method detects a high percentage of the true building area while keeping the false positive rate small. With respect to the F1-score, defined as the harmonic mean of precision and recall, our method takes second place, just behind the CNN [29], a method well known as the state of the art in various computer vision tasks. Nevertheless, the CNN requires much more time for the training and test phases than our framework. The CNN model in [29] combines three separate CNN submodels with three different input image patch sizes: 16×16, 32×32, and 64×64. To assess the running time of CNN-based models, we study a simplified CNN model in the spirit of the CNN model in [29]. The simplified CNN model works with image patches of size 32 × 32 and is implemented using the Torch7 library with CUDA support [2]. We test this model on a powerful Dell Precision T7610 workstation with an 8-core Intel Xeon E5-2650 v2 at 2.60 GHz, 32 GB of DDR3 RAM, and an NVIDIA Quadro K5000. The average time required for the training and test phases for both our framework and the simplified CNN model is shown in Table 2.

Table 2: Average time for training and test phases

Method    Training time (s)    Per image (s)

Simplified CNN … support

Our framework with … on 4 cores

From Table 2, one can see that the CNN model, even in its simplified version, is much more computationally expensive than our framework. In particular, despite being executed on a much more powerful machine, a Dell Precision T7610 workstation with CUDA support, the simplified CNN model is about 25 times slower in training and about 50 times slower in the test phase than our framework. The more complicated CNN model proposed in [29] can therefore be expected to be even more time-consuming, and hence less efficient than our framework.

Finally, in Fig. 1, we demonstrate the improvement in the segmentation result obtained by applying the fully connected CRF model to the probabilistic results of RF on the test image patch 11. We can see that the fully connected CRF effectively eliminates the misclassified pixels (which can be considered as noise) from the RF's output.

We have presented an efficient framework for semantic image segmentation and, in particular, for pixel-wise building segmentation from aerial images. Our stacked framework includes a preliminary layer using RF with carefully hand-designed features, and a denoising layer based on a fully connected CRF. We evaluated the proposed framework on the building segmentation task on the well-known ISPRS Semantic Labeling Benchmark dataset of aerial images [9]. The experimental results show that our framework achieves state-of-the-art accuracy with a reasonable computational speed.

[1] Lasground tool for bare-earth extraction. http://rapidlasso.com/lastools/lasground/. Accessed: 2015-08-10.

[2] Torch 7 library. http://torch.ch/. Accessed: 2015-08-10.

Figure 1: An illustration of improving segmentation result using CRF over RF's output. Panels: (a) input image, (b) ground truth, (c) RF, (d) RF+CRF.

[3] L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.

[4] N. Champion, L. Matikainen, X. Liang, J. Hyyppä, and F. Rottensteiner. A test of 2D building change detection methods: Comparison, evaluation and perspectives. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XXXVII:297–303, 2008.

[5] N. Champion, G. Stamon, and M. Pierrot-Deseilligny. Lecture Notes in Geoinformation and Cartography, chapter Automatic Revision of 2D Building Databases from High Resolution Satellite Imagery: A 3D Photogrammetric Approach, pages 43–66. Springer Berlin Heidelberg, 2009.

[6] M. Drauschke and W. Förstner. Selecting appropriate features for detecting buildings and building parts. In Proceedings of the 21st Congress of the International Society for Photogrammetry and Remote Sensing (ISPRS), Beijing, China, 2008.

[7] A. Fischer, T. Kolbe, F. Lang, A. Cremers, W. Förstner, L. Pluemer, and V. Steinhage. Extracting buildings from aerial images using hierarchical aggregation in 2D and 3D. Computer Vision and Image Understanding, 72(2):185–203, November 1998.

[8] M. Gruber, M. Ponticelli, S. Bernögger, and F. Leberl. UltraCamX, the large format digital aerial camera system by Vexcel Imaging / Microsoft. ISPRS Archives, XXXVII Part B1:665–670, 2008.

[9] ISPRS Working Group III/4. ISPRS 2D semantic labeling contest. http://www2.isprs.org/commissions/comm3/wg4/semantic-labeling.html. Accessed: 2015-08-10.

[10] ISPRS Working Group III/4. ISPRS semantic labeling contest (2D) results. http://www2.isprs.org/vaihingen-2d-semantic-labeling-contest.html. Accessed: 2015-08-10.

[11] C. Jaynes, E. Riseman, and A. Hanson. Recognition and reconstruction of buildings from multiple aerial images. Computer Vision and Image Understanding, 90(1):68–98, 2003.

[12] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.

[13] P. Krähenbühl and V. Koltun. Efficient inference in fully connected CRFs with Gaussian edge potentials. Advances in Neural Information Processing Systems, 2011.

[14] F. Korc and W. Förstner. Interpreting terrestrial images of urban scenes using discriminative random fields. In Proceedings of the Congress of the International Society for Photogrammetry and Remote Sensing, pages B3a: 291–296, 2008.

[15] S. Kumar and M. Hebert. Man-made structure detection in natural images using a causal multiscale random field. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 119–126, 2003.

[16] J. Lafferty, A. McCallum, and F. C. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. 2001.

[17] A. Lagrange and B. Le Saux. Convolutional neural networks for semantic labeling. 2015.

[18] F. Leberl and J. Szabo. Novel totally digital photogrammetric workflow. Technical report, Semana Geomatica, IGAC-Bogota, Colombia, August 2005.

[19] C. Lin and R. Nevatia. Building detection and description from a single intensity image. Int. Journal Computer Vision and Image Understanding, 72(2):101–121, 1998.

[20] M. Gerke. Use of the stair vision library within the ISPRS 2D semantic labeling benchmark (Vaihingen).

[21] B. Matei, H. Sawhney, S. Samarasekera, J. Kim, and R. Kumar. Building segmentation for densely built urban regions using aerial lidar data. In Proceedings of the IEEE Computer Vision and Pattern Recognition, pages 1–8, June 2008.

[22] L. Matikainen, K. Kaartinen, and J. Hyyppä. Classification tree based building detection from laser scanner and aerial image data. In Proceedings of ISPRS Workshop Laser Scanning, 2007.

[23] H. Mayer. Automatic object extraction from aerial imagery - a survey focusing on buildings. Computer Vision and Image Understanding, 74(2):138–149, 1999.

[24] H. Mayer, S. Hinz, and U. Stilla. Advances in Photogrammetry, Remote Sensing and Spatial Information Science, chapter 16: Automated extraction of roads, buildings and vegetation from multi-source data, pages 213–226. ISPRS Congress book, 2008.

[25] X. Meng, N. Currit, L. Wang, and X. Yang. Detect residential buildings from lidar and aerial photographs through object-oriented land-use classification. Photogrammetric Engineering & Remote Sensing, 78, 2012.

[26] S. Mueller and D. W. Zaum. Robust building detection in aerial images. In ISPRS Workshop on Object Extraction for 3D City Models, Road Databases and Traffic Monitoring - Concepts, Algorithms, and Evaluation (CMRT05), 2005.

[27] R. Nevatia, C. Lin, and A. Huertas. A system for building detection from aerial images. In Automatic Extraction of Man-Made Objects from Aerial and Space Images, Birkhäuser Verlag, pages 77–86, 1997.

[28] T. T. Nguyen, S. Kluckner, H. Bischof, and F. Leberl. Aerial photo building classification by stacking appearance and elevation measurements. In Proceedings ISPRS, 100 Years ISPRS - Advancing Remote Sensing Science, on CD-ROM, 2010.

[29] S. Paisitkriangkrai, J. Sherrah, P. Janney, and A. Hengel. Effective semantic pixel labelling with convolutional networks and conditional random fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 36–43, 2015.

[30] N. Paparoditis, M. Cord, M. Jordan, and J.-P. Cocquerez. Building detection and reconstruction from mid- and high-resolution aerial imagery. Computer Vision and Image Understanding, 72(2):122–142, 1998.

[31] N. Pfeifer, M. Rutzinger, F. Rottensteiner, W. Muecke, and M. Hollaus. Extraction of building footprints from airborne laser scanning: Comparison and validation techniques. In Urban Remote Sensing Joint Event, pages 1–9, April 2007.

[32] F. Rottensteiner. Automated updating of building data bases from digital surface models and multi-spectral images. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XXXVII B3A:265–270, 2008.

[33] F. Rottensteiner, G. Sohn, M. Gerke, J. D. Wegner, U. Breitkopf, and J. Jung. Results of the ISPRS benchmark on urban object detection and 3D building reconstruction. ISPRS Journal of Photogrammetry and Remote Sensing, 93:256–271, 2014.

[34] B. Sirmacek and C. Unsalan. Building detection from aerial images using invariant color features and shadow information. In 23rd Intl. Symp. on ISCIS, pages 1–5, Oct. 2008.

[35] T. Speldekamp, C. Fries, C. Gevaert, and M. Gerke. Automatic semantic labelling of urban areas using a rule-based approach and realized with MeVisLab.

[36] J. Verbeek and B. Triggs. Region classification with Markov field aspect models. In Proceedings of the IEEE Conference on Computer Vision & Pattern Recognition, pages 1–8, June 2007.

[37] M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1–305, 2008.

[38] J. Winn, A. Criminisi, and T. Minka. Object categorization by learned universal visual dictionary. In Tenth IEEE International Conference on Computer Vision (ICCV 2005), volume 2, pages 1800–1807. IEEE, 2005.

[39] M. Xie, K. Fu, and Y. Wu. Building recognition and reconstruction from aerial imagery and lidar data. In Proceedings of the International Conference on Radar, pages 1–4, Oct. 2006.

[40] H. Xu, L. Cheng, M. Li, Y. Chen, and L. Zhong. Using octrees to detect changes to buildings and trees in the urban environment from airborne lidar data. Remote Sensing, 7(8):9682–9704, 2015.

[41] J. Yao and Z. M. Zhang. Semi-supervised learning based object detection in aerial imagery. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1011–1016, Washington, DC, USA, 2005. IEEE Computer Society.

[42] P. Zhong and R. Wang. Object detection based on combination of conditional random field and Markov random field. In Proceedings of the 18th International Conference on Pattern Recognition, pages 160–163, 2006.

[43] P. Zimmermann. A new framework for automatic building detection analyzing multiple cue data. International Archives of Photogrammetry and Remote Sensing, 33:1063–1070, 2000.
