An Efficient Framework for Pixel-wise Building
Segmentation from Aerial Images
Nguyen Tien Quang
Hanoi University of Science and Technology
octagon9x@gmail.com
Nguyen Thi Thuy
Faculty of Information Technology, Vietnam National University of Agriculture
ntthuy@vnua.edu.vn
Dinh Viet Sang
Hanoi University of Science and Technology
sangdv@soict.hust.edu.vn
Huynh Thi Thanh Binh
Hanoi University of Science and Technology
binhht@soict.hust.edu.vn
ABSTRACT
Detection of buildings in aerial images is an important and challenging task in computer vision and aerial image interpretation. This paper presents an efficient approach that combines a random forest (RF) and a fully connected conditional random field (CRF) over several feature types to detect and segment buildings at the pixel level. The RF learns extremely fast on big aerial image data. The unary potentials given by the RF are then combined in a fully connected CRF model for pixel-wise classification. Using a high-dimensional Gaussian filter for the pairwise potentials makes inference tractable while retaining high classification accuracy. Experiments were conducted on a challenging aerial image dataset from the recent ISPRS Semantic Labeling Contest [9]. We obtained state-of-the-art accuracy with a reasonable computation time.
CCS Concepts
•Computing methodologies → Image segmentation;
Supervised learning by classification; Latent variable
models; Deep belief networks;
Keywords
Aerial image, building detection, random forest, fully connected CRF, semantic segmentation, feature extraction
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
SoICT 2015, December 03-04, 2015, Hue City, Viet Nam
DOI: http://dx.doi.org/10.1145/2833258.2833311
Detection and segmentation of buildings in aerial images is important for aerial image analysis and interpretation. Applications include cartography, 3D city modeling, land cover classification, and Internet applications. The topic has been widely researched over the last decades. The problem is challenging because of the natural complexity of terrestrial scenes and the demand for efficient processing of big image data sets.
The problem of building segmentation is difficult for many reasons. Buildings are mostly located in urban scenes, with various other objects in close proximity, such as parking lots, vehicles, streets and trees. Some objects are occluded or cluttered. Buildings may appear in complex shapes with various architectural details; building roofs show varying reflectance, and gray rooftops look very similar to the street layer.
With the advance of aerial imaging technology, high-resolution aerial images can be produced and made available for various tasks [8,9,18]. Aerial images are usually taken over large areas on the ground, typically a city or an urban area of hundreds of square kilometers. The ground sampling distance of aerial imagery may be as fine as 10 cm per pixel, and such a large urban area may be covered by thousands of large-format aerial photographs at high overlap [28]. The high resolution makes it convenient to analyse small objects in detail; however, processing such big image data is computationally demanding.
In this paper, we aim at a concrete task: detecting buildings at the pixel level, i.e., building footprint extraction. The detection and segmentation of buildings is necessary for many tasks, such as change detection for map revision, or providing building footprints for the subsequent steps of building extraction and reconstruction [4,11,31]. Over the years, automated building detection from aerial images has been an active research topic, and many methods have been proposed in the literature [11,21]. These approaches differ in the data sources, the models and the evaluation methods they use [23,26,34]. However, how to exploit and integrate multiple sources of data in an efficient learning framework, so as to obtain satisfactory detection and segmentation of buildings at the pixel level, is still an open problem.
This paper proposes an efficient approach that combines a random forest (RF) and a fully connected conditional random field (CRF) over several feature types for the detection and segmentation of building footprints. Six informative feature types are extracted from rich sources of image data. The random forest learns very fast on these feature sets and assigns high probabilities to pixels belonging to the building class. The CRF is then employed to exploit the interactions between neighboring pixels, with the aim of improving the classification results given by the RF. A CRF with Gaussian kernels can perform inference efficiently, which reduces the computation time on big data sets.
Building detection and extraction is an active research topic in photogrammetry and computer vision [23,24,28,33]. The approaches typically differ in the type of image data and the methods used. Some works use only a single intensity image [19]. Others use data from multiple aerial images, including color and height field data [11,39,43]. Early works mainly used geometric image features for feature extraction [7,27]. These approaches often fail when the building structures are complex [6]. In some works, rooftops were used as evidence of a building's presence. A perceptual grouping method or a geometry-based method is then employed to detect and reconstruct buildings. This approach allows detection and reconstruction to be done at the same time. The resulting system is usually complicated, and user interaction is needed in many cases [19,30].
Matikainen et al. [22] proposed a system for building detection from laser scanning data and aerial colour images. The DSM data is classified into ground and buildings-or-trees objects; buildings are then separated from trees. Champion et al. [5] have shown the feasibility of a classification-based method for building detection and the possible automation of the approach. Rottensteiner [32] proposed an approach for per-pixel classification for building change detection for map revision. Xu et al. [40] proposed a three-step point-based method for detecting changes to buildings and trees using airborne light detection and ranging (LiDAR) data.
Some approaches have employed graphical models to integrate contextual information and improve classification results, cf. Kumar and Hebert [15], Verbeek and Triggs [36]. Korc and Förstner [14] used a Markov random field model and showed that parameter learning methods can be improved. There have been attempts to use conditional random fields to model contextual information for the detection of urban areas [42] or objects in aerial images [41].
Meng et al. [25] used a multi-directional ground filter on LiDAR data to obtain bare-ground points, and NDVI was then employed to remove trees. A supervised C4.5 decision tree analysis was then applied to separate building pixels from non-building pixels. In the result, about 2.55 percent of tree pixels were misclassified as buildings.
Recently, the ISPRS benchmark data set for urban object detection has been released [9], which provides ground truth for the evaluation of methods. The results of very recent works reported in Rottensteiner et al. [33] show the efforts of many researchers in developing efficient methods for automated object detection and 3D building reconstruction from aerial imagery. Despite that, effectively detecting and segmenting building footprints at the pixel level from high-resolution aerial images remains a challenge, especially with respect to computation time.
Our framework consists of three steps: feature extraction, RF learning, and CRF inference. For feature extraction, we extract representative features from the given aerial image data sources, namely a true orthophoto (TOP) and a digital surface model (DSM) [9]. The feature types are NDVI, NDSM, texton, color, saturation and entropy. An RF is then trained on these features. Finally, CRF inference is performed on the classification output of the RF. Each step is detailed in the following.
We use the following features to describe the image data.
NDVI: the normalized difference vegetation index, computed from the first (IR) and second (R) channels of the CIR true orthophoto (TOP):
$$\mathrm{NDVI} = \frac{IR - R}{IR + R} \tag{1}$$
The use of the NDVI is based on the fact that green vegetation has low reflectance in the red spectrum (R) due to chlorophyll and much higher reflectance in the infrared spectrum (IR) due to its cell structure. Hence, it is a good feature for distinguishing green vegetation from other classes.
NDSM: the difference between the DSM and the derived DTM, which separates pixels into ground and off-ground. This feature helps to distinguish high object classes from low object classes.
Texton: a texton is a unit of texture that reflects the human perception of textured images. It has been proven effective in image segmentation. Representing images in the form of textons therefore gives each pixel more useful information than raw color alone [38].
Color: in this work we use the CIELab color space. Unlike the RGB and CMYK color models, Lab color is designed to approximate human vision. It aspires to perceptual uniformity, and its L component closely matches human perception of lightness.
Saturation of the CIR image: some previous works have shown that saturation further supports the separation of vegetation and impervious surfaces.
Entropy: gathered over a 9 × 9 neighborhood of the DSM to exploit the spatial context of a pixel.
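Three of the features above (NDVI, NDSM and the local DSM entropy) are simple enough to sketch directly. The following plain-Python illustration is not the authors' implementation; the function names, the zero-denominator guard in `ndvi`, and the quantisation of heights into bins of width `bin_width` before computing the entropy are all assumptions made for the example.

```python
import math
from collections import Counter


def ndvi(ir, r):
    """NDVI of one pixel from its infrared (IR) and red (R) values."""
    denom = ir + r
    # Assumption: a pixel with IR + R == 0 is mapped to 0.0 to avoid dividing by zero.
    return (ir - r) / denom if denom != 0 else 0.0


def ndsm(dsm_height, dtm_height):
    """Normalised height: surface model (DSM) minus terrain model (DTM)."""
    return dsm_height - dtm_height


def local_entropy(dsm, row, col, radius=4, bin_width=1.0):
    """Shannon entropy (bits) of quantised heights in a (2*radius+1)^2 window.

    radius=4 gives the 9 x 9 neighbourhood; the window is clipped at the borders.
    """
    n_rows, n_cols = len(dsm), len(dsm[0])
    counts = Counter()
    for r in range(max(0, row - radius), min(n_rows, row + radius + 1)):
        for c in range(max(0, col - radius), min(n_cols, col + radius + 1)):
            # Assumption: heights are binned so that similar heights share a symbol.
            counts[math.floor(dsm[r][c] / bin_width)] += 1
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Toy values (hypothetical): a vegetation-like and a roof-like pixel.
vegetation_ndvi = ndvi(200.0, 50.0)
roof_ndvi = ndvi(90.0, 110.0)
roof_height = ndsm(312.5, 305.0)

flat_dsm = [[5.0] * 9 for _ in range(9)]      # flat ground: a single height bin
step_dsm = [[0.0, 0.0], [10.0, 10.0]]         # half ground, half roof edge
flat_entropy = local_entropy(flat_dsm, 4, 4)
step_entropy = local_entropy(step_dsm, 0, 0)
```

In this toy setting the vegetation-like pixel gets a clearly positive NDVI and the roof-like pixel a slightly negative one, while the entropy is zero on flat terrain and grows where the window straddles a height discontinuity such as a building edge.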
3.2.1 Random Forest
With the extracted features, we use a random forest classifier to build the unary potentials of the CRF model. The random forest used in this work is Breiman's CART-RF [3]. The training algorithm applies the general technique of bootstrap aggregating (bagging) to tree learners. Given a training set $I = \{i_1, i_2, \ldots, i_n\}$, where $i_j$ is the feature vector at pixel $j$, with responses $X = \{x_1, x_2, \ldots, x_n\}$, where $x_j \in L = \{1, \ldots, l\}$, bagging repeatedly selects a random sample of the training set with replacement and fits trees to these samples:
for $b = 1, \ldots, n_{tree}$ do
    Sample with replacement $n$ training samples $(I_b, X_b)$ from $(I, X)$
    Train a classification tree $f_b$ on $(I_b, X_b)$
end for
After training, the prediction for an unseen sample $i'$ is made by averaging the predictions of all the individual classification trees on $i'$:
$$\hat{f} = \frac{1}{n_{tree}} \sum_{b=1}^{n_{tree}} f_b(i') \tag{3}$$
For classification trees, this amounts to taking the majority vote. Random forests have several advantages: computational efficiency in both training and classification, probabilistic output, seamless handling of a large variety of visual features, and the inherent feature sharing of a multi-class classifier. However, with this technique the image pixels are labeled independently, without regard to the interrelations between them. Therefore, in a later step, we further improve the segmentation results by employing an efficient inference model (CRF) that exploits the interrelations between image pixels.
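The bagging loop and the averaged prediction of Eq. (3) can be illustrated with a small self-contained sketch. It is not Breiman's CART-RF: as a stated simplification, each "tree" is a one-split decision stump fitted on one randomly chosen feature, and the toy samples (an NDSM-like height and an NDVI-like value per pixel) are hypothetical. All function names are illustrative.

```python
import random
from collections import Counter


def train_stump(samples, labels, rng):
    """Fit a one-split tree on one random feature (toy stand-in for a CART tree)."""
    feature = rng.randrange(len(samples[0]))
    best = None
    for threshold in sorted({s[feature] for s in samples}):
        left = [y for s, y in zip(samples, labels) if s[feature] <= threshold]
        right = [y for s, y in zip(samples, labels) if s[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda s: left_label if s[feature] <= threshold else right_label


def train_forest(samples, labels, n_tree=25, seed=0):
    """Bagging: every tree is trained on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(samples)
    forest = []
    for _ in range(n_tree):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(train_stump([samples[i] for i in idx],
                                  [labels[i] for i in idx], rng))
    return forest


def predict_proba(forest, sample, positive=1):
    """Fraction of trees voting for `positive` (the average of Eq. (3) for 0/1 votes)."""
    return sum(tree(sample) == positive for tree in forest) / len(forest)


# Hypothetical pixels: (normalised height in metres, NDVI); label 1 = building.
samples = [(8.0, 0.05), (6.5, 0.10), (7.2, 0.00),
           (0.2, 0.55), (0.1, 0.60), (0.4, 0.45)]
labels = [1, 1, 1, 0, 0, 0]
forest = train_forest(samples, labels)
p_building = predict_proba(forest, (9.0, -0.05))
p_vegetation = predict_proba(forest, (0.05, 0.70))
```

The vote fraction returned by `predict_proba` is the kind of per-pixel probabilistic output that the later CRF step consumes; thresholding it at 0.5 recovers the majority vote.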
3.2.2 Fully Connected Conditional Random Field
In this subsection we give a brief overview of fully connected conditional random fields (full-CRF) for pixel-wise labelling and introduce the technique used in this paper. A full-CRF, used in the context of pixel-wise label prediction, models pixel labels as random variables that are conditioned on a global observation and obey the Markov property. Here the global observation is usually taken to be the whole image.
Let $X$ be a random field over the set of random variables $X = \{X_1, X_2, \ldots, X_N\}$, where $N$ is the number of pixels in the image and $X_i$ is the random variable associated with pixel $i$; it represents the label assigned to pixel $i$ and can take any value from a predefined set of labels $L = \{1, 2, \ldots, l\}$. Let $I$ be an image observation, which represents the features of the pixels. The pair $(I, X)$ can be seen as a CRF model characterized by a Gibbs distribution:
$$P(X = x \mid I) = \frac{1}{Z(I)} \exp(-E(x \mid I)) \tag{4}$$
where $E(x \mid I)$, written $E(x)$ for short, is called the energy of the label assignment $x \in L^N$ and $Z(I)$ is the normalization function [16]. In the fully connected pairwise CRF model [13], the energy of a label assignment $x$ is given by:
$$E(x) = \underbrace{\sum_{i=1}^{N} \psi_u(x_i)}_{\text{unary}} + \underbrace{\sum_{i<j} \psi_p(x_i, x_j)}_{\text{pairwise}} \tag{5}$$
where the unary energy components $\psi_u(x_i)$ measure the cost of assigning label $x_i$ to pixel $i$, and the pairwise energy components $\psi_p(x_i, x_j)$ measure the cost of assigning labels $x_i$, $x_j$ to pixels $i$, $j$ simultaneously. In our model, the unary energies are obtained from the RF classifier, which predicts the label of each pixel separately, without considering the smoothness and consistency of the label assignments. The pairwise energies provide an image-dependent smoothing term that encourages assigning the same label to pixels with similar properties, such as similar color and nearby positions. As in [13], we model the pairwise potentials as weighted Gaussians:
$$\psi_p(x_i, x_j) = \mu(x_i, x_j) \sum_{m=1}^{M} w^{(m)} k_G^{(m)}(f_i, f_j) \tag{6}$$
where each $k_G^{(m)}$, for $m = 1, \ldots, M$, is a Gaussian kernel applied to feature vectors. The feature vector of pixel $i$, denoted by $f_i$, is computed from image features such as spatial location and color values [13]. The function $\mu(\cdot, \cdot)$, called the label compatibility function, introduces a penalty for nearby similar pixels that are assigned different labels.
Inference algorithm: minimizing the above CRF energy $E(x)$ yields the most probable label assignment $x$ for the given image, which is equivalent to maximum a posteriori (MAP) inference. Since exact minimization is intractable, mean-field inference computes a distribution $Q(X)$ that best approximates the probability distribution $P(X)$ of the model. $Q(X) = \prod_i Q_i(X_i)$ is a product of independent marginals over each of the variables. Each marginal is constrained to be a proper probability distribution: $\sum_{x_i} Q_i(X_i = x_i) = 1$ and $Q_i(X_i) \ge 0$. The mean-field approximation minimizes the KL-divergence:
$$\begin{aligned} D(Q \,\|\, P) &= \sum_{x} Q(x) \log \frac{Q(x)}{P(x \mid I)} \\ &= \sum_{i} \sum_{x_i} Q_i(x_i) \log Q_i(x_i) + \sum_{i} \sum_{x_i} Q_i(x_i)\, \psi_u(x_i) + \sum_{i<j} \sum_{x_i, x_j} Q_i(x_i)\, Q_j(x_j)\, \psi_p(x_i, x_j) + \log Z(I) \end{aligned} \tag{7}$$
Traditional mean-field inference [12] performs the following message-passing update on each marginal $Q_i$ in turn until all marginal probabilities have converged:
$$Q_i(x_i) = \frac{1}{Z_i} \exp\Big\{ -\psi_u(x_i) - \sum_{j \ne i} \sum_{x_j} \psi_p(x_i, x_j)\, Q_j(x_j) \Big\} \tag{8}$$
where $Z_i$ is the marginal normalization function. No iteration increases the KL-divergence, so this inference algorithm is guaranteed to converge to a local optimum [12,37]. In message passing, the computational bottleneck is the evaluation of the sum $\sum_{j \ne i} \sum_{x_j} \psi_p(x_i, x_j)\, Q_j(x_j)$. The computational complexity of a single update of a marginal $Q_i(X_i)$ is $O(N)$, and the complexity of updating all the marginals is $O(N^2)$. Fortunately, Krähenbühl and Koltun [13] observed that a high-dimensional Gaussian filter can be used to update all the mean-field marginals concurrently in $O(N)$ time, which makes inference tractable.
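To make the update of Eq. (8) concrete, the sketch below runs the naive $O(N^2)$ mean-field iteration on a tiny 1-D "scan line" of seven pixels. It deliberately does not use the high-dimensional filtering of [13]. Several choices are assumptions made only for this example: the unary energy is taken as the negative log of the classifier probability, the label compatibility is the Potts model ($\mu(x_i, x_j) = 1$ if $x_i \ne x_j$, else 0), a single Gaussian kernel ($M = 1$) is used, and the weight and bandwidths are arbitrary.

```python
import math


def gaussian_kernel(f_i, f_j, bandwidths):
    """k_G(f_i, f_j) = exp(-0.5 * sum(((a - b) / s)^2)) over the feature dimensions."""
    return math.exp(-0.5 * sum(((a - b) / s) ** 2
                               for a, b, s in zip(f_i, f_j, bandwidths)))


def mean_field(probs, features, weight=1.0, bandwidths=(2.0, 0.1), n_iter=5):
    """Naive mean-field inference for a fully connected CRF with Potts compatibility.

    probs[i][l] : classifier probability of label l at pixel i
    features[i] : feature vector f_i, here (position, intensity)
    """
    n, n_labels = len(probs), len(probs[0])
    # Assumption: unary energy = negative log-probability of the classifier output.
    unary = [[-math.log(max(p, 1e-9)) for p in row] for row in probs]
    q = [list(row) for row in probs]  # initialise Q with the classifier output
    for _ in range(n_iter):
        for i in range(n):  # update each marginal in turn, as in Eq. (8)
            scores = []
            for label in range(n_labels):
                # With Potts compatibility, sum_{x_j} mu(label, x_j) Q_j(x_j)
                # reduces to 1 - Q_j(label).
                pairwise = sum(
                    weight * gaussian_kernel(features[i], features[j], bandwidths)
                    * (1.0 - q[j][label])
                    for j in range(n) if j != i)
                scores.append(math.exp(-unary[i][label] - pairwise))
            z_i = sum(scores)
            q[i] = [s / z_i for s in scores]
    return q


# Hypothetical classifier output: label 1 = building; the middle pixel is noisy.
probs = [[0.1, 0.9], [0.1, 0.9], [0.1, 0.9], [0.6, 0.4],
         [0.1, 0.9], [0.1, 0.9], [0.1, 0.9]]
features = [(float(i), 0.5) for i in range(len(probs))]  # same intensity everywhere
marginals = mean_field(probs, features)
labels_before = [row.index(max(row)) for row in probs]
labels_after = [q_i.index(max(q_i)) for q_i in marginals]
```

Before inference the middle pixel is labelled 0 by the classifier alone; after the mean-field updates its similar, nearby neighbours pull it to label 1. The inner sum over `j` is the $O(N)$ per-pixel bottleneck discussed above, which is what the filtering approach of [13] avoids.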
In this section, we present our experimental results and evaluate the proposed building segmentation approach on a benchmark image dataset. We then compare it with state-of-the-art methods.
We conducted experiments on a challenging benchmark dataset recently released by the International Society for Photogrammetry and Remote Sensing (ISPRS) Working Group III/4 for the evaluation of newly proposed methods, the ISPRS Semantic Labeling Benchmark [9]. This test dataset was acquired over the city of Vaihingen in Germany. The dataset contains 33 large image patches, each of which consists of a true orthophoto (TOP) extracted from a larger TOP mosaic and a digital surface model (DSM). The average size of a patch is about 15 MB, while the resolution of a patch varies from 2336 × 1281 up to 3816 × 2550. In total, the patches contain over 168 million pixels. The ground sampling distance of both the TOP and the DSM is 9 cm. Labeled ground truth was provided for 16 of the patches, which are divided into training and validation sets. The training set consists of 11 patches (1, 3, 5, 7, 13, 17, 21, 23, 26, 32, 37) and the validation set consists of 5 patches (11, 15, 28, 30 and 40). Normalized DSMs were provided by [20] and were generated using the lasground tool [1], which computes the normalized height based on the off-ground pixels.
In the experiments, the system was run 20 times for each test set. All programs were implemented in R and C++ and run on a machine with an Intel Core i7-4770K CPU (8 CPUs), 16 GB of DDR3 RAM at 1600 MHz, and Windows 8.1.
In this section, we compare the results of our proposed framework with other methods reported in [10] on the ISPRS Semantic Labeling Benchmark dataset. The evaluation is based on several measures, namely precision (correctness), recall (completeness) and F1-score, which are defined as follows:
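$$\mathrm{Precision} = \frac{\#\text{true positive}}{\#\text{true positive} + \#\text{false positive}} \tag{9}$$
$$\mathrm{Recall} = \frac{\#\text{true positive}}{\#\text{true positive} + \#\text{false negative}} \tag{10}$$
$$\mathrm{F1\text{-}score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{11}$$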
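As a small worked example of Eqs. (9) to (11), the sketch below counts true positives, false positives and false negatives over flattened label lists and derives the three scores. The function name, the `ignore_label` parameter (a hypothetical marker for pixels excluded from evaluation) and the toy labels are assumptions for illustration, not part of the benchmark tooling.

```python
def precision_recall_f1(predicted, truth, positive=1, ignore_label=None):
    """Pixel-wise precision, recall and F1-score for the `positive` class."""
    tp = fp = fn = 0
    for p, t in zip(predicted, truth):
        if t == ignore_label:
            continue  # pixel excluded from the evaluation
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy example: 1 = building, 0 = other, -1 = pixel ignored during evaluation.
truth = [1, 1, 1, 0, 0, -1]
predicted = [1, 1, 0, 1, 0, 1]
precision, recall, f1 = precision_recall_f1(predicted, truth, ignore_label=-1)
```

Here two building pixels are found, one is missed and one non-building pixel is wrongly marked, so precision, recall and F1-score all equal 2/3; the last pixel does not count because it carries the ignore marker.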
As described by the ISPRS contest committee, the boundaries between classes are eroded by a circular disc of 3-pixel radius. The eroded areas are then ignored during evaluation. The motivation is to reduce the impact of uncertain border definitions on the evaluation. The experimental results are shown in Table 1.
Table 1: Building segmentation results
Method   Precision (%)   Recall (%)   F1-score (%)
As one can see from Table 1, the proposed framework achieves state-of-the-art recall while maintaining high precision. This means that our method detects a high percentage of the true building area while keeping the false positive rate small. With respect to the F1-score, defined as the harmonic mean of precision and recall, our method takes second place, just behind the CNN [29], a well-known state-of-the-art method in various computer vision tasks. Nevertheless, the CNN requires much more time for the training and test phases than our framework. The CNN model in [29] combines three separate CNN submodels with three different input image patch sizes: 16×16, 32×32 and 64×64. To gauge the running time of CNN-based models, we study a simplified CNN model in the spirit of the model in [29]. The simplified CNN model works with image patches of size 32 × 32 and is implemented using the Torch7 library with CUDA support [2]. We test this model on a powerful Dell Precision T7610 workstation with an 8-core Intel Xeon E5-2650 v2 at 2.60 GHz, 32 GB of DDR3 RAM and an NVIDIA Quadro K5000. The average time required for the training and test phases of both our framework and the simplified CNN model is shown in Table 2.
Table 2: Average time for the training and test phases
Columns: training time (s), test time per image (s). Rows: simplified CNN (with CUDA support), our framework (on 4 cores).
From Table 2, one can see that the CNN model, even in its simplified version, is much more computationally expensive than our framework. In particular, despite being executed on a much more powerful machine, the Dell Precision T7610 workstation with CUDA support, the simplified CNN model is about 25 times slower in training and about 50 times slower in the test phase than our framework. The full CNN model proposed in [29] is expected to be even more time-consuming, and therefore less efficient than our framework.
Finally, Fig. 1 shows the improvement in the segmentation result obtained by applying the fully connected CRF model to the probabilistic output of the RF on test image patch 11. The fully connected CRF effectively eliminates the misclassified pixels (which can be regarded as noise) from the RF's output.
Figure 1: An illustration of improving the segmentation result using the CRF over the RF's output. Panels: (a) input image, (b) ground truth, (c) RF, (d) RF+CRF.
We have presented an efficient framework for semantic image segmentation and, in particular, for pixel-wise building segmentation from aerial images. Our stacked framework includes a preliminary layer using an RF with carefully hand-designed features and a denoising layer based on a fully connected CRF. We evaluated the proposed framework on the building segmentation task on the well-known ISPRS Semantic Labeling Benchmark dataset of aerial images [9]. The experimental results show that our framework achieves state-of-the-art accuracy with a reasonable computational speed.
REFERENCES
[1] Lasground tool for bare-earth extraction. http://rapidlasso.com/lastools/lasground/. Accessed: 2015-08-10.
[2] Torch 7 library. http://torch.ch/. Accessed: 2015-08-10.
[3] L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
[4] N. Champion, L. Matikainen, X. Liang, J. Hyyppä, and F. Rottensteiner. A test of 2D building change detection methods: Comparison, evaluation and perspectives. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XXXVII:297–303, 2008.
[5] N. Champion, G. Stamon, and M. Pierrot-Deseilligny. Lecture Notes in Geoinformation and Cartography, chapter Automatic Revision of 2D Building Databases from High Resolution Satellite Imagery: A 3D Photogrammetric Approach, pages 43–66. Springer Berlin Heidelberg, 2009.
[6] M. Drauschke and W. Förstner. Selecting appropriate features for detecting buildings and building parts. In Proceedings of the 21st Congress of the International Society for Photogrammetry and Remote Sensing (ISPRS), Beijing, China, 2008.
[7] A. Fischer, T. Kolbe, F. Lang, A. Cremers, W. Förstner, L. Pluemer, and V. Steinhage. Extracting buildings from aerial images using hierarchical aggregation in 2D and 3D. Computer Vision and Image Understanding, 72(2):185–203, November 1998.
[8] M. Gruber, M. Ponticelli, S. Bernögger, and F. Leberl. UltraCamX, the large format digital aerial camera system by Vexcel Imaging / Microsoft. ISPRS Archives, XXXVII Part B1:665–670, 2008.
[9] ISPRS Working Group III/4. ISPRS 2D semantic labeling contest. http://www2.isprs.org/commissions/comm3/wg4/semantic-labeling.html. Accessed: 2015-08-10.
[10] ISPRS Working Group III/4. ISPRS semantic labeling contest (2D) results. http://www2.isprs.org/vaihingen-2d-semantic-labeling-contest.html. Accessed: 2015-08-10.
[11] C. Jaynes, E. Riseman, and A. Hanson. Recognition and reconstruction of buildings from multiple aerial images. Computer Vision and Image Understanding, 90(1):68–98, 2003.
[12] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
[13] P. Krähenbühl and V. Koltun. Efficient inference in fully connected CRFs with Gaussian edge potentials. Advances in Neural Information Processing Systems, 2011.
[14] F. Korc and W. Förstner. Interpreting terrestrial images of urban scenes using discriminative random fields. In Proceedings of the Congress of the International Society for Photogrammetry and Remote Sensing, pages B3a:291–296, 2008.
[15] S. Kumar and M. Hebert. Man-made structure detection in natural images using a causal multiscale random field. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 119–126, 2003.
[16] J. Lafferty, A. McCallum, and F. C. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. 2001.
[17] A. Lagrange and B. Le Saux. Convolutional neural networks for semantic labeling. 2015.
[18] F. Leberl and J. Szabo. Novel totally digital photogrammetric workflow. Technical report, Semana Geomatica, IGAC-Bogota, Colombia, August 2005.
[19] C. Lin and R. Nevatia. Building detection and description from a single intensity image. Int. Journal Computer Vision and Image Understanding, 72(2):101–121, 1998.
[20] M. Gerke. Use of the stair vision library within the ISPRS 2D semantic labeling benchmark (Vaihingen).
[21] B. Matei, H. Sawhney, S. Samarasekera, J. Kim, and R. Kumar. Building segmentation for densely built urban regions using aerial lidar data. In Proceedings of the IEEE Computer Vision and Pattern Recognition, pages 1–8, June 2008.
[22] L. Matikainen, K. Kaartinen, and J. Hyyppä. Classification tree based building detection from laser scanner and aerial image data. In Proceedings of ISPRS Workshop Laser Scanning, 2007.
[23] H. Mayer. Automatic object extraction from aerial imagery—a survey focusing on buildings. Computer Vision and Image Understanding, 74(2):138–149, 1999.
[24] H. Mayer, S. Hinz, and U. Stilla. Advances in Photogrammetry, Remote Sensing and Spatial Information Science, chapter 16: Automated extraction of roads, buildings and vegetation from multi-source data, pages 213–226. ISPRS Congress book, 2008.
[25] X. Meng, N. Currit, L. Wang, and X. Yang. Detect residential buildings from lidar and aerial photographs through object-oriented land-use classification. Photogrammetric Engineering & Remote Sensing, 78, 2012.
[26] S. Mueller and D. W. Zaum. Robust building detection in aerial images. In ISPRS Workshop on Object Extraction for 3D City Models, Road Databases and Traffic Monitoring - Concepts, Algorithms, and Evaluation (CMRT05), 2005.
[27] R. Nevatia, C. Lin, and A. Huertas. A system for building detection from aerial images. In Automatic Extraction of Man-Made Objects from Aerial and Space Images, Birkhäuser Verlag, pages 77–86, 1997.
[28] T. T. Nguyen, S. Kluckner, H. Bischof, and F. Leberl. Aerial photo building classification by stacking appearance and elevation measurements. In Proceedings ISPRS, 100 Years ISPRS - Advancing Remote Sensing Science, on CD-ROM, 2010.
[29] S. Paisitkriangkrai, J. Sherrah, P. Janney, and A. Hengel. Effective semantic pixel labelling with convolutional networks and conditional random fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 36–43, 2015.
[30] N. Paparoditis, M. Cord, M. Jordan, and J.-P. Cocquerez. Building detection and reconstruction from mid- and high-resolution aerial imagery. Computer Vision and Image Understanding, 72(2):122–142, 1998.
[31] N. Pfeifer, M. Rutzinger, F. Rottensteiner, W. Muecke, and M. Hollaus. Extraction of building footprints from airborne laser scanning: Comparison and validation techniques. In Urban Remote Sensing Joint Event, pages 1–9, April 2007.
[32] F. Rottensteiner. Automated updating of building data bases from digital surface models and multi-spectral images. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XXXVII B3A:265–270, 2008.
[33] F. Rottensteiner, G. Sohn, M. Gerke, J. D. Wegner, U. Breitkopf, and J. Jung. Results of the ISPRS benchmark on urban object detection and 3D building reconstruction. ISPRS Journal of Photogrammetry and Remote Sensing, 93:256–271, 2014.
[34] B. Sirmacek and C. Unsalan. Building detection from aerial images using invariant color features and shadow information. In 23rd Intl. Symp. on ISCIS, pages 1–5, Oct. 2008.
[35] T. Speldekamp, C. Fries, C. Gevaert, and M. Gerke. Automatic semantic labelling of urban areas using a rule-based approach and realized with MeVisLab.
[36] J. Verbeek and B. Triggs. Region classification with Markov field aspect models. In Proceedings of the IEEE Conference on Computer Vision & Pattern Recognition, pages 1–8, June 2007.
[37] M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1–305, 2008.
[38] J. Winn, A. Criminisi, and T. Minka. Object categorization by learned universal visual dictionary. In Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on, volume 2, pages 1800–1807. IEEE, 2005.
[39] M. Xie, K. Fu, and Y. Wu. Building recognition and reconstruction from aerial imagery and lidar data. In Proceedings of the International Conference on Radar, pages 1–4, Oct. 2006.
[40] H. Xu, L. Cheng, M. Li, Y. Chen, and L. Zhong. Using octrees to detect changes to buildings and trees in the urban environment from airborne lidar data. Remote Sensing, 7(8):9682–9704, 2015.
[41] J. Yao and Z. M. Zhang. Semi-supervised learning based object detection in aerial imagery. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1011–1016, Washington, DC, USA, 2005. IEEE Computer Society.
[42] P. Zhong and R. Wang. Object detection based on combination of conditional random field and Markov random field. In Proceedings of the 18th International Conference on Pattern Recognition, pages 160–163, 2006.
[43] P. Zimmermann. A new framework for automatic building detection analyzing multiple cue data. International Archives of Photogrammetry and Remote Sensing, 33:1063–1070, 2000.