A semi-interactive panorama based 3D reconstruction framework for indoor scenes
Trung Kien Dang a,⇑, Marcel Worring a, The Duy Bui b
a Intelligent Systems Lab Amsterdam, Informatics Institute, University of Amsterdam, The Netherlands
b Human Machine Interaction Laboratory, University of Engineering and Technology, Vietnam National University, Hanoi, Viet Nam
Article info
Article history:
Received 20 May 2010
Accepted 13 July 2011
Available online 27 July 2011
Keywords:
3D reconstruction
Panorama
Interactive
Abstract
We present a semi-interactive method for 3D reconstruction specialized for indoor scenes, which combines computer vision techniques with efficient interaction. As the starting point we use panoramas, which are popular for visualizing indoor scenes because of their wide field of view but cannot by themselves show depth. Exploiting user-defined knowledge, in terms of a rough sketch of orthogonality and parallelism in scenes, we design smart interaction techniques to semi-automatically reconstruct a scene from coarse to fine level. The framework is flexible and efficient. Users can build a coarse walls-and-floor textured model in five mouse clicks, or a detailed model showing all furniture in a couple of minutes of interaction. We show results of reconstruction on four different scenes. The accuracy of the reconstructed models is quite high, around 1% error at full room scale. Thus, our framework is a good choice for applications requiring accuracy as well as applications requiring a 3D impression of the scene.
© 2011 Elsevier Inc. All rights reserved.
1 Introduction
A realistic 3D model of a scene and the objects it contains is ideal for applications such as giving an impression of a room in a house for sale, reconstructing bullet trajectories in crime scene investigation, or building realistic settings for virtual training [1]. It gives good spatial perception and enables functionalities such as measurement, manipulation, and annotation. One broad categorization of scenes is outdoor versus indoor. Outdoor scenes have been popular in many modeling applications [2,3], especially creating models of urban scenes [4,5]. Indoor scenes are prevalent in applications like real estate management, home decoration, or crime scene investigation (CSI), but research on them is limited, with some notable exceptions [6–8]. In this paper we consider the 3D reconstruction of indoor scenes.
While in applications like real estate management a coarse model of a room is sufficient, other applications need more complete models. For instance, in CSI the model should be complete and show all the details in the crime scene, as any object is potentially evidence. Each application also requires a different level of accuracy. Home decoration, for example, does not need extreme accuracy, for its purpose is merely to give an impression of the scene. For the CSI application, the model should be as accurate as possible in order to make measurements and hypothesis validation reliable. Here we seek a framework that can create complete and accurate models for highly demanding applications such as CSI, as well as coarse models for less demanding applications.
3D models are often built manually from measurements and images using the background map technique. Modelers take images of the object from orthogonal views (top, side and front), and try to create a model matching those images. A measurement is required to scale the model of the object to the right size. Modeling from measurements and images is only suitable for simple scenes, as complex scenes with many objects require a lot of measurements, images, and interaction. Even with measurements, accurately modeling objects is difficult since the assumption that the line of view is orthogonal to the object is hard to meet in practice. Since manual reconstruction is cumbersome and time consuming [9], automatic or semi-interactive reconstruction is preferred.
Automatic methods do exist and have shown good results for isolated objects and outdoor scenes [10,3,11–13]. Those methods require a camera moving around and looking towards the scene to capture it from multiple viewpoints [14–17]. Such moves maintain a large difference between viewpoints, giving accurately estimated 3D coordinates [18]. Unfortunately, in practice people tend not to follow such moves, making these methods inaccurate and unreliable. Indeed, for the well-known PhotoSynth system it has been observed that quality suffers when users do not follow the appropriate moves [12]. In simple cases, when modeling single-object scenes, automatic methods give results of 2–5% relative error [19]. This is sufficient for visualization, but the accuracy is rather low for measurements such as in CSI applications. In indoor scenes, where space is limited, the situation is even worse, as it is difficult, if not impossible, to perform the capturing moves suitable for automatic reconstruction. So, automatic reconstruction methods in their current state are not sufficient for accurate indoor scene reconstruction.
1077-3142/$ - see front matter © 2011 Elsevier Inc. All rights reserved.
⇑ Corresponding author. Address: 42A – 144/4, Quan Nhan, Thanh Xuan, Hanoi, Viet Nam. Fax: +84 437547460.
E-mail address: dtkien123@gmail.com (T.K. Dang).
Contents lists available at SciVerse ScienceDirect
Computer Vision and Image Understanding
journal homepage: www.elsevier.com/locate/cviu
Semi-interactive methods are potential solutions [20–22]. A small amount of interaction that helps computers identify important features makes reconstruction more reliable. A few mouse clicks are enough to build a coarse model [8]. Recent work, such as the VideoTrace system [13], shows that interaction can be made smart and efficient by exploiting automatically estimated geometric information.
While interaction helps to efficiently improve the reliability, there is still the problem of having limited space to move around in indoor scenes. Using panoramas is a potential solution. Panoramas give a broad field of view, so a few panoramas are enough to completely capture a scene, and moving around the scene is no longer a problem. Furthermore, building panoramas is reliable, thus using panoramas contributes to the reliability of the overall solution. The advantages of interaction on the one hand and panoramas on the other suggest that a combination of them would be a good solution for indoor scene reconstruction.
Following the above observations, we propose a multi-stage, semi-interactive, panorama based framework for indoor scenes. In the first stage, a coarse model is built. This stage extends the technique in [8]. We make the interaction more efficient by providing a smart interaction technique, and rectify panoramas to guarantee that the accuracy meets our target quality. Furthermore, we give a reconstructability analysis and, based on that, present a capture assistant to guide the placement of the camera. The results of the first stage, a coarse model and geometric constraints, facilitate efficient interaction to build a detailed model in the second stage. This framework overcomes the problems mentioned and makes it easier to create accurate and complete models.
In the next section we summarize related work. Section 3 gives an overview of our framework. Section 4 describes how to turn panoramas into a floor-plan and how to build a coarse 3D model. Section 5 describes the interaction to add details to the coarse model. Then we evaluate the accuracy and show how efficient the framework is. We close the paper with a discussion on how to further automate the framework.
2 Related work
2.1 Reconstruction from panoramas
A panorama is a wide-angle image, typically generated by stitching images from the same viewpoint [23]. Since panoramas cover a wide view, they must be mapped on a cylinder or sphere to view. Accordingly, they are called cylindric or spherical panoramas. Being wide-angle, panoramas give a good overview of a scene, especially in indoor scenes where the field of view is limited. On the other hand, they do not give a good spatial perception since the viewpoint is fixed at one point. There is work on creating panoramas using multiple viewpoints, called multi-perspective panoramas [7,24,25]. However, multi-perspective panoramas only yield a 3D impression from the original viewpoints. Other methods are needed to make real 3D models.
3D reconstruction from panoramas is found in [6,7,22]. In [6], a scene is modeled from geometric primitives, which are manually selected in panoramas of the scene. Reconstruction is done separately for each panorama, and then the results of different panoramas are merged together. In [7] a dense 3D point cloud is estimated from multi-perspective panoramas. It, however, requires a special rig for capturing the panoramas. In [22] a method to do reconstruction from a cylindric panorama is proposed. It assumes that the scene, e.g. a room, is composed of a set of connected rectangles. This method requires that all corners of the room are visible, which is not often the case in practice. In [8], a method to reconstruct an indoor scene from normal single-perspective panoramas is described. The result is a coarse 3D model including walls onto which panoramas are projected. Such a model is not sufficient for some applications such as CSI, but this simple and flexible method gives good intermediate results towards building a detailed model.
2.2 Interaction in reconstruction
There are many types of interaction in reconstruction. In the simplest case users define geometric primitives, such as points, lines, or pyramids, and match these to the image data [20]. In [21], quadric surfaces are used to support more complex objects. VideoTrace [13] lets users draw and correct vertices of a model in an image sequence.
The efficiency of interaction can be improved by exploiting what is already known about the scene. The guiding principle is to get as many geometric constraints as possible, and use them to assist interaction. These constraints can come from domain knowledge, from the user interacting with the model, or from automatic estimation by the system. We now briefly describe each of them.
Domain knowledge, in the form of prior knowledge about the type of scenes to be reconstructed, is helpful in designing efficient interaction. For example, when modeling man-made scenes we can assume that there are many parallel lines. Thus, vanishing points are helpful in constraining the interaction [6,26,27]. In urban scenes there are often repeated components such as windows. Hence, instead of modeling them separately, the user can copy them [21]. In a man-made scene, objects are stacked on each other, e.g. a table is on the floor and books are on the table. We can exploit these relations to reduce the interaction and improve accuracy [9].
Scene specific geometric constraints can be provided by users. In [9], users define how an object should be bound to another one, to reduce the degrees of freedom in the interaction to reconstruct that object. In [8], after roughly defining a room by a sketch, users can build a coarse model with a few mouse clicks.
Some geometric constraints can be reliably estimated by computers. In some cases, coarse 3D structure and camera motion information can be estimated. State-of-the-art interactive reconstruction systems including [13,12] take advantage of such information sources to create intuitive and efficient interaction. For example, in the VideoTrace [13] system, vertices drawn in one frame by the user are tracked and rendered in other frames by the system. Users browse forward or backward in the video sequence to correct those vertices until satisfied. For the user it is like refining a model rather than creating it from scratch.
In practice, those three sources of constraints are often mixed in the modeling flow, which is also what we do in this paper.
3 Framework overview
Our framework is an A-to-Z solution, from capturing an indoor scene to modeling it, which is summarized in Fig. 2.
The framework takes as input a sketch of the floor-plan, a top-down design drawing of a room (e.g. Fig. 1a) that describes its walls and their relative positions, drawn by the user. The capture planning module analyzes the sketch to tell the user how many panoramas are needed to completely capture the scene, and suggests camera placement, i.e. the appropriate viewpoints. Either calibrated or uncalibrated cameras can be used, but to guarantee good accuracy, we advise pre-calibrating the camera and correcting the lens distortion of the images before stitching them into panoramas. Users can use a software package of their own choice to estimate the camera motion and stitch the corrected images together into panoramas, for example using Hugin (Fig. 1b).
To build a coarse model of a room, the user picks the corners, i.e. the intersections of walls, in the panoramas. The framework provides a smart corner picking method to make the interaction comfortable. The locations of the corners in the panoramas and the sketch are enough to estimate the correct floor-plan and build a coarse model of the scene [8] (Fig. 1c). More expressively, we call this coarse model, which includes textured walls and floor, a walls-and-floor model. A typical rectangular room needs only one panorama to build such a model, whereas irregular rooms may need more than one panorama, depending on the shape of the room and the viewpoints of the panoramas. This stage is discussed in detail in Section 4.
In order to add more detail efficiently, we exploit the geometric constraint resulting from the observation that indoor scenes contain many flat objects aligned to walls. We iteratively use known surfaces to guide an interaction type that we call perspective extrusion to add objects. This technique helps to quickly build a detailed model (Fig. 1d). Details of this stage are given in Section 5.
4 Building a walls-and-floor model
In this section we discuss methods for building a walls-and-floor model. For easier comprehension, we present the floor-plan estimation and other elements prior to the capture planning. For the moment, we assume that the set of panoramas given is sufficient for floor-plan estimation.
We let the user draw a sketch of the floor-plan indicating orthogonality and parallelism of walls, and use a method built upon the method in [8] to estimate an accurate floor-plan. This method is based on the observation that the horizontal dimension of the panoramic image is proportional to the horizontal view angle of the panorama. Thus a set of corners divides the panorama into horizontal view angles of known ratio. If we ensure that every panorama looks all around the room, the total horizontal view angle is 360 degrees without any measurement. Hence we know each horizontal view angle. This observation is valid only when the corners are perfectly aligned to the vertical dimension. Thus, to make a more accurate floor-plan estimation than in [8], we first rectify the panoramas to meet that condition.
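The proportionality between image columns and view angles can be made concrete with a short sketch. The function name and the pixel-column inputs are illustrative assumptions, not part of the original method description; the panorama is assumed to be rectified and to span exactly 360 degrees.

```python
def view_angles_from_columns(corner_columns, panorama_width):
    """Horizontal view angles (degrees) between consecutive corners.

    corner_columns: pixel columns of the picked corners (hypothetical input).
    panorama_width: width in pixels of a full 360-degree panorama.
    """
    cols = sorted(corner_columns)
    n = len(cols)
    angles = []
    for k in range(n):
        # Column gap to the next corner, wrapping around the panorama seam.
        gap = (cols[(k + 1) % n] - cols[k]) % panorama_width
        angles.append(360.0 * gap / panorama_width)
    return angles
```

For a panorama 3600 pixels wide with corners at columns 300, 1500, 2100 and 3300, the gaps are 1200, 600, 1200 and 600 pixels, so the view angles are 120, 60, 120 and 60 degrees.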
Building 360-degree panoramas is well studied [23], thus we do not discuss it here. For the next step, indicating corners in panoramas, we provide smart corner picking. Rectifying panoramas and estimating the floor-plan are subsequently discussed below. Then we present the reconstructability analysis and the capture assistant.
4.1 Smart corner picking
In order to estimate the floor-plan, coordinates of the top-down projections of corners are needed. As panoramas may not be well aligned, getting one point on a corner is not enough. Instead we need to identify a corner by a line segment. One way to do that is to ask a user to manually draw a line onto a panorama. To make it even simpler, we provide a utility to let users just casually pick a point in a panorama and the system will automatically identify the corner line.
Since the straightness of lines is not preserved in the coordinate system of a panorama, here a cylindric one, we must project a user-picked point into one of the images from which the panorama is created, to work in the image coordinate system. We assume that the best image is the one whose image plane is most orthogonal to the projection ray of the picked point. In other words, the angle between the ray from the viewpoint to the image center and the projection ray r_c of the picked point is smallest:
$$ i_f = \arg\min_i \angle\left(r^{(i)}, r_c\right) \qquad (1) $$
where r^{(i)} is the principal ray of image i.
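The selection rule of Eq. (1) might be sketched as follows. This is a minimal illustration under the assumption that the principal rays and the picked ray are available as 3D direction vectors in a common frame; the function and argument names are hypothetical.

```python
import numpy as np

def best_image(principal_rays, picked_ray):
    """Index of the image whose principal ray is closest in angle to picked_ray."""
    rays = np.asarray(principal_rays, dtype=float)
    rays = rays / np.linalg.norm(rays, axis=1, keepdims=True)
    rc = np.asarray(picked_ray, dtype=float)
    rc = rc / np.linalg.norm(rc)
    # Angle between each unit principal ray and the unit picked ray.
    angles = np.arccos(np.clip(rays @ rc, -1.0, 1.0))
    return int(np.argmin(angles))
```

With four images looking along azimuths 0, 90, 180 and 270 degrees, a point picked at an azimuth of about 100 degrees would be processed in the second image.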
Since panoramas are usually approximately aligned, we limit the detection to a vertical image band around the picked point. We detect vertical edges around that point, and fit a line through the picked point and edge points using RANSAC [28]. The picked point is used here as an anchor to prevent the auto-detected line from moving to a wrong location. Since the picked point is not exactly at the right position, we afterwards relax the condition, optimizing the line without constraining it to go through the picked point to yield the final line. The process is summarized in Table 1 and two examples are given in Fig. 3.
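The two-phase fit (anchored RANSAC, then an unconstrained refinement) could look roughly like the sketch below. It assumes the edge pixels have already been extracted as (x, y) points; the tolerance, iteration count, and the use of total least squares for the refinement are illustrative choices, not values taken from the paper.

```python
import numpy as np

def fit_corner_line(edge_pts, picked, n_iter=200, tol=2.0, seed=0):
    """Return (point_on_line, unit_direction) of the fitted corner line."""
    pts = np.asarray(edge_pts, dtype=float)
    p0 = np.asarray(picked, dtype=float)
    rng = np.random.default_rng(seed)

    best = None
    for _ in range(n_iter):
        # Anchored hypothesis: the line through the picked point and one edge point.
        q = pts[rng.integers(len(pts))]
        direction = q - p0
        norm = np.linalg.norm(direction)
        if norm < 1e-9:
            continue
        direction = direction / norm
        rel = pts - p0
        # Perpendicular distance of every edge point to the hypothesised line.
        dist = np.abs(rel[:, 0] * direction[1] - rel[:, 1] * direction[0])
        inliers = dist < tol
        if best is None or inliers.sum() > best.sum():
            best = inliers
    if best is None:
        raise ValueError("no usable edge points")

    # Relaxation: refit on the inliers only, no longer forcing the picked point.
    sel = pts[best]
    centroid = sel.mean(axis=0)
    _, _, vt = np.linalg.svd(sel - centroid)
    return centroid, vt[0]
```

The anchor only decides which edge points count as inliers; the final line is determined by those inliers alone, which mirrors the relaxation step described above.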
Fig. 1. Illustration of input and (intermediate) results of the reconstruction process. A simple rectangular room is used as example. (a) A rectangular room. (b) (Unwrapped) panorama of the room. (c) The walls-and-floor model. (d) Adding more detail to the model.
4.2 Rectifying panoramas
To accurately estimate the floor-plan, we first rectify the panoramas so that corners are aligned to the vertical dimension of a cylindrical panorama.
Each corner together with the viewpoint defines a plane. These planes remain unchanged no matter how we move the coordinate system, since they are defined by the scene and the viewpoint. To align the panorama cylinder we need to find the rotation R that makes those planes parallel to the vertical direction. In other words, after transforming by R, the normals of the planes are orthogonal to w = (0, 0, 1)^T, i.e.
$$ u_i^T R^{-1} w = 0 \qquad (2) $$
where u_i are the planes' normals.
Using this constraint, given at least three corners, we can compute the last column of R^{-1}, or equivalently the last row of R, by finding the least-squares solution. If the last row of R is r_3 = (a, b, c), then from the constraint that R is orthogonal we choose its other rows as:
$$ r_1 \cong (-b, a, 0), \qquad r_2 \cong (-ac, -bc, a^2 + b^2) \qquad (3) $$
where ≅ means equal up to a scale, and |r_1| = |r_2| = |r_3| = 1. Once R is computed, we resample the panoramic image to finish the rectification.
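A possible numerical realization of this step is sketched below. It assumes the corner-plane normals u_i are given as rows of an array, solves the homogeneous least-squares problem with an SVD (one common choice; the paper does not state which solver is used), and fixes the sign of r_3 so that "up" stays up. Names are hypothetical.

```python
import numpy as np

def rectifying_rotation(normals):
    """Rotation R whose last row r3 is (least-squares) orthogonal to all normals."""
    U = np.asarray(normals, dtype=float)
    # Minimise sum_i (u_i . r3)^2 subject to |r3| = 1: the right singular
    # vector belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(U)
    r3 = vt[-1]
    if r3[2] < 0:  # assumed convention: keep the vertical axis pointing up
        r3 = -r3
    a, b, c = r3
    s = a * a + b * b
    if s < 1e-12:  # panorama already aligned, nothing to correct
        return np.eye(3)
    r1 = np.array([-b, a, 0.0]) / np.sqrt(s)
    r2 = np.array([-a * c, -b * c, s]) / np.sqrt(s)
    return np.vstack([r1, r2, r3])
```

With these rows, r_1 = r_2 × r_3, so R is a proper rotation (determinant +1), and every rotated normal R u_i has a zero vertical component.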
4.3 Estimating the floor-plan
The locations of corners in panoramas, identified in the previous step, give sets of horizontal angles between the corners when viewed from the panorama viewpoint. If we have a way to represent those angles in terms of coordinates of projections of corners and viewpoints in the floor-plan, we have a set of constraints to estimate the floor-plan and the viewpoints. Here we briefly review such a method presented in [8], discuss its applicability, and show how we extend it for our work.
A sketch is a model of the floor-plan. We force users to draw rectilinear lines parallel to the axes by providing them with a drawing grid. Of course, this alignment can be done automatically, but drawing in such a way helps users to correctly define parallelism and orthogonality. Note that, as only parallelism and orthogonality are important in the parameterization, a sketch of a rectangular room can be any arbitrary rectangle.
Assuming that the room has n corners, we need at most 2n parameters to represent it. A viewpoint, whose coordinates both have to be estimated, is represented by a pair of separate parameters. Suppose that we have v panoramas, then the total number of parameters is 2n + 2v. For each wall drawn in the sketch that is parallel to an axis, since the two corners of the wall share a horizontal or vertical coordinate, the number of parameters is reduced by one (Fig. 4a). Hence the number of parameters is reduced by the number of those walls, m. To further reduce the number of parameters, the origin of the coordinate system is set at one corner, and the length of one wall is set to one, as the reconstruction is up to a scale anyway. These settings reduce the number of parameters by 3. In summary, the number of parameters to be estimated is:
$$ 2n + 2v - m - 3 \qquad (4) $$
From the model of the floor-plan that contains the coordinates of corners and viewpoints, we can estimate the angle between two corners as seen from a viewpoint (Fig. 4b). These angles are equal to the set of angles defined by user-picked corners in the panoramas. This set of constraints can be used to estimate the parameters of the floor-plan model and the viewpoints.
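To illustrate how the floor-plan model yields angle constraints, the following sketch computes the angles between consecutive corners as seen from a viewpoint and compares them with measured angles. It assumes corners are listed counter-clockwise in floor-plan coordinates; the function names are hypothetical, and the optimizer that would minimize these residuals is deliberately left out.

```python
import math

def predicted_angles(corners, viewpoint):
    """Angles (degrees) between consecutive corners as seen from the viewpoint."""
    vx, vy = viewpoint
    bearings = [math.atan2(y - vy, x - vx) for x, y in corners]
    n = len(bearings)
    return [
        math.degrees((bearings[(k + 1) % n] - bearings[k]) % (2 * math.pi))
        for k in range(n)
    ]

def angle_residuals(corners, viewpoint, measured_deg):
    """Differences between model angles and angles measured in the panorama."""
    return [p - m for p, m in zip(predicted_angles(corners, viewpoint), measured_deg)]
```

A floor-plan estimate is consistent with a panorama when all residuals vanish; a nonlinear least-squares solver over the parameters counted in Eq. (4) would drive them towards zero.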
Fig. 2. Overview of the proposed framework.
Table 1
Smart corner picking process.
1. Let the user pick a point in/near a corner from the panorama.
2. Find the best image, according to Eq. (1).
3. Perform Canny edge detection in a vertical band, one tenth of the image width wide, around the picked point.
4. Fit a line through the picked point and the edges using RANSAC, where the line must go through the picked point.
5. Optimize the line without constraining it to the picked point.
Fig. 3. Two examples of smart corner picking. (a) The user picks a point. (b) Edges are detected in a vertical image band; a line is fitted through the picked point and edges. Note that there is another (even longer) vertical line but the algorithm
At this point, the coordinates of the top-down projections of the viewpoints are estimated, but the viewpoints' heights are missing. Complete viewpoint coordinates are required to add more details to the model in the later stage. Since we already know the floor and the projection of the viewpoint on the floor, we only need one point to compute the relative distance from the viewpoint to the floor. To get that point, we ask the user to pick any floor point in each panorama to compute its viewpoint height.
4.4 Reconstructability analysis
We now give an analysis of the floor-plan estimation method. To estimate the floor-plan and the viewpoint coordinates, the number of constraints must be greater than or equal to the number of unknowns given in Eq. (4) of the previous sub-section.
Suppose that viewpoint i sees c_i corners. Since the sum of the angles is 360 degrees, we have c_i − 1 independent constraints. Since the viewpoints are different, the constraints of one viewpoint are independent of the constraints of other viewpoints. The problem is solvable when the number of constraints is greater than or equal to the number of parameters:
$$ \sum_{i=1}^{v} c_i \ge 2n + 3v - m - 3 \qquad (5) $$
Common rooms have all walls parallel to an axis, i.e. the floor-plan is a rectilinear polygon, thus m is equal to n. Eq. (5) then simplifies to:
$$ \sum_{i=1}^{v} c_i \ge n + 3v - 3 \qquad (6) $$
Suppose that we can find a point from which all corners are visible, i.e. c_i = n. Eq. (6) is then further simplified to v ≥ 1. So indeed, given a rectilinear floor-plan, one panorama that sees all corners might be enough to estimate it. A special, yet the most common, case is a rectangular room. Since we see all four corners from any viewpoint, one panorama might be enough to reconstruct the walls-and-floor model.
We need more panoramas when the floor-plan is not a rectilinear polygon, or when from the chosen viewpoint we cannot see all corners. Fig. 5 shows examples.
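The counting argument can be checked mechanically. The sketch below is a direct transcription of the parameter count and of condition (5); the function names are illustrative.

```python
def num_parameters(n, v, m):
    """Unknowns for n corners, v viewpoints and m axis-parallel walls."""
    return 2 * n + 2 * v - m - 3

def is_reconstructable(corners_seen, n, m):
    """Condition (5): corners_seen[i] is the number of corners c_i seen by viewpoint i."""
    v = len(corners_seen)
    constraints = sum(c - 1 for c in corners_seen)
    return constraints >= num_parameters(n, v, m)
```

For a rectangular room (n = m = 4) and one panorama seeing all four corners there are 3 constraints for 3 unknowns, so the condition holds. For an L-shaped room (n = m = 6), a single viewpoint that sees only five corners gives 4 constraints for 5 unknowns, which is not enough, whereas two such viewpoints give 8 constraints for 7 unknowns.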
4.5 The capture assistant
The capture assistant helps users in planning viewpoints in the room so that the reconstruction is possible and the model covers all of the room. To that end, it must know the number of unknowns given a sketch, the number of constraints produced by the viewpoints, and the area they cover. Furthermore, it is preferred that the number of viewpoints is minimal.
The number of unknowns is computed easily using Eqs. (5) and (6). In a convex polygon, a line segment from any point within it to any of its vertices does not leave the polygon. Hence if the floor-plan is convex, counting the constraints is trivial, since from any viewpoint we see all the corners. When the floor-plan is concave, the problem is nontrivial. Since we keep the sketching simple, only asking users to align rectilinear lines of the sketch parallel to the axes, the sketch may be stretched unevenly along the axes relative to the real floor-plan. Our solution is to decompose the sketch into tiles and compute the minimal number of observable corners from each tile, invariant to how the sketch is stretched along the axes. The algorithm is described in Algorithm 1.
Algorithm 1. Decomposing a sketch into invariant observable areas
Step 1: Cut the sketch into tiles using all distinct x and y coordinates. The sketch is turned into a set of rectangles and triangles (Fig. 6a), each of which is called a tile (Fig. 6b).
Step 2: For each tile, find its invariant observable area (IOA) by the following steps:
– Initialize the area so that it contains only the tile itself.
– Iteratively add a tile if it, together with some tiles already added, forms a convex polygon containing the initial tile.
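The following sketch illustrates Algorithm 1 for the simplified case of a purely rectilinear sketch, where all tiles are rectangles and every convex union of tiles is itself a rectangle. Under that assumption, a tile belongs to the IOA of another tile exactly when the grid rectangle spanned by the two tiles lies entirely inside the room. Triangular tiles are not handled, and all names are hypothetical.

```python
def point_in_polygon(px, py, poly):
    """Standard ray-casting test."""
    inside = False
    n = len(poly)
    for k in range(n):
        (x1, y1), (x2, y2) = poly[k], poly[(k + 1) % n]
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside

def decompose(poly):
    """Step 1: cut the sketch along all distinct x and y coordinates."""
    xs = sorted({x for x, _ in poly})
    ys = sorted({y for _, y in poly})
    tiles = set()
    for i in range(len(xs) - 1):
        for j in range(len(ys) - 1):
            cx, cy = (xs[i] + xs[i + 1]) / 2, (ys[j] + ys[j + 1]) / 2
            if point_in_polygon(cx, cy, poly):
                tiles.add((i, j))
    return xs, ys, tiles

def ioa(tile, tiles):
    """Step 2 (rectilinear case): tiles whose spanned rectangle with `tile` is inside."""
    i, j = tile
    return {
        (k, l) for (k, l) in tiles
        if all((a, b) in tiles
               for a in range(min(i, k), max(i, k) + 1)
               for b in range(min(j, l), max(j, l) + 1))
    }

def corners_in(area, poly, xs, ys):
    """Number of sketch corners that touch at least one tile of the area."""
    count = 0
    for x, y in poly:
        ix, iy = xs.index(x), ys.index(y)
        if any((a, b) in area for a in (ix - 1, ix) for b in (iy - 1, iy)):
            count += 1
    return count
```

Because only the ordering of coordinates enters the computation (through the tile indices), the result does not change when the sketch is stretched along either axis.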
Lemma 4.1. If the sketch differs from the real floor-plan by an uneven scaling, the IOAs are invariant to that scaling.
Proof. Since the sketch differs from the real floor-plan by an uneven scaling, the coordinates of the corners are transformed by a monotonic function, thus the order between any pair of x or y coordinates is preserved. That means that if x_a > x_b in the floor-plan, or in one sketch, this still holds in any other sketch. Consequently, the order of tiles, as decomposed in the algorithm above, is horizontally and vertically unchanged in any sketch. Consequently the IOAs, which are sets of tiles built following step 2 in Algorithm 1, are unchanged. □
Lemma 4.2. Any point in an IOA is observable from any point in the initial tile.
Proof. Any point is observable from another point within a convex polygon. Since the extending scheme only adds a new tile if it is part of a convex polygon with the initial tile, all points in the IOA are observable from any point in the initial tile. □
Fig. 4. Parameterization of the floor-plan model given a sketch, simplified from Fig. 2 in [8]. (a) To reduce the number of parameters, corners are represented by shared parameters. (b) Each viewpoint is parameterized separately. Locations of corners in a panorama at the viewpoint give a set of angles between corners as viewed from the viewpoint.
Fig. 5. When the floor-plan is not rectilinear (a), or if from the viewpoint we cannot see all corners (b), we may need more than one panorama to estimate it. (Figure labels: unseen corner, viewpoint.)
Having the IOAs, we check whether the planned viewpoints cover all of the room and provide enough constraints to estimate the real floor-plan. The IOA of a viewpoint is the IOA of the tile containing it. By checking whether the union of the planned viewpoints' IOAs equals the whole sketch, we can make sure that the set of viewpoints covers all of the scene. Checking whether the floor-plan is solvable is done by summing the number of corners observed by each IOA, and then comparing it to the condition in (5).
Given the IOAs of a sketch, finding an optimal set of viewpoints, i.e. the smallest number of viewpoints that covers the scene completely and satisfies the reconstructability condition (5), is a hard problem. Let us construct a graph representing the problem. Each tile is a node in the graph. For each tile, we have edges connecting it to all tiles in its IOA. Since, if a tile is observable from another one, then from it we can also observe the other tile, the edges are undirected. Putting aside the reconstructability condition, our problem is finding the minimal set of nodes from which edges connect to all the remaining nodes. This is the minimal dominating set problem, one of the known NP-complete problems [29]. With an additional condition, our problem is arguably of the same complexity. To suggest a solution to users in interactive time, we propose the greedy Algorithm 2.
Algorithm 2. Suggesting viewpoints, the greedy algorithm
Step 1. Find a dominating set. Initialize an empty dominating set of tiles. While the scene is not covered by the union of the IOAs of the tiles in the set, add a tile whose IOA contains the most uncovered tiles.
Step 2. Satisfy the reconstructability condition. While the condition of (5) is not satisfied, add a tile whose IOA contains the most corners, i.e. one providing the largest number of constraints.
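A compact sketch of the two greedy steps is given below. It assumes the IOAs and the per-tile corner counts are already available as dictionaries; the `blocked` argument, the tie-breaking by insertion order, and the possibility of selecting the same tile twice in step 2 (read as two distinct camera positions inside that tile) are assumptions of this illustration.

```python
def suggest_viewpoints(ioas, corner_counts, n, m, blocked=()):
    """ioas: tile -> set of tiles in its IOA; corner_counts: tile -> corners in its IOA."""
    candidates = [t for t in ioas if t not in blocked]
    all_tiles = set(ioas)
    chosen, covered = [], set()

    # Step 1: greedy dominating set.
    while covered != all_tiles:
        best = max(candidates, key=lambda t: len(ioas[t] - covered))
        if not ioas[best] - covered:
            raise ValueError("remaining tiles cannot be covered")
        chosen.append(best)
        covered |= ioas[best]

    # Step 2: add viewpoints until condition (5) holds.
    def satisfied():
        v = len(chosen)
        return sum(corner_counts[t] for t in chosen) >= 2 * n + 3 * v - m - 3

    while not satisfied():
        best = max(candidates, key=lambda t: corner_counts[t])
        if corner_counts[best] <= 3:
            # Each extra viewpoint adds 3 to the right-hand side of (5).
            raise ValueError("no tile adds enough constraints")
        chosen.append(best)
    return chosen
```

For the L-shaped room with tiles (0, 0), (1, 0) and (0, 1), the tile at the inner corner sees everything and is suggested alone; if it is marked as blocked, the sketch falls back to the two arm tiles and then adds a third viewpoint to satisfy condition (5).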
In practice, since there are objects in the room, we might not be able to put the camera at the suggested positions, or see all the corners we should see according to the analysis. Should an object, e.g. a tall wardrobe, completely block one or more corners, it must be considered as part of the walls. The procedure to suggest viewpoints is the same. If a suggested tile is inappropriate for placing the camera, users can mark it so that Algorithm 2 can ignore that tile when recomputing the suggested viewpoints. This procedure has proven to give good results in practical cases.
Viewpoints also affect the accuracy of the floor-plan and the texture quality. In practice, since the panorama is built from high resolution images, the texture quality should not be a problem. To estimate the floor-plan accurately, intuitively one should place the camera in the center of the room to balance the constraints.
After this stage, we have a textured walls-and-floor model. In this model, objects are projected on the walls and on the floor. It gives a good overview of the scene. As indicated, in applications such as real estate management it should be satisfactory. However, for an application such as CSI, the object localization is not detailed enough. Thus, we need the second stage to add more detail.
5 Adding details using perspective extrusion
The model now contains the planes of the walls and the floor, and the viewpoint locations. We design interactive methods to add detail to the model in the spirit of the whole framework: flexibly reconstructing objects from coarse to fine. For example, a table is reconstructed first and then the stack of books on it. Characteristics of indoor scenes are utilized in designing interaction methods that meet that idea.
In indoor scenes, many objects are composed of planes. Since objects are often aligned to walls, those planes are likely parallel to at least one wall or the floor. As indicated earlier, this gives a constraint to reconstruct objects. This action is similar to an extrusion, a popular standard technique in manual 3D modeling. In a normal extrusion, the orthogonal projection of the object's boundary on a reference plane is orthogonally popped up by a known distance, creating a new planar object surface. In our situation we do not see the object in orthogonal views, but from a panorama viewpoint. So, instead of moving the object's boundary along lines orthogonal to the reference plane, we move it along rays from the viewpoint to their original locations in the reference plane (Fig. 1d). Because of this constraint, we call it a perspective extrusion.
Our aim is to reconstruct an object surface S that is parallel to an already reconstructed plane (Fig. 7). S is reconstructed from a set of three parameters. The reference plane l is a reconstructed plane to which the plane of S is parallel. The distance from S to l is denoted by d, and b is a projection of the boundary of S in a panorama. The reconstruction procedure includes shifting the parallel plane l by distance d to get the object plane p, and cutting p by the pyramid of b and the viewpoint from which we see b. Once we have S, users can choose whether the object is a solid box or just a planar surface. The perspective extrusion process is summarized in Table 2.
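The geometric core of the procedure, shifting the reference plane and intersecting the viewing rays with it, may be sketched as follows. The representation of l as n · x = k, the shift along the direction of n, and the input of the boundary b as ray directions from the viewpoint are assumptions of this illustration; the names are hypothetical.

```python
import numpy as np

def perspective_extrude(viewpoint, ray_dirs, plane_normal, plane_offset, d):
    """Vertices of S: rays from the viewpoint cut by the plane l shifted by d.

    The reference plane l is {x : plane_normal . x = plane_offset}.
    """
    o = np.asarray(viewpoint, dtype=float)
    n = np.asarray(plane_normal, dtype=float)
    scale = np.linalg.norm(n)
    n = n / scale
    k = plane_offset / scale + d  # object plane p: n . x = k
    vertices = []
    for r in np.asarray(ray_dirs, dtype=float):
        denom = n @ r
        if abs(denom) < 1e-12:
            raise ValueError("ray is parallel to the object plane")
        t = (k - n @ o) / denom
        if t <= 0:
            raise ValueError("object plane lies behind the viewpoint")
        vertices.append(o + t * r)
    return np.array(vertices)
```

For example, with the floor z = 0 as reference plane, a viewpoint at height 1.5 and d = 0.75, a ray that would hit the floor at (2, 0, 0) is cut at (1, 0, 0.75), halfway along the ray.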
In related work such as [9], object parameters are defined indirectly in terms of geometric objects, e.g. a rectangular box. In pictures of indoor scenes, objects are frequently occluded, making the use of geometric objects difficult. To give more options in reconstructing an object, we choose to let users define those parameters directly and separately. For example, a box is defined by one of its faces and the distance to the plane the face is parallel to. The distance can be defined by a line orthogonal to any reconstructed plane.
The parallel plane l is picked from the current model. We provide two ways to define d, namely using one or two viewpoints.
Fig. 6. Illustration of the sketch decomposition algorithm. (a) The sketch is cut into rectangles and triangles using all distinct x and y coordinates. (b) The tile graph indicates possibilities of traveling among tiles. (c) For each tile the initial observable area is itself (black); then tiles reached by traveling parallel to axes are iteratively added (gray); finally tiles reached from two ways are added (diagonal pattern). (d) The number of corners contained in the observable area is the minimal number of observable corners from the tile.
To define d from a single viewpoint, the user draws a line from the object surface orthogonally to a reconstructed plane. To define d from two viewpoints, the user picks the projections of a point on the object surface in two panoramas. We then triangulate these two projections to estimate the 3D coordinates of that point; its distance to l, which is already reconstructed, is the distance d. This strategy is useful when there is no physical clue for guiding the drawing of a line from the object’s surface orthogonally to a reconstructed plane. For example, for a chair with bent legs standing in the middle of the room, there would be no physical clue to draw d from a single viewpoint. The boundary b is a polygon drawn by users from the viewpoint. To assist the drawing of b, we assume by default that the boundary of S has orthogonal angles and is symmetric, as long as the drawing of b does not break this assumption. Using those assumptions, we predict the boundary and render it. This is helpful to accurately define b, especially when a vertex is occluded.
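The two-viewpoint definition of d described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes each picked projection has already been converted to a viewing ray (panorama center plus direction), it uses the simple midpoint method in place of whatever triangulation the framework actually applies, and it represents the reference plane l by a hypothetical unit normal and offset. All function and variable names are illustrative.

```python
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1: Vec, u1: Vec, c2: Vec, u2: Vec) -> Vec:
    """Midpoint of the shortest segment between the rays c1 + s*u1 and c2 + t*u2.

    Assumption: the midpoint method stands in for the paper's triangulation.
    """
    w = tuple(p - q for p, q in zip(c1, c2))
    a, b, c = dot(u1, u1), dot(u1, u2), dot(u2, u2)
    d, e = dot(u1, w), dot(u2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("viewing rays are (nearly) parallel")
    s = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    return tuple(
        0.5 * ((p + s * k) + (q + t * m))
        for p, k, q, m in zip(c1, u1, c2, u2)
    )


def distance_to_plane(x: Vec, normal: Vec, offset: float) -> float:
    """Unsigned distance from x to the plane {y : normal . y = offset}.

    The normal is assumed to have unit length.
    """
    return abs(dot(normal, x) - offset)


# Hypothetical setup: two panorama centers 1.5 m above the floor z = 0,
# both looking at the same point on a chair seat.
c1, c2 = (0.0, 0.0, 1.5), (2.0, 0.0, 1.5)
u1, u2 = (1.0, 1.0, -1.05), (-1.0, 1.0, -1.05)
point = triangulate_midpoint(c1, u1, c2, u2)
d = distance_to_plane(point, (0.0, 0.0, 1.0), 0.0)
print(point, d)
```

In this made-up configuration the two rays meet near (1, 1, 0.45), so the seat height above the floor plane comes out close to 0.45. With real user clicks the rays rarely intersect exactly, which is why the sketch takes the midpoint of the shortest connecting segment.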
For flexibility and accuracy, we let users define any parameter (l, d, or b) from any available panorama viewpoint. A possible way to increase flexibility and accuracy is to let users adjust the boundary b from different viewpoints as in VideoTrace [13]. However, that is only effective if we have many viewpoints, i.e. observations of the boundary. To keep the framework simple and the number of input panoramas small, we have decided not to use that technique.
To be reconstructible, objects must be seen and the parameters for perspective extrusion must be definable. The capture assistant described in Section 4.5 handles part of this by ensuring all of the floor and walls will be seen. Of course objects can be occluded completely by other objects, but that is hardly the case for the main objects in the scene. For l and b, if objects are complex or curvy, we can only approximate them (Fig. 11c and d). For a “floating” object, like the chair in Fig. 10a, since there is no solid connection from its surface to another surface, one should use two viewpoints to define d. In general, if an object has sufficiently different appearance in two panoramas, then it is reconstructible.
6 Results
We now present results showing that the proposed framework overcomes difficulties in indoor scene reconstruction to efficiently produce complete and accurate models.
6.1 Datasets
Four scenes are used in our evaluation (Fig. 8). Three are rooms in a house captured by ourselves. The last one is a fake crime scene captured by The Netherlands Forensic Institute. The ground truth is defined by measurements made on objects in the scenes. All are typical indoor scenes: rather complex and with limited space. For every scene, the minimal number of panoramas required, as computed using our capture assistant, is one. Because of obstacles (furniture) there was no good position for capturing all corners, thus we had to use two panoramas for the three rooms. For the fake crime scene, we use one panorama.
Fig. 7. A perspective extrusion pops up an object from an already reconstructed plane.
Table 2. Perspective extrusion process.
1. The user picks the reference plane l.
2. The user defines the distance d from l to the object plane p, either from one or two viewpoints.
3. Compute the object plane p by shifting l by d.
4. The user defines the boundary through its projection b onto a panorama.
5. Compute the initial S by cutting the object plane p by the pyramid of b and the panorama viewpoint.
6. The user chooses the object type, either a solid box or a planar surface.
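Steps 3 and 5 of Table 2 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the reference plane l is represented by a hypothetical unit normal and offset, d is treated as a signed shift along that normal, and the boundary b is assumed to be given as viewing-ray directions from the panorama viewpoint (the conversion from panorama pixels to rays is omitted). Cutting p by the pyramid of b and the viewpoint then reduces to intersecting each boundary ray with p.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def shift_plane(normal: Vec, offset: float, d: float) -> Tuple[Vec, float]:
    """Step 3: shift the plane {x : normal . x = offset} by d along its unit normal."""
    return normal, offset + d


def intersect_ray_plane(origin: Vec, direction: Vec,
                        normal: Vec, offset: float) -> Vec:
    """Point where the ray origin + t*direction (t > 0) meets the plane."""
    denom = dot(normal, direction)
    if abs(denom) < 1e-12:
        raise ValueError("ray is parallel to the plane")
    t = (offset - dot(normal, origin)) / denom
    if t <= 0:
        raise ValueError("plane lies behind the viewpoint along this ray")
    return tuple(o + t * k for o, k in zip(origin, direction))


def perspective_extrusion(viewpoint: Vec, boundary_rays: Sequence[Vec],
                          normal: Vec, offset: float, d: float) -> List[Vec]:
    """Step 5: corners of S, i.e. the boundary rays cut by the shifted plane p."""
    n, p_offset = shift_plane(normal, offset, d)
    return [intersect_ray_plane(viewpoint, r, n, p_offset) for r in boundary_rays]


# Hypothetical example: a table top 0.75 m above the floor z = 0,
# seen from a panorama center 1.5 m above the floor.
viewpoint = (0.0, 0.0, 1.5)
rays = [(0.0, 0.0, -0.75), (1.0, 0.0, -0.75), (1.0, 1.0, -0.75), (0.0, 1.0, -0.75)]
surface = perspective_extrusion(viewpoint, rays, (0.0, 0.0, 1.0), 0.0, 0.75)
print(surface)
```

Every returned corner lies on the shifted plane, and each one stays on the ray through its drawn boundary vertex, which is the sense in which the extrusion is perspective and not orthogonal. Building the solid box of step 6 would additionally need the corners projected back onto l, which is left out here.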
[Fig. 8 panels: (a) Bedroom, 2 panoramas; (b) Dining room, 2 panoramas; (c) Kitchen, 2 panoramas; (d) Fake crime scene, 1 panorama.]
Table 3. Floor-plan relative errors (in percent, mean ± standard deviation). To achieve the best accuracy, lens distortion correction should be applied before panorama stitching, and panorama rectification (Section 4.2) should be used. The floor-plan error of the fake crime scene is not available because ground truth is lacking.

Scene         Without rectification   Uncalibrated images   Calibrated & rectification
Bedroom       0.48 ± 1.45             0.49 ± 0.16           0.38 ± 0.14
Dining room   7.50 ± 3.20             7.48 ± 3.17           1.18 ± 0.49
Kitchen       9.88 ± 3.24             0.48 ± 0.23           0.28 ± 0.05
6.2 Accuracy
Since the reconstructed model is defined only up to a scale and a rotation, we have to eliminate that ambiguity in order to evaluate the accuracy. To do so we estimate a transformation from the estimated floor-plan to the ground truth floor-plan. We apply this to the model, and then evaluate the model at two levels: at room scale (i.e. floor-plan error), and at object scale (i.e. object measurements).
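The alignment step can be sketched as a least-squares 2D similarity fit between corresponding floor-plan corners. The text does not specify how the transformation is estimated, so the following is only one plausible choice: it assumes the corner correspondences are known, it also estimates a translation (an addition beyond the scale and rotation mentioned above), and it does not handle reflections. Points are encoded as complex numbers so that rotation and scale combine into a single complex factor; all names are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def fit_similarity(estimated: Sequence[Point],
                   truth: Sequence[Point]) -> Tuple[complex, complex]:
    """Least-squares (a, t) such that a * p + t approximates q for matched corners.

    abs(a) is the scale and the argument of a is the rotation angle.
    """
    p = [complex(x, y) for x, y in estimated]
    q = [complex(x, y) for x, y in truth]
    n = len(p)
    p_mean, q_mean = sum(p) / n, sum(q) / n
    den = sum(abs(pi - p_mean) ** 2 for pi in p)
    if den == 0:
        raise ValueError("estimated corners are all identical")
    num = sum((pi - p_mean).conjugate() * (qi - q_mean) for pi, qi in zip(p, q))
    a = num / den
    t = q_mean - a * p_mean
    return a, t


def apply_similarity(a: complex, t: complex,
                     points: Sequence[Point]) -> List[Point]:
    """Map points through z -> a * z + t."""
    mapped = [a * complex(x, y) + t for x, y in points]
    return [(z.real, z.imag) for z in mapped]


# Hypothetical example: the estimate is the true 4 m x 3 m floor-plan
# rotated by 90 degrees and scaled by 0.5.
truth = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
estimated = [(0.0, 0.0), (0.0, 2.0), (-1.5, 2.0), (-1.5, 0.0)]
a, t = fit_similarity(estimated, truth)
aligned = apply_similarity(a, t, estimated)
print(abs(a), aligned)
```

Here the fit recovers a scale of 2 and maps the estimate back onto the true corners. In a real evaluation the residuals left after this alignment are what the floor-plan error measures.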
Table 3 shows floor-plan errors with and without rectifying panoramas. In two out of three datasets the improvement is quite significant. In one dataset, the Bedroom, the error without rectification is almost the same as with rectification, since the angles of the original panoramas are almost perfect. Using uncalibrated images (calibration done during stitching) is possible, though the results are not as good as using pre-calibrated images. The errors, with pre-calibrated images and panorama rectification, are a few centimeters in a room of about ten square meters. The relative errors, computed by dividing the absolute error by the length of the diagonal of the rectangular bounding box of the true floor-plan, are about 1%. The estimated floor-plan of the dining room is less accurate since it was hard to identify some of its corners in the panoramas. Our accuracy is higher than in [8], where the error is about 4%. Two differences are responsible for the improvement: the floor-plan estimation strategy we used, and our panorama rectification. In [8], a sketch of several rooms is used to parameterize and estimate the floor-plan of multiple rooms. It was noted that doing so, and thus ignoring the thickness of walls, might reduce the accuracy [8]. To achieve high accuracy, we have estimated the floor-plan of each room separately. More importantly, our rectification eliminates the inaccurate alignment in the input panoramas (see Table 4).
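The relative error defined above (absolute error divided by the diagonal of the bounding box of the true floor-plan) can be written out as a short sketch. Several details are assumptions not confirmed by the text: the absolute error is taken to be the displacement of each aligned corner from its ground-truth position, the bounding box is taken to be axis-aligned, and the spread is reported as a population standard deviation.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def bounding_box_diagonal(corners: Sequence[Point]) -> float:
    """Diagonal length of the axis-aligned bounding box of the corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return math.hypot(max(xs) - min(xs), max(ys) - min(ys))


def relative_errors_percent(aligned: Sequence[Point],
                            truth: Sequence[Point]) -> List[float]:
    """Per-corner error as a percentage of the true floor-plan diagonal."""
    diagonal = bounding_box_diagonal(truth)
    return [100.0 * math.dist(p, q) / diagonal for p, q in zip(aligned, truth)]


# Hypothetical 4 m x 3 m room (diagonal 5 m) with one corner off by 5 cm.
truth = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
aligned = [(0.05, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
errors = relative_errors_percent(aligned, truth)
print(f"{mean(errors):.2f} ± {pstdev(errors):.2f} %")
```

With a 5 m diagonal, a 5 cm displacement corresponds to a relative error of 1%, which matches the order of magnitude quoted in the text for a room of about ten square meters.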
For objects, since the angles between geometric primitives (lines and planes) are already enforced during the reconstruction, we only evaluate the length errors, absolute and relative to the ground truth lengths.
The accuracy of our framework is quite high, e.g. compared to [8,19]. Object accuracy is slightly lower than scene accuracy in terms of relative error, but our examination shows that the absolute errors are about the same.
6.3 Efficiency and completeness
Our framework is efficient. A scene can be modeled in about a dozen minutes. Fig. 9 shows the model of a rather complex scene, namely the fake crime scene. The walls-and-floor model is built in seconds. All furniture is modeled in about 5 min. The time taken to build the final model that includes small objects such as cups on tables is 10 min. Furthermore, users do not need to measure objects for modeling at capture time.
Fig. 10 shows models of some scenes built using our framework. Close-ups of objects picked from reconstructed models are given in Fig. 11. Objects composed of planar surfaces are well reconstructed, while complex curvy objects can only be approximated using perspective extrusions.
7 Conclusion
We have proposed a panorama-based semi-interactive 3D reconstruction framework for indoor scenes. The framework overcomes the problems of limited field of view in indoor scenes and has the desired properties: robustness, efficiency, and accuracy. Those properties make it suitable for a broad range of applications, from a coarse model created in a few seconds for a presentation to a detailed model for measurement in crime scene
Table 4. Average object errors (mean ± standard deviation).

Scene              Absolute (cm)   Relative (%)
Fake crime scene   6.2 ± 2.6       1.84 ± 0.89
Fig. 9. Resulting models as a function of time and amount of interaction spent. The example is the fake crime scene. (a) Walls-and-floor model: 0 min, 6 mouse clicks. (b) All furniture model: 5 min, 10 extrusions. (c) Final model and (d) final textured model: 10 min, 19 extrusions.
[Fig. 10 panels: (a) Bedroom; (b) Dining room; (c) Kitchen.]
investigation. Models inexpensively created using our framework are an intuitive medium to manage and retrieve digitized information of scenes and use it in interactive applications.
A limitation of the framework is that it lacks the ability to model complex objects. This could be counteracted by other, more expensive techniques. For example, the VideoTrace technique [13] lets users model objects from video sequences. The ortho-image technique [30] creates background maps from image sequences to assist artists in modeling objects in 3D authoring software. As objects are complex, both techniques require images from many different angles and more interaction. Since our panoramic images are calibrated, we can integrate those techniques into our framework as plugins. Once the object is reconstructed using those techniques, we can automatically integrate it back into our model, by matching panoramic images to the image sequence used to model the object and then estimating the pose of the object. Thus the framework is a useful tool both for quickly building coarse models and for efficiently building accurate models. In the accompanying video the system is demonstrated on a number of realistic scenes.
Acknowledgments
This work is supported by the BSIK project MultimediaN and the Research Grant from Vietnam National University, Hanoi, No. QG.10.23.
Appendix A Supplementary data
Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.cviu.2011.07.001.
References
[1] T.L.J Howard, A.D Murta, S Gibson, Virtual environments for scene of crime
reconstruction and analysis, in: SPIE – Visual Data Exploration and Analysis VII,
vol 3960, 2000, pp 1–8.
[2] M Pollefeys, L.J.V Gool, M Vergauwen, K Cornelis, F Verbiest, J Tops,
Image-based 3D acquisition of archaeological heritage and applications, in: Virtual
Reality, Archeology, and Cultural Heritage, 2001, pp 255–262.
[3] N Snavely, S.M Seitz, R Szeliski, Modeling the world from internet photo
collections, International Journal of Computer Vision 80 (2) (2008) 189–210.
[4] M Pollefeys, D Nistér, J.-M Frahm, A Akbarzadeh, P Mordohai, B Clipp, C.
Engels, D Gallup, S.J Kim, P Merrell, C Salmi, S.N Sinha, B Talton, L Wang, Q.
Yang, H Stewénius, R Yang, G Welch, H Towles, Detailed real-time urban 3D
reconstruction from video, International Journal of Computer Vision 78 (2–3)
(2008) 143–167.
[5] N Cornelis, B Leibe, K Cornelis, L.V Gool, 3D urban scene modeling
integrating recognition and reconstruction, International Journal of
Computer Vision 78 (2–3) (2008) 121–141.
[6] H.-Y Shum, M Han, R Szeliski, Interactive construction of 3D models from
panoramic mosaics, in: Computer Vision and Pattern Recognition, 1998, pp.
427–433.
[7] Y Li, H.-Y Shum, C.-K Tang, R Szeliski, Stereo reconstruction from
multiperspective panoramas, IEEE Transaction on Pattern Analysis and
Machine Intelligence 26 (1) (2004) 45–62.
[8] D Farin, W Effelsberg, P.H.N de With, Floor-plan reconstruction from panoramic images, in: ACM Multimedia, 2007, pp 823–826.
[9] S Gibson, R.J Hubbold, J Cook, T.L.J Howard, Interactive reconstruction of virtual environments from video sequences, Computers & Graphics 27 (2) (2003) 293–301.
[10] M Pollefeys, L Van Gool, M Vergauwen, F Verbiest, K Cornelis, J Tops, R Koch, Visual modeling with a hand-held camera, International Journal of Computer Vision 59 (2004) 207–232.
[11] M Chandraker, S Agarwal, F Kahl, D Nister, D Kriegman, Autocalibration via rank-constrained estimation of the absolute quadric, in: IEEE Computer Vision and Pattern Recognition, 2007, pp 1–8.
[12] S.N Sinha, D Steedly, R Szeliski, M Agrawala, M Pollefeys, Interactive 3D architectural modeling from unordered photo collections, ACM Transactions
on Graphics 27 (5) (2008) 159.
[13] A van den Hengel, A Dick, T Thormählen, B Ward, P.H.S Torr, VideoTrace: rapid interactive scene modelling from video, ACM Transactions on Graphics
26 (3) (2007) 86.
[14] A Fitzgibbon, A Zisserman, Automatic 3D model acquisition and generation of new images from video sequences, in: European Signal Processing Conference,
1998, pp 1261–1269.
[15] M Pollefeys, R Koch, L Van Gool, Selfcalibration and metric reconstruction in spite of varying and unknown intrinsic camera parameters, in: IEEE International Conference on Computer Vision, 1998, pp 90–95.
[16] M Pollefeys, F Verbiest, L Van Gool, Surviving dominant planes in uncalibrated structure and motion recovery, in: European Conference on Computer Vision, 2002, pp 837–851.
[17] J Repko, M Pollefeys, 3D model from extended uncalibrated video sequences: Addressing key-frame selection and projective drift, in: International Conference on 3-D Digital Imaging and Modeling, 2005, pp 150–157.
[18] R.I Hartley, P Sturm, Triangulation, Computer Vision and Image Understanding 68 (1998) 146–157.
[19] M Pollefeys, R Koch, L Van Gool, Selfcalibration and metric reconstruction in spite of varying and unknown intrinsic camera parameters, International Journal of Computer Vision 32 (1999) 7–25.
[20] P.E Debevec, C.J Taylor, J Malik, Modeling and rendering architecture from photographs: a hybrid geometry- and image-based approach, in: SIGGRAPH Annual Conference on Computer Graphics and Interactive Techniques, 1996,
pp 11–20.
[21] S El-Hakim, E Whiting, L Gonzo, 3D modeling with reusable and integrated building blocks, in: The 7th Conference on Optical 3-D Measurement Techniques, 2005, pp 3–5.
[22] R Haeusler, R Klette, F Huang, Monocular 3D reconstruction of objects based
on cylindrical panoramas, in: 3rd Pacific Rim Symposium on Advances in Image and Video Technology, 2008, pp 60–70.
[23] R Szeliski, Image alignment and stitching: a tutorial, Foundations and Trends
in Computer Graphics and Vision 2 (1) (2006) 1.
[24] Z Zhu, A.R Hanson, LAMP: 3D layered, adaptive-resolution, and multi-perspective panorama – a new scene representation, Computer Vision Image Understanding 96 (3) (2004) 294–326.
[25] W Wei, G Hui, Z Maojun, X ZhiHui, Multi-perspective panorama based on the improved pushbroom model, in: Workshop on Digital Media and its Application in Museum & Heritage, 2007, pp 85–90.
[26] R Cipolla, D Robertson, 3D models of architectural scenes from uncalibrated images and vanishing points, in: International Conference on Image Analysis and Processing, 1999, pp 824–829.
[27] M Wilczkowiak, P Sturm, E Boyer, Using geometric constraints through parallelepipeds for calibration and 3D modeling, Pattern Analysis and Machine Intelligence 27 (2) (2005) 194–207.
[28] M.A Fischler, R.C Bolles, Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Communication of the ACM 24 (1981) 381–395.
[29] B Korte, J Vygen, Combinatorial Optimization: Theory and Algorithms, third ed., Algorithms and Combinatorics, Springer, 2005.
[30] T Thormählen, H.-P Seidel, 3D-modeling by ortho-image generation from image sequences, in: ACM SIGGRAPH, 2008, pp 1–5.
Fig. 11. Models of objects picked from the models in Figs. 9 and 10. It takes less than a minute to model an object. Objects composed of planar surfaces (the stove and the table) are well reconstructed using our method, while complex objects like a fake body are hard to approximate using perspective extrusions alone.