DSpace at VNU: A semi-interactive panorama based 3D reconstruction framework for indoor scenes


A semi-interactive panorama based 3D reconstruction framework for indoor scenes

Trung Kien Dang a,⇑, Marcel Worring a, The Duy Bui b

a Intelligent Systems Lab Amsterdam, Informatics Institute, University of Amsterdam, The Netherlands

b Human Machine Interaction Laboratory, University of Engineering and Technology, Vietnam National University, Hanoi, Viet Nam

Article info

Article history:

Received 20 May 2010

Accepted 13 July 2011

Available online 27 July 2011

Keywords:

3D reconstruction

Panorama

Interactive

Abstract

We present a semi-interactive method for 3D reconstruction specialized for indoor scenes, which combines computer vision techniques with efficient interaction. As the starting point we use panoramas, which are popular for visualizing indoor scenes because of their wide field of view but cannot by themselves show depth. Exploiting user-defined knowledge, in terms of a rough sketch of orthogonality and parallelism in the scene, we design smart interaction techniques to semi-automatically reconstruct a scene from coarse to fine level. The framework is flexible and efficient. Users can build a coarse walls-and-floor textured model in five mouse clicks, or a detailed model showing all furniture in a couple of minutes of interaction. We show reconstruction results on four different scenes. The accuracy of the reconstructed models is quite high, around 1% error at full room scale. Thus, our framework is a good choice both for applications requiring accuracy and for applications requiring a 3D impression of the scene.

1 Introduction

A realistic 3D model of a scene and the objects it contains is ideal for applications such as giving an impression of a room in a house for sale, reconstructing bullet trajectories in crime scene investigation, or building realistic settings for virtual training [1]. It gives good spatial perception and enables functionalities such as measurement, manipulation, and annotation. One broad categorization of scenes is outdoor versus indoor. Outdoor scenes have been popular in many modeling applications [2,3], especially creating models of urban scenes [4,5]. Indoor scenes are prevalent in applications like real estate management, home decoration, or crime scene investigation (CSI), but research on them is limited, with some notable exceptions [6–8]. In this paper we consider the 3D reconstruction of indoor scenes.

While in applications like real estate management a coarse model of a room is sufficient, other applications need more complete models. For instance, in CSI the model should be complete and show all the details in the crime scene, as any object is potentially evidence. Each application also requires a different level of accuracy. Home decoration, for example, does not need extreme accuracy, for its purpose is merely to give an impression of the scene. For the CSI application, the model should be as accurate as possible in order to make measurements and hypothesis validation reliable. Here we seek a framework that can create complete and accurate models for highly demanding applications such as CSI, as well as coarse models for less demanding applications.

3D models are often built manually from measurements and images using the background map technique. Modelers take images of the object from orthogonal views (top, side and front), and try to create a model matching those images. A measurement is required to scale the model of the object to the right size. Modeling from measurements and images is only suitable for simple scenes, as complex scenes with many objects require a lot of measurements, images, and interaction. Even with measurements, accurately modeling objects is difficult since the assumption that the line of view is orthogonal to the object is hard to meet in practice. Since manual reconstruction is cumbersome and time consuming [9], automatic or semi-interactive reconstruction is preferred.

Automatic methods do exist and have shown good results for isolated objects and outdoor scenes [10,3,11–13]. Those methods require a camera moving around and looking towards the scene to capture it from multiple viewpoints [14–17]. Such moves maintain a large difference between viewpoints, giving accurately estimated 3D coordinates [18]. Unfortunately, in practice people tend not to follow such moves, making these methods inaccurate and unreliable. Indeed, in the well-known PhotoSynth system it has been observed that quality suffers when users do not follow the appropriate moves [12]. In simple cases, when modeling single-object scenes, automatic methods give results of 2–5% relative error [19]. This accuracy is sufficient for visualization, but rather low for measurements such as in CSI applications. In indoor scenes where the space is limited, the situation is even worse, as it is difficult, if not impossible, to perform the capturing moves suitable for automatic reconstruction. So, automatic reconstruction methods in their current state are not sufficient for accurate indoor scene reconstruction.

⇑ Corresponding author. Address: 42A – 144/4, Quan Nhan, Thanh Xuan, Hanoi, Viet Nam. Fax: +84 437547460.

E-mail address: dtkien123@gmail.com (T.K. Dang).

Contents lists available at SciVerse ScienceDirect

Computer Vision and Image Understanding

journal homepage: www.elsevier.com/locate/cviu


Semi-interactive methods are potential solutions [20–22]. A small amount of interaction, helping computers to identify important features, makes reconstruction more reliable. A few mouse clicks are enough to build a coarse model [8]. Recent work, such as the VideoTrace system [13], shows that interaction can be made smart and efficient by exploiting automatically estimated geometric information.

While interaction helps to efficiently improve the reliability, there is still the problem of having limited space to move around in indoor scenes. Using panoramas is a potential solution. Panoramas give a broad field of view. So a few panoramas are enough to completely capture a scene, and moving around the scene is no longer a problem. Furthermore, building panoramas is reliable, thus using panoramas contributes to the reliability of the overall solution. The advantages of interaction on the one hand and panoramas on the other suggest that a combination of them would be a good solution for indoor scene reconstruction.

Following the above observations, we propose a multi-stage, semi-interactive, panorama based framework for indoor scenes. In the first stage, a coarse model is built. This stage extends the technique in [8]. We make the interaction more efficient by providing a smart interaction technique, and rectify panoramas to guarantee that the accuracy meets our aimed quality. Furthermore, we give a reconstructability analysis and, based on that, present a capture assistant to guide the placement of the camera. The results of the first stage, a coarse model and geometric constraints, facilitate efficient interaction to build a detailed model in the second stage. This framework overcomes the problems mentioned and makes it easier to create accurate and complete models.

In the next section we summarize related work. Section 3 gives an overview of our framework. Section 4 describes how to turn panoramas into a floor-plan and how to build a coarse 3D model. Section 5 describes the interaction to add details to the coarse model. Then we evaluate the accuracy and show how efficient the framework is. We close the paper with a discussion on how to further automate the framework.

2 Related work

2.1 Reconstruction from panoramas

A panorama is a wide-angle image, typically generated by stitching images from the same viewpoint [23]. Since panoramas cover a wide view, they must be mapped on a cylinder or sphere to view. Accordingly, they are called cylindric or spherical panoramas. Being wide-angle, panoramas give a good overview of a scene, especially in indoor scenes where the field of view is limited. On the other hand, they do not give a good spatial perception since the viewpoint is fixed at one point. There is work on creating panoramas using multiple viewpoints, called multi-perspective panoramas [7,24,25]. However, multi-perspective panoramas only yield a 3D impression from the original viewpoints. Other methods are needed to make real 3D models.

3D reconstruction from panoramas is found in [6,7,22]. In [6], a scene is modeled from geometric primitives, which are manually selected in panoramas of the scene. Reconstruction is done separately for each panorama, and then the results of different panoramas are merged together. In [7] a dense 3D point cloud is estimated from multi-perspective panoramas. It, however, requires a special rig for capturing the panoramas. In [22] a method to do reconstruction from a cylindric panorama is proposed. It assumes that the scene, e.g. a room, is composed of a set of connected rectangles. This method requires that all corners of the room are visible, which is not often the case in practice. In [8], a method to reconstruct an indoor scene from normal single-perspective panoramas is described. The result is a coarse 3D model including walls onto which panoramas are projected. Such a model is not sufficient for some applications such as CSI, but this simple and flexible method gives good intermediate results towards building a detailed model.

2.2 Interaction in reconstruction

There are many types of interaction in reconstruction. In the simplest case users define geometric primitives, such as points, lines, or pyramids, and match these to the image data [20]. In [21], quadric surfaces are used to support more complex objects. VideoTrace [13] lets users draw and correct vertices of a model in an image sequence.

The efficiency of interaction can be improved by exploiting what is already known about the scene. The guiding principle is to get as many geometric constraints as possible, and use them to assist interaction. These constraints can come from domain knowledge, from the user interacting with the model, or through automatic estimation by the system. We now briefly describe each of them.

Domain knowledge in the form of prior knowledge about the type of scenes to be reconstructed is helpful in designing efficient interaction. For example, when modeling man-made scenes we can assume that there are many parallel lines. Thus, vanishing points are helpful in constraining the interaction [6,26,27]. In urban scenes there are often repeated components such as windows. Hence, instead of modeling them separately, the user can copy them [21]. In a man-made scene, objects are stacked on each other, e.g. a table is on the floor and books are on the table. We can exploit these relations to reduce the interaction and improve accuracy [9].

Scene specific geometric constraints can be provided by users. In [9], users define how an object should be bound to another one, to reduce the degrees of freedom in the interaction to reconstruct that object. In [8], after roughly defining a room by a sketch, users can build a coarse model with a few mouse clicks.

Some geometric constraints can be reliably estimated by computers. In some cases, coarse 3D structure and camera motion information can be estimated. State-of-the-art interactive reconstruction systems including [13,12] take advantage of such information sources to create intuitive and efficient interaction. For example, in the VideoTrace [13] system, vertices drawn in one frame by the user are tracked and rendered in other frames by the system. Users browse forward or backward in the video sequence to correct those vertices until satisfied. For the user it is like refining a model rather than creating it from scratch.

In practice, those three sources of constraints are often mixed in the modeling flow, which is also what we do in this paper.

3 Framework overview

Our framework is an A-to-Z solution, from capturing an indoor scene to modeling it, which is summarized in Fig. 2.

The framework takes as input a sketch of the floor-plan, a top-down design drawing of a room (e.g. Fig. 1a) that describes its walls and their relative positions, drawn by the user. The capture planning module analyzes the sketch to tell the user how many panoramas are needed to completely capture the scene, and suggests camera placement, i.e. the appropriate viewpoints. Either calibrated or uncalibrated cameras can be used, but to guarantee good accuracy, we advise pre-calibrating the camera and correcting the lens distortion of the images before stitching them into panoramas. Users can use a software package of their own choice to estimate the camera motion and stitch the corrected images together into panoramas, for example Hugin (Fig. 1b).

To build a coarse model of a room, the user picks the corners, i.e. the intersections of walls, in the panoramas. The framework provides a smart corner picking method to make the interaction comfortable. The locations of the corners in the panoramas and the sketch are enough to estimate the correct floor-plan and build a coarse model of the scene [8] (Fig. 1c). More expressively, we call this coarse model, which includes textured walls and floor, a walls-and-floor model. A typical rectangular room needs only one panorama to build such a model, whereas irregular rooms may need more than one panorama, depending on the shape of the room and the viewpoints of the panoramas. This stage is discussed in detail in Section 4.

In order to add more detail efficiently, we exploit the geometric constraint resulting from the observation that indoor scenes contain many flat objects aligned to walls. We iteratively use known surfaces to guide an interaction type that we call perspective extrusion to add objects. This technique helps to quickly build a detailed model (Fig. 1d). Details of this stage are given in Section 5.

4 Building a walls-and-ﬂoor model

In this section we discuss methods for building a walls-and-floor model. For easier comprehension, we present the floor-plan estimation and other elements prior to the capture planning. For the moment, we assume that the set of panoramas given is sufficient for floor-plan estimation.

We let the user draw a sketch of the floor-plan indicating orthogonality and parallelism of walls, and use a method built upon the method in [8] to estimate an accurate floor-plan. This method is based on the observation that the horizontal dimension of the panoramic image is proportional to the horizontal view angle of the panorama. Thus a set of corners divides the panorama into horizontal view angles of known ratio. If we ensure that every panorama looks all around a room, the total horizontal view angle is 360 degrees without any measurement. Hence we know each horizontal view angle. This observation is valid only when the corners are perfectly aligned to the vertical dimension. Thus, to make a more accurate floor-plan estimation than in [8], we first rectify the panoramas to meet that condition.
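The proportionality between image columns and view angles can be illustrated with a minimal sketch. It assumes a rectified, full 360-degree cylindrical panorama; the function name and its inputs (`corner_columns`, `panorama_width`) are hypothetical and not part of the framework described here.

```python
import math


def horizontal_view_angles(corner_columns, panorama_width):
    """Angles (radians) between consecutive corners of a 360-degree panorama.

    corner_columns: pixel x-coordinates of the picked corners (hypothetical input).
    panorama_width: width in pixels of the full 360-degree panorama.
    """
    if len(corner_columns) < 2:
        raise ValueError("need at least two corners")
    cols = sorted(c % panorama_width for c in corner_columns)
    angles = []
    for k, col in enumerate(cols):
        nxt = cols[(k + 1) % len(cols)]
        # Pixel gap to the next corner, wrapping around the panorama seam.
        gap = (nxt - col) % panorama_width
        # The full width corresponds to 2*pi, so the gap scales linearly.
        angles.append(2.0 * math.pi * gap / panorama_width)
    return angles
```

For four corners picked at evenly spaced columns, each returned angle is a quarter turn, and the angles always sum to a full turn regardless of where the corners lie.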

Building 360-degree panoramas is well studied [23], thus we do not discuss it here. For the next step, indicating corners in panoramas, we provide smart corner picking. Rectifying panoramas and estimating the floor-plan are subsequently discussed below. Then we present the reconstructability analysis and the capture assistant.

4.1 Smart corner picking

In order to estimate the floor-plan, the coordinates of the top-down projections of corners are needed. As panoramas may not be well aligned, getting one point on a corner is not enough. Instead we need to identify a corner by a line segment. One way to do that is to ask a user to manually draw a line onto a panorama. To make it even simpler, we provide a utility that lets users just casually pick a point in a panorama, after which the system automatically identifies the corner line.

Since the straightness of lines is not preserved in the coordinate system of a panorama, here a cylindric one, we must project a user-picked point into one of the images from which the panorama is created, to work in the image coordinate system. We assume that the best image is the one whose image plane is most orthogonal to the projection ray of the picked point. In other words, the angle between the ray from the viewpoint to the image center and the projection ray $r_c$ of the picked point is smallest:

$$i_f = \arg\min_i \angle\left(r^{(i)}, r_c\right) \qquad (1)$$

where $r^{(i)}$ is the principal ray of image $i$.
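As an illustration of this selection rule, the following sketch picks the image whose principal ray makes the smallest angle with the projection ray of the picked point. The representation of rays as 3D direction vectors and the function names are assumptions made for the example.

```python
import math


def angle_between(u, v):
    """Angle in radians between two 3D direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


def best_image_index(principal_rays, picked_ray):
    """Index of the image whose principal ray is closest to the picked ray."""
    return min(range(len(principal_rays)),
               key=lambda i: angle_between(principal_rays[i], picked_ray))
```

With four source images looking along the positive x, positive y, negative x and negative y directions, a picked ray pointing mostly along positive y selects the second image.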

Since panoramas are usually approximately aligned, we limit the detection to a vertical image band around the picked point. We detect vertical edges around that point, and fit a line through the picked point and edge points using RANSAC [28]. The picked point is used here as an anchor to prevent the auto-detected line from moving to a wrong location. Since the picked point is not exactly at the right position, we afterwards relax the condition, optimizing the line without constraining it to go through the picked point to yield the final line. The process is summarized in Table 1 and two examples are given in Fig. 3.
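The anchored fit followed by a relaxed refinement can be sketched as follows. This is a simplified illustration, not the implementation used in the framework: edge points are assumed to be given already (the edge detector is omitted), the refinement is an ordinary least-squares fit of x = m*y + c that presumes a near-vertical line, and the threshold, iteration count and function name are hypothetical.

```python
import math
import random


def fit_corner_line(picked, edge_points, threshold=2.0, iterations=200, seed=0):
    """Anchored RANSAC through `picked`, then an unanchored least-squares refit.

    Returns (m, c, inliers) for the refined line x = m * y + c.
    """
    rng = random.Random(seed)
    px, py = picked
    best_inliers = []
    for _ in range(iterations):
        # Hypothesis: the line through the picked point and one random edge point.
        qx, qy = rng.choice(edge_points)
        dx, dy = qx - px, qy - py
        norm = math.hypot(dx, dy)
        if norm < 1e-9:
            continue
        # Perpendicular distance of every edge point to the hypothesized line.
        inliers = [(x, y) for x, y in edge_points
                   if abs((x - px) * dy - (y - py) * dx) / norm <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise ValueError("not enough inliers to fit a line")
    # Relaxation: refit on the inliers without forcing the picked point.
    n = len(best_inliers)
    mean_x = sum(x for x, _ in best_inliers) / n
    mean_y = sum(y for _, y in best_inliers) / n
    syy = sum((y - mean_y) ** 2 for _, y in best_inliers)
    if syy < 1e-12:
        raise ValueError("inliers do not span the vertical direction")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in best_inliers)
    m = sxy / syy
    c = mean_x - m * mean_y
    return m, c, best_inliers
```

The anchor keeps the search near the user's click, while the final refit lets the line settle on the edge evidence even when the click is slightly off the corner.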

Fig. 1. Illustration of input and (intermediate) results of the reconstruction process. A simple rectangular room is used as example. (a) A rectangular room. (b) (Unwrapped) panorama of the room. (c) The walls-and-floor model. (d) Adding more detail to the model.


4.2 Rectifying panoramas

To accurately estimate the floor-plan, we first rectify the panoramas so that corners are aligned to the vertical dimension of a cylindrical panorama.

Each corner together with the viewpoint defines a plane. These planes remain unchanged no matter how we move the coordinate system, since they are defined by the scene and the viewpoint. To align the panorama cylinder we need to find the rotation $R$ that makes those planes parallel to the vertical direction. In other words, after transforming by $R$, the normals of the planes are orthogonal to $w = (0, 0, 1)^T$, i.e.

$$u_i^T R^{-1} w = 0 \qquad (2)$$

where $u_i$ are the planes' normals.

Using this constraint, given at least three corners, we can compute the last column of $R^{-1}$, or equivalently the last row of $R$, by finding the least-squares solution. If the last row of $R$ is $r_3 = (a, b, c)$, then from the constraint that $R$ is orthogonal, we choose its other rows as:

$$r_1 \cong (-b,\ a,\ 0), \qquad r_2 \cong (-ac,\ -bc,\ a^2 + b^2) \qquad (3)$$

where $\cong$ means equal up to a scale, and $|r_1| = |r_2| = |r_3| = 1$. Once $R$ has been computed, we resample the panoramic image to finish the rectification.
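A minimal sketch of this computation is given below. It makes two assumptions that are not stated in the text: the panorama is already roughly upright, so the last row can be written as (a, b, 1) up to scale and solved with 2x2 normal equations, and the signs of the first two rows are chosen so that R is a proper rotation (determinant +1). The helper names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)


def rectifying_rotation(normals):
    """Rows (r1, r2, r3) of R such that r3 is orthogonal to all plane normals."""
    # Least squares for r3 = (a, b, 1): minimize sum (a*ux + b*uy + uz)^2.
    sxx = sum(u[0] * u[0] for u in normals)
    sxy = sum(u[0] * u[1] for u in normals)
    syy = sum(u[1] * u[1] for u in normals)
    sxz = sum(u[0] * u[2] for u in normals)
    syz = sum(u[1] * u[2] for u in normals)
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("normals do not constrain the vertical direction")
    a = (-sxz * syy + sxy * syz) / det
    b = (-sxx * syz + sxy * sxz) / det
    a, b, c = normalize((a, b, 1.0))
    if a * a + b * b < 1e-24:
        # Already aligned: the identity rotation is sufficient.
        return ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    r1 = normalize((-b, a, 0.0))
    r2 = normalize((-a * c, -b * c, a * a + b * b))
    return (r1, r2, (a, b, c))
```

Each returned row has unit length, the rows are mutually orthogonal, and the third component of every rotated normal vanishes when the input normals are noise free.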

4.3 Estimating the ﬂoor-plan

The locations of corners in panoramas, identified in the previous step, give sets of horizontal angles between the corners when viewed from the panorama viewpoint. If we have a way to represent those angles in terms of the coordinates of the projections of corners and viewpoints in the floor-plan, we have a set of constraints to estimate the floor-plan and the viewpoints. Here we briefly review such a method presented in [8], discuss its applicability, and show how we extend it for our work.

A sketch is a model of the floor-plan. We force users to draw rectilinear lines parallel to the axes by providing them with a drawing grid. Of course, this alignment can be done automatically, but drawing in such a way helps users to correctly define parallelism and orthogonality. Note that, as only parallelism and orthogonality are important in the parameterization, a sketch of a rectangular room can be any arbitrary rectangle.

Assuming that the room has $n$ corners, we need at most $2n$ parameters to represent it. A viewpoint, whose coordinates both have to be estimated, is represented by a pair of separate parameters. Suppose that we have $v$ panoramas, then the total number of parameters is $2n + 2v$. For each wall drawn in the sketch that is parallel to an axis, since the two corners of the wall share a horizontal or vertical coordinate, the number of parameters is reduced by one (Fig. 4a). Hence the number of parameters is reduced by the number of those walls, $m$. To further reduce the number of parameters, the origin of the coordinate system is set at one corner, and the length of one wall is set to one, as the reconstruction is up to a scale anyway. These settings reduce the number of parameters by 3. In summary, the number of parameters to be estimated is:

$$2n + 2v - m - 3 \qquad (4)$$

From the model of the floor-plan that contains the coordinates of corners and viewpoints, we can estimate the angle between two corners as seen from a viewpoint (Fig. 4b). These angles are equal to the set of angles defined by user-picked corners in the panoramas. This set of constraints can be used to estimate the parameters of the floor-plan model and the viewpoints.
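The angle constraints can be written down as residuals between the angles predicted by a candidate floor-plan and the angles measured in a panorama. The sketch below only evaluates these residuals; the optimizer that would minimize them is left out, and the function names and the assumption that corners are listed in counter-clockwise order are hypothetical.

```python
import math


def angle_between_corners(viewpoint, corner_a, corner_b):
    """Counter-clockwise angle (radians, in [0, 2*pi)) from corner_a to corner_b."""
    ax, ay = corner_a[0] - viewpoint[0], corner_a[1] - viewpoint[1]
    bx, by = corner_b[0] - viewpoint[0], corner_b[1] - viewpoint[1]
    return (math.atan2(by, bx) - math.atan2(ay, ax)) % (2.0 * math.pi)


def angle_residuals(viewpoint, corners, observed_angles):
    """Predicted minus observed angle for each pair of consecutive corners."""
    residuals = []
    for k, observed in enumerate(observed_angles):
        nxt = corners[(k + 1) % len(corners)]
        predicted = angle_between_corners(viewpoint, corners[k], nxt)
        # Wrap the difference so that it lies in [-pi, pi).
        diff = (predicted - observed + math.pi) % (2.0 * math.pi) - math.pi
        residuals.append(diff)
    return residuals
```

For a unit square viewed from its center, every predicted angle is a quarter turn, so the residuals against four measured quarter turns are zero; moving the viewpoint away from the center makes them non-zero, which is what an estimator would drive back towards zero.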

Fig 2 Overview of the proposed framework.

Table 1. Smart corner picking process.

1. Let the user pick a point in/near a corner from the panorama.

2. Find the best image, according to Eq. (1).

3. Perform Canny edge detection in a vertical band, one tenth of the image width wide, around the picked point.

4. Fit a line through the picked point and the edges using RANSAC, where the line must go through the picked point.

5. Optimize the line without constraining it to the picked point.

Fig. 3. Two examples of smart corner picking. (a) The user picks a point. (b) Edges are detected in a vertical image band; a line is fitted through the picked point and edges. Note that there is another (even longer) vertical line but the algorithm

At this point, the coordinates of the top-down projections of viewpoints are estimated, but the viewpoints' heights are missing. Complete viewpoint coordinates are required to add more details to the model in the later stage. Since we already know the floor and the projection of the viewpoint on the floor, we only need one point to compute the relative distance from the viewpoint to the floor. To get that point, we ask the user to pick any floor point in each panorama to compute its viewpoint height.

4.4 Reconstructability analysis

We now give an analysis of the floor-plan estimation method. To estimate the floor-plan and the viewpoint coordinates, the number of constraints must be greater than or equal to the number of unknowns given in Eq. (4) of the previous sub-section.

Suppose that viewpoint $i$ sees $c_i$ corners. Since the sum of the angles is 360 degrees, we have $c_i - 1$ independent constraints. Since the viewpoints are different, constraints of one viewpoint are independent of constraints of other viewpoints. The problem is solvable when the number of constraints is greater than or equal to the number of parameters:

$$\sum_{i=1}^{v} c_i \geq 2n + 3v - m - 3 \qquad (5)$$

Common rooms have all walls parallel to an axis, i.e. the floor-plan is a rectilinear polygon, thus $m$ is equal to $n$. Eq. (5) then simplifies to:

$$\sum_{i=1}^{v} c_i \geq n + 3v - 3 \qquad (6)$$

Suppose that we can find a point from which all corners are visible, i.e. $c_i = n$. Eq. (6) is then further simplified to $v \geq 1$. So indeed, given a rectilinear floor-plan, one panorama that sees all corners might be enough to estimate it. A special, yet the most common, case is a rectangular room. Since we see all four corners from any viewpoint, one panorama might be enough to reconstruct the walls-and-floor model.

We need more panoramas when the floor-plan is not a rectilinear polygon, and when from the chosen viewpoint we cannot see all corners. Fig. 5 shows examples.
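The counting argument of this section is easy to check numerically. The sketch below compares the number of independent angle constraints with the number of parameters (2n + 2v, reduced by m axis-parallel walls and by 3 for the fixed origin and scale); the function names are hypothetical.

```python
def num_parameters(n_corners, n_views, n_axis_walls):
    """Unknowns: 2 per corner and per viewpoint, minus m shared coordinates, minus 3."""
    return 2 * n_corners + 2 * n_views - n_axis_walls - 3


def num_constraints(corners_seen):
    """Each viewpoint seeing c corners yields c - 1 independent angles."""
    return sum(c - 1 for c in corners_seen)


def is_reconstructable(n_corners, n_axis_walls, corners_seen):
    """True when the constraints are at least as many as the unknowns."""
    n_views = len(corners_seen)
    return num_constraints(corners_seen) >= num_parameters(
        n_corners, n_views, n_axis_walls)
```

A rectangular room (n = m = 4) seen from one viewpoint gives 3 constraints for 3 unknowns, so the condition holds. An L-shaped room (n = m = 6) with one viewpoint that sees only five corners gives 4 constraints for 5 unknowns and fails, whereas two such viewpoints give 8 constraints for 7 unknowns.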

4.5 The capture assistant

The capture assistant helps users in planning viewpoints in the room so that the reconstruction is possible and the model covers all of the room. To that end, it must know the number of unknowns given a sketch, the number of constraints produced by the viewpoints, and the area they cover. Furthermore, it is preferred that the number of viewpoints is minimal.

The number of unknowns is computed easily using Eqs. (5) and (6). In a convex polygon, a line segment from any point within it to any of its vertices does not leave the polygon. Hence if the floor-plan is convex, counting the constraints is trivial since from any viewpoint we see all the corners. When the floor-plan is concave, the problem is nontrivial. Since we keep the sketching simple, only asking users to align rectilinear lines of the sketch parallel to the axes, the sketch may be stretched unevenly along the axes relative to the real floor-plan. Our solution is to decompose the sketch into tiles and compute the minimal number of observable corners from each tile, invariant to how the sketch is stretched along the axes. The algorithm is described in Algorithm 1.

Algorithm 1. Decomposing a sketch into invariant observable areas.

Step 1: Cut the sketch into tiles using all distinct x and y coordinates. The sketch is turned into a set of rectangles and triangles (Fig. 6a), each of which is called a tile (Fig. 6b).

Step 2: For each tile, find its invariant observable area (IOA) by the following steps:

– Initialize the area so that it contains only the tile itself.

– Iteratively add a tile if it, together with some tiles already added, forms a convex polygon containing the initial tile.

Lemma 4.1. If the sketch differs from the real floor-plan by an uneven scaling along the axes, the IOAs are invariant to that scaling.

Proof. Since the sketch differs from the real floor-plan by an uneven scaling, the coordinates of corners are transformed by a monotonic function, thus the order between any pair of x or y coordinates is preserved. That means if $x_a > x_b$ in the floor-plan, or in one sketch, this still holds in another sketch. Consequently, the order of tiles, as decomposed in the algorithm above, is horizontally and vertically unchanged in any sketch. Consequently the IOAs, which are sets of tiles built following Step 2 of Algorithm 1, are unchanged. □

Lemma 4.2. Any point in an IOA is observable from any point in the initial tile.

Proof. Any point is observable from another point within a convex polygon. Since the extending scheme only adds a new tile if it is part of a convex polygon with the initial tile, all points in the IOA are observable from any point in the initial tile. □

Fig. 4. Parameterization of the floor-plan model given a sketch, simplified from Fig. 2 in [8]. (a) To reduce the number of parameters, corners are represented by shared parameters. (b) Each viewpoint is parameterized separately. Locations of corners in a panorama at the viewpoint give a set of angles between corners as viewed from the viewpoint.

Fig. 5. When the floor-plan is not rectilinear (a), or if from the viewpoint we cannot see all corners (b), we may need more than one panorama to estimate it. Labels in the figure: unseen corner, viewpoint.

Having the IOAs, we check whether the planned viewpoints cover the whole room and provide enough constraints to estimate the real floor-plan. The IOA of a viewpoint is the IOA of the tile containing it. By checking whether the union of the planned viewpoints' IOAs contains every tile, we can make sure that the set of viewpoints covers the whole scene. Checking whether the floor-plan is solvable is done by summing the number of corners observed by each IOA, and then comparing it to the condition in (5).

Given the IOAs of a sketch, finding an optimal set of viewpoints, i.e. the smallest number of viewpoints that covers the scene completely and satisfies the reconstructability condition (5), is a hard problem. Let us construct a graph representing the problem. Each tile is a node in the graph. For each tile, we have edges connecting it to all tiles in its IOA. Since if a tile is observable from another one, then from it we can also observe the other tile, the edges are undirected. Putting aside the reconstructability condition, our problem is finding the minimal set of nodes from which edges connect to the rest of the nodes. This is the minimal dominating set problem, one of the known NP-complete problems [29]. With an additional condition, our problem is arguably of the same complexity. To suggest a solution to users in interactive time, we propose the following greedy Algorithm 2.

Algorithm 2. Suggesting viewpoints, the greedy algorithm.

Step 1: Find a dominating set. Initialize an empty dominating set of tiles. While the scene is not covered by the union of the IOAs of tiles in the set, add a tile whose IOA contains the most uncovered tiles.

Step 2: Satisfy the reconstructability condition. While the condition of (5) is not satisfied, add a tile whose IOA contains the most corners, i.e. provides the largest number of constraints.
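The two greedy steps can be sketched as below. The data layout is an assumption made for the example: `ioa` maps each tile to the set of tiles in its IOA, `corners_seen` maps each tile to the number of corners inside its IOA, and `blocked` lists tiles where no camera can be placed. The sketch also assumes at most one viewpoint per tile, which the text does not specify.

```python
def suggest_viewpoints(ioa, corners_seen, n_corners, n_axis_walls,
                       blocked=frozenset()):
    """Greedy viewpoint suggestion: cover all tiles, then satisfy condition (5)."""
    all_tiles = set(ioa)
    candidates = [t for t in sorted(ioa) if t not in blocked]
    chosen = []
    covered = set()

    # Step 1: greedy dominating set over the tile graph.
    while all_tiles - covered:
        best = max((t for t in candidates if t not in chosen),
                   key=lambda t: len(ioa[t] - covered), default=None)
        if best is None or not (ioa[best] - covered):
            raise ValueError("the scene cannot be covered from the allowed tiles")
        chosen.append(best)
        covered |= ioa[best]

    # Step 2: add viewpoints until sum(c_i) >= 2n + 3v - m - 3.
    def satisfied():
        v = len(chosen)
        total = sum(corners_seen[t] for t in chosen)
        return total >= 2 * n_corners + 3 * v - n_axis_walls - 3

    while not satisfied():
        remaining = [t for t in candidates if t not in chosen]
        if not remaining:
            raise ValueError("condition (5) cannot be met from the allowed tiles")
        chosen.append(max(remaining, key=lambda t: corners_seen[t]))
    return chosen
```

For an L-shaped room cut into three tiles, where the elbow tile sees both arms, the sketch returns the elbow tile alone. Marking that tile as blocked leaves two arm tiles that cover the room but see too few corners, so the sketch reports that the condition cannot be met.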

In practice, since there are objects in the room, we might not be able to put the camera at the suggested positions, or see all the corners we should see according to the analysis. Should an object, e.g. a tall wardrobe, completely block one or more corners, it must be considered as part of the walls. The procedure to suggest viewpoints is the same. If a suggested tile is inappropriate for placing the camera, users can mark it so that Algorithm 2 ignores that tile when recomputing the suggested viewpoints. This procedure has proven to give good results in practical cases.

Viewpoints also affect the accuracy of the floor-plan and the texture quality. In practice, since the panorama is built from high resolution images, the texture quality should not be a problem. To estimate the floor-plan accurately, intuitively one should place the camera in the center of the room to balance the constraints.

After this stage, we have a textured walls-and-floor model. In this model, objects are projected on the walls and on the floor. It gives a good overview of the scene. As indicated, for applications such as real estate management this should be satisfactory. However, for an application such as CSI, the object localization is not detailed enough. Thus, we need the second stage to add more detail.

5 Adding details using perspective extrusion

The model now contains the planes of the walls, the floor, and the viewpoint locations. We design interactive methods to add detail to the model in the spirit of the whole framework: flexibly reconstructing objects from coarse to fine. For example, a table is reconstructed first and then the stack of books on it. Characteristics of indoor scenes are utilized in designing interaction methods that follow this idea.

In indoor scenes, many objects are composed of planes. Since objects are often aligned to walls, those planes are likely parallel to at least one wall or the floor. As indicated earlier, this gives a constraint to reconstruct objects. This action is similar to an extrusion, a popular standard technique in manual 3D modeling. In a normal extrusion, the orthogonal projection of the object's boundary on a reference plane is orthogonally popped up by a known distance, creating a new planar object surface. In our situation we do not see the object in orthogonal views, but from a panorama viewpoint. So, instead of moving the object's boundary on lines orthogonal to the reference plane, we move it on rays from the viewpoint to their original locations in the reference plane (Fig. 1d). Because of this constraint, we call it a perspective extrusion.

Our aim is to reconstruct an object surface S that is parallel to an already reconstructed plane (Fig. 7). S is reconstructed from a set of three parameters. The reference plane l is a reconstructed plane to which the plane of S is parallel. The distance from S to l is denoted by d, and b is a projection of the boundary of S in a panorama. The reconstruction procedure includes shifting the parallel plane l by distance d to get the object plane p, and cutting p by the pyramid formed by b and the viewpoint from which we see b. Once we have S, users can choose whether the object is a solid box or just a planar surface. The perspective extrusion process is summarized in Table 2.
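The geometric core of a perspective extrusion, shifting the reference plane and cutting it with the rays of the boundary, can be sketched as follows. The plane representation (unit normal and offset with normal . x = offset), the use of ray directions for the boundary b, and the function name are assumptions made for this illustration.

```python
def perspective_extrude(viewpoint, plane_normal, plane_offset, distance,
                        boundary_rays):
    """Intersect boundary rays from the viewpoint with the shifted plane.

    The reference plane is {x : plane_normal . x = plane_offset} with a unit
    normal; the object plane is that plane shifted by `distance` along the normal.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    shifted_offset = plane_offset + distance
    origin_term = dot(plane_normal, viewpoint)
    vertices = []
    for ray in boundary_rays:
        denom = dot(plane_normal, ray)
        if abs(denom) < 1e-12:
            raise ValueError("boundary ray is parallel to the object plane")
        t = (shifted_offset - origin_term) / denom
        if t <= 0:
            raise ValueError("object plane lies behind the viewpoint on this ray")
        # Point on the ray that lies in the shifted (object) plane.
        vertices.append(tuple(o + t * r for o, r in zip(viewpoint, ray)))
    return vertices
```

With the floor as reference plane (z = 0), a camera 1.5 units above it, and a table top 0.75 units above the floor, a ray that would hit the floor at (1, 0, 0) is cut halfway at (0.5, 0, 0.75).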

In related work such as [9], object parameters are defined indirectly in terms of geometric objects, e.g. a rectangular box. In pictures of indoor scenes, objects are frequently occluded, making the use of geometric objects difficult. To give more options in reconstructing an object, we choose to let users define those parameters directly and separately. For example, a box is defined by one of its faces and the distance to the plane the face is parallel to. The distance can be defined by an orthogonal line to any reconstructed plane.

Fig. 6. Illustration of the sketch decomposition algorithm. (a) The sketch is cut into rectangles and triangles using all distinguished x and y coordinates. (b) The tile graph indicates possibilities of traveling among tiles. (c) For each tile the initial observable area is itself (black); then tiles reached by traveling parallel to axes are iteratively added (gray); finally tiles reached from two ways are added (diagonal pattern). (d) The number of corners contained in the observable area is the minimal number of observable corners from the tile.

The parallel plane l is picked from the current model. We provide two ways to define d, namely using one or two viewpoints. To define d from a single viewpoint, the user draws a line from the object surface orthogonally to a reconstructed plane. To define d from two viewpoints, the user picks the projections of a point on the object surface in two panoramas. We then triangulate these two projections to estimate the 3D coordinates of that point, and its distance to l, which is already reconstructed, is the distance d. This strategy is useful when there is no physical clue for guiding the drawing of a line from the object’s surface orthogonally to a reconstructed plane. For example, for a chair with bent legs standing in the middle of the room, there would be no physical clue to draw d from a single viewpoint. The boundary b is a polygon drawn by users from the viewpoint. To assist the drawing of b, we assume by default that the boundary of S has right angles and is symmetric, as long as the drawing of b does not break this assumption. Using those assumptions, we predict the boundary and render it. This is helpful for accurately defining b, especially when a vertex is occluded.
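The two-viewpoint definition of d can be sketched in a few lines of Python. The text here does not state which triangulation method is used, so this example swaps in the simple midpoint method (the midpoint of the shortest segment between the two viewing rays) as an assumption. The plane convention {x : n · x = c} with unit normal n, and all names, are likewise hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def along(origin: Vec3, direction: Vec3, t: float) -> Vec3:
    return (origin[0] + t * direction[0],
            origin[1] + t * direction[1],
            origin[2] + t * direction[2])


def triangulate_midpoint(o1: Vec3, u1: Vec3, o2: Vec3, u2: Vec3) -> Vec3:
    """Midpoint of the closest points on the rays o1 + s*u1 and o2 + t*u2."""
    w = (o1[0] - o2[0], o1[1] - o2[1], o1[2] - o2[2])
    a, b, c = dot(u1, u1), dot(u1, u2), dot(u2, u2)
    d1, e = dot(u1, w), dot(u2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("viewing rays are parallel; cannot triangulate")
    s = (b * e - c * d1) / denom
    t = (a * e - b * d1) / denom
    p1, p2 = along(o1, u1, s), along(o2, u2, t)
    return ((p1[0] + p2[0]) / 2.0, (p1[1] + p2[1]) / 2.0, (p1[2] + p2[2]) / 2.0)


def distance_to_plane(x: Vec3, normal: Vec3, offset: float) -> float:
    """Unsigned distance from x to the plane normal . x = offset (unit normal)."""
    return abs(dot(normal, x) - offset)


# Hypothetical example: two panorama viewpoints 3 m apart, both 1.5 m above
# the floor z = 0, observing the same point on a surface.
o1, o2 = (0.0, 0.0, 1.5), (3.0, 0.0, 1.5)
u1, u2 = (1.0, 2.0, -0.75), (-2.0, 2.0, -0.75)
point = triangulate_midpoint(o1, u1, o2, u2)
d = distance_to_plane(point, (0.0, 0.0, 1.0), 0.0)
print(point, d)
```

With noisy clicks the two rays will not intersect exactly; the midpoint then serves as a compromise between them, which is one reason this method is a common first approximation.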

For flexibility and accuracy, we let users define any parameter (l, d, or b) from any available panorama viewpoint. A possible way to further increase flexibility and accuracy is to let users adjust the boundary b from different viewpoints, as in VideoTrace [13]. However, that is only effective if we have many viewpoints, i.e., many observations of the boundary. To keep the framework simple and the number of input panoramas small, we have decided not to use that technique.

To be reconstructible, objects must be seen and the parameters for perspective extrusion must be definable. The capture assistant described in Section 4.5 handles part of this by ensuring that all of the floor and walls will be seen. Of course objects can be occluded completely by other objects, but that is hardly the case for the main objects in the scene. For l and b, if objects are complex or curvy, we can only approximate them (Fig. 11c and d). For a “floating” object, like the chair in Fig. 10a, since there is no solid connection from its surface to another surface, one should use two viewpoints to define d. In general, if an object has sufficiently different appearance in two panoramas, then it is reconstructible.

6 Results

We now present results showing that the proposed framework overcomes difficulties in indoor scene reconstruction to efficiently produce complete and accurate models.

6.1 Datasets

Four scenes are used in our evaluation (Fig. 8). Three are rooms in a house, captured by ourselves. The last one is a fake crime scene captured by The Netherlands Forensic Institute. The ground truth is defined by measurements made on objects in the scenes. All scenes are typical indoor scenes: rather complex, with limited space. For every scene, the minimal number of panoramas required, as computed using our capture assistant, is one. Because of obstacles (furniture) there was no good position for capturing all corners, thus we had to use two panoramas for each of the three rooms. For the fake crime scene, we use one panorama.

Fig. 7. A perspective extrusion pops up an object from an already reconstructed plane.

Table 2. Perspective extrusion process.

1. The user picks the reference plane l.
2. The user defines the distance d from l to the object plane p, either from one or two viewpoints.
3. Compute the object plane p by shifting l by d.
4. The user defines the boundary through its projection b onto a panorama.
5. Compute the initial S by cutting the object plane p by the pyramid of b and the panorama viewpoint.
6. The user chooses the object type, either a solid box or a planar surface.

[Fig. 8 panels: (a) Bedroom, 2 panoramas; (b) Dining room, 2 panoramas; (c) Kitchen, 2 panoramas; (d) Fake crime scene, 1 panorama.]

Table 3. Floor-plan relative errors (in percent, mean ± standard deviation). To achieve the best accuracy, lens distortion correction should be applied before panorama stitching, and panorama rectification (Section 4.2) should be used. The floor-plan error of the fake crime scene is not available because ground truth is lacking.

              Without rectification   Uncalibrated images   Calibrated & rectification
Bedroom       0.48 ± 1.45             0.49 ± 0.16           0.38 ± 0.14
Dining room   7.50 ± 3.20             7.48 ± 3.17           1.18 ± 0.49
Kitchen       9.88 ± 3.24             0.48 ± 0.23           0.28 ± 0.05


6.2 Accuracy

Since the reconstructed model is determined only up to a scale and a rotation, we have to eliminate that ambiguity in order to evaluate the accuracy. To do so we estimate a transformation from the estimated floor-plan to the ground truth floor-plan. We apply this to the model, and then evaluate the model at two levels: at room scale (i.e., floor-plan error), and at object scale (i.e., object measurements).
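One way such an alignment could be computed is a least-squares 2D similarity fit between corresponding floor-plan corners. The sketch below is an assumption on our part, not the procedure stated in the text: it treats 2D points as complex numbers, fits w ≈ a·z + t (where the complex number a encodes scale and rotation), and also estimates a translation t, which the text does not mention. Corner correspondences are assumed to be known, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def fit_similarity(src: Sequence[Point], dst: Sequence[Point]) -> Tuple[complex, complex]:
    """Least-squares fit of dst ~ a * src + t, with 2D points as complex numbers."""
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two corresponding points")
    zs = [complex(x, y) for x, y in src]
    ws = [complex(x, y) for x, y in dst]
    z_mean = sum(zs) / len(zs)
    w_mean = sum(ws) / len(ws)
    num = sum((z - z_mean).conjugate() * (w - w_mean) for z, w in zip(zs, ws))
    den = sum(abs(z - z_mean) ** 2 for z in zs)
    if den == 0.0:
        raise ValueError("source points coincide")
    a = num / den            # abs(a) is the scale, the phase of a is the rotation
    t = w_mean - a * z_mean  # translation (an added assumption)
    return a, t


def apply_similarity(a: complex, t: complex, pts: Sequence[Point]) -> List[Point]:
    out = []
    for x, y in pts:
        w = a * complex(x, y) + t
        out.append((w.real, w.imag))
    return out


# Hypothetical example: the estimated floor-plan is the true 4 m x 3 m
# rectangle rotated by 90 degrees and scaled by 0.5.
truth = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
estimated = [(0.0, 0.0), (0.0, 2.0), (-1.5, 2.0), (-1.5, 0.0)]
a, t = fit_similarity(estimated, truth)
aligned = apply_similarity(a, t, estimated)
print(abs(a), aligned)
```

The complex-number form keeps the fit closed-form and excludes reflections, which matches an ambiguity limited to scale and rotation.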

Table 3 shows floor-plan errors with and without rectifying panoramas. In two out of three datasets the improvement is quite significant. In one dataset, the Bedroom, the error without rectification is almost the same as with rectification, since the angles of the original panoramas are almost perfect. Using uncalibrated images (calibration done during stitching) is possible, though the results are not as good as using pre-calibrated images. The errors, with pre-calibrated images and panorama rectification, are about a few centimeters in a room of about ten square meters. The relative errors, computed by dividing the absolute error by the length of the diagonal of the rectangular bounding box of the true floor-plan, are about 1%. The estimated floor-plan of the dining room is less accurate since it was hard to identify some of its corners in the panoramas. Our accuracy is higher than in [8], where the error is about 4%. Two differences are responsible for the improvement: the floor-plan estimation strategy we used, and our panorama rectification. In [8], a sketch of several rooms is used to parameterize and estimate the floor-plan of multiple rooms. It was noted that doing so, and thus ignoring the thickness of walls, might reduce the accuracy [8]. To achieve high accuracy, we have estimated the floor-plan of each room separately. More importantly, our rectification eliminates the inaccurate alignment in the input panoramas (see Table 4).
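The relative error defined above (absolute error divided by the diagonal of the bounding box of the true floor-plan) can be sketched as follows. The sketch assumes that the absolute error is measured per corner as the distance between an aligned estimated corner and its ground-truth counterpart, which is our reading and not spelled out in the text. The function names and the sample numbers are hypothetical, not measurements from the paper.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def bbox_diagonal(points: Sequence[Point]) -> float:
    """Diagonal length of the axis-aligned bounding box of the points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return math.hypot(max(xs) - min(xs), max(ys) - min(ys))


def relative_errors_percent(aligned: Sequence[Point],
                            truth: Sequence[Point]) -> List[float]:
    """Per-corner error as a percentage of the true floor-plan's bounding-box diagonal."""
    diag = bbox_diagonal(truth)
    return [100.0 * math.hypot(p[0] - q[0], p[1] - q[1]) / diag
            for p, q in zip(aligned, truth)]


# Hypothetical example: a 4 m x 3 m room (diagonal 5 m) whose aligned
# estimate is off by 5 cm at every corner.
truth = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
aligned = [(x + 0.05, y) for x, y in truth]
errors = relative_errors_percent(aligned, truth)
print(f"{mean(errors):.2f} ± {stdev(errors):.2f} %")
```

A 5 cm error against a 5 m diagonal gives a relative error of about 1%, the same order of magnitude as the figures reported in the text.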

For objects, since the angles between geometric primitives (lines and planes) are already enforced during the reconstruction, we only evaluate the length errors, both absolute and relative to the ground truth lengths.

The accuracy of our framework is quite high, e.g., compared to [8,19]. Object accuracy is slightly lower than scene accuracy in terms of relative error, but our examination shows that the absolute errors are about the same.

6.3 Efﬁciency and completeness

Our framework is efficient. A scene can be modeled in about a dozen minutes. Fig. 9 shows the model of a rather complex scene, namely the fake crime scene. The walls-and-floor model is built in seconds. All furniture is modeled in about 5 min. The time taken to build the final model, which includes small objects such as cups on tables, is 10 min. Furthermore, users do not need to measure objects for modeling at capture time.

Fig. 10 shows models of some scenes built using our framework. Close-ups of objects picked from reconstructed models are given in Fig. 11. Objects composed of planar surfaces are well reconstructed, while complex curvy objects can only be approximated using perspective extrusions.

7 Conclusion

We have proposed a panorama-based semi-interactive 3D reconstruction framework for indoor scenes. The framework overcomes the problems of limited field of view in indoor scenes and has the desired properties: robustness, efficiency, and accuracy. Those properties make it suitable for a broad range of applications, from a coarse model created in a few seconds for a presentation to a detailed model for measurement in crime scene investigation. Models inexpensively created using our framework are an intuitive medium to manage and retrieve digitized information of scenes and use it in interactive applications.

Table 4. Average object errors (mean ± standard deviation).

Average object error   Absolute (cm)   Relative (%)
Fake crime scene       6.2 ± 2.6       1.84 ± 0.89

Fig. 9. Resulting models as a function of the time and amount of interaction spent. The example is the fake crime scene. (a) Walls-and-floor model: 0 min, 6 mouse clicks. (b) All-furniture model: 5 min, 10 extrusions. (c) Final model and (d) final textured model: 10 min, 19 extrusions.

[Fig. 10 panels: (a) Bedroom, (b) Dining room, (c) Kitchen.]

A limitation of the framework is that it lacks the ability to model complex objects. This could be counteracted by other, more expensive techniques. For example, the VideoTrace technique [13] lets users model objects from video sequences. The ortho-image technique [30] creates background maps from image sequences to assist artists in modeling objects in 3D authoring software. As objects are complex, both techniques require images from many different angles and more interaction. Since our panoramic images are calibrated, we can integrate those techniques into our framework as plugins. Once the object is reconstructed using those techniques, we can automatically integrate it back into our model, by matching panoramic images to the image sequence used to model the object and then estimating the pose of the object. Thus the framework is a useful tool for both quickly building coarse models and efficiently building accurate models. In the accompanying video the system is demonstrated on a number of realistic scenes.

Acknowledgments

This work is supported by the BSIK project MultimediaN and the Research Grant from Vietnam National University, Hanoi, No. QG.10.23.

Appendix A Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.cviu.2011.07.001.

References

[1] T.L.J. Howard, A.D. Murta, S. Gibson, Virtual environments for scene of crime reconstruction and analysis, in: SPIE – Visual Data Exploration and Analysis VII, vol. 3960, 2000, pp. 1–8.
[2] M. Pollefeys, L.J.V. Gool, M. Vergauwen, K. Cornelis, F. Verbiest, J. Tops, Image-based 3D acquisition of archaeological heritage and applications, in: Virtual Reality, Archeology, and Cultural Heritage, 2001, pp. 255–262.
[3] N. Snavely, S.M. Seitz, R. Szeliski, Modeling the world from internet photo collections, International Journal of Computer Vision 80 (2) (2008) 189–210.
[4] M. Pollefeys, D. Nistér, J.-M. Frahm, A. Akbarzadeh, P. Mordohai, B. Clipp, C. Engels, D. Gallup, S.J. Kim, P. Merrell, C. Salmi, S.N. Sinha, B. Talton, L. Wang, Q. Yang, H. Stewénius, R. Yang, G. Welch, H. Towles, Detailed real-time urban 3D reconstruction from video, International Journal of Computer Vision 78 (2–3) (2008) 143–167.
[5] N. Cornelis, B. Leibe, K. Cornelis, L.V. Gool, 3D urban scene modeling integrating recognition and reconstruction, International Journal of Computer Vision 78 (2–3) (2008) 121–141.
[6] H.-Y. Shum, M. Han, R. Szeliski, Interactive construction of 3D models from panoramic mosaics, in: Computer Vision and Pattern Recognition, 1998, pp. 427–433.
[7] Y. Li, H.-Y. Shum, C.-K. Tang, R. Szeliski, Stereo reconstruction from multiperspective panoramas, IEEE Transaction on Pattern Analysis and Machine Intelligence 26 (1) (2004) 45–62.
[8] D. Farin, W. Effelsberg, P.H.N. de With, Floor-plan reconstruction from panoramic images, in: ACM Multimedia, 2007, pp. 823–826.
[9] S. Gibson, R.J. Hubbold, J. Cook, T.L.J. Howard, Interactive reconstruction of virtual environments from video sequences, Computers & Graphics 27 (2) (2003) 293–301.
[10] M. Pollefeys, L. Van Gool, M. Vergauwen, F. Verbiest, K. Cornelis, J. Tops, R. Koch, Visual modeling with a hand-held camera, International Journal of Computer Vision 59 (2004) 207–232.
[11] M. Chandraker, S. Agarwal, F. Kahl, D. Nister, D. Kriegman, Autocalibration via rank-constrained estimation of the absolute quadric, in: IEEE Computer Vision and Pattern Recognition, 2007, pp. 1–8.
[12] S.N. Sinha, D. Steedly, R. Szeliski, M. Agrawala, M. Pollefeys, Interactive 3D architectural modeling from unordered photo collections, ACM Transactions on Graphics 27 (5) (2008) 159.
[13] A. van den Hengel, A. Dick, T. Thormählen, B. Ward, P.H.S. Torr, VideoTrace: rapid interactive scene modelling from video, ACM Transactions on Graphics 26 (3) (2007) 86.
[14] A. Fitzgibbon, A. Zisserman, Automatic 3D model acquisition and generation of new images from video sequences, in: European Signal Processing Conference, 1998, pp. 1261–1269.
[15] M. Pollefeys, R. Koch, L. Van Gool, Selfcalibration and metric reconstruction in spite of varying and unknown intrinsic camera parameters, in: IEEE International Conference on Computer Vision, 1998, pp. 90–95.
[16] M. Pollefeys, F. Verbiest, L. Van Gool, Surviving dominant planes in uncalibrated structure and motion recovery, in: European Conference on Computer Vision, 2002, pp. 837–851.
[17] J. Repko, M. Pollefeys, 3D model from extended uncalibrated video sequences: Addressing key-frame selection and projective drift, in: International Conference on 3-D Digital Imaging and Modeling, 2005, pp. 150–157.
[18] R.I. Hartley, P. Sturm, Triangulation, Computer Vision and Image Understanding 68 (1998) 146–157.
[19] M. Pollefeys, R. Koch, L. Van Gool, Selfcalibration and metric reconstruction in spite of varying and unknown intrinsic camera parameters, International Journal of Computer Vision 32 (1999) 7–25.
[20] P.E. Debevec, C.J. Taylor, J. Malik, Modeling and rendering architecture from photographs: a hybrid geometry- and image-based approach, in: SIGGRAPH Annual Conference on Computer Graphics and Interactive Techniques, 1996, pp. 11–20.
[21] S. El-Hakim, E. Whiting, L. Gonzo, 3D modeling with reusable and integrated building blocks, in: The 7th Conference on Optical 3-D Measurement Techniques, 2005, pp. 3–5.
[22] R. Haeusler, R. Klette, F. Huang, Monocular 3D reconstruction of objects based on cylindrical panoramas, in: 3rd Pacific Rim Symposium on Advances in Image and Video Technology, 2008, pp. 60–70.
[23] R. Szeliski, Image alignment and stitching: a tutorial, Foundations and Trends in Computer Graphics and Vision 2 (1) (2006) 1.
[24] Z. Zhu, A.R. Hanson, LAMP: 3D layered, adaptive-resolution, and multi-perspective panorama – a new scene representation, Computer Vision Image Understanding 96 (3) (2004) 294–326.
[25] W. Wei, G. Hui, Z. Maojun, X. ZhiHui, Multi-perspective panorama based on the improved pushbroom model, in: Workshop on Digital Media and its Application in Museum & Heritage, 2007, pp. 85–90.
[26] R. Cipolla, D. Robertson, 3D models of architectural scenes from uncalibrated images and vanishing points, in: International Conference on Image Analysis and Processing, 1999, pp. 824–829.
[27] M. Wilczkowiak, P. Sturm, E. Boyer, Using geometric constraints through parallelepipeds for calibration and 3D modeling, Pattern Analysis and Machine Intelligence 27 (2) (2005) 194–207.
[28] M.A. Fischler, R.C. Bolles, Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Communication of the ACM 24 (1981) 381–395.
[29] B. Korte, J. Vygen, Combinatorial Optimization: Theory and Algorithms, third ed., Algorithms and Combinatorics, Springer, 2005.
[30] T. Thormählen, H.-P. Seidel, 3D-modeling by ortho-image generation from image sequences, in: ACM SIGGRAPH, 2008, pp. 1–5.

Fig. 11. Models of objects picked from the models in Figs. 9 and 10. It takes less than a minute to model an object. Objects composed of planar surfaces (the stove and the table) are well reconstructed using our method, while complex objects like a fake body are hard to approximate using perspective extrusions alone.
