Volume 2008, Article ID 803231, 15 pages
doi:10.1155/2008/803231
Research Article
Occlusion-Aware View Interpolation
Serdar Ince 1, 2 and Janusz Konrad (EURASIP Member) 1
1 Department of Electrical and Computer Engineering, Boston University, 8 Saint Mary’s Street, Boston, MA 02215, USA
2 IntelliVid Corporation, Cambridge, MA 02138, USA
Correspondence should be addressed to Janusz Konrad, jkonrad@bu.edu
Received 3 March 2008; Accepted 1 October 2008
Recommended by Peter Eisert
View interpolation is an essential step in content preparation for multiview 3D displays, free-viewpoint video, and multiview image/video compression. It is performed by establishing a correspondence among views, followed by interpolation using the corresponding intensities. However, occlusions pose a significant challenge, especially if few input images are available. In this paper, we identify challenges related to disparity estimation and view interpolation in the presence of occlusions. We then propose an occlusion-aware intermediate view interpolation algorithm that uses four input images to handle the disappearing areas. The algorithm consists of three steps. First, all pixels in the view to be computed are classified in terms of their visibility in the input images. Then, disparity for each pixel is estimated from different image pairs depending on the computed visibility map. Finally, the luminance/color of each pixel is adaptively interpolated from an image pair selected by its visibility label. Extensive experimental results show striking improvements in interpolated image quality over occlusion-unaware interpolation from two images and very significant gains over occlusion-aware spline-based reconstruction from four images, both on synthetic and real images. Although improvements are obvious only in the vicinity of object boundaries, this should be useful in high-quality 3D applications, such as digital 3D cinema and ultra-high resolution multiview autostereoscopic displays, where distortions at depth discontinuities are highly objectionable, especially if they vary with viewpoint change.
Copyright © 2008 S. Ince and J. Konrad. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1 INTRODUCTION
The generation of a novel (virtual) view of a scene captured by real cameras is often referred to as image-based rendering. The problem is illustrated for the case of two cameras in Figure 1: the goal is to compute the image J that would have been captured by camera C_J had it been used, based on images I_L and I_R captured by cameras C_L and C_R, respectively. Generation of such views is an essential step in content preparation for multiview 3D displays [1–3], free-viewpoint video [4,5], and multiview compression [6–8]. A very similar problem exists in frame-rate conversion of video, except that novel images are created from different-time snapshots rather than different views.
In order to render a novel view, a correspondence among the known views first needs to be established, followed by an estimation of the new intensity from the known intensities in correspondence. Depending on how the correspondence mapping is defined, two approaches are possible. One approach is based on backward projection of intensities (the term "backward projection" is borrowed from the field of video coding, where it refers to predicting luminance/color from previous (in time) frames), where the mapping is defined in the coordinate system of the unknown view (J in Figure 1); thus, this approach simplifies the final estimation to an interpolation problem. The other approach is based on forward projection of intensities, where the mapping is defined in the coordinate system of one of the known views (I_L or I_R in Figure 1), thus making the final estimation more difficult since projected intensities do not, in general, belong to the sampling grid of J. In fact, the problem cannot be solved in this case by interpolation, and the novel-view intensities must be approximated instead. Typically accomplished by means of additional constraints, this process is often referred to as view reconstruction. We will review these two approaches in more detail in Section 2.
One of the significant challenges in image-based rendering is dealing with occlusion areas (see Figures 1(b)–1(c)). By the term occlusion area, we mean an area in one input image disappearing from the other input image due to scene structure; for example, area A in I_L is occluded in I_R (see Figure 1). Note that a disappearing area becomes an appearing area (also known as an uncovered or newly-exposed area), and vice versa, if the order of views is reversed, that is, "right-to-left" instead of "left-to-right."
Many approaches to novel view generation have been proposed to date. Although some methods account for occlusions, few handle occlusions accurately. Consequently, occlusion areas are recovered inaccurately. Our motivation in this paper is to improve the novel image quality in occlusion areas. This is somewhat easier in approaches based on forward projection since the correspondence mapping is defined in the coordinate system of known images, and thus known luminance/color can be used to reason about the presence/absence as well as the nature of occlusions. We have recently developed successful methods in this category [9,10]. On the other hand, in backward-projection methods the mapping is defined in the coordinate system of a novel image, and thus no luminance/color is available to reason about occlusions. We address this difficulty here. We propose a new occlusion-aware backward-projection view interpolation. The method first identifies pixel visibility in the intermediate image, that is, whether a particular pixel is visible in all input images or only in those to the left or to the right of the image to be reconstructed. These labels are incorporated into a variational formulation to adaptively choose different pairs of input images and reliably estimate disparity under an anisotropic regularization constraint. The final view generation is accomplished by occlusion-adaptive linear intensity interpolation.
The paper is organized as follows. In Section 2, we review prior work on intermediate view interpolation and reconstruction as well as occlusion detection. In Section 3, we present the new occlusion-aware view interpolation, and in Section 4, we show experimental results. In Section 5, we discuss benefits and deficiencies of forward- and backward-projection approaches, and in Section 6, we summarize the paper and draw conclusions.
2 PRIOR WORK
Image-based rendering is concerned with creating an image at a specific 3D location and specific time. Adelson and Bergen [11] formulated a description for all possible images by means of the so-called plenoptic function that records light rays at every possible 3D location, in every possible direction, at every time instant, and for all wavelengths. In order to generate a new image, one simply needs to sample this 7-dimensional function. However, capturing a full plenoptic function is difficult, if not impossible, and thus various assumptions aiming at the reduction of this high dimensionality have been proposed. For example, if only static scenes are considered in grayscale, the number of dimensions reduces to five.
Although prior work can be classified based on the number of dimensions of the plenoptic function used [12], in the context of the work proposed here, we prefer to classify prior methods based on their need for structure information and the number of input images.
(1) Methods that rely on oversampling. Among the most prominent methods that rely on scene oversampling are lightfield rendering [13] and the lumigraph [14]. Both methods create a 4D representation of the scene using many input images. The novel views are created by slicing (sampling) this 4D representation. Since the scene is oversampled, the rendering process simply blends the input images, ignoring scene structure. The presence of occlusions is not a problem because, thanks to oversampling, occlusions between nearest cameras are negligible, and all texture in the scene is visible from several cameras.
(2) Methods that use undersampled data sets with known structure. Given the scene structure, it is possible to reduce the required number of images [15–18]. If the depth map or 3D model of a scene is available, it is possible to project pixels of the known images to a new viewpoint and reconstruct a new image. Obviously, it is not guaranteed that all pixels in the new image will be visible in the input images. However, since the scene structure is known, locations of occlusions are known, which is not the case considered in this paper.
(3) Methods that use severely undersampled data sets with unknown structure. These methods have no access to scene structure and use few input images, typically 2–4. The scene structure is computed implicitly (from disparity) using either correspondence matching or projective geometry. These methods can be categorized based on the approach they use to estimate the disparity: methods based on projective geometry or rectification [19–21], methods based on optical flow [5,9,22,23], methods based on block correspondence (variable-size blocks [24], fixed-size blocks [25], sliding blocks [26]), methods based on feature correspondence [27], and methods using dynamic programming [28]. Because of the limited input data and unknown scene structure, the reconstruction problem is ill-posed and requires some form of regularization, usually by means of additional constraints. The work presented in this paper is closest to this class of methods.
2.1 Forward- and backward-projection methods
When computing an intermediate view, the central role is played by a transformation between the coordinate systems of the known images and the novel image. This transformation depends on camera geometry and scene structure, and is usually unknown. It can be estimated by solving the correspondence problem with two possible definitions of the transformation: from known to novel image coordinates, also called forward projection, or from novel to known image coordinates, called backward projection.
Let I_L and I_R be images of the same scene captured on a 2D lattice Λ by two cameras. We assume the distance between the cameras is normalized to 1. Suppose we need to reconstruct an intermediate view J, also defined on Λ, but at distance 0 < α < 1 from I_L. Clearly, for α = 0, J = I_L, whereas for α = 1, J = I_R (see Figure 2). Due to this simple stereo setup, the transformation mentioned above simplifies to a disparity field between I_L and I_R.
Figure 1: Illustration of intermediate view reconstruction from two cameras: (a) camera setup (C_J is a virtual camera, while C_L and C_R are real cameras), (b) occlusion effect in captured images, and (c) occlusion effect in one row of pixels from the images. Area A from I_L is being occluded in I_R by the object, while area B is being uncovered (area B would undergo occlusion had the direction of arrows been reversed).
Figure 2: Disparity vectors defined (pivoted) in: (a) left (known), (b) right (known), and (c) intermediate (unknown) images.
2.1.1 Forward projection
Disparity vectors (the transformation) are defined in the coordinate system of the known images. Let d_L be a disparity field defined on lattice Λ of I_L (see Figure 2(a)), and let d_R be defined on lattice Λ of I_R (see Figure 2(b)). Under the constant-brightness assumption [29], the following holds:
$$I_L(\mathbf{x}) = I_R\big(\mathbf{x} + \mathbf{d}_L(\mathbf{x})\big), \qquad I_R(\mathbf{x}) = I_L\big(\mathbf{x} + \mathbf{d}_R(\mathbf{x})\big), \quad \forall \mathbf{x} \in \Lambda. \tag{1}$$

Assuming that brightness constancy holds along the whole disparity vector, also the following is true:

$$J\big(\mathbf{x} + \alpha \mathbf{d}_L(\mathbf{x})\big) = I_L(\mathbf{x}), \qquad J\big(\mathbf{x} + (1-\alpha)\mathbf{d}_R(\mathbf{x})\big) = I_R(\mathbf{x}), \quad \forall \mathbf{x} \in \Lambda. \tag{2}$$
Clearly, the reconstruction of the intermediate-view intensities J(x + αd_L(x)) and J(x + (1 − α)d_R(x)) can be as simple as substitution with I_L(x) and I_R(x), respectively. However, in general, x + αd_L(x) ∉ Λ and x + (1 − α)d_R(x) ∉ Λ, that is, the projected points are off lattice Λ. In fact, due to the space-variant nature of disparities, the above locations are usually irregularly spaced, whereas the goal is to reconstruct J(x) at regularly spaced positions (x ∈ Λ). One option is to force the locations x + αd_L(x) and x + (1 − α)d_R(x) to belong to Λ. For the orthonormal lattices typically used, this means forcing αd_L(x) and (1 − α)d_R(x) to be full-pixel vectors, that is, rounding coordinates to the nearest integer [21,25]. Advanced approaches, such as those using splines to perform irregular-to-regular conversion, have also been proposed [9]. While simple rounding suffers from objectionable reconstruction errors, advanced spline-based methods produce high-quality reconstructions but require significant computational effort.
2.1.2 Backward projection
Disparity vectors are defined in the coordinate system of the intermediate image J and bidirectionally point toward the known images [24, 30, 31]. As shown in Figure 2(c), d_J is defined on Λ in J, thus forcing disparity vectors to pass through pixel positions of the intermediate view (i.e., vectors are pivoted in the intermediate view). The constant-brightness assumption now becomes:
$$I_L\big(\mathbf{x} - \alpha \mathbf{d}_J(\mathbf{x})\big) = I_R\big(\mathbf{x} + (1-\alpha)\mathbf{d}_J(\mathbf{x})\big), \quad \forall \mathbf{x} \in \Lambda. \tag{3}$$

Compared to (1), each pixel in J is guaranteed to be assigned a disparity vector and, therefore, two intensities (from I_L and I_R) associated with it. Although usually x − αd_J(x) ∉ Λ and x + (1 − α)d_J(x) ∉ Λ, intensities at these points can be easily calculated from I_L and I_R using spatial interpolation.
In order to compute J at distance α, a disparity field pivoted at α is needed. Although this necessitates disparity estimation for each α, it also simplifies the final computation of J. The reason is that view rendering becomes a byproduct of disparity estimation; once d_J satisfying (3) is found, either the left or right luminance/color can be used for the intermediate-view texture. An even better reconstruction is accomplished when weighted averaging (linear interpolation) of both intensities is applied [24,32]:
$$J(\mathbf{x}) = (1-\alpha)\, I_L\big(\mathbf{x} - \alpha \mathbf{d}_J(\mathbf{x})\big) + \alpha\, I_R\big(\mathbf{x} + (1-\alpha)\mathbf{d}_J(\mathbf{x})\big), \quad \forall \mathbf{x} \in \Lambda. \tag{4}$$
Clearly, all intermediate-view pixels are assigned an intensity, and postprocessing is not needed.
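For reference, here is a minimal sketch of this occlusion-unaware backward-projection interpolation (4) for a rectified pair with horizontal disparities pivoted at J; the linear sampler and function names are illustrative, not the authors' implementation.

```python
import numpy as np

def sample_row(I, y, xs):
    """Linearly interpolate row y of image I at (possibly non-integer) positions xs."""
    xs = np.clip(xs, 0, I.shape[1] - 1)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, I.shape[1] - 1)
    w = xs - x0
    return (1 - w) * I[y, x0] + w * I[y, x1]

def interpolate_view_2(I_L, I_R, d_J, alpha):
    """Occlusion-unaware backward projection, eq. (4):
    J(x) = (1-alpha)*I_L(x - alpha*d_J(x)) + alpha*I_R(x + (1-alpha)*d_J(x))."""
    H, W = I_L.shape
    J = np.empty((H, W))
    x = np.arange(W)
    for y in range(H):
        left = sample_row(I_L, y, x - alpha * d_J[y])
        right = sample_row(I_R, y, x + (1 - alpha) * d_J[y])
        J[y] = (1 - alpha) * left + alpha * right
    return J
```

Every pixel of J receives an intensity, but in occlusion areas the two sampled values come from different surfaces, which is precisely the failure mode addressed in Section 3.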
2.2 Occlusion-aware image-based rendering
In the case of oversampled data sets, if occlusions can be reliably identified, then selection of visible features is not difficult (many views are available). In fact, explicit detection of occlusions is not even necessary; robust photo-consistent measures embedded into the rendering algorithm are sufficient [33].
In the case of undersampled data sets, the situation is different, especially when scene structure (depth) is unknown. In fact, occlusions have a dual impact in this case. First, correspondence (disparity) is not defined in occlusion areas, and thus some a priori assumptions must be made about correspondences (e.g., smoothness). Secondly, during the estimation of disparities, unreliable estimates in occlusion areas impact the outcome at neighboring positions, thus spreading the occlusion-related errors. Knowing where occlusions take place can help correcting both problems.
In forward-projection methods, pixels from I_L (see Figure 2(a)) or I_R (see Figure 2(b)) that are occluded in the other image can be assigned a disparity based on a depth-constancy assumption [34], which does not work well at object boundaries, or by means of edge-preserving disparity inpainting [9], which has been shown to be more accurate. The latter approach is possible since disparities are defined in the coordinate system of the known images (I_L or I_R), and thus their underlying gradients can be used to guide anisotropic disparity diffusion that improves the quality of estimated disparities (discontinuities) [35,36].
In backward-projection methods, disparity is defined in the coordinate system of the unknown image J, and no underlying gradients are available to permit anisotropic diffusion. Therefore, the estimated disparities are usually excessively smooth. Although robust error metrics can be used in regularization [37], this is often insufficient. Moreover, it is unclear how to identify occlusions using a single disparity field. These are the main issues we address in this paper.
As for occlusion detection, it usually exploits one of several constraints. An ordering constraint preserves pixel order on corresponding rows of left and right images [38] but cannot handle thin foreground objects or narrow holes. A uniqueness constraint assures a one-to-one mapping of pixels on corresponding rows [39]. In one implementation, it relies on the geometry of disparity fields; a significant difference between forward (e.g., left-to-right) and backward (e.g., right-to-left) disparity vectors is indicative of occlusions [40]. This constraint can also be thought of as a geometric constraint as it relies on the analysis of disparity field geometry. Some other geometric constraints assume that disparity varies smoothly everywhere except at object boundaries (continuity constraint) [39], or that occlusion areas exhibit an excessive disparity gradient [41]. Yet another geometric constraint seeks uncovered pixels in I_R by inspecting an irregular grid of forward disparity-compensated pixels of image I_L. This constraint has been shown to be very effective and noise resilient in occlusion detection [42]. A related, although weaker, visibility constraint [43] also assures consistency of uncovered pixels in one image with the disparity of the other image, but it permits many-to-one matches in visible areas. Finally, a photometric constraint (or constant-brightness constraint [29]) ensures an intensity match in visible areas. It is the simplest indicator of occlusions but is prone to errors in the presence of image noise and illumination changes. Methods based on multiple views compare intensity consistency along a path formed by displacement vectors in 3 or more frames [44–46]. Graph cuts have also been used in multiview occlusion detection [47].
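As an illustration of the geometric/uniqueness-type constraint mentioned above, the sketch below flags a left-image pixel as occluded when its left-to-right disparity is inconsistent with the right-to-left disparity at the matched position; the threshold and function names are illustrative assumptions, not taken from the cited work.

```python
import numpy as np

def lr_consistency_occlusions(d_lr, d_rl, tau=1.0):
    """Flag pixels of the left image as occluded when the left-to-right disparity
    d_lr and the right-to-left disparity d_rl (both horizontal, in pixels)
    disagree: |d_lr(x) + d_rl(x + d_lr(x))| > tau."""
    H, W = d_lr.shape
    occluded = np.zeros((H, W), dtype=bool)
    for y in range(H):
        for x in range(W):
            xr = int(round(x + d_lr[y, x]))          # matched column in the right image
            if 0 <= xr < W:
                occluded[y, x] = abs(d_lr[y, x] + d_rl[y, xr]) > tau
            else:
                occluded[y, x] = True                # match falls outside the image
    return occluded
```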
3 OCCLUSION-AWARE BACKWARD-PROJECTION VIEW INTERPOLATION
In backward-projection methods, disparities estimated around occlusion areas are erroneous since no underlying image gradients are available. The lack of an image gradient prevents the use of edge-preserving (anisotropic) diffusion. Below, we argue that by using a coarse estimate of the intermediate image, the fidelity of the disparity field can be significantly improved around occlusion areas. With this capacity to compute more accurate disparities, we then propose a new approach to occlusion-aware backward-projection view interpolation.
3.1 Edge-preserving disparity regularization using a coarse intermediate image
Edge-preserving (anisotropic) regularization preserves disparity edges better than isotropic diffusion [35, 48, 49] but requires an image gradient to guide the diffusion process. Since in backward-projection methods the disparity is defined on the sampling grid of the unknown image J, no such gradient is available. However, it turns out that the simple backward-projection view interpolation described in Section 2.1.2 produces intermediate views with reliable edge information despite distorted texture in occlusion areas. Although this may seem counterintuitive, the reason is that visible edges are easily matched and thus are prominent in the interpolated view. Figure 3 shows an experimental result proving this point. Images shown in Figures 3(a) and 3(c) are the input left and right images, and the one in Figure 3(b)
is the true intermediate image. The disparity, estimated using simple isotropic regularization,

$$\arg\min_{\mathbf{d}(\mathbf{x})} \int_{\mathbf{x}\in\Omega_J} \Big(I_L\big(\mathbf{x} - \alpha\mathbf{d}(\mathbf{x})\big) - I_R\big(\mathbf{x} + (1-\alpha)\mathbf{d}(\mathbf{x})\big)\Big)^2 + \lambda\big(\|\nabla u\|^2 + \|\nabla v\|^2\big)\, d\mathbf{x}, \tag{5}$$

where Ω_J is the domain of J, d(x) = [u(x) v(x)]^T, and ∇ is the gradient operator, is shown in Figure 3(d). Clearly,
it is excessively smooth. Figure 3(e) shows an intermediate image computed by using this disparity in (4). Although there are significant texture errors (as is clear from Figure 3(f)), edge maps, obtained using the Canny edge detector, are very similar for the true and reconstructed intermediate images (see Figures 3(g) and 3(h)).
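For concreteness, the sketch below minimizes a discrete analogue of (5) by explicit gradient descent, restricted to a horizontal disparity u on a rectified pair; the step size and λ follow the values quoted in Section 4, but the iteration count, boundary handling, and helper names are illustrative assumptions rather than the authors' solver.

```python
import numpy as np

def sample_row(I, y, xs):
    """Linearly interpolate row y of image I at float positions xs."""
    xs = np.clip(xs, 0, I.shape[1] - 1)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, I.shape[1] - 1)
    w = xs - x0
    return (1 - w) * I[y, x0] + w * I[y, x1]

def isotropic_disparity(I_L, I_R, alpha, lam=2000.0, dt=1.5e-5, n_iter=2000):
    """Explicit gradient descent on a discrete analogue of (5) for a horizontal
    disparity field u pivoted at the intermediate view (illustrative sketch)."""
    H, W = I_L.shape
    u = np.zeros((H, W))
    gL = np.gradient(I_L, axis=1)           # horizontal image derivatives
    gR = np.gradient(I_R, axis=1)
    x = np.arange(W)
    for _ in range(n_iter):
        # Laplacian of u (periodic boundaries for brevity); the gradient of the
        # smoothness term lambda*|grad u|^2 is -2*lambda*laplacian(u).
        lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
               np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4.0 * u)
        grad = np.empty_like(u)
        for y in range(H):
            xl = x - alpha * u[y]
            xr = x + (1 - alpha) * u[y]
            r = sample_row(I_L, y, xl) - sample_row(I_R, y, xr)   # data residual
            dr_du = (-alpha * sample_row(gL, y, xl)
                     - (1 - alpha) * sample_row(gR, y, xr))
            grad[y] = 2.0 * r * dr_du - 2.0 * lam * lap[y]
        u -= dt * grad                       # explicit update step
    return u
```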
Therefore, we propose to use a coarse intermediate image J_c, computed using isotropically-diffused disparities (5), to guide edge-preserving regularization as follows:

$$\arg\min_{\mathbf{d}(\mathbf{x})} \int_{\mathbf{x}\in\Omega_J} \Big(I_L\big(\mathbf{x} - \alpha\mathbf{d}(\mathbf{x})\big) - I_R\big(\mathbf{x} + (1-\alpha)\mathbf{d}(\mathbf{x})\big)\Big)^2 + \lambda\big(F_{\mathbf{x}}(u, J_c) + F_{\mathbf{x}}(v, J_c)\big)\, d\mathbf{x}. \tag{6}$$

Above, F_x(·) assures anisotropic regularization [50] and is defined as follows:

$$F_{\mathbf{x}}(u, J_c) = \nabla^T u(\mathbf{x}) \begin{bmatrix} g\big(J_c^x(\mathbf{x})\big) & 0 \\ 0 & g\big(J_c^y(\mathbf{x})\big) \end{bmatrix} \nabla u(\mathbf{x}), \tag{7}$$

where g(·) is a monotonically decreasing function, and J_c^x, J_c^y are the horizontal and vertical derivatives of J_c at x. If |J_c^x(x)| = |J_c^y(x)|, then isotropic smoothing takes place ((6) simplifies to (5), except for a different λ). However, if, for example, |J_c^x(x)| ≫ |J_c^y(x)|, then stronger smoothing takes place vertically, and the vertical edge is preserved.
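A small sketch of the diffusion weights in (7): the coarse image J_c supplies the gradients, and a Perona-Malik-style decreasing function is assumed for g, since the paper only requires g to be monotonically decreasing; the contrast parameter kappa is an illustrative choice.

```python
import numpy as np

def diffusion_weights(J_c, kappa=10.0):
    """Edge-stopping weights g(J_c^x), g(J_c^y) of eq. (7) computed from the
    coarse intermediate image J_c; a Perona-Malik-like g is assumed here."""
    def g(s):                                # monotonically decreasing function
        return 1.0 / (1.0 + (s / kappa) ** 2)
    Jx = np.gradient(J_c, axis=1)            # horizontal derivative of J_c
    Jy = np.gradient(J_c, axis=0)            # vertical derivative of J_c
    return g(np.abs(Jx)), g(np.abs(Jy))

def anisotropic_penalty(u, gx, gy):
    """Discrete analogue of F_x(u, J_c) = g(J_c^x) u_x^2 + g(J_c^y) u_y^2,
    summed over the whole field."""
    ux = np.gradient(u, axis=1)
    uy = np.gradient(u, axis=0)
    return np.sum(gx * ux ** 2 + gy * uy ** 2)
```

Across a vertical edge of J_c, |J_c^x| is large, so gx is small and the disparity is allowed to change sharply in the horizontal direction while remaining smooth along the edge.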
The disparity field shown in Figure 3(i) was computed using formulation (6). It is clear that the object shape is very well preserved. The intermediate view obtained using this disparity field in backward projection (4) and its interpolation error are shown in Figures 3(j) and 3(k), respectively. As is clear from the error images, distortions along the horizontal boundaries of the square are suppressed compared to Figure 3(f) because the excessive smoothness of the disparity field is eliminated. Although these are nonoccluding boundaries, they were assigned incorrect disparities due to isotropic regularization (5). Edge-preserving regularization (6) corrected the problem, and these areas are now assigned accurate disparities. Consequently, the intermediate image is properly reconstructed there. Significant errors, however, persist in occlusion areas (the vertical boundaries in Figure 3(k)). This is a consequence of the occlusion-unaware nature of the algorithm; a point is visible only in one of the images, but reconstruction based on backward projection (4) averages intensities from both images. Therefore, next we propose to use additional images to solve for occlusion areas.
3.2 Backward-projection view interpolation using multiple images
In order to improve reconstruction in occlusion areas, we first need to estimate their locations, and then figure out what intensities belong there. Without loss of generality, let us consider four input images as shown in Figure 4. Although this is a simple scenario, it does convey the main idea we intend to pursue. While the top row shows images containing a black square against a background containing areas A and B, the bottom row shows their horizontal cross-sections (rows of pixels). The goal is to reconstruct the intermediate image J using input images I1, I2, I3, and I4. Note that areas A and B are being occluded/exposed between the four images.
In occlusion-unaware interpolation (4), I2 and I3 would be the input images, and a disparity field defined on J would be estimated. For most points in J, it is possible to estimate accurate disparities because the corresponding points are visible in both I2 and I3. However, areas A and B are occluded in either I2 or I3, and it is not possible to estimate disparities there. Note that areas A and B are visible in additional images to the left of I2 and to the right of I3. Thus, it should be possible to estimate disparities in area A using I1 and I2, and disparities in area B using I3 and I4. Therefore, a formulation is needed to estimate the disparities of J by choosing between three image pairs: (I1, I2), (I3, I4), or (I2, I3).
In order to implement switching between image pairs, one first needs to identify areas A and B. We propose to use a method that we developed earlier [42]. Given a disparity field between two images, this method identifies areas that will be exposed between the images; such areas are equivalent to occluded areas when the target and reference images are interchanged. The method is based on the fact that pixels in the target image that did not exist in the reference image (i.e., newly-exposed pixels) have no relationship to the reference image and, as such, cannot be pointed to by disparity vectors. Thus, when pixels of the reference image are forward disparity compensated onto the target image, these areas remain empty and can be easily detected. Since we need to identify areas that disappear to the left and to the right of J, we must estimate two disparity fields: d12 defined in I1 and pointing to I2, and d43 defined in I4 and pointing to I3. We use formulation (6) with I1 (I4) used for edge-preserving regularization when computing d12 (d43). Our occlusion detection method [42] yields area B by using (1+α)d12. The coefficient (1+α) is needed to normalize the disparity field so that it is correctly mapped onto J (see Figure 4). The estimated area B is exposed between I1 and J and, therefore, visible in I3 and I4. Similarly, using (2−α)d43 yields area A, which is visible in I1 and I2. Let L(x) be a visibility label at location x in J that we wish to estimate. Clearly, by using d12 and d43, we can label all points in J as visible in I1 and I2 only (L(x) = −1), visible in I3 and I4 only (L(x) = 1), or visible in I2 and I3 (L(x) = 0). (The actual label values have no importance; other values, such as 1, 2, and 3, could have been chosen.)
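The sketch below illustrates, under simplifying assumptions (horizontal disparities, single-pixel splatting, no cleanup of isolated false detections), the hole-based idea behind [42] as used here: positions of J that no normalized forward vector lands on are marked as newly exposed, and the two exposure masks are combined into the label field L; the function names are hypothetical.

```python
import numpy as np

def exposed_mask(d_fwd, scale):
    """Mark positions of J that no forward-compensated pixel lands on.
    d_fwd: horizontal disparity defined in the reference image (I1 or I4);
    scale: (1 + alpha) for d12 or (2 - alpha) for d43, mapping onto J."""
    H, W = d_fwd.shape
    hit = np.zeros((H, W), dtype=bool)
    for y in range(H):
        for x in range(W):
            xt = int(round(x + scale * d_fwd[y, x]))   # landing column in J
            if 0 <= xt < W:
                hit[y, xt] = True
    return ~hit                                        # holes = newly exposed areas

def visibility_labels(d12, d43, alpha):
    """Label field L on J: -1 -> visible only in (I1, I2), +1 -> only in (I3, I4),
    0 -> visible in (I2, I3)."""
    B = exposed_mask(d12, 1.0 + alpha)   # area B: detected from d12, visible in (I3, I4)
    A = exposed_mask(d43, 2.0 - alpha)   # area A: detected from d43, visible in (I1, I2)
    L = np.zeros(d12.shape, dtype=int)
    L[A] = -1                            # use pair (I1, I2)
    L[B] = 1                             # use pair (I3, I4)
    return L
```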
Figure 3: Results of backward-projection view interpolation for synthetic sequence no. 1 with horizontal disparity: (a) I_L, (b) ground-truth J, (c) I_R, (d) disparity estimated using isotropic diffusion (5), (e) J interpolated using (4) with disparity from (d), (f) interpolation error for J from (e), (g) edge map of ground-truth image J, (h) edge map of interpolated image J, (i) disparity estimated using anisotropic diffusion (6) with J_c from (e), (j) J interpolated using (4) with disparity from (i), (k) interpolation error for J from (j). See Table 1 for PSNR values of the interpolation error.

With the visibility of points in J identified, we can now reliably compute each point's disparity from a suitable pair of images and also prevent oversmoothing via edge-preserving regularization. We first define matching errors for image pairs (I1, I2), (I2, I3), and (I3, I4) as follows:
$$\begin{aligned} \theta_{12}(\mathbf{x}) &= I_1\big(\mathbf{x} - (1+\alpha)\mathbf{d}(\mathbf{x})\big) - I_2\big(\mathbf{x} - \alpha\mathbf{d}(\mathbf{x})\big),\\ \theta_{23}(\mathbf{x}) &= I_2\big(\mathbf{x} - \alpha\mathbf{d}(\mathbf{x})\big) - I_3\big(\mathbf{x} + (1-\alpha)\mathbf{d}(\mathbf{x})\big),\\ \theta_{34}(\mathbf{x}) &= I_3\big(\mathbf{x} + (1-\alpha)\mathbf{d}(\mathbf{x})\big) - I_4\big(\mathbf{x} + (2-\alpha)\mathbf{d}(\mathbf{x})\big). \end{aligned} \tag{8}$$
The coefficients (1−α), (1+α), and (2−α) adjust disparity vectors depending on the distance to J. For locations x ∈ Ω_J outside of A and B, all three errors yield small magnitudes. However, in occlusion areas only one of them will have a small magnitude. For example, for x in area A, θ12(x) will have a small magnitude, whereas for x in area B the magnitude of θ34(x) will be small.
In order to estimate disparities either bidirectionally (visible pixels) or unidirectionally (occlusion areas), we propose the following variational formulation that controls intensity matching using labels L under edge-preserving regularization:
$$\min_{\mathbf{d}(\mathbf{x})} \int_{\mathbf{x}\in\Omega_J} e_P(\mathbf{x}) + \lambda\, e_S(\mathbf{x})\, d\mathbf{x} \tag{9}$$

with

$$e_P(\mathbf{x}) = P_{12}(\mathbf{x}) + P_{23}(\mathbf{x}) + P_{34}(\mathbf{x}), \qquad e_S(\mathbf{x}) = F_{\mathbf{x}}(u, J_c) + F_{\mathbf{x}}(v, J_c), \tag{10}$$

$$\begin{aligned} P_{12}(\mathbf{x}) &= \delta\big(L(\mathbf{x}) + 1\big)\, \theta_{12}(\mathbf{x})^2,\\ P_{23}(\mathbf{x}) &= \delta\big(L(\mathbf{x})\big)\, \theta_{23}(\mathbf{x})^2,\\ P_{34}(\mathbf{x}) &= \delta\big(L(\mathbf{x}) - 1\big)\, \theta_{34}(\mathbf{x})^2, \end{aligned} \tag{11}$$
where F_x is defined in (7), J_c is a coarse intermediate image reconstructed using disparity estimation with isotropic regularization, as proposed in Section 3.1, and δ(·) is the Kronecker delta function. Clearly, e_P adaptively selects different pairs of input images depending on L.
Figure 4: Illustration of how to use four images in backward-projection intermediate view interpolation. Areas A and B can be estimated in J using (I1, I2) and (I3, I4), respectively, while points outside of A or B can be estimated using (I2, I3).
For example, if L(x) = −1, then P12(x) is used. Since the Kronecker delta δ(·) is not differentiable, we use an approximation such as δ(x) = lim_{k→∞} e^{−kx²} (k = 10^10 gives a good approximation). The derivation of the Euler-Lagrange equations for the above variational formulation is included in the appendix.
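To make the label-dependent selection explicit, here is a per-pixel sketch of e_P from (8)-(11) using the smoothed delta; the 1D linear sampler and function names are illustrative, and only horizontal disparities are considered.

```python
import numpy as np

def interp1(row, xf):
    """Linear interpolation of a 1D intensity row at float position xf."""
    xf = min(max(xf, 0.0), len(row) - 1.0)
    x0 = int(np.floor(xf))
    x1 = min(x0 + 1, len(row) - 1)
    w = xf - x0
    return (1 - w) * row[x0] + w * row[x1]

def smooth_delta(t, k=1e10):
    """Differentiable stand-in for the Kronecker delta: delta(t) ~ exp(-k t^2)."""
    return float(np.exp(-k * t * t))

def data_term(I1, I2, I3, I4, y, x, d, L, alpha):
    """Occlusion-adaptive data term e_P at pixel (y, x) following (8)-(11):
    the visibility label L in {-1, 0, 1} selects which image pair is matched."""
    theta12 = interp1(I1[y], x - (1 + alpha) * d) - interp1(I2[y], x - alpha * d)
    theta23 = interp1(I2[y], x - alpha * d) - interp1(I3[y], x + (1 - alpha) * d)
    theta34 = interp1(I3[y], x + (1 - alpha) * d) - interp1(I4[y], x + (2 - alpha) * d)
    return (smooth_delta(L + 1) * theta12 ** 2 +
            smooth_delta(L) * theta23 ** 2 +
            smooth_delta(L - 1) * theta34 ** 2)
```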
Once the disparity field has been estimated, it is possible to reconstruct J by using any intensity value along the disparity vector, but averaging leads to better results (noise suppression). We propose to reconstruct the intermediate view J as follows:
$$J(\mathbf{x}) = \delta\big(L(\mathbf{x}) + 1\big)\, \xi_{12} + \delta\big(L(\mathbf{x})\big)\, \xi_{23} + \delta\big(L(\mathbf{x}) - 1\big)\, \xi_{34}, \quad \forall \mathbf{x} \in \Omega_J, \tag{12}$$

where the ξ are intensity averages along the disparity vector d(x), defined as follows:

$$\begin{aligned} \xi_{12} &= \tfrac{1}{2}\Big(I_1\big(\mathbf{x} - (1+\alpha)\mathbf{d}(\mathbf{x})\big) + I_2\big(\mathbf{x} - \alpha\mathbf{d}(\mathbf{x})\big)\Big),\\ \xi_{23} &= \tfrac{1}{2}\Big(I_2\big(\mathbf{x} - \alpha\mathbf{d}(\mathbf{x})\big) + I_3\big(\mathbf{x} + (1-\alpha)\mathbf{d}(\mathbf{x})\big)\Big),\\ \xi_{34} &= \tfrac{1}{2}\Big(I_3\big(\mathbf{x} + (1-\alpha)\mathbf{d}(\mathbf{x})\big) + I_4\big(\mathbf{x} + (2-\alpha)\mathbf{d}(\mathbf{x})\big)\Big). \end{aligned} \tag{13}$$
Note that at every x, only one of the values in (13) contributes to J(x) in (12) because of the δ(·) terms.
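Finally, a sketch of the occlusion-adaptive reconstruction (12)-(13) under the same simplifying assumptions (horizontal disparities, illustrative function names); the visibility label selects which image pair is averaged at each pixel.

```python
import numpy as np

def interp1(row, xf):
    """Linear interpolation of a 1D intensity row at float position xf."""
    xf = min(max(xf, 0.0), len(row) - 1.0)
    x0 = int(np.floor(xf))
    x1 = min(x0 + 1, len(row) - 1)
    w = xf - x0
    return (1 - w) * row[x0] + w * row[x1]

def interpolate_view_4(I1, I2, I3, I4, d, L, alpha):
    """Occlusion-adaptive reconstruction of J following (12)-(13): each pixel is
    the average of the two intensities from the image pair selected by L(x)."""
    H, W = I2.shape
    J = np.empty((H, W))
    for y in range(H):
        for x in range(W):
            dx = d[y, x]
            if L[y, x] == -1:        # area A: visible only in (I1, I2)
                J[y, x] = 0.5 * (interp1(I1[y], x - (1 + alpha) * dx) +
                                 interp1(I2[y], x - alpha * dx))
            elif L[y, x] == 1:       # area B: visible only in (I3, I4)
                J[y, x] = 0.5 * (interp1(I3[y], x + (1 - alpha) * dx) +
                                 interp1(I4[y], x + (2 - alpha) * dx))
            else:                    # visible in both central images (I2, I3)
                J[y, x] = 0.5 * (interp1(I2[y], x - alpha * dx) +
                                 interp1(I3[y], x + (1 - alpha) * dx))
    return J
```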
4 EXPERIMENTAL RESULTS
We solve the partial differential equations derived in the appendix using explicit discretization with a small time step dt = 1.5 × 10^−5 and 11 × 10^3 iterations. We employ a 4-level hierarchical implementation in order to avoid local minima, and bicubic interpolation to estimate subpixel intensities. In all experimental results shown in the paper, we use λ = 2000. Compared to the disparity estimation step (9), the final view interpolation (12) is very simple and requires little computation.
In order to gauge the gains due to the use of 4 images, we have compared the proposed algorithm with view interpolation based on 2-image backward projection with isotropic as well as edge-preserving regularization of disparities (see Table 1), and with an equivalent forward-projection reconstruction using the same 4 images [9]. The latter method uses occlusion-aware edge-preserving estimation of 3 disparity fields (from (I1, I2), (I2, I3), and (I3, I4)), followed by occlusion detection and spline-based image reconstruction. A listing of the tested algorithms along with the corresponding objective metrics (PSNR of the interpolation error, i.e., the difference between the ground-truth and computed intermediate images) can be found in Table 1.
In the first test, we generated two additional images for the synthetic test sequence shown in Figure 3. The four input images are shown in Figures 5(a)–5(d), and the ground-truth disparity, intermediate image, and label map are shown in Figures 5(e)–5(g). A label field L estimated using the method proposed in [42] is shown in Figure 5(h). In all label fields in this paper, black is used to denote L(x) = −1, that is, a point visible in, and interpolated from, (I1, I2). Similarly, gray is used to denote L(x) = 0 and thus interpolation from (I2, I3), while white is used to denote L(x) = 1 and interpolation from (I3, I4). Although there are false positives at the top and bottom boundaries of the square, since these areas are visible in all images, they can be predicted from any pair and do not contribute to the interpolation error.
Results for the 4-image occlusion-aware forward and backward projection are shown in the first row and the second row of Figure 6, respectively. While the disparity field from Figure 6(a) (one of 3 disparity fields estimated in forward projection) was estimated using one of the original images to guide edge-preserving regularization and implicit occlusion detection to prevent intensity mismatches, the disparity shown in Figure 6(d) was estimated using a coarse image J_c and occlusion labels from Figure 5(h). In comparison with the disparity from Figure 3(i), computed from two images using edge-preserving regularization, the improvement in occlusion areas is clear in both 4-image results. Although it is difficult to judge the estimated intermediate images J, the interpolation errors in Figures 6(c) and 6(f) are clearly smaller than those in Figures 3(f) and 3(k). This is confirmed by the numerical results in Table 1, with the 4-image backward projection outperforming the 4-image forward projection by over 1 dB. Interestingly, the proposed edge-preserving regularization using a coarse intermediate image offers over 2 dB improvement over isotropic regularization, both using two images.
In order to verify this performance, we have prepared another synthetic sequence with more complex occlusions, in which the objects move by different amounts between each pair of views, therefore occluding both the background and each other. The original input images I1–I4 are shown in Figure 7 along with the ground truth: disparity, intermediate image, and label map. Also, a visibility label map estimated using the method proposed in [42] is shown in Figure 7(h).
Figure 5: Extended synthetic sequence no. 1: (a)–(d) I1–I4; ground-truth (e) disparity, (f) intermediate image, and (g) label map; (h) estimated label map (black, gray, and white indicate the (I1, I2), (I2, I3), and (I3, I4) image pairs to be used, resp.).
Figure 6: Comparison of view interpolation methods for the synthetic sequence from Figure 5 (disparity, interpolated view, and interpolation error are shown): (a)–(c) 4-image occlusion-aware forward projection, (d)–(f) 4-image occlusion-aware backward projection. See Table 1 for algorithm description and PSNR values.
Figure 8 shows the estimated disparity, interpolated intermediate image, and interpolation error for the 4 methods described in Table 1. From the error images and PSNR values, it is clear that the method proposed here outperforms the 2-image backward-projection methods and also the 4-image forward-projection method.
Visually, the two 4-image methods stand out; the estimated disparity fields are the most accurate, and the computed intermediate images carry little error. Numerically, however, the proposed method has a clear edge over the 4-image forward projection (1.6 dB gain). The somewhat inferior performance of the forward-projection method stems from the fact that if a single intensity is projected to an incorrect location due to an erroneous disparity estimate, it will affect neighboring pixels during the irregular-to-regular conversion using splines.
Figure 7: Synthetic sequence no. 2 with horizontal disparity: (a)–(d) I1–I4; ground-truth (e) disparity, (f) intermediate image, and (g) label map; (h) estimated label map.
Table 1: Description of the four view interpolation methods tested and PSNR values [dB] of the corresponding interpolation error for the synthetic test sequences from Figures 5 and 7, and the natural sequence from Figure 10.

2-image isotropic BP: backward projection (BP) using 2 images (I2, I3); isotropic disparity regularization (5); linear interpolation (4).

2-image edge-preserving BP: backward projection using 2 images (I2, I3); edge-preserving disparity regularization (6); linear interpolation (4).

4-image occlusion-aware FP: forward projection (FP) using 4 images (I1, I2, I3, I4); occlusion-aware edge-preserving disparity regularization [10]; spline-based reconstruction [9].

4-image occlusion-aware BP: backward projection using 4 images (I1, I2, I3, I4); occlusion-aware edge-preserving disparity regularization (9); occlusion-aware linear interpolation (12).
The reason is that spline-based reconstruction is performed globally; every pixel contributes to the reconstruction of all other pixels. This is not the case for backward projection, where neighboring interpolations are solved independently (except for disparity estimation). Although there are some artifacts around edges in the proposed approach, they are isolated, as opposed to spline-based reconstruction.
This test sequence, however, has revealed one weakness of the proposed method. As can be noticed in Figure 8(j), the disparity to the right of the objects is distorted. This is due to the weak gradient between the object and the background. Since edge-preserving regularization fails in this case, the disparity of the object leaks into the background. This is a common problem in edge-preserving regularization. Nevertheless, the proposed method improves the results for this synthetic sequence by 2.5 dB in comparison with 2-image backward projection with isotropic disparities.
We also tested the proposed method on natural images. We used four frames (nos. 10, 16, 22, 28) of the Flowergarden sequence to reconstruct frame no. 19. The four original images are shown in Figures 9(a)–9(d). Note how the tree trunk occludes the house in the background. The disparity estimated using backward projection with isotropic regularization based on 2 images (nos. 16 and 22) is shown in Figure 9(e). Note how smooth this disparity field is.
Figure 8: Comparison of view interpolation methods for the synthetic sequence from Figure 7 (disparity, interpolated view, and interpolation error are shown): (a)–(c) 2-image isotropic backward projection, (d)–(f) 2-image edge-preserving backward projection, (g)–(i) 4-image occlusion-aware forward projection (only the disparity estimated from I2 to I3 is shown), (j)–(l) 4-image occlusion-aware backward projection. See Table 1 for algorithm description and PSNR values.
The interpolation of image no. 19 using this disparity field and images no. 16 and no. 22 is shown in Figure 9(f). Note that occlusion areas are poorly reconstructed; the texture around the tree trunk is highly distorted, especially on the flowerbed, house walls, and roof (see the closeup in Figure 9(k)).
A label field estimated using the method proposed in [42] is shown in Figure 9(g), while a disparity estimated using 4-image occlusion-aware edge-preserving regularization is shown in Figure 9(h). Compared to the 2-image isotropic result from Figure 9(e), the new disparity exhibits sharp tree-trunk boundaries. The interpolated intermediate view is shown in Figure 9(i). Since the input sequence is actually a video sequence, we can compare the reconstructed view to the original frame no. 19. Closeups of the original frame no. 19 and of both reconstructions are shown in Figures 9(j)–9(l). The texture of the flowerbed is not smeared in the new reconstruction and is very similar to the original frame. Also, the windows of the house cannot be identified in Figure 9(k) as they are severely smeared. However, they are sharp and clear in Figure 9(l). Similarly, tree branches behind the house are distorted in Figure 9(k), but are more accurately reconstructed in Figure 9(l).
Finally, we tested our algorithm on an image from the Middlebury College Vision Group (Midd1 [51], Figure 10). Figure 10 compares the proposed method with the other three methods, while PSNR values are presented in Table 1. Compared to the isotropic case, the 2-image edge-preserving regularization sharpens the disparity field which, in turn, leads to a 1 dB gain in PSNR. However, occlusions are still not handled well; the closeup of the occlusion area shows severe artifacts. Since forward projection with spline-based interpolation accounts for occlusions, we see an increase in PSNR as well as proper reconstruction of texture in occlusion areas. The proposed method adds another 0.5 dB to the PSNR and produces an intermediate image very close to the original closeup.
5 DISCUSSION
The focus of this work was on severely undersampled 3D data sets with unknown scene structure, and, more specifically, on