Volume 2008, Article ID 803231, 15 pages
doi:10.1155/2008/803231
Research Article
Occlusion-Aware View Interpolation
Serdar Ince 1, 2 and Janusz Konrad (EURASIP Member) 1
1 Department of Electrical and Computer Engineering, Boston University, 8 Saint Mary’s Street, Boston, MA 02215, USA
2 IntelliVid Corporation, Cambridge, MA 02138, USA
Correspondence should be addressed to Janusz Konrad, jkonrad@bu.edu
Received 3 March 2008; Accepted 1 October 2008
Recommended by Peter Eisert
View interpolation is an essential step in content preparation for multiview 3D displays, free-viewpoint video, and multiview image/video compression. It is performed by establishing a correspondence among views, followed by interpolation using the corresponding intensities. However, occlusions pose a significant challenge, especially if few input images are available. In this paper, we identify challenges related to disparity estimation and view interpolation in the presence of occlusions. We then propose an occlusion-aware intermediate view interpolation algorithm that uses four input images to handle the disappearing areas. The algorithm consists of three steps. First, all pixels in the view to be computed are classified in terms of their visibility in the input images. Then, disparity for each pixel is estimated from different image pairs depending on the computed visibility map. Finally, the luminance/color of each pixel is adaptively interpolated from an image pair selected by its visibility label. Extensive experimental results show striking improvements in interpolated image quality over occlusion-unaware interpolation from two images and very significant gains over occlusion-aware spline-based reconstruction from four images, both on synthetic and real images. Although improvements are obvious only in the vicinity of object boundaries, this should be useful in high-quality 3D applications, such as digital 3D cinema and ultra-high resolution multiview autostereoscopic displays, where distortions at depth discontinuities are highly objectionable, especially if they vary with viewpoint change.
Copyright © 2008 S. Ince and J. Konrad. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1 INTRODUCTION
The generation of a novel (virtual) view of a scene captured by real cameras is often referred to as image-based rendering. The problem is illustrated for the case of two cameras in Figure 1: the goal is to compute the image J that would have been captured by camera C_J had it been used, based on images I_L and I_R captured by cameras C_L and C_R, respectively. Generation of such views is an essential step in content preparation for multiview 3D displays [1–3], free-viewpoint video [4,5], and multiview compression [6–8]. A very similar problem exists in frame-rate conversion of video, except that novel images are created from different-time snapshots rather than different views.
In order to render a novel view, a correspondence among the known views first needs to be established, followed by an estimation of the new intensity from the known intensities in correspondence. Depending on how the correspondence mapping is defined, two approaches are possible. One approach is based on backward projection of intensities (the term "backward projection" is borrowed from the field of video coding, where it refers to predicting luminance/color from previous (in time) frames), where the mapping is defined in the coordinate system of the unknown view (J in Figure 1); thus, this approach simplifies the final estimation to an interpolation problem. The other approach is based on forward projection of intensities, where the mapping is defined in the coordinate system of one of the known views (I_L or I_R in Figure 1), thus making the final estimation more difficult since projected intensities do not, in general, belong to the sampling grid of J. In fact, the problem cannot be solved in this case by interpolation, and the novel-view intensities must be approximated instead. Typically accomplished by means of additional constraints, this process is often referred to as view reconstruction. We will review these two approaches in more detail in Section 2.
One of the significant challenges in image-based rendering is dealing with occlusion areas (see Figures 1(b)–1(c)). By the term occlusion area, we mean an area in one input image disappearing from the other input image due to scene structure; for example, area A in I_L is occluded in I_R (see Figure 1). Note that a disappearing area becomes an appearing area (also known as an uncovered or newly-exposed area), and vice versa, if the order of views is reversed, that is, "right-to-left" instead of "left-to-right."
Many approaches to novel view generation have been proposed to date. Although some methods account for occlusions, few handle occlusions accurately. Consequently, occlusion areas are recovered inaccurately. Our motivation in this paper is to improve the novel image quality in occlusion areas. This is somewhat easier in approaches based on forward projection since the correspondence mapping is defined in the coordinate system of known images, and thus known luminance/color can be used to reason about the presence/absence as well as the nature of occlusions. We have recently developed successful methods in this category [9,10]. On the other hand, in backward-projection methods the mapping is defined in the coordinate system of a novel image, and thus no luminance/color is available to reason about occlusions. We address this difficulty here. We propose a new occlusion-aware backward-projection view interpolation. The method first identifies pixel visibility in the intermediate image, that is, whether a particular pixel is visible in all input images or only in those to the left or to the right of the image to be reconstructed. These labels are incorporated into a variational formulation to adaptively choose different pairs of input images and reliably estimate disparity under an anisotropic regularization constraint. The final view generation is accomplished by occlusion-adaptive linear intensity interpolation.
The paper is organized as follows. In Section 2, we review prior work on intermediate view interpolation and reconstruction as well as occlusion detection. In Section 3, we present the new occlusion-aware view interpolation, and in Section 4, we show experimental results. In Section 5, we discuss benefits and deficiencies of forward- and backward-projection approaches, and in Section 6, we summarize the paper and draw conclusions.
2 PRIOR WORK
Image-based rendering is concerned with creating an image at a specific 3D location and specific time. Adelson and Bergen [11] formulated a description for all possible images by means of the so-called plenoptic function that records light rays at every possible 3D location, in every possible direction, at every time instant, and for all wavelengths. In order to generate a new image, one simply needs to sample this 7-dimensional function. However, capturing a full plenoptic function is difficult, if not impossible, and thus various assumptions aiming at the reduction of this high dimensionality have been proposed. For example, if only static scenes are considered in grayscale, the number of dimensions reduces to five.
Although prior work can be classified based on the number of dimensions of the plenoptic function used [12], in the context of the work proposed here, we prefer to classify prior methods based on their need for structure information and the number of input images.
(1) Methods that rely on oversampling. Among the most prominent methods that rely on scene oversampling are lightfield rendering [13] and the lumigraph [14]. Both methods create a 4D representation of the scene using many input images. The novel views are created by slicing (sampling) this 4D representation. Since the scene is oversampled, the rendering process simply blends the input images, ignoring scene structure. The presence of occlusions is not a problem because, thanks to oversampling, occlusions between nearest cameras are negligible, and all texture in the scene is visible from several cameras.
(2) Methods that use undersampled data sets with known structure. Given the scene structure, it is possible to reduce the required number of images [15–18]. If the depth map or 3D model of a scene is available, it is possible to project pixels of the known images to a new viewpoint and reconstruct a new image. Obviously, it is not guaranteed that all pixels in the new image will be visible in the input images. However, since the scene structure is known, locations of occlusions are known, which is not the case considered in this paper.
(3) Methods that use severely undersampled data sets with unknown structure. These methods have no access to scene structure and use few input images, typically 2–4. The scene structure is computed implicitly (from disparity) using either correspondence matching or projective geometry. These methods can be categorized based on the approach they use to estimate the disparity: methods based on projective geometry or rectification [19–21], methods based on optical flow [5,9,22,23], methods based on block correspondence (variable-size blocks [24], fixed-size blocks [25], sliding blocks [26]), methods based on feature correspondence [27], and methods using dynamic programming [28]. Because of the limited input data and unknown scene structure, the reconstruction problem is ill-posed and requires some form of regularization, usually by means of additional constraints. The work presented in this paper is closest to this class of methods.
2.1 Forward- and backward-projection methods
When computing an intermediate view, the central role is played by a transformation between the coordinate systems of the known images and the novel image. This transformation depends on camera geometry and scene structure, and is usually unknown. It can be estimated by solving the correspondence problem with two possible definitions of the transformation: from known to novel image coordinates, also called forward projection, or from novel to known image coordinates, called backward projection.
Let I_L and I_R be images of the same scene captured on a 2D lattice Λ by two cameras. We assume the distance between the cameras is normalized to 1. Suppose we need to reconstruct an intermediate view J, also defined on Λ, but at distance 0 < α < 1 from I_L. Clearly, for α = 0, J = I_L, whereas for α = 1, J = I_R (see Figure 2). Due to this simple stereo setup, the transformation mentioned above simplifies to a disparity field between I_L and I_R.
Figure 1: Illustration of intermediate view reconstruction from two cameras: (a) camera setup (C_J is a virtual camera, while C_L and C_R are real cameras), (b) occlusion effect in captured images, and (c) occlusion effect in one row of pixels from the images. Area A from I_L is being occluded in I_R by the object, while area B is being uncovered (area B would undergo occlusion had the direction of arrows been reversed).
Figure 2: Disparity vectors defined (pivoted) in: (a) left (known), (b) right (known), and (c) intermediate (unknown) images.
2.1.1 Forward projection
Disparity vectors (the transformation) are defined in the coordinate system of the known images. Let d_L be a disparity field defined on lattice Λ of I_L (see Figure 2(a)), and let d_R be defined on lattice Λ of I_R (see Figure 2(b)). Under the constant-brightness assumption [29], the following holds:
$$I_L(\mathbf{x}) = I_R\big(\mathbf{x} + \mathbf{d}_L(\mathbf{x})\big), \qquad I_R(\mathbf{x}) = I_L\big(\mathbf{x} + \mathbf{d}_R(\mathbf{x})\big), \quad \forall \mathbf{x} \in \Lambda. \tag{1}$$

Assuming that brightness constancy holds along the whole disparity vector, also the following is true:

$$J\big(\mathbf{x} + \alpha \mathbf{d}_L(\mathbf{x})\big) = I_L(\mathbf{x}), \qquad J\big(\mathbf{x} + (1-\alpha)\mathbf{d}_R(\mathbf{x})\big) = I_R(\mathbf{x}), \quad \forall \mathbf{x} \in \Lambda. \tag{2}$$
Clearly, the reconstruction of the intermediate-view intensities J(x + αd_L(x)) and J(x + (1 − α)d_R(x)) can be as simple as substitution with I_L(x) and I_R(x), respectively. However, in general, x + αd_L(x) ∉ Λ and x + (1 − α)d_R(x) ∉ Λ, that is, the projected points are off lattice Λ. In fact, due to the space-variant nature of disparities, the above locations are usually irregularly spaced, whereas the goal is to reconstruct J(x) at regularly spaced positions (x ∈ Λ). One option is to force the locations x + αd_L(x) and x + (1 − α)d_R(x) to belong to Λ. For the orthonormal lattices typically used, this means forcing αd_L(x) and (1 − α)d_R(x) to be full-pixel vectors, that is, rounding coordinates to the nearest integer [21,25]. Advanced approaches, such as those using splines to perform irregular-to-regular conversion, have also been proposed [9]. While simple rounding suffers from objectionable reconstruction errors, advanced spline-based methods produce high-quality reconstructions but require significant computational effort.
2.1.2 Backward projection
Disparity vectors are defined in the coordinate system of the intermediate image J and bidirectionally point toward the known images [24, 30, 31]. As shown in Figure 2(c), d_J is defined on Λ in J, thus forcing disparity vectors to pass through pixel positions of the intermediate view (i.e., vectors are pivoted in the intermediate view). The constant-brightness assumption now becomes:
$$I_L\big(\mathbf{x} - \alpha \mathbf{d}_J(\mathbf{x})\big) = I_R\big(\mathbf{x} + (1-\alpha)\mathbf{d}_J(\mathbf{x})\big), \quad \forall \mathbf{x} \in \Lambda. \tag{3}$$

Compared to (1), each pixel in J is guaranteed to be assigned a disparity vector and, therefore, two intensities (from I_L and I_R) associated with it. Although usually x − αd_J(x) ∉ Λ and x + (1 − α)d_J(x) ∉ Λ, intensities at these points can be easily calculated from I_L and I_R using spatial interpolation.
In order to compute J at distance α, a disparity field pivoted at α is needed. Although this necessitates disparity estimation for each α, it also simplifies the final computation of J. The reason is that view rendering becomes a byproduct of disparity estimation; once d_J satisfying (3) is found, either the left or right luminance/color can be used for the intermediate-view texture. An even better reconstruction is accomplished when weighted averaging (linear interpolation) of both intensities is applied [24,32]:
$$J(\mathbf{x}) = (1-\alpha)\, I_L\big(\mathbf{x} - \alpha \mathbf{d}_J(\mathbf{x})\big) + \alpha\, I_R\big(\mathbf{x} + (1-\alpha)\mathbf{d}_J(\mathbf{x})\big), \quad \forall \mathbf{x} \in \Lambda. \tag{4}$$
Clearly, all intermediate-view pixels are assigned an intensity, and postprocessing is not needed.
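For reference, here is a minimal sketch of this occlusion-unaware backward-projection interpolation (4) for a rectified pair with horizontal disparities pivoted at J; the linear sampler and function names are illustrative, not the authors' implementation.

```python
import numpy as np

def sample_row(I, y, xs):
    """Linearly interpolate row y of image I at (possibly non-integer) positions xs."""
    xs = np.clip(xs, 0, I.shape[1] - 1)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, I.shape[1] - 1)
    w = xs - x0
    return (1 - w) * I[y, x0] + w * I[y, x1]

def interpolate_view_2(I_L, I_R, d_J, alpha):
    """Occlusion-unaware backward projection, eq. (4):
    J(x) = (1-alpha)*I_L(x - alpha*d_J(x)) + alpha*I_R(x + (1-alpha)*d_J(x))."""
    H, W = I_L.shape
    J = np.empty((H, W))
    x = np.arange(W)
    for y in range(H):
        left = sample_row(I_L, y, x - alpha * d_J[y])
        right = sample_row(I_R, y, x + (1 - alpha) * d_J[y])
        J[y] = (1 - alpha) * left + alpha * right
    return J
```

Every pixel of J receives an intensity, but in occlusion areas the two sampled values come from different surfaces, which is precisely the failure mode addressed in Section 3.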
2.2 Occlusion-aware image-based rendering
In the case of oversampled data sets, if occlusions can be reliably identified, then selection of visible features is not difficult (many views are available). In fact, explicit detection of occlusions is not even necessary; robust photo-consistent measures embedded into the rendering algorithm are sufficient [33].
In the case of undersampled data sets, the situation is different, especially when scene structure (depth) is unknown. In fact, occlusions have a dual impact in this case. First, correspondence (disparity) is not defined in occlusion areas, and thus some a priori assumptions must be made about correspondences (e.g., smoothness). Secondly, during the estimation of disparities, unreliable estimates in occlusion areas impact the outcome at neighboring positions, thus spreading the occlusion-related errors. Knowing where occlusions take place can help correcting both problems.
In forward-projection methods, pixels from I_L (see Figure 2(a)) or I_R (see Figure 2(b)) that are occluded in the other image can be assigned a disparity based on a depth-constancy assumption [34], which does not work well at object boundaries, or by means of edge-preserving disparity inpainting [9], which has been shown to be more accurate. The latter approach is possible since disparities are defined in the coordinate system of the known images (I_L or I_R), and thus their underlying gradients can be used to guide anisotropic disparity diffusion that improves the quality of estimated disparities (discontinuities) [35,36].
In backward-projection methods, disparity is defined in the coordinate system of the unknown image J, and no underlying gradients are available to permit anisotropic diffusion. Therefore, the estimated disparities are usually excessively smooth. Although robust error metrics can be used in regularization [37], this is often insufficient. Moreover, it is unclear how to identify occlusions using a single disparity field. These are the main issues we address in this paper.
As for occlusion detection, it usually exploits one of several constraints. An ordering constraint preserves pixel order on corresponding rows of left and right images [38] but cannot handle thin foreground objects or narrow holes. A uniqueness constraint assures a one-to-one mapping of pixels on corresponding rows [39]. In one implementation, it relies on the geometry of disparity fields; a significant difference between forward (e.g., left-to-right) and backward (e.g., right-to-left) disparity vectors is indicative of occlusions [40]. This constraint can also be thought of as a geometric constraint as it relies on the analysis of disparity field geometry. Some other geometric constraints assume that disparity varies smoothly everywhere except at object boundaries (continuity constraint) [39], or that occlusion areas exhibit an excessive disparity gradient [41]. Yet another geometric constraint seeks uncovered pixels in I_R by inspecting an irregular grid of forward disparity-compensated pixels of image I_L. This constraint has been shown to be very effective and noise resilient in occlusion detection [42]. A related, although weaker, visibility constraint [43] also assures consistency of uncovered pixels in one image with the disparity of the other image, but it permits many-to-one matches in visible areas. Finally, a photometric constraint (or constant-brightness constraint [29]) ensures an intensity match in visible areas. It is the simplest indicator of occlusions but is prone to errors in the presence of image noise and illumination changes. Methods based on multiple views compare intensity consistency along a path formed by displacement vectors in 3 or more frames [44–46]. Graph cuts have also been used in multiview occlusion detection [47].
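As an illustration of the geometric/uniqueness-type constraint mentioned above, the sketch below flags a left-image pixel as occluded when its left-to-right disparity is inconsistent with the right-to-left disparity at the matched position; the threshold and function names are illustrative assumptions, not taken from the cited work.

```python
import numpy as np

def lr_consistency_occlusions(d_lr, d_rl, tau=1.0):
    """Flag pixels of the left image as occluded when the left-to-right disparity
    d_lr and the right-to-left disparity d_rl (both horizontal, in pixels)
    disagree: |d_lr(x) + d_rl(x + d_lr(x))| > tau."""
    H, W = d_lr.shape
    occluded = np.zeros((H, W), dtype=bool)
    for y in range(H):
        for x in range(W):
            xr = int(round(x + d_lr[y, x]))          # matched column in the right image
            if 0 <= xr < W:
                occluded[y, x] = abs(d_lr[y, x] + d_rl[y, xr]) > tau
            else:
                occluded[y, x] = True                # match falls outside the image
    return occluded
```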
3 OCCLUSION-AWARE BACKWARD-PROJECTION VIEW INTERPOLATION
In backward-projection methods, disparities estimated around occlusion areas are erroneous since no underlying image gradients are available. The lack of an image gradient prevents the use of edge-preserving (anisotropic) diffusion. Below, we argue that by using a coarse estimate of the intermediate image, the fidelity of the disparity field can be significantly improved around occlusion areas. With this capacity to compute more accurate disparities, we then propose a new approach to occlusion-aware backward-projection view interpolation.
3.1 Edge-preserving disparity regularization using a coarse intermediate image
Edge-preserving (anisotropic) regularization preserves disparity edges better than isotropic diffusion [35, 48, 49] but requires an image gradient to guide the diffusion process. Since in backward-projection methods the disparity is defined on the sampling grid of the unknown image J, no such gradient is available. However, it turns out that the simple backward-projection view interpolation described in Section 2.1.2 produces intermediate views with reliable edge information despite distorted texture in occlusion areas. Although this may seem counterintuitive, the reason is that visible edges are easily matched and thus are prominent in the interpolated view. Figure 3 shows an experimental result proving this point. Images shown in Figures 3(a) and 3(c) are the input left and right images, and the one in Figure 3(b)
is the true intermediate image. The disparity, estimated using simple isotropic regularization,

$$\arg\min_{\mathbf{d}(\mathbf{x})} \int_{\mathbf{x}\in\Omega_J} \Big(I_L\big(\mathbf{x} - \alpha\mathbf{d}(\mathbf{x})\big) - I_R\big(\mathbf{x} + (1-\alpha)\mathbf{d}(\mathbf{x})\big)\Big)^2 + \lambda\big(\|\nabla u\|^2 + \|\nabla v\|^2\big)\, d\mathbf{x}, \tag{5}$$

where Ω_J is the domain of J, d(x) = [u(x) v(x)]^T, and ∇ is the gradient operator, is shown in Figure 3(d). Clearly,
it is excessively smooth. Figure 3(e) shows an intermediate image computed by using this disparity in (4). Although there are significant texture errors (as is clear from Figure 3(f)), edge maps, obtained using the Canny edge detector, are very similar for the true and reconstructed intermediate images (see Figures 3(g) and 3(h)).
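For concreteness, the sketch below minimizes a discrete analogue of (5) by explicit gradient descent, restricted to a horizontal disparity u on a rectified pair; the step size and λ follow the values quoted in Section 4, but the iteration count, boundary handling, and helper names are illustrative assumptions rather than the authors' solver.

```python
import numpy as np

def sample_row(I, y, xs):
    """Linearly interpolate row y of image I at float positions xs."""
    xs = np.clip(xs, 0, I.shape[1] - 1)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, I.shape[1] - 1)
    w = xs - x0
    return (1 - w) * I[y, x0] + w * I[y, x1]

def isotropic_disparity(I_L, I_R, alpha, lam=2000.0, dt=1.5e-5, n_iter=2000):
    """Explicit gradient descent on a discrete analogue of (5) for a horizontal
    disparity field u pivoted at the intermediate view (illustrative sketch)."""
    H, W = I_L.shape
    u = np.zeros((H, W))
    gL = np.gradient(I_L, axis=1)           # horizontal image derivatives
    gR = np.gradient(I_R, axis=1)
    x = np.arange(W)
    for _ in range(n_iter):
        # Laplacian of u (periodic boundaries for brevity); the gradient of the
        # smoothness term lambda*|grad u|^2 is -2*lambda*laplacian(u).
        lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
               np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4.0 * u)
        grad = np.empty_like(u)
        for y in range(H):
            xl = x - alpha * u[y]
            xr = x + (1 - alpha) * u[y]
            r = sample_row(I_L, y, xl) - sample_row(I_R, y, xr)   # data residual
            dr_du = (-alpha * sample_row(gL, y, xl)
                     - (1 - alpha) * sample_row(gR, y, xr))
            grad[y] = 2.0 * r * dr_du - 2.0 * lam * lap[y]
        u -= dt * grad                       # explicit update step
    return u
```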
Therefore, we propose to use a coarse intermediate image J_c, computed using isotropically-diffused disparities (5), to guide edge-preserving regularization as follows:

$$\arg\min_{\mathbf{d}(\mathbf{x})} \int_{\mathbf{x}\in\Omega_J} \Big(I_L\big(\mathbf{x} - \alpha\mathbf{d}(\mathbf{x})\big) - I_R\big(\mathbf{x} + (1-\alpha)\mathbf{d}(\mathbf{x})\big)\Big)^2 + \lambda\big(F_{\mathbf{x}}(u, J_c) + F_{\mathbf{x}}(v, J_c)\big)\, d\mathbf{x}. \tag{6}$$

Above, F_x(·) assures anisotropic regularization [50] and is defined as follows:

$$F_{\mathbf{x}}(u, J_c) = \nabla^T u(\mathbf{x}) \begin{bmatrix} g\big(J_c^x(\mathbf{x})\big) & 0 \\ 0 & g\big(J_c^y(\mathbf{x})\big) \end{bmatrix} \nabla u(\mathbf{x}), \tag{7}$$

where g(·) is a monotonically decreasing function, and J_c^x, J_c^y are the horizontal and vertical derivatives of J_c at x. If |J_c^x(x)| = |J_c^y(x)|, then isotropic smoothing takes place ((6) simplifies to (5), except for a different λ). However, if, for example, |J_c^x(x)| ≫ |J_c^y(x)|, then stronger smoothing takes place vertically, and the vertical edge is preserved.
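A small sketch of the diffusion weights in (7): the coarse image J_c supplies the gradients, and a Perona-Malik-style decreasing function is assumed for g, since the paper only requires g to be monotonically decreasing; the contrast parameter kappa is an illustrative choice.

```python
import numpy as np

def diffusion_weights(J_c, kappa=10.0):
    """Edge-stopping weights g(J_c^x), g(J_c^y) of eq. (7) computed from the
    coarse intermediate image J_c; a Perona-Malik-like g is assumed here."""
    def g(s):                                # monotonically decreasing function
        return 1.0 / (1.0 + (s / kappa) ** 2)
    Jx = np.gradient(J_c, axis=1)            # horizontal derivative of J_c
    Jy = np.gradient(J_c, axis=0)            # vertical derivative of J_c
    return g(np.abs(Jx)), g(np.abs(Jy))

def anisotropic_penalty(u, gx, gy):
    """Discrete analogue of F_x(u, J_c) = g(J_c^x) u_x^2 + g(J_c^y) u_y^2,
    summed over the whole field."""
    ux = np.gradient(u, axis=1)
    uy = np.gradient(u, axis=0)
    return np.sum(gx * ux ** 2 + gy * uy ** 2)
```

Across a vertical edge of J_c, |J_c^x| is large, so gx is small and the disparity is allowed to change sharply in the horizontal direction while remaining smooth along the edge.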
The disparity field shown in Figure 3(i) was computed using formulation (6). It is clear that the object shape is very well preserved. The intermediate view obtained using this disparity field in backward projection (4) and its interpolation error are shown in Figures 3(j) and 3(k), respectively. As is clear from the error images, distortions along the horizontal boundaries of the square are suppressed compared to Figure 3(f) because the excessive smoothness of the disparity field is eliminated. Although these are nonoccluding boundaries, they were assigned incorrect disparities due to isotropic regularization (5). Edge-preserving regularization (6) corrected the problem, and these areas are now assigned accurate disparities. Consequently, the intermediate image is properly reconstructed there. Significant errors, however, persist in occlusion areas (the vertical boundaries in Figure 3(k)). This is a consequence of the occlusion-unaware nature of the algorithm; a point is visible only in one of the images, but reconstruction based on backward projection (4) averages intensities from both images. Therefore, next we propose to use additional images to solve for occlusion areas.
3.2 Backward-projection view interpolation using multiple images
In order to improve reconstruction in occlusion areas, we first need to estimate their locations, and then figure out what intensities belong there. Without loss of generality, let us consider four input images as shown in Figure 4. Although this is a simple scenario, it does convey the main idea we intend to pursue. While the top row shows images containing a black square against a background containing areas A and B, the bottom row shows their horizontal cross-sections (rows of pixels). The goal is to reconstruct the intermediate image J using input images I1, I2, I3, and I4. Note that areas A and B are being occluded/exposed between the four images.
In occlusion-unaware interpolation (4), I2 and I3 would be the input images, and a disparity field defined on J would be estimated. For most points in J, it is possible to estimate accurate disparities because the corresponding points are visible in both I2 and I3. However, areas A and B are occluded in either I2 or I3, and it is not possible to estimate disparities there. Note that areas A and B are visible in additional images to the left of I2 and to the right of I3. Thus, it should be possible to estimate disparities in area A using I1 and I2, and disparities in area B using I3 and I4. Therefore, a formulation is needed to estimate the disparities of J by choosing between three image pairs: (I1, I2), (I3, I4), or (I2, I3).
In order to implement switching between image pairs, one first needs to identify areas A and B. We propose to use a method that we developed earlier [42]. Given a disparity field between two images, this method identifies areas that will be exposed between the images; such areas are equivalent to occluded areas when the target and reference images are interchanged. The method is based on the fact that pixels in the target image that did not exist in the reference image (i.e., newly-exposed pixels) have no relationship to the reference image and, as such, cannot be pointed to by disparity vectors. Thus, when pixels of the reference image are forward disparity compensated onto the target image, these areas remain empty and can be easily detected. Since we need to identify areas that disappear to the left and to the right of J, we must estimate two disparity fields: d12 defined in I1 and pointing to I2, and d43 defined in I4 and pointing to I3. We use formulation (6) with I1 (I4) used for edge-preserving regularization when computing d12 (d43). Our occlusion detection method [42] yields area B by using (1+α)d12. The coefficient (1+α) is needed to normalize the disparity field so that it is correctly mapped onto J (see Figure 4). The estimated area B is exposed between I1 and J and, therefore, visible in I3 and I4. Similarly, using (2−α)d43 yields area A, which is visible in I1 and I2. Let L(x) be a visibility label at location x in J that we wish to estimate. Clearly, by using d12 and d43, we can label all points in J as visible in I1 and I2 only (L(x) = −1), visible in I3 and I4 only (L(x) = 1), or visible in I2 and I3 (L(x) = 0). (The actual label values have no importance; other values, such as 1, 2, and 3, could have been chosen.)
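The sketch below illustrates, under simplifying assumptions (horizontal disparities, single-pixel splatting, no cleanup of isolated false detections), the hole-based idea behind [42] as used here: positions of J that no normalized forward vector lands on are marked as newly exposed, and the two exposure masks are combined into the label field L; the function names are hypothetical.

```python
import numpy as np

def exposed_mask(d_fwd, scale):
    """Mark positions of J that no forward-compensated pixel lands on.
    d_fwd: horizontal disparity defined in the reference image (I1 or I4);
    scale: (1 + alpha) for d12 or (2 - alpha) for d43, mapping onto J."""
    H, W = d_fwd.shape
    hit = np.zeros((H, W), dtype=bool)
    for y in range(H):
        for x in range(W):
            xt = int(round(x + scale * d_fwd[y, x]))   # landing column in J
            if 0 <= xt < W:
                hit[y, xt] = True
    return ~hit                                        # holes = newly exposed areas

def visibility_labels(d12, d43, alpha):
    """Label field L on J: -1 -> visible only in (I1, I2), +1 -> only in (I3, I4),
    0 -> visible in (I2, I3)."""
    B = exposed_mask(d12, 1.0 + alpha)   # area B: detected from d12, visible in (I3, I4)
    A = exposed_mask(d43, 2.0 - alpha)   # area A: detected from d43, visible in (I1, I2)
    L = np.zeros(d12.shape, dtype=int)
    L[A] = -1                            # use pair (I1, I2)
    L[B] = 1                             # use pair (I3, I4)
    return L
```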
Figure 3: Results of backward-projection view interpolation for synthetic sequence no. 1 with horizontal disparity: (a) I_L, (b) ground-truth J, (c) I_R, (d) disparity estimated using isotropic diffusion (5), (e) J interpolated using (4) with disparity from (d), (f) interpolation error for J from (e), (g) edge map of ground-truth image J, (h) edge map of interpolated image J, (i) disparity estimated using anisotropic diffusion (6) with J_c from (e), (j) J interpolated using (4) with disparity from (i), (k) interpolation error for J from (j). See Table 1 for PSNR values of the interpolation error.

With the visibility of points in J identified, we can now reliably compute each point's disparity from a suitable pair of images and also prevent oversmoothing via edge-preserving regularization. We first define matching errors for image pairs (I1, I2), (I2, I3), and (I3, I4) as follows:
$$\begin{aligned} \theta_{12}(\mathbf{x}) &= I_1\big(\mathbf{x} - (1+\alpha)\mathbf{d}(\mathbf{x})\big) - I_2\big(\mathbf{x} - \alpha\mathbf{d}(\mathbf{x})\big),\\ \theta_{23}(\mathbf{x}) &= I_2\big(\mathbf{x} - \alpha\mathbf{d}(\mathbf{x})\big) - I_3\big(\mathbf{x} + (1-\alpha)\mathbf{d}(\mathbf{x})\big),\\ \theta_{34}(\mathbf{x}) &= I_3\big(\mathbf{x} + (1-\alpha)\mathbf{d}(\mathbf{x})\big) - I_4\big(\mathbf{x} + (2-\alpha)\mathbf{d}(\mathbf{x})\big). \end{aligned} \tag{8}$$
The coefficients (1−α), (1+α), and (2−α) adjust disparity vectors depending on the distance to J. For locations x ∈ Ω_J outside of A and B, all three errors yield small magnitudes. However, in occlusion areas only one of them will have a small magnitude. For example, for x in area A, θ12(x) will have a small magnitude, whereas for x in area B the magnitude of θ34(x) will be small.
In order to estimate disparities either bidirectionally (visible pixels) or unidirectionally (occlusion areas), we propose the following variational formulation that controls intensity matching using labels L under edge-preserving regularization:
$$\min_{\mathbf{d}(\mathbf{x})} \int_{\mathbf{x}\in\Omega_J} e_P(\mathbf{x}) + \lambda\, e_S(\mathbf{x})\, d\mathbf{x} \tag{9}$$

with

$$e_P(\mathbf{x}) = P_{12}(\mathbf{x}) + P_{23}(\mathbf{x}) + P_{34}(\mathbf{x}), \qquad e_S(\mathbf{x}) = F_{\mathbf{x}}(u, J_c) + F_{\mathbf{x}}(v, J_c), \tag{10}$$

$$\begin{aligned} P_{12}(\mathbf{x}) &= \delta\big(L(\mathbf{x}) + 1\big)\, \theta_{12}(\mathbf{x})^2,\\ P_{23}(\mathbf{x}) &= \delta\big(L(\mathbf{x})\big)\, \theta_{23}(\mathbf{x})^2,\\ P_{34}(\mathbf{x}) &= \delta\big(L(\mathbf{x}) - 1\big)\, \theta_{34}(\mathbf{x})^2, \end{aligned} \tag{11}$$
where F_x is defined in (7), J_c is a coarse intermediate image reconstructed using disparity estimation with isotropic regularization, as proposed in Section 3.1, and δ(·) is the Kronecker delta function. Clearly, e_P adaptively selects different pairs of input images depending on L.
Figure 4: Illustration of how to use four images in backward-projection intermediate view interpolation. Areas A and B can be estimated in J using (I1, I2) and (I3, I4), respectively, while points outside of A or B can be estimated using (I2, I3).
For example, if L(x) = −1, then P12(x) is used. Since the Kronecker delta δ(·) is not differentiable, we use an approximation such as δ(x) = lim_{k→∞} e^{−kx²} (k = 10^10 gives a good approximation). The derivation of the Euler-Lagrange equations for the above variational formulation is included in the appendix.
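To make the label-dependent selection explicit, here is a per-pixel sketch of e_P from (8)-(11) using the smoothed delta; the 1D linear sampler and function names are illustrative, and only horizontal disparities are considered.

```python
import numpy as np

def interp1(row, xf):
    """Linear interpolation of a 1D intensity row at float position xf."""
    xf = min(max(xf, 0.0), len(row) - 1.0)
    x0 = int(np.floor(xf))
    x1 = min(x0 + 1, len(row) - 1)
    w = xf - x0
    return (1 - w) * row[x0] + w * row[x1]

def smooth_delta(t, k=1e10):
    """Differentiable stand-in for the Kronecker delta: delta(t) ~ exp(-k t^2)."""
    return float(np.exp(-k * t * t))

def data_term(I1, I2, I3, I4, y, x, d, L, alpha):
    """Occlusion-adaptive data term e_P at pixel (y, x) following (8)-(11):
    the visibility label L in {-1, 0, 1} selects which image pair is matched."""
    theta12 = interp1(I1[y], x - (1 + alpha) * d) - interp1(I2[y], x - alpha * d)
    theta23 = interp1(I2[y], x - alpha * d) - interp1(I3[y], x + (1 - alpha) * d)
    theta34 = interp1(I3[y], x + (1 - alpha) * d) - interp1(I4[y], x + (2 - alpha) * d)
    return (smooth_delta(L + 1) * theta12 ** 2 +
            smooth_delta(L) * theta23 ** 2 +
            smooth_delta(L - 1) * theta34 ** 2)
```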
Once the disparity field has been estimated, it is possible to reconstruct J by using any intensity value along the disparity vector, but averaging leads to better results (noise suppression). We propose to reconstruct the intermediate view J as follows:
$$J(\mathbf{x}) = \delta\big(L(\mathbf{x}) + 1\big)\, \xi_{12} + \delta\big(L(\mathbf{x})\big)\, \xi_{23} + \delta\big(L(\mathbf{x}) - 1\big)\, \xi_{34}, \quad \forall \mathbf{x} \in \Omega_J, \tag{12}$$

where the ξ are intensity averages along the disparity vector d(x), defined as follows:

$$\begin{aligned} \xi_{12} &= \tfrac{1}{2}\Big(I_1\big(\mathbf{x} - (1+\alpha)\mathbf{d}(\mathbf{x})\big) + I_2\big(\mathbf{x} - \alpha\mathbf{d}(\mathbf{x})\big)\Big),\\ \xi_{23} &= \tfrac{1}{2}\Big(I_2\big(\mathbf{x} - \alpha\mathbf{d}(\mathbf{x})\big) + I_3\big(\mathbf{x} + (1-\alpha)\mathbf{d}(\mathbf{x})\big)\Big),\\ \xi_{34} &= \tfrac{1}{2}\Big(I_3\big(\mathbf{x} + (1-\alpha)\mathbf{d}(\mathbf{x})\big) + I_4\big(\mathbf{x} + (2-\alpha)\mathbf{d}(\mathbf{x})\big)\Big). \end{aligned} \tag{13}$$
Note that at every x, only one of the values in (13) contributes to J(x) in (12) because of the δ(·) terms.
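Finally, a sketch of the occlusion-adaptive reconstruction (12)-(13) under the same simplifying assumptions (horizontal disparities, illustrative function names); the visibility label selects which image pair is averaged at each pixel.

```python
import numpy as np

def interp1(row, xf):
    """Linear interpolation of a 1D intensity row at float position xf."""
    xf = min(max(xf, 0.0), len(row) - 1.0)
    x0 = int(np.floor(xf))
    x1 = min(x0 + 1, len(row) - 1)
    w = xf - x0
    return (1 - w) * row[x0] + w * row[x1]

def interpolate_view_4(I1, I2, I3, I4, d, L, alpha):
    """Occlusion-adaptive reconstruction of J following (12)-(13): each pixel is
    the average of the two intensities from the image pair selected by L(x)."""
    H, W = I2.shape
    J = np.empty((H, W))
    for y in range(H):
        for x in range(W):
            dx = d[y, x]
            if L[y, x] == -1:        # area A: visible only in (I1, I2)
                J[y, x] = 0.5 * (interp1(I1[y], x - (1 + alpha) * dx) +
                                 interp1(I2[y], x - alpha * dx))
            elif L[y, x] == 1:       # area B: visible only in (I3, I4)
                J[y, x] = 0.5 * (interp1(I3[y], x + (1 - alpha) * dx) +
                                 interp1(I4[y], x + (2 - alpha) * dx))
            else:                    # visible in both central images (I2, I3)
                J[y, x] = 0.5 * (interp1(I2[y], x - alpha * dx) +
                                 interp1(I3[y], x + (1 - alpha) * dx))
    return J
```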
4 EXPERIMENTAL RESULTS
We solve the partial differential equations derived in the appendix using explicit discretization with a small time step dt = 1.5 × 10^−5 and 11 × 10^3 iterations. We employ a 4-level hierarchical implementation in order to avoid local minima, and bicubic interpolation to estimate subpixel intensities. In all experimental results shown in the paper, we use λ = 2000. Compared to the disparity estimation step (9), the final view interpolation (12) is very simple and requires little computation.
In order to gauge the gains due to the use of 4 images, we have compared the proposed algorithm with view interpolation based on 2-image backward projection with isotropic as well as edge-preserving regularization of disparities (see Table 1), and with an equivalent forward-projection reconstruction using the same 4 images [9]. The latter method uses occlusion-aware edge-preserving estimation of 3 disparity fields (from (I1, I2), (I2, I3), and (I3, I4)), followed by occlusion detection and spline-based image reconstruction. A listing of the tested algorithms along with the corresponding objective metrics (PSNR of the interpolation error, i.e., the difference between the ground-truth and computed intermediate images) can be found in Table 1.
In the first test, we generated two additional images for the synthetic test sequence shown in Figure 3. The four input images are shown in Figures 5(a)–5(d), and the ground-truth disparity, intermediate image, and label map are shown in Figures 5(e)–5(g). A label field L estimated using the method proposed in [42] is shown in Figure 5(h). In all label fields in this paper, black is used to denote L(x) = −1, that is, a point visible in, and interpolated from, (I1, I2). Similarly, gray is used to denote L(x) = 0 and thus interpolation from (I2, I3), while white is used to denote L(x) = 1 and interpolation from (I3, I4). Although there are false positives at the top and bottom boundaries of the square, since these areas are visible in all images, they can be predicted from any pair and do not contribute to the interpolation error.
Results for the 4-image occlusion-aware forward and backward projection are shown in the first row and the second row of Figure 6, respectively. While the disparity field from Figure 6(a) (one of 3 disparity fields estimated in forward projection) was estimated using one of the original images to guide edge-preserving regularization and implicit occlusion detection to prevent intensity mismatches, the disparity shown in Figure 6(d) was estimated using a coarse image J_c and occlusion labels from Figure 5(h). In comparison with the disparity from Figure 3(i), computed from two images using edge-preserving regularization, the improvement in occlusion areas is clear in both 4-image results. Although it is difficult to judge the estimated intermediate images J, the interpolation errors in Figures 6(c) and 6(f) are clearly smaller than those in Figures 3(f) and 3(k). This is confirmed by the numerical results in Table 1, with the 4-image backward projection outperforming the 4-image forward projection by over 1 dB. Interestingly, the proposed edge-preserving regularization using a coarse intermediate image offers over 2 dB improvement over isotropic regularization, both using two images.
In order to verify this performance, we have prepared another synthetic sequence with more complex occlusions, in which the objects move by different amounts between each pair of views, therefore occluding both the background and each other. The original input images I1–I4 are shown in Figure 7 along with the ground truth: disparity, intermediate image, and label map. Also, a visibility label map estimated using the method proposed in [42] is shown in Figure 7(h).
Figure 5: Extended synthetic sequence no. 1: (a)–(d) I1–I4; ground-truth (e) disparity, (f) intermediate image, and (g) label map; (h) estimated label map (black, gray, and white indicate the (I1, I2), (I2, I3), and (I3, I4) image pairs to be used, resp.).
Figure 6: Comparison of view interpolation methods for the synthetic sequence from Figure 5 (disparity, interpolated view, and interpolation error are shown): (a)–(c) 4-image occlusion-aware forward projection, (d)–(f) 4-image occlusion-aware backward projection. See Table 1 for algorithm description and PSNR values.
Figure 8 shows the estimated disparity, interpolated intermediate image, and interpolation error for the 4 methods described in Table 1. From the error images and PSNR values, it is clear that the method proposed here outperforms the 2-image backward-projection methods and also the 4-image forward-projection method.
Visually, the two 4-image methods stand out; the estimated disparity fields are the most accurate, and the computed intermediate images carry little error. Numerically, however, the proposed method has a clear edge over the 4-image forward projection (1.6 dB gain). The somewhat inferior performance of the forward-projection method stems from the fact that if a single intensity is projected to an incorrect location due to an erroneous disparity estimate, it will affect neighboring pixels during the irregular-to-regular conversion using splines.
Figure 7: Synthetic sequence no. 2 with horizontal disparity: (a)–(d) I1–I4; ground-truth (e) disparity, (f) intermediate image, and (g) label map; (h) estimated label map.
Table 1: Description of the four view interpolation methods tested and PSNR values [dB] of the corresponding interpolation error for the synthetic test sequences from Figures 5 and 7, and the natural sequence from Figure 10.

2-image isotropic BP: backward projection (BP) using 2 images (I2, I3); isotropic disparity regularization (5); linear interpolation (4).

2-image edge-preserving BP: backward projection using 2 images (I2, I3); edge-preserving disparity regularization (6); linear interpolation (4).

4-image occlusion-aware FP: forward projection (FP) using 4 images (I1, I2, I3, I4); occlusion-aware edge-preserving disparity regularization [10]; spline-based reconstruction [9].

4-image occlusion-aware BP: backward projection using 4 images (I1, I2, I3, I4); occlusion-aware edge-preserving disparity regularization (9); occlusion-aware linear interpolation (12).
The reason is that spline-based reconstruction is performed globally; every pixel contributes to the reconstruction of all other pixels. This is not the case for backward projection, where neighboring interpolations are solved independently (except for disparity estimation). Although there are some artifacts around edges in the proposed approach, they are isolated, as opposed to spline-based reconstruction.
This test sequence, however, has revealed one weakness of the proposed method. As can be noticed in Figure 8(j), the disparity to the right of the objects is distorted. This is due to the weak gradient between the object and the background. Since edge-preserving regularization fails in this case, the disparity of the object leaks into the background. This is a common problem in edge-preserving regularization. Nevertheless, the proposed method improves the results for this synthetic sequence by 2.5 dB in comparison with 2-image backward projection with isotropic disparities.
We also tested the proposed method on natural images. We used four frames (nos. 10, 16, 22, 28) of the Flowergarden sequence to reconstruct frame no. 19. The four original images are shown in Figures 9(a)–9(d). Note how the tree trunk occludes the house in the background. The disparity estimated using backward projection with isotropic regularization based on 2 images (nos. 16 and 22) is shown in Figure 9(e). Note how smooth this disparity field is.
Figure 8: Comparison of view interpolation methods for the synthetic sequence from Figure 7 (disparity, interpolated view, and interpolation error are shown): (a)–(c) 2-image isotropic backward projection, (d)–(f) 2-image edge-preserving backward projection, (g)–(i) 4-image occlusion-aware forward projection (only the disparity estimated from I2 to I3 is shown), (j)–(l) 4-image occlusion-aware backward projection. See Table 1 for algorithm description and PSNR values.
The interpolation of image no. 19 using this disparity field and images no. 16 and no. 22 is shown in Figure 9(f). Note that occlusion areas are poorly reconstructed; the texture around the tree trunk is highly distorted, especially on the flowerbed, house walls, and roof (see the closeup in Figure 9(k)).
A label field estimated using the method proposed in [42] is shown in Figure 9(g), while a disparity estimated using 4-image occlusion-aware edge-preserving regularization is shown in Figure 9(h). Compared to the 2-image isotropic result from Figure 9(e), the new disparity exhibits sharp tree-trunk boundaries. The interpolated intermediate view is shown in Figure 9(i). Since the input sequence is actually a video sequence, we can compare the reconstructed view to the original frame no. 19. Closeups of the original frame no. 19 and of both reconstructions are shown in Figures 9(j)–9(l). The texture of the flowerbed is not smeared in the new reconstruction and is very similar to the original frame. Also, the windows of the house cannot be identified in Figure 9(k) as they are severely smeared. However, they are sharp and clear in Figure 9(l). Similarly, tree branches behind the house are distorted in Figure 9(k), but are more accurately reconstructed in Figure 9(l).
Finally, we tested our algorithm on an image from the Middlebury College Vision Group (Midd1 [51], Figure 10). Figure 10 compares the proposed method with the other three methods, while PSNR values are presented in Table 1. Compared to the isotropic case, the 2-image edge-preserving regularization sharpens the disparity field which, in turn, leads to a 1 dB gain in PSNR. However, occlusions are still not handled well; the closeup of the occlusion area shows severe artifacts. Since forward projection with spline-based interpolation accounts for occlusions, we see an increase in PSNR as well as proper reconstruction of texture in occlusion areas. The proposed method adds another 0.5 dB to the PSNR and produces an intermediate image very close to the original closeup.
5 DISCUSSION
The focus of this work was on severely undersampled 3D data sets with unknown scene structure, and, more specifically, on