
DOCUMENT INFORMATION

Basic information

Title: Emerging Technologies for 3D Video
Authors: Frederic Dufaux, Beatrice Pesquet-Popescu, Marco Cagnazzo
Institution: Telecom ParisTech
Field: Digital Video
Category: Editions
Year of publication: 2013
Country: France
Format
Pages: 518
Size: 16.71 MB


EMERGING TECHNOLOGIES FOR 3D VIDEO

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. It is sold on the understanding that the publisher is not engaged in rendering professional services and neither the publisher nor the author shall be liable for damages arising herefrom. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging-in-Publication Data

Emerging technologies for 3D video : creation, coding, transmission, and rendering / Frederic Dufaux, Beatrice Pesquet-Popescu, Marco Cagnazzo.

pages cm

Includes bibliographical references and index.

ISBN 978-1-118-35511-4 (cloth)

1. 3-D video–Standards. 2. Digital video–Standards. I. Dufaux, Frederic, 1967- editor of compilation. II. Pesquet-Popescu, Beatrice, editor of compilation. III. Cagnazzo, Marco, editor of compilation. IV. Title: Emerging technologies for three dimensional video.

Seungkyu Lee

Ahmed Kirmani, Andrea Colaço, and Vivek K. Goyal

2.2.1 Light Fields, Reflectance Distribution Functions, and Optical

2.2.3 Synthetic Aperture Radar for Estimating Scene Reflectance 20


Carlos Vazquez, Liang Zhang, Filippo Speranza, Nils Plath, and Sebastian Knorr

Mounir Kaaniche, Raffaele Gaetano, Marco Cagnazzo,

and Beatrice Pesquet-Popescu


5.2 Geometrical Models for Stereoscopic Imaging 82

5.3.4 Fundamental Steps Involved in Stereo Matching Algorithms 89

Marco Cagnazzo, Beatrice Pesquet-Popescu, and Frederic Dufaux

Elie Gabriel Mora, Giuseppe Valenzise, Joël Jung, Beatrice Pesquet-Popescu,

Marco Cagnazzo, and Frederic Dufaux

7.3.1 Tools that Exploit the Inherent Characteristics of Depth Maps 123
7.3.2 Tools that Exploit the Correlations with the Associated Texture 127
7.3.3 Tools that Optimize Depth Map Coding for the Quality

7.4 Application Example: Depth Map Coding Using “Don’t Care” Regions 132

7.4.2 Transform Domain Sparsification Using “Don’t Care” Regions 134


8.1.3 Chapter Organization 141

Ngai-Man Cheung and Gene Cheung

9.2.1 Challenges: Coding Efficiency and Navigation Flexibility 163


9.4 Interactive Multiview Video Streaming 172

C. Göktuğ Gürler and A. Murat Tekalp

10.2.2 Sender-Driven versus Receiver-Driven P2P Video Streaming 189

10.3.6 Multiple Requests from Multiple Peers of a Single Chunk 196

10.4.2 Rate Adaptation in Stereo Streaming: Asymmetric Coding 197
10.4.3 Use Cases: Stereoscopic Video Streaming over P2P Network 200

Oliver Wang, Manuel Lang, Nikolce Stefanoski, Alexander Sorkine-Hornung,

Olga Sorkine-Hornung, Aljoscha Smolic, and Markus Gross


11.6.2 Position Constraints 219

11.6.4 Three-Dimensional Video Transmission Systems

Christopher Gilliam, Mike Brookes, and Pier Luigi Dragotti

12.4.1 Adaptive Sampling Based on Plenoptic Spectral Analysis 244

13 A Framework for Image-Based Stereoscopic View Synthesis from

Felix Klose, Christian Lipski, and Marcus Magnor


13.2 Estimating Dense Image Correspondences 258

13.4.4 Concluding with the “Who Cares?” Post-Production Pipeline 268

Janusz Konrad

14.3.1 Ghosting Suppression for Polarized and Shuttered

15.2.1 Three-Dimensional Cinema Projectors Based on Light

15.2.2 Three-Dimensional Cinema Projectors Based on Shutters 299
15.2.3 Three-Dimensional Cinema Projectors Based on

16.3.2 Applications of the Ultra-High-Resolution Video System 322

16.4.1 Geometrical Relationship of Subject and Spatial Image 325

16.5.2 Elemental Image Generation from 3D Object Information 331

Peter Tamas Kovacs and Tibor Balogh


17.5 The Perfect 3D Display 345

Simon J. Watt and Kevin J. MacKenzie

18.4.3 Photometric Differences Between Left- and Right-Eye Images 355
18.4.4 Camera Misalignments and Differences in Camera Optics 356

18.6 Motion Artefacts from Field-Sequential Stereoscopic Presentation 357

18.6.3 Distortions in Perceived Depth from Binocular Disparity 360

18.8.2 Accommodation and Vergence in the Real World and in S3D 367

Philippe Hanhart, Francesca De Simone, Martin Rerabek,

and Touradj Ebrahimi


19.4.5 Open Issues 388

Jean-Charles Bazin, Olivier Saurer, Friedrich Fraundorfer, and Marc Pollefeys


21 View Selection 416
Fahad Daniyal and Andrea Cavallaro

Arnaud Bourge and Alain Bellon


23.4 DIBR Graphics Synthesis for Multiview Displays 458

24.2 Review of Disparity Estimation Algorithms and Implementations 469

The underlying principles of stereopsis have been known for a long time. Stereoscopes to see photographs in 3D appeared and became popular in the nineteenth century. The first demonstrations of 3D movies took place in the first half of the twentieth century, initially using anaglyph glasses, and then with polarization-based projection. Hollywood experienced a first short-lived golden era of 3D movies in the 1950s. In the last 10 years, 3D has regained significant interest and 3D movies are becoming ubiquitous. Numerous major productions are now released in 3D, culminating with Avatar, the highest grossing film of all time.

In parallel with the recent growth of 3D movies, 3DTV is attracting significant interest from manufacturers and service providers, as is evident from the multiplication of new 3D product announcements and services. Beyond entertainment, 3D imaging technology is also seen as instrumental in other application areas such as video games, immersive video conferences, medicine, video surveillance, and engineering.

With this growing interest, 3D video is often considered one of the major upcoming innovations in video technology, with the expectation of greatly enhanced user experience. This book intends to provide an overview of key technologies for 3D video applications. More specifically, it covers the state of the art and explores new research directions, with the objective of tackling all aspects involved in 3D video systems and services. Topics addressed include content acquisition and creation, data representation and coding, transmission, view synthesis, rendering, display technologies, human perception of depth, and quality assessment. Relevant standardization efforts are reviewed. Finally, applications and implementation issues are also described.

More specifically, the book is composed of six parts. Part One addresses different aspects of 3D content acquisition and creation. In Chapter 1, Lee presents depth cameras and related applications. The principle of active depth sensing is reviewed, along with depth image processing methods such as noise modelling, upsampling, and removing motion blur. In Chapter 2, Kirmani, Colaço, and Goyal introduce the space-from-time imaging framework, which achieves spatial resolution, in two and three dimensions, by measuring temporal variations of light intensity in response to temporally or spatiotemporally varying illumination. Chapter 3, by Vazquez, Zhang, Speranza, Plath, and Knorr, provides an overview of the process of generating a stereoscopic video (S3D) from a monoscopic video source (2D), generally known as 2D-to-3D video conversion, with a focus on selected recent techniques. Finally, in Chapter 4, Zone provides an overview of numerous contemporary strategies for shooting narrow and variable interaxial baseline for stereoscopic cinematography. Artistic implications are also discussed.

A key issue in 3D video, Part Two addresses data representation, compression, and transmission. In Chapter 5, Kaaniche, Gaetano, Cagnazzo, and Pesquet-Popescu address the

It is with great sadness that we learned that Ray Zone passed away on November 13, 2012.

strategies and solutions are reviewed. Finally, Gürler and Tekalp propose an adaptive P2P video streaming solution for streaming multiview video over P2P overlays in Chapter 10. Next, Part Three of the book discusses view synthesis and rendering. In Chapter 11, Wang, Lang, Stefanoski, Sorkine-Hornung, Sorkine-Hornung, Smolic, and Gross present image-domain warping as an alternative to depth-image-based rendering techniques. This technique utilizes simpler, image-based deformations as a means for realizing various stereoscopic post-processing operators. Gilliam, Brookes, and Dragotti, in Chapter 12, examine the state of the art in plenoptic sampling theory. In particular, the chapter presents theoretical results for uniform sampling based on spectral analysis of the plenoptic function and algorithms for adaptive plenoptic sampling. Finally, in Chapter 13, Klose, Lipski, and Magnor present a complete end-to-end framework for stereoscopic free viewpoint video creation, allowing one to viewpoint-navigate through space and time of complex real-world, dynamic scenes.

As a very important component of a 3D video system, Part Four focuses on 3D display technologies. In Chapter 14, Konrad addresses digital signal processing methods for 3D data generation, both stereoscopic and multiview, and for compensation of the deficiencies of today’s 3D displays. Numerous experimental results are presented to demonstrate the usefulness of such methods. Borel and Doyen, in Chapter 15, present in detail the main 3D display technologies available for cinemas, for large-display TV sets, and for mobile terminals. A perspective of evolution for the near and long term is also proposed. In Chapter 16, Arai focuses on integral imaging, a 3D photography technique that is based on integral photography, in which information on 3D space is acquired and represented. This chapter describes the technology for displaying 3D space as a spatial image by integral imaging. Finally, in Chapter 17, Kovacs and Balogh present light-field displays, an advanced technique for implementing glasses-free 3D displays.

In most targeted applications, humans are the end-users of 3D video systems. Part Five considers human perception of depth and perceptual quality assessment. More specifically, in Chapter 18, Watt and MacKenzie focus on how the human visual system interacts with stereoscopic 3D media, in view of optimizing effectiveness and viewing comfort. Three main issues are addressed: incorrect spatiotemporal stimuli introduced by field-sequential stereo presentation, inappropriate binocular viewing geometry, and the unnatural relationship between where the eyes fixate and focus in stereoscopic 3D viewing. In turn, in Chapter 19, Hanhart, De Simone, Rerabek, and Ebrahimi consider mechanisms of 3D vision in humans, and their underlying perceptual models, in conjunction with the types of distortions that today’s and tomorrow’s 3D video processing systems produce. This complex puzzle is examined with a focus on how to measure 3D visual quality, as an essential factor in the success of 3D technologies, products, and services.

In order to complete the book, Part Six describes target applications for 3D video, as well as implementation issues. In Chapter 20, Bazin, Saurer, Fraundorfer, and Pollefeys present a semi-automatic method to generate interactive virtual tours from omnidirectional video. It allows a user to virtually navigate through buildings and indoor scenes. Such a system can be applied in various contexts, such as virtual tourism, tele-immersion, tele-presence, and e-heritage. Daniyal and Cavallaro address the question of how to automatically identify which view is more useful when observing a dynamic scene with multiple cameras in Chapter 21. This problem concerns several applications ranging from video production to video surveillance. In particular, an overview of existing approaches for view selection and automated video production is presented. In Chapter 22, Bourge and Bellon present the hardware architecture of a typical mobile platform, and describe major stereoscopic 3D applications. Indeed, smartphones bring new opportunities to stereoscopic 3D, but also specific constraints. Chapter 23, by Le Feuvre and Mathieu, presents an integrated system for displaying interactive applications on multiview screens. Both a simple GPU-based prototype and a low-cost hardware design implemented on a field-programmable gate array are presented. Finally, in Chapter 24, Tseng and Chang propose an optimized disparity estimation algorithm for high-definition 3DTV applications with reduced computational and memory requirements.

By covering general and advanced topics, providing at the same time a broad and deep analysis, the book has the ambition to become a reference for those involved or interested in 3D video systems and services. Assuming fundamental knowledge in image/video processing, as well as a basic understanding of mathematics, this book should be of interest to a broad readership with different backgrounds and expectations, including professors, graduate and undergraduate students, researchers, engineers, practitioners, and managers making technological decisions about 3D video.

Frederic Dufaux

Beatrice Pesquet-Popescu
Marco Cagnazzo

List of Contributors

Jun Arai, NHK (Japan Broadcasting Corporation), Japan

Tibor Balogh, Holografika, Hungary

Jean-Charles Bazin, Computer Vision and Geometry Group, ETH Zürich, Switzerland

Alain Bellon, STMicroelectronics, France

Thierry Borel, Technicolor, France

Arnaud Bourge, STMicroelectronics, France

Mike Brookes, Department of Electrical and Electronic Engineering, Imperial College London, UK

Marco Cagnazzo, Departement Traitement du Signal et des Images, Telecom ParisTech, France

Andrea Cavallaro, Queen Mary University of London, UK

Tian-Sheuan Chang, Department of Electronics Engineering, National Chiao Tung University, Taiwan

Gene Cheung, Digital Content and Media Sciences Research Division, National Institute

of Informatics, Japan

Ngai-Man Cheung, Information Systems Technology and Design Pillar, Singapore University of Technology and Design, Singapore

Andrea Colaço, Media Lab, Massachusetts Institute of Technology, USA

Fahad Daniyal, Queen Mary University of London, UK

Francesca De Simone, Multimedia Signal Processing Group (MMSPG),

Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland

Didier Doyen, Technicolor, France

Pier Luigi Dragotti, Department of Electrical and Electronic Engineering, Imperial College London, UK

Frederic Dufaux, Departement Traitement du Signal et des Images, Telecom ParisTech, France

Technology, USA

Markus Gross, Disney Research Zurich, Switzerland

C. Göktuğ Gürler, College of Engineering, Koç University, Turkey

Philippe Hanhart, Multimedia Signal Processing Group (MMSPG), Ecole Polytechnique

Federale de Lausanne (EPFL), Switzerland

Alexander Sorkine-Hornung, Disney Research Zurich, Switzerland

Joël Jung, Orange Labs, France

Mounir Kaaniche, Departement Traitement du Signal et des Images, Telecom ParisTech, France

Ahmed Kirmani, Research Laboratory of Electronics, Massachusetts Institute of

Technology, USA

Felix Klose, Institut für Computergraphik, TU Braunschweig, Germany

Sebastian Knorr, imcube labs GmbH, Technische Universität Berlin, Germany

Janusz Konrad, Department of Electrical and Computer Engineering, Boston University, USA

Peter Tamas Kovacs, Holografika, Hungary

Manuel Lang, Disney Research Zurich, Switzerland

Seungkyu Lee, Samsung Advanced Institute of Technology, South Korea

Jean Le Feuvre, Departement Traitement du Signal et des Images, Telecom ParisTech, France

Christian Lipski, Institut für Computergraphik, TU Braunschweig, Germany

Kevin J. MacKenzie, Wolfson Centre for Cognitive Neuroscience, School of Psychology, Bangor University, UK

Marcus Magnor, Institut für Computergraphik, TU Braunschweig, Germany

Yves Mathieu, Telecom ParisTech, France

Elie Gabriel Mora, Orange Labs, France; Departement Traitement du Signal et des Images,

Telecom ParisTech, France


Karsten Müller, Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut, Germany

Beatrice Pesquet-Popescu, Departement Traitement du Signal et des Images, Telecom ParisTech, France

Nils Plath, imcube labs GmbH, Technische Universität Berlin, Germany

Marc Pollefeys, Computer Vision and Geometry Group, ETH Zürich, Switzerland

Martin Rerabek, Multimedia Signal Processing Group (MMSPG), Ecole Polytechnique

Federale de Lausanne (EPFL), Switzerland

Olivier Saurer, Computer Vision and Geometry Group, ETH Zürich, Switzerland

Aljoscha Smolic, Disney Research Zurich, Switzerland

Olga Sorkine-Hornung, ETH Zurich, Switzerland

Filippo Speranza, Communications Research Centre Canada (CRC), Canada

Nikolce Stefanoski, Disney Research Zurich, Switzerland

A. Murat Tekalp, College of Engineering, Koç University, Turkey

Yu-Cheng Tseng, Department of Electronics Engineering, National Chiao Tung University, Taiwan

Giuseppe Valenzise, Departement Traitement du Signal et des Images, Telecom ParisTech, France

Carlos Vazquez, Communications Research Centre Canada (CRC), Canada

Anthony Vetro, Mitsubishi Electric Research Labs (MERL), USA

Simon J. Watt, Wolfson Centre for Cognitive Neuroscience, School of Psychology, Bangor University, UK

Oliver Wang, Disney Research Zurich, Switzerland

Liang Zhang, Communications Research Centre Canada (CRC), Canada

Ray Zone, The 3-D Zone, USA

We would like to express our deepest appreciation to all the authors for their invaluable contributions. Without their commitment and efforts, this book would not have been possible.

Moreover, we would like to gratefully acknowledge the John Wiley & Sons Ltd staff, Alex King, Liz Wingett, Richard Davies, and Genna Manaog, for their relentless support throughout this endeavour.

Frederic Dufaux

Beatrice Pesquet-Popescu
Marco Cagnazzo

Part One

Content Creation

3D sensing technologies such as digital holography, interferometry, and integral photography have been studied. However, they show limited performance in 3D geometry and photometry acquisition. Recently, several consumer depth-sensing cameras using near-infrared light have been introduced in the market. They have relatively low spatial resolution compared with color sensors and show limited sensing range and accuracy. Thanks to their affordable prices and the advantage of direct 3D geometry acquisition, many researchers from graphics, computer vision, image processing, and robotics have employed this new modality of data for many applications. In this chapter, we introduce two major depth-sensing principles using active IR signals and state-of-the-art applications.

1.2 Time-of-Flight Depth Camera

In active light sensing technology, if we can measure the flight time of a fixed-wavelength signal emitted from a sensor and reflected from an object surface, we can calculate the

Emerging Technologies for 3D Video: Creation, Coding, Transmission and Rendering, First Edition.

Frederic Dufaux, Beatrice Pesquet-Popescu, and Marco Cagnazzo.

© 2013 John Wiley & Sons, Ltd. Published 2013 by John Wiley & Sons, Ltd.

of the reflected signal compared with the original emitted signal, we can calculate the distance indirectly. Recent ToF depth cameras in the market measure the phase delay of the emitted infrared (IR) signal at each pixel and calculate the distance from the camera.

1.2.1 Principle

In this section, the principle of ToF depth sensing is explained in more detail with simplified examples. Let us assume that we use a sinusoidal IR wave as an active light source. In general, consumer depth cameras use multiple light-emitting diodes (LEDs) to generate a fixed-wavelength IR signal. What we can observe using an existing image sensor is the amount of electrons induced by collected photons during a certain time duration. For color sensors, it is enough to count the amount of induced electrons to capture the luminance or chrominance of the expected bandwidth. However, a single shot of photon collection is not enough for phase delay measurement. Instead, we collect photons multiple times at different time locations, as illustrated in Figure 1.2.

Q1 through Q4 in Figure 1.2 are the amounts of electrons measured at each corresponding time. Reflected IR shows a phase delay proportional to the distance from the camera. Since we have the reference emitted IR and its phase information, the electron amounts at multiple time locations (Q1 through Q4 have a 90° phase difference to each other) can tell us the amount of delay as follows:

where a is the amplitude of the IR signal and f1 through f4 are the normalized amounts of electrons.
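The equation referenced above falls on a page that is not reproduced in this extract. For reference, the widely used four-phase formulation (an assumed standard form, not necessarily the book's exact notation) relates the four samples to the phase delay, amplitude, and radial distance as

\varphi = \arctan\!\left(\frac{Q_3 - Q_4}{Q_1 - Q_2}\right), \qquad a = \frac{1}{2}\sqrt{(Q_1 - Q_2)^2 + (Q_3 - Q_4)^2}, \qquad R = \frac{c}{4\pi f_{\mathrm{mod}}}\,\varphi

where c is the speed of light and f_mod is the IR modulation frequency; the normalized charges f1 through f4 can be substituted for Q1 through Q4.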

In real sensing situations, a perfect sine wave is not possible to produce using the cheap LEDs of consumer depth cameras. Any distortion of the sine wave causes miscalculation of the phase delay. Furthermore, the amount of electrons induced by the reflected IR signal at a certain moment is very noisy due to the limited LED power. In order to increase the signal-to-noise ratio, sensors collect electrons from multiple cycles of the reflected IR signal, thus allowing some dedicated integration time.

For a better understanding of the principle, let us assume that the emitted IR is a square wave instead of sinusoidal and that we have four switches at each sensor pixel to collect Q1 through Q4. Each pixel of the depth sensor consists of several transistors and capacitors to collect the generated electrons. The four switches alternate their on and off states with 90° phase differences based on the emitted reference IR signal, as illustrated in Figure 1.3. When a switch is turned on and the reflected IR goes high, electrons are charged, as indicated by the shaded regions. In order to increase the signal-to-noise ratio, we repeatedly charge electrons through multiple cycles of the IR signal to measure Q1 through Q4 during a fixed integration time for a single frame of depth image acquisition. Once Q1 through Q4 are measured, the distance can be calculated.

As indicated in Figure 1.4, what we have calculated is the distance R from the camera to an object surface along the reflected IR signal. This is not necessarily the distance along the z-direction of the 3D sensor coordinate system. Based on the location of each pixel and the field-of-view information, Z in Figure 1.4 can be calculated from R to obtain an undistorted 3D geometry. Most consumer depth cameras give the calculated Z distance instead of R for user convenience.
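As a rough illustration of this radial-to-Cartesian conversion (a minimal sketch assuming a pinhole camera model with known intrinsics; the function and parameter names are hypothetical, not from the book):

import numpy as np

def radial_to_z(R, u, v, fx, fy, cx, cy):
    """Convert the radial distance R measured along the viewing ray of pixel
    (u, v) into the Z coordinate of the sensor frame, assuming a pinhole
    model with focal lengths (fx, fy) and principal point (cx, cy)."""
    # Ray direction through pixel (u, v) in normalized camera coordinates
    x = (u - cx) / fx
    y = (v - cy) / fy
    # A point at depth Z along this ray lies at distance Z * sqrt(x^2 + y^2 + 1)
    return R / np.sqrt(x * x + y * y + 1.0)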

Figure 1.3 Four-phase depth sensing

Even though the ToF principle allows distance imaging within the sensing range decided by the IR modulation frequency, the quality of the measured depth suffers from various systematic and nonsystematic sensor noises (Huhle et al., 2008; Edeler et al., 2010; Foix et al., 2011; Matyunin et al., 2011). Owing to the limited power of the IR light, the incoming reflected IR at each image sensor pixel induces a limited number of electrons for depth calculation. The principle of ToF depth sensing can calculate correct depth regardless of the power of the IR light and the amplitude of the reflected IR. However, a lower absolute amount of electrons suffers from electronic noises such as shot noise. To resolve this problem, we increase the integration time to collect a sufficient number of electrons for higher accuracy of depth calculation. However, this limits the frame rate of the sensor. Increasing the modulation frequency also increases sensor accuracy under identical integration time, because it allows more cycles of modulated IR waves for a single depth frame production. However, this also limits the maximum sensing range of depth sensors. The left image in Figure 1.5 shows a 3D point cloud collected by a ToF depth sensor. The viewpoint is shifted to the right of the camera, showing regions occluded by the foreground chairs. Note that the 3D data obtained from a depth sensor are not the

Figure 1.5 Measured depth and IR intensity images

complete volumetric data. Only the 3D locations of the 2D surface seen from the camera’s viewpoint are given. The right images in Figure 1.5 are depth and IR intensity images.
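To make the trade-off between modulation frequency, accuracy, and maximum range concrete, the unambiguous range of a continuous-wave ToF camera is bounded by one phase cycle of the modulation; a small sketch (the frequencies are illustrative values, not taken from the book):

C = 299_792_458.0  # speed of light, m/s

def max_unambiguous_range(f_mod_hz):
    """Maximum unambiguous range of a continuous-wave ToF sensor: one full
    2*pi phase cycle corresponds to a round trip of c / f_mod."""
    return C / (2.0 * f_mod_hz)

print(max_unambiguous_range(20e6))  # ~7.5 m at 20 MHz
print(max_unambiguous_range(40e6))  # ~3.7 m at 40 MHz: finer depth resolution, shorter range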

In active light sensors, the incoming light signal-to-noise ratio is still relatively low compared with passive light sensors like a color camera, due to the limited IR signal emitting power. In order to increase the signal-to-noise ratio further, depth sensors merge multiple neighboring sensor pixels to measure a single depth value, decreasing the depth image resolution. This is called pixel binning. Most consumer depth cameras perform pixel binning and sacrifice image resolution to guarantee a certain depth accuracy. Therefore, many researchers, depending on their applications, perform depth image super-resolution (Schuon et al., 2008; Park et al., 2011; Yeo et al., 2011) before using the raw depth images, as illustrated in Figure 1.6.
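As a toy illustration of this resolution trade-off (a minimal sketch, not one of the super-resolution methods cited above; the helper names are hypothetical), 2x2 binning followed by simple bilinear upsampling could look like:

import numpy as np
from scipy.ndimage import zoom

def bin_2x2(depth):
    """Average 2x2 blocks of a depth map (pixel binning): trades spatial
    resolution for a better per-pixel signal-to-noise ratio."""
    h, w = depth.shape
    d = depth[:h - h % 2, :w - w % 2]
    return d.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample_bilinear(depth, factor=2):
    """Naive 2D upsampling (order=1 is bilinear); it treats depth as a plain
    2D image, which is what causes the artifacts discussed below."""
    return zoom(depth, factor, order=1)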

The left image in Figure 1.6 is a raw depth image that is upsampled on the right. Simple bilinear interpolation is used in this example. Most interpolation methods, however, consider the depth image as a 2D image and increase the depth image resolution in the 2D domain. On the other hand, if depth is going to be used for 3D reconstruction, upsampling only in two axes is not enough. The left image in Figure 1.7 shows an example of 2D depth image

Figure 1.7 Depth image 2D versus 3D super-resolution
Figure 1.6 Depth image 2D super-resolution

where odd patterns can be observed around the foreground chair. Figure 1.8 shows this artifact more clearly.

Figure 1.8 shows the upsampled depth point cloud where the aligned color value is projected onto each pixel. The left image in Figure 1.8 shows lots of depth points in between the foreground chair and the background. The colors projected onto these points come from either the foreground or the background of the aligned 2D color image. This is a huge artifact, especially for 3D reconstruction applications, where random view navigation will expose this noise even more seriously, as shown in Figure 1.8.

The right image in Figure 1.9 is an example of boundary noise point elimination. Depth points lying away from both the foreground and background point clouds can be eliminated by outlier elimination methods.
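A minimal sketch of such an outlier ("flying pixel") removal step, assuming the point cloud is still organized as a depth image (the threshold and helper name are illustrative, not the book's method):

import numpy as np

def remove_flying_pixels(depth, max_jump=0.15):
    """Invalidate depth pixels whose value differs from every 4-neighbour by
    more than max_jump metres; such points typically float between foreground
    and background after upsampling (edge wrap-around is ignored for brevity)."""
    d = depth.astype(np.float64)
    neighbours = [np.roll(d, 1, 0), np.roll(d, -1, 0), np.roll(d, 1, 1), np.roll(d, -1, 1)]
    isolated = np.ones(d.shape, dtype=bool)
    for n in neighbours:
        isolated &= np.abs(d - n) > max_jump
    cleaned = d.copy()
    cleaned[isolated] = np.nan  # mark boundary-noise points as invalid
    return cleaned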

When we capture a moving object in real time, another artifact, motion blur, appears in depth images (Lindner and Kolb, 2009; Hussmann et al., 2011; Lee et al., 2012). Motion blur is a long-standing issue of imaging devices because it leads to a wrong understanding of, and wrong information about, real-world objects. For distance sensors in particular, motion blur causes distortions in the reconstructed 3D geometry or totally different distance information. Various techniques have been developed to mitigate motion blur in conventional sensors. Current technology, however, either requires a very short integration time to avoid motion blur or adopts computationally expensive post-processing methods to improve blurred images. Different from passive photometric imaging devices, which collectively use the amount of photons induced in each pixel of a sensor, active geometry imaging, such as ToF sensing, investigates the relation between the amounts of charged photons to figure out the phase difference of an emitted light source of fixed wavelength, as explained earlier in this chapter. These

Figure 1.8 Depth point cloud 3D super-resolution

sensors investigate the flight time of an emitted and reflected light source to calculate the distance. The phase difference of the reflected light in these principles represents the difference in distance from the camera. The distance image, as a result, is an integration of phase variation over a pixel grid. When there is any movement of an object or the camera itself, a phase shift will be observed at the corresponding location, shaping the infrared wavefront. The phase shift observed by a sensor pixel causes multiple reflected IR waves to capture different phases within the integration time. This gives a wrong distance value calculation.

As a result, a phase shift within a photon integration time produces motion blur, which is not preferable for robust distance sensing. The motion blur region is the result of a series of phase shifts within the photon integration time. Reducing the integration time is not always a preferable solution because it reduces the amount of photons collected to calculate a distance, decreasing the signal-to-noise ratio. On the other hand, post-processing after distance calculation shows limited performance and is a time-consuming job.

When we produce a distance image that includes motion blur, we can detect the phase shift within an integration time by investigating the relation between the amounts of photons separately collected by multiple control signals. Figure 1.10 shows what happens if there is any phase shift within an integration time in a four-phase ToF sensing mechanism. From their definitions, the four electric charges Q1–Q4 are averages of the total cumulated electric charges over multiple “on” phases. Without any phase shift, distance is calculated by equation (1.2) by obtaining the phase difference between the emitted and reflected IR waves. When there is

Figure 1.9 Boundary noise elimination

Figure 1.10 Depth motion blur

values before and after the phase shifts. Different from the original equation, the reflected IR amplitudes a1 and a2 cannot be eliminated from the equation and affect the distance calculation. Figure 1.10 shows what happens in a depth image with a single phase shift. During the integration time, the original (indicated in black) and phase-shifted (indicated in grey) reflected IR come in sequentially and will be averaged to calculate a single depth value. Motion blur around moving objects will be observed, showing quite different characteristics from those of conventional color images, as shown in Figure 1.11. Note that the motion blur regions (indicated by dashed ellipses) have nearer or farther depth values than both the foreground and background neighbor depth values. In general, with multiple or continuous phase shifts, the miscalculated distance is as follows:

Each control signal has a fixed phase delay from the others, which gives a dedicated relation between the collected electric charges. A four-phase ToF sensor uses a 90° phase delay between control signals, giving the following relations: Q1 + Q2 = Q3 + Q4 = Qsum and |Q1 − Q2| + |Q3 − Q4| = Qsum, where Qsum is the total amount of electric charge delivered by the reflected IR. In principle, every regular pixel has to meet these conditions if there is no significant noise (Figure 1.12). With an appropriate sensor noise model and thresholds, these relations can be used to check whether a pixel has regular status. A phase shift causes a very significant distance error, exceeding the common sensor noise level, and is effectively detected by testing whether either of the relations is violated.
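A sketch of this consistency test (the relative tolerance eps is illustrative; in practice it would come from the sensor noise model mentioned above):

import numpy as np

def detect_motion_blur(Q1, Q2, Q3, Q4, eps=0.05):
    """Flag pixels where the four-phase charge relations are violated:
    Q1 + Q2 = Q3 + Q4 = Qsum  and  |Q1 - Q2| + |Q3 - Q4| = Qsum.
    A phase shift during the integration time (motion blur) breaks these equalities."""
    Qsum = 0.5 * (Q1 + Q2 + Q3 + Q4)
    bad_sum = np.abs((Q1 + Q2) - (Q3 + Q4)) > eps * Qsum
    bad_abs = np.abs(np.abs(Q1 - Q2) + np.abs(Q3 - Q4) - Qsum) > eps * Qsum
    return bad_sum | bad_abs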

Figure 1.11 Depth motion blur examples

There are several other noise sources (Foix et al., 2011). The emitted IR signal amplitude attenuates while traveling, in proportion to the reciprocal of the square of the distance. Even though this attenuation should not affect the depth calculation of the ToF sensor in principle, the decrease in signal-to-noise ratio will degrade the repeatability of the depth calculation. The uniformity assumption of the emitted IR onto the target object also causes spatial distortion of the calculated depth. In other words, each sensor pixel will collect reflected IR of different amplitudes even when the reflecting surfaces are at identical distances. Daylight interference is another very critical issue for the practicality of the sensor. Any frequency of IR signal emitted from the sensor also exists in daylight, which acts as noise with regard to correct depth calculation. Scattering problems (Mure-Dubois and Hugli, 2007) within the lens and sensor architecture are another major problem of depth sensors owing to their low signal-to-noise ratio.
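In symbols, the attenuation mentioned at the start of this paragraph amounts to a_{\mathrm{received}} \propto a_0 / R^2, where a_0 is the emitted amplitude and R the distance travelled (a restatement of the inverse-square law stated above, not a formula quoted from the book).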

1.3 Structured Light Depth Camera

Kinect, a famous consumer depth camera in the market, is a structured-IR-light depth sensor based on the well-known structured-light 3D geometry acquisition technology. It is composed of an IR emitter, an IR sensor, and a color sensor, providing an IR amplitude image, a depth map, and a color image. Basically, this technology utilizes conventional color sensor technology with relatively higher resolution. Owing to the limited sensing range of the structured light principle, the operating range of this depth sensor is around 1–4 m.

1.3.1 Principle

In this type of sensor, a predetermined IR pattern is emitted onto the target objects (Figure 1.13). The pattern can be a rectangular grid or a set of random dots. A calibrated IR

Figure 1.12 Boundary noise elimination

Figure 1.13 Structured IR light depth camera
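Although the derivation is not reproduced in this extract, structured-light sensors of this kind typically recover depth by triangulating the observed shift of the pattern between the projector and the camera; a hedged sketch with assumed calibration parameters (not the book's exact formulation):

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Standard triangulation for a calibrated projector-camera pair:
    Z = f * b / d, where d is the observed pattern shift in pixels, f the
    focal length in pixels, and b the projector-camera baseline in metres."""
    return focal_px * baseline_m / disparity_px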

sensing technologies assume Lambertian objects as their targets. Both the ToF sensor on the left in Figure 1.14 and the structured light sensor on the right in Figure 1.14 can see the emitted light sources.

However, most real-world objects have non-Lambertian surfaces, including transparency and specularity. Figure 1.15 shows what happens if active light depth sensors see a specular surface and why specular objects are challenging to handle. Unlike Lambertian material, specular objects reflect the incoming light into a limited range of directions. Let us assume that the convex surfaces in Figure 1.15 are mirrors, which reflect incoming light rays toward a very narrow direction. If the sensor is not located on the exact direction of the

Figure 1.14 Lambertian surface

Figure 1.15 Depth of a specular surface

reflected ray from the mirror surface, the sensor does not receive any reflected IR and it is impossible to calculate any depth information. On the other hand, if the sensor is located exactly on the mirror reflection direction, the sensor will receive an excessive amount of concentrated IR light, causing saturation in the measurement. Consequently, the sensors fail to receive the reflected light in a sensible range. Such a phenomenon results in missing measurements for both types of sensors.

Figure 1.16 shows samples of specular objects taken by a ToF depth sensor. The first object is a mirror whose flat area is entirely a specular surface. The second object shows specularity in a small region. The sensor in the first case is not on the mirror reflection direction and no saturation is observed. However, the depth within the mirror region is not correct. The sensor in the second case is on the mirror reflection direction and saturation is observed in the intensity image. This leads to a wrong depth calculation.

Figure 1.17 shows samples of specular objects taken by a structured light depth sensor. The mirror region of the first case shows the depth of the reflected surface. The second case also shows a specular region and leads to miscalculation of depth.

In Figure 1.18 we demonstrate how a transparent object affects the sensor measurement. Considering transparent objects with background, depth sensors receive reflected light from both the foreground and the background (Figure 1.19). The mixture of reflected light from foreground and background misleads the depth measurement. Depending on the sensor type, however, the characteristics of the errors vary. Since a ToF sensor computes the depth of a transparent object based on the mixture of reflected IR from foreground and background, the depth measurement includes some bias toward the background. For the structured light

Figure 1.17 Specular object examples of structured IR sensor
Figure 1.16 Specular object examples of ToF sensor

sensor, the active light patterns are used to provide the correspondences between the sensor and projector. With transparent objects, the measurement errors cause mismatches in the correspondences and yield data loss.

In general, a multipath problem (Fuchs, 2010) similar to the transparent-object case occurs with concave objects, as illustrated in Figure 1.20. In the left image in Figure 1.20, two different IR paths having different flight times from the IR LEDs to the sensor can arrive at the same sensor pixel. The path of the ray reflected twice on the concave surface is a spurious IR signal and distracts from the correct depth calculation of the point using the ray whose path

is reflected only once. In principle, a structured light sensor suffers from a similar problem.

Figure 1.19 Transparent object examples
Figure 1.18 Depth of transparent object

Figure 1.20 Multipath problem
