EMERGING TECHNOLOGIES FOR 3D VIDEO
Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. It is sold on the understanding that the publisher is not engaged in rendering professional services and neither the publisher nor the author shall be liable for damages arising herefrom. If professional advice or other expert assistance is required, the services of a competent professional should be sought.
Library of Congress Cataloging-in-Publication Data
Emerging technologies for 3D video : creation, coding, transmission, and
rendering / Frederic Dufaux, Beatrice Pesquet-Popescu, Marco Cagnazzo.
pages cm
Includes bibliographical references and index.
ISBN 978-1-118-35511-4 (cloth)
1. 3-D video–Standards. 2. Digital video–Standards. I. Dufaux, Frederic,
1967- editor of compilation. II. Pesquet-Popescu, Beatrice, editor of
compilation. III. Cagnazzo, Marco, editor of compilation. IV. Title: Emerging
technologies for three dimensional video.
Seungkyu Lee
Ahmed Kirmani, Andrea Colaço, and Vivek K. Goyal
2.2.1 Light Fields, Reflectance Distribution Functions, and Optical
2.2.3 Synthetic Aperture Radar for Estimating Scene Reflectance 20
Carlos Vazquez, Liang Zhang, Filippo Speranza, Nils Plath, and Sebastian Knorr
Mounir Kaaniche, Raffaele Gaetano, Marco Cagnazzo,
and Beatrice Pesquet-Popescu
5.2 Geometrical Models for Stereoscopic Imaging 82
5.3.4 Fundamental Steps Involved in Stereo Matching Algorithms 89
Marco Cagnazzo, Beatrice Pesquet-Popescu, and Frederic Dufaux
Elie Gabriel Mora, Giuseppe Valenzise, Joël Jung, Beatrice Pesquet-Popescu,
Marco Cagnazzo, and Frederic Dufaux
7.3.1 Tools that Exploit the Inherent Characteristics of Depth Maps 123
7.3.2 Tools that Exploit the Correlations with the Associated Texture 127
7.3.3 Tools that Optimize Depth Map Coding for the Quality
7.4 Application Example: Depth Map Coding Using “Don’t Care” Regions 132
7.4.2 Transform Domain Sparsification Using “Don’t Care” Regions 134
8.1.3 Chapter Organization 141
Ngai-Man Cheung and Gene Cheung
9.2.1 Challenges: Coding Efficiency and Navigation Flexibility 163
9.4 Interactive Multiview Video Streaming 172
C. Göktug Gürler and A. Murat Tekalp
10.2.2 Sender-Driven versus Receiver-Driven P2P Video Streaming 189
10.3.6 Multiple Requests from Multiple Peers of a Single Chunk 196
10.4.2 Rate Adaptation in Stereo Streaming: Asymmetric Coding 197
10.4.3 Use Cases: Stereoscopic Video Streaming over P2P Network 200
Oliver Wang, Manuel Lang, Nikolce Stefanoski, Alexander Sorkine-Hornung,
Olga Sorkine-Hornung, Aljoscha Smolic, and Markus Gross
11.6.2 Position Constraints 219
11.6.4 Three-Dimensional Video Transmission Systems
Christopher Gilliam, Mike Brookes, and Pier Luigi Dragotti
12.4.1 Adaptive Sampling Based on Plenoptic Spectral Analysis 244
13 A Framework for Image-Based Stereoscopic View Synthesis from
Felix Klose, Christian Lipski, and Marcus Magnor
13.2 Estimating Dense Image Correspondences 258
13.4.4 Concluding with the “Who Cares?” Post-Production Pipeline 268
Janusz Konrad
14.3.1 Ghosting Suppression for Polarized and Shuttered
15.2.1 Three-Dimensional Cinema Projectors Based on Light
15.2.2 Three-Dimensional Cinema Projectors Based on Shutters 299
15.2.3 Three-Dimensional Cinema Projectors Based on
16.3.2 Applications of the Ultra-High-Resolution Video System 322
16.4.1 Geometrical Relationship of Subject and Spatial Image 325
16.5.2 Elemental Image Generation from 3D Object Information 331
Peter Tamas Kovacs and Tibor Balogh
17.5 The Perfect 3D Display 345
Simon J. Watt and Kevin J. MacKenzie
18.4.3 Photometric Differences Between Left- and Right-Eye Images 355
18.4.4 Camera Misalignments and Differences in Camera Optics 356
18.6 Motion Artefacts from Field-Sequential Stereoscopic Presentation 357
18.6.3 Distortions in Perceived Depth from Binocular Disparity 360
18.8.2 Accommodation and Vergence in the Real World and in S3D 367
Philippe Hanhart, Francesca De Simone, Martin Rerabek,
and Touradj Ebrahimi
19.4.5 Open Issues 388
Jean-Charles Bazin, Olivier Saurer, Friedrich Fraundorfer, and Marc Pollefeys
21 View Selection 416
Fahad Daniyal and Andrea Cavallaro
Arnaud Bourge and Alain Bellon
23.4 DIBR Graphics Synthesis for Multiview Displays 458
24.2 Review of Disparity Estimation Algorithms and Implementations 469
Preface

The underlying principles of stereopsis have been known for a long time. Stereoscopes to see photographs in 3D appeared and became popular in the nineteenth century. The first demonstrations of 3D movies took place in the first half of the twentieth century, initially using anaglyph glasses, and then with polarization-based projection. Hollywood experienced a first short-lived golden era of 3D movies in the 1950s. In the last 10 years, 3D has regained significant interest and 3D movies are becoming ubiquitous. Numerous major productions are now released in 3D, culminating with Avatar, the highest grossing film of all time.

In parallel with the recent growth of 3D movies, 3DTV is attracting significant interest from manufacturers and service providers. This is evident from the multiplication of new 3D product announcements and services. Beyond entertainment, 3D imaging technology is also seen as instrumental in other application areas such as video games, immersive video conferences, medicine, video surveillance, and engineering.

With this growing interest, 3D video is often considered one of the major upcoming innovations in video technology, with the expectation of a greatly enhanced user experience. This book intends to provide an overview of key technologies for 3D video applications. More specifically, it covers the state of the art and explores new research directions, with the objective of tackling all aspects involved in 3D video systems and services. Topics addressed include content acquisition and creation, data representation and coding, transmission, view synthesis, rendering, display technologies, human perception of depth, and quality assessment. Relevant standardization efforts are reviewed. Finally, applications and implementation issues are also described.

More specifically, the book is composed of six parts. Part One addresses different aspects of 3D content acquisition and creation. In Chapter 1, Lee presents depth cameras and related applications. The principle of active depth sensing is reviewed, along with depth image processing methods such as noise modelling, upsampling, and removing motion blur. In Chapter 2, Kirmani, Colaço, and Goyal introduce the space-from-time imaging framework, which achieves spatial resolution, in two and three dimensions, by measuring temporal variations of light intensity in response to temporally or spatiotemporally varying illumination. Chapter 3, by Vazquez, Zhang, Speranza, Plath, and Knorr, provides an overview of the process of generating a stereoscopic video (S3D) from a monoscopic video source (2D), generally known as 2D-to-3D video conversion, with a focus on selected recent techniques. Finally, in Chapter 4, Zone provides an overview of numerous contemporary strategies for shooting narrow and variable interaxial baseline for stereoscopic cinematography. Artistic implications are also discussed.

It is with great sadness that we learned that Ray Zone passed away on November 13, 2012.

A key issue in 3D video, Part Two addresses data representation, compression, and transmission. In Chapter 5, Kaaniche, Gaetano, Cagnazzo, and Pesquet-Popescu address the problem of disparity estimation.
In Chapter 9, Cheung and Cheung consider interactive multiview video streaming; existing strategies and solutions are reviewed. Finally, Gürler and Tekalp propose an adaptive P2P video streaming solution for streaming multiview video over P2P overlays in Chapter 10.

Next, Part Three of the book discusses view synthesis and rendering. In Chapter 11, Wang, Lang, Stefanoski, Sorkine-Hornung, Sorkine-Hornung, Smolic, and Gross present image-domain warping as an alternative to depth-image-based rendering techniques. This technique utilizes simpler, image-based deformations as a means for realizing various stereoscopic post-processing operators. Gilliam, Brookes, and Dragotti, in Chapter 12, examine the state
of the art in plenoptic sampling theory. In particular, the chapter presents theoretical results for uniform sampling based on spectral analysis of the plenoptic function and algorithms for adaptive plenoptic sampling. Finally, in Chapter 13, Klose, Lipski, and Magnor present a complete end-to-end framework for stereoscopic free viewpoint video creation, allowing one to viewpoint-navigate through space and time of complex real-world, dynamic scenes.
As a very important component of a 3D video system, Part Four focuses on 3D display technologies. In Chapter 14, Konrad addresses digital signal processing methods for 3D data generation, both stereoscopic and multiview, and for compensation of the deficiencies of today's 3D displays. Numerous experimental results are presented to demonstrate the usefulness of such methods. Borel and Doyen, in Chapter 15, present in detail the main 3D display technologies available for cinemas, for large-display TV sets, and for mobile terminals. A perspective of evolution for the near and long term is also proposed. In Chapter 16, Arai focuses on integral imaging, a 3D photography technique that is based on integral photography, in which information on 3D space is acquired and represented. This chapter describes the technology for displaying 3D space as a spatial image by integral imaging. Finally, in Chapter 17, Kovacs and Balogh present light-field displays, an advanced technique for implementing glasses-free 3D displays.
In most targeted applications, humans are the end-users of 3D video systems. Part Five considers human perception of depth and perceptual quality assessment. More specifically, in Chapter 18, Watt and MacKenzie focus on how the human visual system interacts with stereoscopic 3D media, in view of optimizing effectiveness and viewing comfort. Three main issues are addressed: incorrect spatiotemporal stimuli introduced by field-sequential stereo presentation, inappropriate binocular viewing geometry, and the unnatural relationship between where the eyes fixate and focus in stereoscopic 3D viewing. In turn, in Chapter 19, Hanhart, De Simone, Rerabek, and Ebrahimi consider mechanisms of 3D vision in humans, and their underlying perceptual models, in conjunction with the types of distortions that today's and tomorrow's 3D video processing systems produce. This complex puzzle is examined with a focus on how to measure 3D visual quality, as an essential factor in the success of 3D technologies, products, and services.
In order to complete the book, Part Six describes target applications for 3D video, as well as implementation issues. In Chapter 20, Bazin, Saurer, Fraundorfer, and Pollefeys present a semi-automatic method to generate interactive virtual tours from omnidirectional video. It allows a user to virtually navigate through buildings and indoor scenes. Such a system can be applied in various contexts, such as virtual tourism, tele-immersion, tele-presence, and e-heritage. Daniyal and Cavallaro address the question of how to automatically identify which view is more useful when observing a dynamic scene with multiple cameras in Chapter 21. This problem concerns several applications ranging from video production to video surveillance. In particular, an overview of existing approaches for view selection and automated video production is presented. In Chapter 22, Bourge and Bellon present the hardware architecture of a typical mobile platform, and describe major stereoscopic 3D applications. Indeed, smartphones bring new opportunities to stereoscopic 3D, but also specific constraints. Chapter 23, by Le Feuvre and Mathieu, presents an integrated system for displaying interactive applications on multiview screens. Both a simple GPU-based prototype and a low-cost hardware design implemented on a field-programmable gate array are presented. Finally, in Chapter 24, Tseng and Chang propose an optimized disparity estimation algorithm for high-definition 3DTV applications with reduced computational and memory requirements.
By covering general and advanced topics, providing at the same time a broad and deep analysis, the book has the ambition to become a reference for those involved or interested in 3D video systems and services. Assuming fundamental knowledge in image/video processing, as well as a basic understanding of mathematics, this book should be of interest to a broad readership with different backgrounds and expectations, including professors, graduate and undergraduate students, researchers, engineers, practitioners, and managers making technological decisions about 3D video.
Frederic Dufaux
Beatrice Pesquet-Popescu
Marco Cagnazzo
List of Contributors
Jun Arai, NHK (Japan Broadcasting Corporation), Japan
Tibor Balogh, Holografika, Hungary
Jean-Charles Bazin, Computer Vision and Geometry Group, ETH Zürich,
Switzerland
Alain Bellon, STMicroelectronics, France
Thierry Borel, Technicolor, France
Arnaud Bourge, STMicroelectronics, France
Mike Brookes, Department of Electrical and Electronic Engineering, Imperial College London, UK
Marco Cagnazzo, Departement Traitement du Signal et des Images, Telecom ParisTech, France
Andrea Cavallaro, Queen Mary University of London, UK
Tian-Sheuan Chang, Department of Electronics Engineering, National Chiao Tung University, Taiwan
Gene Cheung, Digital Content and Media Sciences Research Division, National Institute
of Informatics, Japan
Ngai-Man Cheung, Information Systems Technology and Design Pillar, Singapore University of Technology and Design, Singapore
Andrea Colaço, Media Lab, Massachusetts Institute of Technology, USA
Fahad Daniyal, Queen Mary University of London, UK
Francesca De Simone, Multimedia Signal Processing Group (MMSPG),
Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland
Didier Doyen, Technicolor, France
Pier Luigi Dragotti, Department of Electrical and Electronic Engineering, Imperial College London, UK
Frederic Dufaux, Departement Traitement du Signal et des Images, Telecom ParisTech, France
Vivek K. Goyal, Research Laboratory of Electronics, Massachusetts Institute of Technology, USA
Markus Gross, Disney Research Zurich, Switzerland
C. Göktug Gürler, College of Engineering, Koç University, Turkey
Philippe Hanhart, Multimedia Signal Processing Group (MMSPG), Ecole Polytechnique
Federale de Lausanne (EPFL), Switzerland
Alexander Sorkine-Hornung, Disney Research Zurich, Switzerland
Joël Jung, Orange Labs, France
Mounir Kaaniche, Departement Traitement du Signal et des Images, Telecom ParisTech, France
Ahmed Kirmani, Research Laboratory of Electronics, Massachusetts Institute of
Technology, USA
Felix Klose, Institut für Computergraphik, TU Braunschweig, Germany
Sebastian Knorr, imcube labs GmbH, Technische Universität Berlin, Germany
Janusz Konrad, Department of Electrical and Computer Engineering, Boston University, USA
Peter Tamas Kovacs, Holografika, Hungary
Manuel Lang, Disney Research Zurich, Switzerland
Seungkyu Lee, Samsung Advanced Institute of Technology, South Korea
Jean Le Feuvre, Departement Traitement du Signal et des Images, Telecom ParisTech, France
Christian Lipski, Institut für Computergraphik, TU Braunschweig, Germany
Kevin J. MacKenzie, Wolfson Centre for Cognitive Neuroscience, School of Psychology, Bangor University, UK
Marcus Magnor, Institut für Computergraphik, TU Braunschweig, Germany
Yves Mathieu, Telecom ParisTech, France
Elie Gabriel Mora, Orange Labs, France; Departement Traitement du Signal et des Images,
Telecom ParisTech, France
Karsten Müller, Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut, Germany
Beatrice Pesquet-Popescu, Departement Traitement du Signal et des Images, Telecom ParisTech, France
Nils Plath, imcube labs GmbH, Technische Universität Berlin, Germany
Marc Pollefeys, Computer Vision and Geometry Group, ETH Zürich, Switzerland
Martin Rerabek, Multimedia Signal Processing Group (MMSPG), Ecole Polytechnique
Federale de Lausanne (EPFL), Switzerland
Olivier Saurer, Computer Vision and Geometry Group, ETH Zürich, Switzerland
Aljoscha Smolic, Disney Research Zurich, Switzerland
Olga Sorkine-Hornung, ETH Zurich, Switzerland
Filippo Speranza, Communications Research Centre Canada (CRC), Canada
Nikolce Stefanoski, Disney Research Zurich, Switzerland
A. Murat Tekalp, College of Engineering, Koç University, Turkey
Yu-Cheng Tseng, Department of Electronics Engineering, National Chiao Tung University, Taiwan
Giuseppe Valenzise, Departement Traitement du Signal et des Images, Telecom ParisTech, France
Carlos Vazquez, Communications Research Centre Canada (CRC), Canada
Anthony Vetro, Mitsubishi Electric Research Labs (MERL), USA
Simon J. Watt, Wolfson Centre for Cognitive Neuroscience, School of Psychology, Bangor University, UK
Oliver Wang, Disney Research Zurich, Switzerland
Liang Zhang, Communications Research Centre Canada (CRC), Canada
Ray Zone, The 3-D Zone, USA
Acknowledgements

We would like to express our deepest appreciation to all the authors for their invaluable contributions. Without their commitment and efforts, this book would not have been possible.
Moreover, we would like to gratefully acknowledge the John Wiley & Sons Ltd staff, Alex King, Liz Wingett, Richard Davies, and Genna Manaog, for their relentless support throughout this endeavour.
Frederic Dufaux
Beatrice Pesquet-Popescu
Marco Cagnazzo
Part One
Content Creation
3D sensing technologies such as digital holography, interferometry, and integral photography have been studied. However, they show limited performance in 3D geometry and photometry acquisition. Recently, several consumer depth-sensing cameras using near-infrared light have been introduced in the market. They have relatively low spatial resolution compared with color sensors and show limited sensing range and accuracy. Thanks to their affordable prices and the advantage of direct 3D geometry acquisition, many researchers from graphics, computer vision, image processing, and robotics have employed this new modality of data for many applications. In this chapter, we introduce two major depth-sensing principles using active IR signals and state-of-the-art applications.
1.2 Time-of-Flight Depth Camera
In active light sensing technology, if we can measure the flight time of a fixed-wavelength signal emitted from a sensor and reflected from an object surface, we can calculate the distance directly. Similarly, if we can measure the phase delay
of the reflected signal compared with the original emitted signal, we can calculate the distance indirectly. Recent ToF depth cameras in the market measure the phase delay of the emitted infrared (IR) signal at each pixel and calculate the distance from the camera.
1.2.1 Principle
In this section, the principle of ToF depth sensing is explained in more detail with simplified examples. Let us assume that we use a sinusoidal IR wave as an active light source. In general, consumer depth cameras use multiple light-emitting diodes (LEDs) to generate a fixed-wavelength IR signal. What we can observe using an existing image sensor is the amount of electrons induced by collected photons during a certain time duration. For color sensors, it is enough to count the amount of induced electrons to capture the luminance or chrominance of the expected bandwidth. However, a single shot of photon collection is not enough for phase delay measurement. Instead, we collect photons multiple times at different time locations, as illustrated in Figure 1.2.
Q1 through Q4 in Figure 1.2 are the amounts of electrons measured at each corresponding time. Reflected IR shows a phase delay proportional to the distance from the camera. Since we have the reference emitted IR and its phase information, the electron amounts at multiple time locations (Q1 through Q4 have a 90° phase difference to each other) can tell us the amount of delay as a function of the amplitude a of the IR signal and the normalized amounts of electrons f1 through f4.
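As a concrete illustration, the following is a minimal sketch of the standard four-phase computation used in such sensors, assuming the usual arctangent pairing of the 90°-shifted samples; the exact index pairing and normalization in the chapter's equation may differ:

import numpy as np

C = 299_792_458.0  # speed of light (m/s)

def tof_depth_from_samples(q1, q2, q3, q4, mod_freq_hz):
    """Recover phase delay, amplitude and radial distance from four
    90-degree-shifted charge samples Q1..Q4 (standard four-phase ToF
    formulation; an assumption, not necessarily the chapter's notation)."""
    phase = np.arctan2(q3 - q4, q1 - q2)            # phase delay of reflected IR
    phase = np.mod(phase, 2.0 * np.pi)              # wrap to [0, 2*pi)
    amplitude = 0.5 * np.hypot(q3 - q4, q1 - q2)    # reflected IR amplitude a
    offset = 0.25 * (q1 + q2 + q3 + q4)             # ambient/offset component
    distance = C * phase / (4.0 * np.pi * mod_freq_hz)  # radial distance R
    return distance, amplitude, offset

For a sensor modulated at 20 MHz, for example, tof_depth_from_samples(q1, q2, q3, q4, 20e6) returns the radial distance in metres together with the recovered amplitude and offset.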
In real sensing situations, a perfect sine wave is impossible to produce using the cheap LEDs of consumer depth cameras. Any distortion of the sine wave causes miscalculation of the phase delay. Furthermore, the amount of electrons induced by the reflected IR signal at a certain moment is very noisy due to the limited LED power. In order to increase the signal-to-noise ratio, sensors collect electrons from multiple cycles of the reflected IR signal, thus allowing some dedicated integration time.
For a better understanding of the principle, let us assume that the emitted IR is a square wave instead of a sinusoid and that we have four switches at each sensor pixel to collect Q1 through Q4. Each pixel of the depth sensor consists of several transistors and capacitors to collect the generated electrons. The four switches alternate their on and off states with 90° phase differences based on the emitted reference IR signal, as illustrated in Figure 1.3. When a switch is turned on and the reflected IR goes high, electrons are charged, as indicated by the shaded regions. In order to increase the signal-to-noise ratio, we repeatedly charge electrons through multiple cycles of the IR signal to measure Q1 through Q4 during a fixed integration time for a single frame of depth image acquisition. Once Q1 through Q4 are measured, the distance can be calculated.
As indicated in Figure 1.4, what we have calculated is the distance R from the camera to an object surface along the reflected IR signal. This is not necessarily the distance along the z-direction of the 3D sensor coordinate system. Based on the location of each pixel and the field-of-view information, Z in Figure 1.4 can be calculated from R to obtain an undistorted 3D geometry. Most consumer depth cameras give the calculated Z distance instead of R for user convenience.
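A minimal sketch of this radial-to-Cartesian correction under a pinhole model is given below; the fields of view are illustrative defaults, not the parameters of any particular sensor:

import numpy as np

def radial_to_z(r, fov_h_deg=58.0, fov_v_deg=45.0):
    """Convert per-pixel radial distances R into Z along the optical axis,
    assuming a pinhole camera with the given (illustrative) fields of view."""
    h, w = r.shape
    fx = (w / 2.0) / np.tan(np.radians(fov_h_deg) / 2.0)
    fy = (h / 2.0) / np.tan(np.radians(fov_v_deg) / 2.0)
    u = np.arange(w) - (w - 1) / 2.0      # horizontal pixel offsets from centre
    v = np.arange(h) - (h - 1) / 2.0      # vertical pixel offsets from centre
    uu, vv = np.meshgrid(u, v)
    # Length of the ray direction (u/fx, v/fy, 1); R is measured along this ray.
    ray_norm = np.sqrt((uu / fx) ** 2 + (vv / fy) ** 2 + 1.0)
    return r / ray_norm                   # Z = R / |ray|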
Figure 1.3 Four-phase depth sensing
Even though the ToF principle allows distance imaging within the sensing range decided by the IR modulation frequency, the quality of the measured depth suffers from various systematic and nonsystematic sensor noises (Huhle et al., 2008; Edeler et al., 2010; Foix et al., 2011; Matyunin et al., 2011). Owing to the limited power of the IR light, the reflected IR arriving at each image sensor pixel induces only a limited number of electrons for the depth calculation. In principle, ToF depth sensing can calculate the correct depth regardless of the power of the IR light and the amplitude of the reflected IR. However, a lower absolute amount of electrons suffers from electronic noises such as shot noise. To resolve this problem, we increase the integration time to collect a sufficient number of electrons for higher accuracy of the depth calculation. However, this limits the frame rate of the sensor. Increasing the modulation frequency also increases sensor accuracy under an identical integration time, because it allows more cycles of the modulated IR wave for a single depth frame. However, this also limits the maximum sensing range of the sensor. The left image in Figure 1.5 shows a 3D point cloud collected by a ToF depth sensor. The viewpoint is shifted to the right of the camera, showing regions occluded by the foreground chairs.
Figure 1.5 Measured depth and IR intensity images
Note that the 3D data obtained from a depth sensor are not complete volumetric data. Only the 3D locations of the 2D surface seen from the camera's viewpoint are given. The right images in Figure 1.5 are the depth and IR intensity images.
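As a rule of thumb (a standard relation rather than one stated explicitly in this chapter), the unambiguous sensing range of a phase-based ToF sensor is R_max = c / (2 f_mod), because the measured phase wraps after one modulation period of round-trip travel. For example, a 20 MHz modulation gives roughly 3 × 10^8 / (2 × 20 × 10^6) ≈ 7.5 m, and doubling the frequency to 40 MHz halves the unambiguous range to about 3.75 m while improving depth precision for the same integration time.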
In active light sensors, the incoming light signal-to-noise ratio is still relatively low compared with passive light sensors such as a color camera, due to the limited IR signal emitting power. In order to increase the signal-to-noise ratio further, depth sensors merge multiple neighboring sensor pixels to measure a single depth value, decreasing the depth image resolution. This is called pixel binning. Most consumer depth cameras perform pixel binning and sacrifice image resolution to guarantee a certain depth accuracy. Therefore, many researchers, depending on their applications, perform depth image super-resolution (Schuon et al., 2008; Park et al., 2011; Yeo et al., 2011) before using the raw depth images, as illustrated in Figure 1.6.
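A minimal sketch of such a 2D upsampling step is shown below, using plain bilinear interpolation with a validity mask; the scipy-based implementation is illustrative and is not necessarily how the chapter's figures were produced:

import numpy as np
from scipy.ndimage import zoom

def upsample_depth_2d(depth, factor=2):
    """Bilinearly upsample a binned depth image in the 2D image domain.
    Invalid (zero) pixels are excluded so they do not bleed into neighbours."""
    valid = (depth > 0).astype(np.float64)
    num = zoom(depth * valid, factor, order=1)   # order=1 -> bilinear
    den = zoom(valid, factor, order=1)
    out = np.zeros_like(num)
    np.divide(num, den, out=out, where=den > 1e-6)
    return out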
The left image in Figure 1.6 is a raw depth image that is upsampled on the right. Simple bilinear interpolation is used in this example. Most interpolation methods, however, consider the depth image as a 2D image and increase the depth image resolution in the 2D domain. On the other hand, if the depth is going to be used for 3D reconstruction, upsampling only in two axes is not enough. The left image in Figure 1.7 shows an example of 2D depth image super-resolution.
Figure 1.6 Depth image 2D super-resolution
Figure 1.7 Depth image 2D versus 3D super-resolution
Odd patterns can be observed around the foreground chair. Figure 1.8 shows this artifact more clearly.
Figure 1.8 shows the upsampled depth point cloud where the aligned color value is projected onto each pixel. The left image in Figure 1.8 shows lots of depth points in between the foreground chair and the background. The colors projected onto these points are from either the foreground or the background of the aligned 2D color image. This is a huge artifact, especially for 3D reconstruction applications, where random view navigation will expose this noise more seriously, as shown in Figure 1.8.
The right image in Figure 1.9 is an example of boundary noise point elimination. Depth points away from both the foreground and background point clouds can be eliminated by outlier elimination methods.
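A minimal sketch of such an outlier filter is given below: points whose depth disagrees with most of their spatial neighbours, as happens for pixels floating between the foreground and the background, are invalidated. The thresholds are illustrative:

import numpy as np

def remove_flying_pixels(depth, max_jump=0.1, min_support=5):
    """Invalidate depth pixels that disagree with most of their 8 neighbours
    by more than max_jump metres (typical of boundary points floating between
    foreground and background). Image borders are handled naively via roll."""
    h, w = depth.shape
    support = np.zeros((h, w), dtype=np.int32)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            shifted = np.roll(np.roll(depth, dy, axis=0), dx, axis=1)
            support += (np.abs(depth - shifted) < max_jump).astype(np.int32)
    cleaned = depth.copy()
    cleaned[support < min_support] = 0.0   # mark boundary outliers as invalid
    return cleaned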
When we capture a moving object in real time, another artifact, motion blur, appears in depth images (Lindner and Kolb, 2009; Hussmann et al., 2011; Lee et al., 2012). Motion blur is a long-standing issue for imaging devices because it leads to an incorrect understanding of, and incorrect information about, real-world objects. For distance sensors in particular, motion blur causes distortions in the reconstructed 3D geometry or totally different distance information. Various techniques have been developed to mitigate motion blur in conventional sensors. Current technology, however, either requires a very short integration time to avoid motion blur or adopts computationally expensive post-processing methods to improve the blurred images. Different from passive photometric imaging devices, which collectively use the amount of photons induced in each pixel of a sensor, active geometry imaging, such as ToF sensing, investigates the relation between the amounts of charged photons to figure out the phase difference of an emitted light source of fixed wavelength, as explained earlier in this chapter.
Figure 1.8 Depth point cloud 3D super-resolution
These sensors investigate the flight time of the emitted and reflected light to calculate the distance. The phase difference of the reflected light in these principles represents the difference in distance from the camera. The distance image, as a result, is an integration of phase variation over a pixel grid. When there is any movement of an object or the camera itself, a phase shift will be observed at the corresponding location, shaping the infrared wavefront. The phase shift observed by a sensor pixel causes multiple reflected IR waves with different phases to be captured within the integration time. This gives a wrong distance value calculation.
As a result, a phase shift within a photon integration time produces motion blur, which is not preferable for robust distance sensing. The motion blur region is the result of a series of phase shifts within the photon integration time. Reducing the integration time is not always a preferable solution because it reduces the amount of photons collected to calculate a distance, decreasing the signal-to-noise ratio. On the other hand, post-processing after the distance calculation shows limited performance and is a time-consuming job.
When we produce a distance image that includes motion blur, we can detect the phase shift within an integration time by investigating the relation between the separately collected amounts of photons under multiple control signals. Figure 1.10 shows what happens if there is any phase shift within an integration time in a four-phase ToF sensing mechanism. From their definitions, the four electric charges Q1–Q4 are averages of the total cumulated electric charges over multiple “on” phases. Without any phase shift, the distance is calculated by equation (1.2) by obtaining the phase difference between emitted and reflected IR waves.

Figure 1.9 Boundary noise elimination

Figure 1.10 Depth motion blur

When there is a phase shift within the integration time, however, each of the collected charges becomes a mixture of values before and after the phase shift. Different from the original equation, the reflected IR amplitudes a1 and a2 cannot be eliminated from the equation and affect the distance calculation. Figure 1.10 shows what happens in a depth image with a single phase shift. During the integration time, the original (indicated in black) and phase-shifted (indicated in grey) reflected IR come in sequentially and will be averaged to calculate a single depth value. Motion blur around moving objects will be observed, showing quite different characteristics from that of conventional color images, as shown in Figure 1.11. Note that the motion blur regions (indicated by dashed ellipses) have nearer or farther depth values than both the foreground and background neighboring depth values. In general, with multiple or continuous phase shifts, the distance is similarly miscalculated from the mixed charges.
Each control signal has a fixed phase delay from the others, which gives a dedicated relation between the collected electric charges. A four-phase ToF sensor makes a 90° phase delay between control signals, giving the following relations: Q1 + Q2 = Q3 + Q4 = Qsum and |Q1 - Q2| + |Q3 - Q4| = Qsum, where Qsum is the total amount of electric charge delivered by the reflected IR. In principle, every regular pixel has to meet these conditions if there is no significant noise (Figure 1.12). With an appropriate sensor noise model and thresholds, these relations can be used to check whether a pixel has regular status. A phase shift causes a very significant distance error, exceeding the common sensor noise level, and is effectively detected by testing whether either of the relations is violated.
Figure 1.11 Depth motion blur examples
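A minimal sketch of this consistency test is given below, flagging pixels where either relation is violated beyond a tolerance; the simple relative-tolerance noise model is an assumption, not the chapter's:

import numpy as np

def detect_phase_shift(q1, q2, q3, q4, rel_tol=0.05):
    """Flag pixels whose charges violate Q1 + Q2 = Q3 + Q4 = Qsum or
    |Q1 - Q2| + |Q3 - Q4| = Qsum, which indicates a phase shift (motion blur)
    within the integration time."""
    qsum = 0.5 * ((q1 + q2) + (q3 + q4))            # nominal total charge
    tol = rel_tol * np.maximum(qsum, 1e-9)
    bad_sum = np.abs((q1 + q2) - (q3 + q4)) > tol
    bad_amp = np.abs(np.abs(q1 - q2) + np.abs(q3 - q4) - qsum) > tol
    return bad_sum | bad_amp                        # True where blur is suspected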
There are several other noise sources (Foix et al., 2011). The emitted IR signal amplitude attenuates while traveling, in proportion to the reciprocal of the square of the distance. Even though this attenuation should not affect the depth calculation of the ToF sensor in principle, the decrease in signal-to-noise ratio will degrade the repeatability of the depth calculation. The uniformity assumption of the emitted IR onto the target object also causes spatial distortion of the calculated depth. In other words, each sensor pixel will collect reflected IR of different amplitudes even when reflected from surfaces at identical distances. Daylight interference is another very critical issue for the practicality of the sensor. Any frequency of IR signal emitted from the sensor will also exist in daylight, which acts as noise with regard to the correct depth calculation. Scattering problems (Mure-Dubois and Hugli, 2007) within the lens and sensor architecture are a major problem for depth sensors owing to their low signal-to-noise ratio.
1.3 Structured Light Depth Camera
Kinect, a famous consumer depth camera in the market, is a structured IR light-type depth sensor, a well-known 3D geometry acquisition technology. It is composed of an IR emitter, an IR sensor, and a color sensor, providing an IR amplitude image, a depth map, and a color image. Basically, this technology utilizes conventional color sensor technology with relatively higher resolution. Owing to the limit of the sensing range of the structured light principle, the operating range of this depth sensor is around 1–4 m.
1.3.1 Principle
In this type of sensor, a predetermined IR pattern is emitted onto the target objects (Figure 1.13). The pattern can be a rectangular grid or a set of random dots. A calibrated IR sensor observes the reflected pattern, and the depth at each point is recovered by triangulating the displacement of the observed pattern relative to its expected position.
Figure 1.12 Boundary noise elimination
Figure 1.13 Structured IR light depth camera
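As a concrete illustration of the principle just described, the following is a minimal sketch of rectified depth-from-disparity triangulation; the baseline and focal length values are illustrative placeholders, not Kinect's actual calibration:

import numpy as np

def depth_from_pattern_disparity(disparity_px, baseline_m=0.075, focal_px=580.0):
    """Triangulate depth from the horizontal displacement (in pixels) of the
    projected IR pattern between its expected and observed positions."""
    disparity_px = np.asarray(disparity_px, dtype=np.float64)
    depth = np.zeros_like(disparity_px)
    valid = disparity_px > 0
    # Standard rectified triangulation: Z = f * B / d.
    depth[valid] = focal_px * baseline_m / disparity_px[valid]
    return depth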
Active light sensing technologies assume Lambertian objects as their targets. Both the ToF sensor on the left in Figure 1.14 and the structured light sensor on the right in Figure 1.14 can see the emitted light sources.
However, most real-world objects have non-Lambertian surfaces, including transparency and specularity. Figure 1.15 shows what happens if active light depth sensors see a specular surface and why the specular object is challenging to handle. Unlike Lambertian materials, specular objects reflect the incoming light into a limited range of directions. Let us assume that the convex surfaces in Figure 1.15 are mirrors, which reflect incoming light rays toward a very narrow direction.
Figure 1.14 Lambertian surface
Figure 1.15 Depth of a specular surface
If the sensor is not located on the exact direction of the reflected ray from the mirror surface, the sensor does not receive any reflected IR and it is impossible to calculate any depth information. On the other hand, if the sensor is located exactly on the mirror reflection direction, the sensor will receive an excessive amount of concentrated IR light, causing saturation in the measurement. Consequently, the sensors fail to receive the reflected light in a sensible range. Such a phenomenon results in missing measurements for both types of sensors.
Figure 1.16 shows samples of specular objects taken by a ToF depth sensor. The first object is a mirror where the flat area is all specular surfaces. The second object shows specularity in a small region. The sensor in the first case is not on the mirror reflection direction and no saturation is observed. However, the depth within the mirror region is not correct. The sensor in the second case is on the mirror reflection direction and saturation is observed in the intensity image. This leads to wrong depth calculation.
Figure 1.17 shows samples of specular objects taken by a structured light depth sensor. The mirror region of the first case shows the depth of the reflected surface. The second case also shows a specular region and leads to miscalculation of depth.
In Figure 1.18 we demonstrate how a transparent object affects the sensor measurement. Considering transparent objects with a background, depth sensors receive reflected light from both the foreground and the background (Figure 1.19). The mixture of reflected light from foreground and background misleads the depth measurement. Depending on the sensor type, however, the characteristics of the errors vary. Since a ToF sensor computes the depth of a transparent object based on the mixture of reflected IR from foreground and background, the depth measurement includes some bias toward the background.
Figure 1.16 Specular object examples of ToF sensor
Figure 1.17 Specular object examples of structured IR sensor
For the structured light sensor, the active light patterns are used to provide the correspondences between the sensor and the projector. With transparent objects, the measurement errors cause a mismatch in the correspondences and yield data loss.
In general, a multipath problem (Fuchs, 2010) similar to the transparent depth case occurs with concave objects, as illustrated in Figure 1.20. In the left image in Figure 1.20, two different IR paths having different flight times from the IR LEDs to the sensor can arrive at the same sensor pixel. The path of the ray reflected twice on the concave surface is a spurious IR signal and distracts from the correct depth calculation of the point using the ray whose path is reflected only once. In principle, a structured light sensor suffers from a similar problem.
Figure 1.18 Depth of transparent object
Figure 1.19 Transparent object examples
Figure 1.20 Multipath problem