This book is printed on acid-free paper.

Copyright © 2009 by Elsevier Inc. All rights reserved.

Designations used by companies to distinguish their products are often claimed as trademarks or registered trademarks. In all instances in which Academic Press is aware of a claim, the product names appear in initial capital or all capital letters. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, scanning, or otherwise, without prior written permission of the publisher.

Permissions may be sought directly from Elsevier's Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail: permissions@elsevier.com. You may also complete your request on-line via the Elsevier homepage (http://elsevier.com), by selecting "Support & Contact," then "Copyright and Permission," and then "Obtaining Permissions."
Library of Congress Cataloging-in-Publication Data
Application submitted
ISBN 13: 978-0-12-374370-1
For information on all Academic Press publications,
visit our Website at www.books.elsevier.com
Printed in the United States
08 09 10 11 12 10 9 8 7 6 5 4 3 2 1
I cannot help but find striking resemblances between scientific communities and schools of fish. We interact in conferences and through articles, and we move together while a global trajectory emerges from individual contributions. Some of us like to be at the center of the school, others prefer to wander around, and a few swim in multiple directions in front. To avoid dying of starvation in a progressively narrower and specialized domain, a scientific community also needs to move on. Computational harmonic analysis is still very much alive because it went beyond wavelets. Writing such a book is about decoding the trajectory of the school and gathering the pearls that have been uncovered on the way. Wavelets are no longer the central topic, despite the previous edition's original title. They are just an important tool, as the Fourier transform is. Sparse representation and processing are now at the core.
In the 1980s, many researchers were focused on building time-frequency decompositions, trying to avoid the uncertainty barrier, and hoping to discover the ultimate representation. Along the way came the construction of wavelet orthogonal bases, which opened new perspectives through collaborations with physicists and mathematicians. Designing orthogonal bases with Xlets became a popular sport, with compression and noise-reduction applications. Connections with approximations and sparsity also became more apparent. The search for sparsity has taken over, leading to new grounds where orthonormal bases are replaced by redundant dictionaries of waveforms.

During these last seven years, I also encountered the industrial world. With a lot of naiveté, some bandlets, and more mathematics, I cofounded a start-up with Christophe Bernard, Jérome Kalifa, and Erwan Le Pennec. It took us some time to learn that in three months good engineering should produce robust algorithms that operate in real time, as opposed to the three years we were used to having for writing new ideas with promising perspectives. Yet, we survived because mathematics is a major source of industrial innovations for signal processing. Semiconductor technology offers amazing computational power and flexibility. However, ad hoc algorithms often do not scale easily, and mathematics accelerates the trial-and-error development process. Sparsity decreases computations, memory, and data communications. Although it brings beauty, mathematical understanding is not a luxury. It is required by increasingly sophisticated information-processing devices.
New Additions
Putting sparsity at the center of the book implied rewriting many parts and adding sections. Chapters 12 and 13 are new. They introduce sparse representations in redundant dictionaries, and inverse problems, super-resolution, and compressive sensing. Here is a small catalog of new elements in this third edition:
■ Radon transform and tomography
■ Lifting for wavelets on surfaces, bounded domains, and fast computations
■ JPEG-2000 image compression
■ Block thresholding for denoising
■ Geometric representations with adaptive triangulations, curvelets, and bandlets
■ Sparse approximations in redundant dictionaries with pursuit algorithms
■ Noise reduction with model selection in redundant dictionaries
■ Exact recovery of sparse approximation supports in dictionaries
■ Multichannel signal representations and processing
This book is intended as a graduate-level textbook. Its evolution is also the result of teaching courses in electrical engineering and applied mathematics. A new website provides software for reproducible experimentations, exercise solutions, together with teaching material such as slides with figures and MATLAB software for numerical classes: http://wavelet-tour.com.
More exercises have been added at the end of each chapter, ordered by level of difficulty. Level 1 exercises are direct applications of the course. Level 2 exercises require more thinking. Level 3 includes some technical derivation exercises. Level 4 exercises are projects at the interface of research that are possible topics for a final course project or independent study. More exercises and projects can be found on the website.
Sparse Course Programs
The Fourier transform and analog-to-digital conversion through linear sampling approximations provide a common ground for all courses (Chapters 2 and 3). These chapters introduce basic signal representations and review important mathematical and algorithmic tools needed afterward. Many trajectories are then possible to explore and teach sparse signal processing. The following list notes several topics that can orient a course's structure, with elements that can be covered along the way.
Sparse representations with bases and applications:
■ Principles of linear and nonlinear approximations in bases (Chapter 9)
■ Lipschitz regularity and wavelet coefficients decay (Chapter 6)
■ Wavelet bases (Chapter 7)
■ Properties of linear and nonlinear wavelet basis approximations (Chapter 9)
■ Image wavelet compression (Chapter 10)
■ Linear and nonlinear diagonal denoising (Chapter 11)
Sparse time-frequency representations:
■ Time-frequency wavelet and windowed Fourier ridges for audio processing
(Chapter 4)
■ Local cosine bases (Chapter 8)
■ Linear and nonlinear approximations in bases (Chapter 9)
■ Audio compression (Chapter 10)
■ Audio denoising and block thresholding (Chapter 11)
■ Compression and denoising in redundant time-frequency dictionaries with
best bases or pursuit algorithms (Chapter 12)
Sparse signal estimation:
■ Bayes versus minimax and linear versus nonlinear estimations (Chapter 11)
■ Wavelet bases (Chapter 7)
■ Linear and nonlinear approximations in bases (Chapter 9)
■ Thresholding estimation (Chapter 11)
■ Minimax optimality (Chapter 11)
■ Model selection for denoising in redundant dictionaries (Chapter 12)
■ Compressive sensing (Chapter 13)
Sparse compression and information theory:
■ Wavelet orthonormal bases (Chapter 7)
■ Linear and nonlinear approximations in bases (Chapter 9)
■ Compression and sparse transform codes in bases (Chapter 10)
■ Compression in redundant dictionaries (Chapter 12)
■ Compressive sensing (Chapter 13)
■ Source separation (Chapter 13)
Dictionary representations and inverse problems:
■ Frames and Riesz bases (Chapter 5)
■ Linear and nonlinear approximations in bases (Chapter 9)
■ Ideal redundant dictionary approximations (Chapter 12)
■ Pursuit algorithms and dictionary incoherence (Chapter 12)
■ Linear and thresholding inverse estimators (Chapter 13)
■ Super-resolution and source separation (Chapter 13)
■ Compressive sensing (Chapter 13)
Trang 7Geometric sparse processing:
■ Time-frequency spectral lines and ridges (Chapter 4)
■ Frames and Riesz bases (Chapter 5)
■ Multiscale edge representations with wavelet maxima (Chapter 6)
■ Sparse approximation supports in bases (Chapter 9)
■ Approximations with geometric regularity, curvelets, and bandlets (Chapters 9 and 12)
■ Sparse signal compression and geometric bit budget (Chapters 10 and 12)
■ Exact recovery of sparse approximation supports (Chapter 12)
■ Super-resolution (Chapter 13)
ACKNOWLEDGMENTS
Some things do not change with new editions, in particular the traces left by the ones who were, and remain, important references for me. As always, I am deeply grateful to Ruzena Bajcsy and Yves Meyer.

I spent the last few years with three brilliant and kind colleagues—Christophe Bernard, Jérome Kalifa, and Erwan Le Pennec—in a pressure cooker called a "start-up." Pressure means stress, despite very good moments. The resulting sauce was a blend of what all of us could provide, which brought new flavors to our personalities. I am thankful to them for the ones I got, some of which I am still discovering.

This new edition is the result of a collaboration with Gabriel Peyré, who made these changes not only possible, but also very interesting to do. I thank him for his remarkable work and help.

Stéphane Mallat
NOTATION

$\langle f, g\rangle$ — Inner product (A.6)
$\|f\|$ — Euclidean or Hilbert space norm
$\|f\|_1$ — $L^1$ or $\ell^1$ norm
$\|f\|_\infty$ — $L^\infty$ norm
$f[n] = O(g[n])$ — Order of: there exists $K$ such that $f[n] \le K g[n]$
$f[n] = o(g[n])$ — Small order of: $\lim_{n\to+\infty} f[n]/g[n] = 0$
$f[n] \sim g[n]$ — Equivalent to: $f[n] = O(g[n])$ and $g[n] = O(f[n])$
$\mathbf{C}^0$ — Uniformly continuous functions (7.207)
$\mathbf{C}^p$ — $p$ times continuously differentiable functions
$\mathbf{C}^\infty$ — Infinitely differentiable functions
$\mathbf{W}^s(\mathbb{R})$ — Sobolev $s$ times differentiable functions (9.8)
$\mathbf{L}^2(\mathbb{R})$ — Finite energy functions
$\mathbb{C}^N$ — Complex signals of size $N$
$U \otimes V$ — Tensor product of two vector spaces (A.19)
$\mathrm{Null}\,U$ — Null space of an operator $U$
$\mathrm{Im}\,U$ — Image space of an operator $U$
$\hat f[k]$ — Discrete Fourier transform (3.49)
$Sf(u, \xi)$ — Short-time windowed Fourier transform (4.11)
Sparse representations decompose signals over elementary waveforms chosen in a family called a dictionary. But the search for the Holy Grail of an ideal sparse transform adapted to all signals is a hopeless quest. The discovery of wavelet orthogonal bases and local time-frequency dictionaries has opened the door to a huge jungle of new transforms. Adapting sparse representations to signal properties, and deriving efficient processing operators, is therefore a necessary survival strategy.

An orthogonal basis is a dictionary of minimum size that can yield a sparse representation if designed to concentrate the signal energy over a set of few vectors. This set gives a geometric signal description. Efficient signal compression and noise-reduction algorithms are then implemented with diagonal operators computed with fast algorithms. But this is not always optimal.

In natural languages, a richer dictionary helps to build shorter and more precise sentences. Similarly, dictionaries of vectors that are larger than bases are needed to build sparse representations of complex signals. But choosing is difficult and requires more complex algorithms. Sparse representations in redundant dictionaries can improve pattern recognition, compression, and noise reduction, but also the resolution of new inverse problems. This includes super-resolution, source separation, and compressive sensing.

This first chapter is a sparse book representation, providing the story line and the main ideas. It gives a sense of orientation for choosing a path to travel.
1.1 COMPUTATIONAL HARMONIC ANALYSIS
Fourier and wavelet bases are the journey's starting point. They decompose signals over oscillatory waveforms that reveal many signal properties and provide a path to sparse representations. Discretized signals often have a very large size $N \ge 10^6$, and thus can only be processed by fast algorithms, typically implemented with $O(N \log N)$ operations and memories. Fourier and wavelet transforms illustrate the strong connection between well-structured mathematical tools and fast algorithms.
1.1.1 The Fourier Kingdom
The Fourier transform is everywhere in physics and mathematics because it diagonalizes time-invariant convolution operators. It rules over linear time-invariant signal processing, the building blocks of which are frequency filtering operators.
Fourier analysis represents any finite energy function $f(t)$ as a sum of sinusoidal waves $e^{i\omega t}$:
$$f(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat f(\omega)\, e^{i\omega t}\, d\omega .$$
The amplitude $\hat f(\omega)$ of each sinusoidal wave $e^{i\omega t}$ is equal to its correlation with $f$, also called the Fourier transform:
$$\hat f(\omega) = \int_{-\infty}^{+\infty} f(t)\, e^{-i\omega t}\, dt .$$
The more regular $f(t)$, the faster the decay of the sinusoidal wave amplitude $|\hat f(\omega)|$ when the frequency $\omega$ increases.
When $f(t)$ is defined only on an interval, say $[0, 1]$, then the Fourier transform becomes a decomposition in a Fourier orthonormal basis $\{e^{i2\pi mt}\}_{m \in \mathbb{Z}}$ of $\mathbf{L}^2[0, 1]$. If $f(t)$ is uniformly regular, then its Fourier coefficients also have a fast decay when the frequency $2\pi m$ increases, so it can be easily approximated with few low-frequency Fourier coefficients. The Fourier transform therefore defines a sparse representation of uniformly regular functions.
Over discrete signals, the Fourier transform is a decomposition in a discrete orthogonal Fourier basis $\{e^{i2\pi kn/N}\}_{0 \le k < N}$ of $\mathbb{C}^N$, which has properties similar to a Fourier transform on functions. Its embedded structure leads to fast Fourier transform (FFT) algorithms, which compute discrete Fourier coefficients with $O(N \log N)$ operations instead of $N^2$. This FFT algorithm is a cornerstone of discrete signal processing.
As long as we are satisfied with linear time-invariant operators or uniformly regular signals, the Fourier transform provides simple answers to most questions. Its richness makes it suitable for a wide range of applications such as signal transmissions or stationary signal processing. However, to represent a transient phenomenon—a word pronounced at a particular time, an apple located in the left corner of an image—the Fourier transform becomes a cumbersome tool that requires many coefficients to represent a localized event. Indeed, the support of $e^{i\omega t}$ covers the whole real line, so $\hat f(\omega)$ depends on the values $f(t)$ for all times $t \in \mathbb{R}$. This global "mix" of information makes it difficult to analyze or represent any local property of $f(t)$ from $\hat f(\omega)$.
1.1.2 Wavelet Bases
Wavelet bases, like Fourier bases, reveal the signal regularity through the amplitude of coefficients, and their structure leads to a fast computational algorithm. However, wavelets are well localized and few coefficients are needed to represent local transient structures. As opposed to a Fourier basis, a wavelet basis defines a sparse representation of piecewise regular signals, which may include transients and singularities. In images, large wavelet coefficients are located in the neighborhood of edges and irregular textures.
The story began in 1910, when Haar [291] constructed a piecewise constant function
$$\psi(t) = \begin{cases} 1 & 0 \le t < 1/2 \\ -1 & 1/2 \le t < 1 \\ 0 & \text{otherwise,} \end{cases}$$
whose dilations and translations
$$\psi_{j,n}(t) = \frac{1}{\sqrt{2^j}}\, \psi\!\left(\frac{t - 2^j n}{2^j}\right)$$
generate an orthonormal basis of the space of finite energy signals. Let us write $\langle f, g\rangle = \int_{-\infty}^{+\infty} f(t)\, g^*(t)\, dt$—the inner product in $\mathbf{L}^2(\mathbb{R})$. Any finite energy signal $f$ can thus be represented by its wavelet inner-product coefficients $\langle f, \psi_{j,n}\rangle$.

Each Haar wavelet $\psi_{j,n}(t)$ has a zero average over its support $[2^j n,\, 2^j(n+1)]$. If $f$ is locally regular and $2^j$ is small, then it is nearly constant over this interval and the wavelet coefficient $\langle f, \psi_{j,n}\rangle$ is nearly zero. This means that large wavelet coefficients are located at sharp signal transitions only.
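As an illustration, the following minimal sketch (Python/NumPy, not code from the book) computes Haar wavelet coefficients by cascading the orthonormal average/difference step, and verifies that for a piecewise constant signal only the wavelets straddling the jump have nonzero coefficients:

```python
import numpy as np

def haar_step(a):
    """One Haar analysis step: orthonormal averages and differences of pairs."""
    return (a[0::2] + a[1::2]) / np.sqrt(2), (a[0::2] - a[1::2]) / np.sqrt(2)

# Piecewise constant signal with a single jump at n = 301.
f = np.zeros(512)
f[301:] = 1.0

a = f
for j in range(3):  # three finest scales
    a, d = haar_step(a)
    # At each scale, exactly one wavelet straddles the jump.
    print(j, np.flatnonzero(np.abs(d) > 1e-12))  # -> [150], [75], [37]
```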
With a jump in time, the story continues in 1980, when Strömberg [449] found a piecewise linear function that also generates an orthonormal basis and gives better approximations of smooth functions. Meyer was not aware of this result, and, motivated by the work of Morlet and Grossmann on the continuous wavelet transform, he tried to prove that there exists no regular wavelet that generates an orthonormal basis. This attempt was a failure since he ended up constructing a whole family of orthonormal wavelet bases, with functions that are infinitely continuously differentiable [375]. This was the fundamental impulse that led to a widespread search for new orthonormal wavelet bases, which culminated in the celebrated Daubechies wavelets of compact support [194].
The systematic theory for constructing orthonormal wavelet bases was established by Meyer and Mallat through the elaboration of multiresolution signal approximations [362], as presented in Chapter 7. It was inspired by original ideas developed in computer vision by Burt and Adelson [126] to analyze images at several resolutions. Digging deeper into the properties of orthogonal wavelets and multiresolution approximations brought to light a surprising link with filter banks constructed with conjugate mirror filters, and a fast wavelet transform algorithm decomposing signals of size $N$ with $O(N)$ operations [361].
Filter Banks
Motivated by speech compression, in 1976 Croisier, Esteban, and Galand [189] introduced an invertible filter bank, which decomposes a discrete signal $f[n]$ into two signals of half its size using a filtering and subsampling procedure. They showed that $f[n]$ can be recovered from these subsampled signals by canceling the aliasing terms with a particular class of filters called conjugate mirror filters. This breakthrough led to a 10-year research effort to build a complete filter bank theory. Necessary and sufficient conditions for decomposing a signal in subsampled components with a filtering scheme, and recovering the same signal with an inverse transform, were established by Smith and Barnwell [444], Vaidyanathan [469], and Vetterli [471].
The multiresolution theory of Mallat [362] and Meyer [44] proves that any conjugate mirror filter characterizes a wavelet $\psi$ that generates an orthonormal basis of $\mathbf{L}^2(\mathbb{R})$, and that a fast discrete wavelet transform is implemented by cascading these conjugate mirror filters [361]. The equivalence between this continuous-time wavelet theory and discrete filter banks led to a new fruitful interface between digital signal processing and harmonic analysis, first creating a culture shock that is now well resolved.
Continuous versus Discrete and Finite
Originally, many signal processing engineers were wondering what is the point of considering wavelets and signals as functions, since all computations are performed over discrete signals with conjugate mirror filters. Why bother with the convergence of infinite convolution cascades if in practice we only compute a finite number of convolutions? Answering these important questions is necessary in order to understand why this book alternates between theorems on continuous-time functions and discrete algorithms applied to finite sequences.

A short answer would be "simplicity." In $\mathbf{L}^2(\mathbb{R})$, a wavelet basis is constructed by dilating and translating a single function $\psi$. Several important theorems relate the amplitude of wavelet coefficients to the local regularity of the signal $f$. Dilations are not defined over discrete sequences, and discrete wavelet bases are therefore more complex to describe. The regularity of a discrete sequence is not well defined either, which makes it more difficult to interpret the amplitude of wavelet coefficients. A theory of continuous-time functions gives asymptotic results for discrete sequences with sampling intervals decreasing to zero. This theory is useful because these asymptotic results are precise enough to understand the behavior of discrete algorithms.

But continuous time or space models are not sufficient for elaborating discrete signal-processing algorithms. The transition between continuous and discrete signals must be done with great care to maintain important properties such as orthogonality. Restricting the constructions to finite discrete signals adds another layer of complexity because of border problems. How these border issues affect numerical implementations is carefully addressed once the properties of the bases are thoroughly understood.
Wavelets for Images
Wavelet orthonormal bases of images can be constructed from wavelet orthonormal bases of one-dimensional signals. Three mother wavelets $\psi^1(x)$, $\psi^2(x)$, and $\psi^3(x)$, with $x = (x_1, x_2) \in \mathbb{R}^2$, are dilated by $2^j$ and translated by $2^j n$ with $n = (n_1, n_2) \in \mathbb{Z}^2$. This yields an orthonormal basis
$$\left\{ \psi^k_{j,n}(x) = \frac{1}{2^j}\, \psi^k\!\left(\frac{x - 2^j n}{2^j}\right) \right\}_{j \in \mathbb{Z},\, n \in \mathbb{Z}^2,\, 1 \le k \le 3}$$
of the space $\mathbf{L}^2(\mathbb{R}^2)$ of finite energy functions. The support of a wavelet $\psi^k_{j,n}$ is a square of width proportional to the scale $2^j$. Two-dimensional wavelet bases are discretized to define orthonormal bases of images including $N$ pixels. Wavelet coefficients are calculated with the fast $O(N)$ algorithm described in Chapter 7.

Like in one dimension, a wavelet coefficient $\langle f, \psi^k_{j,n}\rangle$ has a small amplitude if $f(x)$ is regular over the support of $\psi^k_{j,n}$. It has a large amplitude near sharp transitions such as edges. Figure 1.1(b) is the array of $N$ wavelet coefficients. Each direction $k$ and scale $2^j$ corresponds to a subimage, which shows in black the position of the largest coefficients above a threshold: $|\langle f, \psi^k_{j,n}\rangle| \ge T$.
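A minimal sketch of this sparsity, assuming the PyWavelets package (this is not the book's software): counting the 2D Haar coefficients of an image with one sharp edge shows that only a tiny fraction, lined up along the edge, is large:

```python
import numpy as np
import pywt  # PyWavelets, assumed installed

# Image with a single vertical edge.
img = np.zeros((256, 256))
img[:, 129:] = 1.0

coeffs = pywt.wavedec2(img, 'haar', level=4)
total = large = 0
for cH, cV, cD in coeffs[1:]:          # detail subbands, coarse to fine
    for d in (cH, cV, cD):
        total += d.size
        large += np.count_nonzero(np.abs(d) > 0.1)
print(large, "large coefficients out of", total)
```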
1.2 APPROXIMATION AND PROCESSING IN BASES
Analog-to-digital signal conversion is the first step of digital signal processing. Chapter 3 explains that it amounts to projecting the signal over a basis of an approximation space. Most often, the resulting digital representation remains much too large and needs to be further reduced. A digital image typically includes more than $10^6$ samples and a CD music recording has $40 \times 10^3$ samples per second. Sparse representations that reduce the number of parameters can be obtained by thresholding coefficients in an appropriate orthogonal basis. Efficient compression and noise-reduction algorithms are then implemented with simple operators in this basis.
FIGURE 1.1: (a) Original image $f[n]$ of $N$ pixels. (b) Array of $N$ wavelet coefficients. (c) Linear approximation from the $N/16$ wavelet coefficients at the three largest scales. (d) Nonlinear approximation from the $M = N/16$ wavelet coefficients of largest amplitude shown in (b).
Stochastic versus Deterministic Signal Models
A representation is optimized relative to a signal class, corresponding to all potential signals encountered in an application. This requires building signal models that carry available prior information.

A signal $f$ can be modeled as a realization of a random process $F$, the probability distribution of which is known a priori. A Bayesian approach then tries to minimize the expected approximation error. Linear approximations are simpler because they only depend on the covariance. Chapter 9 shows that optimal linear approximations are obtained on the basis of principal components that are the eigenvectors of the covariance matrix. However, the expected error of nonlinear approximations depends on the full probability distribution of $F$. This distribution is most often not known for complex signals, such as images or sounds, because their transient structures are not adequately modeled as realizations of known processes such as Gaussian ones.

To optimize nonlinear representations, weaker but sufficiently powerful deterministic models can be elaborated. A deterministic model specifies a set $\Theta$ where the signal belongs. This set is defined by any prior information—for example, on the time-frequency localization of transients in musical recordings or on the geometric regularity of edges in images. Simple models can also define $\Theta$ as a ball in a functional space, with a specific regularity norm such as a total variation norm. A stochastic model is richer because it provides the probability distribution in $\Theta$. When this distribution is not available, the average error cannot be calculated and is replaced by the maximum error over $\Theta$. Optimizing the representation then amounts to minimizing this maximum error, which is called a minimax optimization.
1.2.1 Sampling with Linear Approximations
Analog-to-digital signal conversion is most often implemented with a linear approximation operator that filters and samples the input analog signal. From these samples, a linear digital-to-analog converter recovers a projection of the original analog signal over an approximation space whose dimension depends on the sampling density. Linear approximations project signals in spaces of lowest possible dimensions to reduce computations and storage cost, while controlling the resulting error. The analog signal $\bar f(x)$ is filtered by a low-pass impulse response $\bar\varphi_s$ and uniformly sampled at intervals $s$, which yields the samples $f[n] = \bar f \star \bar\varphi_s(ns)$. In two dimensions, $n = (n_1, n_2)$ and $x = (x_1, x_2)$. These filtered samples can also be written as inner products:
$$\bar f \star \bar\varphi_s(ns) = \int \bar f(u)\, \bar\varphi_s(ns - u)\, du = \langle \bar f(x),\, \varphi_s(x - ns)\rangle$$
with $\varphi_s(x) = \bar\varphi_s(-x)$. Chapter 3 explains that $\varphi_s$ is chosen, like in the classic Shannon–Whittaker sampling theorem, so that the family of functions $\{\varphi_s(x - ns)\}_{1 \le n \le N}$ is a basis of an appropriate approximation space $\mathbf{U}_N$. The best linear approximation of $\bar f$ in $\mathbf{U}_N$ recovered from these samples is the orthogonal projection $\bar f_N$ of $\bar f$ in $\mathbf{U}_N$, and if the basis is orthonormal, then
$$\bar f_N(x) = \sum_{n=0}^{N-1} f[n]\, \varphi_s(x - ns). \tag{1.4}$$
A sampling theorem states that if $\bar f \in \mathbf{U}_N$, then $\bar f = \bar f_N$, so (1.4) recovers $\bar f(x)$ from the measured samples. Most often, $\bar f$ does not belong to this approximation space. The resulting error is called aliasing in the context of Shannon–Whittaker sampling, where $\mathbf{U}_N$ is the space of functions having a frequency support restricted to the $N$ lower frequencies. The approximation error $\|\bar f - \bar f_N\|^2$ must then be controlled.
Linear Approximation Error
The approximation error is computed by finding an orthogonal basis $\mathcal{B} = \{\bar g_m(x)\}_{0 \le m < +\infty}$ of the whole analog signal space $\mathbf{L}^2[0, 1]^2$, whose first $N$ vectors $\{\bar g_m(x)\}_{0 \le m < N}$ define an orthogonal basis of $\mathbf{U}_N$. Thus, the orthogonal projection on $\mathbf{U}_N$ can be rewritten as
$$\bar f_N(x) = \sum_{m=0}^{N-1} \langle \bar f, \bar g_m\rangle\, \bar g_m(x) ,$$
and the resulting error is
$$\epsilon_l(N, f) = \|\bar f - \bar f_N\|^2 = \sum_{m=N}^{+\infty} |\langle \bar f, \bar g_m\rangle|^2 .$$
This error decreases quickly when $N$ increases if the coefficient amplitudes $|\langle \bar f, \bar g_m\rangle|$ have a fast decay when the index $m$ increases. The dimension $N$ is adjusted to the desired approximation error.
Figure 1.1(a) shows a discrete image $f[n]$ approximated with $N = 256^2$ pixels. Figure 1.1(c) displays a lower-resolution image $f_{N/16}$ projected on a space $\mathbf{U}_{N/16}$ of dimension $N/16$, generated by $N/16$ large-scale wavelets. It is calculated by setting all the wavelet coefficients to zero at the first two smaller scales. The approximation error is $\|f - f_{N/16}\|^2 / \|f\|^2 = 14 \times 10^{-3}$. Reducing the resolution introduces more blur and errors. A linear approximation space $\mathbf{U}_N$ corresponds to a uniform grid that approximates precisely uniformly regular signals. Since images $\bar f$ are often not uniformly regular, it is necessary to measure them at a high resolution $N$. This is why digital cameras have a resolution that increases as technology improves.
1.2.2 Sparse Nonlinear Approximations
Linear approximations reduce the space dimensionality but can introduce important errors when reducing the resolution if the signal is not uniformly regular, as shown by Figure 1.1(c). To improve such approximations, more coefficients should be kept where needed—not in regular regions but near sharp transitions and edges. This requires defining an irregular sampling adapted to the local signal regularity. This optimized irregular sampling has a simple equivalent solution through nonlinear approximations in wavelet bases.
Nonlinear approximations operate in two stages. First, a linear operator approximates the analog signal $\bar f$ with $N$ samples written $f[n] = \bar f \star \bar\varphi_s(ns)$. Then, a nonlinear approximation of $f[n]$ is computed to reduce the $N$ coefficients $f[n]$ to $M \ll N$ coefficients in a sparse representation.
The discrete signal $f$ can be considered as a vector of $\mathbb{C}^N$. Inner products and norms in $\mathbb{C}^N$ are written
$$\langle f, g\rangle = \sum_{n=0}^{N-1} f[n]\, g^*[n] \quad\text{and}\quad \|f\|^2 = \sum_{n=0}^{N-1} |f[n]|^2 .$$
To obtain a sparse representation in an orthonormal basis $\mathcal{B} = \{g_m\}_{m \in \Gamma}$, the signal coefficients $\langle f, g_m\rangle$ are computed from the $N$ input sample values $f[n]$ with an orthogonal change of basis that takes $N^2$ operations in nonstructured bases. In a wavelet or Fourier basis, fast algorithms require, respectively, $O(N)$ and $O(N \log_2 N)$ operations.
Approximation by Thresholding
For $M < N$, an approximation $f_M$ is computed by selecting the "best" $M < N$ vectors within $\mathcal{B}$. The orthogonal projection of $f$ on the space $\mathbf{V}_\Lambda$ generated by $M$ vectors $\{g_m\}_{m \in \Lambda}$ in $\mathcal{B}$ is
$$f_\Lambda = \sum_{m \in \Lambda} \langle f, g_m\rangle\, g_m .$$
We write $|\Lambda|$ for the size of the set $\Lambda$. The best $M = |\Lambda|$ term approximation, which minimizes $\|f - f_\Lambda\|^2$, is thus obtained by selecting the $M$ coefficients of largest amplitude. These coefficients are above a threshold $T$ that depends on $M$:
$$f_M = f_{\Lambda_T} = \sum_{m \in \Lambda_T} \langle f, g_m\rangle\, g_m \quad\text{with}\quad \Lambda_T = \{ m \in \Gamma : |\langle f, g_m\rangle| \ge T \}. \tag{1.7}$$
This approximation is nonlinear because the approximation set $\Lambda_T$ changes with $f$. The resulting approximation error is
$$\epsilon_n(M, f) = \|f - f_M\|^2 = \sum_{m \notin \Lambda_T} |\langle f, g_m\rangle|^2 .$$
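A minimal sketch of this best $M$-term approximation (Python/NumPy, not code from the book), working directly on the coefficients in an orthonormal basis, where the error is the sum of the squared discarded coefficients:

```python
import numpy as np

def best_m_term(coeffs, M):
    """Keep the M largest-amplitude coefficients (the support Lambda_T)."""
    approx = np.zeros_like(coeffs)
    idx = np.argsort(np.abs(coeffs))[-M:]
    approx[idx] = coeffs[idx]
    return approx

rng = np.random.default_rng(0)
# Coefficients with fast decay, as produced by a regular signal.
a = rng.standard_normal(1024) * np.exp(-np.arange(1024) / 50.0)
a_M = best_m_term(a, M=64)
# In an orthonormal basis this equals ||f - f_M||^2 by Parseval's formula.
print(np.sum((a - a_M) ** 2) / np.sum(a ** 2))
```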
Thresholding wavelet coefficients is equivalent to constructing an adaptive approximation grid specified by the scale–space support $\Lambda_T$. It increases the approximation resolution where the signal is irregular. The geometry of $\Lambda_T$ gives the spatial distribution of sharp image transitions and edges, and their propagation across scales. Chapter 6 proves that wavelet coefficients give important information about singularities and local Lipschitz regularity. This example illustrates how an approximation support provides "geometric" information on $f$, relative to a dictionary—a wavelet basis in this example.
Figure 1.1(d) gives the nonlinear wavelet approximation $f_M$ recovered from the $M = N/16$ largest-amplitude wavelet coefficients, with an error $\|f - f_M\|^2 / \|f\|^2 = 5 \times 10^{-3}$. This error is nearly three times smaller than the linear approximation error obtained with the same number of wavelet coefficients, and the image quality is much better.
Since all projections are orthogonal, the overall approximation error on the original analog signal $\bar f(x)$ is the sum of the analog sampling error and the discrete nonlinear error:
$$\|\bar f - \bar f_M\|^2 = \|\bar f - \bar f_N\|^2 + \|f - f_M\|^2 = \epsilon_l(N, f) + \epsilon_n(M, f).$$
In practice, $N$ is imposed by the resolution of the signal-acquisition hardware, and $M$ is typically adjusted so that $\epsilon_n(M, f) \ge \epsilon_l(N, f)$.
Sparsity with Regularity
Sparse representations are obtained in a basis that takes advantage of some form of regularity of the input signals, creating many small-amplitude coefficients. Since wavelets have localized support, functions with isolated singularities produce few large-amplitude wavelet coefficients in the neighborhood of these singularities. Nonlinear wavelet approximation produces a small error over spaces of functions that do not have "too many" sharp transitions and singularities. Chapter 9 shows that functions having a bounded total variation norm are useful models for images with nonfractal (finite length) edges.

Edges often define regular geometric curves. Wavelets detect the location of edges, but their square support cannot take advantage of their potential geometric regularity. Sparser representations are defined in dictionaries of curvelets or bandlets, which have elongated support in multiple directions and can be adapted to this geometrical regularity. In such dictionaries, the approximation support $\Lambda_T$ is smaller but provides explicit information about edges' local geometrical properties, such as their orientation. In this context, geometry does not just apply to multidimensional signals. Audio signals, such as musical recordings, also have a complex geometric regularity in time-frequency dictionaries.
1.2.3 Compression
Storage limitations and fast transmission through narrow-bandwidth channels require compression of signals while minimizing degradation. Transform codes compress signals by coding a sparse representation. Chapter 10 introduces the information theory needed to understand these codes and to optimize their performance.

In a compression framework, the analog signal has already been discretized into a signal $f[n]$ of size $N$. This discrete signal is decomposed in an orthonormal basis $\mathcal{B} = \{g_m\}_{m \in \Gamma}$ of $\mathbb{C}^N$:
$$f = \sum_{m \in \Gamma} \langle f, g_m\rangle\, g_m .$$
Coefficients $\langle f, g_m\rangle$ are approximated by quantized values $Q(\langle f, g_m\rangle)$. If $Q$ is a uniform quantizer of step $\Delta$, then $|x - Q(x)| \le \Delta/2$, and if $|x| < \Delta/2$, then $Q(x) = 0$. The signal $\tilde f$ restored from quantized coefficients is
$$\tilde f = \sum_{m \in \Gamma} Q(\langle f, g_m\rangle)\, g_m .$$
The coefficients not quantized to zero correspond to the set $\Lambda_T = \{ m \in \Gamma : |\langle f, g_m\rangle| \ge T \}$ with $T = \Delta/2$. For sparse signals, Chapter 10 shows that the bit budget $R$ is dominated by the number of bits needed to code $\Lambda_T$ in $\Gamma$, which is nearly proportional to its size $|\Lambda_T|$. This means that the "information" about a sparse representation is mostly geometric. Moreover, the distortion is dominated by the nonlinear approximation error $\|f - f_{\Lambda_T}\|^2$, for $f_{\Lambda_T} = \sum_{m \in \Lambda_T} \langle f, g_m\rangle g_m$. Compression is thus a sparse approximation problem. For a given distortion $d(R, f)$, minimizing $R$ requires reducing $|\Lambda_T|$ and thus optimizing the sparsity.
The number of bits needed to code $\Lambda_T$ can take advantage of any prior information on the geometry. Figure 1.1(b) shows that large wavelet coefficients are not randomly distributed. They have a tendency to be aggregated toward larger scales, and at fine scales they are regrouped along edge curves or in texture regions. Using such prior geometric models is a source of gain in coders such as JPEG-2000.

Chapter 10 describes the implementation of audio transform codes. Image transform codes in block cosine bases and wavelet bases are introduced, together with the JPEG and JPEG-2000 compression standards.
1.2.4 Denoising
Signal-acquisition devices add noise that can be reduced by estimators using prior information on signal properties. Signal processing has long remained mostly Bayesian and linear. Nonlinear smoothing algorithms existed in statistics, but these procedures were often ad hoc and complex. Two statisticians, Donoho and Johnstone [221], changed the "game" by proving that simple thresholding in sparse representations can yield nearly optimal nonlinear estimators. This was the beginning of a considerable refinement of nonlinear estimation algorithms that is still ongoing.

Let us consider digital measurements that add a random noise $W[n]$ to the original signal $f[n]$:
$$X[n] = f[n] + W[n], \quad 0 \le n < N.$$
The signal $f$ is estimated by transforming the noisy data $X$ with an operator $D$, which yields the estimator $\tilde F = DX$, whose risk is the average error $r(D, f) = E\{\|f - DX\|^2\}$.
Bayes versus Minimax
To optimize the estimation operator $D$, one must take advantage of prior information available about the signal $f$. In a Bayes framework, $f$ is considered a realization of a random vector $F$, and the Bayes risk is the expected risk calculated with respect to the prior probability distribution $\pi$ of the random signal model $F$:
$$r(D, \pi) = E_\pi\{ r(D, F) \}.$$
Optimizing $D$ among all possible operators yields the minimum Bayes risk:
$$r_n(\pi) = \inf_{\text{all } D} r(D, \pi).$$
In the 1940s, Wald brought in a new perspective on statistics with a decision theory partly imported from the theory of games. This point of view uses deterministic models, where signals are elements of a set $\Theta$, without specifying their probability distribution in this set. To control the risk for any $f \in \Theta$, we compute the maximum risk:
$$r(D, \Theta) = \sup_{f \in \Theta} r(D, f).$$
The minimax risk is the lower bound computed over all operators $D$:
$$r_n(\Theta) = \inf_{\text{all } D} r(D, \Theta).$$
In practice, the goal is to find an operator $D$ that is simple to implement and yields a risk close to the minimax lower bound.
Thresholding Estimators
It is tempting to restrict calculations to linear operators $D$ because of their simplicity. Optimal linear Wiener estimators are introduced in Chapter 11. Figure 1.2(a) is an image contaminated by Gaussian white noise. Figure 1.2(c) shows an optimized linear filtering estimation $\tilde F = X \star h[n]$, which is therefore diagonal in a Fourier basis $\mathcal{B}$. This convolution operator averages the noise but also blurs the image and keeps low-frequency noise by retaining the image's low frequencies.

FIGURE 1.2: (a) Noisy image $X$. (b) Noisy wavelet coefficients above threshold, $|\langle X, \psi_{j,n}\rangle| \ge T$. (c) Linear estimation $X \star h$. (d) Nonlinear estimator recovered from thresholded wavelet coefficients over several translated bases.
If $f$ has a sparse representation in a dictionary, then projecting $X$ on the vectors of this sparse support can considerably improve linear estimators. The difficulty is identifying the sparse support of $f$ from the noisy data $X$. Donoho and Johnstone [221] proved that, in an orthonormal basis, a simple thresholding of noisy coefficients does the trick. Noisy signal coefficients in an orthonormal basis $\mathcal{B} = \{g_m\}_{m \in \Gamma}$ are thresholded, which yields the estimator
$$\tilde F = \sum_{m \in \tilde\Lambda_T} \langle X, g_m\rangle\, g_m \quad\text{with}\quad \tilde\Lambda_T = \{ m \in \Gamma : |\langle X, g_m\rangle| \ge T \}.$$
The set $\tilde\Lambda_T$ is an estimate of an approximation support of $f$. It is hopefully close to the optimal approximation support $\Lambda_T = \{ m \in \Gamma : |\langle f, g_m\rangle| \ge T \}$.
Figure 1.2(b) shows the estimated approximation set $\tilde\Lambda_T$ of noisy wavelet coefficients, $|\langle X, \psi_{j,n}\rangle| \ge T$, which can be compared to the optimal approximation support $\Lambda_T$ shown in Figure 1.1(b). The estimation in Figure 1.2(d) from wavelet coefficients in $\tilde\Lambda_T$ has considerably reduced the noise in regular regions while keeping the sharpness of edges by preserving large wavelet coefficients. This estimation is improved with a translation-invariant procedure that averages this estimator over several translated wavelet bases. Thresholding wavelet coefficients implements an adaptive smoothing, which averages the data $X$ with a kernel that depends on the estimated regularity of the original signal $f$.
Donoho and Johnstone proved that for Gaussian white noise of variance $\sigma^2$, choosing $T = \sigma\sqrt{2 \log_e N}$ yields a risk $E\{\|f - \tilde F\|^2\}$ of the order of $\|f - f_{\Lambda_T}\|^2$, up to a $\log_e N$ factor. This spectacular result shows that the estimated support $\tilde\Lambda_T$ does nearly as well as the optimal unknown support $\Lambda_T$. The resulting risk is small if the representation is sparse and precise.
The set $\tilde\Lambda_T$ in Figure 1.2(b) "looks" different from the $\Lambda_T$ in Figure 1.1(b) because it has more isolated points. This indicates that some prior information on the geometry of $\Lambda_T$ could be used to improve the estimation. For audio noise reduction, thresholding estimators are applied in sparse representations provided by time-frequency bases. Similar isolated time-frequency coefficients produce a highly annoying "musical noise." Musical noise is removed with a block thresholding that regularizes the geometry of the estimated support $\tilde\Lambda_T$ and avoids leaving isolated points. Block thresholding also improves wavelet estimators.

If $W$ is a Gaussian noise and signals in $\Theta$ have a sparse representation in $\mathcal{B}$, then Chapter 11 proves that thresholding estimators can produce a nearly minimax risk. In particular, wavelet thresholding estimators have a nearly minimax risk for large classes of piecewise smooth signals, including bounded variation images.
1.3 TIME-FREQUENCY DICTIONARIES
Motivated by quantum mechanics, in 1946 the physicist Gabor [267] proposed decomposing signals over dictionaries of elementary waveforms, which he called time-frequency atoms, that have a minimal spread in a time-frequency plane. By showing that such decompositions are closely related to our perception of sounds, and that they exhibit important structures in speech and music recordings, Gabor demonstrated the importance of localized time-frequency signal processing. Beyond sounds, large classes of signals have sparse decompositions as sums of time-frequency atoms selected from appropriate dictionaries. The key issue is to understand how to construct dictionaries with time-frequency atoms adapted to signal properties.
1.3.1 Heisenberg Uncertainty
A time-frequency dictionary $\mathcal{D} = \{\phi_\gamma\}_{\gamma \in \Gamma}$ is composed of waveforms of unit norm $\|\phi_\gamma\| = 1$, which have a narrow localization in time and frequency. The time localization $u$ of $\phi_\gamma$ and its spread around $u$ are defined by
$$u = \int t\, |\phi_\gamma(t)|^2\, dt \quad\text{and}\quad \sigma^2_{t,\gamma} = \int |t - u|^2\, |\phi_\gamma(t)|^2\, dt ,$$
and the frequency localization $\xi$ and spread $\sigma_{\omega,\gamma}$ are defined similarly from the Fourier transform $\hat\phi_\gamma(\omega)$. The Parseval formula
$$\langle f, \phi_\gamma\rangle = \int_{-\infty}^{+\infty} f(t)\, \phi^*_\gamma(t)\, dt = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat f(\omega)\, \hat\phi^*_\gamma(\omega)\, d\omega$$
shows that $\langle f, \phi_\gamma\rangle$ depends mostly on the values $f(t)$ and $\hat f(\omega)$, where $\phi_\gamma(t)$ and $\hat\phi_\gamma(\omega)$ are non-negligible, and hence for $(t, \omega)$ in a rectangle centered at $(u, \xi)$, of size $\sigma_{t,\gamma} \times \sigma_{\omega,\gamma}$. This rectangle is illustrated by Figure 1.3 in the time-frequency plane $(t, \omega)$. It can be interpreted as a "quantum of information" over an elementary resolution cell. The uncertainty principle theorem proves (see Chapter 2) that this rectangle has a minimum surface that limits the joint time-frequency resolution:
$$\sigma_{t,\gamma}\, \sigma_{\omega,\gamma} \ge \frac{1}{2} .$$
1.3.2 Windowed Fourier Transform
A windowed Fourier dictionary is constructed by translating in time and frequency a time window $g(t)$, of unit norm $\|g\| = 1$, centered at $t = 0$:
$$\mathcal{D} = \left\{ g_{u,\xi}(t) = g(t - u)\, e^{i\xi t} \right\}_{(u,\xi) \in \mathbb{R}^2}.$$
The atom $g_{u,\xi}$ is translated by $u$ in time and by $\xi$ in frequency. The time and frequency spread of $g_{u,\xi}$ is independent of $u$ and $\xi$. This means that each atom $g_{u,\xi}$ corresponds to a Heisenberg rectangle that has a size $\sigma_t \times \sigma_\omega$ independent of its position $(u, \xi)$. The windowed Fourier transform of $f$ is
$$Sf(u, \xi) = \langle f, g_{u,\xi}\rangle = \int_{-\infty}^{+\infty} f(t)\, g(t - u)\, e^{-i\xi t}\, dt .$$
It can be interpreted as a Fourier transform of $f$ at the frequency $\xi$, localized by the window $g(t - u)$ in the neighborhood of $u$. This windowed Fourier transform is highly redundant and represents one-dimensional signals by a time-frequency image in $(u, \xi)$. It is thus necessary to understand how to select many fewer time-frequency coefficients that represent the signal efficiently.
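A minimal sketch of the windowed Fourier transform (Python/NumPy, not code from the book): a sliding unit-norm window followed by an FFT; the largest coefficient in each window tracks the ridge $\xi(u)$ of a chirp:

```python
import numpy as np

def windowed_fourier(f, win_len, hop):
    """Sf(u, xi) sampled on a grid: FFT of windowed slices."""
    g = np.hanning(win_len)
    g /= np.linalg.norm(g)  # unit-norm window
    starts = range(0, len(f) - win_len + 1, hop)
    return np.array([np.fft.fft(g * f[u:u + win_len]) for u in starts])

N = 4096
t = np.arange(N) / N
# A chirp: its instantaneous frequency increases with time.
f = np.sin(2 * np.pi * (100 * t + 400 * t ** 2))

S = windowed_fourier(f, win_len=256, hop=128)
# Ridge: the dominant frequency index grows from window to window.
print(np.argmax(np.abs(S[:, :128]), axis=1))
```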
When listening to music, we perceive sounds that have a frequency that varies in time. Chapter 4 shows that a spectral line of $f$ creates high-amplitude windowed Fourier coefficients $Sf(u, \xi)$ at frequencies $\xi(u)$ that depend on time $u$. These spectral components are detected and characterized by ridge points, which are local maxima in this time-frequency plane. Ridge points define a time-frequency approximation support $\Lambda$ of $f$ with a geometry that depends on the time-frequency evolution of the signal's spectral components. Modifications of the sound duration or audio transpositions are implemented by modifying the geometry of the ridge support in time frequency.
A windowed Fourier transform decomposes signals over waveforms that have the same time and frequency resolution. It is thus effective as long as the signal does not include structures having different time-frequency resolutions, some being very localized in time and others very localized in frequency. Wavelets address this issue by changing the time and frequency resolution.
1.3.3 Continuous Wavelet Transform
In reflection seismology, Morlet knew that the waveforms sent underground have a duration that is too long at high frequencies to separate the returns of fine, closely spaced geophysical layers. Such waveforms are called wavelets in geophysics. Instead of emitting pulses of equal duration, he thought of sending shorter waveforms at high frequencies. These waveforms were obtained by scaling the mother wavelet, hence the name of this transform. Although Grossmann was working in theoretical physics, he recognized in Morlet's approach some ideas that were close to his own work on coherent quantum states.

Nearly forty years after Gabor, Morlet and Grossmann reactivated a fundamental collaboration between theoretical physics and signal processing, which led to the formalization of the continuous wavelet transform [288]. These ideas were not totally new to mathematicians working in harmonic analysis, or to computer vision researchers studying multiscale image processing. It was thus only the beginning of a rapid catalysis that brought together scientists with very different backgrounds.
A wavelet dictionary is constructed from a mother wavelet $\psi$ of zero average,
$$\int_{-\infty}^{+\infty} \psi(t)\, dt = 0,$$
which is dilated with a scale parameter $s$ and translated by $u$:
$$\mathcal{D} = \left\{ \psi_{u,s}(t) = \frac{1}{\sqrt{s}}\, \psi\!\left(\frac{t - u}{s}\right) \right\}_{u \in \mathbb{R},\, s > 0}.$$
The continuous wavelet transform of $f$ at any scale $s$ and position $u$ is the projection of $f$ on the corresponding wavelet atom:
$$Wf(u, s) = \langle f, \psi_{u,s}\rangle = \int_{-\infty}^{+\infty} f(t)\, \frac{1}{\sqrt{s}}\, \psi^*\!\left(\frac{t - u}{s}\right) dt .$$
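A minimal discretization of $Wf(u, s)$ with a real Mexican-hat wavelet (Python/NumPy; a sketch, not the book's software), showing that the large-amplitude coefficients concentrate around a sharp transition at every scale:

```python
import numpy as np

def mexican_hat(t):
    """A real wavelet with zero average."""
    return (1 - t ** 2) * np.exp(-t ** 2 / 2)

def cwt(f, scales, dt=1.0):
    out = []
    for s in scales:
        t = np.arange(-4 * s, 4 * s + dt, dt)
        psi = mexican_hat(t / s) / np.sqrt(s)        # psi((t - u)/s) / sqrt(s)
        out.append(np.convolve(f, psi[::-1], mode='same') * dt)
    return np.array(out)

# Step signal: one singularity at n = 500.
f = np.zeros(1000)
f[500:] = 1.0
W = cwt(f, scales=[4, 8, 16, 32])
# |Wf(u, s)| peaks within ~s of the jump and converges to it as s -> 0.
print(np.argmax(np.abs(W), axis=1))
```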
Varying Time-Frequency Resolution
As opposed to windowed Fourier atoms, wavelets have a time-frequency resolution that changes. The wavelet $\psi_{u,s}$ has a time support centered at $u$ and proportional to $s$. Let us choose a wavelet $\psi$ whose Fourier transform $\hat\psi(\omega)$ is nonzero in a positive frequency interval centered at $\eta$. The Fourier transform $\hat\psi_{u,s}(\omega)$ is dilated by $1/s$ and thus is localized in a positive frequency interval centered at $\xi = \eta/s$; its size is scaled by $1/s$. In the time-frequency plane, the Heisenberg box of a wavelet atom $\psi_{u,s}$ is therefore a rectangle centered at $(u, \eta/s)$, with time and frequency widths, respectively, proportional to $s$ and $1/s$. When $s$ varies, the time and frequency width of this time-frequency resolution cell changes, but its area remains constant, as illustrated by Figure 1.5.
Large-amplitude wavelet coefficients can detect and measure short high-frequency variations because they have a narrow time localization at high frequencies. At low frequencies their time resolution is lower, but they have a better frequency resolution. This modification of time and frequency resolution is adapted to represent sounds with sharp attacks, or radar signals having a frequency that may vary quickly at high frequencies.
Multiscale Zooming
A wavelet dictionary is also adapted to analyze the scaling evolution of transients with zooming procedures across scales. Suppose now that $\psi$ is real. Since it has a zero average, a wavelet coefficient $Wf(u, s)$ measures the variation of $f$ in a neighborhood of $u$ that has a size proportional to $s$. Sharp signal transitions create large-amplitude wavelet coefficients.

FIGURE 1.5: Heisenberg time-frequency boxes of two wavelets, $\psi_{u,s}$ and $\psi_{u_0,s_0}$. When the scale $s$ decreases, the time support is reduced but the frequency spread increases and covers an interval that is shifted toward high frequencies.
Signal singularities have specific scaling invariance characterized by Lipschitz exponents. Chapter 6 relates the pointwise regularity of $f$ to the asymptotic decay of the wavelet transform amplitude $|Wf(u, s)|$ when $s$ goes to zero. Singularities are detected by following the local maxima of the wavelet transform across scales.
In images, wavelet local maxima indicate the position of edges, which are sharp variations of image intensity. They define a scale–space approximation support of $f$ from which precise image approximations are reconstructed. At different scales, the geometry of this local maxima support provides contours of image structures of varying sizes. This multiscale edge detection is particularly effective for pattern recognition in computer vision [146].
The zooming capability of the wavelet transform not only locates isolated singular events, but can also characterize more complex multifractal signals having nonisolated singularities. Mandelbrot [41] was the first to recognize the existence of multifractals in most corners of nature. Scaling one part of a multifractal produces a signal that is statistically similar to the whole. This self-similarity appears in the continuous wavelet transform, which modifies the analyzing scale. From global measurements of the wavelet transform decay, Chapter 6 measures the singularity distribution of multifractals. This is particularly important in analyzing their properties and testing multifractal models in physics or in financial time series.
pro-1.3.4 Time-Frequency Orthonormal Bases
Orthonormal bases of time-frequency atoms remove all redundancy and define stable representations. A wavelet orthonormal basis is an example of a time-frequency basis, obtained by scaling a wavelet $\psi$ with dyadic scales $s = 2^j$ and translating it by $2^j n$, which is written $\psi_{j,n}$. In the time-frequency plane, the Heisenberg resolution box of $\psi_{j,n}$ is a dilation by $2^j$ and translation by $2^j n$ of the Heisenberg box of $\psi$. A wavelet orthonormal basis is thus a subdictionary of the continuous wavelet transform dictionary, which yields a perfect tiling of the time-frequency plane, as illustrated in Figure 1.6.

One can construct many other orthonormal bases of time-frequency atoms, corresponding to different tilings of the time-frequency plane. Wavelet packet and local cosine bases are two important examples constructed in Chapter 8, with time-frequency atoms that split the frequency and the time axis, respectively, in intervals of varying sizes.
Wavelet Packet Bases
Wavelet bases divide the frequency axis into intervals of one octave bandwidth. Coifman, Meyer, and Wickerhauser [182] have generalized this construction with bases that split the frequency axis in intervals whose bandwidths may be adjusted. Each frequency interval is covered by the Heisenberg time-frequency boxes of wavelet packet functions translated in time, in order to cover the whole plane, as shown by Figure 1.7.

FIGURE 1.7: A wavelet packet basis divides the frequency axis in separate intervals of varying sizes. A tiling is obtained by translating in time the wavelet packets covering each frequency interval.
As for wavelets, wavelet packet coefficients are obtained with a filter bank of conjugate mirror filters that splits the frequency axis in several frequency intervals. Different frequency segmentations correspond to different wavelet packet bases. For images, a filter bank divides the image frequency support in squares of dyadic sizes that can be adjusted.
Local Cosine Bases
Local cosine orthonormal bases are constructed by dividing the time axis instead of the frequency axis. The time axis is segmented in successive intervals $[a_p, a_{p+1}]$. The local cosine bases of Malvar [368] are obtained by designing smooth windows $g_p(t)$ that cover each interval $[a_p, a_{p+1}]$, and by multiplying them by cosine functions $\cos(\xi t + \phi)$ of different frequencies. This is yet another idea that has been independently studied in physics, signal processing, and mathematics. Malvar's original construction was for discrete signals. At the same time, the physicist Wilson [486] was designing a local cosine basis, with smooth windows of infinite support, to analyze the properties of quantum coherent states. Malvar bases were also rediscovered and generalized by the harmonic analysts Coifman and Meyer [181]. These different views of the same bases brought to light mathematical and algorithmic properties that opened new applications.

FIGURE 1.8: A local cosine basis divides the time axis with smooth windows $g_p(t)$ and translates these windows into frequency.
A multiplication by $\cos(\xi t + \phi)$ translates the Fourier transform $\hat g_p(\omega)$ of $g_p(t)$ by $\pm\xi$. Over positive frequencies, the time-frequency box of the modulated window $g_p(t)\cos(\xi t + \phi)$ is therefore equal to the time-frequency box of $g_p$ translated by $\xi$ along frequencies. Figure 1.8 shows the time-frequency tiling corresponding to such a local cosine basis. For images, a two-dimensional cosine basis is constructed by dividing the image support in squares of varying sizes.
1.4 SPARSITY IN REDUNDANT DICTIONARIES
In natural languages, large dictionaries are needed to refine ideas with short sentences, and they evolve with usage. Eskimos have eight different words to describe snow quality, whereas a single word is typically sufficient in a Parisian dictionary. Similarly, large signal dictionaries of vectors are needed to construct sparse representations of complex signals. However, computing and optimizing a signal approximation by choosing the best $M$ dictionary vectors is much more difficult.
1.4.1 Frame Analysis and Synthesis
Suppose that a sparse family of vectors $\{\phi_p\}_{p \in \Lambda}$ has been selected to approximate a signal $f$. An approximation can be recovered as an orthogonal projection in the space $\mathbf{V}_\Lambda$ generated by these vectors. We then face one of the following two problems.
1. In a dual-synthesis problem, the orthogonal projection $f_\Lambda$ of $f$ in $\mathbf{V}_\Lambda$ must be computed from dictionary coefficients $\{\langle f, \phi_p\rangle\}_{p \in \Lambda}$, provided by an analysis operator. This is the case when a signal transform $\{\langle f, \phi_p\rangle\}_{p \in \Gamma}$ is calculated in some large dictionary and a subset of inner products is selected. Such inner products may correspond to coefficients above a threshold or to local maxima values.
2. In a dual-analysis problem, the decomposition coefficients of $f_\Lambda$ must be computed on a family of selected vectors $\{\phi_p\}_{p \in \Lambda}$. This problem appears when sparse representation algorithms select vectors as opposed to inner products. This is the case for pursuit algorithms, which compute approximation supports in highly redundant dictionaries.
Frame theory gives energy equivalence conditions to solve both problems with stable operators. A family $\{\phi_p\}_{p \in \Lambda}$ is a frame of the space $\mathbf{V}$ it generates if there exist $B \ge A > 0$ such that
$$\forall h \in \mathbf{V}, \quad A \|h\|^2 \le \sum_{p \in \Lambda} |\langle h, \phi_p\rangle|^2 \le B \|h\|^2 .$$
The representation is stable since any perturbation of frame coefficients implies a modification of similar magnitude on $h$. Chapter 5 proves the existence of a dual frame $\{\tilde\phi_p\}_{p \in \Lambda}$ that solves both the dual-synthesis and dual-analysis problems:
$$f_\Lambda = \sum_{p \in \Lambda} \langle f, \phi_p\rangle\, \tilde\phi_p = \sum_{p \in \Lambda} \langle f, \tilde\phi_p\rangle\, \phi_p .$$
The frame bounds $A$ and $B$ are redundancy factors. If the vectors $\{\phi_p\}_{p \in \Gamma}$ are normalized and linearly independent, then $A \le 1 \le B$. Such a dictionary is called a Riesz basis of $\mathbf{V}$, and the dual frame is biorthogonal:
$$\langle \phi_p, \tilde\phi_q\rangle = \delta[p - q] \quad\text{for } (p, q) \in \Lambda^2 .$$
to directional image structures such as textures or edges
To improve the sparsity of images having edges along regular geometric curves,Candès and Donoho [134] introduced curvelet frames, with elongated waveformshaving different directions, positions, and scales Images with piecewise regularedges have representations that are asymptotically more sparse by thresholdingcurvelet coefficients than wavelet coefficients
1.4.2 Ideal Dictionary Approximations
In a redundant dictionary $\mathcal{D} = \{\phi_p\}_{p \in \Gamma}$, we would like to find the best approximation support $\Lambda$ with $M = |\Lambda|$ vectors, which minimizes the error $\|f - f_\Lambda\|^2$. Chapter 12 proves that this is equivalent to finding $\Lambda_T$ that minimizes the corresponding approximation Lagrangian
$$\mathcal{L}_0(T, f, \Lambda) = \|f - f_\Lambda\|^2 + T^2 |\Lambda|, \tag{1.16}$$
for some multiplier $T$.
Compression and denoising are two applications of redundant dictionary approximations. When compressing signals by quantizing dictionary coefficients, the distortion rate varies, like the Lagrangian (1.16), with a multiplier $T$ that depends on the quantization step. Optimizing the coder is thus equivalent to minimizing this approximation Lagrangian. For sparse representations, most of the bits are devoted to coding the geometry of the sparse approximation set $\Lambda_T$ in $\Gamma$.
Estimators reducing noise from observations $X = f + W$ are also optimized by finding a best orthogonal projector over a set of dictionary vectors. The model selection theory of Barron, Birgé, and Massart [97] proves that finding $\tilde\Lambda_T$, which minimizes this same Lagrangian $\mathcal{L}_0(T, X, \Lambda)$, defines an estimator whose risk is of the same order as the minimum approximation error $\|f - f_{\Lambda_T}\|^2$, up to a logarithmic factor. This is similar to the optimality result obtained for thresholding estimators in an orthonormal basis.
The bad news is that minimizing the approximation Lagrangian $\mathcal{L}_0$ is an NP-hard problem and is therefore computationally intractable. It is thus necessary to find algorithms that are sufficiently fast to compute suboptimal, but "good enough," solutions.
Dictionaries of Orthonormal Bases
To reduce the complexity of optimal approximations, the search can be restricted to subfamilies of orthogonal dictionary vectors. In a dictionary of orthonormal bases, any family of orthogonal dictionary vectors can be complemented to form an orthogonal basis $\mathcal{B}$ included in $\mathcal{D}$. As a result, the best approximation of $f$ from orthogonal vectors in $\mathcal{B}$ is obtained by thresholding the coefficients of $f$ in a "best basis" in $\mathcal{D}$.

For tree dictionaries of orthonormal bases obtained by a recursive split of orthogonal vector spaces, the fast dynamic programming algorithm of Coifman and Wickerhauser [182] finds such a best basis with $O(P)$ operations, where $P$ is the dictionary size.

Wavelet packet and local cosine bases are examples of tree dictionaries of time-frequency orthonormal bases of size $P = N \log_2 N$. A best basis is a time-frequency tiling that best matches the signal's time-frequency structures.
To approximate geometrically regular edges, wavelets are not as efficient as curvelets, but wavelets provide sparser representations of singularities that are not distributed along geometrically regular curves. Bandlet dictionaries, introduced by Le Pennec, Mallat, and Peyré [342, 365], are dictionaries of orthonormal bases that can adapt to the variability of images' geometric regularity. Minimax optimal asymptotic rates are derived for compression and denoising.
1.4.3 Pursuit in Dictionaries
Approximating signals only from orthogonal vectors brings a rigidity that limits the ability to optimize the representation. Pursuit algorithms remove this constraint with flexible procedures that search for sparse, although not necessarily optimal, dictionary approximations. Such approximations are computed by optimizing the choice of dictionary vectors $\{\phi_p\}_{p \in \Lambda}$.
Matching Pursuit
Matching pursuit algorithms, introduced by Mallat and Zhang [366], are greedy algorithms that optimize approximations by selecting dictionary vectors one by one. The vector $\phi_{p_0} \in \mathcal{D}$ that best approximates a signal $f$ is
$$\phi_{p_0} = \underset{p \in \Gamma}{\operatorname{argmax}}\, |\langle f, \phi_p\rangle| ,$$
and the residual approximation error is
$$Rf = f - \langle f, \phi_{p_0}\rangle\, \phi_{p_0} .$$
A matching pursuit further approximates the residue $Rf$ by selecting another best vector $\phi_{p_1}$ from the dictionary, and continues this process over next-order residues $R^m f$, which produces a signal decomposition
$$f = \sum_{m=0}^{M-1} \langle R^m f, \phi_{p_m}\rangle\, \phi_{p_m} + R^M f .$$
The approximation from the $M$ selected vectors $\{\phi_{p_m}\}_{0 \le m < M}$ can be refined with an orthogonal back projection on the space generated by these vectors. An orthogonal matching pursuit further improves this decomposition by progressively orthogonalizing the projection directions $\phi_{p_m}$ during the decomposition. The resulting decompositions are applied to compression, denoising, and pattern recognition of various types of signals, images, and videos.
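A minimal sketch of matching pursuit over a finite dictionary of unit-norm columns (Python/NumPy, not the book's software):

```python
import numpy as np

def matching_pursuit(f, Phi, n_iter):
    """Greedy selection: at each step, pick the best-correlated atom."""
    residue = f.copy()
    a = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        corr = Phi.T @ residue          # <R^m f, phi_p> for all p
        p = np.argmax(np.abs(corr))
        a[p] += corr[p]
        residue -= corr[p] * Phi[:, p]  # R^{m+1} f
    return a, residue

rng = np.random.default_rng(3)
N, P = 64, 256
Phi = rng.standard_normal((N, P))
Phi /= np.linalg.norm(Phi, axis=0)
f = 2 * Phi[:, 10] - 1.5 * Phi[:, 77]   # a 2-sparse signal

a, r = matching_pursuit(f, Phi, n_iter=20)
print(np.linalg.norm(r) / np.linalg.norm(f))  # residue decays with iterations
print(np.argsort(np.abs(a))[-2:])             # expected: atoms 77 and 10
```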
Basis Pursuit
Approximating f with a minimum number of nonzero coefficients a[p] in a dictionary D is equivalent to minimizing the ℓ⁰ norm ‖a‖₀, which gives the number of nonzero coefficients. This ℓ⁰ norm is highly nonconvex, which explains why the resulting minimization is NP-hard. Donoho and Chen [158] thus proposed replacing the ℓ⁰ norm by the ℓ¹ norm ‖a‖₁ = Σ_{p∈Γ} |a[p]|, which is convex. The resulting basis pursuit algorithm computes a synthesis

    f = Σ_{p∈Γ} a[p] φ_p  with coefficients a[p] of minimum ℓ¹ norm ‖a‖₁.  (1.18)

This optimal solution is calculated with a linear programming algorithm.
A basis pursuit is computationally more intensive than a matching pursuit, but it is a more global optimization that yields representations that can be sparser.
In approximation, compression, or denoising applications, f is recovered with an error bounded by a precision parameter ε. The optimization (1.18) is thus relaxed by finding a synthesis such that

    ‖f − Σ_{p∈Γ} a[p] φ_p‖ ≤ ε,  (1.19)

with coefficients of minimum ℓ¹ norm. This convex optimization is equivalent to the minimization of an ℓ¹ Lagrangian ‖f − Σ_{p∈Γ} a[p] φ_p‖² + T ‖a‖₁, computed with iterative algorithms that progressively decrease this Lagrangian.
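A standard way to decrease such an ℓ¹ Lagrangian is iterative soft-thresholding (ISTA). The following Python sketch assumes the dictionary is given as a matrix Phi; the parameter names and the fixed iteration count are illustrative choices, not the book's algorithm.

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of the l1 norm: shrink coefficients toward zero.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(f, Phi, T, n_iter=200):
    # Minimize ||f - Phi a||^2 + T ||a||_1 by proximal gradient steps.
    L = np.linalg.norm(Phi, 2) ** 2        # step size from the spectral norm
    a = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ a - f)       # gradient of the quadratic term
        a = soft_threshold(a - grad / L, T / (2 * L))
    return a
```

Larger values of T push more coefficients exactly to zero, trading reconstruction error against sparsity.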
Incoherence for Support Recovery
Matching pursuit and ℓ¹ Lagrangian pursuits are optimal if they recover the approximation support Λ_T, which minimizes the approximation Lagrangian

    L⁰(T, f, Λ) = ‖f − f_Λ‖² + T² |Λ|,

where f_Λ is the orthogonal projection of f in the space V_Λ generated by {φ_p}_{p∈Λ}. This is not always true, and it depends on Λ_T. An exact recovery criterion proved by Tropp [464] guarantees that pursuit algorithms do recover the optimal support if

    ERC(Λ_T) = max_{q∉Λ_T} Σ_{p∈Λ_T} |⟨φ̃_p, φ_q⟩| < 1,

where {φ̃_p}_{p∈Λ_T} is the biorthogonal basis of {φ_p}_{p∈Λ_T} in V_{Λ_T}. This criterion implies that dictionary vectors φ_q outside Λ_T should have small inner products with vectors in Λ_T.
This recovery is stable relative to noise perturbations if {φ_p}_{p∈Λ} has Riesz bounds that are not too far from 1. These vectors should then be nearly orthogonal, and hence have small inner products. These small inner-product conditions are interpreted as a form of incoherence. A stable recovery of Λ_T is possible if vectors in Λ_T are incoherent with respect to the other dictionary vectors and incoherent among themselves. This depends on the geometric configuration of Λ_T in Γ.
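The criterion ERC(Λ) can be evaluated numerically for a candidate support. Below is a short sketch, assuming the dictionary is a matrix whose columns are the φ_p; the biorthogonal vectors φ̃_p are obtained through the pseudo-inverse of the sub-dictionary (function and variable names are illustrative).

```python
import numpy as np

def erc(D, support):
    # ERC(Lambda) = max over q outside Lambda of sum_{p in Lambda}
    # |<phi_tilde_p, phi_q>|; the rows of the pseudo-inverse of the
    # sub-dictionary are the biorthogonal vectors phi_tilde_p.
    inside = D[:, support]
    outside = np.delete(D, support, axis=1)
    return np.abs(np.linalg.pinv(inside) @ outside).sum(axis=0).max()

# ERC(support) < 1 guarantees exact support recovery by pursuits.
```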
1.5 INVERSE PROBLEMS
Most digital measurement devices, such as cameras, microphones, or medical imaging systems, can be modeled as a linear transformation of an incoming analog signal, plus a noise due to intrinsic measurement fluctuations or to electronic noises. This linear transformation can be decomposed into a stable analog-to-digital linear conversion, followed by a discrete operator U that carries the specific transfer function of the measurement device. The resulting measured data can be written

    Y[q] = Uf[q] + W[q],

where f ∈ ℂ^N is the high-resolution signal we want to recover, and W[q] is the measurement noise. For a camera whose optics are out of focus, the operator U is a low-pass convolution producing a blur. For a magnetic resonance imaging system, U is a Radon transform integrating the signal along rays, and the number Q of measurements is then smaller than N. In such problems, U is not invertible, and recovering an estimate of f is an ill-posed inverse problem.
Inverse problems are among the most difficult signal-processing problems, and they have considerable applications. When data acquisition is difficult, costly, or dangerous, or when the signal is degraded, super-resolution is important to recover the highest possible resolution information. This applies to satellite observations, seismic exploration, medical imaging, radar, camera phones, or degraded Internet videos displayed on high-resolution screens. Separating mixed information sources from fewer measurements is yet another super-resolution problem, arising in telecommunications or audio recognition.
mea-Incoherence, sparsity, and geometry play a crucial role in the solution of
ill-defined inverse problems.With a sensing matrix U with random coefficients,Candès
and Tao [139] and Donoho [217] proved that super-resolution becomes stable forsignals having a sufficiently sparse representation in a dictionary This remarkableresult opens the door to new compression sensing devices and algorithms thatrecover high-resolution signals from a few randomized linear measurements
1.5.1 Diagonal Inverse Estimation
In an ill-posed inverse problem

    Y = Uf + W,

the image space Im U = {Uh : h ∈ ℂ^N} of U is of dimension Q smaller than the high-resolution space N where f belongs. Inverse problems raise two difficulties. In the image space Im U, where U is invertible, its inverse may amplify the noise W, which must then be reduced by an efficient denoising procedure. In the null space Null U, all signals h are set to zero, Uh = 0, and thus disappear from the measured data Y. Recovering the projection of f in Null U requires some strong prior information. A super-resolution estimator recovers an estimation of f in a space of dimension larger than Q, and hopefully equal to N, but this is not always possible.
Singular Value Decompositions
Let f = Σ_{m∈Γ} a[m] g_m be the representation of f in an orthonormal basis B = {g_m}_{m∈Γ}. An approximation must be recovered from

    Y = Σ_{m∈Γ} a[m] Ug_m + W.

A basis B of singular vectors diagonalizes U*U. Then U transforms a subset of Q vectors {g_m}_{m∈Γ_Q} of B into an orthogonal basis {Ug_m}_{m∈Γ_Q} of Im U, and sets all other vectors to zero. A singular value decomposition estimates the coefficients a[m] of f by projecting Y on this singular basis and renormalizing the resulting coefficients:

    ∀m ∈ Γ,  ã[m] = ⟨Y, Ug_m⟩ / (‖Ug_m‖² + h_m²),

where the h_m² are regularization parameters.
Such estimators recover nonzero coefficients in a space of dimension Q, and thus bring no super-resolution. If U is a convolution operator, then B is the Fourier basis, and a singular value estimation implements a regularized inverse convolution.
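For a circular convolution Uf = u ⊛ f, the singular vectors are the discrete Fourier vectors and ‖Ug_m‖² = |û[m]|², so the estimator above reduces to a pointwise regularized division of spectra. A minimal Python sketch, in which the regularization weights h2 are left as an input (their choice is not specified here):

```python
import numpy as np

def regularized_deconvolution(Y, u, h2):
    # Singular value estimator for a circular convolution Y = u (*) f + W:
    # each frequency m is inverted with regularization weight h2[m].
    u_hat = np.fft.fft(u)
    Y_hat = np.fft.fft(Y)
    f_hat = np.conj(u_hat) * Y_hat / (np.abs(u_hat) ** 2 + h2)
    return np.real(np.fft.ifft(f_hat))
```

When h2 is chosen proportional to the noise-to-signal power ratio at each frequency, this is the classical Wiener-type regularized inverse.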
Diagonal Thresholding Estimation
The basis that diagonalizes U*U rarely provides a sparse signal representation. For example, a Fourier basis, which diagonalizes convolution operators, does not efficiently approximate signals including singularities.
Donoho [214] introduced more flexibility by looking for a basis B providing a sparse signal representation, where a subset of Q vectors {g_m}_{m∈Γ_Q} is transformed by U into a Riesz basis {Ug_m}_{m∈Γ_Q} of Im U, while the others are set to zero. With an appropriate renormalization, {λ_m⁻¹ Ug_m}_{m∈Γ_Q} has a biorthogonal basis {φ̃_m}_{m∈Γ_Q} that is normalized, ‖φ̃_m‖ = 1. The sparse coefficients of f in B can then be estimated with a thresholding

    ∀m ∈ Γ_Q,  ã[m] = ρ_{T_m}(λ_m⁻¹ ⟨Y, φ̃_m⟩)  with  ρ_T(x) = x 1_{|x|>T},

for thresholds T_m appropriately defined.
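The hard-thresholding function ρ_T and the resulting diagonal estimator are straightforward to express in code; a sketch in which the renormalization factors lam and the thresholds T are assumed to be given:

```python
import numpy as np

def hard_threshold(x, T):
    # rho_T(x) = x 1_{|x| > T}: keep only coefficients above the threshold.
    return x * (np.abs(x) > T)

def diagonal_threshold_estimate(inner_products, lam, T):
    # inner_products[m] = <Y, phi_tilde_m>; lam[m] renormalizes each
    # coefficient before the per-coefficient threshold T[m] is applied.
    return hard_threshold(inner_products / lam, T)
```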
For classes of signals that are sparse in B, such thresholding estimators may yield a nearly minimax risk, but they provide no super-resolution, since this nonlinear projector remains in a space of dimension Q. This result applies to classes of convolution operators U in wavelet or wavelet packet bases. Diagonal inverse estimators are computationally efficient, and potentially optimal in cases where super-resolution is not possible.
1.5.2 Super-resolution and Compressive Sensing
Suppose that f has a sparse representation in some dictionary D = {g_p}_{p∈Γ} of P normalized vectors. The P vectors of the transformed dictionary D_U = UD = {Ug_p}_{p∈Γ} belong to the space Im U of dimension Q < P, and thus define a redundant dictionary. Vectors in the approximation support Λ of f are not restricted a priori to a particular subspace of ℂ^N. Super-resolution is possible if the approximation support Λ of f in D can be estimated by decomposing the noisy data Y over D_U. It depends on the properties of the approximation support Λ of f in Γ.
Geometric Conditions for Super-resolution
Let w_Λ = f − f_Λ be the approximation error of a sparse representation f_Λ = Σ_{p∈Λ} a[p] g_p of f. The observed signal can be written as

    Y = Uf_Λ + Uw_Λ + W = Σ_{p∈Λ} a[p] Ug_p + Uw_Λ + W.

This shows that super-resolution is possible if the approximation support Λ can be identified by decomposing Y in the redundant transformed dictionary D_U. If the exact recovery criterion is satisfied, ERC(Λ) < 1, and if {Ug_p}_{p∈Λ} is a Riesz basis, then Λ can be recovered using pursuit algorithms, with controlled error bounds.
For most operators U, not all sparse approximation sets Λ can be recovered. It is necessary to impose further geometric conditions on Λ in Γ, which makes super-resolution difficult, and often unstable. Numerical applications to sparse spike deconvolution, tomography, super-resolution zooming, and inpainting illustrate these results.
Compressive Sensing with Randomness
Candès and Tao [139], and Donoho [217], proved that stable super-resolution is possible for any sufficiently sparse signal f if U is an operator with random coefficients. Compressive sensing then becomes possible, by recovering a close approximation of f ∈ ℂ^N from Q ≪ N linear measurements [133].
A recovery is stable for a sparse approximation set |Λ| ≤ M only if the corresponding dictionary family {Ug_p}_{p∈Λ} is a Riesz basis of the space it generates. The M-restricted isometry condition of Candès, Tao, and Donoho [217] imposes uniform Riesz bounds for all sets Λ ⊂ Γ with |Λ| ≤ M:

    ∀c ∈ ℂ^{|Λ|},  (1 − δ_M) ‖c‖² ≤ ‖Σ_{p∈Λ} c[p] Ug_p‖² ≤ (1 + δ_M) ‖c‖².  (1.20)
This is a strong incoherence condition on the P vectors of {Ug_p}_{p∈Γ}, which supposes that any subset of fewer than M vectors is nearly uniformly distributed on the unit sphere of Im U.
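The constants δ_M cannot be computed exactly for large dictionaries, since all supports |Λ| ≤ M would have to be examined, but condition (1.20) can be probed empirically by sampling random column subsets and measuring their extreme singular values. A sketch (matrix sizes and trial counts are arbitrary):

```python
import numpy as np

def probe_restricted_isometry(U, M, n_trials=1000, seed=0):
    # Sample random M-column submatrices of U; their extreme squared
    # singular values bound the Riesz constants observed in (1.20).
    rng = np.random.default_rng(seed)
    lo, hi = np.inf, 0.0
    for _ in range(n_trials):
        cols = rng.choice(U.shape[1], size=M, replace=False)
        s = np.linalg.svd(U[:, cols], compute_uv=False)
        lo, hi = min(lo, s[-1] ** 2), max(hi, s[0] ** 2)
    return lo, hi

# A Gaussian matrix normalized by sqrt(Q) has submatrices close to isometries.
rng = np.random.default_rng(1)
U = rng.standard_normal((128, 512)) / np.sqrt(128)
print(probe_restricted_isometry(U, M=10))
```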
For an orthogonal basis D = {g_m}_{m∈Γ}, this is possible for M ≤ C Q (log N)⁻¹ if U is a matrix with independent Gaussian random coefficients. A pursuit algorithm then provides a stable approximation of any f ∈ ℂ^N having a sparse approximation from vectors in D.
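As a small end-to-end illustration, the following sketch draws Gaussian random measurements of a synthetic sparse signal and recovers it with an orthogonal matching pursuit; all dimensions, the sparsity level, and the use of OMP rather than a basis pursuit are illustrative choices.

```python
import numpy as np

def omp(Y, A, n_iter):
    # Orthogonal matching pursuit: greedily select columns of A, then
    # re-project Y on the span of all selected columns at each step.
    residue, support = Y.astype(float).copy(), []
    a = np.zeros(A.shape[1])
    for _ in range(n_iter):
        support.append(int(np.argmax(np.abs(A.T @ residue))))
        coeffs, *_ = np.linalg.lstsq(A[:, support], Y, rcond=None)
        residue = Y - A[:, support] @ coeffs
    a[support] = coeffs
    return a

rng = np.random.default_rng(2)
N, Q, M = 256, 64, 5                 # resolution, measurements, sparsity
f = np.zeros(N)
f[rng.choice(N, M, replace=False)] = rng.standard_normal(M)
U = rng.standard_normal((Q, N)) / np.sqrt(Q)   # random sensing operator
Y = U @ f                                      # Q << N measurements
print(np.linalg.norm(f - omp(Y, U, M)))        # small when recovery succeeds
```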
These results open a new compressive-sensing approach to signal acquisition and representation. Instead of first discretizing the signal linearly at a high resolution N and then computing a nonlinear representation over M coefficients in some dictionary, compressive sensing directly measures M randomized linear coefficients. A reconstructed signal is then recovered by a nonlinear algorithm, producing an error that can be of the same order of magnitude as the error obtained by the more classic two-step approximation process, with a more economical acquisition process.
These results remain valid for several types of random matrices U. Examples of applications to single-pixel cameras, video super-resolution, new analog-to-digital converters, and MRI imaging are described.
Blind Source Separation
Sparsity in redundant dictionaries also provides efficient strategies to separate a family of signals {f_s}_{0≤s<S} that are linearly mixed in K ≤ S observed signals, with

    Y_k[n] = Σ_{s=0}^{S−1} u_{k,s} f_s[n] + W_k[n]  for 0 ≤ k < K.

This is a severely ill-posed inverse problem, since S N data values must be recovered from Q = K N ≤ S N measurements. Not knowing the operator U makes it even more complicated.
If each source f_s has a sparse approximation support Λ_s in a dictionary D, with Σ_{s=0}^{S−1} |Λ_s| ≪ N, then it is likely that the sets {Λ_s}_{0≤s<S} are nearly disjoint. In this case, the operator U, the supports Λ_s, and the sources f_s are approximated by computing sparse approximations of the observed data Y_k in D. The distribution of these coefficients identifies the coefficients of the mixing matrix U and the nearly disjoint source supports. Time-frequency separation of sounds illustrates these results.
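A toy separation along these lines can be sketched as follows, with the canonical basis standing in for a time-frequency dictionary and a crude two-cluster split of coefficient directions; everything here (the sizes, the 2 x 2 mixing matrix, and the clustering shortcut) is an illustrative assumption, not the book's procedure.

```python
import numpy as np

rng = np.random.default_rng(3)
N, S = 1024, 2
f = np.zeros((S, N))                    # S sparse sources, disjoint supports
for s in range(S):
    f[s, rng.choice(N, 20, replace=False)] = rng.standard_normal(20)
U = np.array([[1.0, 0.6],
              [0.4, 1.0]])              # unknown K x S mixing matrix (K = 2)
Y = U @ f                               # K observed mixtures

# Each active coefficient column Y[:, p] points along one column of U.
idx = np.nonzero(np.linalg.norm(Y, axis=0) > 1e-8)[0]
signs = np.where(Y[0, idx] >= 0, 1.0, -1.0)     # fold out the sign ambiguity
ang = np.arctan2(signs * Y[1, idx], signs * Y[0, idx])
centers = np.array([ang.min(), ang.max()])      # tiny 1-D 2-means on angles
for _ in range(20):
    labels = np.argmin(np.abs(ang[:, None] - centers[None, :]), axis=1)
    centers = np.array([ang[labels == s].mean() for s in range(S)])

# The estimated mixing directions identify the sources, up to the usual
# scale ambiguity of blind separation.
f_hat = np.zeros((S, N))
for s in range(S):
    u_s = np.array([np.cos(centers[s]), np.sin(centers[s])])
    sel = idx[labels == s]
    f_hat[s, sel] = u_s @ Y[:, sel]
```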
1.6 TRAVEL GUIDE
1.6.1 Reproducible Computational Science
This book covers the whole spectrum, from theorems on functions of continuous variables to fast discrete algorithms and their applications. Section 1.1.2 argues that models based on continuous-time functions give useful asymptotic results for understanding the behavior of discrete algorithms. Still, a mathematical analysis alone is often unable to fully predict the behavior and suitability of algorithms for specific signals. Experiments are necessary, and such experiments should be reproducible, just like experiments in other fields of science [124].
The reproducibility of experiments requires having complete software and full source code for inspection, modification, and application under varied parameter settings. Following this perspective, the computational algorithms presented in this book are available as MATLAB subroutines or in other software packages. Figures can be reproduced, and the source code is available. Software demonstrations and selected exercise solutions are available at http://wavelet-tour.com. For instructors, solutions are available at www.elsevierdirect.com/9780123743701.
1.6.2 Book Road Map
Some redundancy is introduced between sections to avoid imposing a linear progression through the book. The preface describes several possible programs for a sparse signal-processing course.

All theorems are explained in the text, and reading the proofs is not necessary to understand the results. Most of the book's theorems are proved in detail, and important techniques are included. Exercises at the end of each chapter give examples of mathematical, algorithmic, and numeric applications, ordered by level of difficulty from 1 to 4, and selected solutions can be found at http://wavelet-tour.com.
The book begins with Chapters 2 and 3, which review the Fourier transform and linear discrete signal processing. They provide the necessary background for readers who have no signal-processing background. Important properties of linear operators, projectors, and vector spaces can be found in the Appendix. Local time-frequency transforms and dictionaries are presented in Chapter 4; the wavelet and windowed Fourier transforms are introduced and compared. The measurement of instantaneous frequencies illustrates the limitations of time-frequency resolution. Dictionary stability and redundancy are introduced in Chapter 5 through frame theory, with examples of windowed Fourier, wavelet, and curvelet frames. Chapter 6
... must take advantage of prior information available about signal f In a Bayes framework, f is considered a realization of a random vector F and the Bayes risk is the expected risk calculated with... by following the local maxima of the wavelet transform acrossscalesIn images, wavelet local maxima indicate the position of edges, which are sharp
variations of image intensity... Morlet and Grossmann reactivated a tal collaboration between theoretical physics and signal processing, which led tothe formalization of the continuous wavelet transform [288] These ideas were