FIGURE 18.14
Tiling representations of several expansions for 1D signals: (a) STFT-like decomposition; (b) wavelet decomposition; (c) wavelet packet decomposition; and (d) "anti-wavelet" packet decomposition.
Figure 18.14(d) highlights a wavelet packet expansion where the time-frequency attributes are exactly the reverse of the wavelet case: the expansion has good frequency resolution at higher frequencies and good time localization at lower frequencies, so we might call this the "anti-wavelet" packet. There are a plethora of other options for the time-frequency resolution tradeoff, and these all correspond to admissible wavelet packet choices. The extra adaptivity of the wavelet packet framework is obtained at the price of added computation in searching for the best wavelet packet basis, so an efficient fast search algorithm is key in applications involving wavelet packets. The problem of searching for the best basis in the wavelet packet library for the compression problem, using an RD optimization framework and a fast tree-pruning algorithm, was described in [22].
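The tree-pruning search can be made concrete with a small sketch. The following is a minimal 1D illustration of the single-tree pruning idea, not the algorithm of [22] itself: the Haar split, the Lagrangian cost `rd_cost` (distortion plus lambda times a crude nonzero-coefficient rate proxy), and all names are illustrative assumptions; a real coder would substitute its actual rate and distortion measures.

```python
import numpy as np

def rd_cost(coeffs, lam, step=1.0):
    # Hypothetical Lagrangian cost J = D + lambda*R for a uniform
    # scalar quantizer: D is the squared quantization error, R is
    # crudely proxied by the number of nonzero quantized coefficients.
    q = np.round(coeffs / step)
    return np.sum((coeffs - q * step) ** 2) + lam * np.count_nonzero(q)

def haar_split(x):
    # One wavelet packet split into low/high halves (orthonormal Haar).
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def best_basis(x, lam, depth=4):
    # Single-tree pruning: keep this node as a leaf unless the total
    # cost of its two children (each themselves pruned) is lower.
    leaf_cost = rd_cost(x, lam)
    if depth == 0 or len(x) < 2:
        return leaf_cost, "leaf"
    lo, hi = haar_split(x)
    cost_lo, tree_lo = best_basis(lo, lam, depth - 1)
    cost_hi, tree_hi = best_basis(hi, lam, depth - 1)
    if cost_lo + cost_hi < leaf_cost:
        return cost_lo + cost_hi, ("split", tree_lo, tree_hi)
    return leaf_cost, "leaf"

x = np.sin(np.linspace(0.0, 8.0 * np.pi, 256)) + 0.1 * np.random.randn(256)
cost, tree = best_basis(x, lam=0.5)
print(cost, tree)
```

Because each node of the packet tree is visited exactly once, the search is a single fast pass over the tree rather than an enumeration of the enormous number of admissible bases.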
The 1D wavelet packet bases can be easily extended to 2D by writing a 2D basis function as the product of two 1D basis functions. In other words, we can treat the rows and columns of an image separately as 1D signals. The performance gains associated with wavelet packets are obviously image-dependent. For difficult images such as Barbara in Fig. 18.12, the wavelet packet decomposition shown in Fig. 18.15(a) gives much better coding performance than the wavelet decomposition. The wavelet packet decoded Barbara image at 0.1825 b/p is shown in Fig. 18.15(b); its visual quality (or PSNR) is the same as that of the wavelet SPIHT decoded Barbara image at 0.25 b/p in Fig. 18.12. The bit rate saving achieved by using a wavelet packet basis instead of the wavelet basis in this case is 27% at the same visual quality.
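As a quick check, the quoted saving follows directly from the two operating points:

$$\frac{0.25 - 0.1825}{0.25} = \frac{0.0675}{0.25} = 0.27,$$

i.e., a 27% bit rate reduction at the same visual quality.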
An important practical application of wavelet packet expansions is the FBI wavelet scalar quantization (WSQ) standard for fingerprint image compression [23]. Because of the complexity associated with adaptive wavelet packet transforms, the FBI WSQ standard uses a fixed wavelet packet decomposition in the transform stage. The transform structure specified by the FBI WSQ standard is shown in Fig. 18.16. It was designed for 500 dots per inch fingerprint images by spectral analysis and trial and error. A total of 64 subbands are generated with a five-level wavelet packet decomposition. Trials by the FBI have shown that the WSQ standard benefited from having fine frequency partitions in the middle frequency region containing the fingerprint ridge patterns.
FIGURE 18.15
(a) A wavelet packet decomposition for the Barbara image. White lines represent frequency boundaries. Highpass bands are processed for display; (b) wavelet packet decoded Barbara at 0.1825 b/p.
FIGURE 18.16
The wavelet packet transform structure given in the FBI WSQ specification. The number sequence shows the labeling of the different subbands.
FIGURE 18.17
Space-frequency segmentation and tiling for the Building image. The image to the left shows that spatial segmentation separates the sky in the background from the building and the pond in the foreground. The image to the right gives the best wavelet packet decomposition of each spatial segment. Dark lines represent spatial segments; white lines represent subband boundaries of wavelet packet decompositions. Note that the upper-left corners are the lowpass bands of wavelet packet decompositions.
As an extension of adaptive wavelet packet transforms, one can introduce time variation by segmenting the signal in time and allowing the wavelet packet bases to evolve with the signal. The result is a time-varying transform coding scheme that can adapt to signal nonstationarities. Computationally fast algorithms are again very important for finding the optimal signal expansions in such a time-varying system. For 2D images, the simplest of these algorithms performs adaptive frequency segmentations over regions of the image selected through a quadtree decomposition. More complicated algorithms provide combinations of frequency decomposition and spatial segmentation. These jointly adaptive algorithms work particularly well for highly nonstationary images. Figure 18.17 shows the space-frequency tree segmentation and tiling for the Building image [24]. The image to the left shows the spatial segmentation result that separates the sky in the background from the building and the pond in the foreground. The image to the right gives the best wavelet packet decomposition for each spatial segment.
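The joint adaptation can be sketched in the same hypothetical cost model as before: at each block, the coder compares the cost of coding the block with its best packet basis against the total cost of splitting the block into four quadrants and recursing, mirroring the quadtree decomposition described above. This is an illustration of the structure only, not the algorithm of [24] itself; the Haar split and Lagrangian cost are stand-in assumptions.

```python
import numpy as np

def rd_cost(c, lam, step=1.0):
    # Hypothetical Lagrangian cost J = D + lambda*R, with rate proxied
    # by the count of nonzero quantized coefficients.
    q = np.round(c / step)
    return np.sum((c - q * step) ** 2) + lam * np.count_nonzero(q)

def haar2(b):
    # One separable 2D Haar split into LL, LH, HL, HH subbands.
    lo = (b[:, 0::2] + b[:, 1::2]) / np.sqrt(2.0)
    hi = (b[:, 0::2] - b[:, 1::2]) / np.sqrt(2.0)
    def rows(x):
        return (x[0::2] + x[1::2]) / np.sqrt(2.0), (x[0::2] - x[1::2]) / np.sqrt(2.0)
    ll, lh = rows(lo)
    hl, hh = rows(hi)
    return [ll, lh, hl, hh]

def best_packet(b, lam, depth):
    # RD-pruned 2D wavelet packet basis cost for one block.
    leaf = rd_cost(b, lam)
    if depth == 0 or min(b.shape) < 2:
        return leaf
    return min(leaf, sum(best_packet(s, lam, depth - 1) for s in haar2(b)))

def best_space_frequency(b, lam, depth):
    # Jointly choose: code this block with its best packet basis, or
    # split it spatially (quadtree) and recurse on the four quadrants.
    freq = best_packet(b, lam, depth)
    if depth == 0 or min(b.shape) < 4:
        return freq
    h, w = b.shape[0] // 2, b.shape[1] // 2
    space = sum(best_space_frequency(q, lam, depth - 1)
                for q in (b[:h, :w], b[:h, w:], b[h:, :w], b[h:, w:]))
    return min(freq, space)

img = np.random.randn(64, 64)
print(best_space_frequency(img, lam=0.5, depth=3))
```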
18.7 JPEG2000 AND RELATED DEVELOPMENTS
JPEG2000 by default employs the dyadic wavelet transform for natural images in many standard applications. It also allows the choice of the more general wavelet packet transforms for certain types of imagery (e.g., fingerprints and radar images). Instead of using the zerotree-based SPIHT algorithm, JPEG2000 relies on embedded block coding with optimized truncation (EBCOT) [25] to provide a rich set of features such as quality scalability, resolution scalability, spatial random access, and region-of-interest coding. Besides robustness to image type changes in terms of compression performance, the main advantage of the block-based EBCOT algorithm is that it provides easier random access to local image components. On the other hand, both encoding and decoding in SPIHT require nonlocal memory access to the whole tree of wavelet coefficients, causing reduction in throughput when coding large-size images. A thorough description of the JPEG2000 standard is in [1]. Other JPEG2000 related references are Chapter 17 and [26, 27].
Although this chapter is about wavelet coding of 2D images, the wavelet coding framework and its extension to wavelet packets apply to 3D video as well. Recent research works (see [28] and references therein) on 3D scalable wavelet video coders based on the framework of motion-compensated temporal filtering (MCTF) [29] have shown performance competitive with or better than the best MC-DCT-based standard video coders (e.g., H.264/AVC [30]). They have stirred considerable excitement in the video coding community and stimulated research efforts toward subband/wavelet interframe video coding, especially in the area of scalable motion coding [31] within the context of MCTF. MCTF can be conceptually viewed as the extension of wavelet-based coding in JPEG2000 from 2D images to 3D video. It nicely combines the scalability features of wavelet-based coding with motion compensation, which has been proven to be very efficient and necessary in MC-DCT-based standard video coders. We refer the readers to a recent special issue [32] on the latest results and Chapter 11 in [9] for an exposition of 3D subband/wavelet video coding.
18.8 CONCLUSION
Since the introduction of wavelets as a signal processing tool in the late 1980s, a variety of wavelet-based coding algorithms have advanced the limits of compression performance well beyond that of the current commercial JPEG image coding standard. In this chapter, we have provided very simple high-level insights, based on the intuitive concept of time-frequency representations, into why wavelets are good for image coding. After introducing the salient aspects of the compression problem in general and the transform coding problem in particular, we have highlighted the key differences between the early class of subband coders and the more advanced class of modern-day wavelet image coders. Selecting the EZW coding structure embodied in the celebrated SPIHT algorithm as a representative of this latter class, we have detailed its operation by using a simple illustrative example. We have also described the role of wavelet packets as a simple but powerful generalization of the wavelet decomposition in order to offer a more robust and adaptive transform image coding framework.
JPEG2000 is the result of the rapid progress made in wavelet image coding research in the 1990s. The triumph of the wavelet transform in the evolution of the JPEG2000 standard underlines the importance of the fundamental insights provided in this chapter into why wavelets are so attractive for image compression.
REFERENCES
[1] D. Taubman and M. Marcellin. JPEG2000: Image Compression Fundamentals, Standards, and Practice. Kluwer, New York, 2001.
[2] G. Strang and T. Nguyen. Wavelets and Filter Banks. Wellesley-Cambridge Press, New York, 1996.
[3] M. Vetterli and J. Kovačević. Wavelets and Subband Coding. Prentice-Hall, Englewood Cliffs, NJ, 1995.
[12] M. W. Marcellin and T. R. Fischer. Trellis coded quantization of memoryless and Gauss-Markov sources. IEEE Trans. Commun., 38(1):82–93, 1990.
[13] T. Berger. Rate Distortion Theory. Prentice-Hall, Englewood Cliffs, NJ, 1971.
[14] N. Farvardin and J. W. Modestino. Optimum quantizer performance for a class of non-Gaussian memoryless sources. IEEE Trans. Inf. Theory, 30:485–497, 1984.
[15] D. A. Huffman. A method for the construction of minimum redundancy codes. Proc. IRE, 40:1098–1101, 1952.
[16] T. C. Bell, J. G. Cleary, and I. H. Witten. Text Compression. Prentice-Hall, Englewood Cliffs, NJ, 1990.
[17] J. W. Woods, editor. Subband Image Coding. Kluwer Academic, Boston, MA, 1991.
[18] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies. Image coding using wavelet transform. IEEE Trans. Image Process., 1(2):205–220, 1992.
[19] J. Shapiro. Embedded image coding using zerotrees of wavelet coefficients. IEEE Trans. Signal Process., 41(12):3445–3462, 1993.
[20] A. Said and W. A. Pearlman. A new, fast, and efficient image codec based on set partitioning in hierarchical trees. IEEE Trans. Circuits Syst. Video Technol., 6(3):243–250, 1996.
[21] R. R. Coifman and M. V. Wickerhauser. Entropy-based algorithms for best basis selection. IEEE Trans. Inf. Theory, 38(2):713–718, 1992.
[22] K. Ramchandran and M. Vetterli. Best wavelet packet bases in a rate-distortion sense. IEEE Trans. Image Process., 2(2):160–175, 1993.
[23] Criminal Justice Information Services. WSQ Gray-Scale Fingerprint Image Compression Specification (Ver. 2.0). Federal Bureau of Investigation, 1993.
[24] K. Ramchandran, Z. Xiong, K. Asai, and M. Vetterli. Adaptive transforms for image coding using spatially-varying wavelet packets. IEEE Trans. Image Process., 5:1197–1204, 1996.
[25] D. Taubman. High performance scalable image compression with EBCOT. IEEE Trans. Image Process., 9(7):1151–1170, 2000.
[26] Special Issue on JPEG2000. Signal Process. Image Commun., 17(1), 2002.
[27] D. Taubman and M. Marcellin. JPEG2000: standard for interactive imaging. Proc. IEEE, 90(8):1336–1357, 2002.
[28] J. Ohm, M. van der Schaar, and J. Woods. Interframe wavelet coding – motion picture representation for universal scalability. Signal Process. Image Commun., 19(9):877–908, 2004.
[29] S.-T. Hsiang and J. Woods. Embedded video coding using invertible motion compensated 3D subband/wavelet filter bank. Signal Process. Image Commun., 16(8):705–724, 2001.
[30] T. Wiegand, G. Sullivan, G. Bjøntegaard, and A. Luthra. Overview of the H.264/AVC video coding standard. IEEE Trans. Circuits Syst. Video Technol., 13:560–576, 2003.
[31] A. Secker and D. Taubman. Highly scalable video compression with scalable motion coding. IEEE Trans. Image Process., 13(8):1029–1041, 2004.
[32] Special issue on subband/wavelet interframe video coding. Signal Process. Image Commun., 19, 2004.
19
Gradient and Laplacian Edge Detection
Phillip A. Mlsna¹ and Jeffrey J. Rodríguez²
¹Northern Arizona University; ²University of Arizona
19.1 INTRODUCTION
One of the most fundamental image analysis operations is edge detection. Edges are often vital clues toward the analysis and interpretation of image information, both in biological vision and in computer image analysis. Some sort of edge detection capability is present in the visual systems of a wide variety of creatures, so it is obviously useful in their abilities to perceive their surroundings.
For this discussion, it is important to define what is and is not meant by the term "edge." The everyday notion of an edge is usually a physical one, caused by either the shapes of physical objects in three dimensions or by their inherent material properties. Described in geometric terms, there are two types of physical edges: (1) the set of points along which there is an abrupt change in local orientation of a physical surface, and (2) the set of points describing the boundary between two or more materially distinct regions of a physical surface. Most of our perceptual senses, including vision, operate at a distance and gather information using receptors that work in, at most, two dimensions. Only the sense of touch, which requires direct contact to stimulate the skin's pressure sensors, is capable of direct perception of objects in three-dimensional (3D) space. However, some physical edges of the second type may not be perceptible by touch because material differences (for instance, different colors of paint) do not always produce distinct tactile sensations. Everyone first develops a working understanding of physical edges in early childhood by touching and handling every object within reach.
The imaging process inherently performs a projection from a 3D scene to a two-dimensional (2D) representation of that scene, according to the viewpoint of the imaging device. Because of this projection process, edges in images have a somewhat different meaning than physical edges. Although the precise definition depends on the application context, an edge can generally be defined as a boundary or contour that separates adjacent image regions having relatively distinct characteristics according to some feature of interest. Most often this feature is gray level or luminance, but others, such as reflectance, color, or texture, are sometimes used. In the most common situation where luminance is of primary interest, edge pixels are those at the locations of abrupt gray level change. To eliminate single-point impulses from consideration as edge pixels, one usually requires that edges be sustained along a contour; i.e., an edge point must be part of an edge structure having some minimum extent appropriate for the scale of interest. Edge detection is the process of determining which pixels are the edge pixels. The result of the edge detection process is typically an edge map, a new image that describes each original pixel's edge classification and perhaps additional edge attributes, such as magnitude and orientation.
There is usually a strong correspondence between the physical edges of a set of objects and the edges in images containing views of those objects. Infants and young children learn this as they develop hand-eye coordination, gradually associating visual patterns with touch sensations as they feel and handle items in their vicinity. There are many situations, however, in which edges in an image do not correspond to physical edges. Illumination differences are usually responsible for this effect, for example, the boundary of a shadow cast across an otherwise uniform surface.
Conversely, physical edges do not always give rise to edges in images. This, too, can be caused by certain cases of lighting and surface properties. Consider what happens when one wishes to photograph a scene rich with physical edges, for example, a craggy mountain face consisting of a single type of rock. When this scene is imaged while the sun is directly behind the camera, no shadows are visible in the scene and hence shadow-dependent edges are nonexistent in the photo. The only edges in such a photo are produced by the differences in material reflectance, texture, or color. Since our rocky subject material has little variation of these types, the result is a rather dull photograph because of the lack of apparent depth caused by the missing edges. Thus images can exhibit edges having no physical counterpart, and they can also miss capturing edges that do. Although edge information can be very useful in the initial stages of such image processing and analysis tasks as segmentation, registration, and object recognition, edges are not completely reliable for these purposes.
If one defines an edge as an abrupt gray level change, then the derivative, or gradient, is a natural basis for an edge detector. Figure 19.1 illustrates the idea with a continuous, one-dimensional (1D) example of a bright central region against a dark background. The left-hand portion of the gray level function $f_c(x)$ shows a smooth transition from dark to bright as $x$ increases. There must be a point $x_0$ that marks the transition from the low-amplitude region on the left to the adjacent high-amplitude region in the center. The gradient approach to detecting this edge is to locate $x_0$ where $|f_c'(x)|$ reaches a local maximum or, equivalently, where $f_c'(x)$ reaches a local extremum, as shown in the second plot of Fig. 19.1. The second derivative, or Laplacian, approach locates $x_0$ where a zero-crossing of $f_c''(x)$ occurs, as in the third plot of Fig. 19.1. The right-hand side of Fig. 19.1 illustrates the case for a falling edge located at $x_1$.
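These two criteria are easy to verify numerically. Below is a minimal numpy sketch that locates a smooth rising edge both ways; the sigmoid gray level profile and all names are illustrative assumptions, not part of the chapter.

```python
import numpy as np

# A smooth rising edge: dark-to-bright transition centered at x = 0.
x = np.linspace(-5.0, 5.0, 1001)
f = 1.0 / (1.0 + np.exp(-3.0 * x))      # sigmoid gray level profile

d1 = np.gradient(f, x)                   # first derivative  f'(x)
d2 = np.gradient(d1, x)                  # second derivative f''(x)

# Gradient criterion: local maximum of |f'(x)|.
edge_grad = x[np.argmax(np.abs(d1))]

# Laplacian criterion: zero-crossing of f''(x); among any numerical
# crossings, keep the one with the strongest gradient magnitude.
crossings = np.where(np.diff(np.sign(d2)) != 0)[0]
edge_lap = x[crossings[np.argmax(np.abs(d1[crossings]))]]

print(edge_grad, edge_lap)               # both near 0, the true edge
```

Both criteria recover the same location on this clean signal; they differ mainly in their behavior under noise, which the following paragraphs take up.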
FIGURE 19.1
Edge detection in the 1D continuous case; changes in $f_c(x)$ indicate edges, and $x_0$ and $x_1$ are the edge locations found by local extrema of $f_c'(x)$ or by zero-crossings of $f_c''(x)$.

To use the gradient or the Laplacian approaches as the basis for practical image edge detectors, one must extend the process to two dimensions, adapt to the discrete case, and somehow deal with the difficulties presented by real images. Relative to the 1D edges
shown in Fig. 19.1, edges in 2D images have the additional quality of direction. One usually wishes to find edges regardless of direction, but a directionally sensitive edge detector can be useful at times. Also, the discrete nature of digital images requires the use of an approximation to the derivative. Finally, there are a number of problems that can confound the edge detection process in real images. These include noise, crosstalk or interference between nearby edges, and inaccuracies resulting from the use of a discrete grid. False edges, missing edges, and errors in edge location and orientation are often the result.
Because the derivative operator acts as a highpass filter, edge detectors based on it are sensitive to noise. It is easy for noise inherent in an image to corrupt the real edges by shifting their apparent locations and by adding many false edge pixels. Unless care is taken, seemingly moderate amounts of noise are capable of overwhelming the edge detection process, rendering the results virtually useless. The wide variety of edge detection algorithms developed over the past three decades exists, in large part, because of the many ways proposed for dealing with noise and its effects. Most algorithms employ noise-suppression filtering of some kind before applying the edge detector itself. Some decompose the image into a set of lowpass or bandpass versions, apply the edge detector to each, and merge the results. Still others use adaptive methods, modifying the edge detector's parameters and behavior according to the noise characteristics of the image data. Some recent work by Mathieu et al. [20] on fractional derivative operators shows some promise for enriching the gradient and Laplacian possibilities for edge detection. Fractional derivatives may allow better control of noise sensitivity, edge localization, and error rate under various conditions.
An important tradeoff exists between correct detection of the actual edges and precise location of their positions. Edge detection errors can occur in two forms: false positives, in which nonedge pixels are misclassified as edge pixels, and false negatives, which are the reverse. Detection errors of both types tend to increase with noise, making good noise suppression very important in achieving a high detection accuracy. In general, the potential for noise suppression improves with the spatial extent of the edge detection filter. Hence, the goal of maximum detection accuracy calls for a large-sized filter. Errors in edge localization also increase with noise. To achieve good localization, however, the filter should generally be of small spatial extent. The goals of detection accuracy and location accuracy are thus put into direct conflict, creating a kind of uncertainty principle for edge detection [28].
In this chapter, we cover the basics of gradient and Laplacian edge detection methods in some detail. Following each, we also describe several of the more important and useful edge detection algorithms based on that approach. While the primary focus is on gray level edge detectors, some discussion of edge detection in color and multispectral images is included.
19.2 GRADIENT-BASED METHODS
19.2.1 Continuous Gradient
The core of gradient edge detection is, of course, the gradient operator, $\nabla$. In continuous form, applied to a continuous-space image, $f_c(x,y)$, the gradient is defined as

$$\nabla f_c(x,y) = \frac{\partial f_c(x,y)}{\partial x}\,\mathbf{i}_x + \frac{\partial f_c(x,y)}{\partial y}\,\mathbf{i}_y, \qquad (19.1)$$

where $\mathbf{i}_x$ and $\mathbf{i}_y$ are the unit vectors in the $x$ and $y$ directions. Notice that the gradient is a vector, having both magnitude and direction. Its magnitude, $|\nabla f_c(x_0,y_0)|$, measures the maximum rate of change in the intensity at the location $(x_0,y_0)$. Its direction is that of the greatest increase in intensity; i.e., it points "uphill."
To produce an edge detector, one may simply extend the 1D case described earlier. Consider the effect of finding the local extrema of $\nabla f_c(x,y)$ or the local maxima of the gradient magnitude,

$$|\nabla f_c(x,y)| = \sqrt{\left(\frac{\partial f_c(x,y)}{\partial x}\right)^2 + \left(\frac{\partial f_c(x,y)}{\partial y}\right)^2}. \qquad (19.2)$$

Searching for local maxima over a full 2D neighborhood, however, tends to retain only the locally strongest of the edge contour points. To fully construct edge contours, it is better to apply Eq. (19.2) to a 1D local neighborhood, namely a line segment, whose direction is chosen to cross the edge. The situation is then similar to that of Fig. 19.1, where the point of locally maximum gradient magnitude is the edge point. Now the issue becomes how to select the best direction for the line segment used for the search.
The most commonly used method of producing edge segments or contours from Eq. (19.2) consists of two stages: thresholding and thinning. In the thresholding stage, the gradient magnitude at every point is compared with a predefined threshold value, $T$. All points satisfying the following criterion are classified as candidate edge points:

$$|\nabla f_c(x,y)| \geq T. \qquad (19.3)$$

The set of candidate edge points tends to form strips, which have positive width. Since the desire is usually for zero-width boundary segments or contours to describe the edges, a subsequent processing stage is needed to thin the strips to the final edge contours.
Edge contours derived from continuous-space images should have zero width because any local maxima of $|\nabla f_c(x,y)|$, along a line segment that crosses the edge, cannot be adjacent points. For the case of discrete-space images, the nonzero pixel size imposes a minimum practical edge width.
Edge thinning can be accomplished in a number of ways, depending on the application, but thinning by nonmaximum suppression is usually the best choice. Generally speaking, we wish to suppress any point that is not, in a 1D sense, a local maximum in gradient magnitude. Since a 1D local neighborhood search typically produces a single maximum, those points that are local maxima will form edge segments only one point wide. One approach classifies an edge-strip point as an edge point if its gradient magnitude is a local maximum in at least one direction. However, this thinning method sometimes has the side effect of creating false edges near strong edge lines [17]. It is also somewhat inefficient because of the computation required to check along a number of different directions. A better, more efficient thinning approach checks only a single direction, the gradient direction, to test whether a given point is a local maximum in gradient magnitude. The points that pass this scrutiny are classified as edge points. Looking in the gradient direction essentially searches perpendicular to the edge itself, producing a scenario similar to the 1D case shown in Fig. 19.1. The method is efficient because it is not necessary to search in multiple directions. It also tends to produce edge segments having good localization accuracy. These characteristics make the gradient-direction, local-extremum method quite popular. The following steps summarize its implementation.
1. Using one of the techniques described in the next section, compute $\nabla f$ for all pixels.
2. Determine candidate edge pixels by thresholding all pixels' gradient magnitudes by $T$.
3. Thin by suppressing all candidate edge pixels whose gradient magnitude is not a local maximum along the gradient direction. Those that survive nonmaximum suppression are classified as edge pixels.
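The three steps above map directly to a few lines of code. The following is a minimal numpy/scipy sketch, not the chapter's exact procedure: it uses the Sobel operator named in Fig. 19.3, and it quantizes the gradient direction to four directions (0°, 45°, 90°, 135°) as a common simplification of checking along the exact gradient direction. The threshold value and all function names are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import convolve

def sobel_edges(img, T):
    # Step 1: gradient via the Sobel operators.
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    gx = convolve(img, kx)        # horizontal difference
    gy = convolve(img, kx.T)      # vertical difference
    mag = np.hypot(gx, gy)
    # Step 2: threshold to get candidate edge pixels (Eq. 19.3).
    cand = mag >= T
    # Step 3: nonmaximum suppression along the (quantized) gradient
    # direction; border pixels are skipped for simplicity.
    ang = (np.rad2deg(np.arctan2(gy, gx)) + 180.0) % 180.0
    offs = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}
    edges = np.zeros_like(cand)
    h, w = img.shape
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            if not cand[i, j]:
                continue
            # Nearest of the four directions, with angular wraparound.
            d = min(offs, key=lambda a: min(abs(ang[i, j] - a),
                                            180.0 - abs(ang[i, j] - a)))
            di, dj = offs[d]
            if (mag[i, j] >= mag[i + di, j + dj] and
                    mag[i, j] >= mag[i - di, j - dj]):
                edges[i, j] = True
    return edges

# Example call: edges = sobel_edges(image.astype(float), T=100.0)
```

Thresholding before suppression follows the step order above; as the next paragraph notes, the two stages can be interchanged with different cost and predictability tradeoffs.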
The order of the thinning and thresholding steps might be interchanged. If thresholding is accomplished first, the computational cost of thinning can be significantly reduced. However, it can become difficult to predict the number of edge pixels that will be produced by a given threshold value. By thinning first, there tends to be somewhat better predictability of the richness of the resulting edge map as a function of the applied threshold.

Consider the effect of performing the thresholding and thinning operations in isolation. If thresholding alone were done, the edges would show as strips or patches instead of thin segments. If thinning were done without thresholding, that is, if edge points were simply those having locally maximum gradient magnitude, many false edge points would likely result because of noise. Noise tends to create false edge points because some points in edge-free areas happen to have locally maximum gradient magnitudes. The thresholding step of Eq. (19.3) is often useful to reduce noise either prior to or following thinning. A variety of adaptive methods have been developed that adjust the threshold according to certain image characteristics, such as an estimate of local signal-to-noise ratio. Adaptive thresholding can often do a better job of noise suppression while reducing the amount of edge fragmentation.
FIGURE 19.3
Gradient edge detection steps, using the Sobel operator: (a) after thresholding $|\nabla f|$; (b) after thinning (a) by finding the local maximum of $|\nabla f|$ along the gradient direction.
As $T$ increases, edge contours become more broken and fragmented. By decreasing $T$, one can obtain more connected and richer edge contours, but the greater noise sensitivity is likely to produce more false edges. If only thresholding is used, as in Eq. (19.3) and Fig. 19.3(a), the edge strips tend to narrow as $T$ increases and widen as it decreases. Figure 19.4 compares edge maps obtained from several different threshold values.
Sometimes a directional edge detector is useful. One can be obtained by decomposing the gradient into horizontal and vertical components and applying them separately. Expressed in the continuous domain, the operators become

$$\left|\frac{\partial f_c(x,y)}{\partial x}\right| \geq T \quad \text{for edges in the } x \text{ direction},$$

$$\left|\frac{\partial f_c(x,y)}{\partial y}\right| \geq T \quad \text{for edges in the } y \text{ direction}.$$

An example of directional edge detection is illustrated in Fig. 19.5.
A directional edge detector can be constructed for any desired direction by using the directional derivative along a unit vector $\mathbf{n}$,

$$\frac{\partial f_c}{\partial \mathbf{n}} = \nabla f_c(x,y) \cdot \mathbf{n} = \frac{\partial f_c}{\partial x}\cos\theta + \frac{\partial f_c}{\partial y}\sin\theta, \qquad (19.4)$$

where $\theta$ is the angle of $\mathbf{n}$ relative to the positive $x$ axis. The directional derivative is most sensitive to edges perpendicular to $\mathbf{n}$.
FIGURE 19.5
Directional edge detection comparison, using the Sobel operator: (a) results of the horizontal difference operator; (b) results of the vertical difference operator.

The continuous-space gradient magnitude produces an isotropic or rotationally symmetric edge detector, equally sensitive to edges in any direction [17]. It is easy to show why $|\nabla f|$ is isotropic. In addition to the original $X$-$Y$ coordinate system, let us introduce a new system, $X'$-$Y'$, which is rotated by an angle of $\theta$ relative to $X$-$Y$. Let $\mathbf{n}_{x'}$ and $\mathbf{n}_{y'}$ be
the unit vectors in the $x'$ and $y'$ directions, respectively. For the gradient magnitude to be isotropic, the same result must be produced in both coordinate systems, regardless of $\theta$. Using Eq. (19.4) along with abbreviated notation, the partial derivatives with respect to the new coordinate axes are

$$f_{x'} = \nabla f \cdot \mathbf{n}_{x'} = f_x\cos\theta + f_y\sin\theta,$$

$$f_{y'} = \nabla f \cdot \mathbf{n}_{y'} = -f_x\sin\theta + f_y\cos\theta.$$

From this point, it is a simple matter of plugging into Eq. (19.2) to show the gradient magnitudes are identical in both coordinate systems, regardless of the rotation angle, $\theta$.
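Writing out this plugging-in step makes the cancellation explicit:

$$f_{x'}^2 + f_{y'}^2 = (f_x\cos\theta + f_y\sin\theta)^2 + (-f_x\sin\theta + f_y\cos\theta)^2 = f_x^2 + f_y^2,$$

since the cross terms $2f_xf_y\sin\theta\cos\theta$ appear with opposite signs and $\cos^2\theta + \sin^2\theta = 1$. Substituting into Eq. (19.2) therefore yields the same gradient magnitude in both coordinate systems.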
Occasionally, one may wish to reduce the computation load of Eq. (19.2) by approximating the square root with a computationally simpler function. Three possibilities are given by Eqs. (19.5), (19.6), and (19.7); such approximations alter the behavior of the gradient somewhat. For instance, the approximated gradient magnitudes of Eqs. (19.5), (19.6), and (19.7) are not isotropic and produce their greatest errors for purely diagonally oriented edges.