If the segmentation symbol is not decoded properly, the data in the corresponding bit plane and in the subsequent bit planes of the code-block should be discarded. Finally, resynchronization markers, including the numbering of packets, are also inserted in front of each packet in a tile.
The performance of JPEG2000 in comparison with the JPEG baseline algorithm is briefly discussed in this section. The extensions included in Part 2 of the JPEG2000 standard are also listed.
17.11.1 Comparison of Performance
The efficiency of the JPEG2000 lossy coding algorithm in comparison with the JPEG baseline compression standard has been extensively studied, and key results are summarized in [7, 9, 24]. The superior RD and error resilience performance, together with features such as progressive coding by resolution, scalability, and region of interest, clearly demonstrate the advantages of JPEG2000 over baseline JPEG (with optimum Huffman codes). For coding common test images such as Foreman and Lena in the range of 0.125-1.25 bits/pixel, an improvement in the peak signal-to-noise ratio (PSNR) for JPEG2000 is consistently demonstrated at each compression ratio. For example, for the Foreman image, an improvement of 1.5 to 4 dB is observed as the bits per pixel are reduced from 1.2 to 0.12 [7].
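The PSNR figures quoted above are derived from the mean squared error between the original and decoded images. A minimal sketch of such a comparison is shown below; the arrays are synthetic stand-ins for an actual original/decoded image pair, and nothing here is part of either standard.

import numpy as np

def psnr(original: np.ndarray, decoded: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB for 8-bit images (peak value 255)."""
    mse = np.mean((original.astype(np.float64) - decoded.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Synthetic data standing in for an original image and its decoded version.
rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(512, 512)).astype(np.uint8)
noise = rng.integers(-5, 6, size=original.shape)
decoded = np.clip(original.astype(np.int16) + noise, 0, 255).astype(np.uint8)
print(f"PSNR = {psnr(original, decoded):.2f} dB")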
17.11.2 Part 2 Extensions
Most of the technologies that have not been included in Part 1, due to their complexity or because of intellectual property rights (IPR) issues, have been included in Part 2 [14]. These extensions concern the use of the following:
■ different offset values for the different image components;
■ different deadzone sizes for the different subbands;
■ trellis coded quantization (TCQ) [23];
■ visual masking based on the application of a nonlinearity to the wavelet coefficients [44, 45];
■ arbitrary wavelet decomposition for each tile component;
■ arbitrary wavelet filters;
■ single sample tile overlap;
■ arbitrary scaling of the ROI coefficients with the necessity to code and transmit the
ROI mask to the decoder;
■ nonlinear transformations of component samples and transformations to decorrelate multiple component data;
■ extensions to the JP2 file format.
Some sources and links for further information on the standards are provided here.
17.12.1 Useful Information and Links for the JPEG Standard
A key source of information on the JPEG compression standard is the book by Pennebaker and Mitchell [28]. This book also contains the entire text of the official committee draft international standards ISO DIS 10918-1 and ISO DIS 10918-2. The official standards document [11] contains information on JPEG Part 3.
The JPEG committee maintains an official website, http://www.jpeg.org, which contains general information about the committee and its activities, announcements, and other useful links related to the different JPEG standards. The JPEG FAQ is located at http://www.faqs.org/faqs/jpeg-faq/part1/preamble.html.
Free, portable C code for JPEG compression is available from the Independent JPEG Group (IJG). Source code, documentation, and test files are included. Version 6b is available from the IJG website.
17.12.2 Useful Information and Links for the JPEG2000 Standard
Useful sources of information on the JPEG2000 compression standard include two books published on the topic [1, 36]. Further information on the different parts of the JPEG2000 standard can be found on the JPEG website http://www.jpeg.org/jpeg2000.html. This website provides links to sites from which various official standards and other documents can be downloaded. It also provides links to sites from which software implementations of the standard can be downloaded. Some software implementations are available at the following addresses:
■ JJ2000 software, which can be accessed at http://www.jpeg2000.epfl.ch. The JJ2000 software is a Java implementation of JPEG2000 Part 1.
■ Kakadu software, which can be accessed at http://www.ee.unsw.edu.au/taubman/kakadu. The Kakadu software is a C++ implementation of JPEG2000 Part 1 and is provided with the book [36].
■ JasPer software, which can be accessed at http://www.ece.ubc.ca/mdadams/jasper/. JasPer is a C implementation of JPEG2000 that is free for commercial use.
[3] A. J. Ahumada and H. A. Peterson. Luminance model based DCT quantization for color image compression. Human Vision, Visual Processing, and Digital Display III, Proc. SPIE, 1666:365–374, 1992.
[4] A. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies. Image coding using the wavelet transform. IEEE Trans. Image Process., 1(2):205–220, 1992.
[5] E. Atsumi and N. Farvardin. Lossy/lossless region-of-interest image coding based on set partitioning in hierarchical trees. In Proc. IEEE Int. Conf. Image Process., 1(4–7):87–91, October 1998.
[6] A. Bilgin, P. J. Sementilli, and M. W. Marcellin. Progressive image coding using trellis coded quantization. IEEE Trans. Image Process., 8(11):1638–1643, 1999.
[7] D. Chai and A. Bouzerdoum. JPEG2000 image compression: an overview. In Australian and New Zealand Intelligent Information Systems Conference (ANZIIS'2001), Perth, Australia, 237–241, November 2001.
[8] C. Christopoulos, J. Askelof, and M. Larsson. Efficient methods for encoding regions of interest in the upcoming JPEG2000 still image coding standard. IEEE Signal Process. Lett., 7(9):247–249, 2000.
[9] C. Christopoulos, A. Skodras, and T. Ebrahimi. The JPEG 2000 still image coding system: an overview. IEEE Trans. Consum. Electron., 46(4):1103–1127, 2000.
[10] K. W. Chun, K. W. Lim, H. D. Cho, and J. B. Ra. An adaptive perceptual quantization algorithm for video coding. IEEE Trans. Consum. Electron., 39(3):555–558, 1993.
[11] ISO/IEC JTC 1/SC 29/WG 1 N 993. Information technology—digital compression and coding of continuous-tone still images. Recommendation T.84, ISO/IEC CD 10918-3, 1994.
[12] ISO/IEC International Standard 14492 and ITU Recommendation T.88. JBIG2 Bi-Level Image Compression Standard, 2000.
[13] ISO/IEC International Standard 15444-1 and ITU Recommendation T.800. Information Technology—JPEG2000 Image Coding System, 2000.
[14] ISO/IEC International Standard 15444-2 and ITU Recommendation T.801. Information Technology—JPEG2000 Image Coding System: Part 2, Extensions, 2001.
[15] ISO/IEC International Standard 15444-3 and ITU Recommendation T.802. Information Technology—JPEG2000 Image Coding System: Part 3, Motion JPEG2000, 2001.
[16] ISO/IEC International Standard 15444-4 and ITU Recommendation T.803. Information Technology—JPEG2000 Image Coding System: Part 4, Compliance Testing, 2001.
[17] ISO/IEC International Standard 15444-5 and ITU Recommendation T.804. Information Technology—JPEG2000 Image Coding System: Part 5, Reference Software, 2001.
[18] N. Jayant, R. Safranek, and J. Johnston. Signal compression based on models of human perception. Proc. IEEE, 83:1385–1422, 1993.
[19] JPEG2000. http://www.jpeg.org/jpeg2000/
[20] L. Karam. Lossless Image Compression. Chapter 15, The Essential Guide to Image Processing, Elsevier Academic Press, Burlington, MA, 2008.
[21] K. Konstantinides and D. Tretter. A method for variable quantization in JPEG for improved text quality in compound documents. In Proc. IEEE Int. Conf. Image Process., Chicago, IL, October 1998.
[22] D. Le Gall and A. Tabatabai. Subband coding of digital images using symmetric short kernel filters and arithmetic coding techniques. In Proc. Int. Conf. on Acoust., Speech, and Signal Process. (ICASSP'88), 761–764, April 1988.
[23] M. W. Marcellin and T. R. Fisher. Trellis coded quantization of memoryless and Gauss-Markov sources. IEEE Trans. Commun., 38(1):82–93, 1990.
[24] M. W. Marcellin, M. J. Gormish, A. Bilgin, and M. P. Boliek. An overview of JPEG2000. In Proc. IEEE Data Compression Conference, 523–541, 2000.
[25] N. Memon, C. Guillemot, and R. Ansari. The JPEG Lossless Compression Standards. Chapter 5.6, Handbook of Image and Video Processing, Elsevier Academic Press, Burlington, MA, 2005.
[26] P. Moulin. Multiscale Image Decomposition and Wavelets. Chapter 6, The Essential Guide to Image Processing, Elsevier Academic Press, Burlington, MA, 2008.
[27] W. B. Pennebaker, J. L. Mitchell, G. G. Langdon, and R. B. Arps. An overview of the basic principles of the Q-coder adaptive binary arithmetic coder. IBM J. Res. Dev., 32(6):717–726, 1988.
[28] W. B. Pennebaker and J. L. Mitchell. JPEG Still Image Data Compression Standard. Van Nostrand Reinhold, New York, 1993.
[29] M. Rabbani and R. Joshi. An overview of the JPEG2000 still image compression standard. Elsevier J. Signal Process., 17:3–48, 2002.
[30] V. Ratnakar and M. Livny. RD-OPT: an efficient algorithm for optimizing DCT quantization tables. In Proc. IEEE Data Compression Conference (DCC), Snowbird, UT, 332–341, 1995.
[31] K. R. Rao and P. Yip. Discrete Cosine Transform—Algorithms, Advantages, Applications. Academic Press, San Diego, CA, 1990.
[32] P. J. Sementilli, A. Bilgin, J. H. Kasner, and M. W. Marcellin. Wavelet TCQ: submission to JPEG2000. In Proc. SPIE, Applications of Digital Processing, 2–12, July 1998.
[33] A. Skodras, C. Christopoulos, and T. Ebrahimi. The JPEG 2000 still image compression standard. IEEE Signal Process. Mag., 18(5):36–58, 2001.
[34] B. J. Sullivan, R. Ansari, M. L. Giger, and H. MacMohan. Relative effects of resolution and quantization on the quality of compressed medical images. In Proc. IEEE Int. Conf. Image Process., Austin, TX, 987–991, November 1994.
[35] D. Taubman. High performance scalable image compression with EBCOT. IEEE Trans. Image Process., 9(7):1158–1170, 1999.
[36] D. Taubman and M. W. Marcellin. JPEG2000: Image Compression Fundamentals, Standards and Practice. Kluwer Academic Publishers, New York, 2002.
[37] R. VanderKam and P. Wong. Customized JPEG compression for grayscale printing. In Proc. Data Compression Conference (DCC), Snowbird, UT, 156–165, 1994.
[38] M. Vetterli and J. Kovacevic. Wavelets and Subband Coding. Prentice-Hall, Englewood Cliffs, NJ, 1995.
[39] G. K. Wallace. The JPEG still picture compression standard. Commun. ACM, 34(4):31–44, 1991.
[40] P. W. Wang. Image Quantization, Halftoning, and Printing. Chapter 8.1, Handbook of Image and Video Processing, Elsevier Academic Press, Burlington, MA, 2005.
[41] A. B. Watson. Visually optimal DCT quantization matrices for individual images. In Proc. IEEE Data Compression Conference (DCC), Snowbird, UT, 178–187, 1993.
[42] I. H. Witten, R. M. Neal, and J. G. Cleary. Arithmetic coding for data compression. Commun. ACM, 30(6):520–540, 1987.
[43] World Wide Web Consortium (W3C). Extensible Markup Language (XML) 1.0, 3rd ed., T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, and F. Yergeau, editors, http://www.w3.org/TR/REC-xml, 2004.
[44] W. Zeng, S. Daly, and S. Lei. Point-wise extended visual masking for JPEG2000 image compression. In Proc. IEEE Int. Conf. Image Process., Vancouver, BC, Canada, vol. 1, 657–660, September 2000.
[45] W. Zeng, S. Daly, and S. Lei. Visual optimization tools in JPEG2000. In Proc. IEEE Int. Conf. Image Process., Vancouver, BC, Canada, vol. 2, 37–40, September 2000.
18
Wavelet Image Compression
Zixiang Xiong¹ and Kannan Ramchandran²
¹Texas A&M University; ²University of California
18.1 WHAT ARE WAVELETS: WHY ARE THEY GOOD FOR IMAGE CODING?
During the past 15 years, wavelets have made quite a splash in the field of image compression. The FBI adopted a wavelet-based standard for fingerprint image compression. The JPEG2000 image compression standard [1], which is a much more efficient alternative to the old JPEG standard (see Chapter 17), is also based on wavelets. A natural question to ask then is why wavelets have made such an impact on image compression. This chapter will answer this question, providing both high-level intuition and illustrative details based on state-of-the-art wavelet-based coding algorithms. Visually appealing time-frequency-based analysis tools are sprinkled in generously to aid in our task.
Wavelets are tools for decomposing signals, such as images, into a hierarchy of increasing resolutions: as we consider more and more resolution layers, we get a more and more detailed look at the image. Figure 18.1 shows a three-level hierarchical wavelet decomposition of the popular test image Lena, from coarse to fine resolutions (for a detailed treatment of wavelets and multiresolution decompositions, also see Chapter 6). Wavelets can be regarded as "mathematical microscopes" that permit one to "zoom in" and "zoom out" of images at multiple resolutions. The remarkable thing about the wavelet decomposition is that it enables this zooming feature at absolutely no cost in terms of excess redundancy: for an M × N image, there are exactly MN wavelet coefficients—exactly the same as the number of original image pixels (see Fig. 18.2).
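As a concrete illustration of this "no excess redundancy" property, the following sketch computes a three-level 2D wavelet decomposition of a synthetic M × N image and verifies that the total number of wavelet coefficients equals MN. It uses the PyWavelets package and the Haar wavelet purely as an assumed, convenient tool; the chapter itself does not prescribe any particular software or filter.

import numpy as np
import pywt  # PyWavelets; the Haar filter keeps subband sizes exact for power-of-two images

# Synthetic 512 x 512 "image" standing in for Lena.
M, N = 512, 512
image = np.random.default_rng(0).random((M, N))

# Three-level 2D DWT: coeffs = [cA3, (cH3, cV3, cD3), (cH2, cV2, cD2), (cH1, cV1, cD1)].
coeffs = pywt.wavedec2(image, wavelet="haar", level=3)

count = coeffs[0].size + sum(band.size for level in coeffs[1:] for band in level)
print(count, M * N)  # both print 262144: as many wavelet coefficients as pixels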
As a basic tool for decomposing signals, wavelets can be considered as duals to the more traditional Fourier-based analysis methods that we encounter in undergraduate engineering curricula. Fourier analysis is associated with the very intuitive engineering concept of the "spectrum" or "frequency content" of a signal. Wavelet analysis, in contrast, is associated with the equally intuitive concept of the "resolution" or "scale" of the signal. At a functional level, Fourier analysis is to wavelet analysis as spectrum analyzers are to microscopes.
As wavelets and multiresolution decompositions have been described in greater depth in Chapter 6, our focus here will be more on the image compression application. Our goal is to provide a self-contained treatment of wavelets within the scope of their role in image compression. More importantly, our goal is to provide a high-level explanation for why they are well suited for image compression. Indeed, wavelets have superior properties vis-a-vis the more traditional Fourier-based method in the form of the discrete cosine transform (DCT) that is deployed in the old JPEG image compression standard (see Chapter 17). We will also cover powerful generalizations of wavelets, known as wavelet packets, that have already made an impact in the standardization world: the FBI fingerprint compression standard is based on wavelet packets.

FIGURE 18.1
A three-level hierarchical wavelet decomposition of the 512 × 512 color Lena image. Level 1 (512 × 512) is the one-level wavelet representation of the original Lena at Level 0; Level 2 (256 × 256) shows the one-level wavelet representation of the lowpass image at Level 1; and Level 3 (128 × 128) gives the one-level wavelet representation of the lowpass image at Level 2.

FIGURE 18.2
A three-level wavelet representation of the Lena image, generated from the top view of the three-level hierarchical wavelet decomposition in Fig. 18.1. It has exactly the same number of samples as in the image domain.
Although this chapter is about image coding,¹ which involves two-dimensional (2D) signals or images, it is much easier to understand the role of wavelets in image coding using a one-dimensional (1D) framework, as the conceptual extension to 2D is straightforward. In the interests of clarity, we will therefore consider a 1D treatment here. The story begins with what is known as the time-frequency analysis of the 1D signal. As mentioned, wavelets are a tool for changing the coordinate system in which we represent the signal: we transform the signal into another domain that is much better suited for processing, e.g., compression. What makes for a good transform or analysis tool? At the basic level, the goal is to be able to represent all the useful signal features and important phenomena in as compact a manner as possible. It is important to be able to compact the bulk of the signal energy into the fewest number of transform coefficients: this way, we can discard the bulk of the transform domain data without losing too much information. For example, if the signal is a time impulse, then the best thing is to do no transforms at all! Keep the signal information in its original and sparse time-domain representation, as that will maximize the temporal energy concentration or time resolution. However, what if the signal has a critical frequency component (e.g., a low-frequency background sinusoid) that lasts for a long time duration? In this case, the energy is spread out in the time domain, but it would be succinctly captured in a single frequency coefficient if one did a Fourier analysis of the signal. If we know that the signals of interest are pure sinusoids, then Fourier analysis is the way to go. But what if we want to capture both the time impulse and the frequency impulse with good resolution? Can we get arbitrarily fine resolution in both time and frequency?

¹We use the terms image compression and image coding interchangeably in this chapter.
The answer is no. There exists an uncertainty theorem (much like what we learn in quantum physics), which disallows the existence of arbitrary resolution in time and frequency [2]. A good way of conceptualizing these ideas and the role of wavelet basis functions is through what are known as time-frequency "tiling" plots, as shown in Fig. 18.3, which shows where the basis functions live on the time-frequency plane: i.e., where the bulk of the energy of the elementary basis elements is localized. Consider the Fourier case first. As impulses in time are completely spread out in the frequency domain, all time localization is lost with Fourier analysis. To alleviate this problem, one typically decomposes the signal into finite-length chunks using windows, the so-called short-time Fourier transform (STFT). Then, the time-frequency tradeoffs are determined by the window size. An STFT expansion consists of basis functions that are shifted versions of one another in both time and frequency: some elements capture low-frequency events localized in time, and others capture high-frequency events localized in time, but the resolution or window size is constant in both time and frequency (see Fig. 18.3(a)). Note that the uncertainty theorem says that the area of these tiles has to be nonzero.
Shown in Fig. 18.3(b) is the corresponding tiling diagram associated with the wavelet expansion. The key difference from the Fourier case, and this is the critical point, is that the tiles are not all of the same size in time (or frequency). Some basis elements have short time windows; others have short frequency windows. Of course, the uncertainty theorem ensures that the area of each tile is constant and nonzero. It can be shown that the basis functions are related to one another by shifts and scales, and this is the key to wavelet analysis.
Why are wavelets well suited for image compression? The answer lies in the time-frequency (or, more correctly, space-frequency) characteristics of typical natural images, which turn out to be well captured by the wavelet basis functions shown in Fig. 18.3(b). Note that the STFT tiling diagram of Fig. 18.3(a) is conceptually similar to what commercial DCT-based image transform coding methods like JPEG use. Why are wavelets inherently a better choice? Looking at Fig. 18.3(b), one can note that the wavelet basis offers elements having good frequency resolution at lower frequencies (the short and fat basis elements) while simultaneously offering elements that have good time resolution at higher frequencies (the tall and skinny basis elements).
This tradeoff works well for natural images and scenes that are typically composed of a mixture of important long-term low-frequency trends that have larger spatial duration (such as slowly varying backgrounds like the blue sky and the surface of lakes) as well as important transient short-duration high-frequency phenomena such as sharp edges. The wavelet representation turns out to be particularly well suited to capturing both the transient high-frequency phenomena such as image edges (using the tall and skinny tiles) and the long spatial duration low-frequency phenomena such as image backgrounds (the short and fat tiles). As natural images are dominated by a mixture of these kinds of events,² wavelets promise to be very efficient in capturing the bulk of the image energy in a small fraction of the coefficients.
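To make this energy-compaction claim concrete, the sketch below (our own illustration, not taken from the chapter) builds a 1D "scan line" containing a slowly varying trend plus a few sharp edges, and counts how many of the largest-magnitude coefficients are needed to capture 99% of the coefficient energy in the wavelet domain versus the Fourier domain. PyWavelets and the db4 filter are assumptions of the sketch; for signals of this type the wavelet count typically comes out far smaller.

import numpy as np
import pywt

# A signal with a smooth low-frequency trend plus two step discontinuities ("edges").
n = 1024
t = np.arange(n)
signal = np.sin(2 * np.pi * t / n) + 0.5 * (t > 300) - 0.8 * (t > 700)

def coeffs_for_99pct(coeffs) -> int:
    """Number of largest-magnitude coefficients holding 99% of the coefficient energy."""
    energy = np.sort(np.abs(coeffs))[::-1] ** 2
    return int(np.searchsorted(np.cumsum(energy), 0.99 * energy.sum()) + 1)

wavelet_coeffs = np.concatenate(pywt.wavedec(signal, "db4", level=6))
fourier_coeffs = np.fft.rfft(signal)

print("wavelet coefficients needed:", coeffs_for_99pct(wavelet_coeffs))
print("Fourier coefficients needed:", coeffs_for_99pct(fourier_coeffs))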
To summarize, the task of separating transient behavior from long-term trends is a very difficult one in image analysis and compression. In the case of images, the difficulty stems from the fact that statistical analysis methods often require the introduction of at least some local stationarity assumption, i.e., that the image statistics do not change abruptly over time. In practice, this assumption usually translates into ad hoc methods of blocking data samples for analysis, methods that can potentially obscure important signal features: e.g., if a block is chosen too big, a transient component might be totally neglected when computing averages. The blocking artifact in JPEG-decoded images at low rates is a result of the block-based DCT approach. A fundamental contribution of wavelet theory [3] is that it provides a unified framework in which transients and trends can be simultaneously analyzed without the need to resort to blocking methods.

²Typical images also contain textures; however, conceptually, textures can be assumed to be a dense concentration of edges, and so it is fairly accurate to model typical images as smooth regions delimited by edges.
As a way of highlighting the benefits of having a sparse representation, such as that provided by the wavelet decomposition, consider the lowest frequency band in the top level (Level 3) of the three-level wavelet hierarchy of Lena in Fig. 18.1. This band is just a downsampled (by a factor of 8² = 64) and smoothed version of the original image. A very simple way of achieving compression is to simply retain this lowpass version and throw away the rest of the wavelet data, instantly achieving a compression ratio of 64:1. Note that if we want a full-size approximation to the original, we would have to interpolate the lowpass band by a factor of 64—this can be done efficiently by using a three-stage synthesis filter bank (see Chapter 6). We may also desire better image fidelity, as we may be compromising high-frequency image detail, especially perceptually important high-frequency edge information. This is where wavelets are particularly attractive, as they are capable of capturing most image information in the highly subsampled low-frequency band and additional localized edge information in spatial clusters of coefficients in the high-frequency bands (see Fig. 18.1). The bulk of the wavelet data is insignificant and can be discarded or quantized very coarsely.
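A rough sketch of this "keep only the lowpass band" experiment follows, again assuming PyWavelets and a synthetic image in place of Lena. Zeroing all detail subbands keeps 1/64 of the coefficients, and the full-size approximation is obtained by running the inverse (synthesis) transform.

import numpy as np
import pywt

# Synthetic 512 x 512 test image: smooth gradient background plus a sharp bright square.
x = np.linspace(0, 1, 512)
image = np.outer(x, x) * 200.0
image[200:280, 140:220] += 50.0

coeffs = pywt.wavedec2(image, "haar", level=3)

# Keep only the Level-3 lowpass band (coeffs[0], 64 x 64); zero every detail subband.
kept = [coeffs[0]] + [tuple(np.zeros_like(band) for band in detail) for detail in coeffs[1:]]
approx = pywt.waverec2(kept, "haar")  # full-size approximation via the 3-stage synthesis bank

print("fraction of coefficients kept:", coeffs[0].size / image.size)   # 0.015625 = 1/64
print("MSE of lowpass-only approximation:", np.mean((image - approx) ** 2))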
Another attractive aspect is that the coarse-to-fine nature of the wavelet representation naturally facilitates a transmission scheme that progressively refines the received image quality. That is, it would be highly beneficial to have an encoded bitstream that can be chopped off at any desired point to provide a commensurate reconstruction image quality. This is known as a progressive transmission feature or an embedded bitstream (see Fig. 18.4). Many modern wavelet image coders have this feature, as will be covered in more detail in Section 18.5. It is ideally suited, for example, to Internet image applications. As is well known, the Internet is a heterogeneous mess in terms of the number of users and their computational capabilities and effective bandwidths. Wavelets provide a natural way to satisfy users having disparate bandwidth and computational capabilities: low-end users can be provided a coarse quality approximation, whereas higher-end users can use their increased bandwidth to get better fidelity. This is also very useful for Web browsing applications, where having a coarse quality image with a short waiting time may be preferable to having a detailed quality image with an unacceptable delay. These are some of the high-level reasons why wavelets represent a superior alternative to traditional Fourier-based methods for compressing natural images: this is why the JPEG2000 standard [1] uses wavelets instead of the Fourier-based DCT.
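The sketch below mimics that progressive behavior at the granularity of whole resolution levels, which is a simplification of true embedded coding (coders such as EZW refine individual bit planes): reconstructions are formed from the coarsest band alone and then with each detail level added, so each "truncation point" yields a progressively better image. The synthetic image and the use of PyWavelets are, as before, assumptions of the sketch.

import numpy as np
import pywt

x = np.linspace(0, 1, 512)
image = np.outer(x, 1 - x) * 255.0          # synthetic stand-in for a real image
image[100:150, 100:400] = 255.0             # a sharp, high-contrast feature

coeffs = pywt.wavedec2(image, "haar", level=3)

# Progressively include more detail levels: lowpass only, then +Level 3, +Level 2, +Level 1.
for n_detail_levels in range(len(coeffs)):
    partial = [coeffs[0]]
    for i, detail in enumerate(coeffs[1:], start=1):
        partial.append(detail if i <= n_detail_levels
                       else tuple(np.zeros_like(band) for band in detail))
    recon = pywt.waverec2(partial, "haar")
    print(f"detail levels used: {n_detail_levels}  MSE: {np.mean((image - recon) ** 2):10.3f}")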
In this chapter, we will review the salient aspects of the general compression problem and the transform coding paradigm in particular, and highlight the key differences between the class of early subband coders and the recent, more advanced class of modern-day wavelet image coders. We pick the celebrated embedded zerotree wavelet (EZW) coder as a representative of this latter class, and we describe its operation by using a simple illustrative example. We conclude with more powerful generalizations of the basic wavelet image coding framework to wavelet packets, which are particularly well suited to handle special classes of images such as fingerprints.

FIGURE 18.4
Multiresolution wavelet image representation naturally facilitates progressive transmission—a desirable feature for the transmission of compressed images over heterogeneous packet networks and wireless channels.

18.2 THE COMPRESSION PROBLEM
Image compression falls under the general umbrella of data compression, which has been studied theoretically in the field of information theory [4], pioneered by Claude Shannon [5] in 1948. Information theory sets the fundamental bounds on the compression performance theoretically attainable for certain classes of sources. This is very useful because it provides a theoretical benchmark against which one can compare the performance of more practical but suboptimal coding algorithms.
Historically, the lossless compression problem came first. Here the goal is to compress the source with no loss of information. Shannon showed that given any discrete source with a well-defined statistical characterization (i.e., a probability mass function), there is a fundamental theoretical limit to how well you can compress the source before you start to lose information. This limit is called the entropy of the source. In lay terms, entropy refers to the uncertainty of the source. For example, a source that takes on any of N discrete values a1, a2, ..., aN with equal probability has an entropy given by log2 N bits per source symbol. If the symbols are not equally likely, however, then one can do better, because more predictable symbols should be assigned fewer bits. The fundamental limit is the Shannon entropy of the source.
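As a quick check on these statements, the short sketch below (ours, not part of the chapter) computes the Shannon entropy of a discrete probability mass function: for N equally likely symbols it returns exactly log2 N bits, and for a skewed distribution it returns less.

import numpy as np

def shannon_entropy(pmf) -> float:
    """Entropy in bits per symbol: H = -sum_i p_i log2(p_i), ignoring zero-probability symbols."""
    p = np.asarray(pmf, dtype=np.float64)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # 4 equally likely symbols -> 2.0 bits
print(shannon_entropy([0.7, 0.1, 0.1, 0.1]))      # skewed source -> about 1.36 bits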
Lossless compression of images has been covered in Chapter 16. For image coding, typical lossless compression ratios are of the order of 2:1 or at most 3:1. For a 512 × 512 8-bit grayscale image, the uncompressed representation is 256 Kbytes. Lossless compression would reduce this to at best ∼80 Kbytes, which may still be excessive for many practical low-bandwidth transmission applications. Furthermore, lossless image compression is for the most part overkill, as our human visual system is highly tolerant to losses in visual information. For compression ratios in the range of 10:1 to 40:1 or more, lossless compression cannot do the job, and one needs to resort to lossy compression methods.
The formulation of the lossy data compression framework was also pioneered by Shannon in his work on rate-distortion (RD) theory [6], in which he formalized the theory of compressing certain limited classes of sources having well-defined statistical properties (e.g., independent, identically distributed (i.i.d.) sources having a Gaussian distribution) subject to a fidelity criterion, i.e., subject to a tolerance on the maximum allowable loss or distortion that can be endured. Typical distortion measures used are the mean square error (MSE) or the peak signal-to-noise ratio (PSNR), defined as 10 log10(255²/MSE) and measured in decibels (dB), between the original and compressed versions. These fundamental compression performance bounds are called the theoretical RD bounds for the source: they dictate the minimum rate R needed to compress the source if the tolerable distortion level is D (or alternatively, the minimum distortion D subject to a bit rate of R). These bounds are unfortunately not constructive; i.e., Shannon did not give an actual algorithm for attaining these bounds, and furthermore, they are based on arguments that assume infinite complexity and delay, which is obviously impractical in real life. However, these bounds are useful in as much as they provide valuable benchmarks for assessing the performance of more practical coding algorithms. The major obstacle, of course, as in the lossless case, is that these theoretical bounds are available only for a narrow class of sources, and it is difficult to make the connection to real-world image sources, which are difficult to model accurately with simplistic statistical models.
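One of the few cases where Shannon's RD bound has a closed form is the i.i.d. Gaussian source with variance σ² under MSE distortion, for which the distortion-rate bound is D(R) = σ² · 2^(−2R) (equivalently, R(D) = 0.5 log2(σ²/D)). The short sketch below evaluates this bound; it is offered as an illustration of the kind of theoretical benchmark referred to above, not as a property of any particular coder.

import numpy as np

def gaussian_distortion_rate(sigma2: float, rate_bits):
    """Shannon bound D(R) = sigma^2 * 2**(-2R) for an i.i.d. Gaussian source under MSE distortion."""
    return sigma2 * 2.0 ** (-2.0 * np.asarray(rate_bits, dtype=np.float64))

sigma2 = 1.0
rates = np.array([0.25, 0.5, 1.0, 2.0, 4.0])            # bits per sample
for r, d in zip(rates, gaussian_distortion_rate(sigma2, rates)):
    snr_db = 10.0 * np.log10(sigma2 / d)                 # roughly 6.02 dB of SNR per bit
    print(f"R = {r:4.2f} bits/sample -> D = {d:.4f}, SNR = {snr_db:5.2f} dB")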
Shannon's theoretical RD framework has inspired the design of more practical operational RD frameworks, in which the goal is similar but the framework is constrained to be more practical. Within the operational constraints of the chosen coding
framework, the goal of operational RD theory is to minimize the rate R subject to a distortion constraint D, or vice versa. The message of Shannon's RD theory is that one can come close to the theoretical compression limit of the source if one considers vectors of source symbols that get infinitely large in dimension in the limit; i.e., it is a good idea not to code the source symbols one at a time, but to consider chunks of them at a time, and the bigger the chunks the better. This thinking has spawned an important field known as vector quantization (VQ) [7], which, as the name indicates, is concerned with the theory and practice of quantizing sources using high-dimensional VQ. There are practical difficulties arising from making these vectors too high-dimensional because of complexity constraints, so practical frameworks involve relatively small-dimensional vectors that are therefore further from the theoretical bound.
Due to this difficulty, there has been a much more popular image compression framework that has taken off in practice: this is the transform coding framework [8] that forms the basis of current commercial image and video compression standards like JPEG and MPEG (see Chapters 9 and 10 in [9]). The transform coding paradigm can be construed as a practical special case of VQ that can attain the promised gains of processing source symbols in vectors through the use of efficiently implemented high-dimensional source transforms.
In a typical transform image coding system, the encoder consists of a linear transform operation, followed by quantization of the transform coefficients, and lossless compression of the quantized coefficients using an entropy coder. After the encoded bitstream of an input image is transmitted over the channel (assumed to be perfect), the decoder undoes all the functionalities applied in the encoder and tries to reconstruct a decoded image that looks as close as possible to the original input image, based on the transmitted information. A block diagram of this transform image coding paradigm is shown in Fig. 18.5.
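A bare-bones version of the encoder/decoder chain of Fig. 18.5, minus the entropy coding and decoding stages (the integer quantizer indices would be losslessly entropy coded in a real system), can be sketched as follows. This is our own illustration rather than anything specified in the chapter; the 8 × 8 orthonormal DCT is used merely as a stand-in linear transform, as in JPEG.

import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix; its rows are the transform basis vectors."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

n, q = 8, 20.0
T = dct_matrix(n)
block = np.outer(np.linspace(100, 140, n), np.ones(n)) + 5.0 * np.random.default_rng(1).random((n, n))

# Encoder: linear transform, then uniform scalar quantization to integer indices.
Y = T @ block @ T.T
indices = np.round(Y / q).astype(int)   # these indices would then be entropy coded

# Decoder: inverse (de)quantization, then inverse transform.
Y_hat = indices * q
block_hat = T.T @ Y_hat @ T

print("max reconstruction error:", np.max(np.abs(block - block_hat)))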
For the sake of simplicity, let us look at a 1D example of how transform coding is done (for 2D images, we treat the rows and columns separately as 1D signals). Suppose we have a two-point signal, x0 = 216, x1 = 217. It takes 16 bits (8 bits for each sample) to store this signal in a computer. In transform coding, we first put x0 and x1 in a column vector X = [x0, x1]^T and apply an orthogonal transform T with entries ±0.707, i.e., T = (1/√2)[1 1; 1 −1], to obtain Y = TX = [y0, y1]^T, where y0 = (x0 + x1)/√2 ≈ 306.18 and y1 = (x0 − x1)/√2 ≈ −0.707. The transform T can be conceptualized as a counter-clockwise rotation of the signal vector X by 45° with respect to the original (x0, x1) coordinate system. Alternatively and more conveniently, one can think of the signal vector as being fixed and instead rotate the (x0, x1) coordinate system by 45° clockwise to the new (y1, y0) coordinate system (see Fig. 18.6). Note that the abscissa for the new coordinate system is now y1.
FIGURE 18.5
Block diagram of a transform image coding system. The encoder applies a linear transform, quantization, and entropy coding to the original image to produce the encoded bitstream (0.5 b/p in this example); the decoder applies entropy decoding, inverse quantization, and the inverse transform to produce the decoded image.
FIGURE 18.6
The transform T can be conceptualized as a counter-clockwise rotation of the signal vector X by 45° with respect to the original (x0, x1) coordinate system.
Orthogonality of the transform simply means that the length of Y is the same as the length of X (which is even more obvious when one freezes the signal vector and rotates the coordinate system as discussed above). This concept still carries over to the case of high-dimensional transforms. If we decide to use the simplest form of quantization, known as uniform scalar quantization, where we round off a real number to the nearest integer multiple of a step size q (say q = 20), then the quantizer index vector Î, which captures what integer multiples of q are nearest to the entries of Y, is given by