
If the segmentation symbol is not decoded properly, the data in the corresponding bit plane and of the subsequent bit planes in the code-block should be discarded. Finally, resynchronization markers, including the numbering of packets, are also inserted in front of each packet in a tile.

The performance of JPEG2000 when compared with the JPEG baseline algorithm is briefly discussed in this section. The extensions included in Part 2 of the JPEG2000 standard are also listed.

17.11.1 Comparison of Performance

The efficiency of the JPEG2000 lossy coding algorithm in comparison with the JPEG baseline compression standard has been extensively studied, and key results are summarized in [7, 9, 24]. The superior RD and error resilience performance, together with features such as progressive coding by resolution, scalability, and region of interest, clearly demonstrate the advantages of JPEG2000 over the baseline JPEG (with optimum Huffman codes). For coding common test images such as Foreman and Lena in the range of 0.125–1.25 bits/pixel, an improvement in the peak signal-to-noise ratio (PSNR) for JPEG2000 is consistently demonstrated at each compression ratio. For example, for the Foreman image, an improvement of 1.5 to 4 dB is observed as the bits per pixel are reduced from 1.2 to 0.12 [7].

17.11.2 Part 2 Extensions

Most of the technologies that have not been included in Part 1 due to their complexity or because of intellectual property rights (IPR) issues have been included in Part 2 [14]. These extensions concern the use of the following:

■ different offset values for the different image components;
■ different deadzone sizes for the different subbands;
■ TCQ [23];
■ visual masking based on the application of a nonlinearity to the wavelet coefficients [44, 45];
■ arbitrary wavelet decomposition for each tile component;
■ arbitrary wavelet filters;
■ single sample tile overlap;
■ arbitrary scaling of the ROI coefficients with the necessity to code and transmit the ROI mask to the decoder;


■ nonlinear transformations of component samples and transformations to decorrelate multiple component data;
■ extensions to the JP2 file format.

Some sources and links for further information on the standards are provided here.

17.12.1 Useful Information and Links for the JPEG Standard

A key source of information on the JPEG compression standard is the book by Pennebaker and Mitchell [28]. This book also contains the entire text of the official committee draft international standard ISO DIS 10918-1 and ISO DIS 10918-2. The official standards document [11] contains information on JPEG Part 3.

The JPEG committee maintains an official website, http://www.jpeg.org, which contains general information about the committee and its activities, announcements, and other useful links related to the different JPEG standards. The JPEG FAQ is located at http://www.faqs.org/faqs/jpeg-faq/part1/preamble.html.

Free, portable C code for JPEG compression is available from the Independent JPEG Group (IJG). Source code, documentation, and test files are included. Version 6b is available from

17.12.2 Useful Information and Links for the JPEG2000 Standard

Useful sources of information on the JPEG2000 compression standard include two books published on the topic [1, 36]. Further information on the different parts of the JPEG2000 standard can be found on the JPEG website http://www.jpeg.org/jpeg2000.html. This website provides links to sites from which various official standards and other documents can be downloaded. It also provides links to sites from which software implementations of the standard can be downloaded. Some software implementations are available at the following addresses:

■ JJ2000 software, which can be accessed at http://www.jpeg2000.epfl.ch. The JJ2000 software is a Java implementation of JPEG2000 Part 1.
■ Kakadu software, which can be accessed at http://www.ee.unsw.edu.au/taubman/kakadu. The Kakadu software is a C++ implementation of JPEG2000 Part 1 and is provided with the book [36].
■ Jasper software, which can be accessed at http://www.ece.ubc.ca/mdadams/jasper/. Jasper is a C implementation of JPEG2000 that is free for commercial use.

REFERENCES

[3] A. J. Ahumada and H. A. Peterson. Luminance model based DCT quantization for color image compression. Human Vision, Visual Processing, and Digital Display III, Proc. SPIE, 1666:365–374, 1992.
[4] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies. Image coding using the wavelet transform. IEEE Trans. Image Process., 1(2):205–220, 1992.
[5] E. Atsumi and N. Farvardin. Lossy/lossless region-of-interest image coding based on set partitioning in hierarchical trees. In Proc. IEEE Int. Conf. Image Process., 1(4–7):87–91, October 1998.
[6] A. Bilgin, P. J. Sementilli, and M. W. Marcellin. Progressive image coding using trellis coded quantization. IEEE Trans. Image Process., 8(11):1638–1643, 1999.
[7] D. Chai and A. Bouzerdoum. JPEG2000 image compression: an overview. Australian and New Zealand Intelligent Information Systems Conference (ANZIIS'2001), Perth, Australia, 237–241, November 2001.
[8] C. Christopoulos, J. Askelof, and M. Larsson. Efficient methods for encoding regions of interest in the upcoming JPEG2000 still image coding standard. IEEE Signal Process. Lett., 7(9):247–249, 2000.
[9] C. Christopoulos, A. Skodras, and T. Ebrahimi. The JPEG 2000 still image coding system: an overview. IEEE Trans. Consum. Electron., 46(4):1103–1127, 2000.
[10] K. W. Chun, K. W. Lim, H. D. Cho, and J. B. Ra. An adaptive perceptual quantization algorithm for video coding. IEEE Trans. Consum. Electron., 39(3):555–558, 1993.
[11] ISO/IEC JTC 1/SC 29/WG 1 N 993. Information technology—digital compression and coding of continuous-tone still images. Recommendation T.84, ISO/IEC CD 10918-3, 1994.
[12] ISO/IEC International Standard 14492 and ITU Recommendation T.88. JBIG2 Bi-Level Image Compression Standard, 2000.
[13] ISO/IEC International Standard 15444-1 and ITU Recommendation T.800. Information Technology—JPEG2000 Image Coding System, 2000.


[14] ISO/IEC International Standard 15444-2 and ITU Recommendation T.801. Information Technology—JPEG2000 Image Coding System: Part 2, Extensions, 2001.
[15] ISO/IEC International Standard 15444-3 and ITU Recommendation T.802. Information Technology—JPEG2000 Image Coding System: Part 3, Motion JPEG2000, 2001.
[16] ISO/IEC International Standard 15444-4 and ITU Recommendation T.803. Information Technology—JPEG2000 Image Coding System: Part 4, Compliance Testing, 2001.
[17] ISO/IEC International Standard 15444-5 and ITU Recommendation T.804. Information Technology—JPEG2000 Image Coding System: Part 5, Reference Software, 2001.
[18] N. Jayant, R. Safranek, and J. Johnston. Signal compression based on models of human perception. Proc. IEEE, 83:1385–1422, 1993.
[19] JPEG2000. http://www.jpeg.org/jpeg2000/
[20] L. Karam. Lossless Image Compression, Chapter 15, The Essential Guide to Image Processing. Elsevier Academic Press, Burlington, MA, 2008.
[21] K. Konstantinides and D. Tretter. A method for variable quantization in JPEG for improved text quality in compound documents. In Proc. IEEE Int. Conf. Image Process., Chicago, IL, October 1998.
[22] D. Le Gall and A. Tabatabai. Subband coding of digital images using symmetric short kernel filters and arithmetic coding techniques. In Proc. Int. Conf. on Acoust., Speech and Signal Process., ICASSP'88, 761–764, April 1988.
[23] M. W. Marcellin and T. R. Fisher. Trellis coded quantization of memoryless and Gauss-Markov sources. IEEE Trans. Commun., 38(1):82–93, 1990.
[24] M. W. Marcellin, M. J. Gormish, A. Bilgin, and M. P. Boliek. An overview of JPEG2000. In Proc. of IEEE Data Compression Conference, 523–541, 2000.
[25] N. Memon, C. Guillemot, and R. Ansari. The JPEG Lossless Compression Standards. Chapter 5.6, Handbook of Image and Video Processing. Elsevier Academic Press, Burlington, MA, 2005.
[26] P. Moulin. Multiscale Image Decomposition and Wavelets, Chapter 6, The Essential Guide to Image Processing. Elsevier Academic Press, Burlington, MA, 2008.
[27] W. B. Pennebaker, J. L. Mitchell, G. G. Langdon, and R. B. Arps. An overview of the basic principles of the Q-coder adaptive binary arithmetic coder. IBM J. Res. Dev., 32(6):717–726, 1988.
[28] W. B. Pennebaker and J. L. Mitchell. JPEG Still Image Data Compression Standard. Van Nostrand Reinhold, New York, 1993.
[29] M. Rabbani and R. Joshi. An overview of the JPEG2000 still image compression standard. Elsevier J. Signal Process., 17:3–48, 2002.
[30] V. Ratnakar and M. Livny. RD-OPT: an efficient algorithm for optimizing DCT quantization tables. IEEE Proc. Data Compression Conference (DCC), Snowbird, UT, 332–341, 1995.
[31] K. R. Rao and P. Yip. Discrete Cosine Transform—Algorithms, Advantages, Applications. Academic Press, San Diego, CA, 1990.
[32] P. J. Sementilli, A. Bilgin, J. H. Kasner, and M. W. Marcellin. Wavelet TCQ: submission to JPEG2000. In Proc. SPIE, Applications of Digital Processing, 2–12, July 1998.
[33] A. Skodras, C. Christopoulos, and T. Ebrahimi. The JPEG 2000 still image compression standard. IEEE Signal Process. Mag., 18(5):36–58, 2001.
[34] B. J. Sullivan, R. Ansari, M. L. Giger, and H. MacMohan. Relative effects of resolution and quantization on the quality of compressed medical images. In Proc. IEEE Int. Conf. Image Process., Austin, TX, 987–991, November 1994.


[35] D. Taubman. High performance scalable image compression with EBCOT. IEEE Trans. Image Process., 9(7):1158–1170, 1999.
[36] D. Taubman and M. W. Marcellin. JPEG2000: Image Compression Fundamentals, Standards and Practice. Kluwer Academic Publishers, New York, 2002.
[37] R. VanderKam and P. Wong. Customized JPEG compression for grayscale printing. In Proc. Data Compression Conference (DCC), Snowbird, UT, 156–165, 1994.
[38] M. Vetterli and J. Kovacevic. Wavelets and Subband Coding. Prentice-Hall, Englewood Cliffs, NJ, 1995.
[39] G. K. Wallace. The JPEG still picture compression standard. Commun. ACM, 34(4):31–44, 1991.
[40] P. W. Wang. Image Quantization, Halftoning, and Printing. Chapter 8.1, Handbook of Image and Video Processing. Elsevier Academic Press, Burlington, MA, 2005.
[41] A. B. Watson. Visually optimal DCT quantization matrices for individual images. In Proc. IEEE Data Compression Conference (DCC), Snowbird, UT, 178–187, 1993.
[42] I. H. Witten, R. M. Neal, and J. G. Cleary. Arithmetic coding for data compression. Commun. ACM, 30(6):520–540, 1987.
[43] World Wide Web Consortium (W3C). Extensible Markup Language (XML) 1.0, 3rd ed., T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, and F. Yergeau, editors, http://www.w3.org/TR/REC-xml, 2004.
[44] W. Zeng, S. Daly, and S. Lei. Point-wise extended visual masking for JPEG2000 image compression. In Proc. IEEE Int. Conf. Image Process., Vancouver, BC, Canada, vol. 1, 657–660, September 2000.
[45] W. Zeng, S. Daly, and S. Lei. Visual optimization tools in JPEG2000. In Proc. IEEE Int. Conf. Image Process., Vancouver, BC, Canada, vol. 2, 37–40, September 2000.


18
Wavelet Image Compression

Zixiang Xiong¹ and Kannan Ramchandran²
¹Texas A&M University; ²University of California

18.1 WHAT ARE WAVELETS: WHY ARE THEY GOOD FOR IMAGE CODING?

During the past 15 years, wavelets have made quite a splash in the field of image compression. The FBI adopted a wavelet-based standard for fingerprint image compression. The JPEG2000 image compression standard [1], which is a much more efficient alternative to the old JPEG standard (see Chapter 17), is also based on wavelets. A natural question to ask then is why wavelets have made such an impact on image compression. This chapter will answer this question, providing both high-level intuition and illustrative details based on state-of-the-art wavelet-based coding algorithms. Visually appealing time-frequency-based analysis tools are sprinkled in generously to aid in our task.

Wavelets are tools for decomposing signals, such as images, into a hierarchy of increasing resolutions: as we consider more and more resolution layers, we get a more and more detailed look at the image. Figure 18.1 shows a three-level hierarchy wavelet decomposition of the popular test image Lena from coarse to fine resolutions (for a detailed treatment on wavelets and multiresolution decompositions, also see Chapter 6). Wavelets can be regarded as "mathematical microscopes" that permit one to "zoom in" and "zoom out" of images at multiple resolutions. The remarkable thing about the wavelet decomposition is that it enables this zooming feature at absolutely no cost in terms of excess redundancy: for an M × N image, there are exactly MN wavelet coefficients—exactly the same as the number of original image pixels (see Fig. 18.2).
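To make the no-redundancy point concrete, here is a small sketch (assuming the PyWavelets library and the Haar filter purely for illustration; the chapter does not specify an implementation) that decomposes an image into three resolution levels and checks that the total number of wavelet coefficients equals the number of pixels:

```python
import numpy as np
import pywt  # PyWavelets; the Haar filter below is an illustrative choice

# Stand-in for the 512 x 512 Lena image used in the chapter.
image = np.random.rand(512, 512)

# Three-level 2D wavelet decomposition, analogous to Fig. 18.1.
coeffs = pywt.wavedec2(image, wavelet='haar', level=3)

# coeffs[0] is the coarsest lowpass band; each later entry holds the
# (horizontal, vertical, diagonal) detail subbands of one resolution level.
n_coeffs = coeffs[0].size + sum(d.size for level in coeffs[1:] for d in level)
print(n_coeffs == image.size)  # True: exactly M*N coefficients, no excess redundancy
```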

As a basic tool for decomposing signals, wavelets can be considered as duals to the more traditional Fourier-based analysis methods that we encounter in traditional undergraduate engineering curricula. Fourier analysis is associated with the very intuitive engineering concept of "spectrum" or "frequency content" of the signal. Wavelet analysis, in contrast, is associated with the equally intuitive concept of "resolution" or "scale" of the signal. At a functional level, Fourier analysis is to wavelet analysis as spectrum analyzers are to microscopes.

As wavelets and multiresolution decompositions have been described in greater depth in Chapter 6, our focus here will be more on the image compression application. Our goal is to provide a self-contained treatment of wavelets within the scope of their role in image compression.


FIGURE 18.1
A three-level hierarchy wavelet decomposition of the 512 × 512 color Lena image. Level 1 (512 × 512) is the one-level wavelet representation of the original Lena at Level 0; Level 2 (256 × 256) shows the one-level wavelet representation of the lowpass image at Level 1; and Level 3 (128 × 128) gives the one-level wavelet representation of the lowpass image at Level 2.



FIGURE 18.2
A three-level wavelet representation of the Lena image, generated from the top view of the three-level hierarchy wavelet decomposition in Fig. 18.1. It has exactly the same number of samples as in the image domain.

More importantly, our goal is to provide a high-level explanation for why they are well suited for image compression. Indeed, wavelets have superior properties vis-a-vis the more traditional Fourier-based method in the form of the discrete cosine transform (DCT) that is deployed in the old JPEG image compression standard (see Chapter 17). We will also cover powerful generalizations of wavelets, known as wavelet packets, that have already made an impact in the standardization world: the FBI fingerprint compression standard is based on wavelet packets.

Although this chapter is about image coding,¹ which involves two-dimensional (2D) signals or images, it is much easier to understand the role of wavelets in image coding using a one-dimensional (1D) framework, as the conceptual extension to 2D is straightforward. In the interests of clarity, we will therefore consider a 1D treatment here. The story begins with what is known as the time-frequency analysis of the 1D signal. As mentioned, wavelets are a tool for changing the coordinate system in which we represent the signal: we transform the signal into another domain that is much better suited for processing, e.g., compression. What makes for a good transform or analysis tool? At the basic level, the goal is to be able to represent all the useful signal features and important phenomena in as compact a manner as possible. It is important to be able to compact the bulk of the signal energy into the fewest number of transform coefficients: this way, we can discard the bulk of the transform domain data without losing too much information. For example, if the signal is a time impulse, then the best thing is to do no transforms at all!

¹We use the terms image compression and image coding interchangeably in this chapter.


Keep the signal information in its original and sparse time-domain representation, as that will maximize the temporal energy concentration or time resolution. However, what if the signal has a critical frequency component (e.g., a low-frequency background sinusoid) that lasts for a long time duration? In this case, the energy is spread out in the time domain, but it would be succinctly captured in a single frequency coefficient if one did a Fourier analysis of the signal. If we know that the signals of interest are pure sinusoids, then Fourier analysis is the way to go. But, what if we want to capture both the time impulse and the frequency impulse with good resolution? Can we get arbitrarily fine resolution in both time and frequency?

The answer is no. There exists an uncertainty theorem (much like what we learn in quantum physics), which disallows the existence of arbitrary resolution in time and frequency [2]. A good way of conceptualizing these ideas and the role of wavelet basis functions is through what is known as time-frequency "tiling" plots, as shown in Fig. 18.3, which shows where the basis functions live on the time-frequency plane: i.e., where is the bulk of the energy of the elementary basis elements localized? Consider the Fourier case first.



As impulses in time are completely spread out in the frequency domain, all localization is lost with Fourier analysis. To alleviate this problem, one typically decomposes the signal into finite-length chunks using windows, or the so-called short-time Fourier transform (STFT). Then, the time-frequency tradeoffs will be determined by the window size. An STFT expansion consists of basis functions that are shifted versions of one another in both time and frequency: some elements capture low-frequency events localized in time, and others capture high-frequency events localized in time, but the resolution or window size is constant in both time and frequency (see Fig. 18.3(a)). Note that the uncertainty theorem says that the area of these tiles has to be nonzero.
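As a rough numerical illustration of this fixed-window tradeoff (a sketch only; the test signal, window lengths, and use of SciPy's STFT are assumptions, not taken from the chapter), consider analyzing a signal that contains both a sustained low-frequency sinusoid and a short click with two different window sizes:

```python
import numpy as np
from scipy.signal import stft

fs = 1000                                   # sampling rate in Hz
t = np.arange(0, 1, 1 / fs)
x = np.sin(2 * np.pi * 50 * t)              # long-duration low-frequency trend
x[500] += 5.0                               # short transient "click" at t = 0.5 s

# Long window: fine frequency resolution, but the click's timing is blurred.
f_long, t_long, Z_long = stft(x, fs=fs, nperseg=256)
# Short window: the click is well localized in time, but the sinusoid smears in frequency.
f_short, t_short, Z_short = stft(x, fs=fs, nperseg=32)

print(Z_long.shape, Z_short.shape)          # (frequency bins, time frames) for each window choice
```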

Shown in Fig. 18.3(b) is the corresponding tiling diagram associated with the wavelet expansion. The key difference between this and the Fourier case, which is the critical point, is that the tiles are not all of the same size in time (or frequency). Some basis elements have short time windows; others have short frequency windows. Of course, the uncertainty theorem ensures that the area of each tile is constant and nonzero. It can be shown that the basis functions are related to one another by shifts and scales, as this is the key to wavelet analysis.

Why are wavelets well suited for image compression? The answer lies in the time-frequency (or more correctly, space-frequency) characteristics of typical natural images, which turn out to be well captured by the wavelet basis functions shown in Fig. 18.3(b). Note that the STFT tiling diagram of Fig. 18.3(a) is conceptually similar to what commercial DCT-based image transform coding methods like JPEG use. Why are wavelets inherently a better choice? Looking at Fig. 18.3(b), one can note that the wavelet basis offers elements having good frequency resolution at lower frequencies (the short and fat basis elements) while simultaneously offering elements that have good time resolution at higher frequencies (the tall and skinny basis elements).

This tradeoff works well for natural images and scenes that are typically composed of a mixture of important long-term low-frequency trends that have larger spatial duration (such as slowly varying backgrounds like the blue sky and the surface of lakes) as well as important transient short-duration high-frequency phenomena such as sharp edges. The wavelet representation turns out to be particularly well suited to capturing both the transient high-frequency phenomena such as image edges (using the tall and skinny tiles) and long spatial duration low-frequency phenomena such as image backgrounds (the short and fat tiles). As natural images are dominated by a mixture of these kinds of events,² wavelets promise to be very efficient in capturing the bulk of the image energy in a small fraction of the coefficients.

To summarize, the task of separating transient behavior from long-term trends is a very difficult task in image analysis and compression. In the case of images, the difficulty stems from the fact that statistical analysis methods often require the introduction of at least some local stationarity assumption, i.e., the image statistics do not change abruptly over time.

²Typical images also contain textures; however, conceptually, textures can be assumed to be a dense concentration of edges, and so it is fairly accurate to model typical images as smooth regions delimited by edges.


In practice, this assumption usually translates into ad hoc methods to block data samples for analysis, methods that can potentially obscure important signal features: e.g., if a block is chosen too big, a transient component might be totally neglected when computing averages. The blocking artifact in JPEG decoded images at low rates is a result of the block-based DCT approach. A fundamental contribution of wavelet theory [3] is that it provides a unified framework in which transients and trends can be simultaneously analyzed without the need to resort to blocking methods.

As a way of highlighting the benefits of having a sparse representation, such as that provided by the wavelet decomposition, consider the lowest frequency band in the top level (Level 3) of the three-level wavelet hierarchy of Lena in Fig. 18.1. This band is just a downsampled (by a factor of 8² = 64) and smoothed version of the original image. A very simple way of achieving compression is to simply retain this lowpass version and throw away the rest of the wavelet data, instantly achieving a compression ratio of 64:1. Note that if we want a full-size approximation to the original, we would have to interpolate the lowpass band by a factor of 64—this can be done efficiently by using a three-stage synthesis filter bank (see Chapter 6). We may also desire better image fidelity, as we may be compromising high-frequency image detail, especially perceptually important high-frequency edge information. This is where wavelets are particularly attractive, as they are capable of capturing most image information in the highly subsampled low-frequency band and additional localized edge information in spatial clusters of coefficients in the high-frequency bands (see Fig. 18.1). The bulk of the wavelet data is insignificant and can be discarded or quantized very coarsely.
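A minimal sketch of this crude keep-only-the-lowpass scheme (again assuming PyWavelets and the Haar filter as stand-ins; this is the 64:1 thought experiment above, not a practical coder):

```python
import numpy as np
import pywt

image = np.random.rand(512, 512)  # stand-in for the 512 x 512 Lena image

# Three-level decomposition: coeffs[0] is the 64 x 64 lowpass (Level 3) band.
coeffs = pywt.wavedec2(image, wavelet='haar', level=3)

# "Compression" at 64:1: keep the lowpass band, zero out every detail band.
kept = [coeffs[0]] + [tuple(np.zeros_like(d) for d in level) for level in coeffs[1:]]

# The three-stage synthesis filter bank interpolates back to full size.
approximation = pywt.waverec2(kept, wavelet='haar')
print(coeffs[0].shape, approximation.shape)  # (64, 64) retained; (512, 512) reconstruction
```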

Another attractive aspect is that the coarse-to-fine nature of the wavelet representation naturally facilitates a transmission scheme that progressively refines the received image quality. That is, it would be highly beneficial to have an encoded bitstream that can be chopped off at any desired point to provide a commensurate reconstruction image quality. This is known as a progressive transmission feature or as an embedded bitstream (see Fig. 18.4). Many modern wavelet image coders have this feature, as will be covered in more detail in Section 18.5. This is ideally suited, for example, to Internet image applications. As is well known, the Internet is a heterogeneous mess in terms of the number of users and their computational capabilities and effective bandwidths. Wavelets provide a natural way to satisfy users having disparate bandwidth and computational capabilities: the low-end users can be provided a coarse quality approximation, whereas higher-end users can use their increased bandwidth to get better fidelity. This is also very useful for Web browsing applications, where having a coarse quality image with a short waiting time may be preferable to having a detailed quality image with an unacceptable delay. These are some of the high-level reasons why wavelets represent a superior alternative to traditional Fourier-based methods for compressing natural images: this is why the JPEG2000 standard [1] uses wavelets instead of the Fourier-based DCT.

In this chapter, we will review the salient aspects of the general compression problem and the transform coding paradigm in particular, and highlight the key differences between the class of early subband coders and the recent more advanced class of modern-day wavelet image coders. We pick the celebrated embedded zerotree wavelet (EZW) coder as a representative of this latter class, and we describe its operation by using a simple illustrative example. We conclude with more powerful generalizations of the basic wavelet image coding framework to wavelet packets, which are particularly well suited to handle special classes of images such as fingerprints.


FIGURE 18.4
Multiresolution wavelet image representation naturally facilitates progressive transmission—a desirable feature for the transmission of compressed images over heterogeneous packet networks and wireless channels.


18.2 THE COMPRESSION PROBLEM

Image compression falls under the general umbrella of data compression, which has been studied theoretically in the field of information theory [4], pioneered by Claude Shannon [5] in 1948. Information theory sets the fundamental bounds on compression performance theoretically attainable for certain classes of sources. This is very useful because it provides a theoretical benchmark against which one can compare the performance of more practical but suboptimal coding algorithms.


Historically, the lossless compression problem came first. Here the goal is to compress the source with no loss of information. Shannon showed that given any discrete source with a well-defined statistical characterization (i.e., a probability mass function), there is a fundamental theoretical limit to how well you can compress the source before you start to lose information. This limit is called the entropy of the source. In lay terms, entropy refers to the uncertainty of the source. For example, a source that takes on any of N discrete values a1, a2, ..., aN with equal probability has an entropy given by log2 N bits per source symbol. If the symbols are not equally likely, however, then one can do better because more predictable symbols should be assigned fewer bits. The fundamental limit is the Shannon entropy of the source.
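As a small numerical sketch of these two cases (the specific distributions are illustrative, not from the chapter): an 8-symbol equiprobable source has entropy log2 8 = 3 bits/symbol, while a skewed distribution over four symbols needs fewer bits on average.

```python
import numpy as np

def shannon_entropy(p):
    """Entropy in bits per symbol of a probability mass function p (zero entries ignored)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

print(shannon_entropy([1 / 8] * 8))                # 3.0 bits: N = 8 equally likely symbols
print(shannon_entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits: predictable symbols cost fewer bits
```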

Lossless compression of images has been covered in Chapter 16. For image coding, typical lossless compression ratios are of the order of 2:1 or at most 3:1. For a 512 × 512 8-bit grayscale image, the uncompressed representation is 256 Kbytes. Lossless compression would reduce this to at best ∼80 Kbytes, which may still be excessive for many practical low-bandwidth transmission applications. Furthermore, lossless image compression is for the most part overkill, as our human visual system is highly tolerant to losses in visual information. For compression ratios in the range of 10:1 to 40:1 or more, lossless compression cannot do the job, and one needs to resort to lossy compression methods.

The formulation of the lossy data compression framework was also pioneered by Shannon in his work on rate-distortion (RD) theory [6], in which he formalized the theory of compressing certain limited classes of sources having well-defined statistical properties, e.g., independent, identically distributed (i.i.d.) sources having a Gaussian distribution, subject to a fidelity criterion, i.e., subject to a tolerance on the maximum allowable loss or distortion that can be endured. Typical distortion measures used are mean square error (MSE) or peak signal-to-noise ratio (PSNR)³ between the original and compressed versions. These fundamental compression performance bounds are called the theoretical RD bounds for the source: they dictate the minimum rate R needed to compress the source if the tolerable distortion level is D (or alternatively, what is the minimum distortion D subject to a bit rate of R). These bounds are unfortunately not constructive; i.e., Shannon did not give an actual algorithm for attaining these bounds, and furthermore, they are based on arguments that assume infinite complexity and delay, obviously impractical in real life. However, these bounds are useful in as much as they provide valuable benchmarks for assessing the performance of more practical coding algorithms. The major obstacle of course, as in the lossless case, is that these theoretical bounds are available only for a narrow class of sources, and it is difficult to make the connection to real-world image sources, which are difficult to model accurately with simplistic statistical models.

Shannon's theoretical RD framework has inspired the design of more practical operational RD frameworks, in which the goal is similar but the framework is constrained to be more practical.

³The PSNR is defined as 10 log10(255²/MSE) and is measured in decibels (dB).
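For reference, a short sketch of these two distortion measures for 8-bit images (hypothetical helper functions, not part of the chapter):

```python
import numpy as np

def mse(original, decoded):
    """Mean square error between two images of the same shape."""
    diff = original.astype(float) - decoded.astype(float)
    return np.mean(diff ** 2)

def psnr(original, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB, using the given peak signal value."""
    return 10.0 * np.log10(peak ** 2 / mse(original, decoded))

original = np.random.randint(0, 256, (512, 512))
decoded = np.clip(original + np.random.randint(-5, 6, original.shape), 0, 255)
print(psnr(original, decoded))  # larger dB values mean the decoded image is closer to the original
```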


Within the operational constraints of the chosen coding framework, the goal of operational RD theory is to minimize the rate R subject to a distortion constraint D, or vice versa. The message of Shannon's RD theory is that one can come close to the theoretical compression limit of the source if one considers vectors of source symbols that get infinitely large in dimension in the limit; i.e., it is a good idea not to code the source symbols one at a time, but to consider chunks of them at a time, and the bigger the chunks the better. This thinking has spawned an important field known as vector quantization (VQ) [7], which, as the name indicates, is concerned with the theory and practice of quantizing sources using high-dimensional VQ. There are practical difficulties arising from making these vectors too high-dimensional because of complexity constraints, so practical frameworks involve relatively small dimensional vectors that are therefore further from the theoretical bound.

Due to this difficulty, there has been a much more popular image compression framework that has taken off in practice: this is the transform coding framework [8] that forms the basis of current commercial image and video compression standards like JPEG and MPEG (see Chapters 9 and 10 in [9]). The transform coding paradigm can be construed as a practical special case of VQ that can attain the promised gains of processing source symbols in vectors through the use of efficiently implemented high-dimensional source transforms.

In a typical transform image coding system, the encoder consists of a linear transform operation, followed by quantization of transform coefficients, and lossless compression of the quantized coefficients using an entropy coder. After the encoded bitstream of an input image is transmitted over the channel (assumed to be perfect), the decoder undoes all the functionalities applied in the encoder and tries to reconstruct a decoded image that looks as close as possible to the original input image, based on the transmitted information. A block diagram of this transform image paradigm is shown in Fig. 18.5.

For the sake of simplicity, let us look at a 1D example of how transform coding is done (for 2D images, we treat the rows and columns separately as 1D signals). Suppose we have a two-point signal, x0 = 216, x1 = 217. It takes 16 bits (8 bits for each sample) to store this signal in a computer. In transform coding, we first put x0 and x1 in a column vector X = [x0, x1]^T and apply an orthogonal transform T to obtain Y = [y0, y1]^T = TX, where

T = (1/√2) [ 1   1 ]  ≈  [ .707   .707 ]
           [ 1  −1 ]     [ .707  −.707 ]

so that y0 = .707(x0 + x1) ≈ 306.18 and y1 = .707(x0 − x1) ≈ −.707.

The transform T can be conceptualized as a counter-clockwise rotation of the signal vector X by 45° with respect to the original (x0, x1) coordinate system. Alternatively and more conveniently, one can think of the signal vector as being fixed and instead rotate the (x0, x1) coordinate system by 45° clockwise to the new (y1, y0) coordinate system (see Fig. 18.6). Note that the abscissa for the new coordinate system is now y1.

Orthogonality of the transform simply means that the length of Y is the same as the length of X (which is even more obvious when one freezes the signal vector and rotates the coordinate system as discussed above).


FIGURE 18.5
Block diagram of the transform coding paradigm: the original image passes through a linear transform, quantization, and entropy coding to produce an encoded bitstream (e.g., 0.5 b/p); the decoder applies entropy decoding, inverse quantization, and the inverse transform to produce the decoded image.

FIGURE 18.6
The transform T can be conceptualized as a counter-clockwise rotation of the signal vector X by 45° with respect to the original (x0, x1) coordinate system.

This concept still carries over to the case of high-dimensional transforms. If we decide to use the simplest form of quantization known as uniform scalar quantization, where we round off a real number to the nearest integer multiple of a step size q (say q = 20), then the quantizer index vector Î, which captures what integer multiples of q are nearest to the entries of Y, is given by Î = round(Y/q) = [round(306.18/20), round(−.707/20)]^T = [15, 0]^T.
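The following sketch carries the two-point example through the transform, uniform scalar quantization with q = 20, and reconstruction (illustrative code, not from the chapter):

```python
import numpy as np

x = np.array([216.0, 217.0])                  # the two-point signal (x0, x1)
T = np.array([[1.0, 1.0],
              [1.0, -1.0]]) / np.sqrt(2.0)    # orthogonal 2-point transform (45-degree rotation)

y = T @ x                                     # transform coefficients, approx [306.18, -0.71]

q = 20.0
indices = np.round(y / q)                     # quantizer index vector, here [15, 0]
y_hat = indices * q                           # dequantized coefficients, [300, 0]

x_hat = T.T @ y_hat                           # inverse transform (T is orthogonal, so T^-1 = T^T)
print(indices, x_hat)                         # reconstruction approx [212.1, 212.1]
```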
