CM3106 Chapter 9: Basic Compression Algorithms
Prof David Marshall

This chapter covers: modeling and compression, basics of information theory, entropy examples, Shannon's coding theorem, compression in multimedia data, lossless vs lossy compression, and the basic lossless algorithms (repetition suppression, run-length encoding, pattern substitution, Shannon-Fano coding, Huffman coding and arithmetic coding).
Modeling and Compression

We are interested in modeling multimedia data.

To model means to replace something complex with a simpler (= shorter) analog.

Some models help understand the original phenomenon/data better:

Example: Laws of physics
Huge arrays of astronomical observations (e.g. Tycho Brahe's logbooks) summarised in a few characters (e.g. Kepler, Newton):

    |F| = G M1 M2 / r^2

We will look at models whose purpose is primarily compression of multimedia data.
Recap: The Need for Compression
Raw video, image, and audio files can be very large
Example: One minute of uncompressed audio
Audio Type 44.1 KHz 22.05 KHz 11.025 KHz
16 Bit Stereo: 10.1 MB 5.05 MB 2.52 MB
16 Bit Mono: 5.05 MB 2.52 MB 1.26 MB
8 Bit Mono: 2.52 MB 1.26 MB 630 KB
Example: Uncompressed images
512 x 512 Monochrome 0.25 MB
512 x 512 8-bit colour image 0.25 MB
512 x 512 24-bit colour image 0.75 MB
Recap: The Need for Compression

Example: Videos (involve a stream of audio plus video imagery)

Raw Video — uncompressed image frames, 512x512 True Colour at 25 FPS = 1125 MB/min

HDTV (1920 x 1080) — gigabytes per minute uncompressed, True Colour at 25 FPS = 8.7 GB/min

Relying on higher bandwidths is not a good option — M25 Syndrome: traffic will always increase to fill the current bandwidth limit, whatever this is.

Compression HAS TO BE part of the representation of audio, image, and video formats.
Basics of Information Theory

Suppose we have an information source (random variable) S which emits symbols {s1, s2, ..., sn} with probabilities p1, p2, ..., pn. According to Shannon, the entropy of S is

    H(S) = p1 log2(1/p1) + p2 log2(1/p2) + ... + pn log2(1/pn) = −Σi pi log2(pi)

When a symbol with probability pi is transmitted, it reduces the amount of uncertainty in the receiver by a factor of 1/pi.

log2(1/pi) = −log2(pi) indicates the amount of information conveyed by si, i.e., the number of bits needed to code si (Shannon's coding theorem).
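A minimal Python sketch of this formula (added for illustration, not part of the original slides; the function name entropy is an arbitrary choice):

import math

def entropy(probs):
    """Shannon entropy H(S) = -sum(p_i * log2(p_i)), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))        # fair coin: 1.0 bit
print(entropy([1 / 256] * 256))   # uniform 256-level source: 8.0 bits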
Entropy Example

Example: Entropy of a fair coin

The coin emits symbols s1 = heads and s2 = tails with p1 = p2 = 1/2. Therefore, the entropy of this source is:

H(coin) = −(1/2 × log2(1/2) + 1/2 × log2(1/2)) = −(1/2 × −1 + 1/2 × −1) = −(−1/2 − 1/2) = 1 bit

Example: Grayscale image

In an image with a uniform distribution of gray-level intensity (and all pixels independent), i.e. pi = 1/256 for each of the 256 levels:

The number of bits needed to code each gray level is −log2(1/256) = 8 bits.
The entropy of this image is 8 bits/pixel.
Entropy Example

Example: Breakfast order #1

Alice: "What do you want for breakfast: pancakes or eggs? I am unsure, because you like them equally (p1 = p2 = 1/2)."

Bob: "I want pancakes."

Question: How much information has Bob communicated to Alice?

Answer: He has reduced her uncertainty by a factor of 2, therefore transmitted log2(2) = 1 bit.
Entropy Example

Example: Breakfast order #2

Alice: "What do you want for breakfast: pancakes, eggs, or salad? I am unsure, because you like them equally (p1 = p2 = p3 = 1/3)."

Bob: "Eggs."

Question: What is Bob's entropy, assuming he behaves like a random variable? (Equivalently: how much information has Bob communicated to Alice?)

Answer: He has reduced her uncertainty by a factor of 3, therefore transmitted log2(3) ≈ 1.58 bits.
Entropy Example

Example: Breakfast order #3

Alice: "What do you want for breakfast: pancakes, eggs, or salad? I am unsure, because you like them equally (p1 = p2 = p3 = 1/3)."

This time Bob only rules out one of the three options, without saying which of the remaining two he wants.

Question: How much information has Bob communicated to Alice?

Answer: He has reduced her uncertainty by a factor of 3/2 (leaving 2 out of 3 equal options), therefore transmitted log2(3/2) ≈ 0.58 bits.
Shannon's Experiment (1951)

Estimated entropy for English text: H_English ≈ 0.6 - 1.3 bits/letter. (If all letters and the space character were equally probable, it would be H_0 = log2(27) ≈ 4.755 bits/letter.)
External link: Shannon’s original 1951 paper
External link: Java applet recreating Shannon’s experiment
Shannon's Coding Theorem (Shannon 1948)

Basically: the optimal code length for an event with probability p is

    L(p) = −log2(p)

ones and zeros (or, in general, −logb(p) if instead we use b possible values for codes).
External link: Shannon’s original 1948 paper
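A tiny illustration (added, not from the slides) of how the optimal length varies with probability:

import math

# Optimal (Shannon) code length L(p) = -log2(p); note it need not be an integer.
for p in (0.5, 0.125, 0.999):
    print(p, -math.log2(p), "bits")
# 0.5 -> 1.0, 0.125 -> 3.0, 0.999 -> ~0.00144 (cf. the Huffman failure example later)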
Shannon vs Kolmogorov

What if we have a finite string?

Shannon's entropy is a statistical measure of information. We can "cheat" and regard a string as an infinitely long sequence of i.i.d. random variables; Shannon's theorem then approximately applies.

Kolmogorov complexity: basically, the length of the shortest program that outputs a given string. An algorithmic measure of information.

K(S) is not computable! Practical algorithmic compression is hard.
Compression in Multimedia Data

Compression basically employs redundancy in the data:

Temporal: in 1D data, 1D signals, audio, between video frames, etc.
Spatial: correlation between neighbouring pixels or data items.
Spectral: e.g. correlation between colour or luminance components. This uses the frequency domain to exploit relationships between the frequency of change in data.
Psycho-visual: exploit perceptual properties of the human visual system.
Lossless vs Lossy Compression

Compression can be categorised in two broad ways:

Lossless compression: decompression gives an exact copy of the original data.
Examples: entropy encoding schemes (Shannon-Fano, Huffman coding), arithmetic coding, the LZW algorithm (used in the GIF image file format).

Lossy compression: decompression gives a "close" approximation of the original data, ideally perceptually lossless.
Examples: transform coding (FFT/DCT-based quantisation used in JPEG/MPEG), differential encoding, vector quantisation.
Why Lossy Compression?

Lossy methods are typically applied to high-resolution audio and image compression.

They have to be employed in video compression (apart from special cases).

Basic reason: the compression ratio of lossless methods (e.g. Huffman coding, arithmetic coding, LZW) is not high enough for audio/video.

By cleverly making a small sacrifice in terms of fidelity of the data, we can often achieve very high compression ratios.

Cleverly = sacrifice information that is psycho-physically unimportant.
Lossless Compression Algorithms
Repetitive Sequence Suppression
Run-Length Encoding (RLE)
Simple Repetition Suppression

If a series of n successive identical tokens appears:

Replace the series with a single token and a count of the number of occurrences.

Usually we need a special flag to denote when the repeated token appears (see the sketch below).
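A minimal Python sketch of this idea (added for illustration, not part of the original slides): runs of a chosen repeated token (here '0') are replaced by a flag character followed by the run length. The flag 'f' and the threshold of 3 are arbitrary choices for this sketch.

def suppress_runs(text, token="0", flag="f", min_run=3):
    """Replace runs of `token` (length >= min_run) with flag + run length."""
    # Note: a real scheme must use a flag value that cannot appear in the data.
    out, i = [], 0
    while i < len(text):
        if text[i] == token:
            j = i
            while j < len(text) and text[j] == token:
                j += 1
            run = j - i
            out.append(f"{flag}{run}" if run >= min_run else token * run)
            i = j
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

print(suppress_runs("894" + "0" * 14 + "12"))   # -> "894f1412"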
Simple Repetition Suppression

Fairly straightforward to understand and implement.

Simplicity is its downfall: poor compression ratios.

Compression savings depend on the content of the data.

Applications of this simple compression technique include:

Suppression of zeros in a file (zero length suppression)
Silence in audio data, pauses in conversation, etc.
Sparse matrices
Component of JPEG
Bitmaps, e.g. backgrounds in simple images
Blanks in text or program source files
Other regular image or data tokens
Run-length Encoding (RLE)

This encoding method is frequently applied to graphics-type images (or pixels in a scan line) — a simple compression algorithm in its own right.

It is also a component used in the JPEG compression pipeline.

Basic RLE approach (e.g. for images):
Sequences of image elements X1, X2, ..., Xn (row by row) are mapped to pairs (c1, L1), (c2, L2), ..., (ck, Lk), where ci is the repeated value and Li the length of its run.
Run-length Encoding Example

Original sequence:

111122233333311112222

can be encoded as:

(1,4),(2,3),(3,6),(1,4),(2,4)

How much compression?

The savings are dependent on the data. In the worst case (random noise) the encoding is heavier than the original file: 2 x integer rather than 1 x integer if the original data is an integer vector/array.

MATLAB example code: rle.m (run-length encode), rld.m (run-length decode)
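A minimal Python sketch of run-length encoding and decoding (added for illustration; this is not the rle.m/rld.m code referenced above):

from itertools import groupby

def rle_encode(seq):
    """Map a sequence to (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(seq)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back to the original sequence."""
    return [value for value, count in pairs for _ in range(count)]

data = [int(c) for c in "111122233333311112222"]
encoded = rle_encode(data)
print(encoded)                        # [(1, 4), (2, 3), (3, 6), (1, 4), (2, 4)]
assert rle_decode(encoded) == data    # lossless round trip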
Pattern Substitution

This is a simple form of statistical encoding.

Here we substitute a frequently repeating pattern (or patterns) with a code.

The code is shorter than the pattern, giving us compression.

The simplest scheme could employ predefined codes:

Example: Basic Pattern Substitution
Replace all occurrences of the pattern of characters 'and' with the predefined code '&'. So:

and you and I

becomes:

& you & I
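A minimal sketch of predefined-code pattern substitution (added for illustration; the substitution table here is just an example):

# Predefined substitution table: frequent pattern -> shorter code.
# Caveat: this naive sketch assumes the code character never occurs in the original text.
CODES = {"and": "&"}

def substitute(text, codes=CODES):
    for pattern, code in codes.items():
        text = text.replace(pattern, code)
    return text

def desubstitute(text, codes=CODES):
    for pattern, code in codes.items():
        text = text.replace(code, pattern)
    return text

print(substitute("and you and I"))    # "& you & I"
print(desubstitute("& you & I"))      # "and you and I"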
Reducing Number of Bits per Symbol

For the sake of example, consider character sequences here. (Other token streams can be used — e.g. vectorised image blocks, binary streams.)

Example: Compressing the ASCII characters EIEIO

E (69)     I (73)     E (69)     I (73)     O (79)
01000101   01001001   01000101   01001001   01001111   = 5 x 8 = 40 bits

To compress, we aim to find a way to describe the same information using fewer bits per symbol, e.g.:

E (2 bits)   I (2 bits)   E (2 bits)   I (2 bits)   O (3 bits)
   xx           yy           xx           yy           zzz       = 2 + 2 + 2 + 2 + 3 = 11 bits
Code Assignment

A predefined codebook may be used, i.e. assign code ci to symbol si. (E.g. some dictionary of common words/tokens.)

Better: dynamically determine the best codes from the data.

The entropy encoding schemes (next topic) basically attempt to decide the optimum assignment of codes to achieve the best compression.

Example (see the sketch below):

Count occurrences of tokens (to estimate probabilities).
Assign shorter codes to more probable symbols and vice versa.

Ideally we should aim to achieve Shannon's limit: −logb(p)!
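A minimal sketch of the counting step and of Shannon's limit (added for illustration): estimate symbol probabilities from counts and compute the ideal code length −log2(p) for each symbol.

import math
from collections import Counter

def ideal_code_lengths(tokens):
    """Estimate p(s) from counts and return Shannon's ideal length -log2(p) per symbol."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {s: -math.log2(c / total) for s, c in counts.items()}

for symbol, bits in sorted(ideal_code_lengths("EIEIO").items()):
    print(symbol, round(bits, 3))
# E 1.322, I 1.322, O 2.322  (more probable symbols need fewer bits)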
Morse code

Morse code makes an attempt to approach optimal code length: observe that frequent characters (E, T, ...) are encoded with few dots/dashes and vice versa.
The Shannon-Fano Algorithm — Learn by Example

This is a basic information theoretic algorithm.

A simple example will be used to illustrate the algorithm: the 39-character string ACABADADEAABBAAAEDCACDEAAABCDBBEDCBACAE (also used in the Huffman example below), with symbol counts:

Symbol   A    B    C    D    E
Count    15   7    6    6    5
The Shannon-Fano Algorithm — Learn by Example
Encoding for the Shannon-Fano Algorithm

A top-down approach:

1. Sort symbols according to their frequencies/probabilities, e.g. ABCDE.

2. Recursively divide into two parts, each with approximately the same number of counts, i.e. split so as to minimise the difference in counts. The left group gets 0, the right group gets 1.
The Shannon-Fano Algorithm — Learn by Example

3. Assemble the codebook by a depth-first traversal of the tree:

Symbol   Count   Code   Bits used
A        15      00     30
B        7       01     14
C        6       10     12
D        6       110    18
E        5       111    15

Raw token stream: 8 bits per symbol x 39 chars = 312 bits
Coded data stream = 89 bits
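A minimal recursive Python sketch of the Shannon-Fano split described above (added for illustration, not the lecture's own code):

from collections import Counter

def shannon_fano(counts):
    """Return {symbol: code} by recursively splitting the sorted symbols in two."""
    symbols = sorted(counts, key=counts.get, reverse=True)
    codes = {s: "" for s in symbols}

    def split(group):
        if len(group) < 2:
            return
        total = sum(counts[s] for s in group)
        # Choose the cut point that minimises the difference between the two halves.
        best_cut, best_diff, running = 1, float("inf"), 0
        for i in range(1, len(group)):
            running += counts[group[i - 1]]
            diff = abs(running - (total - running))
            if diff < best_diff:
                best_cut, best_diff = i, diff
        left, right = group[:best_cut], group[best_cut:]
        for s in left:
            codes[s] += "0"     # left group gets 0
        for s in right:
            codes[s] += "1"     # right group gets 1
        split(left)
        split(right)

    split(symbols)
    return codes

data = "ACABADADEAABBAAAEDCACDEAAABCDBBEDCBACAE"
counts = Counter(data)
codes = shannon_fano(counts)
print(codes)
print(sum(counts[s] * len(codes[s]) for s in counts), "bits")   # 89 bits for this string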
Shannon-Fano Algorithm: Entropy

For the above example:

H = (15/39) log2(39/15) + (7/39) log2(39/7) + 2 x (6/39) log2(39/6) + (5/39) log2(39/5) ≈ 2.19 bits/symbol

The Shannon-Fano code above uses 89/39 ≈ 2.28 bits/symbol, close to this lower bound.
Best way to understand: consider best case example
If we couldalways subdivide exactly in half, we would get
ideal code:
uncertainty by a factor 2, so transmit 1 bit
Otherwise, when counts are only approximately equal, weget only good, but not ideal code
Compare with a fair vs biased coin
Huffman Algorithm

Can we do better than Shannon-Fano? Huffman! It always produces the best binary tree for the given probabilities.

A bottom-up approach:

1. Initialisation: put all nodes in a list L, kept sorted at all times (e.g., ABCDE).

2. Repeat until the list L has only one node left:
   From L pick the two nodes having the lowest frequencies/probabilities and create a parent node for them.
   Assign the sum of the children's frequencies/probabilities to the parent node and insert it into L.
   Assign code 0/1 to the two branches of the tree, and delete the children from L.

3. The code for each symbol is read top-down as the sequence of branch labels.
Huffman Encoding Example

ACABADADEAABBAAAEDCACDEAAABCDBBEDCBACAE (the same string as in the Shannon-Fano example)
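A minimal Huffman sketch in Python using a priority queue (added for illustration; it follows the bottom-up procedure above, though tie-breaking details may differ from the tree shown in the lecture):

import heapq
from collections import Counter

def huffman_codes(counts):
    """Build a Huffman code {symbol: bitstring} from symbol counts."""
    # Each heap entry: (total count, tie-breaker, {symbol: code-so-far}).
    heap = [(c, i, {s: ""}) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)    # two lowest-frequency nodes
        c2, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]

data = "ACABADADEAABBAAAEDCACDEAAABCDBBEDCBACAE"
counts = Counter(data)
codes = huffman_codes(counts)
encoded = "".join(codes[s] for s in data)
print(codes)
print(len(encoded), "bits vs", 8 * len(data), "bits raw ASCII")   # 87 bits vs 312 bits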
Huffman Encoder Discussion

The following points are worth noting about the above algorithm:

Decoding for the above two algorithms is trivial as long as the coding table/book is sent before the data.
There is a bit of an overhead for sending this, but it is negligible if the data file is big.

Unique prefix property: no code is a prefix of any other code (all symbols are at the leaf nodes) → great for the decoder: decoding is unambiguous.

If prior statistics are available and accurate, then Huffman coding is very good.
Huffman Coding of Images

In order to encode images:

Divide the image up into (typically) 8x8 blocks.
Each block is a symbol to be coded.
Compute Huffman codes for the set of blocks.
Encode blocks accordingly.

In JPEG: blocks are DCT coded first, before Huffman coding may be applied (more soon).

Coding the image in blocks is common to all image coding methods.

MATLAB example code: huffman.m (used with the JPEG code later), huffman.zip (alternative with tree plotting)
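A minimal sketch of the block-as-symbol idea (added for illustration; it assumes the image is a list of rows whose dimensions are multiples of 8). The resulting block frequencies would then be fed to a Huffman coder such as the sketch earlier.

from collections import Counter

def blocks_8x8(image):
    """Yield each 8x8 block of a 2D image (list of rows) as a hashable tuple-of-tuples."""
    height, width = len(image), len(image[0])
    for r in range(0, height, 8):
        for c in range(0, width, 8):
            yield tuple(tuple(image[r + dr][c + dc] for dc in range(8)) for dr in range(8))

# Toy 16x16 image containing only two distinct 8x8 blocks (all-0 and all-1).
image = [[0] * 8 + [1] * 8 for _ in range(8)] + [[0] * 16 for _ in range(8)]
block_counts = Counter(blocks_8x8(image))
print(block_counts.values())   # counts 3 and 1: frequent blocks would get shorter Huffman codes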
Arithmetic Coding

What is wrong with Huffman?

Huffman coding etc. use an integer number (k) of 1/0s for each symbol, hence k is never less than 1.

The ideal code according to Shannon may not be an integer number of 1/0s!

Example: Huffman failure case

Consider a biased coin with p_heads = q = 0.999 and p_tails = 1 − q.
Suppose we use Huffman to generate codes for heads and tails and send 1000 heads.
This would require 1000 ones and zeros with Huffman!
Shannon tells us: ideally this should be −log2(p_heads) ≈ 0.00144 ones and zeros per head, so ≈ 1.44 for the entire string.
Arithmetic Coding

Solution: arithmetic coding.

A widely used entropy coder.
Also used in JPEG — more soon.
The only problem is its speed, due to the possibly complex computations caused by large symbol tables.
Good compression ratio (better than Huffman coding); entropy is around the Shannon ideal value.

Here we describe the basic approach of arithmetic coding.
Arithmetic Coding: Basic Idea

The idea behind arithmetic coding is to encode the entire message into a single number, n, with 0.0 ≤ n < 1.0.

Consider a probability line segment, [0, 1), and assign to every symbol a range in this interval:

Range proportional to the symbol's probability, positioned at its cumulative probability.

Once we have defined the ranges and the probability line:

Start to encode symbols.
Every symbol defines where the output real number lands within the range.
Simple Arithmetic Coding Example

Assume we have the following string: BACA

Therefore:

A occurs with probability 0.5
B and C occur with probabilities 0.25 each

Start by assigning each symbol its range on the probability line:

A: [0.0, 0.5)    B: [0.5, 0.75)    C: [0.75, 1.0)

The first symbol is B, so we now know that the code will be in the range 0.5 to 0.74999..., i.e. in [0.5, 0.75).
Simple Arithmetic Coding Example

The range is not yet unique.

We need to narrow down the range to give us a unique code. Basic arithmetic coding iteration: subdivide the range for the first symbol given the probabilities of the second symbol, then the third symbol, etc.

For all the symbols:

range = high - low
high  = low + range * high_range of the symbol being coded
low   = low + range * low_range of the symbol being coded

Where:

range keeps track of where the next range should be,
high and low specify the output number,
initially high = 1.0 and low = 0.0.
Simple Arithmetic Coding Example

For the second symbol (range = 0.25, low = 0.5, high = 0.75), subdividing [0.5, 0.75) gives:

BA [0.5, 0.625)
BB [0.625, 0.6875)
BC [0.6875, 0.75)

Subdividing BA's interval [0.5, 0.625) for the third symbol (range = 0.125) gives:

BAA [0.5, 0.5625)
BAB [0.5625, 0.59375)
BAC [0.59375, 0.625)
Simple Arithmetic Coding Example

Subdivide again for the fourth symbol (range = 0.03125, low = 0.59375, high = 0.625):

BACA [0.59375, 0.609375)
BACB [0.609375, 0.6171875)
BACC [0.6171875, 0.625)

So the (unique) output code for BACA is any number in the range:

[0.59375, 0.609375)
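A minimal Python sketch of the encoding iteration above (added for illustration), using the ranges A [0.0, 0.5), B [0.5, 0.75), C [0.75, 1.0):

# Symbol ranges on the probability line: (low_range, high_range)
RANGES = {"A": (0.0, 0.5), "B": (0.5, 0.75), "C": (0.75, 1.0)}

def arithmetic_encode(message, ranges=RANGES):
    """Return the (low, high) interval for the whole message; any number in it is a valid code."""
    low, high = 0.0, 1.0
    for symbol in message:
        span = high - low
        low_range, high_range = ranges[symbol]
        high = low + span * high_range   # uses the old low
        low = low + span * low_range
    return low, high

print(arithmetic_encode("BACA"))   # (0.59375, 0.609375)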
To decode is essentially the opposite:

We compile the same table of symbol ranges, given the probabilities.
Find the range within which the code number lies, output that symbol, rescale the code number into that range, and carry on (see the sketch below).
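A minimal decoding sketch matching the encoder above (added for illustration; it assumes the decoder knows the symbol ranges and the message length):

RANGES = {"A": (0.0, 0.5), "B": (0.5, 0.75), "C": (0.75, 1.0)}

def arithmetic_decode(code, length, ranges=RANGES):
    """Recover `length` symbols from a code number in [0, 1)."""
    message = []
    for _ in range(length):
        for symbol, (low_range, high_range) in ranges.items():
            if low_range <= code < high_range:
                message.append(symbol)
                # Rescale the code into the chosen symbol's range and repeat.
                code = (code - low_range) / (high_range - low_range)
                break
    return "".join(message)

print(arithmetic_decode(0.59375, 4))   # "BACA"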