Compression Codes
Introduction to Data Compression
Data compression seeks to reduce the number of bits used to store or transmit information
Lecture 1
Source Coding and Statistical Modeling
Set of symbols (alphabet): S = {s1, s2, …, sN},
where N is the number of symbols in the alphabet.
Probability distribution of the symbols: P = {p1, p2, …, pN}
According to Shannon, the entropy H of an information source S is defined as follows:

H(S) = - Σ_{i=1..N} pi · log2(pi)
The amount of information in symbol si,
in other words, the number of bits to code (the code length) for the symbol si:

I(si) = -log2(pi) = log2(1/pi)

The average number of bits for the source S:

H(S) = Σ_{i=1..N} pi · I(si) = - Σ_{i=1..N} pi · log2(pi)
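A minimal sketch of these two formulas in Python (the four-symbol distribution is hypothetical, not an example from the slides):

```python
import math

def self_information(p):
    """Number of bits to code a symbol of probability p: -log2(p)."""
    return -math.log2(p)

def entropy(probs):
    """Average number of bits per symbol: sum of pi * I(si)."""
    return sum(p * self_information(p) for p in probs if p > 0)

# Hypothetical source with four symbols
print(entropy([0.5, 0.25, 0.125, 0.125]))   # 1.75 bits/symbol
```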
Trang 5Entropy for binary source: N=2
)) 1
( log )
1 ( log
Entropy for uniform distribution: pi = 1/N

Uniform distribution of probabilities, pi = 1/N:

H = - Σ_{i=1..N} (1/N) · log2(1/N) = log2(N)
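A quick numerical check of both special cases (the alphabet size N = 8 is arbitrary):

```python
import math

N = 8
H_uniform = -sum((1 / N) * math.log2(1 / N) for _ in range(N))
print(H_uniform, math.log2(N))      # both 3.0 bits: uniform entropy equals log2(N)

p = 0.5                              # equiprobable binary source
H_binary = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
print(H_binary)                      # 1.0 bit, the maximum for N = 2
```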
How to get the probability distribution?

1) Static modeling: a fixed, predefined distribution is used for all data; no analysis pass and no side information, but the model may not fit the actual data.
2) Semi-adaptive modeling: a two-pass method; the first pass gathers the statistics, the second pass encodes; the model is sent to the decoder as side information.
3) Adaptive (dynamic) modeling:
a) One-pass method: analysis and encoding
b) Updating the model during encoding/decoding
c) No side information
Static vs Dynamic: Example
S = {a,b,c}; Data: a,a,b,a,a,c,a,a,b,a.
3) Adaptive method: Example
S = {a,b,c}; Data: a,a,b,a,a,c,a,a,b,a.
Average code length (bits/symbol): Semi-adaptive 1.16 < Adaptive 1.45 < Static 1.58
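A sketch that reproduces these three averages for the example data, assuming the static model uses fixed uniform probabilities (1/3 per symbol) and the adaptive model starts every count at 1 and updates it after each symbol (these modeling details are assumptions, not stated on the extracted slides):

```python
import math
from collections import Counter

data = list("aabaacaaba")                 # the example sequence a,a,b,a,a,c,a,a,b,a
alphabet = ["a", "b", "c"]

# Static model: fixed uniform probabilities, no statistics gathered from the data
static = sum(-math.log2(1 / len(alphabet)) for _ in data) / len(data)

# Semi-adaptive model: exact probabilities from a first pass over the data
counts = Counter(data)
semi = sum(-math.log2(counts[s] / len(data)) for s in data) / len(data)

# Adaptive model: counts start at 1 and are updated after coding each symbol
adaptive_counts = {s: 1 for s in alphabet}
total, bits = len(alphabet), 0.0
for s in data:
    bits += -math.log2(adaptive_counts[s] / total)
    adaptive_counts[s] += 1
    total += 1
adaptive = bits / len(data)

print(f"semi-adaptive {semi:.2f} < adaptive {adaptive:.2f} < static {static:.2f}")
# -> semi-adaptive 1.16 < adaptive 1.45 < static 1.58
```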
Shannon-Fano Code: A top-down approach

1) Sort the symbols according to their probabilities:
p1 ≤ p2 ≤ … ≤ pN
2) Recursively divide into parts, each with approximately the same number of counts (probability); a sketch of this procedure follows below.
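A minimal sketch of the recursive split in Python. The split rule (pick the cut that makes the two halves' totals as equal as possible) and the descending sort are implementation choices of this sketch; the counts 6, 6, 5 for C, D, E appear on the example slides, while 15 and 7 for A and B are assumed for illustration:

```python
def shannon_fano(counts):
    """Build Shannon-Fano codes top-down from a dict {symbol: count}."""
    codes = {s: "" for s in counts}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(counts[s] for s in group)
        # choose the cut that makes the two halves as equal as possible
        best_i, best_diff, running = 1, None, 0
        for i in range(1, len(group)):
            running += counts[group[i - 1]]
            diff = abs(2 * running - total)
            if best_diff is None or diff < best_diff:
                best_i, best_diff = i, diff
        left, right = group[:best_i], group[best_i:]
        for s in left:
            codes[s] += "0"
        for s in right:
            codes[s] += "1"
        split(left)
        split(right)

    split(sorted(counts, key=counts.get, reverse=True))
    return codes

# Counts 6, 6, 5 for C, D, E are from the example; 15 and 7 for A, B are assumed.
print(shannon_fano({"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}))
# -> {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}
```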
Shannon-Fano Code: Example (step 1)

First split: {A, B} vs {C, D, E} (count 6 + 6 + 5 = 17)
Shannon-Fano Code: Example (step 2)
Shannon-Fano Code: Example (step 3)
Shannon-Fano Code: Example (Result)
Table: Symbol, pi, -log2(pi), Code, Subtotal
Resulting codes: A = 00, B = 01, C = 10, D = 110, E = 111
Binary code tree
Shannon-Fano Code: Encoding

Binary code tree
Message: B A B A C A C A D E
Codes: 01 00 01 00 10 00 10 00 110 111
Bitstream: 0100010010001000110111
Shannon-Fano Code: Decoding

Binary code tree
Bitstream: 0100010010001000110111 (22 bits)
Codes: 01 00 01 00 10 00 10 00 110 111
Message: B A B A C A C A D E
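A short sketch of encoding and decoding with this code table; the decoder exploits the prefix property by emitting a symbol as soon as the buffered bits match a codeword:

```python
codes = {"A": "00", "B": "01", "C": "10", "D": "110", "E": "111"}

def encode(message, codes):
    return "".join(codes[s] for s in message)

def decode(bitstream, codes):
    reverse = {c: s for s, c in codes.items()}
    out, buf = [], ""
    for bit in bitstream:
        buf += bit
        if buf in reverse:            # prefix property: first match is the symbol
            out.append(reverse[buf])
            buf = ""
    return "".join(out)

bits = encode("BABACACADE", codes)
print(bits, len(bits))                # 0100010010001000110111 22
print(decode(bits, codes))            # BABACACADE
```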
Huffman Code: A bottom-up approach

Initialization: put all symbols on a list OPEN, sorted by probability.
Repeat until OPEN holds only one node (the root):
a) From OPEN, pick the two nodes with the lowest probabilities and create a parent node for them
b) Assign the sum of the children's probabilities to the parent node and insert it into OPEN
c) Assign code 0 and 1 to the two branches of the tree, and delete the children from OPEN
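A sketch of this procedure using a binary heap as the OPEN list; the heap, the tie-breaking counter, and the A..E counts below are implementation assumptions, and with ties broken differently the codewords change while the code lengths stay optimal:

```python
import heapq
import itertools

def huffman(counts):
    """Build Huffman codes bottom-up; returns {symbol: codeword}."""
    tiebreak = itertools.count()            # avoids comparing nodes of different types
    # OPEN list: entries (count, tiebreak, node); a node is a symbol or a pair of nodes
    heap = [(c, next(tiebreak), s) for s, c in counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        c1, _, n1 = heapq.heappop(heap)     # a) the two nodes with the lowest counts
        c2, _, n2 = heapq.heappop(heap)
        heapq.heappush(heap, (c1 + c2, next(tiebreak), (n1, n2)))   # b) parent into OPEN
    codes = {}
    def assign(node, code):                 # c) 0 and 1 on the two branches
        if isinstance(node, tuple):
            assign(node[0], code + "0")
            assign(node[1], code + "1")
        else:
            codes[node] = code or "0"
    assign(heap[0][2], "")
    return codes

print(huffman({"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}))
# one optimal result: A gets a 1-bit code, B..E get 3-bit codes (22 bits for the message)
```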
Huffman Code: Example

Table: Symbol, pi, -log2(pi), Code, Subtotal
Resulting codes: A = 0, B = 100, C = 101, D = 110, E = 111
Binary code tree
Huffman Code: Decoding

Binary code tree
Bitstream: 1000100010101010110111 (22 bits)
Codes: 100 0 100 0 101 0 101 0 110 111
Message: B A B A C A C A D E
Properties of Huffman code
• Optimum code for a given data set requires two passes
• Code construction complexity is O(N log N)
• Fast lookup table based implementation
• Requires at least one bit per symbol
• Average codeword length is within one bit of zero-order entropy (Tighter bounds are known): H ≤ R < H+1 bit
• Susceptible to bit errors
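A small check of the bound H ≤ R < H + 1 for a hypothetical distribution; it uses the fact that in Huffman's algorithm the sum of all merged (parent) weights equals the average codeword length R:

```python
import heapq
import math

def huffman_avg_length(probs):
    """Average codeword length R: the sum of the parent weights created by the merges."""
    heap = list(probs)
    heapq.heapify(heap)
    R = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        R += a + b
        heapq.heappush(heap, a + b)
    return R

probs = [0.4, 0.2, 0.2, 0.1, 0.1]          # hypothetical distribution
R = huffman_avg_length(probs)
H = -sum(p * math.log2(p) for p in probs)
print(f"H = {H:.3f}, R = {R:.3f}")          # H = 2.122, R = 2.200
assert H <= R < H + 1
```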
Unique prefix property

No code is a prefix of any other code; all symbols are at the leaf nodes of the code tree.

Shannon-Fano and Huffman codes are prefix codes.

Legend: Shannon (1948) and Fano (1949); Huffman (1952) was a student of Fano at MIT.

Counterexample: a code in which the codeword for D is a prefix of another codeword is NOT a prefix code.
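A small helper to test the property; after sorting the codewords, any prefix relation must occur between lexicographic neighbours, so checking adjacent pairs is enough (the two example code sets are illustrative):

```python
def is_prefix_code(codes):
    """True if no codeword is a prefix of another codeword."""
    words = sorted(codes.values())
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

print(is_prefix_code({"A": "0", "B": "100", "C": "101", "D": "110", "E": "111"}))  # True
print(is_prefix_code({"A": "0", "B": "01", "C": "11"}))  # False: "0" is a prefix of "01"
```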
Context modeling: Zero-order model

Zero-order model:
a) The symbols are statistically independent;
b) The probability of the symbol si is pi

H = - Σ_{i=1..N} pi · log2(pi)

If all symbols are equiprobable, pi = 1/N: H = log2(N)
Context modeling: First-order model

The main idea: use correlation between symbols or pixels!

p(cj) is the probability of the context cj, k is the number of contexts;
p(si|cj) is the conditional probability that the current element is the symbol si, given that the context is cj.

H = - Σ_{j=1..k} p(cj) · Σ_{i=1..N} p(si|cj) · log2(p(si|cj))
Entropy: H = - Σ_{j=1..k} p(cj) · Σ_{i=1..N} p(si|cj) · log2(p(si|cj))

First-order model: Pixel above (1)

First-order model: Pixel above (2)
Zero-order model: H = 0.544 bits
First-order model (pixel above): H = 0.518 bits
First-order model (pixel left): H = 0.113 bits
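A sketch of how such numbers can be computed for a small binary image; the 4×8 image, the border handling (border pixels get a separate None context), and the estimation from raw counts are assumptions for illustration, not the slides' image:

```python
import math
from collections import Counter

def zero_order_entropy(pixels):
    """H = -sum(pi * log2(pi)) over the pixel values, assuming independence."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def first_order_entropy(rows, context):
    """H = -sum over (cj, si) of p(cj, si) * log2(p(si | cj))."""
    pair_counts = Counter()                  # (context, symbol) occurrences
    ctx_counts = Counter()                   # context occurrences
    for r, row in enumerate(rows):
        for c, s in enumerate(row):
            cx = context(rows, r, c)
            pair_counts[(cx, s)] += 1
            ctx_counts[cx] += 1
    n = sum(ctx_counts.values())
    return -sum((cnt / n) * math.log2(cnt / ctx_counts[cx])
                for (cx, s), cnt in pair_counts.items())

# Hypothetical 4x8 binary image
img = [[0, 0, 0, 0, 1, 1, 1, 1],
       [0, 0, 0, 0, 1, 1, 1, 1],
       [1, 1, 0, 0, 0, 0, 1, 1],
       [1, 1, 0, 0, 0, 0, 1, 1]]

above = lambda rows, r, c: rows[r - 1][c] if r > 0 else None   # pixel above as context
left  = lambda rows, r, c: rows[r][c - 1] if c > 0 else None   # pixel to the left

print(zero_order_entropy([p for row in img for p in row]))
print(first_order_entropy(img, above))
print(first_order_entropy(img, left))
```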