CSPIHT BASED SCALABLE VIDEO CODEC FOR
LAYERED VIDEO STREAMING
FENG WEI
(B.Eng. (Hons.), Xi’an Jiaotong University)
A THESIS SUBMITTED FOR THE DEGREE OF MASTER OF ENGINEERING
DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING
NATIONAL UNIVERSITY OF SINGAPORE
2003
ACKNOWLEDGEMENT

Last but not least, I wish to thank my boyfriend Huang Qijie for his support all the way along. Almost all of my progress was made when he was by my side.
TABLE OF CONTENTS

ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
SUMMARY
CHAPTER 1 INTRODUCTION
CHAPTER 2 IMAGE AND VIDEO CODING
  2.1 Transform Coding
    2.1.1 Linear Transforms
    2.1.2 Quantization
    2.1.3 Arithmetic Coding
    2.1.4 Binary Coding
  2.2 Video Compression Using MEMC
  2.3 Wavelet Based Image and Video Coding
    2.3.1 Discrete Wavelet Transform
    2.3.2 EZW Coding Scheme
    2.3.3 SPIHT Coding Scheme
    2.3.4 Scalability
  2.4 Image and Video Coding Standards
CHAPTER 3 VIDEO STREAMING AND NETWORK QoS
  3.1 Video Streaming Models
  3.2 Characteristics and Challenges of Video Streaming
  3.3 Quality of Service
    3.3.1 Definition of QoS
    3.3.2 IntServ Framework
    3.3.3 DiffServ Framework
  3.4 Layered Video Streaming
CHAPTER 4 LAYERED 3D-CSPIHT CODEC
  4.1 CSPIHT and 3D-CSPIHT Video Coders
  4.2 Limitations of the Original 3D-CSPIHT Codec
  4.3 Layered 3D-CSPIHT Video Codec
    4.3.1 Overview of New Features
    4.3.2 Layer IDs
    4.3.4 How the Codec Functions in the Network
    4.3.5 Layered 3D-CSPIHT Algorithm
CHAPTER 5 PERFORMANCE DATA
  5.1 Coding Performance Measurements
  5.2 PSNR Performance of the Layered 3D-CSPIHT Codec
  5.3 Coding Time and Compression Ratio
CHAPTER 6 CONCLUSIONS
REFERENCES
SUMMARY
A layered scalable codec based on the 3-D Color Set Partitioning in Hierarchical Trees (3D-CSPIHT) coder is presented in this thesis. The layered 3D-CSPIHT codec introduces layering of encoded bit streams to support layered scalable video streaming. It restricts the significance criteria of the original 3D-CSPIHT coder so as to generate separate bit streams comprised of cumulative layers, where layers are defined according to resolution subbands. The layered 3D-CSPIHT codec incorporates a new sorting algorithm to produce multi-resolution scalable bit streams, and a specially designed layer ID to identify the layer to which a particular data packet belongs. In this way, decoding of lossy data is achieved.

The layered 3D-CSPIHT codec is tested using both high motion and low motion standard QCIF video sequences at 10 frames per second. It is compared against the original 3D-CSPIHT and the 2D-CSPIHT video coders in terms of PSNR, encoding time and compression ratio. In the luminance plane, the original 3D-CSPIHT and the 2D-CSPIHT give better PSNR than the layered 3D-CSPIHT, while in the chrominance planes they give similar PSNR results. The layered 3D-CSPIHT also costs more in computational time and produces less compressed bit streams, because of the overhead incurred by the layer IDs. However, encoded video data is very likely to encounter loss in real network transmission, and when decoding lossy data the layered 3D-CSPIHT codec outperforms the original 3D-CSPIHT significantly.
LIST OF TABLES

Table 2.1 Image and video compression standards
Table 4.1 Resolution options
Table 4.2 LIP, LIS, LSP state after sorting at bit plane 2 (original CSPIHT)
Table 4.3 LIP, LIS, LSP state after sorting at bit plane 1 (original CSPIHT)
Table 4.4 LIP, LIS, LSP state after sorting at bit plane 0 (original CSPIHT)
Table 4.5 LIP, LIS, LSP state after sorting at bit plane 2 (layered CSPIHT, layer 1 effective)
Table 4.6 LIP, LIS, LSP state after sorting at bit plane 1 (layered CSPIHT, layer 1 effective)
Table 4.7 LIP, LIS, LSP state after sorting at bit plane 0 (layered CSPIHT, layer 1 effective)
Table 4.8 LIP, LIS, LSP state after sorting at bit plane 2 (layered CSPIHT, layer 2 effective)
Table 4.9 LIP, LIS, LSP state after sorting at bit plane 1 (layered CSPIHT, layer 2 effective)
Table 4.10 LIP, LIS, LSP state after sorting at bit plane 0 (layered CSPIHT, layer 2 effective)
Table 5.1 Average PSNR (dB) at 3 different resolutions
Table 5.2 Encoding time (in seconds) of the original and layered codecs
LIST OF FIGURES

Fig 1.1 A typical video streaming system
Fig 2.1 Encoding model
Fig 2.2 Decoding model
Fig 2.3 Binary coding model
Fig 2.4 Block matching motion estimation
Fig 2.5 1-D DWT decomposition
Fig 2.6 Dyadic DWT decomposition of an image
Fig 2.7 Subbands after 3-level dyadic wavelet decomposition
Fig 2.8 2-level DWT decomposed Barbara image
Fig 2.9 Spatial Orientation Tree for EZW
Fig 2.10 Spatial Orientation Tree of SPIHT
Fig 2.11 SPIHT coding algorithm
Fig 3.1 Unicast video streaming
Fig 3.2 Multicast video streaming
Fig 3.3 IntServ architecture
Fig 3.4 Leaky bucket regulator
Fig 3.5 An example of the DiffServ network
Fig 3.6 DiffServ inter-domain operations
Fig 3.7 Principle of a layered codec
Fig 4.1 CSPIHT SOT (2-D)
Fig 4.2 CSPIHT video encoder
Fig 4.3 CSPIHT video decoder
Fig 4.4 3D-CSPIHT STOT
Fig 4.6 3D-CSPIHT video decoder
Fig 4.7 Confusion when decoding lossy data using the original 3D-CSPIHT decoder
Fig 4.8 Network scenario considered for design of the layered codec
Fig 4.9 The bit stream after the layer ID is added
Fig 4.10 Resolution layers in layered 3D-CSPIHT
Fig 4.11 Progressively transmitted and decoded layers
Fig 4.12 (a) An example video frame after DWT transform
Fig 4.12 (b) SOT for Fig 4.12 (a)
Fig 4.13 Bit stream structure of the layered 3D-CSPIHT coder
Fig 4.14 Flowchart of the layered decoder algorithm
Fig 4.15 Layered 3D-CSPIHT algorithm
Fig 5.1 Frame by frame PSNR results on (a) foreman and (b) container sequences at 3 different resolutions
Fig 5.2 Rate distortion curve of the layered 3D-CSPIHT codec
Fig 5.3 PSNR (dB) comparison of the original and the layered codec in (a) luminance plane, (b) Cb plane and (c) Cr plane for the foreman sequence
Fig 5.4 Frame 1 of foreman reconstructed at (a) resolution 1, (b) resolution 2, (c) resolution 3 and (d) original
Fig 5.5 Frame 58 of foreman reconstructed at (a) resolution 1, (b) resolution 2, (c) resolution 3 and (d) original
Fig 5.6 Frame 120 of foreman reconstructed at (a) resolution 1, (b) resolution 2, (c) resolution 3 and (d) original
Fig 5.7 Frame 190 of foreman reconstructed at (a) resolution 1, (b) resolution 2, (c) resolution 3 and (d) original
Fig 5.8 Comparison on carphone sequence
Fig 5.9 Comparison on akiyo sequence
Fig 5.10 Manually formed incomplete bit streams
Fig 5.11 Reconstruction of frames (a)(b) 1, (c)(d) 5, (e)(f) 10 of the foreman sequence
CHAPTER 1 INTRODUCTION
With the growing demand for rich multimedia content on the Internet, video streaming has become popular in both academia and industry.
Video streaming technology enables real time or on-demand distribution of video resources over the network. Compressed video data are transmitted by a server application, and received and displayed in real time by the corresponding client applications. These applications normally start to display the video as soon as a certain amount of data arrives at the client’s buffer, thus allowing downloading and viewing of the video simultaneously.
A typical video streaming system consists of five core functional blocks, i.e., the coding module, network sender, network receiver, decoding module and video renderer. As shown in Fig 1.1, raw video data undergo compression in the coding module to reduce the data load on the network. The compressed video is then transmitted by the sender to the client on the other side of the network, where a decoding procedure is performed to reconstruct the video for the renderer to display.
Video streaming is advantageous because a user does not have to wait for the whole file to arrive before viewing the video. Besides, video streaming leaves no physical files on the client’s computer.
Fig 1.1 A typical video streaming system
The challenge of video streaming lies in the highly delay-sensitive nature of video applications. Video and audio data need to arrive on time to be useful. Unfortunately, current Internet service is best effort (BE) and guarantees no delay bound. Delay-sensitive applications need a new service model in which they can ask for higher assurance or priority from the network. Research in network Quality of Service (QoS) aims to investigate and provide such service models. Technical details of QoS include control protocols such as the Resource Reservation Protocol (RSVP), and individual building blocks such as traffic policing, buffer management and admission control [1]. Layered scalable streaming is one of the QoS-supportive video streaming mechanisms that provide both efficiency and flexibility.
The basic idea of layered scalable streaming is to encode raw video into multiple layers that can be separately transmitted, cumulatively received and progressively decoded [2]-[4]. Clients obtain a preferred video quality by subscribing to different layers and combining these layers into different bit streams. The base layer of the video stream must be received for any other layer to be useful, and each additional layer improves the video quality. As network clients always differ significantly in their capacities and preferences, layered scalable streaming is efficient in that it delivers one video stream over the network while, at the same time, each client receives a video that is specially “shaped” for it, as the sketch below illustrates.
Besides adaptive QoS support from the network, layered scalable video streaming requires a scalable video codec. Recent subband coding algorithms based on the Discrete Wavelet Transform (DWT) support scalability. The DWT-based Set Partitioning in Hierarchical Trees (SPIHT) scheme [5] [6] for coding of monochrome images has yielded desirable results despite its simplicity of implementation. The Color SPIHT (CSPIHT) [7]-[9] improves on SPIHT and achieves comparable compression results in color image coding. In the area of video compression, interest is focused on the removal of temporal redundancy, and 3-D subband coding schemes are one of the successful solutions. Karlsson and Vetterli implemented a 3-D subband coding system in [10] by generalizing the common 2-D filter banks to 3-D subband analysis and synthesis. As one of the embedded 3-D subband coding algorithms that follow it, 3D-CSPIHT [11] is an extension of the CSPIHT coding scheme to video coding.
The above coding schemes achieve satisfactory PSNR performance; however, they have been designed from a pure compression point of view, which creates problems for their direct application in a QoS-enabled streaming system. In this project, we extend the 3D-CSPIHT codec to address these problems and enable it to produce layered bit streams that are suitable for layered video streaming.
The rest of this thesis is organized as follows. In chapter 2 we provide background information on image and video compression, and in chapter 3 we discuss related research in multimedia communications and network QoS. The details of our extension of the 3D-CSPIHT codec, called the layered 3D-CSPIHT video codec, are presented in chapter 4. We analyze the performance of the layered codec in chapter 5. Finally, in chapter 6 we conclude this thesis.
CHAPTER 2 IMAGE AND VIDEO CODING
This chapter begins with an overview of transform coding for still images and video coding using motion compensation. Then wavelet based image and video coding is introduced and the subband coding techniques are described in detail. Finally, current image and video coding standards are briefly summarized.
2.1 Transform Coding
A typical transform coding system comprises forward transform, quantization and entropy coding, as shown in Fig 2.1. First, a reversible linear transform is used to reduce redundancy between adjacent pixels, i.e., the inter-pixel redundancy, in an image. After that, the image undergoes the quantization stage to reduce psychovisual redundancy. Lastly, the quantized image goes through entropy coding, which aims to reduce coding redundancy. Transform coding is a core technique recommended by JPEG and adopted by H.261, H.263, and MPEG-1/2/4. The corresponding decoding procedure is depicted in Fig 2.2. We will discuss the three encoding stages in this section.
Fig 2.1 Encoding model
Fig 2.2 Decoding model
2.1.1 Linear Transforms
Transform coding exploits the inter-pixel redundancy of an image by mapping the image to the transform domain using a reversible linear transform. For most natural images, a significant number of coefficients have small magnitudes after the transform. These coefficients can therefore be coarsely quantized or entirely discarded without causing much image degradation [12]. There is no information loss during the transform process, and the number of coefficients produced is equal to the number of pixels transformed. The transform itself does not directly reduce the amount of data required to represent the image. However, a set of transform coefficients is obtained in this way, which makes the inter-pixel redundancies of the input image more accessible for compression in later stages of the encoding process [12].
Expressing a signal vector x in the standard basis {a_1, a_2, …, a_N} of an N-dimensional Euclidean space, we obtain:

x = Σ_{n=1}^{N} x_n a_n   (2.1)

where A = [a_1, a_2, …, a_N] is an identity matrix of size N × N.
A different set of basis vectors [b_1, b_2, …, b_N] can be used to represent x as

x = Σ_{n=1}^{N} y_n b_n   (2.2)
Let B = [b_1, b_2, …, b_N] and y = [y_1, y_2, …, y_N]^T; we then have

x = By, and hence y = B^{-1} x   (2.3)

so the vector y holds the transform coefficients of x with respect to the new basis.
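As a concrete check of equations (2.1)-(2.3), the short NumPy sketch below re-expresses a signal in an arbitrarily chosen 2×2 orthonormal basis (the basis is illustrative, not one used by the thesis) and verifies that the transform itself is lossless:

```python
import numpy as np

# Signal x expressed in the standard basis.
x = np.array([3.0, 1.0])

# An illustrative orthonormal basis B = [b_1, b_2] (a 2-point "DCT-like" basis).
B = np.array([[1.0,  1.0],
              [1.0, -1.0]]) / np.sqrt(2.0)

y = np.linalg.inv(B) @ x   # transform coefficients: y = B^{-1} x, eq. (2.3)
x_back = B @ y             # reconstruction: x = B y

print(y)                        # most of the energy falls in the first coefficient
print(np.allclose(x, x_back))   # True: no information is lost by the transform
```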
2.1.2 Quantization
After the transform process, quantization is used to reduce the accuracy of the transform coefficients according to a pre-established fidelity criterion [14]. The effect of compression is achieved in this way. Quantization is an irreversible process.
Quantization is the mapping from the source data vector x to a code word r_k = Q[x] in a code book {r_k; 1 ≤ k ≤ L}. The criterion for choosing the proper code word is to reduce the expected distortion due to quantization with respect to a particular probability density distribution of the data. Assume the probability density function of x is f(x). The expected distortion can then be formulated as:

D = Σ_{k=1}^{L} ∫ N(x, r_k) I(x, r_k) f(x) dx   (2.4)

where N(x, r_k) = ||x − r_k||^2 is the distortion measure and I(x, r_k) is the indicator function

I(x, r_k) = 1 if Q[x] = r_k; 0 otherwise.   (2.5)
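A minimal sketch of such a quantizer, mapping each sample to the nearest code word under the squared-error criterion; the code book and sample values are illustrative, not taken from the thesis:

```python
# Illustrative code book {r_k; 1 <= k <= L} with L = 4.
code_book = [-1.5, -0.5, 0.5, 1.5]

def quantize(x):
    """Q[x]: return the code word nearest to x (squared-error criterion)."""
    return min(code_book, key=lambda r: (x - r) ** 2)

samples = [0.1, -0.9, 1.9, 0.4]
quantized = [quantize(s) for s in samples]

# Average squared-error distortion over the samples; the original values
# cannot be recovered from Q[x], which is why quantization is irreversible.
distortion = sum((s - q) ** 2 for s, q in zip(samples, quantized)) / len(samples)
print(quantized, distortion)
```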
2.1.3 Arithmetic Coding

Entropy coding exploits the statistics of the source. Assume a message source emits symbols a_i (1 ≤ i ≤ k), each with probability p(a_i). The information carried by a symbol is

I(a_i) = −log2 p(a_i),  1 ≤ i ≤ k   (2.7)

where the unit of information is the bit for a logarithm of base 2.
The entropy of the message source is then defined as

H = −Σ_{j=1}^{k} p(a_j) log2 p(a_j)   (2.8)
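A short numeric illustration of (2.7) and (2.8) for an assumed three-symbol source:

```python
import math

# Hypothetical source: three symbols and their probabilities.
p = {"a": 0.5, "b": 0.25, "c": 0.25}

# Self-information I(a_i) = -log2 p(a_i), in bits, eq. (2.7).
info = {s: -math.log2(q) for s, q in p.items()}

# Entropy H = sum over j of p(a_j) * I(a_j), eq. (2.8).
entropy = sum(p[s] * info[s] for s in p)

print(info)      # {'a': 1.0, 'b': 2.0, 'c': 2.0}
print(entropy)   # 1.5 bits/symbol: the lower bound for any lossless code
```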
Arithmetic coding is a variable length coding method based on the frequency of each character or symbol. It is suitable for encoding a long stream of symbols or long messages. In arithmetic coding, the probabilities of all code words sum to unity. The events in the data set are arranged in an interval between 0 and 1, and each code word probability can be related to a subdivision of this interval. The algorithm for arithmetic coding then works as follows:

i) Begin with a current interval [L, H) initialized to [0, 1);
ii) For each incoming event, subdivide the current interval into subintervals, one for each possible event, proportional to their probabilities of occurrence;
iii) Select the subinterval corresponding to the incoming event, make it the new current interval and go back to step ii).

Arithmetic coding reduces the information that needs to be transmitted to a single number within the final interval, which is identified after the whole data set is encoded.
The arithmetic decoder, with knowledge of the occurrence probabilities of the different events and the number received, then maps and scales the intervals accordingly to decode the data set.
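The following Python sketch implements just the interval-subdivision loop described above; the symbol model and message are illustrative, and a practical coder also needs renormalization and an explicit termination convention:

```python
def arithmetic_encode(message, probs):
    """Shrink [0, 1) around the message; return one number in the final interval."""
    low, high = 0.0, 1.0
    for symbol in message:
        span = high - low
        cum = 0.0
        # Subdivide the current interval proportionally to symbol probabilities.
        for s, p in probs.items():
            if s == symbol:
                low, high = low + span * cum, low + span * (cum + p)
                break
            cum += p
    return (low + high) / 2  # any value inside [low, high) identifies the message

probs = {"a": 0.6, "b": 0.3, "c": 0.1}   # illustrative model, must sum to 1
print(arithmetic_encode("aab", probs))   # 0.27: a single number encodes 3 symbols
```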
2.1.4 Binary Coding
Binary coding is lossless, and is a necessary step in any coding system. The process of binary coding is shown in Fig 2.3.
Fig 2.3 Binary coding model
Denote the bit rate produced by such a binary coding system as R. According to Fig 2.3, each symbol a_i with probability p_i is mapped to a code word c_i of bit length l_i, so we have

R = Σ_i p_i l_i   (2.9)
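For example, assuming the prefix code {0, 10, 11} for a three-symbol source:

```python
# Illustrative probabilities and code word lengths (code words 0, 10, 11).
p = [0.5, 0.25, 0.25]
l = [1, 2, 2]

R = sum(pi * li for pi, li in zip(p, l))  # eq. (2.9)
print(R)  # 1.5 bits/symbol, equal to the source entropy: this code is optimal
```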
2.2 Video Compression Using MEMC
Unlike still image compression, video compression attempts to exploit temporal redundancy. There are two types of coding, categorized according to the type of redundancy being exploited: intraframe coding and interframe coding. In intraframe coding, each frame is coded separately using still image compression methods such as transform coding, while interframe coding uses spatial redundancies and motion compensation to exploit the temporal redundancy of the video sequence. This is done by predicting a new frame from its previous frame, so that the original frame to be coded is reduced to the prediction error or residual frame [15]. We do this because prediction errors have smaller energy than the original pixel values and can therefore be coded with fewer bits. Regions with high motion or scene changes are coded directly using transform coding. A video compression system is evaluated using three criteria: reconstruction quality, compression rate and complexity.
The method used to predict a frame from its previous one is called Motion Estimation (ME) or Motion Compensation (MC) [16] [17]. MC uses motion vectors to eliminate or reduce the effects of motion, while ME computes the motion vectors that carry the displacement information of a moving object. The two terms are often referred to jointly as MEMC.
Fig 2.4 Block matching motion estimation
MEMC is normally done independently at the macro block (MB) level (16×16 pixels) in order to reduce computational complexity; this is called the Block Matching Algorithm. In the Block Matching Algorithm (Fig 2.4), a video frame is divided into macro blocks, and each pixel within a block is assumed to have the same amount of translational motion. Motion estimation is achieved by matching a block in the current frame against candidate blocks within a search window in the reference frame. A two-dimensional displacement vector, or motion vector (MV), is then obtained from the displaced coordinate of the matched block in the reference frame. The best prediction is found by minimizing a matching criterion such as the
Sum of Absolute Differences (SAD). SAD is defined as:

SAD(u, v) = Σ_{x=1}^{M} Σ_{y=1}^{N} |B_{i,j}(x, y) − B_{i−u,j−v}(x, y)|   (2.10)

where B_{i,j}(x, y) represents the pixel with coordinate (x, y) in an M×N block from the current frame at spatial location (i, j), while B_{i−u,j−v}(x, y) represents the pixel with coordinate (x, y) in the candidate matching block from the reference frame at spatial location (i, j) displaced by the vector (u, v).
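A minimal full-search block-matching sketch built on this SAD criterion; the frame content, block size and search range below are illustrative:

```python
import numpy as np

def best_motion_vector(cur, ref, i, j, n=4, search=2):
    """Full search: find (u, v) minimizing SAD between the n x n block of the
    current frame at (i, j) and displaced candidate blocks in the reference."""
    block = cur[i:i + n, j:j + n].astype(int)
    best, best_sad = (0, 0), np.inf
    for u in range(-search, search + 1):
        for v in range(-search, search + 1):
            if 0 <= i + u and i + u + n <= ref.shape[0] \
                    and 0 <= j + v and j + v + n <= ref.shape[1]:
                cand = ref[i + u:i + u + n, j + v:j + v + n].astype(int)
                sad = np.abs(block - cand).sum()       # eq. (2.10)
                if sad < best_sad:
                    best_sad, best = sad, (u, v)
    return best, best_sad

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (16, 16))
cur = np.roll(ref, shift=(1, 2), axis=(0, 1))   # simulate a pure translation
print(best_motion_vector(cur, ref, 8, 8))       # recovers the shift with SAD 0
```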
2.3 Wavelet Based Image and Video Coding
This section provides a brief overview of wavelet based image and video coding [18]-[22]. The Discrete Wavelet Transform (DWT) is introduced, and the subband coding schemes, including the Embedded Zerotree Wavelet (EZW) and the Set Partitioning in Hierarchical Trees (SPIHT), are discussed in detail. In the last sub-section, the concept of scalability is introduced.
2.3.1 Discrete Wavelet Transform
The Discrete Wavelet Transform (DWT) is an invertible linear transform that decomposes a signal over a set of orthogonal basis functions called wavelets. The fundamental idea behind the DWT is to represent each frequency component at a resolution matched to its scale, so that a signal can be analyzed at various scales or resolutions. In the field of image and video coding, the DWT decomposes video frames or residual frames into a multi-resolution subband representation.
We denote the wavelet basis as

φ_{j,k}(x) = 2^{j/2} φ(2^j x − k)   (2.11)

where the variables j and k are integers that are the scale and location indices, indicating the wavelet's width and position respectively. They are used to scale or “dilate” the mother function φ(x) to generate the wavelets.

The DWT transform pair is then defined as

c_{j,k} = Σ_x f(x) φ_{j,k}(x)   (2.12)

f(x) = Σ_j Σ_k c_{j,k} φ_{j,k}(x)   (2.13)

where f(x) is the signal to be decomposed and c_{j,k} are the wavelet coefficients. To span the data domain at different resolutions, the basis functions at one scale are expressed in terms of those at the next finer scale through a two-scale (dilation) relation of the form

φ(x) = Σ_k a_k φ(2x − k)   (2.14)
Fig 2.5 1-D DWT decomposition

Fig 2.6 Dyadic DWT decomposition of an image
In real applications, the DWT is often performed on a vector whose length is an integer power of 2. As Fig 2.5 shows, the process of 1-D DWT computation comprises a series of filtering and sub-sampling operations, in which H and L denote the high-pass and low-pass filters, a_j is the input vector at one level and c_{j+1} denotes the coefficients produced at the next. The 1-D DWT can be extended to 2-D for image and video processing. In this case, filtering and sub-sampling are first performed along all the rows of the image and then along all the columns; such a 2-D DWT is called the dyadic DWT. A 1-level dyadic DWT results in four different resolution subbands, namely the LL, LH, HL and HH subbands. The decomposition process is shown in Fig 2.6. The LL subband contains the low frequency image and can be further decomposed by a 2-level or 3-level dyadic DWT. Fig 2.7 depicts the subbands of an image decomposed using a 3-level dyadic DWT, and Fig 2.8 shows the Barbara image after 2-level decomposition.
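As a concrete illustration, one level of the 1-D DWT can be sketched with the Haar filter pair, the simplest choice (the thesis does not commit to a particular wavelet filter):

```python
import numpy as np

def haar_dwt_1level(a):
    """One level of 1-D DWT: low-pass and high-pass filtering, each followed
    by sub-sampling by 2 (input length assumed to be a power of 2)."""
    a = np.asarray(a, dtype=float)
    approx = (a[0::2] + a[1::2]) / np.sqrt(2)   # L: approximation coefficients
    detail = (a[0::2] - a[1::2]) / np.sqrt(2)   # H: detail coefficients
    return approx, detail

signal = [4.0, 4.0, 5.0, 7.0, 6.0, 6.0, 8.0, 2.0]
low, high = haar_dwt_1level(signal)
print(low)   # half-length approximation; decompose again for the dyadic pyramid
print(high)  # details are small where the signal is smooth
# In 2-D, the same split is applied to all rows and then all columns,
# yielding the LL, LH, HL and HH subbands of Fig 2.6.
```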
Fig 2.7 Subbands after 3-level dyadic wavelet decomposition
Fig 2.8 2-level DWT decomposed Barbara image
The advantage of the DWT is its versatile time-frequency localization: it has shorter basis functions for higher frequencies and longer basis functions for lower frequencies. The DWT also has an important advantage over the traditional Fourier Transform in that it can analyze signals containing discontinuities and sharp spikes.
2.3.2 EZW Coding Scheme
The good energy compaction property of the DWT has attracted huge research interest in DWT based image and video coding schemes. The main challenge of wavelet-based coding is to find an efficient structure to quantize and code the wavelet coefficients in the transform domain. Lewis and Knowles defined a spatial orientation tree (SOT) structure [23]-[27], and Shapiro then made use of the SOT concept and introduced the Embedded Zerotree Wavelet (EZW) encoder [28] in 1993. The idea was further improved by Said and Pearlman, who modified the EZW SOT structure; their new structure is called Set Partitioning in Hierarchical Trees (SPIHT). A brief discussion of the EZW scheme is provided in this section and a detailed description of SPIHT is provided in the next section.
Shapiro’s EZW coder contains 4 key steps:
i) the discrete wavelet transform;
ii) subband coding using the EZW SOT structure (Fig 2.9);
iii) entropy coded successive-approximation quantization;
iv) adaptive arithmetic coding
A zerotree is actually a SOT which has no significant coefficients with respect to a given threshold. For simplicity, the image in Fig 2.9 is transformed using a 2-level DWT; in most situations, however, a 3-level DWT is applied to ensure better reconstruction quality. As shown in Fig 2.9, the image is divided into 7 subbands after the 2-level wavelet transform. Nodes in the lowest subband each have 3 children nodes, one in each of the neighboring subbands. Their children, in turn, each have 4 children nodes which reside in the same spatial location of the corresponding higher subband. Thus, all the nodes are linked in SOTs, and a search through these SOTs is performed so that significant coefficients are found and coded with higher priority. The core of the EZW encoder, step ii), is based on three concepts: comparison of coefficient magnitudes to a series of decreasing thresholds representing the current bit plane, ordered bit plane coding of refinement bits, and exploitation of the correlation across subbands in the transform domain.
Fig 2.9 Spatial Orientation Tree for EZW
The EZW coding scheme has proved competitive in performance with virtually all known compression techniques, while still generating a fully embedded bit stream. It utilizes both bit plane coding and the zerotree concept.
2.3.3 SPIHT Coding Scheme
Said and Pearlman’s SPIHT coder is an enhancement of the EZW coder. Basically, SPIHT is also a sorting algorithm that codes wavelet coefficients according to a priority defined by their significance with respect to a certain threshold. This is achieved by tracking down the SPIHT SOT and comparing the coefficients against the given threshold. The SPIHT scheme inherits the basic concepts of the EZW, except that it uses a modified SOT, called the SPIHT SOT (Fig 2.10).
Fig 2.10 Spatial Orientation Tree of SPIHT
The SPIHT SOT structure is designed according to the observation that if a coefficient magnitude at a certain node of a SOT does not exceed a given threshold, it is very likely that none of the nodes at the same location in the higher subbands will exceed that threshold. The SPIHT SOT naturally defines this spatial relationship using a hierarchical pyramid. Each node is identified by the coordinates of a pixel, and its magnitude is the corresponding absolute value of that pixel. As Fig 2.10 shows, each node has either no offspring or 4 offspring, which are located at the same spatial orientation in the next finer level of the pyramid. The 4 offspring always form 2×2 adjacent pixel groups. The nodes in the lowest subband of the image, or the highest level of the pyramid, are the roots of the SOTs. There is a slight difference in the offspring branching rule for the tree roots: in each 2×2 root group, the upper left node is childless. Thus, the wavelet coefficients are organized in hierarchical trees, with nodes of common orientation across all subbands linked in one same SOT. This allows us to predict a coefficient’s significance from the magnitude of its parent node later.
We use the symbols in Said and Pearlman’s paper to denote the coordinates and the sets:

• O(i,j) denotes the set of coordinates of all offspring of node (i,j);
• D(i,j) denotes the set of coordinates of all descendants of node (i,j);
• H denotes the set of all nodes in the lowest subband, inclusive of the childless nodes;
• L(i,j) denotes the set of coordinates of all non-direct descendants of node (i,j), i.e., L(i,j) = D(i,j) − O(i,j).

Now we can express the SOT descendant branching rule by equation (2.15):
O(i,j) = {(2i, 2j), (2i+1, 2j), (2i, 2j+1), (2i+1,2j+1)} (2.15)
After the sets are defined, the set partitioning rule is used to create new partitions in order to effectively predict and code significant nodes. A magnitude test is performed on each partitioned subset to determine its significance. If significant, the subset is further partitioned into new subsets, and the magnitude test is again applied to the new subsets until each individual significant coefficient is identified. Note that an individual coefficient is significant when it is larger than the current threshold, and insignificant otherwise; for a set to be significant, at least one descendant must be significant on an individual basis. We denote the transformed coefficients as c_{i,j} and the pixel sets as τ, and use the following function to define the relationship between magnitude comparisons and message bits:
S_n(τ) = 1, if max_{(i,j)∈τ} |c_{i,j}| ≥ 2^n; 0, otherwise   (2.16)
The set partitioning rule is then defined as follows (a small sketch of these set operations is given below):

i) The initial partition is formed with the sets {(i,j)} and D(i,j), for all (i,j) ∈ H;
ii) If D(i,j) is significant, it is partitioned into L(i,j) plus the four single-element sets {(k,l)} with (k,l) ∈ O(i,j);
iii) If L(i,j) is significant, it is partitioned into the four sets D(k,l) with (k,l) ∈ O(i,j).
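The sketch below implements the offspring rule (2.15) and the significance test (2.16) on a toy 4×4 coefficient array; the special branching rule at the tree roots (the childless upper-left node of each 2×2 root group) is omitted for brevity, and the coefficient values are illustrative:

```python
import numpy as np

def offspring(i, j, h, w):
    """O(i,j): the four children of eq. (2.15), kept inside an h x w array."""
    kids = [(2*i, 2*j), (2*i + 1, 2*j), (2*i, 2*j + 1), (2*i + 1, 2*j + 1)]
    return [(k, l) for (k, l) in kids if k < h and l < w]

def descendants(i, j, h, w):
    """D(i,j): all descendants of node (i,j), gathered recursively."""
    out = []
    for (k, l) in offspring(i, j, h, w):
        out.append((k, l))
        out.extend(descendants(k, l, h, w))
    return out

def significant(coeffs, tau, n):
    """S_n(tau) of eq. (2.16): 1 iff some |c_ij| in the set reaches 2^n."""
    return int(max(abs(coeffs[k, l]) for (k, l) in tau) >= 2 ** n)

c = np.array([[26,  6, 13, 10],
              [-7,  7,  6,  4],
              [ 4, -4,  4, -3],
              [ 2, -2, -2,  0]])
tau = descendants(0, 1, 4, 4)      # the subtree rooted at node (0, 1)
print(tau)                         # [(0, 2), (1, 2), (0, 3), (1, 3)]
print(significant(c, tau, 4))      # 0: max magnitude 13 < 2^4, an insignificant set
```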
Following the SOT structure and the set partitioning rule, an image that has large coefficients at the SOT roots and zero or very small coefficients in the higher levels of the SOTs needs very little sorting and partitioning of the pixel sets. This property greatly reduces the computational complexity and allows for a better reconstruction of the image.
In implementation, the SPIHT coding algorithm uses 3 ordered lists to store the significance information, i.e.:
i) list of insignificant pixels (LIP)
ii) list of significant pixels (LSP)
iii) list of insignificant sets (LIS)
In the case of the LIP and LSP, the coordinates of the pixels are stored in the lists. In the case of the LIS, however, the list contains two types of entries, categorized according to which set an entry represents: if an entry represents the set D(i,j), we say it is a type A entry; if it represents L(i,j), we say it is a type B entry.
To initialize the coding algorithm, the maximum coefficient magnitude in the image is identified and the initial bit plane is assigned the value n = ⌊log2(max_{(i,j)}{|c_{i,j}|})⌋. The threshold value is then obtained by computing 2^n. Also, the LIS and LIP lists are initialized with the pixel coordinates in the highest subband. The set partitioning rule is then applied to the LIP and LIS lists to judge the significance status of the pixels or sets; this is called the sorting pass. Thereafter, the refinement pass goes through the LSP list to code the bits necessary to enhance the precision of the significant coefficients from previous sorting passes by one bit position. This completes coding of the first bit plane. To continue, the bit plane is decreased by 1 and the sorting and refinement passes are re-executed in the next iteration. This process is repeated until the bit plane is reduced to zero or a user-given bit budget runs out. Fig 2.11 demonstrates the above coding algorithm. Note that in step 2.2), entries added to the end of the LIS are evaluated before that same sorting pass ends; so step 2.2) sorts not only the originally initialized entries, but also the entries being added to the LIS.
The SPIHT coder improves on the performance of the EZW coder by 0.3-0.6 dB. This gain is mostly due to the fact that the original zerotree algorithm allows special symbols only for single zerotrees, while there are often other sets of zeros in reality; in particular, the SPIHT coder provides symbols for combinations of parallel zerotrees. Moreover, SPIHT produces a fully embedded bit stream whose bit rate can be precisely controlled. The SPIHT coder is very fast and has a low computational complexity. Both EZW and SPIHT belong to the subband coding schemes, and both exploit the correlation between subbands through the SOT.
1) Initialization:
   1.1) Output n = ⌊log2(max_{(i,j)}{|c_{i,j}|})⌋;
   1.2) Set the LSP as an empty list; add the coordinates (i,j) ∈ H to the LIP,
        and those with descendants to the LIS as TYPE A entries.
2) Sorting Pass:
   2.1) For each entry (i,j) in the LIP do:
        - Output S_n(i,j);
        - If S_n(i,j) = 1 then move (i,j) to the LSP and output the sign of c_{i,j};
   2.2) For each entry (i,j) in the LIS do:
        2.2.1) If the entry is TYPE A then
               - Output S_n(D(i,j));
               - If S_n(D(i,j)) = 1 then
                 + For each offspring (k,l) of (i,j) do:
                     - Output S_n(k,l);
                     - If S_n(k,l) = 1 then add (k,l) to the LSP and output the sign of c_{k,l};
                     - If S_n(k,l) = 0 then add (k,l) to the end of the LIP;
                 + If L(i,j) ≠ ∅ then move (i,j) to the end of the LIS as TYPE B
                   and go to step 2.2.2); otherwise remove (i,j) from the LIS;
        2.2.2) If the entry is TYPE B then
               - Output S_n(L(i,j));
               - If S_n(L(i,j)) = 1 then
                 + Add each (k,l) ∈ O(i,j) to the end of the LIS as TYPE A;
                 + Remove (i,j) from the LIS;
3) Refinement Pass:
   For each entry (i,j) in the LSP, except those added in the last sorting pass:
   - Output the nth most significant bit of |c_{i,j}|;
4) Quantization-Step Update:
   Decrease n by 1 and go to step 2.

Fig 2.11 SPIHT coding algorithm
2.3.4 Scalability
One advantage of the SPIHT image and video coder is its bit rate scalability. Scalability is the degree to which video and image formats can be sized in systematic proportions for distribution over communication channels of varying capacities [29]. In other words, it measures how flexible an encoded bit stream is. Scalable image and video coding has received considerable attention from the research community due to the diversity of communication networks and network users.
There are three basic types of scalability, and they refine video quality along three different dimensions, i.e.:
• Temporal scalability or temporal resolution/frame rate
• Spatial scalability or spatial resolution
• SNR scalability or amplitude resolution
Each type of scalable coding provides scalability along one dimension of the video sequence, and multiple types can be combined to provide scalability along multiple dimensions. In real applications, being temporally scalable often means supporting different frame rates, while spatial and SNR scalability mean video of different spatial resolutions and visual qualities respectively.
One common method of providing scalability is to apply subband decomposition to the video sequences. The full resolution video can then be decoded using both the low pass and high pass subbands, while half resolution video can be decoded using only the low pass subband. The resulting half resolution video can be passed through further subband decomposition to create quarter resolution video, and so on. We will use this concept in chapter 4.
2.4 Image and Video Coding Standards
As international organizations, ISO/IEC and ITU-T have been heavily involved in the standardization of image, audio and video coding. Specifically, ISO/IEC focuses on video storage, broadcast video and video streaming applications, while ITU-T caters to real time video applications. Current video standards mainly comprise the ISO MPEG family and the ITU-T H.26x family. Table 2.1 [30] provides an overview of these standards and their applications. JPEG and JPEG 2000 are also listed as still image coding standards for reference.
Standard    Application                                     Bit rate
JPEG 2000   Improved still image compression                Variable
MPEG-2      Digital television, video on DVD                2-20 Mbps
MPEG-4      Object-based coding, interactive video          28-1024 kbps
H.261       Video conferencing over ISDN                    Variable
H.263       Video conferencing over Internet and PSTN,      >= 33 kbps
            wireless video conferencing
H.26L       Improved video compression                      10-100 kbps

Table 2.1 Image and video compression standards
CHAPTER 3 VIDEO STREAMING AND NETWORK QoS
This chapter provides some fundamentals of video streaming. Network Quality of Service (QoS) is defined, and two frameworks, the Integrated Services (IntServ) and the Differentiated Services (DiffServ), are discussed in detail. Finally, the principles of layered video streaming are presented.
3.1 Video Streaming Models
Unicast and multicast are the two models of video streaming. Unicast is communication between a single sender and a single receiver. As shown in Fig 3.1, the sender sends an individual copy of the video stream to each client, even when some of the clients require the same video resource. Unicast is also called point-to-point communication because there is effectively a non-shared connection from the server to each client.

Fig 3.1 Unicast video streaming
By contrast, communication between a single sender and multiple receivers is called multicast, or point-to-multipoint communication. In the multicast scenario (Fig 3.2), the sender sends only one copy of the required video over the network; it is then routed to several destinations by the network switches or routers. A client receives the video stream by tuning in to a multicast group in its neighborhood. When the clients belong to multiple groups, the video is duplicated and branched at fork points, as shown at router R1 (Fig 3.2).
Fig 3.2 Multicast video streaming
3.2 Characteristics and Challenges of Video Streaming
Video streaming is a real time application. Unlike traditional data-oriented applications such as email, ftp, and web browsing, video streaming applications are highly delay-sensitive and need their data to arrive on time to be useful. As such, the service requirements of video streaming applications differ significantly from those of traditional data-oriented applications, and satisfying these requirements is a great challenge on today’s Internet.
First of all, the best effort (BE) service that the current Internet provides is far from sufficient for a real time application such as video streaming. Under the BE service model, there is no guarantee on delay bound or loss rate, and when the data load on the Internet is heavy, delivery results can be unacceptable. Video streaming, on the other hand, requires timely and, to some extent, correct delivery; we must ensure that the streaming is still viable, with a decreased quality, in times of congestion.
Second, client machines on the Internet normally vary significantly in their computing, display and memory capabilities. In most cases, these heterogeneous clients will require video of different qualities. It is obviously inefficient to deliver the same video stream to all clients separately; instead, streaming in response to the particular requests of each individual client is desirable.
In conclusion, suitable coding and streaming strategies are needed to support efficient real time video streaming. Scalable video coding and new network service models which support network Quality of Service have been developed to address the above challenges. We will discuss the details of QoS in the next section.
3.3 Quality of Service
3.3.1 Definition of QoS
The current Internet provides one single class of service, the BE service. BE service generally treats all data as one service class and gives priority to no particular data or users. This is not enough for new real time applications such as video streaming. We must modify the Internet to provide more service options, which can, to some extent, keep the service quality up to a certain level that has been previously agreed on by the user and the network. This new service model is provided with the support of network QoS.
There are generally two approaches to supporting QoS. One is the fine-grained approach, which provides QoS to individual applications or flows; the other is the coarse-grained approach, which aggregates traffic into classes and provides QoS to these classes. The Internet Engineering Task Force (IETF) has developed QoS frameworks through both approaches, namely the Integrated Services (IntServ) framework as an example of the fine-grained approach, and the Differentiated Services (DiffServ) framework as an example of the coarse-grained approach.
3.3.2 IntServ Framework

In the IntServ architecture (Fig 3.3), a flow requests resources through a reservation setup protocol such as RSVP, and several control modules cooperate to determine or assist in the reservation. The packet classifier classifies a packet into an appropriate QoS class. Policy control is then used to examine the packet to see whether it has administrative permission to make the requested reservation. For final reservation success, however, admission control must also be passed, to ensure that the desired resources can be granted without affecting the QoS previously requested by and admitted to other flows. Finally, the packet is scheduled to enter the network at a proper time by the packet scheduler in the primary data forwarding path.
Fig 3.3 IntServ architecture
The IntServ architecture adds two service classes to the existing BE model: guaranteed service and controlled load service.
The basic concept of guaranteed service can be described using a linear flow regulator called the leaky bucket regulator (Fig 3.4). Suppose the bucket holds up to b tokens, and new tokens are filled in at a rate of r tokens/sec. Before filtering by the regulator, packets arrive at a variable rate; under the filtering, however, they must wait at the input queue until an equivalent amount of tokens is available before they can proceed further into the network. Obviously, such a regulator passes flows with a maximum burst of b tokens and an average rate of r tokens/sec; thus, it confines the traffic entering the network to b + rt tokens over any interval of t seconds.
Fig 3.4 Leaky bucket regulator
To invoke the service, a router needs to be informed of the traffic and reservation characteristics, denoted by Tspec and Rspec respectively; a small conformance sketch follows the two parameter lists below.
Tspec contains the following parameters:
• p = peak rate of flow (bytes/s)
• b = bucket depth (bytes)
• r = token bucket rate (bytes/s)
• m = minimum policed unit (bytes)
• M = maximum datagram size (bytes)
Rspec contains the following parameters:
• R = bandwidth, i.e., service rate (bytes/s)
• S = slack term (ms)
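As promised above, here is a minimal sketch of the leaky bucket conformance test, checking the b + rt bound directly; the packet trace and the b and r values are illustrative, not drawn from the thesis:

```python
def conforms(packets, b, r):
    """Check a flow against the leaky bucket: traffic must never exceed
    b + r*t tokens over any interval of t seconds. packets is a list of
    (arrival_time_s, size_in_tokens) pairs in non-decreasing time order."""
    tokens, last_t = b, 0.0          # the bucket starts full
    for t, size in packets:
        tokens = min(b, tokens + r * (t - last_t))  # refill at rate r, cap at b
        last_t = t
        if size > tokens:
            return False             # packet would have to wait or be dropped
        tokens -= size
    return True

burst = [(0.0, 1500), (0.0, 1500), (0.1, 1500)]     # a 3-packet burst
print(conforms(burst, b=4000, r=10000))  # True: burst fits within b + r*t
print(conforms(burst, b=2000, r=1000))   # False: the bucket is too shallow
```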
Guaranteed service promises a maximum delay for a flow, provided that the flow conforms to its specified traffic parameters. This service model aims to support applications with hard real time requirements.
Unlike guaranteed service, controlled-load service provides no rigid delay or loss guarantees. Instead, it provides a QoS similar to BE service in an under-utilized network, with almost no loss or delay. When the network is overloaded, it tries to share the bandwidth among multiple streams in a controlled way so as to maintain approximately the same level of QoS. Controlled-load service is intended to support applications that can tolerate a reasonable amount of delay and loss.
3.3.3 DiffServ Framework
IntServ provides fine-grained QoS guarantees using per-flow Tspec signaling. However, maintaining a Tspec for each flow may be too expensive in implementation. Besides, incremental deployment is only possible for controlled-load service, while it is difficult to realize guaranteed service across the network. Therefore, there is a need for more flexible service models that allow for more qualitative definitions of service distinctions. The solution is DiffServ, an architecture that aims to provide scalable and flexible service differentiation.
Generally, the DiffServ architecture comprises 4 key concepts: