
Side information creation using adaptive block size
for distributed video coding

Nguyen Thi Huong Thao, Vu Huu Tien
Posts and Telecommunications Institute of Technology
Hanoi, Vietnam. Email: {thaonth, tienvh}@ptit.edu.vn

Hoang Van Xiem, Le Thanh Ha, Dinh Trieu Duong
Vietnam National University
Hanoi, Vietnam. Email: {xiemhoang, lthavnu, duongdt77}@gmail.com

Abstract—Distributed video coding is a promising solution for emerging applications, such as wireless video surveillance and wireless video sensor networks, that are not supported by traditional video coding standards. The success of distributed video coding is based on exploiting the source statistics at the decoder with the availability of some side information. The better the quality of the side information, the higher the performance of the distributed video coding system. In this paper, a novel side information creation method is proposed that uses different block sizes based on the residual information at the encoder. The proposed solution is compared with the previous PRISM solution, and simulation results show that the proposed solution robustly improves the coding performance for some test sequences.

Keywords—Distributed Video Coding, Side Information

I. INTRODUCTION

Today, video standards play an important role in many applications. Almost all video coding applications fall within two classes of application models, namely the downlink and uplink models. The downlink application model is associated with the broadcasting approach. In this model, the encoder complexity may be high while the decoder needs to be as light as possible, because there may be one encoder but thousands of decoders. Applications such as video streaming and broadcasting belong to this downlink model. On the other hand, in the uplink application model, a low complexity encoder is required and the complexity of the decoder is not an issue. Emerging applications such as wireless video surveillance and wireless video sensor networks belong to this model.

However, popular video coding standards such as MPEG-x, H.264/AVC or HEVC mainly support the downlink application model. So, what are the solutions for uplink application models? The answer to this question is Distributed Video Coding (DVC). Based on two important results of information theory, the Slepian-Wolf [1] and Wyner-Ziv [2] theorems, DVC is regarded as a promising solution for the uplink application model because it exploits the redundancy, partially or fully, at the decoder with the availability of side information (SI), rather than at the encoder as in earlier predictive coding standards. Thus, the motion estimation task, which requires high computational complexity, is not performed at the encoder, and this makes the encoder lighter. Theoretically, DVC can achieve compression performance equal to that of current video standards; however, practical DVC systems still have much work to do to reach such performance. DVC only works well if SI is available at the decoder, and the better the quality of the SI, the smaller the number of parity bits (and hence the bit rate) needed. In the literature, there have been many SI creation proposals, notably frame interpolation [3,4] and extrapolation [5,6] algorithms. Frame interpolation methods use past and future decoded frames to create SI, so there is some delay. However, frame extrapolation methods use only past decoded frames, so the delay is lower and they are more suitable for real-time applications. SI creation techniques at the decoder are largely responsible for the compression efficiency of DVC; therefore, building more efficient SI creation methods is necessary for DVC systems.

The first practical implementations of DVC systems were proposed in [7] and [8], namely the Berkeley and Stanford architectures, respectively. In [7], the PRISM codec is presented based on pixel block syndrome coding. In [8], a codec based on turbo codes operating on whole frames is proposed.

In this paper, an SI creation algorithm with high quality and reasonable computational time based on the PRISM architecture is proposed. The rest of the paper is organized as follows. Section 2 briefly describes the PRISM architecture and some related works. In Section 3, a novel SI creation method at the decoder is proposed, and finally, test conditions and performance results are presented in Section 4.

II. PRISM ARCHITECTURE

The PRISM codec (Power-efficient, Robust, hIgh-compression, Syndrome-based Multimedia coding) works at the block level, i.e., channel codes are applied independently to each block, with motion estimation performed at the decoder and a CRC used to identify correct SI; notably, it does not require a feedback channel. The PRISM codec is shown in Figure 1.

At the encoder:

Classification: Before encoding, each block is classified into one of several pre-defined classes depending on the temporal correlation between the current block and the corresponding prediction block in the reference frame. Depending on the allowed complexity at the encoder, the prediction block can be either the co-located block or a motion compensated block. This stage decides to which class the block belongs and thus the coding mode for each block: no coding (SKIP class), traditional Intra-frame coding (entropy coding class) or syndrome coding (several syndrome coding classes). The blocks classified in the syndrome coding classes are coded using the DVC approach as described below.


Fig. 1. (a) Encoder block diagram; (b) Decoder block diagram.

DCT: A frame is divided into non-overlapping blocks and the Discrete Cosine Transform (DCT) is applied to each block.

Quantization: A scalar quantizer [9] with a fixed step size, as in H.263+, is applied to the obtained DCT coefficients, corresponding to a certain target quality.

Syndrome coding: For those blocks classified in the syndrome coding classes, only the least significant bits of the quantized DCT coefficients in a block are syndrome encoded; it is assumed that the most significant bits can be inferred from the SI (due to their high correlation with the corresponding SI). The number of least significant bits to be transmitted to the decoder depends on the syndrome class to which the block belongs. Within the least significant bits, the lower part is encoded using a (run, depth, path, last) 4-tuple based entropy codec. The upper part of the least significant bits is coded using a coset channel code, in this case a BCH code, because it works well for small block lengths.

Hash generator: For each block, the encoder also sends a 16-bit cyclic redundancy check (CRC) sum as a signature of the quantized DCT coefficients. The CRC is used to select the best candidate block (SI) at the decoder, as explained below.

At the decoder:

Motion search: The decoder generates side information candidate blocks, which correspond to all half-pixel displaced blocks in the reference frame, in a window positioned around the center of the block to decode.

Syndrome decoder: Each of the candidate blocks plays the role of side information for syndrome decoding, which consists of two steps [9]. The first step deals with entropy decoding of the lower part of the least significant bitplanes and the coset channel coded bitplanes, to identify the coset in which the SI must be decoded. The second step deals with soft decision decoding, which is performed for each candidate block (SI) to find the closest (quantized) codeword within the coset identified in the first step. For each candidate block, a decoded quantized block is thus obtained from the syndrome decoding operation.

Fig. 2. The encoder of the proposed architecture.

Hash check: Each candidate block leads to a decoded block, from which a CRC is generated for each decoded quantized block. To select one of the candidate blocks and detect successful decoding (i.e., blocks with a small error probability), the generated CRCs are checked sequentially until decoding leads to a matching CRC sum.

Reconstruction and IDCT: Once the quantized DCT coefficients block is recovered, it is used along with the corresponding side information to get the best reconstructed block, using the minimum mean square estimate from the side information and the quantized block. The decoded video frame is then obtained by applying the IDCT to the reconstructed (DCT coefficients) block.
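As a rough illustration of the reconstruction step, a common simplification of the MMSE rule in the DVC literature keeps the side information value when it falls inside the decoded quantization bin and clamps it to the bin boundary otherwise; this sketch is not necessarily PRISM's exact estimator, and the function name is ours:

```python
def reconstruct_coefficient(q_index, step, y):
    """Reconstruct a DCT coefficient from its quantization bin and the SI y.

    Simplified clamp rule: if the side information y lies inside the bin
    identified by q_index, keep it; otherwise clamp to the nearest bin edge.
    (Non-negative bins assumed, for simplicity.)
    """
    low, high = q_index * step, (q_index + 1) * step
    return min(max(y, low), high)
```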

III. PROPOSED ARCHITECTURE OF DISTRIBUTED VIDEO CODING

Motivated by the solution in [10], the proposed architecture uses the H.264/AVC standard in order to exploit the enhanced coding tools of the standard. This solution is also based on the early DVC architecture briefly presented in Section 2. As mentioned above, the DVC coding approach targets the reduction of the encoder computational complexity, which is typically high for predictive video coding architectures. In addition, the method in [10] uses correlation estimation on 4x4 input blocks for all frames in the video sequence. In order to further decrease the encoding time, the proposed method uses an adaptive input block size to enhance the performance of the DVC codec. The proposed video coding architecture is shown in Figure 2.

A. Encoding process

In this paper, the encoding process is performed in the following steps:

Frame classification: First, a video sequence is divided into WZ frames, i.e., frames that will be coded using a Wyner-Ziv approach, and key frames that will be coded as Intra frames, e.g., using the H.264/AVC Intra coding mode [10]. The key frames are typically inserted periodically with a certain GOP (Group of Pictures) size. An adaptive GOP size selection process may also be used, meaning that the key frames are inserted depending on the amount of temporal correlation present along the video sequence. In this paper, we use a GOP size of 2, which is used in most results available in the literature; it means that odd and even frames are key frames and Wyner-Ziv frames, respectively.
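As an illustration, here is a minimal Python sketch of this splitting for GOP size 2 (the function name and the 0-based indexing are our choices; with 1-based frame numbering as in the text, odd frames land in the key-frame list):

```python
def classify_frames(frames, gop_size=2):
    """Split a sequence into key frames and Wyner-Ziv (WZ) frames.

    With GOP size 2 and frames numbered from 1, odd frames (indexes
    0, 2, 4, ... here) are key frames coded with H.264/AVC Intra and
    even frames are WZ frames.
    """
    key_frames = [f for i, f in enumerate(frames) if i % gop_size == 0]
    wz_frames = [f for i, f in enumerate(frames) if i % gop_size != 0]
    return key_frames, wz_frames
```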


Selecting the block size by correlation estimation of adaptive input blocks: In [10], for each 4x4 input block, the encoder estimates the correlation level with the side information in order to permit correct decoding. At the decoder side, the candidate predictors are created by a motion search for the current 4x4 block with a search window of 16x16 pixels in the previous frame. When the correlation between the Wyner-Ziv frame and the previous Intra frame is high, i.e., the Wyner-Ziv frame is quite similar to the Intra frame, the encoding time can be decreased by using a larger block size. In the proposed architecture, the size of the input blocks is assigned for each Wyner-Ziv frame depending on the MAD (Mean Absolute Difference) between the Wyner-Ziv frame and the previous Intra frame, computed as shown in Eq. (1):

S = \begin{cases} 4 \times 4, & \text{if } \mathrm{MAD} \geq \text{threshold} \\ 8 \times 8, & \text{if } \mathrm{MAD} < \text{threshold} \end{cases} \qquad (1)

where S is the block size. If MAD ≥ threshold, we consider that the correlation is low and thus, in order to correctly recover the Wyner-Ziv frame at the decoder, the 4x4 block size is used. If MAD < threshold, the correlation is high and thus the 8x8 block size is used. In this method, the threshold is the average of the MADs of the previous frames.
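To make the selection rule of Eq. (1) concrete, the following minimal Python sketch computes the MAD against the previous Intra frame and compares it with the running-average threshold. The function name, the use of NumPy, and the handling of the very first frame (whose MAD history is empty) are our assumptions:

```python
import numpy as np

def select_block_size(wz_frame, prev_intra_frame, prev_mads):
    """Choose the input block size for a Wyner-Ziv frame following Eq. (1).

    prev_mads: list of MAD values of previous frames; its average is the
    threshold. A high MAD (low temporal correlation) selects 4x4 blocks.
    """
    mad = np.mean(np.abs(wz_frame.astype(np.float64) -
                         prev_intra_frame.astype(np.float64)))
    # Assumption: for the first WZ frame the history is empty, so the
    # threshold defaults to the current MAD (which selects 4x4 blocks).
    threshold = np.mean(prev_mads) if prev_mads else mad
    block_size = 4 if mad >= threshold else 8
    prev_mads.append(mad)
    return block_size
```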

Transform: After the block size of each Wyner-Ziv frame is selected, each video frame is divided into 4x4 or 8x8 blocks depending on the previous step, and a DCT is applied to each block. The DCT is used to exploit the spatial redundancy in image blocks.
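A minimal sketch of the blockwise transform (the function name and the use of SciPy's orthonormal DCT are our assumptions; the frame dimensions are assumed to be multiples of the block size):

```python
import numpy as np
from scipy.fft import dctn

def blockwise_dct(frame, block_size):
    """Apply a 2-D DCT independently to each non-overlapping block."""
    h, w = frame.shape
    coeffs = np.empty_like(frame, dtype=np.float64)
    for y in range(0, h, block_size):
        for x in range(0, w, block_size):
            block = frame[y:y + block_size, x:x + block_size]
            coeffs[y:y + block_size, x:x + block_size] = dctn(block, norm='ortho')
    return coeffs
```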

Quantization: A scalar quantizer is applied to the obtained DCT coefficients to increase compression efficiency, corresponding to a certain target quality.

Syndrome generation: For a block of quantized DCT coefficients, we compute the luminance average of the current block and transform it into 8 binary bits, namely x_{i,j}, where (i, j) are the coordinates of the current block's center. For the sake of simplicity and to decrease computational time, x_{i,j} is divided into two parts, namely the most significant bits (MSB) and the least significant bits (LSB). The MSB bits will be inferred from the side information at the decoder, since it is believed that there is very high correlation for these bits; these bits do not need to be encoded and sent by the encoder and, thus, they have a heavy influence on the compression rate: the higher the number of MSB bits, the higher the compression rate. On the other hand, the LSB bits are considered less correlated with the block predictor at the decoder, so they are hard to estimate well at the decoder, and these bits are encoded using a coset channel code. The encoding strategy is to divide the codeword space X into sets containing multiple words (the quantization levels/words), equally distanced. These sets are called cosets and are identified by the coset index, or syndrome, which requires fewer bits than X to be encoded. So, if the distance between quantization words within each coset is sufficiently larger than the estimated residual between X and Y, then it is possible to recover the quantization word using Y and the transmitted coset index.
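A small sketch of the bit split described above (the function name and the illustrative choice of 3 LSBs are ours; the paper does not fix the MSB/LSB split point here):

```python
def split_bits(block_average, n_lsb=3):
    """Split the 8-bit block average x_{i,j} into MSB and LSB parts.

    Only the LSB part is coset encoded; the MSBs are inferred from the
    side information at the decoder. n_lsb is an illustrative choice.
    """
    x = int(block_average) & 0xFF          # 8-bit representation
    lsb = x & ((1 << n_lsb) - 1)           # bits to be syndrome coded
    msb = x >> n_lsb                       # bits inferred from the SI
    return msb, lsb
```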

Fig. 3. The decoder of the proposed architecture.

We can briefly explain the coset code through the following simple example. Let X be a 3-bit value to be encoded at the encoder. The space of codewords of X includes 8 codewords: 000, 001, 010, 011, 100, 101, 110, 111. This space is partitioned into four sets, each containing two codewords, namely Coset 1 ([0 0 0] and [1 1 1]), Coset 2 ([0 0 1] and [1 1 0]), Coset 3 ([0 1 0] and [1 0 1]) and Coset 4 ([1 0 0] and [0 1 1]). The encoder for X identifies the set containing the codeword of X and sends the index of that set (which can be described in 2 bits), also called the syndrome, instead of the individual codeword. The decoder, in turn, on reception of the coset index (syndrome), uses Y to disambiguate the correct X from the set by declaring the codeword closest to Y as the answer. Note that the distance between X and Y is at most 1, and the distance between the two codewords in any set is 3; hence, decoding can be done perfectly.
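The following Python fragment implements exactly this worked example (names are ours):

```python
# Cosets from the text: each coset holds two codewords at Hamming distance 3.
COSETS = [
    (0b000, 0b111),  # Coset 1
    (0b001, 0b110),  # Coset 2
    (0b010, 0b101),  # Coset 3
    (0b100, 0b011),  # Coset 4
]

def encode_syndrome(x):
    """Return the 2-bit coset index (syndrome) containing codeword x."""
    for index, coset in enumerate(COSETS):
        if x in coset:
            return index

def decode_with_si(syndrome, y):
    """Pick the codeword in the signalled coset closest to the SI value y."""
    hamming = lambda a, b: bin(a ^ b).count('1')
    return min(COSETS[syndrome], key=lambda c: hamming(c, y))

# If X = 110 and the SI Y differs from X in at most one bit position,
# decoding recovers X exactly:
s = encode_syndrome(0b110)                 # syndrome points to Coset 2
assert decode_with_si(s, 0b100) == 0b110   # Y = 100 differs in one bit
```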

Cyclic Redundancy Code: The Cyclic Redundancy Code (CRC) module has the objective of generating a binary signature strong enough to validate the decoded block, thus selecting the good side information candidate. There may be many side information candidates, and with the purpose of detecting the correctly decoded block, a CRC checksum is sent to the decoder. The CRC is designed to detect accidental changes in data, typically small differences between two codewords caused by channel errors. As all the side information candidates are somehow correlated with the coded block, the decoded candidates are erroneous versions of that block. So, the CRC is an excellent way to detect the side information candidate that is decoded without errors, yielding a successful decoding. There is a wide variety of available CRC codes with different lengths and error detection capabilities. In the literature, it was determined that a 16-bit CRC (CRC-16) has reasonable performance for the detection of successful decoding in a PRISM-like DVC architecture. In this work, the generator polynomial of CRC-16 is given in Eq. (2).
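Since Eq. (2) is not reproduced above, the sketch below uses the common CRC-16-CCITT generator polynomial (x^16 + x^12 + x^5 + 1, i.e., 0x1021) purely as a stand-in; the paper's actual polynomial may differ:

```python
def crc16(data: bytes, poly=0x1021, init=0x0000):
    """Bitwise CRC-16 over, e.g., the quantized coefficients of a block.

    poly=0x1021 is CRC-16-CCITT, used here only as an assumed stand-in
    for the generator polynomial of Eq. (2).
    """
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ poly) if (crc & 0x8000) else (crc << 1)
            crc &= 0xFFFF          # keep the register at 16 bits
    return crc
```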

B. Decoding process

The decoding process is performed in the following steps.

Motion search: The motion search module has the objective of providing a motion compensated version of the current block to the syndrome decoder. In fact, this module has to generate the side information candidates that, jointly with the received syndrome, will lead to a successful block decoding. The decoder searches for the side information in a 16x16 window around the current block and sends this side information to the syndrome decoder.
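A sketch of this candidate generation (all names are ours; integer-pixel displacements only, for simplicity):

```python
def si_candidates(ref_frame, cy, cx, block_size, search=8):
    """Yield candidate SI blocks from a search window in the reference frame.

    (cy, cx) is the top-left corner of the current block. A +/-8 pixel
    range around it gives the 16x16 window mentioned in the text.
    """
    h, w = ref_frame.shape
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = cy + dy, cx + dx
            if 0 <= y <= h - block_size and 0 <= x <= w - block_size:
                yield ref_frame[y:y + block_size, x:x + block_size]
```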

Syndrome decoder: This module has the responsibility of selecting the quantized codewords within the cosets while exploiting the side information sent from the motion search module above. Based on the coset index, the syndrome decoder finds within the coset the codeword that is nearest to the side information. This decoded block is sent to the hash check module for further verification.

Hash check: Since for every candidate predictor we decode the codeword sequence, from the set of sequences labeled by the syndrome, that is nearest to it, a hash signature mechanism is required to infer the codeword sequence intended by the encoder. For each candidate predictor we check: if it matches the transmitted hash, then the decoding is declared to be successful; otherwise, the next candidate predictor is obtained using the motion search module and the whole procedure is repeated.
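The overall decoding control flow can be sketched as follows (a sketch only; all parameters are placeholders for the modules described above, e.g., the candidate generator, coset decoder and CRC of the earlier fragments):

```python
def decode_block(syndrome, crc_sum, candidate_predictors, syndrome_decode, crc16):
    """Try candidate predictors until a decoded block matches the CRC.

    candidate_predictors: iterable of SI blocks from the motion search.
    syndrome_decode:      finds the closest codeword in the signalled coset.
    """
    for candidate in candidate_predictors:
        decoded = syndrome_decode(syndrome, candidate)
        if crc16(decoded) == crc_sum:
            return decoded           # successful decoding
    return None                      # no candidate matched: decoding failure
```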

Reconstruction: This module has the purpose of attributing a DCT value to each quantized coefficient, thus regenerating/reconstructing the source with an approximate version of the encoded DCT coefficients block.

Inverse Transform: Once all the transform coefficients have been dequantized, the zig-zag scan operation carried out at the encoder is inverted to obtain a 2-D block of reconstructed coefficients. The transform coefficients are then inverted using the inverse transform so as to give the reconstructed pixels.

IV. RESULTS AND DISCUSSIONS

In this experiment, the performance of the proposed method (Adaptive Block Size - ABS) is compared to the method with a fixed block size in [10]. The QCIF format video sequences used in the experiment are Akiyo, Container, Foreman and Carphone. Each sequence is tested with 100 frames.

Table I shows the average PSNR and the total number of bits to encode the video sequences. The simulation results show that the average PSNR of the proposed method is higher than the PSNR of both the 8x8 and 4x4 methods in some cases with low motion, such as the Akiyo and Container video sequences. The reason is that the 8x8 method has 64 coset indexes; thus, at the decoder, the rate of successful decoding is lower than with the adaptive method and the 4x4 method.

In Table II, the average number of bits of the proposed method is always lower than that of the 4x4 method and higher than that of the 8x8 method. In the 4x4 method, the number of blocks is the highest and is constant across video sequences, because the number of blocks in each frame is fixed; thus, the number of LSB and MSB bits consumed to encode the blocks is highest in this method. In the 8x8 method, the number of blocks is the lowest and thus the number of encoding bits is lowest. By using the adaptive block size in the proposed method, although the number of encoding bits is not the lowest, the PSNR of the proposed method is higher compared to the other methods.

Figure 4 shows the PSNR of the 30th frame of the Akiyo video sequence. The results show that the approach based on the adaptive block size in the proposed method achieves a higher PSNR value, while the total number of encoding bits is lower than that of the method with block size 4x4.

In the DVC architecture, SI creation is one of the important steps to improve the performance of the codec. To obtain exact SI for successful decoding, the block size selected at the encoder is important because this step defines the number of coset indexes in syndrome coding: the LSB bits of each pixel are decoded from coset indexes at the decoder, so if the number of coset indexes is high, the probability of error in syndrome decoding is high, and vice versa.

TABLE I. AVERAGE PSNR OF VIDEO TEST SEQUENCES

  Block size   Akiyo   Container   Carphone   Foreman
  4x4          38.86   40.94       36.14      37.55
  8x8          38.75   40.81       36.01      37.31
  ABS          38.92   40.96       36.20      37.41

TABLE II. AVERAGE NUMBER OF BITS IN A FRAME

  Block size   Akiyo    Container   Carphone   Foreman
  4x4          101376   101376      101376     101376

In the proposed method, the selection of the block size is adapted to the MAD of the frames in the video sequence. Changing the block size at the encoder helps to adjust the number of coset indexes and thus reduces errors in syndrome decoding at the decoder. The proposed method showed effectiveness in terms of PSNR and total coding bits by using an adaptive block size, compared to methods using a constant block size.

Fig. 4. PSNR of the 30th frame of the Akiyo video sequence.

REFERENCES

[1] D. Slepian and J. Wolf, "Noiseless Coding of Correlated Information Sources," IEEE Transactions on Information Theory, vol. 19, no. 4, pp. 471-480, July 1973.

[2] A. Wyner and J. Ziv, "The Rate-Distortion Function for Source Coding with Side Information at the Decoder," IEEE Transactions on Information Theory, vol. 22, no. 1, pp. 1-10, January 1976.

[3] A. Aaron, R. Zhang, and B. Girod, "Wyner-Ziv coding of motion video," 36th Asilomar Conference on Signals, Systems and Computers, 2002.


[4] J. Ascenso, C. Brites, and F. Pereira, "Improving frame interpolation with spatial motion smoothing for pixel domain distributed video coding," 5th EURASIP Conference on Speech and Image Processing, Multimedia Communications and Services, 2005.

[5] L. Natario, C. Brites, J. Ascenso, and F. Pereira, "Side information extrapolation for low-delay pixel-domain distributed video coding," International Workshop on Very Low Bitrate Video, 2005.

[6] A. Aaron and B. Girod, "Wyner-Ziv video coding with low-encoder complexity," Picture Coding Symposium, 2004.

[7] R. Puri and K. Ramchandran, "PRISM: A new robust video coding architecture based on distributed compression principles," 40th Allerton Conference on Communication, Control and Computing, Allerton, IL, USA, 2002.

[8] A. Aaron, R. Zhang, and B. Girod, "Wyner-Ziv Coding of Motion Video," Asilomar Conference on Signals, Systems, and Computers (ACSSC), Pacific Grove, CA, USA, November 2002.

[9] R. Puri, A. Majumdar, and K. Ramchandran, "PRISM: a video coding paradigm with motion estimation at the decoder," IEEE Transactions on Image Processing, vol. 16, no. 10, pp. 2436-2448, Oct. 2007.

[10] S. Milani and G. Calvagno, "A Distributed Video Coder Based on the H.264/AVC Standard," 15th European Signal Processing Conference, Poznan, Poland, pp. 673-677, 2007.
