Advanced video coding for reducing complexity in H.264 3


To achieve the highest coding efficiency, H.264/AVC uses the rate distortion optimization (RDO) technique to obtain the best coding result in terms of maximizing coding quality and minimizing the resulting data bits. The idea of RDO can be briefly explained as follows: the encoder examines all possible block modes, such as the different directions in intra spatial prediction and the different block sizes and multiple reference frames in inter prediction, and chooses the mode with the least RDO cost. This brute-force approach achieves much better performance, but at the expense of very high computational complexity. Even with state-of-the-art hardware, real-time video coding using H.264/AVC remains prohibitively expensive. Therefore, algorithms that reduce the time complexity of H.264/AVC while maintaining the coded bit rate and reconstructed video quality are indispensable for real-time implementation of H.264/AVC.

Since the early 1990s, video coding technology has evolved continuously, generating international video coding standards such as H.261, H.263, H.263++, MPEG-1, MPEG-2, and MPEG-4 Visual. They have contributed tremendously to the successful commercialization of digital video coding. Like these previous video coding standards, H.264 will continue to provide technical solutions in many targeted application fields such as mobile video communication, digital media production and telemedicine.

1.2 Objectives

Because of the computational complexity of H.264, this thesis aims to develop fast algorithms that improve the encoding speed of H.264 without much loss of visual quality. In detail, the objectives are:

(1) Develop new fast and efficient intra mode coding methods for H.264: these methods should adaptively select the most probable candidates for direction prediction. The approaches are capable of achieving low complexity performance of existing

(2) Explore a new scheme for inter mode coding in H.264: the scheme should effectively reduce the time spent on selecting among different block sizes in inter coding with minimum sacrifice in visual quality.

(3) Explore new interpolation approaches for H.264: the approaches should be lossless and greatly reduce the time spent on the interpolation process in the encoder.

1.3 Thesis Contributions

This thesis proposes algorithms that achieve the stated objectives. They are contributions not only to academia but to industry as well. To be specific, the contributions are:

(1) A fast mode decision algorithm is presented for intra prediction in H.264 video coding. By making use of the edge direction histogram, the number of mode combinations for luminance and chrominance blocks in a macroblock (MB) that take part in the RDO calculation is reduced significantly, from 592 to as low as 132. This greatly reduces the complexity and computation load of the encoder. Experimental results show that the fast algorithm has a negligible loss of PSNR compared to the original scheme.

(2) A fast inter mode decision algorithm is proposed to decide the best mode in inter coding of H.264. It makes use of the spatial homogeneity and the temporal stationarity characteristics of the textures of video objects. Specifically, the homogeneity decision for a block is based on edge information inside the block, and the difference between co-sited MBs is used to decide whether the MB is temporally stationary. Based on the homogeneity and stationarity of the video objects, only a small number of inter modes are used in RDO. The experimental results show that the fast algorithm reduces encoding time by 30% on average, with negligible PSNR loss.

(3) Two fast intra 4x4 mode elimination approaches are put forward for H.264. The lossless approach checks the accumulated cost after each 4x4 block intra mode decision, and terminates if the cost is higher than the minimum cost of inter mode coding. The lossy approach uses some low-cost preprocessing to make a prediction, and terminates if the cost is higher than some fraction of the minimum cost of inter mode coding. Experimental results show that the lossless approach reduces the encoding time without any sacrifice of visual quality. The lossy approach further reduces encoding time with negligible PSNR loss or bit rate increase.

(4) Two adaptive interpolation methods are also presented that significantly reduce the interpolation operations required in H.264 video coding. By making use of a flag matrix data structure and on-demand interpolation, the proposed methods increase encoder speed greatly without any PSNR loss or bit rate increase.
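The lossless early-termination idea in contribution (3) can be illustrated with a minimal sketch. It assumes that per-block intra 4x4 costs are non-negative, so a partial sum that already exceeds the best inter cost can never recover. The function name, its parameters and the cost values are hypothetical and are not taken from the thesis or from any reference encoder.

```python
from typing import Sequence, Tuple


def intra4x4_cost_with_early_exit(
    block_costs: Sequence[float], min_inter_cost: float
) -> Tuple[float, int, bool]:
    """Accumulate the best intra cost of each 4x4 block in a macroblock.

    Returns (accumulated cost, blocks evaluated, terminated early).
    Because every block cost is assumed non-negative, stopping as soon as
    the running total exceeds the best inter cost cannot change the final
    intra/inter decision, which is why this check is lossless.
    """
    total = 0.0
    evaluated = 0
    for cost in block_costs:
        total += cost
        evaluated += 1
        if total > min_inter_cost:
            # Intra 4x4 can no longer beat the best inter mode.
            return total, evaluated, True
    return total, evaluated, False


if __name__ == "__main__":
    # 16 blocks of cost 10 each; the best inter mode costs 35.
    print(intra4x4_cost_with_early_exit([10.0] * 16, 35.0))
```

In this toy case the loop stops after 4 of the 16 blocks, so the remaining 12 intra 4x4 mode decisions are never carried out.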

1.4 Organization of the Thesis

The rest of this thesis is structured as follows. In Chapter 2, a brief introduction to H.264 is given, together with a literature survey. In Chapter 3, a fast intra mode decision method is proposed. A fast inter mode decision approach is given in Chapter 4. Intra 4x4 mode elimination approaches are presented in Chapter 5. Adaptive interpolation methods are described in Chapter 6. In Chapter 7, the contributions of this thesis are summarized and future work is outlined.


CHAPTER 2

H.264 AND LITERATURE SURVEY

In this chapter, an overview of the H.264 standard is presented, and some important aspects of the standard are briefly introduced. They include the architectures of the encoder and the decoder, inter mode decision, intra mode decision and motion estimation. A literature survey is presented in the later part of the chapter.

2.1 H.264

The well-known international standards on video coding such as H.261, H.263, MPEG-1, MPEG-2 and MPEG-4 Visual have been developed over the past one or two decades. A few years ago, the ITU-T Video Coding Experts Group (VCEG) began a long-term effort to develop a new standard for low bit rate video coding and communication applications. This long-term effort resulted in the standardization of H.26L, which demonstrates much higher compression efficiency than existing ITU-T standards such as H.261 and H.263.

In 2001, ISO/MPEG joined ITU-T/VCEG for further development of H.26L. A joint team called the Joint Video Team (JVT) was formed, whose main goal was to develop the draft H.26L model into a final, complete international standard. This new standard is known as advanced video coding (AVC), which is also called MPEG-4 Part 10. The standard was approved as a recommendation by ITU-T and, in the same year, AVC was accepted as an international standard by ISO. The terms H.264 and AVC are used interchangeably in this thesis.

H.264 was proposed under the MPEG requirement for advanced video coding tools. Compared with MPEG-4 Visual, it has a narrower scope and mainly targets more efficient and robust coding and transmission of video frames, rather than segmentation of the different objects inside the frames. Its original aim was to provide functionality similar to that of existing video coding standards such as H.263 and the Simple Profile of MPEG-4 Visual, but with significantly better compression efficiency and more robust and reliable transport over transmission channels. Its target applications include duplex video communication (also known as video conferencing or video telephony), digital television broadcasting, digital video streaming, telemedicine, digital video storage and digital cinema. Support for robust transmission over various network architectures is built in. In addition, the standard is designed to facilitate implementation on a wide range of platforms such as Intel, AMD and Sun Solaris systems. One aspect that differentiates the standard from other existing video coding standards is its attempt to make implementations by different developers interoperate easily and to avoid misinterpretation [4, 5].

The elements common to all video coding standards are present in the current H.264 recommendation. Specifically, macroblocks are 16x16 in size. Luminance is represented with higher resolution than chrominance, with 4:2:0 sub-sampling. Motion compensation and block transforms are followed by scalar quantization and entropy coding. Motion vectors are predicted from the median of the motion vectors of neighboring blocks. Bi-directional B-pictures are supported, which may be motion compensated from both temporally previous and subsequent pictures. A direct mode exists for B-pictures in which both forward and backward motion vectors are derived from the motion vector of a co-located macroblock in a reference picture. In addition, while sharing these common features with other existing standards, H.264 has many advantages that distinguish it from them.

Some of the key advantages of H.264 are:

(1) Up to 50% in bit rate savings compared to MPEG-4 Visual;

(2) Much better visual quality and PSNR value;

(3) Better error resilience technology;

(4) Network adaptation friendliness.

Experimental results [6, 7, 8, 9] have demonstrated that H.264 achieves substantially better video quality than H.261, H.263, MPEG-2, and MPEG-4 Visual. The JVT reference model software is able to achieve up to 50% bit rate saving compared with existing H.263 or MPEG-4 Visual codecs. In other words, H.264 provides significantly better visual quality at the same bit rate. In addition to new coding features such as inter and intra prediction, H.264 utilizes some error resilience techniques to cope with different channel environments. These characteristics make H.264 an ideal codec for applications with very limited channel capacity, limited storage and extremely error-prone channels, such as mobile communication and video telephony.

Because of its high compression ratio, the H.264 codec can be utilized to generate


previous video coding standards such as H.263 and MPEG-4 Visual. Therefore, there is no doubt that H.264 will be a strong competitor in the deployment of next-generation multimedia applications. In addition, one important feature of H.264 is that it is an open standard. This will bring the codec price down, making the technology affordable to everyone. Furthermore, the bit stream format of H.264 is non-proprietary, which is crucial in today’s multimedia application environment.

Figure 2.1 H.264 encoder architecture (blocks: Predictive Coding (Intra / Inter), Integer Transform, Entropy Encode, Inverse Quantization, Inverse Transform)

Figure 2.1 illustrates the architecture of the H.264 encoder. The current frame, denoted by Fn, is one frame of the original video sequence input to the encoder. The frame is divided into macroblocks, each 16 by 16 pixels in size. These macroblocks are encoded one by one in raster scan order until the whole frame has been processed.

Predictive coding is applied to each macroblock, in the sense that it is encoded in either intra mode or inter mode. A prediction block P of the input source macroblock is generated and subtracted from the input macroblock to form the residue, Dn. An integer transform (an approximation to the Discrete Cosine Transform) is performed on Dn and the transform coefficients are quantized. It should be noted that the transform itself does not compress information, whereas quantization compresses data in a lossy manner by discarding irrelevant information. The quantized transform coefficients are re-ordered and entropy coded. Entropy coding compresses information losslessly by exploiting statistical redundancy. The entropy-coded coefficients, together with the side information required for decoding the macroblock (such as the quantizer step size, the prediction direction in intra mode or the block size decision in inter mode, motion vectors and reference frame number), form the compressed bit stream. The bit stream is then passed to a Network Abstraction Layer (NAL) for transmission or storage.
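The residue, transform and quantization steps can be sketched for a single 4x4 block. The matrix `CF` below is the 4x4 forward core transform matrix commonly given for H.264. The uniform quantizer with a single `step` value is a deliberate simplification assumed here for illustration: the standard folds position-dependent scaling into its quantization stage, which this sketch does not reproduce. All function names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]

# 4x4 forward core transform matrix commonly given for H.264.
CF: Matrix = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def residual(src: Matrix, pred: Matrix) -> Matrix:
    """Residue Dn = source block minus prediction block P."""
    return [[src[i][j] - pred[i][j] for j in range(4)] for i in range(4)]


def forward_core_transform(d: Matrix) -> Matrix:
    """Compute CF * D * CF^T using integer arithmetic only."""
    return matmul(matmul(CF, d), transpose(CF))


def quantize_uniform(coeffs: Matrix, step: int) -> Matrix:
    """Simplified uniform quantizer (illustrative, not the standard's)."""
    return [[int(c / step) for c in row] for row in coeffs]


if __name__ == "__main__":
    src = [[13] * 4 for _ in range(4)]
    pred = [[10] * 4 for _ in range(4)]
    w = forward_core_transform(residual(src, pred))
    print(w)                        # all energy lands in w[0][0]
    print(quantize_uniform(w, 10))  # lossy step: small values are discarded
```

For a flat residue, all the energy is concentrated in the single DC coefficient, which is what makes the subsequent quantization and entropy coding effective.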

In addition, there is a reconstruction path inside the encoder, which essentially serves as a decoder. Its purpose is to reconstruct the encoded block for the predictive coding of later macroblocks that reference it. In detail, the quantized coefficients are inversely quantized and inversely transformed, resulting in a decoded prediction residual, Dn'. Dn' is added to the prediction, and the resulting frame is fed into an in-loop deblocking filter to generate the final reconstructed frame.

2.3 H.264 Decoder

Figure 2.2 H.264 decoder architecture (blocks: NAL, Entropy Decode, Reorder, Inverse Quantization, Inverse Transform, Predictive Coding, Filter; signals: F’n-1 (reference), P, uF’n, F’n (reconstructed))

As illustrated in Figure 2.2, the H.264 decoder architecture resembles the reconstruction path in the encoder. The difference lies in the preceding stage: the decoder receives a compressed bit stream from the NAL, and the data elements are entropy decoded and reordered to produce a set of quantized coefficients.

2.4 Predictive Coding

Predictive coding in H.264 consists of two categories of prediction techniques. One is intra coding, which makes predictions using information inside the same frame. The other is inter coding, which predicts using information from other frames. Both approaches attempt to find the best prediction for each input macroblock, thus leading to the best coding gain. The predictions are generally of high complexity. The details of the operation of these two modes are given in the following sections.


Intra 4x4 mode separates each macroblock into sixteen 4x4 blocks. For each 4x4 block, a total of nine prediction patterns are supported by the Intra 4x4 mode. As illustrated in Figure 2.3, they are vertical, horizontal, DC, diagonal down-left, diagonal down-right, vertical-right, horizontal-down, vertical-left and horizontal-up, respectively. In contrast, intra 16x16 mode does not divide the macroblock and uses only four prediction patterns: vertical, horizontal, DC and plane, as shown in Figure 2.4. In addition, the 8x8 chrominance samples of the current macroblock are predicted in a manner similar to the intra 16x16 mode. As illustrated in Figures 2.3 and 2.4, each prediction block is obtained by extrapolating or interpolating the neighboring pixels in a specific pattern. The prediction mode that minimizes the residue error between the original input macroblock and its prediction block is chosen as the final coding mode by the encoder.
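As a rough illustration of how a prediction block is built from neighboring pixels and then scored, the sketch below implements only three of the nine intra 4x4 patterns (vertical, horizontal and DC) and selects among them by the sum of absolute differences (SAD). Using SAD instead of a full RD cost, and the function names, are assumptions made for brevity. `top` stands for the four reconstructed pixels above the block (A-D) and `left` for the four pixels to its left (I-L).

```python
from typing import Dict, List, Sequence, Tuple

Block = List[List[int]]


def predict_vertical(top: Sequence[int]) -> Block:
    """Mode 0: copy the pixels above the block down each column."""
    return [list(top) for _ in range(4)]


def predict_horizontal(left: Sequence[int]) -> Block:
    """Mode 1: copy the pixels left of the block across each row."""
    return [[left[r]] * 4 for r in range(4)]


def predict_dc(top: Sequence[int], left: Sequence[int]) -> Block:
    """Mode 2: rounded mean of the eight neighbors A-D and I-L."""
    dc = (sum(top) + sum(left) + 4) >> 3
    return [[dc] * 4 for _ in range(4)]


def sad(a: Block, b: Block) -> int:
    return sum(abs(a[r][c] - b[r][c]) for r in range(4) for c in range(4))


def best_intra4x4_mode(block: Block, top: Sequence[int],
                       left: Sequence[int]) -> Tuple[str, int]:
    """Return (mode name, SAD) of the candidate with the smallest SAD."""
    candidates: Dict[str, Block] = {
        "vertical": predict_vertical(top),
        "horizontal": predict_horizontal(left),
        "dc": predict_dc(top, left),
    }
    name = min(candidates, key=lambda m: sad(block, candidates[m]))
    return name, sad(block, candidates[name])


if __name__ == "__main__":
    top = [10, 20, 30, 40]
    left = [12, 12, 12, 12]
    block = [list(top) for _ in range(4)]  # vertical stripes
    print(best_intra4x4_mode(block, top, left))
```

A block with vertical stripes is matched exactly by the vertical pattern, so its residue is zero and that mode wins.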

In general, intra coding is necessary for intra-coded frames (I frames), since the blocks inside such a frame have no prediction information other than neighboring blocks. It is also very useful in non-I frames, particularly for coding regions with scene changes and regions with homogeneous characteristics. Otherwise, it provides only moderate compression efficiency because of the relatively large number of bits used to represent the block information. In order to further exploit the redundancy between neighboring frames in a video sequence, a more efficient and sophisticated predictive coding mode, known as inter coding, is used in H.264.

Figure 2.3 Intra 4x4 prediction modes (each 4x4 block is predicted from the neighboring samples A-M; surviving panel labels: 1 (horizontal), 4 (diagonal down-right), 7 (vertical-left), 8 (horizontal-up); the DC panel uses Mean (A-D, I-L))

2.4.2 Inter Coding

By exploiting the temporal block similarities inherent in video sequences, inter coding generally achieves much higher coding efficiency than intra coding. However, this comes at the price of significantly higher complexity. Inter coding employs temporal prediction (known as motion estimation/compensation) from previously reconstructed pictures, in an attempt to exploit the temporal correlation between neighboring frames.

Figure 2.5 Variable partition sizes employed in inter coding

Various partition sizes are supported in inter coding, as illustrated in Figure 2.5. Partitions of size 16x16, 16x8, 8x16 and 8x8 are supported. Moreover, when the P8x8 inter mode is chosen, each of the four 8x8 blocks can be further partitioned into 8x8, 8x4, 4x8 or 4x4 sub-partitions. These different partition sizes are input to the motion estimation process in order to find the best match.


Furthermore, there is also a 16x16 SKIP mode, which can be employed to maximize the coding efficiency for a particular macroblock. This mode is suitable when the present macroblock can be fully predicted by directly copying the corresponding macroblock from a previously reconstructed frame. In that case, no data is encoded except one single bit whose sole purpose is to inform the decoder that the macroblock is encoded as a SKIP macroblock. Therefore, the SKIP mode offers the highest coding efficiency of all the inter and intra modes.

Given the diverse options for partition sizes and modes, each macroblock has to be encoded using a mode carefully chosen from this potentially large number of candidates. Normally, a large partition size is sufficient for encoding stationary and homogeneous regions inside a frame, whilst a small partition size is better suited to regions with more detail.

2.5 Motion Estimation

As in other conventional video coding standards, motion estimation/motion compensation, a block matching technique, plays a crucial role in the H.264 encoding process. In the full search process of motion estimation, an exhaustive matching within some search window is conducted in order to find the best prediction for the current input source block. The search is done with reference to a reconstructed picture, commonly called the reference frame. For each block, the search generally results in a motion vector pointing to the location of the best prediction block in the corresponding reference frame.
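The full search described above can be sketched as an exhaustive scan of integer displacements scored by SAD. This is a minimal illustration under several assumptions: frames are plain 2-D lists of luma values, only integer-pixel positions are tested, candidates that fall outside the reference frame are skipped, and a motion vector (dx, dy) means the prediction is read from the reference frame at the block position plus (dx, dy). The function names are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]


def sad_at(cur: Frame, ref: Frame, bx: int, by: int,
           dx: int, dy: int, size: int) -> int:
    """SAD between the current block and the displaced reference block."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(size) for x in range(size)
    )


def full_search(cur: Frame, ref: Frame, bx: int, by: int,
                size: int, search_range: int) -> Tuple[Tuple[int, int], int]:
    """Test every displacement in the window; return (best mv, best SAD)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + size > width or y0 + size > height:
                continue  # candidate block lies outside the reference frame
            cost = sad_at(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


if __name__ == "__main__":
    # Synthetic frames: the current frame is the reference shifted by (2, 1).
    ref = [[7 * x + 13 * y for x in range(16)] for y in range(16)]
    cur = [[7 * (x + 2) + 13 * (y + 1) for x in range(16)] for y in range(16)]
    print(full_search(cur, ref, bx=4, by=4, size=4, search_range=3))
```

With a search range of R the window holds (2R + 1)^2 candidates per block, which is the cost that fast motion estimation methods try to avoid.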


Motion estimation using full search is highly computationally intensive, and thus a lot of research on fast motion estimation (FME) methods has been conducted in order to reduce the number of search points. In addition to the motion information, the block sizes, in terms of the specific inter modes, must also be encoded in the bit stream. The choice of block sizes has a significant impact on the bit rate of the encoded bit stream. Generally, smaller partitions offer much better matches to the input macroblock than larger ones. However, the motion data for these additional blocks adds bits to the output bit stream. Therefore, a trade-off has to be made between the motion information and the residue in order to achieve optimum coding efficiency.

At the motion estimation stage, the reference picture is searched for candidate motion vectors, and the motion vector that results in the best prediction is chosen. The motion estimation implemented in the H.264 test model chooses the motion vector that minimizes the following cost function:

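\[
J(\mathbf{m}, \lambda_{motion}) = SAD(s, c(\mathbf{m})) + \lambda_{motion} \cdot R(\mathbf{m} - \mathbf{p})
\]

\[
SAD(s, c(\mathbf{m})) = \sum_{x=1,\, y=1}^{B,\, B} \left| s[x, y] - c[x - m_x,\; y - m_y] \right|
\]

where m = (m_x, m_y) is the motion vector, p is the motion vector prediction, \lambda_{motion} is the Lagrangian multiplier, R(m - p) is the number of bits needed to code the motion vector difference and B is the block dimension, with s being the original video signal and c being the coded video signal.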
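A cost of this kind is commonly described as a SAD term between s and c plus a rate term weighted by a Lagrangian multiplier. The sketch below assumes that the rate is the number of bits needed to code the difference between the candidate motion vector and its prediction with signed Exp-Golomb codes, the code family H.264 uses for many syntax elements. The multiplier value, the vector units and the function names are illustrative assumptions, not values taken from the test model.

```python
from typing import Tuple


def se_golomb_bits(value: int) -> int:
    """Length in bits of the signed Exp-Golomb codeword for `value`."""
    # Signed-to-unsigned mapping: 0 -> 0, 1 -> 1, -1 -> 2, 2 -> 3, -2 -> 4 ...
    code_num = 2 * abs(value) - (1 if value > 0 else 0)
    prefix_zeros = (code_num + 1).bit_length() - 1
    return 2 * prefix_zeros + 1


def motion_cost(sad: int, mv: Tuple[int, int],
                pred_mv: Tuple[int, int], lam: float) -> float:
    """Illustrative Lagrangian motion cost: SAD + lam * bits(mv - pred_mv)."""
    rate = (se_golomb_bits(mv[0] - pred_mv[0])
            + se_golomb_bits(mv[1] - pred_mv[1]))
    return sad + lam * rate


if __name__ == "__main__":
    pred = (1, 0)
    near = motion_cost(100, (2, 0), pred, lam=4.0)  # close to the predictor
    far = motion_cost(95, (9, 0), pred, lam=4.0)    # lower SAD, costly vector
    print(near, far)
```

In this example the candidate with the lower SAD loses because its motion vector is expensive to code, which is the trade-off between motion information and residue described earlier.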


One outstanding characteristic of H.264 is that it allows multi-frame motion estimation/compensation. Unlike previous coding standards, H.264 allows more than one previously coded frame to be used as a reference for motion-compensated prediction. Figure 2.6 illustrates a scenario in which the current frame can refer to five previously decoded frames. The multi-frame ME/MC scheme is one of the major contributors to the high compression efficiency of H.264, since it captures the temporally repetitive motions prevalent in natural video sequences, such as birds flapping their wings up and down or people walking to and fro.

Figure 2.6 Multiple frame motion estimation/motion compensation (the current frame and five reference frames)

2.6 Mode Decision

Besides the features introduced above, in order to achieve the highest coding efficiency, H.264/AVC uses the rate distortion optimization (RDO) technique to obtain the best coding result in terms of maximizing coding quality and minimizing the resulting data bits. The idea of RDO can be explained as follows: the encoder tries all possible block modes, such as the different block sizes, intra spatial prediction, inter-frame motion estimation and, in the case of inter modes, multiple reference frames, and chooses the mode with the least rate distortion (RD) cost. The RD cost of a given mode is calculated from the distortion, the rate and a Lagrangian multiplier λ. λ is not complex to compute, being merely a function of the quantization parameter (QP). The rate, however, is obtained only after a sequence of operations such as motion estimation/motion compensation (in the case of inter coding), integer transform, quantization, inverse quantization, inverse integer transform and entropy coding. During this procedure, the distortion can be obtained after reconstructing the macroblock.
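The selection step can be sketched as minimizing J = D + λ·R over the candidate modes. The expression λ = 0.85 · 2^((QP − 12) / 3) used below is the form commonly quoted for mode decision in the H.264 reference software. It is included here as an assumption, and the distortion and rate figures in the example are invented purely for illustration.

```python
from typing import List, Tuple

# (mode name, distortion D, rate R in bits); the values are made up.
Candidate = Tuple[str, float, float]


def lagrange_multiplier(qp: int) -> float:
    """Commonly quoted mode-decision multiplier as a function of QP."""
    return 0.85 * 2 ** ((qp - 12) / 3.0)


def rd_cost(distortion: float, rate: float, qp: int) -> float:
    """RD cost J = D + lambda(QP) * R."""
    return distortion + lagrange_multiplier(qp) * rate


def choose_mode(candidates: List[Candidate], qp: int) -> Tuple[str, float]:
    """Return (mode name, RD cost) of the cheapest candidate."""
    best = min(candidates, key=lambda c: rd_cost(c[1], c[2], qp))
    return best[0], rd_cost(best[1], best[2], qp)


if __name__ == "__main__":
    modes = [("SKIP", 900.0, 1.0), ("16x16", 400.0, 40.0), ("8x8", 300.0, 120.0)]
    print(choose_mode(modes, qp=12))  # small lambda: low distortion wins
    print(choose_mode(modes, qp=30))  # large lambda: cheap modes win
```

In a real encoder each candidate's D and R come from actually coding and reconstructing the macroblock in that mode, which is the source of the complexity described above.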

The calculation of rate and distortion allows RDO to achieve much better coding efficiency, but at the expense of very high computational complexity. This makes real-time video coding using H.264/AVC a difficult problem. Therefore, algorithms that reduce the computational complexity of H.264/AVC while maintaining the coded bit rate and reconstructed video quality are indispensable for real-time implementation of H.264/AVC.

2.7 Literature Survey

In the literature, there have been quite a number of approaches to improving the encoding speed of H.264. They can be classified into the following categories: fast full pixel motion estimation, fast fractional pixel motion estimation, adaptive adjustment of the search window size used in motion estimation, fast decision of the reference frame, reduction of SAD calculation, and detection of all-zero integer-transformed blocks.

In addition, there are also some proposed fast mode decision algorithms. These

The fast motion estimation (FME) method proposed in [10] sets a search pattern in the initial step. It also uses the motion vectors of previous block shapes as the prediction for the following block shapes within a macroblock. In the final search stage, seven adaptive preferential search ranges are used for the seven block shapes. These algorithms achieve significant time savings with negligible loss of coding efficiency.

In the proposal [11], a hierarchical FME algorithm consisting of four main steps is presented. Firstly, the prediction of the initial search point considers the motion vector relationships in the spatial domain and between different prediction modes. Secondly, an unsymmetrical-cross search is performed to give a good starting search point, after which an uneven multi-hexagon-grid search is used to keep the search from falling into a local minimum. In the last step, an extended hexagon-based search is performed for refinement. For fractional pixel ME, a Center Biased Fractional Pixel Search (CBFPS) strategy with a diamond search pattern is also proposed.

In the H.264 proposal document [12], an approach called Enhanced Predictive Zonal Search (EPZS) is proposed for motion estimation. It comprises three features: initial predictor selection, adaptive early termination and final prediction refinement. In the first feature, only a small set of highly reliable predictors is examined, such as the median predictor, the temporal predictor and spatial predictors. In the second, an early termination process is adopted, since the distortions of adjacent blocks tend to be highly correlated. The third feature refines motion vectors by using several types of search patterns.


Reducing the number of reference frames has also been proposed. In the paper [13], the authors present a method to reduce the computational cost of multiple frame motion estimation without significant quality degradation. Instead of checking all the blocks in each reference frame, the search is done only on a center-biased path, so that a single frame can be selected for the final search.

A search window size decision algorithm for the motion estimation process is proposed in [14], based on a motion detection algorithm. The motion detection algorithm is based on coloring over sub-sampled binary images generated by block-wise thresholding of the image difference.

Recently, Ates et al. [15] proposed reusing a limited set of sum of absolute difference (SAD) values to approximate the SAD values of different block sizes. An inevitable consequence of this approximation is a quality drop in terms of PSNR and/or a bit rate increase. For instance, the bit rate increase can be as high as 3.9% for the “Mobile” sequence.

In the paper [16], an early detection algorithm for all-zero integer-transformed blocks in H.264 is proposed. The idea can be briefly explained as follows: if the minimum SAD obtained in the motion estimation stage falls below a bound given by some derived conditions, there is no need to perform the integer transform and quantization. In this way, computational savings can be obtained. Similarly, [17, 18] propose an adaptive mode decision algorithm that uses the properties of all-zero coefficient blocks.


Some skip mode decision algorithms have been proposed. In [19], later improved in [20], the authors propose two techniques for H.264 mode decision: making an early decision on the SKIP mode in P and B frames, and performing intra coding selectively. They propose to check the SKIP conditions first. If the macroblock satisfies the conditions used in P and B slices, SKIP is selected as its coding mode. If the spatial correlation of the current block is higher than its temporal correlation, the block has a higher probability of being an intra block. The average boundary error between the pixels at the boundary of the current block and its adjacent encoded blocks under the best inter mode is proposed as an indicator of the degree of spatial correlation. The approach is applied to the main profile of H.264. Later, similar approaches were extended to the high profile of H.264 [21, 22].

There are also some mode decision algorithms proposed by other researchers. In [23], a mode decision method is proposed as a preprocessing step before motion estimation. The algorithm is based on whether the error surface versus block size is monotonic, that is, whether the current macroblock shows a consistent tendency toward smaller or larger block sizes. If the error surface is not monotonic, all other modes need to be tested. If the error surface is monotonic, only modes between the best two modes are tested. Because the relationship to the rate distortion cost is coarse, the results are not very promising. The bit rate increase can be as high as 2.83% for the Foreman sequence.

In [24], the proposed approaches are based on both the cost of the motion vector and the information of the previous frame. 1) An SNR-based approach is developed to avoid using all the modes to encode each frame. If the average SNR of the previous frame does not fall below the threshold value, the same set of modes is applied to the current frame. Otherwise, the modes are deemed unsuitable for encoding and a new set is decided for the current frame. 2) In order to decide the best mode for the current block, a selection criterion based on an adaptive threshold cost is applied. The motion vector cost of the same block in the previous frame is used as the threshold cost for the current block. If the cost falls within a range of the previous cost, the previous set of modes is used. Otherwise, all the possible modes are checked to re-calculate the modes. Nevertheless, due to the inaccurate prediction inherent in this method, the algorithms result in an unacceptable bit rate increase: bit stream sizes are 5% to 18% larger than those of the reference implementation in all tested sequences.

The paper [25] proposes to categorize the modes into different subsets that depend on the characteristics of the video and the quantization parameter. In particular, the motion compensability of every frame modulates the intra modes, whereas the texture difficulty addresses the inter modes. The approach has the following features: 1) With regard to the selection of the intra mode, the algorithm evaluates the uncompensability of each frame. 2) The inter modes are evaluated by motion estimation and depend on the quantizer level and the difficulty of the texture of each frame. Since it uses only variance, it cannot differentiate between the different directions within one intra mode. For instance, it can only tell when to use the 4x4 mode, not which direction to use inside a 4x4 intra block. Thus, the bit rate increase can be as high as 10% and the PSNR drop can be as high as 0.30 dB.


A recent JVT proposal [26], which originated from [27], provides an approach that combines fast intra mode decision and fast inter mode decision. The approach skips the intra mode decision if the selected inter mode is good enough according to some criteria.

Dai et al. [28] propose a fast inter prediction mode decision for H.264. Firstly, it down-samples the original image to a smaller image of half the original resolution. Then it pre-encodes the small image with the prediction mode selection used in H.264 encoding and obtains the prediction mode of each 8x8 block. The problem with this approach is that down-sampling is a time-consuming process. Thus, it will be difficult to apply the approach to real-time applications.

The paper [29] presents an inter mode decision method that depends on the absolute differences between consecutive frames. The mode classes are decided by comparing the SAD with a threshold. However, instead of the SAD itself, the cost function, which is a weighted sum of both the SAD and the motion vectors, is used as the measurement.

In [30], information from previously coded MBs, such as distortion, mode and residue, is used to determine which modes can be eliminated with little loss in coding efficiency. Its emphasis is more on transcoding from MPEG-2 bit streams to H.264 bit streams.

The paper [31] classifies MBs into two classes: high probability MBs (SKIP, INTER 16x16, INTER 8x8) and low probability MBs (others). The probability of the current MB is predicted from its neighbors (left, up-left, up) and the co-located MB in the previous frame. If the predicted probability is lower than one minus the sum probability of SKIP, INTER 16x16 and INTER 8x8, the MB is determined to be a low probability MB. If the predicted probability is higher than the minimum of the sum probability of SKIP, INTER 16x16 and INTER 8x8, the MB is determined to be a high probability MB. Otherwise, further classification is done. However, the approach does not consider the dynamics of video sequences, and thus it is not suitable for scenarios where motion changes between slow and fast within one video sequence.

[32] proposes an inter mode decision scheme for P slices. It initially exploits neighborhood information jointly with a set of SKIP mode conditions for enhanced skip mode decision. It subsequently performs inter mode decision for the remaining macroblocks by using a gentle set of smoothness constraints.

In [33], the proposed method consists of two steps: decision I and decision II. Decision I makes an early SKIP mode decision among the inter mode classes. Decision II decides whether to try intra mode encoding or not. If the Rdcost (rate distortion cost) of the best inter mode is less than a threshold, the routine of trying the intra modes can be omitted.
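The two-step structure described for [33] can be sketched as a small control-flow example. Everything concrete here is an assumption for illustration: the SKIP test is reduced to a boolean input, the inter RD costs are supplied as a dictionary, the intra evaluation is a callback so that the saved work is visible, and the threshold value is arbitrary. It is not the authors' implementation.

```python
from typing import Callable, Dict


def decide_macroblock_mode(
    skip_conditions_met: bool,
    inter_rd_costs: Dict[str, float],
    evaluate_intra_rd_cost: Callable[[], float],
    intra_skip_threshold: float,
) -> str:
    """Two-step mode decision sketch (hypothetical helper)."""
    # Decision I: early SKIP, before any other mode is evaluated.
    if skip_conditions_met:
        return "SKIP"

    best_inter = min(inter_rd_costs, key=inter_rd_costs.get)
    best_inter_cost = inter_rd_costs[best_inter]

    # Decision II: if the best inter mode is already cheap enough,
    # the expensive intra evaluation is omitted entirely.
    if best_inter_cost < intra_skip_threshold:
        return best_inter

    intra_cost = evaluate_intra_rd_cost()
    return best_inter if best_inter_cost <= intra_cost else "INTRA"


if __name__ == "__main__":
    def intra_cost() -> float:
        print("intra modes evaluated")
        return 500.0

    print(decide_macroblock_mode(False, {"16x16": 250.0, "8x8": 400.0},
                                 intra_cost, 300.0))
    print(decide_macroblock_mode(False, {"16x16": 800.0, "8x8": 700.0},
                                 intra_cost, 300.0))
```

Only the second call reaches the intra evaluation, which shows where the time saving of such a scheme comes from. The choice of threshold governs the trade-off between speed and the risk of missing a better intra mode.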

The authors of [34] propose a pruned mode decision method consisting of three steps. Decision I checks whether the best mode is SKIP. Decision II selects one class between Class16={SKIP, 16x16, 16x8, 8x16} and Class8={P8x8}. Decision III decides


Paper [35] proposes a fast multi-block selection scheme focusing on the 16x16, 16x8, 8x16 and 8x8 modes. It is a bottom-up approach in the sense that motion estimation is applied first to the 8x8 block type, then to mode 16x16, and finally to mode 16x8 and mode 8x16. The determination is made based on the motion vectors. However, the approach is compared with the full search method of H.264. From the algorithmic point of view, the fast block selection method is not compatible with other FME methods.

Cheng and Chang [36] present a fast three-step algorithm for 4x4 intra prediction in H.264. Motivated by the strong correlation of RD cost among different prediction directions, the algorithm systematically explores the directions adjacent to the one with minimum cost and skips other unlikely directions. Instead of the 9 modes required by the full search method, 6 modes are examined to determine the prediction mode. Because of the inaccurate information it uses, the time saving is not very high.

Choi et al. [37] propose a fast mode decision scheme in which an early decision is possible for the inter macroblock mode and the routine of computing RD cost is omitted for intra mode whenever possible. In order to decide the inter macroblock mode early, the inter modes are grouped into two classes: Class16 and Class8. If Rdcost16 (rate distortion cost of the 16x16 block) is less than Rdcost8 (rate distortion cost of the 8x8 block), the mode has a very high probability of belonging to Class16. With regard to intra coding, the routine of computing the Rdcost (rate distortion cost) for intra mode is omitted if the minimum Rdcost of an inter mode is below a threshold. Due to the rough relationship among Rdcosts under high complexity mode, the results show a relatively high bit rate increase.

The method proposed in [38] only needs to analyze a subset of the seven modes by using spatiotemporal predictions from neighboring blocks. The coding modes of five neighboring blocks are used for prediction: the left, upper, upper left and upper right blocks of the current block, and the block at the same location in the previous frame. It further analyzes the reliability of each predicted mode of each inter-block before using the predicted mode for encoding. The MV variance within a MB and the magnitude of the MV difference are used to evaluate the reliability of the neighboring prediction information. The results are achieved by jointly using low level programming techniques such as multi-media extension (MMX).

Kim and Altunbasak propose an algorithm [39] for fast coding mode selection in H.264 by reducing the number of candidate modes. An RD optimal MB mode can be selected with high probability by examining only some of the most probable candidate modes. The problem with the approach is that the most probable modes are fixed and do not adapt to the input sequences.

Garg et al. [40] propose an approach which finds the optimal intra prediction mode for a 4x4 luminance block by investigating six possible modes in the worst case and two modes in the best case. The algorithm is based on the statistical behavior of the best-selected modes over a variety of sequences. The criterion based on RD cost is simple, but as a side effect the improvement it achieves is not very significant.

Han and Lee [41] propose a block matching order for fast mode decision. The algorithm skips variable block size motion estimation/compensation and spatial-predictive coding in the H.264 video coding standard. The RD cost is compared with its mean value, which results in relatively high inaccuracy.

[42] proposes an algorithm adopting a multi-stage sequential mode decision process that uses joint spatial and transform domain features to filter out unlikely candidate modes. Based on the multi-stage mode decision concept, the algorithm computes low cost features and checks, at each stage, whether the decision process should proceed to the next step or can be terminated early with the most probable mode.

The authors of [43] propose a two-step fast intra prediction mode selection algorithm. In the first step, a coarse-level decision is made to split all possible candidate modes into two groups: the group to be examined further and the group to be ignored. In the second step, the proposed algorithm focuses on the group of interest, and considers an RD model for final decision-making.

A fast H.264 intra-prediction mode selection scheme is proposed in [44]. The proposed method uses spatial and transform domain features of the target block jointly to filter out the majority of candidate modes. This is justified by examining the posterior error probability and the average rate-distortion loss. For the final mode selection, either the feature-based or the RDO-based method is applied to 2-3 candidate modes. The scheme consists of the following steps: 1) Feature selection: in the spatial domain, the SAD between the true and the predicted pixel values is chosen as the feature; in the transform domain, the SATD is chosen. 2) Rank-ordered joint features. 3) Final mode selection.

In [45], a feature-based fast intra/inter mode decision method is proposed to reduce the encoder complexity of the H.264 video coding standard. The main idea is to decide the mode using the expected risk of choosing the wrong mode in a multidimensional feature space. The proposed algorithm calculates three features and maps them into one of three regions, namely the risk-free, risk-tolerable, and risk-intolerable regions. Depending on the mapping region, algorithms of different complexities can be applied for the final mode decision.

2.8 Summary

An overview of the H.264 standard has been given in this chapter and some of the important parts are emphasized. Specifically, these include the architectures of both encoder and decoder, the inter and intra mode decision schemes, and motion estimation. The relevant research efforts in the literature on reducing encoder complexity have been summarized and evaluated. Different from others, this thesis provides novel directions and approaches for reducing the H.264 encoding complexity, and they successfully achieve the set targets. The approaches will be described and analyzed in detail in the subsequent chapters.


CHAPTER 3

FAST INTRA MODE DECISION FOR H.264

In this chapter, a fast mode decision algorithm is presented for H.264 intra prediction based on local edge information, to reduce the amount of calculation in intra prediction. This method is based on the observation that the pixels along the direction of a local edge normally have similar values (this is true for both luminance and chrominance components). Therefore, a good prediction can be achieved if the pixels are predicted using those neighboring pixels lying in the same direction as the edge.

To this end, an edge map that represents the local edge orientation and strength is created, and a local edge direction histogram is then established for each sub-block. Based on the distribution of the edge direction histogram and the concept of majority voting, only a small number of prediction modes are chosen for RDO calculation during intra prediction. Experimental results show that the fast mode decision algorithms increase the speed of intra coding significantly.

The rest of the chapter is organized as follows. Section 3.1 gives an overview of intra coding in H.264. Section 3.2 will present in detail the fast intra prediction algorithm based on the edge direction histogram. Experimental results will be presented in Section 3.3 and a summary in Section 3.4.


3.1 Overview of Intra Coding in H.264

Intra coding refers to the case where only spatial redundancies within a video picture [1] are exploited. The resulting picture is referred to as an I-picture. Traditionally, I-pictures are encoded by directly applying the transform to all macroblocks in the picture, which generates a much larger number of data bits compared to that of inter coding. In order to increase the efficiency of intra coding, the spatial correlation between adjacent blocks/macroblocks in a given picture is exploited in H.264, i.e., the block/macroblock of interest can be predicted from the surrounding blocks/macroblocks according to their directional information. The difference between the actual block/macroblock and its prediction is then coded. With these advanced prediction modes, the performance of intra-frame compression in H.264 is similar to that of the recent still image compression standard, JPEG-2000. H.263 and MPEG-4 Visual also provide intra prediction, but only in the frequency domain at the macroblock level.

Intra coded macroblocks may use either 16×16 or 4×4 spatial prediction modes for luminance components (luma). Four sub-modes are available with 16×16 prediction. A 16×16 macroblock can be predicted from the previously decoded adjacent pixels that are available due to the raster order (from the top-left with left-to-right swaths) decoding of macroblocks: vertical prediction from pixels above, horizontal prediction from pixels to the left, DC prediction from the mean of these neighboring pixels, and plane prediction by spatial interpolation between these two sets of pixels. Nine sub-modes are available with 4×4 prediction, as shown in Figure 3.1.


Similarly, a 4×4 sub-block can be predicted from the previously decoded adjacent pixels that are available due to the raster order decoding of each 8×8 block within a macroblock, and the nested raster order decoding of each 4×4 sub-block within each 8×8 block. Due to this decoding order, not all of the 4×4 prediction modes have the decoded pixels available in their desired prediction direction. In this case, the closest available decoded pixel is used. For the chrominance (chroma) components, 4 prediction modes, similar to those of 16×16 luma prediction, are applied to the two 8×8 chroma blocks (U and V). Note that the resulting prediction mode for the U and V components should be the same.

Figure 3.1 illustrates the intra prediction for a 4×4 luma block. Note that in this figure, a to p are the pixels to be predicted, and A to I are the neighboring pixels that are available at the time of prediction. If we choose the prediction mode to be 0, then the pixels a, e, i, and m are predicted based on the neighboring pixel A; pixels b, f, j and n are predicted based on pixel B, and so on. Besides the 8 directional prediction modes shown in the figure, there is a 9th mode, i.e., the DC prediction mode, or Mode 2 in H.264.
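As a minimal sketch of the non-diagonal cases described above, the following Python fragment builds the 4×4 prediction for Mode 0 (vertical), Mode 1 (horizontal) and Mode 2 (DC). The function name and the argument layout are illustrative assumptions, all eight neighbors are assumed to be available, and the six diagonal modes are intentionally left out.

```python
def predict_4x4(mode, above, left):
    """Return a 4x4 predicted block (list of rows) for modes 0, 1 and 2.

    above: the 4 reconstructed pixels directly above the block (A..D).
    left:  the 4 reconstructed pixels directly to the left of the block.
    Both are assumed to be available (hypothetical simplification).
    """
    if mode == 0:
        # Vertical: each column copies the pixel above it (a, e, i, m <- A).
        return [list(above) for _ in range(4)]
    if mode == 1:
        # Horizontal: each row copies the pixel to its left.
        return [[left[r]] * 4 for r in range(4)]
    if mode == 2:
        # DC: rounded mean of the eight neighboring pixels.
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0, 1 and 2 are sketched here")


vertical = predict_4x4(0, above=[10, 20, 30, 40], left=[5, 5, 5, 5])
dc_block = predict_4x4(2, above=[10, 20, 30, 40], left=[5, 5, 5, 5])
```

In the vertical case the first column of `vertical` holds the value of pixel A in all four rows, which matches the statement that a, e, i and m are predicted from A.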

H.264 video coding is based on the concept of rate distortion optimization, which means that the encoder has to encode the intra block using all the mode combinations and choose the one that gives the best RDO performance. According to the structure of intra prediction in H.264, the number of mode combinations for luma and chroma blocks in an MB is M8 × (M4 × 16 + M16), where M8, M4 and M16 represent the number of modes for 8×8 chroma blocks, 4×4 luma blocks and 16×16 luma blocks respectively. This means that, for a macroblock, the encoder has to perform 4 × (9 × 16 + 4) = 592 different RDO calculations before the best RDO mode is determined. As a result, the complexity of the encoder is extremely high.
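The count above can be checked with a few lines of Python. The function and parameter names are illustrative only; the defaults are the mode counts stated in the text.

```python
def rdo_combinations(m8=4, m4=9, m16=4, blocks_4x4=16):
    """Number of RDO calculations per macroblock: M8 x (M4 x 16 + M16)."""
    return m8 * (m4 * blocks_4x4 + m16)


full_search = rdo_combinations()
```

With the default mode counts, `full_search` evaluates to 592, the figure quoted above.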

Figure 3.1 An example of intra prediction

3.2 Determining the Primary Edge Direction in the Image Block

It is observed that the pixels along the direction of a local edge normally have similar values (this is true for both luma and chroma components). Therefore, a good prediction can be achieved if we predict the pixels using those neighboring pixels that lie in the same direction as the edge. Figure 3.2 shows a few 4×4 edge patterns and their preferred intra prediction directions.

There are a number of ways to get the local edge directional information, such as edge direction histogram, directional fields [46] etc. The algorithm described in this chapter is based on edge detection due to its simplicity in terms of computational complexity. The rest of this section will explain in detail the fast intra prediction algorithm by using an edge direction histogram.


Figure 3.2 Examples of 4×4 edge patterns and their preferred intra prediction directions

3.2.1 Edge Map

In order to obtain the edge information in the neighborhood of the intra block to be predicted, the Sobel edge operator [47] is first applied to the video picture to generate the edge map. Each pixel in the video picture will then be associated with an element in the edge map, which is the edge vector containing its edge direction and amplitude.

The Sobel operator has two convolution kernels. One responds to the degree of difference in the vertical direction and the other in the horizontal direction. For a pixel $p_{i,j}$ in a luminance (or chrominance) picture, the corresponding edge vector, $\vec{D}_{i,j} = \{dx_{i,j}, dy_{i,j}\}$, can be defined as,

\[
\begin{aligned}
dx_{i,j} &= p_{i-1,j+1} + 2 \times p_{i,j+1} + p_{i+1,j+1} - p_{i-1,j-1} - 2 \times p_{i,j-1} - p_{i+1,j-1} \\
dy_{i,j} &= p_{i+1,j-1} + 2 \times p_{i+1,j} + p_{i+1,j+1} - p_{i-1,j-1} - 2 \times p_{i-1,j} - p_{i-1,j+1}
\end{aligned}
\tag{3.1}
\]


where $dx_{i,j}$ and $dy_{i,j}$ represent the degree of difference in the vertical and horizontal directions respectively. Therefore, the amplitude of the edge vector can be decided by,

\[
Amp(\vec{D}_{i,j}) = |dx_{i,j}| + |dy_{i,j}| \tag{3.2}
\]

In fact the amplitude could be obtained more accurately using the square root of the sum of the squares of $dx_{i,j}$ and $dy_{i,j}$. However, Equation (3.2) is computationally much more attractive. The direction of the edge (in degrees) is decided by the function,

\[
Ang(\vec{D}_{i,j}) = \frac{180^\circ}{\pi} \times \arctan\left(\frac{dy_{i,j}}{dx_{i,j}}\right), \quad |Ang(\vec{D}_{i,j})| < 90^\circ \tag{3.3}
\]

In the actual implementation of the algorithm, Equation (3.3) is not necessary, as in H.264 there are only a limited number of directions along which intra prediction can be applied. In fact, a simple thresholding technique will be used to build up the edge direction histogram instead.
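The edge vector and its amplitude can be sketched in Python as follows. This is a minimal illustration that assumes the standard 3×3 Sobel kernels, with the picture stored as a list of rows so that `p[j][i]` is the pixel in column i and row j; this indexing convention and the function names are assumptions made for the example, and border pixels are not handled.

```python
def edge_vector(p, i, j):
    """Sobel edge vector (dx, dy) at column i, row j of picture p.

    dx is the difference between the row below and the row above
    (vertical difference); dy is the difference between the column to
    the right and the column to the left (horizontal difference).
    """
    dx = (p[j + 1][i - 1] + 2 * p[j + 1][i] + p[j + 1][i + 1]
          - p[j - 1][i - 1] - 2 * p[j - 1][i] - p[j - 1][i + 1])
    dy = (p[j - 1][i + 1] + 2 * p[j][i + 1] + p[j + 1][i + 1]
          - p[j - 1][i - 1] - 2 * p[j][i - 1] - p[j + 1][i - 1])
    return dx, dy


def amplitude(dx, dy):
    """Edge amplitude as the sum of absolute differences, |dx| + |dy|."""
    return abs(dx) + abs(dy)


# A horizontal edge: two dark rows above one bright row.
horizontal_edge = [[0, 0, 0],
                   [0, 0, 0],
                   [100, 100, 100]]
dx, dy = edge_vector(horizontal_edge, 1, 1)
```

For the horizontal edge above, only the vertical difference responds (dx = 400, dy = 0), so the ratio dy/dx is zero, which corresponds to the horizontal direction.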

Figure 3.3 An example of the edge direction histogram

3.2.2 Edge Direction Histogram

In order to decide whether the image block contains an edge, and how strong this edge is, an edge direction histogram is calculated from all the pixels in the block by summing up the amplitudes of those pixels with similar directions in the block.

3.2.2.1 4×4 luma block edge direction histogram

In the case of a 4×4 luma block, there are 8 directional prediction modes, as shown in Figure 3.1, plus a DC prediction mode. The border between any two adjacent directional prediction modes is the bisector of the two corresponding directions. For example, the border between mode 1 (0°) and mode 8 (26.6°) is the direction at 13.3°, because for Mode 8 (Horizontal-Up) prediction is done at an angle of approximately 26.6° above the horizontal direction. It is important to note that mode 3 and mode 8 are adjacent due to the circular symmetry of the prediction modes. The mode of each pixel is determined by its edge direction $Ang(\vec{D}_{i,j})$.
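The bisector rule can be illustrated with a short Python sketch. The mode angles below are assumptions: only 0° for mode 1 and 26.6° for mode 8 are given in the text, and the remaining values are taken from the usual H.264 4×4 prediction geometry (arctan(1/2), 45°, arctan(2) and 90°, measured above the horizontal and treated as directions modulo 180°). The names `RING` and `mode_borders` are illustrative.

```python
import math

# (mode, nominal angle in degrees) in circular order of direction.
# Angles other than those of modes 1 and 8 are assumed, not from the text.
RING = [
    (1, 0.0),
    (8, math.degrees(math.atan(0.5))),          # about 26.6
    (3, 45.0),
    (7, math.degrees(math.atan(2.0))),          # about 63.4
    (0, 90.0),
    (5, 180.0 - math.degrees(math.atan(2.0))),  # about 116.6
    (4, 135.0),
    (6, 180.0 - math.degrees(math.atan(0.5))),  # about 153.4
]


def mode_borders(ring=RING):
    """Bisector angle between each pair of adjacent modes in the ring."""
    borders = {}
    n = len(ring)
    for k in range(n):
        mode_a, ang_a = ring[k]
        mode_b, ang_b = ring[(k + 1) % n]
        if ang_b < ang_a:
            ang_b += 180.0  # wrap around: directions repeat every 180 degrees
        borders[(mode_a, mode_b)] = ((ang_a + ang_b) / 2.0) % 180.0
    return borders


border_1_8 = mode_borders()[(1, 8)]
```

Under these assumptions `border_1_8` is about 13.3°, in agreement with the example given in the text.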

Therefore, the edge direction histogram of a 4×4 luma block is decided by the following algorithm. For each pixel in a 4×4 luma block, let histo(k), k = 0,1,…,8, be the histogram cell of the prediction mode k, and let $\eta = dy_{i,j}/dx_{i,j}$, then


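\[
\begin{array}{ll}
\text{if } \{[(dx_{i,j} = 0) \text{ and } (dy_{i,j} \neq 0)] \text{ or } (|\eta| > 5.0273)\} & histo(0) \mathrel{+}= Amp(\vec{D}_{i,j}) \\
\text{else if } (|\eta| \le 0.1989) & histo(1) \mathrel{+}= Amp(\vec{D}_{i,j}) \\
\text{else if } (-0.6682 \le \eta < -0.1989) & histo(6) \mathrel{+}= Amp(\vec{D}_{i,j}) \\
\text{else if } (-1.4966 \le \eta < -0.6682) & histo(4) \mathrel{+}= Amp(\vec{D}_{i,j}) \\
\text{else if } (-5.0273 \le \eta < -1.4966) & histo(5) \mathrel{+}= Amp(\vec{D}_{i,j}) \\
\text{else if } (1.4966 < \eta \le 5.0273) & histo(7) \mathrel{+}= Amp(\vec{D}_{i,j}) \\
\text{else if } (0.6682 < \eta \le 1.4966) & histo(3) \mathrel{+}= Amp(\vec{D}_{i,j}) \\
\text{else if } (0.1989 < \eta \le 0.6682) & histo(8) \mathrel{+}= Amp(\vec{D}_{i,j})
\end{array}
\tag{3.4}
\]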

Note that Mode 2 is not included in the above algorithm. This is because Mode 2 will always be chosen as one of the candidate modes. Figure 3.3 shows an example of the edge direction histogram.

3.2.2.2 Edge direction histogram for 16 ×16 luma and 8 × 8 chroma block

In the case of 16×16 luma and 8×8 chroma blocks, there are only two directional prediction modes, plus a plane prediction and a DC prediction mode. Therefore, the edge direction histogram for this case will be based on three directions, i.e., horizontal, vertical and diagonal (plane) directions, as shown in Figure 3.4.


Figure 3.4 Intra 8×8 and 16×16 prediction mode directions

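The edge direction histogram for 16×16 luma and 8×8 chroma blocks is constructed as follows,

\[
\begin{array}{ll}
\text{if } \{[(dx_{i,j} = 0) \text{ and } (dy_{i,j} \neq 0)] \text{ or } (|\eta| > 2.4142)\} & histo(0) \mathrel{+}= Amp(\vec{D}_{i,j}) \\
\text{else if } (|\eta| \le 0.4142) & histo(1) \mathrel{+}= Amp(\vec{D}_{i,j}) \\
\text{else} & histo(3) \mathrel{+}= Amp(\vec{D}_{i,j})
\end{array}
\]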


For a similar reason, Mode 2 is missing in the above algorithm. An example of such an edge direction histogram is shown in Figure 3.5. Note that for 8×8 chroma blocks, a similar equation is applied, except that the order of the mode numbers is different.
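The three-direction classification can be sketched in the same way. The thresholds 0.4142 and 2.4142 are those of the algorithm above; the treatment of a zero horizontal-kernel response (dx = 0) as vertical and the chroma renumbering table are assumptions, the latter based on the standard chroma mode numbering (0 = DC, 1 = horizontal, 2 = vertical, 3 = plane). All names are illustrative.

```python
def mode_of_16x16(dx, dy):
    """Mode voted for by one edge vector in a 16x16 luma block."""
    if dx == 0 and dy == 0:
        return None                   # flat pixel, contributes nothing
    if dx == 0 or abs(dy / dx) > 2.4142:
        return 0                      # vertical
    if abs(dy / dx) <= 0.4142:
        return 1                      # horizontal
    return 3                          # diagonal directions vote for plane


# Same three classes for 8x8 chroma, renumbered (assumed mapping).
LUMA16_TO_CHROMA = {0: 2, 1: 1, 3: 3}


def histogram_16x16(edge_vectors):
    """Sum |dx| + |dy| of each edge vector into the cell of its mode."""
    histo = {0: 0, 1: 0, 3: 0}        # Mode 2 (DC) has no cell
    for dx, dy in edge_vectors:
        k = mode_of_16x16(dx, dy)
        if k is not None:
            histo[k] += abs(dx) + abs(dy)
    return histo


histo16 = histogram_16x16([(400, 0), (0, 400), (100, 100), (300, 10)])
```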

As mentioned above, each cell in the edge direction histogram sums up the amplitudes of those pixels with similar edge directions in the block. Alternatively, the edge direction histogram can count the number of pixels with similar edge directions.

Therefore, based on the principle of majority voting, the cell k with the maximum amplitude indicates that there is a strong edge along this direction in the image block, and this direction is considered the preferable prediction direction. The mode whose direction corresponds to this k is chosen as the primary prediction mode.
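In code, the majority vote reduces to taking the histogram cell with the largest accumulated amplitude. The sketch below assumes the histogram is stored as a dictionary from mode number to amplitude sum; the function name is illustrative, and ties are resolved in favor of the cell that appears first in the dictionary, which is an arbitrary choice.

```python
def primary_mode(histo):
    """Mode whose histogram cell has the maximum accumulated amplitude."""
    return max(histo, key=histo.get)


example = {0: 10, 1: 400, 3: 200, 4: 0, 5: 0, 6: 35, 7: 0, 8: 150}
primary = primary_mode(example)
```

For the example histogram the horizontal cell dominates, so `primary` is Mode 1.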

3.3 Mode Decision for Intra Prediction

Based on the primary prediction mode determined previously, the fast mode decision algorithms for intra prediction select a small number of the prediction modes to be the candidate prediction modes for RDO computation. It should be noted that the actual RDO computation in H.264 intra coding is based on the reconstructed images, while the edge direction histogram is calculated from the original lossless images, so the primary prediction mode decided above will not always be the best RDO mode in actual coding. Thus a number of ways have been tried in deciding the number of preferred modes, as detailed in the following.


Method 1: The mode with the maximum amplitude in the edge direction histogram is chosen as the candidate prediction mode, and if this amplitude is below a predefined threshold, the prediction mode will be chosen as DC.

Method 2: This method simply takes DC mode as a candidate mode besides the primary prediction mode. This eliminates the side effect of Method 1, in which the thresholds lead to different performance in different sequences.

Method 3: During the experiments with Method 2, it is observed that, besides the primary prediction mode, the best RDO mode is always one of the two adjacent modes (in terms of direction) of the primary prediction mode selected by Method 2. Therefore, two additional candidate prediction modes are fixed to be the two neighbors of the primary prediction mode in terms of direction (refer to Figure 3.1).

Method 4: In this method, additional information is added to Method 3. The window size of the histogram computation is enlarged by including pixels in the left column and upper row of the block of interest. This is due to the fact that a block of interest is predicted from the pixels above and/or to the left of the block.

Experimental results have shown that Method 3 achieves a good balance between computational time and coding efficiency, and the rest of this section will describe the detailed implementation of this algorithm. However, the experiment section will still present the comparison among all the methods.


3.3.1 4x4 Luma Block Prediction Modes

In the edge detection based approach, the histogram cell with the maximum amplitude is the best candidate for intra prediction. In the case that all the cells have similar amplitudes, DC mode will be a better choice, thus an amplitude threshold is needed in deciding whether the intra block exhibits a strong edge or is just a flat region. However, it is difficult to pre-define a universal threshold that suits different block contexts and different video sequences. Therefore, DC mode is always chosen as the second candidate in the RDO operation.

Extensive experiments also show that the best RDO mode is, besides the primary prediction mode, one of the two adjacent modes (in terms of direction) of the primary prediction mode selected by the proposed algorithm. The main cause of this phenomenon is that H.264 RDO is based on the reconstructed lossy intra images, while the edge direction histogram is calculated from the original lossless images. Therefore, two additional candidate prediction modes are fixed to be the two neighbors of the primary prediction mode in terms of direction. For example, if the primary prediction mode is Mode 1, then the two additional candidate prediction modes will be Mode 8 and Mode 6. Note that Mode 8 and Mode 3 are adjacent modes in terms of direction due to the symmetry of the circle.

In summary, in 4×4 luma block intra coding, the histogram cell with the maximum amplitude, and its two adjacent cells, plus DC mode are chosen to take part in RDO calculation. Therefore, for each 4×4 luma block, only four RDO calculations will be performed instead of nine.
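The candidate selection summarized above can be sketched as follows. The circular ordering of the eight directional modes is an assumption that is consistent with the adjacency examples in the text (Mode 1 between Modes 8 and 6, Mode 8 next to Mode 3); the names `RING` and `candidate_modes` are illustrative.

```python
# Directional 4x4 modes in assumed circular order of prediction direction.
RING = [1, 8, 3, 7, 0, 5, 4, 6]
DC_MODE = 2


def candidate_modes(primary):
    """Primary mode, its two direction neighbors, and the DC mode."""
    k = RING.index(primary)
    before = RING[k - 1]                 # index -1 wraps to the last entry
    after = RING[(k + 1) % len(RING)]
    return [primary, before, after, DC_MODE]


candidates = candidate_modes(1)
luma_4x4_rdo_per_mb = 16 * len(candidates)
```

For a primary mode of 1 the candidates are Modes 1, 6, 8 and 2, so the sixteen 4×4 luma blocks of a macroblock require 16 × 4 = 64 RDO evaluations rather than 16 × 9 = 144.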

3.3.2 16x16 Luma Block Prediction Modes

Based on the same observation above, only the primary prediction mode decided by the edge direction histogram is considered as a candidate for the best prediction mode, and
