
Design of integer motion estimator of HEVC for asymmetric motion-partitioning mode and 4K-UHD

J. Byun, Y. Jung and J. Kim

A design for an integer motion estimator for high-efficiency video coding (HEVC) is presented. HEVC supports the 64 × 64 coding tree unit, the recursive quad-tree coding unit structure and the asymmetric motion-partitioning mode to achieve a high compression ratio. These features require an integer motion estimation structure that is more complex than that of H.264/AVC. New structures for a memory read controller and a sum of absolute differences (SAD) summation block are proposed. The new memory read controller reduces the internal memory read time, and the new SAD summation block structure supports the recursive quad-tree coding unit structure and the asymmetric motion-partitioning mode. The proposed design is implemented in Verilog HDL and synthesised using 65 nm CMOS technology. The gate count is 3.56 M, and the internal static random access memory is about 20 kbyte. The operation frequency is 250 MHz when a 4K ultra-high-definition (UHD) (3840 × 2160p at 30 Hz) sized video is encoded.

Introduction: To provide a compression ratio higher than that of the previous standards, high-efficiency video coding (HEVC) uses a basic unit size of 64 × 64, which is called the coding tree unit (CTU), the recursive quad-tree coding unit structure and the asymmetric motion-partitioning (AMP) mode [1, 2]. These features allow more flexible partition sizes for prediction than previous standards do, but they make it difficult to implement motion-estimator hardware. Previous motion-estimator system structures are not suitable for supporting these features [3–5]. Therefore, HEVC requires a motion-estimator structure that is different from that of the previous standards.

Top-level structure: Our system consists of search area memories, a current memory, 256 process elements (PEs), a sum of absolute difference (SAD) summation block, a cost block and a comparison tree. The search area memories and the current memory store the pixel values of the reference frame and the current coding unit, respectively. One PE calculates the SAD value of a 4 × 4 block. The SAD summation block calculates SAD values for the various block sizes using the results of the PEs. The cost block computes the cost values of the variously sized blocks, and the comparison tree block selects the best mode, that is, the one with the smallest cost value. Since the basic unit of HEVC is 16 times larger than that of H.264/AVC (64 × 64 against 16 × 16 pixels) and HEVC uses a recursive quad-tree coding unit and the AMP mode, new structures for the memory read controller and the SAD summation block are required.
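As a rough illustration of the PE array described above, the following Python sketch computes the 256 4 × 4 SAD values for one 64 × 64 block at a single search position. The function names `sad_4x4` and `pe_array` and the list-of-rows pixel layout are assumptions made for this example. The Letter's design is Verilog hardware in which the 256 PEs work in parallel, whereas this sketch simply loops over them.

```python
def sad_4x4(cur, ref, y0, x0):
    """SAD of the 4x4 block whose top-left pixel is (y0, x0)."""
    return sum(
        abs(cur[y0 + dy][x0 + dx] - ref[y0 + dy][x0 + dx])
        for dy in range(4)
        for dx in range(4)
    )


def pe_array(cur, ref, size=64):
    """Return a (size/4) x (size/4) grid of 4x4 SADs, one per hypothetical PE."""
    cells = size // 4
    return [[sad_4x4(cur, ref, 4 * by, 4 * bx) for bx in range(cells)]
            for by in range(cells)]


if __name__ == "__main__":
    # Dummy data: every pixel differs by 3, so each 4x4 SAD is 16 * 3 = 48.
    cur = [[10] * 64 for _ in range(64)]
    ref = [[7] * 64 for _ in range(64)]
    grid = pe_array(cur, ref)
    print(len(grid) * len(grid[0]), grid[0][0])
```

With the dummy data in the example, the grid holds 256 entries and each entry is 48.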

Fig. 1 Scan order of search area memories (search area 127 × 127; processing area 64 × 64)

Memory read controller: Fig. 1 shows the scan order of the processing area, which is the region of the search area that is currently being processed. Since the search area memories consist of line memories, each line memory of the search area reads only 1 byte per clock cycle. There is no problem when the scan order is in direction (a) or (c). However, when the scan order is in direction (b), the line memory of the last search area has to read 64 bytes in one clock cycle. The memory read cycles increase by four clock cycles when the memory bit width is 128 bits, which creates 388 800 unnecessary clock cycles in one 4K ultra-high-definition (UHD) (3840 × 2160p at 30 Hz) frame. To solve this problem, we added registers on the bottom line. Fig. 2 shows the dataflow in the search area registers. The solid line indicates the dataflow in direction (a), and the two types of dashed lines show the dataflow in direction (b) or (c). The grey registers are those added on the bottom line. By reading the data beforehand, these registers reduce the read cycles to only one clock cycle in direction (b).
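The 388 800 figure can be reproduced with a short calculation, under assumptions that the Letter does not spell out: that direction (b) occurs 64 times per CTU (once per row of a 64 × 64 search range), that a 64-byte read over a 128-bit port takes four cycles compared with the single cycle of directions (a) and (c), and that a frame is counted as 3840 × 2160 / (64 × 64) CTUs. All constant names below are illustrative, and this is only one reading that happens to match the stated number.

```python
WIDTH, HEIGHT = 3840, 2160      # 4K-UHD frame, from the text
CTU_SIZE = 64                   # CTU edge in pixels, from the text
BUS_BITS = 128                  # memory bit width, from the text
ROW_BYTES = 64                  # bytes needed at once in direction (b), from the text

# Assumption: one direction-(b) step per row of a 64-row search range.
B_STEPS_PER_CTU = 64

cycles_without_registers = ROW_BYTES * 8 // BUS_BITS   # 512 bits / 128 bits
# Assumption: the baseline is the one-cycle read of directions (a) and (c).
extra_cycles_per_step = cycles_without_registers - 1

# Assumption: the frame is counted as a (fractional) number of CTUs.
ctus_per_frame = WIDTH * HEIGHT / (CTU_SIZE * CTU_SIZE)

wasted = ctus_per_frame * B_STEPS_PER_CTU * extra_cycles_per_step
print(cycles_without_registers, ctus_per_frame, wasted)
```

Under these assumptions the read takes 4 cycles without the extra registers, a frame holds 2025 CTUs, and the wasted cycles come to 388 800 per frame.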

Fig. 2 Data flow of search area registers (registers p00_00 to p64_63, loaded from SRAM data)

Fig. 3 Structure of SAD summation block

a N = 4, 8 or 16

b N = 32

c Hierarchical structure of SAD summation block (N = 4 SAD sums 0 to 15, N = 8 SAD sums 0 to 3, N = 16 SAD sum, N = 32 SAD sum)

ELECTRONICS LETTERS 29th August 2013 Vol 49 No 18

SAD summation block: The SAD summation block computes SAD values for the various block sizes from the 256 4 × 4 SAD values calculated by the PEs. H.264/AVC uses only seven block sizes. However, because HEVC uses the recursive quad-tree coding unit structure and the AMP mode, it needs 27 block sizes [1, 2]. This variety of block sizes requires a SAD summation block whose structure differs from that of H.264/AVC. Fig. 3a shows the structure of the SAD summation block when N is 4, 8 or 16, and Fig. 3b shows the structure when N is 32. Since HEVC uses the recursive quad-tree coding unit structure, the number of structures for N = 4 is 16, for N = 8 it is 8, and for N = 16 and for N = 32 only one is needed. As shown in Fig. 3c, these structures are connected hierarchically. If N = 32, the process of the SAD summation block is similar to that of H.264/AVC. However, the bold lines in Fig. 3a indicate the AMP mode when N is 4, 8 or 16. These parts calculate the SAD values of the AMP mode efficiently by reusing the smaller SAD values. The proposed SAD summation block thus obtains the SAD values of every HEVC inter-prediction mode and depth by adding small neighbouring SADs.
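To illustrate how larger SADs, including those of the AMP partitions, can be built by adding smaller neighbouring SADs, here is a minimal Python sketch. It is not the Letter's circuit: the function names, the dictionary layout and the use of the standard HEVC partition labels (2N×2N, 2N×N, N×2N, 2N×nU, 2N×nD, nL×2N, nR×2N, where the CU is 2N × 2N) are choices made for this example. The input `grid` is assumed to be the 16 × 16 array of 4 × 4 SADs for one CTU, and `s` is the CU side measured in 4 × 4 cells (4, 8 or 16).

```python
def block_sum(grid, y, x, h, w):
    """Add the 4x4 SADs in an h x w window of cells starting at (y, x)."""
    return sum(grid[r][c] for r in range(y, y + h) for c in range(x, x + w))


def cu_partition_sads(grid, y, x, s):
    """SADs of the symmetric and AMP partitions of one CU.

    s is the CU side in 4x4 cells and must be a multiple of 4 so that a
    quarter of the CU is a whole number of cells.
    """
    if s % 4:
        raise ValueError("CU side must be a multiple of four 4x4 cells")
    q = s // 4
    # Four horizontal and four vertical quarter strips, summed once each.
    rows = [block_sum(grid, y + i * q, x, q, s) for i in range(4)]
    cols = [block_sum(grid, y, x + i * q, s, q) for i in range(4)]
    total = sum(rows)
    # Every partition below is a sum of neighbouring strips.
    return {
        "2Nx2N": [total],
        "2NxN": [rows[0] + rows[1], rows[2] + rows[3]],
        "Nx2N": [cols[0] + cols[1], cols[2] + cols[3]],
        "2NxnU": [rows[0], rows[1] + rows[2] + rows[3]],
        "2NxnD": [rows[0] + rows[1] + rows[2], rows[3]],
        "nLx2N": [cols[0], cols[1] + cols[2] + cols[3]],
        "nRx2N": [cols[0] + cols[1] + cols[2], cols[3]],
    }


if __name__ == "__main__":
    grid = [[1] * 16 for _ in range(16)]   # dummy 4x4 SADs, all equal to 1
    print(cu_partition_sads(grid, 0, 0, 16)["2NxnU"])
```

The point of the sketch is that the quarter strips are summed once and then reused by every partition shape, which mirrors the idea of deriving the AMP SADs from smaller SADs that are already available.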

Cost block and comparison tree: The cost block calculates the cost values of every prediction mode and depth using SAD values and a […]. The comparison tree decides the best mode and depth of the CTU, using a comparison of the results of the cost block calculation.
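The selection step can be pictured as a pairwise (tournament) reduction over the candidate costs. The sketch below is only an illustration of that idea: the function name `comparison_tree` and the tie-breaking rule (lowest index wins) are assumptions, and the cost values themselves are taken as given because the cost formula is not recoverable from the text above.

```python
def comparison_tree(costs):
    """Return (cost, index) of the smallest cost via pairwise comparisons."""
    if not costs:
        raise ValueError("at least one candidate cost is required")
    level = [(c, i) for i, c in enumerate(costs)]
    while len(level) > 1:
        nxt = []
        # Compare neighbours two at a time, as one layer of a comparator tree.
        for k in range(0, len(level) - 1, 2):
            nxt.append(min(level[k], level[k + 1]))
        if len(level) % 2:
            nxt.append(level[-1])   # an unpaired candidate passes through
        level = nxt
    return level[0]


if __name__ == "__main__":
    print(comparison_tree([5, 3, 9, 3, 7]))
```

For the example list the result is `(3, 1)`: the smallest cost is 3, and the tie between indices 1 and 3 is resolved in favour of the lower index.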

Pipeline process: Fig. 4 shows the pipeline process of the proposed system. The memory read stage uses only one clock cycle; because of the registers added on the bottom line, no additional clock cycles are required in scan direction (b). In total, the proposed integer-motion-estimator system uses 4105 clock cycles to process the integer motion estimation of one CTU.

Fig. 4 Pipeline process of proposed system (memory read, PE, SAD summation and cost block for iterations 1 to 4096, followed by the comparison tree; stage labels of 1, 2, 2, 4 and 1 clock; 4105 clock cycles in total)
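The 4105-cycle total is consistent with a simple pipeline model, sketched below under stated assumptions: the five latency labels that survive from Fig. 4 (1, 2, 2, 4 and 1 clock) are taken as the depths of the five stages, one new search position enters the pipeline per clock, and there are 4096 search positions per CTU (the read_1 to read_4096 labels). Which label belongs to which stage, other than the one-cycle memory read, is not assumed. The second part checks the real-time claim, assuming 3840 × 2160 / (64 × 64) CTUs per frame.

```python
STAGE_LATENCIES = [1, 2, 2, 4, 1]   # clock labels from Fig. 4; order not assumed
SEARCH_POSITIONS = 4096             # iterations read_1 ... read_4096
FPS = 30

# Fully pipelined: the first result appears after the total latency,
# then one further result completes on every clock.
depth = sum(STAGE_LATENCIES)
cycles_per_ctu = SEARCH_POSITIONS + depth - 1

# Assumption: a frame is counted as width * height / CTU area CTUs.
ctus_per_frame = 3840 * 2160 / (64 * 64)
required_hz = cycles_per_ctu * ctus_per_frame * FPS

print(cycles_per_ctu, required_hz / 1e6)
```

Under these assumptions the model gives 4105 cycles per CTU and a required clock of roughly 249.4 MHz, just below the stated 250 MHz operating frequency.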

Synthesised results: The proposed system was implemented in Verilog HDL and synthesised using 65 nm CMOS technology. The gate count is 3.56 M and the internal static random access memory (SRAM) is 20 225 bytes. The operation frequency is 250 MHz when a 4K-UHD (3840 × 2160p at 30 Hz) sized video is encoded. Table 1 compares the proposed system with a previous H.264/AVC integer-motion-estimation system [5]. The proposed system supports a greater variety of block sizes and a higher-resolution 4K-UHD video than the previous one does.

Table 1: Comparison of proposed system with previous H.264/AVC integer-motion-estimation system [5]

                      Previous system [5]                            Proposed system
Video standard        H.264/AVC                                      HEVC
Gate count (SRAM)     1.45 M (2.97 kb)                               3.56 M (20.23 kb)
Block size            16 × 16 to 4 × 4 (seven kinds, without AMP)    64 × 64 to 8 × 4 (27 kinds, with AMP)
Search range          64 × 64                                        64 × 64
Number of …           …                                              …
Operation frequency   130 MHz (FHD)                                  250 MHz (4K-UHD)

Conclusion: This Letter presents a motion-estimator structure that effectively supports the recursive quad-tree coding unit and the AMP mode and reduces the number of memory read cycles. The designed integer-motion-estimator system uses 65 nm CMOS technology. The gate count is 3.56 M with 20.23 kb of internal SRAM. It can encode a 4K-UHD video in real time at a clock speed of 250 MHz.

Acknowledgment: This work was supported by the IT R&D program of MOTIE/KEIT (10035389), research on high speed and low power wireless communication SoC for high resolution video information mining.

© The Institution of Engineering and Technology 2013

24 March 2013 doi: 10.1049/el.2013.0936

J. Byun, Y. Jung and J. Kim (School of Electrical and Electronic Engineering, Yonsei University, Seoul, Republic of Korea)

E-mail: jaekim@yonsei.ac.kr

References

1 Bross, B., Han, W.-J., Sullivan, G.J., Ohm, J.-R., and Wiegand, T.: ‘High Efficiency Video Coding (HEVC) Text Specification Draft 9’, ITU-T/ISO/IEC Joint Collaborative Team on Video Coding (JCT-VC), October 2012, JCTVC-K1003

2 Francois, E., Guillo, L., Ichigaya, A., and Yu, H.: ‘TE12: report on AMP evaluation’, ITU-T/ISO/IEC Joint Collaborative Team on Video Coding (JCT-VC), October 2010, JCTVC-C030

3 Kang, J.S., Lee, Y.T., and Jeon, J.W.: ‘Motion estimator with adaptive reduction of search points’, Electron. Lett., 2003, 39, (22), pp. 1584–1586

4 Hsia, S.-C., and Hong, P.-Y.: ‘Very large scale integration (VLSI) implementation of low-complexity variable block size motion estimation for H.264/AVC coding’, IET Circuits Devices Syst., 2010, 4, (5), pp. 414–424

5 Kao, C.Y., and Lin, Y.L.: ‘A memory-efficient and highly parallel architecture for variable block size integer motion estimation in H.264/AVC’, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., 2010, 18, (6), pp. 866–874
