
Adaptive Long-term Reference Selection for Efficient Scalable Surveillance Video Coding

Le Dao Thi Hue, Giap PhamVan, Xiem HoangVan

VNU – University of Engineering and Technology

hueledao94@gmail.com, giap_pham@outlook.com, xiemhoang@vnu.edu.vn

Abstract

The exponential growth of video surveillance calls for a more powerful video coding solution, one characterized not only by high compression efficiency but also by adaptive video streaming capability. Surveillance video content usually contains large background areas and has high temporal correlation between frames. In this context, we propose a novel adaptive long-term reference mechanism for scalable surveillance video coding, which provides quality and temporal scalability while achieving high compression performance. The proposed long-term reference is selected mainly on the basis of a content analysis of the video sequence. The long-term reference selection solution is integrated into the most recent Scalable High Efficiency Video Coding (SHVC) standard. Experiments conducted on a rich set of surveillance videos show that the proposed scalable video coding solution achieves around 5.38% bitrate saving compared to the traditional SHVC video coding benchmark.

Keywords: Surveillance scalable video coding, SHVC standard, long-term reference, bitrate saving

1 Introduction

In recent years, surveillance systems have expanded rapidly to cope with security and safety threats. Considerable numbers of surveillance cameras have been mounted in public and private areas. The emergence of large video surveillance infrastructures leads to a massive amount of content that must be stored, analyzed, and managed by security teams with limited resources. Furthermore, the heterogeneity of networks, display devices, and transmission environments has become a critical issue in the modern video communication era [1]. To meet these challenges, an efficient and adaptable surveillance video compression system is needed, one that provides not only compression efficiency but also adaptability to network and transmission variations.

Recent achievements in video coding technology have resulted in a new video coding solution, namely High Efficiency Video Coding (HEVC) [2]. As reported, HEVC significantly outperforms the well-known H.264/AVC standard [3]. For adaptive video streaming, the HEVC scalable extension, namely SHVC, was introduced in 2014 [4]. SHVC is designed around a layered coding structure in which one base layer (BL) compresses the video sequence at a low, basic quality/resolution fidelity and one or several enhancement layers (ELs) provide enhanced quality/resolution fidelities.

Although SHVC is the latest scalable video coding standard, improving its compression performance is still an active topic of research and development. The work in [5] proposed an improved EL merge mode, while the work in [6] proposed a novel joint layer prediction solution. As reported in [5, 6], the proposed EL merge mode and joint layer prediction significantly improve compression performance over the SHVC standard. However, neither proposal is designed for visual surveillance systems; both target generic video content. In a visual surveillance system, cameras are usually placed at a fixed position or moved within a very narrow angle. Therefore, surveillance video content usually contains a large background area and has a high temporal correlation between frames.

To exploit these characteristics, we propose in this paper an improved SHVC compression solution designed for surveillance video content. The proposed surveillance scalable video coding (SSVC) is based on an adaptive long-term reference selection and updating mechanism. The video content is carefully analyzed before the reference picture is selected and updated. Experimental results show that the proposed SSVC solution outperforms the relevant SHVC standard, with around 5.38% bitrate saving while still providing a similar perceptual decoded frame quality.

The rest of this paper is organized as follows. Section 2 briefly discusses the related and background work on surveillance and scalable video coding. Section 3 then describes the proposed SSVC solution. Section 4 presents and analyzes the compression performance of the proposed SSVC in comparison with the SHVC standard. Finally, Section 5 gives the main conclusions and ideas for future work.

2018 IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip

2 Related work

2.1 SHVC standard

SHVC is the latest scalable video coding solution, an extension of the well-known HEVC standard [4], providing adaptive video compression capability for a large number of video transmission environments and display devices. Similar to the prior SVC standard [7], SHVC follows a layered coding structure with one base layer and one or several enhancement layers.

In contrast to SVC, SHVC adopts a closed-loop coding structure at each compression layer; thus, only high-level syntax (HLS) elements need to be changed to upgrade from HEVC to SHVC. Following the HLS approach, an inter-layer processing module is added to link the base layer with the enhancement layers. In this module, the texture and motion information derived from the BL or lower layers is processed for optimal use at the ELs. As reported in [4], SHVC provides not only the quality, temporal, and spatial scalabilities commonly supported in the SVC standard but also new bit-depth and color gamut scalability functions. It is also worth noting that SHVC is mainly designed for generic video content. Therefore, specific video content such as surveillance or conference videos may not benefit from its compression structure.

2.2 Surveillance video coding

Surveillance video compression has attracted much research due to its wide use in real surveillance and security visual systems. In an early work, X. G. Zhang et al. presented in [8] an efficient coding solution for surveillance videos captured by stationary cameras. In this proposal, a high-quality background frame is generated and employed to compress the surveillance video frames. Considering the importance of background frames, several background frame models have been presented [9, 10]. However, most of these works are developed for non-scalable video coding structures, i.e., H.264/AVC [3] and HEVC [2]. Therefore, the presented surveillance video coding solutions are unable to cope with dynamically changing transmission environments and the variety of display devices.

3 Proposed surveillance scalable video coding solution

To describe the proposed surveillance scalable video coding solution and its motivation, this section starts with a brief analysis of surveillance video content. Afterwards, the proposed compression solution architecture and its novel coding tools are presented.

3.1 Observations

Surveillance video systems are widely used in modern life, from home security to public environments such as schools, factories, and smart transportation. In such systems, the surveillance cameras are usually set at a fixed position or moved within very narrow angles. Therefore, surveillance videos usually contain a static scene with local movements; thus, a large temporal redundancy can be exploited in such video content.

To study this fact, Fig. 1 shows two frames from the surveillance video Intersection, obtained from [11]. The difference between these frames is computed and illustrated in Fig. 1(c).

Fig. 1. (a) Frame 1. (b) Frame 280. (c) Difference between (a) and (b).

As can be seen, the difference between two frames in a surveillance video usually contains a large background area (black regions). Although the two frames are relatively far apart in the video, the temporal correlation between them is still high. This motivates us to propose a novel scalable video coding solution, developed on top of the most recent SHVC standard and based on an adaptive long-term reference selection mechanism.
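The observation above can be checked numerically. The following is a minimal sketch, not the authors' analysis tool: it assumes frames are given as nested lists of luma values, and the function name `static_fraction` and the `tolerance` parameter are hypothetical. It reports the share of co-located pixels that stay unchanged between two frames, which corresponds to the black regions of a difference image.

```python
def static_fraction(frame_a, frame_b, tolerance=0):
    """Share of co-located pixels whose absolute difference is <= tolerance."""
    if len(frame_a) != len(frame_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(frame_a, frame_b)
    ):
        raise ValueError("frames must have identical dimensions")
    total = 0
    static = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pa, pb in zip(row_a, row_b):
            total += 1
            if abs(pa - pb) <= tolerance:
                static += 1
    return static / total if total else 0.0


# Toy 4x4 luma frames: only the 2x2 block in the lower-right corner changes.
frame_1 = [[100] * 4 for _ in range(4)]
frame_280 = [row[:] for row in frame_1]
for y in (2, 3):
    for x in (2, 3):
        frame_280[y][x] = 180

print(static_fraction(frame_1, frame_280))  # 12 of 16 pixels unchanged -> 0.75
```

A value close to 1 for frames that are far apart in time indicates the kind of content for which a long-term reference is likely to pay off.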

3.2 SSVC architecture

Fig. 2 illustrates the proposed SSVC architecture, in which two compression layers are presented. The proposed adaptive long-term reference selection (ALRS) mechanism is highlighted.

Fig. 2. Proposed surveillance scalable video coding architecture.

SSVC coding walkthrough: After being captured by the camera sensor, a surveillance video is compressed using the proposed SSVC solution in the following main steps:

1) Adaptive Long-Term Reference Selection (ALRS): This module creates and updates the appropriate reference frame for the SSVC inter prediction. The selected reference is indexed as a long-term reference and stored in the decoded picture buffer (DPB). A coding flag is needed to signal this information so that the decoder also knows the selected reference.

2) Base layer compression: After the long-term reference index is determined, the BL frame is compressed using the conventional HEVC standard. Its decoded texture and motion are stored in the DPB to be exploited later for compressing the EL frames.

3) Enhancement layer compression: The enhanced quality/resolution frames are coded in this step. First, the BL decoded information is used in inter-layer processing [4]. Together with the long-term references, the base layer references are employed to better predict the EL information.

Finally, both BL and EL bitstreams are merged and sent to the decoder.

3.3 Long-term reference structure

To clarify the proposed long-term reference structure employed in SSVC, Fig. 3 illustrates the difference between the use of reference frames in the standard SHVC and in the proposed SSVC. Here, the common low-delay (LD) coding structure is examined. It should be noted that the long-term reference structure is employed for both the base and enhancement layers.

Fig. 3. The LD prediction structure of the conventional video coding standards (a) and the proposed SSVC (b).

As shown, in the conventional video coding standard, a frame can be referenced by at most 4 other consecutive frames, i.e., frames 1, 2, 3, 4, and 5 can refer to the decoded information of frame 0. In the proposed long-term reference structure, however, frames 6, 7, 8, and so on can still refer to frame 0. This allows the high temporal correlation between frames in a surveillance video to be exploited.
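The difference between the two prediction structures can be sketched as a reference-list builder. This is an illustrative model only, not the SHVC reference picture set syntax: the function name `reference_candidates` and its parameters are hypothetical, and the length of the short-term window is left as an argument. The example uses a window of 5 so that frames 1 to 5 reach frame 0, matching the frame numbers in the description above.

```python
def reference_candidates(frame_idx, window, long_term_idx=None):
    """List the decoded frames that frame `frame_idx` may reference.

    window        -- number of most recent frames kept as short-term references
    long_term_idx -- frame kept as long-term reference, or None for the
                     conventional low-delay structure
    """
    first = max(0, frame_idx - window)
    refs = list(range(frame_idx - 1, first - 1, -1))  # nearest frame first
    if (
        long_term_idx is not None
        and long_term_idx < frame_idx
        and long_term_idx not in refs
    ):
        refs.append(long_term_idx)  # keep the old frame reachable
    return refs


print(reference_candidates(5, window=5))                   # [4, 3, 2, 1, 0]
print(reference_candidates(6, window=5))                   # [5, 4, 3, 2, 1]
print(reference_candidates(6, window=5, long_term_idx=0))  # [5, 4, 3, 2, 1, 0]
```

Without the long-term entry, frame 0 drops out of reach as soon as the sliding window moves past it; with it, later frames can still predict static background from frame 0.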

3.4 Adaptive Long-term Reference Selection

In a surveillance video, scene changes occur when a new moving object appears. In such cases, a long-term reference may not be effective at all. To address this problem, we propose to update the long-term reference adaptively. Fig. 4 illustrates an example of the long-term reference updating mechanism, in which the long-term reference is updated based on the video content analysis.

Fig. 4. Proposed ALRS solution.

In this paper, a long-term reference selection algorithm is proposed that assesses the sum of absolute differences (SAD) between the current coded frame, $F_c$, and its long-term reference, $F_{LTR}$. The SAD metric is measured as:

$$SAD = \sum_{x,y} \left| F_c(x,y) - F_{LTR}(x,y) \right| \quad (1)$$

To assess the correlation between the current frame and its reference, an adaptive threshold is computed as:

In the proposed reference selection mechanism, if the SAD does not exceed this threshold, the current long-term reference is kept; otherwise, the most recent reference frame becomes the LTR for the current and subsequent frames.
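The selection rule can be illustrated with a short sketch. It is a simplified model under stated assumptions, not the authors' implementation: it works on original frames given as nested lists rather than on decoded pictures in the DPB, and the adaptive threshold is passed in as a hypothetical callable `threshold_fn`. The names `sad`, `select_long_term_refs`, and `flat_frame` and the per-pixel budget of 10 are illustrative choices.

```python
def sad(frame_a, frame_b):
    """Sum of absolute differences between two equally sized frames."""
    return sum(
        abs(pa - pb)
        for row_a, row_b in zip(frame_a, frame_b)
        for pa, pb in zip(row_a, row_b)
    )


def select_long_term_refs(frames, threshold_fn):
    """Return, per frame, the index of the long-term reference (LTR) it uses.

    threshold_fn is a hypothetical stand-in for the adaptive threshold: it maps
    the current frame to the largest SAD for which the existing LTR is kept.
    """
    if not frames:
        return []
    choices = [None]  # frame 0 has no reference and seeds the LTR
    ltr_idx = 0
    for idx in range(1, len(frames)):
        if sad(frames[idx], frames[ltr_idx]) > threshold_fn(frames[idx]):
            ltr_idx = idx - 1  # most recent reference frame becomes the LTR
        choices.append(ltr_idx)
    return choices


def flat_frame(value, width=2, height=2):
    """Toy frame filled with a single luma value (illustration only)."""
    return [[value] * width for _ in range(height)]


# Frames 0-1 show one scene; the content changes at frame 2, then stays stable.
frames = [flat_frame(v) for v in (100, 101, 150, 151, 152)]
per_pixel_budget = 10  # assumed value, not taken from the paper
choices = select_long_term_refs(
    frames, lambda f: per_pixel_budget * len(f) * len(f[0])
)
print(choices)  # [None, 0, 1, 2, 2]
```

In the toy sequence the LTR stays at frame 0 while the scene is stable, is replaced twice around the content change, and then settles on frame 2 once the new scene persists.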

4 Performance evaluation and discussion

To assess the proposed SSVC solution, four common surveillance videos from the PKU-SVD-A dataset [11] were used in the experiments. The names and characteristics of the selected sequences are specified in Table 1, while Fig. 5 illustrates the first frame of each tested sequence.

Fig. 5. First frame of each tested surveillance video.

Table 1. Summary of test conditions

Test sequence and spatial resolution:
1. Crossroad, 720x576
2. Intersection, 1600x1200
3. Mainroad, 1600x1200
4. Overbridge, 720x576

Frame rate and quantization:

The video compression benchmark is the state-of-the-art SHVC standard, while the proposed SSVC solution is examined in two cases: SSVC without updating the LTR (SSVC-woUpd) and SSVC with updating the LTR (SSVC-wUpd).

Fig. 6 illustrates the RD performance, while Table 2 presents the BD-rate saving [12] of the proposed SSVC solution compared to the SHVC standard for the four surveillance sequences from [11].

Fig. 6. RD performance comparison for the test surveillance videos with update.

Table 2. BD-rate saving

Sequences | SSVC-woUpd vs. SHVC | SSVC-wUpd vs. SHVC

As shown in Table 2 and Fig. 6, both SSVC-woUpd and SSVC-wUpd achieve better compression performance than the SHVC standard. The average BD-rate savings are 1.53% and 5.38%, respectively.

The proposed solutions perform well on all tested sequences, especially on surveillance videos that contain low motion and a single object, e.g., Mainroad.
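For readers unfamiliar with the BD-rate metric of [12], its usual formulation is recalled here as background; the notation is chosen for this note and is not taken from the paper. Let $r_1(D)$ and $r_2(D)$ be cubic polynomial fits of $\log_{10}$ bitrate as a function of PSNR $D$ for the benchmark and the tested codec, and let $[D_L, D_H]$ be the PSNR interval covered by both RD curves.

```latex
\Delta R_{\mathrm{BD}} =
  \left( 10^{\,\frac{1}{D_H - D_L} \int_{D_L}^{D_H} \left( r_2(D) - r_1(D) \right) \mathrm{d}D} - 1 \right)
  \times 100\%
```

Under this convention a negative $\Delta R_{\mathrm{BD}}$ means the tested codec needs less bitrate at equal PSNR, so a bitrate saving quoted as a positive percentage corresponds to the magnitude of a negative value.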

5 Conclusions

In this paper, we have proposed an efficient video coding solution for visual surveillance systems. The proposed surveillance scalable video coding solution is developed on top of the SHVC standard and exploits the low-motion characteristics observed in surveillance videos through an adaptive long-term reference selection mechanism. As assessed, the proposed SSVC outperforms the SHVC standard, with an average BD-rate saving of around 5.38%. Future work can consider improving the accuracy of the long-term reference selection mechanism or taking into account the quality of the long-term reference.

Acknowledgement

This research is funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.01-2016.15.

References

[1] M. Valera and S. Velastin, “Intelligent distributed surveillance systems: A review,” IEE Proceedings - Vision, Image and Signal Processing, vol. 152, no. 2, pp. 192–204, Apr. 2005.

[2] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, “Overview of the High Efficiency Video Coding (HEVC) Standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649–1668, Dec. 2012.

[3] T. Wiegand, G. J. Sullivan, G. Bjøntegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560–576, Jul. 2003.

[4] J. M. Boyce, Y. Ye, J. Chen, and A. K. Ramasubramonian, “Overview of SHVC: Scalable Extensions of the High Efficiency Video Coding (HEVC) Standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 26, no. 1, pp. 20–34, Jan. 2016.

[5] X. HoangVan, J. Ascenso, and F. Pereira, “Improving enhancement layer merge mode for HEVC scalable extension,” in Picture Coding Symposium, Cairns, QLD, Australia, Jun. 2015.

[6] X. HoangVan, J. Ascenso, and F. Pereira, “Improving SHVC performance with a joint layer coding mode,” in IEEE International Conference on Acoustics, Speech and Signal Processing, Shanghai, China, Mar. 2016.

[7] H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the Scalable Video Coding Extension of the H.264/AVC Standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 9, pp. 1103–1120, Sept. 2007.

[8] X. G. Zhang, L. H. Liang, Q. Huang, Y. Z. Liu, T. J. Huang, and W. Gao, “An efficient coding scheme for surveillance videos captured by stationary cameras,” IEEE International Conference on Visual Communication and Image Processing (VCIP), pp. 77442A1–10, 2010.

[9] X. Zhang, L. Liang, Q. Huang, T. Huang, and W. Gao, “A background model based method for transcoding surveillance videos captured by stationary camera,” IEEE Picture Coding Symposium, Nagoya, Japan, pp. 78–81, 2010.

[10] X. Zhang, T. Huang, Y. Tian, and W. Gao, “Background-modeling-based adaptive prediction for surveillance video coding,” IEEE Transactions on Image Processing, vol. 23, no. 2, pp. 769–784, 2014.

[11] PKU-SVD-A. [Online]. Available: http://mlg.idm.pku.edu.cn/resources/pku-svd-a.html

[12] G. Bjontegaard, “Calculation of average PSNR differences between RD curves,” Doc. VCEG-M33, 13th ITU-T VCEG Meeting, Austin, TX, USA, Apr. 2001.
