Adaptive Long-term Reference Selection for
Efficient Scalable Surveillance Video Coding
Le Dao Thi Hue, Giap PhamVan, Xiem HoangVan
VNU – University of Engineering and Technology
hueledao94@gmail.com, giap_pham@outlook.com, xiemhoang@vnu.edu.vn
Abstract
The exponential growth of video surveillance calls for a more powerful video coding solution, one characterized not only by high compression efficiency but also by adaptive video streaming capability. Surveillance video content usually contains large background areas and exhibits high temporal correlation between frames. In this context, we propose a novel adaptive long-term reference mechanism for scalable surveillance video coding, which provides quality and temporal scalability while achieving high compression performance. The proposed long-term reference is selected mainly based on content analysis of the video sequence. The long-term reference selection solution is integrated into the most recent Scalable High Efficiency Video Coding (SHVC) standard. Experiments conducted on a rich set of surveillance videos show that the proposed scalable video coding solution achieves around 5.38% bitrate saving when compared to the traditional SHVC video coding benchmark.
Keywords: Surveillance scalable video coding, SHVC standard, long-term reference, bitrate saving
1 Introduction
In recent years, surveillance systems have expanded rapidly to cope with security and safety threats. Large numbers of surveillance cameras have been mounted in public and private areas. The emergence of large video surveillance infrastructures leads to a massive amount of content that must be stored, analyzed, and managed by security teams with limited resources. Furthermore, the heterogeneity of networks, display devices, and transmission environments has become a critical issue in the modern video communication era [1]. To meet these challenges, an efficient and adaptable surveillance video compression system is needed, one that provides not only compression efficiency but also adaptability to network and transmission variations.
Recent advances in video coding technology have resulted in a new video coding solution, namely High Efficiency Video Coding (HEVC) [2]. As reported, HEVC significantly outperforms the well-known H.264/AVC standard [3]. For adaptive video streaming, the HEVC scalable extension, namely SHVC, was introduced in 2014 [4]. SHVC is designed around a layered coding structure in which one base layer (BL) compresses the video sequence at a basic quality/resolution fidelity and one or several enhancement layers (ELs) provide enhanced quality/resolution fidelities.
Although SHVC is the latest scalable video coding standard, improving its compression performance is still an active topic of research and development. The work in [5] proposed an improved EL merge mode, while the work in [6] proposed a novel joint layer prediction solution. As reported in [5, 6], the proposed EL merge mode and joint layer prediction significantly improve compression performance over the SHVC standard. However, none of these proposals is designed for visual surveillance systems, as they mainly target generic video content. In a visual surveillance system, cameras are usually placed at a fixed position or moved within a very narrow angle. Therefore, surveillance video content usually contains a large background area and exhibits high temporal correlation between frames.
To exploit these characteristics, we propose in this paper an improved SHVC compression solution designed for surveillance video content. The proposed surveillance scalable video coding (SSVC) is built on an adaptive long-term reference selection and updating mechanism. The video content is analyzed before the reference picture is selected and updated. Experimental results show that the proposed SSVC solution outperforms the SHVC standard, with around 5.38% bitrate saving while still providing a similar perceptual decoded frame quality.
The rest of this paper is organized as follows. Section 2 briefly discusses the related and background work on surveillance and scalable video coding. Afterwards, Section 3 describes the proposed SSVC solution. Section 4 presents and analyzes the compression performance of the proposed SSVC in comparison with the SHVC standard. Finally, Section 5 gives the main conclusions and ideas for future work.
2018 IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip
2 Related work
2.1 SHVC standard
SHVC is the latest scalable video coding solution, an extension of the well-known HEVC standard [4], providing adaptive video compression capability for a wide range of video transmission environments and display devices. Similar to the prior SVC standard [7], SHVC follows a layered coding structure with one base layer and one or several enhancement layers.
In contrast to SVC, SHVC adopts a closed-loop coding structure at each compression layer; thus, only high-level syntax (HLS) elements need to be changed to upgrade from HEVC to SHVC. Following the HLS approach, an inter-layer processing module is added to link the base layer with the enhancement layers. In this module, the texture and motion information derived from the BL or lower layers is processed for optimal use at the ELs. As reported in [4], SHVC provides not only the quality, temporal, and spatial scalabilities commonly supported in the SVC standard but also new bit-depth and color gamut scalability functions. It is also worth noting that SHVC is mainly designed for generic video content. Therefore, specific video content such as surveillance or conference video may not benefit from its compression structure.
2.2 Surveillance video coding
Surveillance video compression has attracted considerable research due to its wide use in real surveillance and security visual systems. In an early work, X. G. Zhang et al. presented in [8] an efficient coding solution for surveillance videos captured by stationary cameras. In this proposal, a high-quality background frame is generated and employed to compress the surveillance video frames. Considering the importance of background frames, several background frame models have been presented [9, 10]. However, most of these works are developed for non-scalable video coding structures, i.e., H.264/AVC [3] and HEVC [2]. Therefore, these surveillance video coding solutions are unable to cope with dynamic changes in the transmission environment and the variety of display devices.
3 Proposed surveillance scalable video coding solution
To describe the proposed surveillance scalable video coding solution and its motivation, this section starts with a brief analysis of surveillance video content. Afterwards, the proposed compression solution architecture and its novel coding tools are presented.
3.1 Observations
Surveillance video systems are widely used in modern life, from home security to public environments such as schools, factories, and smart transportation. In such systems, the surveillance cameras are usually set at a fixed position or moved within very narrow angles. Therefore, surveillance videos usually contain a static scene with local movements; thus, a large temporal redundancy can be exploited in such video content.
To study this fact, Fig. 1 shows two frames from the surveillance video Intersection, obtained from [11]. The differences between these frames are computed and illustrated in Fig. 1(c).
Fig. 1. (a) Frame 1; (b) Frame 280; (c) Difference between (a) and (b)
As can be seen, the difference between two frames in a surveillance video is close to zero over a large background area (black regions). Although the two frames are relatively far apart in the video, the temporal correlation between them is still high. This motivates us to propose in this paper a novel scalable video coding solution, developed on top of the most recent SHVC standard and based on an adaptive long-term reference selection mechanism.
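The observation above can be quantified with a simple per-pixel comparison. The following is a minimal sketch, not the procedure used to produce Fig. 1: the function name `static_fraction`, the `tolerance` parameter, and the toy 4x4 frames are all hypothetical and serve only to illustrate how the share of unchanged (background) pixels between two frames could be measured.

```python
def static_fraction(frame_a, frame_b, tolerance=0):
    """Fraction of co-located pixels whose absolute difference is <= tolerance.

    Frames are lists of rows of luma values; `tolerance` is an illustrative
    parameter for absorbing sensor noise, not something defined in the paper.
    """
    if len(frame_a) != len(frame_b) or any(
        len(ra) != len(rb) for ra, rb in zip(frame_a, frame_b)
    ):
        raise ValueError("frames must have identical dimensions")
    total = 0
    static = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pa, pb in zip(row_a, row_b):
            total += 1
            if abs(pa - pb) <= tolerance:
                static += 1
    return static / total if total else 0.0


# Toy 4x4 frames: a flat background in which only a 2x2 "object" region changes.
frame_first = [[50] * 4 for _ in range(4)]
frame_later = [row[:] for row in frame_first]
for y in (1, 2):
    for x in (1, 2):
        frame_later[y][x] = 200

print(static_fraction(frame_first, frame_later))  # 12 of 16 pixels unchanged -> 0.75
```

A high static fraction between distant frames is the property that makes an old frame still useful as a prediction reference.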
3.2 SSVC architecture
Fig. 2 illustrates the proposed SSVC architecture, in which two compression layers are presented. The proposed adaptive long-term reference selection (ALRS) mechanism is highlighted.
Fig. 2. Proposed Surveillance Scalable Video Coding Architecture
SSVC coding walkthrough: A surveillance video captured from the camera sensor is compressed using the proposed SSVC solution in the following main steps:
1) Adaptive Long-Term Reference Selection (ALRS): This module creates and updates the appropriate reference frame for SSVC inter prediction. The selected reference is indexed as a long-term reference and stored in the decoded picture buffer (DPB). A coding flag is needed to signal this information so that the decoder also knows the selected reference.
2) Base layer compression: After the long-term reference index is determined, the BL frame is compressed using the conventional HEVC standard. Its decoded texture and motion are stored in the DPB to be exploited later for compressing the EL frames.
3) Enhancement layer compression: The enhanced quality/resolution frames are coded in this step. First, the BL decoded information is used in inter-layer processing [4]. Together with the long-term references, the base layer references are employed to better predict the EL information.
Finally, both BL and EL bitstreams are merged and sent to the decoder.
3.3 Long-term reference structure
To clarify the proposed long-term reference structure employed in SSVC, Fig. 3 illustrates the difference between the use of reference frames in the standard SHVC and in the proposed SSVC. Here, the common low-delay (LD) coding structure is examined. It should be noted that the long-term reference structure is employed for both base and enhancement layers.
Fig. 3. The LD prediction structure of (a) the conventional video coding standards and (b) the proposed SSVC
As shown, in the conventional video coding standard, a frame can be referenced by a maximum of 4 other consecutive frames, i.e., frames 1, 2, 3, 4, and 5 can refer to the decoded information of frame 0. However, with the proposed long-term reference, frames 6, 7, 8, and so on can still refer to frame 0. This allows exploiting the high temporal correlation between frames in a surveillance video.
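The difference between the two reference structures can be sketched as a small rule deciding whether one frame may reference another. This is only an illustration under stated assumptions: `can_reference`, `window`, and `long_term` are hypothetical names, the short-term references are modelled as a simple sliding window of preceding frames, and the window length of 5 is chosen only so that frames 1 to 5 reach frame 0 as in the example above. It does not reproduce the actual SHVC reference picture set signalling.

```python
def can_reference(current, target, window, long_term=None):
    """Return True if frame `current` may use decoded frame `target` as reference.

    Assumed model: low-delay coding, so only earlier frames are candidates;
    short-term references are the `window` most recent frames; a frame marked
    as long-term stays available regardless of its distance.
    """
    if target < 0 or target >= current:
        return False  # only previously coded frames can be referenced
    if long_term is not None and target == long_term:
        return True  # long-term reference never slides out of the buffer
    return current - target <= window


WINDOW = 5  # illustrative value, see the assumptions in the lead-in

without_ltr = [n for n in range(1, 9) if can_reference(n, 0, WINDOW)]
with_ltr = [n for n in range(1, 9) if can_reference(n, 0, WINDOW, long_term=0)]

print(without_ltr)  # [1, 2, 3, 4, 5]
print(with_ltr)     # [1, 2, 3, 4, 5, 6, 7, 8]
```

With the long-term marking, frame 0 remains a valid prediction source for frames that would otherwise have lost access to it.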
3.4 Adaptive Long-term Reference Selection
In a surveillance video, scene changes can occur when a new moving object appears. In such a case, a long-term reference may not be effective at all. Considering this problem, we propose to adaptively update the long-term reference. Fig. 4 illustrates an example of the long-term reference updating mechanism, in which the long-term reference is updated based on video content analysis.
Fig. 4. Proposed ALRS Solution
In this paper, a long-term reference selection algorithm is proposed that assesses the sum of absolute differences (SAD) between the current coded frame, $F_c$, and its long-term reference, $F_{LTR}$. The SAD metric is measured as:
$$SAD = \sum_{x,y} \left| F_c(x,y) - F_{LTR}(x,y) \right| \qquad (1)$$
To assess the correlation between the current
frame and its reference, an adaptive threshold is
computed as:
In the proposed reference selection mechanism, if the SAD does not exceed this threshold, the current long-term reference is kept; otherwise, the most recent reference frame becomes the LTR for the current and subsequent frames.
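The selection rule can be sketched as a loop over the sequence. This is a minimal illustration, not the authors' implementation. The adaptive threshold formula is not reproduced here, so the threshold is passed in as a caller-supplied function `threshold_fn` (a hypothetical name). The direction of the comparison (update when the SAD exceeds the threshold) and the choice of the immediately preceding frame as "the most recent reference frame" are assumptions as well.

```python
def sad(frame_a, frame_b):
    """Sum of absolute differences between two equally sized frames."""
    return sum(
        abs(pa - pb)
        for row_a, row_b in zip(frame_a, frame_b)
        for pa, pb in zip(row_a, row_b)
    )


def select_long_term_refs(frames, threshold_fn):
    """Return, for each frame after the first, the index of the LTR used for it.

    `threshold_fn(current, ltr)` stands in for the adaptive threshold, whose
    exact definition is not assumed here. Frame 0 is the initial LTR.
    """
    ltr_index = 0
    used = []
    for i in range(1, len(frames)):
        if sad(frames[i], frames[ltr_index]) > threshold_fn(frames[i], frames[ltr_index]):
            # Low correlation with the LTR: switch to the most recent frame
            # (assumed here to be the immediately preceding one).
            ltr_index = i - 1
        used.append(ltr_index)
    return used


# Toy sequence of 2x2 frames: static background, then a scene change at frame 3.
background = [[10, 10], [10, 10]]
new_scene = [[100, 100], [100, 100]]
frames = [background, background, background, new_scene, new_scene, new_scene]

# A constant threshold is used purely for illustration.
print(select_long_term_refs(frames, lambda cur, ltr: 50))  # [0, 0, 2, 3, 3]
```

In the toy run, the LTR stays at frame 0 while the scene is static and moves forward once the content changes, after which it settles on the first frame of the new scene.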
4 Performance evaluation and discussion
To assess the proposed SSVC solution, four common surveillance videos from the PKU-SVD-A dataset [11] were used in the experiments. The names and characteristics of the selected sequences are specified in Table 1, while Fig. 5 illustrates the first frame of each tested sequence.
Fig. 5. Illustration of the first frame for the tested surveillance videos
Table 1. Summary of test conditions
Test sequence and spatial resolution
1. Crossroad, 720x576
2. Intersection, 1600x1200
3. Mainroad, 1600x1200
4. Overbridge, 720x576
Frame rate and Quantization
The video compression benchmark is the state-of-the-art SHVC standard, while the proposed SSVC solution is examined in two cases: SSVC without updating the LTR (SSVC-woUpd) and SSVC with updating the LTR (SSVC-wUpd).
Fig. 6 illustrates the RD performance, while Table 2 presents the BD-rate saving [12] when comparing the proposed SSVC solution to the SHVC standard for the four surveillance sequences from [11].
Fig. 6. RD performance comparison for the test surveillance videos with update
Table 2. BD-rate saving
Sequences | SSVC-woUpd vs SHVC | SSVC-wUpd vs SHVC
As shown in Table 2 and Fig. 6, the proposed SSVC-woUpd and SSVC-wUpd achieve better compression performance than the SHVC standard. The average BD-rate savings are 1.53% and 5.38%, respectively.
The proposed solutions perform well for all tested sequences, especially for surveillance videos containing low motion and a single object, e.g., Mainroad.
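For readers unfamiliar with the BD-rate metric [12], the following sketch shows how such a figure is typically computed: the logarithm of the bitrate is fitted as a cubic polynomial of PSNR for each codec, both fits are integrated over the common PSNR range, and the average gap is converted to a percentage. It assumes exactly four RD points per curve with distinct PSNR values, and all names (`bd_rate`, `_fit_cubic`, `_integral`) are hypothetical. The RD points in the usage example are synthetic and are not results from this paper.

```python
import math


def _fit_cubic(xs, ys):
    """Coefficients c0..c3 of the cubic through four points (Gaussian elimination)."""
    n = 4
    rows = [[x ** k for k in range(n)] + [y] for x, y in zip(xs, ys)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = rows[r][n] - sum(rows[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = acc / rows[r][r]
    return coeffs


def _integral(coeffs, lo, hi):
    """Definite integral of the polynomial with the given coefficients."""
    def antiderivative(x):
        return sum(c * x ** (k + 1) / (k + 1) for k, c in enumerate(coeffs))
    return antiderivative(hi) - antiderivative(lo)


def bd_rate(ref_rates, ref_psnr, test_rates, test_psnr):
    """Average bitrate difference (%) of test vs reference at equal PSNR.

    Negative values mean the test codec needs less bitrate (a saving).
    """
    # Centre the PSNR axis to keep the 4x4 system well conditioned.
    all_psnr = list(ref_psnr) + list(test_psnr)
    offset = sum(all_psnr) / len(all_psnr)
    ref_x = [p - offset for p in ref_psnr]
    test_x = [p - offset for p in test_psnr]
    ref_fit = _fit_cubic(ref_x, [math.log10(r) for r in ref_rates])
    test_fit = _fit_cubic(test_x, [math.log10(r) for r in test_rates])
    lo = max(min(ref_x), min(test_x))
    hi = min(max(ref_x), max(test_x))
    avg_diff = (_integral(test_fit, lo, hi) - _integral(ref_fit, lo, hi)) / (hi - lo)
    return (10 ** avg_diff - 1) * 100


# Synthetic example: the test codec uses 10% less bitrate at every PSNR point.
psnr = [32.0, 35.0, 38.0, 41.0]
ref_rates = [1000.0, 2000.0, 4000.0, 8000.0]
test_rates = [0.9 * r for r in ref_rates]
print(round(bd_rate(ref_rates, psnr, test_rates, psnr), 3))  # approximately -10.0
```

Under this convention, the reported savings of 1.53% and 5.38% would correspond to BD-rate values of about -1.53 and -5.38.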
5 Conclusions
In this paper, we have proposed an efficient video coding solution for visual surveillance systems. The proposed surveillance scalable video coding solution is developed on top of the SHVC standard and exploits the low-motion characteristics observed in surveillance videos through an adaptive long-term reference selection mechanism. As assessed, the proposed SSVC outperforms the SHVC standard, with an average BD-rate saving of around 5.38%. Future work can consider improving the accuracy of the long-term reference selection mechanism or taking into account the quality of the long-term reference.
Acknowledgement
This research is funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.01-2016.15.
References
[1] M. Valera and S. Velastin, "Intelligent distributed surveillance systems: A review," IEE Proceedings - Vision, Image and Signal Processing, vol. 152, no. 2, pp. 192-204, Apr. 2005.
[2] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, "Overview of the High Efficiency Video Coding (HEVC) Standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649-1668, Dec. 2012.
[3] T. Wiegand, G. J. Sullivan, G. Bjøntegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560-576, Jul. 2003.
[4] J. M. Boyce, Y. Ye, J. Chen, and A. K. Ramasubramonian, "Overview of SHVC: Scalable Extensions of the High Efficiency Video Coding (HEVC) Standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 26, no. 1, pp. 20-34, Jan. 2016.
[5] X. HoangVan, J. Ascenso, and F. Pereira, "Improving enhancement layer merge mode for HEVC scalable extension," in Picture Coding Symposium, Cairns, QLD, Australia, Jun. 2015.
[6] X. HoangVan, J. Ascenso, and F. Pereira, "Improving SHVC performance with a joint layer coding mode," in IEEE International Conference on Acoustics, Speech and Signal Processing, Shanghai, China, Mar. 2016.
[7] H. Schwarz, D. Marpe, and T. Wiegand, "Overview of the Scalable Video Coding Extension of the H.264/AVC Standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 9, pp. 1103-1120, Sept. 2007.
[8] X. G. Zhang, L. H. Liang, Q. Huang, Y. Z. Liu, T. J. Huang, and W. Gao, "An efficient coding scheme for surveillance videos captured by stationary cameras," IEEE International Conference on Visual Communication and Image Processing (VCIP), pp. 77442A1-10, 2010.
[9] X. Zhang, L. Liang, Q. Huang, T. Huang, and W. Gao, "A background model based method for transcoding surveillance videos captured by stationary camera," IEEE Picture Coding Symposium, Nagoya, Japan, pp. 78-81, 2010.
[10] X. Zhang, T. Huang, Y. Tian, and W. Gao, "Background-modeling-based adaptive prediction for surveillance video coding," IEEE Transactions on Image Processing, vol. 23, no. 2, pp. 769-784, 2014.
[11] PKU-SVD-A. [Online]. Available: http://mlg.idm.pku.edu.cn/resources/pku-svd-a.html
[12] G. Bjontegaard, "Calculation of average PSNR differences between RD curves," Doc. VCEG-M33, 13th ITU-T VCEG Meeting, Austin, TX, USA, Apr. 2001.