Spatio-Temporal Filtering For Image And Video Processing: Applications On Quality Enhancement, Coding and Data Pruning

A dissertation submitted in partial satisfaction of the requirements for the degree
Doctor of Philosophy
in
Image and Signal Processing
by
Dũng Trung Võ
Committee in charge:
Professor Truong Q. Nguyen, Chair
Professor Pamela C. Cosman
Professor William S. Hodgkiss
Professor Yoav Freund
Professor Alon Orlitsky
2009
All rights reserved.

The dissertation of Dũng Trung Võ is approved, and it is acceptable in quality and form for publication on microfilm and electronically:

Chair

University of California, San Diego

2009
Table of Contents

Signature Page
Table of Contents
List of Figures
List of Tables
Acknowledgements
Vita and Publications
Abstract of the Dissertation

1 Introduction
1.1 Image and Video Processing Systems
1.2 Coding Artifacts
1.3 Methods on Artifact Reduction
1.4 Motivation
1.5 Thesis Outline
1.5.1 Spatio-Temporal Filtering for Quality Enhancement
1.5.2 Spatio-Temporal Filtering for Coding
1.5.3 Data Pruning-Based Compression

2 Quality Enhancement for Motion JPEG using Temporal Redundancies
2.1 Translational Relation of DCT Coefficients
2.2 Quality Enhancement using Temporal Redundancies
2.3 Quality Enhancement for Real Video Sequences
2.4 Optimal Adaptive Filter for Arbitrary Number of Referenced Frames
2.5 Simulation Results
2.5.1 Motion Vectors for Enhancement Process
2.5.2 Enhancement in the Ideal Case
2.5.3 Enhancement in Real Video Sequences
2.6 Conclusions

3 Adaptive Fuzzy Filtering for Artifact Reduction in Compressed Images and Videos
3.1 Fuzzy Filter
3.2 Directional Fuzzy Spatial Filter
3.2.1 Directional Spread Parameter
3.3 … Temporal Filter
3.4 Motion Compensated Metric for Flickering Artifact Evaluation
3.5 Simulation Results
3.5.1 Enhancement for Compressed Images
3.5.2 Enhancement for Compressed Video Sequences
3.6 Conclusions

4 Optimal Motion-Compensated Spatio-Temporal Filter for Quality Enhancement and Coding of H.264/AVC Video Sequences
4.1 In-loop Motion Compensated Spatio-Temporal Filter
4.2 Optimal Motion Compensated Spatio-Temporal Filter
4.3 Overlapped Motion Compensation
4.4 Optimal Weight for Inter-frame Coding
4.5 Simulation Results
4.5.1 In-loop Motion Compensated Spatio-Temporal Filters
4.5.2 Optimal Motion Compensated Spatio-Temporal Linear Filter
4.5.3 Optimal Weight for Bi-predictive Coding
4.6 Conclusions

5 Selective Data Pruning-Based Compression using High Order Edge-Directed Interpolation
5.1 Rate-Distortion Relation
5.2 Data Pruning-Based Compression
5.3 Optimal Data Pruning
5.4 High Order Edge-Directed Interpolation
5.4.1 Single Frame-Based Interpolation
5.4.2 Multi-Frame-Based Interpolation
5.5 Simulation Results
5.5.1 High Order Edge-Directed Interpolation
5.5.2 Data Pruning-Based Compression
5.6 Conclusions

6 Conclusions and Future Works
6.1 Conclusions
6.1.1 Spatio-Temporal Filtering for Quality Enhancement
6.1.2 Spatio-Temporal Filtering for Coding
6.1.3 Spatio-Temporal Filtering for Data Pruning
6.2.2 Spatio-Temporal Filtering for Coding
6.2.3 Spatio-Temporal Filtering for Data Pruning
6.2.4 Spatio-Temporal Estimation for Video Processing

Bibliography
List of Figures

Figure 1.1: Block diagram of an image and video processing system
Figure 1.2: An example of blocking artifacts for one zoomed-in part of the 6th frame of the Foreman sequence
Figure 1.3: An example of blocking artifacts over columns of the 123rd row of Fig. 1.2
Figure 1.4: An example of ringing artifacts in a synthesized image
Figure 1.5: An example of ringing artifacts over columns of the 31st row of Fig. 1.4
Figure 1.6: An example of ringing artifacts in a zoomed-in part of the 6th frame of the Mobile sequence
Figure 1.7: An example of flickering artifacts
Figure 1.8: Comparison between coding artifacts and Gaussian noise
Figure 1.9: The correlation between the current frame of the compressed Mobile sequence and its surrounding frames
Figure 1.10: Block diagram of spatio-temporal filtering systems for quality enhancement
Figure 1.11: Block diagram of spatio-temporal filtering systems for coding
Figure 1.12: Block diagram of spatio-temporal filtering systems for data pruning
Figure 2.1: Translation between blocks of image x_s and x
Figure 2.2: The original and linearized quantization functions
Figure 2.3: Block diagram of the enhancement algorithm
Figure 2.4: MSE for motion vectors (m_0^b, n_0^b) = (0, 0) : (7, 7) and (m_0^f, n_0^f) = (−7, −7) : (0, 0)
Figure 2.5: Quality enhancement for the ideal case - 6th frame of the Mobile sequence
Figure 2.6: Quality enhancement for the ideal case - 6th frame of the Foreman sequence
Figure 2.7: Quality enhancement for the City sequence
Figure 2.8: PSNR improvement for City frames compressed with quantization matrix Q
Figure 2.9: PSNR for different options with integer pixel ME accuracy
Figure 2.10: PSNR for different options with half pixel ME accuracy
Figure 3.1: An example of directional JPEG artifacts with a scaling factor of 4 for the quantization step matrix
Figure 3.2: Angle and spread parameter for the directional fuzzy filter
Figure 3.3: Angles θ and θ_0 of the edge-based directional fuzzy filter
Figure 3.4: Flow chart of the directional fuzzy filter
Figure 3.5: Block diagram of the adaptive fuzzy MCSTF
Figure 3.8: Pixel classification for directional filtering
Figure 3.9: Pixel classification for directional filtering
Figure 3.10: Comparison of filtered results
Figure 3.11: Zoomed images for comparison of filtered results
Figure 3.12: Comparison on the contribution of spatial and directional adaptations
Figure 3.13: Zoomed images for comparison of Fig. 3.12
Figure 3.14: Comparison of filter results for MJPEG sequences
Figure 3.15: Zoomed views for images in Fig. 3.14
Figure 3.16: Comparison on PSNR of simulated methods for the Mobile sequence
Figure 3.17: Comparison on flickering artifacts of simulated methods for the Mobile sequence
Figure 3.18: Comparison of filter results for H.264 sequences
Figure 3.19: Comparison of PSNR for all frames in the Foreman sequence
Figure 3.20: Comparison of the flickering metric for all frames in the Foreman sequence
Figure 3.21: Comparison of PSNR with different bit-rates of the Foreman sequence
Figure 4.1: Block diagram of the H.264/AVC encoder with in-loop MCSTF
Figure 4.2: In-loop coding and enhancement for GOP IBP. The first row is the compressed sequence using the conventional encoding scheme, the last row is the compressed sequence using the encoding scheme with in-loop enhancement, and the middle rows explain step by step the encoding scheme with in-loop enhancement
Figure 4.3: Blocking artifacts of the motion compensated frames
Figure 4.4: Blocking artifacts in using cross-block and in-block cubics of MCSTF and MCTF
Figure 4.5: Overlapped blocks for motion compensation
Figure 4.6: Bi-predictive coding scheme
Figure 4.7: Comparison between conventional and proposed in-loop enhancement H.264/AVC codecs
Figure 4.8: Enhancement for the 3rd frame of the Foreman sequence
Figure 4.9: Zoomed-in part of Fig. 4.8
Figure 4.10: PSNR and flickering artifact comparison for frames in the Foreman sequence
Figure 4.11: Comparison in R-D curves for bi-predictive coding with different weight prediction options
Figure 4.12: Bi-predictive coding for the 19th frame of the Crew sequence
Figure 5.1: Block diagram of the data pruning-based compression
Figure 5.4: Block diagram of the single frame-based interpolation phase
Figure 5.5: Model parameters of sixth-order and eighth-order edge-directed interpolation
Figure 5.6: Block diagram of the proposed multi-frame-based interpolation for the case of upsampling with ratio 1×2
Figure 5.7: Model parameters of 9th order edge-directed interpolation
Figure 5.8: Comparison of NEDI-6 and NEDI-9 to other methods
Figure 5.9: Comparison of NEDI-8 to other methods
Figure 5.10: Comparison results for R-D curves of single frame data pruning-based compression
Figure 5.11: Comparison of NEDI-6 to other interpolation methods in the case of single frame data pruning-based compression
Figure 5.12: One zoomed-in part of Fig. 5.11
Figure 5.13: Comparison results for multi-frame data pruning-based compression
Figure 5.14: Comparison for H.264/AVC compression and optimal data pruning-based compression with same bit-rate and PSNR values
Figure 5.15: Comparison for H.264/AVC compression and optimal data pruning-based compression with same bit-rate and PSNR values
List of Tables

Table 2.1: PSNR enhancement in dB for ideal sequences
Table 2.2: Comparison in PSNR improvement for different scenarios
Table 3.1: Comparison of PSNR in units of dB for different methods
Table 3.2: Percentage of the classified pixels
Table 3.3: Comparison of PSNR in units of dB of different classified pixels and of spatial and directional adaptations
Table 4.1: Operation of each step in Fig. 4.2
Table 4.2: PSNR and bitrate values for bi-predictive coding options
Table 5.1: PSNR comparison (in dB)
Acknowledgements

First and foremost, I would like to express my deep admiration and true thanks to my advisor, Prof. Truong Nguyen. His kindness makes me feel valued and comfortable during my studies at UC San Diego. He motivates me every time I have a chance to discuss ideas or report progress with him. Undoubtedly, he inspires my research while still permitting me the freedom to search for new things.

I also want to take this opportunity to thank the committee members: Prof. Pamela C. Cosman, Prof. William S. Hodgkiss, Prof. Yoav Freund and Prof. Alon Orlitsky, for their time and valuable comments. The suggestions of the committee members from my qualifying exam substantially improved the quality of my thesis. Most of my knowledge on image processing comes from the course Digital Image Processing of Prof. Pamela Cosman in Fall 2005, and I would like to thank her for her insightful lectures.

During the summer of 2007 I did my internship under the mentoring of Dr. Sehoon Yea and Dr. Anthony Vetro at Mitsubishi Electric Research Laboratories. Their friendly and helpful support enriched my working experience and made me feel at home during the time I lived in Boston. I still remember Dr. Anthony Vetro running quickly through the company aisles to save time.

The work on data pruning-based compression would not have been possible without the help of Dr. Joel Solé and Dr. Peng Yin at Thomson Corporate Research, where I spent an internship at Princeton in the summer of 2008. Their support during that time and after I came back to UC San Diego encouraged me to keep working on this topic. I would like to send them great thanks for their approval of the topic and especially for the friendly discussions every Thursday.

I would like to thank Prof. Jong-Ki Han, his student Chan-Won Seo, and Daqian Jin from Motorola for their help in the project on video coding.

This work was also encouraged by my professors and colleagues at my prior university, the Ho Chi Minh City University of Technology. I am grateful to Assoc. Prof. Thuong Le-Tien, Dr. Chien Hoang-Dinh and Assoc. Prof. Thanh Vu-Dinh for their understanding and support.

My labmates entertained and inspired me during the time I studied at UC San Diego.

My motivation for quality enhancement lately comes from the beauty of the U.S. National Parks. The most perfect images remain there forever and will always be the target for image processing.

Finally, I dedicate my dissertation to my parents, Bố và Mẹ, for their unconditional and unbounded love, and also to my older sister, my brother-in-law and my younger sister, Chị, Anh, và Em gái, for their care of my parents during the last three and a half years.
Vita

1980 Born, Dong Thap, Viet Nam.
2002 Bachelor, Electrical and Electronics, Ho Chi Minh City University of Technology, Viet Nam.
2004 M.S., Electrical and Electronics, Ho Chi Minh City University of Technology, Viet Nam.
2002-2005 Teaching Assistant, Center For Overseas Studies, Ho Chi Minh City University of Technology, Viet Nam.
2002-present Lecturer, Faculty of Electrical and Electronics Engineering, Ho Chi Minh City University of Technology, Viet Nam.
2005-2009 Research Assistant, University of California, San Diego.
Summer 2007 Intern, Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA.
Summer 2008 Intern, Thomson Corporate Research, Princeton, New Jersey.
2009-present Sr. Research Engineer, Digital Media Solutions Lab, Samsung Information Systems America Inc., Irvine, CA.
2009 Ph.D., Electrical Engineering (Image and Signal Processing), University of California, San Diego.
PUBLICATIONS

Dũng T. Võ, Chan-Won Seo, Daqian Jin, Jong-Ki Han and Truong Q. Nguyen, "Optimal Motion-Compensated Spatio-Temporal Filter for Quality Enhancement and Coding of H.264/AVC Video Sequences", submitted to the IEEE Transactions on Image Processing, March 2009.

Dũng T. Võ, Joel Solé, Peng Yin, Cristina Gomila and Truong Q. Nguyen, "Selective Data Pruning-Based Compression using High Order Edge-Directed Interpolation", submitted to the IEEE Transactions on Image Processing, January 2009.

Dũng T. Võ, Truong Nguyen, Sehoon Yea, Anthony Vetro, "Adaptive Fuzzy Filtering For Artifact Reduction In Compressed Images And Videos", accepted for publication in the IEEE Transactions on Image Processing, 2008.

Dũng T. Võ, Truong Nguyen, "Quality Enhancement for Motion JPEG using Temporal Redundancies", the IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 5, pages 609-619, May 2008.

Thuong Le-Tien, Chien Hoang Dinh, Dinh Viet Hao, Dũng T. Võ, "Neural Networks-Based Equalizer Model Implemented To The DSP TMS320C6711", Journal of Science and Technology, Viet Nam, No. 40+41/2003, ISBN 0868-3980, Viet Nam, 2003.

Dũng T. Võ and Truong Nguyen, "Optimal Spatio-temporal Motion Compensated Filters for Quality Enhancement of H.264/AVC Compressed Sequences", submitted to the 2009 IEEE Conference on Image Processing.

Stanley H. Chan, Dũng T. Võ and Truong Q. Nguyen, "Subpixel Motion Estimation For Translational Motions", submitted to the 2009 IEEE Conference on Image Processing.

Dũng T. Võ, Joel Solé, Peng Yin, Cristina Gomila and Truong Q. Nguyen, "Data Pruning-Based Compression using High Order Edge-Directed Interpolation", to appear in the IEEE Conference on Acoustics, Speech and Signal Processing, Taiwan, 2009.

Dũng T. Võ, Truong Nguyen, "Directional Motion-Compensated Spatio-Temporal Fuzzy Filtering for Quality Enhancement of Compressed Video Sequences", the 15th IEEE International Conference on Image Processing, San Diego, CA, 2008.

Dũng T. Võ, Truong Nguyen, Sehoon Yea, Anthony Vetro, "Edge-based Directional Fuzzy Filter for Artifact Reduction in JPEG Images", the 15th IEEE International Conference on Image Processing, San Diego, CA, 2008.

"…Coding Artifacts Reduction", the 20th SPIE Visual Communications and Image Processing Conference, San Jose, CA, 2008.

Dũng T. Võ, Truong Q. Nguyen, "Quality Enhancement for Motion JPEG using Temporal Redundancies", the 14th IEEE International Conference on Image Processing, San Antonio, Texas, 2007.

Dũng T. Võ, Ryan S. Prendergast and Truong Q. Nguyen, "Filter-Banks Based Super-Resolution for Rotated and Blurry Under-Sampled Images", the 40th Asilomar Conference on Signals, Systems and Computers, Monterey Bay, CA, 2006.

Thuong Le-Tien, Dũng T. Võ, Tuan Nguyen-Thanh, Chi Nguyen-Duc, "Blind Digital Audio Watermarking Approach Using Spread Spectrum Technique", Conf. Proceedings COSCI 2005, HCMUT, Viet Nam, 2005.

Thuong Le-Tien, SungYoung Lee, Tuan Nguyen-Thanh, Dũng T. Võ, "Improving The Robustness Of Watermarking Approach For Copyright Protection", Conf. Proceedings CCN '04 (IASTED), ISBN 0-88986-429-2, Pages 144-148, Cambridge, MA, USA, 2004.

Thuong Le-Tien, SungYoung Lee, Tuan Nguyen Thanh, Dũng T. Võ, "A Feasible Solution For Image Watermarking Techniques", Conf. Proceedings, Viet Nam Conference on Radio and Electronics (REV'04), Pages 273-277, HaNoi, Viet Nam, 2004.

Thuong Le-Tien, SungYoung Lee, Dũng T. Võ, Tuan Nguyen-Thanh, "A Study on Digital Audio Watermarking using Spread Spectrum Techniques", Conf. Proceedings ISASE 2004, Pages 16-20, HCMC, Viet Nam, 2004.

Chien Hoang Dinh, Thuong Le-Tien, Dinh Viet Hao, Dũng T. Võ, "Equalizer Applying Neural Networks", The 8th Scientific and Technical Conference, HCMUT, Viet Nam, 2002.
PATENTS

"…", by Stanley Chan, Dũng Võ, Truong Nguyen, University of California, San Diego, February 2009.

"Methods and Apparatus for Video Image Data Pruning", Invention Disclosure was filed by Thomson Corporate Research in September 2008; a Provisional Patent was issued in January 2009.

"Filtering Artifacts in Images with 3D Spatio-Temporal Fuzzy Filters", Patent was filed by Mitsubishi Electric Research Laboratories in September 2007 and was published on January 24th, 2008 (No. US2008/0019605).

"Video Enhancement using Temporal Redundancies", Invention Disclosure was filed by Dũng Võ, Truong Nguyen, University of California, San Diego, May 2007.
Abstract of the Dissertation

Spatio-Temporal Filtering For Image And Video Processing: Applications On Quality Enhancement, Coding and Data Pruning

by

Dũng Trung Võ

Doctor of Philosophy in Image and Signal Processing

University of California San Diego, 2009

Professor Truong Q. Nguyen, Chair
In digital image and video processing, compression is required to reduce the number of bits needed to represent the signal. Spatial, temporal and visual redundancies are removed to achieve this goal. Although the encoded signal is more compact and easier to store or transmit, the correlation between pixels of images or video sequences is distorted. This causes coding artifacts, which degrade the visual quality of the signal and annoy viewers. Block-based compressed multi-dimensional signals are usually disturbed by spatial artifacts, such as blocking and ringing artifacts, and by temporal artifacts, such as mosquito and flickering artifacts. To reduce the visual effect of these coding artifacts, compressed images and video sequences should be enhanced prior to being sent to the displaying devices.

Quality enhancement plays a very important role in the post-processing phase. It is the first step in this phase and determines the performance of the following steps, such as up-scaling, frame rate up-conversion, or contrast enhancement. Previous methods on quality enhancement focused on separately improving the quality of each frame. In this way, the temporal consistency between frames is not guaranteed. Furthermore, the characteristics of temporal artifacts are not thoroughly studied and exploited for artifact removal. This dissertation investigates these characteristics and proposes novel methods to reduce both spatial and temporal artifacts, with applications in quality enhancement, coding and data pruning.
The dissertation starts by analyzing the use of information from surrounding frames in addition to the information in the current frame. First, a simple linear temporal filter is studied to investigate the quantization error between the filtered output and its original signal. The reduction of the quantization error verifies that using information from surrounding frames can enhance the quality of the current frame. Next, a non-linear fuzzy spatio-temporal filter is proposed to adapt to the characteristics of the coding artifacts, both for compressed images and for video sequences.

Once the proposed spatio-temporal filters are shown to effectively remove the artifacts, they are used in the encoding loop to produce enhanced reference frames in the encoding phase. These filters are then optimized at the encoder to maximize their performance in artifact reduction. For the special case when the filter coefficient for the pixel of interest is set to zero, the filter becomes an estimator or an interpolator. The dissertation extends the discussion to the case of implementing the optimal estimator in the encoding phase using multiple reference frames. Finally, an edge-directed interpolator is studied and used in data pruning-based compression, which can help reduce the bit-rate of the encoded bit-stream. This interpolator is applied to determine an effective way of dropping data before compression and to reconstruct the pruned signal back to its original form. The dissertation also proposes a novel metric for evaluating flickering artifacts. For each topic, extensive simulations are implemented to verify the effectiveness of the proposed methods.
1 Introduction

1.1 Image and Video Processing Systems
For millennia, people have searched for different ways to record their experiences: writing and antiquities helped posterity imagine their ancestors' life and journeys, while music and poems helped descendants sympathize with their ancestors' feelings. Though those tools were useful, it was only when modern still images and motion pictures were discovered that life could be visualized and reflected in the most truthful way. Entertainingly, synthetic photos and animated movies even raise the human imagination to illusive life. The word 'image' is defined in [1] as 'an artificial resemblance either in painting or sculpture'. Being an artificial resemblance, it should be produced in a suitable form in order to meet the needs of human beings. In the recent digital era, the aim is to map the multi-dimensional signal to an efficient representation with high originality and condensedness.

Fig. 1.1 presents a conventional image and video system. It lays out the processes from the noisy signal (after recording) to the processed signal (prior to displaying). First, the noisy signal is denoised in the Pre-Process phase. Then the Encode phase converts the signal to a more compressed form by removing spatial, temporal and visual redundancies. Because the noise is removed from the noisy signal in the Pre-Process phase, the new compressed form does not contain the noise representation and is thus more compact, compared to the compressed form obtained without the Pre-Process phase. The newly formed signal is transferred over the channel, which is symbolized by the Transmit phase. To be reconstructed into a displayable form, the encoded signal is de-compressed by the Decode phase. Although redundancy removal helps reduce the number of bits needed to represent the video content, it destroys the correlation between pixels and causes coding artifacts. The Post-Process phase is used to reduce these coding artifacts, as well as to adjust characteristics of the processed signal, such as its size and frame rate, into a desired form for displaying.

The aim of this dissertation is to discuss and propose novel systems which can reduce coding artifacts in compressed images and video sequences. The system modification is not limited to the Post-Process phase only.
1.2 Coding Artifacts
Block-based compressed signals suffer from blocking, ringing, mosquito and flickering artifacts, especially at low bit-rate coding. Blocking artifacts occur at the borders of neighboring blocks when each frame is processed independently in separate blocks with coarse quantization of discrete cosine transform (DCT) coefficients. Fig. 1.2 shows an example of blocking artifacts for one zoomed-in part of the 6th frame of the encoded Foreman sequence. This sequence is compressed using the Motion JPEG (MJPEG) codec with a scaling factor of 4 for the quantization step size matrix. Compared to the original frame in Fig. 1.2(a), the compressed frame in Fig. 1.2(b) is affected by blocking artifacts at the border pixels of the 8×8 blocks. Fig. 1.3 presents in detail the blocking artifacts over the 123rd row of Fig. 1.2. As seen in Fig. 1.3(b), the smoothness over block borders is destroyed and becomes a step function.

Ringing artifacts occur due to the loss of high frequencies when quantizing the transformed coefficients with a coarse quantization step. These high frequency components usually have smaller values than the low frequency components.
Figure 1.2: An example of blocking artifacts for one zoomed-in part of the 6th frame of the Foreman sequence. (a) Original; (b) compressed.
However, they are quantized with a higher quantization step size than the low frequency components. This makes the quantized values of the high frequency components tend toward zero. Fig. 1.4 shows an example of ringing artifacts in a synthesized image. The image is compressed using the JPEG standard with a scaling factor of 4 for the quantization step size matrix. Fig. 1.5 shows in detail the ringing artifacts over the 31st row of Fig. 1.4. Compared to the shape of the original edges in Fig. 1.5(a), the shape of the compressed edges with ringing artifacts in Fig. 1.5(b) is similar to the Gibbs phenomenon [2]; in both cases, the high frequency components are removed. Fig. 1.6 shows the ringing artifacts in the 6th frame of the Mobile sequence, which has edges with complicated shapes. This sequence is compressed using the MJPEG standard with the standard quantization step size matrix. As can be seen in the compressed frame in Fig. 1.6(b), the ringing affects the visual quality more seriously in the detail areas and is most prevalent along the strong edges.
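To make this mechanism concrete, the following minimal Python sketch (illustrative code written for this text, not code from the dissertation; the step size value is an arbitrary assumption) quantizes the orthonormal DCT-II coefficients of a one-dimensional step edge with a coarse uniform step. Reconstructing from the truncated coefficients produces the Gibbs-like oscillations described above.

```python
import numpy as np

def dct_matrix(N):
    # Orthonormal DCT-II basis: row u holds the u-th cosine basis vector.
    n = np.arange(N)
    M = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    M[0, :] *= 1.0 / np.sqrt(2.0)
    return M * np.sqrt(2.0 / N)

N = 8
edge = np.array([0, 0, 0, 0, 255, 255, 255, 255], dtype=float)  # ideal edge
M = dct_matrix(N)
X = M @ edge                 # forward DCT-II
q = 100.0                    # coarse quantization step (assumed value)
X_q = q * np.round(X / q)    # uniform quantization, as in JPEG
rec = M.T @ X_q              # inverse transform (M is orthonormal)
print(np.round(rec, 1))      # reconstruction oscillates around the edge
```

The small high-frequency coefficients round to zero under the coarse step, which is exactly the loss that produces the ringing pattern in the reconstructed edge.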
On the other hand, mosquito artifacts come from the ringing artifacts of many individually compressed frames when displayed in a sequence. For inter-coded frames, mosquito artifacts become more annoying for blocks on the boundary of moving objects and backgrounds, which have significant inter-frame prediction errors in the residual signal [3].

Flickering artifacts happen due to the inconsistency in quality over frames at the same spatial position. These flickering artifacts [4] [5] [6] are caused by the difference of the quantization step size for blocks at the same location over frames.
Figure 1.3: An example of blocking artifacts over columns of the 123rd row of Fig. 1.2, plotted against column number. (a) Original; (b) compressed.
Figure 1.4: An example of ringing artifacts in a synthesized image. (a) Original image; (b) compressed.

Figure 1.5: An example of ringing artifacts over columns of the 31st row of Fig. 1.4, plotted against column number. (a) Original; (b) compressed.
Figure 1.6: An example of ringing artifacts in a zoomed-in part of the 6th frame of the Mobile sequence. (a) Original image; (b) compressed.

Figure 1.7: An example of flickering artifacts.
This parameter is determined by optimizing the rate-distortion function only for the current frame, which ignores the previously coded frames. Fig. 1.7 shows one example of the temporal consistency in quality. The temporal difference between the values of the original pixels at one location (m, n) of the previous frame O(t − 1) and the current frame O(t) is defined as ∆O(t, m, n). Because of compression, the original values of pixels O(t − 1, m, n) and O(t, m, n) are distorted to I(t − 1, m, n) and I(t, m, n), respectively. The temporal difference between I(t − 1, m, n) and I(t, m, n) is defined as ∆I(t, m, n). If ∆I(t, m, n) ≠ ∆O(t, m, n), the temporal difference of compressed pixels over frames is not the same as the temporal difference of the original pixels over frames. In this case, there exists a temporal inconsistency in the compressed sequence, which is called a flickering artifact.
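As a concrete illustration, the pixel-wise comparison of ∆I and ∆O can be coded directly. This is a minimal sketch written for this text; the tolerance parameter is a hypothetical choice, not part of the dissertation's definition.

```python
import numpy as np

def flickering_map(o_prev, o_cur, i_prev, i_cur, tol=0.5):
    """Flag locations (m, n) where the compressed temporal difference
    Delta_I(t, m, n) deviates from the original one Delta_O(t, m, n)."""
    delta_o = o_cur.astype(float) - o_prev.astype(float)
    delta_i = i_cur.astype(float) - i_prev.astype(float)
    return np.abs(delta_i - delta_o) > tol   # True marks potential flicker
```

Summing this map over a frame gives a crude count of temporally inconsistent pixels; Chapter 3 proposes a more careful metric that also models the tracking behavior of human eyes.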
Blocking and ringing artifacts degrade the visual spatial quality, while mosquito and flickering artifacts reduce the visual temporal quality. MJPEG and MPEG-2 encoded sequences suffer from blocking and ringing artifacts due to the large processed block sizes of 8×8 or 16×16 pixels for both motion estimation (ME) and the discrete cosine transform (DCT). While H.264/AVC compressed sequences [7] are less affected by blocking artifacts thanks to the deblocking in-loop filter, and are more resistant to ringing artifacts by implementing 4×4 block-based ME and an integer transform, there is still an inconsistency in their temporal quality. Examples of compressed sequences with coding artifacts can be found at: http://videoprocessing.ucsd.edu/∼dungvo/dissertation.html.
1.3 Methods on Artifact Reduction
Many filter-based denoising methods have been proposed to reduce coding artifacts, most of which are frame-based enhancements for blocking and ringing artifact reduction. One method to reduce blocking artifacts is to use the lapped orthogonal transform (LOT) [8], which increases the dependence between adjacent blocks. As the LOT-based approach is incompatible with the JPEG standard, many other approaches consider pixel-domain and DCT-domain post-processing techniques. These include low-pass filtering [9], adaptive median filtering [10] and nonlinear spatially variant filtering [11], which were applied to remove the high frequencies caused by sharp edges between adjacent blocks. Other pixel-based methods are constrained least squares (CLS) [12] and the maximum a posteriori probability approach (MAP) [13], all of which require many iterations with high computational load. In [14], a projections onto convex sets (POCS) based method was proposed with multi-frame constraint sets to reduce the blocking artifacts. This method required extracting the motion between frames and quantization information from the video bit-stream. Xiong et al. [15] considered the artifact caused by discontinuities between adjacent blocks as quantization noise and used overcomplete wavelet representations to remove this noise. The edge information is preserved by exploiting cross-scale correlation among wavelet coefficients. The algorithm yields improved visual quality of the decoded image over the POCS and MAP methods and has lower complexity. In H.264/AVC, an adaptive deblocking filter [16] was proposed to selectively filter the artifacts at the coded block boundaries. Substantial objective and subjective quality improvement was achieved by this deblocking filter.

The DCT-based methods for blocking artifact removal adjust the quantized DCT coefficients to reduce the quantization error. Tien and Hang [17] made the assumption that the quantization errors are strongly correlated to the quantized coefficients in high contrast areas. Their algorithm first compares the DCT coefficients to pre-trained quantized coefficient representatives to get the best match, then adds the corresponding quantized error pattern to reconstruct the original DCT coefficient. This method requires a large pre-defined set of quantized coefficient representatives and provides only a slight PSNR gain. In another approach, Jeon and Jeong [18] defined a block boundary discontinuity and compensated selected DCT coefficients to minimize this discontinuity. To restore the stationarity of the original image, Nosratinia [19] averaged the original decoded image with its 15 displacements. These displacements are calculated by compressing, decompressing and translating back shifted versions of the original decoded images. With an assumption of small changes of neighboring DCT coefficients at the same frequency in a small region, Chen et al. [20] applied an adaptively weighted low-pass filter to the transform coefficients of the shifted blocks. The window size is determined by the block activity, which is characterized by human visual system (HVS) sensitivity at different frequencies. A new type of shifted blocks across any two adjacent blocks was constituted by Liu and Bovik [21]. They also defined a blind measurement of the local visibility of the blocking artifact. Based on this visibility and the primary edges of the image, the block edges were divided into three categories and were processed by corresponding effective methods.

To reduce ringing artifacts, Hu et al. [22] used a Sobel operator for edge detection, then applied a simple low-pass filter to pixels near these edges. Using a similar process, Kong et al. [23] established an edge map with smooth, edge and texture blocks by considering the variance of a 3×3 window centered on the pixel of interest. Only edge blocks are processed with an adaptive filter to remove the ringing artifacts close to the edges. Oguz et al. [24] detected the ringing artifact areas that are most prominent to the HVS by binary morphological operators. Then a gray-level morphological nonlinear smoothing filter is applied to these regions. Although lessening the ringing artifacts, these methods do not solve the problem completely because the high frequency components of the resulting images are not reconstructed. As an encoder-based approach, [25] proposed a noise shaping algorithm to find the optimal DCT coefficients which adapts to the noise variances in different areas. All of these methods can only reduce ringing artifacts within each frame. To deal with the temporal characteristic of mosquito artifacts, [26] applied a spatio-temporal median filter in the transform domain for surrounding 8×8 blocks. The improvement in this case is limited by the small correlation between DCT coefficients of spatially neighboring 8×8 blocks, as well as by the lack of motion compensation in the scheme.

For flickering artifact removal, most of the current methods focus on reducing flickering artifacts in all-intra-frame coding. In [5], the quantization error is considered to obtain the optimal intra prediction mode and to help reduce the flickering artifact. Also for intra-frame coding, [6] included a flickering artifact term in the cost function to find the optimal prediction and block-size mode. A similar scheme is implemented in [4] for flickering reduction in Motion JPEG 2000. Note that all of these approaches are encoder-based.
1.4 Motivation
Visual effects of coding artifacts vary from one codec to another, but these artifacts always have directional and data-dependent properties. Blocking, ringing and flickering artifacts are spatially or temporally directed. Because of the block-based compression, blocking artifacts occur in the horizontal and vertical directions at the pixels on the borders between two blocks. Furthermore, due to the loss of high frequency components during coarse quantization, ringing artifacts appear along the strong edges of the compressed frame. Flickering artifacts happen at the same spatial location in the temporal direction.
Figure 1.8: Comparison between coding artifacts and Gaussian noise. (a) Original frame; (b) noisy frame; (c) compressed frame.

Figure 1.9: The correlation between the current frame of the compressed Mobile sequence and its surrounding frames.
se-The data dependent property is from the usage of the same quantizationstep size matrix for every block Coding artifacts degrade the visual quality more
in detail areas than in flat areas These characteristics make the coding facts different than the recording noise, where it is usually assumed as additivezero-mean white Gaussian noise [27] [28] [29] Fig 1.8 shows the differences be-tween additive Gaussian noise and coding artifacts in MJPEG compression Thenoisy frame in Fig 1.8(b) is degraded by additive Gaussian noise with variance of
arti-0.01 while the compressed frame in Fig 1.8(c) is encoded with scaling factor of
4 for the quantization step size matrix Comparing to the original in Fig 1.8(a),the Gaussian noise is uniformly distributed over pixels while the coding artifacts
is nonuniformly distributed This implies that these coding artifacts should betreated in a different way other than methods for denoising in pre-processing
In order to efficiently reduce temporal artifacts such as mosquito and flickering artifacts, not only the spatial correlation among pixels but also the temporal one needs to be incorporated. Fig. 1.9 shows the correlation between the 5th frame of the compressed Mobile sequence and its surrounding frames. Compared to the auto-correlation of the current frame, the cross-correlation between the center frame and its surrounding frames is still rather large when the frame distance is small. Using extra information from temporally neighboring samples, such as pixels of surrounding frames in video sequences, can further enhance the quality of compressed video sequences.
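A curve like the one in Fig. 1.9 can be computed with a few lines of code. The sketch below is illustrative code written for this text, using the standard normalized cross-correlation coefficient as one plausible choice of measure.

```python
import numpy as np

def correlation_coefficient(f1, f2):
    # Normalized cross-correlation between two equally sized frames.
    a = f1.astype(float).ravel() - f1.mean()
    b = f2.astype(float).ravel() - f2.mean()
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Example: correlation of frame 5 with its neighbors at distances -4..4:
# curve = [correlation_coefficient(frames[5], frames[5 + d]) for d in range(-4, 5)]
```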
Among the described artifacts, blocking artifacts are the easiest to remove because their location is known: they always occur at the block borders. For the other artifacts, it is difficult to locate the artifacts and separate them from the original signal. When multi-frame-based enhancement is used, additional information from surrounding frames does not always guarantee a better improvement in quality than using only the information from the current frame. Furthermore, in some cases both single-frame-based enhancement and multi-frame-based enhancement fail to improve the quality of the compressed frame. Because of the nonuniformly distributed characteristic of the coding artifacts, the enhancement algorithm for detail areas must be different from the one for flat areas. These matters must be addressed in methods for quality enhancement.
1.5 Thesis Outline
The dissertation addresses three main topics: spatio-temporal filtering for quality enhancement, coding, and data pruning in image and video processing. The dissertation is organized as follows. Spatio-temporal filtering methods for quality enhancement are presented in Chapter 2 and Chapter 3. The topic of spatio-temporal filtering for coding is discussed in Chapter 4. Data pruning-based compression is proposed in Chapter 5. Finally, Chapter 6 gives concluding remarks and future research.

The first topic discusses the spatio-temporal filtering methods that process the decoded signal for the purpose of coding artifact reduction.
Figure 1.10: Block diagram of spatio-temporal filtering systems for quality enhancement.

Figure 1.11: Block diagram of spatio-temporal filtering systems for coding.
The temporal quantization error is first analyzed for the case of temporal filtering in Chapter 2. Based on the specific characteristics of coding artifacts, novel non-linear spatio-temporal filters are proposed in Chapter 3 to effectively remove these artifacts.
Motion Compensated Temporal Linear Filtering
Chapter 2 demonstrates the effectiveness of using extra temporal information in quality enhancement. To simplify the analysis, a simple averaging motion compensated temporal filter (MCTF) is considered for cases of pure translational motion, and the mean squared error (MSE) between the temporally filtered block and the original block is calculated. Quality enhancement is achieved if this MSE is less than the MSE between the compressed block and the original block. This leads to an enhancement condition. For real video sequences, if the residual between the motion compensated blocks from surrounding frames and the block in the current frame is large, the reconstructed block after temporal filtering will deviate further from the original block. A more practical condition is discussed to take into account the residual signal. This chapter also generalizes the MCTF to cases using an arbitrary number of surrounding frames and optimizes the filters to obtain maximum enhancement.

Chapter 2, in full, is a reprint of the material as it appears in Dũng T. Võ, Truong Nguyen, "Quality Enhancement for Motion JPEG using Temporal Redundancies", the IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 5, pp. 609-619, May 2008.
Adaptive Fuzzy Motion Compensated Spatio-Temporal Filtering
For high compression levels, the averaging MCTF of Chapter 2, which works well in flat areas, can blur the detailed areas. In such cases, a non-linear filter with input-dependent coefficients should be used to avoid the blurring effect. Chapter 3 proposes fuzzy spatio-temporal filters to adaptively remove the coding artifacts. Fuzzy filters have a real-valued spatial-rank relation and depend on the spread of the signal. They average pixels in flat areas while retaining the isolated pixels in edge areas. In image and video compression, artifacts such as blocking or ringing artifacts are spatially directional, and flickering artifacts are temporally directional. The fuzzy motion compensated spatio-temporal filter (MCSTF) in this chapter is proposed to be adaptive to the activity of the pixel of interest and to the relative value and relative position between the pixel of interest and its surrounding pixels. The two adaptations are detailed below; a sketch of the basic fuzzy weighting follows the list.
+ Directional Fuzzy Filtering for Image Enhancement: in compressed images, such as JPEG or JPEG-2000 images, the ringing artifacts are prevalent along the strong edges. To avoid blurring these real edges, the proposed directional spatial filter applies the strongest smoothing in the direction perpendicular to the edge, where the ringing artifacts are likely to have no relation to the pixels of interest, and a weaker filtering in the edge direction. Because detail areas tend to have more ringing artifacts than flat areas, the directional spatial filter is proposed to adapt its strength to the variance of pixels in a local window.

+ Adaptive Fuzzy Filtering for Video Enhancement: to deal with the temporally directional characteristic of the flickering artifacts, the fuzzy filter is based on the cross-correlation between the window centered at the pixel of interest and the window centered at its surrounding pixel to determine the position-dependent scaling factor of the spread parameter. The amplitude of the spread parameter is controlled using the variance of pixels in a spatial window centered at the pixel of interest. To increase the correlation between pixels, the surrounding frames are motion compensated before applying the spatio-temporal filter. A new metric which considers the tracking characteristic of human eyes is also proposed to evaluate the flickering artifacts.
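The sketch below illustrates the basic fuzzy weighting on a single window. It is a simplified Gaussian-membership form written for this text; the dissertation's filter additionally adapts the spread per direction, per temporal position and to the local pixel variance.

```python
import numpy as np

def fuzzy_filter_center(window, spread):
    """Weight each sample by its amplitude similarity to the center pixel,
    so flat areas are averaged while strong edges are preserved."""
    center = window[window.shape[0] // 2, window.shape[1] // 2]
    weights = np.exp(-((window - center) ** 2) / (2.0 * spread ** 2))
    return float((weights * window).sum() / weights.sum())
```

Scaling `spread` with the local activity makes the smoothing stronger in the detailed, artifact-prone areas, which is the spirit of the spatial adaptation described above.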
Chapter 3, in full, is a reprint of the material as it appears in Dũng T. Võ, Truong Nguyen, Sehoon Yea, Anthony Vetro, "Adaptive Fuzzy Filtering For Artifact Reduction In Compressed Images And Videos", accepted in January 2009 for publication in the IEEE Transactions on Image Processing.
The application of spatio-temporal filtering for coding is discussed in Chapter 4. In this chapter, the post filters of Chapter 2 and Chapter 3 are used in the encoder as a spatio-temporal in-loop filter. This helps to increase the consistency in temporal quality by further reducing temporal artifacts such as mosquito and flickering artifacts. With the availability of the original frames at the encoder, optimal linear spatio-temporal filters can be obtained by minimizing the MSE between the reconstructed signal and the original signal. These filter coefficients are then sent to the decoder for post-processing purposes. When the filter coefficient of the pixel of interest is set to zero, the filter output becomes an estimated version of the pixel of interest. This estimated value is used for prediction or interpolation. This chapter also discusses the special case of the optimal spatio-temporal filter when it is simplified to a temporal filter with two surrounding frames and the filter coefficient for the pixel of interest is set to zero. In this case, it becomes the optimal weight for the Bi-predictive (B) frame.
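Minimizing the MSE over linear filter coefficients is an ordinary least-squares problem. The following sketch is illustrative code written for this text, with hypothetical input shapes, showing the idea rather than the dissertation's exact formulation.

```python
import numpy as np

def optimal_weights(supports, targets):
    """supports: K x L matrix, one row per pixel holding its decoded
    spatio-temporal filter support; targets: the K original pixel values.
    Returns the L filter coefficients minimizing the reconstruction MSE."""
    w, residuals, rank, sv = np.linalg.lstsq(supports, targets, rcond=None)
    return w
```

Setting the coefficient of the pixel of interest to zero and re-solving turns the same machinery into the optimal estimator or interpolator mentioned above.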
Chapter 4, in full, is a reprint of the material as it appears in Dũng T. Võ, Chan-Won Seo, Daqian Jin, Jong-Ki Han and Truong Q. Nguyen, "Optimal Motion-Compensated Spatio-Temporal Filter for Quality Enhancement and Coding of H.264/AVC Video Sequences", submitted to the IEEE Transactions on Image Processing, March 2009.
Chapter 5 discusses a special application of spatio-temporal filters in interpolation. A novel high-order edge-directed interpolation scheme is proposed to determine the optimal way to conduct data pruning and to reconstruct the signal. Data is pruned to reduce the frame size so that a lower bit-rate is required for compression. This means that for the same quality level, a lower compression level with fewer coding artifacts can be used. The edge-directed interpolation is adapted to the data pruning and is discussed for both image and video up-scaling.
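A toy sketch of the prune-then-interpolate idea follows. It is written for this text; the chapter's reconstruction uses a high-order edge-directed interpolator, not the simple linear stand-in used here.

```python
import numpy as np

def prune_columns(frame):
    # Drop every other column before encoding to halve the frame width.
    return frame[:, ::2]

def restore_columns(pruned, width):
    # Linear interpolation stand-in for the edge-directed reconstruction.
    known = np.arange(0, width, 2)[: pruned.shape[1]]
    grid = np.arange(width)
    return np.stack([np.interp(grid, known, row)
                     for row in pruned.astype(float)])
```

Encoding the pruned frame costs fewer bits, so a milder quantization can be afforded at the same rate; the quality of the final output then hinges on how well the dropped samples are interpolated back.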
Chapter 5, in full, is a reprint of the material as it appears in Dũng T. Võ, Joel Solé, Peng Yin, Cristina Gomila and Truong Q. Nguyen, "Selective Data Pruning-Based Compression using High Order Edge-Directed Interpolation", submitted to the IEEE Transactions on Image Processing, January 2009.
2 Quality Enhancement for Motion JPEG using Temporal Redundancies
Motion JPEG (MJPEG) separately compresses each frame of a video sequence in JPEG format. Compared to MPEG, MJPEG has a lower attainable compression level but is also less computationally complex. It is popularly used for non-linear editing, which requires easy access to any frame. Another application of MJPEG is in medical imaging, which requires high quality images and error resilience. The disadvantage of MJPEG is that it does not exploit the temporal redundancies of successive frames to achieve higher compression; consequently, MJPEG has a higher bit rate than MPEG for the same quality. Quality enhancement for MJPEG has until now focused on improving the quality of each single JPEG frame.

Due to the strong correlation between successive frames in video sequences, information from neighboring frames can be used to reduce the quantization error resulting from DCT coefficient truncation in each frame. This chapter proposes a novel method of using the previous and future frames to enhance the quality of the current frame. Motion vectors between these decoded frames are found and used to estimate displacements of the current frame, which are then averaged to yield the final reconstructed frame. This is equivalent to a motion compensated temporal filter (MCTF) for aligned blocks. Previous MCTFs have been used in pre-filter denoising [30] [31] and in scalable video coding (SVC) [32] [33].
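A minimal sketch of this three-frame averaging for one aligned block follows (illustrative code written for this text; the motion vectors are assumed to be given and in-bounds).

```python
import numpy as np

def mctf_average(prev, cur, nxt, top, left, mv_b, mv_f, N=8):
    """Average the current N x N block with its motion-compensated matches
    in the previous and future decoded frames (a three-tap temporal mean)."""
    pb = prev[top + mv_b[0]: top + mv_b[0] + N,
              left + mv_b[1]: left + mv_b[1] + N]   # backward match
    pf = nxt[top + mv_f[0]: top + mv_f[0] + N,
             left + mv_f[1]: left + mv_f[1] + N]    # forward match
    pc = cur[top: top + N, left: left + N]          # current block
    return (pb.astype(float) + pc + pf) / 3.0
```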
Figure 2.1: Translation between blocks of image $x_s$ and $x$.
This chapter is organized as follows: Section 2.1 formulates the translational relation in the DCT domain and develops an algorithm to find a displacement of one block. The enhancement process for the case of pure translational video sequences is described in detail in Section 2.2. This section also analyzes the error between the reconstructed and original blocks. Extensions for applications to video sequences are discussed in Section 2.3. Section 2.4 generalizes the enhancement process; it uses an arbitrary number of referenced frames and designs an optimal filter providing the minimal error. Section 2.5 presents simulation results and comparisons to other approaches. Finally, Section 2.6 concludes the chapter and discusses future directions related to this work.
2.1 Translational Relation of DCT Coefficients
This section derives the relation in the DCT domain for shifted blocks of different frames. In MJPEG, each frame is processed in separate blocks of size $N \times N$ ($N = 8$). Assume that block $x_s$ matches a portion of image $x$ starting at pixel $(m_0, n_0)$, possibly located among four adjacent blocks ($i$, $ii$, $iii$, $iv$) as in Fig. 2.1, i.e.,

$$x_s(m, n) = x(m + m_0, n + n_0), \quad m, n = 0, \ldots, N - 1. \qquad (2.1)$$
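In code, extracting such a shifted block is a plain sub-array read; the point of Fig. 2.1 is that for $0 < m_0, n_0 < N$ the support straddles four coded blocks. The sketch below is illustrative code written for this text.

```python
import numpy as np

def shifted_block(x, m0, n0, N=8):
    # The N x N block starting at (m0, n0); for 0 < m0, n0 < N its support
    # overlaps the four neighboring coded N x N blocks (i)-(iv) of frame x.
    return x[m0: m0 + N, n0: n0 + N]
```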
The DCT transform (Type II) of $x_s$ can be obtained by [34]. In (2.2), if $x$ is replaced by its DCT coefficients $X$, $X_s$ can be calculated based on the DCT coefficients of the four blocks ($i$, $ii$, $iii$ and $iv$) of image $x$ or, more generally, based on a function $F$.

Figure 2.2: The original and linearized quantization functions.

After the DCT transform, the quantization process truncates the DCT coefficients $X(u, v)$ to obtain the output $X_q(u, v)$ with error $\Delta X_q(u, v)$, where

$$X_q(u, v) = Q(u, v) \times \operatorname{round}\!\left(\frac{X(u, v)}{Q(u, v)}\right) \qquad (2.8)$$

and $Q(u, v)$ is the quantization step size matrix. The same quantization process is applied to $X_s$ to get $X_{s,q}$. Because of the non-linear characteristic of the quantization function, the error $\Delta X_q(u, v)$ prevents the quantized DCT coefficients of the original block $X_q$ and of the shifted block $X_{s,q}$ from satisfying (2.3). Linearizing the quantization function as shown in Fig. 2.2, a displacement $X^d_{s,q}$ of $X_{s,q}$ can be obtained such that, in the pixel domain,

$$x^d_{s,q}(m, n) = x_q(m + m_0, n + n_0). \qquad (2.10)$$

In summary, if the motion vectors between blocks are known, a translational relation can be established as in (2.10) to get a displacement $x^d_{s,q}$ of the current block $x_{s,q}$. The next section presents a method to use this displacement in reducing the blocking and ringing artifacts of $x_{s,q}$.

2.2 Quality Enhancement using Temporal Redundancies

Figure 2.3: Block diagram of the enhancement algorithm.

Fig. 2.3 shows the block diagram of the enhancement algorithm. The IDCT blocks transform the DCT coefficients to the pixel domain, and motion estimation (ME) blocks are used to find the motion vectors between blocks of frames. Assume that the present encoded block is shifted from one block of the previous frame by $(m^b_0, n^b_0)$ pixels and from another block of the future frame by $(m^f_0, n^f_0)$ pixels. The backward estimated version $x^b_q(t, m, n)$ and forward estimated version $x^f_q(t, m, n)$ of the compressed block $x_q(t, m, n)$ are calculated by

$$x^b_q(t, m, n) = x_q(t - 1, m + m^b_0, n + n^b_0) \qquad (2.11)$$

and

$$x^f_q(t, m, n) = x_q(t + 1, m + m^f_0, n + n^f_0). \qquad (2.12)$$

The matching blocks in the previous and future frames can be any shifted blocks, not just the encoded blocks. Consequently, an averaging scheme is used to obtain the final processed block,

$$\bar{x}_q(t, m, n) = \frac{1}{3}\left[x^b_q(t, m, n) + x_q(t, m, n) + x^f_q(t, m, n)\right], \qquad (2.13)$$

which is equivalent to a temporal filtering of aligned blocks of different frames.

Since the DCT is an orthonormal transform, the mean squared error (MSE) between the estimated and original blocks can be calculated from their DCT coefficients. The error between the original and estimated DCT coefficients is defined by

$$\bar{E}_q(t, u, v) = \bar{X}_q(t, u, v) - X(t, u, v) = \frac{1}{3}\left[E^b_q(t, u, v) + E_q(t, u, v) + E^f_q(t, u, v)\right] \qquad (2.14)$$

where

$$E^b_q(t, u, v) = X^b_q(t, u, v) - X(t, u, v), \qquad (2.15)$$

$$E_q(t, u, v) = X_q(t, u, v) - X(t, u, v) \qquad (2.16)$$

and

$$E^f_q(t, u, v) = X^f_q(t, u, v) - X(t, u, v). \qquad (2.17)$$

The total error comes from the quantization error of the current block and from the errors of using the past and future displacements of the current block. At a specific frequency $(u_0, v_0)$, the backward error can be written as in (2.18), where $X_{i,q}$, $X_{ii,q}$, $X_{iii,q}$, $X_{iv,q}$ are the quantized DCT coefficients of blocks ($i$, $ii$, $iii$, $iv$), respectively, and the $K^b_l(u, v)$ terms are weighting factors that follow from the translational relation of Section 2.1. Eq. (2.18) shows that the backward error is caused by the quantization errors of blocks ($i$, $ii$, $iii$, $iv$) weighted by $K^b_l$. An equivalent expression holds for $E^f_q$. Because of the linearization of the quantization function, it can be assumed that $E_q(u_0, v_0)$ is uniformly distributed with zero mean and variance $\sigma^2_{E_q}(u_0, v_0)$.

Consequently, from (2.14), (2.18) and (2.26), the MSE of the reconstructed block is the expectation of the sum of the squared errors over all $N \times N$ pixels,

$$\mathrm{MSE} = E\left[\sum_{m=0}^{N-1}\sum_{n=0}^{N-1}\left(\bar{x}_q(t, m, n) - x(t, m, n)\right)^2\right].$$

In this section, an algorithm to enhance the quality of MJPEG was presented and analyzed. To achieve MSE enhancement, the motion vectors must lead to a variance $\sigma^2_{E_q}(u_0, v_0)$ fulfilling the condition (2.29), which will be considered in detailed simulations in Section 2.5.
2.3 Quality Enhancement for Real Video Sequences
In real video sequences, a pure relation as in (2.1) is rarely satisfied. A more suitable relation between blocks needs to be considered in this case. Assume that the present encoded block is shifted from one block of the previous frame by