Spatio-Temporal Filtering For Image And Video Processing: Applications On Quality Enhancement, Coding and Data Pruning

A dissertation submitted in partial satisfaction of the requirements for the degree
Doctor of Philosophy
in
Image and Signal Processing
by
Dũng Trung Võ
Committee in charge:
Professor Truong Q. Nguyen, Chair
Professor Pamela C. Cosman
Professor William S. Hodgkiss
Professor Yoav Freund
Professor Alon Orlitsky
2009
All rights reserved.

The dissertation of Dũng Trung Võ is approved, and it is acceptable in quality and form for publication on microfilm and electronically:

Chair

University of California, San Diego

2009
Table of Contents

Signature Page
Table of Contents
List of Figures
List of Tables
Acknowledgements
Vita and Publications
Abstract of the Dissertation

1 Introduction
1.1 Image and Video Processing Systems
1.2 Coding Artifacts
1.3 Methods on Artifact Reduction
1.4 Motivation
1.5 Thesis Outline
1.5.1 Spatio-Temporal Filtering for Quality Enhancement
1.5.2 Spatio-Temporal Filtering for Coding
1.5.3 Data Pruning-Based Compression

2 Quality Enhancement for Motion JPEG using Temporal Redundancies
2.1 Translational Relation of DCT Coefficients
2.2 Quality Enhancement using Temporal Redundancies
2.3 Quality Enhancement for Real Video Sequences
2.4 Optimal Adaptive Filter for Arbitrary Number of Referenced Frames
2.5 Simulation Results
2.5.1 Motion Vectors for Enhancement Process
2.5.2 Enhancement in the Ideal Case
2.5.3 Enhancement in Real Video Sequences
2.6 Conclusions

3 Adaptive Fuzzy Filtering for Artifact Reduction in Compressed Images and Videos
3.1 Fuzzy Filter
3.2 Directional Fuzzy Spatial Filter
3.2.1 Directional Spread Parameter
3.3 … Temporal Filter
3.4 Motion Compensated Metric for Flickering Artifact Evaluation
3.5 Simulation Results
3.5.1 Enhancement for Compressed Images
3.5.2 Enhancement for Compressed Video Sequences
3.6 Conclusions

4 Optimal Motion-Compensated Spatio-Temporal Filter for Quality Enhancement and Coding of H.264/AVC Video Sequences
4.1 In-loop Motion Compensated Spatio-Temporal Filter
4.2 Optimal Motion Compensated Spatio-Temporal Filter
4.3 Overlapped Motion Compensation
4.4 Optimal Weight for Inter-frame Coding
4.5 Simulation Results
4.5.1 In-loop Motion Compensated Spatio-Temporal Filters
4.5.2 Optimal Motion Compensated Spatio-Temporal Linear Filter
4.5.3 Optimal Weight for Bi-predictive Coding
4.6 Conclusions

5 Selective Data Pruning-Based Compression using High Order Edge-Directed Interpolation
5.1 Rate-Distortion Relation
5.2 Data Pruning-Based Compression
5.3 Optimal Data Pruning
5.4 High Order Edge-Directed Interpolation
5.4.1 Single Frame-Based Interpolation
5.4.2 Multi-Frame-Based Interpolation
5.5 Simulation Results
5.5.1 High Order Edge-Directed Interpolation
5.5.2 Data Pruning-Based Compression
5.6 Conclusions

6 Conclusions and Future Works
6.1 Conclusions
6.1.1 Spatio-Temporal Filtering for Quality Enhancement
6.1.2 Spatio-Temporal Filtering for Coding
6.1.3 Spatio-Temporal Filtering for Data Pruning
6.2.2 Spatio-Temporal Filtering for Coding
6.2.3 Spatio-Temporal Filtering for Data Pruning
6.2.4 Spatio-Temporal Estimation for Video Processing

Bibliography
List of Figures

Figure 1.1: Block diagram of an image and video processing system
Figure 1.2: An example of blocking artifacts for one zoomed-in part of the 6th frame of the Foreman sequence
Figure 1.3: An example of blocking artifacts over columns of the 123rd row of Fig. 1.2
Figure 1.4: An example of ringing artifacts in a synthesized image
Figure 1.5: An example of ringing artifacts over columns of the 31st row of Fig. 1.4
Figure 1.6: An example of ringing artifacts in a zoomed-in part of the 6th frame of the Mobile sequence
Figure 1.7: An example of flickering artifacts
Figure 1.8: Comparison between coding artifacts and Gaussian noise
Figure 1.9: The correlation between the current frame of the compressed Mobile sequence and its surrounding frames
Figure 1.10: Block diagram of spatio-temporal filtering systems for quality enhancement
Figure 1.11: Block diagram of spatio-temporal filtering systems for coding
Figure 1.12: Block diagram of spatio-temporal filtering systems for data pruning
Figure 2.1: Translation between blocks of image x_s and x
Figure 2.2: The original and linearized quantization functions
Figure 2.3: Block diagram of the enhancement algorithm
Figure 2.4: MSE for motion vectors (m_0^b, n_0^b) = (0, 0) : (7, 7) and (m_0^f, n_0^f) = (−7, −7) : (0, 0)
Figure 2.5: Quality enhancement for the ideal case - 6th frame of the Mobile sequence
Figure 2.6: Quality enhancement for the ideal case - 6th frame of the Foreman sequence
Figure 2.7: Quality enhancement for the City sequence
Figure 2.8: PSNR improvement for City frames compressed with quantization matrix Q
Figure 2.9: PSNR for different options with integer pixel ME accuracy
Figure 2.10: PSNR for different options with half pixel ME accuracy
Figure 3.1: An example of directional JPEG artifacts with a scaling factor of 4 for the quantization step matrix
Figure 3.2: Angle and spread parameter for the directional fuzzy filter
Figure 3.3: Angles θ and θ_0 of the edge-based directional fuzzy filter
Figure 3.4: Flow chart of the directional fuzzy filter
Figure 3.5: Block diagram of the adaptive fuzzy MCSTF
Figure 3.8: Pixel classification for directional filtering
Figure 3.9: Pixel classification for directional filtering
Figure 3.10: Comparison of filtered results
Figure 3.11: Zoomed images for comparison of filtered results
Figure 3.12: Comparison on the contribution of spatial and directional adaptations
Figure 3.13: Zoomed images for comparison of Fig. 3.12
Figure 3.14: Comparison of filter results for MJPEG sequences
Figure 3.15: Zoomed views for images in Fig. 3.14
Figure 3.16: Comparison on PSNR of simulated methods for the Mobile sequence
Figure 3.17: Comparison on flickering artifacts of simulated methods for the Mobile sequence
Figure 3.18: Comparison of filter results for H.264 sequences
Figure 3.19: Comparison of PSNR for all frames in the Foreman sequence
Figure 3.20: Comparison of the flickering metric for all frames in the Foreman sequence
Figure 3.21: Comparison of PSNR with different bit-rates of the Foreman sequence
Figure 4.1: Block diagram of the H.264/AVC encoder with in-loop MCSTF
Figure 4.2: In-loop coding and enhancement for GOP IBP. The first row is the compressed sequence using the conventional encoding scheme, the last row is the compressed sequence using the encoding scheme with in-loop enhancement, and the middle rows explain step by step the encoding scheme with in-loop enhancement
Figure 4.3: Blocking artifacts of the motion compensated frames
Figure 4.4: Blocking artifacts in using cross-block and in-block cubics of MCSTF and MCTF
Figure 4.5: Overlapped blocks for motion compensation
Figure 4.6: Bi-predictive coding scheme
Figure 4.7: Comparison between conventional and proposed in-loop enhancement H.264/AVC codecs
Figure 4.8: Enhancement for the 3rd frame of the Foreman sequence
Figure 4.9: Zoomed-in part of Fig. 4.8
Figure 4.10: PSNR and flickering artifact comparison for frames in the Foreman sequence
Figure 4.11: Comparison in R-D curves for bi-predictive coding with different weight prediction options
Figure 4.12: Bi-predictive coding for the 19th frame of the Crew sequence
Figure 5.1: Block diagram of the data pruning-based compression
Figure 5.4: Block diagram of the single frame-based interpolation phase
Figure 5.5: Model parameters of sixth-order and eighth-order edge-directed interpolation
Figure 5.6: Block diagram of the proposed multi-frame-based interpolation for the case of upsampling with ratio 1×2
Figure 5.7: Model parameters of 9th order edge-directed interpolation
Figure 5.8: Comparison of NEDI-6 and NEDI-9 to other methods
Figure 5.9: Comparison of NEDI-8 to other methods
Figure 5.10: Comparison results for R-D curves of single frame data pruning-based compression
Figure 5.11: Comparison of NEDI-6 to other interpolation methods in the case of single frame data pruning-based compression
Figure 5.12: One zoomed-in part of Fig. 5.11
Figure 5.13: Comparison results for multi-frame data pruning-based compression
Figure 5.14: Comparison for H.264/AVC compression and optimal data pruning-based compression with same bit-rate and PSNR values
Figure 5.15: Comparison for H.264/AVC compression and optimal data pruning-based compression with same bit-rate and PSNR values
List of Tables

Table 2.1: PSNR enhancement in dB for ideal sequences
Table 2.2: Comparison in PSNR improvement for different scenarios
Table 3.1: Comparison of PSNR in units of dB for different methods
Table 3.2: Percentage of the classified pixels
Table 3.3: Comparison of PSNR in units of dB of different classified pixels and of spatial and directional adaptations
Table 4.1: Operation of each step in Fig. 4.2
Table 4.2: PSNR and bitrate values for bi-predictive coding options
Table 5.1: PSNR comparison (in dB)
Acknowledgements

First and foremost, I would like to express my deep admiration and true thanks to my advisor, Prof. Truong Nguyen. His kindness makes me feel valued and comfortable during my studies at UC San Diego. He motivates me every time I have a chance to discuss ideas or report progress with him. Undoubtedly, he inspires my research while still permitting me the freedom to search for new things.

I also want to take this opportunity to thank the committee members: Prof. Pamela C. Cosman, Prof. William S. Hodgkiss, Prof. Yoav Freund and Prof. Alon Orlitsky, for their time and valuable comments. The suggestions of the committee members from my qualifying exam substantially improved the quality of my thesis. Most of my knowledge on image processing comes from the course Digital Image Processing of Prof. Pamela Cosman in Fall 2005, and I would like to thank her for her insightful lectures.

During the summer of 2007 I did my internship under the mentoring of Dr. Sehoon Yea and Dr. Anthony Vetro at Mitsubishi Electric Research Laboratories. Their friendly and helpful support enriched my working experience and made me feel at home during the time I lived in Boston. I still remember Dr. Anthony Vetro running quickly through the company aisles to save time.

The work on data pruning-based compression would not have been possible without the help of Dr. Joel Solé and Dr. Peng Yin at Thomson Corporate Research, where I spent an internship at Princeton in the summer of 2008. Their support during that time and after I came back to UC San Diego encouraged me to keep working on this topic. I would like to send them great thanks for their approval of the topic and especially for the friendly discussions every Thursday.

I would like to thank Prof. Jong-Ki Han, his student Chan-Won Seo, and Daqian Jin from Motorola for their help in the project on video coding.

This work was also encouraged by my professors and colleagues at my prior university, the Ho Chi Minh City University of Technology. I am grateful to Assoc. Prof. Thuong Le-Tien, Dr. Chien Hoang-Dinh and Assoc. Prof. Thanh Vu-Dinh for their understanding and support.

My labmates entertained and inspired me during the time I studied at UC San Diego.

My motivation for quality enhancement lately comes from the beauty of the U.S. National Parks. The most perfect images remain there forever and will always be the target for image processing.

Finally, I dedicate my dissertation to my parents, Bố và Mẹ, for their unconditional and unbounded love, and also to my older sister, my brother-in-law and my younger sister, Chị, Anh, và Em gái, for their care of my parents during the last three and a half years.
Vita

1980 Born, Dong Thap, Viet Nam.
2002 Bachelor, Electrical and Electronics, Ho Chi Minh City University of Technology, Viet Nam.
2004 M.S., Electrical and Electronics, Ho Chi Minh City University of Technology, Viet Nam.
2002-2005 Teaching Assistant, Center For Overseas Studies, Ho Chi Minh City University of Technology, Viet Nam.
2002-present Lecturer, Faculty of Electrical and Electronics Engineering, Ho Chi Minh City University of Technology, Viet Nam.
2005-2009 Research Assistant, University of California, San Diego.
Summer 2007 Intern, Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA.
Summer 2008 Intern, Thomson Corporate Research, Princeton, New Jersey.
2009-present Sr. Research Engineer, Digital Media Solutions Lab, Samsung Information Systems America Inc., Irvine, CA.
2009 Ph.D., Electrical Engineering (Image and Signal Processing), University of California, San Diego.
PUBLICATIONS

Dũng T. Võ, Chan-Won Seo, Daqian Jin, Jong-Ki Han and Truong Q. Nguyen, "Optimal Motion-Compensated Spatio-Temporal Filter for Quality Enhancement and Coding of H.264/AVC Video Sequences", submitted to the IEEE Transactions on Image Processing, March 2009.

Dũng T. Võ, Joel Solé, Peng Yin, Cristina Gomila and Truong Q. Nguyen, "Selective Data Pruning-Based Compression using High Order Edge-Directed Interpolation", submitted to the IEEE Transactions on Image Processing, January 2009.

Dũng T. Võ, Truong Nguyen, Sehoon Yea, Anthony Vetro, "Adaptive Fuzzy Filtering For Artifact Reduction In Compressed Images And Videos", accepted for publication in the IEEE Transactions on Image Processing, 2008.

Dũng T. Võ, Truong Nguyen, "Quality Enhancement for Motion JPEG using Temporal Redundancies", the IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 5, pages 609-619, May 2008.

Thuong Le-Tien, Chien Hoang Dinh, Dinh Viet Hao, Dũng T. Võ, "Neural Networks-Based Equalizer Model Implemented To The DSP TMS320C6711", Journal of Science and Technology, Viet Nam, No. 40+41/2003, ISBN 0868-3980, Viet Nam, 2003.

Dũng T. Võ and Truong Nguyen, "Optimal Spatio-temporal Motion Compensated Filters for Quality Enhancement of H.264/AVC Compressed Sequences", submitted to the 2009 IEEE Conference on Image Processing.

Stanley H. Chan, Dũng T. Võ and Truong Q. Nguyen, "Subpixel Motion Estimation For Translational Motions", submitted to the 2009 IEEE Conference on Image Processing.

Dũng T. Võ, Joel Solé, Peng Yin, Cristina Gomila and Truong Q. Nguyen, "Data Pruning-Based Compression using High Order Edge-Directed Interpolation", to appear in the IEEE Conference on Acoustics, Speech and Signal Processing, Taiwan, 2009.

Dũng T. Võ, Truong Nguyen, "Directional Motion-Compensated Spatio-Temporal Fuzzy Filtering for Quality Enhancement of Compressed Video Sequences", the 15th IEEE International Conference on Image Processing, San Diego, CA, 2008.

Dũng T. Võ, Truong Nguyen, Sehoon Yea, Anthony Vetro, "Edge-based Directional Fuzzy Filter for Artifact Reduction in JPEG Images", the 15th IEEE International Conference on Image Processing, San Diego, CA, 2008.

"…Coding Artifacts Reduction", the 20th SPIE Visual Communications and Image Processing Conference, San Jose, CA, 2008.

Dũng T. Võ, Truong Q. Nguyen, "Quality Enhancement for Motion JPEG using Temporal Redundancies", the 14th IEEE International Conference on Image Processing, San Antonio, Texas, 2007.

Dũng T. Võ, Ryan S. Prendergast and Truong Q. Nguyen, "Filter-Banks Based Super-Resolution for Rotated and Blurry Under-Sampled Images", the 40th Asilomar Conference on Signals, Systems and Computers, Monterey Bay, CA, 2006.

Thuong Le-Tien, Dũng T. Võ, Tuan Nguyen-Thanh, Chi Nguyen-Duc, "Blind Digital Audio Watermarking Approach Using Spread Spectrum Technique", Conf. Proceedings COSCI 2005, HCMUT, Viet Nam, 2005.

Thuong Le-Tien, SungYoung Lee, Tuan Nguyen-Thanh, Dũng T. Võ, "Improving The Robustness Of Watermarking Approach For Copyright Protection", Conf. Proceedings CCN '04 (IASTED), ISBN 0-88986-429-2, Pages 144-148, Cambridge, MA, USA, 2004.

Thuong Le-Tien, SungYoung Lee, Tuan Nguyen Thanh, Dũng T. Võ, "A Feasible Solution For Image Watermarking Techniques", Conf. Proceedings, Viet Nam Conference on Radio and Electronics (REV'04), Pages 273-277, HaNoi, Viet Nam, 2004.

Thuong Le-Tien, SungYoung Lee, Dũng T. Võ, Tuan Nguyen-Thanh, "A Study on Digital Audio Watermarking using Spread Spectrum Techniques", Conf. Proceedings ISASE 2004, Pages 16-20, HCMC, Viet Nam, 2004.

Chien Hoang Dinh, Thuong Le-Tien, Dinh Viet Hao, Dũng T. Võ, "Equalizer Applying Neural Networks", The 8th Scientific and Technical Conference, HCMUT, Viet Nam, 2002.
PATENTS

"…", by Stanley Chan, Dũng Võ, Truong Nguyen, University of California, San Diego, February 2009.

"Methods and Apparatus for Video Image Data Pruning", Invention Disclosure was filed by Thomson Corporate Research in September 2008; a Provisional Patent was issued in January 2009.

"Filtering Artifacts in Images with 3D Spatio-Temporal Fuzzy Filters", Patent was filed by Mitsubishi Electric Research Laboratories in September 2007 and was published on January 24th, 2008 (No. US2008/0019605).

"Video Enhancement using Temporal Redundancies", Invention Disclosure was filed by Dũng Võ, Truong Nguyen, University of California, San Diego, May 2007.
Abstract of the Dissertation

Spatio-Temporal Filtering For Image And Video Processing: Applications On Quality Enhancement, Coding and Data Pruning

by

Dũng Trung Võ

Doctor of Philosophy in Image and Signal Processing

University of California San Diego, 2009

Professor Truong Q. Nguyen, Chair
In digital image and video processing, compression is required to reduce the number of bits needed to represent the signal. Spatial, temporal and visual redundancies are removed to achieve this goal. Although the encoded signal is more compact and easier to store or transmit, the correlation between pixels of images or video sequences is distorted. This causes coding artifacts, which degrade the visual quality of the signal and annoy viewers. Block-based compressed multi-dimensional signals are usually disturbed by spatial artifacts, such as blocking and ringing artifacts, and by temporal artifacts, such as mosquito and flickering artifacts. To reduce the visual effect of these coding artifacts, compressed images and video sequences should be enhanced prior to being sent to the displaying devices.

Quality enhancement plays a very important role in the post-processing phase. It is the first step in this phase and determines the performance of the following steps, such as up-scaling, frame rate up-conversion, or contrast enhancement. Previous methods on quality enhancement focused on separately improving the quality of each frame. In this way, the temporal consistency between frames is not guaranteed. Furthermore, the characteristics of temporal artifacts are not thoroughly studied and exploited for artifact removal. This dissertation investigates these characteristics and proposes novel methods to reduce both spatial and temporal artifacts, with applications in quality enhancement, coding and data pruning.
The dissertation starts by analyzing the use of information from surrounding frames in addition to the information in the current frame. First, a simple linear temporal filter is studied to investigate the quantization error between the filtered output and its original signal. The reduction of the quantization error verifies that using information from surrounding frames can enhance the quality of the current frame. Next, a non-linear fuzzy spatio-temporal filter is proposed to adapt to the characteristics of the coding artifacts, both for compressed images and for video sequences.

Once the proposed spatio-temporal filters are shown to effectively remove the artifacts, they are used in the encoding loop to produce enhanced reference frames in the encoding phase. These filters are then optimized at the encoder to maximize their performance in artifact reduction. For the special case when the filter coefficient for the pixel of interest is set to zero, the filter becomes an estimator or an interpolator. The dissertation extends the discussion to the case of implementing the optimal estimator in the encoding phase using multiple reference frames. Finally, an edge-directed interpolator is studied and used in data pruning-based compression, which can help reduce the bit-rate of the encoded bit-stream. This interpolator is applied to determine an effective way of dropping data before compression and to reconstruct the pruned signal back to its original form. The dissertation also proposes a novel metric for evaluating flickering artifacts. For each topic, extensive simulations are implemented to verify the effectiveness of the proposed methods.
1 Introduction

1.1 Image and Video Processing Systems
For millennia, people have searched for different ways to record their experiences: writing and antiquities helped posterity imagine their ancestors' life and journeys, while music and poems helped descendants sympathize with their ancestors' feelings. Though those tools were useful, it was only when modern still images and motion pictures were discovered that life could be visualized and reflected in the most truthful way. Entertainingly, synthetic photos and animated movies even raise the human imagination to illusive life. The word 'image' is defined in [1] as 'an artificial resemblance either in painting or sculpture'. Being an artificial resemblance, it should be produced in a suitable form in order to meet the needs of human beings. In the recent digital era, the aim is to map the multi-dimensional signal to an efficient representation with high originality and condensedness.

Fig. 1.1 presents a conventional image and video system. It lays out the processes from the noisy signal (after recording) to the processed signal (prior to displaying). First, the noisy signal is denoised in the Pre-Process phase. Then the Encode phase converts the signal to a more compressed form by removing spatial, temporal and visual redundancies. Because the noise is removed from the noisy signal in the Pre-Process phase, the new compressed form does not contain the noise representation and is thus more compact, compared to the compressed form obtained without the Pre-Process phase. The newly formed signal is transferred over the channel, which is symbolized by the Transmit phase. To be reconstructed into a displayable form, the encoded signal is de-compressed by the Decode phase. Although redundancy removal helps reduce the number of bits needed to represent the video content, it destroys the correlation between pixels and causes coding artifacts. The Post-Process phase is used to reduce these coding artifacts, as well as to adjust characteristics of the processed signal, such as its size and frame rate, into a desired form for displaying.

The aim of this dissertation is to discuss and propose novel systems which can reduce coding artifacts in compressed images and video sequences. The system modification is not limited to the Post-Process phase only.
1.2 Coding Artifacts
Block-based compressed signals suffer from blocking, ringing, mosquito and flickering artifacts, especially at low bit-rate coding. Blocking artifacts occur at the borders of neighboring blocks when each frame is processed independently in separate blocks with coarse quantization of discrete cosine transform (DCT) coefficients. Fig. 1.2 shows an example of blocking artifacts for one zoomed-in part of the 6th frame of the encoded Foreman sequence. This sequence is compressed using the Motion JPEG (MJPEG) codec with a scaling factor of 4 for the quantization step size matrix. Compared to the original frame in Fig. 1.2(a), the compressed frame in Fig. 1.2(b) is affected by blocking artifacts at the border pixels of the 8×8 blocks. Fig. 1.3 presents in detail the blocking artifacts over the 123rd row of Fig. 1.2. As seen in Fig. 1.3(b), the smoothness over block borders is destroyed and becomes a step function.

Ringing artifacts occur due to the loss of high frequencies when quantizing the transformed coefficients with a coarse quantization step. These high frequency components usually have smaller values than the low frequency components.
Figure 1.2: An example of blocking artifacts for one zoomed-in part of the 6th frame of the Foreman sequence. (a) Original; (b) compressed.
However, they are quantized with a higher quantization step size than the low frequency components. This makes the quantized values of the high frequency components tend toward zero. Fig. 1.4 shows an example of ringing artifacts in a synthesized image. The image is compressed using the JPEG standard with a scaling factor of 4 for the quantization step size matrix. Fig. 1.5 shows in detail the ringing artifacts over the 31st row of Fig. 1.4. Compared to the shape of the original edges in Fig. 1.5(a), the shape of the compressed edges with ringing artifacts in Fig. 1.5(b) is similar to the Gibbs phenomenon [2]; in both cases, the high frequency components are removed. Fig. 1.6 shows the ringing artifacts in the 6th frame of the Mobile sequence, which has edges with complicated shapes. This sequence is compressed using the MJPEG standard with the standard quantization step size matrix. As can be seen in the compressed frame in Fig. 1.6(b), the ringing affects the visual quality more seriously in the detail areas and is most prevalent along the strong edges.
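To make this mechanism concrete, the following minimal Python sketch (illustrative code written for this text, not code from the dissertation; the step size value is an arbitrary assumption) quantizes the orthonormal DCT-II coefficients of a one-dimensional step edge with a coarse uniform step. Reconstructing from the truncated coefficients produces the Gibbs-like oscillations described above.

```python
import numpy as np

def dct_matrix(N):
    # Orthonormal DCT-II basis: row u holds the u-th cosine basis vector.
    n = np.arange(N)
    M = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    M[0, :] *= 1.0 / np.sqrt(2.0)
    return M * np.sqrt(2.0 / N)

N = 8
edge = np.array([0, 0, 0, 0, 255, 255, 255, 255], dtype=float)  # ideal edge
M = dct_matrix(N)
X = M @ edge                 # forward DCT-II
q = 100.0                    # coarse quantization step (assumed value)
X_q = q * np.round(X / q)    # uniform quantization, as in JPEG
rec = M.T @ X_q              # inverse transform (M is orthonormal)
print(np.round(rec, 1))      # reconstruction oscillates around the edge
```

The small high-frequency coefficients round to zero under the coarse step, which is exactly the loss that produces the ringing pattern in the reconstructed edge.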
On the other hand, mosquito artifacts come from the ringing artifacts of many individually compressed frames when displayed in a sequence. For inter-coded frames, mosquito artifacts become more annoying for blocks on the boundary of moving objects and backgrounds, which have significant inter-frame prediction errors in the residual signal [3].

Flickering artifacts happen due to the inconsistency in quality over frames at the same spatial position. These flickering artifacts [4] [5] [6] are caused by the difference of the quantization step size for blocks at the same location over frames.
Figure 1.3: An example of blocking artifacts over columns of the 123rd row of Fig. 1.2, plotted against column number. (a) Original; (b) compressed.
Figure 1.4: An example of ringing artifacts in a synthesized image. (a) Original image; (b) compressed.

Figure 1.5: An example of ringing artifacts over columns of the 31st row of Fig. 1.4, plotted against column number. (a) Original; (b) compressed.
Figure 1.6: An example of ringing artifacts in a zoomed-in part of the 6th frame of the Mobile sequence. (a) Original image; (b) compressed.

Figure 1.7: An example of flickering artifacts.
This parameter is determined by optimizing the rate-distortion function only for the current frame, which ignores the previously coded frames. Fig. 1.7 shows one example of the temporal consistency in quality. The temporal difference between the values of the original pixels at one location (m, n) of the previous frame O(t − 1) and the current frame O(t) is defined as ∆O(t, m, n). Because of compression, the original values of pixels O(t − 1, m, n) and O(t, m, n) are distorted to I(t − 1, m, n) and I(t, m, n), respectively. The temporal difference between I(t − 1, m, n) and I(t, m, n) is defined as ∆I(t, m, n). If ∆I(t, m, n) ≠ ∆O(t, m, n), the temporal difference of compressed pixels over frames is not the same as the temporal difference of the original pixels over frames. In this case, there exists a temporal inconsistency in the compressed sequence, which is called a flickering artifact.
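As a concrete illustration, the pixel-wise comparison of ∆I and ∆O can be coded directly. This is a minimal sketch written for this text; the tolerance parameter is a hypothetical choice, not part of the dissertation's definition.

```python
import numpy as np

def flickering_map(o_prev, o_cur, i_prev, i_cur, tol=0.5):
    """Flag locations (m, n) where the compressed temporal difference
    Delta_I(t, m, n) deviates from the original one Delta_O(t, m, n)."""
    delta_o = o_cur.astype(float) - o_prev.astype(float)
    delta_i = i_cur.astype(float) - i_prev.astype(float)
    return np.abs(delta_i - delta_o) > tol   # True marks potential flicker
```

Summing this map over a frame gives a crude count of temporally inconsistent pixels; Chapter 3 proposes a more careful metric that also models the tracking behavior of human eyes.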
Blocking and ringing artifacts degrade the visual spatial quality, while mosquito and flickering artifacts reduce the visual temporal quality. MJPEG and MPEG-2 encoded sequences suffer from blocking and ringing artifacts due to the large processed block sizes of 8×8 or 16×16 pixels for both motion estimation (ME) and the discrete cosine transform (DCT). While H.264/AVC compressed sequences [7] are less affected by blocking artifacts thanks to the deblocking in-loop filter, and are more resistant to ringing artifacts by implementing 4×4 block-based ME and an integer transform, there is still an inconsistency in their temporal quality. Examples of compressed sequences with coding artifacts can be found at: http://videoprocessing.ucsd.edu/∼dungvo/dissertation.html.
1.3 Methods on Artifact Reduction
Many filter-based denoising methods have been proposed to reduce coding artifacts, most of which are frame-based enhancements for blocking and ringing artifact reduction. One method to reduce blocking artifacts is to use the lapped orthogonal transform (LOT) [8], which increases the dependence between adjacent blocks. As the LOT-based approach is incompatible with the JPEG standard, many other approaches consider pixel-domain and DCT-domain post-processing techniques. These include low-pass filtering [9], adaptive median filtering [10] and nonlinear spatially variant filtering [11], which were applied to remove the high frequencies caused by sharp edges between adjacent blocks. Other pixel-based methods are constrained least squares (CLS) [12] and the maximum a posteriori probability approach (MAP) [13], all of which require many iterations with high computational load. In [14], a projections onto convex sets (POCS) based method was proposed with multi-frame constraint sets to reduce the blocking artifacts. This method required extracting the motion between frames and quantization information from the video bit-stream. Xiong et al. [15] considered the artifact caused by discontinuities between adjacent blocks as quantization noise and used overcomplete wavelet representations to remove this noise. The edge information is preserved by exploiting cross-scale correlation among wavelet coefficients. The algorithm yields improved visual quality of the decoded image over the POCS and MAP methods and has lower complexity. In H.264/AVC, an adaptive deblocking filter [16] was proposed to selectively filter the artifacts at the coded block boundaries. Substantial objective and subjective quality improvement was achieved by this deblocking filter.

The DCT-based methods for blocking artifact removal adjust the quantized DCT coefficients to reduce the quantization error. Tien and Hang [17] made the assumption that the quantization errors are strongly correlated to the quantized coefficients in high contrast areas. Their algorithm first compares the DCT coefficients to pre-trained quantized coefficient representatives to get the best match, then adds the corresponding quantized error pattern to reconstruct the original DCT coefficient. This method requires a large pre-defined set of quantized coefficient representatives and provides only a slight PSNR gain. In another approach, Jeon and Jeong [18] defined a block boundary discontinuity and compensated selected DCT coefficients to minimize this discontinuity. To restore the stationarity of the original image, Nosratinia [19] averaged the original decoded image with its 15 displacements. These displacements are calculated by compressing, decompressing and translating back shifted versions of the original decoded images. With an assumption of small changes of neighboring DCT coefficients at the same frequency in a small region, Chen et al. [20] applied an adaptively weighted low-pass filter to the transform coefficients of the shifted blocks. The window size is determined by the block activity, which is characterized by human visual system (HVS) sensitivity at different frequencies. A new type of shifted blocks across any two adjacent blocks was constituted by Liu and Bovik [21]. They also defined a blind measurement of the local visibility of the blocking artifact. Based on this visibility and the primary edges of the image, the block edges were divided into three categories and were processed by corresponding effective methods.

To reduce ringing artifacts, Hu et al. [22] used a Sobel operator for edge detection, then applied a simple low-pass filter to pixels near these edges. Using a similar process, Kong et al. [23] established an edge map with smooth, edge and texture blocks by considering the variance of a 3×3 window centered on the pixel of interest. Only edge blocks are processed with an adaptive filter to remove the ringing artifacts close to the edges. Oguz et al. [24] detected the ringing artifact areas that are most prominent to the HVS by binary morphological operators. Then a gray-level morphological nonlinear smoothing filter is applied to these regions. Although lessening the ringing artifacts, these methods do not solve the problem completely because the high frequency components of the resulting images are not reconstructed. As an encoder-based approach, [25] proposed a noise shaping algorithm to find the optimal DCT coefficients which adapts to the noise variances in different areas. All of these methods can only reduce ringing artifacts within each frame. To deal with the temporal characteristic of mosquito artifacts, [26] applied a spatio-temporal median filter in the transform domain for surrounding 8×8 blocks. The improvement in this case is limited by the small correlation between DCT coefficients of spatially neighboring 8×8 blocks, as well as by the lack of motion compensation in the scheme.

For flickering artifact removal, most of the current methods focus on reducing flickering artifacts in all-intra-frame coding. In [5], the quantization error is considered to obtain the optimal intra prediction mode and to help reduce the flickering artifact. Also for intra-frame coding, [6] included a flickering artifact term in the cost function to find the optimal prediction and block-size mode. A similar scheme is implemented in [4] for flickering reduction in Motion JPEG 2000. Note that all of these approaches are encoder-based.
1.4 Motivation
Visual effects of coding artifacts vary from one codec to another, but these artifacts always have directional and data-dependent properties. Blocking, ringing and flickering artifacts are spatially or temporally directed. Because of the block-based compression, blocking artifacts occur in the horizontal and vertical directions at the pixels on the borders between two blocks. Furthermore, due to the loss of high frequency components during coarse quantization, ringing artifacts appear along the strong edges of the compressed frame. Flickering artifacts happen at the same spatial location in the temporal direction.
Figure 1.8: Comparison between coding artifacts and Gaussian noise. (a) Original frame; (b) noisy frame; (c) compressed frame.

Figure 1.9: The correlation between the current frame of the compressed Mobile sequence and its surrounding frames.
se-The data dependent property is from the usage of the same quantizationstep size matrix for every block Coding artifacts degrade the visual quality more
in detail areas than in flat areas These characteristics make the coding facts different than the recording noise, where it is usually assumed as additivezero-mean white Gaussian noise [27] [28] [29] Fig 1.8 shows the differences be-tween additive Gaussian noise and coding artifacts in MJPEG compression Thenoisy frame in Fig 1.8(b) is degraded by additive Gaussian noise with variance of
arti-0.01 while the compressed frame in Fig 1.8(c) is encoded with scaling factor of
4 for the quantization step size matrix Comparing to the original in Fig 1.8(a),the Gaussian noise is uniformly distributed over pixels while the coding artifacts
is nonuniformly distributed This implies that these coding artifacts should betreated in a different way other than methods for denoising in pre-processing
In order to efficiently reduce temporal artifacts such as mosquito and flickering artifacts, not only the spatial correlation among pixels but also the temporal one needs to be incorporated. Fig. 1.9 shows the correlation between the 5th frame of the compressed Mobile sequence and its surrounding frames. Compared to the auto-correlation of the current frame, the cross-correlation between the center frame and its surrounding frames is still rather large when the frame distance is small. Using extra information from temporally neighboring samples, such as pixels of surrounding frames in video sequences, can further enhance the quality of compressed video sequences.
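A curve like the one in Fig. 1.9 can be computed with a few lines of code. The sketch below is illustrative code written for this text, using the standard normalized cross-correlation coefficient as one plausible choice of measure.

```python
import numpy as np

def correlation_coefficient(f1, f2):
    # Normalized cross-correlation between two equally sized frames.
    a = f1.astype(float).ravel() - f1.mean()
    b = f2.astype(float).ravel() - f2.mean()
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Example: correlation of frame 5 with its neighbors at distances -4..4:
# curve = [correlation_coefficient(frames[5], frames[5 + d]) for d in range(-4, 5)]
```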
Among the described artifacts, blocking artifacts are the easiest to remove because their location is known: they always occur at the block borders. For the other artifacts, it is difficult to locate the artifacts and separate them from the original signal. When multi-frame-based enhancement is used, additional information from surrounding frames does not always guarantee a better improvement in quality than using only the information from the current frame. Furthermore, in some cases both single-frame-based enhancement and multi-frame-based enhancement fail to improve the quality of the compressed frame. Because of the nonuniformly distributed characteristic of the coding artifacts, the enhancement algorithm for detail areas must be different from the one for flat areas. These matters must be addressed in methods for quality enhancement.
1.5 Thesis Outline
The dissertation addresses three main topics: spatio-temporal filtering for quality enhancement, coding, and data pruning in image and video processing. The dissertation is organized as follows. Spatio-temporal filtering methods for quality enhancement are presented in Chapter 2 and Chapter 3. The topic of spatio-temporal filtering for coding is discussed in Chapter 4. Data pruning-based compression is proposed in Chapter 5. Finally, Chapter 6 gives concluding remarks and future research.

The first topic discusses the spatio-temporal filtering methods that process the decoded signal for the purpose of coding artifact reduction.
Figure 1.10: Block diagram of spatio-temporal filtering systems for quality enhancement.

Figure 1.11: Block diagram of spatio-temporal filtering systems for coding.
The temporal quantization error is first analyzed for the case of temporal filtering in Chapter 2. Based on the specific characteristics of coding artifacts, novel non-linear spatio-temporal filters are proposed in Chapter 3 to effectively remove these artifacts.
Motion Compensated Temporal Linear Filtering
Chapter 2 demonstrates the effectiveness of using extra temporal information in quality enhancement. To simplify the analysis, a simple averaging motion compensated temporal filter (MCTF) is considered for cases of pure translational motion, and the mean squared error (MSE) between the temporally filtered block and the original block is calculated. Quality enhancement is achieved if this MSE is less than the MSE between the compressed block and the original block. This leads to an enhancement condition. For real video sequences, if the residual between the motion compensated blocks from surrounding frames and the block in the current frame is large, the reconstructed block after temporal filtering will deviate further from the original block. A more practical condition is discussed to take into account the residual signal. This chapter also generalizes the MCTF to cases using an arbitrary number of surrounding frames and optimizes the filters to obtain maximum enhancement.

Chapter 2, in full, is a reprint of the material as it appears in Dũng T. Võ, Truong Nguyen, "Quality Enhancement for Motion JPEG using Temporal Redundancies", the IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 5, pp. 609-619, May 2008.
Adaptive Fuzzy Motion Compensated Spatio-Temporal Filtering
For high compression levels, the averaging MCTF of Chapter 2, which works well in flat areas, can blur the detailed areas. In such cases, a non-linear filter with input-dependent coefficients should be used to avoid the blurring effect. Chapter 3 proposes fuzzy spatio-temporal filters to adaptively remove the coding artifacts. Fuzzy filters have a real-valued spatial-rank relation and depend on the spread of the signal. They average pixels in flat areas while retaining the isolated pixels in edge areas. In image and video compression, artifacts such as blocking or ringing artifacts are spatially directional, and flickering artifacts are temporally directional. The fuzzy motion compensated spatio-temporal filter (MCSTF) in this chapter is proposed to be adaptive to the activity of the pixel of interest and to the relative value and relative position between the pixel of interest and its surrounding pixels. The two adaptations are detailed below; a sketch of the basic fuzzy weighting follows the list.
+ Directional Fuzzy Filtering for Image Enhancement: in compressed images, such as JPEG or JPEG-2000 images, the ringing artifacts are prevalent along the strong edges. To avoid blurring these real edges, the proposed directional spatial filter applies the strongest smoothing in the direction perpendicular to the edge, where the ringing artifacts are likely to have no relation to the pixels of interest, and a weaker filtering in the edge direction. Because detail areas tend to have more ringing artifacts than flat areas, the directional spatial filter is proposed to adapt its strength to the variance of pixels in a local window.

+ Adaptive Fuzzy Filtering for Video Enhancement: to deal with the temporally directional characteristic of the flickering artifacts, the fuzzy filter is based on the cross-correlation between the window centered at the pixel of interest and the window centered at its surrounding pixel to determine the position-dependent scaling factor of the spread parameter. The amplitude of the spread parameter is controlled using the variance of pixels in a spatial window centered at the pixel of interest. To increase the correlation between pixels, the surrounding frames are motion compensated before applying the spatio-temporal filter. A new metric which considers the tracking characteristic of human eyes is also proposed to evaluate the flickering artifacts.
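The sketch below illustrates the basic fuzzy weighting on a single window. It is a simplified Gaussian-membership form written for this text; the dissertation's filter additionally adapts the spread per direction, per temporal position and to the local pixel variance.

```python
import numpy as np

def fuzzy_filter_center(window, spread):
    """Weight each sample by its amplitude similarity to the center pixel,
    so flat areas are averaged while strong edges are preserved."""
    center = window[window.shape[0] // 2, window.shape[1] // 2]
    weights = np.exp(-((window - center) ** 2) / (2.0 * spread ** 2))
    return float((weights * window).sum() / weights.sum())
```

Scaling `spread` with the local activity makes the smoothing stronger in the detailed, artifact-prone areas, which is the spirit of the spatial adaptation described above.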
Chapter 3, in full, is a reprint of the material as it appears in Dũng T. Võ, Truong Nguyen, Sehoon Yea, Anthony Vetro, "Adaptive Fuzzy Filtering For Artifact Reduction In Compressed Images And Videos", accepted in January 2009 for publication in the IEEE Transactions on Image Processing.
The application of spatio-temporal filtering for coding is discussed in Chapter 4. In this chapter, the post filters of Chapter 2 and Chapter 3 are used in the encoder as a spatio-temporal in-loop filter. This helps to increase the consistency in temporal quality by further reducing temporal artifacts such as mosquito and flickering artifacts. With the availability of the original frames at the encoder, optimal linear spatio-temporal filters can be obtained by minimizing the MSE between the reconstructed signal and the original signal. These filter coefficients are then sent to the decoder for post-processing purposes. When the filter coefficient of the pixel of interest is set to zero, the filter output becomes an estimated version of the pixel of interest. This estimated value is used for prediction or interpolation. This chapter also discusses the special case of the optimal spatio-temporal filter when it is simplified to a temporal filter with two surrounding frames and the filter coefficient for the pixel of interest is set to zero. In this case, it becomes the optimal weight for the Bi-predictive (B) frame.
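Minimizing the MSE over linear filter coefficients is an ordinary least-squares problem. The following sketch is illustrative code written for this text, with hypothetical input shapes, showing the idea rather than the dissertation's exact formulation.

```python
import numpy as np

def optimal_weights(supports, targets):
    """supports: K x L matrix, one row per pixel holding its decoded
    spatio-temporal filter support; targets: the K original pixel values.
    Returns the L filter coefficients minimizing the reconstruction MSE."""
    w, residuals, rank, sv = np.linalg.lstsq(supports, targets, rcond=None)
    return w
```

Setting the coefficient of the pixel of interest to zero and re-solving turns the same machinery into the optimal estimator or interpolator mentioned above.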
Chapter 4, in full, is a reprint of the material as it appears in Dũng T. Võ, Chan-Won Seo, Daqian Jin, Jong-Ki Han and Truong Q. Nguyen, "Optimal Motion-Compensated Spatio-Temporal Filter for Quality Enhancement and Coding of H.264/AVC Video Sequences", submitted to the IEEE Transactions on Image Processing, March 2009.
Chapter 5 discusses a special application of spatio-temporal filters in interpolation. A novel high-order edge-directed interpolation scheme is proposed to determine the optimal way to conduct data pruning and to reconstruct the signal. Data is pruned to reduce the frame size so that a lower bit-rate is required for compression. This means that for the same quality level, a lower compression level with fewer coding artifacts can be used. The edge-directed interpolation is adapted to the data pruning and is discussed for both image and video up-scaling.
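A toy sketch of the prune-then-interpolate idea follows. It is written for this text; the chapter's reconstruction uses a high-order edge-directed interpolator, not the simple linear stand-in used here.

```python
import numpy as np

def prune_columns(frame):
    # Drop every other column before encoding to halve the frame width.
    return frame[:, ::2]

def restore_columns(pruned, width):
    # Linear interpolation stand-in for the edge-directed reconstruction.
    known = np.arange(0, width, 2)[: pruned.shape[1]]
    grid = np.arange(width)
    return np.stack([np.interp(grid, known, row)
                     for row in pruned.astype(float)])
```

Encoding the pruned frame costs fewer bits, so a milder quantization can be afforded at the same rate; the quality of the final output then hinges on how well the dropped samples are interpolated back.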
Chapter 5, in full, is a reprint of the material as it appears in Dũng T. Võ, Joel Solé, Peng Yin, Cristina Gomila and Truong Q. Nguyen, "Selective Data Pruning-Based Compression using High Order Edge-Directed Interpolation", submitted to the IEEE Transactions on Image Processing, January 2009.
2 Quality Enhancement for Motion JPEG using Temporal Redundancies
Motion JPEG (MJPEG) separately compresses each frame of a video sequence in JPEG format. Compared to MPEG, MJPEG has a lower attainable compression level but is also less computationally complex. It is popularly used for non-linear editing, which requires easy access to any frame. Another application of MJPEG is in medical imaging, which requires high quality images and error resilience. The disadvantage of MJPEG is that it does not exploit the temporal redundancies of successive frames to achieve higher compression; consequently, MJPEG has a higher bit rate than MPEG for the same quality. Quality enhancement for MJPEG has until now focused on improving the quality of each single JPEG frame.

Due to the strong correlation between successive frames in video sequences, information from neighboring frames can be used to reduce the quantization error resulting from DCT coefficient truncation in each frame. This chapter proposes a novel method of using the previous and future frames to enhance the quality of the current frame. Motion vectors between these decoded frames are found and used to estimate displacements of the current frame, which are then averaged to yield the final reconstructed frame. This is equivalent to a motion compensated temporal filter (MCTF) for aligned blocks. Previous MCTFs have been used in pre-filter denoising [30] [31] and in scalable video coding (SVC) [32] [33].
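A minimal sketch of this three-frame averaging for one aligned block follows (illustrative code written for this text; the motion vectors are assumed to be given and in-bounds).

```python
import numpy as np

def mctf_average(prev, cur, nxt, top, left, mv_b, mv_f, N=8):
    """Average the current N x N block with its motion-compensated matches
    in the previous and future decoded frames (a three-tap temporal mean)."""
    pb = prev[top + mv_b[0]: top + mv_b[0] + N,
              left + mv_b[1]: left + mv_b[1] + N]   # backward match
    pf = nxt[top + mv_f[0]: top + mv_f[0] + N,
             left + mv_f[1]: left + mv_f[1] + N]    # forward match
    pc = cur[top: top + N, left: left + N]          # current block
    return (pb.astype(float) + pc + pf) / 3.0
```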
Figure 2.1: Translation between blocks of image $x_s$ and $x$.
This chapter is organized as follows: Section 2.1 formulates the translational relation in the DCT domain and develops an algorithm to find a displacement of one block. The enhancement process for the case of pure translational video sequences is described in detail in Section 2.2. This section also analyzes the error between the reconstructed and original blocks. Extensions for applications to video sequences are discussed in Section 2.3. Section 2.4 generalizes the enhancement process; it uses an arbitrary number of referenced frames and designs an optimal filter providing the minimal error. Section 2.5 presents simulation results and comparisons to other approaches. Finally, Section 2.6 concludes the chapter and discusses future directions related to this work.
2.1 Translational Relation of DCT Coefficients
This section derives the relation in the DCT domain for shifted blocks of different frames. In MJPEG, each frame is processed in separate blocks of size $N \times N$ ($N = 8$). Assume that block $x_s$ matches a portion of image $x$ starting at pixel $(m_0, n_0)$, possibly located among four adjacent blocks ($i$, $ii$, $iii$, $iv$) as in Fig. 2.1, i.e.,

$$x_s(m, n) = x(m + m_0, n + n_0), \quad m, n = 0, \ldots, N - 1. \qquad (2.1)$$
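In code, extracting such a shifted block is a plain sub-array read; the point of Fig. 2.1 is that for $0 < m_0, n_0 < N$ the support straddles four coded blocks. The sketch below is illustrative code written for this text.

```python
import numpy as np

def shifted_block(x, m0, n0, N=8):
    # The N x N block starting at (m0, n0); for 0 < m0, n0 < N its support
    # overlaps the four neighboring coded N x N blocks (i)-(iv) of frame x.
    return x[m0: m0 + N, n0: n0 + N]
```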
The DCT transform (Type II) of $x_s$ can be obtained by [34]. In (2.2), if $x$ is replaced by its DCT coefficients $X$, $X_s$ can be calculated based on the DCT coefficients of the four blocks ($i$, $ii$, $iii$ and $iv$) of image $x$ or, more generally, based on a function $F$.

Figure 2.2: The original and linearized quantization functions.

After the DCT transform, the quantization process truncates the DCT coefficients $X(u, v)$ to obtain the output $X_q(u, v)$ with error $\Delta X_q(u, v)$, where

$$X_q(u, v) = Q(u, v) \times \operatorname{round}\!\left(\frac{X(u, v)}{Q(u, v)}\right) \qquad (2.8)$$

and $Q(u, v)$ is the quantization step size matrix. The same quantization process is applied to $X_s$ to get $X_{s,q}$. Because of the non-linear characteristic of the quantization function, the error $\Delta X_q(u, v)$ prevents the quantized DCT coefficients of the original block $X_q$ and of the shifted block $X_{s,q}$ from satisfying (2.3). Linearizing the quantization function as shown in Fig. 2.2, a displacement $X^d_{s,q}$ of $X_{s,q}$ can be obtained such that, in the pixel domain,

$$x^d_{s,q}(m, n) = x_q(m + m_0, n + n_0). \qquad (2.10)$$

In summary, if the motion vectors between blocks are known, a translational relation can be established as in (2.10) to get a displacement $x^d_{s,q}$ of the current block $x_{s,q}$. The next section presents a method to use this displacement in reducing the blocking and ringing artifacts of $x_{s,q}$.

2.2 Quality Enhancement using Temporal Redundancies

Figure 2.3: Block diagram of the enhancement algorithm.

Fig. 2.3 shows the block diagram of the enhancement algorithm. The IDCT blocks transform the DCT coefficients to the pixel domain, and motion estimation (ME) blocks are used to find the motion vectors between blocks of frames. Assume that the present encoded block is shifted from one block of the previous frame by $(m^b_0, n^b_0)$ pixels and from another block of the future frame by $(m^f_0, n^f_0)$ pixels. The backward estimated version $x^b_q(t, m, n)$ and forward estimated version $x^f_q(t, m, n)$ of the compressed block $x_q(t, m, n)$ are calculated by

$$x^b_q(t, m, n) = x_q(t - 1, m + m^b_0, n + n^b_0) \qquad (2.11)$$

and

$$x^f_q(t, m, n) = x_q(t + 1, m + m^f_0, n + n^f_0). \qquad (2.12)$$

The matching blocks in the previous and future frames can be any shifted blocks, not just the encoded blocks. Consequently, an averaging scheme is used to obtain the final processed block,

$$\bar{x}_q(t, m, n) = \frac{1}{3}\left[x^b_q(t, m, n) + x_q(t, m, n) + x^f_q(t, m, n)\right], \qquad (2.13)$$

which is equivalent to a temporal filtering of aligned blocks of different frames.

Since the DCT is an orthonormal transform, the mean squared error (MSE) between the estimated and original blocks can be calculated from their DCT coefficients. The error between the original and estimated DCT coefficients is defined by

$$\bar{E}_q(t, u, v) = \bar{X}_q(t, u, v) - X(t, u, v) = \frac{1}{3}\left[E^b_q(t, u, v) + E_q(t, u, v) + E^f_q(t, u, v)\right] \qquad (2.14)$$

where

$$E^b_q(t, u, v) = X^b_q(t, u, v) - X(t, u, v), \qquad (2.15)$$

$$E_q(t, u, v) = X_q(t, u, v) - X(t, u, v) \qquad (2.16)$$

and

$$E^f_q(t, u, v) = X^f_q(t, u, v) - X(t, u, v). \qquad (2.17)$$

The total error comes from the quantization error of the current block and from the errors of using the past and future displacements of the current block. At a specific frequency $(u_0, v_0)$, the backward error can be written as in (2.18), where $X_{i,q}$, $X_{ii,q}$, $X_{iii,q}$, $X_{iv,q}$ are the quantized DCT coefficients of blocks ($i$, $ii$, $iii$, $iv$), respectively, and the $K^b_l(u, v)$ terms are weighting factors that follow from the translational relation of Section 2.1. Eq. (2.18) shows that the backward error is caused by the quantization errors of blocks ($i$, $ii$, $iii$, $iv$) weighted by $K^b_l$. An equivalent expression holds for $E^f_q$. Because of the linearization of the quantization function, it can be assumed that $E_q(u_0, v_0)$ is uniformly distributed with zero mean and variance $\sigma^2_{E_q}(u_0, v_0)$.

Consequently, from (2.14), (2.18) and (2.26), the MSE of the reconstructed block is the expectation of the sum of the squared errors over all $N \times N$ pixels,

$$\mathrm{MSE} = E\left[\sum_{m=0}^{N-1}\sum_{n=0}^{N-1}\left(\bar{x}_q(t, m, n) - x(t, m, n)\right)^2\right].$$

In this section, an algorithm to enhance the quality of MJPEG was presented and analyzed. To achieve MSE enhancement, the motion vectors must lead to a variance $\sigma^2_{E_q}(u_0, v_0)$ fulfilling the condition (2.29), which will be considered in detailed simulations in Section 2.5.
2.3 Quality Enhancement for Real Video Sequences
In real video sequences, a pure relation as in (2.1) is rarely satisfied. A more suitable relation between blocks needs to be considered in this case. Assume that the present encoded block is shifted from one block of the previous frame by