De-interlacing
A Key Technology
for Scan Rate Conversion
ISSN: 0928-1479
INTRODUCTION TO THE SERIES
"Advances in Image Communication"
Dear Colleague,
Image Communication is a rapidly evolving multidisciplinary field concerned with the development and evaluation of efficient means for acquisition, storage, transmission, representation, manipulation and understanding of visual information. Until a few years ago, digital image communication research was still confined to universities and research laboratories of telecommunication or broadcasting companies. Nowadays, however, this field is witnessing the strong interest of a large number of industrial companies due to the advent of narrow band and broadband ISDN, GSM, the Internet, digital satellite channels, digital over-the-air transmission and digital storage media. Moreover, personal computers and workstations have become important platforms for multimedia interactive applications that advantageously use a close integration of digital compression techniques (JPEG, MPEG), Very Large Scale Integration (VLSI) technology, highly sophisticated network facilities and digital storage media.
At the same time, the scope of research of the academic environment on Image Communication has further increased to include model- and knowledge-based techniques, artificial intelligence, motion analysis, and advanced image and video processing techniques. The variety of topics on Image Communication is so large that no one can be a specialist in all the topics, and the whole area is beyond the scope of a single volume, while the requirement of up-to-date information is ever increasing.
This was the rationale for Elsevier Science Publishers to approach me to edit a book series on 'Advances in Image Communication', next to the already existing and highly successful journal "Signal Processing: Image Communication". The book series was to serve as a comprehensive reference work for those already active in the area of Image Communication. Each author or editor was asked to write or compile a state-of-the-art book in his/her area of expertise, including information until now scattered in many journals and proceedings. The book series therefore would help Image Communication specialists to gain a better understanding of the important issues in neighbouring areas by reading particular volumes. It would also give newcomers to the field a foothold for doing research in the Image Communication area.
In order to produce a quality book series, it was necessary to ask authorities well known in their respective fields to serve as volume editors, who would in turn attract outstanding contributors. It was a great pleasure to me that ultimately we were able to attract such an excellent team of editors and authors.
Elsevier Science and I, as Editor of the series, are delighted that this book series has already received such a positive response from the image communication community. We hope that the series will continue to be of great use to the many specialists working in this field.
Jan Biemond
Series Editor
Preface
The human visual system is less sensitive to flickering details than to large-area flicker. Television displays apply interlacing to profit from this fact, while broadcast formats were originally defined to match the display scanning format. As a consequence, interlace is found throughout the video chain. If we describe interlacing as a form of spatio-temporal sub-sampling, then de-interlacing, the topic of our book, is the reverse operation aiming at the removal of the sub-sampling artefacts.
The major flaw of interlace is that it complicates many image processing tasks. In particular, it complicates scanning-format conversions. These were necessary in the past mainly for international programme exchange, but with the advent of high-definition television, videophone, Internet, and video on PCs, many scanning formats have been added to the broadcast formats, and the need for conversion between formats is increasing.
This increasing need, not only in professional but also in consumer equipment, has restarted the discussion 'to interlace or not to interlace'. In particular, this issue divides the TV and the PC communities. The latter seems biased towards the opinion that present-day technologies are powerful enough to produce progressively scanned video at a high rate and do not need to trade off vertical against temporal resolution through interlacing. On the other hand, the TV world seems more conservative, and biased towards the opinion that present-day technologies are powerful enough to adequately de-interlace video material, which reduces, or even eliminates, the need to introduce incompatible standards and sacrifice the investments of so many consumers.
It appears that the two camps have had disjoint expertise for a long time. In a world where the two fields are expected by many to be converging, it becomes inevitable to appreciate and understand each other's techniques to some extent. Currently, the knowledge in the PC community on scan rate conversion in general, and on de-interlacing in particular, seems to be
The question, 'to interlace or not to interlace', touches various issues. Whether present-day technologies are powerful enough to produce progressively scanned video at a high rate and a good signal-to-noise ratio is not evident. Moreover, a visual-communication system also involves display and transmission of video signals. The issue translates for the transmission channel into the question: 'Is interlacing and de-interlacing still the optimal algorithm for reducing the signal bandwidth by a factor of two?' Before answering this question, it is necessary to know what can be achieved with de-interlacing techniques nowadays. Although the literature provides evidence that an all-progressive chain gives at least as good an image quality as an all-interlaced chain with the same channel bandwidth, recent research suggests that modern motion-compensated de-interlacing techniques, used in today's consumer electronics products, can improve the efficiency of even highly efficient compression techniques. It seems appropriate, therefore, to evaluate the available options in de-interlacing before jumping to conclusions.
As a consequence of the many related issues, the scope of our book is relatively broad.
Chapter 1 reviews the historical background of interlace, the meaning and significance of the reversed process called de-interlacing, and the motivation for the research that formed the basis of this book.
Chapter 2 presents an overview of de-interlacing techniques. Over the last two decades, many de-interlacing algorithms have been proposed. They range from simple spatial interpolation, via direction-dependent filtering, up to advanced motion-compensated interpolation. Some methods are already available in products, while the more recent ones will appear in products when technology economically justifies their complexity. Chapter 2 outlines the most relevant algorithms, available either in TV and PC products or in recent literature, and compares their performance. This comparison provides figures of merit, and screen photographs are included to show the typical artifacts of the various de-interlacing methods. Although the evaluation shows good results with motion-compensated de-interlacers, it also reveals that there is room for improvement, which can result from modifications in the de-interlacing algorithm or from improved motion estimator accuracy.
Chapter 3, therefore, introduces motion estimation techniques developed during roughly the last thirty years for different applications, such as motion-compensated (MC) filtering for noise reduction, MC prediction for coding and MC interpolation for video format conversion. MC de-interlacing is probably the most demanding application of motion estimation, as it requires estimation of the true motion with sub-pixel accuracy. This chapter focuses on the motion estimation (ME) algorithms that enabled the breakthroughs required for consumer-priced MC de-interlacing. A relative comparison of the performance of the most relevant ME algorithms is part of this chapter.
In Chapter 4, we present the research aiming at further improvement of the accuracy of the best motion estimation algorithm found in Chapter 3. In particular, we aimed at eliminating the preferences for particular fractional values of the motion vector, resulting from the use of simple sub-pixel interpolation filters.
In Chapter 5, we present the research aiming at further improvement of the best de-interlacing algorithm found in Chapter 2. In the evaluation section of this chapter we conclude that the resulting algorithm, the majority-selection de-interlacer, indeed gives the best overall de-interlacing quality. The combination of the best de-interlacer, obtained in Chapter 5, with the best motion estimator, as proposed in Chapter 4, offers a solid basis for investigating, in Chapter 6, the MPEG-2 coding efficiency of interlaced and progressive video. In contrast to research published earlier, we include a subjective assessment for the relevant bit rates. We also present a comparison in terms of the Block Impairment Metric, which is more relevant than the commonly used peak signal-to-noise ratio. Finally, we use a more balanced test set than found in earlier publications. Our improved evaluation of interlaced and progressive coding in various scenarios enables a better judgement of the current value of interlace in video standards, and shows that many modern video chains still profit from this old technique.
In Chapter 7, we further explored the comparison of interlaced versus progressive video with a focus on the display format. This comparison is of particular interest for the display of highly detailed pictures, such as text and Internet pages, and for resizing of pictures. (Picture resizing is, for example, required for so-called 'dual-window' television and for the so-called picture-in-picture feature.) It was demonstrated that the interlaced format yields subjectively an improved vertical resolution, unless line flickering becomes predominant.
In Chapter 8, we draw our final conclusion that interlace is not a relic in the digital age, but is still a relevant ingredient of modern video formats. Therefore, de-interlacing remains a key technology for future image quality improvements.
We cannot hope that this book shall silence the discussions on interlace. We do hope, however, that it serves to provide a common knowledge basis for the divided camps. It can be a starting point for further experiments that will contribute to the final technical answer. The debate is unlikely to end even there, as introducing incompatible new TV standards in the past proved difficult, and balancing technical and non-technical issues may prove to be difficult.
We would also like to thank our colleagues for their help with the research that forms the basis of this book. In particular, we are indebted to Anthony Ojo, Robert-Jan Schutten, Frits de Bruijn, Mihaela van der Schaar-Mitrea, Bram Riemens, Rimmert Wittebrood, Christian Hentschel and Ton Kalker for their support for some parts of this book.
Last, but not least, we would like to express our gratitude for the critical review of a major part of this book by Jan Biemond of the Delft University of Technology.
Contents

  1.1 Historical background of interlace
  1.2 De-interlacing
  1.3 Relation with superresolution
  1.4 Relation with MPEG-2 coding
  1.5 Motivation and scope of this book
I Basic technology
2 Overview of de-interlacing algorithms
  2.1 The de-interlacing problem
    2.1.1 Spatio-temporal sampling
    2.1.2 Motion and its spatio-temporal representation
    2.1.3 Progressive scanning and reconstruction
    2.1.4 Interlaced scanning and reconstruction
    2.1.5 Psycho-visual effects
    2.1.6 Problem statement
  2.2 Non-motion-compensated de-interlacing
    2.2.1 Linear techniques
    2.2.2 Non-linear techniques
  2.3 Motion-compensated de-interlacing
    2.3.1 Direct methods
    2.3.2 Hybrids
    2.3.3 Temporal Backward Projection
    2.3.4 Time-Recursive de-interlacing
    2.3.5 Adaptive-Recursive de-interlacing
    2.3.6 'Transversal' generalized sampling
    2.3.7 'Recursive' generalized sampling
  2.4 Evaluation
    2.4.1 Objective performance measurement
    2.4.2 Complexity
    2.4.3 Test set
    2.4.4 Results
  2.5 Conclusions
3 Overview on motion estimation techniques
  3.1 Historical developments in motion estimation
  3.2 Pel-recursive estimators
  3.3 Block-matching algorithms
    3.3.1 The match criterion
    3.3.2 Efficient search strategies
  3.4 True-motion estimation
    3.4.1 Hierarchical motion estimation
    3.4.2 Phase plane correlation
    3.4.3 Recursive search block-matching
  3.5 Global motion models
    3.5.1 Upgrading an efficient block matcher with a global motion model
  3.6 Object based motion estimation
    3.6.1 Brief overview of methods
    3.6.2 An example object based estimator
  3.7 Evaluation of motion estimation methods
    3.7.1 Estimator performance testing
    3.7.2 Evaluation results
    3.7.3 Subjective evaluation of vector fields
  3.8 Conclusion
II System optimization
4 Accurate motion estimates from interlaced video
  4.1 Accuracy of the motion vectors
    4.1.1 Improving the vertical resolution
    4.1.2 Theoretical and practical accuracy limits
  4.2 Improving block-based motion estimation
    4.2.1 Cost function
    4.2.2 Symmetrical versus asymmetrical motion estimation
  4.3 Interpolation to improve the motion vector accuracy
    4.3.1 Linear interpolators
    4.3.2 Non-linear interpolators
    4.3.3 Interpolation and generalized sampling
  4.4 Evaluation
    4.4.1 Test set
    4.4.2 Objective performance measures
    4.4.3 Results of asymmetrical motion estimation
    4.4.4 Results of symmetrical motion estimation
  4.5 Conclusions
5 On the optimization of de-interlacing
  5.1 Evaluation of the performance on detailed images
    5.1.1 Experimental setup
    5.1.2 Experimental results and evaluation
  5.2 Evaluation of the performance on edges
    5.2.1 Experimental setup
    5.2.2 Experimental results and evaluation
  5.3 Evaluation of the robustness
    5.3.1 Experimental setup
    5.3.2 Results and evaluation
  5.4 The Majority-Selection de-interlacer
    5.4.1 Combining de-interlacing strengths
    5.4.2 MS-hypothesis validation
    5.4.3 'Optimized' MS de-interlacer
  5.5 Evaluation
    5.5.1 Quality criteria
    5.5.2 Results
  5.6 Conclusions
III The future of interlace
6 The efficiency of interlaced versus progressive video on an MPEG-2 digital channel
  6.1 Introduction
  6.2 Summary of the MPEG-2 video-coding standard
    6.2.1 Group Of Pictures
    6.2.2 Intra/inter-frame/field coding
    6.2.3 Field and frame prediction
    6.2.4 Macroblock
    6.2.5 Motion vectors
    6.2.6 Discrete Cosine Transform Coding
    6.2.7 Profiles and levels
  6.3 The experiments
    6.3.1 Test sequences
    6.3.2 Subjective assessment
    6.3.3 Objective quality criteria
    6.3.4 Algorithms
  6.4 Results and evaluation
    6.4.1 All progressive-coding chain versus interlaced-coding chain
    6.4.2 All interlaced-coding chain versus progressive-coding chain
    6.4.3 Receiver-side de-interlacing versus transmitter-side de-interlacing
    6.4.4 Receiver-side interlacing versus transmitter-side interlacing
    6.4.5 All progressive-coding chain versus progressive-coding chain
  6.5 Discussion
7 Towards an optimal display format
  7.1 Display format options
  7.2 Exploiting the source picture resolution
    7.2.1 High resolution pictures
    7.2.2 Standard definition video
  7.3 Evaluation
    7.3.1 First subjective assessment
    7.3.2 Second subjective assessment
  7.4 Conclusions
A Cycles per degree and cycles per picture width
C Interpolation for sub-pixel de-interlacing
D Example: derivation of a 4 taps TGST filter
E Robustness of the TGST de-interlacer
F MS de-interlacer optimization for a programmable architecture
  F.1 TriMedia architecture
  F.2 Majority-Selection de-interlacer on the TriMedia
  F.3 Computational requirements
CHAPTER 1
Introduction
For centuries, mankind has been creating paintings to portray real or imagined scenes. The oldest paintings in the world, found in a cave in the Ardeche Valley of France, are estimated to go back about 30,000 years. Paintings, rather than text written in characters, were the first means of communication and, as the old Chinese proverb 'a picture is worth ten thousand words' indicates, an efficient one. It is, therefore, not surprising that man is highly interested in looking at pictures. It took centuries for the next step to be taken: motion pictures. The first motion pictures shown to the public, by the Lumiere brothers, date from 1895 in the Grand Cafe in Paris (France). The early years of the film industry were a time of exploration. Of course, no preconceived idea about how to make films yet existed, so filmmakers had to learn by trial and error.
The idea of 'vision at a distance', i.e. scenes reproduced far from their origin, can be traced back to the 19th century, and it is not unlikely that it originated even earlier. However, it took until the late thirties before television (TV), as a first realization of this concept, was introduced to the public, which took place at the World's Fair in 1939. From that time onwards, however, the television industry did not take long to grow into a multi-billion dollar industry.
The penetration of TVs in U.S. households was about 9% in 1950 [1]. Within five years the percentage went up to 64.5%. The 1999 penetration is at a level of 98.2%. U.S. television households with two or more sets accounted for about 1% in 1950, and grew to 74.3% in 1999 [1]. So we may conclude that television has become a major product for entertainment, communication and information.
Webster's dictionary defines television as
'an electronic system for transmitting images of fixed or moving objects together with sound over a wire or through space by an apparatus that converts light and sound into electrical waves and reconverts them into visible light rays and audible sound'.
The process of converting light into electrical signals was enabled by the discovery of the photoelectric effect in selenium bars in 1873. Exposed to light, these bars show a variation in resistance. As such, a variation in light can be transformed into a variation of an electrical signal and, therefore, be transmitted.
One of the earliest methods of scanning a picture to generate a corresponding electrical signal is described in a patent granted to the German Paul Nipkow. He invented an electromechanical scanning technique based on a rotating disk with a series of holes arranged in a spiral. The light-sensitive selenium bars behind this perforated disk captured the picture. This disk became known as the Nipkow disk. However, Nipkow could not put his idea into practice with the materials and knowledge available at that time. Another scientific development at the end of the 19th century offered an alternative: the usage of the electron. This tiny particle of negative charge with almost negligible inertia became a main focus of research. Karl Ferdinand Braun of the University of Strasbourg had, in 1897, the idea of using two electromagnets to make the electron beam move in the horizontal and vertical direction. To demonstrate his idea, he built the oscilloscope. The cathode rays of electrons were made visible by fluorescent materials at the end of the tube. This system became known as the Cathode Ray Tube
television sets of today
With the introduction of television in the 1930s, standardization was required, i.e. rules or constraints for transmitting and receiving pictorial information, similar to, e.g., the rules on how to read a paper; in many countries the commonly accepted rules are: read from the top to the bottom of a paper, and from left to right. Common TV displays use the same scanning direction.
Next to economical constraints, technical and psycho-visual criteria mainly formed the core of the standardization for television signals. Although many television standards evolved over time (e.g. PAL, NTSC, SECAM), some elementary characteristics remained common to several standards. In particular, vertical-temporal subsampling, i.e. interlace, was found to be a good means to reduce the bandwidth, as it profits from the psycho-visual characteristics of the Human Visual System (HVS).
Although these standards were defined, picture quality improved significantly over the years, starting from small, low-resolution pictures with low light output and moving towards the bright, high-resolution, large screens of today. It is, therefore, not unlikely that technological choices made in the past are less than optimal for the current state of technology.
Given the large number of television receivers throughout the world, any technological advance has to be compatible with existing standards [2]. However, the advent of digital video has restarted the discussion on interlace in broadcast standards. As both technical and non-technical issues affect the debate on interlaced or progressive video, it is unlikely that we can silence all discussions on interlace or progressive video broadcast. However, this book provides the ingredients that enable a profound comparison between both scanning formats¹, as well as the comparison itself. The results of this book may provide a framework for the technical part of the discussion of interlaced versus progressive video.
In Section 1.1, we will briefly focus on the historical background of interlace. Section 1.2 focuses on the reversed process, de-interlacing, which is a basic requirement for several video processing applications. As de-interlacing increases the vertical resolution, it can be considered as a one-dimensional derivative of superresolution. We will further elaborate on this in Section 1.3. In Section 1.4, we discuss the link between interlace/de-interlacing and MPEG-2 coding/decoding, which is followed in Section 1.5 by the motivation for the research that forms the core of this book.
1.1 Historical background of interlace

The transmission of time-varying pictures, usually referred to as video, requires a means to convert the sequence of two-dimensional pictures into a one-dimensional signal, which can be either analog or digital. The spatio-temporal information contained in this video is ordered as a function of time according to a predefined scanning format. This scanning format, which is a major part of a video standard, defines the number of video or scanning lines per picture, and the number of pictures per second. The number of scanning lines defines the maximally achievable vertical resolution, whereas the number of pictures per second (the temporal repetition frequency, in Hz) defines the achievable temporal resolution. Finally, the maximum perceivable² horizontal resolution is determined by the video bandwidth, spot size, video format and, in the digital format, the 'picture element' (pixel) sampling frequency.

¹ A scanning format defines the manner in which a time-varying picture is explored for its luminance and chrominance values.
It has been found [2] that just 10 pictures per second represents an adequate rate to convey the illusion of motion. The ability to retain, or in some way to remember, the impression of an image after it has been withdrawn from the observer persists for about 0.1 seconds. Motion pictures and television use higher rates to reduce the visibility of flicker. The perception of flicker varies widely with viewing conditions. The screen size, colour, brightness, viewing angle, and background illumination all affect the perceptibility. Motion pictures are recorded at a rate of 24 pictures per second; however, if displayed at this rate, the flicker would still be objectionable. To nearly eliminate flicker, the display frequency was increased by a factor of two by displaying every picture twice. The resulting picture-update frequency of 48 Hz is still used for motion pictures in cinemas.
At the time of the introduction of television, it was, therefore, necessary to choose a picture-update frequency of at least 48 pictures per second. To avoid artifacts in the picture caused by the cycle frequency of the mains power³, the picture-update frequency was set to 60 pictures per second (using 525 scanning lines) mainly on the American continent, but also in some countries in Asia, such as Japan. In most other parts of the world, a standard of 50 pictures per second (using 625 scanning lines) was adopted.
A video transmission system of 50 or 60 'full' pictures per second was considered not to be economically attractive. An ingenious solution was found that reduced the required video bandwidth, and thereby the system costs, while maintaining a nearly flicker-free picture. This is referred to as interlacing. As sketched in Figure 1.1, when a picture is displayed in the interlaced format, the odd and even scanning lines of the picture are alternately projected on the screen. (Higher orders of the interlace factor have been proposed and evaluated, but a factor of two was found to maximize the quality criteria.) A set of lines which together describe a picture is referred to as a frame. The odd-numbered lines of the frame, together constituting the odd field (also known as the top field), are shown in a first scan on the display, and the even-numbered lines, forming the even field (also known as the bottom field), in a second run (see also Figure 1.1) [4]. The picture-update frequency remains 50 or 60 pictures per second, while

² Assuming that the HVS is not the limiting factor.
³ The relation with the mains was necessary due to problems in the past with the voltage regulation of the power supply in the television sets [3].
A study by Engstrom [5] in the beginning of the thirties on interlaced scanning already revealed the effect of what is called line flicker, i.e. flickering that is often due to horizontal edges in the picture. In his experiments, he used a rotating disk, as sketched in Figure 1.3, rotating at 24 revolutions per second. The inner section of the disk corresponds to the situation where each line is illuminated for two thirds of each frame cycle at a rate of 48 frames per second, i.e. a progressive scanning pattern. The outer section corresponds to a condition where each line is illuminated for two thirds of each frame cycle at the rate of 24 frames per second, but such that alternate groups of lines are illuminated 180 degrees out of phase, i.e. an interlaced scanning pattern with a field frequency of 48 pictures per second.
Starting with a viewing distance considerably beyond that which allowed the observation of individual lines, a flicker effect was not noticeable. Approaching the disk, it was observed that the line structure could be resolved at a certain position, and at the same time a peculiar interline effect was observed for the outer section of the rotating disk. This behaviour became very pronounced and annoying as the observer approached the disk more closely, whereas for the inner section this effect was not noticeable. This effect is referred to as line flicker or interline twitter.
Figure 1.2: HVS graph (source [6]). The contrast sensitivity decreases rapidly with increasing vertical frequency. The HVS is less sensitive to flickering detail than to large-area flicker.
Figure 1.3: Special rotating disk for flicker tests with interlaced scanning
Figure 1.4: Aliasing in the vertical direction due to interlacing the video (fs equals the frame sampling frequency).
An alternative explanation of line flicker is provided if we consider interlacing as vertical subsampling with a field-alternating vertical offset, but without prior anti-aliasing filtering. From linear sampling-rate conversion theory [7] it is known that, without proper anti-aliasing filtering, aliasing occurs, as the first repeat spectrum folds back into the baseband (see the example shown in Figure 1.4). Aliasing occurs for the higher vertical frequencies in both fields, with opposite sampling phases.
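The folding of the repeat spectrum can be checked numerically. The following is a minimal sketch and is not taken from the book: it assumes a purely vertical cosine pattern with a hypothetical frequency of 0.4 cycles per frame line, and the names `F_DETAIL`, `F_ALIAS`, `sample_field` and `max_abs_diff` are illustrative only. Within each field, the samples of the fine pattern coincide, up to sign, with those of a much coarser pattern at 0.1 cycles per frame line, and the sign is opposite in the two fields.

```python
import math

F_DETAIL = 0.4            # assumed vertical frequency, in cycles per frame line
F_ALIAS = 0.5 - F_DETAIL  # frequency it folds to after 2:1 vertical subsampling


def sample_field(freq, parity, lines=16):
    """Sample cos(2*pi*freq*y) on every second frame line.

    parity 0 takes lines 0, 2, 4, ...; parity 1 takes lines 1, 3, 5, ...
    """
    return [math.cos(2 * math.pi * freq * y) for y in range(parity, lines, 2)]


def max_abs_diff(a, b):
    """Largest absolute difference between two equally long sample lists."""
    return max(abs(x - y) for x, y in zip(a, b))


# Field with parity 0: the fine pattern cannot be told apart from the coarse one.
even_detail = sample_field(F_DETAIL, 0)
even_alias = sample_field(F_ALIAS, 0)

# Field with parity 1: same alias frequency, but with inverted sign.
odd_detail = sample_field(F_DETAIL, 1)
odd_alias = [-v for v in sample_field(F_ALIAS, 1)]

print(max_abs_diff(even_detail, even_alias))
print(max_abs_diff(odd_detail, odd_alias))
```

Both printed differences are at the level of floating-point rounding. The sign inversion between the two fields is one way to picture why the aliased component alternates from field to field and is perceived as flicker rather than as a stable coarse pattern.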
On common television displays, line flicker is noticeable only at very fine vertical detail. As a first example, consider a black picture with a single white horizontal line: this white line is shown in only half of the fields. Because the line is therefore updated only 25 or 30 times per second, the update rate is not sufficient to avoid flicker, i.e. the picture-update frequency is too low. As a second example, consider a black picture with two horizontal white lines. The 'top' white line is shown in the first field and the 'bottom' white line in the second field. It seems that the line moves downwards and upwards alternately.
A special case of line flicker is created if a picture is made up of alternating light and dark lines. As a result, in one field only the light lines will be displayed and in the next field only the dark lines. Consequently, the complete picture flickers with half the picture-update frequency.
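The line-flicker examples above can be reproduced with a few lines of code. This is a small sketch under simple assumptions: a frame is modelled as a list of lines, the first field takes the even-numbered lines and the second field the odd-numbered ones. The helper names `split_fields` and `mean_level` are hypothetical.

```python
def split_fields(frame):
    """Split a frame (a list of lines) into the two fields of an interlaced signal."""
    return frame[0::2], frame[1::2]

def mean_level(field):
    """Average brightness of all pixels in a field."""
    values = [v for line in field for v in line]
    return sum(values) / len(values)

LIGHT, DARK = 255, 0
WIDTH, HEIGHT = 4, 8  # assumed toy picture size

# Special case: alternating light and dark lines.
striped = [[LIGHT] * WIDTH if y % 2 == 0 else [DARK] * WIDTH for y in range(HEIGHT)]
striped_first, striped_second = split_fields(striped)

# First example: a black picture with a single white line (here on line 3).
single = [[LIGHT] * WIDTH if y == 3 else [DARK] * WIDTH for y in range(HEIGHT)]
single_first, single_second = split_fields(single)
```

For the striped picture, one field is entirely light and the other entirely dark, so the whole picture alternates in brightness from field to field. For the single white line, only one of the two fields contains the line at all.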
Apart from the line flicker, there is a second effect that is also typical of interlace, called line crawl. Line crawl results from the interlace process when the eye scans the picture vertically at a speed of one scan line per field. This occurs, for example, if the observer tracks an object on the screen that is moving in the vertical direction with about this speed (e.g. a rolling caption). The line structure of the display becomes visible and it seems to 'crawl' across the object. Even if the picture has a homogeneous brightness, the observer can perceive an apparent movement of the lines. In this case, the observer interprets the scanning lines as if belonging to a moving structure.
Despite the line-flicker effect and line crawl for interlaced video, interlace
resolution. Moreover, due to the characteristics of the pick-up device and the common picture material, very high vertical frequencies are virtually absent or at least limited to small image parts.
displays is significantly higher than that for the interlaced displays, as the number of vertical lines is increased, whereas the picture-update frequency is not decreased.
Although the regular (interlaced) Cathode Ray Tube (CRT) displays are still preferred, mainly because of costs and the amount of light output, advances in flat matrix displays⁴ in particular will inevitably lead to the replacement of an increasing number of CRTs. Moreover, the increasing diversity of scanning formats for the various display types increases the need for video format conversion. In particular, the conversion of interlaced video to progressive video, referred to as de-interlacing, is in increasing demand.
De-interlacing converts each field into one frame, i.e. the number of pictures per second remains constant, whereas the number of lines per picture is doubled, as sketched in Figure 1.5.
De-interlacing is a simple matter for stationary pictures (no object or camera motion and no intensity changes, apart from noise), as together the alternating odd and even fields describe the captured scene. However, objects often move, the camera moves, light conditions change, and scene cuts frequently occur. In these circumstances, de-interlacing is often empirically determined, as it requires the interpolation of picture data that was never transmitted or even registered. The challenge is to estimate, from the current and, not unlikely, from neighboring pictures, the missing information that most likely reconstructs the original (non-registered) scene.
⁴ In matrix displays the pixels are addressed individually in both spatial dimensions.
1.2 De-interlacing
Figure 1.5: The process of de-interlacing.
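The difference between the stationary and the moving case can be illustrated with the simplest possible de-interlacer, which interleaves the lines of two successive fields (often called field insertion or weave). The sketch below is an illustration under assumed conventions, not the method proposed in this book: pictures are lists of lines, and `weave` and `scene_at` are hypothetical helper names.

```python
def weave(previous_field, current_field, current_parity):
    """Field insertion: interleave the lines of two successive fields into one frame.

    current_parity is 0 if current_field holds the even lines, 1 if it holds the odd lines.
    """
    height = len(previous_field) + len(current_field)
    frame = []
    for y in range(height):
        source = current_field if y % 2 == current_parity else previous_field
        frame.append(list(source[y // 2]))
    return frame

def scene_at(t, width=8, height=4):
    """A white vertical bar, one pixel wide, located at horizontal position t."""
    return [[255 if x == t else 0 for x in range(width)] for _ in range(height)]

# Stationary scene: both fields sample the same picture, so weaving restores it exactly.
static = scene_at(2)
static_frame = weave(static[0::2], static[1::2], current_parity=1)

# Moving scene: the bar shifts one pixel between the two field instants.
moving_prev, moving_cur = scene_at(2), scene_at(3)
moving_frame = weave(moving_prev[0::2], moving_cur[1::2], current_parity=1)
```

For the stationary scene the woven frame equals the original picture. For the moving bar, the even lines show the bar at its old position and the odd lines at its new position, so the result matches neither instant; this is the data that has to be interpolated instead.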
As de-interlacing is a simple matter for stationary image parts, we can virtually create stationary images by compensating for the so-called true motion⁵. However, motion compensation requires motion estimation, since motion vectors (see also Figure 1.6) are not broadcast according to the conventional broadcast standards. The model used to estimate the motion is only a simplified representation of the 'real world'. Motion estimation (ME) was, and still is, subject to much research, as it is a fundamental problem in many video processing tasks [8].
A popular type of motion estimator estimates the motion for every block or group of pixels, i.e. it indicates whether image parts are moving, and if so, in what direction and with which velocity. This velocity is commonly projected on the two-dimensional image plane, and a motion vector is available for every individual pixel. The two-dimensional motion vector is the projection of the motion trajectory in the image plane, as shown in Figure 1.6 (see also Chapter 3).
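As a rough illustration of block-based motion estimation, the sketch below performs an exhaustive (full-search) block match using the sum of absolute differences (SAD) as match criterion. This is a generic textbook technique chosen for brevity, not the estimator developed in this book; the block size, search range, sign convention of the vector, and the synthetic `texture` function are all assumptions.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def get_block(picture, top, left, size):
    """Cut a size x size block out of a picture stored as a list of rows."""
    return [row[left:left + size] for row in picture[top:top + size]]

def estimate_block_motion(previous, current, top, left, size=4, search=2):
    """Full-search block matching.

    Returns ((dx, dy), cost) such that the block at (left, top) in `current`
    best matches the block at (left - dx, top - dy) in `previous`.
    """
    height, width = len(current), len(current[0])
    target = get_block(current, top, left, size)
    best_vector, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ref_top, ref_left = top - dy, left - dx
            if ref_top < 0 or ref_left < 0 or ref_top + size > height or ref_left + size > width:
                continue  # candidate block falls outside the picture
            cost = sad(target, get_block(previous, ref_top, ref_left, size))
            if best_cost is None or cost < best_cost:
                best_vector, best_cost = (dx, dy), cost
    return best_vector, best_cost

def texture(x, y):
    """Deterministic synthetic texture (hypothetical test pattern)."""
    return (31 * x * x + 17 * y * y + 5 * x * y) % 251

SIZE = 12
previous = [[texture(x, y) for x in range(SIZE)] for y in range(SIZE)]
# The current picture shows the same texture moved one pixel right and one pixel down.
current = [[texture(x - 1, y - 1) for x in range(SIZE)] for y in range(SIZE)]

vector, cost = estimate_block_motion(previous, current, top=4, left=4)
```

A criterion such as SAD finds the best-matching block, which is what a predictive coder needs; it does not by itself guarantee that the vector describes the true motion of the object.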
It is relevant to mention that the demands on a motion estimator in video format conversion generally differ from the demands for, e.g., coding applications [4, 9]. Motion estimators for predictive coding generally aim at minimizing the prediction error, i.e. neighboring motion vectors are not necessarily spatially well correlated in homogeneous regions. The best results for video format conversion are obtained with motion estimators that estimate the true motion of objects instead of the best correlation of
⁵ The true motion does not necessarily equal the physical motion of objects, but it represents the projection of the physical motion onto the two-dimensional image plane.
The image acquisition device used to capture pictures or a sequence of pictures (video) samples the video signal at least in the vertical and the temporal direction. However, a proper anti-aliasing filter prior to sampling is rather difficult to realize (in the optical path), and as such, this filter is missing. The quality of the optics generally exceeds the quality of the image capture device. Consequently, the video signal is undersampled in the vertical and temporal direction, and as a result, the picture usually suffers from aliasing.
Elimination of alias is to some extent possible by combining the information from multiple pictures. This is what superresolution aims at.
1.3 Relation with superresolution
Superresolution refers to obtaining video at a resolution higher than that of the pickup device [8], which is only possible if the 'lower' resolution pictures contain alias. As such, we may consider the set of de-interlacing algorithms as a subset of the superresolution algorithms. Superresolution is, however, commonly pursued in both spatial directions.
Similar to de-interlacing, the problem of obtaining superresolution from a single 'low-resolution' picture is known to be ill-posed. However, the problem becomes solvable when a sequence of these 'low-resolution' pictures with small mutual differences is considered. Superresolution exploits the 3-D correlation (horizontal, vertical, and temporal) that is usually present in video. It upconverts the input picture while eliminating or reducing the alias. High-quality superresolution can only be achieved with proper motion-estimation and motion-compensation techniques, similar to de-interlacing.
Despite this similarity, the algorithms applied for superresolution and de-interlacing differ significantly. De-interlacing algorithms are subject to real-time constraints, whereas the generation of superresolution is commonly (still) an 'off-line' process. Most algorithms for superresolution are iterative, in the sense that they start with an estimate of the higher-resolution image and iteratively update this image using multiple neighboring pictures. Due to the real-time constraints, and the demand for a consumer price level for de-interlacing, iteration is (still) hardly feasible. Moreover, to reduce system costs, de-interlacing techniques minimize the number of neighboring pictures used (commonly to one or two surrounding pictures), whereas for superresolution it is not uncommon to solve the problem with about ten or even more surrounding pictures. For an example, see References [10, 11].
Applications that can profit from superresolution are, for example, printing of a captured video scene at a high quality level, detection of small targets in military or civilian surveillance imaging, or detection of small tumors in medical imaging [8]. These applications mainly focus on the resolution improvement to yield a single or limited set of output pictures. The application to video is expected to get more and more attention. As such, it is not unlikely that superresolution algorithms for video will use techniques similar to those of de-interlacing, and each research area may profit from the knowledge obtained in the other.
Although superresolution is at present of interest to many researchers, we will focus only on the vertical resolution improvement techniques for standard video signals, introduced earlier as de-interlacing.
i.e. the digitally coded video is modulated on an RF⁶ carrier frequency. This shift from analog to digital video broadcast is far from trivial. Consider a video sequence with 720 active samples (pixels) per line and 576 lines per full frame, 8 bits for the luminance signal, 8 bits for the chrominance signals, and 25 frames per second⁷. A transmission capacity of $720 \cdot 576 \cdot 2 \cdot 8 \cdot 25 \approx 165$ Mb (or 21 MB) per second per channel is required. Broadcasters aim at bit rates of roughly 2 up to 8 Mb/s, and as such, compression ratios of about 20 up to 80 are required to combine several digital video channels into one analog video channel (cost reduction). Therefore, compression techniques are a prerequisite for transmitting digital video.
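The arithmetic behind these figures can be reproduced directly. The short sketch below uses only the numbers quoted in the text; the variable and function names are illustrative, and a megabit is taken as 10^6 bits (an assumption).

```python
ACTIVE_SAMPLES_PER_LINE = 720
LINES_PER_FRAME = 576
BITS_LUMINANCE = 8      # bits per pixel for the luminance signal
BITS_CHROMINANCE = 8    # bits per pixel for the chrominance signals
FRAMES_PER_SECOND = 25

bits_per_pixel = BITS_LUMINANCE + BITS_CHROMINANCE
raw_bits_per_second = (ACTIVE_SAMPLES_PER_LINE * LINES_PER_FRAME
                       * bits_per_pixel * FRAMES_PER_SECOND)

raw_megabits_per_second = raw_bits_per_second / 1e6       # roughly 166 Mb/s
raw_megabytes_per_second = raw_megabits_per_second / 8    # roughly 21 MB/s

def compression_ratio(target_megabits_per_second):
    """Ratio between the uncompressed bit rate and a target broadcast bit rate."""
    return raw_megabits_per_second / target_megabits_per_second

ratio_at_8 = compression_ratio(8)  # lower end of the required compression
ratio_at_2 = compression_ratio(2)  # upper end of the required compression
```

With target rates of 8 Mb/s and 2 Mb/s, the ratios come out at roughly 21 and 83, in line with the range of about 20 up to 80 quoted above.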
Video compression is the key enabling technology for digital video. As
ratios with acceptable quality levels, it is not surprising that the consumer electronics industry has adopted the MPEG-2 compression standard for Digital Video Broadcast (DVB), the Advanced Television Systems Committee (ATSC) standard in the USA, and the Digital Versatile Disk (DVD). MPEG-2 enables the (near) future replacement of analog video broadcast and recording.
Since the MPEG-2 standard, next to interlace, also enables progressive coding, it is not unlikely that the progressive video format will be supported by several professional and consumer products. Moreover, the rapid growth in display technology has led to a diversity of display types like PDP (Plasma Display Panel), PALC (Plasma Addressed Liquid Crystal), LCD (Liquid Crystal Display), and projection displays that are commonly addressed in a progressive video format. As, besides these developments, a significant part of the programme material is available only in the interlaced format, the conversion from the interlaced to the progressive format, i.e. de-interlacing, is a requirement at either the transmitter or the receiver side. De-interlacing remains, therefore, a key enabling technology that is not limited to the conventional analog video broadcast. The trade-off of the MPEG-2 coding efficiency of interlaced versus progressive video is an interesting research topic addressed in this book, and it may contribute to the future relevance of work on de-interlacing.
⁶ Radio Frequency.
⁷ These numbers originate from a European (PAL) video signal, sampled at a 13.5 MHz clock frequency, which is a rather common sampling frequency.
1.5 Motivation and scope of this book
De-interlacing is a key technology for many scanning format conversions. It goes without saying that it is a requirement for converting an interlaced video signal into a progressive video signal, but it is also required for conversions between interlaced video formats with different picture-update frequencies.
De-interlacing increases the vertical resolution per field by a factor of two. However, as common TV signals do not fulfil the demands of the sampling theory, i.e. the Nyquist criterion, we cannot rely on the linear sampling-rate conversion theory. It is even fundamentally impossible to solve the de-interlacing problem under all circumstances, as will be explained. Probably, this fundamental problem has resulted in the large variety of de-interlacing techniques proposed in the literature.
Some researchers completely neglect this problem and apply the sampling-rate conversion theory. Others try to exploit the commonly high spatio-temporal correlation in the video signal to estimate the missing information that has never been transmitted or even registered. Neglecting the Nyquist criterion, i.e. solving the de-interlacing problem purely spatially, yields an overall weak de-interlacing performance, while including vectors describing the motion of objects in the scene further improves this performance.
It seems, however, rather difficult to guarantee robustness of the de-interlacer against incorrect motion vectors, while preserving the high vertical frequencies that are present in many detailed picture parts and edges. The challenge was to design a new de-interlacer that surpasses the performance of the best de-interlacers known so far, while bearing economic constraints in mind. It goes without saying that the best de-interlacing quality can potentially be obtained with motion-compensation techniques. Moreover, highly accurate (true) motion vectors potentially further optimize the de-interlacing performance.
We may question the relevance of our effort to improve the de-interlacing quality, as the digital video standard has restarted the discussion of interlaced versus progressive video. Is interlace a relic, i.e. an outdated format, or is it still a good means to reduce the bit rate while preserving resolution?
Figure 1.7: An example of typical blocking artifacts that can appear in an MPEG-2 decoded picture.
To justify our effort in de-interlacing, we also included in this book a comparison between the coding efficiency of interlaced and of progressive video. Although several researchers have published comparable studies [12-16], we found reasons to believe that some very relevant aspects are missing in their work. Particularly the effect of 'blocking artifacts' in the decoded pictures, as illustrated in the example of Figure 1.7, is missing. Perhaps even more important, a subjective assessment for the most relevant bit rates (about 2 to 8 Mb/s) is missing. Moreover, we found that most researchers investigated the comparison of the interlaced versus the progressive video format only for sequences containing very high vertical frequencies. Up till now, less challenging, and perhaps even more common, picture material that contains less vertical detail but stronger motion was completely neglected in the investigations.
It is without question that, from a technical point of view, a thorough investigation prior to the debate of interlaced versus progressive video is required. For a fair comparison, we need to provide:
• A high-quality de-interlacer,
• A subpixel-accurate motion estimator,
• A representative test set for the evaluation, and
• Decent error criteria for the analysis.
In this book, we will therefore focus on existing de-interlacing algorithms, means to improve the motion vector accuracy, improving the de-interlacing quality, and coding and display characteristics for interlaced versus progressive video.
After the introduction in Chapter 1, Chapter 2 presents an overview of de-interlacing techniques that are found either in a consumer product or in the literature. The de-interlacing techniques range from linear spatial methods to sophisticated motion-compensated techniques. This chapter includes an evaluation of several de-interlacers, revealing some strengths and weaknesses of the evaluated algorithms. To enable a quick comparison of the various methods, we introduced a so-called star graph, based on two objective quality criteria. The star graph is a footprint of a method, immediately showing some strengths and weaknesses.
High-quality de-interlacing relies on accurate (true) motion vectors, which need to be estimated. Reuse of motion estimators designed for video compression is not an option. These estimators are designed to minimize the prediction error in predictive codecs⁸, but the resulting vectors do not necessarily reflect the true object motion required by our application. Furthermore, they usually lack accuracy. Therefore, we devote Chapter 3 to the subject of accurate true-motion estimation.
In a first attempt to optimize the de-interlacing performance, Chapter 4 presents means to exclude preferences in the motion estimator for particular motion-vector fractions. Preferences do not change the motion vector resolution, but degrade the accuracy. We investigated preferences that are due to the choice of the interpolator function⁹ and the motion estimation type. We found a constraint that, if applied, eliminates nearly all preferences in the motion estimator for the relevant (spatial) frequency range.
Accurate motion vectors are a first step towards high-quality de-interlacing. However, as perfect motion estimation is an ideal that we can only pursue, a high-quality de-interlacer requires means to prevent annoying artifacts for incorrect motion vectors. In Chapter 2, we found satisfactory results with some de-interlacers, but no de-interlacer seemed to combine all desired strengths into one de-interlacer. Therefore, Chapter 5 focuses on a further optimization of the overall de-interlacing quality by combining several de-interlacers with strengths in detail preservation, edge preservation, and robustness against incorrect motion vectors.
⁸ Video encoding and decoding systems.
⁹ This interpolator function is used to obtain the subpixel fraction.
In Chapter 6, we investigated the relevance of our effort in de-interlacing for future systems. Chapter 6 includes a thorough comparison of the interlaced and the progressive video format with respect to the MPEG-2 coding efficiency. Compared to earlier published research in this area [12-16], we profit from a high-quality de-interlacer, a test set containing sequences with different characteristics, extended error criteria, and a subjective assessment. We found, in contrast to published research so far, superiority of interlaced video over progressive video for particular scenarios.
In Chapter 7, we further explored the comparison of interlaced versus progressive video, not with respect to the coding efficiency, but with respect to the display format. Moreover, we extended the evaluation to display formats that require a different refresh rate, and, therefore, scan rate conversion techniques. The included subjective assessment indicated that an interlaced display format on average produced qualitatively better results than the progressive display format with the same sampling frequency.
Finally, the conclusions are formulated in Chapter 8.
Chapter 2 - Overview of de-interlacing algorithms
De-interlacing is a prerequisite in various video processing systems. To mention some:
• TV receivers with a progressive display
• Broadcast-enabled PC [17]
• Systems that require vertical scaling of interlaced video
• Most systems with scanning format conversion (assuming interlaced input), even with interlaced output
Without requiring a de-interlacer, some systems may, however, profit from a de-interlacer, like:
• Motion estimators (MEs)
• Encoders for digital video compression
Figure 2.1 illustrates the de-interlacing task. The input video fields, containing samples of either the odd or the even vertical grid positions (lines) of an image, have to be converted to frames that contain all video lines. These frames represent the same images as the corresponding input fields but contain the samples of all lines. Note that the temporal frequency, i.e. the number of pictures per second, is not changed. Formally, we define the output frame $F_{out}(\vec{x}, n)$ as:
\[
F_{out}(\vec{x}, n) =
\begin{cases}
F(\vec{x}, n), & y \bmod 2 = n \bmod 2 \\
F_i(\vec{x}, n), & \text{otherwise}
\end{cases}
\]
with $\vec{x} = (x, y)^T$ the spatial position, $n$ the field number, $F(\vec{x}, n)$ the input field, defined for "$y \bmod 2 = n \bmod 2$" only, and $F_i(\vec{x}, n)$ the interpolated pixels. Note that "$y \bmod 2 = n \bmod 2$" is true for odd lines in odd fields and even lines in even fields only, which will be called original lines. The remaining lines will be called interpolated lines.
Figure 2.1: The de-interlacing task.
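The definition of the output frame translates almost literally into code. The sketch below is illustrative only: a field is assumed to be stored as a dictionary from line number to a list of pixel values, and the interpolator `line_average` (the mean of the vertical neighbours) is merely a placeholder for whichever method defines the interpolated lines. All names are hypothetical.

```python
def line_average(field, y, height):
    """Placeholder interpolator: average of the lines directly above and below line y.

    At the top or bottom border, the single available neighbour is repeated.
    """
    above = field.get(y - 1)
    below = field.get(y + 1)
    if above is None:
        above = below
    if below is None:
        below = above
    return [(a + b) / 2 for a, b in zip(above, below)]

def deinterlace(field, n, height, interpolate=line_average):
    """Build the output frame for field number n.

    `field` maps line number y -> list of pixels, and only contains the
    original lines, i.e. those with y % 2 == n % 2.
    """
    frame = []
    for y in range(height):
        if y % 2 == n % 2:
            frame.append(list(field[y]))                  # original line, copied unchanged
        else:
            frame.append(interpolate(field, y, height))   # interpolated line
    return frame

# An odd field (n = 1) of a four-line picture: only lines 1 and 3 exist.
odd_field = {1: [10, 20], 3: [30, 40]}
frame = deinterlace(odd_field, n=1, height=4)
```

Keeping the interpolator as a parameter mirrors the structure of the definition: the original lines are fixed, and the de-interlacing methods differ only in how the remaining lines are computed.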
Many de-interlacing algorithms have been reported in the literature and some are available in a commercial product. The quality performance of these algorithms, however, differs significantly. This chapter compares many of these algorithms, and includes an evaluation based on objective quality-performance criteria, and a subjective illustration using screen photographs (see also Ref. [18]).
In the subsequent sections, we only define $F_i(\vec{x}, n)$ for the various de-interlacing methods, as the original lines, $F_o(\vec{x}, n)$ (also indicated as $F(\vec{x}, n)$), are unchanged, unless mentioned otherwise. (It is assumed that the original lines contain the desired information. As such, these lines do not require any modification. However, it can be beneficial to modify the original lines, as will be explained.)
Section 2.1 presents the de-interlacing problem in the context of spatio-temporal sampling grids and psycho-visual effects. Section 2.2 shows an overview of the de-interlacing algorithms that do not use motion information, and in Section 2.3, the overview is continued with de-interlacing algorithms that apply motion vectors. Section 2.4 presents an objective evaluation of the de-interlacing methods. Screen photographs are included in this section to illustrate typical artifacts of the individual de-interlacing algorithms. Finally, conclusions are drawn in Section 2.5.
2.1 The de-interlacing problem
Figure 2.2: Spatio-temporal sampling of the video signal.
If we describe interlacing as a spatio-temporal sub-sampling process, then de-interlacing is the reverse process: spatio-temporal up-sampling. Although the sub-sampling and the up-sampling process are well described in sample-rate conversion theory [7], we will explain that this theory is not generally applicable to the de-interlacing problem.
2.1.1 Spatio-temporal sampling
Spatio-temporal sampling is applied to the continuous time-varying video signal $F_c(x_c, y_c, t)$, where $(x_c, y_c)$ is the continuous (spatial) horizontal and vertical position respectively and $t$ the temporal position. Recall that the analog video signal is a 1-D continuous signal with the spatial position $(x_c, y_c)$ mapped to the time $t$. In order to obtain an amplitude- and time-discrete representation of the analog video signal, sampling is required in the three dimensions, as shown in Figure 2.2, where $(x, y, n)$ denote the discrete spatial and temporal coordinates. Note that sampling in the vertical ($y$) and time ($t$) direction is part of the scanning format used in the camera. Consequently, we can digitize the video signal by sampling the video in the horizontal direction $x$ along each scan line.
The spatio-temporal sampling is mathematically expressed as:
where $T$ is the image period, $n$ the image number, $\Delta x$ and $\Delta y$ the horizontal and vertical sample distance respectively, and where $\mathrm{III}_{\Delta x}(x_c)$ is defined as ($\mathrm{III}_{\Delta y}(y_c)$ and $\mathrm{III}_{T}(t)$ are defined accordingly):
with $*$ for convolution, and as such, $**$ for 2-D convolution.
According to Equation 2.5, the spectrum of the continuous video signal $\mathcal{F}_c$ is replicated in the 2-D frequency domain due to the 2-D sampling process (see also Figure 2.3).
The extension of the 2-D into the 3-D spatio-temporal sampling of Equation 2.3 results in:
The spectrum of the continuous video signal $\mathcal{F}_c$ is therefore replicated in three dimensions due to the 3-D sampling lattice¹.
¹ This lattice describes the applied model for spatio-temporal sampling of a video signal. In practice, the video signal is sampled continuously, which also includes the 'fly-back' time.
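The sampling model of this subsection can be written out compactly as below. This is a sketch in standard sampling-theory notation for an orthogonal sampling lattice, not a reproduction of the numbered equations: the comb symbol $\mathrm{III}$, the frequency variables $f_x, f_y, f_t$, the summation indices $k, l, m$, the normalisation factors, and the symbol $\mathcal{F}$ for the spectrum of the sampled signal are assumptions.

```latex
% Sampling modelled as multiplication with three impulse trains (combs)
\begin{aligned}
F(x, y, n) &= F_c(x_c, y_c, t)\,
              \mathrm{III}_{\Delta x}(x_c)\,
              \mathrm{III}_{\Delta y}(y_c)\,
              \mathrm{III}_{T}(t), \\
\mathrm{III}_{\Delta x}(x_c) &= \sum_{k=-\infty}^{\infty} \delta(x_c - k\,\Delta x)
\end{aligned}

% 2-D case: multiplication in space becomes 2-D convolution (**) in frequency,
% which replicates the continuous spectrum on a grid with spacing 1/\Delta x, 1/\Delta y
\begin{aligned}
\mathcal{F}(f_x, f_y)
  &= \mathcal{F}_c(f_x, f_y) \;{*}{*}\;
     \frac{1}{\Delta x\,\Delta y}
     \sum_{k}\sum_{l}
     \delta\!\left(f_x - \frac{k}{\Delta x},\; f_y - \frac{l}{\Delta y}\right) \\
  &= \frac{1}{\Delta x\,\Delta y}
     \sum_{k}\sum_{l}
     \mathcal{F}_c\!\left(f_x - \frac{k}{\Delta x},\; f_y - \frac{l}{\Delta y}\right)
\end{aligned}

% 3-D case: the temporal comb adds replicas at multiples of 1/T
\mathcal{F}(f_x, f_y, f_t)
  = \frac{1}{\Delta x\,\Delta y\,T}
    \sum_{k}\sum_{l}\sum_{m}
    \mathcal{F}_c\!\left(f_x - \frac{k}{\Delta x},\;
                         f_y - \frac{l}{\Delta y},\;
                         f_t - \frac{m}{T}\right)
```

The second line of the 2-D expression makes the replication explicit: each term of the double sum is a shifted copy of the continuous spectrum, and aliasing occurs wherever neighbouring copies overlap.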
Figure 2.3: Replication of the continuous spectrum after 2-D sampling; a) Fourier spectrum $\mathcal{F}_c$, b) the 2-D sampling grid, c) spectral support of the sampled image.
a consequence of the temporal changes by motion. This is illustrated in Figure 2.4.