Advanced Video Codecs: What’s on the Horizon?
Anthony Vetro
Mitsubishi Electric Research Labs, Cambridge, MA
avetro@merl.com
Historical Perspective
• Today’s DTV broadcast based on MPEG-2
– Huge success, wide deployment, made DTV possible
– Technology basis for this standard is 20 years old
– Contending with legacy issues
Digital Broadcasting
DVD Video
Current State-of-the-Art
MPEG-4 / H.264 AVC
• Half the bit rate of MPEG-2
with same quality
• Supported for mobile, but
not main program
• Mature standard, large
scale deployment for HD
• Technology basis for this
standard is 10 years old
• Candidate for a
next-generation broadcasting
system (but depends on
requirements…)
[Figure: percentage bit rate relative to MPEG-2 (at 32 dB) for MPEG-2, MPEG-4 ASP, and H.264/AVC, on the Mobile (CIF) and Bus (CIF) sequences; axis from 0 to 100]
Video Codec Design
• High video coding efficiency satisfies a fundamental
need in digital video systems
– Reduce bandwidth / better quality
– More programming and services
• Other features considered in video codec design
– Network friendliness (e.g., NAL unit concept in AVC)
– Error resilience (concepts such as data partitioning, slices,
resynchronization markers have been around for many years)
• Extensions – traditionally come later to expand scope of
standard and enable additional applications or services
– Scalability (temporal, spatial, SNR)
– Professional (4:4:4, 10/12-bit)
– Multiview (support for stereo/3D)
• Cursory overview of video coding
architecture and existing tools
• Recent developments – update on a new
video coding standardization project
• Enabling new services
• What’s on the horizon
(in terms of video coding)
Video Coding Basics
Exploiting Redundancy
[Figure: picture sequence I B B P B B P; labels: decorrelate data, energy packing]
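The "energy packing" label refers to what a transform does to a decorrelated block. As a minimal sketch (the sample row is hypothetical, and the orthonormal DCT-II used here is a textbook form rather than the integer transform of any particular standard), the following shows almost all of the signal energy landing in the first two coefficients:

```python
import math

def dct_ii(samples):
    """Orthonormal 1-D DCT-II of a list of numbers."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, x in enumerate(samples))
        coeffs.append(scale * total)
    return coeffs

# Hypothetical row of 8 luma samples forming a smooth ramp.
row = [100, 102, 104, 106, 108, 110, 112, 114]
coeffs = dct_ii(row)

# Because the transform is orthonormal, energy is preserved (Parseval).
total_energy = sum(c * c for c in coeffs)
energy_share = (coeffs[0] ** 2 + coeffs[1] ** 2) / total_energy

print([round(c, 2) for c in coeffs])
print(f"share of energy in the first two coefficients: {energy_share:.6f}")
```

Smooth content is the favorable case; a noisy or highly textured row would spread energy over more coefficients.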
Typical Video Coding Architecture
[Block diagram: block partitioning, Transform/Quantization, Entropy Coder, Inv Quant/Transform, In-Loop Filter, Intra-frame Prediction, Motion Compensation, Motion Estimation, connected through summing nodes (+, -)]
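Motion estimation is one block of this architecture. A minimal sketch of how it might work, assuming an exhaustive block-matching search with a sum-of-absolute-differences (SAD) cost; the frames, block size, and search range are hypothetical, and practical encoders use faster search strategies and rate-aware costs:

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a current block and a displaced reference block."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total

def full_search(cur, ref, bx, by, size, search_range):
    """Return (dx, dy, cost) with the lowest SAD among displacements that stay inside the frame."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            if not (0 <= by + dy and by + dy + size <= height):
                continue
            if not (0 <= bx + dx and bx + dx + size <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best

# Hypothetical 12x12 textured reference frame, and a current frame that is
# the reference moved right by 2 pixels and down by 1 pixel.
ref = [[(7 * x + 13 * y * y + x * y) % 251 for x in range(12)] for y in range(12)]
cur = [[ref[max(y - 1, 0)][max(x - 2, 0)] for x in range(12)] for y in range(12)]

# The vector points from the current block to its match in the reference.
mv = full_search(cur, ref, bx=4, by=4, size=4, search_range=3)
print(mv)
```

With a perfect match the residual passed on to the transform is zero, which is the ideal case for the rest of the pipeline.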
MPEG-2 Coding Tools
[Block diagram: Transform/Quantization, Intra-frame Prediction, Motion Compensation, Motion Estimation, Entropy Coder, Inv Quant/Transform]
Tools annotated on the diagram:
• 8x8 DCT, quant matrix, adaptive field-frame
• DC coef prediction
• Adaptive field-frame prediction (16x16, 16x8)
• VLC, Huffman tables
• Half-pel resolution
• 16x16 macroblocks
• Two reference pictures (max)
Note: no in-loop filter; a post-filter may be applied
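Half-pel resolution means a motion vector can point midway between reference samples. A minimal sketch, assuming the prediction at half-sample positions is a rounded bilinear average of the neighboring integer samples; the function name, the doubled-coordinate convention, and the sample values are illustrative:

```python
def half_pel_sample(ref, x2, y2):
    """Sample `ref` at position (x2 / 2, y2 / 2), given in half-pel units."""
    x, y = x2 // 2, y2 // 2
    half_x, half_y = x2 % 2, y2 % 2
    a = ref[y][x]
    if not half_x and not half_y:
        return a                                  # integer position
    if half_x and not half_y:
        return (a + ref[y][x + 1] + 1) // 2       # horizontal half position
    if half_y and not half_x:
        return (a + ref[y + 1][x] + 1) // 2       # vertical half position
    # Diagonal half position: average of the four surrounding samples.
    return (a + ref[y][x + 1] + ref[y + 1][x] + ref[y + 1][x + 1] + 2) // 4

# Hypothetical 3x3 patch of reference samples.
ref_patch = [[10, 20, 30],
             [40, 50, 60],
             [70, 80, 90]]

print(half_pel_sample(ref_patch, 1, 0))
print(half_pel_sample(ref_patch, 1, 1))
```

The caller is responsible for keeping the position inside the patch; boundary padding is left out of this sketch.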
MPEG-4/H.264 AVC Coding Tools
[Block diagram: Transform/Quantization, Intra-frame Prediction, Motion Compensation, Motion Estimation, Entropy Coder, Inv Quant/Transform, In-Loop Filter]
Tools annotated on the diagram:
• Context-adaptive VLC or arithmetic coding
• Spatial intra prediction
Coding Efficiency Improvements
Recent Developments
New Standardization Project
The new JCT-VC Partnership
• Initial groundwork in VCEG and MPEG
• New team formed in January 2010
• Joint Call for Proposals issued
• Joint Collaborative Team on Video Coding (JCT-VC)
• Chairs:
– Gary Sullivan (Microsoft)
– Jens-Rainer Ohm (RWTH Aachen Univ.)
• First meeting: Dresden, Germany, April 2010
• Project name: High Efficiency Video Coding (HEVC)
• Document archives are publicly accessible
http://wftp3.itu.int/av-arch/jctvc-site/ or http://phenix.int-evry.fr/jct/
Call for Proposals Testing
• 27 complete proposals submitted
• Each proposal was a major package
– Lots of encoded video, extensive documentation, extensive performance metric submissions, sometimes software, etc.
• Quality of proposals was compared to AVC anchors
• Extensive subjective testing
– 3 test labs, 4200 video clips evaluated, 850 subjects,
300,000 scores collected
– In a number of cases, comparable quality at half bit rate
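Scores from a subjective test of this kind are typically condensed into a mean opinion score (MOS) per clip. A minimal sketch, assuming a simple arithmetic mean with a normal-approximation 95% confidence interval; the votes and the rating scale below are hypothetical and are not data from the Call for Proposals:

```python
import math
import statistics

def mean_opinion_score(scores):
    """Return (MOS, half-width of an approximate 95% confidence interval)."""
    mos = statistics.mean(scores)
    if len(scores) < 2:
        return mos, 0.0
    # 1.96 is the normal-distribution quantile for a two-sided 95% interval.
    half_width = 1.96 * statistics.stdev(scores) / math.sqrt(len(scores))
    return mos, half_width

# Hypothetical votes from eight viewers for one clip at one bit rate.
votes = [7, 8, 6, 7, 9, 8, 7, 6]
mos, ci = mean_opinion_score(votes)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

Two conditions are usually called comparable in quality when their intervals overlap, which is one way to read a "comparable quality at half bit rate" result.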
Test Classes and Bit Rates
• 3-5 video clips subjectively tested in Classes B-E
• Testing for both “random access” (1 sec) and
“low delay” (no picture reordering) conditions
• Complexity also considered in anchor encodings
Class            Bit Rate 1   Bit Rate 2   Bit Rate 3   Bit Rate 4   Bit Rate 5
A: 2560x1600p30  2.5 Mbit/s   3.5 Mbit/s   5 Mbit/s     8 Mbit/s     14 Mbit/s
B1: 1080p24      1 Mbit/s     1.6 Mbit/s   2.5 Mbit/s   4 Mbit/s     6 Mbit/s
B2: 1080p50-60   2 Mbit/s     3 Mbit/s     4.5 Mbit/s   7 Mbit/s     10 Mbit/s
C: WVGAp30-60    384 kbit/s   512 kbit/s   768 kbit/s   1.2 Mbit/s   2 Mbit/s
D: WQVGAp30-60   256 kbit/s   384 kbit/s   512 kbit/s   850 kbit/s   1.5 Mbit/s
E: 720p60        256 kbit/s   384 kbit/s   512 kbit/s   850 kbit/s   1.5 Mbit/s
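To see how aggressive these operating points are, the rates can be normalized to bits per pixel. A minimal sketch using the lowest rate of three classes from the table; it assumes 1080p and 720p mean 1920x1080 and 1280x720, and that the source is 8-bit 4:2:0 (12 bits per pixel uncompressed), neither of which is stated on the slide:

```python
def bits_per_pixel(width, height, fps, bit_rate):
    """Average coded bits spent per luma pixel."""
    return bit_rate / (width * height * fps)

# (width, height, frames per second, lowest bit rate in bit/s) from the table.
# Assumption: 1080p and 720p denote 1920x1080 and 1280x720.
lowest_rate_points = {
    "A": (2560, 1600, 30, 2.5e6),
    "B1": (1920, 1080, 24, 1.0e6),
    "E": (1280, 720, 60, 256e3),
}

RAW_BITS_PER_PIXEL = 12  # assumption: 8-bit 4:2:0 source video

bpp = {}
for name, (w, h, fps, rate) in lowest_rate_points.items():
    bpp[name] = bits_per_pixel(w, h, fps, rate)
    ratio = RAW_BITS_PER_PIXEL / bpp[name]
    print(f"Class {name}: {bpp[name]:.4f} bit/pixel, about {ratio:.0f}:1 compression")
```

Classes C and D are left out because the slide gives only the names WVGA and WQVGA, not pixel dimensions.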
Example Subjective Evaluation
[Figure panels: Best Performing Proposal at 1 Mbps; Anchor at 1 Mbps; Anchor at 1.6 Mbps; Anchor at 2.5 Mbps]
Example Subjective Evaluation
[Figure panels: Best Performing Proposal at 2 Mbps; Anchor at 1 Mbps; Anchor at 3 Mbps; Anchor at 4.5 Mbps]
Overall Average Mean Opinion Score
Sample Objective Gains
[Figure: bit rate savings (%) relative to AVC anchors, per test sequence (legible labels include BQTerrace, BasketballDrill, BQMall, PartyScene, RaceHorses, BasketballPass, BQSquare, BlowingBubbles), grouped as Class A, Class B, Class C, Class D, Class E, and Average]
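A bit rate saving relative to an anchor is measured at equal quality. The sketch below is a simplified stand-in for that computation, not the averaging method used in the standardization work: it interpolates each codec's rate at one target PSNR, linearly in log-rate, and compares the two. The rate-distortion points are hypothetical.

```python
import math

def rate_at_quality(points, target_psnr):
    """Interpolate the bit rate needed to reach target_psnr.

    `points` is a list of (bit_rate, psnr) pairs sorted by increasing rate,
    with strictly increasing PSNR; interpolation is linear in log(rate).
    """
    for (r0, q0), (r1, q1) in zip(points, points[1:]):
        if q0 <= target_psnr <= q1:
            t = (target_psnr - q0) / (q1 - q0)
            return math.exp(math.log(r0) + t * (math.log(r1) - math.log(r0)))
    raise ValueError("target quality outside the measured range")

def bit_rate_saving(anchor, test, target_psnr):
    """Percentage of the anchor's rate saved by the test codec at equal PSNR."""
    anchor_rate = rate_at_quality(anchor, target_psnr)
    test_rate = rate_at_quality(test, target_psnr)
    return 100.0 * (anchor_rate - test_rate) / anchor_rate

# Hypothetical (kbit/s, dB) measurements for an anchor and a proposal.
anchor_points = [(1000, 32.0), (2000, 35.0), (4000, 38.0)]
proposal_points = [(500, 32.0), (1000, 35.0), (2000, 38.0)]

saving = bit_rate_saving(anchor_points, proposal_points, 33.5)
print(f"{saving:.1f}% bit rate saving at 33.5 dB")
```

Averaging over a range of qualities rather than a single target gives a more stable figure, at the cost of a curve fit.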
Basic Technology Architecture
All proposals conceptually similar to AVC (and prior standards)
Lots of variations at the individual “tool” level
Proposal survey output documents from first meeting:
Current Status & Schedule
– Minimum set of tools, coherent design
– Tools confirmed to show good capability
• Schedule
– Approved HM/WD 1 at October 2010 meeting!
– Committee Draft (CD): February 2012
– Final Committee Draft (FCD): July 2012
– Final Draft International Standard (FDIS): January 2013
Key Elements of HM/WD 1 Design
• NAL units and high-level syntax as in AVC
• Coding unit (CU) with selectable size (like macroblocks)
– Prediction unit (PU) for intra/inter prediction (with PU merging)
– Transform unit (TU) for residual transform
– Quadtree structures for CU/PU/TU
• Integer transforms (from 4x4 to 32x32)
• Deblocking filter and Adaptive loop filter (ALF)
• Internal bit depth increase (up to 12 bits)
• Angular intra prediction (34 directions max)
• Advanced MV prediction
• DCT-based interpolation filter for MCP (6-tap or 12-tap)
• Entropy coding
– CABAC (context-adaptive binary arithmetic coding)
– LCEC (low complexity entropy coding)
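The quadtree structure for coding units can be pictured as a recursive split of a large square block. A minimal sketch, assuming a variance threshold as the split criterion; this criterion, the function names, and the frame are illustrative only, since an actual encoder would normally decide splits by rate-distortion cost:

```python
def variance(block):
    """Population variance of all samples in a 2-D block."""
    values = [v for row in block for v in row]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def split_quadtree(frame, x, y, size, min_size, threshold):
    """Split a square block recursively; return the leaves as (x, y, size) tuples."""
    block = [row[x:x + size] for row in frame[y:y + size]]
    if size <= min_size or variance(block) <= threshold:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for oy in (0, half):
        for ox in (0, half):
            leaves += split_quadtree(frame, x + ox, y + oy, half, min_size, threshold)
    return leaves

# Hypothetical 16x16 frame: flat background with a bright 4x4 patch at the top-left.
frame = [[200 if (x < 4 and y < 4) else 50 for x in range(16)] for y in range(16)]

leaves = split_quadtree(frame, 0, 0, 16, min_size=4, threshold=1.0)
for leaf in leaves:
    print(leaf)
```

Flat regions stay as large blocks while the detailed corner is divided down to the minimum size, which is the motivation for selectable block sizes.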
Large MC / Transform Block Size
Adaptive Transform Block Size for Intra
[Ref: JCTVC-B065]
More than a dozen tools currently under investigation
Enabling New Services
Which New Services?
• What will consumers get excited about?
• What will make consumers reconsider the way that they receive content in their homes?
• Candidates
– Extended programming
– Higher resolution – 4K×2K, 8K×4K, Mobile HD
– Full color and bit depth – 4:4:4, 10/12 bit
– Multiview and 3D – stereo to auto-stereo
Higher Resolution Services
100 inch TV on the wall
>100 degree view angle
• Strong interest in 3D delivery
– Production of premium content increasing
– Numerous devices supporting stereoscopic
display available to the consumer
– Many standards being developed/amended
throughout the chain
• Basic delivery options
– Upgrade equipment/infrastructure
– Utilize capabilities of existing infrastructure
• Most activity focused on stereo services
Auto-Stereoscopic Displays
• 3D viewing w/out glasses
– Pixel colors vary based on
viewing direction
• Various prototypes using
different display technology
– Lenticular, parallax barrier
– Integral imaging
• Challenge: High number of
views needed to drive display
View dependent pixel
Target of 3D Video (3DV) Format
[Diagram labels: data format, constrained rate, left and right views]
Auto-stereoscopic N-view displays
Stereoscopic displays
• Variable stereo baseline
• Adjust depth perception
Multiview Video plus Depth (MVD)
• MVD is the reference format for 3DV: stereo
texture and stereo depth (encoded with MVC)
• Call for Proposals on 3D Video Coding
Technology to be issued in January 2011
[Figure: left and right views]
3DV Framework
[Block diagram: limited video inputs (e.g., 2 or 3 views) → depth estimation → video/depth codec → view synthesis → larger number of output views; the codec stage is labeled "binary representation & reconstruction process" (1010001010001)]
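The view synthesis stage generates additional views from decoded texture and depth. A minimal sketch of one common approach, depth-image-based rendering on a single scanline, assuming rectified cameras so that a pixel shifts horizontally by focal_px * baseline / depth; the parameter names and values are hypothetical, and hole filling is omitted:

```python
def synthesize_scanline(texture, depth, focal_px, baseline):
    """Warp one scanline to a virtual camera displaced by `baseline` along x.

    Nearer pixels overwrite farther ones; positions that receive no pixel
    stay None (disocclusion holes).
    """
    width = len(texture)
    out = [None] * width
    out_depth = [float("inf")] * width
    for x in range(width):
        disparity = round(focal_px * baseline / depth[x])
        target = x - disparity
        if 0 <= target < width and depth[x] < out_depth[target]:
            out[target] = texture[x]
            out_depth[target] = depth[x]
    return out

# Hypothetical scanline: the two middle pixels are closer to the camera.
texture = [10, 20, 30, 40, 50, 60]
depth = [100, 100, 50, 50, 100, 100]

virtual_view = synthesize_scanline(texture, depth, focal_px=100, baseline=1)
print(virtual_view)
```

The None entries mark regions hidden in the input view; a practical renderer would fill them from another view or by inpainting.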
3DV Implications on Transmission
• Require compatibility with existing standards for
mono and stereo video services
– Expect compatibility with future formats as well
– Build on existing service or deploy as new service?
• Additional channels may be proposed, leading to
higher bandwidth requirements
The Road Ahead
5-year Assessment
• In 5 years: realistic to assume 75-80% lower
rate than MPEG-2 and 40-50% lower than AVC
• Does AVC address the needs, or should we plan for HEVC?
– Depends on targets for next-generation services
and corresponding timelines
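Relative savings compose multiplicatively, which is how the two figures in the assessment relate. As a worked sketch, taking the earlier statement that AVC needs about half the MPEG-2 bit rate as the starting point:

```python
def combined_saving(saving_a, saving_b):
    """Overall fractional saving when saving_b is applied on top of saving_a."""
    return 1.0 - (1.0 - saving_a) * (1.0 - saving_b)

avc_vs_mpeg2 = 0.50  # "half the bit rate of MPEG-2 with same quality"

totals = {}
for new_vs_avc in (0.40, 0.50):
    totals[new_vs_avc] = combined_saving(avc_vs_mpeg2, new_vs_avc)
    print(f"{new_vs_avc:.0%} below AVC -> {totals[new_vs_avc]:.0%} below MPEG-2")
```

With AVC at exactly half the MPEG-2 rate, a further 40-50% saving gives 70-75% overall. The 75-80% range quoted above corresponds to AVC at roughly 40% of the MPEG-2 rate, so the two ranges are consistent if AVC is taken to do somewhat better than half.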
Future Outlook
Is it possible to push the video rates even lower?
Yes!
• Perhaps following same model that has been working
– Better prediction and motion modeling
– Better entropy coding and reduction of side info
– Better transform and decomposition of source signal
• More promising: perceptual video coding
– Framework would still make use of spatial/temporal prediction
– Perhaps utilize geometric modeling, e.g., of textures, regions
– Leverage computer vision, analysis/synthesis techniques
– New metrics that allow substantial point-by-point variations at
the pixel level w/out compromising structural similarity
Structurally Lossless Images
• More than 20% of pixels are different
• PSNR = 22.2 dB
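Pixel-level metrics penalize changes that leave structure intact. A minimal sketch with a hypothetical striped texture shifted by one pixel: the pattern is unchanged, yet half the pixels differ and the PSNR collapses. The images are synthetic and unrelated to the ones shown on the slide.

```python
import math

def psnr(img_a, img_b, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images."""
    count = len(img_a) * len(img_a[0])
    mse = sum((p - q) ** 2
              for row_a, row_b in zip(img_a, img_b)
              for p, q in zip(row_a, row_b)) / count
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)

def fraction_different(img_a, img_b):
    """Fraction of pixel positions whose values differ."""
    count = len(img_a) * len(img_a[0])
    changed = sum(p != q
                  for row_a, row_b in zip(img_a, img_b)
                  for p, q in zip(row_a, row_b))
    return changed / count

# Hypothetical 8x4 texture of vertical stripes, and a copy shifted right by one pixel.
width, height = 8, 4
original = [[255 if x % 4 < 2 else 0 for x in range(width)] for _ in range(height)]
shifted = [[row[(x - 1) % width] for x in range(width)] for row in original]

quality_db = psnr(original, shifted)
changed = fraction_different(original, shifted)
print(f"{changed:.0%} of pixels differ, PSNR = {quality_db:.2f} dB")
```

A structure-aware metric would rate the two textures as nearly identical, which is the gap perceptual coding aims to exploit.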
Concluding Remarks
• Video compression technology has made
significant advances in the last 20 years
– AVC is available, new HEVC standard is emerging
– Expect that further advances will come
• Next step: determine target services, schedule
and requirements for next-generation broadcast
(with video compression capabilities in mind)
• Acknowledgements:
– G. J. Sullivan (Microsoft)
– T. Murakami, K. Asai, S. Sekiguchi (Mitsubishi)