CSE 489-02 & CSE 589-02 Multimedia Processing Lecture 11 Video Coding

Spring 2009, New Mexico Tech

History H.264/AVC

MC-DCT Coding Framework • Motion estimation/compensation based on previously decoded frames • Block-translation motion model • Inter-coding: DCT-based coding of the prediction error (residue) • Intra-coding: if motion estimation fails or synchronization is desired, the macroblock is encoded in intra mode • Most international video coding standards are based on this coding framework • Video teleconferencing: H.261, H.263, H.263++, H.264 • Video archive & playback: MPEG-1, MPEG-2 (in DVDs), MPEG-4

Hybrid MC-DCT Encoder and Decoder (figure: block diagram with Transform, Quantization, Entropy Coding producing the encoded residual (to channel); Entropy Decoding, Inverse Q, Inverse Transform producing the decoded macroblock (to display); Motion Estimation, Motion-Compensated Predictor and Frame Buffer (delay); motion vector and block mode data sent to the channel as side information)

Inter and Intra Coding • Intra • MB is encoded as-is, without motion compensation • DCT followed by Q, zig-zag scan, run-length coding, Huffman coding • Inter • Block-matching motion estimation • The prediction residual (difference from the best-match block) is DCT encoded, similarly to intra mode • Motion vector is differentially encoded

Intra-Coding Mode (figure: encoder and decoder data paths for an intra-coded MB: input MB to bit-stream in the encoder; bit-stream to display frame in the decoder)

Inter-Coding Mode (figure: encoder data path for an inter-coded MB, using the input MB and a reference frame to produce the bit-stream)

Video Sequence and Picture (figure: frame 0 intra, frames 1 to 5 inter) • Intra Picture (I-Picture) • Encoded without referencing other pictures • All MBs are intra coded • Inter Picture (P-Picture, B-Picture) • Encoded by referencing other pictures • Some MBs are intra coded, and some are inter coded

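Group of Pictures (GOP) (figure: video stream divided into consecutive GOPs with pattern … I B B P B B P … B B I B B P …) • Frame (display) order: 0 1 2 3 4 5 6 • Position of each frame in encoding order: 0 2 3 1 5 6 4, i.e., frames are encoded in the order 0 3 1 2 6 4 5, because a B-frame can only be coded after the I- or P-frame that follows it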
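A minimal Python sketch of the reordering above, assuming a closed GOP written in display order as a string such as "IBBPBBP" and that every B-frame references the nearest I/P frame on each side. The names `coding_order` and `coding_position` are hypothetical helpers, not part of any standard.

```python
def coding_order(gop):
    """Display indices of a GOP pattern such as "IBBPBBP", in transmission order.

    Simplifying assumption: every B-frame uses the nearest I/P frame on each side,
    so it can only be coded after the following anchor (I or P) frame.
    """
    order, pending_b = [], []
    for idx, ftype in enumerate(gop):
        if ftype == "B":
            pending_b.append(idx)    # must wait for the next anchor frame
        else:
            order.append(idx)        # code the anchor first ...
            order.extend(pending_b)  # ... then the B-frames preceding it in display order
            pending_b = []
    # Trailing B-frames would really wait for the next GOP's I-frame.
    return order + pending_b


def coding_position(gop):
    """For each frame in display order, its position in the coding order."""
    order = coding_order(gop)
    return [order.index(idx) for idx in range(len(gop))]


print(coding_order("IBBPBBP"))     # [0, 3, 1, 2, 6, 4, 5]
print(coding_position("IBBPBBP"))  # [0, 2, 3, 1, 5, 6, 4]
```

The second list reproduces the "encoding order" row of the slide; the first is the same information expressed as the sequence of frames actually sent.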

Coding of I-Slice (figure: original block → DCT → transformed block → quantization with a quantization matrix → zig-zag scan, giving 15 0 -2 -1 -1 -1 0 … → entropy coding → bit-stream)
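To make the zig-zag and run-length steps concrete, here is a small sketch. It assumes the quantized 8x8 block is a list of lists, uses the JPEG-style zig-zag path, and emits (run, level) pairs; the actual VLC tables (and the separate coding of the intra DC value in H.261) are not modeled. `zigzag_indices` and `run_level_pairs` are illustrative names.

```python
def zigzag_indices(n=8):
    """(row, col) positions of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):  # walk the anti-diagonals r + c = s
        cells = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            cells.reverse()  # even diagonals run from bottom-left to top-right
        order.extend(cells)
    return order


def run_level_pairs(coeffs):
    """(number of preceding zeros, non-zero value) for each non-zero coefficient."""
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs  # trailing zeros are signalled by an end-of-block (EOB) code


block = [[0] * 8 for _ in range(8)]
block[0][0], block[1][0], block[2][0], block[1][1], block[0][2] = 15, -2, -1, -1, -1
scan = [block[r][c] for r, c in zigzag_indices()]
print(scan[:7])               # [15, 0, -2, -1, -1, -1, 0]
print(run_level_pairs(scan))  # [(0, 15), (1, -2), (0, -1), (0, -1), (0, -1)]
```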

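Coding of P-Slice (figure: motion estimation between the original current frame and the reconstructed reference frame from the frame buffer produces motion vectors; motion compensation forms a prediction that is subtracted from the current frame to give the residual; the residual and the motion vectors are coded)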

Motion Estimation in H.261 • Macroblock • Luminance: 16x16, four 8x8 blocks (Y) • Chrominance: two 8x8 blocks (Cb, Cr) • Motion estimation only performed for the luminance component • Motion vector range • [-15, 15] (figure: the search area in the reference frame extends 15 pixels beyond the MB on each side)
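Full-search block matching for the H.261 setting described above can be sketched as follows. The sum of absolute differences (SAD) is an assumed matching criterion (the standard fixes the bit-stream, not how the encoder searches), frames are lists of lists of luminance values, and candidates that leave the reference frame are simply skipped.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in `cur`
    and the block displaced by (dx, dy) in `ref`."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur, ref, bx, by, n=16, search=15):
    """Return (cost, dx, dy) of the best integer displacement in [-search, search]."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # skip candidate blocks that would fall outside the reference frame
            if by + dy < 0 or by + dy + n > h or bx + dx < 0 or bx + dx + n > w:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

Exhaustive search costs (2·15 + 1)² = 961 SAD evaluations per macroblock, which is why practical encoders often use faster, suboptimal search patterns.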

Coding of Motion Vectors • MV has range [-15, 15] • Integer-pixel ME search only • Motion vectors are differentially and separably (x and y components independently) encoded • VLC with codewords of up to 11 bits for MVD • Example: MV = 2 2 3 5 3 1 -1 … MVD = 0 1 2 -2 -2 -2 … Binary: 1 010 0010 0011 0011 0011 …
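A sketch of the differential coding in the example above. Only the four codewords that appear on this slide are in the lookup table, so this is not the complete H.261 MVD table; the predictor is simply the previous component value, a simplification of the standard's prediction rules.

```python
# Codewords taken from the slide's example only (hypothetical partial table).
MVD_VLC = {0: "1", 1: "010", 2: "0010", -2: "0011"}


def differential(mvs):
    """Difference between each motion-vector component and the previous one."""
    return [cur - prev for prev, cur in zip(mvs, mvs[1:])]


def encode_mvd(mvs):
    """VLC codeword for each difference (one component, e.g. x, at a time)."""
    return [MVD_VLC[d] for d in differential(mvs)]


mv_x = [2, 2, 3, 5, 3, 1, -1]
print(differential(mv_x))  # [0, 1, 2, -2, -2, -2]
print(encode_mvd(mv_x))    # ['1', '010', '0010', '0011', '0011', '0011']
```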

Inter/Intra Switching • Based on energy of prediction error • High energy: scene change, occlusions, uncovered areas … → use intra mode • Low energy: stationary background, translational motion … → use inter mode (figure: INTER and INTRA decision regions in the plane of prediction-error MSE vs. block variance VAR, with thresholds at 64)
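One plausible reading of the decision figure, written as a hypothetical encoder heuristic: keep inter mode while the prediction-error MSE is below 64, and otherwise switch to intra when the block's own variance is smaller than the prediction error. The threshold and the exact region boundaries are assumptions; the standard leaves this decision to the encoder.

```python
def inter_intra_decision(block, prediction, threshold=64.0):
    """Hypothetical heuristic: return "INTER" or "INTRA" for one block.

    block, prediction: equally sized 2-D lists of pixel values.
    """
    cur = [p for row in block for p in row]
    pred = [p for row in prediction for p in row]
    n = len(cur)
    mse = sum((c - p) ** 2 for c, p in zip(cur, pred)) / n  # prediction-error energy
    mean = sum(cur) / n
    var = sum((c - mean) ** 2 for c in cur) / n             # energy of the block itself
    if mse < threshold:
        return "INTER"  # prediction is already good enough
    # otherwise code whichever signal has less energy
    return "INTRA" if var < mse else "INTER"
```

A flat block after a scene change (large MSE, near-zero variance) goes to intra, while a textured block with a slightly off prediction stays inter.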

Loop Filter • Optional • Can be switched on or off for each macroblock; usually used together with MC • Advantage • Decreases prediction error by smoothing the prediction frame • Reduces high-frequency artifacts such as mosquito noise • Disadvantage • Increases complexity & overhead
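The loop filter can be sketched on one 8x8 block of the prediction. This assumes the H.261 form as we understand it: a separable filter with taps 1/4, 1/2, 1/4, replaced by 0, 1, 0 at block edges where a tap would fall outside the block, with rounding only at the 2-D output. `loop_filter` is a hypothetical name.

```python
def _taps(i, n):
    # 1-D taps scaled by 4: (1, 2, 1) inside the block, (0, 4, 0) on its edges
    return (1, 2, 1) if 0 < i < n - 1 else (0, 4, 0)


def loop_filter(block):
    """Separable 2-D smoothing of one square block (integer arithmetic)."""
    n = len(block)
    out = [[0] * n for _ in range(n)]
    for y in range(n):
        ty = _taps(y, n)
        for x in range(n):
            tx = _taps(x, n)
            acc = 0
            for dy in (-1, 0, 1):
                if ty[dy + 1] == 0:
                    continue  # zero tap: never reads outside the block
                for dx in (-1, 0, 1):
                    if tx[dx + 1] == 0:
                        continue
                    acc += ty[dy + 1] * tx[dx + 1] * block[y + dy][x + dx]
            out[y][x] = (acc + 8) // 16  # weights sum to 16; round half up
    return out
```

Because the weights always sum to 16, flat areas pass through unchanged while isolated high-frequency peaks are spread over their neighbors.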

Quantization • Uniform mid-rise quantizer for intra DC coefficients • Uniform mid-tread quantizer with double dead zone for inter DC and all AC coefficients (figure: quantizer staircases mapping input X to output level, with decision levels at ±Q, ±2Q and output levels ±1, ±2; left: intra DC; right: inter DC and all AC)
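To contrast the two quantizer characteristics, here is a generic sketch with step size `step`. The reconstruction points and the exact zero-bin width are illustrative assumptions, not the exact H.261 values.

```python
import math


def midrise_quantize(x, step):
    """Mid-rise: there is no zero output; outputs sit at (k + 1/2) * step."""
    k = math.floor(x / step)
    return (k + 0.5) * step


def midtread_deadzone_quantize(x, step):
    """Mid-tread with a dead zone: |x| < step -> level 0.

    The zero bin (-step, step) is twice as wide as every other bin.
    """
    level = int(abs(x) // step)
    return level if x >= 0 else -level


def dequantize_deadzone(level, step):
    """Reconstruct at the middle of the non-zero bin [|level|*step, (|level|+1)*step)."""
    if level == 0:
        return 0.0
    sign = 1 if level > 0 else -1
    return sign * (abs(level) + 0.5) * step
```

The wide zero bin sends small, mostly-noise AC coefficients to 0, which is what makes the following run-length coding effective.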

H.263 • Standardization effort started Nov 1993 • Aim • Low bit-rate video communications, less than 64 kbps • Target PSTN and mobile networks: 10-32 kbps • Near-term • H.263 and H.263+: established late 1997 • Long-term • H.26L, which later became H.264 • Main properties • H.261 with many MPEG features, optimized for low bit rates • Performance: 3-4 dB improvement over H.261 at less than 64 kbps; 30% bit-rate saving over MPEG-1

MPEG • Coding and communication of moving pictures and associated audio for digital storage and archival • MPEG: Moving Picture Experts Group • MPEG family • MPEG-1, Nov 1992 • MPEG-2, Nov 1994 • MPEG-4, Oct 1998 • MPEG-7, ongoing work • Main features of the MPEG video family • Bidirectional motion estimation and compensation (ME/MC) • I-frame, P-frame, B-frame • Structure: Group of Pictures (GOP), picture, slice, macroblock • Coding decisions

MPEG Goals and Applications • MPEG-1 • Optimized for applications that support a continuous transfer bit rate of about 1.5 Mbps (e.g., CD-ROM) • Target 1.2 Mbps for video and 250-300 kbps for audio, around analog VHS quality • Does not support interlaced sources • Main target source: SIF YCrCb 4:2:0, 352 x 240 at 30 fps • VCD • MPEG-2 • The most commercially successful international coding standard • Wide range of bit rates: 4-80 Mbps; optimized for 4 Mbps • Target: high-resolution, high-quality video broadcast & playback • DVD, digital TV: DirecTV, HDTV …

Requirements • Coding of generic video at around 1.5 Mbps at reasonable (VHS) quality • Random access capability, with frequent access points • Fast-forward and fast-rewind capability • Audio-video synchronization during play and access • Simple decoder • Flexibility of data format • Some robustness to communication errors • Real-time encoder possibility

From H.261 to MPEG-1 • MPEG-1 introduced a few new features compared to the pioneering H.261 codec • Flexible data sizes and frame rates • More flexible slice structure, replacing the fixed GOB structure • Data structure: Group of Pictures (GOP), allowing frequent access points • Bidirectional motion compensation, B-frames • Half-pixel motion compensation • More finely tuned VLCs for different purposes • Quantization table (like JPEG) replaces the single Q step size

Bidirectional MC Properties • Advantage • Higher coding efficiency; frame rate can be increased significantly with few bits • More accurate motion estimation & compensation • No error propagation from B-frames, since they are not used as references • Disadvantage • More memory buffer for frame storage (minimum of 3 frames) • More end-to-end delay

H.264/AVC History • In the early 1990s, the first video compression standards were introduced: • H.261 (1990) and H.263 (1995) from ITU • MPEG-1 (1993) and MPEG-2 (1996) from ISO • Since then, the technology has advanced rapidly • H.263 was followed by H.263+, H.263++, H.26L • MPEG-1/2 were followed by MPEG-4 Visual • But industry and research coders were still well ahead of the standards • H.264/AVC is a joint project of ITU and ISO to create an up-to-date standard

Scope and Context • Aimed at providing high-quality compression for various services: • IP streaming media (50-1500 kbps) • SDTV and HDTV broadcast and video-on-demand (1-8+ Mbps) • DVD • Conversational services (<1 Mbps, low latency) • Standard defines: • Decoder functionality (but not the encoder) • File and stream structure • Final result: 2-fold improvement in compression compared to H.263 and MPEG-2 (same fidelity, half the size)

Video Compression • Motion compensation / prediction • Describe the current frame based on the previous frame • Output the description + residual image • Predicted frames are called “inter-frames” • Some frames (intra-frames) are encoded without prediction, as natural images • Image transform • Concentrates image energy in relatively few numeric coefficients • Lossy coding • Compresses coefficient values in a lossy manner • Tries to keep the most important information

The H.263 Standard Coder (figure: original video → motion compensation → image transform → lossy coding → compressed video)

The H.263 Standard Coder • H.263 Motion Compensation • Image is divided into 16x16 macroblocks • Each macroblock is matched against nearby blocks in the previous frame (called the reference frame) • “Nearby” = within a 15-pixel horizontal/vertical range • Half-pixel accuracy (with bilinear pixel interpolation) • The best match is used to predict the macroblock • The relative displacement, or motion vector, is encoded and transmitted to the decoder • The prediction errors for all blocks constitute the residual
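Half-pixel prediction with bilinear interpolation can be sketched as below. Positions are given in half-pixel units, and the `+1`/`+2` rounding follows the convention commonly cited for H.263; frame-border handling (and the unrestricted-MV option) is left to the caller. `half_pel_sample` and `predict_block` are hypothetical names.

```python
def half_pel_sample(frame, x2, y2):
    """Sample `frame` at (x2 / 2, y2 / 2); x2, y2 are non-negative half-pixel coordinates."""
    x, y = x2 // 2, y2 // 2
    fx, fy = x2 % 2, y2 % 2
    a = frame[y][x]
    if not fx and not fy:
        return a  # integer position: copy the pixel
    if fx and not fy:
        return (a + frame[y][x + 1] + 1) // 2  # halfway between two horizontal neighbors
    if fy and not fx:
        return (a + frame[y + 1][x] + 1) // 2  # halfway between two vertical neighbors
    return (a + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1] + 2) // 4  # center of 4


def predict_block(ref, bx, by, mvx2, mvy2, n):
    """n x n prediction for the block at (bx, by) with an MV given in half pixels."""
    return [
        [half_pel_sample(ref, 2 * (bx + i) + mvx2, 2 * (by + j) + mvy2) for i in range(n)]
        for j in range(n)
    ]


ref = [[10, 20], [30, 40]]
print(half_pel_sample(ref, 1, 0))  # 15
print(half_pel_sample(ref, 1, 1))  # 25
```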

Motion Compensation Example T=1 (reference) T=2 (current)

The H.263 Standard Coder • H.263 Image Transform • Residual is divided into 8x8 blocks • 8x8 2-D Discrete Cosine Transform (DCT) is applied to each block independently • DCT coefficients describe spatial frequencies in the block: • High frequencies correspond to small features and texture • Low frequencies correspond to larger features • The lowest-frequency coefficient, called DC, corresponds to the average intensity of the block
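For reference, a direct (slow) orthonormal 2-D DCT-II in plain Python. Real codecs use fast factorizations; this version only shows that a flat block puts all its energy into the DC coefficient (equal to 8 times the block mean with this normalization) and that a block varying only horizontally has non-zero coefficients only in the first row.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block; out[u][v], u = vertical frequency."""
    n = len(block)

    def c(k):  # normalization factors
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for y in range(n):
                for x in range(n):
                    s += (block[y][x]
                          * math.cos((2 * x + 1) * v * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * u * math.pi / (2 * n)))
            out[u][v] = c(u) * c(v) * s
    return out


flat = [[100] * 8 for _ in range(8)]
print(round(dct_2d(flat)[0][0], 6))  # 800.0
```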

8x8 DCT Example

The H.263 Standard Coder • H.263 Lossy Coding • Transform coefficients are quantized: • Some less-significant bits are dropped • Only the remaining bits are encoded • For inter-frames, all coefficients get the same number of bits, except for the DC, which gets more • For intra-frames, lower-frequency coefficients get more bits • To preserve larger features better • The actual number of bits used depends on a quantization parameter (QP), whose value depends on the bit-allocation policy • Finally, bits are encoded using an entropy (lossless) code • Traditionally a Huffman-style code
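The role of the quantization parameter can be illustrated on the decoder side. This sketch follows the H.263 inverse quantization rule as we understand it (reconstruction spacing of 2·QUANT with a dead zone, odd reconstruction values, intra DC with a fixed step of 8, clipping to [-2048, 2047]); check it against the Recommendation before relying on it. `h263_dequantize` is a hypothetical name.

```python
def h263_dequantize(level, quant, intra_dc=False):
    """Reconstruct one coefficient from its quantized LEVEL and QUANT (1..31)."""
    if intra_dc:
        return 8 * level  # intra DC: fixed step size of 8
    if level == 0:
        return 0
    mag = quant * (2 * abs(level) + 1)
    if quant % 2 == 0:
        mag -= 1  # with even QUANT, subtract 1 so reconstruction values stay odd
    rec = mag if level > 0 else -mag
    return max(-2048, min(2047, rec))  # clip to the coefficient range


print([h263_dequantize(l, 5) for l in (-2, -1, 0, 1, 2)])  # [-25, -15, 0, 15, 25]
```

A larger QUANT spreads the reconstruction levels further apart, so fewer distinct levels (and fewer bits) are needed, at the cost of a larger quantization error.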

Changes in Motion Compensation • Quarter-pixel accuracy • A gain of 1.5-2 dB across the board over ½-pixel • Variable block size: • Every 16x16 macroblock can be subdivided • Each sub-block gets predicted separately • Multiple and arbitrary reference frames • Compared with only the previous frame (H.263) or the previous and next frames (MPEG) • Anti-aliasing sub-pixel interpolation • Removes some common artifacts in the residual
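The anti-aliasing sub-pixel interpolation can be sketched in one dimension. This assumes the H.264 luma scheme: a 6-tap filter (1, -5, 20, 20, -5, 1)/32 for half-sample positions, then averaging with rounding for quarter-sample positions. Real decoders apply it in 2-D and pad frame borders; here the caller must keep indices `i - 2` to `i + 3` inside the row.

```python
def clip255(v):
    return max(0, min(255, v))


def half_sample(row, i):
    """Half-sample value between row[i] and row[i + 1] using the 6-tap filter."""
    e, f, g, h, k, j = (row[i + d] for d in range(-2, 4))
    return clip255((e - 5 * f + 20 * g + 20 * h - 5 * k + j + 16) >> 5)


def quarter_sample(row, i):
    """Quarter-sample value between row[i] and the half-sample to its right."""
    return (row[i] + half_sample(row, i) + 1) >> 1


ramp = [0, 10, 20, 30, 40, 50]
print(half_sample(ramp, 2), quarter_sample(ramp, 2))  # 25 23
```

The longer filter keeps more of the high-frequency content than bilinear averaging, which is the "anti-aliasing" benefit mentioned above.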

Variable Block-Size MC • Motivation: the size of moving/stationary objects varies • Many small blocks may take too many bits to encode • Few large blocks give poor prediction • In H.264, each 16x16 macroblock may be: • Kept whole • Divided horizontally (vertically) into two sub-blocks of size 16x8 (8x16) • Divided into four 8x8 sub-blocks • In the last case, each 8x8 sub-block may be divided once more into two (8x4 or 4x8) or four (4x4) smaller blocks
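A small sketch of the partitioning rules listed above, returning the (width, height) of every block that gets its own motion vector. The mode strings such as "16x8" are hypothetical labels, not H.264 syntax element values.

```python
MB_PARTITIONS = {
    "16x16": [(16, 16)],
    "16x8": [(16, 8)] * 2,
    "8x16": [(8, 16)] * 2,
    "8x8": [(8, 8)] * 4,
}
SUB_PARTITIONS = {
    "8x8": [(8, 8)],
    "8x4": [(8, 4)] * 2,
    "4x8": [(4, 8)] * 2,
    "4x4": [(4, 4)] * 4,
}


def partition_blocks(mb_mode, sub_modes=None):
    """List (width, height) of each motion-compensated block in one macroblock."""
    if mb_mode != "8x8":
        return list(MB_PARTITIONS[mb_mode])
    if sub_modes is None or len(sub_modes) != 4:
        raise ValueError("8x8 mode needs one sub-mode per 8x8 quadrant")
    blocks = []
    for sub in sub_modes:  # each 8x8 quadrant is split independently
        blocks.extend(SUB_PARTITIONS[sub])
    return blocks


blocks = partition_blocks("8x8", ["8x8", "8x4", "4x8", "4x4"])
print(len(blocks))  # 9 motion vectors for this macroblock
```

The count of blocks is also the number of motion vectors to transmit, which is the bit-cost side of the trade-off on this slide.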

H.264 Variable Block Sizes

Motion Scale Example T=1 T=2

H.264 VBS Example T=1 T=2

Arbitrary Reference Frames • In H.263, the reference frame for prediction is always the previous frame • In MPEG and H.26L, some frames are predicted from both the previous and the next frames (bi-prediction) • In H.264, any previously decoded frame held in the reference buffer may be used as a reference: • Encoder and decoder maintain synchronized buffers of available (previously decoded) frames • The reference frame is specified as an index into this buffer • In bi-predictive mode, each macroblock may be: • Predicted from one of the two references • Predicted from both, using a weighted mean of the predictors
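A simplified sketch of the two ideas above: a reference buffer addressed by an index, and bi-prediction as a rounded average or a weighted mean of two predictors. The sliding-window buffer (index 0 = most recently decoded frame) and the floating-point weights are simplifying assumptions; H.264's actual reference list management and weighted-prediction formulas are more detailed. `ReferenceBuffer` and `bipredict` are hypothetical names.

```python
class ReferenceBuffer:
    """Sliding window of decoded frames, kept identically by encoder and decoder."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = []

    def add(self, frame):
        self.frames.insert(0, frame)      # index 0 = most recently decoded
        del self.frames[self.capacity:]   # drop the oldest frames beyond capacity

    def get(self, ref_idx):
        return self.frames[ref_idx]


def bipredict(p0, p1, w0=None, w1=None):
    """Combine two prediction blocks (2-D lists) sample by sample."""
    out = []
    for r0, r1 in zip(p0, p1):
        if w0 is None:
            out.append([(a + b + 1) >> 1 for a, b in zip(r0, r1)])  # rounded average
        else:
            out.append([round(w0 * a + w1 * b) for a, b in zip(r0, r1)])  # weighted mean
    return out


print(bipredict([[10, 20]], [[11, 30]]))  # [[11, 25]]
```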

Intra Prediction • Motivation: intra-frames are natural images, so they exhibit strong spatial correlation • Implemented to some extent in H.263++ and MPEG-4, but in transform domain • Macroblocks in intra-coded frames are predicted based on previously-coded ones • Above and/or to the left of the current block • The macroblock may be divided into 16 4x4 sub-blocks which are predicted in cascading fashion • An encoded parameter specifies which neighbors should be used to predict, and how

Intra-Prediction Example

Intra-Prediction Example: Vertical

Intra-Prediction Example: Horizontal

Intra-Prediction Example: Main Diagonal
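Three of the 4x4 intra prediction modes (vertical and horizontal, as in the examples above, plus DC) are easy to sketch; the diagonal modes, which filter the neighboring pixels first, are omitted. The sketch assumes all eight neighbors (four above, four to the left) are available and picks the mode with the smallest SAD, which is one possible encoder strategy, not a requirement of the standard.

```python
def predict_4x4(mode, top, left):
    """top: 4 reconstructed pixels above the block; left: 4 pixels to its left."""
    if mode == "vertical":
        return [list(top) for _ in range(4)]        # copy the row above downwards
    if mode == "horizontal":
        return [[left[y]] * 4 for y in range(4)]    # copy the left column rightwards
    if mode == "dc":
        dc = (sum(top) + sum(left) + 4) >> 3        # rounded mean of the 8 neighbors
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unsupported mode: {mode}")


def best_mode(block, top, left):
    """Mode whose prediction has the smallest sum of absolute differences."""
    def cost(mode):
        pred = predict_4x4(mode, top, left)
        return sum(abs(block[y][x] - pred[y][x]) for y in range(4) for x in range(4))
    return min(("vertical", "horizontal", "dc"), key=cost)


block = [[10, 20, 30, 40] for _ in range(4)]
print(best_mode(block, top=[10, 20, 30, 40], left=[0, 0, 0, 0]))  # vertical
```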

H.264 Image Transform • Motivation: • The DCT requires real-number operations, which may cause encoder/decoder mismatch in the inverse transform • H.264 uses a very simple integer 4x4 transform • A (pretty crude) approximation to the 4x4 DCT • Transform matrix contains only ±1 and ±2 • Can be computed with only additions, subtractions, and shifts • Results show negligible loss in quality (~0.02 dB)
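The integer core transform can be written with the matrix C = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]] as Y = C·X·C^T. The sketch below computes it with an add/subtract/shift butterfly; the scaling that makes it approximate the DCT is folded into quantization in H.264 and is not shown.

```python
def _core_1d(v):
    """C · v for a length-4 integer vector, using only adds, subtracts and shifts."""
    x0, x1, x2, x3 = v
    s0, s1 = x0 + x3, x1 + x2
    d0, d1 = x0 - x3, x1 - x2
    return [s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1)]


def forward_core_4x4(block):
    """Y = C · X · C^T for a 4x4 integer block X."""
    rows = [_core_1d(r) for r in block]                                # R = X · C^T
    cols = [_core_1d([rows[y][v] for y in range(4)]) for v in range(4)]  # columns of C · R
    return [[cols[v][u] for v in range(4)] for u in range(4)]          # back to Y[u][v]


print(forward_core_4x4([[10] * 4 for _ in range(4)])[0][0])  # 160
```

Because every operation is an integer add, subtract or shift, encoder and decoder compute bit-identical results, which removes the mismatch problem mentioned on the slide.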

Deblocking Filter (figure: non-deblocked image vs. deblocked image; courtesy: images from http://compression.ru/video/deblocking/)

Entropy Coding • Motivation: traditional coders use fixed variable-length codes • Essentially Huffman-style codes • Non-adaptive • Can’t encode symbols with probability > 0.5 efficiently, since each symbol needs at least one bit • H.263 Annex E defines an arithmetic coder • Still non-adaptive • Uses multiple non-binary alphabets, which results in high computational complexity
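The probability > 0.5 limitation can be quantified with the binary entropy function: a source emitting one symbol with probability 0.9 carries only about 0.469 bits per symbol, but any Huffman-style code must spend at least 1 bit on each symbol. A minimal sketch:

```python
import math


def binary_entropy(p):
    """Average information (bits/symbol) of a binary source with P(symbol A) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


print(round(binary_entropy(0.9), 3))  # 0.469 bits/symbol, versus 1 bit/symbol for a VLC
```

Arithmetic coding can approach this bound because it does not have to assign a whole number of bits to each symbol.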

Entropy Coding: CABAC • Context-adaptive binary arithmetic coding (CABAC) framework designed specifically for H.264 • Binarization: all syntax symbols are translated to bit-strings • 399 predefined context models, used in groups • E.g. models 14-20 used to code macroblock type for inter-frames • The model to use next is selected based on previously coded information (the context)
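The benefit of adaptive, context-selected probability models can be sketched with ideal code lengths. This is a simplified stand-in: each context keeps a count-based estimate of P(bit = 1) and the cost is -log2 of the estimated probability, whereas real CABAC uses a 64-state probability estimator and a multiplication-free arithmetic coder. `AdaptiveBitModel` and `code_length` are hypothetical names.

```python
import math


class AdaptiveBitModel:
    """Count-based estimate of P(bit = 1) for one context (simplified stand-in)."""

    def __init__(self):
        self.ones, self.total = 1, 2  # start at P(1) = 1/2

    def cost(self, bit):
        p1 = self.ones / self.total
        return -math.log2(p1 if bit else 1 - p1)  # ideal arithmetic-code length in bits

    def update(self, bit):
        self.ones += bit
        self.total += 1


def code_length(bits, contexts):
    """Total ideal code length when each bit is coded with the model of its context."""
    models, total = {}, 0.0
    for bit, ctx in zip(bits, contexts):
        model = models.setdefault(ctx, AdaptiveBitModel())
        total += model.cost(bit)
        model.update(bit)  # adapt after coding, as the decoder can do the same
    return total


bits = [0, 1] * 50
print(code_length(bits, [0] * 100) > code_length(bits, [0, 1] * 50))  # True
```

With one shared context the alternating bits look random (close to 1 bit each); with a context that predicts them, the 100 bits cost only about 11 bits in total.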
