ECE-C453 Image Processing Architecture, Lecture 6, 2/3/04 Lossy Video Coding Ideas; Technology of DCT and Motion Estimation Oleh Tretiak, Drexel University
Decorrelation Ideas • Orthogonal Transforms (KLT, DCT) • Main method for intra-frame coding • Wavelets • New stuff (JPEG 2000) • Predictive coding • Simple • Used for inter-frame coding (video) Review
Lossy Predictive Coding [Figure: encoder and decoder block diagrams] • How to decorrelate? • Predict values • Block coding (DFT) • Wavelets • Predictive (sample-based, feedback) encoder: Differential Pulse Code Modulation (DPCM) Review
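A minimal sketch of the DPCM loop, assuming the simplest predictor (the previous reconstructed sample) and a uniform quantizer with step `step`; the function names are hypothetical. It illustrates the feedback path: the encoder predicts from the values the decoder will reconstruct, not from the original samples, so quantization errors do not accumulate.

```python
def dpcm_encode(samples, step):
    """Closed-loop DPCM encoder: returns quantizer indices."""
    indices = []
    prediction = 0.0                      # previous reconstructed sample (starts at 0)
    for x in samples:
        error = x - prediction            # prediction error
        q = round(error / step)           # uniform quantizer index (to be entropy coded)
        indices.append(q)
        prediction += q * step            # track the decoder's reconstruction (feedback)
    return indices


def dpcm_decode(indices, step):
    """DPCM decoder: accumulates the dequantized prediction errors."""
    out, prediction = [], 0.0
    for q in indices:
        prediction += q * step
        out.append(prediction)
    return out


samples = [10, 12, 15, 15, 14, 9, 3]
codes = dpcm_encode(samples, step=4)
recon = dpcm_decode(codes, step=4)
# Every reconstructed sample stays within step/2 of the original: no drift.
print(codes, recon)
```

Predicting from the original samples instead (open loop) would let the decoder's errors build up, because the decoder never sees the originals.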
Review: Image Decorrelation • x = (x1, x2, ... xn), a sequence of image gray values • Preprocess: convert to y = (y1, y2, ... yn), y = Ax, A ~ an orthogonal matrix ($A^{-1} = A^T$) • Theoretical best (for Gaussian process): A is the Karhunen-Loeve transformation matrix • Images are not Gaussian processes • Karhunen-Loeve matrix is image-dependent, computationally expensive to find • Evaluating y = Ax with K-L transformation is computationally expensive • In practice, we use DCT (discrete cosine transform) for decorrelation • Computationally efficient • Almost as good as the K-L transformation Review
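To make the decorrelation claim concrete, the sketch below builds the orthonormal N-point DCT matrix, checks $AA^T = I$, and transforms the covariance of a first-order Markov (AR(1)) sequence, a common stand-in for image rows. The AR(1) model and ρ = 0.95 are assumptions for illustration, not image statistics from the lecture. Most of the variance lands in the first coefficient, and the remaining correlations between coefficients are small.

```python
import math


def dct_matrix(n):
    """Orthonormal n-point DCT matrix A (row k is the k-th basis vector)."""
    return [[math.sqrt((1 if k == 0 else 2) / n) * math.cos((2 * j + 1) * k * math.pi / (2 * n))
             for j in range(n)] for k in range(n)]


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


n, rho = 8, 0.95
A = dct_matrix(n)
identity = matmul(A, transpose(A))                              # A A^T = I, so A^-1 = A^T
R = [[rho ** abs(i - j) for j in range(n)] for i in range(n)]   # AR(1) covariance of x
Ry = matmul(matmul(A, R), transpose(A))                         # covariance of y = A x
diag = sum(Ry[k][k] ** 2 for k in range(n))
off = sum(Ry[k][l] ** 2 for k in range(n) for l in range(n) if k != l)
print(f"variance of y0: {Ry[0][0]:.3f} of {n}; off-diagonal/diagonal energy: {off / diag:.5f}")
```

The trace of the covariance (total variance) is unchanged by the orthogonal transform; only its distribution among coefficients changes.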
Review: Block-Based Coding • Full image DCT - one set of decorrelated coefficients for whole image • Block-based coding: • Image divided into ‘small’ blocks • Each block is decorrelated separately • Block decorrelation performs almost as well as (perhaps better than) full-image decorrelation • Current standards (JPEG, MPEG) use 8x8 DCT blocks Review
Rate-Distortion: 1D vs. 2D coding • Theory of the tradeoff between distortion and the least number of bits needed • The tradeoff is interesting only if samples are correlated • “Water-filling” construction to compute R(D) Review
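The reverse water-filling construction can be sketched for independent Gaussian components (for example, decorrelated transform coefficients) with known variances $\sigma_i^2$. Each component gets distortion $D_i = \min(\theta, \sigma_i^2)$, the level θ is chosen so that the $D_i$ sum to the target distortion, and the rate is $R = \sum_i \max\left(0, \tfrac{1}{2}\log_2(\sigma_i^2/\theta)\right)$ bits. The bisection search and the function name are illustrative choices.

```python
import math


def reverse_water_filling(variances, distortion, iterations=100):
    """Return (rate in bits, water level theta) for independent Gaussian
    components with the given variances at total MSE `distortion`."""
    if distortion >= sum(variances):
        return 0.0, max(variances)          # send nothing: every component is zeroed
    lo, hi = 0.0, max(variances)
    for _ in range(iterations):             # bisection on theta
        theta = (lo + hi) / 2
        if sum(min(theta, v) for v in variances) > distortion:
            hi = theta
        else:
            lo = theta
    theta = (lo + hi) / 2
    # Components with variance below theta get zero bits.
    rate = sum(0.5 * math.log2(v / theta) for v in variances if v > theta)
    return rate, theta


print(reverse_water_filling([4.0, 1.0], 1.0))   # both components are coded
print(reverse_water_filling([4.0, 1.0], 3.0))   # the small component gets no bits
```

Unequal variances, which is what decorrelating correlated samples produces, are what make the allocation non-trivial: with equal variances every component would get the same rate.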
Wavelet Transform • Filterbank and wavelets • 2-D wavelets • Wavelet pyramid Review
Filterbank Pyramid [Figure: filterbank pyramid with bands labeled 125, 125, 250, 500, 1000] Review
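As a minimal illustration of a filterbank pyramid, the sketch below uses the orthonormal Haar filter pair, which is an assumption (the slides do not name a filter). Each level splits the current lowpass band into a half-length lowpass band and a half-length highpass band, and only the lowpass band is split again. Applying the same 1-D split along rows and then along columns gives the 2-D version.

```python
import math

S = 1 / math.sqrt(2)


def haar_analysis(x):
    """One level of the orthonormal Haar filterbank (even-length input)."""
    low = [(x[2 * n] + x[2 * n + 1]) * S for n in range(len(x) // 2)]
    high = [(x[2 * n] - x[2 * n + 1]) * S for n in range(len(x) // 2)]
    return low, high


def haar_synthesis(low, high):
    """Inverse of haar_analysis."""
    x = []
    for a, d in zip(low, high):
        x += [(a + d) * S, (a - d) * S]
    return x


def pyramid(x, levels):
    """Split the lowpass band repeatedly: returns [high_1, ..., high_L, low_L]."""
    bands = []
    for _ in range(levels):
        x, high = haar_analysis(x)
        bands.append(high)
    bands.append(x)
    return bands


def reconstruct(bands):
    low = bands[-1]
    for high in reversed(bands[:-1]):
        low = haar_synthesis(low, high)
    return low


x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
bands = pyramid(x, 3)
print([len(b) for b in bands])
```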
Lena: Top Level, next level [Figure: wavelet subbands of Lena at the top level and the next level, annotated 48.81, 9.23, 1.01, 15.45, 6.48, 2.52, 0.37] Review
This Lecture • Idea • Video Coding by Pixel Prediction • Motion Estimation • Technology: DCT, and how much it costs • Technology: Motion Estimation Algorithms
Video Coding • Video: Sequence of images • Reasons for changes between successive images • Edits • Camera pan, zoom • Intra-frame motion • Intra-frame texture • Noise • Model: Successive images are similar • Video coding uses inter-frame redundancy to achieve lossy compression
Predicting sequential images [Figure: frames f(t-1) and f(t), and the difference f(t) - f(t-1)]
Motion Compensation • Macroblock size • MxN • Matching criterion • MAE (mean absolute error) • Search window • ±p pixel locations • Search algorithm • Full search • Logarithmic search • Parallel Hierarchical One-Dimensional Search • Pixel subsampling and projection • Hierarchical downsampling
Motion Estimation Methods [Figure panels: no compensation, full search, logarithmic search, 3-level hierarchical]
DCT Technology • DCT Formula • How it works • DCT plus quantization • DCT implementations and cost • Direct • Separable • Fast • Refinements
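What is the DCT? • One-dimensional 8-point DCT. Input x0, ... x7, output y0, ... y7: $y_k = \frac{c_k}{2}\sum_{n=0}^{7} x_n \cos\frac{(2n+1)k\pi}{16}$, $k = 0, \dots, 7$, where $c_0 = 1/\sqrt{2}$ and $c_k = 1$ for $k > 0$ • One-dimensional inverse DCT. Input y0, ... y7, output x0, ... x7: $x_n = \sum_{k=0}^{7} \frac{c_k}{2} y_k \cos\frac{(2n+1)k\pi}{16}$ • Matrix form of equations, with x, y one-column matrices: $y = Tx$ and $x = T^T y$, where $t_{kn} = \frac{c_k}{2}\cos\frac{(2n+1)k\pi}{16}$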
Two-Dimensional DCT • Forward 2-D DCT. Input xij, i = 0, ... 7, j = 0, ... 7. Output ykl, k = 0, ... 7, l = 0, ... 7: $y_{kl} = \frac{c_k c_l}{4}\sum_{i=0}^{7}\sum_{j=0}^{7} x_{ij}\cos\frac{(2i+1)k\pi}{16}\cos\frac{(2j+1)l\pi}{16}$ • Matrix form, X, Y ~ 8x8 matrices with coefficients xij, ykl: $Y = TXT^T$ • The 2-D DCT is separable!
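General DCT • One dimension (N points): $y_k = \sqrt{\frac{2}{N}}\, c_k \sum_{n=0}^{N-1} x_n \cos\frac{(2n+1)k\pi}{2N}$ • Two dimensions (NxN points): $y_{kl} = \frac{2}{N}\, c_k c_l \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} x_{ij}\cos\frac{(2i+1)k\pi}{2N}\cos\frac{(2j+1)l\pi}{2N}$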
Example: 4x4 DCT See 06IPA.xls
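The DCT formulas translate directly into code. This sketch, in plain Python for readability rather than speed, builds the N-point DCT matrix T with entries $t_{kn} = \sqrt{2/N}\,c_k\cos\frac{(2n+1)k\pi}{2N}$, computes the 2-D DCT separably as $Y = TXT^T$, and inverts it as $X = T^TYT$. It uses a 4x4 block to match the example size; the block values are made up, not taken from 06IPA.xls.

```python
import math


def dct_matrix(n):
    """Orthonormal n-point DCT matrix T."""
    return [[math.sqrt((1 if k == 0 else 2) / n) * math.cos((2 * j + 1) * k * math.pi / (2 * n))
             for j in range(n)] for k in range(n)]


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def dct2(x):
    t = dct_matrix(len(x))
    z = matmul(t, x)                      # Z = T X: 1-D DCT of every column
    return matmul(z, transpose(t))        # Y = Z T^T: 1-D DCT of every row


def idct2(y):
    t = dct_matrix(len(y))
    return matmul(matmul(transpose(t), y), t)   # X = T^T Y T


block = [[52, 55, 61, 66], [70, 61, 64, 73], [63, 59, 55, 90], [67, 61, 68, 104]]
Y = dct2(block)
X = idct2(Y)
for row in Y:
    print(["%8.2f" % v for v in row])
```

Because T is orthogonal, the inverse needs only transposes; the DC coefficient of an NxN block equals its pixel sum divided by N.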
Computational Complexity • 1D DCT • N input and output samples → $N^2 = 64$ operations (additions + multiplications) • 2D DCT, direct implementation • $M = N^2$ input values, M output values → $M^2 = N^4$ operations • 2D DCT, separable implementation: $Y = TXT^T = ZT^T$, where $Z = TX$, all matrices are NxN → $2N^3$ operations • For N = 8 • 2D DCT direct: 4096 operations, 64 operations per pixel • 2D DCT separable: 1024 operations, 16 ops/pixel • Big savings due to separable transform • Inverse DCT: same story.
DCT: Encoding in JPEG, MPEG • Take 8x8 blocks of pixels • Subtract the mid-range value (e.g., 128 for 8-bit pixels) • Compute 8x8 DCT • Quantize the DCT coefficients • Typically, many of the quantized values are equal to zero • Lossless entropy coding of the quantized samples • Different quantization step is used for different DCT coefficients: $z_{kl} = \mathrm{round}(y_{kl}/q_{kl})$ • ykl: DCT coefficients, qkl: quantizer steps • zkl: quantized values
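A minimal sketch of the quantization step $z_{kl} = \mathrm{round}(y_{kl}/q_{kl})$ and its inverse. The step table here is hypothetical (steps grow with k + l, so high frequencies are quantized more coarsely) and is not the JPEG table. The `scale` parameter shows how compression and loss are traded by scaling all steps together.

```python
def step_table(n=8, base=8.0, slope=4.0, scale=1.0):
    """Hypothetical step table: q_kl = scale * (base + slope * (k + l))."""
    return [[scale * (base + slope * (k + l)) for l in range(n)] for k in range(n)]


def quantize(y, q):
    """z_kl = round(y_kl / q_kl); these integers are what gets entropy coded."""
    return [[round(ykl / qkl) for ykl, qkl in zip(yr, qr)] for yr, qr in zip(y, q)]


def dequantize(z, q):
    """Receiver side: y'_kl = z_kl * q_kl, which is then inverse transformed."""
    return [[zkl * qkl for zkl, qkl in zip(zr, qr)] for zr, qr in zip(z, q)]


# Made-up coefficients: a large DC value and small AC values.
y = [[101.0 if (k, l) == (0, 0) else 3.0 for l in range(8)] for k in range(8)]
q = step_table()
z = quantize(y, q)
y_hat = dequantize(z, q)
nonzero = sum(1 for row in z for v in row if v != 0)
print(z[0][0], nonzero)
```

The reconstruction error of each coefficient is at most half its step, so doubling `scale` roughly doubles the worst-case error while driving more coefficients to zero.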
DCT: Example • Data from Lena, ‘smooth’ area. RMS error = 3.5 [Figure: original, quantized DCT, reconstructed]
DCT example • Data from Lena, ‘busy’ area. RMS error = 7.3 [Figure: original, DCT, quantized DCT, reconstructed]
Overview: DCT coding • Transformation decorrelates samples • Transformed samples are quantized; the quantization step depends on the coefficient. Degree of compression and loss can be changed by scaling the quantization steps • Many quantized samples are zero → run-length coding • At receiver, perform inverse DCT • Many calculations! [Figure: JPEG standard quantization steps]
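The run-length idea can be sketched as follows, assuming JPEG-style zig-zag ordering and simplified (run, value) pairs with an end-of-block marker. Real JPEG also codes the DC coefficient differentially, caps run lengths (with a special symbol for long zero runs), and entropy codes the pairs; those details are omitted.

```python
def zigzag_order(n=8):
    """(k, l) = (row, column) pairs in JPEG-style zig-zag order."""
    order = []
    for s in range(2 * n - 1):                       # anti-diagonal k + l = s
        diag = [(k, s - k) for k in range(n) if 0 <= s - k < n]
        # even diagonals run bottom-left to top-right, odd ones the other way
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def run_length(z):
    """DC value plus (zero_run, value) pairs for the AC coefficients."""
    seq = [z[k][l] for k, l in zigzag_order(len(z))]
    pairs, run = [], 0
    for value in seq[1:]:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")    # trailing zeros are not sent (marker always appended here)
    return seq[0], pairs


z = [[0] * 8 for _ in range(8)]
z[0][0], z[0][1], z[2][0] = 13, -2, 1
print(run_length(z))
```

The zig-zag scan visits low frequencies first, so the nonzero values tend to cluster at the start and the long tail of zeros collapses into a single marker.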
Speeding up the DCT • Separable transform - basic speedup • Fast DCT transform - like FFT • Further speedup through Scaled DCT
Optimized (fast) DCT • [Figure: 1-D Chen DCT flow diagram; dashed lines indicate subtraction, and separate symbols mark multiplication by a constant and multiplication by 0.5 (a shift)] • [Table: characteristics of optimized DCT algorithms]
DCT Complexity • Direct DCT computation: • 64 DCT values, each requires 64 multiplications & additions → 4096 multiply-accumulate (MA) operations per block • Separable algorithm (operate on rows, then on columns) → 16 one-dimensional 8-point DCT operations → 1024 MA operations • Fast implementation: ~N log2 N = 24 operations per 1-D DCT → 16 × 24 = 384 MA ops • Special methods: many operations involve multiplication by 1 or -1; take advantage of this!
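The counts above follow from three formulas, assuming NxN blocks and counting one multiply-accumulate as one operation: direct $N^4$, separable $2N \cdot N^2 = 2N^3$ (2N one-dimensional transforms of $N^2$ operations each), and fast $2N \cdot N\log_2 N$. The $N\log_2 N$ figure for a fast 1-D DCT is the slide's approximation, not an exact count for any particular algorithm.

```python
import math


def dct_block_ops(n=8):
    """Approximate multiply-accumulate counts for one n x n 2-D DCT."""
    direct = n ** 4                        # n*n outputs, each summing n*n inputs
    separable = 2 * n * n * n              # 2n 1-D DCTs of n*n MACs each
    fast = int(2 * n * n * math.log2(n))   # 2n 1-D DCTs of ~n log2 n MACs each
    return direct, separable, fast


n = 8
for name, ops in zip(("direct", "separable", "fast"), dct_block_ops(n)):
    print(f"{name:9s} {ops:5d} ops per block, {ops / (n * n):.0f} per pixel")
```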
Fast Scaled DCT • [Figure: butterfly at the last stage of the DCT, followed by the quantizer]
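The scaled-DCT idea, sketched in one dimension under stated assumptions: write the orthonormal DCT as $y_k = s_k u_k$, where $u_k = \sum_n x_n\cos\frac{(2n+1)k\pi}{16}$ is an unscaled transform and $s_k = c_k/2$. Because the quantizer divides by $q_k$ anyway, the multiplication by $s_k$ can be folded into a precomputed step $q_k/s_k$, removing N multiplications from the transform itself. The direct $O(N^2)$ sum below stands in for a fast unscaled algorithm, and the step values are hypothetical.

```python
import math

N = 8
C = [1 / math.sqrt(2)] + [1.0] * (N - 1)
SCALE = [c / 2 for c in C]          # s_k = c_k / 2 for the orthonormal 8-point DCT


def unscaled_dct(x):
    """u_k without the final per-coefficient multiplication."""
    return [sum(xn * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n, xn in enumerate(x))
            for k in range(N)]


def quantizer_input_reference(x, q):
    """Scale after the transform, then divide by the step: y_k / q_k."""
    return [s * u / qk for s, u, qk in zip(SCALE, unscaled_dct(x), q)]


def quantizer_input_folded(x, q):
    """Same values, with the scale folded into a precomputed table q_k / s_k."""
    folded = [qk / s for qk, s in zip(q, SCALE)]     # computed once per quantizer table
    return [u / f for u, f in zip(unscaled_dct(x), folded)]


x = [52, 55, 61, 66, 70, 61, 64, 73]
q = [16, 12, 14, 18, 24, 32, 40, 48]                 # hypothetical steps
z = [round(v) for v in quantizer_input_folded(x, q)]
print(z)
```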
DCT refinements [Table: complexity of scaled DCT algorithms, excluding quantization] • Multiply-accumulate architectures • Basic operation is a = bc + d, well suited for DCT • Super-scalar architectures • Multi-register, multi-ALU processors • Perform several operations in parallel
Motion Estimation • Architecture of Motion Estimation • Algorithms and Costs • Full Search • Logarithmic Search • PHODS • Downsample, projection • Hierarchical motion estimation • Other criteria • Multi-image estimation
Baseline Models • Previous frame predicts current frame • I(x, y, t) = I(x, y, t-1) + e(x, y, t) • Not effective in presence of motion ~ zoom, pan, etc. • Prediction to account for motion: • I(x, y, t) = I(x+u, y+v, t-1) + e(x, y, t) • (u, v) — motion (displacement) vector • Model works (somewhat) for pan, not for other motion • Compromise: Compute independent motion estimates for rectangular image regions — macroblocks. • Macroblocks are, in general, bigger than DCT blocks
Motion Estimation Terminology • Issues: • Size of macroblock • Size of search region • In video coding standards, M = N = 16
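Matching Criterion • Ideal matching criterion: the displacement that produces the fewest coded bits for the error image • Coding the error for each value of motion vector (u, v) is too time consuming (expensive) • In practice, mean absolute error (MAE) is most popular: $\mathrm{MAE}(i, j) = \frac{1}{MN}\sum_{k=0}^{M-1}\sum_{l=0}^{N-1}\left|C(x+k, y+l) - R(x+i+k, y+j+l)\right|$ • C - current image, R - reference image, (x, y) - macroblock origin, (i, j) - candidate displacement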
Full-Search Method • Compute MAE(i, j) for all $(2p+1)^2$ values of (i, j), $-p \le i, j \le p$ • Each location requires 3MN operations (subtract, absolute value, accumulate) • Picture dimensions IxJ, F pictures per second • IJ/MN macroblocks per picture, so $\frac{IJ}{MN} \cdot (2p+1)^2 \cdot 3MN \cdot F = 3IJF(2p+1)^2$ operations per second • I = 720, J = 480, F = 30, p = 15 → 30 GOPS • Guaranteed to find best (MAE) displacement • How to do it? • Special computers • Smaller p • Faster (suboptimal) algorithm
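A direct, deliberately unoptimized sketch of full search with the MAE criterion. Images are stored as lists of rows, so `img[y][x]` is the pixel at column x, row y. Candidates that would read outside the reference image are skipped, which is one possible boundary policy among several. The synthetic test frames are an assumption: the current frame is the reference shifted by a known vector, so the search should recover it exactly. The last function evaluates the operation-rate formula above.

```python
import random


def mae(cur, ref, x, y, i, j, m=16, n=16):
    """MAE(i, j) between the m x n macroblock of cur with origin (x, y)
    and the block of ref displaced by (i, j)."""
    total = 0
    for l in range(n):
        for k in range(m):
            total += abs(cur[y + l][x + k] - ref[y + j + l][x + i + k])
    return total / (m * n)


def full_search(cur, ref, x, y, p, m=16, n=16):
    """Evaluate all (2p+1)^2 displacements; skip candidates that leave the image."""
    h, w = len(ref), len(ref[0])
    best = None
    for j in range(-p, p + 1):
        for i in range(-p, p + 1):
            if 0 <= x + i and x + i + m <= w and 0 <= y + j and y + j + n <= h:
                cost = mae(cur, ref, x, y, i, j, m, n)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
    return best[1], best[2]


def full_search_rate(I, J, F, p):
    """Operations per second: 3*I*J*F*(2p+1)^2."""
    return 3 * I * J * F * (2 * p + 1) ** 2


rng = random.Random(0)
ref = [[rng.randint(0, 255) for _ in range(64)] for _ in range(64)]
# Synthetic current frame: C(x, y) = R(x - 5, y + 3), so the true vector is (-5, 3).
cur = [[ref[min(63, y + 3)][max(0, x - 5)] for x in range(64)] for y in range(64)]
print(full_search(cur, ref, 24, 24, p=7))
print(full_search_rate(720, 480, 30, 15) / 1e9, "GOPS")
```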
Logarithmic Search (1D) • Goal: find minimum over u in [-p, p] • First step: evaluate at -p/2, 0, p/2 (interval ~ p) • Next step: choose interval of length p/2 around minimum (2 more evaluations) • Continue until interval length is equal to 2. This takes $k = \lceil \log_2 p \rceil$ iterations • Example p = 7 • Evaluate at -4, 0, 4 → minimum at -4 • Evaluate at -6, (-4), -2 → minimum at -2 • Evaluate at -3, (-2), -1 → minimum at -3. Done!
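The 1-D procedure can be sketched as follows. Step sizes $2^{k-1}, \dots, 2, 1$ with $k = \lceil\log_2 p\rceil$ reproduce the p = 7 example when the cost function has the shape assumed below; that cost is a made-up convex function standing in for MAE as a function of u.

```python
import math


def log_search_1d(cost, p):
    """Logarithmic search for the u in [-p, p] minimizing cost(u).
    Returns the result and the candidates examined at each iteration."""
    k = max(1, math.ceil(math.log2(p)))
    step = 2 ** (k - 1)
    center, history = 0, []
    while step >= 1:
        candidates = [u for u in (center - step, center, center + step) if -p <= u <= p]
        history.append(candidates)
        # The center's cost is already known from the previous iteration;
        # it is simply recomputed here for brevity.
        center = min(candidates, key=cost)
        step //= 2
    return center, history


best, history = log_search_1d(lambda u: (u + 2.8) ** 2, p=7)
print(best, history)
```

Only about $2k + 1$ cost evaluations are needed instead of $2p + 1$, at the price of possibly stopping in a local minimum when the cost is not unimodal.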
Logarithmic Search - 2D • First stage requires 3x3 = 9 evaluations • Subsequent stages require 8 evaluations (the center was evaluated in the previous stage) • $k = \lceil \log_2 p \rceil$ stages (iterations) • Rate = $3IJF(8k+1)$ operations per second • p = 15 (k = 4, 33 evaluations per macroblock), I = 720, J = 480, F = 30 → 1 GOPS • Can fail to find minimum • Bottom line: faster method, more error than full search
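A sketch of the 2-D version: evaluate a 3x3 grid around the current best displacement, halve the step, and repeat. A cache counts distinct evaluations, which comes to $8k + 1$ when no candidate is clipped by the search window. The separable cost function is a made-up stand-in for MAE over (u, v).

```python
import math


def log_search_2d(cost, p):
    """2-D logarithmic search; returns the chosen (u, v) and the number of
    distinct displacements whose cost was evaluated."""
    k = max(1, math.ceil(math.log2(p)))
    step = 2 ** (k - 1)
    best = (0, 0)
    cache = {}                                   # each displacement is costed once

    def cached_cost(uv):
        if uv not in cache:
            cache[uv] = cost(*uv)
        return cache[uv]

    while step >= 1:
        u0, v0 = best
        grid = [(u0 + du * step, v0 + dv * step)
                for dv in (-1, 0, 1) for du in (-1, 0, 1)
                if abs(u0 + du * step) <= p and abs(v0 + dv * step) <= p]
        best = min(grid, key=cached_cost)
        step //= 2
    return best, len(cache)


def log_search_rate(I, J, F, p):
    """Operations per second from the slide: 3*I*J*F*(8k + 1)."""
    k = max(1, math.ceil(math.log2(p)))
    return 3 * I * J * F * (8 * k + 1)


vector, evaluations = log_search_2d(lambda u, v: (u - 5.4) ** 2 + (v + 9.3) ** 2, p=15)
print(vector, evaluations, log_search_rate(720, 480, 30, 15) / 1e9, "GOPS")
```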
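PHODS • Parallel Hierarchical One-Dimensional Search • At each stage, the minimum along the horizontal (Min H) and the minimum along the vertical (Min V) are found independently, from the same starting point [Figure: search points of the 1st (blue), 2nd (green), and 3rd (red) stages] • ~Twice as fast as logarithmic search • Less reliable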
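A sketch of PHODS under the same step schedule as the logarithmic search, which is an assumption about the schedule. At every stage the horizontal and vertical 1-D searches start from the same point and do not depend on each other, so they could run in parallel, and each stage needs roughly half the evaluations of a 3x3 grid. The cost function is a made-up separable stand-in; for non-separable costs, PHODS can settle on a worse vector, which is the reliability loss noted above.

```python
import math


def phods(cost, p):
    """PHODS sketch: independent 1-D searches along u and v at each stage."""
    k = max(1, math.ceil(math.log2(p)))
    step = 2 ** (k - 1)
    u, v = 0, 0
    while step >= 1:
        us = [c for c in (u - step, u, u + step) if abs(c) <= p]
        vs = [c for c in (v - step, v, v + step) if abs(c) <= p]
        # Both searches start from (u, v); neither uses the other's result,
        # so they could run in parallel.
        new_u = min(us, key=lambda c: cost(c, v))
        new_v = min(vs, key=lambda c: cost(u, c))
        u, v = new_u, new_v
        step //= 2
    return u, v


print(phods(lambda u, v: (u - 5.4) ** 2 + (v + 9.3) ** 2, p=15))
```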
Other Fast Methods • Subsample (do not use all points in macroblock) • Projection: Row and column projection of pixels, follow with 1-D search • Hierarchical motion estimation • Downsample reference image and current image • Perform low resolution search • Refine
Hierarchical Search • Prepare downsampled versions of current and reference images • Full: macroblock 16x16 • Down 2: macroblock 8x8 • Down 4: macroblock 4x4 • Full search in Down 4 reference image • 16x speedup, smaller macroblock • 16x speedup, fewer displacement vectors • p = ±16, p’ = ±4 • Around point of best match, do local search in Down 2 reference image (3x3 search zone) • Repeat for Full reference image (3x3 search zone) [Figure: Full, Down 2, and Down 4 images]
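A three-level sketch, assuming 2x2 averaging for downsampling, a full search with p' = p/4 at the coarsest level, and then a ±1 (3x3) refinement around twice the previous vector at each finer level. Out-of-range candidates are skipped. The test frames are synthetic, and the true shift is chosen as a multiple of 4 so that every level sees an exact copy; with real video the coarse estimate is only approximate.

```python
import random


def downsample(img):
    """Halve the resolution by averaging each 2x2 group of pixels."""
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4
             for x in range(len(img[0]) // 2)]
            for y in range(len(img) // 2)]


def mae(cur, ref, x, y, i, j, size):
    """MAE between the size x size block of cur at (x, y) and ref displaced by (i, j)."""
    return sum(abs(cur[y + l][x + k] - ref[y + j + l][x + i + k])
               for l in range(size) for k in range(size)) / (size * size)


def local_search(cur, ref, x, y, size, center, radius):
    """Exhaustive MAE search over a (2*radius+1)^2 window around center."""
    h, w = len(ref), len(ref[0])
    best = None
    for j in range(center[1] - radius, center[1] + radius + 1):
        for i in range(center[0] - radius, center[0] + radius + 1):
            if 0 <= x + i and x + i + size <= w and 0 <= y + j and y + j + size <= h:
                cost = mae(cur, ref, x, y, i, j, size)
                if best is None or cost < best[0]:
                    best = (cost, (i, j))
    return best[1]


def hierarchical_search(cur, ref, x, y, p=16, size=16, levels=3):
    curs, refs = [cur], [ref]
    for _ in range(levels - 1):
        curs.append(downsample(curs[-1]))
        refs.append(downsample(refs[-1]))
    f = 2 ** (levels - 1)                       # 4 for three levels
    # Full search at the coarsest level: p' = p / f, macroblock size / f.
    u, v = local_search(curs[-1], refs[-1], x // f, y // f, size // f, (0, 0), p // f)
    # Refine: double the vector, then search its 3x3 neighbourhood.
    for level in range(levels - 2, -1, -1):
        f = 2 ** level
        u, v = local_search(curs[level], refs[level], x // f, y // f, size // f,
                            (2 * u, 2 * v), 1)
    return u, v


rng = random.Random(1)
ref = [[rng.randint(0, 255) for _ in range(64)] for _ in range(64)]
# Current frame: C(x, y) = R(x - 8, y + 4), true displacement (u, v) = (-8, 4).
cur = [[ref[min(63, y + 4)][max(0, x - 8)] for x in range(64)] for y in range(64)]
print(hierarchical_search(cur, ref, 24, 24))
```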
More Speedup • Simpler comparison criteria • Binarize difference, count pixels that do not match • PDC (Pixel Difference Classification) • Binarize current and reference • BPROP (count matching pixels) • DPC (count different pixels) • BMP (operations done on bitplanes) • Produce 3-25 fold speedup
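Two of these criteria, sketched for small blocks stored as lists of rows. The mismatch threshold t and binarizing each block against its own mean are assumptions for illustration; published binarization schemes differ. Packing the binary pixels into one integer shows why bit-plane criteria are cheap: one XOR plus a bit count replaces a sum of absolute differences.

```python
def pdc_cost(cur, ref, t):
    """Pixel difference classification: count pixels whose absolute
    difference exceeds threshold t (lower is a better match)."""
    return sum(1 for a_row, b_row in zip(cur, ref)
               for a, b in zip(a_row, b_row) if abs(a - b) > t)


def to_bitplane(block):
    """Binarize a block against its mean and pack the bits into one integer."""
    flat = [v for row in block for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v >= mean else 0)
    return bits


def dpc_cost(cur, ref):
    """Count differing pixels of the binarized blocks with one XOR."""
    return bin(to_bitplane(cur) ^ to_bitplane(ref)).count("1")


cur = [[10, 200], [30, 220]]
ref = [[12, 190], [35, 240]]
print(pdc_cost(cur, ref, t=8), dpc_cost(cur, ref))
```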
Big Picture on Speedup • Speedup methods are less accurate • Same Bit Rate, lower SNR • Same SNR, higher bit rate • Binary criteria lose about 0.5 dB • Suppose we have adequate computing power? Can we do better? • Sub-pixel motion estimation • First find best match with pixel accuracy in displacement vectors • Interpolate images for half-pixel shifts
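A sketch of half-pixel refinement, assuming bilinear averaging of neighbouring pixels for the interpolated positions (one common choice). Vectors are expressed in half-pixel units, and the 3x3 half-pixel neighbourhood of the integer-pixel result is searched with MAE. The test frame is synthetic: each current pixel lies exactly halfway between two reference pixels, so the true displacement is u = 1.5, v = 0.

```python
import random


def half_pel_sample(img, x2, y2):
    """Pixel value at (x2 / 2, y2 / 2) by averaging the neighbouring pixels."""
    x, y = x2 // 2, y2 // 2
    fx, fy = x2 % 2, y2 % 2
    return (img[y][x] + img[y][x + fx] + img[y + fy][x] + img[y + fy][x + fx]) / 4


def refine_half_pel(cur, ref, x, y, best_int, size=8):
    """Search the 3x3 half-pixel neighbourhood of an integer-pixel vector.
    Returns the best vector in half-pixel units (divide by 2 for pixels)."""
    u0, v0 = 2 * best_int[0], 2 * best_int[1]
    best = None
    for dv in (-1, 0, 1):
        for du in (-1, 0, 1):
            u2, v2 = u0 + du, v0 + dv
            cost = sum(abs(cur[y + l][x + k]
                           - half_pel_sample(ref, 2 * (x + k) + u2, 2 * (y + l) + v2))
                       for l in range(size) for k in range(size)) / (size * size)
            if best is None or cost < best[0]:
                best = (cost, (u2, v2))
    return best[1]


rng = random.Random(2)
ref = [[rng.randint(0, 255) for _ in range(32)] for _ in range(32)]
# Current frame sampled halfway between ref columns x+1 and x+2.
cur = [[(ref[y][x + 1] + ref[y][x + 2]) / 2 if x + 2 < 32 else 0.0 for x in range(32)]
       for y in range(32)]
print(refine_half_pel(cur, ref, 8, 8, best_int=(1, 0)))
```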
Multipicture Motion Estimation • Estimate on basis of past and future • Non-sequential image transmission • More chances to find good match • More calculations
Video Compression - Summary • Video — sequence of images • Can use intraframe compression • Motion JPEG • Interframe compression offers great potential for savings • No motion compensation — lower compression • Motion compensation — greater compression • All video standards provide for motion compensation • Compensation done on macroblocks, multiple motion vectors per image • Tradeoff between computing requirement and image quality