
ECE-C453 Image Processing Architecture

ECE-C453 Image Processing Architecture. Lecture 6, 2/3/04 Lossy Video Coding Ideas Technology of DCT and Motion Estimation Oleh Tretiak Drexel University. Decorrelation Ideas. Orthogonal Transforms (KLT, DCT) Main method for intra-frame coding Wavelet New stuff (JPEG 2000)


Presentation Transcript


  1. ECE-C453 Image Processing Architecture. Lecture 6, 2/3/04. Lossy Video Coding Ideas: Technology of DCT and Motion Estimation. Oleh Tretiak, Drexel University

  2. Decorrelation Ideas • Orthogonal Transforms (KLT, DCT) • Main method for intra-frame coding • Wavelet • New stuff (JPEG 2000) • Predictive coding • Simple • Used for inter-frame coding (video) Review

  3. Lossy Predictive Coding • How to decorrelate? • Predict values • Block coding (DFT) • Wavelet • Predictive (sample-based, feedback) encoder: Differential Pulse Code Modulation (DPCM) [Figure: encoder and decoder block diagrams] Review

  4. Review: Image Decorrelation • $x = (x_1, x_2, \dots, x_n)$, a sequence of image gray values • Preprocess: convert to $y = (y_1, y_2, \dots, y_n)$, $y = Ax$, A ~ an orthogonal matrix ($A^{-1} = A^T$) • Theoretical best (for Gaussian process): A is the Karhunen-Loeve transformation matrix • Images are not Gaussian processes • Karhunen-Loeve matrix is image-dependent, computationally expensive to find • Evaluating $y = Ax$ with K-L transformation is computationally expensive • In practice, we use DCT (discrete cosine transform) for decorrelation • Computationally efficient • Almost as good as the K-L transformation Review

  5. Review: Block-Based Coding • Full image DCT: one set of decorrelated coefficients for the whole image • Block-based coding: • Image divided into ‘small’ blocks • Each block is decorrelated separately • Block decorrelation performs almost as well as (better than?) full image decorrelation • Current standards (JPEG, MPEG) use 8x8 DCT blocks Review

  6. Rate-Distortion: 1D vs. 2D coding • Theory on tradeoff between distortion and least number of bits • Interesting tradeoff only if samples are correlated • “Water-filling” construction to compute R(d) Review

  7. Wavelet Transform • Filterbank and wavelets • 2D wavelets • Wavelet Pyramid Review

  8. Filterbank Pyramid [Figure: filterbank pyramid; labels on the diagram: 125, 125, 250, 500, 1000] Review

  9. Lena: Top Level, next level [Figure: wavelet pyramid of Lena; values printed on the figure: 48.81, 9.23, 1.01, 15.45, 6.48, 2.52, 0.37] Review

  10. This Lecture • Idea • Video Coding by Pixel Prediction • Motion Estimation • Technology: DCT, and how much it costs • Technology: Motion Estimation Algorithms

  11. Video Coding • Video: Sequence of images • Reasons for changes between successive images • Edits • Camera pan, zoom • Intra-frame motion • Intra-frame texture • Noise • Model: Successive images are similar • Video coding uses inter-frame redundancy to achieve lossy compression

  12. Predicting sequential images [Figure: three panels showing the previous frame f(t-1), the current frame f(t), and the difference f(t) - f(t-1)]
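
The difference image on this slide can be illustrated with a small sketch. The tiny frames and the function names `frame_difference` and `reconstruct` are made up for illustration; the point is only that the decoder can rebuild f(t) from f(t-1) plus the residual, and that the residual is mostly zero when successive frames are similar.

```python
def frame_difference(current, previous):
    """Prediction error f(t) - f(t-1), computed pixel by pixel."""
    return [
        [c - p for c, p in zip(cur_row, prev_row)]
        for cur_row, prev_row in zip(current, previous)
    ]


def reconstruct(previous, residual):
    """Decoder side: f(t) = f(t-1) + residual."""
    return [
        [p + e for p, e in zip(prev_row, res_row)]
        for prev_row, res_row in zip(previous, residual)
    ]


# Hypothetical 3x4 frames: a bright patch moves one pixel to the right.
previous = [[10, 10, 10, 10], [10, 50, 50, 10], [10, 10, 10, 10]]
current = [[10, 10, 10, 10], [10, 10, 50, 50], [10, 10, 10, 10]]

residual = frame_difference(current, previous)
rebuilt = reconstruct(previous, residual)
```

Only the pixels touched by the motion produce nonzero residual values, which is what makes the difference image cheaper to code than the frame itself.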

  13. Motion Compensation • Macroblock size • MxN • Matching criterion • MAE (mean absolute error) • Search window • ±p pixel locations • Search algorithm • Full search • Logarithmic search • Parallel Hierarchical One-Dimensional Search • Pixel subsampling and projection • Hierarchical downsampling

  14. Motion Estimation Methods [Figure panels: no compensation, full search, logarithmic search, 3-level hierarchical]

  15. DCT Technology • DCT Formula • How it works • DCT plus quantization • DCT implementations and cost • Direct • Separable • Fast • Refinements

  16. What is the DCT? Note: in these equations, p stands for π. • One-dimensional 8-point DCT: input $x_0, \dots, x_7$, output $y_0, \dots, y_7$ • One-dimensional inverse DCT: input $y_0, \dots, y_7$, output $x_0, \dots, x_7$ • Matrix form of equations: x, y are one-column matrices
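
The slide's equations did not survive in this transcript. As a minimal sketch, the code below assumes the common orthonormal DCT-II scaling; the slide's own normalization may differ. The names `dct_matrix`, `forward_dct`, and `inverse_dct` and the sample values are illustrative only. With this scaling the matrix T is orthogonal, so the inverse uses the transpose.

```python
import math


def dct_matrix(n=8):
    """n x n DCT-II matrix T; orthonormal scaling is an assumption."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append(
            [scale * math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n)]
        )
    return rows


def forward_dct(x):
    """y = T x."""
    n = len(x)
    t = dct_matrix(n)
    return [sum(t[k][i] * x[i] for i in range(n)) for k in range(n)]


def inverse_dct(y):
    """x = T^T y, valid because T is orthogonal."""
    n = len(y)
    t = dct_matrix(n)
    return [sum(t[k][i] * y[k] for k in range(n)) for i in range(n)]


# Hypothetical row of eight gray values.
x = [52, 55, 61, 66, 70, 61, 64, 73]
y = forward_dct(x)
x_back = inverse_dct(y)
```

The first output y[0] is the scaled sum of the inputs (the DC term), and `inverse_dct(forward_dct(x))` returns the input up to floating-point rounding.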

  17. Two-Dimensional DCT • Forward 2D DCT: input $x_{ij}$, i = 0, ... 7, j = 0, ... 7; output $y_{kl}$, k = 0, ... 7, l = 0, ... 7 • Matrix form: X, Y ~ 8x8 matrices with coefficients $x_{ij}$, $y_{kl}$ • The 2D DCT is separable! Note: in these equations, p stands for π.

  18. General DCT • One dimension • Two dimensions
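
The formulas for this slide are missing from the transcript. One standard way to write the N-point DCT is given below, assuming orthonormal scaling (the slide may use a different normalization), with $c_0 = 1/\sqrt{2}$ and $c_k = 1$ for $k > 0$.

```latex
% One dimension
y_k = c_k \sqrt{\frac{2}{N}} \sum_{i=0}^{N-1} x_i \cos\frac{(2i+1)k\pi}{2N},
\qquad k = 0, \dots, N-1

% Two dimensions
y_{kl} = \frac{2}{N}\, c_k c_l \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} x_{ij}
\cos\frac{(2i+1)k\pi}{2N} \cos\frac{(2j+1)l\pi}{2N},
\qquad k, l = 0, \dots, N-1
```

For N = 8 the two-dimensional scale factor 2/N becomes 1/4.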

  19. Example: 4x4 DCT See 06IPA.xls

  20. Computational Complexity • 1D DCT • N input and output samples ~ $N^2 = 64$ operations (additions + multiplications) • 2D DCT, direct implementation • $M = N^2$ input values, M output values -> $M^2 = N^4$ operations • 2D DCT, separable implementation: $Y = TXT^T = ZT^T$, where $Z = TX$, all matrices are NxN -> $2N^3$ operations • For N = 8 • 2D DCT direct: 4096 operations, 64 operations per pixel • 2D DCT separable: 1024 operations, 16 ops/pixel • Big savings due to separable transform • Inverse DCT: same story.
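
The operation counts on this slide can be checked with a short sketch that computes the 8x8 transform both ways and counts multiply-accumulate steps. It assumes orthonormal DCT-II scaling for T and, for the direct form, that the products of basis values are tabulated in advance so each term costs one multiply-accumulate. All function names and the test block are illustrative.

```python
import math


def dct_matrix(n):
    """n x n DCT-II matrix T (orthonormal scaling assumed)."""
    return [
        [
            (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
            * math.cos((2 * i + 1) * k * math.pi / (2 * n))
            for i in range(n)
        ]
        for k in range(n)
    ]


def matmul_counted(a, b):
    """Square matrix product that also returns the multiply-accumulate count."""
    n = len(a)
    ops = 0
    c = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for col in range(n):
            acc = 0.0
            for m in range(n):
                acc += a[r][m] * b[m][col]
                ops += 1
            c[r][col] = acc
    return c, ops


def dct2_separable(x):
    """Y = T X T^T computed as Z = T X followed by Y = Z T^T."""
    t = dct_matrix(len(x))
    t_transposed = [list(row) for row in zip(*t)]
    z, ops_first = matmul_counted(t, x)
    y, ops_second = matmul_counted(z, t_transposed)
    return y, ops_first + ops_second


def dct2_direct(x):
    """Each of the N^2 outputs is a sum over all N^2 inputs."""
    n = len(x)
    t = dct_matrix(n)
    ops = 0
    y = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for l in range(n):
            acc = 0.0
            for i in range(n):
                for j in range(n):
                    # One multiply-accumulate per term, assuming the basis
                    # product t[k][i] * t[l][j] would come from a table.
                    acc += t[k][i] * t[l][j] * x[i][j]
                    ops += 1
            y[k][l] = acc
    return y, ops


# Hypothetical 8x8 block of sample values.
block = [[(3 * r + 5 * c) % 17 for c in range(8)] for r in range(8)]
y_direct, ops_direct = dct2_direct(block)
y_separable, ops_separable = dct2_separable(block)
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(y_direct, y_separable)
    for a, b in zip(row_a, row_b)
)
```

Both routes give the same coefficients; only the amount of work differs, by a factor of N/2 = 4 for N = 8.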

  21. DCT: Encoding in JPEG, MPEG • Take 8x8 blocks of pixels • Subtract the mid-range value (the mean of the sample range) • Compute 8x8 DCT • Quantize the DCT coefficients • Typically, many of the samples are equal to zero • Lossless entropy coding of the quantized samples • Different quantization step is used for different DCT coefficients • $y_{kl}$: DCT coefficients, $q_{kl}$: quantizer steps • $z_{kl}$: quantized values
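
The quantization formula on this slide is missing from the transcript. A minimal sketch, assuming the usual rule that each coefficient is divided by its own step and rounded to the nearest integer, is shown below. The coefficient and step values are invented for illustration and are not taken from the slide.

```python
def quantize(y, q):
    """z_kl = round(y_kl / q_kl), one step size per coefficient."""
    # Note: Python's round() sends exact .5 ties to the even integer.
    return [
        [int(round(y_kl / q_kl)) for y_kl, q_kl in zip(y_row, q_row)]
        for y_row, q_row in zip(y, q)
    ]


def dequantize(z, q):
    """Decoder side: approximate y_kl by z_kl * q_kl."""
    return [
        [z_kl * q_kl for z_kl, q_kl in zip(z_row, q_row)]
        for z_row, q_row in zip(z, q)
    ]


# Hypothetical 2x2 corner of a coefficient block and its step sizes.
y = [[-415.4, -30.2], [4.5, -21.9]]
q = [[16, 11], [12, 12]]

z = quantize(y, q)
y_hat = dequantize(z, q)
```

The small coefficient 4.5 quantizes to zero, which is the effect the slide relies on: after quantization many samples vanish, and the loss is the difference between `y` and `y_hat`.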

  22. DCT: Example • Data from lena, ‘smooth’ area. RMS error = 3.5 [Figure panels: original; DCT; DCT, quantized; reconstructed]

  23. DCT example • Data from lena, ‘busy’ area. RMS error = 7.3 [Figure panels: original; DCT; DCT, quantized; reconstructed]

  24. Overview: DCT coding • Transformation decorrelates samples • Transformed samples are quantized, quantization step depends on the coefficient. Degree of compression and loss can be changed by scaling the quantization steps • Many quantized samples are zero -> run length coding • At receiver, perform inverse DCT • Many calculations! [Table: JPEG standard quantization steps]
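
To show why runs of zeros help, here is a small run-length sketch. The (zero run, value) pair format and the sample coefficient list are assumptions made for illustration; this is not the exact symbol format of JPEG or MPEG.

```python
def run_length_encode(coeffs):
    """Encode as (zeros_before_value, value) pairs plus a trailing-zero count."""
    pairs = []
    run = 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs, run


def run_length_decode(pairs, trailing_zeros):
    """Inverse of run_length_encode."""
    coeffs = []
    for run, value in pairs:
        coeffs.extend([0] * run)
        coeffs.append(value)
    coeffs.extend([0] * trailing_zeros)
    return coeffs


# Hypothetical quantized coefficients, already arranged in scan order.
quantized = [-26, -3, 0, -2, 0, 0, 1, 0, 0, 0]
pairs, trailing = run_length_encode(quantized)
restored = run_length_decode(pairs, trailing)
```

Ten coefficients collapse to four pairs and one count, and the step is lossless: decoding returns the original list.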

  25. Speeding up the DCT • Separable transform - basic speedup • Fast DCT transform - like FFT • Further speedup through Scaled DCT

  26. Optimized (fast) DCT • 1-D Chen DCT diagram. Dashed lines indicate subtraction; separate symbols mark multiplication by a constant and multiplication by 0.5 (shift). [Table: characteristics of optimized DCT algorithms]

  27. DCT Complexity • Direct DCT computation: • 64 DCT values, each requires 64 multiplications & additions -> 4096 multiply-accumulate (MA) operations per block • Separable algorithm (operate on rows, then on columns) -> 16 one-dimensional 8-point DCT operations -> 1024 MA operations • Fast implementation ~ $N \log_2 N$ = 24 operations per one-dimensional DCT ~ 16 x 24 = 384 MA ops • Special methods ~ many operations involve multiplication by 1 or -1, take advantage of this!

  28. Fast Scaled DCT • Picture of a butterfly at last stage of DCT + following quantizer

  29. DCT refinements • Multiply-accumulate architectures • Basic operation is a = bc + d, well suited for DCT • Super-scalar architectures • Multi-register, multi-ALU processors • Perform several operations in parallel [Table: complexity of scaled DCT algorithms, excluding quantization]

  30. Motion Estimation • Architecture of Motion Estimation • Algorithms and Costs • Full Search • Logarithmic Search • PHODS • Downsample, projection • Hierarchical motion estimation • Other criteria • Multi-image estimation

  31. Baseline Models • Previous frame predicts current frame • I(x, y, t) = I(x, y, t-1) + e(x, y, t) • Not effective in presence of motion ~ zoom, pan, etc. • Prediction to account for motion: • I(x, y, t) = I(x+u, y+v, t-1) + e(x, y, t) • (u, v) — motion (displacement) vector • Model works (somewhat) for pan, not for other motion • Compromise: Compute independent motion estimates for rectangular image regions — macroblocks. • Macroblocks are, in general, bigger than DCT blocks
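
The two prediction models on this slide can be compared on a toy example. The sketch below assumes images stored as lists of rows (so pixel (x, y) is `image[y][x]`); the frames, block position, and function names are hypothetical.

```python
def predict_block(reference, x, y, u, v, width, height):
    """Prediction I(x+u, y+v, t-1) for a block whose origin is (x, y)."""
    return [
        [reference[y + v + r][x + u + c] for c in range(width)]
        for r in range(height)
    ]


def block_residual(current, reference, x, y, u, v, width, height):
    """Error e(x, y, t) = I(x, y, t) - I(x+u, y+v, t-1) over the block."""
    predicted = predict_block(reference, x, y, u, v, width, height)
    return [
        [current[y + r][x + c] - predicted[r][c] for c in range(width)]
        for r in range(height)
    ]


# Hypothetical frames: a 2x2 bright patch moves one pixel to the right.
reference = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0, 0],
    [0, 9, 9, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
current = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 9, 9, 0, 0],
    [0, 0, 9, 9, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

# Block origin (x, y) = (2, 1), size 2x2.
no_motion = block_residual(current, reference, 2, 1, 0, 0, 2, 2)
with_motion = block_residual(current, reference, 2, 1, -1, 0, 2, 2)
```

With (u, v) = (0, 0) the residual still carries the patch edge; with (u, v) = (-1, 0) it is entirely zero, so only the motion vector needs to be sent for this block.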

  32. Generic Encoder - simplified

  33. Generic Decoder

  34. Motion Compensation • Macroblock size • MxN • Matching criterion • MAE (mean absolute error) • Search window • ±p pixel locations • Search algorithm • Full search • Logarithmic search • Parallel Hierarchical One-Dimensional Search • Pixel subsampling and projection • Hierarchical downsampling

  35. Motion Estimation Terminology • Issues: • Size of macroblock • Size of search region • In video coding standards, M = N = 16

  36. Matching Criterion • Matching criterion: what produces the fewest coded bits for the error image • Coding for each value of motion vector (u, v) is too time consuming (expensive) • In practice, mean absolute error (MAE) is most popular • C - current image, R - reference image, (x, y) - macroblock origin
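
The MAE formula itself is not in this transcript. Using the slide's symbols (C, R, macroblock origin (x, y)) and an M x N macroblock with candidate displacement (i, j), the usual definition, given here as an assumption about what the slide showed, is:

```latex
\mathrm{MAE}(i, j) = \frac{1}{MN} \sum_{k=0}^{M-1} \sum_{l=0}^{N-1}
\bigl| C(x+k,\, y+l) - R(x+i+k,\, y+j+l) \bigr|,
\qquad -p \le i, j \le p
```

The motion vector is the displacement (i, j) that minimizes this value over the search window.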

  37. Full-Search Method • Compute MAE for $(2p+1)^2$ values of (i, j). • Each location requires 3MN operations • Picture dimensions IxJ, F pictures per second • $3IJF(2p+1)^2$ operations per second • I = 720, J = 480, F = 30, p = 15 -> 30 GOPS • Guaranteed to find best (MAE) displacement • How to do it? • Special computers • Smaller p • Faster (suboptimal) algorithm
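
A minimal full-search sketch follows. It assumes images stored as lists of rows, skips candidate positions that fall outside the reference image (a boundary policy chosen here for simplicity), and uses made-up frames; the names `mae`, `full_search`, and `make_frame` are illustrative.

```python
def mae(current, reference, x, y, i, j, m, n):
    """Mean absolute error between the current block and a displaced reference block."""
    total = 0
    for l in range(n):
        for k in range(m):
            total += abs(current[y + l][x + k] - reference[y + j + l][x + i + k])
    return total / (m * n)


def full_search(current, reference, x, y, m, n, p):
    """Try every displacement in [-p, p] x [-p, p]; return (best_mae, i, j)."""
    height, width = len(reference), len(reference[0])
    best = None
    for j in range(-p, p + 1):
        for i in range(-p, p + 1):
            inside = (
                0 <= x + i and x + i + m <= width
                and 0 <= y + j and y + j + n <= height
            )
            if not inside:
                continue  # assumed policy: ignore out-of-image candidates
            cost = mae(current, reference, x, y, i, j, m, n)
            if best is None or cost < best[0]:
                best = (cost, i, j)
    return best


def make_frame(size, row, col):
    """Hypothetical test frame: black with a two-pixel bright feature."""
    frame = [[0] * size for _ in range(size)]
    frame[row][col] = 100
    frame[row][col + 1] = 50
    return frame


reference = make_frame(12, 5, 6)
current = make_frame(12, 6, 8)  # feature moved 2 right, 1 down

# 4x4 block at origin (6, 4) in the current frame, search range p = 3.
best_cost, best_i, best_j = full_search(current, reference, 6, 4, 4, 4, 3)
```

The search finds (i, j) = (-2, -1): the block's content sits two pixels to the left and one pixel up in the reference frame.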

  38. Logarithmic Search (1D) • Goal: find minimum over u in [-p, p] • First step: evaluate at -p/2, 0, p/2 (interval ~ p) • Next step: choose interval of length p/2 around minimum (2 more evaluations) • Continue until interval length is equal to 2. This takes k = ceiling($\log_2 p$) iterations • Example p = 7 • Evaluate at -4, 0, 4 -> minimum at -4 • Evaluate at -6, (-4), -2 -> minimum at -2 • Evaluate at -3, (-2), -1 -> minimum at -3. Done!
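
The p = 7 example can be reproduced with a short sketch. It assumes the first step is the power of two 2^(k-1) with k = ceiling(log2 p), that the step is halved each iteration, and that candidates outside [-p, p] are dropped; the cost function is invented so that the search follows the same path as the slide.

```python
import math


def log_search_1d(cost, p):
    """Three-point search with a halving step; returns (best_u, trace, evaluations)."""
    k = max(1, math.ceil(math.log2(p)))
    step = 2 ** (k - 1)  # p = 7 gives k = 3 and a first step of 4
    center = 0
    evaluated = {}
    trace = []
    while step >= 1:
        candidates = [
            u for u in (center - step, center, center + step) if -p <= u <= p
        ]
        for u in candidates:
            if u not in evaluated:  # the center is reused, not recomputed
                evaluated[u] = cost(u)
        center = min(candidates, key=lambda u: evaluated[u])
        trace.append((candidates, center))
        step //= 2
    return center, trace, len(evaluated)


def example_cost(u):
    """Hypothetical matching error with its minimum near u = -3."""
    return abs(u + 2.8)


best_u, trace, evaluations = log_search_1d(example_cost, 7)
```

The search takes three iterations and seven cost evaluations instead of the fifteen a full 1D search would need. It only finds the true minimum when the cost decreases steadily toward it, which is why the method can fail on real images.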

  39. Logarithmic Search - 2D • First stage requires 3x3 = 9 evaluations • Subsequent stages require 8 evaluations • k = ceiling($\log_2 p$) stages (iterations) • Rate = 3IJF(8k+1) • p = 15, I = 720, J = 480, F = 30 -> 1 GOPS • Can fail to find minimum • Bottom line: Faster method, more error than full search
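
The rate figures quoted for full search and logarithmic search can be checked directly from the two formulas, 3IJF(2p+1)^2 and 3IJF(8k+1). The function names below are illustrative; the numbers are the ones used on the slides.

```python
import math


def full_search_rate(i, j, f, p):
    """Operations per second for full search: 3 * I * J * F * (2p + 1)^2."""
    return 3 * i * j * f * (2 * p + 1) ** 2


def log_search_rate(i, j, f, p):
    """Operations per second for 2D logarithmic search: 3 * I * J * F * (8k + 1)."""
    k = math.ceil(math.log2(p))
    return 3 * i * j * f * (8 * k + 1)


full_rate = full_search_rate(720, 480, 30, 15)  # about 30 GOPS
log_rate = log_search_rate(720, 480, 30, 15)    # about 1 GOPS
speedup = full_rate / log_rate
```

For p = 15 there are k = 4 stages, so 33 candidate positions replace 961, a reduction of roughly 29 times.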

  40. PHODS • Parallel Hierarchical One-Dimensional Search • ~Twice as fast as logarithmic • Less reliable [Figure: horizontal (Min H) and vertical (Min V) one-dimensional searches; 1st step blue, 2nd green, 3rd red]

  41. Other Fast Methods • Subsample (do not use all points in macroblock) • Projection: Row and column projection of pixels, follow with 1-D search • Hierarchical motion estimation • Downsample reference image and current image • Perform low resolution search • Refine

  42. Hierarchical Search • Prepare downsampled versions of current and reference images • Full macroblock 16x16 • Down 2 macroblock 8x8 • Down 4 macroblock 4x4 • Full search in Down 4 reference image • 16x speedup, smaller macroblock • 16x speedup, fewer displacement vectors • p = ±16, p’ = ±4 • Around point of best match, do local search in Down 2 reference image (3x3 search zone) • Repeat for Full reference image (3x3 search zone) [Figure: Full, Down 2, and Down 4 images]
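
A rough per-macroblock operation count for the three-level scheme is sketched below. It assumes 3 operations per pixel per candidate position (as in the full-search slide), counts every position in the window exactly, and ignores the cost of building the downsampled images. The slide's two 16x factors are round figures; exact counts come out slightly different.

```python
OPS_PER_PIXEL = 3  # per pixel per candidate position, as in the full-search slide


def search_ops(block_side, positions):
    """Operations to test `positions` candidates for a square block."""
    return OPS_PER_PIXEL * block_side * block_side * positions


# Single-level full search: 16x16 macroblock, p = 16.
full_ops = search_ops(16, (2 * 16 + 1) ** 2)

# Three-level hierarchy: (block side, number of candidate positions).
levels = [
    (4, (2 * 4 + 1) ** 2),  # Down 4: full search with p' = 4
    (8, 3 * 3),             # Down 2: 3x3 refinement
    (16, 3 * 3),            # Full:   3x3 refinement
]
hierarchical_ops = sum(search_ops(side, count) for side, count in levels)
ratio = full_ops / hierarchical_ops
```

Under these assumptions the hierarchy needs about 12.5 thousand operations per macroblock against about 836 thousand for full search, and most of the remaining work is the final 3x3 refinement at full resolution.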

  43. Motion Estimation Methods [Figure panels: no compensation, full search, logarithmic search, 3-level hierarchical]

  44. Comparison

  45. More Speedup • Simpler comparison criteria • Binarize difference, count pixels that do not match • PDC (Pixel Difference Classification) • Binarize current and reference • BPROP (count matching pixels) • DPC (count different pixels) • BMP (operations done on bitplanes) • Produce 3-25 fold speedup
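
The first idea on this slide, binarizing the difference and counting pixels that do not match, can be sketched as follows. The threshold value and the sample blocks are assumptions for illustration; published PDC variants differ in detail.

```python
def mismatch_count(current_block, reference_block, threshold):
    """Count pixels whose absolute difference exceeds the threshold."""
    return sum(
        1
        for cur_row, ref_row in zip(current_block, reference_block)
        for c, r in zip(cur_row, ref_row)
        if abs(c - r) > threshold
    )


# Hypothetical 2x2 blocks; absolute differences are 1, 28, 2, 0.
current_block = [[10, 12], [200, 14]]
reference_block = [[11, 40], [198, 14]]

mismatches = mismatch_count(current_block, reference_block, threshold=4)
```

Each pixel contributes a single bit to the criterion, so the accumulation is a simple counter rather than a sum of absolute differences.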

  46. Big Picture on Speedup • Speedup methods are less accurate • Same Bit Rate, lower SNR • Same SNR, higher bit rate • Binary criteria lose about 0.5 dB • Suppose we have adequate computing power? Can we do better? • Sub-pixel motion estimation • First find best match with pixel accuracy in displacement vectors • Interpolate images for half-pixel shifts
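
Half-pixel shifts require sample values between the original pixels. The sketch below assumes bilinear interpolation, which is one common choice; the slide does not say which interpolator is intended. The function name and the 2x2 test image are illustrative.

```python
def interpolate_half_pixel(image):
    """Return a (2H-1) x (2W-1) grid: original pixels plus half-pixel positions."""
    h, w = len(image), len(image[0])
    out = [[0.0] * (2 * w - 1) for _ in range(2 * h - 1)]
    for r in range(2 * h - 1):
        for c in range(2 * w - 1):
            r0, c0 = r // 2, c // 2
            r1, c1 = r0 + r % 2, c0 + c % 2  # next neighbor only at odd positions
            out[r][c] = (
                image[r0][c0] + image[r0][c1] + image[r1][c0] + image[r1][c1]
            ) / 4.0
    return out


# Hypothetical 2x2 image.
fine_grid = interpolate_half_pixel([[0, 4], [8, 12]])
```

Even grid positions reproduce the original pixels, positions between two pixels hold their average, and the center holds the average of all four. A block match on this finer grid gives displacement vectors in half-pixel units.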

  47. Multipicture Motion Estimation • Estimate on basis of past and future • Non-sequential image transmission • More chances to find good match • More calculations

  48. Video Compression - Summary • Video — sequence of images • Can use intraframe compression • Motion JPEG • Interframe compression offers great potential for savings • No motion compensation — lower compression • Motion compensation — greater compression • All video standards provide for motion compensation • Compensation done on macroblocks, multiple motion vectors per image • Tradeoff between computing requirement and image quality
