Data Compression

Data compression: the process of transforming data into a smaller representation from which the original, or some approximation of the original, can be computed at a later time.

Types of compression techniques:
• Lossless: reconstructed data is identical to the original data.
  • Example techniques: Huffman coding, arithmetic coding, Shannon-Fano coding
• Lossy: reconstructed data is an approximation of the original.
  • Example techniques: linear prediction, transform coding

Data Compression
• Image Compression
  • Transform Coding
  • JPEG
• Video Compression
  • Motion-compensated prediction
  • MPEG

Shannon-Fano Coding
• Divide the set of symbols into two subsets of roughly equal probability.
• Assign a binary 0 to the first subset and a binary 1 to the second.
• Repeat recursively until each subset contains just one symbol.

Symbol  Probability
a       0.2
e       0.3
i       0.1
o       0.2
u       0.1
!       0.1

Ordered list   Grouping (subset probabilities)   Code
e  0.3         0.5   0.3                         00
a  0.2         0.5   0.2                         01
o  0.2         0.5   0.3   0.2                   100
i  0.1         0.5   0.3   0.1                   101
u  0.1         0.5   0.2   0.1                   110
!  0.1         0.5   0.2   0.1                   111
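The recursive split can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the function name `shannon_fano` is hypothetical, and the tie-breaking rule (when two cut points balance the subsets equally well, take the later one) is an assumption chosen because it reproduces the codes in the table above.

```python
def shannon_fano(probs):
    """Return {symbol: code} for a dict of symbol probabilities."""
    # Sort by decreasing probability; the sort is stable, so ties keep input order.
    items = sorted(probs.items(), key=lambda kv: -kv[1])
    codes = {sym: "" for sym, _ in items}

    def split(group):
        if len(group) < 2:
            return
        total = sum(p for _, p in group)
        best_cut, best_diff, running = 1, float("inf"), 0.0
        for k in range(1, len(group)):
            running += group[k - 1][1]
            # Difference between the two subset probabilities for this cut.
            diff = abs(total - 2 * running)
            # Assumption: on a tie (within rounding error) prefer the later cut.
            if diff <= best_diff + 1e-12:
                best_cut, best_diff = k, diff
        first, second = group[:best_cut], group[best_cut:]
        for sym, _ in first:
            codes[sym] += "0"   # first subset gets a binary 0
        for sym, _ in second:
            codes[sym] += "1"   # second subset gets a binary 1
        split(first)
        split(second)

    split(items)
    return codes


table = {"a": 0.2, "e": 0.3, "i": 0.1, "o": 0.2, "u": 0.1, "!": 0.1}
for symbol, code in shannon_fano(table).items():
    print(symbol, code)
```

With a different tie-breaking rule the subsets {o} and {i, u, !} would be equally balanced, so the codes are not unique.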

Huffman Coding
1. Rank all symbols in order of probability of occurrence.
2. Locate the two symbols with the smallest probabilities.
3. Replace these two symbols by a new composite symbol whose probability is the sum of the individual probabilities.
4. Create a node with these two symbols as its children.
5. Repeat steps 1 through 4 until only one symbol remains. Eventually, we have a tree in which each node's probability is the sum of the probabilities of all leaf nodes beneath it.
6. Traverse the tree from the root to each leaf, recording 0 for a left branch and 1 for a right branch.

Huffman Coding Property
No code forms a prefix of any other.

Example: A = {u, x, y, z}, with Pr(u) = 1/7, Pr(x) = 3/7, Pr(y) = 2/7, Pr(z) = 1/7.
• Merge u and z: Pr(uz) = 2/7.
• Merge uz and y: Pr(uzy) = 4/7.
• Merge uzy and x to form the root.

Resulting codes: x = 0, y = 10, u = 110, z = 111.
The sequence x x u y x z y is encoded as 0 0 110 10 0 111 10.

Symbol  Prob   Fixed length       Shannon-Fano       Huffman
               code   p*length    code   p*length    code    p*length
a       0.2    000    0.6         01     0.4         10      0.4
e       0.3    001    0.9         00     0.6         0       0.3
i       0.1    010    0.3         101    0.3         1110    0.4
o       0.2    011    0.6         100    0.6         110     0.6
u       0.1    100    0.3         110    0.3         11110   0.5
!       0.1    101    0.3         111    0.3         11111   0.5
Total                 3.0                2.5                 2.7
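The merging procedure can be sketched with a priority queue. This is an illustrative sketch: `huffman_code` and `average_length` are hypothetical names, and which child receives 0 or 1 is an arbitrary choice, so only the code lengths (not the exact bit patterns) are determined. For the six-symbol distribution in the comparison table, repeatedly merging the two least probable symbols gives an average of 2.5 bits per symbol, the same as the Shannon-Fano column; the 2.7 listed in the Huffman column belongs to the particular code shown there.

```python
import heapq
from itertools import count


def huffman_code(probs):
    """Return {symbol: code} built by repeatedly merging the two least probable nodes."""
    tie = count()  # unique counter so the heap never has to compare dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, left = heapq.heappop(heap)    # smallest probability
        p1, _, right = heapq.heappop(heap)   # second smallest
        # Prefix 0 for the left subtree and 1 for the right subtree.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]


def average_length(probs, codes):
    """Expected number of bits per symbol."""
    return sum(p * len(codes[s]) for s, p in probs.items())


probs = {"a": 0.2, "e": 0.3, "i": 0.1, "o": 0.2, "u": 0.1, "!": 0.1}
codes = huffman_code(probs)
print(codes, round(average_length(probs, codes), 3))
```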

Transform Coding

SPATIAL DOMAIN
Height   65   56   80   40   69
Weight  170  130  203   80  148
[Figure: scatter plot of weight against height; the points lie close to the line W = 2.5H, at an angle of arctan 2.5 = 68.19 degrees to the height axis.]

FREQUENCY DOMAIN (after rotating the axes by 68.19 degrees)
Height  181  141  218   89  163
Weight    3   -4    1   -7   -9   (very small numbers)

FREQUENCY DOMAIN (as stored)
Height  181  141  218   89  163
Weight    0    0    0    0    0   (do not store these numbers)

RECONSTRUCTED DATA
Height   68   53   81   34   61
Weight  169  131  203   84  151
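The height/weight example can be traced numerically. The sketch below assumes the transform is a plain rotation of the axes by arctan 2.5, keeps only the first rotated coordinate, and rotates back; the function names are hypothetical. The values it produces agree with the slide to within a unit or so, since the slide's rounding convention is not stated.

```python
import math

heights = [65, 56, 80, 40, 69]
weights = [170, 130, 203, 80, 148]
theta = math.atan(2.5)  # angle of the line W = 2.5H, about 68.19 degrees


def forward(h, w, angle=theta):
    """Rotate (h, w) so that the first coordinate lies along the line W = 2.5H."""
    c, s = math.cos(angle), math.sin(angle)
    return h * c + w * s, -h * s + w * c


def inverse(a, b, angle=theta):
    """Undo the rotation."""
    c, s = math.cos(angle), math.sin(angle)
    return a * c - b * s, a * s + b * c


def compress(points):
    """Keep only the large first coordinate of each rotated point."""
    return [forward(h, w)[0] for h, w in points]


def reconstruct(stored):
    """Treat the discarded second coordinate as zero and rotate back."""
    return [inverse(a, 0.0) for a in stored]


points = list(zip(heights, weights))
for (h, w), (rh, rw) in zip(points, reconstruct(compress(points))):
    print((h, w), "->", (round(rh), round(rw)))
```

Five numbers are stored instead of ten, at the cost of a small reconstruction error.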

Multimedia Objects Are Huge
Consider a video sequence:
• Frame size is pixels.
• 16 bits per pixel.
• Playback rate is 30 fps.
• Video length is five minutes.
• Transmission rate is:
• Storage requirement is:
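The arithmetic behind this slide can be sketched as below. The frame size did not survive in the text, so the 640 x 480 default is purely an assumption for illustration; the other figures (16 bits per pixel, 30 fps, five minutes) are from the slide.

```python
def video_requirements(width=640, height=480, bits_per_pixel=16, fps=30, seconds=5 * 60):
    """Return (bits per second, total bytes) for uncompressed video.

    width and height are assumed values; the slide's frame size is missing.
    """
    bits_per_frame = width * height * bits_per_pixel
    bits_per_second = bits_per_frame * fps
    total_bytes = bits_per_second * seconds // 8
    return bits_per_second, total_bytes


rate, storage = video_requirements()
print(f"transmission rate: {rate / 1e6:.1f} Mbit/s")
print(f"storage requirement: {storage / 1e9:.2f} GB")
```

Under the assumed frame size this comes to roughly 147 Mbit/s and about 5.5 GB for five minutes.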

Audio Compression
• Nonlinear PCM
• Differential PCM
• Adaptive Differential PCM
• Linear Prediction Coding
• Subband Coding
• Human Speech Compression

Checkpoint
• Progressive Compression
• Logarithmic Compression
  • µ-Law Compression
  • A-Law Compression
• ADPCM Compression
• Transform-based Audio Compression

Differential Pulse Code Modulation (DPCM)
• Take advantage of slow changes in the speech waveform.
• Most of the samples are concentrated in the frequency range 75 through 400 Hz.
• Represent these waveforms by the differences in the PCM values.
  Example: (17, 28, 30) (17, 24, 26) (17, 24, 30)
• One can use quantization with DPCM to further reduce the number of bits.

• To bound quantization errors, framing or dithering can be used.
  • Framing: samples are allocated to fixed-size frames. The first sample of a frame is encoded as PCM; subsequent samples in the same frame are encoded using DPCM.
  • Dithering: the accumulated error, called the dither value, is added to the sample value before it is quantized.
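The effect of feeding the accumulated error back into the encoder can be sketched as follows. One reading of the three triples on the DPCM slide, offered as an assumption, is: original samples (17, 28, 30), the reconstruction when quantization errors are left to accumulate (17, 24, 26), and the reconstruction when the error is compensated (17, 24, 30). The sketch reproduces those numbers if differences are limited to the 4-bit range -8..7; that range and the function names are assumptions, not taken from the slides.

```python
def clamp(diff, lo=-8, hi=7):
    """Quantize a difference by limiting it to an assumed 4-bit signed range."""
    return max(lo, min(hi, diff))


def dpcm_without_feedback(samples):
    """Quantize differences between original samples; errors accumulate."""
    out = [samples[0]]                       # first sample sent as plain PCM
    for prev, cur in zip(samples, samples[1:]):
        out.append(out[-1] + clamp(cur - prev))
    return out


def dpcm_with_feedback(samples):
    """Quantize the difference to the decoder's last value, absorbing past error."""
    out = [samples[0]]
    for cur in samples[1:]:
        out.append(out[-1] + clamp(cur - out[-1]))
    return out


original = [17, 28, 30]
print(dpcm_without_feedback(original))
print(dpcm_with_feedback(original))
```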

Adaptive Differential Pulse Code Modulation (ADPCM)
• The quantization step size between adjacent samples varies: if the waveform is changing rapidly, large quantization steps are used; if it is changing slowly, small steps are used.
• Subband ADPCM: the speech signal is divided into two subbands (low- and high-frequency bands), and each is coded using ADPCM.

Transform-Based Audio Compression
• Step 1: Convert samples from the time domain to the frequency domain using a linear transformation. Two popular transformations are:
  • Discrete Cosine Transform (DCT)
  • Wavelets
• Step 2: Divide the frequencies into many bands, relying on the logarithmic sensitivity of the human ear to changes in frequency.
• Step 3: Quantize the samples in each band separately.

Linear Prediction Coding
[Figure: current value, predicted value, and next value; the error between the predicted and the next value is transmitted.]
• The input signal is sampled at regular intervals.
• Both the encoder and the decoder predict what the next sample will be. They predict the same value.
• The encoder sends the decoder only the difference between the correct value and the predicted value.

• The decoder predicts the value of the sample from the previous samples and adds the difference received from the encoder.
• Differential coding is a special case of linear prediction: the previous value is used as the prediction of the next value.
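The encoder/decoder symmetry can be sketched as below. The predictor used here, a straight-line extrapolation from the last two samples, is only an assumed example of a linear predictor; the function names are hypothetical. What matters is that both sides call the same `predict` on the same history.

```python
def predict(history):
    """Assumed predictor: extrapolate linearly from the last two samples."""
    if len(history) >= 2:
        return 2 * history[-1] - history[-2]
    if history:
        return history[-1]      # with one sample this is plain differential coding
    return 0


def lpc_encode(samples):
    """Send only the error between each sample and its prediction."""
    history, errors = [], []
    for s in samples:
        errors.append(s - predict(history))
        history.append(s)
    return errors


def lpc_decode(errors):
    """Rebuild each sample as prediction plus received error."""
    history = []
    for e in errors:
        history.append(predict(history) + e)
    return history


signal = [10, 12, 14, 16, 19]
errors = lpc_encode(signal)
print(errors)                       # small values when the prediction is good
print(lpc_decode(errors) == signal)
```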

Comparison of Popular Audio Compression Strategies

Technique              Expected compression ratio   Comments
Logarithmic encoding   2:1
ADPCM                  4:1                          Best with slow-moving content (speech)
LPC-based              12:1                         Speech specific, expensive
Transform-based        8:1                          Extremely expensive

Logarithmic encoding: µ-law and A-law compression.

JPEG Methodology
• Discrete Cosine Transform: removes data redundancy by transforming data from a spatial representation (spatial domain) to a spectral representation (frequency domain).

• Quantizer: reduces the precision of the integers, thereby reducing the number of bits required to store the data.
• Entropy encoder: stores the quantized data more compactly based on their spatial characteristics (e.g., stores a run length instead of 15 zeros).

Discrete Cosine Transform (DCT)
Apply the DCT to each 8x8 block.

Input matrix (pixel values):
132 136 138 140 144 145 147 155
136 140 140 147 140 148 155 156
140 143 144 148 150 152 154 155
144 144 146 145 149 150 153 160
150 152 155 156 150 145 144 140
144 145 146 148 143 158 150 140
150 156 157 156 140 146 156 145
148 145 146 148 156 160 140 145

Subtract 2^(p-1) from each pixel value to create Spixel, where p is the number of bits used to represent each pixel.

Output matrix after the DCT (the top-left entry is the DC coefficient; all others are AC coefficients):
172  21  -9 -10  -8   4   4   0
-18 -34  -8   6  -2  -2  -3  -8
 15  24  -4  -5  -3  -4  -4  -4
 -8  -8   6   4   5   6   5   3
 23 -10  -5  -4  -3  -4   6   2
 -9  11   4   4   3   4   3   1
-14  14   3   2   4   2   1   4
 19   7  -1   1   6  -1   1   0
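A direct, unoptimized 2-D DCT can be sketched as follows. It assumes the orthonormal DCT-II scaling commonly used for 8x8 JPEG blocks; the function names are hypothetical, and real codecs use fast factorizations rather than this four-level loop. With this scaling the DC coefficient of an 8x8 block equals the sum of the level-shifted samples divided by 8.

```python
import math


def level_shift(block, bits=8):
    """Subtract 2^(p-1) from every pixel, where p is the bit depth."""
    offset = 2 ** (bits - 1)
    return [[pixel - offset for pixel in row] for row in block]


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an n x n block, computed directly."""
    n = len(block)

    def scale(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * total
    return out


# A flat block has all of its energy in the DC coefficient.
flat = [[138] * 8 for _ in range(8)]
coeffs = dct_2d(level_shift(flat))
print(round(coeffs[0][0], 6))
```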

DCT Coefficients: Property
• The DC coefficient is some multiple of the average value in the block.
  • It determines the basic color of the data unit.
• The lower-frequency coefficients in the top-left corner of the table have larger values than the higher-frequency coefficients.
  • This is generally the case, except when there is substantial activity in the image block.

JPEG: Quantization

DCT coefficients:
172  21  -9 -10  -8   4   4   0
-18 -34  -8   6  -2  -2  -3  -8
 15  24  -4  -5  -3  -4  -4  -4
 -8  -8   6   4   5   6   5   3
 23 -10  -5  -4  -3  -4   6   2
 -9  11   4   4   3   4   3   1
-14  14   3   2   4   2   1   4
 19   7  -1   1   6  -1   1   0

Quantum matrix:
 4  7 10 13 16 19 22 25
 7 10 13 16 19 22 25 28
10 13 16 19 22 25 28 31
13 16 19 22 25 28 31 34
16 19 22 25 28 31 34 37
19 22 25 28 31 34 37 40
22 25 28 31 34 37 40 43
25 28 31 34 37 40 43 46

DCT coefficients after quantization:
43 3 1 1 0 0 0 0 3 2 0 0 0 0 0 3 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
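Quantization divides each coefficient by its step size and rounds the result; dequantization multiplies back. The sketch below applies this to the top row of the coefficient matrix, assuming the step-size row that begins with 4 is the top row of the quantum matrix (consistent with 172 / 4 = 43). The function names are hypothetical, and Python's `round` rounds exact halves to the nearest even integer, which may differ from a given codec's rounding.

```python
def quantize(coefficients, steps):
    """Divide each coefficient by its step size and round to an integer."""
    return [round(c / q) for c, q in zip(coefficients, steps)]


def dequantize(levels, steps):
    """Multiply each quantized level by its step size."""
    return [level * q for level, q in zip(levels, steps)]


top_row = [172, 21, -9, -10, -8, 4, 4, 0]
step_row = [4, 7, 10, 13, 16, 19, 22, 25]

levels = quantize(top_row, step_row)
print(levels)                        # [43, 3, -1, -1, 0, 0, 0, 0]
print(dequantize(levels, step_row))  # [172, 21, -10, -13, 0, 0, 0, 0]
```

The higher-frequency entries collapse to zero, and the restored row differs from the original only in coefficients that had large step sizes.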

JPEG: Step Size
• The quantum matrix contains quantum values, which are also called step sizes.
• The relative size of the step sizes is chosen according to how errors in these coefficients will be perceived by the human visual system.

• Quantization errors in the DC and lower-frequency AC coefficients are more easily detectable than quantization errors in the higher-frequency AC coefficients.
• We use larger step sizes for perceptually less important coefficients.
• Applications may specify values that optimize the desired quality according to the particular image characteristics.

Encoding DC Coefficients
• The DC coefficient is some multiple of the average value in the block.
• The average pixel value in any block will not differ substantially from the average value in the neighboring block.
• DC coefficient values of neighboring blocks will therefore be quite close.
• It makes sense to encode the difference between the DC coefficients of neighboring blocks rather than the DC coefficients themselves.

• The DC coefficient is encoded as the difference between the current DC coefficient and the previous one.
• The codeword has two fields:
  • The number of bits used to encode the difference.
  • The value of the difference.
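The two-field codeword can be sketched as a (size, difference) pair per block. The sketch assumes the predictor starts at 0 for the first block and takes the size to be the number of bits in the magnitude of the difference; the function name and the sample DC values are hypothetical, and the entropy coding of the size field is omitted.

```python
def encode_dc(dc_values):
    """Return (size, difference) pairs for a sequence of quantized DC coefficients."""
    pairs, previous = [], 0          # assumption: prediction starts at 0
    for dc in dc_values:
        diff = dc - previous
        size = abs(diff).bit_length()   # bits needed for the magnitude
        pairs.append((size, diff))
        previous = dc
    return pairs


# Neighboring blocks have similar averages, so most differences are small.
print(encode_dc([43, 45, 44, 44]))   # [(6, 43), (2, 2), (1, -1), (0, 0)]
```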

Encoding AC Coefficients
• AC coefficients are encoded in zig-zag order.
• The codeword for a non-zero AC coefficient has three fields:
  • The number of bits used for the representation of the AC coefficient.
  • The value of the AC coefficient.
  • The number of zero AC coefficients that precede it in zig-zag order (the run length).
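The zig-zag scan and the three fields can be sketched as follows. The sketch counts the zeros that come before each non-zero value, which is how baseline JPEG pairs runs with values, and simply drops the trailing zeros where a real encoder would emit an end-of-block marker. The function names and the small test block are hypothetical.

```python
def zigzag_indices(n=8):
    """Return (row, col) pairs of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col along each anti-diagonal
        diagonal = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # Even diagonals run from bottom-left to top-right, odd ones the other way.
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order


def encode_ac(block):
    """Return (run of zeros, size in bits, value) for each non-zero AC coefficient."""
    scan = [block[i][j] for i, j in zigzag_indices(len(block))][1:]  # skip the DC term
    triples, run = [], 0
    for value in scan:
        if value == 0:
            run += 1
        else:
            triples.append((run, abs(value).bit_length(), value))
            run = 0
    return triples                             # trailing zeros are dropped


block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[1][1] = 43, 3, -3, 2
print(encode_ac(block))   # [(0, 2, 3), (0, 2, -3), (1, 2, 2)]
```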

JPEG: Decoding
JPEG is a symmetrical method: decompression is the exact reverse of compression.
• Entropy decoding.
• Reverse quantization.
• Take the inverse DCT.
• Add 2^(p-1) to each pixel value.

JPEG: File Format

JPEG Examples
[Figure: the same image stored as BMP (57 KB), JPG (6 KB), PNG (15 KB), and GIF (16 KB).]

Video Compression
• In most video sequences there is little change in the contents of the image from one frame to the next.
• A good video compression technique should take advantage of this temporal correlation to remove redundancy.

Compression Algorithm   Compression Ratio   Comments
Intel RTV/Indeo         3:1
Intel PLV               12:1                Vector quantization & motion compensation
IBM PhotoMotion         3:1                 RLE & frame differencing
MJPEG                   10:1                2D DCT & no frame differencing
Fractals                10:1                Good for natural scenes. Requires high CPU power.
Wavelets                20:1
H.261/H.263             50:1
MPEG                    30:1

Frame differencing: assume that the average motion is small and compress the pixel-by-pixel difference between two frames. Low compression ratio: 3:1.

Can you find any differences?

Motion Compensation and Estimation [Furht97]
• Motion estimation: the process of determining the values of the motion vectors for each frame during encoding.
• Motion estimation techniques are categorized into four main groups:
  • gradient techniques
  • pel-recursive techniques
  • block matching
  • frequency-domain techniques

• Motion compensation: the use of motion vectors to provide offsets into past and/or future reference frames containing previously decoded pixels, which are used to form the prediction and the error difference.
• Types of motion compensation:
  • Forward compensation: coding the current frame using the previous frame as a reference frame.
  • Backward compensation: coding the current frame using a future frame as a reference frame.

Block-Matching Motion Estimation
• Each square is a macroblock (MB).
• An MB covers 16x16 pixels.
• An MB has 4 Y + Cb + Cr blocks: blocks 0 to 3 are Y, block 4 is Cb, and block 5 is Cr.
• Transmit the motion vector. The MB is encoded individually.

Complexity of Motion Estimation
• Search algorithm
• Cost function, which maps the pixel differences to the real numbers
• Search range parameter

Notation:
• F: an m x n MB in the current frame
• G: the search area in the previous frame, of size (m+2p) x (n+2p)
• p: search range parameter
• m = n = 16, p = 6: a typical case for MPEG

Popular cost functions:
• Mean Square Difference (MSD)
• Mean Absolute Difference (MAD)
• Etc.

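$MSD(dx, dy) = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \left[ F(i,j) - G(i+dx, j+dy) \right]^2$
$MAD(dx, dy) = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \left| F(i,j) - G(i+dx, j+dy) \right|$
Search space for both: $-p \le dx \le +p$, $-p \le dy \le +p$
• F: an m x n MB from the current frame.
• F(i,j): the pixel value at (i,j).
• G: the same MB from a reference frame.
• G(i,j): the pixel value at (i,j).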

Motion Search Algorithms
• exhaustive search
• three-step search
• 2-dimensional search
• conjugate difference search
• parallel hierarchical 1-d
• modified pixel-diff
• hierarchical motion fast search

Exhaustive Search
Complexity = (2p+1)^2; for p = 6 this is 169 comparisons.
• Each circle represents an MB to be compared.
• Compare with every MB in the search area.
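Exhaustive block matching can be sketched as below. The cost function is taken to be the mean absolute difference over the block, frames are plain lists of rows, and candidate positions that fall outside the reference frame are skipped; these choices and the function names are assumptions for illustration.

```python
import random


def mad(cur, ref, x0, y0, dx, dy, m, n):
    """Mean absolute difference between the current MB and a displaced reference MB."""
    total = 0
    for i in range(m):
        for j in range(n):
            total += abs(cur[y0 + i][x0 + j] - ref[y0 + dy + i][x0 + dx + j])
    return total / (m * n)


def exhaustive_search(cur, ref, x0, y0, m, n, p):
    """Try all (2p+1)^2 displacements and return the (dx, dy) with the lowest cost."""
    best_cost, best_vector = None, (0, 0)
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            inside = (0 <= y0 + dy and y0 + dy + m <= len(ref)
                      and 0 <= x0 + dx and x0 + dx + n <= len(ref[0]))
            if not inside:
                continue
            cost = mad(cur, ref, x0, y0, dx, dy, m, n)
            if best_cost is None or cost < best_cost:
                best_cost, best_vector = cost, (dx, dy)
    return best_vector


# Synthetic check: the current frame is the reference shifted by (2, 1).
rng = random.Random(0)
ref = [[rng.randrange(256) for _ in range(16)] for _ in range(16)]
cur = [[ref[(y + 1) % 16][(x + 2) % 16] for x in range(16)] for y in range(16)]
print(exhaustive_search(cur, ref, x0=6, y0=6, m=4, n=4, p=3))
```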

Three-step Search
Complexity = O(log_2 p)
Motion vector=
• Step 1:
  • Let the step size be p/2.
  • Consider the 8 weights at p/2 away from the origin, plus the weight of the origin.
  • Out of these nine weights, find the minimum (the best match). Its location becomes the origin of the next step.
• Step 2:
  • Reduce the step size.
  • Choose the minimum among the nine weights.
• Step 3:
  • The step size is 1.
  • Choose the minimum among the nine weights.
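The three-step search can be sketched against an arbitrary cost function. The step-size schedule used here, starting at ceil(p/2) and halving (rounded up) until it reaches 1, is an assumption, since the slide's step sizes did not survive extraction; for p = 6 it gives steps 3, 2, 1. The function name and the toy cost function are hypothetical.

```python
import math


def three_step_search(cost, p):
    """Return the (dx, dy) found by a coarse-to-fine nine-point search.

    cost(dx, dy) is any match weight, for example the MAD of a displaced MB.
    """
    cx, cy = 0, 0
    step = math.ceil(p / 2)                  # assumed initial step size
    while True:
        # The center plus its 8 neighbors at the current step size.
        candidates = [(cx + sx * step, cy + sy * step)
                      for sy in (-1, 0, 1) for sx in (-1, 0, 1)]
        candidates = [(x, y) for x, y in candidates if abs(x) <= p and abs(y) <= p]
        cx, cy = min(candidates, key=lambda v: cost(*v))
        if step == 1:
            return cx, cy
        step = math.ceil(step / 2)           # assumed reduction rule


# Toy cost with a single minimum at displacement (5, -2).
print(three_step_search(lambda dx, dy: (dx - 5) ** 2 + (dy + 2) ** 2, p=6))
```

For p = 6 this evaluates at most 27 candidate positions instead of the 169 of an exhaustive search, but it can miss the true minimum when the cost surface has several local minima.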

2-D Logarithmic Search
Complexity = O(log p)
1. Calculate the match weight at the center.
2. If its weight < threshold (t), stop.
3. Otherwise:
  • Find the weights of the four points vertically and horizontally away from the center.
  • Find the minimum weight among the five points.
  • If the weight of the center is the minimum, reduce the step size and go back to 1.
  • If one of the four points has the minimum weight and that weight is not greater than the threshold, stop. Otherwise, go back to 1.

MPEG Video Standard

Frame type   Display order   Bitstream order
I            1               1
B            2               3
B            3               4
P            4               2
B            5               6
B            6               7
P            7               5
B            8               9
B            9               10
P            10              8
B            11              12
B            12              13
I            13              11

• I frame: coded without any reference to other frames.
  • Compression rate is relatively low.

• P (predictive coded) frame: coded using motion-compensated prediction from the last I or P frame, whichever is closest.
  • High level of compression.
• B (bi-directionally predictive coded) frame: coded using motion-compensated prediction from the most recent P or I frame and the closest future P or I frame.
  • Very high level of compression.

Search Area in MPEG
• When searching for the best matching block in a neighboring frame, the region of search depends on the amount of motion.
• The search area grows with the distance between the frame being coded and the frame being used for prediction.
[Figure: frame sequence I B B P B B P B B P B B I; use a smaller search area for a nearby reference frame and a larger search area for a more distant one.]

Group of Pictures (GOP)
• The different frames are organized together in a group of pictures (GOP).
• A GOP is the smallest random access unit in the video sequence.

Without GOPs: I P P P P … P P P … P P P P
Playback that starts in the middle of the sequence needs to decode all the preceding frames. Very long delay!

With GOPs: I … I B B P B B P B B I ...
Playback that starts at a GOP needs to decode only the frames of that GOP. Essentially no delay!

Display order:    B  B  I  B  B  P  B  B  P  B  B  P
                  0  1  2  3  4  5  6  7  8  9 10 11
Decoding order:   I  B  B  P  B  B  P  B  B  P  B  B
                  2  0  1  5  3  4  8  6  7 11  9 10
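The relation between display order and decoding order can be sketched as a simple reordering: every I or P frame is moved ahead of the B frames that are displayed just before it, because those B frames need it as a reference. The function name is hypothetical, and appending any trailing B frames at the end is an assumption about how an incomplete group is handled.

```python
def decoding_order(display_types):
    """Map a string of frame types in display order to display indices in decoding order."""
    order, pending_b = [], []
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(index)      # must wait for the next I or P frame
        else:
            order.append(index)          # the reference frame is decoded first
            order.extend(pending_b)
            pending_b = []
    return order + pending_b             # assumption: leftover B frames go last


print(decoding_order("BBIBBPBBPBBP"))
# [2, 0, 1, 5, 3, 4, 8, 6, 7, 11, 9, 10]
```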

MPEG Decoder
For each picture {
    For each MB {
        Decode MB header, motion vectors
        // Calculate source address of reference blocks
        For each block {
            // Prefetch a reference block
            VLD and inverse quantize the block;
            IDCT;
        }
        Motion Compensation
    }
    If the current picture is B-type
        Display the current frame;
    Else
        Display the forward reference frame according to proper order
}

VLD: Variable Length Decoding
• Some optimizations can be made to speed up the decoding process. For example:
  • Use a multimedia instruction set [Lee95, Zhou95].
  • Prefetch the reference MBs of the current MB into the CPU cache.
  • Take advantage of hardware features such as the cache's combine-write feature and the AGP port.
