
Shrishail S. Gajbhar 201121016






Presentation Transcript


  1. Compression of facial images using the K-SVD algorithm. Ori Bryt, Michael Elad, JVCIR, Elsevier, 2008. Shrishail S. Gajbhar 201121016

  2. What are the principles behind compression? • Two fundamental components of compression are redundancy and irrelevancy reduction. • Redundancy reduction aims at removing duplication from the signal source (image/video). • Irrelevancy reduction omits parts of the signal that will not be noticed by the signal receiver, namely the Human Visual System (HVS).

  3. Example of Image Redundancy • Profile of the cameraman image along row number 164. The top graph shows pixel intensity, and the bottom graph shows the corresponding normalized correlation over 128-pixel displacements.

  4. Lossless vs. Lossy Compression

  5. Image compression steps: Original image → Source encoder: linear transform to decorrelate the image data (lossless) → Quantization of the basis-function coefficients (lossy) → Entropy coding of the resulting quantized values (lossless) → Compressed. The decoder reverses the steps: decoding → dequantization → inverse transform → reconstructed image.
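The transform-quantize-code pipeline above can be sketched end-to-end on a single patch. This is an illustrative transform-coding sketch (an orthonormal DCT and a hypothetical quantizer step of 16), not the paper's codec:

```python
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(8, 8)).astype(float)  # stand-in 8x8 image patch

coeffs = dctn(block, norm='ortho')      # lossless decorrelating transform
step = 16.0                             # hypothetical quantizer step size
q = np.round(coeffs / step)            # lossy step: scalar quantization
# (q would now be entropy-coded losslessly; the decoder reverses each step)
recon = idctn(q * step, norm='ortho')  # dequantize + inverse transform
```

Because the transform is orthonormal, the pixel-domain RMSE is bounded by half the quantizer step, which is the usual rate/distortion knob in such schemes.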

  6. Facial Image Compression • A very specific class of images, a priori known; with additional constraints we gain additional redundancy and hence more compression (can outperform JPEG2000). • Goal: compression ratios of 154:1 and more. E.g., a 441×358 standard digital b/w passport photo (154 Kbytes at 8bpp) compressed to less than 1 KB without loss in visual quality.

  7. State of the Art Techniques • JPEG2000 (standard general-purpose wavelet-based image compression). • Geometric alignment plus VQ coding trained using tree K-Means (Low Bit-Rate Compression of Facial Images, by Michael Elad, IEEE TIP, 2007; outperforms JPEG2000!). • PCA. • ICA along with Matching Pursuit-type algorithms (On the use of independent component analysis for image compression; outperforms JPEG, close to JPEG2000). • K-SVD and sparse representation (an extension of the above Elad paper, replacing VQ coding by K-SVD).

  8. Background Material • Low Bit-Rate Compression of Facial Images (Elad, Goldenberg and Kimmel). • Geometrical canonization: restricted to frontal facial mug-shots, the handled images are geometrically deformed into a canonical form in which facial features are located at the same spatial locations. Using a plain feature-detection procedure, the image is divided into a disjoint and covering set of triangles, each deformed using a different affine warp. • Leads to better conditioning of the input images.

  9. Geometrical Canonization • Facial feature detection and image warping: 13 feature points along with six corner points define a triangulation such that every triangle in the input image and in the 'average' image uniquely defines an affine transformation. • A feature-based correspondence. • This information is sent to the decoder for inverse warping; the parameters are coded using 20 bytes.
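Since three non-collinear point pairs determine an affine map uniquely, each per-triangle warp can be recovered by solving a small linear system. A minimal sketch with made-up triangle coordinates:

```python
import numpy as np

def affine_from_triangles(src, dst):
    """Solve for the 2x3 affine matrix mapping three src points to dst.

    src, dst: (3, 2) arrays of triangle vertices (hypothetical coordinates).
    Three non-collinear point pairs determine the affine map uniquely.
    """
    # Homogeneous [x, y, 1] coordinates for the source triangle.
    S = np.hstack([src, np.ones((3, 1))])   # (3, 3)
    A = np.linalg.solve(S, dst).T           # (2, 3): dst = A @ [x, y, 1]
    return A

# Example: map an arbitrary input triangle onto an "average face" triangle.
src = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
dst = np.array([[1.0, 2.0], [12.0, 3.0], [2.0, 13.0]])
A = affine_from_triangles(src, dst)
warped = A @ np.array([5.0, 5.0, 1.0])      # warp an interior point
```

The decoder applies the inverse map of each triangle to undo the canonization.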

  10. Alignment Example • (Top) Input images and (bottom) their aligned (canonical) versions.

  11. Coding and Hierarchical Multiscale Treatment • Coding: VQ dictionaries are trained using tree K-Means per patch separately, using patches taken from the same location in 5000 images. • Only small patches can be used, which leads to a loss of some redundancy, and hence of some compression. • The image is scaled down and VQ-coded using patches of size 8×8. Then it is interpolated back to the original resolution, and the residual is coded using VQ on 8×8 pixel patches once again. • Surpasses JPEG2000 visually and also in PSNR.

  12. VQ based compression results • Left: VQ, Right: JPEG2000, at 270 bytes (compression ratio 585:1)

  13. VQ based compression results

  14. Sparse and Redundant Representations • Underdetermined linear systems: consider a matrix A of size n×m with n &lt; m, and the linear system Ax = b. • More unknowns than equations; there is no solution if b is not in the span of the columns of A. • Infinitely many solutions under the full-rank assumption on A. • How can we find the proper x from the infinitely many possible images that "explain" a given measurement b? (Suppose we know b is a down-sampled and blurred version of x.) • Regularization: min J(x) subject to Ax = b. • Using a convex and strict function J guarantees uniqueness of the solution.
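For example, choosing J(x) = ||x||_2 makes the regularized problem solvable in closed form via the pseudoinverse: among the infinitely many solutions of Ax = b, it picks the one of minimum L2 norm. A small numerical sketch with random data:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 20                        # underdetermined: fewer equations than unknowns
A = rng.standard_normal((n, m))
b = rng.standard_normal(n)

# Minimum-L2-norm solution among the infinitely many x with Ax = b:
x = np.linalg.pinv(A) @ b           # closed form via the pseudoinverse
```

The solution lies in the row space of A, i.e., x = A'(AA')^{-1} b for full-row-rank A.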

  15. Choice of J(x) • L2 norm: a closed-form, unique solution exists (the forward and inverse transforms are linear). • L1 norm: may have more than one solution, but has a tendency to prefer sparse solutions (a fundamental property of Linear Programming); i.e., L1 minimization can be posed as an LP problem. • The L0 norm gives the sparsest solution, but with a lot of trouble! (Calling it a "norm" is misleading: it does not satisfy all the axiomatic requirements of a norm.)
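The LP reformulation of L1 minimization splits x into non-negative positive and negative parts, x = u − v. A sketch using SciPy's general-purpose LP solver, with random data and a hypothetical 3-sparse ground truth:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
n, m = 15, 30
A = rng.standard_normal((n, m))
x_true = np.zeros(m)
x_true[:3] = [2.0, -1.5, 1.0]                  # sparse ground truth (illustrative)
b = A @ x_true

# min ||x||_1 s.t. Ax = b, posed as the LP:
#   min 1'(u + v)  s.t.  A(u - v) = b,  u >= 0, v >= 0
c = np.ones(2 * m)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * m))
x_l1 = res.x[:m] - res.x[m:]
```

Since x_true is feasible, the LP optimum can never exceed ||x_true||_1; with random Gaussian A and a sufficiently sparse x_true it typically recovers x_true itself.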

  16. The P0 problem - our main interest • min ||x||_0 subject to Ax = b. • Basic questions: • Can uniqueness of a solution be claimed? Under what conditions? • If a candidate solution is available, can we perform a simple test to verify that it is actually the global minimizer of P0? • Suppose A is of size 500×2000 and we know the sparsest solution to Ax = b has |S| = 20 non-zeros; then there are about 3.9E+47 possible support choices, and this simple exhaustive calculation takes 1.2E+31 years to end (at 1E-9 seconds per individual test)!!!
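The back-of-the-envelope count on this slide is easy to reproduce:

```python
from math import comb

n_cols, s = 2000, 20
options = comb(n_cols, s)           # number of possible 20-column supports
seconds = options * 1e-9            # 1 ns per individual support test
years = seconds / (3600 * 24 * 365)
print(f"{options:.1e} supports, {years:.1e} years")
```

This combinatorial blow-up is exactly why greedy and convex relaxations (OMP, basis pursuit) are used instead of exhaustive search.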

  17. Sparseland Model • A parametric description of signal sources in a way that adapts to their true nature. • Coding mechanism for image patches: min ||α||_0 subject to ||Dα - x||_2 ≤ ε, where x is the image patch and D is a redundant dictionary (learned by K-SVD). • For compression of image patches, this problem is solved using matching-pursuit and basis-pursuit algorithms; OMP is used here for its simplicity and efficiency. • The solution of P0 is called sparse coding.
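OMP itself is short enough to sketch. This is a minimal greedy implementation of the sparse-coding step, not the authors' optimized code:

```python
import numpy as np

def omp(D, x, L):
    """Orthogonal Matching Pursuit: greedily approximate min ||a||_0 s.t. D a ≈ x.

    D: (n, k) dictionary with unit-norm columns; L: maximum number of atoms.
    """
    residual = x.copy()
    support = []
    a = np.zeros(D.shape[1])
    for _ in range(L):
        # Pick the atom most correlated with the current residual.
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        # Re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    a[support] = coef
    return a

# Demo with a random dictionary and a 2-sparse synthetic patch.
rng = np.random.default_rng(3)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)
a_true = np.zeros(50)
a_true[[4, 17]] = [1.5, -2.0]
x = D @ a_true
a = omp(D, x, L=2)
```

In the codec, stopping can equivalently be driven by a residual-error threshold instead of a fixed L.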

  18. Training a Dictionary using K-SVD • Given a set of M image patches of size N×N to be coded, {x_j}, the dictionary D is obtained by minimizing an energy functional, the sum of representation errors ||x_j - Dα_j||² under a sparsity constraint on each α_j, w.r.t. both D and the sparse representations {α_j}; different sparsity levels lead to different representation errors. • Adopts a block-coordinate descent idea. • Assuming that D is known, the penalty reduces to a set of M sparse coding operations, so OMP can be used to obtain near-optimal solutions. • Assuming the representation vectors are fixed, the K-SVD algorithm updates one dictionary column at a time.
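The column-update step can be sketched directly: restricted to the signals that use atom j, the best new atom/coefficient pair is the rank-1 SVD approximation of the residual. A minimal sketch (variable names and sizes are mine, for illustration):

```python
import numpy as np

def ksvd_update_column(D, A, X, j):
    """One K-SVD dictionary-update step for column j (a sketch).

    D: (n, k) dictionary, A: (k, M) sparse coefficients, X: (n, M) data.
    Updates atom j and its coefficient row via a rank-1 SVD of the residual.
    """
    users = np.nonzero(A[j, :])[0]          # signals that actually use atom j
    if users.size == 0:
        return D, A
    # Residual without atom j's contribution, restricted to its users.
    E = X[:, users] - D @ A[:, users] + np.outer(D[:, j], A[j, users])
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, j] = U[:, 0]                        # best rank-1 fit gives the new atom
    A[j, users] = s[0] * Vt[0, :]            # and its matching coefficients
    return D, A

# Tiny demo with random data.
rng = np.random.default_rng(4)
n, k, M = 8, 5, 12
D = rng.standard_normal((n, k))
D /= np.linalg.norm(D, axis=0)
A = rng.standard_normal((k, M)) * (rng.random((k, M)) < 0.4)
X = rng.standard_normal((n, M))
err_before = np.linalg.norm(X - D @ A)
D, A = ksvd_update_column(D, A, X, 2)
err_after = np.linalg.norm(X - D @ A)
```

Because the rank-1 SVD is the optimal rank-1 approximation of the residual, each column update can only decrease (or keep) the total representation error.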

  19. K-SVD algorithm and RMSE per iteration • A typical representation error (RMSE) as a function of the iterations of the K-SVD algorithm. This graph corresponds to a gradual increment of atoms in the representation (L) from 1 to 7.

  20. Proposed Method in the Paper • The general scheme: • An offline K-SVD training process. • An online image compression/decompression process. • K-SVD training: offline; produces a set of K-SVD dictionaries which are then fixed for image compression. • A learning set is formed for each 15×15 patch; the mean patch of the set is subtracted from all the examples in that set. • The encoder then uses the following steps for compression: • Pre-processing • Slicing to patches • Sparse coding • Entropy coding and quantization

  21. Block Diagram of proposed method by authors

  22. Encoding Steps in Detail • Pre-processing: • Geometrical warping. • Simple scale-down by a factor of 2:1. • Slicing to patches: • 15×15 patches are used. • The mean patch images calculated from the learning set before training are subtracted from the relevant patches (so that each patch fits its dictionary in content). • Sparse coding: • Every patch location has a pre-trained dictionary of code-words (atoms) Dij. • k = 512 is used; the dictionaries are generated with K-SVD using 4415 training images. • Coding: assigning a linear combination of a few atoms to describe the patch content (sparse coding).

  23. Encoding Steps in Detail • Sparse coding (cont.): • Patch content: linear weights and atom indices. • The number of atoms varies from one patch to another, based on its average complexity, and this varying number is known at the decoder. • Sparse coding is done using the OMP method. • At the decoder, each output patch is built independently of the others. • Entropy coding and quantization: • Huffman tables for coding the indices. • Coefficient weights are uniformly quantized to 7 bits. • If facial feature detection fails utterly on an incoming image (which leads to low PSNR), JPEG2000 can be used instead, or the user can be asked to select the feature points manually.
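The 7-bit uniform quantization of the coefficient weights can be sketched as below; the clipping range w_max is a hypothetical parameter of this sketch, not a value taken from the paper:

```python
import numpy as np

def quantize_weights(w, bits=7, w_max=4.0):
    """Uniform scalar quantization of OMP coefficient weights (a sketch).

    bits: 7, as fixed in the paper; w_max: hypothetical clipping range.
    Returns integer indices (to be entropy-coded) and dequantized values.
    """
    levels = 2 ** bits
    step = 2 * w_max / levels
    q = np.clip(np.round(w / step), -(levels // 2), levels // 2 - 1)
    return q.astype(int), q * step

idx, w_hat = quantize_weights(np.array([0.3, -1.7, 2.25]))
```

Within the clipping range, the reconstruction error per weight is bounded by half the quantizer step.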

  24. VQ vs. Sparse Representation • The number of codewords k in a VQ dictionary constrains the image patch size, since k increases exponentially with the patch size N: k = 2^(8·B·N²/P), where B is the target rate in bytes and P is the total number of pixels in the image. • E.g., P = 39,600 (180×220), B = 500 bytes, N = 8 gives k ≈ 88. If N = 12, then k ≈ 24,000, which cannot be trained using 6000 examples. • Redundancy in adjacent patches is overlooked. • With sparse and redundant dictionaries, each patch is a linear combination of a few atoms: out of k atoms, L are used on average per patch, so the effect of increasing N is absorbed in varying L. • E.g., P = 39,600 pixels, k = 512 atoms and a target rate of 500-1000 bytes with N = 8-30 leads to L in the range 1-12.
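The exponential growth of k can be checked against the slide's own numbers: with log2(k) bits spent per N×N patch and P/N² patches, the codebook size needed to meet a rate of B bytes is 2 raised to (8·B·N²/P):

```python
def vq_codebook_size(B_bytes, P_pixels, N):
    """Codewords k needed so that log2(k) bits per N x N patch meets rate B."""
    bits_per_patch = B_bytes * 8 * N * N / P_pixels
    return 2 ** bits_per_patch

k8 = vq_codebook_size(500, 39600, 8)     # roughly 88
k12 = vq_codebook_size(500, 39600, 12)   # roughly 24,000
```

Both values match the examples quoted on the slide, which is why VQ is effectively limited to small patches while the sparse-representation approach is not.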

  25. Required average number of atoms L as a function of the patch size N

  26. Run-time and memory requirements • Computational complexity: • Training: sparse coding (OMP) and the dictionary update have a per-patch cost that grows with L_max, the maximal number of non-zero elements in each coefficient vector during training, and with S_L, the number of examples in the learning set; the complexity of training all patches scales accordingly. • Compression: an encoding and a decoding stage. • Encoding: the per-patch cost grows with L_av, the average number of non-zero elements in the coefficient vectors over all patches. • Decoding: being a simple linear combination, it is cheaper still per patch. • Memory requirements: • Considering the dictionaries per patch, mean patch images, Huffman tables, coefficient usage tables, and quantization levels per patch, the memory requirement is O(Pk) (P = 39,600 and k = 512), i.e., 20 Mb.

  27. More details and results • 100 hours for training the dictionaries, 5 seconds for compression and 1 second for decompression. • 4415 images in the learning set, 100 images for testing. • Image size is 358×441, downscaled to 179×221. • The general scheme is applicable to all facial databases, with some changes in parameters. • K-SVD dictionaries: • Stopping criteria: a maximum of 100 iterations and a minimal representation error; in compression, a limit on the maximum number of atoms per patch. Every dictionary has 512 atoms of size 15×15 pixels. • The dictionaries contain images similar in nature to the image patch for which they were trained.

  28. K-SVD Dictionaries The Dictionary obtained by K-SVD for Patch No. 80 (the left eye) using the OMP method with L = 4.

  29. K-SVD Dictionaries The Dictionary obtained by K-SVD for Patch No. 87 (the right nostril) using the OMP method with L = 4.

  30. Reconstructed Images • The coding strategy learns which parts are more difficult to code by assigning the same representation-error threshold to all patches and observing how many atoms are required per patch on average. • As the number of atoms increases, the RMSE decreases for a given image bit rate. The figure shows the atom allocation map and the representation error map for 400 bytes.

  31. Reconstructed Images, 630 bytes • Artifacts are not smeared over the whole image but confined to small local areas. • Compression is RMSE-oriented, not bit-rate-oriented, so increasing the bit rate does not improve the PSNR drastically, as can be seen in the results. • Artifacts: • caused by the slicing into disjoint patches, coding them independently; • blockiness effect; • inaccurate representation of high-frequency elements in the image; • inability to represent complicated textures; • inconsistency in the spatial location of several facial elements compared to the original image.

  32. A "good" (RMSE 4.04) and a "bad" (RMSE 5.9) representation, both using 630 bytes

  33. Comparing the visual quality of a reconstructed image from the test set at several bit rates. From top left, clockwise: 285 bytes (RMSE 10.6), 459 bytes (RMSE 8.27), 630 bytes (RMSE 7.61), 820 bytes (RMSE 7.11), 1021 bytes (RMSE 6.8), original image.

  34. Comparing to other techniques • Facial image compression at a bit rate of 400 bytes: comparing the results of JPEG2000, PCA, and the K-SVD method.

  35. Comparing to other methods

  36. Comparing to other methods

  37. Rate Distortion curves and comparison of PSNRs

  38. Comments • Increased redundancy in the dictionaries is desirable for high compression and better visual quality. • With additional constraints for increased redundancy, compression performance can surpass JPEG2000. • Sparse and redundant representation is a sophisticated framework for various image-related operations.
