Low-Complexity Transform and Quantization in H.264/AVC Henrique S. Malvar, Fellow, IEEE, Antti Hallapuro, Marta Karczewicz, and Louis Kerofsky, Member, IEEE
Content
• Brief recall of the H.264 encode and decode structure
• Transform in H.264
• DCT and integer transform
• Low-complexity integer transform (proposed by the authors)
• Quantization in H.264
H.264 encode & decode
Three parts: Prediction, Transform, Quantization
Pipeline: Input block → Prediction → Transform → Quantization → Entropy coding → Transmit
• Prediction: generate a prediction of the block by motion estimation.
• Transform: convert the difference between the prediction and the true value (the residual) into coefficients with an integer transform.
• Quantization: quantize the coefficients.
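DCT & Integer Transform
• DCT (Discrete Cosine Transform): commonly used in block transform coding of images and video, e.g. JPEG and MPEG.
• It converts an image block from the spatial domain to the frequency domain.
• Definition for an 8x8 block (standard 2-D DCT form, with X(i,j) the input samples and Y(u,v) the coefficients):
$$Y(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{i=0}^{7} \sum_{j=0}^{7} X(i,j) \cos\frac{(2i+1)u\pi}{16} \cos\frac{(2j+1)v\pi}{16}, \qquad C(0) = \frac{1}{\sqrt{2}},\ C(k) = 1 \text{ for } k > 0$$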
DCT & Integer Transform
• In H.264, a 4x4 block transform is adopted.
• Problem: the DCT coefficients are irrational numbers. On a digital computer with finite precision, applying the inverse transform after the forward transform may not return exactly the original input.
DCT & Integer Transform
• Solution: Integer Transform, an integer approximation of the DCT.
• Original H.264 design: {a=13, b=7, c=17}, i.e. the basis rows (13, 13, 13, 13), (17, 7, -7, -17), (13, -13, -13, 13) and (7, -17, 17, -7).
• Problem: increase of dynamic range. If max |X(i,j)| = A, then max |Y(u,v)| = A x (13 x 4)^2 = 2704 x A.
• log2(2704) ≈ 11.4, so storing Y(u,v) needs 12 more bits than X(i,j).
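The link between the irrational DCT coefficients and the integer entries 13, 7 and 17 can be checked numerically. The sketch below builds the orthonormal 4-point DCT matrix, scales it by a factor `ALPHA = 26` and rounds. The factor 26 is an assumption made here because it reproduces the entries quoted on the slide; the function and variable names are illustrative only.

```python
import math

N = 4
ALPHA = 26  # assumed scale factor; chosen because it yields the entries 13, 17, 7


def dct_matrix(n):
    """Orthonormal n-point DCT matrix (rows are basis functions)."""
    rows = []
    for k in range(n):
        c = math.sqrt(0.5) if k == 0 else 1.0
        rows.append([c * math.sqrt(2.0 / n) * math.cos((j + 0.5) * k * math.pi / n)
                     for j in range(n)])
    return rows


H_dct = dct_matrix(N)                                         # irrational entries such as 0.6533, 0.2706
H_int = [[round(ALPHA * v) for v in row] for row in H_dct]    # integer approximation

# Worst-case growth of one 1-D pass is the largest sum of absolute values in a row;
# a 2-D (row then column) transform squares that factor.
max_row_sum = max(sum(abs(v) for v in row) for row in H_int)
gain_2d = max_row_sum ** 2
extra_bits = math.ceil(math.log2(gain_2d))

for row in H_int:
    print(row)
print(max_row_sum, gain_2d, extra_bits)
```

The last line reports the row sum 13 x 4, the 2-D gain and the number of extra bits, which can be compared with the figures on the slide.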
Low-Complexity Integer Transform
• Choose {a=1, b=2, c=1}, i.e. the basis rows (1, 1, 1, 1), (2, 1, -1, -2), (1, -1, -1, 1) and (1, -2, 2, -1).
• The rows are orthogonal to each other.
• The dynamic range gain is log2(6^2) ≈ 5.17 bits.
• Although the rows have different norms, this is easily compensated in the quantization stage.
• There is no noticeable performance penalty, while the dynamic range gain is reduced and the implementation becomes simpler.
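The three properties claimed for the low-complexity transform (orthogonal rows, unequal row norms, smaller dynamic range gain) are easy to verify. This is a minimal sketch that assumes the matrix `H` below is the one meant by {a=1, b=2, c=1}; the helper names are illustrative.

```python
import math

# Assumed layout of the low-complexity 4x4 transform matrix.
H = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Gram matrix H * H^T: zero off-diagonal entries mean the rows are orthogonal.
gram = [[dot(r1, r2) for r2 in H] for r1 in H]
row_norms_sq = [gram[i][i] for i in range(4)]   # squared norm of each row

# Worst-case 1-D growth is the largest absolute row sum; 2-D squares it.
max_row_sum = max(sum(abs(v) for v in row) for row in H)
gain_bits = math.log2(max_row_sum ** 2)

print(gram)
print(row_norms_sq, max_row_sum, round(gain_bits, 2))
```

The unequal squared norms on the diagonal are the part that the quantization stage has to compensate.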
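Low-Complexity Integer Transform
• Inverse transform: we could simply use the transpose of H. However, to minimize the dynamic range gain, the basis functions that contain the element 2 are scaled by ½, so the inverse matrix becomes
$$\tilde{H}_{inv} = \begin{bmatrix} 1 & 1 & 1 & \tfrac{1}{2} \\ 1 & \tfrac{1}{2} & -1 & -1 \\ 1 & -\tfrac{1}{2} & -1 & 1 \\ 1 & -1 & 1 & -\tfrac{1}{2} \end{bmatrix}$$
• Dynamic range gain = log2(4^2) = 4 bits.
• The factor ½ can be realized by a 1-bit right shift, so no multiplication is needed.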
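To see that the scaled inverse still inverts the forward transform, the following sketch runs a round trip in exact rational arithmetic. It assumes the forward matrix `H` with rows (1,1,1,1), (2,1,-1,-2), (1,-1,-1,1), (1,-2,2,-1) and the scaled transpose `H_INV`; the per-coefficient factors 1/4 and 1/5 in `S` are derived here from the row norms and stand in for the compensation that a real codec folds into its quantization tables. All names are illustrative.

```python
from fractions import Fraction as F

H = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]

half = F(1, 2)
# Transpose of H with the columns that held +-2 scaled by 1/2.
H_INV = [
    [1,  1,     1,  half],
    [1,  half, -1, -1],
    [1, -half, -1,  1],
    [1, -1,     1, -half],
]

# Assumed compensation per coefficient index (normally absorbed by quantization).
S = [F(1, 4), F(1, 5), F(1, 4), F(1, 5)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(r) for r in zip(*m)]


def forward(x):
    """Y = H X H^T (integer arithmetic only)."""
    return matmul(matmul(H, x), transpose(H))


def inverse(y):
    """X = H_INV (S Y S) H_INV^T, with S applied element-wise."""
    scaled = [[S[u] * S[v] * y[u][v] for v in range(4)] for u in range(4)]
    return matmul(matmul(H_INV, scaled), transpose(H_INV))


block = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
restored = inverse(forward(block))

# Largest absolute sum over the basis functions (columns) of the inverse matrix.
max_basis_sum = max(sum(abs(H_INV[i][j]) for i in range(4)) for j in range(4))
print(restored == block, max_basis_sum)
```

The value of `max_basis_sum` squared gives the 2-D dynamic range gain quoted on the slide.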
Low-Complexity Integer Transform
[Figures: fast implementation flow graphs of the inverse transform and the forward transform]
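Since the flow graphs themselves are not reproduced here, the sketch below shows one common way to write the 1-D forward and inverse transforms as butterflies that use only additions, subtractions and 1-bit shifts. It is an illustration under the assumption that the transform rows are (1,1,1,1), (2,1,-1,-2), (1,-1,-1,1), (1,-2,2,-1); it is not taken from the slide's figures, and the function names are made up for this example.

```python
def forward_1d(x):
    """1-D forward transform with adds and left shifts only."""
    x0, x1, x2, x3 = x
    s03, s12 = x0 + x3, x1 + x2     # sums of the symmetric pairs
    d03, d12 = x0 - x3, x1 - x2     # differences of the symmetric pairs
    return [
        s03 + s12,                  # row (1,  1,  1,  1)
        (d03 << 1) + d12,           # row (2,  1, -1, -2)
        s03 - s12,                  # row (1, -1, -1,  1)
        d03 - (d12 << 1),           # row (1, -2,  2, -1)
    ]


def inverse_1d(c):
    """1-D inverse transform; the factor 1/2 is a 1-bit right shift.

    Note: '>>' floors, so odd inputs c1 or c3 lose their half.
    """
    c0, c1, c2, c3 = c
    e0 = c0 + c2
    e1 = c0 - c2
    e2 = (c1 >> 1) - c3
    e3 = c1 + (c3 >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]


print(forward_1d([1, 2, 3, 4]))
print(inverse_1d([4, 2, 0, -2]))
```

A 2-D transform applies `forward_1d` (or `inverse_1d`) to every row of the block and then to every column of the result.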
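QUANTIZATION
• Quantization is the step that introduces signal loss in exchange for better compression.
• For a step size Q, encoder quantization is given by
$$X_q(i,j) = \operatorname{sign}\{X(i,j)\}\, \frac{|X(i,j)| + f(Q)}{Q}$$
(integer division), where f(Q) controls the quantization width near the origin.
• The decoder performs the reverse quantization by
$$X_r(i,j) = Q\, X_q(i,j)$$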
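A small sketch of a scalar quantizer of this kind may help. It assumes a step size `step` and a rounding offset `f` between 0 and `step / 2`; both names and the sample values are illustrative, not taken from the slide.

```python
def quantize(x, step, f):
    """Sign-magnitude quantizer: level = sign(x) * ((|x| + f) // step)."""
    level = (abs(x) + f) // step        # integer division
    return -level if x < 0 else level


def dequantize(level, step):
    """Reconstruction: multiply the level back by the step size."""
    return level * step


step = 10
for x in (37, 6, -37):
    rounded = quantize(x, step, f=5)    # f = step/2: plain rounding
    dead = quantize(x, step, f=3)       # smaller f: wider zero bin around the origin
    print(x, rounded, dequantize(rounded, step), dead, dequantize(dead, step))
```

With `f = 3` the input 6 falls into the zero bin, while with `f = 5` it is rounded up to level 1; this is the sense in which the offset controls the quantization width near the origin.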
Rules of Quantization
• Complexity must be as low as possible. Because H.264 uses predictive coding, an error tends to drift through every block that is predicted from earlier data.
• Memory requirements are very high for 32-bit operations, so the arithmetic should stay as close to 16 bits as possible.
• The hardware must not be put under undue stress, while prediction must stay free of drift error.
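• A disadvantage of the quantization equation is that it requires an integer division.
• In H.264 the quantization therefore takes the form
$$X_q(i,j) = \operatorname{sign}\{X(i,j)\} \left[\, |X(i,j)|\, A(Q) + f\, 2^{L} \,\right] \gg L$$
• The inverse quantization is given by
$$X_r(i,j) = X_q(i,j)\, B(Q)$$
• The values A(Q) and B(Q) are obtained from the quantization tables.
• In these equations f controls the rounding near the origin, L is the number of bits shifted out after quantization, and Q varies from 0 to Qmax. Hence 0 is the finest and Qmax the coarsest quantization.
• Care must be taken when shifting right: an arithmetic right shift of a negative number rounds toward negative infinity, not toward 0, which is why the formula works on |X(i,j)| and restores the sign afterwards.
• In the original H.264 design, L = N = 20, where N is the number of bits shifted out after the inverse transform.
• The values A(Q) and B(Q) must satisfy
$$A(Q)\, B(Q)\, G^{2} \approx 2^{L+N}$$
where G is the squared norm of the rows of H.
• The values of L and N are chosen as a compromise: larger values reduce the approximation error in the relation above, and smaller values reduce the dynamic range.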
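The two points made above, replacing the division by a multiply and shift and the rounding direction of right shifts, can be illustrated with a short sketch. The step size 10, the shift `L = 20` and the way `A` is derived from the step size are assumptions chosen for this example; they are not values from the H.264 tables.

```python
L = 20                      # assumed number of fractional bits
STEP = 10                   # hypothetical step size
A = round((1 << L) / STEP)  # integer multiplier approximating 2^L / STEP


def quantize_div(x, step):
    """Reference quantizer using an integer division (offset = step / 2)."""
    level = (abs(x) + step // 2) // step
    return -level if x < 0 else level


def quantize_shift(x, a, l):
    """Division-free quantizer: multiply, add the offset 0.5 * 2^l, shift right."""
    level = (abs(x) * a + (1 << (l - 1))) >> l
    return -level if x < 0 else level


for x in (1234, -1234, 7, -7):
    print(x, quantize_div(x, STEP), quantize_shift(x, A, L))

# Right shift floors: on negative values it rounds toward negative infinity.
print(-5 >> 1, -(5 >> 1))
```

Because `A` only approximates 2^L / STEP, the two quantizers can disagree for inputs that sit exactly on a decision boundary; a larger `L` makes this rarer at the cost of wider intermediate values.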
16 Bit Arithmetic and Quantization Tables
• The complexity of the quantization formulae is reduced considerably by restricting the arithmetic to 16 bits.
• However, this reduction must not come at the cost of a lower PSNR.
• This is achieved by reducing the values of B(Q), L and N.
• B(Q) doubles for every increase of 6 in Q, which gives a linear relationship between PSNR and Q.
• This makes it easier to design the quantization and reconstruction tables.
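Quantization and Reconstruction Tables
• H.264 hence uses the modified quantization and reconstruction formulae
$$X_q(i,j) = \operatorname{sign}\{X(i,j)\} \left[\, |X(i,j)|\, A(Q_M, r) + f\, 2^{15+Q_E} \,\right] \gg (15 + Q_E)$$
$$X_r(i,j) = \left[\, X_q(i,j)\, B(Q_M, r) \,\right] \ll Q_E$$
where $Q_M = Q \bmod 6$, $Q_E = \lfloor Q/6 \rfloor$, and r selects the table column according to the coefficient position (i,j).
• The mod operator makes the quantization factor periodic in Q, so a large range of parameters can be defined without increasing memory requirements.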
The matrices shown give the values of A(Q) and B(Q), chosen to maximize dynamic range while ensuring that every result fits within 16 bits.
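The matrices are not reproduced in this text, so the sketch below uses the widely published H.264 quantization and reconstruction tables as a stand-in; treat them as an assumption rather than as the slide's content. The position rule in `position_class`, the offset f = 1/3, and the gains `G = (16, 25, 20)` (derived from the transform row norms with the ½-scaled inverse) are likewise assumptions of this example.

```python
# Rows: Q mod 6. Columns: position class r (assumed widely published values).
A = [[13107, 5243, 8066], [11916, 4660, 7490], [10082, 4194, 6554],
     [9362, 3647, 5825], [8192, 3355, 5243], [7282, 2893, 4559]]
B = [[10, 16, 13], [11, 18, 14], [13, 20, 16],
     [14, 23, 18], [16, 25, 20], [18, 29, 23]]
G = [16, 25, 20]   # assumed forward-times-inverse gain per position class


def position_class(i, j):
    """r = 0 if i and j are both even, 1 if both odd, 2 otherwise."""
    if i % 2 == 0 and j % 2 == 0:
        return 0
    if i % 2 == 1 and j % 2 == 1:
        return 1
    return 2


def quantize(x, q, i, j):
    qm, qe = q % 6, q // 6
    shift = 15 + qe
    offset = (1 << shift) // 3                     # assumed f = 1/3
    level = (abs(x) * A[qm][position_class(i, j)] + offset) >> shift
    return -level if x < 0 else level


def reconstruct(level, q, i, j):
    qm, qe = q % 6, q // 6
    return (level * B[qm][position_class(i, j)]) << qe


# The reconstruction factor doubles every 6 steps of Q.
print([reconstruct(1, q, 0, 0) for q in (0, 6, 12)])

# A * B * G stays close to 2^21 for every table entry.
worst = max(abs(A[m][r] * B[m][r] * G[r] - 2 ** 21) / 2 ** 21
            for m in range(6) for r in range(3))
print(worst)
print(quantize(100, 0, 0, 0), quantize(100, 6, 0, 0))
```

Only 6 x 3 entries per table are stored; every larger Q reuses them through `q % 6` and an extra shift by `q // 6`, which is the periodicity the mod operator provides.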