Combined scalability coding based on the scalable extension of H.264/AVC

Sangseok Park, PhD candidate, June 13, 2008

Abstract
• Scalable Video Coding (SVC) of H.264/AVC
• Annex G of the latest SVC draft [JVT-X201]
• Spatial, temporal, and quality scalabilities
• Final Draft International Standard (FDIS) in July 2007
• Simplified Fine-Granular Scalability (FGS) [JVT-W111]
• Combination of significant and refinement coding passes
• Introduction of code type method
• SVC phase 2
• Bit-Depth Scalability (BDS) design
• Based on Inter-Layer Prediction (ILP) [Schwarz07]
• Reverse tone-mapping process
• The low-complexity motion-compensation structure
• SVC phase 2
[JVT-X201] T. Wiegand et al., "Joint Draft ITU-T Rec. H.264 | ISO/IEC 14496-10 / Amd.3 Scalable video coding," JVT Document, JVT-X201, Geneva, Switzerland, Jul. 2007.
[JVT-W111] M. Karczewicz, S. Park, and H. Chung, "Report of core experiment on FGS simplification (CE1)," JVT Document, JVT-W111, San Jose, California, Apr. 2007.
[Schwarz07] H. Schwarz, D. Marpe, and T. Wiegand, "Overview of the Scalable Video Coding Extension of the H.264/AVC standard," IEEE Trans. CSVT, vol. 17(9), pp. 1103-1120, Sept. 2007.

Scalability [HHI website]
• One bit-stream can adapt itself to networks or terminals by dropping or truncating parts of the bit-stream, even without severe degradation of the content
• Good coding efficiency
• Low decoder complexity
• Much better than a simulcast stream, the simplest scalability approach, which combines several independent bit-streams
[HHI website] HHI Image Communication, "The Scalable Video Coding Amendment of the H.264/AVC Standard," http://ip.hhi.de/imagecom_G1/savce/.
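The adaptation by dropping parts of a bit-stream can be illustrated with a minimal sketch. It assumes each NAL unit carries the three layer identifiers used by SVC (dependency_id, temporal_id, quality_id); the NalUnit class, the extract_substream helper, and the sample stream are hypothetical, and a real extractor follows more detailed rules than this simple threshold filter.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NalUnit:
    """Hypothetical stand-in for an SVC NAL unit; only the layer ids matter here."""
    dependency_id: int  # spatial / CGS layer
    temporal_id: int    # temporal layer
    quality_id: int     # quality refinement layer
    payload: bytes


def extract_substream(units: List[NalUnit], max_d: int, max_t: int, max_q: int) -> List[NalUnit]:
    """Keep only the units at or below the requested operating point."""
    return [
        u for u in units
        if u.dependency_id <= max_d
        and u.temporal_id <= max_t
        and u.quality_id <= max_q
    ]


# Illustrative stream: one base layer plus three kinds of enhancement.
stream = [
    NalUnit(0, 0, 0, b"base"),
    NalUnit(0, 1, 0, b"higher frame rate"),
    NalUnit(0, 0, 1, b"quality refinement"),
    NalUnit(1, 0, 0, b"spatial enhancement"),
]

base_only = extract_substream(stream, max_d=0, max_t=0, max_q=0)
everything = extract_substream(stream, max_d=1, max_t=1, max_q=1)
```

The same single stream serves both operating points, which is the contrast with simulcast, where each operating point needs its own independent bit-stream.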

Quality scalability
• Coarse Grain Scalability (CGS) coding
• Small number of bit-extraction points
• Coarse bit rate variation
• Quantization steps decrease from the SNR base layer to the enhancement layer
• Fine Grain Scalability (FGS) coding
• Progressive refinement (PR) slices
• Can be truncated at arbitrary extraction points
• Cyclical scanning order of transform coefficients is used instead of zig-zag scanning
• High complexity
[Figure labels: Macroblock (MB), Slice]
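As a rough illustration of the decreasing quantization steps in CGS, the sketch below uses the H.264/AVC relation between the quantization parameter (QP) and the quantization step size, which doubles for every increase of 6 in QP. The layer QP values are hypothetical and chosen only for illustration.

```python
# Quantization step sizes for QP 0..5 in H.264/AVC (8-bit video, QP range 0..51).
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp: int) -> float:
    """Return the quantization step size for a given QP."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51")
    # The step size doubles each time QP grows by 6.
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


# Hypothetical CGS setup: the enhancement layer (EL) uses a lower QP than the
# base layer (BL), so its quantization step is smaller and its quality higher.
layer_qp = {"BL": 34, "EL": 28}
layer_qstep = {name: qstep(qp) for name, qp in layer_qp.items()}
```

With these assumed values, lowering QP by 6 from the base layer to the enhancement layer halves the quantization step.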

SVC encoder block diagram

Simplified FGS encoder block diagram
• EOB: End of block, VLC: Variable length coder, CBP: Coded block pattern
• BL: Base layer, EL: Enhancement layer

Results for FGS (Fine Granular Scalability)
• Performed on the basis of JSVM 7.10 with C++ [JSVM]
• The proposed method was accepted and verified as part of SVC of H.264/AVC at the 23rd JVT meeting, San Jose, CA [JVT-W200]
• Averaged over all tested CIF sequences, the bit rate is reduced by 0.46%, while the complexity of the original FGS encoder is reduced so much that the high-level syntax decreased up to one-third of that of the original FGS encoder [JVT-W200]
[JSVM] JSVM 8.10, RWTH CVS server.
[JVT-W200] T. Wiegand et al., "Meeting Report, Draft 7," JVT Document, JVT-W200, San Jose, CA, Apr. 2007.

Bit-Depth Scalability (BDS)
• A new scalability, called bit-depth scalability, is needed for high dynamic range (HDR) content such as high-accuracy video, remote sensing, medical applications, and digital animation movies, now that HDR cameras and display devices have been developed.
• A bit depth of 8 gives two to three orders of magnitude of dynamic range (e.g., 256 levels)
• A bit depth of 10 gives three to four orders of magnitude of dynamic range (e.g., 1024 levels)
• Backward compatibility should be considered.
• The content can be viewed simultaneously on both current low dynamic range (LDR) devices and HDR devices.
• However, the current SVC does not support bit-depth scalability.
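The level counts and orders of magnitude quoted above follow from simple arithmetic, sketched here under the assumption that dynamic range is measured as the base-10 logarithm of the number of code levels. The helper names are illustrative.

```python
import math


def levels(bit_depth: int) -> int:
    """Number of distinct code values at a given bit depth."""
    return 2 ** bit_depth


def orders_of_magnitude(bit_depth: int) -> float:
    """Dynamic range expressed as log10 of the number of code levels."""
    return math.log10(levels(bit_depth))


for depth in (8, 10, 12):
    print(depth, levels(depth), round(orders_of_magnitude(depth), 2))
```

Under this assumption, 8 bits fall between two and three orders of magnitude and 10 bits between three and four, matching the ranges on the slide.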

Tone-mapping (TM) or inverse tone-mapping (iTM) ideas for reduction or extension of the dynamic range
• TM: converts HDR sequences into LDR sequences, e.g., 10 bpp to 8 bpp
• iTM: converts LDR sequences into HDR sequences, but is not an exact mathematical inverse of TM because information is lost, e.g., 8 bpp to 10 bpp
• Preserves the human perception of the scene.
• HDR images cannot be viewed on conventional monitors but can be viewed after TM processing.
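A minimal sketch of why iTM is not an exact inverse of TM, assuming the simplest possible operators: linear scaling by bit shifts between 10 bpp and 8 bpp. Practical tone-mapping operators are usually nonlinear and perceptually motivated; the functions below are illustrative only.

```python
def tone_map(hdr: int, hdr_bits: int = 10, ldr_bits: int = 8) -> int:
    """Linear TM: drop the least significant bits (10 bpp -> 8 bpp by default)."""
    return hdr >> (hdr_bits - ldr_bits)


def inverse_tone_map(ldr: int, hdr_bits: int = 10, ldr_bits: int = 8) -> int:
    """Linear iTM: scale back up; the dropped bits cannot be recovered."""
    return ldr << (hdr_bits - ldr_bits)


original = [0, 1, 2, 3, 4, 513, 1023]
ldr = [tone_map(v) for v in original]
restored = [inverse_tone_map(v) for v in ldr]

# Four distinct HDR values (0..3) collapse onto the same LDR value, so the
# round trip cannot reproduce them.
round_trip_error = [o - r for o, r in zip(original, restored)]
```

The round-trip error is never larger than 3 in this 10-to-8-bit case, but it is nonzero for most inputs, which is the information loss the slide refers to.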

The coding flow of the enhancement layer for each macroblock is arranged as follows

SVC structure for BDS
[Park08] S. Park and K. R. Rao, "Bit-Depth Scalable Video Coding Based on H.264/AVC," IEICE Trans. Fundamentals Letter, Vol. E91-A, No. 6, pp. 1541-1544, June 2008.

Generate a mapping function
• Inverse tone-mapping (iTM) to expand the dynamic range of LDR (Low Dynamic Range) sequences
• Linear scaling is the simplest approach
• Causes severe noise at the borders between bright and dark areas and makes contrast change sharply
• Mapping Function (MF) approach
• Arithmetic mean: not computationally expensive and easy to use to obtain a one-to-many mapping function [Mantiuk06]
• i runs over the pixels of a frame and j is a pixel value in the LDR sequence; for each j, the HDR (High Dynamic Range) pixel values of the pixels falling into bin j are averaged, with the bin frequency (the number of pixels falling into bin j) as the divisor.
• Mapping information is sent in a sequence parameter set (SPS) for the entire sequence
• Can be overridden, depending on the features of each frame, by being sent in a picture parameter set (PPS) or a slice header.
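The arithmetic-mean mapping function can be sketched as follows. This is a minimal illustration, assuming co-located LDR and HDR samples of one frame are given as flat sequences; the function name, the rounding to integers, and the sample values are assumptions, not taken from the original work.

```python
from collections import defaultdict
from typing import Dict, Sequence


def build_mapping_function(ldr: Sequence[int], hdr: Sequence[int]) -> Dict[int, int]:
    """For each LDR value j, average the co-located HDR values that fall into bin j."""
    if len(ldr) != len(hdr):
        raise ValueError("LDR and HDR frames must have the same number of pixels")
    sums: Dict[int, int] = defaultdict(int)
    counts: Dict[int, int] = defaultdict(int)
    for j, v in zip(ldr, hdr):  # one iteration per pixel i
        sums[j] += v
        counts[j] += 1          # frequency of bin j
    # Arithmetic mean per bin, rounded to the nearest integer sample value.
    return {j: round(sums[j] / counts[j]) for j in sums}


# Tiny illustrative frame: 8-bit LDR samples and their 10-bit HDR counterparts.
ldr_frame = [10, 10, 10, 20, 20]
hdr_frame = [40, 42, 44, 81, 83]
mf = build_mapping_function(ldr_frame, hdr_frame)
```

LDR values that never occur in the frame have no entry in the table; how such gaps are filled (for example by interpolation) is left open here.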

Bitstream in H.264/AVC [Wiegand03]
• VCL (Video Coding Layer) NAL (Network Abstraction Layer) unit
• Contains the values of the samples in the video pictures
• Non-VCL NAL unit
• Contains associated additional information such as parameter sets
• One frame can be one slice or split into several slices, but one frame corresponds to one slice in my research
[Wiegand03] T. Wiegand et al., "Overview of the H.264/AVC video coding standard," IEEE Trans. CSVT, vol. 13(7), pp. 560-576, July 2003.
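The VCL / non-VCL distinction can be read from the one-byte NAL unit header of H.264/AVC. The sketch below parses that byte and treats nal_unit_type 1 to 5 as VCL; the additional NAL unit types introduced by the SVC extension are deliberately left out, and the function name is illustrative.

```python
def parse_nal_header(first_byte: int) -> dict:
    """Split the one-byte H.264/AVC NAL unit header into its three fields."""
    forbidden_zero_bit = (first_byte >> 7) & 0x01
    nal_ref_idc = (first_byte >> 5) & 0x03
    nal_unit_type = first_byte & 0x1F
    return {
        "forbidden_zero_bit": forbidden_zero_bit,
        "nal_ref_idc": nal_ref_idc,
        "nal_unit_type": nal_unit_type,
        # Types 1..5 are coded slice data (VCL); 7 is an SPS and 8 is a PPS.
        "is_vcl": 1 <= nal_unit_type <= 5,
    }


sps = parse_nal_header(0x67)        # sequence parameter set
pps = parse_nal_header(0x68)        # picture parameter set
idr_slice = parse_nal_header(0x65)  # coded slice of an IDR picture
```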

Scaling and offset on the MB basis
• Spatially varying approach on an MB basis
• An MB-based scaling factor and offset are computed, based on the sum of absolute differences (SAD), to obtain a prediction of each MB of the HDR sequence
• scaling_factor s can be {1, 1.5, 2, 2.5, 3, 3.5, 4}
• For each scaling_factor, the offset giving the minimum SAD of Residual(x,y) is calculated
• W and H are set to 16 and (m,n) is the starting position of each MB
• The prediction generated from a mapping table is obtained by applying MF(.)
• MF(.) is the mapping function for the inverse tone-mapping process
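The search over scaling factors and offsets can be sketched as below. This is not the formula from the original slides: it assumes the prediction is s times the base-layer sample plus an offset, and it picks the offset as the rounded median of the differences, because the median minimizes a sum of absolute deviations for a constant offset. All function names and the test macroblock are hypothetical.

```python
from statistics import median
from typing import List, Tuple

SCALING_FACTORS = (1, 1.5, 2, 2.5, 3, 3.5, 4)


def best_scale_and_offset(bl_mb: List[List[int]],
                          hdr_mb: List[List[int]]) -> Tuple[float, int, float]:
    """Return (scaling_factor, offset, SAD) with the smallest SAD for one MB."""
    bl = [v for row in bl_mb for v in row]
    hdr = [v for row in hdr_mb for v in row]
    best = None
    for s in SCALING_FACTORS:
        diffs = [h - s * b for h, b in zip(hdr, bl)]
        offset = round(median(diffs))                # assumed offset rule
        sad = sum(abs(d - offset) for d in diffs)    # SAD of the residual
        if best is None or sad < best[2]:
            best = (s, offset, sad)
    return best


# Synthetic 16x16 macroblock (W = H = 16): HDR samples are exactly 4 * BL + 3.
W = H = 16
bl_block = [[x + y for x in range(W)] for y in range(H)]
hdr_block = [[4 * v + 3 for v in row] for row in bl_block]
s, offset, sad = best_scale_and_offset(bl_block, hdr_block)
```

For the synthetic block the search recovers the scaling factor and offset used to build it, with zero residual.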

Experimental results

• A coding gain of 0.14 dB, or a 1.2% reduction in bit rate, is obtained for 10 bits/pixel test sequences.
• The coding gain for 12 bits/pixel test sequences reaches up to 4.2 dB, or a 48% reduction in bit rate.
• This approach keeps the increase in complexity to a minimum by avoiding motion estimation in the enhancement layer
• Increases the robustness of quality when the mapping function table is not updated frequently

Future Works related to H.265
• H.265 design project from the VCEG meeting in Geneva, Apr. 2008 [Lee08]
• Progressive-scan (only)
• Picture sizes: QVGA, VGA, 1080p60, 2k x 4k
• Frame rates: 12.5/15, 24/25/30, 50/60, 100/120
• Picture size/grid conversion within the design (e.g., between 4:4:4 and 4:2:0; 8 bpp, 10 bpp, 12 bpp)
• Sampling grid: 4:2:0, 4:4:4, Bayer Color Array
• Views: 1, N > 1
• Complexity issues: portable encoders, parallelism, memory bandwidth, asymmetry (can shift balance from encoder to decoder for videoconferencing, surveillance, and mobile camcorders)
[Lee08] From Dr. Yung-Lyul Lee at Sejong University in Korea, presently a visiting professor at UTA.
