MPEG-4 Video Compression
• The MPEG-4 visual standard has been explicitly optimised for three bit-rate ranges: below 64 kbit/s, 64 to 384 kbit/s, and 384 kbit/s to 4 Mbit/s.
• It provides content-based interactivity by coding and representing video objects rather than video frames, which enables content-based applications.
• It represents arbitrarily shaped video objects, and each object can be encoded with different parameters and at a different quality. The shape of a video object can be represented by a binary or a grey-level (alpha) plane, and the texture is coded separately from the shape.
• It supports both interlaced and progressive material.
• It supports the 4:2:0 chrominance format, in which the Cb and Cr components each have half as many samples as the luminance component in both the horizontal and the vertical direction. Each component can be represented with 4 to 12 bits per sample.
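As a worked example of the 4:2:0 format, the sketch below computes the plane dimensions and the raw (uncompressed) size of one frame. The function name is hypothetical, and it assumes even luminance dimensions so that the halved chroma planes have integer sizes.

```python
def yuv420_frame_size(width, height, bits_per_sample=8):
    """Plane dimensions and raw size in bits of one 4:2:0 frame.

    Cb and Cr are each subsampled by 2 horizontally and vertically,
    so even luminance dimensions are assumed.
    """
    if width % 2 or height % 2:
        raise ValueError("this sketch assumes even width and height")
    if not 4 <= bits_per_sample <= 12:
        raise ValueError("MPEG-4 visual allows 4 to 12 bits per component")
    luma = (width, height)
    chroma = (width // 2, height // 2)              # size of Cb and of Cr
    samples = width * height + 2 * chroma[0] * chroma[1]
    return luma, chroma, samples * bits_per_sample


# CIF luminance (352x288) at 8 bits per sample
luma, chroma, bits = yuv420_frame_size(352, 288)
print(luma, chroma, bits // 8)   # (352, 288) (176, 144) 152064
```

The total works out to 1.5 samples per luminance pixel, half of the 3 samples per pixel needed without chroma subsampling.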
[Figure: data structure in the visual part of MPEG-4]
Data structure in the visual part of MPEG-4
• Visual Object Sequence (VS): the complete MPEG-4 scene, which may contain any 2-D or 3-D natural or synthetic objects and their enhancement layers.
• Video Object (VO): a particular (2-D) object in the scene. In the simplest case this is a rectangular frame; otherwise it is an arbitrarily shaped object corresponding to an object or to the background of the scene.
• Video Object Layer (VOL): each video object can be encoded in scalable (multi-layer) or non-scalable (single-layer) form, depending on the application, and each layer is represented by a VOL. The VOL therefore provides the support for scalable coding. A video object can be encoded using spatial or temporal scalability, going from coarse to fine resolution.
• Group of Video Object Planes (GOV): groups video object planes together. GOVs provide points in the bitstream where video object planes are encoded independently of each other, and can thus serve as random access points into the bitstream. GOVs are optional.
• Video Object Plane (VOP): a time sample of a video object. VOPs can be encoded independently of each other, or dependently on each other using motion compensation. A conventional video frame can be represented by a VOP with a rectangular shape.
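The containment hierarchy can be pictured as nested records. This is a minimal sketch of the hierarchy only, not of the bitstream syntax; all class and field names are hypothetical, and for simplicity every VOL holds its VOPs inside GOVs even though GOVs are optional.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VOP:                       # Video Object Plane: one time sample of a VO
    time: float
    coding_type: str             # "I", "P" or "B"
    rectangular: bool = True     # a conventional frame is a rectangular VOP


@dataclass
class GOV:                       # Group of VOPs (optional random-access grouping)
    vops: List[VOP] = field(default_factory=list)


@dataclass
class VOL:                       # Video Object Layer: base or enhancement layer
    layer_id: int
    govs: List[GOV] = field(default_factory=list)


@dataclass
class VO:                        # Video Object: one object in the scene
    name: str
    layers: List[VOL] = field(default_factory=list)


@dataclass
class VisualObjectSequence:      # VS: the complete scene
    objects: List[VO] = field(default_factory=list)


def count_vops(vs):
    """Total number of VOPs over all objects, layers and GOVs."""
    return sum(len(gov.vops) for vo in vs.objects
               for vol in vo.layers for gov in vol.govs)


background = VO("background", [VOL(0, [GOV([VOP(0.0, "I"), VOP(0.04, "P")])])])
person = VO("person", [VOL(0, [GOV([VOP(0.0, "I", rectangular=False)])])])
scene = VisualObjectSequence([background, person])
print(count_vops(scene))  # 3
```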
Block diagram of natural video decoding
• The shape, texture and motion information of every VOP are coded together.
[Figure: decoder block diagram with blocks for the shape of VOPs, the MVs of VOP macroblocks, the prediction error data, the image store of previous frames, reconstruction of each VOP from the motion-compensated previous VOP bounded by its shape, and composition of all VOPs]
Shape coding tool
• The shape of a VOP is bounded by a rectangular window whose width and height are multiples of 16 pixels.
• The position of the bounding rectangle is chosen so that it contains the minimum number of 16x16 blocks with non-transparent pixels.
• The binary matrix representing the shape of a VOP is called the binary mask. In this mask every pixel belonging to the VOP is set to 255, and all other pixels are set to 0.
• Every VOP is partitioned into 16x16 Binary Alpha Blocks (BABs) for coding.
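The window and BAB partitioning can be sketched as follows, using the 255/0 binary mask described above. This is a simplification: the window is anchored at the top-left corner of the tight bounding box of the object instead of searching for the position that minimises the number of non-transparent blocks, and the function names are hypothetical.

```python
def round_up(n, mb=16):
    """Smallest multiple of mb that is >= n."""
    return -(-n // mb) * mb


def vop_bounding_window(mask, mb=16):
    """(x0, y0, width, height) of a window around the non-zero mask pixels,
    with width and height rounded up to multiples of mb."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None                                  # fully transparent VOP
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    x0, y0 = cols[0], rows[0]
    return x0, y0, round_up(cols[-1] - x0 + 1, mb), round_up(rows[-1] - y0 + 1, mb)


def split_into_babs(mask, window, mb=16):
    """Cut the window into mb x mb Binary Alpha Blocks in raster order;
    pixels beyond the mask edge are treated as transparent (0)."""
    x0, y0, w, h = window
    height, width = len(mask), len(mask[0])

    def px(x, y):
        return mask[y][x] if 0 <= y < height and 0 <= x < width else 0

    return [[[px(x0 + bx + i, y0 + by + j) for i in range(mb)] for j in range(mb)]
            for by in range(0, h, mb) for bx in range(0, w, mb)]


# 40x40 mask with a 20x18 opaque rectangle whose top-left corner is (5, 7)
mask = [[255 if 5 <= x < 25 and 7 <= y < 25 else 0 for x in range(40)]
        for y in range(40)]
window = vop_bounding_window(mask)
print(window, len(split_into_babs(mask, window)))   # (5, 7, 32, 32) 4
```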
Shape coding
BAB: Binary Alpha Block
• Context-based arithmetic encoding (CAE) is used to code intra shape, and to code the shape update for inter shape.
• The context is formed by neighbouring shape pixels.
  Intra context (10 pixels of the current BAB; ? is the pixel being coded):
        C9 C8 C7
     C6 C5 C4 C3 C2
     C1 C0 ?
  Inter context (9 pixels; C6 is the pixel of the motion-compensated (MC) BAB co-located with ?):
     current BAB:     MC BAB:
     C3 C2 C1            C8
     C0 ?             C7 C6 C5
                         C4
• Context = Σ_k C_k · 2^k, where each C_k is 0 or 1.
• The context is used to access a probability table, which generates the probability intervals for arithmetic coding.
• The pixels are coded one by one, left to right and top to bottom, building up the arithmetic codeword for the BAB; each BAB is coded into a single binary arithmetic codeword.
• Pixels outside the context bounding box are assumed to be 0.
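The intra context number can be computed as in the sketch below. Assumptions: the BAB is a list of rows holding 0 or non-zero values; the offsets follow the intra template (C0 and C1 to the left on the current row, C2 to C6 on the row above from right to left, C7 to C9 two rows above from right to left); and pixels outside the block are 0, as this slide states. In the standard itself the border pixels are taken from neighbouring, already-coded BABs. The names are hypothetical.

```python
# (dx, dy) offsets of context pixels C0..C9 relative to the coded pixel;
# dy < 0 means rows above the current one.
INTRA_TEMPLATE = [(-1, 0), (-2, 0),                               # C0, C1
                  (2, -1), (1, -1), (0, -1), (-1, -1), (-2, -1),  # C2..C6
                  (1, -2), (0, -2), (-1, -2)]                     # C7..C9


def intra_context(bab, x, y):
    """Context number sum(C_k * 2**k) for pixel (x, y) of a binary block."""
    size_y, size_x = len(bab), len(bab[0])
    ctx = 0
    for k, (dx, dy) in enumerate(INTRA_TEMPLATE):
        xx, yy = x + dx, y + dy
        if 0 <= xx < size_x and 0 <= yy < size_y and bab[yy][xx]:
            ctx |= 1 << k                 # C_k = 1 sets bit k
    return ctx


ones = [[255] * 4 for _ in range(4)]
print(intra_context(ones, 0, 0))   # 0: every template pixel is outside the block
print(intra_context(ones, 2, 2))   # 1019: all bits set except C2 (x + 2 is outside)
```

With 10 context pixels there are 2^10 = 1024 possible contexts, so the probability table has one entry per context.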
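Shape coding
• For each context, a table gives the probability that the mask pixel is 0, P(0), or 1, P(1); these probabilities are derived from the analysis of shapes.
• Coding starts from the interval 0.0 to 1.0. For each pixel, the current interval is split into a P(0) part and a P(1) part according to the pixel's context, and the part matching the pixel value becomes the new interval. The final interval gives the codeword for the BAB, so the arithmetic code represents the shape of the object.
[Figure: successive subdivision of the interval into P(0) and P(1) parts for each context, ending in the final interval that gives the codeword for the BAB]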
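A minimal sketch of the interval subdivision described above. It uses exact fractions for clarity, whereas a practical arithmetic coder works with finite-precision integers and renormalisation; the two-entry probability table is made up for illustration and is not the MPEG-4 table.

```python
from fractions import Fraction


def code_interval(bits, contexts, p0_table):
    """Narrow [low, low + width) once per pixel: the lower part, of size
    P(0) * width, codes a 0 and the upper part codes a 1.
    p0_table maps a context number to P(pixel == 0)."""
    low, width = Fraction(0), Fraction(1)
    for bit, ctx in zip(bits, contexts):
        p0 = Fraction(p0_table[ctx])
        if bit == 0:
            width *= p0                  # keep the P(0) part
        else:
            low += width * p0            # skip the P(0) part
            width *= 1 - p0              # keep the P(1) part
    return low, low + width              # any value in here identifies the BAB


toy_table = {0: Fraction(9, 10), 1: Fraction(1, 5)}   # hypothetical contexts
print(code_interval([0, 0, 1], [0, 0, 1], toy_table))
# (Fraction(81, 500), Fraction(81, 100))
```

Probable pixel values keep most of the interval, so a well-predicted BAB ends with a wide final interval, which needs only a few bits to identify.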
Grey-scale shape coding tool
• Grey-scale shape information has a structure similar to that of binary shape, except that every pixel (element of the matrix) can take a range of values (usually 0 to 255) representing its degree of transparency.
• Grey-scale shape information is encoded using a block-based, motion-compensated DCT, similar to texture coding.
• Grey-scale shapes are needed to feather the boundaries of objects into their backgrounds, so that the object boundaries do not appear harsh.
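To see how a grey-scale alpha plane feathers a boundary, here is a generic alpha-blending sketch of the kind a compositor could apply after decoding. The blend formula and its rounding (adding 127 before integer division by 255) are illustrative assumptions, not part of the MPEG-4 video syntax.

```python
def composite(obj, bg, alpha):
    """Blend an object over a background, pixel by pixel.
    alpha = 255 is fully opaque, 0 fully transparent; intermediate values
    mix the two, which softens (feathers) the object boundary."""
    return [[(a * o + (255 - a) * b + 127) // 255
             for o, b, a in zip(orow, brow, arow)]
            for orow, brow, arow in zip(obj, bg, alpha)]


# One row crossing an object edge: opaque, half-transparent, transparent
print(composite([[200, 200, 200]], [[100, 100, 100]], [[255, 128, 0]]))
# [[200, 150, 100]]
```

With a binary mask only the first and last cases occur, so the edge jumps straight from 200 to 100; the intermediate alpha value produces the gradual transition.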
Motion compensation tools
• MPEG-4 adapts the block-based motion compensation techniques used in other standards to the VOP structure:
  • A VOP may be encoded independently of any other VOP; it is then called an Intra VOP (I-VOP).
  • A VOP may be predicted, using motion compensation, from another previously decoded VOP; such VOPs are called Predicted VOPs (P-VOPs).
  • A VOP may be predicted from past as well as future VOPs; such VOPs are called Bidirectional Interpolated VOPs (B-VOPs). B-VOPs may only be interpolated from I-VOPs or P-VOPs.
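Because a B-VOP can only be decoded after the future I- or P-VOP it refers to, VOPs have to be sent in a different order from the display order. The sketch below derives that coding order from a sequence of VOP types; the function name is hypothetical, and it rejects sequences that end with B-VOPs, since those would have no future reference.

```python
def coding_order(display_types):
    """Map VOP types in display order (e.g. "IBBP") to the indices in the
    order they must be coded: each anchor (I or P) is sent before the
    B-VOPs that lie between it and the previous anchor."""
    order, pending_b = [], []
    for idx, vop_type in enumerate(display_types):
        if vop_type == "B":
            pending_b.append(idx)        # wait for the future reference
        else:                            # I- or P-VOP: an anchor
            order.append(idx)
            order.extend(pending_b)      # B-VOPs can now be interpolated
            pending_b = []
    if pending_b:
        raise ValueError("trailing B-VOPs have no future reference VOP")
    return order


print(coding_order("IBBPBBP"))   # [0, 3, 1, 2, 6, 4, 5]
```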
Motion compensation tools
[Figure: motion-compensated coding modes (I, B, P)]
Motion vector computation
• MVs of macroblocks entirely within an object are predicted in the normal way: conventional macroblock matching, advanced prediction, unrestricted prediction, and bidirectional prediction.
• For macroblocks that straddle an object border, the pixels outside the object are padded to minimise prediction errors at the object boundary, and the prediction is then computed.
• No MVs are encoded for macroblocks entirely outside an object.
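Conventional macroblock matching can be sketched as an exhaustive search that minimises the sum of absolute differences (SAD). This covers only integer-pixel matching with displacements that keep the block inside the reference; advanced prediction, unrestricted vectors, sub-pixel refinement and bidirectional prediction are not modelled, and all names are hypothetical.

```python
import random


def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in cur and the block
    displaced by (dx, dy) in ref."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))


def full_search(cur, ref, bx, by, n=16, search=7):
    """Try every integer displacement within +/- search pixels that keeps
    the block inside ref; return the (dx, dy) with the lowest SAD."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= bx + dx and bx + dx + n <= w
                    and 0 <= by + dy and by + dy + n <= h):
                continue                          # candidate leaves the frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


random.seed(0)
ref = [[random.randrange(256) for _ in range(32)] for _ in range(32)]
# Current frame: the content has moved 3 pixels right and 2 pixels up
cur = [[ref[y + 2][x - 3] if y + 2 < 32 and x - 3 >= 0 else 0
        for x in range(32)] for y in range(32)]
print(full_search(cur, ref, 8, 8, n=8))   # (-3, 2)
```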
Motion compensation: padding
• Padding repeats the pixel value at the object boundary out to the edge of the macroblock. Where repeats from two boundary pixels overlap, their average is used.
• Extended padding repeats this process into macroblocks that lie outside the object but are adjacent to boundary macroblocks.
[Figures: normal padding of a block; extended padding of a block; padding of a VOP]
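A minimal sketch of repetitive padding for one block: a horizontal pass along each row, then a vertical pass that fills the pixels still undefined, using integer averaging where two defined pixels surround a gap. The names are hypothetical and the exact rounding is an assumption; extended padding of fully transparent blocks is not handled (such a block comes back as all None).

```python
def pad_line(line):
    """Fill None entries from the nearest defined entries on either side:
    the average of both if both exist, otherwise a copy of the one that does."""
    known = [i for i, v in enumerate(line) if v is not None]
    if not known:
        return list(line)
    out = list(line)
    for i, v in enumerate(line):
        if v is None:
            left = max((k for k in known if k < i), default=None)
            right = min((k for k in known if k > i), default=None)
            if left is not None and right is not None:
                out[i] = (line[left] + line[right]) // 2
            else:
                out[i] = line[left if left is not None else right]
    return out


def repetitive_pad(block, mask):
    """Pad the transparent pixels (mask == 0) of a block."""
    h, w = len(block), len(block[0])
    rows = [pad_line([block[y][x] if mask[y][x] else None for x in range(w)])
            for y in range(h)]                                      # horizontal
    cols = [pad_line([rows[y][x] for y in range(h)]) for x in range(w)]  # vertical
    return [[cols[x][y] for x in range(w)] for y in range(h)]


block = [[10, 0, 30, 0], [0, 0, 0, 0], [0, 50, 0, 0]]
mask = [[1, 0, 1, 0], [0, 0, 0, 0], [0, 1, 0, 0]]
print(repetitive_pad(block, mask))
# [[10, 20, 30, 30], [30, 35, 40, 40], [50, 50, 50, 50]]
```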
Texture coding tools
• An 8x8 block-based DCT is used. To encode an arbitrarily shaped VOP, an 8x8 grid is superimposed on the VOP.
  • 8x8 blocks that are internal to the VOP are encoded without modification.
  • Blocks that straddle the VOP boundary are called boundary blocks and are treated differently from internal blocks.
• The transformed blocks are quantized, and individual coefficients can be predicted from neighbouring blocks to further reduce their entropy.
• The coefficients are then scanned to reduce the average run length between two coded coefficients.
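For reference, the sketch below is a direct (slow, O(N^4)) orthonormal 2-D DCT-II of an 8x8 block, written for readability rather than speed; real codecs use fast factorisations. The function name is hypothetical.

```python
import math


def dct2_8x8(block):
    """Orthonormal 2-D DCT-II of an 8x8 block given as a list of rows."""
    n = 8

    def alpha(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):              # vertical frequency
        for v in range(n):          # horizontal frequency
            s = 0.0
            for y in range(n):
                for x in range(n):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


coeffs = dct2_8x8([[10] * 8 for _ in range(8)])
print(round(coeffs[0][0], 6))   # 80.0: a flat block has only a DC coefficient
```

A flat block concentrates all of its energy in the DC coefficient, which is why smooth regions compress well once the near-zero AC coefficients are quantized.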
Texture coding tools
• Macroblocks entirely within an object are encoded in the normal way.
• Macroblocks entirely outside an object are not encoded.
• Macroblocks that straddle an object border are padded before the DCT, so that the sharp object edge does not produce ringing (large high-frequency DCT coefficients).
Adaptive AC/DC prediction
• The prediction direction is adaptive: it is selected by comparing the horizontal and vertical DC gradients (the increase or decrease in DC value) of the surrounding blocks A, B and C.
• Two types of prediction are possible, DC prediction and AC prediction:
  • DC prediction: only the DC coefficient is predicted, either from the DC coefficient of block A or from that of block C.
  • AC prediction: either the first row or the first column of coefficients of the current block is predicted from the co-sited coefficients of the selected candidate block. Differences in the quantization of the candidate block are accounted for by scaling by the ratio of the quantization step sizes.
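The direction choice can be sketched as below, assuming the usual layout in which A is the block to the left of the current block X, B the block above-left and C the block above. A and B are vertically adjacent, so |A - B| measures a vertical gradient, while B and C are horizontally adjacent, so |B - C| measures a horizontal one. Quantization and rounding details are omitted, and the function name is hypothetical.

```python
def dc_prediction(dc_a, dc_b, dc_c):
    """Return (source block, predicted DC) for the current block X.
    A small vertical gradient |A - B| suggests X resembles the block above
    it, so predict from C; otherwise predict from A."""
    if abs(dc_a - dc_b) < abs(dc_b - dc_c):
        return "C (above)", dc_c
    return "A (left)", dc_a


print(dc_prediction(100, 102, 140))   # ('C (above)', 140)
print(dc_prediction(100, 140, 141))   # ('A (left)', 100)
```

The encoder then transmits only the difference between the actual DC value of X and the returned prediction; the decoder repeats the same comparison on already-decoded blocks, so no direction flag is needed for the DC coefficient.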
Coefficient scanning
1. Zig-zag scan: the coefficients are read out diagonally.
2. Alternate-horizontal scan: the coefficients are read out with an emphasis on the horizontal direction first. When AC prediction is used, this scan is selected if the prediction is taken from the block above (vertical prediction).
3. Alternate-vertical scan: similar to the alternate-horizontal scan, but with the emphasis on the vertical direction. When AC prediction is used, this scan is selected if the prediction is taken from the block to the left (horizontal prediction).
Quantization of AC spectral components
• Two types of quantization are available:
  • The first method uses one of two available quantization matrices to vary the quantization step size with the spatial frequency of the coefficient.
  • The second method uses the same quantization step size for all coefficients.
• MPEG-4 also allows non-linear quantization of DC values.
Interlaced coding mode
• Supports both progressive and interlaced material
• Motion compensation for fields or frames, similar to that of MPEG-2
• Modified AC/DC prediction
• Field DCT
• Interlaced I-, P- and B-VOP coding
• Modified prediction for motion coding
• Modified scan rules
• 10% better compression efficiency than MPEG-2
Interlaced coding
• Frame DCT coding: each luminance block is composed of lines from the two fields alternately.
• Field DCT coding: each luminance block is composed of lines from only one of the two fields.
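The difference between the two modes lies only in how the 16 lines of a luminance macroblock are grouped before being cut into four 8x8 blocks. This sketch assumes that even rows (counting from 0) form the top field and odd rows the bottom field; the function name is hypothetical.

```python
def luma_blocks(mb, field_dct):
    """Split a 16x16 luminance macroblock (list of 16 rows) into four 8x8
    blocks in raster order. Frame DCT keeps the interleaved line order;
    field DCT first stacks the top field (even rows) over the bottom field
    (odd rows), so each block holds lines of one field only."""
    rows = mb[0::2] + mb[1::2] if field_dct else mb
    return [[row[x0:x0 + 8] for row in rows[y0:y0 + 8]]
            for y0 in (0, 8) for x0 in (0, 8)]


mb = [[y] * 16 for y in range(16)]            # each pixel holds its row number
print([r[0] for r in luma_blocks(mb, False)[0]])  # [0, 1, 2, 3, 4, 5, 6, 7]
print([r[0] for r in luma_blocks(mb, True)[0]])   # [0, 2, 4, 6, 8, 10, 12, 14]
```

When there is motion between the two fields, field grouping keeps each block free of the comb-like line-to-line differences that would otherwise create high vertical frequencies.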
Scalability
• Object scalability: achieved by the data structures used and by shape coding
• Temporal scalability: achieved by the generalized scalability mechanism
• Spatial scalability: achieved by the generalized scalability mechanism
Temporal scalability
• Temporal scalability is achievable for both rectangular frames and arbitrarily shaped VOPs.
• The base layer is encoded as conventional MPEG-4 video.
• The enhancement layer is encoded using one of the following two mechanisms:
  • Type 1: the enhancement layer improves the temporal resolution of only a portion of the base layer.
  • Type 2: the enhancement layer improves the temporal resolution of the entire base layer.
Temporal enhancement types
[Figure: Type 1, only a portion of the base layer is enhanced in the enhancement layer; Type 2, the enhancement layer improves the resolution of the entire base layer]
Temporal scalability Type 1
[Figure: only a portion of the VOP in the base layer is enhanced]
Temporal scalability Type 2
[Figure: the entire VOP in the base layer is enhanced]
Spatial scalability
• The base layer is coded as conventional MPEG-4 video.
• The enhancement layer is encoded using prediction from the base layer.
Spatial scalability
• VOPs of the enhancement layer are encoded as P-VOPs or B-VOPs.
Error resilience tools
• Resynchronization markers: unique markers in the bitstream allow the decoder, when it detects an error, to skip the remaining bits up to the next marker and restart decoding from that point.
• Data partitioning: the bits for motion information are separated from those for texture information. If an error occurs, more efficient error concealment can then be applied; for instance, when the error affects only the texture bits, the decoded motion information can still be used.
• Header extension code (HEC): this binary code allows the optional inclusion of redundant copies of header information that is vital for correct decoding. This reduces the chance that corrupted header information causes large portions of the bitstream to be skipped.
Error resilience tools
• Reversible VLCs (RVLCs): these further reduce the influence of errors on the decoded data. RVLC codewords can be decoded in the forward as well as the backward direction. When an error occurs and the decoder skips to the next resynchronization marker, it can still decode part of the corrupted segment in reverse order, starting from that marker, which limits the influence of the error.
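The idea behind reversible decoding can be shown with a toy code. The four-symbol table below is hypothetical, not the MPEG-4 RVLC table: every codeword is a palindrome and no codeword is a prefix of another, so the set is also suffix-free and the same greedy decoder works on the bits read in either direction.

```python
# Hypothetical reversible code: palindromic, prefix-free codewords.
CODE = {"a": "0", "b": "11", "c": "101", "d": "1001"}
LOOKUP = {v: k for k, v in CODE.items()}


def encode(symbols):
    return "".join(CODE[s] for s in symbols)


def decode(bits, backward=False):
    """Greedy prefix decoding. Backward decoding reads the bits in reverse;
    palindromic codewords look the same from either end."""
    if backward:
        bits = bits[::-1]
    out, word = [], ""
    for b in bits:
        word += b
        if word in LOOKUP:               # prefix-free: the first match is right
            out.append(LOOKUP[word])
            word = ""
    if word:
        raise ValueError("trailing bits do not form a codeword")
    return out[::-1] if backward else out


bits = encode("abcd")                    # "0111011001"
print(decode(bits), decode(bits, backward=True))
# ['a', 'b', 'c', 'd'] ['a', 'b', 'c', 'd']
```

After an error, a decoder using such a code can decode forwards from the previous resynchronization marker up to the error and backwards from the next marker, recovering symbols on both sides of the damaged region.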
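Error resilience tools
• In MPEG-4, resynchronization markers are located at the start of each picture and at the start of each video packet; packets begin at macroblock boundaries and are spaced by roughly equal numbers of bits rather than a fixed number of macroblocks.
• In H.263, resynchronization markers are located at the start of each picture and at the start of each Group of Blocks (GOB).
[Figure: picture start codes and resync markers in an H.263 bitstream and in an MPEG-4 bitstream]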
Static sprite coding tools
• A sprite consists of those regions of a VO that are present in the scene throughout the video segment.
• An obvious example is a 'background sprite' (also called a 'background mosaic'), which consists of all pixels belonging to the background in a camera-panning sequence.
Sprite coding tools
• Low-latency sprite coding: only a portion of the sprite is transmitted at the beginning, and the remainder is transmitted piece-wise, as required or as the bandwidth allows. Another method is to transmit the entire sprite progressively, starting with a low-quality version and gradually improving its quality by transmitting residual images.
Static texture: wavelet transform
• The static texture coding technique is based on a wavelet transform.
• Lx, Ly are low-pass filters in the x and y directions; Hx, Hy are high-pass filters in the x and y directions.
• With these filters an image can be recursively decomposed into four subimages: LxLy, HxLy, LxHy and HxHy.
• Each subimage is quantised and entropy coded, choosing the number of bits per subimage to optimise the quality of the image.
• Example for a 2x2 block of pixels
      A B
      C D
  Horizontal filtering gives Lx: (A+B)/2, (C+D)/2 and Hx: (A-B)/2, (C-D)/2. Vertical filtering (the sum and the difference of the two rows) then gives
      LxLy = (A+B+C+D)/2        HxLy = ((A-B)+(C-D))/2
      LxHy = ((A+B)-(C+D))/2    HxHy = ((A-B)-(C-D))/2
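One decomposition level built from the 2x2 sum and difference formulas on this slide can be sketched as follows. These Haar-style filters keep the arithmetic visible but are not necessarily the filters the standard uses; the function name is hypothetical, and even image dimensions are assumed.

```python
def haar_level(img):
    """Return the subimages (LxLy, HxLy, LxHy, HxHy), each half the width
    and height of img, using the 2x2 formulas for A B / C D."""
    ll, hl, lh, hh = [], [], [], []
    for y in range(0, len(img), 2):
        rows = ([], [], [], [])
        for x in range(0, len(img[0]), 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            rows[0].append((a + b + c + d) / 2)          # LxLy
            rows[1].append(((a - b) + (c - d)) / 2)      # HxLy
            rows[2].append(((a + b) - (c + d)) / 2)      # LxHy
            rows[3].append(((a - b) - (c - d)) / 2)      # HxHy
        for band, row in zip((ll, hl, lh, hh), rows):
            band.append(row)
    return ll, hl, lh, hh


print(haar_level([[1, 2], [3, 4]]))   # ([[5.0]], [[-1.0]], [[-2.0]], [[0.0]])
```

Applying `haar_level` again to the LxLy output gives the next, coarser level of the recursive decomposition.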
Wavelet transform: DC sub-band
• The DC sub-band is encoded using a predictive scheme.
• Each coefficient is predicted from its left or its top neighbour, depending on which is likely to be closer in value.
• The prediction difference is then arithmetic coded.
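A sketch of such a predictive (DPCM) scheme with its inverse. How "closer" is judged is an assumption here: the sketch uses the same gradient rule as block DC prediction, comparing the left (a), upper-left (b) and upper (c) neighbours, with 0 outside the band. The arithmetic coding of the residuals is omitted, and the names are hypothetical.

```python
def _neighbours(band, y, x):
    """Left (a), upper-left (b) and upper (c) neighbours; 0 outside the band."""
    def get(yy, xx):
        return band[yy][xx] if yy >= 0 and xx >= 0 else 0
    return get(y, x - 1), get(y - 1, x - 1), get(y - 1, x)


def _predict(a, b, c):
    # a and b are vertically adjacent: a small |a - b| suggests the value
    # continues vertically, so use the upper neighbour; else use the left one.
    return c if abs(a - b) < abs(b - c) else a


def dc_band_residuals(band):
    return [[band[y][x] - _predict(*_neighbours(band, y, x))
             for x in range(len(band[0]))] for y in range(len(band))]


def dc_band_reconstruct(res):
    h, w = len(res), len(res[0])
    band = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):          # raster order: neighbours already decoded
            band[y][x] = res[y][x] + _predict(*_neighbours(band, y, x))
    return band


band = [[10, 12], [11, 13]]
res = dc_band_residuals(band)
print(res, dc_band_reconstruct(res) == band)   # [[10, 2], [1, 1]] True
```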
Wavelet transform: AC sub-bands
• Many of the coefficients of the AC sub-bands become zero after quantisation.
• There is a strong correlation between the amplitudes of the wavelet coefficients across scales.
• The zero-tree algorithm exploits this correlation: if a node in the tree has value X, its descendants are likely to be similar in magnitude; in particular, if a coefficient is zero, its descendants are likely to be zero too, so the whole tree can be signalled with one symbol. The resulting patterns are then arithmetic coded.
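The zero-tree test can be sketched for a single orientation (one chain of AC sub-bands from coarse to fine). Assumptions: `levels[0]` is the coarsest band, each finer band doubles the width and height, and the children of coefficient (r, c) are the 2x2 block at (2r, 2c) in the next band. The names are hypothetical, and the full zero-tree coder (symbol alphabet, significance passes) is not shown.

```python
def descendants(levels, lvl, r, c):
    """Yield the values of all descendants of coefficient (r, c) in band lvl."""
    for nl in range(lvl + 1, len(levels)):
        scale = 2 ** (nl - lvl)
        for y in range(r * scale, (r + 1) * scale):
            for x in range(c * scale, (c + 1) * scale):
                yield levels[nl][y][x]


def is_zerotree_root(levels, lvl, r, c):
    """True if the coefficient and every descendant quantised to zero,
    so the whole tree can be coded with a single symbol."""
    return (levels[lvl][r][c] == 0
            and all(v == 0 for v in descendants(levels, lvl, r, c)))


levels = [[[0, 3],
           [0, 0]],
          [[0, 0, 0, 0],
           [0, 0, 0, 0],
           [2, 0, 0, 0],
           [0, 0, 0, 0]]]
print([is_zerotree_root(levels, 0, r, c) for r in (0, 1) for c in (0, 1)])
# [True, False, False, True]
```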
Shape-adaptive wavelet coding
• Generalization of the wavelet transform to arbitrarily shaped VOPs
  • The number of transformed coefficients in the VOP equals the number of pixels in the VOP.
• Generalization of zero-tree coding
  • No extra bits are needed for pixels outside the VOP.
Wavelet coding: SNR scalability
[Figure: SNR-scalable bitstream with layers of 30 kbits, 8 kbits and 5 kbits]
Wavelet coding: spatial scalability
[Figure: spatially scalable bitstream with layers of 47 kbits, 14 kbits and 34 kbits]
12-bit video coding tool
• Allows compression of video data with a precision of up to 12 bits/pixel.
• The syntax, semantics and coding tools are extended:
  • bit precision
  • extended DC VLC tables
  • extended quantization mechanism
  • insertion of marker bits to avoid start code emulation
MPEG-4 Systems: Multiplexing
• Place media objects anywhere in a given coordinate system.
• Apply transforms to change the geometrical or acoustical appearance of a media object.
• Group primitive media objects to form compound media objects.
• Apply streamed data to media objects to modify their attributes (e.g. a sound or a moving texture belonging to an object, or animation parameters driving a synthetic face).
• Interactively change the user's viewing and listening points anywhere in the scene.
Multiplexing
[Figure: scope of MPEG-4 Systems. Each media stream is compressed; the streams and the scene description are multiplexed and later demultiplexed; each stream is decompressed, and the compositor assembles the scene]
System Decoder Model
[Figure: scope of MPEG-4 Systems. The demultiplexer (DMUX) feeds decoder buffers DB1, DB2, DB3, ..., DBn; each decoder writes to its composition memory CM1, CM2, CM3, ..., CMn, which feeds the compositor]
FlexMux and TransMux
• FlexMux: multiplexes a group of logically associated media streams.
• TransMux: multiplexes media for transport, using existing standards (e.g. DVB-T, IP over ATM).