
Multimedia Compression - 2




  1. Multimedia Compression - 2

  2. Video Compression Standards • Compression needs • H.261 • H.263 • MPEG (2, 4, 7,21)

  3. Coding Requirements for Video • PAL standard: • Luminance and chrominance signals are encoded separately, and the resulting digital data streams are combined using a multiplexing technique (4:2:2). • A sampling rate of 13.5 MHz is used for the luminance Y, where Y = 0.30R + 0.59G + 0.11B. • A sampling rate of 6.75 MHz is used for each chrominance signal (R−Y and B−Y).

  4. Coding Requirements for Video • With 8-bit coding of each sample, the bandwidth requirement is: (13.5 MHz + 6.75 MHz + 6.75 MHz) × 8 bits = 216 Mbit/s • The storage requirement for one 640 × 480 true-color frame is 640 × 480 × 3 × 8 = 7,372,800 bits. • For one second of playback (25 frames/second), the storage requirement is 184.32 Mbit.
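The arithmetic on this slide can be checked with a short sketch. The sampling rates, frame size and frame rate are the slide's own figures; the function names are illustrative:

```python
# Check the slide's PAL bandwidth and per-frame storage figures.

def pal_bitrate_mbit(bits_per_sample=8):
    # 4:2:2 sampling: Y at 13.5 MHz, each chrominance signal at 6.75 MHz
    return (13.5e6 + 6.75e6 + 6.75e6) * bits_per_sample / 1e6

def frame_bits(width=640, height=480, components=3, bits=8):
    # one true-color frame: 3 color components, 8 bits per sample
    return width * height * components * bits

bandwidth = pal_bitrate_mbit()       # 216.0 Mbit/s
per_frame = frame_bits()             # 7,372,800 bits
per_second = per_frame * 25 / 1e6    # 184.32 Mbit for one second at 25 frames/s
```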

  5. Coding Requirements for Video • For the dialogue mode of operation, based on human perception, the end-to-end delay for compression and decompression should not exceed 150 milliseconds. • In retrieval mode the following demands arise: • Fast forward and backward retrieval with simultaneous display should be possible. • Random access to single images and audio frames, with an access time of less than 0.5 seconds.

  6. Coding Requirements for Video • For both dialogue and retrieval mode the following are required: • Definition of a format independent of frame size and video frame rate. • Various audio and video data rates should be supported, leading to different qualities that can be adjusted to different systems. • Possibility of synchronizing the audio signal with the video. • Coding should be realizable in software. • Portability

  7. Video Compression • Most of these activities were triggered by teleconferencing and videophony applications. For multimedia use we cannot assume that users have dedicated video hardware; we assume that the video will be played back on ordinary computers. • Video data is usually compressed twice: first during capture, then by a software codec. • All video compression algorithms work on digitized video consisting of bitmapped images.

  8. Video Compression • Both MPEG and H.261 extend the DCT-based compression approach with methods for spatial and temporal compression, reducing frame-to-frame redundancies. • The starting frame (reference frame) is compressed first using a JPEG-like DCT-based approach. The encoder reconstructs it and decides, based on the amount of motion, whether to compress the next frame independently or by reference to the previously coded frame. • The motion estimation step involves sliding each reconstructed sub-image around its immediate neighborhood in the next frame and computing a measure of correlation.

  9. H.261 (px64) • Developed for videophone and videoconferencing systems over ISDN. • Since ISDN channels have a bandwidth of 64 kbps, H.261 was defined to operate at data rates that are multiples of 64 kbps. • The standard is p×64 kbit/s (p = 1, 2, …, 30). • Applications: • Two-way telecommunications • Videoconferencing • For these dialogue applications, coding and decoding must be done in real time.

  10. H.261 (px64) • Unlike JPEG, H.261 defines a very precise image format. • The image refresh rate at the input must be 29.97 frames/s. • During encoding, the frame rate can be 10–15 frames/s. • Two resolution formats with a 4:3 aspect ratio are defined: • CIF – Common Intermediate Format (optional) • Luminance component of 352 lines × 288 pixels • Chrominance components sub-sampled to 176 lines × 144 pixels • QCIF – Quarter CIF (mandatory) • 176 × 144 pixels for luminance • 88 × 72 pixels for chrominance

  11. H.261 (px64) • Example • A 29.97 frames/s uncompressed QCIF data stream has a data rate of 9.115 Mbit/s: ((176 × 144 × 8) + 2 × (88 × 72 × 8)) × 29.97 = 9.115 Mbit/s • The necessary compression ratio is 1:47.5. • At the same frame rate, CIF has an uncompressed data rate of 36.45 Mbit/s.
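The QCIF/CIF figures follow directly from the frame geometry and the frame rate. A sketch; note that the 1:47.5 ratio assumes a p = 3 channel (192 kbit/s), which is an inference, not stated on the slide:

```python
# Uncompressed data rate for an H.261 picture format:
# one luminance plane plus two sub-sampled chrominance planes per frame.
def rate_mbit(y_w, y_h, c_w, c_h, fps=29.97, bits=8):
    bits_per_frame = y_w * y_h * bits + 2 * (c_w * c_h * bits)
    return bits_per_frame * fps / 1e6

qcif = rate_mbit(176, 144, 88, 72)      # about 9.115 Mbit/s
cif = rate_mbit(352, 288, 176, 144)     # about 36.46 Mbit/s
ratio = qcif * 1e6 / (3 * 64e3)         # about 47.5, assuming p = 3 (192 kbit/s)
```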

  12. H.261 (px64) • Coding Algorithms • Uses inter-frame and intra-frame coding • Intra-frame coding: • similar to JPEG • Inter-frame coding: • a prediction method is used to find the most similar macroblock in the preceding image. • The motion vector is defined as the relative position of the previous macroblock with respect to the current one.

  13. H.261 (px64) • Layered structure • Block of 8x8 pixels • Macroblock of 4 Y blocks, 1 Cr block, 1 Cb block

  14. H.261 (px64) 16x16 Macroblock of 4 Y blocks, 1 Cr block, 1 Cb block 4:2:0

  15. Intra-frame Coding • Macroblock: • 16 x 16 pixel areas on Y plane of original image. • Usually consists of 4 Y blocks, 1 Cr block, and 1 Cb block (4:2:0 or 4:1:1) • Quantization is by constant value for all DCT coefficients (i.e., no quantization table as in JPEG).
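The difference from JPEG noted in the last bullet can be shown with a toy sketch: one quantizer step applies to the whole block instead of a per-coefficient table. The values and the exact rounding rule here are illustrative, not taken from the standard text:

```python
# Uniform quantization: a single step size for every DCT coefficient in the
# block, unlike JPEG's per-coefficient 8x8 quantization table.
def quantize_block(dct_block, step):
    return [[round(c / (2 * step)) for c in row] for row in dct_block]

def dequantize_block(qblock, step):
    return [[2 * step * q for q in row] for row in qblock]

# Toy 2x2 "block" of DCT coefficients, quantizer step 4:
q = quantize_block([[61, -33], [9, 1]], 4)    # [[8, -4], [1, 0]]
```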

  16. Frame Sequence of H.261 • I-frames are treated as independent images. Transform coding method similar to JPEG is applied within each I-frame, hence “Intra”. • P-frames are not independent: coded by a forward predictive coding method (prediction from a previous P-frame is allowed — not just from a previous I-frame). • Temporal redundancy removal is included in P-frame coding, whereas I-frame coding performs only spatial redundancy removal. • To avoid propagation of coding errors, an I-frame is usually sent a couple of times in each second of the video.

  17. Frame Sequence of H.261 • Two frame types: Intra-frames (I-frames) and Inter-frames (P-frames): I-frame provides an accessing point, it uses basically JPEG.

  18. Inter-frame Coding • Motion vectors in H.261 are always measured in units of full pixels and have a limited range of ±15 pixels, i.e., p = 15. • For each macroblock in the Target frame (current frame), a motion vector is allocated by one of the search methods discussed earlier. • After the prediction, a difference macroblock is derived to measure the prediction error. • Each of its 8 × 8 blocks goes through the DCT, quantization, zigzag scan and entropy coding procedures.

  19. Inter-frame Coding • P-frame coding encodes the difference macroblock (not the Target macroblock itself). • Sometimes a good match cannot be found, i.e., the prediction error exceeds a certain acceptable level. The MB itself is then encoded (treated as an intra MB), and in this case it is termed a non-motion-compensated MB. • For a motion vector, only the difference MVD is sent for entropy coding: MVD = MVPreceding − MVCurrent
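The differential coding in the last bullet is a component-wise subtraction; a minimal sketch (the (x, y) tuple layout of the vectors is illustrative):

```python
# Only the motion-vector difference MVD = MV_preceding - MV_current is
# entropy-coded; the decoder recovers MV_current from the preceding MV and MVD.
def mvd(mv_preceding, mv_current):
    return (mv_preceding[0] - mv_current[0], mv_preceding[1] - mv_current[1])

def recover_mv(mv_preceding, d):
    # invert the difference: MV_current = MV_preceding - MVD
    return (mv_preceding[0] - d[0], mv_preceding[1] - d[1])
```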

  20. Inter-frame Coding

  21. Motion Vector Searches • C(x + k, y + l): macroblock pixels in the Target frame • R(x + i + k, y + j + l): macroblock pixels in the Reference frame • The goal is to find a vector (u, v) such that the Mean Absolute Error MAE(u, v) is minimum, where for an M × N macroblock: MAE(i, j) = (1 / (MN)) Σ_{k=0..M−1} Σ_{l=0..N−1} |C(x + k, y + l) − R(x + i + k, y + j + l)| • Search methods: • Full Search Method • Two-dimensional Logarithmic Search • Hierarchical Motion Estimation
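The Full Search Method can be sketched directly from the MAE criterion above (pure-Python lists; the frame layout and tie-breaking order are illustrative):

```python
# Exhaustive block matching: try every displacement (u, v) in a +/-p window
# and keep the one with the smallest mean absolute error (MAE).
def full_search(block, ref, x, y, p):
    M, N = len(block), len(block[0])
    best = None
    for u in range(-p, p + 1):
        for v in range(-p, p + 1):
            # candidate block must lie fully inside the reference frame
            if not (0 <= x + u <= len(ref) - M and 0 <= y + v <= len(ref[0]) - N):
                continue
            mae = sum(abs(block[k][l] - ref[x + u + k][y + v + l])
                      for k in range(M) for l in range(N)) / (M * N)
            if best is None or mae < best[0]:
                best = (mae, u, v)
    return best[1], best[2]
```

Full search is exact but costly, which is why the slide also lists logarithmic and hierarchical searches as cheaper alternatives.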

  22. Motion Estimation and Compensation • Current block vs. reference block at the same position (x, y). • The mean squared error between the current block and the block at the same position in the reference frame is: {(1−4)² + (3−2)² + (2−3)² + (6−4)² + (4−2)² + (3−2)² + (5−4)² + (4−3)² + (3−3)²} / 9 = 2.44
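The slide's number can be reproduced directly. The two 3 × 3 blocks are read off the nine difference terms above (first operand of each pair = current block, second = reference block):

```python
# Reproduce the slide's MSE computation for the 3x3 current and reference
# blocks taken from the nine squared-difference terms.
cur = [[1, 3, 2], [6, 4, 3], [5, 4, 3]]
ref = [[4, 2, 3], [4, 2, 2], [4, 3, 3]]

mse = sum((c - r) ** 2
          for cur_row, ref_row in zip(cur, ref)
          for c, r in zip(cur_row, ref_row)) / 9
# mse = 22/9, about 2.44
```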

  23. H.263 • H.263 is an improved video coding standard for videoconferencing and other audiovisual services transmitted over Public Switched Telephone Networks (PSTN). • Aims at low bit-rate communication at bit rates of less than 64 kbps (e.g., 28.8 kbps, the speed of a V.34 modem). • Uses predictive coding for inter-frames to reduce temporal redundancy, and transform coding of the remaining signal to reduce spatial redundancy (for both intra-frames and inter-frame prediction residuals). • Video sizes: Sub-QCIF (128×96), QCIF (176×144), CIF (352×288), 4CIF (704×576), 16CIF (1408×1152)

  24. H.263 • Performance is improved in several ways: • To reduce the prediction error, half-pixel precision is supported in H.263 (vs. full-pixel precision only in H.261). • The default range for both the horizontal and vertical components u and v of MV(u, v) is now [−16, 15.5]. • The pixel values needed at half-pixel positions are generated by a simple bilinear interpolation method, as shown in the figure "Half-pixel Prediction by Bilinear Interpolation in H.263".
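Bilinear interpolation over a 2 × 2 neighborhood can be sketched as follows. A, B, C, D are the four surrounding full-pixel values; the integer rounding shown is a common convention and not copied from the standard text:

```python
# Half-pixel values from four neighboring full pixels:
#   A b B      a: full-pixel position (A itself)
#   c d .      b: halfway between A and B, c: halfway between A and C,
#   C . D      d: center of the 2x2 square (average of all four)
def half_pixels(A, B, C, D):
    a = A
    b = (A + B + 1) // 2          # horizontal half-pixel, rounded
    c = (A + C + 1) // 2          # vertical half-pixel, rounded
    d = (A + B + C + D + 2) // 4  # diagonal half-pixel, rounded
    return a, b, c, d
```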

  25. H.263 • PB frames mode (bidirectional prediction) • In H.263, a PB-frame consists of two pictures being coded as one unit. • - The PB-frames mode yields satisfactory results for videos with moderate motions. • - Under large motions, PB-frames do not compress as well as B-frames and an improved new mode has been developed in Version 2 of H.263.

  26. H.263 • Unrestricted motion vector mode • The pixels referenced are no longer restricted to be within the boundary of the image. • - When the motion vector points outside the image boundary, the value of the boundary pixel that is geometrically closest to the referenced pixel is used. • - The maximum range of motion vectors is [-31.5, 31.5].

  27. Moving Picture Experts Group - MPEG • MPEG, accepted as an International Standard (IS), was developed to cover motion video as well as audio coding under ISO/IEC standardization. • Achieves a compressed data stream rate of about 1.2 Mbit/s.

  28. Moving Picture Experts Group - MPEG • Expected Features • random access • fast forward/reverse search • reverse playback • audio-visual synchronization • robustness to errors • coding/decoding delays • editability • format flexibility • cost considerations

  29. Moving Picture Experts Group - MPEG • Based on the experiences with JPEG and H.261 • Follow-up standards: • MPEG-1 (1992) • Coding of audio and video for storage media (CD-ROM, 1.5 Mbps) • VCD and MP3 • MPEG-2 (1994) • Coding of audio and video for transport and storage • Higher data rates (4–80 Mbps) for high-quality audio/video • Digital TV set-top boxes and DVD

  30. Moving Picture Experts Group - MPEG • MPEG-4 (1999–2001) • Coding of natural and synthetic media objects for web and mobile applications (interactivity) • MPEG-7 (2001, “experimental core” status) • Multimedia content description for AV materials • Basis for search and retrieval • MPEG-21 (started in June 2000) • Open framework for multimedia delivery and consumption • Transparent and augmented use of multimedia resources

  31. Moving Picture Experts Group - MPEG • Scalability for hierarchical coding: partition the signal into substreams (layers), each representing a well-defined portion of the signal. • First substream (base layer) carries the elements that are essential for the reconstruction of the signal but at a lower quality. • Other substreams improve the quality incrementally. • Users/receivers use as many layers as their resources allow.

  32. Moving Picture Experts Group - MPEG • Advantages of Layered Coding • Allows users with heterogeneous resources • Flexibility in shaping the traffic • Effective bandwidth usage • Aids real-time communication

  33. Moving Picture Experts Group – MPEG1 • MPEG-1 adopts the CCIR601 digital TV format, also known as SIF (Source Input Format). • MPEG-1 supports only non-interlaced video. Normally, its picture resolution is: • 352 × 240 for NTSC video at 30 fps • 352 × 288 for PAL video at 25 fps • It uses 4:2:0 chroma subsampling

  34. Moving Picture Experts Group - MPEG • H.261 and MPEG differ from each other in their motion estimations. H.261 requires that each frame be compared to a single preceding frame, whereas the proposed MPEG standard does not define the number of frames that may be used in the motion estimation process.

  35. Overview of the MPEG Compression Algorithm • Quality requirements demand very high compression, not achievable by intra-frame coding alone. • Random access is best satisfied with pure intra-frame coding. • The challenge is to implement inter-frame coding without compromising the applications that demand random access. • Two inter-frame coding techniques: • predictive • interpolative

  36. Moving Picture Experts Group – MPEG1 Motion Compensation (MC) based video encoding in H.261 is based on forward prediction. There is a need for Bidirectional Search The MB containing part of a ball in the Target frame cannot find a good matching MB in the previous frame because half of the ball was occluded by another object. A match however can readily be obtained from the next frame.

  37. Video Encoding • MPEG distinguishes 3 types of image coding for processing: • I-Frame (Intra-coded image): coded as a still image • JPEG • Point of random access • P-Frame: Predictive-coded frame • Requires information from the previous I-frame and/or all previous P-frames for encoding and decoding. • A P-frame is based on the idea that successive images often do not change in content; instead, whole areas are shifted. The macroblock of the last P- or I-frame that is most similar to the macroblock under consideration is determined. Only the motion vector and the small difference in content between these macroblocks are encoded.

  38. Video Encoding • B-Frame: Bidirectionally predictive-coded frame • Based on differences from both the preceding and the following P- or I-frame. • B-frames can never be accessed randomly. • A macroblock may be derived from the previous or the next P- or I-frame. • Interpolative motion compensation is allowed.

  39. Video Encoding • B-frame macroblocks can specify two motion vectors (one to the past and one to the future), indicating that the result is to be averaged.
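The interpolative (averaged) prediction described above can be sketched as follows; the rounding convention is illustrative:

```python
# Average the two motion-compensated predictions for a B-frame macroblock:
# one block fetched from the past reference, one from the future reference.
def bidirectional_predict(past_block, future_block):
    return [[(p + f + 1) // 2 for p, f in zip(p_row, f_row)]
            for p_row, f_row in zip(past_block, future_block)]
```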

  40. A TYPICAL MPEG FRAME SEQUENCE • GOP – Group Of Pictures • Forward prediction • I = Intra-coded Frame • B = Bidirectionally Predicted Frame • P = Forward Predicted Frame

  41. MPEG Frame Sequences • The order of images in the data stream and the display order may differ. • The first image in a data stream always has to be an I-frame; the decoder decodes and stores the reference frames first. In display order, a B-frame can occur before an I-frame even though it is transmitted after it.
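The difference between stream order and display order can be illustrated with a toy reordering sketch (frame labels and GOP pattern are illustrative): each B-frame is transmitted only after both references it depends on.

```python
# Reorder a display-order frame sequence into coding/transmission order:
# each B-frame is held back until the next I- or P-frame (its future
# reference) has been emitted.
def coding_order(display_order):
    out, held_b = [], []
    for frame in display_order:
        if frame.startswith('B'):
            held_b.append(frame)     # needs a future reference first
        else:                        # I- or P-frame: emit it, then the held B's
            out.append(frame)
            out.extend(held_b)
            held_b = []
    return out

coding_order(['I0', 'B1', 'B2', 'P3', 'B4', 'B5', 'P6'])
# -> ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```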

  42. MPEG Frame Sequences • Several factors affect selecting a frequency for I-frames: • Rate of scene change • Random accessibility • Available resources • Compression vs. computation: P-pictures compress 3 times as much as I-pictures; B-pictures compress 1.5 times as much as P-pictures, but reconstruction of B-pictures requires heavy computation.

  43. Moving Picture Experts Group – MPEG1 • The frame is divided into 16 × 16 macroblocks, and macroblocks are grouped into slices. • Each slice is coded independently: additional flexibility in bit-rate control. • The slice concept is important for error recovery.

  44. MPEG at a Glance

  45. MPEG-2 • For higher-quality video at bit rates of more than 4 Mbps. • Also supports interlaced video, since this is one of the options for digital broadcast TV and HDTV. • In interlaced video each frame consists of two fields, referred to as the top field and the bottom field. • In a Frame-picture, all scanlines from both fields are interleaved to form a single frame, which is then divided into 16 × 16 macroblocks and coded using motion compensation. • If each field is treated as a separate picture, it is called a Field-picture.
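The frame/field distinction can be sketched by de-interleaving scanlines (a simplification: MPEG-2's frame/field coding decisions are richer than this):

```python
# Split an interlaced frame (a list of scanlines) into its two fields:
# even scanlines form the top field, odd scanlines the bottom field.
def split_fields(frame):
    top_field = frame[0::2]
    bottom_field = frame[1::2]
    return top_field, bottom_field
```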

  46. MPEG-2 Field pictures and Field-prediction for Field-pictures in MPEG-2. (a) Frame−picture vs. Field−pictures (b) Field Prediction for Field−pictures

  47. MPEG-2 Other Major Differences from MPEG-1 • Better resilience to bit errors: in addition to the Program Stream, a Transport Stream is added to MPEG-2 bit streams. • Support of 4:2:2 and 4:4:4 chroma subsampling. • More restricted slice structure: MPEG-2 slices must start and end in the same macroblock row. In other words, the left edge of a picture always starts a new slice, and the longest slice in MPEG-2 can have only one row of macroblocks. • More flexible video formats: it supports various picture resolutions as defined by DVD, ATV and HDTV.

  48. Wired & Wireless MPEG-4 Application Scenarios (diagram) • Stored and live content is compressed by a Media Encoder and delivered by a Media Services Server (with a License Server), either by streaming or by download-and-play. • Delivery targets Media Players on PCs, hand-helds and set-top boxes, for both on-demand content and live feeds, covering compression, access and interaction.

  49. MPEG-4 process • For end users, it brings a higher level of interaction with content, within the limits set by the author. • First, units of aural, visual or audiovisual content are represented as "media objects". • These may be natural (i.e., recorded by a camera) or synthetic (generated with a computer). • The composition of these objects is described to create compound media objects that form audiovisual scenes. • The data associated with media objects is synchronized so that it can be transported over network channels; Quality of Service is important for specific media objects. • Interaction with the audiovisual scene generated at the receiver's end is allowed.

  50. MPEG – 4 components • MPEG-4 audiovisual scenes are composed of several media objects, organized in a hierarchical fashion. At the leaves, we find primitive media objects, such as: • still images (e.g. as a fixed background), • video objects (e.g. a talking person - without the background) • audio objects (e.g. the voice associated with that person)
