

  1. Data Compression

  2. Terminology
  • Physical versus logical
    • Physical: performed on data regardless of what information it contains; translates one series of bits to another series of bits
    • Logical: knowledge-based, e.g. changing "United Kingdom" to "UK"

  3. Terminology
  • Symmetric
    • Compression and decompression use roughly the same techniques and take about as long
    • Data transmission that requires on-the-fly compression and decompression needs algorithms of this type

  4. Terminology
  • Asymmetric
    • Most common: compression takes much more time than decompression
      • In an image database, each image is compressed once and decompressed many times
    • Less common: decompression takes much more time than compression
      • E.g. creating many backup files that will hardly ever be read

  5. Terminology
  • Non-adaptive
    • Contains a static dictionary of predefined substrings that are known to occur with high frequency
  • Adaptive
    • The dictionary is built from scratch as the data is processed

  6. Terminology
  • Semi-adaptive
    • Pass 1: construct an optimal dictionary
    • Pass 2: perform the actual compression

  7. Terminology
  • Lossless
    • decompress(compress(data)) = data
  • Lossy
    • decompress(compress(data)) ≠ data
    • A small change in pixel values may be invisible, however

  8. Pixel Packing (the slide's illustration is not preserved in the transcript; a sketch of the idea follows)
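Since the illustration is lost, here is a minimal sketch under the usual definition of pixel packing (the example and helper name are mine, not the slide's): several sub-byte pixels are stored per byte instead of one pixel per byte.

```python
def pack_4bit(pixels):
    """Pack two 4-bit pixel values (0-15) into each byte instead of
    spending a whole byte per pixel: a lossless 2:1 saving."""
    if len(pixels) % 2:
        pixels = pixels + [0]               # pad odd-length input
    return bytes((a << 4) | b for a, b in zip(pixels[::2], pixels[1::2]))

print(pack_4bit([1, 2, 3, 4]).hex())        # '1234': 4 pixels in 2 bytes
```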

  9. Run-Length Encoding
  • A repeating string of characters, called a run, is coded into two bytes
    • The first byte contains the run count: one less than the number of repetitions
    • The second byte contains the run value: the character being repeated

  10. Run-Length Encoding
  • '77777zzzyyyyyyV' becomes '472z5y0V'
    • The 15-byte string becomes 8 bytes long: a compression ratio of almost 2 to 1
  • Some strings become twice as long, e.g. '7fu5JLY9jhYIujG', where every run has length 1 (a sketch of the scheme follows)
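A minimal sketch of the two-byte scheme described above (the function name is mine). It emits raw count bytes; the slide's '472z5y0V' shows the counts as ASCII digits for readability.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode each run as (count, value), where count is one less
    than the number of repetitions, as on the slide."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 256:
            run += 1
        out += bytes([run - 1, data[i]])    # count byte stores repetitions - 1
        i += run
    return bytes(out)

# the slide's example: 15 bytes in, 8 bytes out
assert rle_encode(b"77777zzzyyyyyyV") == b"\x047\x02z\x05y\x00V"
```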

  11. (figure-only slide; the illustration is not preserved in the transcript)

  12. Lempel-Ziv-Welch (LZW)
  • Lossless
  • Used by GIF, TIFF, the V.42bis modem compression standard, and PostScript Level 2
  • Substitutional, or dictionary-based: the algorithm builds a data dictionary as it reads the input
    • When a pattern is found in the dictionary, its code is emitted; when it is not yet in the dictionary, it is added
  • It is not necessary to transmit the dictionary to do decompression: the decoder rebuilds it from the codes (a sketch follows)
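A minimal LZW encoder sketch along the lines described above, assuming byte input and a dictionary seeded with all 256 single-byte strings (names and the sample string are illustrative):

```python
def lzw_compress(data: bytes) -> list:
    """Emit the code of the longest dictionary match, then add that
    match extended by one symbol as a new dictionary entry."""
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    codes = []
    current = b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate                 # keep growing the match
        else:
            codes.append(dictionary[current])   # emit code for known prefix
            dictionary[candidate] = next_code   # add the new pattern
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes

print(lzw_compress(b"TOBEORNOTTOBEORTOBEORNOT"))
```

The decompressor performs the mirror image of this loop, adding the same entries in the same order, which is why the dictionary never has to be transmitted.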

  13. Lempel-Ziv-Welch (LZW)
  • History
    • 1977: Abraham Lempel and Jacob Ziv published a paper on a universal data compression algorithm, now called LZ77
    • 1978: Lempel and Ziv formulated an improved, dictionary-based data compression algorithm, now called LZ78

  14. Lempel-Ziv-Welch (LZW)
  • History
    • 1981: While working for Sperry, Lempel and Ziv, together with some other researchers, filed for a patent on LZ78; it was granted in 1984
    • 1984: While working for Sperry, Terry Welch modified LZ78; the result was the LZW algorithm, published in IEEE Computer

  15. Lempel-Ziv-Welch (LZW)
  • History
    • 1985: Sperry was granted a patent for Welch's modification and for the implementation of LZW
    • 1986: Sperry and Burroughs merged to form Unisys; ownership of the Sperry patent transferred to Unisys

  16. Lempel-Ziv-Welch (LZW)
  • History
    • 1987: CompuServe created the GIF file format, which required use of the LZW algorithm; CompuServe didn't check for LZW patents, and Unisys didn't realize that GIF used LZW
    • 1988: Aldus released Revision 5.0 of the TIFF file format, which used the LZW algorithm
    • 1990: Unisys licensed Adobe to use the LZW patent for PostScript

  17. Lempel-Ziv-Welch (LZW)
  • History
    • 1991: Unisys licensed Aldus to use the LZW patent in TIFF
    • 1993: Unisys became aware that the GIF file format used LZW; negotiations began with CompuServe

  18. Lempel-Ziv-Welch (LZW)
  • History
    • 1994: Unisys and CompuServe came to an understanding: CompuServe would license the LZW algorithm for use of the GIF file format in software used primarily to access the CompuServe Information Service
    • 1995: America Online and Prodigy also entered into license agreements with Unisys for LZW

  19. Lempel-Ziv-Welch (LZW)
  • GIF is not in the public domain
  • Some people were suspicious of CompuServe's announcement that it was getting a license from Unisys
    • It had been known in the programming community for many years before this that GIF used LZW and that LZW was patented by Unisys

  20. Lempel-Ziv-Welch (LZW)
  • Some people were suspicious of CompuServe's announcement that it was getting a license from Unisys
    • Unisys claimed that CompuServe only found out rather late that this was the case
    • GIF was becoming an integral part of the WWW for exchanging low-resolution graphics

  21. Lempel-Ziv-Welch (LZW)
  • Eventually, Unisys' LZW patent and licensing agreements held
    • Unisys reduced license fees after 1995
    • Unisys wouldn't charge anything for inadvertent infringement by GIF software products delivered prior to 1995
    • License fees were still required for updates delivered after 1995

  22. Lempel-Ziv-Welch (LZW)
  • It is not illegal to own, transmit, or receive GIF files; only compressing or decompressing them without a license is

  23. Sliding-window coding (LZ77)
  • Note: slides 23-25 show the (offset, length, code) triples of LZ77, the sliding-window precursor of LZW, rather than LZW's dictionary codes
  • Search buffer: 3 1 2 5 1 3 1 | Lookahead buffer: 4 1 2 5 1 5 5 1 5 5 1 4
  • The next symbol 4 does not occur in the search buffer, so offset = 0 and length = 0
  • Output is (0, 0, code(4))

  24. Sliding-window coding (LZ77)
  • Search buffer: 3 1 2 5 1 3 1 4 | Lookahead buffer: 1 2 5 1 5 5 1 5 5 1 4
  • The longest match, 1 2 5 1, begins 7 symbols back, so offset = 7 and length = 4
  • Output is (7, 4, code(5)), where 5 is the symbol following the match

  25. Sliding-window coding (LZ77)
  • Search buffer: 3 1 2 5 1 3 1 4 1 2 5 1 5 | Lookahead buffer: 5 1 5 5 1 4
  • Copying 5 symbols from 3 positions back yields 5 1 5 5 1 (the copy runs into the lookahead buffer, which is allowed), so offset = 3 and length = 5
  • Output is (3, 5, code(4))
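A sketch that reproduces the three steps above (names are mine). The parameter i gives the first position to encode, so everything before it plays the role of the initial search buffer shown on slide 23:

```python
def lz77_triples(seq, i=0):
    """Emit (offset, length, next symbol) triples; the match source may
    run into the lookahead buffer, giving an overlapping copy."""
    out = []
    while i < len(seq):
        best_off = best_len = 0
        for j in range(i):                  # candidate positions in the search buffer
            length = 0
            while i + length < len(seq) - 1 and seq[j + length] == seq[i + length]:
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        out.append((best_off, best_len, seq[i + best_len]))
        i += best_len + 1
    return out

seq = [3, 1, 2, 5, 1, 3, 1, 4, 1, 2, 5, 1, 5, 5, 1, 5, 5, 1, 4]
print(lz77_triples(seq, i=7))   # [(0, 0, 4), (7, 4, 5), (3, 5, 4)]
```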

  26. JPEG
  • Joint Photographic Experts Group
  • 1982: ISO (the International Organization for Standardization) formed the Photographic Experts Group (PEG) to develop methods of transmitting video, images, and text over ISDN (Integrated Services Digital Network) lines

  27. JPEG
  • 1986: A subgroup of CCITT (the International Telegraph and Telephone Consultative Committee) began to look at methods of compressing color and gray-scale data for fax transmission
    • The methods for this were similar to those being considered by PEG

  28. JPEG
  • 1987: The two groups combined into JPEG
  • Most previous compression methods did a poor job of compressing continuous-tone image data

  29. JPEG
  • Very few file formats could support 24-bit raster images
    • GIF only works for 256 colors
    • LZW doesn't work well on scanned image data
    • TIFF and BMP didn't compress this type of image data very well

  30. JPEG
  • JPEG compresses continuous-tone image data with a pixel depth of 6 to 24 bits with good efficiency
  • JPEG itself doesn't define a standard file format

  31. JPEG
  • A toolkit of methods with a quality-compression trade-off
  • Lossy
    • Discards information that the human eye cannot easily see
    • Slight changes in color are not perceived well
    • Slight changes in intensity are perceived well

  32. JPEG
  • Works well with color or gray-scale continuous-tone images: photographs, video stills, and complex graphics that resemble natural objects
  • Doesn't work well for animations, ray tracing, line art, black-and-white documents, or typical vector graphics

  33. JPEG
  • The end user can tune the quality of a JPEG encoder through the Q-factor, which ranges from 1 to 100
    • Q-factor = 1 produces the smallest, worst-quality images
    • Q-factor = 100 produces the largest, best-quality images
    • The optimal value of the Q-factor is image dependent

  34. JPEG
  • JPEG introduces artifacts in images containing large areas of a single color
  • JPEG is slow if implemented in software
  • Baseline JPEG
    • The minimal subset of JPEG that all JPEG-aware applications are required to support

  35. JPEG (figure-only slide; the illustration is not preserved in the transcript)

  36. JPEG
  • Color transform
    • Encodes each component of a color model separately
    • Is independent of any particular color space model

  37. JPEG
  • Color transform
    • The best compression ratios result if a luminance (gray-scale)/chrominance (color) color space, such as YUV, is used
    • Human eyes are more sensitive to luminance information (Y) than to chrominance information (U, V)
    • Other models, such as RGB, spread the perceptually important information across all three of their components (a sketch follows)
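For concreteness, a sketch of the JFIF-style RGB to YCbCr transform that JPEG files commonly use (the slide says YUV; YCbCr is the closely related variant standardized for JFIF). The function name is mine:

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to YCbCr: Y carries luminance, Cb and Cr carry
    chrominance; results may need clipping to [0, 255]."""
    y  =  0.299  * r + 0.587  * g + 0.114  * b
    cb = -0.1687 * r - 0.3313 * g + 0.5    * b + 128
    cr =  0.5    * r - 0.4187 * g - 0.0813 * b + 128
    return y, cb, cr

print(rgb_to_ycbcr(255, 0, 0))   # pure red: modest Y, Cr far above 128
```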

  38. JPEG
  • Down-sampling
    • Averages groups of pixels together
    • To exploit our lesser sensitivity to chrominance information, fewer pixels are used for the chrominance channels
    • In an image of 1000 × 1000 pixels, we might use 1000 × 1000 luminance pixels but only 500 × 500 chrominance pixels
    • Each chrominance pixel then covers the same area as a 2 × 2 block of luminance pixels

  39. JPEG
  • Down-sampling
    • For each 2 × 2 block, we can store 6 pixel values (4 luminance values plus 2 chrominance values, 1 for each of the 2 chrominance channels) instead of 12 (4 pixel values for each of 3 channels)
    • This 50% reduction in data has almost no perceivable effect (a sketch follows)
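A sketch of the 2 × 2 averaging described above, assuming NumPy and even image dimensions (the helper name is mine):

```python
import numpy as np

def downsample_2x2(channel):
    """Average each 2x2 block of a chrominance plane into one value,
    as in 4:2:0-style chroma subsampling."""
    h, w = channel.shape
    blocks = channel.reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))

cb = np.arange(16, dtype=float).reshape(4, 4)
print(downsample_2x2(cb))        # a 4x4 chroma plane becomes 2x2
```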

  40. JPEG
  • Discrete cosine transform (DCT)
    • For each color channel, the image data is divided into 8 × 8 blocks
    • The DCT is applied to each block
    • The lowest-order, or DC, term represents the average value in the block
    • Successively higher-order, or AC, terms represent the strength of more rapid changes across the block

  41. JPEG
  • Discrete cosine transform
    • High-frequency data can be discarded
    • The DCT itself is lossless except for round-off errors
    • The DCT is the most costly step in JPEG (a sketch follows)
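A sketch of the 2-D DCT-II on one 8 × 8 block, written as the direct four-loop sum for clarity; real codecs use fast factored forms precisely because this direct form is so costly. The level shift by 128 matches JPEG's handling of 8-bit samples:

```python
import numpy as np

def dct2_8x8(block):
    """Direct 2-D DCT-II of an 8x8 block: coeffs[0, 0] is the DC term,
    higher indices measure increasingly rapid changes."""
    n = 8
    coeffs = np.zeros((n, n))
    for u in range(n):
        for v in range(n):
            cu = 1 / np.sqrt(2) if u == 0 else 1.0
            cv = 1 / np.sqrt(2) if v == 0 else 1.0
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x, y]
                          * np.cos((2 * x + 1) * u * np.pi / (2 * n))
                          * np.cos((2 * y + 1) * v * np.pi / (2 * n)))
            coeffs[u, v] = 0.25 * cu * cv * s
    return coeffs

flat = np.full((8, 8), 128.0) - 128   # level-shifted flat block
print(dct2_8x8(flat)[0, 0])           # DC term of a flat block is 0
```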

  42. JPEG
  • Scan order of each 8 × 8 block of DCT coefficients (the figure is not preserved; baseline JPEG uses a zig-zag order from the DC term through increasingly high-frequency AC terms, sketched below)
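The figure is lost, but assuming it showed the standard baseline-JPEG zig-zag pattern, this sketch generates that order (the helper name is mine):

```python
def zigzag_order(n=8):
    """Walk the anti-diagonals of an n x n block, alternating direction,
    from the DC term to the highest-frequency AC term."""
    order = []
    for d in range(2 * n - 1):          # anti-diagonal index
        cells = [(i, d - i) for i in range(n) if 0 <= d - i < n]
        order.extend(cells if d % 2 else cells[::-1])
    return order

print(zigzag_order()[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```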

  43. JPEG
  • An 8 × 8 block of sample values from an 8-bit image (the figure is not preserved in the transcript)

  44. JPEG
  • The DCT coefficients corresponding to the previous 8 × 8 block (the figure labeled the top-left entry as the DC coefficient and the remaining 63 as AC coefficients)

  45. JPEG
  • Quantization
    • Divide each DCT output value by a quantization coefficient and round the result to an integer
    • The larger the coefficient, the more data is lost
    • Each of the 64 positions of the DCT output block has its own coefficient
    • Higher-order terms have larger coefficients
    • Different coefficients are used for the luminance and chrominance channels

  46. JPEG
  • Quantization
    • This is the step controlled by the quality factor
    • Selecting quantization coefficients is an art

  47. JPEG
  • Sample quantization table (the table itself is not preserved in the transcript)
    • Coefficients are based on human perception

  48. JPEG
  • Labels
    • The label lab_ij corresponding to the quantized value of the transform coefficient c_ij is
      lab_ij = floor(c_ij / Q_ij + 1/2)
      where Q_ij is the (i,j)-th element of the quantization table (a sketch follows)
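A sketch of the quantize/dequantize pair implied by the formula above. The table is the widely reproduced example luminance table from the JPEG standard; the helper names are mine:

```python
import numpy as np

# example luminance quantization table from the JPEG standard
LUMA_Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def quantize(coeffs, q=LUMA_Q):
    """lab_ij = floor(c_ij / Q_ij + 1/2): small high-frequency
    coefficients collapse to 0, which is where the savings come from."""
    return np.floor(coeffs / q + 0.5).astype(int)

def dequantize(labels, q=LUMA_Q):
    """Decoder side: multiply back; the rounding error is the loss."""
    return labels * q
```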

  49. JPEG
  • Quantizer labels corresponding to the previous 8 × 8 block (the figure is not preserved in the transcript)

  50. Encoding
  • Huffman-compress the resulting coefficient labels
  • Arithmetic coding can be used as well (a sketch of the Huffman construction follows)
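A minimal sketch of the Huffman construction itself (names are mine). Note that baseline JPEG actually Huffman-codes run-length/size categories derived from the zig-zag-scanned labels rather than the raw label values; this only shows the core idea of giving frequent symbols short codewords:

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code by repeatedly merging the two least frequent
    nodes; each merge prepends one bit to every code in the node."""
    freq = Counter(symbols)
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                          # degenerate one-symbol input
        return {s: "0" for s in heap[0][2]}
    next_id = len(heap)                         # tiebreaker for equal frequencies
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]

labels = [0] * 30 + [1] * 10 + [-1] * 8 + [2] * 2   # zeros dominate after quantization
print(huffman_code(labels))                          # 0 gets the shortest codeword
```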
