Lossless Compression - II

Lossless Compression - II. Hao Jiang Computer Science Department Sept. 18, 2007. Properties of Huffman Coding. Huffman coding uses longer codewords for symbols with smaller probabilities and shorter codewords for symbols that often occur.


Presentation Transcript


  1. Lossless Compression - II Hao Jiang Computer Science Department Sept. 18, 2007

  2. Properties of Huffman Coding • Huffman coding uses longer codewords for symbols with smaller probabilities and shorter codewords for symbols that occur often. • The two longest codewords differ only in the last bit. • The codewords form a prefix code and are uniquely decodable. • H ≤ average codeword length < H + 1, where H is the entropy of the source.
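
The properties above can be checked numerically. The following is a minimal sketch, not the lecture's own code: it builds Huffman codeword lengths with a heap and compares the average length with the entropy. The four-symbol source `probs` and the helper names are assumptions chosen for illustration.

```python
import heapq
import math
from itertools import count


def huffman_code_lengths(probs):
    """Return {symbol: codeword length} for a Huffman code over `probs`."""
    if len(probs) == 1:
        # A lone symbol still needs one bit per occurrence.
        return {sym: 1 for sym in probs}
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), {sym: 0}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, depths1 = heapq.heappop(heap)
        p2, _, depths2 = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf in them one level deeper.
        merged = {s: d + 1 for s, d in {**depths1, **depths2}.items()}
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]


def entropy(probs):
    """Entropy of the distribution in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


# Hypothetical source, not taken from the slides.
probs = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
lengths = huffman_code_lengths(probs)
avg_len = sum(probs[s] * lengths[s] for s in probs)
H = entropy(probs)
print(lengths)                      # codeword lengths: a=1, b=2, c=3, d=3
print(f"{H:.4f} <= {avg_len:.4f} < {H + 1:.4f}")
```

For this source the two least probable symbols, c and d, end up with the two longest codewords of equal length, which is what allows them to differ only in the last bit.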

  3. Extended Huffman Coding • Huffman coding is not effective when there is a small number of symbols and the probabilities are highly skewed. • Example: A source has 2 symbols, a and b, with P(a) = 0.9 and P(b) = 0.1. The entropy is H = 0.4690 bits per symbol, but with Huffman coding the average codeword length is 1 bit per symbol (far from optimal!).

  4. Extended Huffman Coding (cont) • We can encode a group of symbols together and get better performance. • For the previous example, the extended source has symbols {aa, ab, ba, bb} with
     P(aa) = P(a)*P(a) = 0.81 => 1
     P(ab) = P(a)*P(b) = 0.09 => 00
     P(ba) = P(b)*P(a) = 0.09 => 011
     P(bb) = P(b)*P(b) = 0.01 => 010
     Now the average codeword length per symbol is 0.6450 (much better!).
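
The arithmetic behind the two-symbol example can be reproduced in a few lines. This is a sketch under the assumption that successive symbols are independent, so pair probabilities are products; it uses the codewords 1, 00, 011, 010 from the example above, and the variable names are illustrative.

```python
import math
from itertools import product

p = {"a": 0.9, "b": 0.1}

# First-order entropy of the source, in bits per symbol.
H = -sum(q * math.log2(q) for q in p.values())

# Extended source: every pair of symbols, with independent symbols assumed.
pair_prob = {x + y: p[x] * p[y] for x, y in product(p, repeat=2)}

# Codeword assignment from the example (one codeword per pair).
codewords = {"aa": "1", "ab": "00", "ba": "011", "bb": "010"}

bits_per_pair = sum(pair_prob[s] * len(codewords[s]) for s in pair_prob)
bits_per_symbol = bits_per_pair / 2

print(f"H = {H:.4f} bits/symbol")                       # H = 0.4690 bits/symbol
print(f"pairs: {bits_per_symbol:.4f} bits/symbol")      # pairs: 0.6450 bits/symbol
```

Coding pairs moves the rate from 1 bit per symbol to 0.645, still above the entropy of 0.469; grouping more symbols would narrow the gap further at the cost of a larger code table.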

  5. Extended Huffman Coding (cont) • Sequence: 1223231212
     Single symbols: P(1) = 0.3, P(2) = 0.5, P(3) = 0.2. Codewords: 1 -> 10, 2 -> 0, 3 -> 11. Average codeword length = 2 * 0.3 + 1 * 0.5 + 2 * 0.2 = 1.5 bits per symbol.
     Pairs of symbols: P(12) = 0.6, P(23) = 0.4. Codewords: 12 -> 0, 23 -> 1. Average codeword length = (1 * 0.6 + 1 * 0.4)/2 = 0.5 bits per symbol.
     In the second case, the average codeword length is smaller than the entropy of the single-symbol source. Is this right?
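
One way to approach the question is to compare the empirical entropy of the single symbols with that of the non-overlapping pairs. The sketch below does this for the sequence on the slide; the function name is an assumption, and the probabilities are simply relative frequencies in this one short sequence.

```python
import math
from collections import Counter


def empirical_entropy(items):
    """Entropy, in bits per item, of the relative frequencies in `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


seq = "1223231212"
pairs = [seq[i:i + 2] for i in range(0, len(seq), 2)]  # 12, 23, 23, 12, 12

h_single = empirical_entropy(seq)        # bits per symbol, symbols coded alone
h_pair = empirical_entropy(pairs) / 2    # bits per symbol, symbols coded in pairs

print(f"single-symbol entropy: {h_single:.4f}")   # single-symbol entropy: 1.4855
print(f"pair entropy / 2:      {h_pair:.4f}")     # pair entropy / 2:      0.4855
```

The pair code's 0.5 bits per symbol is below the single-symbol entropy but not below the per-symbol entropy of the pair source. The single-symbol figure is a lower bound only when symbols are coded one at a time; here consecutive symbols are strongly correlated, and coding pairs exploits that.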

  6. Dictionary Based • A dictionary-based method is another way to capture the correlation between symbols. • Static dictionary • Good when the data to be compressed is specific to some application. • For instance, when compressing a student database, the words “Name” and “Student ID” will appear often. • A static dictionary method does not work well if the source characteristics change.

  7. Adaptive Dictionary • LZ77 (Jacob Ziv and Abraham Lempel, 1977) encoder
     Step n: a b c a a c d a a b c d a b b b a b
     Longest match string length = 3, match position = 8. Codeword generated is <8, 3, c(d)>. If there is no match, the codeword is <0, 0, c(x)>.
     Step n+1: a b c a a c d a a b c d a b b b a b
     Codeword generated is <4, 2, c(b)>.
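
A single LZ77 encoding step can be sketched as a search for the longest earlier match. The function below is an illustration, not the lecture's implementation: `longest_match`, its `window` parameter, and the nearest-match tie-break are assumptions. The cursor positions 8 and 12 are also an inference, chosen because they reproduce the two codewords on the slide (the slide's window markers did not survive in the transcript).

```python
def longest_match(data, pos, window=None):
    """Return an LZ77 triple (offset, length, next_char) for data[pos:].

    `offset` counts backwards from `pos`; (0, 0, c) means no match was found.
    `window` limits how far back to search (None searches the whole history).
    """
    start_limit = 0 if window is None else max(0, pos - window)
    best_offset, best_length = 0, 0
    # Reserve the last character so a literal always follows the match.
    max_length = len(data) - pos - 1
    for offset in range(1, pos - start_limit + 1):
        length = 0
        # The comparison may run past `pos` into the look-ahead buffer.
        while (length < max_length
               and data[pos - offset + length] == data[pos + length]):
            length += 1
        # Strict '>' keeps the nearest match when two matches tie in length.
        if length > best_length:
            best_offset, best_length = offset, length
    return best_offset, best_length, data[pos + best_length]


data = "abcaacdaabcdabbbab"
print(longest_match(data, 8))    # (8, 3, 'd')
print(longest_match(data, 12))   # (4, 2, 'b')
```

After emitting a triple, the encoder advances by `length + 1` characters, which is why the second call starts at position 8 + 3 + 1 = 12.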

  8. LZ77 Decoder
     a b c a a c d a a b c d
     Codeword generated is <8, 3, c(d)>
     a b c a a c d a a b c d
     Then move the window by 4 characters and repeat.

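  9. A Special Case
     c d d c d c a b a b a b a d b b a b
     The output codeword is <2, 5, d>. The match length (5) is larger than the offset (2), so the match overlaps the characters that are being encoded.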
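
Decoding an LZ77 codeword, including the overlapping special case, can be sketched as a character-by-character copy. The function name and the use of a plain string for the history are assumptions made for illustration.

```python
def lz77_decode_step(history, offset, length, next_char):
    """Append the text described by one LZ77 triple to `history`."""
    out = list(history)
    start = len(out) - offset
    for k in range(length):
        # Copy one character at a time: when length > offset, later copies
        # read characters that this same loop has just written.
        out.append(out[start + k])
    out.append(next_char)
    return "".join(out)


print(lz77_decode_step("abcaacda", 8, 3, "d"))   # abcaacdaabcd
print(lz77_decode_step("cddcdcab", 2, 5, "d"))   # cddcdcabababad
```

A slice copy such as `history[start:start + length]` would return too few characters in the second call, because only two of the five characters exist when decoding of the codeword begins.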

  10. LZ78 • LZ78 uses an explicit dictionary. Encoding Process Example: Input: a b c b a b a a a
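
The encoding table for this example is not part of the transcript. As a stand-in, here is a minimal LZ78 encoder sketch applied to the same input. It assumes the textbook formulation with (index, character) pairs, index 0 for the empty prefix, and dictionary entries numbered from 1; the empty character in the final pair is a convention chosen here for an input that ends inside a known phrase.

```python
def lz78_encode(text):
    """Encode `text` as a list of (dictionary index, next character) pairs."""
    dictionary = {}   # phrase -> index; index 0 stands for the empty phrase
    tokens = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch                      # keep extending the match
        else:
            tokens.append((dictionary.get(phrase, 0), ch))
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ""
    if phrase:
        # Input ended on a phrase that is already in the dictionary.
        tokens.append((dictionary[phrase], ""))
    return tokens


print(lz78_encode("abcbabaaa"))
# [(0, 'a'), (0, 'b'), (0, 'c'), (2, 'a'), (4, 'a'), (1, '')]
```

The dictionary grows by one phrase per output pair: a, b, c, ba, baa for this input.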

  11. LZ78 Decoding Example
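
The decoding table is likewise missing from the transcript. A hedged sketch of an LZ78 decoder follows; it rebuilds the dictionary from the (index, character) pairs alone. The token list is what a textbook LZ78 encoder with index 0 as the empty phrase would produce for the input a b c b a b a a a, with an empty character marking an input that ends inside a known phrase (both conventions are assumptions).

```python
def lz78_decode(tokens):
    """Rebuild the text from LZ78 (dictionary index, next character) pairs."""
    phrases = [""]    # index 0 is the empty phrase
    out = []
    for index, ch in tokens:
        phrase = phrases[index] + ch
        out.append(phrase)
        phrases.append(phrase)    # same entry the encoder added at this step
    return "".join(out)


tokens = [(0, "a"), (0, "b"), (0, "c"), (2, "a"), (4, "a"), (1, "")]
print(lz78_decode(tokens))   # abcbabaaa
```

No dictionary is transmitted: the decoder adds each phrase at the same step as the encoder, so the indices stay in sync.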

  12. LZW • Encoder
      s = next input character;
      while not EOF {
          c = next input character;
          if s + c is in the dictionary
              s = s + c;
          else {
              output the codeword for s;
              add s + c to the dictionary;
              s = c;
          }
      }
      output the codeword for s;

  13. LZW encoding example The input string: a b a b b a b c a b EOF
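
The encoder pseudocode translates almost line for line into Python. This sketch assumes an initial dictionary a = 1, b = 2, c = 3 (the `alphabet` parameter is hypothetical); that numbering is not stated on the slide, but it is consistent with the code sequence 1 2 4 5 2 3 4 used in the decoding example.

```python
def lzw_encode(text, alphabet="abc"):
    """LZW-encode `text`; single characters get codes 1..len(alphabet)."""
    dictionary = {ch: i for i, ch in enumerate(alphabet, start=1)}
    if not text:
        return []
    codes = []
    s = text[0]
    for c in text[1:]:
        if s + c in dictionary:
            s += c                              # longest known string so far
        else:
            codes.append(dictionary[s])
            dictionary[s + c] = len(dictionary) + 1
            s = c
    codes.append(dictionary[s])                 # flush the final string
    return codes


print(lzw_encode("ababbabcab"))   # [1, 2, 4, 5, 2, 3, 4]
```

Ten input characters become seven codes; the new dictionary entries along the way are ab = 4, ba = 5, abb = 6, bab = 7, bc = 8, ca = 9.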

  14. LZW Decoder
      s = empty string;
      while ((k = next input code) != EOF) {
          entry = dictionary entry for k;
          if (k is not in the dictionary)
              entry = s + s[0];
          output entry;
          if (s is not empty)
              add string (s + entry[0]) to dictionary;
          s = entry;
      }

  15. LZW Decoding example: The input string: 1 2 4 5 2 3 4 EOF
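
A Python rendering of the decoder pseudocode, applied to the code sequence above, might look as follows. As with any LZW sketch, the initial dictionary is an assumption: a = 1, b = 2, c = 3, passed through the hypothetical `alphabet` parameter.

```python
def lzw_decode(codes, alphabet="abc"):
    """Decode LZW codes; single characters have codes 1..len(alphabet)."""
    dictionary = {i: ch for i, ch in enumerate(alphabet, start=1)}
    out = []
    s = ""
    for k in codes:
        if k in dictionary:
            entry = dictionary[k]
        elif s and k == len(dictionary) + 1:
            # Code not defined yet: it must be the entry the encoder created
            # in the previous step, which is s followed by its first character.
            entry = s + s[0]
        else:
            raise ValueError(f"invalid LZW code: {k}")
        out.append(entry)
        if s:
            dictionary[len(dictionary) + 1] = s + entry[0]
        s = entry
    return "".join(out)


print(lzw_decode([1, 2, 4, 5, 2, 3, 4]))   # ababbabcab
print(lzw_decode([1, 4]))                  # aaa
```

The second call exercises the `entry = s + s[0]` branch: code 4 arrives before the decoder has added entry 4 to its dictionary.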
