
  1. Chapter 6 Entropy and Shannon’s First Theorem

  2. Information (single-symbol source). A quantitative measure of the amount of information any event represents: I(p) = the amount of information in the occurrence of an event of probability p. Axioms: A. I(p) ≥ 0 for any event of probability p; B. I(p1∙p2) = I(p1) + I(p2) when p1 and p2 are probabilities of independent events (the Cauchy functional equation); C. I(p) is a continuous function of p. Existence: I(p) = log(1/p) satisfies the axioms. Units of information: base 2 = a bit, base e = a nat, base 10 = a Hartley. 6.2
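
Not part of the original slides: a minimal Python sketch of this logarithmic measure of information. The function name info and its unit handling are illustrative assumptions.

```python
import math

def info(p, base=2):
    """I(p) = log_base(1/p): base 2 gives bits, base e gives nats, base 10 gives Hartleys."""
    if not 0 < p <= 1:
        raise ValueError("p must be a probability in (0, 1]")
    return math.log(1 / p, base)

# Axiom B: independent events multiply probabilities, their information adds.
p1, p2 = 0.5, 0.25
assert abs(info(p1 * p2) - (info(p1) + info(p2))) < 1e-12
print(info(0.5), info(0.5, math.e), info(0.5, 10))  # 1 bit, ~0.693 nats, ~0.301 Hartleys
```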

  3. Uniqueness: Suppose I′(p) satisfies the axioms. Since I′(p) ≥ 0, take any 0 < p0 < 1 and the base k = (1/p0)^(1/I′(p0)). So k^(I′(p0)) = 1/p0, and hence logk(1/p0) = I′(p0). Now, any z ∈ (0,1) can be written as p0^r, r a real number ∈ R+ (r = logp0 z). The Cauchy functional equation implies that I′(p0^n) = n∙I′(p0) and, ∀ m ∈ Z+, I′(p0^(1/m)) = (1/m)∙I′(p0), which gives I′(p0^(n/m)) = (n/m)∙I′(p0), and hence by continuity I′(p0^r) = r∙I′(p0). Hence I′(z) = r∙logk(1/p0) = logk(1/p0^r) = logk(1/z). ∎ Note: In this proof, we introduce an arbitrary p0, show how any z relates to it, and then eliminate the dependency on that particular p0. 6.2

  4. Entropy. The average amount of information received on a per-symbol basis from a source S = {s1, …, sq} of symbols, where si has probability pi; it measures the information rate. In radix r, when all the probabilities are independent: Hr(S) = Σi pi logr(1/pi). • Entropy is the amount of information in the probability distribution. • Alternative approach: consider a long message of N symbols from S = {s1, …, sq} with probabilities p1, …, pq. You expect si to appear N∙pi times, and the probability of this typical message is P = p1^(N∙p1) ∙∙∙ pq^(N∙pq), so its information is I(P) = log(1/P) = N∙Σi pi log(1/pi) = N∙H(S). 6.3
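
A minimal sketch (mine, not from the slides) of the entropy formula above; the helper name entropy and the example distributions are illustrative.

```python
import math

def entropy(probs, radix=2):
    """H_r(S) = sum_i p_i * log_r(1/p_i); terms with p_i = 0 contribute nothing."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(p * math.log(1 / p, radix) for p in probs if p > 0)

print(entropy([0.5, 0.5]))                                # 1.0 bit per symbol
print(entropy([0.25, 0.25, 0.125, 0.125, 0.125, 0.125]))  # ≈ 2.5 bits per symbol
```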

  5. Consider f(p) = p ln(1/p) (the analysis works for any base, not just e): f′(p) = (−p ln p)′ = −p(1/p) − ln p = −1 + ln(1/p); f″(p) = −1/p < 0 for p ∈ (0,1), so f is concave down. f′(0+) = +∞, f′(1/e) = 0, f′(1) = −1; the maximum is f(1/e) = 1/e, and f(1) = 0. [Figure: plot of f(p) on 0 ≤ p ≤ 1, rising from 0 to its peak at (1/e, 1/e) and falling back to 0 at p = 1.] 6.3

  6. y = x 1 ln x Basic information about logarithm function 0 x 1 -1 Tangent line to y = ln x at x = 1 (y ln 1) = (ln)′x=1(x  1)  y = x 1 (ln x)″ = (1/x)′ = -(1/x2) < 0 x  ln x is concave down. Conclusion: ln x  x  1 6.4

  7. Fundamental Gibbs inequality: for any two probability distributions (p1, …, pq) and (q1, …, qq), Σi pi log(qi/pi) ≤ 0, with equality iff pi = qi for all i (apply ln x ≤ x − 1 to x = qi/pi). • Minimum entropy occurs when one pi = 1 and all the others are 0. • Maximum entropy occurs when? Consider qi = 1/q: Σi pi log(1/(q∙pi)) ≤ 0, i.e. Σi pi log(1/pi) ≤ log q. • Hence H(S) ≤ log q, and equality occurs only when every pi = 1/q. 6.4
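
A quick numerical check of the Gibbs inequality and of the H(S) ≤ log q bound, as a sketch; the randomly generated test distributions are arbitrary.

```python
import math, random

def entropy(p, base=2):
    return sum(pi * math.log(1 / pi, base) for pi in p if pi > 0)

def gibbs(p, q, base=2):
    """sum_i p_i * log(q_i / p_i), which the Gibbs inequality says is <= 0."""
    return sum(pi * math.log(qi / pi, base) for pi, qi in zip(p, q) if pi > 0)

random.seed(0)
for _ in range(1000):
    raw_p = [random.random() + 1e-9 for _ in range(5)]
    raw_q = [random.random() + 1e-9 for _ in range(5)]
    p = [x / sum(raw_p) for x in raw_p]
    q = [x / sum(raw_q) for x in raw_q]
    assert gibbs(p, q) <= 1e-12                # Gibbs inequality
    assert entropy(p) <= math.log2(5) + 1e-12  # H(S) <= log q, equality only at p_i = 1/q
```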

  8. Entropy Examples. S = {s1}, p1 = 1: H(S) = 0 (no information). S = {s1, s2}, p1 = p2 = ½: H2(S) = 1 (1 bit per symbol). S = {s1, …, sr}, p1 = … = pr = 1/r: Hr(S) = 1 but H2(S) = log2 r. • Run-length coding (for instance, in binary predictive coding): p = 1 − q is the probability of a 0. H2(S) = p log2(1/p) + q log2(1/q). As q → 0 the term q log2(1/q) dominates (compare slopes). Cf. average run length = 1/q and average # of bits needed = log2(1/q). So q log2(1/q) = avg. amount of information per bit of the original code.
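
A small sketch of the run-length observation above, tabulating q∙log2(1/q) against the average run length as q shrinks; the chosen q values and the helper name h2 are illustrative.

```python
import math

def h2(q):
    """Binary entropy of a source with P(1) = q and P(0) = 1 - q."""
    return sum(x * math.log2(1 / x) for x in (q, 1 - q) if x > 0)

for q in (0.25, 0.1, 0.01, 0.001):
    print(f"q={q}: H2={h2(q):.4f}  q*log2(1/q)={q * math.log2(1 / q):.4f}  "
          f"avg run length={1 / q:.0f}  bits per run={math.log2(1 / q):.2f}")
```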

  9. Entropy as a Lower Bound for Average Code Length. Given an instantaneous code with lengths li in radix r, let K = Σi r^(−li) ≤ 1 (Kraft inequality) and Qi = r^(−li)/K. The Gibbs inequality gives Σi pi logr(Qi/pi) ≤ 0, so Hr(S) ≤ Σi pi logr(1/Qi) = Σi pi∙li + logr K ≤ L = Σi pi∙li, since logr K ≤ 0. By the McMillan inequality, this holds for all uniquely decodable codes. Equality occurs when K = 1 (the decoding tree is complete) and pi = r^(−li). 6.5

  10. Shannon-Fano Coding. The simplest variable-length method; less efficient than Huffman, but it allows one to code symbol si with length li directly from the probability pi: li = ⌈logr(1/pi)⌉, i.e. logr(1/pi) ≤ li < logr(1/pi) + 1, so r^(−li) ≤ pi. Summing this inequality over i: Σi r^(−li) ≤ Σi pi = 1. The Kraft inequality is satisfied, therefore there is an instantaneous code with these lengths. 6.6

  11. Example: p’s: ¼, ¼, ⅛, ⅛, ⅛, ⅛; l’s: 2, 2, 3, 3, 3, 3; K = 1, L = 5/2, H2(S) = 2.5, so the lower bound is met with equality. [Figure: the complete binary decoding tree, with branches labeled 0 and 1.] 6.6
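
A sketch applying the Shannon-Fano rule of slide 10 to this example; the function name sf_length is my own.

```python
import math
from fractions import Fraction

def sf_length(p, r=2):
    """Shannon-Fano length: the smallest l with r**(-l) <= p, i.e. ceil(log_r(1/p))."""
    l, power = 0, Fraction(1)
    while power > p:
        l += 1
        power = power / r
    return l

probs = [Fraction(1, 4)] * 2 + [Fraction(1, 8)] * 4
lengths = [sf_length(p) for p in probs]
K = sum(Fraction(1, 2 ** l) for l in lengths)          # Kraft sum
L = sum(p * l for p, l in zip(probs, lengths))         # average code length
H = sum(float(p) * math.log2(1 / p) for p in probs)    # entropy in bits
print(lengths, K, L, H)   # [2, 2, 3, 3, 3, 3] 1 5/2 2.5
```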

  12. The Entropy of Code Extensions (concatenation of symbols ↔ multiplication of probabilities). Recall: the nth extension of a source S = {s1, …, sq} with probabilities p1, …, pq is the set of symbols T = S^n = {si1 ∙∙∙ sin : sij ∈ S, 1 ≤ j ≤ n}, where ti = si1 ∙∙∙ sin has probability Qi = pi1 ∙∙∙ pin, assuming independent probabilities. Let i = (i1−1, …, in−1)q + 1, an n-digit number base q. The entropy is H(T) = Σi Qi log(1/Qi). 6.8

  13. H(S^n) = n∙H(S). Hence the average S-F code length Ln for T satisfies: H(T) ≤ Ln < H(T) + 1 ⇒ n∙H(S) ≤ Ln < n∙H(S) + 1 ⇒ H(S) ≤ Ln/n < H(S) + 1/n [now let n go to infinity: Ln/n → H(S), which is Shannon’s first theorem]. 6.8

  14. Extension Example. S = {s1, s2}, p1 = 2/3, p2 = 1/3; H2(S) = (2/3)log2(3/2) + (1/3)log2(3) ≈ 0.9182958… Huffman: s1 = 0, s2 = 1; avg. coded length = (2/3)∙1 + (1/3)∙1 = 1. Shannon-Fano: l1 = 1, l2 = 2; avg. length = (2/3)∙1 + (1/3)∙2 = 4/3. 2nd extension: p11 = 4/9, p12 = p21 = 2/9, p22 = 1/9. S-F: l11 = ⌈log2(9/4)⌉ = 2, l12 = l21 = ⌈log2(9/2)⌉ = 3, l22 = ⌈log2(9)⌉ = 4. LSF(2) = avg. coded length = (4/9)∙2 + (2/9)∙3∙2 + (1/9)∙4 = 24/9 = 2.666… In general S^n = (s1 + s2)^n, and the probabilities are the corresponding terms in (p1 + p2)^n. 6.9
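
A sketch (not from the slides) that computes the Shannon-Fano length per original symbol, Ln/n, for the nth extension of this source and shows it approaching H2(S), as slide 13 predicts.

```python
import math
from itertools import product

p = (2 / 3, 1 / 3)
H = sum(pi * math.log2(1 / pi) for pi in p)   # ≈ 0.9182958

for n in (1, 2, 4, 8):
    Ln = 0.0
    for block in product(p, repeat=n):                 # each symbol of the nth extension
        Q = math.prod(block)                           # its probability
        Ln += Q * math.ceil(math.log2(1 / Q) - 1e-12)  # its Shannon-Fano length
    print(f"n={n}: L_n/n = {Ln / n:.4f}   (H2(S) = {H:.4f})")
```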

  15. Extension cont.: a symbol of S^n containing k copies of s1 has probability 2^k/3^n, and there are C(n,k) such symbols; Σk C(n,k)∙2^k = (2 + 1)^n = 3^n, so the probabilities sum to 1. 6.9

  16. Markov Process Entropy: for an mth-order Markov source, H(S) = Σstates p(si1, …, sim) ∙ H(S | si1, …, sim), where H(S | si1, …, sim) = Σj p(sj | si1, …, sim) log(1/p(sj | si1, …, sim)) and the state probabilities are the equilibrium probabilities. 6.10

  17. Example (second-order binary Markov process; the state is the previous two bits). Transitions: from state (0,0): p(0) = .8, p(1) = .2; from (0,1): p(0) = .5, p(1) = .5; from (1,0): p(0) = .5, p(1) = .5; from (1,1): p(1) = .8, p(0) = .2. Equilibrium probabilities: p(0,0) = 5/14 = p(1,1), p(0,1) = 2/14 = p(1,0). [Figure: the four-state transition diagram.] 6.11
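
A sketch computing the entropy of this example with the slide-16 formula; the dictionary layout is mine, and the printed value of about 0.801 bits follows from the probabilities above.

```python
import math
from fractions import Fraction

# p(next bit | previous two bits) for the example above.
trans = {
    (0, 0): {0: 0.8, 1: 0.2},
    (0, 1): {0: 0.5, 1: 0.5},
    (1, 0): {0: 0.5, 1: 0.5},
    (1, 1): {0: 0.2, 1: 0.8},
}
# Equilibrium state probabilities from the slide.
equilib = {(0, 0): Fraction(5, 14), (0, 1): Fraction(2, 14),
           (1, 0): Fraction(2, 14), (1, 1): Fraction(5, 14)}

H = sum(float(equilib[s]) * sum(p * math.log2(1 / p) for p in trans[s].values())
        for s in trans)
print(H)   # ≈ 0.801 bits per symbol, below the 1 bit of an unbiased memoryless binary source
```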

  18. The Fibonacci numbers. Let f0 = 1, f1 = 2, f2 = 3, f3 = 5, f4 = 8, …, be defined by fn+1 = fn + fn−1. The limit of the ratio fn+1/fn is φ, the golden ratio, a root of the equation x² = x + 1. Use these as the weights for a system of number representation with digits 0 and 1, without adjacent 1’s (because (100)φ = (11)φ).

  19. Base Fibonacci Representation Theorem: every number from 0 to fn − 1 can be uniquely written as an n-bit number with no adjacent ones. Existence: Basis: n = 0: 0 ≤ i ≤ f0 − 1 = 0, and 0 = ( )φ = ε, the empty string. Induction: let 0 ≤ i < fn+1. If i < fn, we are done by the induction hypothesis. Otherwise fn ≤ i < fn+1 = fn−1 + fn, so 0 ≤ i − fn < fn−1, and i − fn is uniquely representable as i − fn = (bn−2 … b0)φ with bi ∈ {0, 1} and ¬(bi = bi+1 = 1). Hence i = (1 0 bn−2 … b0)φ, which also has no adjacent ones. Uniqueness: let i be the smallest number ≥ 0 with two distinct representations (no leading zeros): i = (bn−1 … b0)φ = (b′n−1 … b′0)φ. By the minimality of i, bn−1 ≠ b′n−1, so without loss of generality let bn−1 = 1 and b′n−1 = 0; this implies (b′n−2 … b′0)φ = i ≥ fn−1, which can’t be true for an (n−1)-bit representation.
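
A sketch of this representation in Python; the greedy construction mirrors the existence proof, and the function names are my own.

```python
def fibs(n):
    """Weights f_0, ..., f_{n-1} with f_0 = 1, f_1 = 2, f_{k+1} = f_k + f_{k-1}."""
    f = [1, 2]
    while len(f) < n:
        f.append(f[-1] + f[-2])
    return f[:n]

def to_fib(i, n):
    """Greedy n-bit base-Fibonacci representation of i (0 <= i < f_n), most significant first."""
    bits, weights = [], fibs(n)
    for w in reversed(weights):
        if i >= w:
            bits.append(1)
            i -= w
        else:
            bits.append(0)
    return tuple(bits)

n = 10
f_n = fibs(n + 1)[-1]                 # f_n = number of n-bit strings with no adjacent ones
reps = [to_fib(i, n) for i in range(f_n)]
assert len(set(reps)) == f_n          # all representations are distinct (uniqueness)
assert all(not (b[k] and b[k + 1]) for b in reps for k in range(n - 1))  # no adjacent ones
print(to_fib(19, 6))                  # 19 = 13 + 5 + 1 -> (1, 0, 1, 0, 0, 1)
```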

  20. The golden ratio φ = (1+√5)/2 is a solution of x² − x − 1 = 0 and is the limit of the ratio of adjacent Fibonacci numbers. For comparison, a source with r equally likely symbols 0, …, r−1 (each of probability 1/r) has H2 = log2 r. Base Fibonacci as a 1st-order Markov process: after a 1 the next digit must be 0; after a 0, emit 0 with probability 1/φ or 1 with probability 1/φ². Equivalently, think of the source as emitting the variable-length symbols 0 and 10 with probabilities 1/φ and 1/φ², where 1/φ + 1/φ² = 1. Entropy = (1/φ)∙log φ + ½(1/φ²)∙log φ² = log φ, which is maximal when the variable-length symbols are taken into account. [Figure: transition diagrams for the equiprobable r-ary source and for the base-Fibonacci process.]
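
A numerical check (my own sketch): the number of n-bit strings with no adjacent ones is fn, so the maximal entropy per bit is log2(fn)/n, which approaches log2 φ, the same value the slide’s formula gives.

```python
import math

phi = (1 + math.sqrt(5)) / 2

# f_0 = 1, f_1 = 2, f_{k+1} = f_k + f_{k-1}: f_n counts n-bit strings with no adjacent ones.
f = [1, 2]
for _ in range(99):
    f.append(f[-1] + f[-2])

for n in (10, 50, 100):
    print(n, math.log2(f[n]) / n)     # slowly approaches log2(phi) ≈ 0.6942

# The slide's entropy expression, with phi restored:
H = (1 / phi) * math.log2(phi) + 0.5 * (1 / phi ** 2) * math.log2(phi ** 2)
print(H, math.log2(phi))              # both ≈ 0.6942
```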
