

  1. Chapter 5 A Measure of Information

  2. Outline • 5.1 Axioms for the uncertainty measure • 5.2 Two Interpretations of the uncertainty function • 5.3 Properties of the uncertainty function • 5.4 Entropy and Coding • 5.5 Shannon-Fano Coding

  3. 5.1 Axioms for the uncertainty measure • X: a discrete random variable taking values x1, x2, ..., xM with probabilities p1, p2, ..., pM • h(p): the uncertainty of an event with probability p • h(pi): the uncertainty of the event {X = xi} • The average uncertainty of X: H(p1, ..., pM) = Σi pi h(pi) • If p1 = p2 = ... = pM = 1/M, we write H(1/M, ..., 1/M) = f(M)
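
A minimal sketch (not from the slides) of the average-uncertainty formula above, using base-2 logarithms; the function name `uncertainty` is my own:

```python
import math

def uncertainty(probs, base=2):
    """Average uncertainty H(p1, ..., pM) = sum_i p_i * log(1/p_i)."""
    return sum(p * math.log(1 / p, base) for p in probs if p > 0)

# Uniform distribution over M outcomes: H(1/M, ..., 1/M) = f(M) = log2(M).
print(uncertainty([0.25] * 4))   # 2.0 = log2(4)
print(uncertainty([0.5, 0.5]))   # 1.0 = log2(2)
```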

  4. Axiom 1: f(M) should be a monotonically increasing function of M; that is, M < M′ implies f(M) < f(M′). For example, f(2) < f(6). • Axiom 2: Let X = (x1, ..., xM) and Y = (y1, ..., yL) be independent, each with equally likely outcomes. The joint experiment (X, Y) then has M·L equally likely outcomes, and f(M·L) = f(M) + f(L).

  5. Axiom 3 (Group Axiom): Let X = (x1, x2, ..., xr, xr+1, ..., xM) with probabilities p1, ..., pM. Construct a compound experiment: first choose between group A = {x1, ..., xr} and group B = {xr+1, ..., xM}, then choose an outcome within the selected group. (Figure: outcomes x1, ..., xr feeding into A, and xr+1, ..., xM feeding into B.)

  6. With pA = p1 + ... + pr and pB = pr+1 + ... + pM, the group axiom requires H(p1, ..., pM) = H(pA, pB) + pA H(p1/pA, ..., pr/pA) + pB H(pr+1/pB, ..., pM/pB).

  7. Axiom 4: H(p, 1−p) is a continuous function of p, i.e., a small change in p will correspond to a small change in uncertainty. • We can use the four axioms above to find the H function. • Thm 5.1: The only function satisfying the four given axioms is H(p1, ..., pM) = −C Σi pi log pi, where C > 0 and the logarithm base is > 1.

  8. For example, take C = 1 and base 2: H(p, 1−p) = −p log2 p − (1−p) log2 (1−p). Coin: {tail, head}. (Figure: plot of H(p, 1−p) versus p, rising from 0 at p = 0 to its maximum of 1 at p = ½ and falling back to 0 at p = 1: maximum uncertainty for a fair coin, minimum uncertainty for a certain outcome.)
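
A quick numeric look (my own illustration, not from the slides) at the binary entropy curve H(p, 1−p) described above:

```python
import math

def H2(p):
    """Binary uncertainty H(p, 1-p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"p = {p:.1f}  H = {H2(p):.4f}")
# Maximum uncertainty (1 bit) at p = 0.5; minimum (0 bits) at p = 0 or p = 1.
```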

  9. 5.2 Two Interpretations of the uncertainty function • (1) H(p1, ..., pM) may be interpreted as the expectation of a random variable W = w(X), where w(xi) = log(1/pi); then H(p1, ..., pM) = E[W] = Σi pi log(1/pi).

  10. (2) H(p1, ..., pM) may be interpreted as the minimum average number of 'yes'/'no' questions required to specify the value of X. For example, H(X) = H(0.3, 0.2, 0.2, 0.15, 0.15) = 2.27 for outcomes x1, x2, x3, x4, x5. (Figure: a question tree. First ask 'Does X = x1 or x2?'; if yes, 'X = x1?' decides between x1 and x2; if no, 'X = x3?' identifies x3, and otherwise 'X = x4?' decides between x4 and x5.)

  11. Average number of questions = 2·0.7 + 3·0.3 = 2.3 > 2.27 = H(X). • H.W.: X = {x1, x2}, p(x1) = 0.7, p(x2) = 0.3. How many questions (on average) are required to specify the outcome of a joint experiment involving 2 independent observations of X?
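
A small numeric check (not part of the slides) of the figures on slides 10 and 11, with the per-outcome question counts read off the tree described above:

```python
import math

probs = {"x1": 0.30, "x2": 0.20, "x3": 0.20, "x4": 0.15, "x5": 0.15}

# Uncertainty of X in bits.
H = sum(p * math.log2(1 / p) for p in probs.values())

# Number of yes/no questions each outcome needs in the slide-10 tree.
questions = {"x1": 2, "x2": 2, "x3": 2, "x4": 3, "x5": 3}
avg_q = sum(probs[x] * questions[x] for x in probs)

print(f"H(X) = {H:.2f} bits")           # about 2.27
print(f"average questions = {avg_q}")   # 2.3
```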

  12. 5.3 Properties of the uncertainty function • Lemma 5.2: Let p1, ..., pM and q1, ..., qM be arbitrary positive numbers with Σi pi = Σi qi = 1. Then −Σi pi log pi ≤ −Σi pi log qi (equivalently, Σi pi log(qi/pi) ≤ 0), with equality iff pi = qi for all i. The proof rests on the inequality ln x ≤ x − 1. (Figure: the curves y = ln x and y = x − 1, which touch only at x = 1.)

  13. Thm 5.3: H(p1, ..., pM) ≤ log M, with equality iff pi = 1/M for all i.
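
For the step between Lemma 5.2 and Thm 5.3, here is one standard derivation (my restatement, not copied from the slides), taking qi = 1/M:

```latex
\begin{align*}
H(p_1,\dots,p_M) - \log M
  &= \sum_{i=1}^{M} p_i \log\frac{1}{p_i} - \log M \sum_{i=1}^{M} p_i \\
  &= \sum_{i=1}^{M} p_i \log\frac{1/M}{p_i} \le 0
  \qquad \text{by Lemma 5.2 with } q_i = \tfrac{1}{M}.
\end{align*}
```

Equality holds exactly when every pi equals 1/M.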

  14. 5.4 Entropy and Coding • Noiseless Coding Theorem • Source X: values x1, x2, ..., xM with probabilities p1, p2, ..., pM • Codewords: w1, w2, ..., wM with lengths n1, n2, ..., nM • Goal: minimize the average codeword length n̄ = Σi pi ni • Code alphabet: {a1, a2, ..., aD}; e.g., D = 2 gives {0, 1}

  15. Thm (Noiseless Coding Thm) • If n̄ = Σi pi ni is the average codeword length of a uniquely decodable code for X, then n̄ ≥ H_D(X), with equality iff pi = D^(−ni) for i = 1, 2, ..., M. • Note: H_D(X) = Σi pi log_D (1/pi) is the uncertainty of X computed by using the base D.

  16. pf: By the Kraft inequality for uniquely decodable codes, K = Σi D^(−ni) ≤ 1. Apply Lemma 5.2 with qi = D^(−ni)/K: H_D(X) − n̄ = Σi pi log_D (D^(−ni)/pi) = Σi pi log_D (qi/pi) + log_D K ≤ 0, so n̄ ≥ H_D(X). Equality forces K = 1 and pi = qi = D^(−ni) for all i.

  17. A code is called "absolutely optimal" if it achieves the lower bound n̄ = H_D(X) of the noiseless coding theorem. • Ex. H(X) = 7/4 = n̄ (a sketch of such a case follows below).
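
The slide's worked example did not survive transcription; here is a sketch of the classic absolutely optimal case, assuming the dyadic distribution (1/2, 1/4, 1/8, 1/8) and the code {0, 10, 110, 111} (my choice, picked because it yields H(X) = 7/4 exactly):

```python
import math

# Assumed example: dyadic probabilities with a matching instantaneous code.
probs = [1/2, 1/4, 1/8, 1/8]
code  = ["0", "10", "110", "111"]   # lengths n_i = log2(1/p_i)

H = sum(p * math.log2(1 / p) for p in probs)            # entropy in bits
n_bar = sum(p * len(w) for p, w in zip(probs, code))    # average codeword length

print(H, n_bar)   # both 1.75 = 7/4, so the code is absolutely optimal
```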

  18. 5.5 Shannon-Fano Coding • Select the integer ni such that log_D (1/pi) ≤ ni < log_D (1/pi) + 1, i.e., ni = ⌈log_D (1/pi)⌉. Then Σi D^(−ni) ≤ 1 (Kraft inequality), so an instantaneous code can be constructed with the lengths n1, n2, ..., nM obtained from Shannon-Fano coding.
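
A minimal sketch (mine, for the binary case D = 2) of the length rule above, together with the Kraft-inequality check that guarantees an instantaneous code exists:

```python
import math

def shannon_fano_lengths(probs, D=2):
    """n_i = ceil(log_D(1 / p_i)) for each probability p_i."""
    return [math.ceil(math.log(1 / p, D)) for p in probs]

probs = [0.4, 0.3, 0.2, 0.1]
lengths = shannon_fano_lengths(probs)
kraft = sum(2 ** -n for n in lengths)

print(lengths)        # [2, 2, 3, 4]
print(kraft <= 1)     # True, so an instantaneous code with these lengths exists
```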

  19. Thm: Given a random variable X with uncertainty H_D(X), the Shannon-Fano code for X satisfies H_D(X) ≤ n̄ < H_D(X) + 1.

  20. In fact, we can always approach the lower bound as closely as desired if we are allowed to use "block coding". • Take a series of s observations of X and let Y = (x1, x2, ..., xs); assign a codeword to each value of Y. Since H_D(Y) = s·H_D(X) for independent observations, the Shannon-Fano bound applied to Y gives H_D(X) ≤ n̄/s < H_D(X) + 1/s, where n̄ is the average codeword length for Y. => Block coding decreases the average codeword length per value of X.

  21. Ex. X = {x1, x2} with p(x1) = 0.7, p(x2) = 0.3. A single-symbol binary code cannot use fewer than 1 binary digit per value of X, but H(X) = 0.88129 (the binary entropy H(p) at p = 0.3 or p = 0.7, from a look-up table), so block coding has room to help.
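
A sketch (my own numbers, not necessarily the slide's table) of how blocking s independent observations with p = (0.7, 0.3) moves the per-symbol Shannon-Fano length toward H(X) = 0.88129:

```python
import math
from itertools import product

p = {"x1": 0.7, "x2": 0.3}

def sf_length(prob):
    # Shannon-Fano rule for a binary code: n = ceil(log2(1/prob))
    return math.ceil(math.log2(1 / prob))

for s in (1, 2, 3, 4):
    # Probability of each block of s independent observations of X.
    block_probs = [math.prod(p[sym] for sym in blk) for blk in product(p, repeat=s)]
    n_bar = sum(q * sf_length(q) for q in block_probs)
    print(f"s = {s}: average length per value of X = {n_bar / s:.4f}")

# The theorem guarantees H(X) <= n_bar/s < H(X) + 1/s, so the per-symbol
# length converges to H(X) = 0.88129 as the block length s grows.
```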

  22. How do we find the actual code symbols? • We simply assign them in order: list the codeword lengths in nondecreasing order and give each source value the smallest codeword of its length that does not have an earlier codeword as a prefix. • A sketch of this assignment follows below.
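
My reading of "assign them in order" is the standard counting construction (the slide's own worked table did not survive transcription); a sketch for a binary alphabet:

```python
import math

def sf_code(probs):
    """Binary Shannon-Fano lengths, with codewords assigned in order."""
    lengths = sorted(math.ceil(math.log2(1 / p)) for p in probs)
    codewords, value, prev_len = [], 0, lengths[0]
    for n in lengths:
        value <<= (n - prev_len)              # pad with zeros up to length n
        codewords.append(format(value, f"0{n}b"))
        value += 1                            # next codeword = previous + 1
        prev_len = n
    return codewords

print(sf_code([0.4, 0.3, 0.2, 0.1]))   # ['00', '01', '100', '1010']
```

Because the lengths satisfy the Kraft inequality, counting upward in this way never runs out of strings, and no codeword is a prefix of a later one.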

  23. How bad is Shannon-Fano Coding?
