
Expectation-Maximization (EM) Chapter 3 (Duda et al.) – Section 3.9


Presentation Transcript


  1. Expectation-Maximization (EM) – Chapter 3 (Duda et al.), Section 3.9 – CS479/679 Pattern Recognition, Dr. George Bebis

  2. Expectation-Maximization (EM) • EM is an iterative method to perform ML estimation: • Starts with an initial estimate for θ. • Refines the current estimate iteratively to increase the likelihood of the observed data: p(D/θ)

  3. Expectation-Maximization (EM) • EM represents a general framework – it works best in situations where the data are incomplete (or can be thought of as being incomplete). • Some creativity is required to recognize where the EM algorithm can be used. • It is the standard method for estimating the parameters of Mixtures of Gaussians (MoG).

  4. The Case of Incomplete Data • Often it is impossible to apply ML estimation directly because we cannot measure all the features, or because certain feature values are missing. • The EM algorithm is ideal for problems with unobserved (missing) data.

  5. Example (Moon, 1996) • Let x = (x1, x2, x3) with x1 + x2 + x3 = n, and assume a trinomial distribution: P(x1, x2, x3 / θ) = n! / (x1! x2! x3!) p1^x1 p2^x2 p3^x3

  6. Example (Moon, 1996) (cont’d)

  7. EM: Main Idea • If x were available, we could use ML to estimate θ, i.e., maximize ln p(Dx/θ). • Since x is not available: maximize the expectation of ln p(Dx/θ) with respect to the unknown (unobserved) variables, given y and an estimate of θ.

  8. EM Steps (1) Initialization (2) Expectation (3) Maximization (4) Test for convergence

  9. EM Steps (cont’d) (1) Initialization Step: initialize the algorithm with a guess θ0. (2) Expectation Step: performed with respect to the unobserved variables, using the current estimate of the parameters and conditioned upon the observations. • When ln p(Dx/θ) is a linear function of the unobserved variables, the expectation step is equivalent to replacing the unobserved variables by their expected values in ln p(Dx/θ).

  10. EM Steps (cont’d) (3) Maximization Step: provides a new estimate of the parameters by maximizing the expectation computed in the E-step. (4) Test for Convergence: if the change in the estimate is small (e.g., ||θ(t+1) − θ(t)|| < ε), stop; otherwise, go to Step 2.
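A minimal sketch of this four-step loop in Python, assuming generic e_step and m_step functions (hypothetical names, not from the slides):

```python
import numpy as np

def em(y, theta0, e_step, m_step, eps=1e-6, max_iter=200):
    """Generic EM loop: alternate E- and M-steps until the parameter estimate stabilizes."""
    theta = np.asarray(theta0, dtype=float)                    # (1) initialization with a guess
    for _ in range(max_iter):
        # (2) E-step: expected complete-data statistics given observed data y and current theta
        stats = e_step(y, theta)
        # (3) M-step: re-estimate the parameters from the expected statistics
        theta_new = np.asarray(m_step(y, stats), dtype=float)
        # (4) Convergence test: stop when the estimate changes very little
        if np.linalg.norm(theta_new - theta) < eps:
            return theta_new
        theta = theta_new
    return theta
```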

  11. Example (Moon, 1996) (cont’d) • Suppose the data follow the trinomial model above: P(x1, x2, x3 / θ) = n! / (x1! x2! x3!) p1^x1 p2^x2 p3^x3

  12. Example (Moon, 1996) (cont’d) • Take expected value: Let’s look at the M-step before completing the E-step …

  13. Example (Moon, 1996) (cont’d) • We only need to estimate: Let’s complete the E-step now …

  14. Example (Moon, 1996) (cont’d) (see Moon’s paper, page 53, for a proof)

  15. Example (Moon, 1996) (cont’d) • Initialization: θ0 • Expectation Step: • Maximization Step: • Convergence Step:

  16. Example (Moon, 1996) (cont’d) – plot of the estimate θt versus iteration t

  17. Convergence properties of EM • The solution depends on the initial estimate θ0. • At each iteration, a value of θ is computed so that the likelihood function does not decrease. • There is no guarantee that the algorithm will converge to a global maximum. • The algorithm is guaranteed to be stable, i.e., there is no chance of "overshooting" or diverging from the maximum.

  18. Mixture of 2D Gaussians - Example

  19. Mixture of 1D Gaussians - Example (three components with mixing weights π1=0.3, π2=0.2, π3=0.5)

  20. Mixture Model (components combined with mixing weights π1, π2, π3, …, πk)

  21. Mixture Parameters

  22. Fitting a Mixture Model to a set of observations Dx • Two fundamental issues: (1) Estimate the number of mixture components K. (2) Estimate the mixture parameters (πk, θk), k = 1, 2, …, K.

  23. Mixtures of Gaussians (see Chapter 10) p(x/θ) = Σk πk p(x/θk), where each p(x/θk) is a Gaussian N(μk, Σk) • The parameters θk are (μk, Σk)
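A small sketch of evaluating this mixture density in Python; the means and covariances below are assumed for illustration (only the weights come from the earlier slide):

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_density(x, pis, mus, covs):
    """p(x) = sum_k pi_k * N(x; mu_k, Sigma_k) for a Gaussian mixture."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=cov)
               for pi, mu, cov in zip(pis, mus, covs))

# Illustrative 1D example: weights from the earlier slide, assumed means and variances
print(mixture_density(0.5, pis=[0.3, 0.2, 0.5], mus=[-2.0, 0.0, 3.0], covs=[0.5, 1.0, 0.8]))
```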

  24. Data Generation Process Assuming Mixtures of Gaussians (a component is selected with probability πk, k = 1, …, K, and a sample is drawn from it)
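A minimal Python sketch of this generation process, reusing the same illustrative (assumed) parameters as above:

```python
import numpy as np

def sample_mixture(n, pis, mus, vars_, seed=0):
    """Generate n samples: pick component k with probability pi_k, then draw from N(mu_k, var_k)."""
    rng = np.random.default_rng(seed)
    ks = rng.choice(len(pis), size=n, p=pis)                           # latent component labels z_i
    xs = rng.normal(np.asarray(mus)[ks], np.sqrt(np.asarray(vars_)[ks]))
    return xs, ks

# Weights as on the earlier slide; means and variances are assumed for this demo
data, labels = sample_mixture(500, pis=[0.3, 0.2, 0.5], mus=[-2.0, 0.0, 3.0], vars_=[0.5, 1.0, 0.8])
```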

  25. Estimating Mixture Parameters Using ML – difficult!

  26. Estimating Mixture Parameters Using EM: Case of Unknown Means • Assumptions

  27. Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Introduce hidden or unobserved variables zi

  28. Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Main steps using EM

  29. Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Expectation Step

  30. Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Expectation Step

  31. Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Expectation Step

  32. Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Expectation Step E(zik) is just the probability that xi was generated by the k-th component: E(zik) = πk p(xi/μk) / Σj πj p(xi/μj)
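A sketch of this E-step in Python for the unknown-means case, assuming a known common variance sigma2 and known mixing weights pis (illustrative names, not from the slides):

```python
import numpy as np
from scipy.stats import norm

def e_step(x, mus, sigma2, pis):
    """E(z_ik): posterior probability that sample x_i was generated by component k."""
    x = np.asarray(x, dtype=float)[:, None]                    # shape (n, 1)
    mus = np.asarray(mus, dtype=float)[None, :]                # shape (1, K)
    # pi_k * p(x_i / mu_k), with a known common variance sigma2
    weighted = np.asarray(pis) * norm.pdf(x, loc=mus, scale=np.sqrt(sigma2))
    return weighted / weighted.sum(axis=1, keepdims=True)      # normalize over components k
```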

  33. Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Maximization Step

  34. Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Summary

  35. Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Summary
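Putting the steps together, a minimal EM sketch for the unknown-means case in Python; the known variance and mixing weights, and all parameter values in the demo, are assumptions for this illustration rather than the course code:

```python
import numpy as np
from scipy.stats import norm

def em_unknown_means(x, mus0, sigma2, pis, max_iter=200, eps=1e-6):
    """EM for a 1D Gaussian mixture in which only the component means are unknown."""
    x = np.asarray(x, dtype=float)
    mus = np.asarray(mus0, dtype=float)
    for _ in range(max_iter):
        # E-step: responsibilities E(z_ik) under the current means
        weighted = np.asarray(pis) * norm.pdf(x[:, None], loc=mus, scale=np.sqrt(sigma2))
        r = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: each new mean is the responsibility-weighted average of the samples
        mus_new = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
        if np.max(np.abs(mus_new - mus)) < eps:                # convergence test
            return mus_new
        mus = mus_new
    return mus

# Example usage on synthetic data (true means -2 and 3 are assumed for this demo)
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 200)])
print(em_unknown_means(x, mus0=[0.0, 1.0], sigma2=1.0, pis=[0.6, 0.4]))
```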

  36. Estimating Mixture Parameters Using EM: General Case • Need to review Lagrange Optimization first …

  37. Lagrange Optimization • Maximize f(x) subject to g(x) = 0: set ∇f(x) + λ∇g(x) = 0 and solve, together with g(x) = 0, for x and λ (n+1 equations / n+1 unknowns)

  38. Lagrange Optimization (cont’d) • Example: Maximize f(x1,x2) = x1x2 subject to the constraint g(x1,x2) = x1 + x2 − 1 = 0 (3 equations / 3 unknowns)
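Working the example out (a standard computation; the intermediate steps are not in the transcript):

```latex
L(x_1, x_2, \lambda) = x_1 x_2 + \lambda (x_1 + x_2 - 1)

\frac{\partial L}{\partial x_1} = x_2 + \lambda = 0, \qquad
\frac{\partial L}{\partial x_2} = x_1 + \lambda = 0, \qquad
x_1 + x_2 - 1 = 0

\Rightarrow\; x_1 = x_2 = \tfrac{1}{2}, \quad \lambda = -\tfrac{1}{2}, \quad
f_{\max} = \tfrac{1}{4}
```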

  39. Estimating Mixture Parameters Using EM: General Case • Introduce hidden variables

  40. Estimating Mixture Parameters Using EM: General Case (cont’d) • Expectation Step

  41. Estimating Mixture Parameters Using EM: General Case (cont’d) • Expectation Step (cont’d)

  42. Estimating Mixture Parameters Using EM: General Case (cont’d) • Expectation Step (cont’d)

  43. Estimating Mixture Parameters Using EM: General Case (cont’d) • Maximization Step: use Lagrange optimization (to enforce the constraint Σk πk = 1)

  44. Estimating Mixture Parameters Using EM: General Case (cont’d) • Maximization Step (cont’d)

  45. Estimating Mixture Parameters Using EM: General Case (cont’d) • Summary

  46. Estimating Mixture Parameters Using EM: General Case (cont’d) • Summary
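A compact Python sketch of the general case, where the mixing weights, means, and variances are all re-estimated; it is shown for 1D data with hypothetical variable names, as an illustration of the updates rather than the course code:

```python
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, K, max_iter=200, eps=1e-6, seed=0):
    """EM for a 1D Gaussian mixture: estimate mixing weights, means, and variances."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    pis = np.full(K, 1.0 / K)                                  # initial mixing weights
    mus = rng.choice(x, size=K, replace=False)                 # initialize means from the data
    vars_ = np.full(K, x.var())
    prev_ll = -np.inf
    for _ in range(max_iter):
        # E-step: responsibilities r[i, k] = P(component k / x_i)
        weighted = pis * norm.pdf(x[:, None], loc=mus, scale=np.sqrt(vars_))
        r = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: weighted re-estimates (the Lagrange step yields pi_k = N_k / n)
        Nk = r.sum(axis=0)
        pis = Nk / n
        mus = (r * x[:, None]).sum(axis=0) / Nk
        vars_ = (r * (x[:, None] - mus) ** 2).sum(axis=0) / Nk
        # Convergence test on the observed-data log-likelihood (it never decreases)
        ll = np.log(weighted.sum(axis=1)).sum()
        if ll - prev_ll < eps:
            break
        prev_ll = ll
    return pis, mus, vars_
```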

  47. Estimating the Number of Components K
