
Class 7: Hidden Markov Models



  1. Class 7: Hidden Markov Models

  2. Sequence Models • So far we examined several probabilistic sequence models • These models, however, assumed that positions are independent • This means that the order of elements in the sequence did not play a role • In this class we learn about probabilistic models of sequences in which the order of elements matters

  3. Probability of Sequences • Fix an alphabet $\Sigma$ • Let $X_1,\dots,X_n$ be a sequence of random variables over $\Sigma$ • We want to model $P(X_1,\dots,X_n)$

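  4. Markov Chains Assumption: • $X_{i+1}$ is independent of the past once we know $X_i$. This allows us to write:
     $$P(X_1,\dots,X_n) = P(X_1)\prod_{i=1}^{n-1} P(X_{i+1} \mid X_1,\dots,X_i) = P(X_1)\prod_{i=1}^{n-1} P(X_{i+1} \mid X_i)$$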

  5. Markov Chains (cont.) Assumption: • $P(X_{i+1} \mid X_i)$ is the same for all $i$ • Notation: $P(X_{i+1}=b \mid X_i=a) = A_{ab}$ • By specifying the matrix $A$ and the initial probabilities, we define $P(X_1,\dots,X_n)$ • To avoid the special case of $P(X_1)$, we can use a special start state $s$ and denote $P(X_1 = a) = A_{sa}$
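The factorization on slides 4 and 5 translates directly into code. The sketch below is a minimal illustration, assuming symbols are encoded as integers and that the start-state row $A_{sa}$ is passed as a separate list `start`; the function name and the toy parameters are hypothetical. It adds logarithms instead of multiplying probabilities, because a product of many numbers below 1 underflows for long sequences.

```python
import math

def markov_chain_log_prob(seq, start, A):
    """log P(X1..Xn) = log P(X1) + sum_i log P(X_{i+1} | X_i).

    seq:   symbols encoded as integers 0..K-1
    start: start[a] = P(X1 = a), i.e. the row A_{sa} of the special start state
    A:     A[a][b] = P(X_{i+1} = b | X_i = a), shared by all positions i
    """
    logp = math.log(start[seq[0]])
    for a, b in zip(seq, seq[1:]):  # consecutive pairs (X_i, X_{i+1})
        logp += math.log(A[a][b])
    return logp

# Hypothetical two-letter chain, not taken from the slides
start = [0.5, 0.5]
A = [[0.9, 0.1],
     [0.2, 0.8]]
p = math.exp(markov_chain_log_prob([0, 0, 1, 1], start, A))
print(p)  # 0.5 * 0.9 * 0.1 * 0.8, about 0.036
```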

  6. Example: CpG islands • In the human genome, CpG dinucleotides are relatively rare • CpG pairs undergo a process called methylation that modifies the C nucleotide • A methylated C can (with relatively high chance) mutate to a T • Promoter regions are CpG rich • These regions are not methylated, and thus mutate less often • These are called CpG islands

  7. CpG Islands • We construct one Markov chain for CpG-rich regions and one for CpG-poor regions • Using maximum likelihood estimates from 60K nucleotides, we get two models, “+” (CpG rich) and “-” (CpG poor)
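Maximum likelihood estimation for a Markov chain amounts to counting transitions and normalizing each row, $A_{ab} = N_{ab} / \sum_{b'} N_{ab'}$, where $N_{ab}$ counts how often $b$ directly follows $a$. A minimal sketch, assuming training sequences are plain strings over ACGT; `estimate_transitions` and its optional `pseudocount` (an addition beyond plain maximum likelihood) are hypothetical names, and the training string is a toy, not the data behind the slides.

```python
from collections import Counter

ALPHABET = "ACGT"

def estimate_transitions(sequences, pseudocount=0.0):
    """ML estimate A[a][b] = N_ab / sum_b' N_ab' from training sequences.

    pseudocount > 0 adds that amount to every count (an assumption beyond
    plain ML) so that unseen transitions do not get probability zero.
    """
    counts = Counter()
    for s in sequences:
        for a, b in zip(s, s[1:]):
            counts[a, b] += 1  # N_ab: b directly follows a
    A = {}
    for a in ALPHABET:
        row = {b: counts[a, b] + pseudocount for b in ALPHABET}
        total = sum(row.values())
        if total == 0:
            raise ValueError(f"no transitions observed out of {a!r}")
        A[a] = {b: c / total for b, c in row.items()}
    return A

# Hypothetical training data, not the 60K-nucleotide set from the slides
A_ml = estimate_transitions(["ACGTACGT"])
A_smooth = estimate_transitions(["ACGTACGT"], pseudocount=1.0)
print(A_ml["A"]["C"], A_smooth["A"]["C"])  # 1.0 0.5
```

Training one chain on CpG-island sequences and another on background sequences gives the two models of this slide.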

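  8. Ratio Test for CpG islands • Given a sequence $X_1,\dots,X_n$ we compute the log-likelihood ratio, where $A^{+}$ and $A^{-}$ are the transition matrices of the CpG-rich and CpG-poor models (assuming both models share the same initial probabilities, so the $P(X_1)$ terms cancel):
     $$S(X) = \log \frac{P(X_1,\dots,X_n \mid +)}{P(X_1,\dots,X_n \mid -)} = \sum_{i=1}^{n-1} \log \frac{A^{+}_{X_i X_{i+1}}}{A^{-}_{X_i X_{i+1}}}$$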
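The log-ratio reduces to a sum of per-transition scores. A minimal sketch, assuming both models are given as nested dicts of transition probabilities and share the same initial distribution; the function name is hypothetical, and the two toy models below are made up for illustration and are not the estimates from slide 7.

```python
import math

def log_likelihood_ratio(seq, A_plus, A_minus):
    """S(X) = sum_i log(A+[x_i][x_{i+1}] / A-[x_i][x_{i+1}]).

    Assumes both models share the same initial distribution, so P(X1) cancels.
    S > 0 favors the CpG-island ("+") model.
    """
    return sum(math.log(A_plus[a][b] / A_minus[a][b]) for a, b in zip(seq, seq[1:]))

# Hypothetical toy models (NOT the estimates from the slides):
# "-" is uniform; "+" only makes G more likely after C.
A_minus = {a: {b: 0.25 for b in "ACGT"} for a in "ACGT"}
A_plus = {a: dict(row) for a, row in A_minus.items()}
A_plus["C"] = {"A": 0.2, "C": 0.2, "G": 0.4, "T": 0.2}

print(log_likelihood_ratio("CGCG", A_plus, A_minus) > 0)  # True: two C->G steps
print(log_likelihood_ratio("CACA", A_plus, A_minus) < 0)  # True: two C->A steps
```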

  9. Empirical Evaluation

  10. Finding CpG islands Simple-minded approach: • Pick a window of size $N$ ($N = 100$, for example) • Compute the log-ratio for the sequence in the window, and classify based on that Problems: • How do we select $N$? • What do we do when the window intersects the boundary of a CpG island?
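The simple-minded window approach can be sketched by scoring each transition once and summing the scores inside every window. This minimal sketch assumes toy “+”/“-” models given as nested dicts (made up, not from the slides), a hypothetical function name `window_scores`, and a decision threshold of 0, which the slides do not specify.

```python
import math

def window_scores(seq, A_plus, A_minus, N=100):
    """Log-likelihood ratio of every length-N window seq[s:s+N].

    Positive scores favor the "+" (CpG island) model.
    """
    # pair_score[j] scores the transition seq[j] -> seq[j+1]
    pair_score = [math.log(A_plus[a][b] / A_minus[a][b]) for a, b in zip(seq, seq[1:])]
    # A window starting at s contains the N-1 transitions s .. s+N-2
    return [sum(pair_score[s:s + N - 1]) for s in range(len(seq) - N + 1)]

# Hypothetical toy models: "-" uniform, "+" favors G after C
A_minus = {a: {b: 0.25 for b in "ACGT"} for a in "ACGT"}
A_plus = {a: dict(row) for a, row in A_minus.items()}
A_plus["C"] = {"A": 0.2, "C": 0.2, "G": 0.4, "T": 0.2}

scores = window_scores("CGCGAAAA", A_plus, A_minus, N=4)
labels = ["+" if s > 0 else "-" for s in scores]  # threshold 0 is an assumption
print(labels)  # ['+', '+', '+', '-', '-']
```

Recomputing each window sum costs $O(N)$ per window; a running sum would make the scan linear in the sequence length. The toy output also shows the boundary problem from the slide: windows that straddle the CG-rich prefix and the A-rich suffix get intermediate scores.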

  11. Alternative Approach • Build a model that includes “+” states and “-” states • A state “remembers” the last nucleotide and the type of region • A transition from a “-” state to a “+” state describes the start of a CpG island

  12. Hidden Markov Models Two components: • A Markov chain of hidden states $H_1,\dots,H_n$ with $L$ values • $P(H_{i+1}=k \mid H_i=l) = A_{lk}$ • Observations $X_1,\dots,X_n$ • Assumption: $X_i$ depends only on the hidden state $H_i$ • $P(X_i=a \mid H_i=k) = B_{ka}$

  13. Semantics
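An HMM defines a joint distribution over observations and hidden states, $P(x_1,\dots,x_n,h_1,\dots,h_n) = A_{0 h_1} B_{h_1 x_1} \prod_{i=2}^{n} A_{h_{i-1} h_i} B_{h_i x_i}$, with the notation of slide 12 and a start state 0. A minimal sketch, assuming integer-coded states and symbols and passing the start-state row as `pi`; the function name and the two-state toy model (state 1 favors symbol 1, loosely like a loaded die) are made up.

```python
def joint_prob(x, h, pi, A, B):
    """P(x_1..x_n, h_1..h_n) for an HMM.

    pi[l]   = P(H_1 = l)              (the start-state row A_{0l})
    A[k][l] = P(H_{i+1} = l | H_i = k)
    B[k][a] = P(X_i = a | H_i = k)
    """
    p = pi[h[0]] * B[h[0]][x[0]]
    for i in range(1, len(x)):
        # one hidden transition, then one emission from the new state
        p *= A[h[i - 1]][h[i]] * B[h[i]][x[i]]
    return p

# Hypothetical 2-state, 2-symbol HMM used only for illustration
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.5, 0.5], [0.1, 0.9]]
print(joint_prob([1, 1, 1], [1, 1, 1], pi, A, B))  # about 0.23328
```

Summing this joint probability over every pair of length-$n$ sequences $(x, h)$ gives 1, which is a quick sanity check on any parameter set.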

  14. Example: Dishonest Casino

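  15. Computing Most Probable Sequence Given: $x_1,\dots,x_n$ Output: $h^*_1,\dots,h^*_n$ such that
     $$h^*_1,\dots,h^*_n = \arg\max_{h_1,\dots,h_n} P(h_1,\dots,h_n \mid x_1,\dots,x_n) = \arg\max_{h_1,\dots,h_n} P(x_1,\dots,x_n, h_1,\dots,h_n)$$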

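  16. Idea: • If we know the value of $h_i$, then the most probable sequence on $i+1,\dots,n$ does not depend on observations before time $i$ • Let $V_i(l)$ be the probability of the best sequence $h_1,\dots,h_i$ such that $h_i = l$:
     $$V_i(l) = \max_{h_1,\dots,h_{i-1}} P(x_1,\dots,x_i, h_1,\dots,h_{i-1}, H_i = l)$$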

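  17. Dynamic Programming Rule • By the Markov assumptions,
     $$V_{i+1}(l) = \max_{h_1,\dots,h_i} P(x_1,\dots,x_i, h_1,\dots,h_i)\, A_{h_i l}\, B_{l x_{i+1}}$$
     • so
     $$V_{i+1}(l) = B_{l x_{i+1}} \max_k V_i(k)\, A_{kl}$$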

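  18. Viterbi Algorithm
     • Set $V_0(0) = 1$, $V_0(l) = 0$ for $l > 0$ (state 0 is the special start state)
     • for $i = 1,\dots,n$
        • for $l = 1,\dots,L$
           • set $V_i(l) = B_{l x_i} \max_k V_{i-1}(k)\, A_{kl}$ and $P_i(l) = \arg\max_k V_{i-1}(k)\, A_{kl}$
     • Let $h^*_n = \arg\max_l V_n(l)$
     • for $i = n-1,\dots,1$
        • set $h^*_i = P_{i+1}(h^*_{i+1})$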
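A minimal Python sketch of the Viterbi recursion. It works in log space so that long sequences do not underflow (maximizing log-probabilities selects the same path as maximizing probabilities). Assumptions: positions are 0-based, the start-state row $A_{0l}$ is passed as `pi`, `ptr[i][l]` stores the back-pointer $P_i(l)$, and the function names and toy model are made up.

```python
import math

def _log(p):
    # log(0) = -inf, so impossible transitions/emissions never win a max
    return math.log(p) if p > 0 else float("-inf")

def viterbi(x, pi, A, B):
    """Most probable hidden path argmax_h P(x, h), computed in log space.

    x: observations as ints; pi[l] = A_{0l}; A[k][l], B[k][a] as on slide 12.
    Returns (path, log P(x, path)). Positions are 0-based here.
    """
    L, n = len(pi), len(x)
    # V[i][l]: log-probability of the best path ending in state l at position i
    V = [[_log(pi[l]) + _log(B[l][x[0]]) for l in range(L)]]
    ptr = [[0] * L]  # ptr[i][l] plays the role of P_i(l); row 0 is unused
    for i in range(1, n):
        row, back = [], []
        for l in range(L):
            k_best = max(range(L), key=lambda k: V[i - 1][k] + _log(A[k][l]))
            row.append(V[i - 1][k_best] + _log(A[k_best][l]) + _log(B[l][x[i]]))
            back.append(k_best)
        V.append(row)
        ptr.append(back)
    # h*_n = argmax_l V_n(l), then follow the back-pointers
    last = max(range(L), key=lambda l: V[n - 1][l])
    path = [last]
    for i in range(n - 1, 0, -1):
        path.append(ptr[i][path[-1]])
    path.reverse()
    return path, V[n - 1][last]

# Hypothetical two-state model (state 1 favors symbol 1), not from the slides
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.5, 0.5], [0.1, 0.9]]
print(viterbi([1, 1, 1], pi, A, B)[0])  # [1, 1, 1]
print(viterbi([1, 1, 0], pi, A, B)[0])  # [0, 0, 0]
```

The second call shows that the best path is chosen globally: a single 0 at the end pulls the whole path into state 0, because staying in state 0 throughout beats switching out of state 1.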

  19. Viterbi Algorithm – Example

  20. Computing Probabilities Given: $x_1,\dots,x_n$ Output: $P(x_1,\dots,x_n) = \sum_{h_1,\dots,h_n} P(x_1,\dots,x_n, h_1,\dots,h_n)$ • How do we sum over an exponential number ($L^n$) of hidden sequences?

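  21. Forward Algorithm • Perform dynamic programming on sequences • Let $f_i(l) = P(x_1,\dots,x_i, H_i=l)$, starting from $f_0(0) = 1$, $f_0(l) = 0$ for $l > 0$ • Recursion rule:
     $$f_{i+1}(l) = B_{l x_{i+1}} \sum_k f_i(k)\, A_{kl}$$
     • Conclusion:
     $$P(x_1,\dots,x_n) = \sum_l f_n(l)$$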
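The forward recursion in a minimal Python sketch, with a hypothetical helper `brute_force_prob` that sums over all $L^n$ hidden sequences, included only to check the recursion on a short input. Assumptions: integer-coded states and symbols, 0-based positions, `pi` standing in for the start-state row $A_{0l}$, and a made-up toy model. Plain probabilities are fine for short inputs; long sequences need per-position scaling or log-space arithmetic.

```python
import itertools

def forward(x, pi, A, B):
    """f[i][l] = P(x_1..x_{i+1}, H_{i+1} = l) (Python index i is position i+1).

    Returns (f, P(x_1..x_n)) where P(x) = sum_l f_n(l).
    """
    L = len(pi)
    # f_1(l) = A_{0l} B_{l x_1}: leave the start state, emit x_1
    f = [[pi[l] * B[l][x[0]] for l in range(L)]]
    for i in range(1, len(x)):
        # f_{i+1}(l) = B_{l x_{i+1}} * sum_k f_i(k) A_{kl}
        f.append([B[l][x[i]] * sum(f[i - 1][k] * A[k][l] for k in range(L))
                  for l in range(L)])
    return f, sum(f[-1])

def brute_force_prob(x, pi, A, B):
    """Reference answer: sum P(x, h) over all L**n hidden sequences (exponential)."""
    total = 0.0
    for h in itertools.product(range(len(pi)), repeat=len(x)):
        p = pi[h[0]] * B[h[0]][x[0]]
        for i in range(1, len(x)):
            p *= A[h[i - 1]][h[i]] * B[h[i]][x[i]]
        total += p
    return total

# Hypothetical toy model, not from the slides
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.5, 0.5], [0.1, 0.9]]
x = [1, 0, 1, 1, 0]
print(forward(x, pi, A, B)[1], brute_force_prob(x, pi, A, B))  # equal up to rounding
```

The forward pass costs $O(nL^2)$ time, while the brute-force sum touches $L^n$ sequences, which is exactly the gap the previous slide asks about.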

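  22. Computing Posteriors • How do we compute $P(H_i \mid x_1,\dots,x_n)$? • Since $x_{i+1},\dots,x_n$ depend on $x_1,\dots,x_i$ only through $H_i$:
     $$P(x_1,\dots,x_n, H_i=l) = P(x_1,\dots,x_i, H_i=l)\, P(x_{i+1},\dots,x_n \mid H_i=l)$$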

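  23. Backward Algorithm • Perform dynamic programming on sequences • Let $b_i(l) = P(x_{i+1},\dots,x_n \mid H_i=l)$, with $b_n(l) = 1$ • Recursion rule:
     $$b_i(l) = \sum_k A_{lk}\, B_{k x_{i+1}}\, b_{i+1}(k)$$
     • Conclusion:
     $$P(x_1,\dots,x_n) = \sum_l A_{0l}\, B_{l x_1}\, b_1(l)$$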
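The backward recursion, sketched in Python: start from $b_n(l) = 1$ and fill positions from right to left. Assumptions: integer-coded states and symbols, 0-based positions, `pi` standing in for the start-state row $A_{0l}$, and a made-up toy model; the function names are hypothetical.

```python
def backward(x, A, B):
    """b[i][l] = P(x_{i+2}..x_n | H_{i+1} = l) with 0-based i, so b[n-1][l] = b_n(l) = 1."""
    L, n = len(A), len(x)
    b = [[1.0] * L for _ in range(n)]
    for i in range(n - 2, -1, -1):
        # b_i(l) = sum_k A_{lk} B_{k x_{i+1}} b_{i+1}(k)
        b[i] = [sum(A[l][k] * B[k][x[i + 1]] * b[i + 1][k] for k in range(L))
                for l in range(L)]
    return b

def prob_from_backward(x, pi, B, b):
    """P(x_1..x_n) = sum_l A_{0l} B_{l x_1} b_1(l), with pi[l] standing in for A_{0l}."""
    return sum(pi[l] * B[l][x[0]] * b[0][l] for l in range(len(pi)))

# Hypothetical toy model, not from the slides
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.5, 0.5], [0.1, 0.9]]
x = [1, 1]
print(prob_from_backward(x, pi, B, backward(x, A, B)))  # about 0.504
```

The result must agree with the forward value $\sum_l f_n(l)$, which makes a handy correctness check when implementing both passes.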

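  24. Computing Posteriors • How do we compute $P(H_i \mid x_1,\dots,x_n)$? • Combining the forward and backward messages:
     $$P(H_i = l \mid x_1,\dots,x_n) = \frac{f_i(l)\, b_i(l)}{P(x_1,\dots,x_n)}$$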
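Posteriors combine both message tables. This minimal sketch recomputes the forward and backward messages so it runs on its own. Assumptions: integer-coded states and symbols, 0-based positions, `pi` standing in for $A_{0l}$, unscaled probabilities (short sequences only), and a made-up toy model; the function names are hypothetical.

```python
def forward(x, pi, A, B):
    # f[i][l] = P(x_1..x_{i+1}, H_{i+1} = l), 0-based i
    L = len(pi)
    f = [[pi[l] * B[l][x[0]] for l in range(L)]]
    for i in range(1, len(x)):
        f.append([B[l][x[i]] * sum(f[i - 1][k] * A[k][l] for k in range(L))
                  for l in range(L)])
    return f

def backward(x, A, B):
    # b[i][l] = P(x_{i+2}..x_n | H_{i+1} = l), 0-based i; last row is all ones
    L, n = len(A), len(x)
    b = [[1.0] * L for _ in range(n)]
    for i in range(n - 2, -1, -1):
        b[i] = [sum(A[l][k] * B[k][x[i + 1]] * b[i + 1][k] for k in range(L))
                for l in range(L)]
    return b

def posteriors(x, pi, A, B):
    """post[i][l] = P(H_{i+1} = l | x_1..x_n) = f_{i+1}(l) b_{i+1}(l) / P(x)."""
    f, b = forward(x, pi, A, B), backward(x, A, B)
    px = sum(f[-1])  # P(x_1..x_n) = sum_l f_n(l)
    return [[f[i][l] * b[i][l] / px for l in range(len(pi))] for i in range(len(x))]

# Hypothetical toy model, not from the slides
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.5, 0.5], [0.1, 0.9]]
post = posteriors([1, 1, 0, 1], pi, A, B)
print([round(p[1], 3) for p in post])  # P(H_i = 1 | x) at each position
```

Unlike Viterbi, which returns one jointly best path, this gives a separate probability for each position, which is what a per-position plot such as the casino example on the next slide shows.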

  25. Dishonest Casino (again) • Computing posterior probabilities for “fair” at each point in a long sequence:

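  26. Learning Given a sequence $x_1,\dots,x_n$, $h_1,\dots,h_n$ • How do we learn $A_{kl}$ and $B_{ka}$? • We want to find parameters that maximize the likelihood $P(x_1,\dots,x_n, h_1,\dots,h_n)$ We simply count: • $N_{kl}$: number of times $h_i=k$ and $h_{i+1}=l$ • $N_{ka}$: number of times $h_i=k$ and $x_i=a$ • and normalize:
     $$A_{kl} = \frac{N_{kl}}{\sum_{l'} N_{kl'}}, \qquad B_{ka} = \frac{N_{ka}}{\sum_{a'} N_{ka'}}$$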
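With the hidden states observed, learning is counting followed by row normalization. A minimal sketch, assuming integer-coded states (0..L-1) and symbols (0..M-1); the function names and the labeled toy sequence are hypothetical. Rows for states that never occur are left as zeros here (in practice one would add pseudocounts), and the start-state probabilities are not estimated.

```python
def normalize_rows(counts):
    """Turn each row of counts into a probability distribution.
    Rows with no counts (a state that never occurs) are left as zeros."""
    out = []
    for row in counts:
        total = sum(row)
        out.append([c / total if total else 0.0 for c in row])
    return out

def estimate_from_labeled(x, h, L, M):
    """ML estimates from one fully observed sequence (x, h).

    N_kl = #{i : h_i = k, h_{i+1} = l},  N_ka = #{i : h_i = k, x_i = a}
    A_kl = N_kl / sum_l' N_kl',         B_ka = N_ka / sum_a' N_ka'
    """
    N_trans = [[0] * L for _ in range(L)]
    N_emit = [[0] * M for _ in range(L)]
    for i in range(len(x)):
        N_emit[h[i]][x[i]] += 1
        if i + 1 < len(x):
            N_trans[h[i]][h[i + 1]] += 1
    return normalize_rows(N_trans), normalize_rows(N_emit)

# Hypothetical labeled data: observations x and their hidden states h
A_hat, B_hat = estimate_from_labeled([0, 1, 1, 0], [0, 0, 1, 1], L=2, M=2)
print(A_hat)  # [[0.5, 0.5], [0.0, 1.0]]
print(B_hat)  # [[0.5, 0.5], [0.5, 0.5]]
```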

  27. Learning Given only the sequence $x_1,\dots,x_n$ • How do we learn $A_{kl}$ and $B_{ka}$? • We want to find parameters that maximize the likelihood $P(x_1,\dots,x_n)$ Problem: • The counts are inaccessible since we do not observe $h_i$

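  28. If we have $A_{kl}$ and $B_{ka}$ we can compute
     $$P(H_i=k, H_{i+1}=l \mid x_1,\dots,x_n) = \frac{f_i(k)\, A_{kl}\, B_{l x_{i+1}}\, b_{i+1}(l)}{P(x_1,\dots,x_n)}$$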

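  29. Expected Counts • We can compute the expected number of times $h_i=k$ and $h_{i+1}=l$:
     $$E[N_{kl}] = \sum_{i=1}^{n-1} P(H_i=k, H_{i+1}=l \mid x_1,\dots,x_n)$$
     • Similarly
     $$E[N_{ka}] = \sum_{i:\, x_i = a} P(H_i=k \mid x_1,\dots,x_n)$$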
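Expected counts follow from the forward and backward messages: each position contributes the posterior probabilities of a state (for emissions) or of a pair of consecutive states (for transitions). A minimal, self-contained sketch, assuming integer-coded states and symbols, `pi` standing in for $A_{0l}$, unscaled probabilities (short sequences only), and a made-up toy model; the function names are hypothetical.

```python
def forward(x, pi, A, B):
    L = len(pi)
    f = [[pi[l] * B[l][x[0]] for l in range(L)]]
    for i in range(1, len(x)):
        f.append([B[l][x[i]] * sum(f[i - 1][k] * A[k][l] for k in range(L))
                  for l in range(L)])
    return f

def backward(x, A, B):
    L, n = len(A), len(x)
    b = [[1.0] * L for _ in range(n)]
    for i in range(n - 2, -1, -1):
        b[i] = [sum(A[l][k] * B[k][x[i + 1]] * b[i + 1][k] for k in range(L))
                for l in range(L)]
    return b

def expected_counts(x, pi, A, B, M):
    """E[N_kl] and E[N_ka] under P(h | x) for the current parameters (M symbols)."""
    L, n = len(pi), len(x)
    f, b = forward(x, pi, A, B), backward(x, A, B)
    px = sum(f[-1])  # P(x_1..x_n)
    EN_trans = [[0.0] * L for _ in range(L)]
    EN_emit = [[0.0] * M for _ in range(L)]
    for i in range(n):
        for k in range(L):
            # P(H_i = k | x) = f_i(k) b_i(k) / P(x), credited to symbol x_i
            EN_emit[k][x[i]] += f[i][k] * b[i][k] / px
            if i + 1 < n:
                for l in range(L):
                    # P(H_i = k, H_{i+1} = l | x)
                    EN_trans[k][l] += f[i][k] * A[k][l] * B[l][x[i + 1]] * b[i + 1][l] / px
    return EN_trans, EN_emit

# Hypothetical toy model, not from the slides
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.5, 0.5], [0.1, 0.9]]
EN_trans, EN_emit = expected_counts([1, 1], pi, A, B, M=2)
print(sum(map(sum, EN_trans)), sum(map(sum, EN_emit)))  # about 1.0 and 2.0
```

The totals are a built-in check: expected transition counts sum to $n-1$ and expected emission counts sum to $n$, just as the observed counts would.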

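  30. Expectation Maximization (EM) • Choose initial $A_{kl}$ and $B_{ka}$ E-step: • Compute expected counts $E[N_{kl}]$, $E[N_{ka}]$ M-step: • Re-estimate:
     $$A'_{kl} = \frac{E[N_{kl}]}{\sum_{l'} E[N_{kl'}]}, \qquad B'_{ka} = \frac{E[N_{ka}]}{\sum_{a'} E[N_{ka'}]}$$
     • Reiterate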
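Putting the pieces together gives the EM loop for HMMs (often called Baum-Welch). This is a minimal, self-contained sketch, assuming integer-coded states and symbols, unscaled probabilities (adequate only for short sequences), and strictly positive initial parameters. It also re-estimates `pi` from the posterior at position 1, which corresponds to re-estimating the start-state entries $A_{0l}$. The observation sequence, initial parameters, and function names are made up.

```python
import math

def forward(x, pi, A, B):
    L = len(pi)
    f = [[pi[l] * B[l][x[0]] for l in range(L)]]
    for i in range(1, len(x)):
        f.append([B[l][x[i]] * sum(f[i - 1][k] * A[k][l] for k in range(L))
                  for l in range(L)])
    return f

def backward(x, A, B):
    L, n = len(A), len(x)
    b = [[1.0] * L for _ in range(n)]
    for i in range(n - 2, -1, -1):
        b[i] = [sum(A[l][k] * B[k][x[i + 1]] * b[i + 1][k] for k in range(L))
                for l in range(L)]
    return b

def normalize_rows(rows):
    # guard (an assumption): a row with zero expected counts becomes uniform
    return [[v / sum(row) if sum(row) else 1.0 / len(row) for v in row] for row in rows]

def em_step(x, pi, A, B):
    """One EM iteration; returns new (pi, A, B) and log P(x) under the OLD parameters."""
    L, M, n = len(pi), len(B[0]), len(x)
    f, b = forward(x, pi, A, B), backward(x, A, B)
    px = sum(f[-1])
    # E-step: expected counts E[N_kl], E[N_ka]
    EN_trans = [[0.0] * L for _ in range(L)]
    EN_emit = [[0.0] * M for _ in range(L)]
    for i in range(n):
        for k in range(L):
            EN_emit[k][x[i]] += f[i][k] * b[i][k] / px
            if i + 1 < n:
                for l in range(L):
                    EN_trans[k][l] += (f[i][k] * A[k][l] * B[l][x[i + 1]]
                                       * b[i + 1][l] / px)
    # M-step: normalize expected counts; pi gets the posterior of H_1
    new_pi = [f[0][k] * b[0][k] / px for k in range(L)]
    return new_pi, normalize_rows(EN_trans), normalize_rows(EN_emit), math.log(px)

def baum_welch(x, pi, A, B, iterations=30):
    history = []  # log P(x) before each update
    for _ in range(iterations):
        pi, A, B, loglik = em_step(x, pi, A, B)
        history.append(loglik)
    return pi, A, B, history

# Made-up observations and initial parameters (strictly positive)
x = [0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1]
pi, A, B, history = baum_welch(x, [0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]],
                               [[0.6, 0.4], [0.3, 0.7]])
print(history[0], history[-1])  # the log-likelihood should not decrease
```

The recorded log-likelihoods should never decrease from one iteration to the next; different initial parameters can still lead to different local maxima, so in practice one tries several starting points.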

  31. EM - basic properties • $P(x_1,\dots,x_n : A_{kl}, B_{ka}) \le P(x_1,\dots,x_n : A'_{kl}, B'_{ka})$ • The likelihood never decreases from one iteration to the next • If $P(x_1,\dots,x_n : A_{kl}, B_{ka}) = P(x_1,\dots,x_n : A'_{kl}, B'_{ka})$, then $A_{kl}, B_{ka}$ is a stationary point of the likelihood • either a local maximum, a local minimum, or a saddle point

  32. Complexity of E-step • Compute forward and backward messages • Time complexity $O(nL^2)$ (each of the $nL$ entries sums over $L$ terms), space complexity $O(nL)$ • Accumulate expected counts • Time complexity $O(nL^2)$ • Space complexity $O(L^2)$

  33. EM - problems Local maxima: • Learning can get stuck in local maxima • Sensitive to initialization • Requires some method for escaping such maxima Choosing $L$: • We often do not know how many hidden values we should have or can learn
