CPSC 7373: Artificial Intelligence
Lecture 12: Hidden Markov Models and Filters
Jiang Bian, Fall 2012
University of Arkansas at Little Rock
Hidden Markov Models
• Hidden Markov Models (HMMs) are used to analyze or to predict time series.
• Applications:
  • Robotics
  • Medical
  • Finance
  • Speech and Language Technologies
  • etc.
Bayes Network of HMMs
• HMMs:
  • A sequence of states that evolves over time;
  • Each state only depends on the previous state; and
  • Each state emits a measurement.
• Filters:
  • Kalman Filters
  • Particle Filters
[Bayes network diagram: hidden state chain S1 → S2 → S3 → … → Sn, with each state St emitting a measurement Zt]
Markov Chain
[Two-state transition diagram: P(R|R) = 0.6, P(S|R) = 0.4, P(S|S) = 0.8, P(R|S) = 0.2]
• The initial state: P(R0) = 1, P(S0) = 0
• P(R1) = ???
• P(R2) = ???
• P(R3) = ???
Markov Chain
[Two-state transition diagram: P(R|R) = 0.6, P(S|R) = 0.4, P(S|S) = 0.8, P(R|S) = 0.2]
• The initial state: P(R0) = 1, P(S0) = 0
• P(R1) = P(R1|R0) * P(R0) = 0.6 * 1 = 0.6
• P(R2) = P(R2|R1) * P(R1) + P(R2|S1) * P(S1) = 0.6 * 0.6 + 0.2 * 0.4 = 0.44
• P(R3) = P(R3|R2) * P(R2) + P(R3|S2) * P(S2) = 0.6 * 0.44 + 0.2 * 0.56 = 0.376
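The same forward prediction can be checked with a few lines of code; this is a minimal sketch (not from the slides), assuming the transition probabilities above:

```python
# Two-state Markov chain: P(R|R) = 0.6, P(R|S) = 0.2 (so P(S|R) = 0.4, P(S|S) = 0.8)
p_r = 1.0  # P(R0) = 1
for t in range(1, 4):
    # P(Rt) = P(Rt|Rt-1) * P(Rt-1) + P(Rt|St-1) * P(St-1)
    p_r = 0.6 * p_r + 0.2 * (1.0 - p_r)
    print(f"P(R{t}) = {p_r:.3f}")  # prints 0.600, 0.440, 0.376
```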
Markov Chain
[Two-state transition diagram: P(A|A) = 0.5, P(B|A) = 0.5, P(A|B) = 1]
• P(A0) = 1
• P(A1) = ???
• P(A2) = ???
• P(A3) = ???
Markov Chain
[Two-state transition diagram: P(A|A) = 0.5, P(B|A) = 0.5, P(A|B) = 1]
• P(A0) = 1
• P(A1) = 0.5 * 1 = 0.5
• P(A2) = 0.5 * 0.5 + 1 * 0.5 = 0.75
• P(A3) = 0.5 * 0.75 + 1 * 0.25 = 0.625
Stationary Distribution
[Two-state transition diagram: P(A|A) = 0.5, P(B|A) = 0.5, P(A|B) = 1]
• P(A1000) = ???
• Stationary distribution: P(At) = P(At-1)
• P(At) = P(At|At-1) * P(At-1) + P(At|Bt-1) * P(Bt-1)
• Let P(At) = X:
  • X = 0.5 * X + 1 * (1 - X)
  • X = 2/3 = P(A)
  • P(B) = 1 - P(A) = 1/3
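One way to see why P(A1000) settles at this value is simply to iterate the update until it stops changing; a small sketch (my own, not from the slides):

```python
# P(A|A) = 0.5, P(A|B) = 1.0; iterate P(At) = 0.5 * P(At-1) + 1.0 * P(Bt-1)
p_a = 1.0  # P(A0) = 1
for _ in range(1000):
    p_a = 0.5 * p_a + 1.0 * (1.0 - p_a)
print(p_a)  # ~0.6667 = 2/3, the stationary distribution, regardless of the initial state
```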
Stationary Distribution
[Two-state transition diagram: P(R|R) = 0.6, P(S|R) = 0.4, P(S|S) = 0.8, P(R|S) = 0.2]
• P(R) = ???
Stationary Distribution
[Two-state transition diagram: P(R|R) = 0.6, P(S|R) = 0.4, P(S|S) = 0.8, P(R|S) = 0.2]
• P(R) = 1/3
• Let X = P(Rt) = P(Rt-1):
  • X = 0.6X + 0.2(1 - X)
  • 0.6X = 0.2
  • X = 1/3
Transition Probabilities
[Two-state transition diagram with unknown probabilities to be estimated]
• Finding the transition probabilities from observations.
• e.g., R, S, S, S, R, S, R (7 days)
• Maximum Likelihood:
  • P(R0) = 1
  • P(S|R) = 1, P(R|R) = 0
  • P(S|S) = 0.5, P(R|S) = 0.5
Transition Probabilities - Quiz
[Two-state transition diagram with unknown probabilities to be estimated]
• Finding the transition probabilities from observations.
• e.g., S, S, S, S, S, R, S, S, S, R, R
• Maximum Likelihood:
  • P(R0) = ??; P(S0) = ??
  • P(S|R) = ??, P(R|R) = ??
  • P(S|S) = ??, P(R|S) = ??
Transition Probabilities - Quiz
[Two-state transition diagram with unknown probabilities to be estimated]
• Finding the transition probabilities from observations.
• e.g., S, S, S, S, S, R, S, S, S, R, R
• Maximum Likelihood:
  • P(R0) = 0; P(S0) = 1
  • P(S|R) = 0.5, P(R|R) = 0.5
  • P(S|S) = 0.75, P(R|S) = 0.25
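The maximum-likelihood estimates are just transition counts divided by the number of transitions leaving each state; a sketch of the counting (the variable names are my own):

```python
from collections import Counter

obs = ["S", "S", "S", "S", "S", "R", "S", "S", "S", "R", "R"]

transitions = Counter(zip(obs, obs[1:]))  # counts of (from_state, to_state) pairs
outgoing = Counter(obs[:-1])              # number of transitions leaving each state

for a in "RS":
    for b in "RS":
        print(f"P({b}|{a}) = {transitions[(a, b)] / outgoing[a]:.2f}")
# P(R|R) = 0.50, P(S|R) = 0.50, P(R|S) = 0.25, P(S|S) = 0.75
```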
Laplacian Smoothing
[Two-state transition diagram with unknown probabilities to be estimated]
• e.g., R, S, S, S, S
• Laplacian smoothing; k = 1
• P(R0) = ??; P(S0) = ??
• P(S|R) = ??, P(R|R) = ??
• P(S|S) = ??, P(R|S) = ??
Laplacian Smoothing
[Two-state transition diagram with unknown probabilities to be estimated]
• e.g., R, S, S, S, S
• Laplacian smoothing; k = 1: add k to every count and k times the number of outcomes to every total.
• P(R0) = (1 + 1)/(1 + 2) = 2/3; P(S0) = (0 + 1)/(1 + 2) = 1/3
• P(S|R) = (1 + 1)/(1 + 2) = 2/3, P(R|R) = (0 + 1)/(1 + 2) = 1/3
• P(S|S) = (3 + 1)/(3 + 2) = 4/5, P(R|S) = (0 + 1)/(3 + 2) = 1/5
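With Laplacian smoothing the same counts get k added to every numerator and k times the number of possible outcomes added to every denominator; a sketch under that assumption:

```python
from collections import Counter

obs = ["R", "S", "S", "S", "S"]
k, states = 1, ["R", "S"]

transitions = Counter(zip(obs, obs[1:]))
outgoing = Counter(obs[:-1])

# Smoothed initial-state estimate: one observation (day 0 is R), two possible states
p_r0 = (1 + k) / (1 + k * len(states))
print(f"P(R0) = {p_r0:.3f}")  # 2/3

# Smoothed transition estimates
for a in states:
    for b in states:
        p = (transitions[(a, b)] + k) / (outgoing[a] + k * len(states))
        print(f"P({b}|{a}) = {p:.3f}")
# P(S|R) = 2/3, P(R|R) = 1/3, P(S|S) = 4/5, P(R|S) = 1/5
```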
Hidden Markov Models
[Two-state transition diagram: P(R|R) = 0.6, P(S|R) = 0.4, P(S|S) = 0.8, P(R|S) = 0.2; each state emits an umbrella observation U]
• Suppose that I can't observe the weather directly (e.g., I am grounded with no windows);
• But I can make a guess based on whether my wife carries an umbrella:
  • P(U|R) = 0.9; P(-U|R) = 0.1
  • P(U|S) = 0.2; P(-U|S) = 0.8
• Prior: P(R0) = 1/2, P(S0) = 1/2
• Prediction: P(R1) = P(R1|R0) P(R0) + P(R1|S0) P(S0) = 0.6 * 0.5 + 0.2 * 0.5 = 0.4
• Measurement update: P(R1|U1) = P(U1|R1) * P(R1) / P(U1) = 0.9 * 0.4 / (0.4 * 0.9 + 0.2 * 0.6) = 0.75
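The two steps above (predict with the transition model, then correct with Bayes rule on the umbrella observation) can be written out directly; a minimal sketch:

```python
# Transition model: P(R|R) = 0.6, P(R|S) = 0.2
# Observation model: P(U|R) = 0.9, P(U|S) = 0.2
p_r0 = 0.5  # prior P(R0)

# Prediction: P(R1) = P(R1|R0) * P(R0) + P(R1|S0) * P(S0)
p_r1 = 0.6 * p_r0 + 0.2 * (1.0 - p_r0)   # 0.4

# Measurement update: P(R1|U1) = P(U1|R1) * P(R1) / P(U1)
p_u1 = 0.9 * p_r1 + 0.2 * (1.0 - p_r1)   # 0.48
print(0.9 * p_r1 / p_u1)                 # 0.75
```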
Specification of an HMM
• N: number of states
• Q = {q1, q2, ..., qN}: set of states
• M: number of symbols (observables)
• O = {o1, o2, ..., oM}: set of symbols
• A: the state transition probability matrix
  • aij = P(qt+1 = j | qt = i)
• B: observation probability distribution
  • bj(k) = P(ot = k | qt = j), 1 ≤ k ≤ M
• π: the initial state distribution
Specification of an HMM
• The full HMM is thus specified as a triplet:
  • λ = (A, B, π)
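As a concrete illustration (mapping the rain/sun/umbrella example onto this notation; the encoding of states and symbols as indices is my own choice), λ = (A, B, π) can be stored as two matrices and a vector:

```python
import numpy as np

# States: 0 = R, 1 = S; symbols: 0 = umbrella, 1 = no umbrella
A = np.array([[0.6, 0.4],   # a_ij = P(q_{t+1} = j | q_t = i)
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],   # b_j(k) = P(o_t = k | q_t = j)
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])   # initial state distribution
```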
Central problems in HMM modelling
• Problem 1, Evaluation:
  • Probability of occurrence of a particular observation sequence, O = {o1, ..., ok}, given the model: P(O|λ)
  • Complicated, because the states are hidden
  • Useful in sequence classification
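The evaluation problem is usually solved with the forward algorithm, which sums over the hidden states one step at a time instead of enumerating every state sequence; a sketch using the matrices defined above:

```python
import numpy as np

def forward(A, B, pi, obs):
    """Return P(O|lambda) for a sequence of observation indices."""
    alpha = pi * B[:, obs[0]]          # alpha_1(i) = pi_i * b_i(o_1)
    for o in obs[1:]:
        # alpha_{t+1}(j) = sum_i alpha_t(i) * a_ij * b_j(o_{t+1})
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

# e.g., probability of observing umbrella, umbrella, no umbrella:
# forward(A, B, pi, [0, 0, 1])
```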
Central problems in HMM modelling
• Problem 2, Decoding:
  • Optimal state sequence to produce the given observations, O = {o1, ..., ok}, given the model
  • Depends on the chosen optimality criterion
  • Useful in recognition problems
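The standard choice of optimality criterion is the single most likely state sequence, which the Viterbi algorithm finds by replacing the forward sum with a max and keeping back pointers; a sketch:

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Return the most likely state index sequence for the given observation indices."""
    delta = np.log(pi) + np.log(B[:, obs[0]])   # best log-prob of a path ending in each state
    back = []
    for o in obs[1:]:
        scores = delta[:, None] + np.log(A)     # scores[i, j]: best path into state j via state i
        back.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) + np.log(B[:, o])
    path = [int(delta.argmax())]                # trace back from the best final state
    for ptrs in reversed(back):
        path.append(int(ptrs[path[-1]]))
    return path[::-1]
```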
Central problems in HMM modelling
• Problem 3, Learning:
  • Determine the optimum model, given a training set of observations
  • Find λ such that P(O|λ) is maximal
Particle Filter Algorithm
• Algorithm particle_filter(St-1, ut-1, zt):
  • St = ∅, η = 0
  • For i = 1 ... n: (generate new samples)
    • Sample index j(i) from the discrete distribution given by wt-1
    • Sample xt(i) from p(xt | xt-1, ut-1) using xt-1(j(i)) and ut-1
    • Compute importance weight wt(i) = p(zt | xt(i))
    • Update normalization factor η = η + wt(i)
    • Insert <xt(i), wt(i)> into St
  • For i = 1 ... n:
    • Normalize weights wt(i) = wt(i) / η
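A direct Python translation of this loop might look as follows; it is a sketch only, and the motion_sample and measurement_prob functions are assumed to be supplied by the application (they are not defined on the slide):

```python
import random

def particle_filter(particles, weights, u, z, motion_sample, measurement_prob):
    """One step of the particle filter above (sketch).

    particles, weights -- S_{t-1}: previous particles x_{t-1}(j) and weights w_{t-1}(j)
    u, z               -- control u_{t-1} and measurement z_t
    motion_sample      -- assumed: (x_prev, u) -> sample from p(x_t | x_{t-1}, u_{t-1})
    measurement_prob   -- assumed: (z, x) -> p(z_t | x_t)
    """
    n = len(particles)
    new_particles, new_weights, eta = [], [], 0.0
    # Sample indices j(i) from the discrete distribution given by w_{t-1}
    for j in random.choices(range(n), weights=weights, k=n):
        x = motion_sample(particles[j], u)   # sample x_t(i) from the motion model
        w = measurement_prob(z, x)           # importance weight w_t(i) = p(z_t | x_t(i))
        eta += w                             # update normalization factor
        new_particles.append(x)              # insert <x_t(i), w_t(i)> into S_t
        new_weights.append(w)
    return new_particles, [w / eta for w in new_weights]  # normalize weights
```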
Particle Filters
• Pros:
  • Easy to implement
  • Work well in many applications
• Cons:
  • Don't work well in high-dimensional spaces
  • Problems in degenerate conditions:
    • Very few particles, or
    • Not much noise in either the measurements or the controls