Hidden Markov Model Continues …

Hidden Markov ModelContinues …

Finite State Markov Chain • A discrete time stochastic process, consisting of a domainD of m states {1,…,m} and • An m dimensional initial distribution vector ( p(1),.., p(m)). • Anm×mtransition probabilities matrix M= (ast) • For each integer L, a Markov Chain assigns probability to sequences (x1…xL) over D (i.e, xiD) as follows: Similarly, (X1,…, Xi ,…)is a sequence of probability distributions over D.

Use of Markov Chains: Sequences with CpG Islands In human genomes the pair CG often transforms to (methyl-C) G which often transforms to TG. Hence the pair CG appears less than expected from what is expected from the independent frequencies of C and G alone. Due to biological reasons, this process is sometimes suppressed in short stretches of genomes such as in the start regions of many genes. These areas are called CpG islands (p denotes “pair”).

Modeling sequences with CpG Island The “+” model: Use transition matrix A+ = (a+st), Where: a+st = (the probability that t follows s in a CpG island) The “-” model: Use transition matrix A- = (a-st), Where: a-st = (the probability that t follows s in a non CpG island)

X1 X2 XL-1 XL Stationary means that the transition probability tables do not depend on i. In short: (Stationary) Markov Chains • Every variable xi has a domain. For example, suppose the domain are the letters {a, c, t, g}. • Every variable is associated with a local (transition) probability tablep(Xi = xi | Xi-1= xi-1 ) and p(X1 = x1 ). • The joint distribution is given by

X1 X2 XL-1 XL Question 1: Using two Markov chains For CpG islands: We need to specify pI(xi | xi-1) where I stands for CpG Island. Xi =1 Xi-1 Lines must add up to one; columns need not.

X1 X2 XL-1 XL Question 1: Using two Markov chains For non-CpG islands: We need to specify pN(xi | xi-1) where N stands for Non CpG island. Xi Xi-1 Some entries may or may not change compared to pI(xi | xi-1) .

Question 1: Log Odds-Ratio test Comparing the two options via odds-ratio test yields If logQ > 0, then CpG island is more likely. If logQ < 0, then non-CpG island is more likely.

Question 2: Finding CpG Islands Given a long genomic string with possible CpG Islands, we define a Markov Chain over 8 states, all interconnected (hence it is ergodic): The problem is that we don’t know the sequence of states which are traversed, but just the sequence of letters. A+ C+ G+ T+ A- C- G- T- Therefore we use here Hidden Markov Model

M M M M S1 S2 SL-1 SL T T T T x1 x2 XL-1 xL Hidden Markov Model A Markov chain (s1,…,sL): and for each state s and a symbol x we have p(Xi=x|Si=s) Application in communication: message sent is (s1,…,sm) but we receive (x1,…,xm) . Compute what is the most likely message sent ? Application in speech recognition: word said is (s1,…,sm) but we recorded (x1,…,xm) . Compute what is the most likely word said ?

M M M M S1 S2 SL-1 SL T T T T x1 x2 XL-1 xL Hidden Markov Model Notations: Markov Chain transition probabilities: p(Si+1= t|Si= s) = ast Emission probabilities: p(Xi = b| Si = s) = es(b) For Markov Chains we know: What is p(s,x) = p(s1,…,sL;x1,…,xL) ?

M M M M S1 S2 SL-1 SL T T T T x1 x2 XL-1 xL Hidden Markov Model p(Xi = b| Si = s) = es(b), means that the probability of xidepends only on the probability of si. Formally, this is equivalent to the conditional independence assumption: p(Xi=xi|x1,..,xi-1,xi+1,..,xL,s1,..,si,..,sL) = esi(xi) Thus

S1 S2 SL-1 SL X1 X2 XL-1 XL The query of interest: K = * * ( s ,..., s ) max arg p ( s ,..., s | x , , x ) 1 1 1 L L L (s ,..., ) s 1 L Hidden Markov Model for CpG Islands Domain(Si)={+, -}  {A,C,T,G} (8 values) The states: In this representation P(xi| si) = 0 or 1 depending on whether xi is consistent with si . E.g. xi= G is consistent with si=(+,G) and with si=(-,G) but not with any other state of si.

M M M M S1 S2 SL-1 SL T T T T x1 x2 XL-1 xL Hidden Markov Model • Questions: • Given the “visible” sequence x = (x1,…,xL), find: • A most probable (hidden) path. • The probability of x. • For each i = 1, .., L, and for each state k, p(si=k| x)

M M M M S1 S2 SL-1 SL T T T T x1 x2 XL-1 xL 1. Most Probable state path First Question: Given an output sequence x = (x1,…,xL), Amost probablepaths*= (s*1,…,s*L)is one which maximizes p( s | x ).

s1 s2 si X1 X2 Xi Viterbi’s Algorithm for Most Probable Path The task: compute Let the states be {1, …, m} Idea: for i = 1, …, L and for each state l, compute: vl(i) = the probability p(s1,..,si;x1,..,xi|si=l)of a most probable path up to i, which ends in state l.

Si-1 s1 l ... X1 Xi-1 Xi Viterbi’s algorithm for most probable path vl(i) = the probability p(s1,..,si;x1,..,xi|si=l)of a most probable path up to i, which ends in state l. Exercise:For i = 1,…,L and for each state l:

Result: p(s1*,…,sL*;x1,…,xl) = Viterbi’s algorithm s1 s2 sL-1 sL si 0 X1 X2 XL-1 XL Xi We add the special initial state 0. Initialization: v0(0) = 1 , vk(0) = 0 for k > 0 For i=1 to L do for each state l : vl(i) = el(xi) MAXk {vk(i-1)akl } ptri(l)=arg maxk{vk(i-1)akl} [storing previous state for reconstructing the path] Termination:

M M M M S1 S2 SL-1 SL T T T T x1 x2 XL-1 xL 2. Computing p(x) Given an output sequence x = (x1,…,xL), Compute the probability that this sequence was generated: The summation taken over all state-paths s generating x.

? ? si X1 X2 Xi Forward algorithm for computing p(x) The task: compute Idea: for i=1,…,L and for each state l, compute: fl(i) = p(x1,…,xi;si=l ), the probability of all the paths which emit (x1, .., xi) and end in state si=l. Use the recursive formula:

Result: p(x1,…,xL) = Forward algorithm for computing p(x) s1 s2 sL-1 sL si 0 X1 X2 XL-1 XL Xi Similar to Viterbi’s algorithm: Initialization: f0(0) := 1 , fk(0) := 0 for k>0 For i=1 to L do for each state l : fl(i) = el(xi) ∑kfk(i-1)akl

M M M M S1 S2 SL-1 SL T T T T x1 x2 XL-1 xL 3. The distribution of Si, given x Given an output sequence x = (x1,…,xL), Compute for each i=1,…,l and for each state k the probability that si = k ; i.e., p(si=k| x) . This helps to reply queries like: what is the probability that si is in a CpG island, etc.

s1 s2 sL-1 sL si X1 X2 XL-1 XL Xi Solution in two stages • For each i and each state k, compute p(si=k | x1,…,xL). 2. Do the same computation for every i = 1, .. , L but without repeating the first task L times.

s1 s2 sL-1 sL si X1 X2 XL-1 XL Xi Computing for a single i:

s1 s2 sL-1 sL si X1 X2 XL-1 XL Xi Decomposing the computation P(x1,…,xL,si) = P(x1,…,xi,si) P(xi+1,…,xL| x1,…,xi,si) (by the equality p(A,B) = p(A) p(B|A ). P(x1,…,xi,si)=fsi(i) ≡ F(si), so we are left with the task to compute P(xi+1,…,xL| x1,…,xi,si) ≡B(si)

Decomposing the computation s1 s2 Si+1 sL si X1 X2 Xi+1 XL Xi Exercise: Show from the definitions of Markov Chain and Hidden Markov Chain that: P(xi+1,…,xL| x1,…,xi,si) = P(xi+1,…,xL| si) Denote P(xi+1,…,xL| si) ≡ B(si).

s1 s2 sL-1 sL si X1 X2 XL-1 XL Xi Decomposing the computation Summary: P(x1,…,xL,si) = P(x1,…,xi,si) P(xi+1,…,xL| x1,…,xi , si) =P(x1,…,xi,si) P(xi+1,…,xL | si) ≡ F(si)·B(si) Equality due to independence of ( {xi+1,…,xL}, and {x1,…,xi} | si ) – by the Exercise.

F(si): The Forward algorithm: s1 s2 sL-1 sL si 0 X1 X2 XL-1 XL Xi Initialization: F(0) = 1 For i=1 to L do for each state l : F(si) = esi(xi)·∑si-1F(si-1)asi-1si The algorithm computes F(si) = P(x1,…,xi,si) for i=1,…,L (namely, considering evidence up to time slot i).

Si Si+1 SL-1 SL Xi+1 XL-1 XL {step i: compute B(si) from B(si+1)} P(xi+1,…,xL|si) = P(si+1 | si) P(xi+1 | si+1) P(xi+2,…,xL| si+1) si+1 B(si) B(si+1) B(si): The backward algorithm The task: Compute B(si) = P(xi+1,…,xL|si) for i=L-1,…,1 (namely, considering evidence after time slot i). {first step, step L-1: Compute B(sL-1).} P(xL| sL-1) = sLP(xL,sL|sL-1) = sL P(sL|sL-1) P(xL|sL)

s1 s2 sL-1 sL si X1 X2 XL-1 XL Xi The combined answer 1.To compute the probability that Si=sigiven that {x1,…,xL} run the forward algorithm and compute F(si) = P(x1,…,xi,si), run the backward algorithm to compute B(si) = P(xi+1,…,xL|si), the product F(si)B(si) is the answer (for every possible value si). 2. To compute these probabilities for every si simply run the forward and backward algorithms once, storing F(si) and B(si) for every i (and every value of si). Compute F(si)B(si) for every i.

s1 s2 sL-1 sL si X1 X2 XL-1 XL Xi Time and Space Complexity of the forward/backward algorithms Time complexity is O(m2L) where m is the number of states. It is linear in the length of the chain, provided the number of states is a constant. Space complexity is also O(m2L).

Hidden Markov Model Continues …

Hidden Markov Model Continues …

Presentation Transcript

Markov Random Fields

Lecture 12 – Discrete-Time Markov Chains

Markov Decision Processes: A Survey

Stock Returns Predictability using Markov Regime Switching Model

The IS-LM model

A level Physics Hidden Depths

Autoliv Supplier Escalation Model

Markov Logic

The Hidden Charms of Windows PowerShell 3.0 and 4.0

Planning under Uncertainty with Markov Decision Processes: Lecture II

A Revealing Introduction to Hidden Markov Models

From: “The Hidden Teacher” an essay by Loren Eiseley

Partially Observable Markov Decision Processes

Active Learning for Hidden Markov Models

Hidden Markov Models

The Rendering Process Hidden Line and Hidden Surface Removal (HLHSR)

“Hidden Valleys” and their Novel Signals at Colliders

“ Georeactor ” Detection with Gigaton Antineutrino Detectors

Simulation Algorithms for Lattice QCD

Mining the Biomedical Literature