Loopy Belief Propagation: a summary
What is inference?
• Given:
  • Observed variables Y
  • Hidden variables X
  • Some model of P(X,Y)
• We want to make some analysis of P(X|Y):
  • Estimate the marginal P(S) for S ⊆ X
  • Minimum Mean Squared Error configuration (MMSE)
    • This is just E[X|Y]
  • Maximum A-Posteriori configuration (MAP)
  • N most likely configurations
  • Minimum Variance (MVUE)
Representing Structure in P(X,Y)
• Often, P(X,Y) = ∏_k f_k(X_{C_k}), where X_{C_k} ⊆ X ∪ Y
• Markov Random Field: P(X) = f1(x1,x2,x3) · f2(x3,x4) · f3(x3,x5) / Z
• Bayes Net: P(X) = P(x3|x1,x2) · P(x4|x3) · P(x5|x3)
• Factor Graph: P(X) = f1(x1,x2,x3) · f2(x3,x4) · f3(x3,x5) · f4(x1) · f5(x2) / Z
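As a concrete data-structure sketch (not from the slides; the factor names follow the Markov-random-field example above, and the tables are random placeholders), the factorization can be stored as a scope per factor plus one numeric table per factor:

```python
import numpy as np

# Scopes matching P(X) = f1(x1,x2,x3) · f2(x3,x4) · f3(x3,x5) / Z,
# with every variable binary (M = 2 states) for simplicity.
factors = {
    "f1": ["x1", "x2", "x3"],
    "f2": ["x3", "x4"],
    "f3": ["x3", "x5"],
}

# One nonnegative table per factor, with one axis per variable in its scope.
rng = np.random.default_rng(0)
tables = {name: rng.random([2] * len(scope)) for name, scope in factors.items()}
```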
Sum-Product Algorithm (aka belief update)
Quickly computes every single-variable marginal P(xn) from a tree graph.
Suppose the factor graph is a tree. For the example tree, we have:
P(X) = f1(x1,x2) f2(x2,x3,x4) f3(x3,x5) f4(x4,x6)
Then marginalization (for example, computing P(x1)) can be sped up by exploiting the factorization:
P(x1) = ∑_{x2,x3,x4,x5,x6} f1(x1,x2) f2(x2,x3,x4) f3(x3,x5) f4(x4,x6)
      = ∑_{x2,x3,x4} f1(x1,x2) f2(x2,x3,x4) (∑_{x5} f3(x3,x5)) (∑_{x6} f4(x4,x6))
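A quick numeric check of that identity (a sketch, not from the slides: the factor tables are random, and the axis order matches the arguments of f1..f4 above):

```python
import numpy as np

rng = np.random.default_rng(1)
f1 = rng.random((2, 2))        # f1(x1, x2)
f2 = rng.random((2, 2, 2))     # f2(x2, x3, x4)
f3 = rng.random((2, 2))        # f3(x3, x5)
f4 = rng.random((2, 2))        # f4(x4, x6)

# Brute force: build the full joint and sum over x2..x6 (cost grows as M^6).
joint = np.einsum("ab,bcd,ce,df->abcdef", f1, f2, f3, f4)
p_brute = joint.sum(axis=(1, 2, 3, 4, 5))

# Factored: do the small sums first (cost grows only as M^3 here).
g3 = f3.sum(axis=1)                                   # sum over x5 -> g3(x3)
g4 = f4.sum(axis=1)                                   # sum over x6 -> g4(x4)
p_fact = np.einsum("ab,bcd,c,d->a", f1, f2, g3, g4)   # sum over x2, x3, x4

assert np.allclose(p_brute, p_fact)   # same unnormalized marginal of x1
```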
Message Passing for Sum-Product
We can compute every marginal P(xn) quickly using a system of message passing:
• Message from variable node n to factor node m: v_{n,m}(xn) = ∏_{i ∈ N(n)\m} m_{i,n}(xn)
• Message from factor node m to variable node n: m_{m,n}(xn) = ∑_{x_{N(m)\n}} [ f_m(x_{N(m)}) ∏_{i ∈ N(m)\n} v_{i,m}(xi) ]
• Marginal: P(xn) ∝ ∏_{m ∈ N(n)} m_{m,n}(xn)
Each node n can pass a message to neighbor m only once it has received a message from all of its other adjacent nodes. Intuitively, each message from n to m represents P(xm|Sn), where Sn is the set of all children of node n (the subtree on n's side of the edge).
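A minimal recursive implementation of these two rules for tree factor graphs (a sketch under assumptions: the function name and factor layout are my own, all variables share one state count, and messages are recomputed rather than cached):

```python
import numpy as np

def sum_product_marginal(n_vars, n_states, factors, query):
    """Normalized marginal of variable `query` on a tree factor graph.
    `factors` is a list of (scope, table): scope lists variable indices,
    table has one axis of size n_states per variable in the scope."""
    var_nbrs = [[] for _ in range(n_vars)]      # factors touching each variable
    for f, (scope, _) in enumerate(factors):
        for v in scope:
            var_nbrs[v].append(f)

    def msg_var_to_fac(v, f):
        # Product of messages from every factor adjacent to v, except f.
        m = np.ones(n_states)
        for g in var_nbrs[v]:
            if g != f:
                m = m * msg_fac_to_var(g, v)
        return m

    def msg_fac_to_var(f, v):
        # Multiply incoming variable messages into the factor table,
        # then sum out every variable except v.
        scope, table = factors[f]
        out = table.astype(float)
        for axis, u in enumerate(scope):
            if u != v:
                shape = [n_states if i == axis else 1 for i in range(len(scope))]
                out = out * msg_var_to_fac(u, f).reshape(shape)
        keep = scope.index(v)
        return out.sum(axis=tuple(i for i in range(len(scope)) if i != keep))

    belief = np.ones(n_states)
    for f in var_nbrs[query]:
        belief = belief * msg_fac_to_var(f, query)
    return belief / belief.sum()

# Example: the tree from the slide, with x1..x6 renamed 0..5.
rng = np.random.default_rng(1)
factors = [
    ([0, 1], rng.random((2, 2))),        # f1(x1, x2)
    ([1, 2, 3], rng.random((2, 2, 2))),  # f2(x2, x3, x4)
    ([2, 4], rng.random((2, 2))),        # f3(x3, x5)
    ([3, 5], rng.random((2, 2))),        # f4(x4, x6)
]
print(sum_product_marginal(6, 2, factors, query=0))   # P(x1)
```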
Max-Product Algorithm (aka belief revision)
Quickly computes the Maximum A-Posteriori configuration of a tree graph.
Instead of summing P(X), we take the maximum to get the "maximal" (instead of the marginal):
M(x1) = max_{x2,x3,x4,x5,x6} f1(x1,x2) f2(x2,x3,x4) f3(x3,x5) f4(x4,x6)
      = max_{x2,x3,x4} f1(x1,x2) f2(x2,x3,x4) (max_{x5} f3(x3,x5)) (max_{x6} f4(x4,x6))
Use the same message-passing system to compute the maximal of each variable.
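The same numeric check works with max in place of sum, because the factor tables are nonnegative (again a sketch with random tables; in the message-passing sketch above, the only change would be replacing the summation with a max):

```python
import numpy as np

rng = np.random.default_rng(1)
f1 = rng.random((2, 2))       # f1(x1, x2)
f2 = rng.random((2, 2, 2))    # f2(x2, x3, x4)
f3 = rng.random((2, 2))       # f3(x3, x5)
f4 = rng.random((2, 2))       # f4(x4, x6)

# Brute force: maximize the full joint over x2..x6.
joint = np.einsum("ab,bcd,ce,df->abcdef", f1, f2, f3, f4)
M_brute = joint.max(axis=(1, 2, 3, 4, 5))

# Factored: push the max inside, exactly as with the sums.
h3 = f3.max(axis=1)           # max over x5 -> h3(x3)
h4 = f4.max(axis=1)           # max over x6 -> h4(x4)
prod = (f1[:, :, None, None] * f2[None, :, :, :]
        * h3[None, None, :, None] * h4[None, None, None, :])
M_fact = prod.max(axis=(1, 2, 3))

assert np.allclose(M_brute, M_fact)   # same max-marginal M(x1)
```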
Computational Cost of Max-Product and Sum-Product
• Each message is a vector of size M, where M is the number of states of the random variable.
  • usually pretty small
• Each variable → factor node message requires (N-2)·M multiplies, where N is the number of neighbors of the variable node.
  • that's tiny
• Each factor → variable node message requires summing out N-1 variables, each with M states; the total computation per message is O(N · M^N).
  • not bad, as long as there aren't any hub-like nodes
What if the graph is not a tree?
Several alternative methods:
• Gibbs sampling
• Expectation Maximization
• Variational methods
• Elimination Algorithm
• Junction-Tree algorithm
• Loopy Belief Propagation
Loopy Belief Propagation
• Just apply the BP rules in spite of the loops
• In each iteration, each node sends all of its messages in parallel (see the sketch below)
• Seems to work for some applications, e.g., decoding turbo codes
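A sketch of this parallel ("flooding") schedule (my own code, not from the slides: messages are normalized for numerical stability, there is no convergence test, and `damping` mixes each new message with the old one, a common trick when plain LBP oscillates):

```python
import numpy as np

def loopy_bp(n_vars, n_states, factors, n_iters=50, damping=0.0):
    """Approximate beliefs via loopy BP; `factors` uses the same
    (scope, table) layout as the tree sum-product sketch above."""
    var_nbrs = [[] for _ in range(n_vars)]
    for f, (scope, _) in enumerate(factors):
        for v in scope:
            var_nbrs[v].append(f)
    # All factor -> variable messages start uniform.
    fac_to_var = {(f, v): np.ones(n_states) / n_states
                  for f, (scope, _) in enumerate(factors) for v in scope}

    for _ in range(n_iters):
        # Variable -> factor messages from the previous iteration's messages.
        var_to_fac = {}
        for v in range(n_vars):
            for f in var_nbrs[v]:
                m = np.ones(n_states)
                for g in var_nbrs[v]:
                    if g != f:
                        m = m * fac_to_var[(g, v)]
                var_to_fac[(v, f)] = m / m.sum()

        # Factor -> variable messages, all updated in parallel.
        new = {}
        for f, (scope, table) in enumerate(factors):
            for axis, v in enumerate(scope):
                out = table.astype(float)
                for a, u in enumerate(scope):
                    if u != v:
                        shape = [n_states if i == a else 1
                                 for i in range(len(scope))]
                        out = out * var_to_fac[(u, f)].reshape(shape)
                m = out.sum(axis=tuple(i for i in range(len(scope)) if i != axis))
                new[(f, v)] = m / m.sum()
        # Damped update: damping = 0 is the plain parallel schedule.
        fac_to_var = {k: (1 - damping) * new[k] + damping * fac_to_var[k]
                      for k in new}

    # Beliefs: normalized product of incoming factor messages.
    beliefs = []
    for v in range(n_vars):
        b = np.ones(n_states)
        for f in var_nbrs[v]:
            b = b * fac_to_var[(f, v)]
        beliefs.append(b / b.sum())
    return beliefs
```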
Trouble with LBP
• May not converge
  • A variety of tricks can help (e.g., damping the message updates, as in the sketch above)
• Cycling error: old information is mistaken for new
• Convergence error: unlike in a tree, a node's neighbors need not be independent, but LBP treats them as if they were
Bolt & van der Gaag, "On the convergence error in loopy propagation" (2004)
Good news about MAP in LBP
• For a single loop, the MAP values are correct
  • although the "maximals" are not
• If LBP converges, the resulting MAP configuration has higher probability than any other configuration in its "Single Loops and Trees" (SLT) neighborhood
(Figure: example SLT neighborhoods on a grid)
Weiss & Freeman, "On the optimality of solutions of the max-product belief propagation algorithm in arbitrary graphs" (2001)
MMSE in LBP
• If P(X) is jointly Gaussian and LBP converges, it converges to the correct posterior means (though the variances need not be correct)
• For pairwise-connected Markov random fields, if LBP converges, its marginals are stationary points of the Bethe free energy
Weiss & Freeman, "Correctness of Belief Propagation in Gaussian Graphical Models of Arbitrary Topology" (2001)
Yedidia, Freeman & Weiss, "Bethe free energy, Kikuchi approximations, and belief propagation algorithms" (2001)
Free Energy
Suppose we were able to compute the marginals of a probability distribution b(X) that closely approximates P(X|Y). We would want b(X) to resemble P(X|Y) as much as possible. The total free energy F of b(X) is the Kullback-Leibler divergence between b(X) and P(X|Y):
F(b) = ∑_X b(X) ln [ b(X) / P(X|Y) ]
However, F is difficult to compute. Also, the b(X) we are working with is often ill-defined: its approximate beliefs need not be the marginals of any single joint distribution.
Kikuchi Free Energy
We can approximate the total free energy using the Kikuchi free energy:
• Select a set of clusters of nodes of the factor graph.
  • All nodes must be in at least one cluster.
  • For each factor node in a cluster, all adjacent variable nodes must also be included.
• For each cluster of variables S_i, compute its free energy F[b(S_i)], the KL divergence between b(S_i) and the marginal P(S_i|Y), and sum these together.
• Now we have double-counted the intersections between the sets S_i. Subtract the free energy of the intersections; repeat for intersections of intersections.
Bethe free energy is Kikuchi free energy starting with all clusters of size 2 (written out below).
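For reference, a standard written-out form of the Bethe free energy (this formula is not on the slides; notation follows Yedidia, Freeman & Weiss, with b_a the cluster beliefs, b_i the single-variable beliefs, and d_i the number of factors adjacent to variable i):

```latex
% Bethe free energy: cluster terms minus a correction for the
% double-counted single-variable intersections.
F_{\mathrm{Bethe}}
  = \sum_a \sum_{x_a} b_a(x_a) \ln \frac{b_a(x_a)}{f_a(x_a)}
  - \sum_i (d_i - 1) \sum_{x_i} b_i(x_i) \ln b_i(x_i)
```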
More Advanced Algorithms: Greater Accuracy, at a Price
• Generalized Belief Propagation algorithms have been developed to minimize Kikuchi free energy (Yedidia, Freeman & Weiss, 2004)
  • The junction-tree algorithm is a special case
• Alan Yuille (2000) has devised a message-passing algorithm that minimizes Bethe free energy and is guaranteed to converge
• Other groups are working on fast & robust Bethe minimization (Pretti & Pelizzola, 2003)