Belief Propagation and its Generalizations Shane Oldenburger
Outline • The BP algorithm • MRFs – Markov Random Fields • Gibbs free energy • Bethe approximation • Kikuchi approximation • Generalized BP
Recall from the Jointree Algorithm • We separate evidence e into: • $e^+$: evidence pertaining to ancestors • $e^-$: evidence pertaining to descendants • $BEL(X) = P(X \mid e) = P(X \mid e^+, e^-) = P(e^- \mid X, e^+)\,P(X \mid e^+) / P(e^- \mid e^+) = \alpha\,P(e^- \mid X)\,P(X \mid e^+) = \alpha\,\pi(X)\,\lambda(X)$ • $\pi$: messages from parents • $\lambda$: messages from children • $\alpha$: normalization constant
Pearl’s Belief Propagation Algorithm: Initialization • Nodes with evidence • $\lambda(x_i) = 1$ where $x_i = e_i$; 0 otherwise • $\pi(x_i) = 1$ where $x_i = e_i$; 0 otherwise • Nodes with no parents • $\pi(x_i) = p(x_i)$ (prior probabilities) • Nodes with no children • $\lambda(x_i) = 1$
Pearl’s BP algorithm: Iterate • For each X: • If all $\pi$ messages from parents of X have arrived, combine them into $\pi(X)$ • If all $\lambda$ messages from children of X have arrived, combine them into $\lambda(X)$ • If $\pi(X)$ has been computed and all $\lambda$ messages other than the one from $Y_i$ have arrived, calculate and send message $\pi_{X Y_i}$ to child $Y_i$ • If $\lambda(X)$ has been computed and all $\pi$ messages other than the one from $U_i$ have arrived, calculate and send message $\lambda_{X U_i}$ to parent $U_i$ • Compute $BEL(X) = \alpha\,\pi(X)\,\lambda(X)$
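As a rough illustration of these steps, the sketch below runs one pass of messages on a three-node chain A -> B -> C with evidence on C and compares BEL(B) with a brute-force posterior. The network, its probability tables, and all variable names are hypothetical and chosen only for this example.

```python
# Chain Bayesian network A -> B -> C with binary variables (made-up numbers).
p_a = [0.6, 0.4]                          # P(A)
p_b_given_a = [[0.7, 0.3], [0.2, 0.8]]    # P(B | A), rows indexed by A
p_c_given_b = [[0.9, 0.1], [0.4, 0.6]]    # P(C | B), rows indexed by B
evidence_c = 1                            # observed value of C


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


# A has no parents, so pi(A) is its prior; combine the parent message into pi(B).
pi_b = [sum(p_b_given_a[a][b] * p_a[a] for a in range(2)) for b in range(2)]

# C carries evidence: lambda(c) is 1 where c matches the evidence, 0 otherwise.
lambda_c = [1.0 if c == evidence_c else 0.0 for c in range(2)]

# Message from child C to parent B, combined into lambda(B).
lambda_b = [sum(p_c_given_b[b][c] * lambda_c[c] for c in range(2)) for b in range(2)]

# BEL(B) = alpha * pi(B) * lambda(B); normalize() plays the role of alpha.
bel_b = normalize([pi_b[b] * lambda_b[b] for b in range(2)])

# Brute-force check: P(B | C = evidence_c) obtained by summing the joint over A.
brute_b = normalize([
    sum(p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][evidence_c] for a in range(2))
    for b in range(2)
])
```

Because the chain is a polytree, `bel_b` and `brute_b` agree up to floating-point error.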
BP properties • Exact for Polytrees • Only one path between any two nodes • Each node X separates graph into two disjoint graphs (e+, e-) • But most graphs of interest are not Polytrees – what do we do? • Exact inference • Cutset conditioning • Jointree method • Approximate inference • Loopy BP
Loopy BP • In the simple tree example, a finite number of messages were passed • In a graph with loops, messages may be passed around indefinitely • Stop when beliefs converge • Stop after some number of iterations • Loopy BP tends to achieve good empirical results • Low-level computer vision problems • Error-correcting codes: turbo codes, Gallager codes
Markov Random Fields • BP algorithms have been developed for many graphical models • Pairwise Markov Random Fields are used in this paper for ease of presentation • An MRF consists of “observable” nodes and “hidden” nodes • Since it is pairwise, each observable node is connected to exactly one hidden node, and each hidden node is connected to at most one observable node
Markov Random Fields • Two hidden variables $x_i$ and $x_j$ are connected by a “compatibility function” $\psi_{ij}(x_i, x_j)$ • Hidden variable $x_i$ is connected to observable variable $y_i$ by an “evidence function” $\phi_i(x_i, y_i)$, written $\phi_i(x_i)$ once $y_i$ is observed • The joint probability for a pairwise MRF is $p(\{x\}) = \frac{1}{Z} \prod_{ij} \psi_{ij}(x_i, x_j) \prod_i \phi_i(x_i)$ • The BP algorithm for pairwise MRFs is similar to that for Bayesian Networks
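A minimal sketch of sum-product BP on a pairwise MRF follows, assuming the standard update in which the message from i to j sums $\phi_i \psi_{ij}$ times all other incoming messages over $x_i$. The function names `run_bp` and `brute_force_marginals`, the parallel update schedule, and the example potentials are assumptions made for this illustration, not taken from the slides. It uses both stopping rules mentioned for loopy graphs: a convergence tolerance and an iteration cap.

```python
import itertools
import math


def run_bp(phi, psi, max_iters=100, tol=1e-10):
    """Sum-product BP. phi: node -> list phi_i(x_i); psi: (i, j) -> psi_ij[x_i][x_j]."""
    neighbors = {i: [] for i in phi}
    for (i, j) in psi:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def compat(i, j, xi, xj):
        # Each undirected edge is stored once; look it up in either direction.
        return psi[(i, j)][xi][xj] if (i, j) in psi else psi[(j, i)][xj][xi]

    # m[(i, j)][x_j] is the message from i to j; start from uniform messages.
    m = {}
    for (i, j) in psi:
        m[(i, j)] = [1.0 / len(phi[j])] * len(phi[j])
        m[(j, i)] = [1.0 / len(phi[i])] * len(phi[i])

    for _ in range(max_iters):                 # stop after some number of iterations
        new_m = {}
        for (i, j) in m:
            out = []
            for xj in range(len(phi[j])):
                total = 0.0
                for xi in range(len(phi[i])):
                    term = phi[i][xi] * compat(i, j, xi, xj)
                    for k in neighbors[i]:
                        if k != j:
                            term *= m[(k, i)][xi]
                    total += term
                out.append(total)
            norm = sum(out)
            new_m[(i, j)] = [v / norm for v in out]
        change = max(abs(a - b) for e in m for a, b in zip(m[e], new_m[e]))
        m = new_m
        if change < tol:                       # stop when messages stop changing
            break

    beliefs = {}
    for i in phi:
        b = [phi[i][xi] * math.prod(m[(k, i)][xi] for k in neighbors[i])
             for xi in range(len(phi[i]))]
        norm = sum(b)
        beliefs[i] = [v / norm for v in b]
    return beliefs


def brute_force_marginals(phi, psi):
    """Exact one-node marginals by enumerating every joint state."""
    nodes = sorted(phi)
    marg = {i: [0.0] * len(phi[i]) for i in nodes}
    for assignment in itertools.product(*(range(len(phi[i])) for i in nodes)):
        x = dict(zip(nodes, assignment))
        w = math.prod(phi[i][x[i]] for i in nodes)
        w *= math.prod(psi[(i, j)][x[i]][x[j]] for (i, j) in psi)
        for i in nodes:
            marg[i][x[i]] += w
    return {i: [v / sum(marg[i]) for v in marg[i]] for i in nodes}


phi = {1: [0.7, 0.3], 2: [0.4, 0.6], 3: [0.5, 0.5]}
chain = {(1, 2): [[1.0, 0.5], [0.5, 1.0]], (2, 3): [[1.0, 0.2], [0.2, 1.0]]}
loop = dict(chain)
loop[(1, 3)] = [[1.0, 0.5], [0.5, 1.0]]        # extra edge closes a loop

tree_bp, tree_exact = run_bp(phi, chain), brute_force_marginals(phi, chain)
loop_bp, loop_exact = run_bp(phi, loop), brute_force_marginals(phi, loop)
```

On the chain the BP beliefs match the exact marginals; on the triangle they are only an approximation, so `loop_bp` and `loop_exact` can be compared to see how large the error is for these particular potentials.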
Conversion between graphical models • We can limit ourselves to considering pairwise MRFs • Any pairwise MRF or BN can be converted to an equivalent “Factor graph” • Any factor graph can be converted into an equivalent pairwise MRF or BN
An intermediary model • A factor graph is composed of • “variable” nodes represented by circles • “function” nodes represented by squares • Factor graphs are a generalization of Tanner graphs, where the “function” nodes are parity checks of their connected variables • A function node for a factor graph can be any arbitrary function of the variables connected to it
Gibbs Free Energy • Gibbs free energy is the difference in the energy of a system from an initial state to a final state of some process (e.g. chemical reaction) • For a chemical reaction, if the Gibbs free energy is negative then the reaction is “spontaneous”, or “allowed” • If the Gibbs free energy is non-negative, the reaction is “not allowed”
Gibbs free energy • Instead of the difference in energy of a chemical process, we want to define Gibbs free energy in terms of the difference between a target probability distribution p and an approximate probability distribution b • Define the “distance” between p({x}) and b({x}) as • $D(b(\{x\}) \,\|\, p(\{x\})) = \sum_{\{x\}} b(\{x\}) \ln[b(\{x\}) / p(\{x\})]$ • This is known as the Kullback-Leibler distance • Boltzmann’s law: $p(\{x\}) = \frac{1}{Z} e^{-E(\{x\})/T}$ • Generally assumed by statistical physicists • Here we will use Boltzmann’s law as our definition of “energy” E • T acts as a unit scale parameter; let T = 1 • Substituting Boltzmann’s law into our distance measure • $D(b(\{x\}) \,\|\, p(\{x\})) = \sum_{\{x\}} b(\{x\}) E(\{x\}) + \sum_{\{x\}} b(\{x\}) \ln b(\{x\}) + \ln Z$
Gibbs free energy • Our distance measure • $D(b(\{x\}) \,\|\, p(\{x\})) = \sum_{\{x\}} b(\{x\}) E(\{x\}) + \sum_{\{x\}} b(\{x\}) \ln b(\{x\}) + \ln Z$ • We see that D will be zero (b = p) when • $G(b(\{x\})) = \sum_{\{x\}} b(\{x\}) E(\{x\}) + \sum_{\{x\}} b(\{x\}) \ln b(\{x\}) = U(b(\{x\})) - S(b(\{x\}))$ is minimized, where it takes the value $F = -\ln Z$ • G: “Gibbs free energy” • F: “Helmholtz free energy” • U: “average energy” • S: “entropy”
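The identity D(b || p) = G(b) - F can be checked numerically. The sketch below does so for a toy system with four joint states; the energies and the trial distribution are made-up values, and the helper names are assumptions for this example.

```python
import math

# Hypothetical energies E({x}) for a system with four joint states, with T = 1.
energy = [0.5, 1.0, 2.0, 0.1]
Z = sum(math.exp(-e) for e in energy)
p = [math.exp(-e) / Z for e in energy]            # Boltzmann's law
F = -math.log(Z)                                  # Helmholtz free energy


def gibbs_free_energy(b):
    U = sum(bx * ex for bx, ex in zip(b, energy))           # average energy
    S = -sum(bx * math.log(bx) for bx in b if bx > 0)       # entropy
    return U - S


def kl(b, target):
    return sum(bx * math.log(bx / tx) for bx, tx in zip(b, target) if bx > 0)


b = [0.25, 0.25, 0.25, 0.25]                      # an arbitrary trial distribution

gap_trial = gibbs_free_energy(b) - F              # should equal D(b || p), which is positive
gap_exact = gibbs_free_energy(p) - F              # should be zero, since b = p
```

Any trial distribution other than p gives a Gibbs free energy strictly above F, which is what makes G usable as a variational objective.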
Bethe approximation • We would like to derive Gibbs free energy in terms of one- and two-node beliefs $b_i$ and $b_{ij}$ • Due to the pairwise nature of pairwise MRFs, $b_i$ and $b_{ij}$ are sufficient to compute the average energy U • $U = -\sum_{ij} \sum_{x_i, x_j} b_{ij}(x_i, x_j) \ln \psi_{ij}(x_i, x_j) - \sum_i \sum_{x_i} b_i(x_i) \ln \phi_i(x_i)$ • The exact marginal probabilities $p_i$ and $p_{ij}$ yield the same form, so this average energy is exact if the one- and two-node beliefs are exact
Bethe approximation • The entropy term is more problematic • Usually we must settle for an approximation • Entropy can be computed exactly if the joint belief can be explicitly expressed in terms of one- and two-node beliefs • $b(\{x\}) = \prod_{ij} b_{ij}(x_i, x_j) / \prod_i b_i(x_i)^{q_i - 1}$, where $q_i$ = number of neighbors of $x_i$ • Then the Bethe approximation to entropy is • $S_{Bethe} = -\sum_{ij} \sum_{x_i, x_j} b_{ij}(x_i, x_j) \ln b_{ij}(x_i, x_j) + \sum_i (q_i - 1) \sum_{x_i} b_i(x_i) \ln b_i(x_i)$ • For singly connected networks, this is exact and the minimum of $G_{Bethe} = U - S_{Bethe}$ corresponds to the exact marginal probabilities p • For graphs with loops, this is only an approximation (but usually a good one)
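To see the "exact on singly connected networks" claim concretely, the sketch below evaluates $S_{Bethe}$ at the exact one- and two-node marginals of a small binary pairwise MRF and compares it with the true joint entropy. The potentials, the 0-based node labels, and the helper names are hypothetical choices for this example.

```python
import itertools
import math


def joint_distribution(phi, psi, n):
    """Normalized joint of a pairwise MRF over n binary variables (nodes 0..n-1)."""
    weights = {}
    for x in itertools.product(range(2), repeat=n):
        w = math.prod(phi[i][x[i]] for i in range(n))
        w *= math.prod(table[x[i]][x[j]] for (i, j), table in psi.items())
        weights[x] = w
    Z = sum(weights.values())
    return {x: w / Z for x, w in weights.items()}


def entropy(dist):
    return -sum(v * math.log(v) for v in dist.values() if v > 0)


def marginal(joint, keep):
    """Marginal of `joint` over the variable indices listed in `keep`."""
    out = {}
    for x, v in joint.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, 0.0) + v
    return out


def bethe_entropy(joint, edges, n):
    """S_Bethe = sum of pair entropies minus (q_i - 1) times each node entropy."""
    degree = [sum(1 for e in edges if i in e) for i in range(n)]
    s = sum(entropy(marginal(joint, e)) for e in edges)
    s -= sum((degree[i] - 1) * entropy(marginal(joint, (i,))) for i in range(n))
    return s


phi = [[0.7, 0.3], [0.4, 0.6], [0.5, 0.5]]
coupling = [[1.0, 0.3], [0.3, 1.0]]
chain = {(0, 1): coupling, (1, 2): coupling}                       # no loops
triangle = {(0, 1): coupling, (1, 2): coupling, (0, 2): coupling}  # one loop

p_chain = joint_distribution(phi, chain, 3)
p_tri = joint_distribution(phi, triangle, 3)
chain_gap = bethe_entropy(p_chain, list(chain), 3) - entropy(p_chain)
tri_gap = bethe_entropy(p_tri, list(triangle), 3) - entropy(p_tri)
```

`chain_gap` vanishes up to rounding, while `tri_gap` measures how far the Bethe entropy is from the true entropy once a loop is present.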
Equivalence of BP and Bethe • The Bethe approximation is exact for pairwise MRFs when the graph contains no loops, so the Bethe free energy is minimal for the correct marginals • BP gives correct marginals when the graph contains no loops • Thus, when there are no loops, the BP beliefs are the global minima of the Bethe free energy • We can say more: a set of beliefs gives a BP fixed point in any graph iff it is a stationary point of the Bethe free energy • This can be shown by adding Lagrange multipliers to $G_{Bethe}$ to enforce the marginalization constraints
Kikuchi approximation • Kikuchi approximation is an improvement on and generalization of Bethe • With this association between BP and the Bethe approximation to Gibbs free energy, can we use better approximation methods to craft better BP algorithms?
Cluster variational method • Free energy is approximated as a sum of local free energies of sets of regions of nodes • The “cluster variational method” provides a way to select the set of regions • Begin with a basic set of clusters including every interaction and node • Subtract the free energies of over-counted intersection regions • Add back over-counted intersections of intersections, etc. • Bethe is a Kikuchi approximation where the basic clusters are the set of all linked pairs of hidden nodes
Cluster variational method • Bethe regions involve one or two nodes • Define the local free energy of a single node: $G_i(b_i(x_i)) = \sum_{x_i} b_i(x_i) [\ln b_i(x_i) + E_i(x_i)]$ • Define the local free energy involving two nodes: $G_{ij}(b_{ij}(x_i, x_j)) = \sum_{x_i, x_j} b_{ij}(x_i, x_j) [\ln b_{ij}(x_i, x_j) + E_{ij}(x_i, x_j)]$ • Then for the regions corresponding to Bethe, $G_{Bethe} = G_{12} + G_{23} + G_{45} + G_{56} + G_{14} + G_{25} + G_{36} - G_1 - G_3 - G_4 - G_6 - 2G_2 - 2G_5$
Cluster variational method • For the Kikuchi example shown below, regions involve four nodes • Extend the same logic as before • Define the local free energy involving four nodes, e.g. $G_{1245}(b_{1245}(x_1, x_2, x_4, x_5)) = \sum_{x_1, x_2, x_4, x_5} b_{1245}(x_1, x_2, x_4, x_5) [\ln b_{1245}(x_1, x_2, x_4, x_5) + E_{1245}(x_1, x_2, x_4, x_5)]$ • Then for the Kikuchi regions shown, $G_{Kikuchi} = G_{1245} + G_{2356} - G_{25}$
A more general example • Now we have basic regions [1245], [2356], [4578], [5689] • Intersection regions [25], [45], [56], [58], and • Intersection of intersection region [5] • Then we have GKikuchi = G1245 + G2356 + G4578 + G5689 - G25 - G45 - G56 - G58 + G5
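The subtract-then-add-back bookkeeping can be sketched as a small routine that closes the basic clusters under intersection and gives each region a counting number: 1 minus the sum of the counting numbers of all regions that strictly contain it. The function name `kikuchi_regions` and this particular formulation are assumptions made for illustration; the resulting coefficients are the signs in the free energy sum.

```python
from itertools import combinations


def kikuchi_regions(basic):
    """Return {region: counting number} for basic clusters and all their intersections."""
    regions = {frozenset(r) for r in basic}
    while True:
        # Collect non-empty pairwise intersections that are not regions yet.
        new = set()
        for a, b in combinations(regions, 2):
            inter = a & b
            if inter and inter not in regions:
                new.add(inter)
        if not new:
            break
        regions |= new

    counting = {}
    # Larger regions first, so every strict super-region is numbered before its subregions.
    for r in sorted(regions, key=len, reverse=True):
        counting[r] = 1 - sum(c for s, c in counting.items() if r < s)
    return counting


# The 3x3 example: four overlapping 2x2 basic clusters.
grid = kikuchi_regions([[1, 2, 4, 5], [2, 3, 5, 6], [4, 5, 7, 8], [5, 6, 8, 9]])

# Bethe on the 2x3 grid: the basic clusters are the linked pairs.
bethe = kikuchi_regions([[1, 2], [2, 3], [4, 5], [5, 6], [1, 4], [2, 5], [3, 6]])

for region, c in sorted(grid.items(), key=lambda item: (-len(item[0]), sorted(item[0]))):
    print(sorted(region), c)
```

For the 3x3 example this gives +1 for each basic cluster, -1 for each of [25], [45], [56], [58], and +1 for [5]. For the linked pairs it gives -1 for nodes 1, 3, 4, 6 and -2 for nodes 2 and 5, matching the Bethe sum.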
Generalized BP • We show how to construct a GBP algorithm for this example • First find the intersections, intersections of intersections, etc. of the basic clusters • Basic: [1245], [2356], [4578], [5689] • Intersections: [25], [45], [56], [58] • Intersection of intersections: [5]
Region Graph • Next, organize regions into the region graph • A hierarchy of regions and their “direct” subregions • “Direct” subregions are subregions not contained in another subregion • e.g. [5] is a subregion of [1245] but is also contained in the subregion [25], so [5] is not a direct subregion of [1245]
Messages • Construct messages from all regions r to direct subregions s • These correspond to the edges of the region graph • Consider the message from region [1245] to subregion [25] • It is a message from the nodes not in the subregion (1, 4) to those in the subregion (2, 5): $m_{14 \to 25}$
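One way to enumerate the region-graph edges, and therefore the messages, is sketched below for the nine regions of the 3x3 example. The helper name `direct_subregions` is hypothetical.

```python
# Regions of the 3x3 example: basic clusters, intersections, intersection of intersections.
regions = [frozenset(r) for r in (
    [1, 2, 4, 5], [2, 3, 5, 6], [4, 5, 7, 8], [5, 6, 8, 9],
    [2, 5], [4, 5], [5, 6], [5, 8],
    [5],
)]


def direct_subregions(region, all_regions):
    """Subregions of `region` that are not contained in another of its subregions."""
    subs = [s for s in all_regions if s < region]
    return [s for s in subs if not any(s < t for t in subs)]


# One edge (and one message) per region / direct-subregion pair.
edges = [(r, s) for r in regions for s in direct_subregions(r, regions)]

for r, s in edges:
    senders = sorted(r - s)       # nodes not in the subregion
    receivers = sorted(s)         # nodes in the subregion
    print(f"m_{senders} -> {receivers}")
```

Each basic cluster sends to two pair regions and each pair region sends to [5], for 12 messages in total; [1245] has no edge to [5] because [5] is already contained in [25] and [45].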
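Belief Equations • Construct belief equations for every region r • $b_r(\{x\}_r)$ is proportional to the product of every compatibility matrix and evidence term completely contained in r, times the messages entering r from outside • $b_5 = k\,[\phi_5]\,[m_{2 \to 5}\, m_{4 \to 5}\, m_{6 \to 5}\, m_{8 \to 5}]$ • $b_{45} = k\,[\phi_4 \phi_5 \psi_{45}]\,[m_{12 \to 45}\, m_{78 \to 45}\, m_{2 \to 5}\, m_{6 \to 5}\, m_{8 \to 5}]$ • $b_{1245} = k\,[\phi_1 \phi_2 \phi_4 \phi_5 \psi_{12} \psi_{14} \psi_{25} \psi_{45}]\,[m_{36 \to 25}\, m_{78 \to 45}\, m_{6 \to 5}\, m_{8 \to 5}]$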
Enforcing Marginalization • Now, we need to enforce the marginalization condition relating each pair of regions that share an edge in the hierarchy • e.g. between [5] and [45]: $b_5(x_5) = \sum_{x_4} b_{45}(x_4, x_5)$
Message Update • Substituting the belief equations into the marginalization condition, we get the message update rule: $m_{4 \to 5}(x_5) \leftarrow k \sum_{x_4} \phi_4(x_4)\, \psi_{45}(x_4, x_5)\, m_{12 \to 45}(x_4, x_5)\, m_{78 \to 45}(x_4, x_5)$ • The collection of belief equations and the message update rules define our GBP algorithm
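The update for the message from node 4 to node 5 can be obtained in three short steps. The derivation below is a sketch that assumes the belief equations for regions [5] and [45] in the form given earlier, with k standing for whatever normalization constant is needed on each line.

```latex
\begin{align*}
b_5(x_5) &= k\,\phi_5(x_5)\, m_{2\to5}(x_5)\, m_{4\to5}(x_5)\, m_{6\to5}(x_5)\, m_{8\to5}(x_5) \\
\sum_{x_4} b_{45}(x_4,x_5) &= k\,\phi_5(x_5)\, m_{2\to5}(x_5)\, m_{6\to5}(x_5)\, m_{8\to5}(x_5)
  \sum_{x_4} \phi_4(x_4)\,\psi_{45}(x_4,x_5)\, m_{12\to45}(x_4,x_5)\, m_{78\to45}(x_4,x_5) \\
\intertext{Setting the two sides equal and cancelling the common factor
$\phi_5\, m_{2\to5}\, m_{6\to5}\, m_{8\to5}$ leaves}
m_{4\to5}(x_5) &= k \sum_{x_4} \phi_4(x_4)\,\psi_{45}(x_4,x_5)\, m_{12\to45}(x_4,x_5)\, m_{78\to45}(x_4,x_5)
\end{align*}
```

Only $x_4$ is summed out, because it is the only variable in region [45] that does not appear in region [5].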
Complexity of GBP • Bad news: running time grows exponentially with the size of the basic clusters chosen • Good news: if the basic clusters encompass the shortest loops in the graphical model, usually nearly all the error from BP is eliminated • This usually requires only a small amount of additional computation compared with BP