
Advanced Learning Algorithms for Structure in Bayesian Networks

Explore learning algorithms for Bayesian network structures, with a focus on the EM algorithm's application to parameter estimation. Structure search, handling incomplete data, and maximizing structure scores are discussed.


Presentation Transcript


  1. [Figure: the ALARM monitoring network, with nodes MINVOLSET, KINKEDTUBE, PULMEMBOLUS, INTUBATION, VENTMACH, DISCONNECT, and many more down to HRBP and BP] Graphical Models - Learning - Structure Learning. Advanced IWS 06/07. Based on J. A. Bilmes, "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models", TR-97-021, U.C. Berkeley, April 1998; G. J. McLachlan, T. Krishnan, "The EM Algorithm and Extensions", John Wiley & Sons, Inc., 1997; D. Koller, course CS-228 handouts, Stanford University, 2001; N. Friedman & D. Koller's NIPS'99. Wolfram Burgard, Luc De Raedt, Kristian Kersting, Bernhard Nebel, Albert-Ludwigs University Freiburg, Germany

  2. Learning With Bayesian Networks [Figure: candidate models over variables A and B, one including a hidden variable H, with unknown arcs and parameters marked ?] Structure learning? Parameter estimation? - Learning

  3. Why Struggle for Accurate Structure? [Figure: the true network Earthquake → Alarm Set ← Burglary, Alarm Set → Sound, next to variants missing an arc and adding an arc] • Missing an arc: wrong assumptions about the domain structure; cannot be compensated for by fitting parameters • Adding an arc: wrong assumptions about the domain structure; increases the number of parameters to be estimated - Learning

  4. Unknown Structure, (In)complete Data [Figure: a learning algorithm maps data over E, B, A to the network B → A ← E, whose CPT entries P(A | E, B), e.g. .9/.1, .7/.3, .8/.2, .99/.01, are to be estimated] Complete data, e.g. <Y,N,N>, <Y,N,Y>, <N,N,Y>, <N,Y,Y>, …: • Network structure is not specified • Learner needs to select arcs & estimate parameters • Data does not contain missing values. Incomplete data, e.g. <Y,?,N>, <Y,N,?>, <N,N,Y>, <N,Y,Y>, …: • Network structure is not specified • Data contains missing values • Need to consider assignments to missing values - Learning

  5. Score-based Learning [Figure: data over E, B, A, e.g. <Y,N,N>, <Y,Y,Y>, <N,N,Y>, <N,Y,Y>, …, and several candidate structures over E, B, A] • Define a scoring function that evaluates how well a structure matches the data • Search for a structure that maximizes the score - Learning
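The score-based idea can be sketched with the decomposable log-likelihood score on complete data. This is an illustrative toy, not code from the slides; the function names and the dict-based data representation are my own assumptions:

```python
import math

def family_loglik(data, child, parents):
    # Log-likelihood contribution of one family: sum over rows of
    # log P(child | parents), with maximum-likelihood parameters
    # estimated from the same (complete) data.
    counts, parent_totals = {}, {}
    for row in data:
        key = (tuple(row[p] for p in parents), row[child])
        counts[key] = counts.get(key, 0) + 1
    for (pa, _), n in counts.items():
        parent_totals[pa] = parent_totals.get(pa, 0) + n
    return sum(n * math.log(n / parent_totals[pa])
               for (pa, _), n in counts.items())

def loglik_score(data, structure):
    # The score decomposes into a sum of per-family terms, so a local
    # change to the structure only requires re-scoring the changed families.
    return sum(family_loglik(data, child, parents)
               for child, parents in structure.items())
```

Because the score decomposes per family, a structure whose arcs capture real dependencies in the data scores higher than the empty structure over the same variables.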

  6. Structure Search as Optimization Input: • Training data • Scoring function • Set of possible structures Output: • A network that maximizes the score - Learning

  7. Heuristic Search • Define a search space: • search states are possible structures • operators make small changes to structure • Traverse space looking for high-scoring structures • Search techniques: • Greedy hill-climbing • Best first search • Simulated Annealing • ... Theorem: Finding maximal scoring structure with at most k parents per node is NP-hard for k > 1 - Learning

  8. Typically: Local Search [Figure: the current network over S, C, E, D] • Start with a given network: empty network, best tree, or a random network • At each iteration: evaluate all possible changes, apply the change with the best score • Stop when no modification improves the score - Learning

  9. Typically: Local Search [Figure: the networks before and after the change Add C → D] • Start with a given network: empty network, best tree, or a random network • At each iteration: evaluate all possible changes, apply the change with the best score • Stop when no modification improves the score - Learning

  10. Typically: Local Search [Figure: the networks resulting from the changes Add C → D and Reverse C → E] • Start with a given network: empty network, best tree, or a random network • At each iteration: evaluate all possible changes, apply the change with the best score • Stop when no modification improves the score - Learning

  11. Typically: Local Search [Figure: the networks resulting from the changes Add C → D, Reverse C → E, and Delete C → E] • Start with a given network: empty network, best tree, or a random network • At each iteration: evaluate all possible changes, apply the change with the best score • Stop when no modification improves the score - Learning
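The add/reverse/delete loop on these slides might look as follows, as a minimal sketch with a pluggable score function. Edges are represented as sets of (parent, child) pairs, and acyclicity is checked with Kahn's algorithm; all names here are illustrative, not from the slides:

```python
def is_acyclic(nodes, edges):
    # Kahn's algorithm: a digraph is a DAG iff all nodes can be peeled
    # off in topological order.
    indeg = {n: 0 for n in nodes}
    for _, v in edges:
        indeg[v] += 1
    queue = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while queue:
        n = queue.pop()
        seen += 1
        for u, v in edges:
            if u == n:
                indeg[v] -= 1
                if indeg[v] == 0:
                    queue.append(v)
    return seen == len(nodes)

def neighbours(nodes, edges):
    # All structures one operator away: add, delete, or reverse an arc.
    for u in nodes:
        for v in nodes:
            if u != v and (u, v) not in edges:
                yield edges | {(u, v)}                 # add u -> v
    for e in edges:
        yield edges - {e}                              # delete
        yield (edges - {e}) | {(e[1], e[0])}           # reverse

def hill_climb(nodes, score, edges=frozenset()):
    # Greedy local search: apply the best single-arc change that keeps
    # the graph acyclic, until no modification improves the score.
    edges = set(edges)
    best = score(edges)
    while True:
        cands = [(score(e), e) for e in neighbours(nodes, edges)
                 if is_acyclic(nodes, e)]
        if not cands:
            return edges, best
        top, e = max(cands, key=lambda t: t[0])
        if top <= best:
            return edges, best
        edges, best = set(e), top
```

With a real scoring function (e.g. the decomposable log-likelihood plus a complexity penalty), only the changed families would need re-scoring at each step.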

  12. Typically: Local Search [Figure: the networks resulting from the changes Add C → D, Reverse C → E, and Delete C → E] • If the data is complete: to update the score after a local change, only re-score (by counting) the families that changed • If the data is incomplete: to update the score after a local change, re-run the parameter estimation algorithm - Learning

  13. Local Search in Practice • Local search can get stuck in: • Local Maxima: • All one-edge changes reduce the score • Plateaux: • Some one-edge changes leave the score unchanged • Standard heuristics can escape both • Random restarts • TABU search • Simulated annealing - Learning
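The random-restart idea can be illustrated on a one-dimensional toy score with a local and a global maximum; the structure-search analogue would restart the greedy search from different random networks. Function names and the toy landscape are mine:

```python
def hill_climb_1d(score, x):
    # Greedy: move to the better neighbour until neither neighbour improves.
    while True:
        nxt = max((x - 1, x + 1), key=score)
        if score(nxt) <= score(x):
            return x
        x = nxt

def with_restarts(score, starts):
    # Re-run greedy search from several starting points and keep the best
    # local optimum found; this is how restarts escape local maxima.
    return max((hill_climb_1d(score, s) for s in starts), key=score)
```

On a score with a local maximum at x = 5 and a global one at x = 20, a single climb from 0 gets stuck at 5, while restarts from a spread of starting points find 20.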

  14. Local Search in Practice [Figure: a data table with missing entries, e.g. <9.7 0.6 8 14 18>, <0.2 1.3 5 ?? ??>, <1.3 2.8 ?? 0 1>, <?? 5.6 0 10 ??>, …] • Using LL as the score, adding arcs always helps: the maximum score is attained by the fully connected network, i.e. overfitting: a bad idea… • Minimum Description Length: learning as data compression, score = DL(Data | Model) + DL(Model) • Others: BIC (Bayesian Information Criterion), Bayesian score (BDe) - Learning
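The DL(Model) term can be made concrete: for discrete networks it is usually proportional to the number of free parameters, which grows with every added arc. A sketch under that assumption (function names are mine; `arity` maps each variable to its number of values):

```python
import math

def num_free_params(structure, arity):
    # A node with r values whose parents have arities q1..qk contributes
    # (r - 1) * q1 * ... * qk free parameters to the network.
    total = 0
    for child, parents in structure.items():
        q = 1
        for p in parents:
            q *= arity[p]
        total += (arity[child] - 1) * q
    return total

def mdl_penalty(structure, arity, n_samples):
    # BIC/MDL-style complexity term: (log N / 2) * #parameters, so an arc
    # is only worth keeping if it improves the likelihood enough to pay
    # for its extra parameters.
    return 0.5 * math.log(n_samples) * num_free_params(structure, arity)
```

Subtracting this penalty from the log-likelihood removes the bias toward the fully connected network.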

  15. Local Search in Practice [Figure: candidate graphs G1, G2, G3, G4, …, Gn, each with its own parameter space in which EM climbs to a local maximum] • Perform EM for each candidate graph: parametric optimization (EM) up to a local maximum - Learning

  16. Local Search in Practice [Figure: candidate graphs G1, G2, G3, G4, …, Gn, each requiring its own parametric optimization (EM) up to a local maximum] • Perform EM for each candidate graph • Computationally expensive: parameter optimization via EM is non-trivial; EM must be performed for all candidate structures; time is spent even on poor candidates ⇒ in practice, only a few candidates are considered - Learning

  17. Structural EM [Friedman et al. 98] Recall: with complete data, decomposition of the score ⇒ efficient search. Idea: • Instead of optimizing the real score… • Find a decomposable alternative score • Such that maximizing the new score ⇒ improvement in the real score - Learning

  18. Structural EM [Friedman et al. 98] Idea: • Use current model to help evaluate new structures Outline: • Perform search in (Structure, Parameters) space • At each iteration, use current model for finding either: • Better scoring parameters: “parametric” EM step or • Better scoring structure: “structural” EM step - Learning

  19. Structural EM [Friedman et al. 98] [Figure: training data and the current network over X1, X2, X3, H, Y1, Y2, Y3; reiterate: compute expected counts, then score & parameterize] Expected counts: N(X1), N(X2), N(X3), N(H, X1, X3), N(Y1, H), N(Y2, H), N(Y3, H), and for candidate changes also N(X2, X1), N(Y1, X2), N(Y2, Y1, H) - Learning

  20. Structure Learning: incomplete data [Figure: a data table over S, X, D, C, B with missing entries, e.g. <? 0 1 0 1>, <1 1 ? 0 1>, <0 0 0 ? ?>; inference in the current model, e.g. P(S | X=0, D=1, C=0, B=1), turns it into a table of expected counts] EM algorithm, iterate until convergence: • Expectation: use the current model (structure and parameters, e.g. a network over E, A, B) to compute expected counts via inference • Maximization: re-estimate the parameters from the expected counts - Learning
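The Expectation/Maximization loop can be shown in miniature: estimating P(X=1) for a single binary variable when some samples are missing. The E-step replaces each missing value by its expected value under the current parameter; the M-step re-estimates the parameter from the expected counts. A toy sketch, not the slides' multi-variable setting:

```python
def em_binary(ones, zeros, n_missing, theta=0.5, iters=100):
    # EM for the parameter theta = P(X=1) of a binary variable X,
    # given `ones`/`zeros` observed samples and `n_missing` missing ones.
    for _ in range(iters):
        expected_ones = ones + n_missing * theta                # E-step
        theta = expected_ones / (ones + zeros + n_missing)      # M-step
    return theta
```

With values missing completely at random, the fixed point is the observed frequency ones / (ones + zeros), which the iteration approaches geometrically; in the network setting the E-step is the inference query shown on the slide.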

  21. Structure Learning: incomplete data [Figure: the same data table over S, X, D, C, B and expected-counts step as on the previous slide, now with candidate structures over E, A, B] SEM algorithm, iterate until convergence: • Expectation: use the current model to compute expected counts via inference • Maximization over parameters: re-estimate the parameters • Maximization over structure: search for a higher-scoring structure - Learning

  22. Structure Learning: Summary • Expert knowledge + learning from data • Structure learning involves parameter estimation (e.g. EM) • Optimization with score functions: likelihood + complexity penalty = MDL • Local traversal of the space of possible structures: add, reverse, delete (single) arcs • Speed-up: Structural EM: score candidates w.r.t. the current best model - Learning
