1 / 48

Learning Bayesian Networks

Learning Bayesian Networks. (From David Heckerman’s tutorial). X 1 true false false true. X 2 1 5 3 2. X 3 0.7 -1.6 5.9 6.3. Learning Bayes Nets From Data. Bayes net(s). data. X 1. X 2. Bayes-net learner. X 3. X 4. X 5. X 6. X 7. + prior/expert information. X 8. X 9.

rmcclain
Download Presentation

Learning Bayesian Networks

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Learning Bayesian Networks (From David Heckerman’s tutorial)

  2. X1 true false false true X2 1 5 3 2 X3 0.7 -1.6 5.9 6.3 ... . . . . . . Learning Bayes Nets From Data Bayes net(s) data X1 X2 Bayes-net learner X3 X4 X5 X6 X7 + prior/expert information X8 X9

  3. Overview • Introduction to Bayesian statistics:Learning a probability • Learning probabilities in a Bayes net • Learning Bayes-net structure

  4. tails heads Learning Probabilities: Classical Approach Simple case: Flipping a thumbtack True probabilityq is unknown Given iid data, estimate q using an estimator with good properties: low bias, low variance, consistent (e.g., ML estimate)

  5. tails heads p(q) q 0 1 Learning Probabilities: Bayesian Approach True probability q is unknown Bayesian probability density forq

  6. Bayesian Approach: use Bayes' rule to compute a new density for q given data prior likelihood posterior

  7. The Likelihood “binomial distribution”

  8. Example: Application of Bayes rule to the observation of a single "heads" p(q) p(heads|q)= q p(q|heads) q q q 0 1 0 1 0 1 prior likelihood posterior

  9. Q X1 X2 XN ... toss 1 toss 2 toss N A Bayes net for learning probabilities

  10. Sufficient statistics (#h,#t) are sufficient statistics

  11. The probability of heads on the next toss

  12. Prior Distributions for q • Direct assessment • Parametric distributions • Conjugate distributions (for convenience) • Mixtures of conjugate distributions

  13. Conjugate Family of Distributions Beta distribution: Properties:

  14. Intuition • The hyperparameters ah and at can be thought of as imaginary counts from our prior experience, starting from "pure ignorance" • Equivalent sample size = ah + at • The larger the equivalent sample size, the more confident we are about the true probability

  15. Beta Distributions Beta(0.5, 0.5) Beta(1, 1) Beta(3, 2) Beta(19, 39)

  16. Assessment of a Beta Distribution Method 1:Equivalent sample - assess ah and at - assess ah+at and ah/(ah+at) Method 2:Imagined future samples

  17. Generalization to m discrete outcomes("multinomial distribution") Dirichlet distribution: Properties:

  18. More generalizations(see, e.g., Bernardo + Smith, 1994) Likelihoods from the exponential family • Binomial • Multinomial • Poisson • Gamma • Normal

  19. Overview • Intro to Bayesian statistics:Learning a probability • Learning probabilities in a Bayes net • Learning Bayes-net structure

  20. Q X1 X2 XN ... toss 1 toss 2 toss N From thumbtacks to Bayes nets Thumbtack problem can be viewed as learning the probability for a very simple BN: X heads/tails

  21. tails heads X Y heads/tails heads/tails “heads” “tails” The next simplest Bayes net

  22. X Y heads/tails heads/tails QX X1 X2 XN The next simplest Bayes net ? QY case 1 Y1 case 2 Y2 YN case N

  23. X Y heads/tails heads/tails QX X1 X2 XN The next simplest Bayes net "parameter independence" QY case 1 Y1 case 2 Y2 YN case N

  24. X Y heads/tails heads/tails QX X1 X2 XN The next simplest Bayes net "parameter independence" QY case 1 Y1 ß case 2 Y2 two separate thumbtack-like learning problems YN case N

  25. X Y heads/tails heads/tails A bit more difficult... Three probabilities to learn: • qX=heads • qY=heads|X=heads • qY=heads|X=tails

  26. X Y heads/tails heads/tails A bit more difficult... QY|X=heads QY|X=tails QX heads X1 Y1 case 1 tails X2 Y2 case 2

  27. X Y heads/tails heads/tails A bit more difficult... QY|X=heads QY|X=tails QX X1 Y1 case 1 X2 Y2 case 2

  28. X Y heads/tails heads/tails A bit more difficult... ? ? QY|X=heads QY|X=tails QX ? X1 Y1 case 1 X2 Y2 case 2

  29. X Y heads/tails heads/tails A bit more difficult... QY|X=heads QY|X=tails QX X1 Y1 case 1 X2 Y2 case 2 3 separate thumbtack-like problems

  30. In general… Learning probabilities in a BN is straightforward if • Local distributions from the exponential family (binomial, poisson, gamma, ...) • Parameter independence • Conjugate priors • Complete data

  31. X Y heads/tails heads/tails Incomplete data makes parameters dependent QY|X=heads QY|X=tails QX X1 Y1 case 1 X2 Y2 case 2

  32. Overview • Intro to Bayesian statistics:Learning a probability • Learning probabilities in a Bayes net • Learning Bayes-net structure

  33. Learning Bayes-net structure Given data, which model is correct? X Y model 1: X Y model 2:

  34. Bayesian approach Given data, which model is correct? more likely? X Y model 1: Datad X Y model 2:

  35. Bayesian approach: Model Averaging Given data, which model is correct? more likely? X Y model 1: Datad X Y model 2: average predictions

  36. Bayesian approach: Model Selection Given data, which model is correct? more likely? X Y model 1: Datad X Y model 2: Keep the best model: - Explanation - Understanding - Tractability

  37. To score a model, use Bayes rule Given data d: model score "marginal likelihood" likelihood

  38. Thumbtack example X heads/tails conjugate prior

  39. X Y heads/tails heads/tails More complicated graphs 3 separate thumbtack-like learning problems X Y|X=heads Y|X=tails

  40. Model score for a discrete BN

  41. Computation of Marginal Likelihood Efficient closed form if • Local distributions from the exponential family (binomial, poisson, gamma, ...) • Parameter independence • Conjugate priors • No missing data (including no hidden variables)

  42. Practical considerations The number of possible BN structures for n variables is super exponential in n • How do we find the best graph(s)? • How do we assign structure and parameter priors to all possible graph?

  43. initialize structure score all possible single changes perform best change any changes better? yes no return saved structure Model search • Finding the BN structure with the highest score among those structures with at most k parents is NP hard for k>1 (Chickering, 1995) • Heuristic methods • Greedy • Greedy with restarts • MCMC methods

  44. Structure priors 1. All possible structures equally likely 2. Partial ordering, required / prohibited arcs 3. p(m) a similarity(m, prior BN)

  45. Parameter priors • All uniform: Beta(1,1) • Use a prior BN

  46. Parameter priors Recall the intuition behind the Beta prior for the thumbtack: • The hyperparameters ah and at can be thought of as imaginary counts from our prior experience, starting from "pure ignorance" • Equivalent sample size = ah + at • The larger the equivalent sample size, the more confident we are about the long-run fraction

  47. x1 x2 x3 x4 x5 x6 x7 x8 x9 Parameter priors imaginary count for any variable configuration equivalent sample size + parameter modularity parameter priors for any BN structure for X1…Xn

  48. x1 x2 x3 x4 x5 x6 x1 x2 x7 x3 x4 x8 x5 x9 x6 x7 x1 true false false true x2 false false false true x3 true true false false x8 x9 ... . . . . . . Combine user knowledge and data prior network+equivalent sample size improved network(s) data

More Related