
Parameter Related Domain Knowledge for Learning in Bayesian Networks


Presentation Transcript


  1. Parameter Related Domain Knowledge for Learning in Bayesian Networks. Stefan Niculescu, PhD Candidate, Carnegie Mellon University. Joint work with Professor Tom Mitchell and Dr. Bharat Rao. April 2005

  2. Domain Knowledge • In the real world, data is often too sparse to build an accurate model • Domain knowledge can help alleviate this problem • Several types of domain knowledge: • Relevance of variables (feature selection) • Conditional independences among variables • Parameter domain knowledge

  3. Parameter Domain Knowledge • In a Bayes net for a real-world domain: • there can be a huge number of parameters • there may not be enough data to estimate them accurately • Parameter domain knowledge constraints: • reduce the number of parameters to estimate • reduce the variance of the parameter estimates

  4. Outline • Motivation • Parameter Related Domain Knowledge • Experiments • Related Work • Summary / Future Work

  5. Parameters and Counts. Theorem. With fully observed data, the Maximum Likelihood estimators for the CPT of variable Xi are given by $\hat{\theta}_{ijk} = N_{ijk} / \sum_{k'} N_{ijk'}$, where $N_{ijk}$ is the number of examples in which Xi takes its k-th value and its parents take their j-th configuration.
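
A minimal sketch of this estimator in Python (the counts are hypothetical; N[j, k] denotes the number of examples in which the parents of Xi take their j-th configuration and Xi its k-th value):

```python
import numpy as np

# Hypothetical counts N[j, k]: examples where the parents of Xi take their
# j-th configuration and Xi takes its k-th value.
N = np.array([[30.0, 10.0],
              [ 5.0, 55.0]])

# Maximum Likelihood estimate: normalize counts within each parent configuration.
theta = N / N.sum(axis=1, keepdims=True)
print(theta)  # row j is the estimated distribution P(Xi | parents = j)
```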

  6. Parameter Sharing. Theorem. When a set of parameters is constrained to share a common value, the Maximum Likelihood estimators pool the corresponding counts: in the single-distribution case, the shared value is the group's aggregate count divided equally among its members, $\hat{\theta}_G = \sum_{i \in G} N_i / (|G| \cdot N)$.
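
A hedged sketch of this pooled estimate in the simplest setting, a single multinomial with some entries tied to a common value (the counts and the tied index set are illustrative; sharing across several distributions pools counts in the same way):

```python
import numpy as np

# Hypothetical multinomial counts; entries 0 and 1 are constrained to share
# the same parameter value (domain knowledge).
counts = np.array([12.0, 18.0, 70.0])
shared = [0, 1]   # indices of the tied parameters

total = counts.sum()
# Unconstrained entries keep their usual relative-frequency estimate ...
theta = counts / total
# ... while tied entries split their pooled count mass equally.
theta[shared] = counts[shared].sum() / (len(shared) * total)
print(theta)  # [0.15, 0.15, 0.70]; still sums to one
```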

  7. Incomplete Data, Frequentist

  8. Dependent Dirichlet Priors

  9. Bayesian Averaging

  10. Hierarchical Parameter Sharing

  11. Probability Mass Sharing. Domain knowledge: parameters of a given color (i.e., a designated group) have the same aggregate sum across all distributions.
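
Formally (a direct transcription of the statement above, with $G$ a color group, $d$ indexing the distributions, and $c_G$ an unknown shared constant):

$$\sum_{i \in G} \theta_i^{(d)} = c_G \quad \text{for every distribution } d.$$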

  12. Probability Ratio Sharing. Domain knowledge: parameters of a given color (i.e., a designated group) preserve their relative ratios across all distributions.
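
Formally (again a direct transcription, with $r_{ij}$ unknown ratios common to all distributions):

$$\frac{\theta_i^{(d)}}{\theta_j^{(d)}} = r_{ij} \quad \text{for all } i, j \in G \text{ and every distribution } d.$$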

  13. Where are we right now?

  14. Outline • Motivation • Parameter Related Domain Knowledge • Experiments • Related Work • Summary / Future Work

  15. Datasets • Project World - CALO • 6 persons, ~200 emails • Manually labeled as About Meetings / Not About Meetings • Data: (Person, Email, Topic) • Artificial datasets • Kept most of the characteristics of the real data, but new emails were generated in which the frequencies of certain words were shared across users • Purpose: • Domain knowledge readily available • To study the effect of training set size (up to 5000 examples) • To compare the estimated distribution against the true distribution

  16. Approach • Email can be modeled with a Naive Bayes model: • without parameter sharing (PSNB) • with parameter sharing (SSNB) • Also compared against a model that assumes the sender is irrelevant (GNB), in which the frequencies of words within a topic are learned from all examples. (Diagrams: two network structures over the variables Sender, Topic, and Word.)
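
A rough sketch of the sharing idea behind SSNB, under the assumption (not spelled out on the slide) that certain words are known to have the same per-topic frequency for every sender; all counts are hypothetical, and the renormalization step is a simple approximation rather than the talk's exact constrained estimator:

```python
import numpy as np

# Hypothetical counts[s, w]: how often sender s used word w in emails of one
# fixed topic. Domain knowledge: words in `shared` have the same per-topic
# frequency for every sender.
counts = np.array([[40.0, 10.0, 50.0],
                   [ 5.0, 20.0, 25.0]])
shared = [0]

totals = counts.sum(axis=1, keepdims=True)
theta = counts / totals                    # per-sender estimates (no sharing)

# Pool the evidence for shared words across all senders ...
pooled = counts[:, shared].sum(axis=0) / totals.sum()
theta[:, shared] = pooled                  # same value for every sender
# ... and rescale the sender-specific words so each row still sums to one.
rest = [w for w in range(counts.shape[1]) if w not in shared]
free_mass = 1.0 - pooled.sum()
theta[:, rest] = free_mass * counts[:, rest] / counts[:, rest].sum(axis=1, keepdims=True)
print(theta)   # rows: per-sender word distributions for the topic
```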

  17. Effect of Training Set Size • As expected: • SSNB performs better than both other models • SSNB and PSNB perform similarly as the training set grows, but SSNB is much better when data is sparse

  18. Outline • Motivation • Parameter Related Domain Knowledge • Experiments • Related Work • Summary / Future Work

  19. Dirichlet Priors in a Bayes Net. The domain expert specifies an assignment of parameters (a prior belief), but leaves room for some error (the spread of the prior).
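
A standard Dirichlet-multinomial update illustrating how the two quantities enter the estimate: the prior belief is the Dirichlet mean, and the spread corresponds to an equivalent sample size (all numbers are hypothetical):

```python
import numpy as np

# Expert's prior belief over a 3-valued variable, plus an equivalent sample
# size controlling the spread (smaller size = wider spread = more room for error).
prior_mean = np.array([0.5, 0.3, 0.2])
ess = 10.0                      # equivalent sample size
alpha = prior_mean * ess        # Dirichlet hyperparameters

counts = np.array([2.0, 7.0, 1.0])   # hypothetical observed counts
# Posterior mean of each parameter under the Dirichlet prior:
theta = (counts + alpha) / (counts.sum() + alpha.sum())
print(theta)   # shrinks the raw frequencies toward the expert's belief
```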

  20. HMMs and DBNs. (Diagram: networks unrolled over time, with the same parameters repeated across time slices.)

  21. Module Networks • In a module: • same parents • same CPTs. (Image from “Learning Module Networks” by Eran Segal and Daphne Koller.)

  22. Context Specific Independence. (Diagram: the Alarm example, with parents Burglary and Set.)
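
A tiny sketch of what such a context-specific CPT might look like, assuming (as in the classic alarm example) that an unset alarm does not depend on Burglary; the probabilities are made up:

```python
# Context-specific independence: when the alarm is not set, the rows of
# P(Alarm | Burglary, Set) collapse into a single shared parameter.
def p_alarm(burglary: bool, alarm_set: bool) -> float:
    if not alarm_set:
        return 0.001            # one parameter shared by both Burglary values
    return 0.95 if burglary else 0.01
```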

  23. Outline • Motivation • Parameter Related Domain Knowledge • Experiments • Related Work • Summary / Future Work

  24. Summary • Parameter-related domain knowledge is needed when data is scarce • Developed methods to estimate parameters: • for each of the four types of domain knowledge presented • from both complete and incomplete data • Markov models, module networks, and context-specific independence are particular cases of our parameter sharing domain knowledge • Models using parameter sharing performed better than two classical Bayes nets on synthetic data

  25. Future Work • Automatically find shared parameters • Study interactions among different types of domain knowledge • Incorporate domain knowledge about continuous variables • Investigate domain knowledge in the form of inequality constraints

  26. Questions?

  27. THE END

  28. Backup Slides

  29. Hierarchical Parameter Sharing

  30. Full Data Observability, Frequentist

  31. Probability Mass Sharing • Want to model P(Word | Language) • Two languages, English and Spanish, with different sets of words • Domain knowledge: • the aggregate probability mass of nouns is the same in both languages • the same holds for adjectives, verbs, etc.
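
A sketch of how the shared noun mass could be estimated by pooling both corpora; the counts are invented, and this is one plausible constrained estimate rather than the talk's exact formula:

```python
# Hypothetical counts of noun vs. non-noun tokens in each language's corpus.
counts = {"english": {"noun": 400.0, "other": 600.0},
          "spanish": {"noun": 180.0, "other": 320.0}}

# Domain knowledge: the aggregate probability mass of nouns is the same in
# both languages, so it is estimated from the pooled corpora.
noun_total = sum(c["noun"] for c in counts.values())
grand_total = sum(c["noun"] + c["other"] for c in counts.values())
noun_mass = noun_total / grand_total   # shared across languages
print(noun_mass)                       # (400 + 180) / 1500 ~ 0.387

# How that mass is divided among individual nouns would still be estimated
# from each language's own data, since the word sets differ.
```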

  32. Probability Mass Sharing

  33. Full Data Observability, Frequentist

  34. Probability Ratio Sharing • Want to model P(Word | Language) • Two languages, English and Spanish, with different sets of words • Domain knowledge: • word groups, e.g. about computers: computer, mouse, monitor, etc. • the relative frequency of “computer” to “mouse” is the same in both languages • the aggregate mass of each group can differ between languages. (Diagram: word groups T1 = computer words, T2 = business words.)
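
A sketch of the corresponding estimate: within-group proportions are pooled across languages while each language keeps its own group mass. The counts are invented, and the formula is one plausible reading of the constraint, not necessarily the talk's exact estimator:

```python
import numpy as np

# Hypothetical counts for the computer-word group in each language, plus the
# total token count of each corpus.
group_counts = {"english": np.array([60.0, 20.0]),   # [computer, mouse]
                "spanish": np.array([24.0,  8.0])}   # [computadora, raton]
corpus_totals = {"english": 1000.0, "spanish": 400.0}

# Domain knowledge: relative ratios inside the group are the same in both
# languages, so within-group proportions are estimated from pooled counts.
pooled = sum(group_counts.values())
within_group = pooled / pooled.sum()          # e.g. computer:mouse = 3:1

# The aggregate mass of the group may differ per language and is estimated
# from each language's own corpus.
for lang, g in group_counts.items():
    mass = g.sum() / corpus_totals[lang]
    print(lang, mass * within_group)          # per-word probabilities
```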

  35. Probability Ratio Sharing

  36. Full Data Observability, Frequentist
