
Learning in Bayesian Networks




Presentation Transcript


  1. Learning in Bayesian Networks

  2. The Learning Problem: four cases, depending on what is known. Known Structure, Complete Data; Known Structure, Incomplete Data; Unknown Structure, Complete Data; Unknown Structure, Incomplete Data.

  3. Known Structure Complete Data

  4. Known Structure Incomplete Data

  5. Unknown Structure Complete Data

  6. Unknown Structure Incomplete Data

  7. Known Structure: learning method A produces CPTs A; learning method B produces CPTs B.

  8. Known Structure: the structure together with CPTs A defines distribution PrA, and together with CPTs B it defines PrB. Which probability distribution should we choose? Common criterion: choose the distribution that maximizes the likelihood of the data.

  9. Known Structure: given data D = d1, ..., dm, the likelihood of the data given PrA is PrA(D) = PrA(d1) ⋯ PrA(dm), and the likelihood given PrB is PrB(D) = PrB(d1) ⋯ PrB(dm).
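The likelihood computation on this slide can be sketched in code. This is a minimal sketch, assuming a hypothetical two-node network A → B with made-up CPT values; the likelihood of the data is the product of the probabilities the network assigns to the individual data points.

```python
# A minimal sketch, assuming a hypothetical two-node network A -> B with
# made-up CPT values: the likelihood of the data is the product of the
# probabilities the network assigns to the individual data points.

def pr_instance(a, b, theta_a, theta_b_given_a):
    """Pr(A=a, B=b) = Pr(a) * Pr(b | a) by the chain rule."""
    pa = theta_a if a else 1 - theta_a
    pb = theta_b_given_a[a] if b else 1 - theta_b_given_a[a]
    return pa * pb

def likelihood(data, theta_a, theta_b_given_a):
    """Pr(D) = Pr(d1) * ... * Pr(dm), treating data points as independent."""
    prod = 1.0
    for a, b in data:
        prod *= pr_instance(a, b, theta_a, theta_b_given_a)
    return prod

# Made-up data and CPTs: Pr(A)=0.6, Pr(B|A)=0.9, Pr(B|~A)=0.2
data = [(True, True), (True, False), (False, True)]
L = likelihood(data, 0.6, {True: 0.9, False: 0.2})
```

Comparing this product across candidate CPT sets (e.g., CPTs A versus CPTs B) is exactly the maximum-likelihood criterion from the slide.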

  10. Maximizing Likelihood of Data: with complete data there is a unique set of CPTs that maximizes the likelihood of the data; with incomplete data there is no unique set of CPTs that maximizes it.


  12. Known Structure, Complete Data: given data D = d1, ..., dm, the estimated parameter is θd|b,c = (number of data points di containing d, b, c) / (number of data points di containing b, c).

  13. Known Structure, Complete Data: each remaining parameter is estimated the same way, as a ratio of counts over the data D.
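The counting formula on these slides can be sketched as follows. The variable names B, C, D and the data points are illustrative assumptions, not from the original deck.

```python
# A minimal sketch of maximum-likelihood CPT estimation from complete
# data, matching theta_{d|b,c} = #(d, b, c) / #(b, c). The variables
# B, C, D and the data are made-up, illustrative assumptions.

def estimate_parameter(data, child, child_val, parents):
    """theta_{child_val | parents} = #(child_val, parents) / #(parents)."""
    matches_parents = [d for d in data
                       if all(d[p] == v for p, v in parents.items())]
    if not matches_parents:
        return 0.0  # parameter undefined: parent configuration never occurs
    matches_child = [d for d in matches_parents if d[child] == child_val]
    return len(matches_child) / len(matches_parents)

data = [
    {"B": 1, "C": 0, "D": 1},
    {"B": 1, "C": 0, "D": 0},
    {"B": 1, "C": 0, "D": 1},
    {"B": 0, "C": 1, "D": 0},
]
# theta_{d | b, c}: fraction of (B=1, C=0) points that also have D=1
theta = estimate_parameter(data, "D", 1, {"B": 1, "C": 0})
```

With complete data every parameter can be read off the data this way, which is why the maximizing CPTs are unique.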

  14. Complexity. For a network with n nodes, k parameters, and m data points: time complexity O(mk) (straightforward implementation); space complexity O(k + mn) (parameter count plus space for the data).

  15. Known Structure, Incomplete Data: parameters at iteration i+1 are estimated using the CPTs at iteration i; Pr0 corresponds to the initial Bayesian network (random CPTs).

  16. Known Structure, Incomplete Data. EM Algorithm (Expectation-Maximization): initialize the CPTs to random values, then repeat until convergence: estimate the parameters using the current CPTs (E-step), and update the CPTs using these estimates (M-step).

  17. EM Algorithm: the likelihood of the data cannot decrease after an iteration. The algorithm is not guaranteed to return the network that globally maximizes the likelihood; it is only guaranteed to reach a local maximum, which is why random restarts are used. The algorithm is stopped when the change in likelihood, or the change in the parameters, becomes very small.
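The EM loop described on these slides can be sketched on a deliberately tiny example: a single hidden variable C with one observed binary child V. The data and initial CPTs below are made-up assumptions; the point is the E-step / M-step structure and the non-decreasing likelihood.

```python
import math

# A minimal EM sketch for the network C -> V, with C hidden and V
# observed (binary). Data and initial CPTs are made-up assumptions.

data = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]   # observed values of V
theta_c = 0.4                            # Pr(C = 1), arbitrary init
theta_v = {1: 0.7, 0: 0.3}               # Pr(V = 1 | C = c), arbitrary init

def log_likelihood():
    """Log-likelihood of the data, summing C out of each data point."""
    ll = 0.0
    for v in data:
        p = sum((theta_c if c else 1 - theta_c) *
                (theta_v[c] if v else 1 - theta_v[c]) for c in (0, 1))
        ll += math.log(p)
    return ll

lls = []
for _ in range(20):
    lls.append(log_likelihood())
    # E-step: posterior Pr(C = 1 | v) for each data point, via current CPTs
    post = []
    for v in data:
        p1 = theta_c * (theta_v[1] if v else 1 - theta_v[1])
        p0 = (1 - theta_c) * (theta_v[0] if v else 1 - theta_v[0])
        post.append(p1 / (p1 + p0))
    # M-step: re-estimate the CPTs from the expected counts
    theta_c = sum(post) / len(post)
    theta_v[1] = sum(p * v for p, v in zip(post, data)) / sum(post)
    theta_v[0] = (sum((1 - p) * v for p, v in zip(post, data))
                  / sum(1 - p for p in post))
```

The recorded log-likelihoods illustrate the guarantee from the slide: the likelihood never decreases between iterations, though the run may stop at a local maximum.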

  18. Complexity. For a network with n nodes, k parameters, m data points, and treewidth w: time complexity per iteration O(m k n 2^w) (straightforward implementation); space complexity O(k + nm + n 2^w) (parameter count + space for the data + space for inference).

  19. Collaborative Filtering (CF) finds items of interest to a user based on the preferences of other, similar users. It assumes that human behavior is predictable.

  20. Where is it used? E-commerce: recommend products based on previous purchases or click-stream behavior (e.g., Amazon.com). Information sites: rate items based on previous user ratings (e.g., MovieLens, Jester).

  21. CF

  22. Memory-based Algorithms: use the entire database of user ratings to make predictions. Find users with voting histories similar to the active user's, then use those users' votes to predict ratings for products the active user has not voted on.
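A memory-based predictor of the kind described above can be sketched as follows. A similarity-weighted average using Pearson correlation is one common choice; the users, titles, and vote values below are made up for illustration.

```python
import math

# A minimal memory-based CF sketch: weight each other user's vote on the
# target title by their correlation with the active user. All data here
# is made up for illustration.

votes = {                       # user -> {title: vote}, hypothetical
    "u1": {"A": 5, "B": 3, "C": 4},
    "u2": {"A": 4, "B": 2, "C": 5},
    "u3": {"A": 1, "B": 5, "C": 2},
}
active = {"A": 5, "B": 3}       # active user's known votes

def pearson(a, b):
    """Pearson correlation over the titles both users voted on."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    ma = sum(a[t] for t in common) / len(common)
    mb = sum(b[t] for t in common) / len(common)
    num = sum((a[t] - ma) * (b[t] - mb) for t in common)
    den = math.sqrt(sum((a[t] - ma) ** 2 for t in common) *
                    sum((b[t] - mb) ** 2 for t in common))
    return num / den if den else 0.0

def predict(active, votes, title):
    """Active user's mean vote plus a weighted average of deviations."""
    mean_active = sum(active.values()) / len(active)
    num = den = 0.0
    for user, uv in votes.items():
        if title not in uv:
            continue
        w = pearson(active, uv)
        mean_u = sum(uv.values()) / len(uv)
        num += w * (uv[title] - mean_u)
        den += abs(w)
    return mean_active + num / den if den else mean_active

p = predict(active, votes, "C")
```

Because every prediction scans the whole vote database, this approach is simple but slow at prediction time, which is the motivation for the model-based alternative on the next slide.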

  23. Model-based Algorithms: construct a model from the vote database, then use the model to predict the active user's ratings.

  24. Bayesian Clustering: use a naïve Bayes network to model the vote database. It has m vote variables, one for each title, representing discrete vote values, and one "cluster" variable representing user personalities.

  25. Naïve Bayes: the cluster variable C is the parent of every vote variable V1, V2, V3, ..., Vm.

  26. (figure: the naïve Bayes network C → V1, V2, V3, ..., Vm)

  27. Inference. Evidence: the known votes vk for titles k ∈ I. Query: the title j for which we need to predict the vote. Expected value of the vote: pj = Σh=1..w h · Pr(vj = h | vk : k ∈ I), where w is the number of possible vote values.
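The expected-vote formula can be sketched numerically. The cluster prior, vote CPTs, and evidence likelihoods below are made-up values for a toy model with two clusters and two vote values.

```python
# A minimal sketch of p_j = sum_h h * Pr(v_j = h | v_k : k in I) under
# the naive Bayes model. All numbers below are made-up assumptions for
# a two-cluster, two-vote-value toy model.

pr_c = {0: 0.5, 1: 0.5}                          # Pr(C), the cluster prior
# Pr(v_j = h | C = c) for vote values h in {1, 2}
pr_vj = {0: {1: 0.8, 2: 0.2}, 1: {1: 0.1, 2: 0.9}}
# Pr(v_k : k in I | C = c): product of the evidence CPT entries (assumed)
pr_evidence = {0: 0.02, 1: 0.18}

# Posterior over the cluster given the observed votes: Pr(C = c | evidence)
joint = {c: pr_c[c] * pr_evidence[c] for c in pr_c}
z = sum(joint.values())
post = {c: joint[c] / z for c in joint}

# Pr(v_j = h | evidence) = sum_c Pr(v_j = h | c) * Pr(c | evidence)
pr_h = {h: sum(pr_vj[c][h] * post[c] for c in post) for h in (1, 2)}
expected_vote = sum(h * p for h, p in pr_h.items())
```

Because the vote variables are independent given C, the evidence enters the query only through the posterior on the cluster variable, which is what makes naïve Bayes inference cheap.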

  28. Learning. Simplified Expectation-Maximization (EM) algorithm with partial data. Initialize the CPTs with random values, where θc corresponds to Pr(c) and θvk|c to Pr(vk | c), subject to the constraints Σc θc = 1 and Σvk θvk|c = 1.

  29. Datasets. MovieLens: 943 users; 1,682 titles; 100,000 votes (1–5); explicit voting. MS Web (website visits): 610 users; 294 titles; 8,275 votes (0/1), with null votes treated as 0 for a total of 179,340 votes; implicit voting.

  30. Learning curve for MovieLens Dataset

  31. Protocols: the user database is divided into an 80% training set and a 20% test set. One by one, select a user from the test set to be the active user, and predict some of their votes based on the remaining votes.

  32. All-But-One and Given-{Two, Five, Ten} protocols (diagram: each row shows the active user Ia's votes, with e marking votes given as evidence and Q marking votes to be predicted; All-But-One withholds a single vote, while Given-k supplies only k votes as evidence).

  33. Evaluation Metrics: Average Absolute Deviation and Ranked Scoring.
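The average absolute deviation metric is simply the mean of |predicted vote − actual vote| over the predicted votes. A minimal sketch with illustrative numbers:

```python
# Average absolute deviation: mean of |predicted - actual| over the
# predicted votes. The vote values below are illustrative.

def avg_abs_deviation(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

mad = avg_abs_deviation([4.2, 3.0, 1.5], [5, 3, 2])
```

Lower values are better; on a 1–5 vote scale a deviation below 1 means predictions are, on average, within one vote step of the truth.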

  34. MovieLens Results (experiments were run 5 times and averaged).

  35. MS Web

  36. Computational Issues. Prediction time: 10 minutes per experiment (memory-based) vs. 2 minutes (model-based). Learning time: 20 minutes per iteration. Here n is the number of data points, m the number of titles, w the number of votes per title, and |C| the number of personality types.

  37. Demo of SamIam. Building networks: nodes, edges, CPTs. Inference: posterior marginals, MPE, MAP. Learning: EM. Sensitivity engine.
