
Modular and hierarchical learning systems


Presentation Transcript


  1. Modular and hierarchical learning systems Michael I. Jordan and Robert A. Jacobs Presented by Danke Xie, Cognitive Science, UCSD. CSE 291s (Lawrence Saul), 4/26/2007

  2. Outline • Decision Tree • Mixture of Experts Architecture • The Mixture of Experts Model • Learning algorithm • Hierarchical Mixture of Experts architecture • Demo

  3. Introduction • Why modular and hierarchical systems? • Divide a complex problem into less complex subproblems • Ex: supervised learning, where the mapping from input x to output y is split across simpler sub-functions f(x) and g(x) [figure: block diagram of x feeding f(x) and g(x) to produce y]

  4. Decision Tree • Classification problem: map input x to a label y ∈ {0,1} • Decision tree [figure: tree whose root tests x5 > 3, whose second-level nodes test x2 < 4 and x6 > 7, and whose leaves output 0 or 1]

  5. Decision Tree • What's missing? • Living in a 10,000-dimensional space? • Learning is greedy rather than likelihood-optimizing • Soft decisions / soft assignment of tasks to experts [figure: example with 4 classes in a high-dimensional space]

  6. Mixture of Experts (ME) architecture • Gating network: generates the mixing weights g_i(x) = exp(v_i^T x) / Σ_j exp(v_j^T x) • Expert networks: each produces an output μ_i(x) • Interpreted probabilistically as P(y | x) = Σ_i g_i(x) P(y | x, θ_i)
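
A minimal sketch of this forward pass (not from the slides; the linear experts, softmax gating, and all variable names are my own assumptions):

    import numpy as np

    def moe_predict(x, V, Thetas):
        # x: input (d,); V: gating weights (K, d); Thetas: expert weights (K, m, d)
        xi = V @ x                                       # gating activations v_i^T x
        g = np.exp(xi - xi.max()); g /= g.sum()          # softmax mixing weights g_i
        mus = np.array([Theta @ x for Theta in Thetas])  # expert outputs mu_i = Theta_i x
        return g @ mus, g, mus                           # mixture output sum_i g_i mu_i

The gating weights g and expert outputs mus are returned as well because the learning rules later in the talk reuse them.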

  7. Generating data • Data set {(x^(t), y^(t))} • Given x, randomly choose an expert label i with probability g_i(x, v_i^0), where v_i^0 is a parameter of the data-generating model • Generate y according to P(y | x, θ_i^0) • Learn to estimate v_i and θ_i from the data

  8. A Gradient-based Learning algorithm • Maximize the log-likelihood l = Σ_t ln Σ_i g_i^(t) P(y^(t) | x^(t), θ_i) • Optimize with respect to θ_i and v_i, using the posterior h_i^(t) = g_i^(t) P(y^(t) | x^(t), θ_i) / Σ_j g_j^(t) P(y^(t) | x^(t), θ_j)
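
A small sketch of the two quantities the gradient steps need, assuming unit-variance Gaussian experts (my assumption; additive constants dropped):

    import numpy as np

    def responsibilities_and_loglik(g, mus, y):
        # g: mixing weights (K,); mus: expert outputs (K, m); y: target (m,)
        logp = -0.5 * np.sum((y - mus) ** 2, axis=1)   # log N(y; mu_i, I) up to a constant
        w = g * np.exp(logp - logp.max())              # unnormalised g_i * P_i
        h = w / w.sum()                                # posterior responsibilities h_i
        loglik = np.log(w.sum()) + logp.max()          # ln sum_i g_i P_i (same constant dropped)
        return h, loglik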

  9. Analogy to Mixture of Gaussians • The learning algorithm can also be derived using the EM algorithm • EM finds maximum-likelihood parameter estimates in problems where the likelihood is hard to optimize directly because we do not know how to assign data points to clusters / experts • The unknown assignments can be seen as latent variables, and EM works with their posterior probabilities • This view covers mixtures of Gaussians as well as (hierarchical) mixtures of experts

  10. EM algorithm • Mixture of Gaussians (unsupervised)
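
The update equations on this slide are images in the original; below is a minimal sketch of EM for a mixture of unit-variance spherical Gaussians (the simplification and all names are my own):

    import numpy as np

    def em_mog(X, K, iters=50):
        # X: data (n, d); K: number of components
        n, d = X.shape
        mu = X[np.random.choice(n, K, replace=False)]   # initialise means from data points
        pi = np.full(K, 1.0 / K)                        # mixing proportions
        for _ in range(iters):
            # E-step: responsibilities h[t, i] proportional to pi_i * N(x_t; mu_i, I)
            d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
            h = pi * np.exp(-0.5 * (d2 - d2.min(axis=1, keepdims=True)))  # shift for stability
            h /= h.sum(axis=1, keepdims=True)
            # M-step: re-estimate means and mixing proportions
            mu = (h.T @ X) / h.sum(axis=0)[:, None]
            pi = h.mean(axis=0)
        return pi, mu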

  11. EM algorithm • Mixture of Experts (supervised)
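
For the supervised case, a sketch of one (generalised) EM step with linear experts and unit-variance Gaussian noise; the closed-form weighted-least-squares M-step for the experts and the single gradient step for the gating network are my own simplifications:

    import numpy as np

    def em_moe_step(X, Y, V, Thetas, lr=0.1):
        # X: inputs (n, d); Y: targets (n, m); V: gating weights (K, d); Thetas: experts (K, m, d)
        A = X @ V.T
        G = np.exp(A - A.max(axis=1, keepdims=True))
        G /= G.sum(axis=1, keepdims=True)                        # gating probabilities g_i(x_t)
        Mus = np.einsum('imd,td->tim', Thetas, X)                # expert outputs mu_i(x_t)
        logp = -0.5 * ((Y[:, None, :] - Mus) ** 2).sum(-1)       # Gaussian log-densities
        H = G * np.exp(logp - logp.max(axis=1, keepdims=True))   # E-step: responsibilities h[t, i]
        H /= H.sum(axis=1, keepdims=True)
        for i in range(len(Thetas)):                             # M-step: weighted least squares per expert
            W = H[:, i][:, None]
            Thetas[i] = np.linalg.solve((X * W).T @ X, (X * W).T @ Y).T
        V += lr * (H - G).T @ X                                  # partial M-step for the gating network
        return V, Thetas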

  12. Hierarchical Mixture of Experts
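
The original slide is a diagram of a two-level tree of gating and expert networks; a minimal sketch of the corresponding forward pass (depth, layout, and names are my own choices):

    import numpy as np

    def softmax(a):
        e = np.exp(a - a.max())
        return e / e.sum()

    def hme_predict(x, V_top, V_nested, Thetas):
        # V_top: (I, d) top-level gating; V_nested: (I, J, d) nested gating; Thetas: (I, J, m, d) experts
        g = softmax(V_top @ x)                               # top-level gates g_i
        y = 0.0
        for i in range(len(V_top)):
            gj = softmax(V_nested[i] @ x)                    # nested gates g_{j|i}
            mus = np.einsum('jmd,d->jm', Thetas[i], x)       # expert outputs mu_{ij}
            y = y + g[i] * (gj @ mus)                        # sum_i g_i sum_j g_{j|i} mu_{ij}
        return y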

  13. Training set

  14. Classification of test set

  15. Thank you

  16. A Gradient-based Learning algorithm • Maximize log-likelihood • We derive learning rules for the special case • Expert networks and gating networks are linear • Simple probabilistic density for Expert Networks
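
The formulas on this slide are images in the original; a standard reconstruction of the linear, unit-variance Gaussian special case would be:

    μ_i = Θ_i x,   g_i = exp(v_i^T x) / Σ_j exp(v_j^T x)
    P(y | x, θ_i) = (2π)^(−m/2) exp( −||y − μ_i||² / 2 )
    l(θ; D) = Σ_t ln Σ_i g_i^(t) P(y^(t) | x^(t), θ_i)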

  17. A Gradient-based Learning algorithm • Take the derivative of l with respect to θ_i
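
The derivation itself is an image on the slide; the standard result it reaches (reconstructed, not copied) is:

    ∂l/∂θ_i = Σ_t h_i^(t) ∂/∂θ_i ln P(y^(t) | x^(t), θ_i),
    where h_i^(t) = g_i^(t) P(y^(t) | x^(t), θ_i) / Σ_j g_j^(t) P(y^(t) | x^(t), θ_j).
    For the unit-variance Gaussian linear expert this gives ∂l/∂Θ_i = Σ_t h_i^(t) (y^(t) − μ_i^(t)) x^(t)T.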

  18. Learning rule for ME • Experts are linear models
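
The update equations are images on the slide; the usual LMS-like gradient-ascent updates for this case (reconstructed, with ρ a learning rate) are:

    ΔΘ_i = ρ Σ_t h_i^(t) (y^(t) − μ_i^(t)) x^(t)T      (expert networks)
    Δv_i = ρ Σ_t (h_i^(t) − g_i^(t)) x^(t)              (gating network)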

  19. Learning rule for HME • LMS-like learning algorithm
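
For the two-level HME, with h_i the posterior at the top gating node and h_{j|i} the posterior at the nested node, the corresponding reconstructed updates would be:

    ΔΘ_{ij} = ρ Σ_t h_i^(t) h_{j|i}^(t) (y^(t) − μ_{ij}^(t)) x^(t)T
    Δv_i    = ρ Σ_t (h_i^(t) − g_i^(t)) x^(t)
    Δv_{ij} = ρ Σ_t h_i^(t) (h_{j|i}^(t) − g_{j|i}^(t)) x^(t)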
