
Parallelized variational EM for Latent Dirichlet Allocation: An experimental evaluation of speed and scalability



Presentation Transcript


  1. Parallelized variational EM for Latent Dirichlet Allocation: An experimental evaluation of speed and scalability
     Ramesh Nallapati, William Cohen and John Lafferty
     Machine Learning Department, Carnegie Mellon University

  2. Latent Dirichlet Allocation (LDA)
     • A directed graphical model for topic mining from large-scale document collections
     • A completely unsupervised technique
     • Extracts semantically coherent multinomial distributions over the vocabulary, called topics
     • Represents documents in a lower-dimensional topic space

  3. LDA: generative process
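The slide itself shows only the LDA plate diagram. As a reminder of what that diagram encodes, the standard generative process from Blei et al. (2003) is sketched below in the usual notation (α: Dirichlet prior, β1:K: topic multinomials, θd: per-document topic proportions, zdn: per-word topic assignments).

```latex
% Standard LDA generative process (sketch; the original slide is a figure)
\begin{enumerate}
  \item For each document $d = 1,\dots,D$: draw topic proportions $\theta_d \sim \mathrm{Dirichlet}(\alpha)$.
  \item For each word position $n = 1,\dots,N_d$ in document $d$:
  \begin{enumerate}
    \item draw a topic assignment $z_{dn} \sim \mathrm{Multinomial}(\theta_d)$;
    \item draw the observed word $w_{dn} \sim \mathrm{Multinomial}(\beta_{z_{dn}})$.
  \end{enumerate}
\end{enumerate}
```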

  4. LDA: topics

  5. LDA: inference
     • Exact inference is intractable
     • Several approximate inference techniques are available
       • Stochastic techniques
         • MCMC sampling
       • Numerical techniques
         • Loopy belief propagation
         • Variational inference
         • Expectation propagation

  6. LDA: variational inference
     • The true (intractable) posterior over the latent variables is approximated by a fully factored variational posterior
     • This gives a lower bound on the true data log-likelihood (sketched below)
     • The difference is the KL-divergence between the variational posterior and the true posterior
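The bound appears on the original slide only as an image; in the standard notation of Blei et al. (2003), with variational parameters γ and φ, it reads roughly as follows.

```latex
% Variational lower bound for LDA (standard form; the slide's own equation is an image)
\log p(w \mid \alpha, \beta)
  \;\ge\; \mathbb{E}_q\!\left[\log p(\theta, z, w \mid \alpha, \beta)\right]
        - \mathbb{E}_q\!\left[\log q(\theta, z)\right]
  \;=\; \mathcal{L}(\gamma, \phi; \alpha, \beta)

\log p(w \mid \alpha, \beta) \;-\; \mathcal{L}(\gamma, \phi; \alpha, \beta)
  \;=\; \mathrm{KL}\!\left( q(\theta, z \mid \gamma, \phi) \,\big\|\, p(\theta, z \mid w, \alpha, \beta) \right)
```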

  7. LDA: variational inference
     • E-step:
     • M-step:
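The update equations on this slide are also images. The standard coordinate-ascent updates for LDA's variational EM (Blei et al., 2003), in the usual γ/φ notation with Ψ the digamma function, are approximately:

```latex
% Standard variational EM updates for LDA (reconstruction in standard notation)
\text{E-step, for each document } d:\quad
  \phi_{dnk} \;\propto\; \beta_{k,w_{dn}} \exp\!\big(\Psi(\gamma_{dk})\big),
  \qquad
  \gamma_{dk} \;=\; \alpha_k + \sum_{n=1}^{N_d} \phi_{dnk}

\text{M-step, over the whole corpus:}\quad
  \beta_{kv} \;\propto\; \sum_{d=1}^{D} \sum_{n=1}^{N_d} \phi_{dnk}\, \mathbf{1}\!\left[w_{dn} = v\right]
```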

  8. LDA: variational inference
     • The main bottleneck is the E-step
     • Key insight: the variational parameters γd and φdnk can be computed independently for each document
       • The E-step can therefore be parallelized
     • Two implementations:
       • Multi-processor architecture with shared memory
       • Distributed architecture with shared disk

  9. Parallel implementation
     • Hardware:
       • Linux machine with 4 CPUs
       • Each CPU an Intel Xeon 2.4GHz processor
       • Shared 4GB RAM
       • 512KB cache
     • Software:
       • David Blei's LDA implementation in C
       • pthreads used to parallelize the code
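As a rough illustration of this design (not the authors' actual code), the sketch below splits the per-document E-step loop across pthreads; the document and model structs, doc_e_step(), and the per-thread statistics layout are hypothetical stand-ins for the corresponding pieces of Blei's lda-c.

```c
/* Sketch: parallelizing the LDA E-step over documents with pthreads.
 * doc_e_step(), the struct fields and the statistics layout are hypothetical
 * stand-ins for the corresponding parts of Blei's lda-c implementation. */
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4
#define NUM_DOCS    1000
#define NUM_TOPICS  50

typedef struct { int id; /* placeholder for words, counts, gamma, phi */ } document;
typedef struct { double dummy; /* placeholder for alpha and log_beta */ } lda_model;

static document docs[NUM_DOCS];
static lda_model model;

/* Per-thread partial sufficient statistics: threads only read the shared
 * model and write to their own buffer, so the E-step needs no locking. */
static double suff_stats[NUM_THREADS][NUM_TOPICS];

typedef struct { int first_doc, last_doc, thread_id; } work_slice;

/* Stub: the real code would run the coordinate-ascent updates for gamma_d
 * and phi_dn of one document and accumulate its topic-word statistics. */
static void doc_e_step(const document *doc, const lda_model *m, double *stats) {
    (void)m;
    stats[doc->id % NUM_TOPICS] += 1.0;
}

static void *e_step_worker(void *arg) {
    const work_slice *slice = (const work_slice *)arg;
    for (int d = slice->first_doc; d < slice->last_doc; d++)
        doc_e_step(&docs[d], &model, suff_stats[slice->thread_id]);
    return NULL;
}

int main(void) {
    pthread_t threads[NUM_THREADS];
    work_slice slices[NUM_THREADS];
    for (int d = 0; d < NUM_DOCS; d++) docs[d].id = d;

    /* Each thread gets a contiguous block of documents; their variational
     * parameters are independent given the current (read-only) model. */
    int per_thread = (NUM_DOCS + NUM_THREADS - 1) / NUM_THREADS;
    for (int t = 0; t < NUM_THREADS; t++) {
        slices[t].first_doc = t * per_thread;
        slices[t].last_doc  = (t + 1) * per_thread < NUM_DOCS ? (t + 1) * per_thread : NUM_DOCS;
        slices[t].thread_id = t;
        pthread_create(&threads[t], NULL, e_step_worker, &slices[t]);
    }
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(threads[t], NULL);

    /* Serial M-step: merge the per-thread statistics before re-estimating beta. */
    double merged[NUM_TOPICS] = {0};
    for (int t = 0; t < NUM_THREADS; t++)
        for (int k = 0; k < NUM_TOPICS; k++)
            merged[k] += suff_stats[t][k];
    printf("merged statistics for topic 0: %g\n", merged[0]);
    return 0;
}
```

Because each thread only reads the shared model and writes to its own statistics buffer, the E-step itself needs no locking; the per-thread buffers are merged serially before the M-step.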

  10. Parallel implementation

  11. Distributed implementation
     • Hardware:
       • Cluster of 96 nodes
       • Each node is a Linux machine
       • Transmeta Efficeon 1.2GHz processors
       • 1GB RAM and 1MB cache
     • Software:
       • David Blei's C code forms the core
       • Perl code to coordinate the worker nodes
       • rsh connections to invoke the worker nodes
       • Communication through disk
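A rough sketch of the shared-disk communication this slide describes, written in C for consistency with the other examples (the real pipeline drove Blei's lda-c through Perl scripts and rsh); the file-name pattern and the flat binary layout of the topic-word counts are invented purely for illustration.

```c
/* Sketch of the shared-disk communication between E-step workers and the
 * coordinator. The file-name pattern and the flat binary layout of the
 * topic-word counts are assumptions made for illustration only; the real
 * pipeline drove Blei's lda-c via Perl scripts and rsh. */
#include <stdio.h>
#include <stdlib.h>

#define NUM_TOPICS 50
#define VOCAB_SIZE 100000
#define TABLE_SIZE ((size_t)NUM_TOPICS * VOCAB_SIZE)

/* Worker side: after finishing the E-step on its document shard, a node
 * dumps its partial topic-word counts to a file on the shared disk. */
static int write_partial_counts(int node_id, const double *counts) {
    char path[64];
    snprintf(path, sizeof(path), "suffstats.node%03d.bin", node_id);
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t written = fwrite(counts, sizeof(double), TABLE_SIZE, f);
    fclose(f);
    return written == TABLE_SIZE ? 0 : -1;
}

/* Coordinator side: sum the per-node counts element-wise; the merged table
 * feeds the serial M-step that re-estimates the topics. The total data read
 * grows with the number of nodes, which is the M-step overhead discussed on
 * slide 20. */
static int merge_partial_counts(int num_nodes, double *merged) {
    double *buf = malloc(sizeof(double) * TABLE_SIZE);
    if (!buf) return -1;
    for (size_t i = 0; i < TABLE_SIZE; i++) merged[i] = 0.0;
    for (int node = 0; node < num_nodes; node++) {
        char path[64];
        snprintf(path, sizeof(path), "suffstats.node%03d.bin", node);
        FILE *f = fopen(path, "rb");
        if (!f || fread(buf, sizeof(double), TABLE_SIZE, f) != TABLE_SIZE) {
            if (f) fclose(f);
            free(buf);
            return -1;
        }
        fclose(f);
        for (size_t i = 0; i < TABLE_SIZE; i++) merged[i] += buf[i];
    }
    free(buf);
    return 0;
}

int main(void) {
    /* Toy demo: one "node" writes an all-zero table and the coordinator merges it. */
    double *counts = calloc(TABLE_SIZE, sizeof(double));
    double *merged = malloc(sizeof(double) * TABLE_SIZE);
    if (!counts || !merged) return 1;
    if (write_partial_counts(0, counts) != 0 || merge_partial_counts(1, merged) != 0) return 1;
    printf("merged count for topic 0, word 0: %g\n", merged[0]);
    free(counts);
    free(merged);
    return 0;
}
```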

  12. Distributed implementation

  13. Data
     • A subset of PubMed consisting of 300K documents
     • Collection indexed using Lemur
     • Stopwords removed; remaining words stemmed
     • Vocabulary size: ≈100,000
     • Generated subcollections of various sizes
       • Vocabulary size remains the same across subcollections

  14. Experiments
     • Studied runtime as a function of
       • the number of threads/nodes
       • collection size
     • Fixed the number of topics at 50
     • Multiprocessor setting: varied the number of CPUs from 1 to 4
     • Distributed setting: varied the number of nodes from 1 to 90
     • LDA initialization on a collection:
       • a randomly initialized LDA model is run for 1 EM iteration
       • the resulting model is used as the starting point in all experiments
     • Reported the average runtime per EM iteration

  15. Results: Multiprocessor

  16. Results: Multiprocessor case (50,000 documents)

  17. Discussion
     • The plot shows the E-step is the main bottleneck
     • The speedup is not linear!
       • A speedup of only 1.85 from 1 to 4 CPUs (50,000 docs)
     • Possibly contention between threads read-accessing the model in main memory
       • Create a copy of the model in memory for each thread?
       • That would result in huge memory requirements
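For context (this is not on the slide), speedup and parallel efficiency are the usual ratios, so the reported 1.85x on 4 CPUs corresponds to an efficiency of roughly 46%:

```latex
% Speedup and efficiency in the usual sense; the 1.85 figure is taken from the slide
S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p},
\qquad S(4) \approx 1.85 \;\Rightarrow\; E(4) \approx \tfrac{1.85}{4} \approx 0.46
```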

  18. Results: Distributed

  19. Results: Distributed case (50,000 documents)

  20. Discussion
     • Sub-linear speedups
       • Speedup of 14.5 from 1 to 50 nodes (50,000 docs)
     • Speedup tapers off after an optimum number of nodes
       • Contention in disk reads
       • M-step: larger input file size with more nodes
     • The optimum number of nodes increases with collection size
       • Scaling up the cluster is desirable for larger collections

  21. Conclusions
     • The distributed version seems more desirable
       • Scalable
       • Cheaper
     • Future work: further improvements
       • Communication using RPCs
       • Load only the sparse part of the model corresponding to the sparse document-term matrix during the E-step
       • Load one document at a time in the E-step
       • Cluster documents before splitting the data between nodes
