
Style & Topic Language Model Adaptation Using HMM-LDA



  1. Style & Topic Language Model Adaptation Using HMM-LDA Bo-June (Paul) Hsu, James Glass

  2. Outline • Introduction • LDA • HMM-LDA • Experiments • Conclusions

  3. Introduction • An effective LM needs to not only account for the casual speaking style of lectures but also accommodate the topic-specific vocabulary of the subject matter • Available training corpora rarely match the target lecture in both style and topic • In this paper, syntactic state and semantic topic assignments are investigated using the HMM-LDA model

  4. LDA • A generative probabilistic model of a corpus • The topic mixture is drawn from a conjugate Dirichlet prior • PLSA • LDA • Model parameters
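The PLSA and LDA formulas on this slide were rendered as images; for reference, the standard forms (following Hofmann 1999 and Blei et al. 2003; the symbols below are the usual ones, not taken from the slide) are:

```latex
% PLSA: each word in document d is drawn from a document-specific topic mixture
p(w_i \mid d) = \sum_{z} p(w_i \mid z)\, p(z \mid d)

% LDA: the topic mixture \theta is itself drawn from a Dirichlet prior,
% giving the marginal likelihood of a document w = (w_1, \dots, w_N)
p(\mathbf{w} \mid \alpha, \beta) =
  \int p(\theta \mid \alpha) \prod_{i=1}^{N} \sum_{z_i} p(z_i \mid \theta)\, p(w_i \mid z_i, \beta)\, d\theta
```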

  5. Markov chain Monte Carlo • A class of algorithms for sampling from probability distributions, based on constructing a Markov chain that has the desired distribution as its stationary distribution • The most common application of these algorithms is numerically calculating multi-dimensional integrals • An ensemble of "walkers" moves around randomly • The Markov chain is constructed in such a way as to have the integrand as its equilibrium distribution
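To make the "walker" picture concrete, here is a minimal, hypothetical random-walk Metropolis sketch that estimates an expectation under an unnormalized density; the target and all names are illustrative, not from the paper.

```python
import math
import random

def unnormalized_target(x):
    # Example target: an unnormalized standard-normal density.
    return math.exp(-0.5 * x * x)

def metropolis_expectation(f, n_steps=100_000, step=1.0):
    """Estimate E[f(X)] under the target using a random-walk Metropolis chain."""
    x = 0.0  # current position of the "walker"
    total = 0.0
    for _ in range(n_steps):
        proposal = x + random.uniform(-step, step)
        # Accept with probability min(1, target(proposal) / target(current)).
        if random.random() < unnormalized_target(proposal) / unnormalized_target(x):
            x = proposal
        total += f(x)
    return total / n_steps

if __name__ == "__main__":
    # E[X^2] under a standard normal is 1; the running average should approach it.
    print(metropolis_expectation(lambda x: x * x))
```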

  6. LDA • Estimate the posterior distribution over topic assignments • Integrate out the model parameters • Use Gibbs sampling
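The update equation shown on this slide was an image; the standard collapsed Gibbs update for LDA after integrating out the parameters (Griffiths & Steyvers, 2004) has the form

```latex
P(z_i = j \mid \mathbf{z}_{-i}, \mathbf{w}) \;\propto\;
  \frac{n^{(w_i)}_{-i,j} + \beta}{n^{(\cdot)}_{-i,j} + W\beta}
  \cdot
  \frac{n^{(d_i)}_{-i,j} + \alpha}{n^{(d_i)}_{-i,\cdot} + T\alpha}
```

where n^(w_i)_{-i,j} counts how often word w_i is assigned to topic j excluding position i, W is the vocabulary size, and T is the number of topics.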

  7. Markov chain Monte Carlo (cont.) • Gibbs Sampling http://en.wikipedia.org/wiki/Gibbs_sampling
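In the spirit of the linked Wikipedia example, a minimal Gibbs sampler for a correlated bivariate Gaussian is sketched below; each variable is resampled from its full conditional given the other (illustrative only, not the paper's sampler).

```python
import math
import random

def gibbs_bivariate_normal(rho=0.8, n_samples=10_000):
    """Gibbs sampling for a bivariate normal with correlation rho.

    Each full conditional is itself normal:
      x | y ~ N(rho * y, 1 - rho^2),  y | x ~ N(rho * x, 1 - rho^2).
    """
    x, y = 0.0, 0.0
    sd = math.sqrt(1.0 - rho * rho)
    samples = []
    for _ in range(n_samples):
        x = random.gauss(rho * y, sd)  # resample x from p(x | y)
        y = random.gauss(rho * x, sd)  # resample y from p(y | x)
        samples.append((x, y))
    return samples

if __name__ == "__main__":
    draws = gibbs_bivariate_normal()
    mean_x = sum(x for x, _ in draws) / len(draws)
    print(f"empirical mean of x: {mean_x:.3f}")  # should be close to 0
```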

  8. HMM+LDA • HMMs generate documents purely based on syntactic relations among unobserved word classes • Short-range dependencies • Topic models generate documents based on semantic correlations between words, independent of word order • Long-range dependencies • A major advantage of generative models is modularity • Different models are easily combined • Words can be generated by a mixture of models or a product of models • Only a subset of words, the content words, exhibit long-range dependencies • Replace one of the probability distributions over words in the syntactic model with the semantic (topic) model

  9. HMM+LDA (cont.) • Notation: • w = (w1, …, wn): a sequence of words • z = (z1, …, zn): a sequence of topic assignments • c = (c1, …, cn): a sequence of classes • Class ci = 1 denotes the semantic class • The z-th topic is associated with a distribution φ(z) over words • Each class c ≠ 1 is associated with a distribution φ(c) over words • Each document d has a distribution θ(d) over topics • Transitions between classes ci−1 and ci follow a distribution π(ci−1)

  10. HMM+LDA (cont.) • A document d is generated as follows: • Sample θ(d) from a Dirichlet(α) prior • For each word wi in document d • Draw zi from θ(d) • Draw ci from π(ci−1) • If ci = 1, then draw wi from φ(zi), else draw wi from φ(ci)
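A minimal sketch of this generative process, assuming the standard HMM-LDA parameterization; the array shapes and names below are illustrative, not taken from the paper.

```python
import numpy as np

def generate_document(n_words, theta, pi, phi_topic, phi_class,
                      rng=np.random.default_rng()):
    """Sample one document from an HMM-LDA-style model.

    theta:     (T,)   document-specific topic mixture
    pi:        (C, C) class transition matrix (rows sum to 1)
    phi_topic: (T, V) per-topic word distributions (used by the semantic class)
    phi_class: (C, V) per-class word distributions for the syntactic classes
    """
    words, c = [], 0  # start in class 0, taken here as the semantic class
    for _ in range(n_words):
        z = rng.choice(len(theta), p=theta)    # draw topic z_i from theta
        c = rng.choice(pi.shape[0], p=pi[c])   # draw class c_i from pi[c_{i-1}]
        if c == 0:                             # semantic class: emit from topic z_i
            w = rng.choice(phi_topic.shape[1], p=phi_topic[z])
        else:                                  # syntactic class: emit from class c_i
            w = rng.choice(phi_class.shape[1], p=phi_class[c])
        words.append(int(w))
    return words
```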

  11. HMM+LDA (cont.) • Inference • The topic word distributions φ(z) are drawn from Dirichlet(β) • The document topic mixtures θ(d) are drawn from Dirichlet(α) • The rows of the transition matrix π are drawn from Dirichlet(γ) • The class word distributions φ(c) are drawn from Dirichlet(δ) • All Dirichlet distributions are assumed symmetric

  12. HMM+LDA (cont.) • Gibbs Sampling
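The sampling equations on this slide were images; schematically, the collapsed Gibbs sampler alternates between topic and class assignments roughly as follows (following Griffiths et al., "Integrating Topics and Syntax"; the correction terms needed when neighboring positions share counts are omitted here):

```latex
P(z_i = z \mid \mathbf{z}_{-i}, \mathbf{c}, \mathbf{w}) \;\propto\;
\begin{cases}
  \dfrac{n^{(d_i)}_{-i,z} + \alpha}{n^{(d_i)}_{-i,\cdot} + T\alpha}
  \cdot
  \dfrac{n^{(z)}_{-i,w_i} + \beta}{n^{(z)}_{-i,\cdot} + W\beta}
    & \text{if } c_i = 1 \\[2ex]
  \dfrac{n^{(d_i)}_{-i,z} + \alpha}{n^{(d_i)}_{-i,\cdot} + T\alpha}
    & \text{if } c_i \neq 1
\end{cases}

P(c_i = c \mid \mathbf{c}_{-i}, \mathbf{z}, \mathbf{w}) \;\propto\;
  P(w_i \mid c, z_i)
  \cdot
  \frac{n^{(c_{i-1})}_{-i,c} + \gamma}{n^{(c_{i-1})}_{-i,\cdot} + C\gamma}
  \cdot
  \frac{n^{(c)}_{-i,c_{i+1}} + \gamma}{n^{(c)}_{-i,\cdot} + C\gamma}
```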

  13. HMM-LDA Analysis • Lectures corpus • 3 undergraduate subjects in math, physics, and computer science • 10 CS lectures for the development set, 10 CS lectures for the test set • Textbook corpus • CS course textbook • Divided into 271 topic-cohesive documents at every section heading • Run the Gibbs sampler on the two datasets • Lectures: 2,800 iterations, Textbook: 2,000 iterations • Use the lowest-perplexity model as the final model

  14. HMM-LDA Analysis (cont.) • Semantic topics (Lectures): Magnetism, Machine Learning, Childhood Memories, Linear Algebra • <laugh>: a cursory examination of the data suggests that speakers talking about children tend to laugh more during the lecture • Although it may not be desirable to capture speaker idiosyncrasies in the topic mixtures, HMM-LDA has clearly demonstrated its ability to capture distinctive semantic topics in a corpus

  15. HMM-LDA Analysis (cont.) • Semantic topics (Textbook) • A topically coherent paragraph • 6 of the 7 instances of the words "and" and "or" (underlined) are correctly classified • Multi-word topic key phrases can be identified for n-gram topic models • This demonstrates the context-dependent labeling ability of the HMM-LDA model

  16. HMM-LDA Analysis (cont.) • Syntactic states (Lectures) • State 20 is the topic state; other states capture prepositions, conjunctions, verbs, and hesitation disfluencies • As demonstrated with spontaneous speech, HMM-LDA yields syntactic states that correspond well to part-of-speech labels, without requiring any labeled training data

  17. Discussions • Although MCMC techniques converge to the global stationary distribution, convergence cannot be guaranteed from observation of the perplexity alone • Unlike EM algorithms, random sampling may temporarily decrease the model likelihood • The number of iterations was chosen to be at least double the point at which the perplexity first appeared to converge

  18. Language Modeling Experiments • Baseline model: Lectures + Textbook interpolated trigram model (using modified Kneser-Ney discounting) • Topic-deemphasized style (trigram) model (Lectures): • Deemphasize the observed occurrences of topic words and ideally redistribute these counts to all potential topic words • The counts of topic-to-style word transitions are not altered
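As a rough sketch of how such count de-emphasis over HMM-LDA-labeled n-grams might look (the scaling scheme and all names here are illustrative assumptions, not the paper's exact method):

```python
from collections import defaultdict

def deemphasize_topic_counts(ngram_counts, topic_words, scale=0.1):
    """Scale down counts of n-grams that end in a topic word.

    ngram_counts: dict mapping (w1, w2, w3) -> count
    topic_words:  set of words labeled as topic (semantic-class) words by HMM-LDA
    scale:        factor applied to topic-word counts; style-word counts are kept
    """
    adjusted = defaultdict(float)
    for ngram, count in ngram_counts.items():
        # De-emphasize observed occurrences of topic words; transitions into
        # style (syntactic) words are left unchanged.
        if ngram[-1] in topic_words:
            adjusted[ngram] += scale * count
        else:
            adjusted[ngram] += count
    return adjusted
```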

  19. Language Modeling Experiments (cont.) • The Textbook model should ideally receive a higher weight in contexts containing topic words • Domain trigram model (Textbook): • Emphasize the sequences containing a topic word in the context by doubling their counts

  20. Language Modeling Experiments (cont.) • Unsmoothed topical trigram model: • Apply HMM-LDA with 100 topics to identify representative words and their associated contexts for each topic • Topic mixtures for all models • Mixture weights were tuned on the individual target lectures (a cheating condition) • 15 of the 100 topics account for over 90% of the total weight
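One plausible way to collect per-topic n-gram counts from HMM-LDA word labels is sketched below; the data layout and names are assumptions for illustration, not the paper's implementation.

```python
from collections import defaultdict

def topic_trigram_counts(labeled_docs, n_topics=100):
    """Collect trigram counts separately for each topic.

    labeled_docs: list of documents, each a list of (word, topic_or_None) pairs,
                  where topic_or_None is the HMM-LDA topic label for semantic
                  words and None for syntactic (style) words.
    """
    counts = [defaultdict(int) for _ in range(n_topics)]
    for doc in labeled_docs:
        words = [w for w, _ in doc]
        for i in range(2, len(doc)):
            word, topic = doc[i]
            if topic is not None:
                trigram = (words[i - 2], words[i - 1], word)
                counts[topic][trigram] += 1  # context of a representative topic word
    return counts
```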

  21. Language Modeling Experiments (cont.) • Since the topic distribution shifts over a long lecture, modeling a lecture with fixed weights may not be optimal • Update the mixture distribution by linearly interpolating it with the posterior topic distribution given the current word
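A minimal sketch of such an adaptive mixture update; the interpolation weight and per-topic unigram probabilities are assumed inputs, and this is not necessarily the paper's exact formulation.

```python
def update_topic_mixture(mixture, word, topic_word_probs, lam=0.9):
    """Linearly interpolate the running topic mixture with the posterior
    topic distribution given the current word.

    mixture:          list of current topic weights (sums to 1)
    topic_word_probs: topic_word_probs[t][word] = p(word | topic t)
    lam:              weight kept on the previous mixture
    """
    # Posterior over topics given the current word: p(t | w) ∝ p(w | t) * p(t).
    joint = [mixture[t] * topic_word_probs[t].get(word, 1e-10)
             for t in range(len(mixture))]
    total = sum(joint)
    posterior = [j / total for j in joint]
    # Interpolate the old mixture with the word-level posterior.
    return [lam * mixture[t] + (1.0 - lam) * posterior[t]
            for t in range(len(mixture))]
```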

  22. Language Modeling Experiments (cont.) • The variation of topic mixtures over a lecture: review of the previous lecture -> an example of computation using accumulators -> focus on streams as a data structure, with an intervening example that finds pairs of i and j that sum to a prime

  23. Language Modeling Experiments (cont.) • Experimental results

  24. Conclusions • HMM-LDA shows great promise for finding structure in unlabeled data, from which we can build more sophisticated models • Speaker-specific adaptation will be investigated in the future
