
Part-of-Speech Tagging



  1. Part-of-Speech Tagging Foundations of Statistical NLP, Chapter 10

  2. Contents • Markov Model Taggers • Hidden Markov Model Taggers • Transformation-Based Learning of Tags • Tagging Accuracy and Uses of Taggers

  3. Markov Model Taggers • Markov properties • Limited horizon • Time invariance cf. Wh-extraction (Chomsky), a long-distance dependency that the limited-horizon assumption cannot capture: a. Should Peter buy a book? b. Which book should Peter buy?
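
Written out as equations (standard definitions, with X_i denoting the tag at position i):

\[ P(X_{i+1} = t^k \mid X_1, \ldots, X_i) = P(X_{i+1} = t^k \mid X_i) \quad \text{(limited horizon)} \]
\[ P(X_{i+1} = t^k \mid X_i) = P(X_2 = t^k \mid X_1) \quad \text{(time invariance)} \]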

  4. Markov Model Taggers • The probabilistic model • Finding the best tagging t1,n for a sentence w1,n e.g. P(AT NN BEZ IN AT VB | The bear is on the move)

  5. Assumptions • words are independent of each other • a word’s identity depends only on its tag
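
Combining the model of slide 4 with these two assumptions gives the standard decomposition (Bayes' rule plus the independence assumptions):

\[ \hat{t}_{1,n} = \arg\max_{t_{1,n}} P(t_{1,n} \mid w_{1,n}) = \arg\max_{t_{1,n}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1}) \]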

  6. Markov Model Taggers • Training (maximum likelihood estimates from a tagged corpus):
  for all tags t^j do
    for all tags t^k do
      P(t^k | t^j) = C(t^j, t^k) / C(t^j)
    end
  end
  for all tags t^j do
    for all words w^l do
      P(w^l | t^j) = C(w^l : t^j) / C(t^j)
    end
  end
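
A minimal Python sketch of these estimates, assuming sentences given as lists of (word, tag) pairs; the function name, the '<s>' start marker, and the data layout are illustrative, not from the chapter:

```python
from collections import Counter

def train_mle(tagged_sents):
    """MLE training for a bigram tagger: count tag transitions and
    word emissions, then normalize (no smoothing in this sketch)."""
    trans, emit = Counter(), Counter()
    prev_count, tag_count = Counter(), Counter()
    for sent in tagged_sents:                    # sent = [(word, tag), ...]
        tags = ['<s>'] + [t for _, t in sent]    # '<s>' marks sentence start
        for prev, cur in zip(tags, tags[1:]):
            trans[(prev, cur)] += 1
            prev_count[prev] += 1
        for word, tag in sent:
            emit[(tag, word)] += 1
            tag_count[tag] += 1
    p_trans = {(p, c): n / prev_count[p] for (p, c), n in trans.items()}
    p_emit = {(t, w): n / tag_count[t] for (t, w), n in emit.items()}
    return p_trans, p_emit
```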

  7. Markov Model Taggers • Tagging (the Viterbi algorithm)
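
A compact log-space Viterbi sketch in Python; the matrix layout (A[j, k] = P(t^k | t^j), B[j, l] = P(w^l | t^j)) and the function signature are assumptions of this sketch:

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Most likely tag sequence for a sentence given as word indices `obs`.
    A[j, k] = P(t^k | t^j), B[j, l] = P(w^l | t^j), pi[j] = P(t^j starts).
    Works in log space to avoid underflow on long sentences."""
    n, K = len(obs), A.shape[0]
    logA, logB = np.log(A), np.log(B)
    delta = np.log(pi) + logB[:, obs[0]]   # best score ending in each tag
    psi = np.zeros((n, K), dtype=int)      # backpointers
    for i in range(1, n):
        scores = delta[:, None] + logA     # K x K: previous tag -> next tag
        psi[i] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[:, obs[i]]
    tags = [int(delta.argmax())]
    for i in range(n - 1, 0, -1):          # follow backpointers
        tags.append(int(psi[i][tags[-1]]))
    return tags[::-1]
```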

  8. Variations • The models for unknown words 1. assuming that an unknown word can be any part of speech 2. using morphological information (e.g. capitalization and word endings) to make inferences about its possible parts of speech

  9. Variations • A feature-based unknown-word model: P(w^l | t^j) = (1/Z) · P(unknown word | t^j) · P(capitalized | t^j) · P(ending | t^j) Z: normalization constant
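
A sketch of how such a feature-based estimate might be computed; the table names (p_unk, p_cap, p_suffix) and the two-character ending feature are illustrative assumptions:

```python
def p_word_given_tag(word, tag, p_unk, p_cap, p_suffix, Z):
    """Unknown-word lexical probability from independent features:
    P(unknown | tag) * P(capitalized | tag) * P(ending | tag), scaled by 1/Z.
    All probability tables are assumed to be estimated from training data;
    p_cap[tag] maps True/False (capitalized or not) to a probability."""
    capitalized = word[0].isupper()
    ending = word[-2:]                    # e.g. a two-character ending feature
    return (p_unk[tag] * p_cap[tag][capitalized]
            * p_suffix[tag].get(ending, 0.0)) / Z
```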

  10. Variations • Trigram taggers • Interpolation • Variable Memory Markov Model (VMMM)
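
Linear interpolation combines the three orders of tag model in the usual way (standard formulation; the λ weights sum to one and can be estimated, e.g., by deleted interpolation):

\[ P(t_i \mid t_{i-1}, t_{i-2}) = \lambda_1 P_1(t_i) + \lambda_2 P_2(t_i \mid t_{i-1}) + \lambda_3 P_3(t_i \mid t_{i-1}, t_{i-2}) \]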

  11. Variations • Smoothing, e.g. adding one to the lexical counts: P(t^j | w^l) = (C(t^j, w^l) + 1) / (C(w^l) + K^l) K^l: the number of possible parts of speech of w^l • Reversibility: the model assigns the same probabilities whether the sequence is decoded left-to-right or right-to-left

  12. Variations • Sequence vs. tag by tag: Time flies like an arrow. a. NN VBZ RB AT NN. P(.) = 0.01 b. NN NNS VB AT NN. P(.) = 0.01 • in practice there is no large difference in accuracy between maximizing the probability of the whole sequence and maximizing each tag individually
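
The two decoding criteria, written out (standard definitions):

\[ \hat{t}_{1,n} = \arg\max_{t_{1,n}} P(t_{1,n} \mid w_{1,n}) \qquad \text{vs.} \qquad \hat{t}_i = \arg\max_{t^j} P(X_i = t^j \mid w_{1,n}) \]

The first maximizes the joint probability of the whole sequence (Viterbi); the second maximizes each tag's marginal independently, and can in principle output a sequence whose joint probability is zero.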

  13. Hidden Markov Model Taggers • When we have no tagged training data • initializing all parameters with dictionary information • Jelinek’s method • Kupiec’s method

  14. Hidden Markov Model Taggers • Jelinek’s method • initializing the HMM with MLEs for P(w^k | t^i), assuming that a word occurs equally likely with each of its possible tags: b*_{j,l} = 1/T(w^l) if t^j is allowed for w^l, and 0 otherwise; b_{j,l} = b*_{j,l} C(w^l) / Σ_m b*_{j,m} C(w^m) T(w^l): the number of tags allowed for w^l

  15. Hidden Markov Model Taggers • Kupiec’s method • grouping all words with the same set of possible parts of speech into ‘metawords’ u_L • so that parameters need not be fine-tuned for each individual word
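
A minimal sketch of the grouping step, assuming a dictionary mapping each word to its set of admissible tags (all names are illustrative):

```python
from collections import defaultdict

def build_metawords(lexicon):
    """Group words by their exact set of admissible tags (Kupiec-style
    metawords): all words in a group share one set of emission parameters."""
    groups = defaultdict(list)
    for word, tags in lexicon.items():    # lexicon: word -> set of tags
        groups[frozenset(tags)].append(word)
    return groups

lexicon = {'can': {'MD', 'NN', 'VB'}, 'run': {'NN', 'VB'}, 'walk': {'NN', 'VB'}}
print(build_metawords(lexicon))  # 'run' and 'walk' fall into the same metaword
```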

  16. Hidden Markov Model Taggers • Training • after initialization, the HMM is trained using the Forward-Backward algorithm • Tagging • identical to VMM tagging (the Viterbi algorithm): the difference between VMM tagging and HMM tagging is in how we train the model, not in how we tag.
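
One Forward-Backward (EM) re-estimation step for a single sentence, as a numpy sketch using the same matrix layout as the Viterbi sketch above; it omits the scaling needed for long sequences and the summation of expected counts over a whole corpus:

```python
import numpy as np

def forward_backward_step(A, B, pi, obs):
    """One EM re-estimation step for an HMM on one observation sequence.
    Returns updated (A, B, pi); real training sums expected counts over
    all sentences and iterates until convergence."""
    n, K = len(obs), A.shape[0]
    alpha = np.zeros((n, K)); beta = np.ones((n, K))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, n):                          # forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(n - 2, -1, -1):                 # backward pass
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    gamma = alpha * beta                           # state posteriors
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = np.zeros((K, K))                          # expected transition counts
    for t in range(n - 1):
        x = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
        xi += x / x.sum()
    A_new = xi / xi.sum(axis=1, keepdims=True)
    B_new = np.zeros_like(B)
    for t in range(n):
        B_new[:, obs[t]] += gamma[t]
    B_new /= B_new.sum(axis=1, keepdims=True)
    return A_new, B_new, gamma[0]
```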

  17. Hidden Markov Model Taggers • The effect of initialization on HMM training (the overtraining problem)
  Lexical probabilities:
    D0: maximum likelihood estimates from a tagged training corpus
    D1: correct ordering only of lexical probabilities
    D2: lexical probabilities proportional to overall tag probabilities
    D3: equal lexical probabilities for all tags admissible for a word
  Transition probabilities:
    T0: maximum likelihood estimates from a tagged training corpus
    T1: equal probabilities for all transitions

  18. Which training method to use:
  • Use the Visible Markov Model (MLE training) when there is a sufficiently large training text similar to the intended text of application
  • Run Forward-Backward for a few iterations when there is no suitable training text, or training and test text are very different, but at least some lexical information is available
  • Run Forward-Backward for a larger number of iterations when there is no lexical information
