
Part of Speech (POS) Tagging

Presentation Transcript


1. Part of Speech (POS) Tagging. CSC 9010: Special Topics. Natural Language Processing. Paula Matuszek, Mary-Angela Papalaskari. Spring, 2005

2. Sources (and Resources) • Some slides adapted from • Dorr, www.umiacs.umd.edu/~christof/courses/cmsc723-fall04 • Jurafsky, www.stanford.edu/class/linguist238 • McCoy, www.cis.udel.edu/~mccoy/courses/cisc882.03f • With some additional examples and ideas from • Martin: www.cs.colorado.edu/~martin/csci5832.html • Hearst: www.sims.berkeley.edu/courses/is290-2/f04/resources.html • Litman: www.cs.pitt.edu/~litman/courses/cs2731f03/cs2731.html • Rich: www.cs.utexas.edu/users/ear/cs378NLP • You may find some or all of these resources useful throughout the course.

3. Word Classes and Part-of-Speech Tagging • What is POS tagging? • Why do we need POS? • Word Classes • Rule-based Tagging • Stochastic Tagging • Transformation-Based Tagging • Tagging Unknown Words • Evaluating POS Taggers

4. Parts of Speech • 8 traditional parts of speech (more or less) • Noun, verb, adjective, preposition, adverb, article, pronoun, conjunction • This idea has been around for over 2000 years (Dionysius Thrax of Alexandria, c. 100 B.C.) • Called: parts of speech, lexical categories, word classes, morphological classes, lexical tags, POS • Actual categories vary by language, by reason for tagging, and by who you ask!

5. POS examples • N noun chair, bandwidth, pacing • V verb study, debate, munch • ADJ adjective purple, tall, ridiculous • ADV adverb unfortunately, slowly • P preposition of, by, to • PRO pronoun I, me, mine • DET determiner the, a, that, those

6. Definition of POS Tagging • "The process of assigning a part-of-speech or other lexical class marker to each word in a corpus" (Jurafsky and Martin) • Example (WORDS → TAGS): the/DET girl/N kissed/V the/DET boy/N on/P the/DET cheek/N

7. POS Tagging example (WORD tag) • the DET • koala N • put V • the DET • keys N • on P • the DET • table N

8. What does Tagging do? • Collapses Distinctions • Lexical identity may be discarded • e.g. all personal pronouns tagged with PRP • Introduces Distinctions • Ambiguities may be removed • e.g. deal tagged with NN or VB • e.g. deal tagged with DEAL1 or DEAL2 • Helps classification and prediction (Modified from Diane Litman's version of Steve Bird's notes)

9. Significance of Parts of Speech • A word's POS tells us a lot about the word and its neighbors: • Limits the range of meanings (deal), pronunciation (OBject vs. obJECT), or both (wind) • Helps in stemming • Limits the range of following words for Speech Recognition • Can help select nouns from a document for IR • Basis for partial parsing (chunked parsing) • Parsers can build trees directly on the POS tags instead of maintaining a lexicon (Modified from Diane Litman's version of Steve Bird's notes)

10. Word Classes • What are we trying to classify words into? • Classes based on • Syntactic properties. What can precede/follow. • Morphological properties. What affixes they take. • Not primarily by semantic coherence (Conjunction Junction notwithstanding!) • Broad "grammar" categories are familiar • NLP uses much richer "tagsets"

11. Open and closed class words • Two major categories of classes: • Closed class: a relatively fixed membership • Prepositions: of, in, by, … • Auxiliaries: may, can, will, had, been, … • Pronouns: I, you, she, mine, his, them, … • Usually function words (short common words which play a role in grammar) • Open class: new ones can be created all the time • English has 4: Nouns, Verbs, Adjectives, Adverbs • Many languages have all 4, but not all!

12. Open Class Words • Every known human language has nouns and verbs • Nouns: people, places, things • Classes of nouns • proper vs. common • count vs. mass • Verbs: actions and processes • Adjectives: properties, qualities • Adverbs: hodgepodge! • Unfortunately, John walked home extremely slowly yesterday

13. Closed Class Words • Idiosyncratic. Differ more from language to language. • Language strongly resists additions • Examples: • prepositions: on, under, over, … • particles: up, down, on, off, … • determiners: a, an, the, … • pronouns: she, who, I, … • conjunctions: and, but, or, … • auxiliary verbs: can, may, should, … • numerals: one, two, three, third, …

14. Prepositions from CELEX

15. English Single-Word Particles

16. Pronouns in CELEX

17. Conjunctions

18. Auxiliaries

19. POS Tagging: Choosing a Tagset • There are many potential part-of-speech distinctions we could mark • To do POS tagging, we need to choose a standard set of tags to work with • Sets vary in number of tags: from a dozen to over 200 • The size of the tag set depends on language, objectives and purpose • Need to strike a balance between • getting better information about context (best: introduce more distinctions) • making it possible for classifiers to do their job (need to minimize distinctions)

20. Some of the best-known Tagsets • Brown corpus: 87 tags • Penn Treebank: 45 tags • Lancaster UCREL C5 (used to tag the BNC): 61 tags • Lancaster C7: 145 tags (Slide modified from Massimo Poesio's)

21. The Brown Corpus • The first digital corpus (1961) • Francis and Kucera, Brown University • Contents: 500 texts, each 2000 words long • From American books, newspapers, magazines • Representing genres: • science fiction, romance fiction, press reportage, scientific writing, popular lore (Modified from Diane Litman's version of Steve Bird's notes)

22. Penn Treebank • First syntactically annotated corpus • 1 million words from the Wall Street Journal • Part-of-speech tags and syntax trees (Modified from Diane Litman's version of Steve Bird's notes)

23. Tag Set Example: Penn Treebank • (tag table; includes tags such as PRP and PRP$)

24. Example of Penn Treebank Tagging of Brown Corpus Sentence • The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./. • Book/VB that/DT flight/NN ./. • Does/VBZ that/DT flight/NN serve/VB dinner/NN ?/.

25. POS Tagging • Words often have more than one POS: back • The back door = JJ • On my back = NN • Win the voters back = RB • Promised to back the bill = VB • The POS tagging problem is to determine the POS tag for a particular instance of a word.

26. Word Class Ambiguity (in the Brown Corpus) • Unambiguous (1 tag): 35,340 • Ambiguous (2-7 tags): 4,100 (DeRose, 1988)

27. Part-of-Speech Tagging • Rule-Based Tagger: ENGTWOL • Stochastic Tagger: HMM-based • Transformation-Based Tagger: Brill

28. Rule-Based Tagging • Basic Idea: • Assign all possible tags to words • Remove tags according to a set of rules of the type: if word+1 is an adj, adv, or quantifier and the following is a sentence boundary and word-1 is not a verb like "consider", then eliminate non-adv, else eliminate adv. • Typically more than 1000 hand-written rules, but they may also be machine-learned.

29. Start With a Dictionary • she: PRP • promised: VBN, VBD • to: TO • back: VB, JJ, RB, NN • the: DT • bill: NN, VB • Etc… for the ~100,000 words of English

30. Assign All Possible Tags • She/PRP promised/{VBN, VBD} to/TO back/{VB, JJ, RB, NN} the/DT bill/{NN, VB}

31. Write rules to eliminate tags • Rule: eliminate VBN if VBD is an option when VBN|VBD follows "<start> PRP" • After the rule: She/PRP promised/VBD to/TO back/{VB, JJ, RB, NN} the/DT bill/{NN, VB} (the VBN reading of "promised" has been eliminated)
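A minimal sketch of this assign-then-eliminate approach, in Python. The lexicon, the tag names, and the single rule below are simplified illustrations for this one sentence, not actual ENGTWOL rules.

```python
# Toy rule-based tagger: assign every tag the dictionary allows,
# then eliminate candidates with hand-written constraint rules.
# Lexicon and rule are illustrative only.

LEXICON = {
    "she": {"PRP"},
    "promised": {"VBN", "VBD"},
    "to": {"TO"},
    "back": {"VB", "JJ", "RB", "NN"},
    "the": {"DT"},
    "bill": {"NN", "VB"},
}

def assign_all_tags(words):
    """Step 1: look up every possible tag for each word."""
    return [set(LEXICON.get(w.lower(), {"NN"})) for w in words]

def eliminate(words, candidates):
    """Step 2: apply constraint rules that discard impossible tags."""
    for i, tags in enumerate(candidates):
        # Rule from the slide: drop VBN when VBD is also possible
        # and the word follows a sentence-initial PRP.
        if {"VBN", "VBD"} <= tags and i == 1 and "PRP" in candidates[0]:
            tags.discard("VBN")
    return candidates

words = "She promised to back the bill".split()
print(list(zip(words, eliminate(words, assign_all_tags(words)))))
```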

32. Sample ENGTWOL Lexicon

33. Stochastic Tagging • Based on the probability of a certain tag occurring, given various possibilities • Necessitates a training corpus • Problems: • No probabilities for words not in the corpus • The training corpus may be too different from the test corpus

34. Stochastic Tagging (cont.) • Simple Method: choose the most frequent tag in the training text for each word! • Result: about 90% accuracy • Why bother with anything else? • This is the baseline: other methods will do better • HMM tagging is one example
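A quick sketch of that most-frequent-tag baseline, assuming a hand-tagged training corpus available as (word, tag) pairs; NLTK's copy of the Brown corpus is used here purely as a convenient example.

```python
# Baseline tagger: tag each word with its most frequent tag in the
# training data; unknown words get the overall most common tag.
from collections import Counter, defaultdict
import nltk

nltk.download("brown", quiet=True)
tagged = nltk.corpus.brown.tagged_words()          # [(word, tag), ...]

counts = defaultdict(Counter)
for word, tag in tagged:
    counts[word.lower()][tag] += 1

default_tag = Counter(tag for _, tag in tagged).most_common(1)[0][0]
most_frequent = {w: c.most_common(1)[0][0] for w, c in counts.items()}

def baseline_tag(words):
    return [(w, most_frequent.get(w.lower(), default_tag)) for w in words]

print(baseline_tag("The koala put the keys on the table".split()))
```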

35. HMM Tagger • Intuition: pick the most likely tag for this word. • HMM taggers choose the tag sequence that maximizes this formula: • P(word|tag) × P(tag|previous n tags) • Let T = t1, t2, …, tn and W = w1, w2, …, wn • Find the POS tags that generate the sequence of words, i.e., look for the most probable sequence of tags T underlying the observed words W.

36. Conditional Probability • A brief digression… • Conditional probability: how do we determine the likelihood of one event given another, if they are not independent? • Example: • I am trying to diagnose a rash in a 6-year-old child. • Is it measles? • In other words, given that the child has a rash, what is the probability that it is measles?

37. Conditional Probabilities cont. • What would affect your decision? • The overall frequency of rashes in 6-yr-olds • The overall frequency of measles in 6-yr-olds • The frequency with which 6-yr-olds with measles have rashes • P(measles|rash) = P(rash|measles) P(measles) / P(rash)

38. Bayes' Theorem • Bayes' Theorem or Bayes' Rule formalizes this intuition: • P(X|Y) = P(Y|X) P(X) / P(Y) • P(X) and P(Y) are the unconditional ("prior") probabilities of X and Y.
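A tiny numeric illustration of the rule in code; all three input probabilities are invented for the example, not taken from any real data.

```python
# Hypothetical numbers, purely to show Bayes' Rule in action.
p_rash_given_measles = 0.95   # P(rash | measles)  (assumed)
p_measles = 0.001             # P(measles)         (assumed)
p_rash = 0.05                 # P(rash)            (assumed)

p_measles_given_rash = p_rash_given_measles * p_measles / p_rash
print(round(p_measles_given_rash, 4))   # 0.019
```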

39. Probabilities • We want the best set of tags for a sequence of words (a sentence) • W is a sequence of words • T is a sequence of tags

40. Probabilities (cont.) • (continues the derivation of the best tag sequence T for the observed words W)
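The equations on slides 39 and 40 are not in the transcript. The standard derivation they describe, applying Bayes' Rule to the tagging problem and then dropping the constant P(W), would read:

best T = argmax_T P(T|W) = argmax_T P(W|T) P(T) / P(W) = argmax_T P(W|T) P(T)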

41. Tag Sequence: P(T) • How do we get the probability of a specific tag sequence? • Count the number of times a sequence occurs and divide by the number of sequences of that length? Not likely to work. • Instead, make a Markov assumption and use N-grams over tags… • P(T) is then a product of the probabilities of the N-grams that make it up (written out below)
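Written out under a bigram (first-order Markov) assumption over tags, with <s> marking the start of the sentence, this product is:

P(T) ≈ P(t1|<s>) × P(t2|t1) × … × P(tn|tn-1)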

42. N-Grams • The N stands for how many terms are used • Unigram: 1 term; Bigram: 2 terms; Trigram: 3 terms • Usually we don't go beyond 3. • You can use different kinds of terms, e.g.: • Character-based n-grams • Word-based n-grams • POS-based n-grams • Ordering • Often adjacent, but not required • We use N-grams to help determine the context in which some linguistic phenomenon happens. • E.g., look at the words before and after the period to see whether it is the end of a sentence or not.
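A small helper showing what character-, word-, and tag-based n-grams look like; the function and the examples are ours, not from the slides.

```python
def ngrams(items, n):
    """Return all adjacent n-grams in a sequence of items."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

print(ngrams("tagging", 3))                           # character trigrams
print(ngrams("the girl kissed the boy".split(), 2))   # word bigrams
print(ngrams(["DT", "NN", "VBD", "DT", "NN"], 2))     # POS-tag bigrams
```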

43. P(T): Bigram Example • Given a sentence: <s> Det Adj Adj Noun </s> • Probability is a product of four N-grams: P(Det|<s>) P(Adj|Det) P(Adj|Adj) P(Noun|Adj)

44. Counts • Where do you get the N-gram counts? • From a large hand-tagged corpus. • For N-grams, count all the (Tag_i, Tag_i+1) pairs • And smooth them to get rid of the zeroes • Alternatively, you can learn them from an untagged corpus
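A sketch of turning those counts into smoothed bigram tag probabilities. The corpus format (a list of sentences, each a list of (word, tag) pairs) and the use of add-one smoothing are assumptions made for illustration.

```python
# Estimate P(tag_i | tag_i-1) from a hand-tagged corpus,
# with add-one (Laplace) smoothing to remove zero counts.
from collections import Counter

def bigram_tag_probs(tagged_sentences):
    bigram_counts = Counter()
    prev_counts = Counter()
    tagset = set()
    for sent in tagged_sentences:
        tags = ["<s>"] + [tag for _, tag in sent]
        tagset.update(tags)
        for prev, cur in zip(tags, tags[1:]):
            bigram_counts[(prev, cur)] += 1
            prev_counts[prev] += 1
    vocab = len(tagset)

    def prob(prev, cur):
        return (bigram_counts[(prev, cur)] + 1) / (prev_counts[prev] + vocab)

    return prob

p = bigram_tag_probs([[("the", "DT"), ("dog", "NN"), ("ran", "VBD")]])
print(p("DT", "NN"))   # smoothed estimate of P(NN | DT)
```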

45. What about P(W|T)? • First, it's odd. It is asking for the probability of seeing "The big red dog" given "Det Adj Adj Noun" • Collect up all the times you see that tag sequence and see how often "The big red dog" shows up. Again, not likely to work.

46. P(W|T) • We'll make the following assumption (because it's easy)… each word in the sequence depends only on its corresponding tag. So… • How do you get the statistics for that?
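Stated as a formula (the slide's own equation is missing from the transcript), the assumption and the counts that estimate it are:

P(W|T) ≈ P(w1|t1) × P(w2|t2) × … × P(wn|tn), where each P(wi|ti) = Count(ti, wi) / Count(ti) in the hand-tagged corpus.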

47. So… • We start with the Bayes formulation of the problem • And, after the two independence assumptions, get the practical HMM tagging formula (reconstructed below)
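A reconstruction of the two missing equations: starting from argmax_T P(W|T) P(T) and applying the word-emission and tag-bigram assumptions gives

best T ≈ argmax over t1…tn of the product over i of P(wi|ti) × P(ti|ti-1)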

48. HMMs • This is a Hidden Markov Model (HMM) • The states in the model are the tags, and the observations are the words. • The state-to-state transitions are driven by the bigram statistics • The observed words are based solely on the state that you're in
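A compact Viterbi decoder for such an HMM. The tag set, transition table, and emission table below are toy numbers invented for illustration, not estimated from any corpus.

```python
# Viterbi decoding for a toy HMM POS tagger: states are tags,
# observations are words; transitions are bigram tag probabilities,
# emissions are P(word | tag). All numbers are made up.
def viterbi(words, tags, trans, emit, start="<s>"):
    V = [{t: trans[start][t] * emit[t].get(words[0], 1e-8) for t in tags}]
    back = [{}]
    for i, w in enumerate(words[1:], start=1):
        V.append({})
        back.append({})
        for t in tags:
            best_prev = max(tags, key=lambda p: V[i - 1][p] * trans[p][t])
            V[i][t] = V[i - 1][best_prev] * trans[best_prev][t] * emit[t].get(w, 1e-8)
            back[i][t] = best_prev
    last = max(tags, key=lambda t: V[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

tags = ["DT", "NN", "VB"]
trans = {"<s>": {"DT": 0.6, "NN": 0.3, "VB": 0.1},
         "DT":  {"DT": 0.01, "NN": 0.9, "VB": 0.09},
         "NN":  {"DT": 0.1, "NN": 0.3, "VB": 0.6},
         "VB":  {"DT": 0.5, "NN": 0.3, "VB": 0.2}}
emit = {"DT": {"the": 0.7},
        "NN": {"dog": 0.4, "walk": 0.1},
        "VB": {"walk": 0.3, "dog": 0.01}}
print(viterbi(["the", "dog", "walk"], tags, trans, emit))   # ['DT', 'NN', 'VB']
```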

49. An Example • Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NN • People/NNS continue/VBP to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN • to/TO race/??? vs. the/DT race/??? • ti = argmax_j P(tj|ti-1) P(wi|tj) • max[ P(VB|TO) P(race|VB), P(NN|TO) P(race|NN) ] • From the Brown corpus: • P(NN|TO) = .021, P(race|NN) = .00041 → product ≈ .000007 • P(VB|TO) = .34, P(race|VB) = .00003 → product ≈ .00001
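The two products can be checked directly; the inputs below are the Brown-corpus estimates quoted above. The exact NN product comes out a little above the rounded figure on the slide, but the comparison, with the VB reading winning, is unchanged.

```python
# Disambiguating "race" after "to", using the slide's Brown-corpus figures.
p_nn = 0.021 * 0.00041   # P(NN|TO) * P(race|NN)
p_vb = 0.34 * 0.00003    # P(VB|TO) * P(race|VB)
print(p_nn, p_vb)        # ~8.6e-06 vs ~1.0e-05: the VB reading wins
```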

50. Performance • This method has achieved 95-96% correct with reasonably complex English tagsets and reasonable amounts of hand-tagged training data.
