
COMP 4060 Natural Language Processing


Anita

Presentation Transcript


  1. COMP 4060 Natural Language Processing Morphology, Word Classes, POS Tagging Morphology

  2. Overview
     • Morphology
     • Stemming
     • Word Classes
     • POS Tagging
     (Jurafsky, 2nd edition, Ch. 2, 3, 5; Allen, Ch. 2, 3)

  3. Morphology

  4. Morphemes and Words
     • Morpheme = "minimal meaning-bearing unit in a language"
     • Combine morphemes to create words
     • Inflection
       • combination of a word stem with a grammatical morpheme
       • same word class, e.g. clean (verb), clean-ing (verb)
     • Derivation
       • combination of a word stem with a grammatical morpheme
       • yields a different word class, e.g. clean (verb), clean-ing (noun)
     • Compounding
       • combination of multiple word stems
     • Cliticization
       • combination of a word stem with a clitic
       • different words from different syntactic categories, e.g. I’ve = I + have

  5. Inflectional Morphology
     word stem + grammatical morpheme, e.g. cat + s; only for nouns, verbs, and some adjectives
     • Nouns
       • plural: regular: +s, +es; irregular: mouse - mice, ox - oxen; rules for exceptions, e.g. -y -> -ies as in butterfly - butterflies
       • possessive: +'s, +'
     • Verbs
       • main verbs (sleep, eat, walk)
       • modal verbs (can, will, should)
       • primary verbs (be, have, do)
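The noun plural rules on this slide can be sketched as a small function. This is only an illustration: the name `pluralize`, the tiny irregular table, and the condition for choosing +es (stems ending in s, x, z, ch, sh) are assumptions added here, not part of the slides.

```python
# Irregular plurals must be listed explicitly (examples from the slide).
IRREGULAR_PLURALS = {"mouse": "mice", "ox": "oxen"}

VOWELS = "aeiou"


def pluralize(noun):
    """Return the plural of an English noun using a few spelling rules."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    # Exception rule: consonant + y -> ies (butterfly -> butterflies).
    if len(noun) > 1 and noun.endswith("y") and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"
    # Assumed condition for +es: stem ends in a sibilant spelling.
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    # Regular case: +s.
    return noun + "s"


for word in ["cat", "fox", "butterfly", "mouse", "ox"]:
    print(word, "->", pluralize(word))
```

The irregular table is checked first so that ox does not fall through to the +es rule.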

  6. Inflectional Morphology (verbs)
     Verb inflections apply only to main verbs (sleep, eat, walk) and primary verbs (be, have, do).
     Regularly inflected forms:
     • stem: walk, merge, try, map
     • -s form: walks, merges, tries, maps
     • -ing participle: walking, merging, trying, mapping
     • past; -ed participle: walked, merged, tried, mapped
     Irregularly inflected forms:
     • stem: eat, catch, cut
     • -s form: eats, catches, cuts
     • -ing participle: eating, catching, cutting
     • past: ate, caught, cut
     • -ed participle: eaten, caught, cut

  7. Inflectional and Derivational Morphology (adjectives)
     Adjective inflections and derivations:
     • prefix un-: unhappy (adjective, negation)
     • suffix -ly: happily (adverb, mode)
     • suffix -er: happier (adjective, comparative)
     • suffix -est: happiest (adjective, superlative)
     • suffix -ness: happiness (noun)
     • plus combinations, like unhappiest, unhappiness
     Distinguish different adjective classes, which can or cannot take certain inflectional or derivational forms, e.g. no negation with un- for big.

  8. Inflectional Morphology

  9. Noun Inflections

  10. Verb Inflections

  11. Derivational Morphology

  12. Noun Derivation

  13. Adjective Derivation

  14. Clitics

  15. Verb Clitics

  16. Methods, Algorithms

  17. Stemming
      • Stemming algorithms strip off word affixes
      • yield the stem only, with no additional information (like plural, 3rd person, etc.)
      • used, e.g., in web search engines
      • famous stemming algorithm: the Porter stemmer

  18. Stemming Methods
      • Rule-based stemming
      • Example rules:
        • ATIONAL → ATE, e.g. relational → relate
        • ING → ε (empty string) if the stem contains a vowel, e.g. motoring → motor
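A minimal sketch of the two example rules above, written as plain string rewriting. It is not the Porter stemmer, which applies many more rules in ordered steps; the function name `stem` is chosen here for illustration.

```python
VOWELS = set("aeiou")


def stem(word):
    """Apply the two example rewrite rules from the slide to one word."""
    w = word.lower()
    # Rule 1: ATIONAL -> ATE (relational -> relate).
    if w.endswith("ational"):
        return w[: -len("ational")] + "ate"
    # Rule 2: ING -> empty string, only if the remaining stem has a vowel.
    if w.endswith("ing"):
        candidate = w[:-3]
        if any(ch in VOWELS for ch in candidate):
            return candidate
    # No rule applies: return the word unchanged.
    return w


print(stem("relational"))  # relate
print(stem("motoring"))    # motor
print(stem("sing"))        # sing (the stem "s" contains no vowel)
```

The vowel condition on the second rule is what keeps short words such as sing from being reduced to a single consonant.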

  19. Stemming Problems

  20. Tokenization, Word Segmentation
      • Tokenization or word segmentation
        • separate out “words” (lexical entries) from running text
        • expand abbreviated terms, e.g. I’m into I am, it’s into it is
        • collect tokens forming a single lexical entry, e.g. New York marked as one single entry

  21. Tokenization, Word Segmentation
      • Finite state transducer (FST)
        • modifies the input string (rules)
        • recognizes (stored) abbreviations and composite words
        • see Fig. 3.22 in Jurafsky, Ch. 3
      • More of an issue in languages like Chinese
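The three tokenization steps (separate words, expand abbreviated terms, collect multi-word entries) can be sketched with a regular expression and two lookup tables. This is a simplified stand-in, not the FST from Jurafsky; the tables `CONTRACTIONS` and `MULTIWORD` and the function `tokenize` are hypothetical and hold only the examples from the slides.

```python
import re

# Stored abbreviations and their expansions (examples from the slides).
CONTRACTIONS = {"i'm": ["I", "am"], "it's": ["it", "is"], "i've": ["I", "have"]}

# Stored composite words that should form a single lexical entry.
MULTIWORD = {("New", "York")}


def tokenize(text):
    """Split text into tokens, expand contractions, merge known word pairs."""
    text = text.replace("\u2019", "'")  # normalize curly apostrophes
    # A word (optionally with an apostrophe part) or one punctuation mark.
    raw = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?|[^\sA-Za-z]", text)

    expanded = []
    for tok in raw:
        expanded.extend(CONTRACTIONS.get(tok.lower(), [tok]))

    tokens = []
    i = 0
    while i < len(expanded):
        pair = tuple(expanded[i : i + 2])
        if pair in MULTIWORD:
            tokens.append(" ".join(pair))
            i += 2
        else:
            tokens.append(expanded[i])
            i += 1
    return tokens


print(tokenize("I'm in New York."))
```

A table lookup cannot resolve ambiguous forms: it’s can also stand for it has, which needs context to decide.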

  22. Lemmatization
      • Lemmatization maps words with the same root but different surface forms onto the same lexeme
      • e.g. buys, bought, buying -> buy
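One simple way to approximate lemmatization is a table of irregular forms plus suffix stripping checked against a lexicon. The sketch below assumes a hypothetical one-word lexicon and covers only the slide's example; real lemmatizers rely on full morphological analysis.

```python
# Irregular surface forms are mapped directly to their lemma.
IRREGULAR = {"bought": "buy"}

# Suffixes to try stripping, in order.
SUFFIXES = ["ing", "s"]


def lemmatize(word, lexicon):
    """Map a surface form to its lemma, or return it unchanged if unknown."""
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    if w in lexicon:
        return w
    for suffix in SUFFIXES:
        if w.endswith(suffix):
            candidate = w[: -len(suffix)]
            # Accept the stripped form only if it is a known lexeme.
            if candidate in lexicon:
                return candidate
    return w


lexicon = {"buy"}
print([lemmatize(w, lexicon) for w in ["buys", "bought", "buying"]])
```

Unlike a stemmer, the result is always a lexicon entry or the unchanged input, never an arbitrary truncated string.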

  23. Morphological Processing

  24. Word Recognition
      • Spelling errors
        • mark non-words based on a dictionary/lexicon
        • use “minimum edit distance”
          • dynamic programming, table-based
          • transform operations: deletion, substitution, insertion
          • calculate the minimum path
      • Morphological parser = FST
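The table-based dynamic program for minimum edit distance can be sketched as follows. Unit cost for all three operations is an assumption made here; some presentations charge 2 for a substitution, which the hypothetical `sub_cost` parameter allows. The helper `suggest` is likewise illustrative.

```python
def min_edit_distance(source, target, sub_cost=1):
    """Minimum cost of turning source into target by single-character edits."""
    n, m = len(source), len(target)
    # dist[i][j] = cost of converting source[:i] into target[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete all i characters
    for j in range(1, m + 1):
        dist[0][j] = j  # insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                               # deletion
                dist[i][j - 1] + 1,                               # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),   # substitution
            )
    return dist[n][m]


def suggest(word, lexicon):
    """Return the lexicon entry closest to a (possibly misspelled) word."""
    return min(lexicon, key=lambda entry: min_edit_distance(word, entry))


print(min_edit_distance("kitten", "sitting"))                 # 3
print(suggest("mappng", ["mapping", "walking", "merging"]))   # mapping
```

Each cell depends only on its three neighbours (left, above, diagonal), so the table is filled in O(n·m) time.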

  25. Morphological Processing
      • Knowledge
        • lexical entry: stem plus possible prefixes and suffixes, plus word classes, e.g. endings for verb forms (see tables above)
        • rules: how to combine stem and affixes, e.g. add s to form the plural of a noun, as in dogs
        • orthographic rules: spelling, e.g. doubled consonant as in mapping
      • Processing: Finite State Transducers
        • take the information above and analyze a word token / generate a word form
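To illustrate how combination rules and orthographic rules interact, the sketch below generates regular verb forms for the stems used in the tables above (walk, merge, try, map). It uses plain functions rather than a finite state transducer, and the consonant-doubling test is a rough heuristic assumed here (a one-vowel stem ending consonant-vowel-consonant), not a rule taken from the slides.

```python
VOWELS = set("aeiou")


def _doubles_final_consonant(stem):
    """Heuristic: one-vowel stem ending consonant-vowel-consonant (map)."""
    if len(stem) < 3:
        return False
    c1, v, c2 = stem[-3], stem[-2], stem[-1]
    one_vowel = sum(ch in VOWELS for ch in stem) == 1
    return (c1 not in VOWELS and v in VOWELS
            and c2 not in VOWELS and c2 not in "wxy" and one_vowel)


def _ends_consonant_y(stem):
    return len(stem) > 1 and stem.endswith("y") and stem[-2] not in VOWELS


def generate(stem, form):
    """Generate a regular verb form; form is 's', 'ing', or 'ed'."""
    if form == "s":
        if _ends_consonant_y(stem):
            return stem[:-1] + "ies"                  # try -> tries
        if stem.endswith(("s", "x", "z", "ch", "sh")):
            return stem + "es"                        # catch -> catches
        return stem + "s"
    if form == "ing":
        if stem.endswith("e") and not stem.endswith("ee"):
            return stem[:-1] + "ing"                  # merge -> merging
        if _doubles_final_consonant(stem):
            return stem + stem[-1] + "ing"            # map -> mapping
        return stem + "ing"
    if form == "ed":
        if stem.endswith("e"):
            return stem + "d"                         # merge -> merged
        if _ends_consonant_y(stem):
            return stem[:-1] + "ied"                  # try -> tried
        if _doubles_final_consonant(stem):
            return stem + stem[-1] + "ed"             # map -> mapped
        return stem + "ed"
    raise ValueError(f"unknown form: {form}")


for s in ["walk", "merge", "try", "map"]:
    print(s, [generate(s, f) for f in ("s", "ing", "ed")])
```

Irregular verbs such as eat or cut would need explicit lexical entries, since no spelling rule produces ate or eaten.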

  26. Fig. 3.3 FSA for verb inflection.

  27. Fig. 3.4 Simple FSA for adjective inflection. Fig. 3.5 More detailed FSA for adjective inflection.

  28. Fig. 3.7 Compiled FSA for noun inflection.

  29. Fig. 3.12 Lexical and intermediate tape of an FS transducer. Fig. 3.13 Lexical, intermediate, and surface tape after spelling transformation.

  30. Word Classes and POS Tagging

  31. Word Classes
      Sort words into categories according to:
      • morphological properties: Which types of morphological forms do they take? e.g. form plural: noun+s; 3rd person: verb+s
      • distributional properties: What other words or phrases can occur nearby? e.g. possessive pronoun before noun
      • semantic coherence: Classify according to similar semantic type, e.g. nouns refer to object-like entities

  32. Open vs. Closed Word Classes: Open Class Types
      The set of words in these classes can change over time with the development of the language, e.g. spaghetti and download.
      Open class types: nouns, verbs, adjectives, adverbs

  33. Open vs. Closed Word Classes: Closed Class Types
      The set of words in these classes is largely fixed and hardly ever changes for a given language.
      Closed class types: prepositions, determiners, pronouns, conjunctions, auxiliary verbs, particles, numerals

  34. Open Class Words: Nouns
      Nouns denote objects, concepts, entities, events
      • Proper nouns: names for specific individual objects or entities, e.g. the Eiffel Tower, Dr. Kemke
      • Common nouns: names for categories, classes, abstracts, events, e.g. fruit, banana, table, freedom, sleep, race, ...
      • Count nouns: enumerable entities, e.g. two bananas
      • Mass nouns: items that are not countable, e.g. water, salt, freedom

  35. Open Class Words: Verbs
      Verbs denote actions, processes, and states, e.g. smoke, dream, rest, run
      Several morphological forms, e.g.
      • non-3rd person: eat, sleep
      • 3rd person: eats, sleeps
      • progressive / present participle / gerundive: eating, sleeping
      • past participle: eaten, slept
      • simple past: ate, slept

  36. Open Class Words: Verbs (2)
      • non-3rd person (eat): I eat. We eat. They eat.
      • 3rd person (eats): He eats. She eats. It eats.
      • progressive (eating): He is eating. He will be eating. He has been eating.
        • present participle: He is eating.
        • gerundive: Eating scorpions [NP] is common in China.
        • use as adjective: Eating children [NP] are common at McDonald's.
      • past participle (eaten): He has eaten the scorpion. The scorpion was eaten.
      • simple past (ate): He ate the scorpion.

  37. Verb Forms 1 - The five verb forms. Fig. 2.6. The five verb forms. (Allen, 1995, p. 28)

  38. Verb Forms 2 - The basic tenses. Fig. 2.7. The basic tenses. (Allen, 1995, p. 29)

  39. Verb Forms 3 - The progressive tenses. Fig. 2.8. The progressive tenses. (Allen, 1995, p. 29)

  40. Verb Tense Chart. From: http://www.athabascau.ca/courses/engl/155/support/verb_tenses.htm

  41. Open Class Words: Adjectives
      Adjectives denote qualities or properties of objects, e.g. heavy, blue, content
      Most languages have concepts for
      • colour: white, green, ...
      • age: young, old, ...
      • value: good, bad, ...
      Not all languages have adjectives as a separate class.

  42. Open Class Words: Adverbs 1
      Adverbs denote modifications of actions (verbs) or qualities (adjectives), e.g. walk slowly or heavily drunk
      • Directional or locational adverbs specify direction or location, e.g. go home, stay here

  43. Open Class Words: Adverbs 2
      • Degree adverbs specify the extent of a process, action, or property, e.g. extremely slow, very modest
      • Manner adverbs specify the manner of an action or process, e.g. walk slowly, run fast
      • Temporal adverbs specify the time of an event or action, e.g. yesterday, Monday

  44. Closed Word Classes
      Closed class types:
      • Prepositions: on, under, over, at, from, to, with, ...
      • Determiners: a, an, the, ...
      • Pronouns: he, she, it, his, her, who, I, ...
      • Conjunctions: and, or, as, if, when, ...
      • Auxiliary verbs: can, may, should, are, ...
      • Particles: up, down, on, off, in, out, ...
      • Numerals: one, two, three, ..., first, second, ...

  45. Closed Word Class: Prepositions
      Prepositions occur before noun phrases and describe relations, often spatial or temporal, e.g.
      • on the table (spatial)
      • in two hours (temporal)

  46. Closed Word Class: Pronouns
      Pronouns refer to entities, events, relations, etc.
      • Personal pronouns refer to persons or entities, e.g. you, he, it, ...
      • Possessive pronouns express possession or a relation between person and object, e.g. his, her, my, its, ...
      • Wh-pronouns are used for reference in a question or for back reference, e.g. Who did this ..., Frieda, who is 80 years old ...

  47. Closed Word Class: Conjunctions
      Conjunctions join phrases or sentences; their semantics is varied and complex.
      • Coordinating conjunctions join two phrases or sentences on the same level through conjunctions like and, or, but, ..., e.g. He takes a cat and a dog. He takes a dog and she takes a cat.
      • Subordinating conjunctions connect embedded phrases, e.g. through that: He thinks that the cat is nicer than the dog.

  48. Closed Word Class: Auxiliary Verbs
      Auxiliary verbs mark semantic features of the main verb. They often describe tense and modality aspects. Semantics is difficult.
      • Tense: addition expressing present, past, or future, e.g. He will take the cat home.
      • Aspect: addition expressing completion of action, e.g. He is taking the cat home. (incomplete)
      • Mood: addition expressing necessity or possibility of action, e.g. He can take the cat home. (possible)

  49. Closed Word Class: Copula, Modal Verbs
      Copula (be, do, have) and modal verbs (can, should, ...) are subclasses of auxiliary verbs. They describe state, process, or tense / modality of action. Semantics: difficult (e.g. modal logic)
      • State / process: be and do, e.g. He is at home. He does nothing.
      • Tense: have, e.g. He has taken the cat home.
      • Modality: can, ought to, should, must, e.g. He can take the cat home. (possibility)

  50. Tagsets and POS Tagging
