500 likes | 903 Views
Morphology. What is morphology? Finite State Transducers Two Level Morphology. What is morphology?. Decomposition of words into meaningful units: anti dis establish ment arian ism Interacts with- syntax( categories and word order)
E N D
Morphology What is morphology? Finite State Transducers Two Level Morphology
What is morphology? • Decomposition of words into meaningful units: • anti dis establish ment arian ism • Interacts with- syntax( categories and word order) • [establish] = verb + ment = noun • phonology: divine divinity • obscene obscenity • Interacts with semantics: • boy boys • Peter Peterchen
Phonological String morphological analyzer dictionary lookup syntactic analyzer lexical- semantic analysis discourse processing
Why store all words as morphemes rather than all Morphological combinations as words? What does the morphological analyzer have to output?
The what and the how: • Efficient and effective algorithm to decompose categories into, • or build categories from, component morphemes. • What this algorithm will be depends on problems it has to solve. • In turn depends on representations computed. • Given stem /lemma ( e.g. ‘jump’ add material to change category • Or grammatical properties of word ‘jumped’, ‘jumpable’ • order of composition matters: • ride/ riding • enoble/ enobling/*nobling Adj ---> V, V===> V+ing • trance/*trancing/entrance/entrancing
CONCATENATIVE MORPHOLOGICAL PROCESSES: COMPOUNDING: firefighter PREFIXATION: Un+ well INFIXATION: ( TAGALOG) fikas - strong fumikas - be strong SUFFIXATION: Kick + er CIRCUMFIXATION: ( German) ge [sag] t past prefix [say] past suffix
Inflectional Morphology • non category changing, required by syntax • Agreement: person/number: • Je parle • Nous parlons • Ils parlent • Gender: • la petite ( the little one (fem)) • le petit ( the little one (masc)) • la squelette ( the skeleton)
Derivational Morphology • changes category. Not required by syntax • Deverbal Nominal: • bak+er tion: destroy/destruction • catch+ er Roman's destruction of the city • 'er' = agent of action Catcher of the ball • John’s catcher of the ball • 'John" ~= one who caught
Regular vs Irregular Jump/jumped hit/hit bring/brought sing/sang Productive/Non-Productive adore/adorable, kick/kickable, fax/faxable produce/production destroy/destruction *graft/graftuction Bring/ brought
Productive and rule governed: • fax fax +er • ??? Crudoy cruduction • Category sensitivity: • breakable/* manable • sensitivity/ *hittivity • Semantic sensitivity: • un + well un + happy • *un + ill *un+ sad
Store morphemes or words? lebensversicherungsgesellschaftsangesteller leben+ versicherung + gesellschaft+s+angesteller life insurance company +Poss employee Turkish: Turkish verns have 40k forms
Non- concatenative Morphology • Templatic morphology (Semitic languages):lmd (learn), lamad (he studied), limed (he taught), lumad (he was taught)
Concatenation: Beads on a string Agglutinative ( concatenative) languages are well behaved for FSAs as long as we don’t include phonological or spelling changes Verb Lexicon: jump+edjump kiss+ed kiss stream+ed stream *hopp+ed hop, ??? verb ed q q 1 q q1 q2 0
Pieces of a Morphological Analyzer -er,est,ly un adj-root q2 q3 q0 q1 The lexicon stores the lemmas, and divides them into adjective classes really/clearly *bigly/redly Morphotactics: State sequence indicates order of morpheme composition e.g. comparative or adverb formation is by suffixation
Lexicon • Arranged as TRIE ( letter strings in common relative to position • n-k-e-y • D-o • -g • Classed by part of speech category ( noun, verb) and morphotactic • (which other affixes can precede or follow) • or orthographic considerations.
Orthography • spelling rules- handle phonological or spelling variation in • orthographic a morpheme • Try /trying/tries • Cringe/cringing/cringes
Using FSAs for Recognition: English Nouns and their Inflection
Orthographic • Want association between morpheme and semantic function • Want association between allographs or allophones of the same • phoneme • Allographs: • city -cities • bake- baking • divine-divinity • try tried
Finite State Transducers (FSTs)- the Big Idea Need to relate lexical level, the level that gives us the morphological analysis (+plural,+able to the surface level that keeps track of phonological/ or graphological (spelling_ changes)
Parsing vs recognition • An FSA can give you the string composition of a morphological sequence, and can tell you whether a given morphological string is or is not in the language. It recognizes the string • An FST parses the string. It tells you the morphological structure associated with the string. Other instances of parsing?
Formal definition • An FST defines a relation between sets of pairs of strings: • It contains at least a lexical level that is a concatenation of morphemes • and a surface level that shows the correct spelling for each • morpheme in a given context • cat/sheep ^ s • e.g. noun (instanciated from lexicon) + plural • E s • cats/sheep
Q= finite set of states q0 to qn finite alphabet of complex symbols (feasible pairs) i:o with one symbol from the input alphabet Q0 = the start state F= set of final states = (q, i:o) the transition function or matrixbetween states. Takes a state from Q and a complex symbol i:o from and returns a new state. feasible pair: a relation of a symbol on one tape to a symbol on the other tape. e.g. can + [pl:^s]
default pair- the upper tape is the same as the lower tape • same input as output :c*a*t/c:c*a:a*t:t*pl:^s • feasible pairs either stated in lexicon if irregular • g:g*o:e*o:e*s:s*e:e goose:geese • or by an automaton that stipulates correspondence in rule • governed way if the relation is regular. If regular, indicated as • Default paris and usually represented by one symbol. • FSTs are closed under: • inversion: switches i/o labels • composition: union of two transducers • one after the other.
trie: in lexicon, categories arranged by letter one at a time with class at end. Allows parallel search as long as things match e.g. m*e*t*a*l <N> m*e*t*a <root> metal, meta-language
Kimmo-BasedMorphological Parsing • Two-level morphology: lexical level + surface level (Koskenniemi 83) • Finite-state transducers (FST): input-output pair
Four-Fold View of FSTs • As a recognizer • As a generator • As a translator • As a set relater
Terminology for Kimmo • Upper = lexical tape • Lower = surface tape • Characters correspond to pairs, written a:b • If “a=b”, write “a” for shorthand • Two-level lexical entries • # = word boundary • ^ = morpheme boundary • Other = “any feasible pair that is not in this tranducer”
Notation x s z ^ __ s # e --> e /
FSTs and ambiguity Parse Example 1: unionizable Parse Example 2: assess
What to do about Global Ambiguity? • Accept first successful structure • Run parser through all possible paths • Bias the search in some manner
Stemming • For some applications,don’t need full morphological analysis. • IR- don’t care that e.g ‘logician’ is related to ‘logical’ Just want • to know that if you are interested in articles about ‘logic’ • may want former two classes as well. So just want to ‘get back • to root list. • Relate two forms by having a literal relation rule. E.g • al#---> 0 • Is it useful: in a big document may not be necessary because the • will appear in many forms including form in query
stemming is morphologically impoverished so error driven • - can’t distinguish rules that apply at morpheme boundaries • versus internal to root: • patronization = patron + ize + ation • organization = organize+ ation • But the stemmer will treat these as a single class and derive • “organ” as an underlying root. • -’adverse’/’adversity • ‘universe / university
Psycholinguistics • Is the human lexicon efficient in the way computational lexica • are? • -Stanners et al (1979) :where two words are related inflection- • ally,then root stored and other forms rule derived. Where • there is a derivational relationship, then both forms are stored • paradigm = repetition priming • ‘great, happy, peachy, adorable , round, short, great • small • Repetition priming for ‘turns’ given ‘turning’ but not • ‘select’, ‘selective’
Marslen- Wilson et al (1994): May have priming for • Semantically similar derivationally related words: • permit/permission • * create/creativity • On-line versus long term storage lexicon: • Speech errors: ‘we have screw looses’