RECOGNISING NOMINALISATIONS • Supervisors: Dr. Alex Lascarides Dr. Mirella Lapata • (Andrew) Yuk On KONG • University of Edinburgh
DEFINITION • “Nominalisation refers to the process of forming a noun from some other word-class. (e.g. red+ness) • or (in classical transformational grammar especially) the derivation of a noun phrase from an underlying clause (e.g. Her answering of the letter….from She answered the letter). • The term is also used in the classification of relative clauses (e.g. What concerns me is her attitude)…….” (Crystal 1997)
Only nominalisations in the first sense, and only those derived from verbs, are considered here, e.g. "statement" from "state". • Problem: given a word, is it a noun, and if so, is it derived from a verb or not? • Nominalisations derived from verbs are very productive in English and are usually created by suffixation (i.e. noun-forming suffixes are attached to verb bases).
EXCLUSIONS • Nominals, e.g. the poor, the wounded • Nominalisation NOT From Verb, e.g. redness • -ing form, e.g. the making of the movie • Antidisestablish-ment-arian-ism
REGULAR? • nominalise → nominalisation • interpret → interpretation • interrupt → interruption • associate → association • delete → deletion • break → breakage • leak → leakage
REGULAR? (continued) • confine → confinement • refine → refinement (but define → definition) • submit → submission • admit → admission (but also admittance) • remit → remission; remittance; remit
VERB=NOUN • debate → debate (not debation); debater • pay → pay • love → love • boss → boss • stand → stand • purchase → purchase • lie → lie (“tell a lie”; cf. lie down)
VERB=NOUN (except stress) • transfer → transfer • transport → transport • import → import • rebel → rebel; rebellion
1 VERB, >1 NOUNS • collect → collection; collector • interpret → interpretation; interpreter • cover → cover; coverage • conduct → conduction; conductor • depend → dependant/dependent; dependence; dependency
SEMANTICS • conduct → conduction (conduct electricity/heat) • conduct → conduct (behave/organise)
WHEN TO USE WHICH SUFFIX • -tion/-sion • -er/-or • debate → debater • talk → talker • collect → collector • conduct → conductor
IRREGULAR NOMINALISATION • choose → choice • succeed → success; succession; successor • decide → decision • sell → sale
PSEUDO-NOMINALISATION • mote (noun; a very small piece of dust) → motion?? • depart → departure; department??? • apart → apartment????
WHY BOTHER? • Identifying nominalisations and their associated verbs (e.g. "statement" and "state") is important for a number of NLP tasks: • machine translation • information retrieval • automatic learning of machine-readable dictionaries • grammar induction
HOW? • Nominalisation is a productive morphological phenomenon: • List all acceptable nominalised forms? • What about new words?
TECHNIQUES NOT FOCUSING ON NOMINALISATIONS • build rules • machine-learning approaches to induce morphological structures using large corpora • knowledge-free induction of inflectional morphologies (Schone and Jurafsky 2001)
SCHONE AND JURAFSKY (2001) • Schone and Jurafsky (2001) worked on acquiring cognates and morphological variants, combining: • Induced semantics: Latent Semantic Analysis (LSA) • Induced orthographic information • Induced syntactic information • Transitive information • Affix frequencies
GOAL OF THIS STUDY • The principal goal of this project is to develop a system which can recognise nominalisations, together with the verbs from which they are derived.
EXPERIMENT 1 (baseline) • Identify nouns using the POS tags in the corpus • Select potential nominalisations from the list of nouns using a list of nominalisation suffixes • For each potential nominalisation, find the corresponding potential verb: the verb (from among the words tagged as verbs) that shares with it the greatest number of letters in sequence • Accept a nominalisation-verb pair if more than 50% of the letters match, and discard all others
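The baseline steps above can be sketched roughly as follows. This is only an illustration under several assumptions that the slides do not settle: the suffix list is invented for the example, "letters in sequence" is read as the longest run of consecutive shared letters, and the 50% is measured against the length of the noun. The function names are hypothetical.

```python
from difflib import SequenceMatcher

# Illustrative suffix list (an assumption; the slides do not give the list used).
SUFFIXES = ("tion", "sion", "ment", "ance", "ence", "age", "ure", "er", "or")


def longest_common_run(a, b):
    """Length of the longest run of consecutive letters shared by a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


def baseline_pairs(nouns, verbs, threshold=0.5):
    """Map each accepted candidate nominalisation to its best-matching verb."""
    pairs = {}
    for noun in nouns:
        # Keep only nouns that end in one of the nominalisation suffixes.
        if not noun.endswith(SUFFIXES):
            continue
        # Pick the verb sharing the longest run of letters with the noun.
        best = max(verbs, key=lambda v: longest_common_run(noun, v), default=None)
        if best is None:
            continue
        # Assumption: the matched share is computed relative to the noun.
        share = longest_common_run(noun, best) / len(noun)
        if share > threshold:
            pairs[noun] = best
    return pairs


nouns = ["collection", "statement", "table", "breakage", "department"]
verbs = ["collect", "state", "break", "run", "depart"]
print(baseline_pairs(nouns, verbs))
```

Under these assumptions the sketch also accepts "department" with "depart", which is the kind of pseudo-nominalisation a purely letter-based baseline cannot filter out.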
EXPERIMENT 2 • Use a decision tree to build a model • Possible features include: - letter similarity between verbs and nouns - suffix frequency - verb frequency - verb semantics - subject of noun - subject of verb
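A few of the surface features listed above could be computed for a candidate noun-verb pair along these lines. This is a sketch, not the project's feature extractor: the suffix list, the function names and the exact definitions (similarity relative to the noun's length, suffix frequency as the token count of nouns ending in that suffix) are assumptions made for the example. The semantic and subject features would need parsed text and are left out.

```python
from collections import Counter
from difflib import SequenceMatcher

# Illustrative suffix list (an assumption, not taken from the slides).
SUFFIXES = ("tion", "sion", "ment", "ance", "age", "er", "or")


def suffix_of(noun):
    """Return the first listed suffix the noun ends with, or '' if none."""
    return next((s for s in SUFFIXES if noun.endswith(s)), "")


def pair_features(noun, verb, noun_counts, verb_counts):
    """Build a feature dictionary for one candidate (noun, verb) pair."""
    matcher = SequenceMatcher(None, noun, verb, autojunk=False)
    run = matcher.find_longest_match(0, len(noun), 0, len(verb)).size
    suffix = suffix_of(noun)
    # Assumed definition: total corpus count of nouns ending in this suffix.
    suffix_freq = sum(
        count for n, count in noun_counts.items() if suffix and n.endswith(suffix)
    )
    return {
        "letter_similarity": run / len(noun),
        "suffix": suffix,
        "suffix_frequency": suffix_freq,
        "verb_frequency": verb_counts[verb],
    }


noun_counts = Counter({"collection": 3, "deletion": 2, "breakage": 1})
verb_counts = Counter({"collect": 5, "delete": 4})
print(pair_features("collection", "collect", noun_counts, verb_counts))
```

Feature dictionaries of this shape could then be passed to whichever decision-tree learner is chosen; the slides do not name one.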
EVALUATION • Experiments will be based on the BNC. • The nominalisations obtained will be evaluated against the CELEX morphological lexicon and manually annotated data. • Measures: precision, recall and F-score
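For reference, the three measures can be computed over sets of (nominalisation, verb) pairs as in the sketch below. It assumes the balanced F-score (F1) and exact matching of pairs against the gold standard; the slides do not say which F-score variant or matching criterion is intended, and the example pairs are made up for illustration.

```python
def precision_recall_f(predicted, gold):
    """Precision, recall and balanced F-score of predicted pairs against gold pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical system output and gold standard, for illustration only.
predicted = {("collection", "collect"), ("statement", "state"), ("department", "depart")}
gold = {("collection", "collect"), ("statement", "state"),
        ("decision", "decide"), ("sale", "sell")}
print(precision_recall_f(predicted, gold))
```

In this toy case two of the three predicted pairs are correct and two of the four gold pairs are found, so precision is 2/3, recall is 1/2 and the F-score is 4/7.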
BRITISH NATIONAL CORPUS • Over 100 million words • Corpus of modern English • Both spoken (10%) and written (90%) • Each word is automatically tagged by the CLAWS stochastic POS tagger • 65 different tags • encoded using SGML to represent POS tags and a variety of other structural properties of texts (e.g. headings, paragraphs, lists, etc.)
<item> • <s n=086> • <w NN1-VVG>Shopping <w PRP>including <w NN1>collection <w PRF>of • <w NN2>prescriptions • </item> • <item> • <s n=087> • <w VVG>Daysitting <w CJC>and <w VVG>nightsitting • </item>
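Tagged words like those in the sample above can be pulled out with a small regular expression. The sketch below is an assumption about how one might read this markup, not the project's actual reader: it handles only the `<w TAG>word` pattern shown in the sample, treats the first half of an ambiguity tag such as NN1-VVG as the preferred reading, and counts only tags starting with NN as nouns and VV as lexical verbs. The function names are hypothetical.

```python
import re

# Matches "<w TAG>word" as it appears in the sample; other BNC markup is ignored.
WORD_PATTERN = re.compile(r"<w ([A-Z0-9-]+)>([^<\s]+)")


def parse_tagged_words(line):
    """Return (tag, word) pairs found in one line of BNC-style markup."""
    return WORD_PATTERN.findall(line)


def words_with_tag_prefix(pairs, prefix):
    """Select words whose first (preferred) tag starts with the given prefix."""
    return [word for tag, word in pairs if tag.split("-")[0].startswith(prefix)]


line_86 = "<w NN1-VVG>Shopping <w PRP>including <w NN1>collection <w PRF>of <w NN2>prescriptions"
line_87 = "<w VVG>Daysitting <w CJC>and <w VVG>nightsitting"

print(words_with_tag_prefix(parse_tagged_words(line_86), "NN"))
print(words_with_tag_prefix(parse_tagged_words(line_87), "VV"))
```

The noun and verb lists produced this way would be the input to the baseline matching step of Experiment 1.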
CELEX • English, Dutch and German • Annotated by humans using lemmata from two dictionaries of English • 52,446 lemmata and 160,594 wordforms • Orthographic, phonological, morphological, syntactic and frequency information • Morphological structure, e.g. ((celebrate),(ion))
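For evaluation, a bracketed structure of the kind shown above can be split into its base and suffix. The sketch below handles only the simplified notation on the slide, ((celebrate),(ion)); the real CELEX files may carry extra annotation that this does not parse, and the function names are hypothetical.

```python
import re

# Matches innermost bracketed units such as "(celebrate)" or "(ion)".
MORPHEME_PATTERN = re.compile(r"\(([^()]+)\)")


def split_structure(structure):
    """Return the list of innermost bracketed units in a structure string."""
    return MORPHEME_PATTERN.findall(structure)


def base_and_suffix(structure):
    """Return (base, suffix) for a two-part structure, or None otherwise."""
    parts = split_structure(structure)
    return tuple(parts) if len(parts) == 2 else None


print(base_and_suffix("((celebrate),(ion))"))
```

A predicted pair such as ("celebration", "celebrate") could then be counted as correct when the lexicon's base for the noun equals the predicted verb.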
MILESTONES • 6/2002 Experiment 1—baseline • 7/2002 Experiment 2 • 8/2002 Write-up • 9/2002 Finalise report