
RECOGNISING NOMINALISATIONS

Supervisors: Dr. Alex Lascarides, Dr. Mirella Lapata. (Andrew) Yuk On KONG, University of Edinburgh.

Presentation Transcript


  1. RECOGNISING NOMINALISATIONS • Supervisors: Dr. Alex Lascarides Dr. Mirella Lapata • (Andrew) Yuk On KONG • University of Edinburgh

  2. DEFINITION • “Nominalisation refers to the process of forming a noun from some other word-class (e.g. red+ness) • or (in classical transformational grammar especially) the derivation of a noun phrase from an underlying clause (e.g. Her answering of the letter… from She answered the letter). • The term is also used in the classification of relative clauses (e.g. What concerns me is her attitude)…” (Crystal 1997)

  3. Only nominalisations in the first sense, and only those derived from verbs, are considered here, e.g. "statement" from "state". • Problem: given a word, is it a noun, and is it derived from a verb or not? • Nominalisations derived from verbs are very productive in English and are usually created by suffixation (i.e., suffixes that form nouns are attached to verb bases).

  4. EXCLUSIONS • Nominals, e.g. the poor, the wounded • Nominalisations not derived from verbs, e.g. redness • -ing forms, e.g. the making of the movie • Antidisestablish-ment-arian-ism

  5. REGULAR? • nominalise → nominalisation • interpret → interpretation • interrupt → interruption • associate → association • delete → deletion • break → breakage • leak → leakage

  6. confine → confinement • refine → refinement (but define → definition) • submit → submission • admit → admission (but also admittance) • remit → remission; remittance; remit

  7. VERB=NOUN • debate → debate (not debation); debater • pay → pay • love → love • boss → boss • stand → stand • purchase → purchase • lie → lie (“tell a lie”; cf. lie down)

  8. VERB=NOUN (except stress) • transfer → transfer • transport → transport • import → import • rebel → rebel; (rebellion)

  9. 1 VERB, >1 NOUNS • collect → collection; collector • interpret → interpretation; interpreter • cover → cover; coverage • conduct → conduction; conductor • depend → dependant/dependent; dependence; dependency

  10. SEMANTICS • conduct → conduction (conduct electricity/heat) • conduct → conduct (behave/organise)

  11. WHEN TO USE WHICH SUFFIX • -tion/-sion • -er/-or • debate → debater • talk → talker • collect → collector • conduct → conductor

  12. IRREGULAR NOMINALISATION • choose → choice • succeed → success; succession; successor • decide → decision • sell → sale

  13. PSEUDO-NOMINALISATION • mote?? → motion (“mote” is a noun: a very small piece of dust) • depart → departure; department??? • apart → apartment????

  14. WHY BOTHER? • The identification of nominalisations and their associated verbs (e.g. "statement" and "state") is important for a number of NLP tasks: • machine translation • information retrieval • automatic learning of machine-readable dictionaries • grammar induction

  15. HOW? • Nominalisation is a productive morphological phenomenon: • list all acceptable nominalised forms? • what about new words?

  16. techniques NOT focusing on nominalisations • build rules • machine-learning approaches to induce morphological structures using large corpora • knowledge-free induction of inflectional morphologies (Schone and Jurafsky 2001).

  17. SCHONE AND JURAFSKY (2001) • Schone and Jurafsky (2001) worked on acquiring cognates and morphological variants. • Induced semantics: Latent Semantic Analysis (LSA) • Induced orthographic info • Induced syntactic info • Transitive information • Affix frequencies

  18. GOAL OF THIS STUDY • The principal goal of this project is to develop a system which can recognise nominalisations, together with the verbs from which they are derived.

  19. EXPERIMENT 1 (baseline) • identify nouns using the POS tags in the corpus • select potential nominalisations from the list of nouns using a list of nominalisation suffixes • for each one, find the candidate verb: the verb (among words tagged as verbs) that shares with it the greatest number of letters in sequence • accept a nominalisation-verb pair if more than 50% of the letters match; discard all others
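The baseline steps could be sketched roughly as follows. This is only an illustration under assumptions the slides do not settle: the suffix list is hypothetical, "letters in sequence" is read as the longest contiguous run of letters shared by both words, and the 50% figure is measured against the length of the noun. The names `SUFFIXES`, `shared_run` and `baseline_pairs` are made up for the example.

```python
from difflib import SequenceMatcher

# Hypothetical suffix list; the slides do not give the list actually used.
SUFFIXES = ("tion", "sion", "ment", "age", "ance", "ence", "er", "or")


def shared_run(noun, verb):
    """Length of the longest run of letters occurring contiguously in both words."""
    matcher = SequenceMatcher(None, noun, verb, autojunk=False)
    return matcher.find_longest_match(0, len(noun), 0, len(verb)).size


def baseline_pairs(nouns, verbs, threshold=0.5):
    """Map each suffixed noun to its best-matching verb if the match is good enough."""
    pairs = {}
    for noun in nouns:
        # Keep only nouns that end in one of the nominalisation suffixes.
        if not noun.endswith(SUFFIXES):
            continue
        # Candidate verb: the one sharing the longest letter run with the noun.
        best = max(verbs, key=lambda v: shared_run(noun, v), default=None)
        if best is None:
            continue
        # Assumption: the percentage is taken relative to the noun's length.
        if shared_run(noun, best) / len(noun) > threshold:
            pairs[noun] = best
    return pairs


print(baseline_pairs(["statement", "collection", "table", "garage"],
                     ["state", "collect", "go"]))
```

Under these assumptions a pair such as "department"/"depart" also passes the threshold (6 of 10 letters), which is the kind of pseudo-nominalisation that a purely orthographic baseline cannot rule out.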

  20. EXPERIMENT 2 • use a decision tree to build a model • possible features include: letter similarity between verbs and nouns; suffix frequency; verb frequency; verb semantics; subject of noun; subject of verb
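As a rough illustration of what a feature vector for one candidate pair might look like, the sketch below computes three of the listed features. It covers feature extraction only; training the decision tree is left out. Everything here is an assumption made for the example: the suffix list, the use of `difflib`'s similarity ratio as "letter similarity", and the function name `pair_features`. The semantic and subject features are omitted because the slides do not say how they would be obtained.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical suffix list, as in the baseline; not taken from the slides.
SUFFIXES = ("tion", "sion", "ment", "age", "ance", "er", "or")


def pair_features(noun, verb, suffix_counts, verb_counts):
    """Build a small feature dictionary for one (noun, verb) candidate pair."""
    # First suffix in the list that the noun ends with, or None if there is none.
    suffix = next((s for s in SUFFIXES if noun.endswith(s)), None)
    return {
        # Assumption: similarity ratio in [0, 1] stands in for letter similarity.
        "letter_similarity": SequenceMatcher(None, noun, verb, autojunk=False).ratio(),
        # Corpus counts are passed in; a Counter yields 0 for unseen keys.
        "suffix_frequency": suffix_counts[suffix],
        "verb_frequency": verb_counts[verb],
    }


# Made-up counts, purely for illustration.
suffix_counts = Counter({"tion": 120, "ment": 45})
verb_counts = Counter({"collect": 30})
print(pair_features("collection", "collect", suffix_counts, verb_counts))
```

Dictionaries of this shape could then be passed to whichever decision-tree learner is chosen, with pairs from a gold-standard lexicon serving as positive examples.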

  21. EVALUATION • experiments will be based on the BNC corpus. • The obtained nominalisations will be evaluated against the CELEX morphological lexicon and manually annotated data. • Precision, recall and F-score
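The three measures can be computed from a set of predicted pairs and a set of gold-standard pairs, as in this minimal sketch. The function name and the toy data are illustrative only, and the F-score shown is the balanced F1; the slides do not say which weighting is intended.

```python
def precision_recall_f(predicted, gold):
    """Return (precision, recall, F1) for predicted vs. gold (noun, verb) pairs."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    # Precision: share of predicted pairs that are in the gold standard.
    precision = correct / len(predicted) if predicted else 0.0
    # Recall: share of gold-standard pairs that were predicted.
    recall = correct / len(gold) if gold else 0.0
    # F1: harmonic mean of precision and recall (0 if both are 0).
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Toy data, made up for illustration.
predicted = {("statement", "state"), ("collection", "collect"), ("department", "depart")}
gold = {("statement", "state"), ("collection", "collect"), ("decision", "decide")}
print(precision_recall_f(predicted, gold))
```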

  22. BRITISH NATIONAL CORPUS • Over 100 million words • Corpus of modern English • Both spoken (10%) and written (90%) • Each word is automatically tagged by the CLAWS stochastic POS tagger • 65 different tags • encoded using SGML to represent POS tags and a variety of other structural properties of texts (e.g. headings, paragraphs, lists, etc.)

  23. <item> • <s n=086> • <w NN1-VVG>Shopping <w PRP>including <w NN1>collection <w PRF>of • <w NN2>prescriptions • </item> • <item> • <s n=087> • <w VVG>Daysitting <w CJC>and <w VVG>nightsitting • </item>
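A fragment like the one above could be turned into noun and verb lists along these lines. This is a sketch, not the project's actual preprocessing: the regular expression only handles the `<w TAG>word` pattern shown in the sample, ambiguity tags such as NN1-VVG are resolved by taking the first tag (an assumption), and only NN* and VV* tags are treated as nouns and lexical verbs.

```python
import re

# Matches "<w TAG>word" as in the sample; other SGML elements are ignored.
TOKEN = re.compile(r"<w ([A-Z0-9-]+)>([^<]+)")


def tagged_words(sgml):
    """Extract (tag, word) pairs from a BNC-style SGML fragment."""
    return [(tag, word.strip()) for tag, word in TOKEN.findall(sgml)]


def nouns_and_verbs(tokens):
    """Split tokens into lowercase noun and verb lists based on their tags."""
    nouns, verbs = [], []
    for tag, word in tokens:
        # Assumption: for an ambiguity tag such as NN1-VVG, use the first part.
        first = tag.split("-")[0]
        if first.startswith("NN"):
            nouns.append(word.lower())
        elif first.startswith("VV"):
            verbs.append(word.lower())
    return nouns, verbs


sample = ("<s n=086><w NN1-VVG>Shopping <w PRP>including <w NN1>collection "
          "<w PRF>of <w NN2>prescriptions")
print(nouns_and_verbs(tagged_words(sample)))
```

The resulting lists are the kind of input the baseline experiment needs: nouns to filter by suffix and verbs to match them against.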

  24. CELEX • English, Dutch and German • Annotated by humans using lemmata from two dictionaries of English • 52,446 lemmata and 160,594 wordforms • orthographic, phonological, morphological, syntactic and frequency information • morphological structure, e.g. ((celebrate),(ion))

  25. MILESTONES • 6/2002 Experiment 1—baseline • 7/2002 Experiment 2 • 8/2002 Write-up • 9/2002 Finalise report
