
Annotation for Hindi PropBank



  1. Annotation for Hindi PropBank

  2. Outline
     • Introduction to the project
     • Basic linguistic concepts
        • Verb & Argument
        • Making information explicit
        • Null arguments
     • Tasks to be carried out
     • Tools for annotation
     • Timesheets, tips
     • Practice

  3. Creation of Resources
     • For machines rather than humans
        • Imagine a dictionary/thesaurus for computers
     • A requirement for Natural Language Processing
     • Large annotated resources
        • Annotation means adding linguistic information to text
        • Tailored to language-specific requirements
        • Needs to be as consistent as possible
     • Used for applications such as Semantic Role Labelling, Parsing and Word Sense Disambiguation

  4. Hindi-Urdu Treebank Project
     • One of the first efforts to build a large-scale resource for Hindi-Urdu
     • Similar resources exist for Chinese, Arabic and English
     • Three main components:
        • Hindi-Urdu dependency treebank
        • Hindi-Urdu PropBank
        • Hindi-Urdu phrase structure treebank [derived]

  5. PropBank
     • PropBank resource creation at CU Boulder
     • We annotate semantic information on top of syntactic information
     • PropBank involves annotation of predicate argument structure
        • Mainly concerned with verbs and their arguments
        • And the semantic nature of those arguments

  6. What are verbs?
     • Verbs are predicating elements, e.g. daud 'run', pii 'drink', baras 'rain'
     • They encode (very broadly) actions and states
     • They also carry two kinds of grammatical information:
        • Tense and aspect (present, future; perfect, continuous)
        • Gender, number and person (masc/fem; sg/pl; 1st, 2nd, 3rd)

  7. What are arguments?
     • In a sentence such as Ram ate an apple / Raam ne seb khaaya:
        • The verb 'eat' / 'khaa' is the PREDICATE
        • The person eating, 'Raam', is an ARGUMENT
        • The thing eaten, 'apple' / 'seb', is an ARGUMENT
     • Without its arguments, the meaning of the verb 'ate' is not completely realized
     • Together, the verb and its arguments make up the predicate argument structure of the sentence
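
The predicate argument structure described on this slide can be pictured as a small data structure: one predicate plus a list of labelled arguments. A minimal Python sketch; the class names are hypothetical, and the ARG0/ARG1 labels are borrowed from the numbering the deck introduces when identifying arguments. This is not the project's storage format.

```python
from dataclasses import dataclass, field


@dataclass
class Argument:
    """One argument of a predicate: its surface text and a role label."""
    text: str
    label: str  # e.g. "ARG0" for the eater, "ARG1" for the thing eaten


@dataclass
class PredicateArgumentStructure:
    """A verb (the predicate) together with the arguments it takes in one sentence."""
    predicate: str
    arguments: list[Argument] = field(default_factory=list)

    def describe(self) -> str:
        # Render as predicate(LABEL=text, ...) in the order the arguments were added
        args = ", ".join(f"{a.label}={a.text}" for a in self.arguments)
        return f"{self.predicate}({args})"


# "Raam ne seb khaaya" / "Ram ate an apple"
pas = PredicateArgumentStructure(
    predicate="khaa",
    arguments=[Argument("Raam", "ARG0"), Argument("seb", "ARG1")],
)
print(pas.describe())  # khaa(ARG0=Raam, ARG1=seb)
```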

  8. Arguments show what’s important
     • Raam ne jaldi se seb khaaya ('Raam ate the apple quickly')
        • Raam and seb are arguments
        • But 'jaldi se' ('quickly') is not
     • It’s all about the verb
        • The verb projects its need for certain arguments
        • Annotation sifts what’s mandatory from what’s optional
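
One way to picture "the verb projects its need for certain arguments" is a frame listing the roles a verb requires; anything else in the sentence is optional, and a required role with no phrase is missing. A minimal sketch, assuming a hypothetical `REQUIRED_ROLES` dictionary; real PropBank frame files are much richer than this.

```python
# Hypothetical frame inventory: for each verb, the role labels it requires.
# An illustrative assumption only, not an actual PropBank frame file.
REQUIRED_ROLES = {"khaa": {"ARG0", "ARG1"}}


def check_frame(verb: str, labeled_phrases: list[tuple[str, str]]) -> tuple[list[str], list[str]]:
    """Return (optional phrases, missing required roles) for one clause."""
    required = REQUIRED_ROLES[verb]
    present = {label for _, label in labeled_phrases}
    # Phrases whose label the frame does not require are optional (e.g. 'jaldi se')
    optional = [phrase for phrase, label in labeled_phrases if label not in required]
    # Required roles with no phrase in the clause are missing
    missing = sorted(required - present)
    return optional, missing


# "Raam ne jaldi se seb khaaya": all required roles present, 'jaldi se' optional
phrases = [("Raam", "ARG0"), ("jaldi se", "ARG-M"), ("seb", "ARG1")]
print(check_frame("khaa", phrases))  # (['jaldi se'], [])
```

Dropping "Raam" from the clause would make the function report `['ARG0']` as missing, which is exactly the kind of gap the null-argument step has to fill.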

  9. Like Unix commands
     • Some commands require only one argument; others require two:
        • cd /home/student/ashwini
        • cp hmwk1.txt hmwk2.txt
     • If the command is typed with too many or too few arguments…

  10. Error!
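
The Unix analogy can be made concrete: a command, like a verb, expects a fixed number of arguments, and a mismatch is an error. A minimal Python sketch; the `ARITY` table and `check_arity` are hypothetical, and the counts are a simplification (real `cd` accepts zero or one operand, and real `cp` accepts two or more).

```python
# Simplified arity table, an assumption for illustration:
# real cd takes 0 or 1 operand and real cp takes 2 or more.
ARITY = {"cd": 1, "cp": 2}


def check_arity(command_line: str) -> str:
    """Compare the number of arguments typed with the number the command expects."""
    name, *args = command_line.split()
    expected = ARITY[name]
    if len(args) != expected:
        return f"Error! {name} expects {expected} argument(s), got {len(args)}"
    return "ok"


print(check_arity("cd /home/student/ashwini"))  # ok
print(check_arity("cp hmwk1.txt hmwk2.txt"))    # ok
print(check_arity("cp hmwk1.txt"))              # Error! cp expects 2 argument(s), got 1
```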

  11. Making information explicit
     • As speakers of Hindi or English, we already know the predicate argument structure of verbs
        • E.g. hari ___ pahuMcaa ('Hari reached ___')
     • Capturing this knowledge for the machine is essential
        • Ram ne seb khaayaa aur paani piyaa ('Ram ate an apple and drank water')
        • Who drank the water?

  12. Identify arguments
     • In PropBank, we first identify the arguments of a verb
        • When explicitly present, they are labelled ARG
        • Further, they are numbered ARG0, ARG1, ARG2, etc.
     • Often, a verb has numbered ARGs as well as ARG-M (modifier) labels
        • [Ram]ARG0 ne [jaldi se]ARG-M [seb]ARG1 khaaya
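
To work with labelled sentences programmatically, the spans and their labels have to be pulled out of the annotated text. A minimal Python sketch, assuming a hypothetical inline notation in which each argument span is bracketed and immediately followed by its label, e.g. `[Ram]ARG0 ne [jaldi se]ARG-M [seb]ARG1 khaaya`; the notation is chosen here for illustration and is not the Sanchay file format.

```python
import re

# Hypothetical inline notation: "[span]LABEL", where LABEL is ARG-M or ARG followed by digits.
LABELED_SPAN = re.compile(r"\[([^\]]+)\](ARG-M|ARG\d+)")


def extract_arguments(annotated: str) -> list[tuple[str, str]]:
    """Return (span, label) pairs in the order they appear in the sentence."""
    # With two capture groups, re.findall returns a list of (group1, group2) tuples
    return LABELED_SPAN.findall(annotated)


sentence = "[Ram]ARG0 ne [jaldi se]ARG-M [seb]ARG1 khaaya"
print(extract_arguments(sentence))
# [('Ram', 'ARG0'), ('jaldi se', 'ARG-M'), ('seb', 'ARG1')]
```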

  13. Null arguments
     • What if arguments are not explicit?
        • E.g. Ram ne seb khaayaa aur ___ paani piyaa
        • Ram is also the person drinking water
        • The subject can be dropped because of the conjunction aur
     • For the machine, it must be retrieved from the sentence
     • We therefore also mark these missing, or null, arguments
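
The idea behind null argument insertion for a coordinated sentence like "Ram ne seb khaayaa aur ___ paani piyaa" can be sketched in code: when a later conjunct has no ARG0, insert a null element that points back to the ARG0 of the first conjunct. A minimal sketch under stated assumptions: `insert_null_args` and the `NULL(=...)` marker are hypothetical, and "copy the first conjunct's ARG0" is a deliberate simplification; in the project this decision is made by annotators in Sanchay.

```python
def insert_null_args(conjuncts: list[dict]) -> list[dict]:
    """For clauses joined by 'aur', fill a missing ARG0 in a later conjunct with a
    hypothetical NULL marker co-indexed with the first conjunct's ARG0."""
    first_arg0 = conjuncts[0].get("ARG0")
    result = []
    for i, clause in enumerate(conjuncts):
        clause = dict(clause)  # copy so the input clauses are not mutated
        if i > 0 and "ARG0" not in clause and first_arg0 is not None:
            clause["ARG0"] = f"NULL(={first_arg0})"
        result.append(clause)
    return result


# "Ram ne seb khaayaa aur ___ paani piyaa"
sentence = [
    {"verb": "khaa", "ARG0": "Ram", "ARG1": "seb"},
    {"verb": "pii", "ARG1": "paani"},  # the drinker is dropped after 'aur'
]
for clause in insert_null_args(sentence):
    print(clause)
# {'verb': 'khaa', 'ARG0': 'Ram', 'ARG1': 'seb'}
# {'verb': 'pii', 'ARG1': 'paani', 'ARG0': 'NULL(=Ram)'}
```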

  14. Tasks to be carried out
     • Null argument insertion
     • Argument annotation

  15. Tools to be used
     • Sanchay: GUI for annotators, used especially for null argument insertion
     • Use your verbs account to access Sanchay
     • Wiki for annotator resources

  16. Timesheets & tips
     • Filling out timesheets honestly is very important
        • We can assess the amount of time you spend on verbs
        • I will ask you to keep track of the number of annotations per hour, to cross-check
     • Turn in the timesheets at my CINC mailbox, in physical form and with your signature
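
The annotations-per-hour cross-check is simple arithmetic: total annotations divided by total hours. A minimal Python sketch; `Session` and `annotations_per_hour` are hypothetical names, and the numbers in the example are made up purely to show the calculation.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One timesheet entry: hours worked and annotations completed."""
    hours: float
    annotations: int


def annotations_per_hour(sessions: list[Session]) -> float:
    """Overall rate across sessions: total annotations / total hours (0.0 if no hours)."""
    total_hours = sum(s.hours for s in sessions)
    if total_hours == 0:
        return 0.0
    return sum(s.annotations for s in sessions) / total_hours


# Made-up entries: 48 annotations over 3.5 hours
week = [Session(hours=2.0, annotations=30), Session(hours=1.5, annotations=18)]
print(round(annotations_per_hour(week), 1))  # 13.7
```

Using totals rather than averaging per-session rates keeps a short session from being weighted as heavily as a long one.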

  17. Practice
     • We need to learn about four kinds of empty categories
     • Plan to proceed:
        • Recognizing syntactic constructions
        • Getting familiar with the tool
        • Practice with the corpus
        • Q & A based on null argument insertion
