
Jan Hajič, Otakar Smrž, Petr Zemánek, Jan Šnaidauf, Emanuel Beška

Prague Arabic Dependency Treebank: Development in Data and Tools. Jan Hajič, Otakar Smrž, Petr Zemánek, Jan Šnaidauf, Emanuel Beška. Faculty of Mathematics and Physics, Faculty of Philosophy and Arts, Charles University in Prague. Project Release – PADT 1.0.


Presentation Transcript


  1. Prague Arabic Dependency Treebank: Development in Data and Tools
     Jan Hajič, Otakar Smrž, Petr Zemánek, Jan Šnaidauf, Emanuel Beška
     Faculty of Mathematics and Physics, Faculty of Philosophy and Arts, Charles University in Prague

  2. Project Release – PADT 1.0
     • December 2004, Linguistic Data Consortium
     • 148 000 Morpho, 113 500 Syntax

  3. Open-Source Tools
     • TrEd Tree Editor
        • Multi-purpose annotation environment
        • Suite of programming utilities
     • Netgraph Search Engine
        • Server/Client system architecture
        • Easy-to-learn query language
     • Encode::Arabic Perl Module
        • Extension for processing of Arabic script
        • ArabTeX, Buckwalter, Unicode, …
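     The slide names Buckwalter transliteration and Unicode among the encodings handled by Encode::Arabic. As a rough illustration of what such a conversion involves, here is a minimal Python sketch of a character-by-character Buckwalter/Unicode mapping. It is not the Encode::Arabic API (that module is Perl); the function names `buckwalter_to_unicode` and `unicode_to_buckwalter` are hypothetical, and passing unknown characters through unchanged is an assumption made for this sketch.

```python
# Standard Buckwalter transliteration table (ASCII symbol -> Arabic code point).
BUCKWALTER = {
    "'": "\u0621", "|": "\u0622", ">": "\u0623", "&": "\u0624",
    "<": "\u0625", "}": "\u0626", "A": "\u0627", "b": "\u0628",
    "p": "\u0629", "t": "\u062a", "v": "\u062b", "j": "\u062c",
    "H": "\u062d", "x": "\u062e", "d": "\u062f", "*": "\u0630",
    "r": "\u0631", "z": "\u0632", "s": "\u0633", "$": "\u0634",
    "S": "\u0635", "D": "\u0636", "T": "\u0637", "Z": "\u0638",
    "E": "\u0639", "g": "\u063a", "_": "\u0640", "f": "\u0641",
    "q": "\u0642", "k": "\u0643", "l": "\u0644", "m": "\u0645",
    "n": "\u0646", "h": "\u0647", "w": "\u0648", "Y": "\u0649",
    "y": "\u064a", "F": "\u064b", "N": "\u064c", "K": "\u064d",
    "a": "\u064e", "u": "\u064f", "i": "\u0650", "~": "\u0651",
    "o": "\u0652", "`": "\u0670", "{": "\u0671",
}

# The table is one-to-one, so it can simply be inverted.
UNICODE_TO_BW = {arabic: ascii_ for ascii_, arabic in BUCKWALTER.items()}


def buckwalter_to_unicode(text: str) -> str:
    """Map each Buckwalter symbol to Arabic script.

    Characters outside the table (hyphens, spaces, digits) are passed
    through unchanged; this is a choice made for the sketch.
    """
    return "".join(BUCKWALTER.get(ch, ch) for ch in text)


def unicode_to_buckwalter(text: str) -> str:
    """Inverse mapping, with the same pass-through behavior."""
    return "".join(UNICODE_TO_BW.get(ch, ch) for ch in text)


if __name__ == "__main__":
    word = "kitAban"  # form taken from the transcript's tree examples
    arabic = buckwalter_to_unicode(word)
    print(arabic)
    print(unicode_to_buckwalter(arabic) == word)
```

     A plain table lookup like this ignores the contextual handling that a full library provides; it only shows why the conversion between the two notations can be lossless.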

  4. PADT Functional Views
     • Functional Generative Description
        • Theory of linguistic meaning and its expression
        • Prague Dependency Treebank for Czech
     • Independence of representation levels
        • Tectogrammatical – linguistic meaning
        • Analytical – surface dependency syntax
        • Morphological – categories and lexical units
     • Abstraction of the relations across levels
        • Strict distinction between form and function
        • Different units of description on each level

  5. Functional Morphology
     • Provides syntax levels with their abstract language, not just giving letters in tokens
     • Revives multiple senses of categories
     • Completeness of generation
     • Strict modeling of grammatical control
     • MorphoTrees – ‘human tagging’
     • Successful prototype feature-based tagger

  6. Syntactic Levels of Description
     • Analytical level
        • Pragmatically motivated, close to surface syntax
        • Every single token resulting from the morphological level forms one node
        • Tree-like dependency structure for every sentence
     • Tectogrammatical level
        • Linguistic (literal) meaning, deep relations, TFA (topic-focus articulation)
        • Initial structures transformed from the analytical level (AL)
        • Nodes for autosemantic words only
        • Decisive role of valency frames

  7. Logic of Analytical Trees
     • Concepts of dependency and valency
     • Reduction: sentence must retain grammatical correctness if leaves (terminal nodes) are chopped off
     • Trees: clause components → clauses → sentences → paragraphs etc. Subtrees of clauses exchangeable for non-clauses
     • Nodes: words, tokenized parts of words, punctuation marks – marked by functions
     • Edges: syntactic relations – governing node → dependent node/subtree
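     The node, edge, and reduction ideas on this slide can be sketched as a small data structure. The Python below is only an illustration, not the PADT or TrEd data format: the class `Node` and the helpers `leaves` and `chop_leaf` are hypothetical names. The forms and function labels (dAma [Pred], iqtirAHu [Sb], sAEatayni [Adv]) come from the transcript's tree examples, but attaching both dependents directly under the predicate is an assumption, since the original tree layout is not preserved.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One token of the analytical tree, marked by its syntactic function."""
    form: str   # word, tokenized part of a word, or punctuation mark
    afun: str   # function label such as "Pred", "Sb", "Adv"
    children: List["Node"] = field(default_factory=list)  # dependents

    def add(self, child: "Node") -> "Node":
        """Create an edge from this governing node to a dependent node."""
        self.children.append(child)
        return child

    def is_leaf(self) -> bool:
        return not self.children


def leaves(root: Node) -> List[Node]:
    """Collect terminal nodes in left-to-right order."""
    if root.is_leaf():
        return [root]
    found: List[Node] = []
    for child in root.children:
        found.extend(leaves(child))
    return found


def chop_leaf(root: Node, form: str) -> bool:
    """Remove the first leaf below root with the given form.

    The root itself is never removed, mirroring the idea that the
    predicate heads the tree. Returns True if a leaf was removed.
    """
    for i, child in enumerate(root.children):
        if child.is_leaf() and child.form == form:
            del root.children[i]
            return True
        if chop_leaf(child, form):
            return True
    return False


# Assumed attachment: subject and adverbial both depend on the predicate.
tree = Node("dAma", "Pred")
tree.add(Node("iqtirAHu", "Sb"))
tree.add(Node("sAEatayni", "Adv"))

if __name__ == "__main__":
    print([n.form for n in leaves(tree)])
    chop_leaf(tree, "sAEatayni")  # reduction step: drop the adverbial leaf
    print([n.form for n in leaves(tree)])
```

     Whether the reduced structure is still a grammatical sentence is a linguistic judgment made by the annotator; the code only models the mechanical removal of a terminal node.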

  8. Some Syntax Issues of Arabic
     • Non-verbal predication of several types
     • Subordinate non-verbal clauses / modification
     • Verb-like behavior of many nominal forms
     • Mostly VSO in verbal sentences, but…
        • vice-versa in non-verbal clauses
        • different, depending on context boundness
     • Compound verbs, fixed composite prepositions
     • Grammatical co-reference, accusative of inner object, complex referencing, etc.

  9. Problem I: Predication
     • Head node of tree: PREDICATE
        • Why? Steady role in sentence, cannot be omitted
     • Verbal predicate: I-go to school
     • Non-verbal predicate
        • Nominal: The-house a-big (= the house is big)
        • Existential: There a-city (= there is a city)
        • Prepositional
           • Possessive: For him a-house (= he has a house)
           • Adverbial: The-mosque in the-city (= … is …)
        • Conjunctional: The-problem that (= … is that)

  10. Predication Types in Trees
      [Figure: dependency trees for the predication types; tree layout not preserved. Node labels are listed in extracted order as form [function] gloss.]
      Panels: Verbal; Nominal; Existential; Prepositional (possessive); Prepositional (adverbial, locative); Verb-like behavior (object of noun?)
      Nodes: dAma [Pred] lasted; kabIrun [Pnom] a-big [nom.]; iqtirAHu [Sb] proposal; sAEatayni [Adv] two-hours [acc.]; al-baytu [Sb] the-house [nom.]; -hu [Atr] his; al-EamalIyata [Obj] the-operation [acc.]; EalA [AuxP] on; vam~ata [PredE] there-is; zumalA’i [Obj] colleagues; la- [PredP] for; madInatun [Sb] a-city [nom.]; -hi [Atr] his; fI [PredP] in; -hu [Obj] him; baytun [Sb] a-house [nom.]; al-jAmiEu [Sb] the-mosque [nom.]; al-madInati [Adv] the-city [gen.]

  11. Problem II: Clauses & Co-reference
      • Recursiveness: subordinate clause is contained as subtree in place of simple element
         • Head-node of clause gets the same function
         • Problem: non-verbal structures – clauses or not?
         • Compound verbs (mA zAla etc.) treated equally
      • Grammatical co-reference: Personal pronoun formally required by another element
         • Pronoun must be marked to be treated as such
         • Target of reference is unambiguously identifiable
         • Often in subordinate clauses, mostly attributive
           Ex.: He-wrote a-book number its-pages hundred

  12. Clauses & Co-reference in Trees
      [Figure: dependency trees for clauses and co-reference; tree layout not preserved. Node labels are listed in extracted order as form [function] gloss.]
      Panels: Compound verb, formed as main verb and its complement; Attributive clause, prepositional predicate (adverbial); Objective clause, verbal predicate; Attributive clause, nominal predicate; Referencing pronoun, as attribute in clause; Referencing pronoun, as adverbial in clause
      Nodes: zAlat [Pred] she-stopped; kataba [Pred] he-wrote; kitAban [Obj] a-book; mA [AuxM] not; tuHis~u [Atv] she-feels; al-rajulu [Sb] the-man [nom.]; fI [Atr_PredP] in; zaynabu [Sb] Zaynab; mi’atu [Sb] hundred [nom.]; anna [AuxC] that; -hi [Adv_Ref] it; tuEjibu [Obj_Pred] they-impress; SafHatin [Atr] pages [gen.]; jumalan [Sb] sentences [acc.]; wADiHun [Atr_Pnom] clear [nom.]; naHwu [Sb] grammar [nom.]; -hA [Obj] her; -hA [Atr_Ref] their

  13. Future Prospects
      • Implementation of Functional Morphology
      • Tectogrammatical annotation
      • Lexicons of valency frames
      • Re-training the feature-based tagger on MorphoTrees
      • Machine-learning on the treebank data for various purposes

  14. Thank you
      Questions welcome!
      http://ckl.mff.cuni.cz/padt/
