
Machine Learning 2




Presentation Transcript


  1. Machine Learning 2 Inductive Dependency Parsing Joakim Nivre

  2. Inductive Dependency Parsing
• Dependency-based representations …
  • have restricted expressivity but provide a transparent encoding of semantic structure.
  • have restricted complexity in parsing.
• Inductive machine learning …
  • is necessary for accurate disambiguation.
  • is beneficial for robustness.
  • makes (formal) grammars superfluous.

  3. Dependency Graph
[Figure: an example dependency graph; the arc labels shown were ROOT, SBJ, OBJ, PMOD, P, and four instances of NMOD.]
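The figure's label multiset (ROOT, SBJ, OBJ, PMOD, P, NMOD ×4) matches the example sentence used throughout Nivre's work, "Economic news had little effect on financial markets." Assuming that sentence, here is a minimal sketch of the graph as data, with 1-based token indices and 0 as the artificial root:

    from dataclasses import dataclass

    @dataclass
    class Token:
        form: str     # word form
        head: int     # index of the head token (1-based; 0 = artificial root)
        deprel: str   # dependency type of the arc from head to this token

    # Hypothetical reconstruction of the slide's example graph.
    sentence = [
        Token("Economic",  2, "NMOD"),
        Token("news",      3, "SBJ"),
        Token("had",       0, "ROOT"),
        Token("little",    5, "NMOD"),
        Token("effect",    3, "OBJ"),
        Token("on",        5, "NMOD"),
        Token("financial", 8, "NMOD"),
        Token("markets",   6, "PMOD"),
        Token(".",         3, "P"),
    ]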

  4. Key Ideas
• Deterministic: deterministic algorithms for building dependency graphs (Kudo and Matsumoto 2002, Yamada and Matsumoto 2003, Nivre 2003); a transition-system sketch follows below
• History-based: history-based models for predicting the next parser action (Black et al. 1992, Magerman 1995, Ratnaparkhi 1997, Collins 1997)
• Discriminative: discriminative machine learning to map histories to actions (Veenstra and Daelemans 2000, Kudo and Matsumoto 2002, Yamada and Matsumoto 2003, Nivre et al. 2004)
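To make the first idea concrete, here is a minimal sketch of the arc-eager transition system of Nivre (2003). A configuration is (stack, buf, arcs), where `arcs` maps each dependent to its (head, label) pair:

    def shift(stack, buf, arcs):
        stack.append(buf.pop(0))              # push next input token

    def reduce_(stack, buf, arcs):            # trailing _ avoids the keyword
        assert stack[-1] in arcs              # top must already have a head
        stack.pop()

    def left_arc(stack, buf, arcs, label):
        assert stack[-1] not in arcs          # top must still be headless
        arcs[stack[-1]] = (buf[0], label)     # next token becomes head of top
        stack.pop()

    def right_arc(stack, buf, arcs, label):
        arcs[buf[0]] = (stack[-1], label)     # top becomes head of next token
        stack.append(buf.pop(0))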

  5. Guided Parsing
• Deterministic parsing:
  • Greedy algorithm for disambiguation
  • Optimal strategy given an oracle
• Guided deterministic parsing:
  • Guide = approximation of the oracle
• Desiderata:
  • High prediction accuracy
  • Efficient implementation (constant time)
• Solution: a discriminative classifier induced from treebank data (see the parsing-loop sketch below)
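A minimal sketch of the guided loop, reusing the transitions sketched above; `guide.predict` and `extract_features` are hypothetical stand-ins for the induced classifier and the feature model:

    def parse(tokens, guide):
        # Initial configuration: root on the stack, all tokens in the buffer.
        stack, buf, arcs = [0], list(range(1, len(tokens) + 1)), {}
        while buf:
            action, label = guide.predict(extract_features(stack, buf, arcs, tokens))
            if action == "LEFT-ARC" and stack[-1] != 0 and stack[-1] not in arcs:
                left_arc(stack, buf, arcs, label)
            elif action == "RIGHT-ARC":
                right_arc(stack, buf, arcs, label)
            elif action == "REDUCE" and stack[-1] in arcs:
                reduce_(stack, buf, arcs)
            else:
                shift(stack, buf, arcs)       # fall back when prediction is blocked
        return arcs

Every step makes one constant-time prediction, and each iteration either consumes a buffer token or pops the stack, so the whole pass is linear in sentence length.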

  6. Learning
• Classification problem (S → T):
  • Parser states: S = { s | s = (φ1, …, φp) }, i.e. states represented as feature vectors
  • Parser actions: T = { t1, …, tm }
• Training data: D = { (si−1, ti) | ti(si−1) = si in the gold-standard derivation s1, …, sn }
• Learning methods (instance extraction sketched below):
  • Memory-based learning
  • Support vector machines
  • Maximum entropy modeling
  • …
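A minimal sketch of how such a set D can be collected: replay each gold-standard derivation with an oracle and record a (state, action) pair before every transition. `oracle_action`, `apply_action`, and `extract_features` are hypothetical stand-ins for the arc-eager oracle, transition dispatch, and the feature model:

    def training_instances(tokens, gold_arcs):
        stack, buf, arcs = [0], list(range(1, len(tokens) + 1)), {}
        data = []
        while buf:
            state = extract_features(stack, buf, arcs, tokens)   # s_{i-1}
            action = oracle_action(stack, buf, arcs, gold_arcs)  # t_i
            data.append((state, action))
            apply_action(action, stack, buf, arcs)               # yields s_i
        return data

Any of the multi-class learners listed on the slide can then be trained on the resulting pairs.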

  7. Feature Models
[Figure: a parser state with stack (…, t1, top) and input (next, n1, n2, n3); hd, ld, and rd mark a token's head, leftmost dependent, and rightmost dependent.]
• Model P — PoS: t1, top, next, n1, n2
• Model D — P + DepTypes: t.hd, t.ld, t.rd, n.ld
• Model L2 — D + Words: top, next
• Model L4 — L2 + Words: top.hd, n1
(A feature-extraction sketch follows below.)
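A minimal sketch of these four models, under the following assumptions: t1 is the stack token below top; t.hd, t.ld, t.rd, and n.ld are the dependency types of the arc above top and of the leftmost/rightmost dependents of top and next; tokens is 1-based with a dummy root at index 0, each entry carrying .pos and .form; and arcs maps dependent → (head, label) as in the earlier sketches:

    def dep_label(arcs, i):
        return arcs[i][1] if i in arcs else "NIL"

    def leftmost_dep(arcs, i):
        deps = [d for d, (h, _) in arcs.items() if h == i and d < i]
        return min(deps) if deps else None

    def rightmost_dep(arcs, i):
        deps = [d for d, (h, _) in arcs.items() if h == i and d > i]
        return max(deps) if deps else None

    def extract_features(stack, buf, arcs, tokens, model="L2"):
        def pos(i):  return tokens[i].pos if i else "NIL"   # None or root -> NIL
        def word(i): return tokens[i].form if i else "NIL"
        top = stack[-1] if stack else None
        nxt = buf[0] if buf else None
        feats = {                                    # Model P: PoS features
            "t1.pos":   pos(stack[-2] if len(stack) > 1 else None),
            "top.pos":  pos(top),
            "next.pos": pos(nxt),
            "n1.pos":   pos(buf[1] if len(buf) > 1 else None),
            "n2.pos":   pos(buf[2] if len(buf) > 2 else None),
        }
        if model in ("D", "L2", "L4"):               # + dependency types
            feats["t.hd"] = dep_label(arcs, top)
            feats["t.ld"] = dep_label(arcs, leftmost_dep(arcs, top))
            feats["t.rd"] = dep_label(arcs, rightmost_dep(arcs, top))
            feats["n.ld"] = dep_label(arcs, leftmost_dep(arcs, nxt))
        if model in ("L2", "L4"):                    # + word forms
            feats["top.word"], feats["next.word"] = word(top), word(nxt)
        if model == "L4":                            # + more word forms
            head_of_top = arcs[top][0] if top in arcs else None
            feats["top.hd.word"] = word(head_of_top)
            feats["n1.word"] = word(buf[1] if len(buf) > 1 else None)
        return feats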

  8. Experimental Results (MBL)
• Results:
  – Dependency features help
  – Lexicalisation helps …
  – … up to a point (?)

  9. Parameter Optimization
• Learning algorithm parameter optimization: manual (Nivre 2005) vs. paramsearch (van den Bosch 2003); a grid-search sketch follows below
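paramsearch automates what manual tuning does by hand. A minimal sketch of the underlying idea, under illustrative assumptions: the MBL settings (k, distance metric) and the `evaluate` helper are hypothetical stand-ins, not paramsearch's actual interface, and paramsearch itself uses a smarter progressively sampled search rather than an exhaustive grid:

    from itertools import product

    def optimize(train, dev, evaluate):
        # Exhaustive search over an illustrative grid of learner settings,
        # keeping the best score on held-out data.
        best_params, best_score = None, -1.0
        for k, metric in product([1, 3, 5, 9, 15], ["overlap", "mvdm"]):
            score = evaluate(train, dev, k=k, metric=metric)  # hypothetical
            if score > best_score:
                best_params, best_score = (k, metric), score
        return best_params, best_score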

  10. Learning Curves
• Swedish: attachment score (U/L); models D and L2; 10K tokens/section
• English: attachment score (U/L); models D and L2; 100K tokens/section

  11. Dependency Types: Swedish
• High accuracy (labeled F ≥ 84%):
  IM (marker → infinitive) 98.5%
  PR (preposition → noun) 90.6%
  UK (complementizer → verb) 86.4%
  VC (auxiliary verb → main verb) 86.1%
  DET (noun → determiner) 89.5%
  ROOT 87.8%
  SUB (verb → subject) 84.5%
• Medium accuracy (76% ≤ labeled F < 80%):
  ATT (noun modifier) 79.2%
  CC (coordination) 78.9%
  OBJ (verb → object) 77.7%
  PRD (verb → predicative) 76.8%
  ADV (adverbial) 76.3%
• Low accuracy (labeled F < 70%): INF, APP, XX, ID

  12. Dependency Types: English
• High accuracy (labeled F ≥ 86%):
  VC (auxiliary verb → main verb) 95.0%
  NMOD (noun modifier) 91.0%
  SBJ (verb → subject) 89.3%
  PMOD (preposition modifier) 88.6%
  SBAR (complementizer → verb) 86.1%
• Medium accuracy (73% ≤ labeled F < 83%):
  ROOT 82.4%
  OBJ (verb → object) 81.1%
  VMOD (verb modifier) 76.8%
  AMOD (adjective/adverb modifier) 76.7%
  PRD (predicative) 73.8%
• Low accuracy (labeled F < 70%): DEP (null label)

  13. MaltParser
• Software for inductive dependency parsing:
  • Freely available for research and education (http://www.msi.vxu.se/users/nivre/research/MaltParser.html)
• Version 0.3:
  • Parsing algorithms: Nivre (2003) (arc-eager, arc-standard); Covington (2001) (projective, non-projective)
  • Learning algorithms: MBL (TiMBL); SVM (LIBSVM)
  • Feature models: arbitrary combinations of part-of-speech features, dependency type features, and lexical features
  • Auxiliary tools: MaltEval, MaltConverter, Proj

  14. CoNLL-X Shared Task

  15. Possible Projects
• CoNLL Shared Task:
  • Work on one or more languages
  • With or without MaltParser
  • Data sets available
• Parsing spoken language:
  • Talbanken05: Swedish treebank with written and spoken data; cross-training experiments
  • GSLC: a 1.2M-word corpus of spoken Swedish
