
Dependency Parsing



  1. Dependency Parsing Some slides are based on: PPT presentation on dependency parsing by Prashanth Mannem; Seven Lectures on Statistical Parsing by Christopher Manning

  2. Constituency parsing • Breaks the sentence into constituents (phrases), which are then broken into smaller constituents • Describes phrase structure and clause structure (NP, PP, VP, etc.) • Structures are often recursive

  3. [Constituency tree for "mom is an amazing show": (S (NP mom) (VP is (NP an amazing show)))]

  4. Dependency parsing • Syntactic structure consists of lexical items, linked by binary asymmetric relations called dependencies • Interested in grammatical relations between individual words (governing & dependent words) • Does not propose a recursive structure, but rather a network of relations • These relations can also have labels

  5. Dependency vs. Constituency • Dependency structures explicitly represent • Head-dependent relations (directed arcs) • Functional categories (arc labels) • Possibly some structural categories (parts of speech) • Constituency structures explicitly represent • Phrases (non-terminal nodes) • Structural categories (non-terminal labels) • Possibly some functional categories (grammatical functions)

  6. Dependency vs. Constituency • A dependency grammar has a notion of a head • Officially, CFGs don't • But modern linguistic theory and all modern statistical parsers (Charniak, Collins, …) do, via hand-written phrasal "head rules": • The head of a Noun Phrase is a noun/number/… • The head of a Verb Phrase is a verb/modal/… Based on a slide by Chris Manning

  7. Dependency vs. Constituency • The head rules can be used to extract a dependency parse from a CFG parse (follow the heads, as sketched below) • A phrase structure tree can be obtained from a dependency tree, but the resulting phrases are flat Based on a slide by Chris Manning
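To make "follow the heads" concrete, here is a minimal sketch in Python. The tree encoding and the head-rule table are toy assumptions, not Charniak's or Collins' actual rule sets: a tree is (label, [children]) for phrases and (tag, word) for pre-terminals, and each phrase's head child is the first child whose label matches the rule list.

HEAD_RULES = {
    "S":  ["VP", "NP"],          # the head of S is its VP's head (else the NP's)
    "NP": ["NN", "NNS", "NNP"],  # the head of a noun phrase is a noun
    "VP": ["VBZ", "VB", "VP"],   # the head of a verb phrase is a verb
}

def head_word(tree):
    label, rest = tree
    if isinstance(rest, str):                 # pre-terminal: (tag, word)
        return rest
    for wanted in HEAD_RULES.get(label, []):  # first matching head rule wins
        for child in rest:
            if child[0] == wanted:
                return head_word(child)
    return head_word(rest[0])                 # fallback: leftmost child

def dependencies(tree, deps=None):
    """Each non-head child's head word becomes a dependent of the phrase's head."""
    if deps is None:
        deps = []
    label, rest = tree
    if isinstance(rest, str):
        return deps
    h = head_word(tree)
    for child in rest:
        if head_word(child) != h:
            deps.append((h, head_word(child)))   # (head, dependent)
        dependencies(child, deps)
    return deps

# The tree from slide 3:
tree = ("S", [("NP", [("NN", "mom")]),
              ("VP", [("VBZ", "is"),
                      ("NP", [("DT", "an"), ("JJ", "amazing"), ("NN", "show")])])])
print(dependencies(tree))
# [('is', 'mom'), ('is', 'show'), ('show', 'an'), ('show', 'amazing')]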

  8. Definition: dependency graph • An input word sequence w1 … wn • Dependency graph G = (V, E) where • V is the set of nodes, i.e., word tokens in the input sequence • E is the set of unlabeled tree edges (i, j), i, j ∈ V • (i, j) indicates an edge from i (parent, head, governor) to j (child, dependent)

  9. Definition: dependency graph • A dependency graph is well-formed iff • Single head: Each word has only one head • Acyclic: The graph should be acyclic • Connected: The graph should be a single tree with all the words in the sentence • Projective: If word A depends on word B, then all words between A and B are also subordinate to B (i.e. dominated by B)
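The four conditions translate directly into code. A minimal sketch, assuming the graph is given as a dict heads mapping each word 1..n to its head (0 is an artificial root); the function names are illustrative, not a library's:

def is_well_formed(n, heads):
    # Single head: exactly one head is assigned to every word 1..n.
    if set(heads) != set(range(1, n + 1)):
        return False
    # Acyclic + connected: every head chain must terminate at the root.
    for j in range(1, n + 1):
        seen, i = set(), j
        while i != 0:
            if i in seen:                      # revisited a node: cycle
                return False
            seen.add(i)
            i = heads[i]
    # Projective: every word strictly between a head and its dependent
    # must be dominated by that head (slide 9's condition, B = head).
    def dominated_by(k, h):
        while True:
            if k == h:
                return True
            if k == 0:
                return False
            k = heads[k]
    for dep, head in heads.items():
        for k in range(min(dep, head) + 1, max(dep, head)):
            if not dominated_by(k, head):
                return False
    return True

# "mom is an amazing show": is = root; mom, show <- is; an, amazing <- show
print(is_well_formed(5, {1: 2, 2: 0, 3: 5, 4: 5, 5: 2}))  # True
print(is_well_formed(3, {1: 3, 2: 0, 3: 2}))  # False: arc 3 -> 1 crosses word 2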

  10. Non-projective dependencies Ram saw a dog yesterday which was a Yorkshire Terrier • The arc from dog to the relative clause (which was a Yorkshire Terrier) crosses the arc from saw to yesterday, so the tree is non-projective

  11. Parsing algorithms • Dependency-based parsers can be broadly categorized into • Grammar-driven approaches • Parsing done using grammars • Data-driven approaches • Parsing by training on annotated/unannotated data

  12. Unlabeled graphs • Dan Klein showed that labeling is relatively easy and that the difficulty of parsing lies in creating the bracketing (Klein, 2004) • Therefore some parsers run in two steps: 1) bracketing; 2) labeling

  13. Traditions • Dynamic programming • e.g., Eisner (1996), McDonald (2006) • Deterministic search • e.g., Covington (2001), Yamada and Matsumoto (2003), Nivre (2006) • Constraint satisfaction • e.g., Maruyama, Foth et al.

  14. Data driven • Two main approaches • Global, Exhaustive, Graph-based parsing • Local, greedy, transition-based parsing

  15. Graph-based parsing • Assume there is a scoring function s(i, j) for each candidate edge (i, j) • The score of a graph G = (V, E) is s(G) = Σ(i, j) ∈ E s(i, j) • Parsing for input string x is G* = argmax s(G), where the argmax ranges over all dependency graphs of x
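As a toy illustration of these formulas, the sketch below assumes an edge-factored score (the graph's score is the sum of its edge scores, as above), enumerates every head assignment for a tiny sentence, and takes the argmax by brute force; the scoring function is hand-picked here, whereas a real parser learns it:

from itertools import product

words = ["ROOT", "mom", "is", "an", "amazing", "show"]
n = len(words) - 1

def s(i, j):
    good = {(0, 2), (2, 1), (2, 5), (5, 3), (5, 4)}   # made-up edge scores
    return 1.0 if (i, j) in good else 0.0

def is_tree(heads):
    for j in heads:                    # every head chain must reach ROOT (0)
        seen, i = set(), j
        while i != 0:
            if i in seen:
                return False
            seen.add(i)
            i = heads[i]
    return True

candidates = (dict(enumerate(h, start=1)) for h in product(range(n + 1), repeat=n))
best = max((c for c in candidates if is_tree(c)),
           key=lambda heads: sum(s(i, j) for j, i in heads.items()))
print(best)   # {1: 2, 2: 0, 3: 5, 4: 5, 5: 2}, i.e. the tree from slide 3

Enumerating all (n+1)^n head assignments is hopeless beyond toy inputs, which is why exhaustive parsers use dynamic programming or MST algorithms instead.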

  16. MST algorithm (McDonald, 2006) • Scores are based on features, independent of other dependencies • Features can be • Head and dependent word and POS separately • Head and dependent word and POS bigram features • Words between head and dependent • Length and direction of dependency

  17. MST algorithm (McDonald, 2006) • Parsing can be formulated as a maximum spanning tree problem • Use the Chu-Liu-Edmonds (CLE) algorithm for the MST (runs in O(n²), considers non-projective arcs) • Uses online learning for determining the weight vector w
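For illustration, networkx ships an implementation of Edmonds' algorithm (nx.maximum_spanning_arborescence) that can stand in for CLE here; the edge weights below are invented for the example, whereas McDonald's parser learns them online from the features above:

import networkx as nx

words = {0: "ROOT", 1: "mom", 2: "is", 3: "a", 4: "show"}
scores = {
    (0, 2): 10, (2, 1): 9, (2, 4): 8, (4, 3): 7,   # arcs of the intended parse
    (0, 1): 2, (0, 3): 1, (0, 4): 3,               # lower-scored competing arcs
    (1, 2): 1, (4, 2): 2, (2, 3): 4, (1, 3): 2, (3, 4): 1,
}
G = nx.DiGraph()
for (i, j), w in scores.items():
    G.add_edge(i, j, weight=w)

# Maximum-weight spanning arborescence = highest-scoring dependency tree
mst = nx.maximum_spanning_arborescence(G)
for i, j in sorted(mst.edges()):
    print(words[i], "->", words[j])
# ROOT -> is, is -> mom, is -> show, show -> a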

  18. Transition-based parsing • A transition system for dependency parsing defines: • a set C of parser configurations, each of which defines a (partially built) dependency graph G • a set T of transitions, each a function t: C → C • for every sentence x = w0, w1, . . . , wn • a unique initial configuration c_x • a set Q_x of terminal configurations

  19. Transition sequence • A transition sequence C_x,m = (c_x, c_1, . . . , c_m) for a sentence x is a sequence of configurations such that c_m ∈ Q_x and, for every c_i (1 ≤ i ≤ m), there is a transition t ∈ T such that c_i = t(c_i−1), taking c_0 = c_x • The graph defined by c_m is the dependency graph of x

  20. Transition scoring function • The score of a transition t in a configuration c, s(c, t), represents the likelihood of taking transition t out of configuration c • Parsing is finding the optimal transition sequence (C*_x,m = argmax over transition sequences of Σ s(c_i−1, t_i))
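With s(c, t) in hand, greedy transition-based parsing is a short loop. A sketch under assumed interfaces (the transition objects' is_legal/apply methods and the configuration's graph attribute are hypothetical stand-ins, not a specific parser's API):

def greedy_parse(c, transitions, score, is_terminal):
    """Repeatedly apply the highest-scoring permissible transition."""
    while not is_terminal(c):
        legal = [t for t in transitions if t.is_legal(c)]
        c = max(legal, key=lambda t: score(c, t)).apply(c)
    return c.graph   # the dependency graph defined by the terminal configuration

A greedy parser commits to each choice; beam or exact search over transition sequences trades speed for a better approximation of the argmax.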

  21. Yamada and Matsumoto (2003) • A transition-based (shift-reduce) parser • Considers two adjacent words • Runs in iterations, and continues as long as new dependencies are created • In every iteration, considers 3 different actions and chooses one using an SVM (or another discriminative learning technique) • Time complexity: O(n²) • Accuracy was shown to be close to that of state-of-the-art algorithms (e.g., Eisner's)

  22. Y&M (2003) Actions • Shift • Left • Right

  23. Y&M (2003) Learning • Features (lemma, POS tag) are collected from the context

  24. Stack-based parsing • Introducing a stack and a buffer • The buffer is a queue of all input words (left to right) • The stack begins empty; words are pushed to the stack by the defined actions • Reduces Y&M complexity to linear time

  25. 2 stack-based parsers • Nivre's (2003, 2006) arc-standard • Left-arc: precondition is that the stack top i doesn't have a head already • Right-arc: precondition is that the buffer front j doesn't have a head already [stack/buffer diagram omitted]

  26. 2 stack-based parsers • Nivre's (2003, 2006) arc-eager [stack/buffer diagram omitted; the transitions are written out in the sketch below]
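Since the slide presents the transitions as a stack/buffer diagram, here they are written out as code. The four transitions follow the standard arc-eager definitions; the class itself is a sketch for this transcript, not MaltParser's API:

class ArcEager:
    def __init__(self, words):                 # words[0] is the artificial ROOT
        self.stack, self.buffer = [0], list(range(1, len(words)))
        self.arcs = set()                      # arcs as (head, dependent) pairs
        self.head = {}                         # dependent -> head

    def shift(self):                           # push the buffer front onto the stack
        self.stack.append(self.buffer.pop(0))

    def left_arc(self):                        # buffer front becomes head of stack top
        i, j = self.stack.pop(), self.buffer[0]
        assert i not in self.head              # precondition: i has no head yet
        self.arcs.add((j, i)); self.head[i] = j

    def right_arc(self):                       # stack top becomes head of buffer front
        i, j = self.stack[-1], self.buffer.pop(0)
        self.arcs.add((i, j)); self.head[j] = i
        self.stack.append(j)

    def reduce(self):                          # pop a word that already has a head
        assert self.stack[-1] in self.head
        self.stack.pop()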

  27. Example (arc-eager) • Sentence: _ROOT_ Red figures on the screen indicated falling stocks • S = stack, Q = queue (buffer) Borrowed from Dependency Parsing (P. Mannem)

  28–43. Example (continued) • Slides 28–43 step through the parse, showing the stack and queue after every transition (diagrams omitted); the transition sequence is: Shift, Left-arc, Shift, Right-arc, Shift, Left-arc, Right-arc, Reduce, Reduce, Left-arc, Right-arc, Shift, Left-arc, Right-arc, Reduce, Reduce Borrowed from Dependency Parsing (P. Mannem)
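Replaying this transition sequence with the ArcEager sketch from slide 26 recovers the parse; the printed arcs are the dependency graph of the sentence:

words = "ROOT Red figures on the screen indicated falling stocks".split()
parser = ArcEager(words)
for step in ["shift", "left_arc", "shift", "right_arc", "shift", "left_arc",
             "right_arc", "reduce", "reduce", "left_arc", "right_arc",
             "shift", "left_arc", "right_arc", "reduce", "reduce"]:
    getattr(parser, step)()

for head, dep in sorted(parser.arcs):
    print(words[head], "->", words[dep])
# ROOT -> indicated, figures -> Red, figures -> on, on -> screen, screen -> the,
# indicated -> figures, indicated -> stocks, stocks -> falling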

  44. Graph (MSTParser) vs. Transitions (MaltParser) • Accuracy on different languages [chart omitted] From: Characterizing the Errors of Data-Driven Dependency Parsing Models (McDonald and Nivre, 2007)

  45. Graph (MSTParser) vs. Transitions (MaltParser) • Sentence length vs. accuracy [chart omitted] From: Characterizing the Errors of Data-Driven Dependency Parsing Models (McDonald and Nivre, 2007)

  46. Graph (MSTParser) vs. Transitions (MaltParser) • Dependency length vs. precision [chart omitted] From: Characterizing the Errors of Data-Driven Dependency Parsing Models (McDonald and Nivre, 2007)

  47. Known Parsers • Stanford (constituency + dependency) • MaltParser (dependency) • MSTParser (dependency) • Hebrew • Yoav Goldberg’s parser (http://www.cs.bgu.ac.il/~yoavg/software/hebparsers/hebdepparser/)
