
Automatic classification for implicit discourse relations


Presentation Transcript


  1. Automatic classification for implicit discourse relations Lin Ziheng

  2. PDTB and discourse relations • Explicit relations • Arg1: The bill intends to restrict the RTC to Treasury borrowings only, Arg2: unless the agency receives specific congressional authorization. (Alternative) (wsj_2200) • Implicit relations • Arg1: The loss of more customers is the latest in a string of problems. • Arg2: [for instance] Church's Fried Chicken Inc. and Popeye's Famous Fried Chicken Inc., which have merged, are still troubled by overlapping restaurant locations. (Instantiation) (wsj_2225)

  3. PDTB and discourse relations (2) • PDTB hierarchy of relation classes, types and subtypes

  4. PDTB and discourse relations (3) • Level-2 relation types, on the implicit dataset from the training sections (sec. 2–21) • Remove Condition, Pragmatic Condition, Pragmatic Contrast, Pragmatic Concession and Exception • 11 relation types remain • Dominating types: • Cause • Conjunction • Restatement
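As a concrete illustration of this preprocessing step, here is a minimal sketch of filtering the implicit training instances down to the 11 remaining level-2 types. The (label, arg1, arg2) tuple format is a hypothetical stand-in for however the PDTB instances are actually loaded; the slide does not specify it.

```python
# Minimal sketch of the level-2 type filtering step (instance format is assumed).
REMOVED_TYPES = {"Condition", "Pragmatic Condition", "Pragmatic Contrast",
                 "Pragmatic Concession", "Exception"}

def keep_instance(level2_type: str) -> bool:
    """Keep only the 11 level-2 relation types used in the experiments."""
    return level2_type not in REMOVED_TYPES

# Toy example:
instances = [("Cause", "arg1 text", "arg2 text"),
             ("Exception", "arg1 text", "arg2 text")]
kept = [inst for inst in instances if keep_instance(inst[0])]
print(kept)  # only the Cause instance survives
```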

  5. Contextual features • Arg1: Tokyu Department Store advanced 260 to 2410. Arg2: [and] Tokyu Corp. was up 150 at 2890. (List) (wsj_0374) • Arg1: Tokyu Department Store advanced 260 to 2410. Tokyu Corp. was up 150 at 2890. Arg2: [and] Tokyu Construction gained 170 to 1610. (List) (wsj_0374) • [Figures: shared argument, where r1.Arg2 and r2.Arg1 are the same span; fully embedded argument, where one relation lies entirely within an argument of the other]

  6. Contextual features (2) • For each relation curr, look at the two surrounding relations prev and next, giving a total of six features • [Figures: the two configurations from the previous slide, shown with curr = r2]
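A minimal sketch of how such contextual features could be computed, assuming each argument is available as a character span. The slide does not spell out the exact six feature definitions, so the three indicators per neighbouring pair below (shared argument, either relation embedded in the other's argument) are an illustrative choice based on the two figure configurations.

```python
from typing import Dict, NamedTuple, Optional, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive

class Relation(NamedTuple):
    arg1: Span
    arg2: Span

def _contains(outer: Span, inner: Span) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def _full_span(r: Relation) -> Span:
    return (min(r.arg1[0], r.arg2[0]), max(r.arg1[1], r.arg2[1]))

def pair_features(left: Relation, right: Relation, prefix: str) -> Dict[str, bool]:
    """Binary features for one ordered pair of adjacent relations."""
    return {
        prefix + "_shared_arg": left.arg2 == right.arg1,                 # shared argument
        prefix + "_right_embedded": _contains(left.arg2, _full_span(right)),
        prefix + "_left_embedded": _contains(right.arg1, _full_span(left)),
    }

def contextual_features(prev: Optional[Relation], curr: Relation,
                        next_: Optional[Relation]) -> Dict[str, bool]:
    """Features from the previous and next relations around curr."""
    feats: Dict[str, bool] = {}
    if prev is not None:
        feats.update(pair_features(prev, curr, "prev"))
    if next_ is not None:
        feats.update(pair_features(curr, next_, "next"))
    return feats

# Toy spans mirroring the List example: r2.Arg1 is the same span as r1.Arg2.
r1 = Relation(arg1=(0, 45), arg2=(46, 80))
r2 = Relation(arg1=(46, 80), arg2=(81, 120))
print(contextual_features(r1, r2, None))
```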

  7. Syntactic Features • Arg1: "The HUD budget has dropped by more than 70% since 1980," argues Mr. Colton. Arg2: [so] "We've taken more than our fair share. (Cause) (wsj_2227)

  8. Syntactic Features (2) • Collect all production rules: • Ignore function tags, such as -TPC, -SBJ, -EXT • From Arg1: S → NP VP, NP → DT NNP NN, VP → VBZ VP, VP → VBN PP PP, PP → IN NP, NP → QP NN, QP → JJ IN CD, NP → CD, DT → The, NNP → HUD, NN → budget, VBZ → has, VBN → dropped, IN → by, JJ → more, IN → than, CD → 70, NN → %, IN → since, CD → 1980 • From Arg2: S → `` NP VP ., NP → PRP, VP → VBP VP, VP → VBN NP, NP → NP PP, NP → JJR, PP → IN NP, NP → PRP$ JJ NN, `` → ``, PRP → We, VBP → 've, VBN → taken, JJR → more, IN → than, PRP$ → our, JJ → fair, NN → share, . → .
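A sketch of the production-rule extraction. It uses nltk's Tree as a stand-in reader for bracketed parses (an assumption; the slide does not say which toolkit is used), and strips function tags such as -TPC, -SBJ and -EXT from node labels before the rules are collected. The parse in the example is a toy tree, not the actual wsj_2227 parse.

```python
from nltk import Tree  # assumption: any bracketed-parse reader would do

def strip_function_tags(tree: Tree) -> None:
    """Drop function tags like -SBJ, -TPC, -EXT from non-terminal labels, in place."""
    for sub in tree.subtrees():
        bare = sub.label().split("-")[0]
        if bare:  # keep special labels such as -NONE- or -LRB- intact
            sub.set_label(bare)

def production_rules(parse_str: str) -> set:
    """Collect all production rules (including lexical ones) from a bracketed parse."""
    tree = Tree.fromstring(parse_str)
    strip_function_tags(tree)
    return {"%s -> %s" % (p.lhs(), " ".join(str(sym) for sym in p.rhs()))
            for p in tree.productions()}

# Toy parse:
print(production_rules(
    "(S (NP-SBJ (DT The) (NN budget)) (VP (VBZ has) (VP (VBN dropped))))"))
```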

  9. Dependency features

  10. Dependency features (2) • Collect all words with dependency types from their dependents • From Arg1: budget ← det nn, dropped ← nsubj aux prep prep, by ← pobj, than ← advmod, 70 ← quantmod, % ← num, since ← pobj, argues ← ccomp nsubj, Colton ← nn • From Arg2: taken ← nsubj aux dobj, more ← prep, than ← pobj, share ← poss amod
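A sketch of the dependency-rule extraction. The (head, dependency-type, dependent) triple format is an assumed interface; the output of any dependency parser can be mapped onto it, and the toy triples below cover only a fragment of the Arg1 analysis on the slide.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

def dependency_rules(deps: Iterable[Tuple[str, str, str]]) -> Set[str]:
    """One feature per head word: the word plus the dependency types of its
    dependents, e.g. 'dropped <- nsubj aux prep prep'."""
    types_by_head = defaultdict(list)
    for head, dep_type, _dependent in deps:
        types_by_head[head.lower()].append(dep_type)
    return {"%s <- %s" % (head, " ".join(types))
            for head, types in types_by_head.items()}

# Toy fragment of the Arg1 dependencies:
deps = [("dropped", "nsubj", "budget"), ("dropped", "aux", "has"),
        ("dropped", "prep", "by"), ("dropped", "prep", "since"),
        ("budget", "det", "The"), ("budget", "nn", "HUD")]
print(dependency_rules(deps))
```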

  11. Lexical features • Collect all word pairs from Arg1 and Arg2, i.e., all (wi, wj) where wi is a word from Arg1 and wj is a word from Arg2
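The word-pair features are simply the Cartesian product of the two arguments' tokens. A one-function sketch (lower-casing the tokens is an assumption):

```python
from itertools import product
from typing import Iterable, Set

def word_pair_features(arg1_tokens: Iterable[str], arg2_tokens: Iterable[str]) -> Set[str]:
    """All (w_i, w_j) pairs with w_i drawn from Arg1 and w_j from Arg2."""
    return {"%s|%s" % (w1.lower(), w2.lower())
            for w1, w2 in product(arg1_tokens, arg2_tokens)}

# Toy example:
print(word_pair_features(["The", "loss"], ["merged", "restaurants"]))
```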

  12. Experiments • Classifier: OpenNLP MaxEnt • Training data: sections 2–21 • Test data: section 23 • Use Mutual Information (MI) to rank features for production rules, dependency rules and word pairs separately • Majority baseline: 26.1%, where all instances are classified as Cause
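A sketch of the mutual-information ranking used to select features: it scores the MI between a binary feature's presence and the relation label, and would be run separately for production rules, dependency rules and word pairs. The (feature set, label) instance format and the absence of smoothing are simplifying assumptions.

```python
import math
from collections import Counter
from typing import List, Set, Tuple

Instance = Tuple[Set[str], str]  # (features present, level-2 relation label)

def mutual_information(instances: List[Instance], feature: str) -> float:
    """I(X;Y) between binary presence of `feature` (X) and the label (Y)."""
    n = len(instances)
    joint, x_marg, y_marg = Counter(), Counter(), Counter()
    for feats, label in instances:
        x = feature in feats
        joint[(x, label)] += 1
        x_marg[x] += 1
        y_marg[label] += 1
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log( p(x,y) / (p(x) p(y)) ), with all probabilities as counts / n
        mi += (c / n) * math.log(c * n / (x_marg[x] * y_marg[y]))
    return mi

def rank_features(instances: List[Instance], top_k: int) -> List[str]:
    """Rank every feature seen in training by MI with the label; keep the top k."""
    vocab = {f for feats, _ in instances for f in feats}
    return sorted(vocab, key=lambda f: mutual_information(instances, f),
                  reverse=True)[:top_k]

# Toy data with two features and two labels:
train = [({"S -> NP VP", "NP -> DT NN"}, "Cause"),
         ({"S -> NP VP"}, "Conjunction"),
         ({"NP -> DT NN"}, "Cause")]
print(rank_features(train, top_k=1))
```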

  13. Experiments (2) • Use contextual features and one other feature class • context + production rules • context + dependency rules • context + word pairs

  14. Experiments (3) • With large numbers of features • context + all production rules: 36.68% • context + all dependency rules: 27.94% • context + 10,000 word pairs: 35.25%

  15. Experiments (4) • Combining all feature classes gives an accuracy of 40.21% • The accuracy breakdown shows that all feature classes contribute to the performance
