RTE @ Stanford

Presentation Transcript


  1. RTE @ Stanford Rajat Raina, Aria Haghighi, Christopher Cox, Jenny Finkel, Jeff Michels, Kristina Toutanova, Bill MacCartney, Marie-Catherine de Marneffe, Christopher D. Manning and Andrew Y. Ng PASCAL Challenges Workshop April 12, 2005

  2. Our approach • Represent using syntactic dependencies • But also use semantic annotations. • Try to handle language variability. • Perform semantic inference over this representation • Use linguistic knowledge sources. • Compute a “cost” for inferring hypothesis from text. Low cost → Hypothesis is entailed.

  3. Outline of this talk • Representation of sentences • Syntax: Parsing and post-processing • Adding annotations on representation (e.g., semantic roles) • Inference by graph matching • Inference by abductive theorem proving • A combined system • Results and error analysis

  4. Sentence processing • Parse with a standard PCFG parser. [Klein & Manning, 2003] • Match variant spellings of entity names with patterns, e.g. Al Qaeda: [Aa]l[ -]Qa’?[ie]da • Train on some extra sentences from recent news. • Used a high-performing Named Entity Recognizer (next slide) • Force parse tree to be consistent with certain NE tags. Example: American Ministry of Foreign Affairs announced that Russia called the United States... (S (NP (NNP American_Ministry_of_Foreign_Affairs)) (VP (VBD announced) (…)))
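For illustration, a minimal sketch of how such a spelling pattern could be used to canonicalize entity mentions before parsing; only the regex comes from the slide, and the function name and canonical form are invented:

```python
import re

# Pattern from the slide: matches spelling variants such as
# "Al Qaeda", "al-Qaida", "Al Qa'eda".
AL_QAEDA = re.compile(r"[Aa]l[ -]Qa'?[ie]da")

def canonicalize(text):
    """Rewrite variant spellings to one canonical token (assumed behavior)."""
    return AL_QAEDA.sub("Al_Qaeda", text)

print(canonicalize("Officials blamed al-Qaida for the attack."))
# -> Officials blamed Al_Qaeda for the attack.
```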

  5. Named Entity Recognizer • Trained a robust conditional random field model. [Finkel et al., 2003] • Interpretation of numeric quantity statements Example: T: Kessler's team conducted 60,643 face-to-face interviews with adults in 14 countries. H: Kessler's team interviewed more than 60,000 adults in 14 countries. TRUE Annotate numerical values implied by: • “6.2 bn”, “more than 60000”, “around 10”, … • MONEY/DATE named entities
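A hedged sketch of the numeric-quantity idea: map surface expressions such as “6.2 bn” or “more than 60,000” to a comparison operator and a normalized value. The patterns and function below are invented for illustration, not the system's actual code:

```python
import re

MULTIPLIERS = {"bn": 1e9, "billion": 1e9, "million": 1e6, "m": 1e6}

def parse_quantity(phrase):
    """Return (comparator, value) for phrases like '6.2 bn' or
    'more than 60,000' (simplified, invented patterns)."""
    comparator = "=="
    if re.search(r"\bmore than\b|\bover\b", phrase):
        comparator = ">"
    elif re.search(r"\baround\b|\babout\b", phrase):
        comparator = "~"
    m = re.search(r"([\d.,]+)\s*(bn|billion|million|m)?\b", phrase)
    value = float(m.group(1).replace(",", ""))
    if m.group(2):
        value *= MULTIPLIERS[m.group(2)]
    return comparator, value

print(parse_quantity("more than 60,000"))  # ('>', 60000.0)
print(parse_quantity("6.2 bn"))            # ('==', 6200000000.0)
```

With this kind of annotation, “more than 60,000 adults” in the hypothesis can be checked against “60,643 … interviews with adults” in the text by comparing values rather than surface strings.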

  6. Parse tree post-processing • Recognize collocations using WordNet Example: Shrek 2 rang up $92 million. (S (NP (NNP Shrek) (CD 2)) (VP (VBD rang_up) (NP (QP ($ $) (CD 92) (CD million)))) (. .)) MONEY, 92000000
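A rough sketch of the collocation check, assuming NLTK's WordNet interface; the helper name is invented, and the real post-processing operates on the parse tree rather than on a flat token pair:

```python
from nltk.corpus import wordnet as wn
from nltk.stem import WordNetLemmatizer

_lemmatizer = WordNetLemmatizer()

def is_wordnet_collocation(w1, w2):
    """True if the underscore-joined pair is a WordNet entry, trying the
    surface form and a verb-lemmatized first word (so 'rang up' can be
    found via the WordNet entry 'ring_up')."""
    candidates = {
        f"{w1}_{w2}".lower(),
        f"{_lemmatizer.lemmatize(w1.lower(), pos='v')}_{w2.lower()}",
    }
    return any(wn.synsets(c) for c in candidates)

print(is_wordnet_collocation("rang", "up"))        # True if 'ring_up' is in WordNet
print(is_wordnet_collocation("grocery", "store"))  # True: 'grocery_store' is a noun entry
```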

  7. Basic representations: Parse tree → Dependencies • Find syntactic dependencies • Transform parse tree representations into typed syntactic dependencies, including a certain amount of collapsing and normalization Example: Bill’s mother walked to the grocery store. subj(walked, mother) poss(mother, Bill) to(walked, store) nn(store, grocery) • Dependencies can also be written as a logical formula: mother(A) ∧ Bill(B) ∧ poss(B, A) ∧ grocery(C) ∧ store(C) ∧ walked(E, A, C)
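As a rough illustration (not the authors' code), typed dependencies can be flattened into such a conjunction by assigning a variable to each word; the slide's representation additionally introduces an event variable for the verb, as in walked(E, A, C), which this sketch omits:

```python
deps = [("subj", "walked", "mother"),
        ("poss", "mother", "Bill"),
        ("to",   "walked", "store"),
        ("nn",   "store",  "grocery")]

def to_logical_form(deps):
    var = {}
    def v(word):
        # Assign fresh variables A, B, C, ... to words as they appear.
        if word not in var:
            var[word] = chr(ord("A") + len(var))
        return var[word]
    atoms = []
    for rel, head, dep in deps:
        atoms.append(f"{dep}({v(dep)})")          # unary predicate for the word
        atoms.append(f"{rel}({v(head)}, {v(dep)})")  # binary predicate for the relation
    return " ∧ ".join(dict.fromkeys(atoms))        # dedupe, keep order

print(to_logical_form(deps))
```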

  8. Representations • Dependency graph (figure): walked (VBD) with a subj edge to mother, a poss edge from mother to Bill (PERSON), and an ARGM-LOC/to edge to grocery store (with an nn link between grocery and store) • Logical formula: mother(A) ∧ Bill(B) ∧ poss(B, A) ∧ grocery(C) ∧ store(C) ∧ walked(E, A, C) • Can make representation richer • “walked” is a verb • “Bill” is a PERSON (named entity). • “store” is the location/destination of “walked”. • …

  9. Annotations • Parts-of-speech, named entities • Already computed. • Semantic roles Example: T: C and D Technologies announced that it has closed the acquisition of Datel, Inc. H1: C and D Technologies acquired Datel Inc. TRUE H2: Datel acquired C and D Technologies. FALSE • Use a state-of-the-art semantic role classifier to label verb arguments. [Toutanova et al. 2005]

  10. More annotations • Coreference Example: T: Since its formation in 1948, Israel … H: Israel was established in 1948. TRUE • Use a conditional random field model for coreference detection. Note: Appositive “references” were previously detected. T: Bush, the President of USA, went to Florida. H: Bush is the President of USA. TRUE • Other annotations • Word stems (very useful) • Word senses (no performance gain in our system)

  11. Event nouns • Use a heuristic to find event nouns • Augment text representation using WordNet derivational links (noun → verb). Example: T: … witnessed the murder of police commander ... H: Police officer killed. TRUE Text logical formula: murder(M) ∧ police_commander(P) ∧ of(M, P) Augment: murder(E, M, P)
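A small sketch of following WordNet derivational links with NLTK (the function name is invented); it maps an event noun to the verbs it is derivationally related to:

```python
from nltk.corpus import wordnet as wn

def verb_forms_of_noun(noun):
    """Follow WordNet derivational links from a noun to related verbs
    (e.g. the noun 'murder' links to the verb 'murder')."""
    verbs = set()
    for lemma in wn.lemmas(noun, pos=wn.NOUN):
        for related in lemma.derivationally_related_forms():
            if related.synset().pos() == wn.VERB:
                verbs.add(related.name())
    return verbs

print(verb_forms_of_noun("murder"))       # e.g. {'murder'}
print(verb_forms_of_noun("acquisition"))  # e.g. {'acquire'}
```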

  12. Outline of this talk • Representation of sentences • Syntax: Parsing and post-processing • Adding annotations on representation (e.g., semantic roles) • Inference by graph matching • Inference by abductive theorem proving • A combined system • Results and error analysis

  13. Graph Matching Approach • Why Graph Matching? • Dependency tree has natural graphical interpretation • Successful in other domains: e.g., lossy image matching • Input: Hypothesis (H) and Text (T) graphs Toy example (figure): T graph bought–John–BMW and H graph purchased–John–car, with ARG0(Agent)/ARG1(Theme) edges and PERSON labels • Vertices are words and phrases • Edges are labeled dependencies • Output: Cost of matching H to T (next slide)

  14. Graph Matching: Idea • Idea: Align H to T so that vertices are similar and relations are preserved (as in machine translation) • A matching M is a mapping from vertices of H to vertices of T; thus, for each vertex v in H, M(v) is a vertex in T (Figure: H graph purchased–John–car matched vertex-by-vertex to T graph bought–John–BMW.)
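For concreteness, a minimal sketch of the toy graphs and a matching as plain dictionaries; the structures and names are invented for illustration, not the authors' data structures:

```python
# T and H graphs: vertices carry labels from the figure (POS / named-entity
# tags), edges are labeled dependencies; M maps every H vertex to a T vertex.
T = {"vertices": {"bought": "VBD", "John": "PERSON", "BMW": "NNP"},
     "edges": {("bought", "John"): "ARG0", ("bought", "BMW"): "ARG1"}}
H = {"vertices": {"purchased": "VBD", "John": "PERSON", "car": "NN"},
     "edges": {("purchased", "John"): "ARG0", ("purchased", "car"): "ARG1"}}
M = {"purchased": "bought", "John": "John", "car": "BMW"}

# Sanity check: M is total on H's vertices and lands inside T's vertices.
assert set(M) == set(H["vertices"]) and set(M.values()) <= set(T["vertices"])
```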

  15. Graph Matching: Costs • The cost of a matching MatchCost(M) measures the “quality” of a matching M • VertexCost(M) – Compare vertices in H with matched vertices in T • RelationCost(M) – Compare edges (relations) in H with corresponding edges (relations) in T • MatchCost(M) = (1 - β) VertexCost(M) + β RelationCost(M)

  16. Graph Matching: Costs • VertexCost(M) For each vertex v in H, and vertex M(v) in T: • Do vertex heads share the same stem and/or POS? • Is the T vertex head a hypernym of the H vertex head? • Are vertex heads “similar” phrases? (next slide) • RelationCost(M) For each edge (v, v’) in H, and edge (M(v), M(v’)) in T: • Are parent/child pairs in H parent/child in T? • Are parent/child pairs in H ancestor/descendant in T? • Do parent/child pairs in H share a common ancestor in T?
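A hedged sketch of the cost combination on the toy example; the per-vertex costs below are the illustrative numbers from slide 19, and the relation cost is reduced to a single missing-edge check, a simplification of the parent/child features listed above:

```python
BETA = 0.45
T_EDGES = {("bought", "John"), ("bought", "BMW")}
H_EDGES = {("purchased", "John"), ("purchased", "car")}
M = {"purchased": "bought", "John": "John", "car": "BMW"}
# Toy per-vertex costs mirroring slide 19: exact 0.0, synonym 0.2, hypernym 0.4.
WORD_COST = {("John", "John"): 0.0, ("purchased", "bought"): 0.2, ("car", "BMW"): 0.4}

def vertex_cost(M):
    # Average cost of matching each H vertex to its image in T.
    return sum(WORD_COST[(v, M[v])] for v in M) / len(M)

def relation_cost(M):
    # Fraction of H edges whose image under M is not an edge of T.
    missing = sum(1 for (v, w) in H_EDGES if (M[v], M[w]) not in T_EDGES)
    return missing / len(H_EDGES)

def match_cost(M):
    return (1 - BETA) * vertex_cost(M) + BETA * relation_cost(M)

print(round(match_cost(M), 2))   # 0.11, as in the worked example on slide 19
```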

  17. Digression: Phrase similarity • Measures based on WordNet (Resnik/Lesk). • Distributional similarity • Example: “run” and “marathon” are related. • Latent Semantic Analysis to discover words that are distributionally similar (i.e., have common neighbors). • Used a web-search based measure • Query google.com for all pages with: • “run” • “marathon” • Both “run” and “marathon” • Learning paraphrases. [Similar to DIRT: Lin and Pantel, 2001] • “World knowledge” (labor intensive) • CEO = Chief_Executive_Officer • Philippines → Filipino • [Can add common facts: “Paris is the capital of France”, …]
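The slides do not give the exact formula for the web-search measure; a PMI-style score over page counts is one natural reading. All counts, the assumed corpus size, and the function name below are hypothetical:

```python
import math

def web_similarity(hits_a, hits_b, hits_ab, total_pages=1e10):
    """PMI-style score from web page counts: how much more often the two
    terms co-occur than chance would predict. The counts would come from
    a search engine; total_pages is a rough, assumed web size."""
    p_a, p_b, p_ab = hits_a / total_pages, hits_b / total_pages, hits_ab / total_pages
    if p_ab == 0:
        return float("-inf")
    return math.log(p_ab / (p_a * p_b))

# Hypothetical counts for "run", "marathon", and pages containing both.
print(web_similarity(hits_a=2.5e9, hits_b=4.0e7, hits_ab=2.0e7))
```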

  18. Graph Matching: Costs • VertexCost(M) For each vertex v in H, and vertex M(v) in T: • Do vertex heads share the same stem and/or POS? • Is the T vertex head a hypernym of the H vertex head? • Are vertex heads “similar” phrases? (previous slide) • RelationCost(M) For each edge (v, v’) in H, and edge (M(v), M(v’)) in T: • Are parent/child pairs in H parent/child in T? • Are parent/child pairs in H ancestor/descendant in T? • Do parent/child pairs in H share a common ancestor in T?

  19. Graph Matching: Example • (Figure: H graph purchased–John–car matched to T graph bought–John–BMW; exact match cost 0.0 for John–John, synonym match cost 0.2 for purchased–bought, hypernym match cost 0.4 for car–BMW.) • VertexCost: (0.0 + 0.2 + 0.4)/3 = 0.2 • RelationCost: 0 (graphs isomorphic) • β = 0.45 (say) • MatchCost: 0.55 * (0.2) + 0.45 * (0.0) = 0.11

  20. Outline of this talk • Representation of sentences • Syntax: Parsing and post-processing • Adding annotations on representation (e.g., semantic roles) • Inference by graph matching • Inference by abductive theorem proving • A combined system • Results and error analysis

  21. Abductive inference • Idea: • Represent text and hypothesis as logical formulae. • A hypothesis can be inferred from the text if and only if the hypothesis logical formula can be proved from the text logical formula. • Toy example (prove H from T?): Allow assumptions at various “costs”: BMW(t) + $2 => car(t), bought(p, q, r) + $1 => purchased(p, q, r)

  22. Abductive assumptions • Assign costs to all assumptions of the form: P(p1, p2, …, pm) => Q(q1, q2, …, qn) • Build an assumption cost model

  23. Abductive theorem proving • Each assumption provides a potential proof step. • Find the proof with the minimum total cost • Uniform cost search • If there is a low-cost proof, the hypothesis is entailed. • Example: T: John(A) ∧ BMW(B) ∧ bought(E, A, B) H: John(x) ∧ car(y) ∧ purchased(z, x, y) Here is a possible proof by resolution refutation (for the earlier costs):
  $0 -John(x) -car(y) -purchased(z, x, y) [Given: negation of hypothesis]
  $0 -car(y) -purchased(z, A, y) [Unify with John(A)]
  $2 -purchased(z, A, B) [Unify with BMW(B)]
  $3 NULL [Unify with purchased(E, A, B)]
  Proof cost = 3
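A drastically simplified sketch of the idea (invented code, not the authors' prover): it ignores variable unification and full resolution, and simply proves each hypothesis predicate either directly from the text at no cost or via the cheapest applicable assumption.

```python
TEXT = {"John", "BMW", "bought"}                 # predicates asserted by T
ASSUMPTIONS = {("BMW", "car"): 2, ("bought", "purchased"): 1}

def proof_cost(hypothesis_preds):
    total = 0
    for pred in hypothesis_preds:
        if pred in TEXT:
            continue                              # free: directly entailed
        options = [cost for (src, tgt), cost in ASSUMPTIONS.items()
                   if tgt == pred and src in TEXT]
        if not options:
            return float("inf")                   # no proof found
        total += min(options)                     # cheapest assumption
    return total

print(proof_cost(["John", "car", "purchased"]))   # 3, matching the slide's proof
```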

  24. Abductive theorem proving • Can automatically learn good assumption costs • Start from a labeled dataset (e.g.: the PASCAL development set) • Intuition: Find assumptions that are used in the proofs for TRUE examples, and lower their costs (by framing a log-linear model). Iterate. [Details: Raina et al., in submission]

  25. Some interesting features • Examples of handling “complex” constructions in graph matching/abductive inference. • Antonyms/Negation: High cost for matching verbs if they are antonyms, or if one is negated and the other is not. • T: Stocks fell. H: Stocks rose. FALSE • T: Clinton’s book was not a hit. H: Clinton’s book was a hit. FALSE • Non-factive verbs: • T: John was charged for doing X. H: John did X. FALSE Can detect because “doing” in the text has the non-factive “charged” as a parent, but “did” does not have such a parent.
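For the antonym part of that check, a WordNet lookup along these lines (NLTK interface; the function name is invented) could flag verb pairs such as fell/rose once they are stemmed to fall/rise:

```python
from nltk.corpus import wordnet as wn

def are_antonyms(word1, word2):
    """True if any WordNet sense of word1 has an antonym lemma equal to
    word2 (e.g. the verbs 'fall' and 'rise')."""
    for syn in wn.synsets(word1):
        for lemma in syn.lemmas():
            if any(ant.name() == word2 for ant in lemma.antonyms()):
                return True
    return False

print(are_antonyms("fall", "rise"))   # True for the verb senses
print(are_antonyms("fall", "drop"))   # False
```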

  26. Some interesting features • “Superlative check” • T: This is the tallest tower in western Japan. H: This is the tallest tower in Japan. FALSE

  27. Outline of this talk • Representation of sentences • Syntax: Parsing and post-processing • Adding annotations on representation (e.g., semantic roles) • Inference by graph matching • Inference by abductive theorem proving • A combined system • Results and error analysis

  28. Results • Combine inference methods • Each system produces a score; separately normalize each system’s scores to unit variance. Suppose the normalized scores are s1 and s2. • Final score S = w1s1 + w2s2 • Learn classifier weights w1 and w2 on the development set using logistic regression. Two submissions: • Train one set of classifier weights for all RTE tasks. (General) • Train different classifier weights for each RTE task. (ByTask)
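A hedged sketch of this combination step with scikit-learn; all scores and labels below are made up, and only the recipe (variance-normalize each system's scores, then fit logistic regression on the development set) comes from the slide:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical development-set scores from the two systems and gold labels.
graph_scores = np.array([0.11, 0.45, 0.80, 0.30])
proof_scores = np.array([3.0, 9.0, 15.0, 5.0])
labels = np.array([1, 0, 0, 1])                  # TRUE = 1, FALSE = 0

# Normalize each system's scores to unit variance, then learn w1, w2.
X = np.column_stack([graph_scores / graph_scores.std(),
                     proof_scores / proof_scores.std()])
clf = LogisticRegression().fit(X, labels)
w1, w2 = clf.coef_[0]                            # learned combination weights
print(w1, w2, clf.predict_proba(X)[:, 1])
```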

  29. Results • Best results from other systems: Accuracy = 58.6%, CWS = 0.617 • Balanced predictions: the two submissions predicted TRUE for 55.4% and 51.2% of the test set.

  30. Results by task

  31. Partial coverage results • Task-specific optimization seems better! (Chart comparing the ByTask and General submissions not reproduced.) • Can also draw coverage-CWS curves. For example: • at 50% coverage, CWS = 0.781 • at 25% coverage, CWS = 0.873
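For reference, a sketch of the confidence-weighted score as used in RTE-1 (the average of the running accuracy over confidence-ranked predictions), truncated to the most-confident fraction to get one point on a coverage-CWS curve; the predictions below are hypothetical:

```python
def cws(predictions, coverage=1.0):
    """predictions: list of (confidence, is_correct) pairs."""
    ranked = sorted(predictions, key=lambda p: -p[0])
    ranked = ranked[: max(1, int(round(coverage * len(ranked))))]
    correct_so_far, total = 0, 0.0
    for i, (_, is_correct) in enumerate(ranked, start=1):
        correct_so_far += int(is_correct)
        total += correct_so_far / i        # running accuracy at rank i
    return total / len(ranked)

# Hypothetical predictions: (confidence, whether the answer was right).
print(cws([(0.9, True), (0.8, True), (0.6, False), (0.3, True)], coverage=0.5))
```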

  32. Some interesting issues • Phrase similarity • away from the coast → farther inland • won victory in presidential election → became President • stocks get a lift → stocks rise • life threatening → fatal • Dictionary definitions • believe there is only one God → are monotheistic • “World knowledge” • K Club, venue of the Ryder Cup, … → K Club will host the Ryder Cup

  33. Future directions • Need more NLP components in there: • Better treatment of frequent nominalizations, parenthesized material, etc. • Need much more ability to do inference • Fine distinctions between meanings, and fine similarities. e.g., “reach a higher level” and “rise” We need a high-recall, reasonable precision similarity measure! • Other resources (e.g., antonyms) are also very sparse. • More task-specific optimization.

  34. Thanks!
