Textual Entailment: A Perspective on Applied Text Understanding Ido Dagan, Bar-Ilan University, Israel Joint works with: Oren Glickman, Idan Szpektor, Roy Bar Haim (Bar-Ilan University, Israel); Maayan Geffet (Hebrew University, Israel); Hristo Tanev, Bernardo Magnini, Alberto Lavelli, Lorenza Romano (ITC-irst, Italy); Bonaventura Coppola and Milen Kouylekov (University of Trento and ITC-irst, Italy)
Talk Focus: A Framework for “Applied Semantics” • The textual entailment task – what and why? • Empirical evaluation – PASCAL RTE Challenge • Problem scope, decomposition and analysis • Different perspective on semantic inference • Probabilistic framework • Cf. syntax, MT – clear task, methodology and community
Natural Language and Meaning • Variability: many language expressions map to the same meaning • Ambiguity: one language expression maps to several meanings
Variability of Semantic Expression – all of the following express the same event: • All major stock markets surged • Dow gains 255 points • Dow ends up • Stock market hits a record high • Dow climbs 255 • The Dow Jones Industrial Average closed up 255
Variability Recognition –Major Inference in Applications Question Answering (QA) Information Extraction (IE) Information Retrieval (IR) Multi Document Summarization (MDS)
Typical Application Inference • Question: Who bought Overture? >> Expected answer form: X bought Overture • Hypothesized answer text (t): Overture’s acquisition by Yahoo → Yahoo bought Overture • Similar for IE: X buy Y • Similar for “semantic” IR: t: Overture was bought … • Summarization (multi-document) – identify redundant info • MT evaluation (and recent proposals for MT?)
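The QA-as-entailment view above can be sketched in a few lines: turn a wh-question into a hypothesis template, then instantiate it with a candidate answer before handing the (text, hypothesis) pair to an entailment engine. This is an illustrative toy, assuming a single "Who" rewriting rule; the function names are not from the talk.

```python
import re

# Toy sketch of the QA-to-entailment mapping (illustrative only):
# "Who bought Overture?" -> template "X bought Overture" -> hypothesis
# "Yahoo bought Overture" for the candidate answer "Yahoo".

def question_to_template(question: str) -> str:
    # Single toy rule: replace a leading "Who" with the placeholder X.
    return re.sub(r"^Who\s+", "X ", question.rstrip("?"))

def instantiate(template: str, answer: str) -> str:
    # Fill the placeholder with a candidate answer to form hypothesis h.
    return template.replace("X", answer, 1)

tmpl = question_to_template("Who bought Overture?")
print(tmpl)                        # X bought Overture
print(instantiate(tmpl, "Yahoo"))  # Yahoo bought Overture
```

A real system would of course need rules (or a parser) for all wh-forms, not just "Who".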
KRAQ'05 Workshop – KNOWLEDGE and REASONING for ANSWERING QUESTIONS (IJCAI-05) CFP: • Reasoning aspects: information fusion, search criteria expansion models, summarization and intensional answers, reasoning under uncertainty or with incomplete knowledge • Knowledge representation and integration: levels of knowledge involved (e.g. ontologies, domain knowledge), knowledge extraction models and techniques to optimize response accuracy, coherence and integration
Inference for Textual Question Answering Workshop (AAAI-05) CFP: • abductions, default reasoning, inference with epistemic logic or description logic • inference methods for QA need to be robust, cover all ambiguities of language • available knowledge sources that can be used for inference … but similar needs for other applications – can we address a uniform empirical task?
Applied Textual Entailment: Abstract Semantic Variability Inference • QA: “Where was John Wayne born?” • Answer: Iowa • Text (t): The birthplace of John Wayne is in Iowa • Hypothesis (h): John Wayne was born in Iowa • Inference: t entails h
The Generic Entailment Task • Given the text t, can we infer that h is (most likely) true? • Text (t): The birthplace of John Wayne is in Iowa • Hypothesis (h): John Wayne was born in Iowa
Classical Entailment Definition • Chierchia & McConnell-Ginet (2001): A text t entails a hypothesis h if h is true in every circumstance (possible world) in which t is true • Strict entailment – doesn't account for the uncertainty allowed in applications
“Almost certain” Entailments • t: The technological triumph known as GPS … was incubated in the mind of Ivan Getting. h: Ivan Getting invented the GPS. • t: According to the Encyclopedia Britannica, Indonesia is the largest archipelagic nation in the world, consisting of 13,670 islands. h: 13,670 islands make up Indonesia.
Textual Entailment ≈ Human Reading Comprehension • From a children’s English learning book (Sela and Greenberg): • Reference Text: “…The Bermuda Triangle lies in the Atlantic Ocean, off the coast of Florida. …” • Hypothesis (True/False?): The Bermuda Triangle is near the United States ???
Reading Comprehension QA By Canadian Broadcasting Corporation T: The school has turned its one-time metal shop – lost to budget cuts almost two years ago - into a money-making professional fitness club. Q: When did the metal shop close? A: Almost two years ago
Recognizing Textual Entailment (RTE) Challenge – PASCAL NOE Challenge 2004-5 Ido Dagan, Oren Glickman, Bar-Ilan University, Israel Bernardo Magnini, ITC-irst, Trento, Italy
Generic Dataset by Application Use • QA • IE • Similar for “semantic” IR: Overture was acquired by Yahoo • Comparable documents (summarization) • MT evaluation • Reading comprehension • Paraphrase acquisition
Some Examples • 567 development examples, 800 test examples
Dataset Characteristics • Examples selected and annotated manually • Using automatic systems where available • Balanced True/False split • True – certain or highly probable entailment • Filtering controversial examples • Example distribution? • Mode – explorative rather than competitive
Arthur Bernstein Competition “… Competition, even a piano competition, is legitimate … as long as it is just an anecdotal side effect of the musical culture scene, and doesn’t threaten to overtake the center stage” Haaretz newspaper, Culture Section, April 1st, 2005
Submissions • 17 participating groups • 26 system submissions • Microsoft Research: manual analysis of dataset at lexical-syntactic matching level
Broad Range of System Types • Knowledge sources and inferences • Direct t-h matching: • Word overlap / Syntactic tree matching • Lexical relations: • WordNet & statistical (corpus based) • Theorem Provers / Logical inference • Adding a fuzzy scoring mechanism • Supervised / unsupervised learning methods
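The simplest of the system types above, direct word-overlap matching, can be sketched as follows. This is a minimal illustration of the idea, not any participant's actual system; the function name and the 0.75 threshold are illustrative assumptions.

```python
# Minimal word-overlap entailment baseline (sketch): predict entailment
# when most hypothesis words also appear in the text. The 0.75 threshold
# is an arbitrary illustrative choice.

def word_overlap_entails(text: str, hypothesis: str,
                         threshold: float = 0.75) -> bool:
    t_words = set(text.lower().split())
    h_words = set(hypothesis.lower().split())
    if not h_words:
        return True  # an empty hypothesis is trivially covered
    coverage = len(h_words & t_words) / len(h_words)
    return coverage >= threshold

print(word_overlap_entails("Yahoo bought Overture yesterday",
                           "Yahoo bought Overture"))  # True (full coverage)
```

As the RTE-1 results showed, such lexical matching alone is a weak predictor of true entailment, which motivates the level decomposition discussed later in the talk.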
What’s next – RTE-2 • Organizers: • Bar Ilan, CELCT (Trento), MITRE, MS-Research • Main dataset: utilizing real systems outputs • QA, IE, IR, summarization • Human performance dataset • Reading comprehension, human QA (planned) • Schedule (RTE website): • October – development set • February – results submission (test set January) • April 10 – PASCAL workshop in Venice! • right after EACL
Other Evaluation Modes • Entailment subtasks evaluations • Lexical, lexical-syntactic, alignment… • “Seek” mode: • Input: h and corpus • Output: All entailing t’s in corpus • Captures nicely information seeking needs, but requires post-run annotation (like TREC) • Contribution to specific applications
Decomposition of Entailment Levels – Empirical Modeling of Meaning Equivalence and Entailment, ACL-05 Workshop Roy Bar-Haim, Idan Szpektor, Oren Glickman Bar-Ilan University
Why? • Entailment Modeling is Complex!! • Was apparent at RTE1 • How can we decompose it, for • Better analysis and sub-task modeling • Piecewise evaluation • Avoid “this is the performance of my complex system…” methodology
Combination of Inference Types • Getting from T to H combines diverse inference types at different levels of representation: co-reference, syntactic transformations, paraphrasing, lexical world knowledge
Defining Intermediate Models Lexical Lexical-syntactic
Lexical Model • T and H are represented as bags of terms • T ⊨L H if for each term u ∈ H there exists a term v ∈ T such that v ⊨L u • v ⊨L u if: • they share the same lemma and POS, OR • they are connected by a chain of lexical transformations
Lexical Transformations • We assume perfect word sense disambiguation
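The lexical model just defined can be sketched directly in code: every hypothesis term must be covered by some text term, either by exact lemma match or through a chain of lexical transformations. This is a minimal sketch assuming perfect disambiguation, as the slide states; the tiny synonym table is a toy stand-in for a resource like WordNet, and all names are illustrative.

```python
from collections import deque

# Toy synonym table standing in for WordNet (illustrative assumption).
SYNONYMS = {
    "buy": {"purchase", "acquire"},
    "purchase": {"buy"},
    "acquire": {"buy"},
}

def covers(v: str, u: str, max_chain: int = 3) -> bool:
    """Does text term v lexically entail hypothesis term u (v |=L u)?"""
    if v == u:                           # same lemma (POS ignored here)
        return True
    seen, frontier = {v}, deque([(v, 0)])
    while frontier:                      # BFS over transformation chains
        term, depth = frontier.popleft()
        if depth == max_chain:
            continue
        for nxt in SYNONYMS.get(term, ()):
            if nxt == u:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return False

def lexically_entails(t_terms: set, h_terms: set) -> bool:
    """T |=L H: every hypothesis term is covered by some text term."""
    return all(any(covers(v, u) for v in t_terms) for u in h_terms)

print(lexically_entails({"yahoo", "acquire", "overture"},
                        {"yahoo", "buy", "overture"}))  # acquire -> buy
```

A real implementation would lemmatize, check POS, and draw transformations from WordNet and corpus statistics rather than a hand-made table.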
Lexical Entailment – Examples (the pair texts were lost in transcription): • #1361 from RTE1 (T→H): covered via a Synonym transformation ⇒ T ⊨L H • #1952 from RTE1 (T→H): covered via a Synonym transformation ⇒ T ⊨L H • #2127 from RTE1 (T→H): T ⊨L H
Lexical-Syntactic Model • T and H are represented by syntactic dependency relations • T ⊨LS H if the relations within H can be matched by the relations in T • The coverage can be obtained through a sequence of lexical-syntactic transformations
Lexical-Syntactic Transformations • We assume perfect disambiguation and reference resolution
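The lexical-syntactic level can likewise be sketched as matching over dependency triples (head, relation, dependent). The sketch below assumes, as the slide does, that disambiguation and reference resolution are already done and that transformations (e.g. passive to active) have already been applied, so matching reduces to triple coverage; the representation and names are illustrative.

```python
# Sketch of the lexical-syntactic level: T and H as sets of dependency
# triples (head_lemma, relation, dependent_lemma). T |=LS H holds when
# every H triple is covered by T (here: exact match, transformations
# assumed already applied).

def ls_entails(t_triples: set, h_triples: set) -> bool:
    return h_triples <= t_triples

t = {("buy", "subj", "yahoo"),
     ("buy", "obj", "overture"),
     ("buy", "mod", "yesterday")}
h = {("buy", "subj", "yahoo"),
     ("buy", "obj", "overture")}
print(ls_entails(t, h))  # True: both H relations appear in T
```

In practice the triples would come from a dependency parser, and coverage would be computed through a search over transformation sequences rather than a plain subset test.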
Lexical-Syntactic Entailment – Examples (the pair texts were lost in transcription): • #1361 from RTE1 (T→H): subj relations matched ⇒ T ⊨LS H • #2127 from RTE1 (T→H): subj relations matched ⇒ T ⊨LS H
Beyond Lexical-Syntactic Models • Future work…
Annotation • 240 T-H pairs of the RTE1 dataset • Each pair annotated for T ⊨L H and T ⊨LS H • High annotator agreement (authors) • Kappa: “substantial agreement”
Model evaluation results • Low precision for Lexical model • Lexical match fails to predict entailment • High precision for Lexical Syntactic model • Checking syntactic relations is crucial • Medium recall for both levels • Higher levels of inference are missing
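The per-level figures summarized above are ordinary precision and recall of each model's binary entailment predictions against the gold labels. A minimal sketch of that computation (the gold/predicted labels below are made up for illustration, not the RTE-1 data):

```python
# Precision/recall of binary entailment predictions against gold labels.
# Precision: of the pairs predicted True, how many are gold True.
# Recall: of the gold-True pairs, how many are predicted True.

def precision_recall(gold, pred):
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

gold = [True, True, False, True, False]   # illustrative labels only
pred = [True, False, True, True, False]   # e.g. a lexical model's output
print(precision_recall(gold, pred))
```

Under this scheme, the lexical model's false positives (lexical match without entailment) drive its precision down, while missing higher-level inferences cap the recall of both levels.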
Contribution of Individual Components – RTE1 positive examples, by level: Lexical vs. Lex-Syn
Summary (1) • Annotating and analysing entailment components • Guides research on entailment • Opens new research problems and redirects old ones
Summary (2) • Allows better evaluation of systems • Performance of individual components • Future work – expand analysis to additional levels of representation and inferences • Identify the exciting semantic phenomena …