Textual Entailment: A Perspective on Applied Text Understanding Ido Dagan, Bar-Ilan University, Israel Joint works with: Oren Glickman, Idan Szpektor, Roy Bar Haim (Bar-Ilan University, Israel); Maayan Geffet (Hebrew University, Israel); Hristo Tanev, Bernardo Magnini, Alberto Lavelli, Lorenza Romano (ITC-irst, Italy); Bonaventura Coppola and Milen Kouylekov (University of Trento and ITC-irst, Italy)
Talk Focus: A Framework for “Applied Semantics” • The textual entailment task – what and why? • Empirical evaluation – PASCAL RTE Challenge • Problem scope, decomposition and analysis • Different perspective on semantic inference • Probabilistic framework • Cf. syntax, MT – clear task, methodology and community
Natural Language and Meaning • Variability: many language expressions map to the same meaning • Ambiguity: one language expression maps to several meanings
Variability of Semantic Expression – all of the following express the same event: • All major stock markets surged • Dow gains 255 points • Dow ends up • Stock market hits a record high • Dow climbs 255 • The Dow Jones Industrial Average closed up 255
Variability Recognition –Major Inference in Applications Question Answering (QA) Information Extraction (IE) Information Retrieval (IR) Multi Document Summarization (MDS)
Typical Application Inference • Question: Who bought Overture? >> Expected answer form: X bought Overture • Hypothesized answer text (t): Overture’s acquisition by Yahoo → Yahoo bought Overture • Similar for IE: X buy Y • Similar for “semantic” IR: t: Overture was bought … • Summarization (multi-document) – identify redundant info • MT evaluation (and recent proposals for MT?)
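The QA-as-entailment view above can be sketched in a few lines: turn a wh-question into a hypothesis template, then instantiate it with a candidate answer before handing the (text, hypothesis) pair to an entailment engine. This is an illustrative toy, assuming a single "Who" rewriting rule; the function names are not from the talk.

```python
import re

# Toy sketch of the QA-to-entailment mapping (illustrative only):
# "Who bought Overture?" -> template "X bought Overture" -> hypothesis
# "Yahoo bought Overture" for the candidate answer "Yahoo".

def question_to_template(question: str) -> str:
    # Single toy rule: replace a leading "Who" with the placeholder X.
    return re.sub(r"^Who\s+", "X ", question.rstrip("?"))

def instantiate(template: str, answer: str) -> str:
    # Fill the placeholder with a candidate answer to form hypothesis h.
    return template.replace("X", answer, 1)

tmpl = question_to_template("Who bought Overture?")
print(tmpl)                        # X bought Overture
print(instantiate(tmpl, "Yahoo"))  # Yahoo bought Overture
```

A real system would of course need rules (or a parser) for all wh-forms, not just "Who".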
KRAQ'05 Workshop – KNOWLEDGE and REASONING for ANSWERING QUESTIONS (IJCAI-05) CFP: • Reasoning aspects: information fusion, search criteria expansion models, summarization and intensional answers, reasoning under uncertainty or with incomplete knowledge • Knowledge representation and integration: levels of knowledge involved (e.g. ontologies, domain knowledge), knowledge extraction models and techniques to optimize response accuracy, coherence and integration
Inference for Textual Question Answering Workshop (AAAI-05) CFP: • abductions, default reasoning, inference with epistemic logic or description logic • inference methods for QA need to be robust, cover all ambiguities of language • available knowledge sources that can be used for inference … but similar needs for other applications – can we address a uniform empirical task?
Applied Textual Entailment: Abstract Semantic Variability Inference • QA: “Where was John Wayne born?” • Answer: Iowa • Text (t): The birthplace of John Wayne is in Iowa • Hypothesis (h): John Wayne was born in Iowa • Inference: t entails h
The Generic Entailment Task • Given the text t, can we infer that h is (most likely) true? • Text (t): The birthplace of John Wayne is in Iowa • Hypothesis (h): John Wayne was born in Iowa
Classical Entailment Definition • Chierchia & McConnell-Ginet (2001): A text t entails a hypothesis h if h is true in every circumstance (possible world) in which t is true • Strict entailment – doesn't account for the uncertainty allowed in applications
“Almost certain” Entailments • t: The technological triumph known as GPS … was incubated in the mind of Ivan Getting. h: Ivan Getting invented the GPS. • t: According to the Encyclopedia Britannica, Indonesia is the largest archipelagic nation in the world, consisting of 13,670 islands. h: 13,670 islands make up Indonesia.
Textual Entailment ≈ Human Reading Comprehension • From a children’s English learning book (Sela and Greenberg): • Reference Text: “…The Bermuda Triangle lies in the Atlantic Ocean, off the coast of Florida. …” • Hypothesis (True/False?): The Bermuda Triangle is near the United States ???
Reading Comprehension QA By Canadian Broadcasting Corporation T: The school has turned its one-time metal shop – lost to budget cuts almost two years ago - into a money-making professional fitness club. Q: When did the metal shop close? A: Almost two years ago
Recognizing Textual Entailment (RTE) Challenge – PASCAL NOE Challenge 2004-5 Ido Dagan, Oren Glickman, Bar-Ilan University, Israel Bernardo Magnini, ITC-irst, Trento, Italy
Generic Dataset by Application Use • QA • IE • Similar for “semantic” IR: Overture was acquired by Yahoo • Comparable documents (summarization) • MT evaluation • Reading comprehension • Paraphrase acquisition
Some Examples • 567 development examples, 800 test examples
Dataset Characteristics • Examples selected and annotated manually • Using automatic systems where available • Balanced True/False split • True – certain or highly probable entailment • Filtering controversial examples • Example distribution? • Mode – explorative rather than competitive
Arthur Bernstein Competition “… Competition, even a piano competition, is legitimate … as long as it is just an anecdotal side effect of the musical culture scene, and doesn’t threaten to overtake the center stage” Haaretz newspaper, Culture Section, April 1st, 2005
Submissions • 17 participating groups • 26 system submissions • Microsoft Research: manual analysis of dataset at lexical-syntactic matching level
Broad Range of System Types • Knowledge sources and inferences • Direct t-h matching: • Word overlap / Syntactic tree matching • Lexical relations: • WordNet & statistical (corpus based) • Theorem Provers / Logical inference • Adding a fuzzy scoring mechanism • Supervised / unsupervised learning methods
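The simplest of the system types above, direct word-overlap matching, can be sketched as follows. This is a minimal illustration of the idea, not any participant's actual system; the function name and the 0.75 threshold are illustrative assumptions.

```python
# Minimal word-overlap entailment baseline (sketch): predict entailment
# when most hypothesis words also appear in the text. The 0.75 threshold
# is an arbitrary illustrative choice.

def word_overlap_entails(text: str, hypothesis: str,
                         threshold: float = 0.75) -> bool:
    t_words = set(text.lower().split())
    h_words = set(hypothesis.lower().split())
    if not h_words:
        return True  # an empty hypothesis is trivially covered
    coverage = len(h_words & t_words) / len(h_words)
    return coverage >= threshold

print(word_overlap_entails("Yahoo bought Overture yesterday",
                           "Yahoo bought Overture"))  # True (full coverage)
```

As the RTE-1 results showed, such lexical matching alone is a weak predictor of true entailment, which motivates the level decomposition discussed later in the talk.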
What’s next – RTE-2 • Organizers: • Bar Ilan, CELCT (Trento), MITRE, MS-Research • Main dataset: utilizing real systems outputs • QA, IE, IR, summarization • Human performance dataset • Reading comprehension, human QA (planned) • Schedule (RTE website): • October – development set • February – results submission (test set January) • April 10 – PASCAL workshop in Venice! • right after EACL
Other Evaluation Modes • Entailment subtasks evaluations • Lexical, lexical-syntactic, alignment… • “Seek” mode: • Input: h and corpus • Output: All entailing t’s in corpus • Captures nicely information seeking needs, but requires post-run annotation (like TREC) • Contribution to specific applications
Decomposition of Entailment Levels – Empirical Modeling of Meaning Equivalence and Entailment, ACL-05 Workshop Roy Bar-Haim, Idan Szpektor, Oren Glickman Bar-Ilan University
Why? • Entailment Modeling is Complex!! • Was apparent at RTE1 • How can we decompose it, for • Better analysis and sub-task modeling • Piecewise evaluation • Avoid “this is the performance of my complex system…” methodology
Combination of Inference Types • Getting from T to H combines diverse inference types at different levels of representation: co-reference, syntactic transformations, paraphrasing, lexical world knowledge
Defining Intermediate Models Lexical Lexical-syntactic
Lexical Model • T and H are represented as bags of terms • T ⊨L H if for each term u ∈ H there exists a term v ∈ T such that v ⊨L u • v ⊨L u if: • they share the same lemma and POS, OR • they are connected by a chain of lexical transformations
Lexical Transformations • We assume perfect word sense disambiguation
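The lexical model just defined can be sketched directly in code: every hypothesis term must be covered by some text term, either by exact lemma match or through a chain of lexical transformations. This is a minimal sketch assuming perfect disambiguation, as the slide states; the tiny synonym table is a toy stand-in for a resource like WordNet, and all names are illustrative.

```python
from collections import deque

# Toy synonym table standing in for WordNet (illustrative assumption).
SYNONYMS = {
    "buy": {"purchase", "acquire"},
    "purchase": {"buy"},
    "acquire": {"buy"},
}

def covers(v: str, u: str, max_chain: int = 3) -> bool:
    """Does text term v lexically entail hypothesis term u (v |=L u)?"""
    if v == u:                           # same lemma (POS ignored here)
        return True
    seen, frontier = {v}, deque([(v, 0)])
    while frontier:                      # BFS over transformation chains
        term, depth = frontier.popleft()
        if depth == max_chain:
            continue
        for nxt in SYNONYMS.get(term, ()):
            if nxt == u:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return False

def lexically_entails(t_terms: set, h_terms: set) -> bool:
    """T |=L H: every hypothesis term is covered by some text term."""
    return all(any(covers(v, u) for v in t_terms) for u in h_terms)

print(lexically_entails({"yahoo", "acquire", "overture"},
                        {"yahoo", "buy", "overture"}))  # acquire -> buy
```

A real implementation would lemmatize, check POS, and draw transformations from WordNet and corpus statistics rather than a hand-made table.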
Lexical Entailment – Examples (the pair texts were lost in transcription): • #1361 from RTE1 (T→H): covered via a Synonym transformation ⇒ T ⊨L H • #1952 from RTE1 (T→H): covered via a Synonym transformation ⇒ T ⊨L H • #2127 from RTE1 (T→H): T ⊨L H
Lexical-Syntactic Model • T and H are represented by syntactic dependency relations • T ⊨LS H if the relations within H can be matched by the relations in T • The coverage can be obtained through a sequence of lexical-syntactic transformations
Lexical-Syntactic Transformations • We assume perfect disambiguation and reference resolution
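The lexical-syntactic level can likewise be sketched as matching over dependency triples (head, relation, dependent). The sketch below assumes, as the slide does, that disambiguation and reference resolution are already done and that transformations (e.g. passive to active) have already been applied, so matching reduces to triple coverage; the representation and names are illustrative.

```python
# Sketch of the lexical-syntactic level: T and H as sets of dependency
# triples (head_lemma, relation, dependent_lemma). T |=LS H holds when
# every H triple is covered by T (here: exact match, transformations
# assumed already applied).

def ls_entails(t_triples: set, h_triples: set) -> bool:
    return h_triples <= t_triples

t = {("buy", "subj", "yahoo"),
     ("buy", "obj", "overture"),
     ("buy", "mod", "yesterday")}
h = {("buy", "subj", "yahoo"),
     ("buy", "obj", "overture")}
print(ls_entails(t, h))  # True: both H relations appear in T
```

In practice the triples would come from a dependency parser, and coverage would be computed through a search over transformation sequences rather than a plain subset test.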
Lexical-Syntactic Entailment – Examples (the pair texts were lost in transcription): • #1361 from RTE1 (T→H): subj relations matched ⇒ T ⊨LS H • #2127 from RTE1 (T→H): subj relations matched ⇒ T ⊨LS H
Beyond Lexical-Syntactic Models • Future work…
Annotation • 240 T-H pairs of the RTE1 dataset • Each pair annotated for T ⊨L H and T ⊨LS H • High annotator agreement (authors) • Kappa: “substantial agreement”
Model evaluation results • Low precision for Lexical model • Lexical match fails to predict entailment • High precision for Lexical Syntactic model • Checking syntactic relations is crucial • Medium recall for both levels • Higher levels of inference are missing
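The per-level figures summarized above are ordinary precision and recall of each model's binary entailment predictions against the gold labels. A minimal sketch of that computation (the gold/predicted labels below are made up for illustration, not the RTE-1 data):

```python
# Precision/recall of binary entailment predictions against gold labels.
# Precision: of the pairs predicted True, how many are gold True.
# Recall: of the gold-True pairs, how many are predicted True.

def precision_recall(gold, pred):
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

gold = [True, True, False, True, False]   # illustrative labels only
pred = [True, False, True, True, False]   # e.g. a lexical model's output
print(precision_recall(gold, pred))
```

Under this scheme, the lexical model's false positives (lexical match without entailment) drive its precision down, while missing higher-level inferences cap the recall of both levels.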
Contribution of Individual Components – RTE1 positive examples, by level: Lexical vs. Lex-Syn
Summary (1) • Annotating and analysing entailment components • Guides research on entailment • Opens new research problems and redirects old ones
Summary (2) • Allows better evaluation of systems • Performance of individual components • Future work – expand analysis to additional levels of representation and inferences • Identify the exciting semantic phenomena …