This paper discusses the significance of understanding text meaning at a semantic level and the need for a unified computational framework for semantics. It explores the textual entailment task and its modeling approaches, including knowledge acquisition and applications in various domains. The variability and ambiguity in natural language meaning, along with the importance of semantic expression, are also highlighted. The text emphasizes the implication of textual entailment in different applications, such as question answering, summarization, machine translation evaluation, and educational purposes. Additionally, it delves into the role of knowledge in textual entailment and how systems can leverage both text and background knowledge to determine entailment confidence. The paper also touches on the challenges and impact of Recognizing Textual Entailment (RTE) tasks, showcasing examples and the success of previous challenges in the research community.
Textual Entailment as a Framework for Applied Semantics • Ido Dagan, Bar-Ilan University, Israel • Joint works with: Oren Glickman, Idan Szpektor, Roy Bar Haim, Maayan Geffet, Moshe Koppel, Efrat Marmorshtein, Bar-Ilan University; Shachar Mirkin, Hebrew University, Israel; Hristo Tanev, Bernardo Magnini, Alberto Lavelli, Lorenza Romano, ITC-irst, Italy; Bonaventura Coppola, Milen Kouylekov, University of Trento and ITC-irst, Italy; Danilo Giampiccolo, CELCT, Italy; Dan Roth, UIUC
Applied Semantics for Text Understanding/Reading • Understanding text meaning refers to the semantic level of language • An applied computational framework for semantics is needed • Such a common framework is still missing
Desiderata for Modeling Framework • A framework for a target level of language processing should provide: • Generic module for applications • Unified paradigm for investigating language phenomena • Unified knowledge representation • Most semantics research is scattered • WSD, NER, SRL, lexical semantics relations… (e.g. vs. syntax) • Dominating approach - interpretation
Outline • The textual entailment task – what and why? • Evaluation – PASCAL RTE Challenges • Modeling approach: • Knowledge acquisition • Inference (briefly) • Application example • An alternative framework for investigating semantics
Natural Language and Meaning • [Diagram: the mapping between Meaning and Language, labeled with Variability and Ambiguity]
Variability of Semantic Expression • Example expressions: The Dow Jones Industrial Average closed up 255; Dow ends up; Dow gains 255 points; Stock market hits a record high; Dow climbs 255 • Model variability as relations between text expressions: • Equivalence: expr1 ⇔ expr2 (paraphrasing) • Entailment: expr1 ⇒ expr2 – the general case • Incorporates inference as well
Typical Application Inference • Question: Who bought Overture? >> Expected answer form: X bought Overture • Text: Overture’s acquisition by Yahoo … entails the hypothesized answer: Yahoo bought Overture • Similar for IE: X buy Y • Similar for “semantic” IR: t: Overture was bought … • Summarization (multi-document) – identify redundant info • MT evaluation (and recent ideas for MT) • Educational applications
KRAQ'05 Workshop - KNOWLEDGE and REASONING for ANSWERING QUESTIONS (IJCAI-05) CFP: • Reasoning aspects: * information fusion, * search criteria expansion models * summarization and intensional answers, * reasoning under uncertainty or with incomplete knowledge, • Knowledge representation and integration: * levels of knowledge involved (e.g. ontologies, domain knowledge), * knowledge extraction models and techniques to optimize response accuracy… but similar needs for other applications – can entailment provide a common empirical task?
Classical Entailment Definition • Chierchia & McConnell-Ginet (2001): A text t entails a hypothesis h if h is true in every circumstance (possible world) in which t is true • Strict entailment - doesn't account for some uncertainty allowed in applications
“Almost certain” Entailments • t: The technological triumph known as GPS … was incubated in the mind of Ivan Getting. • h: Ivan Getting invented the GPS.
Applied Textual Entailment • Directional relation between two text fragments: Text (t) and Hypothesis (h): • Operational (applied) definition: • Human gold standard - as in NLP applications • Assuming common background knowledge – which is indeed expected from applications!
Probabilistic Interpretation • Definition: t probabilistically entails h if: • P(h is true | t) > P(h is true) • t increases the likelihood of h being true • ≡ Positive PMI – t provides information on h’s truth • P(h is true | t): entailment confidence • The relevant entailment score for applications • In practice: “most likely” entailment expected
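The definition above can be illustrated with a small numeric sketch. It is not taken from the talk: the observation list is hypothetical toy data, and the function names are made up for illustration. It estimates P(h is true) and P(h is true | t) by counting, then checks the entailment condition and the sign of the pointwise mutual information, log2(P(h | t) / P(h)), which is positive exactly when P(h | t) > P(h).

```python
import math

def estimate_probabilities(observations):
    """Estimate P(h is true) and P(h is true | t) from toy observations.

    observations: list of (t_holds, h_true) booleans (hypothetical data).
    """
    total = len(observations)
    prior = sum(1 for _, h_true in observations if h_true) / total
    h_given_t = [h_true for t_holds, h_true in observations if t_holds]
    posterior = sum(1 for h_true in h_given_t if h_true) / len(h_given_t)
    return prior, posterior

def probabilistically_entails(prior, posterior):
    # t probabilistically entails h if it raises the likelihood of h.
    return posterior > prior

def pmi(prior, posterior):
    # Pointwise mutual information between "t holds" and "h is true".
    return math.log2(posterior / prior)

# Toy data: 4 cases where t holds (h true in 3 of them), 6 where it does not.
observations = (
    [(True, True)] * 3 + [(True, False)] * 1
    + [(False, True)] * 1 + [(False, False)] * 5
)
prior, posterior = estimate_probabilities(observations)
print(prior, posterior, probabilistically_entails(prior, posterior))
```

Here the posterior plays the role of the entailment confidence; a real system would have to estimate it from text rather than from labeled outcomes.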
The Role of Knowledge • For textual entailment to hold we require: • text AND knowledge ⇒ h, but • knowledge should not entail h alone • Systems are not supposed to validate h’s truth without utilizing t
PASCAL Recognizing Textual Entailment (RTE) Challenges • EU FP-6 Funded PASCAL NOE 2004-7 • Bar-Ilan University • ITC-irst and CELCT, Trento • MITRE • Microsoft Research
Generic Dataset by Application Use • 7 application settings in RTE-1, 4 in RTE-2/3 • QA • IE • “Semantic” IR • Comparable documents / multi-doc summarization • MT evaluation • Reading comprehension • Paraphrase acquisition • Most data created from actual applications output • RTE-2: 800 examples in development and test sets • 50-50% YES/NO split
Participation and Impact • Very successful challenges, world wide: • RTE-1 – 17 groups • RTE-2 – 23 groups • 30 groups in total • ~150 downloads! • RTE-3 underway – 25 groups • Joint workshop at ACL-07 • High interest in the research community • Papers, conference sessions and areas, PhD’s, influence on funded projects • Textual Entailment special issue at JNLE • ACL-07 tutorial
Methods and Approaches (RTE-2) • Measure similarity match between t and h (coverage of h by t): • Lexical overlap (unigram, N-gram, subsequence) • Lexical substitution (WordNet, statistical) • Syntactic matching/transformations • Lexical-syntactic variations (“paraphrases”) • Semantic role labeling and matching • Global similarity parameters (e.g. negation, modality) • Cross-pair similarity • Detect mismatch (for non-entailment) • Logical interpretation and inference (vs. matching)
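The simplest of the similarity measures listed above, lexical overlap, can be sketched as the fraction of h that is covered by t. This is a minimal illustration, not any participant's system; the tokenizer and function names are assumptions made for the example.

```python
import re

def tokens(text):
    # Crude tokenizer (an assumption): lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())

def unigram_coverage(t, h):
    """Fraction of h's tokens that also occur in t."""
    t_vocab = set(tokens(t))
    h_tokens = tokens(h)
    if not h_tokens:
        return 0.0
    return sum(1 for w in h_tokens if w in t_vocab) / len(h_tokens)

def ngram_coverage(t, h, n=2):
    """Fraction of h's distinct n-grams that also occur in t."""
    def ngrams(seq):
        return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}
    h_ngrams = ngrams(tokens(h))
    if not h_ngrams:
        return 0.0
    return len(h_ngrams & ngrams(tokens(t))) / len(h_ngrams)

h = "Yahoo bought Overture"
print(unigram_coverage("Yahoo bought Overture last year", h))
print(unigram_coverage("Overture's acquisition by Yahoo", h))
print(ngram_coverage("Overture's acquisition by Yahoo", h))
```

The second text entails h but covers only two of its three words and none of its bigrams, which is the kind of gap that lexical substitution and paraphrase knowledge are meant to close.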
Dominant approach: Supervised Learning • Features model similarity and mismatch • Classifier determines relative weights of information sources • Train on development set and auxiliary t-h corpora • Pipeline: (t, h) → similarity features (lexical, n-gram, syntactic, semantic, global) → feature vector → classifier → YES / NO
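The feature-vector setup can be sketched with a tiny classifier. The slides do not say which learner the systems used; a plain perceptron is swapped in here only because it fits in a few lines. The two features (lexical coverage of h by t, and a negation-mismatch flag) and the training values are hypothetical.

```python
def train_perceptron(examples, max_epochs=1000):
    """Learn weights for (features, label) pairs, label in {+1, -1}.

    The last weight is a bias term. Training stops after the first
    epoch with no mistakes, or after max_epochs.
    """
    dim = len(examples[0][0])
    weights = [0.0] * (dim + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in examples:
            x = list(features) + [1.0]
            score = sum(w * xi for w, xi in zip(weights, x))
            if label * score <= 0:
                # Standard perceptron update on a misclassified example.
                weights = [w + label * xi for w, xi in zip(weights, x)]
                mistakes += 1
        if mistakes == 0:
            break
    return weights

def predict(weights, features):
    x = list(features) + [1.0]
    score = sum(w * xi for w, xi in zip(weights, x))
    return "YES" if score > 0 else "NO"

# Hypothetical feature vectors: (coverage of h by t, negation mismatch flag).
training = [
    ((1.0, 0.0), +1),
    ((0.8, 0.0), +1),
    ((0.3, 0.0), -1),
    ((1.0, 1.0), -1),
]
weights = train_perceptron(training)
print([predict(weights, features) for features, _ in training])
```

The learned weights are what the slide calls the relative weights of the information sources: here high coverage pushes toward YES, while a negation mismatch pushes toward NO even when coverage is complete.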
Results Average: 60% Median: 59%
Analysis • For the first time: deeper methods (semantic/ syntactic/ logical) clearly outperform shallow methods (lexical/n-gram) Cf. Kevin Knight’s invited talk at EACL-06, titled: Isn’t linguistic Structure Important, Asked the Engineer • Still, most systems based on deep analysis did not score significantly better than the lexical baseline
Why? • System reports point at: • Lack of knowledge (syntactic transformation rules, paraphrases, lexical relations, etc.) • Lack of training data • It seems that systems that coped better with these issues performed best: • Hickl et al. - acquisition of large entailment corpora for training • Tatu et al. – large knowledge bases (linguistic and world knowledge)
Some suggested research directions • Knowledge acquisition • Unsupervised acquisition of linguistic and world knowledge from general corpora and web • Acquiring larger entailment corpora • Manual resources and knowledge engineering • Inference • Principled framework for inference and fusing information levels • Are we happy with bags of features?
Complementary Evaluation Modes • Entailment subtasks evaluations • Lexical, lexical-syntactic, logical, alignment… • “Seek” mode: • Input: h and corpus • Output: All entailing t’s in corpus • Captures information seeking needs, but requires post-run annotation (TREC style) • Contribution to specific applications! • QA – Harabagiu & Hickl, ACL-06; RE – Romano et al., EACL-06
Learning Entailment Rules • Q: What reduces the risk of Heart Attacks? • Hypothesis: Aspirin reduces the risk of Heart Attacks • Text: Aspirin prevents Heart Attacks • Entailment Rule: X prevent Y ⇨ X reduce risk of Y (template ⇨ template) • Need a large knowledge base of entailment rules
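How a single entailment rule licenses the hypothesis can be sketched on surface strings. This is a deliberate simplification: the approach in the talk works with templates over syntactic structures, whereas the toy below matches inflected strings with a regular expression. The rule table and function names are assumptions for illustration.

```python
import re

# Toy rule table: an inflected surface form of "X prevent Y => X reduce risk of Y".
RULES = {"prevents": "reduces the risk of"}

def apply_rules(text):
    """Return sentences derivable from text by one rule application."""
    derived = []
    for lhs, rhs in RULES.items():
        match = re.fullmatch(rf"(?P<X>.+?) {re.escape(lhs)} (?P<Y>.+)", text)
        if match:
            derived.append(f"{match['X']} {rhs} {match['Y']}")
    return derived

def normalize(sentence):
    return sentence.lower().rstrip(".")

def entails(text, hypothesis):
    """True if hypothesis equals text or follows from it by one rule."""
    candidates = [text] + apply_rules(text)
    return normalize(hypothesis) in {normalize(c) for c in candidates}

text = "Aspirin prevents Heart Attacks"
hypothesis = "Aspirin reduces the risk of Heart Attacks"
print(entails(text, hypothesis))
print(entails(hypothesis, text))
```

The rule is directional: the text entails the hypothesis, but nothing in the table lets the hypothesis derive the text.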
TEASE – Algorithm Flow • Input template: X subj-accuse-obj Y (resources: lexicon, Web) • Sample corpus for input template: Paula Jones accused Clinton… Sanhedrin accused St. Paul… • Anchor Set Extraction (ASE) → anchor sets: {Paula Jones (subj); Clinton (obj)}, {Sanhedrin (subj); St. Paul (obj)} … • Sample corpus for anchor sets: Paula Jones called Clinton indictable… St. Paul defended before the Sanhedrin … • Template Extraction (TE) → templates: X call Y indictable; Y defend before X … • Iterate
Experiment and Evaluation • 48 randomly chosen input verbs • 1392 templates extracted; human judgments • Encouraging Results: • Future work: precision, estimate probabilities
Acquiring Lexical Entailment Relations • COLING-04, ACL-05: Lexical entailment via distributional similarity • Individual features characterize semantic properties • Obtain characteristic features via bootstrapping • Test characteristic feature inclusion (vs. overlap) • COLING-ACL-06: Integrate pattern-based extraction • NP such as NP1, NP2, … • Complementary information to distributional evidence • Integration using ML with minimal supervision (10 words)
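The two TEASE phases can be sketched over a tiny in-memory corpus. This is a heavily simplified illustration, not the TEASE implementation: the real algorithm retrieves sentences from the Web and works on parsed structures, while the toy below assumes sentences have already been reduced to (pattern, X filler, Y filler) triples. All names are hypothetical.

```python
from collections import Counter

# Toy "corpus" built from the slide's examples: (pattern, X filler, Y filler),
# with the X and Y slots read left to right in each sentence.
CORPUS = [
    ("X accuse Y", "Paula Jones", "Clinton"),
    ("X accuse Y", "Sanhedrin", "St. Paul"),
    ("X call Y indictable", "Paula Jones", "Clinton"),
    ("X defend before Y", "St. Paul", "Sanhedrin"),
]

def extract_anchor_sets(corpus, input_template):
    """ASE: collect the argument pairs seen with the input template."""
    return {(x, y) for pattern, x, y in corpus if pattern == input_template}

def swap_slots(pattern):
    """Rename X to Y and Y to X so slots align with the input template."""
    swap = {"X": "Y", "Y": "X"}
    return " ".join(swap.get(tok, tok) for tok in pattern.split())

def extract_templates(corpus, anchor_sets, input_template):
    """TE: count other patterns that connect the same anchor pairs."""
    counts = Counter()
    for pattern, x, y in corpus:
        if pattern == input_template:
            continue
        if (x, y) in anchor_sets:
            counts[pattern] += 1
        elif (y, x) in anchor_sets:
            counts[swap_slots(pattern)] += 1
    return counts

anchors = extract_anchor_sets(CORPUS, "X accuse Y")
templates = extract_templates(CORPUS, anchors, "X accuse Y")
print(sorted(templates))
```

The iteration step would feed the top-ranked extracted templates back in as new input templates; that loop, and any ranking beyond raw counts, is left out of this sketch.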
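The contrast between feature inclusion and feature overlap can be sketched with sets. The feature sets below are hypothetical toy data, not output of the acquisition method, and the bootstrapping step that selects characteristic features is not modeled. Inclusion is directional (how much of the entailing word's features appear with the entailed word), while overlap (Jaccard here, as one common symmetric choice) is not.

```python
def inclusion(features_u, features_v):
    """Share of u's characteristic features that also occur with v."""
    if not features_u:
        return 0.0
    return len(features_u & features_v) / len(features_u)

def jaccard_overlap(features_u, features_v):
    """Symmetric overlap between the two feature sets."""
    union = features_u | features_v
    if not union:
        return 0.0
    return len(features_u & features_v) / len(union)

# Hypothetical characteristic context features for two words.
FEATURES = {
    "bank": {"loan", "deposit", "shares", "ceo"},
    "company": {"loan", "deposit", "shares", "ceo", "employees", "profit", "merger"},
}

bank, company = FEATURES["bank"], FEATURES["company"]
print(inclusion(bank, company))        # bank => company direction
print(inclusion(company, bank))        # company => bank direction
print(jaccard_overlap(bank, company))  # same in both directions
```

With these toy sets, all of "bank"'s features occur with "company" but not the reverse, so the inclusion test supports bank ⇒ company only, whereas the symmetric overlap score cannot tell the two directions apart.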
Acquisition Example • Top-ranked entailments for “company”: • firm, bank, group, subsidiary, unit, business, • supplier, carrier, agency, airline, division, giant, • entity, financial institution, manufacturer, corporation, • commercial bank, joint venture, maker, producer, factory … • Does not overlap traditional ontological relations
Initial Probabilistic Lexical Co-occurrence Models • Alignment-based (RTE-1 & ACL-05 Workshop) • The probability that a term in h is entailed by a particular term in t • Bayesian classification (AAAI-05) • The probability that a term in h is entailed by (fits in) the entire text of t • An unsupervised text categorization setting – each term is a category • Demonstrate directions for probabilistic modeling and unsupervised estimation
Manual Syntactic Transformations • Example: ‘X prevent Y’ • Sunscreen, which prevents moles and sunburns, … • [Figure: dependency trees for the relative-clause (rel) and conjunction (conj) transformations that map the sentence onto the template X subj-prevent-obj Y, with X = sunscreen and Y = moles, sunburns]
Syntactic Variability Phenomena Template: X activate Y
Takeout • Promising potential for creating huge entailment knowledge bases • Mostly by unsupervised approaches • Manually encoded • Derived from lexical resources • Potential for uniform representations, such as entailment rules, for different types of semantic and world knowledge
Inference • Goal: infer hypothesis from text • Match and apply available entailment knowledge • Heuristically bridge inference gaps • Our approach: mapping language constructs • Vs. semantic interpretation • Lexical-syntactic structures as meaning representation • Amenable for unsupervised learning • Entailment rule transformations over syntactic trees
Relation Extraction • Subfield of Information Extraction • Identify different ways of expressing a target relation • Examples: Management Succession, Birth - Death, Mergers and Acquisitions, Protein Interaction • Traditionally performed in a supervised manner • Requires dozens to hundreds of examples per relation • Examples should cover broad semantic variability • Costly - Feasible??? • Little work on unsupervised approaches
Our Goals • Entailment Approach for Relation Extraction • Unsupervised Relation Extraction System • Evaluation Framework for Entailment Rule Acquisition and Matching
Proposed Approach • Input template: X prevent Y • Entailment rule acquisition (TEASE) → templates: X prevention for Y, X treat Y, X reduce Y • Syntactic matcher (with transformation rules) → relation instances: <sunscreen, sunburns>
Dataset • Bunescu 2005 • Recognizing interactions between annotated protein pairs • 200 Medline abstracts • Gold standard dataset of protein pairs • Input template: X interact with Y
Manual Analysis - Results • 93% of interacting protein pairs can be identified with lexical-syntactic templates • Number of templates vs. recall (within 93%): • Frequency of syntactic phenomena:
TEASE Output for X interact with Y A sample of correct templates learned:
TEASE algorithm - Potential Recall on Training Set • Iterative - taking the top 5 ranked templates as input • Morph - recognizing morphological derivations (cf. semantic role labeling vs. matching)
Results for Full System Error sources: • Dependency parser and syntactic matching errors • No morphological derivation recognition • TEASE limited precision (incorrect templates)
Vs Supervised Approaches • 180 training abstracts