Word Sense and Subjectivity. Jan Wiebe (University of Pittsburgh), Rada Mihalcea (University of North Texas). Introduction. Growing interest in the automatic extraction of opinions, emotions, and sentiments in text (subjectivity).
Word Sense and Subjectivity • Jan Wiebe, University of Pittsburgh • Rada Mihalcea, University of North Texas
Introduction • Growing interest in the automatic extraction of opinions, emotions, and sentiments in text (subjectivity)
Subjectivity Analysis: Applications • Opinion-oriented question answering: How do the Chinese regard the human rights record of the United States? • Product review mining: What features of the ThinkPad T43 do customers like and which do they dislike? • Review classification: Is a review positive or negative toward the movie? • Tracking emotions toward topics over time: Is anger ratcheting up or cooling down toward an issue or event? • Etc.
Introduction • Continuing interest in word sense • Sense-annotated resources being developed for many languages • www.globalwordnet.org • Active participation in evaluations such as SENSEVAL
Word Sense and Subjectivity • Though both are concerned with text meaning, they have mainly been investigated independently
Subjectivity Labels on Senses • S: Alarm, dismay, consternation – (fear resulting from the awareness of danger) • O: Alarm, warning device, alarm system – (a device that signals the occurrence of some undesirable event)
Subjectivity Labels on Senses • S: Interest, involvement -- (a sense of concern with and curiosity about someone or something; "an interest in music") • O: Interest -- (a fixed charge for borrowing money; usually a percentage of the amount borrowed; "how much interest do you pay on your mortgage?")
WSD using Subjectivity Tagging • Senses of “interest”: Sense 4 “a sense of concern with and curiosity about someone or something” (S); Sense 1 “a fixed charge for borrowing money” (O) • “He spins a riveting plot which grabs and holds the reader’s interest.” → WSD System: Sense 4, or Sense 1? • “The notes do not pay interest.” → WSD System: Sense 1, or Sense 4?
WSD using Subjectivity Tagging • Senses of “interest”: Sense 4 “a sense of concern with and curiosity about someone or something” (S); Sense 1 “a fixed charge for borrowing money” (O) • “He spins a riveting plot which grabs and holds the reader’s interest.” → Subjectivity Classifier: S → WSD System: Sense 4 (Sense 1?) • “The notes do not pay interest.” → Subjectivity Classifier: O → WSD System: Sense 1 (Sense 4?)
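The idea on this slide can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the `Sense` class and `compatible_senses` function are hypothetical names, and the hard filter is a simplification. Because objective senses can also appear in subjective sentences, a real system would more plausibly pass the classifier's tag to the WSD system as one feature among many.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    sense_id: int
    gloss: str
    label: str  # subjectivity label of the sense: "S", "O", or "B"


# The two senses of "interest" used on the slides, with their labels.
INTEREST_SENSES = [
    Sense(1, "a fixed charge for borrowing money", "O"),
    Sense(4, "a sense of concern with and curiosity about someone or something", "S"),
]


def compatible_senses(senses, context_label):
    """Keep senses whose label matches the context tag ("S" or "O").

    Senses labeled "B" are always kept. If the filter would remove every
    sense, fall back to the full list (an assumption of this sketch).
    """
    kept = [s for s in senses if s.label in (context_label, "B")]
    return kept or list(senses)


# "He spins a riveting plot ..." tagged S; "The notes do not pay interest." tagged O.
subjective_ids = [s.sense_id for s in compatible_senses(INTEREST_SENSES, "S")]
objective_ids = [s.sense_id for s in compatible_senses(INTEREST_SENSES, "O")]
```

With the slide's two sentences, the subjective context keeps Sense 4 and the objective context keeps Sense 1.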
Subjectivity Tagging using WSD • “He spins a riveting plot which grabs and holds the reader’s interest.” → Subjectivity Classifier: S, or O? • “The notes do not pay interest.” → Subjectivity Classifier: O, or S?
Subjectivity Tagging using WSD • Senses of “interest”: Sense 4 “a sense of concern with and curiosity about someone or something” (S); Sense 1 “a fixed charge for borrowing money” (O) • “He spins a riveting plot which grabs and holds the reader’s interest.” → WSD System: Sense 4 → Subjectivity Classifier: S (O?) • “The notes do not pay interest.” → WSD System: Sense 1 → Subjectivity Classifier: O (S?)
Goals • Explore interactions between word sense and subjectivity • Can subjectivity labels be assigned to word senses? • Manually • Automatically • Can subjectivity analysis improve word sense disambiguation? • Can word sense disambiguation improve subjectivity analysis? (Future work)
Outline • Motivation and Goals • Assigning Subjectivity Labels to Word Senses • Manually • Automatically • Word Sense Disambiguation using Automatic Subjectivity Analysis • Conclusions
Prior Work on Subjectivity Tagging • Identifying words and phrases associated with subjectivity • Think ~ private state; Beautiful ~ positive sentiment • Hatzivassiloglou & McKeown 1997; Wiebe 2000; Kamps & Marx 2002; Turney 2002; Esuli & Sebastiani 2005; etc. • Subjectivity classification of sentences, clauses, phrases, or word instances in context • subjective/objective; positive/negative/neutral • Riloff & Wiebe 2003; Yu & Hatzivassiloglou 2003; Dave et al 2003; Hu & Liu 2004; Kim & Hovy 2004; etc. • Here: subjectivity labels are applied to word senses
Annotation Scheme • Assigning subjectivity labels to WordNet senses • S: subjective • O: objective • B: both
Annotators are given the synset and its hypernym • S: Alarm, dismay, consternation – (fear resulting from the awareness of danger) • Hypernym: Fear, fearfulness, fright – (an emotion experienced in anticipation of some specific pain or danger (usually accompanied by a desire to flee or fight))
Subjective Sense Definition • When the sense is used in a text or conversation, we expect it to express subjectivity, and we expect the phrase/sentence containing it to be subjective.
Objective Senses: Observation • We don’t necessarily expect phrases/sentences containing objective senses to be objective • Would you actually be stupid enough to pay that rate of interest? • Will someone shut that darn alarm off? • Subjective, but not due to interest or alarm
Objective Sense Definition • When the sense is used in a text or conversation, we don’t expect it to express subjectivity and, if the phrase/sentence containing it is subjective, the subjectivity is due to something else.
Senses that are Both • Covers both subjective and objective usages • Example: absorb, suck, imbibe, soak up, sop up, suck up, draw, take in, take up – (take in, also metaphorically; “The sponge absorbs water well”; “She drew strength from the Minister’s words”)
Annotated Data • 64 words; 354 senses • Balanced subset [32 words; 138 senses]; 2 judges • The ambiguous nouns of the SENSEVAL-3 English Lexical Sample Task [20 words; 117 senses]; 2 judges • [Mihalcea, Chklovski & Kilgarriff, 2004] • Others [12 words; 99 senses]; 1 judge
Annotated Data: Agreement Study • 64 words; 354 senses • Balanced subset [32 words; 138 senses]; 2 judges • 16 words have both S and O senses • 16 words do not (8 only S and 8 only O) • All subsets balanced between nouns and verbs • Uncertain tags also permitted
Inter-Annotator Agreement Results • Overall: • Kappa=0.74 • Percent Agreement=85.5% • Excluding the 12.3% of cases in which a judge assigned U (uncertain): • Kappa=0.90 • Percent Agreement=95.0%
Inter-Annotator Agreement Results • Overall: • Kappa=0.74 • Percent Agreement=85.5% • 16 words with S and O senses: Kappa=0.75 • 16 words with only S or O: Kappa=0.73 • The two groups are of comparable difficulty
Inter-Annotator Agreement Results • 64 words; 354 senses • The ambiguous nouns of the SENSEVAL-3 English Lexical Sample Task [20 words; 117 senses]; 2 judges • U tags not permitted • Even so, Kappa=0.71
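For readers who want to reproduce agreement figures of this kind, the following is a minimal sketch of percent agreement and kappa for two judges. It assumes Cohen's kappa (the slides say only "Kappa"), and the label lists are invented toy data, not the annotations from the study.

```python
from collections import Counter


def percent_agreement(judge_a, judge_b):
    """Fraction of items on which the two judges chose the same label."""
    return sum(a == b for a, b in zip(judge_a, judge_b)) / len(judge_a)


def cohen_kappa(judge_a, judge_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(judge_a)
    p_observed = percent_agreement(judge_a, judge_b)
    counts_a, counts_b = Counter(judge_a), Counter(judge_b)
    labels = set(counts_a) | set(counts_b)
    # Chance agreement: product of each judge's marginal label proportions.
    p_expected = sum(counts_a[l] * counts_b[l] for l in labels) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both judges used one identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy labels (S, O, B) for eight senses; purely illustrative.
judge_1 = ["S", "S", "O", "O", "B", "S", "O", "O"]
judge_2 = ["S", "O", "O", "O", "B", "S", "O", "S"]

agreement = percent_agreement(judge_1, judge_2)
kappa = cohen_kappa(judge_1, judge_2)
```

On this toy data the judges agree on 6 of 8 items, so kappa falls well below the raw agreement once chance agreement is subtracted, which is the same pattern as the gap between 85.5% and 0.74 reported above.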
Related Work • Unsupervised word-sense ranking algorithm of [McCarthy et al 2004] • That task: approximate corpus frequencies of word senses • Our task: predict a word-sense property (subjectivity) • Method for learning subjective adjectives of [Wiebe 2000] • That task: label words • Our task: label word senses
Overview • Main idea: assess the subjectivity of a word sense based on information about the subjectivity of • a set of distributionally similar words • in a corpus annotated with subjective expressions
MPQA Opinion Corpus • 10,000 sentences from the world press annotated for subjective expressions • [Wiebe et al., 2005] • www.cs.pitt.edu/mpqa
Subjective Expressions • Subjective expressions: opinions, sentiments, speculations, etc. (private states) expressed in language
Examples • His alarm grew. • The leaders roundly condemned the Iranian President’s verbal assault on Israel. • He would be quite a catch. • That doctor is a quack.
Preliminaries: subjectivity of word w • DSW = {dsw1, …, dswj}: the words distributionally similar to w, found in an unannotated corpus (BNC) [Lin 1998] • Count the instances of the DSW in the annotated corpus (MPQA) that are in, and not in, a subjective expression (SE) • $\mathrm{subj}(w) = \frac{\#\mathrm{insts}(DSW)\ \text{in SE} \;-\; \#\mathrm{insts}(DSW)\ \text{not in SE}}{\#\mathrm{insts}(DSW)}$ • Range: [-1, 1], from highly objective to highly subjective
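Subjectivity of word w: example • DSW = {dsw1, dsw2}, found in the unannotated corpus (BNC) • Instances in the annotated corpus (MPQA), scored +1 if in a subjective expression and -1 otherwise: dsw1inst1 (+1), dsw1inst2 (-1), dsw2inst1 (+1) • $\mathrm{subj}(w) = \frac{+1 - 1 + 1}{3} = \frac{1}{3}$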
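The worked example above can be checked with a short Python sketch. The function name `subj_word` and the input representation (one `(word, in_subjective_expression)` pair per corpus instance) are assumptions made for illustration.

```python
def subj_word(instances):
    """Word-level subjectivity score in [-1, 1].

    `instances` holds one (dsw, in_se) pair per corpus instance of a
    distributionally similar word; in_se is True when that instance lies
    inside a subjective expression.
    """
    if not instances:
        raise ValueError("no instances of any distributionally similar word")
    # +1 for each instance inside a subjective expression, -1 otherwise.
    total = sum(1 if in_se else -1 for _, in_se in instances)
    return total / len(instances)


# The three instances from the slide: dsw1inst1 (+1), dsw1inst2 (-1), dsw2inst1 (+1).
example = [("dsw1", True), ("dsw1", False), ("dsw2", True)]
score = subj_word(example)  # (1 - 1 + 1) / 3
```

A score of +1 would mean every instance was inside a subjective expression, and -1 that none were.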
Subjectivity of word sense wi • Rather than 1, add or subtract sim(wi,dswj) • Instances in the annotated corpus (MPQA): dsw1inst1 (+sim(wi,dsw1)), dsw1inst2 (-sim(wi,dsw1)), dsw2inst1 (+sim(wi,dsw2)) • $\mathrm{subj}(w_i) = \frac{+\mathrm{sim}(w_i, dsw_1) - \mathrm{sim}(w_i, dsw_1) + \mathrm{sim}(w_i, dsw_2)}{2 \cdot \mathrm{sim}(w_i, dsw_1) + \mathrm{sim}(w_i, dsw_2)}$ • Range: [-1, 1]
Method – Step 1 • Given word w • Find distributionally similar words [Lin 1998] • DSW = {dswj | j = 1 .. n} • Experiment with top 100 and 160
Method – Step 2 • Find the similarity between each word sense and each distributionally similar word • wnss can be any concept-based similarity measure between word senses • we use Jiang & Conrath 1997
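The transcript does not preserve the formula that turns the sense-to-sense measure wnss into a sense-to-word similarity sim(wi,dswj). One plausible reading, shown here purely as an assumption, is to take the maximum of wnss over the senses of the similar word. The names `sense_word_similarity` and `toy_wnss`, the sense identifiers, and the numeric values are all hypothetical; a real run would plug in a WordNet-based measure such as Jiang & Conrath.

```python
def sense_word_similarity(sense, word_senses, wnss):
    """Similarity between one word sense and a word (assumed: max over senses).

    `wnss` is any function scoring two senses; `word_senses` are the senses
    of the distributionally similar word. Returns 0.0 if the word has no senses.
    """
    return max((wnss(sense, other) for other in word_senses), default=0.0)


# Hypothetical sense-to-sense scores standing in for a real wnss measure.
TOY_SCORES = {
    ("interest#4", "curiosity#1"): 0.8,
    ("interest#4", "curiosity#2"): 0.3,
}


def toy_wnss(sense_a, sense_b):
    """Look up a made-up similarity; unknown pairs score 0.0."""
    return TOY_SCORES.get((sense_a, sense_b), 0.0)


sim_value = sense_word_similarity(
    "interest#4", ["curiosity#1", "curiosity#2"], toy_wnss
)
```

Taking the maximum means a similar word counts toward a sense as soon as any one of its own senses is close to it.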
Method – Step 3 • Input: word sense wi of word w; DSW = {dswj | j = 1..n}; sim(wi,dswj); MPQA Opinion Corpus • Output: subjectivity score subj(wi)
Method – Step 3
    totalsim = Σ_j #insts(dswj) * sim(wi,dswj)
    subj = 0
    for each dswj in DSW:
        for each instance k in insts(dswj):
            if k is in a subjective expression:
                subj += sim(wi,dswj)
            else:
                subj -= sim(wi,dswj)
    subj(wi) = subj / totalsim
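The Step 3 pseudocode translates almost line for line into Python. The sketch below assumes the similarities and corpus instances are already available as dictionaries; the function name `subj_sense`, the data layout, the toy similarity values, and the choice to return 0.0 when totalsim is zero are assumptions of this illustration.

```python
def subj_sense(sim, instances):
    """Sense-level subjectivity score subj(wi) in [-1, 1].

    sim:       maps each distributionally similar word dswj to sim(wi, dswj)
    instances: maps each dswj to a list of booleans, one per corpus instance,
               True when the instance is inside a subjective expression
    """
    total_sim = 0.0
    subj = 0.0
    for dsw, flags in instances.items():
        weight = sim.get(dsw, 0.0)
        # Each instance contributes its word's similarity to the normalizer.
        total_sim += len(flags) * weight
        for in_se in flags:
            subj += weight if in_se else -weight
    if total_sim == 0.0:
        return 0.0  # no evidence either way (assumed convention)
    return subj / total_sim


# Same three instances as the earlier example, with made-up similarities.
toy_sim = {"dsw1": 0.5, "dsw2": 0.25}
toy_instances = {"dsw1": [True, False], "dsw2": [True]}
sense_score = subj_sense(toy_sim, toy_instances)  # (0.5 - 0.5 + 0.25) / 1.25
```

When every similarity equals 1, this reduces to the word-level score, which is a quick sanity check on the normalizer.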
Method – Optional Variation • Same update as in Step 3: if k is in a subjective expression: subj += sim(wi,dswj); else: subj -= sim(wi,dswj) • [Figure: senses w1, w2, w3, each shown with dsw1, dsw2, dsw3; a subset is marked “Selected”]
Evaluation • Calculate subj scores for all word senses, and sort them • While 0 is a natural candidate for the division between S and O, we perform the evaluation for different thresholds in [-1, +1] • Calculate the precision of the algorithm at different points of recall
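A minimal sketch of this threshold sweep follows. It assumes that a sense is predicted subjective when its score is at or above the threshold, that gold labels are only S or O, and that precision is defined as 1.0 when nothing is predicted. The slides specify none of these details, and the scores and labels are toy values.

```python
def precision_recall_at(scores, gold, threshold):
    """Precision and recall for the S class at one score threshold."""
    predicted = [g for s, g in zip(scores, gold) if s >= threshold]
    true_positives = sum(g == "S" for g in predicted)
    total_subjective = sum(g == "S" for g in gold)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / total_subjective if total_subjective else 0.0
    return precision, recall


# Toy subj scores, already sorted, with toy gold labels.
toy_scores = [0.9, 0.4, 0.1, -0.2, -0.7]
toy_gold = ["S", "S", "O", "S", "O"]

# Sweep a few thresholds across [-1, +1] to trace precision against recall.
curve = [(t, *precision_recall_at(toy_scores, toy_gold, t)) for t in (0.3, 0.0, -1.0)]
```

Lowering the threshold admits more senses as subjective, so recall rises while precision typically falls.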