Wei Zhang, Clement Yu
Department of Computer Science
University of Illinois at Chicago
UIC at TREC 2006: Blog Track
Summary
• Overview of opinion retrieval
  • Relevant document retrieval
  • Opinion relevant document retrieval
• Opinion system
  • Subjective/objective training data
  • Feature extraction
  • Subjectivity classifier
  • Opinion document ranking
Opinion Document Retrieval
[Figure: diagram of the document space with regions labelled Query, Relevant Documents, Opinion Documents, and Opinion Relevant Documents]
Opinion Document Retrieval
• Relevant documents
  • an IR approach
• Opinion relevant documents
  • a classification approach
Relevant Document Retrieval
• The UIC IR system from the TREC 2005 Robust Track
  • Without word sense disambiguation (WSD) and without adding synonyms/hyponyms
• Phrase recognition
  • Proper name, dictionary phrase
  • Simple phrase, complex phrase
• Query expansion
  • Pseudo-relevance feedback, Wikipedia, Web
• Document-query similarity
  • Phrase similarity and term similarity
Opinion Relevant Document Retrieval
[Figure: retrieved documents, with their opinion sentences marked, lead to an opinion relevant document. Example text from the retrieved documents:]
• "… another bad thing about march of the penguins - I totally agree. For a documentary, it carried just about no information. …"
• "… a document ... "march of the penguins," which was excellent yet really pretty disturbing …"
The Opinions
• Opinions are query dependent
  • e.g., food vs. automobile
• Should be learned and tested depending on the query
• Should be analyzed within sentences
Opinion System Overview
[Figure: system flow diagram. Labelled components: query; Wikipedia.org (objective sentences); Rateitall.com (subjective sentences); feature extraction; SVM classifier; retrieved documents; opinion documents; opinion-query connection; opinion relevant documents; re-rank; final answers]
The Objective Sentences
• Wikipedia.org pages as primary source
  • every sentence is objective
  • multiple pages for multiple phrases
• Web pages as secondary source
  • from a web search engine
  • restriction: -comment, -review, -"I think"
The Subjective Sentences
• Rateitall.com pages as primary source
  • every comment sentence is subjective
• Web pages as secondary source
  • from a web search engine
  • restriction: +comment, +review, +"I think"
The Featured Terms
• Use unigrams and bigrams
• Chi-square test
  • tests the hypothesis that a term t is distributed unevenly between the objective text set and the subjective text set
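The chi-square selection above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes whitespace tokenization, counts each term once per sentence, and uses the standard 2x2 chi-square statistic. The function names (`terms_of`, `chi_square`, `score_terms`) are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def terms_of(sentence):
    """Unigrams and bigrams of a sentence (assumed: lowercase, whitespace tokens)."""
    tokens = sentence.lower().split()
    return set(ngrams(tokens, 1) + ngrams(tokens, 2))


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 contingency table.

    a = subjective sentences containing t, b = objective sentences containing t,
    c = subjective sentences without t,   d = objective sentences without t.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # term is in every sentence or none: no evidence of skew
    return n * (a * d - b * c) ** 2 / denom


def score_terms(subjective, objective):
    """Map every unigram/bigram to its chi-square score over the two sets."""
    subj_df = Counter(t for s in subjective for t in terms_of(s))
    obj_df = Counter(t for s in objective for t in terms_of(s))
    scores = {}
    for t in set(subj_df) | set(obj_df):
        a, b = subj_df[t], obj_df[t]
        scores[t] = chi_square(a, b, len(subjective) - a, len(objective) - b)
    return scores
```

Terms with the highest scores would be kept as featured terms; the cutoff is not given on the slide, so any threshold is a further assumption.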
The Sentence Classifier
• Support Vector Machine (SVM) sentence classifier
[Figure: subjective sentences, objective sentences, and featured terms feed a featured-term vector representation, which is used for SVM training to produce the SVM classifier]
Find the Opinion Documents
• An opinion document is a retrieved document that contains at least one opinion sentence
• Split the document into sentences
• Test each sentence with the classifier
[Figure: a document is passed to the SVM classifier, which labels each sentence, e.g. Sentence 1: objective, Sentence 2: subjective, …, Sentence n: objective]
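A minimal sketch of this step, under stated assumptions: the sentence splitter is a naive regular expression, and `is_subjective` is a hypothetical callable standing in for the trained SVM classifier, which is not reproduced here.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def label_sentences(text, is_subjective):
    """Label every sentence of a document as 'subjective' or 'objective'."""
    return [
        (s, "subjective" if is_subjective(s) else "objective")
        for s in split_sentences(text)
    ]


def is_opinion_document(text, is_subjective):
    """A document is an opinion document if at least one sentence is subjective."""
    return any(label == "subjective" for _, label in label_sentences(text, is_subjective))
```

For example, with a toy stand-in classifier such as `lambda s: "excellent" in s.lower()`, a document containing "It was excellent yet disturbing." is flagged as an opinion document.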
Find the Opinion Relevant Documents
• A retrieved document that contains at least one opinion "relevant" sentence
  • query terms occur in or near an opinion sentence
[Figure: two example documents; labels: query, opinion sentence, text window]
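The "in or near" test can be sketched with a text window. The slide does not state the window's unit or size, so this sketch assumes a window measured in sentences and an exact, case-insensitive match of the query phrase; `opinion_relevant_sentences` and its `window` parameter are hypothetical.

```python
def opinion_relevant_sentences(sentences, labels, query, window=1):
    """Return indices of subjective sentences that have the query phrase
    within +/- `window` sentences (window=0 means the same sentence only)."""
    q = query.lower()
    hits = [q in s.lower() for s in sentences]  # where the query occurs
    relevant = []
    for i, label in enumerate(labels):
        if label != "subjective":
            continue
        lo = max(0, i - window)
        hi = min(len(sentences), i + window + 1)
        if any(hits[lo:hi]):
            relevant.append(i)
    return relevant


def is_opinion_relevant_document(sentences, labels, query, window=1):
    """A document qualifies if it has at least one opinion relevant sentence."""
    return bool(opinion_relevant_sentences(sentences, labels, query, window))
```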
Rank the Opinion Relevant Documents
• Strategy 1
  • Use the document retrieval ranking
  • Remove documents that do not have an opinion relevant sentence
Sim(D, Q): query-document similarity
I(D, Q): 1 if D contains an opinion relevant sentence, 0 otherwise
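Strategy 1 can be sketched as a filter over the retrieval ranking. Reading the score as the product Sim(D, Q) × I(D, Q) is an assumption based on the two definitions above; `rank_strategy_1` and its input layout are hypothetical.

```python
def rank_strategy_1(docs):
    """docs: list of (doc_id, sim, has_opinion_relevant_sentence) tuples.

    Assumed score: Sim(D, Q) * I(D, Q). Documents with I(D, Q) = 0 are
    removed; the rest keep their retrieval order by similarity.
    """
    kept = [(doc_id, sim * 1) for doc_id, sim, has in docs if has]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```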
Rank the Opinion Relevant Documents
• Strategy 2
  • Calculate a document opinion score
OS(D): the set of opinion sentences in document D
Score_classification(s): score of the opinion sentence s from the SVM classifier
Relevant(s, Q): 1 if s is an opinion relevant sentence, 0 otherwise
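One plausible reading of Strategy 2 is sketched below. It assumes the document opinion score is the sum, over the sentences s in OS(D), of Score_classification(s) × Relevant(s, Q); the summation, and whether the result is further combined with Sim(D, Q), are assumptions not confirmed by the slide. The function names are hypothetical.

```python
def opinion_score(opinion_sentences):
    """opinion_sentences: list of (svm_score, is_relevant) pairs for s in OS(D).

    Assumed score: sum of Score_classification(s) * Relevant(s, Q).
    """
    return sum(score for score, relevant in opinion_sentences if relevant)


def rank_strategy_2(docs):
    """docs: dict mapping doc_id to its list of (svm_score, is_relevant) pairs.
    Returns (doc_id, opinion_score) pairs, highest score first."""
    scored = [(doc_id, opinion_score(pairs)) for doc_id, pairs in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Under this reading, a document with many relevant, confidently classified opinion sentences ranks above one with a single weak opinion sentence, regardless of its retrieval rank.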