Computing semantic relatedness using Wikipedia features

Computing semantic relatedness using Wikipedia features Presenter : YAN-SHOU SIE Authors Mohamed Ali HadjTaieb*, Mohamed Ben Aouicha, AbdelmajidBen Hamadou2013. KBS

Outlines • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments

Motivation • Measuring semantic relatedness is a critical task in many domains such as psychology, biology, linguis- tics, cognitive science and artiﬁcialintelligence.

Objectives • We propose a novel system for computing semantic relatedness between words. Recent approaches have exploited Wikipedia as a huge semantic resource that showed good performances.

Methodology • Our semantic relatedness computing system • Filtering Wikipedia category graph • pre-processing • Filtering article content • Porter stemming • Weighting article stems • Providing a Category Semantic Depiction (CSD)

Methodology • Different steps performed to generate the Category Semantic DepictionFilteringWikipedia category graph

Methodology • Filtering Wikipedia category graph • First : clean meta-categories • We remove all those nodes whose labels contain any of the following strings : Wikipedia, wikiproject, lists, mediawiki,template, user, portal, categories, articles, pages, stub and album • Second : remove orphan nodes and we keep only the category Contents as root • maximum depth 291 to 221

Methodology • pre-processing • Filtering article content • Remove html tags,infobox, language translation, hyperlinks. . . • Porter stemming • ﬁltereda stop list to eliminate words which do not have any contribution. • Weighting article stems • Providing a Category Semantic Depiction (CSD)

Methodology- • Semantic relatedness computing system architecture • Extraction categories algorithm • WordNet: • resolve the disambiguation pages problem: • Setp1 : extracting all outLinks • Setp2 : find links containing disambiguation tag in parenthesis • Setp3 : extract categories to the two first links • Final : take the categories of the article assigned to the first link existing in the ordered set

Methodology • Semantic relatedness computing system architecture • Semantic relatedness computing

Methodology • Evaluating semantic relatedness measures • Comparison with human judgments • Pearson product-moment correlation coefficient • Spearman rank order correlation coefﬁcient • Datasets

Experiments • Our semantic relatedness computing system modules using Wikipedia features • Basic system • First module • Second module • Third module • Forth module

Experiments • Basic system

Experiments • First module: simple patterns

Experiments • Second module: Wikipedia pages

Experiments • Third module: enrichment using categories neighbors in WCG

Experiments • Forth module: Categories enrichment using WCG and redirects

Experiments • Application of the SR measure on other datasets • Datasets RG-65 and MC-30 • The verbal dataset YP-130 • Solving word choice problems

Conclusions • Our result system shows a good performance and outperforms sometimes ESA (Explicit Semantic Analysis) and TSA (Temporal Semantic Analysis) approaches

Comments • Advantages Able to use wiki to get a lot of semantic relationship information, semantic relations for many measurements related work of great help. • Applications • cognitive science • artiﬁcial intelligence

Computing semantic relatedness using Wikipedia features

Computing semantic relatedness using Wikipedia features

Presentation Transcript

ENHANCING CLUSTER LABELING USING WIKIPEDIA

Extended Gloss Overlaps as a Measure of Semantic Relatedness

Relatedness

Syntactic relatedness

OMIOTIS: A Thesaurus-based Measure of Semantic Relatedness

Extracting Semantic Relationships between Wikipedia Categories

APS Wikipedia Initiative: Using Wikipedia Writing in Psychology Classes

Vandalism Detection in Wikipedia using Trustworthy Ranking and Semantic Context Analysis

Semantic Grounding of tag Relatedness in Social Bookmarking Systems

Vandalism Detection in Wikipedia using Trustworthy Ranking and Semantic Context Analysis

Computing text semantic relatedness using the contents and links of a hypertext encyclopedia

A semantic approach for question classification using WordNet and Wikipedia

Semantic Wikipedia The missing links

Extracting Semantic Knowledge from Wikipedia Category Names

Additional Wikipedia Based Features

Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis

Using Semantic Relatedness for Word Sense Disambiguation

A Word at a Time: Computing Word Relatedness using Temporal Semantic Analysis

Geographic Features Extraction Using Soft-computing Methods(2)

Computing Semantic Relatedness

Semantic Features

Towards a Semantic Wikipedia: WikiData