200 likes | 349 Views
Computing semantic relatedness using Wikipedia features. Presenter : YAN-SHOU SIE Authors Mohamed Ali Hadj Taieb * , Mohamed Ben Aouicha , Abdelmajid Ben Hamadou 2013. KBS. Outlines. Motivation Objectives Methodology Experiments Conclusions Comments. Motivation.
E N D
Computing semantic relatedness using Wikipedia features Presenter : YAN-SHOU SIE Authors Mohamed Ali HadjTaieb*, Mohamed Ben Aouicha, AbdelmajidBen Hamadou2013. KBS
Outlines • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments
Motivation • Measuring semantic relatedness is a critical task in many domains such as psychology, biology, linguis- tics, cognitive science and artificialintelligence.
Objectives • We propose a novel system for computing semantic relatedness between words. Recent approaches have exploited Wikipedia as a huge semantic resource that showed good performances.
Methodology • Our semantic relatedness computing system • Filtering Wikipedia category graph • pre-processing • Filtering article content • Porter stemming • Weighting article stems • Providing a Category Semantic Depiction (CSD)
Methodology • Different steps performed to generate the Category Semantic DepictionFilteringWikipedia category graph
Methodology • Filtering Wikipedia category graph • First : clean meta-categories • We remove all those nodes whose labels contain any of the following strings : Wikipedia, wikiproject, lists, mediawiki,template, user, portal, categories, articles, pages, stub and album • Second : remove orphan nodes and we keep only the category Contents as root • maximum depth 291 to 221
Methodology • pre-processing • Filtering article content • Remove html tags,infobox, language translation, hyperlinks. . . • Porter stemming • filtereda stop list to eliminate words which do not have any contribution. • Weighting article stems • Providing a Category Semantic Depiction (CSD)
Methodology- • Semantic relatedness computing system architecture • Extraction categories algorithm • WordNet: • resolve the disambiguation pages problem: • Setp1 : extracting all outLinks • Setp2 : find links containing disambiguation tag in parenthesis • Setp3 : extract categories to the two first links • Final : take the categories of the article assigned to the first link existing in the ordered set
Methodology • Semantic relatedness computing system architecture • Semantic relatedness computing
Methodology • Evaluating semantic relatedness measures • Comparison with human judgments • Pearson product-moment correlation coefficient • Spearman rank order correlation coefficient • Datasets
Experiments • Our semantic relatedness computing system modules using Wikipedia features • Basic system • First module • Second module • Third module • Forth module
Experiments • Basic system
Experiments • First module: simple patterns
Experiments • Second module: Wikipedia pages
Experiments • Third module: enrichment using categories neighbors in WCG
Experiments • Forth module: Categories enrichment using WCG and redirects
Experiments • Application of the SR measure on other datasets • Datasets RG-65 and MC-30 • The verbal dataset YP-130 • Solving word choice problems
Conclusions • Our result system shows a good performance and outperforms sometimes ESA (Explicit Semantic Analysis) and TSA (Temporal Semantic Analysis) approaches
Comments • Advantages Able to use wiki to get a lot of semantic relationship information, semantic relations for many measurements related work of great help. • Applications • cognitive science • artificial intelligence