Extracting Semantic Relationships between Wikipedia Categories
By Sergey Chernov, Tereza Iofciu, Wolfgang Nejdl, Xuan Zhou, Michal Kopycki, Przemyslaw Rys
Sergey Chernov
MOTIVATION: Preliminaries
• WIKIPEDIA: the largest knowledge-sharing system
• Many pages are assigned to CATEGORIES
• All links are purely NAVIGATIONAL
• Can we extract SEMANTIC links?
MOTIVATION: Wikipedia Categories Example
MOTIVATION: Possible benefits
• Semi-structured queries
  • “find Countries which had Democratic Non-Violent Revolutions” can be rephrased as
  • “find a page in category Countries that is linked to some page in category Non-Violent Revolutions”
• Hints for authors
  • “You are editing a page in category Countries; do you want to add a link to a page in category Capitals?”
• Raw data for manual semantic markup
Experiments: Heuristics
• Number of Links: NL = 3
• Connectivity Ratio: CR = 3/4 = 0.75
• Example: category Countries (Germany, Austria, Denmark, France) and category Capitals (Berlin, Vienna, Stockholm, Paris)
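A minimal Python sketch of the two heuristics on the example above. The slide gives no formal definitions, so this assumes NL counts page-to-page links from category Countries into category Capitals, and CR divides NL by the number of pages in the source category (which reproduces 3/4 here, since Countries has four pages). The slide also does not show which pages are linked; the three links below (Germany to Berlin, Austria to Vienna, France to Paris) are hypothetical, chosen only to match NL = 3.

```python
from typing import Dict, Set

# Hypothetical toy data for the slide's example. The category members come
# from the slide; the three page links are assumed (the slide does not show
# them) and are chosen so that NL = 3.
categories: Dict[str, Set[str]] = {
    "Countries": {"Germany", "Austria", "Denmark", "France"},
    "Capitals": {"Berlin", "Vienna", "Stockholm", "Paris"},
}
links: Dict[str, Set[str]] = {  # page -> pages it links to
    "Germany": {"Berlin"},
    "Austria": {"Vienna"},
    "France": {"Paris"},
}


def number_of_links(src: str, dst: str) -> int:
    """NL: links from pages in category `src` to pages in category `dst`."""
    targets = categories[dst]
    return sum(len(links.get(page, set()) & targets) for page in categories[src])


def connectivity_ratio(src: str, dst: str) -> float:
    """CR (assumed definition): NL divided by the number of pages in `src`."""
    return number_of_links(src, dst) / len(categories[src])


print(number_of_links("Countries", "Capitals"))     # 3
print(connectivity_ratio("Countries", "Capitals"))  # 0.75
```

Normalizing by category size is what keeps large categories from dominating on raw link counts alone; other normalizations are possible and would give different values.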
Experiments: Dataset
• INEX 2006 collection
• Sample category rankings
Manual assessment methodology
• Semantic Connection Strength (SCS) measure:
  • 2 = strong semantic relationship
  • 1 = average semantic relationship
  • 0 = weak or no semantic relationship
• Instructions for assessors:
  • “Category A is strongly related to category B (value 2) if you believe that every page in A should conceptually have at least one semantic link to B;”
  • “A and B are averagely related (value 1) if you believe that 50% of the pages in A should have semantic links to B;”
  • “otherwise, A and B are weakly related (value 0).”
Experiments: Number of Links
Average semantic connection strength (SCS) of the relationships extracted using Number of Links, over 100 sample categories.
Experiments: Connectivity Ratio
Average semantic connection strength (SCS) of the relationships extracted using Connectivity Ratio, over 100 sample categories.
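One plausible way to compute such averaged curves, sketched in Python under stated assumptions: each sample category has a ranked list of related categories, each related category received an SCS of 0, 1 or 2 from the assessors, and the average is taken per rank position across the sample categories. The category names and scores below are hypothetical, not the study's data.

```python
from typing import Dict, List

# Hypothetical assessor labels, not the study's data: for each sample
# category, the SCS (0, 1 or 2) of its related categories in rank order.
assessments: Dict[str, List[int]] = {
    "Countries": [2, 2, 1],
    "Capitals": [2, 1, 0],
    "Rivers": [1, 1, 0],
    "Cities": [1, 0, 1],
}


def average_scs_by_rank(labels: Dict[str, List[int]]) -> List[float]:
    """Mean SCS at each rank position over the categories that reach it."""
    depth = max(len(scores) for scores in labels.values())
    averages: List[float] = []
    for r in range(depth):
        at_rank = [scores[r] for scores in labels.values() if len(scores) > r]
        averages.append(sum(at_rank) / len(at_rank))
    return averages


print(average_scs_by_rank(assessments))  # [1.5, 1.0, 0.5]
```

Running the same aggregation on rankings produced by NL and by CR gives two curves that can be compared position by position.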
Summary: General Results and Conclusions
• Results are skewed toward the Countries category
• Connectivity Ratio is a better measure than Number of Links
• We observed that inlinks perform better than outlinks
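The inlink/outlink distinction can be illustrated with a small Python sketch. For a category C, the outlink view scores another category D by links leaving C's pages toward D's pages; the inlink view scores D by links from D's pages into C's pages. The graph below and the CR definition (links divided by the size of the category the links originate from) are assumptions for illustration, and the toy numbers say nothing about the reported results.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy graph, not taken from the paper.
categories: Dict[str, Set[str]] = {
    "Countries": {"Germany", "Austria", "Denmark", "France"},
    "Capitals": {"Berlin", "Vienna", "Stockholm", "Paris"},
    "Rivers": {"Danube", "Rhine"},
}
links: Dict[str, Set[str]] = {  # page -> pages it links to
    "Germany": {"Berlin", "Rhine"},
    "Austria": {"Vienna", "Danube"},
    "France": {"Paris"},
    "Berlin": {"Germany"},
    "Vienna": {"Austria"},
    "Danube": {"Austria"},
    "Rhine": {"Germany"},
}


def nl(src: str, dst: str) -> int:
    """Number of links from pages of category `src` to pages of category `dst`."""
    return sum(len(links.get(p, set()) & categories[dst]) for p in categories[src])


def cr(src: str, dst: str) -> float:
    """Assumed CR: nl(src, dst) divided by the number of pages in `src`."""
    return nl(src, dst) / len(categories[src])


def rank(cat: str, incoming: bool) -> List[Tuple[str, float]]:
    """Rank the other categories for `cat` by CR over inlinks or outlinks."""
    others = [c for c in categories if c != cat]
    if incoming:
        scores = [(d, cr(d, cat)) for d in others]  # links entering cat
    else:
        scores = [(d, cr(cat, d)) for d in others]  # links leaving cat
    return sorted(scores, key=lambda s: s[1], reverse=True)


print(rank("Countries", incoming=False))  # [('Capitals', 0.75), ('Rivers', 0.5)]
print(rank("Countries", incoming=True))   # [('Rivers', 1.0), ('Capitals', 0.5)]
```

With these made-up links the two directions order the candidate categories differently, so the link direction has to be fixed before the heuristics are compared.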
Summary: Future Steps
• More manual exploration; look for additional heuristics
• Consider more categories
• SCS composed of:
  • Is this a “part of” relation? W1
  • Is this an “is a” relation? W2
  • Is this a “synonym” relation? W3
  • Is this an “antonym” relation? W4
  • Is it related in a different way? Which one? W5
Thank You!