Truth Discovery with Multiple Conflicting Information Providers on the Web • Jiawei Han, Xiaoxin Yin, Philip Yu • Presented by Jungyoon Lee
Motivation • Is the world-wide web always trustworthy? • No!! • There is no guarantee that information on the web is correct. • What if two different web sites give conflicting information about the same object? • Veracity – conformity to truth.
Example – Authors of books • Who wrote the book “Rapid Contextual Design”?
Survey • 54% of Internet users trust news web sites at least most of the time, while this ratio is only 26% for web sites that sell products, and merely 12% for blogs. • Solutions? • Authority-Hub analysis? • PageRank? • Link-based analysis?
TRUTHFINDER • Given a large amount of conflicting information about many objects, provided by multiple web sites, how can we discover the true fact about each object? • Inter-dependency between facts and web sites • Probability of a fact being true • Trustworthiness of a web site • Similar to the inter-dependency in Authority-Hub analysis?
Inter-dependency • The trustworthiness of a web site does NOT depend on how many facts it provides. • We cannot compute the probability of a fact being true by adding up the trustworthiness of the web sites that provide it. • This leads to non-linearity in the computation. • Different facts influence each other.
Problem Definition • Input of TRUTHFINDER
Definitions • Confidence of facts: s(f) • The probability of f being correct, according to the best of our knowledge • Trustworthiness of web sites: t(w) • The expected confidence of the facts provided by w • Implication between facts • imp(f1 → f2) : between -1 and 1 • positive: if f1 is correct then f2 is likely to be correct
Four Basic Heuristics • Usually there is only one true fact for a property of an object. • This true fact appears to be the same or similar on different web sites. • The false facts on different web sites are less likely to be the same or similar. • In a certain domain, a web site that provides mostly true facts for many objects will likely provide true facts for other objects.
Basic Inference • $t(w) = \frac{\sum_{f \in F(w)} s(f)}{|F(w)|}$, where F(w) is the set of facts provided by w.
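Since t(w) is defined as the expected confidence of the facts provided by w, it can be sketched as a plain average of s(f) over F(w). The function name and the dictionary layout below are illustrative assumptions, and every site is assumed to provide at least one fact.

```python
def site_trustworthiness(fact_confidence, facts_by_site):
    """Return t(w) for every site w as the mean confidence of its facts.

    fact_confidence: dict mapping fact -> s(f), a value in [0, 1]
    facts_by_site:   dict mapping site -> iterable of facts, i.e. F(w)
    (Both layouts are hypothetical, chosen for illustration.)
    """
    trust = {}
    for site, facts in facts_by_site.items():
        facts = list(facts)
        # Average s(f) over the facts this site provides.
        trust[site] = sum(fact_confidence[f] for f in facts) / len(facts)
    return trust
```

Note that the average, rather than the sum, is what keeps a site's trustworthiness independent of how many facts it provides.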
Computing confidence of a fact • The simple case: a fact f is the only fact about an object.
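One reading of the simple case, assuming the web sites providing f are independent: f is wrong only if every site providing it is wrong, so s(f) = 1 - ∏ (1 - t(w)) over the sites w that provide f. The sketch below follows that assumption; the function name and input format are illustrative.

```python
import math

def simple_confidence(trust_scores):
    """Confidence s(f) of a fact that is the only fact about its object.

    trust_scores: iterable of t(w) for each site w providing the fact.
    Assumes sites are independent, so the fact is false only if
    every site providing it is wrong.
    """
    prob_all_wrong = math.prod(1.0 - t for t in trust_scores)
    return 1.0 - prob_all_wrong
```

Because the sites' contributions combine through a product, not a sum, this is one place where the non-linearity mentioned earlier shows up.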
Handling Additional Subtlety • Are different web sites independent of each other? • Not likely! • The confidence of a fact f can easily become negative if f conflicts with facts provided by trustworthy web sites.
Iterative Computation • Initially, TRUTHFINDER has very little information about the web sites and the facts, so every web site starts with the same trustworthiness. • At each iteration, TRUTHFINDER tries to improve its estimates of site trustworthiness and fact confidence, and it stops when the computation reaches a stable state.
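A minimal sketch of one way to handle both issues, assuming scores are kept on a log scale (sigma(f) = -ln(1 - s(f))): conflicting or supporting facts adjust the score through their implication values, a dampening factor compensates for dependence between sites, and a logistic function keeps the final confidence inside (0, 1) even when the adjusted score is negative. The names `rho` and `gamma` and their default values are illustrative assumptions.

```python
import math

def adjusted_score(own_score, related, rho=0.5):
    """Adjust a fact's confidence score using related facts.

    own_score: sigma(f), the fact's own score on the log scale
    related:   list of (sigma(f'), imp(f' -> f)) pairs for other facts
               about the same object; imp lies between -1 and 1
    rho:       weight given to related facts (assumed value)
    """
    return own_score + rho * sum(score * imp for score, imp in related)

def dampened_confidence(score, gamma=0.3):
    """Map a possibly negative score to a confidence in (0, 1).

    gamma < 1 dampens the score to account for sites that copy from
    each other; the logistic function guarantees a valid probability.
    """
    return 1.0 / (1.0 + math.exp(-gamma * score))
```

For example, a fact with score 2.0 that conflicts fully (imp = -1) with a fact of score 6.0 gets an adjusted score of -1.0, which the logistic maps to a confidence below 0.5 instead of a negative number.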
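The iterative procedure can be sketched as a fixed-point loop. This is a simplified illustration under stated assumptions, not the authors' implementation: it ignores implications between facts, uses a dampened logistic for confidence, and treats "stable state" as the largest change in any site's trustworthiness falling below a tolerance. The function name, data layout, and default values (`initial_trust`, `gamma`, `tol`) are all hypothetical, and every site is assumed to provide at least one fact.

```python
import math

def truthfinder_sketch(provides, initial_trust=0.9, gamma=0.3,
                       tol=1e-6, max_iter=100):
    """provides: dict mapping site -> set of facts it provides.

    Returns (trust, confidence): dicts of t(w) per site and s(f) per fact.
    """
    # Start from uniform trustworthiness for every site.
    trust = {w: initial_trust for w in provides}
    confidence = {}
    for _ in range(max_iter):
        # Trustworthiness score tau(w) = -ln(1 - t(w)); clamp to avoid log(0).
        tau = {w: -math.log(1.0 - min(t, 1.0 - 1e-9))
               for w, t in trust.items()}
        # Score of a fact: sum of tau over the sites that provide it.
        sigma = {}
        for w, facts in provides.items():
            for f in facts:
                sigma[f] = sigma.get(f, 0.0) + tau[w]
        # Dampened logistic turns the score into a confidence in (0, 1).
        confidence = {f: 1.0 / (1.0 + math.exp(-gamma * s))
                      for f, s in sigma.items()}
        # New trustworthiness: average confidence of the site's facts.
        new_trust = {w: sum(confidence[f] for f in facts) / len(facts)
                     for w, facts in provides.items()}
        change = max(abs(new_trust[w] - trust[w]) for w in trust)
        trust = new_trust
        if change < tol:  # stable state reached
            break
    return trust, confidence
```

With `provides = {"a": {"x"}, "b": {"x"}, "c": {"y"}}`, fact x (backed by two sites) ends up with higher confidence than fact y, and sites a and b end up more trustworthy than c.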
Empirical Study • Accuracies of TRUTHFINDER and VOTING
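For reference, a VOTING baseline can be sketched as follows, assuming VOTING means picking, for each object, the fact provided by the most web sites. The function name and the `(site, object, fact)` claim format are illustrative assumptions, and each site is assumed to state a given fact at most once.

```python
from collections import Counter

def vote(claims):
    """claims: iterable of (site, obj, fact) triples.

    Returns a dict mapping each object to the fact stated by the most
    sites. Ties go to the fact that was encountered first.
    """
    counts = {}
    for _site, obj, fact in claims:
        counts.setdefault(obj, Counter())[fact] += 1
    return {obj: c.most_common(1)[0][0] for obj, c in counts.items()}
```

Unlike the trust-weighted computation, this baseline treats every web site as equally reliable.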