
Presented by Jungyoon Lee

Truth Discovery with Multiple Conflicting Information Providers on the Web. Jiawei Han, Xiaoxin Yin, Philip Yu. Presented by Jungyoon Lee. Motivation: Is the World Wide Web always trustworthy? No. There is no guarantee that information on the web is correct.


Presentation Transcript


  1. Truth Discovery with Multiple Conflicting Information Providers on the Web • Jiawei Han, Xiaoxin Yin, Philip Yu • Presented by Jungyoon Lee

  2. Motivation • Is the World Wide Web always trustworthy? • No. • There is no guarantee that information on the web is correct. • What if two web sites provide conflicting information about the same object? • Veracity: conformity to truth.

  3. Example – Authors of books • Who wrote the book “Rapid Contextual Design”?

  4. Survey • 54% of Internet users trust news web sites at least most of the time, while this ratio is only 26% for web sites that sell products and merely 12% for blogs. • Solutions? • Authority-Hub analysis? • PageRank? • Link-based analysis?

  5. TRUTHFINDER • Given a large amount of conflicting information about many objects, provided by multiple web sites, how can we discover the true fact about each object? • Inter-dependency between facts and web sites • Probability of a fact being true • Trustworthiness of a web site • Similar to the inter-dependency in Authority-Hub analysis?

  6. Inter-dependency • The trustworthiness of a web site does NOT depend on how many facts it provides. • We cannot compute the prob. of a fact being true by adding up the trustworthiness of web sites. • This leads to non-linearity in computation. • Different facts influence each other.

  7. Problem Definition • Input of TRUTHFINDER

  8. Definitions • Confidence of facts: s(f) • The probability of f being correct, to the best of our knowledge • Trustworthiness of web sites: t(w) • The expected confidence of the facts provided by w • Implication between facts • imp(f1 → f2): between -1 and 1 • positive: if f1 is correct then f2 is likely to be correct

  9. Four Basic Heuristics • Usually there is only one true fact for a property of an object. • This true fact appears to be the same or similar on different web sites. • The false facts on different web sites are less likely to be the same or similar. • In a certain domain, a web site that provides mostly true facts for many objects will likely provide true facts for other objects.

  10. Basic Inference • t(w) = Σ_{f ∈ F(w)} s(f) / |F(w)|, where F(w) is the set of facts provided by w.
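
The trustworthiness t(w) was defined earlier as the expected confidence of the facts a site provides, so it can be sketched as a plain average of s(f) over F(w). The function name, the dictionary layout, and the toy data below are illustrative assumptions, not part of the presentation.

```python
def trustworthiness(facts_by_site, confidence):
    """t(w): mean confidence s(f) over F(w), the facts provided by site w."""
    return {
        site: sum(confidence[f] for f in facts) / len(facts)
        for site, facts in facts_by_site.items()
        if facts  # skip sites with no facts to avoid dividing by zero
    }


# Hypothetical toy data: two sites, two facts.
facts_by_site = {"site_a": ["f1", "f2"], "site_b": ["f2"]}
confidence = {"f1": 0.5, "f2": 1.0}
t = trustworthiness(facts_by_site, confidence)
```

Because the result is an average, a site does not gain trustworthiness merely by providing more facts, which matches the point made on the Inter-dependency slide.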

  11. Computing confidence of a fact • The simple case: a fact f is the only fact about an object.
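
A minimal sketch of the simple case, assuming the web sites that provide f are independent of each other: f is then wrong only if every site providing it is wrong, which gives s(f) = 1 − Π (1 − t(w)) over those sites. The independence assumption, the function name, and the example values are assumptions made here for illustration.

```python
import math


def simple_confidence(site_trust):
    """s(f) = 1 - prod(1 - t(w)) over the sites w that provide f.

    Assumes the sites are independent; site_trust holds their t(w) values.
    """
    return 1.0 - math.prod(1.0 - t for t in site_trust)


# Hypothetical example: two sites with trustworthiness 0.9 and 0.8.
s = simple_confidence([0.9, 0.8])
```

The product form also illustrates the non-linearity mentioned earlier: confidences are not obtained by adding up trustworthiness values.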

  12. Influences between Facts
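
One possible way to let facts about the same object influence each other is to work with a log-scale score σ(f) = −ln(1 − s(f)) and add a weighted share of the other facts' scores, using imp(f' → f) as the weight. The log-scale score, the weight `rho`, and all names below are assumptions for this sketch; the slide itself does not give a formula.

```python
import math


def confidence_score(s):
    """Log-scale score sigma(f) = -ln(1 - s(f)); requires 0 <= s < 1."""
    return -math.log(1.0 - s)


def adjusted_score(fact, scores, imp, rho=0.5):
    """sigma*(f) = sigma(f) + rho * sum(sigma(f') * imp(f' -> f)).

    scores maps each fact about one object to sigma(f).
    imp maps (f_prime, f) pairs to an implication in [-1, 1];
    missing pairs count as 0. rho is a hypothetical weight in [0, 1].
    """
    influence = sum(
        scores[other] * imp.get((other, fact), 0.0)
        for other in scores
        if other != fact
    )
    return scores[fact] + rho * influence


# Hypothetical example: two conflicting facts about the same object.
scores = {"f1": 2.0, "f2": 1.0}
imp = {("f1", "f2"): -1.0, ("f2", "f1"): -1.0}
adj_f1 = adjusted_score("f1", scores, imp)
adj_f2 = adjusted_score("f2", scores, imp)
```

With a negative implication, each fact lowers the score of the fact it conflicts with, and the stronger fact lowers the weaker one more.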

  13. Handling Additional Subtlety • Are different web sites independent of each other? • Not likely! • The confidence of a fact f can easily become negative if f conflicts with facts provided by trustworthy web sites.
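
A hedged sketch of one way to handle both subtleties: scale the score by a dampening factor to compensate for dependent sites, then pass it through a logistic function so that a negative score still maps to a value between 0 and 1. The parameter `gamma`, its default value, and the function name are assumptions for illustration.

```python
import math


def confidence_from_score(adjusted, gamma=0.3):
    """Map an adjusted score (possibly negative) to a confidence in (0, 1).

    gamma is a hypothetical dampening factor in (0, 1]: smaller values
    discount agreement between sites that may have copied each other.
    """
    return 1.0 / (1.0 + math.exp(-gamma * adjusted))


# Hypothetical scores: a conflicting fact and a well-supported one.
low = confidence_from_score(-4.0)
high = confidence_from_score(4.0)
```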

  14. Iterative Computation • Initially, TRUTHFINDER has very little information about the web sites and the facts, so every web site is set to the same trustworthiness. • At each iteration, TRUTHFINDER tries to improve its knowledge about the trustworthiness of web sites and the confidence of facts, and it stops when the computation reaches a stable state.
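
The loop can be sketched as follows. This is a simplified illustration, not the full method: it assumes independent sites, ignores implications between facts, and treats "stable state" as the largest change in trustworthiness falling below a tolerance. The initial value 0.9, the tolerance, and all names are assumptions, and the input is assumed to be non-empty with at least one fact per site.

```python
import math


def truthfinder_sketch(facts_by_site, init_trust=0.9, tol=1e-6, max_iter=100):
    """Alternate between fact confidence and site trustworthiness."""
    # Start from a uniform trustworthiness for every site.
    trust = {w: init_trust for w in facts_by_site}

    # Invert the mapping: which sites provide each fact?
    sites_by_fact = {}
    for w, facts in facts_by_site.items():
        for f in facts:
            sites_by_fact.setdefault(f, []).append(w)

    conf = {}
    for _ in range(max_iter):
        # Confidence of a fact from the sites that provide it.
        conf = {
            f: 1.0 - math.prod(1.0 - trust[w] for w in ws)
            for f, ws in sites_by_fact.items()
        }
        # Trustworthiness of a site as the mean confidence of its facts.
        new_trust = {
            w: sum(conf[f] for f in fs) / len(fs)
            for w, fs in facts_by_site.items()
        }
        change = max(abs(new_trust[w] - trust[w]) for w in trust)
        trust = new_trust
        if change < tol:  # stable state reached
            break
    return trust, conf


# Hypothetical toy input: sites a and b agree on x1, site c alone claims x2.
trust, conf = truthfinder_sketch({"a": ["x1"], "b": ["x1"], "c": ["x2"]})
```

In this toy input the fact supported by two sites ends up with higher confidence than the fact supported by one, and the two agreeing sites end up more trustworthy than the lone one.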

  15. Empirical Study • Accuracies of TRUTHFINDER and VOTING

  16. Empirical Study

  17. Questions?
