
A Method of Rating the Credibility of News Documents on the Web


Presentation Transcript


  1. A Method of Rating the Credibility of News Documents on the Web. Ryosuke Nagura, Yohei Seki (Toyohashi University of Technology, Aichi 441-8580, Japan; nagu@kde.ics.tut.ac.jp, seki@ics.tut.ac.jp); Noriko Kando (National Institute of Informatics, Tokyo 101-8430, Japan; kando@nii.ac.jp); Masaki Aono (Toyohashi University of Technology, Aichi 441-8580, Japan; aono@ics.tut.ac.jp). SIGIR’06, August 6–11, 2006, Seattle, Washington, USA.

  2. ABSTRACT • INTRODUCTION • METHOD (Commonality, Numerical Agreement, Objectivity) • EVALUATION • CONCLUSION

  3. INTRODUCTION • When a user collects information from the Web, that information is not always correct. • Many falsehoods are posted on electronic bulletin boards and blogs by malicious users. • The purpose of this study is to propose a method for rating the credibility of the information in news articles on the Web.

  4. Commonality • The more news publishers that delivered articles with content similar to the target article being assessed, the higher the credibility was rated. • We computed the cosine similarities between sentences in all the articles from different news publishers. • If the similarity between two sentences exceeded a threshold, we defined them as “similar”.
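A minimal sketch of the sentence-similarity step described above. The slides do not say how sentences are vectorized or what threshold was used, so the term-frequency bag-of-words representation and the threshold value below are assumptions; real Japanese text would also need morphological tokenization rather than whitespace splitting.

    # Sentence-similarity sketch; the vectorization and threshold are assumptions,
    # not details taken from the paper.
    from collections import Counter
    import math

    def cosine_similarity(sent_a: str, sent_b: str) -> float:
        """Cosine similarity between two sentences as term-frequency vectors."""
        a, b = Counter(sent_a.split()), Counter(sent_b.split())  # whitespace tokens (simplification)
        dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
        norm_a = math.sqrt(sum(v * v for v in a.values()))
        norm_b = math.sqrt(sum(v * v for v in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    SIM_THRESHOLD = 0.5  # hypothetical value; the slides only say "a threshold"

    def is_similar(sent_a: str, sent_b: str) -> bool:
        """Two sentences count as "similar" when their cosine similarity exceeds the threshold."""
        return cosine_similarity(sent_a, sent_b) > SIM_THRESHOLD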

  5. Commonality

  6. Example • Score = (1/3 + 1/3 + 2/3 + 0)/4 = 0.33 (1) • [Figure: sentences in the target document from Site D compared with sentences in the articles from Sites A, B, and C.]
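The worked example suggests that each sentence of the target article contributes the fraction of the other publishers whose articles contain a similar sentence, averaged over all target sentences. The sketch below follows that reading; the exact aggregation is not spelled out on the slide, and the similar argument is expected to be a predicate such as is_similar from the previous sketch.

    # Commonality score as read from Example (1); this aggregation is an assumption.
    def commonality_score(target_sentences, other_articles, similar):
        """target_sentences: sentences of the article being assessed.
        other_articles: one list of sentences per other news publisher.
        similar: a predicate such as is_similar(sent_a, sent_b)."""
        if not target_sentences or not other_articles:
            return 0.0
        total = 0.0
        for sentence in target_sentences:
            # Number of other publishers whose article contains a similar sentence.
            covered = sum(
                any(similar(sentence, other) for other in article)
                for article in other_articles
            )
            total += covered / len(other_articles)
        return total / len(target_sentences)

    # With three other publishers and per-sentence coverage 1/3, 1/3, 2/3 and 0,
    # the score is (1/3 + 1/3 + 2/3 + 0) / 4 = 0.33, as in the example.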

  7. Numerical Agreement • Numerical expressions such as “100 passengers” or “three tracks” occur in news reports. When numerical expressions in the target article contradicted those in articles from other news publishers, the credibility was rated lower.

  8. Numerical Agreement

  9. Example • Score = (3 - 1*2) / 5 = 0.2 (2) • [Figure: numbered numerical expressions in the target document from Site D compared with those in the articles from Sites A, B, and C.]
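Example (2) implies that each numerical expression that agrees with other publishers adds 1, each contradiction subtracts 2, and the sum is divided by the number of numerical expressions in the target article. The sketch below follows that reading and assumes the expressions have already been extracted and aligned as slot/value pairs, which is our simplification; the slides do not describe the extraction step.

    # Numerical-agreement score as read from Example (2); the (slot, value)
    # representation is an assumption for illustration.
    def numerical_agreement_score(target_numbers, other_numbers):
        """target_numbers: e.g. {"passengers": 100, "tracks": 3} for the target article.
        other_numbers: one such dict per other news publisher."""
        if not target_numbers:
            return 0.0
        score = 0
        for slot, value in target_numbers.items():
            reported = [nums[slot] for nums in other_numbers if slot in nums]
            if not reported:
                continue          # no other publisher mentions this expression
            if any(v != value for v in reported):
                score -= 2        # a contradiction lowers credibility strongly
            else:
                score += 1        # agreement raises credibility
        return score / len(target_numbers)

    # Three agreements and one contradiction over five expressions give
    # (3 - 1*2) / 5 = 0.2, as in the example.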

  10. Commonality and Numerical Agreement • When the sum of scores (1) and (2) exceeded the threshold σ (= 0.5), we categorized the article as a candidate credible article. • 0.33 + 0.2 = 0.53 > σ (= 0.5)
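Combining the two scores is then a simple threshold test, with σ = 0.5 as given on the slide.

    # Candidate filtering from slide 10.
    SIGMA = 0.5

    def is_candidate(commonality: float, numerical_agreement: float) -> bool:
        """An article is a candidate credible article when scores (1) + (2) exceed sigma."""
        return commonality + numerical_agreement > SIGMA

    # Running example: 0.33 + 0.2 = 0.53 > 0.5, so the article goes on to the objectivity step.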

  11. Objectivity • For candidate credible articles, the credibility score was further rated using a list of speculative clue phrases and the news sources indicated in the articles. • All the sentences in the article were rated using the scores of the speculative clue phrases they contained.

  12. Objectivity

  13. Objectivity • We categorized news source types into four: (a) news agencies and publishers, (b) government agencies, (c) police, and (d) TV/radio.

  14. Objectivity • We raised the objectivity score when news sources were cited in the article. • The appearance of a news agency such as “Associated Press” raised the objectivity score more than the appearance of a TV/radio source.
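A sketch of how the objectivity step could be put together under these rules. The clue-phrase list, the numeric source weights, and the way the two parts are combined are all placeholders: the slides only state that speculative clue phrases lower a sentence's score and that a news agency source such as "Associated Press" raises objectivity more than a TV/radio source.

    # Objectivity sketch; phrase list, weights, and combination are assumptions.
    SPECULATIVE_CLUES = ["reportedly", "allegedly", "it is said"]   # illustrative English stand-ins

    SOURCE_WEIGHTS = {        # categories (a)-(d) from slide 13; values are assumed,
        "news_agency": 1.0,   # except that news agencies outrank TV/radio
        "government": 0.8,
        "police": 0.7,
        "tv_radio": 0.5,
    }

    def objectivity_score(sentences, cited_source_types):
        """sentences: the article's sentences; cited_source_types: source
        categories detected in the article, e.g. ["news_agency"]."""
        if not sentences:
            return 0.0
        # Sentence part: a sentence containing a speculative clue phrase scores 0, otherwise 1.
        clue_part = sum(
            0.0 if any(clue in s.lower() for clue in SPECULATIVE_CLUES) else 1.0
            for s in sentences
        ) / len(sentences)
        # Source part: weight of the strongest cited source, 0 when no source is cited.
        source_part = max((SOURCE_WEIGHTS.get(t, 0.0) for t in cited_source_types), default=0.0)
        return (clue_part + source_part) / 2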

  15. EVALUATION • 55,994 news stories on the Web (published from October to November 2005) were collected every 30 minutes from seven news sites (four newspaper sites, one TV news site, one overseas news agency site, and one evening daily site; all stories were written in Japanese).

  16. EVALUATION • From the collected articles, 52 articles were selected manually for human assessment. • We manually categorized the 52 articles into 8 topic groups, as shown in Table 2, and divided them into those with high (4, 3) and those with low (2, 1) credibility scores as calculated by our proposed method.

  17. EVALUATION • We combined them into 26 pairs of articles published in the same time period and then asked three assessors to compare their credibility. • Agreement rate = # of agreements / # of article pairs

  18. Example • There are 3 pairs: A (4, 3), B (1, 3), C (4, 4). • If the assessor judges A1: high, A2: high (O); B1: high, B2: low (X); C1: high, C2: low (X), then Agreement rate = 1 / 3 = 0.33
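The sketch below reproduces this agreement-rate calculation under our reading of slides 16-18: a pair counts as an agreement when the assessor's high/low judgement of both articles matches the high (4, 3) / low (2, 1) split of the method's scores.

    # Agreement rate between the method and one assessor; the high/low
    # matching rule is our reading of the slide's example.
    def agreement_rate(method_scores, human_judgements):
        """method_scores: (score1, score2) per pair on the 1-4 scale.
        human_judgements: ("high"/"low", "high"/"low") per pair."""
        def level(score):
            return "high" if score >= 3 else "low"

        agreements = sum(
            (level(s1), level(s2)) == pair_judgement
            for (s1, s2), pair_judgement in zip(method_scores, human_judgements)
        )
        return agreements / len(method_scores)

    # The slide's example: pairs A(4, 3), B(1, 3), C(4, 4) judged as
    # (high, high), (high, low), (high, low) give 1 / 3 = 0.33.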

  19. EVALUATION

  20. CONCLUSION • In an experiment with three assessors, the average agreement between our proposed method and the human assessments was 69.1%. • However, the method tends to rate a “scoop” rather low when applied to developing news stories.
