Exploring Linkability of User Reviews

Mishari Almishari and Gene Tsudik, University of California, Irvine


Presentation Transcript


  1. Exploring Linkability of User Reviews. Mishari Almishari and Gene Tsudik, University of California, Irvine

  2. Roadmap Introduction Data Set & Problem Settings Linkability Results & Improvements Discussion Future Work & Conclusion

  3. Motivation Increasing Popularity of Reviewing Sites Yelp, more than 39M visitors and 15M reviews in 2010

  4. Example [Figure; labels: Category, Rating]

  5. Motivation Rising awareness of privacy

  6. Motivation: How is it applied? Traceability/Linkability • Linkability of Ad Hoc Reviews • Linkability of Several Accounts

  7. Goal Assess the linkability in user reviews

  8. Roadmap Introduction Data Set & Problem Settings Linkability Results & Improvements Discussion Future Work & Conclusion

  9. Data Set • 1 Million Reviews • 2000 Users • Each user has more than 300 reviews

  10. Problem Settings

  11. Problem Settings

  12. Problem Formulation • IR: Identified Record • AR: Anonymous Record [Diagram of IRs and ARs]

  13. Problem Settings • Top-X Linkability, X: 1 and 10 • Anonymous Record (AR) sizes: 1, 5, 10, 20,…60 [Diagram: Anonymous Record (AR), Identified Records (IRs), Matching Model]
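
A minimal sketch of how a Top-X linkability ratio could be computed, assuming it means the fraction of anonymous records whose true author appears among the first X entries of the ranked list of identified records. The transcript does not spell out the definition, so the function name `linkability_ratio` and the toy data are illustrative assumptions.

```python
def linkability_ratio(rankings, true_owner, x):
    """Fraction of ARs whose true IR is among the top-x ranked IRs.

    rankings:   dict mapping AR id -> list of IR ids, best match first
    true_owner: dict mapping AR id -> IR id of the actual author
    (Both structures are hypothetical, chosen for illustration.)
    """
    hits = sum(1 for ar, ranked in rankings.items()
               if true_owner[ar] in ranked[:x])
    return hits / len(rankings)


# Toy data: two anonymous records ranked against three identified records.
rankings = {
    "ar1": ["u1", "u2", "u3"],  # true author u1 is ranked first
    "ar2": ["u3", "u2", "u1"],  # true author u2 is ranked second
}
true_owner = {"ar1": "u1", "ar2": "u2"}

top1 = linkability_ratio(rankings, true_owner, 1)  # 0.5
top2 = linkability_ratio(rankings, true_owner, 2)  # 1.0
```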

  14. Methodologies (1) Naïve Bayesian Model: Decreasing Sorted List of IRs (2) Kullback-Leibler Divergence (KLD): Increasing Sorted List of IRs • Maximum-Likelihood Estimation
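
The two rankings named on this slide can be sketched as follows. This is an illustrative reading, not the authors' implementation: the helper names, the add-one smoothing (`alpha`, used here to avoid taking the log of zero; `alpha=0` gives the plain maximum-likelihood estimate), and the direction of the divergence (AR relative to IR) are all assumptions.

```python
import math
from collections import Counter


def distribution(tokens, vocab, alpha=1.0):
    """Token distribution over vocab; alpha is an assumed smoothing constant."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}


def nb_log_likelihood(ar_tokens, ir_dist):
    """Naive Bayes score: log-probability of the AR tokens under one IR.
    Tokens outside the vocabulary are skipped."""
    return sum(math.log(ir_dist[t]) for t in ar_tokens if t in ir_dist)


def kld(p, q):
    """Kullback-Leibler divergence D(p || q) over a shared vocabulary."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)


def rank_nb(ar_tokens, ir_dists):
    """IRs sorted by decreasing likelihood (most likely author first)."""
    return sorted(ir_dists,
                  key=lambda u: nb_log_likelihood(ar_tokens, ir_dists[u]),
                  reverse=True)


def rank_kld(ar_tokens, ir_dists, vocab):
    """IRs sorted by increasing divergence (closest distribution first)."""
    ar_dist = distribution(ar_tokens, vocab)
    return sorted(ir_dists, key=lambda u: kld(ar_dist, ir_dists[u]))


# Toy example with a three-letter vocabulary and two hypothetical users.
vocab = ["a", "b", "c"]
ir_dists = {
    "alice": distribution(list("aaab"), vocab),
    "bob": distribution(list("bccc"), vocab),
}
ar_tokens = list("aab")
nb_ranking = rank_nb(ar_tokens, ir_dists)
kld_ranking = rank_kld(ar_tokens, ir_dists, vocab)
```

In this toy case both rankings place "alice" first, since her letter distribution is closest to that of the anonymous record.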

  15. Tokens • Unigram: • “privacy”: “p”, “r”, “i”, “v”, “a”, “c”, “y” • 26 values • Digram • “privacy”: “pr”, “ri”, “iv”, “va”, “ac”, “cy” • 676 values • Rating • 5 values • Category • 28 values
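
The lexical tokens above can be extracted with a short helper. This is a sketch under stated assumptions: text is lowercased, non-letters are treated as word separators, and digrams do not span word boundaries. The slide only shows the single-word example "privacy", so those choices and the name `letter_tokens` are illustrative.

```python
import string


def letter_tokens(text, n):
    """Return letter n-grams taken within each word (n=1 unigram, n=2 digram)."""
    tokens = []
    word = []
    # The trailing space flushes the last word through the same branch.
    for ch in text.lower() + " ":
        if ch in string.ascii_lowercase:
            word.append(ch)
        else:
            tokens.extend("".join(word[i:i + n])
                          for i in range(len(word) - n + 1))
            word = []
    return tokens


unigrams = letter_tokens("privacy", 1)  # ['p', 'r', 'i', 'v', 'a', 'c', 'y']
digrams = letter_tokens("privacy", 2)   # ['pr', 'ri', 'iv', 'va', 'ac', 'cy']
```

With 26 letters there are 26 possible unigram values and 26 × 26 = 676 possible digram values, matching the counts on the slide.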

  16. Roadmap Introduction Data Set & Problem Settings Linkability Results & Improvements Discussion Future Work & Conclusion

  17. Unigram Results (NB, Unigram) [Plot: Linkability Ratio (LR) vs. Anonymous Record Size] • Size 60: LR 83% (Top-1), LR 96% (Top-10)

  18. Digram Results (NB, Digram) [Plot: Linkability Ratio (LR) vs. Anonymous Record Size] • Size 20: LR 97% (Top-1) • Size 10: LR 88% (Top-1)

  19. Improvement (1): Combining Lexical and Non-Lexical Tokens (NB Model) [Plot: Linkability Ratio vs. Anonymous Record Size] • Gain: up to 20% • Size 30: 60% to 80% • Size 60: 83% to 96%

  20. What about Restricting Identified Record (IR) Size? [Two plots, NB Model and KLD Model: Linkability Ratio vs. Anonymous Record Size] • Performed better for smaller IR • Affected by IR size • Size 20 or less, improved

  21. Improvement (2): Matching All IRs At Once [Diagram: 4×4 grid of values v1 to v16, with one ✔ entry per row and ✖ marking excluded entries]
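
One way to match all records at once is a greedy one-to-one assignment, sketched below. This is an assumption: the transcript only preserves the slide's grid of values and check marks, not the procedure behind it. The function name `match_all_at_once` and the score layout are hypothetical. Higher scores mean better matches (for example, Naïve Bayes log-likelihoods).

```python
def match_all_at_once(scores):
    """Greedy one-to-one assignment of ARs to IRs.

    scores: dict mapping AR id -> dict of IR id -> score (higher is better).
    Repeatedly takes the best remaining (AR, IR) pair, then excludes both
    that AR and that IR from further consideration.
    """
    pairs = sorted(((s, ar, ir)
                    for ar, row in scores.items()
                    for ir, s in row.items()),
                   reverse=True)
    assignment, used_irs = {}, set()
    for s, ar, ir in pairs:
        if ar not in assignment and ir not in used_irs:
            assignment[ar] = ir
            used_irs.add(ir)
    return assignment


# Matched independently, both ARs would pick u1. Matching jointly assigns
# u1 to ar2 (the stronger match) and leaves u2 for ar1.
scores = {
    "ar1": {"u1": -1.0, "u2": -1.2},
    "ar2": {"u1": -0.5, "u2": -3.0},
}
assignment = match_all_at_once(scores)  # {'ar2': 'u1', 'ar1': 'u2'}
```

The one-to-one constraint only makes sense when each anonymous record is known to come from a distinct author.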

  22. Matching All Results [Two plots, Restricted IR and Full IR: Linkability Ratio vs. Anonymous Record Size] • Gains: up to 16% and up to 23% • Size 30: from 74% to 90% • Size 20: from 35% to 55%

  23. Improvement (3): For Small IR Size • Changing it to: 0.5 + Review Length [Plot: Linkability Ratio vs. Anonymous Record Size] • Gain: up to 5% • Size 10: 89% to 92% • Size 7: 79% to 84%

  24. Roadmap Introduction Data Set & Problem Settings Linkability Results & Improvements Discussion Future Work & Conclusion

  25. Discussion • Unigram and Scalability (26 vs. 676; 59 vs. 676; less than 10%) • Prolific Users (in the long run, users will be prolific) • Anonymous Record Size (a set of 60 reviews is less than 20% of the minimum contribution) • Detecting Spam Reviews

  26. Roadmap Introduction Data Set & Problem Settings Linkability Results & Improvements Discussion Future Work & Conclusion

  27. Future Work • Further Improvement for Small ARs • Other Probabilistic Models • Using Stylometry • Review Anonymization • Exploring Linkability in Other Preference Databases

  28. Conclusion • Extensive study to assess linkability of user reviews • For a large set of users • Using very simple features • Users are very exposed, even with simple features and a large number of authors • Takeaway point: reviews can be accurately de-anonymized using alphabetical letter distributions

  29. Questions?
