Collaborative Filtering

Collaborative Filtering. Rong Jin, Department of Computer Science and Engineering, Michigan State University. Outline: brief introduction to information filtering; collaborative filtering; major issues in collaborative filtering; main methods for collaborative filtering.

Presentation Transcript


  1. Collaborative Filtering Rong Jin Department of Computer Science and Engineering Michigan State University

  2. Outline • Brief introduction to information filtering • Collaborative filtering • Major issues in collaborative filtering • Main methods for collaborative filtering • Flexible mixture model for collaborative filtering • Decoupling model for collaborative filtering

  3. Short vs. Long Term Info. Need • Short-term information need (Ad hoc retrieval) • “Temporary need”, e.g., info about used cars • Information source is relatively static • User “pulls” information • Application example: library search, Web search • Long-term information need (Filtering) • “Stable need”, e.g., new data mining algorithms • Information source is dynamic • System “pushes” information to user • Applications: news filter

  4. Examples of Information Filtering • News filtering • Email filtering • Movie/book/product recommenders • Literature recommenders • And many others …

  5. Information Filtering • Basic filtering question: Will user U like item X? • Two different ways of answering it • Look at what U likes → characterize X → content-based filtering • Look at who likes X → characterize U → collaborative filtering • Combine content-based filtering and collaborative filtering

  6. Other Names for Information Filtering • Content-based filtering is also called • “Adaptive Information Filtering” in TREC • “Selective Dissemination of Information” (SDI) in Library & Information Science • Collaborative filtering is also called • Recommender systems

  7. Example: Content-based Filtering • History • Description: A homicide detective and a fire marshal must stop a pair of murderers who commit videotaped crimes to become media darlings. Rating: [not preserved in transcript] • Description: A biography of sports legend Muhammad Ali, from his early days to his days in the ring. Rating: No • Description: Benjamin Martin is drawn into the American Revolutionary War against his will when a brutal British commander kills his son. Rating: Yes • What to Recommend? • Description: A high-school boy is given the chance to write a story about an up-and-coming rock band as he accompanies it on its concert tour. Recommend: ? • Description: A young adventurer named Milo Thatch joins an intrepid group of explorers to find the mysterious lost continent of Atlantis. Recommend: ?

  8. Example: Collaborative Filtering • User 3 is more similar to user 1 than to user 2 → predict a rating of 5 for the movie “15 Minutes” for user 3

  9. Collaborative Filtering (CF) vs. Content-based Filtering (CBF) • CF does not need the content of items, while CBF relies on the content of items • CF is useful when the content of items • is not available or is difficult to acquire • is brief and insufficient • Example: movie recommendation • A movie may be preferred because of • its actors • its director • its popularity

  10. Application of Collaborative Filtering

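  11. Collaborative Filtering • Goal: Making filtering decisions for an individual user based on the judgments of other users • [Figure: rating matrix with users U = {u1, u2, …, um} as rows and objects O = {o1, o2, …, oj, oj+1, …, on} as columns; known entries are ratings from 1 to 5, unknown entries are marked “?”; the test user utest has rated some objects, and the remaining ratings are to be predicted]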

  12. Collaborative Filtering • Goal: Making filtering decisions for an individual user based on the judgments of other users • General idea • Given a user u, find similar users {u1, …, um} • Predict u’s rating based on the ratings of u1, …, um

  13. Example: Collaborative Filtering • User 3 is more similar to user 2 than to user 1 → predict a rating of 5 for the movie “15 Minutes” for user 3

  14. Memory-based Approaches for CF • The key is to find users that are similar to the test user • Traditional approach • Measure the similarity in rating patterns between different users • Example: Pearson Correlation Coefficient

  15. Pearson Correlation Coefficient for CF • Similarity between a training user y and a test user y0 [equation not preserved in transcript] • Remove the rating bias from each training user
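
The slide's own equation did not survive in the transcript. As a hedged reconstruction, the textbook form of the Pearson correlation coefficient for CF is shown below. The notation is an assumption, not taken from the slide: R_y(x) is the rating of user y on item x, \bar{R}_y is the mean rating of user y (subtracting it is the "remove the rating bias" step), and X_y is the set of items rated by y.

```latex
w_{y_0,y} =
\frac{\sum_{x \in X_{y_0} \cap X_{y}}
        \bigl(R_{y}(x) - \bar{R}_{y}\bigr)\bigl(R_{y_0}(x) - \bar{R}_{y_0}\bigr)}
     {\sqrt{\sum_{x \in X_{y_0} \cap X_{y}} \bigl(R_{y}(x) - \bar{R}_{y}\bigr)^{2}
            \;\sum_{x \in X_{y_0} \cap X_{y}} \bigl(R_{y_0}(x) - \bar{R}_{y_0}\bigr)^{2}}}
```

The sums run over the items rated by both users, so the value lies between -1 and 1.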

  16. Pearson Correlation Coefficient for CF • Estimate ratings for the test user [equation not preserved in transcript] • The estimate is a weighted vote of normalized ratings
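
A minimal Python sketch of the memory-based approach follows, assuming the commonly used form of the method: the Pearson weight is computed over co-rated items, and the prediction is the test user's mean rating plus the similarity-weighted average of each training user's mean-centered rating. The function names, the dictionary layout, and the toy ratings are illustrative assumptions, not taken from the slides.

```python
from math import sqrt


def mean_rating(ratings):
    """Average of a user's ratings; `ratings` maps item -> rating."""
    return sum(ratings.values()) / len(ratings)


def pearson(r_a, r_b):
    """Pearson correlation between two users over the items both have rated."""
    common = set(r_a) & set(r_b)
    if not common:
        return 0.0
    mean_a, mean_b = mean_rating(r_a), mean_rating(r_b)
    num = sum((r_a[x] - mean_a) * (r_b[x] - mean_b) for x in common)
    den = sqrt(sum((r_a[x] - mean_a) ** 2 for x in common)
               * sum((r_b[x] - mean_b) ** 2 for x in common))
    return num / den if den > 0 else 0.0


def predict(test_ratings, training_users, item):
    """Weighted vote of mean-centered ratings from users who rated `item`."""
    mean_test = mean_rating(test_ratings)
    num = den = 0.0
    for ratings in training_users.values():
        if item not in ratings:
            continue  # this training user cannot vote on the item
        w = pearson(test_ratings, ratings)
        num += w * (ratings[item] - mean_rating(ratings))
        den += abs(w)
    # Fall back to the test user's mean when no informative neighbor exists.
    return mean_test if den == 0 else mean_test + num / den


# Hypothetical toy data: u1 agrees with the test user, u2 disagrees.
training = {
    "u1": {"a": 5, "b": 4, "c": 1, "d": 5},
    "u2": {"a": 1, "b": 2, "c": 5, "d": 1},
}
test_user = {"a": 4, "b": 4, "c": 2}
prediction = predict(test_user, training, "d")
```

In this toy example u2 receives a negative weight, so its low rating for item "d" also pushes the prediction above the test user's mean.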

  17. Example

  18. Example

  19. Example

  20. Problems with Memory-based Approaches • Most users only rate a few items • Two similar users may not rate the same set of items → Clustering users and items

  21. Flexible Mixture Model (FMM) • Cluster both users and items simultaneously • User clustering and item clustering are correlated!

  22. Flexible Mixture Model (FMM) • Cluster both users and items simultaneously • [Figure labels: Movie Type I, Movie Type II, Movie Type III] • Unknown ratings are gone!

  23. Flexible Mixture Model (FMM) • Zu: user class • Zo: item class • U: user • O: item • R: rating • Hidden variables: Zu, Zo • Observed variables: U, O, R • Model parameters: P(Zo), P(Zu), P(o|Zo), P(u|Zu), P(r|Zo,Zu)
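
Reading the five distributions on this slide as the factors of the graphical model, the joint probability of an observed (item, user, rating) triple can be written as below. This is a sketch inferred from the listed parameters, under the assumption that the two hidden classes are summed out; the slide does not state the equation in the transcript.

```latex
P(o, u, r) = \sum_{z_o} \sum_{z_u}
    P(z_o)\, P(z_u)\, P(o \mid z_o)\, P(u \mid z_u)\, P(r \mid z_o, z_u)
```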

  24. Flexible Mixture Model: Estimation • Annealed Expectation Maximization (AEM) algorithm • E-step: calculate the posterior probabilities for the hidden variables Zu and Zo • b: temperature for the annealed EM algorithm • M-step: update the parameters
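
The E-step equation is not in the transcript. The sketch below assumes the usual tempered form of annealed EM, in which the joint probability of each (Zo, Zu) pair is raised to the power b before normalizing; b = 1 gives the ordinary EM posterior, and smaller b flattens it. The array layout, function name, and random parameters are illustrative assumptions.

```python
import numpy as np


def annealed_posterior(p_zo, p_zu, p_o_zo, p_u_zu, p_r_z, o, u, r, b=1.0):
    """Tempered posterior P(zo, zu | o, u, r) for one observed triple.

    p_zo: (Ko,), p_zu: (Ku,), p_o_zo: (Ko, n_items),
    p_u_zu: (Ku, n_users), p_r_z: (Ko, Ku, n_rating_levels).
    """
    item_part = p_zo * p_o_zo[:, o]   # P(zo) P(o|zo), shape (Ko,)
    user_part = p_zu * p_u_zu[:, u]   # P(zu) P(u|zu), shape (Ku,)
    joint = item_part[:, None] * user_part[None, :] * p_r_z[:, :, r]
    tempered = joint ** b             # b is the annealing temperature
    return tempered / tempered.sum()


# Hypothetical random parameters, only to exercise the function.
rng = np.random.default_rng(0)
n_item_classes, n_user_classes = 2, 3
n_items, n_users, n_levels = 4, 5, 5
p_zo = rng.dirichlet(np.ones(n_item_classes))
p_zu = rng.dirichlet(np.ones(n_user_classes))
p_o_zo = rng.dirichlet(np.ones(n_items), size=n_item_classes)
p_u_zu = rng.dirichlet(np.ones(n_users), size=n_user_classes)
p_r_z = rng.dirichlet(np.ones(n_levels), size=(n_item_classes, n_user_classes))

posterior = annealed_posterior(p_zo, p_zu, p_o_zo, p_u_zu, p_r_z, o=1, u=2, r=4, b=0.5)
flat = annealed_posterior(p_zo, p_zu, p_o_zo, p_u_zu, p_r_z, o=1, u=2, r=4, b=0.0)
```

With b = 0 every class pair gets the same posterior mass, which is the fully "melted" end of the annealing schedule.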

  25. Flexible Mixture Model: Prediction • Fold-in process • Repeat the EM algorithm including ratings from the test user • Fix all the parameters except for P(ut|zu) • Key issue: What user class does the test user belong to?

  26. Another Problem with Memory-based Approaches • Users with similar interests can have different rating patterns → Decoupling preference patterns from rating patterns

  27. Decoupling Model (DM) • Zu: user class • Zo: item class • U: user • O: item • R: rating • Hidden variables: Zu, Zo • Observed variables: U, O

  28. Decoupling Model (DM) • Zu: user class • Zo: item class • U: user • O: item • R: rating • Zpref: whether users like items • Hidden variables: Zu, Zpref, Zo • Observed variables: U, O

  29. Decoupling Model (DM) • Zu: user class • Zo: item class • U: user • O: item • R: rating • Zpref: whether users like items • ZR: rating class • Hidden variables: Zu, Zpref, Zo, ZR • Observed variables: U, O, R • Separating preference and rating patterns • User class + Rating class → rating R • Zu → Zpref and ZR + Zpref → r

  30. Experiment • Datasets: EachMovie and MovieRating • Evaluation: • Mean Absolute Error (MAE): average absolute deviation of the predicted ratings from the actual ratings on items • The smaller the MAE, the better the performance
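
The MAE definition above can be sketched in a few lines of Python. The function name and the sample numbers are illustrative assumptions, not results from the experiments.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute deviation of predicted ratings from actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical ratings for three held-out items.
mae = mean_absolute_error([4.0, 2.5, 5.0], [5, 2, 5])
```

Here the absolute errors are 1, 0.5, and 0, so the MAE is 0.5.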

  31. Experiment Protocol • Test the sensitivity of the proposed model to the amount of training data • Vary the number of training users • MovieRating dataset: 100 and 200 training users • EachMovie dataset: 200 and 400 training users • Test the sensitivity of the proposed model to the information needed for the test user • Vary the number of rated items provided by the test user • 5, 10, and 20 items are given with ratings
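
One way to realize the "given 5, 10, or 20 items" part of the protocol is to split each test user's ratings into a revealed set and a held-out set. The sketch below assumes the revealed items are sampled uniformly at random and that all remaining rated items are used for evaluation; the slides do not say how the items were chosen, and the function name and data are hypothetical.

```python
import random


def given_n_split(ratings, n_given, seed=0):
    """Split one test user's ratings into `n_given` revealed items and the rest."""
    if n_given >= len(ratings):
        raise ValueError("test user must have more than n_given rated items")
    items = sorted(ratings)                      # fixed order for reproducibility
    chosen = set(random.Random(seed).sample(items, n_given))
    given = {i: ratings[i] for i in items if i in chosen}
    held_out = {i: ratings[i] for i in items if i not in chosen}
    return given, held_out


# Hypothetical test user with 30 rated items on a 1-5 scale.
user_ratings = {f"item{k}": 1 + k % 5 for k in range(30)}
splits = {n: given_n_split(user_ratings, n) for n in (5, 10, 20)}
```

The revealed ratings are what the model sees for the test user; predictions are then scored on the held-out ratings.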

  32. Experimental Results: FMM and other baseline algorithms • A smaller MAE indicates better performance • [Figure: four plots of MAE against the number of given items (5, 10, 20): MovieRating with 100 training users, MovieRating with 200 training users, EachMovie with 200 training users, EachMovie with 400 training users]

  33. FMM vs. DM • A smaller value indicates better performance • [Figure: results on MovieRating and results on EachMovie]
