Netflix and Beyond



  1. Netflix and Beyond. Tuning Solr for great results. Walter Underwood, http://wunderwood.org/most_casual_observer/

  2. Typical Web Query Mix • informational • navigational (known-site) • transactional (known-item) (Andrei Broder, AltaVista, 2002)

  3. “talking rat movie”

  4. Top Queries October 2006 • finding neverland • bridget jones • closer • the incredibles • incredibles • ladder 49 • fat albert • being julia • ray • national treasure • alfie • spanglish • star wars • meet the fockers • final cut • hotel rwanda • neverland • after the sunset • million dollar baby • hitch

  5. Netflix Queries • 92% movie titles • 5% genres and categories • 3% people. Known-item queries (titles plus people) make up 95% of Netflix traffic.

  6. Problematic User Behavior • One or two words? • Partial words • Misspellings

  7. One or Two Words?

  8. Partial Words • People don’t like to make mistakes, so they stop typing partway through a word: • rat, rata, ratat • apoc • koyaanisq • Phonetic encoding (soundex) assumes complete words
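
Why phonetic matching fails here is easy to demonstrate. Below is a minimal soundex sketch in Python (a simplified textbook version, not code from the talk): the prefixes users actually type land in different buckets than the finished title.

    SOUNDEX_CODES = {c: d for d, letters in
                     {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                      "4": "l", "5": "mn", "6": "r"}.items()
                     for c in letters}

    def soundex(word):
        """Classic soundex, simplified: first letter plus up to three digits."""
        word = word.lower()
        digits, prev = [], SOUNDEX_CODES.get(word[0], "")
        for ch in word[1:]:
            d = SOUNDEX_CODES.get(ch, "")
            if d and d != prev:
                digits.append(d)
            if ch not in "hw":  # h and w do not break a run of equal codes
                prev = d
        return (word[0].upper() + "".join(digits) + "000")[:4]

    print(soundex("rat"), soundex("rata"), soundex("ratat"))  # R300 R300 R330
    print(soundex("ratatouille"))                             # R334

Because “rat” and “ratatouille” encode differently, a phonetic field cannot rescue a half-typed query; hence the autocomplete approach on the next slide.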

  9. Autocomplete Finishes Words • Load movie titles and popular people • 10% improvement in search quality (MRR) • 10X as much traffic as search queries • Dedicated Solr with RAMDirectory • Front-end HTTP cache, 1 hour lifetime, 80% hit rate
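
The slide describes the serving setup, not the lookup itself; a rough sketch of the query side against a dedicated Solr core might look like this (the host, core name, and title_prefix field are illustrative assumptions, not Netflix’s actual schema):

    import json
    import urllib.parse
    import urllib.request

    # Hypothetical dedicated autocomplete core.
    SOLR_SELECT = "http://localhost:8983/solr/autocomplete/select"

    def suggest(prefix, rows=8):
        """Return up to `rows` title suggestions for a typed prefix."""
        params = urllib.parse.urlencode({
            # Assumed edge-ngram prefix field; single-token prefix for
            # simplicity -- multi-word input would need quoting/escaping.
            "q": "title_prefix:%s" % prefix,
            "rows": rows,
            "wt": "json",
        })
        with urllib.request.urlopen("%s?%s" % (SOLR_SELECT, params)) as resp:
            body = json.load(resp)
        return [doc["title"] for doc in body["response"]["docs"]]

    # e.g. suggest("koyaanisq") might return ["Koyaanisqatsi", ...]

Because the same short prefixes repeat across users, responses cache well; a Cache-Control: max-age=3600 header would match the one-hour lifetime and 80% hit rate the slide reports.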

  10. Some Misspellings • shakespear • the incredables • seven samarai • breakfast at tiffiney • blazing sadles • selen • scorupko • taeku • christopherwalkin • return to lonsom dove • teh matrix • comdytv • pirhana • dungens and dragons • pufiyami • al pachino • incredables • gundan seed mobile suit • chatterluy • white fany to the rsecue • meet the faulkers • brigettejoes diary • oh brother where are thou? • pirartes of the carr

  11. Switch from Phonetic to Fuzzy • Tested a dozen algorithms with users • 250K queries per test cell • Jaro-Winkler slightly better than Levenshtein • Jaro-Winkler with a 0.7 threshold is a very, very broad match • “koyaanisqatsi” matches “koy” (yuck!) • but “1048” matches “1408”
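
Both examples on this slide are easy to reproduce. Here is a small self-contained Jaro-Winkler implementation (a textbook version, not Lucene’s internals) showing how broad a 0.7 cutoff really is:

    def jaro(s1, s2):
        if s1 == s2:
            return 1.0
        window = max(max(len(s1), len(s2)) // 2 - 1, 0)
        used = [False] * len(s2)
        matched1 = []
        for i, ch in enumerate(s1):
            for j in range(max(0, i - window), min(len(s2), i + window + 1)):
                if not used[j] and s2[j] == ch:
                    used[j] = True
                    matched1.append(ch)
                    break
        matched2 = [c for c, u in zip(s2, used) if u]
        if not matched1:
            return 0.0
        transpositions = sum(a != b for a, b in zip(matched1, matched2)) / 2
        m = float(len(matched1))
        return (m / len(s1) + m / len(s2) + (m - transpositions) / m) / 3

    def jaro_winkler(s1, s2, p=0.1):
        j = jaro(s1, s2)
        prefix = 0
        for a, b in zip(s1[:4], s2[:4]):  # common prefix, capped at 4 chars
            if a != b:
                break
            prefix += 1
        return j + prefix * p * (1 - j)

    print("%.3f" % jaro_winkler("koyaanisqatsi", "koy"))  # 0.821 -- over 0.7, yuck
    print("%.3f" % jaro_winkler("1048", "1408"))          # 0.925 -- the typo we want

The same looseness that lets “koy” match “koyaanisqatsi” is what catches genuine one-character slips like “1048” for “1408”; that trade-off is the point of the slide.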

  12. Problematic Corpus Behavior • Missing movies • Ollie Hopnoodle’s Haven of Bliss • CJ7 • Hard-to-spell names • Ratatouille • Coraline • Inglourious Basterds • Hard-to-remember names • Click • Apocalypto • Seven Up Plus Seven

  13. Metrics: MRR • Mean Reciprocal Rank • Weighted clickthrough, measured on site traffic • a click on #1 counts as a full click • #2 as a half click • #3 as one-third of a click • etc. • Shows daily, weekly, and seasonal variations • Tracks overall customer satisfaction • Good for A/B tests, weak for finding bugs
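
As a concrete sketch of that weighting (hypothetical data; the slide defines the click weights, while counting a no-click query as zero is my assumption):

    def mrr(clicked_ranks):
        """Mean Reciprocal Rank: a click at rank r contributes 1/r.

        None marks a query with no click, counted as 0 (an assumption;
        the slide only defines the weights for clicked ranks).
        """
        scores = [1.0 / r if r else 0.0 for r in clicked_ranks]
        return sum(scores) / len(scores)

    # One hypothetical slice of traffic: rank clicked per query.
    ranks = [1, 1, 2, 1, 3, None, 1, 2]
    print(round(mrr(ranks), 3))  # 0.667 -- clears the 0.5 bar on slide 15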

  14. Per-query Metrics • Useful for finding problems • MRR • Clickthrough percent • Most-clicked rank (#1 is good) • Percentage of clicks on most-clicked • known-item queries are over 80% • categories are under 50%
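
A minimal roll-up along those lines (the (query, clicked_rank) log format is a hypothetical stand-in for real click logs):

    from collections import Counter, defaultdict

    def per_query_metrics(log):
        """log: iterable of (query, clicked_rank); rank None means no click."""
        by_query = defaultdict(list)
        for query, rank in log:
            by_query[query].append(rank)
        report = {}
        for query, ranks in by_query.items():
            clicks = [r for r in ranks if r is not None]
            top_rank, top_count = (Counter(clicks).most_common(1)[0]
                                   if clicks else (None, 0))
            report[query] = {
                "mrr": sum(1.0 / r for r in clicks) / len(ranks),
                "clickthrough_pct": 100.0 * len(clicks) / len(ranks),
                "most_clicked_rank": top_rank,  # #1 is good
                "pct_on_most_clicked": (100.0 * top_count / len(clicks)
                                        if clicks else 0.0),
            }
        return report

    # Known-item queries concentrate clicks on #1; category queries spread out.
    log = [("ratatouille", 1), ("ratatouille", 1), ("ratatouille", 1),
           ("comedy", 2), ("comedy", 5), ("comedy", 1), ("comedy", None)]
    for q, m in per_query_metrics(log).items():
        print(q, m)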

  15. Success Looks Like … • MRR consistently over 0.5 • 85% of clicks on #1

  16. Questions?
