
Using your Users’ Taxonomy: Improving Search with Analytics

Presentation Transcript


  1. Using your Users’ Taxonomy: Improving Search with Analytics • John Ferrara, Information Architect, Vanguard

  2. Our story – Chapter 1 • In fall of 2007, we were transitioning to a new search engine technology • Information architects participated in product selection and visioning • IA was less involved once implementation started It’s beautiful! Ahh! Ooo!

  3. Chapter 2: First signs of a problem • Project manager noticed some searches didn’t work that well • Asked for help evaluating the quality of results • I tried a few searches and agreed that they seemed to be underperforming

  4. Chapter 3: Bringing it to the developers (Information Architect & Dev Team Meeting) • Told the development team that results seemed off, but they were skeptical. “Search seems to have a few problems…” “Where’s the proof?” “You can’t tell for sure.” “Nah.”

  5. Stage 1 – Blind fury (Information Architect & Dev Team Meeting) “DO NOT QUESTION THE INFORMATION ARCHITECT!!”

  6. Stage 2 – Getting over yourself Wait, they have a point…

  7. Unsound method for evaluation • The tested searches came from our formal taxonomy • Users might not describe things the same way • Users might not be interested in the same things • All anecdotal, no metrics • The handful of searches I tried didn’t do well • Thousands of different searches are submitted each day • Provided no basis for comparison • By what standard do we measure good or bad performance? • How will we know when it’s good enough?

  8. Chapter 4: Recognizing an opportunity • We have: • The most popular searches (our users’ own taxonomy) • The legacy search engine in production • The new search engine running in dev • Excel • All we need is a method

  9. Developed 2 testing methods • Relevancy: How reliably the search engine returns the best matches first. • Quick & easy • Limited insight • Precision: The proportion of relevant and irrelevant results clustered at the top of the list. • Longer & more difficult • Robust insight Both use the users’ taxonomy

  10. Relevancy test, step 1 • Go to the most common queries report • Skip any phrase where: • There’s more than one best target • There is no relevant content on the site • You’re not sure what the user is trying to find • Keep the rest • Try to get enough that the results will be quantitatively significant

  11. For example… • There’s more than one best target: “Registrar” could refer either to the University registrar or the Law School registrar, which have different pages. Neither one is more plausible than the other. • There is no relevant content on the site: “Football” has a single clear best target, but it’s hosted on a separate site that’s not indexed in the search engine. This is a problem, but it’s not the fault of the engine. • You’re not sure what the user is trying to find: “Parking” is a very common search, but it’s vague. It could refer to student parking, event parking, parking registration, visitor parking, or parking tickets.
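
As a rough illustration of step 1, the sketch below loads an export of the most-common-queries report and adds an empty "keep" column for the manual skip/keep judgments described above. The file name and column names (top_queries.csv, query, count) are assumptions for the example, not part of the presentation.

```python
# Hypothetical sketch: prepare the most-common-queries report for manual review.
# Column names and file names are assumptions, not from the talk.
import csv

with open("top_queries.csv", newline="") as src, \
     open("relevancy_candidates.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)  # expects columns: query, count
    writer = csv.DictWriter(dst, fieldnames=["query", "count", "keep"])
    writer.writeheader()
    for row in reader:
        # "keep" is filled in by hand: skip ambiguous targets, off-site
        # content, and phrases whose intent is unclear.
        writer.writerow({"query": row["query"], "count": row["count"], "keep": ""})
```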

  12. Apparent intention (awfully important) • Example search: “campus map” • Your judgment of the user’s intention impacts results. • Actual intention: What the user really had in mind; can’t get this from search logs • Apparent intention: How a reasonable person would interpret a search phrase; search should be held to the human standard, but cannot be expected to do any better • When in doubt, skip it (there’s no shortage of search phrases). You only want to keep phrases where you’re very confident of the user’s intended meaning

  13. Relevancy test, step 2 • Put the narrowed list of search phrases into a spreadsheet

  14. Relevancy test, step 2 • Put the narrowed list of search phrases into a spreadsheet • Add the title of the best target

  15. Relevancy test, step 2 • Put the narrowed list of search phrases into a spreadsheet • Add the title of the best target • Add the URL of the best target

  16. Relevancy test, step 3 • Search for the users’ phrases

  17. Relevancy test, step 3 • Here’s the best target:

  18. Relevancy test, step 3 • Here’s where it is in the search results: #1

  19. Relevancy test, step 3 • Not all phrases may work that well

  20. Relevancy test, step 3 • Here’s the best target:

  21. Relevancy test, step 3 • And here are the top results:

  22. Relevancy test, step 3 • Here’s where the best target was: #17

  23. Relevancy test, step 3 • Record each target’s distance from the top of the list
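
One way to script step 3 is sketched below: for each kept phrase, find the best target’s 1-based position in the result list. The site_search() function is a stub standing in for whatever API or export your engine provides, and the example phrases and URLs are invented; neither comes from the presentation.

```python
# Illustrative sketch: record each best target's distance from the top.
from typing import List, Optional

def site_search(phrase: str) -> List[str]:
    """Stub for the real engine; should return an ordered list of result URLs."""
    return []

def best_target_rank(result_urls: List[str], target_url: str) -> Optional[int]:
    """1-based position of the best target in the results, or None if it never appears."""
    for rank, url in enumerate(result_urls, start=1):
        if url.rstrip("/") == target_url.rstrip("/"):
            return rank
    return None

# (phrase, best-target URL) pairs kept in step 2; example values only
test_rows = [
    ("campus map", "https://www.example.edu/visit/campus-map"),
    ("tuition",    "https://www.example.edu/admissions/tuition"),
]

for phrase, target in test_rows:
    rank = best_target_rank(site_search(phrase), target)
    print(f"{phrase!r}: best target at position {rank}")
```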

  24. Relevancy test, step 4 • Go to the results tab • Mean: Average distance from the top • Median: Less sensitive to outliers, but not useful once at least half are ranked #1 • Count - Below 1st: How often is the best target something other than 1st? • Count – Below 5th: How often is the best target outside the critical area? • Count – Below 10th: How often is the best target beyond the first page? For all numbers, the lower the better
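
A minimal sketch of the results-tab arithmetic, assuming ranks holds the 1-based positions recorded in step 3; the numbers shown are invented for illustration.

```python
# Summary metrics for the relevancy test; lower is better for every figure.
from statistics import mean, median

ranks = [1, 1, 2, 17, 5, 1, 9]  # example data only

print("Mean rank:  ", round(mean(ranks), 2))
print("Median rank:", median(ranks))
print("Below 1st:  ", sum(1 for r in ranks if r > 1))   # best target not first
print("Below 5th:  ", sum(1 for r in ranks if r > 5))   # outside the critical area
print("Below 10th: ", sum(1 for r in ranks if r > 10))  # beyond the first page
```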

  25. Shortcomings of relevancy testing • Has to skip some phrasings • Looking for the “best target” ignores the quality of other results • Tells a narrow story of search performance Precision testing closes these gaps.

  26. What is precision? • Precision = number of relevant results ÷ total number of results • In other words, how many of the results that the search engine returns are of good quality? For example, if 3 of the top 5 results are relevant, precision is 3/5 = 0.6 • Users don’t look at all of the results, so we limit the test to the top few

  27. Precision test, step 1 • Again, work from the users’ taxonomy • This time we don’t eliminate any phrasings

  28. Precision test, step 1 • Transpose the phrases directly to the spreadsheet

  29. Precision test, step 2 • Search for the users’ phrases

  30. Evaluate relevance on a scale • R – Relevant. Based on the information the user provided, the page's ranking is completely relevant.

  31. Evaluate relevance on a scale • R - Relevant:

  32. Evaluate relevance on a scale • R – Relevant. Based on the information the user provided, the page's ranking is completely relevant. • N – Near. The page is not a perfect match, but it’s clearly reasonable for it to be ranked highly.

  33. Evaluate relevance on a scale • N - Near:

  34. Evaluate relevance on a scale • R – Relevant. Based on the information the user provided, the page's ranking is completely relevant. • N – Near. The page is not a perfect match, but it’s clearly reasonable for it to be ranked highly. • M - Misplaced: You can see why the search engine returned it, but it should not be ranked highly.

  35. Evaluate relevance on a scale • M - Misplaced:

  36. Evaluate relevance on a scale • R – Relevant: Based on the information the user provided, the page's ranking is completely relevant. • N – Near: The page is not a perfect match, but it’s clearly reasonable for it to be ranked highly. • M - Misplaced: You can see why the search engine returned it, but it should not be ranked highly. • I – Irrelevant: The result has no apparent relationship to the user’s search.

  37. Evaluate relevance on a scale • I - Irrelevant:

  38. Use a mnemonic R – Relevant N – Near M – Misplaced I – Irrelevant R – Ralph N – Nader M – Makes I – Igloos Ralph Nader image by Don LaVange Igloo image by NOAA

  39. Precision test, step 3 • Record the ratings of the top 5 results from each search in the spreadsheet

  40. Calculating precision • Precision depends upon what you count as permissible • Our method specifies three parallel standards: • Strict – Only counts completely relevant results • Loose – Counts relevant and near results • Permissive – Counts relevant, near, and misplaced results
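
The three standards amount to counting different rating sets as hits. The sketch below assumes each query’s top five results were rated R/N/M/I as described above; the function name and example ratings are illustrative, not from the presentation.

```python
# Precision at 5 under the strict, loose, and permissive standards.

def precision_at_5(ratings, counted):
    """Share of the top-5 ratings that fall in the `counted` set."""
    top5 = ratings[:5]
    return sum(1 for r in top5 if r in counted) / len(top5)

ratings_by_query = {            # example data only
    "campus map": ["R", "R", "N", "M", "I"],
    "parking":    ["N", "M", "I", "I", "R"],
}

for query, ratings in ratings_by_query.items():
    strict     = precision_at_5(ratings, {"R"})
    loose      = precision_at_5(ratings, {"R", "N"})
    permissive = precision_at_5(ratings, {"R", "N", "M"})
    print(f"{query}: strict={strict:.2f} loose={loose:.2f} permissive={permissive:.2f}")
```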

  41. Precision test, step 4 • Go to the results tab • For these numbers, the higher the better

  42. Presenting performance metrics

  43. Chapter 5: Bringing back the data (Information Architect & Dev Team Meeting) • The case for change was more compelling because people could see the data and trust it. “Ah! Now I see the problem.” “We need to fix this.”

  44. Actions for remediation

  45. Tracking improvement

  46. Evaluating the evaluations • Relevancy testing: Quick & easy; provides actionable metrics; has to skip some phrasings; focused on a “best target”, ignoring the quality of other results; tells a narrow story of search performance • Precision testing: Longer & more difficult; provides actionable metrics; doesn’t skip any phrasings; factors in results that are close enough, making it more realistic; tells a robust story of search performance

  47. Questions? I’m all ears!
