
NLP Support for Faceted Navigation in Scholarly Collections

Learn how the Castanet algorithm can be applied to create facet hierarchies (semi-)automatically for scholarly collections. Explore the advantages of faceted navigation and how it improves browsing and searching in digital libraries.


Presentation Transcript


  1. NLP Support for Faceted Navigation in Scholarly Collections • ACL’09 Workshop on NLP for Scholarly Collections • Marti Hearst and Emilia Stoica • Presented by Preslav Nakov

  2. Motivation • Faceted navigation is now standard for “vertical” content collections • e-commerce stores • image collections • It is also being used for digital libraries • WorldCat, NCSU, Chicago • Problem: the categories within the SUBJECT facet need to be richer. • How can these facets be created automatically? • Our solution: Castanet applied to scholarly collections

  3. Outline • Definition of faceted metadata • Examples of faceted navigation in use • Castanet: an algorithm for (semi) automatic creation of facet hierarchies • Application of Castanet to a scholarly collection

  4. The Idea of Facets • Facets are a way of labeling data • A kind of Metadata (data about data) • Can be thought of as properties of items • Facets vs. Categories • Items are placed INTO a category system • Multiple facet labels are ASSIGNED TO items

  5. The Idea of Facets • Create INDEPENDENT categories (facets) • Each facet has labels (sometimes arranged in a hierarchy) • Assign labels from the facets to every item • Example: recipe collection • Ingredient: Chicken, Bell Pepper • Cooking Method: Stir-fry, Curry • Course: Main Course • Cuisine: Thai

  6. The Idea of Facets • Break out all the important concepts into their own facets • Sometimes the facets are hierarchical • Assign labels to items from any level of the hierarchy • Preparation Method: Fry, Saute, Boil, Bake, Broil, Freeze • Desserts: Cakes, Cookies, Dairy (Ice Cream, Sorbet, Flan) • Fruits: Cherries, Berries (Blueberries, Strawberries), Bananas, Pineapple

  7. Using Facets • Now there are multiple ways to get to each item • Preparation Method: Fry, Saute, Boil, Bake, Broil, Freeze • Desserts: Cakes, Cookies, Dairy (Ice Cream, Sherbet, Flan) • Fruits: Cherries, Berries (Blueberries, Strawberries), Bananas, Pineapple • Labels of one item: Fruit > Pineapple; Dessert > Cake; Preparation > Bake • Labels of another item: Dessert > Dairy > Sherbet; Fruit > Berries > Strawberries; Preparation > Freeze
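The idea that each item can be reached by several routes can be sketched in a few lines of Python. This is only an illustration, not the Flamenco implementation: the item names (`pineapple cake`, `strawberry sherbet`) and the functions `matches` and `browse` are hypothetical, while the label paths are copied from the slide above.

```python
from typing import Dict, List, Tuple

# A facet label is a path from the facet root down to the assigned node.
Label = Tuple[str, ...]

# Hypothetical item names; the label paths are the ones listed on the slide.
ITEMS: Dict[str, List[Label]] = {
    "pineapple cake": [
        ("Fruit", "Pineapple"),
        ("Dessert", "Cake"),
        ("Preparation", "Bake"),
    ],
    "strawberry sherbet": [
        ("Dessert", "Dairy", "Sherbet"),
        ("Fruit", "Berries", "Strawberries"),
        ("Preparation", "Freeze"),
    ],
}


def matches(labels: List[Label], selection: Label) -> bool:
    """True if some label equals the selection or lies below it in the hierarchy."""
    return any(label[: len(selection)] == selection for label in labels)


def browse(items: Dict[str, List[Label]], selections: List[Label]) -> List[str]:
    """Return the names of items that satisfy every selected facet path."""
    return [
        name
        for name, labels in items.items()
        if all(matches(labels, sel) for sel in selections)
    ]


# Selecting a high-level node also matches items labeled deeper in that facet.
print(browse(ITEMS, [("Fruit",)]))
# Selections from different facets are combined with AND.
print(browse(ITEMS, [("Dessert", "Dairy"), ("Preparation", "Freeze")]))
```

Storing each label as a full path makes "select any level of the hierarchy" a simple prefix test.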

  8. Faceted navigation’s advantages: • Integrate browsing and searching seamlessly • Support exploration and learning • Avoid dead-ends, “pogo’ing”, and “lostness”

  9. Uses of Faceted Navigation in Online Digital Libraries

  10. WorldCat

  11. WorldCat

  12. U Chicago

  13. U Chicago

  14. Advantages of Facets • Can’t end up with empty result sets • (except with keyword search) • Helps avoid feelings of being lost. • Easier to explore the collection. • Helps users infer what kinds of things are in the collection. • Evokes a feeling of “browsing the shelves” • Is preferred over standard search for collection browsing in usability studies. • (Interface must be designed properly)
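One common way to guarantee non-empty result sets is to offer only those refinements that still match at least one item in the current results. The sketch below assumes that mechanism; the slides do not say how Flamenco implements it. The mini-collection and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Label = Tuple[str, ...]

# Hypothetical mini-collection, for illustration only.
ITEMS: Dict[str, List[Label]] = {
    "item A": [("Fruit", "Pineapple"), ("Preparation", "Bake")],
    "item B": [("Fruit", "Berries", "Strawberries"), ("Preparation", "Freeze")],
    "item C": [("Fruit", "Berries", "Blueberries"), ("Preparation", "Bake")],
}


def under(label: Label, selection: Label) -> bool:
    """True if the label equals the selection or lies below it."""
    return label[: len(selection)] == selection


def current_results(selections: List[Label]) -> List[str]:
    """Items that satisfy every facet selection made so far."""
    return [
        name
        for name, labels in ITEMS.items()
        if all(any(under(label, sel) for label in labels) for sel in selections)
    ]


def refinement_counts(selections: List[Label]) -> Counter:
    """Count matching items per facet node; nodes with zero items never appear."""
    counts: Counter = Counter()
    for name in current_results(selections):
        nodes = set()
        for label in ITEMS[name]:
            # An item counts toward every node along its label path.
            for depth in range(1, len(label) + 1):
                nodes.add(label[:depth])
        counts.update(nodes)
    return counts


counts = refinement_counts([("Preparation", "Bake")])
for node, n in sorted(counts.items()):
    print(" > ".join(node), n)
```

Because `Preparation > Freeze` matches no baked item, it is simply absent from the offered refinements, so the user cannot click into an empty result set.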

  15. Limitations of Facets • Do not naturally capture MAIN THEMES • Facets do not show RELATIONS explicitly • Example photo labels: Aquamarine, Red, Orange (colors); Door, Doorway, Wall (objects) • Which color is associated with which object? • Photo by J. Hearst, jhearst.typepad.com

  16. Usability Studies (using Flamenco) • Usability studies done on 3 collections: • Recipes (epicurious): 13,000 items • Architecture Images: 40,000 items • Fine Arts Images: 35,000 items • Conclusions: • Users like and are successful with the dynamic faceted hierarchical metadata, especially for browsing tasks • Very positive results, in contrast with studies on earlier iterations.

  17. How to Create Facet Hierarchies? Our Approach: Castanet

  18. Biomedical Journal Titles (3275 Titles)

  19. Castanet Output (Bio titles)

  20. Castanet Output (Bio titles)

  21. Castanet Output (LibraryThing tags)

  22. Castanet Output (LibraryThing Tags)

  23. Castanet Output (LibraryThing Tags)

  24. Our Approach: Leverage the structure of WordNet

  25. Our Approach • Leverage the structure of WordNet • Pipeline: Documents > Select terms > Get hypernym paths (from WordNet) > Build tree > Compress tree > Divide into facets

  26. Step 1: Select Terms • Select well-distributed terms from the collection • Example terms: red, blue
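The slide does not define "well distributed". One plausible reading, assumed here, is to keep terms that occur in at least a minimum number of documents but not in nearly all of them. The thresholds `min_docs` and `max_fraction`, the tokenizer, and the sample documents are all hypothetical.

```python
import re
from collections import Counter
from typing import List


def select_terms(documents: List[str], min_docs: int = 2,
                 max_fraction: float = 0.8) -> List[str]:
    """Keep terms whose document frequency lies between the two thresholds."""
    doc_freq: Counter = Counter()
    for text in documents:
        # Count each term once per document (document frequency, not raw count).
        doc_freq.update(set(re.findall(r"[a-z]+", text.lower())))
    upper = max_fraction * len(documents)
    return sorted(t for t, n in doc_freq.items() if min_docs <= n <= upper)


docs = [
    "red door in the wall",
    "blue door",
    "red wall and blue sky",
    "green doorway",
]
print(select_terms(docs))
```

On this toy input the terms that appear in exactly two of the four documents survive, while one-off words are dropped.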

  27. Step 2: Get Hypernym Paths • red: abstraction > property > visual property > color > chromatic color > red, redness > red • blue: abstraction > property > visual property > color > chromatic color > blue, blueness > blue
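A hypernym path is found by following "is a kind of" links upward until no parent remains. The sketch below uses a hand-copied fragment of the chain shown on the slide rather than a real WordNet lookup; the dictionary `HYPERNYM` and the function `hypernym_path` are hypothetical stand-ins.

```python
from typing import Dict, List

# Child -> parent links copied from the slide. A real system would query
# WordNet instead of this hypothetical hand-made dictionary.
HYPERNYM: Dict[str, str] = {
    "red": "red, redness",
    "blue": "blue, blueness",
    "red, redness": "chromatic color",
    "blue, blueness": "chromatic color",
    "chromatic color": "color",
    "color": "visual property",
    "visual property": "property",
    "property": "abstraction",
}


def hypernym_path(term: str) -> List[str]:
    """Walk parent links up to the root and return the path root-first."""
    path = [term]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path[::-1]


print(" > ".join(hypernym_path("red")))
print(" > ".join(hypernym_path("blue")))
```

This toy map gives every node exactly one parent. Real WordNet terms can have several senses and several paths, which is the disambiguation problem discussed later in the deck.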

  28. Step 3: Build Tree • The paths for red and blue are merged along their shared nodes: abstraction > property > visual property > color > chromatic color • Below chromatic color the tree branches: red, redness > red • blue, blueness > blue
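Merging paths that share a prefix is the same operation as inserting strings into a trie. The following sketch assumes the tree is a nested dictionary; `build_tree` and `render` are hypothetical helper names, and the two paths are the ones from the red/blue example.

```python
from typing import Dict, List

Tree = Dict[str, "Tree"]

RED_PATH = ["abstraction", "property", "visual property", "color",
            "chromatic color", "red, redness", "red"]
BLUE_PATH = ["abstraction", "property", "visual property", "color",
             "chromatic color", "blue, blueness", "blue"]


def build_tree(paths: List[List[str]]) -> Tree:
    """Merge root-first paths into one nested-dict tree (shared prefixes unify)."""
    root: Tree = {}
    for path in paths:
        node = root
        for name in path:
            # Reuse the existing child if present, otherwise create it.
            node = node.setdefault(name, {})
    return root


def render(tree: Tree, depth: int = 0) -> List[str]:
    """Return an indented outline of the tree, one line per node."""
    lines: List[str] = []
    for name, children in tree.items():
        lines.append("  " * depth + name)
        lines.extend(render(children, depth + 1))
    return lines


tree = build_tree([RED_PATH, BLUE_PATH])
print("\n".join(render(tree)))
```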

  29. Step 4: Compress Tree • Before: color > chromatic color > (red, redness > red • blue, blueness > blue • green, greenness > green) • After: color > chromatic color > (red • blue • green)

  30. Step 4: Compress Tree (cont.) • Before: color > chromatic color > (red • blue • green) • After: color > (red • blue • green)

  31. Step 5: Divide into Facets

  32. Disambiguation • Ambiguity in: • Word senses • Paths up the hypernym tree • 2 paths for same word: • Sense 1 for word “tuna”: organism, being => plant, flora => vascular plant => succulent => cactus => tuna • Sense 2 for word “tuna”: organism, being => fish => food fish => tuna • 2 paths for same sense (Sense 2): • organism, being => fish => food fish => tuna • organism, being => fish => bony fish => spiny-finned fish => percoid fish => tuna

  33. How to Select the Right Senses and Paths? • First: build core tree • (1) Create paths for words with only one sense • (2) Use Domains • WordNet has 212 Domains • medicine, mathematics, biology, chemistry, linguistics, soccer, etc. • Automatically scan the collection to see which domains apply • The user selects which of the suggested domains to use, or may add their own • Paths for terms that match the selected domains are added to the core tree • Then: add remaining terms to the core tree.

  34. Using Domains • dip glosses: • Sense 1: A depression in an otherwise level surface • Sense 2: The angle that a magnetic needle makes with the horizon • Sense 3: Tasty mixture into which bite-size foods are dipped • dip hypernyms: • Sense 1: solid => concave shape => depression • Sense 2: shape, form => space => angle • Sense 3: food => ingredient, fixings => flavorer • Given domain “food”, choose sense 3
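The core-tree step can be sketched as a small decision function: a word with a single sense is accepted immediately, and an ambiguous word is accepted only if exactly one of its senses matches a selected domain. Matching a domain against the names on the hypernym path is an assumption made for this sketch, as the slides do not specify the matching mechanism. The data structure and the name `choose_sense` are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Senses of "dip" with hypernym paths as read from the slide's three columns.
DIP_SENSES: List[Dict] = [
    {"sense": 1, "path": ["solid", "concave shape", "depression"]},
    {"sense": 2, "path": ["shape, form", "space", "angle"]},
    {"sense": 3, "path": ["food", "ingredient, fixings", "flavorer"]},
]


def choose_sense(senses: List[Dict], domains: Set[str]) -> Optional[Dict]:
    """Pick the sense to add to the core tree, or None to defer the term."""
    if len(senses) == 1:
        # Step (1): unambiguous words go straight into the core tree.
        return senses[0]
    # Step (2): keep senses whose path mentions one of the selected domains.
    hits = [s for s in senses if any(d in s["path"] for d in domains)]
    # Accept only a unique match; otherwise leave the term for the later pass.
    return hits[0] if len(hits) == 1 else None


chosen = choose_sense(DIP_SENSES, {"food"})
print(chosen["sense"] if chosen else "deferred")
```

Returning `None` for unmatched terms mirrors the "Then: add remaining terms to the core tree" step, which this sketch leaves out.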

  35. Castanet Evaluation

  36. Castanet Evaluation • This is a tool for information architects, so information architects did the evaluation • We compared output on • Recipes • Biomedical journal titles • We compared to two state-of-the-art algorithms • LDA (Blei et al. ’04) • Subsumption (Sanderson & Croft ’99)

  37. Subsumption Output (Bio titles)

  38. Subsumption Output (Bio titles)

  39. LDA Output (Bio titles)

  40. LDA Output (Bio titles)

  41. Evaluation Method • Information architects assessed the category systems • For each of 2 systems’ output: • Examined and commented on top-level • Examined and commented on two sub-levels • Then commented on overall properties • Meaningful? • Systematic? • Likely to use in your work?

  42. Evaluation Results (Bio titles) • 15 participants, all PubMed users • Results for “Would you use this system in your work?” • Answering “Yes in some cases” or “Yes definitely”: • Pine (Castanet): 11/15 • Oak (LDA): 1/7 • Birch (Subsumption): 1/8

  43. Evaluation Results (recipes) • Results on recipes collection for “Would you use this system in your work?” • Yes in some cases or yes definitely: • Pine (Castanet): 29/34 • Oak (LDA): 0/18 • Birch (Subsumption): 6/16 • Results on quality of categories:
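The denominators differ across systems, apparently because each participant judged two systems (7 + 8 = 15 for the bio titles, 18 + 16 = 34 for recipes), so percentages make the comparison easier to read. The short script below only converts the counts reported on the two results slides. The names `RESULTS` and `percent` are hypothetical.

```python
from typing import Dict, Tuple

# (positive answers, raters) exactly as reported on the two results slides.
RESULTS: Dict[str, Dict[str, Tuple[int, int]]] = {
    "Bio titles": {"Castanet": (11, 15), "LDA": (1, 7), "Subsumption": (1, 8)},
    "Recipes": {"Castanet": (29, 34), "LDA": (0, 18), "Subsumption": (6, 16)},
}


def percent(yes: int, total: int) -> float:
    """Share of positive answers, in percent, rounded to one decimal."""
    return round(100 * yes / total, 1)


for collection, systems in RESULTS.items():
    for system, (yes, total) in systems.items():
        print(f"{collection:10s} {system:12s} {yes}/{total} = {percent(yes, total)}%")
```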

  44. Conclusions • Flexible application of hierarchical faceted metadata is a proven approach for navigating scholarly collections. • Midway in complexity between simple hierarchies and deep knowledge representation. • Currently in use in digital library sites, but the SUBJECT categories need more work. • Algorithms are needed to help create faceted metadata structures • Our WordNet-based algorithm, while not perfect, provides a good starting point for scholarly collections

  45. For more information: flamenco.berkeley.edu • Thank you! • Preslav Nakov, Marti Hearst & Emilia Stoica
