1 / 22

Haystack: Per-User Information Environment 1999 Conference on Information and Knowledge Management

Haystack: Per-User Information Environment 1999 Conference on Information and Knowledge Management. Eytan Adar et al Presented by Xiao Hu CS491CXZ . Outline. What is Haystack Haystack and IR Data Model System Architecture Information Gathering Problems. What is Haystack?.

imelda
Download Presentation

Haystack: Per-User Information Environment 1999 Conference on Information and Knowledge Management

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Haystack: Per-User Information Environment1999 Conference on Information and Knowledge Management Eytan Adar et al Presented by Xiao Hu CS491CXZ

  2. Outline • What is Haystack • Haystack and IR • Data Model • System Architecture • Information Gathering • Problems

  3. What is Haystack? • A software for organizing and retrieving personal information • Totally personalized • One user, one Haystack • Personal digital bookshelf • A prototype

  4. Haystack and IR • Haystack • personal collection • user’s satisfaction • particular user • focus on searching • specific to one user • can observe user’s implicit information needs • IR • large corpus • precision-recall metric • “expert”relevance judge • IF (collaborative filtering) • preference for similar users • require explicit user input All Users / Groups ofUsers A Single User

  5. Haystack Functionality • Automated data gathering • Information maximization • gathering as much information as possible • Customized information collection • Adaptation to individual query needs A IR system that adapts to its user ?

  6. General Data Model • Accommodate all information • arbitrary pieces of data • metadata • links between them • Facilitate data growth • new data • user’s annotation • user’s information behavior • A semantic network • full text searching • bibliographic info. Searching • associate searching • adapt to the user

  7. General Data Model (illustration)

  8. General Data Model (summary) • Inheritance hierarchy • Straw  needle: primitive information •  bale: collection of related straws •  tie: relationship b/w straws • Metadata representation • Recursive metadata annotation • Interface Haystack to external “services” • Index agents controlling external devices

  9. System Architecture • Database, searching engine • an adapter as interface to various engines • Core Haystack system (root server) • data model implementation • operation-system-like services • Client level services • user interface • proxy services • data augmenting services • annotation, querying, browsing • observing interaction with external information resources • modifying data, adding links,…

  10. System Architecture (illustration)

  11. Indexing in Haystack • Straws generate textual information • IR system stores such information • Info. from each straw will be regarded as one unit of indexing • allows to associate pieces of information • Incrementally indexing • whenever a series of changes happen

  12. Outline • What is Haystack • Haystack and IR • Data Model • System Architecture • Data Gathering • Problems

  13. Information gathering • User’s explicit annotation • User’s behaviors observed by the system • interaction with outside world (www, emails) • interaction with Haystack • building query paths – adapting to the user’s style • Analyzing corpus already in Haystack • indexing • metadata extraction • adding links between documents

  14. User’s explicit annotation • Probably the best information source • Might not be realistic • Nicer interface to encourage users • HCI studies

  15. Observers • Proxy services • WWW, email proxies • Recording webpages the user sees • Tracing the path of browsing • Recording visiting time • …… • Query observer • Using query interactions to mold the data model to the user • Plug in new data • Adding links b/w nodes • Facilitating retrieval

  16. Query Observer • Integrates queries into the data model • Query straw • a bale, containing query text, rank of docs, …. • attached nodes of matched documents • annotations from user’s choices  relevance feedback • Query path • a chain of query straws in a single searching • good for future retrievals: • presenting similar query terms • adapting relevance of documents by reindexing documents with text of the query path  tuned to a particular user

  17. Information gathering • User’s explicit annotation • User’s behaviors observed by the system • interaction with outside world (www, emails) • interaction with Haystack • building query paths – adapting to the user’s style • Analyzing corpus already in Haystack • indexing • metadata extraction • adding links between documents Data augmenting clients Data driven clients

  18. Data augmenting clients

  19. Data augmenting clients • digesting existing information, generating new information • Independent but cooperating • Fetch clients • Type inference clients • Extractor clients • Field finder clients • Triggered by events: data changes in Haystack

  20. Data augmenting clients: example

  21. Summary • A prototype of a personalized information organization and retrieval system • Relationship with IR • General Data Model • graph, straws, … • System Architecture • three layers: DB, core, clients • Data Gathering • three approaches

  22. Problems • Information maximization assumption • the more, the better? • for one user, but has to be prepared for all users • what are useful clues? • Efficiency issues • dynamic indexing • a slow system (512M memory, 2G disk…) • Today’s haystack project • semantic web, RDF, ontology, user interface …

More Related