
(Tacitly) Collaborative Question Answering Utilizing Web Trails






Presentation Transcript


  1. (Tacitly) Collaborative Question Answering Utilizing Web Trails
  Tomek Strzalkowski & Sharon G. Small
  ILS Institute, SUNY Albany
  LAANCOR, May 22, 2010
  LREC QA workshop

  2. Collaboration
  • Working together
  • Efficiency, sharing vs. groupthink
  • Tacit collaboration
  • Professional analysts → COLLANE system
  • Information sharing
  • Why and when
  • Collaborative filtering
  • Sharing insight and experience

  3. Outline
  • Introduction
  • Collaborative Knowledge Layer
  • Web Trails
  • Exploratory Episodes
  • Experiments
  • Data Collection
  • Results
  • Collaborative Sharing
  • Conclusions
  • Future Research

  4. Sharing on the Internet?
  • Internet users leave behind trails of their work
  • What they asked
  • What links they tried
  • What worked and what didn’t
  • Capture this Exploratory Knowledge
  • Utilize this knowledge for subsequent users
  • Tacitly enables Collaborative Question Answering
  • Improved efficiency and accuracy

  5. Collaborative Knowledge Layer
  • Captures exploration paths (Web Trails)
  • Supplies meaning to the underlying data
  • May clarify/alter originally intended meaning
  • Hypothesis: CKL may be utilized to
  • Improve interactive QA
  • Support tacit collaboration
  • Current experiments
  • Capturing web exploration trails
  • Computing degree of trail overlap

  6. Collaborative Space

  7. Web Trails
  • Consist of individual exploratory moves
  • Entering a search query
  • Typing text into an input box
  • Responses from the browser
  • Offers accepted or ignored
  • Files saved
  • Items viewed
  • Links clicked through, etc.
  • Returns to the search box
  • Contain optimal paths leading to specific outcomes
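The exploratory moves listed above can be represented as a simple ordered event log. A minimal sketch, with a hypothetical `Move` type and move kinds (the talk's actual event taxonomy is richer):

```python
from dataclasses import dataclass

# Hypothetical move record; "kind" values are illustrative, not the
# talk's exact taxonomy.
@dataclass(frozen=True)
class Move:
    kind: str      # e.g. "query", "click", "copy", "ignore"
    target: str    # query text, URL, or data-item id

# A web trail is simply the ordered sequence of one user's moves.
trail = [
    Move("query", "artificial reefs materials"),
    Move("click", "doc_12"),
    Move("copy", "doc_12"),
    Move("query", "tire reef Florida"),
    Move("ignore", "doc_40"),
]

def queries(trail):
    """Return the search queries issued along a trail, in order."""
    return [m.target for m in trail if m.kind == "query"]
```

Keeping the trail as a flat ordered list preserves the path structure needed later for overlap comparison.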

  8. Exploratory Episodes
  • Discovered overlapping subsequences of web trails
  • Common portions of exploratory web trails from multiple network users
  • May begin with a single user’s web trail
  • Shared with new users who appear to be pursuing a compatible task
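One simple way to extract a shared Exploratory Episode from two trails is a longest-common-subsequence computation over move identifiers. A sketch (the talk does not specify its alignment algorithm, so LCS here is an assumption):

```python
def common_episode(trail_a, trail_b):
    """Longest common subsequence of two trails (lists of move ids):
    one plausible way to extract a shared Exploratory Episode."""
    m, n = len(trail_a), len(trail_b)
    # dp[i][j] = LCS of trail_a[:i] and trail_b[:j]
    dp = [[[] for _ in range(n + 1)] for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if trail_a[i - 1] == trail_b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + [trail_a[i - 1]]
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1], key=len)
    return dp[m][n]
```

Using the slide 9 example, the A-B-Q-D-G episode emerges as the common subsequence of the two users' trails.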

  9. [Figure: trail graph over nodes A, B, C, D, E, F, G, K, M, Q, T; the A-B-Q-D-G Exploratory Episode helps a new user get from M to G]

  10. Experiment
  • Evaluate degree of web trail overlap
  • 11 research problems defined
  • Generated 100 short queries for each research problem description
  • Used Google to retrieve the top 500 results for each query
  • ~500MB per topic
  • Filtered out duplicates, commercial sites, offensive content, etc.
  • 2GB corpus of web-mined text
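The duplicate filtering in the corpus build can be sketched as a first-pass exact-duplicate filter by content hash (the talk does not describe its filtering method, so this is only an illustrative assumption; near-duplicate detection would need more):

```python
import hashlib

def dedup(pages):
    """Drop exact-duplicate pages by content hash.

    `pages` is a list of (url, text) pairs; the first occurrence of
    each distinct text is kept.
    """
    seen, kept = set(), []
    for url, text in pages:
        h = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if h not in seen:
            seen.add(h)
            kept.append((url, text))
    return kept
```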

  11. Experiment setup
  • 4-6 analysts per research topic
  • 2.5 hours per topic
  • Utilized two fully functional QA systems
  • HITIQA – analytical QA system developed under the AQUAINT program at SUNY Albany
  • COLLANE – collaborative extension of the HITIQA system developed under the CASE program at SUNY Albany
  • Analyst’s objective
  • Find sufficient information for a 3-page report on the assigned topic

  12. Example topic: artificial reefs
  Many countries are creating artificial reefs near their shores to foster sea life. In Florida a reef made of old tires caused a serious environmental problem. Please write a report on artificial reefs and their effects. Give some reasons as to why artificial reefs are created. Identify those built in the United States and around the world. Describe the types of artificial reefs created, the materials used and the sizes of the structures. Identify the types of man-made reefs that have been successful (success defined as an increase in sea life without negative environmental consequences). Identify those types that have been disasters. Explain the impact an artificial reef has on the environment and ecology. Discuss the EPA’s (Environmental Protection Agency) policy on artificial reefs. Include in your report any additional related information about this topic.

  13. What is COLLANE?
  • A collaborative environment
  • Analysts work in teams
  • Synchronously and asynchronously
  • Information sharing on an as-needed basis
  • An analytic tool
  • Exploits the strengths of collaborative work

  14. Collaborating via COLLANE
  • A team of users works on a task
  • Each user has their own working space
  • A combined Answer Space is created
  • Made out of individual contributions
  • Users interact with the system
  • Via question answering and visual interfaces
  • The system observes and facilitates
  • Shares relevant information found by others → tacit collaboration
  • Users interact with each other
  • Exchange tips and data items via a chat facility → open collaboration

  15. COLLANE/HITIQA user interface

  16. Key Tracked Events
  • Questions asked
  • Data items copied
  • Data items ignored
  • System offers accepted/rejected
  • Displaying text
  • Words searched in the user interface
  • All dialogue between user and system
  • Bringing up the full document source
  • Passages viewed
  • Time spent

  17. Experimental Results
  • Aligned episodes on common data items
  • Only considered user copies as the indicator
  • Used document-level overlap
  • Ignored potential content overlap between different documents
  • Hence a lower bound on episode overlap
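Document-level overlap on copied items can be computed as a Jaccard score over the sets of documents each user copied from. The talk does not give its exact formula, so Jaccard is an assumption here; because equivalent content in different documents is not matched, this is a lower bound on true episode overlap:

```python
def episode_overlap(copied_a, copied_b):
    """Document-level overlap between two episodes, based only on the
    sets of documents each user copied from (Jaccard: |A∩B| / |A∪B|)."""
    a, b = set(copied_a), set(copied_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```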

  18. [figure]

  19. [Figure: Artificial Reefs example; episode pairs A-G and E-H show 60-75% overlap]

  20. Experimental Results
  • 95 Exploratory Episodes
  • EEs grouped by degree of overlap
  • 60% or higher → may be shared?
  • 40% or lower → divergent?
  • Find an overlap threshold that
  • Maximizes information sharing
  • Minimizes rejection
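The threshold search above amounts to splitting episode-pair overlap scores into likely-shareable and divergent groups at a candidate cutoff. A minimal sketch (the function name and the sample scores are illustrative, not the experiment's data):

```python
def sharing_candidates(overlaps, threshold=0.5):
    """Split overlap scores at `threshold`: pairs at or above it are
    candidates for tacit sharing, pairs below it are treated as
    divergent. The talk explores thresholds around 0.4-0.6."""
    share = [o for o in overlaps if o >= threshold]
    diverge = [o for o in overlaps if o < threshold]
    return share, diverge
```

Sweeping `threshold` over the 95 observed episodes is then a matter of comparing how many candidates each cutoff yields against the rejection rate of the resulting offers.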

  21. Some topics appear more suitable for information sharing and tacit collaboration

  22. At a 50% episode overlap threshold, more than half of all episodes are candidates for sharing

  23. Collaborative Sharing Objective
  • Leverage Exploratory Knowledge
  • Use the experience and judgment of users who faced the same or a similar problem
  • Provide superior accuracy and responsiveness to subsequent users
  • Similar to relevance feedback in IR
  • But community-based rather than single-user judgment

  24. Utilizing User B’s trail
  • Offer D4-D7 to User D after the D3 copy
  • Avoids 2 fruitless questions (Q2 & Q4)
  • Finds an extra potentially relevant data point (D7)
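The reuse step on this slide can be sketched as: once a new user copies a data item that also appears on an earlier user's trail, offer the items the earlier user reached after that copy. A hypothetical sketch using the slide's item ids (the matching and ranking in the real system are surely richer):

```python
def offer_after_copy(prior_trail, copied_item):
    """After a new user copies `copied_item`, offer the data items an
    earlier user reached *after* copying the same item on their trail.
    `prior_trail` is the earlier user's ordered list of copied items."""
    if copied_item not in prior_trail:
        return []
    idx = prior_trail.index(copied_item)
    return prior_trail[idx + 1:]
```

In the slide's scenario, User B's trail contains D3 followed by D4-D7, so User D's copy of D3 triggers an offer of D4-D7 and skips the fruitless questions Q2 and Q4.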

  25. Conclusions
  • Users searching for information in a networked environment leave behind exploratory trails that can be captured
  • Exploratory Episodes can be compared for overlap via the data items copied
  • Many users searching for the same or highly related information are likely to follow similar routes through the data
  • When a user’s trail overlaps an EE above a threshold, they may benefit from tacit information sharing

  26. Future Research
  • Evaluate overlap utilizing semantic equivalence of the data items copied
  • Distill Exploratory Episodes into shareable knowledge elements
  • Expand overlap metrics
  • Question similarity
  • Items ignored, etc.
  • Evaluate frequency of acceptance of offered material
  • Varying thresholds
