CSA3080: Adaptive Hypertext Systems I

CSA3080: Adaptive Hypertext Systems I. Lecture 10: Representing Data, Information, and Knowledge II. Dr. Christopher Staff Department of Computer Science & AI University of Malta. Surface-based approaches.

Presentation Transcript


  1. CSA3080: Adaptive Hypertext Systems I. Lecture 10: Representing Data, Information, and Knowledge II. Dr. Christopher Staff, Department of Computer Science & AI, University of Malta. cstaff@cs.um.edu.mt

  2. Surface-based approaches
     • Semantic representations and the ability to reason would give computational systems enormous potential
     • The limitations of the Semantic Web are not yet known
     • But modelling knowledge is certainly expensive (in time, money, and computation)

  3. Surface-based approaches
     • Surface-based approaches attempt to approximate the use of information in its correct context (knowledge), while recognising their limitations
     • E.g., Mulder [Kwok01] uses an extended Boolean IR system to attempt to answer (certain types of) questions
     • Reference
       • Kwok, C.C.T., Etzioni, O., and Weld, D.S., 2001, “Scaling Question-Answering to the Web”, in Proceedings of the 10th International WWW Conference, Hong Kong, May 1-5, 2001. http://citeseer.nj.nec.com/kwok01scaling.html

  4. Surface-based approaches
     • Mulder
       • Turns questions into partial phrases, and then submits a phrase query to an IR system
       • “Does John love Mary?” is turned into the query “John loves Mary”
       • Documents containing the phrase are treated as evidence
     • What are the limitations?
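
The question-to-phrase step above can be sketched as follows. This is only a toy illustration, not Mulder's actual implementation: the helper names `question_to_phrase` and `phrase_evidence` are hypothetical, only questions of the form “Does X verb Y?” are handled, and the verb conjugation is deliberately naive.

```python
import re


def question_to_phrase(question: str) -> str:
    """Turn 'Does <subject> <verb> <rest>?' into '<subject> <verb>s <rest>'.

    Hypothetical helper: any other question form raises ValueError.
    """
    match = re.fullmatch(r"Does (\w+) (\w+) (.+)\?", question.strip())
    if match is None:
        raise ValueError("unsupported question form")
    subject, verb, rest = match.groups()
    # Naive third-person singular; irregular verbs are not handled.
    if verb.endswith(("s", "sh", "ch", "x", "o")):
        verb += "es"
    else:
        verb += "s"
    return f"{subject} {verb} {rest}"


def phrase_evidence(phrase, documents):
    """Return the documents that contain the exact phrase (case-insensitive)."""
    return [doc for doc in documents if phrase.lower() in doc.lower()]


documents = ["Everyone knows John loves Mary.", "Mary loves John."]
phrase = question_to_phrase("Does John love Mary?")
evidence = phrase_evidence(phrase, documents)
```

The sketch also hints at the limitations the slide asks about: a document stating “John adores Mary”, or “John does not love Mary”, is either missed or miscounted by exact phrase matching.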

  5. Surface-based approaches
     • So, given that we are operating in a hypertextual environment, what can we use to:
       • i) identify what is of interest to a user
         • Assumption 1: the user’s interest is represented by a description
         • Assumption 2: the description is a formal statement
       • ii) adapt hyperspace to the user

  6. Surface-based approaches
     • This addresses our immediate concerns
       • i) identify what is of interest to a user
         • so that a user doesn’t have to describe it
         • user modelling is covered in the next lecture
       • ii) adapt hyperspace to the user
         • so that a user doesn’t have to find it
         • adaptation techniques are covered in the last lecture/s

  7. Surface-based approaches
     • At their most fundamental
       • An IR system is a document representation plus an algorithm for matching a query to documents
         • Assume binary weights for terms
       • A hypertext is a collection of nodes and links
       • Both IR and hypertext allow user interaction
     • What else can we say about these structures and about user interaction, with a view to learning about the user?
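
A minimal sketch of “document representation + matching algorithm” with binary term weights might look like this. The function names `index_documents` and `match` are illustrative, and conjunctive (AND) matching is an assumption; the slide does not say which matching rule is intended.

```python
def index_documents(documents):
    """Binary representation: each document becomes the set of terms it contains."""
    return {doc_id: set(text.lower().split()) for doc_id, text in documents.items()}


def match(query, index):
    """Return ids of documents containing every query term (Boolean AND)."""
    terms = set(query.lower().split())
    return sorted(doc_id for doc_id, doc_terms in index.items() if terms <= doc_terms)


documents = {
    "d1": "adaptive hypertext systems",
    "d2": "information retrieval systems",
}
index = index_documents(documents)
```

With binary weights a term is either present or absent, so term frequency plays no part in matching.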

  8. Surface-based approaches
     • IR:
       • User submits a query
       • System returns relevant documents
       • User reads/accesses some of them
       • With relevance feedback, the user can select examples of relevant/non-relevant documents and the IR system will modify the query
       • If we “remember” users, we can remember the terms they used and the documents they viewed
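
One common way to let the system “modify the query” from user-selected examples is Rocchio-style feedback. The slide does not name a method, so the sketch below is an assumption: the function name `rocchio` and the weights `alpha`, `beta`, and `gamma` are illustrative defaults, and queries and documents are represented as term-to-weight dictionaries.

```python
from collections import Counter


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query towards relevant documents and away from non-relevant ones."""
    updated = Counter()
    for term, weight in query.items():
        updated[term] += alpha * weight
    for doc in relevant:
        for term, weight in doc.items():
            updated[term] += beta * weight / len(relevant)
    for doc in non_relevant:
        for term, weight in doc.items():
            updated[term] -= gamma * weight / len(non_relevant)
    # Terms whose weight falls to zero or below are dropped from the query.
    return {term: weight for term, weight in updated.items() if weight > 0}


query = {"jaguar": 1.0}
relevant = [{"jaguar": 1.0, "cat": 1.0}]
non_relevant = [{"jaguar": 1.0, "car": 1.0}]
new_query = rocchio(query, relevant, non_relevant)
```

Here “cat” enters the query because it occurs in a document marked relevant, while “car” is kept out because it occurs only in a document marked non-relevant.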

  9. Surface-based approaches
     • IR:
       • Documents may be relevant to different queries
         • Can we learn anything from this?
       • Some words in a query are used as context (to eliminate documents containing different word senses)
       • Relevance feedback

  10. Surface-based approaches
      • Hypertext
        • What’s a link really?
        • Navigation history
        • Automatic link “typing”
        • Contextualisation of information
        • Is a document necessarily equally relevant to all parents?
        • Is all of a document necessarily relevant to all parents?
        • Can we learn anything about documents which link to the same child/children?
        • Are assumptions made about information by authors along a path?

  11. Surface-based approaches
      • HyperContext
        • If we index multiple representations (interpretations) of the same document, will retrieval effectiveness improve?
        • Can information be added to an interpretation (from its parents) to improve relevance?
        • Can information be removed from an interpretation if it is non-relevant to a parent?

  12. Surface-based approaches
      • This surface-based approach improves retrieval by filtering non-relevant terms out of documents and by adding relevant terms to them
        • reducing the number of false positives
        • increasing the chances of locating a relevant document
      • It does nothing to expose the “meaning” of the data in the document
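
The idea of indexing one representation of a document per parent, with terms added and removed, can be sketched as below. This is a toy illustration and not the HyperContext implementation: the names `interpret` and `search` are hypothetical, and the terms to add and remove are supplied by hand, whereas a real system would have to derive them from the parent documents.

```python
def interpret(doc_terms, add=(), remove=()):
    """Build one interpretation of a document: drop some terms, add others."""
    return (set(doc_terms) - set(remove)) | set(add)


def search(query_terms, interpretations):
    """Return the (parent, document) pairs whose interpretation has every query term."""
    wanted = set(query_terms)
    return sorted(key for key, terms in interpretations.items() if wanted <= terms)


page_terms = {"jaguar", "speed", "engine", "habitat"}
interpretations = {
    # Reached from a page about cars: the wildlife sense is filtered out.
    ("cars", "jaguar-page"): interpret(page_terms, add={"car"}, remove={"habitat"}),
    # Reached from a page about wildlife: the automotive sense is filtered out.
    ("wildlife", "jaguar-page"): interpret(page_terms, add={"animal"}, remove={"engine"}),
}
```

A query for “jaguar engine” now matches only the interpretation reached from the cars page (fewer false positives), and a query for “animal” finds the page even though that term never occurs in it (a better chance of locating a relevant document).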

  13. Other examples
      • WebWatcher [Armstrong95]
        • Adds user’s search terms to links on the path to a relevant document so that future users can be guided
        • Added terms do not need to be present anywhere in the hypertext
      • Reference
        • Armstrong, R., Freitag, D., Joachims, T., and Mitchell, T., 1995, “WebWatcher: A Learning Apprentice for the World Wide Web”, in 1995 AAAI Spring Symposium on Information Gathering from Heterogeneous Distributed Environments, March 1995. http://citeseer.nj.nec.com/armstrong95webwatcher.html
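
The link-annotation idea can be sketched as follows. This is a simplified illustration under stated assumptions, not WebWatcher's actual learning method: the class `LinkAnnotations` and its methods are hypothetical, and links are ranked by plain term overlap.

```python
from collections import defaultdict


class LinkAnnotations:
    """Remember which search terms led users along which links."""

    def __init__(self):
        # (source page, target page) -> terms of users who succeeded via this link
        self.terms = defaultdict(set)

    def record_success(self, path, search_terms):
        """Annotate every link on a path that ended at a relevant document."""
        for link in zip(path, path[1:]):
            self.terms[link].update(term.lower() for term in search_terms)

    def suggest(self, page, search_terms):
        """Rank the outgoing links of `page` by overlap with the new user's terms."""
        wanted = {term.lower() for term in search_terms}
        scored = []
        for (source, target), terms in self.terms.items():
            overlap = len(terms & wanted)
            if source == page and overlap:
                scored.append((-overlap, target))
        return [target for _, target in sorted(scored)]


annotations = LinkAnnotations()
annotations.record_success(["home", "research", "ml-papers"], {"machine", "learning"})
```

Note that “machine” and “learning” are stored on the links themselves, so a later user can be guided from “home” to “research” even if neither term appears on either page.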

  14. Other examples
      • Analysing Query Logs
        • Can documents be clustered according to the terms that are used in queries?
        • Can queries be automatically expanded to find documents relevant to what the user intended to ask for?
        • Can we use the results of past similar queries?
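
As one possible answer to the expansion question, the sketch below expands a query with terms from past queries that led to the same documents. The log format of (query, accessed document) pairs and the names `build_doc_profiles` and `expand_query` are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_doc_profiles(log):
    """Describe each document by the terms of the queries that led to it."""
    profiles = defaultdict(Counter)
    for query, doc_id in log:
        profiles[doc_id].update(query.lower().split())
    return profiles


def expand_query(query, profiles, extra=1):
    """Add the `extra` terms that most often co-occur with the query terms in the log."""
    terms = query.lower().split()
    candidates = Counter()
    for profile in profiles.values():
        if any(term in profile for term in terms):
            for term, count in profile.items():
                if term not in terms:
                    candidates[term] += count
    # Most frequent first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(candidates.items(), key=lambda item: (-item[1], item[0]))
    return terms + [term for term, _ in ranked[:extra]]


log = [
    ("jaguar speed", "d1"),
    ("jaguar car", "d1"),
    ("car price", "d1"),
    ("jaguar habitat", "d2"),
]
profiles = build_doc_profiles(log)
```

The same per-document term profiles could also serve as the feature vectors for clustering documents by the queries that reach them.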

  15. Other examples
      • Analysing “context paths” [Mizuuchi99]
        • Terms “assumed” in Web pages may be explicit in the access paths to those Web pages
        • Users who follow links will have read the information
        • But the information will be missing from the destination page
      • Reference
        • Mizuuchi, Y., and Tajima, K., 1999, “Finding Context Paths for Web Pages”, in Proc. Hypertext 99. http://citeseer.nj.nec.com/mizuuchi99finding.html

  16. Other examples
      • Use implicit link types to determine whether a path is “significant”
      • Link types:
        • intradirectory
        • downward
        • upward
        • sibling
        • intersite
      • Link roles:
        • entrance
        • back
        • jump
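
The five link types can be inferred from the URLs alone, which is what makes them “implicit”. The sketch below assumes plausible directory-based definitions that are not taken from the slides or from [Mizuuchi99]: in particular, “sibling” is treated here as any same-site link into a different branch of the directory tree. The function name `link_type` is hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def _directory(path):
    """Directory part of a URL path, always ending in '/'."""
    directory = posixpath.dirname(path or "/")
    return directory if directory.endswith("/") else directory + "/"


def link_type(source_url, target_url):
    """Classify a link by comparing the hosts and directories of its two ends."""
    source, target = urlsplit(source_url), urlsplit(target_url)
    if source.netloc.lower() != target.netloc.lower():
        return "intersite"
    source_dir, target_dir = _directory(source.path), _directory(target.path)
    if source_dir == target_dir:
        return "intradirectory"
    if target_dir.startswith(source_dir):
        return "downward"  # target lies below the source directory
    if source_dir.startswith(target_dir):
        return "upward"  # target lies in an ancestor directory
    return "sibling"  # assumption: any other branch of the same site
```

For example, a link from `http://example.org/a/index.html` to `http://example.org/a/b/page.html` is classified as downward. Link roles (entrance, back, jump) depend on how a link is used within a path and are not covered by this sketch.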

  17. Conclusion
      • Surface-based approaches to AHS frequently couple IR or log analysis with hypertext
      • The IR aspect is typically term-feature based
      • “Meaning” lies less in the words/phrases that occur in a document than in how the document is actually used

  18. Conclusion
      • These techniques can be coupled with NL techniques, such as Named Entity Recognition, to improve term recognition
        • E.g., the President of the USA in one document is referred to as George W. Bush in another, and a query (which is about GWB) is specified as “George Bush”
      • Still cannot do reasoning about the content of documents
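
The effect of entity recognition on term matching can be illustrated with a hand-written alias table. This is only a sketch: real named entity recognition uses trained models or gazetteers rather than a three-entry dictionary, and the names `ALIASES`, `ENTITY_GWB`, and `normalise` are hypothetical.

```python
# Hypothetical alias table built from the slide's example.
ALIASES = {
    "george w. bush": "ENTITY_GWB",
    "george bush": "ENTITY_GWB",
    "president of usa": "ENTITY_GWB",
}


def normalise(text):
    """Replace every known alias with its canonical entity identifier."""
    result = text.lower()
    # Longest aliases first, so a longer name is never split by a shorter one.
    for alias in sorted(ALIASES, key=len, reverse=True):
        result = result.replace(alias, ALIASES[alias])
    return result


doc_a = normalise("President of USA visits Malta")
doc_b = normalise("George W. Bush visits Malta")
query = normalise("George Bush")
```

After normalisation the query term occurs in both documents, although neither contains the string “George Bush”. The lookup only unifies surface forms; it still says nothing about what the documents assert.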