CSA3080: Adaptive Hypertext Systems I

Lecture 10: Representing Data, Information, and Knowledge II
Dr. Christopher Staff
Department of Computer Science & AI, University of Malta
cstaff@cs.um.edu.mt

Surface-based approaches
• Semantic representations and the ability to reason would give computational systems enormous potential
• Currently, it is not known what the limitations of the Semantic Web might be
• But it is certainly expensive to model knowledge (in time, money, and computation)

Surface-based approaches
• Surface-based approaches attempt to approximate using information in the correct context (knowledge), but recognise their limitations
• E.g., Mulder [Kwok01] uses an extended boolean IR system to attempt to answer (certain types of) questions
• Reference
  • Kwok, C.C.T., Etzioni, O., Weld, D.S., 2001, “Scaling Question-Answering to the Web”, in Proceedings of the 10th International WWW Conference, Hong Kong, May 1-5, 2001. http://citeseer.nj.nec.com/kwok01scaling.html

Surface-based approaches
• Mulder
  • Turns questions into partial phrases, and then submits a phrase query to an IR system
  • “Does John love Mary?” is turned into the query “John loves Mary”
  • Documents containing the phrase are evidence
• What are the limitations?
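
The question-to-phrase step can be sketched in a few lines. This is only an illustration of the idea on the slide, not Mulder's actual query formulation: the function names `question_to_phrase` and `supporting_documents` are hypothetical, only questions of the form “Does X verb Y?” are handled, and the verb is inflected by naively appending “s”.

```python
import re

def question_to_phrase(question):
    """Rewrite 'Does <subject> <verb> <rest>?' as a quoted declarative phrase query."""
    match = re.fullmatch(r"Does (\w+) (\w+) (.+)\?", question.strip())
    if match is None:
        raise ValueError("this sketch only handles 'Does X verb Y?' questions")
    subject, verb, rest = match.groups()
    # Assumption: third person singular is formed by appending "s" (irregular verbs are ignored).
    return f'"{subject} {verb}s {rest}"'

def supporting_documents(phrase_query, documents):
    """Return ids of documents whose text contains the phrase (case-insensitive)."""
    phrase = phrase_query.strip('"').lower()
    return [doc_id for doc_id, text in documents.items() if phrase in text.lower()]

# Hypothetical two-document collection.
documents = {
    "d1": "Everyone knows John loves Mary.",
    "d2": "Mary loves John.",
}
query = question_to_phrase("Does John love Mary?")
evidence = supporting_documents(query, documents)
```

The sketch already hints at the limitations: a document saying “John no longer loves Mary” would be missed, while one saying “nobody believes John loves Mary” would count as evidence.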

Surface-based approaches
• So, given that we are operating in a hypertextual environment, what can we use to:
  • i) identify what is of interest to a user
    • Assumption 1: the user interest is represented by a description
    • Assumption 2: the description is a formal statement
  • ii) adapt hyperspace to the user

Surface-based approaches
• This addresses our immediate concerns
  • i) identify what is of interest to a user
    • so that a user doesn’t have to describe it
    • user modelling next lecture
  • ii) adapt hyperspace to the user
    • so that a user doesn’t have to find it
    • adaptation techniques in the last lecture/s

Surface-based approaches
• At their most fundamental
  • An IR system is document representation + algorithm for matching query to documents
    • Assume binary weights for terms
  • A hypertext is a collection of nodes and links
  • IR and Hypertext allow user interaction
• What else can we say about the structures, user interaction, with a view to learning about the user?
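
A minimal sketch of “document representation + matching algorithm” under binary term weights follows. The representation (a set of lower-cased whitespace tokens) and the ranking rule (number of shared terms) are assumptions chosen for brevity; `index_terms` and `match` are hypothetical names.

```python
def index_terms(text):
    """Binary representation: a term is either present in the set or absent."""
    return set(text.lower().split())

def match(query, documents):
    """Rank documents by the number of query terms they contain, dropping non-matches."""
    query_terms = index_terms(query)
    scored = []
    for doc_id, text in documents.items():
        overlap = len(query_terms & index_terms(text))
        if overlap:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties are broken alphabetically by document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]

# Hypothetical collection.
documents = {
    "a": "adaptive hypertext systems",
    "b": "information retrieval systems",
    "c": "cooking recipes",
}
ranking = match("adaptive systems", documents)
```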

Surface-based approaches
• IR:
  • User submits query
  • System returns relevant documents
  • User reads/accesses some
  • With relevance feedback, user can select examples of relevant/non-relevant documents and IR system will modify the query
  • If we “remember” users we can remember terms used/documents viewed
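
How relevance feedback might modify a query can be sketched as follows. This is a deliberately simplified, binary-weight variant in the spirit of Rocchio-style feedback, not an algorithm given in the lecture: terms seen only in the relevant examples are added, and terms seen only in the non-relevant examples are dropped. `refine_query` is a hypothetical name.

```python
def refine_query(query_terms, relevant_docs, non_relevant_docs):
    """Modify a query from user-selected examples, each given as a set of terms."""
    relevant_terms = set().union(*relevant_docs)
    non_relevant_terms = set().union(*non_relevant_docs)
    added = relevant_terms - non_relevant_terms      # evidence for what the user wants
    dropped = non_relevant_terms - relevant_terms    # evidence for what the user does not want
    return (set(query_terms) | added) - dropped

# Hypothetical feedback on the ambiguous query "jaguar".
refined = refine_query(
    {"jaguar"},
    relevant_docs=[{"jaguar", "cat", "habitat"}],
    non_relevant_docs=[{"jaguar", "car", "dealer"}],
)
```

Storing the refined term sets per user is one simple way of “remembering” the terms a user has used.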

Surface-based approaches
• IR:
  • Documents may be relevant to different queries
    • Can we learn anything from this?
  • Some words in a query are used as context (to eliminate documents containing different word senses)
  • Relevance feedback

Surface-based approaches
• Hypertext
  • What’s a link really?
  • Navigation history
  • Automatic link “typing”
  • Contextualisation of information
    • Is a document necessarily identically relevant to all parents?
    • Is all of a document necessarily relevant to all parents?
    • Can we learn anything about documents which link to the same child/children?
    • Are assumptions made about information by authors along a path?

Surface-based approaches
• HyperContext
  • If we index multiple representations of the same document, will retrieval effectiveness improve?
  • Can information be added to an interpretation (from its parents) to improve relevance?
  • Can information be removed from an interpretation if it is non-relevant to a parent?

Surface-based approaches
• This surface-based approach improves retrieval by filtering out non-relevant terms from documents and by adding relevant terms to documents
  • reducing the number of false positives
  • increasing the chances of locating a relevant document
• It does nothing to expose the “meaning” of the data in the document
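
The idea of indexing one interpretation per parent, rather than one representation per document, can be illustrated with a rough sketch. It is not HyperContext's actual method. Two loud assumptions are made: a child term counts as relevant to a parent only if the parent also uses it, and the terms a parent contributes are those of its link anchor. `interpret` and `index_interpretations` are hypothetical names.

```python
def interpret(child_terms, parent_terms, anchor_terms):
    """Build one interpretation of a child document in the context of one parent."""
    kept = child_terms & parent_terms   # filter out terms the parent does not share (assumption)
    return kept | anchor_terms          # add terms contributed by the parent's link (assumption)

def index_interpretations(child_id, child_terms, parents):
    """Index one interpretation per (parent, child) pair."""
    return {
        (parent_id, child_id): interpret(child_terms, parent_terms, anchor_terms)
        for parent_id, (parent_terms, anchor_terms) in parents.items()
    }

# Hypothetical child page reached from two different parents.
child_terms = {"jaguar", "speed", "engine", "habitat"}
parents = {
    "cars": ({"engine", "speed", "dealer"}, {"sports", "car"}),
    "zoo": ({"habitat", "cat"}, {"big", "cat"}),
}
index = index_interpretations("jaguar_page", child_terms, parents)
```

The same page now matches a query for “sports car” only through its “cars” interpretation, which is where the reduction in false positives would come from.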

Other examples
• WebWatcher [Armstrong95]
  • Adds user’s search terms to links on path to relevant document so that future users can be guided
  • Added terms do not need to be present anywhere in the hypertext
• Reference
  • Armstrong, R., Freitag, D., Joachims, T., and Mitchell, T., 1995, “WebWatcher: A Learning Apprentice for the World Wide Web”, in 1995 AAAI Spring Symposium on Information Gathering from Heterogeneous Distributed Environments, March 1995. http://citeseer.nj.nec.com/armstrong95webwatcher.html
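
The link-annotation idea can be sketched as below. This is only a rough illustration and not WebWatcher's learning method: the class `LinkAnnotations` and its methods are hypothetical, and links are ranked simply by how many of the current user's terms they have accumulated.

```python
from collections import defaultdict

class LinkAnnotations:
    """Attach search terms to the links along paths that led to relevant documents."""

    def __init__(self):
        self.terms_by_link = defaultdict(set)

    def record_success(self, path, search_terms):
        """Annotate every (source, target) link on a successful path."""
        for link in zip(path, path[1:]):
            self.terms_by_link[link].update(search_terms)

    def suggest(self, page, search_terms):
        """Rank outgoing links of `page` by overlap with the new user's terms."""
        terms = set(search_terms)
        ranked = []
        for (source, target), link_terms in self.terms_by_link.items():
            overlap = len(terms & link_terms)
            if source == page and overlap:
                ranked.append((overlap, target))
        ranked.sort(key=lambda pair: (-pair[0], pair[1]))
        return [target for _, target in ranked]

# Hypothetical site and sessions.
annotations = LinkAnnotations()
annotations.record_success(["home", "research", "papers"], {"adaptive", "hypertext"})
annotations.record_success(["home", "teaching"], {"timetable"})
suggested = annotations.suggest("home", {"hypertext"})
```

Note that “hypertext” need not occur on the “home” or “research” pages at all; it is attached to the link purely because an earlier user searching for it succeeded along that path.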

Other examples
• Analysing Query Logs
  • Can documents be clustered according to the terms that are used in queries?
  • Can queries be automatically expanded to find documents relevant to what the user intended to ask for?
  • Can we use the results of past similar queries?
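
One way to approach the first question is to describe each document by the query terms that led users to it, and then compare those descriptions. The sketch below assumes a log of (query, accessed document) pairs and uses Jaccard similarity with an arbitrary threshold of 0.5; the function names and the log format are hypothetical.

```python
from collections import defaultdict

def query_profiles(log):
    """Describe each document by the set of query terms used to reach it."""
    profiles = defaultdict(set)
    for query, doc_id in log:
        profiles[doc_id].update(query.lower().split())
    return dict(profiles)

def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_documents(profiles, doc_id, threshold=0.5):
    """Documents whose query profile is close to that of `doc_id`."""
    target = profiles[doc_id]
    return sorted(
        other for other, terms in profiles.items()
        if other != doc_id and jaccard(target, terms) >= threshold
    )

# Hypothetical query log.
log = [
    ("adaptive hypertext", "d1"),
    ("hypertext adaptive systems", "d2"),
    ("malta weather", "d3"),
]
profiles = query_profiles(log)
neighbours = similar_documents(profiles, "d1")
```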

Other examples
• Analysing “context paths” [Mizuuchi99]
  • Terms “assumed” in Web pages may be explicit in the access paths to those Web pages
  • Users who follow links will have read the information
  • But the information will be missing from the destination page
• Reference
  • Mizuuchi, Y., and Tajima, K., 1999, “Finding Context Paths for Web Pages”, in Proc. Hypertext 99. http://citeseer.nj.nec.com/mizuuchi99finding.html
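
The observation that assumed terms sit on the access path rather than on the destination page can be made concrete with a small sketch. It is not the algorithm of [Mizuuchi99]; it only collects the terms that appear on earlier pages of a path but not on the destination. `implied_terms` and the page contents are hypothetical.

```python
def implied_terms(path, page_terms):
    """Terms found on the pages leading to the destination but missing from it."""
    *ancestors, destination = path
    inherited = set()
    for page in ancestors:
        inherited |= page_terms[page]
    return inherited - page_terms[destination]

# Hypothetical pages along one access path.
page_terms = {
    "/": {"university", "malta"},
    "/cs/": {"computer", "science"},
    "/cs/staff.html": {"staff", "science"},
}
context = implied_terms(["/", "/cs/", "/cs/staff.html"], page_terms)
```

Adding `context` to the index entry for the staff page would let it be found by a query such as “malta computer science staff”, even though most of those words never occur on the page itself.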

Other examples
• Use implicit link types to determine whether a path is “significant”
• Link types:
  • intradirectory
  • downward
  • upward
  • sibling
  • intersite
• Link roles:
  • entrance
  • back
  • jump
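
Link types of this kind can be inferred from URLs alone. The sketch below assumes definitions based on host and directory structure (same directory, descendant directory, ancestor directory, directories sharing a parent, different host); these definitions are a reading of the type names rather than something stated on the slide, and `link_type` is a hypothetical name. Link roles are not covered.

```python
import posixpath
from urllib.parse import urlsplit

def link_type(source_url, target_url):
    """Classify a link by comparing the host and directory of its two ends."""
    source, target = urlsplit(source_url), urlsplit(target_url)
    if source.netloc != target.netloc:
        return "intersite"
    source_dir = posixpath.dirname(source.path)
    target_dir = posixpath.dirname(target.path)
    if source_dir == target_dir:
        return "intradirectory"
    # A trailing "/" is appended so that "/ab" is not mistaken for a child of "/a".
    if target_dir.startswith(source_dir.rstrip("/") + "/"):
        return "downward"
    if source_dir.startswith(target_dir.rstrip("/") + "/"):
        return "upward"
    if posixpath.dirname(source_dir) == posixpath.dirname(target_dir):
        return "sibling"
    return "other"  # same site, but none of the assumed definitions applies

# Hypothetical URLs.
kind = link_type("http://x.org/a/b.html", "http://x.org/a/c/d.html")
```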

Conclusion
• Surface-based approaches to AHS frequently couple IR or log analysis with hypertext
• The IR aspect is typically term-feature based
• “Meaning” lies less in the words/phrases that occur in a document than in how the document is actually used

Conclusion
• These techniques can be coupled with NL techniques, such as Named Entity Recognition, to improve term recognition
  • E.g., “President of USA” in one document is referred to as “George W. Bush” in another, while the query (which is about GWB) is specified as “George Bush”
• Still cannot do reasoning about the content of documents
