
Information Retrieval Overview

Information Retrieval Overview. Dr. Aboud Madlin. Contents. Overview Text-based Information retrieval Documents and Query representation Information retrieval models Clustering and classification Evaluation Measures Search Engines on WWW - INFORMATION RETRIEVAL C. J. van RIJSBERGEN


Presentation Transcript


  1. Information Retrieval Overview Dr. Aboud Madlin

  2. Contents • Overview • Text-based Information retrieval • Documents and Query representation • Information retrieval models • Clustering and classification • Evaluation Measures • Search Engines on WWW - Information Retrieval, C. J. van Rijsbergen - Automatic Text Processing, Gerard Salton

  3. Overview • Definitions • Comparison of data retrieval and information retrieval • Text Based IR • Search Engines on WWW

  4. Definitions: Information Retrieval • IR is a branch of applied computer science focusing on • the acquisition, • organization, • storage, • retrieval, and distribution of information. • IR involves helping users find information that matches their information needs. • IR has become a center of focus in the web era. Its theories, techniques, and applications have reached many fields where processing large amounts of information is essential.

  5. Definitions • Document Retrieval • Automatic selection of a subset of Documents in a Corpus of Documents such that the selected Documents are relevant for the Query or Information Need of the User • Text Retrieval • Retrieval of Documents written or spoken in natural language (the latter = Spoken Document Retrieval)

  6. Components of IR Systems • Human Components • Users -- who create the needs of the system (the user) • Organization -- who makes it possible to have the system (the funder) • Information professionals -- who operate the system and provide the services (the server) • System Components • Data -- the content of the system • Device & media -- hardware of the system • Algorithms & procedures -- software of the system

  7. Users • The user • anyone who needs to find some information • The user groups • grouped by their knowledge of the system • grouped by their domain knowledge • grouped by information needs • need to locate a particular item • need some information • need all information on a subject

  8. User’s information needs [Figure: Reality, Goals, Info. Needs, Info. Systems]

  9. [Figure: Reality, Goals, Info. Needs, Problems, Request, Queries, Info. Systems, Data; First Abstraction Principle, Second Abstraction Principle]

  10. Abstraction Principles • First Abstraction Principle • Abstract data from the “real world” and make it available to the system. • Second Abstraction Principle • Abstract the user’s information needs into a form the system understands.

  11. Challenges of IR • Translating info. needs to queries • Matching queries to stored information • Query result evaluation: does the information found match the user’s information needs? [Figure: Information, User, Search/select, Queries, Stored Information, Info. Needs]

  12. Information Retrieval Model [Figure: Documents, Analyze, Indexation, Document Model, Query, Query Interpretation, Query Model, Interrogation, Matching Function, Selected Documents, Thesaurus Construction]

  13. Indexation • Indexing with Natural Language index Terms (keywords, Phrases, other sophisticated structures) and full text search: • Detection of content • Index Term weighting • Indexing with controlled language : Thesaurus class terms, Classification Codes, ... • Both can be Manual or Computerized
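As a rough illustration of computerized indexing with natural-language index terms, the sketch below builds a small inverted index whose postings carry raw term frequencies as a very simple term weight. The function names (`tokenize`, `build_inverted_index`) and the two sample documents are assumptions made for this example; they do not come from the slides.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric index terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each index term to {doc_id: term frequency in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, freq in Counter(tokenize(text)).items():
            index[term][doc_id] = freq
    return dict(index)


# Hypothetical two-document corpus used only for illustration.
docs = {
    "d1": "Information retrieval finds relevant documents",
    "d2": "Data retrieval returns exact data matches",
}
index = build_inverted_index(docs)
print(index["retrieval"])  # postings for a term that occurs in both documents
print(index["data"])       # postings for a term that occurs twice in one document
```

Raw term frequency is only one possible weight; a controlled-language index would instead map each document to thesaurus class terms or classification codes.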

  14. Interrogation • Allowing the user to express their Information Need using the Query Model and answering this Query using the Matching Function: • Index Terms and Term weighting • Operators • Relevance Feedback and Query expansion with Synonyms and Related terms • The matching function is usually a numeric function that computes the Relevance degree of a Document and a Query
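Query expansion with synonyms and related terms can be sketched in a few lines. This is a minimal illustration, assuming the thesaurus is available as a plain dictionary; the `expand_query` function and the sample thesaurus entries are hypothetical.

```python
def expand_query(query, thesaurus):
    """Return the query terms followed by their thesaurus entries, without duplicates."""
    expanded = []
    for term in query.lower().split():
        for candidate in [term, *thesaurus.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


# Hypothetical thesaurus: each term maps to its synonyms or related terms.
thesaurus = {"car": ["automobile", "vehicle"], "fast": ["quick"]}
print(expand_query("fast car", thesaurus))
```

The expanded term list would then be passed to the matching function in place of the original query.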

  15. Retrieval Models • They are defined by: • Form of representation of Document and Query • Matching Algorithms (D, Q, F, R(qi, dj)) • Examples: • Boolean & Extended Boolean Model (set theory) • Vector Space and Generalized Vector Space Models (algebra theory) • Semantic Models • Probabilistic Models • Inference Network Models • Logic Models • ...
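The contrast between the first two model families can be made concrete with a small sketch. It assumes whitespace tokenization and raw term counts as vector components; the helper names (`terms`, `boolean_and`, `cosine`, `rank`) and the sample documents are illustrative only, and the ranking function R(qi, dj) is played here by `cosine`.

```python
import math
from collections import Counter


def terms(text):
    """Split text into lowercase terms (whitespace tokenization, an assumption)."""
    return text.lower().split()


def boolean_and(query, docs):
    """Boolean model: a document matches only if it contains every query term."""
    wanted = set(terms(query))
    return sorted(d for d, text in docs.items() if wanted <= set(terms(text)))


def cosine(query, text):
    """Vector space model: cosine between raw term-count vectors."""
    q, d = Counter(terms(query)), Counter(terms(text))
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


def rank(query, docs):
    """Order documents by decreasing similarity, dropping zero-score documents."""
    scored = [(cosine(query, text), d) for d, text in docs.items()]
    return [d for score, d in sorted(scored, key=lambda p: (-p[0], p[1])) if score > 0]


# Hypothetical corpus used only for illustration.
docs = {
    "d1": "boolean retrieval model",
    "d2": "vector space retrieval model",
    "d3": "probabilistic model",
}
print(boolean_and("retrieval model", docs))  # set-based, unranked answer
print(rank("vector retrieval", docs))        # similarity-based, ranked answer
```

The Boolean answer is a set with no ordering, while the vector space answer is a ranking in which partially matching documents still appear.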

  16. Evaluation Measures • Concept of Relevance • Classical Measures: • Recall • number of retrieved relevant documents / number of all relevant documents • Precision • number of retrieved relevant documents / number of retrieved documents
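Both measures follow directly from set intersection. The sketch below assumes that retrieved and relevant documents are given as collections of document identifiers; the function name `precision_recall` and the sample identifiers are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # retrieved documents that are relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgment: 4 documents retrieved, 3 relevant overall, 2 in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Returning 0.0 for an empty retrieved or relevant set is a convention chosen here to avoid division by zero, not something the slides specify.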

  17. Relevance Feedback • Consideration of User Decisions [Figure: User, Query, Reformulation, Relevance Feedback, Matching Function, Documents, Selected Documents, Results]

  18. Information Retrieval Vs Data Retrieval • Data and Information • Data • String of symbols associated with objects, people, .. • Values of an attribute • Data must be interpreted with associated attributes. • Information • The meaning of the data interpreted by a person or a system • Data that changes the state of a person or system that perceives it.

  19. Information Retrieval Vs Data Retrieval • Information and Knowledge • Knowledge • Structured information • through structuring, information becomes understandable • Processed Information • through processing, information becomes meaningful and useful knowledge [Figure: Data, Information]

  20. Information Retrieval Vs Data Retrieval • Documents • Logical unit of text • articles, books, • links, web pages • Other components that come with the text • figures, charts, graphics • multimedia

  21. Information Retrieval Vs Data Retrieval • Textual Data • Repository of human intellect • Rich and diverse resources for all answers. • Meaningful and understandable (to users). • Free of pre-formatted structures • continuous • separated into documents • Easy to process by the computer

  22. Information Retrieval Vs Data Retrieval • Textual Data • Massive • Any IR system needs the capability of large-scale data processing. • Use of indexes and various representations is required. • Inconsistent • It’s a human language • Same information expressed in different ways • Different information expressed in similar ways. • Incomplete (It’s an open system)

  23. Information Retrieval Vs Data Retrieval • Retrieval • Text retrieval • Document retrieval • Information retrieval • We can’t retrieve information! • We can only retrieve documents that contain text which carries information. • Information can be anywhere • in the text, in the links, in the processing of text.

  24. Information Retrieval Vs Data Retrieval • Information Retrieval • Conceptually, information retrieval is used to cover all related problems in finding needed information • Historically, information retrieval is about document retrieval, emphasizing document as the basic unit • Technically, information retrieval refers to (text) string manipulation, indexing, matching, querying, etc.

  25. Information Retrieval Vs Data Retrieval • Information Retrieval Systems • The goal of IR systems is to help users find information that satisfies their information needs. • The process of IR systems is to match two abstractions: • data abstracted in the system • queries abstracted from user’s information needs • Information retrieval is much more difficult than data retrieval

  26. Comparison of data retrieval and information retrieval
                            Data retrieval       Information retrieval
      Content               Data                 Information
      Data object           Table                Document
      Matching              Exact match          Partial match, best match
      Items wanted          Matching             Relevant
      Query language        SQL (artificial)     Natural
      Query specification   Complete             Incomplete
      Model                 Deterministic        Probabilistic
                            Highly structured    Less structured
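The difference between exact match and best match can be shown on the same tiny collection. This is only a sketch: the records, the field name `title`, and the term-overlap score are assumptions chosen for illustration, not a description of any particular system.

```python
# Hypothetical collection: two records with a single text field.
records = [
    {"id": 1, "title": "information retrieval"},
    {"id": 2, "title": "data retrieval"},
]


def exact_match(records, field, value):
    """Data retrieval: an item either satisfies the condition or it does not."""
    return [r["id"] for r in records if r[field] == value]


def best_match(records, field, query):
    """Information retrieval: rank items by how many query terms they share."""
    q = set(query.lower().split())
    scored = [(len(q & set(r[field].lower().split())), r["id"]) for r in records]
    return [rid for score, rid in sorted(scored, key=lambda p: (-p[0], p[1])) if score > 0]


query = "retrieval of information"
print(exact_match(records, "title", query))  # no stored value equals the query string
print(best_match(records, "title", query))   # both records partially match and are ranked
```

An incompletely specified query returns nothing under exact matching, whereas best matching still produces a ranked list of candidates.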

  27. Other methods for Retrieving Information • Browsing or Navigation System • Following Hypertext or Hypermedia links until a relevant document is reached • Question-Answering System • User asks Questions • Answer is directly extracted from Document Collection

  28. Text-Based Information Retrieval • Fundamental Techniques • Document and Query Representation • Term weighting schemes based on Corpus Statistics • Retrieval Models • Document Clustering/Classification • Data Structures and Search Techniques • Evaluation Measures
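One widely used term weighting scheme based on corpus statistics is tf-idf; the slides do not name a specific scheme, so it is shown here only as a common example. The sketch assumes whitespace tokenization, raw term frequency, and idf = log(N / df); the function name `tf_idf` and the sample corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return {doc_id: {term: tf * log(N / df)}} for a small in-memory corpus."""
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(t for toks in tokenized.values() for t in set(toks))
    return {
        d: {t: tf * math.log(n_docs / df[t]) for t, tf in Counter(toks).items()}
        for d, toks in tokenized.items()
    }


# Hypothetical corpus used only for illustration.
docs = {
    "d1": "retrieval of text",
    "d2": "retrieval of images",
    "d3": "text clustering",
}
weights = tf_idf(docs)
for term, weight in sorted(weights["d3"].items()):
    print(term, round(weight, 3))
```

A term that occurs in a single document ("clustering") receives a higher weight than one shared by several documents ("text"), which is the intuition behind using corpus statistics.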

  29. Text-Based Information Retrieval • New Challenges • Statistical methods & Machine Learning Techniques applied to Text Retrieval • Text Categorization • Text Summarization • Cross-Language IR • Knowledge Representation and use of Knowledge Bases

  30. Search Engines on WWW • Web Search Engines • Crawling Agents • Indexes of the Web pages • Query Interface • Answer Interface • Retrieval Models • Content based models • Link based models • Specific Purpose Search Engines
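As one example of a link-based model, the sketch below runs a PageRank-style power iteration over a tiny link graph. The slides do not say which link-based model is meant, so this is an illustrative choice; the damping factor 0.85, the iteration count, the function name `pagerank`, and the four-page graph are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a {page: [pages it links to]} graph."""
    pages = sorted(set(links) | {p for outs in links.values() for p in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank


# Hypothetical web graph: page "c" is linked to by three other pages.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

A content-based model scores a page from its own text, while a link-based score like this one depends only on which pages point to it; a search engine could combine both.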

  31. From web to semantic web • Current web problems (unstructured information) • How can we convert the unstructured information to structured information? • Semantic Web mining • Search engines

  32. Using clustering algorithms to facilitate the browsing of the search engine's results • Web Document Clustering Problem • Online Clustering framework VS Offline clustering framework • Modeling Online Clustering framework (Vector-space theory) • Decreasing the high dimension of the feature vector • Modeling Offline clustering framework (Graph-based theory) • Future research issues • Arabic language • Google and Semantic Web
