Structured Text Retrieval Models

giulio
Presentation Transcript


  1. Structured Text Retrieval Models

  2. Structured Text Retrieval • Text retrieval retrieves documents based on index terms. • Observation: documents have implicit structure. • Regular text retrieval and indexing strategies lose the information available within that structure. • Goal: retrieval that also exploits structure, e.g. all documents having “George Bush” in the caption of a photo.

  3. Models for Structured Text Retrieval • PAT Expressions • Overlapped Lists • Proximal Nodes • List of References • Tree-based • Query Languages (SFQL, CCL)

  4. Proximal Nodes (by Gonzalo Navarro and Ricardo Baeza-Yates) • Based on the hierarchical structure of documents. • The structure is computed statically, and all structural elements are defined in advance as “nodes”. • The model defines operators on these nodes based on their definition and content. • Only nodes from one particular hierarchy are returned as results.

  5. Proximal Nodes [Figure: example hierarchy with a Document node at the top, Chapter nodes below it, and Section nodes below the chapters]

  6. Proximal Nodes • Nodes are structural in nature, e.g. Chapter, Section, etc. • Each node has a defined segment (a contiguous part of the text). • Operators are defined with respect to this model. • There are two kinds: structure operators and text operators.

  7. Proximal Nodes • Structure operators: name, inclusion, positional inclusion, distance, child/parent, set manipulation • Text operators: match
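
To make the operator list more concrete, here is a minimal Python sketch of three of them (name, inclusion, match) over nodes that each own a contiguous segment. The `Node` class, the helper names, and the sample text are all hypothetical and chosen for illustration; this is not the implementation described by Navarro and Baeza-Yates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str   # structural type, e.g. "photo" or "caption" (hypothetical names)
    start: int  # offset of the first character of the segment
    end: int    # offset one past the last character of the segment


def make_node(name, text, fragment):
    """Build a node whose segment is the first occurrence of `fragment`."""
    start = text.index(fragment)
    return Node(name, start, start + len(fragment))


def by_name(nodes, name):
    """Name operator: select all nodes of one structural type."""
    return [n for n in nodes if n.name == name]


def included_in(inner, outer):
    """Inclusion operator: nodes of `inner` whose segment lies inside a node of `outer`."""
    return [a for a in inner
            if any(b.start <= a.start and a.end <= b.end and a != b for b in outer)]


def match(text, nodes, pattern):
    """Text operator: nodes whose segment contains `pattern`."""
    return [n for n in nodes if pattern in text[n.start:n.end]]


text = "Photo: George Bush at the podium. Table: George Bush vote counts."
nodes = [
    make_node("photo", text, "Photo: George Bush at the podium."),
    make_node("caption", text, "George Bush at the podium."),
    make_node("table", text, "Table: George Bush vote counts."),
    make_node("caption", text, "George Bush vote counts."),
]

# "George Bush" in the caption of a photo: both captions match the text,
# but only one of them is included in a photo node.
captions_in_photos = included_in(by_name(nodes, "caption"), by_name(nodes, "photo"))
hits = match(text, captions_in_photos, "George Bush")
```

A plain term index would return both captions here; combining the structure operators with the text operator is what narrows the result to the photo caption.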

  8. Retrieval on Evidence (by Mounia Lalmas) • Based on documents made up of objects. • Objects are modeled as independent entities and can be in different media, languages or locations. • Document indexing carries a degree of uncertainty as to whether an index term actually represents the object. • This uncertainty must be captured to get better results. • The model uses the Dempster-Shafer theory of evidence to do so.

  9. Retrieval on Evidence • The model takes into consideration the disparity between indexing vocabularies. • It aggregates the indexing vocabulary and also the uncertainty. • For objects o ∈ O and types t ∈ T, the function type is defined as type: O → ℘(T). • Aggregation is defined over objects, and the types of a composite object include all the types of the objects it contains.
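
The aggregation of types over composite objects can be sketched as a recursive union. This is only an illustration of the idea that a composite object carries every type of the objects it contains; the object identifiers, the type labels, and the two dictionaries are hypothetical and not taken from the paper.

```python
def aggregate_types(obj, children, own_types):
    """Return the set of types of `obj`: its own plus those of all contained objects."""
    types = set(own_types.get(obj, ()))
    for child in children.get(obj, ()):
        types |= aggregate_types(child, children, own_types)
    return types


# Hypothetical containment structure: a document holding a section and a figure.
children = {"doc": ["sec1", "fig1"], "sec1": ["par1"]}

# Hypothetical types assigned directly to the leaf objects.
own_types = {"par1": {"english_text"}, "fig1": {"colour_feature"}}

doc_types = aggregate_types("doc", children, own_types)
sec_types = aggregate_types("sec1", children, own_types)
```

Each call returns a set of types, matching the idea that the type function maps an object to a set of types rather than to a single one.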

  10. Retrieval on Evidence • The indexing vocabulary is defined over a proposition space, e.g. Wine (english, text), Blue (colour, feature). • The sentence space defines how index terms in the same proposition space can be used together. • The semantics of the indexing vocabulary are maintained using the notion of worlds.

  11. Retrieval on Evidence • Each type t has S_t, W_t, v_t, π_t. • S_t is the sentence space for the type. • W_t is the set of possible worlds associated with S_t. • v_t maps W_t × P_t to {true, false}. • π_t maps W_t × S_t to {true, false}. • Logical equivalence between sentences is built around the notion of their semantics being equivalent in all or most worlds.
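
As a rough illustration of truth functions over worlds, the sketch below treats a world as the set of propositions true in it and compares two sentences by the fraction of worlds in which they agree. It assumes classical propositional connectives and an equal weight for every world; the propositions, the tuple encoding of sentences, and the `threshold` parameter are hypothetical, and the model in the paper may define these differently.

```python
from itertools import combinations

# Hypothetical proposition space for one type.
propositions = ["wine", "red", "french"]

# Every subset of propositions is one possible world (the propositions true in it).
worlds = [frozenset(c)
          for r in range(len(propositions) + 1)
          for c in combinations(propositions, r)]


def v(world, proposition):
    """Truth of a single proposition in a world."""
    return proposition in world


def pi(world, sentence):
    """Truth of a sentence: a proposition name or a tuple (op, operand, ...)."""
    if isinstance(sentence, str):
        return v(world, sentence)
    op, *args = sentence
    if op == "not":
        return not pi(world, args[0])
    if op == "and":
        return all(pi(world, a) for a in args)
    if op == "or":
        return any(pi(world, a) for a in args)
    raise ValueError(f"unknown operator: {op}")


def agreement(s1, s2):
    """Fraction of worlds in which both sentences have the same truth value."""
    same = sum(pi(w, s1) == pi(w, s2) for w in worlds)
    return same / len(worlds)


def equivalent(s1, s2, threshold=1.0):
    """Equivalent in all worlds (threshold 1.0) or in most worlds (lower threshold)."""
    return agreement(s1, s2) >= threshold


lhs = ("not", ("and", "wine", "red"))
rhs = ("or", ("not", "wine"), ("not", "red"))
strict = equivalent(lhs, rhs)
partial = agreement("wine", ("and", "wine", "red"))
```

Lowering `threshold` below 1.0 is one simple way to express "equivalent in most worlds"; it is a design choice of this sketch, not something the slides specify.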

  12. Retrieval on Evidence • However, the uncertainty of the representation remains. • It is represented by a weighting function based on the Dempster-Shafer model. • These objects and their syntactic and semantic models are aggregated for the objects which contain them. E.g. a section containing sentences indexed by terms a, b, c, d, … will be equivalent to sentences over the worlds that also imply a, b, c, d, …
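
For readers unfamiliar with Dempster-Shafer theory, the following sketch shows the standard Dempster rule of combination for two mass (weighting) functions. It is offered as background on the theory only: the slides do not say exactly how the model combines evidence during aggregation, and the frame of discernment and the mass values below are made up for illustration.

```python
def combine(m1, m2):
    """Dempster's rule: combine two mass functions keyed by frozenset focal elements."""
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            intersection = a & b
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
            else:
                # Mass assigned to incompatible hypotheses counts as conflict.
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence cannot be combined")
    # Renormalise so the remaining masses sum to 1.
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}


# Hypothetical frame of discernment: what an object is about.
frame = frozenset({"wine", "beer"})
wine = frozenset({"wine"})
beer = frozenset({"beer"})

# Mass on the whole frame expresses ignorance rather than belief in either term.
m1 = {wine: 0.6, frame: 0.4}
m2 = {beer: 0.5, frame: 0.5}

m12 = combine(m1, m2)
```

Assigning mass to the whole frame is what lets a source stay uncommitted, which is the property that makes the theory attractive for uncertain index terms.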

  13. Comparisons • Proximal Nodes is based on structured documents. The paper presents the matter clearly, provides approaches towards building a software architecture, and reports findings from conducted experiments. • The Evidence paper tries to model heterogeneous documents made up of different media, languages, etc. Overall the model is complex, and no results are given on its implementation or performance.
