This article explores a framework for measuring indexing scheme efficiency based on access overhead and storage redundancy. It discusses the impact of range and set queries, emphasizing the trade-offs between redundancy and access overhead. The study focuses on lower bounds and the challenges of optimizing efficiency for different workloads. It also touches upon the theory of indexability and open problems in the field.
On the analysis of indexing schemes
Written by: Joseph M. Hellerstein, Elias Koutsoupias, Christos H. Papadimitriou
Presented by: Tali Kaufman
Presentation layout
- Problem definition: define a framework to measure the efficiency of an index.
- Performance factors: access overhead and storage redundancy.
- Range queries:
  - access overhead upper bound
  - access overhead lower bound (r = 1)
  - access overhead lower bound (r >= 1)
- Set queries:
  - worst-case access overhead
- Conclusions
- Open problems
The problem
- Problem: define a framework for measuring the efficiency of an indexing scheme for a workload, based on two performance factors: storage redundancy and access overhead.
- Workload: a data set together with a set of potential queries.
- Indexing scheme: a collection of blocks that store an actual instance of the data set.
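To make the two performance factors concrete, here is a minimal sketch assuming the definitions commonly used in the indexability literature: with N data items, blocks of capacity B, and b blocks, the redundancy is r = bB/N (the average number of copies of each item when every block is full), and the access overhead is the worst case, over all non-empty queries q, of the minimum number of blocks needed to cover q divided by ⌈|q|/B⌉. The function names are hypothetical, and the brute-force cover search is only practical for toy examples.

```python
from itertools import combinations
from math import ceil

def redundancy(blocks, n_items, block_size):
    """Storage redundancy r = b * B / N, where b is the number of blocks."""
    return len(blocks) * block_size / n_items

def min_cover(query, blocks):
    """Smallest number of blocks whose union contains every item of a non-empty query.

    Brute force over subsets of the relevant blocks, in increasing size;
    exponential, so only suitable for tiny illustrative examples.
    """
    query = set(query)
    relevant = [b for b in blocks if b & query]  # blocks holding at least one query item
    for k in range(1, len(relevant) + 1):
        for combo in combinations(relevant, k):
            if query <= set().union(*combo):
                return k
    raise ValueError("query is not covered by the indexing scheme")

def access_overhead(queries, blocks, block_size):
    """Worst case over non-empty queries of min_cover(q) / ceil(|q| / B)."""
    return max(min_cover(q, blocks) / ceil(len(q) / block_size) for q in queries)

# Six items, B = 3: a plain partition versus a partition plus one replicated block.
partition = [{0, 1, 2}, {3, 4, 5}]
replicated = partition + [{1, 2, 3}]
queries = [{0, 1, 2}, {1, 2, 3}, {3, 4, 5}]
print(redundancy(partition, 6, 3), access_overhead(queries, partition, 3))    # 1.0 2.0
print(redundancy(replicated, 6, 3), access_overhead(queries, replicated, 3))  # 1.5 1.0
```

The toy example shows the trade-off in miniature: storing items 1, 2 and 3 a second time raises the redundancy from 1.0 to 1.5 but lets every query be answered from a single block.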
Access overhead upper bound for two-dimensional range queries
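The construction on this slide is not preserved here. As one simple illustration of an upper bound with redundancy r = 1, assume the data set is an n × n grid partitioned into aligned √B × √B square tiles, one tile per block (this tiling is an assumed example, not necessarily the scheme the slide presents). Because every point is stored exactly once, a range query must read every tile it intersects, so the cover size can be counted directly. A long, thin query such as a single row of length B shows why the overhead of this scheme grows like √B: it touches √B tiles while ⌈|q|/B⌉ = 1. The helper names below are hypothetical.

```python
from math import ceil, isqrt

def tiles_touched(r0, r1, c0, c1, side):
    """Aligned side x side tiles intersecting rows r0..r1 and cols c0..c1 (inclusive)."""
    return (r1 // side - r0 // side + 1) * (c1 // side - c0 // side + 1)

def query_overhead(r0, r1, c0, c1, block_size):
    """Blocks read divided by the ideal ceil(|q| / B) for one range query."""
    side = isqrt(block_size)
    assert side * side == block_size, "this sketch assumes B is a perfect square"
    area = (r1 - r0 + 1) * (c1 - c0 + 1)
    return tiles_touched(r0, r1, c0, c1, side) / ceil(area / block_size)

def worst_overhead(n, block_size):
    """Brute-force worst-case access overhead over every rectangle of an n x n grid."""
    worst = 0.0
    for r0 in range(n):
        for r1 in range(r0, n):
            for c0 in range(n):
                for c1 in range(c0, n):
                    worst = max(worst, query_overhead(r0, r1, c0, c1, block_size))
    return worst

print(query_overhead(0, 0, 0, 15, 16))  # one full row of a 16 x 16 grid, B = 16: 4 tiles vs. ideal 1 -> 4.0
print(query_overhead(0, 3, 0, 3, 16))   # a query equal to one aligned tile -> 1.0
print(worst_overhead(8, 4))             # worst case over all rectangles of an 8 x 8 grid, B = 4
```

The exhaustive search is only meant to check small cases; the tiling itself needs no search, since with r = 1 the blocks a query must read are determined by which tiles it overlaps.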
Conclusions
- Theory of indexability: the article presents a framework for studying indexability.
- Workload and indexing scheme in indexability theory are analogous to language and algorithm in complexity theory.
- The framework emphasizes the secondary-storage nature of indexing schemes, examining storage utilization (redundancy) and disk accesses (access overhead).
- It considers range queries and set queries, focusing on lower bounds and on the trade-off between redundancy and access overhead.
- The trade-off is worse for workloads with a large number of queries (set queries: exponential; range queries: polynomial).
- Algorithms for finding the best access method (search algorithms) and the best partition into blocks are not considered.
- The size of the instance does not affect the results.
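One way to read the "exponential vs. polynomial" remark is in terms of how many queries each workload contains. Assuming that for set queries every non-empty subset of N items is a potential query, and that for range queries every axis-aligned rectangle of an n × n grid is a potential query, the counts are 2^N − 1 and (n(n+1)/2)^2 = O(n^4) respectively. The snippet below only evaluates these two formulas on the same number of points (N = n²); the function names are hypothetical and this reading is an interpretation of the slide.

```python
def set_query_count(n_items):
    """Every non-empty subset of N items is a potential query: 2^N - 1."""
    return 2 ** n_items - 1

def range_query_count(n):
    """Axis-aligned rectangles in an n x n grid: (n(n+1)/2)^2, i.e. O(n^4)."""
    intervals = n * (n + 1) // 2  # contiguous row (or column) intervals
    return intervals ** 2

for n in (4, 8, 16):
    points = n * n  # compare both workloads on the same number of data items
    print(points, set_query_count(points), range_query_count(n))
```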