15.6 Index Based Algorithms

15.6 Index Based Algorithms Akshay Reddy Chada CS 257

Contents • Clustering and non-clustering indexes • Index based Selection • Joining using an index • Joining using a sorted index

Algorithms are useful for the selection operator. • In a clustered relation tuplesare packed roughly as few blocks, as they can possibly hold those tuples.

Clustering And Nonclustering Indexes • Clustering indexes are on an attribute or attributes such that all the tuples with a fixed value for the search key of this index appear on roughly as few blocks as can hold them. • A relation that isn’t clustered cannot have a clustering index

Index-based Selection • For a selection σC(R), suppose C is of the form a=v, where a is an attribute • For clustering index R.a: the number of disk I/O’s will be B(R)/V(R,a)

Index-based Selection • The actual number may be higher: 1. index is not kept entirely in main memory 2. they spread over more blocks 3. may not be packed as tightly as possible into blocks

Example • B(R)=1000, T(R)=20,000 number of I/O’s required: • 1. clustered, not index 1000 • 2. not clustered, not index 20,000 • 3. If V(R,a)=100, index is clustering 10 • 4. If V(R,a)=10, index is nonclustering 2,000

Joining by using an index • Natural join R(X, Y) S S(Y, Z) Number of I/O’s to get R Clustered: B(R) Not clustered: T(R) Number of I/O’s to get tuple t of S Clustered: T(R)B(S)/V(S,Y) Not clustered: T(R)T(S)/V(S,Y)

Example • R(X,Y): 1000 blocks S(Y,Z)=500 blocks Assume 10 tuples in each block, so T(R)=10,000 and T(S)=5000 V(S,Y)=100 If R is clustered, and there is a clustering index on Y for S the number of I/O’s for R is: 1000 the number of I/O’s for S is10,000*500/100=50,000

Joining Using a Sorted index • Natural join R(X, Y) S (Y, Z) with index on Y for either R or S • Extreme case: Zig-zagjoin • Example: relation R(X,Y) and R(Y,Z) with index on Y for both relations search keys (Y-value) for R: 1,3,4,4,5,6 search keys (Y-value) for S: 2,2,4,6,7,8

Joining using a sorted index • Used when the index is a B-tree, or structure from which we easily can extract the tuples of a relation in sorted order.

15.6 Index Based Algorithms