1 / 69

CSCE 310 / 603 Database Systems

CSCE 310 / 603 Database Systems. Chapter 14: Index Structures. Indexes on Sequential Files. Effects of Actions on Index. Explanations for Actions. create/destroy empty overflow block has no effect on dense index since it refers to records sparse index since it refers to main records

adonica
Download Presentation

CSCE 310 / 603 Database Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CSCE 310 / 603 Database Systems Chapter 14: Index Structures

  2. Indexes on Sequential Files

  3. Effects of Actions on Index

  4. Explanations for Actions • create/destroy empty overflow block has no effect on • dense index since it refers to records • sparse index since it refers to main records • create/destroy empty main block: • no effect on dense index as above • insert/delete entry in sparse index • insert/delete/slide record: • insert/delete/update entry in dense index • only change sparse index if affects first record in block

  5. 70 10 30 50 20 40 60 80 Deletion from sparse index 10 30 50 70 90 110 130 150

  6. 70 10 30 50 20 40 60 80 Deletion from sparse index • delete record 40 10 30 50 70 90 110 130 150

  7. 70 10 30 50 20 40 60 80 Deletion from sparse index • delete record 40 10 30 50 70 90 110 130 150

  8. 70 10 30 50 20 40 60 80 Deletion from sparse index • delete record 30 10 30 50 70 90 110 130 150

  9. 10 30 50 70 20 40 60 80 40 40 Deletion from sparse index • delete record 30 10 30 50 70 90 110 130 150

  10. 70 10 30 50 20 40 60 80 Deletion from sparse index • delete records 30 & 40 10 30 50 70 90 110 130 150

  11. 70 10 30 50 20 40 60 80 Deletion from sparse index • delete records 30 & 40 10 30 50 70 90 110 130 150

  12. 10 30 50 70 20 40 60 80 50 70 Deletion from sparse index • delete records 30 & 40 10 30 50 70 90 110 130 150

  13. 70 10 30 50 20 40 60 80 Deletion from dense index 10 20 30 40 50 60 70 80

  14. 70 10 50 20 80 60 Deletion from dense index • delete record 30 10 20 30 30 40 40 50 60 70 80

  15. 10 70 50 20 80 60 40 Deletion from dense index • delete record 30 10 20 30 30 40 40 50 60 70 80

  16. 10 70 50 20 60 80 40 40 Deletion from dense index • delete record 30 10 20 30 30 40 40 50 60 70 80

  17. 10 30 40 60 20 50 Insertion, sparse index case 10 30 40 60

  18. 10 30 40 60 20 50 Insertion, sparse index case • insert record 34 10 30 40 60

  19. 10 30 40 60 20 50 34 • our lucky day! • we have free space • where we need it! Insertion, sparse index case • insert record 34 10 30 40 60

  20. 10 30 40 60 20 50 Insertion, sparse index case • insert record 15 10 30 40 60

  21. 10 30 40 60 20 50 15 20 20 30 Insertion, sparse index case • insert record 15 10 30 40 60

  22. 10 30 40 60 20 50 15 20 20 30 Insertion, sparse index case • insert record 15 10 30 40 60 • Illustrated: Immediate reorganization • Variation: • insert new block (chained file) • update index

  23. 10 30 40 60 20 50 Insertion, sparse index case • insert record 25 10 30 40 60

  24. 10 30 40 60 25 20 50 overflow blocks (reorganize later...) Insertion, sparse index case • insert record 25 10 30 40 60

  25. 10 30 40 60 25 20 50 overflow blocks (reorganize later...) Insertion, sparse index case • insert record 25 10 30 40 60

  26. Insertion, dense index case • Similar • Often more expensive . . .

  27. More on Indexes Secondary Indexes B-Trees

  28. Secondary Indexes • Sometimes we want multiple indexes on a relation. • Ex: search Candies(name,manf) both by name and by manufacturer • Typically the file would be sorted using the key (ex: name) and the primary index would be on that field. • The secondary index is on any other attribute (ex: manf). • Secondary index also facilitates finding records, but cannot rely on them being sorted

  29. Sparse Secondary Index? • No! • Since records are not sorted on that key, cannot predict the location of a record from the location of any other record. • Thus secondary indexes are always dense.

  30. 100 30 20 80 90 60 40 10 50 70 90 30 20 ... 80 100 Sequence field • Sparse index

  31. 30 20 80 100 90 40 70 10 60 50 90 30 ... 20 80 100 does not make sense! Sequence field • Sparse index

  32. Design of Secondary Indexes • Always dense, usually with duplicates • Consists of key-pointer pairs ("key" means search key, not relation key) • Entries in index file are sorted by key • Therefore second-level index is sparse

  33. 90 30 20 80 100 50 70 40 10 60 50 10 10 20 50 60 30 70 90 40 ... ... sparse second- level Secondary indexes Sequence field dense first- level

  34. Secondary Index and Duplicate Keys • Scheme in previous diagram wastes space in the present of duplicate keys • If a search key value appears n times in the data file, then there are n entries for it in the index.

  35. 20 20 10 10 30 10 40 40 40 40 Duplicate values & secondary indexes

  36. 30 20 20 10 10 10 40 40 40 40 40 10 20 10 30 40 10 40 ... 20 40 Duplicate values & secondary indexes one option...

  37. 30 20 20 10 10 10 40 40 40 40 40 10 20 10 30 40 10 40 ... 20 40 Duplicate values & secondary indexes one option... • Problem: • excess overhead! • disk space • search time

  38. Buckets • To avoid repeating values, use a level of indirection • Put buckets between the secondary index file and the data file • One entry in index for each search key K; its pointer goes to a location in a "bucket file", called the bucket for K • Bucket holds pointers to all records with search key K

  39. 10 30 20 20 10 10 40 40 40 40 50 10 20 60 ... 30 40 Duplicate values & secondary indexes buckets

  40. Why “bucket” idea is useful Indexes Records name: primary Emp (name,dept,floor,...) dept: secondary floor: secondary

  41. Query: SELECT name FROM Emp WHERE dept = 'Toy' AND floor = 2

  42. dept index Emp floor index Toy 2 Query: SELECT name FROM Emp WHERE dept = 'Toy' AND floor = 2

  43. dept index Emp floor index Toy 2 Query: SELECT name FROM Emp WHERE dept = 'Toy' AND floor = 2 • Intersect Toy dept bucket and floor 2 bucket to get set of matching Emp’sSaves disk I/O's

  44. Summary of Indexes So Far • Advantages: • simple • index is sequential file, good for scans • Disadvantages • either inserts are expensive • or lose sequentiality (cf. next slide) • Instead use B-tree data structure to implement index

  45. 10 Example Index (sequential) continuous free space 20 30 40 50 60 70 80 90

  46. 32 39 38 31 33 34 35 36 overflow area (not sequential) 10 Example Index (sequential) continuous free space 20 30 40 50 60 70 80 90

  47. B-Trees • Several related data structures • Key features are: • automatically adjust number of levels of indexes as size of data file changes • storage on blocks is managed to keep every block between half full and full => no overflow blocks needed • We'll actually study B+ trees

  48. B-Tree Structure • an example of a balanced search tree: every root-to-leaf path has same length • each node (vertex) in the tree is a block, which contains search keys and pointers • parameter n, which is largest value so that n+1 pointers and n keys fit in one block • Ex: If block size is 4096 bytes, keys are 4 bytes, and pointers are 8 bytes, then n = 340.

  49. Constraints on B-Tree Nodes • Keys in leaf nodes are copies of keys from data file, in sorted order • Root contains between 2 and n+1 index node pointers • Each internal node contains between (n+1)/2 and n+1 index node pointers • Each non-leaf node consists of ptr1,key1,ptr2,key2,…,keym-1,ptrm where ptri points to index node with keys between keyi-1 and keyi

  50. Constraints (cont'd) • Each leaf contains between (n+1)/2 and ndata record pointers, plus a "next leaf" pointer • Associated with each data record pointer is a key, and the pointer points to the data record with that key

More Related