310 likes | 440 Views
Buffer and File organization. Review. DBMSs store data on disk Disk characteristics: 2 orders of magnitude slower than MM Unit of read/write operations a block/page (multiple of sectors) Access time = seek time + rotation time + transfer time Sequential I/O much faster than random I/O
E N D
Review • DBMSs store data on disk • Disk characteristics: • 2 orders of magnitude slower than MM • Unit of read/write operations a block/page (multiple of sectors) • Access time = seek time + rotation time + transfer time • Sequential I/O much faster than random I/O • Database Systems try to minimize the overhead of moving data from and to disk
Query Optimization and Execution Relational Operators Files and Access Methods Buffer Management Disk Space Management DB Buffer manager- Context
Buffer Management • Keep pages in a part of memory (buffer), read directly from there • What happens if you need to bring a new page into buffer and buffer is full: you have to evict one page • Replacement policy: • LRU : Least Recently Used (CLOCK) • MRU: Most Recently Used • Toss-immediate : remove a page if you know that you will not need it again • Pinning (needed in recovery, index based processing,etc) • Other DB specific RPs: DBMIN, LRU-k, 2Q
DB Buffer Management in a DBMS • Data must be in RAM for DBMS to operate on it! • Buffer Mgr hides the fact that not all data is in RAM Page Requests from Higher Levels BUFFER POOL disk page free frame MAIN MEMORY DISK choice of frame dictated by replacement policy
When a Page is Requested ... • Buffer pool information table contains: <frame#, pageid, pin_count, dirty> • If requested page is not in pool and no free frame: • Choose a frame for replacement.Only “un-pinned” pages are candidates! • If frame is “dirty”, write it to disk • Read requested page into chosen frame • Pin the page and return its address. • If requests can be predicted (e.g., sequential scans) • pages can be pre-fetchedseveral pages at a time!
More on Buffer Management • Requestor of page must eventually unpin it, and indicate whether page has been modified: • dirtybit is used for this. • Page in pool may be requested many times, • a pin countis used. • To pin a page, pin_count++ • A page is a candidate for replacement iff pin count == 0 (“unpinned”) • CC & recovery may entail additional I/O when a frame is chosen for replacement. • Write-Ahead Log protocol; more later!
Buffer Replacement Policy • Frame is chosen for replacement by a replacement policy: • Least-recently-used (LRU), MRU, Clock, etc. • Policy can have big impact on # of I/O’s; depends on the access pattern.
LRU Replacement Policy • Least Recently Used (LRU) • for each page in buffer pool, keep track of time when last unpinned • replace the frame which has the oldest (earliest) time • very common policy: intuitive and simple • Works well for repeated accesses to popular pages • Problems? • Problem: Sequential flooding (next slide) • LRU + repeated sequential scans. • # buffer frames < # pages in filemeans each page request causes an I/O.
MRU • Sometimes you may want to use MRU!! Why MRU? Consider the following access pattern (page ids): 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1,2,3,4,5,6,7,8,…. Assume a buffer of 10 pages: After the first 10 pages, MRU will replace 10 with 11, 11 with 12, 12 with 13, 13 with 14, 14 with 15. Next 9 pages will be found in and we need no disk reads for them. Compare this approach against LRU.
A(1) B(p) D(1) C(1) “Clock” Replacement Policy • An approximation of LRU • Arrange frames into a cycle, store one reference bit per frame • Can think of this as the 2nd chance bit • When pin count reduces to 0, turn on ref. bit • When replacement necessarydo for each page in cycle { if (pincount == 0 && ref bit is on) turn off ref bit; else if (pincount == 0 && ref bit is off) choose this page for replacement; } until a page is chosen;
DBMS vs. OS File System OS does disk space & buffer mgmt: why not let OS manage these tasks? • Some limitations, e.g., files can’t span disks. • Buffer management in DBMS requires ability to: • pin a page in buffer pool, force a page to disk & order writes(important for implementing CC & recovery) • adjust replacement policy, and pre-fetch pages based on access patterns in typical DB operations.
File Organization • Basics • A database is a collection of files, file is a collection of records, record (tuple) is a collection of fields (attributes) • Files are stored on Disks (that use blocks to read and write) • Two important issues: • Representation of each record • Grouping/Ordering of records and storage in blocks
File Organization • Goal and considerations: • Compactness • Overhead of insertion/deletion • Retrieval speed: sometime we prefer to bring more tuples than necessary in MM and use CPU to filter out the unnecessary ones!
Record Representation and Page Formats • Fixed-Length Records • Example Account( acc-number char(10), branch-name char(20), balance real) Each record is 38 bytes. Store them sequentially, one after the other Record1 at position 0, record2 at position 38, record3 at position 76 etc
Fixed-Length Records • Simple approach: • Store record i starting from byte n (i – 1), where n is the size of each record. • Record access is simple but records may cross blocks • Modification: do not allow records to cross block boundaries • Insertion of record i: Add at the end • Deletion of record i: Two alternatives: • move records: • i + 1, . . ., n to i, . . . , n – 1 • record n to i • do not move records, but link all free records on a free list
Free Lists • FLR with Free Lists • Store the address of the first deleted record in the file header. • Use this first record to store the address of the second deleted record, and so on • Can think of these stored addresses as pointerssince they “point” to the location of a record. • More space efficient representation: reuse space for normal attributes of free records to store pointers. (No pointers stored in in-use records.) Better handling ins/del
Page Formats: Fixed Length Records Slot 1 Slot 2 Free Space . . . Slot N Slot M . . . 1 1 1 M 0 M ... 3 2 1 number of slots UNPACKED, BITMAP
Variable-Length Records: Slotted Page Structure • VLR-SP: divide page into directory, free space and data spaces • Slot directory contains: • number of record entries • end of free space in the block • location and size of each record • Records stored at the bottom of the page • External tuple pointers point to record ptrs: rec-id = <page-id, slot#>
Rid = (i,N) Page i Rid = (i,2) Rid = (i,1) N Pointer to start of free space 20 16 24 N . . . 2 1 # slots SLOT DIRECTORY Insertion: 1) Use FP to find space and insert 2) Find available ptr in the directory (or create a new one) 3) adjust FP and number of records Deletion ?
Record Formats: Fixed Length • Information about field types same for all records in a file; stored in systemcatalogs. • Finding i’th field does not require scan of record. F1 F2 F3 F4 L1 L2 L3 L4 Base address (B) Address = B+L1+L2
4 $ $ $ $ Record Formats: Variable Length • Two alternative formats (# fields is fixed): F1 F2 F3 F4 Fields Delimited by Special Symbols Field Count F1 F2 F3 F4 Array of Field Offsets • Second offers direct access to i’th field, efficient storage • of nulls(special don’t know value); small directory overhead.
Column Stores • Another way to store records is to store them column-wise! • Store each column in different files. • Advantages? • Disadvantages?
Ordering and Grouping records • Issue #1: • In what order we place records in a block? • Heap technique: assign anywhere there is space • Ordered technique: maintain an order on some attribute So, we can use binary search if selection on this attribute.
File organization • Issue #2: In which blocks should records be placed Many alternatives exist, each ideal for some situation , and not so good in others: • Heap files:Add at the end of the file.Suitable when typical access is a file scan retrieving all records. • Sorted Files:Keep the pages ordered. Best if records must be retrieved in some order, or only a `range’ of records is needed. • Hashed Files: Good for equality selections. Assign records to blocks according to their value for some attribute
Data Dictionary Storage Data dictionary (also called system catalog) stores metadata: that is, data about data, such as • Information about relations • names of relations • names and types of attributes of each relation • names and definitions of views • integrity constraints • User and accounting information, including passwords • Statistical and descriptive data • number of tuples in each relation • Physical file organization information • How relation is stored (sequential/hash/…) • Physical location of relation • operating system file name or • disk addresses of blocks containing records of the relation • Information about indices
Data dictionary storage • Stored as tables!! • E-R diagram? • Relations, attributes, domains • Each relation has name, some attributes • Each attribute has name, length and domain • Also, views, integrity constraints, indices • User info (authorizations etc) • statistics
A-name name position 1 N has relation attribute domain
Data Dictionary Storage (Cont.) • A possible catalog representation: Relation-metadata = (relation-name, number-of-attributes, storage-organization, location)Attribute-metadata = (attribute-name, relation-name, domain-type, position, length) User-metadata = (user-name, encrypted-password, group) Index-metadata = (index-name, relation-name, index-type, index-attributes) View-metadata = (view-name, definition)
Large Objects • Large objects : binary large objects (blobs) and character large objects (clobs) • Examples include: • text documents • graphical data such as images and computer aided designs audio and video data • Large objects may need to be stored in a contiguous sequence of bytes when brought into memory. • If an object is bigger than a page, contiguous pages of the buffer pool must be allocated to store it. • May be preferable to disallow direct access to data, and only allow access through a file-system-like API, to remove need for contiguous storage.