
Improving Transaction-Time DBMS Performance and Functionality



  1. Improving Transaction-Time DBMS Performance and Functionality
  David Lomet, Microsoft Research
  Feifei Li, Florida State University

  2. Immortal DB: A Transaction-Time DB
  • What is a transaction-time DB?
    • Retains versions of records: current and prior database states
    • Supports temporal access to these versions, using transaction time
  • Immortal DB goals
    • Performance close to an unversioned DB
    • Full indexed access to history
    • Explore other functionality based on versions: history as backup, removal of bad user transactions, auditing
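The version-retention idea above can be sketched in a few lines. This is a hypothetical Python model (names such as `VersionedTable` and `as_of` are illustrative, not Immortal DB's actual API): every update appends a transaction-time-stamped version, so both the current state and any prior state remain queryable.

```python
class VersionedTable:
    """Minimal sketch of transaction-time versioning: updates append
    versions stamped with transaction time; prior states stay
    queryable 'as of' any past time. Assumes versions of a key are
    appended in increasing timestamp order."""

    def __init__(self):
        self._versions = {}  # key -> list of (ts, value), ts ascending

    def put(self, key, ts, value):
        self._versions.setdefault(key, []).append((ts, value))

    def current(self, key):
        versions = self._versions.get(key)
        return versions[-1][1] if versions else None

    def as_of(self, key, ts):
        """Latest version whose transaction time is <= ts."""
        for vts, value in reversed(self._versions.get(key, [])):
            if vts <= ts:
                return value
        return None
```

A query "as of" time 15 below sees the version written at time 10, while a current query sees the latest version.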

  3. Prior Publications
  • SIGMOD ’04: demoed, with demo paper
  • ICDE ’04: initial running system described
  • SIGMOD ’06: removing the effects of bad user transactions
  • ICDE ’08: indexing with version compression
  • ICDE ’09: performance and functionality

  4. Talk Outline
  • Immortal DB: a transaction-time database
  • Update performance: timestamping
    • Timestamping is the main update overhead
    • Prior approaches
    • Our new approach
    • Update performance results
  • Support for auditing
    • What we provide
    • Exploiting the timestamping implementation
  • Range read performance: new page-splitting strategy
    • Storage utilization determines range read performance
    • Prior split strategy guaranteeing “as of” version utilization
    • Our new approach
    • Storage utilization results

  5. Timestamping & Update Performance
  • Timestamp not known until commit
    • Fixing it too early leads to aborts
  • Requires a 2nd “touch” to add the TS to each record
    • 1st touch for the update, when the TS is not known
    • 2nd touch for adding the TS, once it is known
  • TID:TS mapping must be stable until all timestamping completes and is stable
  • Biggest single extra cost for updates

  6. Prior Timestamping Techniques
  • Eager timestamping
    • Done as a 2nd update during the transaction
    • Delays commit; roughly doubles update cost
  • Lazy timestamping (several variations)
    • Replace the transaction ID (TID) with the timestamp (TS) lazily after commit; but this requires …
    • Persisting the (TID:TS) mapping
    • The trick is handling this efficiently
  • Most prior efforts updated a Persistent Transaction Timestamp Table (PTT) at commit with the TID:TS mapping
  • We improve on this part of the process

  7. Lazier Timestamping
  [Diagram: timestamping data flow between log, volatile table, and PTT]
  • TID:TS posted to the log at commit: the commit record carries TID:TS, with the TS assigned at commit
  • Main-memory volatile timestamp table (VTT) holds TID:TS plus a reference count; timestamping activity is based mostly on the VTT
  • TID:TS batch-written from the VTT to the PTT at checkpoint; only mappings with unfinished timestamping
  • VTT entries removed when timestamping is complete: ref count = 0 and stable
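The data flow above can be sketched as a small Python model. This is a hypothetical illustration of the scheme, not Immortal DB's implementation: the commit record carries TID:TS, the VTT tracks how many records still await their timestamp, and a checkpoint batch-writes only unfinished mappings to the PTT while dropping completed VTT entries.

```python
class LazierTimestamping:
    """Sketch of the 'lazier' scheme: TID:TS goes to the log at
    commit; the volatile timestamp table (VTT) keeps a reference
    count of records not yet timestamped; at checkpoint, only
    mappings with unfinished timestamping are batch-written to the
    persistent table (PTT), and completed VTT entries are removed.
    (PTT garbage collection of finished mappings is omitted.)"""

    def __init__(self):
        self.log = []   # commit records carry the TID:TS mapping
        self.vtt = {}   # TID -> [ts, ref_count]
        self.ptt = {}   # persistent TID:TS table, batch-updated

    def commit(self, tid, ts, records_touched):
        self.log.append(("commit", tid, ts))
        self.vtt[tid] = [ts, records_touched]

    def timestamp_record(self, tid):
        """Lazily replace a record's TID with its TS (the 2nd touch)."""
        entry = self.vtt[tid]
        entry[1] -= 1
        return entry[0]

    def checkpoint(self):
        # Batch write: only mappings with unfinished timestamping.
        unfinished = {tid: e for tid, e in self.vtt.items() if e[1] > 0}
        self.ptt.update({tid: e[0] for tid, e in unfinished.items()})
        # Drop VTT entries whose timestamping is complete.
        self.vtt = unfinished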

  8. Execution Time
  [Chart: execution time for unversioned, the prior unbatched TS method, and 20%/50%/100% PTT batch-insert cases]
  • Expected result is the less-than-20% case
  • IMPORTANT: simple “one update” transaction

  9. Talk Outline
  • Immortal DB: a transaction-time database
  • Update performance: timestamping
    • Timestamping is the main update overhead
    • Prior approaches
    • Our new approach
    • Update performance results
  • Support for auditing
    • What we provide
    • Exploiting the timestamping implementation
  • Range read performance: new page-splitting strategy
    • Storage utilization determines range read performance
    • Prior split strategy guaranteeing “as of” version utilization
    • Our new approach
    • Storage utilization results

  10. Adding Audit Support
  • Basic infrastructure only
    • There is too much in auditing to try to do more
  • For every update: who did it, and when
  • Technique
    • Extend the PTT schema to include a user ID (UID): TID:TS:UID
    • Always persist this information; no garbage collection
    • The timestamping technique permits batch updates to the PTT
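The schema extension above is tiny in sketch form. The names below (`PTTRow`, `AuditPTT`) are hypothetical: the PTT row gains a UID column, rows are never garbage-collected, and the same batch-insert path used by the timestamping scheme still applies.

```python
from collections import namedtuple

# Hypothetical audit-mode PTT row: the schema is extended with a
# user ID (UID), making every update attributable to who made it
# and when.
PTTRow = namedtuple("PTTRow", ["tid", "ts", "uid"])

class AuditPTT:
    """Sketch of the audit-mode PTT: entries are kept forever
    (no garbage collection) and are still written in batches."""

    def __init__(self):
        self.rows = []       # persisted forever in audit mode
        self.pending = []    # batched, flushed at checkpoint

    def record(self, tid, ts, uid):
        self.pending.append(PTTRow(tid, ts, uid))

    def checkpoint(self):
        # Same batch-insert path as the non-audit scheme.
        self.rows.extend(self.pending)
        self.pending = []
```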

  11. What Does It Cost?
  [Chart: execution time for unversioned, the prior unbatched TS method, and 20%/50%/100% PTT batch-insert cases]
  • Audit mode: always keep everything in the PTT, never delete
  • Roughly equal to the 50% batch-insert case, since those entries are also batch-deleted
  • IMPORTANT: simple “one update” transaction

  12. Talk Outline
  • Immortal DB: a transaction-time database
  • Update performance: timestamping
    • Timestamping is the main update overhead
    • Prior approaches
    • Our new approach
    • Update performance results
  • Support for auditing
    • What we provide
    • Exploiting the timestamping implementation
  • Range read performance: new page-splitting strategy
    • Storage utilization determines range read performance
    • Prior split strategy guaranteeing “as of” version utilization
    • Our new approach
    • Storage utilization results

  13. Utilization => Range Read Performance
  • The biggest factor is records per page
  • Current data is the most frequently read
  • We need a technique that improves storage utilization
    • Certainly for current data
    • With no compromise for historical data
  • Prior page-splitting technology evolved from the WOB-tree, which was constrained by write-once media
  • We can do better with write-many media
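The records-per-page point reduces to simple arithmetic: a range scan over N current records costs roughly N divided by the effective records per page, so higher current-page utilization directly means fewer page reads. A toy calculation (illustrative, not from the paper):

```python
import math

def pages_for_range(n_records, page_capacity, utilization):
    """Page reads needed to scan n_records current records, given a
    page capacity (records per full page) and a storage utilization
    fraction. Higher utilization => fewer pages touched."""
    return math.ceil(n_records / (page_capacity * utilization))
```

For example, scanning 10,000 records at 50% utilization touches 200 pages of capacity 100, versus 112 pages at 90% utilization.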

  14. Prior Approaches to Guaranteed Utilization
  • Choose a target fill factor for the current database
    • Can’t be 100%, as in an unversioned DB
    • Higher fill factor => more redundant versions for “partially persistent indexes” (like the TSB-tree, BV-tree, and WOB-tree), because splitting by time creates redundant versions when records cross a time-split boundary
  • “Naked” key splits compromise version utilization
    • A key split splits history as well as current data
    • Excessive key splits without time splits drive down the storage utilization of any specific version
  • What to do? Always time split with a key split
    • Removes historical data from new current pages, permitting them to fill fully to the fill factor
    • Protects historical versions from further splitting
    • Originally in the WOB-tree, where it was a necessity with write-once storage media

  15. Why Time Split with Key Split?
  [Diagram: the same page over time; as the page fills, a time split moves historical data to a historical page, then key splits divide the current data; free space, historical data, and added versions are shown]
  • Time split with key split guarantees the historical page will have good utilization for its versions

  16. Intuition for the New Splitting Technique
  [Diagram: the page fills and is time split, preserving historical-page utilization; the page fills again and is key split, improving current-page utilization]
  • Always time split when the page first becomes full
  • Key split afterwards, when the page is full again

  17. Analytical Result
  • We can show the following: the new strategy adds current records worth one extra page fill before the key split
  • [Formula not reproduced in the transcript] where in is the insertion ratio, up is the update ratio, and cr is the compression ratio
  • Formula derived based on current pages taking one extra fill time

  18. Analysis: Current Storage Utilization vs. Update Ratio
  [Plot: current utilization vs. update ratio]
  • Expect an update ratio of 65%–85%

  19. Summary
  • Optimizing timestamping yields update performance close to unversioned
  • Optimizing page splitting yields current-time range search performance close to unversioned
  • Audit functionality is easy to add via the timestamping infrastructure
  • Questions???
