
Improving Transaction-Time DBMS Performance and Functionality



  1. Improving Transaction-Time DBMS Performance and Functionality
  David Lomet, Microsoft Research
  Feifei Li, Florida State University

  2. Immortal DB: A Transaction-Time DB
  • What is a transaction-time DB?
    • Retains versions of records: current and prior database states
    • Supports temporal access to these versions, using transaction time
  • Immortal DB goals
    • Performance close to an unversioned DB
    • Full indexed access to history
    • Explore other functionality based on versions: history as backup, removal of bad user transactions, auditing
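The version-retention idea above can be sketched in a few lines. This is a hypothetical Python model (names such as `VersionedTable` and `as_of` are illustrative, not Immortal DB's actual API): every update appends a transaction-time-stamped version, so both the current state and any prior state remain queryable.

```python
class VersionedTable:
    """Minimal sketch of transaction-time versioning: updates append
    versions stamped with transaction time; prior states stay
    queryable 'as of' any past time. Assumes versions of a key are
    appended in increasing timestamp order."""

    def __init__(self):
        self._versions = {}  # key -> list of (ts, value), ts ascending

    def put(self, key, ts, value):
        self._versions.setdefault(key, []).append((ts, value))

    def current(self, key):
        versions = self._versions.get(key)
        return versions[-1][1] if versions else None

    def as_of(self, key, ts):
        """Latest version whose transaction time is <= ts."""
        for vts, value in reversed(self._versions.get(key, [])):
            if vts <= ts:
                return value
        return None
```

A query "as of" time 15 below sees the version written at time 10, while a current query sees the latest version.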

  3. Prior Publications
  • SIGMOD ’04: demoed, with demo paper
  • ICDE ’04: initial running system described
  • SIGMOD ’06: removing the effects of bad user transactions
  • ICDE ’08: indexing with version compression
  • ICDE ’09: performance and functionality

  4. Talk Outline
  • Immortal DB: a transaction-time database
  • Update performance: timestamping
    • Timestamping is the main update overhead
    • Prior approaches
    • Our new approach
    • Update performance results
  • Support for auditing
    • What we provide
    • Exploiting the timestamping implementation
  • Range read performance: new page-splitting strategy
    • Storage utilization determines range read performance
    • Prior split strategy guaranteeing “as of” version utilization
    • Our new approach
    • Storage utilization results

  5. Timestamping & Update Performance
  • Timestamp not known until commit
    • Fixing it too early leads to aborts
  • Requires a 2nd “touch” to add the TS to each record
    • 1st touch for the update, when the TS is not known
    • 2nd touch for adding the TS, once it is known
  • TID:TS mapping must be stable until all timestamping completes and is stable
  • Biggest single extra cost for updates

  6. Prior Timestamping Techniques
  • Eager timestamping
    • Done as a 2nd update during the transaction
    • Delays commit; roughly doubles update cost
  • Lazy timestamping (several variations)
    • Replace the transaction ID (TID) with the timestamp (TS) lazily after commit; but this requires …
    • Persisting the (TID:TS) mapping
    • The trick is handling this efficiently
  • Most prior efforts updated a Persistent Transaction Timestamp Table (PTT) at commit with the TID:TS mapping
  • We improve on this part of the process

  7. Lazier Timestamping
  [Diagram: timestamping data flow between log, volatile table, and PTT]
  • TID:TS posted to the log at commit: the commit record carries TID:TS, with the TS assigned at commit
  • Main-memory volatile timestamp table (VTT) holds TID:TS plus a reference count; timestamping activity is based mostly on the VTT
  • TID:TS batch-written from the VTT to the PTT at checkpoint; only mappings with unfinished timestamping
  • VTT entries removed when timestamping is complete: ref count = 0 and stable
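The data flow above can be sketched as a small Python model. This is a hypothetical illustration of the scheme, not Immortal DB's implementation: the commit record carries TID:TS, the VTT tracks how many records still await their timestamp, and a checkpoint batch-writes only unfinished mappings to the PTT while dropping completed VTT entries.

```python
class LazierTimestamping:
    """Sketch of the 'lazier' scheme: TID:TS goes to the log at
    commit; the volatile timestamp table (VTT) keeps a reference
    count of records not yet timestamped; at checkpoint, only
    mappings with unfinished timestamping are batch-written to the
    persistent table (PTT), and completed VTT entries are removed.
    (PTT garbage collection of finished mappings is omitted.)"""

    def __init__(self):
        self.log = []   # commit records carry the TID:TS mapping
        self.vtt = {}   # TID -> [ts, ref_count]
        self.ptt = {}   # persistent TID:TS table, batch-updated

    def commit(self, tid, ts, records_touched):
        self.log.append(("commit", tid, ts))
        self.vtt[tid] = [ts, records_touched]

    def timestamp_record(self, tid):
        """Lazily replace a record's TID with its TS (the 2nd touch)."""
        entry = self.vtt[tid]
        entry[1] -= 1
        return entry[0]

    def checkpoint(self):
        # Batch write: only mappings with unfinished timestamping.
        unfinished = {tid: e for tid, e in self.vtt.items() if e[1] > 0}
        self.ptt.update({tid: e[0] for tid, e in unfinished.items()})
        # Drop VTT entries whose timestamping is complete.
        self.vtt = unfinished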

  8. Execution Time
  [Chart: execution time for unversioned, the prior unbatched TS method, and 20%/50%/100% PTT batch-insert cases]
  • Expected result is the less-than-20% case
  • IMPORTANT: simple “one update” transaction

  9. Talk Outline
  • Immortal DB: a transaction-time database
  • Update performance: timestamping
    • Timestamping is the main update overhead
    • Prior approaches
    • Our new approach
    • Update performance results
  • Support for auditing
    • What we provide
    • Exploiting the timestamping implementation
  • Range read performance: new page-splitting strategy
    • Storage utilization determines range read performance
    • Prior split strategy guaranteeing “as of” version utilization
    • Our new approach
    • Storage utilization results

  10. Adding Audit Support
  • Basic infrastructure only
    • There is too much in auditing to try to do more
  • For every update: who did it, and when
  • Technique
    • Extend the PTT schema to include a user ID (UID): TID:TS:UID
    • Always persist this information; no garbage collection
    • The timestamping technique permits batch updates to the PTT
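The schema extension above is tiny in sketch form. The names below (`PTTRow`, `AuditPTT`) are hypothetical: the PTT row gains a UID column, rows are never garbage-collected, and the same batch-insert path used by the timestamping scheme still applies.

```python
from collections import namedtuple

# Hypothetical audit-mode PTT row: the schema is extended with a
# user ID (UID), making every update attributable to who made it
# and when.
PTTRow = namedtuple("PTTRow", ["tid", "ts", "uid"])

class AuditPTT:
    """Sketch of the audit-mode PTT: entries are kept forever
    (no garbage collection) and are still written in batches."""

    def __init__(self):
        self.rows = []       # persisted forever in audit mode
        self.pending = []    # batched, flushed at checkpoint

    def record(self, tid, ts, uid):
        self.pending.append(PTTRow(tid, ts, uid))

    def checkpoint(self):
        # Same batch-insert path as the non-audit scheme.
        self.rows.extend(self.pending)
        self.pending = []
```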

  11. What Does It Cost?
  [Chart: execution time for unversioned, the prior unbatched TS method, and 20%/50%/100% PTT batch-insert cases]
  • Audit mode: always keep everything in the PTT, never delete
  • Roughly equal to the 50% batch-insert case, since those entries are also batch-deleted
  • IMPORTANT: simple “one update” transaction

  12. Talk Outline
  • Immortal DB: a transaction-time database
  • Update performance: timestamping
    • Timestamping is the main update overhead
    • Prior approaches
    • Our new approach
    • Update performance results
  • Support for auditing
    • What we provide
    • Exploiting the timestamping implementation
  • Range read performance: new page-splitting strategy
    • Storage utilization determines range read performance
    • Prior split strategy guaranteeing “as of” version utilization
    • Our new approach
    • Storage utilization results

  13. Utilization => Range Read Performance
  • The biggest factor is records per page
  • Current data is the most frequently read
  • We need a technique that improves storage utilization
    • Certainly for current data
    • With no compromise for historical data
  • Prior page-splitting technology evolved from the WOB-tree, which was constrained by write-once media
  • We can do better with write-many media
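The records-per-page point reduces to simple arithmetic: a range scan over N current records costs roughly N divided by the effective records per page, so higher current-page utilization directly means fewer page reads. A toy calculation (illustrative, not from the paper):

```python
import math

def pages_for_range(n_records, page_capacity, utilization):
    """Page reads needed to scan n_records current records, given a
    page capacity (records per full page) and a storage utilization
    fraction. Higher utilization => fewer pages touched."""
    return math.ceil(n_records / (page_capacity * utilization))
```

For example, scanning 10,000 records at 50% utilization touches 200 pages of capacity 100, versus 112 pages at 90% utilization.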

  14. Prior Approaches to Guaranteed Utilization
  • Choose a target fill factor for the current database
    • Can’t be 100%, as in an unversioned DB
    • Higher fill factor => more redundant versions for “partially persistent indexes” (like the TSB-tree, BV-tree, and WOB-tree), because splitting by time creates redundant versions when records cross a time-split boundary
  • “Naked” key splits compromise version utilization
    • A key split splits history as well as current data
    • Excessive key splits without time splits drive down the storage utilization of any specific version
  • What to do? Always time split with a key split
    • Removes historical data from new current pages, permitting them to fill fully to the fill factor
    • Protects historical versions from further splitting
    • Originally in the WOB-tree, where it was a necessity with write-once storage media

  15. Why Time Split with Key Split?
  [Diagram: the same page over time; as the page fills, a time split moves historical data to a historical page, then key splits divide the current data; free space, historical data, and added versions are shown]
  • Time split with key split guarantees the historical page will have good utilization for its versions

  16. Intuition for the New Splitting Technique
  [Diagram: the page fills and is time split, preserving historical-page utilization; the page fills again and is key split, improving current-page utilization]
  • Always time split when the page first becomes full
  • Key split afterwards, when the page is full again

  17. Analytical Result
  • We can show the following: the new strategy adds current records worth one extra page fill before the key split
  • [Formula not reproduced in the transcript] where in is the insertion ratio, up is the update ratio, and cr is the compression ratio
  • Formula derived based on current pages taking one extra fill time

  18. Analysis: Current Storage Utilization vs. Update Ratio
  [Plot: current utilization vs. update ratio]
  • Expect an update ratio of 65%–85%

  19. Summary
  • Optimizing timestamping yields update performance close to unversioned
  • Optimizing page splitting yields current-time range search performance close to unversioned
  • Audit functionality is easy to add via the timestamping infrastructure
  • Questions???
