Boost Write Performance for DBMS on Solid State Drive

Boost Write Performance for DBMS on Solid State Drive Yu LI

Background (1) • An SSD is a complex storage device made of: • flash chips (i.e., NAND) • controller hardware • proprietary software (i.e., firmware) • a block device interface via a standard interconnect (e.g., USB, IDE, SATA) • In general: • Sequential reads/writes and random reads are fast. • Random writes are slow.

Background (2) • Some DBMS applications tend to generate random write streams • Online Transaction Processing (OLTP) • Small and frequent inserts/deletes/updates • Concurrency

In-Page Logging Approach • In-Page Logging Approach [Lee, SIGMOD 07] • Idea: turn random writes into log appends • However: • The in-page logging area needs hardware support. • For SSDs, this is not practical.

Background (3) • Question: is there a solution that improves write performance without modifying the SSD firmware? • Systematic performance studies show that not all kinds of “random write” on SSD are slow. • Write performance depends more on the write pattern on the SSD. [uFlip, CIDR 2009]

uFlip results • Focused write: e.g., writes inside a <8MB file • Partitioned sequential write: e.g., 1,50,2,51,3,52,… • Ordered sequential write: e.g., 1,3,5,7,9,…
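The three patterns can be illustrated by generating the example address sequences from this slide. This is a minimal sketch, not the uFlip benchmark itself: the function names and parameters are hypothetical, and the 4KB page size in the usage note is taken from the evaluation slide.

```python
def ordered_sequential(start, step, count):
    """Addresses visited in increasing order with a fixed gap between them."""
    return [start + i * step for i in range(count)]


def partitioned_sequential(starts, count):
    """Round-robin over several partitions, each one written sequentially.

    `starts` holds the first address of every partition (hypothetical parameter).
    """
    n = len(starts)
    return [starts[i % n] + i // n for i in range(count)]


def is_focused(addresses, page_size, limit_bytes):
    """True if all written pages fall inside an area smaller than limit_bytes."""
    span_pages = max(addresses) - min(addresses) + 1
    return span_pages * page_size < limit_bytes
```

For instance, `ordered_sequential(1, 2, 5)` reproduces 1,3,5,7,9 and `partitioned_sequential([1, 50], 6)` reproduces 1,50,2,51,3,52; `is_focused(pages, 4096, 8 * 1024 * 1024)` checks the "<8MB" condition for 4KB pages.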

Our Idea (1): Write Stream Decomposition • If we can collect enough write requests, we can: • Isolate the write requests that form good write patterns • Cluster write requests to form instances of focused write

Our Idea (2) [Figure: steps 1, 2 and 3 between the StableBuffer, the decomposition step and the SSD] • Through the StableBuffer: • Two writes (1, 3) in a good write pattern (1x~4x each) • One random read (2) (at most 1x) • => Total: at most 9x (2 × 4x + 1x) • Directly: • => 17x~30x

System Overview [Figure: components DBMS Transactions, DBMS Buffer Manager, StableBuffer, StableBuffer Translation Table and Write Stream Decompositors; storage layers Main Memory and SSD; arrows labeled Read and Write]

Components of StableBuffer Manager • StableBuffer: • A pre-allocated focused area on the SSD. • E.g., a pre-allocated file < 8MB. • StableBuffer Translation Table: • A table of <destination, offset> entries like “<12345678AB, 32>” • Fast lookup, insert and delete • Write Stream Decompositors: • A group of programs running in concurrent threads • Extract instances of good write patterns

More on StableBuffer Translation Table • A reverse index for the StableBuffer Translation Table is embedded in the pages • Destination and timestamp • Used for recovery after a system crash • During recovery, for the page at offset O whose destination is D, compare its timestamp T with the latest update time T0 of the page at destination D • If T > T0, insert <D,O> into the table. • Otherwise, slot O is free.
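The recovery rule above can be sketched as a single scan over the StableBuffer slots. This is an illustrative sketch under stated assumptions, not the prototype's code: `SlotHeader`, `rebuild_table` and `last_update` are hypothetical names, a never-written slot is modeled as `None`, every destination is assumed to have a known T0, and at most one slot is assumed to exist per destination.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class SlotHeader:
    """Reverse-index fields embedded in a StableBuffer page (hypothetical layout)."""
    destination: int  # D: where the page belongs on the SSD
    timestamp: int    # T: when the page was written to the StableBuffer


def rebuild_table(
    slots: List[Optional[SlotHeader]],
    last_update: Dict[int, int],
) -> Tuple[Dict[int, int], Set[int]]:
    """Scan all slots and return (translation table D -> O, set of free slots)."""
    table: Dict[int, int] = {}
    free: Set[int] = set()
    for offset, header in enumerate(slots):
        if header is None:
            free.add(offset)  # assumption: slot was never written
            continue
        t0 = last_update[header.destination]  # T0 of the page at destination D
        if header.timestamp > t0:
            table[header.destination] = offset  # buffered copy is newer: keep <D,O>
        else:
            free.add(offset)  # destination already holds this version or a newer one
    return table, free
```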

Query on StableBuffer • On a request to retrieve the page at destination D: • Check whether there is an entry “<D,O>” in the StableBuffer Translation Table. • If there is, return the page in the O-th slot of the StableBuffer. • Otherwise, issue a read request to the SSD for the page at D. • It is therefore better to implement the StableBuffer Translation Table as a hash table on D.
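The read path above amounts to one hash lookup followed by a read from either the StableBuffer or the destination. A minimal sketch, assuming the translation table is a Python dict keyed on D, and modeling the StableBuffer as a list of slots and the SSD as a dict of pages (both stand-ins, not real device I/O); `read_page` is a hypothetical name.

```python
def read_page(table, stable_buffer, ssd_pages, destination):
    """Return the newest copy of the page whose home address is `destination`.

    table         -- dict mapping destination D to slot offset O
    stable_buffer -- list of page contents indexed by slot offset (stand-in)
    ssd_pages     -- dict mapping destination to page contents (stand-in)
    """
    offset = table.get(destination)       # hash lookup on D
    if offset is not None:
        return stable_buffer[offset]      # entry <D,O> exists: read slot O
    return ssd_pages[destination]         # no entry: read the page at D
```

With `table = {7: 0}`, a request for page 7 is served from slot 0 of the StableBuffer, while a request for any other page goes to its destination.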

Decompositors (1) [Figure: the Sequential Write, Partitioned Sequential Write, Ordered Sequential Write and Focused Write Decompositors each use an index over the StableBuffer Translation Table (some indexes are shared) and produce the corresponding write stream]

Decompositors (2) • Decompositors run in concurrent threads. • Their results may share the same entries of the StableBuffer Translation Table. • Selecting among the results of the decompositors: • Prefer the instance of the write pattern that performs better on the SSD. • Prefer the bigger instance. • E.g., 1,2,56,57,6,7,42,43,3,4,... • We select the results according to

Decompositors (3) • Sequential Write Decompositor • Maintains a search tree index on the destination addresses of the mapping entries • Partitioned Write Decompositor • Shares the search tree index of the Sequential Write Decompositor • Ordered Write Decompositor • Shares the search tree index of the Sequential Write Decompositor • Focused Write Decompositor • Maintains a hash index on the entries of the StableBuffer Translation Table. • An entry “<D,O>” will be hashed into bucket
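As an illustration of the Sequential Write Decompositor, the sketch below keeps destination addresses in sorted order and extracts runs of consecutive addresses. It is an assumption-laden sketch rather than the prototype: a sorted list maintained with `bisect` stands in for the search tree, `SequentialIndex` and its methods are hypothetical names, and "bigger instance" is interpreted as the longest run.

```python
import bisect


class SequentialIndex:
    """Sorted list standing in for the search tree on destination addresses."""

    def __init__(self):
        self.addresses = []

    def insert(self, destination):
        """Add a destination address, keeping the list sorted and duplicate-free."""
        i = bisect.bisect_left(self.addresses, destination)
        if i == len(self.addresses) or self.addresses[i] != destination:
            self.addresses.insert(i, destination)

    def runs(self):
        """Group the sorted addresses into runs of consecutive values."""
        result = []
        for d in self.addresses:
            if result and result[-1][-1] + 1 == d:
                result[-1].append(d)   # extends the current run
            else:
                result.append([d])     # starts a new run
        return result

    def largest_run(self):
        """Return the longest run (the first one on ties), or [] if empty."""
        return max(self.runs(), key=len, default=[])
```

Feeding it the example stream 1,2,56,57,6,7,42,43,3,4 yields the runs [1,2,3,4], [6,7], [42,43] and [56,57], of which [1,2,3,4] is the largest sequential instance.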

Preliminary Result of Evaluation • Prototype of the StableBuffer manager • Accepts a write trace file • Runs on a Windows desktop PC with a 16GB MTron MSD-SATA-3525 SSD • Page size: 4KB • StableBuffer: 8MB = 2048 pages • Trace • Oracle 11g running the TPC-C benchmark • Simulates an enterprise OLTP retail system that keeps inserting, deleting and updating records in an 8GB database • 488623 write requests

Preliminary Result of Evaluation 1.5x

Q & A • Thanks
