
Flash-Based Caching For Databases - Energy Efficiency and Performance



  1. Flash-Based Caching For Databases - Energy Efficiency and Performance. Ankit Chaudhary

  2. Problem Statement • How can flash memory be used as a database caching device? • What is the performance improvement? • What is the energy efficiency?

  3. Flash Memory • Semiconductor-based non-volatile memory. • Used in SSDs, flash drives, and mobile device memory. • Types: • NOR: lower density, higher erase time. • NAND: single-level cell (SLC) and multi-level cell (MLC); MLC has higher latency and a shorter life span than SLC.

  4. Flash Memory – Important Properties • Does not contain a mechanical arm like an HDD. • Does not require frequent refreshing of capacitors due to charge leakage, unlike DRAM. • Helps increase read throughput. • Helps reduce power consumption (no mechanical arm or disk movement).

  5. Flash Memory – Operations • Three operations: Read, Write/Program, and Erase. • Erase sets bits to 1. • Write/Program sets bits to 0.

  6. Flash Memory – Problems • Erase sets bits to 1; write sets bits to 0. To update a value, the entire block must be erased first. • Erase-before-write. • Write endurance: 10,000 to 100,000 program/erase cycles. • Flash random writes: throughput is lower.
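The erase-before-write constraint above can be sketched with a toy model. This is illustrative only (class and field names are mine, not from the talk): programming can only clear bits, so an in-place update fails until the whole block is erased, and each erase consumes one endurance cycle.

```python
# Toy model of NAND flash semantics: erase sets bits to 1, program clears
# bits to 0, and a page can only be rewritten after its whole block is
# erased. All names here are illustrative.

PAGES_PER_BLOCK = 4

class FlashBlock:
    def __init__(self):
        self.pages = [0xFF] * PAGES_PER_BLOCK       # erased state: all bits 1
        self.programmed = [False] * PAGES_PER_BLOCK
        self.erase_count = 0

    def program(self, page, value):
        if self.programmed[page]:
            raise RuntimeError("erase-before-write: page must be erased first")
        # Programming can only clear bits (1 -> 0), never set them.
        self.pages[page] &= value
        self.programmed[page] = True

    def erase(self):
        # Erase works on the whole block, and wears it out.
        self.pages = [0xFF] * PAGES_PER_BLOCK
        self.programmed = [False] * PAGES_PER_BLOCK
        self.erase_count += 1

blk = FlashBlock()
blk.program(0, 0xA5)
try:
    blk.program(0, 0x5A)       # in-place update is not allowed
except RuntimeError as e:
    print(e)
blk.erase()                    # whole-block erase frees all pages
blk.program(0, 0x5A)           # now the update succeeds
print(blk.erase_count)         # erase cycles are what limit endurance
```

The `erase_count` field captures why frequent erases shorten device life, which motivates the GC discussion later in the talk.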

  7. Flash Memory – Comparison with HDD and DRAM (comparison table). Data referred: [Yi12]

  8. Architectures • 2-tier architecture (2TA) • 3-tier architecture (3TA) • Hybrid architecture: adds complexity at the bottom layer and in buffer management. • NOT USED.

  9. Basic 3TA Working – Case 1 (page located in top tier) • Request for page P. • Look for page P in the DRAM-based buffer (Tt). • Page P found → the page request is served from Tt.

  10. Basic 3TA Working – Case 2 (page located in middle tier) • Request for page P. • Look for page P in the DRAM-based buffer (Tt) → page fault in Tt. • Look for page P in the flash-based cache (Tm). • Page P found → the page request is served from Tm.

  11. Basic 3TA Working – Case 3 (page located in bottom tier) • Request for page P. • Look for page P in the DRAM-based buffer (Tt) → page fault in Tt. • Look for page P in the flash-based cache (Tm) → page fault in Tm. • Serve the data from the disk drive (Tb) → the page request is served.
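The three lookup cases above can be sketched as one function, assuming simple dict-based tiers (Tt = DRAM buffer, Tm = flash cache, Tb = disk). Names are mine, not from [Yi12], and tier maintenance on a miss is omitted:

```python
# Sketch of the basic three-tier lookup: try the DRAM buffer first,
# then the flash cache, then the disk. Promotion/eviction is omitted.

def read_page(page_id, Tt, Tm, Tb):
    # Case 1: page found in the DRAM top tier.
    if page_id in Tt:
        return Tt[page_id]
    # Case 2: page fault in Tt -> look in the flash middle tier.
    if page_id in Tm:
        return Tm[page_id]
    # Case 3: page fault in Tm as well -> serve from the disk bottom tier.
    return Tb[page_id]

Tt = {"p1": "hot"}
Tm = {"p2": "warm"}
Tb = {"p1": "hot", "p2": "warm", "p3": "cold"}
print(read_page("p1", Tt, Tm, Tb))  # served from Tt
print(read_page("p3", Tt, Tm, Tb))  # served from Tb
```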

  12. Energy & Performance Efficiency • Two replacement algorithms for cache management: • LOC: local algorithm • LRU-based replacement algorithm. • Has no information about the top tier. • Data is duplicated between the top tier and the middle tier. • GLB: global algorithm • LRU-based replacement algorithm. • Has information about the top tier as well. • No duplication of data between the top tier and the middle tier.

  13. Local Replacement Algo. (LOC) – Case 1 • Request for reading page P from Tm. • Look for the slot c containing page P in directory H. • Read page P from slot c. • Move slot c to the MRU position of Ls. • Update H and return P. Legend: H = directory; Tm = middle-tier cache; Ls = cache slot list.

  14. Local Replacement Algo. (LOC) – Case 2 • Request for reading page P from Tm. • Page P not found in directory H. • Start the page eviction process: select a victim v, the LRU of Ls; if it is dirty, write it to Tb. • Load P from Tb into v and move it to the MRU position. • Update H and return P. Legend: H = directory; Tm = middle-tier cache; Ls = cache slot list; Tb = bottom-tier disk drive.
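The LOC hit path (Case 1) and eviction path (Case 2) map naturally onto an `OrderedDict`, which plays the role of both the directory H and the LRU-ordered slot list Ls. This is a minimal sketch with illustrative names; the thesis's actual bookkeeping is richer:

```python
from collections import OrderedDict

# Sketch of the LOC hit and eviction paths: the OrderedDict acts as
# directory H plus LRU slot list Ls; Tb is the disk. Dirty victims are
# written back before their slot is reused.

class LocCache:
    def __init__(self, capacity, Tb):
        self.slots = OrderedDict()   # page_id -> (value, dirty)
        self.capacity = capacity
        self.Tb = Tb

    def read(self, page_id):
        if page_id in self.slots:            # Case 1: hit in Tm
            self.slots.move_to_end(page_id)  # move slot to MRU position
            return self.slots[page_id][0]
        # Case 2: miss -> evict the LRU victim if the cache is full.
        if len(self.slots) >= self.capacity:
            victim, (value, dirty) = self.slots.popitem(last=False)
            if dirty:
                self.Tb[victim] = value      # write dirty victim back to Tb
        value = self.Tb[page_id]             # load P from Tb into the slot
        self.slots[page_id] = (value, False) # insert at MRU, update H
        return value

Tb = {"a": 1, "b": 2, "c": 3}
cache = LocCache(2, Tb)
cache.read("a"); cache.read("b"); cache.read("c")  # third read evicts "a"
print(list(cache.slots))  # ['b', 'c']
```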

  15. Global Replacement Algo. (GLB) • In case of a page fault at Tt, GLB loads the page from Tm into Tt. • If there is a cache miss at Tm, the page is loaded directly into Tt from Tb. • In both cases, there is a page eviction from Tt to Tm. • IMPORTANT: Unlike LOC, GLB loads the page into Tt before serving the request.

  16. Global Replacement (GLB) – Page Eviction Algo. • Request for evicting page P to Tm. • Start the page eviction process: select a victim v, the LRU of Ls; if it is dirty, write it to Tb. • Load P from Tt into v and move it to the MRU position. • Update H. Legend: H = directory; Tm = middle-tier cache; Ls = cache slot list; Tb = bottom-tier disk drive.
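The GLB eviction step can be sketched as follows, again using an `OrderedDict` for the middle tier. All names are mine, and the sketch assumes a page evicted from the DRAM top tier (Tt) always lands in Tm, whose own dirty LRU victim is flushed to disk (Tb):

```python
from collections import OrderedDict

# Sketch of GLB page eviction: a page pushed out of Tt is placed into
# Tm; if Tm is full, its LRU victim is written back to Tb when dirty.

def evict_to_Tm(page_id, value, dirty, Tm, Tm_capacity, Tb):
    if len(Tm) >= Tm_capacity:
        victim, (v_value, v_dirty) = Tm.popitem(last=False)  # LRU of Ls
        if v_dirty:
            Tb[victim] = v_value      # flush dirty victim to the bottom tier
    Tm[page_id] = (value, dirty)      # load P from Tt into the slot (MRU)

Tb = {}
Tm = OrderedDict()
evict_to_Tm("p1", 11, True, Tm, 1, Tb)
evict_to_Tm("p2", 22, False, Tm, 1, Tb)  # evicts dirty p1 -> written to Tb
print(Tb)  # {'p1': 11}
```

Because GLB sees evictions from the top tier, the same page never needs to live in both Tt and Tm, which is exactly the "no duplication" property claimed on slide 12.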

  17. Experiment • Comparison between 2TA, LOC, and GLB (3TA). • Both simulation and real-life environments were used to compute the results. • Results computed for varying sizes of Tm (using the "s" parameter). • Virtual execution time computed for 2TA, LOC, and GLB. • Power consumption computed for 2TA, LOC, and GLB. • Formulas used: virtual execution time; access time for the middle tier; access time for the bottom tier; power consumption.

  18. Results: Simulation-Based • Figures (a)–(d): results using the TPC-E, TPC-C, and TPC-H traces. Data referred: [Yi12]

  19. Results: Simulation-Based • Energy consumption of the TPC-E trace for b = 1000. Data referred: [Yi12]

  20. Results: Real-Life • (a) Real-life trace performance: execution time (sec) for each b ∈ {1000, …, 32000}. • Real-life trace performance for b = 32000. Data referred: [Yi12]

  21. Conclusion • 3TA is better than 2TA in terms of both performance and energy efficiency. • LOC performs better for larger sizes of the flash-based middle tier. • GLB performs better for smaller sizes of the flash-based middle tier.

  22. What about the FTL? • The FTL lets cache management algorithms work on flash memory without modification. • The FTL provides transparent access to flash memory. • BUT … it is proprietary and vendor-specific. FTL: Flash Translation Layer.

  23. Small Introduction to GC • Select a set of garbage blocks; each garbage block consists of valid/invalid pages. • Move all valid pages from the garbage blocks to another set of free blocks and update the management information. • Erase the garbage blocks, which in return creates free blocks. GC: Garbage Collection.
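The three GC steps above can be sketched as one pass over the garbage blocks, assuming each block is simply a list of (page_id, valid) pairs; all identifiers are illustrative:

```python
# Sketch of the three GC steps: copy valid pages out of garbage blocks,
# update the mapping, then erase the garbage blocks to free them.

def garbage_collect(garbage_blocks, free_blocks, mapping):
    # Step 1: the garbage blocks hold a mix of valid and invalid pages.
    for block in garbage_blocks:
        for page_id, valid in block:
            # Step 2: copy valid pages into free blocks and remap them.
            if valid:
                free_blocks.append((page_id, True))
                mapping[page_id] = len(free_blocks) - 1
        # Step 3: erase the garbage block, turning it into a free block.
        block.clear()
    return garbage_blocks, free_blocks

mapping = {}
garbage = [[("a", True), ("b", False)], [("c", False), ("d", True)]]
erased, free = garbage_collect(garbage, [], mapping)
print(free)    # [('a', True), ('d', True)] -- only valid pages moved
print(erased)  # [[], []] -- garbage blocks are now erased/free
```

The cost of Step 2 is what the next slides attack: every valid page copied is extra flash traffic, so dropping cold valid pages makes GC cheaper.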

  24. Problems • A proprietary FTL makes it difficult to standardize performance. • No control over expensive operations, such as GC, performed by the FTL. • Cold-page migration: moving unnecessary cold but still valid pages during GC, leading to expensive, less efficient operations. • Inefficient GC = frequent GC = more erase operations. • Reduced life of the flash device due to limited flash endurance.

  25. Solution • Two approaches: • Logical Page Drop (LPD) • Accesses flash memory through the FTL. • Introduces a new operation: Delete. • Proactive cold-page dropping. • Native Flash Access (NFA) • Directly accesses flash memory. • Implements a customized GC process. • Block management structure (BMS) maintains the validity/cleanliness of pages. • Bulk GC processing. • Intelligently selects the victim garbage block.

  26. Logical Page Dropping – Case 1 (F ≠ ∅) • Request for a free slot. • Provide a free slot from F. • Remove the slot from F. Legend: S = set of occupied slots; F = set of free slots; d = number of victim slots (here d = 4).

  27. Logical Page Dropping – Case 2 (F = ∅) • Request for a free slot. • Select a victim slot vb and evict it. • Evict d pages and perform the Delete operation. • Perform GC on the block. • Provide vb as the free slot. Legend: S = set of occupied slots; F = set of free slots; d = number of victim slots (here d = 4).
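The two LPD cases can be sketched as a single allocation function. This is a simplified stand-in for the structures in [Yi12]: the victim-selection policy is reduced to FIFO, and `delete`/`gc` are injected callbacks standing in for the new Delete operation and the FTL's GC:

```python
# Sketch of LPD free-slot allocation: with free slots available (Case 1)
# one is handed out directly; otherwise (Case 2) d victim pages are
# evicted and logically deleted so GC copies less valid data, and the
# reclaimed victim slot is returned. All names are illustrative.

def lpd_get_free_slot(F, occupied, d, delete, gc):
    if F:                             # Case 1: F != empty set
        return F.pop()                # provide the slot and remove it from F
    # Case 2: F is empty -> evict d pages and delete them proactively.
    victims = [occupied.pop(0) for _ in range(d)]
    for page in victims:
        delete(page)                  # tell the FTL these pages are invalid
    gc(victims)                       # GC the block; fewer valid pages to copy
    return victims[0]                 # provide the reclaimed slot vb

log = []
F, occupied = [], ["p0", "p1", "p2", "p3", "p4"]
slot = lpd_get_free_slot(F, occupied, 4,
                         delete=lambda p: log.append(("del", p)),
                         gc=lambda ps: log.append(("gc", len(ps))))
print(slot)  # p0 -- reclaimed after 4 deletes and one GC
```

The point of the Delete operation is visible in the callbacks: pages marked deleted no longer count as valid, so the FTL's GC does not migrate them.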

  28. Native Flash Access – Allocation Algorithm, Case 1 (current block is not full) • Request for a free slot. • wp provides the address of the free slot. • wp increments and points to the next free slot. Legend: S = set of occupied slots; F = set of free slots; Wl = low watermark (2000); Wh = high watermark (60000); wp = write pointer.

  29. Native Flash Access – Allocation Algorithm, Case 2 (current block is full) • Request for a free slot. • wp points to the first free slot of the next free block. • Check |F| against Wl. • If |F| < Wl, then perform GCs until |F| ≥ Wh. Legend: S = set of occupied slots; F = set of free slots; Wl = low watermark (2000); Wh = high watermark (20000); wp = write pointer.
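The watermark logic of the two NFA allocation cases can be sketched as follows. All names and numbers are simplified stand-ins: the block structure is flattened into one free-slot list, and `run_gc` is an injected callback that frees slots on each pass:

```python
# Sketch of NFA's watermark-driven allocation: the write pointer wp hands
# out slots sequentially; when |F| drops below the low watermark Wl,
# GC passes run in bulk until |F| reaches the high watermark Wh.

def nfa_get_free_slot(state, Wl, Wh, run_gc):
    F = state["free_slots"]
    if not F:
        raise RuntimeError("no free slots available")
    slot = F.pop(0)                 # wp provides the address of a free slot
    state["wp"] = slot              # wp advances past the slot just handed out
    if len(F) < Wl:                 # |F| < Wl -> reclaim space in bulk
        while len(F) < Wh:
            F.extend(run_gc())      # each GC pass frees more slots
    return slot

state = {"free_slots": [10, 11, 12], "wp": None}
freed = iter(range(100, 120))       # slot ids that GC passes will free
got = nfa_get_free_slot(state, Wl=3, Wh=5, run_gc=lambda: [next(freed)])
print(got)                          # 10
print(len(state["free_slots"]))     # 5 -- GC ran until |F| >= Wh
```

Running GC in bulk up to Wh, instead of one block at a time on demand, is what the slides call bulk GC processing: it amortizes the reclamation work across many allocations.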

  30. Native Flash Access – Garbage Collection Algorithm • Check the validity of the pages in the block. • If all pages are valid, then select the victim block. • Drop all valid pages where LAT ≤ t; move the others to free slots. • Erase the block and mark it free. Legend: S = set of occupied slots; F = set of free slots; t = page-dropping threshold (here t = 1/1/13 01:42:53); LAT = Last Access Time.
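The cold-page-dropping step of NFA's GC can be sketched as below, assuming each page is a (page_id, valid, last_access_time) triple; the layout and names are illustrative:

```python
import datetime

# Sketch of NFA's customized GC: valid pages whose last access time (LAT)
# is at or before the threshold t are dropped instead of copied; the
# remaining valid pages move to free slots; the block is then erased.

def nfa_gc(block, t, free_slots):
    """block: list of (page_id, valid, last_access_time) triples."""
    for page_id, valid, lat in block:
        if valid and lat > t:
            free_slots.append(page_id)   # still hot: move to a free slot
        # invalid pages, and valid-but-cold pages (LAT <= t), are dropped
    block.clear()                        # erase the block -> it becomes free
    return free_slots

t = datetime.datetime(2013, 1, 1)
block = [("hot",   True,  datetime.datetime(2013, 6, 1)),
         ("cold",  True,  datetime.datetime(2012, 12, 1)),
         ("stale", False, datetime.datetime(2013, 6, 1))]
moved = nfa_gc(block, t, [])
print(moved)  # ['hot'] -- only the hot valid page is copied
```

Dropping the cold valid page is safe in this caching context because the middle tier is only a cache: the page still exists on the bottom-tier disk and can be re-fetched on demand.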

  31. Experiment • Comparison between NFA, LPD, and Baseline (BL). • A simulation environment was used to compute the results. • LRU used for selecting victim pages or blocks. • Greedy policy used for selecting the victim block with the fewest valid pages. • 128 pages × 512 blocks setup for all three approaches. BL = the middle-tier cache with indirect flash access, working without the Delete operation.

  32. Result • (a) Throughput (IOPS); (b) TPC-C; (c) TPC-H; (d) TPC-E. • Breakdown of the trace execution time (seconds) into the fractions of GC tg, cache overhead tc, and disk accesses tb. Data referred: [Yi12]

  33. Result • Distribution of the number of valid pages in garbage-collection blocks: a bar of height y at position x on the x-axis means that it happened y times that a block containing x valid pages was garbage-collected. • Number of erases for each block: each position on the x-axis refers to one block. Data referred: [Yi12]

  34. Conclusion • NFA and LPD outperform BL in terms of throughput and GC efficiency. • NFA appears to be the better option compared to both LPD and BL. • NFA and LPD also take care of wear-levelling. • Directly accessing flash memory without the FTL helps with both performance and lifetime improvement.

  35. Summary • The 3-tier architecture performs better than the 2-tier architecture in terms of both energy efficiency and performance. • Using flash memory as a secondary cache improves performance significantly. • Native access to flash memory helps improve both the performance and the lifetime of the flash device.

  36. References • [RB09] D. Roberts, T. Kgil, et al.: Integrating NAND devices onto servers. Communications of the ACM, vol. 52, no. 4, pages 98–103, 2009. • [KM07] J. Koomey: Estimating total power consumption by servers in the US and the world. http://sites.and.com/de/Documents/svrpwrusecompletefinal.pdf, February 2007. • [ID08] The diverse and exploding digital universe (an IDC white paper). http://www.emc.com/collateral/analyst-reports/diverse-exploding-digital-universe.pdf, March 2008. • [AR02] Arie Tal, M-Systems, Newark, CA: NAND vs. NOR flash technology. The designer should weigh the options when using flash memory (article). http://www.electronicproducts.com/Digital_ICs/NAND_vs_NOR_flash_technology.aspx, January 2002. • [BY10] Byung-Woo Nam, Gap-Joo Na, and Sang-Won Lee: A Hybrid Flash Memory SSD Scheme for Enterprise Database Applications, April 2010. • [TD12] TDK Global: SMART Storage Solution for Industrial Application (technical journal), January 2012. • [GA05] Eran Gal and Sivan Toledo, School of Computer Science, Tel-Aviv University: Algorithms and Data Structures for Flash Memories, January 2005. • [IO09] Ioannis Koltsidas and Stratis D. Viglas, School of Informatics, University of Edinburgh: Flash-Enabled Database Storage, March 2010. • [SE10] Seongcheol Hong and Dongkun Shin, School of Information and Communication Engineering, Sungkyunkwan University, Suwon, Korea: NAND Flash-based Disk Cache Using SLC/MLC Combined Flash Memory, May 2010. • [TH05] Theo Härder: DBMS Architecture – The Layer Model and its Evolution, March 2005. • [Yi12] Yi Ou: Caching for flash-based databases and flash-based caching for databases. Ph.D. thesis, University of Kaiserslautern, August 2012.

  37. Questions?
