
Flash Translation Layer (FTL)



Presentation Transcript


  1. Flash Translation Layer (FTL) March 28, 2011 Sungjoo Yoo Embedded System Architecture Lab. ESA, POSTECH, 2011

  2. Agenda • Introduction to FTL • LAST

  3. [Source: J. Lee, 2007] Typical Flash Storage • Both the number of Flash I/O ports and the controller technology determine Flash performance • [Figure: Host (PC) ↔ I/O interface (USB, IDE, PCMCIA) ↔ NAND Flash controller ↔ NAND Flash; the FTL runs on the controller]

  4. [Source: J. Lee, 2007] Flash Translation Layer (FTL) • A software layer emulating a standard block device interface (read/write) • Features • Sector mapping • Garbage collection • Power-off recovery • Bad block management • Wear-leveling • Error correction code (ECC)
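The block-device emulation idea can be sketched with a minimal page-mapping FTL; all names and sizes below are illustrative, not from the slides:

```python
# Minimal sketch of an FTL's sector mapping (illustrative, not the
# slides' implementation): the host issues plain read/write on logical
# sectors, while the FTL redirects each write to a fresh physical page
# and updates a logical-to-physical map, so flash never needs an
# in-place overwrite.
class TinyFTL:
    def __init__(self, n_sectors):
        self.flash = []                  # append-only physical pages
        self.l2p = [None] * n_sectors    # logical sector -> physical page index

    def write(self, sector, data):
        self.l2p[sector] = len(self.flash)   # remap; the old page becomes invalid
        self.flash.append(data)

    def read(self, sector):
        return self.flash[self.l2p[sector]]

ftl = TinyFTL(8)
ftl.write(3, b"v1")
ftl.write(3, b"v2")      # "overwrite" handled purely by remapping
print(ftl.read(3))       # -> b'v2'
```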

  5. Single Page Write Case • Remember: “erase-before-write” means “no overwrite”! • Cost = (tR + tRC + tWC + tPROG) * (# pages/block) + tERASE = (25us + 105.6us*2 + 300us) * 64 + 2ms = 36.32ms for a single-page (2KB) write operation
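Plugging the slide's timing parameters into the formula reproduces the 36.32 ms figure (a quick check, not part of the original deck):

```python
# Erase-before-write cost of updating one 2 KB page in place, with the
# slide's parameters: every page of the 64-page block must be read out,
# transferred, and re-programmed, then the block erased.
T_R, T_RC, T_WC, T_PROG, T_ERASE = 25.0, 105.6, 105.6, 300.0, 2000.0  # microseconds
PAGES_PER_BLOCK = 64

def naive_write_cost_us():
    return (T_R + T_RC + T_WC + T_PROG) * PAGES_PER_BLOCK + T_ERASE

print(f"{naive_write_cost_us() / 1000:.2f} ms")  # -> 36.32 ms
```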

  6. Replacement Block Scheme [Ban, 1995] • In-place scheme • Keep the same page index in the data (D) block and the update (U) block (also called the log block) • Previously, two single-page write operations took 2 x 36.32ms = 72.63ms • With a U block, two single-page write operations take 2 x (tWC + tPROG) = 2 x (105.6us + 300us) = 0.81ms • ~90X reduction!
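The ~90X figure can be reproduced directly from the slide's numbers (a sanity check, not from the original deck):

```python
T_WC, T_PROG = 105.6, 300.0      # microseconds, from the slide
NAIVE_WRITE_US = 36316.8         # erase-before-write cost of one page write

def log_write_cost_us(n_writes):
    # With an update (log) block, each page write is just one bus
    # transfer plus one program operation.
    return n_writes * (T_WC + T_PROG)

cost = log_write_cost_us(2)
print(f"{cost / 1000:.2f} ms")                         # -> 0.81 ms
print(f"~{2 * NAIVE_WRITE_US / cost:.0f}X reduction")  # -> ~90X reduction
```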

  7. Replacement Block Scheme [Ban, 1995] • In-place scheme • Keep the same page index in data and log blocks • Advantage • Simple • Disadvantages • Low U block utilization • Violates the sequential write constraint

  8. Log Buffer-based Scheme [Kim, 2002] • In-place (linear mapping) vs. out-of-place (remapping) schemes • In-place scheme: + No need to manage complex mapping info, - Low U block utilization, - Violation of the sequential write constraint • Out-of-place scheme: + High U block utilization, + Sequential writes, - Mapping information needs to be maintained
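The out-of-place scheme can be sketched as follows; the 4-page block size and all names are illustrative assumptions:

```python
# Sketch of the out-of-place (remapping) scheme: updates are appended to
# the log block strictly in arrival order, satisfying the
# sequential-write constraint, and a small map records where each
# logical page currently lives. Illustrative, not the paper's code.
PAGES_PER_BLOCK = 4   # tiny block for illustration

class LogBlock:
    def __init__(self):
        self.pages = []     # physical pages, filled strictly in order
        self.page_map = {}  # logical page index -> offset in self.pages

    def write(self, logical_page, data):
        if len(self.pages) >= PAGES_PER_BLOCK:
            raise RuntimeError("log block full: garbage collection needed")
        self.page_map[logical_page] = len(self.pages)  # old copy turns invalid
        self.pages.append(data)

    def read(self, logical_page):
        return self.pages[self.page_map[logical_page]]

blk = LogBlock()
blk.write(0, "v1")
blk.write(0, "v2")   # repeated update of the same page still appends
print(blk.read(0))   # -> v2
```

Note how a repeated update keeps utilization high: the new copy lands in the very next page instead of being restricted to a fixed in-place slot.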

  9. Garbage Collection (GC) • No more U blocks! → Perform garbage collection to reclaim U block(s) by erasing blocks with many invalid pages

  10. [Kang, 2006] Three Types of Garbage Collection • [Figure: switch merge, partial merge, and full merge] • Which one will be the most efficient?

  11. Garbage Collection Overhead • Full merge cost calculation • Assumptions • 64-page block • tRC = tWC = 100us • tPROG = 300us • tERASE = 2ms • Max # of valid page copies = 64 • # of block erases = 3 (the old D block and two U blocks) • Full merge: copy all valid pages from the D block and U blocks into a free block, then erase the stale blocks • Runtime cost = 64*(tRC+tWC+tPROG) + 3*tERASE = 64*(100us*2+300us) + 3*2ms = 38ms • Valid page copies may dominate the runtime cost → minimize # of valid page copies
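The 38 ms figure follows directly from the slide's assumptions (a quick check, not from the original deck):

```python
# Full merge cost with the slide's assumptions: 64 valid page copies
# (read + transfer + program each) plus 3 block erases.
T_RC, T_WC, T_PROG, T_ERASE = 100.0, 100.0, 300.0, 2000.0  # microseconds

def full_merge_cost_us(valid_copies, erases):
    return valid_copies * (T_RC + T_WC + T_PROG) + erases * T_ERASE

print(full_merge_cost_us(64, 3) / 1000)  # -> 38.0 (ms)
```

With these numbers, page copies account for 32 of the 38 ms, which is why minimizing valid page copies is the lever that matters.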

  12. Three Representative Methods of Flash Translation Layer • FAST [Lee, 2007] • Two types of log blocks • A sequential write log block to maximize switch merges • Random write log blocks cover the other write accesses • Superblock [Kang, 2006] • A group of blocks is managed as a superblock • Linear address mapping is broken within a superblock to reduce # of valid page copies in GC • LAST [Lee, 2008] • Two partitions in the random write log blocks • Hot partition → more dead blocks → reduction in full merges

  13. [Lee, 2008] LAST • Observations • Typical SSD accesses include both random and sequential traffic • Random traffic can be classified into hot and cold

  14. [Lee, 2008] LAST Scheme

  15. [Lee, 2008] Locality Detection: Random vs. Sequential • Observations • Short requests are very frequent (a) • Short requests tend to access random locations (b) • Long requests tend to access sequential locations (b) • Threshold of randomness • 4KB, determined from experiments
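The size-based detector reduces to a one-line rule (function name is mine, not the paper's):

```python
# Size-based locality detection as described above: requests shorter
# than the 4 KB threshold are treated as random, longer ones as
# sequential.
RANDOM_THRESHOLD = 4 * 1024  # 4 KB, the experimentally chosen threshold

def classify_request(size_bytes):
    return "random" if size_bytes < RANDOM_THRESHOLD else "sequential"

print(classify_request(2048))   # -> random
print(classify_request(8192))   # -> sequential
```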

  16. [Lee, 2008] LAST Scheme • [Figure: incoming writes < 4KB go to the random log buffer; writes >= 4KB go to the sequential log buffer]

  17. [Lee, 2008] Why Hot and Cold? • Observation • A large fraction of the pages (>50%) in the random log buffer space are invalid • They are mostly caused by hot pages • Problem • Invalid pages are distributed over the random log buffer space, which causes full merges (expensive!)

  18. Aggregating Invalid Pages due to Hot Pages • An example trace • 1, 4, 3, 1, 2, 7, 8, 2, 1, … • A single random buffer partition suffers from distributed invalid pages • In LAST, the Hot partition aggregates invalid pages → full merges can be reduced; in addition, full merges are delayed

  19. Temporal Locality: Hot or Cold? • Update interval (calculated for each page access) = current page access time – last page access time • If update interval < k (threshold) • Hot (means frequent writes) • Else • Cold
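The rule above can be sketched as follows; the threshold value and the tracking structure are illustrative assumptions:

```python
# Hot/cold classification by update interval, per the rule above. The
# value of k and the per-page tracking dict are my assumptions for
# illustration, not values from the paper.
K_THRESHOLD = 100          # the threshold k (arbitrary time units)
last_access = {}           # logical page -> time of its last write

def classify_page(page, now):
    interval = now - last_access.get(page, float("-inf"))
    last_access[page] = now
    return "hot" if interval < K_THRESHOLD else "cold"

print(classify_page(7, now=0))    # -> cold (no previous write recorded)
print(classify_page(7, now=50))   # -> hot  (updated again within k)
```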

  20. [Lee, 2008] LAST Scheme • [Figure: incoming writes < 4KB go to the random log buffer; writes >= 4KB go to the sequential log buffer]

  21. Garbage Collection in LAST: Step 1 Victim Partition Selection • Basic rule • If there is a dead block in the Hot partition, select the Hot partition as the victim partition • Else, select the Cold partition as the victim • Demotion from Hot to Cold pages • If there is a log block whose last update is older than a threshold, the age threshold (i.e., old enough), then select the Hot partition as the victim
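One reading of the Step-1 rule, with parameter names of my choosing rather than the paper's:

```python
# Victim partition selection per the rule above: prefer the Hot
# partition when it contains a dead block (free reclaim), or when its
# oldest log block has aged past the age threshold (demotion case);
# otherwise pick the Cold partition.
def select_victim_partition(hot_dead_blocks, oldest_hot_age, age_threshold):
    if hot_dead_blocks > 0 or oldest_hot_age > age_threshold:
        return "hot"
    return "cold"

print(select_victim_partition(1, 0, 10))    # -> hot  (dead block available)
print(select_victim_partition(0, 42, 10))   # -> hot  (old enough: demotion)
print(select_victim_partition(0, 3, 10))    # -> cold
```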

  22. Garbage Collection in LAST: Step 2 Victim Block Selection • Case A: Victim partition = Hot partition • If there is a dead block, select it • Else, select the least recently updated block • Case B: Victim partition = Cold partition • Choose the block with the lowest (full) merge cost (in the merge cost table) • Na: associativity degree, Np: # of valid page copies • Cc: page copy cost, Ce: erase cost
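The slide names the cost terms but not how they compose. A plausible composition, treated here purely as an assumption, is Np page copies plus erasing the Na associated data blocks and the log block itself; with slide-11-style numbers it reproduces the 38 ms example:

```python
# Assumed full-merge cost model built from the slide's terms: Np valid
# page copies at cost Cc each, plus (Na + 1) erases at cost Ce (the Na
# associated data blocks and the log block). The exact formula is not
# given on the slide, so treat this as an illustration only.
def merge_cost(Na, Np, Cc, Ce):
    return Np * Cc + (Na + 1) * Ce

# Slide-11-style numbers: Cc = 500 us per copy, Ce = 2000 us per erase.
print(merge_cost(Na=2, Np=64, Cc=500, Ce=2000))  # -> 38000 (us)
```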

  23. Adaptiveness in LAST • Hot/cold partition sizes (Sh, Sc), the temporal locality threshold (k), the age threshold, etc. are adjusted at runtime depending on the observed traffic • Nd = # of dead blocks in the Hot partition • Uh = utilization of the Hot partition (# valid pages / # total pages) • One example of a runtime policy • If Nd is increasing, reduce Sh, since too many log blocks are assigned to the Hot partition • There are several more policy examples in the paper • Comment: the policies do not seem exhaustive, so they could be improved further
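The example policy can be sketched as follows; the step size and the lower bound on Sh are my assumptions:

```python
# Sketch of the example runtime policy above: if the dead-block count
# Nd in the Hot partition grows, shrink the Hot partition size Sh.
# The step size and floor are illustrative assumptions.
def adjust_hot_partition(Sh, Nd_prev, Nd_now, step=1, Sh_min=1):
    if Nd_now > Nd_prev:
        return max(Sh_min, Sh - step)
    return Sh

print(adjust_hot_partition(Sh=8, Nd_prev=2, Nd_now=5))  # -> 7
print(adjust_hot_partition(Sh=8, Nd_prev=5, Nd_now=5))  # -> 8
```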

  24. Experimental Results • Full merge cost is significantly reduced by LAST • Many dead blocks are created → GC with the lowest cost (only an erase is needed)

  25. References • [Ban, 1995] A. Ban, “Flash File System”, US Patent no. 5,404,485, April 1995. • [Kim, 2002] J. Kim, et al., “A Space-Efficient Flash Translation Layer for CompactFlash Systems”, IEEE Transactions on Consumer Electronics, May 2002. • [Kang, 2006] J. Kang, et al., “A Superblock-based Flash Translation Layer for NAND Flash Memory”, Proc. EMSOFT, Oct. 2006. • [Lee, 2007] S. Lee, et al., “A Log Buffer-Based Flash Translation Layer Using Fully-Associative Sector Translation”, ACM TECS, 2007. • [Lee, 2008] S. Lee, et al., “LAST: Locality-Aware Sector Translation for NAND Flash Memory-based Storage Systems”, ACM SIGOPS Operating Systems Review, Volume 42, Issue 6, October 2008.
