
a shared log design for flash clusters


Presentation Transcript


  1. a shared log design for flash clusters Mahesh Balakrishnan, Dahlia Malkhi, Vijayan Prabhakaran, Ted Wobber, John D. Davis, Michael Wei. Microsoft Research Silicon Valley

  2. tape is dead, disk is tape, flash is disk, RAM locality is king - Jim Gray, Dec 2006

  3. flash in the data center: can flash clusters eliminate the trade-off between consistency and performance? what new abstractions are required to manage and access flash clusters?

  4. the CORFU abstraction: a shared log. applications call append(value) and read(offset) through the CORFU library: append to the tail, read from anywhere in the flash cluster (diagram throughput labels: 20K/s, 200K/s, 500K/s). example application: the Hyder database (Bernstein et al., CIDR 2011). infrastructure applications: SMR databases, key-value stores, filesystems, virtual disks

  5. the CORFU hardware: network flash. network-attached flash units: low power (15W per unit), low latency, low cost. [table: cost and power usage of a 1 TB, 10 Gbps flash farm]

  6. problem statement: how do we implement a scalable shared log over a cluster of network-attached flash units?

  7. the CORFU design. CORFU API: V = read(O), O = append(V), trim(O). the mapping resides at the client, in the CORFU library: each logical entry (4KB) is mapped to a replica set of physical flash pages. clients read from anywhere and append to the tail
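
A minimal sketch of the client-side API on this slide. CorfuClient and its in-memory stand-in for the cluster are assumed names for illustration, not the actual library; a real client writes 4KB entries to replicated flash pages.

    # Illustrative sketch of the CORFU client API (assumed names).
    class CorfuClient:
        def __init__(self, projection):
            # the mapping from logical offsets to replica sets of
            # physical flash pages resides here, at the client
            self.projection = projection
            self.cluster = {}   # stand-in for the flash cluster in this sketch
            self.tail = 0

        def read(self, offset):          # V = read(O)
            return self.cluster[offset]

        def append(self, value):         # O = append(V)
            offset = self.tail
            self.cluster[offset] = value
            self.tail += 1
            return offset

        def trim(self, offset):          # trim(O)
            self.cluster.pop(offset, None)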

  8. the CORFU protocol: reads. the application calls read(pos); the client-side CORFU library consults its Projection (D1 D2 | D3 D4 | D5 D6 | D7 D8) to map the position to a replica chain, then issues read(D1/D2, page#) against the CORFU cluster
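
A sketch of that read path, assuming log positions are striped round-robin across the replica chains shown in the projection; the data layout here is an illustrative assumption, not the paper's wire format.

    # Sketch of the client-side read path (assumed striping scheme).
    def corfu_read(projection, units, pos):
        # projection: list of chains, e.g. [("D1","D2"), ("D3","D4"), ...]
        # units: dict mapping unit name -> dict of page# -> value
        chain = projection[pos % len(projection)]   # chain for this position
        page = pos // len(projection)               # page number on that chain
        return units[chain[-1]][page]               # read at the chain's tail

    # example: with four chains, read(pos=5) maps to chain D3/D4, page 1
    projection = [("D1", "D2"), ("D3", "D4"), ("D5", "D6"), ("D7", "D8")]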

  9. the CORFU protocol: appends. to append(val), the client first reserves the next position in the log (e.g., 100) from the sequencer (T0), then writes the value to that position's chain: write(D1/D2, val). CORFU append throughput: # of 64-bit tokens issued per second. the sequencer is only an optimization! clients can probe for the tail or reconstruct it from the flash units
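
A sketch of the append path under those rules: reserve a token, then write down the chain. The sequencer is just a counter here, matching the slide's point that it only hands out 64-bit tokens; names are assumed.

    # Sketch of the append path (assumed names).
    import itertools

    class Sequencer:
        def __init__(self, start=0):
            self._next = itertools.count(start)

        def next_token(self):
            return next(self._next)    # append throughput = tokens issued/sec

    def corfu_append(sequencer, write_chain, value):
        pos = sequencer.next_token()   # reserve the next position in the log
        write_chain(pos, value)        # e.g. write(D1/D2, val), see next slide
        return pos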

  10. chain replication in CORFU. safety under contention: if multiple clients try to write to the same log position concurrently, only one wins; writes to already-written pages => error. durability: data is only visible to reads if the entire chain has seen it; reads on unwritten pages => error. requires 'write-once' semantics from the flash unit
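
A sketch of write-once chain replication as just described: contending writers conflict at the head of the chain, so only one wins, and an entry becomes readable only once the whole chain holds it. Class names and error strings are illustrative.

    # Sketch of write-once chained writes (assumed names).
    class WriteOnceUnit:
        def __init__(self):
            self.pages = {}

        def write(self, page, value):
            if page in self.pages:
                raise ValueError("error: page already written")  # write-once
            self.pages[page] = value

        def read(self, page):
            if page not in self.pages:
                raise KeyError("error: page unwritten")
            return self.pages[page]

    def chain_write(chain, page, value):
        for unit in chain:           # head to tail; losers fail at the head
            unit.write(page, value)

    def chain_read(chain, page):
        return chain[-1].read(page)  # tail read: visible only if fully replicated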

  11. handling failures: flash units. each Projection is a list of views mapping offset ranges to flash units. example: Projection 0 maps [0, ∞) to D1 D2 D3 D4 D5 D6 D7 D8; after D2 fails, Projection 1 keeps [0, 7] on the old set (with D2 marked failed) and maps [8, ∞) to D1 D9 D3 D4 D5 D6 D7 D8; Projection 2 later maps [9, ∞) to a fresh set D10 D11 D12 D13 D14 D15 D16 D17. reconfiguration steps: 'seal' the current projection at the flash units, then write the new projection at an auxiliary. latency for a 32-drive cluster: tens of milliseconds
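
A sketch of those two reconfiguration steps, with assumed interfaces for the flash units and the auxiliary: sealing cuts off clients still using the old projection, then the new projection (list of views) is written.

    # Sketch of reconfiguration (assumed unit/auxiliary interfaces).
    class SealableUnit:
        def __init__(self):
            self.sealed = set()

        def seal(self, epoch):
            # unit thereafter rejects requests tagged with this epoch,
            # fencing clients that still hold the old projection
            self.sealed.add(epoch)

    class Auxiliary:
        def __init__(self):
            self.projections = {}          # epoch -> list of views

        def install(self, epoch, views):
            self.projections[epoch] = views

    def reconfigure(units, auxiliary, cur_epoch, new_views):
        for u in units:
            u.seal(cur_epoch)              # step 1: 'seal' current projection
        auxiliary.install(cur_epoch + 1, new_views)   # step 2: write new one
        return cur_epoch + 1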

  12. handling failures: clients • a client obtains a token from the sequencer and crashes: holes in the log • solution: other clients can fill the hole • the fast CORFU fill operation (<1ms) 'walks the chain': completes half-written entries, writes junk on unwritten entries (a metadata operation; conserves flash cycles and bandwidth)
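
A sketch of that fill operation, assuming flash units with the write-once write/read interface plus a has(page) probe; JUNK is an illustrative marker. If the head of the chain holds data, the half-written entry is completed; otherwise junk seals the hole.

    # Sketch of hole filling (assumed unit interface and JUNK marker).
    JUNK = "junk"   # metadata-only marker: no 4KB payload, conserves flash cycles

    def fill(chain, page):
        head = chain[0]
        if head.has(page):
            value = head.read(page)     # half-written entry: complete it
        else:
            value = JUNK                # nothing written: junk the position
            head.write(page, value)
        for unit in chain[1:]:          # 'walk the chain' to the tail
            if not unit.has(page):
                unit.write(page, value)
        return value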

  13. garbage collection: two models • prefix trim(O): invalidate all entries before offset O • entry trim(O): invalidate only the entry at offset O. [diagram: under prefix trim, a contiguous run of invalid entries precedes the valid ones; under entry trim, invalid entries are scattered among valid ones]
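
A sketch contrasting the two models: prefix trim needs only a single watermark per log, while entry trim must track each invalidated offset individually. Class names are illustrative.

    # Sketch of the two trim models (assumed names).
    class PrefixTrimLog:
        def __init__(self):
            self.mark = 0                  # all offsets below this are invalid

        def trim(self, offset):
            self.mark = max(self.mark, offset)

        def is_valid(self, offset):
            return offset >= self.mark

    class EntryTrimLog:
        def __init__(self):
            self.trimmed = set()           # individually invalidated offsets

        def trim(self, offset):
            self.trimmed.add(offset)

        def is_valid(self, offset):
            return offset not in self.trimmed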

  14. CORFU throughput. [plot: reads scale linearly with the number of flash units; appends hit the sequencer bottleneck]

  15. how far is CORFU from Paxos? Paxos-like protocols are IO-bound at the leader… and so is a single CORFU chain. but a Projection 'stitches' together multiple chains: no I/O bottleneck!

  16. conclusion. CORFU is a scalable shared log: linearly scalable reads, 1M appends/s. CORFU uses network-attached flash to construct inexpensive, power-efficient clusters
