
Improving Read Performance of PCM via Write Cancellation and Write Pausing

Moinuddin Qureshi, Michele Franceschini, and Luis Lastras. IBM T. J. Watson Research Center, Yorktown Heights, NY.


Presentation Transcript


  1. Improving Read Performance of PCM via Write Cancellation and Write Pausing. Moinuddin Qureshi, Michele Franceschini, and Luis Lastras. IBM T. J. Watson Research Center, Yorktown Heights, NY. HPCA 2010.

  2. Introduction. More cores in a system → more concurrency → larger working set. DRAM-based memory systems are hitting power, cost, and scaling walls. Phase Change Memory (PCM): an emerging technology, projected to be more scalable, higher density, and power-efficient.

  3. PCM Operation. Switching is done by heating with electrical pulses. RESET state: amorphous (high resistance), reached with a large current. SET state: crystalline (low resistance), reached with a small current. Read latency is 2x-4x that of DRAM; write latency is much higher. [Figure: memory element with access device, and temperature vs. time for the RESET and SET pulses relative to Tmelt and Tcryst. Photo courtesy: Bipin Rajendran, IBM]

  4. Problem of Contention from Slow Writes. PCM writes are 4x-8x slower than reads. Writes are not latency critical, so the typical response is to use large buffers and intelligent scheduling. But once a write is scheduled to a bank, a later-arriving read to that bank has to wait. Write requests cause contention for reads → increased read latency.

  5. Outline • Introduction • Quantifying the Problem • Adaptive Write Cancellation • Write Pausing • Combining Cancellation & Pausing • Summary

  6. Configuration: Hybrid Memory. Processor chip → DRAM cache (256MB) → PCM-based main memory. Each bank has a separate read queue (RDQ) and write queue (WRQ) (32-entry). The baseline uses read-priority scheduling while the WRQ is less than 80% full. If the WRQ is more than 80% full, it uses an oldest-first policy → “forced write” (rare, <0.1%).
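
A minimal sketch of the baseline scheduling rule on this slide, for one bank. The function name `pick_next`, the tuple layout of a request, and the handling of exactly 80% occupancy (treated as "not forced") are assumptions made for illustration; the slide only gives the 32-entry queue size and the 80% cutoff.

```python
from collections import deque

WRQ_CAPACITY = 32    # entries per bank, from the configuration slide
FORCED_FILL = 0.80   # occupancy above which reads lose their priority


def pick_next(rdq, wrq):
    """Choose the next request for one bank.

    rdq and wrq are deques of (arrival_cycle, name) tuples in arrival order.
    Returns (kind, request), where kind is "read", "write", or "forced_write",
    or None when both queues are empty.
    """
    if len(wrq) > FORCED_FILL * WRQ_CAPACITY:
        # Oldest-first policy: serve whichever queue head arrived earlier.
        if rdq and rdq[0][0] < wrq[0][0]:
            return ("read", rdq.popleft())
        return ("forced_write", wrq.popleft())
    if rdq:
        # Read priority: any pending read goes ahead of buffered writes.
        return ("read", rdq.popleft())
    if wrq:
        return ("write", wrq.popleft())
    return None
```

With 5 buffered writes a pending read is served first; with 26 buffered writes (more than 80% of 32) an older write is served ahead of the read as a forced write.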

  7. Problem. Read latency = 1K cycles, write latency = 8K cycles (sensitivity in paper). 12 workloads, each with 8 benchmarks from SPEC06. [Figure: normalized execution time and effective read latency (cycles) for Baseline, No Read Priority, Write Latency=1K, and Write Latency=0.] Writes significantly increase read latency (a problem only for asymmetric memories).

  8. Outline • Introduction • Problem: Writes Delaying Reads • Adaptive Write Cancellation • Write Pausing • Combining Cancellation & Pausing • Summary

  9. Write Cancellation: “abort” an ongoing write to improve read latency. The cancelled line is left in a non-deterministic state, so a read request that matches it is serviced from the WRQ. Perform write cancellation as soon as a read request arrives at a bank (as long as the write is not being done in forced mode).

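  10. Write Cancellation with Static Threshold. Cancelling a write request that is close to completion is wasteful and causes episodes of forced writes (low performance). WCST: cancel a write request only if less than K% of its service is done. [Figure: effective read latency as the threshold K varies from NeverCancel to AlwaysCancel; the NeverCancel point is labeled 2365 cycles.]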
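
The WCST rule can be sketched as a single predicate. The function name and its parameters are hypothetical; measuring "service done" as elapsed cycles over the total write latency is an assumption, since the slide does not say how progress is tracked.

```python
def should_cancel_static(elapsed_cycles, write_latency_cycles, k_percent, forced=False):
    """Return True if an in-flight write should be cancelled for an arriving read.

    k_percent = 0 behaves like NeverCancel; k_percent = 100 cancels any
    write that has not yet finished (AlwaysCancel).
    """
    if forced:
        # Writes issued in forced mode are never cancelled.
        return False
    percent_done = 100.0 * elapsed_cycles / write_latency_cycles
    return percent_done < k_percent
```

For an 8K-cycle write and K = 50, a write that is 1K cycles in (12.5% done) is cancelled, while one that is 6K cycles in (75% done) is allowed to finish.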

  11. Adaptive Write Cancellation. The best threshold depends on the number of pending entries in the WRQ. Fewer entries → higher threshold (best read latency). More entries → lower threshold (reduces forced writes). Write Cancellation with Adaptive Threshold (WCAT): Threshold = 100 - (4*NumEntriesInWRQ). [Figure: threshold (0% to 100%, low to high) vs. number of entries in WRQ (10, 20, 30), with the forced-writes region marked.]
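
A short sketch of the WCAT formula. Clamping the result at 0 is an assumption (the formula goes negative above 25 entries, which can only mean "never cancel"); the function name is hypothetical.

```python
def wcat_threshold(num_entries_in_wrq):
    """Adaptive cancellation threshold in percent: 100 - 4 * WRQ occupancy."""
    return max(0, 100 - 4 * num_entries_in_wrq)


# Threshold for a few WRQ occupancies of a 32-entry queue.
for entries in (0, 10, 20, 25, 30):
    print(entries, wcat_threshold(entries))
```

The threshold reaches 0 at 25 entries, just below the 80% occupancy of a 32-entry WRQ (25.6 entries) at which the baseline starts forcing writes.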

  12. Adaptivity of WCAT. We sampled all WRQs every 2M cycles to measure occupancy. WCAT uses a higher threshold initially, when the WRQ is empty, but a lower threshold later, which reduces the episodes of forced writes.

  13. Results for WCAT. Baseline: 2365 cycles. Ideal: 1K cycles. The adaptive threshold reduces latency and incurs half the overhead.

  14. Outline • Introduction • Problem: Writes Delaying Reads • Adaptive Write Cancellation • Write Pausing • Combining Cancellation & Pausing • Summary

  15. Iterative Write in PCM Devices. In Multi-Level Cells (MLC), the programming precision requirement increases linearly with the number of levels. PCM cells respond differently to the same programming pulse. Acknowledged solution to address this uncertainty: iterative writes. Each iteration consists of the steps write-read-verify. [Figure: loop Write → Read → Verify; “Not done” returns to Write, “Done” exits.]
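
The write-read-verify loop can be sketched on a toy cell. Everything about the cell model here is an assumption for illustration (a normalized value, a pulse that closes between 50% and 100% of the remaining gap, noise-free reads); it is not the device model or the analytical model from the talk.

```python
import random


def iterative_write(target, tolerance, rng, max_iters=64):
    """Program one simulated cell to `target`; return (iterations, final_value)."""
    value = 0.0
    for iteration in range(1, max_iters + 1):
        # Write: aim a pulse at the target; the cell responds with a random gain,
        # standing in for cells reacting differently to the same pulse.
        gain = rng.uniform(0.5, 1.0)
        value += gain * (target - value)
        # Read: sense the cell (noise-free in this toy model).
        sensed = value
        # Verify: stop as soon as the sensed value is within tolerance.
        if abs(target - sensed) <= tolerance:
            return iteration, value
    raise RuntimeError("cell did not converge within max_iters")


iterations, value = iterative_write(1.0, 0.01, random.Random(0))
```

In this toy model the remaining error at least halves on every pulse, so a tolerance of 1% of the target is reached in at most 7 iterations; different cells (different random gains) finish after different numbers of iterations.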

  16. Model for Iterative Writes. We develop an analytical model to capture the number of iterations in terms of bits/cell, the number of levels written in one shot, and learning. The time required to write a line is the worst case over all cells in the line. MLC with 3 bits/cell: the average number of iterations is 8.3 (consistent with MLC literature).
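
The worst-case rule for a line can be sketched as follows. This only illustrates "the slowest cell decides"; it does not reproduce the analytical model, and the function names and the fixed cycles-per-iteration figure are assumptions.

```python
def line_write_iterations(cell_iterations):
    """Iterations needed to write a whole line: the maximum over its cells."""
    if not cell_iterations:
        raise ValueError("a line must contain at least one cell")
    return max(cell_iterations)


def line_write_cycles(cell_iterations, cycles_per_iteration):
    """Line write time, assuming every iteration takes the same number of cycles."""
    return line_write_iterations(cell_iterations) * cycles_per_iteration
```

For example, a line whose cells need 3, 5, 4, 9, and 2 iterations is busy for 9 iterations, even though most of its cells were done much earlier.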

  17. Concept of Write Pausing. Iterative writes can be paused to service pending read requests. Reads can be performed at the end of each iteration (potential pause point). Better read latency with negligible write overhead. We extend the iterative write algorithm of Nirschl et al. [IEDM’07] to support Write Pausing. [Figure: timelines of a four-iteration write (Iter 1 to Iter 4) with potential pause points between iterations and a read request (Rd X) serviced at a pause point.]
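
A small sketch of when a read can start with and without pausing. It assumes the write starts at cycle 0, all iterations are equally long, and there is a single read; the function names and the example numbers (8 iterations of 1000 cycles) are hypothetical.

```python
import math


def read_start_without_pausing(arrival, num_iters, iter_cycles):
    """The read waits until the whole write has finished."""
    return max(arrival, num_iters * iter_cycles)


def read_start_with_pausing(arrival, num_iters, iter_cycles):
    """The read starts at the next iteration boundary (pause point)."""
    write_end = num_iters * iter_cycles
    if arrival >= write_end:
        return arrival
    # Round the arrival time up to the end of the iteration in progress.
    return math.ceil(arrival / iter_cycles) * iter_cycles
```

A read arriving at cycle 1500 during such a write starts at cycle 8000 without pausing and at cycle 2000 with pausing. The write itself is delayed only by the time taken to service the read.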

  18. Results for Write Pausing. Write Pausing at the end of an iteration gets 85% of the benefit of “Anytime” Pause.

  19. Outline • Introduction • Problem: Writes Delaying Reads • Adaptive Write Cancellation • Write Pausing • Combining Cancellation & Pausing • Summary

  20. Write Pausing + WCAT. Only one iteration is cancelled → “micro-cancellation” has low overhead. [Figure: three timelines of a four-iteration write (Iter 1 to Iter 4) with a read request (Rd X); in the last timeline the read is serviced right after Iter 1 and Iter 2 is marked cancelled.]
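
One way to picture the combination is a per-iteration decision: cancel the iteration in progress if little of it is done, otherwise pause at its end. Applying the threshold to progress within the current iteration is an assumption of this sketch, as are the function name and the equal-length iterations; the slide only states that a single iteration is cancelled.

```python
def service_read(arrival, iter_cycles, threshold_percent):
    """Return (read_start_cycle, wasted_write_cycles) for a read arriving mid-write.

    The write is assumed to start at cycle 0 and still be in progress.
    """
    into_iter = arrival % iter_cycles
    if into_iter == 0:
        # The read arrived exactly at a pause point; nothing is lost.
        return arrival, 0
    percent_done = 100.0 * into_iter / iter_cycles
    if percent_done < threshold_percent:
        # Micro-cancellation: drop the current iteration and redo it later.
        return arrival, into_iter
    # Otherwise let the iteration finish and pause at its boundary.
    return arrival - into_iter + iter_cycles, 0
```

With 1000-cycle iterations and a 60% threshold, a read at cycle 1200 starts immediately at the cost of 200 re-executed write cycles, while a read at cycle 1800 waits for the pause point at cycle 2000.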

  21. Results. Baseline: 2365 cycles. Ideal: 1K cycles. Write Pause + Micro Cancellation is very close to Anytime Pause (re-execution overhead of micro-cancellation: <4% extra iterations).

  22. Impact of Write Queue Size. [Figure: speedup with respect to the baseline (32-entry) for different write queue sizes.] We will need large buffers to best exploit the benefit of Pausing.

  23. Outline • Introduction • Problem: Writes Delaying Reads • Adaptive Write Cancellation • Write Pausing • Combining Cancellation & Pausing • Summary

  24. Summary • Slow writes increase the effective read latency (2.3x) • Write Cancellation: cancel an ongoing write to service a read • Threshold-based write cancellation • Adaptive threshold: better performance, half the overhead • Write Pausing exploits iterative writes to service pending reads • Write Pausing + Micro Cancellation is close to optimal pause • Effective read latency: from 2365 to 1330 cycles (1.45x speedup) • We will need large write buffers to exploit the benefit of Pausing

  25. Questions

  26. Write Pausing in Iterative Algorithms (Nirschl+ IEDM’07)

  27. Workloads and Figure of Merit • 12 memory-intensive workloads from SPEC 2006: • 6 rate-mode (eight copies of the same benchmark) • 6 mix-mode (two copies of four benchmarks). Key metric: Effective Read Latency. Tin = time at which a read request enters the RDQ. Tout = time at which the read request finishes service at memory. Effective Read Latency = Tout - Tin (average reported).
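
The figure of merit can be computed as below. The function name and the sample timestamps are made up for illustration.

```python
def effective_read_latency(requests):
    """Average of (Tout - Tin) over a list of (t_in, t_out) cycle pairs."""
    if not requests:
        raise ValueError("need at least one read request")
    return sum(t_out - t_in for t_in, t_out in requests) / len(requests)


# Three hypothetical reads: one served immediately, two delayed behind writes.
sample = [(0, 1000), (500, 2500), (1000, 4000)]
print(effective_read_latency(sample))
```

Because Tin is taken when the request enters the RDQ, time spent waiting behind a write counts toward the metric, which is why slow writes inflate it.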

  28. Sensitivity to Write Latency. At WriteLatency=4K, the speedup is 1.35x instead of 1.45x (at 8K latency).
