
ULC: A Unified Placement and Replacement Protocol in Multi-level Storage Systems






Presentation Transcript


  1. ULC: A Unified Placement and Replacement Protocol in Multi-level Storage Systems Song Jiang and Xiaodong Zhang, College of William and Mary

  2. Multi-Level Buffer Caching in Distributed Systems [Figure: clients connect over a network to a front-tier server and an end-tier server backed by a disk array.]

  3. 50% 10% 10% 10% 10% 50%  10% 10% 40% L1 80% Challenges to Improve Hierarchy Performance L2 L3 L1 L4 LRU LRU LRU LRU (1) Can the hit rate of hierarchical caches achieve the hit rate of a single first level cache with its size equal to the aggregate size of the hierarchy? (2) Can we make caches close to clients contribute more to the hit rate?

  4. Reason I: Weakened Locality at Low-Level Caches • Low-level caches hold only the misses from the buffer caches above them, and hits in them incur high latency. • Requests with strong locality have already been filtered out by the high-level buffer caches close to the clients.

  5. An Existing Solution: Re-designing Low-Level Cache Replacement Algorithms Multi-Queue Replacement (MQ) [USENIX’01] [Figure: a hierarchy of queues Q0, Q1, …, Qn plus a ghost queue Qout.] • To overcome weak locality, MQ is a frequency-based replacement algorithm; • Once a block is accessed, it is promoted to a higher queue. Periodically, blocks in each queue are checked, and low-frequency blocks are demoted to lower queues.
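
To make the MQ idea concrete, here is a minimal sketch in Python; it is our illustration, not the authors' implementation. The log2 promotion rule follows the MQ paper's spirit, but the queue count and the `life_time` expiration knob (exactly the kind of workload-sensitive parameter the next slide criticizes) are assumed values, and the ghost queue Qout is elided.

```python
from collections import OrderedDict
import math

class MQCache:
    """Illustrative Multi-Queue (MQ) replacement sketch."""

    def __init__(self, capacity, nqueues=8, life_time=100):
        self.capacity = capacity
        self.nqueues = nqueues
        self.life_time = life_time        # assumed tuning parameter
        self.queues = [OrderedDict() for _ in range(nqueues)]
        self.clock = 0
        self.freq = {}                    # block -> access count
        self.expire = {}                  # block -> demotion deadline

    def _queue_of(self, block):
        # Queue index grows with log2(frequency): hot blocks rise.
        return min(int(math.log2(self.freq[block])), self.nqueues - 1)

    def access(self, block):
        self.clock += 1
        self.freq[block] = self.freq.get(block, 0) + 1
        for q in self.queues:             # unlink from its old queue
            q.pop(block, None)
        if sum(len(q) for q in self.queues) >= self.capacity:
            self._evict()
        # Re-insert at the MRU end of the queue matching its frequency.
        self.queues[self._queue_of(block)][block] = True
        self.expire[block] = self.clock + self.life_time
        self._demote_expired()

    def _evict(self):
        # Victim: the LRU block of the lowest non-empty queue.
        for q in self.queues:
            if q:
                return q.popitem(last=False)[0]

    def _demote_expired(self):
        # Periodic check: long-unreferenced blocks sink one queue.
        for i in range(1, self.nqueues):
            for block in list(self.queues[i]):
                if self.expire[block] < self.clock:
                    del self.queues[i][block]
                    self.queues[i - 1][block] = True
                    self.expire[block] = self.clock + self.life_time
```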

  6. Drawbacks of MQ Replacement • Inherits the weakness of frequency-based algorithms: not responsive to access-pattern changes; • Contains workload-sensitive parameters; • Cannot fully exploit the locality knowledge inherent in applications (the accurate information is in the high-level caches). Motivation: locality analysis should be conducted at the clients, where the original requests are generated.

  7. Reason II: Undiscerning Redundancy among Levels of Buffer Caches [Figure: snapshots, taken every 1000 references, of the blocks in the client cache, the server cache, and both caches at once.]

  8. Another Existing Solution: Extending an Existing Replacement into a Unified Replacement For example: unified LRU (uniLRU) [USENIX’02] [Figure: a client L1 LRU stack and a server L2 LRU stack; block 10, evicted from L1, is demoted into the L2 stack.]

  9. Drawbacks of Unified LRU [Figure: L1 and L2 LRU stacks; all the hits go to a single position in the L2 stack.] • High-level caches are not well utilized; • Large demotion overhead.

  10. Our Approach: Unified Level-aware Caching (ULC) • Minimize redundancy among the levels of buffer caches through unified replacement based on client information. • Place blocks with weak locality in the low-level buffer caches: (1) locality is analyzed at the client; (2) the analysis results direct the placement of blocks in the hierarchy. [Figure: blocks mapped from locality strength to cache levels.]

  11. Quantifying Locality Strength • Locality strength is characterized by the Next Access Distance (NAD); • The NAD is unknown at the current time; • The NAD is quantitatively predicted from the Last Access Distance (LAD) and the Recency (R): LAD-R = max(LAD, R). [Figure: a unified LRU stack annotated with a block’s last access position, current position, and next access position, illustrating LAD, R, and NAD.] • Advantages of LAD-R over R alone: it does not change until the next reference of the block, and it quantifies locality accurately.
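
As a hedged illustration of the metric (our code, with our own variable names), the sketch below replays a reference trace through a unified LRU stack and reports each block's LAD-R. For simplicity it evaluates the metric only at reference time; in ULC, LAD stays fixed between references while R keeps growing. Smaller values mean stronger locality.

```python
def lad_r_trace(trace):
    """Yield (block, LAD-R) for each reference in `trace`.

    R is the block's depth in the unified LRU stack at this access
    (its recency), and LAD is the depth recorded at its previous
    access. Blocks never seen before have infinite distances.
    This O(n)-per-access list version is purely illustrative.
    """
    INF = float("inf")
    stack = []                            # index 0 = MRU position
    lad = {}                              # block -> LAD at last access
    for block in trace:
        if block in stack:
            r = stack.index(block)        # recency = current depth
            stack.remove(block)
        else:
            r = INF
        yield block, max(lad.get(block, INF), r)   # LAD-R = max(LAD, R)
        lad[block] = r                    # this depth becomes the new LAD
        stack.insert(0, block)            # block moves to the MRU end

for blk, strength in lad_r_trace("abcacba"):
    print(blk, strength)                  # e.g. the final 'a' scores 2
```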

  12. Multi-Level Buffer Caching Protocol: Unified and Level-Aware Caching (ULC) • ULC, running on the first-level client, dynamically ranks the accessed blocks by their LAD-R values. • Based on the ranking, blocks are cached (placed) at levels L1, L2, …, accordingly. • Low-level caches cache or replace blocks according to the instructions from the clients.

  13. LAD-R Based Block Caching • Arranging the block layout exactly by LAD-R ranking is expensive (at least O(log n)); • ULC instead uses an efficient two-phase LAD-R based caching (O(1)): • LAD determines a block’s placement at the time of retrieval (when R = 0); • R is used for the block’s replacement after it is cached.
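
A minimal sketch of the O(1) placement phase under our own assumption about the mapping: at retrieval time R = 0, so LAD-R reduces to LAD, and comparing LAD against the cumulative cache sizes of the levels picks the target level directly. The function and parameter names are ours.

```python
def placement_level(lad, level_sizes):
    """Map a block's LAD to a cache level: a block whose LAD falls
    within the first `level_sizes[0]` stack positions belongs in L1,
    within the next `level_sizes[1]` positions in L2, and so on."""
    boundary = 0
    for level, size in enumerate(level_sizes, start=1):
        boundary += size
        if lad < boundary:
            return level
    return len(level_sizes)    # weakest locality: the deepest level

# With a 100-block L1 and a 200-block L2, LAD = 150 lands in L2.
assert placement_level(150, [100, 200]) == 2
```

The replacement phase then operates per level: among the blocks resident at a level, the one with the largest recency is evicted first (the yardstick of slide 15).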

  14. LAD-R Based Placement and Replacement [Figure: L1 and L2 LRU stacks.] • The LRU position at which a block is accessed determines its placement; • The current LRU position determines its replacement.

  15. ULC Data Structure [Figure: the client’s unified LRU (uniLRU) stack, annotated with each block’s recency status (R1–R3), its level status (L1–L3), and the yardstick blocks Y1–Y3.] • The recency status is determined by recency; • The level status is determined by LAD; • The placement of a block is determined by its level status; • The yardstick block is the one for replacement at the corresponding level.
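
The yardstick can be illustrated with a small helper (our sketch, with assumed names, not the paper's data structure): scanning the unified stack from the LRU end, the first block found at each level is that level's resident block with the largest recency, hence its replacement candidate.

```python
def yardsticks(uni_stack, level_of):
    """Return {level: yardstick block} for the unified LRU stack
    `uni_stack` (MRU first), where `level_of[b]` is block b's level
    status. The yardstick of a level is its resident block nearest
    the LRU end, i.e. the next one ULC replaces at that level."""
    ys = {}
    for block in reversed(uni_stack):     # walk upward from the LRU end
        ys.setdefault(level_of[block], block)
    return ys

# Blocks 7 (L1) and 4 (L2) sit deepest within their levels:
stack = [2, 6, 7, 1, 5, 3, 4]             # MRU ... LRU
levels = {2: 1, 6: 1, 7: 1, 1: 2, 5: 2, 3: 2, 4: 2}
assert yardsticks(stack, levels) == {1: 7, 2: 4}
```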

  16. Two Operations in the ULC Protocol Two request messages go from the client to the low-level caches: • Retrieve(b, i, j) (i ≥ j): retrieve block b from level Li, and cache it at level Lj when it passes level Lj on its route to level L1. • Demote(b, i, j) (i < j): demote block b from level Li into level Lj.
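
A sketch of the two messages and one plausible reaction at a low-level cache; the message fields come straight from the slide, while the handler logic is our assumption about how a level interprets them.

```python
from typing import NamedTuple

class Retrieve(NamedTuple):   # client -> levels, with i >= j
    b: int                    # block to fetch
    i: int                    # level currently holding the block
    j: int                    # level that should keep a copy

class Demote(NamedTuple):     # client -> level i, with i < j
    b: int
    i: int                    # level giving up the block
    j: int                    # level receiving the block

def handle(msg, level, cache):
    """How cache level `level` might react to a ULC message."""
    if isinstance(msg, Retrieve):
        if level == msg.j:
            cache.add(msg.b)       # keep the block as it passes by
        elif level == msg.i:
            cache.discard(msg.b)   # a higher level holds it now
    elif isinstance(msg, Demote) and level == msg.j:
        cache.add(msg.b)           # accept the demoted block

# Retrieve(11, 3, 2): L3 ships block 11 upward and L2 caches it.
l2 = set()
handle(Retrieve(11, 3, 2), level=2, cache=l2)
assert 11 in l2
```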

  17. Retrieve(11, From, To) [Figure: block 11, with recency status R2, is accessed; the client issues Retrieve(11, 3, 2), which fetches block 11 from level L3 and caches it at level L2 on its way to the client.]

  18. Retrieve(11, 3, 2) and Demote(6, 2, 3) [Figure: after block 11 is retrieved into L2, the client issues Demote(6, 2, 3), pushing block 6 out of L2 and down into L3.]

  19. ULC with Multiple Clients [Figure: two clients, each running ULC over its own L1 and L2 with its own yardsticks Y1 and Y2; the server keeps a Global_LRU list over all clients’ blocks, distinguishing L1 blocks, L2 blocks, and Lout blocks.]

  20. Performance Evaluation: Workload Traces • RANDOM: spatially uniform distribution of references (synthetic trace); • ZIPF: highly skewed reference distribution (synthetic trace); • HTTPD: collected on a 7-node parallel web server (HP); • DEV1: collected in an office environment over 15 consecutive days (HP); • TPCC1: the I/O trace of the TPC-C database benchmark (IBM DB2).

  21. Performance on a 3-Level Structure Block size: 8KB. Block transfer time between the client and the server buffer caches: 1 ms; between the server buffer cache and the RAM cache on the disk: 0.2 ms; between the disk RAM cache and the disk: 10 ms. Cache size: 100MB at each level.
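
With these parameters, and writing h1, h2, h3 for the hit ratios at the client cache, the server buffer cache, and the disk RAM cache (our notation; this simplification ignores demotion traffic and queueing), the expected block service time is roughly:

```latex
% h_i: hit ratio at level i (assumed notation, not from the slides)
T_{\text{avg}} \approx h_1 \cdot 0\,\text{ms}
               + h_2 \cdot 1\,\text{ms}
               + h_3 \cdot (1 + 0.2)\,\text{ms}
               + (1 - h_1 - h_2 - h_3) \cdot (1 + 0.2 + 10)\,\text{ms}
```

The cost of a hit grows with its depth in the hierarchy, which is why slide 3's second question matters: ULC tries to make the caches closest to the client contribute the most hits.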

  22. Compared with indLRU, ULC significantly increases hit ratios; • Compared with uniLRU, ULC provides a better hit distribution.

  23. indLRU incurs a high miss penalty; • uniLRU incurs a high demotion cost.

  24. Performance on a Multi-client Structure • httpd: collected on a 7-node parallel web server; • openmail: collected on 6 HP 9000 K580 servers running the HP OpenMail application; • db2: collected on an 8-node IBM SP2 system running an IBM DB2 database. Block size: 8KB. Block transfer time between the clients and the server: 1 ms; between the server buffer cache and the disk: 10 ms. Cache size: 100MB each (except for workload tpcc1, which is 50MB).

  25. The effect of “cache pollution” in MQ

  26. Large demotion cost in uniLRU

  27. Summary • We propose an effective way to quantify locality in multi-level caches; • We design an efficient block placement / replacement protocol (ULC); • ULC makes the layout of cached blocks in the hierarchy match their locality; • Experiments show that ULC significantly outperforms existing schemes.
