
Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Author: Li Fan, et al. Print: JSAC 98. Presentation: Wonyoung Park [Communication Networks Research Lab].


Presentation Transcript


  1. Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol • Author: Li Fan, et al. • Print: JSAC 98 • Presentation: Wonyoung Park [Communication Networks Research Lab]

  2. Introduction • Web cache sharing • Reduce Web traffic • Alleviate network bottlenecks

  3. Web Cache Sharing • Harvest project • Design the Internet Cache Protocol (ICP) • Today, hierarchies of proxy caches are established via ICP

  4. ICP • The proxy multicasts a query message to all other proxies whenever a cache miss occurs • [Diagram: a proxy server receives GET /index.html, sends an ICP Query to the other proxy servers, and receives ICP Responses from them]

  5. Problem of ICP • The overhead of the ICP protocol • A proxy multicasts a query message to all other proxies whenever a cache miss occurs • NOT scalable • O(n^2)
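
To make the quadratic growth concrete, here is a rough Python sketch. It assumes a simplified model that is not in the slides: every proxy sees the same number of cache misses, and each miss costs one query to and one reply from every other proxy. The function name `icp_messages` is hypothetical.

```python
def icp_messages(num_proxies: int, misses_per_proxy: int) -> int:
    """Total ICP queries plus replies in the group (simplified, illustrative model)."""
    peers = num_proxies - 1
    # Assumption: each miss sends one query to every peer and gets one reply from each.
    per_miss = 2 * peers
    return num_proxies * misses_per_proxy * per_miss


# Doubling the number of proxies roughly quadruples the total message count.
for n in (2, 4, 8, 16):
    print(n, icp_messages(n, misses_per_proxy=1000))
```

Under this model the total is n * (n - 1) * 2 * misses, which is the O(n^2) behavior named on the slide.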

  6. Summary Cache • When a cache miss occurs, the proxy probes all the summaries and then fetches the document from another proxy • [Diagram: on a cache miss for GET /index.html, the summaries say which proxy server has the document; the proxy queries that server, and the query either returns the document or fails (false positive)]

  7. Summary Cache (1) • Each proxy keeps a compact summary of the cache directory of every other proxy • Summary updates are aperiodic • Implemented as an enhancement of the ICP protocol • ICP version 1.1

  8. Summary Cache (2) • Advantages • Reduces network bandwidth and CPU consumption (vs. the ICP protocol) • Scalable • Hit ratio is almost the same as with the ICP protocol

  9. Summary Cache • [Diagram: the URL http://www.lakuyo.com/muchima.htm is hashed with MD5 into a 128-bit result (101101110101010111100 … 010111), which is split into four 32-bit values, used as hash function 1 through hash function 4]

  10. How to make a summary • [Diagram: the 128-bit MD5 result (101101110101010111100 … 010111) is split into four parts (10…111 each), and each part selects a summary bit that is set to 1] • The summary bits selected by the 4 hash functions are set to 1

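  11. Operation • Find the bits that correspond to the URL being looked up • Check whether each of those bits is set to 1 • If even one of them is 0, the lookup fails • When all of the bits are 1 • A correct hit (high probability) • A false positive • The summary marks the document as present but it does not actually exist • Happens when those bits were set by other URLs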
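
The build and lookup steps from the last two slides can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the class name `Summary`, the big-endian interpretation of each 32-bit word, and the reduction of each word modulo the summary size are assumptions made for the example.

```python
import hashlib


def bit_positions(url: str, num_bits: int) -> list[int]:
    """Split the 128-bit MD5 of the URL into four 32-bit words, one per hash function."""
    digest = hashlib.md5(url.encode("utf-8")).digest()  # 16 bytes = 128 bits
    words = [int.from_bytes(digest[i:i + 4], "big") for i in range(0, 16, 4)]
    # Assumption: each word is reduced modulo the summary size to get a bit index.
    return [word % num_bits for word in words]


class Summary:
    """Bloom-filter summary of one proxy's cache directory (illustrative)."""

    def __init__(self, num_bits: int) -> None:
        self.num_bits = num_bits
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def add(self, url: str) -> None:
        for pos in bit_positions(url, self.num_bits):
            self.bits[pos] = 1

    def might_contain(self, url: str) -> bool:
        # Any 0 bit means a definite miss; all 1s means a hit or a false positive.
        return all(self.bits[pos] for pos in bit_positions(url, self.num_bits))


summary = Summary(num_bits=800)
summary.add("http://www.lakuyo.com/muchima.htm")
print(summary.might_contain("http://www.lakuyo.com/muchima.htm"))
```

A URL that was added always tests positive. A URL that was never added usually tests negative, but can test positive when other URLs happen to have set all four of its bits.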

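  12. Summary • Bit size • 8, 16, or 32 times the average number of documents • [Diagram: bit array 101101110101010111100 … 010111] • A bit is 1 if count(all hash functions(every URL)) > 0 for that bit, i.e. at least one hash function of at least one cached URL maps to it • Otherwise the bit is 0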
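
The "1 if count > 0, else 0" rule can be read as a per-bit counter kept by the proxy that owns the cache, so that bits can be cleared again when documents leave the cache. The Python sketch below follows that reading. The class name `CountingSummary`, the parameter names, and the MD5 word handling are assumptions for illustration.

```python
import hashlib


class CountingSummary:
    """Per-bit counters from which the 0/1 summary is derived (illustrative)."""

    def __init__(self, expected_docs: int, bits_per_doc: int = 8) -> None:
        # Bit size = 8, 16, or 32 times the average number of documents.
        self.num_bits = expected_docs * bits_per_doc
        self.counts = [0] * self.num_bits

    def _positions(self, url: str) -> list[int]:
        digest = hashlib.md5(url.encode("utf-8")).digest()
        # Assumption: four big-endian 32-bit words, each reduced modulo the bit size.
        return [int.from_bytes(digest[i:i + 4], "big") % self.num_bits
                for i in range(0, 16, 4)]

    def add(self, url: str) -> None:
        for pos in self._positions(url):
            self.counts[pos] += 1

    def remove(self, url: str) -> None:
        for pos in self._positions(url):
            if self.counts[pos] > 0:  # guard against removing a URL never added
                self.counts[pos] -= 1

    def bits(self) -> list[int]:
        # A summary bit is 1 exactly when its counter is greater than 0.
        return [1 if count > 0 else 0 for count in self.counts]


summary = CountingSummary(expected_docs=100, bits_per_doc=8)
print(summary.num_bits)
```

With counters, removing one URL does not clear a bit that another cached URL still needs.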

  13. Bloom Filter • A hashing technique • An m-bit vector • k independent hash functions • Many-to-one mapping • “False positive”

  14. Optimal parameters for Bloom Filter • Parameters that minimize false positives • The more bits allocated, the better the performance • However, more memory is required

  15. Probability of false positives • upper graph: for 4 hash functions • lower graph: optimal integral number of hash functions
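
The two curves can be approximated with the standard Bloom filter estimate: with m bits, n documents, and k hash functions, the false-positive probability is about (1 - e^(-kn/m))^k, and the continuous optimum is k = ln(2) * m/n. The Python sketch below evaluates both cases; the function names are hypothetical and the formula is the usual textbook approximation rather than something quoted from the slides.

```python
import math


def false_positive_rate(bits_per_doc: float, num_hashes: int) -> float:
    """Approximate false-positive probability (1 - e^(-k/r))^k, where r = m/n."""
    return (1.0 - math.exp(-num_hashes / bits_per_doc)) ** num_hashes


def best_integral_hashes(bits_per_doc: float) -> int:
    """Integer k minimizing the approximation, chosen around ln(2) * m/n."""
    k_real = math.log(2) * bits_per_doc
    candidates = {max(1, math.floor(k_real)), max(1, math.ceil(k_real))}
    return min(candidates, key=lambda k: false_positive_rate(bits_per_doc, k))


# Compare 4 hash functions with the best integral number for each bit budget.
for r in (8, 16, 32):
    k = best_integral_hashes(r)
    print(r, false_positive_rate(r, 4), k, false_positive_rate(r, k))
```

With 8 bits per document and 4 hash functions the estimate is roughly 2.4%, and it falls quickly as more bits per document are allocated.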

  16. Factors influencing Performance • Update delay • Updates are delayed until the percentage of cached documents that are “new” reaches a threshold • Simulation: 1% ~ 10% • Summary representation • Size of memory • exact directory: uses the URL • server-name: too many false hits • Bloom filter parameters
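
The update-delay rule can be sketched as a small counter that fires once the fraction of "new" documents reaches the threshold. This Python example is only an illustration: the class name `UpdateTrigger` and its methods are hypothetical, and it ignores evictions for simplicity.

```python
class UpdateTrigger:
    """Decide when a proxy should send out a summary update (illustrative)."""

    def __init__(self, threshold: float, cached_docs: int = 0) -> None:
        self.threshold = threshold      # e.g. 0.01 to 0.10, the simulated range
        self.cached_docs = cached_docs  # documents currently in the cache
        self.new_docs = 0               # documents cached since the last update

    def on_insert(self) -> bool:
        """Record one newly cached document; return True if an update is due."""
        self.new_docs += 1
        self.cached_docs += 1
        return self.should_update()

    def should_update(self) -> bool:
        if self.cached_docs == 0:
            return False
        return self.new_docs / self.cached_docs >= self.threshold

    def mark_sent(self) -> None:
        """Call after the update has been sent to the other proxies."""
        self.new_docs = 0


trigger = UpdateTrigger(threshold=0.05, cached_docs=100)
print([trigger.on_insert() for _ in range(6)])
```

A lower threshold keeps the other proxies' summaries fresher at the cost of more update messages.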

  17. Result • Has virtually the same cache hit ratio as the exact-directory approach • Reduces the number of messages by a factor of 25 to 60 • Reduces the bandwidth consumption by over 50% • Eliminates between 30% and 95% of the CPU overhead • Maintains almost the same hit ratio as ICP

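  18. Terms • stale hit • The document was requested because another cache appeared to have it, but the cached copy is out of date and therefore useless • false hit • The Bloom filter mistakenly reports a document that is not there • exact_dir • The exact directory is used instead of a Bloom filter • When the contents of one cache change, the other caches learn of it at the same time • bloom_filter has false hits • bloom_filter_number • 'number' bits are allocated per URL • e.g., for a capacity of 100 URLs and number = 8, the summary is 800 bits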

  19. Impact of Summary Update Delays

  20. Total hit ratio under different summary representations

  21. Ratio of false hits under different summary representations • A false hit is a case where the summary reports a document that is not actually there

  22. Number of network messages per user request under different summary forms

  23. Bytes of network messages per user request under different summary forms

  24. Implementation • Summary-Cache Enhanced ICP • Included in ICP version 2 • ICP_OP_DIRUPDATE • Directory update message • Additional header • 16 bits: Function_Num • 16 bits: Function_Bits • 32 bits: BitArray_Size_InBits • 32 bits: Number_of_Updates • Set/unset bit • A list of 32-bit integers • 1 bit: set/unset • 31 bits: index of the bit that needs to be changed
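
The field layout above can be sketched with Python's `struct` module. This covers only the additional header and the update list, not the standard ICP header. Network byte order, placing the set/unset flag in the most significant bit, and using 1 for "set" are assumptions; the function names are hypothetical.

```python
import struct

# Function_Num (16), Function_Bits (16), BitArray_Size_InBits (32), Number_of_Updates (32)
HEADER = struct.Struct("!HHII")  # "!" = network byte order, no padding (assumed)


def pack_dirupdate(function_num: int, function_bits: int,
                   array_size_bits: int, updates: list[tuple[int, bool]]) -> bytes:
    """Pack the extra header followed by one 32-bit word per (bit_index, set_flag)."""
    payload = HEADER.pack(function_num, function_bits, array_size_bits, len(updates))
    for index, set_flag in updates:
        if not 0 <= index < 2 ** 31:
            raise ValueError("bit index must fit in 31 bits")
        # Assumption: top bit carries set (1) / unset (0), low 31 bits carry the index.
        payload += struct.pack("!I", (int(set_flag) << 31) | index)
    return payload


def unpack_dirupdate(payload: bytes):
    """Inverse of pack_dirupdate; returns the header fields and the update list."""
    function_num, function_bits, array_size_bits, count = HEADER.unpack_from(payload, 0)
    words = struct.unpack_from(f"!{count}I", payload, HEADER.size)
    updates = [(word & 0x7FFFFFFF, bool(word >> 31)) for word in words]
    return function_num, function_bits, array_size_bits, updates


message = pack_dirupdate(4, 32, 800, [(5, True), (17, False)])
print(len(message), unpack_dirupdate(message))
```

Sending only the changed bit indexes keeps an update small when few documents have changed since the last update.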

  25. Performance Experiment • Performance of ICP and Summary-Cache for UPisa trace • The enhanced ICP protocol reduces the network bandwidth and CPU overhead significantly while only slightly decreasing the total hit ratio • Lowers the client latency compared to no ICP

  26. Conclusion • Proposed the Summary-Cache enhanced ICP • A scalable wide-area Web cache sharing protocol • Simulation and Measurement were done • Two key aspects of this approach • the effects of delayed updates • the succinct representation of summaries

  27. Critique • Good approach! • Static pages only • May need another protocol for dynamic pages • Uses a URL to make a summary • Any other parameter? / HTTP/1.1 • Does not account for the cache replacement policy, e.g., LRU
