
Proxy Caching: Duct Tape of the Internet

Blake Scholl <bscholl@cmu.edu>, Vishal Soni <vsoni@andrew.cmu.edu>


Presentation Transcript


  1. Proxy Caching: Duct Tape of the Internet. Blake Scholl <bscholl@cmu.edu>, Vishal Soni <vsoni@andrew.cmu.edu>. [Diagram: Origin Server, Proxy Cache, Object Store]

  2. Please feel free to ask questions!

  3. The Internet, circa 1996-1997. [Diagram: ISPs and Web Sites; labels “Election Results” and “Women in Lingerie”; text “Stupid Website!” and “Where’s my paper Playboy?”]

  4. The Internet, circa 1998-2001. [Diagram: ISPs and Web Sites, now with Cache and CDN; labels “Starr Report” and “Contested Election!”; text “Why does Bill get all the women?” and “When is Gore going to give up?”]

  5. The Value of a Proxy • Reduced bandwidth consumption • Reduced access latency • Reduced overload on web servers • Improved reliability • Improved usage data collection

  6. Reverse Proxying (“server accelerator”). [Diagram: Internet, Proxy Caches, Web Server]

  7. Forward Proxying. [Diagram: Proxy Caches, Internet, Web Servers]

  8. • There are billions of unique pages on the Internet, totaling at least terabytes of data. • The total amount of data on the Internet is growing rapidly. • A proxy can hope to store only gigabytes of data. • How can forward caching ever work? Page requests are heavy-tailed: a small set of popular pages accounts for a large share of all requests. AOL’s caches see around 40% hit rates!
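
To make the heavy-tail argument concrete, here is a small simulation sketch. Everything in it is an assumption for illustration: the Zipf-like popularity (weight 1/rank^alpha), the catalog and cache sizes, and the use of LRU are not taken from the slides or from AOL's setup. It only shows that a cache holding 1% of the objects can still serve a large share of requests when popularity is skewed.

```python
import random
from collections import OrderedDict


def simulate_hit_rate(num_objects=10_000, cache_size=100,
                      num_requests=20_000, alpha=1.0, seed=0):
    """Replay synthetic requests through an LRU cache and return the hit rate.

    Popularity of the object at a given rank is proportional to 1 / rank**alpha
    (alpha=1.0 is a Zipf-like heavy tail, alpha=0.0 is uniform popularity).
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    weights = [1.0 / rank ** alpha for rank in range(1, num_objects + 1)]
    requests = rng.choices(range(num_objects), weights=weights, k=num_requests)

    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for obj in requests:
        if obj in cache:
            hits += 1
            cache.move_to_end(obj)         # mark as most recently used
        else:
            cache[obj] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used
    return hits / num_requests


if __name__ == "__main__":
    print(f"heavy-tailed requests: {simulate_hit_rate(alpha=1.0):.2f}")
    print(f"uniform requests:      {simulate_hit_rate(alpha=0.0):.2f}")
```

With uniform popularity the same cache would hit only about as often as its share of the catalog, which is why the skew matters.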

  9. Proxy Jargon: Origin Server, Proxy Cache, End User, Object Store

  10. The Anatomy of a Proxy. [Diagram: End User, Proxy Cache (Server Personality, Client Personality, Object Store), Origin Server]
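
The three parts named on this slide can be sketched in a few lines. This is a minimal illustration, not any real proxy's design: the class name `ProxyCache`, its methods, and the `origin_server` stand-in are all hypothetical, and the upstream fetch is an injected function so the sketch runs without a network.

```python
class ProxyCache:
    """Hypothetical minimal proxy; all names are illustrative assumptions."""

    def __init__(self, fetch_upstream):
        self.object_store = {}                # Object Store: URL -> cached body
        self.fetch_upstream = fetch_upstream  # how to reach the next hop
        self.hits = 0
        self.misses = 0

    def handle_request(self, url):
        """Server personality: answer a request from an end user."""
        if url in self.object_store:
            self.hits += 1
            return self.object_store[url]
        self.misses += 1
        body = self._fetch(url)
        self.object_store[url] = body
        return body

    def _fetch(self, url):
        """Client personality: behave like a client toward the upstream."""
        return self.fetch_upstream(url)


def origin_server(url):
    """Stand-in for a real origin server (an assumption for offline use)."""
    origin_server.calls += 1
    return f"<html>content of {url}</html>"


origin_server.calls = 0

proxy = ProxyCache(origin_server)
proxy.handle_request("http://www.cmu.edu/")  # miss: fetched from the origin
proxy.handle_request("http://www.cmu.edu/")  # hit: served from the object store
```

Because the upstream is just a callable, passing another proxy's `handle_request` as `fetch_upstream` gives a two-level arrangement of caches.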

  11. Multilevel Caching. [Diagram: Origin Server, End User, and six Proxy Caches]

  12. Arbitrary Graphs of Caches! [Diagram: two Origin Servers and eight Proxy Caches]

  13. Research Questions I • What cache architecture is good? • How do users find caches? • How do caches find upstream caches? • Where should I place the caches? • What data should I cache? • Should my caches cooperate? How?

  14. Research Questions II • Should I prefetch data from servers? • What gets cached and what gets bumped? (placement/replacement) • How do I keep content fresh? (Cache coherency) • How do I manage my caches? • What should I do about dynamic content?

  15. Consequences of a Poor System: • Stale Content • Increased Latency • New failure point • Underestimated statistics • Difficult Administration • Bottleneck (?)

  16. What makes a good cache system? • Fast Access • Robustness • Transparency • Scalability • Efficiency • Adaptivity • Network Stability • Load balancing • Tolerance of heterogeneity • Simplicity

  17. Cache Architecture

  18. AOL First Hack: Distribute Proxies. [Diagram: Origin Server; networks AT&T, UUNet, Sprint; four Proxies]

  19. An Improvement: Cache Hierarchy. [Diagram: Origin Server and three tiers of Proxies: Backbone Networks, ISP POPs, Customer Networks]

  20. Hierarchical Cache Issues • Difficult to place cache servers at network core • Increased latency at each level • Bottleneck at high-level caches (?) • Redundant data storage (?)

  21. Try Again: Distributed Caches. [Diagram: Origin Server and five Proxies]

  22. Distributed Caching • Pros: • Load Sharing • Fault Tolerant • Cons: • Higher Connection Times • Higher Bandwidth usage

  23. Even Better: Hybrid Scheme. [Diagram: www.news.com, www.cmu.edu, and six Proxies]

  24. Even Better: Hybrid Scheme. [Same diagram; request: http://www.news.com/index.html?]

  25. Even Better: Hybrid Scheme. [Same diagram; request: http://www.news.com/index.html?]

  26. Even Better: Hybrid Scheme. [Same diagram; request: http://www.cmu.edu?]

  27. Even Better: Hybrid Scheme. [Same diagram; request: http://www.cmu.edu?]

  28. Evaluation: Hybrid Caching • The Good: • Improved flexibility • Better load-balancing of hot spots • Shorter Connection Times • The Bad: • Much more complex • Good cache routing / resolution is essential

  29. Cache Resolution & Routing • How do users find caches? • How do caches find other caches? • Goal: • Quick data location and retrieval. • Two Fundamentally Different Approaches: • Configuration/Indirection • Transparent Proxying

  30. Transparent Proxying: Cache Routing via Magic. [Diagram: Origin Server, “Smarts”, Proxy Cache]

  31. Transparent Proxying: Cache Routing via Magic. [Diagram: Origin Server, two “Smarts”, two Proxy Caches]

  32. Transparent Proxying: “Hybrid” Architecture Magically! [Diagram: www.news.com, www.cmu.edu, and six Proxies]

  33. Indirect Resolution & Routing • Indirection via: DNS, HTTP redirects, or embedded URLs • Common approaches • Grow a caching distribution tree away from each popular server towards sources of high demand. Do resolution via cache routing table or hash function.

  34. Cache Resolution / Routing • Cache Routing Table • Harvest cache organizes caches in a hierarchy • Adaptive Web Caching uses a mesh of caches • Povey and Harrison scheme • Cachemesh system • Legedza and Guttag

  35. Cache Resolution / Routing • Hashing Function • Cache Array Routing Protocol • Array membership list, URL • Summary Cache • Summary of URLs of cached docs • Karger, Lewin, Leighton, et al. • Consistent hashing (Akamai System)
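
As an illustration of hash-based cache resolution, the sketch below implements a basic consistent-hash ring. It is a textbook-style sketch under stated assumptions, not CARP and not Akamai's system: the class name `ConsistentHashRing`, the `replicas` parameter, the cache names, and the choice of MD5 as a stable hash are all hypothetical.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Map a string to a 64-bit integer; MD5 is used only for stability."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Hypothetical ring that maps each URL to one cache name."""

    def __init__(self, caches, replicas=100):
        # Each cache is placed at `replicas` points to even out the load.
        ring = [(stable_hash(f"{cache}#{i}"), cache)
                for cache in caches for i in range(replicas)]
        ring.sort()
        self._points = [point for point, _ in ring]
        self._owners = [cache for _, cache in ring]

    def lookup(self, url: str) -> str:
        """Return the cache owning the first ring point after the URL's hash."""
        idx = bisect.bisect_right(self._points, stable_hash(url))
        return self._owners[idx % len(self._owners)]  # wrap around the ring


ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
owner = ring.lookup("http://www.news.com/index.html")
```

The property that matters for caching is that removing one cache only remaps the URLs that cache owned; URLs assigned to the surviving caches stay where they were, so their cached copies remain useful.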

  36. Prefetching • Anticipate document requests and preload / prefetch into the local cache • Between browser clients and web servers • Traces… • Between proxies and web servers • Pushing… • Between browser clients and proxies

  37. Prefetching • Summary • Browser <-> Server, Proxy <-> Server • Increase WAN traffic • Browser <-> Proxy • Affects traffic only over LANs. • Either fetch based on popularity or access pattern

  38. Cache Replacement • Traditional replacement • LRU, LFU, Pitkow/Recker • Key-based replacement • Breaking ties… • Size, LRU-Min, LRU-threshold, Hyper-G, Lowest latency first • Cost-based replacement • GreedyDual-Size, Hybrid, Lowest Relative Value, Least Normalized Cost, Bolot/Hoschka, SLRU, Server assisted, Hierarchical GreedyDual

  39. Cache Replacement • Oooo… Caveat • Performance of a replacement policy depends on traffic characteristics. No known policy outperforms all others for every type of web access pattern
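
To show what a cost-based policy looks like next to plain LRU, here is a compact sketch of the GreedyDual-Size idea: each object gets a priority H = L + cost/size, the object with the lowest H is evicted, and L is raised to the evicted priority. The class name, the linear scan for the victim, and the default cost of 1.0 are simplifying assumptions made for readability.

```python
class GreedyDualSizeCache:
    """Hypothetical GreedyDual-Size sketch; not tuned for performance."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.inflation = 0.0  # the running value L
        self.entries = {}     # key -> (priority H, size, cost)

    def access(self, key, size, cost=1.0):
        """Return True on a hit; on a miss, admit the object if it can fit."""
        if key in self.entries:
            _, size, cost = self.entries[key]
            # A hit restores the object's priority relative to the current L.
            self.entries[key] = (self.inflation + cost / size, size, cost)
            return True
        if size > self.capacity:
            return False  # too large to cache at all
        while self.used + size > self.capacity:
            victim = min(self.entries, key=lambda k: self.entries[k][0])
            priority, victim_size, _ = self.entries.pop(victim)
            self.inflation = priority  # L rises to the evicted priority
            self.used -= victim_size
        self.entries[key] = (self.inflation + cost / size, size, cost)
        self.used += size
        return False


cache = GreedyDualSizeCache(capacity_bytes=100)
cache.access("big.mpg", size=80)
cache.access("small.gif", size=10)
cache.access("medium.html", size=30)  # forces an eviction
```

With a uniform cost, large objects carry the lowest priority, so the 80-byte object is the one evicted here while the small one survives.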

  40. Cache Coherency • Stale pages need to be updated. • Web cache coherency issues differ from those in distributed systems • Different access patterns, larger scale, single update location (web servers) • Weak/Strong coherency …

  41. Cache Coherency • Strong coherency • Client validation • Server invalidation • Weak coherency • Adaptive TTL • Piggyback invalidation
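
Adaptive TTL can be sketched as follows: a document that has not changed for a long time is assumed likely to stay unchanged, so its time-to-live is set to a fraction of its age at fetch time. The function names, the 10% factor, and the one-day cap are illustrative assumptions, not values from the slides.

```python
def adaptive_ttl(last_modified, fetched_at, factor=0.1,
                 min_ttl=0.0, max_ttl=86_400.0):
    """Return a time-to-live in seconds, proportional to the document's age.

    Timestamps are seconds since an arbitrary epoch. The factor and the
    bounds are assumptions chosen for the example.
    """
    age_at_fetch = max(0.0, fetched_at - last_modified)
    return min(max_ttl, max(min_ttl, factor * age_at_fetch))


def is_fresh(now, last_modified, fetched_at, **ttl_options):
    """True while the cached copy may be served without revalidation."""
    ttl = adaptive_ttl(last_modified, fetched_at, **ttl_options)
    return (now - fetched_at) < ttl


# A page last modified 1000 s before it was fetched gets a 100 s lifetime.
ttl = adaptive_ttl(last_modified=0, fetched_at=1000)
```

This is weak coherency: the cache may serve a stale copy until the TTL expires, in exchange for fewer validation requests to the origin server.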

  42. Caching Contents • Three roles of a cache… • Data cache, connection cache, computation cache. • Dynamic caching: how to make more data cacheable?

  43. User Access Pattern Prediction • Use a client’s access pattern to predict future requests. • Group resources likely to be accessed together. • Use a prediction-by-partial-match (PPM) model to determine which page is likely to be accessed in the near future. • Privacy concerns (?)

  44. Load Balancing • Eliminate Hot Spots • Use replication to store copies of hot pages/services throughout the Internet. Spread work across several servers.
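
The sketch below is a simplified predictor in the spirit of prediction by partial match: it counts which page followed each recent context in past sessions and, when predicting, tries the longest context first and falls back to shorter ones. It is not a full PPM model (no escape probabilities), and the class name, `max_order`, and the example sessions are hypothetical.

```python
from collections import Counter, defaultdict


class PartialMatchPredictor:
    """Hypothetical context-based next-page predictor (simplified PPM)."""

    def __init__(self, max_order=2):
        self.max_order = max_order
        self.counts = defaultdict(Counter)  # context tuple -> next-page counts

    def train(self, session):
        """Record, for every context of length 1..max_order, what came next."""
        for i, page in enumerate(session):
            for order in range(1, self.max_order + 1):
                if i - order < 0:
                    break
                context = tuple(session[i - order:i])
                self.counts[context][page] += 1

    def predict(self, recent):
        """Return the most likely next page, or None if no context matches."""
        for order in range(min(self.max_order, len(recent)), 0, -1):
            followers = self.counts.get(tuple(recent[-order:]))
            if followers:
                return followers.most_common(1)[0][0]
        return None


predictor = PartialMatchPredictor(max_order=2)
predictor.train(["/", "/news", "/sports"])
predictor.train(["/", "/news", "/sports"])
predictor.train(["/about", "/news", "/weather"])
guess = predictor.predict(["/", "/news"])
```

A proxy could use such a guess to decide what to prefetch; the per-client request histories it needs are also where the privacy concern comes from.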

  45. Aside: Cache Clusters. [Diagram: a Layer 4+ Switch in front of five Proxy Caches]

  46. Additional Issues • Proxy placement (?) • Web traffic characteristics

  47. Conclusion • Alleviate server bottlenecks • Minimize user access latency • Proxy placement – under-researched • Other issues • Dynamic caching, security, fault tolerance • Buzzwords • Scalable, robust, adaptive, stable.
