This paper examines Internet content delivery systems: HTTP Web traffic, the Akamai content delivery network (CDN), and the peer-to-peer networks Gnutella and Kazaa. The analysis focuses on client, object, and server dynamics and outlines the implications for caching. Results suggest that P2P traffic, large object sizes, and concentrated traffic sources significantly impact bandwidth use.
An Analysis of Internet Content Delivery Systems • Stefan Saroiu, Krishna P. Gummadi, Richard J. Dunn, Steven D. Gribble, and Henry M. Levy • Proceedings of the 5th Symposium on Operating Systems Design and Implementation, December 2002
Outline • Goals of Paper • Overview of Content Delivery Systems • Experimental Methodology • Results • Caching • Conclusions
Goals • Quantify the increasing importance of novel content delivery systems • Characterize the behavior of these systems from the perspectives of clients, objects, and servers • Derive implications for caching in these systems
Content Delivery Systems • HTTP Web traffic • Content Delivery Networks (CDNs): Akamai • Peer-to-peer file-sharing networks: Gnutella, Kazaa
HTTP Traffic • Clients request objects from web servers using HTTP • Most web objects are small, 5-10 KB • Web object requests follow a Zipf-like distribution • Caching: cache hit rate increases logarithmically with client population; dynamic content cannot be cached
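The link between Zipf-like popularity and cache hit rate can be illustrated with a small simulation. This is a minimal sketch, not the paper's method: the function names, the exponent `alpha = 1.0`, the catalogue of 1,000 objects, and the use of request count as a stand-in for client population are all assumptions made for illustration.

```python
import random

def zipf_weights(n_objects, alpha=1.0):
    """Unnormalized Zipf-like popularity: rank r gets weight 1 / r**alpha."""
    return [1.0 / (rank ** alpha) for rank in range(1, n_objects + 1)]

def ideal_hit_rate(requests):
    """Object hit rate of an infinite cache: a request hits if seen before."""
    seen = set()
    hits = 0
    for obj in requests:
        if obj in seen:
            hits += 1
        else:
            seen.add(obj)
    return hits / len(requests) if requests else 0.0

def simulate(n_requests, n_objects=1000, alpha=1.0, seed=0):
    """Draw a synthetic Zipf-like request stream and return its ideal hit rate.

    All parameters are hypothetical; more requests loosely model more clients.
    """
    rng = random.Random(seed)
    requests = rng.choices(
        range(n_objects), weights=zipf_weights(n_objects, alpha), k=n_requests
    )
    return ideal_hit_rate(requests)

if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(n, round(simulate(n), 3))
```

Running the script prints the hit rate for growing request volumes; under these assumptions the rate rises as more requests share the same cache, with diminishing returns.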
Content Delivery Networks (CDNs) • Dedicated collections of servers that are geographically distributed • Provide static content, e.g. images, streaming video • Allow users to access a replica of the content that is “close” • Replicas are located via DNS interposition or URL rewriting at origin servers • Redirection adds overhead • Reduce average download response time
Peer-to-Peer Systems • Peers form a distributed system to exchange content • Batch-style downloads • Most peers have low availability and limited network capacity • Files are transferred via direct connections between peers
Experimental Methodology • Use passive network monitoring to collect a trace of TCP traffic between the University of Washington (UW) and the rest of the Internet • Collected 9 days of data, over 20 TB
Some Interesting Observations • UW is an HTTP content provider: exported 16.65 TB, imported 3.44 TB • Bandwidth consumption (in + out): 0.2% Akamai, 6.04% Gnutella, 14.3% WWW, 36.9% Kazaa • The rest is other TCP protocols: mail, streaming video/audio, etc.
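The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative, and treating the four listed systems as the whole of the HTTP traffic is an assumption based on how the Conclusions slide phrases its first point.

```python
# Bandwidth shares (percent of all TCP bytes, in + out) as quoted on the slide.
shares = {"Akamai": 0.2, "Gnutella": 6.04, "WWW": 14.3, "Kazaa": 36.9}

# Combined share of the four listed systems (assumed here to be the HTTP traffic).
http_total = sum(shares.values())

# Whatever is left over is the "other TCP protocols" bucket.
other_tcp = 100.0 - http_total

# Fraction of the four systems' bytes that is peer-to-peer (Gnutella + Kazaa).
p2p_fraction = (shares["Gnutella"] + shares["Kazaa"]) / http_total

# Ratio of exported to imported bytes (TB), from the same slide.
export_import_ratio = 16.65 / 3.44

if __name__ == "__main__":
    print(f"four listed systems: {http_total:.2f}% of TCP bytes")
    print(f"other TCP protocols: {other_tcp:.2f}%")
    print(f"P2P fraction of listed systems: {p2p_fraction:.1%}")
    print(f"export/import ratio: {export_import_ratio:.2f}")
```

Under that assumption, roughly three quarters of the listed systems' bytes are P2P, the remaining "other TCP" bucket is about 42.6%, and UW exports close to five times as much as it imports.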
Some More Interesting Observations • Compared to a 1999 study: • HTML traffic has decreased 43% • GIF/JPG traffic has decreased 59% • AVI/MPG traffic has increased nearly 400% • MP3 traffic has increased nearly 300%
Objects • Median P2P object size is 4 MB • Median Web object size is 2 KB • 5% of Kazaa objects are over 100 MB • Top 1% of Kazaa objects account for 50% of bytes transferred • For Web, the top 1% of objects account for 16% of bytes transferred
Clients • For both Web and Kazaa, small number of clients account for large portion of traffic • In Web, top 200 clients (0.5% of the population) account for 13% of the traffic • In Kazaa, top 200 clients (4% of the population) account for 50% of the traffic
Servers • Would expect server load for Kazaa to be much more distributed than for WWW • This is not the case: • Top 500 external Web servers provide 22% of the bytes • Top 500 external Kazaa servers provide 10% of the bytes
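The "top N account for X% of bytes" figures on the Objects, Clients, and Servers slides are all the same kind of concentration measure. A minimal sketch of how such a number could be computed from per-object, per-client, or per-server byte counts follows; the function name `top_share` and the sample data are hypothetical and are not taken from the paper's trace.

```python
import math

def top_share(byte_counts, fraction):
    """Share of total bytes contributed by the top `fraction` of entities.

    byte_counts: bytes per entity (object, client, or server).
    fraction: e.g. 0.01 for the top 1%; at least one entity is always counted.
    """
    if not byte_counts:
        return 0.0
    ranked = sorted(byte_counts, reverse=True)
    k = max(1, math.ceil(fraction * len(ranked)))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0

if __name__ == "__main__":
    # Made-up byte counts: one heavy entity and three light ones.
    sample = [70, 10, 10, 10]
    print(top_share(sample, 0.25))  # share of the single heaviest entity
```

Applied to a real trace, the same function would be called once per system (Web, Kazaa) and per perspective (objects, clients, servers) to compare how concentrated the traffic is.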
Scalability • With respect to bandwidth cost, adding another 450 Kazaa clients would be equivalent to doubling the Web client population (from 40,000 to 80,000)
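The equivalence on this slide implies a per-client bandwidth ratio, which can be worked out directly. This is a back-of-the-envelope sketch using only the two numbers quoted above; the function name is illustrative, and the result is an average that ignores the heavy skew among individual clients.

```python
def web_clients_per_kazaa_client(web_clients, kazaa_clients):
    """Average number of Web clients whose bandwidth equals one Kazaa client's."""
    return web_clients / kazaa_clients

# 450 additional Kazaa clients cost as much bandwidth as 40,000 additional Web clients.
ratio = web_clients_per_kazaa_client(40_000, 450)

if __name__ == "__main__":
    print(f"one Kazaa client ~ {ratio:.1f} Web clients in bandwidth terms")
```

On these figures, an average Kazaa client consumes roughly as much bandwidth as 89 average Web clients.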
CDN Caching • Do CDNs provide any performance benefits over a local proxy cache? • If Akamai traffic were directed to a proxy cache instead: • 88% ideal object hit rate (all objects cacheable) • 50% practical hit rate • Conclusion: Widely deployed proxy caches reduce the need for separate CDNs
P2P Caching • Inbound cache byte hit rate = 35% • Outbound cache byte hit rate = 85% • Hit rate increases with client population • 1,000 clients = 40% hit rate • 500,000 clients = 85% hit rate • Conclusion: Reverse P2P cache saves the most bandwidth
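The caching slides quote both object hit rates (CDN) and byte hit rates (P2P), and the two can diverge sharply when object sizes vary. The sketch below replays a request trace against an ideal cache (infinite capacity, everything cacheable) and reports both metrics; the function name, the trace format of `(object_id, size_in_bytes)` pairs, and the sample trace are assumptions for illustration, not the paper's simulator.

```python
def ideal_cache_hit_rates(requests):
    """Return (object_hit_rate, byte_hit_rate) for an infinite, ideal cache.

    requests: iterable of (object_id, size_in_bytes) in arrival order.
    The first request for an object is a miss; every later one is a hit.
    """
    seen = set()
    hits = 0
    hit_bytes = 0
    total_bytes = 0
    count = 0
    for obj_id, size in requests:
        count += 1
        total_bytes += size
        if obj_id in seen:
            hits += 1
            hit_bytes += size
        else:
            seen.add(obj_id)
    object_rate = hits / count if count else 0.0
    byte_rate = hit_bytes / total_bytes if total_bytes else 0.0
    return object_rate, byte_rate

if __name__ == "__main__":
    # Made-up trace: one large object fetched once, one small object fetched three times.
    trace = [("big", 100), ("small", 1), ("small", 1), ("small", 1)]
    obj_rate, byte_rate = ideal_cache_hit_rates(trace)
    print(f"object hit rate: {obj_rate:.2f}, byte hit rate: {byte_rate:.3f}")
```

In the sample trace half of the requests hit, yet almost none of the bytes are saved, which is why byte hit rate is the more relevant metric when the goal is to reduce bandwidth for large P2P objects.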
Conclusions • P2P traffic accounts for the majority of HTTP bytes transferred • P2P objects are significantly larger than Web objects • A small number of large objects account for a large percentage of P2P traffic • A small number of clients and servers are responsible for the majority of P2P traffic • P2P traffic creates a significant bandwidth load