Deconstructing the KaZaA Network
Matei Ripeanu, joint work with Nathaniel Leibowitz and Adam Wierzbicki
P2P Impact: Widespread adoption
• KaZaA: 200 million downloads (3.5M/week), one of the most popular applications ever!
• Number of users for file-sharing applications (www.slyck.com, March ’03)
• Surveys: 25-30% of all customers at large ISPs use P2P file-sharing systems
P2P Impact (2): Huge traffic
• P2P-generated traffic now dominates the Internet load
• Internet2 traffic statistics
• UChicago estimate (March ’01): Gnutella control traffic is about 1% of all Internet traffic
• Cornell.edu (March ’02): 60% of traffic is P2P
Recent studies
Three recent measurement studies of Kazaa traffic:
• N. Leibowitz et al., “Are File Swapping Networks Cacheable? Characterizing P2P Traffic,” WCW7, Aug. 2002
• S. Sen, J. Wang, “Analyzing Peer-to-Peer Traffic Across Large Networks,” IMW, Nov. 2002
• S. Saroiu, K. Gummadi, R. Dunn, S. Gribble, H. Levy, “An Analysis of Internet Content Delivery Systems,” OSDI, Dec. 2002
Data collection
• Collect traces at border routers
• Traces from: UWashington, a Tier 1 ISP (AT&T?), a large Israeli ISP
• Identify (and log) Kazaa traffic based on:
  • port number (1214)
  • content of the HTTP request
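The two identification rules can be sketched as a per-flow check. This is a minimal sketch, not the classifier used in the studies: the function name `is_kazaa_flow` is hypothetical, and matching on an `X-Kazaa-` header prefix is an assumption standing in for "content of the HTTP request", since the slide does not say which fields were inspected.

```python
KAZAA_PORT = 1214  # default Kazaa port, from the slide


def is_kazaa_flow(src_port: int, dst_port: int, payload: bytes) -> bool:
    """Heuristically decide whether a TCP flow carries Kazaa traffic.

    Assumption (illustrative only): a flow is Kazaa if either endpoint uses
    port 1214, or if its payload is an HTTP GET request whose headers
    include one starting with "X-Kazaa-".
    """
    # Rule 1: port-based identification.
    if KAZAA_PORT in (src_port, dst_port):
        return True
    # Rule 2: content-based identification; only look at HTTP GET requests.
    if not payload.startswith(b"GET "):
        return False
    # Keep only the header block (everything before the blank line).
    head = payload.split(b"\r\n\r\n", 1)[0]
    # Skip the request line, then test each header name case-insensitively.
    for line in head.split(b"\r\n")[1:]:
        if line.lower().startswith(b"x-kazaa-"):
            return True
    return False
```

Checking content as well as the port matters because clients that move off port 1214 would otherwise be missed.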
Question 1:
• What is the overall bandwidth impact?
Bandwidth breakdown
(UW data, June 2002; source: Saroiu et al.)
UWashington measurements:
• Web = 14% of TCP traffic; P2P = 43% of TCP traffic
• P2P now dominates the Web in bandwidth consumed (about three times as much)
Inbound vs. outbound traffic
• UWashington acts like a huge content server: outbound (served) traffic is 7.6 times larger than inbound traffic
• Residential ISP: the situation is reversed, with inbound traffic more than 5 times larger than outbound traffic
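The figures on the last two slides (per-protocol share of TCP bytes, outbound/inbound ratio) are simple aggregates over flow records. The sketch below assumes a hypothetical `Flow` record that already carries a protocol label and a direction flag; the real traces' record format is not described in the slides.

```python
from collections import defaultdict
from typing import NamedTuple, Sequence


class Flow(NamedTuple):
    protocol: str   # hypothetical label, e.g. "web", "p2p", "other"
    outbound: bool  # True if the bytes leave the measured network
    nbytes: int     # bytes carried by this flow


def protocol_shares(flows: Sequence[Flow]) -> dict:
    """Fraction of all TCP bytes carried by each protocol label."""
    totals = defaultdict(int)
    for f in flows:
        totals[f.protocol] += f.nbytes
    grand_total = sum(totals.values())
    return {proto: b / grand_total for proto, b in totals.items()}


def outbound_inbound_ratio(flows: Sequence[Flow], protocol: str) -> float:
    """Outbound bytes divided by inbound bytes for one protocol."""
    out_bytes = sum(f.nbytes for f in flows if f.protocol == protocol and f.outbound)
    in_bytes = sum(f.nbytes for f in flows if f.protocol == protocol and not f.outbound)
    return out_bytes / in_bytes
```

A ratio above 1 means the network mostly serves content (the UWashington case); below 1 means it mostly consumes it (the residential ISP case).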
Question 2:
• What do the shared objects look like?
File size characteristics
• Possible file ranges:
  • 10KB-100KB: pictures
  • 1MB-5MB: songs
  • 10MB-200MB: applications, video clips
  • > 500MB: movies
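The ranges above amount to a bucketing rule on file size. A minimal sketch follows; the function name `size_class` is hypothetical, binary units (1KB = 1024 bytes) are an assumption, and sizes falling in the gaps between the slide's ranges are reported as "unclassified" rather than guessed.

```python
KB = 1024          # assumption: binary units
MB = 1024 * KB


def size_class(nbytes: int) -> str:
    """Map a file size to the coarse content type suggested by the slide."""
    if 10 * KB <= nbytes <= 100 * KB:
        return "picture"
    if 1 * MB <= nbytes <= 5 * MB:
        return "song"
    if 10 * MB <= nbytes <= 200 * MB:
        return "application/video clip"
    if nbytes > 500 * MB:
        return "movie"
    # Sizes between the ranges are not attributed to any type.
    return "unclassified"
```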
Question 3:
• What is the file popularity distribution?
Terminology:
• Download session: downloading one chunk of a file in a single HTTP session
• Download cycle: a complete download of a file
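One way to turn logged download sessions into download cycles is to accumulate bytes per (user, file) pair and count a cycle once the whole file has been fetched. This is a sketch under assumptions not stated in the slides: sessions arrive in time order, chunks do not overlap, and the function name `count_cycles` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def count_cycles(sessions: Iterable[Tuple[str, str, int]],
                 file_sizes: Dict[str, int]) -> Dict[str, int]:
    """Count complete download cycles per file.

    sessions: (user, file_id, bytes_in_session) tuples in time order.
    Assumption: chunks never overlap, so a cycle is complete once the
    bytes a user fetched for a file reach that file's size.
    """
    progress = defaultdict(int)  # bytes fetched so far per (user, file)
    cycles = defaultdict(int)    # completed cycles per file
    for user, file_id, nbytes in sessions:
        key = (user, file_id)
        progress[key] += nbytes
        if progress[key] >= file_sizes[file_id]:
            cycles[file_id] += 1
            progress[key] = 0    # a later re-download starts a new cycle
    return dict(cycles)
```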
File popularity distribution
• The 10% most popular files generate 60% of the download cycles
• The 1% (about 3,000) most popular files generate 25% of the download cycles
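Statements like "the top 10% of files generate 60% of the download cycles" come from sorting per-file counts and summing the head of the list. The helper below is a hypothetical sketch; rounding the top-k cutoff and keeping at least one item are choices made here, not taken from the slides. The same helper applies unchanged to per-file byte counts.

```python
from typing import Dict, Hashable


def top_share(counts: Dict[Hashable, float], fraction: float) -> float:
    """Share of the total contributed by the top `fraction` of items."""
    ordered = sorted(counts.values(), reverse=True)
    k = max(1, round(len(ordered) * fraction))  # size of the "most popular" set
    return sum(ordered[:k]) / sum(ordered)
```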
Question 4:
• How is the consumed bandwidth distributed among objects?
Traffic distribution: files
• The 1% most popular files generate 80% of the traffic
• The 0.1% most popular files (about 300) generate 50% of the traffic
• Compare with the UWashington traces, where the 1% most popular objects are responsible for ‘only’ 50% of the bytes transferred
Costs …
Cost to provide access to the most popular object for one month
Assumptions:
• OC3 line at $40K/month
• 5-day logs extrapolated to one month
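Under the two stated assumptions, the cost estimate is a short calculation: scale the bytes observed in the 5-day log up to a month, convert to an average bit rate, and charge that fraction of an OC3 line. The sketch further assumes a 30-day month, the nominal OC3 rate of 155.52 Mbit/s, and cost proportional to average utilization (ignoring peak provisioning). The constant and function names are hypothetical.

```python
OC3_BPS = 155.52e6               # nominal OC3 line rate, bits per second
OC3_MONTHLY_COST_USD = 40_000    # from the slide
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month


def monthly_cost(bytes_observed: float, days_observed: float = 5) -> float:
    """Estimated monthly cost (USD) of serving one object's traffic."""
    # Extrapolate the logged volume linearly to a full month.
    monthly_bytes = bytes_observed * (30 / days_observed)
    # Average bandwidth the object would consume over the month.
    avg_bps = monthly_bytes * 8 / SECONDS_PER_MONTH
    # Charge the corresponding fraction of an OC3 line.
    return avg_bps / OC3_BPS * OC3_MONTHLY_COST_USD
```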
Traffic distribution vs. file size
• Large (movie) files account for 60% of the bytes downloaded but only 5% of the download cycles
Question 6:
• Content dynamics and caching performance
Content dynamics
How many new files does the system see? (figure panels: per day, per hour)
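Counting new files per day or per hour means remembering every file already seen and bucketing first appearances by time. A minimal sketch with a hypothetical function name; note that the first bucket of any trace is inflated, because every file looks new when logging starts.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple


def new_files_per_bucket(requests: Iterable[Tuple[datetime, str]],
                         bucket: str = "%Y-%m-%d") -> Counter:
    """Count first-time-seen files per time bucket.

    requests: (timestamp, file_id) pairs in time order.
    bucket: strftime format; "%Y-%m-%d" gives per-day counts,
            "%Y-%m-%d %H" gives per-hour counts.
    """
    seen = set()
    counts = Counter()
    for ts, file_id in requests:
        if file_id not in seen:
            seen.add(file_id)
            counts[ts.strftime(bucket)] += 1
    return counts
```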
Content dynamics (2)
How stable is the set of most popular files?
• About 30% of files remain popular over a long period of time
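Stability of the popular set can be measured by taking the top-N files in each period and checking how many of the first period's top-N are still in every later top-N. The sketch below is one such metric, not necessarily the one used in the study; `top_n` and `retained_fraction` are hypothetical names, and ties in popularity are broken by first-seen order (the behavior of `Counter.most_common`).

```python
from collections import Counter
from typing import Dict, List, Set


def top_n(counts: Dict[str, int], n: int) -> Set[str]:
    """The n most requested files in one period."""
    return {f for f, _ in Counter(counts).most_common(n)}


def retained_fraction(period_counts: List[Dict[str, int]], n: int) -> float:
    """Fraction of the first period's top-n files present in every period's top-n."""
    tops = [top_n(c, n) for c in period_counts]
    still_popular = set.intersection(*tops)
    return len(still_popular) / len(tops[0])
```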
Achieved caching performance
Significant savings:
• File hit rates of 30-35%
• Byte hit rates of 50-60%
• P2P traffic is more cacheable than Web traffic
• But it takes a long time (weeks) to warm up the caches
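File hit rate and byte hit rate differ because a hit on a large file saves many more bytes than a hit on a small one. The trace-driven simulation below illustrates both metrics. It assumes an LRU replacement policy and whole-file requests (one per download cycle); the slides do not state which policy or request granularity produced the reported numbers, and `simulate_lru` is a hypothetical name.

```python
from collections import OrderedDict
from typing import Dict, List, Tuple


def simulate_lru(requests: List[str], sizes: Dict[str, int],
                 capacity_bytes: int) -> Tuple[float, float]:
    """Return (file hit rate, byte hit rate) of an LRU cache on a request trace."""
    cache = OrderedDict()  # file_id -> size, least recently used first
    used = hits = byte_hits = total_bytes = 0
    for file_id in requests:
        size = sizes[file_id]
        total_bytes += size
        if file_id in cache:
            cache.move_to_end(file_id)  # mark as most recently used
            hits += 1
            byte_hits += size
            continue
        if size > capacity_bytes:
            continue  # object can never fit; serve it without caching
        # Evict least recently used files until the new one fits.
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
        cache[file_id] = size
        used += size
    return hits / len(requests), byte_hits / total_bytes
```

Starting the simulation with an empty cache reproduces the warm-up effect: early requests are all misses, so hit rates measured over a short trace understate steady-state savings.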
Question 7:
• Virtual relationships between users
(Figure: outliers filtered out)
Small-world data-sharing graph
(Figure: the data-sharing graph compared with other small-world networks, namely food web, LANL coauthors, film actors, power grid, Web, Internet, word co-occurrences)
Data-sharing graph:
• Nodes == Kazaa users
• Link two users that have similar activities (download the same files)
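The data-sharing graph can be built by linking every pair of users who downloaded common files, after which small-world analysis starts from the clustering coefficient. This sketch computes only the average local clustering coefficient (average path length, the other small-world metric, is not shown). The `min_common` threshold for "similar activities" and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Set


def build_sharing_graph(downloads: Dict[str, Set[str]],
                        min_common: int = 1) -> Dict[str, Set[str]]:
    """Link users who downloaded at least `min_common` of the same files."""
    by_file = defaultdict(set)  # file_id -> users who downloaded it
    for user, files in downloads.items():
        for f in files:
            by_file[f].add(user)
    pair_counts = Counter()  # (u, v) -> number of shared files
    for users in by_file.values():
        for u, v in combinations(sorted(users), 2):
            pair_counts[(u, v)] += 1
    adj = {u: set() for u in downloads}
    for (u, v), shared in pair_counts.items():
        if shared >= min_common:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def clustering_coefficient(adj: Dict[str, Set[str]]) -> float:
    """Average local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges among u's neighbours.
        links = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)
```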
Future questions
• What savings can be realized without caching data, by only redirecting requests to local users?
• What can one say about the overall characteristics of the network (number of users, number of files, distributions) knowing only the data logged by one ISP?
Constraint:
• Lawmakers may cause P2P traffic to vanish
• However, this would lead to a new research question: how would the sudden disappearance of 60% of Internet traffic affect the Internet?
Your questions
• Thank you
Goals
High-level questions:
• What is the impact of these new content delivery systems on the Internet and on ISPs?
• What are the characteristics of the Kazaa traffic?