
Improving the Reliability of Internet Paths with One-hop Source Routing

Krishna P. Gummadi, Univ. of Washington. OSDI 2004.


Presentation Transcript


  1. Improving the Reliability of Internet Paths with One-hop Source Routing
     • Krishna P. Gummadi, Univ. of Washington
     • OSDI 2004

  2. Outline
     • One-line comment
     • Problem
     • Measurement study
     • Approach
     • Detailed policy
     • Real-world implementation
     • Critique

  3. One-line comment
     • Improving the reliability of Internet paths simply, with Scalable One-hop Source Routing (SOSR)
     [Figure: source and destination]

  4. Problem
     • Demands on Internet reliability are increasing
     • However, reliability falls far short of the "five 9s"
        • Encountering a path failure: 1.5% to 3.3%
        • Long recovery time
     • Suggested solutions
        • Server replication
           • Expensive; limited to high-end web sites
        • Multi-homing
           • BGP fail-over time is long [Labovitz 00]
        • Overlay routing networks (e.g., RON)
           • Monitoring and selecting paths incurs high overhead
     • Are there any simple, scalable solutions?
     [Figure: network with nodes A, B, C, D]

  5. Measurement study on environment
     • The realities of Internet path failures
        • Measure the availability of Internet paths broadly
        • Frequency and duration of failures
     • Find the potential of their approach (SOSR)
        • 2 important factors in its performance
        • Location of failures, success rate of alternate paths

  6. Methodology
     • Requesters: 67 PlanetLab nodes (monitor Internet paths)
     • Destinations: three different sets of hosts, 3153 in total
        • 378 popular web servers
        • 1139 broadband hosts
        • 1636 randomly selected IP addresses (for comparison)

  7. Methodology
     • Probe every 15 seconds
     • Probe every 5 seconds after a loss (no response within 3 sec)
     • 3 consecutive losses → failure
     • All other observers then probe together (to check alternative paths)
     • Path recovery: 10 consecutive responses after the failure
     [Figure: PL (PlanetLab) observer and its assigned destination]
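The probing rules on slide 7 can be read as a small state machine. The sketch below is one possible reading, not the authors' code: the class name `PathMonitor` and its fields are hypothetical, and it assumes the 5-second interval is kept until recovery is declared, which the slide does not state.

```python
from dataclasses import dataclass

NORMAL_INTERVAL_S = 15      # probe period while the path looks healthy
LOSS_INTERVAL_S = 5         # probe period after a loss
LOSSES_FOR_FAILURE = 3      # consecutive losses that declare a failure
REPLIES_FOR_RECOVERY = 10   # consecutive responses that declare recovery


@dataclass
class PathMonitor:
    """Hypothetical per-path state for one observer/destination pair."""
    failed: bool = False
    consecutive_losses: int = 0
    consecutive_replies: int = 0

    def record(self, got_reply: bool) -> int:
        """Record one probe outcome and return seconds until the next probe.

        A probe counts as lost when no response arrives within 3 seconds;
        that timeout is handled by the caller, not here.
        """
        if got_reply:
            self.consecutive_losses = 0
            self.consecutive_replies += 1
            if self.failed and self.consecutive_replies >= REPLIES_FOR_RECOVERY:
                self.failed = False
        else:
            self.consecutive_replies = 0
            self.consecutive_losses += 1
            if not self.failed and self.consecutive_losses >= LOSSES_FOR_FAILURE:
                self.failed = True
        # Assumption: keep probing fast while a loss streak or a failure is open.
        if self.failed or self.consecutive_losses > 0:
            return LOSS_INTERVAL_S
        return NORMAL_INTERVAL_S
```

A caller would sleep for the returned number of seconds between probes and, when `failed` flips to true, ask the other observers to probe the same destination.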

  8. Measured facts
     • Availability (frequency, duration)
        • "7 day study saw more failures than RON saw in 9 months!"
        • On average, each path failed at least once per week
        • 20% of all server paths were fault-free
        • 12% of all broadband host paths were fault-free
     • Web server paths: 99.6% availability
     • Broadband host paths: 94.4% availability
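To put the availability figures on slide 8 next to the "five 9s" target, they can be converted to expected downtime per week. This is a back-of-the-envelope sketch that assumes availability means the fraction of time a path is usable; the function name is illustrative.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10080


def downtime_minutes_per_week(availability_percent: float) -> float:
    """Expected unavailable minutes in one week for a given availability."""
    return (100.0 - availability_percent) / 100.0 * MINUTES_PER_WEEK


if __name__ == "__main__":
    for label, pct in [
        ("web server paths", 99.6),
        ("broadband host paths", 94.4),
        ("five 9s target", 99.999),
    ]:
        print(f"{label}: {downtime_minutes_per_week(pct):.2f} min/week")
```

Under that assumption, 99.6% corresponds to roughly 40 minutes of downtime per week and 94.4% to about 9.4 hours, against about 6 seconds per week for five 9s.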

  9. Measured facts
     • Location of failures
        • 4 different parts of an Internet path: src_side, backbone, dst_side, last_hop
        • Affects the number of alternative paths
           • Backbone: path diversity
           • Last hop: no choice
     [Figure: Observer, src_side, backbone (core), dst_side, last hop, Destination]

  10. Measured facts
     • Success rate of other observers during path failures
        • Can recover from a failure through the alternative paths
        • Select such a node as an intermediary node
     [Figure: PL Observer, failed path (X), Destination, PL Observer1]

  11. Approach - One Hop Source Routing
     • How should we select intermediaries?
     [Figure: Requester, Destination, Observers 1 to 4]

  12. Which intermediary?
     • Number of useful intermediaries: 100% - 20% = 80%; 21 or more nodes
     • Let's pick k intermediary nodes randomly!
        → No state maintenance!

  13. How many intermediaries?
     • The knee in the graph is at k = 4
     • 4 intermediaries are enough!
        → Less overhead, high recovery rate

  14. Approach - One Hop Source Routing
     • Let's select 4 intermediaries randomly!
     [Figure: Requester, Destination, Observers 1 to 5]
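The random-k policy from slides 12 to 14 needs no per-path state: on a failure, pick k intermediaries uniformly at random and retry through them. The sketch below is illustrative only. `try_via` is a hypothetical callback standing in for the actual forwarding attempt, the intermediaries are tried one after another for simplicity (the slides do not say whether attempts are sequential or parallel), and the probability helper assumes that a fixed number of candidates are useful for a given failure.

```python
import random
from math import comb
from typing import Callable, List, Optional, Sequence


def pick_random_k(candidates: Sequence[str], k: int,
                  rng: Optional[random.Random] = None) -> List[str]:
    """Choose up to k distinct intermediaries uniformly at random."""
    rng = rng or random.Random()
    return rng.sample(list(candidates), min(k, len(candidates)))


def recover_via_random_k(candidates: Sequence[str], k: int,
                         try_via: Callable[[str], bool],
                         rng: Optional[random.Random] = None) -> Optional[str]:
    """Return the first randomly chosen intermediary that works, else None."""
    for node in pick_random_k(candidates, k, rng):
        if try_via(node):  # hypothetical: forward the request through `node`
            return node
    return None


def p_at_least_one_useful(total: int, useful: int, k: int) -> float:
    """Chance that k random picks out of `total` include a useful node."""
    return 1.0 - comb(total - useful, k) / comb(total, k)
```

As a worked example under assumed numbers (66 candidate nodes, of which 21 can reach the destination), `p_at_least_one_useful(66, 21, 4)` is 1 - C(45,4)/C(66,4), which is about 0.79. This illustrates why a small k can already recover most failures that are recoverable at all.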

  15. Result of random-4
     • For servers, it recovered from
        • 50% of near-source-side failures
        • 89% of middle-core failures
        • 72% of destination-side failures
        • 40% of last-hop failures

  16. Improving random-k
     • Assumption: a disjoint path can recover from a failure
        • It doesn't share the failed link
     • 1. History-k
        • (k-1 random nodes) + the most recently successful node (assumed to be disjoint)
     • 2. BGP-paths-k
        • Try to use the most disjoint path for recovery
        • Select the paths with the fewest ASs in common
        • Have to sort intermediaries by the number of common ASs
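The two refinements on slide 16 differ only in how the k intermediaries are chosen. The following is a rough sketch rather than the authors' implementation: the function names, the `last_success` argument, and the AS-set inputs are all hypothetical, and the AS paths are assumed to be already known to the caller.

```python
import random
from typing import Dict, List, Optional, Sequence, Set


def history_k(candidates: Sequence[str], k: int, last_success: Optional[str],
              rng: Optional[random.Random] = None) -> List[str]:
    """k-1 random nodes plus the intermediary that most recently succeeded."""
    if k <= 0:
        return []
    rng = rng or random.Random()
    pool = [c for c in candidates if c != last_success]
    if last_success is None or last_success not in candidates:
        # No usable history yet: fall back to plain random-k.
        return rng.sample(pool, min(k, len(pool)))
    return [last_success] + rng.sample(pool, min(k - 1, len(pool)))


def bgp_paths_k(direct_path_ases: Set[int],
                via_path_ases: Dict[str, Set[int]], k: int) -> List[str]:
    """Pick the k intermediaries whose paths share the fewest ASs with the
    failed direct path. Ties are broken by node name to stay deterministic."""
    ranked = sorted(
        via_path_ases,
        key=lambda node: (len(via_path_ases[node] & direct_path_ases), node),
    )
    return ranked[:k]
```

For example, with a direct path through ASs {1, 2, 3, 4} and candidate paths x: {1, 2, 9}, y: {8, 9}, z: {1, 7}, `bgp_paths_k` with k = 2 prefers y (no shared AS) and then z (one shared AS). Unlike random-k, this needs AS-path knowledge and a sort, which is the extra cost the slide points out.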

  17. Real-world implementation
     [Figure: Requester and Intermediary]

  18. The Test
     • 3 machines running wget at U. of Washington
     • 982 web servers, 1 web page fetched per sec for 3 days
     • 273,000 total requests
     • All machines fetched the same page at the same time
        • So that they see the same path failures
     • 3 techniques: wget, wget-sosr and wget-aggressiveTCP

  19. Result
     • The failure rate was only 0.18%
     • wget-sosr: recovered from 56% of network-level failures
     • However, because of application-level failures, the overall recovery rate is 20%

  20. Critique
     • Strong points
        • A new approach without an overlay network
        • Stateless, simple, scalable
     • Weak points
        • Latency
           • The simple approach doesn't consider latency
           • Response time of a web page is critical!
        • Supporting stateful connections
           • Many web-based communications are stateful
           • Needs some improvement to support stateful connection recovery
        • Limited to only a few applications
           • Users have to install additional software for the applications

  21. Some optimizations
