
Effective Diagnosis of Routing Disruptions from End Systems

Ying Zhang, Z. Morley Mao, Ming Zhang



  1. Effective Diagnosis of Routing Disruptions from End Systems. Ying Zhang, Z. Morley Mao, Ming Zhang

  2. Routing disruptions impact application performance • More applications today have high QoS requirements • Routing events can cause high loss and long delays [Figure labels: AS A, AS B, AS C, AS D, AS E, Internet, Dst, Src]

  3. Existing approaches to diagnose routing disruptions are ISP-centric • Require routing data from many routers in ISPs [Feldmann04, Teixeira04, Wu05] • Passive and accurate BGP collectors [Figure labels: AS A, AS B, AS C, AS D, Internet]

  4. Limitations of ISP-centric approaches • Difficult to gain access to data from many ISPs • BGP data reflects “expected” data-plane paths [Figure labels: ISP, End-systems, AS A, AS B, AS C, AS D, Internet]

  5. Can we diagnose entirely from end systems? • Goal: infer data-plane paths of many routers [Figure labels: Probing host, ISP A, AS B, AS C, AS D, Dst]

  6. Our approach: end-system-based monitoring • Only require probing from end hosts • Cover all the PoPs of a target ISP [Figure labels: Probing host, Target ISP, AS B, AS C, AS D, Dst]

  7. Our approach: end-system-based monitoring • Cover most of the destinations on the Internet [Figure labels: Probing host, ISP A, AS B, AS C, AS D, several Dst]

  8. Our approach: end-system-based monitoring • Identify routing changes by comparing paths measured consecutively [Figure labels: Probing host, ISP A, AS B, AS C, AS D, Dst]
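The comparison step on slide 8 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the snapshot layout (a dict from destination to a hop tuple), the function name `detect_changes`, and the sample hop names are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Path = Tuple[str, ...]       # hop sequence, e.g. router or PoP names (hypothetical format)
Snapshot = Dict[str, Path]   # destination -> path measured in one probing round


def detect_changes(prev: Snapshot, curr: Snapshot) -> List[Tuple[str, Path, Path]]:
    """Return (destination, old_path, new_path) for each destination whose path differs."""
    events = []
    # Only destinations probed in both rounds can be compared.
    for dst in sorted(prev.keys() & curr.keys()):
        if prev[dst] != curr[dst]:
            events.append((dst, prev[dst], curr[dst]))
    return events


# Two consecutive snapshots with made-up hop names.
prev = {
    "10.0.0.0/24": ("host", "popA", "popB", "dst"),
    "10.0.1.0/24": ("host", "popA", "popC", "dst"),
}
curr = {
    "10.0.0.0/24": ("host", "popA", "popB", "dst"),
    "10.0.1.0/24": ("host", "popA", "popD", "dst"),
}
events = detect_changes(prev, curr)
print(events)
```

A real deployment would also have to cope with the measurement noise listed among the challenges on slide 9 (unresponsive hops, load balancing), which this sketch ignores.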

  9. Advantages and challenges • Advantages: • No need to access ISP-proprietary data • Identify actual data-plane paths • Monitor data-plane performance • Challenges: • Limited resources to probe • Coverage of probed paths • Timing granularity • Measurement noise

  10. System architecture • Collaborative probing • Event identification and classification • Event correlation and inference • Event impact analysis • Reports [Figure labels: Target ISP]

  11. Outline • Collaborative probing • Event identification and classification • Event correlation and inference • Result and validation

  12. Collaborative probing • Using a set of hosts • To learn the routing state • To improve coverage • To reduce overhead [Figure labels: Probing host, ISP A, AS B, AS C, AS D]

  13. Outline • Collaborative probing • Event identification and classification • Event correlation and inference • Result and validation

  14. Event classification • Classify events according to ingress/egress changes • Type 1: ingress PoP changes • Type 2: ingress PoP same, egress PoP different • Type 3: ingress PoP same, egress PoP same [Figure labels: Destination prefix P, Target ISP, Probing host]
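The three event types on slide 14 reduce to a two-step comparison of ingress and egress PoPs. The sketch below assumes each measured path has already been mapped to the PoPs where it enters and leaves the target ISP; the `IspTraversal` type and the PoP names are hypothetical.

```python
from typing import NamedTuple


class IspTraversal(NamedTuple):
    """Where a probed path enters and leaves the target ISP (hypothetical record)."""
    ingress_pop: str
    egress_pop: str


def classify_event(old: IspTraversal, new: IspTraversal) -> int:
    """Return the event type (1, 2, or 3) as defined on slide 14."""
    if old.ingress_pop != new.ingress_pop:
        return 1  # Type 1: ingress PoP changes
    if old.egress_pop != new.egress_pop:
        return 2  # Type 2: same ingress PoP, different egress PoP
    return 3      # Type 3: same ingress PoP, same egress PoP


print(classify_event(IspTraversal("NYC", "LAX"), IspTraversal("CHI", "LAX")))
print(classify_event(IspTraversal("NYC", "LAX"), IspTraversal("NYC", "SEA")))
print(classify_event(IspTraversal("NYC", "LAX"), IspTraversal("NYC", "LAX")))
```

A Type 3 result only makes sense for an event that was already detected, so in that case the change lies in the internal PoP path or in the external AS path.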

  15. Outline • Collaborative probing • Event identification and classification • Event correlation and inference • Result and validation

  16. Likely causes: link failures [Figure labels: Neighbor AS, Destination prefix P, Old egress PoP, New egress PoP, Old path, New path, Target ISP, Probing host]

  17. Likely causes: internal distance changes • Hot potato changes • Cost of old internal path increases • Cost of new internal path decreases [Figure labels: Neighbor AS, Old egress PoP, New egress PoP, Probing host; distance: 120, distance: 80, distance: 100, distance: 120]
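Hot-potato routing sends traffic to the egress with the smallest internal distance, so a change in internal cost can move the egress even when nothing changes externally. The sketch below illustrates that effect. The distance values echo numbers visible in the slide's figure, but which value belongs to which egress is an assumption, and `pick_egress` is a hypothetical helper.

```python
from typing import Dict


def pick_egress(internal_distance: Dict[str, int]) -> str:
    """Hot-potato choice: the egress PoP with the smallest internal distance.

    Ties are broken by name so the result is deterministic.
    """
    return min(sorted(internal_distance), key=internal_distance.get)


# Assumed mapping of distances to egress PoPs (not stated in the transcript).
before = {"old_egress": 80, "new_egress": 100}
after = {"old_egress": 120, "new_egress": 100}  # cost of the old internal path increases

print(pick_egress(before))  # old_egress
print(pick_egress(after))   # new_egress
```

The same egress shift would occur if the new path's cost dropped below the old one, which is the second bullet's case.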

  18. Event correlation • Spatial correlation: a single network failure often affects multiple routers • Temporal correlation: routing events occurring close together are likely due to only a few causes
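One simple way to exploit temporal correlation is to group events whose timestamps fall close together and then look for a common cause within each group. The slides do not say how the system groups events, so the gap-based rule, the `max_gap` parameter, and the sample timestamps below are assumptions for illustration only.

```python
from typing import List


def cluster_by_time(timestamps: List[float], max_gap: float) -> List[List[float]]:
    """Group timestamps so that consecutive members of a group are at most max_gap apart."""
    clusters: List[List[float]] = []
    for t in sorted(timestamps):
        if clusters and t - clusters[-1][-1] <= max_gap:
            clusters[-1].append(t)   # close enough to the previous event: same group
        else:
            clusters.append([t])     # otherwise start a new group
    return clusters


# Event times in seconds (made-up values).
groups = cluster_by_time([100, 5, 12, 103, 8], max_gap=10)
print(groups)  # [[5, 8, 12], [100, 103]]
```

Spatial correlation would add a second grouping key, such as the routers or links shared by the affected paths.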

  19. Inference methodology • Evidence: an event that supports the cause [Figure labels: Destination prefix P, Link L, Cause: Link L is down, New egress, New path, Target ISP, Probing host]

  20. Inference methodology • Conflict: a measurement trace that conflicts with the cause [Figure labels: Destination prefix P, Link L, Cause: Link L is down, New egress, New path, Target ISP, Probing host]

  21. Inference methodology • Evidence node: [1,2,3]->[1,2,4] • Cause: node 3 withdraws the route • Cause: link 2-3 down [Figure labels: AS 1, AS 2, AS 3, AS 4, Withdrawal]

  22. Inference methodology: evidence graph • Evidence node: [1,2,3]->[1,2,4] • Evidence node: [0,2,3]->[0,2,4] • Cause: node 3 withdraws the route • Cause: link 2-3 down [Figure labels: AS 0, AS 1, AS 2, AS 3, AS 4, Withdrawal]

  23. Inference methodology: conflict graph • Conflict node: [1,2,3,6] • Conflict node: [0,2,3,6] • Conflict node: [0,2,3] • Cause: link 2-3 down • Cause: node 3 withdraws the route [Figure labels: AS 0, AS 1, AS 2, AS 3, AS 6]

  24. Inference methodology: evidence graph and conflict graph • Greedy algorithm: minimum set of causes that can explain all the evidence while minimizing conflicts • Evidence nodes: [1,2,3]->[1,2,4], [0,2,3]->[0,2,4] • Conflict nodes: [1,2,3,6], [0,2,3,6], [0,2,3] • Evidence: 2, Conflicts: 3 • Evidence: 2, Conflicts: 0
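The greedy selection on slide 24 can be sketched as a set-cover loop over the two bipartite graphs. The scoring rule used here (most newly explained evidence first, fewest conflicts as the tie-breaker) is an assumption, since the slides do not give the exact criterion. Attributing the "Evidence: 2, Conflicts: 3" count to the link-down cause and "Evidence: 2, Conflicts: 0" to the withdrawal cause is also an assumption read off the figure labels.

```python
from typing import Dict, List, Set


def greedy_causes(evidence: Dict[str, Set[str]],
                  conflicts: Dict[str, Set[str]]) -> List[str]:
    """Pick causes until all evidence is explained.

    evidence:  cause -> evidence nodes it explains (evidence graph)
    conflicts: cause -> measurement traces that contradict it (conflict graph)
    """
    unexplained: Set[str] = set().union(*evidence.values())
    chosen: List[str] = []
    while unexplained:
        def score(cause: str):
            gain = len(evidence[cause] & unexplained)
            return (gain, -len(conflicts.get(cause, set())))
        # Some cause always has gain > 0 because unexplained is a subset of the union.
        best = max(sorted(evidence), key=score)
        chosen.append(best)
        unexplained -= evidence[best]
    return chosen


e1, e2 = "[1,2,3]->[1,2,4]", "[0,2,3]->[0,2,4]"
evidence = {
    "link 2-3 down": {e1, e2},
    "node 3 withdraws the route": {e1, e2},
}
conflicts = {  # assumed assignment of the slide's conflict counts
    "link 2-3 down": {"[1,2,3,6]", "[0,2,3,6]", "[0,2,3]"},
    "node 3 withdraws the route": set(),
}
result = greedy_causes(evidence, conflicts)
print(result)  # ['node 3 withdraws the route']
```

Both causes explain the same two events, so the conflict count decides. Paths that still traverse link 2-3 contradict the link-down hypothesis, which leaves the withdrawal as the preferred explanation.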

  25. Outline • Collaborative probing • Event identification and classification • Event correlation and inference • Result and validation

  26. ISPs studied

  27. Results of event classification • Many events are internal changes • Abilene has many ingress changes

  28. Validation with BGP-based approach [Wu05] • Hot potato changes: egress point changes due to internal distance changes [Table columns: number of incidents identified by both, number of incidents identified by our method, number of incidents identified by BGP method, false negatives, false positives]

  29. Validation with BGP-based approach • Session resets: peering link up/down • Reasons for inaccuracy: • Limited coverage • Coarse-grained probing • Measurement noise

  30. System performance • Can keep up with generated routing state • Applicable for real-time diagnosis and mitigation • Reactive: construct alternate paths to bypass the problem • Proactive: avoid paths with many historical routing disruptions

  31. Conclusion • Developed the first system to diagnose routing disruptions purely from end systems • Used a simple greedy algorithm on two bipartite graphs to infer causes • Comprehensively validated the accuracy

  32. Thank you! Questions?

  33. Performance impact analysis • End-to-end latency changes caused by different types of routing events

  34. Validation with BGP data • BGP feeds from RouteViews, RIPE, Abilene, and 29 BGP feeds from a Tier-1 ISP • The destination prefix coverage and the routing event detection rate

  35. Event classification: same ingress PoP, different egress PoP • Policy changes • Local preference in the old route decreases • Local preference in the new route increases [Figure labels: Neighbor AS, Local Pref: 60->110, Local Pref: 100->50, Old egress PoP, New egress PoP, Old path, New path, Target ISP, Probing host]
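A policy change moves the egress because BGP prefers the route with the highest local preference before it considers other attributes. The sketch below models only that first step. Reading the figure's "100->50" as the old route's change and "60->110" as the new route's change is an assumption, and `best_route` is a hypothetical helper.

```python
from typing import Dict


def best_route(local_pref: Dict[str, int]) -> str:
    """Pick the route with the highest local preference (ties broken by name)."""
    return max(sorted(local_pref), key=local_pref.get)


baseline = {"old_egress": 100, "new_egress": 60}

# Case 1: local preference of the old route decreases (100 -> 50).
old_decreases = {**baseline, "old_egress": 50}

# Case 2: local preference of the new route increases (60 -> 110).
new_increases = {**baseline, "new_egress": 110}

print(best_route(baseline))       # old_egress
print(best_route(old_decreases))  # new_egress
print(best_route(new_increases))  # new_egress
```

Either change alone is enough to flip the decision, which is why the slide lists them as separate bullets.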

  36. Event classification: same ingress PoP, different egress PoP • External routing changes • Old route worsens due to external factors (withdrawal, longer AS path) • New route improves due to external factors [Figure labels: AS A, AS B, ABCD->ABEFD, BCEFD->BEFD, Old egress PoP, New egress PoP, Old path, New path, Target ISP, Probing host]

  37. Event classification: same ingress PoP, same egress PoP • Internal PoP path changes • Cost of old internal path increases • Cost of new internal path decreases • External AS path changes [Figure labels: Destination prefix P, New path, Old path, Target ISP, Probing host]

  38. Results of cause inference • Effectiveness of inference algorithm • Clusters: a group of events with the same root cause

  39. Event identification • A routing event: path changes • Event identification: comparing consecutive routing snapshots [Figure labels: Probing host, ISP A, AS B, AS C, AS D, Dst]
