
Internet Routing Instability

Craig Labovitz, G. Robert Malan, Farnam Jahanian, University of Michigan. Presented by Krishnanand M Kamath.


Presentation Transcript


  1. Internet Routing Instability
     Craig Labovitz, G. Robert Malan, Farnam Jahanian, University of Michigan
     Presented by Krishnanand M Kamath

  2. Cause and Effect
     Routing instability: the rapid change of network reachability and topology information.
     • Causes
       • Router configuration errors
       • Transient physical and data-link problems (leased-line problems, router failures, high levels of congestion)
       • Software configuration errors
     • Effects
       • Numerous and wide-ranging

  3. Effects
     • Increased network latency and time to convergence
     • Dropped packets and out-of-order delivery
     • Miserable end-to-end performance
     • Loss of connectivity in national networks
     • Routers with a route-caching architecture and low-end CPUs are vulnerable to a cascade:
       • The probability of a cache miss increases, causing severe CPU load and memory problems
       • Packet processing is delayed, including Keep-Alive packets
       • Peers flag the router as down and transmit updates
       • The "down" router reinitiates its peering sessions
       • Large state dumps are transmitted
       • Yet more routers fail: a route flap storm

  4. Solutions
     • Route aggregation
       • Reduces the overall number of networks visible in the core
       • Requires cooperation between service providers
       • Complicated by redundant connectivity to the Internet (multi-homing)
     • Route dampening algorithms
       • Not a panacea: legitimate announcements may be delayed
     • Overall
       • Multi-homing is exhibiting linear growth
       • Internet topology is growing less hierarchical
       • Topological complexity is increasing
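
The dampening trade-off noted above can be illustrated with a minimal sketch of a per-route penalty that grows with each flap and decays exponentially over time. The class name `FlapDamper` and every threshold value here are assumptions chosen for illustration; they are not taken from the slides or from any particular router implementation.

```python
class FlapDamper:
    """Toy per-route flap penalty with exponential decay (illustrative values)."""

    def __init__(self, penalty_per_flap=1000.0, suppress_at=2000.0,
                 reuse_at=750.0, half_life_s=900.0):
        self.penalty_per_flap = penalty_per_flap
        self.suppress_at = suppress_at    # start suppressing at this penalty
        self.reuse_at = reuse_at          # stop suppressing below this penalty
        self.half_life_s = half_life_s    # penalty halves every half_life_s
        self.penalty = 0.0
        self.last_t = 0.0
        self.suppressed = False

    def _decay(self, now):
        # Exponentially decay the accumulated penalty since the last event.
        elapsed = now - self.last_t
        self.penalty *= 0.5 ** (elapsed / self.half_life_s)
        self.last_t = now

    def flap(self, now):
        """Record one flap at time `now`; return True if the route is suppressed."""
        self._decay(now)
        self.penalty += self.penalty_per_flap
        if self.penalty >= self.suppress_at:
            self.suppressed = True
        return self.suppressed

    def usable(self, now):
        """Return True if the route may be advertised at time `now`."""
        self._decay(now)
        if self.suppressed and self.penalty < self.reuse_at:
            self.suppressed = False
        return not self.suppressed
```

With these assumed values, three flaps within 20 seconds suppress the route, and it stays suppressed until the penalty decays below the reuse threshold. A legitimate announcement arriving during that window is held back, which is the delay the slide warns about.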

  5. Recall Updates
     • Announcements
       • A new route
       • A new policy decision for an existing route
     • Withdrawals
       • Explicit: associated with a withdrawal message
       • Implicit: an existing route is replaced by the announcement of a new route

  6. Types of Updates
     Inter-domain routing updates fall into three classes:
     • Forwarding instability: legitimate topological changes that affect the paths on which data is forwarded between ASes
     • Routing policy fluctuation: changes in routing policy information that may not affect forwarding paths between ASes
     • Pathological updates: redundant BGP information that reflects neither routing nor forwarding instability

  7. Major Results
     • The number of BGP updates is one or more orders of magnitude larger than expected
     • Routing information is dominated by pathological updates
     • Instability and redundant updates exhibit periodicities of 30 and 60 seconds
     • Instability and redundant updates correlate with network usage
     • Instability is not dominated by a small set of ASes or routes
     • Even after discounting policy fluctuation and pathological behavior, a significant level of Internet forwarding instability remains
     • Specific architectural and protocol implementation changes were made in commercial Internet routers through collaboration with vendors

  8. Taxonomy
     • Data analyzed: sequences of BGP updates for each (prefix, peer) tuple
     • Events identified
       • WADiff: a route is explicitly withdrawn as it becomes unreachable and is later replaced with an alternative route to the same destination. The alternative route differs in its ASPATH or nexthop attribute information. (Forwarding instability)
       • AADiff: a route is implicitly withdrawn and replaced by an alternative route as the original route becomes unreachable, or a preferred alternative path becomes available. (Forwarding instability)

  9. Taxonomy (contd.)
     • Events identified (contd.)
       • WADup: a route is explicitly withdrawn and then re-announced as reachable. This may reflect a transient topological failure, or it may represent a pathological oscillation. (Forwarding instability or pathological behavior)
       • AADup: a route is implicitly withdrawn and replaced with a duplicate of the original route. A duplicate route is a subsequent route announcement that does not differ in nexthop or ASPATH attribute information. (Pathological behavior or routing policy fluctuation)
       • WWDup: the repeated transmission of BGP withdrawals for a prefix that is currently unreachable. (Pathological behavior)
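
The five event types can be read as a small state machine over each (prefix, peer) update sequence. The following is a minimal sketch under that reading; the `Update` record, its field names, and the `classify` function are hypothetical and do not reflect the authors' actual tooling. A first announcement and an ordinary withdrawal of a reachable route receive no label.

```python
from typing import List, NamedTuple, Optional, Tuple


class Update(NamedTuple):
    peer: str
    prefix: str
    kind: str                       # "A" = announcement, "W" = withdrawal
    aspath: Tuple[int, ...] = ()
    nexthop: str = ""


def classify(updates: List[Update]) -> List[Optional[str]]:
    """Label each update with a taxonomy event, or None if it is not one."""
    # (prefix, peer) -> (kind of last update, last announced route or None)
    state = {}
    labels = []
    for u in updates:
        key = (u.prefix, u.peer)
        route = (u.aspath, u.nexthop)
        prev = state.get(key)
        label = None
        if u.kind == "W":
            if prev is not None and prev[0] == "W":
                label = "WWDup"     # withdrawal of an already-withdrawn prefix
            last_route = prev[1] if prev is not None else None
            state[key] = ("W", last_route)
        else:
            if prev is not None and prev[1] is not None:
                same = (route == prev[1])
                if prev[0] == "W":
                    label = "WADup" if same else "WADiff"
                else:
                    label = "AADup" if same else "AADiff"
            state[key] = ("A", route)
        labels.append(label)
    return labels
```

Following the definition on this slide, two routes count as duplicates when both the ASPATH and the nexthop match; other attributes are ignored.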

  10. Methodology
     • Data collected: BGP routing messages
     • Time period: nine months, starting January 1996
     • Where: five of the major U.S. network exchange points
     • Tool: Unix-based route servers, Multi-threaded Routing Toolkit (MRT)

  11. Gross Observations
     • We expect: instability proportional to the number of globally visible addresses and the total number of available paths
     • We observe: for 45,000 prefixes and 1,500 paths, 3 to 6 million updates per day
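
To put the observed volume in perspective, the figures on this slide can be turned into a per-prefix rate. This is plain arithmetic on the slide's numbers; the constant and function names are chosen here for illustration.

```python
PREFIXES = 45_000
UPDATES_PER_DAY = (3_000_000, 6_000_000)   # low and high ends of the observed range


def updates_per_prefix_per_day(updates, prefixes=PREFIXES):
    """Average number of updates each prefix would see per day."""
    return updates / prefixes


low, high = (updates_per_prefix_per_day(u) for u in UPDATES_PER_DAY)
print(f"{low:.0f} to {high:.0f} updates per prefix per day")
```

That works out to roughly 67 to 133 updates per prefix per day on average, far more than the topology changes one would expect for a typical destination.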

  12. Pathological Behavior
     • Disturbing behaviors
       • Most BGP updates are entirely pathological (WWDup)
       • A single service provider can have a disproportionate effect on global routing
       • There is a causal relationship between the manufacturer of a router and its level of pathological behavior
       • Routing updates have a regular, specific periodicity of either 30 or 60 seconds
       • Pathological behavior typically persists for under five minutes

  13. Origins of Pathologies
     • Stateless BGP: withdrawals are sent for every explicitly and implicitly withdrawn prefix, because the router keeps no state on the information it has advertised to peers
     • Plausible explanations
       • CSU timer problems
       • Unjittered 30-second interval timer, leading to self-synchronization
       • Misconfigured interaction of IGP and BGP protocols
       • Router vendor software bugs
       • Unconstrained routing policies
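
One common remedy for timer self-synchronization is to randomize each interval slightly so that routers drift apart instead of locking step. A minimal sketch follows; the function name `next_interval` and the 25 percent jitter range are assumptions made for illustration, not values from the slides.

```python
import random


def next_interval(base_s=30.0, jitter_fraction=0.25, rng=random):
    """Return the base interval shortened by a random fraction.

    The result lies in (base_s * (1 - jitter_fraction), base_s], so successive
    timer firings on different routers are unlikely to stay aligned.
    """
    return base_s * (1.0 - jitter_fraction * rng.random())
```

Passing a seeded `random.Random` instance as `rng` makes the sequence reproducible for testing, while the default uses the module-level generator.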

  14. Analysis of Instability
     Instability is measured as the sum of AADiff, WADiff, and WADup updates.

  15. Fine-grained Instability Statistics
     There is no correlation between the size of an AS and its proportion of the instability statistics.

  16. Fine-grained Instability Statistics
     • No single AS or prefix consistently dominates the instability statistics
     • Instability is evenly distributed across routes

  17. Temporal Properties of Instability
     • Plausible causes for the periodicity
       • Routing software timers, self-synchronization, and routing loops
       • CSU handshaking timeouts
       • A flaw in the routing protocol

  18. Origins of Internet Routing Instability
     Craig Labovitz, G. Robert Malan, Farnam Jahanian, University of Michigan

  19. Introduction
     • We observed
       • Several orders of magnitude more routing updates than expected
       • A large number of duplicate routing messages
       • Unexpected frequency components between instability events
     • This work extends the earlier analysis by
       • Identifying the origins of much of the pathological behavior
       • Examining the impact of the specific commercial router software changes suggested earlier
       • Proposing additional router software changes that can decrease the updates exchanged by an additional 30 percent or more

  20. Major Results
     • The volume of inter-domain routing updates has decreased by an order of magnitude since April 1997
     • The majority of BGP messages consist of redundant announcements
     • A growing proportion of instability stems from specific changes in Internet architecture coupled with limitations in router software and algorithms
     • Instability is not disproportionately dominated by prefixes of specific lengths
     • Persistently oscillating routes dominate the BGP traffic generated by a few Internet providers
     • A number of the origins of pathological routing behavior postulated in the earlier work were confirmed experimentally

  21. Analysis of Gross Trends
     • Note
       • A dramatic decrease in the number of withdrawals
       • The number of announcements has doubled over the 28-month period
       • The growth in BGP announcements is disproportionate to any corresponding increase in the number of routing table entries

  22. Taxonomy
     Analyze sequences of BGP updates for each (prefix, peer) tuple and identify the events:
     • AADup: a route is implicitly withdrawn and replaced with a duplicate of the original route. A duplicate route is a subsequent route announcement that does not differ in any BGP path attribute information.
     • AADiff: a route is implicitly withdrawn and replaced by an alternative route as the original route becomes unreachable, or a preferred alternative path becomes available.
     • Tup and Tdown: fluctuation in the reachability of a given prefix
       • Tup: a currently unreachable prefix is announced as reachable and transitions up
       • Tdown: an announced route is withdrawn and transitions down
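
The Tup and Tdown events amount to tracking a single reachability bit per (prefix, peer) pair. A minimal sketch, assuming the prefix starts out unreachable and that updates are reduced to the hypothetical codes "A" (announcement) and "W" (withdrawal):

```python
def reachability_transitions(kinds):
    """Label each update in one (prefix, peer) sequence as Tup, Tdown, or None."""
    reachable = False               # assumption: the prefix starts unreachable
    labels = []
    for kind in kinds:
        if kind == "A" and not reachable:
            labels.append("Tup")    # unreachable prefix announced as reachable
            reachable = True
        elif kind == "W" and reachable:
            labels.append("Tdown")  # announced route withdrawn
            reachable = False
        else:
            labels.append(None)     # no change in reachability
    return labels
```

Updates that leave reachability unchanged get no label here; in the taxonomy above, repeated announcements would instead be examined as AADup or AADiff.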

  23. Analysis of Update Categories
     • AADup behavior stems from
       • Non-transitive attribute filtering
       • The combination of the BGP minimum advertisement timer with stateless BGP
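
The contrast with a stateless sender can be sketched by having the router remember what it last advertised to each peer and suppress anything redundant. The class `PeerAdvertiser` and its method names are hypothetical, intended only to illustrate the idea rather than any vendor's implementation.

```python
class PeerAdvertiser:
    """Remembers what was last advertised to one peer and drops redundant updates."""

    def __init__(self):
        self.sent = {}              # prefix -> route last advertised to this peer

    def announce(self, prefix, route):
        """Return the update to send, or None if it would duplicate the last one."""
        if self.sent.get(prefix) == route:
            return None             # would be an AADup at the receiver
        self.sent[prefix] = route
        return ("A", prefix, route)

    def withdraw(self, prefix):
        """Return the withdrawal to send, or None if nothing was advertised."""
        if prefix not in self.sent:
            return None             # would be a WWDup at the receiver
        del self.sent[prefix]
        return ("W", prefix)
```

The cost is one table entry per advertised prefix per peer, which is the state a stateless implementation avoids keeping.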

  24. Analysis of AADiffs
     • Note
       • A low percentage of ASPath AADiffs
       • Growth in the number of origin AADiffs is related to architecture and policy issues
       • Growth in the number of community AADiffs reflects the attribute's recent adoption by many ISPs
       • Oscillations in MED are due to the IBGP-mapped MED policy at two service providers
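
Breaking AADiffs down by attribute comes down to comparing two successive announcements and reporting which path attributes changed. A minimal sketch, assuming each announcement is represented as a dictionary of attribute names to values (a representation chosen here for illustration):

```python
def changed_attributes(old, new):
    """Return the sorted names of attributes that differ between two announcements."""
    keys = set(old) | set(new)
    return sorted(k for k in keys if old.get(k) != new.get(k))
```

An empty result means the second announcement is a duplicate (AADup); a result such as `["med"]` would count toward the MED category of AADiffs.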

  25. IBGP Mapped MED

  26. Frequency
     • Recall
       • Frequency is defined as the inverse of the inter-arrival time between routing updates
       • The predominant frequencies correspond to 30-second and 60-second periodicities
     • Cause
       • The frequency components stem from a fixed minimum BGP advertisement timer used by at least one router vendor
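
Periodicity of this kind shows up when the gaps between successive updates are binned. A minimal sketch, assuming timestamps in seconds for a single (prefix, peer) pair; the function name and the 30-second bin width are choices made here for illustration.

```python
from collections import Counter


def interarrival_histogram(timestamps, bin_s=30):
    """Count inter-arrival gaps, each rounded to the nearest multiple of bin_s."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return Counter(int(round(g / bin_s)) * bin_s for g in gaps)
```

Peaks in the 30 and 60 bins would correspond to the periodicities described above; the frequency of each component is simply the inverse of the gap.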

  27. Prefix Length Statistics

  28. Conclusions
     • The volume of routing update messages decreased by an order of magnitude as a result of specific software changes on the majority of core Internet backbone routers
     • The software changes successfully suppressed the generation of pathological withdrawals
     • New software changes are proposed that may reduce instability levels by an additional thirty percent
     • Instability is well distributed across both autonomous system and prefix space
     • No single service provider or set of network destinations appears to be at fault
