Infrastructure-based Resilient Routing
Ben Y. Zhao, Ling Huang, Jeremy Stribling, Anthony Joseph and John Kubiatowicz
University of California, Berkeley
Sahara Winter Retreat, 2004

Challenges Facing Network Applications
• Network connectivity is not reliable
  • Disconnections frequent in the wide-area Internet
  • IP-level repair is slow
    • Wide-area (BGP): 3 mins
    • Local-area (IS-IS): 5 seconds
• Next generation network applications
  • Mostly wide-area
  • Streaming media, VoIP, B2B transactions
  • Low tolerance of delay, jitter and faults
• Our work: transparent resilient routing infrastructure that adapts to faults in not seconds, but milliseconds
ravenben@eecs.berkeley.edu

Talk Overview
• Motivation
• A Structured Overlay Infrastructure
• Mechanisms and policy
• Evaluation
• Summary

The Challenge
• Routing failures are diverse
  • Many causes: router misconfigurations, cut fiber, planned downtime, protocol implementation bugs
  • Occur anywhere with local or global impact: a single fiber cut can disconnect AS pairs
• Isolating failures is difficult
  • Wide-area measurement is ongoing research
  • Single event leads to complex inter-protocol interactions
  • End user symptoms often dynamic or intermittent
• Requires:
  • Fault detection from multiple distributed vantage points
  • In-network decision making for timely responses

An Infrastructure Approach
• Our goals
  • Overlay focused on resiliency
  • Route around failures to maintain connectivity
  • Respond in milliseconds (react instantaneously to faults)
• Our approach
  • Large-scale infrastructure for fault and route discovery
  • Nodes are observation points (similar to Plato’s NEWS service)
  • Nodes are also points of traffic redirection (forwarding path determination and data forwarding)
  • Automated fault detection and circumvention
  • No edge node involvement: fast response time, security focused on infrastructure
  • Fully transparent, no application awareness necessary

An Illustration
[Figure: Qwest backbone map]
• Goal: fast fault detection and route-around
• Key: on-the-fly in-network traffic redirection

Why Structured Overlays
• Resilient Overlay Networks (MIT)
  • Fully connected mesh
  • Allows each node full knowledge of network
  • Fast, independent calculation of routes
  • Nodes can construct any path, maximum flexibility
• Cost of flexibility
  • Protocol needs to choose the “right” route/nodes
  • Per node O(n) state
  • Each node monitors n - 1 paths
  • O(n²) total path monitoring is expensive
[Figure: paths between a source S and a destination D]

The Big Picture
[Figure: overlay nodes layered above the Internet]
• Locate nearby overlay proxy
• Establish overlay path to destination host
• Overlay routes traffic resiliently

Traffic Tunneling
[Figure: Legacy Node A and Legacy Node B (A, B are IP addresses) each register with a proxy on the structured peer-to-peer overlay. Labels: register; put(hash(A), P’(A)); put(hash(B), P’(B)); get(hash(B)) returns P’(B); P’(A) = A; P’(B) = B]
• Store mapping from end host IP to its proxy’s overlay ID
• Similar to approach in Internet Indirection Infrastructure (I3)
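The put/get steps labeled on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class name ProxyDirectory, the use of SHA-1 as the hash, the example IP addresses, and the proxy IDs are all assumptions, and a plain in-memory dict stands in for the distributed overlay.

```python
import hashlib


class ProxyDirectory:
    """Toy stand-in for the overlay's put/get interface (a dict, not a real DHT)."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key_for(ip):
        # Assumption: the key is a SHA-1 digest of the end host's IP address.
        return hashlib.sha1(ip.encode("ascii")).hexdigest()

    def put(self, ip, proxy_id):
        # Called when a legacy node registers with its nearby proxy.
        self._store[self.key_for(ip)] = proxy_id

    def get(self, ip):
        # Called by the sender's proxy to find the destination's proxy.
        return self._store.get(self.key_for(ip))


directory = ProxyDirectory()
directory.put("10.0.0.1", "proxy-A")  # legacy node A registers
directory.put("10.0.0.2", "proxy-B")  # legacy node B registers

# A's proxy looks up B's proxy, then tunnels traffic to it over the overlay.
dest_proxy = directory.get("10.0.0.2")
```

Hashing the IP address means any proxy can compute the lookup key locally, without knowing in advance which proxy serves the destination.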

Tradeoffs of Tunneling via P2P
• Fewer neighbor paths to monitor per node: O(log(n))
  • Large reduction in probing bandwidth: O(n) → O(log(n))
  • Faster fault detection with low bandwidth consumption
• Actively maintain path redundancy
  • Manageable for “small” # of paths
  • Redirect traffic immediately when a failure is detected
  • Eliminate on-the-fly calculation of new routes
  • Restore redundancy when a path fails
• Fast fault detection + precomputed paths = increased responsiveness
• Cons: overlay imposes routing stretch (mostly < 2)
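To give a feel for the O(n) versus O(log(n)) difference, the short sketch below counts the paths a single node would probe under each design. The function names are hypothetical, and the base-2 logarithm is an assumption chosen only for illustration; the real neighbor count depends on the overlay's routing table layout.

```python
import math


def paths_monitored_full_mesh(n):
    # Fully connected mesh: each node probes every other node.
    return n - 1


def paths_monitored_structured(n):
    # Assumption: roughly log2(n) neighbor paths per node, rounded up.
    return math.ceil(math.log2(n))


for n in (64, 300, 5000):
    print(n, paths_monitored_full_mesh(n), paths_monitored_structured(n))
```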

In-network Resiliency Mechanisms
• Efficient fault detection
  • Use soft-state to periodically probe log(n) neighbor paths
  • “Small” number of routes → reduced bandwidth
  • Exponentially weighted moving average for link quality estimation
    • Avoid route flapping due to short-term loss artifacts
    • Loss rate: L_n = (1 - α) · L_{n-1} + α · p
  • Simple approach taken, ongoing research available
    • Smart fault-detection / propagation (Zhuang04)
    • Intelligent and cooperative path selection (Seshadri04)
• Maintaining backup paths
  • Each hop has flexible routing constraint
  • Create and store backup routes at node insertion
  • Restore redundancy via “intelligent” gossip after failures
  • Simple policies to choose among redundant paths
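The exponentially weighted moving average on this slide can be sketched as below. This is an illustrative sketch under stated assumptions: the class name LossEstimator, the smoothing weight name alpha and its default of 0.2, and the reading of p as the loss observed in the latest probe period (1.0 for a lost probe, 0.0 for an answered one) are not given in the slides.

```python
class LossEstimator:
    """EWMA loss estimate for one neighbor path."""

    def __init__(self, alpha=0.2, initial=0.0):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha    # weight given to the newest sample (assumed value)
        self.loss = initial   # running loss-rate estimate

    def update(self, p):
        # new estimate = (1 - alpha) * previous estimate + alpha * newest sample
        self.loss = (1 - self.alpha) * self.loss + self.alpha * p
        return self.loss


est = LossEstimator(alpha=0.5)
first = est.update(1.0)   # one lost probe
second = est.update(1.0)  # a second lost probe
third = est.update(0.0)   # a probe that got through
```

A smaller alpha smooths out isolated losses, which limits route flapping, at the cost of reacting more slowly to a real failure.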

First Reachable Link Selection (FRLS)
• Use estimated loss results to choose shortest “usable” path
  • Sort next hop paths by latency
  • Use shortest path with minimal quality > T
• Correlated failures
  • Reduce with intelligent topology construction
  • Key is to leverage redundancy available
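The selection rule above can be sketched as follows. The NextHop record, its field names, and the definition of link quality as 1 minus the estimated loss rate are assumptions made for illustration; the default threshold of 0.7 mirrors the T = 70% used in the PlanetLab experiments.

```python
from dataclasses import dataclass


@dataclass
class NextHop:
    name: str
    latency_ms: float
    loss_rate: float  # estimated loss rate in [0, 1]


def first_reachable_link(hops, threshold=0.7):
    """Return the lowest-latency hop whose quality exceeds the threshold."""
    for hop in sorted(hops, key=lambda h: h.latency_ms):
        quality = 1.0 - hop.loss_rate  # assumption: quality = 1 - loss rate
        if quality > threshold:
            return hop
    return None  # no usable path among the stored candidates


candidates = [
    NextHop("primary", latency_ms=20.0, loss_rate=0.5),
    NextHop("backup-1", latency_ms=35.0, loss_rate=0.1),
    NextHop("backup-2", latency_ms=50.0, loss_rate=0.0),
]
chosen = first_reachable_link(candidates)
```

In this example the primary link is the shortest but too lossy, so the first backup is chosen without computing any new route.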

Evaluation
• Metrics for evaluation
  • How much routing resiliency can we exploit?
  • How fast can we adapt to faults (responsiveness)?
• Experimental platforms
  • Event-based simulations on transit-stub topologies
    • Data collected over multiple 5000-node topologies
  • PlanetLab measurements
    • Microbenchmarks on responsiveness
More details in paper (ICNP03) and poster session

Exploiting Route Redundancy (Sim)
• Simulation of Tapestry, 2 backup paths per routing entry
• Transit-stub topology shown, results from TIER and AS graphs similar

Responsiveness to Faults (PlanetLab)
• Response time increases linearly with probe period
• Minimum link quality threshold T = 70%, 20 runs per data point

Link Probing Bandwidth (PlanetLab)
• Medium-sized routing overlays incur low probing bandwidth
• Bandwidth increases logarithmically with overlay size

Conclusion
• Pros and cons of infrastructure approach
  • Structured routing has low path maintenance costs
    • Allows “caching” of backup paths for quick failover
  • Transparent to user applications
  • Can no longer construct arbitrary paths
    • Structured routing with low redundancy comes close to ideal connectivity
    • Incurs low routing stretch
• Fast enough for highly interactive applications
  • 300ms beacon period → response time < 700ms
  • On overlay networks of 300 nodes, b/w cost is 7KB/s
• Ongoing questions
  • Is there a lower bound on desired responsiveness?
  • Should we use multipath redundant routing for resilience?
  • How to deploy as a single network across ISPs?
  • VPN-like routing service?

Related Work
• Redirection overlays
  • Detour (IEEE Micro 99)
  • Resilient Overlay Networks (SOSP 01)
  • Internet Indirection Infrastructure (SIGCOMM 02)
  • Secure Overlay Services (SIGCOMM 02)
• Topology estimation techniques
  • Adaptive probing (IPTPS 03)
  • Internet tomography (IMC 03)
  • Routing underlay (SIGCOMM 03)
• Many, many other structured peer-to-peer overlays
Thanks to Dennis Geels / Sean Rhea for their work on BMark
