Lecture 2 • Agenda • Finish with OSPF, Areas, DR/BDR • Convergence, Cost • Fast Convergence • Tools to troubleshoot • Tools to measure convergence • Intro to implementation: scheduling • Readings • Sub-millisecond IGP convergence • uLoop elimination
To read • Sub-millisecond convergence • Threads vs. events
Homework • Study the scheduling of Quagga • Fast convergence in Quagga OSPF • LSA generator • Implement BFD • We will take it slow…
Some complications • EXAMPLE • In a broadcast network we cannot have n^2 adjacencies • Use a fake centralized router • Designated router (DR): one router is elected to play this role • Set up adjacencies only with it • The DR advertises the network LSA for the network • Backup DR (BDR), so that if the DR dies we recover quickly
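A minimal sketch of the DR/BDR idea, assuming a simplified election where the highest (priority, router ID) wins and priority 0 means "never DR". It deliberately ignores RFC 2328 details such as non-preemption of an already-elected DR and the BDR-first election procedure; the `Router` type and function names are hypothetical. It also counts adjacencies to show why the DR helps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Router:
    router_id: int  # OSPF router ID, modeled as a plain integer
    priority: int   # 0 means the router is never eligible as DR/BDR


def elect_dr_bdr(routers):
    """Simplified election: highest (priority, router_id) becomes DR, next is BDR."""
    eligible = sorted(
        (r for r in routers if r.priority > 0),
        key=lambda r: (r.priority, r.router_id),
        reverse=True,
    )
    dr = eligible[0] if eligible else None
    bdr = eligible[1] if len(eligible) > 1 else None
    return dr, bdr


def adjacencies_full_mesh(n):
    # Without a DR, every pair of routers on the LAN forms an adjacency.
    return n * (n - 1) // 2


def adjacencies_with_dr(n):
    # With DR and BDR: each of the other n-2 routers peers with both,
    # plus one adjacency between the DR and the BDR (2n - 3 in total).
    return 2 * (n - 2) + 1 if n >= 2 else 0


routers = [Router(1, 1), Router(2, 1), Router(3, 0), Router(4, 2)]
dr, bdr = elect_dr_bdr(routers)
print(dr.router_id, bdr.router_id)
print(adjacencies_full_mesh(10), adjacencies_with_dr(10))
```

On a 10-router LAN this is 45 adjacencies for a full mesh versus 17 with a DR and BDR, and the gap grows quadratically with n.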
Scalability concerns • EXAMPLE: flooding cost and number of routes • AREAS • Limit the scope of flooding • Limit the number of routes • If addresses are allocated hierarchically/properly, so routes can be summarized • Area border router (ABR) between two areas • EXAMPLE: how to compute routes to other areas through the ABR • Area 0 is special: it is the backbone • Stub areas: areas with no through (transit) traffic
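For the ABR example, a simplified sketch of how a router inside an area can reach a destination in another area: for each ABR, add the router's intra-area SPF cost to that ABR to the cost the ABR advertises for the destination (in its summary LSA), then pick the minimum. This glosses over which areas' summary LSAs are considered and other RFC 2328 rules; the function and argument names are hypothetical.

```python
def inter_area_cost(cost_to_abr, abr_summary):
    """Pick the best ABR for a destination in another area.

    cost_to_abr: {abr_id: intra-area SPF cost from this router to the ABR}
    abr_summary: {abr_id: cost the ABR advertises for the destination}
    Returns (total_cost, abr_id), or None if no reachable ABR advertises it.
    """
    candidates = [
        (cost_to_abr[abr] + summary_cost, abr)
        for abr, summary_cost in abr_summary.items()
        if abr in cost_to_abr  # the ABR must be reachable inside our area
    ]
    return min(candidates) if candidates else None


# ABR2 is closer to us, but ABR1 is much closer to the destination.
print(inter_area_cost({"ABR1": 10, "ABR2": 4}, {"ABR1": 5, "ABR2": 20}))
```

The router never sees the other area's topology, only one number per ABR, which is exactly how areas limit both flooding scope and SPF work.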
Periodic LSA refresh • To catch some rare memory corruptions • Necessary to make the protocol really robust • In OSPF, each LSA is refreshed every 30 minutes • Synchronization of updates • If we are not careful, all routers will refresh all their LSAs at the same time • Solution: randomization
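A small simulation of why randomization matters, assuming each LSA is refreshed up to `jitter` seconds before the 30-minute mark. The `jitter` knob and function names are hypothetical; the point is only to compare the worst one-second burst with and without spreading.

```python
import random

LS_REFRESH_TIME = 30 * 60  # seconds: OSPF refreshes self-originated LSAs every 30 min


def refresh_schedule(num_lsas, jitter, rng):
    """First refresh time (seconds) for each LSA, pulled earlier by a random jitter."""
    return [LS_REFRESH_TIME - rng.uniform(0, jitter) for _ in range(num_lsas)]


def max_burst(times, window):
    """Largest number of refreshes inside any `window`-second span (sliding window)."""
    times = sorted(times)
    best, start = 0, 0
    for end in range(len(times)):
        while times[end] - times[start] >= window:
            start += 1
        best = max(best, end - start + 1)
    return best


rng = random.Random(1)  # seeded so the run is reproducible
synchronized = refresh_schedule(1000, 0, rng)   # everyone refreshes at once
jittered = refresh_schedule(1000, 600, rng)     # spread over 10 minutes
print(max_burst(synchronized, 1.0), max_burst(jittered, 1.0))
```

Without jitter all 1000 refreshes land in the same instant; with jitter the worst one-second burst drops to a handful.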
What matters • Convergence speed • How quickly all routers have consistent information after a change in the network • uLoops (transient microloops) cause me to lose traffic • How quickly new routes start being used • So traffic flows properly again • Stability • How much protocol control traffic • How much CPU I burn • How things work when the CPU is overloaded • These goals may conflict
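To see how a uLoop appears, here is a sketch that follows next hops through a snapshot of FIBs taken mid-convergence. The topology is a hypothetical square A-B-D / A-C-D where A normally reaches D via B; the B-D link fails, B has already switched to "via A", but A has not yet updated and still points at B.

```python
def trace(fib, src, dst):
    """Follow next hops from src toward dst through a FIB snapshot.

    fib: {router: {destination: next_hop}}; some routers may hold
    pre-failure routes and others post-failure routes.
    Returns ("delivered" | "loop" | "blackhole", path).
    """
    path, node, seen = [src], src, {src}
    while node != dst:
        nxt = fib.get(node, {}).get(dst)
        if nxt is None:
            return "blackhole", path   # no route: packet dropped
        path.append(nxt)
        if nxt in seen:
            return "loop", path        # revisited a router: microloop
        seen.add(nxt)
        node = nxt
    return "delivered", path


# B has converged (B -> A), A has not (A -> B): packets bounce between them.
fib_transient = {"A": {"D": "B"}, "B": {"D": "A"}, "C": {"D": "D"}}
# After A also converges it sends traffic via C.
fib_converged = {"A": {"D": "C"}, "B": {"D": "A"}, "C": {"D": "D"}}

print(trace(fib_transient, "A", "D"))
print(trace(fib_converged, "A", "D"))
```

The loop lasts only until A installs its new route, which is why faster and more consistent convergence directly reduces uLoop losses.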
Important Times in an IGP • EXAMPLE timeline • Link fails, router detects the failure, sends an update, computes SPF, updates the RIB, updates the FIB • Failure detection time • Depends on the link technology • May need to rely on the HELLO protocol • Flooding time • Depends on CPU load, network load, and interfaces • SPF time • Depends on how loaded the CPU is and how many routes I have • Depends on the algorithm • RIB/FIB update time • How fast I can change the forwarding plane • Depends on the number of routes
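Since SPF time "depends on the algorithm", here is the classic baseline: Dijkstra's algorithm with a binary heap, O((V + E) log V). This is a plain sketch over a hypothetical link-state database given as a dict; it records the first hop (what ends up in the RIB) and ignores equal-cost multipath.

```python
import heapq


def spf(graph, root):
    """Dijkstra shortest-path-first over a link-state database.

    graph: {router: {neighbor: link_cost}}, non-negative costs.
    Returns {router: (distance, first_hop)}; the root's first hop is None.
    """
    dist = {root: (0, None)}
    heap = [(0, root, None)]
    done = set()
    while heap:
        d, node, first_hop = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry, a shorter path was already settled
        done.add(node)
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            # Neighbors of the root are their own first hop; others inherit it.
            hop = nbr if node == root else first_hop
            if nbr not in dist or nd < dist[nbr][0]:
                dist[nbr] = (nd, hop)
                heapq.heappush(heap, (nd, nbr, hop))
    return dist


graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 1, "D": 5},
    "C": {"A": 4, "B": 1, "D": 1},
    "D": {"B": 5, "C": 1},
}
print(spf(graph, "A"))
```

A full run touches every node and link, which is why incremental SPF (recomputing only the affected part of the tree) is attractive when only one link changes.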
Network-wide timeline • Routers next to the failed node detect the failure and originate an LSA • The LSA travels across the diameter of the network • The last router computes SPF • The last router installs the routes in its FIB • Done
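The network-wide timeline can be turned into a back-of-the-envelope estimate of when the last router forwards correctly. This sketch assumes a fixed per-hop flooding delay and a fixed per-prefix FIB cost; all parameter values below are hypothetical, not measurements or protocol defaults.

```python
def network_convergence_ms(detect_ms, originate_ms, hop_flood_ms, diameter_hops,
                           spf_ms, prefixes, fib_us_per_prefix):
    """Rough time until the last router has converged:
    detection + LSA origination at the routers next to the failure,
    + flooding across the network diameter,
    + SPF at the last router,
    + FIB installation of the affected prefixes at that router.
    """
    flood_ms = diameter_hops * hop_flood_ms
    fib_ms = prefixes * fib_us_per_prefix / 1000.0  # microseconds -> ms
    return detect_ms + originate_ms + flood_ms + spf_ms + fib_ms


# Hypothetical numbers: BFD-speed detection, 6-hop diameter, 5000 prefixes.
print(network_convergence_ms(detect_ms=150, originate_ms=10, hop_flood_ms=10,
                             diameter_hops=6, spf_ms=20, prefixes=5000,
                             fib_us_per_prefix=100))
```

With these made-up inputs the FIB update dominates, which is one reason to download "important" prefixes first.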
How to be faster • Faster SPF • Better algorithms • Incremental SPF • Faster detection • Faster HELLOs • BFD (Bidirectional Forwarding Detection)!!! • Runs in the line card instead of the control plane • Many protocols can share it • Faster FIB download • Download “important” prefixes first • Do things faster • Trigger SPF immediately • Trigger LSA origination immediately
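Both HELLOs and BFD detect failure the same way: a neighbor is declared down after a number of intervals pass without a packet. The difference is the scale of the interval. This sketch uses a hypothetical `LivenessDetector` class; the OSPF numbers are the common 10 s hello / 40 s dead defaults on broadcast links, and the BFD numbers (50 ms, multiplier 3) are only illustrative.

```python
class LivenessDetector:
    """Declares a neighbor down if no packet arrives for `multiplier` intervals.

    Models the OSPF dead interval (a multiple of the hello interval) and,
    roughly, the BFD detection time (detect multiplier x interval).
    """

    def __init__(self, interval_ms, multiplier):
        self.detect_ms = interval_ms * multiplier
        self.last_rx_ms = 0

    def packet_received(self, now_ms):
        self.last_rx_ms = now_ms

    def is_up(self, now_ms):
        return now_ms - self.last_rx_ms < self.detect_ms


# Link fails right after a packet arrives at t = 1000 ms.
ospf = LivenessDetector(interval_ms=10_000, multiplier=4)
bfd = LivenessDetector(interval_ms=50, multiplier=3)
ospf.packet_received(1000)
bfd.packet_received(1000)
print(ospf.detect_ms, bfd.detect_ms)  # worst-case detection time in ms
```

Shrinking the interval is cheap for BFD because, running in the line card, it does not compete with SPF and flooding for the control-plane CPU.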
How to be stable • SPF may be expensive • Cannot run SPF every time something minor changes; it may be better to run one SPF for a batch of changes • Avoid extra FIB downloads • Do not want to run SPF constantly during network churn; it will overload the CPU • Do not want to send too many updates at once • Receiver may get overloaded • Do not want to send updates too quickly • Link may be flapping • When CPU/links are loaded, ensure that we • Do not miss HELLOs; that would make things worse
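"Do not send too many updates at once" is typically handled by pacing. A minimal sketch of a pacer that enforces a minimum gap between consecutive LSA transmissions; `min_gap` and `pace` are hypothetical names, and real implementations usually pace per interface and may pack several LSAs into one update packet.

```python
def pace(arrivals, min_gap):
    """Schedule sends so consecutive transmissions are at least `min_gap` apart.

    arrivals: sorted times (seconds) at which LSAs become ready to send.
    Returns the time each LSA is actually sent.
    """
    sends = []
    next_free = float("-inf")  # earliest time the next send is allowed
    for t in arrivals:
        send_at = max(t, next_free)  # wait if we sent something too recently
        sends.append(send_at)
        next_free = send_at + min_gap
    return sends


# Three LSAs ready at once are spread out; the fourth waits for its slot.
print(pace([0.0, 0.0, 0.0, 1.0], 0.5))
```

The trade-off is explicit: a larger gap protects slow receivers but adds delay to the last LSA in a burst.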
Configuration: Timers • Hello timer, dead timer • LSA update delay • LSA pacing • LSA retransmission pacing • SPF delay • Wait this long before running SPF • SPF hold-time • Do not run another SPF before this time passes • Timers can be dynamic • Be fast when the CPU is idle • Be slow when the CPU is loaded
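A sketch of dynamic SPF timers: an initial SPF delay coalesces a burst of changes into one run, and the hold time between runs doubles while churn continues (capped at a maximum), then resets after a quiet period. It keys on the rate of topology changes as a proxy for load. This is a simplified model in the spirit of the exponential backoff throttling many implementations offer, not any specific vendor's or standard's algorithm; all names are hypothetical and times are integer milliseconds.

```python
def spf_schedule(events, initial_delay, hold, max_hold):
    """Return the times at which SPF runs for a list of topology-change events."""
    runs = []
    pending = None                # time of the next scheduled SPF, if any
    next_allowed = float("-inf")  # earliest time the next SPF may run
    cur_hold = hold
    for t in sorted(events):
        if pending is not None and pending <= t:
            # The scheduled SPF ran before this event arrived.
            runs.append(pending)
            next_allowed = pending + cur_hold
            cur_hold = min(cur_hold * 2, max_hold)  # back off during churn
            pending = None
        if pending is None:
            if runs and t - runs[-1] > max_hold:
                cur_hold = hold  # quiet for a while: react quickly again
            pending = max(t + initial_delay, next_allowed)
        # Otherwise an SPF is already pending and this event is coalesced.
    if pending is not None:
        runs.append(pending)
    return runs


print(spf_schedule([0, 10, 100, 300, 500, 5000],
                   initial_delay=50, hold=200, max_hold=1000))
```

Here six changes produce four SPF runs: the gaps between runs grow from 200 to 400 ms while the network churns, and after the long quiet period the router is back to reacting within the 50 ms initial delay.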
How to measure performance • Black box vs. white box • White-box testing is nearly impossible for commercial products • Black-box testing needs tricks • Measure without knowledge of the internal structure • See paper
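One common black-box trick, sketched under stated assumptions: send a constant-rate stream of sequence-numbered probes through the router under test while triggering a failure, then infer the forwarding outage from the missing sequence numbers. The function name is hypothetical, and the estimate's resolution is limited to one probe interval.

```python
def outage_estimate(received, sent_count, interval_ms):
    """Estimate forwarding outage from a constant-rate probe stream.

    received: sequence numbers that arrived (0 .. sent_count-1 were sent).
    Returns (total_loss_ms, longest_gap_ms).
    """
    got = set(received)
    lost = [s for s in range(sent_count) if s not in got]
    longest = run = 0
    prev = None
    for s in lost:
        # Extend the current run of consecutive losses, or start a new one.
        run = run + 1 if prev is not None and s == prev + 1 else 1
        longest = max(longest, run)
        prev = s
    return len(lost) * interval_ms, longest * interval_ms


# 20 probes at 1 ms: 5..11 lost during convergence, plus one stray loss at 15.
received = [s for s in range(20) if not (5 <= s <= 11) and s != 15]
print(outage_estimate(received, sent_count=20, interval_ms=1))
```

Reporting the longest gap separately helps distinguish the convergence outage from unrelated random losses.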