TRILL Routing Scalability Considerations



  1. TRILL Routing Scalability Considerations
     Alex Zinin
     zinin@psg.com
     TRILL BOF

  2. General scalability framework
     • About growth functions for
       • Data overhead (adjacencies, LSDB, MAC entries)
       • BW overhead (Hellos, Updates, Refreshes/sec)
       • CPU overhead (computational complexity, frequency)
     • Scaling parameters
       • N: total number of stations
       • L: number of VLANs
       • F: relocation frequency
     • Types of devices
       • Edge switch (attached to a fraction of N and L)
       • Core switch (most of L)

  3. Scenarios for analysis
     • Single stationary bcast domain
       • No practical station mobility
       • N = O(1K) by natural bcast limits
     • Bcast domain with mobile stations
     • Multiple stationary VLANs
       • L = O(1K) total, O(100) visible to a switch
       • N = O(10K) total
     • Multiple VLANs with mobile stations

  4. Protocol params of interest
     • What
       • Amount of data (topology, leaf entries)
       • Number of LSPs
       • LSP refresh rate
       • LSP update rate
       • Flooding complexity
       • Route calculation complexity & frequency
     • Why
       • Required memory [increase] as the network grows
       • Required memory & CPU to keep up with protocol dynamics
       • Link BW overhead to control the network
     • How
       • Absolute: big-O notation
       • Relative: compare to e.g. bridging & IP routing

  5. Why is this important
     • If data-inefficient:
       • Increased memory requirements
       • Frequent memory upgrades as the network grows
       • Much more info to flood
     • If computationally inefficient:
       • Substantial compute-power increase == marginal network-size increase
       • High CPU utilization
       • Inability to keep up with protocol dynamics

  6. Link-state Protocol Dynamics
     • Network events are visible everywhere
     • Main assumption for stationary networks:
       • Network change is temporary
       • Topology stabilizes within finite T
     • For each node:
       • Rinp: input update rate (network event frequency)
       • Rprc: update processing rate
     • Long-term convergence condition: Rprc >> Rinp
     • What if Rprc < Rinp? (a toy queue model is sketched below)
       • Micro-bursts are buffered by queues
       • Short-term (normal for stationary nets): update drops, retransmits, convergence
       • Long-term/permanent: the net never converges, CPU upgrade needed
     • Rprc = f(protocol design, CPU, implementation)
     • Rinp = f(protocol design, network)
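
A minimal fluid-queue sketch (Python, hypothetical rates) of the condition above: the queue absorbs micro-bursts when Rprc >> Rinp and grows without bound when Rprc < Rinp:

    # Toy model: updates arrive at r_inp/sec and are processed at r_prc/sec.
    def queue_depth_over_time(r_inp, r_prc, seconds, burst=0):
        """Update-queue depth after each second (fluid approximation)."""
        depth = float(burst)          # updates queued by an initial burst
        history = []
        for _ in range(seconds):
            depth = max(0.0, depth + r_inp - r_prc)  # net inflow per second
            history.append(depth)
        return history

    # Rprc >> Rinp: a 100-update burst drains within a few seconds.
    print(queue_depth_over_time(r_inp=1, r_prc=30, seconds=5, burst=100)[-1])  # 0.0
    # Rprc < Rinp: the queue only grows -- the network never converges.
    print(queue_depth_over_time(r_inp=40, r_prc=30, seconds=5)[-1])            # 50.0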

  7. Data-plane parameters
     • Data overhead
       • Number of MAC entries in the CAM table
     • Why worry?
       • The CAM table is expensive
         • 1-8K entries for small switches
         • 32K-128K for core switches
       • Shared among VLANs
       • Entries expire when stations go silent

  8. Single Bcast domain (CP)
     • Total of O(1K) MAC addresses
     • Each address: 12-bit VLAN tag + 48-bit MAC = 60 bits
     • IS-IS update packing (arithmetic reproduced below):
       • 4 addr's per TLV (TLV is 255B max)
       • 20 addr's per LSP fragment (1470B default)
       • ~5K addr's per node (256 frags total)
     • LSP refresh rate:
       • 1K MACs = 50 LSPs
       • 1h renewal = 1 update every 72 secs
     • MAC update rate:
       • Depends on the MAC learning & dead-detection procedure
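
A quick arithmetic check of the packing and refresh figures on this slide (the 20-addresses-per-fragment figure is taken from the slide as-is):

    ADDRS_PER_FRAG = 20      # per the slide: 4 addr's/TLV, ~5 TLVs in a 1470B LSP
    MAX_FRAGS      = 256     # IS-IS fragment space per node
    REFRESH_PERIOD = 3600    # seconds: 1-hour LSP renewal

    macs = 1000                           # O(1K) addresses in the domain
    lsps = macs // ADDRS_PER_FRAG
    print(lsps)                           # -> 50 LSP fragments
    print(REFRESH_PERIOD / lsps)          # -> 72.0 s between refresh updates
    print(ADDRS_PER_FRAG * MAX_FRAGS)     # -> 5120, the ~5K per-node ceiling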

  9. MAC learning
     • Traffic + expiration (5-15m):
       • Announces station activity
       • 1K stations, 30m fluctuations = 1 update every 1.8 seconds on average
       • Likely bursts due to the "start-of-day" phenomenon
     • Reachability-based (contrasted in the sketch below):
       • Start announcing a MAC when first heard from the station
       • Assume it's there until there is evidence otherwise, even if silent (presumption of reachability)
       • Removes activity-sensitive fluctuations
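
A minimal sketch contrasting the two policies; the class and method names are hypothetical, not from any TRILL specification:

    import time

    EXPIRY = 15 * 60  # expiry-based aging, seconds (5-15m on the slide)

    class MacTable:
        def __init__(self, reachability_based):
            self.reachability_based = reachability_based
            self.entries = {}  # mac -> last-heard timestamp

        def heard_from(self, mac):
            new = mac not in self.entries
            self.entries[mac] = time.monotonic()
            if new:
                self.announce(mac)  # routing update only for a new station

        def age_out(self):
            if self.reachability_based:
                return  # presumption of reachability: silence is not evidence
            now = time.monotonic()
            for mac, seen in list(self.entries.items()):
                if now - seen > EXPIRY:
                    del self.entries[mac]
                    self.withdraw(mac)  # update churn tracks traffic patterns

        def announce(self, mac): print("announce", mac)
        def withdraw(self, mac): print("withdraw", mac)

With the expiry-based policy every silent period produces a withdraw/announce pair; the reachability-based policy announces each station once.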

  10. Single bcast domain (DP)
     • Number of entries
       • Bridges: f(traffic)
         • Limited by local config, location within the network
       • RBridge: all attached stations
     • No big change for core switches (they see most MACs)
     • May be a problem for smaller ones

  11. Single bcast: summary
     • With reachability-based MAC announcements...
       • CP is well within the limits of current link-state routing protocols
         • Can comfortably handle O(10K) routes
         • Dynamics are very similar
         • There's an existence proof that this works
     • CP data overhead is O(N)
       • Worse than IP routing: O(log N)
       • However, net size is upper-bounded by bcast limits
       • Small switches will need to store & compute more
     • Data plane may require bigger MAC tables in smaller switches

  12. Note: comfort limit
     • It is always possible to overload a neighbor with updates
     • Update flow control is employed
       • Dynamic flow control is possible, yet...
       • Experience-based heuristic: pace updates at 30/sec (sketched below)
         • Not a hard rule, a ballpark
       • Limits burst Rinp for the neighbor
       • Prevents drops during flooding storms
     • Given the (Rprc >> Rinp) condition, we want the average to be an order of magnitude lower, e.g. O(1) upd/sec max
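
The 30/sec heuristic can be pictured as a token bucket; a sketch with the slide's ballpark figure (the burst size is an assumption):

    import time

    class UpdatePacer:
        def __init__(self, rate=30.0, burst=30.0):
            self.rate = rate        # sustained updates/sec toward a neighbor
            self.burst = burst      # small burst allowance (assumed)
            self.tokens = burst
            self.last = time.monotonic()

        def send(self, update, tx):
            now = time.monotonic()
            self.tokens = min(self.burst,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens < 1.0:   # over the pace: delay instead of dropping
                time.sleep((1.0 - self.tokens) / self.rate)
                self.tokens = 1.0
                self.last = time.monotonic()
            self.tokens -= 1.0
            tx(update)              # hand the update to the flooding code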

  13. Note: protocol upper-bound
     • LSP generation is paced: normally not more frequent than once every 5 secs
     • Each LSP frag has its own timer
     • With equal distribution (derivation below):
       • Max node origination rate == 51 upd/sec
     • Does not address long-term stability
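
Where the 51 upd/sec figure comes from:

    MAX_FRAGS = 256         # LSP fragments one node may originate
    MIN_LSP_INTERVAL = 5.0  # seconds between regenerations of one fragment
    print(MAX_FRAGS / MIN_LSP_INTERVAL)  # -> 51.2 updates/sec per node, max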

  14. Single bcast + mobility
     • Same number of stations
     • Same data efficiency for CP and DP
     • Different dynamics
     • Take the IETF wireless network, worst case (arithmetic below):
       • ~700 stations
       • New location within 10 minutes
       • Average: 1 MAC every 0.86 sec, or 1.16 MAC/sec
       • Note: every small switch in the VLAN will see the updates
     • How does it work now?
       • Bridges (APs + switches) relearn MACs, expire old entries
     • Summary: dynamics barely fit within the comfort range
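
The worst-case arithmetic from the slide:

    stations = 700
    relocation_period = 10 * 60           # every station moves within 10 min
    rate = stations / relocation_period   # MAC moves per second, network-wide
    print(round(rate, 2))                 # -> 1.17 (the slide truncates to 1.16)
    print(round(1 / rate, 2))             # -> 0.86 s between updates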

  15. Multiple VLANs
     • Real networks have VLANs
     • Assuming the current proposal is used
       • Standard IS-IS flooding
     • Two possibilities:
       • Single IS-IS instance for the whole network
       • Separate IS-IS instance per VLAN
     • Similar scaling challenges as with VR-based L3 VPNs

  16. VLANs: single IS-IS
     • Assuming reachability-based MAC announcement
     • Adjacencies and convergence scale well
     • However... (sizing sketch below)
       • Easily hit the 5K MAC/node limit (solvable)
       • Every switch sees every MAC in every VLAN
         • Even if it doesn't need it
     • Clear scaling issue
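
A rough sizing sketch using the scenario numbers from slide 3, and assuming stations are spread evenly across VLANs (an assumption for illustration):

    total_macs = 10_000      # N = O(10K) stations across all VLANs
    vlans_total = 1_000      # L = O(1K)
    vlans_on_edge = 100      # O(100) VLANs visible to one edge switch

    stored = total_macs                                # single instance: all
    needed = total_macs * vlans_on_edge // vlans_total
    print(stored, needed)    # -> 10000 stored vs ~1000 actually needed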

  17. VLANs: multiple instances
     • MAC announcements scale well
     • Good resource separation
     • However... (cost sketch below)
       • N adjacencies for a VLAN trunk (N here = number of VLANs carried)
       • N times more processing for a single topological event
       • N times more data structures (neighbors, timers, etc.)
       • N = 100...1000 for a core switch
     • Clear scaling issue for core switches
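
A back-of-the-envelope look at the per-instance cost; the 10 s hello interval is an assumption (a common IS-IS default), not from the slide:

    vlans = 1_000            # instances sharing one physical core trunk
    hello_interval = 10.0    # seconds per adjacency (assumed default)
    print(vlans / hello_interval)  # -> 100 hellos/sec just for keepalives
    # One physical link flap now triggers `vlans` SPF runs and update floods.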

  18. VLANs: data plane
     • Core switches
       • No big difference
       • Exposed to most MACs in the VLANs anyway
     • Smaller switches
       • Have to install all MACs of a VLAN even if only a single port on the switch belongs to it
       • May require bigger MAC tables than available today

  19. VLANs: summary
     • Control plane:
       • Currently available solutions have scaling issues
     • Data plane:
       • Smaller switches may have to pay

  20. VLANs + Mobility
     • Assuming some VLANs will have mobile stations
     • Data plane: same as stationary VLANs
     • All scaling considerations for VLANs apply
     • Mobility dynamics get multiplied (illustrated below)
       • Single IS-IS: updates hit the same adjacency
       • Multiple IS-IS: updates hit the same CPU
     • Activity is no longer naturally bounded
     • Update rate easily goes outside the comfort range
     • Clear scaling issues
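
An illustration of the multiplication effect; the per-VLAN rate reuses slide 14, while the number of mobile VLANs is a purely hypothetical figure:

    per_vlan_rate = 700 / 600   # ~1.17 MAC moves/sec in one wireless VLAN
    mobile_vlans = 50           # hypothetical count of similarly mobile VLANs
    print(round(per_vlan_rate * mobile_vlans))  # -> 58 upd/sec at one point
    # Well past the ~30/sec comfort limit from slide 12.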

  21. Resolving scaling concerns
     • The 5K MAC/node limit in IS-IS could be solved with RFC 3786
     • Don't use per-VLAN (multi-instance) routing
     • Use reachability-based MAC announcement
     • Scaling MAC distribution requires VLAN-aware flooding (sketched below):
       • Each node and link is associated with a set of VLANs
       • Only information needed by the remote neighbor is flooded to it
       • Not present in the current IS-IS framework
     • Forget about mobility ;-)
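
A minimal sketch of the VLAN-aware flooding idea; all names are hypothetical, since (as the slide notes) this mechanism is not in the current IS-IS framework:

    def flood(fragment_vlans, links):
        """fragment_vlans: set of VLAN IDs the update describes.
        links: iterable of (neighbor, vlans_needed_by_neighbor) pairs."""
        for neighbor, needed in links:
            if fragment_vlans & needed:  # set intersection: any overlap?
                send(neighbor)           # flood as usual
            # otherwise suppress: this neighbor never needs these MACs

    def send(neighbor):
        print("flooding to", neighbor)

    # A fragment carrying MACs for VLANs {10, 20} is flooded only toward B.
    flood({10, 20}, [("A", {30, 40}), ("B", {20, 99})])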
