
A Load-Balanced Switch with an Arbitrary Number of Linecards



Presentation Transcript


  1. A Load-Balanced Switch with an Arbitrary Number of Linecards • Isaac Keslassy, Shang-Tse (Da) Chuang, Nick McKeown • Stanford University

  2. Stanford 100 Tb/s Router • “Optics in Routers” project • http://yuba.stanford.edu/or/ • Some challenging numbers: 100 Tb/s; R = 160 Gb/s linecard rate; N = 640 linecards; performance guarantees
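
The headline capacity follows from the other two numbers on this slide. A quick check, with function and constant names chosen here purely for illustration:

```python
# Aggregate capacity implied by the slide's numbers.
LINECARD_RATE_GBPS = 160   # R, from the slide
NUM_LINECARDS = 640        # N, from the slide

def aggregate_capacity_tbps(rate_gbps: float, linecards: int) -> float:
    """Total router capacity in Tb/s (1 Tb/s = 1000 Gb/s)."""
    return rate_gbps * linecards / 1000

print(aggregate_capacity_tbps(LINECARD_RATE_GBPS, NUM_LINECARDS))
```

The product is slightly above 100 Tb/s, which the slide rounds to the headline figure.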

  3. Router Wish List • Scale to High Linecard Speeds: No Centralized Scheduler; Optical Switch Fabric; Low Packet-Processing Complexity • Scale to High Number of Linecards: High Number of Linecards; Arbitrary Arrangement of Linecards • Provide Performance Guarantees: 100% Throughput Guarantee; Delay Guarantee; No Packet Reordering

  4. Load-Balanced Switch [Figure: inputs (In) at rate R feed a load-balancing mesh, followed by a forwarding mesh to the outputs (Out) at rate R; every mesh link runs at R/N; labels 1 to 3]

  5. Load-Balanced Switch [Figure: inputs (In) at rate R, load-balancing mesh, forwarding mesh, outputs (Out) at rate R; every mesh link runs at R/N; labels 1 to 3]
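
The slides show the two meshes only as figures. The sketch below illustrates why links of rate R/N can suffice, assuming the uniform spreading described in the Chang et al. work cited later in the deck: the first mesh splits each input's traffic evenly over all N intermediate linecards, whatever its destination. The function name and the traffic matrix are hypothetical, and rates are expressed in units of R.

```python
from fractions import Fraction

def second_mesh_loads(traffic):
    """Load on each (intermediate -> output) link of the forwarding mesh.

    traffic[i][j] is the arrival rate at input i destined to output j.
    Assumes the first mesh spreads every input's traffic evenly over
    all N intermediate linecards.
    """
    n = len(traffic)
    # Each intermediate linecard receives 1/N of every input's traffic,
    # so its load towards output j is (sum over inputs of traffic[i][j]) / N.
    per_output = [sum(traffic[i][j] for i in range(n)) / n for j in range(n)]
    # The load is identical at every intermediate linecard.
    return [per_output[:] for _ in range(n)]

# A skewed but admissible pattern: no input or output exceeds rate R = 1.
traffic = [
    [Fraction(1), Fraction(0), Fraction(0)],
    [Fraction(0), Fraction(1, 2), Fraction(1, 2)],
    [Fraction(0), Fraction(1, 2), Fraction(1, 2)],
]
loads = second_mesh_loads(traffic)
n = len(traffic)
# No forwarding-mesh link carries more than R/N.
assert all(load <= Fraction(1, n) for row in loads for load in row)
print(loads[0])
```

Because each output receives at most R in total, dividing by N keeps every second-mesh link at or below R/N regardless of how skewed the arrivals are.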

  6. Combining the Two Meshes [Figure: each linecard holds both an input (In) and an output (Out) at rate R ("One linecard"); mesh links at R/N]

  7. A Single Combined Mesh [Figure: linecards, each with In and Out at rate R, joined by a single mesh with links at 2R/N]

  8. References on Early Work • Initial Work • C.-S. Chang, D.-S. Lee and Y.-S. Jou, "Load Balanced Birkhoff-von Neumann Switches, part I: One-Stage Buffering," Computer Communications, Vol. 25, pp. 611-622, 2002. • Sigcomm’03 • I. Keslassy, S.-T. Chuang, K. Yu, D. Miller, M. Horowitz, O. Solgaard and N. McKeown, "Scaling Internet Routers Using Optics," ACM SIGCOMM '03, Karlsruhe, Germany, August 2003.

  9. Summary of Early Work

  10. Router Wish List • Scale to High Linecard Speeds: No Centralized Scheduler; Optical Switch Fabric; Low Packet-Processing Complexity • Scale to High Number of Linecards: High Number of Linecards; Arbitrary Arrangement of Linecards • Provide Performance Guarantees: 100% Throughput Guarantee; Delay Guarantee; No Packet Reordering

  11. Example: N=8 [Figure: linecards numbered 1 to 8; mesh links at 2R/8]

  12. When N is Too Large: Decompose into groups (or racks) [Figure: linecards numbered 1 to 8 arranged in groups; labels 2R, 4R, and 4R/4]

  13. When N is Too Large: Decompose into groups (or racks) [Figure: Groups/Racks 1 to G, each with linecards 1 to L on 2R links and a 2RL aggregate; group-to-group links at 2RL/G]
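
The link labels in the group decomposition can be tabulated directly from L and G. This is a minimal sketch with a hypothetical helper name, expressing rates in units of R and assuming G equally sized groups of L linecards:

```python
from fractions import Fraction

def group_link_rates(L: int, G: int, R=Fraction(1)):
    """Link rates, in the same units as R, for G groups of L linecards."""
    return {
        "linecard": 2 * R,                 # each linecard's link into its group
        "group_total": 2 * R * L,          # aggregate leaving one group
        "group_to_group": 2 * R * L / G,   # share sent to each of the G groups
    }

rates = group_link_rates(L=2, G=4)
print(rates)
```

With L=2 and G=4 the helper gives a group aggregate of 4R and a group-to-group rate of 4R/4, which appears to match the labels on the N=8 example. Setting L=1 and G=N gives 2R/N, the link rate of the single combined mesh.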

  14. Router Wish List • Scale to High Linecard Speeds: No Centralized Scheduler; Optical Switch Fabric; Low Packet-Processing Complexity • Scale to High Number of Linecards: High Number of Linecards; Arbitrary Arrangement of Linecards • Provide Performance Guarantees: 100% Throughput Guarantee; Delay Guarantee; No Packet Reordering

  15. When Linecards are Missing: Failures, Incremental Additions, and Removals… • Solution: replace mesh with sum of permutations [Figure: Groups/Racks 1 to G, each with linecards 1 to L on 2R links and a 2RL aggregate; the mesh of 2RL/G links is drawn as a sum of permutations]
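
To illustrate "replace mesh with sum of permutations" for the fully populated case: a uniform G x G mesh, in which every group pair gets the same rate, equals the sum of G permutations that each carry that rate. The sketch below uses cyclic shifts, which is one possible choice assumed here; the slides do not say which permutations the authors use, and the case with missing linecards (non-uniform rates) is not covered.

```python
def cyclic_permutations(G: int):
    """Return G permutation matrices whose sum is the all-ones G x G matrix."""
    perms = []
    for shift in range(G):
        matrix = [[0] * G for _ in range(G)]
        for src in range(G):
            matrix[src][(src + shift) % G] = 1  # group src sends to src + shift
        perms.append(matrix)
    return perms

def matrix_sum(matrices):
    """Element-wise sum of equally sized square matrices."""
    G = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) for j in range(G)] for i in range(G)]

perms = cyclic_permutations(4)
# Every group pair is covered exactly once, i.e. the sum is the uniform mesh.
assert matrix_sum(perms) == [[1] * 4 for _ in range(4)]
```

Scaling each permutation by 2RL/G recovers the uniform mesh rates shown in the figure.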

  16. Hybrid Electro-Optical Architecture: Using MEMS Switches [Figure: Electronics on both sides (Groups/Racks 1 to G, each with linecards 1 to L on 2R links); Optics in the middle (MEMS switches)]

  17. When Linecards are Missing [Figure: Groups/Racks 1 to G, with linecards on 2R links, connected through MEMS switches]

  18. Router Wish List • Scale to High Linecard Speeds: No Centralized Scheduler; Optical Switch Fabric; Low Packet-Processing Complexity • Scale to High Number of Linecards: High Number of Linecards; Arbitrary Arrangement of Linecards • Provide Performance Guarantees: 100% Throughput Guarantee; Delay Guarantee; No Packet Reordering

  19. Questions • Number of MEMS Switches? • TDM Schedule?

  20. All Link Capacities Are Equal • Link Capacity ≈ 64 λ’s * 5 Gb/s/λ = 320 Gb/s = 2R [Figure: Laser/Modulator outputs λ1, λ2, …, λ64 combined by a MUX; Groups/Racks 1 to G with linecards on 2R links, connected through MEMS switches by links of capacity ≤ 2R]
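
The link-capacity figure can be checked against the linecard rate quoted earlier in the deck. The constant names below are illustrative only:

```python
WAVELENGTHS_PER_LINK = 64      # from this slide
GBPS_PER_WAVELENGTH = 5        # from this slide
LINECARD_RATE_GBPS = 160       # R, from the "Stanford 100Tb/s Router" slide

link_capacity_gbps = WAVELENGTHS_PER_LINK * GBPS_PER_WAVELENGTH
# One fiber carrying 64 wavelengths matches the 2R linecard link rate.
assert link_capacity_gbps == 2 * LINECARD_RATE_GBPS
print(link_capacity_gbps)  # 320
```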

  21. Example: 2 Groups of 2 Linecards [Figure: Group/Rack 1 and Group/Rack 2, each with linecards 1 and 2; labels 2R and 4R]

  22. Intuition on Worst-Case [Figure: Group/Rack 1, Group/Rack 2, …, Group/Rack G connected through MEMS switches; labels 2R, 2RL, ≤ 2R, and G-1]

  23. Number of MEMS Switches • Theorem: M ≤ L+G-1 • Examples:
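
The examples on this slide did not survive extraction. As a stand-in, the bound can be evaluated for numbers that do appear elsewhere in the deck. This sketch assumes the 640 linecards are spread evenly over 40 groups (the group count quoted on the running-time slide), which gives L = 16; that even split is an assumption, not something the slides state.

```python
def max_mems_switches(L: int, G: int) -> int:
    """Upper bound on the number of MEMS switches from the theorem M <= L + G - 1."""
    return L + G - 1

# Assumption: N = 640 linecards split evenly over G = 40 groups.
N, G = 640, 40
L = N // G
print(L, max_mems_switches(L, G))  # 16 55

# The "2 Groups of 2 Linecards" example from the deck.
print(max_mems_switches(2, 2))     # 3
```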

  24. Questions • Number of MEMS Switches? • TDM Schedule?

  25. TDM Schedule [Figure: Group A and Group B, each with linecards 1 and 2; labels 2R and 4R]

  26. TDM Schedule [Figure: transmit schedules for Tx Group A and Tx Group B]

  27. TDM Schedule [Figure: transmit schedules for Tx Group A and Tx Group B]

  28. Bad TDM Schedule [Figure: transmit schedules for Tx Group A and Tx Group B]

  29. TDM Schedule Algorithm • Intuition: (1) create a TDM schedule between groups (“Group A sends to group B”); (2) assign group connections to specific linecards (“Linecard A1 sends to linecard B3”) • Theorem: There exists a polynomial-time algorithm to find a correct TDM schedule.
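
The two-step intuition above can be sketched for the simplest case only: G fully populated groups of L linecards each. This is not the authors' polynomial-time algorithm, which the slides do not spell out; it ignores missing linecards and any constraints imposed by the MEMS switches, and all names are hypothetical.

```python
def uniform_tdm_schedule(G: int, L: int):
    """Frame of G*L slots; schedule[t][(g, l)] is the destination linecard.

    Step 1 picks a group-level connection for the slot (group g sends to
    group g + g_shift); step 2 picks the linecard inside that group.
    """
    schedule = []
    for t in range(G * L):
        g_shift, l_shift = t % G, t // G
        slot = {
            (g, l): ((g + g_shift) % G, (l + l_shift) % L)
            for g in range(G)
            for l in range(L)
        }
        schedule.append(slot)
    return schedule

def is_valid(schedule, G: int, L: int) -> bool:
    """Each slot is a permutation and every ordered pair meets exactly once."""
    linecards = {(g, l) for g in range(G) for l in range(L)}
    seen = set()
    for slot in schedule:
        if set(slot.values()) != linecards:   # two senders hit one receiver
            return False
        seen.update(slot.items())
    return len(seen) == len(linecards) ** 2

schedule = uniform_tdm_schedule(G=2, L=2)
assert is_valid(schedule, 2, 2)
print(schedule[1][(0, 0)])
```

Over one frame every linecard sends to every linecard (including itself, as in a full mesh) exactly once, and no receiver is targeted twice in the same slot. A schedule that breaks either property would fail `is_valid`.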

  30. Algorithm Running Time [Plot: running time in milliseconds versus number of linecards; curves for worst case, average case, and best case] [Verilog simulation, linecard placement generated uniformly at random among 40 groups, 4 ns clock cycle, 1000 runs per case. Source: Srikanth Arekapudi]

  31. Open Questions • Greedy TDM algorithm with more capacity? • A better switch fabric architecture?

  32. Thank you.
