1 / 27

Near-Optimal Oblivious Routing for 3D-Mesh Networks ICCD 2008

Near-Optimal Oblivious Routing for 3D-Mesh Networks ICCD 2008. Rohit Sunkam Ramanujam Bill Lin Electrical and Computer Engineering Department University of California, San Diego. Motivation: Networks-on-Chip. Chip-multiprocessors (CMPs) increasingly popular

fletcher
Download Presentation

Near-Optimal Oblivious Routing for 3D-Mesh Networks ICCD 2008

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Near-Optimal Oblivious Routingfor 3D-Mesh Networks ICCD 2008 Rohit Sunkam Ramanujam Bill Lin Electrical and Computer Engineering Department University of California, San Diego

  2. Motivation: Networks-on-Chip • Chip-multiprocessors (CMPs) increasingly popular • 2D-mesh networks often used as on-chip fabric 12.64mm I/O Area single tile 1.5mm 2.0mm 21.72mm Tilera Tile64 Intel 80-core I/O Area

  3. 2D to 3D Motivation: 3D Integrated Circuits • 3D Benefits • Reduced wire delays • Enormous bandwidth • Heterogeneous system integration • Natural progression • 3D-mesh for 3D CMPs

  4. Routing Algorithm Objectives • Maximize throughput • How much load the network can handle • Minimize hop count • Minimize routing delay between source and destination

  5. Challenges • For 2D-case, a near-optimal throughput routing algorithm with minimal hop count called O1TURN is known [Seo’05]. • Surprisingly, optimality of O1TURN does not extend to 3D case, actual throughput performance degrades severely. • Only known optimal throughput routing algorithm is Valiant (VAL) load-balancing, but VAL performs poorly on hop count (latency), twice that of minimal routing.

  6. Main Contribution • Developed a new oblivious routing algorithm called “Randomized Partially Minimal” (RPM) routing. • RPM provably guarantees near-optimal worst-case throughput in 3D case. • Optimal for even radix k (e.g. 8 x 8 x 8 mesh). • Within factor of 1/k2 for odd radix (e.g. 7 x 7 x 7 mesh). • Good latency performance. • Only factor of 1.33 of minimal routing (much better than 2x cost of VAL, only known routing algorithm with optimal throughput) • In practice, 3D-meshes are asymmetric because number of device layers less than number of tiles per edge. • e.g., for 16 x 16 x 4 mesh (4 layers), RPM’s hop count just factor of 1.1 of minimal routing.

  7. Outline • Motivation for our work • Existing 2D routing algorithms don’t extend well into 3D • RPM routing algorithm • Simulation results • Extensions and future work

  8. The 2D case Dimension-Ordered Routing (DOR) Route minimal XY Valiant load-balancing (VAL) Route source → randomly chosen intermediate node → destination Route minimal XY in both phases ROMM Same as VAL, but intermediate node restricted to minimal direction Orthogonal 1-TURN (O1TURN) Route minimal XY and YX with equal probability Extending to the 3D case … Dimension-Ordered Routing (DOR) Route minimal XYZ Valiant load-balancing (VAL) Route source → randomly chosen intermediate node → destination Route minimal XYZ in both phases ROMM Same as VAL, but intermediate node restricted to minimal direction Orthogonal 1-TURN (O1TURN) Route along one of 6 minimal orthogonal paths (XYZ, XZY, YXZ, YZX, ZXY, ZYX) with equal probability Existing Routing Algorithms

  9. Worst-Case Throughput • Best theoretical normalized worst-case throughput known to be 50% (well-known result). • Worst-case throughput analysis can be reduced to a maximal weighted matching problem [Towles’02]. • VAL achieves this optimal throughput, but has poor latency. • As shown next, DOR, ROMM, and O1TURN are all far from optimal in 3D.

  10. Poor Worst-Case Throughput VAL/Optimal Only6-15%

  11. How do 2D mesh algorithms fare in 3D? 8 x 8 x 8 Network • Worst case throughput of DOR, ROMM, O1TURN far from optimal • Average hop count of VAL far from minimal • Need a routing algorithm that can trade latency for worst-case throughput VAL DOR ROMM O1TURN Normalized Worst-Case Throughput 0.5 0.063 0.132 0.15 Normalized Average-Case Throughput 0.5 0.316 0.454 0.513 VAL DOR ROMM O1TURN Hop Count (normalized tominimal) 2 1 1 1

  12. Why O1TURN performs poorly in 3D? • O1TURN – Worst-Case throughput optimal for 2D but more than 3 times worse than optimal for 3D • The difference • 2D traffic matrix is “admissible” for 2D mesh • In 3D, projected traffic on each 2D plane is no longer admissible !! • Can we transform the 3D routing problem to routing admissible traffic on each 2D plane ?

  13. Outline • Motivation for our work • Existing 2D algorithms don’t extend well into 3D • RPM routing algorithm • Simulation results • Extensions and future work

  14. Randomized Partially-Minimal Routing (RPM) Phase-2 Z Intermediate layer to destination Random intermediate layer Destination Phase-1 Z Source to intermediate layer Z Y Source X XYorYX routing on the intermediate layer

  15. Main Idea • Load-balance uniformly across the vertical layers • Min XY/YX used on each layer • Main Result: RPM has near-optimal worst-case throughput • Achieves optimal worst-case throughput when network radix kis even • Within a factor of 1/k2 optimal when k is odd.

  16. RPM achieves Near-Optimal Worst Case Throughput (optimal for even radix) VAL/Optimal RPM

  17. Average-Case Throughput • RPM outperforms VAL, DOR, ROMM and O1TURN in average-throughput on randomly generated traffic.

  18. Average Hop Count • Normalized hop count of RPM • Symmetric Meshes - 1.33 times minimal compared to 2x for VAL • Asymmetric 16x16x4 Mesh – 1.1 times minimal

  19. Outline • Motivation for our work • Existing 2D routing algorithms don’t extend well into 3D • RPM routing algorithm • Simulation results • Extensions and future work

  20. Flit-Level Simulation • Ideal throughput evaluation assumes • Ideal single-cycle router • Infinite buffers • No contention in switches, no flow control • Flit-level simulation • PopNet network simulator • 4 stage router pipeline – Route computation, VC allocation, Switch arbitration, Link traversal • Credit-based flow control • 8 virtual channels, each 5 flits deep • Multi-flit packets injected into the network (5 flits/packet)

  21. Flit-Level Simulation (cont’d) • Network configurations simulated • 4 x 4 x 4 Mesh • 8 x 8 x 8 Mesh • 16 x 16 x 4 Mesh • Routing algorithms compared: DOR, VAL, ROMM, O1TURN, DUATO, RPM • DUATO is a minimal adaptive routing algorithm implemented for comparison • Four different traffic traces used • Transpose traffic – (x,y,z) → (y,z,x) • Complement traffic – (x,y,z) → (k-x-1, k-y-1, k-z-1) • Uniform traffic • Worst Case traffic pattern for DOR (DOR-WC) – (x,y,z) → (k-z-1, k-y-1, k-x-1)

  22. Uniform Traffic 8x8x8 Mesh 16x16x4 Mesh

  23. Transpose Traffic 8x8x8 Mesh 16x16x4 Mesh

  24. Complement Traffic 8x8x8 Mesh 16x16x4 Mesh

  25. DOR-WC Traffic 8x8x8 Mesh 16x16x4 Mesh

  26. To sum it up … • 3D IC technology is emerging. • Stacking cores in 3 dimensions offers several advantages over 2D placement of cores. • 2D minimal Mesh routing algorithms have poor worst-case throughput in 3D, VAL has high latency penalty. • RPM trades off latency (partially-minimal) for better worst case performance (near-optimal).

  27. Thank You Questions?

More Related