ARIADNE: Agnostic Reconfiguration In A Disconnected Network Environment

Konstantinos Aisopos (Princeton, MIT), Andrew DeOrio (Michigan), Li-Shiuan Peh (MIT), Valeria Bertacco (Michigan).


Presentation Transcript


  1. ARIADNE: Agnostic Reconfiguration In A Disconnected Network Environment. Konstantinos Aisopos (Princeton, MIT), Andrew DeOrio (Michigan), Li-Shiuan Peh (MIT), Valeria Bertacco (Michigan)

  2. What is “reconfiguration”? • Silicon technologies move into the nanometer regime … transistors become unreliable • In future chips of 100 billion transistors, 10% of transistors will eventually fail over the lifetime of the chip (Shekhar Borkar) • For architects: permanent faults • Our focus in this talk: Network-on-Chip • Cannot resend; need to re-route around the fault • Reconfiguration: “the process of replacing the routing algorithm” [Figure labels: P, P$, S$, NIC, R; S, D]

  3. Why is reconfiguration challenging? • XY routing [Figure labels: X, Y, S, D]

  4. Why is reconfiguration challenging? • XY routing • Agnostic Reconfiguration algorithm In A Disconnected Network Environment [Figure labels: S, D]

  5. Why is reconfiguration challenging? • XY routing • Agnostic Reconfiguration algorithm In A Disconnected Network Environment [Figure labels: S, D]
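The slides name XY routing without spelling it out. As a minimal sketch, assuming the usual dimension-ordered definition (travel along X first, then along Y) and hypothetical helper names `xy_route` and `first_broken_hop`, the following shows why a single faulty link is a problem: the route is fixed by the coordinates alone, so it cannot avoid the fault.

```python
def xy_route(src, dst):
    """Dimension-ordered route: move along X until aligned, then along Y."""
    (x, y), (dst_x, dst_y) = src, dst
    path = [(x, y)]
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append((x, y))
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


def first_broken_hop(path, faulty_links):
    """Return the first hop of `path` that crosses a faulty link, or None.

    `faulty_links` is a set of frozenset({a, b}) pairs (illustrative encoding).
    """
    for a, b in zip(path, path[1:]):
        if frozenset({a, b}) in faulty_links:
            return (a, b)
    return None


route = xy_route((0, 0), (2, 1))
faults = {frozenset({(1, 0), (2, 0)})}
print(route, first_broken_hop(route, faults))
```

With no faults `first_broken_hop` returns `None`; with the fault above, the only XY route from (0, 0) to (2, 1) is cut even though other paths exist in the mesh.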

  6. Outline • Motivation • Ariadne: Baseline, Deadlocks, Synchronization • Evaluation: Overhead, Performance, Reliability • Conclusions

  7. How will S find a path to D? [Figure labels: S, D]

  8. How will S find a path to D? [Figure: four routing tables (RT) whose entries for destination D are S, W, E, N]

  9. How will S find a path to D? [Figure: six routing tables (RT) whose entries for destination D are W,N; E,N; S; W,S; E,S; N]

  10. How will S find a path to D? [Figure labels: S, D]

  11. How will S find a path to D? [Figure: three routing tables (RT) whose entries for destination D are W, W, S]

  12. ARIADNE: baseline • Upon a fault that changes the topology… • a node can let everyone know how it can be reached with a single broadcast • N nodes can let everyone know how they can be reached with N broadcasts
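The baseline idea on this slide can be sketched in software as a flood: every node that hears a broadcast remembers the port it arrived on, and that port becomes its routing-table entry for the broadcaster. This is only an illustrative model, not the authors' hardware; the mesh coordinates, the `faulty_links` encoding, and the function names are assumptions, and deadlock avoidance is ignored here.

```python
from collections import deque

# Port names and (dx, dy) offsets; y is assumed to grow southwards.
PORTS = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0)}
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}


def broadcast(origin, width, height, faulty_links):
    """Flood from `origin`; return {node: port leading back toward origin}.

    Nodes absent from the result are disconnected from `origin`.
    """
    toward_origin = {}
    visited = {origin}
    queue = deque([origin])
    while queue:
        x, y = queue.popleft()
        for port, (dx, dy) in PORTS.items():
            nxt = (x + dx, y + dy)
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height):
                continue  # off the edge of the mesh
            if frozenset({(x, y), nxt}) in faulty_links or nxt in visited:
                continue  # broken link, or the node already heard this broadcast
            visited.add(nxt)
            # nxt heard the broadcast from (x, y), so the reverse port leads to origin.
            toward_origin[nxt] = OPPOSITE[port]
            queue.append(nxt)
    return toward_origin


def build_routing_tables(width, height, faulty_links):
    """N broadcasts, one per node, fill every node's routing table."""
    tables = {(x, y): {} for x in range(width) for y in range(height)}
    for dest in tables:
        for node, port in broadcast(dest, width, height, faulty_links).items():
            tables[node][dest] = port
    return tables


faults = {frozenset({(0, 0), (1, 0)})}
tables = build_routing_tables(3, 3, faults)
print(tables[(0, 0)][(2, 0)])
```

In this 3×3 example the link between (0, 0) and (1, 0) is faulty, so node (0, 0) learns to reach (2, 0) by heading south first rather than east.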

  13. ARIADNE: baseline • Statically assigned node IDs • Upon a fault that changes the topology… • Every node broadcasts “in-turn” to let others know how it can be reached • Fault detector: 18 broadcasts 1st, 19 2nd, 20 3rd, …, 17 last [Figure: 8×8 mesh, nodes 0 to 63]

  14. ARIADNE: baseline • Upon a fault that changes the topology… • Every node broadcasts “in-turn” to let others know how it can be reached • Fault detector: 18 broadcasts 1st, 19 2nd, 20 3rd, …, 17 last • Issues: • deadlock avoidance • synchronization (when to broadcast, multiple detectors)

  15. ARIADNE: deadlocks [Figure labels: S, D]

  16. ARIADNE: deadlocks [Figure: three S/D pairs]

  17. ARIADNE: deadlocks • up*/down*: disable routes where rank goes down, then up • First bcast ONLY: nodes are assigned ranks, from higher to lower: bcaster “root” 0, immediate neighbors 1, 2-hop neighbors 2, 3-hop neighbors 3 • Assume a routing circle: 1 node will have higher rank than its neighbors, breaking the circular route • Unique ordering: among nodes with same rank, arbitrarily select a higher one [Figure: 8×8 mesh, nodes 0 to 63]

  18. ARIADNE: deadlocks • up*/down*: disable routes where rank goes down, then up • First bcast ONLY: nodes are assigned ranks, from higher to lower: bcaster “root” 0, immediate neighbors 1, 2-hop neighbors 2, 3-hop neighbors 3 • Assume a routing circle: 1 node will have higher rank than its neighbors, breaking the circular route • Unique ordering: among nodes with same rank, arbitrarily select a higher one

  19. ARIADNE: deadlocks • up*/down*: disable routes where rank goes down, then up • First bcast ONLY: nodes are assigned ranks, from higher to lower: bcaster “root” 0, immediate neighbors 1, 2-hop neighbors 2, 3-hop neighbors 3 • Assume a routing circle: 1 node will have higher rank than its neighbors, breaking the circular route • Unique ordering: among nodes with same rank, arbitrarily select a higher one • Connectivity: can reach any node via the root [Figure labels: S, D]

  20. ARIADNE: deadlocks • Upon a fault that changes the topology… • Every node broadcasts “in-turn” to let others know how it can be reached • Issues: • deadlock avoidance • synchronization

  21. ARIADNE: deadlocks • Upon a fault that changes the topology… • Every node broadcasts “in-turn” to let others know how it can be reached • RULE: (i) first broadcast ranks nodes; (ii) remaining broadcasts spread ONLY via enabled turns • Issues: • synchronization
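A small sketch of the ranking rule from the deadlock slides, under stated assumptions: ranks are hop counts from the root broadcaster, ties between equal-rank nodes are broken by node ID (the slides only say the choice is arbitrary), and the names `assign_ranks`, `goes_up`, and `path_is_legal` are hypothetical. A path is legal if it never turns from a down hop to an up hop.

```python
from collections import deque


def assign_ranks(adjacency, root):
    """First broadcast: rank = hop count from the root (root 0, neighbors 1, ...)."""
    rank = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in rank:
                rank[neighbor] = rank[node] + 1
                queue.append(neighbor)
    return rank


def goes_up(rank, a, b):
    """True if hop a -> b moves toward the root in the unique (rank, id) order."""
    return (rank[b], b) < (rank[a], a)


def path_is_legal(rank, path):
    """up*/down*: zero or more up hops followed by zero or more down hops."""
    seen_down = False
    for a, b in zip(path, path[1:]):
        if goes_up(rank, a, b):
            if seen_down:
                return False  # a down -> up turn is disabled
        else:
            seen_down = True
    return True


# Four nodes in a ring: 0 - 1 - 3 - 2 - 0, with node 0 as the root.
adjacency = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
rank = assign_ranks(adjacency, 0)
print(path_is_legal(rank, [1, 0, 2]), path_is_legal(rank, [1, 3, 2]))
```

Going from 1 to 2 through the root is allowed (up, then down), while going through node 3 is not (down, then up). Because the full ring always contains such a turn, no packet can circulate around it, yet every pair of nodes stays connected via the root.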

  22. ARIADNE: synchronization • “in turn” broadcasts: when does previous broadcast complete? can broadcasts overlap? NO • how does the recipient of a flag know the broadcasting node? • arbitration [Figure: links labeled “1-bit”]

  23. ARIADNE: synchronization • Solution: Atomic Broadcasts • Nodes utilize the cycle count as a global reference point • Each node is assigned a unique broadcast slot from the “global” cycle counter

  24. ARIADNE: synchronization • Cycle count (same for all nodes), split into two fields: bcast node (log(16) bits) and bcast cycle (log(16) bits) • waits for: bcast node 5, bcast cycle 0 • 5 initiates bcast … 5’s bcast completes • 6 initiates bcast … 6’s bcast completes • … 4’s bcast completes • longest (in hops) broadcast … • reconfiguration completes in 16² = (number of nodes)² cycles [Figure: 4×4 mesh, nodes 0 to 15, with counter bit values]
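The slot arithmetic on this slide can be sketched as follows. The assumptions here go beyond what the slide states: each broadcast slot is taken to last exactly `num_nodes` cycles, the upper counter field names the node whose slot it is, and the function names are hypothetical.

```python
def decode_counter(cycle, num_nodes):
    """Split the global cycle count into (bcast node, bcast cycle) fields.

    Assumption: a slot lasts num_nodes cycles, so for num_nodes = 16 each
    field is log2(16) = 4 bits wide.
    """
    return (cycle // num_nodes) % num_nodes, cycle % num_nodes


def broadcast_schedule(detector, num_nodes):
    """Order of the in-turn broadcasts, starting from the fault detector."""
    return [(detector + i) % num_nodes for i in range(num_nodes)]


def reconfiguration_cycles(num_nodes):
    """num_nodes slots of num_nodes cycles each."""
    return num_nodes * num_nodes


print(decode_counter(0x50, 16))
print(broadcast_schedule(5, 16))
print(reconfiguration_cycles(16))
```

For a 16-node network with node 5 as the detector, the schedule runs 5, 6, …, 15, 0, …, 4, and the total comes to 256 cycles, matching the (number of nodes)² figure on the slide.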

  25. ARIADNE: synchronization • Cycle count (same for all nodes): bcast node, bcast cycle • waits for: bcast node 5, bcast cycle 0 • waits for: bcast node 8, bcast cycle 0 • 5 initiates bcast (1st hop, 2nd hop) … 5’s bcast completes • 8 resigns from becoming the root node • (!) we need to reconfigure once even for multiple faults [Figure: 4×4 mesh, nodes 0 to 15, with counter bit values]

  26. Outline • Motivation • Ariadne: Baseline, Deadlocks, Synchronization • Evaluation: Overhead, Performance, Reliability • Conclusions

  27. Evaluation: Overhead • On-chip routing algorithms for irregular topologies: • Immunet (V. Puente, ISCA’04): reserves an escape VC for deadlock freedom (routes deterministically in a ring); 6.0% area overhead • Vicis routing algo (D. Fick, DATE’09): exceptions to turn model to apply it to an arbitrary topology; 1.5% area overhead • ARIADNE: 2.0% area overhead • Synthesized a baseline 5-stage pipelined router (5 ports, 2 VCs, 5-flit buffer/VC) with Synopsys Design Compiler (IBM 130nm target library) • Router area (mm²): baseline = 2.708, Ariadne = 2.761, Vicis = 2.748, Immunet = 2.870
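The overhead percentages follow from the router areas quoted on this slide. A short check, using only those four numbers (the variable and function names are illustrative):

```python
BASELINE_MM2 = 2.708
AREAS_MM2 = {"Ariadne": 2.761, "Vicis": 2.748, "Immunet": 2.870}


def overhead_percent(area, baseline=BASELINE_MM2):
    """Area overhead relative to the baseline router, in percent."""
    return 100.0 * (area - baseline) / baseline


for name, area in AREAS_MM2.items():
    print(f"{name}: {overhead_percent(area):.1f}%")
```

Rounded to one decimal place this gives 2.0% for Ariadne, 1.5% for Vicis, and 6.0% for Immunet.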

  28. Evaluation: Performance • Experimental setup: Garnet + GEMS • System Configuration (GEMS) • Network Architecture (GARNET) • Average over 10 PARSEC benchmarks • 1000 fault configurations [Plot: ← lower is better; annotations: deadlocks; ↑ traffic; routing in a ring]

  29. Evaluation: Performance + Reliability • On-chip routing algorithms for irregular topologies: • Immunet (V. Puente, ISCA’04): reserves an escape VC for deadlock freedom (routes deterministically in a ring); 6.0% area overhead • Vicis routing algo (D. Fick, DATE’09): exceptions to turn model to apply it to an arbitrary topology; 1.5% area overhead • ARIADNE: 2.0% area overhead

  30. Outline • Motivation • Ariadne: Baseline, Deadlocks, Synchronization • Evaluation: Overhead, Performance, Reliability • Conclusions

  31. Conclusions • We have presented Ariadne, a reconfiguration algorithm that: • provides deadlock-free routing paths in irregular network topologies that result from faulty links • is implemented in a fully distributed mode, resulting in simple hardware and low complexity • enables a trade-off between performance and reliable functionality on unreliable silicon

  32. Thank You! Questions? • The Greek legend of Princess Ariadne [source: Wikipedia]: “Ariadne (Αριάδνη), was the daughter of King Minos of Crete. Minos attacked Athens after his son was killed there. The Athenians asked for terms, and were required to sacrifice seven young men and seven maidens every nine years to the Minotaur, a monster with the head of a bull on the body of a man. One year, the sacrificial party included Theseus, a young man who volunteered to come and kill the Minotaur. Ariadne fell in love at first sight, and helped him by giving him a ball of red fleece thread that she was spinning, to find his way out of the Minotaur's labyrinth.” • Similarly to Princess Ariadne, our Ariadne algorithm helps packets find their way in the labyrinth-like topology of a faulty network.

  33. BACKUP SLIDES

  34. Evaluation: Results (reconfiguration dynamics)

  35. Ariadne Architecture

  36. Ariadne vs. off-chip approaches • Off-chip networks utilize centralized software algorithms for reconfiguration: (-) topology needs to be communicated to a central node; (-) interfacing with OS; (-) updated routing tables need to be delivered back to each node [Figure labels: C, OS]

  37. Ariadne: Motivation • Many transistor failures are expected to occur in NoCs fabricated at advanced technology nodes • These failures will result in router link failures and disconnected routers

  38. Evaluation: Related Work • On-chip routing algorithms for irregular topologies implemented for comparison: Immunet (V. Puente, ISCA 2004) and Vicis routing algorithm (D. Fick, DATE 2009) • Immunet: • 1 escape VC reserved for deadlock freedom: routes deterministically in a ring • Other VCs route adaptively. BUT, if no other VC is available, a packet switches to escape VC • latency ↑ when traffic ↑ • 6% overhead (3 routing tables) • reliability [Figure labels: S, D]

  39. Evaluation: Related Work • On-chip routing algorithms for irregular topologies implemented for comparison: Immunet (V. Puente, ISCA 2004) and Vicis routing algorithm (D. Fick, DATE 2009) • Immunet: • 1 escape VC reserved for deadlock freedom: routes deterministically in a ring • Other VCs route adaptively. BUT, if no other VC is available, a packet switches to escape VC • latency ↑ when traffic ↑ • 6% overhead (3 routing tables) • reliability • Vicis: • No faults? use turn model • Upon fault occurrence: re-enable disabled turns to increase connectivity [Figure annotations: “?”; “(assuming north last)”]

  40. [Figure: 8×8 mesh, nodes 0 to 63]
