
Augmenting FPGAs with Embedded Networks-on-Chip

Mohamed ABDELFATTAH, Vaughn BETZ. Outline: 1. Why NoCs on FPGAs? 2. Embedded NoCs. 3. Comparison Against Buses.


Presentation Transcript


  1. Augmenting FPGAs with Embedded Networks-on-Chip Mohamed ABDELFATTAH Vaughn BETZ

  2. Outline 1 Why NoCs on FPGAs? 2 Embedded NoCs 3 Comparison Against Buses

  3. 1. Why NoCs on FPGAs? Motivation Logic Blocks Switch Blocks Wires Interconnect

  4. 1. Why NoCs on FPGAs? Motivation Logic Blocks Switch Blocks • Hard Blocks: • Memory • Multiplier • Processor Wires

  5. 1. Why NoCs on FPGAs? Motivation 1600 MHz Hard Interfaces DDR/PCIe .. Logic Blocks 800 MHz Switch Blocks Interconnect still the same • Hard Blocks: • Memory • Multiplier • Processor Wires 200 MHz

  6. 1. Why NoCs on FPGAs? Motivation 1600 MHz Problems: • Bandwidth requirements for hard logic/interfaces • Timing closure DDR3 PHY and Controller PCIe Controller 800 MHz 200 MHz Gigabit Ethernet

  7. 1. Why NoCs on FPGAs? Motivation. Problems: bandwidth requirements for hard logic/interfaces; timing closure; high interconnect utilization (huge CAD problem, slow compilation, power/area utilization); wire speed not scaling (delay is interconnect-dominated). Hard interfaces shown: DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet.

  8. [Figure: aerial views of Los Angeles and Barcelona. Source: Google Earth] Keep the “roads”, but add “freeways”. Figure labels: Logic Cluster, Hard Blocks.

  9. 1. Why NoCs on FPGAs? FPGA with NoC. Problems: bandwidth requirements for hard logic/interfaces; timing closure; high interconnect utilization (huge CAD problem, slow compilation, power/area utilization); wire speed not scaling (delay is interconnect-dominated). NoC components: Routers and Links. A router forwards a data packet; the destination router moves the data to the local interconnect. Hard interfaces shown: DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet.

  10. 1. Why NoCs on FPGAs? FPGA with NoC. Problems: bandwidth requirements for hard logic/interfaces; timing closure; high interconnect utilization (huge CAD problem, slow compilation, power/area utilization); wire speed not scaling (delay is interconnect-dominated). Abstraction favours modularity: parallel compilation, partial reconfiguration, multi-chip interconnect. Callouts: high-bandwidth endpoints are known, so pre-design the NoC to requirements; NoC links are “re-usable”; NoC is heavily “pipelined”; NoC abstraction favors modularity. Hard interfaces shown: DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet.

  11. 1. Why NoCs on FPGAs? FPGA with NoC. Problems: bandwidth requirements for hard logic/interfaces; timing closure; high interconnect utilization (huge CAD problem, slow compilation, power/area utilization); wire speed not scaling (delay is interconnect-dominated). Abstraction favours modularity: parallel compilation, partial reconfiguration, multi-chip interconnect. Callouts: latency-tolerant communication; NoC abstraction favors modularity. Hard interfaces shown: DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet.
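
The slides describe routers that forward a packet hop by hop and then hand the data to the local interconnect at the destination. As a rough illustration only, the sketch below routes a packet across a 2D mesh (the topology the talk assumes on slide 21) using dimension-order (XY) routing. The routing algorithm and the names `xy_next_hop` and `route` are assumptions made for this example; the slides do not say which routing algorithm the embedded routers use.

```python
from typing import List, Optional, Tuple

Coord = Tuple[int, int]  # (column, row) of a router in the mesh


def xy_next_hop(cur: Coord, dst: Coord) -> Optional[Coord]:
    """Return the next router on the path, or None once the packet has arrived."""
    (cx, cy), (dx, dy) = cur, dst
    if cx != dx:
        # First correct the column (X dimension), one hop at a time.
        return (cx + (1 if dx > cx else -1), cy)
    if cy != dy:
        # Then correct the row (Y dimension).
        return (cx, cy + (1 if dy > cy else -1))
    # Already at the destination router: the data leaves the NoC here
    # and moves to the local interconnect.
    return None


def route(src: Coord, dst: Coord) -> List[Coord]:
    """List every router a packet visits from src to dst, inclusive."""
    path = [src]
    nxt = xy_next_hop(src, dst)
    while nxt is not None:
        path.append(nxt)
        nxt = xy_next_hop(nxt, dst)
    return path


print(route((0, 0), (2, 1)))
```

Each router needs only its own coordinates and the destination to pick an output port, which is one reason simple dimension-order schemes are common in mesh NoCs.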

  12. 1. Why NoCs on FPGAs? Compute Acceleration (columns: GPU, CPU). Maxeler: geoscience (14x, 70x), financial analysis (5x, 163x). Altera OpenCL: video compression (3x, 114x), information filtering (5.5x).

  13. 1. Why NoCs on FPGAs? Compute Acceleration

  14. 1. Why NoCs on FPGAs? Compute Acceleration

  15. 1. Why NoCs on FPGAs? Compute Acceleration NoC

  16. Outline 1 Why NoCs on FPGAs? 2 Embedded NoCs Mixed NoCs Hard NoCs 3 Comparison Against Buses

  17. 2. Embedded NoCs. “Soft” NoC = Soft Routers + Soft Links. “Mixed” NoC = Hard Routers + Soft Links. “Hard” NoC = Hard Routers + Hard Links.

  18. Methodology Soft Mixed Hard FPGA CAD Tools ASIC CAD Tools Area Speed Design Compiler Power? Power HSPICE Gate-level simulation Gate-level simulation Toggle rates

  19. 2. Embedded NoCs Mixed NoCs. Figure labels: FPGA, Logic blocks, Programmable “soft” interconnect, Router, Baseline Router. “Mixed” NoC = Hard Routers + Soft Links.

  20. 2. Embedded NoCs Mixed NoCs. Figure labels: FPGA, Router. “Mixed” NoC = Hard Routers + Soft Links.

  21. 2. Embedded NoCs Mixed NoCs. Figure labels: FPGA, Router. Special feature: configurable topology. A mesh is assumed, but the soft links can form any topology.

  22. 2. Embedded NoCs Hard NoCs. Figure labels: FPGA, Logic blocks, Programmable “soft” interconnect, Dedicated “hard” interconnect, Router. “Hard” NoC = Hard Routers + Hard Links.

  23. 2. Embedded NoCs Hard NoCs. Figure labels: FPGA, Router. “Hard” NoC = Hard Routers + Hard Links.

  24. 2. Embedded NoCs Hard NoCs. Figure labels: FPGA, Router. Special feature: Low-V mode (1.1 V lowered to 0.9 V), which saves 33% dynamic power and is ~15% slower. “Hard” NoC = Hard Routers + Hard Links.
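
The 33% figure for Low-V mode is consistent with the usual first-order model in which dynamic power scales with the square of the supply voltage at a fixed clock frequency and switched capacitance. The slides do not state how the number was obtained, so the model below is an assumption used only as a sanity check, and `dynamic_power_ratio` is a name made up for this sketch.

```python
def dynamic_power_ratio(v_new: float, v_old: float) -> float:
    """Ratio of dynamic power after a supply-voltage change.

    Assumes P_dyn is proportional to C * V^2 * f with C and f held constant,
    so only the V^2 term changes. This is a first-order approximation.
    """
    return (v_new / v_old) ** 2


# Voltages taken from slide 24: nominal 1.1 V, Low-V mode 0.9 V.
saving = 1.0 - dynamic_power_ratio(0.9, 1.1)
print(f"Estimated dynamic power saving: {saving:.1%}")
```

If the clock is also slowed by the ~15% the slide mentions, this model would predict a larger saving, so the 33% most plausibly describes the voltage effect alone at equal activity.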

  25. 3. Area/Power Analysis Soft, Mixed and Hard [65 nm] 64-node NoC on Stratix III. Area: Hard 448 LBs, Mixed 576 LBs, Soft ~12,500 LBs (Soft is 33% of the FPGA, embedded ~1.5% of the FPGA). Speed: embedded 730-940 MHz, Soft 166 MHz. Bisection BW: embedded ~50 GB/s, Soft ~10 GB/s.

  26. 3. Area/Power Analysis Soft, Mixed and Hard [65 nm] 64-node NoC on Stratix III. The hard NoC provides ~50 GB/s peak bisection bandwidth and is very cheap: less than the cost of 3 soft nodes. Area: Hard (Low-V) 448 LBs, Mixed 576 LBs, Soft ~12,500 LBs (Soft is 33% of the FPGA, embedded ~1.5% of the FPGA). Speed: embedded 730-940 MHz, Soft 166 MHz. Bisection BW: embedded ~50 GB/s, Soft ~10 GB/s.
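
The "less than cost of 3 soft nodes" claim can be checked from the area numbers on the slide. The sketch below assumes the soft NoC area is spread evenly over its 64 nodes, which the slides do not state; the variable names are chosen for this example.

```python
# Area figures from slides 25 and 26, in logic blocks (LBs), for a 64-node NoC.
SOFT_NOC_LBS = 12_500   # approximate, per the slide
MIXED_NOC_LBS = 576
HARD_NOC_LBS = 448
NODES = 64

# Assumption: soft NoC area divides evenly across nodes.
soft_lbs_per_node = SOFT_NOC_LBS / NODES
three_soft_nodes = 3 * soft_lbs_per_node

print(f"One soft node:    ~{soft_lbs_per_node:.0f} LBs")
print(f"Three soft nodes: ~{three_soft_nodes:.0f} LBs")
print(f"Entire hard NoC:  {HARD_NOC_LBS} LBs")
print(f"Entire mixed NoC: {MIXED_NOC_LBS} LBs")
```

Under that assumption, three soft nodes cost roughly 586 LBs, so both complete embedded NoCs (448 and 576 LBs) fit under that bound.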

  27. 3. Area/Power Analysis NoC Power Budget 250 GB/s total bandwidth 123% How much is used for system-level communication? 17.4 W Largest Stratix-III device Typical FPGA Dynamic Power

  28. 3. Area/Power Analysis NoC Power Budget 250 GB/s total bandwidth 123% 15% NoC 17.4 W Typical FPGA Dynamic Power

  29. 3. Area/Power Analysis NoC Power Budget 250 GB/s total bandwidth 11% 123% 15% NoC 17.4 W Typical FPGA Dynamic Power

  30. 3. Area/Power Analysis NoC Power Budget 250 GB/s total bandwidth 7% 11% 123% 15% NoC 17.4 W Typical FPGA Dynamic Power

  31. 3. Area/Power Analysis Bandwidth in Perspective. [Figure: DDR3, PCIe, Module 1 and Module 2 exchanging data over the NoC at full theoretical BW, crossing the whole chip; four streams at 14.6 GB/s and four at 17 GB/s.] Aggregate bandwidth: 126 GB/s. NoC power budget: 3.5%.
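
The 126 GB/s aggregate on this slide matches the sum of the eight stream bandwidths it lists. The sketch below redoes that arithmetic and converts the 3.5% budget to watts, assuming the percentage is taken relative to the 17.4 W typical FPGA dynamic power quoted on the preceding power-budget slides; that baseline is an assumption, since this slide does not restate it.

```python
# Stream bandwidths listed on slide 31, in GB/s.
streams_gbps = [14.6] * 4 + [17.0] * 4

aggregate_gbps = sum(streams_gbps)
print(f"Aggregate bandwidth: {aggregate_gbps:.1f} GB/s")

# Assumption: the 3.5% budget is a fraction of the 17.4 W typical
# dynamic power shown on slides 27 to 30.
TYPICAL_DYNAMIC_POWER_W = 17.4
NOC_BUDGET_FRACTION = 0.035

noc_power_w = NOC_BUDGET_FRACTION * TYPICAL_DYNAMIC_POWER_W
print(f"NoC power at that load: ~{noc_power_w:.2f} W")
```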

  32. Outline 1 Why NoCs on FPGAs? 2 Embedded NoCs 3 Comparison Against Buses Area/Power Efficiency Design Effort

  33. 4. Comparison DDR3: Qsys Bus vs. NoC Embedded NoC: 16 Nodes, hard routers & links Qsys bus: Build logical bus from fabric

  34. 4. Comparison DDR3: Qsys Bus vs. NoC “The Case for Embedded Networks-on-Chip on FPGAs” To appear in IEEE Micro Magazine (February) Embedded NoC: 16 Nodes, hard routers & links Qsys bus: Build logical bus from fabric

  35. 4. Comparison Design Effort close • Steps to close timing using Qsys FPGA

  36. 4. Comparison Design Effort far • Steps to close timing using Qsys FPGA

  37. 4. Comparison Design Effort far • Steps to close timing using Qsys FPGA Timing closure can be simplified with an embedded NoC

  38. 4. Comparison Area Comparison

  39. 4. Comparison Area Comparison

  40. 4. Comparison Area Comparison Entire NoC smaller than bus for 3 modules!

  41. 4. Comparison Area Comparison. With only 1/8 of the hard NoC BW used, it already takes less area for most systems.

  42. 4. Comparison Power Comparison Hard NoC saves power for even the simplest systems

  43. 1. Why NoCs on FPGAs? A big city needs freeways to handle traffic. 2. Embedded NoCs: Mixed & Hard. Power: 9-15X, Area: 20-23X, Speed: 5-6X. Area budget for 64 nodes: ~1%. Power budget for 100 GB/s: 3-7%. 3. Comparison Against P2P/Buses. Raw efficiency close to simplest P2P links. NoC more efficient & lower design effort.

  44. Thank You! www.eecg.utoronto.ca/~mohamed
