
A Flexible Interconnection Structure for Reconfigurable FPGA Dataflow Applications


Presentation Transcript


1. A Flexible Interconnection Structure for Reconfigurable FPGA Dataflow Applications
Gianluca Durelli, Alessandro A. Nacci, Riccardo Cattaneo, Christian Pilato, Donatella Sciuto and Marco Domenico Santambrogio
Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria, Milano, IT
[durelli, nacci, rcattaneo, pilato, sciuto]@elet.polimi.it, marco.santambrogio@polimi.it
20th Reconfigurable Architectures Workshop, May 20-21, 2013, Boston, USA

2. Rationale
• Strive for performance in compute-intensive applications
• Reconfigurable HW is well suited to certain classes of applications: multimedia, computational biology, physical simulation
• FPGAs are used in HPC systems, where high maintenance costs create the need to share resources among users
• Hence the need to dynamically share and reuse components on the FPGA among different users

3. Outline
Goals • State of the Art • Proposed Solution • Design and Evaluation • Case Study • Conclusions and Future Work

4. Goals
• Design an interconnection able to:
  • create different pipelines by reusing the components available on the FPGA
  • share resources between different applications
  • introduce no stalls in the pipeline
• Target: FPGAs in an HPC scenario

5. State of the Art
• BUS interconnection: congestion problems, does not scale
• Network on Chip: good scalability, but still possible congestion problems
• Both introduce unexpected delays in the computation and cannot guarantee performance when the device is shared between different users

6. Proposed Solution
• Switch-based interconnection:
  • core inputs connected to interconnection outputs, core outputs connected to interconnection inputs
  • fully pipelined point-to-point communication
  • data are read/written only when all the inputs are available
• Configured by setting, for each input and output channel:
  • the multiplexer configuration used to route information
  • the clock cycle from which the channel becomes active
  • how much data has to be read/written through that channel (a behavioral sketch follows)
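The channel configuration above can be made concrete with a small behavioral model. The following plain-Java sketch is ours, not the paper's code: the names ChannelConfig, SwitchFabric, and step are assumptions, and null stands in for "no datum on the wire".

```java
// Behavioral model of the switch-based interconnection. Illustrative only:
// class and field names are our assumptions, not taken from the paper.
class ChannelConfig {
    final int muxSelect;   // which interconnection input feeds this output channel
    final int startCycle;  // first clock cycle at which the channel becomes active
    final int wordCount;   // how many data words are transferred on the channel

    ChannelConfig(int muxSelect, int startCycle, int wordCount) {
        this.muxSelect = muxSelect;
        this.startCycle = startCycle;
        this.wordCount = wordCount;
    }
}

class SwitchFabric {
    private final ChannelConfig[] outputs;
    private int cycle = 0;

    SwitchFabric(ChannelConfig[] outputs) {
        this.outputs = outputs;
    }

    // One fully pipelined clock step: each active output channel forwards the
    // datum from its selected input; null marks "no data available yet".
    Integer[] step(Integer[] inputs) {
        Integer[] routed = new Integer[outputs.length];
        for (int o = 0; o < outputs.length; o++) {
            ChannelConfig c = outputs[o];
            boolean active = cycle >= c.startCycle
                          && cycle < c.startCycle + c.wordCount;
            if (active && inputs[c.muxSelect] != null) {
                routed[o] = inputs[c.muxSelect];
            }
        }
        cycle++;
        return routed;
    }
}
```

Under this model, reconfiguring the interconnection for a different pipeline amounts to loading a new ChannelConfig array; the user cores themselves are untouched.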

7. Proposed Solution
• Suited for dataflow/pipelined applications
• The parameters can be extracted from a high-level description of the application and of the pipeline structure, which makes it possible to automate parameter extraction and interconnection design (see the sketch below)
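As a hint of how such automation could look, this sketch (reusing the hypothetical ChannelConfig model above) derives one channel configuration per hop of a linear pipeline. The one-cycle-per-hop activation schedule is our simplifying assumption, not the paper's method.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative extraction of switch parameters from a high-level pipeline
// description (an ordered list of core ids). All names are our assumptions.
class PipelineMapper {
    static List<ChannelConfig> map(int[] coreIds, int wordsPerStream) {
        List<ChannelConfig> configs = new ArrayList<>();
        for (int hop = 0; hop + 1 < coreIds.length; hop++) {
            // Route the output of coreIds[hop] to the input of coreIds[hop+1];
            // activating each hop one cycle later than the previous one lets
            // the pipeline fill without inserting stalls.
            configs.add(new ChannelConfig(coreIds[hop], hop + 1, wordsPerStream));
        }
        return configs;
    }
}
```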

8. Implementation
• Solution implemented with HLS:
  • HLS is well suited for dataflow/stencil-loop synthesis
  • simplifies HW development
  • generates compatible interfaces
• Maxeler Technologies:
  • HPC dataflow computing exploiting FPGAs
  • proprietary HLS starting from a Java-like description: the proposed interconnection is easily described in Java (a hedged sketch follows)
• MaxWorkstation 3A:
  • Intel i7 quad-core
  • Xilinx Virtex-6 XC6VSX475T
  • PCIe communication: maximum of 8 channels/streams
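For flavor, here is a hedged sketch of how a minimal 2-in/2-out switch stage might be expressed in Maxeler's Java-like HLS (MaxJ). It is not the paper's code: the kernel and stream names are ours, the imports follow Maxeler's public MaxCompiler v2 tutorials, it builds only with that proprietary toolchain, and exact signatures may differ by version.

```java
import com.maxeler.maxcompiler.v2.kernelcompiler.Kernel;
import com.maxeler.maxcompiler.v2.kernelcompiler.KernelParameters;
import com.maxeler.maxcompiler.v2.kernelcompiler.types.base.DFEVar;

// Hedged MaxJ sketch of a 2-in/2-out switch stage; illustrative only.
class SwitchStageKernel extends Kernel {
    SwitchStageKernel(KernelParameters parameters) {
        super(parameters);
        // Run-time scalar chooses the routing: 0 = straight, 1 = crossed.
        DFEVar sel = io.scalarInput("sel", dfeUInt(1));
        DFEVar in0 = io.input("in0", dfeUInt(32));
        DFEVar in1 = io.input("in1", dfeUInt(32));
        // Fully pipelined multiplexers: routing adds no stalls to the streams.
        io.output("out0", control.mux(sel, in0, in1), dfeUInt(32));
        io.output("out1", control.mux(sel, in1, in0), dfeUInt(32));
    }
}
```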

9. Evaluation: Area Occupation
• Area increases by 10-30% due to the additional switching logic
• The interconnection consumes up to 6% of the FPGA, so plenty of space remains for user cores

10. Evaluation: Frequency
• Tested with pass-through cores to evaluate the maximum working frequency of the interconnection: 300 MHz
• In real-life applications (a brain-network application with cores running at 200 MHz), the interconnection does not affect the critical path

11. Case Study
• Application: image processing pipeline (up to 4 stages) built from gray-scale (GS), Gaussian-blur (GB), and edge-detection (ED) filters and their combinations
• Tested architectures: (A)-(D) [shown in figure]
• Experiments:
  • single execution of an N-stage pipeline
  • batch execution of a workload of 100 random applications (sketched below)
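To make the batch experiment's shape concrete, the sketch below enumerates filter combinations and draws a 100-application workload. Only the workload size (100) and the filter names come from the slide; the subset enumeration, the fixed filter order, and the uniform random sampling are our assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical workload generator for the batch experiment; illustrative only.
class WorkloadGenerator {
    static final String[] FILTERS = {"GS", "GB", "ED"};

    public static void main(String[] args) {
        // Enumerate the non-empty subsets of the filters, applied in a fixed
        // order, as a stand-in for the paper's pipeline combinations.
        List<List<String>> pipelines = new ArrayList<>();
        for (int mask = 1; mask < (1 << FILTERS.length); mask++) {
            List<String> p = new ArrayList<>();
            for (int i = 0; i < FILTERS.length; i++) {
                if ((mask & (1 << i)) != 0) {
                    p.add(FILTERS[i]);
                }
            }
            pipelines.add(p);
        }
        // Draw a batch workload of 100 randomly chosen applications.
        Random rng = new Random(42);
        List<List<String>> workload = new ArrayList<>();
        for (int k = 0; k < 100; k++) {
            workload.add(pipelines.get(rng.nextInt(pipelines.size())));
        }
        System.out.println(pipelines.size() + " distinct pipelines, "
                + workload.size() + " applications in the workload");
    }
}
```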

12. Case Study: Single execution [figure: results for architectures (A)-(D)]

13. Case Study: Single execution [figure: results for architectures (A)-(D)]

14. Case Study: Batch execution
• The proposed solution (D) does not introduce overhead in the overall execution time w.r.t. the other two architectures
• At low system load: up to 30% reduction in the overall workload execution time

15. Case Study: Batch execution
• At low system load (1-2 applications): the proposed solution (D) does not introduce delays in the execution of a single application of the workload
• At higher system loads (more than 2 applications): 10%-30% reduction in single-application execution time

16. Conclusions and Future Work
• Conclusions:
  • design of an interconnection that supports HW resource sharing in a multi-application scenario
  • solution suited for dataflow/pipelined systems
  • different pipeline configurations can be realized at run-time
• Future work:
  • design of a mapping/reconfiguration strategy to allocate user cores and configure new core instances at run-time
