
A Flexible Interconnection Structure for Reconfigurable FPGA Dataflow Applications


Presentation Transcript


1. A Flexible Interconnection Structure for Reconfigurable FPGA Dataflow Applications
Gianluca Durelli, Alessandro A. Nacci, Riccardo Cattaneo, Christian Pilato, Donatella Sciuto and Marco Domenico Santambrogio
Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria, Milano, IT
[durelli, nacci, rcattaneo, pilato, sciuto]@elet.polimi.it, marco.santambrogio@polimi.it
20th Reconfigurable Architectures Workshop, May 20-21, 2013, Boston, USA

2. Rationale
• Strive for performance in compute-intensive applications
• Reconfigurable HW is well suited to certain classes of applications: multimedia, computational biology, physical simulation
• FPGAs are used in HPC systems, where high maintenance costs create the need to share resources among users
• Hence the need to dynamically share and reuse components on the FPGA among different users

3. Outline
Goals • State of the Art • Proposed Solution • Design and Evaluation • Case Study • Conclusions and Future Work

4. Goals
• Design an interconnection able to:
  • create different pipelines by reusing the components available on the FPGA
  • share resources between different applications
  • introduce no stalls in the pipeline
• Target: FPGAs in an HPC scenario

5. State of the Art
• BUS interconnection: congestion problems, does not scale
• Network on Chip: good scalability, but still possible congestion problems
• Both introduce unexpected delays in the computation and cannot guarantee performance when the device is shared between different users

6. Proposed Solution
• Switch-based interconnection:
  • core inputs connected to interconnection outputs, core outputs connected to interconnection inputs
  • fully pipelined point-to-point communication
  • data are read/written only when all the inputs are available
• Configured by setting, for each input and output channel:
  • the multiplexer configuration used to route information
  • the clock cycle from which the channel becomes active
  • how much data has to be read/written through that channel (a behavioral sketch follows)
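The channel configuration above can be made concrete with a small behavioral model. The following plain-Java sketch is ours, not the paper's code: the names ChannelConfig, SwitchFabric, and step are assumptions, and null stands in for "no datum on the wire".

```java
// Behavioral model of the switch-based interconnection. Illustrative only:
// class and field names are our assumptions, not taken from the paper.
class ChannelConfig {
    final int muxSelect;   // which interconnection input feeds this output channel
    final int startCycle;  // first clock cycle at which the channel becomes active
    final int wordCount;   // how many data words are transferred on the channel

    ChannelConfig(int muxSelect, int startCycle, int wordCount) {
        this.muxSelect = muxSelect;
        this.startCycle = startCycle;
        this.wordCount = wordCount;
    }
}

class SwitchFabric {
    private final ChannelConfig[] outputs;
    private int cycle = 0;

    SwitchFabric(ChannelConfig[] outputs) {
        this.outputs = outputs;
    }

    // One fully pipelined clock step: each active output channel forwards the
    // datum from its selected input; null marks "no data available yet".
    Integer[] step(Integer[] inputs) {
        Integer[] routed = new Integer[outputs.length];
        for (int o = 0; o < outputs.length; o++) {
            ChannelConfig c = outputs[o];
            boolean active = cycle >= c.startCycle
                          && cycle < c.startCycle + c.wordCount;
            if (active && inputs[c.muxSelect] != null) {
                routed[o] = inputs[c.muxSelect];
            }
        }
        cycle++;
        return routed;
    }
}
```

Under this model, reconfiguring the interconnection for a different pipeline amounts to loading a new ChannelConfig array; the user cores themselves are untouched.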

7. Proposed Solution
• Suited for dataflow/pipelined applications
• The parameters can be extracted from a high-level description of the application and of the pipeline structure, which makes it possible to automate parameter extraction and interconnection design (see the sketch below)
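As a hint of how such automation could look, this sketch (reusing the hypothetical ChannelConfig model above) derives one channel configuration per hop of a linear pipeline. The one-cycle-per-hop activation schedule is our simplifying assumption, not the paper's method.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative extraction of switch parameters from a high-level pipeline
// description (an ordered list of core ids). All names are our assumptions.
class PipelineMapper {
    static List<ChannelConfig> map(int[] coreIds, int wordsPerStream) {
        List<ChannelConfig> configs = new ArrayList<>();
        for (int hop = 0; hop + 1 < coreIds.length; hop++) {
            // Route the output of coreIds[hop] to the input of coreIds[hop+1];
            // activating each hop one cycle later than the previous one lets
            // the pipeline fill without inserting stalls.
            configs.add(new ChannelConfig(coreIds[hop], hop + 1, wordsPerStream));
        }
        return configs;
    }
}
```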

8. Implementation
• Solution implemented with HLS:
  • HLS is well suited for dataflow/stencil-loop synthesis
  • simplifies HW development
  • generates compatible interfaces
• Maxeler Technologies:
  • HPC dataflow computing exploiting FPGAs
  • proprietary HLS starting from a Java-like description: the proposed interconnection is easily described in Java (a hedged sketch follows)
• MaxWorkstation 3A:
  • Intel i7 quad-core
  • Xilinx Virtex-6 XC6VSX475T
  • PCIe communication: maximum of 8 channels/streams
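For flavor, here is a hedged sketch of how a minimal 2-in/2-out switch stage might be expressed in Maxeler's Java-like HLS (MaxJ). It is not the paper's code: the kernel and stream names are ours, the imports follow Maxeler's public MaxCompiler v2 tutorials, it builds only with that proprietary toolchain, and exact signatures may differ by version.

```java
import com.maxeler.maxcompiler.v2.kernelcompiler.Kernel;
import com.maxeler.maxcompiler.v2.kernelcompiler.KernelParameters;
import com.maxeler.maxcompiler.v2.kernelcompiler.types.base.DFEVar;

// Hedged MaxJ sketch of a 2-in/2-out switch stage; illustrative only.
class SwitchStageKernel extends Kernel {
    SwitchStageKernel(KernelParameters parameters) {
        super(parameters);
        // Run-time scalar chooses the routing: 0 = straight, 1 = crossed.
        DFEVar sel = io.scalarInput("sel", dfeUInt(1));
        DFEVar in0 = io.input("in0", dfeUInt(32));
        DFEVar in1 = io.input("in1", dfeUInt(32));
        // Fully pipelined multiplexers: routing adds no stalls to the streams.
        io.output("out0", control.mux(sel, in0, in1), dfeUInt(32));
        io.output("out1", control.mux(sel, in1, in0), dfeUInt(32));
    }
}
```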

9. Evaluation: Area Occupation
• Area increases by 10-30% due to the additional switching logic
• The interconnection consumes up to 6% of the FPGA, so plenty of space remains for user cores

10. Evaluation: Frequency
• Tested with pass-through cores to evaluate the maximum working frequency of the interconnection: 300 MHz
• In real-life applications (a brain-network application with cores running at 200 MHz), the interconnection does not affect the critical path

11. Case Study
• Application: image processing pipeline (up to 4 stages) built from gray-scale (GS), Gaussian-blur (GB), and edge-detection (ED) filters and their combinations
• Tested architectures: (A)-(D) [shown in figure]
• Experiments:
  • single execution of an N-stage pipeline
  • batch execution of a workload of 100 random applications (sketched below)
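To make the batch experiment's shape concrete, the sketch below enumerates filter combinations and draws a 100-application workload. Only the workload size (100) and the filter names come from the slide; the subset enumeration, the fixed filter order, and the uniform random sampling are our assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical workload generator for the batch experiment; illustrative only.
class WorkloadGenerator {
    static final String[] FILTERS = {"GS", "GB", "ED"};

    public static void main(String[] args) {
        // Enumerate the non-empty subsets of the filters, applied in a fixed
        // order, as a stand-in for the paper's pipeline combinations.
        List<List<String>> pipelines = new ArrayList<>();
        for (int mask = 1; mask < (1 << FILTERS.length); mask++) {
            List<String> p = new ArrayList<>();
            for (int i = 0; i < FILTERS.length; i++) {
                if ((mask & (1 << i)) != 0) {
                    p.add(FILTERS[i]);
                }
            }
            pipelines.add(p);
        }
        // Draw a batch workload of 100 randomly chosen applications.
        Random rng = new Random(42);
        List<List<String>> workload = new ArrayList<>();
        for (int k = 0; k < 100; k++) {
            workload.add(pipelines.get(rng.nextInt(pipelines.size())));
        }
        System.out.println(pipelines.size() + " distinct pipelines, "
                + workload.size() + " applications in the workload");
    }
}
```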

12. Case Study: Single execution [figure: results for architectures (A)-(D)]

13. Case Study: Single execution [figure: results for architectures (A)-(D)]

14. Case Study: Batch execution
• The proposed solution (D) does not introduce overhead in the overall execution time w.r.t. the other two architectures
• At low system load: up to 30% reduction in the overall workload execution time

15. Case Study: Batch execution
• At low system load (1-2 applications): the proposed solution (D) does not introduce delays in the execution of a single application of the workload
• At higher system loads (more than 2 applications): 10%-30% reduction in single-application execution time

16. Conclusions and Future Work
• Conclusions:
  • design of an interconnection that supports HW resource sharing in a multi-application scenario
  • solution suited for dataflow/pipelined systems
  • different pipeline configurations can be realized at run-time
• Future work:
  • design of a mapping/reconfiguration strategy to allocate user cores and configure new core instances at run-time
