National Sun Yat-sen University Embedded System Laboratory

National Sun Yat-sen University Embedded System Laboratory. SAGA: SystemC Acceleration on GPU Architectures. Presenter: Ming-Shiun Yang. Design Automation Conference (DAC), 2012 49th ACM/EDAC/IEEE. Sara Vinco (Italy), Debapriya Chatterjee (USA),


Presentation Transcript


  1. National Sun Yat-sen University Embedded System Laboratory. SAGA: SystemC Acceleration on GPU Architectures. Presenter: Ming-Shiun Yang. Design Automation Conference (DAC), 2012 49th ACM/EDAC/IEEE. Sara Vinco (Italy), Debapriya Chatterjee (USA), Valeria Bertacco (USA), Franco Fummi (Italy). 2013/01/21

  2. Abstract SystemC is a widespread language for HW/SW system simulation and design exploration, and thus a key development platform in embedded system design. However, the growing complexity of SoC designs is having an impact on simulation performance, leading to limited SoC exploration potential, which in turn affects development and verification schedules and time-to-market for new designs. Previous efforts have attempted to parallelize SystemC simulation, targeting both multiprocessors and GPUs. However, for practical designs, those approaches fall far short of satisfactory performance. This paper proposes SAGA, a novel simulation approach that fully exploits the intrinsic parallelism of RTL SystemC descriptions, targeting GPU platforms. By limiting synchronization events with ad-hoc static scheduling and separate independent dataflows, we show that we can simulate complex SystemC descriptions up to 16 times faster than traditional simulators.

  3. What’s the problem • Original SystemC simulation: a scheduler dispatches all processes to one core, so processing is sequential. • The growing complexity of SoC designs is having an impact on simulation performance.

  4. Related Works • [1,4,9,10] Parallel SystemC environments: heavy overhead, code modification • [7] CUDA Programming Guide: general-purpose programming interface • [3] HIFSuite: mapping SystemC to CUDA • This paper

  5. NVIDIA CUDA Architecture • Compute Unified Device Architecture (CUDA) • A programming interface for general-purpose GPU (GP-GPU) computing • The GPU is a co-processor capable of executing many threads in parallel [Figure: mapping SystemC to CUDA with HIFSuite; labels: SystemC, sc2hif, HIF file, hif2C, C file, CUDA]

  6. Proposed Method • SAGA • Use static scheduling to eliminate the need for frequent synchronization. • Carve out independent dataflows and map them to distinct threads and processors (parallel execution). [Figure: tool flow from SystemC through HIFSuite sc2hif to a HIF file, SAGA producing a modified HIF file, and HIFSuite hif2C producing a CUDA C file; proposed simulator (SAGA) compared with the traditional simulator]
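
The idea on this slide can be illustrated with a small sketch. It is not the SAGA implementation: the processes `inc` and `double`, the per-dataflow state dictionaries, and the helper `run_dataflow` are all hypothetical, and Python threads only stand in for GPU threads. Each dataflow runs its own fixed (static) process order and never waits on another dataflow.

```python
from concurrent.futures import ThreadPoolExecutor

def run_dataflow(schedule, state, cycles):
    """Run one dataflow's fixed process order for a number of cycles."""
    for _ in range(cycles):
        for process in schedule:  # static order, decided before simulation
            process(state)
    return state

# Hypothetical processes; each one only touches its own dataflow's state.
def inc(state):
    state["count"] += 1

def double(state):
    state["out"] = 2 * state["count"]

# Two independent dataflows: (static schedule, private state).
dataflows = [
    ([inc, double], {"count": 0, "out": 0}),
    ([inc, inc, double], {"count": 10, "out": 0}),
]

# One worker per dataflow; no synchronization between dataflows is needed
# because they share no signals.
with ThreadPoolExecutor(max_workers=len(dataflows)) as pool:
    futures = [pool.submit(run_dataflow, sched, state, 3)
               for sched, state in dataflows]
    results = [f.result() for f in futures]
```

Because the dataflows share no state, the result is the same whichever thread finishes first.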

  7. SAGA methodology – Step 1: Constructing the dependency graph
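
The slide gives only the step title. As a minimal sketch, assume each process is described by the set of signals it reads and the set it writes (a hypothetical representation, not taken from the paper); a process then depends on every other process that writes a signal it reads. The function name `build_dependency_graph` and the three-process design are illustrative.

```python
from collections import defaultdict

def build_dependency_graph(processes):
    """Map each process to the set of processes whose outputs it reads.

    `processes` maps a process name to a (reads, writes) pair of
    signal-name sets. This input format is an assumption.
    """
    writers = defaultdict(set)
    for name, (_reads, writes) in processes.items():
        for sig in writes:
            writers[sig].add(name)

    graph = {name: set() for name in processes}
    for name, (reads, _writes) in processes.items():
        for sig in reads:
            # Ignore self-dependencies (a process reading its own output).
            graph[name] |= writers[sig] - {name}
    return graph

# Hypothetical design: P1 and P2 feed P6.
processes = {
    "P1": ({"in_a"}, {"s1"}),
    "P2": ({"in_b"}, {"s2"}),
    "P6": ({"s1", "s2"}, {"out"}),
}
graph = build_dependency_graph(processes)
```

Signals with no writer, such as the primary inputs `in_a` and `in_b`, simply contribute no edges.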

  8. SAGA methodology – Step 2: Partitioning into concurrent dataflows

  9. Example – Step 2 [Figure: queue and current dataflow list at each iteration. The queue starts with P8. While the queue is not empty, the front process is popped: first P8, then P6, then P7. Final dataflow list: P6 P7 P8 P3 P1 P2 P4.]
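
The queue walk-through on this slide resembles a breadth-first traversal backwards from an output process. A minimal sketch under that reading follows; the edge set in `deps` is an assumption chosen to be consistent with the processes named on the slide, and the order inside the resulting list may differ from the slide's figure.

```python
from collections import deque

def carve_dataflow(deps, sink):
    """Collect `sink` and every process it transitively depends on."""
    queue = deque([sink])
    seen = {sink}
    dataflow = []
    while queue:                      # "Queue != Empty"
        proc = queue.popleft()        # pop the front process
        dataflow.append(proc)
        for pred in deps.get(proc, []):
            if pred not in seen:      # enqueue each predecessor once
                seen.add(pred)
                queue.append(pred)
    return dataflow

# Assumed dependencies: P8 reads from P6 and P7, and so on.
deps = {"P8": ["P6", "P7"], "P6": ["P1", "P2"], "P7": ["P3", "P4"]}
dataflow = carve_dataflow(deps, "P8")
```

Running the same function from each output process would give one dataflow per output.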

  10. SAGA methodology – Step 3: Process levelization and scheduling

  11. Example – Step 3 [Figure: dependency graph annotated with node levels -1, 0, 1 and 2] 1. Set all leaf nodes to level 0. 2. Set all non-leaf nodes to level -1. 3. If parent level < child level, set parent level = child level + 1. Example: P6’s level < P1’s level, so P6’s level = 0 + 1 = 1 …
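
The three rules above can be sketched as a fixed-point iteration. Two assumptions are made here: the graph is acyclic, and the comparison uses `<=` instead of the slide's `<` so that a parent always ends up strictly above every one of its children. The edge set in `children` and the helper names `levelize` and `schedule` are illustrative.

```python
def levelize(children):
    """Assign levels: leaves get 0, other nodes 1 + max(child levels)."""
    nodes = set(children) | {c for kids in children.values() for c in kids}
    # Rules 1 and 2: leaves start at 0, non-leaf nodes at -1.
    level = {n: (-1 if children.get(n) else 0) for n in nodes}
    changed = True
    while changed:                    # repeat rule 3 until nothing changes
        changed = False
        for parent, kids in children.items():
            for kid in kids:
                if level[kid] >= 0 and level[parent] <= level[kid]:
                    level[parent] = level[kid] + 1
                    changed = True
    return level

def schedule(level):
    """Group processes by level, lowest level first (a static schedule)."""
    by_level = {}
    for proc, lvl in level.items():
        by_level.setdefault(lvl, []).append(proc)
    return [sorted(by_level[lvl]) for lvl in sorted(by_level)]

# Assumed graph: a parent reads the outputs of its children.
children = {"P8": ["P6", "P7"], "P6": ["P1", "P2"], "P7": ["P3", "P4"]}
level = levelize(children)
order = schedule(level)
```

Processes in the same level do not depend on each other, so within one dataflow they can run in any order once the previous level has finished.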

  12. Experimental setup • Column 3: LOC, lines of code • Column 4: Dataflows (#), the number of dataflows produced by the partitioning in step 2 • Column 5: Replicated processes / the maximum amount of replication for these processes

  13. SAGA Performance and Speedup • Up to 16 times faster than a traditional SystemC simulator.

  14. Costs of Compilation • HIFSuite: a set of tools and APIs that support modeling and verification of HW/SW systems. [Figure: tool flow from SystemC through HIFSuite sc2hif to a HIF file, SAGA producing a modified HIF file, and HIFSuite hif2C producing a CUDA C file]

  15. Conclusion • Proposes a parallel scheduling method for SystemC simulation. • Introduces a novel partitioning technique that carves out independent dataflows and maps them to distinct threads and multiprocessors.

  16. My comments • Translating SystemC to CUDA with HIFSuite takes a long time. • The authors expect that a mature version could operate directly on SystemC source code (future work). • This paper is good: • Clear illustrations • Experimental results • Achieves its goal (reducing simulation time) • Thorough analysis • Comparison with other works