
Application-Specific Customization and Scalability of Soft Multiprocessors

Master’s Thesis Defense. Deepak Unnikrishnan. Chair: Prof. Russell Tessier. Funded by Altera Corporation and the National Science Foundation.


Presentation Transcript


  1. Application-Specific Customization and Scalability of Soft Multiprocessors. Master’s Thesis Defense. Deepak Unnikrishnan. Chair: Prof. Russell Tessier. Funded by Altera Corporation and the National Science Foundation.

  2. Outline • Motivation • Previous work • Design Components • Approach • Results • Conclusion

  3. Motivation • Emerging soft multiprocessor systems and applications • Fully automated soft multiprocessor design • Ease of use • Verifiability: existing parallel benchmarks • Flexibility: application-specific customizations • Applications: • Multi-core prototyping • End-to-end product designs

  4. Soft multiprocessor synthesis • FPGA-based soft multiprocessor systems for specific applications • IP packet forwarding [1] • MPEG/JPEG • Synthesis of latency/throughput-constrained stream applications [2] • Limitations • Tuned for a specific application • No individual processor optimizations • Not scalable [1] “An FPGA-based soft multiprocessor system for IPv4 packet forwarding,” Ravindran et al., FPL 2005. [2] “Efficient automated synthesis, programming, and implementation of multi-processor platforms on FPGA chips,” Nikolov et al., FPL 2006.

  5. Optimization/Interconnection • Soft-processor optimization techniques • Pipeline stages, ISA, shifter, forwarding logic [1] • Custom hardware [2] • Instruction scheduling and recoding [3] • Interconnects • Bus/Network-on-Chip • Topologies: ring, star, mesh, hypercube [4] • Limitations • Isolated evaluation of design tradeoffs • Limited benchmarks [1] “Application-specific customization of soft processor microarchitecture,” Yiannacouras et al., FPGA 2006. [2] “CUSTARD - A customizable threaded FPGA soft processor and tools,” Dimond et al., FPL 2007. [3] “Combining Instruction Coding and Scheduling to Optimize Energy in System-on-FPGA,” Dimond et al., FCCM 2006. [4] “Routability of Network Topologies in FPGAs,” Saldana et al., TVLSI, March 2007.

  6. Design Flow [Flow diagram; blocks in reading order: topology; StreamIt application; number of processors; custom features; StreamIt compiler; processor templates (SPREE); computation and communication code; soft multiprocessor generator; SoftCoreMapper; binary profiler; multiprocessor system designs; code for soft multiprocessors; SPREE; gcc; Quartus flow; area, performance, and power evaluation]

  7. StreamIt Example: Software Radio

         void->void pipeline FMRadio(int N, int freq1, int freq2) {
             add AtoD();
             add FMDemod();
             add splitjoin {
                 split duplicate;
                 for (int i = 0; i < N; i++) {
                     add pipeline {
                         add LowPassFilter();
                         add HighPassFilter();
                     }
                 }
                 join roundrobin();
             }
             add Adder();
             add Speaker();
         }

     [Stream graph shown for three branches: AtoD, FMDemod, Duplicate, (LPF1, HPF1), (LPF2, HPF2), (LPF3, HPF3), RoundRobin, Adder, Speaker]
     Courtesy: “Exploiting Coarse-Grained Task, Data, and Pipeline Parallelism in Stream Programs,” Gordon et al., ASPLOS 2006.
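
The graph on this slide is what the splitjoin loop unrolls to when N = 3. As a rough illustration only (this is a Python sketch, not StreamIt compiler output, and `expand_fm_radio` is a hypothetical helper name), the expansion can be written as:

```python
def expand_fm_radio(n):
    """Return (nodes, edges) of the flattened FMRadio stream graph for n branches.

    Node names follow the labels on the slide (LPF1, HPF1, ...); the function
    itself is an illustrative assumption, not part of the StreamIt toolchain.
    """
    nodes = ["AtoD", "FMDemod", "Duplicate"]
    edges = [("AtoD", "FMDemod"), ("FMDemod", "Duplicate")]
    for i in range(1, n + 1):
        lpf, hpf = f"LPF{i}", f"HPF{i}"
        nodes += [lpf, hpf]
        # The duplicate splitter feeds every branch; each branch is a
        # two-filter pipeline that drains into the round-robin joiner.
        edges += [("Duplicate", lpf), (lpf, hpf), (hpf, "RoundRobin")]
    nodes += ["RoundRobin", "Adder", "Speaker"]
    edges += [("RoundRobin", "Adder"), ("Adder", "Speaker")]
    return nodes, edges


nodes, edges = expand_fm_radio(3)
for src, dst in edges:
    print(f"{src} -> {dst}")
```

Each node in this flattened graph is a candidate unit of computation, and each edge a unit of communication, for the mapping steps on the following slides.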

  8. StreamIt Compiler Extensions [Flow diagram; blocks in reading order: StreamIt application; parsing and graph expansion; computation and communication; static order calculation; dead code elimination; partitioning; dependency analysis; StreamIt; SoftCoreMapper; layout; topology-based rescheduling; scheduling; code generation; soft multiprocessor code (computation and communication)]

  9. SPREE • Soft Processor Rapid Exploration Environment • Automatic processor generation from processor descriptions • Fine-grained microarchitectural customizations • Pipeline stages • Datapath • Instruction set • Excellent platform for hardware-software co-design evaluation [Flow diagram; blocks: processor description; RTL generator; Verilog processor designs; C application; MIPS gcc; Quartus CAD flow; area, power, frequency] Courtesy: “Application-specific customization of soft processor microarchitecture,” P. Yiannacouras et al., FPGA 2006.

  10. Multiprocessor architecture • Key architectural features: • Software flow control • Memory-mapped I/O ports • Local on-chip memories [Figure: processors numbered 0 to 5, each shown with IF/D, EX/M, and WB pipeline stages]
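
The slide names software flow control and memory-mapped I/O ports without giving the protocol. The sketch below is one plausible reading, stated as an assumption: the sender polls a "not full" status before each store to a port, and the receiver polls "not empty" before each load. The `Port` class, the buffer depth, and the receiver's pace are all hypothetical.

```python
from collections import deque


class Port:
    """Hypothetical model of one inter-processor link with a bounded buffer."""

    def __init__(self, depth):
        self.depth = depth
        self.fifo = deque()

    def can_write(self):
        # Status the sender would poll before storing to the port address.
        return len(self.fifo) < self.depth

    def can_read(self):
        # Status the receiver would poll before loading from the port address.
        return len(self.fifo) > 0

    def write(self, word):
        self.fifo.append(word)

    def read(self):
        return self.fifo.popleft()


def simulate(words, depth, receiver_period=2):
    """Step a sender and a slower receiver; count cycles the sender stalls."""
    port = Port(depth)
    pending = deque(words)
    received, stalls, cycle = [], 0, 0
    while pending or port.can_read():
        if pending:
            if port.can_write():
                port.write(pending.popleft())
            else:
                stalls += 1  # software flow control: retry on a later cycle
        # The receiver only services the port every receiver_period cycles.
        if cycle % receiver_period == receiver_period - 1 and port.can_read():
            received.append(port.read())
        cycle += 1
    return received, stalls


received, stalls = simulate([1, 2, 3, 4], depth=1)
print(received, stalls)
```

In this model a deeper buffer absorbs more of the rate mismatch between sender and receiver, which is the kind of tradeoff the interconnect buffer size optimization on the next slide concerns.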

  11. Multiprocessor Optimization • Topology • Interconnect buffer size • Pipeline stages • Instruction set architecture • Memory size

  12. Experimental Framework • Multiprocessor systems of size 4, 6, 9, and 16 • Altera Quartus II 8.0 / ModelSim 6.1g • Target platforms: • 90 nm Cyclone II (Altera DE2 board) • 90 nm Stratix II • 65 nm Stratix III (Altera DE3 board) • 16 soft-multiprocessor systems implemented on DE3 (65 nm Stratix III)
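
The slides give only the system sizes. Assuming each size is laid out as the most nearly square mesh (an assumption, though it matches the two rows of three processors numbered 0 to 5 on the architecture slide), the mesh shape and its nearest-neighbor links can be sketched as follows; `mesh_shape` and `mesh_links` are hypothetical helper names.

```python
import math


def mesh_shape(n):
    """Return (rows, cols) of the most nearly square mesh with n processors."""
    for rows in range(math.isqrt(n), 0, -1):
        if n % rows == 0:
            return rows, n // rows


def mesh_links(n):
    """List undirected nearest-neighbor links, numbering processors row by row."""
    rows, cols = mesh_shape(n)
    links = []
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            if c + 1 < cols:
                links.append((node, node + 1))     # east neighbor
            if r + 1 < rows:
                links.append((node, node + cols))  # south neighbor
    return links


for size in (4, 6, 9, 16):
    print(size, mesh_shape(size), len(mesh_links(size)))
```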

  13. Benchmarks

  14. Experimental Results - Topology • Topology • Mesh • Point-to-point • Communication schedule graph: for each generated data item { discover hop edges; eliminate hop edges; insert point-to-point edges } then reschedule communication [Figure: mesh of processors 0 to 5 with per-processor switch routes such as S->In, Out->E, W->E, and W->S, and the corresponding point-to-point topology with direct routes such as 3->In and Out->5]
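
The per-data-item loop on this slide can be sketched as below. This is a minimal illustration under stated assumptions, not the SoftCoreMapper implementation: the mesh schedule is modeled as an ordered list of `(data_id, sender, receiver)` transfers, each data item has one producer and one consumer, and the final rescheduling step is not modeled. `eliminate_hops` is a hypothetical name.

```python
def eliminate_hops(schedule):
    """Collapse multi-hop mesh transfers into direct point-to-point edges.

    schedule: ordered list of (data_id, sender, receiver) mesh transfers.
    Returns one (data_id, producer, consumer) edge per data item.
    """
    # Discover hop edges: group the chain of transfers carrying each item.
    chains = {}
    for data_id, sender, receiver in schedule:
        chains.setdefault(data_id, []).append((sender, receiver))

    direct = []
    for data_id, hops in chains.items():
        # Eliminate the intermediate hops and insert a single edge from the
        # first sender (producer) to the last receiver (consumer).
        producer, consumer = hops[0][0], hops[-1][1]
        direct.append((data_id, producer, consumer))
    return direct


# Item "a" travels 3 -> 4 -> 5 on the mesh; "b" and "c" are already one hop.
mesh_schedule = [("a", 3, 4), ("a", 4, 5), ("b", 0, 3), ("c", 3, 4)]
p2p_schedule = eliminate_hops(mesh_schedule)
print(p2p_schedule)
```

In the mesh version processor 4 spends cycles forwarding item "a"; after elimination it no longer appears on that route, which is why the communication has to be rescheduled afterwards.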

  15. Layout of a 16-processor system on a Stratix II device [1] [Figure: chip layout with regions labeled 1 to 4] [1] Generated using the Quartus Chip Planner for Stratix II.

  16. Future Work • Evaluate the impact of StreamIt compiler optimizations on soft multiprocessor systems • Example: choice of partitioning (greedy, dynamic programming) • Evaluate the effect of increasing processor pipelining on soft multiprocessor systems • More aggressive processor optimizations • Application-specific on-chip memory size reduction • Target 32-64 soft processors on an FPGA with larger applications • Impact of optimizations on system power
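
To make the partitioning choice concrete, here is a generic greedy load-balancing sketch (largest work first, always onto the least-loaded processor). It is a stand-in chosen for illustration and is not the StreamIt compiler's greedy partitioner; the per-filter work estimates are invented for the example.

```python
import heapq


def greedy_partition(work, n_procs):
    """Assign filters to processors, heaviest first, onto the least-loaded one.

    work: dict mapping filter name to an estimated work amount (hypothetical).
    Returns (assignment, loads), where loads[p] is the total work on processor p.
    """
    heap = [(0, p) for p in range(n_procs)]  # (current load, processor id)
    heapq.heapify(heap)
    assignment = {}
    for name, cost in sorted(work.items(), key=lambda kv: -kv[1]):
        load, proc = heapq.heappop(heap)
        assignment[name] = proc
        heapq.heappush(heap, (load + cost, proc))

    loads = [0] * n_procs
    for name, proc in assignment.items():
        loads[proc] += work[name]
    return assignment, loads


# Made-up work estimates for the software radio filters.
work = {"AtoD": 2, "FMDemod": 6, "LPF": 8, "HPF": 8, "Adder": 3, "Speaker": 1}
assignment, loads = greedy_partition(work, 4)
print(assignment, loads)
```

A dynamic-programming partitioner would instead search over contiguous groupings of the stream graph, trading longer compile time for a more balanced result; comparing the two on soft multiprocessors is the evaluation this slide proposes.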

  17. Thesis Completion Timeline

  18. Conclusion • Fully automatic and scalable flow for design evaluation of large soft multiprocessor systems • Choice of interconnect topologies • Choice of multiple system-level optimizations • Diverse set of benchmarks • Preliminary results indicate 3-5x speedup for 16-processor systems and 27% area savings • Mesh and point-to-point interconnect topologies were evaluated • Work presented at FCCM 2009, Napa Valley, CA

  19. References
     [1] J. Cong, G. Han, W. Jiang, “Synthesis of an application-specific soft multiprocessor system,” In International Conference on Field Programmable Logic and Applications (FPL), 2007.
     [2] K. Ravindran, N. Satish, Y. Jin, K. Keutzer, “An FPGA-based soft multiprocessor system for IPv4 packet forwarding,” In International Conference on Field Programmable Logic and Applications (FPL), August 2005.
     [3] H. Nikolov, T. Stefanov, E. Deprettere, “Efficient automated synthesis, programming, and implementation of multi-processor platforms on FPGA chips,” In International Conference on Field Programmable Logic and Applications (FPL), August 2006.
     [4] M. I. Gordon, W. Thies, S. Amarasinghe, “Exploiting coarse-grained task, data, and pipeline parallelism in stream programs,” In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 2006.
     [5] P. Yiannacouras, J.G. Steffan, J. Rose, “Application-specific customization of soft processor microarchitecture,” In International Symposium on Field-Programmable Gate Arrays (FPGA), February 2006.
     [6] M. Taylor, “The Raw prototype design document,” Technical Report, Massachusetts Institute of Technology, 2005.
     [7] J.P. Derutin, L. Damez, A. Desportes, J.L.L. Galilea, “Design of a scalable network of communicating soft processors on FPGA,” In International Workshop on Computer Architecture for Machine Perception and Sensing (CAMPS), September 2006.

  20. References
     [8] O. Hebert, I.C. Kraljic, Y. Savaria, “A method to derive application-specific embedded processing cores,” In International Conference on Hardware Software Codesign (CODES), September 2000.
     [9] R. Dimond, O. Mencer, W. Luk, “CUSTARD - A customizable threaded FPGA soft processor and tools,” In International Conference on Field Programmable Logic and Applications (FPL), August 2007.
     [10] R.G. Dimond, O. Mencer, W. Luk, “Combining Instruction Coding and Scheduling to Optimize Energy in System-on-FPGA,” In IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), April 2006.
     [11] B. Fort, D. Capalija, Z. Vranesic, S. Brown, “A multithreaded soft processor for SoPC area reduction,” In IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), April 2006.
     [12] M. Labrecque, J.G. Steffan, “Improving pipelined soft processors with multithreading,” In International Conference on Field Programmable Logic and Applications (FPL), August 2007.
     [13] M.A.R. Saghir, M. El-Majzoub, P. Akl, “Datapath and ISA customization for soft VLIW processors,” In IEEE International Conference on Reconfigurable Computing and FPGAs (ReConFig), September 2006.
     [14] M. Saldana, L. Shannon, J.S. Yue, S. Bian, J. Craig, P. Chow, “Routability of Network Topologies in FPGAs,” In IEEE Transactions on Very Large Scale Integration (VLSI) Systems, March 2007.

  21. [Detailed flow diagram; blocks in reading order: StreamIt application; target number of processors; StreamIt compiler with RAW backend; tile code; switch code; SoftCoreMapper; interconnect topology; customizable processor templates; binary profiler; application-specific soft multiprocessor system generator; tile code for soft processors; SPREE; MIPS gcc compiler; Verilog multiprocessor system designs; Quartus compiler flow; memory initialization binaries; area, performance, and power evaluation]
