Parallel Computing Using FPGA (Field Programmable Gate Arrays) Studies in Parallel & Distributed Systems – 159.735 Sohaib Ahmed 15th May, 2009
Outline • FPGAs and their internal structures • Why use FPGAs for parallel computing? • Types of FPGAs • Application Examples • FPGAs in Parallel Computing • FPGA Limitations • Design Methods for FPGAs • Conclusion
FPGAs - Introduction • Ross Freeman, one of the founders of Xilinx (www.xilinx.com), invented the FPGA in the mid-1980s • Other vendors include Altera, Actel, Lattice Semiconductor and Atmel • FPGAs support the notion of reconfigurable computing • Reconfigurable Computing • Uses multiple reconfigurable devices (such as FPGAs) alongside multiple microprocessors • The processor(s) execute sequential and non-critical code, while the reconfigurable fabric (FPGAs) executes the code that can be mapped efficiently to hardware
FPGAs Internal Structure A semiconductor device consisting of : • Configurable Logic Blocks (CLBs) • Input/Output (I/O) Blocks (IOBs) • Static RAM (SRAM) Blocks • Digital Signal Processing Blocks (DSPBs)
Why use FPGAs? • Speed-up • Hardware is faster than software [1] • FPGAs can support thousand-fold parallelism, especially for low-precision computations • Cost • Development cost is much less than that of ASICs (application-specific integrated circuits) at lower volumes • Flexibility • FPGAs are more flexible than ASICs because they can be reprogrammed
Types of FPGAs • CPLDs (Complex Programmable Logic Devices) • Require voltage levels that are not usually present in computer systems • Anti-fuse-based devices • Can be programmed only once • Static-RAM-based devices • Can be reprogrammed while the device is running
Application Examples • Xilinx devices: Virtex-II Pro, Virtex-4 • Recent success of FPGAs in the Tsubame cluster in Tokyo, where they improved performance by an additional 25%
FPGAs in Parallel Computing • Dynamic matching of a node to the computational requirements of an application • Application-specific computers become more flexible • Enables support for multiple modes of parallel computing: MIMD, SIMD, etc. • Partial reconfiguration can allow better hardware resource utilization • Dynamic task allocation schemes can be extended to allow dynamic hardware allocation • Support for variable grain size
FPGA Limitations • Capacity • Logic blocks are a far less dense representation of computation than instructions • A conventional processor runs the 90% of the code that accounts for 10% of the execution time; reconfigurable logic takes the 10% of the code that accounts for 90% of the execution time • Tools • Compilers for reconfigurable logic are not very good • Some operations are hard to implement on FPGAs, such as random access and pointer-based data structures
Design Methods for FPGA [3] • Use an algorithm optimal for FPGAs • e.g., when searching biomedical databases for similar sequences, systolic arrays for correlation are efficient • Use a computing mode appropriate for FPGAs • Streaming, systolic, and arrays of fine-grained automata are preferable • Use appropriate FPGA structures • e.g., analyzing DNA or protein sequences with a straightforward systolic array
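As a software analogy of the systolic-array approach recommended above, the sketch below models a 1-D systolic correlator: each cell holds one template coefficient, samples shift through one cell per clock cycle, and on real hardware every cell's multiply-accumulate fires in the same cycle. The function name and data are illustrative choices, not from the source.

```python
def systolic_correlate(samples, taps):
    """Cycle-by-cycle software model of a 1-D systolic correlator.

    Each position in `shift` models one cell's sample register; on an
    FPGA every cell would perform its multiply-accumulate in the same
    clock cycle, so the inner sum costs one cycle, not len(taps) steps.
    """
    shift = [0] * len(taps)          # per-cell sample registers
    outputs = []
    for x in samples:
        shift = [x] + shift[:-1]     # samples advance one cell per cycle
        outputs.append(sum(s * t for s, t in zip(shift, taps)))
    return outputs

# Sliding correlation of a stream against a 2-tap template:
print(systolic_correlate([1, 2, 3, 4], [1, 1]))  # [1, 3, 5, 7]
```

In hardware this structure needs only nearest-neighbour wiring and no shared memory, which is why it maps so well onto an FPGA fabric.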
Design Methods for FPGA [3] • Living with Amdahl's Law • Speeding up an application significantly through an enhancement requires most of the application to be enhanced • The NAMD & ProtoMol framework was designed for computational experimentation • Hide latency of independent functions • Latency hiding is a basic technique for achieving high performance in parallel applications • Functions on the same chip can operate in parallel • Use rate-matching to remove bottlenecks • Function-level parallelism is built in
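The Amdahl's Law point can be made concrete with a short calculation (a sketch; the fractions below are illustrative, not measurements from NAMD or ProtoMol):

```python
def amdahl_speedup(enhanced_fraction, enhancement_speedup):
    """Overall speedup when `enhanced_fraction` of the runtime is
    accelerated by a factor of `enhancement_speedup` (Amdahl's Law)."""
    return 1.0 / ((1.0 - enhanced_fraction)
                  + enhanced_fraction / enhancement_speedup)

# Even a 100x FPGA kernel helps little if only half the runtime is offloaded:
print(round(amdahl_speedup(0.5, 100), 2))   # 1.98
# Offloading 90% of the runtime raises the ceiling to about 9.2x:
print(round(amdahl_speedup(0.9, 100), 2))   # 9.17
```

This is why the slide insists that most of the application must be enhanced: the unaccelerated fraction quickly dominates total runtime.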
Design Methods for FPGA [3] • Take advantage of FPGA-specific hardware • Hard-wired components such as integer multipliers and independently accessible BRAMs (Block RAMs) • The Xilinx VP100 has 400 independently accessible, 32-bit quad-ported BRAMs, which can help in achieving 20 terabytes per second at capacity • Use appropriate arithmetic precision • Use an appropriate arithmetic mode • Minimize use of high-cost arithmetic operations
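To illustrate the "appropriate arithmetic precision" advice, here is a minimal fixed-point quantization sketch (our own example, not from the source): narrower fixed-point operands mean smaller, faster multipliers on the FPGA, at the cost of a bounded rounding error.

```python
def to_fixed(x, frac_bits):
    """Quantize a real value to a fixed-point integer with `frac_bits`
    fractional bits (the kind of operand FPGA multipliers handle natively)."""
    return round(x * (1 << frac_bits))

def from_fixed(q, frac_bits):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)

for bits in (4, 8, 12):
    approx = from_fixed(to_fixed(3.14159, bits), bits)
    err = abs(approx - 3.14159)
    # Rounding error is bounded by half an LSB: 2**-(bits + 1)
    print(bits, approx, err <= 2 ** -(bits + 1))
```

Choosing the smallest width whose error bound the application can tolerate frees multipliers and BRAM ports for additional parallel datapaths.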
Current Progress in Hardware & Software • SRC-6 and SRC-7 are parallel architectures built around a crossbar switch that can be stacked for scalability • High-performance computing vendors such as Silicon Graphics Inc. (SGI), Cray and Linux Networx have incorporated FPGAs into their parallel architectures [4] • VHDL and Verilog are used to create hardware kernels • Other hardware description languages such as Carte C, Carte Fortran, Impulse C, Mitrion C and Handel-C are also used • Annapolis Micro Systems' CoreFire, Starbridge Systems' Viva, Xilinx System Generator and DSPlogic's Reconfigurable Computing Toolbox are high-level graphical programming development tools [5]
Conclusion Using FPGAs in parallel computing offers the following benefits: • Application acceleration • Flexibility in terms of application domain • Potential cost benefits over ASICs • The ability to exploit variable levels and modes of parallelism • More effective use of hardware resources
References [1] Todman, T.J., Constantinides, G.A., Wilton, S.J.E., Mencer, O., Luk, W. & Cheung, P.Y.K. (2005). Reconfigurable computing: architectures and design methods. [2] Altera Corporation White Paper (2007). Accelerating high performance computing with FPGAs. October 2007. [3] Herbordt, M.C., VanCourt, T., Yongfeng, G., Sukhwani, B., Conti, A., Model, J. & DiSabello, D. (2007). Achieving high performance with FPGA-based computing. IEEE Computer Society, March 2007. [4] Buell, D., El-Ghazawi, T., Gaj, K. & Kindratenko, V. (2007). High-performance reconfigurable computing. IEEE Computer Society, March 2007. [5] El-Ghazawi, T., El-Araby, E., Miaoqing Huang, Gaj, K., Kindratenko, V. & Buell, D. (2008). The promise of high-performance reconfigurable computing. IEEE Computer Society, February 2008, pp. 69-76. Any Questions? Thank You