Program Development Environments

Presentation Transcript


  1. Program Development Environments: Languages & Tools. Kris Gaj, George Mason University

  2. Acknowledgements Companies, centers, and sponsors • AMI • Cray • Mitrion • NCSA • SGI • SRC • Star Bridge • DoD/LUCITE

  3. Acknowledgements: GWU/GMU students • Esmail Chitalwala (GWU/Star Bridge) • Hatim Diab (GWU) • Esam El-Araby (GWU) • Miaoqing Huang (GWU) • Hoang Le (GMU) • Allen Michalski (GMU/USC) • Nandkishore Sastry (GMU) • Chang Shu (GMU) • Mohamed Taher (GWU) • Proshanta Saha (GWU)

  4. SRC Programming Model [Diagram: on the microprocessor, main.c (ANSI C) calls function_1() and function_2(). On the FPGA, function_1 and function_2 are written in MAP C (a subset of ANSI C) and call macros such as macro_1(a, b, c), macro_2(b, d), macro_2(c, e), macro_3(s, t), macro_1(n, b), and macro_4(t, k), drawn from libraries of macros (macro_1 ... macro_4) written in VHDL. The diagram also shows I/O on the FPGA side.]

  5. SRC Program Partitioning [Diagram: the μP system runs a C function for the μP (HLL); the FPGA system runs a C function for MAP (HLL) together with a VHDL macro (HDL).]

  6. SRC Compilation Process [Diagram: application sources for the μP (.c or .f files) pass through the μP Compiler to object files (.o files). Application sources for MAP (.mc or .mf files) pass through the MAP Compiler to HDL sources (.v files). Macro sources (.vhd or .v files) and the generated HDL go through logic synthesis to netlists (.ngo files), then Place & Route to configuration bitstreams (.bin files). The linker combines the .o files and .bin files into the application executable.]

  7. SRC Libraries of Hardware Macros Vendor libraries of hardware macros • basic integer and floating-point arithmetic • digital signal processing • User libraries of hardware macros • developed by GWU/GMU/USC 2002-2006 • Secret-key cipher encryption & breaking • Binary Galois Field arithmetic • (polynomial basis & normal basis representation) • Elliptic Curve Arithmetic • Long integer modular arithmetic (RSA) • Sorting • Image processing • Bioinformatics • See http://hpc.gwu.edu/library

  8. Star Bridge Programming Environment - Viva Star Sheets Library Object

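  9. Star Bridge Compilation Process [Diagram: user input enters through the graphical user interface of VIVA, which produces netlists (.ngo files); Xilinx Place & Route turns these into configuration bitstreams (.bin files). The diagram also shows the application executable.]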

  10. Cray XD1 Programming Flows [Diagram. High-level flow: MATLAB/Simulink (The MathWorks) with Xilinx System Generator, or Mitrion-C, for example int mask (a, m) { return (a & m); }, with Mitrion; high-level synthesis produces VHDL or Verilog. Standard flow: VHDL or Verilog, for example process (a, m) is begin z <= a and m; end process;, goes through VHDL/Verilog synthesis (Mentor Graphics, Synopsys, Synplicity, Xilinx) to gate-level EDIF, then Xilinx Place & Route to a bitstream.] Source: [Cray, MAPLD05]

  11. Xtreme DSP Design Flow

  12. HDL-based SGI Altix Programming Flow [Diagram. On an IA-32 Linux machine: Design Entry (Verilog, VHDL) produces .v and .vhd files; Design Verification covers Behavioral Simulation (VCS, Modelsim) and Static Timing Analysis (ISE Timing Analyzer); Design Synthesis (Synplify Pro, Amplify) produces .edf; Design Implementation (ISE) produces .ncd and .pcf; Metadata Processing (Python) produces .cfg; the final output is a .bin file, with design iterations looping back through the flow. On the Altix: Device Programming (RASC Abstraction Layer, Device Manager, Device Driver) and Real-time Verification (gdb), alongside the .c application.]

  13. HLL-based SGI Altix Programming Flow [Diagram. On an IA-32 Linux machine: HLL Design Entry (Handel-C, Mitrion C, Viva); RTL Generation and Integration with Core Services, producing .v and .vhd files; Design Verification covers Behavioral Simulation (VCS, Modelsim) and Static Timing Analysis (ISE Timing Analyzer); Design Synthesis (Synplify Pro, Amplify) produces .edf; Design Implementation (ISE) produces .ncd and .pcf; Metadata Processing (Python) produces .cfg; the final output is a .bin file. On the Altix: Device Programming (RASC Abstraction Layer, Device Manager, Device Driver) and Real-time Verification (gdb), alongside the .c application.]

  14. Mitrion-C Programming Model for Cray & SGI [Diagram: on the microprocessor, main.c (ANSI C based on the Mitrion API) calls function_1(in1), start_fpga(), function_1(in2), start_fpga(). On the FPGA side, the application code (Mitrion-C, platform independent) passes through the Mitrion Compiler & Configurator to VHDL and runs as an application on the Mitrion Distributed Processor Architecture (platform dependent), with RAM and input & output (I/O).]

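  15. Compiling a Mitrion Program [Diagram: Mitrion-C source code enters the Mitrion Software Development Kit. The Compiler produces processor machine code, which feeds the Simulator & Debugger and the Processor Configurator. The Processor Configurator uses the processor architecture to produce a processor HW design (VHDL IP core) for the FPGA.]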

  16. The Mitrion Platform 1) The Mitrion Virtual Processor • A fine-grain massively parallel, configurable soft-core processor • 10-30 times faster than traditional CPUs 2) The Mitrion-C programming language • An intrinsically parallel C-family language 3) The Mitrion Software Development Kit • Compiler • Debugger/Simulator • Processor configurator

  17. A New Processor Architecture Specifically For FPGAs
      Architecture design goal: • High silicon utilization • Take advantage of FPGA re-configurability
      Goal achieved by: • Allow processor to be massively parallel • Allow processor to be fully adapted to algorithm
      Mitrion-C example shown on the slide:

      int:48<30> main()
      {
        int:48 prev = 1;
        int:48 fib = 1;
        int:48<30> fibonnacci = for(i in <1..30>)
        {
          fib = fib+prev;
          prev = fib;
        } <>fib;
      } fibonnacci;

  18. Processor Architecture: A Cluster-On-A-Chip • Non-Von Neumann architecture • Processor architecture more like a cluster • Very Fine-Grain Parallelism • Normal clusters run a block of code on each PE (processing element) • Mitrion runs a single instruction on each PE • Each PE adapted to optimally run its instruction • Network topology specific for algorithm • No Instruction Stream, instead Data Stream

  19. A C-family Language • Basic syntax is the same as for other C-family languages • Examples: • Blocks are surrounded by { } • Assignment with = • Statements end with ; • if, for, while • Most of the usual C operators • C-style comments (though nestable)

  20. Types
      • Basic types: int / uint (signed / unsigned integer); boolean (boolean value, true/false); float (floating-point real value); bits (bit vector format)
      • Free bit width: int:24 (24-bit signed integer); uint:19 (19-bit unsigned integer); float:24.8 (IEEE-754 single precision float)
      • Collections: int:24[100] (vector, indexable collection); int:14<100> (list, no index)
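To make the free-bit-width idea concrete, here is a minimal C sketch that emulates storing a value in an N-bit field such as int:24 or uint:19. It assumes two's-complement wraparound on overflow, which the slides do not state for Mitrion-C, and the helper names wrap_unsigned and wrap_signed are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: keep only the low `bits` bits of v (1 <= bits <= 63),
 * emulating an unsigned field such as uint:19. */
uint64_t wrap_unsigned(uint64_t v, unsigned bits)
{
    return v & ((UINT64_C(1) << bits) - 1);
}

/* Hypothetical helper: interpret the low `bits` bits of v as a
 * two's-complement value, emulating a signed field such as int:24.
 * Wraparound on overflow is an assumption of this sketch, not documented
 * Mitrion-C behavior. */
int64_t wrap_signed(int64_t v, unsigned bits)
{
    uint64_t sign = UINT64_C(1) << (bits - 1);
    uint64_t low = wrap_unsigned((uint64_t)v, bits);
    /* XOR-then-subtract sign-extends the field to 64 bits. */
    return (int64_t)((low ^ sign) - sign);
}
```

For instance, wrap_signed(8388608, 24) yields -8388608, because 2^23 does not fit in a 24-bit signed field. On an FPGA a narrower field translates directly into less logic, which is the motivation for choosing widths per variable.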

  21. Language constructs Operators if(a>b) ... while(i<10) ... for(i in <0..999>) ... foreach (e in vector) ... int:8 function(int:8 a) ...

  22. A C-family Language • Important differences • No pointers • No dynamic allocation • Static general recursion only • Though loop structures may be dynamic

  23. Compiler, Simulator And Debugger

  24. HLL Program Entry for FPGA Accelerator Boards [Diagram labels: Graphical Data Flow Diagram; HDL; Traditional: Software, Hardware; Extended (e.g. Corefire): Software, Hardware. Arrows: increased productivity; increased capability to describe parallel execution.]

  25. HLL Program Entry for Reconfigurable Computers [Diagram labels: Graphical Data Flow Diagram; HDL; Star Bridge: Software (COM objects), Hardware (porting EDIF); SRC: Software, Hardware (HDL macros). Arrows: increased productivity; increased capability to describe parallel execution.]

  26. HLL Program Entry for Reconfigurable Computers [Diagram labels: Graphical Data Flow Diagram; HDL; Cray XD1 with Simulink: Software (Simulink), Hardware (Xilinx System Generator); SGI or Cray with Mitrion: Software (Mitrion Processor), Hardware (Mitrion-C). Arrows: increased productivity; increased capability to describe parallel execution.]

  27. General hierarchy of library files suggested by SRC Computers Inc.

  28. Structure of the SRC macro repository [Directory tree, top to bottom: <top of repository> / <macros> / <lib #1>, <lib #2>, <lib #3> / common, rev_d, rev_e, rev_f / macro1, macro2, macro3 / InfoFile, BlkBoxFile, DebugCodeFile, DataSheet, hdlfile]

  29. Files describing an SRC macro
      Platform independent:
      • HDL file (macro.v or macro.vh): Verilog or VHDL code defining the macro
      • Debug code file (macro.c): provides the equivalent C functionality for the macro
      • Data sheet file (datasheet): contains the documentation for the macro
      Platform dependent:
      • Black box file (blackbox.v): interface (black box) definition for the macro in Verilog
      • Info file (info): info file entry for this macro

  30. Library Development - SRC [Diagram: the application programmer works in HLL (C, Fortran) on both the μP system and the FPGA system; the library developer also works in HLL (C, Fortran), plus LLL (ASM) on the μP system and HDL (VHDL, Verilog) on the FPGA system.]

  31. Library Development - Star Bridge [Diagram: the application programmer works in GDF (Viva) on both the μP system and the FPGA system; the library developer also works in GDF (Viva), plus HLL and LLL (C++, ASM) on the μP system and HDL (VHDL, Verilog) on the FPGA system.]

  32. Software libraries and their role in the development of SRC libraries

  33. Roles of software libraries • source of test vectors for VHDL macros • emulation of hardware during debugging • performance comparison
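As an illustration of the first role, the sketch below uses a software reference function to generate one test vector line (inputs followed by the expected output) that an HDL testbench could read. The reference function reuses the a & m mask example from the Cray XD1 flow slide; the names mask_ref and format_vector and the hex line format are assumptions made for this sketch, not part of any SRC or Star Bridge tool.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical software reference model; it stands in for a real library
 * routine whose hardware macro is under test. */
uint64_t mask_ref(uint64_t a, uint64_t m)
{
    return a & m;
}

/* Format one test vector as three 16-digit hex words:
 * input a, input m, expected output. Returns snprintf's result, i.e. the
 * number of characters the full line needs (excluding the terminator). */
int format_vector(char *buf, size_t len, uint64_t a, uint64_t m)
{
    return snprintf(buf, len,
                    "%016" PRIx64 " %016" PRIx64 " %016" PRIx64,
                    a, m, mask_ref(a, m));
}
```

The same reference function can serve the other two roles: it can be called in place of the macro when debugging on the microprocessor, and timed to give a software baseline for performance comparison.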

  34. How to approach porting your application to reconfigurable computers? 1. Identify class of applications 2. Identify basic operations required by your applications 3. Determine the existence of the RC library of such operations 4. Determine the existence of the microprocessor library of such operations 5. Determine the right granularity for the required library operations

  35. Classes of applications 1. input/output intensive applications • bulk data encryption (DES, IDEA, and RC5 encryption) 2. computationally intensive applications • secret-key cipher breaking based on the exhaustive key search (DES, IDEA, RC5 breakers) • public-key cipher breaking based on factoring 3. latency-critical applications • cipher key agreement and signature (ECC schemes, RSA)

  36. Example 1 Cryptography: High-throughput encryption

  37. Cipher [Diagram: a message and a K-bit cryptographic key go in; ciphertext comes out.]

  38. Secret-key ciphers key of Alice and Bob - KAB key of Alice and Bob - KAB Network Decryption Encryption Bob Alice

  39. High-Throughput Encryption . . . . Mi+2 Mi+1 Mi K0 Encryption algorithms: DES, 3DES, AES, RC5, IDEA, etc. Encryption Ci+2 Ci+1 Ci

  40. Fully Pipelined Architecture • Loop unrolling • Pipeline stages inside of cipher rounds • New input & new output every clock cycle [Diagram: Round 1, Round 2, ..., Round k connected in sequence.]
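The benefit of "new input & new output every clock cycle" can be estimated with a simple cycle count. The sketch below is a back-of-the-envelope model, not a measurement: it assumes each block emerges a fixed number of cycles after it enters, and compares that with a unit that must finish one block before accepting the next. The function names are hypothetical.

```c
#include <stdint.h>

/* Cycles from first input to last output when a new block enters every
 * clock cycle and each block emerges `latency` cycles after it entered. */
uint64_t pipelined_cycles(uint64_t n_blocks, uint64_t latency)
{
    return n_blocks == 0 ? 0 : latency + (n_blocks - 1);
}

/* Same workload on a non-pipelined unit that accepts the next block only
 * after the previous one has finished. */
uint64_t iterative_cycles(uint64_t n_blocks, uint64_t latency)
{
    return n_blocks * latency;
}
```

With a latency of 17 cycles (the LATENCY value in the des.info slide later in this deck) and 1000 blocks, the pipelined model gives 1016 cycles against 17000 for the iterative one, so for long inputs the speedup approaches the pipeline depth.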

  41. Encryption on SRC-6 – No streaming: encryption.mc (1)

      #include <libmap.h>

      void encryption (uint64_t sdata[], uint64_t key,
                       uint64_t *hardware_timein,
                       uint64_t *hardware_timeprocess,
                       uint64_t *hardware_timeout, int mapnum)
      {
        OBM_BANK_A (S1OBM, uint64_t, MAX_OBM_SIZE)
        OBM_BANK_B (S2OBM, uint64_t, MAX_OBM_SIZE)
        OBM_BANK_C (S3OBM, uint64_t, MAX_OBM_SIZE)
        OBM_BANK_D (S4OBM, uint64_t, MAX_OBM_SIZE)
        OBM_BANK_E (S5OBM, uint64_t, MAX_OBM_SIZE)
        OBM_BANK_F (S6OBM, uint64_t, MAX_OBM_SIZE)

        uint32_t encrypt_decrypt; //0:encrypt 1:decrypt
        int i, nbytes;
        uint64_t t1,t2,t3,t4;

  42. Encryption on SRC-6 – No streaming: encryption.mc (2)

        encrypt_decrypt = 0;
        nbytes = MAX_OBM_SIZE * 8*3;

        start_timer();
        read_timer(&t1);

        DMA_CPU(CM2OBM, S1OBM, MAP_OBM_stripe(1,"A,B,C"), sdata, 1, nbytes, 0);
        wait_DMA(0);

        read_timer(&t2);

        for(i=0;i<MAX_OBM_SIZE;i++)
        {
          des (S1OBM[i], key, encrypt_decrypt, &S4OBM[i]);
          des (S2OBM[i], key, encrypt_decrypt, &S5OBM[i]);
          des (S3OBM[i], key, encrypt_decrypt, &S6OBM[i]);
        }

        read_timer(&t3);

  43. Encryption on SRC-6 – No streaming: encryption.mc (3)

        DMA_CPU(OBM2CM, S4OBM, MAP_OBM_stripe(1,"D,E,F"), sdata, 1, nbytes, 5);
        wait_DMA(5);

        read_timer(&t4);

        *hardware_timein = t2-t1;
        *hardware_timeprocess = t3-t2;
        *hardware_timeout = t4-t3;
      }
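The three timer differences returned by encryption() are tick counts, so turning them into a throughput figure needs the timer frequency, which the slides do not give. The sketch below is a hypothetical host-side helper that takes the frequency as a parameter; any concrete value passed for clock_hz is an assumption to be checked against the platform documentation.

```c
#include <stdint.h>

/* Hypothetical helper: throughput in megabytes per second (10^6 bytes/s)
 * for `nbytes` bytes processed in `ticks` timer ticks.
 * clock_hz is supplied by the caller; the slides do not state the timer
 * frequency, so any concrete value is an assumption. */
double throughput_mbytes_per_s(uint64_t nbytes, uint64_t ticks, double clock_hz)
{
    if (ticks == 0)
        return 0.0;               /* avoid dividing by zero */
    double seconds = (double)ticks / clock_hz;
    return (double)nbytes / seconds / 1e6;
}
```

Applied separately to hardware_timein, hardware_timeprocess, and hardware_timeout with the same nbytes, this shows whether the DMA transfers or the DES computation dominates, which is the question the streaming version of the code addresses.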

  44. Encryption on SRC-6 – No streaming: des_blkbx.v

      module des ( desOut, desIn, keyin, decrypt, clk )
        /* synthesis syn_black_box syn_noprune=1 */ ;
        output [63:0] desOut;
        input  [63:0] desIn;
        input  [63:0] keyin;
        input         decrypt;
        input         clk /* synthesis syn_noclockbuf=1 */ ;
      endmodule

  45. Encryption on SRC-6 – No streaming: des.info (1)

      BEGIN_DEF "des"
        MACRO = "des";
        LATENCY = 17;
        STATEFUL = NO;
        EXTERNAL = NO;
        PIPELINED = YES;
        INPUTS = 3:
          I0 = INT 64 BITS (desIn[63:0])
          I1 = INT 64 BITS (keyin[63:0])
          I2 = INT 32 BITS (decrypt)
          ;
        OUTPUTS = 1:
          O0 = INT 64 BITS (desOut[63:0])
          ;
        IN_SIGNAL : 1 BITS "clk" = "CLOCK";

  46. Encryption on SRC-6 – No streaming: des.info (2)

        DEBUG_HEADER = $
          void des__dbg (long long desin, long long keyin, int decrypt, long long *desout);
        $;
        DEBUG_FUNC = $
          #include <des.h>
          void des__dbg(long long desin, long long keyin, int decrypt, long long *desout)
          {
            des_(desout, &desin, &keyin, &decrypt);
          }
        $;
      END_DEF

  47. Encryption on SRC-6 – with streaming: encryption.mc (1)

      #include <libmap.h>

      void encryption (uint64_t sdata[], uint64_t key,
                       uint64_t *hardware_timeprocess,
                       uint64_t *hardware_timeout, int mapnum)
      {
        OBM_BANK_A (S1OBM, uint64_t, MAX_OBM_SIZE)
        OBM_BANK_B (S2OBM, uint64_t, MAX_OBM_SIZE)
        OBM_BANK_D (S4OBM, uint64_t, MAX_OBM_SIZE)
        OBM_BANK_E (S5OBM, uint64_t, MAX_OBM_SIZE)

        uint32_t encrypt_decrypt; //0:encrypt 1:decrypt
        int i, nbytes;
        uint64_t t1,t2,t3;
        Stream_64 S0, S1;
        uint64_t v0, v1;

        encrypt_decrypt = 0;
        nbytes = MAX_OBM_SIZE * 8*2;

  48. Encryption on SRC-6 – with streaming: encryption.mc (2)

        start_timer();
        read_timer(&t1);

        #pragma src parallel sections
        {
          #pragma src section
          {
            stream_dma_cpu_dual (&S0, &S1, PORT_TO_STREAM, S1OBM, DMA_A_B, sdata, 1, nbytes);
          }
          #pragma src section
          {
            for (i=0; i<MAX_OBM_SIZE; i++)
            {
              get_stream (&S0, &v0);
              get_stream (&S1, &v1);
              des (v0, key, encrypt_decrypt, &S4OBM[i]);
              des (v1, key, encrypt_decrypt, &S5OBM[i]);
            };
          }
        }
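The streaming version overlaps data transfer with computation: one parallel section feeds the streams while the other consumes them. The sketch below emulates that producer/consumer pattern in plain C with a small ring-buffer FIFO. It is a single-threaded model for illustration only; the fifo64 type, the FIFO depth, and the function names are hypothetical and are not the SRC stream API.

```c
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 8   /* hypothetical depth, chosen only for the sketch */

typedef struct {
    uint64_t slot[FIFO_DEPTH];
    size_t head, tail, count;
} fifo64;

/* Push one word; returns 0 (and stores nothing) when the FIFO is full. */
int fifo_put(fifo64 *f, uint64_t v)
{
    if (f->count == FIFO_DEPTH)
        return 0;
    f->slot[f->tail] = v;
    f->tail = (f->tail + 1) % FIFO_DEPTH;
    f->count++;
    return 1;
}

/* Pop one word into *v; returns 0 when the FIFO is empty. */
int fifo_get(fifo64 *f, uint64_t *v)
{
    if (f->count == 0)
        return 0;
    *v = f->slot[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return 1;
}

/* Interleave a producer (standing in for the DMA section) and a consumer
 * (standing in for the compute loop). Returns the number of words moved. */
size_t stream_copy(const uint64_t *src, uint64_t *dst, size_t n)
{
    fifo64 f = {0};
    size_t sent = 0, received = 0;
    while (received < n) {
        /* Producer step: try to push up to two words; a full FIFO stalls it. */
        for (int k = 0; k < 2; k++)
            if (sent < n && fifo_put(&f, src[sent]))
                sent++;
        /* Consumer step: pop one word if one is available. */
        if (fifo_get(&f, &dst[received]))
            received++;
    }
    return received;
}
```

The consumer starts working as soon as the first word arrives rather than waiting for the whole buffer, which is why the streaming code has no separate input-transfer time to report.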
