1 / 28

Progress Report

Presentation Transcript


  1. Progress Report FPGA-based Infrastructure Henry Chen henryic@ee.ucla.edu June 11, 2010

  2. Motivation • Architectural & algorithmic exploration/optimization • High-performance/high-throughput computation • Closed-loop test environment [1,2]

  3. Platform Architecture [3] • Large design effort; amortize widely • As general-purpose as possible • Large memories • High I/O bandwidth • Use embedded CPU to provide high-level interface to FPGA resources

  4. IBOB • IBOB (Interconnect Break-Out Board) • 1x Virtex-II Pro (FPGA + PowerPC405) • 2x 18Mb (36-bit) SRAMs (~250MHz) • 2x CX4 10Gb high-speed-serial • 2x Z-DOK+ high-speed differential GPIO (80 diff pairs) • 80x LCMOS/LVTTL GPIO • RS232 UART to PPC; major I/O bottleneck • read_xps/write_xps • Our primary test platform; have 2 in-house
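
  To put the "major I/O bottleneck" in perspective, here is a rough back-of-the-envelope comparison of the IBOB's RS232 control path against the 1GbE path ROACH provides. A minimal Python sketch; the 115200-baud / 8N1 UART settings are assumptions, not taken from these slides:

      # Rough throughput comparison for loading one test vector (illustrative only).
      # Assumes a typical 115200-baud RS232 link with 8N1 framing (10 bits per byte);
      # the actual UART settings on the IBOB are not stated in these slides.
      uart_baud = 115200
      uart_bytes_per_s = uart_baud / 10        # ~11.5 KB/s
      gige_bytes_per_s = 1e9 / 8               # 1GbE line rate, ~125 MB/s (ignores protocol overhead)

      vector_bytes = 4e6 / 8                   # ~4Mb test vector (slide 12)

      print(f"UART: {vector_bytes / uart_bytes_per_s:.1f} s per vector load")
      print(f"1GbE: {vector_bytes / gige_bytes_per_s * 1e3:.2f} ms per vector load (best case)")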

  5. ROACH • ROACH (Reconfigurable Open Architecture Compute Hardware) • 1x Virtex 5 FPGA • External PPC440 • 1x DDR2 DIMM • 2x 72Mbit (18-bit) QDR SRAMs (~350MHz) • 4x CX4 • 2x Z-DOK+ (80 diff pairs) • External PPC provides much faster interface to FPGA resources (1GbE) • None in-house (for now)

  6. BEE2 • BEE2 (Berkeley Emulation Engine 2) • 5x Virtex-II Pro • 20x DDR2 DRAM DIMMs (200MHz) • 18x CX4 ports • High-End Reconfigurable Computer • High I/O bandwidth per FPGA • High memory bandwidth per FPGA • High memory capacity per FPGA • Have one in-house

  7. BORPH [4] • Linux kernel modification for hardware abstraction; run on embedded CPU connected to FPGA • “Hardware process” • Programming an FPGA → running Linux executable • Some FPGA resources accessible in Linux process memory space • Makes FPGA board look just like Linux workstation • Used on BEE2, ROACH; limited version on IBOB w/ expansion board
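
  For concreteness, the "FPGA resources in Linux process memory space" idea can be poked at from user space roughly as follows. This is a minimal sketch assuming the /proc/<pid>/hw/ioreg/<register> file layout described in [4]; the pid and the register name "acc_len" are purely illustrative:

      # Read/write one software register of a BORPH "hardware process" from user space.
      # Assumes the /proc/<pid>/hw/ioreg/<register> layout described in [4]; the pid
      # and register name are made up for illustration.
      import struct

      pid = 1234                                  # pid of the running hardware process
      reg = f"/proc/{pid}/hw/ioreg/acc_len"       # one register exported by the FPGA design

      with open(reg, "wb") as f:                  # write a 32-bit value into the register
          f.write(struct.pack(">I", 1024))

      with open(reg, "rb") as f:                  # read it back like ordinary process state
          value, = struct.unpack(">I", f.read(4))
      print("acc_len =", value)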

  8. Design Environment • Simulink • Schematic-like • Integration w/ Matlab for analysis • Good for dataflow designs (i.e., DSP) • Designed by BWRC, now maintained by international collaboration • Tutorials aplenty! See wiki

  9. Design Environment • Xilinx System Generator for Simulink • Custom DSP and system blocksets • One-click design compilation

  10. Testing w/ ROACH + KATCP • Digital frontend receiver (Rashmi)

  11. [Test-setup block diagram: Matlab ↔ 1GbE ↔ PowerPC ↔ FPGA (BRAM, QDR SRAM) ↔ LVDS I/O ↔ ASIC test board / ASIC]

  12. Testing Requirements • High TX clock rate (400MHz target) • Beyond practical limits of IBOB’s V2P • Long test vectors (~4Mb) • Asynchronous clock domains for TX and RX

  13. Asynchronous Clock Domains • Easily supported by the FPGA hardware • XSG has very limited support for expressing multiple clocks; it relies on clock-enable (CE) toggling instead • Further restricted by the bee_xps tool automation, which assumes a single-clock design (though many different clock sources are available)
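
  The CE-toggling scheme can be pictured behaviorally: everything is driven by one fast clock, and "slower" logic only updates on cycles where its clock enable is asserted. A plain-Python illustration (not XSG/HDL), which also shows why only integer-ratio derived rates can be expressed this way:

      # Behavioral model of clock-enable (CE) toggling: one fast clock, a CE pulse
      # every CE_DIVIDE cycles, so the "slow" register updates at fast_clk / CE_DIVIDE.
      FAST_CYCLES = 12
      CE_DIVIDE = 3

      fast_count = 0
      slow_count = 0
      for cycle in range(FAST_CYCLES):
          fast_count += 1                   # fast-domain register: updates every cycle
          ce = (cycle % CE_DIVIDE == 0)     # CE pulse, one fast cycle wide
          if ce:
              slow_count += 1               # "slow-domain" register: updates only on CE
      print(fast_count, slow_count)         # 12 vs. 4 updates

  Truly asynchronous TX and RX clocks have no such integer relationship, which is why they had to be handled outside the single-clock bee_xps flow (next slide).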

  14. Asynchronous Clock Domains • Manually merged separate designs for the test-vector and readback datapaths • Fixed 60MHz RX • 255-315MHz TX

  15. Results • Test up to 315MHz w/ loadable vectors in QDR; up to 340MHz with pre-compiled vectors in ROMs • 55dB SNR @ 20MHz bandwidth

  16. Limitations • DDR output FF critical path @ 340MHz (clock out) • QDR SRAM bus interface critical path @ 315MHz • Output clock jitter? • LVDS receivers usually only 400-500Mbps • OK for data, not good for faster clocks • Get LVDS I/O cells?

  17. Future Design Recommendations • Send source-synchronous clock with returned data • Send synchronization information with returned data • “Vector warning” or frame start • Data valid

  18. KATCP • Communication protocol for interfacing to BORPH • Can be carried over a plain TCP (telnet-style) connection • Libraries and clients for C, Python
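
  On the wire, KATCP is a line-oriented text protocol: requests start with '?', asynchronous informs with '#', and the final reply with '!'. A raw-socket sketch (no katcp library); the hostname is made up, and port 7147 is assumed as the conventional KATCP port:

      # Minimal KATCP exchange over a plain TCP socket (sketch).
      import socket

      s = socket.create_connection(("roach.example.edu", 7147), timeout=5.0)
      s.sendall(b"?listdev\n")             # ask the server to list the FPGA's registers

      buf = b""
      while b"!listdev" not in buf:        # read until the '!listdev ...' reply line arrives
          chunk = s.recv(4096)
          if not chunk:
              break                        # server closed the connection
          buf += chunk
      s.close()

      for line in buf.decode().splitlines():
          print(line)                      # '#listdev <regname>' informs, then '!listdev ok'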

  19. KATCP Matlab Client • For our purposes, replaces read_xps/write_xps • Can program the FPGA directly from Matlab; no more JTAG cable! • Provides byte-level read/write granularity • Increases speed from ~KB/s to ~MB/s • Room for improvement; currently high protocol overhead

  20. Towards Streaming • Transition to TCP/IP-based protocols facilitates streaming • OSort test vectors: ~10Mb of data at ~Mb/s (IBOB) • Single-vector load and read via SRAM • lwIP UDP read_xps/write_xps • Ethernet streaming w/o going through shared memory
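
  A host-side receiver for that kind of UDP stream can be very small. This is a sketch only: the port number and the 4-byte sequence-number header are assumptions for illustration, not the actual lwIP packet format used on the IBOB:

      # Reassemble ~10Mb of UDP-streamed test-vector data on the host.
      # Port 50000 and the 4-byte big-endian sequence header are assumed, not
      # taken from the slides; payload size here is 1000 bytes per packet.
      import socket, struct

      sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
      sock.bind(("", 50000))

      EXPECTED_PACKETS = 1250              # 1250 x 1000 B = 1.25 MB = 10 Mb
      chunks = {}
      while len(chunks) < EXPECTED_PACKETS:
          pkt, _ = sock.recvfrom(2048)
          seq, = struct.unpack(">I", pkt[:4])   # assumed sequence-number header
          chunks[seq] = pkt[4:]                 # payload, keyed by sequence number

      data = b"".join(chunks[i] for i in sorted(chunks))
      print(f"received {len(data)} bytes")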

  21. New Windows Server(s) • dsp experiencing severe stability problems • eecls-{1, 2, 3, 4}.ee.ucla.edu • Windows Server 2008 (32-bit) • Matlab R2007b (+ XSG 10.1) • Matlab R2009b (+ XSG 11.5, Synphony 2009.12) • Xilinx Suite 10.1 • Xilinx Suite 11.5 • ModelSim 6.6a • Synplify 2010.03 • sherwin is now a print server

  22. References
  [1] Marković, D., et al., “ASIC Design and Verification in an FPGA Environment,” IEEE CICC, 2007.
  [2] Marković, D., UCLA EEM216A, Fall 2008, Lecture 20.
  [3] Chang, C., et al., “BEE2: A High-End Reconfigurable Computing System,” IEEE Design & Test of Computers, 2005.
  [4] So, H., and Brodersen, R., “A Unified Hardware/Software Runtime Environment for FPGA-Based Reconfigurable Computers using BORPH,” ACM TECS, 2008.
