
FPGAs K. Elliott Fleming Computer Science & Artificial Intelligence Lab


Presentation Transcript


  1. FPGAs
     K. Elliott Fleming
     Computer Science & Artificial Intelligence Lab
     Massachusetts Institute of Technology
     http://csg.csail.mit.edu/6.375

  2. FPGA: A Sea of Resources
     • Logic Blocks
     • PLL
     • Processor
     • Multiplier
     • SRAM
     • I/O Pads
     • Clock Buffers

  3. What can we build?
     • Very complex systems

  4. Logic Block: Building functionality
     [Figure labels: Carry In, Look-up Table, +, Reg, Combinational Input, Combinational Output, Muxing Logic, Carry Out]

  5. Slice: Look-up Table
     • Arbitrary Logic
     • Program flip-flops
     • Use inputs to select
     • Can we make a ROM?
     • Can we make a RAM?
     • Just add enable logic
     [Figure labels: FF (×16), Combinational Input, Enable Demux, Muxing Logic, Combinational Output]
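The look-up table idea on this slide can be sketched in software. The following is a minimal model, not the actual slice circuit: the class name `LookupTable`, the helper `program_from_function`, and the bit ordering (first input is the least significant address bit) are all assumptions made for illustration. Stored bits stand in for the programmed flip-flops, `read` shows the ROM behaviour (inputs select a stored bit), and `write` shows how an enable signal turns the same storage into a small RAM.

```python
class LookupTable:
    """Software model of a k-input LUT: 2**k stored bits selected by the inputs."""

    def __init__(self, num_inputs, truth_table=None):
        self.num_inputs = num_inputs
        size = 1 << num_inputs
        # The stored bits play the role of the programmed flip-flops.
        self.bits = list(truth_table) if truth_table is not None else [0] * size
        if len(self.bits) != size:
            raise ValueError("truth table must have 2**num_inputs entries")

    def _index(self, inputs):
        # Pack the input bits into an address; inputs[0] is the least significant bit.
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        return sum((bit & 1) << i for i, bit in enumerate(inputs))

    def read(self, inputs):
        """ROM behaviour: the inputs select one stored bit."""
        return self.bits[self._index(inputs)]

    def write(self, inputs, value, enable):
        """RAM behaviour: the addressed bit is overwritten only when enable is set."""
        if enable:
            self.bits[self._index(inputs)] = value & 1


def program_from_function(num_inputs, fn):
    """Fill a LUT so that it computes fn for every input combination."""
    table = []
    for address in range(1 << num_inputs):
        inputs = [(address >> i) & 1 for i in range(num_inputs)]
        table.append(fn(*inputs) & 1)
    return LookupTable(num_inputs, table)


# Example: program a 3-input majority function, then query it.
majority = program_from_function(3, lambda a, b, c: (a & b) | (a & c) | (b & c))
print(majority.read([1, 1, 0]), majority.read([1, 0, 0]))
```

A table with four inputs would hold sixteen bits, which matches the sixteen flip-flops drawn in the slide's figure.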

  6. Reconfigurable Wiring
     • 2D Mesh Grid
     • Local connections made by driving powerful transistors
     • Switches route across dimensions
     • Heterogeneous wire length
     • Many wires to nearby cells
     • Few long-length wires
     [Figure labels: Logic Block, Switch]

  7. SMIPS System

  8. SMIPS Infrastructure

  9. SMIPS Infrastructure
     • Bus Interface Logic
     • Avalon Master/Slave
     • Cbus Devices
     • mkCBusWideRegRW(addr,reg);
     • Many interfaces (Get, RegFile, etc.)
     • Mechanism for building memory map automatically
     • Some C drivers included

  10. Demonstration
      • Synplify Pro
      • Quartus II
      • Nios-II IDE

  11. Cryptosort: Think Different
      • Large (0.5 GB) encrypted database
      • Decrypt Database
      • Sort Database on key
      • Encrypt Database
      • Do it fast, on an FPGA
      • Design principles differ from ASIC
      • Must be aware of FPGA hardware
      • Joint with Myron King, Man Cheuk Ng

  12. From Problem: DRAM Cryptosorter
      • Encrypted Records in External Memory
      • Sort Records in Ascending Order
      • Decrypt Database with AES
      • Encrypt Sorted Records with AES

  13. Cryptosort Architecture
      [Figure labels: PLB, PPC, PLB Master, DRAM, Feeder, Function Unit: Sort Tree]
      • Use Merge Sort, O(n log(n))

  14. Engineering the Merge Tree
      • Probably optimal for ASIC
      • Easy to parameterize and build tree
      • Each level merges 2n streams into n streams
      [Figure: tree of seven comparators (<) in three levels: 8 to 4, 4 to 2, 2 to 1]
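The level structure of the merge tree (8 to 4, 4 to 2, 2 to 1) can be illustrated with a short software sketch. This is only a functional model under simple assumptions: streams are Python lists that are already sorted in ascending order, and the function names `merge_two`, `merge_level`, and `merge_tree` are hypothetical. It says nothing about timing or resource usage on the FPGA.

```python
def merge_two(left, right):
    """2-to-1 merge of two ascending lists, one comparison per output element."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    # One side is exhausted; the remainder of the other is already sorted.
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def merge_level(streams):
    """One tree level: pair up 2n sorted streams and merge them into n streams."""
    if len(streams) % 2 != 0:
        raise ValueError("a level needs an even number of input streams")
    return [merge_two(streams[k], streams[k + 1]) for k in range(0, len(streams), 2)]


def merge_tree(streams):
    """Apply levels until a single sorted stream remains."""
    count = len(streams)
    if count == 0 or count & (count - 1) != 0:
        raise ValueError("number of input streams must be a power of two")
    while len(streams) > 1:
        streams = merge_level(streams)
    return streams[0]


# Example: eight sorted input streams pass through three levels.
leaves = [[1, 9], [2, 8], [3, 7], [4, 6], [0, 5], [10, 11], [12], [13]]
print(merge_tree(leaves))
```

Because each level is the same function applied to half as many streams, the tree depth is easy to parameterize by the number of leaves.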

  15. Refining the Module
      Naïve implementation: exponential resource usage
      • Each comparator takes 3% of slices
      • At most, fit 3 levels
      Key observation:
      • Throughput is rate-limited by final 2-to-1 merge step
      • This means each level only needs to perform one comparison per cycle

  16. Sharing the Comparator: Idea
      Loop:
      • Choose non-empty input pair corresponding to output fifo with room (scheduling)
      • Compare the fifo heads
      • Dequeue the smaller one and put it on output fifo
      We save area by having one comparator per level
      But we introduce a comparator scheduling problem
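The loop on this slide can be sketched as a software model of a single tree level that owns one comparator. Several details here are assumptions, not taken from the slides: the round-robin scheduling policy, the fixed output capacity, the names `step_level` and `run_level`, and the simplification that every input FIFO is fully loaded up front, so an empty FIFO means its stream is exhausted.

```python
from collections import deque


def step_level(inputs, outputs, capacity, start=0):
    """Use the level's single comparator at most once.

    inputs:  2n deques, each sorted ascending
    outputs: n deques; outputs[p] is fed by inputs[2p] and inputs[2p + 1]
    Returns the index of the pair that was served, or None if none was ready.
    """
    n = len(outputs)
    for offset in range(n):
        p = (start + offset) % n            # round-robin choice (assumed policy)
        a, b = inputs[2 * p], inputs[2 * p + 1]
        if len(outputs[p]) >= capacity:     # output fifo has no room
            continue
        if not a and not b:                 # input pair is empty
            continue
        # Compare the fifo heads; an empty side counts as exhausted.
        if not b or (a and a[0] <= b[0]):
            outputs[p].append(a.popleft())
        else:
            outputs[p].append(b.popleft())
        return p
    return None


def run_level(inputs, capacity=2):
    """Drive one level to completion, draining outputs when the scheduler stalls."""
    n = len(inputs) // 2
    outputs = [deque() for _ in range(n)]
    drained = [[] for _ in range(n)]
    start = 0
    while True:
        served = step_level(inputs, outputs, capacity, start)
        if served is None:
            if not any(outputs):
                break                       # nothing ready and nothing buffered
            for p in range(n):              # stand-in for the next level consuming
                drained[p].extend(outputs[p])
                outputs[p].clear()
        else:
            start = (served + 1) % n
    return drained


fifos = [deque([1, 4]), deque([2, 3]), deque([5]), deque([0, 6])]
print(run_level(fifos))
```

Each call to `step_level` performs one comparison for the whole level, which is the area saving the slide describes; the search for a ready pair is the scheduling problem it mentions.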

  17. Sharing the Comparator: Physical Implementation Issues
      • Not enough regs
      • Each BRAM contains multiple FIFOs
      • Aggressive clock
      • Single cycle scheduling is impossible
      • Enq happens several cycles after scheduling
      • Credit based flow control
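Credit based flow control addresses the gap between scheduling and the later enqueue: the scheduler takes a credit when it commits to a transfer, so the FIFO slot is already reserved when the data arrives several cycles later. The sketch below is a generic illustration of that idea, not the design from the slides; the class `CreditedFifo`, the function `simulate`, the fixed latency, and the consumer that drains on odd cycles are all assumptions.

```python
from collections import deque


class CreditedFifo:
    """FIFO whose free slots are reserved with credits at scheduling time."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.credits = capacity     # free slots not yet promised to a transfer
        self.items = deque()

    def try_reserve(self):
        """Scheduler side: take one credit if a slot is still unpromised."""
        if self.credits == 0:
            return False
        self.credits -= 1
        return True

    def enq(self, value):
        """Arrives some cycles after scheduling; the slot was already reserved."""
        assert len(self.items) < self.capacity, "enq without a reserved slot"
        self.items.append(value)

    def deq(self):
        """Consumer side: remove an item and return its credit."""
        value = self.items.popleft()
        self.credits += 1
        return value


def simulate(values, capacity=2, latency=3):
    """Schedule values into the FIFO with a fixed delay before each enq."""
    fifo = CreditedFifo(capacity)
    pending = deque(values)
    in_flight = deque()             # (arrival_cycle, value) for scheduled transfers
    received = []
    max_occupancy = 0
    cycle = 0
    while pending or in_flight or fifo.items:
        if in_flight and in_flight[0][0] <= cycle:
            fifo.enq(in_flight.popleft()[1])
        if pending and fifo.try_reserve():
            in_flight.append((cycle + latency, pending.popleft()))
        max_occupancy = max(max_occupancy, len(fifo.items))
        if cycle % 2 == 1 and fifo.items:   # assumed consumer: drains on odd cycles
            received.append(fifo.deq())
        cycle += 1
    return received, max_occupancy


result, peak = simulate(list(range(10)))
print(result, peak)
```

The invariant is that credits, in-flight transfers, and stored items always sum to the capacity, so a delayed enqueue can never overflow the FIFO.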

  18. Layout
