Introduction to FPGA Devices ECE 645 – Computer Arithmetic
World of Integrated Circuits
• Integrated Circuits: Full-Custom ASICs, Semi-Custom ASICs, User Programmable
• User Programmable: PLD, FPGA
• PLD: PAL, PLA, PML
• FPGA: LUT (Look-Up Table), MUX, Gates
Two competing implementation approaches
ASIC (Application Specific Integrated Circuit)
• designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry
• designed all the way from behavioral description to physical layout
FPGA (Field Programmable Gate Array)
• bought off the shelf and reconfigured by designers themselves
• no physical layout design; design ends with a bitstream used to configure a device
What is an FPGA?
[Figure: FPGA diagram labeled with Configurable Logic Blocks, I/O Blocks, and Block RAMs]
Which Way to Go?
ASICs
• High performance
• Low power
• Low cost in high volumes
FPGAs
• Off-the-shelf
• Low development cost
• Short time to market
• Reconfigurability
Other FPGA Advantages
• The manufacturing cycle for an ASIC is very costly and lengthy, and engages a lot of manpower
• Mistakes not detected at design time have a large impact on development time and cost
• FPGAs are perfect for rapid prototyping of digital circuits
• Easy upgrades, as in the case of software
• Unique applications
  • reconfigurable computing
Major FPGA Vendors
SRAM-based FPGAs
• Xilinx, Inc.
• Altera Corp.
• Atmel
• Lattice Semiconductor
Flash & antifuse FPGAs
• Actel Corp.
• QuickLogic Corp.
Share over 60% of the market
Xilinx Programmable Logic Devices
• Primary products: FPGAs and the associated CAD software
• Main headquarters in San Jose, CA
• Fabless* Semiconductor and Software Company
  • UMC (Taiwan) {*Xilinx acquired an equity stake in UMC in 1996}
  • Seiko Epson (Japan)
  • TSMC (Taiwan)
ISE Alliance and Foundation Series Design Software
Xilinx FPGA Families
• Old families
  • XC3000, XC4000, XC5200
  • Old 0.5µm, 0.35µm and 0.25µm technology. Not recommended for modern designs.
• High-performance families
  • Virtex (0.22µm)
  • Virtex-E, Virtex-EM (0.18µm)
  • Virtex-II, Virtex-II PRO (0.13µm)
  • Virtex-4 (0.09µm)
• Low Cost Family
  • Spartan/XL – derived from XC4000
  • Spartan-II – derived from Virtex
  • Spartan-IIE – derived from Virtex-E
  • Spartan-3
Xilinx FPGA Block Diagram
CLB Structure
CLB Slice Structure
• Each slice contains two sets of the following:
  • Four-input LUT
    • Any 4-input logic function,
    • or 16-bit x 1 sync RAM
    • or 16-bit shift register
  • Carry & Control
    • Fast arithmetic logic
    • Multiplier logic
    • Multiplexer logic
  • Storage element
    • Latch or flip-flop
    • Set and reset
    • True or inverted inputs
    • Sync. or async. control
LUT (Look-Up Table) Functionality
• Look-Up Tables are the primary elements for logic implementation
• Each LUT can implement any function of 4 inputs
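To make the "any function of 4 inputs" claim concrete, here is a minimal behavioral sketch of a 4-input LUT. The entity name LUT4_MODEL, its port names, and the default INIT value are assumptions chosen for illustration; this is not a vendor primitive. The 16-bit INIT generic holds the truth table, and the four inputs simply select one of its bits.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- Hypothetical entity: behavioral model of one 4-input LUT.
entity LUT4_MODEL is
    generic(
        -- Truth table: bit i is the output for input value i.
        -- X"8000" sets only bit 15, i.e. a 4-input AND.
        INIT : STD_LOGIC_VECTOR(15 downto 0) := X"8000"
    );
    port(
        I : in  STD_LOGIC_VECTOR(3 downto 0);
        O : out STD_LOGIC
    );
end LUT4_MODEL;

architecture BEHAVIORAL of LUT4_MODEL is
begin
    -- The four inputs form an address into the 16-entry table.
    O <= INIT(to_integer(unsigned(I)));
end BEHAVIORAL;
```

Changing INIT changes the implemented function without changing the structure: X"6996" would give a 4-input XOR, for example.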
5-Input Functions implemented using two LUTs
• One CLB Slice can implement any function of 5 inputs
• Logic function is partitioned between two LUTs
• F5 multiplexer selects LUT
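The partitioning of a 5-input function between two LUTs can be sketched behaviorally as follows. The entity name FUNC5_MODEL, the signal names, and the 32-bit INIT generic are assumptions for illustration. The lower 16 bits of the truth table play the role of one LUT, the upper 16 bits the other, and the fifth input drives the selecting multiplexer.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- Hypothetical entity: 5-input function built from two 16-entry tables.
entity FUNC5_MODEL is
    generic(
        -- 32-entry truth table; X"80000000" is a 5-input AND.
        INIT : STD_LOGIC_VECTOR(31 downto 0) := X"80000000"
    );
    port(
        I : in  STD_LOGIC_VECTOR(4 downto 0);
        O : out STD_LOGIC
    );
end FUNC5_MODEL;

architecture BEHAVIORAL of FUNC5_MODEL is
    signal LUT_LO : STD_LOGIC;  -- function value when I(4) = '0'
    signal LUT_HI : STD_LOGIC;  -- function value when I(4) = '1'
begin
    -- Each half of the table is addressed by the four low inputs.
    LUT_LO <= INIT(to_integer(unsigned(I(3 downto 0))));
    LUT_HI <= INIT(16 + to_integer(unsigned(I(3 downto 0))));

    -- Multiplexer controlled by the fifth input selects one LUT output.
    O <= LUT_HI when I(4) = '1' else LUT_LO;
end BEHAVIORAL;
```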
Distributed RAM
• CLB LUT configurable as Distributed RAM
  • A LUT equals 16x1 RAM
  • Implements Single and Dual-Ports
  • Cascade LUTs to increase RAM size
• Synchronous write
• Synchronous/Asynchronous read
  • Accompanying flip-flops used for synchronous read
[Figure: one LUT configured as RAM16X1S (D, WE, WCLK, A0 to A3, O); two LUTs configured as RAM32X1S (D, WE, WCLK, A0 to A4, O), or RAM16X2S (D0, D1, WE, WCLK, A0 to A3, O0, O1), or RAM16X1D (D, WE, WCLK, A0 to A3, SPO, DPRA0 to DPRA3, DPO)]
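The synchronous-write, asynchronous-read behavior listed above can be described behaviorally, as in the sketch below. The entity name RAM_16X1_INFERRED is an assumption for illustration. Whether a given synthesis tool maps this description onto a LUT-based RAM depends on the tool and its settings.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- Hypothetical entity: behavioral 16 x 1 RAM.
entity RAM_16X1_INFERRED is
    port(
        CLK      : in  STD_LOGIC;
        WE       : in  STD_LOGIC;
        ADDR     : in  STD_LOGIC_VECTOR(3 downto 0);
        DATA_IN  : in  STD_LOGIC;
        DATA_OUT : out STD_LOGIC
    );
end RAM_16X1_INFERRED;

architecture BEHAVIORAL of RAM_16X1_INFERRED is
    -- Sixteen one-bit memory cells, cleared at start-up.
    signal MEM : STD_LOGIC_VECTOR(15 downto 0) := (others => '0');
begin
    -- Synchronous write: data is stored on the rising clock edge.
    process(CLK)
    begin
        if rising_edge(CLK) then
            if WE = '1' then
                MEM(to_integer(unsigned(ADDR))) <= DATA_IN;
            end if;
        end if;
    end process;

    -- Asynchronous read: the output follows the address without a clock.
    DATA_OUT <= MEM(to_integer(unsigned(ADDR)));
end BEHAVIORAL;
```

Registering DATA_OUT inside the clocked process would give the synchronous-read variant, which uses the accompanying flip-flop.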
Shift Register
• Each LUT can be configured as a shift register
  • Serial in, serial out
• Dynamically addressable delay up to 16 cycles
• For programmable pipeline
• Cascade for greater cycle delays
• Use CLB flip-flops to add depth
[Figure: LUT drawn as a chain of D flip-flops with inputs IN, CE, CLK and DEPTH[3:0], and output OUT]
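A behavioral sketch of such a dynamically addressable shift register is given below. The entity name SRL16_MODEL is an assumption, and the data ports are called D and Q because IN and OUT are reserved words in VHDL. With DEPTH = 0 the delay is one clock cycle, and with DEPTH = 15 it is 16 cycles.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- Hypothetical entity: 16-stage shift register with addressable tap.
entity SRL16_MODEL is
    port(
        CLK   : in  STD_LOGIC;
        CE    : in  STD_LOGIC;
        D     : in  STD_LOGIC;                     -- serial input
        DEPTH : in  STD_LOGIC_VECTOR(3 downto 0);  -- selected tap
        Q     : out STD_LOGIC                      -- serial output
    );
end SRL16_MODEL;

architecture BEHAVIORAL of SRL16_MODEL is
    signal SR : STD_LOGIC_VECTOR(15 downto 0) := (others => '0');
begin
    -- Shift by one position on each enabled rising clock edge.
    process(CLK)
    begin
        if rising_edge(CLK) then
            if CE = '1' then
                SR <= SR(14 downto 0) & D;
            end if;
        end if;
    end process;

    -- Tap selection is combinational: SR(n) holds the input
    -- sampled n + 1 enabled clock edges ago.
    Q <= SR(to_integer(unsigned(DEPTH)));
end BEHAVIORAL;
```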
Shift Register
• Register-rich FPGA
  • Allows for addition of pipeline stages to increase throughput
• Data paths must be balanced to keep desired functionality
[Figure: two 64-bit data paths. Operation A (4 cycles) followed by Operation B (8 cycles) takes 12 cycles; Operation C takes 3 cycles, leaving a 9-cycle imbalance]
Carry & Control Logic
[Figure: SLICE containing two Look-Up Tables (inputs G4 to G1 and F4 to F1), each followed by a Carry & Control Logic block and a storage element (D, CK, EC, S, R, Q); slice signals COUT, CIN, YB, Y, XB, X, F5IN, BY, SR, CLK, CE]
Fast Carry Logic
• Each CLB contains separate logic and routing for the fast generation of sum & carry signals
  • Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
• Carry logic is independent of normal logic and routing resources
[Figure: carry logic routing running from LSB to MSB]
Accessing Carry Logic
• All major synthesis tools can infer carry logic for arithmetic functions
  • Addition (SUM <= A + B)
  • Subtraction (DIFF <= A - B)
  • Comparators (if A < B then…)
  • Counters (count <= count +1)
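The four cases listed above can be collected in one small design, sketched below. The entity name CARRY_EXAMPLES, the 8-bit widths, and the port names are assumptions for illustration. The code only uses the arithmetic operators; whether carry logic is actually used is decided by the synthesis tool.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- Hypothetical entity: adder, subtractor, comparator and counter.
entity CARRY_EXAMPLES is
    port(
        CLK    : in  STD_LOGIC;
        A      : in  STD_LOGIC_VECTOR(7 downto 0);
        B      : in  STD_LOGIC_VECTOR(7 downto 0);
        SUM    : out STD_LOGIC_VECTOR(7 downto 0);
        DIFF   : out STD_LOGIC_VECTOR(7 downto 0);
        A_LT_B : out STD_LOGIC;
        COUNT  : out STD_LOGIC_VECTOR(7 downto 0)
    );
end CARRY_EXAMPLES;

architecture BEHAVIORAL of CARRY_EXAMPLES is
    signal COUNT_REG : unsigned(7 downto 0) := (others => '0');
begin
    -- Addition and subtraction; results wrap modulo 256.
    SUM  <= std_logic_vector(unsigned(A) + unsigned(B));
    DIFF <= std_logic_vector(unsigned(A) - unsigned(B));

    -- Unsigned comparison.
    A_LT_B <= '1' when unsigned(A) < unsigned(B) else '0';

    -- Free-running counter, incremented on every rising clock edge.
    process(CLK)
    begin
        if rising_edge(CLK) then
            COUNT_REG <= COUNT_REG + 1;
        end if;
    end process;

    COUNT <= std_logic_vector(COUNT_REG);
end BEHAVIORAL;
```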
Block RAM
• Most efficient memory implementation
  • Dedicated blocks of memory
• Ideal for most memory requirements
  • 4 to 104 memory blocks
  • 18 kbits = 18,432 bits per block
  • Use multiple blocks for larger memories
• Builds both single and true dual-port RAMs
[Figure: Spartan-II True Dual-Port Block RAM with Port A and Port B]
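A single-port memory sized to fill one 18,432-bit block (1024 words of 18 bits) can be sketched behaviorally as below. The entity name RAM_1KX18_SYNC and its ports are assumptions for illustration. The read is registered, which is the style synthesis tools commonly associate with dedicated memory blocks, but the final mapping depends on the tool.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- Hypothetical entity: 1024 x 18 single-port RAM with synchronous read.
entity RAM_1KX18_SYNC is
    port(
        CLK      : in  STD_LOGIC;
        EN       : in  STD_LOGIC;
        WE       : in  STD_LOGIC;
        ADDR     : in  STD_LOGIC_VECTOR(9 downto 0);
        DATA_IN  : in  STD_LOGIC_VECTOR(17 downto 0);
        DATA_OUT : out STD_LOGIC_VECTOR(17 downto 0)
    );
end RAM_1KX18_SYNC;

architecture BEHAVIORAL of RAM_1KX18_SYNC is
    -- 1024 words x 18 bits = 18,432 bits.
    type MEM_TYPE is array(0 to 1023) of STD_LOGIC_VECTOR(17 downto 0);
    signal MEM : MEM_TYPE := (others => (others => '0'));
begin
    process(CLK)
    begin
        if rising_edge(CLK) then
            if EN = '1' then
                if WE = '1' then
                    MEM(to_integer(unsigned(ADDR))) <= DATA_IN;
                end if;
                -- Registered read; during a write the old word is read,
                -- because the signal update takes effect after this edge.
                DATA_OUT <= MEM(to_integer(unsigned(ADDR)));
            end if;
        end if;
    end process;
end BEHAVIORAL;
```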
Spartan-3 Block RAM Amounts
Block RAM Port Aspect Ratios
• 16k x 1 (addresses 0 to 16,383)
• 8k x 2 (addresses 0 to 8,191)
• 4k x 4 (addresses 0 to 4,095)
• 2k x (8+1) (addresses 0 to 2,047)
• 1024 x (16+2) (addresses 0 to 1,023)
Dual Port Block RAM
Dual-Port Bus Flexibility
• Each port can be configured with a different data bus width
• Provides easy data width conversion without any additional logic
[Figure: RAMB4_S4_S16 block. Port A: 1K-bit depth, 18-bit width; signals WEA, ENA, RSTA, CLKA, ADDRA[9:0], DIA[17:0], DOA[17:0]. Port B: 2k-bit depth, 9-bit width; signals WEB, ENB, RSTB, CLKB, ADDRB[8:0], DIB[15:0], DOB[8:0]]
Two Independent Single-Port RAMs
• Added advantage of True Dual-Port
  • No wasted RAM bits
• Can split a Dual-Port 16K RAM into two Single-Port 8K RAMs
  • Simultaneous independent access to each RAM
• To access the lower RAM
  • Tie the MSB address bit to Logic Low
• To access the upper RAM
  • Tie the MSB address bit to Logic High
[Figure: RAMB4_S1_S1 block. Port A: 8K-bit depth, 1-bit width, address formed as VCC, ADDR[12:0]; signals WEA, ENA, RSTA, CLKA, ADDRA[12:0], DIA[0], DOA[0]. Port B: 8K-bit depth, 1-bit width, address formed as GND, ADDR[12:0]; signals WEB, ENB, RSTB, CLKB, ADDRB[12:0], DIB[0], DOB[0]]
New 18 x 18 Embedded Multiplier
• Fast arithmetic functions
  • Optimized to implement multiply / accumulate modules
18 x 18 Multiplier
• Embedded 18-bit x 18-bit multiplier
  • 2’s complement signed operation
• Multipliers are organized in columns
[Figure: 18 x 18 Multiplier with inputs Data_A (18 bits) and Data_B (18 bits) and Output (36 bits)]
Note: See the Virtex-II Data Sheet for updated performance figures
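A signed 18 x 18 multiplication with a 36-bit result can be written with the standard numeric_std operators, as sketched below. The entity name MULT_18X18 and the output register are assumptions for illustration; the port names follow the figure labels. Whether the tool places this on an embedded multiplier depends on the target device and tool settings.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- Hypothetical entity: registered signed 18 x 18 multiplier.
entity MULT_18X18 is
    port(
        CLK     : in  STD_LOGIC;
        DATA_A  : in  STD_LOGIC_VECTOR(17 downto 0);
        DATA_B  : in  STD_LOGIC_VECTOR(17 downto 0);
        PRODUCT : out STD_LOGIC_VECTOR(35 downto 0)
    );
end MULT_18X18;

architecture BEHAVIORAL of MULT_18X18 is
begin
    process(CLK)
    begin
        if rising_edge(CLK) then
            -- Two's complement multiplication; the product of two
            -- 18-bit signed operands is 36 bits wide.
            PRODUCT <= std_logic_vector(signed(DATA_A) * signed(DATA_B));
        end if;
    end process;
end BEHAVIORAL;
```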
Basic I/O Block Structure
[Figure: IOB with three flip-flops sharing Clock and Set/Reset, each with D, Q, EC (FF Enable) and SR pins: a Three-State flip-flop on the Three-State Control path, an Output flip-flop on the Output Path, and an Input flip-flop on the Input Path providing Registered Input alongside Direct Input]
IOB Functionality
• IOB provides interface between the package pins and CLBs
• Each IOB can work as uni- or bi-directional I/O
• Outputs can be forced into High Impedance
• Inputs and outputs can be registered
  • advised for high-performance I/O
• Inputs can be delayed
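The bi-directional, registered, three-state behavior described above can be sketched in VHDL as follows. The entity name BIDIR_IO and all port names are assumptions for illustration. Whether the three registers are placed inside the IOB rather than in a CLB depends on the tool and its constraints.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

-- Hypothetical entity: registered bidirectional pin.
entity BIDIR_IO is
    port(
        CLK           : in    STD_LOGIC;
        OE            : in    STD_LOGIC;  -- '1' drives the pin
        DATA_TO_PAD   : in    STD_LOGIC;
        DATA_FROM_PAD : out   STD_LOGIC;
        PAD           : inout STD_LOGIC   -- package pin
    );
end BIDIR_IO;

architecture BEHAVIORAL of BIDIR_IO is
    signal OUT_REG : STD_LOGIC := '0';  -- output register
    signal OE_REG  : STD_LOGIC := '0';  -- three-state control register
begin
    process(CLK)
    begin
        if rising_edge(CLK) then
            OUT_REG       <= DATA_TO_PAD;
            OE_REG        <= OE;
            DATA_FROM_PAD <= PAD;       -- input register
        end if;
    end process;

    -- Drive the pin when enabled, otherwise force High Impedance.
    PAD <= OUT_REG when OE_REG = '1' else 'Z';
end BEHAVIORAL;
```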
Routing Resources
[Figure: grid of CLBs interconnected through PSMs (Programmable Switch Matrices)]
Clock Distribution
Spartan-3 FPGA Family Members
FPGA Nomenclature
Device Part Marking
We’re Using: XC3S100-4FG256
Virtex-II 1.5V Architecture
[Figure: Virtex-II device with columns of Block RAMs and 18 x 18 Multipliers, Configurable Logic Blocks, and I/O Blocks]
Virtex-II 1.5V
Virtex-II Block SelectRAM
• Virtex-II BRAM is 18 kbits
• Additional “parity” bits available in selected configurations
Using Library Components in VHDL Code
RAM 16x1 (1)
library IEEE;
use IEEE.STD_LOGIC_1164.all;
library UNISIM;
use UNISIM.all;

entity RAM_16X1_DISTRIBUTED is
    port(
        CLK      : in  STD_LOGIC;
        WE       : in  STD_LOGIC;
        ADDR     : in  STD_LOGIC_VECTOR(3 downto 0);
        DATA_IN  : in  STD_LOGIC;
        DATA_OUT : out STD_LOGIC
    );
end RAM_16X1_DISTRIBUTED;
RAM 16x1 (2)
architecture RAM_16X1_DISTRIBUTED_STRUCTURAL of RAM_16X1_DISTRIBUTED is
    -- The attribute must name the instance label used in the
    -- architecture body (RAM_16X1_S_1).
    attribute INIT : string;
    attribute INIT of RAM_16X1_S_1 : label is "F0C1";

    -- Component declaration of the "ram16x1s(ram16x1s_v)" unit
    -- File name contains "ram16x1s" entity: ./src/unisim_vital.vhd
    component ram16x1s
        generic(
            INIT : BIT_VECTOR(15 downto 0) := X"0000");
        port(
            O    : out std_ulogic;
            A0   : in  std_ulogic;
            A1   : in  std_ulogic;
            A2   : in  std_ulogic;
            A3   : in  std_ulogic;
            D    : in  std_ulogic;
            WCLK : in  std_ulogic;
            WE   : in  std_ulogic);
    end component;
RAM 16x1 (3)
begin
    RAM_16X1_S_1: ram16x1s
        generic map (INIT => X"F0C1")
        port map (
            O    => DATA_OUT,
            A0   => ADDR(0),
            A1   => ADDR(1),
            A2   => ADDR(2),
            A3   => ADDR(3),
            D    => DATA_IN,
            WCLK => CLK,
            WE   => WE
        );
end RAM_16X1_DISTRIBUTED_STRUCTURAL;
RAM 16x8 (1)
library IEEE;
use IEEE.STD_LOGIC_1164.all;
library UNISIM;
use UNISIM.all;

entity RAM_16X8_DISTRIBUTED is
    port(
        CLK      : in  STD_LOGIC;
        WE       : in  STD_LOGIC;
        ADDR     : in  STD_LOGIC_VECTOR(3 downto 0);
        DATA_IN  : in  STD_LOGIC_VECTOR(7 downto 0);
        DATA_OUT : out STD_LOGIC_VECTOR(7 downto 0)
    );
end RAM_16X8_DISTRIBUTED;
RAM 16x8 (2)
architecture RAM_16X8_DISTRIBUTED_STRUCTURAL of RAM_16X8_DISTRIBUTED is
    -- No INIT attribute specification here: the instances are created
    -- inside a generate statement, so their label is not visible in this
    -- declarative part. INIT is set through the generic map instead.

    -- Component declaration of the "ram16x1s(ram16x1s_v)" unit
    -- File name contains "ram16x1s" entity: ./src/unisim_vital.vhd
    component ram16x1s
        generic(
            INIT : BIT_VECTOR(15 downto 0) := X"0000");
        port(
            O    : out std_ulogic;
            A0   : in  std_ulogic;
            A1   : in  std_ulogic;
            A2   : in  std_ulogic;
            A3   : in  std_ulogic;
            D    : in  std_ulogic;
            WCLK : in  std_ulogic;
            WE   : in  std_ulogic);
    end component;
RAM 16x8 (3)
begin
    -- One 16x1 RAM per data bit; all share address, clock and write enable.
    GENERATE_MEMORY:
    for I in 0 to 7 generate
        RAM_16X1_S_1: ram16x1s
            generic map (INIT => X"0000")
            port map (
                O    => DATA_OUT(I),
                A0   => ADDR(0),
                A1   => ADDR(1),
                A2   => ADDR(2),
                A3   => ADDR(3),
                D    => DATA_IN(I),
                WCLK => CLK,
                WE   => WE
            );
    end generate;
end RAM_16X8_DISTRIBUTED_STRUCTURAL;
ROM 16x1 (1)
library IEEE;
use IEEE.STD_LOGIC_1164.all;
library UNISIM;
use UNISIM.all;

entity ROM_16X1_DISTRIBUTED is
    port(
        ADDR     : in  STD_LOGIC_VECTOR(3 downto 0);
        DATA_OUT : out STD_LOGIC
    );
end ROM_16X1_DISTRIBUTED;