Automatic Synthesis and Optimization of Floating Point Hardware

Ho Chun Hok, Department of Computer Science and Engineering, The Chinese University of Hong Kong, 18 July 2003

Overview • Introduction • Fly – Modifiable Compiler • Float – Floating Point Library • Function Generator • Results • Conclusion

Introduction • Hardware Description Language (HDL) based design has shortcomings • Hardware designs are parallel, but people think in von Neumann patterns • Decomposing a hardware design into datapath and control signals is complex • Errors are inevitably introduced during the translation • Debugging hardware is harder than debugging software • A hardware interface for the FPGA board must be developed • The designer must have a strong background in hardware design

Introduction • Elementary functions are not supported • No floating point arithmetic • No standard mathematical library like <math.h> • No log, sin, cos, 1/x, … • The size of an FPGA is limited • Area is an essential factor in a design

Motivations • Is it possible to use a single description for both software and hardware design? • Can we optimize floating point arithmetic in hardware to save resources? • In hardware design, can we introduce a mathematical library, just as software does?

Objectives • Main goal: use the smallest effort to develop hardware on FPGA • No need to be familiar with hardware design • The compilation from description to hardware is transparent to the designer • Floating point arithmetic is supported • An elementary mathematical library is provided, as in software programming

Contributions • A framework with 3 modules is developed • Fly – Modifiable Hardware Compiler • Translates a description into a datapath • Float – Floating Point Arithmetic Library • Provides parameterized floating point operators and an optimization engine • Function Generator • Generates any differentiable function; can be regarded as a mathematical library

Contributions • Applications developed using this framework • Greatest Common Divisor Coprocessor • Digital Sine-Cosine Generator (DSCG) • Ordinary Differential Equation Solver (ODE) • N-Body Problem Simulator • The applications range from fixed point designs to floating point ones

Contribution – Traditional Design Flow

Contribution – Revised Design Flow Hardware Process is transparent to designer

Fly – Hardware Compiler

Introduction • Fly is easily extensible • Source code can be easily understood and modified • Support common programming constructs • Fly language supports • Register assignment • Parallel statements • If – else branches • While loops • Built-in functions • Comments

Fly Programming Language Main elements

Fly Programming Language • Compilation Technique • Uses Page’s compilation technique • Each statement has associated start and end signals • Fly constructs a one-hot state machine (i.e. the control part of the hardware design) from the program by cascading the signals • The Fly compiler implementation is simple and concise because Perl is used as the development language • One pass compilation • Outputs VHDL code • Can support different FPGA and ASIC design tools • Gives synthesis tools the opportunity to perform further logic optimization
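The start/end signal cascade can be illustrated with a small software simulation. The following is a minimal sketch, not the Fly compiler's actual output: it hand-wires the control signals for a hypothetical three-statement program (`x = 0; while (x < limit) { x = x + 1; } y = x;`), assuming each assignment owns one flip-flop that delays its start signal by one clock and that the while condition is routed combinationally. The function name `run_one_hot` and the example program are assumptions made for illustration.

```python
def run_one_hot(limit=3, max_cycles=100):
    """Simulate a one-hot controller for: x = 0; while x < limit: x += 1; y = x."""
    regs = {"x": None, "y": None}
    ff = [False, False, False]  # one flip-flop per assignment (its end signal)
    start = True                # single-cycle start pulse for the whole program
    for cycle in range(max_cycles):
        done0, done1, done2 = ff
        if done2:               # end signal of the last statement
            return regs, cycle
        # The loop is entered from the first assignment or re-entered from its body.
        loop_entry = done0 or done1
        cond = loop_entry and regs["x"] < limit
        s0 = start                       # start of "x = 0"
        s1 = cond                        # start of "x = x + 1"
        s2 = loop_entry and not cond     # start of "y = x"
        # Clock edge: statements whose start signal is high update their register.
        if s0:
            regs["x"] = 0
        if s1:
            regs["x"] += 1
        if s2:
            regs["y"] = regs["x"]
        ff = [s0, s1, s2]
        assert sum(ff) == 1     # exactly one state bit is hot while running
        start = False
    raise RuntimeError("program did not finish within max_cycles")
```

Because every assignment costs exactly one clock in this model, the cycle count returned equals the number of assignments executed.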

Application I - GCD Example
    {
        $s = $din[1];
        $l = $din[2];
        while ($s != $l) {
            $a = $l - $s;
            if ($a > 0) {
                $l = $a;
            } else {
                [{$s = $l;}{$l = $s;}]  # swap
            }
        }
        $dout[1] = $l;
    }
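For comparison, the same subtract-and-swap loop can be written as ordinary software. This is a sketch that mirrors the Fly program statement by statement; the function name `gcd_subtractive` is an assumption, and both inputs are assumed to be positive integers (the loop would not terminate if one of them were zero).

```python
def gcd_subtractive(s, l):
    """Greatest common divisor by repeated subtraction, mirroring the Fly example."""
    if s <= 0 or l <= 0:
        raise ValueError("both inputs must be positive integers")
    while s != l:
        a = l - s
        if a > 0:
            l = a
        else:
            s, l = l, s  # parallel swap, like [{$s = $l;}{$l = $s;}]
    return l
```

The tuple assignment plays the role of the Fly parallel block: both right-hand sides are read before either register is written.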

Resultant Datapath $s = $din[1]; $l = $din[2]; … else {[{$s = $l;}{$l = $s;}]}

Resultant Datapath • while ($s != $l) {…}

Resultant Datapath if ($a > 0) {$l = $a;}else {…}

Host Interface (register) $s = $din[1]; … $dout[1] = $l;

Summary • Input: Perl-like description of floating point design • Output: Synthesizable VHDL code for implementation • Datapath • One-hot state machine (control signal) • Host interface is introduced • The datapath is correct because it is constructed automatically • Errors in translating a software algorithm into datapath and control signals are eliminated • Bitstream generation is transparent to the user • A GCD coprocessor was given as an example

Float – Floating Point Design Environment

Introduction • Many applications involve floating point operations • Graphical Transformation • Scientific Simulation • Floating point arithmetic is seldom implemented on FPGA systems • Implementing floating point arithmetic on an FPGA is possible • Larger area • Higher speed • Arbitrary floating point sizes are possible on an FPGA • Allows more flexible designs

Introduction • Float Class • Optimizes the floating point algorithm during simulation • An instance of the Float class represents a floating point variable when simulating the algorithm • VHDL Floating Point Generator • Generates arbitrary-sized floating point adders and multipliers • Integrated into the fly environment

Float Design Environment • Float Class • Encapsulates the floating point data structure • Arbitrary exponent and fraction size • Implemented in Perl • Supports several methods for floating point operations

Float Design Environment • Float Class Attributes • Sign, exponent, fraction • Size of exponent and fraction • Maximum magnitude • Used to determine the minimum exponent size required • Circuit size required for the floating point operation

Float Design Environment • Float Class Method Support • add() • multiply() • setExponetSize() • setFractionSize() • setValue() • getValue() • getCircuitSize()
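To make the idea of a simulation-time float class concrete, here is a minimal Python sketch. It is not the Perl Float class from the slides: the snake_case method names, the round-to-nearest quantization, and the omission of exponent range checks and circuit size estimation are all assumptions made for illustration. The constructor argument order (fraction bits, exponent bits, value) follows the `new Float(23, 8, 0.9)` calls used elsewhere in the slides.

```python
import math


class Float:
    """Software model of a float with a configurable fraction and exponent width."""

    def __init__(self, fraction_bits, exponent_bits, value=0.0):
        self.fraction_bits = fraction_bits
        self.exponent_bits = exponent_bits  # stored only; range checks are omitted
        self.max_magnitude = 0.0            # largest magnitude assigned so far
        self.set_value(value)

    def _quantize(self, x):
        """Round x to a significand of fraction_bits plus one hidden bit."""
        if x == 0.0:
            return 0.0
        m, e = math.frexp(x)                # x = m * 2**e with 0.5 <= |m| < 1
        scale = 1 << (self.fraction_bits + 1)
        return math.ldexp(round(m * scale) / scale, e)

    def set_value(self, x):
        self.value = self._quantize(float(x))
        self.max_magnitude = max(self.max_magnitude, abs(self.value))

    def get_value(self):
        return self.value

    def add(self, other):
        return Float(self.fraction_bits, self.exponent_bits,
                     self.value + other.value)

    def multiply(self, other):
        return Float(self.fraction_bits, self.exponent_bits,
                     self.value * other.value)
```

Tracking the maximum magnitude during simulation is what would let a tool pick the smallest exponent width that avoids overflow; that step is left out of this sketch.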

Float Design Environment • Optimization • Input – accuracy, resource constraint • Output – size of each floating point operator • Nelder-Mead method to minimize the cost function

Float Design Environment • Cost Function • Adder size: • Multiplier size: • Quantization Error (dB): • Cost Function:
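The trade-off between operator size and accuracy can be explored in software before any hardware is generated. The sketch below is an illustration under stated assumptions, not the cost function or optimizer from the slides: it measures quantization error as a signal-to-quantization-noise ratio in dB (an assumed definition), and it replaces the Nelder-Mead search with a plain exhaustive scan over fraction widths. All function names are hypothetical.

```python
import math


def quantize(x, fraction_bits):
    """Round x to a significand of fraction_bits plus one hidden bit."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)
    scale = 1 << (fraction_bits + 1)
    return math.ldexp(round(m * scale) / scale, e)


def sqnr_db(samples, fraction_bits):
    """Signal-to-quantization-noise ratio in dB for the given fraction width."""
    signal = sum(x * x for x in samples)
    noise = sum((x - quantize(x, fraction_bits)) ** 2 for x in samples)
    if noise == 0.0:
        return math.inf
    return 10.0 * math.log10(signal / noise)


def smallest_fraction_bits(samples, target_db, max_bits=52):
    """Return the first fraction width whose SQNR meets target_db, or None."""
    for bits in range(1, max_bits + 1):
        if sqnr_db(samples, bits) >= target_db:
            return bits
    return None
```

A scan like this only handles one operator width at a time; a multi-dimensional method such as Nelder-Mead becomes useful when every operator in the datapath has its own width.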

Float Design Environment • VHDL Floating Point Generator • Generates parameterized adders and multipliers with arbitrary exponent and fraction sizes • Fully pipelined design • Latency of multiplication: 8 cycles • Latency of addition: 4 cycles • Throughput of one result per clock cycle • A module written in Perl serves as the interface to the library • Compatible with the fly compiler through start and end signals

Integration into fly compiler • C = A + B: the datapath for integer addition needs one clock cycle to complete • C = A .+ B: the datapath for floating point addition needs more cycles to complete, so more flip-flops are added to delay the control signal • [Figure: an integer adder and a floating point adder, each labelled with signals A, B, C, we, S1 and F1]
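The delayed control signal can be modelled as a shift register whose length equals the operator latency. This is a minimal sketch under that assumption; the class name `ControlDelay` is hypothetical and the model ignores the datapath itself.

```python
from collections import deque


class ControlDelay:
    """Chain of flip-flops that delays a start pulse by `latency` clock cycles."""

    def __init__(self, latency):
        if latency < 1:
            raise ValueError("latency must be at least one cycle")
        self.stages = deque([0] * latency, maxlen=latency)

    def clock(self, start):
        """Advance one clock; return the end signal visible during this cycle."""
        done = self.stages[-1]          # output of the last flip-flop
        self.stages.appendleft(start)   # shift; the oldest value drops off
        return done
```

With a latency of 4, matching the adder latency quoted in the slides, a start pulse applied in cycle 0 appears as the end signal in cycle 4.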

Application II - Digital Sine Cosine Generator • Let si[n] be the signal at time n • If

Application II - Digital Sine Cosine Generator
    $cos_theta    = new Float(23, 8, 0.9);
    $cos_theta_p1 = new Float(23, 8, 1.9);
    $cos_theta_m1 = new Float(23, 8, -0.1);
    $s1[0] = new Float(23, 8, 0);
    $s2[0] = new Float(23, 8, 1);
    for ($i = 0; $i < 50; $i++) {
        $s1[$i+1] = $s1[$i] * $cos_theta    + $s2[$i] * $cos_theta_p1;
        $s2[$i+1] = $s1[$i] * $cos_theta_m1 + $s2[$i] * $cos_theta;
    }
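The recurrence in the listing can be checked in double precision. The sketch below uses plain Python floats rather than the Float class, and the function name `dscg` is an assumption. The update matrix has determinant 1 and trace 2·cos θ, so starting from (0, 1) the second sequence equals cos(nθ) and the first is a sine scaled by (1 + cos θ)/sin θ.

```python
import math


def dscg(cos_theta, steps):
    """Run the coupled sine-cosine recurrence from the slide for `steps` iterations."""
    s1, s2 = [0.0], [1.0]
    for i in range(steps):
        s1.append(s1[i] * cos_theta + s2[i] * (cos_theta + 1.0))
        s2.append(s1[i] * (cos_theta - 1.0) + s2[i] * cos_theta)
    return s1, s2
```

Running the same recurrence through a reduced-precision float model and comparing against this reference is one way to see how fraction width affects the generated waveform.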

Application III - Ordinary Differential Equation Solver • Uses the modified fly compiler to solve an ordinary differential equation • Uses Euler’s method; h is the step size • The example involves floating point addition, subtraction and multiplication

Application III - Ordinary Differential Equation Solver
    {
        $h = &read_host(1);
        [ {$t = 0.0;} {$y = 1.0;} {$dy = 0.0;}
          {$onehalf = 0.5;} {$index = 0;} ]
        while ($t < 3.0) {
            [{$t1 = $h .* $onehalf;} {$t2 = $t .- $y;}]
            [{$dy = $t1 .* $t2;} {$t = $t .+ $h;}]
            [{$y = $y .+ $dy;} {$index = $index + 1;}]
            $void = &write_host($y, $index);
        }
    }
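Read as software, the loop applies Euler's method to dy/dt = (t − y)/2 with y(0) = 1; that reading is inferred from the statements in the listing. The sketch below reproduces the update in Python and compares it with the closed-form solution y = t − 2 + 3·exp(−t/2). The function names are assumptions.

```python
import math


def euler_solve(h, t_end=3.0):
    """Euler's method for dy/dt = (t - y) / 2, y(0) = 1, stepping while t < t_end."""
    if h <= 0.0:
        raise ValueError("step size must be positive")
    t, y = 0.0, 1.0
    while t < t_end:
        dy = (h * 0.5) * (t - y)   # same products as $t1 .* $t2 in the listing
        t, y = t + h, y + dy       # t and y advance together
    return t, y


def exact(t):
    """Closed-form solution of the same initial value problem."""
    return t - 2.0 + 3.0 * math.exp(-t / 2.0)
```

Halving the step size should roughly halve the error against `exact`, which is a quick sanity check on any reduced-precision version of the loop.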

Summary • The Float environment is introduced • The Float class determines the size of floating point operators while maintaining a given level of accuracy • Area can be reduced through optimization, so more logic can be implemented on the FPGA • Module generation allows the fly compiler to support arbitrary-sized floating point arithmetic • Floating point algorithms can be implemented on FPGA with ease • Translation from floating point to fixed point is no longer required • DSCG and ODE applications were given

Function Generator

Introduction • In software systems, standard mathematical library functions are available • In hardware design, the mathematical library has to be implemented by the designer • A general method that can generate an arbitrary differentiable function is desirable • The STAM (Symmetric Table Addition Method) approach was adopted • Integrated into the fly compiler

STAM – datapath Symmetric Properties were removed during implementation for simplicity
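The table addition idea can be sketched in its simplest two-table (bipartite) form, with no symmetry exploited. This is an illustration under assumptions, not the generator from the slides: the input index is split into three bit fields, one table holds function values at the midpoint of the low field, and the other holds a first-order correction using the derivative at a block midpoint. The function names and the choice of the interval [lo, lo + 1) are hypothetical.

```python
def build_tables(f, df, lo, n0, n1, n2):
    """Build two lookup tables approximating f on [lo, lo + 1)."""
    step0 = 2.0 ** -n0
    step1 = 2.0 ** -(n0 + n1)
    step2 = 2.0 ** -(n0 + n1 + n2)
    d1 = (step0 - step1) / 2.0   # midpoint of the middle field's range
    d2 = (step1 - step2) / 2.0   # midpoint of the low field's range
    # a0 is indexed by the high and middle fields: function value.
    a0 = [[f(lo + i0 * step0 + i1 * step1 + d2) for i1 in range(1 << n1)]
          for i0 in range(1 << n0)]
    # a1 is indexed by the high and low fields: slope times offset.
    a1 = [[df(lo + i0 * step0 + d1 + d2) * (i2 * step2 - d2)
           for i2 in range(1 << n2)]
          for i0 in range(1 << n0)]
    return a0, a1


def lookup(a0, a1, index, n1, n2):
    """Approximate f at lo + index * 2**-(n0+n1+n2) with one addition."""
    i2 = index & ((1 << n2) - 1)
    i1 = (index >> n2) & ((1 << n1) - 1)
    i0 = index >> (n1 + n2)
    return a0[i0][i1] + a1[i0][i2]
```

With 4 bits per field the two tables hold 256 entries each, compared with 4096 entries for a direct lookup of the same 12-bit input.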

Implementation using VHDL • A Perl program automates the generation of VHDL code with the STAM algorithm • The program preprocesses the VHDL design; the STAM specification is placed inside a comment • BlockRAM stores the table entries • The generated design can be used directly in VHDL

VHDL Preprocessor

Floating Point extension • The original STAM applies to fixed point arithmetic • A minor add-on lets STAM handle floating point arithmetic • Floating point evaluation of x^(-3/2) is implemented using STAM and the floating point library
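One way such an add-on can work is range reduction: split the input into a significand and an even exponent, evaluate the function on the significand only, and fix up the exponent separately. The sketch below illustrates that general idea for x^(-3/2) and is an assumption about the approach, not the design in the slides. The name `power15` is borrowed from the built-in mentioned in the slides, and `m ** -1.5` stands in for the fixed point table lookup.

```python
import math


def power15(x):
    """Compute x ** -1.5 for x > 0 by reducing x to m * 2**(2k) with 1 <= m < 4."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    m, e = math.frexp(x)       # x = m * 2**e with 0.5 <= m < 1
    m, e = m * 2.0, e - 1      # now 1 <= m < 2
    if e % 2:                  # force an even exponent
        m, e = m * 2.0, e - 1  # now 2 <= m < 4
    k = e // 2
    core = m ** -1.5           # placeholder for a table lookup over [1, 4)
    return math.ldexp(core, -3 * k)   # (2**(2k)) ** -1.5 == 2**(-3k)
```

Only the significand ever reaches the table, so the table size is independent of the exponent width.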

Floating Point extension

Fly integration • Start and end signals are attached to the power15 entity • A built-in function _power15() is introduced into the fly compiler with a slight modification

Application IV - N-Body Problem Simulation • Calculates the acceleration of each particle by iteration • Uses fly, float, and the function generator in this application

N-Body problem - Fly implementation
    {
        # initialization, fetch xi,yi,zi
        while ($j < $n) {
            # fetch xj,yj,zj from memory
            $xj = &read_host($index); $index = $index + 1;
            $yj = &read_host($index); $index = $index + 1;
            $zj = &read_host($index); $index = $index + 2;
            [{$diffx = $xj .- $xi;}
             {$diffy = $yj .- $yi;}
             {$diffz = $zj .- $zi;}]
            [{$x = $diffx .* $diffx;}
             {$y = $diffy .* $diffy;}
             {$z = $diffz .* $diffz;}]
            [{$r1 = $x .+ $y;}
             {$r2 = $z .+ $epsilon;}]
            # calculate rij
            $rij = $r1 .+ $r2;
            # call built-in function power^{-1.5}
            $tmp2 = &_power15($rij);
            [{$tmpx = $tmp2 .* $diffx;}
             {$tmpy = $tmp2 .* $diffy;}
             {$tmpz = $tmp2 .* $diffz;}]
            [{$ax = $ax .+ $tmpx;}  # accumulate a
             {$ay = $ay .+ $tmpy;}
             {$az = $az .+ $tmpz;}]
            $j = $j + 1;
        }
    }
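A software reference for the inner loop helps when checking hardware results. The sketch below follows the arithmetic in the listing, assuming unit masses (the listing shows no mass term) and a strictly positive softening term epsilon so that the self-interaction contributes zero. The function name `accelerations` is an assumption.

```python
def accelerations(positions, epsilon):
    """Softened accelerations for unit-mass particles given as (x, y, z) tuples."""
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    result = []
    for xi, yi, zi in positions:
        ax = ay = az = 0.0
        for xj, yj, zj in positions:
            dx, dy, dz = xj - xi, yj - yi, zj - zi
            rij = dx * dx + dy * dy + dz * dz + epsilon  # softened squared distance
            w = rij ** -1.5                              # role of _power15 in the listing
            ax += w * dx
            ay += w * dy
            az += w * dz
        result.append((ax, ay, az))
    return result
```

As in the Fly version, the loop runs over every j including j = i; the softening term keeps that case finite and its contribution is zero because the coordinate differences vanish.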

Summary • The STAM approach enhances the flexibility of the fly compiler • Arbitrary mathematical functions are now supported through table lookup • The mechanism is similar to software programming • The N-body problem simulation shows that a real-world problem can be solved with this framework

Results

Experiment Environment • The framework was integrated into the Pilchard FPGA platform • Pilchard uses DIMM memory bus interface instead of PCI bus (lower latency and higher bandwidth than PCI) • Compilation and implementation process is transparent to the user

Result: Application I - GCD • A GCD coprocessor was implemented using the Fly system • Implemented on Pilchard (Xilinx XCV300E-8) • Fixed point, 16-bit integers • Max. frequency: 126 MHz • Slices used: 135 out of 3072 • Computes a GCD every 1.63ms (including all interface overheads)

Result: Floating Point Generator • Floating point operators were implemented • Implemented on Pilchard (Xilinx XCV1000E-6) • Different fraction sizes were measured; the exponent size is 8 • Max. frequency (multiplier): 103 MHz • Max. frequency (adder): 58 MHz • The results are used to model the area relationship
