
Automatic Synthesis and Optimization of Floating Point Hardware

Automatic Synthesis and Optimization of Floating Point Hardware. Ho Chun Hok Department of Computer Science and Engineering The Chinese University of Hong Kong. 18JUL2003. Overview. Introduction Fly – Modifiable Compiler Float – Floating Point Library Function Generator Results





Presentation Transcript


  1. Automatic Synthesis and Optimization of Floating Point Hardware Ho Chun Hok Department of Computer Science and Engineering The Chinese University of Hong Kong 18JUL2003

  2. Overview • Introduction • Fly – Modifiable Compiler • Float – Floating Point Library • Function Generator • Results • Conclusion

  3. Introduction • Hardware Description Language (HDL) based design has shortcomings • Hardware designs are parallel, but people think in von Neumann patterns • Decomposing a hardware design into datapath and control signals is complex • Errors are inevitably introduced during the translation • Debugging hardware is harder than debugging software • A hardware interface for the FPGA board must be developed • The designer must have a strong background in hardware design

  4. Introduction • Elementary Functions are not supported • No floating point arithmetic • No standard mathematical library like <math.h> • No log, sin, cos, 1/x, … • The size of FPGA is limited. • Area is an essential factor of a design

  5. Motivations • Is it possible to use a single description for both software and hardware design? • Can we optimize floating point arithmetic in hardware to save resources? • In hardware design, can we introduce a mathematical library just as software does?

  6. Objectives • Main goal: develop hardware on an FPGA with the smallest effort • No need to be familiar with hardware design • The compilation from description to hardware is transparent to the designer • Floating point arithmetic is supported • An elementary mathematical library is provided, as in software programming

  7. Contributions • A framework with 3 modules is developed • Fly – Modifiable Hardware Compiler • Translates a description into a datapath • Float – Floating Point Arithmetic Library • Provides parameterized floating point operators and an optimization engine • Function Generator • Generates any differentiable function; can be regarded as a mathematical library

  8. Contributions • Applications developed using this framework • Greatest Common Divisor Coprocessor • Digital Sine-Cosine Generator (DSCG) • Ordinary Differential Equation Solver (ODE) • N-Body Problem Simulator • The applications range from fixed point designs to floating point ones

  9. Contribution – Traditional Design Flow

  10. Contribution – Revised Design Flow Hardware Process is transparent to designer

  11. Fly – Hardware Compiler

  12. Introduction • Fly is easily extensible • Source code can be easily understood and modified • Support common programming constructs • Fly language supports • Register assignment • Parallel statements • If – else branches • While loops • Built-in functions • Comments

  13. Fly Programming Language Main elements

  14. Fly Programming Language • Compilation Technique • Uses Page’s compilation technique • Each statement has associated start and end signals • Fly constructs a one-hot state machine (i.e. the control part of the hardware design) from the program by cascading the signals • The Fly compiler implementation is simple and concise due to the use of Perl as the development language • One-pass compilation • Outputs VHDL code • Can support different FPGA and ASIC design tools • Gives synthesis tools the opportunity to perform further logic optimization
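As a rough software illustration of the one-hot control scheme on this slide, the Python sketch below models one flip-flop per statement, with each statement's end signal wired to the next statement's start signal. The function name `run_one_hot` and the assumption that every statement takes exactly one cycle are illustrative only and are not taken from the Fly source.

```python
def run_one_hot(num_statements):
    """Trace a one-hot control chain for a straight-line sequence of statements.

    Assumption (not from the slides): each statement finishes one cycle after
    it starts, so its end signal is simply its state bit.
    """
    if num_statements < 1:
        raise ValueError("need at least one statement")
    state = [0] * num_statements   # one flip-flop per statement
    start = 1                      # single start pulse from the host
    trace = []
    while True:
        # Clock edge: every flip-flop loads the end signal of its predecessor.
        state = [start] + state[:-1]
        start = 0
        if not any(state):
            break                  # the pulse has left the last statement
        trace.append(tuple(state))
    return trace


for cycle, bits in enumerate(run_one_hot(3)):
    print(cycle, bits)
```

Exactly one bit is set in every cycle, which is why the control logic stays small: no state decoding is needed, only a chain of flip-flops.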

  15. Application I - GCD Example
      {
        $s = $din[1];
        $l = $din[2];
        while ($s != $l) {
          $a = $l - $s;
          if ($a > 0) {
            $l = $a;
          } else {
            [{$s = $l;}{$l = $s;}] # swap
          }
        }
        $dout[1] = $l;
      }
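To check the behaviour of the Fly program in software, the loop can be mirrored in Python. This is only a reference model under the assumption that both inputs are positive integers; the function name `gcd_fly` is hypothetical. The parallel block `[{...}{...}]` is modelled with a tuple assignment, because both right-hand sides are read before either register is written.

```python
def gcd_fly(s, l):
    """Software model of the subtractive GCD loop from the Fly example.

    Assumes s and l are positive integers (the loop does not terminate
    for zero or negative inputs).
    """
    while s != l:
        a = l - s
        if a > 0:
            l = a
        else:
            # Parallel statements: both reads happen before both writes.
            s, l = l, s
    return l


print(gcd_fly(12, 18))
```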

  16. Resultant Datapath $s = $din[1]; $l = $din[2]; … else {[{$s = $l;}{$l = $s;}]}

  17. Resultant Datapath • while ($s != $l) {…}

  18. Resultant Datapath if ($a > 0) {$l = $a;}else {…}

  19. Host Interface (register) $s = $din[1]; … $dout[1] = $l;

  20. Summary • Input: Perl-like description of floating point design • Output: Synthesizable VHDL code for implementation • Datapath • One-hot state machine (control signal) • Host interface is introduced • The datapath is correct because it is constructed automatically • Errors are eliminated when translating a software algorithm into datapath and control signals • Bitstream generation is transparent to the user • GCD coprocessor was given as an example

  21. Float – Floating Point Design Environment

  22. Introduction • Many applications involve floating point operations • Graphical Transformation • Scientific Simulation • Floating point arithmetic is seldom implemented on FPGA systems • Implementing floating point arithmetic on an FPGA is possible • Larger area • Higher speed • Arbitrary floating point sizes are possible on an FPGA • Allows more flexible designs

  23. Introduction • Float Class • Optimizes the floating point algorithm during simulation • An instance of the Float class represents a floating point variable when simulating the algorithm • VHDL Floating Point Generator • Generates arbitrary-sized floating point adders/multipliers • Integrated into the fly environment

  24. Float Design Environment • Float Class • Encapsulates the floating point data structure • Arbitrary exponent and fraction size • Implemented in Perl • Supports several methods for floating point operations

  25. Float Design Environment • Float Class Attributes • Sign, exponent, fraction • Size of exponent and fraction • Maximum magnitude • Used to determine the minimum exponent size required • Circuit size required for the floating point operation

  26. Float Design Environment • Float Class Method Support • add() • multiply() • setExponetSize() • setFractionSize() • setValue() • getValue() • getCircuitSize()
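The effect of choosing a fraction size can be illustrated in software. The Python sketch below is not the Perl Float class; the helper name `quantize` is hypothetical, and it assumes a normalized significand with a hidden leading 1, round-to-nearest, and an unbounded exponent range.

```python
import math


def quantize(value, fraction_bits):
    """Round value to a significand with `fraction_bits` fraction bits.

    Assumptions (not from the slides): hidden leading 1, round to nearest,
    and no limit on the exponent range.
    """
    if value == 0.0:
        return 0.0
    mantissa, exponent = math.frexp(value)   # value = mantissa * 2**exponent
    # frexp returns 0.5 <= |mantissa| < 1, i.e. (1.f) / 2, hence the extra bit.
    scale = 2 ** (fraction_bits + 1)
    return math.ldexp(round(mantissa * scale) / scale, exponent)


for bits in (2, 8, 23):
    q = quantize(0.9, bits)
    print(bits, q, abs(q - 0.9))
```

Running the loop shows the quantization error shrinking as the fraction grows, which is the trade-off between accuracy and circuit size that the optimization step has to balance.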

  27. Float Design Environment • Optimization • Input – accuracy, resource constraint • Output – size of each floating point operator • Nelder-Mead method to minimize the cost function

  28. Float Design Environment • Cost Function • Adder size: • Multiplier size: • Quantization Error (dB): • Cost Function:

  29. Float Design Environment • VHDL Floating Point Generator • Generates parameterized adders and multipliers with arbitrary exponent and fraction sizes • Fully pipelined design • Latency of multiplication: 8 cycles • Latency of addition: 4 cycles • Throughput of one result per clock cycle • The module is written in Perl as the interface of the library • Compatible with the fly compiler through start and end signals

  30. Integration into fly compiler [Figure: an integer adder and a floating point adder, each with inputs A and B, output C, write enable we, and control signals S1 and F1] • C = A + B: the datapath for integer addition needs one clock cycle to complete • C = A .+ B: the datapath for a floating point operation needs more cycles to complete, so more flip-flops are added to delay the control signal
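The idea of delaying the control signal with extra flip-flops can be sketched as a shift register in Python. The function name `delay_control` is hypothetical; the latency values used in the example (1 cycle for integer addition, 4 cycles for floating point addition) are the ones quoted on the slides.

```python
def delay_control(start_pulses, latency):
    """Delay a start signal by `latency` cycles using a chain of flip-flops.

    start_pulses: one 0/1 value per clock cycle.
    Returns the finish signal seen in each of those cycles.
    """
    regs = [0] * latency
    finish = []
    for start in start_pulses:
        finish.append(regs[-1])       # output of the last flip-flop this cycle
        regs = [start] + regs[:-1]    # clock edge: shift the pulse one stage
    return finish


print(delay_control([1, 0, 0, 0, 0, 0], 1))   # integer add
print(delay_control([1, 0, 0, 0, 0, 0], 4))   # floating point add
print(delay_control([1, 1, 0, 0, 0, 0], 4))   # back-to-back operations
```

The last call shows two consecutive start pulses producing two consecutive finish pulses, which matches a fully pipelined operator that accepts a new operation every cycle.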

  31. Application II - Digital Sine Cosine Generator • Let si[n] be the signal at time n • If

  32. Application II - Digital Sine Cosine Generator
      $cos_theta = new Float(23, 8, 0.9);
      $cos_theta_p1 = new Float(23, 8, 1.9);
      $cos_theta_m1 = new Float(23, 8, -0.1);
      $s1[0] = new Float(23, 8, 0);
      $s2[0] = new Float(23, 8, 1);
      for ($i = 0; $i < 50; $i++) {
        $s1[$i+1] = $s1[$i] * $cos_theta + $s2[$i] * $cos_theta_p1;
        $s2[$i+1] = $s1[$i] * $cos_theta_m1 + $s2[$i] * $cos_theta;
      }
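A double-precision reference for this recurrence is easy to write in Python and is useful for judging the error of a reduced-precision run. The function name `dscg` is hypothetical. With the initial values from the slide (s1[0] = 0, s2[0] = 1), working through the recurrence gives s2[n] = cos(nθ), while s1[n] is proportional to sin(nθ).

```python
import math


def dscg(cos_theta, steps):
    """Double-precision model of the sine-cosine recurrence from the slide."""
    s1, s2 = [0.0], [1.0]
    for i in range(steps):
        s1.append(s1[i] * cos_theta + s2[i] * (cos_theta + 1.0))
        s2.append(s1[i] * (cos_theta - 1.0) + s2[i] * cos_theta)
    return s1, s2


theta = math.acos(0.9)
s1, s2 = dscg(0.9, 50)
worst = max(abs(s2[n] - math.cos(n * theta)) for n in range(51))
print("largest deviation of s2[n] from cos(n*theta):", worst)
```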

  33. Application III - Ordinary Differential Equation Solver • Used modified fly compiler to solve ordinary differential equation • Used Euler’s method, h is step size • Example involves floating point addition, subtraction and multiplication

  34. Application III - Ordinary Differential Equation Solver
      {
        $h = &read_host(1);
        [ {$t=0.0;}{$y=1.0;}{$dy=0.0;} {$onehalf=0.5;}{$index=0;} ]
        while ($t < 3.0) {
          [{$t1 = $h .* $onehalf;}{$t2 = $t .- $y;}]
          [{$dy = $t1 .* $t2;}{$t = $t .+ $h;}]
          [{$y = $y .+ $dy;}{$index = $index + 1;}]
          $void = &write_host($y, $index);
        }
      }
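Reading the loop body, each iteration computes dy = (h/2)(t - y), which corresponds to Euler's method applied to y' = (t - y)/2 with y(0) = 1. The Python sketch below mirrors the Fly statements in double precision; the function name `euler_fly` is hypothetical. For comparison, this equation has the closed-form solution y(t) = t - 2 + 3e^(-t/2).

```python
import math


def euler_fly(h, t_end=3.0):
    """Double-precision model of the Fly Euler loop for y' = (t - y) / 2."""
    t, y = 0.0, 1.0
    while t < t_end:
        t1 = h * 0.5
        t2 = t - y
        # Parallel block: dy is computed from t2 (old t) while t advances.
        dy, t = t1 * t2, t + h
        y = y + dy
    return y


h = 2.0 ** -10                        # step size chosen to be exact in binary
approx = euler_fly(h)
exact = 3.0 - 2.0 + 3.0 * math.exp(-1.5)
print(approx, exact, abs(approx - exact))
```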

  35. Summary • The Float environment is introduced • The Float class determines the size of each floating point operation while maintaining a given level of accuracy • Area can be reduced through optimization, so more logic can be implemented on the FPGA • Module generation allows the fly compiler to support arbitrary-sized floating point arithmetic • Floating point algorithms can be implemented on an FPGA with ease • Translation from floating point to fixed point is no longer required • DSCG and ODE applications were given

  36. Function Generator

  37. Introduction • In software systems, standard mathematical library functions are available • In hardware design, the mathematical library has to be implemented by the designer • A general method that allows generation of arbitrary differentiable functions is desirable • The STAM (Symmetric Table Addition Method) approach was adopted • Integrated into the fly compiler
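The general idea behind table-addition methods can be sketched in Python: the input is split into bit fields, two small tables are indexed by different fields, and their outputs are added. The sketch below is a simplified two-table variant without symmetry and without rounding of table entries; it is not the STAM implementation from the presentation. The names `build_tables` and `lookup`, the 4/4/4 bit split, and the input range [1, 2) are all assumptions made for illustration.

```python
def build_tables(f, df, n0=4, n1=4, n2=4):
    """Build two lookup tables approximating f on [1, 2).

    The input is x = 1 + k / 2**(n0+n1+n2), with k split into fields x0|x1|x2.
    a0 holds f at the midpoint of each (x0, x1) interval; a1 holds a
    first-order correction whose slope depends only on x0.
    """
    scale = 1 << (n0 + n1 + n2)
    half2 = 1 << (n2 - 1)            # midpoint of the x2 field
    half12 = 1 << (n1 + n2 - 1)      # midpoint of the combined x1|x2 field
    a0, a1 = {}, {}
    for x0 in range(1 << n0):
        base = x0 << (n1 + n2)
        slope = df(1.0 + (base + half12) / scale)
        for x1 in range(1 << n1):
            a0[(x0, x1)] = f(1.0 + (base + (x1 << n2) + half2) / scale)
        for x2 in range(1 << n2):
            a1[(x0, x2)] = slope * (x2 - half2) / scale
    return a0, a1


def lookup(k, a0, a1, n1=4, n2=4):
    """Evaluate the approximation for input index k with one addition."""
    x0 = k >> (n1 + n2)
    x1 = (k >> n2) & ((1 << n1) - 1)
    x2 = k & ((1 << n2) - 1)
    return a0[(x0, x1)] + a1[(x0, x2)]


f = lambda x: x ** -1.5
df = lambda x: -1.5 * x ** -2.5
a0, a1 = build_tables(f, df)
worst = max(abs(lookup(k, a0, a1) - f(1.0 + k / 4096)) for k in range(4096))
print(len(a0), len(a1), worst)
```

With this split, two tables of 256 entries each replace a single table of 4096 entries, at the cost of one adder and a small approximation error.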

  38. STAM – datapath Symmetric Properties were removed during implementation for simplicity

  39. Implementation using VHDL • A Perl program automates the generation of VHDL code with the STAM algorithm • The program preprocesses the VHDL design; the STAM specification is placed inside a comment • BlockRAM stores the table entries • The generated design can be used directly in VHDL

  40. VHDL Preprocessor

  41. Floating Point extension • The original STAM applies to fixed point arithmetic • A minor add-on lets STAM handle floating point arithmetic • Floating point evaluation of x^(-3/2) is implemented using STAM and the floating point library

  42. Floating Point extension

  43. Fly integration • start and end signals are attached to the power15 entity • A built-in function _power15() is introduced into the fly compiler with a slight modification

  44. Application IV - N-Body Problem Simulation • Calculates the acceleration force of each particle by iteration • Uses fly, float, and the function generator in this application

  45. N-Body problem - Fly implementation
      { # initialization, fetch xi,yi,zi
        while ($j < $n) {
          # fetch xj,yj,zj from memory
          $xj = &read_host($index); $index = $index + 1;
          $yj = &read_host($index); $index = $index + 1;
          $zj = &read_host($index); $index = $index + 2;
          [{$diffx = $xj .- $xi;} {$diffy = $yj .- $yi;} {$diffz = $zj .- $zi;}]
          [{$x = $diffx .* $diffx;} {$y = $diffy .* $diffy;} {$z = $diffz .* $diffz;}]
          [{$r1 = $x .+ $y;} {$r2 = $z .+ $epsilon;}]
          # calculate rij
          $rij = $r1 .+ $r2;
          # call built-in function power^{-1.5}
          $tmp2 = &_power15($rij);
          [{$tmpx = $tmp2 .* $diffx;} {$tmpy = $tmp2 .* $diffy;} {$tmpz = $tmp2 .* $diffz;}]
          [{$ax = $ax .+ $tmpx;} # accumulate a
           {$ay = $ay .+ $tmpy;}
           {$az = $az .+ $tmpz;}]
          $j = $j + 1;
        }
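The arithmetic of the inner loop can be checked against a double-precision model in Python. The function name `acceleration` and its argument layout are hypothetical; as in the Fly code shown, no mass term or gravitational constant is applied, and `**` stands in for the table-based `_power15` built-in.

```python
def acceleration(xi, yi, zi, others, epsilon):
    """Accumulate the acceleration on particle i from a list of (xj, yj, zj)."""
    ax = ay = az = 0.0
    for xj, yj, zj in others:
        dx, dy, dz = xj - xi, yj - yi, zj - zi
        # Same grouping as the Fly code: (x + y) + (z + epsilon).
        rij = (dx * dx + dy * dy) + (dz * dz + epsilon)
        w = rij ** -1.5          # exact power instead of the table lookup
        ax += w * dx
        ay += w * dy
        az += w * dz
    return ax, ay, az


print(acceleration(0.0, 0.0, 0.0, [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)], 0.0))
```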

  46. Summary • The STAM approach enhances the flexibility of the fly compiler • Arbitrary mathematical functions are now supported through table lookup • The mechanism is similar to software programming • The N-body problem simulation shows that a real-world problem can be solved with this framework

  47. Results

  48. Experiment Environment • The framework was integrated into the Pilchard FPGA platform • Pilchard uses DIMM memory bus interface instead of PCI bus (lower latency and higher bandwidth than PCI) • Compilation and implementation process is transparent to the user

  49. Result: Application I - GCD • A GCD coprocessor was implemented using the Fly System • Implemented on Pilchard (Xilinx XCV300E-8) • Fixed point 16-bit integer • Max. Frequency: 126 MHz • Slices Used: 135 out of 3072 slices • Computes a GCD every 1.63ms (including all interface overheads)

  50. Result: Floating Point Generator • Floating point operators were implemented • Implemented on Pilchard (Xilinx XCV1000E-6) • Different fraction sizes were measured; the exponent size is 8 • Max. Frequency (Multiplier): 103 MHz • Max. Frequency (Adder): 58 MHz • The results are used to model the area relationship
