
ECE 565 High-Level Synthesis--Introduction

ECE 565 High-Level Synthesis--Introduction. Shantanu Dutt, ECE Dept., UIC. HLS Flow: Code/Algorithm → Architecture (interconnected functional units (FUs) and memory units (MUs), connected via muxes, demuxes, tristate buffers, buses, and dedicated interconnects).


Presentation Transcript


  1. ECE 565 High-Level Synthesis--Introduction, Shantanu Dutt, ECE Dept., UIC

  2. HLS Flow • Code/Algorithm → Architecture (interconnected functional units (FUs) and memory units (MUs), connected via muxes, demuxes, tristate buffers, buses, and dedicated interconnects) • Classically, these 3 stages were performed sequentially; currently they are performed together, which leads to better optimization.

  3. HLS Flow (contd)

  4. HLS Flow (contd) (Binding) Allocation: Simple counting of FUs after the above 2 stages
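Allocation by counting can be sketched as follows, assuming scheduling has already fixed the cc and FU type of every operation: the number of FUs needed of each type is the maximum number of operations of that type scheduled in the same cc. The DFG, op names, and schedule below are hypothetical.

```python
from collections import Counter

# Hypothetical schedule: op name -> (FU type, cc in which it executes).
# Assumed DFG for illustration: c1: x = a*b, c2: y = c+d, c3: z = x+y.
SCHEDULE = {
    "c1": ("*", 1),
    "c2": ("+", 1),
    "c3": ("+", 2),
}

def allocate(schedule):
    """Count FUs per type: the max number of same-type ops in any one cc."""
    per_cc = Counter((fu, cc) for fu, cc in schedule.values())
    alloc = {}
    for (fu, _cc), n in per_cc.items():
        alloc[fu] = max(alloc.get(fu, 0), n)
    return alloc

print(allocate(SCHEDULE))  # {'*': 1, '+': 1}
```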

  5. Simple HLS Examples

  6. Simple HLS Examples (contd)
     2) Mapping to h/w w/ constraints: use only 1 (X) and 1 (+)
     a) Non-overlapped scheduling
     [Figure: datapath with registers a, b, c, d, x, y, z (load signals lda, ldb, ldc, ldd, ldx, ldy, ldz), one multiplier (X), one adder (+), input muxes mux1 and mux2 (inputs I0, I1), and a demux on the adder output; schedule chart over cc's: X executes c1(1), c1(2); + executes c2(1), c3(1), c2(2), c3(2)]
     Controller FSM (entered from Reset):
     • cc 3i: ldx=1 [x ← a × b] (c1)
     • cc 3(i+1): mux1=0, mux2=0, demux=0, ldy=1 [y ← c+d] (c2)
     • cc 3(i+2): lda=1, ldb=1, ldc=1, ldd=1, mux1=1, mux2=1, demux=1, ldz=1 [z ← x+y] (c3)
     Note: A register is loaded at the +ve/-ve edge (in a +ve/-ve edge-triggered system) of the cc after the one in which its load signal is asserted (e.g., reg. "a" is loaded at the edge following the cc in which lda = 1).
     Note: Unspecified control signals have either an inactive value or, if such a concept doesn't exist for the control signal, the don't-care value.
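The three-state controller above can be checked with a small cycle-level sketch. It assumes that registers a to d already hold the first iteration's inputs after Reset, and that every register whose load signal is asserted in a cc takes its new value at the clock edge ending that cc. The function name and input format are hypothetical.

```python
def simulate_nonoverlapped(inputs):
    """Cycle-level sketch of the 3-state controller driving 1 multiplier and 1 adder.

    inputs: list of (a, b, c, d) tuples, one per iteration (hypothetical format).
    Returns (list of z outputs, number of cc's used).
    """
    if not inputs:
        return [], 0
    # Assumption: registers a..d hold the first iteration's inputs after Reset.
    regs = dict(zip("abcd", inputs[0]), x=0, y=0, z=0)
    outputs, cycles, state, i = [], 0, 0, 0
    while i < len(inputs):
        nxt = dict(regs)  # values latched at the clock edge ending this cc
        if state == 0:    # cc 3i: ldx=1, x <- a X b (c1)
            nxt["x"] = regs["a"] * regs["b"]
        elif state == 1:  # cc 3(i+1): mux1=mux2=demux=0, ldy=1, y <- c + d (c2)
            nxt["y"] = regs["c"] + regs["d"]
        else:             # cc 3(i+2): mux1=mux2=demux=1, ldz=1, z <- x + y (c3)
            nxt["z"] = regs["x"] + regs["y"]
            if i + 1 < len(inputs):  # lda=ldb=ldc=ldd=1: load next iteration's inputs
                nxt.update(zip("abcd", inputs[i + 1]))
        regs = nxt        # all enabled registers load at the same clock edge
        cycles += 1
        if state == 2:    # z for iteration i is now available
            outputs.append(regs["z"])
            i += 1
        state = (state + 1) % 3
    return outputs, cycles

print(simulate_nonoverlapped([(2, 3, 4, 5), (1, 1, 1, 1)]))  # ([15, 3], 6)
```

Because the multiplier is idle in two of every three cc's, n iterations take 3n cc's under this schedule.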

  7. Simple HLS Examples (contd)
     2) Mapping to h/w w/ constraints: use only 1 (X) and 1 (+)
     b) Overlapped scheduling
     [Figure: same datapath as in slide 6; schedule chart over cc's: X executes c1(1), c1(2); + executes c2(1), c3(1), c2(2), c3(2)]
     Controller FSM (entered from Reset):
     • ldc=1, ldd=1, mux1=0, mux2=0, demux=0, ldx=1, ldy=1 [y ← c+d, x ← a × b] (c1, c2)
     • lda=1, ldb=1, mux1=1, mux2=1, demux=1, ldz=1 [z ← x+y] (c3)
     • For 4 iterations, the overlapped schedule takes 9 cc's versus 12 cc's for the non-overlapped schedule.
     • Overlapped sched.: time for n iterations = 2n+1 cc's; throughput = n/(2n+1) ≈ 0.5 outputs/cc
     • Non-overlapped sched.: time for n iterations = 3n cc's; throughput = n/3n ≈ 0.33 outputs/cc
     • ⇒ ≈ 50% throughput improvement (0.5 vs. 0.33 outputs/cc), equivalently ≈ 33% fewer cc's for large n, using an overlapped schedule
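The throughput comparison above is simple arithmetic on the two cycle counts stated on the slide (3n and 2n+1). The sketch below just evaluates those formulas with exact fractions; the function names are hypothetical.

```python
from fractions import Fraction

def cycles_nonoverlapped(n):
    return 3 * n          # 3 cc's per iteration

def cycles_overlapped(n):
    return 2 * n + 1      # cycle count stated on the slide

def throughput(n, cycles):
    """Outputs per cc for n iterations under a given cycle-count formula."""
    return Fraction(n, cycles(n))

for n in (4, 100, 10_000):
    ratio = throughput(n, cycles_overlapped) / throughput(n, cycles_nonoverlapped)
    print(n, cycles_nonoverlapped(n), cycles_overlapped(n), float(ratio))
```

The throughput ratio is 3n/(2n+1), which is 4/3 for n = 4 and tends to 3/2 (a 50% improvement) as n grows, while the fraction of cc's saved, (n-1)/3n, tends to 1/3.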

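  8. Simple HLS Examples (contd)
     • Conditional code: if (a > b) then c ← a-b; else c ← b-a;
     • Some DFG control operation nodes:
       - Selector: data inputs in1 and in2 plus a Condition (T/F) input; its single output out is one of in1, in2, chosen by the condition
       - Distributor: one data input in plus a Condition (T/F) input; the value is routed to one of its outputs out1, out2, chosen by the condition
     • Possible DFGs corresponding to the above conditional code: [figure]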
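The two control nodes can be mimicked in software to show how they express the conditional code. This is a sketch under assumed semantics (the selector forwards in1 when the condition is T; the distributor sends its token to out1 when T and to out2 when F, with None meaning "no token"). The two DFG shapes are plausible choices, not necessarily the ones in the slide's figure.

```python
def selector(cond, in1, in2):
    """Selector node (assumed semantics): forwards in1 if cond is T, else in2."""
    return in1 if cond else in2

def distributor(cond, value):
    """Distributor node (assumed semantics): value goes to out1 if cond is T,
    else to out2; the other output carries no token (None)."""
    return (value, None) if cond else (None, value)

def sub_when_ready(x, y):
    """A subtract node fires only when both input tokens are present."""
    return None if x is None or y is None else x - y

def dfg_with_selector(a, b):
    # Both branches are computed; the selector picks one result.
    return selector(a > b, a - b, b - a)

def dfg_with_distributors(a, b):
    cond = a > b
    a_t, a_f = distributor(cond, a)   # route a to the T or F branch
    b_t, b_f = distributor(cond, b)   # route b to the T or F branch
    c_t = sub_when_ready(a_t, b_t)    # T branch: a - b
    c_f = sub_when_ready(b_f, a_f)    # F branch: b - a
    return selector(cond, c_t, c_f)   # merge the result of the branch that fired

print(dfg_with_selector(7, 3), dfg_with_distributors(3, 7))  # 4 4
```

The selector-only DFG always executes both subtractions, whereas the distributor version fires only the branch selected by the condition.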

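  9. Simple HLS Examples (contd)
     • Iterative code: while (a > b) a ← a-b;
     • [Figure: datapath with registers a, b, r1 and final a (load signals lda, ldb, ldr1, ldfinal); a selector implemented as a Mux whose sel input is initialized to F; a distributor implemented as a Demux; the - and > operations use an adder (+) with cin = 1 and the complemented operand b', since b'+1 = 2's complement of b (i.e., -b); the sign of the result, s xor ovfl (1 ⇒ -ve, 0 ⇒ +ve), is sent to the FSM]
     • Scheduling & binding: cc's c1 and c2 [figure]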
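The adder-based subtraction and sign test can be sketched at the bit level. This assumes an 8-bit word and computes x - y as x + y' + cin with cin = 1, taking ovfl as the carry into the MSB xor the carry out of it. How the slide wires the > test is not recoverable, so this sketch assumes a > b is decided by the sign of b - a; all names are hypothetical.

```python
WIDTH = 8                     # assumed word size
MASK = (1 << WIDTH) - 1
LOW = MASK >> 1               # all bits except the sign bit

def to_signed(v):
    """Interpret a WIDTH-bit pattern as a 2's-complement integer."""
    v &= MASK
    return v - (1 << WIDTH) if v >> (WIDTH - 1) else v

def subtract(x, y):
    """x - y on an adder: x + y' + cin with cin = 1 (y' = bitwise complement of y).

    Returns (result bits, negative flag), where negative = s xor ovfl.
    """
    x &= MASK
    y_comp = ~y & MASK                                   # y'
    total = x + y_comp + 1                               # cin = 1
    result = total & MASK
    s = result >> (WIDTH - 1)                            # sign bit of the sum
    carry_out = total >> WIDTH                           # carry out of the MSB
    carry_into_msb = ((x & LOW) + (y_comp & LOW) + 1) >> (WIDTH - 1)
    ovfl = carry_out ^ carry_into_msb                    # 2's-complement overflow
    return result, s ^ ovfl                              # 1 => -ve, 0 => +ve

def greater_than(a, b):
    """Assumed comparator wiring: a > b exactly when b - a is negative."""
    return subtract(b, a)[1] == 1

def repeated_subtract(a, b):
    """while (a > b) a <- a - b; assumes b > 0 so the loop terminates."""
    while greater_than(a, b):
        a = to_signed(subtract(a, b)[0])
    return a

print(repeated_subtract(17, 5))  # 2
```

Using s xor ovfl rather than s alone keeps the sign correct even when the 8-bit difference overflows (e.g., 0 - (-128)).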

  10. Delay Nodes in DFGs A delay node is generally implemented as a register; a delay node thus becomes a state variable.
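To illustrate a delay node becoming a state variable, the sketch below maps the delay in a hypothetical DFG, y[n] = x[n] + x[n-1], to an edge-triggered register. The `Register` class, the DFG, and the reset value of 0 are assumptions for illustration.

```python
class Register:
    """Edge-triggered register: its output q changes only at a clock edge."""
    def __init__(self, init=0):
        self.q = init      # current (visible) value
        self._d = init     # value presented at the input

    def load(self, d):
        self._d = d

    def clock(self):
        self.q = self._d

def run(xs):
    """Hypothetical DFG y[n] = x[n] + x[n-1], with the delay node mapped to a register."""
    delay = Register(init=0)        # state variable holding x[n-1]; assumed reset to 0
    ys = []
    for x in xs:
        ys.append(x + delay.q)      # adder reads the register's current output
        delay.load(x)               # present x[n] at the register input
        delay.clock()               # clock edge: x[n] becomes x[n-1] for the next sample
    return ys

print(run([1, 2, 3, 4]))  # [1, 3, 5, 7]
```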

  11. Delay Nodes in DFGs (contd) [Figure: transformation in the DFG; mapping of the delay node to a register in the architecture]

  12. Detailed HLS Example

  13. Detailed HLS Example (contd): The synthesized architecture. Note: It is not clear how register allocation has been done here; the allocation is sub-optimal.

  14. Detailed HLS Example (contd)

  15. Detailed HLS Example—Register Allocation

  16. Detailed HLS Example—Register Allocation (contd)
     • In the conflict graph (one per FU), there is an edge between 2 variable nodes if their lifetimes overlap (indicating that different registers must be allocated to them)
     • Graph coloring in general is NP-hard
     • The above type of conflict graph is called an interval graph (derived from 1-dimensional intervals)
     • For interval graphs, min. coloring can be solved optimally and efficiently (in linear time once the interval endpoints are sorted), using the left-edge algorithm that we will see later for channel routing
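A minimal sketch of left-edge register allocation on variable lifetimes follows. It assumes lifetimes are half-open cc intervals [start, end), so a register freed at cc t can be reused by a variable born at cc t; the lifetimes themselves are hypothetical. Each pass scans the lifetimes in order of left edge and packs as many non-overlapping ones as possible into one register, and the resulting register count equals the maximum number of simultaneously live variables.

```python
def left_edge(lifetimes):
    """Left-edge register allocation (a sketch).

    lifetimes: dict var -> (start, end), half-open interval [start, end) in cc's.
    Two variables conflict iff their intervals overlap.
    Returns a list of registers, each a list of variables sharing that register.
    """
    remaining = sorted(lifetimes.items(), key=lambda kv: kv[1][0])  # by left edge
    registers = []
    while remaining:
        reg, last_end, rest = [], None, []
        for var, (start, end) in remaining:
            if last_end is None or start >= last_end:  # no overlap with reg's last var
                reg.append(var)
                last_end = end
            else:
                rest.append((var, (start, end)))       # retry in the next register
        registers.append(reg)
        remaining = rest
    return registers

def max_overlap(lifetimes):
    """Max number of simultaneously live variables (lower bound on #registers)."""
    # At equal times, ends (-1) sort before starts (+1), matching half-open intervals.
    points = sorted(p for s, e in lifetimes.values() for p in ((s, 1), (e, -1)))
    live = best = 0
    for _, delta in points:
        live += delta
        best = max(best, live)
    return best

LIFETIMES = {"a": (0, 2), "b": (0, 1), "c": (1, 3), "d": (2, 4), "e": (3, 4)}
print(left_edge(LIFETIMES))    # [['a', 'd'], ['b', 'c', 'e']]
print(max_overlap(LIFETIMES))  # 2
```

This straightforward version rescans the remaining lifetimes once per register; the linear-time bound needs a single sweep over the pre-sorted endpoints instead.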

  17. Detailed HLS Example—Register Allocation (contd)
