
A Timing-Driven Synthesis Approach of a Fast Four-Stage Hybrid Adder in Sum-of-Products

Sabyasachi Das (University of Colorado, Boulder) and Sunil P. Khatri (Texas A&M University)





Presentation Transcript


  1. A Timing-Driven Synthesis Approach of a Fast Four-Stage Hybrid Adder in Sum-of-Products. Sabyasachi Das (University of Colorado, Boulder) and Sunil P. Khatri (Texas A&M University)

  2. What is a Sum-of-Product (SOP)
     • An arithmetic Sum-of-Product (SOP) block consists of an arbitrary number of product terms and sum terms.
     • General form of SOP:
     [Figure: example SOP with inputs a, b, c, d, e, f, where p = a * b, q = c * d, and z = p + q + e + f]

  3. Examples of SOP Blocks
     • Multiplier {assign z = a * b}, found in microprocessors
     • Multiply-Accumulator {assign z = (a * b) + c}, found in cryptographic applications
     • Squarer {assign z = a * a}, found in DSP processors
     • Addition Tree {assign z = a + b + c + d}, found in ALUs and wireless applications
     • Generalized SOP {assign z = (a * b) + (c * d)}, found in FIR and IIR filters

  4. Synthesis of Sum-of-Products
     • Synthesis of Sum-of-Product blocks is done in 3 steps (in the order of data flow):
         • Creation of Partial Products
         • Reduction of Partial Products into 2 operands
         • Computation of Final Sum by adding the 2 operands
     [Figure: data flow from Inputs through Creation of Partial Products, Reduction of Partial Products, and Computation of Final Sum to Output]
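
The three steps can be illustrated with a small bit-level model. This is only a sketch under stated assumptions: the function names are hypothetical, unsigned Python integers stand in for bit vectors, and a plain 3:2 carry-save compressor stands in for whatever reduction tree a synthesis tool would actually build.

```python
def partial_products(a, b, width):
    """Step 1: one shifted copy of `a` per set bit of `b` (unsigned, `width`-bit b)."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(width)]

def carry_save_reduce(rows):
    """Step 2: compress rows three at a time until only 2 operands remain."""
    rows = list(rows)
    while len(rows) > 2:
        x, y, z = rows.pop(), rows.pop(), rows.pop()
        s = x ^ y ^ z                               # bitwise sum, no carry propagation
        c = ((x & y) | (x & z) | (y & z)) << 1      # majority bits, shifted to their weight
        rows += [s, c]                              # x + y + z == s + c
    while len(rows) < 2:
        rows.append(0)
    return rows[0], rows[1]

def sop(a, b, c, d, e, f, width=8):
    """z = a*b + c*d + e + f, evaluated through the three synthesis steps."""
    rows = partial_products(a, b, width) + partial_products(c, d, width) + [e, f]
    op1, op2 = carry_save_reduce(rows)
    return op1 + op2                                # Step 3: the final adder

print(sop(3, 5, 7, 9, 2, 4))
```

The final `op1 + op2` is the single carry-propagating addition in the flow, which is the part the hybrid adder targets.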

  5. Motivation and Problem Statement
     • SOP blocks are widely used and computationally intensive
     • The final adder in an SOP accounts for about 30% to 40% of the delay of the SOP block. This paper focuses on the synthesis of an efficient final adder for an SOP expression
     • Stand-alone adder architectures do not work well in SOP

  6. Stand-alone Adder Architectures
     • Frequently used adder architectures:
         • Ripple-Carry
             • Area-efficient, but slow
             • Timing-efficient if inputs have skewed arrival times
         • Parallel-Prefix architecture (Brent-Kung, Kogge-Stone)
             • Faster architecture
             • Requires more area
         • Carry-Select
             • Large area overhead (often >100%)
             • Better delay if the Cin signal arrives late
     • None of these is well suited to Sum-of-Products
     • Why?

  7. Special Arrival-time Property
     • The 2 operands of the final adder in an SOP exhibit a peculiar arrival-time pattern
     • As a result, traditional monolithic adders do not work well in SOP
         • They are optimized for equal arrival times
     • Hence, hybrid adders that exploit this arrival-time pattern are required
     • It is therefore critical to synthesize an efficient hybrid adder designed specifically for SOP blocks

  8. Proposed 4-Stage Hybrid Adder
     [Figure: the hybrid adder split into SubAdder1 (Ripple-Carry, width $w_1$), SubAdder2 (Kogge-Stone, width $w_2$), SubAdder3 (Carry-Select, width $w_3$), and SubAdder4 (Carry-Select, width $w_4$)]
     • Ripple-Carry architecture near the LSB
     • Fast Kogge-Stone architecture in the middle
     • 2 Carry-Select adders (based on Brent-Kung) near the MSB
     • GOAL: Find $w_1$, $w_2$, $w_3$ and $w_4$ algorithmically

  9. Notations
     • We use the following notations:
         • The bit-width of SubAdder1 (Ripple) is $w_1$ bits
         • The bit-width of SubAdder2 (Kogge-Stone) is $w_2$ bits
         • The bit-width of SubAdder3 (Carry-Select, Brent-Kung) is $w_3$ bits
         • The bit-width of SubAdder4 (Carry-Select, Brent-Kung) is $w_4$ bits
         • $w_1 + w_2 + w_3 + w_4 = n$ (total width of the hybrid adder)
         • $T(a_i)$ = time when input signal $a_i$ is available
         • $T(S_i)$ = time when output signal $S_i$ (sum bit $i$) is available
         • $T(C_i)$ = time when output signal $C_i$ (carry bit $i$) is available

  10. SubAdder1 (Ripple-Carry)
      [Figure: chain of full adders (FA) with inputs $x_0 \dots x_k$ and $y_0 \dots y_k$ and outputs $z_0 \dots z_{k+1}$]
      • Most area-efficient architecture
      • Very slow
      • Timing-efficient if input arrival time is skewed. We use it for a few bits near the LSB (which arrive earliest)
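
For reference, a ripple-carry adder is just a chain of full adders in which each carry-out feeds the next bit. The sketch below is a behavioral illustration only; the function names and the LSB-first list representation are assumptions made for this example.

```python
def full_adder(x, y, c):
    """One FA cell: returns (sum bit, carry-out) for single-bit inputs."""
    s = x ^ y ^ c
    cout = (x & y) | (c & (x ^ y))
    return s, cout

def ripple_carry_add(x_bits, y_bits, cin=0):
    """Add two equal-length bit lists (LSB first); the carry ripples bit by bit."""
    carry = cin
    z = []
    for x, y in zip(x_bits, y_bits):
        s, carry = full_adder(x, y, carry)   # bit i must wait for the carry of bit i-1
        z.append(s)
    return z, carry

# 6 + 3 with 3-bit operands, LSB first
print(ripple_carry_add([0, 1, 1], [1, 1, 0]))
```

Because bit i only needs its carry-in by the time its own inputs arrive, the serial chain costs little when the low-order bits arrive early, which is the skewed-arrival case mentioned above.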

  11. Parallel-Prefix Adders (KS, BK)
      • In a Parallel-Prefix adder, the carry for each bit is computed by an efficient tree structure (using the Generate and Propagate concept).
      • For each bit $i$ of the adder, Generate ($G_i$) indicates whether a carry is generated from that bit
          • $G_i = a_i \cdot b_i$ (AND)
      • For each bit $i$ of the adder, Propagate ($P_i$) indicates whether a carry is propagated through that bit
          • $P_i = a_i \oplus b_i$ (XOR)
      • The Generate and Propagate concept extends to blocks comprising multiple bits, as we discuss next

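  12. Parallel-Prefix Adders (KS, BK)
      [Figure: an "o"-operator combining $(G_{\text{left}}, P_{\text{left}})$ and $(G_{\text{right}}, P_{\text{right}})$ into $(G_{\text{left,right}}, P_{\text{left,right}})$]
      • If two blocks (comprising one or more bits) have the GP value-pairs $(G_{\text{left}}, P_{\text{left}})$ and $(G_{\text{right}}, P_{\text{right}})$, then the combined block has the following GP values (with $\cdot$ as AND and $+$ as OR):
          • $G_{\text{left,right}} = G_{\text{left}} + (P_{\text{left}} \cdot G_{\text{right}})$
          • $P_{\text{left,right}} = P_{\text{left}} \cdot P_{\text{right}}$
      • The above computation is performed by a carry-operator or "o"-operator
      • Once we obtain the carry for each bit, it is trivial to compute the sum output of each bit (XOR and NAND)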
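
The generate/propagate pairs and the "o"-operator can be checked with a few lines of code. This is a minimal sketch, assuming the XOR form of propagate ($P_i = a_i \oplus b_i$), no carry-in, and a simple serial chain of "o"-operators rather than a tree; all function names are hypothetical.

```python
def bit_gp(a, b, n):
    """Per-bit (G_i, P_i) pairs, LSB first: G = a AND b, P = a XOR b (assumed form)."""
    return [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]

def o(left, right):
    """Carry operator: merge the more significant block `left` with `right`."""
    g_l, p_l = left
    g_r, p_r = right
    return (g_l | (p_l & g_r), p_l & p_r)

def add_with_prefix(a, b, n):
    """n-bit addition using block generates as carries (carry-in assumed 0)."""
    gp = bit_gp(a, b, n)
    carries = [0]                       # C_0 = 0
    block = None
    for i in range(n):
        block = gp[i] if block is None else o(gp[i], block)
        carries.append(block[0])        # C_{i+1} is the generate of bits i..0
    total = 0
    for i in range(n):
        total |= (gp[i][1] ^ carries[i]) << i   # S_i = P_i XOR C_i
    return total | (carries[n] << n)            # carry-out becomes bit n

print(add_with_prefix(11, 6, 4))
```

Kogge-Stone and Brent-Kung differ only in how they arrange these "o"-operators into a tree so that the carries are available in a logarithmic number of levels.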

  13. SubAdder2 (Kogge-Stone)
      [Figure: 8-bit Kogge-Stone prefix network taking GP0 to GP7 and producing carries C1 to C8]
      • Kogge-Stone parallel-prefix architecture
      • Delay: $\log_2 n$ levels of "o"-operators
      • Area: $n \log_2 n - n + 1$ "o"-operators
      Kogge and Stone, "A parallel algorithm for the efficient solution of a general class of recurrence equations", IEEE Transactions on Computers, 1973
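
The level and operator counts can be reproduced with a behavioral model of the Kogge-Stone network. This is a sketch under stated assumptions: $n$ is a power of two, there is no carry-in, propagate is taken as XOR, and the function names are hypothetical.

```python
def o(left, right):
    """Carry operator on (G, P) pairs; `left` is the more significant block."""
    return (left[0] | (left[1] & right[0]), left[1] & right[1])

def kogge_stone_carries(a, b, n):
    """Return ([C_1..C_n], levels, operator count) for an n-bit Kogge-Stone network."""
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    levels = ops = 0
    d = 1
    while d < n:
        nxt = list(gp)
        for i in range(d, n):           # every bit i >= d merges with the block d below it
            nxt[i] = o(gp[i], gp[i - d])
            ops += 1
        gp = nxt
        levels += 1
        d *= 2                          # span doubles at each level
    return [g for g, _ in gp], levels, ops

carries, levels, ops = kogge_stone_carries(0b10110101, 0b01101011, 8)
print(levels, ops)   # 3 levels and 17 operators for n = 8
```

For $n = 8$ the model uses $\log_2 8 = 3$ levels and $8 \cdot 3 - 8 + 1 = 17$ operators, matching the expressions on the slide.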

  14. Brent-Kung (BK)
      [Figure: 8-bit Brent-Kung prefix network taking GP0 to GP7 and producing carries C1 to C8]
      • Brent-Kung parallel-prefix architecture
      • Delay: $2 \log_2 n - 2$ levels of "o"-operators
      • Area: $2n - 2 - \log_2 n$ "o"-operators
      Brent and Kung, "A regular layout for parallel adders", IEEE Transactions on Computers, 1982
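
The operator count of Brent-Kung can likewise be checked with a behavioral model. The sketch below assumes $n$ is a power of two, no carry-in, and propagate as XOR; it uses the usual up-sweep/down-sweep formulation of the network, and the function names are hypothetical. It counts operators only and does not attempt to model delay.

```python
def o(left, right):
    """Carry operator on (G, P) pairs; `left` is the more significant block."""
    return (left[0] | (left[1] & right[0]), left[1] & right[1])

def brent_kung_carries(a, b, n):
    """Return ([C_1..C_n], operator count) for an n-bit Brent-Kung network."""
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    ops = 0
    d = 1
    while d < n:                                  # up-sweep: build power-of-two blocks
        for i in range(2 * d - 1, n, 2 * d):
            gp[i] = o(gp[i], gp[i - d])
            ops += 1
        d *= 2
    d = n // 4
    while d >= 1:                                 # down-sweep: fill in the remaining prefixes
        for i in range(3 * d - 1, n, 2 * d):
            gp[i] = o(gp[i], gp[i - d])
            ops += 1
        d //= 2
    return [g for g, _ in gp], ops

carries, ops = brent_kung_carries(0b10110101, 0b01101011, 8)
print(ops)   # 11 operators for n = 8
```

For $n = 8$ this gives $2 \cdot 8 - 2 - 3 = 11$ operators, against 17 for Kogge-Stone, which illustrates why Brent-Kung is the cheaper choice where its extra levels are acceptable.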

  15. SubAdder3 & SubAdder4 (Carry-Select)
      [Figure: carry-select adder; two adders (Adder0 and Adder1) add x and y with constant carry-ins 1'b0 and 1'b1, producing z0 and z1, and a Mux controlled by cin selects the output z]
      • Large area overhead
      • Used as a special case, since Cin arrives late
      • Speed depends on the architecture of the two adders
          • But these adders need not be KS (rather, we use BK)
          • The arrival times of the inputs of SubAdder3 and SubAdder4 are earlier than those for SubAdder2
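
The carry-select idea can be sketched behaviorally as follows. The function name is hypothetical, and Python's `+` stands in for the two internal adders (Brent-Kung in the proposed design).

```python
def carry_select_add(x, y, cin, width):
    """Compute both candidate results up front, then let the late cin pick one."""
    mask = (1 << width) - 1
    z0 = x + y          # result assuming carry-in = 0
    z1 = x + y + 1      # result assuming carry-in = 1
    z = z1 if cin else z0               # the Mux: only this step waits for cin
    return z & mask, z >> width         # (sum bits, carry-out)

print(carry_select_add(0b1111, 0b0001, 0, 4))
print(carry_select_add(0b1111, 0b0001, 1, 4))
```

Both additions can start as soon as x and y arrive, so a late carry-in costs only the multiplexer delay, at the price of duplicating the adder.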

  16. Determination of width of SubAdder1
      • Width of the Ripple adder (SubAdder1)
          • At every bit $i$, compute $T(C_{i+1})$ and check if
              • $T(C_{i+1}) \le T(a_{i+1})$
              • $T(C_{i+1}) \le T(b_{i+1})$
          • If the check passes, $i = i + 1$
          • Else continue checking until 3 consecutive bits fail the check (Hill Climbing)
          • Return the value $i$ as the Ripple adder width
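
One possible reading of this procedure is sketched below. Several details are assumptions made for the example and are not given on the slide: the carry of a full adder is modeled as the latest of its three inputs plus a fixed `fa_carry_delay`, arrival times are plain lists indexed by bit, and the returned width is the last bit position at which the check passed before 3 consecutive failures.

```python
def ripple_width(t_a, t_b, t_cin=0.0, fa_carry_delay=1.0, max_fails=3):
    """Pick the SubAdder1 width from per-bit input arrival times (LSB first)."""
    n = len(t_a)
    width = 0                 # widest ripple section accepted so far
    fails = 0                 # consecutive failed checks
    t_carry = t_cin           # T(C_0)
    for i in range(n - 1):
        # T(C_{i+1}): latest input of bit i plus one full-adder carry delay (assumed model)
        t_carry = max(t_carry, t_a[i], t_b[i]) + fa_carry_delay
        if t_carry <= t_a[i + 1] and t_carry <= t_b[i + 1]:
            width = i + 1     # the carry is ready before the next bit's inputs
            fails = 0
        else:
            fails += 1
            if fails >= max_fails:
                break         # hill climbing gives up after 3 misses in a row
    return width

arrivals = [0, 2, 4, 6, 6, 6, 6, 6]      # skewed near the LSB, flat afterwards
print(ripple_width(arrivals, arrivals))
```

With these arrival times the carry keeps pace with the inputs for the first three bits and then falls behind, so the sketch returns a width of 3.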

  17. Determination of width of SubAdder2
      • Width of the Kogge-Stone adder (SubAdder2)
          • The latest-arriving signals are part of this adder
          • Hence keep this adder wide, while ensuring that this does not result in very narrow Carry-Select adders for SubAdder3 and SubAdder4
      • We determine the width with the following equation:
          • $w_2 = n - w_1$ if $(n - w_1) \le 8$
          • $w_2 = 2^p$, where $p = \lfloor \log_2 (n - w_1) \rfloor$, if $(n - w_1) > 8$
      • Example: if $n = 32$ and $w_1 = 7$, then $p = \lfloor \log_2 25 \rfloor = 4$ and $w_2 = 16$
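
The width rule is short enough to state directly in code. The function name is hypothetical, and the exponent is taken as the floor of $\log_2 (n - w_1)$, which is the reading that reproduces the slide's example ($n = 32$, $w_1 = 7$, $w_2 = 16$).

```python
def kogge_stone_width(n, w1):
    """SubAdder2 width: all remaining bits if few, else the largest power of two that fits."""
    remaining = n - w1
    if remaining <= 8:
        return remaining
    # 2 ** floor(log2(remaining)), computed exactly with integer arithmetic
    return 1 << (remaining.bit_length() - 1)

print(kogge_stone_width(32, 7))
```

Using `bit_length` avoids floating-point `log2`, which can round incorrectly near exact powers of two.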

  18. Delay of the Hybrid Adder
      [Figure: the four SubAdders (Ripple-Carry, Kogge-Stone, Carry-Select, Carry-Select) with widths $w_1$ to $w_4$ and output arrival times $T(S_2)$, $T(S_3)$, $T(S_4)$ and $T(C_4)$ marked]
      • $T_{\text{hybrid}} = \max(T(C_4), T(S_4), T(S_3), T(S_2))$

  19. Determination of widths of SubAdder3 and SubAdder4
      • Width of the two Carry-Select adders
          • Initial width configuration
              • $w_3 = (n - w_1 - w_2)/2$
              • $w_4 = n - w_1 - w_2 - w_3$
          • With this initial configuration, estimate the delay of the overall hybrid adder (based on the previous slide)
          • Use an iterative approach to explore in the appropriate direction (similar to Binary Search) and converge on the smallest-delay configuration
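
The slide does not spell out the iteration, so the following is only one plausible sketch: a step-halving local search standing in for the binary-search-like exploration. The `estimate_delay` callback is hypothetical; in a real flow it would return the overall hybrid-adder delay for a candidate $(w_3, w_4)$ split. Integer division is assumed for the initial $w_3$.

```python
def split_carry_select(n, w1, w2, estimate_delay):
    """Search for the (w3, w4) split with the smallest estimated hybrid-adder delay."""
    remaining = n - w1 - w2
    w3 = remaining // 2                        # initial configuration
    best = estimate_delay(w3, remaining - w3)
    step = max(remaining // 4, 1)
    while step >= 1:
        moved = False
        for cand in (w3 - step, w3 + step):    # probe both directions around the old w3
            if 0 <= cand <= remaining:
                d = estimate_delay(cand, remaining - cand)
                if d < best:
                    best, w3, moved = d, cand, True
        if not moved:
            step //= 2                         # no improvement: narrow the search
    return w3, remaining - w3, best

# Toy delay model for illustration only: fastest when w3 == 3
toy_delay = lambda w3, w4: abs(w3 - 3) + 10
print(split_carry_select(32, 7, 16, toy_delay))
```

Any estimator with the same signature can be plugged in; the search only relies on comparing the returned delays.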

  20. Experimental Setup
      • To test our approach, we used:
          • Adders in several different types of SOP blocks (Multipliers, MAC, generalized SOP and Squarer)
          • Two process technologies (0.13 µm and 0.09 µm)
          • Two commercial library vendors
          • Two different arrival-time constraints
      • We compared the results of our hybrid adder with the adder produced by a commercial datapath synthesis tool.

  21. Results
      • On average, the hybrid adder is 14.31% faster than the result of the commercial synthesis tool (with a 6.62% area penalty)

  22. Summary
      • The hybrid adder consists of 4 SubAdders
          • SubAdder1 has a Ripple-Carry architecture
          • SubAdder2 has a Kogge-Stone architecture
          • SubAdder3 and SubAdder4 have a Carry-Select (based on Brent-Kung) architecture
      • Widths of all SubAdders are computed based on a timing-driven analysis
      • On average, 14.31% faster (with a 6.62% area penalty)

  23. Thank you
