
A CAD Framework for Leakage Power Aware Synthesis of Asynchronous Circuits

A CAD Framework for Leakage Power Aware Synthesis of Asynchronous Circuits. Behnam Ghavami and Hossein Pedram. Presented by Wei-Lun Hung. Outline: Introduction; AsyncTool: Synthesis of QDI Asynchronous Circuits; Statistic Performance Analyzing; Transistor’s Parameters Assignment.


Presentation Transcript


  1. A CAD Framework for Leakage Power Aware Synthesis of Asynchronous Circuits Behnam Ghavami and Hossein Pedram Presented by Wei-Lun Hung

  2. Outline • Introduction • AsyncTool: Synthesis of QDI Asynchronous Circuits • Statistic Performance Analyzing • Transistor’s Parameters Assignment • Experimental Results • Conclusion

  3. Introduction • VLSI design challenges • High power consumption • Synchronization problems • Robustness issues • One possible solution: asynchronous circuits • Low power consumption • No clock skew • Low electromagnetic interference (EMI)

  4. Asynchronous Circuits • Not controlled by a global clock • Eliminate clock skew • Potentially faster • Low power consumption • Low EMI • Rely on handshaking between components • Limitations • Lack of automatic synthesis tools • The performance of asynchronous circuits is hard to evaluate

  5. Transistor’s Parameters • Vth, Vdd, and gate size are the parameters that affect circuit performance • A heuristic search finds a good tradeoff according to the optimization goal • Optimization of synchronous circuits • Multiple-Vth and multiple-Vdd assignment • Ex: gates on critical paths operate at a higher Vdd or a lower Vth • Optimization of asynchronous circuits • A critical path cannot be computed as it is for synchronous circuits • Performance depends on dynamic factors, e.g., the number of tokens
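
The synchronous strategy on this slide puts faster, leakier settings only where timing is critical. It can be sketched with a standard static-timing slack computation. This is a minimal illustration and not the authors' tool. The netlist, the gate delays, and the labels `"low_vth"`/`"high_vth"` are hypothetical, and gates with zero slack are treated as critical.

```python
from typing import Dict, List

def topological_order(fanin: Dict[str, List[str]]) -> List[str]:
    """Depth-first post-order over fan-ins: every gate appears after its drivers."""
    order: List[str] = []
    seen = set()

    def visit(gate: str) -> None:
        if gate in seen:
            return
        seen.add(gate)
        for pred in fanin[gate]:
            visit(pred)
        order.append(gate)

    for gate in fanin:
        visit(gate)
    return order

def critical_gates(fanin: Dict[str, List[str]], delay: Dict[str, float]) -> List[str]:
    """Return the zero-slack gates, i.e., gates lying on a longest (critical) path."""
    order = topological_order(fanin)
    arrival: Dict[str, float] = {}
    for g in order:
        arrival[g] = delay[g] + max((arrival[p] for p in fanin[g]), default=0.0)
    period = max(arrival.values())

    fanout: Dict[str, List[str]] = {g: [] for g in fanin}
    for g, preds in fanin.items():
        for p in preds:
            fanout[p].append(g)

    # Latest time each gate's output may settle without stretching the period.
    required: Dict[str, float] = {}
    for g in reversed(order):
        required[g] = min((required[s] - delay[s] for s in fanout[g]), default=period)
    return [g for g in order if abs(required[g] - arrival[g]) < 1e-9]

def assign_vth(fanin: Dict[str, List[str]], delay: Dict[str, float]) -> Dict[str, str]:
    """Hypothetical policy: low Vth (fast, leaky) on critical gates, high Vth elsewhere."""
    critical = set(critical_gates(fanin, delay))
    return {g: "low_vth" if g in critical else "high_vth" for g in fanin}

fanin = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": ["b"]}
delay = {"a": 1.0, "b": 3.0, "c": 2.0, "d": 1.0, "e": 1.0}
print(assign_vth(fanin, delay))
# {'a': 'high_vth', 'b': 'low_vth', 'c': 'low_vth', 'd': 'low_vth', 'e': 'high_vth'}
```

The path b → c → d determines the period here, so only those three gates receive the leakier setting.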

  6. The Asynchronous Circuit Framework

  7. AsyncTool: Synthesis of QDI Asynchronous Circuits

  8. Asynchronous Circuit Model • Delay-insensitive (DI) • Most robust of all asynchronous circuit delay models • Makes no assumptions about the delays of wires or gates • Every transition on an input to a gate must be seen on its output • Not practical due to its heavy restrictions • Quasi delay-insensitive (QDI) • Like DI, but • Assumes that the delays of the branches of a fork are equal (isochronic forks) • This framework uses Verilog-CSP code

  9. AsyncTool: Synthesis of QDI Asynchronous Circuits • Uses Pre-Charge logic Full-Buffer templates as its predefined templates • Encapsulate all isochronic forks inside the templates • Eliminate the isochronic fork constraint • 3 parts • Arithmetic function extractor (AFE) • Ex: addition, subtraction, comparison, … • Implements them with pre-synthesized standard templates • Decomposition • Template synthesizer (TSYN) • One-bit operators, ex: AND, OR, XOR, … • An expander converts multiple-bit expressions into one-bit operators

  10. Decomposition (1/2) • Decompose the original description into an equivalent collection of smaller interacting processes • Convert to dynamic single assignment form • Projection • Dynamic Single Assignment form
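
Consider straight-line code inside a single loop iteration. There, conversion to single-assignment form amounts to two steps: give every assignment a fresh version of its target variable, and make each read refer to the most recent version. The sketch below assumes that simplified setting. It does not handle conditionals or loop-carried variables, which a real DSA conversion must merge. The `(target, reads)` statement encoding is hypothetical.

```python
from typing import Dict, List, Tuple

# A statement is (assigned variable, variables it reads); operators are omitted.
Stmt = Tuple[str, List[str]]

def to_single_assignment(stmts: List[Stmt]) -> List[Stmt]:
    """Rename targets so that each variable version is assigned exactly once."""
    version: Dict[str, int] = {}
    renamed: List[Stmt] = []
    for target, reads in stmts:
        # Reads see the latest version; never-assigned names (inputs) keep their name.
        new_reads = [f"{v}_{version[v]}" if v in version else v for v in reads]
        version[target] = version.get(target, 0) + 1
        renamed.append((f"{target}_{version[target]}", new_reads))
    return renamed

# x = a + b; x = x * c; y = x + a
program = [("x", ["a", "b"]), ("x", ["x", "c"]), ("y", ["x", "a"])]
print(to_single_assignment(program))
# [('x_1', ['a', 'b']), ('x_2', ['x_1', 'c']), ('y_1', ['x_2', 'a'])]
```

After renaming, each assignment writes a distinct name. This makes it easier to split the program into separate processes connected by channels.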

  11. Decomposition (2/2) • Projection • Break the program up into a concurrent system of smaller modules

  12. Statistic Performance Analyzing

  13. Petri-Nets • Used to model concurrency and synchronization • Represented as a bipartite graph • Defined as a four-tuple N = (P, T, F, m₀) • P: set of places • T: set of transitions • F ⊆ (P × T) ∪ (T × P): flow relation • m₀: initial marking • A marking is a mapping M: P → ℕ
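
The four-tuple definition maps directly onto a small data structure. The sketch below assumes ordinary (unit-weight) arcs and uses the usual firing rule. A transition is enabled when each of its input places holds a token. Firing it removes one token from every input place and adds one to every output place. The class and method names are illustrative.

```python
from typing import Dict, Iterable, List, Set, Tuple

class PetriNet:
    """Ordinary Petri net N = (P, T, F, m0) with unit-weight arcs."""

    def __init__(self, places: Iterable[str], transitions: Iterable[str],
                 flow: Iterable[Tuple[str, str]], m0: Dict[str, int]) -> None:
        self.places: Set[str] = set(places)
        self.transitions: Set[str] = set(transitions)
        self.flow: Set[Tuple[str, str]] = set(flow)  # (place, transition) or (transition, place)
        self.marking: Dict[str, int] = {p: m0.get(p, 0) for p in self.places}

    def preset(self, t: str) -> List[str]:
        return [p for p in self.places if (p, t) in self.flow]

    def postset(self, t: str) -> List[str]:
        return [p for p in self.places if (t, p) in self.flow]

    def enabled(self, t: str) -> bool:
        # Enabled when every input place holds at least one token.
        return all(self.marking[p] >= 1 for p in self.preset(t))

    def fire(self, t: str) -> None:
        if not self.enabled(t):
            raise ValueError(f"transition {t} is not enabled")
        for p in self.preset(t):
            self.marking[p] -= 1
        for p in self.postset(t):
            self.marking[p] += 1

# Two-transition ring p1 -> t1 -> p2 -> t2 -> p1 with one token in p1.
net = PetriNet(["p1", "p2"], ["t1", "t2"],
               [("p1", "t1"), ("t1", "p2"), ("p2", "t2"), ("t2", "p1")], {"p1": 1})
net.fire("t1")
print(sorted(net.marking.items()))  # [('p1', 0), ('p2', 1)]
```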

  14. Petri-Nets Examples

  15. Timed Petri-Net • A Petri-Net in which transitions or places are annotated with delays • For a cycle Cₖ, the cycle metric is • CM(Cₖ) = D(Cₖ)/M(Cₖ) • D(Cₖ) = ∑ dᵢ, ∀i ∈ Cₖ: the total delay on the cycle • M(Cₖ): the number of tokens on the cycle • The performance of a Timed Petri-Net is dictated by its cycle time, the largest cycle metric • CTime = max[CM(Cₖ)], ∀Cₖ ∈ TPN • Can be computed with maximum mean-cycle algorithms
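
The cycle time CTime = max D(Cₖ)/M(Cₖ) can be computed without enumerating cycles. The sketch below uses Lawler's bisection method instead of a dedicated maximum mean-cycle algorithm. For a trial value λ, some cycle has D − λ·M > 0 exactly when λ < CTime, and a Bellman-Ford-style longest-path pass detects such a positive cycle. The sketch makes three assumptions. The net is a marked graph whose places are arcs between transitions. Delays sit on those arcs. Every cycle holds at least one token.

```python
from typing import List, Tuple

# A place of a timed marked graph, as an arc between transitions:
# (source transition, target transition, delay, initial tokens).
Arc = Tuple[int, int, float, int]

def has_positive_cycle(n: int, arcs: List[Arc], lam: float) -> bool:
    """Longest-path relaxation on weights d - lam*m; still improving on pass n => positive cycle."""
    dist = [0.0] * n  # as if a zero-weight source fed every transition
    for _ in range(n):
        updated = False
        for u, v, d, m in arcs:
            w = d - lam * m
            if dist[u] + w > dist[v] + 1e-12:
                dist[v] = dist[u] + w
                updated = True
        if not updated:
            return False
    return True

def cycle_time(n: int, arcs: List[Arc], iters: int = 60) -> float:
    """Largest D(C)/M(C) over all cycles, found by bisection on lam."""
    lo, hi = 0.0, sum(d for _, _, d, _ in arcs)  # valid bound when every cycle has >= 1 token
    for _ in range(iters):
        mid = (lo + hi) / 2
        if has_positive_cycle(n, arcs, mid):
            lo = mid  # some cycle has D - mid*M > 0, so CTime > mid
        else:
            hi = mid
    return hi

# Cycle 0->1->0: D = 5, M = 1 (ratio 5); cycle 1->2->1: D = 8, M = 2 (ratio 4).
arcs = [(0, 1, 2.0, 0), (1, 0, 3.0, 1), (1, 2, 4.0, 1), (2, 1, 4.0, 1)]
print(round(cycle_time(3, arcs), 6))  # 5.0
```

A cycle with no tokens would deadlock the net (infinite cycle time), so the sketch's upper bound relies on liveness.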

  16. Average Case vs. Worst Case

  17. Probabilistic Timed Petri-Net

  18. The Average-Case Performance Metric • For a P-TPN that has only one choice with n outcomes • Convert it into n TPN models • For a P-TPN that has more than one choice • Apply the formula recursively
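
The formula itself did not survive in this transcript. This sketch assumes one plausible reading:

- Each outcome of a choice selects one deterministic TPN.
- The average-case metric is the probability-weighted sum of those TPNs' cycle times.
- Several choices are handled by recursing over them one at a time, treating them as independent.

The `cycle_time_of` callback stands in for a cycle-time computation on the TPN selected by a full set of outcomes. The callback and the example numbers are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Choice = List[Tuple[str, float]]  # (outcome label, probability); probabilities sum to 1

def average_case_metric(choices: List[Choice],
                        cycle_time_of: Callable[[Tuple[str, ...]], float],
                        selected: Tuple[str, ...] = ()) -> float:
    """Resolve one choice per recursion level, weighting by outcome probability."""
    if len(selected) == len(choices):
        # Every choice is resolved: this is an ordinary (deterministic) TPN.
        return cycle_time_of(selected)
    next_choice = choices[len(selected)]
    return sum(p * average_case_metric(choices, cycle_time_of, selected + (outcome,))
               for outcome, p in next_choice)

# One choice with two outcomes: 0.7 * 10 + 0.3 * 20 = 13
single = [[("fast", 0.7), ("slow", 0.3)]]
times: Dict[Tuple[str, ...], float] = {("fast",): 10.0, ("slow",): 20.0}
print(average_case_metric(single, times.__getitem__))
```

Enumerating every outcome combination grows exponentially with the number of choices. This cost is consistent with the scalability concern raised in the final comments.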

  19. Probability Model • Use the static ranges of the circuit's primary inputs to determine the static ranges of internal signals • Independent vs. dependent

  20. Computing the Static Range (1/3) • The set of tagged static ranges of a variable v is denoted TSR(v); each r ∈ TSR(v) is a triple ⟨r.ct, r.vt, r.sr⟩ • r.ct: the conditional tag • r.vt: the variable expression tag • r.sr: the static range

  21. Computing the Static Range (2/3) • Given the static ranges of the right-hand-side variables, the static range of the left-hand-side variable can be computed by combining them with the operation •, where ∘ is a standard operator on data values and • is the corresponding operation on static ranges
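
This sketch assumes that static ranges are closed integer intervals [lo, hi] and that the range operation is ordinary interval arithmetic. Both are assumptions, since the slide's formula is not preserved here. Under them, the propagation rule for an assignment x = a ∘ b looks as follows. The conditional and variable-expression tags of TSR(v) are omitted for brevity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StaticRange:
    """Closed interval [lo, hi] of values a variable can take."""
    lo: int
    hi: int

    def __add__(self, other: "StaticRange") -> "StaticRange":
        return StaticRange(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other: "StaticRange") -> "StaticRange":
        # The smallest result pairs our minimum with their maximum, and vice versa.
        return StaticRange(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other: "StaticRange") -> "StaticRange":
        # Signs may flip, so check all four corner products.
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return StaticRange(min(products), max(products))

# Primary inputs with known static ranges, e.g., a 3-bit and a 2-bit unsigned input.
a = StaticRange(0, 7)
b = StaticRange(0, 3)
print(a + b, a - b, a * b)
# StaticRange(lo=0, hi=10) StaticRange(lo=-3, hi=7) StaticRange(lo=0, hi=21)
```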

  22. Computing the Static Range (3/3) • For a loop

  23. Computing Choice Probabilities(1/3) • For a condition variable CV(X>Y)

  24. Computing Choice Probabilities(2/3)

  25. Computing Choice Probabilities(3/3)
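
Once a distribution is fixed, a choice probability such as P(X > Y) for the condition variable CV(X > Y) can be derived from the static ranges of X and Y. The sketch assumes that X and Y are independent and uniformly distributed over their integer static ranges. The slides' own method, including how dependent variables are treated, may differ.

```python
from fractions import Fraction

def prob_greater(x_lo: int, x_hi: int, y_lo: int, y_hi: int) -> Fraction:
    """P(X > Y) for independent X ~ U{x_lo..x_hi} and Y ~ U{y_lo..y_hi}."""
    favorable = 0
    for x in range(x_lo, x_hi + 1):
        # Count the y values in [y_lo, y_hi] that are strictly less than x.
        favorable += max(0, min(x - 1, y_hi) - y_lo + 1)
    total = (x_hi - x_lo + 1) * (y_hi - y_lo + 1)
    return Fraction(favorable, total)

p_true = prob_greater(0, 3, 0, 3)
print(p_true, 1 - p_true)  # 3/8 5/8
```

The two printed values are the probabilities of the two outcomes of the choice. They would then weight the corresponding TPN models.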

  26. Template’s Parameters Assignment • Vth, Vdd, and gate size are the parameters that affect circuit performance • Dual-Vdd, dual-Vth, and eight sizes for each type of template • Adopts a quantum genetic algorithm
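
With two supply voltages, two threshold voltages, and eight sizes, each template has 2 × 2 × 8 = 32 possible configurations. One template's setting therefore fits in exactly 5 bits. The bit layout below is a hypothetical encoding for illustration: a Vdd bit, a Vth bit, then a 3-bit size index. The paper's actual chromosome layout is not given in these slides.

```python
from typing import List, NamedTuple

BITS_PER_TEMPLATE = 5  # 1 (Vdd) + 1 (Vth) + 3 (size index 0..7)

class TemplateConfig(NamedTuple):
    vdd: str   # "low" or "high"
    vth: str   # "low" or "high"
    size: int  # index into the template's eight sizes

def decode_template(bits: List[int]) -> TemplateConfig:
    if len(bits) != BITS_PER_TEMPLATE:
        raise ValueError("expected 5 bits per template")
    size = bits[2] * 4 + bits[3] * 2 + bits[4]
    return TemplateConfig("high" if bits[0] else "low",
                          "high" if bits[1] else "low",
                          size)

def decode_circuit(bits: List[int]) -> List[TemplateConfig]:
    """Split a whole-circuit bit string into one configuration per template."""
    return [decode_template(bits[i:i + BITS_PER_TEMPLATE])
            for i in range(0, len(bits), BITS_PER_TEMPLATE)]

print(decode_circuit([1, 0, 1, 1, 0, 0, 1, 0, 0, 1]))
# [TemplateConfig(vdd='high', vth='low', size=6), TemplateConfig(vdd='low', vth='high', size=1)]
```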

  27. The Genetic Algorithm • A search technique used in computing to find exact or approximate solutions to optimization problems • Uses techniques inspired by evolutionary biology, such as inheritance, mutation, selection, and crossover • Population: abstract representations of candidate solutions • Repopulation: generate a second-generation population from the selected solutions through genetic operators • Fitness function: decides the survival chance of individuals
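
A minimal classical GA over bit strings shows the pieces named above: a population of candidate solutions, selection driven by a fitness function, and repopulation via crossover and mutation. The sketch uses the following operators, all of which are assumptions along with every parameter value:

- binary tournament selection
- one-point crossover
- bit-flip mutation
- elitism (the best individual is copied unchanged)

The toy fitness (count of 1-bits) stands in for a real circuit cost.

```python
import random
from typing import Callable, List, Tuple

Genome = List[int]

def genetic_algorithm(fitness: Callable[[Genome], float], n_bits: int,
                      pop_size: int = 12, generations: int = 50,
                      p_mut: float = 0.02, seed: int = 0) -> Tuple[Genome, List[float]]:
    """Maximize `fitness` over bit strings of length n_bits (n_bits >= 2)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    history = [fitness(best)]

    def select() -> Genome:
        # Binary tournament: the fitter of two random individuals is chosen.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: keep the current best unchanged
        while len(children) < pop_size:
            p1, p2 = select(), select()
            cut = rng.randint(1, n_bits - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history

best, history = genetic_algorithm(sum, n_bits=20)
print(history[0], history[-1])
```

Because of elitism, the best fitness in `history` never decreases from one generation to the next.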

  28. The Quantum Genetic Algorithm • The circuit configuration information is encoded into qubits • A qubit may be in the ‘1’ or ‘0’ state, or in any superposition of the two, represented as |Ψ⟩ = α|1⟩ + β|0⟩, where |α|² + |β|² = 1; |α|² and |β|² give the probabilities that the qubit will be found in ‘1’ and ‘0’, respectively

  29. The Quantum Genetic Algorithm • The population of n m-qubit individuals at generation g is denoted Q(g) = {q₁ᵍ, q₂ᵍ, …, qₙᵍ}, where each qⱼᵍ is a string of m qubits, i.e., m amplitude pairs (αᵢ, βᵢ), i = 1, …, m
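
This sketch assumes a common way to realize the representation. Each qubit is stored as an angle θ with real amplitudes α = sin θ (for |1⟩) and β = cos θ (for |0⟩), which satisfies |α|² + |β|² = 1 automatically. Observing an individual samples one classical bit string, with each bit equal to 1 with probability |α|². The function names and the equal-superposition start (θ = π/4) are illustrative choices, not taken from the slides.

```python
import math
import random
from typing import List, Tuple

QubitIndividual = List[float]  # one angle theta per qubit

def init_population(n: int, m: int) -> List[QubitIndividual]:
    """Q(0): n individuals of m qubits, each qubit in equal superposition."""
    return [[math.pi / 4] * m for _ in range(n)]

def amplitudes(theta: float) -> Tuple[float, float]:
    """(alpha, beta) for |psi> = alpha|1> + beta|0>."""
    return math.sin(theta), math.cos(theta)

def observe(individual: QubitIndividual, rng: random.Random) -> List[int]:
    """Collapse each qubit: 1 with probability |alpha|^2, otherwise 0."""
    bits = []
    for theta in individual:
        alpha, _ = amplitudes(theta)
        bits.append(1 if rng.random() < alpha ** 2 else 0)
    return bits

rng = random.Random(1)
Q = init_population(n=12, m=10)
P = [observe(q, rng) for q in Q]  # one candidate bit string per individual
```

Each observed bit string can then be decoded into a circuit configuration and scored by the fitness function.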

  30. The Update Procedure
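
The update formula is not preserved in this transcript. In quantum-inspired evolutionary algorithms, the usual update is a rotation gate that turns each qubit's amplitudes toward the corresponding bit of the best solution found so far. The sketch below is a simplified version under an assumed angle encoding, α = sin θ and β = cos θ. When an observed bit disagrees with the best bit, θ moves by a fixed step toward it. The angle is clamped to [0, π/2] so the amplitudes stay non-negative. The step of 0.01π is an assumed value.

```python
import math
from typing import List

def rotate_toward_best(individual: List[float], observed: List[int], best: List[int],
                       delta: float = 0.01 * math.pi) -> List[float]:
    """Rotate each qubit angle toward the best solution's bit where they disagree."""
    updated = []
    for theta, x, b in zip(individual, observed, best):
        if x != b:
            # A larger theta raises P('1') = sin^2(theta); a smaller theta raises P('0').
            theta = theta + delta if b == 1 else theta - delta
            theta = min(max(theta, 0.0), math.pi / 2)
        updated.append(theta)
    return updated

q = [math.pi / 4] * 3
q_next = rotate_toward_best(q, observed=[0, 1, 1], best=[1, 1, 0])
print([round(math.sin(t) ** 2, 3) for t in q_next])  # [0.531, 0.5, 0.469]
```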

  31. The Quantum Genetic Algorithm

  32. Fitness Function • Power • The leakage of a template depends on the number of transistors that are turned off under a given input pattern • Calculate the gate leakage under each input pattern • Area • An individual has little chance of survival if its area is larger than the area constraint • Performance
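
The slide names three ingredients (leakage power, area, performance) without giving the exact formula. This minimal sketch makes two assumptions. First, each template's leakage is averaged over its input patterns, weighted by their probabilities. Second, violating the area or cycle-time constraint adds a large penalty. The per-pattern leakage values, probabilities, penalty weight, and the 1/(1 + cost) mapping are all hypothetical.

```python
from typing import Dict

def expected_leakage(leak_by_pattern: Dict[str, float],
                     pattern_prob: Dict[str, float]) -> float:
    """Leakage of one template averaged over its input patterns."""
    return sum(leak_by_pattern[p] * pattern_prob.get(p, 0.0) for p in leak_by_pattern)

def fitness(total_leakage: float, area: float, area_limit: float,
            cycle_time: float, cycle_time_limit: float, penalty: float = 1e3) -> float:
    """Higher is better; constraint violations make survival unlikely."""
    cost = total_leakage
    cost += penalty * max(0.0, area - area_limit)
    cost += penalty * max(0.0, cycle_time - cycle_time_limit)
    return 1.0 / (1.0 + cost)

# Hypothetical 2-input template: illustrative leakage per input pattern (arbitrary units).
leak = {"00": 4.0, "01": 2.0, "10": 2.0, "11": 1.0}
prob = {"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}
lk = expected_leakage(leak, prob)  # 2.25
print(fitness(lk, area=90.0, area_limit=100.0, cycle_time=9.5, cycle_time_limit=10.0))
```

In a full run, `total_leakage` would be summed over all templates, and `cycle_time` would come from the average-case performance estimate.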

  33. Control Parameters • Population size • For a small population, the genetic diversity may not increase for many generations • A large population may increase the computing time but takes fewer generations to find the best solutions • Small populations of 10 to 15 individuals perform very well • Termination condition • Stop when the power reduction over the last 200 generations is less than 0.0005%
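
The termination rule can be checked after each generation from the history of best-so-far total power. The sketch assumes that the reduction is measured relative to the power recorded 200 generations earlier. That reading of the rule and the function name are assumptions.

```python
from typing import List

def should_stop(best_power: List[float], window: int = 200,
                threshold_pct: float = 0.0005) -> bool:
    """True when best power dropped by less than threshold_pct % over the last `window` generations."""
    if len(best_power) <= window:
        return False  # not enough generations yet
    before, now = best_power[-window - 1], best_power[-1]
    reduction_pct = (before - now) / before * 100.0
    return reduction_pct < threshold_pct

print(should_stop([100.0] * 201), should_stop([100.0] + [90.0] * 200))  # True False
```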

  34. Performance Estimation Results

  35. Power Optimization Results

  36. Power Optimization Results

  37. Different Technique Comparisons

  38. Comparison to worst-case optimized circuits

  39. Conclusion • An efficient design framework for reducing total power consumption while maintaining high circuit performance • Uses a Probabilistic Timed Petri-Net model to capture the dynamic behavior of the system • The proposed method for assigning threshold voltage and supply voltage and for sizing templates is based on a quantum genetic algorithm • 5X to 7X power savings with a 2.5% performance penalty

  40. Comments • Not scalable? • The static ranges of the circuit's inputs must be specified • The connection between synthesis and parameter assignment is weak • Experimental results are questionable • Many typos
