
Clockless Logic: Asynchronous Pipelines

Clockless Logic: Asynchronous Pipelines. MOUSETRAP: Ultra-High-Speed Transition-Signaling Asynchronous Pipelines. Singh and Nowick, Intl. Conf. on Computer Design (ICCD), September 2001. MOUSETRAP Pipelines: simple asynchronous implementation style, uses… transparent latches


Presentation Transcript


  1. Clockless Logic: Asynchronous Pipelines
     MOUSETRAP: Ultra-High-Speed Transition-Signaling Asynchronous Pipelines
     Singh and Nowick, Intl. Conf. on Computer Design (ICCD), September 2001

  2. MOUSETRAP Pipelines
     Simple asynchronous implementation style, uses…
     • transparent latches
     • simple control: 1 gate/pipeline stage
     Target datapath = static logic blocks
     “MOUSETRAP”: uses a “capture protocol”. Latches…
     • are normally transparent: before new data arrives
     • become opaque: after data arrives (“capture” data)
     Control signaling: transition-signaling = 2-phase
     • simple protocol: req/ack = only 2 events per handshake (not 4)
     • no “return-to-zero”
     • each transition (up/down) signals a distinct operation
     Our goal: very fast cycle time
     • simple inter-stage communication
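The "2 events per handshake (not 4)" point on the slide above can be made concrete with a small sketch. The function name `handshake_events` and the event encoding are illustrative assumptions, not part of the presentation; the code only counts req/ack wire events for 2-phase (transition) signaling versus a 4-phase (return-to-zero) protocol.

```python
def handshake_events(num_items, phases):
    """Return the (wire, level) events needed to transfer num_items data items.

    phases=2: transition signaling, one toggle on req and one on ack per item.
    phases=4: return-to-zero, req up, ack up, req down, ack down per item.
    (Hypothetical helper for illustration only.)
    """
    req = ack = 0
    events = []
    for _ in range(num_items):
        if phases == 2:
            req ^= 1                      # any transition (up or down) is a request
            events.append(("req", req))
            ack ^= 1                      # any transition is an acknowledge
            events.append(("ack", ack))
        elif phases == 4:
            for level in (1, 0):          # active phase, then return-to-zero phase
                req = level
                events.append(("req", req))
                ack = level
                events.append(("ack", ack))
        else:
            raise ValueError("phases must be 2 or 4")
    return events


if __name__ == "__main__":
    print(len(handshake_events(3, 2)))    # events for 3 items, 2-phase
    print(len(handshake_events(3, 4)))    # events for 3 items, 4-phase
```

In the 2-phase case the wires are left at whatever level the last transition produced, which is why each transition direction can carry a distinct operation.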

  3. MOUSETRAP: A Basic FIFO
     Stages communicate using transition-signaling: 1 transition per data item!
     [Figure: Stages N-1, N and N+1, each a data latch (Data in to Data out) with a latch controller driving En; signals reqN, doneN, reqN+1, ackN-1, ackN. Animation: 1st and 2nd data items flowing through the pipeline.]

  4. MOUSETRAP: A Basic FIFO (contd.)
     Latch is disabled when current stage is “done”
     Latch is re-enabled when next stage is “done”
     Latch controller (XNOR) acts as “phase converter”:
     • 2 distinct transitions (up or down) → pulsed latch enable
     • 2 transitions per latch cycle
     [Figure: Stages N-1, N and N+1, each a data latch with a latch controller driving En; signals reqN, doneN, reqN+1, ackN-1, ackN.]
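The phase-converter behaviour described above can be sketched in a few lines. This is a zero-delay behavioural model under stated assumptions: the function names are hypothetical, `done` is stage N's own done signal, `ack` is the done signal of stage N+1, and both start at 0 so the latch starts transparent.

```python
def xnor(a, b):
    """XNOR of two 1-bit values: 1 when the inputs are equal."""
    return 1 if a == b else 0


def simulate_latch_enable(events, done=0, ack=0):
    """Return the latch-enable (En) trace of one stage's controller.

    events: sequence of "done" / "ack" strings, each meaning one transition
            (toggle) on that wire. Gate delays are ignored (assumption).
    """
    trace = [xnor(done, ack)]             # initial En: 1 = latch transparent
    for ev in events:
        if ev == "done":
            done ^= 1                     # stage N captured a data item
        elif ev == "ack":
            ack ^= 1                      # stage N+1 captured that item
        else:
            raise ValueError(f"unknown event: {ev}")
        trace.append(xnor(done, ack))
    return trace


if __name__ == "__main__":
    # Two data items: each "done" closes the latch, each "ack" reopens it.
    print(simulate_latch_enable(["done", "ack", "done", "ack"]))
```

Each data item produces one toggle on `done` and one on `ack`, and the XNOR turns that pair of transitions into a single low pulse on En regardless of whether the toggles were rising or falling.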

  5. MOUSETRAP: FIFO Cycle Time
     Cycle Time = [N computes] + [N+1 computes] + [N re-enabled to compute]
     Fast self-loop: N disables itself
     [Figure: Stages N-1, N and N+1 with latch controllers; the paths making up one cycle are numbered 1, 2 and 3 on signals reqN, doneN, reqN+1, ackN and En.]
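As a hedged sketch, the three terms of the cycle time on the slide above can be written symbolically. The symbols are assumptions introduced here, not taken from the transcript: $t_{\text{Latch}}$ is the delay through one transparent latch (taken as the time for a stage of the plain FIFO to "compute"), and $t_{\text{XNOR}\uparrow}$ is the rising delay of the latch controller that re-enables stage N.

```latex
% Assumed symbols: t_Latch = latch propagation delay,
% t_XNORup = rising delay of the XNOR latch controller.
\begin{align*}
T_{\text{FIFO}}
  &= \underbrace{t_{\text{Latch}}}_{N \text{ computes}}
   + \underbrace{t_{\text{Latch}}}_{N+1 \text{ computes}}
   + \underbrace{t_{\text{XNOR}\uparrow}}_{N \text{ re-enabled}} \\
  &= 2\,t_{\text{Latch}} + t_{\text{XNOR}\uparrow}
\end{align*}
```

Under this reading, the self-loop that disables stage N runs in parallel with stage N+1 computing, so it does not add a term of its own.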

  6. Detailed Controller Operation
     • One pulse per data item flowing through:
       • down transition: caused by “done” of N
       • up transition: caused by “done” of N+1
     • No minimum pulse width constraint!
       • simply, down transition should start “early enough”
       • can be “negative width” (no pulse!)
     [Figure: Stage N’s latch controller, with inputs “done” from N and “ack” from N+1, and output to the latch.]

  7. MOUSETRAP: Pipeline With Logic
     Simple extension to FIFO: insert logic block + matching delay in each stage
     Logic blocks: can use standard single-rail (non-hazard-free)
     “Bundled data” requirement:
     • each “req” must arrive after data inputs valid and stable
     [Figure: Stages N-1, N and N+1, each with a data latch, latch controller, logic block and matching delay element on the request path; signals reqN, doneN, reqN+1, ackN-1, ackN.]
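The bundled-data requirement above amounts to a per-stage inequality: the matching delay on the request path must be at least the worst-case delay of that stage's logic block. A minimal sketch of such a check follows; the function name, the picosecond units and the optional safety margin are assumptions for illustration.

```python
def bundled_data_violations(stages, margin_ps=0.0):
    """Return indices of stages whose req could arrive before data is stable.

    stages: list of (worst_case_logic_delay_ps, matched_delay_ps) tuples.
    margin_ps: extra safety margin required on top of the logic delay
               (hypothetical parameter, not from the slides).
    """
    violations = []
    for index, (logic_ps, matched_ps) in enumerate(stages):
        # "req" must arrive after the data inputs are valid and stable.
        if matched_ps < logic_ps + margin_ps:
            violations.append(index)
    return violations


if __name__ == "__main__":
    pipeline = [(120.0, 150.0), (200.0, 190.0), (80.0, 80.0)]
    print(bundled_data_violations(pipeline))                  # no margin
    print(bundled_data_violations(pipeline, margin_ps=10.0))  # with 10 ps margin
```

Because the constraint is one-sided, making a matching delay longer never breaks correctness; it only costs latency.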

  8. Special Case: Using “Clocked Logic”
     Clocked-CMOS = C2MOS: eliminate explicit latches
     • latch folded into logic itself
     [Figure: a general C2MOS gate (pull-up and pull-down networks driven by the logic inputs, gated by the En signals, with a “keeper” on the logic output) and a C2MOS AND-gate with inputs A and B.]

  9. Gate-Level MOUSETRAP: with C2MOS
     Use C2MOS: eliminate explicit latches
     New control optimization = “Dual-Rail XNOR”
     • eliminate 2 inverters from critical path
     [Figure: Stages N-1, N and N+1 built from C2MOS logic, each with a pair of bit latches and a latch controller; control signals are 2 wires each: (ack, ack’), (done, done’), (En, En’); signals reqN, doneN, reqN+1, ackN-1, ackN.]

  10. Complex Pipelining: Forks & Joins
      Problems with linear pipelining:
      • handles limited applications; real systems are more complex
      Non-linear pipelining: has forks/joins
      Contribution: introduce efficient circuit structures
      • Forks: distribute data + control to multiple destinations
      • Joins: merge data + control from multiple sources
      • Enabling technology for building complex async systems
      [Figure: a fork and a join.]

  11. Forks and Joins: Implementation
      Join: merge multiple requests
      Fork: merge multiple acknowledges
      [Figure: Join: req1 and req2 merged by a C-element into the req of Stage N. Fork: ack1 and ack2 merged by a C-element into the ack of Stage N.]
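The "C" blocks in the figure are C-elements: the output changes only when both inputs agree, and otherwise holds its previous value. A behavioural sketch (class name and interface are illustrative assumptions) shows how this merges two transition-signaled requests into one.

```python
class CElement:
    """Behavioural model of a two-input Muller C-element."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        """Apply new input levels and return the output level."""
        if a == b:
            self.out = a       # both inputs agree: output follows them
        return self.out        # inputs differ: output holds its old value


if __name__ == "__main__":
    join = CElement()
    # req1 toggles first; the merged req waits until req2 has toggled too.
    for req1, req2 in [(1, 0), (1, 1), (0, 1), (0, 0)]:
        print(req1, req2, join.update(req1, req2))
```

The same element serves the fork case, where the merged acknowledge toggles only after every destination stage has acknowledged.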

  12. Related Protocols
      Day/Woods (’97), and Charlie Boxes (’00)
      Similarities: all use…
      • transition signaling for handshakes
      • phase conversion for latch signals
      Differences: MOUSETRAP has…
      • higher throughput
      • ability to handle fork/join datapaths
      • more aggressive timing, less insensitivity to delays

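  13. Performance, Timing and Optzn.
      MOUSETRAP with Logic:
      • Stage Latency = [formula not preserved in transcript]
      • Cycle Time = [formula not preserved in transcript]
      MOUSETRAP Using C2MOS Gates:
      • Stage Latency = [formula not preserved in transcript]
      • Cycle Time = [formula not preserved in transcript]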

  14. Timing Analysis
      Main timing constraint: avoid “data overrun”
      Data must be safely “captured” by Stage N before new inputs arrive from Stage N-1
      • simple 1-sided timing constraint: fast latch disable
      • Stage N’s “self-loop” faster than entire path through previous stage
      [Figure: Stages N-1 and N, each with a data latch, latch controller, logic block and delay element; signals reqN, doneN, reqN+1, ackN-1, ackN.]
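The data-overrun constraint above can be sketched as a one-sided comparison of two paths that both start when stage N signals "done". The breakdown of each path into named delays is an assumption made here for illustration (the slide only states that the self-loop must be faster than the path through the previous stage), and all parameter names are hypothetical.

```python
def data_overrun_safe(t_xnor_down_n, t_hold_n,
                      t_xnor_up_prev, t_latch_prev, t_logic_prev):
    """Check that stage N closes its latch before new data can reach it.

    Self-loop of stage N (assumed): its controller falls, then the latch
    needs its hold time.
    Path through stage N-1 (assumed): its controller rises, new data passes
    its latch, then its logic block, and arrives at stage N's latch inputs.
    All delays must use the same time unit.
    """
    self_loop = t_xnor_down_n + t_hold_n
    previous_stage_path = t_xnor_up_prev + t_latch_prev + t_logic_prev
    return self_loop < previous_stage_path


if __name__ == "__main__":
    # Illustrative numbers in picoseconds, not measurements from the paper.
    print(data_overrun_safe(60, 20, 60, 50, 100))  # ample slack
    print(data_overrun_safe(60, 80, 60, 50, 0))    # plain FIFO, slow hold
```

With no logic block in the previous stage (the plain FIFO), the slack shrinks to roughly one latch delay, which is where careful transistor sizing matters most.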

  15. Timing Optzn: Reducing Cycle Time
      Analytical Cycle Time = [formula not preserved in transcript]
      Goal: shorten [term not preserved in transcript] (in steady-state operation)
      Steady-state = no undue pipeline congestion
      Observation:
      • XNOR switches twice per data item
      • only 2nd (up) transition critical for performance
      Solution: reduce XNOR output swing
      • degrade “slew” for start of pulse
      • allows quick pulse completion: faster rise time
      Still safe when congested: pulse starts on time
      • pulse maintained until congestion clears

  16. Timing Optzn (contd.)
      [Figure: “unoptimized” vs. “optimized” XNOR output waveforms, from N “done” (N’s latch disabled) to N+1 “done” (N’s latch re-enabled).]
      Optimized output: latch only partly disabled; recovers quicker! (no pulse width requirement)

  17. Comparison with Wave Pipelining
      Two scenarios:
      • Steady state:
        • both MOUSETRAP and wave pipelines act like transparent “flow through” combinational pipelines
      • Congestion:
        • right environment stalls: each MOUSETRAP stage safely captures data
        • internal stage slow: MOUSETRAP stages to its left safely capture data
        • congestion properly handled in MOUSETRAP
      Conclusion: MOUSETRAP has potential of…
      • speed of wave pipelining
      • greater robustness and flexibility

  18. Timing Issues: Handling Wide Datapaths
      Buffers inserted to amplify latch signals (En)
      Reducing impact of buffers:
      • control uses unbuffered signals → buffer delay off of critical path!
      • datapath skewed w.r.t. control
      Timing assumption: buffer delays roughly equal
      [Figure: two versions of Stages N-1 and N with signals En, reqN, doneN, reqN+1.]

  19. [Figure: Stages N-1 and N with signals En, reqN, doneN, reqN+1.]

  20. Preliminary Results
      Pre-layout simulations of FIFOs:
      • do not account for wire delays, parasitics, etc.
      • careful transistor sizing/verification of timing constraints

  21. Conclusions and Future Work
      Introduced a new asynchronous pipeline style:
      • Static logic blocks
      • Simple latches and control:
        • transparent latches, or C2MOS gates
        • single gate control = 1 XNOR gate/stage
      • Highly concurrent event-driven protocol
      • High throughputs obtained:
        • 3.5 GHz in 0.25 µm, 1.9 GHz in 0.6 µm
        • comparable to wave pipelines, yet more robust and with less design effort
      • Correctly handles forks and joins in datapaths
      • Timing constraints: local, 1-sided, easily met
      Ongoing work:
      • more realistic performance measurement (incl. parasitics)
      • layout and fabrication
