Synchronization Ideas Charles E. Dike Intel Corporation
Introduction • Tutorial • Share some ideas about synchronization and metastability • Introduce NEW, IMPROVED theory on metastability • Charles Dike (cdike@ichips.intel.com)
Why and where synchronize? • Reduce latency between independent clock domains. • Asynchronous domain to synchronous clock. • Synchronous clock to an independent synchronous clock. • Benefit: higher performance in critical circuits. [Figure: clock domains labeled Synchronous Clock at 1.5 GHz, Synchronous Clock at 3.0 GHz, Asynchronous Circuit, Pausable Clock at 1.8 GHz, Synchronous Clock at 1.5 GHz]
Design Direction • 80s: towards 100 MHz • 90s: towards 1 GHz • 00s: multi-GHz [Figure: blocks labeled MEM, FPU, and ALU; label: VALUE ADDED]
Chip Area Networks Late 00s multi-GHz
I believe…. • We must be able to synchronize all domains to a PLL controlled clock • Interconnect on chip will be asynchronous (GALS) • We need to minimize latency • There will be two basic synchronizer uses - near neighbor and the chip net
Topics of Discussion • Generic synchronizer of the type used in the TeraFlops computer • Simple synchronizer of the type used in StrongArm • The Myrinet pipeline synchronization scheme • Latest understanding of metastability
Generic Synchronizer • Handles self-timed to synchronous interfaces and vice versa • Supports synchronous to synchronous interfaces • Can handle streaming data • Adaptable to any speed range • Possibly used over the chip network
Two flop synch [Figure: two D flip-flops, #1 and #2; labels: VALID, CLK]
Single latch synch [Figure: D flip-flops and an S-R latch; labels: SENDER CLOCK, RECEIVER CLOCK, CLK1, CLK2, REQ, ACK, Write Valid, Read Valid, LATCH OUTPUT]
Multi latch synch [Figure: D flip-flops and two S-R latches; labels: REQ, ACK, CLK1, CLK2, Write Valid, Read Valid]
General Case [Figure: bit patterns for WRITE POINTER, STATUS REGISTER, and READ POINTER; labels: SYNCHRONIZERS, SYNC, EMPTY, FULL, PADDING, LATENCY, EN, Write Clock, Read Clock, Write Enable]
Empty case [Figure: D flip-flops with reset (R) and enable (EN) inputs; labels: WRITE POINTER, READ POINTER, STATUS REGISTER, SYNCHRONIZER, EMPTY, Write Pointer a, Write Pointer b, Read Pointer a, Read Pointer b, Write Enable, Write Clock, Read Clock]
General Case [Figure: bit patterns for WRITE POINTER, STATUS REGISTER, and READ POINTER; labels: SYNCHRONIZERS, SYNC, EMPTY, FULL, PADDING, LATENCY, EN, Write Clock, Read Clock, Write Enable]
Topics of Discussion • Generic synchronizer of the type used in the TeraFlops computer • Simple synchronizer of the type used in StrongArm microprocessor • The Myrinet pipeline synchronization scheme • Latest understanding of metastability
Simple Synchronizer • Constrained by frequency ratio • Supports synchronous to synchronous interfaces • Does it support asynch to synch? Yes, with restrictions. • Possibly used in local neighbor synchronizers
Simple Synchronizer [Figure: four D flip-flops; labels: A, A1, A2, A3, SYNC, w, x, y, z, SLOW CLK, FAST CLK, Divide by 2, MI*] MI* = Metastable Immune
timing1 [Figure: simple synchronizer schematic (A, A1, A2, A3, SYNC, SLOW, FAST, Divide by 2, MI*) with waveforms FAST CLOCK (edges 1 to 6), SLOW CLOCK, A, A1, A2, A3, SYNC]
timing2 [Figure: simple synchronizer schematic (A, A1, A2, A3, SYNC, SLOW, FAST, Divide by 2, MI*) with waveforms FAST CLOCK (edges 1 to 6), SLOW CLOCK, SYNC, CHEATER CLOCK]
timing3 [Figure: simple synchronizer schematic (A, A1, A2, A3, SYNC, SLOW, FAST, Divide by 2, MI*) with waveforms FAST CLOCK (edges 1 to 6), SLOW CLOCK, SYNC, CHEATER CLOCK]
timing4 [Figure: two simple synchronizer schematics (A, A1, A2, A3, SYNC, SLOW, FAST, Divide by 2, MI*) with waveforms FAST CLOCK (edges 1 to 6), SLOW CLOCK, SYNC, SLOW CLOCK#, SYNC]
transfers [Figure: SLOW TO FAST TRANSFER and FAST TO SLOW TRANSFER circuits built from D flip-flops; labels: SYNC, FAST CLOCK, SLOW CLOCK; waveforms FAST CLOCK (edges 1 to 6), SLOW CLOCK, SYNC, CHEATER CLOCK]
Topics of Discussion • Generic synchronizer of the type used in the TeraFlops computer • Simple synchronizer of the type used in StrongArm • The Myrinet pipeline synchronization scheme • Latest understanding of metastability
Pipeline Synchronizer • Supports synchronous to synchronous interfaces • Supports asynch to synch and vice-versa • Possibly used in local neighbor synchronizers • Essentially a distributed fifo and synchronizer
Pipeline Synchronizer [Figure: three stages, each marked S, clocked by φ0, φ1, φ0; stage ports: Ri, Ro, Di, Do, Ai, Ao]
ME element [Figure: mutual exclusion (ME) element; labels: R0, R1, A0, A1, S, φ0, X, REQ]
Fifo element [Figure: two blocks marked C; labels: Ri, Ro, Di, Do, Ai, Ao, Data]
Async to sync [Figure: three stages, each marked S, clocked by φ0, φ1, φ0; stage ports: Ri, Ro, Di, Do, Ai, Ao; clock inputs φ0 and φ1; sides labeled Asynchronous and Synchronous]
Sync to async [Figure: three stages, each marked S, clocked by φ0, φ1, φ0; stage ports: Ri, Ro, Di, Do, Ai, Ao; clock inputs φ0 and φ1; sides labeled Asynchronous and Synchronous]
Points to ponder #1 • All synchronizing interfaces have one thing in common - a latching element that holds data while metastabilities are being resolved. • There is no way to avoid the latency which is required to resolve metastabilities. • To minimize latency the latching element characteristics can be improved. • We will be required to understand and use this knowledge. This is the future of digital design.
Topics of Discussion • Generic synchronizer of the type used in the TeraFlops computer • Simple synchronizer of the type used in StrongArm • The Myrinet pipeline synchronization scheme • Latest understanding of metastability
Role of the Synchronizing Flop • Reorients incoming information to a clock edge • Its performance determines system failure rate or latency
Real Life • There is no magic bullet • There is a lot of misinformation about metastability in circulation • To date, many circuits have been over-designed through planning and luck • Whenever a circuit fails because its frequency is too high, the ultimate cause of failure is metastability • There is no way to synchronize a signal faster than about the time it takes a signal to pass through six static gates
Metastability is.... [Figure: labels: NODE A, NODE B, OUT, SET, RESET]
Technical terms • Tw (window size): likelihood of entering a metastable state, in units of time • Tau ($\tau$): rate at which metastability resolves, in units of time • MTBF (Mean Time Between Failures): $\mathrm{MTBF} = \dfrac{e^{t/\tau}}{T_w f_d f_c}$ • Thermal noise: $\langle V_n^2 \rangle = 4kT/C$
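As a rough illustration of the basic MTBF formula above (the exponential of t/τ divided by the product of Tw, fd, and fc), the sketch below evaluates it for a few resolution times. It assumes the usual reading of the symbols: t is the time allowed for resolution, fd the data rate, and fc the clock frequency. The function name and every numeric value are hypothetical and are not taken from the talk.

```python
import math


def mtbf_seconds(t_resolve, tau, t_w, f_data, f_clock):
    """Basic-theory MTBF in seconds: exp(t / tau) / (Tw * fd * fc).

    t_resolve : time allowed for metastability to resolve (s)
    tau       : resolution time constant of the latch (s)
    t_w       : metastability window (s)
    f_data    : data transition rate (Hz)
    f_clock   : sampling clock frequency (Hz)
    """
    return math.exp(t_resolve / tau) / (t_w * f_data * f_clock)


if __name__ == "__main__":
    # Hypothetical operating point, chosen only to show the trend.
    tau = 20e-12      # 20 ps
    t_w = 10e-12      # 10 ps window
    f_clock = 1.5e9   # 1.5 GHz
    f_data = 100e6    # 100 MHz

    for t_resolve in (200e-12, 400e-12, 600e-12):
        result = mtbf_seconds(t_resolve, tau, t_w, f_data, f_clock)
        print(f"t = {t_resolve * 1e12:.0f} ps -> MTBF = {result:.3e} s")
```

Each additional τ of resolution time multiplies the MTBF by e, which is why small changes in τ or in the time allowed dominate the result.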
Simple jamb latch [Figure: latch schematic with labels NODE A, NODE B, OUT, DATA, CLOCK, RESET; plot of propagation delay versus Δ, the time of data after clock]
Simple jamb latch [Figure: latch schematic with labels NODE A, NODE B, OUT, DATA, CLOCK, RESET; plot of propagation delay versus Δ, the time of data after clock; label: ~RC time constant]
Rough Histogram [Figure: propagation delay versus Δ, the time of data after clock, shown on a linear scale and on a log scale; labels: Tw, "The slope is τ"] $\mathrm{MTBF} = \dfrac{e^{t/\tau}}{T_w f_d f_c}$
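The basic formula on this slide can be rearranged to give the resolution time needed for a target MTBF: t = τ · ln(MTBF · Tw · fd · fc). The sketch below is only an illustration of that rearrangement, under the same basic-theory assumptions; the function name and the sample values are hypothetical.

```python
import math


def required_resolve_time(target_mtbf, tau, t_w, f_data, f_clock):
    """Solve MTBF = exp(t / tau) / (Tw * fd * fc) for t, in seconds."""
    return tau * math.log(target_mtbf * t_w * f_data * f_clock)


if __name__ == "__main__":
    # Hypothetical values, not measurements from the talk.
    tau = 20e-12          # 20 ps
    t_w = 10e-12          # 10 ps window
    f_clock = 1.5e9       # 1.5 GHz
    f_data = 100e6        # 100 MHz
    target = 3.15e9       # about 100 years, in seconds

    t_needed = required_resolve_time(target, tau, t_w, f_data, f_clock)
    print(f"resolution time needed: {t_needed * 1e12:.0f} ps")
```

Because the time grows only with the logarithm of the target, a tenfold increase in MTBF costs an extra τ · ln(10) of latency.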
Why is the theory a problem? $\mathrm{MTBF} = \dfrac{e^{t/\tau}}{T_w f_d f_c}$ • It assumes a uniform distribution of data about the clock • What happens when data always violates the setup/hold window? • It is not detailed enough • Doesn’t consider a deterministic region • Doesn’t account for thermal noise • People tend to extrapolate the theory improperly
Overview of refined theory • Not everything past a normal propagation delay is a metastable event • The Tw window can’t be improved by input edge rates • Tw has a complex relationship to τ based on load • The MTBF formula needs to be modified due to non-uniform distribution of data about the clock input
Test case [Figure: test setup with PULSE GENERATOR #1, PULSE GENERATOR #2, two DELAY blocks, a D flip-flop (D, Q, R, PC), and a TEK 11801-B OSCILLOSCOPE with TRIGGER and INPUT connections]
Measuring real data [Figure: measured traces; label: advancing time]
Histogram [Figure: histogram versus time; labels: Inflection point, 0.6 mV/0.1 ps]
Histogram [Figure: histogram versus time; labels: Inflection point, 0.6 mV/0.1 ps]
Measured versus Basic [Figure: two plots of propagation delay versus Δ, the time of data after clock (log scale); labels: Tw, "The slope is τ", 0.6 mV/0.1 ps] $\mathrm{MTBF} = \dfrac{e^{t/\tau}}{T_w f_d f_c}$
τ Simulated.... [Figure: simulation circuit; labels: Battery, Voltage Controlled Switch, R1 = 100 Ω, R1 = 100 MΩ]
Tau Simulated 2 [Figure, first plot: latch outputs at nodes 1 and 2, 0.0 to 1.5 volts over 1.0 to 1.4 ns. Second plot: semilog difference between latch outputs, $10^{-6}$ to $10^{0}$ volts over 1.0 to 1.4 ns, with points t1 and t2 marked] $\tau = \dfrac{|t_1 - t_2|}{\ln(V_2/V_1)}$, where V1 = voltage at time t1 and V2 = voltage at time t2
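The two-point estimate on this slide, τ = |t1 − t2| / ln(V2/V1), can be sketched as below. This is a minimal illustration that assumes the difference between the latch outputs grows as a pure exponential between the two sample points and that V2 is the larger, later voltage. The function name and the synthetic waveform are hypothetical, not data from the talk.

```python
import math


def tau_from_two_points(t1, v1, t2, v2):
    """Estimate tau from two points on the semilog difference curve.

    Implements tau = |t1 - t2| / ln(V2 / V1). Assumes V2 > V1 > 0 so the
    logarithm is positive.
    """
    if v1 <= 0 or v2 <= v1:
        raise ValueError("expected 0 < V1 < V2")
    return abs(t1 - t2) / math.log(v2 / v1)


if __name__ == "__main__":
    # Synthetic exponential with a known tau, used only to check the method.
    true_tau = 20e-12
    t0, v0 = 1.0e-9, 1e-6

    def diff_voltage(t):
        return v0 * math.exp((t - t0) / true_tau)

    t1, t2 = 1.05e-9, 1.15e-9
    estimate = tau_from_two_points(t1, diff_voltage(t1), t2, diff_voltage(t2))
    print(f"estimated tau = {estimate * 1e12:.2f} ps")
```

On real simulation data the two points should both lie on the straight part of the semilog curve, since the estimate is only meaningful where growth is exponential.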
$k = 1.38 \times 10^{-23}$ J/K
$\tau = 20$ picoseconds
$B = 1/\tau = 5 \times 10^{10}$ Hz
$R \approx 400\ \Omega$
$T = 300$ K
$\langle V_n^2 \rangle = 4kT/C = 4kTBR$
$V_n \approx 0.6$ mV
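The arithmetic on this slide can be checked with a few lines of code. The sketch below plugs the slide's values (k, T, τ, R) into ⟨Vn²⟩ = 4kTBR with B = 1/τ; the function and variable names are hypothetical.

```python
import math

K_BOLTZMANN = 1.38e-23  # J/K, value used on the slide


def thermal_noise_rms(temp_kelvin, bandwidth_hz, resistance_ohm):
    """RMS noise voltage from <Vn^2> = 4 k T B R, in volts."""
    return math.sqrt(4.0 * K_BOLTZMANN * temp_kelvin * bandwidth_hz * resistance_ohm)


if __name__ == "__main__":
    tau = 20e-12             # 20 ps, from the slide
    bandwidth = 1.0 / tau    # B = 1/tau = 5e10 Hz
    resistance = 400.0       # approximately 400 ohms, from the slide
    temperature = 300.0      # kelvin

    v_n = thermal_noise_rms(temperature, bandwidth, resistance)
    # The result is close to the slide's figure of roughly 0.6 mV.
    print(f"Vn = {v_n * 1e3:.2f} mV")
```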