Random Number Generator May 1, 2006 Dmitriy Solmonov W1-1 David Levitt W1-2 Jesse Guss W1-3 Sirisha Pillalamarri W1-4 Matt Russo W1-5 Design Manager – Thiago Hersan
Why Random Numbers? • Real-Time Simulations • Encryption • Gambling
Encryption • Need random numbers for authentication • Key generation • Software vs. Hardware • Less power/time per number • Portable • Gambling • ePoker Rooms • SoC Deck Generation • Other future casino games
Business Plan • Potential markets • Defense and Intelligence Organizations • E-Gambling / Casinos • Game Consoles • Mobile Communication • License the IP • Our design will be part of a larger ASIC or GPP design
IBAA Algorithm • Based on ideas from the RC4 stream cipher • Cryptographically secure • Deterministic • Generates a 1024-bit number (32 × 32-bit words) • Internally updated seed • The seed is not visible to the user, which keeps it secure
The IBAA Algorithm
#include <stdint.h>
typedef uint32_t uint32;              /* 32-bit unsigned word */
#define ALPHA (8)
#define SIZE (1<<ALPHA)               /* 256 loop iterations */
#define ind(x) ((x)&(0x1F))           /* index into the 32-word memories */
#define barrel(a) (((a)<<19)^((a)>>13))
uint32 A, B, Y, X;
uint32 M[32], R[32];
…
for ( i=0; i<SIZE; i++ ) {
    X = M[ind(i)];
    A = barrel(A) + M[ind(i+16)];
    M[ind(i)] = Y = M[ind(X)] + A + B;
    R[ind(i)] = B = M[ind(Y>>ALPHA)] + X;
}
IBAA Algorithm to Architecture
for ( i=0; i<SIZE; i++ ) {
    X = M[ind(i)];
    A = barrel(A) + M[ind(i+16)];
    M[ind(i)] = Y = M[ind(X)] + A + B;
    R[ind(i)] = B = M[ind(Y>>ALPHA)] + X;
}
• 4 reads from M • 1 write to M • 1 write to R • Dependencies, feedback, and read-after-write (RAW) hazards
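The loop can be instrumented to show where the slide's access counts and hazards come from. This is a sketch, and the function and field names are hypothetical. It labels the four reads of M and the writes to M and R. It also counts two read-after-write cases. In the first, the indirect read M[ind(X)] hits the word the previous iteration wrote. In the second, the read M[ind(Y>>ALPHA)] hits the word this iteration has just written. The pipeline's C1 and C2 conditions resolve exactly these cases by selecting Y.

```c
#include <stdint.h>

#define ALPHA (8)
#define SIZE (1 << ALPHA)
#define ind(x) ((x) & 0x1F)
#define barrel(a) (((a) << 19) ^ ((a) >> 13))

/* How often each read-after-write case occurs in one 256-iteration run. */
typedef struct {
    unsigned x_hits_prev_write; /* ind(X) == ind(i-1): word written by the previous iteration */
    unsigned y_hits_own_write;  /* ind(Y>>ALPHA) == ind(i): word written earlier in this iteration */
} hazard_counts;

/* The slide's loop with each memory access labelled and both RAW cases counted. */
static hazard_counts ibaa_hazards(uint32_t M[32], uint32_t R[32], uint32_t *pa, uint32_t *pb) {
    hazard_counts h = {0, 0};
    uint32_t A = *pa, B = *pb, X, Y;
    for (uint32_t i = 0; i < SIZE; i++) {
        X = M[ind(i)];                               /* read 1 */
        A = barrel(A) + M[ind(i + 16)];              /* read 2 */
        if (i > 0 && ind(X) == ind(i - 1))
            h.x_hits_prev_write++;
        M[ind(i)] = Y = M[ind(X)] + A + B;           /* read 3, then write to M */
        if (ind(Y >> ALPHA) == ind(i))
            h.y_hits_own_write++;
        R[ind(i)] = B = M[ind(Y >> ALPHA)] + X;      /* read 4, write to R */
    }
    *pa = A;
    *pb = B;
    return h;
}
```

Consider an all-zero starting state. Every X and Y stays 0, so each case fires whenever i (or i-1) is a multiple of 32. That gives 8 hits of each kind over 256 iterations.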
Algorithm to Architecture • Hardware Limits • Max. of 2 simultaneous reads from memory • Can’t do better than two stages • Each stage must take multiple cycles to complete
Algorithm to Architecture • Chosen Timing • Addition = 1 cycle • Memory read = 0.5 cycles • Memory is clocked half a period out of phase • Set address and receive data in 1 cycle • With forwarding applied, each stage needs 4 cycles
[Figure: datapath block diagram with SRAM (M), SRAM (R), four adders, pipeline registers (X, A, M1 to M4, Y1, Y, B), a counter register, and the control-logic FSM]
Stage 1
• Cycle 1: M1 = M[i+16]
• Cycle 2: X = M[i] | A = M1 + barrel(A)
• Cycle 3: M3 = M[X] | C1 = (X == i-1)
• Cycle 4: Y1 = A + (C1 ? Y : M3)
Stage 2
• Cycle 1: Y = B + Y1
• Cycle 2: M4 = M[Yaddr] | C2 = (i == Yaddr), where Yaddr = ind(Y>>ALPHA)
• Cycle 3: B = X + (C2 ? Y : M4)
• Cycle 4: M[i] = Y | R[i] = B
C1 and C2 forward Y in place of a memory read whose address matches a write to M that has not completed yet.
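The C model below checks that the two-stage schedule with forwarding computes the same values as the sequential loop. It runs both versions and compares the results. It is a sketch under several assumptions:

• Each of the four rows of a stage is one cycle.
• Stage 1 of iteration i overlaps Stage 2 of iteration i-1.
• `Y1 = A + (C1) ? Y : M3` means A plus the selected value.
• Stage 2 keeps its own copy of X (`X2`, our addition). Stage 1 loads a new X before Stage 2 uses the old one.

Both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define ALPHA (8)
#define SIZE (1 << ALPHA)
#define ind(x) ((x) & 0x1F)
#define barrel(a) (((a) << 19) ^ ((a) >> 13))

/* Sequential reference: the loop from the slide. */
static void ibaa_seq(uint32_t M[32], uint32_t R[32], uint32_t *pa, uint32_t *pb) {
    uint32_t A = *pa, B = *pb, X, Y;
    for (uint32_t i = 0; i < SIZE; i++) {
        X = M[ind(i)];
        A = barrel(A) + M[ind(i + 16)];
        M[ind(i)] = Y = M[ind(X)] + A + B;
        R[ind(i)] = B = M[ind(Y >> ALPHA)] + X;
    }
    *pa = A; *pb = B;
}

/* Two-stage model: in time slot t, Stage 1 works on iteration t while
   Stage 2 finishes iteration t-1. Each slot is four cycles. */
static void ibaa_pipe(uint32_t M[32], uint32_t R[32], uint32_t *pa, uint32_t *pb) {
    uint32_t A = *pa, B = *pb;
    uint32_t M1 = 0, X = 0, M3 = 0, Y1 = 0;  /* Stage 1 registers */
    uint32_t X2 = 0, Y = 0, M4 = 0;          /* Stage 2 registers (X2 is assumed) */
    int C1 = 0, C2 = 0;
    for (uint32_t t = 0; t <= SIZE; t++) {
        int s1 = t < SIZE;                   /* Stage 1 busy with iteration t */
        int s2 = t > 0;                      /* Stage 2 busy with iteration t-1 */
        uint32_t i = t, j = t - 1, Yaddr = 0;
        /* Cycle 1 */
        if (s1) M1 = M[ind(i + 16)];
        if (s2) Y = B + Y1;
        /* Cycle 2: at most two memory reads */
        if (s1) { X = M[ind(i)]; A = M1 + barrel(A); }
        if (s2) { Yaddr = ind(Y >> ALPHA); M4 = M[Yaddr]; C2 = (Yaddr == ind(j)); }
        /* Cycle 3: C1 flags a read of the word Stage 2 has not written back yet */
        if (s1) { M3 = M[ind(X)]; C1 = s2 && (ind(X) == ind(j)); }
        if (s2) B = X2 + (C2 ? Y : M4);
        /* Cycle 4: Stage 2 writes back */
        if (s1) Y1 = A + (C1 ? Y : M3);
        if (s2) { M[ind(j)] = Y; R[ind(j)] = B; }
        X2 = X;                              /* hand X to Stage 2 for the next slot */
    }
    *pa = A; *pb = B;
}
```

Run both functions from the same initial M, R, A and B, then compare the arrays. They should match. Always using M3 and M4 instead of the C1/C2 selections should generally break the match, which is why forwarding is needed.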
Design For Manufacture Regular Fabrics
Why DFM? • Ability to print on smaller processes • Robust Manufacturability • Sacrifice area, speed and metal layers for a regular design
Regular Fabrics [Figure: sample layout]
Adder • Four adders, each executed 256 times (once per loop iteration) • Hybrid adder • Fast and low power [Figure: 32-bit hybrid adder split into blocks CS4 (S[3:0]), CS6 (S[9:4]), CS18 (S[27:10]) and CS4 (S[31:28]), with carries C’[4], C[10], C’[28] and C[32]]
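Here is a bit-level functional model of the 4/6/18/4 block partition. It assumes each CS block behaves as a carry-skip block. The Chirca et al. reference is a carry-skip design, but the slide does not say exactly what "hybrid" combines. The name `carry_skip_add` is ours. The model checks logic only, not timing. In hardware the skip mux shortens the critical path, but it produces the same carry the ripple would.

```c
#include <stdint.h>

/* Block widths from LSB to MSB: S[3:0], S[9:4], S[27:10], S[31:28]. */
static const int block_width[4] = {4, 6, 18, 4};

/* Carry-skip addition: the carry ripples inside each block, and when every
   bit of a block propagates (a ^ b == 1) the block's carry-out is taken
   straight from its carry-in through the skip path. */
static uint32_t carry_skip_add(uint32_t a, uint32_t b, int cin, int *cout) {
    uint32_t sum = 0;
    int carry = cin, lo = 0;
    for (int k = 0; k < 4; k++) {
        int c = carry, all_propagate = 1;
        for (int bit = lo; bit < lo + block_width[k]; bit++) {
            int ai = (a >> bit) & 1, bi = (b >> bit) & 1;
            int p = ai ^ bi;                       /* propagate */
            sum |= (uint32_t)(p ^ c) << bit;       /* sum bit */
            c = (ai & bi) | (p & c);               /* ripple carry */
            all_propagate &= p;
        }
        carry = all_propagate ? carry : c;         /* skip mux */
        lo += block_width[k];
    }
    if (cout) *cout = carry;
    return sum;
}
```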
Adder Performance • Delay: 1.56 ns • Energy consumption (worst-case switching): 12.4 pJ • Power dissipation (estimated with our switching factor): 148 μW
SRAM [Figures: single-bus cell and double-bus cell]
Functional Verification • Structural Verilog vs. C Code: • Generate numbers under equal load conditions • Compare Numbers • Schematic vs. Structural Verilog • Under equal inputs, check if port outputs match • LVS
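The Structural Verilog vs. C comparison can be set up in several ways. This sketch shows one: the C model writes its outputs in a plain hex format, and those outputs are compared word by word against the simulator's. It assumes a testbench that loads one 8-digit hex word per line, which is a format Verilog's `$readmemh` can read. `dump_hex` and `first_mismatch` are hypothetical helpers, not the team's actual flow.

```c
#include <stdint.h>
#include <stdio.h>
#include <stddef.h>

/* Write n words as 8-digit hex, one per line, into buf (capacity cap).
   Returns the number of characters written; stops early if buf is full. */
static size_t dump_hex(const uint32_t *w, size_t n, char *buf, size_t cap) {
    size_t used = 0;
    for (size_t k = 0; k < n; k++) {
        int len = snprintf(buf + used, cap - used, "%08x\n", (unsigned)w[k]);
        if (len < 0 || (size_t)len >= cap - used) break;   /* out of room */
        used += (size_t)len;
    }
    return used;
}

/* Index of the first word where the RTL output differs from the C model,
   or -1 if all n words match. */
static long first_mismatch(const uint32_t *model, const uint32_t *rtl, size_t n) {
    for (size_t k = 0; k < n; k++)
        if (model[k] != rtl[k]) return (long)k;
    return -1;
}
```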
Verification • Schematic and Extracted Parasitic spice simulations of major blocks • Check for clean signals • Check delays and rise/fall times • Extracted Parasitic simulation of critical Register-Register Path • Signals are clean • Delay = 2.1 ns • Extracted Parasitic simulation of chip clock distribution
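Suppose the 2.1 ns extracted register-to-register delay is the critical path that sets the clock period, ignoring setup time and clock-skew margin. Under that assumption, the implied upper bound on the clock lines up with the 475 MHz chip speed given in the specifications:

$$
f_{\max} \approx \frac{1}{t_{\text{reg}\to\text{reg}}} = \frac{1}{2.1\ \text{ns}} \approx 476\ \text{MHz}
$$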
Layer densities • Poly: 7.52% • Metal1: 20.85% • Metal2: 19.89% • Metal3: 18.76% • Metal4: 9.36% • Metal5: 6.8%
Specifications • Pins • 36 input pins: 32-bit seed input, gen, read, rst, clk • 34 output pins: 32-bit random output, rdy, done • 2 supply pins: vdd, gnd • Chip speed: 475 MHz • Throughput: 436 kHz
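The clock and throughput figures can be read together. This assumes that "throughput" counts one generated number per output, and that a new loop iteration starts every 4 cycles once the pipeline is full. The ratio then gives the clock cycles spent per number, which can be compared with the cycles the 256-iteration loop needs on its own:

$$
\frac{475\ \text{MHz}}{436\ \text{kHz}} \approx 1089\ \text{cycles per number},
\qquad 256 \times 4 = 1024\ \text{cycles for the loop}
$$

The slides do not say what accounts for the remaining 65 or so cycles. If each output is the full 1024-bit number, the rate is about 436 kHz × 1024 ≈ 446 Mbit/s.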
Putting It All Together [Figures: full-chip schematic and ExtractRC view]
Where to Now ? • ERC, tapeout, etc. • Thermal noise unit to use as input seed • On-Chip Bus Interface • HyperTransport™ Interface
References • Jenkins, Robert J. “ISAAC”. http://burtleburtle.net/bob/rand/isaac.html • Chirca, Schulte, Glossner, et al. “A Static Low-Power, High-Performance 32-bit Carry Skip Adder”. http://mesa.ece.wisc.edu/publications/cp_2004-12.pdf • “CLA and Ling Adders”. http://umunhum.stanford.edu/~farland/notes.html