
Lucas-Lehmer Primality Tester



Presentation Transcript


  1. Lucas-Lehmer Primality Tester • Team W-4: Nathan Stohs (W4-1), Brian Johnson (W4-2), Joe Hurley (W4-3), Marques Johnson (W4-4) • Design Manager: Prateek Goenka

  2. Agenda • Background (Marques) • Project Description (Marques) • Algorithmic Description (Joe) • Data Flow/Block Diagram (Joe) • Design Process (Nathan) • Simulations (Nathan) • Floorplan/Layout (Brian) • Conclusions (Brian)

  3. History of 2^P - 1 • In the 16th century it was believed that 2^P - 1 was prime for all primes P • In 1536 Hudalricus Regius showed that 2^11 - 1 is not prime • French monk Marin Mersenne published Cogitata Physico-Mathematica, in which he stated that 2^P - 1 was prime for P = 2, 3, 5, 7, 13, 17, 19, 31, 67, 127 and 257

  4. Lucas-Lehmer • François Édouard Anatole Lucas • In 1876 he proved that 2^127 - 1 is prime using his own methods • Derrick Lehmer • In 1930 he refined Lucas’s method

  5. Make History • December 2005 • 43rd known Mersenne prime found! • Dr. Curtis Cooper and Dr. Steven Boone • Professors at Central Missouri State University • 2^30,402,457 - 1

  6. Prime Number Competitions • Electronic Frontier Foundation • $50,000 to the first individual or group who discovers a prime number with at least 1,000,000 decimal digits (awarded Apr. 6, 2000) • $100,000 to the first individual or group who discovers a prime number with at least 10,000,000 decimal digits • $150,000 to the first individual or group who discovers a prime number with at least 100,000,000 decimal digits • $250,000 to the first individual or group who discovers a prime number with at least 1,000,000,000 decimal digits

  7. Mersenne Prime Algorithm • Only used for numbers of the form 2^P - 1 • For P > 2, 2^P - 1 is prime if and only if S_{P-2} is zero in this sequence: • S_0 = 4 • S_N = (S_{N-1}^2 - 2) mod (2^P - 1)

  8. Example to Show 2^7 - 1 is Prime • 2^7 - 1 = 127 • S_0 = 4 • S_1 = (4 * 4 - 2) mod 127 = 14 • S_2 = (14 * 14 - 2) mod 127 = 67 • S_3 = (67 * 67 - 2) mod 127 = 42 • S_4 = (42 * 42 - 2) mod 127 = 111 • S_5 = (111 * 111 - 2) mod 127 = 0 • S_5 = S_{P-2} is zero, so 127 is prime
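
The recurrence on slides 7 and 8 can be sketched in software as follows. This is a minimal illustration, not the team's prime4 program: the function name lucas_lehmer is hypothetical, the ordinary % operator stands in for the hardware modulo unit, and p is assumed to be an odd prime no larger than 31 so that the square fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Lucas-Lehmer test for the Mersenne number 2^p - 1.
   Hypothetical helper; assumes p is an odd prime with 3 <= p <= 31.
   Returns 1 if 2^p - 1 is prime, 0 otherwise. */
static int lucas_lehmer(unsigned p)
{
    assert(p >= 3 && p <= 31);
    uint64_t m = ((uint64_t)1 << p) - 1;   /* modulus 2^p - 1 */
    uint64_t s = 4;                         /* S_0 */
    for (unsigned n = 1; n <= p - 2; n++) {
        /* S_n = (S_{n-1}^2 - 2) mod m; m is added first so the
           subtraction cannot go below zero. */
        s = (s * s + m - 2) % m;
    }
    return s == 0;                          /* prime iff S_{p-2} == 0 */
}
```

Calling lucas_lehmer(7) walks through exactly the five steps listed on slide 8.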

  9. Algorithmic description • We knew the necessary computations, but how do we translate them to gates? • Computations needed: • Squaring (not a problem…) • Add/Subtract (not a problem…) • Modulo (2^n - 1) multiplication (?)

  10. Mechanisms behind the math • Done by brute force, modulo 2^n - 1 could have been ugly: we would need to square and then find the remainder via division. • Luckily, for this specific computation, math is on our side: the 2^n - 1 modulus saves us from division, as will be seen. • A quick search on www.ieee.org produced inspiration. • Reto Zimmermann. Efficient VLSI Implementation of Modulo (2^n ± 1) Addition and Multiplication. Computer Arithmetic, 1999; p158-167.
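
One way to see why the 2^n - 1 modulus avoids division: 2^n is congruent to 1 modulo 2^n - 1, so any bits at or above position n can be shifted down and added back onto the low n bits. The sketch below illustrates that identity in C. The name fold_mod is hypothetical and this is not taken from the slides or from the Zimmermann paper.

```c
#include <assert.h>
#include <stdint.h>

/* Reduce x modulo 2^n - 1 using only shifts, masks, and adds.
   Hypothetical helper; assumes 1 <= n <= 31. */
static uint32_t fold_mod(uint64_t x, unsigned n)
{
    assert(n >= 1 && n <= 31);
    uint64_t mask = ((uint64_t)1 << n) - 1;   /* low n bits; also the modulus */
    while (x > mask)                           /* fold until x fits in n bits */
        x = (x & mask) + (x >> n);             /* high part wraps around with weight 1 */
    return (x == mask) ? 0 : (uint32_t)x;      /* all ones is congruent to 0 */
}
```

For example, 14 * 14 = 196 folds to 68 + 1 = 69 for n = 7, and subtracting 2 gives the value 67 from slide 8.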

  11. Useful Math: Multiplication • Just like any other multiplication, a modulo multiplication can be computed by (modulo) summing the partial products. • So modulo multiplication is multiplication using a modulo adder. [Figure from the Zimmermann paper]
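
The idea of summing partial products with a modulo adder can be sketched in C as below. This is an illustrative model under stated assumptions, not the team's code: the names mod_add, rotl_n, and mod_mul are hypothetical, both operands are assumed to be already reduced (less than 2^n - 1), and n is limited to 16 to match the 16-bit datapath. Each partial product y * 2^i, taken modulo 2^n - 1, is simply y rotated left by i within n bits.

```c
#include <assert.h>
#include <stdint.h>

/* (a + b) mod m for a, b < m: a single conditional subtract suffices. */
static uint32_t mod_add(uint32_t a, uint32_t b, uint32_t m)
{
    uint32_t s = a + b;
    return (s >= m) ? s - m : s;
}

/* y * 2^i mod (2^n - 1): rotate y left by i within n bits (0 <= i < n). */
static uint32_t rotl_n(uint32_t y, unsigned i, unsigned n)
{
    uint32_t mask = ((uint32_t)1 << n) - 1;
    if (i == 0)
        return y;
    return ((y << i) & mask) | (y >> (n - i));
}

/* x * y mod (2^n - 1), assuming x, y < 2^n - 1 and 2 <= n <= 16. */
static uint32_t mod_mul(uint32_t x, uint32_t y, unsigned n)
{
    assert(n >= 2 && n <= 16);
    uint32_t m = ((uint32_t)1 << n) - 1;
    assert(x < m && y < m);
    uint32_t acc = 0;
    for (unsigned i = 0; i < n; i++)
        if ((x >> i) & 1)                       /* bit i of x selects a partial product */
            acc = mod_add(acc, rotl_n(y, i, n), m);
    return acc;
}
```

With n = 7, mod_mul(14, 14, 7) sums the rotations 28, 56, and 112 and returns 69, which is 196 mod 127.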

  12. Block Diagram [Figure: block diagram. Labeled blocks: FSM, Counter, Mod Calc, Next Partial Product, Mod add, Subtract 2, two Registers, Count Compare. Labeled signals: start, P, done, Out; the datapath buses are 16 bits wide. An inner loop runs x16 and an outer loop runs x(P-2). Annotated example for P = 7: S_1 = (4 * 4) mod 127 - 2 = 14, S_2 = (14 * 14) mod 127 - 2 = 67, …, S_5 = (111 * 111 - 2) mod 127 = 0]

  13. Design Process • The process so far: • Found the mathematical means (core algorithm) • Found the computational means (modulo multiplier, adder) • From the above, a high-level C program was written in a deliberately low-level style, so that it would translate easily to Verilog and gates, or at least to more standard operations:

      /* Accumulates value * value + offset modulo 2^p - 1,
         adding one partial product per set bit of value. */
      int mod_square_minus(int value, int p, int offset)
      {
          int acc, i;
          int mod = (1 << p) - 1;

          for (acc = offset, i = 0; i < (int)(sizeof(int) * 8 - 1); i++) {
              int a = (value >> i) & 1;   /* bit i of value */
              int temp;

              if (a) {
                  /* bits of the partial product at or above position p wrap around */
                  if (i - p > 0)
                      temp = value << (i - p);
                  else
                      temp = value >> (p - i);
                  /* add the wrapped bits plus the low p bits of value << i */
                  acc = acc + temp + ((value << i) & ((1 << p) - 1));
              }
              if (acc >= mod)
                  acc = acc - mod;
          }
          return acc;
      }

      This translated easily into behavioral Verilog, which was then readily turned into a gate-level implementation.

  14. Design Process • The rest of the design can be thought of as a wrapper for the modulo multiplier. • The following slides contain Verilog code derived directly from the C code on the previous slide.

      Top level of multiplier:

      module mod_mult(out, itrCount, x, y, mod, p, reset, en, clk);
        input [15:0] x, y, mod, p;
        output [15:0] out;
        input reset, en, clk;
        wire [15:0] pp, ma0, temp;
        output [3:0] itrCount;

        counter mycount(itrCount, reset, en, clk);
        partial_product ppg(pp, x, y, itrCount, mod, p);
        mod_add modAdder(out, pp, temp, mod);
        dff_16_lp partial(clk, out, temp, reset, en);
      endmodule

  15. Partial Product Unit w/ modulo reduction

      module partial_product(out, x, y, i, mod, p);
        output [15:0] out;
        input [15:0] x, y, mod, p;
        input [3:0] i;
        wire [15:0] diff1, diff2, added, result, corrected, final;
        wire [15:0] high, low, shifted, toadd;
        wire cout1, cout2, ithbit, toobig;

        sub_16 difference1(diff1, cout1, {12'b0, i}, p);
        sub_16 difference2(diff2, cout2, p, {12'b0, i});
        shift_left shiftL(high, y, diff1[3:0]);
        shift_right shiftR(low, y, diff2[3:0]);
        mux16 choose(high, low, shifted, cout1);
        shift_left shiftL2(toadd, y, i);
        and16 bigand(added, toadd, mod);
        fulladder_16 addhighlow(.out(result), .xin(added), .yin(shifted), .cin({1'b0}), .cout(nowhere));
        sub_16 correct(.out(corrected), .cout(toobig), .xin(mod), .yin(result));
        mux16 correctionMux(.out(final), .high(corrected), .low(result), .sel(toobig));
        shift_right ibit({15'b0, ithbit}, x, i);
        select16 checkfor0(.out(out), .x(result), .sel(ithbit));
      endmodule

  16. Modulo Adder

      module mod_add(out, x, y, mod);
        input [15:0] x, y, mod;
        output [15:0] out;
        wire cout, isDouble, cin;
        wire [15:0] plus, lowbits, done, mod_bar, check;

        fulladder_16 add(.out(plus), .xin(x), .yin(y), .cin(cin), .cout());
        invert_16 inverter(mod_bar, mod);
        and16 hihnbits(check, plus, mod_bar);
        and16 lownbits(done, plus, mod);
        or8 (cin, check[0], check[1], check[2], check[3], check[4], check[5], check[6], check[7],
             check[8], check[9], check[10], check[11], check[12], check[13], check[14], check[15]);
        compare_16 checkfordouble(isDouble, done, 16'b1111_1111_1111_1111);
        mux16 fixdouble(.out(out), .high(16'b0), .low(done), .sel(isDouble));
      endmodule
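
The adder above feeds the OR of the bits above the modulus back in as a carry-in, which is the end-around-carry idea. A rough software model of that behavior is sketched below. It is an approximation under stated assumptions, not a translation of the netlist: the name mod_add_eac is hypothetical, the sum x + y is assumed to be at most twice the modulus, and the final all-ones check compares against mod, whereas the Verilog compares against a 16-bit all-ones constant.

```c
#include <assert.h>
#include <stdint.h>

/* End-around-carry addition modulo mod = 2^n - 1.
   Hypothetical model; assumes x + y <= 2 * mod. */
static uint16_t mod_add_eac(uint16_t x, uint16_t y, uint16_t mod)
{
    uint32_t plus = (uint32_t)x + y;            /* first pass with carry-in 0 */
    assert(plus <= 2 * (uint32_t)mod);
    if (plus & ~(uint32_t)mod)                  /* any bit above the low n bits set? */
        plus += 1;                              /* feed it back in as a carry */
    uint32_t low = plus & mod;                  /* keep the low n bits */
    return (low == mod) ? 0 : (uint16_t)low;    /* all ones is congruent to 0 */
}
```

With this model, mod_add_eac(128, 68, 127) returns 69, the same case used in the schematic test on slide 20.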

  17. Final Design Process Notes • Lessons learned: never tweak the schematics without retesting the Verilog first. Timing issues can be subtle, and Verilog is better than schematics for catching them and for quickly fixing and retesting. • Of the total time spent during this phase, roughly half went to the “core” and the FSM, and the rest to the “wrapper”.

  18. Road to verification: C • Two examples of the high-level C implementation:

      Tyrion:~/Desktop/15525 nstohs$ ./prime4 7
      round 1: (4 * 4 - 2) mod 127 = 14
      round 2: (14 * 14 - 2) mod 127 = 67
      round 3: (67 * 67 - 2) mod 127 = 42
      round 4: (42 * 42 - 2) mod 127 = 111
      round 5: (111 * 111 - 2) mod 127 = 0
      2^7-1 is prime

      Tyrion:~/Desktop/15525 nstohs$ ./prime4 11
      round 1: (4 * 4 - 2) mod 2047 = 14
      round 2: (14 * 14 - 2) mod 2047 = 194
      round 3: (194 * 194 - 2) mod 2047 = 788
      round 4: (788 * 788 - 2) mod 2047 = 701
      round 5: (701 * 701 - 2) mod 2047 = 119
      round 6: (119 * 119 - 2) mod 2047 = 1877
      round 7: (1877 * 1877 - 2) mod 2047 = 240
      round 8: (240 * 240 - 2) mod 2047 = 282
      round 9: (282 * 282 - 2) mod 2047 = 1736
      2^11-1 is not prime

  19. Road to verification: Verilog • Tests were either specific tests on important units such as partial_product, or top-level tests. • Samples of Verilog verification output:

      Partial Product Unit, p = 7
      380 ppOut= 56, x= 14, y= 14, i= 2, mod= 127, p= 7
      400 ppOut= 112, x= 14, y= 14, i= 3, mod= 127, p= 7
      420 ppOut= 0, x= 14, y= 14, i= 4, mod= 127, p= 7
      440 ppOut= 0, x= 14, y= 14, i= 5, mod= 127, p= 7

      Top Level, p = 7
      itrOut= x
      itrOut= 4
      itrOut= 14
      itrOut= 67
      itrOut= 42
      itrOut= 111
      itrOut= 0

      Top Level, p = 11
      itrOut= x
      itrOut= 4
      itrOut= 14
      itrOut= 194
      itrOut= 788
      itrOut= 701
      itrOut= 119
      itrOut= 1877
      …

      Note that these are the same results generated by the C code.

  20. Road to verification: Schematic I • Schematic test of our modular adder: (128 + 68) mod 127 = 69

  21. Road to verification: Schematic II Plot of the top level output after a single iteration, p=7 Output after a single iteration is 14, the expected value.

  22. Road to verification: Schematic III [Plot: top-level output values 4, 14, 67, 42, 111]

  23. Road to verification: Intermission Disk Space required for a full-length schematic test of p=7 : 6 GB Time required for a full-length schematic test of p=7 : 5 hours Disk Space required for a full-length extractedRC test of p=7 : 20 GB Time required for a full-length extractedRC test of p=7 : 8 hours Simulations become lengthy due to tests needing to be “deep” to be useful.

  24. Layout: ExtractedRC – Full Run [Plot: top-level output values 4, 14, 67, 42, 111]

  25. Timing • To determine the bounds of our clock, Pathmill was used once major portions of the schematic were complete. • The critical path through our design is one loop through the modular multiplier, which runs through the modular adder and the partial products module. • The Pathmill delay was 9 ns through the modular adder and 5.2 ns through the partial products module. This puts our total delay at 14.2 ns, which limits the schematic clock to roughly 70 MHz. • For extractedRC, due in part to simulation issues, a conservative 50 MHz was chosen as the final clock.
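
The clock bound quoted on this slide follows from adding the two Pathmill delays and inverting the sum. The symbol names below are introduced only for illustration:

```latex
t_{\text{crit}} = t_{\text{mod\_add}} + t_{\text{partial\_product}}
               = 9\,\text{ns} + 5.2\,\text{ns} = 14.2\,\text{ns},
\qquad
f_{\max} = \frac{1}{t_{\text{crit}}} = \frac{1}{14.2\,\text{ns}} \approx 70.4\,\text{MHz}
```

The chosen 50 MHz clock corresponds to a 20 ns period, which leaves about 5.8 ns of margin over the 14.2 ns schematic estimate.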

  26. Issues • extractedRC of partial_product module • Registers switch • Custom design to DFFs with muxes • Switching from parallel calculations to series • Transistor count vs. clock cycles • Syncing up design between people • Transferring files • Different design styles • LONG simulation times • Floorplanning • Too much emphasis on aspect ratios and not enough on wiring • Couldn’t decide on one set floorplan

  27. Floorplan v1.0

  28. Floorplan v2.0

  29. Final Floorplan

  30. Pin Specifications

  31. Initial Module Specifications

  32. Final Module Specifications

  33. Chip Specifications • Transistor Count: 13,702 • Size: 296.51µm x 292.13µm • Area: 86,621µm² • Aspect Ratio: 1.01:1 • Density: 0.16 transistors/µm²

  34. Final Floorplan

  35. Final Floorplan

  36. Partial Product [Layout figure; labeled blocks: adder, Sub_16, shift_left, shift_right, mux, shift_right, shift_left, Select16, 16-bit and]

  37. Poly Layer Density: 7.14%

  38. Active Layer Density: 8.76%

  39. Metal1 Layer Density: 23.86%

  40. Metal2 Layer Density: 19.97%

  41. Metal3 Layer Density: 11.30%

  42. Metal4 Layer Density: 10.34%

  43. Conclusions • Plan for buffers -Will be hard to put them in after the fact • Your design will change dramatically from start to finish so be flexible • Communication is key • Do layout in parallel
