Lecture 5 – Introduction to Cryptography 3 / Implementation
Rice ELEC 528 / COMP 538
Farinaz Koushanfar
Spring 2009
Rivest, Shamir, Adleman (RSA)
• Security rests on number theory and the difficulty of determining the prime factors of a large number
• Two keys, d and e, are used for decryption and encryption
• Plaintext message P is encrypted to ciphertext C: $C = P^e \bmod n$
• The plaintext is recovered by $P = C^d \bmod n$
• Encryption and decryption are mutual inverses and commute: $P = C^d \bmod n = (P^e)^d \bmod n = (P^d)^e \bmod n$
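The encrypt/decrypt relations above can be tried with a toy example. This is a minimal sketch, not a secure implementation: the values of p, q, e, d and the message P are small illustrative assumptions, and the function names are hypothetical.

```python
# Toy RSA round trip with textbook-sized numbers (illustrative only, NOT secure).
# Assumed values: p = 61, q = 53, e = 17, d = 2753, so e*d = 46801 = 15*3120 + 1.
p, q = 61, 53
n = p * q                      # modulus n = 3233
e, d = 17, 2753                # public and private exponents


def encrypt(P, e, n):
    """C = P^e mod n, using Python's built-in modular exponentiation."""
    return pow(P, e, n)


def decrypt(C, d, n):
    """P = C^d mod n."""
    return pow(C, d, n)


P = 65                         # plaintext must satisfy 0 <= P < n
C = encrypt(P, e, n)
recovered = decrypt(C, d, n)
print("ciphertext:", C, "recovered plaintext:", recovered)

# The two exponentiations commute: applying d first and e second also returns P.
swapped = encrypt(decrypt(P, d, n), e, n)
```

Both `recovered` and `swapped` equal the original P, which is the "mutual inverses and commutative" property stated on the slide.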
RSA – Key Choice
• Starting point: select a value for n
• n is the product of two large primes p and q; if p and q are ~100 digits each, n is ~200 digits
• A relatively large e is selected that is relatively prime to $(p-1)(q-1)$; one easy way is to choose e as a prime larger than both $(p-1)$ and $(q-1)$
• Finally, d is selected such that $e \cdot d \equiv 1 \pmod{(p-1)(q-1)}$
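The key-choice steps can be sketched as follows. The primes and the exponent are tiny illustrative assumptions (real keys use primes of roughly 100 digits or more), and `make_keys` is a hypothetical helper name. The modular inverse uses `pow(e, -1, m)`, which requires Python 3.8 or later.

```python
from math import gcd


def make_keys(p, q, e):
    """Return (n, e, d) for primes p, q and a public exponent e.

    Assumes p and q are distinct primes; this sketch does not test primality.
    """
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be relatively prime to (p-1)*(q-1)")
    d = pow(e, -1, phi)        # modular inverse: e*d = 1 mod (p-1)*(q-1)
    return n, e, d


# Following the slide's shortcut: pick e as a prime larger than p-1 and q-1.
# Here p-1 = 60 and q-1 = 52, so the prime 67 qualifies.
n, e, d = make_keys(61, 53, 67)
print("n =", n, "e =", e, "d =", d)
```

Choosing e as a prime larger than both $(p-1)$ and $(q-1)$ guarantees the `gcd` check passes, because a prime can only share a factor with a number it divides.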
Mathematical Foundation
• The Euler totient function $\varphi(n)$ is the number of positive integers less than n that are relatively prime to n; if p is prime, then $\varphi(p) = p-1$
• If $n = p \cdot q$, where p and q are distinct primes, then $\varphi(n) = \varphi(p)\,\varphi(q) = (p-1)(q-1)$
• Euler and Fermat proved that $x^{\varphi(n)} \equiv 1 \pmod{n}$ for any integer x that is relatively prime to n
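These three facts can be checked numerically for small values. The sketch below counts the totient by brute force, which is only practical for tiny n; the primes 11 and 13 and the function name `phi` are illustrative assumptions.

```python
from math import gcd


def phi(n):
    """Euler totient by direct count: integers in [1, n) relatively prime to n."""
    return sum(1 for x in range(1, n) if gcd(x, n) == 1)


p, q = 11, 13
n = p * q
phi_n = phi(n)

# phi(p) = p - 1 for a prime p, and phi(p*q) = (p-1)*(q-1) for distinct primes.
print(phi(p), phi(q), phi_n)

# Euler's theorem: x^phi(n) = 1 mod n for every x relatively prime to n.
euler_holds = all(pow(x, phi_n, n) == 1 for x in range(1, n) if gcd(x, n) == 1)
print("Euler's theorem holds for n =", n, ":", euler_holds)
```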
Mathematical Foundation -- RSA
• Encrypt by RSA: $E(P) = P^e \bmod n$
• The value of e is selected such that its inverse d modulo $\varphi(n)$ can be easily formed: $e \cdot d \equiv 1 \pmod{\varphi(n)}$
• Equivalently, $e \cdot d = k\,\varphi(n) + 1$ for some integer k
• By the Euler/Fermat result, assuming P and p are relatively prime, $P^{p-1} \equiv 1 \pmod{p}$
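RSA Math (Cont'd)
• Since $(p-1)$ is a factor of $\varphi(n)$: $P^{k\varphi(n)} \equiv 1 \pmod{p}$
• Multiplying by P produces $P^{k\varphi(n)+1} \equiv P \pmod{p}$
• The same is true for q: $P^{k\varphi(n)+1} \equiv P \pmod{q}$
• Hence $(P^e)^d = P^{ed} = P^{k\varphi(n)+1} \equiv P$ modulo both p and q
• Thus $(P^e)^d \equiv P \pmod{n}$
• So exponentiation by e and exponentiation by d are inverse operations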
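The argument on the last two slides can be written as one chain. The slides assume P is relatively prime to p. The line for the case where p divides P, and the explicit step from "modulo p and modulo q" to "modulo n", are added here to complete the argument and are not taken from the slides.

```latex
\begin{align*}
ed &= k\,\varphi(n) + 1 = k\,(p-1)(q-1) + 1 \\
P^{ed} &= P \cdot \bigl(P^{\,p-1}\bigr)^{k(q-1)} \equiv P \cdot 1 \equiv P \pmod{p}
  && \text{if } p \nmid P \\
P^{ed} &\equiv 0 \equiv P \pmod{p}
  && \text{if } p \mid P \\
P^{ed} &\equiv P \pmod{q}
  && \text{by the same argument with } q \\
p \mid (P^{ed} - P),\ q \mid (P^{ed} - P) &\implies pq \mid (P^{ed} - P)
  && \text{since } p \neq q \text{ are prime} \\
(P^{e})^{d} &\equiv P \pmod{n}
\end{align*}
```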
Crypto Processors
• There are many HW implementations of the standard security protocols, e.g., AES, DES, PKP
• Please check: http://www.hardware-ciphers.com/en/index.html
• Our goal is not to design a new one, or to teach you to design one, but to show you how implementations look
• What are the basic building blocks, and what are the potential weaknesses/vulnerabilities of each block?
Recommended reading
• A. Hodjat and I. Verbauwhede. Minimum area cost for a 30 to 70 Gbits/s AES processor. IEEE Computer Society Annual Symposium on VLSI, pp. 83-88, 2004.
• T. Good and M. Benaissa. AES on FPGA from the fastest to the smallest, 2005.
• L. Batina, S. Berna Ors, B. Preneel and J. Vandewalle. Hardware architectures for public key cryptography, 2003.
Minimum Area Cost for a 30 to 70 Gbits/s AES Processor
Alireza Hodjat, Ingrid Verbauwhede
Department of Electrical Engineering, University of California, Los Angeles
{ahodjat, ingrid}@ee.ucla.edu
IEEE Computer Society Symposium on VLSI (ISVLSI 04), February 2004
This material is based upon work supported by the Space and Naval Warfare Systems Center - San Diego under contract No. N66001-02-1-8938.
Outline
• Motivation
• Ultra high throughput AES implementation
• Area efficient byte substitution
• High speed AES with online key scheduling
• High speed AES with offline key scheduling
• Conclusion
Motivation
• Cryptographically secure random number generation for optical link switches
• Advanced Encryption Standard algorithm in the Counter mode of operation
• Non-feedback mode of operation (pipelining is allowed)
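Why counter mode permits pipelining can be seen in a small sketch: each keystream block depends only on the key and its own counter value, never on a previous output. The block function below is an assumed placeholder built from SHA-256 so the example runs with the standard library only; it is not AES, and the 8-byte nonce plus 8-byte counter layout is also an assumption made for illustration.

```python
import hashlib

BLOCK_BYTES = 16  # AES block size: 128 bits


def block_encrypt(key: bytes, block: bytes) -> bytes:
    """ASSUMPTION: placeholder standing in for AES encryption of one block."""
    return hashlib.sha256(key + block).digest()[:BLOCK_BYTES]


def counter_block(nonce: bytes, index: int) -> bytes:
    """Build the 16-byte input block: 8-byte nonce followed by 8-byte counter."""
    if len(nonce) != 8:
        raise ValueError("this sketch assumes an 8-byte nonce")
    return nonce + index.to_bytes(8, "big")


def keystream_block(key: bytes, nonce: bytes, index: int) -> bytes:
    """Keystream block number `index`; needs no earlier block as input."""
    return block_encrypt(key, counter_block(nonce, index))


def keystream(key: bytes, nonce: bytes, n_blocks: int) -> bytes:
    """Concatenate keystream blocks 0 .. n_blocks-1."""
    return b"".join(keystream_block(key, nonce, i) for i in range(n_blocks))


key, nonce = bytes(16), bytes(8)
stream = keystream(key, nonce, 4)
# Block 3 computed in isolation matches the fourth block of the stream, so
# hardware can have many counter values in flight in the pipeline at once.
independent = stream[48:64] == keystream_block(key, nonce, 3)
print("blocks are independent:", independent)
```

In a feedback mode, block i would need the output of block i-1 before it could start, which would leave a deep pipeline mostly empty.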
Ultra High Throughput AES
• The key length
  - Critical path is in the key scheduling path
  - Fixed key size: only 128-bit
• Loop-unrolling
• Pipelining
  - Inner round pipelining
  - Outer round pipelining
• Choice of byte-substitution phase
  - LUT implementation
  - Implementation using GF operations (further pipelining)
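A back-of-the-envelope calculation relates the throughput targets in the paper's title to clock frequency. It assumes that a fully unrolled, fully pipelined design accepts one 128-bit block per clock cycle once the pipeline is full; that is a common property of such designs but it is an assumption here, not a figure from the slides.

```python
BLOCK_BITS = 128  # AES block size


def required_clock_mhz(throughput_gbps: float) -> float:
    """Clock frequency (MHz) needed to sustain the given throughput (Gbit/s),
    assuming one 128-bit block completes per cycle."""
    blocks_per_second = throughput_gbps * 1e9 / BLOCK_BITS
    return blocks_per_second / 1e6


for gbps in (30, 70):
    print(f"{gbps} Gbit/s needs about {required_clock_mhz(gbps):.1f} MHz")
```

Under this assumption, throughput scales directly with clock frequency, which is why splitting each round into more pipeline stages (a shorter critical path) raises throughput.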
Byte substitution optimization
• Byte substitution on $GF(2^8)$
  - First: multiplicative inverse in $GF(2^8)$
  - Second: affine transformation (over $GF(2)$)
• Multiplicative inverse in $GF(2^8)$ is expensive
• Area efficient implementation using $GF(2^4)$ operations
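The two steps of byte substitution can be sketched in software. This version computes the inverse directly in $GF(2^8)$ with the AES reduction polynomial $x^8 + x^4 + x^3 + x + 1$; it does not implement the composite-field $GF(2^4)$ decomposition used in the paper, and the function names are hypothetical.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:          # degree reached 8: reduce by the AES polynomial
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse via a^254 (square-and-multiply); 0 maps to 0."""
    if a == 0:
        return 0
    result, base, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result


def affine(x: int) -> int:
    """AES affine transformation over GF(2), written with byte rotations."""
    def rotl(v: int, s: int) -> int:
        return ((v << s) | (v >> (8 - s))) & 0xFF
    return x ^ rotl(x, 1) ^ rotl(x, 2) ^ rotl(x, 3) ^ rotl(x, 4) ^ 0x63


def sbox(x: int) -> int:
    """Byte substitution: inverse in GF(2^8), then the affine transformation."""
    return affine(gf_inv(x))


# The LUT alternative simply stores all 256 results.
SBOX_TABLE = [sbox(x) for x in range(256)]
print(hex(SBOX_TABLE[0x00]), hex(SBOX_TABLE[0x53]))
```

The table form trades 256 bytes of storage for a single lookup, while the arithmetic form is a chain of logic that can be cut into pipeline stages.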
Area Efficient Byte Substitution
[Figure: four Sbox implementations]
a: Byte substitution using LUT implementation
b: Non-pipelined Sbox using GF operations
c: Two-stage pipelined Sbox using GF operations
d: Three-stage pipelined Sbox using GF operations
Area-Delay Trade-off for Sbox
• The area cost of the Sbox using the two-stage and three-stage composite field implementations is 23% and 32% less, respectively, than the LUT design with the same speed
High Speed AES with Online Key Scheduling
[Figure: three round architectures, with 2, 3, and 4 pipeline stages per round]
Throughput-Area Trade-off for AES
• Area cost for the design with three pipeline stages is 35% less than the design with LUT Sbox implementation
• Area cost for the design with four pipeline stages is 30% less than the design with LUT Sbox implementation
High Speed Design with Offline Key Scheduling
• The key does not vary as frequently as the data
• Pre-calculate the key schedule and store the round keys in the round key registers
• Key schedule is done in 20 cycles
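Offline key scheduling amounts to running the AES-128 key expansion once per key and keeping the resulting round keys. The sketch below is a software illustration of that precomputation, following the standard AES-128 key expansion; it says nothing about the paper's 20-cycle hardware unit, and all function names are hypothetical.

```python
def _gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def _rotl8(v: int, s: int) -> int:
    return ((v << s) | (v >> (8 - s))) & 0xFF


def _make_sbox():
    """Build the AES S-box: field inverse (by search), then affine transform."""
    table = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if _gf_mul(x, y) == 1)
        table.append(inv ^ _rotl8(inv, 1) ^ _rotl8(inv, 2)
                     ^ _rotl8(inv, 3) ^ _rotl8(inv, 4) ^ 0x63)
    return table


SBOX = _make_sbox()


def expand_key_128(key: bytes):
    """Return the 11 round keys (16 bytes each) for a 128-bit AES key."""
    if len(key) != 16:
        raise ValueError("AES-128 needs a 16-byte key")
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 1
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]            # RotWord
            temp = [SBOX[b] for b in temp]        # SubWord
            temp[0] ^= rcon                       # add round constant
            rcon = _gf_mul(rcon, 2)
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return [bytes(sum(words[4 * r:4 * r + 4], [])) for r in range(11)]


# Computed once per key, then reused for every data block.
round_keys = expand_key_128(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(len(round_keys), "round keys; last:", round_keys[10].hex())
```

Because the round keys are reused for every block encrypted under the same key, the expansion logic can be taken off the data path entirely, which is the source of the area saving the slides report.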
Throughput-Area Trade-Off
• Offline key scheduling unit can reduce the area by up to 28%
• Area cost for the design with three pipeline stages is 37% less than the design with LUT Sbox implementation
• Area cost for the design with four pipeline stages is 33% less than the design with LUT Sbox implementation
Conclusion
• Area efficient architectures for a 30 to 70 Gbits/s AES processor
• Loop unrolling and inner and outer round pipelining were used
• Pipelined design of the composite field implementation of the byte substitution phase reduces the area cost by up to 35%
• Offline key scheduling unit reduces the area cost by up to 28%
• Total area cost of the final architecture was reduced by up to 48%