Bogazici University, Department of Computer Engineering
CmpE 220 Discrete Mathematics
09. Applications of Number Theory
Haluk Bingöl
Applications from §2.4
• Hashing Functions (hashes)
• Pseudorandom Numbers
• Cryptology
Hashing Functions
• Also known as hash functions, hash codes, or just hashes.
• Two major uses:
  • Indexing hash tables: data structures that support O(1)-time access.
  • Creating short, effectively unique IDs for long documents. These are used in digital signatures: the short ID can be signed, rather than the long document.
Hash Function Requirements
• Def. A hash function h: A → B is a map from a set A to a smaller set B (i.e., |A| ≥ |B|).
• An effective hash function should have the following properties:
  • It should cover (be onto) its codomain B.
  • It should be efficient to calculate; ideally, it should take O(log |A|) operations.
  • The cardinality of each preimage of an element of B should be about the same: ∀b_1,b_2∈B: |h⁻¹(b_1)| ≈ |h⁻¹(b_2)|. That is, elements of B should be generated with roughly uniform probability.
  • Ideally, the map should appear random, so that clearly “similar” elements of A are not likely to map to the same (or similar) elements of B.
Hash Function Requirements
• Furthermore, for a cryptographically secure hash function:
  • Given an element b∈B, the problem of finding an a∈A such that h(a) = b should have average-case time complexity of Ω(|B|^c) for some c > 0.
  • This ensures that it would take exponential time in the length of an ID for an opponent to “fake” a different document having the same ID.
A Simple Hash Using mod
• Let the domain and codomain be the sets of all natural numbers below certain bounds:
  A = {a∈ℕ | a < a_lim}, B = {b∈ℕ | b < b_lim}
• Then an acceptable (although not great) hash function from A to B (when a_lim ≥ b_lim) is h(a) = a mod b_lim.
• It has the following desirable hash function properties:
  • It covers, or is onto, its codomain B (its range is B).
  • When a_lim ≫ b_lim, each b∈B has a preimage of about the same size. Specifically, |h⁻¹(b)| = ⌊a_lim/b_lim⌋ or ⌈a_lim/b_lim⌉.
• However, it has the following limitations:
  • It is not very random. For example, if all a’s encountered happen to have the same residue mod b_lim, they will all map to the same b!
  • It is definitely not cryptographically secure. Given a b, it is easy to generate a’s that map to it: for any n∈ℕ with b + n·b_lim < a_lim, h(b + n·b_lim) = b.
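The properties claimed for h(a) = a mod b_lim can be checked numerically. The following is a minimal sketch; the bounds `a_lim = 1000` and `b_lim = 7` and the name `mod_hash` are illustrative assumptions, not values from the slides.

```python
from collections import Counter

def mod_hash(a: int, b_lim: int) -> int:
    """h(a) = a mod b_lim, mapping {0..a_lim-1} onto {0..b_lim-1}."""
    return a % b_lim

# Illustrative bounds (assumed for this sketch).
a_lim, b_lim = 1000, 7

# Count how many keys land on each hash value (the preimage sizes).
preimage_sizes = Counter(mod_hash(a, b_lim) for a in range(a_lim))
distinct_sizes = sorted(set(preimage_sizes.values()))
print(distinct_sizes)  # only floor(a_lim/b_lim) and ceil(a_lim/b_lim) occur

# Weakness: keys that share a residue mod b_lim all collide.
colliding_keys = [3 + n * b_lim for n in range(5)]
collision_targets = {mod_hash(a, b_lim) for a in colliding_keys}
print(collision_targets)
```

The second part also shows why the function is not cryptographically secure: producing keys that hash to a chosen value takes one addition per key.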
Hash Table Characteristics
• A hash table is a type of container data structure often used for representing a set.
• It has these properties:
  • Every element e stored is assigned a unique key k(e), which identifies the element and can be easily calculated from e.
  • It supports the following operations with O(1) expected (average-case) time:
    • Looking up an element e given its key k.
    • Adding a new element e to the hash table.
    • Deleting an element e from the hash table.
    • Listing the next element, in an enumeration of all elements.
Simple Hash Table Implementation
• There is an array of b_lim locations e_0, …, e_{b_lim−1} for storing elements.

procedure store(e: element)
  a := k(e)  {calculate key a of element}
  b := h(a)  {hash the key to a storage location b}
  while (e_b ≠ null ∧ k(e_b) ≠ a)  {slot holds a different element}
    b := (b+1) mod b_lim  {go to next location, circularly}
  if e_b = null then e_b := e  {store it if it wasn’t there}

procedure lookup(a∈A: desired element’s key)
  b := h(a)  {hash it to a location b}
  while (e_b ≠ null ∧ k(e_b) ≠ a)  {slot holds a different element}
    b := (b+1) mod b_lim  {go to next location, circularly}
  if e_b ≠ null then return e_b else return null  {an empty slot means the key is absent}

Exercise: What happens when the array is full? How might you fix this problem?
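The two procedures above can be sketched in Python as follows. This is one possible rendering, not the course's reference implementation: the class name `LinearProbingTable`, the `key` parameter, and the choice h(a) = a mod b_lim are assumptions, and keys are assumed to be non-negative integers. Unlike the pseudocode, the probe loop is bounded to b_lim steps, which is one way to approach the exercise.

```python
class LinearProbingTable:
    """Open-addressing hash table with linear probing (illustrative sketch)."""

    def __init__(self, b_lim, key=lambda e: e):
        self.b_lim = b_lim            # number of storage locations
        self.key = key                # k(e): extracts an integer key from e
        self.slots = [None] * b_lim   # None plays the role of "null"

    def _probe(self, a):
        """Return the slot holding key a, or the first empty slot, else None."""
        b = a % self.b_lim            # h(a) = a mod b_lim
        for _ in range(self.b_lim):   # at most one full pass over the array
            e = self.slots[b]
            if e is None or self.key(e) == a:
                return b
            b = (b + 1) % self.b_lim  # go to next location, circularly
        return None                   # table is full and key a is absent

    def store(self, e):
        b = self._probe(self.key(e))
        if b is None:
            raise OverflowError("hash table is full")
        if self.slots[b] is None:     # store it only if it wasn't there
            self.slots[b] = e

    def lookup(self, a):
        b = self._probe(a)
        return None if b is None else self.slots[b]


table = LinearProbingTable(5, key=lambda e: e[0])
table.store((3, "three"))
table.store((8, "eight"))             # 8 mod 5 = 3 collides, goes to slot 4
print(table.lookup(8), table.lookup(13))
```

Deletion is deliberately left out: with linear probing, simply clearing a slot can break later lookups, so it needs extra machinery such as tombstone markers.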
Digital Signature Application
• Many digital signature systems use a cryptographically secure (but public) hash function h which maps arbitrarily long documents down to fixed-length (e.g., 1,024-bit) “fingerprint” strings.
• Document signing procedure:
  • Given a document a to sign, quickly compute its hash b = h(a).
  • Compute a certain function c = f(b) that is known only to the signer. This step is generally slow, so we don’t want to apply it to the whole document.
  • Deliver the original document together with the digital signature c.
Digital Signature Application
• Signature verification procedure:
  • Given a document a and signature c, quickly find a’s hash b = h(a).
  • Compute b′ = f⁻¹(c). (Possible if f’s inverse f⁻¹ is made public.)
  • Compare b to b′; if they are equal then the signature is valid.
• Note that if h were not cryptographically secure, then an opponent could easily forge a different document a′ that hashes to the same value b, and thereby attach someone’s digital signature to a different document than they actually signed, and fool the verifier.
Pseudo-random Numbers
• Numbers that are generated deterministically, but that appear random for all practical purposes.
• Used in many applications, such as:
  • Hash functions
  • Simulations, games, graphics
  • Cryptographic algorithms
Linear Congruential Method
• One simple, common pseudo-random number generating procedure.
• Uses the mod operator.
• Requires four natural numbers: the modulus m, the multiplier a, the increment c, and the seed x_0, where 2 ≤ a < m, 0 ≤ c < m, 0 ≤ x_0 < m.
• Generates the pseudo-random sequence {x_n} with 0 ≤ x_n < m, via the following:
  x_{n+1} = (a·x_n + c) mod m.
• Tends to work best when a, c, m are prime, or at least relatively prime.
• If c = 0, the method is called a pure multiplicative generator.
Example
• Let modulus m = 1,000 = 2^3·5^3, to generate outputs in the range 0-999.
• Pick increment c = 467 (prime), multiplier a = 293 (also prime), seed x_0 = 426.
• Then we get the pseudo-random sequence:
  • x_1 = (a·x_0 + c) mod m = 285
  • x_2 = (a·x_1 + c) mod m = 972
  • x_3 = (a·x_2 + c) mod m = 263
  • and so on…
• Note the alternating odd and even values: this results from m being even (with a and c odd).
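The example sequence can be reproduced with a few lines of Python. This is a minimal sketch of the recurrence x_{n+1} = (a·x_n + c) mod m; the function name `lcg` and the generator-based interface are choices made here, not part of the slides.

```python
from itertools import islice

def lcg(m, a, c, seed):
    """Yield x_1, x_2, ... of the linear congruential sequence."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

# Parameters from the example: m = 1000, a = 293, c = 467, x_0 = 426.
first_three = list(islice(lcg(1000, 293, 467, 426), 3))
print(first_three)  # → [285, 972, 263]

# The parity pattern noted on the slide, over a longer prefix.
parities = [x % 2 for x in islice(lcg(1000, 293, 467, 426), 10)]
print(parities)
```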
Cryptology
• This is the study of secret (coded) messages. It includes:
  • Cryptography: methods for encrypting and decrypting secret (coded) messages.
  • Cryptanalysis: methods for code-breaking.
• Some simple early codes include Caesar’s cipher:
  • Associate each letter with its position 0-25 in the alphabet.
  • Encrypt by replacing letter n by letter (n+3) mod 26.
  • Decrypt by replacing n with (n−3) mod 26.
• This is a simple example of a shift cipher, (n+k) mod m.
• Can generalize this to affine transforms (linear 1-1 transforms) (an + b) mod 26, e.g., (7n+3) mod 26. The transform is 1-1 only when a is relatively prime to 26.
• This is still very insecure, however!
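Both the Caesar cipher and the affine generalization fit one small routine. The sketch below assumes uppercase A-Z only (other characters pass through unchanged); the function names are illustrative, and decryption relies on Python 3.8+ `pow(a, -1, m)` for the modular inverse of a.

```python
def affine_encrypt(text, a=1, b=3, m=26):
    """Replace letter n by (a*n + b) mod m; a=1, b=3 is Caesar's cipher."""
    out = []
    for ch in text.upper():
        if "A" <= ch <= "Z":
            n = ord(ch) - ord("A")
            out.append(chr((a * n + b) % m + ord("A")))
        else:
            out.append(ch)  # leave spaces and punctuation alone
    return "".join(out)

def affine_decrypt(text, a=1, b=3, m=26):
    """Invert the affine map: n = a^(-1) * (c - b) mod m (needs gcd(a, m) = 1)."""
    a_inv = pow(a, -1, m)  # raises ValueError if a has no inverse mod m
    out = []
    for ch in text.upper():
        if "A" <= ch <= "Z":
            c = ord(ch) - ord("A")
            out.append(chr((a_inv * (c - b)) % m + ord("A")))
        else:
            out.append(ch)
    return "".join(out)

print(affine_encrypt("HELLO"))                      # Caesar shift by 3
secret = affine_encrypt("HELLO WORLD", a=7, b=3)    # the (7n+3) mod 26 example
print(secret, affine_decrypt(secret, a=7, b=3))
```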
Modular Exponentiation Problem (from §2.5)
• Problem: Given large integers b (base), n (exponent), and m (modulus), efficiently compute b^n mod m.
• Note that b^n itself may be completely infeasible to compute and store directly.
  • E.g., if n is a 1,000-bit number, then b^n itself will have far more digits than there are atoms in the universe!
• Yet, this is a type of calculation that is commonly required in modern cryptographic algorithms!
Modular Exponentiation
procedure modularExponentiation(b: integer, n = (n_{k−1}…n_0)_2, m: positive integers)
  x := 1  {result will be accumulated here}
  b2i := b mod m  {b2i holds b^(2^i) mod m; i = 0 initially}
  for i := 0 to k−1  {go through all k bits of n}
    if n_i = 1 then x := (x·b2i) mod m
    b2i := (b2i·b2i) mod m
  return x
Why that Algorithm Works
• Write n by its binary expansion, n = n_{k−1}·2^{k−1} + … + n_1·2 + n_0. Then
  b^n = b^(n_{k−1}·2^{k−1} + … + n_0·2^0) = (b^(2^{k−1}))^(n_{k−1}) · … · (b^(2^0))^(n_0), where b^(2^0) = b^1 = b.
• We can compute b to various powers of 2 by repeated squaring: b^(2^{i+1}) = (b^(2^i))^2.
• Then multiply them into the partial product, or not, depending on whether the corresponding bit n_i is 1.
• Crucially, we can do the mod m operations as we go along, because (x·y) mod m = ((x mod m)·(y mod m)) mod m. All the numbers stay small.
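The square-and-multiply procedure translates almost line for line into Python. This is a sketch for non-negative n and positive m; the bits of n are consumed from least significant to most, as in the pseudocode. Python's built-in three-argument `pow` performs the same computation and is used here only as a cross-check.

```python
def modular_exponentiation(b, n, m):
    """Compute b^n mod m by repeated squaring (n >= 0, m >= 1)."""
    x = 1 % m              # accumulated result (1 % m handles m == 1)
    power = b % m          # holds b^(2^i) mod m; i = 0 initially
    while n > 0:           # go through the bits of n, lowest first
        if n & 1:          # bit n_i is 1: multiply this power in
            x = (x * power) % m
        power = (power * power) % m   # b^(2^(i+1)) = (b^(2^i))^2
        n >>= 1
    return x

# A 1,000-bit exponent is no problem: intermediate values stay below m^2.
big_n = (1 << 999) + 12345
result = modular_exponentiation(7, big_n, 1_000_003)
print(result == pow(7, big_n, 1_000_003))
```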
§2.6: More on Applications
• Misc. useful results
• Linear congruences
• Chinese Remainder Theorem
• Computer arithmetic with large integers
• Pseudoprimes
• Fermat’s Little Theorem
• Public Key Cryptography
• The Rivest-Shamir-Adleman (RSA) cryptosystem
Some Misc. Results
• Thm. (Euclid) ∀a,b > 0: gcd(a,b) = gcd(b, a mod b)
• Thm. ∀a,b > 0: ∃s,t: gcd(a,b) = sa + tb
• Lemma. ∀a,b,c > 0: gcd(a,b) = 1 ∧ a|bc → a|c
• Lemma. If p is prime and p | a_1·a_2·…·a_n (integers a_i) then ∃i: p|a_i.
• Thm. If ac ≡ bc (mod m) and gcd(c,m) = 1, then a ≡ b (mod m).
Proof That Euclid’s Algorithm Works
• Thm. gcd(a,b) = gcd(b,c) if c = a mod b.
Proof. First, c = a mod b implies ∃t: a = bt + c. Let g = gcd(a,b) and g′ = gcd(b,c). Since g|a and g|b (thus g|bt), we know g|(a−bt), i.e., g|c. Since g|b ∧ g|c, g is a common divisor of b and c, so g ≤ gcd(b,c) = g′. Now, since g′|b (thus g′|bt) and g′|c, we know g′|(bt+c), i.e., g′|a. Since g′|a ∧ g′|b, it follows that g′ ≤ gcd(a,b) = g. Since we have shown that both g ≤ g′ and g′ ≤ g, it must be the case that g = g′. ■
Proof of Theorem 1
• Thm. ∀a ≥ b ≥ 0 ∃s,t: gcd(a,b) = sa + tb
Proof. (By induction over the value of the larger argument a.) From theorem 0 (Euclid), we know that gcd(a,b) = gcd(b,c) if c = a mod b, in which case a = kb + c for some integer k, so c = a − kb. Now, since b < a and c < b, by the inductive hypothesis we can assume that ∃u,v: gcd(b,c) = ub + vc. Substituting for c, this is ub + v(a−kb), which we can regroup to get va + (u−vk)b. So now let s = v, and let t = u−vk, and we’re finished. The base case is solved by s = 1, t = 0, which works for gcd(a,0), or if a = b originally. ■
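The inductive proof is constructive, so it can be read as a recursive program. The sketch below follows the proof's own bookkeeping (s = v, t = u − vk); the function names are chosen here for illustration, and inputs are assumed to be non-negative integers that are not both zero.

```python
def gcd(a, b):
    """Euclid's algorithm: gcd(a, b) = gcd(b, a mod b) until b is 0."""
    while b:
        a, b = b, a % b
    return a

def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) = s*a + t*b."""
    if b == 0:
        return a, 1, 0                 # base case: gcd(a, 0) = 1*a + 0*0
    k, c = divmod(a, b)                # a = k*b + c with c = a mod b
    g, u, v = extended_gcd(b, c)       # inductive step: g = u*b + v*c
    return g, v, u - v * k             # regroup: g = v*a + (u - v*k)*b

g, s, t = extended_gcd(252, 198)
print(g, s, t, s * 252 + t * 198)
```

The recursion depth matches the number of division steps in Euclid's algorithm, which grows only logarithmically in the smaller argument.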
Proof of Lemma 1
• Lemma. gcd(a,b) = 1 ∧ a|bc → a|c
Proof. Applying theorem 1, ∃s,t: sa + tb = 1. Multiplying through by c, we have that sac + tbc = c. Since a|bc is given, we know that a|tbc, and obviously a|sac. Thus (using the theorem on p.154), it follows that a|(sac+tbc); in other words, that a|c. ■
Proof of Lemma 2
• Lemma. Prime p | a_1…a_n → ∃i: p|a_i.
Proof. If n = 1, this is immediate since p|a_1 → p|a_1. Suppose the lemma is true for all n < k and suppose p | a_1…a_k. If p|m where m = a_1…a_{k−1}, then we have it inductively. Otherwise, we have p | m·a_k but ¬(p|m). Since m is not a multiple of p, and p has no factors other than 1 and itself, m has no common factors with p, thus gcd(m,p) = 1. So by applying lemma 1, p|a_k. ■
Uniqueness of Prime Factorizations
• “The prime factorization of any number n is unique.”
• Thm. If p_1…p_s = q_1…q_t are equal products of two nondecreasing sequences of primes, then s = t and p_i = q_i for all i.
Proof. Assume (without loss of generality) that all primes in common have already been divided out, so that ∀i,j: p_i ≠ q_j. But since p_1…p_s = q_1…q_t, we have that p_1 | q_1…q_t, since p_1·(p_2…p_s) = q_1…q_t. Then applying lemma 2, ∃j: p_1|q_j. Since q_j is prime, it has no divisors other than itself and 1, so it must be that p_1 = q_j. This contradicts the assumption ∀i,j: p_i ≠ q_j. The only resolution is that after the common primes are divided out, both lists of primes were empty, so we couldn’t pick out p_1. In other words, the two lists must have been identical to begin with! ■
Proof of Theorem 2
• Thm. If ac ≡ bc (mod m) and gcd(c,m) = 1, then a ≡ b (mod m).
Proof. Since ac ≡ bc (mod m), this means m | ac−bc. Factoring the right side, we get m | c(a−b). Since gcd(c,m) = 1, lemma 1 implies that m | a−b, in other words, that a ≡ b (mod m). ■
An Application of Theorem 2
• Suppose we have a pure multiplicative pseudo-random number generator {x_n} using a multiplier a that is relatively prime to the modulus m.
• Then the transition function that maps from x_n to x_{n+1} is bijective.
  • Because if x_{n+1} = a·x_n mod m = a·x_n′ mod m, then x_n ≡ x_n′ (mod m) by theorem 2, and since both lie in the range 0 to m−1, x_n = x_n′.
• This in turn implies that the sequence of numbers generated cannot repeat until the original number is re-encountered.
• And this means that on average, we will visit a large fraction of the numbers in the range 0 to m−1 before we begin to repeat!
  • Intuitively, because the chance of hitting the first number in the sequence is 1/m, so it will take Θ(m) tries on average to get to it.
• Thus, the multiplier a ought to be chosen relatively prime to the modulus, to avoid repeating too soon.
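For small moduli the bijectivity claim can be checked by brute force. The sketch below is only an illustration: the helper names and the sample values m = 30 and m = 101 are assumptions, not part of the slides.

```python
from math import gcd

def step_is_bijective(a, m):
    """True if x -> a*x mod m is a permutation of {0, ..., m-1}."""
    return len({(a * x) % m for x in range(m)}) == m

def first_repeat(a, m, seed):
    """Return (index, value) of the first repeated value in x_{n+1} = a*x_n mod m."""
    seen = {}
    x, n = seed, 0
    while x not in seen:
        seen[x] = n
        x = (a * x) % m
        n += 1
    return n, x

m = 30
agreement = all(step_is_bijective(a, m) == (gcd(a, m) == 1) for a in range(1, m))
print(agreement)

# With gcd(a, m) = 1 the first value to recur is the seed itself.
print(first_repeat(7, 101, 5))
# With gcd(a, m) > 1 the sequence can fall into a cycle that excludes the seed.
print(first_repeat(6, 30, 5))
```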
Linear Congruences, Inverses
• A congruence of the form ax ≡ b (mod m) is called a linear congruence.
  • To solve the congruence is to find the x’s that satisfy it.
• An inverse of a, modulo m, is any integer a′ such that a′a ≡ 1 (mod m).
  • If we can find such an a′, notice that we can then solve ax ≡ b by multiplying through by it, giving a′ax ≡ a′b, thus 1·x ≡ a′b, thus x ≡ a′b.
• Theorem 3: If gcd(a,m) = 1 and m > 1, then a has a unique (modulo m) inverse a′.
• Proof: By theorem 1, ∃s,t: sa + tm = 1, so sa + tm ≡ 1 (mod m). Since tm ≡ 0 (mod m), sa ≡ 1 (mod m). Thus s is an inverse of a (mod m). Theorem 2 guarantees that if ra ≡ sa ≡ 1 then r ≡ s, thus this inverse is unique mod m. (All inverses of a are in the same congruence class as s.) ■
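The proof of Theorem 3 gives a recipe: run the extended Euclidean algorithm on a and m, and the coefficient s is the inverse. A minimal sketch follows, with function names chosen here for illustration; it uses an iterative form of the extended algorithm and assumes gcd(a, m) = 1 for the solver.

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) = s*a + t*b (iterative form)."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

def mod_inverse(a, m):
    """Return the inverse of a modulo m, reduced to the range 0..m-1."""
    g, s, _ = extended_gcd(a, m)
    if g != 1:
        raise ValueError("a has no inverse modulo m unless gcd(a, m) = 1")
    return s % m

def solve_linear_congruence(a, b, m):
    """Solve a*x ≡ b (mod m) as x ≡ a'*b (mod m), assuming gcd(a, m) = 1."""
    return (mod_inverse(a, m) * b) % m

x = solve_linear_congruence(3, 4, 7)
print(mod_inverse(3, 7), x, (3 * x) % 7)
```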
Chinese Remainder Theorem
• Theorem: (Chinese remainder theorem.) Let m_1,…,m_n > 0 be pairwise relatively prime. Then the system of congruences x ≡ a_i (mod m_i) (for i = 1 to n) has a unique solution modulo m = m_1·…·m_n.
• Proof: Let M_i = m/m_i. (Thus gcd(m_i, M_i) = 1.) So by theorem 3, ∃y_i = M_i′ such that y_i·M_i ≡ 1 (mod m_i). Now let x = ∑_i a_i·y_i·M_i. Since m_i | M_k for k ≠ i, M_k ≡ 0 (mod m_i), so x ≡ a_i·y_i·M_i ≡ a_i (mod m_i). Thus, the congruences hold. (Uniqueness is an exercise.) □
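The existence proof builds the solution explicitly as x = ∑ a_i·y_i·M_i, which is easy to code. The following sketch assumes the moduli are pairwise relatively prime and relies on Python 3.8+ for `math.prod` and for `pow(M_i, -1, m_i)` as the modular inverse; the function name `crt` is a choice made here.

```python
from math import prod

def crt(residues, moduli):
    """Return the unique x in 0..m-1 with x ≡ a_i (mod m_i) for every i."""
    m = prod(moduli)
    x = 0
    for a_i, m_i in zip(residues, moduli):
        M_i = m // m_i               # product of all the other moduli
        y_i = pow(M_i, -1, m_i)      # inverse of M_i modulo m_i (theorem 3)
        x += a_i * y_i * M_i         # contributes a_i mod m_i, 0 mod the others
    return x % m

# x ≡ 2 (mod 3), x ≡ 3 (mod 5), x ≡ 2 (mod 7)
solution = crt([2, 3, 2], [3, 5, 7])
print(solution)  # → 23
```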
Computer Arithmetic with Large Integers
• By the Chinese Remainder Theorem, an integer a where 0 ≤ a < m = ∏m_i, with gcd(m_i, m_j) = 1 for i ≠ j, can be represented by a’s residues mod m_i:
  (a mod m_1, a mod m_2, …, a mod m_n)
• To perform arithmetic upon large integers represented in this way, simply perform the operations on these separate residues!
  • Each of these might be done in a single machine operation.
  • The operations may be easily parallelized on a vector machine.
• Works so long as m > the desired result.
Computer Arithmetic Example
• For example, the following numbers are pairwise relatively prime:
  • m_1 = 2^25−1 = 33,554,431 = 31 · 601 · 1,801
  • m_2 = 2^27−1 = 134,217,727 = 7 · 73 · 262,657
  • m_3 = 2^28−1 = 268,435,455 = 3 · 5 · 29 · 43 · 113 · 127
  • m_4 = 2^29−1 = 536,870,911 = 233 · 1,103 · 2,089
  • m_5 = 2^31−1 = 2,147,483,647 (prime)
• Thus, we can uniquely represent all numbers up to m = ∏m_i ≈ 1.4×10^42 ≈ 2^140 by their residues r_i modulo these five m_i.
  • E.g., 10^30 = (r_1 = 20,900,945; r_2 = 18,304,504; r_3 = 65,829,085; r_4 = 516,865,185; r_5 = 1,234,980,730)
• To add two such numbers in this representation:
  • Just add their corresponding residues using machine-native 32-bit integers.
  • Take the result mod 2^k−1:
    • If the result is ≥ the appropriate 2^k−1 value, subtract out 2^k−1.
    • Equivalently, when the sum overflows k bits, just take the low k bits and add 1.
• Note: No carries are needed between the different pieces!
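The residue representation and the "low k bits plus 1" trick can be sketched as follows. Python integers are unbounded, so this only imitates what 32-bit hardware would do; the function names are illustrative, and the conversion back uses the CRT construction with Python 3.8+ `pow(M_i, -1, m_i)`.

```python
from math import prod

EXPONENTS = [25, 27, 28, 29, 31]
MODULI = [2**k - 1 for k in EXPONENTS]   # the five moduli from the slide

def to_residues(a):
    """Represent a (0 <= a < product of MODULI) by its residues."""
    return [a % m for m in MODULI]

def add_mod_mersenne(x, y, k):
    """Add two residues modulo 2^k - 1 without a division."""
    mask = (1 << k) - 1          # this is also the modulus 2^k - 1
    s = x + y
    if s > mask:                 # the sum overflowed k bits
        s = (s & mask) + 1       # low k bits, plus 1 for the dropped 2^k
    return 0 if s == mask else s # 2^k - 1 itself is congruent to 0

def add_residues(r, s):
    """Component-wise addition; no carries between components."""
    return [add_mod_mersenne(x, y, k) for x, y, k in zip(r, s, EXPONENTS)]

def from_residues(r):
    """Recover the integer from its residues via the CRT construction."""
    m = prod(MODULI)
    total = 0
    for r_i, m_i in zip(r, MODULI):
        M_i = m // m_i
        total += r_i * pow(M_i, -1, m_i) * M_i
    return total % m

a, b = 10**30, 987654321987654321
total = from_residues(add_residues(to_residues(a), to_residues(b)))
print(total == a + b)
```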
Pseudoprimes
• Ancient Chinese mathematicians noticed that whenever n is an odd prime, 2^(n−1) ≡ 1 (mod n).
  • Some also claimed that the converse was true.
• However, it turns out that the converse is not true!
  • If 2^(n−1) ≡ 1 (mod n), it doesn’t follow that n is prime.
  • For example, 341 = 11·31, but 2^340 ≡ 1 (mod 341).
• Composites n with this property are called pseudoprimes.
  • More generally, if b^(n−1) ≡ 1 (mod n) and n is composite, then n is called a pseudoprime to the base b.
Carmichael numbers
• These are sort of the “ultimate pseudoprimes.”
• A Carmichael number is a composite n such that b^(n−1) ≡ 1 (mod n) for all b relatively prime to n.
• The smallest few are 561, 1105, 1729, 2465, 2821, 6601, 8911, 10585, 15841, 29341.
• Well, so what? Who cares?
  • Exercise for the student: Do some research and find me a useful & interesting application of Carmichael numbers. (Extra credit.)
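Both definitions can be tested directly with modular exponentiation. The sketch below uses brute force (trial division and a loop over all bases), so it is only suitable for small n; the helper names are assumptions made for illustration.

```python
from math import gcd

def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def is_pseudoprime(n, b):
    """True if n is composite and b^(n-1) ≡ 1 (mod n)."""
    return n > 1 and not is_prime(n) and pow(b, n - 1, n) == 1

def is_carmichael(n):
    """True if n is composite and passes the test for every base coprime to n."""
    if n < 3 or is_prime(n):
        return False
    return all(pow(b, n - 1, n) == 1 for b in range(2, n) if gcd(b, n) == 1)

print(is_pseudoprime(341, 2), is_pseudoprime(341, 3))
small_carmichaels = [n for n in range(2, 3000) if is_carmichael(n)]
print(small_carmichaels)
```

Changing the base exposes 341 as composite, but no choice of coprime base exposes a Carmichael number, which is what makes them the hard case for a Fermat-style primality test.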
Fermat’s Little Theorem
• Fermat generalized the ancient observation that 2^(p−1) ≡ 1 (mod p) for odd primes p to the following more general theorem:
• Theorem: (Fermat’s Little Theorem.)
  • If p is prime and a is any non-negative integer, then a^p ≡ a (mod p).
  • Furthermore, if ¬(p|a), then a^(p−1) ≡ 1 (mod p).
Public Key Cryptography
• In private key cryptosystems, the same secret “key” string is used to both encode and decode messages.
  • This raises the problem of how to securely communicate the key strings.
• In public key cryptosystems, instead there are two complementary keys.
  • One key decrypts the messages that the other one encrypts.
• This means that one key (the public key) can be made public, while the other (the private key) can be kept secret from everyone.
  • Messages to the owner can be encrypted by anyone using the public key, but can only be decrypted by the owner using the private key. This is like having a private lock-box with a slot for messages.
  • Or, the owner can encrypt a message with their private key, and then anyone can decrypt it, and know that only the owner could have encrypted it. This is the basis of digital signature systems.
• The most famous public-key cryptosystem is RSA.
  • It is based entirely on number theory!
Rivest-Shamir-Adleman (RSA)
• The private key consists of:
  • A pair p, q of large random prime numbers, and
  • d, an inverse of e modulo (p−1)(q−1).
• The public key consists of:
  • The product n = pq (but not p and q), and
  • An exponent e that is relatively prime to (p−1)(q−1) (but not d).
• To encrypt a message encoded as an integer M < n: compute C = M^e mod n.
• To decrypt the encoded message C: compute M = C^d mod n.
Why RSA Works
• Theorem (Correctness of RSA): (M^e)^d ≡ M (mod n).
Proof:
• By the definition of d, we know that de ≡ 1 [mod (p−1)(q−1)].
• Thus by the definition of modular congruence, ∃k: de = 1 + k(p−1)(q−1).
• So, the result of decryption is C^d ≡ (M^e)^d = M^(de) = M^(1+k(p−1)(q−1)) (mod n).
• Assuming that M is not divisible by either p or q (which is nearly always the case when p and q are very large), Fermat’s Little Theorem tells us that M^(p−1) ≡ 1 (mod p) and M^(q−1) ≡ 1 (mod q).
• Thus, we have that the following two congruences hold:
  • First: C^d ≡ M·(M^(p−1))^(k(q−1)) ≡ M·1^(k(q−1)) ≡ M (mod p)
  • Second: C^d ≡ M·(M^(q−1))^(k(p−1)) ≡ M·1^(k(p−1)) ≡ M (mod q)
• And since gcd(p,q) = 1, we can use the Chinese Remainder Theorem to show that therefore C^d ≡ M (mod pq):
  • Both C^d and M are solutions of the pair of congruences x ≡ M (mod p) and x ≡ M (mod q). By the CRT, the solution is unique modulo pq, so C^d ≡ M (mod pq). ■
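A toy run makes the correctness theorem concrete. The sketch below uses deliberately tiny primes (p = 61, q = 53) and e = 17, all chosen here for illustration; real RSA needs large random primes and message padding, neither of which is shown. It relies on Python 3.8+ `pow(e, -1, phi)` for the inverse.

```python
from math import gcd

def make_keys(p, q, e):
    """Return ((n, e), (n, d)) with d an inverse of e modulo (p-1)(q-1)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be relatively prime to (p-1)(q-1)")
    d = pow(e, -1, phi)
    return (n, e), (n, d)

def encrypt(M, public_key):
    n, e = public_key
    return pow(M, e, n)      # C = M^e mod n

def decrypt(C, private_key):
    n, d = private_key
    return pow(C, d, n)      # M = C^d mod n

public_key, private_key = make_keys(61, 53, 17)
M = 65
C = encrypt(M, public_key)
print(public_key, private_key, C, decrypt(C, private_key))
```

Swapping the roles of the two exponents (raising to d first, then to e) gives the signature scheme described on the digital signature slides, with f(b) = b^d mod n and f⁻¹(c) = c^e mod n.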
References
• Rosen, Discrete Mathematics and Its Applications, 5e, McGraw-Hill, 2003