Remote Timing Attacks are Practical

Remote Timing Attacks are Practical: An Overview by Rahul Deshpande

What are Timing Attacks? • Extracting secrets by observing the time a system takes to respond to various queries • E.g., Kocher designed a timing attack to expose secret keys used for RSA.

Timing Attacks • Usually used to attack weak computing devices such as Smart Cards • Also applicable to general software systems • Practical against network servers

Common Assumptions • The attack is only applicable to hardware security devices • The attack cannot be used against general-purpose servers, since decryption times are masked by the many concurrent processes running on the system.

Challenging the Assumptions • A remote timing attack against OpenSSL was developed. • OpenSSL: an SSL library commonly used in web servers and other SSL applications. • The attacking client measures the time an OpenSSL server takes to respond to decryption queries. • The client is able to extract the private key stored on the server.

Environments in which the attack is applicable • Network: between two machines in different buildings with multiple routers and switches between them. • Interprocess: between two processes running on the same machine. • Virtual Machines: extracting an RSA private key from a secure Virtual Machine (VM), invalidating the isolation provided by the Virtual Machine Monitor (VMM).

OpenSSL Decryption • RSA decryption is done using modular exponentiation: $m = c^d \bmod N$, where $N = pq$ is the RSA modulus. • OpenSSL uses the Chinese Remainder Theorem (CRT) to perform the exponentiation. • CRT computes the exponentiation in two steps: it computes $m_1 = c^d \bmod p$ and $m_2 = c^d \bmod q$, then combines the two to get $m$. • Decryption with CRT gives up to a factor of four speedup. • A timing attack can expose the factors $p$ and $q$ of $N$ used in CRT.
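A minimal Python sketch of CRT-based RSA decryption, assuming a textbook-sized toy key (p = 61, q = 53, d = 2753, so N = 3233; not secure). Each half uses an exponent reduced modulo p - 1 or q - 1 and a half-size modulus, which is where the speedup comes from. The recombination step shown is Garner's formula, one common way to combine $m_1$ and $m_2$; it is not claimed to be OpenSSL's exact code.

```python
def rsa_crt_decrypt(c: int, p: int, q: int, d: int) -> int:
    """Decrypt c with private exponent d using the CRT split over p and q."""
    dp = d % (p - 1)              # reduced exponent for the mod-p half
    dq = d % (q - 1)              # reduced exponent for the mod-q half
    q_inv = pow(q, -1, p)         # q^{-1} mod p (Python 3.8+)
    m1 = pow(c, dp, p)            # m1 = c^d mod p, half-size modulus
    m2 = pow(c, dq, q)            # m2 = c^d mod q
    h = (q_inv * (m1 - m2)) % p   # Garner recombination
    return m2 + h * q             # unique m in [0, pq) with residues m1, m2


p, q, d = 61, 53, 2753            # hypothetical toy key: N = 3233, e = 17
print(rsa_crt_decrypt(2790, p, q, d))  # → 65
```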

The Chinese Remainder Theorem • It is possible to reconstruct integers in a certain range from their residues modulo a set of pairwise relatively prime moduli. • E.g., the 10 integers in $\mathbb{Z}_{10} = \{0, 1, \ldots, 9\}$ can be reconstructed from their two residues modulo 2 and 5 (the relatively prime factors of 10). • Provides a way to manipulate large numbers mod $M$ in terms of tuples of smaller numbers. • CRT can be formulated as follows: let $M = \prod_{i=1}^{k} m_i$, where the $m_i$ are pairwise relatively prime. • Any integer in $\mathbb{Z}_M$ can be represented by a $k$-tuple whose elements are in $\mathbb{Z}_{m_i}$, using the correspondence $A \leftrightarrow (a_1, a_2, \ldots, a_k)$, where $a_i = A \bmod m_i$.
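A quick Python check of the $\mathbb{Z}_{10}$ example: mapping each integer 0..9 to its residues modulo 2 and 5 gives ten distinct pairs, so every integer can be recovered from its pair. The helper name `residue_tuple` is illustrative only.

```python
def residue_tuple(a: int, moduli: tuple[int, ...]) -> tuple[int, ...]:
    """Map A to (a_1, ..., a_k) with a_i = A mod m_i."""
    return tuple(a % m for m in moduli)


moduli = (2, 5)  # pairwise relatively prime, product M = 10
forward = {a: residue_tuple(a, moduli) for a in range(10)}
inverse = {pair: a for a, pair in forward.items()}

# Ten distinct pairs means the correspondence A <-> (a1, a2) is one-to-one.
assert len(inverse) == 10
print(forward[7], inverse[(1, 2)])  # → (1, 2) 7
```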

The Chinese Remainder Theorem • Let $n = n_1 n_2 \cdots n_k$ with $\gcd(n_i, n_j) = 1$ whenever $i \neq j$. • Then the system of congruences $x \equiv x_1 \pmod{n_1}, \ldots, x \equiv x_k \pmod{n_k}$ has a simultaneous solution $x$, and there is exactly one solution with $0 \le x \le n - 1$.
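The theorem is constructive. A minimal Python sketch of the standard solution formula, assuming pairwise coprime moduli (the function name `crt_solve` is hypothetical): each term is congruent to $x_i$ modulo $n_i$ and to 0 modulo every other modulus, so their sum solves all congruences at once.

```python
from math import gcd, prod


def crt_solve(residues: list[int], moduli: list[int]) -> int:
    """Return the unique x in [0, n) with x ≡ residues[i] (mod moduli[i])."""
    for i, a in enumerate(moduli):
        for b in moduli[i + 1:]:
            if gcd(a, b) != 1:
                raise ValueError("moduli must be pairwise relatively prime")
    n = prod(moduli)
    x = 0
    for x_i, n_i in zip(residues, moduli):
        big_n = n // n_i                         # product of the other moduli
        x += x_i * big_n * pow(big_n, -1, n_i)   # ≡ x_i mod n_i, ≡ 0 mod the rest
    return x % n


print(crt_solve([2, 3, 2], [3, 5, 7]))  # → 23
```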

Speedup RSA with CRT • Any message $M < N$ is uniquely represented by the tuple $[M_P, M_Q]$, where $M_P = M \bmod P$ and $M_Q = M \bmod Q$. • $C_P = C \bmod P$ and $C_Q = C \bmod Q$. • $D_P = D \bmod (P-1)$ and $D_Q = D \bmod (Q-1)$. • $R_P = Q^{P-1} \bmod N$ and $R_Q = P^{Q-1} \bmod N$. • $M_P = C_P^{D_P} \bmod P$ and $M_Q = C_Q^{D_Q} \bmod Q$. • $S_P = M_P R_P \bmod N$ and $S_Q = M_Q R_Q \bmod N$. • $M = S_P + S_Q$; if $M \ge N$, then compute $M = M - N$. • Reference: Johann Großschädl, “The Chinese Remainder Theorem and its Application in a High-Speed RSA Crypto Chip”
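The formulation above translates almost line for line into Python. A minimal sketch, assuming a hypothetical toy key P = 61, Q = 53, D = 2753 (so N = 3233). $R_P$ is 1 modulo P and 0 modulo Q (and $R_Q$ the reverse), so $S_P + S_Q$ has the right residue modulo each prime and is below $2N$.

```python
def crt_decrypt_grosschadl(C: int, P: int, Q: int, D: int) -> int:
    """CRT decryption using the R_P / R_Q recombination from the slide."""
    N = P * Q
    CP, CQ = C % P, C % Q
    DP, DQ = D % (P - 1), D % (Q - 1)
    RP = pow(Q, P - 1, N)   # ≡ 1 (mod P) by Fermat, ≡ 0 (mod Q)
    RQ = pow(P, Q - 1, N)   # ≡ 0 (mod P), ≡ 1 (mod Q)
    MP = pow(CP, DP, P)
    MQ = pow(CQ, DQ, Q)
    SP = MP * RP % N
    SQ = MQ * RQ % N
    M = SP + SQ             # < 2N, so at most one subtraction is needed
    if M >= N:
        M -= N
    return M


print(crt_decrypt_grosschadl(2790, 61, 53, 2753))  # → 65
```

In practice $R_P$ and $R_Q$ depend only on the key, so they would be computed once rather than on every decryption as in this sketch.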

Exponentiation • The simplest algorithm to compute $g^d \bmod q$ is square and multiply. • OpenSSL uses an optimization of square and multiply called sliding window exponentiation.
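A minimal left-to-right square-and-multiply sketch in Python. The multiplication by $g$ happens only on 1 bits of $d$, which is exactly the kind of data-dependent step a timing attack looks for. Plain Python integers stand in for OpenSSL's multi-precision numbers.

```python
def square_and_multiply(g: int, d: int, q: int) -> int:
    """Left-to-right binary exponentiation: compute g^d mod q."""
    result = 1 % q
    for bit in bin(d)[2:]:            # scan exponent bits, most significant first
        result = result * result % q  # square for every bit
        if bit == "1":
            result = result * g % q   # multiply by g only when the bit is 1
    return result


print(square_and_multiply(3, 11, 1000))  # → 147
```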

Sliding Window Exponentiation • A block of bits (window) of $d$ is processed at each iteration. • Requires precomputing a multiplication table, taking time proportional to $2^{w-1} + 1$ for a window size of $w$. • For a 1024-bit modulus, OpenSSL uses a window size of five. • Attack: by querying on many inputs $g$, the attacker exposes information about the bits of the factor $q$. • The attack on sliding windows is harder than on square and multiply because sliding windows perform fewer multiplications.
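A minimal Python sketch of left-to-right sliding window exponentiation with a precomputed table of odd powers. The window size `w` defaults to 5 to match the slide; the names and table layout are illustrative assumptions, not OpenSSL's implementation. Each window costs one table multiplication instead of one multiplication per 1 bit.

```python
def sliding_window_pow(g: int, d: int, q: int, w: int = 5) -> int:
    """Compute g^d mod q, processing up to w exponent bits per multiplication."""
    g %= q
    g2 = g * g % q
    table = {1: g}                     # odd powers g^1, g^3, ..., g^(2^w - 1)
    for k in range(3, 1 << w, 2):      # 2^(w-1) table entries in total
        table[k] = table[k - 2] * g2 % q
    bits = bin(d)[2:]
    result = 1 % q
    i = 0
    while i < len(bits):
        if bits[i] == "0":
            result = result * result % q   # zero bit: square only
            i += 1
            continue
        # Longest window of at most w bits that starts and ends with a 1.
        j = min(i + w, len(bits))
        while bits[j - 1] == "0":
            j -= 1
        for _ in range(j - i):
            result = result * result % q   # one squaring per bit in the window
        result = result * table[int(bits[i:j], 2)] % q  # one multiply per window
        i = j
    return result


print(sliding_window_pow(3, 11, 1000, w=2))  # → 147
```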

Montgomery Reduction • Reduction modulo $q$ via multiprecision division, returning the remainder, is expensive. • Montgomery proposed a method for implementing reduction modulo $q$ using a series of operations that are efficient in hardware and software. • Montgomery reduction transforms a reduction modulo $q$ into a reduction modulo some power of two, denoted by $R$. • Reduction modulo a power of two is faster, since it amounts to bit shifts and masks that are easy to implement in hardware. • All variables must first be put into Montgomery form ($aR \bmod q$).
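A minimal Python sketch of Montgomery reduction (REDC) and Montgomery-form multiplication, assuming an odd modulus $q < R = 2^{r}$. The function names and the `r_bits` parameter are illustrative, not OpenSSL's API. Reduction mod $R$ becomes a bit mask and division by $R$ becomes a right shift.

```python
def montgomery_setup(q: int, r_bits: int) -> int:
    """Return q' = -q^{-1} mod R for R = 2**r_bits (q must be odd and < R)."""
    R = 1 << r_bits
    if q % 2 == 0 or q >= R:
        raise ValueError("need an odd q < R")
    return (-pow(q, -1, R)) % R


def redc(t: int, q: int, r_bits: int, q_prime: int) -> int:
    """Return t * R^{-1} mod q for 0 <= t < q*R, using only masks and shifts."""
    mask = (1 << r_bits) - 1
    m = ((t & mask) * q_prime) & mask   # chosen so that t + m*q ≡ 0 (mod R)
    u = (t + m * q) >> r_bits           # exact division by R; u < 2q
    return u - q if u >= q else u       # final conditional subtraction


def to_mont(a: int, q: int, r_bits: int) -> int:
    return (a << r_bits) % q            # a -> aR mod q


def mont_mul(x: int, y: int, q: int, r_bits: int, q_prime: int) -> int:
    return redc(x * y, q, r_bits, q_prime)   # (aR)(bR)R^{-1} = abR mod q


q, r_bits = 1_000_003, 32
qp = montgomery_setup(q, r_bits)
a, b = 123_456, 654_321
prod_mont = mont_mul(to_mont(a, q, r_bits), to_mont(b, q, r_bits), q, r_bits, qp)
print(redc(prod_mont, q, r_bits, qp) == a * b % q)  # → True
```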

Montgomery Reduction • At the end of the reduction, the output $cR$ is checked to see whether it is greater than or equal to $q$. • If $cR \ge q$, $q$ is subtracted from the output to keep $cR$ in the range $[0, q)$. This extra step is called an extra reduction. • Extra reductions cause timing differences for different inputs. • Detecting the timing differences caused by extra reductions tells the attacker how close $g$ is to a multiple of one of the factors.
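To see why extra reductions leak information, a small Python experiment (an illustration under stated assumptions, not a measurement from the original work) counts how often the final subtraction fires when a random $x$ is Montgomery-multiplied by a fixed Montgomery-form value $y$. A short calculation from the REDC formula suggests a probability of about $y/(2R)$ for uniformly random $x$, so the rate grows as $y$ approaches $q$. The modulus, seed, and trial count are arbitrary assumptions.

```python
import random


def redc_with_flag(t: int, q: int, r_bits: int, q_prime: int) -> tuple[int, bool]:
    """Montgomery reduction that also reports whether the extra reduction ran."""
    mask = (1 << r_bits) - 1
    m = ((t & mask) * q_prime) & mask
    u = (t + m * q) >> r_bits
    if u >= q:
        return u - q, True      # extra reduction: the data-dependent step
    return u, False


def extra_reduction_rate(y: int, q: int, r_bits: int, trials: int = 4000, seed: int = 1) -> float:
    """Fraction of random x in [0, q) for which REDC(x*y) needs an extra reduction."""
    rng = random.Random(seed)
    R = 1 << r_bits
    q_prime = (-pow(q, -1, R)) % R
    hits = 0
    for _ in range(trials):
        _, extra = redc_with_flag(rng.randrange(q) * y, q, r_bits, q_prime)
        hits += extra
    return hits / trials


q, r_bits = 0xF123456789ABCDEF, 64   # arbitrary odd modulus just below R = 2**64
for y in (q // 100, q // 2, q - 1):
    predicted = y / (2 * (1 << r_bits))
    measured = extra_reduction_rate(y, q, r_bits)
    print(f"y/q = {y / q:.2f}  measured {measured:.3f}  predicted {predicted:.3f}")
```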

Multiplication Routines • RSA operations make use of a multi-precision integer multiplication routine. • OpenSSL implements two multiplication routines: Karatsuba and normal. • Karatsuba is used when multiplying two numbers with an equal number of words; it takes time $O(n^{\log_2 3}) \approx O(n^{1.58})$. • Normal multiplication is used when multiplying two numbers with unequal word sizes $n$ and $m$; it takes time $O(nm)$.
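A minimal recursive Karatsuba sketch in Python on non-negative integers. It splits operands by bits rather than by machine words as OpenSSL does, and the 64-bit cutoff below which it falls back to ordinary multiplication is an arbitrary assumption. Three half-size products replace four, which is where the $n^{\log_2 3}$ bound comes from.

```python
def karatsuba(x: int, y: int, cutoff_bits: int = 64) -> int:
    """Multiply non-negative integers x and y with Karatsuba's three-product trick."""
    if x.bit_length() <= cutoff_bits or y.bit_length() <= cutoff_bits:
        return x * y                      # base case: plain multiplication
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask
    z0 = karatsuba(x_lo, y_lo, cutoff_bits)
    z2 = karatsuba(x_hi, y_hi, cutoff_bits)
    # (x_lo + x_hi)(y_lo + y_hi) - z0 - z2 = x_lo*y_hi + x_hi*y_lo
    z1 = karatsuba(x_lo + x_hi, y_lo + y_hi, cutoff_bits) - z0 - z2
    return (z2 << (2 * half)) + (z1 << half) + z0


print(karatsuba(2**512 + 12345, 2**511 + 678) == (2**512 + 12345) * (2**511 + 678))  # → True
```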

Multiplication Routines • Normal multiplication takes quadratic time for numbers of approximately the same size. • Multiplying two numbers of unequal word size takes longer than multiplying two numbers of equal word size. • This fact is used in the timing attack on OpenSSL. • The underlying word multiplication algorithm dominates the total time for a decryption. • In OpenSSL, it takes 30%-40% of the total running time.
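An idealized count of single-word multiplications makes the size comparison concrete. This is a simplified model, not a measurement: Karatsuba is assumed to recurse all the way down to one word, and additions and OpenSSL's real recursion cutoffs are ignored. 16 words corresponds to a 512-bit factor on a 32-bit machine.

```python
def karatsuba_word_mults(n: int) -> int:
    """Word multiplications for Karatsuba on two n-word operands (idealized)."""
    if n <= 1:
        return 1
    half = (n + 1) // 2
    return 3 * karatsuba_word_mults(half)


def normal_word_mults(n: int, m: int) -> int:
    """Word multiplications for schoolbook multiplication of n-word by m-word."""
    return n * m


print(karatsuba_word_mults(16), normal_word_mults(16, 15))  # → 81 240
```

Under this model, dropping off the equal-size Karatsuba path for nearly equal 16-word operands roughly triples the word multiplications.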

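Comparison of Timing Differences • Two algorithmic data dependencies in OpenSSL cause time variance in RSA decryption: 1. the number of extra reductions in Montgomery reduction; 2. the choice of multiplication routine. • The effects of these two optimizations counteract each other. • Multiplication routine: decrypting $g < q$ is faster than decrypting $g > q$, because Karatsuba is used only for operands of equal word size. • Montgomery reduction: the reverse holds, since $g$ just below $q$ causes more extra reductions than $g$ just above $q$.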

A Timing Attack on OpenSSL • Exposes the factorization of the RSA modulus. • Builds approximations of $q$ that get progressively closer as the attack proceeds. • Can be viewed as a binary search for $q$. • After the most significant half of the bits of $q$ are recovered, Coppersmith's algorithm is used to retrieve the complete factorization. • The value of the decryption is not needed; only the time the decryption takes is needed.

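Timing Attack on OpenSSL • Suppose the top $i-1$ bits of $q$ are already known. Let $g$ be an integer with the same top $i-1$ bits as $q$ and all remaining bits 0. • Let $g_{hi}$ be the same as $g$, but with the $i$-th bit set to 1. If bit $i$ of $q$ is 1, then $g < g_{hi} < q$; otherwise $g < q < g_{hi}$. • Measure the time to decrypt both $ug$ and $ug_{hi}$, denoted $t_1$ and $t_2$, where $u = R^{-1} \bmod N$ cancels the Montgomery factor $R$ so that OpenSSL effectively operates on $g$ itself. • Calculate the timing difference $t_d = |t_1 - t_2|$. • If bit $i$ of $q$ is 0, then $t_d$ is large. • If bit $i$ of $q$ is 1, then $t_d$ is small.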
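The bit-by-bit search can be sketched in Python with an idealized timing oracle. This is a simulation under a strong assumption: the hypothetical `ideal_gap_oracle` looks at $q$ directly to decide whether $t_d$ would be "large", standing in for real timing measurements of $ug$ and $ug_{hi}$. It shows only the binary-search structure, not the statistics of the real attack.

```python
from typing import Callable

Oracle = Callable[[int, int], bool]   # returns True when t_d would be "large"


def recover_top_bits(q_bits: int, num_bits: int, gap_is_large: Oracle) -> int:
    """Recover the top num_bits bits of a q_bits-bit factor, most significant first."""
    g = 1 << (q_bits - 1)                  # the top bit of the factor is 1
    for pos in range(q_bits - 2, q_bits - 1 - num_bits, -1):
        g_hi = g | (1 << pos)              # same as g with the next bit set
        if not gap_is_large(g, g_hi):
            g = g_hi                       # small t_d: this bit of q is 1
        # large t_d: g < q < g_hi, so this bit of q is 0 and g is unchanged
    return g >> (q_bits - num_bits)


def ideal_gap_oracle(q: int) -> Oracle:
    """Hypothetical noiseless oracle for simulation only: it knows q."""
    return lambda g, g_hi: g < q < g_hi


q = 0xC0FFEE1234567891                     # toy odd 64-bit "factor"
top_half = recover_top_bits(64, 32, ideal_gap_oracle(q))
print(hex(top_half), top_half == q >> 32)  # → 0xc0ffee12 True
```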

Real World Scenarios • The timing attack applies to SSL applications such as stunnel, the Apache web server with mod_SSL, and trusted computing projects such as Microsoft’s NGSCB. • RSA applications using a hardware crypto accelerator are not vulnerable. • The attacks apply only to software-based RSA implementations.

Example of an Attack on an SSL Server • In a standard full SSL handshake, the SSL server performs an RSA decryption using its private key. • The CLIENT-KEY-EXCHANGE message is composed by encrypting PKCS #1-padded random bytes with the server’s public key. • In the attack, the client sends the guess $g$ in place of a properly formatted CLIENT-KEY-EXCHANGE message. • The server decrypts $g$, finds that the result is not properly padded, and generates an ALERT message. • The client measures the time until the ALERT arrives, computes the time difference, and repeats for various values of $g$.

Experiments • Show that the factorization of the RSA modulus N can be exposed. • Test the effect of increasing the number of decryption requests. • Compare effectiveness across different keys. • Compare effectiveness across machine architectures and common compile-time optimizations. • Compare effectiveness across source-based optimizations. • Compare inter-process vs. local network attacks. • Compare effectiveness against two common SSL applications: the Apache web server with mod_SSL, and stunnel.

Experiment Setup • The attack was performed against OpenSSL 0.9.7, which does not blind RSA operations by default. • A simple TCP server was implemented that reads an ASCII string. • The string is converted to OpenSSL’s internal multi-precision representation. • The RSA decryption is performed. • Decryption time is measured from writing the ciphertext over the socket to receiving the reply.
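The measurement boundary can be sketched in Python. This is a loose analogue, not the authors' harness: the server uses Python's built-in `pow` instead of OpenSSL, runs in a thread on localhost, and uses a hypothetical toy key (N = 3233, d = 2753). Only the timing window, from writing the ciphertext to receiving the reply, follows the setup described above.

```python
import socket
import threading
import time

# Hypothetical toy key (textbook-sized, NOT secure): N = 61 * 53, d = 2753.
N, D = 3233, 2753


def recv_line(sock: socket.socket) -> bytes:
    """Read bytes until a newline (or EOF)."""
    data = b""
    while not data.endswith(b"\n"):
        chunk = sock.recv(1024)
        if not chunk:
            break
        data += chunk
    return data


def serve_one_query(server: socket.socket) -> None:
    """Accept one connection, read an ASCII integer, reply with its RSA decryption."""
    conn, _ = server.accept()
    with conn:
        c = int(recv_line(conn).decode("ascii"))
        conn.sendall(f"{pow(c, D, N)}\n".encode("ascii"))


def timed_query(port: int, ciphertext: int) -> tuple[int, int]:
    """Send one ciphertext; return (reply, nanoseconds from write to reply)."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        start = time.perf_counter_ns()
        sock.sendall(f"{ciphertext}\n".encode("ascii"))
        reply = recv_line(sock)
        elapsed = time.perf_counter_ns() - start
    return int(reply.decode("ascii")), elapsed


def run_demo(ciphertext: int = 2790) -> tuple[int, int]:
    with socket.create_server(("127.0.0.1", 0)) as server:   # OS picks a free port
        port = server.getsockname()[1]
        worker = threading.Thread(target=serve_one_query, args=(server,))
        worker.start()
        result = timed_query(port, ciphertext)
        worker.join()
    return result


if __name__ == "__main__":
    plaintext, ns = run_demo()
    print(plaintext, ns)
```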

Experiment 1 – Number of Ciphertexts • Two parameters determine the number of queries needed to expose a single bit of an RSA factor: • Neighborhood size: for every bit of $q$, measure the decryption time for a neighborhood of values $g, g+1, g+2, \ldots, g+n$; the neighborhood size is denoted by $n$. • Sample size: for each value $g+i$, sample the decryption time multiple times and compute the mean decryption time. The number of times $g+i$ is queried is denoted by $s$. • The total number of queries needed to compute $T_g$, the combined decryption time over the neighborhood of $g$, is $s \cdot n$.
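Computing $T_g$ from the two parameters can be sketched as below. `measure` is a hypothetical callback that sends one decryption query and returns its time. The sketch treats the neighborhood as the $n$ values $g, \ldots, g+n-1$ so that the query count comes out to exactly $s \cdot n$.

```python
import statistics
from typing import Callable


def neighborhood_time(g: int, n: int, s: int, measure: Callable[[int], float]) -> float:
    """T_g: sum over g, g+1, ..., g+n-1 of the mean of s timings of each value.

    measure(x) is assumed to issue one decryption query for x and return its time,
    so exactly n * s queries are made.
    """
    total = 0.0
    for i in range(n):
        samples = [measure(g + i) for _ in range(s)]
        total += statistics.mean(samples)
    return total


# Usage with a stand-in timer that always reports 2.0 time units.
print(neighborhood_time(1000, n=10, s=7, measure=lambda x: 2.0))  # → 20.0
```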

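Continued… • Zero-one gap: the difference between the timing differences observed when a bit of $q$ is 0 and when it is 1. • The larger the gap, the stronger the indication that a bit is 0, and the smaller the chance of error. • Increasing the neighborhood size increases the timing difference when the bit is 0, but leaves it roughly steady when the bit is 1, which widens the zero-one gap. • The total number of queries needed to recover a factor is $2ns \cdot (\log_2 N)/4$, where $N$ is the RSA public modulus: each of the $(\log_2 N)/4$ bits (half the bits of a factor) requires $n \cdot s$ queries for both $g$ and $g_{hi}$.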
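The query budget is simple arithmetic. A sketch with illustrative parameter values (not measured results):

```python
def queries_to_recover_factor(n: int, s: int, modulus_bits: int) -> int:
    """Total queries 2*n*s*(log2 N)/4 to recover half the bits of one factor."""
    bits_needed = modulus_bits // 4   # e.g. 256 bits for a 1024-bit modulus
    return 2 * n * s * bits_needed    # factor 2: both g and g_hi for each bit


print(queries_to_recover_factor(n=400, s=7, modulus_bits=1024))  # → 1433600
```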

Experiment 2 – Different Keys • Several 1024-bit keys were attacked to determine how easily different moduli can be broken. • The zero-one gap is positive for the first 32 bits due to Montgomery reductions. • Shifts in the zero-one gap normally occur around multiples of the machine word size. • The attacker must be aware that the zero-one gap may flip sign when guessing bits around multiples of the machine word size. • If hard-to-guess bits are encountered, the neighborhood size can be increased to increase the zero-one gap.

Experiment 3 – Architecture and Compile-Time Effects • Computer architecture and compile-time optimizations affect the zero-one gap. • Effect of architecture: programs with similar instruction retirement counts may have different execution profiles. • This is due to run-time factors such as branch prediction, pipeline throughput, and L1 and L2 cache behavior. • Compile-time optimizations change the number of instructions and how efficiently they are executed on the hardware.

Continued… • The effects of compile-time optimizations were tested by compiling OpenSSL in three different ways: • Optimized • No Pentium flag • Unoptimized • Each compile-time configuration changes the zero-one gap.

Experiment 4 – Source-Based Optimizations • Patches can change the code profile of RSA libraries, resulting in timing vulnerabilities. • After a CRT decryption, OpenSSL re-encrypts the result to verify that it is identical to the original ciphertext. • OpenSSL calculates the Montgomery parameters for both $p$ and $q$ on every decryption. • A patch allows OpenSSL to cache both values between decryptions with the same key. • This shifts the zero-one gap, since the resulting code has a different execution profile. • A patch may therefore increase the zero-one gap, making the code more vulnerable to timing attacks.
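Caching per-key constants changes what runs on each decryption without changing the result. A minimal Python sketch using `functools.lru_cache` as the cache; which constants are cached here ($-q^{-1} \bmod R$ and $R^2 \bmod q$) is an assumption for illustration, not a description of the actual patch.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def montgomery_params(q: int, r_bits: int) -> tuple[int, int]:
    """Per-modulus Montgomery constants, computed once and then reused.

    Returns (q_prime, r2) with q_prime = -q^{-1} mod R and r2 = R^2 mod q,
    for R = 2**r_bits. Both depend only on the key, not on the ciphertext.
    """
    R = 1 << r_bits
    q_prime = (-pow(q, -1, R)) % R
    r2 = (R * R) % q   # lets a -> aR mod q be done with one Montgomery multiply
    return q_prime, r2


p, q = 61, 53
for _ in range(3):                   # three "decryptions" with the same key
    montgomery_params(p, 16)
    montgomery_params(q, 16)
print(montgomery_params.cache_info().misses)  # → 2
```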

Experiment 5 – Interprocess vs. Local Network Attacks • Network noise is largely eliminated by repeated sampling, giving a zero-one gap similar to that of the interprocess attack. • Networks with less than 1 ms of variance are vulnerable. • An attacker can take advantage of higher CPU speeds to increase the accuracy of timing measurements.

Experiment 6 – Attacking SSL Applications on the Local Network • Apache+mod_SSL is a commonly used secure web server. • Stunnel allows TCP/IP connections to be tunneled through SSL. • Servers connected to the attacker by a single switch are vulnerable to the attack; this assumes the attacker has access to a machine near the OpenSSL-based server. • Timing attacks also work in larger networks where the client and web server are separated by multiple routers and switches on the network backbone. • Run-time differences between the two applications result in different zero-one gaps. • The experiment highlights the difficulty of determining the minimum number of queries needed for a successful attack.

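Defenses • Three possible defenses: 1. RSA blinding: • Before decrypting a ciphertext $g$, compute $x = r^e \cdot g \bmod N$, where $r$ is random and $e$ is the public exponent. • $x$ is then decrypted as normal, followed by division by $r$, i.e. $x^d / r \bmod N$. • Since $r$ is random, $x$ is random, and timing the decryption does not reveal information about the key. • Performance penalty of 2%-10%.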
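A minimal Python sketch of RSA blinding, assuming a hypothetical textbook-sized key (N = 3233, e = 17, d = 2753; real keys are 1024 bits or more). The `secrets` module supplies the random blinding factor, reflecting the need for a good randomness source; the exponentiation is Python's built-in `pow`, not OpenSSL's.

```python
import math
import secrets


def blinded_decrypt(c: int, d: int, e: int, n: int) -> int:
    """RSA decryption of c with blinding: decrypt x = r^e * c mod n, then divide by r."""
    while True:
        r = secrets.randbelow(n - 2) + 2        # random r in [2, n)
        if math.gcd(r, n) == 1:                 # r must be invertible mod n
            break
    x = (pow(r, e, n) * c) % n                  # blinded ciphertext, unrelated to c
    y = pow(x, d, n)                            # x^d = r^(ed) * c^d = r * m (mod n)
    return (y * pow(r, -1, n)) % n              # unblind: multiply by r^{-1}


# Hypothetical textbook key, NOT secure: n = 61 * 53, e = 17, d = 2753.
print(blinded_decrypt(2790, d=2753, e=17, n=3233))  # → 65
```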

Continued… 2. Try to make all RSA decryptions independent of the input ciphertext. • It is harder to create and maintain code whose decryption time does not depend on the ciphertext. 3. Require all RSA computations to be quantized, i.e. to always take a multiple of some predefined time quantum. • The preferred method is blinding. • Its drawbacks are that it requires a good source of randomness, to prevent attacks on the blinding factor, and that it causes a small performance degradation.
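Option 3 can be sketched as a wrapper that pads every call up to the next multiple of a time quantum. A minimal Python sketch under stated assumptions: the quantum value is arbitrary, and `time.sleep` is only a coarse approximation (the OS may oversleep), so this illustrates the idea rather than providing a hardened defense.

```python
import math
import time
from typing import Any, Callable


def next_quantum_boundary(elapsed: float, quantum: float) -> float:
    """Smallest positive multiple of quantum that is >= elapsed."""
    return max(1, math.ceil(elapsed / quantum)) * quantum


def quantized_call(fn: Callable[..., Any], quantum: float, *args: Any) -> Any:
    """Run fn(*args), then sleep so the total time lands on a quantum boundary."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    time.sleep(max(0.0, next_quantum_boundary(elapsed, quantum) - elapsed))
    return result


# Toy decryption with a hypothetical key (N = 3233, d = 2753), padded to 10 ms slots.
print(quantized_call(pow, 0.010, 2790, 2753, 3233))  # → 65
```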

Conclusion • Experiments show that timing attacks are effective when carried out between machines separated by multiple routers. • Timing attacks are also effective between two processes on the same computer. • Several cryptographic libraries, including OpenSSL, now implement blinding by default to prevent timing attacks.
