
Cache Attacks and Countermeasures: the Case of AES

Cache Attacks and Countermeasures: the Case of AES. Dag Arne Osvik, Adi Shamir and Eran Tromer. Presented by Ophir Arbiv ophirarb@post.tau.ac.il. [1] Cache Attacks and Countermeasures: the Case of AES (Extended Version),2005, Dag Arne Osvik, Adi Shamir and Eran Tromer.


Presentation Transcript


  1. Cache Attacks and Countermeasures: the Case of AES Dag Arne Osvik, Adi Shamir and Eran Tromer Presented by Ophir Arbiv ophirarb@post.tau.ac.il

  2. Sources
     [1] Dag Arne Osvik, Adi Shamir and Eran Tromer, "Cache Attacks and Countermeasures: the Case of AES (Extended Version)", 2005.
     [2] theory.csail.mit.edu/~tromer/SKC2006/cache-skc06.ppt (Tromer's lecture at MIT).
     [3] www.l-sec.be/calit/present/AdiShamir.pdf (Adi Shamir's lecture at the Weizmann Institute).

  3. AES: Advanced Encryption Standard
     • 1997: with DES becoming outdated, NIST announces a competition to design a successor.
     • Evaluation criteria: security, cost, and algorithm and implementation characteristics.
     • 21 algorithms were submitted. NIST selected Rijndael as the proposed AES algorithm in October 2000 and published the standard (FIPS 197) in 2001.
     • Rijndael was proposed by Dr. Vincent Rijmen and Dr. Joan Daemen from Belgium.
     • Properties:
       • Symmetric
       • Block cipher
       • Based on finite-field arithmetic
       • 128-bit data blocks and key sizes of 128, 192 and 256 bits
       • Resistant to known attacks

  4. AES Algorithm
     • The mathematical description of the algorithm (figure not preserved in this transcript). Source: http://klabs.org/mapld05/presento/103_swankoski_p.ppt

  5. Efficient Implementation
     • Originally proposed in the Rijndael spec, and now widely used.
     • Uses precomputed table lookups.
     • Each round: 16 table lookups, 16 XORs and 12 shifts.
     • The tables occupy 4 KB (×2).
     (The slide's formulas for the tables, the key and the round implementation are not preserved in this transcript.)

  6. AES: Summary
     • During the AES selection, only branch statements, arithmetic and data-dependent shifts were considered vulnerable.
     • The proposed algorithms were widely analyzed.
     • Apparently because it uses only table lookups, XORs and shifts, NIST declared Rijndael "not vulnerable to timing attacks".
     • 2003: the NSA approved AES for classified US Government data. All key lengths suffice up to SECRET; TOP SECRET requires 192- or 256-bit keys.
     • No direct attacks are known as of today.
     • Expected to be the standard for 20+ years.

  7. Side Channels
     • Any observable information emitted as a byproduct of the physical implementation of the cryptosystem.
     (Figure labels: K, Plaintext, Cipher, Ciphertext, Side Channels.) Source: www.stanford.edu/~jbonneau/AES_side_channel.ppt

  8. Examples of Side Channels
     • Power consumption (simple, differential, ...)
     • Time
     • Heat
     • Acoustic noise (keyboards, ...)
     • Cache
     • Faults (power glitch, jitter, ...)
     • Electromagnetic radiation
     • Visual side channels

  9. Why Cache Analysis?
     • Annual speed increase: CPU core 60% (until recently), main memory 7-9%.
     • Typical latency: CPU core 0.3 ns, main memory 50-150 ns.
     • The cache exists to bridge this timing gap.

  10. Cache Attacks
      • The cache is a shared resource.
        => The cache state affects, and is affected by, all processes.
        => Crosstalk between processes is possible.
      • Process memory is usually protected, but information about the memory access patterns of other processes is leaked.
      • Cache attacks are pure software attacks. Very cheap.
      • A process with no special privileges and (in some variants) no interaction with the cryptographic code can attack the cryptographic code.

  11. How Does the Cache Work?
      • The cache holds copies of aligned blocks of B bytes of main memory (memory blocks).
      • When a memory access instruction is processed, the memory cell is searched for in the cache first.
      • If a cache miss occurs, the full memory block is copied into one of the W cache lines of the appropriate cache set (there are S possible sets).
      (Figure labels: memory block (B bytes), DRAM, cache set (W cache lines), cache line (B bytes), cache.)
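The mapping from an address to its memory block and cache set can be sketched in a few lines. This is a minimal illustration assuming the simple modular mapping used by typical set-associative caches; the values of B, S and W below are illustrative assumptions, not figures from the slides.

```python
# Toy model of how an address maps into a set-associative cache.
# B, S and W are illustrative assumptions, not values from the slides.
B = 64   # bytes per cache line (= bytes per memory block)
S = 64   # number of cache sets
W = 8    # cache lines per set (associativity)


def block_number(addr: int) -> int:
    """Index of the aligned B-byte memory block that contains addr."""
    return addr // B


def cache_set(addr: int) -> int:
    """Cache set that the block containing addr is mapped to."""
    return block_number(addr) % S


# Addresses that are S*B bytes apart fall into the same set and
# therefore compete for that set's W cache lines.
a = 0x1234
print(cache_set(a), cache_set(a + S * B), cache_set(a + B))
```

Under these assumptions, an attacker who wants to fill one cache set needs W addresses spaced S·B bytes apart.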

  12. What Does a Cached Table Look Like?
      (Figure labels: S-box table, DRAM, cache.)

  13. Notation
      • δ: the cache line size B divided by the size of each table entry (usually 64/4 = 16).
      • <y>: the memory block of y in Tl. <y> = <z> iff, when used as lookup indices into the same table Tl, y and z would cause access to the same memory block.
      • Qk(p,l,y) = 1 iff the AES encryption of the plaintext p under the encryption key k accesses the memory block of index y in Tl at least once (during the 10 rounds).
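As a small illustration of this notation, the sketch below computes δ and the block <y> of a table index. It assumes 4-byte entries, 64-byte cache lines and a table aligned to a cache-line boundary; the function names are hypothetical.

```python
ENTRY_SIZE = 4                    # bytes per table entry (assumed)
LINE_SIZE = 64                    # cache line size B in bytes (assumed)
DELTA = LINE_SIZE // ENTRY_SIZE   # δ = 16 table entries per memory block


def block(y: int) -> int:
    """<y>: the memory block, within its table, that holds entry y.

    Assumes the table starts on a cache-line boundary.
    """
    return y // DELTA


def same_block(y: int, z: int) -> bool:
    """True iff <y> = <z>, i.e. both indices hit the same memory block."""
    return block(y) == block(z)


print(DELTA, block(0x3A), same_block(0x30, 0x3F), same_block(0x3F, 0x40))
```

With δ = 16, <y> is simply the high nibble of the index, which is why cache observations reveal high nibbles and hide low ones.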

  14. Cache Attacks on AES
      • The efficient implementation of the algorithm has a big weakness: the lookup addresses depend strongly on the encryption key (the secret).
      • Therefore, by knowing which memory cells were accessed we can extract the key (e.g., by observing the memory bus).
      • Usually the attacker has no access to the bus, and the memory is partitioned and protected by the OS.
      • The solution: the cache is a shared resource through which we can learn about the memory access patterns of other processes.

  15. Synchronous Attacks
      • The plaintext or ciphertext is known.
      • The attacker can operate synchronously with the encryption (on the same processor).
      • Examples:
        • Sending data packets through a secure channel in a VPN.
        • Linux's dm-crypt and cryptoloop services.
      • The attack scheme:
        • Obtain a set of random samples Mk(p,l,y) of the predicate Qk(p,l,y).
        • Perform off-line cryptanalysis:
          • Guess small parts of the key.
          • Use the guess to predict memory accesses.
          • Check whether the predictions are consistent with the collected data.

  16. One Round Attack
      • Consider one of the memory accesses in the 1st round: T0[p0 ⊕ k0].
      • Given a candidate value k'0 and samples of Q(p,l,y), the useful samples are those that fulfill <p0 ⊕ k'0> = <y>.
      • If <k'0> = <k0>, then for all useful samples <p0 ⊕ k0> = <p0 ⊕ k'0> = <y>, so T0[p0 ⊕ k0] accesses the memory block of y => Q(p,l,y) = 1.
      • Otherwise <p0 ⊕ k0> ≠ <p0 ⊕ k'0> = <y>, so this lookup does not touch the block of y. But there are 35 more "random" accesses to T0, so Q(p,l,y) = 0 only with probability (1 - 1/16)^35 ≈ 0.104.
      • A few hundred (!) random samples suffice to eliminate all bad candidates.
      • The high nibble (log2(256/δ) = 4 bits) of every key byte is extracted: 64 bits in total.
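The elimination argument can be tried out on a toy model. The sketch below is not the authors' code: it assumes ideal, noise-free measurements, models the 35 other T0 lookups as uniformly random, and assumes each simulated encryption reveals Q for every block of T0. All function names are hypothetical.

```python
import random

DELTA = 16              # table entries per memory block (assumed)
BLOCKS = 256 // DELTA   # memory blocks per table


def encrypt_trace(k0, rng):
    """Simulate one encryption: return p0 and the set of T0 blocks touched."""
    p0 = rng.randrange(256)
    accessed = {(p0 ^ k0) // DELTA}          # the first-round lookup T0[p0 ^ k0]
    # Stand-in for the 35 other T0 lookups, modeled as uniformly random.
    accessed.update(rng.randrange(BLOCKS) for _ in range(35))
    return p0, accessed


def surviving_high_nibbles(traces):
    """Discard every candidate high nibble contradicted by some trace."""
    alive = set(range(BLOCKS))
    for p0, accessed in traces:
        for h in list(alive):
            predicted = (p0 // DELTA) ^ h    # block the candidate predicts
            if predicted not in accessed:    # Q = 0 on a useful sample
                alive.discard(h)
    return alive


# Chance that a wrong candidate is NOT masked by the 35 other accesses.
p_eliminate = (1 - 1 / 16) ** 35
print(round(p_eliminate, 3))

rng = random.Random(1)
k0 = 0xA7
traces = [encrypt_trace(k0, rng) for _ in range(300)]
survivors = surviving_high_nibbles(traces)
print(survivors)
```

The true high nibble can never be eliminated, because its predicted block is always accessed. Each wrong candidate is discarded with probability of about 0.104 per trace, so a few hundred traces leave only the correct value with overwhelming probability.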

  17. Full Key Extraction
      • With a straightforward method we managed to narrow down each byte of the key to δ possibilities (in the common case this means extracting half of the key: 64 bits).
      • This is all the information that can be obtained from the 1st-round accesses.
      • By moving to the 2nd round and taking advantage of the non-linearity of the S-box, we can extract the full key!

  18. Two Round Attack
      • Equations for the 2nd round are easily derived from the Rijndael specification, e.g.:
        x^(1)_2 = s(p0 ⊕ k0) ⊕ s(p5 ⊕ k5) ⊕ 2 • s(p10 ⊕ k10) ⊕ 3 • s(p15 ⊕ k15) ⊕ s(k15) ⊕ k2
        {s(·) denotes the Rijndael S-box function and • denotes multiplication over GF(256).}
      • x^(1)_2 is used as an index to T2.
      • The only relevant unknowns in the index are the low nibbles of k0, k5, k10 and k15 (2^16 candidates).
      • A candidate can be tested as before:
        • Predict this lookup according to the guess {k'0, k'5, k'10, k'15} (the low nibble of k2 is irrelevant).
        • Identify the useful samples, i.e., those where y is in the same memory block as the prediction.
        • Check whether Q(p,l,y) = 1 for all useful samples.
      • There are 3 more accesses of this special form, with disjoint sets of relevant low nibbles.
        => Full key recovery using ~2000 random samples.
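A second-round index of this form can be computed directly from the plaintext and key bytes. The sketch below builds the Rijndael S-box from its definition (field inverse followed by the affine map) instead of hard-coding it, and evaluates s(p0 ⊕ k0) ⊕ s(p5 ⊕ k5) ⊕ 2 • s(p10 ⊕ k10) ⊕ 3 • s(p15 ⊕ k15) ⊕ s(k15) ⊕ k2. The function names are hypothetical; the plaintext and key are the example vector from FIPS 197 Appendix B.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def make_sbox():
    """Build the Rijndael S-box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 0                      # 0 has no inverse and maps to 0
        if x:
            inv = 1
            for _ in range(254):     # x^254 is the inverse of x in GF(2^8)
                inv = gf_mul(inv, x)
        res = 0x63
        for shift in range(5):       # inv XOR its left-rotations by 1..4 bits
            res ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(res)
    return sbox


SBOX = make_sbox()


def round2_index(p, k):
    """Index of the second-round T2 lookup, for 16-byte sequences p and k."""
    s = SBOX
    return (s[p[0] ^ k[0]] ^ s[p[5] ^ k[5]]
            ^ gf_mul(2, s[p[10] ^ k[10]])
            ^ gf_mul(3, s[p[15] ^ k[15]])
            ^ s[k[15]] ^ k[2])


# Example vector from FIPS 197, Appendix B.
plaintext = bytes.fromhex("3243f6a8885a308d313198a2e0370734")
key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
index = round2_index(plaintext, key)
print(hex(index), "-> memory block", index // 16)
```

An attacker who already knows the high nibbles from the first round would evaluate this function for each of the 2^16 low-nibble guesses and compare the predicted memory block with the measurements.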

  19. Measurement Methods
      • How do we obtain the measurements Mk(p,l,y) of the predicate Qk(p,l,y)?
      • Inter-process crosstalk can be exploited in two ways:
        • The effect of the cache on the encryption (timing).
        • The effect of the encryption on the cache.

  20. Measurement Method 1: Evict + Time
      1. Make sure the tables are cached.
      2. Evict one cache set.
      3. Time an encryption and see if it is slow.
      (Figure labels: attacker memory, T0, DRAM, cache.)
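The three steps can be mimicked with a toy timing model. This is only a sketch under loud assumptions: the hit and miss costs, the Gaussian noise and all function names are made up for illustration, and no real cache or encryption is involved.

```python
import random

HIT_COST, MISS_COST = 1, 100   # assumed access costs in arbitrary time units


def timed_encryption(accessed_blocks, evicted_block, rng, noise=200.0):
    """Simulated duration of one encryption after one table block was evicted.

    accessed_blocks is the set of table blocks the encryption reads. Only the
    evicted block (if it is read at all) costs a cache miss.
    """
    t = sum(MISS_COST if b == evicted_block else HIT_COST
            for b in accessed_blocks)
    # Gaussian noise stands in for scheduling, contention and similar effects.
    return t + rng.gauss(0.0, noise)


def mean_time(accessed_blocks, evicted_block, rng, n=2000):
    """Average over n timed encryptions to suppress the noise."""
    total = sum(timed_encryption(accessed_blocks, evicted_block, rng)
                for _ in range(n))
    return total / n


rng = random.Random(7)
accessed = {1, 4, 9}                          # blocks read by the encryption
slow = mean_time(accessed, evicted_block=4, rng=rng)   # evicted block is used
fast = mean_time(accessed, evicted_block=5, rng=rng)   # evicted block unused
print(round(slow - fast))
```

In this model a single miss is buried in the noise of one measurement, so the difference only shows up after averaging many encryptions. That is consistent with the large sample counts this method needs.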

  21. Results
      • Weakness of this method:
        • It relies on timing the triggered encryption => it is very sensitive to variations in the operation (noise due to scheduling, branches, cache contention, etc.).
        • The authors were able to extract the key only from an artificial service (using the OpenSSL libraries), not from real services.

  22. Measurement Method 2: Prime + Probe
      • Try to discover the set of memory blocks read by the encryption a posteriori, by examining the state of the cache after the encryption.
      1. Completely evict the tables from the cache (by filling it with attacker memory).
      2. Trigger a single encryption.
      3. Access the attacker memory again and see which cache sets are slow.
      (Figure labels: attacker memory, S-box table, DRAM, cache.)
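The three steps can be walked through on a simulated cache. The sketch below assumes a small set-associative cache with LRU replacement and reports hits and misses directly, whereas a real attacker would infer them from access latency. The class, the function names and the addresses are all hypothetical.

```python
from collections import OrderedDict


class ToyCache:
    """Set-associative cache with LRU replacement (a simplification)."""

    def __init__(self, sets=64, ways=4, line=64):
        self.sets, self.ways, self.line = sets, ways, line
        self.data = [OrderedDict() for _ in range(sets)]

    def access(self, addr):
        """Touch addr; return True on a cache hit and False on a miss."""
        block = addr // self.line
        lines = self.data[block % self.sets]
        hit = block in lines
        if hit:
            lines.move_to_end(block)           # mark as most recently used
        else:
            if len(lines) >= self.ways:
                lines.popitem(last=False)      # evict the least recently used
            lines[block] = True
        return hit


def attacker_addrs(cache, base, s):
    """W attacker addresses that all map to cache set s (base must be aligned)."""
    stride = cache.sets * cache.line
    return [base + s * cache.line + w * stride for w in range(cache.ways)]


def prime(cache, base):
    """Step 1: fill every cache set with attacker memory."""
    for s in range(cache.sets):
        for a in attacker_addrs(cache, base, s):
            cache.access(a)


def probe(cache, base):
    """Step 3: return the sets in which the attacker sees at least one miss."""
    touched = set()
    for s in range(cache.sets):
        misses = [not cache.access(a) for a in attacker_addrs(cache, base, s)]
        if any(misses):
            touched.add(s)
    return touched


cache = ToyCache()
BASE = 1 << 20        # attacker buffer, aligned to sets * line bytes
TABLE = 0x4000        # victim lookup table with 4-byte entries
prime(cache, BASE)
for i in (0x05, 0x3A, 0xF1):                  # step 2: the victim's lookups
    cache.access(TABLE + 4 * i)
touched = probe(cache, BASE)
print(sorted(touched))
```

The probe step also re-primes the cache as a side effect, so in this model the attacker can repeat the measurement without a separate priming pass.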

  23. Results
      • Yields more information (4 · 256/δ samples) from a single encryption.
      • Not a timing attack on the encryption: the attacker times a simple operation that it performs itself.
      • Insensitive to timing variance in the encryption code path (crucial for effective attacks on complicated systems).
      • No real need to trigger the encryption: the attacker can wait until it happens by itself.

  24. Synchronous Attacks: Summary
      • For known plaintext and a synchronous attacker.
      • Two measurement methods.
      • Results:
        • OpenSSL libraries on Athlon 64:
          • Evict + Time: 500,000 encryptions. (Why?)
          • Prime + Probe: 300 encryptions (16K on P4E).
        • Real Linux dm-crypt:
          • Prime + Probe: 800 write operations, 65 ms plus 3 seconds of offline analysis.
      • Variants ...

  25. Asynchronous Attack
      • Someone runs encryption computations using a secret key.
      • The attacker process runs on the same CPU at (roughly) the same time.
      • Assume the plaintext/ciphertext has a non-uniform (conditional) distribution:
        • English
        • Formatted data
        • Headers
        • Ciphertext gleaned from the wire
      • Examples: just about any use of crypto on a multi-user system.
      Finding the key
      • Compare two distributions:
        • Measured memory access statistics.
        • Predicted memory access statistics, under the given plaintext distribution and the key hypothesis.
      • Find the key that yields the best correlation.

  26. Countermeasures
      • The authors consider numerous countermeasures, e.g.:
        • Avoiding memory accesses
        • Alternative lookup tables
        • Data-oblivious memory access pattern
        • Cache state normalization and process blocking
        • Disabling cache sharing
        • Static or disabled cache
        • Dynamic table storage
        • Hiding the timing
      • None of them solves the problem completely. Some are architecture- or application-dependent, or require changes in the system.
      • None is at once secure, efficient (or cheap) and generic.
        => Case-specific solutions, probably a combination of the methods.
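To illustrate one item on the list, a data-oblivious memory access pattern, the sketch below reads every table entry on each lookup so that the set of touched memory blocks does not depend on the secret index. It only illustrates the access pattern: Python gives no constant-time guarantees, and the function name is hypothetical.

```python
def oblivious_lookup(table, index):
    """Return table[index] while reading every entry of the table.

    Entries are assumed to be 32-bit words. Every entry is read regardless of
    index, so the memory blocks touched are the same for all indices.
    """
    result = 0
    for i, entry in enumerate(table):
        mask = -(i == index) & 0xFFFFFFFF   # all ones when i == index, else 0
        result |= entry & mask
    return result


table = [(i * 0x01010101) & 0xFFFFFFFF for i in range(256)]
print(hex(oblivious_lookup(table, 0x3A)))
```

The price is one full pass over the table per lookup, i.e. 256 reads instead of one here. This shows why such countermeasures are secure in principle but costly.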

  27. Thank you! Questions?

  28. Homework
      • What is the difference between the Evict + Time and Prime + Probe measurement methods?
      • In the case of known ciphertext, how would the attack change? (Hint: it can be more efficient; see the paper.)
      • Why is a first-round synchronous attack able to extract only half of the key bits (on a δ = 16 platform)?
      • Does adding a random delay to the encryption algorithm improve the immunity against synchronous attacks? Why?
