Architecture Support for Secure Computing Mikel Bezdek Chun Yee Yu CprE 585 Survey Project 12/10/04
Presentation Outline • Motivation • Assumptions • Attacks • Proposed Solutions • Pending questions and future research
Motivation • Currently piracy of software and digital media is a huge problem • Attempts to solve with software solutions have proven easy to foil • Adding support at the hardware level is a promising solution
Assumptions • All solutions assume the processor and on-chip storage to be secure • Operating system and all peripherals, including off-chip memory, are untrusted [Figure: processor, OS, I/O devices, and memory]
Points of Attack • Because memory is untrusted, attacks can occur on any transfer to or from external memory • Because the OS is untrusted, attacks can occur at context switches, when the OS takes control of operation
Memory Attacks • Adversaries may try to gain information from unprotected off chip memory by: • Modifying data • Spoofing, Splicing, and Replay Attacks • Monitoring data access pattern (address bus)
Solutions • Basic XOM architecture • XOM using One Time Pad Encryption • Hash Trees • Aegis Processor • HIDE Architecture
XOM (Execute-Only Memory) • Tamper Resistant Software • Software is encrypted using symmetric encryption, its key is encrypted using asymmetric encryption • Asymmetric Encryption - public key used by vendor, private key used by XOM chip • Symmetric Encryption - the symmetric key is unique to each program, also called the XOM ID • Secured Computing • Enforces access restrictions using tagged and encrypted storage • Encrypted code execution using on-chip decryption
XOM Internal Security • L2 Cache lines tagged with XOM ID with valid bits for each word in cache line • L1 Cache lines are tagged with a XOM ID • Registers are tagged with a XOM ID • XOM ID is kept in a table in the XOM chip
XOM Context Switches • Involves 4 special registers: • Data register - Data is packaged into movable (by the interrupting application), read-write protected data. A mutating key and the XOM ID are used for packaging. • Hash registers (2) - a 128 bit hash is made from the package and stored in two 64 bit registers • XOM ID register - stores the XOM tag
XOM and External Memory • Encrypts data with XOM ID and creates a hash (MAC) • Message Authentication Code – a keyed one way hash, protects against spoofing and splicing attacks
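A minimal sketch of the MAC check, assuming HMAC-SHA256 as the keyed one-way hash and assuming the block address is part of the hashed message (the slide specifies neither). The names `mac_for_block` and `verify_block` and the sample key are hypothetical.

```python
import hashlib
import hmac


def mac_for_block(key: bytes, address: int, data: bytes) -> bytes:
    """Keyed one-way hash over the block address and its contents."""
    msg = address.to_bytes(8, "big") + data
    return hmac.new(key, msg, hashlib.sha256).digest()


def verify_block(key: bytes, address: int, data: bytes, tag: bytes) -> bool:
    """Recompute the MAC on a read and compare in constant time."""
    return hmac.compare_digest(mac_for_block(key, address, data), tag)


key = b"hypothetical-xom-key"  # illustrative only, not a real XOM ID

# Untrusted external memory: address -> (data, tag)
memory = {
    0x1000: (b"block A", mac_for_block(key, 0x1000, b"block A")),
    0x2000: (b"block B", mac_for_block(key, 0x2000, b"block B")),
}

# Honest read: data and tag match.
data_a, tag_a = memory[0x1000]
honest_ok = verify_block(key, 0x1000, data_a, tag_a)

# Spoofing: attacker replaces the data without knowing the key.
spoof_ok = verify_block(key, 0x1000, b"evil!!!", tag_a)

# Splicing: attacker copies a valid (data, tag) pair to another address.
data_b, tag_b = memory[0x2000]
splice_ok = verify_block(key, 0x1000, data_b, tag_b)

print(honest_ok, spoof_ok, splice_ok)
```

In this sketch it is the address inside the hashed message that defeats splicing. An old (data, tag) pair replayed at the same address would still verify, so a MAC alone does not stop replay.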
XOM Performance Issues • Optimizations: • Use a reversible CRC instead of MAC • Dedicated, pipelined DES encryption/decryption hardware. • Max of 50% slowdown assuming a 48 cycle Triple DES implementation and 100 cycle memory access latency.
XOM with One-Time Pad • Average XOM slowdown is 16.7% on SPEC 2000 benchmarks • Around 30% slowdown on memory intensive programs • One-Time Pad encryption can be used to remove encryption/decryption from critical path
XOM with OTP • Proposed OTP solution • Cipher = plain ⊕ encrypted_key(address + seq) • Plain = cipher ⊕ encrypted_key(address + seq) • key = XOM ID • address = virtual address of data/instruction • seq = mutating sequence number • encrypted_key(address + seq) is computed concurrently with the memory access • Encryption/decryption requires a one cycle XOR operation
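The pad-and-XOR scheme can be sketched as follows. This is only an illustration: HMAC-SHA256 stands in for the hardware block cipher, "address + seq" is assumed to mean concatenation, and `make_pad`, `xor_bytes`, and the sample key are hypothetical names.

```python
import hashlib
import hmac


def make_pad(key: bytes, address: int, seq: int, length: int) -> bytes:
    """Derive a pad from (address, seq); depends only on values known before the data arrives."""
    seed = address.to_bytes(8, "big") + seq.to_bytes(8, "big")
    out = b""
    counter = 0
    while len(out) < length:
        # HMAC-SHA256 is a stand-in for the block cipher keyed by the XOM ID.
        out += hmac.new(key, seed + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """The only operation left on the critical path: a bytewise XOR."""
    return bytes(x ^ y for x, y in zip(a, b))


key = b"hypothetical-xom-key"
address = 0x4000
plain = b"secret cache line contents"

# Write-back with sequence number 1.
cipher1 = xor_bytes(plain, make_pad(key, address, 1, len(plain)))

# Read: the same pad can be generated while memory is being accessed.
recovered = xor_bytes(cipher1, make_pad(key, address, 1, len(plain)))

# The next write-back bumps the sequence number, so the pad is not reused.
cipher2 = xor_bytes(plain, make_pad(key, address, 2, len(plain)))

print(recovered == plain, cipher1 != cipher2)
```

Because the pad depends only on the key, the address, and the sequence number, it can be prepared before the ciphertext returns from memory, which is why only the XOR remains on the critical path.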
XOM with OTP • Sequence Number Cache (SNC) • Stores sequence numbers for each cache line • Accessed by virtual address of cache line • Limited size • Use replacement – store parts of SNC in unsecured memory • No replacement – OTP on some data, can’t use OTP on rest of data
XOM with OTP • Sequence Number Cache operation • Hits – sequence number is accessed and passed on to the encryption unit • Misses • No replacement – default back to original XOM, where encryption is performed after memory access. Costs 100 + 50 cycles • With replacement – fetch sequence number from memory, then perform encryption
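The hit and miss paths above can be illustrated with a toy model. The LRU eviction order, the allocation of free entries on first touch, and the class and outcome names are all assumptions made for this sketch. The slides give only the three outcomes.

```python
from collections import OrderedDict


class SequenceNumberCache:
    """Toy SNC indexed by the virtual address of a cache line."""

    def __init__(self, capacity: int, replacement: bool):
        self.capacity = capacity
        self.replacement = replacement
        self.entries = OrderedDict()  # on-chip: line address -> sequence number
        self.spilled = {}             # stands in for entries kept in unsecured memory

    def access(self, line_addr: int) -> str:
        if line_addr in self.entries:
            self.entries.move_to_end(line_addr)  # mark as most recently used
            return "hit"        # sequence number goes straight to the encryption unit
        if len(self.entries) < self.capacity:
            # Assumption: a free entry is simply allocated on first touch.
            self.entries[line_addr] = self.spilled.pop(line_addr, 0)
            return "fill"
        if not self.replacement:
            return "fallback"   # original XOM path: encrypt after the memory access
        victim, seq = self.entries.popitem(last=False)  # evict least recently used
        self.spilled[victim] = seq
        self.entries[line_addr] = self.spilled.pop(line_addr, 0)
        return "fetch"          # sequence number fetched from memory, then encryption


trace = [1, 2, 1, 3, 2, 3]

no_repl = SequenceNumberCache(capacity=2, replacement=False)
no_repl_result = [no_repl.access(a) for a in trace]

with_repl = SequenceNumberCache(capacity=2, replacement=True)
with_repl_result = [with_repl.access(a) for a in trace]

print(no_repl_result)
print(with_repl_result)
```

With the same trace, the no-replacement cache keeps serving lines 1 and 2 from the SNC and sends line 3 down the original XOM path every time, while the replacement cache pays a sequence number fetch on each capacity miss.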
XOM with OTP • SNC and Context Switching • Dump to memory with encryption • Tag SNC entries with XOM ID
XOM with OTP • Performance • 16.7% XOM average slowdown • 4.59% XOM w/ OTP – No Replacement • 1.28% XOM w/ OTP – With Replacement • 1.035% max additional memory traffic
Hash Trees • Memory Integrity Verification • Allows the secure processor to ensure that the data it reads from memory matches the data most recently written • Protection • Spoofing • Splicing • Replay
Hash Tree - Details • Works by calculating a hash of data • Hash is easy to compute given data, but hard to find data which will result in an equivalent hash [Figure: tree of hash nodes (H) above the data blocks, with the top of the tree marked Secure]
Hash Tree - Details • Calculated when accessing memory • No need to calculate hash for a cache hit • Data can be given speculatively to the processor while hash is generated and checked • Speculative commits • Allowed using fetched but unverified data • Exception raised by hash checker does not need to be recovered from • Stalls on hash checker when using processor’s secret key • Simulations done show that with caching of hashes an average overhead of less than 20% can be achieved
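A minimal sketch of hash tree verification, assuming SHA-256, a binary tree, and a power-of-two number of blocks. None of these choices come from the slides, and `build_tree` and `verify_block` are hypothetical names. Only the root is treated as trusted on-chip state. Everything else is read from untrusted memory.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(blocks):
    """Return all levels: levels[0] holds the leaf hashes, levels[-1] holds the root."""
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def verify_block(blocks, levels, index, trusted_root) -> bool:
    """Recompute the path from one block up to the root using untrusted sibling hashes."""
    node = h(blocks[index])
    for level in levels[:-1]:
        sibling = level[index ^ 1]
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == trusted_root


# Initial state of untrusted memory; the root is kept on chip.
old_blocks = [b"a", b"b", b"c", b"d"]
old_levels = build_tree(old_blocks)
old_root = old_levels[-1][0]
read_ok = verify_block(old_blocks, old_levels, 2, old_root)

# The processor writes block 2 and updates the on-chip root.
new_blocks = [b"a", b"b", b"c2", b"d"]
new_levels = build_tree(new_blocks)
new_root = new_levels[-1][0]
fresh_ok = verify_block(new_blocks, new_levels, 2, new_root)

# Replay: the attacker restores the old block and the old hashes in memory.
replay_ok = verify_block(old_blocks, old_levels, 2, new_root)

print(read_ok, fresh_ok, replay_ok)
```

The replayed state is internally consistent, yet it hashes to the old root rather than the one held on chip, so the check fails.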
Aegis Architecture • Uses concepts from XOM and hash trees to create a “private and authenticated tamper-resistant environment” for the processor to run in • This means that data is private from any observers and that any tampering will be detected
Aegis Architecture • Allows a user to trust the results from a program • System Authentication • Program Authentication • Message Authentication • This is accomplished by the sign_msg instruction, which encrypts a message and a hash of the program with the processor’s secret key before sending back to the user
Aegis Architecture • To provide environment, 3 key things must be done • Memory Integrity Verification • Encryption/Decryption of off-chip memory • Context Switches managed securely
Aegis – Memory Integrity Verification • Accomplished using hash trees • Introduces new twist on hash trees, log hash • In log hash, only memory accesses leading up to a sign_msg instruction are verified • Greatly reduces cost of verification while not sacrificing much security
Aegis – Off chip memory • Data stored in the off-chip memory is encrypted and decrypted using the XOM one-time pad scheme to hide latency • Pads are generated using the address of the data combined with a time stamp, incremented at every write-back • Time stamps are needed before calculation of the pad can begin, so caching time stamps is a good idea
Aegis – secure context switches • Uses a Secure Context Manager • Maintains a table of all processes • Table entry contains: secure process ID (SPID), program hash, register values, and hash for off-chip memory verification • Table stored in memory, but can be cached for recent processes • In addition, cache entries are tagged with SPID to ensure a process cannot gain access to another process’s data
Aegis - Overhead • Overhead of the SCM is negligible; the main slowdown comes from integrity verification and encryption of memory • Using log hashes (l-hashes) and OTP encryption, the authors saw an average overhead of < 25%, with a worst case of 55% among the tested benchmarks
HIDE - Motivation • Addresses the problem of secure information leaking due to monitoring of the address bus • Access patterns reveal information about branching • Can be compared with known branching patterns to identify IP reused in secure process
HIDE – Critical Idea • Addresses from the processor are remapped before being sent to memory • Mapping is done using a permutation function to ensure a random mapping • Current mapping (permutation vector) must be stored on chip
HIDE - Implementation • To ensure that attackers cannot see patterns in memory accesses, each access under the current permutation vector (pv) must happen only once • Implemented by locking cache blocks
HIDE – Hide Cache • Modified L2 cache • Cache hits (R and W) unmodified • When a block is loaded on a cache miss, it is locked • A locked block cannot be replaced • When all blocks are locked, permutation must be done, which unlocks all blocks
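The locking rule can be illustrated with a toy, fully associative cache. The victim choice (first unlocked block) and the `HideCache` name are assumptions for this sketch, and the permutation itself is only counted here, not performed.

```python
class HideCache:
    """Toy model of the lock-on-miss rule; permutations are only counted."""

    def __init__(self, num_blocks: int):
        self.num_blocks = num_blocks
        self.locked = {}        # resident block address -> locked flag
        self.permutations = 0

    def access(self, addr: int) -> str:
        if addr in self.locked:
            return "hit"        # hits are served as in an ordinary cache
        if len(self.locked) == self.num_blocks:
            # Only an unlocked block may be replaced. One always exists here,
            # because a permutation runs as soon as every block is locked.
            victim = next(a for a, is_locked in self.locked.items() if not is_locked)
            del self.locked[victim]
        self.locked[addr] = True  # a block loaded on a miss is locked
        if len(self.locked) == self.num_blocks and all(self.locked.values()):
            self.permutations += 1
            self.locked = dict.fromkeys(self.locked, False)  # permutation unlocks all
            return "miss+permute"
        return "miss"


cache = HideCache(num_blocks=2)
events = [cache.access(a) for a in [1, 2, 1, 3, 4]]
print(events, cache.permutations)
```

Between two permutations a locked block cannot be evicted, so the same address never has to be fetched from memory twice under one mapping.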
HIDE – Permutation Steps • A new pv is created mapping set of all current memory addresses to new addresses • Blocks are loaded sequentially from memory and stored in their new location (pv[i]) in an on-chip buffer • Buffer is written back sequentially to memory • If on-chip buffer size S is less than memory size, M, process must be repeated M/S times
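The steps above can be sketched for the simple case where the on-chip buffer holds the whole region being permuted (S ≥ M). The multi-pass case is not modeled. The function name, the use of Python's `random` module, and the logical-to-physical `mapping` list are assumptions for illustration.

```python
import random


def permute_memory(memory: list, rng: random.Random) -> list:
    """Permute memory in place and return the new permutation vector pv."""
    n = len(memory)
    pv = list(range(n))
    rng.shuffle(pv)                # new pv: old location i -> new location pv[i]

    buffer = [None] * n            # stands in for the on-chip buffer
    for i in range(n):             # blocks are loaded sequentially from memory
        buffer[pv[i]] = memory[i]
    for j in range(n):             # buffer is written back sequentially
        memory[j] = buffer[j]
    return pv


memory = [f"block{k}" for k in range(8)]
mapping = list(range(8))           # logical address -> current physical location

rng = random.Random(0)             # seeded only to make the sketch repeatable
for _ in range(3):                 # three successive permutations
    pv = permute_memory(memory, rng)
    mapping = [pv[loc] for loc in mapping]  # compose the old mapping with the new pv

# Every logical block is still reachable through the updated mapping.
all_found = all(memory[mapping[k]] == f"block{k}" for k in range(8))
print(all_found)
```

Both loops touch memory in plain sequential order, so an observer of the address bus during the permutation sees no information about where any block ended up.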
HIDE - Improvements • Since permutation is a lengthy operation, don’t want to wait until all cache blocks are locked • Idea of pre-permutation – start permutation when half of cache blocks are locked
HIDE - Improvements • Instead of permuting entire memory at once, permute chunks at a time • Chunk size is one or more pages • Memory accesses within a chunk preserve security, only accesses across chunks leak information. Reduce by: • Larger chunk size • Store code to minimize inter-chunk access • Requires maintaining info about each page
HIDE - Results • Simulated on a superscalar processor model using SPEC2K benchmarks • Average slowdown was only 1.3% • Memory bandwidth used was on average 9% of total
HIDE - Conclusions • Provides high level of security without imposing much loss in performance • Requires slight modification to L2 cache, addition of permutation hardware • Will not work for multiprocessor systems, since the pv and locking info must be communicated on unsecured bus
In Summary • Supporting software security with hardware is a developing field • Assumes basic model of secure processor with private half of public-private key pair • XOM with OTP keeps memory private, hashes ensure memory is tamper free, and permutation scheme can be used to secure address bus • When combined, allows users to trust results from a secure processor and software developers to create copy-proof software
Pending Questions • Will users accept performance losses in order to gain security? • Will vendors support secure processing? • Problems relating to secret (private) key stored on processor