Notary: Hardware Techniques to Enhance Signatures
• Luke Yen
• Collaborator: Prof. Stark C. Draper
• Advisor: Prof. Mark D. Hill
• University of Wisconsin, Madison
• MICRO-41, November 11, 2008
• www.cs.wisc.edu/multifacet/papers/micro08_notary.pdf
Executive Summary
Tackle two problems with hardware signatures:
• Problem 1: The best signature hashing (H3) has high area and power overheads
• Solution 1: Use entropy analysis to guide lower-cost hashing (Page-Block-XOR, PBX) that performs similarly to H3
  • Ex: 160 XOR gates for H3 vs. 20 XOR gates for PBX
• Problem 2: Spurious signature conflicts caused by signature bits set by private memory addresses
• Solution 2: Avoid inserting private stack addresses, and propose a privatization interface for higher performance
Outline
• Signature background
• Entropy
• Entropy results & PBX
• Privatization
• Methodology & workloads
• Results
• Conclusions & Future Work
Signature background
• Signatures (hardware Bloom filters) summarize a transaction’s read- and write-sets and detect conflicts with them
  • Inspired by the Bulk system [Ceze, ISCA’06]
  • Implemented in LogTM-SE [Yen, HPCA’07]
  • Can have false positives, but never false negatives
• Also proposed for non-TM purposes (e.g., SC violation detection, atomicity violation detection, race recording)
• Ex: Use k Bloom filters of size m/k, each with an independent hash function
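A minimal Python sketch of the "k Bloom filters of size m/k" organization. It assumes each bank gets its own randomly generated H3-style hash and that the inputs are block addresses; the class and method names are hypothetical, chosen for illustration. It shows why signatures never give false negatives: every inserted address sets one bit in every bank, so a later test on that address always succeeds.

```python
import random


class ParallelSignature:
    """Hypothetical signature: k Bloom filters (banks) of m/k bits each."""

    def __init__(self, m_bits, k, addr_bits=32, seed=0):
        bank_bits = m_bits // k
        if m_bits % k or bank_bits & (bank_bits - 1):
            raise ValueError("m/k must be a power of two")
        self.k = k
        self.c = bank_bits.bit_length() - 1  # hash width: 2**c bits per bank
        rng = random.Random(seed)
        # Assumption: one independent H3-style hash per bank, i.e. one random
        # c-bit row per input address bit.
        self.rows = [[rng.getrandbits(self.c) for _ in range(addr_bits)]
                     for _ in range(k)]
        self.banks = [0] * k  # each bank is an m/k-bit vector held in an int

    def _hash(self, bank, addr):
        h = 0
        for i, row in enumerate(self.rows[bank]):
            if (addr >> i) & 1:
                h ^= row
        return h

    def insert(self, addr):
        for b in range(self.k):
            self.banks[b] |= 1 << self._hash(b, addr)

    def test(self, addr):
        # A (possible) conflict requires a set bit in every bank.
        return all((self.banks[b] >> self._hash(b, addr)) & 1
                   for b in range(self.k))

    def clear(self):
        self.banks = [0] * self.k


sig = ParallelSignature(m_bits=2048, k=2)  # 2kb signature, 2 hash functions
block_addrs = [0x1000 + i for i in range(100)]
for a in block_addrs:
    sig.insert(a)
print(all(sig.test(a) for a in block_addrs))  # True: no false negatives
```

Addresses that were never inserted may still test positive (a false positive) when they hit set bits in all k banks.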
Signature hash functions
• Which hash function is best? [Sanchez, MICRO’07]
  • Bit-selection: the hash simply decodes some number of input bits
  • H3: each bit of a hash value is an XOR of (on average) half of the input address bits
• Result (LogTM-SE w/ 2kb signatures): H3 is better with >= 2 hash functions
• However, H3 uses many multi-level XOR trees
• Can we improve this?
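The two hash families can be contrasted with a short Python sketch. It assumes 64-byte blocks (bit-selection starts at bit 6) and a randomly generated H3 matrix; the function names are hypothetical. Addresses that differ only in high-order bits all collide under bit-selection, while H3 mixes every input bit into the result.

```python
import random


def bit_select_hash(addr, c, low_bit=6):
    """Bit-selection: decode c consecutive address bits starting at low_bit."""
    return (addr >> low_bit) & ((1 << c) - 1)


def make_h3_rows(c, n_bits, seed=0):
    """H3: a random n_bits x c binary matrix, one c-bit row per input bit."""
    rng = random.Random(seed)
    return [rng.getrandbits(c) for _ in range(n_bits)]


def h3_hash(addr, rows):
    """XOR together the rows of all input bits that are 1.

    Output bit j is therefore the XOR of the input bits whose row has bit j
    set; with random rows that is about half of the input bits.
    """
    h = 0
    for i, row in enumerate(rows):
        if (addr >> i) & 1:
            h ^= row
    return h


rows = make_h3_rows(c=10, n_bits=32)
# Addresses that differ only in high-order bits (above bit 15).
addrs = [i << 20 for i in range(8)]
print({bit_select_hash(a, 10) for a in addrs})     # {0}: all collide
print(len({h3_hash(a, rows) for a in addrs}) > 1)  # H3 spreads them out
```

Each H3 output bit is a multi-input XOR, which in hardware becomes the multi-level XOR trees the slide refers to.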
H3 implementation
• Number of XOR gates, e.g., for 2kb signatures with k=2 hash functions, c=10 output bits per hash (2^10 = 1024 = 2kb / 2), and 32-bit addresses: 160 XOR gates per signature
• Can we reduce the total gate count?
Entropy overview
• Not all address bits have equal randomness
  • Ex: High-order address bits are unlikely to change if the working set is small
• Key insight: if the input bits to a hash function are random, the resulting hash values are random
• Use entropy to measure bit randomness
  • Entropy: a measure of the uncertainty of a random variable x
Entropy formally defined
• Entropy: $H(x) = -\sum_{i=1}^{N} p(x_i)\log_2 p(x_i)$
  • p(x_i) = the probability of the occurrence of value x_i
  • N = number of values random variable x can take on
• Entropy = average amount of information (in bits) required to describe the outcome of x
  • Ex: entropy bounds the best possible lossless compression of x
• Entropy value of an n-bit field:
  • max = n bits: all bit patterns in the n-bit field equally likely
  • min = 0 bits: the n-bit field has a constant value
  • other cases: between 0 and n bits
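The definition can be turned into a small Python estimator for an n-bit field, with each p(x_i) taken as the observed frequency of field value x_i in a sample. This is a sketch under that frequency assumption; the function name is hypothetical. The two calls reproduce the max and min cases listed above.

```python
import math
from collections import Counter


def field_entropy(values, lo, n):
    """Entropy in bits of the n-bit field starting at bit lo.

    Uses H(x) = -sum p(x_i) log2 p(x_i), written as sum p * log2(1/p), with
    p(x_i) estimated from the observed frequency of each field value.
    """
    mask = (1 << n) - 1
    counts = Counter((v >> lo) & mask for v in values)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


print(field_entropy(range(16), lo=0, n=4))  # 4.0: all 16 patterns equally likely
print(field_entropy([5] * 10, lo=0, n=4))   # 0.0: constant field
```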
Our measures of entropy
• For our workloads, we care about:
  • Q1: What is the best achievable entropy?
    • Global entropy: upper bound on the entropy of an address
  • Q2: How does entropy change within an address?
    • Local entropy: entropy of a bit-field within the address
[Figure: global entropy computed over address bits 6 to 31; local entropy computed over a bit-window offset NSkip bits above bit 6]
Entropy results
• Workloads described later
• Global entropy is at most 16 bits
• Bit-window for local entropy is 16 bits wide (NSkip from 0 to 10)
  • Smaller windows (<16b) may not reach the global entropy value
  • Larger windows (>16b) hide some fine-grain information
Entropy results summary
• More entropy results in our MICRO paper
• In summary, for our workloads entropy decreases monotonically toward the high-order bits
• We calculate the average entropy across each workload’s entire execution
  • May miss entropy changes due to program phase behavior
• Our Page-Block-XOR (PBX) hash takes advantage of this overall trend
Page-Block-XOR (PBX)
• Motivated by three findings:
  • (1) Lower-order bits have the most entropy
    • Follows from our entropy results
  • (2) XORing two bit-fields produces random hash values
    • From prior work on XOR hashing (e.g., data placement in caches, DRAM)
  • (3) Bit-field overlaps can lead to more false positives
    • Correlation between the two bit-fields can reduce the range of hash values produced (worse for larger signatures)
PBX implementation
• XOR a bit-field from the physical page number (PPN) with a bit-field from the cache-index bits
• The PPN and cache-index fields are not tied to system parameters:
  • Use entropy to find two non-overlapping bit-fields with high randomness
• For 2kb signatures with 2 hash functions:
  • 20 XOR gates for PBX (one 2-input XOR per hash-value bit: 2 × 10) vs. 160 XOR gates for H3!
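A minimal Python sketch of a single PBX hash. The field positions (cache-index bits 6 to 15, page-number bits 16 to 25) are hypothetical stand-ins for fields chosen by entropy analysis. How PBX derives its second hash function is not shown on this slide, so the sketch computes one hash and only counts gates for k hashes.

```python
def pbx_hash(addr, page_lo, block_lo, c):
    """Hypothetical PBX hash: XOR a c-bit field from the page-number bits
    (starting at page_lo) with a c-bit field from the cache-index bits
    (starting at block_lo). Each of the c output bits is one 2-input XOR."""
    if block_lo + c > page_lo:
        raise ValueError("bit-fields must not overlap")
    mask = (1 << c) - 1
    page_field = (addr >> page_lo) & mask
    block_field = (addr >> block_lo) & mask
    return page_field ^ block_field


def pbx_xor_gates(k, c):
    """One 2-input XOR gate per output bit, per hash function."""
    return k * c


# Assumed field positions: cache-index bits 6..15, page-number bits 16..25.
addr = (3 << 16) | (5 << 6)
print(pbx_hash(addr, page_lo=16, block_lo=6, c=10))  # 6 (= 3 ^ 5)
print(pbx_xor_gates(k=2, c=10))                      # 20, as on the slide
```

The overlap check reflects finding (3): overlapping fields are correlated and shrink the range of hash values.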
Summary thus far
• Problem 1: H3 has high area and power overheads
• Solution 1: Use entropy analysis to guide the lower-cost PBX
  • Ex: 160 XOR gates for H3 vs. 20 for PBX
• Problem 2: Spurious signature conflicts caused by signature bits set by private memory addresses
• Solution 2: Described next
Motivation
• False conflicts can be caused by thread-private addresses
• These conflicts are avoided if private addresses are not inserted into the thread’s signatures
Privatization solutions
• Two solutions proposed:
  • (1) Remove private stack references from signatures
    • Very little work for programmer/compiler
    • Benefits depend on the fraction of stack addresses among all transactional references
  • (2) Language-level interface (e.g., private_malloc(), shared_malloc())
    • Even higher performance boost
    • For skilled programmers
    • WARNING: Incorrectly marking shared objects as private can lead to program errors!
Page-based implementation
• Each page is assigned a status: private or shared
  • Invariant: a page is shared if any object on it is shared
• If the stack is private, the library marks stack pages as private
• If using the privatization heap functions, heap pages are marked accordingly
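The page-status invariant can be sketched as simple bookkeeping in Python. The class, its methods, and the choice to treat unknown pages as shared are assumptions for illustration. Treating unknown pages as shared is the conservative default, since wrongly treating a shared page as private can cause errors.

```python
PRIVATE, SHARED = "private", "shared"


class PageStatusTable:
    """Hypothetical bookkeeping for per-page sharing status.

    Invariant from the slide: a page is shared if any object on it is shared.
    """

    def __init__(self):
        self.objects = {}  # page number -> list of object statuses on that page

    def add_object(self, page, status):
        self.objects.setdefault(page, []).append(status)

    def mark_stack_pages(self, pages):
        # If the stack is private, the library marks its pages private.
        for page in pages:
            self.add_object(page, PRIVATE)

    def status(self, page):
        statuses = self.objects.get(page)
        if statuses is None:
            return SHARED  # assumption: unknown pages default to shared (safe)
        return SHARED if SHARED in statuses else PRIVATE


table = PageStatusTable()
table.mark_stack_pages([100, 101])
table.add_object(7, PRIVATE)
table.add_object(7, SHARED)  # one shared object makes page 7 shared
print(table.status(100), table.status(7))  # private shared
```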
OS support
• OS allocates different physical page frames for shared and private pages
  • Sets a per-frame bit in the translation entry if shared
  • Reduces the number of page frames used by packing objects with the same status together
• Signatures insert memory addresses of transactional references to shared pages only
  • Query the page-sharing bit in the HW TLB and the current transactional status
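The insertion filter can be sketched in Python: an address enters the signature only when the reference is transactional and its TLB entry carries the sharing bit. Several details are assumptions for illustration: 8 KB pages, 64-byte blocks, the signature recording physical block addresses, and the hypothetical names `TLBEntry` and `maybe_insert`. A Python set stands in for the real, lossy signature.

```python
from dataclasses import dataclass

PAGE_SHIFT = 13  # assumption: 8 KB pages
BLOCK_SHIFT = 6  # assumption: 64-byte blocks


@dataclass
class TLBEntry:
    frame: int    # physical page frame number
    shared: bool  # per-frame sharing bit set by the OS


def maybe_insert(signature, vaddr, tlb, in_transaction):
    """Insert the physical block address only for transactional references
    to shared pages. `tlb` maps virtual page numbers to TLBEntry objects.
    Returns True if the address was inserted."""
    if not in_transaction:
        return False
    entry = tlb[vaddr >> PAGE_SHIFT]
    if not entry.shared:
        return False  # private page: keep it out of the signature
    paddr = (entry.frame << PAGE_SHIFT) | (vaddr & ((1 << PAGE_SHIFT) - 1))
    signature.add(paddr >> BLOCK_SHIFT)
    return True


tlb = {1: TLBEntry(frame=10, shared=True), 2: TLBEntry(frame=11, shared=False)}
sig = set()
maybe_insert(sig, (1 << 13) | 0x40, tlb, True)   # shared page: inserted
maybe_insert(sig, (2 << 13) | 0x40, tlb, True)   # private page: skipped
maybe_insert(sig, (1 << 13) | 0x80, tlb, False)  # not in a transaction: skipped
print(sig)  # {1281}
```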
Methodology
• Full-system simulation using Simics and Wisconsin GEMS timing modules
• Transistor-level design for area and power of XOR gates
• CACTI for Bloom filter bit-array area and power
• Simulated system:
  • Single-chip CMP
  • 16 single-threaded, in-order cores
  • 32kB, 4-way private L1 I & D, write-back
  • 8MB, 8-way shared L2 cache
  • MESI directory protocol
  • Signatures from 64b to 64kb (8B to 8kB), plus “Perfect”
Workloads
• Micro-benchmarks
  • BTree: read and write ops on a shared tree
  • Sparse Matrix: algorithm from dense column vector multiplication kernel
• SPLASH-2 apps
  • Barnes & Raytrace: exert the most signature pressure
• Stanford STAMP apps
  • Vacation, Genome, Delaunay, Bayes, Labyrinth
• DNS server
  • BIND
PBX vs. H3 area & power
[Figure: area and power overheads of PBX vs. H3 (2kb signatures, k=4)]
PBX vs. H3 execution time
• PBX performs similarly to H3
• Additional workload results in the paper
Privatization results summary
• Removing private stack references from signatures did not help much
  • Most address references are not to the stack
  • Most likely because we ran with the SPARC ISA; other ISAs (e.g., x86) likely benefit more
• The privatization interface helps four workloads
  • The rest either lack private heap structures or do not have a high transactional duty cycle
Privatization interface results
Conclusions
• Tackle two problems with signature designs:
  • (1) Area and power overheads of H3 hashing
    • E.g., 160 XOR gates for H3, 20 for PBX
  • (2) False conflicts due to signature bits set by private memory references
• Our solutions:
  • (1) Use entropy analysis to guide the hash function (PBX), a low-cost alternative that performs similarly to H3
  • (2) Prevent private stack references from entering signatures, and propose a privatization interface for heap allocations
• Notary can be applied to non-TM uses:
  • PBX hashing can transfer directly
  • Privatization may transfer if address filtering applies
Future Work
• Dynamic entropy calculation:
  • How can PBX hashing adapt to entropy changes over time?
• Dynamic privatization characteristics:
  • How common is it for objects to change sharing status (i.e., from private to shared, and vice versa)?
BACKUP SLIDES
Privatization interface
Dynamic privatization
• Dynamically switch objects from private to shared, and vice versa
• Transitioning from private -> shared: safe to mark the page as shared (at a cost in performance)
• Transitioning from shared -> private: the default policy is to disallow it if other shared objects exist on the same page
  • Otherwise, trap to user software and let the programmer call shared_free(), followed by private_malloc(), on the object
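The transition rules can be sketched in Python. A raised exception stands in for the trap to user software. `Page`, `make_shared`, `make_private`, and `PrivatizationTrap` are hypothetical names, and the real mechanism operates on page frames rather than Python dictionaries.

```python
class PrivatizationTrap(Exception):
    """Stands in for the trap to user software (hypothetical)."""


class Page:
    def __init__(self, objects):
        self.objects = dict(objects)  # object name -> "private" or "shared"

    @property
    def shared(self):
        return "shared" in self.objects.values()


def make_shared(page, obj):
    # private -> shared is always safe; the page becomes (or stays) shared.
    page.objects[obj] = "shared"


def make_private(page, obj):
    # shared -> private: disallowed if another shared object lives on the page.
    if any(s == "shared" for o, s in page.objects.items() if o != obj):
        raise PrivatizationTrap(
            f"{obj}: call shared_free() then private_malloc() to move it")
    page.objects[obj] = "private"


page = Page({"a": "private", "b": "private"})
make_shared(page, "a")
print(page.shared)   # True
make_private(page, "a")
print(page.shared)   # False
```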
Bit-field overlaps harmful for PBX
Removing stack refs doesn’t help significantly
Entropy of commercial workloads
Signature Operation Example
Program: xbegin; LD A; ST B; LD C; LD D; ST C; …
• Hash function(s) map read addresses A, C, D into the R signature (00100100) and write addresses B, C into the W signature (00100010)
• Different addresses can alias, i.e., hash to the same bit
• External ST E: hashes to unset bits → NO CONFLICT
• External ST F: hashes to a bit that is already set → FALSE POSITIVE: CONFLICT!
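The example can be replayed in Python with 8-bit signatures. The address-to-bit mapping in `HASH_BIT` is hypothetical, chosen only so the final R and W patterns match those above. Under this mapping, D aliases with A and F aliases with C. An external store is checked against both the read and write signatures.

```python
# Hypothetical hash: address name -> bit position in an 8-bit signature,
# chosen only so the result matches the bit patterns on the slide.
HASH_BIT = {"A": 2, "B": 1, "C": 5, "D": 2, "E": 0, "F": 5}

PROGRAM = [("LD", "A"), ("ST", "B"), ("LD", "C"), ("LD", "D"), ("ST", "C")]


def run(program):
    """Build the read (R) and write (W) signatures of a transaction."""
    r = w = 0
    for op, addr in program:
        if op == "LD":
            r |= 1 << HASH_BIT[addr]
        else:
            w |= 1 << HASH_BIT[addr]
    return r, w


def external_store_conflicts(addr, r, w):
    """An external store conflicts with both the read and the write set."""
    return bool((r | w) & (1 << HASH_BIT[addr]))


r, w = run(PROGRAM)
print(format(r, "08b"), format(w, "08b"))  # 00100100 00100010
print(external_store_conflicts("E", r, w))  # False: no conflict
print(external_store_conflicts("F", r, w))  # True: false positive
```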
Type of Hash Functions
• In real programs, addresses are neither independent nor uniformly distributed (key assumptions used to derive the false-positive probability PFP(n))
• But good (universal/almost universal) hash functions can generate hash values that are almost uniformly distributed and uncorrelated
• Hash functions considered:
  • Bit-selection (inexpensive, low quality)
  • H3 [Carter, CSS79] (moderate cost, higher quality)
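Assume addresses are independent and uniformly distributed, the two assumptions mentioned above. Under those assumptions, the standard false-positive expression for k parallel Bloom filters of m/k bits after n insertions is PFP(n) = (1 - (1 - k/m)^n)^k. We assume this is the PFP(n) the slide refers to. A small Python sketch evaluates it:

```python
def pfp(n, m, k):
    """False-positive probability after n insertions into k parallel Bloom
    filters of m/k bits each, assuming independent, uniformly distributed
    hash values: (1 - (1 - k/m)**n)**k."""
    p_bit_set = 1 - (1 - k / m) ** n  # chance a given bank bit has been set
    return p_bit_set ** k             # a false positive needs a hit in all k banks


for n in (0, 64, 256):
    print(n, pfp(n, m=2048, k=2))
```

Real address streams violate these assumptions, which is why the hash function's quality matters in practice.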