430 likes | 687 Views
Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory. TRANSACT 2014. Irina Calciu Justin Gottschlich Tatiana Shpeisman Gilles Pokam Maurice Herlihy. Multicore Performance Scaling. 2. Hardware Transactional Memory (HTM).
E N D
Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory TRANSACT 2014 Irina Calciu Justin Gottschlich Tatiana Shpeisman Gilles Pokam Maurice Herlihy
Hardware Transactional Memory (HTM) IBM’s Blue Gene/Q & System Z & Power Architecture Intel’s Haswell TSX: RTM & HLE Low overhead (cache based) 3
Haswell RTM if (_xbegin() == _XBEGIN_STARTED) Speculate Execution _xend() else Abort Handler Speculate Execution, without any locks Read and Write Sets Abort on memory conflict 4
Haswell RTM if (_xbegin() == _XBEGIN_STARTED) Speculate Execution _xend() _xbegin() _xbegin() Add to Write Set Write X ABORT Add to Read Set Read X Add to Write Set Write Y Write Y Add to Write Set _xend() COMMIT _xend() Make the change to Y visible 5
Lock Elision <HLE_Aquire_Prefix> Lock(L) Atomic region executed as a transaction or mutually exclusive on L <HLE_Release_Prefix> Release(L) Execute optimistically, without any locks Track Read and Write Sets Abort on memory conflict: rollback acquire lock 6
7 [Anand Tech]
Haswell RTM Small & Medium Transactions Best-effort Conflicts Overflow Unsupported Instructions Interrupts Needs software fallback 8
Overview • Best-effort Hardware Transactional Memory Description • Lazy SGL Correctness Results • Bloom Filter SGL 9
Single Global Lock HyTM (simple and common) Begin HW txn Read L Try_SPEC: Wait until Lock is free Transactional_Read(Lock) If Lock is taken ABORT Speculate critical section End speculation End HW txn Begin SW txn Acquire L On_ABORT: If try_lock(Lock) Critical section Release(Lock) Else Try_SPEC Does not abort! Release L End SW txn 10
Single Global Lock HyTM (simple and common) Begin HW txn Read L Begin HW txn Read L Begin SW txn Acquire L Begin HW txn Read L X X X Begin HW txn Read L End HW txn (1) Time End HW txn (2) X Release L End SW txn End HW txn (3) End HW txn (4) Legend: X = ABORT 11
Thread 2 Thread 1 Begin_HW_TXN (L) Begin_HW_TXN (L) CRITICAL SECTION CRITICAL SECTION End_HW_TXN (L) End_HW_TXN (L) Acquire(L) CRITICAL SECTION (SW TXN) Time Release(L) Begin_HW_TXN (L) Begin_HW_TXN (L) CRITICAL SECTION CRITICAL SECTION End_HW_TXN (L) End_HW_TXN (L) Begin_HW_TXN (L) CRITICAL SECTION End_HW_TXN (L) Execution Time 1 12
Thread 2 Thread 1 Begin_HW_TXN (L) Begin_HW_TXN (L) CRITICAL SECTION CRITICAL SECTION End_HW_TXN (L) End_HW_TXN (L) Acquire(L) CRITICAL SECTION (SW TXN) Begin_HW_TXN(L) CRITICAL SECTION Time Release(L) End_HW_TXN(L) Begin_HW_TXN (L) Begin_HW_TXN (L) CRITICAL SECTION CRITICAL SECTION End_HW_TXN (L) End_HW_TXN (L) Execution Time 2 Execution Time 1 13
Lazy SGL Begin HW txn Try_SPEC: Speculate critical section Transactional_Read(Lock) If Lock is taken ABORT End speculation Read L End HW txn Begin SW txn Acquire L On_ABORT: If try_lock(Lock) Critical section Release(Lock) Else Try_SPEC Does not abort! Release L End SW txn 14 14
Lazy SGL Begin HW txn Begin HW txn Begin SW txn Acquire L Begin HW txn X Begin HW txn Read L End HW txn (1) X Read L End HW txn (2) Time Release L End SW txn Read L End HW txn (3) Read L End HW txn (4) Legend: X = ABORT COMMIT COMMIT 15
Overview • Best-effort Hardware Transactional Memory Description • Lazy SGL Correctness Results • Bloom Filter SGL 16
Transactional Memory Correctness Transaction 1 SW Order T2 BEFORE T1 Transaction 2 HW Time COMMIT Order T2 AFTER T1 COMMIT 17
Case 1: HW begins SW begins HW ends SW ends Thread 1 (SW) Thread 2 (HW) TXN_BEGIN … X = b … TXN_END Acquire Lock … X = a … Release Lock X value: Time b a Correct: a Actual: b Check Lock Correct: a Actual: a ABORT 18
Case 2: SW begins HW begins HW ends SW ends Thread 1 (SW) Thread 2 (HW) Acquire Lock … X = a … Release Lock TXN_BEGIN … X = b … TXN_END X value: Time Correct: a Actual: a Correct: a Actual: b Check Lock ABORT 19
Case 3: SW begins HW begins SW ends HW ends Thread 1 (SW) Thread 2 (HW) Acquire Lock … X = a … Release Lock TXN_BEGIN … X = b … TXN_END X value: Time b a Check Lock Correct: b Actual: b COMMIT 20
Case 4: HW begins SW begins SW ends HW ends Thread 1 (SW) Thread 2 (HW) TXN_BEGIN … X = b … TXN_END Acquire Lock … X = a … Release Lock Time X value: Correct: b Actual: b Check Lock COMMIT 21
Hardware Sandboxing Thread 1 (SW) Thread 2 (HW) X = 5; Y = 6 Acquire Lock … ++X … ++Y … Release Lock TXN_BEGIN … Z = 1/(Y-X) … TXN_END Time Z = 1/0 !!! 22
Indirect Jumps Thread 1 (SW) Thread 2 (HW) _xbegin … if (X == Y) *p = garbage p() … if (lock) abort _xend X = 5; Y = 6 Acquire Lock … ++X … ++Y … Release Lock _xend Indirect jump to garbage location Time 23
Overview • Best-effort Hardware Transactional Memory Description • Lazy SGL Correctness Results • Bloom Filter SGL 24
Intruder (medium txns) Better 25
Improved Lock Acquisition Rate Vacation Low (medium txns) Intruder (medium txns) Better Kmeans High (small txns) Labyrinth (large txns) 26
No single thread overhead Slowdown relative to sequential for 1 thread 27
Overview • Best-effort Hardware Transactional Memory Description • Lazy SGL Correctness Results • Bloom Filter SGL 28
Bloom Filters • Efficient probabilistic data structure to compute fast set intersection • Can admit false positives • No false negatives • Used in TM for Conflict Detection 29
Lazy SGL Begin HW txn Begin HW txn Begin SW txn Acquire L Begin HW txn X Begin HW txn Read L End HW txn (1) X Read L End HW txn (2) Time Release L End SW txn Read L End HW txn (3) Read L End HW txn (4) Legend: X = ABORT COMMIT COMMIT 30
BF SGL Begin HW txn Begin HW txn Begin SW txn Acquire L Begin HW txn Begin HW txn Check BF End HW txn (1) Time Check BF End HW txn (2) Release L End SW txn Read L End HW txn (3) Read L End HW txn (4) Legend: X = ABORT COMMIT COMMIT 31
Case 1: HW begins SW begins HW ends SW ends Thread 1 (SW) Thread 2 (HW) TXN_BEGIN … X = b … TXN_END Acquire Lock … X = a … Release Lock X value: Time b a Correct: a Actual: b Check BF Check Lock Correct: a Actual: a ABORT If BFs intersect: ABORT Else: COMMIT 32
Case 2: SW begins HW begins HW ends SW ends Thread 1 (SW) Thread 2 (HW) Acquire Lock … X = a … Release Lock TXN_BEGIN … X = b … TXN_END X value: Time Correct: a Actual: a Correct: a Actual: b Check Lock Check BF ABORT If BFs intersect: ABORT Else: COMMIT 33
Conclusions • HTMs are becoming more available • Best-effort – need software fallback • Eager SGL • simple and fast fallback, • often preferred to more efficient solutions 34
Conclusions • Lazy SGL • as simple as Eager SGL • more efficient • Bloom Filter SGL • more accurate conflict detection • Slower • Can be implemented directly in hardware 35
References http://de.sap.info/wp-content/uploads/2012/02/In_Memory_Technologie.jpg http://www.avoiceformen.com/wp-content/uploads/sites/2/2013/01/Questions.jpg