Fence Complexity in Concurrent Algorithms
Petr Kuznetsov, TU Berlin/DT-Labs
STM is about ease-of-programming and efficiency
What is “efficient” in a concurrent system?
Cost metrics
• Space: used memory
  • Cheap
  • Advanced garbage collection
• Time:
  • the number of reads and writes (per operation)
  • the number of stalls
Relaxed memory models
Memory is much slower than the CPU
Read: check the cache -> read the memory
Write: invalidate the caches -> update the memory
To overcome “stalled writes”: reorder operations
Reordering may result in inconsistency
What is inconsistency?
Process P: Write(X,1); Read(Y)
Process Q: Write(Y,1); Read(X)
[Figure: timelines of P and Q with the events W(X,1), R(Y), W(Y,1), R(X)]
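Possible outcomes
• P reads before Q writes
• Q reads after P writes
• P reads after Q writes
• Q reads before P writes
• Out-of-order: “P reads before Q writes” together with “Q reads before P writes” is possible only if operations are reordered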
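The outcomes can be checked by brute force. The sketch below enumerates every global order of the four steps of P and Q, first keeping only the orders that respect each process's program order, then allowing arbitrary reordering. The initial value 0 for X and Y and all names in the code are assumptions made for illustration; this is a toy enumeration, not a model of any particular hardware.

```python
from itertools import permutations

# Program order of each process, as on the slide.
PROGRAMS = {
    "P": [("write", "X"), ("read", "Y")],
    "Q": [("write", "Y"), ("read", "X")],
}

def run(schedule):
    """Execute one global order of steps; return (P's read, Q's read)."""
    memory = {"X": 0, "Y": 0}  # assumption: both variables start at 0
    reads = {}
    for proc, (kind, var) in schedule:
        if kind == "write":
            memory[var] = 1
        else:
            reads[proc] = memory[var]
    return reads["P"], reads["Q"]

def outcomes(allow_reordering):
    """Collect the read results over all admissible global orders."""
    steps = [(proc, step) for proc, prog in PROGRAMS.items() for step in prog]
    seen = set()
    for schedule in permutations(steps):
        respects_program_order = all(
            schedule.index((proc, prog[0])) < schedule.index((proc, prog[1]))
            for proc, prog in PROGRAMS.items()
        )
        if respects_program_order or allow_reordering:
            seen.add(run(schedule))
    return seen

in_order = outcomes(allow_reordering=False)
relaxed = outcomes(allow_reordering=True)
print(sorted(in_order))            # [(0, 1), (1, 0), (1, 1)]
print(sorted(relaxed - in_order))  # [(0, 0)]
```

Only the reordered executions produce (0, 0), the case where neither process sees the other's write.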
Fixing out-of-order
• Memory fences: read-after-write (RAW)
  write(X,1)
  fence() // enforce the order
  read(Y)
[Figure: timelines of P and Q with W(X,1) ordered before R(Y) on P, and W(Y,1), R(X) on Q]
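One way to see what the fence buys is a toy store-buffer model: each write first sits in a per-process buffer and becomes visible only when it is drained to memory, and a fence may complete only once the issuing process's buffer is empty. The sketch below explores every execution of the P/Q example in that model, with and without the fence. The model, the function name `explore`, and the initial value 0 are illustrative assumptions, not a description of a specific architecture.

```python
def explore(fenced):
    """Return every (P's read, Q's read) reachable in a toy store-buffer model."""
    def program(write_var, read_var):
        steps = [("write", write_var)]
        if fenced:
            steps.append(("fence", None))  # the RAW fence from the slide
        steps.append(("read", read_var))
        return steps

    progs = {"P": program("X", "Y"), "Q": program("Y", "X")}
    results = set()

    def step(pc, buf, mem, reads):
        # pc: next instruction per process; buf: writes not yet visible in memory
        if all(pc[p] == len(progs[p]) for p in progs):
            results.add((reads["P"], reads["Q"]))
            return
        for p in progs:
            # Option 1: the memory system drains p's oldest buffered write.
            if buf[p]:
                var = buf[p][0]
                step(pc, {**buf, p: buf[p][1:]}, {**mem, var: 1}, reads)
            # Option 2: p executes its next instruction.
            if pc[p] < len(progs[p]):
                kind, var = progs[p][pc[p]]
                nxt = {**pc, p: pc[p] + 1}
                if kind == "write":
                    step(nxt, {**buf, p: buf[p] + (var,)}, mem, reads)
                elif kind == "fence":
                    if not buf[p]:  # a fence completes only on an empty buffer
                        step(nxt, buf, mem, reads)
                else:
                    # A process never reads a variable it wrote in this example,
                    # so reading straight from memory is sufficient here.
                    step(nxt, buf, mem, {**reads, p: mem[var]})

    step({"P": 0, "Q": 0}, {"P": (), "Q": ()}, {"X": 0, "Y": 0}, {})
    return results

print((0, 0) in explore(fenced=False))  # True
print((0, 0) in explore(fenced=True))   # False
```

In this model the fence removes exactly the executions in which a read overtakes the process's own earlier write.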
Fixing out-of-order
• Atomic operations: atomic-write-after-read (AWAR)
  atomic{ read(Y) … write(X,1) }
E.g., CAS, TAS, Fetch&Add, …
RAW/AWAR fences take ~60 RMRs
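The AWAR pattern can be illustrated with a compare-and-swap whose read and write happen as one indivisible step. In the sketch below a lock stands in for the hardware atomicity, and Fetch&Add is built from a CAS retry loop. The class `AtomicInt` and the helper `fetch_and_add` are hypothetical names for illustration; real hardware CAS involves no lock.

```python
import threading

class AtomicInt:
    """Toy atomic cell; the lock is an assumption standing in for hardware atomicity."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        # The read (comparison) and the write form one atomic step: an AWAR.
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True

def fetch_and_add(cell, delta):
    """Fetch&Add via a CAS retry loop; returns the value seen before the add."""
    while True:
        old = cell.load()
        if cell.compare_and_swap(old, old + delta):
            return old

counter = AtomicInt()

def worker():
    for _ in range(1000):
        fetch_and_add(counter, 1)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.load())  # 4000
```

No increment is lost because a CAS succeeds only if nothing was written between its read and its write.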
Our result
• Any concurrent program in a certain class must use RAW/AWARs
What programs?
• Concurrent data types:
  • queues, counters, hash tables, trees, …
  • Non-commutative operations
  • Linearizable solo-terminating implementations
• Mutual exclusion
Non-commutative operations
Operation A is non-commutative if there exist an operation B and a state such that, applied to that state, A influences B and B influences A
Example: Queue
• enq(v) – adds v to the end of the queue
• deq() – dequeues the item at the head of the queue
Q = 1;2
Q.deq():1; Q.deq():2 vs. Q.deq():2; Q.deq():1 (the two deq() operations influence each other)
Q.enq(3):ok; Q.deq():1 vs. Q.deq():1; Q.enq(3):ok (enq() is commutative)
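The queue example can be replayed on a sequential queue. The sketch below reads "A influences B" as "B's response from a given state changes when A is applied first"; that reading is an assumption drawn from the example on the slide, and the helper names `enq`, `deq` and `influences` are hypothetical.

```python
from collections import deque

def enq(v):
    """Build an enqueue operation for value v; it always responds 'ok'."""
    def op(q):
        q.append(v)
        return "ok"
    return op

def deq(q):
    """Remove and return the head of the queue, or None if it is empty."""
    return q.popleft() if q else None

def influences(a, b, state):
    """True if b's response from `state` changes when a is applied first."""
    alone = b(deque(state))
    q = deque(state)
    a(q)
    after = b(q)
    return alone != after

state = [1, 2]
print(influences(deq, deq, state))     # True: deq() returns 1 alone, 2 after another deq()
print(influences(enq(3), deq, state))  # False: deq() returns 1 either way
print(influences(deq, enq(3), state))  # False: enq(3) returns ok either way
```

On the state 1;2 each deq() changes what the other returns, while enq(3) and deq() leave each other's responses untouched.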
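Proof sketch
• A non-commutative operation must write
• Suppose not: then two deq() operations applied to the queue 1;2 could both return 1, so there must be a write w!
[Figure: two deq() operations on the queue 1;2, both returning 1; the write w is marked]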
Proof sketch
• Let w be the first write
• Suppose there are no AWARs
• A(w): the longest atomic construct containing w
• w must be the first base-object event in A(w)!
[Figure: deq():1 on the queue 1;2, with the write w marked]
Proof sketch
• Suppose there are no RAWs
• No RAW: no difference for deq()!
[Figure: two deq() operations on the queue 1;2, both returning 1; A(w) is marked]
Mutual exclusion
Lock() – acquire the lock
Unlock() – release the lock
• (Mutex) No two processes hold the lock at the same time
• (Deadlock-freedom) If at least one process executes Lock() and no active process fails, at least one process acquires the lock
Two Lock() operations influence each other!
Our result
• In any implementation of mutual exclusion or of a concurrent data type with a non-commutative operation op, a complete execution of op or Lock() contains a RAW or an AWAR
• Every successful lock acquire incurs a RAW/AWAR fence
Why do we care?
• Hardware design: what primitives must be optimized?
• API design: returned values matter
  • Set with add returning fail vs. returning ok
• Verification: catch obviously incorrect algorithms early
What’s next?
• Weaker primitives?
  • Idempotent Work Stealing [Michael et al., PPoPP’09]
• Tight lower bounds?
  • How many RAW/AWAR fences are incurred?
• Other patterns
  • Read-after-read
  • Write-after-write
  • Multi-RAW: write(Xi,1) collect(X1,..,Xn)
References
• H. Attiya, R. Guerraoui, D. Hendler, P. Kuznetsov, M. Michael, M. Vechev. Laws of Order: Expensive Synchronization in Concurrent Algorithms Cannot be Eliminated. In POPL 2011
• Srivatsan’s talk on STM fence complexity, TR on the way