
Programming w/ Concurrency #2: Multithreaded Programming with Shared Memory


Presentation Transcript


  1. Programming w/ Concurrency #2: Multithreaded Programming with Shared Memory
     Joe Duffy, FUN405
     Program Manager, CLR Team, Microsoft Corporation

  2. Agenda • Shared Memory • Lock Implementation Trivia • Memory Models • GUIs and COM • Wrap-Up

  3. Shared Memory Basics
     • Concurrent workers can share memory to communicate
       • Objects in the heap
       • Raw memory in the address space
       • System-wide kernel objects and memory-mapped I/O
     • With sharing comes responsibility
       • Dealing with broken invariants, avoiding corruption

  4. Shared Memory Basics: Concept recap
     • Invariants are assumed conditions in your code
     • When invariants are broken, locking can ensure:
       • Serialization: things happen one after the other
       • Atomicity: either it happens fully, or the effects are not visible at all

     object myLock = new object();
     void Foo() {
         lock (myLock) {
             // munge the data structure (not happy)
             // but leave it in a happy state
         }
     }

     object myLock = new object();
     void Foo() {
         lock (myLock) {
             try {
                 // munge the data structure
                 // and leave it in a happy state (unless an exception occurs)
             } catch {
                 if (NotConsistent) {
                     // erase any partial munges
                 }
             }
         }
     }

  5. Sharing Memory: Shared state in our code
     • WinFX code modifies statics and internal CLR state in a thread-safe manner
       • Designed to tolerate concurrency
       • Avoids corrupting shared state
       • Suggests to hosts when to rip the AppDomain instead of aborting a single thread
     • Instances are not thread-safe
       • We don’t know it’s shared if you share it
       • You are responsible for ensuring thread-safety
       • There are very few exceptions, e.g. Thread
       • Recommended guidance for reusable libraries

  6. Locking Challenges: Heisenbugs
     [Diagrams: a four-thread deadlock cycle, a low/high-priority inversion, and a lock convoy]
     • General challenges
       • Deadlocks
       • Priority inversion
       • Lock convoys
     • Accidental deadlocks can be caused by locking on:
       • Cross-AppDomain-bled objects, e.g. System.Type
         • Can also lead to orphaned monitors due to AppDomain death
       • State publicly accessible from libraries
     • Scalability challenges
       • Granularity
         • Too coarse can lead to decreased throughput
         • Too fine incurs the perf overhead of lots of little locks

  7. Atomicity Challenges: Asynchronous exceptions
     • Goal: those who lock never see inconsistencies
       • How? Patch up broken invariants upon failure
     • Rock-solid atomicity is actually quite hard
       • Async exceptions can happen nearly anywhere, e.g. sophisticated hosts inject ThreadAborts
       • Aborts are suspended during CERs, catch/finally, .cctors, native code
     • Don’t panic!
       • If you’re aborted under a lock, you can assume the AD is being unloaded
       • Finally blocks and finalizers are often good enough
       • If you’ve mutated process- or system-wide state you should use SafeHandle

  8. Running Code in Parallel
     • How to run code in parallel (on the CLR)?
     • You have many options, in order of preference:
       • Parallel worker APIs
       • Async APIs specific to some types
       • ThreadPool.QueueUserWorkItem(…), or BackgroundWorker (for UIs)
       • Explicit threading (e.g. Thread..ctor, .Start)
     • TP.QUWI and (usually) Async APIs follow the APM
       • Rendezvous occurs with one of:
         • Callback delegate
         • IAsyncResult.IsCompleted
         • IAsyncResult.WaitHandle, or
         • Just EndXxx (automatically blocks if !IsCompleted)
       • EndXxx always necessary to release resources
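
The preference ordering above is CLR-specific, but the queue-then-rendezvous shape is general. As a hedged illustration (sketched in Java rather than the slides' C#, so it is self-contained and runnable; the names `PoolDemo`/`runOnPool` are mine), a thread-pool submission plus a blocking `Future.get` plays roughly the role of QueueUserWorkItem plus EndXxx:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolDemo {
    // Hand work to a shared pool (analogous to ThreadPool.QueueUserWorkItem)
    // and rendezvous via Future.get (analogous to a blocking EndXxx call).
    static int runOnPool() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Integer> result = pool.submit(() -> 21 * 2); // queue the work item
            return result.get();  // blocks until the work completes, like EndXxx
        } finally {
            pool.shutdown();      // always release pool resources when done
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runOnPool()); // prints 42
    }
}
```

As with EndXxx on the CLR, the rendezvous step is not optional here: skipping `shutdown()` leaks pool threads just as skipping EndXxx leaks APM resources.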

  9. Agenda • Shared Memory • Lock Implementation Trivia • Memory Models • GUIs and COM • Wrap-Up

  10. Writing Your Own Lock (?)
     • Want a spin-lock?
     • Easy enough to implement yourself…
     • …or perhaps not

     class SpinLock {
         private int state;

         public void Enter() {
             while (Interlocked.CompareExchange(
                 ref state, 1, 0) != 0) ;
         }

         public void Exit() {
             state = 0;
         }
     }

  11. Hand Written Spin Lock

  12. Writing Your Own Lock (?!): Not so fast!
     • Summary: 99% of the audience shouldn’t need to!
       • Extremely easy to get wrong; we write them for you
     • Original attempt robs forward progress
       • Can hold the bus
       • Starves other hardware threads
       • And besides… it’s silly to spin on a single proc
     • CLR doesn’t know it’s a lock unless you tell it
       • Begin/EndCriticalRegion tells the host that aborting a single thread could lead to instability (e.g. deadlocks)
     • And we didn’t even discuss reentrancy and affinity
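
The "holds the bus" and "starves other hardware threads" objections above are commonly addressed with a test-and-test-and-set loop plus a spin hint. This is a hedged sketch only (in Java, for a self-contained runnable example; the class name mirrors the slide's but is not the slide's code), not a production lock:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    public void enter() {
        while (true) {
            // Test-and-test-and-set: spin on a plain read first, so waiters
            // hit their local cache line instead of hammering the bus with CAS.
            while (held.get()) {
                Thread.onSpinWait();      // hint to the CPU that this is a spin loop
            }
            if (held.compareAndSet(false, true)) {
                return;                   // a successful CAS acquires the lock
            }
        }
    }

    public void exit() {
        held.set(false);  // volatile store publishes writes made under the lock
    }
}
```

Even this version still spins forever, which (as the slide warns) is pointless on a single processor; a real lock would fall back to yielding or blocking after a bounded spin, and would also need to address reentrancy and host notification.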

  13. Agenda • Shared Memory • Lock Implementation Trivia • Memory Models • GUIs and COM • Wrap-Up

  14. Constructor Race Condition
     • Can inst refer to an uninitialized Foo?

     class Foo {
         static Foo inst;
         string state;
         bool initialized;

         private Foo() {
             state = "I'm happy";
             initialized = true;
         }

         public static Foo Instance {
             get {
                 if (inst == null)
                     lock (typeof(Foo)) {
                         if (inst == null)
                             inst = new Foo();
                     }
                 return inst;
             }
         }
     }

     // Two threads concurrently:
     Foo i = Foo.Instance;

     Might look something like this (pseudo-jitted code):

     Foo tmp = GCAlloc(typeof(Foo));
     tmp->state = "I'm happy";
     tmp->initialized = 1;
     inst = tmp;

     But what if it turned into this?

     inst = GCAlloc(typeof(Foo));
     inst->initialized = 1;
     inst->state = "I'm happy";

     • Thread 2 could see a non-null inst, yet:
       • initialized == 0, or
       • initialized == 1, but state == null
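
The standard repair for this race is a volatile publication field, which forbids the reordered store sequence shown above. A minimal sketch in Java, where double-checked locking is only correct with volatile (field and method names are mine, not the slide's):

```java
public class Foo {
    // volatile is what makes double-checked locking correct here: the write
    // to inst cannot be reordered before the constructor's field writes.
    private static volatile Foo inst;

    private final String state;

    private Foo() {
        state = "I'm happy";
    }

    public static Foo instance() {
        Foo local = inst;                 // single volatile read on the fast path
        if (local == null) {
            synchronized (Foo.class) {    // plays the role of the slide's lock
                local = inst;
                if (local == null) {
                    inst = local = new Foo();
                }
            }
        }
        return local;
    }

    public String state() { return state; }
}
```

Once `instance()` returns a non-null reference, any thread observing it is also guaranteed to observe the constructor's writes, which is exactly the guarantee the pseudo-jitted reordering violated.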

  15. Read/Write Reordering
     • Compilers (JIT) and processors want to execute reads and/or writes out of order, e.g.

     // source code
     static int x, y;
     void Foo() {
         y = 1;
         x = 1;
         y = 2;
         // …
     }

     // can become (swap and delete one)
     static int x, y;
     void Foo() {
         y = 2;
         x = 1;
         // …
     }

     • We say the write of x passed the 2nd write to y
     • Code motion: JIT optimizations
     • Out-of-order execution: CPU pipelining, predictive execution
     • Cache coherency: hardware threads use several memories
       • Writes by one processor can move later (in time) due to buffering
       • Reads can occur earlier due to locality and cache lines
     • Not legal to impact sequential execution, but can be visible to concurrent programs
     • Memory models define which observed orderings are permissible

  16. Memory Models: Controlling reordering
     • Load acquire: won’t move after future instrs
     • Store release: earlier instrs won’t move after it
     • Fence: no instructions will “pass” the fence in either direction
       • A.k.a. barrier
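
Acquire and release are directly expressible in some APIs. A hedged sketch of release/acquire message passing (in Java, whose `AtomicInteger.setRelease`/`getAcquire` map onto the st.rel/ld.acq vocabulary above; the class and method names are mine):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AcqRel {
    static int payload;                               // plain, non-volatile data
    static final AtomicInteger ready = new AtomicInteger(0);

    static int handoff() throws InterruptedException {
        Thread producer = new Thread(() -> {
            payload = 42;         // plain write
            ready.setRelease(1);  // store-release: the payload write cannot move after this
        });
        producer.start();
        while (ready.getAcquire() == 0) {  // load-acquire: the payload read below
            Thread.onSpinWait();           // cannot move before the loop exits
        }
        producer.join();
        return payload;                    // guaranteed to see 42
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(handoff());
    }
}
```

The pairing is the point: the release store publishes everything before it, and the acquire load that observes it imports those writes; neither operation alone is a full fence.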

  17. Memory Models: On the CLR
     • Strongest model is sequential/program order
       • Seldom implemented (x86); we are a bit weaker
       • Reordering is for performance; limiting that limits the processor’s ability to effectively execute code
     • Lock acquisitions and releases are fences
       • Makes code using locking simple[r]
       • Lock-free is costly and difficult – just avoid it!
       • Notice this didn’t solve the ctor race, however
     • ECMA specification
       • Volatile loads have acquire semantics
       • Volatile stores have release semantics
     • v2.0 implements a much stronger model
       • All stores have release semantics
     • Summary: 2.0 prevents the ctor race we saw earlier
       • But on strong-model machines, it won’t cause problems on 1.x

  18. Memory Models: Why volatile and Thread.MemoryBarrier()?
     • Reorderings are still possible
       • Non-acquire loads can still pass each other
       • A st.rel followed by a ld.acq can still swap
     • Volatile can fix #1; Thread.MemoryBarrier() can fix #2
     • Q: For example, can a > b?
     • A: Yes, unless a (or b) is marked volatile

     static int a;
     static int b;

     // Thread #1
     while (true) {
         int x = a;
         int y = b;
         Debug.Assert(y >= x);
     }

     // Thread #2
     while (true) {
         b++;
         a++;
     }
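
The same a/b experiment can be run with the fix applied. A hedged Java sketch (names mine; Java's volatile gives the loads the acquire semantics the slide calls for, so the assertion can no longer fire):

```java
public class OrderingDemo {
    // With plain fields, the reader's two loads could be reordered and
    // a > b observed; marking both volatile forbids that reordering.
    static volatile int a;
    static volatile int b;

    static void check() throws InterruptedException {
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 100000; i++) { b++; a++; }  // b always runs ahead of a
        });
        writer.start();
        for (int i = 0; i < 100000; i++) {
            int x = a;   // read a first...
            int y = b;   // ...then b; since b++ precedes a++, y >= x must hold
            if (y < x) throw new AssertionError("observed a > b: " + x + " > " + y);
        }
        writer.join();
    }

    public static void main(String[] args) throws InterruptedException {
        check();
        System.out.println("y >= x held on every observation");
    }
}
```

With volatile, the reader's load of `a` cannot pass its later load of `b`, and `b` is always incremented first, so every observed pair satisfies y >= x; remove volatile and the assertion becomes a genuine (if rare) Heisenbug.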

  19. Agenda • Shared Memory • Lock Implementation Trivia • Memory Models • GUIs and COM • Wrap-Up

  20. GUIs and Messages

  21. COM Threading Model: Making concurrency simple

  22. GUIs, COM and Messaging: Pumping and reentrancy
     • COM uses a GUI thread for STAs
       • Each STA thread has a queue and a pump
     • A method call on an STA COM proxy pUnk turns into a PostMessage, then pumps waiting for a reply
       • The STA must pump to dispatch the call, then PostMessage the “return” to the caller
     • Dispatched calls are stacked onto the STA’s existing call stack
       • Called reentrancy
       • Thread-wide state can be implicitly shared
     • If the pump isn’t running, the queue isn’t draining… deadlocks, “(Not Responding)”, etc.

  23. GUIs and Messaging: CLR interoperability
     • Good news! The CLR does a lot for you
       • Cross-apartment transitions and marshaling
       • Pumping the STA whenever you do a managed block
       • Places your threads into an MTA by default
     • You can override the default apartment choice
       • STAThreadAttribute or MTAThreadAttribute applied to the entry-point
       • Thread.SetApartmentState for explicit threads
     • But Visual Studio sticks an STAThreadAttribute on many projects
       • Some project types require STA, e.g. GUIs (Windows Forms and Windows Presentation Foundation) require it
       • Using the wrong type can cause COM interop headaches

  24. Fun Pumping and Reentrancy Parlor Trick

  25. Finalization: Concurrency ‘gotchas’
     • The finalizer accesses your components from a different, MTA thread
       • CLR objects assuming thread affinity could be surprised
     • STA components require the finalizer to transition
       • If the STA’s thread isn’t pumping, the finalizer isn’t finalizing
       • On a server with lots of STA components, not a GoodThing™
     • Resurrection dangers
       • Somebody in a finalization queue can republish your pointer to the world
       • And then you can be finalized and called concurrently
       • Can lead to subtle, difficult-to-find bugs
       • Moral: don’t do it (1) to yourself and (2) to others

  26. Agenda • Shared Memory • Lock Implementation Trivia • Memory Models • GUIs and COM • Wrap-Up

  27. Summary
     [Chart (Jan's graph): log transistors/die vs. log CPU clock frequency, 1975 to 2015. Transistor counts keep growing at >30%/yr (10,000 in 1975, 100 M in 2003, toward ~5 B), while clock frequencies flatten at <10%/yr (1 MHz in 1975, 3 GHz in 2003, staying below 10 GHz)]
     • The platform strives to make concurrency tractable
       • We continue to make it easier over time
     • Locking makes it easier; attempting to be clever comes with a tax
     • Hardcore architecture and implementation details are fun, provide insight and appreciation, but are not necessary to do your day job
     • The future is a fun place to be: remember Jan’s graph?
     • TRY IT OUT!!! (and don’t block your UI thread)

  28. Other Talks
     • On the DVD (if you missed it)
       • FUN302: Programming with Concurrency (Part 1)
       • DAT301: High Performance Cluster Computing
     • Concurrency futures
       • FUN323: Fri 8:30 a.m., MSR: Future Possibilities in Concurrency
       • TLN309: Fri 10:30 a.m., C++: Future Directions in Language Innovation
     • Other related talks
       • FUN308: Wed 1:45 p.m., Developing Rock Solid Reliable Apps
       • FUN412: Thu 10:00 a.m., Five Things Every Win32 Developer Should Know
       • TLN306: Wed 1:45 p.m., The .NET Language Integrated Query Framework

  29. Q&A / Resources
     • My blog: http://www.bluebytesoftware.com/blog/
     • Chris Brumme’s blog: http://blogs.msdn.com/cbrumme/
     • Herb Sutter’s blog: http://www.pluralsight.com/blogs/hsutter/
     • Other CLR team blogs: http://blogs.msdn.com/shawnfa/archive/2005/02/08/369384.aspx
     • .NET Framework 2.0, Joe Duffy, ETA Q4 2005, ISBN 0764571354
     • Patterns for Parallel Programming, Timothy G. Mattson et al., ISBN 0321228111
     • Concurrent Programming in Java™, Doug Lea, ISBN 0201310090

  30. © 2005 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.
