
Programming w/ Concurrency #2: Multithreaded Programming with Shared Memory


Presentation Transcript


  1. Programming w/ Concurrency #2: Multithreaded Programming with Shared Memory
     Joe Duffy, FUN405
     Program Manager, CLR Team, Microsoft Corporation

  2. Agenda • Shared Memory • Lock Implementation Trivia • Memory Models • GUIs and COM • Wrap-Up

  3. Shared Memory Basics
     • Concurrent workers can share memory to communicate
       • Objects in the heap
       • Raw memory in the address space
       • System-wide kernel objects and memory-mapped I/O
     • With sharing comes responsibility
       • Dealing with broken invariants, avoiding corruption

  4. Shared Memory Basics: Concept recap
     • Invariants are assumed conditions in your code
     • When invariants are broken, locking can ensure:
       • Serialization: things happen one after the other
       • Atomicity: either it happens fully, or the effects are not visible at all

     object myLock = new object();
     void Foo() {
         lock (myLock) {
             // munge the data structure (not happy)
             // but leave it in a happy state
         }
     }

     object myLock = new object();
     void Foo() {
         lock (myLock) {
             try {
                 // munge the data structure
                 // and leave it in a happy state (unless an exception occurs)
             } catch {
                 if (NotConsistent) {
                     // erase any partial munges
                 }
             }
         }
     }

  5. Sharing Memory: Shared state in our code
     • WinFX code modifies statics and internal CLR state in a thread-safe manner
       • Designed to tolerate concurrency
       • Avoids corrupting shared state
       • Suggests to hosts when to rip the AppDomain instead of aborting a single thread
     • Instances are not thread-safe
       • We don’t know it’s shared if you share it
       • You are responsible for ensuring thread-safety
       • There are very few exceptions, e.g. Thread
       • Recommended guidance for reusable libraries

  6. Locking Challenges: Heisenbugs
     [Diagrams: a four-thread deadlock cycle, a low/high-priority inversion, and a lock convoy]
     • General challenges
       • Deadlocks
       • Priority inversion
       • Lock convoys
     • Accidental deadlocks can be caused by locking on:
       • Cross-AppDomain-bled objects, e.g. System.Type
         • Can also lead to orphaned monitors due to AppDomain death
       • State publicly accessible from libraries
     • Scalability challenges
       • Granularity
         • Too coarse can lead to decreased throughput
         • Too fine incurs the perf overhead of lots of little locks

  7. Atomicity Challenges: Asynchronous exceptions
     • Goal: those who lock never see inconsistencies
       • How? Patch up broken invariants upon failure
     • Rock-solid atomicity is actually quite hard
       • Async exceptions can happen nearly anywhere, e.g. sophisticated hosts inject ThreadAborts
       • Aborts are suspended during CERs, catch/finally, .cctors, native code
     • Don’t panic!
       • If you’re aborted under a lock, you can assume the AD is being unloaded
       • Finally blocks and finalizers are often good enough
       • If you’ve mutated process- or system-wide state you should use SafeHandle

  8. Running Code in Parallel
     • How to run code in parallel (on the CLR)?
     • You have many options, in order of preference:
       • Parallel worker APIs
       • Async APIs specific to some types
       • ThreadPool.QueueUserWorkItem(…), or BackgroundWorker (for UIs)
       • Explicit threading (e.g. Thread..ctor, .Start)
     • TP.QUWI and (usually) Async APIs follow the APM
       • Rendezvous occurs with one of:
         • Callback delegate
         • IAsyncResult.IsCompleted
         • IAsyncResult.WaitHandle, or
         • Just EndXxx (automatically blocks if !IsCompleted)
       • EndXxx always necessary to release resources
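
The preference ordering above is CLR-specific, but the queue-then-rendezvous shape is general. As a hedged illustration (sketched in Java rather than the slides' C#, so it is self-contained and runnable; the names `PoolDemo`/`runOnPool` are mine), a thread-pool submission plus a blocking `Future.get` plays roughly the role of QueueUserWorkItem plus EndXxx:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolDemo {
    // Hand work to a shared pool (analogous to ThreadPool.QueueUserWorkItem)
    // and rendezvous via Future.get (analogous to a blocking EndXxx call).
    static int runOnPool() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Integer> result = pool.submit(() -> 21 * 2); // queue the work item
            return result.get();  // blocks until the work completes, like EndXxx
        } finally {
            pool.shutdown();      // always release pool resources when done
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runOnPool()); // prints 42
    }
}
```

As with EndXxx on the CLR, the rendezvous step is not optional here: skipping `shutdown()` leaks pool threads just as skipping EndXxx leaks APM resources.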

  9. Agenda • Shared Memory • Lock Implementation Trivia • Memory Models • GUIs and COM • Wrap-Up

  10. Writing Your Own Lock (?)
     • Want a spin-lock?
     • Easy enough to implement yourself…
     • …or perhaps not

     class SpinLock {
         private int state;

         public void Enter() {
             while (Interlocked.CompareExchange(
                 ref state, 1, 0) != 0) ;
         }

         public void Exit() {
             state = 0;
         }
     }

  11. Hand Written Spin Lock

  12. Writing Your Own Lock (?!): Not so fast!
     • Summary: 99% of the audience shouldn’t need to!
       • Extremely easy to get wrong; we write them for you
     • Original attempt robs forward progress
       • Can hold the bus
       • Starves other hardware threads
       • And besides… it’s silly to spin on a single proc
     • CLR doesn’t know it’s a lock unless you tell it
       • Begin/EndCriticalRegion tells the host that aborting a single thread could lead to instability (e.g. deadlocks)
     • And we didn’t even discuss reentrancy and affinity
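
The "holds the bus" and "starves other hardware threads" objections above are commonly addressed with a test-and-test-and-set loop plus a spin hint. This is a hedged sketch only (in Java, for a self-contained runnable example; the class name mirrors the slide's but is not the slide's code), not a production lock:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    public void enter() {
        while (true) {
            // Test-and-test-and-set: spin on a plain read first, so waiters
            // hit their local cache line instead of hammering the bus with CAS.
            while (held.get()) {
                Thread.onSpinWait();      // hint to the CPU that this is a spin loop
            }
            if (held.compareAndSet(false, true)) {
                return;                   // a successful CAS acquires the lock
            }
        }
    }

    public void exit() {
        held.set(false);  // volatile store publishes writes made under the lock
    }
}
```

Even this version still spins forever, which (as the slide warns) is pointless on a single processor; a real lock would fall back to yielding or blocking after a bounded spin, and would also need to address reentrancy and host notification.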

  13. Agenda • Shared Memory • Lock Implementation Trivia • Memory Models • GUIs and COM • Wrap-Up

  14. Constructor Race Condition
     • Can inst refer to an uninitialized Foo?

     class Foo {
         static Foo inst;
         string state;
         bool initialized;

         private Foo() {
             state = "I'm happy";
             initialized = true;
         }

         public static Foo Instance {
             get {
                 if (inst == null)
                     lock (typeof(Foo)) {
                         if (inst == null)
                             inst = new Foo();
                     }
                 return inst;
             }
         }
     }

     // Two threads concurrently:
     Foo i = Foo.Instance;

     Might look something like this (pseudo-jitted code):

     Foo tmp = GCAlloc(typeof(Foo));
     tmp->state = "I'm happy";
     tmp->initialized = 1;
     inst = tmp;

     But what if it turned into this?

     inst = GCAlloc(typeof(Foo));
     inst->initialized = 1;
     inst->state = "I'm happy";

     • Thread 2 could see a non-null inst, yet:
       • initialized == 0, or
       • initialized == 1, but state == null
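
The standard repair for this race is a volatile publication field, which forbids the reordered store sequence shown above. A minimal sketch in Java, where double-checked locking is only correct with volatile (field and method names are mine, not the slide's):

```java
public class Foo {
    // volatile is what makes double-checked locking correct here: the write
    // to inst cannot be reordered before the constructor's field writes.
    private static volatile Foo inst;

    private final String state;

    private Foo() {
        state = "I'm happy";
    }

    public static Foo instance() {
        Foo local = inst;                 // single volatile read on the fast path
        if (local == null) {
            synchronized (Foo.class) {    // plays the role of the slide's lock
                local = inst;
                if (local == null) {
                    inst = local = new Foo();
                }
            }
        }
        return local;
    }

    public String state() { return state; }
}
```

Once `instance()` returns a non-null reference, any thread observing it is also guaranteed to observe the constructor's writes, which is exactly the guarantee the pseudo-jitted reordering violated.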

  15. Read/Write Reordering
     • Compilers (JIT) and processors want to execute reads and/or writes out of order, e.g.

     // source code
     static int x, y;
     void Foo() {
         y = 1;
         x = 1;
         y = 2;
         // …
     }

     // can become (swap and delete one)
     static int x, y;
     void Foo() {
         y = 2;
         x = 1;
         // …
     }

     • We say the write of x passed the 2nd write to y
     • Code motion: JIT optimizations
     • Out-of-order execution: CPU pipelining, predictive execution
     • Cache coherency: hardware threads use several memories
       • Writes by one processor can move later (in time) due to buffering
       • Reads can occur earlier due to locality and cache lines
     • Not legal to impact sequential execution, but can be visible to concurrent programs
     • Memory models define which observed orderings are permissible

  16. Memory Models: Controlling reordering
     • Load acquire: won’t move after future instrs
     • Store release: earlier instrs won’t move after it
     • Fence: no instructions will “pass” the fence in either direction
       • A.k.a. barrier
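
Acquire and release are directly expressible in some APIs. A hedged sketch of release/acquire message passing (in Java, whose `AtomicInteger.setRelease`/`getAcquire` map onto the st.rel/ld.acq vocabulary above; the class and method names are mine):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AcqRel {
    static int payload;                               // plain, non-volatile data
    static final AtomicInteger ready = new AtomicInteger(0);

    static int handoff() throws InterruptedException {
        Thread producer = new Thread(() -> {
            payload = 42;         // plain write
            ready.setRelease(1);  // store-release: the payload write cannot move after this
        });
        producer.start();
        while (ready.getAcquire() == 0) {  // load-acquire: the payload read below
            Thread.onSpinWait();           // cannot move before the loop exits
        }
        producer.join();
        return payload;                    // guaranteed to see 42
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(handoff());
    }
}
```

The pairing is the point: the release store publishes everything before it, and the acquire load that observes it imports those writes; neither operation alone is a full fence.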

  17. Memory Models: On the CLR
     • Strongest model is sequential/program order
       • Seldom implemented (x86); we are a bit weaker
       • Reordering is for performance; limiting that limits the processor’s ability to effectively execute code
     • Lock acquisitions and releases are fences
       • Makes code using locking simple[r]
       • Lock-free is costly and difficult – just avoid it!
       • Notice this didn’t solve the ctor race, however
     • ECMA specification
       • Volatile loads have acquire semantics
       • Volatile stores have release semantics
     • v2.0 implements a much stronger model
       • All stores have release semantics
     • Summary: 2.0 prevents the ctor race we saw earlier
       • But on strong-model machines, it won’t cause problems on 1.x

  18. Memory Models: Why volatile and Thread.MemoryBarrier()?
     • Reorderings are still possible
       • Non-acquire loads can still pass each other
       • A st.rel followed by a ld.acq can still swap
     • Volatile can fix #1; Thread.MemoryBarrier() can fix #2
     • Q: For example, can a > b?
     • A: Yes, unless a (or b) is marked volatile

     static int a;
     static int b;

     // Thread #1
     while (true) {
         int x = a;
         int y = b;
         Debug.Assert(y >= x);
     }

     // Thread #2
     while (true) {
         b++;
         a++;
     }
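
The same a/b experiment can be run with the fix applied. A hedged Java sketch (names mine; Java's volatile gives the loads the acquire semantics the slide calls for, so the assertion can no longer fire):

```java
public class OrderingDemo {
    // With plain fields, the reader's two loads could be reordered and
    // a > b observed; marking both volatile forbids that reordering.
    static volatile int a;
    static volatile int b;

    static void check() throws InterruptedException {
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 100000; i++) { b++; a++; }  // b always runs ahead of a
        });
        writer.start();
        for (int i = 0; i < 100000; i++) {
            int x = a;   // read a first...
            int y = b;   // ...then b; since b++ precedes a++, y >= x must hold
            if (y < x) throw new AssertionError("observed a > b: " + x + " > " + y);
        }
        writer.join();
    }

    public static void main(String[] args) throws InterruptedException {
        check();
        System.out.println("y >= x held on every observation");
    }
}
```

With volatile, the reader's load of `a` cannot pass its later load of `b`, and `b` is always incremented first, so every observed pair satisfies y >= x; remove volatile and the assertion becomes a genuine (if rare) Heisenbug.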

  19. Agenda • Shared Memory • Lock Implementation Trivia • Memory Models • GUIs and COM • Wrap-Up

  20. GUIs and Messages

  21. COM Threading Model: Making concurrency simple

  22. GUIs, COM and Messaging: Pumping and reentrancy
     • COM uses a GUI thread for STAs
       • Each STA thread has a queue and a pump
     • A method call on an STA COM proxy pUnk turns into a PostMessage, then pumps waiting for a reply
       • The STA must pump to dispatch the call, then PostMessage the “return” to the caller
     • Dispatched calls are stacked onto the STA’s existing call stack
       • Called reentrancy
       • Thread-wide state can be implicitly shared
     • If the pump isn’t running, the queue isn’t draining… deadlocks, “(Not Responding)”, etc.

  23. GUIs and Messaging: CLR interoperability
     • Good news! The CLR does a lot for you
       • Cross-apartment transitions and marshaling
       • Pumping the STA whenever you do a managed block
       • Places your threads into an MTA by default
     • You can override the default apartment choice
       • STAThreadAttribute or MTAThreadAttribute applied to the entry-point
       • Thread.SetApartmentState for explicit threads
     • But Visual Studio sticks an STAThreadAttribute on many projects
       • Some project types require STA, e.g. GUIs (Windows Forms and Windows Presentation Foundation) require it
       • Using the wrong type can cause COM interop headaches

  24. Fun Pumping and Reentrancy Parlor Trick

  25. Finalization: Concurrency ‘gotchas’
     • The finalizer accesses your components from a different, MTA thread
       • CLR objects assuming thread affinity could be surprised
     • STA components require the finalizer to transition
       • If the STA’s thread isn’t pumping, the finalizer isn’t finalizing
       • On a server with lots of STA components, not a GoodThing™
     • Resurrection dangers
       • Somebody in a finalization queue can republish your pointer to the world
       • And then you can be finalized and called concurrently
       • Can lead to subtle, difficult-to-find bugs
       • Moral: don’t do it (1) to yourself and (2) to others

  26. Agenda • Shared Memory • Lock Implementation Trivia • Memory Models • GUIs and COM • Wrap-Up

  27. Summary
     [Chart (Jan's graph): log transistors/die vs. log CPU clock frequency, 1975 to 2015. Transistor counts keep growing at >30%/yr (10,000 in 1975, 100 M in 2003, toward ~5 B), while clock frequencies flatten at <10%/yr (1 MHz in 1975, 3 GHz in 2003, staying below 10 GHz)]
     • The platform strives to make concurrency tractable
       • We continue to make it easier over time
     • Locking makes it easier; attempting to be clever comes with a tax
     • Hardcore architecture and implementation details are fun, provide insight and appreciation, but are not necessary to do your day job
     • The future is a fun place to be: remember Jan’s graph?
     • TRY IT OUT!!! (and don’t block your UI thread)

  28. Other Talks
     • On the DVD (if you missed it)
       • FUN302: Programming with Concurrency (Part 1)
       • DAT301: High Performance Cluster Computing
     • Concurrency futures
       • FUN323: Fri 8:30 a.m., MSR: Future Possibilities in Concurrency
       • TLN309: Fri 10:30 a.m., C++: Future Directions in Language Innovation
     • Other related talks
       • FUN308: Wed 1:45 p.m., Developing Rock Solid Reliable Apps
       • FUN412: Thu 10:00 a.m., Five Things Every Win32 Developer Should Know
       • TLN306: Wed 1:45 p.m., The .NET Language Integrated Query Framework

  29. Q&A / Resources
     • My blog: http://www.bluebytesoftware.com/blog/
     • Chris Brumme’s blog: http://blogs.msdn.com/cbrumme/
     • Herb Sutter’s blog: http://www.pluralsight.com/blogs/hsutter/
     • Other CLR team blogs: http://blogs.msdn.com/shawnfa/archive/2005/02/08/369384.aspx
     • .NET Framework 2.0, Joe Duffy, ETA Q4 2005, ISBN 0764571354
     • Patterns for Parallel Programming, Timothy G. Mattson et al., ISBN 0321228111
     • Concurrent Programming in Java™, Doug Lea, ISBN 0201310090

  30. © 2005 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.
