Thread-specific Heaps for Multi-threaded Programs

Bjarne Steensgaard
Microsoft Research

GC and Threads
Traditional approaches:
• Pseudo-concurrency => no concurrency
• Concurrent GC => synchronization overhead
• Stop and GC => no concurrency during GC
Observations leading to our approach:
• Much data is only used by a single thread
• When collecting data used only by a single thread, other threads can be ignored

GC and Thread-specific Heaps
Thread-specific Heaps
• Contain data accessed only by a single thread
• Can be GC’ed independently of and concurrently with other thread-specific heaps (no pointers from the outside into these heaps)
Shared Heap
• Contains data possibly shared among threads
• GC’ed using one of the traditional approaches

Advantages
• Concurrent collection of thread heaps
• Increased locality of GC
• Reduced GC latency (shorter “stops”)
• Reduced memory overhead for two-space copying components of GC
  • “To”-space only needed for heaps actively being copied, “from” space can be released as copying of each heap is completed
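The memory-overhead point can be made concrete with a little arithmetic. The sketch below is an illustration only: the function name `peak_to_space_reserve`, the batching scheme, and the assumption that the worst-case “to”-space reserve equals the “from”-space size of the heaps currently being copied are all hypothetical, not taken from the Marmot collector.

```python
def peak_to_space_reserve(heap_sizes, at_a_time):
    """Worst-case extra "to"-space needed when heaps are copied in batches.

    heap_sizes: sizes of the individual heaps (any unit, e.g. MB).
    at_a_time:  how many heaps are actively being copied together.
    Assumption: each active heap needs a reserve equal to its own size,
    and the reserve is reused once a batch has finished copying.
    """
    sizes = list(heap_sizes)
    batches = [sizes[i:i + at_a_time] for i in range(0, len(sizes), at_a_time)]
    return max((sum(batch) for batch in batches), default=0)


sizes_mb = [8, 4, 4, 2]                                      # hypothetical heap sizes
all_at_once = peak_to_space_reserve(sizes_mb, len(sizes_mb))  # one big copy
one_by_one = peak_to_space_reserve(sizes_mb, 1)               # heap by heap
```

Under these assumptions, copying one heap at a time needs a reserve only as large as the largest heap, while copying everything at once needs a reserve as large as all heaps combined.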

Enabling Thread-specific Heaps
Memory requests must be specialized
• Shared or thread-specific; choose conservatively
• Must observe the invariant that there are no pointers from shared data to thread-specific data
Root set division
• May distinguish shared and thread-specific roots
• Not necessary (and not implemented), but could reduce GC latency
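The pointer invariant can be sketched as a check on every pointer store. This is a toy model, not Marmot code: `Obj`, `SHARED`, `allowed`, and `store` are hypothetical names, and a real implementation would establish the invariant statically rather than check it at run time.

```python
from dataclasses import dataclass, field
from typing import Optional

SHARED = None  # hypothetical owner tag meaning "lives in the shared heap"


@dataclass(eq=False)
class Obj:
    name: str
    owner: Optional[int]          # owning thread id, or SHARED
    refs: list = field(default_factory=list)


def allowed(src: Obj, dst: Obj) -> bool:
    """A pointer src -> dst is legal if dst is shared, or if both objects
    live in the same thread-specific heap. This rules out pointers from
    shared data (or another thread's heap) into a thread-specific heap."""
    return dst.owner is SHARED or src.owner == dst.owner


def store(src: Obj, dst: Obj) -> None:
    """Record a pointer, rejecting stores that would break the invariant."""
    if not allowed(src, dst):
        raise ValueError(f"illegal pointer {src.name} -> {dst.name}")
    src.refs.append(dst)
```

Choosing conservatively means that any object that might end up as the target of a pointer from shared data is allocated in the shared heap to begin with.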

Compiler Support in Marmot
Escape and Access Analysis
• Interprocedural, flow-insensitive, context-sensitive
• Polymorphic type inference (monomorphic recursion) for a non-standard type system
• Tracks object flow and object access by threads
• Objects “escape” only when potentially accessed by multiple threads (as opposed to being visible to multiple threads)
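The difference between “visible to” and “accessed by” multiple threads can be shown on a toy object graph. This sketch is not the type-inference analysis described above; it only computes the two sets directly from hand-written facts. The function names and the example objects (`log`, `entry`, `cache`) are hypothetical.

```python
def reachable(roots, points_to):
    """All objects reachable from the given roots via points_to edges."""
    seen, work = set(), list(roots)
    while work:
        obj = work.pop()
        if obj not in seen:
            seen.add(obj)
            work.extend(points_to.get(obj, ()))
    return seen


def visible_to_many(thread_roots, points_to):
    """Objects reachable from the roots of more than one thread."""
    counts = {}
    for roots in thread_roots.values():
        for obj in reachable(roots, points_to):
            counts[obj] = counts.get(obj, 0) + 1
    return {obj for obj, n in counts.items() if n > 1}


def accessed_by_many(accesses):
    """Objects that more than one thread may read or write.
    accesses is an iterable of (thread_id, object) facts."""
    by_obj = {}
    for tid, obj in accesses:
        by_obj.setdefault(obj, set()).add(tid)
    return {obj for obj, tids in by_obj.items() if len(tids) > 1}


# Two global variables are visible to both threads, but only thread 1
# ever touches the object stored in "cache".
thread_roots = {1: ["log", "cache"], 2: ["log", "cache"]}
points_to = {"log": ["entry"]}
accesses = [(1, "log"), (2, "log"), (1, "entry"), (2, "entry"), (1, "cache")]

visible = visible_to_many(thread_roots, points_to)
escaping = accessed_by_many(accesses)
```

In this example `cache` is visible to both threads but accessed by only one, so under the access-based definition it does not escape.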

Compiler Support in Marmot
Method specialization
• Duplicate methods as necessary to specialize memory requests according to analysis results (and to call other specialized methods)
• Crucial for achieving a usable separation of objects into shared and thread-specific objects
Very similar to Ruf’s PLDI’00 work
• Analysis and transformation stages are similar to Ruf’s work to remove synchronization ops
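Method specialization can be pictured as follows. This is a hand-written sketch of what the output of such a transformation might look like, under the assumption that each allocating method gets one shared and one thread-specific variant; `Heap`, `thread_heap`, and the `make_*` functions are hypothetical names.

```python
import threading


class Heap:
    """Toy heap that just remembers what was allocated in it."""
    def __init__(self, name):
        self.name = name
        self.objects = []

    def alloc(self, value):
        self.objects.append(value)
        return value


SHARED_HEAP = Heap("shared")
_tls = threading.local()


def thread_heap():
    """Return the calling thread's own heap, creating it on first use."""
    if not hasattr(_tls, "heap"):
        _tls.heap = Heap(f"thread-{threading.get_ident()}")
    return _tls.heap


# Original (unspecialized) program, for comparison:
#     def make_leaf():  return []
#     def make_node(v): return [v, make_leaf()]

def make_leaf_local():
    return thread_heap().alloc([])

def make_leaf_shared():
    return SHARED_HEAP.alloc([])

def make_node_local(v):
    # Calls the thread-specific variant of its callee.
    return thread_heap().alloc([v, make_leaf_local()])

def make_node_shared(v):
    # Calls the shared variant, so the leaf is never thread-specific
    # while being referenced from a shared node.
    return SHARED_HEAP.alloc([v, make_leaf_shared()])
```

A call site whose result the analysis finds to be accessed by one thread only would call `make_node_local`; every other call site would use `make_node_shared`.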

Thread-specific GC in Marmot
Prototype! Proof of concept
• Modified two-generation copying GC
• Each heap has two generations
When a GC is triggered, all heaps are GC’ed
• Reachable objects in the shared heap are copied first by a single thread
• Threads then copy objects from their own heaps (helper threads are available for blocked threads)
• When thread copying is complete, thread is restarted
• Minimal synchronization needed for copying shared objects after initial copy of shared objects
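The collection sequence can be simulated in a few lines. This is one possible reading of the steps above, not the Marmot collector: generations and helper threads are left out, “copying” is modeled as adding an object to a survivor set, and the names `Obj` and `collect` are hypothetical. The lock models the synchronization needed when a thread finds a shared object that the initial shared pass did not reach.

```python
import threading
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Obj:
    name: str
    owner: Optional[int]          # owning thread id, or None for the shared heap
    refs: list = field(default_factory=list)


def collect(shared_roots, thread_roots):
    """Return (shared survivors, {thread id: survivors}) as sets of names."""
    shared_live = set()
    shared_lock = threading.Lock()

    # Phase 1: a single thread copies shared objects reachable from shared roots.
    work = list(shared_roots)
    while work:
        obj = work.pop()
        if obj not in shared_live:
            shared_live.add(obj)
            work.extend(obj.refs)  # by the invariant, these are shared as well

    # Phase 2: each thread copies its own heap, concurrently with the others.
    thread_live = {tid: set() for tid in thread_roots}

    def copy_thread_heap(tid):
        live = thread_live[tid]
        work = list(thread_roots[tid])
        while work:
            obj = work.pop()
            if obj.owner is None:
                # Shared object reached from thread-specific data.
                with shared_lock:
                    if obj in shared_live:
                        continue
                    shared_live.add(obj)
            elif obj in live:
                continue
            else:
                live.add(obj)      # no lock: only this thread touches its heap
            work.extend(obj.refs)

    workers = [threading.Thread(target=copy_thread_heap, args=(tid,))
               for tid in thread_roots]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    names = lambda objs: {o.name for o in objs}
    return names(shared_live), {tid: names(objs) for tid, objs in thread_live.items()}
```

In the prototype each thread is restarted as soon as its own copying is complete; the simulation simply joins all workers before returning.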

Example
[Figure: diagram with a shared root and roots for Thread 1, Thread 2, and Thread 3. Legend: shared object, thread-specific object.]

Performance and Efficacy
Performance
• On par with existing garbage collector for most programs, better for others
Efficacy
• Unknown! Most available programs do not use multi-threading for interesting purposes

Efficacy Examples
• VolanoMark (chat client/server) shares almost all long-lived data among threads
  • Client: allocates ½MB thread, 16MB shared data, copies 4KB thread, 1.2MB shared data
  • Server: allocates 5MB thread, 10MB shared data, copies 5KB thread, 1.7MB shared data
  • GC has improved locality, but otherwise little benefit
• Mtrt benefits greatly, but is a poor benchmark
  • Allocates 27MB thread, ½MB shared data, copies 6.5MB thread, 170MB shared data

Future Work
• Variations on how to collect the heaps
• Heaps for thread groups or groups of threads
• Allowing non-followed pointers from shared objects to thread-specific objects
• Allowing thread-specific objects in shared containers using programmer annotations

Multi-layer Heap Division
[Figure: diagram of Heaps A through F.]
Partially ordered rather than per-thread heaps
Completely ordered heaps
• If very fine-grained, then we are approaching Tofte & Talpin’s “Stack of Regions” approach

Other Heap Divisions
User-defined divisions checked by compiler
• FX with regions
Divisions according to major data structures
• Example: a compiler could use different heap for program representation and analysis results
• Permits customizing the collector to the nature of the data structure
• The IBM folks are experimenting with “memory contexts”

Related Work
• Andy King & Richard Jones, University of Kent
  • Static division into thread-specific heaps
• Pat Caudill & Allen Wirfs-Brock, Instantiations, Inc. (makers of Jove)
  • Dynamic division into thread-specific heaps
  • Use write-barrier and copy-on-GC to deal with objects that are really shared among threads
