The Compressor: Concurrent, Incremental and Parallel Compaction.


Presentation Transcript


  1. The Compressor: Concurrent, Incremental and Parallel Compaction. Haim Kermany and Erez Petrank Technion – Israel Institute of Technology

  2. The Compressor • The first compactor with one heap pass. • Fully compacts all the objects in the heap. • Preserves the order of the objects. • Low space overhead. • A parallel version and a concurrent version.

  3. Garbage collection • Automatic memory management. • The user allocates objects. • The memory manager reclaims objects that “are not needed anymore”; in practice, those unreachable from the roots.

  4. Modern Platforms and Requirements • High performance and low pauses. • SMP and multicore platforms: • Use parallel collectors for highest efficiency. • Use concurrent collectors for short pauses. [Figure: timelines contrasting a parallel stop-the-world collector (high throughput) with a concurrent and parallel collector (short pauses).]

  5. Main Streams in GC • Mark and Sweep: • Trace the objects. • Go over the heap and reclaim the unmarked objects. • Reference Counting: • Keep a count of the pointers to each object. • When an object's counter reaches zero, reclaim the object. • Copying: • Divide the heap into two spaces. • Copy all the live objects from one space to the other.
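
As a concrete illustration of the first of these families, here is a minimal mark-and-sweep sketch in C over a toy heap model (fixed-size objects, a single reference field each, reclamation left abstract). All names are illustrative; nothing here is taken from the slides or from any real collector.

```c
#include <stdbool.h>
#include <stddef.h>

#define HEAP_OBJS 1024
#define NUM_ROOTS 16

typedef struct Obj {
    bool        marked;   /* set during the trace                 */
    struct Obj *field;    /* single reference field, for brevity  */
} Obj;

static Obj  heap[HEAP_OBJS];
static Obj *roots[NUM_ROOTS];

/* Trace: mark every object reachable from o. */
static void mark(Obj *o)
{
    while (o != NULL && !o->marked) {
        o->marked = true;
        o = o->field;              /* follow the (single) outgoing pointer */
    }
}

/* Sweep: one pass over the whole heap, reclaiming unmarked objects. */
void mark_and_sweep(void)
{
    for (size_t r = 0; r < NUM_ROOTS; r++)
        mark(roots[r]);

    for (size_t i = 0; i < HEAP_OBJS; i++) {
        if (!heap[i].marked)
            heap[i].field = NULL;  /* "reclaim": hand the slot to a free list (elided) */
        heap[i].marked = false;    /* reset the mark for the next cycle   */
    }
}
```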

  6. Compaction - Motivation • M&S and RC face the problem of fragmentation. • Fragmentation – unused space between live objects due to repeated allocation and reclaiming. • Allocation efficiency decreases. • May fail to allocate large objects. • Cache behavior may be harmed. • Compaction – move all the live objects to one place in the heap. • Best practice: keep order of objects for best locality.

  7. Traditional Compaction • Go over the heap and write the new location of every object in its header (install a forwarding pointer). • Update all the pointers in the roots and the heap. • Move the objects. • Three heap passes in total. [Figure: stack roots and the heap before and after compaction.]
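
To make the three passes concrete, here is a toy C sketch of a Lisp2-style compactor over the same kind of simplified heap model (an array of equal-sized objects, each with one reference field). The forward field plays the role of the forwarding pointer installed in the header; everything here is illustrative rather than code from the slides.

```c
#include <stdbool.h>
#include <stddef.h>

#define HEAP_OBJS 1024
#define NUM_ROOTS 16

typedef struct Obj {
    bool        live;      /* set by a prior marking phase           */
    struct Obj *forward;   /* forwarding pointer, written in pass 1  */
    struct Obj *field;     /* single reference field, for brevity    */
} Obj;

static Obj  heap[HEAP_OBJS];
static Obj *roots[NUM_ROOTS];

void lisp2_compact(void)
{
    /* Pass 1: walk the heap, assign each live object its new slot and
     * install a forwarding pointer in its header. */
    size_t next = 0;
    for (size_t i = 0; i < HEAP_OBJS; i++)
        if (heap[i].live)
            heap[i].forward = &heap[next++];

    /* Pass 2: walk the roots and the heap, redirecting every pointer
     * to the forwarding address of its target. */
    for (size_t r = 0; r < NUM_ROOTS; r++)
        if (roots[r] != NULL)
            roots[r] = roots[r]->forward;
    for (size_t i = 0; i < HEAP_OBJS; i++)
        if (heap[i].live && heap[i].field != NULL)
            heap[i].field = heap[i].field->forward;

    /* Pass 3: slide each live object down to its new slot (safe in
     * address order, since the destination index never exceeds i). */
    for (size_t i = 0; i < HEAP_OBJS; i++)
        if (heap[i].live && heap[i].forward != &heap[i])
            *heap[i].forward = heap[i];
}
```

Each of the three loops touches every heap object, which is exactly the cost the Compressor avoids by computing new addresses from small side tables instead of forwarding pointers.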

  8. Agenda • Introduction: garbage collection, servers, compaction. • The Compressor: • Basic technique • Obtain compaction with a single heap pass. • The parallel version. • The concurrent version. • Measurements • Related Work. • Conclusion

  9. Compressor - Overview • Compute new locations of objects. • Fix root pointers. • Move objects and fix their pointers. • One heap pass, plus one pass over the (small) mark-bits table.

  10. Compute new locations • Computing new locations and saving this information succinctly: • The heap is partitioned into blocks (typically 512 bytes). • Start by computing and saving, for each block, the total size of the live objects preceding that block (the offset vector). [Figure: the heap as a row of numbered blocks (addresses 1000-1700 shown), with offset vector entries 0, 50, 90, 125, 200, 275, 325, 350 recording the live data preceding each block.]

  11. Computing A New Address • Assume a markbit vector that reflects the heap: the first and last bits of each object are set. • The new location of an object is computed from the markbit and offset vectors. • For example, for object 5, which lies in the 4th block, the new location is 1000 + 125 + 50 = 1175 (heap start + offset-vector entry for the block + live data preceding the object within its block). [Figure: the heap, markbit vector, and offset vector; blocks at addresses 1000-1700, offset vector 0, 50, 90, 125, 200, 275, 325, 350.]
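
This computation can be written down directly. Below is a minimal C sketch of it, under simplifying assumptions that are mine rather than the slides': a 4-byte word, 512-byte blocks, no object spans a block boundary, and every object is at least two words long (so its first and last markbits are distinct). All identifiers are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#define WORD_SIZE   4u                         /* bytes per heap word */
#define BLOCK_SIZE  512u                       /* bytes per block     */

/* Live bytes inside the block that precede old_addr, counted by walking
 * the markbit vector from the block start (an object's first and last
 * word bits are both set). */
static size_t live_bytes_before(const uint8_t *markbits, uintptr_t heap_start,
                                uintptr_t block_start, uintptr_t old_addr)
{
    size_t live_words = 0;
    size_t w      = (block_start - heap_start) / WORD_SIZE;
    size_t target = (old_addr    - heap_start) / WORD_SIZE;

    while (w < target) {
        if (markbits[w / 8] & (1u << (w % 8))) {  /* first word of an object */
            size_t end = w + 1;                   /* scan for its last word  */
            while (!(markbits[end / 8] & (1u << (end % 8))))
                end++;
            live_words += end - w + 1;
            w = end + 1;
        } else {
            w++;                                  /* dead word, skip         */
        }
    }
    return live_words * WORD_SIZE;
}

/* New location = to-space start + offset-vector entry of the object's
 * block + live data preceding the object inside that block.            */
uintptr_t new_address(uintptr_t old_addr, uintptr_t heap_start,
                      uintptr_t to_space_start,
                      const size_t *offset, const uint8_t *markbits)
{
    size_t    block       = (old_addr - heap_start) / BLOCK_SIZE;
    uintptr_t block_start = heap_start + block * BLOCK_SIZE;
    return to_space_start + offset[block]
         + live_bytes_before(markbits, heap_start, block_start, old_addr);
}
```

Plugging in the slide's example (heap and to-space starting at 1000, offset entry 125 for object 5's block, and 50 live bytes preceding it inside that block) gives 1000 + 125 + 50 = 1175.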

  12. Computing the Offset Vector • Computed from the markbit vector. • Does not require a heap pass. [Figure: the same heap, markbit vector, and offset vector as on the previous slides.]
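
Continuing the sketch above (same assumed word size, block size, and markbit encoding, all illustrative), the offset vector can be built in a single pass over the markbit vector alone:

```c
#include <stddef.h>
#include <stdint.h>

#define WORD_SIZE        4u
#define BLOCK_SIZE       512u
#define WORDS_PER_BLOCK  (BLOCK_SIZE / WORD_SIZE)

/* offset[b] = total live bytes in all blocks preceding block b.
 * heap_words is the heap size in words; markbits holds one bit per word. */
void build_offset_vector(const uint8_t *markbits, size_t heap_words,
                         size_t *offset)
{
    size_t live_bytes = 0;   /* live data seen so far                    */
    int    inside     = 0;   /* between an object's first and last bit?  */
    size_t obj_start  = 0;

    for (size_t w = 0; w < heap_words; w++) {
        if (w % WORDS_PER_BLOCK == 0)
            offset[w / WORDS_PER_BLOCK] = live_bytes;
        if (markbits[w / 8] & (1u << (w % 8))) {
            if (!inside) {            /* first word of a live object     */
                inside    = 1;
                obj_start = w;
            } else {                  /* last word of that object        */
                inside = 0;
                live_bytes += (w - obj_start + 1) * WORD_SIZE;
            }
        }
    }
}
```

The markbit vector is a small fraction of the heap (one bit per word), so this pass is cheap compared to a full heap pass.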

  13. Properties • Single heap pass. • Plus one pass over the markbit vector. • Small space overhead. • Does not need a forwarding pointer. • Single threaded. • Stop-the-world. • Next: • A parallel stop-the-world (STW) version. • A concurrent version.

  14. Parallelization – First Try • Had we divided the heap into two spaces… • The application uses only one space. • The Compressor compacts the objects from one space (from-space) to the other (to-space). • Advantage: objects can be moved independently. • Problem: space overhead.

  15. Eliminating Space Overhead • Initially, to-space is not mapped to physical pages. • It is only a virtual address space. • For every (virtual) page in to-space (a parallel loop): • Map the virtual page to physical memory. • Move the corresponding from-space objects and fix the pointers. • Unmap the relevant pages in from-space. [Figure: roots plus the from-space and to-space page grids (pages 0-9), showing pages being mapped, filled, and unmapped.]
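
Here is a sketch of that per-page loop using POSIX memory services (mmap/munmap). The helpers move_objects_into_page and next_evacuated_from_page stand in for work the sketch does not show (copying the objects whose new addresses fall in this page, fixing their pointers, and tracking which from-space pages are fully evacuated); they are placeholders, not real API.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define PAGE_SIZE 4096u

/* Placeholders: a real collector copies the from-space objects whose new
 * addresses fall inside to_page and fixes their pointers, and reports
 * from-space pages that no longer hold any un-copied object.            */
static void  move_objects_into_page(void *to_page) { (void)to_page; }
static void *next_evacuated_from_page(void)        { return NULL; }

void compact_into_to_space(char *to_space, size_t to_space_pages)
{
    /* The parallel version runs this loop on several threads: each
     * to-space page can be processed independently.                     */
    for (size_t i = 0; i < to_space_pages; i++) {
        void *page = to_space + i * PAGE_SIZE;

        /* 1. Back the virtual to-space page with physical memory.       */
        if (mmap(page, PAGE_SIZE, PROT_READ | PROT_WRITE,
                 MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0) == MAP_FAILED) {
            perror("mmap");
            exit(EXIT_FAILURE);
        }

        /* 2. Copy the corresponding from-space objects, fixing pointers. */
        move_objects_into_page(page);

        /* 3. Return fully evacuated from-space pages to the OS, so only
         *    a bounded number of extra physical pages ever exist.        */
        void *done;
        while ((done = next_evacuated_from_page()) != NULL)
            munmap(done, PAGE_SIZE);
    }
}
```

Because each filled to-space page lets from-space pages be released, the physical footprint stays close to one heap's worth of pages even though two virtual spaces exist.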

  16. Properties • All virtues of basic Compressor: • Single heap pass, small space overhead. • Easy parallelization: each to-space page can be handled independently. • Stop-The-World.

  17. What about Concurrency? • Problem: two copies of an object exist while objects are being moved during the application run. • Synchronization problems arise between the compaction and the application. • Solution (Baker-style): the application may only access objects that have already been moved (in to-space).

  18. Concurrent Version • Stop the application. • Fix roots to point to the new locations in to-space. • Read-protect to-space and let the application resume. • When the application touches a protected to-space page, a trap is sprung. • The trap handler moves the relevant objects into the page and unprotects the page. [Figure: roots and the to-space page grid during concurrent compaction.]
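
Below is a sketch of that trap mechanism using POSIX mprotect and a SIGSEGV handler, which is one plausible way to implement it; the real Compressor lives inside the Jikes RVM and may differ in detail. to_space, to_space_bytes, and move_objects_into_page are illustrative placeholders.

```c
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define PAGE_SIZE 4096u

static char  *to_space;          /* start of the read-protected to-space */
static size_t to_space_bytes;

/* Placeholder: copy the from-space objects whose new addresses fall in
 * this to-space page and fix their internal pointers.                   */
static void move_objects_into_page(void *to_page) { (void)to_page; }

/* Trap handler: the application touched a to-space page that has not
 * been filled yet.  Fill it, unprotect it, and let the access retry.    */
static void compressor_trap(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)ctx;
    uintptr_t page = (uintptr_t)info->si_addr & ~(uintptr_t)(PAGE_SIZE - 1);

    if (page < (uintptr_t)to_space ||
        page >= (uintptr_t)to_space + to_space_bytes)
        abort();                              /* a genuine fault, not ours */

    move_objects_into_page((void *)page);
    mprotect((void *)page, PAGE_SIZE, PROT_READ | PROT_WRITE);
}

/* Run during the short stop-the-world phase: fix the roots (not shown),
 * install the handler, read-protect all of to-space, then resume.       */
void start_concurrent_phase(char *space, size_t bytes)
{
    to_space       = space;
    to_space_bytes = bytes;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = compressor_trap;
    sa.sa_flags     = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    mprotect(to_space, to_space_bytes, PROT_NONE);
}
```

In a full design, collector threads would also fill and unprotect pages in the background, so the application traps only on the pages it reaches first.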

  19. Implementation & Measurements • Implemented on the Jikes RVM. • The Compressor was added to a simple modification of the Jikes mark-sweep collector (main modification: allocation via local caches). • The Compressor is invoked once every 10 collections. • Benchmarks: SPECjbb, Dacapo, SPECjvm98. • In the talk we concentrate on SPECjbb. • Compared collectors: no other compaction algorithms are available on the Jikes RVM, so we include some comparison to mark-sweep (MS) and an Appel generational collector (GenMS).

  20. SPECjbb Throughput (CON = Concurrent Compressor, STW = Parallel Compressor)

  21. SPECjbb pause time (ms)

  22. SPECjbb - Allocations per time

  23. Dacapo - Allocations per time

  24. Previous Work on Compaction • Early works: the Two-Finger, Lisp2, and threaded [Jonkers and Morris] algorithms are single-threaded and therefore create large pause times. • [Flood et al. 2001]: the first parallel compaction algorithm, but it makes 3 heap passes and creates several dense areas. • [Abuaiadh et al. 2004]: parallel with two heap passes, but not concurrent. • [Ossia et al. 2004]: executes only the pointer fix-up phase concurrently.

  25. Related Work • Numerous concurrent and parallel garbage collectors exist. • Copying collectors [Cheney 1970] compact objects during the collection, but require a large space overhead and do not retain object order. • Savings in space overhead for copying collectors [Sachindran and Moss 2004]. • [Bacon et al. 2003, Click et al. 2005] propose incremental compaction, but it uses a read barrier and does not keep the order of objects.

  26. Complexity Comparisons

  27. Conclusion The Compressor: • The first compactor that passes over the heap only once. • Plus one pass over the mark-bits vector. • Fully compacts all the objects in the heap. • Preserves the order of the objects. • Low space overhead. • Uses memory services to obtain parallelism. • Uses traps to obtain concurrency.

  28. Questions
