
Shimin Chen LBA Reading Group Presentation

Colorama: Architectural Support for Data-Centric Synchronization Luis Ceze, Pablo Montesinos, Christoph von Praun, and Josep Torrellas, HPCA 2007. Shimin Chen LBA Reading Group Presentation. Motivation. Synchronization is a challenging step in parallel programming


Presentation Transcript


  1. Colorama: Architectural Support for Data-Centric Synchronization. Luis Ceze, Pablo Montesinos, Christoph von Praun, and Josep Torrellas, HPCA 2007. Shimin Chen, LBA Reading Group Presentation

  2. Motivation • Synchronization is a challenging step in parallel programming • Transactional Memory helps but is still complicated • Programmers have to reason non-locally • Code-centric approach • Data-Centric Synchronization (DCS) is desirable • Associate synchronization constraints with data structures • Which data items should be in the same critical section • System automatically inserts sync operations into code • Programmers reason locally

  3. What’s New? • Existing DCS proposals are SW-only (S-DCS) • Cannot handle C/C++ pointer aliasing • Thus unrealistic for C/C++ programs • New proposal: hardware DCS (H-DCS) • Colorama • HW primitives to start and exit critical sections • Independent of the underlying sync mechanisms

  4. Outline • Introduction • Data-Centric Synchronization (DCS) • Architectures of Colorama • Programming with Colorama • Evaluation • Conclusion

  5. Data-Centric Synchronization (DCS) • Data consistency domain • Two threads cannot access the same domain at the same time • For example: X and Y are in the same domain • If a thread is accessing X, no other thread can access X or Y • System needs to automatically infer entry and exit points of critical sections: • Entry: access to data in a domain • Exit: define a simple, clear exit policy and let programmers write code that conforms to this policy
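To make the domain semantics concrete, here is a minimal single-threaded Python model; it illustrates the rule on this slide and is not any real DCS implementation. The class `ConsistencyDomains` and its `access`/`exit` methods are hypothetical: accessing any item of a domain implicitly enters that domain, and another thread touching any item of the same domain must wait until an explicit exit point.

```python
class ConsistencyDomains:
    """Toy DCS model: data items map to consistency domains; touching any
    item implicitly enters its domain's critical section."""

    def __init__(self, domain_of):
        self.domain_of = dict(domain_of)   # data item -> domain
        self.owner = {}                    # domain -> thread currently inside

    def access(self, thread, item):
        """Return True if `thread` may access `item` now, False if it must wait."""
        domain = self.domain_of.get(item)
        if domain is None:                 # unprotected data: no synchronization
            return True
        holder = self.owner.setdefault(domain, thread)   # implicit entry point
        return holder == thread

    def exit(self, thread, domain):
        """Exit point: leave `domain` if `thread` is inside it."""
        if self.owner.get(domain) == thread:
            del self.owner[domain]


d = ConsistencyDomains({"X": "D1", "Y": "D1", "Z": "D2"})
print(d.access("T1", "X"))   # True: T1 enters domain D1
print(d.access("T2", "Y"))   # False: Y is also in D1, so T2 must wait
print(d.access("T2", "Z"))   # True: D2 is a different domain
d.exit("T1", "D1")
print(d.access("T2", "Y"))   # True: D1 was released
```

The key point the model captures is that no lock is named anywhere in the "program": the only synchronization input is the item-to-domain mapping.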

  6. Software DCS (S-DCS) • Vaziri et al.’s Atomic Sets • Compiler and language extensions to Java • Data consistency domain: atomic set, a subset of the fields of a Java class • Entry point: compiler analysis • Exit policy: insert the exit point • In the same method as the entry point, and • Right before the method returns

  7. Colorama: Hardware DCS • Data consistency domain: color • Data item belongs to a domain: colored • Entry point: detected by HW • Exit policy: driven by compiler • Examples:

  8. Examples Cont’d

  9. Outline • Introduction • Data-Centric Synchronization (DCS) • Architectures of Colorama • Programming with Colorama • Evaluation • Conclusion

  10. Structures Overview • Every colored data item has an entry in the Palette (details next) • Per-thread structures (all 3 have the same number of entries): • Owned Colors Array: colors of the thread's current critical sections • CAB, CRB: used for the exit policy

  11. Palette • [Figure: Palette layout, with parts labeled "SW managed" and "HW"] • Palette based on the Mondrian Memory Protection system (Witchel et al., ASPLOS’02): the white part of the figure • Extended with a color ID: the gray part of the figure

  12. Entry Point • HW monitors each load and store • Check cached Palette for the mem op • Check owned colors array • Trigger a user-level SW handler if accessing a colored region not owned • Handler for entry point: • Add color ID into owned colors array • Start critical section (e.g. begin transaction)
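The entry-point check can be sketched as a software model of what the hardware does on each load or store. This is a hypothetical Python sketch: the Palette is modeled as a sorted list of non-overlapping `[start, end)` ranges tagged with color IDs, the capacity `max_owned=4` is an arbitrary assumption, and `entry_handler` only logs where a real handler would begin a transaction or acquire a lock.

```python
import bisect


class Palette:
    """Hypothetical Palette model: non-overlapping [start, end) address
    ranges, each tagged with a color ID."""

    def __init__(self):
        self.starts = []    # sorted range start addresses
        self.entries = []   # (start, end, color_id), same order as starts

    def color(self, start, end, color_id):
        """Tag [start, end) with color_id; ranges are assumed not to overlap."""
        i = bisect.bisect_left(self.starts, start)
        self.starts.insert(i, start)
        self.entries.insert(i, (start, end, color_id))

    def lookup(self, addr):
        """Return the color ID covering addr, or None if addr is uncolored."""
        i = bisect.bisect_right(self.starts, addr) - 1
        if i >= 0:
            start, end, color_id = self.entries[i]
            if start <= addr < end:
                return color_id
        return None


class ColoramaThread:
    """Per-thread state consulted on every load/store (modeled in software)."""

    def __init__(self, palette, max_owned=4):
        self.palette = palette
        self.owned = []            # owned colors array
        self.max_owned = max_owned
        self.log = []              # critical sections started, in order

    def mem_op(self, addr):
        """The HW check for one load or store."""
        color_id = self.palette.lookup(addr)
        if color_id is not None and color_id not in self.owned:
            self.entry_handler(color_id)   # user-level SW handler

    def entry_handler(self, color_id):
        if len(self.owned) >= self.max_owned:
            raise RuntimeError("owned colors array is full")
        self.owned.append(color_id)
        # Stand-in for starting the critical section, e.g. begin a
        # transaction or acquire the lock associated with color_id.
        self.log.append(("begin", color_id))


palette = Palette()
palette.color(0x1000, 0x1040, 7)   # e.g. a 64-byte struct with color 7
t = ColoramaThread(palette)
t.mem_op(0x1008)   # first access to color 7: handler runs
t.mem_op(0x1010)   # already owned: no handler
t.mem_op(0x3000)   # uncolored: no handler
print(t.log)       # [('begin', 7)]
```

Note that the handler fires only on the first access to a color the thread does not already own, which is why the common case costs just the lookup.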

  13. Exit Policy • Exit a critical section when the thread returns from the subroutine where the critical section was entered

  14. Implementing Exit Policy • Color acquire bitmap register (CAB) and color release bitmap register (CRB) • CAB automatically set by HW at entry points • Compiler generates the following code: • Subroutine prologue: Push CAB; CAB ← 0 • Subroutine epilogue: CRB ← CAB; Pop CAB • Upon write to CRB: HW triggers user-level handler • Handler: remove the released color IDs from the owned colors array, exit the critical sections
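The prologue/epilogue protocol can be traced with a small Python model of one thread. This is a sketch under an assumption the slides only imply: bit i of CAB/CRB refers to slot i of the owned colors array (slide 10 says the three per-thread structures have the same number of entries). The method names are hypothetical, and `entry()` folds the HW check, the CAB update, and the entry handler into one call.

```python
class ExitPolicyModel:
    """Software model of the CAB/CRB exit policy for one thread.

    Assumption: bit i of CAB/CRB refers to slot i of the owned colors array.
    """

    def __init__(self, n_slots=8):
        self.owned = [None] * n_slots   # owned colors array
        self.cab = 0                    # color acquire bitmap register
        self.saved = []                 # CAB values pushed by prologues
        self.events = []                # ("begin"/"end", color_id)

    def entry(self, color_id):
        """Entry-point handler plus the HW update of CAB."""
        if color_id in self.owned:
            return                      # already in this critical section
        slot = self.owned.index(None)   # ValueError if the array is full
        self.owned[slot] = color_id
        self.cab |= 1 << slot           # set automatically by HW
        self.events.append(("begin", color_id))

    def prologue(self):
        self.saved.append(self.cab)     # Push CAB
        self.cab = 0                    # CAB <- 0

    def epilogue(self):
        self.write_crb(self.cab)        # CRB <- CAB (triggers the handler)
        self.cab = self.saved.pop()     # Pop CAB

    def write_crb(self, crb):
        """Exit handler: end every critical section whose bit is set."""
        slot = 0
        while crb:
            if crb & 1:
                self.events.append(("end", self.owned[slot]))
                self.owned[slot] = None
            crb >>= 1
            slot += 1


m = ExitPolicyModel()
m.prologue()      # enter foo()
m.entry(5)        # foo() touches data colored 5
m.prologue()      # foo() calls bar()
m.entry(5)        # still owned: no new critical section
m.entry(9)        # bar() touches data colored 9
m.epilogue()      # bar() returns: only color 9 is released
m.epilogue()      # foo() returns: color 5 is released
print(m.events)   # [('begin', 5), ('begin', 9), ('end', 9), ('end', 5)]
```

Because each prologue zeroes CAB, a routine's epilogue releases only the colors first entered inside that routine, which is exactly the "exit when the entering subroutine returns" policy.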

  15. Handling Pointers as Subroutine Arguments • Perform multiple operations on a structure together • Propose “colorcheck” instruction

  16. Using Locks as Sync Mechanisms • Colorama can also use locks • Two potential problems: • Longer critical sections, thus possibly more contention • Possible deadlock • See evaluation

  17. Outline • Introduction • Data-Centric Synchronization (DCS) • Architectures of Colorama • Programming with Colorama • Evaluation • Conclusion

  18. Correctness • Critical sections of the same color are serialized • Correctly colored programs ⇒ data-race free • Possible programming errors: • Failing to color shared data structures • Using different colors for data that should be protected together

  19. Compatibility Issues • Legacy libraries that do not use Colorama • OK if they explicitly protect library data using locks, etc. • Colorama protects application data outside of the library • Cases that require extensions to Colorama • Worker thread executes an infinite loop that processes incoming requests • Needs to release lock, wait, and acquire lock in the same loop • Colorama extensions: getcolorid etc.

  20. Complete API

  21. Outline • Introduction • Data-Centric Synchronization (DCS) • Architectures of Colorama • Programming with Colorama • Evaluation • Conclusion

  22. Setup • Evaluation is based on analyzing applications by using a Pin-based tool

  23. Is the Exit Policy Suitable? • Matched: lock acquire & release in the same subroutine • Almost all dynamic and 95% of static critical sections are matched • Answer: Yes
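The slides do not say how the Pin-based tool classified critical sections. One plausible approach, sketched here in Python with a hypothetical event format, tags each acquire with the id of the current subroutine invocation and counts a critical section as matched when its release executes in that same invocation.

```python
def count_matched(events):
    """Classify one thread's dynamic critical sections.

    events: sequence of ("call",), ("ret",), ("acq", lock), ("rel", lock).
    A critical section is "matched" when its release executes in the same
    subroutine invocation (frame) as its acquire.
    Returns (matched, total).
    """
    frames = [0]        # stack of frame ids; 0 is the outermost frame
    next_id = 1
    open_at = {}        # lock -> frame id where it was acquired
    matched = total = 0
    for ev in events:
        kind = ev[0]
        if kind == "call":
            frames.append(next_id)
            next_id += 1
        elif kind == "ret":
            frames.pop()
        elif kind == "acq":
            open_at[ev[1]] = frames[-1]
        elif kind == "rel":
            total += 1
            if open_at.pop(ev[1], None) == frames[-1]:
                matched += 1
    return matched, total


trace = [
    ("call",), ("acq", "L"), ("rel", "L"), ("ret",),   # matched
    ("call",), ("acq", "M"), ("ret",), ("rel", "M"),   # released in caller
]
print(count_matched(trace))   # (1, 2)
```

Using a fresh id per invocation (rather than call depth) keeps two sibling calls at the same depth from being mistaken for one subroutine.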

  24. Critical Section Size Increase

  25. How often are multiple independent critical sections in the same subroutine? • Potential deadlocks • 1% dynamic and 4% static • Detailed analysis shows that the resulting lock order is always the same, thus no deadlocks
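The "same lock order, therefore no deadlock" argument is the standard lock-ordering criterion: if the lock-order graph (an edge a → b whenever b is acquired while a is held) is acyclic, those locks cannot deadlock on each other. Below is a minimal Python sketch of that cycle check; the input format (one list per nested acquisition sequence) is an assumption, and this is not claimed to be the paper's analysis.

```python
def has_lock_order_cycle(sequences):
    """sequences: iterable of lists, each the order in which one thread
    acquired locks while still holding the earlier ones.
    Returns True if the lock-order graph has a cycle (deadlock possible)."""
    edges = {}
    for seq in sequences:
        for i, a in enumerate(seq):
            for b in seq[i + 1:]:
                edges.setdefault(a, set()).add(b)   # a held while taking b
                edges.setdefault(b, set())

    WHITE, GRAY, BLACK = 0, 1, 2                    # DFS colors
    state = {node: WHITE for node in edges}

    def visit(node):
        state[node] = GRAY                          # on the current DFS path
        for nxt in edges[node]:
            if state[nxt] == GRAY:                  # back edge: cycle
                return True
            if state[nxt] == WHITE and visit(nxt):
                return True
        state[node] = BLACK
        return False

    return any(state[n] == WHITE and visit(n) for n in edges)


print(has_lock_order_cycle([["A", "B"], ["A", "B", "C"]]))   # False
print(has_lock_order_cycle([["A", "B"], ["B", "A"]]))        # True
```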

  26. Structure Sizes • # palette rows: # of allocated regions + # of static data objects • # of colors: # lock addr • # of Owned Colors Array entries: max # of active locks held by a thread
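The two lock-derived sizes on this slide can be computed from a lock trace: the number of colors as the number of distinct lock addresses, and the Owned Colors Array size as the maximum number of locks any one thread holds at once. A sketch in Python, where the `(thread, op, lock_addr)` trace format is hypothetical and re-acquiring a lock already held is not counted twice:

```python
def structure_sizes(trace):
    """trace: iterable of (thread, op, lock_addr) with op "acq" or "rel".
    Returns (number of colors, owned colors array entries needed)."""
    lock_addrs = set()   # one color per distinct lock address
    held = {}            # thread -> locks it currently holds
    max_held = 0
    for thread, op, addr in trace:
        locks = held.setdefault(thread, set())
        if op == "acq":
            lock_addrs.add(addr)
            locks.add(addr)
            max_held = max(max_held, len(locks))
        elif op == "rel":
            locks.discard(addr)
        else:
            raise ValueError(f"unknown op {op!r}")
    return len(lock_addrs), max_held


trace = [
    ("T1", "acq", 0xA0), ("T1", "acq", 0xB0),   # T1 holds two locks
    ("T2", "acq", 0xC0),
    ("T1", "rel", 0xB0), ("T1", "rel", 0xA0), ("T2", "rel", 0xC0),
]
print(structure_sizes(trace))   # (3, 2)
```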

  27. Colorama Instruction Overheads • Per routine: • Prologue & epilogue: 6 insn/routine • 1 colorcheck insn per pointer argument • Estimate 7 insn/routine • On avg, 1.6 routines per 100 dynamic insns, so ~11% more insns • Entry and exit handlers: critical sections are entered and exited infrequently, so low overhead • Coloring overheads ~ memory allocation calls • # of insns between allocations: firefox, gaim, gftp – 2-4K • Memory allocators can keep pools of colored memory (??)
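The 7 insn/routine estimate corresponds to roughly one pointer argument per routine; under that reading, the arithmetic behind the ~11% figure is:

```latex
$$
\underbrace{6}_{\text{prologue + epilogue}} + \underbrace{1}_{\text{colorcheck}} = 7\ \text{insn/routine},
\qquad
\frac{7\ \text{insn/routine} \times 1.6\ \text{routines}}{100\ \text{insns}} = \frac{11.2}{100} \approx 11\%
$$
```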

  28. Memory Overhead • MMP: Mondrian Memory Protection • Palette adds 1-2.5% more space over app footprint

  29. Conclusions • Colorama: Hardware Data-Centric Synchronization • HW support for entry and exit points • Evaluation suggests: • Exit policy is suitable • Low impact on critical section lengths • Modest additional overhead over MMP • This paper does not even do simulation!

  30. Related Work • monitors
