1 / 22

Parallelizing Data Race Detection

Parallelizing Data Race Detection. Benjamin Wester Facebook David Devecsery , Peter Chen, Jason Flinn , Satish Narayanasamy University of Michigan. Data Races. t 1. t 2. TIME. lock( l ) x = 1 unlock( l ) x = 3. lock( l ) x = 2 unlock( l ). Race Detection. Instrumentation +

roseannad
Download Presentation

Parallelizing Data Race Detection

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Parallelizing Data Race Detection Benjamin Wester Facebook David Devecsery, Peter Chen, Jason Flinn, SatishNarayanasamy University of Michigan

  2. Data Races t1 t2 TIME lock(l) x = 1 unlock(l) x = 3 lock(l) x = 2 unlock(l) Benjamin Wester - ASPLOS '13

  3. Race Detection Instrumentation + Analysis Normal run time With race detector TIME • Detection is slow! • 8x–200x • Managed vs. binary • Granularity • Completeness • Debug info gathered • Matches app concurrency Benjamin Wester - ASPLOS '13

  4. Race Detection Normal run time Parallel race detector With race detector TIME • Detection is slow! • 8x–200x • Managed vs. binary • Granularity • Completeness • Debug info gathered • Matches app concurrency Benjamin Wester - ASPLOS '13

  5. How to use parallel hardware? • Scale program? • Performance tied to app • Parallel algorithm? • Lots of fine-grained dependencies • Uniparallelism • Converts execution into parallel pipeline • Works for instrumentation • [Wallace CGO’07, • Nightingale ASPLOS’08] Benjamin Wester - ASPLOS '13

  6. Uniparallelism TIME Benjamin Wester - ASPLOS '13

  7. Uniparallelism Epoch-parallel Execution Epoch-sequential Execution TIME Benjamin Wester - ASPLOS '13

  8. Uniparallelism • Scales well: Epochs are independent execution units • More efficient: Synchronization TIME Benjamin Wester - ASPLOS '13

  9. Add Detector… Here! • Scales well: Epochs are independent execution units • More efficient: Synchronization & Lock elision Race detection depends on state! TIME Benjamin Wester - ASPLOS '13

  10. Architecture Epoch-sequential Epoch-parallel Commit Benjamin Wester - ASPLOS '13

  11. Moving Work Identify meaningful fast subset of analysis state • Low frequency, lightweight, self-contained • Run in Epoch-Sequential phase • Predicted and usablein parallel phase Symbolicexecution • Replace missing state with symbols • Defer some computation to commit phase • Commit: replace symbols with concrete values Benjamin Wester - ASPLOS '13

  12. Architecture Epoch-sequential Epoch-parallel Commit Full instrumentation Symbolic execution Instrument and predict fast subset of analysis state Perform deferred work on concrete values Benjamin Wester - ASPLOS '13

  13. Parallel FastTrack State: Vector clocks – e.g. ⟨0, 1, 0⟩ – for each: • Thread • Lock • Variable x 2: last read(s), last write Example computation: Check(read_x ≤ thread_i );write_x = thread_i Transitive reduction Use knowledge of happens-before relationship Fast Slow Benjamin Wester - ASPLOS '13

  14. Parallel Eraser State: • Set of locks held by thread (lockset) • Lockset for each variable • Position in state machine for each variable Example computation: If state_x == SHARED then ls_x = ls_x ∩ {L, M} Lockset factorization Factor out common behavior Fast Slow Benjamin Wester - ASPLOS '13

  15. Parallel Performance Efficiency Scalability 2 worker threads as baseline 8-CPU Platform Benchmarks: • SPLASH-2 (water, lu, ocean, fft, radix) • Parallel app: pbzip2 Benjamin Wester - ASPLOS '13

  16. Efficiency Exactly 2 cores Benjamin Wester - ASPLOS '13

  17. Efficiency Exactly 2 cores Median 8% faster Median 13% faster Benjamin Wester - ASPLOS '13

  18. Scaling Parallel FastTrack 2 Worker Threads Benjamin Wester - ASPLOS '13

  19. Scaling Parallel FastTrack 2 Worker Threads Median 4.4x with 8 cores Benjamin Wester - ASPLOS '13

  20. Scaling Parallel Eraser 2 Worker Threads Median 3.3x with 8 cores Benjamin Wester - ASPLOS '13

  21. Conclusion • 3-Phase Uniparallel architecture • Parallel FastTrack • Parallel Eraser • Parallel algorithms: • Are more efficient • Scale better Benjamin Wester - ASPLOS '13

  22. Questions? Benjamin Wester - ASPLOS '13

More Related