1 / 213

Building Dynamic Instrumentation Tools with DynamoRIO

Building Dynamic Instrumentation Tools with DynamoRIO. Derek Bruening Qin Zhao. Tutorial Outline. 1:30-1:40 Welcome + DynamoRIO History 1:40-2:40 DynamoRIO Internals 2:40-3:00 Examples, Part 1 3:00-3:15 Break 3:15-4:15 DynamoRIO API 4:15-5:15 Examples, Part 2

svea
Download Presentation

Building Dynamic Instrumentation Tools with DynamoRIO

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Building Dynamic Instrumentation Toolswith DynamoRIO Derek Bruening Qin Zhao

  2. Tutorial Outline • 1:30-1:40 Welcome + DynamoRIO History • 1:40-2:40 DynamoRIO Internals • 2:40-3:00 Examples, Part 1 • 3:00-3:15 Break • 3:15-4:15 DynamoRIO API • 4:15-5:15 Examples, Part 2 • 5:15-5:30 Feedback DynamoRIO Tutorial at MICRO 12 Dec 2009

  3. DynamoRIO History • Dynamo • HP Labs: PA-RISC late 1990’s • x86 Dynamo: 2000 • RIO  DynamoRIO • MIT: 2001-2004 • Prior releases • 0.9.1: Jun 2002 (PLDI tutorial) • 0.9.2: Oct 2002 (ASPLOS tutorial) • 0.9.3: Mar 2003 (CGO tutorial) • 0.9.4: Feb 2005 DynamoRIO Tutorial at MICRO 12 Dec 2009

  4. DynamoRIO History • 0.9.5: Apr 2008 (CGO tutorial) • 1.0 (0.9.6): Sep 2008 (GoVirtual.org launch) • Determina • 2003-2007 • Security company • VMware • Acquired Determina (and DynamoRIO) in 2007 • Open-source BSD license • Feb 2009: 1.3.1 release DynamoRIO Tutorial at MICRO 12 Dec 2009

  5. DynamoRIO Internals 1:30-1:40 Welcome + DynamoRIO History 1:40-2:40 DynamoRIO Internals 2:40-3:00 Examples, Part 1 3:00-3:15 Break 3:15-4:15 DynamoRIO API 4:15-5:15 Examples, Part 2 5:15-5:30 Feedback

  6. Typical Modern Application: IIS DynamoRIO Tutorial at MICRO 12 Dec 2009

  7. Runtime Interposition Layer DynamoRIO Tutorial at MICRO 12 Dec 2009

  8. Design Goals • Efficient • Near-native performance • Transparent • Match native behavior • Comprehensive • Control every instruction, in any application • Customizable • Adapt to satisfy disparate tool needs DynamoRIO Tutorial at MICRO 12 Dec 2009

  9. Challenges of Real-World Apps • Multiple threads • Synchronization • Application introspection • Reading of return address • Transparency corner cases are the norm • Example: access beyond top of stack • Scalability • Must adapt to varying code sizes, thread counts, etc. • Dynamically generated code • Performance challenges DynamoRIO Tutorial at MICRO 12 Dec 2009

  10. Internals Outline • Efficient • Software code cache overview • Thread-shared code cache • Cache capacity limits • Data structures • Transparent • Comprehensive • Customizable DynamoRIO Tutorial at MICRO 12 Dec 2009

  11. Direct Code Modification e9 37 6f 48 92 jmp <callout> Kernel32!TerminateProcess: 7d4d1028 7c 05 jl 7d4d102f 7d4d102a 33 c0 xor %eax,%eax 7d4d102c 40 inc %eax 7d4d102d eb 08 jmp 7d4d1037 7d4d102f 50 push %eax 7d4d1030 e8 ed 7c 00 00 call 7d4d8d22 DynamoRIO Tutorial at MICRO 12 Dec 2009

  12. Debugger Trap Too Expensive cc int3 (breakpoint) Kernel32!TerminateProcess: 7d4d1028 7c 05 jl 7d4d102f 7d4d102a 33 c0 xor %eax,%eax 7d4d102c 40 inc %eax 7d4d102d eb 08 jmp 7d4d1037 7d4d102f 50 push %eax 7d4d1030 e8 ed 7c 00 00 call 7d4d8d22 DynamoRIO Tutorial at MICRO 12 Dec 2009

  13. Variable-Length Instruction Complications e9 37 6f 48 92 jmp <callout> Kernel32!TerminateProcess: 7d4d1028 7c 05 jl 7d4d102f 7d4d102a 33 c0 xor %eax,%eax 7d4d102c 40 inc %eax 7d4d102d eb 08 jmp 7d4d1037 7d4d102f 50 push %eax 7d4d1030 e8 ed 7c 00 00 call 7d4d8d22 DynamoRIO Tutorial at MICRO 12 Dec 2009

  14. Entry Point Complications e9 37 6f 48 92 jmp <callout> Kernel32!TerminateProcess: 7d4d1028 7c 05 jl 7d4d102f 7d4d102a 33 c0 xor %eax,%eax 7d4d102c 40 inc %eax 7d4d102d eb 08 jmp 7d4d1037 7d4d102f 50 push %eax 7d4d1030 e8 ed 7c 00 00 call 7d4d8d22 DynamoRIO Tutorial at MICRO 12 Dec 2009

  15. Direct Code Modification: Too Limited • Not transparent • Cannot write jump atomically if crosses cache line • Even if write is atomic, not safe if overwrites part of next instruction • Jump may span code entry point • Too limited • Not safe w/o suspending all threads and knowing all entry points • Limited to inserting callouts • Code displaced by jump is a mini code cache • All the same consistency challenges of larger cache • Inter-operation issues with other hooks DynamoRIO Tutorial at MICRO 12 Dec 2009

  16. We Need Indirection • Avoid transparency and granularity limitations of directly modifying application code • Allow arbitrary modifications at unrestricted points in code stream • Allow systematic, fine-grained modifications to code stream • Guarantee that all code is observed DynamoRIO Tutorial at MICRO 12 Dec 2009

  17. Basic Interpreter START fetch decode execute Slowdown: ~300x DynamoRIO Tutorial at MICRO 12 Dec 2009

  18. Improvement #1: Interpreter + Basic Block Cache basic block builder START dispatch context switch BASIC BLOCK CACHE Non-control-flow instructions executed from software code cache non-control-flow instructions Slowdown: 300x 25x DynamoRIO Tutorial at MICRO 12 Dec 2009

  19. Improvement #1: Interpreter + Basic Block Cache A C B basic block builder START D dispatch context switch BASIC BLOCK CACHE D A B DynamoRIO Tutorial at MICRO 12 Dec 2009

  20. Example Basic Block Fragment add %eax, %ecx cmp $4, %eax jle $0x40106f frag7: stub0: stub1: add %eax, %ecx cmp $4, %eax jle <stub0> jmp <stub1> mov %eax, eax-slot mov &dstub0, %eax jmp context_switch mov %eax, eax-slot mov &dstub1, %eax jmp context_switch dstub0 target: 0x40106f dstub1 target: fall-thru DynamoRIO Tutorial at MICRO 12 Dec 2009

  21. Improvement #2: Linking Direct Branches basic block builder START dispatch context switch BASIC BLOCK CACHE Direct branch to existing block can bypass dispatch non-control-flow instructions Slowdown: 300x 25x 3x DynamoRIO Tutorial at MICRO 12 Dec 2009

  22. Improvement #2: Linking Direct Branches A C B basic block builder START D dispatch context switch BASIC BLOCK CACHE D A B DynamoRIO Tutorial at MICRO 12 Dec 2009

  23. Direct Linking add %eax, %ecx cmp $4, %eax jle $0x40106f frag7: stub0: stub1: add %eax, %ecx cmp $4, %eax jle <frag8> jmp <stub1> mov %eax, eax-slot mov &dstub0, %eax jmp context_switch mov %eax, eax-slot mov &dstub1, %eax jmp context_switch dstub0 target: 0x40106f dstub1 target: fall-thru DynamoRIO Tutorial at MICRO 12 Dec 2009

  24. Improvement #3: Linking Indirect Branches basic block builder START dispatch context switch BASIC BLOCK CACHE Application address mapped to code cache indirect branch lookup non-control-flow instructions Slowdown: 300x 25x 3x 1.2x DynamoRIO Tutorial at MICRO 12 Dec 2009

  25. Indirect Branch Transformation frag8: stub0: mov %ecx, ecx-slot pop %ecx jmp <stub0> mov %ebx, eax-slot mov &dstub0, %ebx jmp ib_lookup ret DynamoRIO Tutorial at MICRO 12 Dec 2009

  26. Traces reduce branching, improve layout and locality, and facilitate optimizations across blocks We avoid indirect branch lookup Next Executing Tail (NET) trace building scheme [Duesterwald 2000] Improvement #4: Trace Building Basic Block Cache Trace Cache A G D J A G K B J E B E H K F H C I F L D DynamoRIO Tutorial at MICRO 12 Dec 2009

  27. Incremental NET Trace Building Basic Block Cache Trace Cache G J G K G G J K K G G J K J K DynamoRIO Tutorial at MICRO 12 Dec 2009

  28. Improvement #4: Trace Building trace selector basic block builder START dispatch context switch BASIC BLOCK CACHE TRACE CACHE indirect branch lookup non-control-flow instructions non-control-flow instructions indirect branch stays on trace? Slowdown: 300x 26x 3x 1.2x 1.1x DynamoRIO Tutorial at MICRO 12 Dec 2009

  29. Base Performance SPEC CPU2000 Server Desktop DynamoRIO Tutorial at MICRO 12 Dec 2009

  30. Sources of Overhead • Extra instructions • Indirect branch target comparisons • Indirect branch hashtable lookups • Extra data cache pressure • Indirect branch hashtable • Branch mispredictions • ret becomes jmp* • Application code modification DynamoRIO Tutorial at MICRO 12 Dec 2009

  31. Time Breakdown for SPECINT trace selector basic block builder START < 1% dispatch context switch BASIC BLOCK CACHE TRACE CACHE indirect branch lookup non-control-flow instructions non-control-flow instructions indirect branch stays on trace? ~ 0% ~ 4% ~ 94% ~ 2% DynamoRIO Tutorial at MICRO 12 Dec 2009

  32. Not An Ordinary Application • An application executing in DynamoRIO’s code cache looks different from what the underlying hardware has been tuned for • The hardware expects: • Little or no dynamic code modification • Writes to code are expensive • call and ret instructions • Return Stack Buffer predictor DynamoRIO Tutorial at MICRO 12 Dec 2009

  33. Performance Counter Data DynamoRIO Tutorial at MICRO 12 Dec 2009

  34. Internals Outline • Efficient • Software code cache overview • Thread-shared code cache • Cache capacity limits • Data structures • Transparent • Comprehensive • Customizable DynamoRIO Tutorial at MICRO 12 Dec 2009

  35. Threading Model Running Program Thread1 Thread2 Thread3 ThreadN … Code Caching Runtime System Thread1 Thread2 Thread3 ThreadN … Operating System Thread1 Thread2 Thread3 ThreadN … DynamoRIO Tutorial at MICRO 12 Dec 2009

  36. Code Space Running Program Thread Thread Thread Thread Thread-Private Code Caches Thread-Shared Code Cache Thread Thread Thread Thread Operating System Thread1 Thread2 Thread1 Thread2 DynamoRIO Tutorial at MICRO 12 Dec 2009

  37. Thread-Private versus Thread-Shared • Thread-private • Less synchronization needed • Absolute addressing for thread-local storage • Thread-specific optimization and instrumentation • Thread-shared • Scales to many-threaded apps DynamoRIO Tutorial at MICRO 12 Dec 2009

  38. Database and Web Server Suite DynamoRIO Tutorial at MICRO 12 Dec 2009

  39. Memory Impact ab med guest low guest med DynamoRIO Tutorial at MICRO 12 Dec 2009

  40. Performance Impact DynamoRIO Tutorial at MICRO 12 Dec 2009

  41. Scalability Limit DynamoRIO Tutorial at MICRO 12 Dec 2009

  42. Internals Outline • Efficient • Software code cache overview • Thread-shared code cache • Cache capacity limits • Data structures • Transparent • Comprehensive • Customizable DynamoRIO Tutorial at MICRO 12 Dec 2009

  43. Added Memory Breakdown DynamoRIO Tutorial at MICRO 12 Dec 2009

  44. Code Expansion DynamoRIO Tutorial at MICRO 12 Dec 2009

  45. Cache Capacity Challenges • How to set an upper limit on the cache size • Different applications have different working sets and different total code sizes • Which fragments to evict when that limit is reached • Without expensive profiling or extensive fragmentation DynamoRIO Tutorial at MICRO 12 Dec 2009

  46. Adaptive Sizing Algorithm • Enlarge cache if warranted by percentage of new fragments that are regenerated • Target working set of application: don’t enlarge for once-only code • Low-overhead, incremental, and reactive DynamoRIO Tutorial at MICRO 12 Dec 2009

  47. Cache Capacity Settings • Thread-private: • Working set size matching is on by default • Client may see blocks or traces being deleted in the absence of any cache consistency event • Can disable capacity management via • -no_finite_bb_cache • -no_finite_trace_cache • Thread-shared: • Set to infinite size by default • Can enable capacity management via • -finite_shared_bb_cache • -finite_shared_trace_cache DynamoRIO Tutorial at MICRO 12 Dec 2009

  48. Internals Outline • Efficient • Software code cache overview • Thread-shared code cache • Cache capacity limits • Data structures • Transparent • Comprehensive • Customizable DynamoRIO Tutorial at MICRO 12 Dec 2009

  49. Two Modes of Code Cache Operation • Fine-grained scheme • Supports individual code fragment unlink and removal • Separate data structure per code fragment and each of its exits, memory regions spanned, and incoming links • Coarse-grained scheme • No individual code fragment control • Permanent intra-cache links • No per-fragment data structures at all • Treat entire cache as a unit for consistency DynamoRIO Tutorial at MICRO 12 Dec 2009

  50. Data Structures • Fine-grained scheme • Data structures are highly tuned and compact • Coarse-grained scheme • There are no data structures • Savings on applications with large amounts of code are typically 15%-25% of committed memory and 5%-15% of working set DynamoRIO Tutorial at MICRO 12 Dec 2009

More Related