
Delta Debugging




  1. Politecnico di Milano. Delta Debugging: an advanced debugging technique. Authors: Carlo Curino, Alessandro Giusti

  2. Motivations • Reducing faults: 50%-80% of total cost • Debugging: one of the hardest, yet least systematic, activities of software engineering, and the most time-consuming • Locating faults: the most difficult part

  3. Overview • Which problems Delta Debugging solves • Four solutions, one common approach: • simplifying failure-inducing input • isolating failure-inducing thread schedules • identifying failure-inducing changes in the code • isolating cause-effect chains

  4. Failure-inducing input • This HTML input makes Mozilla crash (segmentation fault). Which portion of it induces the failure?

  5. Thread scheduling • The result of a multithreaded program seems nondeterministic. Why does this happen?

  6. Code changes • The old version of GDB works with DDD; the new one doesn’t! • 178,000 lines of code were modified between the two versions. Where is the bug?

  7. Cause-effect chain • Which part of the program state is involved in the failure?

  8. Four solutions, one approach • The underlying problem is always: find which part of something determines the failure • So a common strategy can be applied: divide et impera (divide and conquer) on the deltas between: • working and failing inputs • working and failing code versions • working and failing thread schedules • working and failing program states • This yields an efficient and automatic debugging procedure

  9. Common terminology • A test case can: • fail (the failure shows up) • pass (the program runs properly) • be unresolved (different problems arise) • Delta debugging algorithms iteratively: • apply changes (to input, code, schedule or state) • run tests

  10. Common terminology (2) • Concept of difference: a very general delta between something in two test cases • Examples: • difference in the input: a different character (or bit) in the input stream • difference in the thread schedule: a difference in the time at which a given thread switch is performed • difference in the code: a different statement in two versions of a program • difference in the program state: different values of the program’s internal variables
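The terminology above can be made concrete with a small sketch. It assumes, for illustration only, that inputs are equal-length strings and that a delta is a (position, character) pair taken from the failing input; `Outcome`, `deltas` and `apply_deltas` are hypothetical names, not part of any tool described in these slides.

```python
from enum import Enum


class Outcome(Enum):
    """The three possible results of running one test case."""
    PASS = "pass"              # the program runs properly
    FAIL = "fail"              # the failure under study shows up
    UNRESOLVED = "unresolved"  # some other problem arises


def deltas(passing: str, failing: str) -> set[tuple[int, str]]:
    """Character-level differences between two equal-length inputs.

    Each delta is a (position, character in the failing input) pair.
    """
    if len(passing) != len(failing):
        raise ValueError("this sketch only handles equal-length inputs")
    return {(i, f) for i, (p, f) in enumerate(zip(passing, failing)) if p != f}


def apply_deltas(passing: str, chosen: set[tuple[int, str]]) -> str:
    """Build an intermediate test case: the passing input with some deltas applied."""
    chars = list(passing)
    for position, char in chosen:
        chars[position] = char
    return "".join(chars)
```

For example, `deltas("abcd", "abXY")` gives `{(2, "X"), (3, "Y")}`, and applying only `(2, "X")` yields the intermediate input `"abXd"`, which a DD algorithm would then run and classify as one of the three `Outcome` values.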

  11. Simplifying failure-inducing input

  12. Minimizing vs. isolating • Minimizing (ddmin algorithm): • slower • more human-friendly results • Isolating (dd algorithm): • generalization of the ddmin algorithm • faster • well suited to producing the inputs for cause-effect chain DD

  13. Minimizing: Mozilla bug • Minimizing: • 57 tests to simplify the 896-line HTML input down to the “<SELECT>” tag that causes the crash • every remaining character is relevant (as shown from line 20 to 26) • only removes deltas from the failing test • returns a 1-minimal failure-inducing input, i.e. removing any single remaining character makes the failure disappear (finding a global minimum may require an exponential number of tests)
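A minimal sketch of the minimizing strategy, written in the spirit of ddmin: split the failing input into n chunks, keep any chunk or complement that still fails, and otherwise double the granularity. It treats the input as a sequence and assumes a predicate `fails` that returns True only for the FAIL outcome (PASS and UNRESOLVED both count as "does not fail"). It is an illustration, not the authors' implementation.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def split(items: list[T], n: int) -> list[list[T]]:
    """Split a list into n contiguous, non-empty parts of nearly equal size."""
    size, extra = divmod(len(items), n)
    parts, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        parts.append(items[start:end])
        start = end
    return parts


def ddmin(failing: Sequence[T], fails: Callable[[list[T]], bool]) -> list[T]:
    """Reduce a failing input to a 1-minimal one that still fails.

    1-minimal: removing any single remaining element makes the failure go away.
    """
    current = list(failing)
    n = 2  # granularity: number of chunks
    while len(current) >= 2:
        parts = split(current, n)
        # Does a single chunk on its own still fail?
        reduced = next((p for p in parts if fails(p)), None)
        if reduced is not None:
            current, n = reduced, 2
            continue
        # Does removing a single chunk (keeping its complement) still fail?
        complements = [[x for j, p in enumerate(parts) if j != i for x in p]
                       for i in range(n)]
        reduced = next((c for c in complements if fails(c)), None)
        if reduced is not None:
            current, n = reduced, max(n - 1, 2)
            continue
        if n >= len(current):
            break  # chunks are single elements and none can be removed
        n = min(2 * n, len(current))  # increase granularity
    return current
```

On a toy version of the Mozilla case, `ddmin(list("abc<SELECT>xyz"), lambda s: "<SELECT>" in "".join(s))` reduces the input to the characters of `<SELECT>`. Testing chunks before complements favors large reductions early; the real ddmin also caches test results, which this sketch omits.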

  14. Minimizing: didactic example

  15. Isolating: Mozilla bug • Isolating: • only 7 tests (instead of 26) • removes deltas from the failing test and adds deltas to the passing test • isolates a single delta, “<”, that makes the failure go away • returns the two nearest inputs, one failing and the other passing

  16. General DD algorithm [Figure: the differences between the initial failing test and the initial passing test]

  17. General DD algorithm • What if we remove these differences from the current failing test? [Figure: initial fail, differences, initial pass]

  18. General DD algorithm • The failure disappears: “move up” (the reduced test becomes the new passing bound, closer to the failing one) [Figure: initial fail, differences, initial pass]

  19. General DD algorithm • What if we remove these differences? [Figure: initial fail, differences, initial pass]

  20. General DD algorithm • Unresolved test: “increase granularity” (split the differences into smaller subsets) [Figure: initial fail, differences, initial pass]

  21. General DD algorithm • What if we remove these differences from the current failing test? [Figure: initial fail, differences, initial pass]

  22. General DD algorithm • Still fails: “move down” (the reduced test becomes the new failing bound, closer to the passing one) [Figure: initial fail, differences, initial pass]
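The walk-through above can be sketched as a loop that narrows the gap between a passing and a failing configuration. This is a minimal sketch in the spirit of the dd algorithm, assuming configurations are sets of deltas (the passing one a subset of the failing one) and a hypothetical `test` function that returns "PASS", "FAIL" or "UNRESOLVED". The comments map each branch to the slide labels as we read them.

```python
PASS, FAIL, UNRESOLVED = "PASS", "FAIL", "UNRESOLVED"


def split(items, n):
    """Split a list into n contiguous, non-empty parts of nearly equal size."""
    size, extra = divmod(len(items), n)
    parts, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        parts.append(items[start:end])
        start = end
    return parts


def dd(passing, failing, test):
    """Isolate a small difference between a passing and a failing configuration.

    Returns (c_pass, c_fail) with test(c_pass) == PASS, test(c_fail) == FAIL
    and, when no unresolved outcomes get in the way, one delta between them.
    """
    c_pass, c_fail, n = set(passing), set(failing), 2
    while True:
        delta = sorted(c_fail - c_pass)
        if len(delta) <= 1:
            return c_pass, c_fail
        n = min(n, len(delta))
        for part in map(set, split(delta, n)):
            removed = c_fail - part  # remove these deltas from the failing test
            added = c_pass | part    # add these deltas to the passing test
            r_removed, r_added = test(removed), test(added)
            if r_removed == PASS:    # failure disappears: passing bound moves up
                c_pass, n = removed, 2
            elif r_added == FAIL:    # failure appears: failing bound moves down
                c_fail, n = added, 2
            elif r_added == PASS:    # still passes: passing bound moves up
                c_pass, n = added, max(n - 1, 2)
            elif r_removed == FAIL:  # still fails: failing bound moves down
                c_fail, n = removed, max(n - 1, 2)
            else:
                continue             # both unresolved: try the next part
            break                    # progress made: restart with new bounds
        else:
            if n >= len(delta):
                return c_pass, c_fail   # no finer split is possible
            n = min(2 * n, len(delta))  # increase granularity
```

For instance, with ten deltas where the failure needs both delta 3 and delta 7, `dd(set(), set(range(10)), test)` ends with two configurations that differ in a single delta. Unlike the minimizing sketch, both bounds move, which is why isolation needs fewer tests.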

  23. Formally: the algorithm
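The slide's formal definition was an image. For reference, a recursive formulation of dd following the version published by Zeller and Hildebrandt is given below; it may differ from the slide in notation and detail. Here $c_p \subseteq c_f$ are the passing and failing configurations, and the cases are tried in order.

```latex
\mathit{dd}(c_p, c_f) = \mathit{dd}'(c_p, c_f, 2)

\mathit{dd}'(c'_p, c'_f, n) =
\begin{cases}
(c'_p, c'_f) & \text{if } |\Delta| = 1\\
\mathit{dd}'(c'_f \setminus \Delta_i,\; c'_f,\; 2) & \text{if } \exists i : \mathit{test}(c'_f \setminus \Delta_i) = \text{pass}\\
\mathit{dd}'(c'_p,\; c'_p \cup \Delta_i,\; 2) & \text{if } \exists i : \mathit{test}(c'_p \cup \Delta_i) = \text{fail}\\
\mathit{dd}'(c'_p \cup \Delta_i,\; c'_f,\; \max(n-1, 2)) & \text{if } \exists i : \mathit{test}(c'_p \cup \Delta_i) = \text{pass}\\
\mathit{dd}'(c'_p,\; c'_f \setminus \Delta_i,\; \max(n-1, 2)) & \text{if } \exists i : \mathit{test}(c'_f \setminus \Delta_i) = \text{fail}\\
\mathit{dd}'(c'_p,\; c'_f,\; \min(2n, |\Delta|)) & \text{if } n < |\Delta|\\
(c'_p, c'_f) & \text{otherwise}
\end{cases}

\text{where } \Delta = c'_f \setminus c'_p = \Delta_1 \cup \dots \cup \Delta_n,\quad
\Delta_i \cap \Delta_j = \emptyset \ (i \neq j),\quad |\Delta_i| \approx |\Delta| / n
```

The second and fifth cases correspond to the “move up” and “move down” steps of the walk-through, and the sixth to “increase granularity”.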

  24. Efficiency considerations • Worst case: |k|² + 3|k| tests (|k| = cardinality of the change set k) • occurs when all test outcomes are unresolved except the last one • very unlikely • Best case: 2·log|k| tests • Try to avoid unresolved test outcomes, e.g. by exploiting lexical and syntactic knowledge about the input

  25. Demo: Eclipse plugin (live demo)

  26. Thread scheduling • The behavior of a multithreaded program may depend on the schedule.

  27. DD applied to thread scheduling • Debugging is even harder here: • thread switches and schedules are nondeterministic • it is difficult to reproduce and isolate failures • Goal: relate the failure to a small set of relevant differences between passing and failing schedules • Again a “purely experimental” approach: there is no need to understand the program

  28. Purely experimental: pros and cons • Pros: • the program is treated as a black box: it only needs to be executed • a failure can be any arbitrary behavior of the program; we only need to distinguish failure from success • Cons: • (compared with static analysis) being test-based, it cannot establish properties that hold for all runs of a program, such as the general absence of deadlocks • it requires an observable failure

  29. Dejavu tool • Tool: Dejavu (DEterministic JAVa replay Utility) by IBM • Reproduces schedules and the failures they induce • By exploiting Dejavu: • the thread schedule becomes an input • we can generate new schedules by mixing one passing schedule and one failing schedule

  30. Differences in thread scheduling • Starting point: • a passing run • a failing run • Differences (for thread switch t1): • in one run, t1 occurs at time 254 • in the other, t1 occurs at time 278 • ∆1 = |278 − 254| induces a statement interval: the code executed between time 254 and time 278

  31. Differences in thread scheduling • We can build further test cases by mixing the two schedules, in order to isolate the relevant differences
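Mixing two schedules can be sketched as follows. Dejavu's actual schedule format is not described here, so this assumes, hypothetically, that a schedule is a list of thread-switch times (switch k happens at time `schedule[k]`), that both runs have the same number of switches, and that each delta shifts one switch by one time unit toward the failing run. `schedule_deltas` and `mix` are illustrative names.

```python
def schedule_deltas(passing: list[int], failing: list[int]) -> set[tuple[int, int]]:
    """One delta per unit shift: (switch index, step number)."""
    return {(k, step)
            for k, (p, f) in enumerate(zip(passing, failing))
            for step in range(abs(f - p))}


def mix(passing: list[int], failing: list[int],
        chosen: set[tuple[int, int]]) -> list[int]:
    """Passing schedule with the chosen unit shifts applied toward the failing one."""
    mixed = list(passing)
    for k, _step in chosen:
        mixed[k] += 1 if failing[k] > passing[k] else -1
    return mixed
```

With `passing = [100, 254, 400]` and `failing = [100, 278, 390]` there are 24 + 10 = 34 unit deltas; applying all of them reproduces the failing schedule, and applying none gives back the passing one. A mixed schedule may be infeasible (for example, switches out of order), which a real test harness would report as unresolved.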

  32. Real-life test: setting • Test #205 of the SPEC JVM98 Java test suite • Modification of the raytracer program into a multithreaded version • Introduction of a simple race condition • Implementation of an automated test that checks for failure or success • Generation of random schedules to find a passing schedule and a failing schedule • Differences between the passing and failing schedules: • 3,842,577,240 differences • each difference moves a thread switch time by +1 or -1

  33. Real-life test: results • DD isolates one single difference after 50 tests (about 28 minutes)

  34. Real-life test: pinpointing the failure • The failure occurs if and only if thread switch #33 occurs at yield point 59,772,127 instead of 59,772,126 (a yield point is a safe point, such as a function invocation) • at 59,772,127, line 91 is the first yield point after the initialization of OldScenesLoaded • at 59,772,126, line 82 is the yield point just before the initialization of OldScenesLoaded

  35. Real-life test: conclusion • Delta Debugging is efficient • even when applied to very large thread schedules (>3,000,000,000 differences) • No analysis is required, as Delta Debugging relies on experiments alone • only the schedule was observed and altered • the failure-inducing thread switch is easily associated with code • Alternate runs are obtained automatically • by generating random schedules • only one initial run (pass or fail) is required

  36. Code changes • A given revision of a program behaves correctly; the next one does not. • Find which of the code changes causes the problem. • This is inconvenient when the difference amounts to thousands of lines of code

  37. The manual solution • Binary search through the revision history → regression containment • Does not always work: • multiple changes may cause the failure only when combined (interference) • a single change can amount to many lines of code (granularity) • mixing parallel development branches causes inconsistency problems
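The manual binary search can be sketched as follows, under the assumptions it needs in order to work: the revision history is linear, the oldest revision passes, the newest fails, and once a revision fails all later ones fail too. This is the idea that tools such as `git bisect` automate; `first_failing` and `is_failing` are hypothetical names.

```python
from typing import Callable, Sequence, TypeVar

R = TypeVar("R")


def first_failing(revisions: Sequence[R], is_failing: Callable[[R], bool]) -> R:
    """Return the first failing revision in a pass...pass fail...fail history."""
    if is_failing(revisions[0]) or not is_failing(revisions[-1]):
        raise ValueError("oldest revision must pass and newest must fail")
    lo, hi = 0, len(revisions) - 1  # invariant: revisions[lo] passes, revisions[hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_failing(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return revisions[hi]
```

Interference and granularity, listed above, are exactly the cases where the single "first failing revision" this returns does not explain the failure.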

  38. Procedure • Developed in 1999: it differs in some respects from the current general DD algorithms. • Consider the differences between the working and the failing revision. • Ignore any knowledge about the temporal ordering of the changes. • Goal: find a minimal failure-inducing change set.

  39. Inconsistencies • Mixing code changes regardless of their ordering produces many tests with an “unresolved” outcome: • integration failure • construction failure • execution failure • These increase the complexity of the DD algorithm!
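One way to see where unresolved outcomes come from: when some changes depend on others, many mixed change sets cannot even be applied. The sketch below models this with a hypothetical `requires` map from a change to the changes it depends on, reporting UNRESOLVED (an integration failure) whenever a prerequisite is missing; `fails` is a hypothetical stand-in for building and running the program.

```python
from typing import Callable, Mapping


def outcome(
    change_set: frozenset[str],
    requires: Mapping[str, frozenset[str]],
    fails: Callable[[frozenset[str]], bool],
) -> str:
    """Classify one mixed change set as PASS, FAIL or UNRESOLVED."""
    for change in change_set:
        missing = requires.get(change, frozenset()) - change_set
        if missing:
            # The change cannot be applied without its prerequisites
            # (an integration failure), so this test says nothing about the bug.
            return "UNRESOLVED"
    return "FAIL" if fails(change_set) else "PASS"
```

If change "b" depends on change "a", then `{"b"}` alone is unresolved while `{"a", "b"}` can be tested normally; every such unresolved trial forces DD to split the changes further.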

  40. Future work • Group related changes (partly done) → fewer inconsistent trials • common change dates/sources • location criteria • lexical criteria • syntactic criteria (common functions/modules) • semantic criteria
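Grouping by a location criterion can be as simple as treating all changes to one file as a single delta, so that DD works on fewer, more consistent units. The `Change` record and `group_by_file` below are hypothetical; real change sets would come from a diff tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One hunk of a code change (hypothetical representation)."""
    file: str
    line: int
    text: str


def group_by_file(changes: list[Change]) -> list[tuple[Change, ...]]:
    """Merge all changes to the same file into one delta, ordered by file name."""
    groups: dict[str, list[Change]] = defaultdict(list)
    for change in changes:
        groups[change.file].append(change)
    return [tuple(group) for _, group in sorted(groups.items())]
```

The resulting groups can be fed to a DD algorithm as deltas; if a group turns out to be failure-inducing, DD can then be rerun on its individual changes.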

  41. Cause-effect background • A bit of background: • a program state is represented by variable values and references.

  42. Background (2) • While the program runs, the state evolves. • We assume the program is: • deterministic • not interactive → identical states at identical times have identical evolutions.

  43. Idea: apply DD to program states • We need two distinct runs: • one failing • one passing • We want the two runs to be (initially) as similar as possible. • If we let the two runs evolve in parallel, their initial states will be similar. • Isolating the failure-inducing input can help. • Apply DD to different “slices” of the program evolution (a sort of CT scan for program runs).

  44. Procedure • Iteratively: • build a new state by mixing the passing and the failing state • let the program evolve and see whether it passes, fails, or does unrelated weird things (unresolved outcome) • isolate the smallest subset of the state relevant to the failure • Nothing new so far. But: • this happens at one specific moment of the program evolution, and it is repeated at other moments (e.g. at the entry points of important functions).
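One iteration of this procedure can be sketched on a toy state. It assumes, purely for illustration, that a state is a dictionary from variable names to values with the same names in both runs, and that a hypothetical `resume` function continues the program from a given state; a real tool would need debugger-level access to the running program's state instead.

```python
PASSING_STATE = {"x": 1, "y": 2, "z": 3}
FAILING_STATE = {"x": 1, "y": 0, "z": 5}


def state_deltas(passing: dict, failing: dict) -> set[str]:
    """Names of the variables whose values differ between the two states."""
    return {name for name in passing if passing[name] != failing[name]}


def mix_states(passing: dict, failing: dict, chosen: set[str]) -> dict:
    """The passing state, with the chosen variables taken from the failing state."""
    mixed = dict(passing)
    for name in chosen:
        mixed[name] = failing[name]
    return mixed


def resume(state: dict) -> int:
    """Hypothetical remainder of the program, continued from `state`."""
    return state["z"] // state["y"]


def outcome(state: dict) -> str:
    """Run the rest of the program from `state` and classify the result."""
    try:
        resume(state)
    except ZeroDivisionError:
        return "FAIL"        # the failure under study
    except Exception:
        return "UNRESOLVED"  # some unrelated problem
    return "PASS"
```

Here the states differ in `y` and `z`; transplanting only the failing `y` reproduces the failure, while transplanting only `z` does not, so `y` is the failure-inducing part of the state at this moment. With many variables, a DD algorithm would search the subsets instead of trying them by hand.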

  45. The result • A cause-effect chain that leads to a failure.

  46. The cause-effect chain • The initial states are perfectly legitimate: for example, the direct consequence of a specific input that the program should handle → intended program states. • The final effects are the failure → faulty program states. • The error lies somewhere in the middle, where an intended program state evolves into a faulty one.

  47. Fascinating terminology • A defect in the code causes an infection in the state. • The infection usually propagates as the program evolves.

  48. Limits • No automatic discrimination between intended and faulty (infected) states! • The human user can increase the resolution of the slices and pinpoint the code that turns an INTENDED state into a FAULTY one → correct the error (the defect in the code) and break the cause-effect chain that leads to the failure.

  49. Cause transitions • Sometimes, when an instruction is executed: • a given variable ceases to be failure-inducing • others become failure-inducing → the failure-inducing subset of the state changes (cause transition) • An algorithm can efficiently find cause transitions in cause-effect chains by means of binary search (again).
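The binary search over time can be sketched as follows. It assumes a hypothetical `cause_at(moment)` function that isolates the failure-inducing variables at a given moment (for instance with DD on program states) and returns them, and that the cause at the last moment differs from the cause at the first. If several transitions exist, the search finds one of them.

```python
from typing import Callable, Hashable, Sequence, TypeVar

M = TypeVar("M")


def find_transition(moments: Sequence[M],
                    cause_at: Callable[[M], Hashable]) -> tuple[M, M]:
    """Return adjacent moments (before, after) between which the cause changes."""
    lo, hi = 0, len(moments) - 1
    start = cause_at(moments[lo])
    if cause_at(moments[hi]) == start:
        raise ValueError("no cause transition between the first and last moment")
    # Invariant: the cause at moments[lo] equals `start`; at moments[hi] it differs.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if cause_at(moments[mid]) == start:
            lo = mid
        else:
            hi = mid
    return moments[lo], moments[hi]
```

Each probe costs one state isolation, so the number of isolations grows only logarithmically with the number of moments considered; the code executed between the two returned moments is where to look for the defect.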

  50. Cause transitions (2)
