Delta: Heuristically Minimize “Interesting” Files delta.tigris.org

Delta: Heuristically Minimize “Interesting” Files, delta.tigris.org. Daniel S. Wilkerson, work with Scott McPeak. This quarter million line file crashes my tool! We had a quarter million line (preprocessed) C++ file that crashed our C++ front-end (Elsa).

Presentation Transcript


  1. Delta: Heuristically Minimize “Interesting” Files • delta.tigris.org • Daniel S. Wilkerson, work with Scott McPeak

  2. This quarter million line file crashes my tool! • We had a quarter million line (preprocessed) C++ file that crashed our C++ front-end (Elsa) • How long would it take you to minimize that by hand? • Delta reduced it in a few hours to a page or two of code • While we did something else!

  3. Delta Debugging Algorithm • Andreas Zeller’s Delta Debugging Algorithm • For file minimization, reduces to this: • for each granularity g from 0 to log2 N (N = number of lines in the file) • partition the file into 2^g parts • for each part • test if the file minus that part is still interesting • if so, permanently throw out that part • Result is “one minimal” • removing any one line will make the test fail
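The loop on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the delta script itself: the names `minimize` and `interesting` are made up for the example, the test is passed in as a plain function over a list of lines, and the chunk size at granularity g is assumed to be the original line count divided by 2^g, rounded up. It makes a single pass at each granularity, so it does not by itself guarantee the "one minimal" property in every case.

```python
from typing import Callable, List, Sequence


def minimize(lines: Sequence[str],
             interesting: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop chunks of lines, halving the chunk size each round.

    `interesting` is a caller-supplied test (hypothetical name) that returns
    True when the candidate file still shows the behavior of interest.
    """
    current = list(lines)
    n = len(current)
    g = 0
    while True:
        # Chunk size for granularity g: ceil(n / 2**g), never below one line.
        chunk = max(1, -(-n // 2 ** g))
        start = 0
        while start < len(current):
            candidate = current[:start] + current[start + chunk:]
            if interesting(candidate):
                # Permanently throw the chunk out; the next chunk slides
                # into position `start`, so do not advance.
                current = candidate
            else:
                start += chunk
        if chunk == 1:
            break
        g += 1
    return current


if __name__ == "__main__":
    # The eight-line example from the next slides: lines b and e are needed.
    result = minimize(list("abcdefgh"), lambda c: "b" in c and "e" in c)
    print(result)
```

On the eight-line example that follows, where the file stays interesting only while both b and e are present, this sketch deletes nothing at g = 0 and g = 1, two chunks at g = 2, and two single lines at g = 3.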

  4. Example: both blue needed • a • b • c • d • e • f • g • h

  5. both blue needed: g = 0 • a • b • c • d • e • f • g • h can’t delete the box since it contains both b and e

  6. both blue needed: g = 1 • a • b • c • d • e • f • g • h can’t delete; contains b can’t delete; contains e

  7. both blue needed: g = 2 • a • b • c • d • e • f • g • h can delete can delete

  8. both blue needed: g = 3 • a • b • c • d • e • f • g • h can delete can delete

  9. both blue needed: final • a • b • c • d • e • f • g • h

  10. You could do this manually... • and be much more clever ...but delta is often faster • I find it surprising that, when minimizing a file exhibiting a certain behavior, brute force mostly wins over cleverness • “Computers are as dumb as hell but they go like 60” -- Richard Feynman

  11. Do a controlled experiment • An experiment does many things • the interesting bit • and the boilerplate just to make it go • A control is another experiment • that only does the boilerplate • Do both and “subtract” to find the interesting bit • test: gcc -c $F && oink $F | grep 'error:...' • control: $F passes gcc but not oink
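The same control-then-experiment test can be written as a small Python function, which may be easier to extend than a one-line shell pipeline. This is a sketch under assumptions: `is_interesting`, `run_command`, and the `run` parameter are names invented here, the error text is passed in by the caller because the slide elides the actual pattern, and unlike the shell pipeline this version searches both stdout and stderr of oink.

```python
import subprocess
from typing import Callable, Sequence, Tuple

# A runner takes an argv list and returns (exit status, combined output).
Runner = Callable[[Sequence[str]], Tuple[int, str]]


def run_command(argv: Sequence[str]) -> Tuple[int, str]:
    """Run a command and return its exit status and stdout+stderr text."""
    proc = subprocess.run(list(argv), capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def is_interesting(path: str, error_text: str,
                   run: Runner = run_command) -> bool:
    """Control: gcc must accept the file. Experiment: oink must report the error."""
    control_status, _ = run(["gcc", "-c", path])
    if control_status != 0:
        # The file is not even valid input, so any oink error is boilerplate.
        return False
    _, output = run(["oink", path])
    return error_text in output
```

Passing the runner as a parameter is only a convenience for trying the logic without gcc or oink installed; a stub that returns canned exit codes and output is enough to exercise both branches.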

  12. topformflat: “explaining hierarchical structure” • To delta, a file is a sequence of lines • topformflat “explains” the nesting of C/C++ • Simple flex filter that copies input to output • but doesn’t print newlines nested deeper than a nesting-depth argument • Strategy: repeatedly minimize with increasing nesting depths
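The filtering idea can be approximated in Python as follows. This is a simplified sketch, not the flex source of topformflat: the function name `flatten` is made up, only curly braces are counted, and quotes and comments are ignored, so a brace inside a string literal would confuse it. Suppressed newlines are replaced by a space so that adjacent tokens do not merge.

```python
def flatten(source: str, depth: int) -> str:
    """Copy `source`, keeping a newline only when brace nesting <= depth."""
    out = []
    nesting = 0
    for ch in source:
        if ch == "\n":
            # Newlines nested deeper than `depth` become a single space.
            out.append("\n" if nesting <= depth else " ")
        elif ch == "{":
            nesting += 1
            out.append(ch)
        elif ch == "}":
            nesting = max(0, nesting - 1)
            out.append(ch)
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    code = "void foo() {\n  x -= 5;\n}\nvoid bar() {\n  z |= 17;\n}\n"
    # At depth 0 each top-level function ends up on one line.
    print(flatten(code, 0))
```

With depth 0, each top-level form becomes one line, so delta can only delete whole functions; raising the depth exposes statements inside them as separate lines.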

  13. topformflat Example void foo() { for(...) { x -= 5; bar(); } while(...) { j++; } } void bar() { z |= 17; foo(); } void baz() {...}

  14. topformflat Example, level=0 void foo() {for(...){x -= 5;bar();}while(...){j++;}} void bar() {z |= 17;foo();} void baz() {...}

  15. topformflat Example, level=1 void foo() { for(...) { x -= 5; bar(); } while(...) { j++; } } void bar() { z |= 17; foo(); } void baz() {...} deleted

  16. topformflat Example, level=2 void foo() { for(...) { x -= 5; bar(); } while(...) { j++; } } void bar() { z |= 17; foo(); } void baz() {...}

  17. Science: Most bugs exhibitable by small inputs • On any input size, the result is almost always small • for C++ input to a compiler, 1-2 pages of code. • Seems to be a phenomenon of computation • there actually is Science in Computer Science! • but not always • delta worked for a week and still had 50 files • a buffer had to fill up and then flush

  18. The “Configuration File Trick” • Delta generalizes to many situations if you • parameterize the process with a file • minimize the file. • Simon Goldsmith was instrumenting Java system binaries • “during class-loading JVM would seg-fault; nothing really comprehensible would happen” • wrote a script to read a config file for which instrumented classes to put into the jar file • use delta to minimize the config file

  19. Simulated Annealing • Simulated Annealing • Large, non-convex sub-space • Gradient of goodness • Random local moves • likely to find another point in the sub-space • Moves parameterizable by a temperature. • Some say the ability to sometimes get worse is essential • I say: locality, randomness, and temperature

  20. Delta as Simulated Annealing • space: files that pass your test • goodness: smaller file is better • local moves: chop out a chunk of file • note that we never “get worse” • so delta is greedy • temperature: chunk size • we have an exponential “annealing schedule”, which is not unusual, according to Wikipedia anyway.
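To make the "temperature" concrete, the chunk sizes visited for a file of a given length can be listed directly. This is a small sketch assuming the chunk size at granularity g is the line count divided by 2^g, rounded up; `chunk_schedule` is a name chosen for the example.

```python
from typing import List


def chunk_schedule(n: int) -> List[int]:
    """Return the chunk sizes ("temperatures") tried for an n-line file."""
    sizes = []
    g = 0
    while True:
        size = max(1, -(-n // 2 ** g))  # ceil(n / 2**g), at least one line
        sizes.append(size)
        if size == 1:
            break
        g += 1
    return sizes


if __name__ == "__main__":
    print(chunk_schedule(8))
    print(len(chunk_schedule(250_000)))
```

Because the chunk size halves at every step, even a quarter-million-line file needs only 19 levels under this assumption, which is what makes the schedule exponential.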

  21. Delta surprisingly effective • Especially given how ignorant and general it is • Most ideas for improvements are how to make the local moves better at staying in the space • These ideas generally require knowing what the file means. • Important point: But note how well delta already does knowing nothing! • and topformflat only knows nesting and quotes!

  22. Improvement: use knowledge of dependencies to improve moves • If you know the language semantics, reject moves that would violate it, or only make moves that would produce a legal file • [figure labels: decl, use]

  23. Fan Mail • From: Flash Sheridan • This is just a quick thank-you note for Delta. ... it immediately reduced a ... bug file from 16K lines to ten (GCC bug 22604). • Oddly enough, it initially found a different bug (22603), since I'd only specified "internal compiler error", not "segmentation fault".

  24. Fan Mail, p.2 • From: Flash Sheridan • Delta has become even more valuable since my initial thank-you note. • I'm not sure it's helped with all of the GCC bugs I've been filing... but I couldn't have filed most of them without Delta. • Delta has always been able to find a radically smaller file, which I have been able to attach to my bug report.

  25. Fan Mail, p.3 • From: Richard Guenther • delta is saving a lot of gcc developers life ;) I would guess 1 of 3 bugs submitted to the gcc bugzilla get their testcase reduced using delta. • ... a little bit more accurate would be to say we're using delta to reduce all testcases from the gcc bugzilla in case they get entered unreduced.

  26. Delta: This simple dumb script is everywhere! • One class devoted to it in both Berkeley and Stanford Software Engineering courses • Berkeley: “We've just assigned a delta-related homework to the students today” • Stanford: “I gave them a homework assignment for CS295 using delta. Feedback was positive but unquantified.” • Why did it take so long to think of this simple thing?