1 / 20

Automatic Software Repair with Evolutionary Computation

Automatic Software Repair with Evolutionary Computation. Stephanie Forrest Westley Weimer. Introduction. Automatic bug repair is an important unsolved problem in software engineering Automated repair is needed for self-healing systems “The problem of security is the problem of software”

Download Presentation

Automatic Software Repair with Evolutionary Computation

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Automatic Software Repairwith Evolutionary Computation Stephanie Forrest Westley Weimer

  2. Introduction • Automatic bug repair is an important unsolved problem in software engineering • Automated repair is needed for self-healing systems • “The problem of security is the problem of software” • We combine state-of-the-art methods from programming languages with innovations in evolutionary computation • To repair bugs in publicly released software

  3. Summary of Method • Assume: • Access to C source code • Negative test case (input = 10593 ; output = infinite loop) • Positive test cases (encode required program functionality) • Construct Abstract Syntax Tree using CIL • Evolve repair that avoids negative test case and passes positive test case • Minimize repair using structural differencing and delta debugging

  4. What is evolutionary computation? • Evolution in a computer: • Individuals (genotypes) stored in the computer’s memory • Evaluation of individuals (artificial selection) • Differential reproduction by copying and deleting • Variation introduced by analogy with mutation and crossover

  5. Example: Microsoft Zune • Dec. 31, 2008. Microsoft Zune players mysteriously freeze up. • Bug: Infinite loop when input is last day of a leap year. • Negative test case: 10593, which corresponds to Dec 31, 2008. • Repair is not trivial. Microsoft’s recommendation was to let Zune drain its battery and then reset. Downloaded from http://pastie.org/349916 (Jan. 2009).

  6. Evolutionary Computation Innovations • Start with a working program • Focus on execution path through AST • Restrict mutation and crossover to execution path • Represent AST at level of statements • Leaves out expressions, variable declarations • Genetic operators • Don’t invent any new code, crossback, macromutation operators • Minimize repair size using structural differencing

  7. AST Representation

  8. Weighted Path • Nodes visited by negative test case have weight 1.0 • Nodes visited by negative and positive test cases have weight 0.01 • All other nodes have weight 0.0

  9. The Final Evolved Repair

  10. Summary of Repairs to Date • Twenty distinct defects in 7 classes: • Segfault: 7 • Buffer overflows: 3 • Infinite loops: 4 • Incorrect output: 2 • Integer overflow: 2 • Non-overflow DOS: 1 • Format string attack: 1 • Twenty distinct programs totaling 186,603 LOC (180k LOC) • Scientific Computing: 1 • Scripting Languages: 3 • Games, Graphics, Sound: 4 • Servers (web, ftp, authentication): 4 • Operating system utilities: 8

  11. Benchmark programs GECCO 2009, ICSE 2009, ACSAC(submitted)

  12. Time to Discover Repair • Time to repair: • 3 - 10 minutes • Time includes: • GP algorithm (selection, mutation, calculating fitness, etc.) • Running test cases • Pretty printing and memoizing ASTs • gcc (compiling ASTs into executable code) • No special hardware

  13. Research Questions • Does it really work? Why does it work? How can we break it? • How does the representation affect size of search space? • Order-of-magnitude reductions • What is the role of evolution? • Variable. Random search often performs as well • How does the number of test cases affect results? • Can improve results and reduce variability, but increases search time • How does the method scale with problem size? • Search time scales more than linearly but less than a quadratic

  14. Search Time Scaling m = 1.26

  15. Why it Works • Generic approach • Powerful intermediate representation • Weighted path greatly reduces search space • Minimization eliminates unnecessary fixes • Most bugs can be fixed with a few local modifications • 667 average atomic genetic operations to discover a repair; Repair discovered on average in 3.6 generations; 2.9 genetic operations per fitness evaluation • At least 1/2 the time, Random Search does as well as GP

  16. Quality of Repair • Manual checks for repair correctness. • Microsoft requires that security-critical changes be subjected to 100,000 fuzz inputs (randomly generated structured input strings). • Used SPIKE black-box fuzzer (immunitysec.com) to generate 100,000 held-out fuzz requests for web server examples. • In no case did GP repairs introduce errors that were detected by the fuzz tests, and in every case the GP repairs defeated variant attacks based on the same exploit. • Thus, the GP repairs are not fragile memorizations of the input. • GP repairs also correctly handled all subsequent requests from indicative workload.

  17. Papers and Awards • W. Weimer, T. Nguyen, C. Le Goues, and S. Forrest ``Automatically finding patches using genetic programming.'’ ICSE (2009) Best Paper Award. • S. Forrest, W. Weimer, T. Nguyen, and C. Le Goues ``A Genetic Programming Approach to Automated Software Repair.'’ GECCO (2009) Best Paper Award. • C. Le Goues, T. Nguyen, W. Weimer, and S. Forrest ``Closed-Loop Repair of Security Vulnerabilities.'’ (ACSAC 25) (Submitted June 2009). • AWARD: Human-Competitive Results Produced by Genetic and Evolutionary Computation (Humie Award). $5000 • IFIP TC2 Manfred Paul Award for Excellence in Software: Theory and Practice.1024 Euros • 2nd International Workshop on Search-Based Software Testing. Best paper and best presentation.

  18. The Future • Self-healing systems for security (next talk) • Integrating anomaly detection to find negative test cases • Runtime repair using software dynamic translation, e.g., Strata • Repair templates, other search methods • Repair quality carefully • Consistency in distributed applications? N-version diversity? • Systematic study of large software code bases • Hypothesis: Most bugs are small • A small step for GP, a large step for software?

  19. Evolutionary computation details • Fitness: Weighted sum of test cases that the program passes: • F(Programs that don’t compile) = 0 • 5 positive test cases (weight = 1), 1 or 2 negative test cases (weight = 10) • Mutation operations: • Delete a statement, Insert a statement, Swap a stmt along the weighted path with a stmt from another part of the program, • Crossover: Crosses back to original parent • Population size is 40. Standard run is 10 gens + 10 gens

  20. Minimizing the final repair • Use tree-structured differencing (Al-Ekram et al. 2005) • View primary repair as a set of tree-structured operations • Consider the One-minimal subset of repairs • Let Cp = {c1, c2, ... cn} be the set of changes in a primary repair • One-minimal subset is the minimal subset of Cp that passes all test cases • Delta debugging: Search for one-minimal subset using binary search • n2 time in worst case • often linear

More Related