Statically Validating Must Summaries for Incremental Compositional Dynamic Test Generation. Patrice Godefroid Shuvendu K. Lahiri Cindy Rubio-González. Microsoft Research University of Wisconsin – Madison.
Statically Validating Must Summaries for Incremental Compositional Dynamic Test Generation • Patrice Godefroid, Shuvendu K. Lahiri, Cindy Rubio-González • Microsoft Research, University of Wisconsin – Madison • International Static Analysis Symposium, September 2011
Background • Systematic Dynamic Test Generation (= DART) • Start from a valid input and run the program • Symbolically execute the program along the recorded trace to collect constraints • Negate and solve constraints to obtain new inputs • And the process repeats (possibly forever!) • Used in many tools • EXE, CUTE, SAGE, PEX, KLEE, BitScope, Apollo, etc.
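The loop above can be illustrated with a toy sketch. Everything in it is hypothetical: `program_under_test` is an invented two-branch program, and the "solver" is a brute-force search over a small input range that stands in for real symbolic execution plus an SMT solver. It only shows the control structure: run, record the branch outcomes, negate one, find a new input, repeat.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_BRANCHES 8
#define DOMAIN_MIN (-50)
#define DOMAIN_MAX 50

/* Recorded trace: the outcome of each branch condition, in execution order. */
typedef struct {
    bool taken[MAX_BRANCHES];
    size_t len;
} Trace;

static void record(Trace *t, bool outcome) {
    if (t->len < MAX_BRANCHES) t->taken[t->len++] = outcome;
}

/* Hypothetical program under test with three feasible paths. */
static void program_under_test(int x, Trace *t) {
    t->len = 0;
    bool big = x > 10;
    record(t, big);
    if (big) {
        bool multiple = (x % 7 == 0);
        record(t, multiple);
    }
}

/* Stand-in for "negate and solve": search the small domain for an input that
 * follows the first k outcomes of t and takes the opposite outcome at branch k. */
static bool solve_negated(const Trace *t, size_t k, int *out) {
    for (int x = DOMAIN_MIN; x <= DOMAIN_MAX; x++) {
        Trace cand;
        program_under_test(x, &cand);
        if (cand.len <= k) continue;
        bool same_prefix = true;
        for (size_t i = 0; i < k; i++)
            if (cand.taken[i] != t->taken[i]) same_prefix = false;
        if (same_prefix && cand.taken[k] != t->taken[k]) {
            *out = x;
            return true;
        }
    }
    return false;
}

/* Run one input, then recurse on every branch at index >= bound that can be flipped.
 * The bound keeps the search from re-flipping branches already explored. */
static int explore(int input, size_t bound) {
    Trace t;
    program_under_test(input, &t);
    int paths = 1;
    for (size_t k = bound; k < t.len; k++) {
        int next;
        if (solve_negated(&t, k, &next))
            paths += explore(next, k + 1);
    }
    return paths;
}

/* Number of distinct paths covered when starting from the given seed input. */
int dart_count_paths(int seed) {
    return explore(seed, 0);
}
```

Calling `dart_count_paths(0)` covers the three paths of the toy program. In a real tool the brute-force search is replaced by a constraint solver over the symbolic path constraint.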
SAGE @ Microsoft • #1 application for SMT solvers today (CPU usage) • 1st whitebox fuzzer for security testing • 200+ machines (since 2008) • 1 billion+ constraints • 100s of apps, 100s of security bugs • Example: Win7 file fuzzing • Found ~1/3 of all fuzzing bugs • Millions of dollars saved • for Microsoft + time/energy for the world
Compositional Test Generation • Systematically executing all feasible paths does not scale • Compositional Dynamic Test Generation • Compute summaries that can be reused later • Avoid retesting • Can provide the same path coverage exponentially faster!
Example of Function Summary
    int is_positive(int x) {
        if (x > 0) return 1;
        return 0;
    }
• Summary: (x > 0 ∧ ret = 1) ∨ (x ≤ 0 ∧ ret = 0) • Where ret denotes the value returned by the function is_positive
Function Summaries • Function summary for a function f • Logic formula over constraints • Derived by successive iterations and defined as a disjunction of formulas • Each disjunct pairs a conjunction of constraints on the inputs of f with a conjunction of constraints on the outputs of f • Can be computed automatically from the path constraint for the intraprocedural path
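One possible way to picture a summary as a disjunction of (input constraint, output constraint) pairs is sketched below. The `Disjunct` type, the predicate helpers, and the two disjuncts are illustrative assumptions. They encode the summary one would expect for is_positive, namely (x > 0 ∧ ret = 1) ∨ (x ≤ 0 ∧ ret = 0), and are not taken from any tool.

```c
#include <stdbool.h>
#include <stddef.h>

/* Function from the earlier example slide. */
int is_positive(int x) {
    if (x > 0) return 1;
    return 0;
}

/* One disjunct: a constraint on the input paired with a constraint on the output.
 * Hypothetical representation; real tools store logic formulas, not C callbacks. */
typedef struct {
    bool (*pre)(int x);
    bool (*post)(int ret);
} Disjunct;

static bool pre_positive(int x)     { return x > 0; }
static bool post_one(int ret)       { return ret == 1; }
static bool pre_non_positive(int x) { return x <= 0; }
static bool post_zero(int ret)      { return ret == 0; }

/* Assumed summary: (x > 0 && ret == 1) || (x <= 0 && ret == 0). */
static const Disjunct is_positive_summary[] = {
    { pre_positive,     post_one  },
    { pre_non_positive, post_zero },
};

/* True iff some disjunct of the summary admits the pair (x, ret). */
bool summary_allows(int x, int ret) {
    size_t n = sizeof is_positive_summary / sizeof is_positive_summary[0];
    for (size_t i = 0; i < n; i++)
        if (is_positive_summary[i].pre(x) && is_positive_summary[i].post(ret))
            return true;
    return false;
}
```

A caller that has this summary can reason about `is_positive(x)` through `summary_allows` instead of re-executing the function body, which is the reuse that compositional test generation relies on.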
Must Summaries • Symbolic execution of large programs is imprecise • Complex program statements • Calls to operating-system and library functions • Concrete values → simplified constraints • Under-approximate path constraints • Summaries become must summaries • Example: assume hash is a complex or unknown function, and that when g is invoked with y = 45, hash(45) = 987
    int g(int x, int y) {
        if ((x > 0) && (hash(y) > 10))
            return 1;
        return 0;
    }
• Under-approximate with a smaller precondition
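A small sketch of the under-approximation follows, under stated assumptions. The body of `hash` below is an invented stand-in; the only fact kept from the slide is hash(45) = 987. The must summary assumed here is "x > 0 ∧ y = 45 implies ret = 1", which is what fixing y to its concrete value would give.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the complex or unknown function.
 * Only hash(45) == 987 comes from the slide; the rest is arbitrary. */
static int hash(int y) {
    return (y == 45) ? 987 : (y % 20);
}

/* Function from the slide. */
int g(int x, int y) {
    if ((x > 0) && (hash(y) > 10))
        return 1;
    return 0;
}

/* Assumed must summary: precondition x > 0 && y == 45, postcondition ret == 1. */
bool must_pre(int x, int y) { return x > 0 && y == 45; }
bool must_post(int ret)     { return ret == 1; }

/* Bounded sanity check: every state in [lo, hi]^2 that satisfies the
 * precondition leads to a return value that satisfies the postcondition. */
bool must_summary_holds(int lo, int hi) {
    for (int x = lo; x <= hi; x++)
        for (int y = lo; y <= hi; y++)
            if (must_pre(x, y) && !must_post(g(x, y)))
                return false;
    return true;
}
```

With this stand-in, `g(3, 31)` also returns 1 although `must_pre(3, 31)` is false: the summary's precondition is smaller than the set of inputs that actually reach `return 1`, which is what makes it an under-approximation.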
Must Summaries • Defined as a quadruple ⟨lp, P, lq, Q⟩ where: • lp and lq are locations in the program Prog • P is the summary precondition holding at lp • Q is the summary postcondition holding at lq
Some Facts About Summaries • Time to be produced: weeks/months • Number of summaries: millions • Number of instructions executed between lp and lq: can be hundreds of thousands
Incremental Compositional Test Generation • Have to start from scratch if there is a small code change • Incremental compositional test generation • As in smart/selective regression testing • Reuse summaries still valid in new program • Recompute invalid summaries
Must Summary Checking • Given a valid must summary for a program and a new version of the program, is the summary still valid for the new version? • Intraprocedural summaries • locations lp and lq are in the same function f • function f does not return between lp and lq when the summary is generated
Some proposals • Naïve • For each summary, record executed instructions • Too expensive, ~100K instructions executed • Runtime overhead • Our proposal • Verify statically which summaries are still valid in order to reuse them • Less precise than recomputing summaries from scratch, but cheaper
Algorithms 1. Static Change Impact Analysis 2. Predicate-Sensitive Change Impact Analysis 3. Must Summary Validity Checking Analysis
Phase 1: Static Change Impact Analysis • Impact analysis of code changes in the control-flow and call graphs of the program • (Figure: locations lp and lq in the old program and in the new program)
Modified Instructions and Functions • Instruction i of a program Prog is modified if: • i is changed or deleted in Prog’ or • Its ordered set of immediate successors has changed • Function f in a program Prog is modified if f: • contains a modified instruction • calls a modified function • calls an unknown function
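The definition of a modified function is recursive (calling a modified function makes the caller modified), so it can be computed as a fixed point over the call graph. The sketch below is a minimal illustration under assumptions: the adjacency-matrix encoding, the fixed `NFUNCS`, and the function name `propagate_modified` are all invented for the example.

```c
#include <stdbool.h>

#define NFUNCS 5  /* hypothetical number of functions in the call graph */

/* calls[f][g] is true when function f calls function g.
 * On return, modified[f] is true when f contains a modified instruction,
 * calls a modified function (directly or transitively), or calls an
 * unknown function. */
void propagate_modified(bool calls[NFUNCS][NFUNCS],
                        const bool has_modified_instr[NFUNCS],
                        const bool unknown[NFUNCS],
                        bool modified[NFUNCS]) {
    for (int f = 0; f < NFUNCS; f++)
        modified[f] = has_modified_instr[f];

    /* Iterate until no new function becomes modified. */
    bool changed = true;
    while (changed) {
        changed = false;
        for (int f = 0; f < NFUNCS; f++) {
            if (modified[f]) continue;
            for (int g = 0; g < NFUNCS; g++) {
                if (calls[f][g] && (modified[g] || unknown[g])) {
                    modified[f] = true;
                    changed = true;
                    break;
                }
            }
        }
    }
}
```

The loop terminates because each pass either marks at least one more function or stops, so it runs at most `NFUNCS` passes. A worklist over reverse call edges would do the same job with less rescanning.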
Phase 1: Static Change Impact Analysis • Step 1: Construct call graph for the program
Phase 1: Static Change Impact Analysis • Steps 2 to 4: Find modified and unknown functions • Map summaries, construct control-flow graphs • Find indirectly modified and unknown functions • (Figure: call graph with nodes labeled M, U, IM, IU and S)
Phase 1: Static Change Impact Analysis • Step 5: Classify summaries as valid or invalid • (Figure: call graph with nodes labeled M, U, IM, IU and S)
Phase 2: Predicate-Sensitive Change Impact Analysis • Exploit the predicates P and Q in a summary • Example (old program), invalidated by Phase 1: summary with P: x > 0 ∧ y < 10 at lp and Q: w = 0 at lq • Code between lp and lq: if (x > 0) { if (y == 10) w = w + 1; else w = 0; } else w = 1;
Phase 2: Predicate-Sensitive Change Impact Analysis • Old program
    void foo() {
        ...
    lp: // P: x > 0 ∧ y < 10
        if (x > 0) {
            if (y == 10)
                w++;      // MODIFIED
            else
                w = 0;
        } else {
            w = 1;        // MODIFIED
        }
        ...
    lq: // Q: w = 0
        return;
    }
Phase 2: Predicate-Sensitive Change Impact Analysis • Instrumented old program
    void foo() {
        goto lp;                       // instrumentation (1)
        ...
    lp: // P: x > 0 ∧ y < 10
        assume P;                      // instrumentation (2)
        modified = false;
        if (x > 0) {
            if (y == 10) {
                modified = true;       // instrumentation (3)
                w++;
            } else
                w = 0;
        } else {
            modified = true;           // instrumentation (3)
            w = 1;
        }
    lq: // Q: w = 0
        assert(Q ⇒ ¬modified);         // instrumentation (4)
        ...
        return;
    }
Phase 2: Predicate-Sensitive Change Impact Analysis • Check that the assertion in the instrumented code cannot fail on any input • Verification-condition based program verifier • Create logic formula from program with assertions • Check formula validity using theorem prover • If valid, the assertion does not fail in any execution
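The instrumented foo example can be mimicked in plain C as a rough illustration. Two readings of the extracted slide text are assumed here: the precondition is P: x > 0 ∧ y < 10, and the assertion is Q ⇒ ¬modified (the connectives are lost in the extracted text). The sketch replaces the theorem prover with enumeration over a small range, so it is a bounded sanity check and not a proof; the function names and the `full_precondition` switch are invented for the example.

```c
#include <stdbool.h>

/* Runs the instrumented code from lp to lq for one initial state.
 * Returns false only when the assertion (Q implies not modified) fails.
 * States that violate P are skipped, which models "assume P". */
static bool assertion_holds(int x, int y, int w, bool full_precondition) {
    /* full_precondition selects P: x > 0 && y < 10; otherwise a weaker,
     * hypothetical P: x > 0 is used to show a failing check. */
    bool p = full_precondition ? (x > 0 && y < 10) : (x > 0);
    if (!p) return true;                 /* assume P */

    bool modified = false;
    if (x > 0) {
        if (y == 10) {
            modified = true;             /* modified instruction executed */
            w++;
        } else {
            w = 0;
        }
    } else {
        modified = true;                 /* modified instruction executed */
        w = 1;
    }

    bool q = (w == 0);                   /* Q: w = 0 */
    return !q || !modified;              /* assert(Q implies not modified) */
}

/* Bounded check over all x, y, w in [lo, hi]. */
bool phase2_bounded_check(int lo, int hi, bool full_precondition) {
    for (int x = lo; x <= hi; x++)
        for (int y = lo; y <= hi; y++)
            for (int w = lo; w <= hi; w++)
                if (!assertion_holds(x, y, w, full_precondition))
                    return false;
    return true;
}
```

With the full precondition no state reaches a modified instruction, so the check passes. With the weaker precondition the state x = 1, y = 10, w = -1 executes the modified `w++` and still ends with w = 0, so the check fails, which shows why the predicate P matters for this phase.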
Phase 3: Must Summary Validity Checking • Check must summary validity against some code, independently of code changes • Example, invalidated by Phase 1 and Phase 2: summary with P: x < 0 at lp and Q: r ≥ 0 at lq • Code between lp and lq in the old program: if (x < 0) { if (y < 0) r = 1; else r = 0; } • In the new program the assignment r = 0 becomes r = 4
Phase 3: Must Summary Validity Checking • New program
    void bar() {
        ...
    lp: // P: x < 0
        if (x < 0) {
            if (y < 0)
                r = 1;
            else {
                r = 4;    // r = 0 in old code
            }
        }
        ...
    lq: // Q: r ≥ 0
        return;
    }
Phase 3: Must Summary Validity Checking • Instrumented new program
    void bar() {
        reach_lq = false;              // instrumentation (1)
        goto lp;
        ...
    lp: // P: x < 0
        assume P;                      // instrumentation (2)
        if (x < 0) {
            if (y < 0)
                r = 1;
            else {
                r = 4;                 // r = 0 in old code
            }
        }
    lq: // Q: r ≥ 0
        assert(Q);                     // instrumentation (3)
        reach_lq = true;
        ...
        assert(reach_lq);              // instrumentation (4)
        return;
    }
Phase 3: Must Summary Validity Checking • Check that assertions hold in the instrumented program for all possible inputs
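The instrumented bar example can be mimicked the same way. The sketch assumes the summary reads P: x < 0 and Q: r ≥ 0 (the comparison in Q is lost in the extracted text), and it again substitutes bounded enumeration for the verifier. The `else_value` parameter is a hypothetical knob: 4 corresponds to the new code, 0 to the old code, and a negative value shows a change that would invalidate the summary.

```c
#include <stdbool.h>

/* Runs the instrumented code for one initial state (x, y, r).
 * Returns false when assert(Q) or assert(reach_lq) fails.
 * States that violate P are skipped, which models "assume P". */
static bool assertions_hold(int x, int y, int r, int else_value) {
    bool reach_lq = false;
    if (!(x < 0)) return true;           /* assume P: x < 0 */

    if (x < 0) {
        if (y < 0)
            r = 1;
        else
            r = else_value;              /* 4 in the new code, 0 in the old code */
    }

    /* location lq */
    if (!(r >= 0)) return false;         /* assert(Q), with Q: r >= 0 */
    reach_lq = true;

    return reach_lq;                     /* assert(reach_lq) before returning */
}

/* Bounded check over all x, y, r in [lo, hi]. */
bool phase3_bounded_check(int lo, int hi, int else_value) {
    for (int x = lo; x <= hi; x++)
        for (int y = lo; y <= hi; y++)
            for (int r = lo; r <= hi; r++)
                if (!assertions_hold(x, y, r, else_value))
                    return false;
    return true;
}
```

In this straight-line fragment `reach_lq` is trivially set. In real code the second assertion matters because a path starting at lp could return or diverge before reaching lq. Note that the check never compares old and new code, which matches the slide's point that this phase works independently of the code changes.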
Result • Validated summaries can be reused • Because of soundness • Invalidated summaries are discarded and need to be recomputed • New tests are generated to cover their preconditions • Algorithms can be used in isolation or in a pipeline
Implementation Details • Inputs: old DLL, new DLL, and summaries produced by SAGE • Map summaries, find modified insts and funcs (C++) • Vulcan: library to statically analyze Windows binaries • Phase 1: Change Impact, Phase 2: Predicate Sensitive, Phase 3: Validity Checking • Used in pipeline or isolation • Output: valid/invalid summaries
Implementation Details • Inputs: procedure (x86) and summary ⟨lp, P, lq, Q⟩ • Vulcan • Translator from x86 to BoogiePL • Sound translation • Instrumented BPL file (Phase 2 or Phase 3) • Boogie/Z3
Benchmarks • Image parsers embedded in Windows • ANI, GIF and JPEG • Ran SAGE to generate summaries (small sample) • 286 for ANI, 288 for GIF and 517 for JPEG • Identified the DLLs involved • 3 for ANI, 4 for GIF and 8 for JPEG • Compared old version against a randomly picked newer version • Delta ~1 to 3 years
Difference Between Program Versions • Modified functions: 3% - 10% • Indirectly modified functions: 30% - 45% • Indirectly unknown functions: 60% - 74% • Unknown functions: 27% - 37%
Applying Phases in Isolation • # validated summaries per phase (bar charts; legend: Phase 1: Change Impact, Phase 2: Predicate Sensitive, Phase 3: Validity Checking; bar values listed as extracted) • ANI: 31%, 58%, 85%; total validated: 256/286 (90%) • GIF: 30%, 69%, 92%; total validated: 274/288 (95%) • JPEG: 61%, 94%, 33%; total validated: 501/517 (97%)
Applying Phases in Pipeline Fashion: Phase 1 → Phase 2 → Phase 3 • # validated summaries per phase (Phase 1: Change Impact, Phase 2: Predicate Sensitive, Phase 3: Validity Checking) • ANI: Phase 1 58%, Phase 2 27%, Phase 3 4%; total validated: 256/286 (90%) • GIF: Phase 1 69%, Phase 2 25%, Phase 3 1%; total validated: 274/288 (95%) • JPEG: Phase 1 61%, Phase 2 35%, Phase 3 1%; total validated: 501/517 (97%)
Running Time (Isolation) • (Figure: three charts of running time in minutes for Phase 1: Change Impact, Phase 2: Predicate Sensitive and Phase 3: Validity Checking)
Running Time (Pipeline): Phase 1 → Phase 2 → Phase 3 • Running times shown on the chart (# minutes): 43 min, 28 min, 41 min • Preliminary results show that statically validating must summaries is up to 20 times faster than recomputing them!
Summary • Formulated the problem of statically validating must summaries • Described three approaches for validating must summaries • Presented a preliminary evaluation on three large Windows image parsers • Demonstrated the effectiveness of static must summary checking • Validated hundreds of must summaries in minutes
Questions?