Dynamically Discovering Likely Program Invariants to Support Program Evolution Presented By: Wes Toland, Geoff Gerfin Michael D. Ernst, Jake Cockrell, William G. Griswold, David Notkin
Outline • Introduction • Overview • Code Instrumentation • Data Trace Generation • Inferring Invariants • Uses of Invariants • Evaluation • Related Work • Limitations & Future Work • Discussion
Invariants • What are invariants? • A constraint over a variable’s values • A relationship between multiple variable values. • Defined as mathematical predicates (Example: n >= 0)
Importance of Invariants • In program development: • Refining a specification • Aid in runtime checking • In software evolution: • Help programmers understand the functionality of an undocumented program so that they do not make incorrect assumptions. • A violated invariant can indicate a bug.
Daikon • Programmers do not usually explicitly annotate or document code with invariants. • Daikon proposes to automatically determine program invariants and report them in a meaningful manner.
Daikon’s Infrastructure: Original Program i,s := 0,0; do i != n -> i,s := i + 1, s + b[i] od
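For readers unfamiliar with the guarded-command notation, the loop above can be sketched in Python. This assumes n is the length of b; the function name array_sum is ours, not part of Daikon or the paper.

```python
from typing import List

def array_sum(b: List[int]) -> int:
    """Python rendering of: i,s := 0,0; do i != n -> i,s := i + 1, s + b[i] od"""
    n = len(b)  # assumption: n is the length of b
    i, s = 0, 0
    while i != n:               # the guard of the do ... od loop
        i, s = i + 1, s + b[i]  # simultaneous assignment: b[i] uses the old i
    return s

print(array_sum([3, 1, 4]))  # 8
```

Python's tuple assignment evaluates the whole right-hand side before assigning, which matches the simultaneous-assignment semantics of the original.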
Daikon’s Infrastructure: Instrumented Program print b, n; i,s := 0,0; do i != n -> print i, s, n, b[i]; i,s := i + 1, s + b[i] od
Daikon’s Infrastructure: Trace File print b, n; i,s := 0,0; do i != n -> print i, s, n, b[i]; i,s := i + 1, s + b[i] od
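A minimal sketch of what the instrumented loop records, assuming trace records are collected in an in-memory list of dicts rather than written to a file. The record layout and the "point" labels are illustrative, not Daikon's actual trace format.

```python
from typing import Dict, List

def instrumented_sum(b: List[int], trace: List[Dict[str, object]]) -> int:
    n = len(b)
    trace.append({"point": "ENTER", "b": list(b), "n": n})  # print b, n
    i, s = 0, 0
    while i != n:
        # print i, s, n, b[i] at the loop head
        trace.append({"point": "LOOP", "i": i, "s": s, "n": n, "b[i]": b[i]})
        i, s = i + 1, s + b[i]
    return s

trace: List[Dict[str, object]] = []
instrumented_sum([4, 5], trace)
for record in trace:
    print(record)
# {'point': 'ENTER', 'b': [4, 5], 'n': 2}
# {'point': 'LOOP', 'i': 0, 's': 0, 'n': 2, 'b[i]': 4}
# {'point': 'LOOP', 'i': 1, 's': 4, 'n': 2, 'b[i]': 5}
```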
Daikon’s Infrastructure: Invariants Determined Invariants 1.) n >= 0 2.) s = SUM(b) 3.) i >= 0
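These invariants can be checked against samples from the summing loop with a short sketch. It assumes a small hand-picked test suite and separates the loop-head form s = SUM(b[0..i-1]) from the exit form s = SUM(b); the helper names run and check are hypothetical. A candidate is reported only if no observed sample falsifies it, so the result describes this suite, not every possible input.

```python
from typing import Dict, List, Tuple

def run(b: List[int]) -> Tuple[List[Tuple[int, int, int]], int]:
    """Execute the summing loop, recording (i, s, n) at each loop head."""
    n = len(b)
    heads: List[Tuple[int, int, int]] = []
    i, s = 0, 0
    while i != n:
        heads.append((i, s, n))
        i, s = i + 1, s + b[i]
    return heads, s

def check(test_suite: List[List[int]]) -> Dict[str, bool]:
    """Report which candidate invariants held on every sample of the suite."""
    held = {"n >= 0": True, "i >= 0": True,
            "s = SUM(b[0..i-1]) at loop head": True, "s = SUM(b) at exit": True}
    for b in test_suite:
        heads, final_s = run(b)
        for i, s, n in heads:
            held["n >= 0"] = held["n >= 0"] and n >= 0
            held["i >= 0"] = held["i >= 0"] and i >= 0
            held["s = SUM(b[0..i-1]) at loop head"] = (
                held["s = SUM(b[0..i-1]) at loop head"] and s == sum(b[:i]))
        held["s = SUM(b) at exit"] = held["s = SUM(b) at exit"] and final_s == sum(b)
    return held

print(check([[1, 2, 3], [-4, 7], []]))
```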
Code Instrumentation (2/6) • Daikon’s front-end modifies source code to trace specific variables at points of interest: • Function entry points (preconditions) • Function exit points (postconditions) • Loop heads (loop invariants) • The trace data is the input to Daikon’s back-end, which infers invariants.
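Daikon's front-end rewrites source code; as a rough Python analogue (a swapped-in technique, not Daikon's), a decorator can record values at function entry and exit. Loop heads would still need real source rewriting. The trace_points decorator, the global TRACE list, and the example function clamp are hypothetical; the ":::ENTER"/":::EXIT" labels are modeled loosely on Daikon's program-point names.

```python
import functools
from typing import Any, Callable, Dict, List

TRACE: List[Dict[str, Any]] = []  # hypothetical in-memory trace

def trace_points(func: Callable) -> Callable:
    """Record argument values at entry (preconditions) and exit (postconditions)."""
    @functools.wraps(func)
    def wrapper(*args: Any) -> Any:
        TRACE.append({"point": f"{func.__name__}:::ENTER", "args": list(args)})
        result = func(*args)
        TRACE.append({"point": f"{func.__name__}:::EXIT",
                      "args": list(args), "return": result})
        return result
    return wrapper

@trace_points
def clamp(x: int, lo: int, hi: int) -> int:
    return max(lo, min(x, hi))

clamp(15, 0, 10)
for entry in TRACE:
    print(entry)
```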
Code Instrumentation (3/6) • Daikon uses an abstract syntax tree for code instrumentation. • What is an AST?
Code Instrumentation (4/6) How could this be useful for code instrumentation?
Code Instrumentation (5/6) • Daikon uses the AST to determine which variables are in scope at each point of interest. • At each program point, it inserts code that writes the values of all variables in scope to a trace file in a fixed format.
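To illustrate the idea on a language with a readily available parser, the sketch below uses Python's standard ast module to list the variables visible in one function. Daikon's own front-ends targeted C, Java, and Lisp; names_in_scope is a hypothetical helper, and it is deliberately simplified (it ignores globals and closures and does not treat nested functions specially).

```python
import ast
from typing import List

def names_in_scope(source: str, func_name: str) -> List[str]:
    """List the parameters and locally assigned names of one function."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            names = [a.arg for a in node.args.args]  # parameters come first
            for sub in ast.walk(node):
                # Names in a Store context are assigned in the function body.
                if isinstance(sub, ast.Name) and isinstance(sub.ctx, ast.Store):
                    if sub.id not in names:
                        names.append(sub.id)
            return names
    raise ValueError(f"no function named {func_name!r}")

SOURCE = """
def array_sum(b, n):
    i, s = 0, 0
    while i != n:
        i, s = i + 1, s + b[i]
    return s
"""
print(names_in_scope(SOURCE, "array_sum"))  # ['b', 'n', 'i', 's']
```

An instrumenter would then emit code at each program point that writes these names' values to the trace.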
Code Instrumentation (6/6) • Status variables are created for each original program variable and passed along through function calls. • Status variables: • Modification timestamp (used to avoid printing uninitialized, garbage values) • Smallest and largest indices (for arrays and pointers) • Linked list flag • A status variable is updated whenever the program manipulates its associated variable.
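A sketch of the status-variable idea for arrays, assuming a wrapper class stands in for the extra variables the instrumenter would create. TracedArray, its fields, and the global logical clock are hypothetical; the linked-list flag is omitted.

```python
from itertools import count
from typing import List, Optional

_clock = count(1)  # global logical clock for modification timestamps

class TracedArray:
    """Hypothetical array wrapper carrying Daikon-style status information."""

    def __init__(self, size: int) -> None:
        self.data: List[int] = [0] * size
        self.modified_at: Optional[int] = None  # None: never written, so don't print
        self.min_index: Optional[int] = None    # smallest index written so far
        self.max_index: Optional[int] = None    # largest index written so far

    def __setitem__(self, index: int, value: int) -> None:
        # Every write updates the status variables along with the data.
        self.data[index] = value
        self.modified_at = next(_clock)
        self.min_index = index if self.min_index is None else min(self.min_index, index)
        self.max_index = index if self.max_index is None else max(self.max_index, index)

    def printable_part(self) -> List[int]:
        """Elements between the smallest and largest written indices."""
        if self.min_index is None or self.max_index is None:
            return []
        return self.data[self.min_index:self.max_index + 1]

a = TracedArray(10)
a[2] = 7
a[4] = 9
print(a.min_index, a.max_index, a.printable_part())  # 2 4 [7, 0, 9]
```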
Data Trace Generation (2/2) • Running the instrumented code populates the data trace DB: print b, n; i,s := 0,0; do i != n -> print i, s, n, b[i]; i,s := i + 1, s + b[i] od
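Conceptually, the trace DB groups samples by program point so the back-end can examine each point independently. A minimal sketch, assuming records arrive as (program point, variable values) pairs; build_trace_db and the sample records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, Dict[str, int]]

def build_trace_db(records: List[Record]) -> Dict[str, List[Dict[str, int]]]:
    """Group trace records by program point; each point gets its own sample table."""
    db: Dict[str, List[Dict[str, int]]] = defaultdict(list)
    for point, values in records:
        db[point].append(values)
    return dict(db)

records: List[Record] = [
    ("ENTER", {"n": 2}),
    ("LOOP", {"i": 0, "s": 0, "n": 2}),
    ("LOOP", {"i": 1, "s": 4, "n": 2}),
]
db = build_trace_db(records)
print(sorted(db))       # ['ENTER', 'LOOP']
print(len(db["LOOP"]))  # 2
```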
Types of Invariants (3/3) • Single-sequence variables: • Range (min and max values) • Ordering (increasing or decreasing) • Invariants over all elements (given array[size], all elements >= c) • Two-sequence variables: • Linear relationship (y[i] = a*x[i] + b for every i) • Comparison (lexicographic x < y, e.g., when x[i] = y[i] - 1) • Reversal (x[i] = y[length(y) - 1 - i] for 0 <= i < length(y)) • Sequence and numeric variables: • Membership (i ∈ s)
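Each of these sequence invariants reduces to a simple check over observed values. A minimal sketch: the function names are hypothetical, linear_fit solves for a and b from two points with distinct x values and then verifies every element (using exact fractions), and Python's built-in list comparison supplies the lexicographic order.

```python
from fractions import Fraction
from typing import List, Optional, Tuple

def is_nondecreasing(xs: List[int]) -> bool:
    """Ordering invariant over a single sequence."""
    return all(a <= b for a, b in zip(xs, xs[1:]))

def linear_fit(x: List[int], y: List[int]) -> Optional[Tuple[Fraction, Fraction]]:
    """Return (a, b) if y[i] == a*x[i] + b for every i, else None."""
    if len(x) != len(y) or not x:
        return None
    j = next((k for k in range(len(x)) if x[k] != x[0]), None)
    if j is None:
        return None  # all x values equal: the slope is undetermined
    a = Fraction(y[j] - y[0], x[j] - x[0])
    b = y[0] - a * x[0]
    return (a, b) if all(yi == a * xi + b for xi, yi in zip(x, y)) else None

def is_reversal(x: List[int], y: List[int]) -> bool:
    """x[i] == y[len(y) - 1 - i] for every i."""
    return len(x) == len(y) and all(x[i] == y[len(y) - 1 - i] for i in range(len(y)))

x, y = [1, 2, 3], [2, 3, 4]
print(x < y)                              # True: lexicographic comparison
print(linear_fit(x, y))                   # (Fraction(1, 1), Fraction(1, 1))
print(is_reversal([3, 2, 1], [1, 2, 3]))  # True
print(2 in y)                             # True: membership
```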
Inferring Invariants (1/5) What invariants should be inferred from this method, regardless of the test suite input?
Inferring Invariants (3/5) • Daikon can identify from this trace that for all samples, x = orig(x).
Inferring Invariants (4/5) • Daikon can identify from this trace that for all samples, y = orig(y) = 1.
Inferring Invariants (5/5) • Daikon can identify from this trace that for all samples, *x = orig(*x) + 1. • Is this invariant too limited?
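The falsification idea behind these slides can be sketched in Python. The method from the slides is not reproduced here, so the sketch assumes a hypothetical inc that adds y to *x, models *x as a one-element list, and uses a hand-picked candidate list. With a suite in which y is always 1, the narrower invariant *x = orig(*x) + 1 survives alongside the general one; a single test with y = 2 falsifies it, which is one answer to the question above.

```python
from typing import Callable, Dict, List

Sample = Dict[str, int]  # entry values ("orig_" prefix) and exit values

def inc(cell: List[int], y: int) -> None:
    """Hypothetical method under test: *x += y (cell[0] plays the role of *x)."""
    cell[0] += y

CANDIDATES: Dict[str, Callable[[Sample], bool]] = {
    "y = orig(y)": lambda s: s["y"] == s["orig_y"],
    "y = 1": lambda s: s["y"] == 1,
    "*x = orig(*x) + 1": lambda s: s["*x"] == s["orig_*x"] + 1,
    "*x = orig(*x) + y": lambda s: s["*x"] == s["orig_*x"] + s["y"],
}

def observe(start: int, y: int) -> Sample:
    """Run inc once and capture its entry and exit values."""
    cell = [start]
    before = {"orig_*x": cell[0], "orig_y": y}
    inc(cell, y)
    return {**before, "*x": cell[0], "y": y}

def surviving(samples: List[Sample]) -> List[str]:
    """Keep only the candidates that no sample falsifies."""
    return [name for name, pred in CANDIDATES.items()
            if all(pred(s) for s in samples)]

suite = [observe(v, 1) for v in (0, 5, -3)]
print(surviving(suite))
# ['y = orig(y)', 'y = 1', '*x = orig(*x) + 1', '*x = orig(*x) + y']
print(surviving(suite + [observe(4, 2)]))
# ['y = orig(y)', '*x = orig(*x) + y']
```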
Uses of Invariants (1/2) • Explicated Data Structures • Describe undocumented data structures without reading through the code. • Confirmed and contradicted expectations • Check a programmer's understanding of what the code does. • Example: It may appear that x is always less than y; Daikon can confirm or contradict this over the test runs (assuming an adequate test suite). • Bug Discovery
Uses of Invariants (2/2) • Identify limited use of procedures • Reveal procedures that are only ever called with a restricted set of inputs, suggesting that some of their generality is unnecessary. • Demonstrate test suite inadequacy • Reveal gaps in the test suite, such as branches that are never exercised, by analyzing Daikon’s output. • Validate program changes • When a piece of code has been heavily modified but should still meet its original specification, compare the invariants of the old and new versions. • If they match, the programmer can be more confident that the modifications had no adverse effects.
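Comparing invariants across versions is essentially a set difference. A minimal sketch, assuming each version's invariants have already been rendered as strings; diff_invariants and the sample invariant sets are hypothetical.

```python
from typing import Dict, List, Set

def diff_invariants(before: Set[str], after: Set[str]) -> Dict[str, List[str]]:
    """Report invariants lost or gained between two program versions."""
    return {"lost": sorted(before - after), "gained": sorted(after - before)}

old = {"n >= 0", "s = SUM(b)", "i >= 0"}
new = {"n >= 0", "i >= 0"}
print(diff_invariants(old, new))  # {'lost': ['s = SUM(b)'], 'gained': []}
```

A lost invariant such as s = SUM(b) would flag a change in behavior worth inspecting; an empty diff only means the two versions agree on this test suite.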
Evaluation Overview • Asserting Daikon’s Invariant Detection • Performance Evaluation • Stability Evaluation
Asserting Daikon’s Invariant Detection • Simple accuracy evaluation of Daikon • A sample program was taken from The Science of Programming • The book is a “gold standard” for invariant identification • The program had documented precondition, postcondition, and loop invariant specifications • Daikon reproduced all documented specifications plus some additional invariants: • Invariants erroneously omitted from the documentation • Information about the test suite • Extraneous (redundant) invariants
Performance Evaluation • The Siemens replace program was run with varying numbers of test cases and variables. • Most important factor: the number of variables over which invariants are checked • This is not the total number of program variables but the number of variables in a program point’s scope. • Invariant detection time grows quadratically with this factor. • Invariant detection time also grows linearly with test suite size.
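One illustrative way to see both growth rates: if binary invariants are checked over every pair of in-scope variables, there are C(v, 2) = v(v - 1)/2 pairs, which is quadratic in v, and each pair is checked against every sample, which is linear in test suite size. This cost model and pairs_checked are assumptions for illustration, not Daikon's exact cost.

```python
from math import comb

def pairs_checked(num_vars: int, num_samples: int) -> int:
    """Illustrative cost model: every variable pair checked against every sample."""
    return comb(num_vars, 2) * num_samples

for v in (10, 20, 40):
    print(v, pairs_checked(v, 1000))
# 10 45000
# 20 190000
# 40 780000
```

Doubling the number of variables roughly quadruples the work, while doubling the number of samples exactly doubles it.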
Stability Evaluation • The number of test cases affects different types of invariants in different ways: • The number of identical unary invariants changes little as the number of test cases increases. • However, the number of differing unary invariants varies widely.
Related Work (1/2) • Static Approaches to Inferring Invariants • Operate on program text, not test runs (e.g., symbolic execution) [Hoare69]. • Advantages • Reported invariants are true for any program run (but not necessarily exhaustive). • In theory, static approaches can detect all sound invariants if the analysis is run to convergence. • Limitations • Omit properties that are true but uncomputable. • Pointer manipulation is very difficult to approximate conservatively.
Related Work (2/2) • Dynamic Approaches to Inferring Invariants • Event traces [Blum93]. • Uses a state machine instead of AST. • Advantage: Lower data storage requirements. • Runtime switches based on user-inserted assert statements (Value Profiling) [Hanson93].
Limitations (1/2) • Accuracy of inferred invariants depends on the quality and completeness of the test cases • Additional test cases could provide data that leads Daikon to infer additional invariants. • Additionally, inferred invariants may hold only for the cases in the test suite. • Daikon can produce gigabytes of trace data, even for trivial programs. • The initial prototype implementation ran out of memory when testing 5,542 test cases.
Limitations (2/2) • The instrumenter, and therefore Daikon, is currently limited to C, Java, and Lisp. • Daikon does not yet follow arbitrary-length paths through recursive structures. • Daikon cannot compute invariants such as linear relationships over many variables (more than 3). • Instrumenting the program by modifying object code (as opposed to source code) would allow for improved precision and portability. • Exact memory locations could be traced. • However, this approach poses many more obstacles.
Future Work (1/2) • Ernst et al. planned to improve relevance and performance after this work by: • Reducing redundant invariants. • Removing relations between variables that can be statically proven to be unrelated. • Ignoring variables that have not been assigned since their last instrumentation point. • Converting the implementation of Daikon from Python to C. • Checking fewer invariants (useful when the programmer wants to focus on a specific part of the code)
Future Work (2/2) • Since paper publication: • Additional front-end support: • 2002: Perl (dfepl front-end implementation) • 2005: C++ (Kvasir front-end implementation) • 2003: Various performance improvements: • Handle data trace files incrementally • Original implementation stored entire trace file in memory • 2005: IDE Plug-in support for Visual Studio