240 likes | 254 Views
Ditto is a tool that speeds up data structure invariant checks in Java programs, allowing for automatic and efficient debugging of large-scale web applications. It incrementally checks for changes in the data structure and optimizes the rerun of invariant checks.
E N D
Ditto:Speeding Up Runtime Data Structure Invariant Checks AJ Shankar and Ras Bodik UC Berkeley
Motivation: A Debugging Scenario • Buggy program: a large-scale web application in Java • Primary data structure: hashMap of shopping carts • Carts are modified throughout code • Bug: hashMap acting weird: carts disappearing, etc. • Hypothesis: cart modification violates hashCode() invariance
How to Check the Hypothesis? • Debugger facilities inadequate • Idea: write a runtime check • Iterates over buckets, checks hashCode() of each cart in bucket • Run check frequently to pinpoint error
Problem • The check is slow! (100x slowdown) • Rerunning the program is now a problem • Furthermore, what if bug isn’t reproducible? • Run the program with the check on entire test suite? • Infeasible.
Our Tool: Ditto • Ditto speeds up data structure invariant checks • Usually asymptotically in size of data structure • Hash table: 10x speedup at 1600 elements • What invariant checks can Ditto handle? • Side-effect-free: cannot return fresh mutable objects • Recursive: not an inherent limitation of algorithm
Basic Observation: Incrementalize • Invariant checks the entire data structure … • … but once checked, a local change can be (re)checked locally! • So, first establish invariant, then incrementally check changes … … … “Hash code of each cart in table corresponds to containing bucket.” …
A New Domain • Existing incrementalizers: general purpose but not automatic [Acar PLDI 2006] • User must annotate the program • For functional programs • Other caveats (conversion to CPS, etc.) • Ditto is automatic in this domain • Functional invariant checks in an imperative Java setting • No user annotations • Allows arbitrary heap updates outside the invariant • A simple bytecode-to-bytecode implementation
Ditto Algorithm Overview • First run of check: construct graph of the computation • Stores function calls, concrete inputs • Track changes to computation inputs • Subsequent runs of check: rerun only subcomputations with changed inputs • Incrementally update computation graph = incrementally compute invariant check
Example Invariant Check • Ensures a tree is locally ordered booleanisOrdered(Tree t) { if (t == null) return true; if (t.left != null && t.left .value >= t.value) return false; if (t.right != null && t.right.value <= t.value) return false; return isOrdered(t.left) && isOrdered(t.right); }
1. Constructing a Computation Graph • Purpose of computation graph: • For unchanged parts of data structure, reuse existing results • For changed parts, identify parts of check that need to be rerun • Graph stores the initial check run: • Node = function invocation, along with its • Concrete formal arguments • Concrete heap accesses • Return value Same inputs = can reuse return val Changed inputs = must rerun Inputs
1. Constructing a Computation Graph • During first check run, by instrumentation The Heap Node created with concrete formal arg A P isOrdered(P) Returns true A isOrdered(A) Calls children B C isOrdered(B) isOrdered(C) Heap reads from a.value, a.left, a.right, a.left.value, a.right.value are remembered
2. Detecting Changed Inputs • Inputs to check that could change between runs: • Arguments – easy to detect (passed to the check) • Heap values – harder (could be modified anywhere in code) • Selective write barriers • Statically determine which fields are read in the check • Barriers collect changed heap inputs used by check • In example: add write barriers for all writes into fields: • Tree.left • Tree.right • Tree.value if (t == null) return true; if (t.left != null && t.left.value >= t.value) return false; if (t.right != null && t.right.value <= t.value) return false; return isOrdered(t.left) && isOrdered(t.right);
3. Rerunning the Invariant isOrdered() • Data structure modification: Add node N, remove node F … … A A B C N C … … … B … … … D D E F … … E F G … … … … G … …
3. Rerunning the Invariant true isOrdered(A) … … A A B C N C • Goal: Incrementally update computation graph • Graph must look as if check was run afresh … … … B … … … D D E F … E F G Write barriers say… … … … … G … … Tree With New Modifications Computation Graph From Last Run
3. Rerunning the Invariant true … … A A C N C N • isOrdered(A) is first node that needs to be rerun • Parent inputs haven’t changed (functions are side-effect-free) • Rerunning exposes new node N • What happens at isOrdered(B)? … … B B … … … … D D E F E F … … … G G … … … …
3. Rerunning the Invariant true … … A A C N C N • isOrdered(B) has same formal args, heap inputs • We’d like to reuse its previous result • And end this subcomputation • Problem: isOrdered(B) also depends on return values of its callees • Which might change, since isOrdered(D) will be rerun • So we can’t be sure isOrdered(B)’s result will be the same! … … B B … … … … D D E F E F … … … G G … … … …
Optimistic Memoization • Don’t want to rerun all nodes between B and D • Solution: we optimistically assume that isOrdered(B) will return the same result • Invariant checks generally do! (e.g. “success”) • Check assumption when we rerun isOrdered(D) • For now, reuse previous result, finish up A • A returns previous result (true), so finished here … A N C … B … … D E F … … G … …
3. Rerunning the Invariant • Now we rerun isOrdered(D) • Reuse previous result of isOrdered(E), (G) • No further changes so no need for optimism • isOrdered(F) pruned from graph • isOrdered(D) returns previous result (true) • So optimistic assumption was correct • Computations around isOrdered(A) all correct … A N C … B … … D E F … … G … …
What If isOrdered(D) Returned false? false • Result propagated up graph • Continues as long as return val differs • In this case, root node of graph is reached • Result for entire computation is changed • Automatically corrects optimistic assumptions false A false N false B false false false D E … … G … …
Result of Algorithm true … … A A C N C N • We’ve incrementally updated computation graph to reflect updated data structure • Even with circular dependencies throughout graph, only reran 3 nodes • Result of computation is result of root node (true) • Graph is ready for next incremental update … … B B … … … … D D E F E … … … G G … … … …
Evaluation • Ran on a number of common data structure invariants, two real-world examples • Most complex invariant: red-black trees • Tree is globally ordered • Same # of black nodes to leaf • Other RB properties (Black follows Red, etc.) • We were unable to incrementalize this check by hand!
Real-world Examples • Tetris-like game Netcols • Invariant: no “floating” jewels in grid • With check, main event loop ran at 80ms, noticeably laggy • Result: event loop to 15ms with Ditto • JavaScript obfuscator • Invariant: no excluded keywords (based on a set of criteria) in renaming map
Summary • Results: • Automatic incrementalization made practical • For checks in Java programs • Data structure checks viable for development environment • Made possible by • Selection of an interesting domain • Optimistic memoization • Web: http://www.cs.berkeley.edu/~aj/cs/ditto/