Abstract Interpretation and Future Program Analysis Problems

Martin Rinard, Alexandru Salcianu
Laboratory for Computer Science, Massachusetts Institute of Technology

Abstract Interpretation: The Early Years
• Formal connection between
  • Sound analysis of program
  • Execution of program
• Broader impact
  • Insight that analysis is execution
  • Reduced need to think of analysis as reasoning about all possible executions!
• Good fit with analysis problems of that era
  • Properties of local variables
  • Within single procedure

How Is Abstract Interpretation Holding Up?
• Technical result as relevant as ever
• Moore’s Law effects
  • Much more computing power for analysis
  • More complex programs
• Ambitious analyses
  • Heap properties
  • Multiple threads
  • Interprocedural partial program analyses
• Stretch intuitive vision of analysis as execution

Outline
• Combined pointer and escape analysis
  • Rationale behind design decisions
  • Alternative choices in design space
• Challenges and predictions
• Bigger picture

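Goal of Pointer Analysis
• Characterize objects to which pointers point
• Synthesize finite set of object representatives
• Derive representative(s) each pointer points to
[Figure: the statement r = p.f, with p, its field f, and r drawn over color-coded object representatives]
“p.f points to objects of one representative, so after the execution of r = p.f, r may point to an object of that representative, but not to an object of any other representative”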

Our Pointer Analysis Goals
• Accurate for multithreaded programs
• Compositional, partial program analysis
  • Analyze each procedure once
  • Independently of callers
  • May skip analysis of invoked procedures
• Why?
  • Parts of program unavailable (different language, not written yet)
  • Parts may be irrelevant for desired result

Analysis Abstraction
Basic abstraction is the points-to graph
• Nodes represent objects in heap
• Edges represent references in heap
[Figure: example points-to graph with labels p, q, u and edges labeled f]

Two Kinds of Edges
• Inside edges (solid): represent references created inside analyzed part of program
• Outside edges (dashed): represent references created outside analyzed part of program
[Figure: example points-to graph with labels p, q, u and edges labeled f]

Two Kinds of Nodes
• Inside nodes (solid): represent objects created inside analyzed part of program
• Outside nodes (dashed): represent objects
  • Created outside analyzed part of program, or
  • Accessed via edges created outside analyzed part of program
[Figure: example points-to graph with labels p, q, u and edges labeled f]
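The two kinds of nodes and edges can be written down as a small data structure. The following is only a sketch: the names `Node`, `PointsToGraph`, `add_edge`, and `targets`, and the node name `load@s.f`, are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    name: str
    inside: bool  # True: created in analyzed code; False: outside node


@dataclass
class PointsToGraph:
    # Each edge is a (source node, field name, target node) triple.
    inside_edges: set = field(default_factory=set)
    outside_edges: set = field(default_factory=set)

    def add_edge(self, src, fld, dst, inside):
        """Record a reference src.fld -> dst as an inside or outside edge."""
        (self.inside_edges if inside else self.outside_edges).add((src, fld, dst))

    def targets(self, src, fld):
        """All nodes that src.fld may point to, over both edge kinds."""
        edges = self.inside_edges | self.outside_edges
        return {d for (s, f, d) in edges if s == src and f == fld}


# Hypothetical example: p.f may refer to an object created by the analyzed
# code (inside edge) or to one written by unanalyzed code (outside edge).
p = Node("p", inside=False)
r = Node("r", inside=True)
load = Node("load@s.f", inside=False)
g = PointsToGraph()
g.add_edge(p, "f", r, inside=True)
g.add_edge(p, "f", load, inside=False)
```

Keeping the two edge sets separate is what later lets an analysis treat references it created itself differently from references it merely assumes.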

Key Question
What does the heap look like when the procedure begins its execution?
• Previous algorithms analyzed callers before callees, so a model of the heap was always available
• Unfortunately, this approach requires analysis of the entire program in top-down fashion
• Our solution: use the code to reconstruct what the (accessed part of the) heap must look like

Analysis In Example
    m(p, q) {
      r = new C();
      p.f = r;
      s = q;
      do
        s = s.f;
      until (s = null);
      t = new C();
      s.f = t;
      u = new C();
    }
[Figure: initial points-to graph with labels p and q]

Analysis In Example (continued)
[Figures: the points-to graph for m(p, q) grows step by step: first labels p, r, q; then an f edge; then the label s; then a second f edge]

Analysis In Example (continued)
[Figure: points-to graph for m(p, q) with labels p, r, q, s and three f edges]
• One option: continue to expand the graph
• But the analysis may never terminate…

Analysis In Example (continued)
[Figure: points-to graph for m(p, q) with labels p, r, q, s and three f edges]
• Instead, have one outside node per load statement
• It represents all objects loaded at that statement
• This bounds the graph and guarantees termination
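The effect of using one outside node per load statement can be tried out on the example procedure. The sketch below is not the authors' algorithm: it hard-codes the statements of m(p, q), treats every store as a weak update, and adds an outside edge only when the base of a load is itself an outside node (a simplification of the escape condition). The node-naming scheme (`param:`, `inside:`, `load:`) and the function `analyze_m` are assumptions made for illustration.

```python
def analyze_m():
    """Analyze the body of m(p, q); returns (env, inside, outside, iterations)."""
    # env maps each variable to the set of node names it may point to.
    env = {"p": {"param:p"}, "q": {"param:q"}}
    inside, outside = set(), set()  # edges are (src, field, dst) triples

    def load(base_nodes, fld, label):
        """Targets of a load; all outside edges lead to the statement's one load node."""
        targets = set()
        for b in base_nodes:
            targets |= {d for (s, f, d) in inside if s == b and f == fld}
            if b.startswith(("param:", "load:")):  # simplification: outside nodes only
                node = "load:" + label
                outside.add((b, fld, node))
                targets.add(node)
        return targets

    def store(base, fld, src):
        """Weak update: add an inside edge for every base/source pair."""
        inside.update((b, fld, s) for b in env[base] for s in env[src])

    env["r"] = {"inside:r"}        # r = new C()
    store("p", "f", "r")           # p.f = r
    env["s"] = set(env["q"])       # s = q

    # do s = s.f; until (s = null): iterate to a fixed point at the loop head.
    head = set(env["s"])
    iterations = 0
    while True:
        iterations += 1
        after_body = load(head, "f", "s=s.f")
        new_head = head | after_body
        if new_head == head:       # no new nodes: the graph is bounded
            break
        head = new_head
    env["s"] = after_body          # value of s when the loop exits

    env["t"] = {"inside:t"}        # t = new C()
    store("s", "f", "t")           # s.f = t
    env["u"] = {"inside:u"}        # u = new C()
    return env, inside, outside, iterations


env, inside, outside, iterations = analyze_m()
```

Because every load at `s = s.f` maps to the same node, the loop adds at most one outside node and a self edge on it, so the fixed point is reached after two passes in this sketch.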

Consequences of This Decision
• Multiple objects represented by a single node (load node in loop)
• But can also have a single object represented by multiple nodes in the graph (!!) (object loaded at multiple statements)
    do a = q.f; until (a = null);
    do b = q.f; until (b = null);
[Figure: points-to graph in which q has f edges to more than one load node]

Consequences of This Decision
• Form of points-to graph depends on program
• Programs with identical behavior but different graphs…
    do s = s.f; until (s = null);
    s = s.f; while (s != null) s = s.f
[Figures: two different points-to graphs over labels p, r, q, s, one for each of the two loops above]

Analysis In Example (continued)
[Figures: the points-to graph for m(p, q) gains the label t, then a fourth f edge, then the label u as the remaining statements are analyzed]

What Does Result Tell Us?
• Nodes (outside)
  • Created outside analyzed part of program
  • Incomplete information
• Nodes (inside, escaped)
  • Created inside analyzed part of program
  • But reachable from unanalyzed part of program
  • Incomplete information
• Nodes (inside, captured)
  • Created inside analyzed part of program
  • Unreachable from unanalyzed part of program
  • Complete information about referencing relationships!
[Figure: final points-to graph for m(p, q) with labels p, r, q, s, t, u and four f edges]

Crucial Distinction
• Escaped vs. captured
• Enables analysis to identify regions of heap where it has complete information
• Crucial for both
  • Accuracy of analysis
  • Effective use of analysis results
[Figure: final points-to graph for m(p, q) with labels p, r, q, s, t, u and four f edges]
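The escaped/captured distinction amounts to a reachability computation over the points-to graph. The sketch below assumes that a node escapes only by being reachable from an outside node along points-to edges; other escape routes (static fields, return values, thread objects) are ignored, and the function name `classify` and the short node names are hypothetical.

```python
def classify(nodes, edges, outside_nodes):
    """Label each node 'outside', 'escaped', or 'captured'.

    Simplifying assumption: a node escapes only by being reachable from an
    outside node along (src, field, dst) points-to edges.
    """
    reachable = set(outside_nodes)
    worklist = list(outside_nodes)
    while worklist:
        n = worklist.pop()
        for (src, _field, dst) in edges:
            if src == n and dst not in reachable:
                reachable.add(dst)
                worklist.append(dst)

    labels = {}
    for n in nodes:
        if n in outside_nodes:
            labels[n] = "outside"
        elif n in reachable:
            labels[n] = "escaped"
        else:
            labels[n] = "captured"
    return labels


# Graph for the running example m(p, q); "load" stands for the load node of s = s.f.
nodes = {"p", "q", "load", "r", "t", "u"}
edges = {("p", "f", "r"), ("q", "f", "load"), ("load", "f", "load"), ("load", "f", "t")}
labels = classify(nodes, edges, outside_nodes={"p", "q", "load"})
```

Under these assumptions the objects for r and t escape because unanalyzed code can reach them through p and the load node, while the object for u is captured.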

Multiple Calling Contexts
• Two key assumptions
  • p and q refer to different objects
  • Parallel threads may access objects
[Figure: code of m(p, q) next to its points-to graph with labels p, r, q, s, t and four f edges]

Multiple Calling Contexts
What if p and q refer to the same object? (i.e. p and q aliased)
[Figures: code of m(p, q); the points-to graph for distinct parameters; and the graph obtained when the nodes for p and q are the same]

Multiple Calling Contexts
What if p and q refer to the same object and there are no parallel threads?
[Figures: code of m(p, q); the graph for distinct parameters; the graph for aliased parameters; and the smaller graph that results when there are also no parallel threads, with labels r, p, q, s, t and two f edges]

Issues
• Substantially different results for different calling contexts
• But caller is unavailable at analysis time…
• New analysis for each possible context?
  • Lots of contexts…
  • Most of which probably won’t be needed…

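Our Solution
• Analyze assuming
  • Distinct parameters
  • Parallel threads
• Aliased parameters at caller? Merge nodes…
• No parallel threads? Remove outside edges and nodes…
[Figures: the graph for distinct parameters with parallel threads, the graph after merging the nodes for p and q, and the graph after removing outside edges and nodes]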
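The "merge nodes" step for aliased parameters can be pictured as a renaming over the edge sets. The sketch below shows only that renaming; the helper `merge_nodes` and the short node names are hypothetical, any further propagation the actual algorithm performs after merging is not modeled, and the removal of outside edges for the no-parallel-threads case is not shown.

```python
def merge_nodes(edges, keep, drop):
    """Rename node `drop` to `keep` in every (src, field, dst) edge."""
    def rename(n):
        return keep if n == drop else n
    return {(rename(s), f, rename(d)) for (s, f, d) in edges}


# Result for m(p, q) analyzed with distinct parameters; "load" stands for
# the load node of s = s.f.
inside = {("p", "f", "r"), ("load", "f", "t")}
outside = {("q", "f", "load"), ("load", "f", "load")}

# Specialize for a caller that passes the same object as p and q.
aliased_inside = merge_nodes(inside, keep="p", drop="q")
aliased_outside = merge_nodes(outside, keep="p", drop="q")

# After merging, p.f has both an inside target and an outside target.
p_f_targets = {d for (s, f, d) in aliased_inside | aliased_outside
               if s == "p" and f == "f"}
```

The point of the design is that the procedure is analyzed once, and a cheap graph transformation adapts the result to the calling context instead of a fresh analysis.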

Solution Is Not Perfect
• Specialization can lose precision: can have two procedures such that when analyzed with
  • Distinct parameters: same analysis result
  • Aliased parameters: different analysis result
• Conceptually complex analysis
  • Think about all contexts during analysis
  • Start to lose intuition of analysis as execution
  • Difficult time applying abstract interpretation framework

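Abstract Interpretation and Analysis
Abstract interpretation is a parameterized framework
• V: concrete values
• A: abstract values
• α: abstraction function
• γ: concretization function
[Figure: diagram in which the concrete transition tv takes v1 to v2, the abstract transition ta takes a1 to a2, and α and γ relate the concrete and abstract levels]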
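To make the four parameters concrete, here is a toy instance that is not from the slides: V is the integers, A is the set of signs, the concrete transition adds one, and the abstract transition returns the set of signs the result may have. All names (`alpha`, `gamma`, `t_v`, `t_a`, `is_sound`) are chosen for this sketch, and `gamma` is computed over a finite universe so that it stays executable.

```python
def alpha(v):
    """Abstraction function: map an integer to its sign."""
    return "neg" if v < 0 else "zero" if v == 0 else "pos"


def gamma(a, universe):
    """Concretization function, restricted to a finite universe of integers."""
    return {v for v in universe if alpha(v) == a}


def t_v(v):
    """Concrete transition: increment."""
    return v + 1


def t_a(a):
    """Abstract transition: the signs that v + 1 may have, given the sign of v."""
    return {"neg": {"neg", "zero"}, "zero": {"pos"}, "pos": {"pos"}}[a]


def is_sound(universe):
    """Every concrete step must be covered by the corresponding abstract step."""
    return all(alpha(t_v(v)) in t_a(alpha(v)) for v in universe)


universe = range(-5, 6)
sound = is_sound(universe)
```

The abstract step is allowed to over-approximate: `t_a("neg")` includes `"zero"` even though most negative integers stay negative after the increment.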

Applying Framework
• A: points-to graphs
• V: concrete heaps
• α: points-to graph for a given heap
  • Points-to graph depends on program
  • Need to augment heap with access history
• γ: all heaps that correspond to points-to graph
• OK, I give up…

Correctness Proof
• Inductively construct a relation between
  • Objects in heap
  • Nodes that represent objects
• Invariants that characterize the relation
• Transfer function
  • Takes a points-to graph and the relation
  • Gives a new points-to graph and a new relation
• Prove that transfer functions preserve invariants

Threads and Abstract Interpretation
• Philosophy of abstract interpretation
  • Come up with a decent abstraction
  • Execute program on that abstraction
• Problem with threads
  • Execution usually modeled as interleaving
  • Too many interleavings!

Our Solution
Points-to graphs explicitly represent all possible interactions between parallel threads
• Outside edges: interactions in which one thread reads a reference created by a parallel thread
• Inside edges: interactions in which one thread creates a reference read by a parallel thread
Basic Analysis Approach
• Analyze each thread in isolation
• To compute combined effect of multiple threads
  • Retrieve result for each thread
  • Compute interactions that may occur

Interthread Analysis: n(p,q) || m(p,q)
• Retrieve points-to graph from analysis of each thread
• Establish correspondence between nodes (the figure links A to B if A may represent the same object as B)
  • Start with parameter nodes
• Compute interactions between threads
  • Match inside and outside edges
  • For each outside node, compute nodes in other graph that it represents
• Use computed representation relationship to
  • Combine graphs and
  • Obtain single graph for the execution of both threads
[Figures: the points-to graphs of n and m with parameter nodes p and q, the correspondence between their nodes, and the combined graph]

Property of Analysis
• Flow-sensitive within each thread (if statements are reordered, the result changes)
• Flow-insensitive between threads
  • Assumes interactions can happen
    • Any number of times
    • In any order
  • Analysis models interactions that can’t actually happen in any interleaved execution

Imprecision Due To Flow Insensitivity
    n(a,b,c) { 1: p=b.f  p.f=a  2: a.f=b }
    ||
    m(a,c) { 3: q=a.f  4: q.f=c }
[Figure: interthread analysis result over nodes a, b, c, with one edge drawn in blue, alongside the execution order of statements 1 to 4 required to produce the blue edge]

Weak Memory Consistency Models

Initially: y=1, x=0
    Thread 1        Thread 2
    y=0             z = x+y
    x=1
What is value of z?

Three Interleavings
Initially: y=1, x=0
    Interleaving 1: z = x+y; y=0; x=1    gives z = 1
    Interleaving 2: y=0; z = x+y; x=1    gives z = 0
    Interleaving 3: y=0; x=1; z = x+y    gives z = 1
What is value of z? z can be 0 or 1
INCORRECT REASONING!
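The three interleavings can be enumerated mechanically, which also shows why the slide flags the conclusion as incorrect under weak memory consistency. The sketch below is a deliberately crude model, not a faithful memory-model simulator: it assumes a weak model may make Thread 1's two writes visible in the opposite order and represents that by swapping them. All function names are hypothetical.

```python
def interleavings(a, b):
    """Yield every merge of lists a and b that preserves each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one schedule from the initial state y=1, x=0 and return z."""
    state = {"x": 0, "y": 1, "z": None}
    for step in schedule:
        step(state)
    return state["z"]


def set_y0(s):
    s["y"] = 0


def set_x1(s):
    s["x"] = 1


def read_z(s):
    s["z"] = s["x"] + s["y"]


thread1 = [set_y0, set_x1]
thread2 = [read_z]

# Interleaving semantics: Thread 1's writes become visible in program order.
sc_values = {run(s) for s in interleavings(thread1, thread2)}

# Assumed weak behavior: Thread 1's writes become visible in the other order.
reordered_values = {run(s) for s in interleavings([set_x1, set_y0], thread2)}
```

With the writes in program order only 0 and 1 appear; once the writes may be observed out of order, z = 2 also becomes possible, an outcome that no interleaving of the original program produces.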
