370 likes | 549 Views
Pointer Analysis. G. Ramalingam Microsoft Research, India. Goals. Points-to Analysis : Determine the set of possible values of a pointer-variable (at different points in a program) what locations can a pointer point-to?
E N D
Pointer Analysis G. Ramalingam Microsoft Research, India
Goals • Points-to Analysis: Determine the set of possible values of a pointer-variable (at different points in a program) • what locations can a pointer point-to? • Alias Analysis: Determine if two pointer-variables may point to the same location • Compute conservative approximation • A fundamental analysis, required by most other static analyses
A Constant Propagation Example x = 3; y = 4; z = x + 5; • x is always 3 here • can replace x by 3 • and replace x+5 by 8 • and so on
A Constant Propagation ExampleWith Pointers x = 3; *p = 4; z = x + 5; • Is x always 3 here?
A Constant Propagation ExampleWith Pointers p = &y; x = 3; *p = 4; z = x + 5; if (?) p = &x; else p = &y; x = 3; *p = 4; z = x + 5; p = &x; x = 3; *p = 4; z = x + 5; pointers affect most program analyses x is always 3 x is always 4 x may be 3 or 4 (i.e., x is unknown in our lattice)
A Constant Propagation ExampleWith Pointers p = &y; x = 3; *p = 4; z = x + 5; if (?) p = &x; else p = &y; x = 3; *p = 4; z = x + 5; p = &x; x = 3; *p = 4; z = x + 5; p always points-to y p always points-to x p may point-to x or y
Points-to Analysis • Determine the set of targets a pointer variable could point-to (at different points in the program) • “p points-to x” • “p stores the value &x” • “*p denotes the location x” • targets could be variables or locations in the heap (dynamic memory allocation) • p = &x; • p = new Foo(); or p = malloc (…); • must-point-to vs. may-point-to
A Constant Propagation ExampleWith Pointers Can *p denote the same location as *q? *q = 3; *p = 4; z = *q + 5; what values can this take?
More Terminology • *p and *q are said to be aliases (in a given concrete state) if they represent the same location • Alias analysis • determine if a given pair of references could be aliases at a given program point • *pmay-alias *q • *pmust-alias*q
Pointer Analysis • Points-To Analysis • may-point-to • must-point-to • Alias Analysis • may-alias • must-alias
Points-To Analysis:A Simple Example p = &x; q = &y; if (?) { q = p; } x = &a; y = &b; z = *q;
Points-To Analysis:A Simple Example p = &x; q = &y; if (?) { q = p; } x = &a; y = &b; z = *q;
Points-To Analysis x = &a; y = &b; if (?) { p = &x; } else { p = &y; } *x = &c; *p = &c; x: a x: {a,c} x: a y: {b,c} y: b y: b p: {x,y} p: {x,y} p: {x,y} a: c a: c a: null How should we handle this statement? (Try it!) Weak update Weak update Strong update
Questions • When is it correct to use a strong update? A weak update? • Is this points-to analysis precise? • We must formally define what we want to compute before we can answer many such questions
Points-To Analysis:An Informal Definition • Let u denote a program-point • Define IdealMayPT (u) to be { (p,x) | p points-to x in some state at u } • Compute a set MayPT(u) that over-approximates above set • Define IdealMustPT (u) to be { (p,x) | p points-to x in all states at u } • Compute a set MustPT(u) that under-approximates above set
Static Program Analysis • A static program analysis computes approximateinformation about the runtime behavior of a given program • The set of valid programs is defined by the programming language syntax • The runtime behavior of a given program is defined by the programming language semantics • The analysis problem defines whatinformation is desired • The analysis algorithm determines what approximation to make
Programming Language: Syntax • A program consists of • a set of variables Var • a directed graph (V,E,entry) with a distinguished entry vertex, with every edge labelled by a primitive statement • A primitive statement is of the form • x = null • x = y • x = *y • x = &y; • *x = y • skip (where x and y are variables in Var) • Omitted (for now) • Dynamic memory allocation • Pointer arithmetic • Structures and fields • Procedures
Example Program 1 Vars = {x,y,p,a,b,c} x = &a; y = &b; if (?) { p = &x; } else { p = &y; } *x = &c; *p = &c; x = &a 2 y = &b 3 p = &x p = &y 4 5 skip skip 6 *x = &c 7 *p = &c 8
Programming Language:Operational Semantics • Operational semantics == an interpreter (defined mathematically) • State • Data-State ::= Var -> (Var U {null}) • PC ::= V (the vertex set of the CFG) • Program-State ::= PC x Data-State • Initial-state: • (entry, \x. null)
Example States Vars = {x,y,p,a,b,c} Initial data-state 1 x: N, y:N, p:N, a:N, b:N, c:N x = &a 2 Initial program-state y = &b x: N, y:N, p:N, a:N, b:N, c:N <1, > 3 p = &x p = &y 4 5 skip skip 6 *x = &c 7 *p = &c 8
Programming Language:Operational Semantics • Meaning of primitive statements • CS[stmt] : Data-State -> Data-State • CS[ x = null ] s = s[x null] • CS[ x = &y ] s = s[x y] • CS[ x = y ] s = • CS[ x = *y ] s = • CS[ *x = y ] s = Exercise: Assume pointer values are non-null for simplicity
Programming Language:Operational Semantics • Meaning of primitive statements • CS[stmt] : Data-State -> Data-State • CS[ x = null ] s = s[x null] • CS[ x = &y ] s = s[x y] • CS[ x = y ] s = s[x s(y)] • CS[ x = *y ] s = s[x s(s(y))] • CS[ *x = y ] s = s[s(x) s(y)] must say what happens if null is dereferenced
Programming Language:Operational Semantics • Meaning of program • a transition relation on program-states • Program-State X Program-State • state1state2means that the execution of some edge in the program can transform state1 into state2 • Defining • (u,s) (v,s’) iff the program contains a control-flow edge u->vlabelled with a statement stmt such that M[stmt]s = s’
Programming Language:Operational Semantics • A sequence of states s1s2 … snis said to be an execution (of the program) iff • s1is the Initial-State • sisi+1for 1 <= I < n • A state s is said to be a reachable stateiff there exists some execution s1s2 … snis such that sn= s. • Define RS(u) = { s | (u,s) is reachable }
Programming Language:Operational Semantics • A sequence of states s1s2 … snis said to be an execution (of the program) iff • s1 is the Initial-State • si| si+1 for 1 <= I < n • A state s is said to be a reachable state iff there exists some execution s1s2 … snis such that sn= s. • Define RS(u) = { s | (u,s) is reachable } All of this formalism for this one definition
Ideal Points-To Analysis:Formal Definition • Let u denote a vertex in the CFG • Define IdealMustPT (u) to be { (p,x) | forall s in RS(u). s(p) == x } • Define IdealMayPT (u) to be { (p,x) | exists s in RS(u). s(p) == x }
May-Point-To Analysis:Formal Definition Compute R: V -> 2Vars’ such that R(u) IdealMayPT(u) (where Var’ = Var U {null}) For every vertex u in the CFG, compute a set R(u) such that R(u) IdealMayPT(u)
May-Point-To Analysis • An algorithm is said to be correct if the solution R it computes satisfies "uÎV. R(u) IdealMayPT(u) • An algorithm is said to be precise if the solution R it computes satisfies "uÎV. R(u)=IdealMayPT(u) • An algorithm that computes a solution R1 is said to be more precise than one that computes a solution R2 if "uÎV. R1(u) ÍR2(u) Compute R: V -> 2Vars’ such that R(u) IdealMayPT(u)
Algorithm A: A Formal DefinitionThe “Data Flow Analysis” Recipe • Define semi-lattice of abstract-values • AbsDataState ::= Var -> 2Var’ • f1 f2 = \x. (f1 (x) È f2 (x)) • bottom = \x.{} • Define initial abstract-value • InitialAbsState = \x. {null} • Define transformers for primitive statements • AS[stmt] : AbsDataState -> AbsDataState
Algorithm A: A Formal DefinitionThe “Data Flow Analysis” Recipe • Let st(v,u) denote stmt on edge v->u • Compute the least-fixed-point of the following “dataflow equations” • x(entry) = InitialAbsState • x(u) = v->u AS(st(v,u)) x(v) x(v) x(w) v w st(v,u) st(w,u) u x(u)
Algorithm A:The Transformers • Abstract transformers for primitive statements • AS[stmt] : AbsDataState -> AbsDataState • AS[ x = y ] s = s[x s(y)] • AS[ x = null ] s = s[x {null}] • AS[ x = &y ] s = s[x {y}] • AS[ x = *y ] s = s[x s*(s(y))] where s*({v1,…,vn}) = s(v1) È … È s(vn) • AS[ *x = y ] s = ???
Abstract Transformers:Weak/Strong Update AS[stmt] : AbsDataState -> AbsDataState AS[ *x = y ] s = s[z s(y)] if s[x] = {z} s[z1 s(z1) s(y)] if s[x] = {z1, …, zk} [z2s(z2) s(y)] (where k > 1) … [zks(zk) s(y)]
Algorithm A: TransformersWeak/Strong Update • Transformer for “*x = y” S. if (S[x] == {z} then S[z -> S(y)] else z. if (z S(x)) then S(y) S(z) else S(z) S. z. if (S(x) == {z} then S(y) else if (z S(x)) then S(y) S(z) else S(z)
Recap:A basic pointer analysis algorithm 1 S1 = [x -> {null}, y -> {null}, p -> {null},…] x = &a 2 S2 = AS[x = &a] S1 S2 = S1 [x -> {a}] y = x 3 S3 = AS[y = x] S2 S3 = S2 [y -> S2(x)] p = &x p = &y 4 5 … skip skip 6 S6 = S4 S5 *x = &c … 7 *p = &c 8
Abstract Transformers • AS[stmt] : AbsDataState -> AbsDataState • AS[ x = y ] s = s[x s(y)] • AS[ x = null ] s = s[x {null}] • AS[ x = *y ] s = s[x s*(s(y))] where s*({v1,…,vn}) = s(v1) È … È s(vn)