3-Valued Logic Analyzer (TVP) Part II

Tal Lev-Ami and Mooly Sagiv

Outline • The Shape Analysis Problem • Solving Shape Analysis with TVLA • Structural Operational Semantics • Predicate logic • Embedding • (Imprecise) Abstract Interpretation • Instrumentation Predicates • Focus • Coerce • Bibliography

Shape Analysis • Determine the possible shapes of a dynamically allocated data structure at a given program point • Relevant questions: • Does a variable point to an acyclic list? • Does a variable point to a doubly-linked list? • Does a variable p point to an allocated element every time p is dereferenced? • Can a procedure create a memory leak?

NULL dereference
Dereference of NULL pointers
typedef struct element {
    int value;
    struct element *next;
} Elements;
bool search(int value, Elements *c) {
    Elements *elem;
    /* The loop tests c instead of elem, so elem eventually becomes NULL
       and elem->value dereferences a NULL pointer. */
    for (elem = c; c != NULL; elem = elem->next)
        if (elem->value == value)
            return TRUE;
    return FALSE;
}

Memory leakage
Elements* reverse(Elements *c) {
    Elements *h, *g;
    h = NULL;
    while (c != NULL) {
        g = c->next;
        h = c;
        c->next = h;
        c = g;
    }
    return h;
}
• Leakage of the address pointed to by h

NULL dereference
Dereference of NULL pointers
[elem := c;]1
[found := false;]2
while ([c != null]3 && [!found]4) (
    if ([elem->car = value]5)
    then [found := true]6
    else [elem = elem->cdr]7
)

Structural Operational Semantics for languages with dynamically allocated objects • The program state consists of: • current allocated objects • a mapping from variables into atoms, objects, and null • a car mapping from objects into atoms, objects, and null • a cdr mapping from objects into atoms, objects, and null • … • malloc() allocates more objects • assignments update the state

Structural Operational Semantics • The program state S = (O, env, car, cdr): • current allocated objects O • atoms (integers, Booleans) A • env: Var* → A ∪ O ∪ {null} • car: O → A ∪ O ∪ {null} • cdr: O → A ∪ O ∪ {null} • The meaning of expressions A⟦a⟧: S → A ∪ O ∪ {null} • A⟦at⟧(s) = at • A⟦x⟧((O, env, car, cdr)) = env(x) • A⟦x.car⟧((O, env, car, cdr)) = car(env(x)) • A⟦x.cdr⟧((O, env, car, cdr)) = cdr(env(x))

Structural Semantics for SWhile: axioms
[ass_v sos] <x := a, s = (O, e, car, cdr)> ⇒ (O, e[x ↦ A⟦a⟧s], car, cdr)
[ass_car sos] <x.car := a, s = (O, e, car, cdr)> ⇒ (O, e, car[e(x) ↦ A⟦a⟧s], cdr)
[ass_cdr sos] <x.cdr := a, s = (O, e, car, cdr)> ⇒ (O, e, car, cdr[e(x) ↦ A⟦a⟧s])
[ass_m sos] <x := malloc(), (O, e, car, cdr)> ⇒ (O ∪ {n}, e[x ↦ n], car, cdr) where n ∉ O
[skip sos] <skip, s> ⇒ s
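The axioms can be tried out with a small interpreter. The following Python sketch is only an illustration: the tuple encoding of expressions and statements, the use of None for null, the NullDereference exception, and the object names "o0", "o1", ... are all assumptions made for this example and are not part of the slides.

```python
class NullDereference(Exception):
    """Hypothetical error raised when car/cdr is applied to null."""


def eval_expr(a, state):
    """A[[a]]s for a state s = (O, env, car, cdr); null is modelled as None."""
    O, env, car, cdr = state
    kind = a[0]
    if kind == "atom":                 # A[[at]](s) = at
        return a[1]
    if kind == "var":                  # A[[x]](s) = env(x)
        return env.get(a[1])
    obj = env.get(a[1])                # x.car or x.cdr
    if obj is None:
        raise NullDereference(a[1])
    return (car if kind == "car" else cdr).get(obj)


def step(stmt, state):
    """One transition <stmt, s> => s' for the five axioms."""
    O, env, car, cdr = state
    kind = stmt[0]
    if kind == "skip":                 # [skip]
        return state
    if kind == "malloc":               # [ass_m]: pick a fresh n not in O
        n = f"o{len(O)}"               # fresh because objects are never freed here
        return (O | {n}, {**env, stmt[1]: n}, car, cdr)
    x, a = stmt[1], stmt[2]
    value = eval_expr(a, state)
    if kind == "assign":               # [ass_v]
        return (O, {**env, x: value}, car, cdr)
    target = env.get(x)
    if target is None:
        raise NullDereference(x)
    if kind == "setcar":               # [ass_car]
        return (O, env, {**car, target: value}, cdr)
    if kind == "setcdr":               # [ass_cdr]
        return (O, env, car, {**cdr, target: value})
    raise ValueError(kind)


# x := malloc(); y := malloc(); x.cdr := y; z := x.cdr
program = [
    ("malloc", "x"),
    ("malloc", "y"),
    ("setcdr", "x", ("var", "y")),
    ("assign", "z", ("cdr", "x")),
]
final = (frozenset(), {}, {}, {})
for stmt in program:
    final = step(stmt, final)
```

Every step builds a new state instead of mutating the old one, which mirrors the e[x ↦ ...] update notation of the axioms.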

Structural Semantics for SWhile: rules
[if_tt sos] <if b then S1 else S2, s> ⇒ <S1, s> if B⟦b⟧s = tt
[if_ff sos] <if b then S1 else S2, s> ⇒ <S2, s> if B⟦b⟧s = ff
[comp1 sos]
    <S1, s> ⇒ <S1', s'>
    ------------------------------
    <S1; S2, s> ⇒ <S1'; S2, s'>
[comp2 sos]
    <S1, s> ⇒ s'
    ------------------------------
    <S1; S2, s> ⇒ <S2, s'>

Summary • The SOS is natural • Can handle: • errors, e.g., null dereferences • free • garbage collection • But does not lead to an analysis • The set of potential objects is unbounded • Solution: Three-Valued Kleene Predicate Logic

Predicate Logic • Vocabulary • A finite set of predicate symbols P, each with a fixed arity • A finite set of function symbols • Logical structures S provide meaning for predicates • A set of individuals (nodes) U • p^S: (U^S)^k → {0, 1} for every predicate p of arity k • First-order formulas over P express logical structure properties

Using Predicate Logic to describe states in SOS • U = O • For a Boolean variable x define a nullary predicate (proposition) b[x] • b[x] = 1 when env(x) = 1 • For a pointer variable x define a unary predicate p[x] • p[x](u) = 1 when env(x) = u and u is an object • Two binary predicates: • s[car](u1, u2) = 1 when car(u1) = u2 and u2 is an object • s[cdr](u1, u2) = 1 when cdr(u1) = u2 and u2 is an object

Running Example
[elem := c;]1
[found := false;]2
while ([c != null]3 && [!found]4) (
    if ([elem->car = value]5)
    then [found := true]6
    else [elem = elem->cdr]7
)

%s Pvar {elem, c}
%s Bvar {found}
%s Sel {car, cdr}
#include "pred.tvp"
%%
#include "cond.tvp"
#include "stat.tvp"
%%
/* [elem := c;]1 */
l_1 Copy_Var(elem, c) l_2
/* [found := false;]2 */
l_2 Set_False(found) l_3
/* while ([c != null]3 && [!found]4) ( */
l_3 Is_Not_Null_Var(c) l_4
l_3 Is_Null_Var(c) l_end
l_4 Is_False(found) l_5
l_4 Is_True(found) l_end
/* if ([elem->car= value]5) */
l_5 Uninterpreted_Cond() l_6
l_5 Uninterpreted_Cond() l_7
/* then [found := true]6 */
l_6 Set_True(found) l_3
/* else [elem = elem->cdr]7 */
l_7 Get_Sel(cdr, elem, elem) l_3
/* ) */
%%
l_1, l_end

pred.tvp
foreach (z in Bvar) {
  %p b[z]()
}
foreach (z in Pvar) {
  %p p[z](v) unique box
}
foreach (sel in Sel) {
  %p s[sel](v1, v2) function
}

Actions • Use first-order formulae over P to express the SOS • Every action can have: • title %t • focus formula %f • precondition formula %p • error messages %message • new formula %new • predicate-update formulas {} • retain formula

cond.tvp (part 1)
%action Uninterpreted_Cond() {
  %t "uninterpreted-Condition"
}
%action Is_True(x1) {
  %t x1
  %p b[x1]()
  { b[x1]() = 1 }
}
%action Is_False(x1) {
  %t "!" + x1
  %p !b[x1]()
  { b[x1]() = 0 }
}

cond.tvp (part 2)
%action Is_Not_Null_Var(x1) {
  %t x1 + " != null"
  %p E(v) p[x1](v)
}
%action Is_Null_Var(x1) {
  %t x1 + " = null"
  %p !(E(v) p[x1](v))
}

stat.tvp (part 1)
%action Skip() {
  %t "Skip"
}
%action Set_True(x1) {
  %t x1 + " := true"
  { b[x1]() = 1 }
}
%action Set_False(x1) {
  %t x1 + " := false"
  { b[x1]() = 0 }
}

stat.tvp (part 2)
%action Copy_Var(x1, x2) {
  %t x1 + " := " + x2
  { p[x1](v) = p[x2](v) }
}

stat.tvp (part 3)
%action Get_Sel(sel, x1, x2) {
  %t x1 + " := " + x2 + "." + sel
  %message (!E(v) p[x2](v)) -> "an illegal dereference to " + sel + " component of " + x2
  {
    p[x1](v) = E(v_1) p[x2](v_1) & s[sel](v_1, v)
  }
}

stat.tvp (part 4)
%action Set_Sel_Null(x1, sel) {
  %t x1 + "." + sel + " := null"
  %message (!E(v) p[x1](v)) -> "an illegal dereference to " + sel + " component of " + x1
  {
    s[sel](v_1, v_2) = s[sel](v_1, v_2) & !p[x1](v_1)
  }
}

stat.tvp (part 5)
%action Set_Sel(x1, sel, x2) {
  %t x1 + "." + sel + " := " + x2
  %message (E(v, v1) p[x1](v) & s[sel](v, v1)) -> "Internal Error! assume that " + x1 + "." + sel + " == NULL"
  %message (!E(v) p[x1](v)) -> "an illegal dereference to " + sel + " component of " + x1
  {
    s[sel](v_1, v_2) = s[sel](v_1, v_2) | p[x1](v_1) & p[x2](v_2)
  }
}

stat.tvp (part 6)
%action Malloc(x1) {
  %t x1 + " := malloc()"
  %new
  { p[x1](v) = isNew(v) }
}

3-Valued Kleene Logic • A logic with 3 values • 0 - false • 1 - true • 1/2 - don't know • Operators are conservatively interpreted • 1/2 means either true or false • Information order: 0 ⊑ 1/2 and 1 ⊑ 1/2, so 0 ⊔ 1 = 1/2 • Logical order: 0 ≤ 1/2 ≤ 1

Kleene Interpretation of Operators (logical-and)
  ∧  |  0   1/2   1
  0  |  0    0    0
 1/2 |  0   1/2  1/2
  1  |  0   1/2   1

Kleene Interpretation of Operators (logical-negation)
  ¬0 = 1 • ¬1/2 = 1/2 • ¬1 = 0
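The Kleene operators are easy to write down as functions. This is a minimal Python sketch; the names HALF, k_and, k_or, k_not and join are chosen for this example only, and the values are encoded as 0, Fraction(1, 2) and 1 so that the logical order coincides with the numeric order.

```python
from fractions import Fraction

HALF = Fraction(1, 2)  # the "don't know" value 1/2


def k_and(a, b):
    """Kleene conjunction: minimum in the logical order 0 <= 1/2 <= 1."""
    return min(a, b)


def k_or(a, b):
    """Kleene disjunction: maximum in the logical order."""
    return max(a, b)


def k_not(a):
    """Kleene negation swaps 0 and 1 and leaves 1/2 unchanged."""
    return 1 - a


def join(a, b):
    """Least upper bound in the information order: equal values stay, others give 1/2."""
    return a if a == b else HALF
```

For example, k_and(0, HALF) is 0 because a conjunction with a false operand is false whatever the unknown operand turns out to be, while k_and(1, HALF) stays HALF.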

3-Valued Predicate Logic • Vocabulary • A finite set of predicate symbols P • A special unary predicate sm • sm(u) = 0 when u represents a unique concrete node • sm(u) = 1/2 when u may represent more than one concrete node • 3-valued logical structures S provide meaning for predicates • A (bounded) set of individuals (nodes) U • p^S: (U^S)^k → {0, 1/2, 1} for every predicate p of arity k • First-order formulas over P express logical structure properties • Interpret ∃ as maximum on logical order

The Blur Operation • Abstract an arbitrary structure into a structure of bounded size • Select a set of unary predicates as abstraction-predicates • Map all the nodes with the same value of abstraction predicates into a single summary node • Join the values of other predicates
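The blur steps can be sketched in a few lines of Python. The representation used here (dictionaries from nodes or node pairs to values, with missing entries read as 0) and the function names are assumptions for this illustration, not TVLA's actual data structures; the input is assumed to be a concrete 2-valued structure.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def join(values):
    """Join in the information order: a single common value stays, a mix becomes 1/2."""
    vals = set(values)
    return vals.pop() if len(vals) == 1 else HALF


def blur(nodes, unary, binary, abstraction_preds):
    """Merge nodes that agree on all abstraction predicates into one abstract node."""
    classes = {}
    for u in nodes:
        # The canonical name of a node is its vector of abstraction-predicate values.
        key = tuple(unary[p].get(u, 0) for p in abstraction_preds)
        classes.setdefault(key, []).append(u)
    new_unary = {
        p: {c: join(table.get(u, 0) for u in members) for c, members in classes.items()}
        for p, table in unary.items()
    }
    new_binary = {
        p: {
            (c1, c2): join(table.get((u1, u2), 0) for u1 in m1 for u2 in m2)
            for c1, m1 in classes.items()
            for c2, m2 in classes.items()
        }
        for p, table in binary.items()
    }
    # A class with more than one member becomes a summary node.
    sm = {c: (HALF if len(members) > 1 else 0) for c, members in classes.items()}
    return list(classes), new_unary, new_binary, sm


# A four-element list pointed to by c, abstracted by the unary predicate c.
nodes = ["n0", "n1", "n2", "n3"]
unary = {"c": {"n0": 1}}
binary = {"cdr": {("n0", "n1"): 1, ("n1", "n2"): 1, ("n2", "n3"): 1}}
abs_nodes, abs_unary, abs_binary, sm = blur(nodes, unary, binary, ["c"])
```

In this example the head node keeps its own abstract node (1,) and the three remaining nodes collapse into the summary node (0,), with cdr edges of value 1/2 from the head into the summary node and from the summary node to itself.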

The Embedding Theorem • If a big structure B can be embedded in a structure S via a surjective (onto) function f such that all predicate values are preserved, i.e., p^B(u1, ..., uk) ⊑ p^S(f(u1), ..., f(uk)) • Then every formula φ is preserved: • φ = 1 in S ⇒ φ = 1 in B • φ = 0 in S ⇒ φ = 0 in B • φ = 1/2 in S ⇒ don't know
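The embedding condition can be checked mechanically for small structures. The Python sketch below is illustrative only: the structure encoding (a "nodes" list plus a "preds" map from names to an arity and a table keyed by node tuples, with missing entries read as 0) is an assumption of this example, and the sm predicate is left out for brevity.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)


def leq_info(a, b):
    """a ⊑ b in the information order: b is either the same value or 1/2."""
    return a == b or b == HALF


def is_embedding(f, big, small):
    """Check that f is onto and that every predicate value of big is preserved."""
    if {f[u] for u in big["nodes"]} != set(small["nodes"]):
        return False  # f is not surjective
    for name, (arity, table) in big["preds"].items():
        _, small_table = small["preds"][name]
        for args in product(big["nodes"], repeat=arity):
            image = tuple(f[u] for u in args)
            if not leq_info(table.get(args, 0), small_table.get(image, 0)):
                return False
    return True


# Concrete three-element list B and a two-node abstract structure S.
B = {
    "nodes": ["n0", "n1", "n2"],
    "preds": {
        "c": (1, {("n0",): 1}),
        "cdr": (2, {("n0", "n1"): 1, ("n1", "n2"): 1}),
    },
}
S = {
    "nodes": ["u0", "u"],
    "preds": {
        "c": (1, {("u0",): 1}),
        "cdr": (2, {("u0", "u"): HALF, ("u", "u"): HALF}),
    },
}
f = {"n0": "u0", "n1": "u", "n2": "u"}
embeds = is_embedding(f, B, S)
```

Here embeds is True: every definite value in B is mapped either to the same value or to 1/2 in S.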

Naive Program Analysis via 3-valued predicate logic • Chaotic iterations • Start with the initial 3-valued structure • Execute every action in three phases: • check if precondition is satisfied • execute update formulas • execute blur • Command line tvla prgm prgm -action pub
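The chaotic-iteration loop can be sketched as a worklist over the control-flow edges of the running example. This Python sketch is a strong simplification made for illustration: an abstract state is just the 3-valued value of b[found], pointer actions are treated as the identity, a precondition that evaluates to 1/2 is assumed to count as possibly satisfied, and no blur step is needed because the state has no nodes. None of the function names come from TVLA.

```python
from collections import deque
from fractions import Fraction

HALF = Fraction(1, 2)

# Control-flow edges of the running example: (source, action, target).
EDGES = [
    ("l_1", "Copy_Var", "l_2"),
    ("l_2", "Set_False", "l_3"),
    ("l_3", "Is_Not_Null_Var", "l_4"),
    ("l_3", "Is_Null_Var", "l_end"),
    ("l_4", "Is_False", "l_5"),
    ("l_4", "Is_True", "l_end"),
    ("l_5", "Uninterpreted_Cond", "l_6"),
    ("l_5", "Uninterpreted_Cond", "l_7"),
    ("l_6", "Set_True", "l_3"),
    ("l_7", "Get_Sel", "l_3"),
]


def apply_action(action, found):
    """Return the successor states of one action (empty list: precondition fails)."""
    if action == "Set_True":
        return [1]
    if action == "Set_False":
        return [0]
    if action == "Is_True":           # precondition b[found], update b[found] = 1
        return [] if found == 0 else [1]
    if action == "Is_False":          # precondition !b[found], update b[found] = 0
        return [] if found == 1 else [0]
    return [found]                    # pointer actions: identity in this sketch


def chaotic_iteration(edges, entry, initial):
    """Propagate states along edges until no location receives a new state."""
    states = {entry: {initial}}
    worklist = deque([entry])
    while worklist:
        loc = worklist.popleft()
        for src, action, dst in edges:
            if src != loc:
                continue
            for state in list(states[loc]):
                for succ in apply_action(action, state):
                    if succ not in states.setdefault(dst, set()):
                        states[dst].add(succ)
                        if dst not in worklist:
                            worklist.append(dst)
    return states


result = chaotic_iteration(EDGES, "l_1", HALF)
```

With found unknown at l_1, the fixed point records that found is definitely 0 at l_5 and may be either 0 or 1 at l_end.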

prgm.tvs
%n = {u, u0}
%p = {
  sm = {u:1/2}
  s[cdr] = {u->u:1/2, u0->u:1/2}
  p[c] = {u0}
}

More Precise Shape Analysis • Distinguish between cyclic and acyclic lists • Use Focus to guarantee that important formulas do not evaluate to 1/2 • Use Coerce to maintain global invariants • It all works • Singly linked lists (reverse, insert, delete, del_all) • Sortedness (bubble-sort, insertion-sort, reverse) • Doubly linked lists (insert, delete) • Mobile code (router) • Java multithreading (interference, concurrent-queue)

The Instrumentation Principle • Increase precision by storing the truth-value of some designated formulae • Introduce predicate-update formulae to update the extra predicates

Example: Heap Sharing • is[cdr](v) = ∃v1,v2: cdr(v1,v) ∧ cdr(v2,v) ∧ v1 ≠ v2 • [Figure: a list pointed to by x with elements 31, 71, 91, every node with is = 0, and its abstraction over nodes u1 and u]

Example: Heap Sharing • is[cdr](v) = ∃v1,v2: cdr(v1,v) ∧ cdr(v2,v) ∧ v1 ≠ v2 • [Figure: the list pointed to by x with elements 31, 71, 91 in which one node is shared (is = 1) and the others have is = 0, and its abstraction over nodes u1 and u]

pred.tvp
foreach (z in Bvar) {
  %p b[z]()
}
foreach (z in Pvar) {
  %p p[z](v) unique box
}
foreach (sel in Sel) {
  %p s[sel](v1, v2) function
}
foreach (sel in Sel) {
  %i is[sel](v) = E(v_1, v_2) s[sel](v_1, v) & s[sel](v_2, v) & v_1 != v_2
}
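On a concrete heap the defining formula of is[sel] just asks whether a node has two distinct predecessors. A minimal Python sketch, assuming the sel relation is given as a set of (source, target) pairs; the function name is_shared is made up for this example.

```python
def is_shared(sel_edges, v):
    """is[sel](v): some v1 != v2 with sel(v1, v) and sel(v2, v)."""
    predecessors = {v1 for (v1, v2) in sel_edges if v2 == v}
    return 1 if len(predecessors) >= 2 else 0


# An unshared list n0 -> n1 -> n2, and the same list with an extra edge n3 -> n1.
plain = {("n0", "n1"), ("n1", "n2")}
shared = plain | {("n3", "n1")}
```

Storing this value as an instrumentation predicate means the abstraction can remember that a summary node is unshared even after the individual cdr edges have been blurred to 1/2.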

stat.tvp (part 4)
%action Set_Sel_Null(x1, sel) {
  %t x1 + "." + sel + " := null"
  %message (!E(v) p[x1](v)) -> "an illegal dereference to " + sel + " component of " + x1
  {
    s[sel](v_1, v_2) = s[sel](v_1, v_2) & !p[x1](v_1)
    is[sel](v) = is[sel](v) & (!(E(v_1) p[x1](v_1) & s[sel](v_1, v)) | E(v_1, v_2) v_1 != v_2 & (s[sel](v_1, v) & !p[x1](v_1)) & (s[sel](v_2, v) & !p[x1](v_2)))
  }
}

stat.tvp (part 5)
%action Set_Sel(x1, sel, x2) {
  %t x1 + "." + sel + " := " + x2
  %message (E(v, v1) p[x1](v) & s[sel](v, v1)) -> "Internal Error! assume that " + x1 + "." + sel + " == NULL"
  %message (!E(v) p[x1](v)) -> "an illegal dereference to " + sel + " component of " + x1
  {
    s[sel](v_1, v_2) = s[sel](v_1, v_2) | p[x1](v_1) & p[x2](v_2)
    is[sel](v) = is[sel](v) | E(v_1) p[x2](v) & s[sel](v_1, v)
  }
}

Additional Instrumentation Predicates • reachable-from-variable-x(v) = ∃v1: x(v1) ∧ cdr*(v1, v) • cyclic-along-dimension-d(v) = cdr+(v, v) • ordered element: inOrder(v) = ∀v1: cdr(v, v1) → v->d <= v1->d • doubly linked lists

The Focusing Principle • To increase precision • “Bring the predicate-update formula into focus” (Force 1/2 to 0 or 1) • Then apply the predicate-update formulas

(1) Focus on ∃v1: x(v1) ∧ cdr(v1, v) • [Figure: a structure over nodes u1 and u with variables x and y, and the structures produced by Focus, including one in which u is split into u.1 and u.0]

(2) Evaluate Predicate-Update Formulae • x(v) = ∃v1: x(v1) ∧ cdr(v1, v) • [Figure: the focused structures over nodes u1, u, u.1 and u.0 before and after the update]
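The case split performed by Focus can be sketched in Python under a simplifying assumption: the value of the focus formula is taken to be already stored as a unary predicate, so focusing only has to remove the 1/2 entries of that predicate. The structure encoding and the names split_node and focus are made up for this illustration and do not reproduce TVLA's implementation.

```python
import copy
from fractions import Fraction

HALF = Fraction(1, 2)


def split_node(s, u):
    """Replace summary node u by two copies u.0 and u.1 with the same predicate values."""
    a, b = u + ".0", u + ".1"

    def copies(w):
        return [a, b] if w == u else [w]

    t = {
        "nodes": [x for w in s["nodes"] for x in copies(w)],
        "sm": {x: v for w, v in s["sm"].items() for x in copies(w)},
        "unary": {
            p: {x: v for w, v in tab.items() for x in copies(w)}
            for p, tab in s["unary"].items()
        },
        "binary": {
            p: {
                (x, y): v
                for (w1, w2), v in tab.items()
                for x in copies(w1)
                for y in copies(w2)
            }
            for p, tab in s["binary"].items()
        },
    }
    return t, a, b


def focus(s, p):
    """Return structures that together cover s and give p a definite value everywhere."""
    unknown = [u for u in s["nodes"] if s["unary"][p].get(u, 0) == HALF]
    if not unknown:
        return [s]
    u = unknown[0]
    results = []
    for value in (0, 1):               # p is 0 (or 1) on everything u represents
        t = copy.deepcopy(s)
        t["unary"][p][u] = value
        results.extend(focus(t, p))
    if s["sm"].get(u, 0) == HALF:      # summary node: p may hold on only part of it
        t, a, b = split_node(s, u)
        t["unary"][p][a] = 0
        t["unary"][p][b] = 1
        results.extend(focus(t, p))
    return results


structure = {
    "nodes": ["u1", "u"],
    "sm": {"u": HALF},
    "unary": {"y": {"u": HALF}},
    "binary": {"cdr": {("u1", "u"): HALF, ("u", "u"): HALF}},
}
focused = focus(structure, "y")
```

For the summary node u this produces three structures: y false on u, y true on u, and u split into u.0 (y false) and u.1 (y true). Both halves of the split are still marked as summary nodes in this sketch.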

The Coercion Principle • Increase precision by exploiting some structural properties possessed by all stores (Global invariants) • Structural properties captured by constraints • Apply a constraint solver

(3) Apply Constraint Solver • [Figure: the structures over nodes u1, u, u.1 and u.0 before and after the constraint solver is applied]
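One constraint of this kind is uniqueness of a pointer-variable predicate: at most one concrete node can satisfy it. The Python sketch below shows how that single constraint can sharpen or discard a structure. It is an assumption-laden illustration, not TVLA's constraint solver, and the name coerce_unique is made up.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def coerce_unique(sm, pred):
    """Apply 'pred holds for at most one concrete node'.

    Returns the sharpened (sm, pred) tables, or None if the structure is infeasible.
    """
    sm, pred = dict(sm), dict(pred)
    definite = [u for u, value in pred.items() if value == 1]
    if len(definite) > 1:
        return None                    # two different nodes both definitely satisfy pred
    if definite:
        u = definite[0]
        sm[u] = 0                      # the node pointed to cannot be a summary node
        for w, value in pred.items():
            if w != u and value == HALF:
                pred[w] = 0            # no other node can satisfy pred
    return sm, pred


sharpened = coerce_unique({"u.1": HALF, "u.0": HALF}, {"u.1": 1, "u.0": HALF})
```

In the example, u.1 definitely satisfies the predicate, so it is turned into a non-summary node and the 1/2 on u.0 is sharpened to 0.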

Conclusion • TVLA allows construction of non-trivial analyses • But it is no panacea • Expressing operational semantics using logical formulas is not always easy • Need instrumentation to be reasonably precise (sometimes helps efficiency as well) • Open problems: • A debugger for TVLA • Frontends • Algorithmic problems: • Space optimizations

Bibliography • Chapter 2.6 • http://www.cs.uni-sb.de/~wilhelm/foiles/ (Invited talk, CC'2000) • http://www.cs.wisc.edu/~reps/#shape_analysis Parametric Shape Analysis based on 3-valued logics (the general theory) • http://www.math.tau.ac.il/~tla/ The system and its applications
