CSE P501 – Compiler Construction

Semantic Checks, Attribute Grammars, Symbol Tables, Types
Disclaimer: more here than needed for the MiniJava project
Jim Hogg - UW - CSE - P501

What must be checked to decide whether this program is legal?

    class C {
        int a;
        C(int v) { a = v; }
        void setA(int v) { a = v; }
    }

    class Main {
        public static void main(String[] args) {
            C c = new C(17);
            c.setA(42);
        }
    }

Beyond Syntax
There is a level of correctness not captured by a CFG:
• Has a variable been declared before it is used?
• Are types consistent in an expression?
• In the assignment x = y, is y assignable to x?
• Does a method call have the right number and types of arguments?
• In a selector p.q, is q a method or field of object p?
• Is variable x guaranteed to be initialized before it is used?
• Could p be null when p.q is executed?
• etc.

What Else Do We Need to Know to Generate Code?
• Where are fields allocated in an object?
• How big are objects? (ie, how much storage must new allocate?)
• Where are local variables stored when a method is called?
• Which methods are associated with an object/class?
• How do we figure out which method to call based on the run-time type of an object?

Semantic Analysis
• Main tasks:
  • Extract types and other information from the program
  • Check language rules that go beyond the context-free grammar
  • Resolve names: connect declarations and uses
  • “Understand” the program: last phase of the front end
  • ... so the program is "correct" for hand-off to the back end
• Key data structures: symbol tables
  • For each identifier in the program, record its attributes (kind, type, etc)
  • Later: assign storage locations (stack-frame offsets) for variables; add other annotations

Some Kinds of Semantic Information

Semantic Checks
• Grammar = BNF
  • Short: eg, Java in a couple of pages
• Semantics = Language Reference Manual
  • Long: eg, Java SE8 = 760 pages
• For each language construct we want to know:
  • What semantic rules should be checked
  • For an expression, what is its type (is the expression legal?)
  • For a declaration, what info must be captured for use elsewhere?

A Sampling of Semantic Checks: 1
• Appearance of a name: id
  • id has been declared and is in scope
  • Inferred type of id == its declared type
  • Memory location assigned by compiler
• Constant: v
  • Inferred type and value are explicit

A Sampling of Semantic Checks: 2
• Binary operator: e1 op e2
  • e1 and e2 have compatible types
    • Identical, or
    • Well-defined conversion to appropriate types (not in MiniJava)
  • Inferred type is a function of the operator and operand types
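A minimal sketch of the binary-operator rule, assuming MiniJava's operators (&&, <, +, -, *) with only int and boolean operands and no implicit conversions. The `BinaryOpCheck` class, its `Type` enum and `typeOfBinary` method are hypothetical names for illustration; the inferred type depends only on the operator and the operand types.

```java
import java.util.Set;

public class BinaryOpCheck {
    // Hypothetical type representation, for this sketch only.
    enum Type { INT, BOOLEAN }

    static final Set<String> ARITHMETIC = Set.of("+", "-", "*");

    // Returns the inferred type of "left op right", or throws if the operand types do not fit the operator.
    static Type typeOfBinary(String op, Type left, Type right) {
        switch (op) {
            case "+": case "-": case "*": case "<":
                if (left != Type.INT || right != Type.INT) {
                    throw new IllegalArgumentException(op + " needs int operands");
                }
                // Arithmetic yields int; comparison yields boolean.
                return ARITHMETIC.contains(op) ? Type.INT : Type.BOOLEAN;
            case "&&":
                if (left != Type.BOOLEAN || right != Type.BOOLEAN) {
                    throw new IllegalArgumentException(op + " needs boolean operands");
                }
                return Type.BOOLEAN;
            default:
                throw new IllegalArgumentException("unknown operator " + op);
        }
    }

    public static void main(String[] args) {
        System.out.println(typeOfBinary("+", Type.INT, Type.INT));          // INT
        System.out.println(typeOfBinary("<", Type.INT, Type.INT));          // BOOLEAN
        System.out.println(typeOfBinary("&&", Type.BOOLEAN, Type.BOOLEAN)); // BOOLEAN
    }
}
```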

A Sampling of Semantic Checks: 3
• Assignment: e1 = e2
  • e1 is assignable (not a constant or an expression)
  • e1 and e2 have compatible types
    • Identical, or
    • e2 can be converted to e1 (eg: char to int), but not in MiniJava, or
    • Type of e2 is a subclass of type of e1 (can be decided at compile time)
  • Inferred type is type of e1
  • Location where the value is stored is assigned by compiler

    class D extends B { D next; }
    d.next = b;     // ok?
    d.next = d;     // ok?
    x + y = 42      // pointers?
    1 = 2           // never good
    x[3] = true     // in MiniJava?
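The subclass case can be decided at compile time by walking up the superclass chain of the right-hand side's type. A sketch, assuming a hypothetical `superOf` map from class name to superclass name (as a compiler might build from the class declarations) and representing types simply by name:

```java
import java.util.HashMap;
import java.util.Map;

public class AssignCheck {
    // Hypothetical: maps each class name to its superclass name (absent if the class has none).
    static final Map<String, String> superOf = new HashMap<>();

    // True if a value of type 'from' may be assigned to a variable of type 'to'.
    // Assumes cyclic inheritance has already been rejected; otherwise this loop would not terminate.
    static boolean assignable(String to, String from) {
        // Identical types match on the first step; otherwise keep climbing the superclass chain.
        for (String t = from; t != null; t = superOf.get(t)) {
            if (t.equals(to)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        superOf.put("D", "B");                      // class D extends B
        System.out.println(assignable("D", "B"));   // false: d.next = b is rejected
        System.out.println(assignable("D", "D"));   // true:  d.next = d is fine
        System.out.println(assignable("B", "D"));   // true:  a D can be stored where a B is expected
    }
}
```

Base types such as int have no superclass entry, so for them only identical names match.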

A Sampling of Semantic Checks: 4
• Cast: (e1) e2 [not in MiniJava]
  • e1 must be a type
  • e2 is such that its type is:
    • The same type as e1, or
    • Convertible to type e1 (eg: int to double), or
    • A subclass of e1: upcast, or
    • A superclass of e1: downcast (needs runtime check)
  • Inferred type is e1

    (int) x        // char x? double x?
    (42) x         // never good
    (boolean) x    // int x?
    (int[]) x      // ?
    class D extends B ...
    (D) new B();   // downcast
    (B) new D();   // upcast
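Java itself illustrates why a downcast needs a runtime check: the compiler accepts `(D) new B()` because D is a subclass of B, but the JVM throws `ClassCastException` when the object turns out not to be a D. The class names B and D follow the slide; `CastDemo` and `downcastSucceeds` are hypothetical.

```java
public class CastDemo {
    static class B {}
    static class D extends B {}

    // Returns true if the downcast (D) obj succeeds at run time.
    static boolean downcastSucceeds(B obj) {
        try {
            D d = (D) obj;   // accepted at compile time: D is a subclass of B
            return true;
        } catch (ClassCastException e) {
            return false;    // runtime check failed: obj is not a D
        }
    }

    public static void main(String[] args) {
        B up = (B) new D();                             // upcast: always safe, no runtime check needed
        System.out.println(downcastSucceeds(up));       // true: the object really is a D
        System.out.println(downcastSucceeds(new B()));  // false: ClassCastException was caught
    }
}
```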

A Sampling of Semantic Checks: 5
• Field reference: exp.f
  • exp is a reference type (not a value type)
  • The class of exp has a field named f
  • Inferred type is the declared type of f
• Reference type
  • After x = y, x points at the same object as y
  • eg: C y = new C(); x = y;
• Value type
  • After x = y, x receives a copy of y's value
  • eg: int y = 42; x = y;
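The two assignment semantics can be observed directly in Java. This sketch reuses the class name C from the opening example (with only its field a); `RefVsValue` and its method names are hypothetical.

```java
public class RefVsValue {
    static class C { int a; }

    // Reference type: after x = y, x and y refer to the same object.
    static int readThroughAlias() {
        C y = new C();
        C x = y;
        y.a = 17;          // update the object through y
        return x.a;        // ...and observe the change through x
    }

    // Value type: after x = y, x holds its own copy of y's value.
    static int copyValue() {
        int y = 42;
        int x = y;
        y = 99;            // changing y afterwards does not affect x
        return x;
    }

    public static void main(String[] args) {
        System.out.println(readThroughAlias());  // 17
        System.out.println(copyValue());         // 42
    }
}
```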

A Sampling of Semantic Checks: 6
• Method call: exp.m(e1, e2, …, en)
  • exp must be a reference type
  • The class of exp has a method named m
  • The method has n parameters
  • Each argument must be assignment-compatible with the corresponding parameter
  • Inferred type is given by the method declaration (or is void)
Method Overloading
  Method defs:        m(int, int) {...}    m(double, double) {...}
  Method calls:       m(1, 2);   m(1.0, 2.0);   m(1.0, 2);   m(1, 2.0);
  More method defs:   m(int, int) {...}    m(int, double) {...}    m(double, int) {...}
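Full Java resolves overloaded calls at compile time by choosing the most specific applicable method, widening int arguments to double where needed. A runnable check of the first set of definitions; `OverloadDemo` is a hypothetical name and each m returns its own signature so the choice is visible.

```java
public class OverloadDemo {
    static String m(int a, int b) { return "m(int,int)"; }
    static String m(double a, double b) { return "m(double,double)"; }

    public static void main(String[] args) {
        System.out.println(m(1, 2));      // m(int,int): exact match, and more specific than (double,double)
        System.out.println(m(1.0, 2.0));  // m(double,double): exact match
        System.out.println(m(1.0, 2));    // m(double,double): 2 is widened to double
        System.out.println(m(1, 2.0));    // m(double,double): 1 is widened to double
    }
}
```

With only m(int, double) and m(double, int) declared, m(1, 2) would be rejected as ambiguous, since neither is more specific than the other. With the "more method defs" set, m(1.0, 2.0) has no applicable method, because double does not narrow to int implicitly.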

A Sampling of Semantic Checks: 7
• Return statement
  • return exp;   // exp must be assignment-compatible with the method's return type
  • return;       // better be a void method!

Semantic Analysis
• Parser builds AST
• Now check semantic constraints
  • Can partly be done during the parse, but often easier to organize as separate phases, eg: a visitor pattern over the AST
  • And some things can’t be done on the fly, eg: info about identifiers that are used before they are declared (fields, classes) [cf: declare-before-use, as in Pascal]
• Information stored in Symbol Tables
  • Generated by semantic analysis, used there and later

Attribute Grammars
• We can specify Java micro-syntax with a few dozen regexes
  • Then find a tool (JFlex) to create a scanner from these regexes
• We can specify Java syntax with a few pages of BNF
  • Then find a tool (CUP) to create a parser from that BNF
• What about the huge collection of constraint checks? (760 pages in the Java Language Specification)
  • Attribute Grammars?
  • Then find a tool (???) to create a semantic checker for that Attribute Grammar?

Attribute Example
• Give each AST node a .val attribute to hold its computed value
• AST and attribution for (1+2) * (6 / 2):

    *
      +
        1
        2
      /
        6
        2

• This example is much simplified
  • Attributes should really be attached to nodes in the Parse Tree
  • Attribute equations should really be attached to each BNF production


Attribute Example
• Give each node a .val attribute to hold the computed value of that node
• AST and attribution for (1+2) * (6 / 2):

    *  .val=9
      +  .val=3
        1  .val=1
        2  .val=2
      /  .val=3
        6  .val=6
        2  .val=2
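Because .val is a synthesized attribute, a single bottom-up pass (children before parent) computes it. A sketch with a hypothetical minimal AST (`Node`, `Num`, `BinOp`), using Java integer division for /:

```java
public class ValAttribute {
    // Hypothetical minimal AST for integer expressions.
    static abstract class Node {
        int val;                      // the synthesized .val attribute
        abstract void attribute();    // computes .val from the children's .val
    }

    static class Num extends Node {
        Num(int v) { val = v; }
        void attribute() { }          // a leaf already knows its value
    }

    static class BinOp extends Node {
        final char op;
        final Node left, right;
        BinOp(char op, Node left, Node right) { this.op = op; this.left = left; this.right = right; }

        void attribute() {
            left.attribute();         // children first: bottom-up evaluation
            right.attribute();
            switch (op) {
                case '+': val = left.val + right.val; break;
                case '*': val = left.val * right.val; break;
                case '/': val = left.val / right.val; break;
                default: throw new IllegalStateException("unknown operator " + op);
            }
        }
    }

    // Builds the AST for (1+2) * (6/2).
    static Node example() {
        return new BinOp('*',
                new BinOp('+', new Num(1), new Num(2)),
                new BinOp('/', new Num(6), new Num(2)));
    }

    public static void main(String[] args) {
        Node root = example();
        root.attribute();
        System.out.println(root.val);  // 9
    }
}
```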

Attribute Grammars
• Idea: associate attributes with each node in the AST
• Eg:
  • Type info (int, boolean, int[], class for MiniJava)
  • Storage location (eg: byte-offset 28 from frame-pointer)
  • Assignable (eg: constant vs variable)
  • Numeric value (if node represents a constant)
  • etc
• Notation: X.a if a is an attribute of node X

Inherited and Synthesized Attributes
Given a production A → Y1 Y2 … Yn
• A synthesized attribute A.a is a function of attributes of the Y’s (bottom-up)
• An inherited attribute Yi.b is a function of A.a and other Yj.c (top-down and sideways)
  • Sometimes restricted, eg: only the Yj to the left of Yi

Attribute Equations
• For each kind of node we give a set of equations relating that node's attributes and those of its children
  • Eg: plus.val = e1.val + e2.val
• or, relating that node's attributes and those of its parent
• Attribution (evaluation) means finding a solution that satisfies all of the equations in the tree

Informal Example of Attribute Rules: 1
• Grammar for a trivial language:
    prog → decl stm
    decl → int id ;
    stm  → exp = exp ;
    exp  → id | exp + exp | 1
• What attributes would we create in order to check types and assignability?

Informal Example of Attribute Rules: 1 (continued)
• For stm → exp = exp ; we need to check that:
  • The LHS exp is assignable: not a constant, not an arithmetic expression
  • The RHS exp has a type that is assignment-compatible with the LHS

Informal Example of Attribute Rules: 2
Attributes
• .env
  • "environment"
  • link to a Symbol Table entry
  • synthesized by decl, inherited by stm
  • each entry maps a name to its type and value
• .type
  • expression type (int, boolean, int[], class)
  • synthesized
• .kind
  • variable versus value (lvalue versus rvalue, in C-speak)
  • synthesized

Attributes for Declarations
decl → int id ;
• decl carries .env
Note: not all node types have, or need, attributes

Attributes for Programs
prog → decl stm
• decl and stm each carry .env: the environment synthesized by decl is inherited by stm

Attributes for Constants
exp → 1
• exp carries .type and .kind

Attributes for Expressions
exp → id
• exp and id each carry .type and .kind

Attributes for Addition
exp → exp1 + exp2
• exp, exp1 and exp2 each carry .env, .type and .kind

Attributes for Assignment
stm → exp1 = exp2 ;
• stm, exp1 and exp2 each carry .env, .type and .kind

Example progdeclstm decl int id; stm exp = exp ; exp id | exp + exp | 1 prog int x; x = x + 1; decl stm .env .type .kind exp = exp int id + exp exp id x x 1 id x Jim Hogg - UW - CSE - P501

Extensions
• Can be extended to handle sequences of declarations and statements
  • A sequence of declarations builds up a combined environment: each decl synthesizes a new environment from the previous one, augmented with a new binding
  • The full environment is passed down to statements and expressions
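A sequence of declarations can be sketched as a fold in which each declaration synthesizes a fresh environment from the previous one. The sketch copies the map on every declaration to avoid side effects; that copying is exactly the cost that makes literally passing environments around impractical for large programs. `DeclSequence`, `extend` and `declare` are hypothetical names, and types are plain strings.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DeclSequence {
    // Each declaration synthesizes a new environment: a copy of the previous one plus one new binding.
    // The previous environment is left untouched, in keeping with side-effect-free attribute equations.
    static Map<String, String> extend(Map<String, String> prev, String name, String type) {
        Map<String, String> next = new HashMap<>(prev);
        next.put(name, type);
        return Collections.unmodifiableMap(next);
    }

    // Folds (name, type) declarations into the combined environment passed down to statements.
    static Map<String, String> declare(List<String[]> decls) {
        Map<String, String> env = Collections.emptyMap();
        for (String[] d : decls) {
            env = extend(env, d[0], d[1]);
        }
        return env;
    }

    public static void main(String[] args) {
        Map<String, String> env = declare(List.of(new String[]{"x", "int"}, new String[]{"b", "boolean"}));
        System.out.println(env.get("x") + " " + env.get("b"));  // int boolean
    }
}
```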

Observations
• These are equational computations: no sequential modification of state (think functional programming: no side effects)
• There are issues in deciding whether a given set of attribute equations will actually converge
  • Can be automated, provided the attribute equations are non-circular
• Problems
  • Non-local computation
  • Can’t afford to literally pass around copies of large, aggregate structures like environments

In Practice
• Attribute Grammars give us a way of thinking about how to structure semantic checks
• Use Symbol Tables to hold environment information
• Add fields to AST nodes to refer to appropriate attributes
  • symbol table entries for identifiers
  • types for expressions
  • insert into appropriate places in the AST class hierarchy; eg, most statements don’t need types
• But commercial compilers don't use Attribute Grammars
  • Instead? "death by a thousand if's"

Symbol Tables
A table that maps id → <type, kind, location, ...>
API
• lookup(id) → info = <type, kind, location, ...>
• enter(id, info)   // updates table
• open()            // opens new scope
• close()           // closes scope
Use
• Build table from declarations during (or before) AST walk
• Use info to check semantic rules (eg: declare before use)
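A minimal sketch of this API in Java, assuming a stack of hash maps (one per open scope) searched innermost first. The generic parameter I stands in for the <type, kind, location, ...> record; rejecting a duplicate declaration within one scope is an added assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class SymbolTable<I> {
    // Innermost scope is at the head of the deque.
    private final Deque<Map<String, I>> scopes = new ArrayDeque<>();

    public SymbolTable() { open(); }   // start with one outermost scope

    public void open() { scopes.push(new HashMap<>()); }

    public void close() {
        if (scopes.size() == 1) throw new IllegalStateException("cannot close the outermost scope");
        scopes.pop();
    }

    // Binds id in the innermost scope; rejects a second declaration in that same scope.
    public void enter(String id, I info) {
        Map<String, I> inner = scopes.peek();
        if (inner.containsKey(id)) throw new IllegalStateException("duplicate declaration: " + id);
        inner.put(id, info);
    }

    // Searches from the innermost scope outward (deque iteration runs head to tail); null if undeclared.
    public I lookup(String id) {
        for (Map<String, I> scope : scopes) {
            I info = scope.get(id);
            if (info != null) return info;
        }
        return null;
    }

    public static void main(String[] args) {
        SymbolTable<String> t = new SymbolTable<>();
        t.enter("x", "int");
        t.open();
        t.enter("x", "boolean");             // inner x shadows outer x
        System.out.println(t.lookup("x"));   // boolean
        t.close();
        System.out.println(t.lookup("x"));   // int
    }
}
```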

Aside: Implementing Symbol Tables
• Formerly: a big topic in classical compiler courses: implementing a hashed symbol table (hash function, table size, collision chains, etc). The C standard library doesn't provide one
• Now: just use the collection classes provided with the implementation language (Java, C#, C++, ML, Haskell, ...)
  • Then tune & optimize if it really matters
  • In production compilers, it really matters!
• For Java:
  • Map<K,V>
  • ArrayList for ordered lists (eg: parameters)

Symbol Tables for MiniJava: Global
• A MiniJava program = 1 file = multiple classes (no separate compilation)
• One Global table per program
  • Maps class name to Class symbol table
  • Created in a pass over ClassDeclAST nodes
  • Used to check field/method names and extract their info
• The Global Symbol Table lives throughout the compilation
  • In real Java, Symbol Table info is persisted into .class files
  • In C#, Symbol Table info is persisted as 'metadata' into the output assembly

Symbol Tables for MiniJava: Class
• One Class symbol table for each class
• 1 entry for each field in class
  • name, type, (public|private), offset-in-class
• 1 entry for each method in class
  • List of parameters: name, type, ordinal (or ordered)
• In full Java, need some way to handle namespaces
  • ie: the same identifier can be both a method and a field in a class

Symbol Tables for MiniJava: Locals
• One Locals table for each method
• One entry for each parameter
  • Contents = type, memory location
• One entry for each local variable
  • Contents = type, memory location
• Needed only while compiling the method
  • Can discard when done (after first, or final, pass)
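A sketch of how the Global, Class and Locals tables might nest, using Map and ArrayList and populated for class C from the opening example. All class and field names (`VarInfo`, `MethodInfo`, `ClassInfo`, `globals`) are hypothetical; keeping fields and methods in separate maps is one simple way to let the same identifier name both a field and a method.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MiniJavaTables {
    // Entry for a field, parameter or local: its type and, later, its storage location.
    static class VarInfo {
        final String type;
        int offset = -1;               // filled in later, eg: object or stack-frame offset
        VarInfo(String type) { this.type = type; }
    }

    // Locals table: one per method, holding parameters and local variables.
    static class MethodInfo {
        final String returnType;
        final List<String> paramNames = new ArrayList<>();     // ordered: index = ordinal
        final Map<String, VarInfo> locals = new HashMap<>();   // parameters and locals
        MethodInfo(String returnType) { this.returnType = returnType; }

        void addParam(String name, String type) {
            paramNames.add(name);
            locals.put(name, new VarInfo(type));
        }
    }

    // Class table: one per class, with separate maps for fields and methods.
    static class ClassInfo {
        final String superName;                                     // null if none
        final Map<String, VarInfo> fields = new LinkedHashMap<>();  // keeps declaration order for layout
        final Map<String, MethodInfo> methods = new HashMap<>();
        ClassInfo(String superName) { this.superName = superName; }
    }

    // Global table: one per program, mapping class name to its class table.
    static final Map<String, ClassInfo> globals = new HashMap<>();

    // Builds tables for: class C { int a; void setA(int v) { a = v; } }
    static void buildExample() {
        ClassInfo c = new ClassInfo(null);
        c.fields.put("a", new VarInfo("int"));
        MethodInfo setA = new MethodInfo("void");
        setA.addParam("v", "int");
        c.methods.put("setA", setA);
        globals.put("C", c);
    }

    public static void main(String[] args) {
        buildExample();
        MethodInfo m = globals.get("C").methods.get("setA");
        System.out.println(m.paramNames.size() + " " + m.locals.get("v").type);  // 1 int
    }
}
```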

Beyond MiniJava
• We don't deal with:
  • Class static fields or methods
  • Accessibility: public, protected, private
  • Inner classes
  • Nested scopes in methods: re-use of identifiers, nested functions (ML, Pascal, …)
• Basic idea: new symbol tables for inner scopes, linked to the surrounding scope’s table
  • Look for identifier in inner scope; if not found, look in surrounding scope (recursively)
  • Pop back up on scope exit

Engineering Issues
• In practice, want to retain O(1) lookup
  • Use hash tables with additional information to get the scope nesting right
• Scope entry/exit operations
• In multipass compilers, Symbol Table info needs to persist after analysis of inner scopes, for use on later passes
• See a compiler textbook for ideas & details
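One common way to keep O(1) lookup regardless of nesting depth, sketched under assumptions: a single hash table maps each name to a stack of bindings, and each open scope records the names it declared so that close() can pop exactly those. `FlatScopedTable` is a hypothetical name; persisting entries for later passes is not handled here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FlatScopedTable<I> {
    // For each name, a stack of bindings; the top is the innermost visible one.
    private final Map<String, Deque<I>> bindings = new HashMap<>();
    // For each open scope, the names declared in it, so close() knows what to pop.
    private final Deque<List<String>> scopes = new ArrayDeque<>();

    public FlatScopedTable() { open(); }

    public void open() { scopes.push(new ArrayList<>()); }

    // Cost is proportional to the number of names declared in the closing scope.
    public void close() {
        if (scopes.size() == 1) throw new IllegalStateException("cannot close the outermost scope");
        for (String id : scopes.pop()) {
            Deque<I> stack = bindings.get(id);
            stack.pop();                               // expose any outer binding again
            if (stack.isEmpty()) bindings.remove(id);
        }
    }

    public void enter(String id, I info) {
        bindings.computeIfAbsent(id, k -> new ArrayDeque<>()).push(info);
        scopes.peek().add(id);
    }

    // One hash lookup, independent of how deeply scopes are nested.
    public I lookup(String id) {
        Deque<I> stack = bindings.get(id);
        return stack == null ? null : stack.peek();
    }

    public static void main(String[] args) {
        FlatScopedTable<String> t = new FlatScopedTable<>();
        t.enter("x", "int");
        t.open();
        t.enter("x", "boolean");
        System.out.println(t.lookup("x"));   // boolean
        t.close();
        System.out.println(t.lookup("x"));   // int
    }
}
```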

Error Recovery
• What to do when the compiler finds an undeclared identifier?
  • eg: x = y + 1 and there is no entry for y in the Symbol Table
• Only complain once (Why?)
  • Create an entry for y in the Symbol Table, to suppress future error messages
  • Give the forged entry for y a type of “unknown”
  • “Unknown” is the type of all malformed expressions and is compatible with all other types
    • Can avoid redundant error messages (how?)
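A sketch of this recovery strategy, assuming a hypothetical `ErrorRecovery` checker with a three-valued `Type` enum. The first lookup of an undeclared name reports an error and forges an UNKNOWN entry; because UNKNOWN is compatible with everything and propagates through +, checking x = y + 1 twice yields a single message.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ErrorRecovery {
    enum Type { INT, BOOLEAN, UNKNOWN }

    static final Map<String, Type> symbols = new HashMap<>();
    static final List<String> errors = new ArrayList<>();

    // On the first use of an undeclared name, report once and forge an UNKNOWN entry.
    static Type typeOfId(String id) {
        Type t = symbols.get(id);
        if (t == null) {
            errors.add("undeclared identifier: " + id);
            symbols.put(id, Type.UNKNOWN);   // later uses find this entry and stay silent
            t = Type.UNKNOWN;
        }
        return t;
    }

    // UNKNOWN is compatible with every type, so one error does not trigger a cascade.
    static boolean compatible(Type a, Type b) {
        return a == Type.UNKNOWN || b == Type.UNKNOWN || a == b;
    }

    // Type of "left + right"; a malformed operand makes the whole expression UNKNOWN, silently.
    static Type typeOfPlus(Type left, Type right) {
        if (left == Type.UNKNOWN || right == Type.UNKNOWN) return Type.UNKNOWN;
        if (left != Type.INT || right != Type.INT) {
            errors.add("+ needs int operands");
            return Type.UNKNOWN;
        }
        return Type.INT;
    }

    // Checks "x = y + 1;" twice with only x declared; returns the number of errors reported.
    static int demo() {
        symbols.clear();
        errors.clear();
        symbols.put("x", Type.INT);
        for (int i = 0; i < 2; i++) {
            Type rhs = typeOfPlus(typeOfId("y"), Type.INT);
            if (!compatible(typeOfId("x"), rhs)) errors.add("type mismatch in assignment");
        }
        return errors.size();
    }

    public static void main(String[] args) {
        System.out.println(demo());   // 1
        System.out.println(errors);   // [undeclared identifier: y]
    }
}
```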

“Predefined” Things
• Many languages have some “predefined” items
  • classes, functions (eg: "maxint")
  • "standard library" or "prelude"
• Write initialization code to inject predefined info into the Symbol Table
  • Preferably, import a file including the "standard prelude". Tradeoffs?
• The rest of the compiler doesn’t need to know the difference between “predefined” items and ones found in the user program

Types
• Classical roles of types in programming languages
  • Compile-time error detection (find errors ASAP)
  • Improved expressiveness (eg: method & operator overloading)
  • Provide information to optimizer
  • Runtime safety
• Eg: Haskell: if your program type-checks, it's likely correct
• Eg: Even FORTRAN had INTEGER and REAL, with different bit layouts
• Can we ensure programs are correct with enough testing?

Terminology
Static vs dynamic typing
• static: checking done prior to execution (eg, at compile-time)
• dynamic: checking during execution
Strong vs weak typing
• strong: guarantees no illegal operations performed
• weak: can’t make guarantees
Caveats:
• Hybrids common
• Inconsistent usage common
• “untyped” or “typeless” could mean dynamic or weak

Type Systems
• Base Types
  • Fundamental, atomic types
  • Eg: int, double, char, bool
• Compound or Constructed Types
  • Built up from other types (recursively)
  • Type constructors include:
    • arrays
    • records/structs/classes
    • pointers
    • enumerations
    • functions
    • modules (eg: ML)

How to Represent Types in a Compiler?
• Create a shallow class hierarchy. Eg:
    abstract class Type {...}
    class ClassType extends Type {...}
    class BaseType extends Type {...}
• Should not need too many of these

Types vs ASTs
• Types are not AST nodes! (eg: IntType != INT != ILIT)
• AST = abstract representation of the source program (including source-program type info)
• Types = abstract representation of types for semantic checks, inference, etc.
  • Can include information not explicitly represented in the source code, or may describe types in ways more convenient for processing
• Be sure you have a separate “type” class hierarchy in your compiler, distinct from the AST

Base Types
• For each base type (int, bool, etc), create a single object to represent it
  • Symbol table entries and AST nodes for expressions refer to these to represent type info
  • Usually created at compiler startup
• Useful to create a type void object to tag functions that return no value
• Also useful to create a type unknown object for errors
• void and unknown types reduce the need for special-case code
  • ie, pass these types around; no need to check everywhere
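A sketch of such a type hierarchy, kept separate from the AST classes, assuming hypothetical names (`Types`, `Type`, `BaseType`, `ClassType`, `sameType`). Base types are singletons created at startup and class types are interned by name, so type identity reduces to ==; the unknown type matches anything so that one error does not cascade.

```java
import java.util.HashMap;
import java.util.Map;

public class Types {
    // Root of the compiler's type hierarchy: not an AST node.
    static abstract class Type {
        abstract String name();
        @Override public String toString() { return name(); }
    }

    // Base types: exactly one object per type, so identity (==) means equality.
    static final class BaseType extends Type {
        private final String name;
        private BaseType(String name) { this.name = name; }
        String name() { return name; }
    }

    // Created once, when the compiler starts up.
    static final BaseType INT = new BaseType("int");
    static final BaseType BOOLEAN = new BaseType("boolean");
    static final BaseType VOID = new BaseType("void");         // tags methods that return no value
    static final BaseType UNKNOWN = new BaseType("unknown");   // type of malformed expressions

    // Class types: one object per declared class, interned by name.
    static final class ClassType extends Type {
        final String className;
        ClassType superclass;          // null if none; filled in once all classes are seen
        private ClassType(String className) { this.className = className; }
        String name() { return className; }
    }

    private static final Map<String, ClassType> classes = new HashMap<>();

    static ClassType classType(String name) {
        return classes.computeIfAbsent(name, ClassType::new);
    }

    // Identity comparison suffices; UNKNOWN matches anything to avoid cascading errors.
    static boolean sameType(Type a, Type b) {
        return a == UNKNOWN || b == UNKNOWN || a == b;
    }

    public static void main(String[] args) {
        classType("D").superclass = classType("B");
        System.out.println(classType("D").superclass == classType("B"));  // true: same interned object
        System.out.println(sameType(INT, BOOLEAN));                       // false
        System.out.println(sameType(UNKNOWN, classType("D")));            // true
    }
}
```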
