Intermediate representation

Intermediate representation
• Goals:
  • encode knowledge about the program
  • facilitate analysis
  • facilitate retargeting
  • facilitate optimization
[Figure: compiler pipeline with the stages scanning, parsing, semantic analysis, intermediate code gen., optim., and code gen.; the representation passed between stages is labeled HIR or LIR]

Intermediate representation • Components • code representation • symbol table • analysis information • string table • Issues • Use an existing IR or design a new one? • How close should it be to the source/target?

IR selection • Using an existing IR • cost savings due to reuse • it must be expressive and appropriate for the compiler operations • Designing an IR • decide how close to machine code it should be • decide how expressive it should be • decide its structure • consider combining different kinds of IRs

IR classification: Level • High-level • closer to source language • used early in the process • usually converted to lower form later on • Example: AST

IR classification: Level • Medium-level • try to reflect the range of features in the source language in a language-independent way • most optimizations are performed at this level • algebraic simplification • copy propagation • dead-code elimination • common subexpression elimination • loop-invariant code motion • etc.

IR classification: Level • Low-level • very close to target-machine instructions • architecture dependent • useful for several optimizations • loop unrolling • branch scheduling • instruction/data prefetching • register allocation • etc.

IR classification: Level
High-level:
    for i := op1 to op2 step op3
        instructions
    endfor
Medium-level:
    i := op1
    step := op3
    if step < 0 goto L2
L1: if i > op2 goto L3
    instructions
    i := i + step
    goto L1
L2: if i < op2 goto L3
    instructions
    i := i + step
    goto L2
L3:

IR classification: Structure • Graphical • trees, graphs • not easy to rearrange • large structures • Linear • looks like pseudocode • easy to rearrange • Hybrid • combine graphical and linear IRs • Example: • low-level linear IR for basic blocks, and • graph to represent flow of control

(Basic blocks) • Basic block = a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end without halt or possibility of branching except at the end.

(Basic blocks) • Partitioning a sequence of statements into BBs • Determine leaders (first statements of BBs) • the first statement is a leader • the target of a conditional or unconditional branch is a leader • a statement following a branch is a leader • For each leader, its basic block consists of the leader and all the statements up to but not including the next leader.
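
The leader rules above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the `Instr` record, its `label` and `target` fields, and the sample program are assumptions made for the example, and every branch target is assumed to name a label that exists in the sequence.

```python
from typing import List, NamedTuple, Optional


class Instr(NamedTuple):
    text: str                     # printable form of the statement
    label: Optional[str] = None   # label attached to this statement, if any
    target: Optional[str] = None  # label this statement may branch to, if any


def find_leaders(code: List[Instr]) -> List[int]:
    """Return the sorted indices of the statements that start a basic block."""
    label_at = {ins.label: i for i, ins in enumerate(code) if ins.label}
    leaders = {0} if code else set()           # the first statement is a leader
    for i, ins in enumerate(code):
        if ins.target is not None:
            leaders.add(label_at[ins.target])  # the branch target is a leader
            if i + 1 < len(code):
                leaders.add(i + 1)             # so is the statement after a branch
    return sorted(leaders)


def partition(code: List[Instr]) -> List[List[Instr]]:
    """Each block runs from its leader up to, not including, the next leader."""
    bounds = find_leaders(code) + [len(code)]
    return [code[start:end] for start, end in zip(bounds, bounds[1:])]


# Hypothetical sample program: a simple counting loop.
code = [
    Instr("i := 1"),
    Instr("if i > 10 goto L2", label="L1", target="L2"),
    Instr("x := x + i"),
    Instr("i := i + 1"),
    Instr("goto L1", target="L1"),
    Instr("y := x", label="L2"),
]
blocks = partition(code)
for n, block in enumerate(blocks):
    print(f"B{n}:", "; ".join(ins.text for ins in block))
```

For this sample the leaders are statements 0, 1, 2 and 5, so the loop test, the loop body and the exit statement each land in their own block.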

Linear IRs • Sequence of instructions that execute in order of appearance • Control flow is represented by conditional branches and jumps • Common representations • stack machine code • three-address code

Linear IRs • stack machine code • assumes presence of operand stack • useful for stack architectures, JVM • operations typically pop operands and push results. • advantages • easy code generation • compact form • disadvantages • difficult to rearrange • difficult to reuse expressions
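
To make the "pop operands, push results" model concrete, here is a small sketch that emits stack code from an expression tree by a postorder walk and then interprets it. The tuple encoding of expressions, the instruction names (`push`, `add`, `sub`, `mul`) and the helper names are assumptions chosen for this example; they do not correspond to any particular real stack machine.

```python
import operator

# Hypothetical instruction set: each operator pops two values and pushes one.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def gen_stack_code(expr):
    """expr is a constant, a variable name, or a tuple (op, left, right)."""
    if not isinstance(expr, tuple):
        return [("push", expr)]
    op, left, right = expr
    # Postorder: code for the operands first, then the operator itself.
    return gen_stack_code(left) + gen_stack_code(right) + [(op,)]


def run(code, env):
    """Interpret the stack code; env maps variable names to values."""
    stack = []
    for ins in code:
        if ins[0] == "push":
            value = ins[1]
            stack.append(env[value] if isinstance(value, str) else value)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[ins[0]](left, right))
    return stack.pop()


expr = ("sub", ("mul", 2, "y"), "z")       # 2 * y - z
code = gen_stack_code(expr)
result = run(code, {"y": 3, "z": 4})
print(code, result)
```

Code generation is a single tree walk with no temporaries to name, which is the "easy code generation, compact form" advantage. The drawback is also visible: the value of `2 * y` lives only on the stack, so reusing it elsewhere would require recomputing it.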

Linear IRs • three-address code • compact • generates temp variables • level of abstraction may vary • loses syntactic structure • quadruples • operator • up to two operands • destination • triples • similar to quadruples but the results are not named explicitly (index of operation is implicit name) • Implement as table, array of pointers, or list

Linear IRs
Quadruples:
    L1: i := 2
        t1 := i+1
        t2 := t1>0
        if t2 goto L1
Triples:
    (1) 2
    (2) i st (1)
    (3) i + 1
    (4) (3) > 0
    (5) if (4), (1)
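
The difference between the two forms shows up when both are generated from the same expression tree. The sketch below is illustrative only: the tuple layouts `(op, arg1, arg2, dest)` and `(op, arg1, arg2)`, the `(n)` reference strings and the function names are assumptions made for this example.

```python
from itertools import count


def to_quadruples(dest, expr):
    """Emit (op, arg1, arg2, dest) tuples; every result gets a named temp."""
    quads = []
    temps = count(1)

    def walk(e):
        if not isinstance(e, tuple):
            return e                       # constant or variable name
        op, left, right = e
        a, b = walk(left), walk(right)
        t = f"t{next(temps)}"              # explicit name for the result
        quads.append((op, a, b, t))
        return t

    result = walk(expr)
    quads.append((":=", result, None, dest))
    return quads


def to_triples(dest, expr):
    """Emit (op, arg1, arg2) tuples; a result is named by its 1-based index."""
    triples = []

    def walk(e):
        if not isinstance(e, tuple):
            return e
        op, left, right = e
        a, b = walk(left), walk(right)
        triples.append((op, a, b))
        return f"({len(triples)})"         # implicit name: position in the table

    result = walk(expr)
    triples.append((":=", dest, result))
    return triples


expr = ("-", ("*", 2, "y"), "z")           # x := 2 * y - z
quads = to_quadruples("x", expr)
triples = to_triples("x", expr)
print(quads)
print(triples)
```

Because triples refer to each other by position, moving an instruction means renumbering every reference to it, whereas quadruples can be reordered freely as long as dependences are respected.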

Graphical IRs • Parse tree • Abstract syntax tree • high-level • useful for source-level information • retains syntactic structure • Common uses • source-to-source translation • semantic analysis • syntax-directed editors

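Graphical IRs
• Tree, for basic block
  • root: operator
  • up to two children: operands
  • can be combined
• Uses:
  • algebraic simplifications
  • may generate locally optimal code.
• Example:
    L1: i := 2
        t1 := i+1
        t2 := t1>0
        if t2 goto L1
[Figure: one tree per statement: (assgn, i) with child 2; (add, t1) with children i and 1; (gt, t2) with children t1 and 0. Combined tree: (gt, t2) with children (add, t1) and 0, where (add, t1) has children (assgn, i) and 1, and (assgn, i) has child 2]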

Graphical IRs • Directed acyclic graphs (DAGs) • Like compressed trees • leaves: variables, constants available on entry • internal nodes: operators • annotated with variable names? • distinct left/right children • Used for basic blocks (doesn't show control flow) • Can generate efficient code. • Note: DAGs encode common expressions • But difficult to transform • Better for analysis

Graphical IRs • Generating DAGs • check whether an operand is already present • if not, create a leaf for it • check whether there is a parent of the operand that represents the same operation • if not, create one • in either case, label the node representing the result with the name of the destination variable, and remove that label from all other nodes in the DAG.

Graphical IRs
• Directed acyclic graphs (DAGs)
• Example:
    m := 2 * y * z
    n := 3 * y * z
    p := 2 * y - z
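
The DAG construction steps can be tried on the three statements of this example. The following is a rough sketch under stated assumptions: `2 * y * z` is taken to parse left-associatively as `(2 * y) * z`, commutativity is ignored, and the `DagBuilder` class with its `nodes`, `index` and `labels` fields is a hypothetical design for illustration.

```python
class DagBuilder:
    def __init__(self):
        self.nodes = []    # node id -> ("leaf", value) or (op, left_id, right_id)
        self.index = {}    # node key -> node id, used to find an existing node
        self.labels = {}   # variable name -> id of the node holding its value

    def _node(self, key):
        # Reuse the node if one with the same key exists, otherwise create it.
        if key not in self.index:
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

    def expr(self, e):
        if isinstance(e, tuple):
            op, left, right = e
            return self._node((op, self.expr(left), self.expr(right)))
        if e in self.labels:               # variable already assigned in this block
            return self.labels[e]
        return self._node(("leaf", e))     # constant or value available on entry

    def assign(self, dest, e):
        node = self.expr(e)
        self.labels[dest] = node           # dest now labels this node only
        return node


dag = DagBuilder()
dag.assign("m", ("*", ("*", 2, "y"), "z"))
dag.assign("n", ("*", ("*", 3, "y"), "z"))
dag.assign("p", ("-", ("*", 2, "y"), "z"))
print(len(dag.nodes), dag.labels)
```

Under these assumptions the DAG has 9 nodes, where three separate trees would need 15: the leaves `2`, `y`, `z` and the product `2 * y` are shared between `m` and `p`. The product `y * z` is not shared, because with left-associative parsing it never appears as a subexpression.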

Graphical IRs
• Control flow graphs (CFGs)
  • Each node corresponds to either
    • a basic block
      • fewer nodes
      • may need to determine facts at specific points within BB
    • or a single statement
      • more space and time
  • Each edge represents flow of control
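
As a small sketch of a CFG with one node per basic block, the code below derives successor and predecessor edges from how each block ends. The block names `B0` to `B3`, the `(name, kind, target)` encoding and the terminator kinds `goto`, `branch`, `fall` and `halt` are assumptions for this example.

```python
def build_cfg(blocks):
    """blocks: list of (name, kind, target) in program order.

    kind is "goto" (unconditional jump), "branch" (conditional jump that may
    also fall through), "fall" (no jump) or "halt" (no successor).
    Returns a dict mapping each block name to its list of successors.
    """
    succ = {}
    for i, (name, kind, target) in enumerate(blocks):
        out = []
        if kind in ("goto", "branch"):
            out.append(target)                     # edge to the jump target
        if kind in ("branch", "fall") and i + 1 < len(blocks):
            following = blocks[i + 1][0]
            if following not in out:
                out.append(following)              # fall-through edge
        succ[name] = out
    return succ


def predecessors(succ):
    pred = {name: [] for name in succ}
    for name, outs in succ.items():
        for s in outs:
            pred[s].append(name)
    return pred


# Hypothetical blocks of a simple loop: init, test, body, exit.
blocks = [
    ("B0", "fall", None),
    ("B1", "branch", "B3"),
    ("B2", "goto", "B1"),
    ("B3", "halt", None),
]
succ = build_cfg(blocks)
pred = predecessors(succ)
print(succ)
print(pred)
```

The back edge from `B2` to `B1` is what identifies the loop; `B1` has two predecessors, which is exactly the kind of join point that matters later for SSA form.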

Graphical IRs • Dependence graphs • Encode flow of values from definition to use • Nodes represent operations • Edges connect definitions to uses • Graph represents constraints on the sequencing of operations • Built for specific optimizations, then discarded
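
For straight-line code, the definition-to-use edges can be collected in one pass. This is a minimal sketch that tracks only flow of values within a single basic block; the `(dest, operands)` statement encoding and the sample statements are assumptions for illustration.

```python
def def_use_edges(block):
    """block: list of (dest, operands). Returns edges (def_index, use_index)."""
    last_def = {}      # variable -> index of the statement that last defined it
    edges = []
    for i, (dest, operands) in enumerate(block):
        for v in operands:
            # Values defined outside the block have no defining node here.
            if v in last_def and (last_def[v], i) not in edges:
                edges.append((last_def[v], i))
        last_def[dest] = i
    return edges


# Hypothetical block:
#   0: a := load p
#   1: b := a + 1
#   2: c := a * 2
#   3: d := b + c
block = [
    ("a", ["p"]),
    ("b", ["a"]),
    ("c", ["a"]),
    ("d", ["b", "c"]),
]
edges = def_use_edges(block)
print(edges)
```

No edge connects statements 1 and 2, so the graph places no constraint on their relative order, while both must follow statement 0 and precede statement 3.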

SSA form
• Static Single Assignment Form
• Encodes information about data and control flow
• Two constraints:
  • each definition has a unique name
  • each use refers to a single definition
    • all uses reached by a definition are renamed
• Example:
    x := 5                     x0 := 5
    x := x+1      becomes      x1 := x0 + 1
    y := x * 2                 y0 := x1 * 2
• What if we have a loop?
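
For code without branches, the renaming in this example can be done in a single forward pass. The sketch below handles only straight-line code and makes several assumptions: statements are encoded as `(dest, tokens)`, a use with no earlier definition keeps its original name, and the program contains no variable that already looks like a versioned name such as `x0`.

```python
def to_ssa(stmts):
    """stmts: list of (dest, tokens). Returns the renamed statements."""
    next_version = {}  # variable -> number to give its next definition
    current = {}       # variable -> name of its most recent definition
    out = []
    for dest, tokens in stmts:
        # Rename uses first: the right-hand side sees the previous definition.
        rhs = [current.get(tok, tok) for tok in tokens]
        n = next_version.get(dest, 0)
        next_version[dest] = n + 1
        current[dest] = f"{dest}{n}"       # each definition gets a unique name
        out.append((current[dest], rhs))
    return out


stmts = [
    ("x", ["5"]),
    ("x", ["x", "+", "1"]),
    ("y", ["x", "*", "2"]),
]
renamed = [f"{dest} := {' '.join(rhs)}" for dest, rhs in to_ssa(stmts)]
print(renamed)
```

The pass works here because every use has exactly one reaching definition. Once control flow can merge, `current` would need a different value on each incoming path, which is the problem the loop question raises.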

SSA form
• The compiler inserts special join functions (called φ-functions) at points where different control flow paths meet.
• Example:
    read(x)                    read(x0)
    if (x>0)                   if (x0>0)
        y := 5                     y0 := 5
    else          becomes      else
        y := 10                    y1 := 10
    x := y                     y2 := φ(y0, y1)
                               x1 := y2

SSA form
• Example 2:
    x := 0                         x0 := 0
    i := 1                         i0 := 1
    while (i<10)      becomes      if (i0>=10) goto L2
        x := x+i               L1: x1 := φ(x0, x2)
        i := i+1                   i1 := φ(i0, i2)
                                   x2 := x1+i1
                                   i2 := i1+1
                                   if (i2<10) goto L1
                               L2: x3 := φ(x0, x2)
                                   i3 := φ(i0, i2)

SSA form • Note: φ is not an executable function • A program is in SSA form if • each variable is assigned a value in exactly one statement • each use of a variable is dominated by its definition. • point x dominates point y if every path from the start to y goes through x
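
The dominance relation can be computed with a simple iterative fixed-point over the control flow graph. This is a sketch of the textbook set-based iteration, not an efficient algorithm; the graph encoding (a dict of successor lists in which every node appears as a key) and the node names are assumptions, and every node is assumed reachable from the entry.

```python
def dominators(succ, entry):
    """succ: dict node -> list of successors. Returns dict node -> set of dominators."""
    nodes = set(succ)
    preds = {n: set() for n in nodes}
    for n, outs in succ.items():
        for s in outs:
            preds[s].add(n)

    dom = {n: set(nodes) for n in nodes}   # start from "everything dominates n"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            # x dominates n if x is n itself or x dominates every predecessor of n.
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


# Hypothetical CFG of an if/else: two branches that meet at a join block.
succ = {
    "entry": ["then", "else"],
    "then": ["join"],
    "else": ["join"],
    "join": [],
}
dom = dominators(succ, "entry")
print(dom)
```

Neither branch dominates the join block, so a definition made in only one branch cannot satisfy the dominance condition for a use after the join. A φ-function placed at the join provides a new definition that does dominate such uses.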

SSA form • Why use SSA? • explicit def-use pairs • no write-after-read and write-after-write dependences • speeds up many dataflow optimizations • But • too many temp variables, φ-functions • limited to scalars • how to handle arrays?
