CMPUT680 - Fall 2003
Topic 6: Optimal Code Generation
José Nelson Amaral
http://www.cs.ualberta.ca/~amaral/courses/680
CMPUT 680 - Compiler Design and Optimization

Data Dependency Graph

Basic block B3:
    (a) t1 := ld(x);
    (b) t2 := t1 + 4;
    (c) t3 := t1 * 8;
    (d) t4 := t1 - 4;
    (e) t5 := t1 / 2;
    (f) t6 := t2 * t3;
    (g) t7 := t4 - t5;
    (h) t8 := t6 * t7;
    (i) st(y,t8);

[Figure: data dependency graph of B3, built up one level at a time: node a; then b, c, d, e; then f, g; then h; then i.]
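
The graph for B3 can be rebuilt mechanically from the statement list. The following is a minimal sketch, assuming that only true (read-after-write) dependences through the temporaries t1..t8 matter and that the load of x and the store to y do not alias. The names `BLOCK`, `TEMP`, and `flow_edges` are illustrative and do not come from the slides.

```python
import re

# Three-address statements of basic block B3, keyed by the slide's node names.
BLOCK = {
    "a": "t1 := ld(x)",
    "b": "t2 := t1 + 4",
    "c": "t3 := t1 * 8",
    "d": "t4 := t1 - 4",
    "e": "t5 := t1 / 2",
    "f": "t6 := t2 * t3",
    "g": "t7 := t4 - t5",
    "h": "t8 := t6 * t7",
    "i": "st(y,t8)",
}

# Temporaries are written t<digits>; x and y are memory names, not temporaries.
TEMP = re.compile(r"\bt\d+\b")


def flow_edges(block):
    """Return sorted (producer, consumer) pairs for read-after-write dependences."""
    defined_by = {}  # temporary -> name of the statement that defines it
    edges = set()
    for name, stmt in block.items():
        lhs, sep, rhs = stmt.partition(":=")
        # A statement without ":=" (the store) only uses temporaries.
        used = TEMP.findall(rhs if sep else stmt)
        for temp in used:
            if temp in defined_by:
                edges.add((defined_by[temp], name))
        if sep:
            defined_by[lhs.strip()] = name
    return sorted(edges)


EDGES = flow_edges(BLOCK)
for producer, consumer in EDGES:
    print(f"{producer} -> {consumer}")
```

Under these assumptions the sketch yields eleven edges: a feeds b, c, d and e; b and c feed f; d and e feed g; f and g feed h; and h feeds i.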

Code Generation

Problem: how do we generate optimal code for a basic block specified by its DAG representation?

If the DAG is a tree, we can use the Sethi-Ullman algorithm to generate code that is optimal in terms of program length or number of registers used. (Aho-Sethi-Ullman, p. 557)

Example (Aho-Sethi-Ullman, p. 558)

    t1 := a + b
    t2 := c + d
    t3 := e - t2
    t4 := t1 - t3

Expression tree: t4 (-) has children t1 and t3; t1 (+) has children a and b; t3 (-) has children e and t2; t2 (+) has children c and d.

Generate code for a machine with two registers. Assume that only t4 is live at the exit of the basic block and that SUB only works with registers.

Evaluation order: t1, t2, t3, t4

    LOAD  a, R0
    ADD   b, R0
    LOAD  c, R1
    ADD   d, R1
    STORE R0, t1    (spill: both registers are in use, so t3 cannot be evaluated yet)
    LOAD  e, R0
    SUB   R1, R0
    LOAD  t1, R1
    SUB   R0, R1
    STORE R1, t4

Result: 10 instructions, 1 spill.

Example: can we do better? Yes! (Aho-Sethi-Ullman, p. 559)

Same four statements, same two-register machine.

Evaluation order: t2, t3, t1, t4

    LOAD  c, R0
    ADD   d, R0
    LOAD  e, R1
    SUB   R0, R1
    LOAD  a, R0
    ADD   b, R0
    SUB   R1, R0
    STORE R0, t4

Result: 8 instructions, no spills.
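
To check that the two instruction sequences above compute the same value, they can be run on a toy interpreter. This is a sketch only: it assumes the `OP src, dst` convention suggested by the listings (the destination is the second operand, so `SUB R1, R0` means R0 := R0 - R1), and it counts as a spill any STORE to a name other than the live-out temporary t4. The function `run`, the sample values in `MEM`, and the spill-counting rule are illustrative choices, not part of the slides.

```python
NAIVE = [  # evaluation order t1, t2, t3, t4
    "LOAD a, R0", "ADD b, R0", "LOAD c, R1", "ADD d, R1",
    "STORE R0, t1", "LOAD e, R0", "SUB R1, R0",
    "LOAD t1, R1", "SUB R0, R1", "STORE R1, t4",
]

BETTER = [  # evaluation order t2, t3, t1, t4
    "LOAD c, R0", "ADD d, R0", "LOAD e, R1", "SUB R0, R1",
    "LOAD a, R0", "ADD b, R0", "SUB R1, R0", "STORE R0, t4",
]


def run(program, memory, live_out="t4", num_regs=2):
    """Execute the program; return (value of live_out, instruction count, spills)."""
    mem = dict(memory)
    regs = {f"R{i}": 0 for i in range(num_regs)}
    spills = 0
    for line in program:
        op, operands = line.split(None, 1)
        src, dst = (s.strip() for s in operands.split(","))
        if op == "STORE":
            mem[dst] = regs[src]
            if dst != live_out:  # a store of anything but the result is a spill
                spills += 1
            continue
        # The source operand is either a register or a memory name.
        value = regs[src] if src in regs else mem[src]
        if op == "LOAD":
            regs[dst] = value
        elif op == "ADD":
            regs[dst] += value
        elif op == "SUB":
            regs[dst] -= value
        else:
            raise ValueError(f"unknown opcode: {op}")
    return mem[live_out], len(program), spills


MEM = {"a": 7, "b": 5, "c": 3, "d": 2, "e": 10}  # arbitrary sample inputs
print(run(NAIVE, MEM))   # (7, 10, 1)
print(run(BETTER, MEM))  # (7, 8, 0)
```

With these sample inputs both sequences leave (7 + 5) - (10 - (3 + 2)) = 7 in t4; only the instruction and spill counts differ.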

Example: why the improvement?

In the second order, t4 is evaluated immediately after t1, its leftmost argument. The value of t1 is therefore still in a register when t4 needs it and never has to be stored to memory. (Aho-Sethi-Ullman, p. 559)

Heuristic Node Listing Algorithm for a DAG (Aho-Sethi-Ullman, p. 560)

(1) while unlisted interior nodes remain do
(2)     select an unlisted node n, all of whose parents have been listed;
(3)     list n;
(4)     while the leftmost child m of n has no unlisted parents and is not a leaf do
            /* since n was just listed, m is not yet listed */
(5)         list m;
(6)         n := m;
        endwhile
    endwhile

Node Listing Example

[Figure: DAG with interior nodes 1, 2, 3, 4, 5, 6, 8 and leaves c (7), d (9), e (10), a (11), b (12).]

Steps (1)-(3): node 1 has no parents, so select n = 1 and list it. List: 1
Step (4): the leftmost child of 1 is m = 2, which has no unlisted parents; list it. List: 1 2
Step (4): the leftmost child of 2 is m = 6, which still has an unlisted parent, so the inner loop ends.
Steps (1)-(3): select n = 3 and list it. List: 1 2 3
Step (4): m = 4 has no unlisted parents; list it. List: 1 2 3 4
Step (4): m = 5; list it. List: 1 2 3 4 5
Step (4): m = 6 now has no unlisted parents; list it. List: 1 2 3 4 5 6
Step (4): the leftmost child of 6 is the leaf a, so the inner loop ends.
Steps (1)-(3): select n = 8 and list it. List: 1 2 3 4 5 6 8

The interior nodes are evaluated in the reverse of the listing order: 8, 6, 5, 4, 3, 2, 1. For this DAG, that evaluation order is optimal regardless of the number of registers available.
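
The listing heuristic is short enough to sketch directly. The version below is illustrative rather than the textbook's code: step (2) leaves the choice of node open, so this sketch simply takes the first eligible node in a caller-supplied `order` (an assumption). The edges in `DAG` are also an assumption, reconstructed from the trace in this example and from the numbering of the figure; only the left/right child structure matters to the heuristic, so operators are omitted.

```python
def node_listing(children, order):
    """List interior nodes of a DAG with the heuristic of steps (1)-(6).

    children: interior node -> (left child, right child); leaves are not keys.
    order:    tie-break order used at step (2) (an assumption of this sketch).
    """
    parents = {}
    for node, kids in children.items():
        for kid in kids:
            parents.setdefault(kid, set()).add(node)

    listed, done = [], set()

    def ready(node):
        # Unlisted, and every parent has already been listed.
        return node not in done and parents.get(node, set()) <= done

    while len(done) < len(children):                 # step (1)
        n = next(x for x in order if x in children and ready(x))  # step (2)
        listed.append(n)                             # step (3)
        done.add(n)
        while True:                                  # step (4)
            m = children[n][0]                       # leftmost child of n
            if m not in children or not ready(m):    # leaf, or has an unlisted parent
                break
            listed.append(m)                         # step (5)
            done.add(m)
            n = m                                    # step (6)
    return listed


# Assumed shape of the 12-node example DAG (leaves named by their labels).
DAG = {
    1: (2, 3), 2: (6, 4), 3: (4, "e"), 4: (5, 8),
    5: (6, "c"), 6: ("a", "b"), 8: ("d", "e"),
}
LISTING = node_listing(DAG, sorted(DAG))
print(LISTING)        # [1, 2, 3, 4, 5, 6, 8]
print(LISTING[::-1])  # evaluation order: [8, 6, 5, 4, 3, 2, 1]

# The expression tree from the earlier two-register example.
TREE = {"t4": ("t1", "t3"), "t1": ("a", "b"), "t3": ("e", "t2"), "t2": ("c", "d")}
print(node_listing(TREE, ["t1", "t2", "t3", "t4"])[::-1])  # ['t2', 't3', 't1', 't4']
```

On the earlier expression tree the same heuristic produces the order t2, t3, t1, t4, which is the order that avoided the spill.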

Optimal Code Generation for Trees

If the DAG representing the data flow in a basic block is a tree, then for some machine models there is a simple algorithm (the Sethi-Ullman algorithm) that gives the optimal order. The order is optimal in the sense that it yields the shortest instruction sequence over all instruction sequences that evaluate the tree.

Sethi-Ullman Algorithm

Intuition:
1. Label each node with the number of registers required to generate code for that node.
2. Generate code from the top down, always generating code first for the child that requires the most registers.

Sethi-Ullman Algorithm (Intuition)

Left leaf: label 1.
Right leaves: label 0.
Bottom-up labeling: visit a node after all its children are labeled.

Labeling Algorithm

[Algorithm listing: not preserved in this transcript.]
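
The following sketch uses the standard Sethi-Ullman labeling rule for binary trees, on the assumption that this is what the slides showed: a left leaf gets 1, a right leaf gets 0, and an interior node whose children have labels l and r gets max(l, r) if they differ and l + 1 if they are equal. The function name `label` and the dictionary representation of the tree are illustrative, and leaf names are assumed to be unique.

```python
def label(tree, node, is_left=True, labels=None):
    """Compute Sethi-Ullman labels bottom-up for a binary expression tree.

    tree: interior node -> (left child, right child); leaves are not keys.
    Returns a dict mapping every node and leaf to its label.
    """
    if labels is None:
        labels = {}
    if node not in tree:
        # Leaf: a left operand must be loaded into a register, a right
        # operand can be used directly from memory.
        labels[node] = 1 if is_left else 0
    else:
        left, right = tree[node]
        label(tree, left, True, labels)    # children first (bottom-up)
        label(tree, right, False, labels)
        l, r = labels[left], labels[right]
        # Equal needs force one extra register to hold the first result.
        labels[node] = l + 1 if l == r else max(l, r)
    return labels


# The expression tree for t4 := (a + b) - (e - (c + d)).
TREE = {"t4": ("t1", "t3"), "t1": ("a", "b"), "t3": ("e", "t2"), "t2": ("c", "d")}
LABELS = label(TREE, "t4")
print(LABELS["t2"], LABELS["t3"], LABELS["t1"], LABELS["t4"])  # 1 2 1 2
```

The root label of 2 indicates that, under this rule, two registers suffice to evaluate the tree without spilling.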

Example

Tree: t4 has children t1 and t3; t1 has children a and b; t3 has children e and t2; t2 has children c and d.

Labeling leaves: a leaf that is the leftmost child of its parent is 1, the others are 0.
    a = 1, b = 0, e = 1, c = 1, d = 0
Labeling t2: its children c and d have labels 1 and 0, thus label(t2) = 1.
Likewise, label(t3) = 2 (its children e and t2 both have label 1) and label(t1) = 1 (its children a and b have labels 1 and 0).
Finally, label(t4) = 2 (its children t1 and t3 have labels 1 and 2).

See Aho, Sethi, Ullman (p. 564) for the code-generating algorithm.
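
As a rough illustration of how the labels drive code generation, here is a simplified sketch in the spirit of the textbook procedure cited above. It is not a transcription of that procedure: the spill case (both subtrees needing every register) is left out, right leaves are assumed to be usable as memory operands as in `ADD b, R0`, and the names `gencode`, `OPS`, and `rstack` are illustrative. The labels are copied from the example above.

```python
def gencode(tree, ops, labels, root, num_regs=2):
    """Emit 'OP src, dst' code for a labeled tree, heavier subtree first."""
    rstack = [f"R{i}" for i in range(num_regs)]  # rstack[0] is the top
    code = []

    def swap():
        rstack[0], rstack[1] = rstack[1], rstack[0]

    def gen(n):
        if n not in tree:  # left leaf: load it into the top register
            code.append(f"LOAD {n}, {rstack[0]}")
            return
        left, right = tree[n]
        op = ops[n]
        if labels[right] == 0:
            # Right child is a leaf: use it as a memory operand.
            gen(left)
            code.append(f"{op} {right}, {rstack[0]}")
        elif labels[left] < labels[right] and labels[left] < num_regs:
            # Right subtree is heavier: evaluate it first, keep its register.
            swap()
            gen(right)
            r = rstack.pop(0)
            gen(left)
            code.append(f"{op} {r}, {rstack[0]}")
            rstack.insert(0, r)
            swap()
        elif labels[right] <= labels[left] and labels[right] < num_regs:
            # Left subtree is at least as heavy: evaluate it first.
            gen(left)
            r = rstack.pop(0)
            gen(right)
            code.append(f"{op} {rstack[0]}, {r}")
            rstack.insert(0, r)
        else:
            raise NotImplementedError("spilling is omitted from this sketch")

    gen(root)
    return code


TREE = {"t4": ("t1", "t3"), "t1": ("a", "b"), "t3": ("e", "t2"), "t2": ("c", "d")}
OPS = {"t4": "SUB", "t1": "ADD", "t3": "SUB", "t2": "ADD"}
LABELS = {"a": 1, "b": 0, "e": 1, "c": 1, "d": 0,
          "t1": 1, "t2": 1, "t3": 2, "t4": 2}

CODE = gencode(TREE, OPS, LABELS, "t4")
print("\n".join(CODE))
```

For the example tree this emits seven instructions and leaves the result in R0; adding a final `STORE R0, t4` gives eight instructions with no spills, the same count as the hand-written sequence in the earlier example, although the first few instructions appear in a different order.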