730 likes | 911 Views
Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 2. Mayer Goldberg and Roman Manevich Ben-Gurion University. Today. Review top-down parsing Recursive descent LL(1) parsing Start bottom-up parsing LR(k) SLR(k) LALR(k). Top-down parsing.
E N D
Winter 2012-2013Compiler PrinciplesSyntax Analysis (Parsing) – Part 2 Mayer Goldberg and Roman Manevich Ben-Gurion University
Today • Review top-down parsing • Recursive descent • LL(1) parsing • Start bottom-up parsing • LR(k) • SLR(k) • LALR(k)
Top-down parsing • Parser repeatedly faces the following problem • Given program P starting with terminal t • Non-terminal N with list of possible production rules: N α1 … N αk • Predict which rule can should be used to derive P
Recursive descent parsing • Define a function for every nonterminal • Every function work as follows • Find applicable production rule • Terminal function checks match with next input token • Nonterminal function calls (recursively) other functions • If there are several applicable productions for a nonterminal, use lookahead
Boolean expressions example E LIT | (E OP E) | not E LIT true|false OP and | or | xor not ( not true or false ) production to apply known from next token E E E => notE => not ( E OP E ) => not ( not E OP E ) => not ( not LIT OP E ) => not ( not true OP E ) => not ( not true or E ) => not ( not true or LIT ) => not ( not true or false ) not ( E OP E ) not LIT or LIT true false
Flavors of top-down parsers • Manually constructed • Recursive descent (previous lecture, review now) • Generated (this lecture) • Based on pushdown automata • Does not use recursion
Recursive descent parsing • Define a function for every nonterminal • Every function work as follows • Find applicable production rule • Terminal function checks match with next input token • Nonterminal function calls (recursively) other functions • If there are several applicable productions for a nonterminal, use lookahead
Matching tokens E LIT | (E OP E) | not E LIT true|false OP and | or | xor match(token t) { if (current == t) current = next_token() else error } Variable current holds the current input token
Functions for nonterminals E LIT | (E OP E) | not E LIT true|false OP and | or | xor E() { if (current {TRUE, FALSE}) // E LIT LIT(); else if (current == LPAREN) // E ( E OP E ) match(LPAREN); E(); OP(); E(); match(RPAREN); else if (current == NOT) // E not E match(NOT); E(); else error; } LIT() { if (current == TRUE) match(TRUE); else if (current == FALSE) match(FALSE); else error; }
Implementation via recursion E() { if (current {TRUE, FALSE}) LIT(); else if (current == LPAREN) match(LPARENT); E(); OP(); E(); match(RPAREN); else if (current == NOT) match(NOT); E(); else error; } E → LIT | ( E OP E ) | not E LIT → true | false OP → and | or | xor LIT() { if (current == TRUE) match(TRUE); else if (current == FALSE) match(FALSE); else error; } OP() { if (current == AND) match(AND); else if (current == OR) match(OR); else if (current == XOR) match(XOR); else error; }
How is prediction done? p. 189 • For simplicity, let’s assume no null production rules • See book for general case • Find out the token that can appear first in a rule – FIRST sets
FIRST sets • For every production rule Aα • FIRST(α) = all terminals that α can start with • Every token that can appear as first in α under some derivation for α • In our Boolean expressions example • FIRST( LIT ) = { true, false } • FIRST( ( E OP E ) ) = { ‘(‘ } • FIRST( not E ) = { not } • No intersection between FIRST sets => can always pick a single rule • If the FIRST sets intersect, may need longer lookahead • LL(k) = class of grammars in which production rule can be determined using a lookahead of k tokens • LL(1) is an important and useful class
Computing FIRST sets Assume no null productions A Initially, for all nonterminalsA, setFIRST(A) = { t | Atω for some ω } Repeat the following until no changes occur:for each nonterminal A for each production A Bω set FIRST(A) = FIRST(A) ∪ FIRST(B) This is known as fixed-point computation
FIRST sets computation example STMT if EXPR then STMT | while EXPR do STMT | EXPR ; EXPR TERM -> id | zero? TERM | not EXPR | ++ id | -- id TERM id | constant
1. Initialization STMT if EXPR then STMT | while EXPR do STMT | EXPR ; EXPR TERM -> id | zero? TERM | not EXPR | ++ id | -- id TERM id | constant
2. Iterate 1 STMT if EXPR then STMT | while EXPR do STMT | EXPR ; EXPR TERM -> id | zero? TERM | not EXPR | ++ id | -- id TERM id | constant
2. Iterate 2 STMT if EXPR then STMT | while EXPR do STMT | EXPR ; EXPRTERM -> id | zero? TERM | not EXPR | ++ id | -- id TERM id | constant
2. Iterate 3 – fixed-point STMT if EXPR then STMT | while EXPR do STMT | EXPR ; EXPR TERM -> id | zero? TERM | not EXPR | ++ id | -- id TERM id | constant
LL(k) grammars • A grammar is in the class LL(K) when it can be derived via: • Top-down derivation • Scanning the input from left to right (L) • Producing the leftmost derivation (L) • With lookahead of k tokens (k) • For every two productions Aα and Aβ we have FIRST(α) ∩ FIRST(β) = {}(and FIRST(A) ∩ FOLLOW(A) = {} for null productions) • A language is said to be LL(k) when it has an LL(k) grammar • What can we do if grammar is not LL(k)?
LL(k) parsing via pushdown automata • Pushdown automaton uses • Prediction stack • Input stream • Transition table • nonterminals x tokens -> production alternative • Entry indexed by nonterminal N and token t contains the alternative of N that must be predicated when current input starts with t
LL(k) parsing via pushdown automata • Two possible moves • Prediction • When top of stack is nonterminal N, pop N, lookup table[N,t]. If table[N,t] is not empty, push table[N,t] on prediction stack, otherwise – syntax error • Match • When top of prediction stack is a terminal T, must be equal to next input token t. If (t == T), pop T and consume t • If (t ≠ T) syntax error • Parsing terminates when prediction stack is empty • If input is empty at that point, success. Otherwise, syntax error
Model of non-recursivepredictive parser Predictive Parsing program Stack Output Parsing Table
Example transition table (1) E → LIT (2) E → ( E OP E ) (3) E → not E (4) LIT → true (5) LIT → false (6) OP → and (7) OP → or (8) OP → xor Which rule should be used Input tokens Nonterminals
Running parser example aacbb$ A aAb | c
Illegal input example abcbb$ A aAb | c
Using top-down parsing approach • Compute parsing table • If table is conflict-free then we have an LL(k) parser • If table contains conflicts investigate • If grammar is ambiguous try to disambiguate • Try using left-factoring/substitution/left-recursion elimination to remove conflicts
Marking “end-of-file” Sometimes it will be useful to transform a grammar G with start non-terminal S into a grammar G’ with a new start non-terminal S‘ with a new production rule S’ S $where $ is not part of the set of tokens To parse an input P with G’ we change it into P$ Simplifies top-down parsing with null productions and LR parsing
Bottom-up parsing • No problem with left recursion • Widely used in practice • Shift-reduce parsing: LR(k), SLR, LALR • All follow the same pushdown-based algorithm • Read input left-to-right producing rightmost derivation • Differ on type of “LR Items” • Parser generator CUP implements LALR
Some terminology • The opposite of derivation is called reduction • Let Aα be a production rule • Let βAµ be a sentence • Replace left-hand side of rule in sentence:βAµ=> βαµ • A handle is a substring that is reduced during a series of steps in a rightmost derivation
Rightmost derivation example 1 + (2) + (3) E E + (E) E i E + (2) + (3) Each non-leaf node represents a handle E + (E) + (3) Rightmost derivation in reverse E + (3) E E + (E) E E E E E 1 + ( 2 ) + 3 ( )
LR item To be matched Already matched Input N αβ Hypothesis about αβ being a possible handle, so far we’ve matched α, expecting to see β
LR items N αβ Shift Item N αβ Reduce Item
Example Z exprEOF expr term | expr+ term term ID | (expr) Z E $ E T | E + T T i | ( E ) (just shorthand of the grammar on the top)
Example: parsing with LR items Z E $ E T | E + T T i | ( E ) i + i $ Z E $ Why do we need these additional LR items? Where do they come from? What do they mean? E T E E + T T i T (E )
-closure { Z E $, Z E $ E T | E + T T i | ( E ) E T, -closure({Z E $}) = E E + T, T i , T ( E ) } Given a set S of LR(0) items If P αNβ is in S then for each rule N in the grammarS must also contain N
Example: parsing with LR items Z E $ E T | E + T T i | ( E ) i + i $ Remember position from which we’re trying to reduce Items denote possible future handles Z E $ E T E E + T T i T ( E )
Example: parsing with LR items Z E $ E T | E + T T i | ( E ) i + i $ Match items with current token Z E $ T i Reduce item! E T E E + T T i T ( E )
Example: parsing with LR items Z E $ E T | E + T T i | ( E ) T + i $ i Z E $ Reduce item! E T E T E E + T T i T ( E )
Example: parsing with LR items Z E $ E T | E + T T i | ( E ) E + i $ T i Z E $ Reduce item! E T E T E E + T T i T ( E )
Example: parsing with LR items Z E $ E T | E + T T i | ( E ) E + i $ T i Z E $ Z E$ E T E E + T E E+ T T i T ( E )
Example: parsing with LR items Z E $ E T | E + T T i | ( E ) E + i $ T i Z E $ Z E$ E E+T E T T i E E + T E E+ T T ( E ) T i T ( E )
Example: parsing with LR items Z E $ E T | E + T T i | ( E ) E + T $ i T i Z E $ Z E$ E E+T E T T i E E + T E E+ T T ( E ) T i T ( E )
Example: parsing with LR items Z E $ E T | E + T T i | ( E ) E + T $ T i i Reduce item! Z E $ Z E$ E E+T E E+T E T T i E E + T E E+ T T ( E ) T i T ( E )
Example: parsing with LR items Z E $ E T | E + T T i | ( E ) E $ E + T T i i Z E $ Z E$ E T E E + T E E+ T T i T ( E )
Example: parsing with LR items Z E $ E T | E + T T i | ( E ) E $ E + T T i Reduce item! i Z E $ Z E$ Z E$ E T E E + T E E+ T T i T ( E )
Example: parsing with LR items Z E $ E T | E + T T i | ( E ) Z E $ E + T Reduce item! i T Z E $ i Z E$ Z E$ E T E E + T E E+ T T i T ( E )
Computing item sets • Initial set • Z is in the start symbol • -closure({ Zα | Zαis in the grammar } ) • Next set from a set S and the next symbol X • step(S,X) = { NαXβ | NαXβ in the item set S} • nextSet(S,X) = -closure(step(S,X))
LR(0) automaton example reduce state shift state q6 E T T T q7 q0 T (E) E T E E + T T i T (E) Z E$ E T E E + T T i T (E) ( q5 i i T i E E ( ( i q1 q8 q3 Z E$ E E+ T T (E) E E+T E E+T T i T (E) + + $ ) q9 q2 Z E$ T (E) T q4 E E + T
GOTO/ACTION tables empty – error move ACTION Table GOTO Table
LR(0) parser tables • Two types of rows: • Shift row – tells which state to GOTO for current token • Reduce row – tells which rule to reduce (independent of current token) • GOTO entries are blank