Top-Down Parsing

owena

Presentation Transcript


  1. Top-Down Parsing

  2. Where Are We?
     • Source code: if (b==0) a = “Hi”;
     • Lexical Analysis produces the token stream: if ( b == 0 ) a = “Hi” ;
     • Syntactic Analysis produces the Abstract Syntax Tree (AST), with nodes if, ==, =, ;, b, 0, a, “Hi”
     • Semantic Analysis comes next
     • The question at this stage: do tokens conform to the language syntax?

  3. Last Time
     • Parse trees vs. ASTs
     • Derivations
       • Leftmost vs. rightmost
     • Grammar ambiguity

  4. Parsing
     • What is parsing?
       • Discovering the derivation of a string, if one exists
       • Harder than generating strings
     • Two major approaches
       • Top-down parsing
       • Bottom-up parsing
     • Neither approach works on all context-free grammars
       • Properties of the grammar determine parse-ability
       • We may be able to transform a grammar

  5. Two Approaches
     • Top-down parsers: LL(1), recursive descent
       • Start at the root of the parse tree and grow toward the leaves
       • Pick a production and try to match the input
       • Bad “pick” ⇒ may need to backtrack
     • Bottom-up parsers: LR(1), operator precedence
       • Start at the leaves and grow toward the root
       • As input is consumed, encode possible parse trees in an internal state
       • Bottom-up parsers handle a large class of grammars

  6. Grammars and Parsers
     • LL(1) parsers
       • Left-to-right input
       • Leftmost derivation
       • 1 symbol of look-ahead
       • Grammars that these parsers can handle are called LL(1) grammars
     • LR(1) parsers
       • Left-to-right input
       • Rightmost derivation (constructed in reverse)
       • 1 symbol of look-ahead
       • Grammars that these parsers can handle are called LR(1) grammars
     • Also: LL(k), LR(k), LALR, …

  7. Top-Down Parsing
     • Start with the root of the parse tree
       • Root of the tree: node labeled with the start symbol
     • Algorithm: repeat until the fringe of the parse tree matches the input string
       • At a node A, select a production for A and add a child node for each symbol on its right-hand side
       • If a terminal symbol is added that doesn’t match the input, backtrack
       • Find the next node to be expanded (a non-terminal)
     • Done when:
       • Leaves of the parse tree match the input string (success)
       • All productions are exhausted in backtracking (failure)
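
The algorithm on this slide can be sketched as a small backtracking recognizer. This is a minimal illustration, not the deck's own code: the grammar below is a hypothetical right-recursive expression grammar (a left-recursive one would make this search loop forever), and the names GRAMMAR, matches, parse, and recognize are made up for the example. Tokens are assumed to be pre-split strings, with any alphabetic token treated as an id and any digit string as a num.

```python
# Hypothetical right-recursive expression grammar, chosen for illustration.
# Non-terminals are the dictionary keys; everything else is a terminal.
GRAMMAR = {
    "expr":   [["term", "+", "expr"], ["term", "-", "expr"], ["term"]],
    "term":   [["factor", "*", "term"], ["factor", "/", "term"], ["factor"]],
    "factor": [["num"], ["id"]],
}


def matches(terminal, token):
    """Return True if one input token matches a terminal symbol."""
    if terminal == "num":
        return token.isdigit()
    if terminal == "id":
        return token.isalpha()
    return terminal == token


def parse(symbols, tokens, pos):
    """Yield each input position reachable after deriving `symbols` from tokens[pos:]."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Non-terminal: try each production in turn. Moving on to the next
        # alternative after a failed one is the backtracking step.
        for production in GRAMMAR[first]:
            yield from parse(production + rest, tokens, pos)
    elif pos < len(tokens) and matches(first, tokens[pos]):
        # Terminal that matches the next token: consume it and continue.
        yield from parse(rest, tokens, pos + 1)
    # Terminal mismatch: yield nothing, so the caller tries another production.


def recognize(tokens, start="expr"):
    """Succeed only if some derivation consumes the whole input."""
    return any(end == len(tokens) for end in parse([start], tokens, 0))
```

For example, recognize("x - 2 * y".split()) succeeds only after the first guess (term + expr) has been abandoned, while recognize("x -".split()) exhausts every production and fails.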

  8. Example
     • Expression grammar (with precedence)
     • Input string: x – 2 * y

  9. Example

       Rule  Sentential form    Input (↑ = current position in the input stream)
             expr               ↑ x - 2 * y
       2     expr + term        ↑ x - 2 * y
       3     term + term        ↑ x - 2 * y
       6     factor + term      ↑ x - 2 * y
       8     <id> + term        x ↑ - 2 * y
       -     <id,x> + term      x ↑ - 2 * y

     • Problem:
       • Can’t match the next terminal: the sentential form expects + but the input has -
       • We guessed wrong at step 2

  10. Backtracking
     • Roll back productions
     • Choose a different production for expr
     • Continue

       Rule  Sentential form    Input (↑ = current position)
             expr               ↑ x - 2 * y
       2     expr + term        ↑ x - 2 * y
       3     term + term        ↑ x - 2 * y
       6     factor + term      ↑ x - 2 * y
       8     <id> + term        x ↑ - 2 * y
       ?     <id,x> + term      x ↑ - 2 * y

     • Undo all these productions

  11. Retrying

       Rule  Sentential form      Input (↑ = current position)
             expr                 ↑ x - 2 * y
       2     expr - term          ↑ x - 2 * y
       3     term - term          ↑ x - 2 * y
       6     factor - term        ↑ x - 2 * y
       8     <id> - term          x ↑ - 2 * y
       -     <id,x> - term        x - ↑ 2 * y
       6     <id,x> - factor      x - ↑ 2 * y
       7     <id,x> - <num>       x - 2 ↑ * y

     • Problem:
       • More input to read: every symbol in the sentential form is a terminal, but * y is still unread
       • Another cause of backtracking

  12. Successful Parse
     • All terminals match – we’re finished

       Rule  Sentential form             Input (↑ = current position)
             expr                        ↑ x - 2 * y
       2     expr - term                 ↑ x - 2 * y
       3     term - term                 ↑ x - 2 * y
       6     factor - term               ↑ x - 2 * y
       8     <id> - term                 x ↑ - 2 * y
       -     <id,x> - term               x - ↑ 2 * y
       4     <id,x> - term * factor      x - ↑ 2 * y
       6     <id,x> - factor * factor    x - ↑ 2 * y
       7     <id,x> - <num> * factor     x - 2 ↑ * y
       -     <id,x> - <num,2> * factor   x - 2 * ↑ y
       8     <id,x> - <num,2> * <id>     x - 2 * y ↑

  13. Other Possible Parses

       Rule  Sentential form                     Input (↑ = current position)
             expr                                ↑ x - 2 * y
       2     expr + term                         ↑ x - 2 * y
       2     expr + term + term                  ↑ x - 2 * y
       2     expr + term + term + term           ↑ x - 2 * y
       2     expr + term + term + term + term    ↑ x - 2 * y

     • Problem: termination
       • A wrong choice leads to infinite expansion (more importantly: without consuming any input!)
       • May not be as obvious as this
       • Our grammar is left recursive

  14. Left Recursion
     • Formally, a grammar is left recursive if ∃ a non-terminal A such that A →* A α (for some string of symbols α)
     • What does →* mean? The derivation may take several steps, so left recursion can be indirect: A → B x, B → A y
     • Bad news: top-down parsers cannot handle left recursion
     • Good news: we can systematically eliminate left recursion
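
The definition above can be checked mechanically. The sketch below is one possible way to do it, under a simplifying assumption that the grammar has no ε-productions, so only the first symbol of each right-hand side matters. The function name and the dictionary representation of a grammar are hypothetical choices for this example.

```python
def left_recursive_nonterminals(grammar):
    """Return the non-terminals A that can derive a string starting with A.

    `grammar` maps each non-terminal to a list of right-hand sides (lists of
    symbols). Assumes no epsilon-productions, so a derivation can only reach
    A in leftmost position through the first symbol of each right-hand side.
    """
    # leftmost[A]: non-terminals that appear first in some production of A.
    leftmost = {
        a: {rhs[0] for rhs in rhss if rhs and rhs[0] in grammar}
        for a, rhss in grammar.items()
    }
    result = set()
    for a in grammar:
        # Depth-first search over "can appear leftmost" edges starting at A.
        seen, stack = set(), list(leftmost[a])
        while stack:
            b = stack.pop()
            if b in seen:
                continue
            seen.add(b)
            stack.extend(leftmost[b])
        if a in seen:  # A is reachable from itself in one or more steps
            result.add(a)
    return result


# The indirect example from the slide: A -> B x, B -> A y
indirect = {"A": [["B", "x"]], "B": [["A", "y"]]}
```

On the slide's example, left_recursive_nonterminals(indirect) reports both A and B, even though neither production mentions its own left-hand side first.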

  15. Removing Left Recursion
     • Two cases of left recursion:
     • Transform as follows:
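
The transformation itself did not survive in this transcript. The standard textbook rewrite for immediate left recursion replaces A → A α | β with A → β A' and A' → α A' | ε. The sketch below implements that standard rewrite, which may differ in detail from what the slide showed. The function name is hypothetical, and the new non-terminal is named by appending "2" only to echo the Expr2 and Term2 routine names that appear later in the deck.

```python
EPSILON = ()  # the empty right-hand side


def remove_immediate_left_recursion(name, productions):
    """Rewrite A -> A a1 | A a2 | b1 | b2 as
           A  -> b1 A2 | b2 A2
           A2 -> a1 A2 | a2 A2 | epsilon
    `productions` is a list of tuples of symbols for the non-terminal `name`.
    Assumes no production of the form A -> A (an empty alpha).
    """
    # Split into left-recursive productions (keeping only alpha) and the rest.
    recursive = [rhs[1:] for rhs in productions if rhs and rhs[0] == name]
    others = [rhs for rhs in productions if not rhs or rhs[0] != name]
    if not recursive:
        return {name: list(productions)}  # nothing to do
    new = name + "2"
    return {
        name: [beta + (new,) for beta in others],
        new: [alpha + (new,) for alpha in recursive] + [EPSILON],
    }


expr_rules = [("Expr", "+", "Term"), ("Expr", "-", "Term"), ("Term",)]
rewritten = remove_immediate_left_recursion("Expr", expr_rules)
```

Here rewritten maps Expr to the single production Term Expr2, and Expr2 to + Term Expr2, - Term Expr2, and ε. Indirect left recursion needs an extra substitution pass first, which this sketch does not attempt.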

  16. Right-Recursive Grammar
     • Two productions with no choice at all
     • All other productions are uniquely identified by a terminal symbol at the start of the RHS
     • We can choose the right production by looking at the next input symbol
       • This is called lookahead
       • BUT, this can be tricky…

  17. Predictive Parsing
     • Given an LL(1) grammar
       • The parser can “predict” the correct expansion
       • Using lookahead and FIRST and FOLLOW sets
     • Two kinds of predictive parsers
       • Recursive descent: often hand-written
       • Table-driven: generate tables from FIRST and FOLLOW sets
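
FIRST and FOLLOW sets can be computed by iterating to a fixed point. The sketch below shows one common formulation. The grammar is an assumption: a right-recursive expression grammar without parentheses, using the non-terminal names Goal, Expr, Expr2, Term, Term2, and Factor that the deck lists for its recursive-descent routines. The helper names and the "ε" and "$" markers are illustrative choices.

```python
EPS, EOF = "ε", "$"

# Assumed grammar; an empty list stands for an epsilon-production.
GRAMMAR = {
    "Goal":   [["Expr"]],
    "Expr":   [["Term", "Expr2"]],
    "Expr2":  [["+", "Term", "Expr2"], ["-", "Term", "Expr2"], []],
    "Term":   [["Factor", "Term2"]],
    "Term2":  [["*", "Factor", "Term2"], ["/", "Factor", "Term2"], []],
    "Factor": [["num"], ["id"]],
}


def first_of_sequence(symbols, first):
    """FIRST of a string of symbols, given the FIRST sets computed so far."""
    result = set()
    for s in symbols:
        s_first = first[s] if s in first else {s}  # a terminal is its own FIRST
        result |= s_first - {EPS}
        if EPS not in s_first:
            return result
    result.add(EPS)  # every symbol can vanish (or the sequence is empty)
    return result


def compute_first(grammar):
    first = {a: set() for a in grammar}
    changed = True
    while changed:  # repeat until no set grows
        changed = False
        for a, rhss in grammar.items():
            for rhs in rhss:
                new = first_of_sequence(rhs, first)
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first


def compute_follow(grammar, first, start):
    follow = {a: set() for a in grammar}
    follow[start].add(EOF)
    changed = True
    while changed:
        changed = False
        for a, rhss in grammar.items():
            for rhs in rhss:
                for i, b in enumerate(rhs):
                    if b not in grammar:
                        continue  # FOLLOW is defined for non-terminals only
                    tail = first_of_sequence(rhs[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:  # what follows b can vanish
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


first = compute_first(GRAMMAR)
follow = compute_follow(GRAMMAR, first, "Goal")
```

For this grammar, FIRST(Expr2) is {+, -, ε} and FOLLOW(Expr2) is {$}. Because the two sets are disjoint, one token of lookahead is enough to decide whether to expand Expr2 or let it vanish.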

  18. Recursive Descent
     • This produces a parser with six mutually recursive routines:
       • Goal
       • Expr
       • Expr2
       • Term
       • Term2
       • Factor
     • Each recognizes one non-terminal (NT) or terminal (T)
     • The term descent refers to the direction in which the parse tree is built.
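
A hand-written recognizer with these six routines might look like the sketch below. The routine names come from the slide, but the grammar they implement is an assumption (Expr → Term Expr2, Expr2 → + Term Expr2 | - Term Expr2 | ε, and likewise for Term, with Factor → num | id). The Parser class, ParseError, and accepts are hypothetical names for this example. Tokens are assumed to be pre-split strings.

```python
class ParseError(Exception):
    pass


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks end of input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def advance(self):
        self.pos += 1

    def goal(self):  # Goal -> Expr
        self.expr()
        if self.lookahead != "$":
            raise ParseError(f"unexpected token {self.lookahead!r}")

    def expr(self):  # Expr -> Term Expr2
        self.term()
        self.expr2()

    def expr2(self):  # Expr2 -> + Term Expr2 | - Term Expr2 | epsilon
        if self.lookahead in ("+", "-"):
            self.advance()
            self.term()
            self.expr2()
        # Any other lookahead: choose the epsilon-production and return.

    def term(self):  # Term -> Factor Term2
        self.factor()
        self.term2()

    def term2(self):  # Term2 -> * Factor Term2 | / Factor Term2 | epsilon
        if self.lookahead in ("*", "/"):
            self.advance()
            self.factor()
            self.term2()

    def factor(self):  # Factor -> num | id
        token = self.lookahead
        if token.isdigit() or token.isalpha():
            self.advance()
        else:
            raise ParseError(f"unexpected token {token!r}")


def accepts(tokens):
    """Return True if the token list is a sentence of the assumed grammar."""
    try:
        Parser(tokens).goal()
        return True
    except ParseError:
        return False
```

Each routine picks its production from the single lookahead token and never backtracks, so accepts("x - 2 * y".split()) runs in one left-to-right pass.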

  19. Next Time …
     • Bottom-up parsers
       • More powerful
       • Widely used: yacc, bison, JavaCUP
     • Overview of YACC
       • Removing shift/reduce and reduce/reduce conflicts
     • Just in case you haven’t started your homework!
