Context Free Grammars and BNF
• In context free grammars (CFGs), structures are independent of the other structures surrounding them
• Backus-Naur form (BNF) notation describes CFGs
• Symbols are either tokens or nonterminal symbols
• Productions are of the form nonterminal → definition, where definition defines the structure of a nonterminal
• Rules may be recursive, with a nonterminal symbol appearing both on the left side of a production and in its own definition
• Metasymbols are used to identify the parts of the production (arrow) and alternative definitions of a nonterminal (vertical bar)
• Next time we’ll extend metasymbols for repeated (braces) or optional (square brackets) structure in a definition (EBNF)
Parse Trees and Abstract Syntax Trees
• Parse trees show derivation of a structure from BNF
  • E.g., number → DIGIT | DIGIT number
• Abstract syntax trees (ASTs) encapsulate the details
  • Very useful for converting between structurally similar forms
[Figure: a parse tree built from number and DIGIT nodes with leaf digits 4, 2, 5, shown beside an abstract syntax tree with hornclause, head, body, and predicate nodes]
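The production number → DIGIT | DIGIT number can be turned into a small parse tree in C++. The following is only a sketch: the names `NumberNode`, `parseNumber`, `depth`, and `toValue` are made up for illustration, and collapsing the tree into a single integer is just one possible "abstract" form, assumed here to show how much detail a parse tree carries compared with what is usually needed.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>

// Parse-tree node for: number -> DIGIT | DIGIT number
// (hypothetical type, not taken from the slides)
struct NumberNode {
    char digit = '0';                  // the DIGIT token at this level
    std::unique_ptr<NumberNode> rest;  // nested number, or null for the DIGIT-only alternative
};

// Builds the parse tree for the digits of text, starting at position pos.
std::unique_ptr<NumberNode> parseNumber(const std::string& text, std::size_t pos = 0) {
    if (pos >= text.size() || !std::isdigit(static_cast<unsigned char>(text[pos]))) {
        throw std::invalid_argument("expected DIGIT");
    }
    auto node = std::make_unique<NumberNode>();
    node->digit = text[pos];
    if (pos + 1 < text.size()) {
        node->rest = parseNumber(text, pos + 1);  // alternative: DIGIT number
    }
    return node;
}

// Number of nested number nodes in the parse tree.
std::size_t depth(const NumberNode& node) {
    return 1 + (node.rest ? depth(*node.rest) : 0);
}

// Collapses the parse tree into a compact form: one integer value.
int toValue(const NumberNode& node) {
    int value = 0;
    for (const NumberNode* p = &node; p != nullptr; p = p->rest.get()) {
        value = value * 10 + (p->digit - '0');
    }
    return value;
}
```

For the input "425", `parseNumber` builds three nested number nodes, while `toValue` reduces them to the single integer 425.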
Ambiguity, Associativity, Precedence
• If any statement in the language has more than one distinct parse tree under a grammar, that grammar is ambiguous
• Ambiguity can be removed implicitly, as in always replacing the leftmost remaining nonterminal (an implementation hack)
• Recursive production structure also can disambiguate
  • E.g., adding another production to the grammar to establish precedence (lower in parse tree gives higher precedence)
  • E.g., replacing exp → exp + exp with alternative productions exp → exp + term or exp → term + exp
• Recursive productions also define associativity
  • I.e., left-recursive form exp → exp + term is left-associative, right-recursive form exp → term + exp is right-associative
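The effect of left versus right recursion on associativity can be made visible by printing the grouping each form produces. The sketch below swaps in subtraction for the + used in the productions above, purely as an assumption for illustration, because with addition both groupings give the same value. The function names `groupLeft` and `groupRight` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Left-recursive form  exp -> exp - term : each new term attaches to
// everything parsed so far, so grouping grows from the left.
std::string groupLeft(const std::vector<std::string>& terms) {
    std::string tree = terms.at(0);
    for (std::size_t i = 1; i < terms.size(); ++i) {
        tree = "(" + tree + " - " + terms[i] + ")";
    }
    return tree;
}

// Right-recursive form  exp -> term - exp : the first term attaches to
// the result of parsing the rest, so grouping grows from the right.
std::string groupRight(const std::vector<std::string>& terms, std::size_t pos = 0) {
    if (pos + 1 == terms.size()) {
        return terms.at(pos);  // last term: no further recursion
    }
    return "(" + terms.at(pos) + " - " + groupRight(terms, pos + 1) + ")";
}
```

With the terms 8, 3, 2, `groupLeft` gives `((8 - 3) - 2)`, which evaluates to 3, while `groupRight` gives `(8 - (3 - 2))`, which evaluates to 7.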
Extended Backus-Naur Form (EBNF)
• Optional/repeated structure is common in programs
  • E.g., whether or not there are any arguments to a function
  • E.g., if there are arguments, how many there are
• We can extend BNF with metasymbols
  • E.g., square brackets indicate optional elements, as in the production function → name ‘(’ [args] ‘)’
  • E.g., curly braces indicate zero or more repetitions of elements, as in the production args → arg {‘,’ arg}
• Doesn’t change the expressive power of the grammar
• A limitation of EBNF is that it obscures associativity
• Better to use standard BNF to generate parse/syntax trees
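One way to see that the braces add convenience rather than expressive power is to write two recognizers for the same argument lists. The sketch below assumes arg is a single digit and assumes the plain-BNF equivalent args → arg | arg ‘,’ args, which is one possible rewriting and not a production given on the slide. The names `isArg`, `matchesEbnf`, and `matchesBnf` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// arg -> 0 | ... | 9  (assumed: a single digit)
bool isArg(char c) { return c >= '0' && c <= '9'; }

// EBNF: args -> arg { ',' arg }   (the repetition becomes a loop)
bool matchesEbnf(const std::string& s) {
    std::size_t pos = 0;
    if (pos >= s.size() || !isArg(s[pos])) return false;
    ++pos;
    while (pos < s.size() && s[pos] == ',') {
        ++pos;  // consume ','
        if (pos >= s.size() || !isArg(s[pos])) return false;
        ++pos;  // consume arg
    }
    return pos == s.size();
}

// Assumed BNF: args -> arg | arg ',' args   (the repetition becomes recursion)
bool matchesBnf(const std::string& s, std::size_t pos = 0) {
    if (pos >= s.size() || !isArg(s[pos])) return false;
    if (pos + 1 == s.size()) return true;                // args -> arg
    return s[pos + 1] == ',' && matchesBnf(s, pos + 2);  // args -> arg ',' args
}
```

Both functions accept "1,2,3" and reject "1," and "12". Only the shape of the code differs: a loop for the braces, recursion for the plain BNF form.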
Recursive-Descent Parsing
• Shift-reduce (bottom-up) parsing techniques are powerful, but complex to design/implement manually
  • Further details about them are in another course (CSE 431)
  • You will still want to understand how they work and how to use the techniques
• Recursive-descent (top-down) parsing is often more straightforward, and can be used in many cases
  • We’ll focus on these techniques somewhat in this course
• Key idea is to design (potentially recursive) parsing functions based on the productions’ right-hand sides
• Then, work through a grammar from more general rules to more specific ones, consuming input tokens upon a match
• EBNF helps with left recursion removal (making a loop) and left factoring (making the remainder of a parse function optional)
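A minimal recursive-descent sketch follows, with one parsing function per production. The grammar is an assumption chosen for illustration: exp → term { ‘+’ term } (the left-recursive exp → exp + term rewritten as a loop) and term → DIGIT | ‘(’ exp ‘)’. Tokens are single characters with no whitespace, and the class name `Parser` and its members are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Assumed grammar (EBNF), for illustration only:
//   exp  -> term { '+' term }
//   term -> DIGIT | '(' exp ')'
class Parser {
public:
    explicit Parser(std::string input) : input_(std::move(input)) {}

    // Parses the whole input as an exp and returns its value.
    int parse() {
        int value = parseExp();
        if (pos_ != input_.size()) throw std::runtime_error("unexpected trailing input");
        return value;
    }

private:
    // Current token, or '\0' at end of input.
    char peek() const { return pos_ < input_.size() ? input_[pos_] : '\0'; }

    // Consumes the current token if it matches, otherwise reports an error.
    void match(char expected) {
        if (peek() != expected) {
            throw std::runtime_error(std::string("expected '") + expected + "'");
        }
        ++pos_;
    }

    // exp -> term { '+' term } : the loop replaces left recursion and
    // accumulates from the left, so '+' stays left-associative.
    int parseExp() {
        int value = parseTerm();
        while (peek() == '+') {
            match('+');
            value += parseTerm();
        }
        return value;
    }

    // term -> DIGIT | '(' exp ')' : the more specific rule, reached from parseExp.
    int parseTerm() {
        if (std::isdigit(static_cast<unsigned char>(peek()))) {
            int value = peek() - '0';
            ++pos_;
            return value;
        }
        match('(');
        int value = parseExp();
        match(')');
        return value;
    }

    std::string input_;
    std::size_t pos_ = 0;
};
```

For example, `Parser("(1+2)+(3+4)").parse()` returns 10, while `Parser("1+").parse()` throws because no term follows the final ‘+’.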
Lookahead with First and Follow Sets
• Recursive descent parsing functions are easiest to write if they only have to consider the current token
  • I.e., the head of a stream or list of input tokens
• Optional and repeated elements complicate this a bit
  • E.g., function → name ‘(’ [args] ‘)’ and arg → 0 |…| 9 and args → arg {‘,’ arg}, with ‘(’, ‘)’, 0 |…| 9, and ‘,’ as terminal symbols
• But, EBNF structure helps in handling these two cases
  • First set: the set of tokens that can be first in a valid sequence, e.g., each digit in 0 |…| 9 is in the first set for arg (and for args)
  • Follow set: the set of tokens that can follow a valid sequence of tokens, e.g., ‘)’ is in the follow set for args
• A token from the first set gives a parse function permission to start, while one from the follow set directs it to end
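The first and follow sets for args can be used directly as one-token lookahead tests. The sketch below follows the productions on this slide but makes two assumptions of its own: name is a run of one or more letters, and the input has no whitespace. The function names `inFirstOfArgs`, `inFollowOfArgs`, and `countArgs` are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// first(arg) = first(args) = { '0', ..., '9' }
bool inFirstOfArgs(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }

// follow(args) = { ')' }
bool inFollowOfArgs(char c) { return c == ')'; }

// Parses  function -> name '(' [args] ')'  and returns the number of args.
// Assumption: name is one or more letters (the slides do not define it).
std::size_t countArgs(const std::string& text) {
    std::size_t pos = 0;
    auto peek = [&]() { return pos < text.size() ? text[pos] : '\0'; };
    auto match = [&](char expected) {
        if (peek() != expected) throw std::runtime_error("unexpected token");
        ++pos;
    };

    if (!std::isalpha(static_cast<unsigned char>(peek()))) {
        throw std::runtime_error("expected name");
    }
    while (std::isalpha(static_cast<unsigned char>(peek()))) ++pos;

    match('(');
    std::size_t count = 0;
    if (inFirstOfArgs(peek())) {  // [args]: start only on a first-set token
        ++pos;                    // arg
        ++count;
        while (peek() == ',') {   // { ',' arg }
            match(',');
            if (!inFirstOfArgs(peek())) throw std::runtime_error("expected arg");
            ++pos;
            ++count;
        }
    }
    if (!inFollowOfArgs(peek())) {  // args must end on a follow-set token
        throw std::runtime_error("expected ')'");
    }
    match(')');
    if (pos != text.size()) throw std::runtime_error("unexpected trailing input");
    return count;
}
```

Here `countArgs("f()")` returns 0 because ‘)’ is not in the first set of args, so the optional part is skipped. `countArgs("f(1,2,3)")` returns 3, and `countArgs("f(1,)")` throws because ‘)’ cannot start an arg.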
Today’s Studio Exercises
• We’ll code up ideas from Scott Chapter 2.3
  • Looking at more ideas and mechanisms for parsing, especially ones that are relevant to the lab assignment
• Today’s exercises are again all in C++
  • Please take advantage of the on-line tutorial and reference manual pages that are linked on the course web site
• As always, please ask us for help as needed
• When done, email your answers to the course account with “Syntax Studio II” in the subject line