
Context Free Grammars and BNF

caesar

Presentation Transcript


  1. Context Free Grammars and BNF
     • In context-free grammars (CFGs), structures are independent of the other structures surrounding them
     • Backus-Naur form (BNF) notation describes CFGs
     • Symbols are either tokens or nonterminal symbols
     • Productions are of the form nonterminal → definition, where definition defines the structure of a nonterminal
     • Rules may be recursive, with a nonterminal symbol appearing both on the left side of a production and in its own definition
     • Metasymbols are used to identify the parts of the production (arrow) and alternative definitions of a nonterminal (vertical bar)
     • Next time we’ll extend metasymbols for repeated (braces) or optional (square brackets) structure in a definition (EBNF)

  2. Parse Trees and Abstract Syntax Trees
     • Parse trees show derivation of a structure from BNF
     • E.g., number → DIGIT | DIGIT number
     • Abstract syntax trees (ASTs) encapsulate the details
     • Very useful for converting between structurally similar forms
     [Figure: a parse tree (nodes: number, DIGIT; leaves: 4, 2, 5) beside an abstract syntax tree (nodes: hornclause, head, body, predicate, …)]
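The difference between the two kinds of tree can be sketched in C++ for the production number → DIGIT | DIGIT number. This is only an illustration: the names NumberNode, parse_number, and to_value are hypothetical and do not come from the slides, and the input is assumed to be a plain character string rather than a token stream.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>

// Parse-tree node for: number -> DIGIT | DIGIT number
// (hypothetical type, introduced only for this sketch)
struct NumberNode {
    char digit;                        // the DIGIT leaf of this node
    std::unique_ptr<NumberNode> rest;  // nested number, or null for the DIGIT-only alternative
};

// Builds one parse-tree node per digit, mirroring the recursive production.
// Returns nullptr when the input at pos does not start with a digit.
std::unique_ptr<NumberNode> parse_number(const std::string& in, std::size_t& pos) {
    if (pos >= in.size() || !std::isdigit(static_cast<unsigned char>(in[pos]))) {
        return nullptr;
    }
    auto node = std::make_unique<NumberNode>();
    node->digit = in[pos++];
    node->rest = parse_number(in, pos);  // stays null when no further digit follows
    return node;
}

// Collapses the whole chain of nodes into a single value, which is the kind of
// detail hiding an abstract syntax tree provides.
int to_value(const NumberNode* node) {
    int value = 0;
    for (; node != nullptr; node = node->rest.get()) {
        value = value * 10 + (node->digit - '0');
    }
    return value;
}
```

For an input such as "425", parse_number builds three nested nodes, while to_value reduces them to the single integer an AST would typically store.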

  3. Ambiguity, Associativity, Precedence
     • If any statement in the language has more than one distinct parse tree, the grammar is ambiguous
     • Ambiguity can be removed implicitly, as in always replacing the leftmost remaining nonterminal (an implementation hack)
     • Recursive production structure can also disambiguate
     • E.g., adding another production to the grammar to establish precedence (lower in the parse tree gives higher precedence)
     • E.g., replacing exp → exp + exp with alternative productions exp → exp + term or exp → term + exp
     • Recursive productions also define associativity
     • I.e., the left-recursive form exp → exp + term is left-associative, and the right-recursive form exp → term + exp is right-associative
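The effect of associativity is easier to see with an operator where grouping changes the result. The sketch below swaps the slide's + for - (an assumption made purely for illustration) and evaluates the same input under both production shapes. The function names are hypothetical, terms are assumed to be single digits, and the input is assumed to be well formed (no error handling).

```cpp
#include <cstddef>
#include <string>

// term -> DIGIT (single-digit terms are an assumption of this sketch)
int parse_term(const std::string& s, std::size_t& pos) {
    return s[pos++] - '0';
}

// Left-associative: exp -> exp - term.
// The left recursion is realized as a loop that folds terms from the left.
int eval_left(const std::string& s) {
    std::size_t pos = 0;
    int value = parse_term(s, pos);
    while (pos < s.size() && s[pos] == '-') {
        ++pos;                        // consume '-'
        value -= parse_term(s, pos);  // (value - term)
    }
    return value;
}

// Right-associative: exp -> term - exp.
// The right recursion is realized directly as a recursive call.
int eval_right_from(const std::string& s, std::size_t& pos) {
    int left = parse_term(s, pos);
    if (pos < s.size() && s[pos] == '-') {
        ++pos;                                  // consume '-'
        return left - eval_right_from(s, pos);  // term - (exp)
    }
    return left;
}

int eval_right(const std::string& s) {
    std::size_t pos = 0;
    return eval_right_from(s, pos);
}
```

On the input "8-3-2", eval_left groups as (8-3)-2 and eval_right groups as 8-(3-2), so the two grammars assign different meanings to the same token sequence.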

  4. Extended Backus-Naur Form (EBNF)
     • Optional/repeated structure is common in programs
     • E.g., whether or not there are any arguments to a function
     • E.g., if there are arguments, how many there are
     • We can extend BNF with metasymbols
     • E.g., square brackets indicate optional elements, as in the production function → name ‘(’ [args] ‘)’
     • E.g., curly braces indicate zero or more repetitions of elements, as in the production args → arg {‘,’ arg}
     • Doesn’t change the expressive power of the grammar
     • A limitation of EBNF is that it obscures associativity
     • Better to use standard BNF to generate parse/syntax trees
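One way to read the EBNF metasymbols is as control flow: square brackets become an if, and curly braces become a while loop. The following sketch applies that reading to function → name ‘(’ [args] ‘)’ and args → arg {‘,’ arg}. The CallParser type and parse_call helper are hypothetical, and the token shapes (a name is a run of letters, an arg is one digit, no whitespace) are assumptions made for brevity.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical recognizer for the two EBNF productions above.
struct CallParser {
    std::string in;
    std::size_t pos = 0;
    std::vector<char> args;  // digits collected as arguments

    bool at(char c) const { return pos < in.size() && in[pos] == c; }
    bool at_digit() const {
        return pos < in.size() && std::isdigit(static_cast<unsigned char>(in[pos]));
    }

    // arg -> 0 | ... | 9
    bool parse_arg() {
        if (!at_digit()) return false;
        args.push_back(in[pos++]);
        return true;
    }

    // args -> arg { ',' arg }      braces: zero or more repetitions -> while
    bool parse_args() {
        if (!parse_arg()) return false;
        while (at(',')) {
            ++pos;  // consume ','
            if (!parse_arg()) return false;
        }
        return true;
    }

    // function -> name '(' [ args ] ')'      brackets: optional -> if
    bool parse_function() {
        std::size_t start = pos;
        while (pos < in.size() && std::isalpha(static_cast<unsigned char>(in[pos]))) ++pos;
        if (pos == start) return false;  // a name needs at least one letter
        if (!at('(')) return false;
        ++pos;
        if (at_digit() && !parse_args()) return false;
        if (!at(')')) return false;
        ++pos;
        return pos == in.size();  // reject trailing input
    }
};

// Convenience wrapper: true if text is a valid call; out receives the arguments.
bool parse_call(const std::string& text, std::vector<char>& out) {
    CallParser p;
    p.in = text;
    bool ok = p.parse_function();
    out = p.args;
    return ok;
}
```

Under these assumptions, parse_call accepts "f()" and "f(1,2,3)" and rejects "f(1,)" and "f(,1)".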

  5. Recursive-Descent Parsing
     • Shift-reduce (bottom-up) parsing techniques are powerful, but complex to design/implement manually
     • Further details about them are in another course (CSE 431)
     • You will still want to understand how they work and use the techniques
     • Recursive-descent (top-down) parsing is often more straightforward, and can be used in many cases
     • We’ll focus on these techniques somewhat in this course
     • Key idea is to design (potentially recursive) parsing functions based on the productions’ right-hand sides
     • Then, work through a grammar from more general rules to more specific ones, consuming input tokens upon a match
     • EBNF helps with left recursion removal (making a loop) and left factoring (making the remainder of a parse function optional)
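A minimal recursive-descent sketch, with one parsing function per production, might look like the following. The grammar used here (exp → term {‘+’ term}, term → factor {‘*’ factor}, factor → DIGIT | ‘(’ exp ‘)’) is an assumed example rather than one taken from the slides, and the ExprParser class and its member names are hypothetical. The left-recursive rules are written in EBNF form so that each becomes a loop.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

class ExprParser {
public:
    explicit ExprParser(std::string text) : in_(std::move(text)) {}

    // Parses the whole input and returns its value; throws on a syntax error.
    int parse() {
        int value = parse_exp();
        if (pos_ != in_.size()) throw std::runtime_error("unexpected trailing input");
        return value;
    }

private:
    // exp -> term { '+' term }        (most general rule, lowest precedence)
    int parse_exp() {
        int value = parse_term();
        while (match('+')) value += parse_term();
        return value;
    }

    // term -> factor { '*' factor }   (lower in the tree, higher precedence)
    int parse_term() {
        int value = parse_factor();
        while (match('*')) value *= parse_factor();
        return value;
    }

    // factor -> DIGIT | '(' exp ')'   (most specific rule; recursion re-enters exp)
    int parse_factor() {
        if (match('(')) {
            int value = parse_exp();
            if (!match(')')) throw std::runtime_error("expected ')'");
            return value;
        }
        if (pos_ < in_.size() && std::isdigit(static_cast<unsigned char>(in_[pos_]))) {
            return in_[pos_++] - '0';
        }
        throw std::runtime_error("expected digit or '('");
    }

    // Consumes the current character only if it matches c.
    bool match(char c) {
        if (pos_ < in_.size() && in_[pos_] == c) {
            ++pos_;
            return true;
        }
        return false;
    }

    std::string in_;
    std::size_t pos_ = 0;
};
```

For example, ExprParser("2+3*4").parse() descends from exp through term to factor, so the multiplication is grouped before the addition.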

  6. Lookahead with First and Follow Sets
     • Recursive-descent parsing functions are easiest to write if they only have to consider the current token
     • I.e., the head of a stream or list of input tokens
     • Optional and repeated elements complicate this a bit
     • E.g., function → name ‘(’ [args] ‘)’, arg → 0 | … | 9, and args → arg {‘,’ arg}, with ‘(’, ‘)’, ‘,’, and 0 | … | 9 as terminal symbols
     • But, EBNF structure helps in handling these two cases
     • First set: the set of tokens that can come first in a valid sequence, e.g., each digit in 0 | … | 9 is in the first set for arg (and for args)
     • Follow set: the set of tokens that can follow a valid sequence of tokens, e.g., ‘)’ is in the follow set for args
     • A token from the first set gives a parse function permission to start, while one from the follow set directs it to end
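The start/stop role of the two sets can be sketched with explicit set lookups on the current character. The names kFirstArgs, kFollowArgs, and parse_optional_args are hypothetical, the sets are written out by hand for this small grammar rather than computed, and characters stand in for tokens.

```cpp
#include <cstddef>
#include <set>
#include <string>

// First set for arg (and args): any digit. Written by hand for this grammar.
const std::set<char> kFirstArgs{'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'};
// Follow set for args: only ')' can come after an argument list.
const std::set<char> kFollowArgs{')'};

// Parses [args] where args -> arg { ',' arg }, looking only at the current character.
// Returns the number of arguments, or -1 on a syntax error.
// On success, pos is left at the follow token, which the caller consumes.
int parse_optional_args(const std::string& in, std::size_t& pos) {
    if (pos >= in.size()) return -1;
    if (kFollowArgs.count(in[pos])) return 0;    // follow token: the optional args are absent
    if (!kFirstArgs.count(in[pos])) return -1;   // no permission to start
    int count = 1;
    ++pos;                                       // consume the first arg
    while (pos < in.size() && !kFollowArgs.count(in[pos])) {
        if (in[pos] != ',') return -1;           // only ',' may continue the list
        ++pos;
        if (pos >= in.size() || !kFirstArgs.count(in[pos])) return -1;
        ++pos;                                   // consume the next arg
        ++count;
    }
    return pos < in.size() ? count : -1;         // must end on a follow token
}
```

Called on "1,2)" with pos at 0, the function returns 2 and stops at the closing parenthesis; called on ")", it returns 0 without consuming anything.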

  7. Today’s Studio Exercises
     • We’ll code up ideas from Scott Chapter 2.3
     • Looking at more ideas and mechanisms for parsing, especially ones that are relevant to the lab assignment
     • Today’s exercises are again all in C++
     • Please take advantage of the on-line tutorial and reference manual pages that are linked on the course web site
     • As always, please ask us for help as needed
     • When done, email your answers to the course account with “Syntax Studio II” in the subject line
