190 likes | 261 Views
Dive into programming language syntax with this guide on grammar hierarchies, BNF Form, and derivation trees. Learn about ambiguity in expressions and how to create unambiguous grammars for compilers.
E N D
Grammars • Definitions • Grammars • Backus-Naur Form • Derivation • terminology • trees • Grammars and ambiguity • Simple example • Grammar hierarchies • Syntax graphs • Recursive descent parsing
Definitions • Syntax • the form or structure of the expressions, statements, and program units • Semantics • the meaning of the expressions, statements, and program units • Sentence • a string of characters over some alphabet • Language • a set of sentences • Lexeme • the lowest level syntactic unit of a language • :=, {, while • Token • a category of lexemes (e.g., identifier)
Grammars • Can serve as “generators” or “recognizers” • recognizers used in compilers • we’ll study grammars as generators • Contain 4 components • terminal symbols • atomic components of statements in the language • appear in source programs • identifiers, operators, punctuation, keywords • nonterminal symbols • intermediate elements in producing terminal symbols • never appear in source program • start (or goal) symbol • a special nonterminal which is the starting symbol for producing statements
Grammars (continued) • 4 components (continued) • productions • rules for transforming nonterminal symbols into terminals or other nonterminals • “nonterminal” ::= terminals and/or nonterminals • each has lefthand side (LHS) and righthand side (RHS) • every nonterminal must appear on LHS of at least one production
Grammars (continued) • 4 categories of grammars • regular • good for identifiers, parameter lists, subscripts • context free • LHS of production is single non-terminal • context sensitive • recursively enumerable enough for PLs
Backus-Naur Form (BNF) • Used to describe syntax of PL; first used for Algol-60 • Nonterminals are enclosed in <...> • <expression>, <identifier> • Alternatives indicated by | • <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 • Options (0 or 1 occurrences) indicated by [...] • <stmt> ::= if <cond> then <stmt> [ else <stmt>] • note recursion • Repetition (0 or more occurrences) indicated by {...} • <unsigned> ::= <digit> {<digit>} • Derivation • repeated application of rules, starting with start symbol and ending with sentence
BNF (continued) • Example grammar and derivation <program> -> <stmts> <stmts> -> <stmt> | <stmt> ; <stmts> <stmt> -> <var> = <expr> <var> -> a | b | c | d <expr> -> <term> + <term> | <term> - <term> <term> -> <var> | const <program> => <stmts> => <stmt> => <var> = <expr> => a = <expr> => a = <term> + <term> => a = <var> + <term> => a = b + <term> => a = b + const
Derivation Terminology • Every string of symbols in the derivation is a sentential form • A sentence is a sentential form that has only terminal symbols • A leftmost derivationis one in which the leftmost nonterminal in each sentential form is the one that is expanded • similarly for rightmost derivation • A derivation may be neither leftmost nor rightmost
Derivation Trees • A derivation tree is the tree resulting from applying productions to rewrite start symbol • a parse tree is the same tree starting with terminals and building back to the start symbol <program> <stmts> <stmt> <var> = <expr> a <term> + <term> <var> const b
Grammars and Ambiguity • A grammar isambiguous iff it generates a sentential form that has two or more distinct parse trees • An ambiguous expression grammar: • <expr> -> <expr> <op> <expr> | const • <op> -> / | - <expr> <expr> <expr> <op> <expr> <expr> <op> <expr> <expr> <op> <expr> <expr> <op> <expr> const - const / const const - const / const
Grammars and Ambiguity (continued) • We must have unambiguous grammars so compiler can produce correct code • because parse tree provides precedence and associativity of operators • Left recursive grammars produce left associativity • Right recursive grammars produce right associativity • An unambiguous expression grammar: • <expr> -> <expr> - <term> | <term> • <term> -> <term> / const | const <expr> <expr> - <term> <term> <term> / const const const
Grammars and Ambiguity (continued) • One famous ambiguity is “dangling else” • <stmt> ::= if <cond> then <stmt> [else <stmt>] • This can derive if X > 9 then if B = 4 then X := 5 else X := 0
Grammars and Ambiguity (continued) • Can solve syntactically by adding nonterminals & prod • <stmt> ::= <matched> | <unmatched> • <matched> ::= if <cond> then <matched> else <matched> • <unmatched> ::= if <cond> then <stmt> | if <cond> then <matched> else <unmatched> • Can also solve semantically • “elses are associated with immediately preceding unmatched then”
Grammar Hierarchies • BNF (and equivalent notations such as syntax graphs) can describe context free grammars • nonterminals appear alone on the LHS of productions • But there is a whole hierarchy of grammar types • recursively enumerable • context sensitive • context free • regular • Context free grammars can describe the essential features of all current PLs • Regular grammars are good for identifiers, parameter lists, etc.
Simple Grammar Example • Consider following unambiguous grammar for expressions • <expr> ::= [<expr> <addop>] <term> • <term> ::= [<term> <mulop>] <factor> • <factor> ::= (<expr>) | <digit> • <addop> ::= + | - • <mulop> ::= * | / • <digit> ::= 0 | ... | 9 • This grammar is left recursive and generates expressions that are left associative • Changing <factor> production produces right associative exponentiation • <factor> ::= <expon> [ ** <factor> ]
Syntax Graphs • Are equivalent to CFGs • put the terminals in circles or ellipses and put the nonterminals in rectangles; • connect with lines with arrowheads • Terminals in circles • Non-terminals in rectangles • Lines and arrows indicate how constructs are built type_identifier ( identifier ) , constant .. constant
Recursive Descent Parsing • Parsing is the process of tracing or constructing a parse tree for a given input string • Parsers usually do not analyze lexemes • done by a lexical analyzer, which is called by the parser
Recursive Descent Parsing (continued) • A recursive descent parser traces out a parse tree in top-down order • top-down parser • Each nonterminal in the grammar has a subprogram associated with it • the subprogram parses all sentential forms that the nonterminal can generate • The recursive descent parsing subprograms are built directly from the grammar rules • Recursive descent parsers, like other top-down parsers, cannot be built from left-recursive grammars
Recursive Descent Parsing (continued) Example For the grammar: <term> -> <factor> {(* | /) <factor>} void term () { factor (); /* parse the first factor*/ while (next_token == ast_code || next_token == slash_code) { lexical (); /* get next token */ factor (); /* parse the next factor */ } }