1 / 45

Topic #2: Infix to Postfix

Topic #2: Infix to Postfix. EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003. Why this topic?. Program to convert infix to postfix A simple, one-pass compiler! Infix notation is source language Postfix is intermediate code No code generation (but could be actions)

jmcghee
Download Presentation

Topic #2: Infix to Postfix

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Topic #2: Infix to Postfix EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003

  2. Why this topic? • Program to convert infix to postfix • A simple, one-pass compiler! • Infix notation is source language • Postfix is intermediate code • No code generation (but could be actions) • A good example that introduces many aspects of writing a compiler

  3. Syntax vs. Semantics • Syntax • Describes what is allowable in a language • Relatively easy to describe (we will use context free grammars) • Semantics • Describes the meaning of a program (or expression) • Very difficult to describe in a formal sense • We are discussing programming languages, but these definitions also apply to natural languages

  4. Structure of Simple Compiler • A lexical analyzer will tokenize the input • A “syntax-directed translator” combines syntax analysis and intermediate code generation

  5. Context-free Grammars • Often used to describe the hierarchical structure of a programming language • Every CFG has four components: • A set of tokens (terminal symbols) • A set of nonterminals • A set of productions (rules) • A start symbol (left side of first rule) Example rule: stmt→if (expr) stmtelsestmt

  6. Productions • Every production consists of left side, an arrow (“can have the form of”), right side • Left side is a always nonterminal; the production is “for” this nonterminal • Right side consists of terminals and/or non-terminals • Convention, nonterminals will be italicized Example rule: stmt→if (expr) stmtelsestmt

  7. Example CFG • List of digits separated by plus or minus signs: • For convenience, nonterminals can be grouped using ‘|’ (“or”) • The tokens of this grammar: + - 0 1 2 3 4 5 6 7 8 9 • The start symbol: list list→ list + digit list → list – digit list→digit digit→ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 list→ list + digit | list – digit | digit digit→ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

  8. CFG for Expressions expr→ expr + term | expr – term | term term → term * factor | term / factor | factor factor→digit | (expr) digit→ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

  9. Strings • A string of tokens is a sequence of zero or more tokens • The string containing zero tokens is the empty string: ‘ε’ • A grammar “derives” strings • starting with start symbol • repeatedly replacing nonterminals with right side of productions for the nonterminals • The set of strings that can be derived from the start symbol is the “language” for the grammar

  10. Parse Trees • Shows how the start symbol of a grammar can derive a string in the language • A tree with the following properties: • The root is labeled with a start symbol • Each leaf labeled with a token or ε • Each interior node is labeled by a nonterminal • If A is the label for an interior node, and X1,X2,…,Xn (nonterminals or tokens) are the labels of its children, then the following production must exist: A→X1 X2 … Xn

  11. Sample Parse Tree: 9 – 5 + 2

  12. More on Parse Trees • A language can be defined as all strings that can be generated by some parse tree • The process of finding a parse tree for a given string is called “parsing” the string

  13. Ambiguous Grammars • If any string has more than one parse tree, grammar is said to be ambiguous • Need to avoid for compilation, since string can have more than one meaning • List of digits separated by plus or minus signs: • Example merges notion of digit and list into single nonterminal string • Same strings are derivable, but some strings have multiple parse trees (possible meanings) string→ string + string | string – string |0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

  14. Two Parse Trees: 9 – 5 + 2

  15. Precedence and Associativity • Precedence • Determines the order in which different operators are evaluated when they occur in the same expression • Operators of higher precedence are applied before operators of lower precedence • Associativity • Determines the order in which operators of equal precedence are evaluated when they occur in the same expression • Most operators have a left-to-right associativity, but some have right-to-left associativity

  16. Postfix Notation • Formal rules, infix → postfix • If E is variable or constant, E → E • If E is expression of form E1 op E2, where op is binary operator, E1 → E1’, and E2 → E2’, then E → E1’ E2’ op • If E is expression of form (E1) and E1 → E1’, then E → E1’ • Parentheses are not needed!

  17. Syntax-Directed Definitions • CFG to specify syntactic structure of input • Each grammar symbol has associated attributes • Each production has associated semantic rules for computing values of attributes • Grammar and semantic rules together constitute the syntax-directed definition • A parse tree showing the attribute values at each node is called an annotated parse tree

  18. Two Types of Attributes • Synthesized Attributes • Value at a parse-tree node can be determined based on values of attributes of children • Can be evaluated during a single bottom-up traversal of parse tree • Inherited Attributes • More complicated • Will be discussed later in course

  19. Example Syntax-Directed Definition

  20. Depth-First Search • Recursively visit all children before evaluating semantic rules of given node • Suitable if all attributes are synthesized procedure visit (n:node) begin for each child m of n, from left to right do visit(m) evaluate semantic rules at node n end

  21. Annotated Parse Tree: 9 – 5 + 2

  22. Translation Schemes • Adds to a CFG • Includes “semantic actions” embedded within productions • Similar to syntax-directed definition, but order of evaluation is explicit

  23. Example Translation Scheme expr expr + term { print(‘+’) } expr  expr – term { print(‘-’) } expr  term term  0 { print(‘0’) } term  1 { print(‘1’) } … term  9 { print(‘9’) }

  24. Equivalent Translation Scheme expr term rest rest  + term { print(‘+’) } rest rest  - term { print(‘-’) } rest rest  ε term  0 { print(‘0’) } term  1 { print(‘1’) } … term  9 { print(‘9’) }

  25. Parsing • Parsing is the process of determining if a string of tokens can be generated by a grammar • For any CFG, there is a parser that takes at most O(n^3) time • For most programming languages that arise in practice, linear time parsers exist

  26. Top-down Parsing • Recursively apply the following steps: • At node n with nonterminal A, select a production for A • Construct children at n for symbols on right side of selected production • Find next node for which subtree needs to be constructed • Top-down parsing uses a “lookahead” symbol • Selecting production may involve trial-and-error and backtracking

  27. Predictive Parsing • Recursive-descent parsing is a recursive, top-down approach to parsing • A procedure is associated with each nonterminal of the grammar • Predictive parsing • Special case of recursive-descent parsing • The lookahead symbol unambiguously determines the procedure for each nonterminal

  28. Procedures for Nonterminals • Production with right side α used if lookahead is in FIRST(α) • FIRST(α) is set of all symbols that can be first symbol of α • If lookahead symbol is not in FIRST set for any production, can use production with right side of ε • If two or more possibilities, can not use this method • If no possibilities, an error is declared • Nonterminals on right side of selected production are recursively expanded

  29. Left Recursion • Left-recursive productions can cause recursive-descent parsers to loop forever • Example: example example + term • Can eliminate left recursion A  β R R  α R | ε A  A α | β

  30. Eliminating Left Recursion expr expr + term { print(‘+’) } expr  expr – term { print(‘-’) } expr  term term  0 { print(‘0’) } term  1 { print(‘1’) } … term  9 { print(‘9’) } expr term rest rest  + term { print(‘+’) } rest rest  - term { print(‘-’) } rest rest  ε term  0 { print(‘0’) } term  1 { print(‘1’) } … term  9 { print(‘9’) }

  31. Infix to Prefix Code: Part 1 #include <stdio.h> #include <ctype.h> int lookahead; void expr(void); void rest(void); void term(void); void match(int); void error(void); int main(void) { lookahead = getchar(); expr(); putchar('\n'); /* adds trailing newline character */ } …

  32. Infix to Prefix Code: Part 2 … void expr(void) { term(); rest(); } void term(void) { if (isdigit(lookahead)) { putchar(lookahead); match(lookahead); } else error(); } …

  33. Infix to Prefix Code: Part 3 … void rest(void) { if (lookahead == '+') { match('+'); term(); putchar('+'); rest(); } else if (lookahead == '-') { match('-'); term(); putchar('-'); rest(); } } …

  34. Infix to Prefix Code: Part 4 … void match(int t) { if (lookahead == t) lookahead = getchar(); else error(); } void error(void) { printf("syntax error\n"); /* print error message */ exit(1); /* then halt */ }

  35. Code Optimization 1 void rest(void) { REST: if (lookahead == '+') { match('+'); term(); putchar('+'); goto REST; } else if (lookahead == '-') { match('-'); term(); putchar('-'); goto REST; } }

  36. Code Optimization 2 void expr(void) { term(); while (1) { if (lookahead == '+') { match('+'); term(); putchar('+'); } else if (lookahead == '-') { match('-'); term(); putchar('-'); } else break; } }

  37. Improvements Remaining • Want to ignore whitespace • Allow numbers • Allow identifiers • Allow additional operators (multiplications and division) • Allow multiple expressions (separated by semicolons)

  38. Lexical Analyzer • Eliminates whitespace (and comments) • Reads numbers (not just single digits) • Reads identifiers and keywords

  39. Implementing the Lexical Analyzer

  40. Allowable Tokens • expected tokens: +, -, *, /, DIV, MOD, (, ), ID, NUM, DONE • ID represents an identifier, NUM represents a number, DONE represents EOF

  41. Tokens and Attributes

  42. A Simple Symbol Table • Each record of symbol table contains a token type and a string (lexeme or keyword) • Symbol table has fixed size • All lexemes in array of fixed size • Will be able to insert and search for tokens: • insert(s, t): creates entry with string s and token t, returns index into symbol table • lookup(s): searches for entry with string s, returns index if found, 0 otherwise • Keywords (div and mod) will be inserted into symbol table, they can not be used as identifiers

  43. Updated Translation Scheme start listeof list  expr; list | ε expr expr + term { print(‘+’) } | expr – term { print(‘-’) } | term term  term * factor { print(‘*’) } | term / factor { print(‘/’) } | termdivfactor { print(‘DIV’) } | termmodfactor { print(‘MOD’) } | factor factor  (expr) | id { print(id.lexeme) } | num { print(num.value) }

  44. After Eliminating Left Recursion start listeof list  expr; list | ε expr term moreterms moreterms  + term { print(‘+’) } moreterms | - term { print(‘-’) } moreterms | ε term  factor morefactors morefactors  * factor { print(‘*’) } morefactors | / factor { print(‘/’) } morefactors | divfactor { print(‘DIV’) } morefactors | modfactor { print(‘MOD’) } morefactors | ε factor  (expr) | id { print(id.lexeme) } | num { print(num.value) }

  45. Final Code • About 250 lines of C • Pretty sloppy, otherwise would be longer

More Related