Chapter 1 Language Processor
Introduction • Semantic gap: the gap between the application domain and the execution domain • Solved by introducing a programming language (PL) • Design and coding steps • PL implementation steps • This introduces a new PL domain • Specification gap: • The semantic gap between two specifications of the same task. • Execution gap: • The gap between the semantics of programs (that perform the same task) written in different programming languages. • [Figure: Application domain → (specification gap) → PL domain → (execution gap) → Execution domain; without the PL domain, the whole distance is the semantic gap]
Language Processor • Definition: A language processor (LP) is software that bridges a specification gap or an execution gap. • Types of LP: • Language translator: bridges an execution gap, e.g. a compiler or an assembler • Detranslator • Preprocessor • Language migrator • Interpreter: a language processor that bridges an execution gap without generating a machine language program.
Problem oriented lang. • Less specification gap, more execution gap • Procedure oriented lang. • More specification gap, less execution gap
Language processing activities • Program generation activity • Program execution activity: • Translation and interpretation • [Figure: Application domain, Program generator domain, Target PL domain, Execution domain, with the specification gap marked]
Program Translation • Translates a program from the source language (SL) to machine language. • Characteristics • A program must be translated before it can be executed. • A translated program may be saved in a file, and the saved program may be executed repeatedly. • A program must be retranslated following modifications. • Program Interpretation: • Reads the source program and stores it in memory. • Determines its meaning and performs the corresponding actions.
Program Interpretation and Execution • Program execution (instruction cycle) • Fetch the instruction • Decode the instruction to determine the operation. • Execute the instruction • Program interpretation (interpretation cycle) • Fetch the statement • Analyze the statement to determine its meaning. • Execute the statement
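The interpretation cycle can be sketched in a few lines of Python. This is only an illustration: the statement form (`name = operand op operand ...` with `+` and `-` only) and the names `interpret` and `value_of` are assumptions made for the sketch, not part of the slides.

```python
def value_of(token, env):
    """An operand is either a known variable or an integer literal."""
    return env[token] if token in env else int(token)


def interpret(program):
    """Run the fetch / analyze / execute cycle over a list of statements."""
    env = {}                                      # variable name -> current value
    for stmt in program:                          # fetch the next statement
        target, expr = stmt.split("=", 1)         # analyze: target and expression
        tokens = expr.split()
        result = value_of(tokens[0], env)
        # execute: apply each operator to the running result, left to right
        for op, operand in zip(tokens[1::2], tokens[2::2]):
            if op == "+":
                result += value_of(operand, env)
            elif op == "-":
                result -= value_of(operand, env)
            else:
                raise ValueError(f"unknown operator {op!r}")
        env[target.strip()] = result
    return env


print(interpret(["a = 2", "b = a + 3", "c = b - a + 10"]))
```

No machine language program is produced at any point; the meaning of each statement is determined and acted on directly.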
Comparison • ?????
Fundamentals of language processing • LP = Analysis of SP + Synthesis of TP (SP: source program, TP: target program). • Analysis of SP • Lexical rules: govern the formation of valid lexical units • Syntax rules: govern the formation of valid statements • Semantic rules: associate meaning with valid statements.
Phases of LP • Forward reference: A forward reference of a program entity is a reference to the entity that precedes its definition in the program. • Ex. struct s { struct t *pt; }; ... struct t { struct s *ps; }; • Forward references raise issues concerning the memory requirements and organization of an LP. • [Figure: Source program → Analysis phase → IR → Synthesis phase → Target program; both phases report errors]
Passes of LP • Language processor pass: A language processor pass is the processing of every statement in a source program, or its equivalent representation, to perform a language processing function. • Pass I: performs analysis of the SP. • Pass II: performs synthesis of the TP.
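A minimal sketch of why two passes handle forward references: Pass I only records where each label is defined, so Pass II can resolve a reference even when it appears before the definition. The tiny source language (`jump`, `print`, `halt`, and `name:` labels) and the function name `two_pass` are hypothetical, chosen only for this illustration.

```python
def two_pass(source):
    """Translate a list of source lines into (opcode, operand...) tuples."""
    # Pass I (analysis): build a symbol table of label addresses.
    symtab = {}
    address = 0
    for line in source:
        line = line.strip()
        if line.endswith(":"):
            symtab[line[:-1]] = address       # label definition, emits no code
        else:
            address += 1                      # every other line is one instruction

    # Pass II (synthesis): generate target code, resolving label references.
    target = []
    for line in source:
        line = line.strip()
        if line.endswith(":"):
            continue
        op, *args = line.split()
        if op == "jump":
            target.append(("jump", symtab[args[0]]))
        else:
            target.append((op, *args))
    return target


# "end" is referenced on the first line but defined two lines later.
print(two_pass(["jump end", "print x", "end:", "halt"]))
```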
Intermediate Representation of Programs • Intermediate representation: An intermediate representation (IR) is a representation of a source program which reflects the effect of some, but not all, analysis and synthesis tasks performed during language processing. • [Figure: SP → Front end → Intermediate representation (IR) → Back end → TP]
IR and Semantic Actions • Properties of IR • Ease of use • Processing efficiency • Memory efficiency: compact • Semantic actions: • All actions performed by the front end, except lexical and syntax analysis, are called semantic actions. They include: • Checking semantic validity • Determining the meaning • Constructing the IR
Toy Compiler • gcc or cc compiler: C or C++ • Toy compiler: ??? • Front End • Lexical analysis • Syntax analysis • Semantic analysis • Back End • Memory allocation • Code generation • [Figure label: Symbol table generation]
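Front End • Lexical analysis (scanning) • Ex. a := b + i; is scanned as id#2 op#5 id#3 op#3 id#1 op#10 • Syntax analysis (parsing) • Ex. a, b : real; a := b + i; • Semantic analysis: an IC tree is generated • [Figure: tree for the declaration, with root real and children a and b]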
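The scanning step above can be sketched as follows. The token pattern and the function name `scan` are assumptions of this sketch, and table entries are numbered here in order of first appearance, so the numbers differ from the id#2 / op#5 entries in the slide, whose tables are not shown.

```python
import re

# An identifier, or one of the operators/delimiters used in the example.
TOKEN_RE = re.compile(r"\s*(?:(?P<id>[A-Za-z][A-Za-z0-9]*)|(?P<op>:=|[+\-*/;]))")


def scan(source):
    """Return (tokens, identifier table, operator table) for a source string."""
    ids, ops, tokens = [], [], []
    source = source.rstrip()
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"invalid lexical unit at position {pos}")
        kind = match.lastgroup                 # "id" or "op"
        lexeme = match.group(kind)
        table = ids if kind == "id" else ops
        if lexeme not in table:
            table.append(lexeme)               # new table entry
        tokens.append(f"{kind}#{table.index(lexeme) + 1}")
        pos = match.end()
    return tokens, ids, ops


print(scan("a := b + i ;"))
```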
Back End • Memory Allocation: • Code Generation: Generating Assembly Lang. • issues: • Determine the places where IR should be kept. • Determine which instructions should be used for type conversion. • Determine which addressing mode should be used for accessing variables.
Fundamentals of Language Specification • Programming language grammars: • Terminal symbols (T) • lowercase letters, punctuation marks, null • Concatenation (.) • Nonterminal symbols (NT): the name of a syntax category of the language • Productions: a production, also called a rewriting rule, is a rule of the grammar • NT = string of T's and NT's • Production form: <article> = a | an | the • <noun> = boy | apple • <noun phrase> = <article> <noun>
Grammar • Def: A grammar G of a language Lg is a quadruple (∑, SNT, S, P) where • ∑ is the set of terminals • SNT is the set of NT's • S is the distinguished symbol • P is the set of productions • Ex: Derive the sentence "a boy ate an apple" • <sentence> = <noun phrase> <verb phrase> • <noun phrase> = <article> <noun> • <verb phrase> = <verb> <noun phrase> • <article> = a | an | the • <noun> = boy | apple • <verb> = ate
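Because this example grammar is not recursive, its language is finite and can simply be enumerated. The sketch below encodes the productions as a Python dictionary (the encoding and the name `sentences` are assumptions for illustration) and checks that "a boy ate an apple" is derivable.

```python
from itertools import product

# Productions of the example grammar; each alternative is a list of symbols.
GRAMMAR = {
    "<sentence>": [["<noun phrase>", "<verb phrase>"]],
    "<noun phrase>": [["<article>", "<noun>"]],
    "<verb phrase>": [["<verb>", "<noun phrase>"]],
    "<article>": [["a"], ["an"], ["the"]],
    "<noun>": [["boy"], ["apple"]],
    "<verb>": [["ate"]],
}


def sentences(symbol):
    """Yield every terminal string derivable from symbol.

    Only terminates for non-recursive grammars such as this one.
    """
    if symbol not in GRAMMAR:                  # terminal symbol
        yield [symbol]
        return
    for rhs in GRAMMAR[symbol]:
        # Combine every derivation of each right-hand-side symbol.
        for parts in product(*(list(sentences(s)) for s in rhs)):
            yield [word for part in parts for word in part]


language = {" ".join(words) for words in sentences("<sentence>")}
print("a boy ate an apple" in language, len(language))
```

The grammar checks only syntax, so it also derives sentences such as "an boy ate a apple"; ruling those out would need further rules.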
Grammar • Derive a + b * c / 5 and construct the parse tree (top down). • <exp> = <exp> + <term> | <term> • <term> = <term> * <factor> | <factor> • <factor> = <factor> / <number> | <id> | <number> • <number> = 0 | 1 | 2 | 3 | … | 9 • Classification of grammars: • Type-0: phrase structure grammar • Type-1: context sensitive grammar • Type-2: context free grammar • Type-3: linear grammar or regular grammar
Binding • Definition: A binding is the association of an attribute of a program entity with a value. • Static binding: a binding performed before the execution of a program begins. • Dynamic binding: a binding performed after the execution of a program has begun.
Chapter - 3 Scanning and Parsing Unit-2
Scanning • Definition: Scanning is the process of recognizing the lexical components in a source string. • Uses a Type-3 grammar, or regular grammar • A regular grammar is used to recognize identifiers • A regular language is obtained from the operations or (union), concatenation and Kleene star (*) • Ex. Write a regular expression that recognizes strings ending with abb. • (a+b)*abb • Ex. Write a regular expression that recognizes identifiers. • R.E. = (letter)(letter/digit)* • digit = 0/1/2/…/9 • letter = a/b/c/…/z
Examples of regular expressions (d: digit, l: letter, [ ]: optional, superscript +: one or more, *: zero or more) • Integer: [+/-] d+ • Real: [+/-] d+ . d+ • Real with optional fraction: [+/-] d+ . d* • Identifier: l (l/d)*
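The same four definitions can be tried out with Python's standard `re` module. This is a sketch: the constant names are made up for the illustration, letters are restricted to a..z as in the slide, and `re.fullmatch` is used so that the whole string must match.

```python
import re

INTEGER = r"[+-]?\d+"              # [+/-] d+
REAL = r"[+-]?\d+\.\d+"            # [+/-] d+ . d+
REAL_OPT_FRACTION = r"[+-]?\d+\.\d*"   # [+/-] d+ . d*
IDENTIFIER = r"[a-z][a-z0-9]*"     # l (l/d)*


def is_a(pattern, text):
    """True if the entire text is described by the regular expression."""
    return re.fullmatch(pattern, text) is not None


print(is_a(INTEGER, "-42"), is_a(REAL, "3."), is_a(REAL_OPT_FRACTION, "3."))
```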
Examples of regular expressions • Strings ending with 0: (0+1)*0 • Strings ending with 11: (0+1)*11 • Strings with an even number of 0's and an odd number of 1's: (00+11+(01+10)(00+11)*(01+10))*(1+0(11)*10)(0(11)*0)* • The language of all strings containing exactly two 0's: 1*01*01* • The language of all strings that do not end with 01: ε+1+(0+1)*0+(0+1)*11
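Regular expressions like these are easy to get subtly wrong, so a brute-force check against a plain-language predicate is a useful habit. The sketch below is written in Python `re` syntax (`|` instead of `+` for union) and compares each expression with its intended meaning on every binary string up to length 8. The parity expression used here was obtained by state elimination on the four-state parity DFA and, like the "does not end with 01" form with an explicit `(0|1)*0` term, is an assumption of this sketch.

```python
import re
from itertools import product

# name -> (regular expression, predicate stating the intended language)
CASES = {
    "ends with 0": (r"(0|1)*0", lambda s: s.endswith("0")),
    "ends with 11": (r"(0|1)*11", lambda s: s.endswith("11")),
    "even 0s, odd 1s": (
        r"((00|11)|(01|10)(00|11)*(01|10))*(1|0(11)*10)(0(11)*0)*",
        lambda s: s.count("0") % 2 == 0 and s.count("1") % 2 == 1,
    ),
    "exactly two 0s": (r"1*01*01*", lambda s: s.count("0") == 2),
    "does not end with 01": (
        r"(|1|(0|1)*0|(0|1)*11)",
        lambda s: not s.endswith("01"),
    ),
}


def counterexamples(max_len=8):
    """Return (case name, string) pairs where regex and predicate disagree."""
    bad = []
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            s = "".join(chars)
            for name, (pattern, predicate) in CASES.items():
                if (re.fullmatch(pattern, s) is not None) != predicate(s):
                    bad.append((name, s))
    return bad


print(counterexamples())
```

An empty list means no disagreement was found up to the tested length; it is evidence, not a proof, that the expression is right.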
Finite State Automaton • FSA: a triple (S, ∑, T) where S is a finite set of states, ∑ is the alphabet of source symbols, and T is a finite set of state transitions • Types of FSA: DFA (deterministic) and NFA (nondeterministic)
DFA from Regular Expression • (0+1)*0 • (11+10)* • (0+1)*(1+00)(0+1)* • [Figures: DFA diagrams for each expression, not reproduced]
Transition Table from DFA • (0+1)*0 • (11+10)* • (0+1)*(1+00)(0+1)* • [Figures: DFA diagrams and their transition tables, not reproduced]
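As an illustration for the first expression, (0+1)*0, a DFA needs only two states: one meaning "the last symbol was not 0" and one meaning "the last symbol was 0". The state names q0 and q1 and the dictionary encoding of the transition table are choices made for this sketch.

```python
# Transition table for (0+1)*0:
#   state | on 0 | on 1
#   q0    | q1   | q0      (start state)
#   q1    | q1   | q0      (accepting state)
TRANSITIONS = {
    ("q0", "0"): "q1", ("q0", "1"): "q0",
    ("q1", "0"): "q1", ("q1", "1"): "q0",
}
START = "q0"
ACCEPTING = {"q1"}


def accepts(string):
    """Run the DFA over the string and report whether it ends in an accepting state."""
    state = START
    for symbol in string:
        state = TRANSITIONS[(state, symbol)]   # exactly one move per symbol
    return state in ACCEPTING


print(accepts("0110"), accepts("011"))
```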
DFA and its Transition Diagram • Check for the given string aabab
Types of Parser • Top-down parser: backtracking parser, predictive parser • Bottom-up parser: shift-reduce parser, LR parser (SLR, LR, LALR)
Example • Expression grammar (with precedence) • Input string x – 2 * y
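Example • Problem: • Can't match the next terminal • We guessed wrong at step 2 • Trace for input x - 2 * y (rule applied: resulting sentential form):
    start: expr
    2: expr + term
    3: term + term
    6: factor + term
    8: <id> + term
    match: <id,x> + term
• The next input symbol is -, but the sentential form expects +. • [Figure: partial parse tree and an arrow marking the current position in the input stream, not reproduced]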
Backtracking • Roll back the productions • Choose a different production for expr • Continue • Trace for input x - 2 * y (rule applied: resulting sentential form):
    start: expr
    2: expr + term
    (undo all of the productions below)
    3: term + term
    6: factor + term
    8: <id> + term
    ?: <id,x> + term
Retrying • Problem: • More input to read • Another cause of backtracking • Trace for input x - 2 * y (rule applied: resulting sentential form):
    start: expr
    2: expr - term
    3: term - term
    6: factor - term
    8: <id> - term
    match: <id,x> - term
    3: <id,x> - factor
    7: <id,x> - <num>
• After <num> matches 2, nothing is left to expand, but * y is still unread. • [Figure: partial parse tree, not reproduced]
Successful Parse • All terminals match, so we're finished • Trace for input x - 2 * y (rule applied: resulting sentential form):
    start: expr
    2: expr - term
    3: term - term
    6: factor - term
    8: <id> - term
    match: <id,x> - term
    4: <id,x> - term * fact
    6: <id,x> - fact * fact
    7: <id,x> - <num> * fact
    match: <id,x> - <num,2> * fact
    8: <id,x> - <num,2> * <id>
• [Figure: complete parse tree, not reproduced]
Problems in Top-down Parsing • Backtracking (seen above) • Left recursion • Left factoring
Left Recursion • Problem: termination • A wrong choice leads to infinite expansion (more importantly: without consuming any input!) • May not be as obvious as this • Our grammar is left recursive • Trace for input x - 2 * y (rule applied: resulting sentential form):
    start: expr
    2: expr + term
    2: expr + term + term
    2: expr + term + term + term
    2: expr + term + term + term + term
Rules for Left Recursion • If A → Aα1 | Aα2 | … | Aαn | β1 | β2 | … | βm • After removal of left recursion: A → β1A' | β2A' | … | βmA' and A' → α1A' | α2A' | … | αnA' | ε • Ex. Apply to • A → Aa | Ab | c | d • A → Ac | Aad | bd | ε
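The rule can be applied mechanically. The sketch below handles immediate left recursion only; productions are lists of symbols, the empty list stands for ε, and the function name is an illustrative choice.

```python
def remove_immediate_left_recursion(nt, productions):
    """Rewrite A -> A a1 | ... | A an | b1 | ... | bm without left recursion.

    Returns a dict mapping each nonterminal to its list of productions.
    """
    new_nt = nt + "'"
    # alpha parts: what follows the leading A in each left-recursive alternative
    alphas = [rhs[1:] for rhs in productions if rhs and rhs[0] == nt]
    # beta parts: alternatives that do not start with A (including epsilon)
    betas = [rhs for rhs in productions if not rhs or rhs[0] != nt]
    if not alphas:
        return {nt: productions}               # nothing to do
    return {
        nt: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],
    }


# First exercise: A -> Aa | Ab | c | d
print(remove_immediate_left_recursion("A", [["A", "a"], ["A", "b"], ["c"], ["d"]]))
# Second exercise: A -> Ac | Aad | bd | epsilon
print(remove_immediate_left_recursion("A", [["A", "c"], ["A", "a", "d"], ["b", "d"], []]))
```

For the first exercise this gives A → cA' | dA' and A' → aA' | bA' | ε; for the second, A → bdA' | A' and A' → cA' | adA' | ε.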
Removing Left Recursion • Two cases of left recursion: • Transform as follows
Left Factoring • When the choice between two productions is not clear, we may be able to rewrite the productions to defer the decision; this is called left factoring. • Ex. stmt → if expr then stmt else stmt | if expr then stmt becomes stmt → if expr then stmt S' and S' → else stmt | ε • Rule: if A → αβ1 | αβ2 then A → αA' and A' → β1 | β2
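A single left-factoring step can be sketched as follows: alternatives are grouped by their first symbol, and each group with more than one member has its longest common prefix factored out. Productions are lists of symbols, the empty list stands for ε, and the function names are illustrative assumptions. Repeated application may be needed when the remainders share a prefix again.

```python
def common_prefix(sequences):
    """Longest list of symbols with which every sequence starts."""
    prefix = []
    for symbols in zip(*sequences):            # stops at the shortest sequence
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor(nt, productions):
    """Apply one left-factoring step to the productions of nt."""
    groups = {}
    for rhs in productions:
        groups.setdefault(rhs[0] if rhs else None, []).append(rhs)

    result = {nt: []}
    primes = 0
    for first, group in groups.items():
        if first is None or len(group) == 1:   # nothing to factor
            result[nt].extend(group)
            continue
        primes += 1
        new_nt = nt + "'" * primes
        prefix = common_prefix(group)
        result[nt].append(prefix + [new_nt])
        result[new_nt] = [rhs[len(prefix):] for rhs in group]
    return result


print(left_factor("stmt", [
    ["if", "expr", "then", "stmt", "else", "stmt"],
    ["if", "expr", "then", "stmt"],
]))
```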
Some examples for left factoring • S → assig_stmt | call_stmt | other • assig_stmt → id = exp • call_stmt → id ( exp_list )
Recursive Descent Parsing • Example grammar: • Rule 1: S → a S b • Rule 2: S → b S a • Rule 3: S → B • Rule 4: B → b B • Rule 5: B → ε • Parse: a a b b b • Has to use R1: S → a S b • Again has to use R1: a S b → a a S b b • Now has to use Rule 2 or 3; follow the order (always R2 first): • a a S b b → a a b S a b b → a a b b S a a b b → a a b b b S a a a b b • Now cannot use Rule 2 any more, so Rule 3 and then the B rules are tried: a a b b b B a a a b b → a a b b b a a a b b, which is incorrect, so backtrack • After some backtracking, finally tried • a S b → a a S b b → a a B b b → a a b B b b → a a b b b, which worked
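The trial-and-error behaviour traced above can be sketched with Python generators: each function yields every input position at which its nonterminal could end, trying the rules in order, so a failed choice falls through to the next alternative. The function names are illustrative; the grammar is the one from the slide.

```python
def parse_S(s, i):
    """Yield every position where an S starting at position i can end."""
    # Rule 1: S -> a S b
    if s.startswith("a", i):
        for j in parse_S(s, i + 1):
            if s.startswith("b", j):
                yield j + 1
    # Rule 2: S -> b S a
    if s.startswith("b", i):
        for j in parse_S(s, i + 1):
            if s.startswith("a", j):
                yield j + 1
    # Rule 3: S -> B
    yield from parse_B(s, i)


def parse_B(s, i):
    """Yield every position where a B starting at position i can end."""
    # Rule 4: B -> b B
    if s.startswith("b", i):
        yield from parse_B(s, i + 1)
    # Rule 5: B -> epsilon (consumes nothing)
    yield i


def accepts(s):
    """The parse succeeds if some alternative consumes the whole input."""
    return any(end == len(s) for end in parse_S(s, 0))


print(accepts("aabbb"), accepts("aab"))
```

Every recursive call consumes at least one input symbol first, which is why this grammar, having no left recursion, cannot loop forever.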
Predictive Parsing • Need to know immediately which rule to apply when seeing the next input character • If for every non-terminal X • We know what the first terminal of each of X's productions would be • And the first terminal of each of X's productions is different • Then • When the current leftmost non-terminal is X • And we can look at the next input character • We know exactly which production should be used next to expand X
Predictive Parsing • Example: • Rule 1: S → a S b (first terminal is a) • Rule 2: S → b S a (first terminal is b) • Rule 3: S → B • Rule 4: B → b B • Rule 5: B → ε • If the next input is a, use R1 • If the next input is b, use R2 • But R3's first terminal is also b (through B → b B), so the choice is not unique and this won't work
Predictive Parsing • Which grammars do not satisfy the condition that every production of a non-terminal starts with a different first terminal? • If two productions of the same non-terminal have the same first symbol (N or T), you can see immediately that it won't work • S → b S a | b B • S → B a | B C • If the grammar is left recursive, then it won't work • S → S a | b B, B → b B | c • The left-recursive rule of S can start with every terminal that the other productions of S can start with • S → b B can start with b, so S → S a can also start with b
Predictive Parsing • Need to rewrite the grammar • Left recursion elimination • This is required even for the recursive descent parsing algorithm • Left factoring • Factor out the common leftmost prefixes
First(α) • First(α) = { t | α ⇒* tβ } (plus ε if α ⇒* ε) • Consider all possible terminal strings derived from α • First(α) is the set of the first terminals of those strings • For all terminals t ∈ T • First(t) = {t}
First(X) • For all non-terminals X ∈ N • If X → ε, add ε to First(X) • If X → α1 α2 … αn • Each αi is either a terminal or a non-terminal (not a string as usual) • Add all terminals in First(α1) to First(X) • Exclude ε • If ε ∈ First(α1) ∩ … ∩ First(αi-1), then add all terminals in First(αi) to First(X) • If ε ∈ First(α1) ∩ … ∩ First(αn), then add ε to First(X) • Apply the rules until nothing more can be added • For adding t or ε: add only if it is not in the set yet
First(X): example • Grammar • E → T E' • E' → + T E' | ε • T → F T' • T' → * F T' | ε • F → ( E ) | id | num • First sets • First(*) = {*}, First(+) = {+}, … • First(F) = {(, id, num} • First(T') = {*, ε} • First(T) = First(F) = {(, id, num} • First(E') = {+, ε} • First(E) = First(T) = {(, id, num}
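The "apply the rules until nothing more can be added" procedure is a fixed-point iteration, sketched below for the example grammar. The dictionary encoding, the `EPS` marker and the function name `first_sets` are assumptions of this sketch; the empty list stands for an ε production.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"], ["num"]],
}


def first_sets(grammar):
    """Compute First(X) for every non-terminal X by iterating to a fixed point."""
    first = {nt: set() for nt in grammar}

    def first_of(symbol):
        # First(t) = {t} for a terminal t
        return first[symbol] if symbol in grammar else {symbol}

    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                before = len(first[nt])
                for symbol in rhs:
                    first[nt] |= first_of(symbol) - {EPS}
                    if EPS not in first_of(symbol):
                        break                  # later symbols cannot come first
                else:
                    first[nt].add(EPS)         # every symbol can vanish (or rhs is empty)
                changed = changed or len(first[nt]) != before
    return first


for nt, terminals in first_sets(GRAMMAR).items():
    print(nt, sorted(terminals))
```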