UNIT - II

UNIT - II. Grammar Formalism: Chomsky hierarchy of languages, Context free grammar, derivation trees, and sentential forms. Right most and leftmost derivation of strings, Ambiguity in context free grammars. Minimization of Context Free Grammars. Chomsky normal form, Greibach normal form.

alyn

Presentation Transcript


  1. UNIT - II • Grammar Formalism: Chomsky hierarchy of languages, Context free grammar, derivation trees, and sentential forms. Right most and leftmost derivation of strings, Ambiguity in context free grammars. Minimization of Context Free Grammars. Chomsky normal form, Greibach normal form. • Push down Automata: Push down automata, definition, model, acceptance of CFL, Acceptance by final state and acceptance by empty stack and its equivalence. Equivalence of CFL and PDA,

  2. Formal Language • A formal language is specified by a well-defined set of syntax rules. • We describe the sentences of a formal language using a grammar.

  3. Grammar • A set of rules defining which strings over an alphabet are in a particular language. Automaton • A mathematical model of a computer that can determine whether a particular string is in the language.

  4. Context-Free Grammar (CFG) The syntax of a programming language is specified using a context-free grammar (CFG). A CFG is defined as G = (T, N, P, S) where • T is the set of terminals (tokens are terminals) • N is the set of non-terminals • S ∈ N is the start symbol • P is the set of production rules of the form A → α, where A ∈ N and α ∈ (N ∪ T)*

  5. CFGs were originally conceived by N. Chomsky as a way to describe natural language. Applications of CFG: CFGs are used in the development of • Parsers (parsing is the process of determining whether a string of tokens can be generated by a grammar) • XML (Extensible Markup Language) and DTDs (Document Type Definitions) • Markup languages (HTML) • The YACC (Yet Another Compiler-Compiler) parser generator

  6. Notational Conventions: 1. Terminals: i) lowercase letters early in the alphabet, such as a, b, c ii) all tokens 2. Non-terminals: i) uppercase letters early in the alphabet, such as A, B, C ii) lowercase italic names, such as exp, stmt, … 3. Uppercase letters late in the alphabet, such as X, Y, Z, represent a grammar symbol that is either a terminal or a non-terminal. 4. Lowercase Greek letters, such as α, β, γ, represent strings of grammar symbols.

  7. Depending on the form of their production rules, grammars are classified into four types: • Unrestricted or Type 0 grammar • Context-sensitive or Type 1 grammar • Context-free or Type 2 grammar • Regular or Type 3 grammar

  8. i) Unrestricted or Type 0 grammar: the production rules are of the form α → β (any number of terminals and non-terminals on both sides), where α, β ∈ (N ∪ T)* and α ≠ ε. ii) Context-sensitive or Type 1 grammar: the production rules are of the form α → β, where α, β ∈ (N ∪ T)* and |α| ≤ |β| (the right side is at least as long as the left side). iii) Context-free or Type 2 grammar: the production rules are of the form A → α, where α ∈ (N ∪ T)* and A ∈ N (only one non-terminal on the left side). iv) Regular or Type 3 grammar: the left side is a single non-terminal and the right side contains at most one non-terminal, placed at one end. Ex: A → a, A → Ba (left-linear), A → aB (right-linear). A given regular grammar uses either the left-linear or the right-linear form throughout, not both.

  9. Chomsky hierarchy of languages and their recognizers • Type 0: Turing machine • Type 1: linear bounded automaton (LBA) • Type 2: pushdown automaton (PDA) • Type 3: finite automaton (FA)

  10. Example: CFG Consider the language consisting of n a's followed by n b's, {aⁿbⁿ | n ≥ 0}. It is generated by the productions • S → a S b • S → ε Formally: G = (T, N, P, S) = ({a, b}, {S}, {S → ε, S → aSb}, S)
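
The two productions above can be exercised directly. The following is a minimal Python sketch, not part of the original slides; the function names `derive` and `in_language` are illustrative assumptions. `derive` lists the sentential forms obtained by applying S → aSb n times and then S → ε, and `in_language` tests membership by undoing the same two rules.

```python
def derive(n):
    """Apply S -> aSb n times, then S -> ε, and return every sentential form."""
    current = "S"
    forms = [current]
    for _ in range(n):
        current = current.replace("S", "aSb")   # S -> a S b
        forms.append(current)
    forms.append(current.replace("S", ""))      # S -> ε
    return forms


def in_language(s):
    """Membership test for {a^n b^n | n >= 0}, mirroring the two productions."""
    if s == "":
        return True                              # matches S -> ε
    if s[0] == "a" and s[-1] == "b":
        return in_language(s[1:-1])              # matches S -> a S b
    return False


print(" => ".join(derive(2)))   # S => aSb => aaSbb => aabb
print(in_language("aabb"), in_language("abab"))   # True False
```

Note that "abab" has equally many a's and b's but is rejected, because the grammar requires all a's to precede all b's.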

  11. Example: CFG The following CFG describes simple arithmetic expressions: • E → E op E | ( E ) | id • op → + | - | * | % From these production rules: T = { (, ), id, +, -, *, % }, N = { E, op }, and the start symbol is S = E.

  12. Derivations • A CFG specifies how to generate syntactically valid strings of terminals: 1) Begin at the start variable. 2) Choose a production with the start variable on its LHS. 3) Replace the start variable with the RHS of that production. 4) Choose a non-terminal A in the resulting string. 5) Choose a production with A on its LHS. 6) Replace A with the RHS of that production. 7) Repeat steps 4 to 6 until no non-terminals remain.

  13. Derivation and Sentential Form • The => meta-symbol indicates that its right side was obtained from its left side by using a production rule to replace one non-terminal. • A derivation is a series of replacement operations that shows how to derive a string of terminals from the start variable. • Each string of terminals and non-terminals along the way is called a sentential form. • The final sentential form, which contains only terminals, is called a sentence (the yield of the derivation).

  14. Derivations • v is one-step derivable from u, written u => v, if v is obtained from u by applying one production. • v is derivable from u, written u =>* v, if there is a chain of one-step derivations of the form u => u1 => u2 => … => v. • =>* means “yields after zero or more replacements”.

  15. Example • The string “x * y + z” can be derived as follows: E => E op E => E op id => E + id => E op E + id => E op id + id => E * id + id => id * id + id, where the three id tokens stand for x, y and z in that order.

  16. Derivation • At each derivation step, we can choose any of the non-terminals in the sentential form for replacement. • If we always choose the left-most non-terminal in each derivation step, the derivation is called a left-most derivation. • If we always choose the right-most non-terminal in each derivation step, the derivation is called a right-most derivation.

  17. Left-Most and Right-Most Derivations Grammar: E → E + E | ( E ) | id • Left-most derivation: E => (E) => (E+E) => (id+E) => (id+id) • Right-most derivation: E => (E) => (E+E) => (E+id) => (id+id)
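
The difference between the two derivation orders can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names `step` and `derive` and the list-of-symbols representation are assumptions, and the code does not check that each chosen right-hand side is actually a production of the grammar.

```python
NONTERMINALS = {"E"}

# Right-hand sides of E -> ( E ) | E + E | id, as lists of symbols.
PAREN = ["(", "E", ")"]
PLUS = ["E", "+", "E"]
ID = ["id"]


def step(form, rhs, leftmost=True):
    """Replace the leftmost (or rightmost) non-terminal in form with rhs."""
    positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
    if not positions:
        raise ValueError("no non-terminal left to replace")
    i = positions[0] if leftmost else positions[-1]
    return form[:i] + rhs + form[i + 1:]


def derive(choices, leftmost=True):
    """Apply the chosen right-hand sides in order, starting from E."""
    form = ["E"]
    history = ["".join(form)]
    for rhs in choices:
        form = step(form, rhs, leftmost)
        history.append("".join(form))
    return history


left = derive([PAREN, PLUS, ID, ID], leftmost=True)
right = derive([PAREN, PLUS, ID, ID], leftmost=False)
print(" => ".join(left))    # E => (E) => (E+E) => (id+E) => (id+id)
print(" => ".join(right))   # E => (E) => (E+E) => (E+id) => (id+id)
```

Both runs use the same productions and reach the same sentence; only the order in which the two E symbols are expanded differs.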

  18. Derivation Tree of a Context-Free Grammar • Represents a derivation as an ordered rooted tree. • The root represents the start symbol. • Internal vertices represent the non-terminal symbols that arise in the derivation. • Leaves represent the terminal symbols. • If the production A → w is used in the derivation, where w is a word (a string of grammar symbols), the vertex that represents A has as children the vertices that represent each symbol of w, in order from left to right.

  19. Parse Tree • Parsing is the process of determining whether a string of tokens can be generated by a grammar. • A parse tree can be seen as a graphical representation of a derivation. [Figure: the parse tree grown step by step alongside the derivation E => (E) => (E+E) => (id+E) => (id+id)]

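  20. Ambiguity • A grammar that produces more than one parse tree (equivalently, more than one left-most derivation or more than one right-most derivation) for some sentence is called an ambiguous grammar. • First left-most derivation of id+id*id: E => E+E => id+E => id+E*E => id+id*E => id+id*id • Second left-most derivation: E => E*E => E+E*E => id+E*E => id+id*E => id+id*id [Figure: the two parse trees for id+id*id, one with + at the root and one with * at the root]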
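
One way to observe ambiguity is to count parse trees. The sketch below assumes the grammar E → E+E | E*E | ( E ) | id, which matches the productions used in the two derivations; the function name `count_parse_trees` is hypothetical. It counts, for every span of the token list, how many distinct parse trees derive that span from E.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees for tokens under E -> E+E | E*E | ( E ) | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees deriving tokens[i:j] from E.
        total = 0
        if j - i == 1 and tokens[i] == "id":
            total += 1                                  # E -> id
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)                # E -> ( E )
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):                 # E -> E + E | E * E
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id", "+", "id", "*", "id"]))   # 2
print(count_parse_trees("( id + id ) * id".split()))     # 1
```

A count greater than one for any sentence shows that the grammar is ambiguous; a count of one for a particular sentence says nothing about the grammar as a whole.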

  21. Elimination of ambiguity in grammars If the grammar has the form S → SβS | α1 | α2 | … | αn, it is replaced by S → SβS1 | S1 and S1 → α1 | α2 | … | αn. Ex: E → E+E | E*E | id | (E) is replaced by • E → E+T | T • T → T*F | F • F → id | (E)

  22. Minimizing Context Free Grammars

  23. Three ways to simplify/clean a CFG • Eliminate useless symbols (clean) • Eliminate ε-productions, of the form A → ε (simplify) • Eliminate unit productions, of the form A → B (simplify)

  24. Eliminating Useless Symbols • A symbol is useful if it appears in some derivation of some terminal string from the start symbol. • Otherwise, it is useless. Eliminate all useless symbols in two steps: 1. Eliminate symbols that derive no terminal string (symbols that are not generating). A symbol X is generating if X =>* w for some w ∈ T*.

  25. 2. Eliminate unreachable symbols. A symbol X is reachable if S =>* αXβ for some strings α, β. • For a symbol X to be “useful”, it has to be both reachable and generating: S =>* αXβ =>* w' for some w' ∈ T*, where the first part shows that X is reachable and the second that it is generating.

  26. • First, eliminate all symbols that are not generating. • Next, eliminate all symbols that are not reachable. The order of these steps is important.

  27. Eliminating Useless Symbols: Example • S → AB | a • A → b • A and S are generating. • B is not generating (and therefore B is useless). • Eliminating B (i.e., removing all productions that involve B) gives: S → a, A → b • Now A is not reachable and is therefore useless. • Simplified G: S → a

  28. What would happen if you reverse the order, i.e., test reachability before generating? Grammar: S → AB | a, A → b. 1) Eliminate non-reachable symbols: S, A and B are all reachable from S (through S → AB), so nothing is removed. 2) Eliminate non-generating symbols: B is not generating, so S → AB is removed, leaving S → a and A → b. Hence the grammar obtained in this order is S → a, A → b. Following this order fails to remove A → b, even though A is no longer reachable.
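
The effect of the ordering can be checked with a short Python sketch. Everything here is an illustrative assumption rather than part of the slides: grammars are dictionaries mapping a variable to a list of bodies, each body is a list of symbols, and a symbol counts as a variable when it is uppercase.

```python
def generating(grammar):
    """Variables that derive some terminal string (fixed-point iteration)."""
    gen = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in gen:
                continue
            for body in bodies:
                if all(sym in gen or not sym.isupper() for sym in body):
                    gen.add(head)
                    changed = True
                    break
    return gen


def reachable(grammar, start):
    """Variables that occur in some sentential form derivable from start."""
    seen = {start}
    stack = [start]
    while stack:
        head = stack.pop()
        for body in grammar.get(head, []):
            for sym in body:
                if sym.isupper() and sym not in seen:
                    seen.add(sym)
                    stack.append(sym)
    return seen


def keep_only(grammar, keep):
    """Drop every production that mentions a variable outside keep."""
    return {
        head: [b for b in bodies if all(s in keep or not s.isupper() for s in b)]
        for head, bodies in grammar.items() if head in keep
    }


def remove_useless(grammar, start):
    g = keep_only(grammar, generating(grammar))      # step 1: generating
    return keep_only(g, reachable(g, start))         # step 2: reachable


def remove_useless_wrong_order(grammar, start):
    g = keep_only(grammar, reachable(grammar, start))
    return keep_only(g, generating(g))


G = {"S": [["A", "B"], ["a"]], "A": [["b"]]}   # B has no productions
print(remove_useless(G, "S"))               # {'S': [['a']]}
print(remove_useless_wrong_order(G, "S"))   # {'S': [['a']], 'A': [['b']]}
```

With the correct order only S → a survives; with the reversed order the unreachable production A → b is left behind.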

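  29. Eliminating ε-productions • A variable A is said to be “nullable” if A =>* ε. • First detect all nullable variables; then remove the ε-productions, adding for each remaining production the variants obtained by omitting nullable variables from its right side.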

  30. Example: Eliminating ε-productions • Let L be the language represented by the following CFG G: • S → AB • A → aAA | ε • B → bBB | ε • Nullable symbols: {A, B}, and therefore also S, since S → AB • G1 can be constructed from G as follows: • B → b | bB | bB | bBB ==> B → b | bB | bBB • Similarly, A → a | aA | aAA • Similarly, S → A | B | AB • Simplified grammar G1: • S → A | B | AB • A → a | aA | aAA • B → b | bB | bBB • Because S is nullable, ε ∈ L; G1 generates L − {ε}.
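
The construction of G1 can be reproduced mechanically. The following Python sketch is an assumption-laden illustration, not the slides' own algorithm text: grammars are dictionaries from a variable to a list of bodies, the empty list `[]` stands for ε, and the names `nullable_symbols` and `remove_epsilon` are hypothetical.

```python
from itertools import product


def nullable_symbols(grammar):
    """Variables A with A =>* ε, found by fixed-point iteration."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(
                all(sym in nullable for sym in body) for body in bodies
            ):
                nullable.add(head)
                changed = True
    return nullable


def remove_epsilon(grammar):
    """Return an equivalent grammar (minus ε itself) with no ε-productions."""
    nullable = nullable_symbols(grammar)
    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            # A nullable symbol may be kept or dropped; others are always kept.
            options = [((s,), ()) if s in nullable else ((s,),) for s in body]
            for choice in product(*options):
                candidate = [s for part in choice for s in part]
                if candidate and candidate not in new_bodies:
                    new_bodies.append(candidate)
        result[head] = new_bodies
    return result


G = {
    "S": [["A", "B"]],
    "A": [["a", "A", "A"], []],   # [] stands for ε
    "B": [["b", "B", "B"], []],
}
G1 = remove_epsilon(G)
for head, bodies in G1.items():
    print(head, "->", " | ".join("".join(b) for b in bodies))
```

The bodies produced for S, A and B are the same as in G1 above, although they are listed in a different order.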

  31. Eliminating unit productions • A unit production is one whose right side consists of exactly one variable (A → B). • These productions can be eliminated. • Identify unit pairs: (A, B) is a unit pair if A =>* B using only unit productions. • If A =>* B and B → ω is a non-unit production, then add A → ω.

  32. The Unit Pair Algorithm: To remove unit productions • Suppose A → B1 → B2 → … → Bn → α • Action: replace all intermediate productions so that α is produced directly, i.e., A → α; B1 → α; …; Bn → α. Definition: (A, B) is a “unit pair” if A =>* B using only unit productions. • We can find all unit pairs inductively: • Basis: every pair (A, A) is a unit pair (by definition). Similarly, if A → B is a production, then (A, B) is a unit pair. • Induction: if (A, B) and (B, C) are unit pairs, then (A, C) is also a unit pair.

  33. The Unit Pair Algorithm: To remove unit productions Input: G = (V, T, P, S) Goal: to build G1 = (V, T, P1, S) devoid of unit productions Algorithm: • Find all unit pairs in G. • For each unit pair (A, B) in G: • Add to P1 a new production A → α for every non-unit production B → α in P. • If a resulting production is already in P1, there is no need to add it again.

  34. Eliminating unit productions • E → T | E+T • T → F | T*F • F → I | (E) • I → a | b | Ia | Ib | I0 | I1 • How to eliminate unit productions? • Replace E → T with E → F | T*F • Then apply the same substitution recursively wherever a unit production remains: • E → F | T*F | E+T (substituting for T) • E → I | (E) | T*F | E+T (substituting for F) • E → a | b | Ia | Ib | I0 | I1 | (E) | T*F | E+T (substituting for I) • Now E has no unit productions. • Similarly, eliminate the remaining unit productions. Recall that (A, B) is a “unit pair” if A =>* B; find all unit pairs.

  35. Example: eliminating unit productions G: • E → T | E+T • T → F | T*F • F → I | (E) • I → a | b | Ia | Ib | I0 | I1 G1: • E → E+T | T*F | (E) | a | b | Ia | Ib | I0 | I1 • T → T*F | (E) | a | b | Ia | Ib | I0 | I1 • F → (E) | a | b | Ia | Ib | I0 | I1 • I → a | b | Ia | Ib | I0 | I1
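
The unit pair algorithm applied to this grammar can be sketched in Python as follows. The representation is an assumption made for illustration: a grammar is a dictionary from each variable to a list of bodies, and a body is a unit production when it consists of a single symbol that is itself a key of the dictionary. The function names are hypothetical.

```python
def unit_pairs(grammar):
    """All pairs (A, B) with A =>* B using only unit productions."""
    pairs = {(a, a) for a in grammar}                 # basis: (A, A)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(pairs):
            for body in grammar[b]:
                is_unit = len(body) == 1 and body[0] in grammar
                if is_unit and (a, body[0]) not in pairs:
                    pairs.add((a, body[0]))           # induction step
                    changed = True
    return pairs


def remove_unit_productions(grammar):
    """For each unit pair (A, B), copy B's non-unit bodies to A."""
    pairs = unit_pairs(grammar)
    result = {a: [] for a in grammar}
    for a in grammar:
        for b in grammar:
            if (a, b) not in pairs:
                continue
            for body in grammar[b]:
                is_unit = len(body) == 1 and body[0] in grammar
                if not is_unit and body not in result[a]:
                    result[a].append(body)
    return result


G = {
    "E": [["T"], ["E", "+", "T"]],
    "T": [["F"], ["T", "*", "F"]],
    "F": [["I"], ["(", "E", ")"]],
    "I": [["a"], ["b"], ["I", "a"], ["I", "b"], ["I", "0"], ["I", "1"]],
}
G1 = remove_unit_productions(G)
for head, bodies in G1.items():
    print(head, "->", " | ".join("".join(b) for b in bodies))
```

For this input the sketch finds ten unit pairs and produces the same four production lists as G1 above.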

  36. Normal Forms

  37. Why normal forms? • If all productions of the grammar can be expressed in the same form(s), then: • It becomes easier to design algorithms that use the grammar • It becomes easier to prove properties of the grammar

  38. Chomsky Normal Form (CNF) G is said to be in Chomsky Normal Form (a special form for CFGs) if all its productions are in one of the following two forms: • A → BC, where A, B, C are variables, or • A → a, where a is a terminal. In addition, G has no useless symbols. Because every production has the form A → BC or A → a, it follows that • G has no unit productions • G has no ε-productions

  39. Steps to convert a CFG to CNF 1) Eliminate useless symbols (repeat this after steps 2 and 3, since removing ε-productions and unit productions can leave new useless symbols). 2) Determine all nullable variables and get rid of all ε-productions. 3) Get rid of all unit productions. 4) Break up long productions so that every right side has length at most 2. 5) Replace each terminal in a right side of length 2 by a new variable whose only production is of the form A → c. 6) Finally, every production should be of the form A → BC or A → c.
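
Steps 4 and 5 can be illustrated with a small Python sketch. It assumes the input grammar already has no ε-productions, unit productions or useless symbols, represents a grammar as a dictionary from variables to lists of bodies, and invents fresh variable names X1, X2, … (hypothetical names that are assumed not to clash with existing symbols). Terminals are replaced first and long bodies are split afterwards, which gives the same kind of result as the order listed above.

```python
def to_cnf(grammar):
    """Steps 4-5 only: input must have no ε- or unit productions."""
    result = {head: [] for head in grammar}
    terminal_var = {}      # terminal -> fresh variable deriving it
    counter = 0

    def fresh():
        nonlocal counter
        while True:
            counter += 1
            name = f"X{counter}"
            if name not in result:
                return name

    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) == 1:
                result[head].append(list(body))       # already A -> a
                continue
            # Replace each terminal in a long body by a variable deriving it.
            symbols = []
            for sym in body:
                if sym not in grammar:
                    if sym not in terminal_var:
                        terminal_var[sym] = fresh()
                        result[terminal_var[sym]] = [[sym]]
                    sym = terminal_var[sym]
                symbols.append(sym)
            # Break bodies longer than two into a chain of binary productions.
            current = head
            while len(symbols) > 2:
                nxt = fresh()
                result[current].append([symbols[0], nxt])
                result[nxt] = []
                current = nxt
                symbols = symbols[1:]
            result[current].append(symbols)
    return result


def is_cnf(grammar):
    """True if every body is two variables or a single terminal."""
    for bodies in grammar.values():
        for body in bodies:
            two_vars = len(body) == 2 and all(s in grammar for s in body)
            one_term = len(body) == 1 and body[0] not in grammar
            if not (two_vars or one_term):
                return False
    return True


G = {"E": [["E", "+", "T"], ["a"]], "T": [["(", "E", ")"], ["b"]]}
cnf = to_cnf(G)
print(is_cnf(G), is_cnf(cnf))   # False True
```

In this run E → E+T becomes E → E X2 with X2 → X1 T and X1 → +, so every right side ends up as either two variables or one terminal.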

  40. Greibach Normal Form (GNF) G is said to be in Greibach Normal Form if all its productions are in one of the following two forms: • A → a A1 A2 … An, where a is a terminal and A1, …, An are variables (for example A → aA1A2A3), or • A → a, where a is a terminal

  41. Steps to convert a CFG to GNF 1) Eliminate useless symbols 2) Eliminate ε-productions 3) Eliminate unit productions (after the ε-productions, because removing ε-productions can create new unit productions) 4) Bring G into CNF 5) Eliminate left recursion 6) Bring every production into the form • A → a A1 A2 … An, where a is a terminal and A1, …, An are variables, or • A → a, where a is a terminal

  42. Algorithm to convert a CFG to GNF

  43. Left Recursion • A grammar is left recursive if it has a non-terminal A such that there is a derivation A =>+ Aα for some string α. • Top-down parsing techniques (in compilers) cannot handle left-recursive grammars. • So we have to convert a left-recursive grammar into an equivalent grammar that is not left-recursive.

  44. Immediate Left-Recursion • A → Aα | β, where β does not start with A. Eliminating the immediate left recursion gives an equivalent grammar: • A → βA' • A' → αA' | ε In general, • A → Aα1 | ... | Aαm | β1 | ... | βn, where β1 ... βn do not start with A. Eliminating the immediate left recursion gives an equivalent grammar: • A → β1A' | ... | βnA' • A' → α1A' | ... | αmA' | ε

  45. Immediate Left-Recursion: Example • E → E+T | T • T → T*F | F • F → id | (E) Eliminating the immediate left recursion gives: • E → T E' • E' → +T E' | ε • T → F T' • T' → *F T' | ε • F → id | (E)
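
The rewriting rule A → Aα | β ⇒ A → βA', A' → αA' | ε can be written as a short Python sketch. The representation is an assumption for illustration: a grammar is a dictionary from variables to lists of bodies, `[]` stands for ε, and the new variable is named by appending an apostrophe to the old one (assumed not to be in use already). Only immediate left recursion is handled.

```python
def eliminate_immediate_left_recursion(grammar):
    """Rewrite A -> Aα | β as A -> βA', A' -> αA' | ε for every variable A."""
    result = {}
    for head, bodies in grammar.items():
        alphas = [b[1:] for b in bodies if b and b[0] == head]   # A -> A α
        betas = [b for b in bodies if not b or b[0] != head]     # A -> β
        if not alphas:
            result[head] = [list(b) for b in bodies]             # nothing to do
            continue
        new = head + "'"
        result[head] = [beta + [new] for beta in betas]
        result[new] = [alpha + [new] for alpha in alphas] + [[]]  # [] is ε
    return result


G = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["id"], ["(", "E", ")"]],
}
for head, bodies in eliminate_immediate_left_recursion(G).items():
    print(head, "->", " | ".join(" ".join(b) if b else "ε" for b in bodies))
```

For the expression grammar this prints the five productions for E, E', T, T' and F listed above.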

  46. Greibach Normal Form (GNF) Example: Convert the following CFG to GNF: S → XA | BB, B → b | SB, X → b, A → a. 1) Convert to CNF: the grammar is already in CNF. 2) Rename the variables: S = A1, X = A2, A = A3, B = A4. Updated CNF grammar: • A1 → A2A3 | A4A4 • A4 → b | A1A4 • A2 → b • A3 → a May 27, 2009

  47. 3) Arrange that every production Ai → Aj Xk has j >= i, where Xk is a string of zero or more variables. • A1 → A2A3 | A4A4 • A4 → b | A1A4 • A2 → b • A3 → a The production A4 → A1A4 violates this condition (1 < 4).

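  48. 3) Ai → Aj Xk with j >= i (continued): repair A4 → A1A4 by substitution. • Substituting A1 → A2A3 | A4A4 gives A4 → A2A3A4 | A4A4A4 | b • Substituting A2 → b gives A4 → bA3A4 | A4A4A4 | b • The other productions are unchanged: A1 → A2A3 | A4A4, A2 → b, A3 → a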

  49. 4) Eliminate left recursion • A1 → A2A3 | A4A4 • A4 → bA3A4 | A4A4A4 | b • A2 → b • A3 → a The production A4 → A4A4A4 is left recursive.

  50. 4) Elimination of left recursion: introduce a new variable Z. • A4 → bA3A4 | A4A4A4 | b becomes A4 → bA3A4 | b | bA3A4Z | bZ and Z → A4A4 | A4A4Z • The other productions are unchanged: A1 → A2A3 | A4A4, A2 → b, A3 → a
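
The step above uses a variant of left-recursion elimination that avoids ε-productions: A → Aα | β is rewritten as A → β | βZ and Z → α | αZ. A minimal Python sketch of that variant follows; the function name and the list-based representation are illustrative assumptions, and the sketch handles only one variable at a time.

```python
def eliminate_left_recursion_no_epsilon(head, bodies, new_var):
    """Rewrite A -> Aα | β as A -> β | βZ and Z -> α | αZ (no ε needed)."""
    alphas = [b[1:] for b in bodies if b[0] == head]    # tails of A -> A α
    betas = [b for b in bodies if b[0] != head]         # bodies of A -> β
    head_bodies = betas + [beta + [new_var] for beta in betas]
    new_bodies = alphas + [alpha + [new_var] for alpha in alphas]
    return head_bodies, new_bodies


a4_bodies = [["b", "A3", "A4"], ["A4", "A4", "A4"], ["b"]]
a4_new, z_new = eliminate_left_recursion_no_epsilon("A4", a4_bodies, "Z")
print("A4 ->", " | ".join("".join(b) for b in a4_new))
print("Z  ->", " | ".join("".join(b) for b in z_new))
```

Applied to A4 → bA3A4 | A4A4A4 | b, it yields A4 → bA3A4 | b | bA3A4Z | bZ and Z → A4A4 | A4A4Z, matching the productions shown above.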
