
Chapter 5 Context-Free Grammars

Grammar (Review). A grammar is a 4-tuple G = (NT, T, P, S), where NT is a finite set of non-terminal symbols, T is a finite set of terminal (alphabet) symbols, S ∈ NT is the starting symbol, and P is a finite set of production rules x → y.

binh

Presentation Transcript


  1. Chapter 5 Context-Free Grammars

  2. Grammar (Review)
     • A grammar is a 4-tuple G = (NT, T, P, S), where
       • NT: a finite set of non-terminal symbols
       • T: a finite set of terminal (alphabet) symbols
       • S: the starting symbol, S ∈ NT
       • P: a finite set of production rules x → y, with x ∈ (NT ∪ T)+ and y ∈ (NT ∪ T)*
     • Given w = uxv and a production x → y, applying the rule replaces x with y to obtain z = uyv (written w ⇒ z)
     • Derivation of w ∈ T*: S ⇒ w1 ⇒ ··· ⇒ wn ⇒ w (written S ⇒* w)
     • The language generated by the grammar is L(G) = {w ∈ T* : S ⇒* w}
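
A minimal sketch of the definitions on this slide, assuming every symbol is a single character and writing λ as the empty string. The names `Grammar` and `derive_one_step` are illustrative and not part of the slides.

```python
from typing import NamedTuple


class Grammar(NamedTuple):
    """A grammar G = (NT, T, P, S) with single-character symbols."""
    nonterminals: frozenset
    terminals: frozenset
    productions: tuple  # pairs (x, y) meaning x -> y, with x non-empty
    start: str


def derive_one_step(grammar, w):
    """Return every z with w => z, i.e. w = uxv and z = uyv for some rule x -> y."""
    results = set()
    for x, y in grammar.productions:
        i = w.find(x)
        while i != -1:
            results.add(w[:i] + y + w[i + len(x):])
            i = w.find(x, i + 1)
    return results


# Example grammar (an assumption for illustration): S -> aSb | lambda
G = Grammar(frozenset("S"), frozenset("ab"), (("S", "aSb"), ("S", "")), "S")
print(derive_one_step(G, "aSb"))
```

Applying `derive_one_step` repeatedly, starting from `G.start`, enumerates the sentential forms; those containing only terminals are the words of L(G).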

  3. Linear and Regular Grammars (Review)
     Given a grammar G = (NT, T, P, S), let A, B ∈ NT and w ∈ T*.
     A grammar is linear if at most one NT is on the right side of any rule.
     A right-linear grammar has productions of the form A → w or A → wB.
     A left-linear grammar has productions of the form A → w or A → Bw.
     A regular grammar is either right-linear or left-linear.
     A regular grammar is linear, but not all linear grammars are regular. For example, the grammar S → aSa | bSb | a | b | λ is linear but not regular.
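
The classification above can be checked mechanically. The sketch below assumes single-character symbols, a single non-terminal on every left-hand side, and λ written as the empty string; `classify` is a hypothetical helper name.

```python
def classify(productions, nonterminals):
    """productions: list of (A, rhs) pairs; returns which properties hold."""
    # For each rule, record where non-terminals occur in the right side.
    info = [
        ([i for i, s in enumerate(rhs) if s in nonterminals], len(rhs))
        for _, rhs in productions
    ]
    linear = all(len(pos) <= 1 for pos, _ in info)
    # Right-linear: the single non-terminal, if any, is the last symbol.
    right = linear and all(not pos or pos[0] == n - 1 for pos, n in info)
    # Left-linear: the single non-terminal, if any, is the first symbol.
    left = linear and all(not pos or pos[0] == 0 for pos, _ in info)
    return {
        "linear": linear,
        "right_linear": right,
        "left_linear": left,
        "regular": right or left,
    }


# S -> aSa | bSb | a | b | lambda: linear but not regular
palindromes = [("S", "aSa"), ("S", "bSb"), ("S", "a"), ("S", "b"), ("S", "")]
print(classify(palindromes, {"S"}))
```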

  4. 5.1: Context-Free Grammars (1)
     Context-free grammars (CFGs) are language-description mechanisms used to generate the strings of a language.
     A grammar is said to be context-free if every rule has a single non-terminal on the left-hand side, A → α, where
       A ∈ NT: a single non-terminal on the left-hand side
       α ∈ (NT ∪ T)*: a string of terminals and non-terminals
     A language L is called a context-free language (CFL) if and only if there is a context-free grammar G that generates it (i.e., L = L(G)).

  5. 5.1: Context-Free Grammars (2) The term context-free is based on the fact that the substitution of a NT on the left can be made anytime that NT appears in a sentential form, i.e., it does not depend on the rest of the sentential form (the context). This means you can apply the rule in any context. You might imagine a context-sensitive rule aaAb → aayb where the substitution of the variable depends on its context. In the example, we can only replace A with y when it is in the given context.

  6. 5.1: Context-Free Grammars (3)
     Example 5.1
     G = ({S}, {a, b}, S, {S → aSa | bSb | λ})
     Context-free? Linear? Regular?
     A typical derivation might be: S ⇒ aSa ⇒ aaSaa ⇒ aabSbaa ⇒ aabbaa
     The grammar generates the language L(G) = {ww^R : w ∈ {a, b}*}
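
One way to sanity-check the claimed language of Example 5.1 is to enumerate all short words by brute force. This is a sketch under stated assumptions: upper-case letters are non-terminals, λ is the empty string, and the search only terminates for grammars such as this linear one, where the number of sentential forms with a bounded number of terminals is finite. `generate` is an illustrative name.

```python
from collections import deque


def generate(productions, start, max_len):
    """All terminal strings of length <= max_len derivable from start."""
    seen, words = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Index of the leftmost non-terminal, or None if the form is a word.
        i = next((k for k, s in enumerate(form) if s.isupper()), None)
        if i is None:
            words.add(form)
            continue
        for rhs in productions[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            terminals = sum(1 for s in new if not s.isupper())
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words


words = generate({"S": ["aSa", "bSb", ""]}, "S", 4)
print(sorted(words, key=lambda w: (len(w), w)))
```

Every word found has the form ww^R, which agrees with the language stated on the slide for lengths up to 4.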

  7. 5.1: Context-Free Grammars (4)
     Example 5.2
     G = ({S, A, B}, {a, b}, S, {S → abB, A → aaBb, B → bbAa, A → λ})
     Context-free? Linear? Regular?
     The grammar generates the language L(G) = {ab(bbaa)^n bba(ba)^n : n ≥ 0}
     See Examples 5.3 and 5.4

  8. 5.1: Context-Free Grammars (5)
     What is the language of this grammar?
       G = ({S}, {a, b}, S, {S → aSb, S → ab})
       L(G) = {a^n b^n | n ≥ 1}
     What is the language of this grammar?
       G = ({S}, {a, b}, S, {S → aSb, S → λ})
       L(G) = {a^n b^n | n ≥ 0}

  9. 5.1: Context-Free Grammars (6)
     What is the language of this grammar?
       G = ({S, A}, {a, b}, S, {S → aSa | aAa, A → bA | b})
       L(G) = {a^n b^m a^n | n ≥ 1, m ≥ 1}
     What is the language of these grammars?
       G1 = ({S, A, B}, {a, b}, S, {S → AB, A → aA | a, B → bB | λ})
       G2 = ({S, B}, {a, b}, S, {S → aS | aB, B → bB | λ})
       L(G1) = L(G2) = a+b* = {a^n b^m | n ≥ 1, m ≥ 0}

  10. 5.1: Context-Free Grammars (7)
     What is the language of this grammar?
       G = ({S, B}, {a, b, c}, S, {S → abScB | λ, B → bB | b})
       L = {(ab)^n (cb^m)^n | n ≥ 0, m ≥ 1}, where each of the n blocks cb^m may use its own m ≥ 1
     What is the grammar of this language?
       L = {a^n b^m c^m d^n | n ≥ 1, m ≥ 1}
       G = ({S, A}, {a, b, c, d}, S, {S → aSd | aAd, A → bAc | bc})
     What is the grammar of this language?
       L = {a^n b^m c^m d^(2n) | n ≥ 0, m ≥ 1}
       G = ({S, A}, {a, b, c, d}, S, {S → aSdd | A, A → bAc | bc})

  11. 5.1: Context-Free Grammars (8)
     What is the grammar of this language?
       L = a*ba*ba*
       G1 = ({S, A}, {a, b}, S, {S → AbAbA, A → aA | λ})
       G2 = ({S, A, C}, {a, b}, S, {S → aS | bA, A → aA | bC, C → aC | λ})

  12. 5.1: Context-Free Grammars (9)
     What is the grammar of this language?
       The language over {a, b} of strings with at least 2 b's
       G1 = ({S, A}, {a, b}, S, {S → AbAbA, A → aA | bA | λ})
       G2 = ({S, A, C}, {a, b}, S, {S → aS | bA, A → aA | bC, C → aC | bC | λ})

  13. 5.1: Context-Free Grammars (10)
     What is the grammar of this language?
       The language over {a, b} of even-length strings
       G = ({S, O}, {a, b}, S, {S → aO | bO | λ, O → aS | bS})
     What is the grammar of this language?
       The language over {a, b} of strings with an even number of b's
       G = ({S, A}, {a, b}, S, {S → aS | bA | λ, A → aA | bS})

  14. 5.1: Context-Free Grammars (11)
     What is the grammar of this language?
       The language over {a, b, c} of strings that do not contain the substring abc
       G = ({S, A, B}, {a, b, c}, S, {S → bS | cS | aA | λ, A → aA | bB | cS | λ, B → aA | bS | λ})

  15. 5.1: Context-Free Grammars (12)
     Write a CFG for the following languages
       L1 = {wcw^R | w ∈ {a, b}*}
         S → aSa | bSb | c
       L2 = {a^n b^n c^m d^m | n ≥ 1, m ≥ 1}
         S → XY
         X → aXb | ab
         Y → cYd | cd

  16. 5.1: Context-Free Grammars (13)
     The following languages are NOT CFLs:
       L1 = {wcw | w ∈ {a, b}*}
       L2 = {a^n b^m c^n d^m | n ≥ 1, m ≥ 1}
       L3 = {a^n b^n c^n | n ≥ 0}
     G = ({S}, {a, b}, S, {S → aSb | SS | λ})
     Is this grammar context-free? Yes, there is a single variable on the left-hand side of every rule.
     Is this grammar linear? No, a rule (S → SS) has more than one variable on the right-hand side.

  17. 5.1: Context-Free Grammars (14) Are regular languages context-free? Why? yes, because context-free means that there is a single variable on the left side of each rule. All regular languages are generated by grammars that have a single variable on the left side of each grammar rule. But, not all context-free grammars are regular. Regular languages are a proper subset of the class of context-free languages. Regular grammars are a proper subset of context-free grammars. Non-regular languages can be generated by context-free grammars, so the term context-free generally includes non-regular languages and regular languages.

  18. 5.1: Context-Free Grammars (15) Leftmost & Rightmost Derivations
     • Given the grammar S → aAB, A → bBb, B → A | λ
     • The string abbbb can be derived in different ways:
     • Leftmost derivation: always replace the leftmost NT in each sentential form
       S ⇒ aAB ⇒ abBbB ⇒ abAbB ⇒ abbBbbB ⇒ abbbbB ⇒ abbbb
     • Rightmost derivation: always replace the rightmost NT in each sentential form
       S ⇒ aAB ⇒ aA ⇒ abBb ⇒ abAb ⇒ abbBbb ⇒ abbbb
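
The two derivations of abbbb can be replayed step by step. The sketch below assumes upper-case letters are non-terminals and λ is the empty string; `derive` and `P` are illustrative names, and the rule sequences are read off the derivations on this slide.

```python
P = {"S": ["aAB"], "A": ["bBb"], "B": ["A", ""]}


def derive(productions, start, rhs_choices, leftmost=True):
    """Apply each chosen right-hand side to the leftmost (or rightmost)
    non-terminal and return the list of sentential forms."""
    form, steps = start, [start]
    for rhs in rhs_choices:
        positions = [i for i, s in enumerate(form) if s.isupper()]
        i = positions[0] if leftmost else positions[-1]
        if rhs not in productions[form[i]]:
            raise ValueError(f"{form[i]} -> {rhs!r} is not a rule")
        form = form[:i] + rhs + form[i + 1:]
        steps.append(form)
    return steps


lm = derive(P, "S", ["aAB", "bBb", "A", "bBb", "", ""])
rm = derive(P, "S", ["aAB", "", "bBb", "A", "bBb", ""], leftmost=False)
print(" => ".join(lm))
print(" => ".join(rm))
```

Both runs use the same six rule applications and differ only in the order in which the non-terminals are expanded.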

  19. 5.1: Context-Free Grammars (16)
     DEFINITION
       ⇒    derives in one step
       ⇒+   derives in one or more steps
       ⇒*   derives in zero or more steps
       ⇒G   indicates that the derivation utilizes the rules of grammar G

  20. 5.1: Context-Free Grammars (17)
     Derivation Order
     Leftmost (⇒lm): replace the leftmost non-terminal symbol
       E ⇒lm E op E ⇒lm id op E ⇒lm id * E ⇒lm id * id
     Rightmost (⇒rm): replace the rightmost non-terminal symbol
       E ⇒rm E op E ⇒rm E op id ⇒rm E * id ⇒rm id * id
     Important notes, for a rule A → y:
       If xAz ⇒lm xyz, what is true about x? It contains only terminals.
       If xAz ⇒rm xyz, what is true about z? It contains only terminals.

  21. 5.1: Context-Free Grammars (18) Example
     • S → S + E | E
     • E → number | ( S )
     • Leftmost derivation
       S ⇒ S+E ⇒ E+E ⇒ (S)+E ⇒ (S+E)+E ⇒ (S+E+E)+E ⇒ (E+E+E)+E ⇒ (1+E+E)+E ⇒ (1+2+E)+E ⇒ (1+2+(S))+E ⇒ (1+2+(S+E))+E ⇒ (1+2+(E+E))+E ⇒ (1+2+(3+E))+E ⇒ (1+2+(3+4))+E ⇒ (1+2+(3+4))+5
     • Rightmost derivation
       S ⇒ S+E ⇒ S+5 ⇒ E+5 ⇒ (S)+5 ⇒ (S+E)+5 ⇒ (S+(S))+5 ⇒ (S+(S+E))+5 ⇒ (S+(S+4))+5 ⇒ (S+(E+4))+5 ⇒ (S+(3+4))+5 ⇒ (S+E+(3+4))+5 ⇒ (S+2+(3+4))+5 ⇒ (E+2+(3+4))+5 ⇒ (1+2+(3+4))+5

  22. 5.1: Context-Free Grammars (19) Derivation (Parsing) Trees
     Given the grammar S → aaSB | λ, B → bB | b
     A leftmost derivation: S ⇒ aaSB ⇒ aaB ⇒ aab
     A rightmost derivation: S ⇒ aaSB ⇒ aaSb ⇒ aab
     Both derivations correspond to the same parse (or derivation) tree.

  23. 5.1: Context-Free Grammars (20) In a parse tree, the nodes labeled with NTs correspond to the left side of a production rule and the children of that node correspond to the right side of the rule, e.g., S → aaSB. The tree structure shows the rule that is applied to each NT (for a specific derivation), without showing the order of rule application. Each internal node of the tree corresponds to a NT, and the leaves of the derivation tree represent the string of terminals. Note the tree applies to a specific derivation and may not include all rules, e.g., B → bB is not shown above.

  24. 5.1: Context-Free Grammars (21)
     Definition 5.3: Let G = (NT, T, S, P) be a context-free grammar. An ordered tree is a derivation tree for G if and only if it has the following properties:
       1. the root is labeled S
       2. every leaf has a label from T ∪ {λ}
       3. every non-leaf (interior) vertex has a label from NT
       4. if a vertex has label A ∈ NT, and its children are labeled (left to right) a1, a2, ..., an, then P must contain a production of the form A → a1a2…an
       5. a leaf labeled λ has no siblings; that is, a vertex with a child labeled λ can have no other children
     Partial derivation tree: a tree having properties 3–5, but for which property 1 may not hold, and for which property 2 is replaced with:
       2a. every leaf has a label from NT ∪ T ∪ {λ}

  25. 5.1: Context-Free Grammars (22) Derivation → Parse Tree
     Grammar: S → S + E | E, E → number | ( S )
     Derivation: S ⇒ S+E ⇒ E+E ⇒ (S)+E ⇒ (S+E)+E ⇒ (S+E+E)+E ⇒ (E+E+E)+E ⇒ (1+E+E)+E ⇒ (1+2+E)+E ⇒ … ⇒ (1+2+(3+4))+E ⇒ (1+2+(3+4))+5
     [Figure: parse tree for (1+2+(3+4))+5]

  26. 5.1: Context-Free Grammars (23)
     Grammar: E → E op E | ( E ) | - E | id, op → + | - | * | / | ↑
     Derivation: E ⇒ E op E ⇒ id op E ⇒ id * E ⇒ id * id
     [Figure: the parse tree for id * id, built up step by step along this derivation]

  27. 5.1: Context-Free Grammars (24) • A derivation tree for grammar G yields a sentence of the language L(G) • A partial derivation tree yields a sentential form • The yield of a parse tree is the string of leaf symbols obtained by reading the tree left-to-right • the order they are encountered when the tree is traversed in a depth-first manner, always taking the leftmost unexplored branch

  28. 5.1: Context-Free Grammars (25)
     • Given the grammar S → aAB, A → bBb, B → A | λ,
     • the yield of the partial derivation tree is the sentential form abBbB

  29. 5.1: Context-Free Grammars (26)
     • Given the grammar S → aAB, A → bBb, B → A | λ,
     • the yield of the derivation tree is the sentence abbbb ∈ L(G).
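
The yield can be computed by a depth-first, left-to-right traversal. In this sketch a tree node is a pair (label, children), a leaf has no children, and a λ leaf carries the empty string as its label. The two trees are reconstructed from the grammar and the stated yields, since the slide figures are not part of this transcript; `leaf` and `tree_yield` are illustrative names.

```python
def leaf(label):
    return (label, ())


def tree_yield(node):
    """Concatenate the leaf labels in depth-first, left-to-right order."""
    label, children = node
    if not children:
        return label
    return "".join(tree_yield(child) for child in children)


LAMBDA = leaf("")

# Partial derivation tree: S -> aAB, A -> bBb, with both B leaves unexpanded.
partial = ("S", (leaf("a"), ("A", (leaf("b"), leaf("B"), leaf("b"))), leaf("B")))

# Derivation tree for abbbb: the inner B -> A -> bBb, and every remaining B -> lambda.
inner_A = ("A", (leaf("b"), ("B", (LAMBDA,)), leaf("b")))
full = ("S", (leaf("a"),
              ("A", (leaf("b"), ("B", (inner_A,)), leaf("b"))),
              ("B", (LAMBDA,))))

print(tree_yield(partial))  # sentential form
print(tree_yield(full))     # sentence
```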

  30. 5.1: Context-Free Grammars (27)
     • Theorem 5.1 (Relationship between Sentential Forms & Derivation Trees)
     • Let G = (NT, T, S, P) be a CFG. Then
       • for every w ∈ L(G), there exists a derivation tree of G whose yield is w.
       • the yield, w, of any derivation tree of G is such that w ∈ L(G).
       • if tG is any partial derivation tree for G whose root is labeled S, then the yield of tG is a sentential form of G.
     • As a side note, any w ∈ L(G) has a leftmost and a rightmost derivation.

  31. 5.1: Context-Free Grammars (28)
     • Given G = ({S, A, B}, {a, b}, S, {S → abB, A → aaBb | λ, B → bbAa}),
     • show L(G) = {ab(bbaa)^n bba(ba)^n : n ≥ 0}.
     • Basis: S ⇒ abB ⇒ abbbAa ⇒ abbba = ab(bbaa)^0 bba(ba)^0
     • Inductive hypothesis: assume S ⇒* ab(bbaa)^i bba(ba)^i
     • Inductive step: if we "back up" one step and, rather than terminating the hypothesis derivation via an A → λ substitution, continue to expand, we get:
       S ⇒* ab(bbaa)^i bbAa(ba)^i   (from which A → λ gives ab(bbaa)^i bba(ba)^i)
         ⇒ ab(bbaa)^i bbaaBba(ba)^i
         = ab(bbaa)^(i+1) B(ba)^(i+1)
         ⇒ ab(bbaa)^(i+1) bbAa(ba)^(i+1)
         ⇒ ab(bbaa)^(i+1) bba(ba)^(i+1)
     • By induction, S ⇒* ab(bbaa)^n bba(ba)^n for all n ≥ 0
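
The closed form can also be checked numerically for small n. This is not a substitute for the induction, only a sanity check. The sketch relies on the fact that every sentential form of this grammar contains exactly one non-terminal, so `str.replace` rewrites exactly one symbol; the function names are illustrative.

```python
def derive_word(n):
    """Derive the word obtained by using A -> aaBb exactly n times."""
    form = "abB"                            # S => abB
    for _ in range(n):
        form = form.replace("B", "bbAa")    # B -> bbAa
        form = form.replace("A", "aaBb")    # A -> aaBb
    form = form.replace("B", "bbAa")        # B -> bbAa
    return form.replace("A", "")            # A -> lambda


def closed_form(n):
    """ab (bbaa)^n bba (ba)^n"""
    return "ab" + "bbaa" * n + "bba" + "ba" * n


for n in range(4):
    print(n, derive_word(n), derive_word(n) == closed_form(n))
```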

  32. 5.1: Context-Free Grammars (29)
     • Given G = ({S, A, B}, {a, b}, S, {S → abB, A → aaBb | λ, B → bbAa})
     • a leftmost derivation tree for w = abbbaabbaba.
     [Figure: derivation tree for w = abbbaabbaba]

  33. 5.1: Context-Free Grammars (30)
     • In this grammar every derivation is both leftmost and rightmost, since there can never be more than one variable in a sentential form.
     • Even so, to read a leftmost derivation off the tree, we simply expand the leftmost variable first, starting from the root. The result is:
       S ⇒ abB ⇒ abbbAa ⇒ abbbaaBba ⇒ abbbaabbAaba ⇒ abbbaabbaba

  34. 5.2: Parsing and Ambiguity (1)
     In practical applications, it is usually not enough to decide whether a string belongs to a language. It is also important to know how to derive the string from the grammar. Parsing uncovers the syntactical structure of a string, which is represented by a parse tree. The syntactical structure is important for assigning semantics to the string, for example if it is a computer program.

  35. 5.2: Parsing and Ambiguity (2) • Application of parsing (Compilers) • Let G be a context-free grammar for the C language. Let the string w be a C program. • One thing a compiler does - in particular, the part of the compiler called the “parser” - is determine whether w is a syntactically correct C program. • It also constructs a parse tree for the program that is used in code generation. • There are many sophisticated and efficient algorithms for parsing. You may study them in more advanced classes (for example, on compilers).

  36. 5.2: Parsing and Ambiguity (3)
     • S-grammars
     • Definition 5.4: A context-free grammar G = (NT, T, S, P) is said to be a simple grammar or s-grammar if all of its productions are of the form
       A → ax, where A ∈ NT, a ∈ T, x ∈ NT*, and any pair (A, a) occurs at most once in P.
     • Examples:
       • S → aS | bSS | c is an s-grammar.
       • S → aS | bSS | aSB | c is not an s-grammar, because the pair (S, a) occurs in two productions: S → aS and S → aSB

  37. 5.2: Parsing and Ambiguity (4)
     • Example: given the s-grammar G with productions S → aS | bSS | c, show the derivation of the string w = abcc
     • Since G is an s-grammar,
       • the only way to get the a up front is via rule S → aS
       • the only way to get the b is via rule S → bSS
       • the only way to get each c is via rule S → c
     • Thus, we have parsed the string in 4 steps as
       S ⇒ aS ⇒ abSS ⇒ abcS ⇒ abcc
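
Because each pair (A, a) selects at most one rule, an s-grammar can be parsed with one rule lookup per input symbol. A minimal sketch, assuming single-character symbols; `parse_s_grammar` and the dictionary encoding of the rules are illustrative choices, not taken from the slides.

```python
def parse_s_grammar(rules, start, w):
    """rules maps (A, a) to x for each production A -> a x, x in NT*.
    Returns the leftmost derivation of w as a list of sentential forms,
    or None if w is not in the language."""
    stack = [start]      # pending non-terminals, leftmost on top
    consumed = ""
    steps = [start]
    for a in w:
        if not stack:
            return None  # input left over but nothing to expand
        A = stack.pop()
        x = rules.get((A, a))
        if x is None:
            return None  # no rule A -> a x
        stack.extend(reversed(x))
        consumed += a
        steps.append(consumed + "".join(reversed(stack)))
    return steps if not stack else None


# S -> aS | bSS | c
RULES = {("S", "a"): "S", ("S", "b"): "SS", ("S", "c"): ""}
print(parse_s_grammar(RULES, "S", "abcc"))
```

Each input symbol costs a single dictionary lookup, so a string w is parsed in |w| steps.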

  38. 5.2: Parsing and Ambiguity (5) • Find an s-grammar for • aaa*b | b • We need to start with an a or have only the b. • S → aA • S → b • A → aB • B → aB • B → b

  39. 5.2: Parsing and Ambiguity (6)
     • A terminal string may be generated by a number of different derivations
     • Let G be a CFG. A string w is in L(G) iff there is a leftmost derivation of w from S (S ⇒*lm w).
     • Question: is there a unique leftmost derivation of every sentence (string) in the language of a grammar?
     • NO

  40. 5.2: Parsing and Ambiguity (7)
     Consider the expression grammar: E → E+E | E*E | (E) | -E | id
     Two different leftmost derivations of id + id * id (first derivation):
       E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
     [Figure: parse tree with + at the root]

  41. 5.2: Parsing and Ambiguity (8)
     Consider the expression grammar: E → E+E | E*E | (E) | -E | id
     Two different leftmost derivations of id + id * id (second derivation):
       E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
     [Figure: parse tree with * at the root]

  42. 5.2: Parsing and Ambiguity (9) Ambiguity
     • Definition 5.5: a grammar G is ambiguous if there is a string with at least two parse trees, or equivalently two or more leftmost (or two or more rightmost) derivations.
     • Example: the CFG G with productions S → aSb | SS | λ is ambiguous because there are two parse trees for w = aabb.

  43. 5.2: Parsing and Ambiguity (10)
     • A CFG is ambiguous if there is a string w ∈ L(G) that can be derived by two distinct leftmost derivations
     • A grammar G is ambiguous if there exists a sentence in L(G) with more than one derivation (parsing) tree
     • A grammar that is not ambiguous is called unambiguous
     • If G is ambiguous, then L(G) is not necessarily inherently ambiguous
     • A language L is inherently ambiguous if there is no unambiguous grammar that generates it

  44. 5.2: Parsing and Ambiguity (11)
     • Let G be S → aS | Sa | a
     • G is ambiguous since the string aa has 2 distinct leftmost derivations
       • S ⇒ aS ⇒ aa
       • S ⇒ Sa ⇒ aa
     • L(G) = a+
     • This language is also generated by the unambiguous grammar S → aS | a
     • L(G) is not inherently ambiguous. Why?

  45. 5.2: Parsing and Ambiguity (12)
     • Let G be S → bS | Sb | a, with L(G) = b*ab*
     • G is ambiguous since the string bab has 2 distinct leftmost derivations
       • S ⇒ bS ⇒ bSb ⇒ bab
       • S ⇒ Sb ⇒ bSb ⇒ bab
     • The ability to generate the b's in either order must be eliminated to obtain an unambiguous grammar
     • This language is also generated by the unambiguous grammars
       G1: S → bS | aA, A → bA | λ
       G2: S → bS | A, A → Ab | a

  46. 5.2: Parsing and Ambiguity (13) Eliminating Ambiguity
     • Consider the following grammar
       E → E + E | E * E | num
     • Consider the sentence 1 + 2 * 3
     • Leftmost Derivation 1
       E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3
     • Leftmost Derivation 2
       E ⇒ E * E ⇒ E + E * E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3
     [Figure: the two parse trees for 1 + 2 * 3]
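
For this particular grammar the number of parse trees of a token sequence can be counted directly: a single num has one tree, and a longer sequence splits at any operator into a left and a right subexpression. The sketch below is specific to E → E + E | E * E | num and assumes the input is already split into tokens; `count_parse_trees` is an illustrative name.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Number of parse trees for tokens under E -> E + E | E * E | num."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Trees deriving tokens[i:j] from E.
        if j - i == 1:
            return 1 if tokens[i].isdigit() else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                # The operator at position k is the root of the tree.
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["1", "+", "2", "*", "3"]))
```

A count greater than 1 for any sentence is exactly what makes the grammar ambiguous.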

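  47. 5.2: Parsing and Ambiguity (14)
     • The parse trees of the two leftmost derivations (LMD-1 and LMD-2) correspond to different evaluations (meanings) of 1 + 2 * 3:
       • the tree with * at the root computes (1 + 2) * 3 = 9
       • the tree with + at the root computes 1 + (2 * 3) = 7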
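
The two evaluations can be reproduced by evaluating each tree bottom-up. In this sketch a tree is either an integer or a tuple (operator, left, right); the encoding and the names are assumptions made for illustration.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def evaluate(tree):
    """Evaluate an expression tree bottom-up."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))


plus_at_root = ("+", 1, ("*", 2, 3))    # 1 + (2 * 3)
times_at_root = ("*", ("+", 1, 2), 3)   # (1 + 2) * 3

print(evaluate(plus_at_root), evaluate(times_at_root))
```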

  48. 5.2: Parsing and Ambiguity (15)
     • Ambiguous: E → E + E | E * E | number
     • Both + and * have the same precedence
     • To remove ambiguity, you have to give + and * different precedence
     • Let us say * has higher precedence than +
       E → E + T | T
       T → T * num | num

  49. 5.2: Parsing and Ambiguity (16) Eliminating Ambiguity
     • E → E + T | T
     • T → T * num | num
     • Consider the sentence 1 + 2 * 3
     • Leftmost Derivation (only one LMD)
       E ⇒ E + T ⇒ T + T ⇒ num + T ⇒ 1 + T ⇒ 1 + T * num ⇒ 1 + num * num ⇒ 1 + 2 * num ⇒ 1 + 2 * 3 = 7
     [Figure: the unique parse tree, with + at the root and 2 * 3 grouped under T]

  50. 5.2: Parsing and Ambiguity (17)
     • Let us say + has higher precedence than *
       E → E * T | T
       T → T + num | num
     • Consider the sentence 1 + 2 * 3
     • Leftmost Derivation (only one LMD)
       E ⇒ E * T ⇒ T * T ⇒ T + num * T ⇒ num + num * T ⇒ 1 + num * T ⇒ 1 + 2 * T ⇒ 1 + 2 * num ⇒ 1 + 2 * 3 = 9
     [Figure: the unique parse tree, with * at the root and 1 + 2 grouped under T]
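
A small evaluator that follows the grammar E → E + T | T, T → T * num | num is sketched below. It is hand-written with one function per non-terminal, and the left-recursive rules are replaced by loops, which accepts the same strings and keeps the left-to-right grouping. That substitution, the token-list input and the function name are choices of this sketch, not something the slides prescribe. Malformed input raises an exception.

```python
def evaluate_expression(tokens):
    """Evaluate tokens such as ["1", "+", "2", "*", "3"]."""
    pos = 0

    def parse_T():
        # T -> T * num | num, written as num (* num)*
        nonlocal pos
        value = int(tokens[pos])
        pos += 1
        while pos < len(tokens) and tokens[pos] == "*":
            value *= int(tokens[pos + 1])
            pos += 2
        return value

    def parse_E():
        # E -> E + T | T, written as T (+ T)*
        nonlocal pos
        value = parse_T()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            value += parse_T()
        return value

    result = parse_E()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


print(evaluate_expression(["1", "+", "2", "*", "3"]))
```

Because * is handled one level below +, the product 2 * 3 is always grouped first, matching the single parse tree of the unambiguous grammar.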
