
Earley’s Algorithm: General Context-Free Parsing


Presentation Transcript


  1. Earley’s Algorithm: General Context-Free Parsing. Lecture 12, P. N. Hilfinger

  2. Parsing General Context-Free Grammars
  • Shift-reduce parsing can work for most practical applications.
  • However, one must sometimes munge the grammar, though not as much as for LL(1).
  • It cannot handle ambiguity, nor situations where resolving an ambiguity requires looking far ahead.
  • Today, we’ll look at a method that can: Earley’s Algorithm.
  • In fact, shift-reduce parsing is a highly optimized special case of this algorithm.

  3. Earley’s Algorithm: Basic Idea
  • Scan tokens left-to-right.
  • At each point, keep track of all possible subtrees that could include the current point in the input, based on everything seen so far.
  • At the end of the input, if there is a tree rooted at the start symbol, we’ve found a parse (possibly many).

  4. Some Notation
  • If the input is s = s₁s₂…sₙ, then “position k” in the input is just after sₖ and before sₖ₊₁, with position 0 at the beginning and position n at the end.
  • At each input position k, compute a set of items, where each item has the form A → α • β, m, with A → αβ a production and 0 ≤ m ≤ k.
  • Together, the items in the set describe all subtrees of possible parse trees that begin or end at position k, or have a child that does.
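One minimal way to hold such an item in code is a small record type. The sketch below uses Python; the names Item, dot, and origin are illustrative choices, not notation from the lecture.

```python
from collections import namedtuple

# One Earley item  A -> alpha . beta, origin  where:
#   lhs    -- the nonterminal A
#   rhs    -- the right-hand side of the production, as a tuple of symbols
#   dot    -- how many symbols of rhs have been matched so far (the bullet's position)
#   origin -- the input position m at which this attempt at an A began
Item = namedtuple("Item", ["lhs", "rhs", "dot", "origin"])

def next_symbol(item):
    """The symbol just after the dot, or None if the item is complete."""
    return item.rhs[item.dot] if item.dot < len(item.rhs) else None

# Example: the item  E -> E . + T, 0  from the running example below.
example = Item("E", ("E", "+", "T"), 1, 0)
```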

  5. Meaning of an Item
  • An item A → α • β, m at position k means:
    • The input between positions m and k matches α.
    • Depending on what sₖ₊₁…sₙ is, there might be a subtree formed from production A → αβ in the (or a) parse tree for the entire string.
  • So when β is empty, the item means that there is a possible handle for A → α that ends at k.
  • That leaves the problem of figuring out what items to put in each set.

  6. Example
  • Grammar:  E → E + T   E → T   T → T * int   T → int
  • Input: 0 int 1 + 2 int 3 * 4 int 5
  • At position 0, we expect to see an E to our right, formed from one of E’s productions.
  • Plus, since an E can start with a T, we won’t be surprised by a T formed from one of its productions.
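For the sketches in this transcript, the example grammar and input might be encoded like this (the dictionary-of-productions layout and the names GRAMMAR, START, and TOKENS are assumptions for illustration, not part of the lecture):

```python
# Productions of the example grammar, keyed by their left-hand sides.
GRAMMAR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "int"), ("int",)],
}
START = "E"

# The input  int + int * int , already split into tokens.
# Position k in the lecture's sense falls just after TOKENS[k-1].
TOKENS = ["int", "+", "int", "*", "int"]
```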

  7. Example: Getting Started
  Start with items for the start symbol E, and (since an E can start with a T) also add items for T.
  Item set 0 (before the first int):
    E → • T, 0
    E → • E + T, 0
    T → • int, 0
    T → • T * int, 0

  8. Closure Items
  • Whenever we have an item B → α • A β, j in item set m, it indicates that a substring producing A might start at this position.
  • That’s what the item A → • γ, m means, so we also add those items (one for each production A → γ) to item set m.
  • These are called closure items.
  • The other items are kernel items.
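In code, this closure (or prediction) step could look like the following, reusing the Item helper and GRAMMAR sketch above. Returning a flag that says whether anything new was added lets a caller repeat the step until the item set stops growing.

```python
def predict(item_set, k, grammar):
    """Closure step for item set k: whenever the dot sits before a
    nonterminal A, add  A -> . gamma, k  for every production A -> gamma."""
    added = False
    for item in list(item_set):            # iterate over a snapshot; the set may grow
        sym = next_symbol(item)
        if sym in grammar:                 # dot is just before a nonterminal
            for rhs in grammar[sym]:
                closure_item = Item(sym, tuple(rhs), 0, k)
                if closure_item not in item_set:
                    item_set.add(closure_item)
                    added = True
    return added
```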

  9. Example: Computing next item set
  Item set 0 (before the first int):
    E → • T, 0
    E → • E + T, 0
    T → • int, 0
    T → • T * int, 0
  Item set 1 (after the first int, before +):
    T → int •, 0
    T → T • * int, 0
    E → T •, 0
    E → E • + T, 0

  10. Computing next item set
  • For each item of the form A → α • c β, k in item set m, where c = sₘ₊₁ is the next input symbol, insert A → α c • β, k into item set m+1.
  • For each complete item A → γ •, k in item set m+1, and each item B → α • A β, j back in item set k, add the item B → α A • β, j to item set m+1. (When creating a parse tree, the A in this new item will have children γ, as denoted by dashed red arrows in our examples.)
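These two rules are usually called the scanner and the completer. A sketch of both, in the same style as the helpers above (here the chart is simply a list of item sets, one per input position):

```python
def scan(item_set, token):
    """Scanner: advance the dot over the next input token, producing the
    kernel items of the following item set."""
    return {item._replace(dot=item.dot + 1)
            for item in item_set
            if next_symbol(item) == token}

def complete(chart, m):
    """Completer: for each finished item  A -> gamma . , k  in chart[m],
    advance every item back in chart[k] whose dot sits just before A."""
    added = False
    for finished in list(chart[m]):
        if next_symbol(finished) is None:                # item is complete
            for waiting in list(chart[finished.origin]):
                if next_symbol(waiting) == finished.lhs:
                    advanced = waiting._replace(dot=waiting.dot + 1)
                    if advanced not in chart[m]:
                        chart[m].add(advanced)
                        added = True
    return added
```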

  11. Continuing the Example, Set 2
  Item set 1 (after the first int):
    T → int •, 0
    T → T • * int, 0
    E → T •, 0
    E → E • + T, 0
  Item set 2 (after +):
    E → E + • T, 0
    T → • T * int, 2    (closure item)
    T → • int, 2        (closure item)

  12. Continuing the Example, Set 3
  Item set 2 (after +):
    E → E + • T, 0
    T → • T * int, 2
    T → • int, 2
  Item set 3 (after the second int):
    T → int •, 2
    T → T • * int, 2
    E → E + T •, 0
    E → E • + T, 0    (advanced from an item in item set 0)

  13. Continuing the Example, Sets 4 & 5
  Item set 4 (after *):
    T → T * • int, 2
  Item set 5 (after the third int):
    T → T * int •, 2
    T → T • * int, 2
    E → E + T •, 0    ACCEPT!
    E → E • + T, 0

  14. Accepting the String
  • In the last item set, we have a completed item for the start symbol that started in set 0.
  • That means “the input between position 0 and the end matches an entire production for the start symbol,” so the string parses correctly.
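Putting the pieces together gives a small recognizer. This is only a sketch built on the Item, predict, scan, and complete helpers above; repeating prediction and completion until an item set stops changing is one simple (if not the fastest) way to make sure the two rules feed each other.

```python
def earley_recognize(tokens, grammar, start):
    """Return True iff tokens can be derived from start under grammar."""
    chart = [set() for _ in range(len(tokens) + 1)]

    # Item set 0: one item per production of the start symbol; its closure
    # is filled in by the predict/complete loop below.
    for rhs in grammar[start]:
        chart[0].add(Item(start, tuple(rhs), 0, 0))

    for m in range(len(tokens) + 1):
        # Repeat prediction and completion until item set m stops changing.
        changed = True
        while changed:
            changed = predict(chart[m], m, grammar)
            changed = complete(chart, m) or changed
        # Scan the next token to seed the kernel of item set m + 1.
        if m < len(tokens):
            chart[m + 1] |= scan(chart[m], tokens[m])

    # Accept iff some item  start -> gamma . , 0  reached the last item set.
    return any(item.lhs == start and item.origin == 0 and next_symbol(item) is None
               for item in chart[-1])

print(earley_recognize(TOKENS, GRAMMAR, START))   # expected output: True
```

A real implementation would typically drive each item set with a work-list instead of re-running the two steps to a fixed point, but the resulting item sets are the same.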

  15. Retrieving a Parse Tree or Derivation
  • Start with a completed item in the last set that produces the whole input (one of the form S → γ •, 0 for the start symbol S).
  • Follow the red arrows to find how to expand that symbol.
  • Work backwards through the sets to find the expansions of the other nonterminals.

  16. Getting a Tree from our Example (I)
  Item set 5:
    T → T * int •, 2
    T → T • * int, 2
    E → E + T •, 0    (start here)
    E → E • + T, 0
  The completed item E → E + T •, 0 gives the root: E expands to E + T, and the completed item T → T * int •, 2 expands that T, which covers “int * int”. To find out how to expand the inner T, go back to chart 3 (before “* int”).

  17. Getting a Tree from our Example (II)
  Item set 3:
    T → int •, 2
    T → T • * int, 2
    E → E + T •, 0
    E → E • + T, 0
  The completed item T → int •, 2 expands that inner T: it covers the second “int”. To find out how to expand the E in E + T, go back to chart 1 (before +).

  18. Figuring out Where to Look
  • In the last slide, we had to figure out where to look for the derivation of the E in E + T.
  • We used the items T → T • * int, 2 and T → int •, 2 to get the T in E + T, both of which tell us that the T started at position 2 (item set 2).
  • And since + is a terminal, we then have to go back one more position, to chart 1.

  19. Getting a Tree from our Example (III)
  Item set 1:
    T → int •, 0
    T → T • * int, 0
    E → T •, 0    (start here)
    E → E • + T, 0
  The completed items E → T •, 0 and T → int •, 0 expand the left E: it derives T, which derives the first “int”. The full tree now covers “int + int * int”.

  20. An Ambiguous Grammar (I)
  • Grammar:  E → E + E   E → E * E   E → int
  • Input: 0 int 1 + 2 int 3 * 4 int 5
  Item set 0:
    E → • int, 0
    E → • E + E, 0
    E → • E * E, 0
  Item set 1 (after the first int):
    E → int •, 0
    E → E • + E, 0
    E → E • * E, 0

  21. An Ambiguous Grammar (II)
  Item set 2 (after +):
    E → E + • E, 0
    E → • int, 2
    E → • E + E, 2
    E → • E * E, 2
  Item set 3 (after the second int):
    E → int •, 2
    E → E • + E, 2
    E → E • * E, 2
    E → E + E •, 0
    E → E • + E, 0
    E → E • * E, 0

  22. An Ambiguous Grammar (III)
  Item set 4 (after *):
    E → E * • E, 2
    E → E * • E, 0
    E → • int, 4
    E → • E + E, 4
    E → • E * E, 4
  Item set 5 (after the third int):
    E → int •, 4
    E → E * E •, 2
    E → E * E •, 0
    E → E • + E, 4
    E → E • * E, 4
    E → E + E •, 0
  There are two ways to produce the E starting at 0 (E → E * E •, 0 and E → E + E •, 0), reflecting the ambiguity.
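The recognizer sketched above handles this grammar unchanged: ambiguity only shows up as extra items in the chart, never as a conflict. A quick check (again only a sketch, reusing earley_recognize):

```python
# The ambiguous grammar of this example.
AMBIGUOUS = {
    "E": [("E", "+", "E"), ("E", "*", "E"), ("int",)],
}

# The string is accepted, and the final item set contains both completed
# items  E -> E + E . , 0  and  E -> E * E . , 0 , one per parse.
print(earley_recognize(["int", "+", "int", "*", "int"], AMBIGUOUS, "E"))   # True
```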

  23. Just for Fun…
  The grammar E → ε, E → E E is ferociously ambiguous: it produces ε in an infinite number of ways!
  Item set 0 (on empty input):
    E → •, 0
    E → • E E, 0
    E → E • E, 0
    E → E E •, 0

  24. Relationship to LR Shift-Reduce Parsing
  • With an LR(1) grammar, we never have item sets in which two items have the same production, with the dot in the same place, but different starting positions.
  • So, ignoring the starting positions, there are only finitely many possible item sets.
  • These are the states of the shift-reduce parser.
