Earley’s Algorithm: General Context-Free Parsing Lecture 12 P. N. Hilfinger Prof. Hilfinger CS164 Lecture 12
Parsing General Context-Free Grammars
• Shift-reduce parsing can work for most practical applications.
• However, one must sometimes munge the grammar, though not as much as for LL(1).
• Shift-reduce parsing cannot handle ambiguity, nor situations where resolving ambiguities requires looking far ahead.
• Today, we’ll look at a method that can: Earley’s Algorithm.
• In fact, shift-reduce parsing is a highly optimized special case of this algorithm.
Earley’s Algorithm: Basic Idea
• Scan tokens left-to-right.
• At each point, keep track of all possible subtrees that could include the current point in the input, based on everything seen so far.
• At the end of the input, if there is a tree that is rooted at the start symbol, we’ve found a parse (possibly many).
Some Notation
• If the input is s = s_1 s_2 … s_n, then “position k” in the input is just after s_k and before s_{k+1}, with position 0 at the beginning and position n at the end.
• At each input position, k, compute a set of items, where each item has the form A → α • β, m, where A → αβ is a production and 0 ≤ m ≤ k.
• Together, the items in the set describe all subtrees of possible parse trees that begin or end at position k or have a child that does.
Meaning of an Item
• An item A → α • β, m at position k means:
    • The input between positions m and k matches α.
    • Depending on what s_{k+1} … s_n is, there might be a subtree formed from production A → αβ in the (or a) parse tree for the entire string.
• So when β is empty, the item means that there is a possible handle for A that ends at k.
• So that leaves the problem of figuring out what items to put in each set.
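One way to picture an item in code is as a small immutable record. The sketch below is only an illustration, not part of the lecture: the class name `Item` and its field and method names are assumptions, and the dot is stored as a count of matched right-hand-side symbols rather than as a symbol.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Item:
    """Hypothetical representation of an Earley item A → α • β, m."""
    lhs: str                 # A, the nonterminal being derived
    rhs: Tuple[str, ...]     # full right-hand side: α followed by β
    dot: int                 # number of right-hand-side symbols matched so far
    start: int               # m, the input position where the match began

    def next_symbol(self) -> Optional[str]:
        """Symbol just after the dot, or None when β is empty."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def is_complete(self) -> bool:
        """True when β is empty, i.e. the item is a possible handle."""
        return self.dot == len(self.rhs)

    def __str__(self) -> str:
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} → {' '.join(symbols)}, {self.start}"


item = Item("T", ("T", "*", "int"), dot=1, start=2)
done = Item("T", ("int",), dot=1, start=2)
print(item)                       # T → T • * int, 2
print(done, done.is_complete())   # T → int •, 2 True
```

Making the record frozen keeps items hashable, so they can be stored in sets and compared when checking whether an item is already present.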
Example
• Grammar:
    E → E + T
    E → T
    T → T * int
    T → int
• Input: 0 int 1 + 2 int 3 * 4 int 5 (the numbers mark input positions)
• At position 0, we expect to see an E to our right, formed from one of E’s productions.
• Plus, since an E can start with a T, we won’t be surprised by a T formed from one of its productions.
Example: Getting Started
• Item set 0 (before the first int):
    E → • T, 0
    E → • E + T, 0
    T → • int, 0
    T → • T * int, 0
• Start with items for the start symbol and (since E can start with T) also add items for T.
Closure Items
• Whenever we have an item B → α • A β, j in item set m, it indicates that a substring producing A might start at this position.
• That’s what the item A → • γ, m means, so we also add those items (for each production A → γ) to item set m.
• These are called closure items.
• Other items are kernel items.
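The closure rule can be sketched as a small function. This is a minimal illustration under stated assumptions: items are plain tuples `(lhs, rhs, dot, start)`, the grammar is a dictionary from nonterminal to a list of right-hand sides, and the names `GRAMMAR` and `close` are made up for this example.

```python
# Example grammar from the slides, as nonterminal -> list of right-hand sides.
GRAMMAR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "int"), ("int",)],
}


def close(items, m, grammar):
    """Return the items of set m together with all closure items they imply."""
    result = list(items)
    seen = set(result)
    for lhs, rhs, dot, start in result:        # result grows while we iterate
        if dot < len(rhs) and rhs[dot] in grammar:
            nonterminal = rhs[dot]             # a nonterminal right after the dot
            for production in grammar[nonterminal]:
                new_item = (nonterminal, production, 0, m)
                if new_item not in seen:
                    seen.add(new_item)
                    result.append(new_item)
    return result


# Item set 0: start from the items for the start symbol E.
set0 = close([("E", ("E", "+", "T"), 0, 0), ("E", ("T",), 0, 0)], 0, GRAMMAR)
for item in set0:
    print(item)

# Item set 2: the kernel item E → E + • T, 0 pulls in the two T items.
set2 = close([("E", ("E", "+", "T"), 2, 0)], 2, GRAMMAR)
```

Appending to the list while looping over it lets closure items that themselves have a nonterminal after the dot trigger further closure without a separate worklist.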
Example: Computing next item set
• Item set 0 (before int):
    E → • T, 0
    E → • E + T, 0
    T → • int, 0
    T → • T * int, 0
• Item set 1 (after int, before +):
    T → int •, 0
    T → T • * int, 0
    E → T •, 0
    E → E • + T, 0
Computing next item set
• For each item of the form A → α • c β, k in item set m, where c = s_{m+1} is the next input symbol, insert A → α c • β, k in item set m+1.
• For each complete item A → γ •, k in item set m+1, and each item B → α • A β, j back in item set k, add item B → α A • β, j to item set m+1. (When creating a parse tree, the A in this new item will have children γ, as denoted by dashed red arrows in our examples.)
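The two rules above can be sketched as one step function. This is an illustration rather than the lecture's code: items are assumed to be tuples `(lhs, rhs, dot, start)`, `sets` is the list of item sets built so far, and the name `scan_and_complete` is hypothetical. The closure step is left out, and the sketch assumes a grammar with no empty productions.

```python
def scan_and_complete(sets, token):
    """Build the kernel of item set m+1 from sets[0..m] and the next token."""
    m = len(sets) - 1
    new_set = []

    def add(item):
        if item not in new_set:
            new_set.append(item)

    # First rule: move the dot over the next input symbol.
    for lhs, rhs, dot, k in sets[m]:
        if dot < len(rhs) and rhs[dot] == token:
            add((lhs, rhs, dot + 1, k))

    # Second rule: for each complete item A → γ •, k, advance every item
    # back in set k that was waiting for an A.
    for lhs, rhs, dot, k in new_set:           # new_set grows while we iterate
        if dot == len(rhs):
            for b_lhs, b_rhs, b_dot, j in sets[k]:
                if b_dot < len(b_rhs) and b_rhs[b_dot] == lhs:
                    add((b_lhs, b_rhs, b_dot + 1, j))
    return new_set


# Item set 0 for the grammar E → E + T | T, T → T * int | int.
set0 = [
    ("E", ("E", "+", "T"), 0, 0),
    ("E", ("T",), 0, 0),
    ("T", ("T", "*", "int"), 0, 0),
    ("T", ("int",), 0, 0),
]
set1 = scan_and_complete([set0], "int")
for item in set1:
    print(item)
```

Run on item set 0 with the token int, this yields the four items of item set 1; none of them has a nonterminal after the dot, so no closure items are needed there.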
Continuing the Example, Set 2
• Item set 1 (after int, before +):
    T → int •, 0
    T → T • * int, 0
    E → T •, 0
    E → E • + T, 0
• Item set 2 (after +):
    E → E + • T, 0
    T → • T * int, 2    (closure item)
    T → • int, 2        (closure item)
Continuing the Example, Set 3
• Item set 2 (before the second int):
    E → E + • T, 0
    T → • T * int, 2
    T → • int, 2
• Item set 3 (after the second int, before *):
    T → int •, 2
    T → T • * int, 2
    E → E + T •, 0
    E → E • + T, 0    (from item set 0)
Continuing the Example, Sets 4 & 5
• Item set 3 (before *):
    T → int •, 2
    T → T • * int, 2
    E → E + T •, 0
    E → E • + T, 0
• Item set 4 (after *, before the last int):
    T → T * • int, 2
• Item set 5 (end of input):
    T → T * int •, 2
    T → T • * int, 2
    E → E + T •, 0    ACCEPT!
    E → E • + T, 0
Accepting the String
• In the last item set, we have a completed item for the start symbol that started in set 0.
• That means “the input between 0 and end matches an entire production for the start symbol,” so the string parses correctly.
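Putting closure, scanning, completion, and the acceptance test together gives a compact recognizer. The following is a minimal sketch, not the lecture's implementation: the function names `earley_sets` and `recognize` and the tuple layout `(lhs, rhs, dot, start)` are assumptions, and the grammar is assumed to have no empty productions.

```python
GRAMMAR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "int"), ("int",)],
}


def earley_sets(tokens, grammar, start):
    """Return the list of item sets 0..n for the given tokens."""
    n = len(tokens)
    sets = [[] for _ in range(n + 1)]

    def add(k, item):
        if item not in sets[k]:
            sets[k].append(item)

    for rhs in grammar[start]:
        add(0, (start, rhs, 0, 0))

    for k in range(n + 1):
        for lhs, rhs, dot, m in sets[k]:            # sets[k] grows as we go
            if dot == len(rhs):                      # complete item
                for b_lhs, b_rhs, b_dot, j in sets[m]:
                    if b_dot < len(b_rhs) and b_rhs[b_dot] == lhs:
                        add(k, (b_lhs, b_rhs, b_dot + 1, j))
            elif rhs[dot] in grammar:                # closure items
                for production in grammar[rhs[dot]]:
                    add(k, (rhs[dot], production, 0, k))
            elif k < n and rhs[dot] == tokens[k]:    # scan the next token
                add(k + 1, (lhs, rhs, dot + 1, m))
    return sets


def recognize(tokens, grammar, start):
    """Accept if the last set has a completed start-symbol item begun at 0."""
    last = earley_sets(tokens, grammar, start)[-1]
    return any(lhs == start and dot == len(rhs) and m == 0
               for lhs, rhs, dot, m in last)


print(recognize("int + int * int".split(), GRAMMAR, "E"))   # True
print(recognize("int + * int".split(), GRAMMAR, "E"))       # False
```

Item sets are kept as lists so that items added during processing are visited by the same loop; a production implementation would index items by the symbol after the dot instead of scanning whole sets.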
Retrieving a Parse Tree or Derivation
• Start with a completed item in the last set that produces the whole input (has the form S → … •, 0 for start symbol S).
• Follow the red arrows to find how to expand that symbol.
• Work backwards through the sets to find the expansions of the other nonterminals.
Getting a Tree from our Example (I)
• Item set 5 (end of input):
    T → T * int •, 2
    T → T • * int, 2
    E → E + T •, 0    (start here)
    E → E • + T, 0
• Tree so far: E expands to E + T, and that T expands to T * int.
• To find out how to expand this T, go back to chart 3 (before * int).
Getting a Tree from our Example (II)
• Item set 3:
    T → int •, 2
    T → T • * int, 2
    E → E + T •, 0
    E → E • + T, 0
• Tree so far: E expands to E + T, that T expands to T * int, and the inner T expands to int.
• To find out how to expand this E, go back to chart 1 (before +).
Figuring out Where to Look
• In the last slide, we had to figure out where to look for the derivation of the E in E + T.
• We used the items T → T * int •, 2 and T → int •, 2 to get the T in E + T, both of which tell us that the T started after item set #2.
• And since + is a terminal, we then have to go back one more.
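Getting a Tree from our Example (III)
• Item set 1:
    T → int •, 0
    T → T • * int, 0
    E → T •, 0    (start here)
    E → E • + T, 0
• Completed tree: E expands to E + T; the left E expands to T, which expands to int; the right T expands to T * int, whose inner T expands to int. The leaves read int + int * int.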
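The backward walk through the charts can be sketched in code. This is a simplified illustration, not the lecture's method of stored arrows: instead of recording back-pointers, it re-derives them by searching the item sets, and it returns just one tree. The names `earley_sets` and `build_tree` and the tuple layout `(lhs, rhs, dot, start)` are assumptions, and the grammar is assumed to have no empty productions and no cycles such as E → E.

```python
GRAMMAR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "int"), ("int",)],
}


def earley_sets(tokens, grammar, start):
    """Return the list of item sets 0..n (closure, scan, complete)."""
    n = len(tokens)
    sets = [[] for _ in range(n + 1)]

    def add(k, item):
        if item not in sets[k]:
            sets[k].append(item)

    for rhs in grammar[start]:
        add(0, (start, rhs, 0, 0))
    for k in range(n + 1):
        for lhs, rhs, dot, m in sets[k]:            # sets[k] grows as we go
            if dot == len(rhs):
                for b_lhs, b_rhs, b_dot, j in sets[m]:
                    if b_dot < len(b_rhs) and b_rhs[b_dot] == lhs:
                        add(k, (b_lhs, b_rhs, b_dot + 1, j))
            elif rhs[dot] in grammar:
                for production in grammar[rhs[dot]]:
                    add(k, (rhs[dot], production, 0, k))
            elif k < n and rhs[dot] == tokens[k]:
                add(k + 1, (lhs, rhs, dot + 1, m))
    return sets


def build_tree(chart, grammar, lhs, rhs, start, end):
    """Tree for the completed item lhs → rhs •, start found in chart[end]."""
    children, pos = [], end
    for i in range(len(rhs) - 1, -1, -1):           # walk the rhs right to left
        symbol = rhs[i]
        if symbol not in grammar:                    # terminal: go back one token
            children.append(symbol)
            pos -= 1
            continue
        # Nonterminal: find a completed item for it ending at pos whose start j
        # is a set in which the parent item had its dot just before the symbol.
        for c_lhs, c_rhs, c_dot, j in chart[pos]:
            if (c_lhs == symbol and c_dot == len(c_rhs)
                    and (lhs, rhs, i, start) in chart[j]):
                children.append(build_tree(chart, grammar, symbol, c_rhs, j, pos))
                pos = j
                break
        else:
            raise ValueError("chart does not contain a derivation")
    return (lhs, children[::-1])


tokens = "int + int * int".split()
chart = earley_sets(tokens, GRAMMAR, "E")
final = next((lhs, rhs, m) for lhs, rhs, dot, m in chart[-1]
             if lhs == "E" and dot == len(rhs) and m == 0)
tree = build_tree(chart, GRAMMAR, final[0], final[1], final[2], len(tokens))
print(tree)
```

The start position stored in each completed child item is what says which earlier chart to consult next, which is the same reasoning the slides use when moving from chart 5 to chart 3 to chart 1.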
An Ambiguous Grammar (I)
• Grammar:
    E → E + E
    E → E * E
    E → int
• Input: 0 int 1 + 2 int 3 * 4 int 5
• Item set 0:
    E → • int, 0
    E → • E + E, 0
    E → • E * E, 0
• Item set 1 (after int):
    E → int •, 0
    E → E • + E, 0
    E → E • * E, 0
An Ambiguous Grammar (II)
• Item set 1 (before +):
    E → int •, 0
    E → E • + E, 0
    E → E • * E, 0
• Item set 2 (after +, before int):
    E → E + • E, 0
    E → • int, 2
    E → • E + E, 2
    E → • E * E, 2
• Item set 3 (after int):
    E → int •, 2
    E → E • + E, 2
    E → E • * E, 2
    E → E + E •, 0
    E → E • + E, 0
    E → E • * E, 0
An Ambiguous Grammar (III)
• Item set 3 (before *):
    E → int •, 2
    E → E • + E, 2
    E → E • * E, 2
    E → E + E •, 0
    E → E • + E, 0
    E → E • * E, 0
• Item set 4 (after *, before int):
    E → E * • E, 2
    E → E * • E, 0
    E → • int, 4
    E → • E + E, 4
    E → • E * E, 4
• Item set 5 (end of input; items shown on the slide):
    E → int •, 4
    E → E * E •, 2
    E → E * E •, 0
    E → E • + E, 4
    E → E • * E, 4
    E → E + E •, 0
• There are two ways to produce the E starting at 0, reflecting ambiguity.
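The ambiguity shows up directly in the final item set. As a minimal sketch under the same assumptions as a plain tuple-based recognizer (hypothetical names `AMBIGUOUS` and `earley_sets`, items as `(lhs, rhs, dot, start)`, no empty productions), the code below builds the item sets for the ambiguous grammar and lists the completed start-symbol items that span the whole input.

```python
AMBIGUOUS = {
    "E": [("E", "+", "E"), ("E", "*", "E"), ("int",)],
}


def earley_sets(tokens, grammar, start):
    """Return the list of item sets 0..n (closure, scan, complete)."""
    n = len(tokens)
    sets = [[] for _ in range(n + 1)]

    def add(k, item):
        if item not in sets[k]:
            sets[k].append(item)

    for rhs in grammar[start]:
        add(0, (start, rhs, 0, 0))
    for k in range(n + 1):
        for lhs, rhs, dot, m in sets[k]:            # sets[k] grows as we go
            if dot == len(rhs):
                for b_lhs, b_rhs, b_dot, j in sets[m]:
                    if b_dot < len(b_rhs) and b_rhs[b_dot] == lhs:
                        add(k, (b_lhs, b_rhs, b_dot + 1, j))
            elif rhs[dot] in grammar:
                for production in grammar[rhs[dot]]:
                    add(k, (rhs[dot], production, 0, k))
            elif k < n and rhs[dot] == tokens[k]:
                add(k + 1, (lhs, rhs, dot + 1, m))
    return sets


chart = earley_sets("int + int * int".split(), AMBIGUOUS, "E")

# Completed items for the start symbol that began at position 0.
whole_input = [(lhs, rhs) for lhs, rhs, dot, m in chart[-1]
               if lhs == "E" and dot == len(rhs) and m == 0]
for lhs, rhs in whole_input:
    print(lhs, "→", " ".join(rhs))
```

One completed item treats the input as a sum whose right operand is a product, the other as a product whose left operand is a sum; a single recognizer run records both without backtracking.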
Just for Fun…
• Grammar:
    E → E E
    E → ε
• Grammar is ferociously ambiguous: produces ε an infinite number of ways!
• Item set 0:
    E → •, 0
    E → • E E, 0
    E → E • E, 0
    E → E E •, 0
Relationship to LR Shift-Reduce Parsing
• With an LR(1) grammar, we never have item sets where two items have the same production, with the dot in the same place, but different starting positions.
• So, ignoring the starting positions, there is a finite number of possible item sets.
• These are the states in the shift-reduce parser.