Beyond PCFGs Chris Brew Ohio State University
Beyond PCFGs • Shift-reduce parsers • Probabilistic LR parsers • Data-oriented parsers
Motivation • Get round the limitations of the PCFG model • Exploit knowledge about individual words • Build better language models
Shift-reduce • Simple version: either shift a word from the input list to the parse stack, or reduce two elements from the top of the parse stack to a single tree • Hermjakob and Mooney (cmp-lg/9706002) • structures rather than just trees and words • more complex parse action language • not just binary rules
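The simple version described above can be sketched in a few lines. This is a minimal illustration, not the Hermjakob and Mooney system: the function name run_actions and the action strings "shift" and "reduce" are assumptions made for the example, and the parser simply replays a given action sequence rather than choosing actions itself.

```python
from typing import List, Tuple, Union

# A tree is either a word or a pair of subtrees (binary reduction only).
Tree = Union[str, Tuple["Tree", "Tree"]]


def run_actions(words: List[str], actions: List[str]) -> Tuple[List[Tree], List[str]]:
    """Replay a sequence of shift/reduce actions (hypothetical action names).

    Returns the final parse stack and whatever remains of the input list.
    """
    stack: List[Tree] = []
    buffer = list(words)
    for action in actions:
        if action == "shift":
            if not buffer:
                raise ValueError("cannot shift: input list is empty")
            # Move the next word from the input list onto the parse stack.
            stack.append(buffer.pop(0))
        elif action == "reduce":
            if len(stack) < 2:
                raise ValueError("cannot reduce: fewer than two stack elements")
            # Combine the top two stack elements into a single tree.
            right = stack.pop()
            left = stack.pop()
            stack.append((left, right))
        else:
            raise ValueError(f"unknown action: {action}")
    return stack, buffer


final_stack, remaining = run_actions(
    ["Euan", "likes", "Matthew"],
    ["shift", "shift", "shift", "reduce", "reduce"],
)
```

With the action sequence shown, final_stack holds the single tree ("Euan", ("likes", "Matthew")) and remaining is empty. The learning problem on the next slide is choosing the action at each step instead of being handed the sequence.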
Machine learning for shift-reduce • supervisor shows the system correct sequences of parsing actions • system tries to learn to predict correct actions • needs a feature language • as it learns, the supervisor has less need to override the actions chosen by the system.
Examples of the feature language • broad syntactic class of the third element on the stack • the tense of the first element of the input list • Does top element of stack contain an object? • Could top frame be an adjectival degree adverb (e.g. very)? • Is frame1 a possible agent/patient of frame2? • Do frame1 and frame2 satisfy subject-verb agreement?
Hand-crafted knowledge used • 205 features, all moderately local (no references to the 1000th element of the stack or anything like that) • 4356-node lexical knowledge base • subcategorisation table for 242 verbs • But the association between features and actions is learned
Various different hybrid decision structures • The best was a hierarchical list of decision trees which encoded information about the task. Schematically: • decide whether to do anything • if not, we are done • if so, decide whether to do a reduction • if so, decide which reduction • if not, decide what sort of shift to do
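The schematic hierarchy above can be written as a cascade of classifiers. The sketch below only mirrors the control flow on the slide; choose_action and the four classifier arguments are hypothetical names, and in the real system each one would be a learned decision tree over the feature language rather than the toy lambdas used here.

```python
from typing import Callable, Dict

Features = Dict[str, object]


def choose_action(
    features: Features,
    anything_to_do: Callable[[Features], bool],
    should_reduce: Callable[[Features], bool],
    which_reduction: Callable[[Features], str],
    which_shift: Callable[[Features], str],
) -> str:
    """Pick a parse action by walking the hierarchy of decisions."""
    # Level 1: decide whether to do anything at all.
    if not anything_to_do(features):
        return "done"
    # Level 2: decide whether to do a reduction.
    if should_reduce(features):
        # Level 3a: decide which reduction.
        return "reduce:" + which_reduction(features)
    # Level 3b: decide what sort of shift to do.
    return "shift:" + which_shift(features)


# Toy stand-ins for the learned decision trees (assumptions for illustration).
toy = dict(
    anything_to_do=lambda f: f["input_left"] > 0 or f["stack_size"] > 1,
    should_reduce=lambda f: f["stack_size"] >= 2 and f["input_left"] == 0,
    which_reduction=lambda f: "binary",
    which_shift=lambda f: "word",
)

action = choose_action({"input_left": 0, "stack_size": 2}, **toy)
```

Splitting the decision this way means each classifier only has to discriminate among a small set of outcomes, which is one plausible reading of "encoded information about the task".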
Evaluation • Corpus of 272 annotated sentences. • 17-fold cross validation (17 blocks of 16 sentences each) • Precision of 92.7%, recall of 92.8% (Parseval measures) • Average sentence length 17.1 words, with 43.5 parse actions per sentence • Correct structure and labelling: 26.8% (i.e. roughly 1 in 4 sentences is completely correct)
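For readers unfamiliar with the Parseval measures, the sketch below shows labelled bracket precision and recall for a single sentence. It is a simplified illustration: the (label, start, end) bracket representation and the function name parseval are assumptions, brackets are treated as a set (so repeated identical brackets are counted once), and the example brackets are invented rather than taken from the corpus.

```python
from typing import Iterable, Tuple

Bracket = Tuple[str, int, int]  # (label, start index, end index)


def parseval(gold: Iterable[Bracket], predicted: Iterable[Bracket]) -> Tuple[float, float]:
    """Return (precision, recall) over labelled brackets for one sentence."""
    gold_set, pred_set = set(gold), set(predicted)
    matched = len(gold_set & pred_set)
    # Precision: what fraction of the proposed brackets are in the gold tree.
    precision = matched / len(pred_set) if pred_set else 0.0
    # Recall: what fraction of the gold brackets were proposed.
    recall = matched / len(gold_set) if gold_set else 0.0
    return precision, recall


# Invented example: the parser gets three of four brackets right.
gold = [("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("NP", 2, 3)]
predicted = [("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("NP", 1, 2)]
precision, recall = parseval(gold, predicted)
```

Here both precision and recall are 0.75, yet the sentence would not count towards the "correct structure and labelling" figure, which requires every bracket to match. That is why bracket scores above 90% can coexist with only about a quarter of sentences being completely correct.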
Comments on Hermjakob and Mooney • A lot of grunt work needed - but not as much as for a full rationalist NLP system • The knowledge used is micro-modular: very small pieces of highly independent knowledge • Test set is small, sentences short • Fairly robust • Good on small scale tests in an English/German MT task
Probabilistic LR Parsers • Briscoe and Carroll, CL 19(1), pp. 25-59 • PCFGs give these subtrees the same probability [Figure: alternative subtrees built from N nodes]
LR Parsing • Builds a parsing table which gives parsing actions and Gotos for possible combinations of parser state and input symbols • There may be parsing action conflicts, in which more than one action is available. • In programming language grammars, you almost never want conflicts. • In NL grammars, you have no escape!
Probabilistic LR • When there is a conflict, non-deterministically execute all possible actions. • But score them according to a probability distribution. • So where do the probabilities come from? And what do they mean? See analysis in Stolcke’s paper relating them to his forward and inner probabilities.
LR parsing using Alvey Tools Grammar • Wide coverage unification grammar written by Claire Grover and Ted Briscoe • Build LR tables from CF backbone of this grammar • Interactively build disambiguated training corpus by supervising choice of parse actions
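One simple way a supervised corpus of parse actions could yield the probabilities used to score conflicting actions is relative-frequency estimation per parser context. The sketch below assumes exactly that; it is not claimed to be the estimation scheme of Briscoe and Carroll, and the (state, lookahead, action) observation format, the function name and the toy data are all invented for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Observation = Tuple[int, str, str]  # (LR state, lookahead symbol, chosen action)
Context = Tuple[int, str]


def estimate_action_probs(
    observations: Iterable[Observation],
) -> Dict[Context, Dict[str, float]]:
    """Relative-frequency estimate of P(action | state, lookahead)."""
    counts: Dict[Context, Counter] = defaultdict(Counter)
    for state, lookahead, action in observations:
        counts[(state, lookahead)][action] += 1
    probs: Dict[Context, Dict[str, float]] = {}
    for context, action_counts in counts.items():
        total = sum(action_counts.values())
        # Normalise so the actions available in each context sum to one.
        probs[context] = {a: c / total for a, c in action_counts.items()}
    return probs


# Toy supervised choices: state 4 with lookahead "noun" has a shift/reduce conflict.
observed = [
    (4, "noun", "shift"),
    (4, "noun", "shift"),
    (4, "noun", "reduce NP"),
    (7, "verb", "shift"),
]
action_probs = estimate_action_probs(observed)
```

A parse would then be scored by multiplying the probabilities of the actions taken along it, so that competing analyses arising from a table conflict can be ranked.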
Evaluation • Very good performance on LDOCE noun definitions: 76% correct structure and labelling • State of the art results in later work on tag sequence grammars, where the available lexical information is more restricted (54% correct structure and labelling) • Work underway to bring this technique to Wall Street Journal data for comparison with other methods
Data-oriented parsing • Rens Bod: Enriching Linguistics with Statistics: Performance Models of Natural Language, Amsterdam Ph.D. thesis • Treebank data again (this time ATIS -- 600 sentences) • Radical rejection of context-free assumption • Count subtrees of arbitrary depth, not rule applications
Tree fragments • Some of the fragments [Figure: a parse tree over the words Euan, likes and Matthew (nodes S, NP, VP, V), together with several of the subtree fragments that can be cut from it]
The probability of a tree • The sum of the probabilities of all the ways of making it out of fragments • The probability of one way of making it is the product of the probabilities of the fragments used • The probability of a fragment is given as a ratio between the frequency of the fragment and the total frequency of all fragments having that root
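The fragment and tree probabilities can be made concrete with a small sketch. It assumes fragments are nested tuples whose first element is the root label, and the counts are invented toy numbers, not ATIS figures. The caller supplies the list of derivations; enumerating them efficiently is the hard part and is not attempted here.

```python
from collections import Counter
from math import prod
from typing import Dict, Hashable, Iterable, Sequence

Fragment = tuple  # nested tuple; fragment[0] is the root label


def fragment_probs(fragment_counts: Dict[Fragment, int]) -> Dict[Fragment, float]:
    """P(fragment) = count(fragment) / total count of fragments with the same root."""
    root_totals: Counter = Counter()
    for fragment, count in fragment_counts.items():
        root_totals[fragment[0]] += count
    return {f: c / root_totals[f[0]] for f, c in fragment_counts.items()}


def derivation_prob(derivation: Sequence[Fragment], probs: Dict[Fragment, float]) -> float:
    """One way of building the tree: product of its fragment probabilities."""
    return prod(probs[f] for f in derivation)


def tree_prob(derivations: Iterable[Sequence[Fragment]], probs: Dict[Fragment, float]) -> float:
    """Sum over the supplied derivations of the same tree."""
    return sum(derivation_prob(d, probs) for d in derivations)


# Toy fragments; a bare string such as "NP" marks an open substitution site.
s_full = ("S", ("NP", "Euan"), ("VP", ("V", "likes"), ("NP", "Matthew")))
s_open = ("S", "NP", "VP")
np_euan = ("NP", "Euan")
np_matthew = ("NP", "Matthew")
vp_open = ("VP", ("V", "likes"), "NP")
vp_full = ("VP", ("V", "likes"), ("NP", "Matthew"))

toy_counts = {s_full: 1, s_open: 3, np_euan: 1, np_matthew: 1, vp_open: 1, vp_full: 1}
probs = fragment_probs(toy_counts)

# Two of the possible derivations of the full tree.
p = tree_prob([[s_full], [s_open, np_euan, vp_full]], probs)
```

With these toy counts the first derivation contributes 1/4 and the second 3/4 x 1/2 x 1/2 = 3/16, so p is 7/16 for the two derivations supplied. A complete computation would also include the derivations that use vp_open and np_matthew.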
Complexity • It’s hard to find efficient algorithms for sewing together DOP trees (cf. Sima’an for solutions) • Only very small corpora feasible • In practice, depth may have to be limited. • Many tree fragments are very rare, so smoothing is an issue
Evaluation • Several variations studied: DOP4 gets parse accuracies around 80% without a hand-coded dictionary, DOP5 around 90% with one. • Results to be interpreted with caution due to small size of corpus • Evaluation on Dutch OVIS domain suggests that DOP is not competitive with Groningen’s more labour intensive system (but maybe that’s not the point)
Where to find out more • Papers by Bod, Carroll, Hermjakob. • Manning and Schütze ch 12. • http://xxx.soton.ac.uk/archive/cs/intro.html (subarea Computation and Language)