Montague Grammar and MT
Chris Brew, The Ohio State University
http://www.purl.org/NET/cbrew.htm
Machine Translation and Montague Grammar
• Great paper by Jan Landsbergen, in Readings in Machine Translation
• The place of linguistics in MT
• What is the essence of Montague Grammar?
• How can we use it (the essence) in MT?
• The subset problem
• How does this look today?
Possible translations
• It must be defined clearly what the correct sentences of the source and target languages are.
  • Linguistic theory provides the means to do this, in the form of grammars with associated compositional semantics
  • Landsbergen suggests a Montague-inspired grammar
• If the input is a correct source-language sentence, the output should be a correct target-language sentence.
  • This is a condition on the design of the translation system
  • Landsbergen sketches one approach
• There must be some definition of the information content that the source and target sentences should have in common.
  • This is a call to arms for translation theory
  • No good solution is currently available
Best translations
• It must be defined clearly what the correct sentences of the source and target languages are.
  • This defines the search space of possible inputs and outputs
• If the input is a correct source-language sentence, the output should be the best corresponding target-language sentence.
  • The system will be evaluated on its treatment of correct sentences; robustness with respect to incorrect input is not required
• There could be three sentences e, f, and e' such that f is the best translation of e but e' is the best translation of f: 'best translation' is not a symmetric relation
  • By contrast, 'possible translation' is symmetric
  • In addition, given three languages E, F, G, we have transitivity under composition: possible(E,F) ∘ possible(F,G) = possible(E,G) (see the sketch below)
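A minimal sketch of these relational properties, with invented toy sentence pairs (none of the data here comes from Landsbergen):

```python
# 'possible translation' modelled as a set of (source, target) pairs.
possible_en_fr = {("the cat sleeps", "le chat dort"),
                  ("the cat is sleeping", "le chat dort")}

# Symmetry: the inverse of a 'possible translation' relation is itself
# a 'possible translation' relation (read the pairs the other way round).
possible_fr_en = {(f, e) for (e, f) in possible_en_fr}

# Transitivity across three languages via relation composition:
# possible(E,F) o possible(F,G) = possible(E,G)
def compose(rel_ef, rel_fg):
    return {(e, g) for (e, f1) in rel_ef for (f2, g) in rel_fg if f1 == f2}

possible_fr_de = {("le chat dort", "die Katze schläft")}
print(compose(possible_en_fr, possible_fr_de))  # English-German pairs via French

# 'best translation' is a function and need not be symmetric: the round
# trip from e can land on a different sentence e'.
best_en_fr = {"the cat is sleeping": "le chat dort"}
best_fr_en = {"le chat dort": "the cat sleeps"}
e = "the cat is sleeping"
assert best_fr_en[best_en_fr[e]] != e   # best translation is not symmetric
```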
Comparing MT systems
• It is possible to reason theoretically about systems that at least aspire to Landsbergen's principles
• There are no obvious grammatical or semantic criteria for evaluating systems when the output is not even a correct sentence of the target language
• Linguists should specify the possible translations
• Engineers (or linguists wearing hard hats) should worry about robustness and translation selection
  • The robustness part might need to appeal to world knowledge, discourse history, knowledge of the task, and other extralinguistic factors
The essence of Montague Grammar
• There is a set of basic expressions with meanings
• Rules are pairs of a syntactic and a semantic rule, where the two work in lock-step (the rule-to-rule hypothesis; sketched below)
• Either: the semantic rules are operators that build up the semantic value directly (Montagovian)
• Or: the semantic rules build up an expression in some logic, and that expression is then interpreted by the rules of the logic to produce a standardized semantic value (echt Montague)
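A minimal rule-to-rule sketch, not Montague's actual fragment: the one-rule grammar and two-word lexicon below are invented, and meanings are plain Python values standing in for IL expressions.

```python
# Basic expressions, each paired with a meaning.
LEXICON = {
    "John":   ("NP", "john"),
    "sleeps": ("VP", lambda subj: ("sleep", subj)),
}

def rule_s(np, vp):
    """Syntactic rule S -> NP VP, paired in lock-step with its semantic
    rule: apply the VP meaning (an operator) to the NP meaning."""
    (np_cat, np_sem), (vp_cat, vp_sem) = np, vp
    assert (np_cat, vp_cat) == ("NP", "VP")
    return ("S", vp_sem(np_sem))

cat, meaning = rule_s(LEXICON["John"], LEXICON["sleeps"])
print(cat, meaning)   # S ('sleep', 'john')
```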
Landsbergen's system
• M-grammars
• Have surface trees (S-trees); S-PARSER is standard technology and generates a parse forest of S-trees
• M-PARSER scans the results of S-PARSER and applies a series of analytical rules that rewrite the S-trees, building up semantic values as it goes. The M-PARSER is very powerful.
• The result of M-PARSER is a semantic tree that is easy to transfer (a schematic sketch of the pipeline follows)
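A schematic sketch of the two-stage pipeline as described on this slide; the toy trees and the single 'analytical rule' are invented stand-ins, not Rosetta's actual data structures.

```python
def s_parser(sentence):
    """Standard parsing technology: return a parse forest (here, a list)
    of surface trees (S-trees)."""
    words = sentence.split()
    return [("S", [("NP", [words[0]]), ("VP", [words[1]])])]

def m_parser(forest):
    """Apply analytical rules to each S-tree, rewriting it and building
    up a semantic value; the result is a semantic tree for transfer."""
    def analyse(tree):
        label, children = tree
        parts = [c if isinstance(c, str) else analyse(c) for c in children]
        return ("SEM:" + label, parts)      # toy 'analytical rule'
    return [analyse(tree) for tree in forest]

semantic_trees = m_parser(s_parser("John sleeps"))
print(semantic_trees)
```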
The subset problem
• Montague grammars translate natural language into subsets of intensional logic (IL)
• There is no guarantee that the subset will be the same for every language
• Without extra cleverness, the only sentences that can be translated will be those whose meanings lie in the intersection of the source-language IL subset and the target-language IL subset
Isomorphic grammars
• To avoid the subset problem, impose the constraint that:
  • For every syntactic rule in one language there is a corresponding syntactic rule in every other language, and the meaning operation is the same across the board
  • For every basic expression, there is a corresponding one in every other language
• This is a really heavy constraint on grammar writers, and it isn't clear how to do it (a toy sketch follows)
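A toy sketch of the isomorphy constraint (the two-word grammars below are invented): corresponding rules share a name and a meaning operation, corresponding basic expressions line up, and translation amounts to interpreting the same derivation tree with the other language's syntactic operations.

```python
# Corresponding basic expressions, indexed by shared names.
BASIC = {
    "en": {"B1": "believe", "B2": "John"},
    "nl": {"B1": "geloven", "B2": "Jan"},
}

# One rule name per meaning operation; only the syntactic side varies.
RULES = {
    "R_vp": {"en": lambda v, o: f"{v} {o}",    # English: verb then object
             "nl": lambda v, o: f"{o} {v}"},   # Dutch: object then verb
}

def generate(lang, derivation):
    """Interpret a language-neutral derivation tree in one language."""
    if isinstance(derivation, str):            # a basic expression
        return BASIC[lang][derivation]
    rule, *args = derivation
    return RULES[rule][lang](*(generate(lang, a) for a in args))

derivation = ("R_vp", "B1", "B2")              # shared derivation tree
print(generate("en", derivation))              # believe John
print(generate("nl", derivation))              # Jan geloven
```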
Grammar writing
• A set of compositional rules R is written for handling a particular phenomenon in language L; a corresponding set of rules R' is written for handling the corresponding phenomenon in language L' (Landsbergen p. 250)
• Grammar development proceeds in parallel. You test by ensuring that R covers the relevant expressions of L and R' covers the relevant expressions of L'
• The most important practical difference between this and other approaches is probably that the grammars are written with translation in mind
The claim
• If you do this grammar-writing co-ordination, you can get away without worrying about the subset problem
• Montague grammar may be way too complicated, but if Dutch geloven works the same way as English believe, you can, in that case, get away with the same theoretically insufficient representation on both sides
• You might be able to control the consequences of putting extra (non-truth-functional) control information into the IL by doing this on a case-by-case basis, in order to co-ordinate specific phenomena (DANGER)
How does this look today?
• Practical experience with broad-coverage grammars
• We now know that broad-coverage grammars produce large numbers of analyses, most of them crazy
• It definitely pays to do some kind of probabilistic parse selection, even if you have a good broad-coverage grammar
• If your goal is to do well on existing parsing metrics, it works well to learn the grammar from a treebank
The linguistic question
• Given a tree, tell me how to make a score for the tree out of smaller components (the PCFG answer is sketched below)
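A minimal sketch of the probabilistic context-free answer, with invented toy probabilities: the tree's score is the product of the probabilities of the rules used to build it.

```python
import math

# Toy rule probabilities (log space), keyed by (parent, children-labels).
RULE_LOGPROB = {
    ("S", ("NP", "VP")): math.log(1.0),
    ("NP", ("John",)):   math.log(0.5),
    ("VP", ("sleeps",)): math.log(0.5),
}

def tree_logprob(tree):
    """tree = (label, [children]); leaves are plain strings."""
    label, children = tree
    kids = tuple(c if isinstance(c, str) else c[0] for c in children)
    local = RULE_LOGPROB[(label, kids)]
    return local + sum(tree_logprob(c)
                       for c in children if not isinstance(c, str))

t = ("S", [("NP", ["John"]), ("VP", ["sleeps"])])
print(tree_logprob(t))   # log(1.0 * 0.5 * 0.5)
```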
Given a tree
• Tell me how to break it down into smaller components
  • Smaller components, because these smaller components are going to be common enough that the statistics over them might be reliable
  • But large enough that the crucial relationships between the parts of the tree have a chance of coming through
• Probabilistic context-free grammars are (slightly?) too coarse-grained
• So we adjust them in ways that bring out more of the crucial relationships
  • Add parents, grandparents, head-words, other clever stuff (parent annotation is sketched below)
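One such adjustment, parent annotation (Johnson 1998), sketched on the same toy tree representation as above: each nonterminal is relabelled with its parent's label, so the rule statistics carry more of the tree's context.

```python
def annotate_parents(tree, parent="TOP"):
    """tree = (label, [children]); leaves are strings and stay untouched."""
    label, children = tree
    new_children = [c if isinstance(c, str) else annotate_parents(c, label)
                    for c in children]
    return (f"{label}^{parent}", new_children)

t = ("S", [("NP", ["John"]), ("VP", ["sleeps"])])
print(annotate_parents(t))
# ('S^TOP', [('NP^S', ['John']), ('VP^S', ['sleeps'])])
```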
Given a translation pair
• Tell me how to break it down into smaller components
  • Smaller components, because these smaller components are going to be common enough that the statistics over them might be reliable
  • But large enough that the crucial relationships between the parts of the pair have a chance of coming through
• Language model for the TL: standard technology
• IBM Models 1, 2, 3, 4, 5 for the SL-to-TL correspondence: clearly very coarse-grained (a Model 1 sketch follows)
• How to adjust so that more of the crucial relationships come through?
• How to think about translation pairs?
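A compact sketch of EM for IBM Model 1, the coarsest of the five, on invented two-sentence toy data; real training uses large corpora and handles the null word, which this sketch omits.

```python
from collections import defaultdict

pairs = [("the cat".split(), "le chat".split()),
         ("the dog".split(), "le chien".split())]

t = defaultdict(lambda: 0.25)           # t(f|e), uniform start

for _ in range(10):
    count = defaultdict(float)           # expected counts (E-step)
    total = defaultdict(float)
    for es, fs in pairs:
        for f in fs:
            z = sum(t[(f, e)] for e in es)
            for e in es:
                c = t[(f, e)] / z        # soft alignment of f to e
                count[(f, e)] += c
                total[e] += c
    for (f, e), c in count.items():      # M-step: renormalize
        t[(f, e)] = c / total[e]

print(round(t[("chat", "cat")], 3))      # climbs toward 1.0
```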
Errorfulness
• The Penn Treebank (PTB) is smallish and somewhat errorful
  • This imposes practical limits on the complexity of models: the more detail you ask for, the less likely your training procedure is to provide it in reliable form
• Hand-written grammars blur the distinction between ungrammaticality and lack of coverage
  • It is therefore dangerous for components that use grammars to give too much weight to the grammar's claims about ungrammaticality
  • Even when the grammar fails to provide a complete analysis, it could provide useful partial results
Errorfulness
• Current word-aligned corpora are tiny, but do at least exist. Presumably they too are errorful
• Unsupervised learning via EM has dominated the field, because nothing better is available. The pseudo-annotation that EM hallucinates is very errorful
• The complexity of models is limited by the need to do EM and by the difficulty of working with errorful annotation
• It is dangerous for the system to believe hard-and-fast things about intertranslatability
Coverage
• To score well, it usually pays to guess, even if:
  • The question seems so stupid that no sensible answer is possible
  • Your answer would be little better than a random guess
• Statistical parsers build up models of grammar that always make a guess
• The models learn from the whole of the data. They might be designed to learn linguistic things, but they can and do implicitly learn non-linguistic things that turn out to help
Coverage
• To score well, it usually pays to guess, even if:
  • The question seems so stupid that no sensible answer is possible
  • Your answer would be little better than a random guess
• Brown-style MT systems have good coverage, and not-bad probabilistic models of <something>. They too learn from the whole of the data
• Their design is shaped partly by the need to model linguistic things (e.g. word-order variation) and partly by accidental success in modeling other factors that we don't understand yet
Conclusions
• There is a clear parallel between Landsbergen's notion of intertranslatability and Montague's notion of grammaticality
• Arguably, statistical parsers succeed because they relax the notion of grammaticality, allowing them to handle misfires in the grammar smoothly. Coincidentally, they end up robust to other difficulties as well, including weaknesses in the statistical models and the training data
Conclusions
• There is a clear parallel between Landsbergen's notion of intertranslatability and Montague's notion of grammaticality
• Arguably, MT systems succeed because they relax the notion of intertranslatability (or simply fail to have such a notion at all)
• Coincidentally, this makes them robust to failings in the statistical modeling, the data, and the procedures for data augmentation
• That said, it would be nice to have explicit semantics in MT systems