Semi-Automatic Learning of Transfer Rules for Machine Translation of Low-Density Languages Katharina Probst April 5, 2002
Overview of the talk • Introduction and Motivation • Overview of the AVENUE project • Elicitation of bilingual data • Rule Learning • Seed Generation • Seeded Version Space Learning • Conclusions and Future Work
Introduction and Motivation • Basic idea: opening up Machine Translation to minority languages • Scarce resources for minority languages: • Bilingual text • Monolingual text • Target language grammar • Due to scarce resources, statistical and example-based methods will likely not perform as well as they do for high-density languages • Our approach: • A system that elicits necessary information about the target language from a bilingual informant • The elicited information is used in conjunction with any other available target language information to learn syntactic transfer rules
System overview • [Figure: system architecture diagram. Labels: SL Input; Run-Time Module; Learning Module; SL Parser; EBMT Engine; Elicitation Process; SVS Learning Process; Transfer Rules; Transfer Engine; TL Generator; User; Unifier Module; TL Output]
Elicitation • Elicitation is the process of presenting a bilingual speaker with sets of sentences. The user translates the sentences and specifies how the words align • The elicitation process serves multiple purposes: • Collection of data • Feature detection
Feature Detection • Feature detection is a process by which the learning module answers questions such as “Does the target language mark number on nouns?” • The elicitation corpus is organized in minimal pairs, i.e. pairs of sentences that differ in only one feature. For example: • (1) You (John) are falling. [2nd person m, subj, present tense] • (2) You (Mary) are falling. [2nd person f, subj, present tense] • (3) You (Mary) fell. [2nd person f, subj, past tense] • Sentences 1 and 2 form a minimal pair (they differ only in gender), as do sentences 2 and 3 (they differ only in tense). • By comparing the translations of “you” in sentences 1 and 2, the system gets an indication of whether gender is marked on the pronoun. • The results of feature detection will be used to guide the system in navigating through the elicitation corpus by eliminating parts of it based on Implicational Universals • The results will also be used by the rule learning module
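A minimal sketch of the minimal-pair comparison, using the three example sentences above. The dictionary layout, the function names, and the target-language forms (YOU-1, YOU-2) are invented placeholders, not part of the AVENUE system or real elicited data.

```python
def differing_features(feats_a, feats_b):
    """Names of the features whose values differ between two sentences."""
    keys = feats_a.keys() | feats_b.keys()
    return {k for k in keys if feats_a.get(k) != feats_b.get(k)}


def marked_feature(sent_a, sent_b, word):
    """Return the feature that seems to be marked on `word`, or None.

    Evidence exists only if the two sentences form a minimal pair (exactly
    one differing feature) and the informant translated `word` differently.
    """
    diff = differing_features(sent_a["features"], sent_b["features"])
    if len(diff) != 1:
        return None  # not a minimal pair, so no conclusion is drawn
    if sent_a["translation"][word] == sent_b["translation"][word]:
        return None  # same form: no evidence that the feature is marked
    return next(iter(diff))


# Placeholder target-language forms (invented tokens, not real data).
s1 = {"features": {"person": 2, "gender": "m", "role": "subj", "tense": "present"},
      "translation": {"you": "YOU-1"}}
s2 = {"features": {"person": 2, "gender": "f", "role": "subj", "tense": "present"},
      "translation": {"you": "YOU-2"}}
s3 = {"features": {"person": 2, "gender": "f", "role": "subj", "tense": "past"},
      "translation": {"you": "YOU-2"}}

print(marked_feature(s1, s2, "you"))  # gender
print(marked_feature(s2, s3, "you"))  # None (same form in both translations)
print(marked_feature(s1, s3, "you"))  # None (two features differ)
```

A single differing form is only an indication; a real system would presumably want several minimal pairs before committing to an answer.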
More on the elicitation corpus • Eliciting data from bilingual informants entails a number of challenges: • The bilingual informant him/herself • Morphology and the lexicon • Learning grammatical features • Compositional elicitation • Elicitation of non-compositional data • Verb subcategorization • Alignment issues • Bias towards the source language
Rule Learning in the AVENUE project - Introduction • The goal is to semi-automatically (i.e. with the help of the user) infer syntactic transfer rules • Rule learning can be divided into two main steps: • Seed Generation: The system produces an initial “guess” at a transfer rule based on only one sentence. The produced rule is quite specific to the input sentence. • Version Space Learning: Here, the system takes the seed rules and generalizes them.
Transfer rule formalism A transfer rule (TR) consists of the following components: • Source language sentence, Target language sentence that the TR was produced from • Word alignments • Phrase information such as NP, S, … • Part-of-Speech sequences for source and target language. • X-side constraints, i.e. constraints on the source language. These are used for parsing. • Y-side constraints, i.e. constraints on the target language. These are used for generation. • XY-constraints, i.e. constraints that transfer features from the source to the target language. These are used for transfer.
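The components listed above could be held in a record like the following. This is only an illustrative sketch: the field names, the tuple encoding of constraints, and the example rule (including the placeholder target tokens and target POS tags) are assumptions, not the AVENUE rule format.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Assumed encoding: a constraint is (path, right-hand side), where a path is
# (constituent, feature), e.g. ("X1", "agr"), and the right-hand side is
# either a value such as "*3pl" or another path.
Constraint = Tuple[Tuple[str, str], object]


@dataclass(frozen=True)
class TransferRule:
    sl_sentence: str                          # source sentence the rule came from
    tl_sentence: str                          # target sentence the rule came from
    alignments: FrozenSet[Tuple[int, int]]    # (source index, target index), 1-based
    phrase_type: str                          # e.g. "NP", "S"
    sl_pos: Tuple[str, ...]                   # source POS sequence
    tl_pos: Tuple[str, ...]                   # target POS sequence
    x_constraints: FrozenSet[Constraint] = frozenset()   # source side (parsing)
    y_constraints: FrozenSet[Constraint] = frozenset()   # target side (generation)
    xy_constraints: FrozenSet[Constraint] = frozenset()  # source-to-target (transfer)


# Hypothetical seed rule; "w1 w2" stands in for an elicited translation.
seed = TransferRule(
    sl_sentence="the men",
    tl_sentence="w1 w2",
    alignments=frozenset({(1, 1), (2, 2)}),
    phrase_type="NP",
    sl_pos=("DET", "N"),
    tl_pos=("DET", "N"),
    x_constraints=frozenset({(("X2", "agr"), "*3pl")}),
    xy_constraints=frozenset({(("X2", "agr"), ("Y2", "agr"))}),
)
print(seed.phrase_type, seed.sl_pos)  # NP ('DET', 'N')
```

Making the record frozen keeps rules hashable, which is convenient when rules are later collected into sets.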
A word on compositionality • Basic idea: if you produce a transfer rule for a sentence, and there already exist transfer rules that can translate parts of the sentence, why not use them? • Adjust the alignments, part-of-speech sequences, and the constraints • The trickiest part is to find new constraints that cannot be in the lower-level rule, but are necessary to translate correctly in the context of a sentence
Clustering • Seed rules are “clustered” into groups that warrant an attempt to merge • Clustering criteria: POS sequences, Phrase information, Alignments • Main reason for clustering: divide the large version space into a number of smaller version spaces and run the algorithm on each version space separately • Possible danger: Rules that should be considered together (such as “the man”, “men”) will not be
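A sketch of clustering by the three criteria named above, assuming that rules fall into the same cluster only when all three match exactly. The dictionary layout and the target-side POS tags are hypothetical.

```python
from collections import defaultdict


def cluster_key(rule):
    """Rules are grouped only if all clustering criteria match exactly."""
    return (rule["sl_pos"], rule["tl_pos"], rule["phrase"], rule["alignments"])


def cluster_seed_rules(rules):
    """Partition seed rules into clusters; each cluster is one small version space."""
    clusters = defaultdict(list)
    for rule in rules:
        clusters[cluster_key(rule)].append(rule)
    return list(clusters.values())


# Hypothetical seed rules (target POS sequences are placeholders).
seeds = [
    {"sl": "the man", "sl_pos": ("DET", "N"), "tl_pos": ("DET", "N"),
     "phrase": "NP", "alignments": frozenset({(1, 1), (2, 2)})},
    {"sl": "the men", "sl_pos": ("DET", "N"), "tl_pos": ("DET", "N"),
     "phrase": "NP", "alignments": frozenset({(1, 1), (2, 2)})},
    {"sl": "men", "sl_pos": ("N",), "tl_pos": ("N",),
     "phrase": "NP", "alignments": frozenset({(1, 1)})},
]

for cluster in cluster_seed_rules(seeds):
    print([rule["sl"] for rule in cluster])
# ['the man', 'the men']
# ['men']
```

The second cluster illustrates the danger mentioned above: “men” has a different POS sequence from “the man”, so the two are never considered for merging.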
The Version Space • A set of seed rules in a cluster defines a version space as follows: the seed rules form the specific boundary (S). A virtual rule with the same POS sequences, alignments, and phrase information, but no constraints, forms the general boundary (G). • From most general to most specific: G boundary (virtual rule with no constraints); in between, generalizations of the seed rules that are still more specific than the rule in G; S boundary (seed rules)
The partial ordering of rules in the version space • A rule TR2 is said to be strictly more general than another rule TR1 if the set of f-structures that satisfy TR2 is a proper superset of the set of f-structures that satisfy TR1. It is said to be equivalent to TR1 if the set of f-structures that satisfy TR1 is the same as the set of f-structures that satisfy TR2. • We have defined three operations that move a transfer rule to a strictly more general rule
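In set notation, the ordering can be written as follows. Sat(TR) and the symbols ≻ and ≡ are shorthand introduced here (not on the slides); Sat(TR) is the set of f-structures that satisfy TR.

```latex
\[
\mathrm{TR}_2 \succ \mathrm{TR}_1
  \iff \mathrm{Sat}(\mathrm{TR}_1) \subsetneq \mathrm{Sat}(\mathrm{TR}_2),
\qquad
\mathrm{TR}_2 \equiv \mathrm{TR}_1
  \iff \mathrm{Sat}(\mathrm{TR}_1) = \mathrm{Sat}(\mathrm{TR}_2).
\]
```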
Generalization operations • Operation 1: delete value constraint, e.g. ((X1 agr) = *3pl) → NULL • Operation 2: delete agreement constraint, e.g. ((X1 agr) = (X2 agr)) → NULL • Operation 3: merge two value constraints into an agreement constraint, e.g. ((X1 agr) = *3pl), ((X2 agr) = *3pl) → ((X1 agr) = (X2 agr))
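The three operations can be sketched over a simple set-based encoding. The encoding is an assumption made for illustration: a constraint is a pair (path, right-hand side), where the right-hand side is a string for a value constraint and another path for an agreement constraint. The function names are hypothetical.

```python
def is_value_constraint(c):
    """Value constraints have a string right-hand side, e.g. "*3pl"."""
    return isinstance(c[1], str)


def delete_value_constraint(constraints, c):
    """Operation 1: ((X1 agr) = *3pl) -> NULL."""
    if c not in constraints or not is_value_constraint(c):
        raise ValueError("not a value constraint of this rule")
    return constraints - {c}


def delete_agreement_constraint(constraints, c):
    """Operation 2: ((X1 agr) = (X2 agr)) -> NULL."""
    if c not in constraints or is_value_constraint(c):
        raise ValueError("not an agreement constraint of this rule")
    return constraints - {c}


def merge_value_constraints(constraints, c1, c2):
    """Operation 3: two value constraints with equal values -> one agreement constraint."""
    if c1 == c2 or c1 not in constraints or c2 not in constraints:
        raise ValueError("need two distinct constraints of this rule")
    if not (is_value_constraint(c1) and is_value_constraint(c2)) or c1[1] != c2[1]:
        raise ValueError("need two value constraints with the same value")
    return (constraints - {c1, c2}) | {(c1[0], c2[0])}


x1_agr, x2_agr = ("X1", "agr"), ("X2", "agr")
rule = frozenset({(x1_agr, "*3pl"), (x2_agr, "*3pl")})

agreeing = merge_value_constraints(rule, (x1_agr, "*3pl"), (x2_agr, "*3pl"))
print(agreeing)  # frozenset({(('X1', 'agr'), ('X2', 'agr'))})
print(delete_agreement_constraint(agreeing, (x1_agr, x2_agr)))  # frozenset()
print(delete_value_constraint(rule, (x1_agr, "*3pl")))  # frozenset({(('X2', 'agr'), '*3pl')})
```

Each operation returns a new set, so the original rule stays available as a point on the S boundary.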
Merging two transfer rules At the heart of the seeded version space learning algorithm is the merging of two transfer rules (TR1 and TR2) into a more general rule (TR3): • Step 1: All constraints that are in both TR1 and TR2 are inserted into TR3 and removed from TR1 and TR2. • Step 2: Perform all instances of Operation 3 on TR1 and TR2 separately. • Step 3: Repeat step 1.
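A sketch of the merge on constraint sets only, under some assumptions that go beyond the slide: constraints are encoded as (path, right-hand side) pairs, Operation 3 is applied only to value constraints that share both feature name and value, and whatever remains in TR1 and TR2 after the last step is simply not carried into TR3.

```python
from itertools import combinations


def is_value_constraint(c):
    return isinstance(c[1], str)


def apply_operation3(constraints):
    """Turn every pair of value constraints with the same feature and value
    into an agreement constraint (assumed reading of 'all instances')."""
    result = set(constraints)
    values = sorted(c for c in constraints if is_value_constraint(c))
    for (p1, v1), (p2, v2) in combinations(values, 2):
        if v1 == v2 and p1[1] == p2[1]:
            result.discard((p1, v1))
            result.discard((p2, v2))
            result.add((p1, p2))  # p1 < p2 because `values` is sorted
    return result


def merge_rules(tr1, tr2):
    """Return the constraint set of the merged, more general rule TR3."""
    tr1, tr2 = set(tr1), set(tr2)
    tr3 = tr1 & tr2                                       # first pass: shared constraints
    tr1, tr2 = tr1 - tr3, tr2 - tr3
    tr1, tr2 = apply_operation3(tr1), apply_operation3(tr2)  # Operation 3 on each rule
    tr3 |= tr1 & tr2                                      # second pass: newly shared constraints
    return frozenset(tr3)


x1_agr, x2_agr, x1_def = ("X1", "agr"), ("X2", "agr"), ("X1", "def")
tr1 = {(x1_agr, "*3pl"), (x2_agr, "*3pl"), (x1_def, "*+")}
tr2 = {(x1_agr, "*3sg"), (x2_agr, "*3sg"), (x1_def, "*+")}

print(merge_rules(tr1, tr2) == frozenset({(x1_def, "*+"), (x1_agr, x2_agr)}))  # True
```

In the example, the definiteness constraint is shared from the start, while the agreement constraint becomes shared only after Operation 3 has abstracted away the differing *3pl and *3sg values.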
Seeded Version Space Algorithm • Remove duplicate rules from the S boundary • Try to merge each pair of transfer rules • A merge is successful only if the CSet (set of covered sentences, i.e. sentences that are translated correctly) of the merged rule is a superset of the union of the CSets of the two unmerged rules • Pick the successful merge that optimizes an evaluation criterion • Repeat until no more merges are found
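The loop above can be sketched as a greedy search. Everything in the toy setup is an assumption made so the example runs: rules are bare constraint sets, `merge` is plain set intersection (a stand-in for the real merge), and `cset` is a made-up oracle in which a sentence is covered if the rule applies to it and still contains the constraints that sentence needs. The evaluation criterion is the size of the resulting rule set.

```python
from itertools import combinations


def seeded_version_space(seed_rules, merge, cset, score=len):
    """Greedily merge rules while some merge loses no coverage."""
    rules = set(seed_rules)  # building a set removes duplicate seed rules
    while True:
        candidates = []
        for r1, r2 in combinations(rules, 2):
            merged = merge(r1, r2)
            # Successful only if the merged rule covers everything both covered.
            if cset(merged) >= cset(r1) | cset(r2):
                candidates.append((rules - {r1, r2}) | {merged})
        if not candidates:
            return rules
        rules = min(candidates, key=score)  # smaller rule set is better


# Toy corpus: sentence id -> (features of the sentence, constraints it needs).
corpus = {
    "A": (frozenset({("num", "sg"), ("def", "+")}), frozenset()),
    "B": (frozenset({("num", "pl"), ("def", "+")}), frozenset()),
    "C": (frozenset({("num", "pl"), ("def", "-")}), frozenset({("def", "-")})),
}


def toy_cset(rule):
    """Sentences the rule applies to and is specific enough to get right."""
    return {sid for sid, (feats, needs) in corpus.items()
            if rule <= feats and needs <= rule}


def toy_merge(r1, r2):
    return r1 & r2  # stand-in for the real rule merge


seeds = [corpus["A"][0], corpus["A"][0], corpus["B"][0], corpus["C"][0]]
learned = seeded_version_space(seeds, toy_merge, toy_cset)
print(learned == {frozenset({("def", "+")}), corpus["C"][0]})  # True
```

The rules for A and B merge into a single rule requiring only definiteness, while the rule for C stays separate because every merge involving it would lose sentence C.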
Evaluating a set of transfer rules • Initial thought: evaluate a merge based on the “goodness” of the new rule, i.e. its CSet, and based on the size of the rule set • Goal: maximize coverage and minimize the size of the rule set • Currently: merges are only successful if there is no loss in coverage, so the size of the rule set is the only criterion used • Future (1): Coverage should be measured on a test set • Future (2): Relax the constraint that a successful merge cannot result in loss of coverage
Conclusions and Future Work • Novel approach to data-driven MT: less data, more encoded linguistic knowledge • Still in the first stages, so system is under heavy development and subject to major changes • Current work: compositionality • Future work includes: • Expanding coverage • Addressing (much) more complex constructions • Eliminating some assumptions