Semi-Automatic Learning of Transfer Rules for Machine Translation of Low-Density Languages

Katharina Probst, April 5, 2002

Overview of the talk • Introduction and Motivation • Overview of the AVENUE project • Elicitation of bilingual data • Rule Learning • Seed Generation • Seeded Version Space Learning • Conclusions and Future Work

Introduction and Motivation • Basic idea: opening up Machine Translation to minority languages • Scarce resources for minority languages: • Bilingual text • Monolingual text • Target language grammar • Due to scarce resources, statistical and example-based methods will likely not perform as well • Our approach: • A system that elicits necessary information about the target language from a bilingual informant • The elicited information is used in conjunction with any other available target language information to learn syntactic transfer rules

System overview • [Diagram; components shown: SL Input, Run-Time Module, Learning Module, SL Parser, EBMT Engine, Elicitation Process, SVS Learning Process, Transfer Rules, Transfer Engine, TL Generator, User, Unifier Module, TL Output]

Elicitation • Elicitation is the process of presenting a bilingual speaker with sets of sentences. The user translates the sentences and specifies how the words align • The elicitation process serves multiple purposes: • Collection of data • Feature detection

Feature Detection • Feature detection is a process by which the learning module answers questions such as “Does the target language mark number on nouns?” • The elicitation corpus is organized in minimal pairs, i.e. pairs of sentences that differ in only one feature. For example: • (1) You (John) are falling. [2nd person m, subj, present tense] • (2) You (Mary) are falling. [2nd person f, subj, present tense] • (3) You (Mary) fell. [2nd person f, subj, past tense] • Sentences 1 and 2 form a minimal pair (they differ only in gender), as do sentences 2 and 3 (they differ only in tense). • By comparing the translations for “you” in sentences 1 and 2, the system gets an indication of whether gender is marked on second-person pronouns. • The results of feature detection will be used to guide the system in navigating through the elicitation corpus by eliminating parts based on Implicational Universals • The results will also be used by the rule learning module
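The minimal-pair comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout, the function names, and the target-side tokens (`TGT_...`) are hypothetical placeholders, not data from an actual informant or the AVENUE implementation.

```python
def differing_features(a, b):
    """Return the names of the features on which two feature dicts disagree."""
    return sorted(k for k in set(a) | set(b) if a.get(k) != b.get(k))


def detect_feature(s1, s2, probe):
    """For a minimal pair, report (feature, marked_on_probe); otherwise None."""
    diff = differing_features(s1["features"], s2["features"])
    if len(diff) != 1:
        return None  # the two sentences are not a minimal pair
    marked = s1["aligned"][probe] != s2["aligned"][probe]
    return diff[0], marked


# Hypothetical elicited data: "aligned" maps a source lemma to the target
# token the informant aligned it with. The TGT_* tokens are made-up placeholders.
s1 = {"features": {"person": 2, "gender": "m", "tense": "present"},
      "aligned": {"you": "TGT_YOU_M", "fall": "TGT_FALL_PRES"}}
s2 = {"features": {"person": 2, "gender": "f", "tense": "present"},
      "aligned": {"you": "TGT_YOU_F", "fall": "TGT_FALL_PRES"}}
s3 = {"features": {"person": 2, "gender": "f", "tense": "past"},
      "aligned": {"you": "TGT_YOU_F", "fall": "TGT_FALL_PAST"}}

gender_on_pronoun = detect_feature(s1, s2, "you")
tense_on_verb = detect_feature(s2, s3, "fall")
tense_on_pronoun = detect_feature(s2, s3, "you")
not_minimal = detect_feature(s1, s3, "you")
```

With these placeholder translations, sentences 1 and 2 suggest that gender is marked on the pronoun, sentences 2 and 3 suggest that tense is marked on the verb but not on the pronoun, and sentences 1 and 3 are rejected because they differ in two features.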

More on the elicitation corpus • Eliciting data from bilingual informants entails a number of challenges: • The bilingual informant him/herself • Morphology and the lexicon • Learning grammatical features • Compositional elicitation • Elicitation of non-compositional data • Verb subcategorization • Alignment issues • Bias towards the source language

Rule Learning in the AVENUE project - Introduction • The goal is to semi-automatically (i.e. with the help of the user) infer syntactic transfer rules • Rule learning can be divided into two main steps: • Seed Generation: The system produces an initial “guess” at a transfer rule based on only one sentence. The produced rule is quite specific to the input sentence. • Version Space Learning: Here, the system takes the seed rules and generalizes them.

Transfer rule formalism A transfer rule (TR) consists of the following components: • Source language sentence, Target language sentence that the TR was produced from • Word alignments • Phrase information such as NP, S, … • Part-of-Speech sequences for source and target language. • X-side constraints, i.e. constraints on the source language. These are used for parsing. • Y-side constraints, i.e. constraints on the target language. These are used for generation. • XY-constraints, i.e. constraints that transfer features from the source to the target language. These are used for transfer.
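The components listed above map naturally onto a small record type. The following is a minimal sketch, assuming a tuple encoding of constraints; the class name, field names, and the example rule (including the placeholder target sentence) are illustrative assumptions, not the project's actual rule format.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Assumed encoding: a constraint is a pair (left, right). `left` is a
# (slot, feature) path such as ("X1", "agr"); `right` is either a value
# string such as "*3pl" (value constraint) or another (slot, feature)
# path (agreement constraint).
Constraint = Tuple[Tuple[str, str], object]


@dataclass(frozen=True)
class TransferRule:
    sl_sentence: str                           # source sentence the rule came from
    tl_sentence: str                           # target sentence the rule came from
    alignments: Tuple[Tuple[int, int], ...]    # (source index, target index) pairs
    phrase_type: str                           # e.g. "NP" or "S"
    x_pos: Tuple[str, ...]                     # source part-of-speech sequence
    y_pos: Tuple[str, ...]                     # target part-of-speech sequence
    x_constraints: FrozenSet[Constraint] = frozenset()   # used for parsing
    y_constraints: FrozenSet[Constraint] = frozenset()   # used for generation
    xy_constraints: FrozenSet[Constraint] = frozenset()  # used for transfer


# Hypothetical seed rule; "TL_w1 TL_w2" stands in for a real translation.
seed = TransferRule(
    sl_sentence="the men",
    tl_sentence="TL_w1 TL_w2",
    alignments=((1, 1), (2, 2)),
    phrase_type="NP",
    x_pos=("DET", "N"),
    y_pos=("DET", "N"),
    x_constraints=frozenset({(("X2", "agr"), "*3pl")}),
    y_constraints=frozenset({(("Y1", "agr"), ("Y2", "agr"))}),
    xy_constraints=frozenset({(("X2", "agr"), ("Y2", "agr"))}),
)
```

Keeping the three constraint sets separate mirrors their different uses (parsing, generation, transfer), and making the record immutable lets rules be compared and stored in sets.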

Seed Generation

A word on compositionality • Basic idea: if you produce a transfer rule for a sentence, and there already exist transfer rules that can translate parts of the sentence, why not use them? • Adjust the alignments, part-of-speech sequences, and the constraints • The trickiest part is to find new constraints that cannot be in the lower-level rule, but are necessary to translate correctly in the context of a sentence

Clustering • Seed rules are “clustered” into groups that warrant an attempt to merge • Clustering criteria: POS sequences, Phrase information, Alignments • Main reason for clustering: divide the large version space into a number of smaller version spaces and run the algorithm on each version space separately • Possible danger: Rules that should be considered together (such as “the man”, “men”) will not be
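Clustering by the three criteria amounts to grouping rules under a shared key. The sketch below assumes rules are plain dictionaries with hypothetical field names; it is not the project's code.

```python
from collections import defaultdict


def cluster_key(rule):
    """Rules are comparable only if POS sequences, phrase type and alignments match."""
    return (rule["x_pos"], rule["y_pos"], rule["phrase"], rule["alignments"])


def cluster_seed_rules(rules):
    """Group seed rules so that each group defines its own small version space."""
    clusters = defaultdict(list)
    for rule in rules:
        clusters[cluster_key(rule)].append(rule)
    return list(clusters.values())


# Hypothetical seed rules; the target-side POS sequences are assumptions.
seeds = [
    {"src": "the man", "x_pos": ("DET", "N"), "y_pos": ("DET", "N"),
     "phrase": "NP", "alignments": ((1, 1), (2, 2))},
    {"src": "the woman", "x_pos": ("DET", "N"), "y_pos": ("DET", "N"),
     "phrase": "NP", "alignments": ((1, 1), (2, 2))},
    {"src": "men", "x_pos": ("N",), "y_pos": ("N",),
     "phrase": "NP", "alignments": ((1, 1),)},
]
clusters = cluster_seed_rules(seeds)
cluster_sizes = sorted(len(c) for c in clusters)
```

In this toy input, “the man” and “the woman” share a cluster while “men” lands in a cluster of its own, which illustrates the danger noted on the slide.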

The Version Space • A set of seed rules in a cluster defines a version space as follows: The seed rules form the specific boundary (S). A virtual rule with the same POS sequences, alignments, and phrase information, but no constraints forms the general boundary (G). • [Diagram, from most general to most specific: G boundary: the virtual rule with no constraints; in between: generalizations of the seed rules, more specific than the rule in G; S boundary: the seed rules]

The partial ordering of rules in the version space • A rule TR2 is said to be strictly more general than another rule TR1 if the set of f-structures that satisfy TR2 is a proper superset of the set of f-structures that satisfy TR1. It is said to be equivalent to TR1 if the set of f-structures that satisfy TR1 is the same as the set of f-structures that satisfy TR2. • We have defined three operations that move a transfer rule to a strictly more general rule
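The ordering can be made concrete on a toy universe of f-structures. The sketch below assumes only two slots and two agreement values, and the same tuple encoding of constraints used for illustration elsewhere; real f-structures are far richer, so this only demonstrates the definition.

```python
from itertools import product

# Assumed toy universe: two slots, each carrying one of two agreement values.
PATHS = (("X1", "agr"), ("X2", "agr"))
VALUES = ("*3sg", "*3pl")


def satisfies(fs, constraint):
    """Check one constraint against a toy f-structure (dict: path -> value)."""
    left, right = constraint
    if isinstance(right, tuple):       # agreement constraint: two paths agree
        return fs[left] == fs[right]
    return fs[left] == right           # value constraint: path has this value


def extension(constraints):
    """All toy f-structures that satisfy every constraint of a rule."""
    result = set()
    for values in product(VALUES, repeat=len(PATHS)):
        fs = dict(zip(PATHS, values))
        if all(satisfies(fs, c) for c in constraints):
            result.add(values)
    return result


def strictly_more_general(tr2, tr1):
    return extension(tr2) > extension(tr1)   # proper superset


def equivalent(tr1, tr2):
    return extension(tr1) == extension(tr2)


tr_values = {(("X1", "agr"), "*3pl"), (("X2", "agr"), "*3pl")}
tr_agree = {(("X1", "agr"), ("X2", "agr"))}
tr_empty = set()
```

Here the agreement-only rule is strictly more general than the rule with two value constraints, the empty rule is strictly more general than both, and adding the agreement constraint to the two value constraints yields an equivalent rule.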

Generalization operations • Operation 1: delete value constraint, e.g. ((X1 agr) = *3pl) → NULL • Operation 2: delete agreement constraint, e.g. ((X1 agr) = (X2 agr)) → NULL • Operation 3: merge two value constraints into an agreement constraint, e.g. ((X1 agr) = *3pl) , ((X2 agr) = *3pl) → ((X1 agr) = (X2 agr))
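The three operations can be written as small functions over a set of constraints. This is a sketch under the assumption that a constraint is encoded as a pair of a (slot, feature) path and either a value string or a second path; the function names are hypothetical.

```python
def is_value_constraint(c):
    """A value constraint has a string on the right, e.g. ((X1 agr) = *3pl)."""
    return isinstance(c[1], str)


def delete_constraint(constraints, c):
    """Operations 1 and 2: drop one value or agreement constraint."""
    if c not in constraints:
        raise ValueError("constraint not present in rule")
    return frozenset(constraints) - {c}


def merge_to_agreement(constraints, c1, c2):
    """Operation 3: replace two matching value constraints by an agreement constraint."""
    if not (is_value_constraint(c1) and is_value_constraint(c2)):
        raise ValueError("Operation 3 needs two value constraints")
    (slot1, feat1), val1 = c1
    (slot2, feat2), val2 = c2
    # Assumption: both constraints must name the same feature and value.
    if feat1 != feat2 or val1 != val2 or slot1 == slot2:
        raise ValueError("value constraints do not match")
    agreement = ((slot1, feat1), (slot2, feat2))
    return (frozenset(constraints) - {c1, c2}) | {agreement}


c_x1 = (("X1", "agr"), "*3pl")
c_x2 = (("X2", "agr"), "*3pl")
rule = frozenset({c_x1, c_x2})

after_op1 = delete_constraint(rule, c_x1)
after_op3 = merge_to_agreement(rule, c_x1, c_x2)
after_op2 = delete_constraint(after_op3, (("X1", "agr"), ("X2", "agr")))
```

Each function returns a new constraint set rather than modifying its input, so the original seed rule stays available on the S boundary.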

Merging two transfer rules At the heart of the seeded version space learning algorithm is the merging of two transfer rules (TR1 and TR2) to a more general rule (TR3): • Step 1: All constraints that are both in TR1 and TR2 are inserted into TR3 and removed from TR1 and TR2. • Step 2: Perform all instances of Operation 3 on TR1 and TR2 separately. • Step 3: Repeat step 1.
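The merge steps can be sketched as follows. This is one reading of the slide, not the project's implementation: constraints are assumed to be encoded as pairs of a (slot, feature) path and either a value string or a second path, and constraints left over after the final step are simply dropped, which corresponds to deleting them with Operations 1 and 2. The feature `def` and value `*+` are illustrative.

```python
from itertools import combinations


def is_value_constraint(c):
    return isinstance(c[1], str)


def apply_all_op3(constraints):
    """Turn every matching pair of value constraints into an agreement constraint."""
    values = sorted(c for c in constraints if is_value_constraint(c))
    result = set(constraints)
    for c1, c2 in combinations(values, 2):
        (slot1, feat1), val1 = c1
        (slot2, feat2), val2 = c2
        if feat1 == feat2 and val1 == val2 and slot1 != slot2:
            result -= {c1, c2}
            # Sorting above fixes the slot order, so both rules emit the same tuple.
            result.add(((slot1, feat1), (slot2, feat2)))
    return result


def merge_rules(tr1, tr2):
    tr1, tr2 = set(tr1), set(tr2)
    common = tr1 & tr2                       # move shared constraints to TR3
    tr3 = set(common)
    tr1, tr2 = tr1 - common, tr2 - common
    tr1, tr2 = apply_all_op3(tr1), apply_all_op3(tr2)   # Operation 3 on each
    tr3 |= tr1 & tr2                         # move newly shared constraints to TR3
    return frozenset(tr3)


tr1 = {(("X1", "agr"), "*3pl"), (("X2", "agr"), "*3pl"), (("X1", "def"), "*+")}
tr2 = {(("X1", "agr"), "*3sg"), (("X2", "agr"), "*3sg"), (("X1", "def"), "*+")}
tr3 = merge_rules(tr1, tr2)
```

In this example the two rules disagree on the agreement value but both require the two slots to agree, so the merged rule keeps the shared value constraint and gains the agreement constraint.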

Seeded Version Space Algorithm • Remove duplicate rules from the S boundary • Try to merge each pair of transfer rules • A merge is successful only if the CSet (set of covered sentences, i.e. sentences that are translated correctly) of the merged rule is a superset of the union of the CSets of the two unmerged rules • Pick the successful merge that optimizes an evaluation criterion • Repeat until no more merges are found
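The loop described above can be sketched as a greedy search. Everything outside the loop itself is an assumption made for the sake of a runnable example: `merge` and `cset` are passed in as functions, the toy `merge` is plain set intersection, the toy `cset` stands in for actually translating sentences, and the evaluation criterion is the size of the rule set.

```python
from itertools import combinations


def seeded_version_space(seed_rules, merge, cset, score=len):
    """Greedily merge rules while no covered sentence is lost."""
    rules = list(dict.fromkeys(seed_rules))   # remove duplicates, keep order
    while True:
        candidates = []
        for r1, r2 in combinations(rules, 2):
            merged = merge(r1, r2)
            # Successful only if the merged rule covers everything both rules covered.
            if cset(merged) >= cset(r1) | cset(r2):
                rest = [r for r in rules if r not in (r1, r2)]
                candidates.append(rest if merged in rest else rest + [merged])
        if not candidates:
            return rules
        rules = min(candidates, key=score)    # smaller rule set is better


# Toy corpus (hypothetical): each sentence has the facts that hold of it and
# the constraints a rule must retain to translate it correctly.
CORPUS = [
    (frozenset({"det=def", "num=sg", "case=nom"}), frozenset({"det=def"})),
    (frozenset({"det=def", "num=sg", "case=acc"}), frozenset({"det=def"})),
    (frozenset({"det=indef", "num=pl"}), frozenset({"det=indef"})),
]


def toy_cset(rule):
    """Indices of sentences the rule applies to and still translates correctly."""
    return frozenset(i for i, (facts, needed) in enumerate(CORPUS)
                     if needed <= rule <= facts)


def toy_merge(r1, r2):
    return r1 & r2


seeds = [CORPUS[0][0], CORPUS[1][0], CORPUS[0][0], CORPUS[2][0]]
learned = seeded_version_space(seeds, toy_merge, toy_cset)
```

The first two seed rules merge into one rule that drops the case constraint, while merging with the third rule would lose coverage and is rejected. Because every successful merge shrinks the rule set by at least one rule, the loop terminates.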

Evaluating a set of transfer rules • Initial thought: evaluate a merge based on the “goodness” of the new rule, i.e. its CSet, and based on the size of the rule set • Goal: maximize coverage and minimize the size of the rule set • Currently: merges are only successful if there is no loss in coverage, so the size of the rule set is the only criterion used • Future(1): Coverage should be measured on a test set • Future(2): Relax the constraint that a successful merge cannot result in loss of coverage

Conclusions and Future Work • Novel approach to data-driven MT: less data, more encoded linguistic knowledge • Still in the first stages, so system is under heavy development and subject to major changes • Current work: compositionality • Future work includes: • Expanding coverage • Addressing (much) more complex constructions • Eliminating some assumptions
