Statistical Machine Translation

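General Framework

Given sentences S and T, assume there is a “translator oracle” that can calculate P(T|S), the probability that an “ideal translator” will produce sentence T given sentence S. Our statistical translator tries to “reverse engineer” the ideal translator: given T, it finds the S with the highest probability P(S|T).

We have (Bayes' rule):

$$P(S \mid T) = \frac{P(S)\,P(T \mid S)}{P(T)}$$

We want:

$$\hat{S} = \arg\max_{S} P(S \mid T) = \arg\max_{S} P(S)\,P(T \mid S)$$

P(T) can be dropped because T is fixed, so it is the same for every candidate S. The three parts of the decision rule are:
• language model: P(S)
• search method: the arg max over all candidate sentences S
• translation model: P(T|S)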
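A minimal sketch of the decision rule over a small, explicit list of candidate sentences. Real systems cannot enumerate every S, which is why the search method is a component in its own right. The function name `noisy_channel_decode`, the candidate sentences, and all probabilities are hypothetical. Because the three candidates use the same words, the toy translation model is assumed to score them equally, so the language model alone decides the word order.

```python
import math

def noisy_channel_decode(candidates, lm_prob, tm_prob):
    """Return the candidate S maximizing P(S) * P(T|S) for a fixed input T.

    lm_prob(s): language-model probability P(S)       (assumed callable)
    tm_prob(s): translation-model probability P(T|S)  (assumed callable)
    P(T) is identical for every S, so it is left out. Log probabilities are
    added rather than multiplying raw probabilities, to avoid underflow.
    """
    return max(candidates, key=lambda s: math.log(lm_prob(s)) + math.log(tm_prob(s)))

# Hypothetical toy probabilities, for illustration only.
lm_probs = {
    "the proposal will not now be implemented": 1e-9,
    "the proposal now will not be implemented": 2e-10,
    "proposal the will not now implemented be": 1e-14,
}
# Same words in every candidate, so assume the translation model scores them equally.
tm_probs = {s: 3e-6 for s in lm_probs}

best = noisy_channel_decode(list(lm_probs), lm_probs.get, tm_probs.get)
print(best)  # the proposal will not now be implemented
```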

Language model

The language model P(S) can be an n-gram model.
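A bigram model is the simplest n-gram choice that captures word order: P(S) is the product of P(w_i | w_{i-1}) over the sentence, with start and end markers. A minimal sketch, assuming whitespace tokenization and add-one (Laplace) smoothing so that unseen bigrams keep a nonzero probability; the class name `BigramLM` and the two-sentence corpus are hypothetical.

```python
import math
from collections import Counter

class BigramLM:
    """Bigram language model: P(S) = product of P(w_i | w_{i-1}),
    with <s>/</s> sentence markers and add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.history = Counter()   # c(w_{i-1}): how often each word precedes another
        self.bigrams = Counter()   # c(w_{i-1}, w_i)
        for sent in sentences:
            words = ["<s>"] + sent.split() + ["</s>"]
            self.history.update(words[:-1])
            self.bigrams.update(zip(words[:-1], words[1:]))
        # Every word that can be predicted (the start marker never is).
        self.vocab = {w for sent in sentences for w in sent.split()} | {"</s>"}

    def prob(self, prev, word):
        # Add-one smoothing: unseen bigrams get a small nonzero probability.
        return (self.bigrams[(prev, word)] + 1) / (self.history[prev] + len(self.vocab))

    def log_prob(self, sentence):
        words = ["<s>"] + sentence.split() + ["</s>"]
        return sum(math.log(self.prob(p, w)) for p, w in zip(words[:-1], words[1:]))

corpus = ["the proposal will not now be implemented",
          "the proposal will be implemented"]
lm = BigramLM(corpus)
# Fluent word order scores higher than a scrambled one:
print(lm.log_prob("the proposal will be implemented")
      > lm.log_prob("proposal the will be implemented"))  # True
```

Out-of-vocabulary words get no special treatment in this sketch; a practical model would map them to an unknown-word token and use a better smoothing method than add-one.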

Translation model

The translation model P(T|S) needs an alignment model that lets us calculate the probabilities of alignments, e.g.,

P[The (1) proposal (2) will (4) not (3, 5) now (9) be implemented (6, 7, 8) | Les propositions ne seront pas mises en application maintenant]

Notation for alignment: each English word is followed, in parentheses, by the positions of the French words aligned to it (Les = 1, propositions = 2, ne = 3, seront = 4, pas = 5, mises = 6, en = 7, application = 8, maintenant = 9):

Les propositions ne seront pas mises en application maintenant | The (1) proposal (2) will (4) not (3, 5) now (9) be implemented (6, 7, 8)
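The alignment notation maps directly onto a simple data structure. A minimal sketch, assuming the exact text format shown in the notation line: it parses the notation, then reads off each English word's fertility (how many French positions it lists) and the French words it covers. The function name `parse_alignment` and the regular expression are hypothetical; multi-word groups such as "be implemented" are kept as one unit because the notation gives them a single position list.

```python
import re

def parse_alignment(notation):
    """Parse 'French sentence | English word (i, j) ...' alignment notation.

    Returns (french_words, pairs), where pairs is a list of
    (english_words, [1-based French positions]) in English order.
    """
    french, english = notation.split("|")
    # Each group: some words, then a parenthesized list of positions.
    groups = re.findall(r"([^()]+?)\s*\(([\d,\s]+)\)", english)
    pairs = [(words.strip(), [int(p) for p in positions.split(",")])
             for words, positions in groups]
    return french.split(), pairs

notation = ("Les propositions ne seront pas mises en application maintenant | "
            "The (1) proposal (2) will (4) not (3, 5) now (9) be implemented (6, 7, 8)")
french, pairs = parse_alignment(notation)

for english, positions in pairs:
    # Fertility = how many French words this English word (or group) is mapped to.
    aligned = [french[p - 1] for p in positions]
    print(f"{english}: fertility {len(positions)}, aligned to {aligned}")
```

For example, "not" has fertility 2 because it is aligned to both "ne" (3) and "pas" (5).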

The alignment model consists of:
• fertility model (fertility = number of target words each source word is mapped to)
• term-translation model
• distortion model

We need to calculate P(alignment), that is:

P[The (1) proposal (2) will (4) not (3, 5) now (9) be implemented (6, 7, 8) | Les propositions ne seront pas mises en application maintenant]

Translation model (from the Brown et al. paper). To calculate this, we need:
• Fertility model: P(fertility = n | term) for each n (up to some maximum value) and each source term
• Term-translation model: P(term_T | term_S), the probability that term_T appears in the target given that term_S appears in the source
• Distortion model: one simple version assumes that the position of a target term depends only on the position of the source term it is aligned to and on the length of the target sentence. This requires P(i | j, L) for each target position i, source position j, and target length L (with L limited to some maximum value).
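Given tables for the three models, one alignment can be scored as a product of fertility, term-translation, and distortion factors. A minimal sketch under stated assumptions: each source word s generates the target words at the positions aligned to it (the direction used by Brown et al., where the English words in the notation generate the French words), and, unlike the full Brown et al. Model 3, there is no empty (NULL) word and no combinatorial factors. The function name `alignment_log_prob`, the model callables, and the toy probabilities are hypothetical. Products are accumulated as sums of logs.

```python
import math

def alignment_log_prob(source, target, alignment, fert, trans, dist):
    """Log-probability of one alignment under a simplified model.

    source:    list of source words
    target:    list of target words
    alignment: alignment[j-1] = list of 1-based target positions that
               source word j is mapped to
    fert(n, s)    -> P(fertility = n | s)
    trans(t, s)   -> P(t | s)
    dist(i, j, L) -> P(i | j, L), target position i given source position j
    """
    L = len(target)
    logp = 0.0
    for j, (s, positions) in enumerate(zip(source, alignment), start=1):
        logp += math.log(fert(len(positions), s))        # fertility factor
        for i in positions:
            logp += math.log(trans(target[i - 1], s))    # term-translation factor
            logp += math.log(dist(i, j, L))              # distortion factor
    return logp

# Hypothetical toy tables for "will not" -> "ne seront pas".
FERT = {(1, "will"): 0.7, (2, "not"): 0.6}
TRANS = {("seront", "will"): 0.3, ("ne", "not"): 0.4, ("pas", "not"): 0.4}

logp = alignment_log_prob(
    source=["will", "not"],
    target=["ne", "seront", "pas"],
    alignment=[[2], [1, 3]],          # will -> seront; not -> ne, pas
    fert=lambda n, s: FERT[(n, s)],
    trans=lambda t, s: TRANS[(t, s)],
    dist=lambda i, j, L: 1 / L,       # hypothetical uniform distortion
)
print(math.exp(logp))
```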

How does the statistical translator learn these various models? From data, of course! For example, a massive collection of paired source/target sentences, such as UN translations.

How does the statistical translator search the space of possible source sentences for the one with the highest probability? See the paper.
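As one concrete illustration of learning from paired sentences, here is a minimal sketch of expectation-maximization (EM) for IBM Model 1, the simplest of the Brown et al. models. It estimates only the term-translation table P(term_T | term_S) and ignores fertility and distortion. Assumptions: no empty (NULL) word, uniform initialization, a fixed number of iterations, and a hypothetical three-pair toy corpus standing in for real parallel data; the function name `train_model1` is also hypothetical.

```python
from collections import defaultdict

def train_model1(pairs, iterations=10):
    """EM for IBM Model 1 (no NULL word).

    pairs: list of (source_words, target_words) sentence pairs.
    Returns t with t[(t_word, s_word)] = P(t_word | s_word).
    """
    target_vocab = {tw for _, tgt in pairs for tw in tgt}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # start from uniform translation probabilities
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (t_word, s_word) links
        total = defaultdict(float)  # expected count of links from s_word
        for src, tgt in pairs:
            for tw in tgt:
                # E-step: share this target word among the source words of
                # its sentence, in proportion to the current t values.
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalize the expected counts into probabilities.
        t = defaultdict(float, {(tw, sw): c / total[sw] for (tw, sw), c in count.items()})
    return t

pairs = [("the flower".split(), "la fleur".split()),
         ("the house".split(), "la maison".split()),
         ("a house".split(), "une maison".split())]
t = train_model1(pairs, iterations=20)
print(t[("la", "the")], t[("maison", "house")])
```

No word-level alignments are given as input. Because "the" co-occurs with "la" in two pairs, EM gradually concentrates P(· | the) on "la", and that in turn frees "fleur" to be explained by "flower".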
