Statistical Machine Translation
General Framework
Given sentences S and T, assume there is a “translator oracle” that can calculate P(T|S), the probability that an “ideal translator” will produce sentence T given sentence S.
Our statistical translator tries to “reverse engineer” the ideal translator. That is, given T, it finds the S with the highest probability P(S|T).
We have: P(T|S)
We want: argmax_S P(S|T) = argmax_S P(S) × P(T|S), by Bayes' rule, since P(T) is the same for every candidate S
• P(S): language model
• argmax_S: search method
• P(T|S): translation model
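The three components can be sketched in a few lines of Python. This is only an illustration: the candidate list, the probability tables `LM` and `TM`, and the function name `best_source` are all made up here, and a brute-force loop over a fixed candidate list stands in for a real search method.

```python
import math

# Hypothetical toy probabilities, invented for illustration only.
LM = {"the cat": 0.02, "cat the": 0.0001, "the dog": 0.02}        # P(S)
TM = {("le chat", "the cat"): 0.3,                                # P(T|S)
      ("le chat", "cat the"): 0.3,
      ("le chat", "the dog"): 0.001}

def best_source(target, candidates):
    """Return the candidate S that maximizes P(S) * P(T|S).

    Log probabilities are summed instead of multiplying raw
    probabilities, which avoids underflow on long sentences.
    """
    def score(s):
        return math.log(LM[s]) + math.log(TM[(target, s)])
    return max(candidates, key=score)

best = best_source("le chat", ["the cat", "cat the", "the dog"])
print(best)
```

In this toy setup the language model rules out the scrambled "cat the" and the translation model rules out "the dog", so neither component alone would be enough.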
Language model: can use an n-gram model
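As one possible instance of an n-gram model, here is a minimal bigram sketch. The training sentences, the add-one smoothing, and the function names `train_bigram` and `sentence_prob` are assumptions chosen for illustration, not something specified in the slides.

```python
from collections import Counter

def train_bigram(sentences):
    """Count context words and bigrams, with <s> and </s> as boundary markers."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for sent in sentences:
        words = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(sent.split())
        contexts.update(words[:-1])                 # every word that has a successor
        bigrams.update(zip(words[:-1], words[1:]))
    return contexts, bigrams, len(vocab)

def sentence_prob(sentence, contexts, bigrams, vocab_size):
    """P(sentence) as a product of add-one-smoothed bigram probabilities."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, cur in zip(words[:-1], words[1:]):
        p *= (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
    return p

corpus = [
    "the proposal will be implemented",
    "the proposal will not be implemented",
    "the plan will be implemented",
]
model = train_bigram(corpus)
fluent = sentence_prob("the proposal will be implemented", *model)
scrambled = sentence_prob("proposal the implemented be will", *model)
print(fluent > scrambled)
```

The model prefers the fluent word order because every bigram in it was seen in training, while the scrambled sentence consists only of unseen bigrams.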
Translation model
Need an alignment model that will allow us to calculate the probabilities of alignments, e.g.,
P [The (1) proposal (2) will (4) not (3, 5) now (9) be implemented (6, 7, 8) | Les propositions ne seront pas mises en application maintenant]
Here the target sentence is the English one and the source sentence is the French one.
Notation for alignment: Les propositions ne seront pas mises en application maintenant | The (1) proposal (2) will (4) not (3, 5) now (9) be implemented (6, 7, 8)
The numbers after each target word are the positions of the source words it is aligned to; "be" has no numbers because it maps to no source word.
Translation model
Alignment model consists of:
• fertility model (fertility = number of source words each target word is mapped to)
• term-translation model
• distortion model
Translation model (from Brown et al. paper)
Need to calculate P(alignment), that is:
P [The (1) proposal (2) will (4) not (3, 5) now (9) be implemented (6, 7, 8) | Les propositions ne seront pas mises en application maintenant]
To calculate this, we need:
• Fertility model: P(fertility = n | term) for each n (up to a maximum value) and each target term
• Term-translation model: P(termS | termT), the probability that termS appears in the source given that termT appears in the target
• Distortion model: one simple version is to assume that the position of a target term depends only on the position of the source term and the length of the target sentence, giving P(i | j, L) for each target position i, source position j, and target length L (limited to some maximum value for L)
Translation model (from Brown et al. paper), example:
P [The (1) proposal (2) will (4) not (3, 5) now (9) be implemented (6, 7, 8) | Les propositions ne seront pas mises en application maintenant]
= P(fertility=1 | the) × P(les | the) × P(1 | 1, 7)
× P(fertility=1 | proposal) × P(propositions | proposal) × P(2 | 2, 7)
× P(fertility=1 | will) × P(seront | will) × P(3 | 4, 7)
× P(fertility=2 | not) × P(ne | not) × P(pas | not) × P(4 | 3, 7) × P(4 | 5, 7)
× etc.
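The product above can be computed mechanically once the three tables exist. The sketch below covers only the first four target terms shown in the example (the "etc." factors are left out), and every probability value in the tables is invented for illustration; the function name `alignment_prob` and the table layout are likewise assumptions.

```python
import math

SOURCE = "les propositions ne seront pas mises en application maintenant".split()
TARGET_LENGTH = 7  # "The proposal will not now be implemented"

# (target term, target position i, aligned source positions j), 1-based.
ALIGNED_TERMS = [("the", 1, [1]), ("proposal", 2, [2]),
                 ("will", 3, [4]), ("not", 4, [3, 5])]

# Hypothetical probability tables; the values are made up.
FERTILITY = {(1, "the"): 0.9, (1, "proposal"): 0.8,
             (1, "will"): 0.7, (2, "not"): 0.6}              # P(fertility=n | term)
TRANSLATION = {("les", "the"): 0.3, ("propositions", "proposal"): 0.5,
               ("seront", "will"): 0.2, ("ne", "not"): 0.4,
               ("pas", "not"): 0.4}                          # P(termS | termT)
DISTORTION = {(1, 1, 7): 0.5, (2, 2, 7): 0.5, (3, 4, 7): 0.3,
              (4, 3, 7): 0.3, (4, 5, 7): 0.3}                # P(i | j, L)

def alignment_prob(aligned_terms, source, target_length):
    """Multiply fertility, term-translation, and distortion factors."""
    p = 1.0
    for term, i, positions in aligned_terms:
        p *= FERTILITY[(len(positions), term)]
        for j in positions:
            p *= TRANSLATION[(source[j - 1], term)]
            p *= DISTORTION[(i, j, target_length)]
    return p

partial = alignment_prob(ALIGNED_TERMS, SOURCE, TARGET_LENGTH)
print(partial)
```

A term with fertility 0, such as "be" in the example, would contribute only its fertility factor, since the inner loop over source positions is empty.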
How does the statistical translator learn these various models? From data, of course! E.g., massive amounts of paired source/target sentences from UN translations.
How does the statistical translator search the database for the highest-probability source sentence? See the paper.
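To give a feel for learning from paired sentences, here is a heavily simplified sketch that estimates only the term-translation probabilities P(termS | termT) with expectation-maximization, in the style usually called IBM Model 1. It ignores fertility and distortion entirely, so it is not the full model described above; the three-sentence corpus and the function name `train_term_translation` are made up for illustration.

```python
from collections import defaultdict

def train_term_translation(pairs, iterations=20):
    """Estimate t[(s, e)] = P(source term s | target term e) by EM.

    pairs is a list of (source_words, target_words) sentence pairs.
    """
    source_vocab = {s for src, _ in pairs for s in src}
    t = defaultdict(lambda: 1.0 / len(source_vocab))   # uniform start
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for src, tgt in pairs:
            for s in src:
                # E-step: share one unit of credit for s among the target terms.
                norm = sum(t[(s, e)] for e in tgt)
                for e in tgt:
                    frac = t[(s, e)] / norm
                    count[(s, e)] += frac
                    total[e] += frac
        # M-step: renormalize the expected counts per target term.
        t = defaultdict(float, {(s, e): c / total[e] for (s, e), c in count.items()})
    return t

corpus = [
    ("la maison".split(), "the house".split()),
    ("la fleur".split(), "the flower".split()),
    ("une fleur".split(), "a flower".split()),
]
t = train_term_translation(corpus)
print(t[("maison", "house")] > t[("la", "house")])
```

No word-level alignments are given in the corpus; the estimates sharpen only because "la" co-occurs with "the" in two sentence pairs and "fleur" with "flower" in two, which gradually explains away the competing pairings.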