
  1. Introduction to Natural Language Processing (600.465)
Statistical Translation: Alignment and Parameter Estimation
Dr. Jan Hajič, CS Dept., Johns Hopkins Univ.
hajic@cs.jhu.edu, www.cs.jhu.edu/~hajic

  2. Alignment
• Available corpus assumed:
  • parallel text (translation E ↔ F)
  • no alignment present (day marks only)!
• Sentence alignment
  • sentence detection
  • sentence alignment
• Word alignment
  • tokenization
  • word alignment (with restrictions)

  3. Sentence Boundary Detection
• Rules, lists:
  • Sentence breaks:
    • paragraphs (if marked)
    • certain characters: ?, !, ; (...almost sure)
  • The problem: the period .
    • could be the end of a sentence (... left yesterday. He was heading to ...)
    • decimal point: 3.6 (three point six)
    • thousands separator: 3.200 (three thousand two hundred)
    • abbreviations that never end a sentence: cf., e.g., Calif., Mt., Mr.
    • ellipsis: ...
    • other languages: ordinal number indication (2nd ~ 2.)
    • initials: A. B. Smith
• Statistical methods: e.g., Maximum Entropy
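A toy rule-based detector along these lines, in Python; the abbreviation list and regexes are illustrative stand-ins, and a real system (or the Maximum Entropy approach mentioned) would need much broader coverage:

```python
import re

# Abbreviations that never end a sentence (illustrative subset from the slide).
ABBREV = {"cf.", "e.g.", "Calif.", "Mt.", "Mr."}

def detect_sentences(text):
    """Toy rule-based sentence boundary detector in the spirit of slide 3."""
    tokens = text.split()
    sentences, current = [], []
    for i, tok in enumerate(tokens):
        current.append(tok)
        if tok[-1] in "?!;":                        # almost-sure breaks
            sentences.append(" ".join(current))
            current = []
        elif tok.endswith("."):
            if tok in ABBREV:                       # abbreviation: no break
                continue
            if re.fullmatch(r"\d+\.\d*", tok):      # decimal point or ordinal (2.)
                continue
            if re.fullmatch(r"[A-Z]\.", tok):       # initials: A. B. Smith
                continue
            # Break only if the next token looks like a sentence start.
            if i + 1 == len(tokens) or tokens[i + 1][0].isupper():
                sentences.append(" ".join(current))
                current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

print(detect_sentences("He left yesterday. He was heading to Calif. to see Mr. A. B. Smith."))
# ['He left yesterday.', 'He was heading to Calif. to see Mr. A. B. Smith.']
```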

  4. Sentence Alignment
• The problem: sentences detected only:
  [diagram: unaligned sentence boundaries in E and F]
• Desired output: a segmentation with an equal number of segments, continuously spanning the whole text.
• Original sentence boundaries kept:
  [diagram: the same texts partitioned into aligned segments]
• Alignments obtained: 2-1, 1-1, 1-1, 2-2, 2-1, 0-1
• New segments are called "sentences" from now on.
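As a concrete illustration (not from the slides), the alignment just listed can be stored as a sequence of bead sizes:

```python
# The slide's example output as (E-sentences, F-sentences) per bead;
# 0-1 is an F sentence with no English counterpart.
beads = [(2, 1), (1, 1), (1, 1), (2, 2), (2, 1), (0, 1)]

# The segment counts must add up to the sentence counts of the two texts.
n_e = sum(p for p, q in beads)   # 8 sentences on the English side
n_f = sum(q for p, q in beads)   # 7 sentences on the French side
```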

  5. Alignment Methods
• Several methods (probabilistic and not):
  • character-length based
  • word-length based
  • "cognates" (word identity used)
    • using an existing dictionary (F: prendre ~ E: make, take)
    • using word "distance" (similarity): names, numbers, borrowed words, Latin-origin words, ...
• Best performing:
  • statistical, word- or character-length based (perhaps with some word cues)

  6. Length-based Alignment
• First, define the problem probabilistically:
  argmaxA P(A|E,F) = argmaxA P(A,E,F) (E,F fixed)
• Define a "bead":
  [diagram: a 2:2 bead, spanning two E sentences and two F sentences]
• Approximate: P(A,E,F) ≈ ∏i=1..n P(Bi), where Bi is a bead; P(Bi) does not depend on the rest of E,F.

  7. The Alignment Task
• Given the model definition, P(A,E,F) ≈ ∏i=1..n P(Bi), find the partitioning of (E,F) into n beads Bi=1..n that maximizes P(A,E,F) over the training data.
• Each bead Bi has a type p:q ∈ {0:1, 1:0, 1:1, 1:2, 2:1, 2:2}
  • describes the type of alignment
• Want to use some sort of dynamic programming:
  • Define Pref(i,j) ... the probability of the best alignment from the start of the (E,F) data (1,1) up to (i,j)

  8. Recursive Definition
• Initialize: Pref(0,0) = 1 (or 0, if working with log probabilities).
• Pref(i,j) = max ( Pref(i,j−1)·P(0:1, ak),
                    Pref(i−1,j)·P(1:0, ak),
                    Pref(i−1,j−1)·P(1:1, ak),
                    Pref(i−1,j−2)·P(1:2, ak),
                    Pref(i−2,j−1)·P(2:1, ak),
                    Pref(i−2,j−2)·P(2:2, ak) )
• This is enough for a Viterbi-like search.
[diagram: grid of Pref cells over E × F around position (i,j), with the six bead-type transitions leading into (i,j)]
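A sketch of this recursion as a Python dynamic program; align_sentences is an assumed name, the bead-probability model is passed in as bead_prob (defined on the next slide), and log probabilities are used, so the initialization is 0:

```python
import math

# Allowed bead types: (p, q) = number of sentences consumed from E and F.
BEAD_TYPES = [(0, 1), (1, 0), (1, 1), (1, 2), (2, 1), (2, 2)]

def align_sentences(sents_e, sents_f, bead_prob):
    """Viterbi-like search over bead sequences (the recursion of slide 8).

    bead_prob(p, q, e_segment, f_segment) -> probability of that bead.
    Works in log space, hence Pref(0,0) = 0.
    """
    I, J = len(sents_e), len(sents_f)
    NEG = float("-inf")
    pref = [[NEG] * (J + 1) for _ in range(I + 1)]
    back = [[None] * (J + 1) for _ in range(I + 1)]
    pref[0][0] = 0.0                              # log 1 = 0
    for i in range(I + 1):
        for j in range(J + 1):
            if pref[i][j] == NEG:
                continue
            for p, q in BEAD_TYPES:               # extend by one more bead
                ni, nj = i + p, j + q
                if ni > I or nj > J:
                    continue
                prob = bead_prob(p, q, sents_e[i:ni], sents_f[j:nj])
                if prob <= 0.0:
                    continue
                score = pref[i][j] + math.log(prob)
                if score > pref[ni][nj]:
                    pref[ni][nj] = score
                    back[ni][nj] = (i, j, p, q)
    # Trace back the best bead sequence from (I, J).
    beads, i, j = [], I, J
    while (i, j) != (0, 0):
        i, j, p, q = back[i][j]
        beads.append((p, q))
    return list(reversed(beads))
```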

  9. Probability of a Bead
• It remains to define P(p:q, ak) (the bead term used in the recursion above):
  • k refers to the "next" bead, with segments of p and q sentences, of lengths lk,e and lk,f.
• Use a normal distribution for the length variation:
  • P(p:q, ak) = P(δ(lk,e, lk,f, μ, σ²), p:q) ≈ P(δ(lk,e, lk,f, μ, σ²)) · P(p:q)
  • δ(lk,e, lk,f, μ, σ²) = (lk,f − μ·lk,e) / √(lk,e·σ²)
• Estimate P(p:q) from a small amount of data, or even guess and re-estimate after aligning some data.
• Words etc. might be used as better clues in the definition of P(p:q, ak).
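A sketch of this length-based bead model; the prior P(p:q) values and the μ, σ² parameters below are illustrative assumptions (Gale & Church-style), and P(δ) is taken as the two-tailed normal tail probability:

```python
import math

# Illustrative prior over bead types (per the slide, it can be guessed and
# re-estimated; these particular numbers are assumptions).
P_TYPE = {(1, 1): 0.89, (1, 0): 0.01, (0, 1): 0.01,
          (2, 1): 0.04, (1, 2): 0.04, (2, 2): 0.01}

# mu = expected F-characters per E-character, sigma2 = variance rate;
# both would be estimated from data (the values here are placeholders).
MU, SIGMA2 = 1.0, 6.8

def bead_prob(p, q, seg_e, seg_f):
    """P(p:q, a_k) ~ P(delta) * P(p:q), following slide 9."""
    l_e = sum(len(s) for s in seg_e) or 1          # character length; avoid /0
    l_f = sum(len(s) for s in seg_f)
    delta = (l_f - MU * l_e) / math.sqrt(l_e * SIGMA2)
    # Two-tailed tail probability of a standard normal:
    # P(delta) = 2 * (1 - Phi(|delta|)) = erfc(|delta| / sqrt(2))
    p_delta = math.erfc(abs(delta) / math.sqrt(2))
    return p_delta * P_TYPE.get((p, q), 0.0)
```

With this in hand, beads = align_sentences(sents_e, sents_f, bead_prob) runs the search from the previous sketch.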

  10. Saving Time
• For long texts (> 10⁴ sentences), even Viterbi (in the version needed) is not effective (O(S²) time).
• Go paragraph by paragraph if they are aligned 1:1.
• What if not?
  • Apply the same method to paragraphs first!
    • identify paragraphs roughly in both languages
    • run the algorithm to get aligned paragraph-like segments
    • then run on sentences within paragraphs
• Performs well if there are not many consecutive 1:0 or 0:1 beads.
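A rough sketch of this paragraph-first strategy, reusing the align_sentences and bead_prob sketches above (the function name and data layout are assumptions):

```python
def align_paragraphs_then_sentences(paras_e, paras_f, bead_prob):
    """Align paragraph-like segments first, then sentences within them.

    paras_e / paras_f: lists of paragraphs, each a list of sentence strings.
    Treating each paragraph as one long 'sentence' reuses the same algorithm.
    """
    # Step 1: run the aligner on paragraphs (one string per paragraph).
    glued_e = [" ".join(p) for p in paras_e]
    glued_f = [" ".join(p) for p in paras_f]
    para_beads = align_sentences(glued_e, glued_f, bead_prob)
    # Step 2: align sentences inside each aligned paragraph chunk.
    result, i, j = [], 0, 0
    for p, q in para_beads:
        chunk_e = [s for para in paras_e[i:i + p] for s in para]
        chunk_f = [s for para in paras_f[j:j + q] for s in para]
        result.extend(align_sentences(chunk_e, chunk_f, bead_prob))
        i, j = i + p, j + q
    return result
```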

  11. Word Alignment
• Length alone does not help anymore:
  • mainly because words can be swapped, and mutual translations often have vastly different lengths.
• ...but at least we have "sentences" (sentence-like segments) aligned; that will be exploited heavily.
• Idea:
  • Assume some (simple) translation model (such as Model 1).
  • Find its parameters by considering virtually all alignments.
  • After we have the parameters, find the best alignment given those parameters.

  12. Word Alignment Algorithm
• Start with a sentence-aligned corpus.
• Let (E,F) be a pair of sentences (actually, a bead).
• Initialize p(f|e) randomly (e.g., uniformly), ∀f ∈ F, ∀e ∈ E.
• Compute expected counts over the corpus:
  c(f,e) = ∑(E,F); e∈E, f∈F p(f|e)
  (∀ aligned pair (E,F): if e is in E and f is in F, add p(f|e))
• Re-estimate: p(f|e) = c(f,e) / c(e) [where c(e) = ∑f c(f,e)]
• Iterate until the change in p(f|e) is small.
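A sketch of this loop in Python, assuming the corpus is a list of token-list pairs; it implements the slide's simplified count (the full Model 1 E-step would normalize each p(f|e) by ∑e′∈E p(f|e′)):

```python
from collections import defaultdict

def estimate_model1(corpus, iterations=10):
    """EM-style estimation of p(f|e) over a sentence-aligned corpus (slide 12).

    corpus: list of (E, F) pairs, each a list of tokens.
    Runs a fixed number of iterations; the slide iterates until the
    change in p(f|e) is small.
    """
    # Initialize p(f|e) uniformly over the F-side vocabulary.
    f_vocab = {f for _, F in corpus for f in F}
    p = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        c = defaultdict(float)      # expected counts c(f, e)
        c_e = defaultdict(float)    # marginals c(e) = sum_f c(f, e)
        for E, F in corpus:
            for e in set(E):
                for f in set(F):
                    # Slide's simplified count: add p(f|e) for the pair.
                    # (Full Model 1 adds p(f|e) / sum_{e' in E} p(f|e').)
                    c[(f, e)] += p[(f, e)]
                    c_e[e] += p[(f, e)]
        # Re-estimate: p(f|e) = c(f,e) / c(e).
        p = defaultdict(float, {fe: c[fe] / c_e[fe[1]] for fe in c})
    return p
```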

  13. Best Alignment
• Select, for each (E,F):
  A = argmaxA P(A|F,E) = argmaxA P(F,A|E)/P(F) = argmaxA P(F,A|E)
    = argmaxA (ε/(l+1)^m · ∏j=1..m p(fj|eaj)) = argmaxA ∏j=1..m p(fj|eaj)
• Again, use dynamic programming, a Viterbi-like algorithm.
• Recompute p(f|e) based on the best alignment
  • (only if you are so inclined; the "original" summed-over-all distribution might perform better).
• Note: we have also obtained all Model 1 parameters.
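Under Model 1 the product above factorizes over positions j, so the Viterbi-like search reduces to an independent maximization per French word; a minimal sketch, assuming the p(f|e) table from the previous sketch:

```python
def best_alignment(E, F, p):
    """argmax_A prod_j p(f_j | e_{a_j}), slide 13 (Model 1).

    E, F: token lists; p: mapping (f, e) -> p(f|e) as estimated above.
    The product factorizes over j, so each f_j independently picks the
    English position with the highest p(f_j|e); returns one a_j per f_j.
    """
    return [max(range(len(E)), key=lambda i: p.get((F[j], E[i]), 0.0))
            for j in range(len(F))]
```

For example, best_alignment(E, F, estimate_model1(corpus)) gives the alignment for one bead.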
