
Rebecca Hwa hwa@cs.pitt.edu



Presentation Transcript


  1. Information visualization and its applications to machine translation Rebecca Hwa hwa@cs.pitt.edu

  2. Information visualization (infovis) “Information visualization is the use of computer-supported interactive visual representations of abstract data to amplify cognition.” [Card,1999]

  3. Example: movie narration chart [http://xkcd.com/657]

  4. Functions of visualization [Collins & Carpendale] • Recording information • Tables, maps, blueprints • Processing and presenting information • Share, collaborate, revise • Through feedback and interaction • Seeing the unseen

  5. Why might infovis help improve NLP? “Complexity brings externalization” [Collins, 2008] Slide: courtesy of Collins’s tutorial

  6. How might infovis help to improve NLP? • Identify activities on which it is natural for users and machines to collaborate. • Design applications that encourage interactivity. • Use user interactions as diagnostic information.

  7. Outline • Introduction • Infovis • Machine Translation • Infovis for MT correction • Improving MT • Future Directions

  8. Machine Translation (MT) • Transform a sentence from one language (source) to another (target) while preserving the (literal) meaning • Many approaches [cf. survey by Lopez, 2007] • Example-based MT • Statistical phrase-based MT • (Statistical) Syntax-driven MT • …

  9. Sample output [from Chinese-English Google-MT] “He is being discovered almost hit an arm in the pile of books on the desktop, just like frightened horse as a Lieju Wangbangbian almost Pengfan the piano stool.”

  10. Dealing with translation errors • We have a better language model. • “just like a frightened horse, he …” • We have common sense to help with decoding. • “Cannot find an arm in a pile of books” • He discovered an arm under the books • It was his arm that hit the books • Unknown translations complicate the matter. • “as Lieju Wangbangbian almost Pengfan the piano stool”

  11. Human Computer Collaboration Compensate for each other’s weaknesses • Monolingual speaker: • Doesn’t understand the source language • May not know much about MT/NLP • Can be overwhelmed by too much information • MT: simple language model; no common sense • Collaboration via a graphical interface: • Visualization of multiple NLP resources • Interactive exploration of MT outputs

  12. Research Questions • To what extent can a collaborative approach help monolingual users understand the source? • What resources do the users find informative? • How is the user’s understanding impacted by: • User’s background • Genre of the source text • Quality of the MT output • How will these answers help improve MT?

  13. Outline • Introduction • Infovis • Machine Translation • Infovis for MT correction • The Chinese Room [with J. Albrecht and G.E. Marai] • System overview • A small user study • Discussions • Improving MT • Future Directions

  14. Interface Design Considerations • Can monolingual users be good “translators”? • Provide resources that MT may use • Too much information may confuse users • Limit to a handful of resources • NLP tools are imperfect • Visualization should not hide conflicting analyses between different resources

  15. System Overview • Google MT API • Word alignments • N-best phrasal re-translation • Syntactic parser • Stanford parser [Klein & Manning, 2003] • Bilingual dictionary • Example phrase search • IR over a large source corpus and a parallel corpus [Lemur]

  16. A short video demo

  17. Experimental Methodology • 8 non-Chinese speakers • 4 short passages of ~10 sentences each • Two news articles • Two passages from a novel • Latin Square • Each person corrects the four passages under alternating conditions: • Full interactive system • Document view only • Corrections are judged by 2 bilingual speakers
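The counterbalanced design above can be sketched in code. The passage names, condition labels, and rotation scheme below are illustrative assumptions, not the study's exact assignment:

```python
# Hypothetical sketch of a Latin-square-style assignment: 8 users, 4 passages,
# 2 conditions (full interactive system vs. document view only). Passage order
# is rotated per user and conditions alternate, so every passage is corrected
# under both conditions across the user pool.
passages = ["news1", "news2", "novel1", "novel2"]
conditions = ["full", "doc-only"]

def build_schedule(n_users, passages, conditions):
    schedule = {}
    for u in range(n_users):
        shift = u % len(passages)
        order = passages[shift:] + passages[:shift]        # rotate passage order
        conds = [conditions[(u + j) % len(conditions)]     # alternate conditions
                 for j in range(len(order))]
        schedule[f"user{u + 1}"] = list(zip(order, conds))
    return schedule

schedule = build_schedule(8, passages, conditions)
```

Each user then corrects every passage exactly once, half the passages under each condition.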

  18. Translation Correction Procedure • First session: 20 minutes of system tutorial • Four sessions: one passage per sitting • Can work on sentences in arbitrary order • Can “pass” on any part of a sentence • Final commentary • Summarize the four articles • Qualitative feedback on the experience • Suggestions for improving the prototype

  19. Translation Evaluation Procedure • Given: • Source sentence • Reference translation • Bilingual judges evaluate: • Original MT • 8 corrected translations • An alternative reference translation • Emphasis on adequacy • The judges’ scores are normalized to reduce variance [Blatz et al., 2003]
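The per-judge score normalization cited above [Blatz et al., 2003] is essentially a z-score transform, so a strict and a lenient judge become comparable. A minimal sketch, with illustrative function and judge names:

```python
from statistics import mean, pstdev

def normalize_judge_scores(scores_by_judge):
    """Rescale each judge's raw adequacy scores to zero mean and unit
    variance, removing per-judge bias before averaging across judges."""
    normalized = {}
    for judge, scores in scores_by_judge.items():
        mu, sigma = mean(scores), pstdev(scores)
        normalized[judge] = [(s - mu) / sigma if sigma > 0 else 0.0
                             for s in scores]
    return normalized
```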

  20. Experimental Hypotheses • The visualization interface will help users recover from more translation errors • The quality of correction is positively correlated with the quality of the initial MT • Users exposed to NLP or other foreign languages may better exploit the interface • Users may develop different correction strategies, preferring different resources

  21. Quality of Translation Correction Italics indicate numbers that are not statistically significant. Overall, users made better corrections using the full visualization interface, but users still improved translations by directly editing MT outputs.

  22. Quality of Translation Correction Italics indicate numbers that are not statistically significant. The relationship between the quality of the original MT and that of the correction is more nuanced than simple correlation.

  23. Time Spent (average seconds per sentence)

  24. Impact of User Backgrounds • Prior exposure to NLP • Not significantly better at using the full system • Better at correcting errors directly from MT outputs • Prior domain knowledge • Knowledge about basketball helped user5 and user6 on the sports news article

  25. Resource Usage

  26. Discussion – unrecovered errors • Corroborative errors from multiple sources • 美 is interpreted as “U.S.” rather than “beauty” by many NLP applications • Reference: He is responsive to beauty. • MT: He sensitive to the United States. • User corrected: He liked America. • Transliterated foreign names • Reference: Bandai’s spokeswoman Kasumi Nakanishi. • MT: Bandai Group, a spokeswoman for the U.S. to be SIN-West • User corrected: West, a spokeswoman for the U.S. Toy Manufacturing Group, and soon to be Vice President

  27. Discussion • In a group of users, a small percentage may succeed in fixing these translation errors. • Could multiple users work together as a group?

  28. Related Work • For monolingual speakers • [Hu, Resnik, and Bederson, 2009] • For MT researchers • DerivTool [DeNeefe, Knight, and Chan, 2005] • Linear B [Callison-Burch, 2005] • For computer-aided human translators • TransType [Langlais, Foster, and Lapalme, 2000] • TransSearch [Macklovitch, Simard, and Langlais, 2000]

  29. Take-away messages • Collaboration between MT and monolingual target-language speakers can lead to an overall improved translation. • Better language modeling and decoding supported by a standard translation model may still improve translation • Domain coherence is an important factor • Syntactic constraints may be helpful

  30. Outline • Introduction • Infovis for MT correction • Improving MT • Language model adaptation for “difficult” phrases [with B. Mohit and F. Liberato] • Prototypes of short phrases • Future Directions

  31. Difficult to Translate Phrases (DTP) • A common strategy for users of The Chinese Room • Work on small chunks of bad translations in isolation • Apply multiple strategies to make sense of each phrase • Related previous work: automatically identifying “difficult to translate phrases” (DTPs) [Mohit & Hwa, 2007] • Phrases that MT is likely to get wrong • Missing crucial word/phrasal translation pairs • Complex structures in source phrases • Bad language model/decoder interactions

  32. Research Questions • Should the DTPs be processed differently? • Hypothesis: a general-purpose language model (LM) may not be well suited for DTPs. • Approach: adapt a special LM for each DTP • Will better DTP handling lead to overall translation improvement? • What if a phrase is mis-classified as a DTP?

  33. Adapt Language Models for DTPs • Train one LM for each DTP • Identify a subset of sentences in the (bilingual) training corpus whose source side is similar to the DTP in question. • Use the target side as training data for the LM. • When decoding, use the adapted LM for the DTP and the standard LM for the rest of the sentence • Related work: adapting the LM for each test set [Kim & Khudanpur, 2004; Tam et al., 2007; Zhang, 2008; Snover et al., 2008]
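The adaptation recipe above can be sketched in two steps: retrieve training pairs whose source side resembles the DTP, then train an LM on their target sides. The Jaccard similarity and the unsmoothed bigram LM below are simplifying assumptions for illustration; a real system would use a retrieval engine and a smoothed higher-order LM:

```python
from collections import Counter

def retrieve_similar(dtp_tokens, bitext, k=100):
    """Rank (source, target) training pairs by Jaccard word overlap between
    the source side and the DTP; keep the top k."""
    dtp = set(dtp_tokens)
    def sim(pair):
        src = set(pair[0])
        return len(dtp & src) / len(dtp | src)
    return sorted(bitext, key=sim, reverse=True)[:k]

def train_bigram_lm(target_sentences):
    """Count-based bigram LM over the retrieved target sides (no smoothing)."""
    bigrams, contexts = Counter(), Counter()
    for sent in target_sentences:
        toks = ["<s>"] + list(sent) + ["</s>"]
        for w1, w2 in zip(toks, toks[1:]):
            bigrams[(w1, w2)] += 1
            contexts[w1] += 1
    return lambda w1, w2: bigrams[(w1, w2)] / contexts[w1] if contexts[w1] else 0.0
```

At decoding time this per-DTP model would score only the DTP span, with the standard LM covering the rest of the sentence.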

  34. Experimental Setup • Arabic-English Phrase-based MT • Control for translation model sizes: • Smaller TM: Trained on 1M words • Larger TM: Trained on 50M words • LMs under comparison: • Adapted LM • Estimated upper-bound for adapted LM • Baseline LM: English side of the parallel corpus • Larger LM: monolingual English corpus • Evaluation metric: BLEU
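Since BLEU is the evaluation metric here, a minimal sentence-level sketch shows how it combines clipped n-gram precision with a brevity penalty. Real evaluations use corpus-level BLEU with smoothing and multiple references; this toy single-reference version is only illustrative:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(zip(*[tokens[i:] for i in range(n)]))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of clipped n-gram precisions,
    scaled by a brevity penalty for short candidates (no smoothing)."""
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        if overlap == 0:
            return 0.0          # unsmoothed: any zero precision zeroes the score
        log_prec += math.log(overlap / total) / max_n
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(log_prec)
```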

  35. When translating DTPs [charts: Smaller TM, Larger TM] • It’s better to use the adapted LMs than the baseline LM • Using adapted LMs is comparable to using a much larger general-purpose LM • The upper bound suggests there’s still room for improvement

  36. When translating “easy” phrases • Adapted LM about the same as the baseline • A larger general-purpose LM doesn’t help. • The estimated upper-bound improvement is smaller

  37. Overall performance • DTP classification has an accuracy of ~75%. • Adapted LM still helps the overall performance, resulting in ~+1 BLEU score.

  38. Outline • Introduction • Infovis for MT correction • Improving MT • Language model adaptation for “difficult” phrases • Prototypes of short phrases [with F. Liberato and B. Mohit] • Future Directions

  39. Dealing with unknown phrases • Users of The Chinese Room tried to combine individual word lookups in some sensible way • Can we augment the translation phrase table by generating phrasal translations for unknown source phrases during test? • Working on shorter phrases in isolation means we can use more complex translation and decoding methods

  40. Phrasal prototype • “Backed-off” version of phrasal translations • e.g. as a mix of surface words and part-of-speech patterns: NN al JJ ↔ NN NN • Can be scored like phrasal translations • Keep only the more likely prototypes • For a source phrase at test time that: • matches a source prototype • is not in the phrase table • generate a target phrase based on the target prototype and a word-to-word dictionary
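A toy sketch of the back-off and generation steps above, loosely following the slide's `NN al JJ ↔ NN NN` pattern. The function-word set, alignment format, dictionary, and Arabic-like example words are all illustrative assumptions:

```python
FUNCTION_WORDS = {"al"}  # hypothetical closed-class words kept as surface forms

def make_prototype(tokens, tags):
    """Back a phrase off to a mix of surface function words and POS tags."""
    return tuple(t if t in FUNCTION_WORDS else g for t, g in zip(tokens, tags))

def generate_translation(src_tokens, src_tags, proto_table, dictionary):
    """If the source phrase matches a stored source prototype, realize the
    target prototype: each aligned target slot is filled by dictionary-
    translating its source word; unaligned symbols are copied through."""
    src_proto = make_prototype(src_tokens, src_tags)
    for src_p, tgt_p, align in proto_table:   # align: target pos -> source pos
        if src_p == src_proto:
            out = []
            for j, sym in enumerate(tgt_p):
                if j in align:
                    src_word = src_tokens[align[j]]
                    out.append(dictionary.get(src_word, src_word))
                else:
                    out.append(sym)
            return out
    return None                               # no prototype matched
```

For example, with the prototype pair `("NN","al","JJ") ↔ ("NN","NN")` and a word dictionary, a matching unseen source phrase yields a generated target phrase that can be added to the phrase table.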

  41. Pilot experiment and results • POS prototypes • For training prototypes, we don’t need a large parallel corpus. • Many generated phrases are useless, but they don’t degrade performance

  42. Summary • Infovis for MT – prototype design has to satisfy two groups • Help users accomplish their objectives • Define interactions that will help researchers • Users identified opportunities to improve MT • We created specialized LMs for difficult-to-translate phrases to approximate domain coherence. • We applied syntactic constraints to generate more potential translations for unknown phrases.

  43. Information visualization and its applications to machine translation Rebecca Hwa hwa@cs.pitt.edu
