Machine Translation: The Translator's Choice

Presentation Transcript


  1. Machine Translation: The Translator's Choice • Heidi Düchting, Sylke Krämer, Johann Roturier

  2. Outline • Background • Challenges • Solutions • Benefits • Next steps • Conclusions

  3. Commercial Imperatives • Effective: time-critical documents in volume • Efficient: translation process automation, combining translation technologies (workflow; translation memory (TM), machine translation (MT), and post-editing (PE) tools) • Control: loose writing guidelines vs. Controlled Language (CL) rules, for improved machine translatability

  4. Commercial Systems • Combine technologies • TM containing previously machine-translated and post-edited segments for look-up • TM systems with an MT component: rule-based and example-based; pre-translate phase; towards improved post-editing efficiency?; not available in all systems • MT systems with a TM component: 100% match look-up

  5. Challenges • Setting a threshold for TM matches • 100% matches only: suitable when the objective is to provide MT output for gisting (no post-editing), or when the MT system is fully customized and a CL environment is in place (no post-editing?) • Quick PE: new sentences in which only one character changes are sent to the MT engine, e.g.: • W32.Beagle.AB is a mass-mailing worm that neither propagates via network shares nor deletes files • W32.Beagle.AC is a mass-mailing worm that neither propagates via network shares nor deletes files
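The one-character problem can be made concrete with a small sketch. It uses Python's `difflib.SequenceMatcher.ratio()` as a stand-in for a TM fuzzy-match score (Trados computes its own score, which this does not reproduce). The `route` helper and the dict-based TM are hypothetical. With a 100% threshold the W32.Beagle.AC sentence goes to MT even though it is about 99% similar to the stored W32.Beagle.AB sentence.

```python
from difflib import SequenceMatcher


def fuzzy_score(a: str, b: str) -> float:
    """Similarity in [0, 1]; difflib's ratio stands in for a real TM match score."""
    return SequenceMatcher(None, a, b).ratio()


def route(segment: str, tm: dict[str, str], threshold: float = 1.0) -> str:
    """Hypothetical router: 'TM' if any stored source scores >= threshold, else 'MT'."""
    best = max((fuzzy_score(segment, src) for src in tm), default=0.0)
    return "TM" if best >= threshold else "MT"


stored = ("W32.Beagle.AB is a mass-mailing worm that neither propagates "
          "via network shares nor deletes files")
new = ("W32.Beagle.AC is a mass-mailing worm that neither propagates "
       "via network shares nor deletes files")
tm = {stored: "<stored German translation>"}

print(round(fuzzy_score(stored, new), 2))  # 0.99
print(route(new, tm))                      # MT  (100% matches only)
print(route(new, tm, threshold=0.95))      # TM  (relaxed threshold)
```

Lowering the threshold lets the near-identical sentence reuse the stored translation for a quick post-edit. The trade-off is that fuzzy matches then need human checking.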

  6. Solutions (1) • Two-tier process: leverage the Trados TM repository; use an MT system (Systran Premium 5.0) to translate unknown segments; use the MT output as TM input • Determine the export threshold • Existing TM segments vs. new controlled segments, e.g.: • Uncontrolled: Symantec announced a patch was available • CL: Symantec announced that a patch was available
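The controlled/uncontrolled pair above suggests a CL rule of the form "a reporting verb that introduces a clause must be followed by *that*". The sketch below is a rough checker for such a rule. The rule wording, the verb list, and the function name are assumptions for illustration. It is a surface heuristic and will over-flag cases such as "announced yesterday that…".

```python
import re

# Hypothetical CL rule: reporting verbs must be followed directly by "that".
# The verb list is an illustrative assumption, not the authors' rule set.
REPORTING_VERBS = ("announced", "reported", "stated", "confirmed")
MISSING_THAT = re.compile(
    r"\b(" + "|".join(REPORTING_VERBS) + r")\s+(?!that\b)(?=\w)",
    re.IGNORECASE,
)


def flag_missing_that(sentence: str) -> list[str]:
    """Return the reporting verbs in the sentence that are not followed by 'that'."""
    return [m.group(1) for m in MISSING_THAT.finditer(sentence)]


print(flag_missing_that("Symantec announced a patch was available"))       # ['announced']
print(flag_missing_that("Symantec announced that a patch was available"))  # []
```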

  7. Solutions (2) • TMX format: the obvious choice as the exchange format (XLIFF is not supported by all MT systems) • Stores source and target segments:

```xml
<tu usagecount="1" creationdate="20050301T122255Z" creationid="SUPER">
  <tuv lang="EN-US">
    <seg>Then the worm searches all local and network drives for .gif, .bmp, and .wav files.</seg>
  </tuv>
  <tuv lang="DE-DE">
    <seg>Then the worm searches all local and network drives for .gif, .bmp, and .wav files.</seg>
  </tuv>
</tu>
```
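A translation unit like the one on this slide can be read with Python's standard `xml.etree.ElementTree`. This is a minimal sketch, not the authors' tooling. It accepts both the plain `lang` attribute used in the slide (earlier TMX versions) and the `xml:lang` attribute of TMX 1.4. The helper name `segments_by_lang` is hypothetical. In the slide's unit, the DE-DE segment is still a copy of the English source.

```python
import xml.etree.ElementTree as ET

# Attribute key ElementTree uses for xml:lang (the xml prefix is predefined).
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

TU = """<tu usagecount="1" creationdate="20050301T122255Z" creationid="SUPER">
  <tuv lang="EN-US"><seg>Then the worm searches all local and network drives for .gif, .bmp, and .wav files.</seg></tuv>
  <tuv lang="DE-DE"><seg>Then the worm searches all local and network drives for .gif, .bmp, and .wav files.</seg></tuv>
</tu>"""


def segments_by_lang(tu: ET.Element) -> dict[str, str]:
    """Map each <tuv> language code (upper-cased) to the text of its <seg>."""
    result = {}
    for tuv in tu.iter("tuv"):
        lang = tuv.get(XML_LANG) or tuv.get("lang")
        seg = tuv.find("seg")
        if lang and seg is not None:
            result[lang.upper()] = "".join(seg.itertext())
    return result


segs = segments_by_lang(ET.fromstring(TU))
print(segs["EN-US"] == segs["DE-DE"])  # True: target is still a copy of the source
```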

  8. Processing TMX • Technical issues: • TMX's various implementations can create discrepancies during the exchange process • Identical source and target segments • XML parser and TMX header • Pre- and post-processing with a single macro: modules to remove and restore sections • Environment: VBA
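The slide's macro was written in VBA. The following is a Python sketch of the same remove-and-restore idea, assuming the section to set aside is the TMX `<header>` element, for example to keep an MT system's XML parser away from it. The placeholder comment and the function names are hypothetical. A regular expression is used so that the rest of the file passes through byte-for-byte unchanged.

```python
import re

# Matches a self-closing <header .../> or a <header ...>...</header> element.
HEADER_RE = re.compile(r"<header\b[^>]*?(?:/>|>.*?</header>)", re.DOTALL)
PLACEHOLDER = "<!--TMX-HEADER-->"  # hypothetical marker left in place of the header


def preprocess(tmx: str) -> tuple[str, str]:
    """Remove the first <header> element; return (stripped text, removed header)."""
    match = HEADER_RE.search(tmx)
    if match is None:
        return tmx, ""
    return tmx[:match.start()] + PLACEHOLDER + tmx[match.end():], match.group(0)


def postprocess(tmx: str, header: str) -> str:
    """Restore the saved header where the placeholder was left."""
    return tmx.replace(PLACEHOLDER, header, 1)


sample = '<tmx version="1.4"><header srclang="EN-US" segtype="sentence"/><body></body></tmx>'
stripped, saved = preprocess(sample)
print(stripped)                                # <tmx version="1.4"><!--TMX-HEADER--><body></body></tmx>
print(postprocess(stripped, saved) == sample)  # True
```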

  9. Pre-translation Workflow • Step 1: Analyze the new document • Step 2: Export unmatched segments • Step 3: Pre-processing module • Step 4: Call to the MT system • Step 5: Post-processing module • Step 6: Import segments into the TM
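The six steps can be sketched as a single function. This is a simplified model under stated assumptions. The TM is a plain dict that supports only exact (100%) look-up. The pre-processing, MT, and post-processing steps are passed in as callables. The name `pretranslate` is hypothetical and does not describe a Trados or Systran API.

```python
from typing import Callable


def pretranslate(
    document: list[str],
    tm: dict[str, str],
    preprocess: Callable[[str], str],
    machine_translate: Callable[[str], str],
    postprocess: Callable[[str], str],
) -> list[str]:
    """Hypothetical six-step pre-translation run; returns the segments sent to MT."""
    # Steps 1-2: analyze the document and export segments with no 100% TM match,
    # de-duplicated so each one is machine translated only once.
    unmatched = [seg for seg in dict.fromkeys(document) if seg not in tm]
    for seg in unmatched:
        # Steps 3-5: pre-processing module, call to the MT system, post-processing module.
        target = postprocess(machine_translate(preprocess(seg)))
        # Step 6: import the new segment pair into the TM.
        tm[seg] = target
    return unmatched


# Usage with a fake MT engine that only tags its input.
tm = {"Click OK.": "Klicken Sie auf OK."}
doc = ["Click OK.", "Restart the computer.", "Restart the computer."]
added = pretranslate(doc, tm, str.strip, lambda s: "[MT] " + s, str.strip)
print(added)                        # ['Restart the computer.']
print(tm["Restart the computer."])  # [MT] Restart the computer.
```

Because the MT output is written back into the TM, a later document containing the same sentence gets a TM match instead of a second MT call.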

  10. Effective pre-translation • Efficiency and robustness • Refinable: opportunity to modify target segments • CL environment: predictability of frequent errors • Ideal scenario: address problems that could not be fixed with CL rules

  11. Towards Automated Post-Editing • Surface post-editing: no linguistic analysis (no second MT pass), only text processing • Targets frequent errors caused by default MT settings • Removes drudgery from post-editing • Lexical errors (MT output vs. correct form): • Capitalization (folgende vs. Folgende) • Incorrect spelling (neuzustarten vs. neu zu starten) • Missing contractions (à le vs. au) • Extra words (fichier de .bmp vs. fichier .bmp)
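Surface post-editing of these lexical errors amounts to an ordered list of find-and-replace rules applied to each MT segment. The sketch below encodes the four examples from the slide. The contexts it assumes are illustrations, not the authors' rule set: capitalization only at segment start, whole-word matches, and "de" dropped only before a file extension.

```python
import re

# Hypothetical rules derived from the slide's examples (MT output -> correct form).
LEXICAL_RULES = [
    (re.compile(r"^folgende\b"), "Folgende"),              # capitalization at segment start
    (re.compile(r"\bneuzustarten\b"), "neu zu starten"),   # incorrect spelling
    (re.compile(r"\bà le\b"), "au"),                       # missing contraction
    (re.compile(r"\bfichier de (\.\w+)"), r"fichier \1"),  # extra word before an extension
]


def apply_rules(segment: str, rules=LEXICAL_RULES) -> str:
    """Apply each (pattern, replacement) pair to the segment, in order."""
    for pattern, replacement in rules:
        segment = pattern.sub(replacement, segment)
    return segment


print(apply_rules("folgende Dateien wurden gelöscht."))
# Folgende Dateien wurden gelöscht.
print(apply_rules("Ouvrez le fichier de .bmp et passez à le dossier suivant."))
# Ouvrez le fichier .bmp et passez au dossier suivant.
```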

  12. Towards Automated Post-Editing • Syntactic errors: • Word order (“Klicken auf Sie” vs. “Klicken Sie auf”) • Wrong structures (transfer or generation issue): neither…nor rendered as “ni ne… ni ne” • Textual errors: • Formatting: trailing spaces after symbols such as backslashes • Punctuation inconsistent with the style guide: inverted commas for German
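The two textual error types lend themselves to simple regular-expression fixes. The sketch below assumes two things. First, a space that directly follows a backslash is always spurious, as in paths. Second, straight double quotes in German output should become „…“. Both are heuristics, and the function names are hypothetical. A path segment that really ends in a backslash followed by running text would be joined incorrectly.

```python
import re


def fix_backslash_spacing(segment: str) -> str:
    """Delete spaces inserted directly after a backslash (e.g. 'C:\\ Windows')."""
    # In the replacement template, r"\\" stands for one literal backslash.
    return re.sub(r"\\ +(?=\S)", r"\\", segment)


def german_quotes(segment: str) -> str:
    """Replace pairs of straight double quotes with German „…“ quotation marks."""
    return re.sub(r'"([^"]*)"', lambda m: f"„{m.group(1)}“", segment)


print(fix_backslash_spacing(r"Öffnen Sie C:\ Windows\ System32."))
# Öffnen Sie C:\Windows\System32.
print(german_quotes('Klicken Sie auf "Weiter".'))
# Klicken Sie auf „Weiter“.
```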

  13. Towards Automated Post-Editing • Suitability of the environment: regular expression (RE) support • REs are a ‘way to describe text through pattern matching’ (Stubblebine 2003: 1) • Grouping and capturing: • Match: ([Kk]licken) (auf) (Sie) • Replace: \1 \3 \2
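The slide's match/replace pair can be run directly with Python's `re` module, which accepts the same `\1 \3 \2` back-reference notation in the replacement string. This is an illustration of the technique, not the authors' VBA environment. The function name is hypothetical.

```python
import re

# Three capturing groups, written back in the order 1, 3, 2.
WORD_ORDER = re.compile(r"([Kk]licken) (auf) (Sie)")


def fix_word_order(segment: str) -> str:
    """Turn 'Klicken auf Sie' into 'Klicken Sie auf' (and the lower-case variant)."""
    return WORD_ORDER.sub(r"\1 \3 \2", segment)


print(fix_word_order("Klicken auf Sie OK."))  # Klicken Sie auf OK.
```

Capturing the verb as a group preserves its original capitalization, so the rule covers both sentence-initial "Klicken" and mid-sentence "klicken".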

  14. Content workflow

  15. Next steps • New environment: GMS integration, with a centralized interface with content and a transport layer • MT as a plug-in • XLIFF format to machine translate unmatched segments • PE replacements: fine-tune contextual replacements
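Moving to XLIFF means that the "unmatched segments" become trans-units without a usable target. This is a minimal sketch for XLIFF 1.2 using the standard library. It assumes that a missing or empty `<target>` marks a segment to send to MT. The function name and the sample file are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


def untranslated_units(xliff: str) -> list[tuple[str, str]]:
    """Return (id, source text) for trans-units with a missing or empty <target>."""
    root = ET.fromstring(xliff)
    result = []
    for tu in root.iterfind(".//x:trans-unit", NS):
        target = tu.find("x:target", NS)
        if target is None or not "".join(target.itertext()).strip():
            source = tu.find("x:source", NS)  # <source> is required in XLIFF 1.2
            result.append((tu.get("id"), "".join(source.itertext())))
    return result


SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file source-language="en-US" target-language="de-DE" datatype="plaintext" original="example.txt">
    <body>
      <trans-unit id="1"><source>Click OK.</source><target>Klicken Sie auf OK.</target></trans-unit>
      <trans-unit id="2"><source>Restart the computer.</source></trans-unit>
    </body>
  </file>
</xliff>"""

print(untranslated_units(SAMPLE))  # [('2', 'Restart the computer.')]
```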

  16. Conclusions • Combining MT and TM is efficient: leverage; post-editing is not repeated; increased throughput • Environment for avoiding errors: facilitated when CL rules are introduced, which reduces the scope of errors • New opportunities for translators: fine-tuning MT user dictionaries; refining automated PE tasks

  17. Thank You johann_roturier@symantec.com