
Statistical Machine Translation with Moses

Hieu Hoang, Localization World 2013.

Presentation Transcript


  1. Statistical Machine Translation with Moses 0.6227 Hieu Hoang Localization World 2013

  2. Agenda • What is Statistical Machine Translation? • What is Moses? • Common misconceptions • Coming up • What can we do for you? Moses by Hieu Hoang, University of Edinburgh

  4. What is Statistical Machine Translation? “It is very tempting to say that a book written in Chinese is simply a book written in English which was coded into the ‘Chinese code.’ If we have useful methods for solving almost any cryptographic problem, may it not be that with proper interpretation we already have useful methods for translation?” (Warren Weaver, 1949)

  5. What is Statistical Machine Translation? • NLP application • search engines, text mining, etc. • Big data • bi-text from the Internet • e.g. multilingual websites, documents • large monolingual data • Learns to translate • from previous translations • models of language

  6. What is Statistical Machine Translation? [Diagram. Training: training data (bi-text, monolingual data, dictionary), processed with linguistic tools, trains an SMT system consisting of a translation model and a language model (“lots of numbers…”). Using: the trained SMT system is applied to source text.]
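The two models in the diagram are usually combined through the classic noisy-channel decision rule: to translate a source sentence $f$, pick the target sentence $e$ that maximizes the product of the translation model and the language model. The first line is the textbook formulation, shown here only as a sketch; Moses itself scores hypotheses with a weighted log-linear combination of feature functions $h_i$ with weights $\lambda_i$ (second line), which reduces to the first line when the only features are $h_1 = \log p(f \mid e)$ and $h_2 = \log p(e)$ with $\lambda_1 = \lambda_2 = 1$.

```latex
\begin{aligned}
\hat{e} &= \arg\max_{e} \, p(e \mid f)
         = \arg\max_{e} \, \underbrace{p(f \mid e)}_{\text{translation model}} \; \underbrace{p(e)}_{\text{language model}} \\
\hat{e} &= \arg\max_{e} \, \sum_{i=1}^{n} \lambda_i \, h_i(e, f)
\end{aligned}
```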

  7. What is a model? • Translation Model • Language Model • (of the target language) thanks to Precision Translation Tools

  8. What is a model? • Translation model • source → translation • probability
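A translation model maps a source phrase to candidate translations, each with a probability. A common way to estimate these is relative frequency over phrase pairs extracted from word-aligned bi-text: p(target | source) = count(source, target) / count(source). The sketch below assumes the phrase pairs have already been extracted; the German/English pairs and the function name `phrase_translation_probs` are hypothetical. Moses' real phrase tables store several scores per pair (translation probabilities in both directions plus lexical weights); this computes only one of them.

```python
from collections import Counter, defaultdict


def phrase_translation_probs(phrase_pairs):
    """Estimate p(target | source) by relative frequency:
    count(source, target) / count(source)."""
    pair_counts = Counter(phrase_pairs)
    source_counts = Counter(src for src, _ in phrase_pairs)
    table = defaultdict(dict)
    for (src, tgt), n in pair_counts.items():
        table[src][tgt] = n / source_counts[src]
    return table


# Hypothetical phrase pairs, as if extracted from word-aligned bi-text.
pairs = [
    ("das haus", "the house"),
    ("das haus", "the house"),
    ("das haus", "the home"),
    ("ist klein", "is small"),
]

table = phrase_translation_probs(pairs)
for tgt, p in sorted(table["das haus"].items()):
    print(f"p({tgt!r} | 'das haus') = {p:.2f}")
# p('the home' | 'das haus') = 0.33
# p('the house' | 'das haus') = 0.67
```

The probabilities for each source phrase sum to 1, so the decoder can compare competing translations of the same phrase directly.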

  9. What is a model? • Language model • Likelihood of sentence • in target language
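A language model scores how likely a sentence is in the target language, learned from monolingual data. The toy sketch below is a bigram model with add-one (Laplace) smoothing; the class name `BigramLM` and the three training sentences are hypothetical. Production language-model toolkits typically use higher-order n-grams and more refined smoothing such as Kneser-Ney, so treat this only as an illustration of the idea that fluent word order scores higher.

```python
import math
from collections import Counter


class BigramLM:
    """Toy bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.histories = Counter()  # c(prev): how often each token precedes another
        self.bigrams = Counter()    # c(prev, word)
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            self.histories.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        # Predictable tokens: every training word plus the end marker.
        self.vocab = {w for s in sentences for w in s.split()} | {"</s>"}

    def prob(self, prev, word):
        # P(word | prev) = (c(prev, word) + 1) / (c(prev) + |V|)
        return (self.bigrams[(prev, word)] + 1) / (self.histories[prev] + len(self.vocab))

    def log_prob(self, sentence):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(math.log(self.prob(p, w)) for p, w in zip(tokens, tokens[1:]))


monolingual = ["the house is small", "the house is big", "a small house"]
lm = BigramLM(monolingual)
print(lm.log_prob("the house is small") > lm.log_prob("small the is house"))  # True
```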

  10. Agenda • What is Statistical Machine Translation? • What is Moses? • Common misconceptions • Coming up • What can we do for you?

  11. What is Moses? • Replacement for Pharaoh • Academic software • Closed-source • Open source • Rewritten, clean code • More features • Large developer community • Initiated by Hieu Hoang • Developed at NLP Workshop

  12. Agenda • What is Statistical Machine Translation? • What is Moses? • Timeline • Common misconceptions • Coming up • What can we do for you?

  13. What is Moses? Common Misconceptions • Only for Linux • Difficult to use • Unreliable • Only phrase-based • Developed by one person • Slow

  14. Only works on Linux • Tested on • Windows 7 (32-bit) with Cygwin 6.1 • Mac OS X 10.7 with MacPorts • Ubuntu 12.10, 32 and 64-bit • Debian 6.0, 32 and 64-bit • Fedora 17, 32 and 64-bit • openSUSE 12.2, 32 and 64-bit • Project files for • Visual Studio • Eclipse on Linux and Mac OS X

  15. Difficult to use • Easier compile and install • Boost bjam • No installation required • Binaries available for • Linux • Mac • Windows/Cygwin • Moses + Friends • IRSTLM • GIZA++ and MGIZA • Ready-made models trained on Europarl

  16. Unreliable • Monitor check-ins • Unit tests • More regression tests • Nightly tests • Run end-to-end training • http://www.statmt.org/moses/cruise/ • Tested on all major OSes • Train Europarl models • Phrase-based, hierarchical, factored • 8 language-pairs • http://www.statmt.org/moses/RELEASE-1.0/models/

  17. Only phrase-based model • replacement for Pharaoh • extension of Pharaoh • From the beginning • Factored models • Lattice and confusion network input • Multiple LMs, multiple phrase-tables • since 2009 • Hierarchical model • Syntactic models
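Confusion network input lets the decoder accept several alternative inputs at once, for example the competing word hypotheses of a speech recognizer. A confusion network is a sequence of slots, each holding alternative words with probabilities, where an empty alternative lets a slot be skipped. The sketch below only shows the data structure by enumerating every path; the example network, the `paths` helper and the `*EPS*` empty-word marker are assumptions for illustration. A real decoder searches the network jointly with translation instead of enumerating paths, which grows exponentially with the number of slots.

```python
import math
from itertools import product

# Marker for an empty (skippable) alternative in a slot; assumed for this sketch.
EPS = "*EPS*"


def paths(network):
    """Yield every path through the confusion network with its probability."""
    for choice in product(*network):
        words = [w for w, _ in choice if w != EPS]
        prob = math.prod(p for _, p in choice)
        yield " ".join(words), prob


# Hypothetical network: each slot lists (word, probability) alternatives.
cn = [
    [("i", 0.9), ("eye", 0.1)],
    [("saw", 0.6), ("so", 0.4)],
    [("the", 0.7), (EPS, 0.3)],
    [("cat", 1.0)],
]

best = max(paths(cn), key=lambda item: item[1])
print(best)
```

Because each slot's probabilities sum to 1, the probabilities of all paths also sum to 1, and the single best path simply takes the most likely word in every slot.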

  18. Developed by one person • ANYONE can contribute • 50 contributors [chart: ‘git blame’ of the Moses repository]

  19. Slow Decoding thanks to Ken!!

  20. Slow Training • Multithreaded • Reduced disk IO • compress intermediate files • Reduce disk space requirement
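Compressing intermediate files trades a little CPU time for less disk IO and disk space, because training steps can stream compressed data instead of writing and re-reading large plain-text files. A minimal sketch with Python's standard `gzip` module, assuming extracted phrase pairs are written one per line with the ` ||| ` field separator used in Moses phrase tables; the file name `extract.gz` and the helper functions are hypothetical.

```python
import gzip
import os
import tempfile
from collections import Counter


def write_pairs(path, pairs):
    # Text-mode gzip: data is compressed as it is written, so the
    # uncompressed file never lands on disk.
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for src, tgt in pairs:
            out.write(f"{src} ||| {tgt}\n")


def count_pairs(path):
    # Stream the compressed file back line by line without
    # decompressing it to disk first.
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as inp:
        for line in inp:
            counts[line.rstrip("\n")] += 1
    return counts


pairs = [("das haus", "the house")] * 1000 + [("ist klein", "is small")] * 500
path = os.path.join(tempfile.mkdtemp(), "extract.gz")
write_pairs(path, pairs)
counts = count_pairs(path)
print(os.path.getsize(path), "bytes on disk")
print(counts["das haus ||| the house"])  # 1000
```

Highly repetitive data such as extracted phrase pairs compresses well, which is why the saving in IO can outweigh the cost of compression.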

  22. What is Moses? Common Misconceptions • Only for Linux → Windows, Linux, Mac • Difficult to use → Easier compile and install • Unreliable → Multi-stage testing • Only phrase-based → Hierarchical, syntax model • Developed by one person → everyone • Slow → Fastest decoder, multithreaded training, less IO

  23. Agenda • What is Statistical Machine Translation? • What is Moses? • Common misconceptions • Coming up • What can we do for you?

  24. Coming up… • Code cleanup • Incremental training • Better translation • smaller model • bigger data • faster training and decoding • Applications • CAT tools • Speech translation

  25. Applications Computer-Aided Translation • EU Project • CASMACAT • MATECAT

  26. Agenda • What is Statistical Machine Translation? • What is Moses? • Common misconceptions • Coming up • What can we do for you?

  27. What can we do for you? • simpler Moses • graphical interface • Windows compatibility • terminology and glossary • incremental training • What can you do for us? • code • data • funding
