Back-translation for Large-Scale Multilingual Machine Translation

This paper describes our approach to the shared task on large-scale multilingual machine translation at the Sixth Conference on Machine Translation (WMT-21). We aim to build a single multilingual translation system, under the hypothesis that a universal cross-language representation leads to better multilingual translation performance. We extend the exploration of different back-translation methods from bilingual translation to multilingual translation. The constrained sampling method yields better performance, which differs from the findings for bilingual translation. We also explore the effect of vocabulary size and the amount of synthetic data. Surprisingly, smaller vocabularies perform better, and the extensive monolingual English data offers a modest improvement. We submitted to both small tasks and achieved second place.

