This paper describes our approach to the shared task on large-scale multilingual machine translation at the Sixth Conference on Machine Translation (WMT-21). This work aims to build a single multilingual translation system, under the hypothesis that a universal cross-language representation leads to better multilingual translation performance. We extend the exploration of different back-translation methods from bilingual translation to multilingual translation. The constrained sampling method achieves the best performance, which differs from the findings for bilingual translation. In addition, we explore the effect of vocabulary size and the amount of synthetic data. Surprisingly, a smaller vocabulary performs better, and the extensive monolingual English data offers only a modest improvement. We submitted to both small tasks and achieved second place.