The choice of parameter sharing strategy in multilingual machine translation models determines how efficiently the parameter space is used and therefore directly influences final translation quality. Inspired by linguistic trees that reflect the degree of relatedness between languages, a new general approach to parameter sharing in multilingual machine translation was recently proposed. The main idea is to use these expert-built language hierarchies as the basis of the multilingual architecture: the closer two languages are, the more parameters they share. In this work, we test this idea using the Transformer architecture and show that, despite the success reported in previous work, training such hierarchical models has inherent problems. We demonstrate that with a carefully chosen training strategy the hierarchical architecture can outperform both bilingual models and multilingual models with full parameter sharing.
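To make the sharing scheme concrete, below is a minimal sketch (not the authors' implementation) of tree-based parameter sharing in a Transformer encoder: each node of a hypothetical language tree owns one encoder layer, and a sentence passes through the layers on the path from the root to its language, so related languages reuse the layers of their common ancestors. The tree, the per-node layer granularity, and the language codes are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch of hierarchical parameter sharing, assuming PyTorch and a
# hypothetical language tree; the real model and tree may differ.
import torch
import torch.nn as nn

# Hypothetical language tree: root -> family -> language (illustrative only).
LANGUAGE_TREE = {
    "de": ["root", "germanic", "de"],
    "nl": ["root", "germanic", "nl"],
    "fr": ["root", "romance", "fr"],
    "es": ["root", "romance", "es"],
}

def make_layer(d_model=512, nhead=8):
    # One standard Transformer encoder layer per tree node.
    return nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)

class HierarchicalEncoder(nn.Module):
    """Assigns one layer to every tree node; input in language `lang` flows
    through the layers on the path root -> ... -> lang, so the closer two
    languages are in the tree, the more layers (parameters) they share."""
    def __init__(self, tree, d_model=512, nhead=8):
        super().__init__()
        nodes = {node for path in tree.values() for node in path}
        self.layers = nn.ModuleDict({node: make_layer(d_model, nhead) for node in nodes})
        self.tree = tree

    def forward(self, x, lang):
        for node in self.tree[lang]:
            x = self.layers[node](x)
        return x

if __name__ == "__main__":
    enc = HierarchicalEncoder(LANGUAGE_TREE)
    src = torch.randn(2, 7, 512)     # (batch, tokens, d_model)
    out_de = enc(src, "de")          # shares "root" and "germanic" layers with "nl"
    out_fr = enc(src, "fr")          # shares only the "root" layer with "de"
    print(out_de.shape, out_fr.shape)
```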