Multilingual neural machine translation models can translate between language pairs unseen during training, i.e., perform zero-shot translation. However, zero-shot translation is often unstable. Although prior work attributed this instability to the domination of a central language, e.g., English, we supplement this viewpoint with the strict dependence on non-centered languages. In this work, we propose a simple, lightweight yet effective language-specific modeling method that adapts to non-centered languages and combines shared information with language-specific information to counteract the instability of zero-shot translation. Experiments with the Transformer on the IWSLT17, Europarl, TED talks, and OPUS-100 datasets show that our method not only outperforms strong baselines under centered data conditions but also easily fits non-centered data conditions. By further investigating layer attribution, we show that our proposed method disentangles the coupled representation in the correct direction.
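To make the idea of combining shared and language-specific information concrete, the following is a minimal sketch, not the paper's actual architecture: it assumes a lightweight bottleneck adapter per non-centered language and a learned gate that mixes the adapter output with the shared Transformer layer output. All module names, the bottleneck size, and the gating mechanism are illustrative assumptions.

```python
import torch
import torch.nn as nn


class LanguageSpecificAdapter(nn.Module):
    """Hypothetical sketch: mix a shared Transformer layer output with a
    small per-language (non-centered) bottleneck projection via a gate."""

    def __init__(self, d_model: int, languages: list, bottleneck: int = 64):
        super().__init__()
        # One lightweight bottleneck adapter per non-centered language (assumption).
        self.adapters = nn.ModuleDict({
            lang: nn.Sequential(
                nn.Linear(d_model, bottleneck),
                nn.ReLU(),
                nn.Linear(bottleneck, d_model),
            )
            for lang in languages
        })
        # Gate mixing shared and language-specific information (assumption).
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, shared_hidden: torch.Tensor, lang: str) -> torch.Tensor:
        # shared_hidden: (batch, seq_len, d_model), output of a shared layer.
        specific = self.adapters[lang](shared_hidden)
        mix = torch.sigmoid(self.gate(torch.cat([shared_hidden, specific], dim=-1)))
        # Convex combination of language-specific and shared representations.
        return mix * specific + (1.0 - mix) * shared_hidden


# Usage sketch: route German-target states through the "de" adapter.
layer = LanguageSpecificAdapter(d_model=512, languages=["de", "fr", "zh"])
hidden = torch.randn(8, 20, 512)      # shared encoder/decoder states
combined = layer(hidden, lang="de")   # shared + language-specific mix
```

Because each adapter is a small bottleneck and only the gate and adapters are language-specific, this kind of design stays lightweight while letting non-centered languages deviate from the shared representation where needed.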