Many-to-many multilingual neural machine translation can translate between language pairs unseen during training, i.e., perform zero-shot translation. Improving zero-shot translation requires the model to learn universal representations and cross-mapping relationships, so that knowledge learned on the supervised directions transfers to the zero-shot directions. In this work, we propose the state mover's distance, based on optimal transport theory, to measure the difference between the representations output by the encoder. We then bridge the gap between the semantically equivalent representations of different languages at the token level by minimizing the proposed distance, which encourages the model to learn universal representations. In addition, we propose an agreement-based training scheme that helps the model make consistent predictions from semantically equivalent sentences, thereby learning universal cross-mapping relationships for all translation directions. Experimental results on diverse multilingual datasets show that our method achieves consistent improvements over the baseline system and other competing methods. Further analysis shows that our method better aligns the semantic space and improves prediction consistency.
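To make the proposed distance concrete, the following is a minimal sketch of an entropy-regularized optimal-transport (Sinkhorn) distance between two sets of token-level encoder states, which could serve as an auxiliary alignment loss on a parallel sentence pair. The function name, the uniform marginals, the cosine cost, and the hyperparameters `eps` and `n_iters` are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch: an optimal-transport ("state mover's"-style) distance
# between two sequences of encoder states, via Sinkhorn iterations.
import torch
import torch.nn.functional as F

def sinkhorn_distance(x, y, eps=0.1, n_iters=50):
    """Entropy-regularized OT cost between two token-state sets.

    x: (m, d) encoder states of sentence A; y: (n, d) states of sentence B.
    Assumes uniform marginals and a cosine cost; returns a scalar cost.
    """
    # Pairwise cosine cost between the two sets of states.
    x_n = F.normalize(x, dim=-1)
    y_n = F.normalize(y, dim=-1)
    cost = 1.0 - x_n @ y_n.t()                       # (m, n)

    m, n = cost.shape
    a = torch.full((m,), 1.0 / m, device=x.device)   # uniform source marginal
    b = torch.full((n,), 1.0 / n, device=x.device)   # uniform target marginal
    K = torch.exp(-cost / eps)                       # Gibbs kernel

    u = torch.ones_like(a)
    for _ in range(n_iters):                         # Sinkhorn fixed-point updates
        v = b / (K.t() @ u)
        u = a / (K @ v)

    plan = u.unsqueeze(1) * K * v.unsqueeze(0)       # transport plan (m, n)
    return (plan * cost).sum()

# Example: align encoder states of a parallel pair (lengths 7 and 9, dim 512).
loss = sinkhorn_distance(torch.randn(7, 512), torch.randn(9, 512))
```

Minimizing such a cost pulls the semantically equivalent token representations of the two languages toward a shared space without requiring a hard word alignment.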
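The agreement-based objective can likewise be sketched as a consistency loss: given two semantically equivalent sources (e.g., the same sentence in two languages), the decoder's next-token distributions for the same target prefix are pushed toward each other. The symmetric-KL form below and the shapes of `logits_a` and `logits_b` are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch: agreement-style consistency loss between predictions made
# from two semantically equivalent source sentences.
import torch.nn.functional as F

def agreement_loss(logits_a, logits_b):
    """Symmetric KL between the two predicted token distributions.

    logits_a, logits_b: (batch, tgt_len, vocab) decoder outputs for the
    same target prefix, conditioned on the two equivalent sources.
    """
    log_p = F.log_softmax(logits_a, dim=-1)
    log_q = F.log_softmax(logits_b, dim=-1)
    # F.kl_div(input, target, log_target=True) computes KL(target || input).
    kl_pq = F.kl_div(log_q, log_p, log_target=True, reduction="batchmean")
    kl_qp = F.kl_div(log_p, log_q, log_target=True, reduction="batchmean")
    return 0.5 * (kl_pq + kl_qp)
```

Added to the usual cross-entropy objective on the supervised directions, a loss of this shape rewards the model for mapping equivalent inputs to the same target predictions, which is the cross-mapping behavior zero-shot directions rely on.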