Multilingual machine translation has attracted much attention recently because it enables knowledge transfer among languages and is cheaper to train and deploy than numerous bilingual models. A known challenge of multilingual models is negative language interference. To enhance translation quality, deeper and wider architectures are applied to multilingual modeling for larger capacity, but at the cost of increased inference time. Recent studies have pointed out that parameters shared among languages are the cause of interference, while they may also enable positive transfer. Based on these insights, we propose an adaptive and sparse architecture for multilingual modeling, and train the model to learn shared and language-specific parameters to improve positive transfer and mitigate interference. The sparse architecture activates only a subnetwork, which preserves inference efficiency, and the adaptive design selects different subnetworks based on the input language. Evaluated on multilingual translation across multiple public datasets, our model outperforms strong baselines in translation quality without increasing the inference cost.
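To make the idea concrete, below is a minimal sketch (not the authors' implementation) of language-adaptive sparse subnetwork selection: each language owns a learned binary gate over the hidden units of a shared feed-forward sublayer, so only a language-specific subnetwork of the shared parameters is active at inference time. All names here (`LangAdaptiveFFN`, `num_langs`, `lang_id`) are hypothetical, and the straight-through gating is one plausible choice among several.

```python
# Illustrative sketch, assuming per-language binary masks over a shared
# Transformer feed-forward sublayer; not the paper's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LangAdaptiveFFN(nn.Module):
    """Feed-forward sublayer whose hidden units are gated per language,
    so each language activates only a subnetwork of shared parameters."""

    def __init__(self, d_model: int, d_hidden: int, num_langs: int):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_hidden)   # shared parameters
        self.fc2 = nn.Linear(d_hidden, d_model)   # shared parameters
        # One gate logit per (language, hidden unit); training pushes
        # these toward a sparse pattern of shared vs. language-specific units.
        self.gate_logits = nn.Parameter(torch.randn(num_langs, d_hidden))

    def forward(self, x: torch.Tensor, lang_id: int) -> torch.Tensor:
        # Straight-through estimator: hard 0/1 mask in the forward pass,
        # sigmoid gradient in the backward pass.
        soft = torch.sigmoid(self.gate_logits[lang_id])
        hard = (soft > 0.5).float()
        mask = hard + soft - soft.detach()
        h = F.relu(self.fc1(x)) * mask             # language-specific subnetwork
        return self.fc2(h)


# Usage: one layer serves every language; only the active subnetwork changes
# with the input language, so inference cost never exceeds the dense layer's.
layer = LangAdaptiveFFN(d_model=512, d_hidden=2048, num_langs=4)
x = torch.randn(2, 10, 512)                        # (batch, seq, d_model)
y_de = layer(x, lang_id=0)                         # e.g. German subnetwork
y_fr = layer(x, lang_id=1)                         # e.g. French subnetwork
print(y_de.shape, torch.allclose(y_de, y_fr))      # torch.Size([2, 10, 512]) False
```

Because the hard mask zeroes out inactive hidden units, the per-language subnetworks can be materialized as smaller dense layers at deployment, which is how a sparse design of this kind preserves inference efficiency.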