Multilingual machine translation (MMT) benefits from cross-lingual transfer but is a challenging multitask optimization problem. This is partly because there is no clear framework for systematically learning language-specific parameters. Self-supervised learning (SSL) approaches that leverage large quantities of monolingual data (where parallel data is unavailable) have shown promise, improving translation performance when used as complementary tasks to the MMT task. However, jointly optimizing SSL and MMT tasks is even more challenging. In this work, we first investigate how to utilize intra-distillation to learn more *language-specific* parameters and then show the importance of these language-specific parameters. Next, we propose a novel but simple SSL task, concurrent denoising, that co-trains with the MMT task by concurrently denoising monolingual data on both the encoder and decoder. Finally, we apply intra-distillation to this co-training approach. Combining these two approaches significantly improves MMT performance, outperforming three state-of-the-art SSL methods by a large margin, e.g., improvements of 11.3% and 3.7% over MASS on an 8-language and a 15-language benchmark, respectively.