Previous work has mostly focused on either the multilingual or the multi-domain aspect of neural machine translation (NMT). This paper investigates whether domain information can be transferred across languages in the composition of multi-domain and multilingual NMT, particularly under the incomplete data condition where in-domain bitext is missing for some language pairs. Our results on curated leave-one-domain-out experiments show that multi-domain multilingual (MDML) NMT can boost zero-shot translation performance by up to +10 BLEU, and can also aid the generalisation of multi-domain NMT to the missing domain. We further explore strategies for the effective integration of multilingual and multi-domain NMT, including language and domain tag combinations and auxiliary task training. We find that learning domain-aware representations and adding target-language tags to the encoder lead to effective MDML-NMT.
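For concreteness, below is a minimal sketch of the tag-combination idea the abstract refers to: a domain tag and a target-language tag are prepended to the encoder input. The tag formats and the `tag_source` helper are hypothetical illustrations; the paper's exact tokens and placement strategy may differ.

```python
# Hypothetical sketch of MDML-NMT input tagging: prepend a domain tag
# and a target-language tag to the source sentence before encoding.

def tag_source(src: str, domain: str, tgt_lang: str) -> str:
    """Prepend domain and target-language tags to the encoder input."""
    return f"<dom_{domain}> <2{tgt_lang}> {src}"

# Example: tagging a German source sentence from the medical domain
# for translation into English.
tagged = tag_source("Der Patient erhielt 5 mg.", domain="med", tgt_lang="en")
print(tagged)
# -> "<dom_med> <2en> Der Patient erhielt 5 mg."
```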