In a multilingual neural machine translation model that fully shares parameters across all languages, an artificial language token is usually used to guide translation into the desired target language. However, recent studies show that prepending language tokens sometimes fails to steer multilingual neural machine translation models toward the correct translation directions, especially in zero-shot translation. To mitigate this issue, we propose two methods, language embedding embodiment and language-aware multi-head attention, to learn informative language representations that channel translation into the correct directions. The former embodies language embeddings at different critical switching points along the information flow from the source to the target, aiming to amplify translation-direction guiding signals. The latter uses a matrix, instead of a vector, to represent a language in the continuous space. The matrix is chunked into multiple heads so as to learn language representations in multiple subspaces. Experimental results on two datasets for massively multilingual neural machine translation demonstrate that language-aware multi-head attention benefits both supervised and zero-shot translation and significantly alleviates the off-target translation issue. Further linguistic typology prediction experiments show that matrix-based language representations learned by our methods are capable of capturing rich linguistic typology features.
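To make the second method concrete, below is a minimal PyTorch sketch of how a matrix-valued language representation could be chunked across attention heads. It is an illustrative reconstruction under stated assumptions, not the paper's exact formulation: the class name `LanguageAwareMultiHeadAttention`, the parameter `lang_matrix`, and the choice to inject each head's language chunk as a bias on the queries are all hypothetical.

```python
# Hypothetical sketch: each language is represented by a learned matrix of
# shape (num_heads, head_dim) rather than a single vector. The matrix is
# split across attention heads, so every head attends in a
# language-conditioned subspace.
import torch
import torch.nn as nn


class LanguageAwareMultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int, num_languages: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        # Matrix-valued language representation: one (num_heads x head_dim)
        # matrix per language, i.e. one chunk per attention head.
        self.lang_matrix = nn.Parameter(
            torch.randn(num_languages, num_heads, self.head_dim) * 0.02
        )

    def forward(self, x: torch.Tensor, lang_id: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); lang_id: (batch,) target-language ids
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        # Inject the per-head language chunk as a query bias (one assumed
        # option), conditioning each head's attention on the target language.
        lang = self.lang_matrix[lang_id].unsqueeze(2)  # (b, heads, 1, head_dim)
        q = q + lang
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out_proj(out)


# Usage: a batch of 2 sentences translated into target languages 3 and 7.
mha = LanguageAwareMultiHeadAttention(d_model=512, num_heads=8, num_languages=100)
y = mha(torch.randn(2, 10, 512), torch.tensor([3, 7]))
print(y.shape)  # torch.Size([2, 10, 512])
```

The key point the sketch illustrates is the one stated in the abstract: because each head receives its own chunk of the language matrix, language identity is encoded in multiple subspaces at once rather than in a single shared embedding vector.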