Token-level adaptive training has recently yielded promising improvements in machine translation. It adjusts the cross-entropy loss by assigning different training weights to different tokens in order to alleviate the token imbalance problem. However, previous approaches rely only on static word frequency information in the target language and ignore the source language, which is insufficient for bilingual tasks such as machine translation. In this paper, we propose a novel bilingual mutual information (BMI) based adaptive objective, which measures the learning difficulty of each target token from a bilingual perspective and assigns an adaptive weight accordingly to improve token-level adaptive training. The method assigns larger training weights to tokens with higher BMI, so that easy tokens are updated with coarse granularity while difficult tokens are updated with fine granularity. Experimental results on WMT14 English-to-German and WMT19 Chinese-to-English show that our approach outperforms both the Transformer baseline and previous token-level adaptive training approaches. Further analyses confirm that our method also improves lexical diversity.
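The weighting idea can be illustrated with a minimal Python sketch. The abstract does not give the BMI formula or the weighting function, so everything here is an assumption made for illustration: BMI is approximated as a sum of pointwise mutual information scores between a target token and the source tokens of the same sentence pair, estimated from sentence-level co-occurrence counts, and the weight is a clamped linear function of that score. The names `build_counts`, `bmi`, `token_weights`, `weighted_nll`, `scale`, and `base` are hypothetical.

```python
import math
from collections import Counter


def build_counts(pairs):
    """Count sentence-level occurrences and co-occurrences in a parallel corpus.

    `pairs` is a list of (source_tokens, target_tokens) tuples.
    """
    src_counts, tgt_counts, joint_counts = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_set, tgt_set = set(src), set(tgt)
        src_counts.update(src_set)
        tgt_counts.update(tgt_set)
        joint_counts.update((s, t) for s in src_set for t in tgt_set)
    return src_counts, tgt_counts, joint_counts, len(pairs)


def bmi(src, tgt_token, counts):
    """Assumed BMI: sum of PMI(source token, target token) over the source sentence."""
    src_counts, tgt_counts, joint_counts, n = counts
    total = 0.0
    for s in set(src):
        joint = joint_counts[(s, tgt_token)]
        if joint == 0:
            continue  # unseen pair: contributes nothing in this sketch
        p_joint = joint / n
        p_src = src_counts[s] / n
        p_tgt = tgt_counts[tgt_token] / n
        total += math.log(p_joint / (p_src * p_tgt))
    return total


def token_weights(src, tgt, counts, scale=0.1, base=1.0):
    """Assumed weighting: linear in BMI, clamped at zero so no weight is negative."""
    return [max(0.0, base + scale * bmi(src, t, counts)) for t in tgt]


def weighted_nll(log_probs, weights):
    """Token-weighted negative log-likelihood, averaged over the target tokens."""
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / len(log_probs)


# Toy parallel corpus (illustrative data only).
corpus = [
    (["the", "cat"], ["die", "katze"]),
    (["the", "mouse"], ["die", "maus"]),
    (["a", "dog"], ["ein", "hund"]),
]
counts = build_counts(corpus)
weights = token_weights(["the", "cat"], ["die", "katze"], counts)
loss = weighted_nll([-0.2, -1.5], weights)  # log-probs would come from the model
```

In this toy corpus the rarer target token "katze" receives a larger weight than the more frequent "die", which matches the abstract's statement that tokens with higher BMI get larger training weights. In a real system the per-token log-probabilities would come from the translation model, and the counts would be estimated once over the full training set.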