Previous studies have shown that initializing neural machine translation (NMT) models with pre-trained language models (LMs) can speed up model training and boost model performance. In this work, we identify a critical side effect of pre-training for NMT, which stems from the discrepancy between the training objectives of LM-based pre-training and NMT. Since the LM objective learns to reconstruct a few source tokens while copying most of them, the pre-training initialization affects the copying behaviors of NMT models. We provide a quantitative analysis of copying behaviors by introducing a metric called the copying ratio, which empirically shows that pre-training-based NMT models copy more than standard ones. In response to this problem, we propose a simple and effective method named copying penalty to control copying behaviors during decoding. Extensive experiments on both in-domain and out-of-domain benchmarks show that the copying penalty consistently improves translation performance by controlling the copying behaviors of pre-training-based NMT models. Source code is freely available at https://github.com/SunbowLiu/CopyingPenalty.
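The abstract does not spell out how the copying ratio is computed or how the copying penalty enters decoding; the Python sketch below illustrates one plausible instantiation, assuming a token-level copying ratio (the fraction of hypothesis tokens that also appear in the source) and a hypothetical `beta` coefficient that scales how strongly beam-search scores are penalized, in the spirit of a length penalty. The exact formulation used in the paper and repository may differ.

```python
from typing import List


def copying_ratio(source_tokens: List[str], hypothesis_tokens: List[str]) -> float:
    """Fraction of hypothesis tokens that also appear in the source sentence.

    One plausible token-level definition of the copying ratio; the paper's
    exact definition (e.g., span-level matching) may differ.
    """
    if not hypothesis_tokens:
        return 0.0
    source_set = set(source_tokens)
    copied = sum(1 for tok in hypothesis_tokens if tok in source_set)
    return copied / len(hypothesis_tokens)


def apply_copying_penalty(score: float, ratio: float, beta: float = 1.0) -> float:
    """Down-weight a hypothesis score in proportion to its copying ratio.

    `beta` is a hypothetical tuning coefficient: larger values penalize
    copy-heavy hypotheses more aggressively during beam-search re-ranking.
    """
    return score - beta * ratio


# Toy usage: re-rank two hypotheses for the same source sentence.
src = "der kleine Hund schläft".split()
hyps = [
    ("der kleine Hund schläft", -1.2),    # near-verbatim copy of the source
    ("the little dog is sleeping", -1.5), # actual translation
]
for text, log_prob in hyps:
    ratio = copying_ratio(src, text.split())
    print(text, round(apply_copying_penalty(log_prob, ratio, beta=1.0), 3))
```

With these assumed settings, the verbatim copy receives a copying ratio of 1.0 and drops below the genuine translation after the penalty is applied, which is the qualitative behavior the method aims for.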