In this paper, we investigate the driving factors behind concatenation, a simple but effective data augmentation method for low-resource neural machine translation. Our experiments suggest that discourse context is unlikely to be the cause of the improvement of about +1 BLEU observed across four language pairs. Instead, we demonstrate that the improvement comes from three other factors unrelated to discourse: context diversity, length diversity, and (to a lesser extent) position shifting.
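To make the augmentation method concrete, the following is a minimal sketch of concatenation-based augmentation for parallel data. The function name, the random pairing strategy, and the parameters `ratio`, `sep`, and `seed` are illustrative assumptions, not the paper's exact recipe.

```python
import random

def concat_augment(src_sents, tgt_sents, ratio=1.0, sep=" ", seed=0):
    """Illustrative sketch (assumed, not the paper's exact procedure):
    build synthetic training pairs by concatenating two randomly sampled
    sentence pairs on both the source and target sides."""
    assert len(src_sents) == len(tgt_sents)
    rng = random.Random(seed)
    n_new = int(len(src_sents) * ratio)  # number of synthetic pairs to add
    aug_src, aug_tgt = [], []
    for _ in range(n_new):
        i = rng.randrange(len(src_sents))
        j = rng.randrange(len(src_sents))
        # Join the two sampled examples; source sides and target sides
        # are concatenated in the same order to keep the pair aligned.
        aug_src.append(src_sents[i] + sep + src_sents[j])
        aug_tgt.append(tgt_sents[i] + sep + tgt_sents[j])
    return src_sents + aug_src, tgt_sents + aug_tgt
```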