In this paper, we evaluate the translation of negation both automatically and manually, in English--German (EN--DE) and English--Chinese (EN--ZH). We show that the ability of neural machine translation (NMT) models to translate negation has improved with deeper and more advanced networks, although the performance varies between language pairs and translation directions. According to our manual evaluation, the accuracy of negation translation in EN--DE, DE--EN, EN--ZH, and ZH--EN is 95.7%, 94.8%, 93.4%, and 91.7%, respectively. In addition, we show that under-translation is the most significant error type in NMT, which contrasts with the more diverse error profile previously observed for statistical machine translation. To better understand the root of the under-translation of negation, we study the model's information flow and training data. While our information flow analysis does not reveal any deficiencies that could be used to detect or fix the under-translation of negation, we find that negation is often rephrased during training, which could make it more difficult for the model to learn a reliable link between source and target negation. Finally, we conduct intrinsic analyses and extrinsic probing tasks on negation, showing that NMT models can distinguish negation and non-negation tokens very well and encode substantial information about negation in their hidden states, but nevertheless leave room for improvement.