Homographs, words with different meanings but the same surface form, have long caused difficulty for machine translation systems, because the correct translation must be selected from the context. With the advent of neural machine translation (NMT) systems, which can theoretically take into account global sentential context, one may hypothesize that this problem has been alleviated. In this paper, we first provide empirical evidence that existing NMT systems in fact still have significant problems in properly translating ambiguous words. We then describe methods, inspired by the word sense disambiguation literature, that model the context of the input word with context-aware word embeddings, which help to differentiate the word sense before the word is fed into the encoder. Experiments on three language pairs demonstrate that such models improve the performance of NMT systems both in terms of BLEU score and in the accuracy of translating homographs.
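To make the idea of a context-aware word embedding concrete, the following is a minimal sketch and not the models evaluated in this paper. It assumes the simplest possible context model, an average of the embeddings of the other words in the sentence, and assumes that the context vector is concatenated to the ordinary word embedding. The function names, the random embedding table, and the example sentences with "bank" are all hypothetical and chosen for illustration only.

```python
import random


def build_embeddings(vocab, dim, seed=0):
    """Return a toy embedding table with random values (stand-in for learned embeddings)."""
    rng = random.Random(seed)
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for word in vocab}


def context_vector(tokens, position, table):
    """Average the embeddings of every token except the one at `position`."""
    dim = len(next(iter(table.values())))
    others = [table[tok] for i, tok in enumerate(tokens) if i != position]
    if not others:
        return [0.0] * dim
    return [sum(vec[d] for vec in others) / len(others) for d in range(dim)]


def context_aware_embeddings(tokens, table):
    """Concatenate each word embedding with its context vector.

    The result is what would be passed to the encoder in place of the
    plain word embeddings (assumption: concatenation as the combination step).
    """
    return [table[tok] + context_vector(tokens, i, table) for i, tok in enumerate(tokens)]


# The homograph "bank" appears in two different contexts.
sent_a = "he sat on the bank of the river".split()
sent_b = "she went to the bank to deposit money".split()

vocab = sorted(set(sent_a) | set(sent_b))
table = build_embeddings(vocab, dim=4)

bank_a = context_aware_embeddings(sent_a, table)[sent_a.index("bank")]
bank_b = context_aware_embeddings(sent_b, table)[sent_b.index("bank")]

# The first half (the plain word embedding) is identical in both sentences;
# the second half (the context vector) differs, so the encoder input differs.
print(bank_a[:4] == bank_b[:4])
print(bank_a[4:] == bank_b[4:])
```

In this sketch the two occurrences of "bank" share the same word embedding but receive different encoder inputs, which is the property a context-aware embedding is meant to provide. A learned context network would replace the simple average used here.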