Machine translation systems are vulnerable to domain mismatch, especially when the task is low-resource. In this setting, out-of-domain translations are often of poor quality and prone to hallucinations, because the translation model prefers to predict common words seen during training over the rarer words of a different domain. We present two simple methods for improving translation quality in this setting: First, we use lexical shortlisting to restrict the neural network's predictions using alignments computed with IBM models. Second, we perform $n$-best list reordering by reranking all translations based on how much they overlap with one another. Our methods are computationally simpler and faster than alternative approaches, and show moderate success in low-resource settings with explicit out-of-domain test sets. However, our methods lose their effectiveness when the domain mismatch is too great, or in high-resource settings.
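A minimal sketch of the lexical-shortlisting idea: per source sentence, the decoder's output vocabulary is restricted to target words that an alignment dictionary (e.g. derived from IBM-model alignments) links to the source words, plus a small set of always-allowed frequent tokens. The dictionary format, the `always_allowed` set, and the masking-by-`-inf` convention here are illustrative assumptions, not necessarily the exact setup used in the paper.

```python
def build_shortlist(src_tokens, align_dict, always_allowed):
    """Union of aligned target candidates for each source token,
    plus tokens that are always permitted (e.g. frequent words, EOS).
    align_dict maps a source token to a set of target tokens; its
    contents would come from IBM-model alignments in practice."""
    shortlist = set(always_allowed)
    for tok in src_tokens:
        shortlist.update(align_dict.get(tok, ()))
    return shortlist


def mask_logits(logits, vocab, shortlist):
    """Set scores of tokens outside the shortlist to -inf, so the
    decoder can only produce shortlisted words."""
    neg_inf = float("-inf")
    return [score if vocab[i] in shortlist else neg_inf
            for i, score in enumerate(logits)]
```

At decode time, `mask_logits` would be applied to the model's output scores at every step, so softmax probability mass is confined to the shortlist.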
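The $n$-best reranking step can be sketched as scoring each hypothesis by how much it overlaps with the other hypotheses in the list, so that outlier (potentially hallucinated) translations sink to the bottom. The specific overlap measure below (average unigram F1 against all other hypotheses) is an illustrative assumption; the paper's exact overlap metric may differ.

```python
from collections import Counter


def overlap_f1(a, b):
    """Unigram F1 between two token lists."""
    ca, cb = Counter(a), Counter(b)
    common = sum((ca & cb).values())
    if common == 0:
        return 0.0
    precision = common / len(a)
    recall = common / len(b)
    return 2 * precision * recall / (precision + recall)


def rerank_by_overlap(nbest):
    """Reorder an n-best list so hypotheses most similar to the
    rest of the list come first; lone outliers fall to the end."""
    toks = [hyp.split() for hyp in nbest]
    scores = []
    for i, ti in enumerate(toks):
        others = [overlap_f1(ti, tj) for j, tj in enumerate(toks) if j != i]
        scores.append(sum(others) / max(len(others), 1))
    order = sorted(range(len(nbest)), key=lambda i: -scores[i])
    return [nbest[i] for i in order]
```

For example, in a list where two hypotheses largely agree and a third shares no words with them, the third is ranked last, which is the desired behavior when that hypothesis is a hallucination.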