Machine translation systems are vulnerable to domain mismatch, especially when the task is low-resource. In this setting, out-of-domain translations are often of poor quality and prone to hallucinations, because the translation model prefers to predict common words seen during training over the rarer words of a different domain. We present two simple methods for improving translation quality in this setting: First, we use lexical shortlisting to restrict the neural network's predictions using alignments computed with IBM models. Second, we perform $n$-best list reordering by reranking all translations based on how much they overlap with one another. Our methods are computationally simpler and faster than alternative approaches, and show moderate success in low-resource settings with explicit out-of-domain test sets. However, our methods lose their effectiveness when the domain mismatch is too great, or in high-resource settings.
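A minimal sketch of the lexical-shortlisting idea: per source sentence, the decoder's output vocabulary is restricted to target words that an alignment dictionary (e.g. derived from IBM-model alignments) links to the source words, plus a small set of always-allowed frequent tokens. The dictionary format, the `always_allowed` set, and the masking-by-`-inf` convention here are illustrative assumptions, not necessarily the exact setup used in the paper.

```python
def build_shortlist(src_tokens, align_dict, always_allowed):
    """Union of aligned target candidates for each source token,
    plus tokens that are always permitted (e.g. frequent words, EOS).
    align_dict maps a source token to a set of target tokens; its
    contents would come from IBM-model alignments in practice."""
    shortlist = set(always_allowed)
    for tok in src_tokens:
        shortlist.update(align_dict.get(tok, ()))
    return shortlist


def mask_logits(logits, vocab, shortlist):
    """Set scores of tokens outside the shortlist to -inf, so the
    decoder can only produce shortlisted words."""
    neg_inf = float("-inf")
    return [score if vocab[i] in shortlist else neg_inf
            for i, score in enumerate(logits)]
```

At decode time, `mask_logits` would be applied to the model's output scores at every step, so softmax probability mass is confined to the shortlist.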
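The $n$-best reranking step can be sketched as scoring each hypothesis by how much it overlaps with the other hypotheses in the list, so that outlier (potentially hallucinated) translations sink to the bottom. The specific overlap measure below (average unigram F1 against all other hypotheses) is an illustrative assumption; the paper's exact overlap metric may differ.

```python
from collections import Counter


def overlap_f1(a, b):
    """Unigram F1 between two token lists."""
    ca, cb = Counter(a), Counter(b)
    common = sum((ca & cb).values())
    if common == 0:
        return 0.0
    precision = common / len(a)
    recall = common / len(b)
    return 2 * precision * recall / (precision + recall)


def rerank_by_overlap(nbest):
    """Reorder an n-best list so hypotheses most similar to the
    rest of the list come first; lone outliers fall to the end."""
    toks = [hyp.split() for hyp in nbest]
    scores = []
    for i, ti in enumerate(toks):
        others = [overlap_f1(ti, tj) for j, tj in enumerate(toks) if j != i]
        scores.append(sum(others) / max(len(others), 1))
    order = sorted(range(len(nbest)), key=lambda i: -scores[i])
    return [nbest[i] for i in order]
```

For example, in a list where two hypotheses largely agree and a third shares no words with them, the third is ranked last, which is the desired behavior when that hypothesis is a hallucination.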