The Highs and Lows of Simple Lexical Domain Adaptation Approaches for Neural Machine Translation

Machine translation systems are vulnerable to domain mismatch, especially in low-resource scenarios. Out-of-domain translations are often of poor quality and prone to hallucinations, because of exposure bias and because the decoder acts as a language model. We adopt two approaches to alleviate this problem: lexical shortlisting restricted by IBM statistical alignments, and hypothesis re-ranking based on similarity. Both methods are computationally cheap and widely known, but they have not been extensively tested for domain adaptation. We demonstrate success on low-resource out-of-domain test sets; however, the methods are ineffective when there is sufficient data or when the domain mismatch is too great. This is due both to the IBM model losing its advantage over the implicitly learned neural alignment and to issues with subword segmentation of out-of-domain words.
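As a rough illustration of the first approach, lexical shortlisting can be sketched as restricting the decoder's output vocabulary, for each sentence, to the most probable translations of its source words plus a small set of frequent target tokens. The sketch below is not the authors' implementation: the function names, the toy lexical table (standing in for probabilities from an IBM alignment model), and the parameter `k` are all assumptions made for illustration.

```python
def build_shortlist(source_tokens, lex_table, frequent_targets, k=2):
    """Return the set of target tokens the decoder may emit for one sentence.

    lex_table[src] is assumed to map target token -> p(target | src),
    e.g. as estimated by an IBM-style statistical alignment model.
    """
    allowed = set(frequent_targets)  # always keep frequent target tokens
    for src in source_tokens:
        candidates = lex_table.get(src, {})  # unseen source words add nothing
        # Keep the k most probable translations; break ties alphabetically.
        best = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
        allowed.update(tgt for tgt, _ in best)
    return allowed


def mask_scores(scores, allowed):
    """Drop candidate tokens that fall outside the shortlist."""
    return {tok: s for tok, s in scores.items() if tok in allowed}


# Hypothetical toy data, invented for this example.
lex_table = {
    "haus": {"house": 0.80, "home": 0.15, "building": 0.05},
    "klein": {"small": 0.70, "little": 0.25, "tiny": 0.05},
}
frequent_targets = ["the", "a", "."]

shortlist = build_shortlist(["das", "klein", "haus"], lex_table, frequent_targets, k=2)
masked = mask_scores({"house": -0.2, "cat": -0.1, "the": -1.0}, shortlist)
```

Here the token `cat` is removed from the candidate scores because no source word in the sentence aligns to it, which is how a shortlist can suppress fluent but unrelated output. The source word `das` is absent from the toy table and contributes nothing, a small-scale analogue of the out-of-domain vocabulary issue the abstract mentions.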

