Non-Parametric Unsupervised Domain Adaptation for Neural Machine Translation

Recently, $k$NN-MT has shown a promising ability to combine a pre-trained neural machine translation (NMT) model directly with domain-specific token-level $k$-nearest-neighbor ($k$NN) retrieval, achieving domain adaptation without retraining. Although conceptually attractive, it relies heavily on high-quality in-domain parallel corpora, which limits its usefulness for unsupervised domain adaptation, where in-domain parallel corpora are scarce or nonexistent. In this paper, we propose a novel framework that directly uses in-domain monolingual sentences in the target language to construct an effective datastore for $k$-nearest-neighbor retrieval. To this end, we first introduce an autoencoder task based on the target language, and then insert lightweight adapters into the original NMT model to map the token-level representation of this task to the ideal representation of the translation task. Experiments on multi-domain datasets demonstrate that our proposed approach significantly improves translation accuracy using target-side monolingual data, and achieves performance comparable to back-translation.
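To make the token-level retrieval step concrete, here is a minimal sketch of how a $k$NN distribution is commonly built from a datastore and interpolated with the NMT model's prediction. It follows the usual $k$NN-MT formulation and is not taken from this paper: the function names, the toy datastore, the temperature, and the interpolation weight `lam` are all illustrative assumptions, and the autoencoder task and adapters proposed in the abstract are not modeled here.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy datastore: (key vector, target token) pairs.
# In kNN-MT the keys would be decoder hidden states; here they are plain lists.
Datastore = List[Tuple[Sequence[float], str]]


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_distribution(
    query: Sequence[float], datastore: Datastore, k: int, temperature: float
) -> Dict[str, float]:
    """Turn the k nearest datastore entries into a distribution over tokens."""
    nearest = sorted(
        (squared_l2(query, key), token) for key, token in datastore
    )[:k]
    d_min = nearest[0][0]  # subtract the smallest distance for numerical stability
    weights = [(math.exp(-(d - d_min) / temperature), tok) for d, tok in nearest]
    total = sum(w for w, _ in weights)
    probs: Dict[str, float] = {}
    for w, tok in weights:
        # Neighbors that share a target token pool their probability mass.
        probs[tok] = probs.get(tok, 0.0) + w / total
    return probs


def interpolate(
    p_nmt: Dict[str, float], p_knn: Dict[str, float], lam: float
) -> Dict[str, float]:
    """Mix the two distributions: lam * p_kNN + (1 - lam) * p_NMT."""
    vocab = set(p_nmt) | set(p_knn)
    return {
        tok: lam * p_knn.get(tok, 0.0) + (1.0 - lam) * p_nmt.get(tok, 0.0)
        for tok in vocab
    }
```

In this sketch a larger `lam` puts more trust in the retrieved neighbors, so the quality of the datastore keys matters directly; that is the part the proposed framework addresses when only target-side monolingual data is available.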
