Non-Parametric Unsupervised Domain Adaptation for Neural Machine Translation

Recently, $k$NN-MT has shown a promising ability to combine a pre-trained neural machine translation (NMT) model directly with domain-specific token-level $k$-nearest-neighbor ($k$NN) retrieval, achieving domain adaptation without retraining. Although conceptually attractive, it relies heavily on high-quality in-domain parallel corpora, which limits its usefulness for unsupervised domain adaptation, where in-domain parallel corpora are scarce or nonexistent. In this paper, we propose a novel framework that directly uses in-domain monolingual sentences in the target language to construct an effective datastore for $k$-nearest-neighbor retrieval. To this end, we first introduce an autoencoder task based on the target language, and then insert lightweight adapters into the original NMT model to map the token-level representation of this task to the ideal representation of the translation task. Experiments on multi-domain datasets demonstrate that the proposed approach significantly improves translation accuracy using target-side monolingual data, while achieving performance comparable to back-translation.
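As a rough illustration of the token-level retrieval step mentioned in the abstract, the sketch below assumes the commonly used $k$NN-MT formulation: a datastore of (decoder representation, target token) pairs, a softmax over negative squared L2 distances to the $k$ nearest keys, and linear interpolation with the NMT distribution. The function names, the temperature, and the interpolation weight `lam` are hypothetical choices for illustration and are not taken from the paper; the autoencoder task and adapters that the paper uses to build the datastore from monolingual data are not modeled here.

```python
import numpy as np


def knn_distribution(query, keys, values, vocab_size, k=4, temperature=10.0):
    """Turn the k nearest datastore entries into a distribution over the vocabulary.

    query:  (d,) decoder representation at the current step (assumed given)
    keys:   (n, d) stored representations
    values: (n,) target token ids paired with each key
    """
    # Squared L2 distance from the query to every key.
    dists = np.sum((keys - query) ** 2, axis=1)
    k = min(k, len(keys))
    nearest = np.argsort(dists)[:k]

    # Softmax over negative distances, scaled by a temperature (assumed hyperparameter).
    logits = -dists[nearest] / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()

    # Accumulate neighbor weights on their target tokens; repeated tokens add up.
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, values[nearest], weights)
    return p_knn


def interpolate(p_nmt, p_knn, lam=0.5):
    """Mix the retrieval distribution with the NMT model's distribution."""
    return lam * p_knn + (1.0 - lam) * p_nmt
```

In this sketch, a brute-force search over the whole datastore stands in for the approximate nearest-neighbor index that a datastore of realistic size would need.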
