Recently, $k$NN-MT has shown a promising ability to combine a pre-trained neural machine translation (NMT) model with domain-specific token-level $k$-nearest-neighbor ($k$NN) retrieval, achieving domain adaptation without retraining. Despite being conceptually attractive, it relies heavily on high-quality in-domain parallel corpora, which limits its applicability to unsupervised domain adaptation, where in-domain parallel corpora are scarce or nonexistent. In this paper, we propose a novel framework that directly uses in-domain monolingual sentences in the target language to construct an effective datastore for $k$-nearest-neighbor retrieval. To this end, we first introduce an autoencoder task based on the target language, and then insert lightweight adapters into the original NMT model to map the token-level representations of this task to the ideal representations of the translation task. Experiments on multi-domain datasets demonstrate that our proposed approach significantly improves translation accuracy using only target-side monolingual data, while achieving performance comparable to back-translation.
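For reference, the token-level retrieval described above follows the standard $k$NN-MT formulation: at each decoding step $t$, the decoder representation $q_t$ queries the datastore for its $k$ nearest key-value pairs $\mathcal{N}_t = \{(k_i, v_i)\}$, and the retrieved distribution is interpolated with the NMT output. The notation below is the conventional one for $k$NN-MT and is given here only as background, not verbatim from this paper:

$$
p(y_t \mid x, y_{<t}) = \lambda \, p_{k\mathrm{NN}}(y_t \mid x, y_{<t}) + (1 - \lambda) \, p_{\mathrm{NMT}}(y_t \mid x, y_{<t}),
\qquad
p_{k\mathrm{NN}}(y_t \mid x, y_{<t}) \propto \sum_{(k_i, v_i) \in \mathcal{N}_t} \mathbb{1}\left[y_t = v_i\right] \exp\!\left(\frac{-d(q_t, k_i)}{T}\right).
$$

In the proposed framework, the keys $k_i$ are representations of target-language monolingual tokens produced under the autoencoder task and mapped by the lightweight adapters into the translation-task representation space, so that the datastore can be queried by the decoder states of the original NMT model.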