kNN-MT presents a new paradigm for domain adaptation: it builds an external datastore that typically stores every target-language token occurrence in the parallel corpus. The resulting datastore is therefore large and often redundant. In this paper, we investigate the interpretability of this approach and ask: what knowledge does the NMT model actually need? We propose the notion of local correctness (LAC) as a new angle, which describes the potential translation correctness of a single entry and of a given neighborhood. An empirical study shows that this notion successfully identifies the conditions under which the NMT model is likely to fail and therefore needs related knowledge. Experiments on six diverse target domains and two language pairs show that pruning according to local correctness yields a lighter and more explainable memory for kNN-MT domain adaptation.
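The datastore construction and retrieval that the abstract refers to can be sketched as follows. This is a toy illustration under assumed simplifications (made-up 2-d "hidden states", brute-force L2 search, and illustrative function names); real kNN-MT stores decoder hidden states as keys, target tokens as values, and retrieves with approximate nearest-neighbor search:

```python
import numpy as np

# Toy kNN-MT datastore: one entry per target-token occurrence in the
# parallel corpus, which is why the datastore grows large and redundant.
datastore_keys = []  # decoder hidden states (here: made-up 2-d vectors)
datastore_vals = []  # corresponding target tokens

def add_entry(hidden_state, target_token):
    """Store a single target-token occurrence, as in vanilla kNN-MT."""
    datastore_keys.append(np.asarray(hidden_state, dtype=float))
    datastore_vals.append(target_token)

def knn_lookup(query, k=2):
    """Retrieve the k nearest entries by L2 distance (brute force here)."""
    q = np.asarray(query, dtype=float)
    dists = [np.linalg.norm(q - key) for key in datastore_keys]
    order = np.argsort(dists)[:k]
    return [(datastore_vals[i], dists[i]) for i in order]

# Toy usage: three stored occurrences, then a retrieval.
add_entry([0.0, 0.0], "cat")
add_entry([1.0, 0.0], "dog")
add_entry([0.9, 0.1], "dog")
neighbors = knn_lookup([1.0, 0.05], k=2)  # both nearest entries are "dog"
```

Pruning by local correctness then amounts to dropping entries from such a datastore wherever the base NMT model would already translate correctly on its own.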