Simple and Scalable Nearest Neighbor Machine Translation

$k$NN-MT is a straightforward yet powerful approach to fast domain adaptation: it augments a pre-trained neural machine translation (NMT) model with domain-specific token-level $k$-nearest-neighbor ($k$NN) retrieval, achieving domain adaptation without retraining. Despite being conceptually attractive, $k$NN-MT is burdened with massive storage requirements and high computational complexity, since it conducts nearest neighbor searches over the entire reference corpus. In this paper, we propose a simple and scalable nearest neighbor machine translation framework that drastically improves the decoding and storage efficiency of $k$NN-based models while maintaining translation performance. To this end, we dynamically construct an extremely small datastore for each input via sentence-level retrieval, which avoids searching the entire datastore as vanilla $k$NN-MT does. On top of this small datastore, we further introduce a distance-aware adapter that adaptively incorporates the $k$NN retrieval results into the pre-trained NMT model. Experiments on machine translation in two general settings, static domain adaptation and online learning, demonstrate that our proposed approach not only runs at almost 90% of the speed of the NMT model without performance degradation, but also significantly reduces the storage requirements of $k$NN-MT.
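The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. All function names are hypothetical, the token-overlap score is only a stand-in for whatever sentence-level retriever is actually used, the datastore keys are toy vectors in place of real decoder hidden states, and the ReLU-style weight in `distance_aware_lambda` is an assumed form of a distance-aware weight chosen for illustration. The interpolation of the $k$NN and NMT distributions follows the usual $k$NN-MT formulation, $p(y) = \lambda\, p_{k\mathrm{NN}}(y) + (1-\lambda)\, p_{\mathrm{NMT}}(y)$.

```python
import math
from collections import defaultdict

def sentence_retrieve(query_tokens, corpus, m=2):
    """Return the m corpus pairs whose source side overlaps most with the query.
    Jaccard overlap is an assumed stand-in for a real sentence retriever."""
    q = set(query_tokens)
    def score(pair):
        s = set(pair["src"])
        return len(q & s) / max(len(q | s), 1)
    return sorted(corpus, key=score, reverse=True)[:m]

def build_datastore(pairs):
    """Flatten the (key_vector, target_token) entries of the retrieved pairs
    into one small, input-specific datastore."""
    return [(key, tok) for pair in pairs for key, tok in pair["entries"]]

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_distribution(query_vec, datastore, k=2, temperature=1.0):
    """Token-level kNN over the small datastore.
    Returns (token -> probability, distance of the nearest neighbor)."""
    if not datastore:
        return {}, math.inf
    scored = [(squared_l2(query_vec, key), tok) for key, tok in datastore]
    neighbors = sorted(scored, key=lambda item: item[0])[:k]
    weights = defaultdict(float)
    for dist, tok in neighbors:
        weights[tok] += math.exp(-dist / temperature)
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}, neighbors[0][0]

def distance_aware_lambda(nearest_dist, tau=1.0):
    """Assumed weighting: trust retrieval more when the nearest neighbor is close.
    Gives 0 when nearest_dist >= tau, so the NMT model is used unchanged."""
    return max(0.0, 1.0 - nearest_dist / tau)

def interpolate(p_nmt, p_knn, lam):
    """Mix the two distributions: lam * p_knn + (1 - lam) * p_nmt."""
    vocab = set(p_nmt) | set(p_knn)
    return {t: lam * p_knn.get(t, 0.0) + (1.0 - lam) * p_nmt.get(t, 0.0)
            for t in vocab}

# Toy usage: two reference pairs with made-up 2-d keys.
toy_corpus = [
    {"src": ["the", "patient", "has", "fever"],
     "entries": [([1.0, 0.0], "Patient"), ([0.0, 1.0], "Fieber")]},
    {"src": ["stock", "prices", "fell"],
     "entries": [([5.0, 5.0], "Aktien")]},
]
small_store = build_datastore(sentence_retrieve(["patient", "fever"], toy_corpus, m=1))
p_knn, d0 = knn_distribution([0.0, 0.9], small_store)
mixed = interpolate({"Fieber": 0.5, "Temperatur": 0.5}, p_knn, distance_aware_lambda(d0))
print(max(mixed, key=mixed.get))
```

The point of the design is that the token-level search touches only the entries of a few retrieved sentence pairs rather than the whole reference corpus, and that the retrieval weight falls to zero when no close neighbor exists, so the output then equals the plain NMT distribution.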


