Domain Adaptation (DA) of Neural Machine Translation (NMT) models often relies on a pre-trained general NMT model that is adapted to the new domain on a sample of in-domain parallel data. Without parallel data, there is no way to estimate the potential benefit of DA, nor the amount of parallel samples it would require. This is nevertheless a desirable capability that could help MT practitioners make an informed decision before investing resources in dataset creation. We propose a Domain adaptation Learning Curve prediction (DaLC) model that predicts prospective DA performance based on in-domain monolingual samples in the source language. Our model relies on the NMT encoder representations combined with various instance- and corpus-level features. We demonstrate that instance-level models are better able to distinguish between different domains than the corpus-level frameworks proposed in previous studies. Finally, we perform in-depth analyses of the results, highlighting the limitations of our approach, and provide directions for future research.