Uncertainty-Aware Balancing for Multilingual and Multi-Domain Neural Machine Translation Training

Learning a multilingual and multi-domain translation model is challenging because heterogeneous and imbalanced data make the model converge inconsistently across different corpora in the real world. One common practice is to adjust the share of each corpus in training, so that the learning process is balanced and low-resource cases can benefit from high-resource ones. However, automatic balancing methods usually depend on intra- and inter-dataset characteristics, which are usually agnostic or require human priors. In this work, we propose an approach, MultiUAT, that dynamically adjusts the training data usage based on the model's uncertainty on a small set of trusted clean data for multi-corpus machine translation. We experiment with two classes of uncertainty measures on multilingual (16 languages with 4 settings) and multi-domain settings (4 for in-domain and 2 for out-of-domain on English-German translation) and demonstrate that our approach MultiUAT substantially outperforms its baselines, including both static and dynamic strategies. We analyze the cross-domain transfer and show the deficiency of static and similarity-based methods.
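To make the idea of uncertainty-driven balancing concrete, the following is a minimal sketch and not the MultiUAT algorithm itself, whose update rule is not given in the abstract. It assumes that one scalar uncertainty score per corpus has already been measured on a small trusted set (here, mean token-level entropy, chosen only for illustration) and that sampling shares are obtained with a temperature softmax. The function names `mean_token_entropy`, `sampling_probs`, and `sample_corpus`, and the `temperature` parameter, are hypothetical.

```python
import math
import random


def mean_token_entropy(token_distributions):
    """Average entropy (in nats) of a list of predicted token distributions.

    Assumption: each entry is a probability vector the model produced for one
    target position on the trusted set.
    """
    total = 0.0
    for dist in token_distributions:
        total += -sum(p * math.log(p) for p in dist if p > 0)
    return total / len(token_distributions)


def sampling_probs(uncertainty, temperature=1.0):
    """Turn per-corpus uncertainty scores into sampling probabilities.

    Assumption: a softmax with temperature, so corpora on which the model is
    more uncertain receive a larger share of the next training batches.
    """
    top = max(uncertainty.values())
    # Subtract the maximum before exponentiating for numerical stability.
    exps = {k: math.exp((u - top) / temperature) for k, u in uncertainty.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def sample_corpus(probs, rng):
    """Draw the corpus from which the next batch is taken."""
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]


if __name__ == "__main__":
    # Toy uncertainty scores; these numbers are illustrative, not results.
    scores = {"en-de": 1.2, "en-fr": 0.8, "en-gu": 2.5}
    probs = sampling_probs(scores, temperature=1.0)
    rng = random.Random(0)
    print(probs)
    print([sample_corpus(probs, rng) for _ in range(5)])
```

In a training loop, the scores would be recomputed periodically on the trusted set and the sampling probabilities refreshed, which is what distinguishes a dynamic strategy from a static, size-based schedule.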

