Ensemble-based Transfer Learning for Low-resource Machine Translation Quality Estimation

Quality Estimation (QE) of Machine Translation (MT) is the task of estimating quality scores for translation outputs produced by an unknown MT system. However, QE scores for low-resource languages are usually hard to collect. In this paper, we focus on the Sentence-Level QE Shared Task of the Fifth Conference on Machine Translation (WMT20), but in a more challenging setting: we aim to predict QE scores for translation outputs when almost no QE scores for that language pair are available during training. We propose an ensemble-based predictor-estimator QE model with transfer learning to overcome this QE data scarcity by leveraging QE scores from other, miscellaneous languages together with translation results for the targeted languages. Based on the evaluation results, we provide a detailed analysis of how each of our extensions affects the reliability of QE models and their ability to generalize when performing transfer learning in multilingual tasks. Finally, we achieve the best performance with an ensemble model that combines models pretrained on individual languages with models pretrained on different levels of parallel training corpora, reaching a Pearson's correlation of 0.298, which is 2.54 times higher than the baselines.
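As a rough illustration of the evaluation described in the abstract, the sketch below combines sentence-level scores from several QE models and measures Pearson's correlation against gold scores. The abstract does not say how the ensemble members are combined, so the unweighted mean used here is an assumption, and the function names (`pearson`, `ensemble_mean`) and the toy scores are hypothetical rather than taken from the paper.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def ensemble_mean(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Average per-sentence scores across models.

    Assumption: an unweighted mean; the paper may combine models differently.
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must score the same sentences")
    return [sum(column) / len(predictions) for column in zip(*predictions)]


if __name__ == "__main__":
    # Toy, made-up scores for four sentences (not data from the paper).
    gold = [0.10, 0.40, 0.35, 0.80]
    model_a = [0.20, 0.30, 0.50, 0.60]
    model_b = [0.05, 0.55, 0.30, 0.70]

    combined = ensemble_mean([model_a, model_b])
    print("model A :", round(pearson(model_a, gold), 3))
    print("model B :", round(pearson(model_b, gold), 3))
    print("ensemble:", round(pearson(combined, gold), 3))
```

Averaging is only the simplest possible combination rule; a weighted mean or a learned regressor over the member scores would slot into `ensemble_mean` without changing the evaluation step.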


