Quality Estimation (QE) of Machine Translation (MT) is the task of estimating quality scores for translation outputs produced by an unknown MT system. However, QE scores for low-resource languages are usually intractable and hard to collect. In this paper, we focus on the Sentence-Level QE Shared Task of the Fifth Conference on Machine Translation (WMT20), but in a more challenging setting: we aim to predict QE scores for given translation outputs when almost no QE scores for that language pair are available during training. We propose an ensemble-based predictor-estimator QE model with transfer learning to overcome this QE data scarcity by leveraging QE scores from other, miscellaneous languages and translation results for the targeted languages. Based on the evaluation results, we provide a detailed analysis of how each of our extensions affects the reliability of QE models and their ability to generalize when performing transfer learning under multilingual tasks. Finally, we achieve the best performance with an ensemble model that combines models pretrained on individual languages and on different levels of parallel training corpora, reaching a Pearson's correlation of 0.298, which is 2.54 times higher than the baselines.
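To make the evaluation metric concrete, the following is a minimal sketch of how an ensemble's sentence-level predictions could be scored with Pearson's correlation against gold QE scores. It is an illustration only: the abstract does not specify how the ensemble combines its members, so the simple averaging in `ensemble_average` is an assumption, and the function names and toy scores are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ensemble_average(predictions):
    """Average per-sentence scores across models.

    Assumption: plain unweighted averaging; the actual combination
    scheme used in the paper is not stated in the abstract.
    """
    return [mean(scores) for scores in zip(*predictions)]


# Hypothetical toy data: gold QE scores and two models' predictions
# for the same four sentences.
gold = [0.1, 0.4, 0.35, 0.8]
model_a = [0.2, 0.3, 0.5, 0.7]
model_b = [0.0, 0.5, 0.3, 0.9]

ensemble = ensemble_average([model_a, model_b])
print(f"Pearson's r of the ensemble: {pearson(ensemble, gold):.3f}")
```

In this reading, the reported 0.298 would be the value of such a correlation computed over the test sentences of the target language pair.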