Automatic evaluation of translation quality with metrics such as BLEU, METEOR, and BERTScore generally requires a parallel corpus. While this reference-based evaluation paradigm is widely used in machine translation, it is difficult to apply to translation involving low-resource languages, which suffer from a scarcity of corpora. Round-trip translation offers an encouraging way to alleviate the pressing need for parallel corpora, although it was unfortunately not observed to correlate with forward translation in the era of statistical machine translation. In this paper, we first observe that forward translation quality consistently correlates with the corresponding round-trip translation quality in neural machine translation. We then carefully analyse and unveil the reason for the contradictory results reported for statistical machine translation systems. Second, we propose a simple yet effective regression method to predict forward translation scores from round-trip translation scores for various language pairs, including pairs between very low-resource languages. We conduct extensive experiments to show the effectiveness and robustness of the predictive models on 1,000+ language pairs. Finally, we test our method in challenging settings, such as predicting scores i) for language pairs unseen during training and ii) on real-world WMT shared tasks in new domains. The extensive experiments demonstrate the robustness and utility of our approach. We believe our work will inspire future research on very low-resource multilingual machine translation.
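The regression idea above can be sketched minimally. The paper's actual model and features are not specified here, so the following is a hypothetical illustration: it fits a simple least-squares linear map from round-trip translation (RTT) scores to forward translation (FT) scores on synthetic data; the names `fit_ft_from_rtt` and `predict_ft` and the synthetic correlation are assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the paper's model): predict forward-translation
# scores from round-trip-translation scores with a linear least-squares fit.
import numpy as np

def fit_ft_from_rtt(rtt_scores, ft_scores):
    """Fit ft ~ a * rtt + b by ordinary least squares; return (a, b)."""
    A = np.vstack([rtt_scores, np.ones_like(rtt_scores)]).T
    (a, b), *_ = np.linalg.lstsq(A, ft_scores, rcond=None)
    return a, b

def predict_ft(rtt_scores, a, b):
    """Predict forward-translation scores for new round-trip scores."""
    return a * np.asarray(rtt_scores) + b

# Synthetic data standing in for (RTT BLEU, FT BLEU) pairs across language
# pairs, assuming an approximately linear relationship plus noise.
rng = np.random.default_rng(0)
rtt = rng.uniform(5.0, 40.0, size=50)
ft = 0.8 * rtt + 2.0 + rng.normal(0.0, 0.5, size=50)

a, b = fit_ft_from_rtt(rtt, ft)
pred = predict_ft(rtt, a, b)
```

In practice the paper evaluates such predictors across 1,000+ language pairs, including pairs unseen during training, so a real implementation would hold out entire language pairs when validating the fit.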