This work introduces a simple regressive ensemble for evaluating machine translation quality based on a set of novel and established metrics. We evaluate the ensemble by its correlation with expert-based MQM scores from the WMT 2021 Metrics workshop. In both monolingual and zero-shot cross-lingual settings, we show a significant performance improvement over single metrics. In the cross-lingual setting, we also demonstrate that the ensemble approach generalizes well to unseen languages. Furthermore, we identify a strong reference-free baseline that consistently outperforms the commonly used BLEU and METEOR measures and significantly improves our ensemble's performance.
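As a rough illustration of the idea the abstract describes, the sketch below fits a regression over per-segment scores from several base metrics and then evaluates the ensemble by its Pearson correlation with human MQM judgements. All data values, the feature set, and the choice of scikit-learn's LinearRegression are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a regressive metric ensemble, assuming per-segment
# scores from individual metrics (e.g., BLEU, METEOR, a reference-free
# metric) are already computed. Values below are hypothetical.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import LinearRegression

# Hypothetical training data: each row holds one segment's scores from
# three base metrics; y_train holds the matching expert MQM scores
# (0 is best, more negative means more severe errors).
X_train = np.array([
    [0.41, 0.55, 0.62],   # [bleu, meteor, reference_free_metric]
    [0.73, 0.80, 0.77],
    [0.12, 0.25, 0.30],
    [0.58, 0.61, 0.66],
])
y_train = np.array([-5.0, -1.0, -12.0, -3.0])

# Fit the regression that maps base-metric scores to one quality estimate.
ensemble = LinearRegression().fit(X_train, y_train)

# Score held-out segments and evaluate the ensemble the way the abstract
# describes: by correlation with expert MQM judgements.
X_test = np.array([
    [0.50, 0.57, 0.60],
    [0.20, 0.31, 0.35],
    [0.67, 0.72, 0.70],
])
y_test = np.array([-4.0, -10.0, -2.0])
predictions = ensemble.predict(X_test)
correlation, _ = pearsonr(predictions, y_test)
print(f"Pearson correlation with MQM: {correlation:.3f}")
```

In this framing, the zero-shot cross-lingual setting corresponds to fitting the regression on segments from some language pairs and computing the correlation on segments from language pairs unseen during training.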