This work applies Minimum Bayes Risk (MBR) decoding to optimize diverse automated metrics of translation quality. Automatic metrics in machine translation have made tremendous progress recently. In particular, neural metrics fine-tuned on human ratings (e.g., BLEURT or COMET) now outperform surface metrics in terms of correlation with human judgments. Our experiments show that combining a neural translation model with a neural reference-based metric, BLEURT, yields significant improvements in both automatic and human evaluations. This improvement is obtained with translations that differ from classical beam-search output: they have much lower likelihood and are less favored by surface metrics such as BLEU.
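To make the decoding rule concrete, the following is a minimal sketch of sample-based MBR selection, not the authors' implementation. It assumes that a pool of candidate translations has already been drawn from the translation model and that the same pool serves as pseudo-references; the abstract does not specify either detail. The names `unigram_f1` and `mbr_decode` are hypothetical, and the unigram-F1 utility is only a toy stand-in for a learned metric such as BLEURT.

```python
from collections import Counter
from typing import Callable, Sequence


def unigram_f1(hypothesis: str, reference: str) -> float:
    """Toy utility: unigram F1 overlap (assumed stand-in for a metric like BLEURT)."""
    hyp, ref = Counter(hypothesis.split()), Counter(reference.split())
    overlap = sum((hyp & ref).values())  # multiset intersection of tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def mbr_decode(
    candidates: Sequence[str],
    utility: Callable[[str, str], float] = unigram_f1,
) -> str:
    """Return the candidate with the highest mean utility against the other candidates."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    if len(candidates) == 1:
        return candidates[0]

    def expected_utility(i: int) -> float:
        # Treat every other candidate as a pseudo-reference (an assumption).
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(utility(candidates[i], ref) for ref in others) / len(others)

    best = max(range(len(candidates)), key=expected_utility)
    return candidates[best]


if __name__ == "__main__":
    pool = [
        "dogs bark loudly",
        "the cat sat on the mat",
        "the cat sat on the mat",
        "the cat is on the mat",
    ]
    print(mbr_decode(pool))
```

In this sketch the selected output is the candidate that agrees most with the rest of the pool under the chosen utility, regardless of its position or model score. Swapping `unigram_f1` for a neural metric changes which candidate wins, which is consistent with the abstract's observation that the chosen translations can differ from beam-search output.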