Machine translation quality estimation (QE) predicts human judgements of a translation hypothesis without access to the reference. State-of-the-art QE systems based on pretrained language models achieve remarkable correlations with human judgements, yet they are computationally heavy and require human annotations, which are slow and expensive to create. To address these limitations, we define the problem of metric estimation (ME), where one predicts automated metric scores, also without the reference. We show that even without access to the reference, our model can estimate automated metrics at the sentence level ($\rho$=60% for BLEU, $\rho$=51% for other metrics). Because automated metrics correlate with human judgements, we can leverage the ME task for pre-training a QE model. For the QE task, we find that pre-training on TER ($\rho$=23%) is better than training from scratch ($\rho$=20%).
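As a rough illustration of how the sentence-level correlations reported above could be evaluated in the ME setting, the sketch below computes Spearman's $\rho$ between true automated-metric scores and reference-free predictions. The scores are toy values invented for this example (not data from the paper), and the plain-Python Spearman implementation is a minimal stand-in for a library routine such as `scipy.stats.spearmanr`.

```python
def rank(xs):
    """Return 1-based ranks of xs, averaging ranks over ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        # find the run of tied values starting at position i
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    ra, rb = rank(a), rank(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra) ** 0.5
    vb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (va * vb)


# Toy data: true sentence-level metric scores vs. reference-free predictions.
true_scores = [0.12, 0.45, 0.33, 0.78, 0.51]
pred_scores = [0.10, 0.40, 0.50, 0.70, 0.35]
print(round(spearman(true_scores, pred_scores), 2))  # prints 0.6
```

In the actual evaluation the true scores would come from a metric such as BLEU or TER computed against the reference, while the predictions come from the ME model that never sees the reference.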