This paper addresses the problem of evaluating the quality of automatically generated subtitles, which includes not only the quality of the machine-transcribed or translated speech, but also the quality of line segmentation and subtitle timing. We propose SubER, a single novel metric based on edit distance with shifts that takes all of these subtitle properties into account. We compare it to existing metrics for evaluating transcription, translation, and subtitle quality. A careful human evaluation in a post-editing scenario shows that the new metric correlates highly with post-editing effort and direct human assessment scores, outperforming baseline metrics that consider only the subtitle text, such as WER and BLEU, as well as existing methods to integrate segmentation and timing features.
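To make the core idea concrete, the sketch below scores a hypothesis subtitle file against a reference by treating line breaks and subtitle-block breaks as ordinary tokens and computing a normalized word-level edit distance. This is only a simplified illustration under assumptions, not the authors' implementation: the shift operation and the timing-based alignment of hypothesis to reference subtitles that SubER also uses are omitted, and the marker strings `<eol>`/`<eob>` and all function names are placeholders chosen for this example.

```python
# Simplified, illustrative sketch of an edit-distance-based subtitle score.
# NOT the official SubER implementation: the shift operation and the
# timing-based hypothesis-reference alignment are omitted.

def tokenize_subtitles(subtitles):
    """Flatten subtitles into one token sequence, inserting <eol> between
    lines of a block and <eob> after each subtitle block (placeholder markers)."""
    tokens = []
    for block in subtitles:                     # block: list of line strings
        for i, line in enumerate(block):
            tokens.extend(line.split())
            if i < len(block) - 1:
                tokens.append("<eol>")
        tokens.append("<eob>")
    return tokens


def edit_distance(hyp, ref):
    """Standard word-level Levenshtein distance (insert/delete/substitute)."""
    d = list(range(len(ref) + 1))               # DP row over reference positions
    for i, h in enumerate(hyp, start=1):
        prev, d[0] = d[0], i
        for j, r in enumerate(ref, start=1):
            cur = min(d[j] + 1,                 # extra hypothesis word
                      d[j - 1] + 1,             # missing hypothesis word
                      prev + (h != r))          # substitution or match
            prev, d[j] = d[j], cur
    return d[-1]


def simplified_subtitle_score(hyp_subtitles, ref_subtitles):
    """Edit distance over text plus break tokens, normalized by reference length."""
    hyp = tokenize_subtitles(hyp_subtitles)
    ref = tokenize_subtitles(ref_subtitles)
    return 100.0 * edit_distance(hyp, ref) / max(len(ref), 1)


if __name__ == "__main__":
    ref = [["Hello world,", "how are you?"]]    # one subtitle block, two lines
    hyp = [["Hello world, how", "are you?"]]    # same words, misplaced line break
    # Only the misplaced break costs edits, so segmentation errors are penalized
    # even when the transcribed/translated text itself is correct.
    print(f"score: {simplified_subtitle_score(hyp, ref):.1f}")
```

As in the metric described by the abstract, a lower score is better, and segmentation errors raise the score even when the underlying text matches the reference exactly; the full metric additionally accounts for block shifts and subtitle timing.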