While automatic summarization evaluation methods developed for English are routinely applied to other languages, this is the first attempt to systematically quantify their panlinguistic efficacy. We take a summarization corpus for eight different languages, and manually annotate generated summaries for focus (precision) and coverage (recall). Based on this, we evaluate 19 summarization evaluation metrics, and find that using multilingual BERT within BERTScore performs well across all languages, at a level above that for English.
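To show how precision and recall arise in BERTScore, here is a minimal sketch of its greedy token-matching step. It is not the authors' code and leaves out BERTScore's optional idf weighting and baseline rescaling. The vectors are hypothetical toy embeddings standing in for contextual token embeddings from a model such as multilingual BERT. Each candidate token is matched to its most similar reference token to give precision, and each reference token to its most similar candidate token to give recall. This mirrors the focus (precision) and coverage (recall) split used in the annotation.

```python
import math

Vector = list[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two token embeddings (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_bertscore(candidate: list[Vector], reference: list[Vector]) -> tuple[float, float, float]:
    """Simplified BERTScore (no idf weighting, no baseline rescaling).

    candidate, reference: lists of token embeddings (hypothetical toy vectors here).
    Returns (precision, recall, f1).
    """
    # Precision: how well each candidate token is supported by the reference (focus).
    precision = sum(max(cosine(c, r) for r in reference) for c in candidate) / len(candidate)
    # Recall: how well each reference token is covered by the candidate (coverage).
    recall = sum(max(cosine(r, c) for c in candidate) for r in reference) / len(reference)
    # Harmonic mean, guarded against a zero denominator.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Usage: a one-token summary that matches one of two reference tokens exactly.
p, r, f = greedy_bertscore(candidate=[[1.0, 0.0]], reference=[[1.0, 0.0], [0.0, 1.0]])
print(p, r, round(f, 4))
```

In this toy case the short candidate scores perfect precision (1.0) but only 0.5 recall. It is fully on-topic yet covers only half of the reference content, which is the distinction that separate focus and coverage annotations are meant to capture.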