In text summarization and simplification, system outputs must be evaluated along multiple dimensions such as relevance, factual consistency, fluency, and grammaticality, and a wide range of possible outputs could be of high quality. These properties make the development of an adaptable, reference-less evaluation metric both necessary and challenging. We introduce MaskEval, a reference-less metric for text summarization and simplification that operates by performing masked language modeling (MLM) on the concatenation of the candidate and the source texts. It features an attention-like weighting mechanism to modulate the relative importance of each MLM step, which crucially allows it to be adapted to evaluate different quality dimensions. We demonstrate its effectiveness on English summarization and simplification in terms of correlations with human judgments, and explore transfer scenarios between the two tasks.
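To make the mechanism concrete, the following is a minimal sketch of the two ingredients named above: masking steps over the concatenation of candidate and source, and a weighted aggregation of the per-step MLM results. The abstract does not specify the details, so everything here is an assumption made for illustration: the names `masking_steps` and `weighted_mlm_score` are hypothetical, one token is masked per step, the separator and mask symbols are placeholders, and the "attention-like" weights are modeled as a softmax over per-step scores. The per-step probabilities would come from an actual masked language model, which is not included.

```python
import math


def masking_steps(candidate, source, sep="[SEP]", mask="[MASK]"):
    """Yield (masked_tokens, target_token) pairs, one per masking step.

    Assumption: the input is candidate + separator + source, and each
    step masks exactly one non-separator token.
    """
    tokens = list(candidate) + [sep] + list(source)
    sep_index = len(candidate)
    for i, target in enumerate(tokens):
        if i == sep_index:
            continue  # never mask the separator itself
        yield tokens[:i] + [mask] + tokens[i + 1:], target


def softmax(scores):
    """Numerically stable softmax over a non-empty list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_mlm_score(step_probs, step_logits):
    """Combine per-step MLM probabilities into one score.

    step_probs[i]:  probability the MLM gave to the true token at step i.
    step_logits[i]: unnormalized importance of step i (assumed to be
                    learned per quality dimension; hypothetical here).
    """
    if not step_probs or len(step_probs) != len(step_logits):
        raise ValueError("need one logit per MLM step, and at least one step")
    weights = softmax(step_logits)
    return sum(w * p for w, p in zip(weights, step_probs))


if __name__ == "__main__":
    for masked, target in masking_steps(["a", "cat"], ["the", "cat", "sat"]):
        print(masked, "->", target)
    # With equal logits the score reduces to the plain mean of the step probabilities.
    print(weighted_mlm_score([0.9, 0.5, 0.1], [0.0, 0.0, 0.0]))
```

Under these assumptions, adapting the metric to a different quality dimension amounts to changing only `step_logits`: the same MLM probabilities are reweighted rather than recomputed.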