Evaluating automatically generated text summaries is a challenging task. While many interesting approaches have been proposed, they still fall short of human evaluations. We present RISE, a new approach for evaluating summaries that leverages techniques from information retrieval. RISE is first trained on a retrieval task using a dual-encoder setup, and can subsequently be used to evaluate a generated summary given an input document, without gold reference summaries. RISE is especially well suited to new datasets where reference summaries may not be available for evaluation. We conduct comprehensive experiments on the SummEval benchmark (Fabbri et al., 2021), and the results show that RISE achieves higher correlation with human evaluations than many past approaches to summarization evaluation. Furthermore, RISE also demonstrates data efficiency and generalizability across languages.
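The reference-free scoring idea in the abstract can be illustrated with a minimal sketch: a dual encoder embeds the document and the candidate summary separately, and their similarity serves as the quality score, with no gold reference summary involved. The toy bag-of-words "encoder" below is an assumption standing in for RISE's trained neural encoders; the function name `rise_style_score` is hypothetical and not part of any released API.

```python
import math
from collections import Counter

def encode(text: str) -> Counter:
    """Toy stand-in for a trained neural encoder: a sparse
    bag-of-words vector mapping token -> count."""
    return Counter(text.lower().split())

def rise_style_score(document: str, summary: str) -> float:
    """Cosine similarity between document and summary embeddings.

    No gold reference summary is needed: the input document itself
    plays the role of the retrieval query the summary is scored against.
    """
    d, s = encode(document), encode(summary)
    dot = sum(d[t] * s[t] for t in d)
    norm = (math.sqrt(sum(v * v for v in d.values()))
            * math.sqrt(sum(v * v for v in s.values())))
    return dot / norm if norm else 0.0

doc = "The committee approved the new budget after a long debate."
good = "The committee approved the budget after a debate."
off_topic = "The weather was sunny all week."
# A faithful summary should score higher against the document
# than an off-topic one.
print(rise_style_score(doc, good) > rise_style_score(doc, off_topic))  # True
```

In the actual retrieval formulation, the encoders would be trained so that a document retrieves its matching summary from a large pool, which is what lets the learned similarity correlate with human quality judgments.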