Factual consistency in abstractive summarization has attracted much attention in recent years, and evaluating the factual consistency between a summary and its source document has become an important and urgent task. Most current evaluation metrics are adapted from question answering (QA). However, QA-based metrics are extremely time-consuming in practice, which severely prolongs the iteration cycle of abstractive summarization research. In this paper, we propose ClozE, a new method that evaluates factual consistency with a cloze model instantiated from a masked language model (MLM), offering strong interpretability and substantially higher speed. Through experiments on six human-annotated datasets and the meta-evaluation benchmark GO FIGURE \citep{gabriel2020go}, we demonstrate that ClozE reduces evaluation time by nearly 96\% relative to QA-based metrics while retaining their interpretability and performance. We conduct further experiments to characterize ClozE in terms of performance and speed. In addition, we present an experimental analysis of the limitations of ClozE, which suggests directions for future research. The code and models for ClozE will be released upon acceptance of the paper.
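
The abstract does not spell out the scoring procedure, so the following is only a minimal sketch of the general cloze idea, not the ClozE implementation. It assumes that factual spans in the summary are masked one at a time, that a filler model restores each mask using the document as context, and that the score is the fraction of restored spans matching the original ones. The names `cloze_consistency`, `toy_fill`, and `factors` are hypothetical, and `toy_fill` is a trivial stand-in for a real masked language model.

```python
from typing import Callable, List

MASK = "[MASK]"


def cloze_consistency(
    document: str,
    summary: str,
    factors: List[str],
    fill: Callable[[str, str], str],
) -> float:
    """Fraction of masked summary spans that the filler restores unchanged.

    `factors` are the summary spans to check (assumed to be given, e.g. entities).
    `fill(document, masked_summary)` returns the filler's guess for the mask.
    """
    if not factors:
        return 1.0  # nothing to check; treated as consistent by convention here
    matches = 0
    for span in factors:
        masked = summary.replace(span, MASK, 1)  # mask one span at a time
        predicted = fill(document, masked)
        matches += int(predicted.strip().lower() == span.strip().lower())
    return matches / len(factors)


def toy_fill(document: str, masked_summary: str) -> str:
    """Hypothetical stand-in for an MLM.

    Returns the document word that follows the word immediately left of the mask.
    A real system would condition a masked language model on the document.
    """
    left_context = masked_summary.split(MASK)[0].split()
    doc_words = document.split()
    if left_context:
        previous = left_context[-1]
        for i, word in enumerate(doc_words[:-1]):
            if word == previous:
                return doc_words[i + 1]
    return ""


document = "The company was founded by Alice in 2010"
summary = "The company was founded by Alice in 2012"
score = cloze_consistency(document, summary, ["Alice", "2012"], toy_fill)
print(score)
```

In this sketch, each mismatch between a summary span and the filler's answer points at a specific suspect span (here, the year), which illustrates the kind of span-level interpretability the abstract refers to.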