Despite the significant progress achieved in text summarization, factual inconsistencies in generated summaries still severely limit its practical applications. Among the key factors for ensuring factual consistency, a reliable automatic evaluation metric is the first and most crucial. However, existing metrics either neglect the intrinsic cause of factual inconsistency or rely on auxiliary tasks, leading to an unsatisfactory correlation with human judgments or reduced convenience in practice. In light of these challenges, we propose a novel metric to evaluate factual consistency in text summarization via counterfactual estimation, which formulates the causal relationships among the source document, the generated summary, and the language prior. We remove the effect of the language prior, which can cause factual inconsistency, from the total causal effect on the generated summary, providing a simple yet effective way to evaluate consistency without relying on auxiliary tasks. We conduct a series of experiments on three public abstractive text summarization datasets and demonstrate the advantages of the proposed metric in both improving the correlation with human judgments and the convenience of usage. The source code is available at https://github.com/xieyxclack/factual_coco.
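To make the counterfactual idea concrete, the sketch below scores each summary token twice: once conditioned on the full source document (the total effect) and once conditioned on a masked source that exposes only the language prior; the difference is the consistency score. This is a minimal illustration, not the authors' exact implementation (see the repository above for that): the choice of facebook/bart-large-cnn as the scoring model and the use of an empty string as the masked source are simplifying assumptions.

```python
# Hedged sketch of counterfactual consistency scoring. A summary token that
# remains highly probable even without the source document is suspected of
# being generated from the language prior rather than the source facts.
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

# Assumption: an off-the-shelf summarizer serves as the scoring model.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
model.eval()

def token_logprobs(source: str, summary: str) -> torch.Tensor:
    """Per-token log-probabilities of `summary` conditioned on `source`."""
    enc = tokenizer(source, return_tensors="pt", truncation=True)
    dec = tokenizer(summary, return_tensors="pt", truncation=True)
    with torch.no_grad():
        # Passing `labels` lets the model align logits with summary tokens.
        logits = model(input_ids=enc.input_ids, labels=dec.input_ids).logits
    logp = torch.log_softmax(logits, dim=-1)
    # Pick out the log-probability assigned to each actual summary token.
    return logp.gather(-1, dec.input_ids.unsqueeze(-1)).squeeze(-1)[0]

def counterfactual_score(source: str, summary: str) -> float:
    """Total causal effect minus the language-prior effect, token-averaged.

    The empty masked source is an illustrative stand-in for the
    counterfactual input, not necessarily the paper's masking strategy.
    """
    full_effect = token_logprobs(source, summary)
    prior_effect = token_logprobs("", summary)
    return (full_effect - prior_effect).mean().item()
```

Under this formulation, a higher score indicates that the summary depends more on the source document than on the language prior; in practice one might mask only selected spans of the source rather than the whole document.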