Abstractive summarization models often generate inconsistent summaries containing factual errors or hallucinated content. Recent work focuses on correcting factual errors in generated summaries via post-editing. Such correction models are trained on adversarial non-factual summaries constructed with heuristic rules for injecting errors. However, heuristically generated non-factual summaries often do not generalize well to actual model errors. In this work, we propose to generate hard, representative synthetic examples of non-factual summaries through infilling language models. With this data, we train a more robust fact-correction model that post-edits summaries to improve factual consistency. Through quantitative and qualitative experiments on two popular summarization datasets -- CNN/DM and XSum -- we show that our approach vastly outperforms prior methods in correcting erroneous summaries. Our model -- FactEdit -- improves factuality scores by over 11 points on CNN/DM and over 31 points on XSum on average across multiple summarization models, producing more factual summaries while maintaining competitive summarization quality.
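To illustrate the core idea of producing hard synthetic non-factual summaries with an infilling language model, the following is a minimal sketch using a BART-style mask-infilling model from HuggingFace Transformers. The specific model checkpoint, masking choice, and example sentence are illustrative assumptions, not the paper's exact data-generation pipeline.

```python
from transformers import BartTokenizer, BartForConditionalGeneration

# Assumption: a generic BART checkpoint is used for span infilling; the paper's
# actual setup may differ in model, masking strategy, and filtering.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

summary = "The agreement was signed by the prime minister in Paris."
# Mask a factual span (here, a location entity) so the infilling model can
# propose plausible but potentially incorrect substitutes.
masked = summary.replace("Paris", "<mask>")

inputs = tokenizer(masked, return_tensors="pt")
outputs = model.generate(
    **inputs, num_beams=5, num_return_sequences=3, max_length=32
)
candidates = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

# Candidates that differ from the original summary serve as synthetic
# non-factual training examples for the correction model.
negatives = [c for c in candidates if c != summary]
print(negatives)
```

Because the infilled spans are fluent completions rather than random or rule-based corruptions, the resulting negatives are closer to the kinds of errors summarization models actually make, which is the motivation stated above.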