We present FactPEGASUS, an abstractive summarization model that addresses the problem of factuality during pre-training and fine-tuning: (1) We augment the sentence selection strategy of the PEGASUS (Zhang et al., 2020) pre-training objective to create pseudo-summaries that are both important and factual; (2) We introduce three complementary components for fine-tuning: the corrector removes hallucinations present in the reference summary, the contrastor uses contrastive learning to better differentiate nonfactual summaries from factual ones, and the connector bridges the gap between pre-training and fine-tuning for better transfer of knowledge. Experiments on three downstream tasks demonstrate that FactPEGASUS substantially improves factuality as evaluated by multiple automatic metrics and by humans. Our thorough analysis suggests that FactPEGASUS is more factual than using the original pre-training objective in zero-shot and few-shot settings, retains factual behavior more robustly than strong baselines, and does not rely entirely on becoming more extractive to improve factuality. Our code and data are publicly available at: https://github.com/meetdavidwan/factpegasus
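To illustrate the kind of objective the contrastor builds on, a standard contrastive loss of the InfoNCE family can be written as below. This is a generic sketch rather than the paper's exact formulation: the symbols $z$, $z^+$, $z^-_j$, the similarity function $\mathrm{sim}(\cdot,\cdot)$, and the temperature $\tau$ are illustrative notation, where $z^+$ stands for the representation of a factual (reference) summary and $z^-_j$ for representations of nonfactual (hallucinated) negatives.

$$
\mathcal{L}_{\text{con}} = -\log \frac{\exp\!\big(\mathrm{sim}(z, z^+)/\tau\big)}{\exp\!\big(\mathrm{sim}(z, z^+)/\tau\big) + \sum_{j} \exp\!\big(\mathrm{sim}(z, z^-_j)/\tau\big)}
$$

Minimizing such a loss pulls the model's representation toward the factual summary and pushes it away from the hallucinated ones, which is the intuition behind using contrastive learning to separate factual from nonfactual outputs.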