The performance of text summarization has been greatly boosted by pre-trained language models. A main concern of existing methods is that most generated summaries are factually inconsistent with their source documents. To alleviate the problem, many efforts have focused on developing effective factuality evaluation metrics based on natural language inference, question answering, and syntactic dependency, among others. However, these approaches are limited by either their high computational complexity or the uncertainty introduced by multi-component pipelines, resulting in only partial agreement with human judgement. Most recently, large language models (LLMs) have shown excellent performance in not only text generation but also language comprehension. In this paper, we particularly explore ChatGPT's ability to evaluate factual inconsistency under a zero-shot setting by examining it on both coarse-grained and fine-grained evaluation tasks, including binary entailment inference, summary ranking, and consistency rating. Experimental results show that ChatGPT generally outperforms previous evaluation metrics across the three tasks, indicating its great potential for factual inconsistency evaluation. However, a closer inspection of ChatGPT's output reveals certain limitations, including a preference for more lexically similar candidates, false reasoning, and inadequate understanding of instructions.
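To make the zero-shot setup concrete, the sketch below illustrates how a binary entailment inference query could be posed to a ChatGPT-style model through the OpenAI chat API. The model name, prompt wording, and helper function are illustrative assumptions, not the paper's exact configuration.

```python
# A minimal sketch of zero-shot binary entailment inference for factual
# consistency; the prompt text and model name are assumptions for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def check_consistency(document: str, summary: str) -> str:
    """Ask the model whether the summary is factually consistent with the document."""
    prompt = (
        "Decide if the following summary is consistent with the corresponding article. "
        "Answer only yes or no.\n\n"
        f"Article: {document}\n\nSummary: {summary}\n\nAnswer:"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output, suitable for evaluation
    )
    return response.choices[0].message.content.strip().lower()


# Example usage (hypothetical inputs):
# verdict = check_consistency(article_text, candidate_summary)  # "yes" or "no"
```

The same pattern extends to the other two tasks by changing the instruction: asking the model to pick the more consistent of two candidate summaries (summary ranking) or to output a numeric score (consistency rating).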