GPT-3时代的新闻总结和评价 (News Summarization and Evaluation in the Era of GPT-3) - 专知论文

会员服务 ·

0

GPT-3 · MoDELS · 小样本学习 · Next · Prompt ·

2022 年 9 月 26 日

News Summarization and Evaluation in the Era of GPT-3

翻译：GPT-3时代的新闻总结和评价

Tanya Goyal,Junyi Jessy Li,Greg Durrett

from arxiv, All data shared at: https://tagoyal.github.io/zeroshot-news-annotations.html

The recent success of zero- and few-shot prompting with models like GPT-3 has led to a paradigm shift in NLP research. In this paper, we study its impact on text summarization, focusing on the classic benchmark domain of news summarization. First, we investigate how zero-shot GPT-3 compares against fine-tuned models trained on large summarization datasets. We show that not only do humans overwhelmingly prefer GPT-3 summaries, but these also do not suffer from common dataset-specific issues such as poor factuality. Next, we study what this means for evaluation, particularly the role of gold standard test sets. Our experiments show that both reference-based and reference-free automatic metrics, e.g. recently proposed QA- or entailment-based factuality approaches, cannot reliably evaluate zero-shot summaries. Finally, we discuss future research challenges beyond generic summarization, specifically, keyword- and aspect-based summarization, showing how dominant fine-tuning approaches compare to zero-shot prompting. To support further research, we release: (a) a corpus of 10K generated summaries from fine-tuned and zero-shot models across 4 standard summarization benchmarks, (b) 1K human preference judgments and rationales comparing different systems for generic- and keyword-based summarization.

翻译：最近,GPT-3等模型的零点和点点点成功,导致了NLP研究的范式转变。在本文中,我们研究了它对文本总和的影响,重点是典型的新闻总和基准领域。首先,我们调查了零点GPT-3如何与经过大规模总和数据集培训的微调模型相比。我们发现,人类不仅压倒性地更喜欢GPT-3摘要,而且这些也不受诸如事实质量差等共同的数据集特有问题的影响。接下来,我们研究了这对评价的意义,特别是金标准测试组的作用。我们的实验表明,基于参考和无参考的自动计量方法,例如最近提出的QA-或基于要求的事实质量方法,都无法可靠地评价零点摘要。最后,我们讨论了除通用总和外,具体地、基于关键词和基于方位和方位的汇总之外,今后的研究挑战,表明与零点促动等主要的微调方法。为了支持进一步的研究,我们发行了以下文件:(a)从精确调整和零点模型中生成的10K摘要,用于对4个标准标准级的模型的模型进行对比。

0

相关内容

GPT-3

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium7

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium7

中国图象图形学学会CSIG

0+阅读 · 2021年11月15日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

ARHGAP9基因在肝癌侵袭转移中的作用及机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

HOXB7基因促胃癌转移机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

广义透孢黑团壳属Massarina分类与分子系统学研究

国家自然科学基金

0+阅读 · 2015年12月31日

高速列车转向架概念设计建模及设计参数定量分析研究

国家自然科学基金

0+阅读 · 2013年12月31日

混凝土Weibull统计尺寸效应理论模型改进研究

国家自然科学基金

0+阅读 · 2013年12月31日

肿瘤相关巨噬细胞分泌CCL18上调HOTAIR促进食管癌转移

国家自然科学基金

0+阅读 · 2012年12月31日

滇西老厂富银红土型锰矿次生富集机制及40Ar/39Ar年龄

国家自然科学基金

0+阅读 · 2012年12月31日

共形曲面的谱簇的渐近分析

国家自然科学基金

0+阅读 · 2011年12月31日

Levy过程驱动的无穷维随机动力系统的动力学研究

国家自然科学基金

0+阅读 · 2009年12月31日

粘土的加载速率效应与蠕变特性研究

国家自然科学基金

0+阅读 · 2009年12月31日

A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks

Arxiv

0+阅读 · 2022年11月1日

GENIE: Toward Reproducible and Standardized Human Evaluation for Text Generation

Arxiv

1+阅读 · 2022年11月1日

Synthetic ID Card Image Generation for Improving Presentation Attack Detection

Arxiv

0+阅读 · 2022年10月31日

Questioning the Validity of Summarization Datasets and Improving Their Factual Consistency

Arxiv

0+阅读 · 2022年10月31日

Evaluation of large-scale synthetic data for Grammar Error Correction

Arxiv

0+阅读 · 2022年10月31日

A Survey on Multi-modal Summarization

Arxiv

49+阅读 · 2021年9月11日

Learning Neural Models for Natural Language Processing in the Face of Distributional Shift

Arxiv

11+阅读 · 2021年9月3日

Towards Out-Of-Distribution Generalization: A Survey

Arxiv

38+阅读 · 2021年8月31日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

AutoML: A Survey of the State-of-the-Art

AutoML: A Survey of the State-of-the-Art

Arxiv

74+阅读 · 2019年8月14日

VIP会员

文章信息

相关主题

小样本学习

相关VIP内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

最新《扩散模型原理》新书，470页pdf

无人机作战：演进、创新与未来战场

AI 智能体简史

多模态空间推理在大模型时代：综述与基准测试

相关资讯

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium7

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium7

中国图象图形学学会CSIG

0+阅读 · 2021年11月15日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

相关论文

A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks

Arxiv

0+阅读 · 2022年11月1日

GENIE: Toward Reproducible and Standardized Human Evaluation for Text Generation

Arxiv

1+阅读 · 2022年11月1日

Synthetic ID Card Image Generation for Improving Presentation Attack Detection

Arxiv

0+阅读 · 2022年10月31日

Questioning the Validity of Summarization Datasets and Improving Their Factual Consistency

Arxiv

0+阅读 · 2022年10月31日

Evaluation of large-scale synthetic data for Grammar Error Correction

Arxiv

0+阅读 · 2022年10月31日

A Survey on Multi-modal Summarization

Arxiv

49+阅读 · 2021年9月11日

Learning Neural Models for Natural Language Processing in the Face of Distributional Shift

Arxiv

11+阅读 · 2021年9月3日

Towards Out-Of-Distribution Generalization: A Survey

Arxiv

38+阅读 · 2021年8月31日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

AutoML: A Survey of the State-of-the-Art

AutoML: A Survey of the State-of-the-Art

Arxiv

74+阅读 · 2019年8月14日

相关基金

ARHGAP9基因在肝癌侵袭转移中的作用及机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

HOXB7基因促胃癌转移机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

广义透孢黑团壳属Massarina分类与分子系统学研究

国家自然科学基金

0+阅读 · 2015年12月31日

高速列车转向架概念设计建模及设计参数定量分析研究

国家自然科学基金

0+阅读 · 2013年12月31日

混凝土Weibull统计尺寸效应理论模型改进研究

国家自然科学基金

0+阅读 · 2013年12月31日

肿瘤相关巨噬细胞分泌CCL18上调HOTAIR促进食管癌转移

国家自然科学基金

0+阅读 · 2012年12月31日

滇西老厂富银红土型锰矿次生富集机制及40Ar/39Ar年龄

国家自然科学基金

0+阅读 · 2012年12月31日

共形曲面的谱簇的渐近分析

国家自然科学基金

0+阅读 · 2011年12月31日

Levy过程驱动的无穷维随机动力系统的动力学研究

国家自然科学基金

0+阅读 · 2009年12月31日

粘土的加载速率效应与蠕变特性研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员