Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics


April 27, 2021


Artidoro Pagnoni, Vidhisha Balachandran, Yulia Tsvetkov

From arXiv. Accepted at NAACL 2021.

Modern summarization models generate highly fluent but often factually unreliable outputs. This motivated a surge of metrics attempting to measure the factuality of automatically generated summaries. Due to the lack of common benchmarks, these metrics cannot be compared. Moreover, all these methods treat factuality as a binary concept and fail to provide deeper insights into the kinds of inconsistencies made by different systems. To address these limitations, we devise a typology of factual errors and use it to collect human annotations of generated summaries from state-of-the-art summarization systems for the CNN/DM and XSum datasets. Through these annotations, we identify the proportion of different categories of factual errors in various summarization models and benchmark factuality metrics, showing their correlation with human judgment as well as their specific strengths and weaknesses.

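The abstract describes two analyses: the proportion of each factual error category among annotated summaries, and the correlation of an automatic factuality metric with human judgment. The following is a minimal sketch of both computations, not the authors' implementation. The scores, the category labels, and the helper names `pearson` and `error_proportions` are all invented for illustration, and plain Pearson correlation is an assumption, since the abstract does not say which coefficient is used.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def error_proportions(labels):
    """Fraction of annotated errors that fall into each category."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


# Toy data, invented for illustration only (not taken from the benchmark).
human_scores = [1.0, 0.0, 0.5, 1.0, 0.0]   # human factuality judgment per summary
metric_scores = [0.9, 0.2, 0.6, 0.8, 0.1]  # hypothetical automatic metric scores
# Hypothetical category labels, not the paper's typology.
error_labels = ["entity", "relation", "entity", "out_of_article"]

correlation = pearson(human_scores, metric_scores)
proportions = error_proportions(error_labels)
print(f"correlation with human judgment: {correlation:.3f}")
print(proportions)
```

A metric whose scores track the human judgments closely yields a coefficient near 1, while the per-category proportions show which kinds of inconsistency dominate, which is the finer-grained view the abstract contrasts with a single binary factuality label.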
