A desirable property of a reference-based evaluation metric that measures the content quality of a summary is that it should estimate how much information that summary has in common with a reference. Traditional text-overlap-based metrics such as ROUGE fail to achieve this because they are limited to matching tokens, either lexically or via embeddings. In this work, we propose a metric to evaluate the content quality of a summary using question-answering (QA). QA-based methods directly measure a summary's information overlap with a reference, making them fundamentally different from text-overlap metrics. We demonstrate the experimental benefits of QA-based metrics through an analysis of our proposed metric, QAEval. QAEval outperforms current state-of-the-art metrics on most evaluations using benchmark datasets, while being competitive on others due to limitations of state-of-the-art models. Through a careful analysis of each component of QAEval, we identify its performance bottlenecks and estimate that its potential upper-bound performance surpasses that of all other automatic metrics, approaching that of the gold-standard Pyramid Method.
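To make the contrast with text-overlap metrics concrete, the sketch below shows in schematic form how a QA-based content score can be computed: question-answer pairs derived from the reference are answered against the candidate summary, and the predicted answers are scored with token-level F1. This is a minimal illustration, not the QAEval implementation; the `answer_fn` callable and the pre-generated `qa_pairs` are assumed stand-ins for the learned question-generation and question-answering models that such a metric requires.

```python
from collections import Counter
from typing import Callable, List, Tuple


def token_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted and a gold answer (SQuAD-style)."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def qa_content_score(
    summary: str,
    qa_pairs: List[Tuple[str, str]],
    answer_fn: Callable[[str, str], str],
) -> float:
    """Average answer-overlap score of a candidate summary.

    qa_pairs:  (question, gold_answer) pairs generated from the reference
               summary (hypothetical output of a question-generation model).
    answer_fn: a QA model that answers a question using only the candidate
               summary as context (hypothetical interface).
    """
    if not qa_pairs:
        return 0.0
    scores = [
        token_f1(answer_fn(question, summary), gold_answer)
        for question, gold_answer in qa_pairs
    ]
    return sum(scores) / len(scores)
```

Because the score is computed over answers to questions about the reference's content, rather than over surface token matches between the two texts, a summary that conveys the same facts in different words can still score highly, which is the property the abstract argues text-overlap metrics lack.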