《人的标准和自动计量:对代代评价的基准》 (Of Human Criteria and Automatic Metrics: A Benchmark of the Evaluation of Story Generation) - 专知论文

会员服务 ·

0

Analysis · 相关系数 · motivation · 讲稿 · 正交 ·

2022 年 8 月 25 日

Of Human Criteria and Automatic Metrics: A Benchmark of the Evaluation of Story Generation

翻译：《人的标准和自动计量:对代代评价的基准》

Cyril Chhun,Pierre Colombo,Chloé Clavel,Fabian M. Suchanek

from arxiv, 43 pages, 38 figures. To appear in Proceedings of the 29th International Conference on Computational Linguistics (COLING 2022)

Research on Automatic Story Generation (ASG) relies heavily on human and automatic evaluation. However, there is no consensus on which human evaluation criteria to use, and no analysis of how well automatic criteria correlate with them. In this paper, we propose to re-evaluate ASG evaluation. We introduce a set of 6 orthogonal and comprehensive human criteria, carefully motivated by the social sciences literature. We also present HANNA, an annotated dataset of 1,056 stories produced by 10 different ASG systems. HANNA allows us to quantitatively evaluate the correlations of 72 automatic metrics with human criteria. Our analysis highlights the weaknesses of current metrics for ASG and allows us to formulate practical recommendations for ASG evaluation.

翻译：关于自动故事一代(ASG)的研究严重依赖人和自动评价,但是,对于人类评价标准使用哪些标准,没有达成共识,也没有分析自动标准与这些标准之间的关系。我们建议重新评估ASG评价。我们提出了一套由社会科学文献仔细推动的6个正统和全面的人标准。我们还介绍了由10个ASG系统制作的1 056个附带说明的HANNA数据集。HANNA使我们能够从数量上评价72个自动衡量标准与人标准的相关性。我们的分析突出了ASG现有衡量标准的弱点,并使我们能够为ASG评价提出切实可行的建议。

0

相关内容

Analysis

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

【牛津大学】电子医疗记录的生成式对抗网络:应用、评估措施和数据来源综述，A review of Generative Adversarial Networks for Electronic Health Records: applications, evaluation measures and data sources

【牛津大学】电子医疗记录的生成式对抗网络:应用、评估措施和数据来源综述，A review of Generative Adversarial Networks for Electronic Health Records: applications, evaluation measures and data sources

专知会员服务

24+阅读 · 2022年3月15日

【如何做研究】How to research ，22页ppt

【如何做研究】How to research ，22页ppt

专知会员服务

113+阅读 · 2021年4月17日

2020数据工程师成长路线图

专知会员服务

41+阅读 · 2020年9月6日

【视频描述综述论文】Video Description: A Survey of Methods, Datasets, and Evaluation Metrics

【视频描述综述论文】Video Description: A Survey of Methods, Datasets, and Evaluation Metrics

专知会员服务

65+阅读 · 2020年5月12日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

【AAAI2020接受论文】预测性参与:开放领域对话系统自动评估的有效指标（Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems）

【AAAI2020接受论文】预测性参与:开放领域对话系统自动评估的有效指标（Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems）

专知会员服务

14+阅读 · 2019年11月15日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

基于QoP与QoS融合的通信网络系统优化设计

国家自然科学基金

0+阅读 · 2014年12月31日

多时间尺度动态神经网络不确定非线性MIMO系统状态观测研究

国家自然科学基金

0+阅读 · 2013年12月31日

多因素耦合的激光熔覆层应力及应力损伤的超声波评价物理机制

国家自然科学基金

0+阅读 · 2013年12月31日

基于SURE/PURE准则的图像盲反卷积算法研究

国家自然科学基金

3+阅读 · 2013年12月31日

非金属/过渡金属共掺杂的GaN纳米线可控制备及室温铁磁性研究

国家自然科学基金

0+阅读 · 2013年12月31日

新变指标Besov-Triebel-Lizorkin型函数空间及算子有界性

国家自然科学基金

0+阅读 · 2012年12月31日

基于上下文认知机制的Ad Hoc网络性能优化策略管理模型的研究

国家自然科学基金

0+阅读 · 2012年12月31日

地震电磁辐射宽频带组网观测与辨识研究

国家自然科学基金

0+阅读 · 2011年12月31日

Navier-Stokes方程稳定化有限元方法后验误差估计

国家自然科学基金

0+阅读 · 2011年12月31日

基于多源观测数据的三维云融合分析算法研究

国家自然科学基金

2+阅读 · 2009年12月31日

Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements

Arxiv

0+阅读 · 2022年10月6日

The Vendi Score: A Diversity Evaluation Metric for Machine Learning

Arxiv

0+阅读 · 2022年10月5日

A Systematic Survey on Deep Generative Models for Graph Generation

Arxiv

18+阅读 · 2022年10月4日

ABC of the Future

Arxiv

0+阅读 · 2022年10月3日

Fine-grained Contrastive Learning for Definition Generation

Arxiv

0+阅读 · 2022年10月2日

Small Area Estimation of Inequality Measures using Mixtures of Betas

Arxiv

0+阅读 · 2022年9月30日

A Survey of Deep Learning for Low-Shot Object Detection

Arxiv

21+阅读 · 2021年12月6日

Towards Out-Of-Distribution Generalization: A Survey

Arxiv

38+阅读 · 2021年8月31日

Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems

Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems

Arxiv

11+阅读 · 2019年11月4日

The Unreasonable Effectiveness of Deep Features as a Perceptual Metric

Arxiv

11+阅读 · 2018年1月11日

VIP会员

文章信息

相关主题

相关VIP内容

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

【牛津大学】电子医疗记录的生成式对抗网络:应用、评估措施和数据来源综述，A review of Generative Adversarial Networks for Electronic Health Records: applications, evaluation measures and data sources

【牛津大学】电子医疗记录的生成式对抗网络:应用、评估措施和数据来源综述，A review of Generative Adversarial Networks for Electronic Health Records: applications, evaluation measures and data sources

专知会员服务

24+阅读 · 2022年3月15日

【如何做研究】How to research ，22页ppt

【如何做研究】How to research ，22页ppt

专知会员服务

113+阅读 · 2021年4月17日

2020数据工程师成长路线图

专知会员服务

41+阅读 · 2020年9月6日

【视频描述综述论文】Video Description: A Survey of Methods, Datasets, and Evaluation Metrics

【视频描述综述论文】Video Description: A Survey of Methods, Datasets, and Evaluation Metrics

专知会员服务

65+阅读 · 2020年5月12日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

【AAAI2020接受论文】预测性参与:开放领域对话系统自动评估的有效指标（Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems）

【AAAI2020接受论文】预测性参与:开放领域对话系统自动评估的有效指标（Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems）

专知会员服务

14+阅读 · 2019年11月15日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

数据智能体综述：新兴范式还是被高估的炒作？

海底战已至：美国构思海底安全战略 | 最新报告

【ICCV2025教程】视觉异常检测中的基础模型：进展、挑战与应用

美军将无人自主等新技术融入潜艇部队以更具杀伤力

相关资讯

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements

Arxiv

0+阅读 · 2022年10月6日

The Vendi Score: A Diversity Evaluation Metric for Machine Learning

Arxiv

0+阅读 · 2022年10月5日

A Systematic Survey on Deep Generative Models for Graph Generation

Arxiv

18+阅读 · 2022年10月4日

ABC of the Future

Arxiv

0+阅读 · 2022年10月3日

Fine-grained Contrastive Learning for Definition Generation

Arxiv

0+阅读 · 2022年10月2日

Small Area Estimation of Inequality Measures using Mixtures of Betas

Arxiv

0+阅读 · 2022年9月30日

A Survey of Deep Learning for Low-Shot Object Detection

Arxiv

21+阅读 · 2021年12月6日

Towards Out-Of-Distribution Generalization: A Survey

Arxiv

38+阅读 · 2021年8月31日

Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems

Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems

Arxiv

11+阅读 · 2019年11月4日

The Unreasonable Effectiveness of Deep Features as a Perceptual Metric

Arxiv

11+阅读 · 2018年1月11日

相关基金

基于QoP与QoS融合的通信网络系统优化设计

国家自然科学基金

0+阅读 · 2014年12月31日

多时间尺度动态神经网络不确定非线性MIMO系统状态观测研究

国家自然科学基金

0+阅读 · 2013年12月31日

多因素耦合的激光熔覆层应力及应力损伤的超声波评价物理机制

国家自然科学基金

0+阅读 · 2013年12月31日

基于SURE/PURE准则的图像盲反卷积算法研究

国家自然科学基金

3+阅读 · 2013年12月31日

非金属/过渡金属共掺杂的GaN纳米线可控制备及室温铁磁性研究

国家自然科学基金

0+阅读 · 2013年12月31日

新变指标Besov-Triebel-Lizorkin型函数空间及算子有界性

国家自然科学基金

0+阅读 · 2012年12月31日

基于上下文认知机制的Ad Hoc网络性能优化策略管理模型的研究

国家自然科学基金

0+阅读 · 2012年12月31日

地震电磁辐射宽频带组网观测与辨识研究

国家自然科学基金

0+阅读 · 2011年12月31日

Navier-Stokes方程稳定化有限元方法后验误差估计

国家自然科学基金

0+阅读 · 2011年12月31日

基于多源观测数据的三维云融合分析算法研究

国家自然科学基金

2+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员