Generation Challenges: Results of the Accuracy Evaluation Shared Task - 专知 Papers

Model evaluation · Examples · Inference · Natural language processing

August 15, 2021

Generation Challenges: Results of the Accuracy Evaluation Shared Task

Craig Thomson, Ehud Reiter

From arXiv. To appear in the proceedings of INLG 2021.

The Shared Task on Evaluating Accuracy focused on techniques (both manual and automatic) for evaluating the factual accuracy of texts produced by neural NLG systems, in a sports-reporting domain. Four teams submitted evaluation techniques for this task, using very different approaches and techniques. The best-performing submissions did encouragingly well at this difficult task. However, all automatic submissions struggled to detect factual errors which are semantically or pragmatically complex (for example, based on incorrect computation or inference).

