Evaluation for many natural language understanding (NLU) tasks is broken: Unreliable and biased systems score so highly on standard benchmarks that there is little room for researchers who develop better systems to demonstrate their improvements. The recent trend to abandon IID benchmarks in favor of adversarially-constructed, out-of-distribution test sets ensures that current models will perform poorly, but ultimately only obscures the abilities that we want our benchmarks to measure. In this position paper, we lay out four criteria that we argue NLU benchmarks should meet. We argue most current benchmarks fail at these criteria, and that adversarial data collection does not meaningfully address the causes of these failures. Instead, restoring a healthy evaluation ecosystem will require significant progress in the design of benchmark datasets, the reliability with which they are annotated, their size, and the ways they handle social bias.