In current NLP research, large-scale language models and their abilities are widely discussed. Several recent works have also reported notable failures of these models, and these failure cases often involve complex reasoning abilities. This work focuses on a simple commonsense ability: reasoning about when an action (or its effect) is feasible. To this end, we introduce FeasibilityQA, a question-answering dataset consisting of binary classification questions (BCQ) and multiple-choice multiple-correct questions (MCQ) that test understanding of feasibility. We show that even state-of-the-art models such as GPT-3, GPT-2, and T5 struggle to answer feasibility questions correctly. Specifically, on MCQ and BCQ questions, GPT-3 achieves accuracies of just (19%, 62%) in the zero-shot setting and (25%, 64%) in the few-shot setting, respectively. We also evaluate the models after providing the relevant knowledge statements required to answer each question. We find that this additional knowledge leads to a 7% gain in performance, but overall performance remains low. These results raise the question of how much commonsense knowledge about action feasibility is encoded in state-of-the-art models and how well they can reason about it.