大型语言模型是否全能?：探索LLMs的领域不可知推理技能 (Are LLMs the Master of All Trades? : Exploring Domain-Agnostic Reasoning Skills of LLMs) - 专知论文

会员服务 ·

0

Performer · 道德化 · INFORMS · 语言模型化 · 可理解性 ·

2023 年 3 月 22 日

Are LLMs the Master of All Trades? : Exploring Domain-Agnostic Reasoning Skills of LLMs

翻译：大型语言模型是否全能?：探索LLMs的领域不可知推理技能

Shrivats Agrawal

The potential of large language models (LLMs) to reason like humans has been a highly contested topic in Machine Learning communities. However, the reasoning abilities of humans are multifaceted and can be seen in various forms, including analogical, spatial and moral reasoning, among others. This fact raises the question whether LLMs can perform equally well across all these different domains. This research work aims to investigate the performance of LLMs on different reasoning tasks by conducting experiments that directly use or draw inspirations from existing datasets on analogical and spatial reasoning. Additionally, to evaluate the ability of LLMs to reason like human, their performance is evaluted on more open-ended, natural language questions. My findings indicate that LLMs excel at analogical and moral reasoning, yet struggle to perform as proficiently on spatial reasoning tasks. I believe these experiments are crucial for informing the future development of LLMs, particularly in contexts that require diverse reasoning proficiencies. By shedding light on the reasoning abilities of LLMs, this study aims to push forward our understanding of how they can better emulate the cognitive abilities of humans.

翻译：大型语言模型（LLMs）推理能力是否能够像人类一样具有多方面、多形式的推理能力一直是机器学习社区争议的焦点。然而，人类的推理能力多种多样，包括类比、空间和道德推理等。这个事实提出了一个问题，LLMs是否在所有这些不同领域中表现得同样出色。本研究旨在通过进行直接使用或受现有类比和空间推理数据集启发的实验，调查LLMs在不同推理任务上的表现。此外，为了评估LLMs像人类一样推理的能力，它们的表现也将在更加开放和自然的语言问题上进行评估。我的研究结果表明，LLMs在类比和道德推理方面表现出色，但在空间推理任务中表现不如人类专业。我认为这些实验对于引导未来LLMs的发展是至关重要的，特别是在需要多样化推理技能的语境中。通过阐明LLMs的推理能力，本研究旨在推动我们对其如何更好地模拟人类认知能力的理解。

0

相关内容

Performer

ChatGPT大模型如何做科学研究? CMU提出《大模型智能体系统》，高推理展现出大型语言模型的新兴自主科学研究能力

ChatGPT大模型如何做科学研究? CMU提出《大模型智能体系统》，高推理展现出大型语言模型的新兴自主科学研究能力

专知会员服务

112+阅读 · 2023年4月12日

ChatGPT背后的大模型如何做推理？港中文等最新《自然语言推理》综述详述预训练语言模型推理方法

ChatGPT背后的大模型如何做推理？港中文等最新《自然语言推理》综述详述预训练语言模型推理方法

专知会员服务

116+阅读 · 2023年3月29日

ChatGPT如何work的？最新《大型语言模型》综述，51页slides

ChatGPT如何work的？最新《大型语言模型》综述，51页slides

专知会员服务

162+阅读 · 2023年2月28日

终身学习如何构建？NeurIPS2022《终身学习机》教程，70页ppt

终身学习如何构建？NeurIPS2022《终身学习机》教程，70页ppt

专知会员服务

46+阅读 · 2023年1月26日

多模态认知计算

多模态认知计算

专知会员服务

180+阅读 · 2022年9月16日

【ACL2022-华盛顿大学】生成知识促进常识推理，Generated Knowledge Prompting for Commonsense Reasoning

【ACL2022-华盛顿大学】生成知识促进常识推理，Generated Knowledge Prompting for Commonsense Reasoning

专知会员服务

26+阅读 · 2022年3月1日

【机器推理可解释性】Machine Reasoning Explainability

【机器推理可解释性】Machine Reasoning Explainability

专知会员服务

35+阅读 · 2020年9月3日

【AAAI2020论文】关注实体以更好地理解文本（Attending to Entities for Better Text Understanding）

【AAAI2020论文】关注实体以更好地理解文本（Attending to Entities for Better Text Understanding）

专知会员服务

25+阅读 · 2019年11月15日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【IJCAI 2019】人工智能中的认知推理（Epistemic reasoning in AI），法国雷恩François Schwarzentruber，Tristan Charrier

【IJCAI 2019】人工智能中的认知推理（Epistemic reasoning in AI），法国雷恩François Schwarzentruber，Tristan Charrier

专知会员服务

22+阅读 · 2019年8月10日

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

多模态认知计算

多模态认知计算

专知

7+阅读 · 2022年9月16日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新6篇视觉问答（VQA）相关论文—目标推理、深度循环模型、可解释性、数据可视化、Triplet学习、基准

【论文推荐】最新6篇视觉问答（VQA）相关论文—目标推理、深度循环模型、可解释性、数据可视化、Triplet学习、基准

专知

15+阅读 · 2018年2月3日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

中国产石竹科无心菜属（Arenaria）的分类学研究

国家自然科学基金

0+阅读 · 2014年12月31日

合成孔径雷达（SAR）在地球科学应用中的尺度效应研究

国家自然科学基金

0+阅读 · 2014年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

用于生物检测与成像的AIE型红光纳米材料

国家自然科学基金

0+阅读 · 2012年12月31日

股票预期收益率波动如何影响公司资本结构调整？

国家自然科学基金

0+阅读 · 2012年12月31日

社会从众的心理和神经机制

国家自然科学基金

1+阅读 · 2012年12月31日

Arisandilactone A 的不对称全合成

国家自然科学基金

0+阅读 · 2012年12月31日

宽探测器螺旋CT所致辐射剂量评价方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

类石墨烯表面结构和性能预测

国家自然科学基金

0+阅读 · 2011年12月31日

自我加工优势效应的神经机制

国家自然科学基金

0+阅读 · 2011年12月31日

Beyond the Safeguards: Exploring the Security Risks of ChatGPT

Arxiv

1+阅读 · 2023年5月13日

GPT-NER: Named Entity Recognition via Large Language Models

Arxiv

0+阅读 · 2023年5月12日

Is ChatGPT a Good Causal Reasoner? A Comprehensive Evaluation

Arxiv

0+阅读 · 2023年5月12日

Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting

Arxiv

0+阅读 · 2023年5月11日

Marshall-Olkin Power-Law Distributions in Length-Frequency of Entities

Arxiv

0+阅读 · 2023年5月10日

Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? An Examination on Several Typical Tasks

Arxiv

1+阅读 · 2023年5月10日

ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models

Arxiv

61+阅读 · 2023年3月29日

The Life Cycle of Knowledge in Big Language Models: A Survey

Arxiv

28+阅读 · 2023年3月14日

A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems

Arxiv

17+阅读 · 2023年1月18日

Generalizing to Unseen Domains: A Survey on Domain Generalization

Arxiv

30+阅读 · 2021年3月10日

VIP会员

文章信息

相关主题

语言模型化

相关VIP内容

ChatGPT大模型如何做科学研究? CMU提出《大模型智能体系统》，高推理展现出大型语言模型的新兴自主科学研究能力

ChatGPT大模型如何做科学研究? CMU提出《大模型智能体系统》，高推理展现出大型语言模型的新兴自主科学研究能力

专知会员服务

112+阅读 · 2023年4月12日

ChatGPT背后的大模型如何做推理？港中文等最新《自然语言推理》综述详述预训练语言模型推理方法

ChatGPT背后的大模型如何做推理？港中文等最新《自然语言推理》综述详述预训练语言模型推理方法

专知会员服务

116+阅读 · 2023年3月29日

ChatGPT如何work的？最新《大型语言模型》综述，51页slides

ChatGPT如何work的？最新《大型语言模型》综述，51页slides

专知会员服务

162+阅读 · 2023年2月28日

终身学习如何构建？NeurIPS2022《终身学习机》教程，70页ppt

终身学习如何构建？NeurIPS2022《终身学习机》教程，70页ppt

专知会员服务

46+阅读 · 2023年1月26日

多模态认知计算

多模态认知计算

专知会员服务

180+阅读 · 2022年9月16日

【ACL2022-华盛顿大学】生成知识促进常识推理，Generated Knowledge Prompting for Commonsense Reasoning

【ACL2022-华盛顿大学】生成知识促进常识推理，Generated Knowledge Prompting for Commonsense Reasoning

专知会员服务

26+阅读 · 2022年3月1日

【机器推理可解释性】Machine Reasoning Explainability

【机器推理可解释性】Machine Reasoning Explainability

专知会员服务

35+阅读 · 2020年9月3日

【AAAI2020论文】关注实体以更好地理解文本（Attending to Entities for Better Text Understanding）

【AAAI2020论文】关注实体以更好地理解文本（Attending to Entities for Better Text Understanding）

专知会员服务

25+阅读 · 2019年11月15日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【IJCAI 2019】人工智能中的认知推理（Epistemic reasoning in AI），法国雷恩François Schwarzentruber，Tristan Charrier

【IJCAI 2019】人工智能中的认知推理（Epistemic reasoning in AI），法国雷恩François Schwarzentruber，Tristan Charrier

专知会员服务

22+阅读 · 2019年8月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《乌克兰无人机产业：志愿者与政策在构建新兴无人机产业中的协同作用》最新报告

《人工智能辅助决策中的数据可视化：系统性综述》

人工智能驱动弹药制造现代化：美国陆军转型之路

《敏捷作战部署中枢纽-辐条基地选址优化研究》80页

相关资讯

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

多模态认知计算

多模态认知计算

专知

7+阅读 · 2022年9月16日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新6篇视觉问答（VQA）相关论文—目标推理、深度循环模型、可解释性、数据可视化、Triplet学习、基准

【论文推荐】最新6篇视觉问答（VQA）相关论文—目标推理、深度循环模型、可解释性、数据可视化、Triplet学习、基准

专知

15+阅读 · 2018年2月3日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

相关论文

Beyond the Safeguards: Exploring the Security Risks of ChatGPT

Arxiv

1+阅读 · 2023年5月13日

GPT-NER: Named Entity Recognition via Large Language Models

Arxiv

0+阅读 · 2023年5月12日

Is ChatGPT a Good Causal Reasoner? A Comprehensive Evaluation

Arxiv

0+阅读 · 2023年5月12日

Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting

Arxiv

0+阅读 · 2023年5月11日

Marshall-Olkin Power-Law Distributions in Length-Frequency of Entities

Arxiv

0+阅读 · 2023年5月10日

Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? An Examination on Several Typical Tasks

Arxiv

1+阅读 · 2023年5月10日

ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models

Arxiv

61+阅读 · 2023年3月29日

The Life Cycle of Knowledge in Big Language Models: A Survey

Arxiv

28+阅读 · 2023年3月14日

A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems

Arxiv

17+阅读 · 2023年1月18日

Generalizing to Unseen Domains: A Survey on Domain Generalization

Arxiv

30+阅读 · 2021年3月10日

相关基金

中国产石竹科无心菜属（Arenaria）的分类学研究

国家自然科学基金

0+阅读 · 2014年12月31日

合成孔径雷达（SAR）在地球科学应用中的尺度效应研究

国家自然科学基金

0+阅读 · 2014年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

用于生物检测与成像的AIE型红光纳米材料

国家自然科学基金

0+阅读 · 2012年12月31日

股票预期收益率波动如何影响公司资本结构调整？

国家自然科学基金

0+阅读 · 2012年12月31日

社会从众的心理和神经机制

国家自然科学基金

1+阅读 · 2012年12月31日

Arisandilactone A 的不对称全合成

国家自然科学基金

0+阅读 · 2012年12月31日

宽探测器螺旋CT所致辐射剂量评价方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

类石墨烯表面结构和性能预测

国家自然科学基金

0+阅读 · 2011年12月31日

自我加工优势效应的神经机制

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员