How should we compare the capabilities of language models (LMs) and humans? I draw inspiration from comparative psychology to highlight some challenges. In particular, I consider a case study: processing of recursively nested grammatical structures. Prior work suggests that LMs cannot handle these structures as reliably as humans can. However, the humans were provided with instructions and training, while the LMs were evaluated zero-shot. I therefore match the evaluation more closely. Providing large LMs with a simple prompt -- substantially less content than the human training -- allows the LMs to consistently outperform the human results, and even to extrapolate to more deeply nested conditions than were tested with humans. Further, reanalyzing the prior human data suggests that the humans may not perform above chance at the difficult structures initially. Thus, large LMs may indeed process recursively nested grammatical structures as reliably as humans. This case study highlights how discrepancies in the evaluation can confound comparisons of language models and humans. I therefore reflect on the broader challenge of comparing human and model capabilities, and highlight an important difference between evaluating cognitive models and foundation models.