How should we compare the capabilities of language models and humans? Here, I consider a case study: processing of recursively nested grammatical structures. Prior work has suggested that language models cannot handle these structures as reliably as humans can. However, the humans were provided with instructions and training before being evaluated, while the language models were evaluated zero-shot. I therefore attempt to more closely match the evaluation paradigms by providing language models with few-shot prompts. A simple prompt, which contains substantially less content than the human training, allows large language models to consistently outperform the human results. The same prompt even allows extrapolation to more deeply nested conditions than have been tested in humans. Further, a reanalysis of the prior human experiments suggests that the humans may not perform above chance at the difficult structures initially. These results suggest that large language models can in fact process recursively nested grammatical structures comparably to humans. This case study highlights how discrepancies in the quantity of experiment-specific context can confound comparisons of language models and humans. I use this case study to reflect on the broader challenge of comparing human and model capabilities, and to suggest that there is an important difference between evaluating cognitive models of a specific phenomenon and evaluating broadly-trained models.