重新思考内文学习规模的作用:66亿规模的基于解释的案例研究 (Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale) - 专知论文

会员服务 ·

0

语言模型化 · Performer · Learning · 缩放 · MoDELS ·

2022 年 12 月 18 日

Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale

翻译：重新思考内文学习规模的作用:66亿规模的基于解释的案例研究

Hritik Bansal,Karthik Gopalakrishnan,Saket Dingliwal,Sravan Bodapati,Katrin Kirchhoff,Dan Roth

from arxiv, 21 pages, 19 figures, 1 table, 2 algorithms

Language models have been shown to perform better with an increase in scale on a wide variety of tasks via the in-context learning paradigm. In this paper, we investigate the hypothesis that the ability of a large language model to in-context learn-perform a task is not uniformly spread across all of its underlying components. Using a 66 billion parameter language model (OPT-66B) across a diverse set of 14 downstream tasks, we find this is indeed the case: $\sim$70% of attention heads and $\sim$20% of feed forward networks can be removed with minimal decline in task performance. We find substantial overlap in the set of attention heads (un)important for in-context learning across tasks and number of in-context examples. We also address our hypothesis through a task-agnostic lens, finding that a small set of attention heads in OPT-66B score highly on their ability to perform primitive induction operations associated with in-context learning, namely, prefix matching and copying. These induction heads overlap with task-specific important heads, suggesting that induction heads are among the heads capable of more sophisticated behaviors associated with in-context learning. Overall, our study provides several insights that indicate large language models may be under-trained to perform in-context learning and opens up questions on how to pre-train language models to more effectively perform in-context learning.

翻译：在本文中,我们调查了这样一种假设,即大型语文模式的内通学习能力并非在所有基本组成部分中均匀分布。我们用一套660亿参数语言模式(OPT-66B)在14个下游任务中运用,我们发现情况确实如此:关注负责人70%和反馈前方网络20 % 的规模扩大,可以消除,任务绩效下降的程度最小。我们发现,一组关注负责人(对于在任务和数个同源实例之间进行同源学习至关重要)的大量重叠。我们还从任务-认知角度探讨我们的假设,发现OM-66B的少量关注负责人高度评价他们执行与文中学习有关的原始上岗作业的能力,即,前置匹配和复制。这些上岗负责人与具体任务的重要负责人重叠,表明上岗负责人是能够与在任务和同源中进行更复杂的学习的更精密行为中的负责人之一。我们通过一个任务-不可忽视的视角来学习前语言。

0

相关内容

语言模型化

语言模型化

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

征稿 | International Joint Conference on Knowledge Graphs (IJCKG)

征稿 | International Joint Conference on Knowledge Graphs (IJCKG)

开放知识图谱

2+阅读 · 2022年5月20日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

益髓颗粒调控DNA甲基化治疗骨髓增生异常综合征分子机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

CRBN泛素化修饰AMPK调控骨髓瘤细胞增殖与凋亡的分子机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

小分子TRF2抑制剂的设计、合成及表征

国家自然科学基金

0+阅读 · 2014年12月31日

基于深度学习的飞行器故障不确定性评估与预测研究

国家自然科学基金

4+阅读 · 2014年12月31日

分泌型金属蛋白酶CLCA在哮喘气道重塑中的作用及机制

国家自然科学基金

0+阅读 · 2013年12月31日

Perp在类风湿性关节炎外周Th17细胞存活中的作用研究

国家自然科学基金

0+阅读 · 2012年12月31日

再生障碍性贫血中TCR信号通路失调及其相关分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

Eulerian bond-cubic 模型渗流性质的数值研究

国家自然科学基金

0+阅读 · 2012年12月31日

金属薄膜生长中物理参量及界面结构的椭偏光谱实时监控研究

国家自然科学基金

0+阅读 · 2011年12月31日

肾上腺源性及原发性高血压线粒体tRNAIle、tRNALeu(UUR)和tRNAlys基因突变的差异对比研究

国家自然科学基金

0+阅读 · 2009年12月31日

SpecXAI -- Spectral interpretability of Deep Learning Models

Arxiv

0+阅读 · 2023年2月20日

On the Stability and Generalization of Triplet Learning

Arxiv

0+阅读 · 2023年2月20日

Scaling Laws for Multilingual Neural Machine Translation

Arxiv

0+阅读 · 2023年2月19日

The Capacity for Moral Self-Correction in Large Language Models

Arxiv

1+阅读 · 2023年2月18日

Inclusive Study Group Formation At Scale

Arxiv

0+阅读 · 2023年2月16日

The Causal Learning of Retail Delinquency

Arxiv

14+阅读 · 2020年12月17日

Interpretable CNNs for Object Classification

Interpretable CNNs for Object Classification

Arxiv

20+阅读 · 2020年3月12日

A Simple Framework for Contrastive Learning of Visual Representations

Arxiv

21+阅读 · 2020年2月13日

Learning with Interpretable Structure from RNN

Arxiv

19+阅读 · 2018年10月25日

Learning Hierarchical Features for Visual Object Tracking with Recursive Neural Networks

Arxiv

13+阅读 · 2018年1月6日

VIP会员

文章信息

相关主题

语言模型化

相关VIP内容

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《人与智能体在系统工程建模语言V2任务中的性能表现：基于用户中心化的评估方法》308页

《数据安全国家标准体系（2025版）》征求意见稿

AlphaMosaic：人工智能赋能的作战管理系统

《军事行动中通信平台的战略价值：提升战术效能与作战优势》

相关资讯

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

征稿 | International Joint Conference on Knowledge Graphs (IJCKG)

征稿 | International Joint Conference on Knowledge Graphs (IJCKG)

开放知识图谱

2+阅读 · 2022年5月20日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

SpecXAI -- Spectral interpretability of Deep Learning Models

Arxiv

0+阅读 · 2023年2月20日

On the Stability and Generalization of Triplet Learning

Arxiv

0+阅读 · 2023年2月20日

Scaling Laws for Multilingual Neural Machine Translation

Arxiv

0+阅读 · 2023年2月19日

The Capacity for Moral Self-Correction in Large Language Models

Arxiv

1+阅读 · 2023年2月18日

Inclusive Study Group Formation At Scale

Arxiv

0+阅读 · 2023年2月16日

The Causal Learning of Retail Delinquency

Arxiv

14+阅读 · 2020年12月17日

Interpretable CNNs for Object Classification

Interpretable CNNs for Object Classification

Arxiv

20+阅读 · 2020年3月12日

A Simple Framework for Contrastive Learning of Visual Representations

Arxiv

21+阅读 · 2020年2月13日

Learning with Interpretable Structure from RNN

Arxiv

19+阅读 · 2018年10月25日

Learning Hierarchical Features for Visual Object Tracking with Recursive Neural Networks

Arxiv

13+阅读 · 2018年1月6日

相关基金

益髓颗粒调控DNA甲基化治疗骨髓增生异常综合征分子机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

CRBN泛素化修饰AMPK调控骨髓瘤细胞增殖与凋亡的分子机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

小分子TRF2抑制剂的设计、合成及表征

国家自然科学基金

0+阅读 · 2014年12月31日

基于深度学习的飞行器故障不确定性评估与预测研究

国家自然科学基金

4+阅读 · 2014年12月31日

分泌型金属蛋白酶CLCA在哮喘气道重塑中的作用及机制

国家自然科学基金

0+阅读 · 2013年12月31日

Perp在类风湿性关节炎外周Th17细胞存活中的作用研究

国家自然科学基金

0+阅读 · 2012年12月31日

再生障碍性贫血中TCR信号通路失调及其相关分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

Eulerian bond-cubic 模型渗流性质的数值研究

国家自然科学基金

0+阅读 · 2012年12月31日

金属薄膜生长中物理参量及界面结构的椭偏光谱实时监控研究

国家自然科学基金

0+阅读 · 2011年12月31日

肾上腺源性及原发性高血压线粒体tRNAIle、tRNALeu(UUR)和tRNAlys基因突变的差异对比研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员