One of the most striking features of Large Language Models (LLMs) is their ability to learn in-context. Namely, at inference time, an LLM is able to learn new patterns without any weight update when these patterns are presented as examples in the prompt, even if these patterns were not seen during training. The mechanisms through which this can happen are still largely unknown. In this work, we show that stacking a self-attention layer with an MLP allows the transformer block to implicitly modify the weights of the MLP layer according to the context. We argue through theory and experimentation that this simple mechanism may be the reason why LLMs can learn in-context and not only during training. Specifically, we show how a transformer block implicitly transforms a context into a low-rank weight update of its MLP layer.
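To make the claimed mechanism concrete, the following minimal numerical sketch illustrates one way a context can be absorbed into a rank-1 update of the MLP's first weight matrix. The variable names, the stand-in vectors for the attention outputs with and without context, and the specific rank-1 construction are illustrative assumptions for this sketch, not necessarily the exact formulation used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 16, 64

# Stand-ins for the self-attention output at the query token,
# with and without the context examples present in the prompt.
a_no_ctx = rng.normal(size=d_model)      # attention output for the query alone
a_with_ctx = rng.normal(size=d_model)    # attention output for context + query
delta_a = a_with_ctx - a_no_ctx          # shift induced by the context

# First linear layer of the MLP followed by a ReLU nonlinearity.
W = rng.normal(size=(d_ff, d_model))
mlp = lambda h: np.maximum(W @ h, 0.0)

# Hypothetical rank-1 weight update built from the context-induced shift:
# (W + delta_W) @ a_no_ctx reproduces W @ a_with_ctx exactly.
delta_W = np.outer(W @ delta_a, a_no_ctx) / np.dot(a_no_ctx, a_no_ctx)

# Feeding the context-aware activation through the original MLP matches
# feeding the context-free activation through the implicitly updated MLP.
lhs = mlp(a_with_ctx)
rhs = np.maximum((W + delta_W) @ a_no_ctx, 0.0)
print(np.allclose(lhs, rhs))              # True
print(np.linalg.matrix_rank(delta_W))     # 1 (a low-rank update)
```

In this toy setting, the effect of the context on the block's output is exactly equivalent to running the context-free input through an MLP whose first-layer weights received a rank-1 correction, which is the kind of implicit low-rank weight update the abstract refers to.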