In-context learning (ICL) is a type of prompting in which a transformer model operates on a sequence of (input, output) examples and performs inference on the fly. In this work, we formalize in-context learning as an algorithm learning problem in which a transformer model implicitly constructs a hypothesis function at inference time. We first explore the statistical aspects of this abstraction through the lens of multitask learning: we obtain generalization bounds for ICL when the input prompt is (1) a sequence of i.i.d. (input, label) pairs or (2) a trajectory arising from a dynamical system. The crux of our analysis is relating the excess risk to the stability of the algorithm implemented by the transformer. We characterize when the transformer/attention architecture provably obeys the stability condition and also provide empirical verification. For generalization on unseen tasks, we identify an inductive-bias phenomenon in which the transfer learning risk is governed by the task complexity and the number of MTL tasks in a highly predictable manner. Finally, we provide numerical evaluations that (1) demonstrate transformers can indeed implement near-optimal algorithms on classical regression problems with i.i.d. and dynamic data, (2) provide insights on stability, and (3) verify our theoretical predictions.
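To make the abstraction concrete, the following is a minimal sketch (not the paper's construction) of "ICL as algorithm learning": the in-context prompt is a sequence of (input, label) pairs, and the hypothesis constructed at inference time is modeled here by an explicit ridge-regression fit on those pairs, a classical near-optimal baseline for the linear-regression tasks mentioned above. The function name and regularization choice are illustrative assumptions, not from the paper.

```python
import numpy as np

def icl_ridge_predict(prompt_xs, prompt_ys, query_x, lam=1e-3):
    """Mimic the ICL abstraction explicitly: construct a hypothesis
    from the in-context (input, label) pairs via ridge regression,
    then apply it to the query input."""
    X = np.asarray(prompt_xs, dtype=float)   # (n, d) in-context inputs
    y = np.asarray(prompt_ys, dtype=float)   # (n,)   in-context labels
    d = X.shape[1]
    # Closed-form ridge solution: w = (X^T X + lam*I)^{-1} X^T y
    w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    return float(np.asarray(query_x, dtype=float) @ w)

# Toy linear-regression task: the prompt determines the task on the fly.
rng = np.random.default_rng(0)
w_star = rng.normal(size=5)                  # ground-truth task parameter
X = rng.normal(size=(20, 5))                 # 20 in-context examples
y = X @ w_star                               # noiseless labels
query_x = rng.normal(size=5)
pred = icl_ridge_predict(X, y, query_x)
true_label = float(query_x @ w_star)
```

A trained transformer would produce `pred` directly from the concatenated prompt, without an explicit least-squares solve; the sketch only exposes the hypothesis-construction step that the paper argues the model implements implicitly.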