In-context learning (ICL) is a type of prompting in which a transformer model operates on a sequence of (input, output) examples and performs inference on the fly. In this work, we formalize in-context learning as an algorithm learning problem in which a transformer model implicitly constructs a hypothesis function at inference time. We first explore the statistical aspects of this abstraction through the lens of multitask learning: we obtain generalization bounds for ICL when the input prompt is (1) a sequence of i.i.d. (input, label) pairs or (2) a trajectory arising from a dynamical system. The crux of our analysis is relating the excess risk to the stability of the algorithm implemented by the transformer. We characterize when the transformer/attention architecture provably obeys the stability condition and also provide empirical verification. For generalization on unseen tasks, we identify an inductive-bias phenomenon in which the transfer learning risk is governed by the task complexity and the number of MTL tasks in a highly predictable manner. Finally, we provide numerical evaluations that (1) demonstrate transformers can indeed implement near-optimal algorithms on classical regression problems with i.i.d. and dynamic data, (2) provide insights on stability, and (3) verify our theoretical predictions.
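To make the abstraction concrete, the following is a minimal sketch (not the paper's construction) of "ICL as algorithm learning": the in-context prompt is a sequence of (input, label) pairs, and the hypothesis constructed at inference time is modeled here by an explicit ridge-regression fit on those pairs, a classical near-optimal baseline for the linear-regression tasks mentioned above. The function name and regularization choice are illustrative assumptions, not from the paper.

```python
import numpy as np

def icl_ridge_predict(prompt_xs, prompt_ys, query_x, lam=1e-3):
    """Mimic the ICL abstraction explicitly: construct a hypothesis
    from the in-context (input, label) pairs via ridge regression,
    then apply it to the query input."""
    X = np.asarray(prompt_xs, dtype=float)   # (n, d) in-context inputs
    y = np.asarray(prompt_ys, dtype=float)   # (n,)   in-context labels
    d = X.shape[1]
    # Closed-form ridge solution: w = (X^T X + lam*I)^{-1} X^T y
    w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    return float(np.asarray(query_x, dtype=float) @ w)

# Toy linear-regression task: the prompt determines the task on the fly.
rng = np.random.default_rng(0)
w_star = rng.normal(size=5)                  # ground-truth task parameter
X = rng.normal(size=(20, 5))                 # 20 in-context examples
y = X @ w_star                               # noiseless labels
query_x = rng.normal(size=5)
pred = icl_ridge_predict(X, y, query_x)
true_label = float(query_x @ w_star)
```

A trained transformer would produce `pred` directly from the concatenated prompt, without an explicit least-squares solve; the sketch only exposes the hypothesis-construction step that the paper argues the model implements implicitly.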