In-context learning (ICL) is a type of prompting in which a transformer model operates on a sequence of (input, output) examples and performs inference on the fly. This implicit training contrasts with explicitly tuning the model weights based on examples. In this work, we formalize in-context learning as an algorithm learning problem, treating the transformer model as a learning algorithm that can be specialized via training to implement, at inference time, another target algorithm. We first explore the statistical aspects of this abstraction through the lens of multitask learning: we obtain generalization bounds for ICL when the input prompt is (1) a sequence of i.i.d. (input, label) pairs or (2) a trajectory arising from a dynamical system. The crux of our analysis is relating the excess risk to the stability of the algorithm implemented by the transformer, which holds under mild assumptions. Second, we use our abstraction to show that transformers can act as an adaptive learning algorithm and perform model selection across different hypothesis classes. We provide numerical evaluations that (1) demonstrate transformers can indeed implement near-optimal algorithms on classical regression problems with i.i.d. and dynamic data, (2) identify an inductive bias phenomenon where the transfer risk on unseen tasks is independent of the transformer complexity, and (3) empirically verify our theoretical predictions.
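The abstraction above can be made concrete with a minimal sketch (hypothetical, not the paper's code): the prompt is a sequence of i.i.d. (input, label) pairs, and the "target algorithm" the trained transformer implements at inference time is taken, for illustration, to be one-dimensional least squares on the in-context examples.

```python
# Minimal sketch of ICL as algorithm learning (illustrative assumption:
# the target algorithm is 1-D ordinary least squares without intercept).
# A trained transformer would map the prompt plus a query directly to a
# prediction; here we write out the algorithm it would need to emulate.

def target_algorithm(prompt, x_query):
    """Fit y = w * x on in-context examples [(x1, y1), ..., (xn, yn)]
    and predict the label of x_query -- no weight updates involved."""
    num = sum(x * y for x, y in prompt)
    den = sum(x * x for x, _ in prompt)
    w = num / den  # closed-form OLS weight
    return w * x_query

# A prompt drawn from the task y = 2x: the prediction on a fresh query
# recovers the underlying task purely from the in-context examples.
prompt = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
print(target_algorithm(prompt, 4.0))  # 8.0
```

Each new task supplies a new prompt, so the same frozen model can realize a different fitted predictor per prompt, which is what the multitask generalization analysis formalizes.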