What Can Transformers Learn In-Context? A Case Study of Simple Function Classes

In-context learning refers to the ability of a model to condition on a prompt sequence consisting of in-context examples (input-output pairs corresponding to some task) along with a new query input, and generate the corresponding output. Crucially, in-context learning happens only at inference time without any parameter updates to the model. While large language models such as GPT-3 exhibit some ability to perform in-context learning, it is unclear what the relationship is between tasks on which this succeeds and what is present in the training data. To make progress towards understanding in-context learning, we consider the well-defined problem of training a model to in-context learn a function class (e.g., linear functions): that is, given data derived from some functions in the class, can we train a model to in-context learn "most" functions from this class? We show empirically that standard Transformers can be trained from scratch to perform in-context learning of linear functions -- that is, the trained model is able to learn unseen linear functions from in-context examples with performance comparable to the optimal least squares estimator. In fact, in-context learning is possible even under two forms of distribution shift: (i) between the training data of the model and inference-time prompts, and (ii) between the in-context examples and the query input during inference. We also show that we can train Transformers to in-context learn more complex function classes -- namely sparse linear functions, two-layer neural networks, and decision trees -- with performance that matches or exceeds task-specific learning algorithms. Our code and models are available at https://github.com/dtsip/in-context-learning.
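As a rough illustration of the linear-function setup described above, the sketch below builds one prompt of in-context examples plus a query and evaluates the least squares baseline on it. It is not taken from the authors' repository: the function names, the Gaussian sampling of inputs and weights, the absence of label noise, and the sizes (`dim=5`, 10 examples) are all assumptions made for illustration, and the Transformer itself is omitted.

```python
import numpy as np


def sample_prompt(n_examples, dim, rng):
    """Sample one prompt for a random linear function f(x) = w . x.

    Assumption (not from the abstract): inputs and the hidden weight
    vector w are drawn from a standard Gaussian, with no label noise.
    """
    w = rng.standard_normal(dim)                  # hidden function, unseen by the learner
    xs = rng.standard_normal((n_examples, dim))   # in-context inputs x_1, ..., x_k
    ys = xs @ w                                   # in-context outputs f(x_i)
    x_query = rng.standard_normal(dim)            # new query input
    y_query = x_query @ w                         # target output for the query
    return xs, ys, x_query, y_query


def least_squares_predict(xs, ys, x_query):
    """Baseline: fit least squares on the in-context pairs, then predict the query."""
    # lstsq returns the minimum-norm solution when there are fewer examples than dimensions
    w_hat, *_ = np.linalg.lstsq(xs, ys, rcond=None)
    return x_query @ w_hat


rng = np.random.default_rng(0)
xs, ys, x_query, y_true = sample_prompt(n_examples=10, dim=5, rng=rng)
y_pred = least_squares_predict(xs, ys, x_query)
squared_error = (y_pred - y_true) ** 2
```

In this noiseless sketch, least squares recovers the hidden function up to floating-point error once the number of in-context examples reaches the input dimension, so `squared_error` is essentially zero here. A trained model would be scored by the same squared error on the query, with its prediction taking the place of `y_pred`.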
