In-context learning is a surprising and important phenomenon that emerged when modern language models were scaled to billions of learned parameters. Without modifying its weights, a large language model can be tuned to perform various downstream natural language tasks simply by including concatenated training examples of these tasks in its input. Though disruptive for many practical applications of large language models, this emergent learning paradigm is not well understood from a theoretical perspective. In this paper, we propose a first-of-its-kind PAC-based framework for in-context learnability, and use it to provide the first finite-sample complexity results for the in-context learning setup. Our framework includes an initial pretraining phase, which fits a function to the pretraining distribution, and a second in-context learning phase, which keeps this function constant and concatenates training examples of the downstream task in its input. We use our framework to prove that, under mild assumptions, when the pretraining distribution is a mixture of latent tasks (a model often considered for natural language pretraining), these tasks can be efficiently learned via in-context learning, even though the model's weights are unchanged and the input significantly diverges from the pretraining distribution. Our theoretical analysis reveals that in this setting, in-context learning is more about identifying the task than about learning it, a result which is in line with a series of recent empirical findings. We hope that the in-context learnability framework presented in this paper will facilitate future progress towards a deeper understanding of this important new learning paradigm.
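As a minimal notation sketch of the setup described above (the symbols $\Theta$, $\alpha_\theta$, $\mathcal{D}_\theta$, $f$, and the prompt format are illustrative assumptions, not the paper's exact formalism): the pretraining distribution is modeled as a mixture over latent tasks, and in-context learning feeds the frozen pretrained function a concatenation of downstream training examples.

% Illustrative only; symbol names are assumptions, not the paper's notation.
\[
\underbrace{\mathcal{D}_{\mathrm{pretrain}} \;=\; \sum_{\theta \in \Theta} \alpha_\theta \, \mathcal{D}_\theta}_{\text{mixture of latent tasks}}
\qquad
\underbrace{f\bigl(x_1, y_1, \dots, x_k, y_k, x_{\mathrm{query}}\bigr) \;\approx\; y_{\mathrm{query}}}_{\text{in-context prediction with frozen } f}
\]

Here $f$ is fit once to $\mathcal{D}_{\mathrm{pretrain}}$ and never updated; the $k$ concatenated pairs $(x_i, y_i)$ come from a single downstream task $\mathcal{D}_\theta$, so the sample-complexity question is how large $k$ must be for $f$ to identify and solve that task.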