Scaling large language models (LLMs) leads to an emergent capacity to learn in-context from example demonstrations. Despite progress, theoretical understanding of this phenomenon remains limited. We argue that in-context learning relies on recombination of compositional operations found in natural language data. We derive an information-theoretic bound showing how in-context learning abilities arise from generic next-token prediction when the pretraining distribution has sufficient amounts of compositional structure, under linguistically motivated assumptions. A second bound provides a theoretical justification for the empirical success of prompting LLMs to output intermediate steps towards an answer. To validate theoretical predictions, we introduce a controlled setup for inducing in-context learning; unlike previous approaches, it accounts for the compositional nature of language. Trained transformers can perform in-context learning for a range of tasks, in a manner consistent with the theoretical results. Mirroring real-world LLMs in a miniature setup, in-context learning emerges when scaling parameters and data, and models perform better when prompted to output intermediate steps. Probing shows that in-context learning is supported by a representation of the input's compositional structure. Taken together, these results provide a step towards theoretical understanding of emergent behavior in large language models.