Large Language Models (LLMs) have demonstrated remarkable abilities, one of the most important being in-context learning (ICL). With ICL, LLMs can derive the underlying rule from a few demonstrations and produce answers that comply with it. Previous work hypothesized that during ICL the network creates a task vector at specific positions. This task vector, computed by averaging representations across the dataset, conveys the overall task information and can thus be considered global. Patching the global task vector into the model allows LLMs, given only dummy inputs, to achieve zero-shot performance comparable to few-shot learning. However, we find that such a global task vector does not exist in all tasks, especially in tasks whose rules can only be inferred from multiple demonstrations, such as categorization tasks. Instead, the information provided by each demonstration is first transmitted to its answer position, where it forms a local task vector associated with that demonstration. In some tasks, though not in categorization tasks, the local task vectors of all demonstrations converge in later layers to form the global task vector. We further show that local task vectors encode a high-level abstraction of the rules extracted from the demonstrations. Our study provides novel insights into the mechanism underlying ICL in LLMs, demonstrating how ICL may be achieved through an information aggregation mechanism.
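As a concrete illustration of the patching procedure described above, the following is a minimal sketch in the style of prior task-vector work: the hidden state at the final prompt position is averaged over several few-shot prompts to form a global task vector, which is then patched into a zero-shot forward pass on a dummy query. The model name, layer index, prompt format, and the helper `hidden_at_last_token` are illustrative assumptions, not the study's exact setup.

```python
# Minimal sketch of global task-vector extraction and patching.
# Assumptions (not from the paper): a LLaMA-style HuggingFace model,
# a mid-depth layer, and toy "input -> output" ICL prompts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-7b-hf"  # assumed model; any decoder-only LLM works
LAYER = 14                          # assumed mid-depth layer to read/patch

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def hidden_at_last_token(prompt: str) -> torch.Tensor:
    """Return the residual-stream state at the final prompt position
    (the '->' preceding the withheld answer) after layer LAYER."""
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, output_hidden_states=True)
    # hidden_states[0] is the embedding output, so layer i's output
    # lives at index i + 1.
    return out.hidden_states[LAYER + 1][0, -1]

# 1) Average the final-position states over the dataset -> global task vector.
few_shot_prompts = [
    "apple -> red\nbanana -> yellow\nlime ->",   # illustrative ICL prompts
    "sky -> blue\ngrass -> green\nsnow ->",
]
task_vec = torch.stack([hidden_at_last_token(p) for p in few_shot_prompts]).mean(0)

# 2) Patch the vector into a zero-shot forward pass on a dummy query.
def patch_hook(module, inputs, output):
    # Decoder layers return a tuple; element 0 is the hidden states.
    hidden = output[0]
    hidden[0, -1] = task_vec  # overwrite the state at the query position
    return (hidden,) + output[1:]

handle = model.model.layers[LAYER].register_forward_hook(patch_hook)
dummy = tok("plum ->", return_tensors="pt").input_ids  # zero-shot dummy query
with torch.no_grad():
    logits = model(dummy).logits
handle.remove()
print(tok.decode(logits[0, -1].argmax().item()))
```

A single forward pass with a greedy readout is used rather than `generate`, since the hook overwrites the last position on every call and would otherwise corrupt later decoding steps; per the abstract's finding, this patching recipe should recover few-shot behavior on some tasks but fail on categorization tasks, where no single global task vector forms.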