Large Language Models (LLMs) can sometimes degrade into repetitive loops, persistently generating identical word sequences. Because repetition is rare in natural human language, its frequent occurrence in LLMs across diverse tasks and contexts remains puzzling. Here we investigate whether behaviorally similar repetition patterns arise from distinct underlying mechanisms and how these mechanisms develop during model training. We contrast two conditions: repetition elicited by natural text prompts and repetition induced by in-context learning (ICL) setups that explicitly require copying behavior. Our analyses reveal that ICL-induced repetition relies on a dedicated network of attention heads that progressively specialize over training, whereas naturally occurring repetition emerges early in training and lacks defined circuitry. Attention inspection further shows that natural repetition focuses disproportionately on low-information tokens, suggesting a fallback behavior that arises when relevant context cannot be retrieved. These results indicate that superficially similar repetition behaviors originate from qualitatively different internal processes, reflecting distinct modes of failure and adaptation in language models.