Language model (LM) assistants are increasingly used in applications such as brainstorming and research. Improvements in memory and context size have allowed these models to become more autonomous, which has also led to more text accumulating in their context windows without explicit user intervention. This comes with a latent risk: the belief profiles of models -- their understanding of the world as manifested in their responses or actions -- may silently change as context accumulates. This can lead to subtly inconsistent user experiences, or to shifts in behavior that deviate from the models' original alignment. In this paper, we explore how accumulating context by engaging in interactions and processing text -- talking and reading -- can change the beliefs of language models, as manifested in their responses and behaviors. Our results reveal that models' belief profiles are highly malleable: GPT-5 exhibits a 54.7% shift in its stated beliefs after 10 rounds of discussion about moral dilemmas and queries about safety, while Grok 4 shows a 27.2% shift on political issues after reading texts from the opposing position. We also examine models' behavioral changes by designing tasks that require tool use, where each tool selection corresponds to an implicit belief. We find that these changes align with stated belief shifts, suggesting that belief shifts will be reflected in actual behavior in agentic systems. Our analysis exposes the hidden risk of belief shift as models undergo extended sessions of talking or reading, rendering their opinions and actions unreliable.
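The abstract reports belief shift as a percentage over a set of queried items. The paper's exact metric is not specified here; a minimal sketch, assuming the shift is measured as the fraction of belief items whose stated position changes between the start and end of a session (the `belief_shift` helper and the example items are hypothetical):

```python
def belief_shift(before: dict, after: dict) -> float:
    """Fraction of belief items whose stated position changed.

    `before` and `after` map each belief item (e.g. a moral dilemma
    or political issue) to the model's stated position, elicited
    before and after the session of talking or reading.
    """
    assert before.keys() == after.keys(), "item sets must match"
    changed = sum(1 for item in before if before[item] != after[item])
    return changed / len(before)


# Hypothetical example: stated positions on 5 issues before and
# after 10 rounds of discussion; 2 of 5 positions flipped.
before = {"q1": "agree", "q2": "disagree", "q3": "agree",
          "q4": "neutral", "q5": "agree"}
after = {"q1": "disagree", "q2": "disagree", "q3": "neutral",
         "q4": "neutral", "q5": "agree"}
print(f"{belief_shift(before, after):.1%}")  # → 40.0%
```

Under this reading, the reported 54.7% for GPT-5 would mean that over half of the probed belief items received a different stated position after the discussion rounds.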