Large language models (LLMs) increasingly serve as human-like decision-making agents in social science and applied settings. These LLM agents are typically assigned human-like characters and placed in real-life contexts, yet how those characters and contexts shape an LLM's behavior remains underexplored. This study proposes and tests methods for probing, quantifying, and modifying an LLM's internal representations in a Dictator Game, a classic behavioral experiment on fairness and prosocial behavior. We extract ``vectors of variable variations'' (e.g., ``male'' to ``female'') from the LLM's internal state; manipulating these vectors during inference can substantially alter how the corresponding variables relate to the model's decisions. This approach offers a principled way to study and regulate how social concepts can be encoded and engineered within transformer-based models, with implications for alignment, debiasing, and the design of AI agents for social simulations in both academic and commercial applications, as well as for strengthening sociological theory and measurement.
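The manipulation described above resembles activation steering: extracting a difference-of-activations vector between two paired personas and adding it to a transformer's residual stream at inference time. The sketch below illustrates that idea with off-the-shelf Hugging Face tooling; the model (gpt2), layer index, steering strength, and Dictator Game prompts are all illustrative assumptions rather than the paper's actual setup.

```python
# Minimal sketch of extracting and applying a "vector of variable
# variation" via activation steering. All names and hyperparameters
# here are illustrative assumptions, not the paper's implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"   # hypothetical stand-in model
LAYER = 6        # hypothetical intermediate layer to probe
ALPHA = 4.0      # assumed steering strength

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def last_hidden(prompt: str) -> torch.Tensor:
    """Hidden state of the final token after block LAYER."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    # hidden_states[0] is the embedding layer, so index LAYER + 1
    # corresponds to the output of transformer block LAYER.
    return out.hidden_states[LAYER + 1][0, -1, :]

# 1. Extract the variation vector as a difference of paired activations
#    (e.g., a "male" persona vs. a "female" persona).
prompt_a = "You are a male participant in a Dictator Game splitting $10."
prompt_b = "You are a female participant in a Dictator Game splitting $10."
steer_vec = last_hidden(prompt_b) - last_hidden(prompt_a)

# 2. Add the vector to the residual stream during inference via a hook.
#    GPT-2 blocks return a tuple whose first element is the hidden states.
def steering_hook(module, inputs, output):
    hidden = output[0] + ALPHA * steer_vec  # broadcast over batch and seq
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(steering_hook)
ids = tok(prompt_a, return_tensors="pt")
steered = model.generate(**ids, max_new_tokens=20, do_sample=False,
                         pad_token_id=tok.eos_token_id)
handle.remove()
print(tok.decode(steered[0], skip_special_tokens=True))
```

In this sketch the steering direction is a single difference of last-token activations; in practice such vectors are often averaged over many paired prompts to reduce noise, and the layer and strength are tuned empirically.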