Moral judgment is integral to large language models' (LLMs) social reasoning. As multi-agent systems gain prominence, it becomes crucial to understand how LLMs function when collaborating compared to operating as individual agents. In human moral judgment, group deliberation leads to a Utilitarian Boost: a tendency to endorse norm violations that inflict harm but maximize benefits for the greatest number of people. We study whether a similar dynamic emerges in multi-agent LLM systems. We test six models on well-established sets of moral dilemmas across two conditions: (1) Solo, where models reason independently, and (2) Group, where they engage in multi-turn discussions in pairs or triads. In personal dilemmas, where agents decide whether to directly harm an individual for the benefit of others, all models rated moral violations as more acceptable when part of a group, demonstrating a Utilitarian Boost similar to that observed in humans. However, the mechanism underlying the Boost in LLMs differed: while humans in groups become more utilitarian due to heightened sensitivity to decision outcomes, LLM groups showed either reduced sensitivity to norms or enhanced impartiality. We report model differences in when and how strongly the Boost manifests. We also discuss prompt designs and agent compositions that amplify or mitigate the effect. We end with a discussion of the implications for AI alignment, multi-agent design, and artificial moral reasoning. Code available at: https://github.com/baltaci-r/MoralAgents
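As an illustration of the two evaluation conditions described above, the following is a minimal sketch (not the authors' released code; see the repository linked in the abstract for the actual implementation). The `ask` callable, the 1-7 acceptability scale, and the number of discussion turns are all assumptions made for the example.

```python
"""Illustrative sketch of the Solo vs. Group conditions.

`ask(model, messages) -> reply` is a placeholder for a real LLM chat API call;
the dilemma text, rating scale, and turn count are hypothetical choices.
"""
from typing import Callable, Dict, List

Ask = Callable[[str, List[Dict[str, str]]], str]  # (model_name, chat_messages) -> reply text


def solo_rating(ask: Ask, model: str, dilemma: str) -> str:
    """Solo condition: a single model reasons independently and rates the action."""
    messages = [{
        "role": "user",
        "content": f"{dilemma}\n\nRate how morally acceptable the action is on a 1-7 scale.",
    }]
    return ask(model, messages)


def group_rating(ask: Ask, models: List[str], dilemma: str, turns: int = 2) -> List[str]:
    """Group condition: a pair or triad of models discusses the dilemma over
    multiple turns, then each agent gives its own final rating."""
    transcript: List[Dict[str, str]] = [{"role": "user", "content": dilemma}]
    for _ in range(turns):
        for m in models:
            reply = ask(m, transcript + [{
                "role": "user",
                "content": f"As {m}, share your moral reasoning with the group.",
            }])
            # Append each agent's contribution so later speakers can respond to it.
            transcript.append({"role": "assistant", "content": f"{m}: {reply}"})
    # After the discussion, collect each agent's final acceptability rating.
    return [
        ask(m, transcript + [{
            "role": "user",
            "content": "Give your final acceptability rating on a 1-7 scale.",
        }])
        for m in models
    ]
```

Comparing the Solo ratings against the post-discussion Group ratings for the same dilemmas is what would reveal a Utilitarian Boost, i.e., higher acceptability of harmful-but-beneficial actions after group deliberation.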