Large Language Models (LLMs) have achieved remarkable progress in reasoning, yet sometimes produce responses that are suboptimal for users in tasks such as writing, information seeking, or providing practical guidance. Conventional alignment practices typically assume that maximizing model reward also maximizes user welfare, but this assumption frequently fails in practice: models may over-clarify or generate overly verbose reasoning when users prefer concise answers. Such behaviors resemble the prisoner's dilemma, where individually rational choices lead to socially suboptimal outcomes. The fundamental challenge is the lack of a principled decision-making mechanism that mutually benefits both the LLM and the user. We propose Game-Theoretic Alignment (GTAlign), an alignment framework that integrates game-theoretic decision making into both reasoning and training. During reasoning, the model explicitly treats the user-LLM interaction as a strategic game: it constructs payoff matrices within its reasoning chain to estimate welfare for both itself and the user, and then selects actions that are mutually beneficial. During training, we introduce a social welfare reward that reinforces cooperative responses, aligning model behavior with socially efficient outcomes. In addition, we introduce an inference technique that leverages game-theoretic reasoning to dynamically adapt the LLM's responses when the pricing policy of the LLM service changes. Extensive experiments demonstrate that GTAlign substantially improves reasoning efficiency, answer quality, and social welfare compared to baselines across diverse tasks. The code is available at https://github.com/ulab-uiuc/GTAlign.
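To make the payoff-matrix idea concrete, the following minimal sketch illustrates how a mutually beneficial action could be selected by maximizing a social welfare function over estimated user and LLM payoffs. The action names, payoff values, and the additive welfare function are illustrative assumptions for exposition, not GTAlign's actual implementation.

```python
# Illustrative sketch only: action set, payoff values, and welfare function
# below are assumptions, not the paper's implementation.
from itertools import product

# Hypothetical actions for the LLM and the user.
llm_actions = ["answer_directly", "ask_clarification"]
user_actions = ["accept", "rephrase"]

# Hypothetical payoff matrix: (llm_action, user_action) -> (llm_payoff, user_payoff).
payoffs = {
    ("answer_directly", "accept"):     (0.8, 0.9),
    ("answer_directly", "rephrase"):   (0.3, 0.2),
    ("ask_clarification", "accept"):   (0.6, 0.5),
    ("ask_clarification", "rephrase"): (0.7, 0.6),
}

def social_welfare(llm_payoff: float, user_payoff: float) -> float:
    """One possible welfare function (a simple sum); the paper's choice may differ."""
    return llm_payoff + user_payoff

# Select the joint outcome that maximizes social welfare, i.e. the
# "mutually beneficial" choice under these assumed payoffs.
best = max(product(llm_actions, user_actions),
           key=lambda joint: social_welfare(*payoffs[joint]))
print("Selected joint action:", best,
      "| welfare:", social_welfare(*payoffs[best]))
```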