Spatial public goods games model collective dilemmas in which individual payoffs depend on population-level strategy configurations. Most existing studies rely on evolutionary update rules or value-based reinforcement learning, approaches that struggle to capture the payoff coupling and non-stationarity of large interacting populations. This work introduces Multi-Agent Proximal Policy Optimization (MAPPO) into spatial public goods games for the first time. In these games, individual returns are intrinsically coupled through overlapping group interactions, yet Proximal Policy Optimization (PPO) treats agents as independent learners and ignores this coupling during value estimation. MAPPO addresses this limitation with a centralized critic that evaluates joint strategy configurations. To study neighborhood-level cooperation signals within this framework, we propose MAPPO with a Local Cooperation Reward, termed MAPPO-LCR. The local cooperation reward aligns policy updates with the surrounding cooperative density without altering the original game structure, and MAPPO-LCR preserves decentralized execution while enabling population-level value estimation during training. Extensive simulations demonstrate the stable emergence of cooperation and reliable convergence across enhancement factors, and statistical analyses further confirm the learning advantage of MAPPO over PPO in spatial public goods games.
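The following is a minimal sketch, not the paper's implementation, of the two ingredients the abstract describes: overlapping-group payoffs in a spatial public goods game on a lattice, and a local cooperation reward added on top of the game payoff in proportion to the cooperative density of an agent's neighborhood. The von Neumann neighborhood (group size 5), cost c = 1, and the shaping weight `lcr_weight` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def group_members(i, j, L):
    """Focal site plus its four von Neumann neighbors on an L x L torus."""
    return [(i, j), ((i - 1) % L, j), ((i + 1) % L, j),
            (i, (j - 1) % L), (i, (j + 1) % L)]

def pgg_payoffs(strategies, r, c=1.0):
    """Accumulate each agent's payoff over the overlapping groups it belongs to
    (one group centered on itself, four centered on its neighbors)."""
    L = strategies.shape[0]
    payoffs = np.zeros((L, L))
    for i in range(L):
        for j in range(L):
            members = group_members(i, j, L)          # group centered at (i, j)
            n_coop = sum(strategies[m] for m in members)
            share = r * c * n_coop / len(members)     # equally shared public good
            for m in members:
                payoffs[m] += share - c * strategies[m]   # cooperators pay the cost
    return payoffs

def shaped_rewards(strategies, r, lcr_weight=0.2):
    """Game payoff plus a local cooperation reward proportional to the
    cooperative density among the agent's neighbors; the shaping term enters
    the learning reward only and leaves the game payoffs untouched."""
    L = strategies.shape[0]
    shaped = pgg_payoffs(strategies, r).copy()
    for i in range(L):
        for j in range(L):
            neigh = group_members(i, j, L)[1:]        # the four neighbors only
            coop_density = np.mean([strategies[m] for m in neigh])
            shaped[i, j] += lcr_weight * coop_density
    return shaped

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S = rng.integers(0, 2, size=(20, 20))             # 1 = cooperate, 0 = defect
    print(shaped_rewards(S, r=3.5).mean())
```

In a MAPPO-LCR training loop of this kind, the shaped per-agent rewards would feed the centralized critic during training, while each agent's policy still acts only on its local observation at execution time.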