Coordination is often critical to forming prosocial behaviors -- behaviors that increase the overall sum of rewards received by all agents in a multi-agent game. However, state-of-the-art reinforcement learning algorithms often converge to socially less desirable equilibria when multiple equilibria exist. Previous works address this challenge with explicit reward shaping, which requires the strong assumption that agents can be forced to be prosocial. We propose using a less restrictive peer-rewarding mechanism, gifting, that guides the agents toward more socially desirable equilibria while allowing them to remain selfish and decentralized. Gifting allows each agent to give some of its reward to other agents. We employ a theoretical framework that captures the benefit of gifting in converging to the prosocial equilibrium by characterizing the equilibria's basins of attraction in a dynamical system. With gifting, we demonstrate increased convergence of high-risk, general-sum coordination games to the prosocial equilibrium, both via numerical analysis and experiments.
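As a rough illustration of the peer-rewarding idea, the sketch below augments a two-player Stag Hunt (a standard high-risk coordination game) with a gift parameter that lets each agent transfer a fraction of its own reward to the other agent. The payoff values, the zero-sum transfer rule, and the function names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Illustrative two-player Stag Hunt: a high-risk coordination game with two
# pure equilibria -- (Stag, Stag) is prosocial but risky, (Hare, Hare) is safe.
# Payoff values are placeholders, not taken from the paper.
STAG, HARE = 0, 1
PAYOFF = np.array([[4.0, 0.0],   # row player's reward when it plays Stag
                   [3.0, 3.0]])  # row player's reward when it plays Hare
                                 # (columns index the opponent's action)

def rewards_with_gifting(a1, a2, gift1=0.0, gift2=0.0):
    """Base rewards plus peer rewards: each agent may transfer a chosen
    fraction of its own reward to the other agent (one possible gifting
    rule; the mechanism studied in the paper may differ in its details)."""
    r1, r2 = PAYOFF[a1, a2], PAYOFF[a2, a1]
    g1, g2 = gift1 * r1, gift2 * r2           # amounts given away
    return r1 - g1 + g2, r2 - g2 + g1         # individual rewards after transfers

# Gifting leaves the social welfare r1 + r2 unchanged but reshapes each
# agent's individual incentives around the risky prosocial action.
print(rewards_with_gifting(STAG, HARE, gift1=0.0, gift2=0.5))
```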