How cooperation emerges is a long-standing, interdisciplinary problem. Game-theoretic studies of social dilemmas reveal that altruistic incentives are critical to the emergence of cooperation, but their analyses are limited to stateless games. For more realistic scenarios, multi-agent reinforcement learning has been used to study sequential social dilemmas (SSDs). Recent work shows that learning to incentivize other agents can promote cooperation in SSDs. However, we find that, with these incentivizing mechanisms, the team cooperation level does not converge and instead oscillates regularly between cooperation and defection during learning. We show that a second-order social dilemma resulting from the incentive mechanisms is the main reason for such fragile cooperation. We formally analyze the dynamics of second-order social dilemmas and find that a typical human tendency, called homophily, provides a promising solution. We propose a novel learning framework that encourages homophilic incentives and show that it achieves stable cooperation in both public goods and tragedy-of-the-commons SSDs.