In many societal and industrial interactions, participants tend to pursue pure self-interest at the expense of global welfare. Known as social dilemmas, this category of non-cooperative games describes situations in which multiple actors should all cooperate to achieve the best outcome, but greed and fear lead to a worse, self-interested outcome. Recently, the emergence of Deep Reinforcement Learning (RL) has revived interest in social dilemmas with the introduction of the Sequential Social Dilemma (SSD). Cooperative agents that combine RL policies with Tit-for-Tat (TFT) strategies have successfully addressed some cases of suboptimal Nash equilibria. However, this paradigm requires symmetrical and direct cooperation between actors, conditions that are not met when mutual cooperation becomes asymmetric and is possible only in a circular way involving at least a third actor. To tackle this issue, this paper first extends SSD with the Circular Sequential Social Dilemma (CSSD), a new kind of Markov game that better generalizes the diversity of cooperation between agents. Second, to address such circular and asymmetric cooperation, we propose a candidate solution based on RL policies and a graph-based TFT. We conducted experiments on a simple multi-player grid world that offers adaptable cooperation structures. Our experiments confirm that the graph-based approach helps in circular situations by encouraging self-interested agents to reach mutual cooperation.
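To make the notion of circular, asymmetric cooperation concrete, the following minimal sketch uses a hypothetical ring-shaped donation game: agent i can only help agent (i + 1) mod n, so no pair of agents can reciprocate directly. The payoff values, the function names `payoffs` and `circular_tft_step`, and the naive "cooperate if my predecessor helped me" rule are all assumptions made for illustration; they are not the paper's grid world, its CSSD definition, or its graph-based TFT.

```python
from typing import List


def payoffs(actions: List[bool], benefit: float = 3.0, cost: float = 1.0) -> List[float]:
    """Payoffs in a hypothetical circular donation game.

    actions[i] is True if agent i cooperates, i.e. pays `cost` so that
    its successor (i + 1) % n receives `benefit`. The default values
    are illustrative assumptions, not taken from the paper.
    """
    n = len(actions)
    result = [0.0] * n
    for i, cooperates in enumerate(actions):
        if cooperates:
            result[i] -= cost
            result[(i + 1) % n] += benefit
    return result


def circular_tft_step(previous: List[bool]) -> List[bool]:
    """Naive ring-shaped tit-for-tat (an assumption, not the paper's method).

    Agent i cooperates in the next round if and only if its
    predecessor (i - 1) % n cooperated in the previous round.
    """
    n = len(previous)
    return [previous[(i - 1) % n] for i in range(n)]


if __name__ == "__main__":
    everyone = [True, True, True]
    one_defector = [False, True, True]
    print(payoffs(everyone))                # each agent nets benefit - cost
    print(payoffs(one_defector))            # the defector gains, its successor loses
    print(circular_tft_step(one_defector))  # the defection moves one step around the ring
```

Under these assumed payoffs, a single defector earns more than it would by cooperating, while its successor bears the loss. This is the dilemma in circular form: the agent harmed by a defection is not the one who can retaliate against the defector directly, which is why a pairwise TFT does not apply as-is.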