Some of the most relevant future applications of multi-agent systems, such as autonomous driving or factories as a service, exhibit mixed-motive scenarios in which agents may have conflicting goals. In these settings, independently learning agents are likely to converge to outcomes that are undesirable in terms of cooperation, such as overly greedy behavior. Motivated by real-world societies, in this work we propose to utilize market forces to provide incentives for agents to become cooperative. As demonstrated in an iterated version of the Prisoner's Dilemma, the proposed market formulation can change the dynamics of the game so that cooperative policies are learned consistently. Further, we evaluate our approach in spatially and temporally extended settings for varying numbers of agents. We empirically find that the presence of markets can improve both the overall result and individual agent returns via the agents' trading activities.