In the future, artificial learning agents are likely to become increasingly widespread in our society. They will interact with both other learning agents and humans in a variety of complex settings including social dilemmas. We argue that there is a need for research on the intersection between game theory and artificial intelligence, with the goal of achieving cooperative artificial intelligence that can navigate social dilemmas well. We consider the problem of how an external agent can promote cooperation between artificial learners by distributing additional rewards and punishments based on observing the actions of the learners. We propose a rule for automatically learning how to create the right incentives by considering the anticipated parameter updates of each agent. Using this learning rule leads to cooperation with high social welfare in matrix games in which the agents would otherwise learn to defect with high probability. We show that the resulting cooperative outcome is stable in certain games even if the planning agent is turned off after a given number of episodes, while other games require ongoing intervention to maintain mutual cooperation. Finally, we reflect on what the goals of multi-agent reinforcement learning should be in the first place, and discuss the building blocks needed to achieve cooperative AI.
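To make the idea of "learning incentives by considering the anticipated parameter updates of each agent" concrete, the following is a minimal illustrative sketch, not the paper's implementation. It assumes two policy-gradient learners, each with a single cooperation logit, playing a Prisoner's Dilemma, and a planning agent whose parameters are a table of additional rewards (one entry per joint action and learner). The planner improves its parameters by differentiating social welfare through the learners' anticipated one-step updates. The payoff values, learning rates, and parameterization are assumptions chosen for brevity.

```python
# Illustrative sketch (requires PyTorch); not the authors' implementation.
import torch

# Prisoner's Dilemma payoffs (row = own action: 0 = cooperate, 1 = defect).
R, S, T, P = 3.0, 0.0, 4.0, 1.0
payoff1 = torch.tensor([[R, S], [T, P]])   # agent 1's payoff
payoff2 = payoff1.T                        # symmetric game, agent 2's payoff

# Each learner: a single logit, sigmoid(theta) = probability of cooperating.
theta = [torch.zeros(1, requires_grad=True) for _ in range(2)]
# Planner: one additional reward per (joint action, agent) -- a 4 x 2 table.
planner_params = torch.zeros(4, 2, requires_grad=True)

lr_learner, lr_planner = 1.0, 0.1

def joint_probs(th):
    """Probabilities of the joint actions (CC, CD, DC, DD)."""
    p1, p2 = torch.sigmoid(th[0]), torch.sigmoid(th[1])
    return torch.stack([p1 * p2, p1 * (1 - p2),
                        (1 - p1) * p2, (1 - p1) * (1 - p2)]).squeeze()

def extrinsic_values(th):
    """Expected environment (extrinsic) payoff of each agent."""
    probs = joint_probs(th)
    return (probs * payoff1.flatten()).sum(), (probs * payoff2.flatten()).sum()

for _ in range(2000):
    probs = joint_probs(theta)
    v1, v2 = extrinsic_values(theta)
    # Each learner maximizes its extrinsic payoff plus the planner's reward.
    objectives = [v1 + (probs * planner_params[:, 0]).sum(),
                  v2 + (probs * planner_params[:, 1]).sum()]
    # Anticipated one-step updates of the learners, kept differentiable in
    # the planner's parameters (create_graph=True builds a higher-order graph).
    grads, new_theta = [], []
    for i in range(2):
        g = torch.autograd.grad(objectives[i], theta[i], create_graph=True)[0]
        grads.append(g)
        new_theta.append(theta[i] + lr_learner * g)
    # Planner objective: social welfare *after* the anticipated updates;
    # its gradient flows through those updates into the incentive table.
    w1, w2 = extrinsic_values(new_theta)
    planner_grad = torch.autograd.grad(-(w1 + w2), planner_params)[0]
    with torch.no_grad():
        planner_params -= lr_planner * planner_grad        # planner step
        for i in range(2):
            theta[i] += lr_learner * grads[i]              # learners' actual step

print("P(cooperate):", [torch.sigmoid(t).item() for t in theta])
```

A fuller treatment would typically work from sampled actions and observed trajectories rather than exact expected payoffs, and could also penalize the planner for the magnitude of the rewards it distributes; both are omitted here for clarity.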