State-of-the-art (SOTA) multiagent reinforcement learning algorithms differ from their single-agent counterparts in many ways, yet they still inherit the single-agent exploration-exploitation strategy wholesale. We report that naively inheriting this strategy can cause collaboration failures, in which agents blindly follow mainstream behaviors and refuse to take on minority responsibilities. We name this problem the diffusion of responsibility (DR), as it resembles the social psychology effect of the same name. In this work, we first analyze the cause of the DR problem theoretically, emphasizing that it is unrelated to reward crafting or to the credit assignment problem. We then propose a Policy Resonance approach that addresses the DR problem by modifying the multiagent exploration-exploitation strategy. Next, we show that most SOTA algorithms can be equipped with this approach to improve collaborative agent performance in complex cooperative tasks. Experiments on multiple benchmark tasks illustrate the effectiveness of the approach.