A hallmark of an AI agent is the ability to understand and interact with others as humans do. In this paper, we propose a collaborative multi-agent reinforcement learning algorithm that learns a \emph{joint} policy through interactions among agents. To reach a joint decision for the group, each agent makes an initial decision and communicates its policy to its neighbors. Each agent then adjusts its own policy based on the messages it receives and passes its updated plan on. We prove, within the framework of neural embedded probabilistic inference, that this intention propagation procedure converges to a mean-field approximation of the joint policy. We evaluate our algorithm on several challenging large-scale tasks and demonstrate that it outperforms previous state-of-the-art methods.
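The propagation loop described above can be illustrated with a classic mean-field update on a pairwise graphical model. This is only a minimal sketch under stated assumptions, not the paper's method: the paper uses neural embeddings, whereas here the per-agent preferences (`unary`), the action compatibility table (`pairwise`), the neighbor graph (`adjacency`), and the function name `intention_propagation` are all hypothetical names introduced for illustration.

```python
import numpy as np


def softmax(x):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def intention_propagation(unary, pairwise, adjacency, num_rounds=20):
    """Synchronous mean-field updates (illustrative, hypothetical setup).

    unary:     (n_agents, n_actions) per-agent action preferences (logits)
    pairwise:  (n_actions, n_actions) compatibility of neighboring actions
    adjacency: (n_agents, n_agents) 0/1 matrix, 1 where agents are neighbors
    Returns an (n_agents, n_actions) array of per-agent action distributions.
    """
    # Each agent starts from its own initial decision.
    q = softmax(unary)
    for _ in range(num_rounds):
        # Message to agent i for action a:
        #   sum over neighbors j and actions b of q[j, b] * pairwise[a, b]
        messages = adjacency @ q @ pairwise.T
        # Each agent revises its policy using the received messages.
        q = softmax(unary + messages)
    return q


# Three agents on a line (0 - 1 - 2); only agent 0 has a preference.
unary = np.array([[2.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
pairwise = np.eye(2)  # neighbors are rewarded for choosing the same action
adjacency = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)

policy = intention_propagation(unary, pairwise, adjacency)
print(policy.round(3))
```

In this toy setup, agent 0's preference for action 0 spreads along the chain, so the initially undecided agents 1 and 2 also end up favoring action 0. The product of the per-agent distributions in `policy` plays the role of the factorized (mean-field) approximation of the joint policy.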