This paper proposes an exploration technique for multi-agent reinforcement learning (MARL) with graph-based communication among agents. We assume that the individual rewards received by the agents are independent of the actions of the other agents, while their policies are coupled. In the proposed framework, neighbouring agents collaborate to estimate the uncertainty about the state-action space in order to execute more efficient exploratory behaviour. Unlike existing works, the proposed algorithm does not require counting mechanisms and can be applied to continuous-state environments without complex conversion techniques. Moreover, the proposed scheme allows agents to communicate in a fully decentralized manner with minimal information exchange; in continuous-state scenarios, each agent needs to exchange only a single parameter vector. The performance of the algorithm is verified with theoretical results for discrete-state scenarios and with experiments for continuous ones.
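Because the abstract gives no implementation details, the following is only a minimal, hypothetical sketch of the general idea it describes: graph neighbours pool their uncertainty estimates about a continuous state-action space by each exchanging a single parameter vector. Every name and design choice below (the RBF feature map phi, the ring graph NEIGHBOURS, the consensus weight, the bonus formula) is an assumption introduced for illustration, not the method of the paper.

import numpy as np

rng = np.random.default_rng(0)

N_AGENTS = 4                                   # assumed number of agents
D = 16                                         # assumed feature dimension
# assumed ring communication graph: each agent talks only to its two neighbours
NEIGHBOURS = {i: [(i - 1) % N_AGENTS, (i + 1) % N_AGENTS] for i in range(N_AGENTS)}

# Non-negative RBF features of a continuous (state, action) pair; the centres
# and bandwidth are arbitrary choices made only for this illustration.
CENTRES = rng.uniform(-1.0, 1.0, size=(D, 3))  # 2-dim state + 1-dim action
SIGMA = 0.5

def phi(state, action):
    """Feature map for a continuous state-action pair (illustrative only)."""
    x = np.concatenate([state, [action]])
    return np.exp(-np.sum((CENTRES - x) ** 2, axis=1) / (2.0 * SIGMA ** 2))

# Each agent keeps one D-dimensional parameter vector summarising how much of
# the state-action space it has visited, measured in feature space.
theta = [np.zeros(D) for _ in range(N_AGENTS)]

def local_update(i, state, action, lr=0.1):
    """Agent i folds its latest transition into its own visitation summary."""
    theta[i] += lr * phi(state, action)

def consensus_step(weight=0.5):
    """Each agent mixes its parameter vector with those of its neighbours.
    This vector is the only quantity exchanged over the graph."""
    mixed = [
        (1.0 - weight) * theta[i]
        + weight * np.mean([theta[j] for j in NEIGHBOURS[i]], axis=0)
        for i in range(N_AGENTS)
    ]
    theta[:] = mixed

def exploration_bonus(i, state, action):
    """Regions the group has rarely visited receive a larger optimistic bonus."""
    return 1.0 / np.sqrt(1.0 + theta[i] @ phi(state, action))

# Toy usage: agents act, update locally, then run one consensus round per step.
for _ in range(100):
    for i in range(N_AGENTS):
        s, a = rng.uniform(-1, 1, size=2), rng.uniform(-1, 1)
        local_update(i, s, a)
    consensus_step()

print(exploration_bonus(0, np.array([0.0, 0.0]), 0.0))

In this toy version the consensus step plays the role of the decentralized, minimal information exchange mentioned above: no counts or raw transitions are shared, only one parameter vector per agent per round.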