Exploration is critical for good results in deep reinforcement learning and has attracted much attention. However, existing multi-agent deep reinforcement learning algorithms still use mostly noise-based techniques. Very recently, exploration methods that consider cooperation among multiple agents have been developed. However, existing methods suffer from a common challenge: agents struggle to identify states that are worth exploring, and hardly coordinate exploration efforts toward those states. To address this shortcoming, in this paper, we propose cooperative multi-agent exploration (CMAE): agents share a common goal while exploring. The goal is selected from multiple projected state spaces via a normalized entropy-based technique. Then, agents are trained to reach this goal in a coordinated manner. We demonstrate that CMAE consistently outperforms baselines on various tasks, including a sparse-reward version of the multiple-particle environment (MPE) and the StarCraft multi-agent challenge (SMAC).
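To make the normalized entropy-based goal selection concrete, the following is a minimal sketch, not the paper's implementation: it assumes discrete joint states stored in a replay buffer, forms projected (restricted) state spaces from small subsets of the state dimensions, scores each projected space by the normalized entropy of its visitation counts, and returns a rarely visited projected state from the least-uniformly-explored space as the shared goal. The function names (`normalized_entropy`, `select_shared_goal`) and the projection size `k` are hypothetical choices for illustration.

```python
import numpy as np
from itertools import combinations
from collections import Counter

def normalized_entropy(counts):
    """Entropy of a visitation distribution, normalized to [0, 1] by log of its support size."""
    p = counts / counts.sum()
    h = -(p * np.log(p + 1e-12)).sum()
    return h / np.log(len(counts)) if len(counts) > 1 else 0.0

def select_shared_goal(state_buffer, k=2):
    """Pick a shared exploration goal from projected (restricted) state spaces.

    state_buffer: array of shape (N, D) of visited joint states (assumed discrete).
    k: number of state dimensions per projected space (hypothetical choice).
    Returns (dims, goal): the selected dimensions and a rarely visited projected state.
    """
    best = None
    for dims in combinations(range(state_buffer.shape[1]), k):
        projected = [tuple(s) for s in state_buffer[:, dims]]
        counts = Counter(projected)
        ent = normalized_entropy(np.array(list(counts.values()), dtype=float))
        # Prefer the projected space whose visitation distribution is least uniform,
        # i.e. the one that appears under-explored.
        if best is None or ent < best[0]:
            goal = min(counts, key=counts.get)  # least-visited projected state
            best = (ent, dims, goal)
    _, dims, goal = best
    return dims, goal
```

In this sketch, the returned `(dims, goal)` pair would then be handed to the agents' training loop as a common target, e.g. via a goal-reaching bonus, which stands in for the coordinated goal-reaching training described above.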