We present an extension of the Remember and Forget for Experience Replay (ReF-ER) algorithm to Multi-Agent Reinforcement Learning (MARL). ReF-ER was shown to outperform state-of-the-art algorithms for continuous control in problems ranging from the OpenAI Gym to complex fluid flows. In MARL, the dependencies between the agents are included in the state-value estimator, and the environment dynamics are modeled via the importance weights used by ReF-ER. In collaborative environments, we find the best performance when the value is estimated from individual rewards and the effects of the other agents' actions on the transition map are ignored. We benchmark the performance of ReF-ER MARL on the Stanford Intelligent Systems Laboratory (SISL) environments. We find that ReF-ER MARL, employing a single feed-forward neural network for both the policy and the value function, outperforms state-of-the-art algorithms that rely on complex neural network architectures.
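For context, a minimal sketch of the importance weight and the off-policy cutoff from the original single-agent ReF-ER formulation is given below; the symbols $\rho_t$ (importance weight), $\mu_t$ (behavior policy stored in the replay memory), and $c_{\max}$ (clipping parameter) are not introduced in this abstract and follow the notation of the original ReF-ER work.
\[
\rho_t = \frac{\pi^w(a_t \mid s_t)}{\mu_t(a_t \mid s_t)}, \qquad
\text{experience } t \text{ is far-policy if } \rho_t > c_{\max} \ \text{or}\ \rho_t < \frac{1}{c_{\max}} .
\]
Gradients computed from far-policy samples are skipped, and a penalty term keeps the current policy $\pi^w$ close to the behaviors stored in the replay memory; the MARL extension described here reuses these importance weights to account for the environment dynamics induced by the other agents.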