We present the extension of the Remember and Forget for Experience Replay (ReF-ER) algorithm to Multi-Agent Reinforcement Learning (MARL). ReF-ER was shown to outperform state-of-the-art algorithms for continuous control in problems ranging from the OpenAI Gym to complex fluid flows. In MARL, the dependencies between the agents are included in the state-value estimator, and the environment dynamics are modeled via the importance weights used by ReF-ER. In collaborative environments, we find the best performance when the value is estimated using individual rewards and the effects of other agents' actions on the transition map are ignored. We benchmark the performance of ReF-ER MARL on the Stanford Intelligent Systems Laboratory (SISL) environments. We find that employing a single feed-forward neural network for the policy and the value function in ReF-ER MARL outperforms state-of-the-art algorithms that rely on complex neural network architectures.
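To make the importance-weight mechanism mentioned above more concrete, the following is a minimal sketch, not the authors' implementation, of how ReF-ER-style importance weights could gate replayed experiences per agent. The threshold `c_max`, the helper names, and the per-agent factorization (each agent using only its own action's weight, consistent with ignoring other agents' effects on the transition map) are illustrative assumptions.

```python
import numpy as np

def importance_weight(logp_new: float, logp_behavior: float) -> float:
    """Per-agent importance weight rho = pi(a|s) / mu(a|s), where mu is the
    behavior policy stored with the replayed sample (assumed log-probabilities)."""
    return float(np.exp(logp_new - logp_behavior))

def is_near_policy(rho: float, c_max: float = 4.0) -> bool:
    """ReF-ER-style gate: a replayed sample counts as 'near-policy' if its
    importance weight lies within [1/c_max, c_max]; far-policy samples are
    excluded from the policy/value updates (the 'Forget' part)."""
    return (1.0 / c_max) < rho < c_max

# Illustrative multi-agent usage: each agent is gated by its own weight only.
agents_logps = [(-1.2, -1.0), (-0.3, -0.9)]  # (log pi, log mu) per agent
for i, (logp_new, logp_old) in enumerate(agents_logps):
    rho = importance_weight(logp_new, logp_old)
    print(f"agent {i}: rho={rho:.3f}, near-policy={is_near_policy(rho)}")
```

In this sketch, the decentralized gating mirrors the collaborative setting described in the abstract, where each agent's value is estimated from its individual reward.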