Exploration in multi-agent reinforcement learning is a challenging problem, especially in environments with sparse rewards. We propose a general method for efficient exploration by sharing experience amongst agents. Our proposed algorithm, called Shared Experience Actor-Critic (SEAC), applies experience sharing in an actor-critic framework. We evaluate SEAC in a collection of sparse-reward multi-agent environments and find that it consistently outperforms two baselines and two state-of-the-art algorithms by learning in fewer steps and converging to higher returns. In some harder environments, experience sharing makes the difference between learning to solve the task and not learning at all.
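To make the core idea concrete, below is a minimal sketch of how experience sharing could look inside an actor-critic policy update: each agent trains on its own transitions as usual and additionally on other agents' transitions, re-weighted by an importance-sampling ratio because those transitions were collected under a different policy. All names, signatures, and the weighting coefficient `lam` are illustrative assumptions, not details taken from the abstract.

```python
import torch


def shared_experience_policy_loss(
    logp_own,            # log pi_i(a_i | o_i) on agent i's own transitions
    adv_own,             # advantage estimates for agent i's own transitions
    logp_other_under_i,  # log pi_i(a_k | o_k): agent i's policy evaluated on agent k's transitions
    logp_other_under_k,  # log pi_k(a_k | o_k): the behaviour policy that generated them
    adv_other,           # advantages computed with agent i's critic on agent k's transitions
    lam=1.0,             # weight of the shared-experience term (assumed hyperparameter)
):
    """Illustrative actor loss for one agent with experience sharing.

    The agent's own data contributes a standard policy-gradient term; other
    agents' data contributes an off-policy term corrected by an importance ratio.
    """
    # Standard on-policy policy-gradient term on the agent's own experience.
    on_policy = -(logp_own * adv_own.detach()).mean()

    # Importance ratio corrects for (o_k, a_k) being sampled from pi_k, not pi_i.
    ratio = torch.exp(logp_other_under_i - logp_other_under_k).detach()
    shared = -(ratio * logp_other_under_i * adv_other.detach()).mean()

    return on_policy + lam * shared
```

The sketch only shows the actor term; in practice each agent's critic would be trained analogously on both its own and the shared transitions.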