Learning in high-dimensional continuous tasks is challenging, particularly when the experience replay memory is very limited. We introduce a simple yet effective experience sharing mechanism for deterministic policies in continuous action domains, aimed at future off-policy deep reinforcement learning applications in which the memory allocated to the experience replay buffer is limited. To overcome the extrapolation error induced by learning from other agents' experiences, we equip our algorithm with a novel off-policy correction technique that requires no action probability estimates. We evaluate the effectiveness of our method on challenging OpenAI Gym continuous control tasks and conclude that it achieves safe experience sharing across multiple agents and exhibits robust performance when the replay memory is strictly limited.