Non-parametric episodic memory can be used to quickly latch onto high-reward experience in reinforcement learning tasks. In contrast to parametric deep reinforcement learning approaches, these methods only need to discover a solution once, after which they can repeatedly exploit it. However, episodic control solutions are stored in discrete tables, and the approach has so far only been applied to problems with discrete action spaces. This paper therefore introduces Continuous Episodic Control (CEC), a novel non-parametric episodic memory algorithm for sequential decision making in problems with a continuous action space. Results on several sparse-reward continuous control environments show that our proposed method learns faster than state-of-the-art model-free RL and memory-augmented RL algorithms, while maintaining good long-run performance. In short, CEC is a fast approach for learning in continuous control tasks, and a useful complement to parametric RL methods in a hybrid approach.
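To make the underlying idea concrete, below is a minimal sketch of non-parametric episodic control extended to continuous actions: episodes are stored as (state, action, return) rows, and at decision time the agent copies the action of the highest-return entry among the k stored states nearest to the current state. This is a generic illustration under our own assumptions, not the paper's CEC implementation; the class and method names (`EpisodicMemory`, `add`, `act`) are hypothetical.

```python
import numpy as np

class EpisodicMemory:
    """Minimal non-parametric episodic memory for continuous actions.

    A generic sketch of tabular episodic control adapted to continuous
    action spaces, NOT the exact CEC algorithm from the paper: entries
    are (state, action, episodic return) rows, and actions are retrieved
    by nearest-neighbor lookup over stored states.
    """

    def __init__(self, capacity=10_000):
        self.states, self.actions, self.returns = [], [], []
        self.capacity = capacity

    def add(self, state, action, episodic_return):
        """Store one experience; evict the oldest entry when full."""
        if len(self.states) >= self.capacity:
            self.states.pop(0)
            self.actions.pop(0)
            self.returns.pop(0)
        self.states.append(np.asarray(state, dtype=np.float64))
        self.actions.append(np.asarray(action, dtype=np.float64))
        self.returns.append(float(episodic_return))

    def act(self, state, k=5):
        """Among the k stored states nearest to `state`, return the
        action associated with the highest episodic return."""
        S = np.stack(self.states)
        dists = np.linalg.norm(S - np.asarray(state), axis=1)
        knn = np.argsort(dists)[:k]
        best = knn[np.argmax(np.asarray(self.returns)[knn])]
        return self.actions[best]
```

Because the table stores raw continuous actions rather than discrete action-value bins, a single high-return trajectory can be replayed immediately in nearby states, which is what lets such methods latch onto sparse rewards faster than gradient-based updates.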