Non-parametric episodic memory can be used to quickly latch onto highly rewarded experience in reinforcement learning tasks. In contrast to parametric deep reinforcement learning approaches, in which reward signals are back-propagated slowly, these methods only need to discover a solution once and may then repeatedly solve the task. However, episodic control stores its solutions in discrete tables, and this approach has so far only been applied to problems with discrete action spaces. Therefore, this paper introduces Continuous Episodic Control (CEC), a novel non-parametric episodic memory algorithm for sequential decision making in problems with a continuous action space. Results on several sparse-reward continuous control environments show that our proposed method learns faster than state-of-the-art model-free RL and memory-augmented RL algorithms, while also maintaining good long-run performance. In short, CEC can be a fast approach for learning in continuous control tasks.
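To make the core idea concrete, the sketch below illustrates one possible way a non-parametric episodic memory could act in a continuous action space: it stores (state, action, return) tuples and, at decision time, replays the action of the nearest stored state with a small Gaussian perturbation. The class name `EpisodicMemory`, the capacity-based eviction rule, and the noise scale are illustrative assumptions for this sketch, not the exact CEC procedure described in the paper.

```python
import numpy as np

class EpisodicMemory:
    """Minimal non-parametric episodic memory for continuous actions.

    Illustrative sketch only: stores (state, action, return) tuples and
    acts by replaying the action of the nearest stored state. The exact
    CEC update and selection rules may differ.
    """

    def __init__(self, capacity=10000):
        self.capacity = capacity
        self.states, self.actions, self.returns = [], [], []

    def add(self, state, action, episode_return):
        # When full, keep only the highest-return entries (assumed eviction rule).
        if len(self.states) >= self.capacity:
            worst = int(np.argmin(self.returns))
            if episode_return <= self.returns[worst]:
                return
            self.states.pop(worst)
            self.actions.pop(worst)
            self.returns.pop(worst)
        self.states.append(np.asarray(state, dtype=np.float32))
        self.actions.append(np.asarray(action, dtype=np.float32))
        self.returns.append(float(episode_return))

    def act(self, state, exploration_noise=0.1):
        # Before anything is stored, the caller should fall back to a random action.
        if not self.states:
            return None
        # Nearest-neighbour lookup over stored states (Euclidean distance).
        dists = np.linalg.norm(np.stack(self.states) - np.asarray(state), axis=1)
        nearest = int(np.argmin(dists))
        action = self.actions[nearest]
        # Gaussian noise keeps exploring around remembered high-return actions.
        return action + exploration_noise * np.random.randn(*action.shape)
```

Because the memory is non-parametric, a single successful episode immediately changes behaviour on the next visit to similar states, which is the "latch onto highly rewarded experience" property contrasted with slow gradient-based value propagation above.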