Existing Deep Reinforcement Learning (DRL) algorithms suffer from sample inefficiency. Episodic control-based approaches address this by leveraging highly-rewarded past experiences to improve the sample efficiency of DRL algorithms. However, previous episodic control-based approaches fail to exploit the latent information in historical behaviors (e.g., state transitions, topological similarities) and lack scalability during DRL training. This work introduces Neural Episodic Control with State Abstraction (NECSA), a simple but effective state abstraction-based episodic control framework comprising a more comprehensive episodic memory, a novel state evaluation, and a multi-step state analysis. We evaluate our approach on MuJoCo and Atari tasks in the OpenAI Gym domains. The experimental results indicate that NECSA achieves higher sample efficiency than state-of-the-art episodic control-based approaches. Our data and code are available at the project website\footnote{\url{https://sites.google.com/view/drl-necsa}}.
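As a rough illustration of the episodic-control idea summarized above, the sketch below maintains a table mapping coarse abstract states to the best episodic return observed so far; the grid-based abstraction, the `AbstractEpisodicMemory` class, and all parameter choices are hypothetical placeholders meant to convey the general flavor of such methods, not NECSA's actual design.

```python
# A minimal, hypothetical sketch of state-abstraction-based episodic memory
# (an illustration of the general idea, not NECSA's implementation).
from collections import defaultdict

import numpy as np


class AbstractEpisodicMemory:
    """Episodic memory keyed by a coarse (grid) abstraction of the state."""

    def __init__(self, grid_size=10):
        self.grid_size = grid_size          # assumed granularity of the abstraction
        self.table = defaultdict(float)     # abstract state -> best observed return

    def abstract(self, state):
        # Grid abstraction: discretize each state dimension (assumed normalized to [0, 1]).
        return tuple((np.asarray(state) * self.grid_size).astype(int))

    def update(self, state, episodic_return):
        # Keep the highest return observed for this abstract state.
        key = self.abstract(state)
        self.table[key] = max(self.table[key], episodic_return)

    def lookup(self, state):
        # Retrieve the best known return for this abstract state,
        # which an agent could use to guide or reshape its RL update.
        return self.table.get(self.abstract(state), 0.0)
```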