Episodic memory-based methods can rapidly latch onto past successful strategies via a non-parametric memory and improve the sample efficiency of traditional reinforcement learning. However, little effort has been devoted to the continuous domain, where a state is never visited twice and previous episodic methods fail to aggregate experience efficiently across trajectories. To address this problem, we propose Generalizable Episodic Memory (GEM), which effectively organizes the state-action values of episodic memory in a generalizable manner and supports implicit planning on memorized trajectories. GEM utilizes a double estimator to reduce the overestimation bias induced by value propagation in the planning process. Empirical evaluation shows that our method significantly outperforms existing trajectory-based methods on various MuJoCo continuous control tasks. To further demonstrate its general applicability, we evaluate our method on Atari games with discrete action spaces, where it also shows significant improvement over baseline algorithms.
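To make the core mechanism concrete, the sketch below illustrates double-estimator value propagation along a memorized trajectory: each estimator recursively compares its bootstrapped one-step return against its own Q estimate, while the final target mixes the two (one estimator selects, the other evaluates) to damp the overestimation that max-based backups induce. This is a minimal illustration under our own assumptions, not the paper's exact update; the function `episodic_backup` and its arguments are hypothetical names.

```python
import numpy as np

def episodic_backup(rewards, q1, q2, gamma=0.99):
    """Hypothetical sketch of a double-estimator backup along one
    memorized trajectory (not GEM's exact formulation).

    rewards: per-step rewards r_0..r_{T-1} from the stored trajectory
    q1, q2:  the two estimators' Q(s_t, a_t) predictions along it
    Returns backup targets V_0..V_{T-1}.
    """
    T = len(rewards)
    v1 = np.zeros(T)      # recursive backup under estimator 1
    v2 = np.zeros(T)      # recursive backup under estimator 2
    target = np.zeros(T)  # double-estimator training target
    next_v1 = next_v2 = 0.0
    for t in reversed(range(T)):
        # Each estimator implicitly plans: either bootstrap from its own
        # Q estimate or follow the memorized return, whichever is larger.
        v1[t] = max(q1[t], rewards[t] + gamma * next_v1)
        v2[t] = max(q2[t], rewards[t] + gamma * next_v2)
        # Double estimation: estimator 1 makes the bootstrap-vs-memory
        # choice, estimator 2 supplies the value for that choice, which
        # reduces the upward bias of taking a max under noisy estimates.
        take_memory = rewards[t] + gamma * next_v1 >= q1[t]
        target[t] = (rewards[t] + gamma * next_v2) if take_memory else q2[t]
        next_v1, next_v2 = v1[t], v2[t]
    return target

# Toy usage with random Q predictions for a length-3 trajectory.
rng = np.random.default_rng(0)
print(episodic_backup(rewards=[1.0, 0.0, 2.0],
                      q1=rng.normal(size=3), q2=rng.normal(size=3)))
```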