Prioritized experience replay is a reinforcement learning technique shown to speed up learning by allowing agents to replay useful past experiences more frequently. This usefulness is quantified as the expected gain from replaying the experience, and is often approximated by the prediction error (TD-error) observed when the experience occurred. However, prediction error is only one possible prioritization metric. Recent work in neuroscience suggests that, in biological organisms, replay is prioritized by both gain and need. The need term measures the expected relevance of each experience with respect to the agent's current situation, and, importantly, this term is not currently considered in algorithms such as the deep Q-network (DQN). In this paper we therefore present a new approach to prioritizing experiences for replay that considers both gain and need. We test our approach by incorporating the need term, quantified as the Successor Representation, into the sampling process of different reinforcement learning algorithms. Our proposed algorithms show a significant increase in performance on benchmarks including the Dyna-Q maze and a selection of Atari games.
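To make the gain-and-need idea above concrete, the following is a minimal sketch, not the paper's implementation: it assumes a tabular setting, and the buffer class, SR update rule, and hyperparameters (gamma, lr, eps) are illustrative choices. Each stored transition's sampling probability is the product of a gain term (absolute TD-error) and a need term read from the Successor Representation row of the agent's current state.

```python
# Illustrative sketch: replay sampling weighted by gain (|TD-error|) times
# need (Successor Representation of the current state). Tabular toy example.
import numpy as np

class GainNeedReplayBuffer:
    """Replay buffer whose sampling probabilities combine gain and need."""

    def __init__(self, capacity, n_states, eps=1e-5):
        self.capacity = capacity
        self.eps = eps                # keeps every priority strictly positive
        self.transitions = []         # (state, action, reward, next_state)
        self.td_errors = []           # gain term recorded at insertion time
        self.sr = np.eye(n_states)    # SR matrix M[s, s'], identity at start

    def update_sr(self, state, next_state, gamma=0.95, lr=0.1):
        # TD-style update of the Successor Representation:
        # M[s] <- M[s] + lr * (one_hot(s) + gamma * M[s'] - M[s])
        target = np.eye(self.sr.shape[0])[state] + gamma * self.sr[next_state]
        self.sr[state] += lr * (target - self.sr[state])

    def add(self, transition, td_error):
        if len(self.transitions) >= self.capacity:
            self.transitions.pop(0)
            self.td_errors.pop(0)
        self.transitions.append(transition)
        self.td_errors.append(abs(td_error))

    def sample(self, current_state, batch_size):
        gain = np.array(self.td_errors) + self.eps
        # Need: expected future occupancy of each stored transition's state,
        # read off the SR row of the agent's current state.
        need = np.array([self.sr[current_state, t[0]]
                         for t in self.transitions]) + self.eps
        probs = gain * need
        probs /= probs.sum()
        idx = np.random.choice(len(self.transitions), size=batch_size, p=probs)
        return [self.transitions[i] for i in idx]
```

The multiplicative combination of the two terms mirrors the gain-and-need prioritization account described above; how the need term is estimated in function-approximation settings such as DQN is left to the paper's method.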