We propose Ephemeral Value Adjustments (EVA): a means of allowing deep reinforcement learning agents to rapidly adapt to experience in their replay buffer. EVA shifts the value predicted by a neural network towards an estimate of the value function found by planning over experience tuples from the replay buffer near the current state. EVA brings together a number of recent ideas on integrating episodic memory-like structures into reinforcement learning agents: slot-based storage, content-based retrieval, and memory-based planning. We show that EVA is performant on a demonstration task and Atari games.
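To make the core idea concrete, below is a minimal sketch of an EVA-style value adjustment, not the paper's exact algorithm. It assumes trajectories retrieved from the replay buffer near the current state (the retrieval step itself is omitted), approximates the planning step with simple n-step returns averaged per first action, and uses an illustrative mixing weight `lam`; all names here are hypothetical.

```python
import numpy as np

def trace_value(rewards, bootstrap_q, gamma=0.99):
    """Non-parametric value estimate: a discounted n-step return along a
    stored trajectory, bootstrapped from a parametric Q-value at the end."""
    v = bootstrap_q
    for r in reversed(rewards):
        v = r + gamma * v
    return v

def eva_q_values(q_theta, neighbor_trajectories, lam=0.5, gamma=0.99):
    """Blend the network's Q-values with an estimate computed over replay
    trajectories whose states lie near the current state.

    q_theta: (num_actions,) parametric Q-values for the current state.
    neighbor_trajectories: list of (first_action, rewards, bootstrap_q)
        tuples drawn from the replay buffer near the current state.
    """
    num_actions = len(q_theta)
    sums = np.zeros(num_actions)
    counts = np.zeros(num_actions)
    for first_action, rewards, bootstrap_q in neighbor_trajectories:
        sums[first_action] += trace_value(rewards, bootstrap_q, gamma)
        counts[first_action] += 1
    # Fall back to the parametric estimate for actions with no nearby traces.
    q_np = np.where(counts > 0, sums / np.maximum(counts, 1), q_theta)
    # Ephemeral adjustment: shift the parametric values toward the
    # non-parametric, replay-derived ones.
    return lam * q_theta + (1.0 - lam) * q_np
```

The adjustment is "ephemeral" in the sense that the blended values are used for acting but the network parameters themselves are unchanged; only the parametric term persists across states.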