When thrust into an unfamiliar environment and charged with solving a series of tasks, an effective agent should (1) leverage prior knowledge to solve its current task while (2) efficiently exploring to gather knowledge for use in future tasks, and then (3) plan using that knowledge when faced with new tasks in that same environment. We introduce two domains for conducting research on this challenge, and find that state-of-the-art deep reinforcement learning (RL) agents fail to plan in novel environments. We develop a recursive implicit planning module that operates over episodic memories, and show that the resulting deep-RL agent is able to explore and plan in novel environments, outperforming the nearest baseline by factors of 2-3 across the two domains. We find evidence that our module (1) learned to execute a sensible information-propagating algorithm and (2) generalizes to situations beyond its training experience.
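To make the idea of a planning module operating over episodic memories concrete, here is a minimal sketch, not the paper's exact architecture: it assumes the planner is a self-attention block applied recursively over a buffer of memory embeddings so that information can propagate between memories, followed by a readout that attends from the current state. The class name `RecursivePlanner`, the layer sizes, and the number of recursion steps are all illustrative assumptions.

```python
# Hedged sketch of a recursive implicit planner over episodic memories.
# Assumptions (not from the paper): one shared attention block applied for a
# fixed number of steps, then a state-conditioned attention readout.
import torch
import torch.nn as nn


class RecursivePlanner(nn.Module):
    def __init__(self, dim=64, heads=4, steps=3):
        super().__init__()
        self.steps = steps  # same block applied recursively (shared weights)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.readout = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, memories, state):
        # memories: (batch, num_memories, dim) episodic memory embeddings
        # state:    (batch, dim) current-state embedding used as the query
        h = memories
        for _ in range(self.steps):      # recursive application of one block
            a, _ = self.attn(h, h, h)    # propagate information between memories
            h = self.norm1(h + a)
            h = self.norm2(h + self.mlp(h))
        q = state.unsqueeze(1)           # (batch, 1, dim)
        plan, _ = self.readout(q, h, h)  # read out a plan embedding for the policy
        return plan.squeeze(1)           # (batch, dim)


if __name__ == "__main__":
    planner = RecursivePlanner()
    mem = torch.randn(2, 16, 64)         # 16 stored memories per episode
    obs = torch.randn(2, 64)
    print(planner(mem, obs).shape)       # torch.Size([2, 64])
```

In this sketch the recursion depth controls how far information can spread across the memory buffer before readout, which is one plausible way to realize the "information-propagating algorithm" behavior described above.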