We propose the challenge of rapid task-solving in novel environments (RTS), wherein an agent must solve a series of tasks as rapidly as possible in an unfamiliar environment. An effective RTS agent must balance between exploring the unfamiliar environment and solving its current task, all while building a model of the new environment over which it can plan when faced with later tasks. While modern deep RL agents exhibit some of these abilities in isolation, none are suitable for the full RTS challenge. To enable progress toward RTS, we introduce two challenge domains: (1) a minimal RTS challenge called the Memory&Planning Game and (2) One-Shot StreetLearn Navigation, which introduces scale and complexity from real-world data. We demonstrate that state-of-the-art deep RL agents fail at RTS in both domains, and that this failure is due to an inability to plan over gathered knowledge. We develop Episodic Planning Networks (EPNs) and show that deep-RL agents with EPNs excel at RTS, outperforming the nearest baseline by factors of 2-3 and learning to navigate held-out StreetLearn maps within a single episode. We show that EPNs learn to execute a value iteration-like planning algorithm and that they generalize to situations beyond their training experience.
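For reference, value iteration, the classical dynamic-programming algorithm that the EPN's learned computation is said to resemble, repeatedly backs up values through a model of the environment until they converge. The sketch below is a minimal tabular version for deterministic transitions; all names and signatures are illustrative assumptions for exposition, not the paper's implementation.

def value_iteration(states, actions, transition, reward, gamma=0.9, tol=1e-6):
    """Compute optimal state values by repeated Bellman backups.

    transition(s, a) -> deterministic next state s'
    reward(s, a)     -> immediate scalar reward
    """
    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman backup: best one-step return from s under current values.
            best = max(reward(s, a) + gamma * values[transition(s, a)]
                       for a in actions)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:  # stop once values have converged
            return values

A goal-conditioned variant of this computation, applied to the states and transitions an agent gathers within the current episode, is the kind of planning the paper's analysis suggests EPNs learn to approximate.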