We present HiDe, a novel hierarchical reinforcement learning architecture that successfully solves long-horizon control tasks and generalizes to unseen test scenarios. Functional decomposition between planning and low-level control is achieved by explicitly separating the state-action spaces across the hierarchy, which allows the integration of task-relevant knowledge per layer. We propose an RL-based planner to efficiently leverage the information in the planning layer of the hierarchy, while the control layer learns a goal-conditioned control policy. The hierarchy is trained jointly but allows for the modular transfer of policy layers across hierarchies of different agents. We experimentally show that our method generalizes across unseen test environments and can scale to 3x the horizon length compared to both learning- and non-learning-based methods. We evaluate on complex continuous control tasks with sparse rewards, including navigation and robot manipulation.
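To illustrate the functional decomposition described above, the following is a minimal sketch of a two-level hierarchy in which an RL-based planner operates on a coarse task-level state and emits subgoals, while a goal-conditioned control policy operates on the full state-action space. This is not the authors' implementation; the class names, the `task_state` abstraction, and the subgoal-horizon scheduling are hypothetical placeholders used only to convey the structure.

```python
import numpy as np

def task_state(obs: np.ndarray) -> np.ndarray:
    # Hypothetical state abstraction for the planning layer,
    # e.g. the agent's (x, y) position extracted from the full observation.
    return obs[:2]

class PlannerPolicy:
    """Planning layer: maps a task-level observation to a subgoal."""
    def act(self, planner_obs: np.ndarray) -> np.ndarray:
        # Placeholder: a trained planner would propose a reachable subgoal here.
        return planner_obs + np.random.uniform(-1.0, 1.0, size=planner_obs.shape)

class ControlPolicy:
    """Control layer: goal-conditioned policy over the full state-action space."""
    def act(self, obs: np.ndarray, goal: np.ndarray) -> np.ndarray:
        # Placeholder: a trained policy would output low-level actions toward `goal`.
        return np.clip(goal - task_state(obs), -1.0, 1.0)

def rollout(env, planner: PlannerPolicy, controller: ControlPolicy,
            subgoal_horizon: int = 10, max_steps: int = 500) -> None:
    """Run one episode: the planner picks a new subgoal every `subgoal_horizon`
    steps, and the controller acts toward the current subgoal in between."""
    obs = env.reset()
    subgoal = planner.act(task_state(obs))
    for t in range(max_steps):
        if t % subgoal_horizon == 0:
            subgoal = planner.act(task_state(obs))
        obs, reward, done, info = env.step(controller.act(obs, subgoal))
        if done:
            break
```

Because the two layers interact only through the subgoal interface, a trained control policy could in principle be reused under a different planner (or vice versa), which is the modular transfer property the abstract refers to.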