CRISP: Curriculum Inducing Primitive Informed Subgoal Prediction for Hierarchical Reinforcement Learning

Hierarchical reinforcement learning is a promising approach that uses temporal abstraction to solve complex, long-horizon problems. However, simultaneously learning a hierarchy of policies is unstable, because it is challenging to train the higher-level policy while the lower-level primitive is non-stationary. In this paper, we propose a novel hierarchical algorithm that generates a curriculum of achievable subgoals for the evolving lower-level primitive, using reinforcement learning and imitation learning. The lower-level primitive periodically relabels a handful of expert demonstrations using our primitive-informed parsing approach. We derive expressions that bound the sub-optimality of our method and develop a practical algorithm for hierarchical reinforcement learning. Since our approach requires only a handful of expert demonstrations, it is suitable for most robotic control tasks. Experimental evaluations on complex maze navigation and robotic manipulation environments show that inducing hierarchical curriculum learning significantly improves sample efficiency and yields efficient goal-conditioned policies for solving temporally extended tasks.
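As a concrete illustration of the primitive-informed parsing idea described above, below is a minimal Python sketch that greedily segments an expert demonstration into subgoals the current lower-level primitive judges reachable. The function name `primitive_informed_parse`, the value-function interface `lower_value_fn(s, g)`, and the greedy thresholding criterion are all illustrative assumptions; the paper's exact parsing rule is not specified in this abstract.

```python
def primitive_informed_parse(demo_states, lower_value_fn, value_threshold=0.5):
    """Greedily segment one expert trajectory into subgoals.

    demo_states     : list of states from a single expert demonstration
    lower_value_fn  : callable V(s, g) estimating how well the current
                      lower-level primitive can reach goal g from state s
                      (hypothetical interface, not from the paper)
    value_threshold : minimum value for a state to count as achievable

    Returns a list of subgoal states forming a curriculum.
    """
    subgoals = []
    start = 0  # index of the state the current segment begins from
    for t in range(1, len(demo_states)):
        # Extend the segment while the primitive still expects to reach
        # demo_states[t] from the segment's start state.
        if lower_value_fn(demo_states[start], demo_states[t]) < value_threshold:
            # demo_states[t] looks unreachable; cut the segment at t - 1
            # and make that state the next subgoal.
            subgoals.append(demo_states[t - 1])
            start = t - 1
    subgoals.append(demo_states[-1])  # the final goal closes the curriculum
    return subgoals


# Toy usage with a distance-based stand-in for the primitive's value:
# states within roughly 2.4 units of the segment start count as reachable.
demo = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
value_fn = lambda s, g: 1.0 - min(abs(g - s) / 3.0, 1.0)
print(primitive_informed_parse(demo, value_fn, value_threshold=0.2))
# -> [2.0, 4.0, 5.0]
```

Periodically re-running a parse like this as the lower-level primitive improves is one way to realize the abstract's curriculum of achievable subgoals: early in training the demonstration is cut into many short, easy segments, and later into fewer, longer ones.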