Goal-conditioned hierarchical reinforcement learning (HRL) has shown promising results for solving complex and long-horizon RL tasks. However, the action space of the high-level policy in goal-conditioned HRL is often large, which results in poor exploration and inefficient training. In this paper, we present HIerarchical reinforcement learning Guided by Landmarks (HIGL), a novel framework for training a high-level policy with a reduced action space guided by landmarks, i.e., promising states to explore. The key components of HIGL are twofold: (a) sampling landmarks that are informative for exploration and (b) encouraging the high-level policy to generate a subgoal towards a selected landmark. For (a), we consider two criteria: coverage of the entire visited state space (i.e., dispersion of states) and novelty of states (i.e., prediction error of a state). For (b), we select the landmark to pursue as the very first landmark on the shortest path in a graph whose nodes are landmarks. Our experiments demonstrate that our framework outperforms prior arts across a variety of control tasks, thanks to efficient exploration guided by landmarks.
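To make the two components concrete, below is a minimal, hypothetical sketch in Python (NumPy/NetworkX): coverage-based landmarks are drawn with farthest-point sampling over visited states, novelty-based landmarks are ranked by a supplied prediction-error signal, and the selected landmark is the first node on a shortest path in a graph built over landmarks. All function names, the `edge_threshold` parameter, and the use of Euclidean distances for graph edges are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
import networkx as nx


def sample_coverage_landmarks(states, k):
    """Farthest-point sampling: pick k states that spread over the visited state space."""
    idx = [np.random.randint(len(states))]
    dists = np.linalg.norm(states - states[idx[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dists))          # farthest state from the chosen set
        idx.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(states - states[nxt], axis=1))
    return states[idx]


def sample_novelty_landmarks(states, prediction_error, k):
    """Pick the k states with the largest prediction error (most novel)."""
    return states[np.argsort(prediction_error)[-k:]]


def select_landmark(landmarks, current_state, final_goal, edge_threshold=2.0):
    """Build a graph over {current state, landmarks, goal}; return the first
    landmark on the shortest path from the current state toward the goal.

    Assumes the graph connects the current state to the goal; edge weights here
    are Euclidean distances for illustration only.
    """
    nodes = np.vstack([current_state[None], landmarks, final_goal[None]])
    g = nx.Graph()
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            d = np.linalg.norm(nodes[i] - nodes[j])
            if d <= edge_threshold:
                g.add_edge(i, j, weight=d)
    path = nx.shortest_path(g, source=0, target=len(nodes) - 1, weight="weight")
    return nodes[path[1]]  # the very first landmark on the path (or the goal itself)
```

In this sketch, the returned landmark would serve as the target toward which the high-level policy is encouraged to generate subgoals; how that encouragement is realized (e.g., as a shift or penalty on the generated subgoal) is left unspecified here.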