Hierarchical Reinforcement Learning (HRL) has made notable progress in complex control tasks by leveraging temporal abstraction. However, previous HRL algorithms often suffer from serious data inefficiency as environments grow larger. The extended components, $i.e.$, the goal space and the length of episodes, impose a burden on either or both of the high-level and low-level policies, since both levels share the total horizon of the episode. In this paper, we present a method of Decoupling Horizons Using a Graph in Hierarchical Reinforcement Learning (DHRL), which alleviates this problem by decoupling the horizons of the high-level and low-level policies and bridging the gap between their lengths using a graph. DHRL provides a freely stretchable high-level action interval, which facilitates longer temporal abstraction and faster training on complex tasks. Our method outperforms state-of-the-art HRL algorithms in typical HRL environments. Moreover, DHRL accomplishes long and complex locomotion and manipulation tasks.
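To make the decoupling idea concrete, the sketch below illustrates one way a graph over landmark states can bridge the gap between a short low-level horizon and a stretched high-level action interval: a far-away high-level subgoal is refined into a chain of graph waypoints, each reachable within the low-level horizon. This is a minimal illustrative sketch, not the authors' implementation; `high_policy`, `low_level_policy`, `dist_fn` (a learned goal-conditioned distance estimate), and the environment interface are hypothetical assumptions.

```python
import networkx as nx

# Hypothetical components for illustration only:
#   high_policy(obs)            -> a distant subgoal state
#   low_level_policy(obs, goal) -> a primitive action toward `goal`
#   dist_fn(s, g)               -> estimated low-level steps from state s to goal g

def build_landmark_graph(landmarks, dist_fn, cutoff):
    """Connect landmark states whose estimated low-level distance is short."""
    g = nx.DiGraph()
    for i, s in enumerate(landmarks):
        g.add_node(i, state=s)
    for i, s in enumerate(landmarks):
        for j, t in enumerate(landmarks):
            if i != j:
                d = dist_fn(s, t)
                if d < cutoff:  # keep only edges the low-level policy can traverse
                    g.add_edge(i, j, weight=d)
    return g

def decoupled_rollout(env, high_policy, low_level_policy, graph, landmarks,
                      dist_fn, low_horizon=10, reach_tol=1.0):
    """The low-level horizon stays fixed and short; the high-level interval
    stretches because each subgoal is expanded into graph waypoints."""
    obs = env.reset()
    done = False
    while not done:
        subgoal = high_policy(obs)  # distant high-level goal
        # Bridge the horizon gap: plan waypoints through the landmark graph.
        start = min(range(len(landmarks)), key=lambda i: dist_fn(obs, landmarks[i]))
        goal = min(range(len(landmarks)), key=lambda i: dist_fn(landmarks[i], subgoal))
        path = nx.shortest_path(graph, start, goal, weight="weight")
        waypoints = [graph.nodes[i]["state"] for i in path] + [subgoal]
        for wp in waypoints:  # each leg is bounded by the short low-level horizon
            for _ in range(low_horizon):
                obs, reward, done, _ = env.step(low_level_policy(obs, wp))
                if done or dist_fn(obs, wp) < reach_tol:
                    break
            if done:
                break
    return obs
```

In this sketch, the number of waypoints per subgoal can grow with the environment, so the effective high-level action interval stretches freely while the low-level policy only ever plans over `low_horizon` steps.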