Goal-conditioned hierarchical reinforcement learning (HRL) is a promising approach for scaling up reinforcement learning (RL) techniques. However, it often suffers from training inefficiency because the action space of the high level, i.e., the goal space, is typically large. Searching in a large goal space poses difficulties for both high-level subgoal generation and low-level policy learning. In this paper, we show that this problem can be effectively alleviated by restricting the high-level action space from the whole goal space to a $k$-step adjacent region of the current state using an adjacency constraint. We theoretically prove that the proposed adjacency constraint preserves the optimal hierarchical policy in deterministic MDPs, and show that this constraint can be practically implemented by training an adjacency network that discriminates between adjacent and non-adjacent subgoals. Experimental results on discrete and continuous control tasks show that incorporating the adjacency constraint improves the performance of state-of-the-art HRL approaches in both deterministic and stochastic environments.
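The abstract mentions training an adjacency network that discriminates between adjacent and non-adjacent subgoals. Below is a minimal PyTorch sketch of one way such a network could be trained: states are mapped to an embedding space and a contrastive-style loss pulls $k$-step-adjacent pairs together while pushing non-adjacent pairs apart. The class `AdjacencyNet`, the helper `adjacency_loss`, the margin value, and the embedding-distance formulation are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn


class AdjacencyNet(nn.Module):
    """Embeds states so that embedding distance serves as a proxy for
    k-step adjacency. Architecture is an illustrative assumption."""

    def __init__(self, state_dim, embed_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, s):
        return self.encoder(s)

    def distance(self, s, g):
        # Euclidean distance in embedding space, used to judge whether
        # subgoal g lies within the k-step adjacent region of state s.
        return torch.norm(self.forward(s) - self.forward(g), dim=-1)


def adjacency_loss(net, s, g, adjacent, margin=1.0):
    """Contrastive-style objective (an assumed formulation): pull pairs
    labelled adjacent together, push non-adjacent pairs at least `margin`
    apart. `adjacent` is a float {0, 1} tensor whose labels would come from
    sampled trajectories (g reached from s within k steps or not)."""
    d = net.distance(s, g)
    pos = adjacent * d.pow(2)
    neg = (1.0 - adjacent) * torch.clamp(margin - d, min=0).pow(2)
    return (pos + neg).mean()
```

Under these assumptions, the adjacency constraint could then be enforced in practice by penalizing (or rejecting) high-level subgoals whose predicted distance `net.distance(s_t, g_t)` exceeds the margin, so that the high-level policy only proposes subgoals in the $k$-step adjacent region of the current state.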