Hierarchical reinforcement learning (HRL) proposes to solve difficult tasks by performing decision-making and control at successively higher levels of temporal abstraction. However, off-policy HRL often suffers from non-stationarity at the high level, since the low-level policy it relies on is constantly changing. In this paper, we propose a novel HRL approach that mitigates this non-stationarity by adversarially forcing the high-level policy to generate subgoals compatible with the current instantiation of the low-level policy. In practice, the adversarial learning is implemented by training, concurrently with the high-level policy, a simple state-conditioned discriminator network that determines the compatibility level of subgoals. Comparison to state-of-the-art algorithms shows that our approach improves both learning efficiency and performance in challenging continuous control tasks.
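To make the mechanism concrete, the following is a minimal sketch (not the authors' released implementation) of a state-conditioned discriminator that scores (state, subgoal) pairs and of an adversarial bonus term for the high-level policy. All names (Discriminator, discriminator_loss, high_level_compatibility_bonus) and the use of relabelled, actually-reached goals as "compatible" examples are illustrative assumptions, not details taken from the paper.

```python
# Sketch of a state-conditioned subgoal discriminator for off-policy HRL.
# Assumptions: compatible ("real") subgoals are relabelled goals the current
# low-level policy actually reached; proposed ("fake") subgoals come from the
# high-level policy. Names and training details are hypothetical.
import torch
import torch.nn as nn


class Discriminator(nn.Module):
    """Scores (state, subgoal) pairs; higher logits mean the subgoal is more
    compatible with the current low-level policy."""

    def __init__(self, state_dim: int, goal_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, subgoal: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, subgoal], dim=-1))


def discriminator_loss(disc, state, reachable_goal, proposed_goal):
    """Binary classification: reached/relabelled goals are labelled compatible,
    subgoals currently proposed by the high-level policy are labelled not yet
    compatible. The proposed goals are detached so only the discriminator updates."""
    bce = nn.BCEWithLogitsLoss()
    real = disc(state, reachable_goal)
    fake = disc(state, proposed_goal.detach())
    return bce(real, torch.ones_like(real)) + bce(fake, torch.zeros_like(fake))


def high_level_compatibility_bonus(disc, state, proposed_goal):
    """Adversarial term added to the high-level objective, pushing proposed
    subgoals toward regions the discriminator deems reachable by the low level."""
    return torch.sigmoid(disc(state, proposed_goal)).mean()
```

In such a setup, the discriminator and the high-level policy are updated alternately from the same replay data, so the compatibility signal tracks the current low-level policy rather than an outdated one.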