Problems that require both long-horizon planning and continuous control pose significant challenges to existing reinforcement learning agents. In this paper we introduce a novel hierarchical reinforcement learning agent that links temporally extended skills for continuous control with a forward model in a symbolic, discrete abstraction of the environment's state for planning. We term our agent SEADS, for Symbolic Effect-Aware Diverse Skills. We formulate an objective and a corresponding algorithm that lead to unsupervised learning of a diverse set of skills through intrinsic motivation, given a known state abstraction. The skills are learned jointly with the symbolic forward model, which captures the effect of skill execution in the state abstraction. After training, we can use the skills as symbolic actions, plan over long horizons with the forward model, and then execute the plan with the learned continuous-action control skills. The proposed algorithm learns skills and forward models that solve complex tasks requiring both continuous control and long-horizon planning with a high success rate. It compares favorably with other flat and hierarchical reinforcement learning baseline agents and is successfully demonstrated on a real robot.
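To make the planning step described in the abstract concrete, the following is a minimal sketch of how skills could serve as symbolic actions in a search over the discrete state abstraction. It is not the SEADS implementation: the breadth-first search, the bit-tuple state encoding, and the names `toy_forward_model` and `plan` are all assumptions chosen for illustration, and the hand-written toggle rule stands in for a learned forward model.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Tuple

# Symbolic (discrete) abstraction of the environment state (assumed encoding).
State = Tuple[int, ...]


def toy_forward_model(state: State, skill: int) -> State:
    """Hypothetical stand-in for a learned symbolic forward model.

    Skill k flips bit k and its immediate neighbours (a 1-D toggle puzzle).
    """
    bits = list(state)
    for i in (skill - 1, skill, skill + 1):
        if 0 <= i < len(bits):
            bits[i] ^= 1
    return tuple(bits)


def plan(
    start: State,
    goal: State,
    num_skills: int,
    forward_model: Callable[[State, int], State],
) -> Optional[List[int]]:
    """Breadth-first search over symbolic states.

    Returns a shortest sequence of skill indices from start to goal,
    or None if the goal is unreachable under the forward model.
    """
    if start == goal:
        return []
    parent: Dict[State, Tuple[State, int]] = {}
    visited = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for skill in range(num_skills):
            nxt = forward_model(state, skill)
            if nxt in visited:
                continue
            visited.add(nxt)
            parent[nxt] = (state, skill)
            if nxt == goal:
                # Walk back through the parent links to recover the plan.
                skills: List[int] = []
                node = goal
                while node != start:
                    node, k = parent[node]
                    skills.append(k)
                return skills[::-1]
            queue.append(nxt)
    return None


if __name__ == "__main__":
    print(plan((1, 0, 0, 0), (0, 0, 0, 0), 4, toy_forward_model))
```

In an agent of the kind the abstract describes, each skill index in the returned plan would then be handed to the corresponding continuous-control skill for execution. A learned forward model can mispredict, so a practical system would likely compare the observed symbolic state with the prediction after each skill and replan when they differ; that check is omitted here.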