We propose a parameterized skill-learning algorithm that aims to learn transferable parameterized skills and synthesize them into a new action space that supports efficient learning in long-horizon tasks. We first introduce two novel learning objectives, trajectory-centric diversity and smoothness, that allow an agent to meta-learn reusable parameterized skills. The agent then uses these learned skills to construct a temporally-extended parameterized-action Markov decision process, for which we develop a hierarchical actor-critic algorithm designed to efficiently learn a high-level control policy over the learned skills. We empirically demonstrate that the proposed algorithms enable an agent to solve a complicated long-horizon obstacle-course environment.
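To make the idea of a temporally-extended parameterized action concrete, the following is a minimal sketch, not the paper's implementation. It assumes a toy one-dimensional environment and two hand-written skills. The names `ParamAction`, `toy_step`, `run_skill`, and `SKILLS` are hypothetical and chosen only for illustration. A high-level action is a pair (discrete skill index, continuous parameter), and executing it runs the chosen skill for a fixed number of low-level steps while accumulating discounted reward.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical low-level skill: maps (state, parameter) to a primitive action.
Skill = Callable[[float, float], float]


@dataclass
class ParamAction:
    """A parameterized action: which skill to run, and with what parameter."""
    skill_id: int   # discrete choice among the available skills
    theta: float    # continuous parameter passed to the chosen skill


def toy_step(state: float, action: float) -> Tuple[float, float]:
    """Toy 1-D dynamics (an assumption): move by `action`;
    the reward is the negative distance to a goal at position 10."""
    next_state = state + action
    return next_state, -abs(10.0 - next_state)


def run_skill(
    state: float,
    act: ParamAction,
    skills: List[Skill],
    horizon: int = 5,
    gamma: float = 0.99,
) -> Tuple[float, float]:
    """Execute one temporally-extended action: run the selected skill for
    `horizon` primitive steps and return (final state, discounted return)."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        primitive = skills[act.skill_id](state, act.theta)
        state, reward = toy_step(state, primitive)
        total += discount * reward
        discount *= gamma
    return state, total


# Two illustrative hand-written skills (in the paper, skills are learned).
SKILLS: List[Skill] = [
    lambda s, theta: theta,               # move at constant velocity theta
    lambda s, theta: 0.5 * (theta - s),   # move halfway toward target theta
]

if __name__ == "__main__":
    final_state, ret = run_skill(0.0, ParamAction(skill_id=0, theta=2.0), SKILLS)
    print(final_state, ret)
```

In this sketch a high-level policy would choose one `ParamAction` per call to `run_skill`, so it makes a decision only once every `horizon` primitive steps. That is the sense in which the resulting decision process is temporally extended.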