We propose a novel parameterized skill-learning algorithm that aims to learn transferable parameterized skills and synthesize them into a new action space that supports efficient learning in long-horizon tasks. The algorithm combines off-policy Meta-RL with a trajectory-centric smoothness term to learn a set of parameterized skills. Our agent can then use these learned skills to construct a three-level hierarchical framework that models a Temporally-extended Parameterized Action Markov Decision Process. We empirically demonstrate that the proposed algorithm enables an agent to solve a set of difficult long-horizon tasks (obstacle-course and robot manipulation).