Learning Transferable Motor Skills with Hierarchical Latent Mixture Policies

Dushyant Rao, Fereshteh Sadeghi, Leonard Hasenclever, Markus Wulfmeier, Martina Zambelli, Giulia Vezzani, Dhruva Tirumala, Yusuf Aytar, Josh Merel, Nicolas Heess, Raia Hadsell

For robots operating in the real world, it is desirable to learn reusable behaviours that can effectively be transferred and adapted to numerous tasks and scenarios. We propose an approach to learn abstract motor skills from data using a hierarchical mixture latent variable model. In contrast to existing work, our method exploits a three-level hierarchy of both discrete and continuous latent variables, to capture a set of high-level behaviours while allowing for variance in how they are executed. We demonstrate in manipulation domains that the method can effectively cluster offline data into distinct, executable behaviours, while retaining the flexibility of a continuous latent variable model. The resulting skills can be transferred and fine-tuned on new tasks, unseen objects, and from state to vision-based policies, yielding better sample efficiency and asymptotic performance compared to existing skill- and imitation-based methods. We further analyse how and when the skills are most beneficial: they encourage directed exploration to cover large regions of the state space relevant to the task, making them most effective in challenging sparse-reward settings.

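The three-level hierarchy described in the abstract can be illustrated with a minimal sampling sketch. This is not the authors' implementation: the function name `sample_action`, the per-skill Gaussian parameters, and the `tanh` low-level map are all assumptions chosen for illustration, standing in for learned networks. It only shows the general idea of picking a discrete skill, drawing a continuous latent that varies how the skill is executed, and decoding that latent into an action.

```python
import math
import random


def sample_action(state, skill_logits, skill_means, skill_stds, rng):
    """Ancestral sampling through a hypothetical three-level hierarchy.

    All parameters are illustrative placeholders; in a learned model each
    level would be a network conditioned on the current state.
    """
    # High level: pick a discrete skill index from a categorical distribution.
    weights = [math.exp(logit) for logit in skill_logits]
    k = rng.choices(range(len(weights)), weights=weights)[0]

    # Mid level: sample a continuous latent around the chosen skill's mean,
    # which allows variation in how that skill is executed.
    z = [rng.gauss(m, s) for m, s in zip(skill_means[k], skill_stds[k])]

    # Low level: map state and latent to a bounded action.
    # (Assumed stand-in for a learned low-level controller.)
    action = [math.tanh(s + zi) for s, zi in zip(state, z)]
    return k, z, action


if __name__ == "__main__":
    rng = random.Random(0)
    k, z, action = sample_action(
        state=[0.1, -0.2],
        skill_logits=[0.0, 1.0],
        skill_means=[[1.0, 0.0], [0.0, 1.0]],
        skill_stds=[[0.1, 0.1], [0.1, 0.1]],
        rng=rng,
    )
    print(k, z, action)
```

Under this sketch, transferring skills to a new task would amount to reusing the mid- and low-level parts while training a new high-level selector, though the actual training procedure is not specified here.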