Demonstrations provide insight into relevant state or action space regions, bearing great potential to boost the efficiency and practicality of reinforcement learning agents. In this work, we propose to leverage demonstration datasets by combining skill learning and sequence modeling. Starting with a learned joint latent space, we separately train a generative model of demonstration sequences and an accompanying low-level policy. The sequence model forms a latent space prior over plausible demonstration behaviors to accelerate learning of high-level policies. We show how to acquire such priors from state-only motion capture demonstrations and explore several methods for integrating them into policy learning on transfer tasks. Our experimental results confirm that latent space priors provide significant gains in learning speed and final performance. We benchmark our approach on a set of challenging sparse-reward environments with a complex, simulated humanoid, and on offline RL benchmarks for navigation and object manipulation. Videos, source code, and pre-trained models are available on the project website at https://facebookresearch.github.io/latent-space-priors.
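The abstract describes training a generative sequence model over a learned latent skill space and using it as a prior when learning high-level policies. The sketch below illustrates one possible integration of such a prior, assuming both the pre-trained prior and the high-level policy output diagonal Gaussians over the latent space and the prior acts as a KL regularizer on the policy. It is a hypothetical, minimal example, not the released implementation: the module names (`LatentPrior`, `HighLevelPolicy`), dimensions, and `kl_weight` are illustrative.

```python
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence

LATENT_DIM = 16   # dimensionality of the joint latent space (illustrative)
OBS_DIM = 32      # high-level observation size (illustrative)


class LatentPrior(nn.Module):
    """Autoregressive latent space prior p(z_t | z_<t) over demonstration behaviors."""

    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(LATENT_DIM, 64, batch_first=True)
        self.head = nn.Linear(64, 2 * LATENT_DIM)  # predicts mean and log-std

    def forward(self, z_history, hidden=None):
        out, hidden = self.rnn(z_history, hidden)
        mean, log_std = self.head(out[:, -1]).chunk(2, dim=-1)  # next-step prediction
        return Normal(mean, log_std.exp()), hidden


class HighLevelPolicy(nn.Module):
    """Maps task observations to a distribution over latent actions z_t."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.Tanh(), nn.Linear(64, 2 * LATENT_DIM)
        )

    def forward(self, obs):
        mean, log_std = self.net(obs).chunk(2, dim=-1)
        return Normal(mean, log_std.exp())


prior, policy = LatentPrior(), HighLevelPolicy()
obs = torch.randn(8, OBS_DIM)              # batch of high-level observations
z_history = torch.randn(8, 4, LATENT_DIM)  # previously emitted latent actions

pi = policy(obs)               # pi(z_t | s_t)
p_prior, _ = prior(z_history)  # p(z_t | z_<t) from the pre-trained sequence model

# Penalize deviation of the high-level policy from plausible demonstration behavior.
kl_weight = 0.1  # illustrative coefficient
kl = kl_divergence(pi, p_prior).sum(dim=-1).mean()
# In an RL setting this term would be added to the policy objective, e.g.
#   loss = policy_gradient_loss + kl_weight * kl
print(f"KL(pi || prior) regularizer: {(kl_weight * kl).item():.4f}")
```

Since the abstract states that several integration methods are explored, this KL-regularized objective should be read as one representative option rather than the full approach.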