Demonstrations provide insight into relevant state or action space regions, bearing great potential to boost the efficiency and practicality of reinforcement learning agents. In this work, we propose to leverage demonstration datasets by combining skill learning and sequence modeling. Starting with a learned joint latent space, we separately train a generative model of demonstration sequences and an accompanying low-level policy. The sequence model forms a latent space prior over plausible demonstration behaviors to accelerate learning of high-level policies. We show how to acquire such priors from state-only motion capture demonstrations and explore several methods for integrating them into policy learning on transfer tasks. Our experimental results confirm that latent space priors provide significant gains in learning speed and final performance in a set of challenging sparse-reward environments with a complex, simulated humanoid. Videos, source code and pre-trained models are available on the project website: https://facebookresearch.github.io/latent-space-priors
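As an illustration only, and not the authors' implementation, the sketch below shows one common way a latent space prior can enter high-level policy learning: the high-level policy's distribution over latents is penalized by its KL divergence from the prior's prediction, so latents the prior finds implausible cost reward. Diagonal Gaussians, the names `toy_latent_prior`, `toy_low_level_policy` and `regularized_reward`, and the coefficient `beta` are hypothetical; the learned sequence model and low-level policy are replaced by fixed toy functions.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DiagGaussian:
    """Diagonal Gaussian over a latent vector z."""
    mean: List[float]
    std: List[float]

    def sample(self, rng: random.Random) -> List[float]:
        return [rng.gauss(m, s) for m, s in zip(self.mean, self.std)]


def kl_diag_gaussian(p: DiagGaussian, q: DiagGaussian) -> float:
    """Closed-form KL(p || q) for diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mp, sp, mq, sq in zip(p.mean, p.std, q.mean, q.std):
        total += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2 * sq ** 2) - 0.5
    return total


def toy_latent_prior(history: Sequence[List[float]], dim: int) -> DiagGaussian:
    """Hypothetical stand-in for a learned sequence model p(z_t | z_<t).

    It simply predicts that the next latent stays close to the previous one.
    """
    if not history:
        return DiagGaussian([0.0] * dim, [1.0] * dim)
    return DiagGaussian([0.9 * z for z in history[-1]], [0.5] * dim)


def toy_low_level_policy(obs: List[float], z: List[float]) -> List[float]:
    """Hypothetical low-level policy pi_lo(a | s, z): a fixed tanh map of obs + z."""
    return [math.tanh(o + zi) for o, zi in zip(obs, z)]


def regularized_reward(reward: float, pi_hi: DiagGaussian,
                       prior: DiagGaussian, beta: float) -> float:
    """Task reward minus beta * KL(pi_hi || prior)."""
    return reward - beta * kl_diag_gaussian(pi_hi, prior)


# Usage: one high-level step with a policy that exactly matches the prior.
rng = random.Random(0)
history: List[List[float]] = [[0.2, -0.1]]
prior = toy_latent_prior(history, dim=2)
pi_hi = DiagGaussian(list(prior.mean), list(prior.std))
z = pi_hi.sample(rng)                          # latent chosen by the high-level policy
action = toy_low_level_policy([0.0, 0.0], z)   # decoded into a low-level action
print(regularized_reward(1.0, pi_hi, prior, beta=0.1))  # 1.0: KL is zero
```

A policy that strays from the prior pays a penalty proportional to `beta`, which is one way to bias exploration toward demonstration-like behavior; the paper itself compares several integration methods, which this sketch does not reproduce.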