Dynamics-Aware Quality-Diversity for Efficient Learning of Skill Repertoires

Quality-Diversity (QD) algorithms are powerful exploration algorithms that allow robots to discover large repertoires of diverse and high-performing skills. However, QD algorithms are sample-inefficient and require millions of evaluations. In this paper, we propose Dynamics-Aware Quality-Diversity (DA-QD), a framework that improves the sample efficiency of QD algorithms through the use of dynamics models. We also show how DA-QD can be used for the continual acquisition of new skill repertoires. To do so, we incrementally train a deep dynamics model from the experience obtained while performing skill discovery with QD. We can then perform QD exploration in imagination with an imagined skill repertoire. We evaluate our approach on three robotic experiments. First, our experiments show that DA-QD is 20 times more sample-efficient than existing QD approaches for skill discovery. Second, we demonstrate learning an entirely new skill repertoire in imagination to perform zero-shot learning. Finally, we show that DA-QD is useful and effective for solving a long-horizon navigation task and for damage adaptation in the real world. Videos and source code are available at: https://sites.google.com/view/da-qd.
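The loop described in the abstract (train a dynamics model incrementally from QD experience, then explore in imagination before spending real evaluations) can be illustrated with a minimal sketch. This is not the authors' implementation: the 2-D point-mass dynamics in `true_step`, the least-squares `LinearModel` standing in for the deep dynamics model, the grid-based MAP-Elites repertoire, the fitness and descriptor definitions, and all names and hyperparameters are assumptions chosen only to keep the example small and runnable.

```python
import numpy as np

rng = np.random.default_rng(0)
HORIZON, GRID = 10, 8          # rollout length and cells per descriptor axis

def true_step(state, action):
    # Hypothetical "real" system: a damped 2-D point mass (an assumption).
    return 0.9 * state + 0.5 * action

def rollout(params, step_fn):
    """Run an open-loop controller; return (descriptor, fitness, transitions)."""
    state, transitions = np.zeros(2), []
    actions = np.tanh(params.reshape(HORIZON, 2))
    for action in actions:
        next_state = step_fn(state, action)
        transitions.append((state, action, next_state))
        state = next_state
    fitness = -float(np.sum(actions ** 2))      # prefer low-effort skills
    return state, fitness, transitions          # descriptor = final position

class LinearModel:
    """Stand-in for the deep dynamics model: least-squares next-state fit."""
    def __init__(self):
        self.weights = np.zeros((4, 2))
    def fit(self, data):
        inputs = np.array([np.concatenate([s, a]) for s, a, _ in data])
        targets = np.array([ns for _, _, ns in data])
        self.weights, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
    def step(self, state, action):
        return np.concatenate([state, action]) @ self.weights

def cell_of(descriptor):
    # Map a final position, clipped to [-5, 5), onto a GRID x GRID cell.
    index = np.floor((np.clip(descriptor, -5, 5 - 1e-9) + 5) / 10 * GRID)
    return tuple(index.astype(int))

def try_insert(repertoire, params, descriptor, fitness):
    """MAP-Elites rule: keep a solution if its cell is empty or it is fitter."""
    cell = cell_of(descriptor)
    if cell not in repertoire or fitness > repertoire[cell][1]:
        repertoire[cell] = (params, fitness)
        return True
    return False

def da_qd(iterations=20, batch=50):
    repertoire, data, model, real_evals = {}, [], LinearModel(), 0
    # Bootstrap with a few real evaluations so the model has training data.
    for _ in range(10):
        params = rng.normal(size=HORIZON * 2)
        descriptor, fitness, transitions = rollout(params, true_step)
        try_insert(repertoire, params, descriptor, fitness)
        data += transitions
        real_evals += 1
    for _ in range(iterations):
        model.fit(data)                      # incremental retraining on all data
        imagined = dict(repertoire)          # imagined repertoire starts as a copy
        parents = list(repertoire.values())
        candidates = []
        for _ in range(batch):
            parent = parents[rng.integers(len(parents))][0]
            child = parent + 0.3 * rng.normal(size=parent.shape)
            descriptor, fitness, _ = rollout(child, model.step)  # in imagination
            if try_insert(imagined, child, descriptor, fitness):
                candidates.append(child)
        # Only candidates the model predicted would enter the repertoire
        # are evaluated on the real system; their transitions become new data.
        for child in candidates:
            descriptor, fitness, transitions = rollout(child, true_step)
            try_insert(repertoire, child, descriptor, fitness)
            data += transitions
            real_evals += 1
    return repertoire, real_evals
```

Calling `da_qd()` returns the real repertoire (a dictionary from grid cell to the stored parameters and fitness) together with the number of real evaluations used. Because the toy dynamics are linear, the stand-in model becomes accurate almost immediately; on a real robot the gap between imagined and real rollouts would be the main thing to watch.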
