One of the key capabilities of intelligent agents is the ability to discover useful skills without external supervision. However, current unsupervised skill discovery methods are often limited to acquiring simple, easy-to-learn skills because they lack incentives to discover more complex, challenging behaviors. We introduce a novel unsupervised skill discovery method, Controllability-aware Skill Discovery (CSD), which actively seeks complex, hard-to-control skills without supervision. The key component of CSD is a controllability-aware distance function, which assigns larger values to state transitions that are harder to achieve with the current skills. Combined with distance-maximizing skill discovery, CSD progressively learns more challenging skills over the course of training, because the jointly trained distance function reduces the rewards for easy-to-achieve skills. Our experimental results in six robotic manipulation and locomotion environments demonstrate that CSD can discover diverse, complex skills, including object manipulation and locomotion, with no supervision, significantly outperforming prior unsupervised skill discovery methods. Videos and code are available at https://sites.google.com/view/icml2023csd
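To make the idea of a controllability-aware distance concrete, the following is a minimal toy sketch, not the implementation used in CSD. It assumes that "harder to achieve" can be approximated by "rare among the transitions the current skills produce": a per-dimension Gaussian is fitted to the observed state changes, and the distance of a transition is its squared deviation scaled by the fitted variance. The class name `DiagonalGaussianDistance`, the two-dimensional state `[agent_x, object_x]`, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import random


class DiagonalGaussianDistance:
    """Toy distance (an assumption, not the paper's model): state changes
    that are rare under the observed transitions receive larger values."""

    def __init__(self, dim, eps=1e-6):
        self.dim = dim
        self.eps = eps            # keeps the variance strictly positive
        self.n = 0
        self.mean = [0.0] * dim   # running mean of s_next - s per dimension
        self.m2 = [0.0] * dim     # running sum of squared deviations (Welford)

    def update(self, s, s_next):
        """Fit the density on one transition produced by the current skills."""
        self.n += 1
        for i in range(self.dim):
            delta = s_next[i] - s[i]
            diff = delta - self.mean[i]
            self.mean[i] += diff / self.n
            self.m2[i] += diff * (delta - self.mean[i])

    def distance(self, s, s_next):
        """Variance-scaled squared deviation of the state change."""
        total = 0.0
        for i in range(self.dim):
            var = self.m2[i] / max(self.n, 1) + self.eps
            delta = s_next[i] - s[i]
            total += (delta - self.mean[i]) ** 2 / var
        return total


# Hypothetical state: [agent_x, object_x]. The synthetic "current skills"
# move the agent freely but barely move the object.
random.seed(0)
dist = DiagonalGaussianDistance(dim=2)
for _ in range(1000):
    s = [0.0, 0.0]
    s_next = [random.gauss(0.0, 1.0), random.gauss(0.0, 0.01)]
    dist.update(s, s_next)

move_agent = dist.distance([0.0, 0.0], [0.5, 0.0])   # easy: agent moves
move_object = dist.distance([0.0, 0.0], [0.0, 0.5])  # hard: object moves
print(move_object > move_agent)
```

In this toy setting the object-moving transition receives a much larger distance than the agent-moving one, so a distance-maximizing objective would favor it. If the skills later learned to move the object routinely, the fitted variance of that dimension would grow and the same transition would score lower, which mirrors the abstract's description of rewards shrinking for easy-to-achieve skills.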