Learning diverse skills is one of the main challenges in robotics. To this end, imitation learning approaches have achieved impressive results. However, these methods require explicitly labeled datasets or assume consistent skill execution to enable learning and active control of individual behaviors, which limits their applicability. In this work, we propose a cooperative adversarial method for obtaining single versatile policies with controllable skill sets from unlabeled datasets containing diverse state transition patterns by maximizing their discriminability. Moreover, we show that by utilizing unsupervised skill discovery in the generative adversarial imitation learning framework, novel and useful skills emerge with successful task fulfillment. Finally, the obtained versatile policies are tested on an agile quadruped robot called Solo 8 and faithfully replicate the diverse skills encoded in the demonstrations.