Learning agile skills is one of the main challenges in robotics. To this end, reinforcement learning approaches have achieved impressive results. However, these methods require explicit task information in the form of a reward function, or an expert that can be queried in simulation to provide a target control output, which limits their applicability. In this work, we propose a generative adversarial method for inferring reward functions from partial and potentially physically incompatible demonstrations, enabling successful skill acquisition where reference or expert demonstrations are not easily accessible. Moreover, we show that by using a Wasserstein GAN formulation and taking as input transitions from demonstrations with rough and partial information, we are able to extract policies that are robust and capable of imitating the demonstrated behaviors. Finally, the obtained skills, such as a backflip, are tested on the agile quadruped robot Solo 8 and faithfully replicate hand-held human demonstrations.
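The core idea named above, a Wasserstein critic that separates demonstration transitions from policy transitions and whose score is used as a learned reward, can be sketched minimally as follows. This is an illustrative toy, not the paper's actual implementation: the data, the transition dimensionality, the linear critic, and the weight-clipping Lipschitz constraint are all simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: "demonstration" transitions (s, s') flattened into 4-D vectors,
# versus transitions produced by the current policy. Shapes and
# distributions here are illustrative assumptions only.
demo = rng.normal(loc=1.0, scale=0.5, size=(256, 4))
policy = rng.normal(loc=0.0, scale=0.5, size=(256, 4))

w = np.zeros(4)  # linear Wasserstein critic f(x) = w @ x

for _ in range(200):
    # WGAN critic objective: maximize E_demo[f(x)] - E_policy[f(x)].
    # For a linear critic this gradient is just the difference of means.
    grad = demo.mean(axis=0) - policy.mean(axis=0)
    w += 0.1 * grad
    w = np.clip(w, -1.0, 1.0)  # weight clipping enforces a Lipschitz bound

def reward(transitions):
    """Critic score used as a learned style reward for the RL policy."""
    return transitions @ w

# Demonstration-like transitions should now score higher than policy ones.
print(reward(demo).mean() > reward(policy).mean())  # True
```

In the full method the critic would be a neural network trained alternately with the policy; here a single linear critic suffices to show how the adversarial score takes the place of a hand-designed reward.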