State-of-the-art reinforcement learning can now learn versatile locomotion, balancing, and push-recovery capabilities for bipedal robots in simulation. Yet the reality gap has largely been overlooked, and simulated results rarely transfer to real hardware: either transfer fails in practice because the physics is over-simplified and hardware limitations are ignored, or motion regularity is not guaranteed and unexpected hazardous motions can occur. This paper presents a reinforcement learning framework that learns robust standing push recovery for bipedal robots and transfers smoothly to reality, relying only on instantaneous proprioceptive observations. By combining original termination conditions with policy-smoothness conditioning, we achieve stable learning, sim-to-real transfer, and safety with a policy that has neither memory nor an explicit observation history. Reward engineering then provides insight into how balance is maintained. We demonstrate the performance of our approach in reality on the lower-limb medical exoskeleton Atalante.