State-of-the-art reinforcement learning can now learn versatile locomotion, balancing, and push-recovery capabilities for bipedal robots in simulation. Yet the reality gap has mostly been overlooked, and the simulated results rarely transfer to real hardware. Either transfer fails in practice because the physics is over-simplified and hardware limitations are ignored, or regularity is not guaranteed and unexpected hazardous motions can occur. This paper presents a reinforcement learning framework that learns robust standing push recovery for bipedal robots with a smooth out-of-the-box transfer to reality, requiring only instantaneous proprioceptive observations. By combining original termination conditions with policy smoothness conditioning, we achieve stable learning, sim-to-real transfer, and safety using a policy with neither memory nor observation history. Reward shaping is then used to encode insights about how to keep balance. We demonstrate its performance in reality on the lower-limb medical exoskeleton Atalante.
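The abstract does not specify how the policy smoothness conditioning is implemented. As a hypothetical illustration only, a minimal sketch of a CAPS-style smoothness regularizer (penalizing action changes across consecutive observations and under small observation perturbations), assuming a deterministic policy callable and NumPy arrays:

```python
import numpy as np

def smoothness_loss(policy, obs, next_obs, sigma=0.05, rng=None):
    """Hypothetical smoothness regularizer, in the spirit of CAPS-style
    conditioning (not necessarily the method used in the paper).

    policy:   callable mapping an observation vector to an action vector
    obs:      observation at time t
    next_obs: observation at time t+1
    sigma:    std-dev of the spatial perturbation applied to obs
    """
    rng = rng or np.random.default_rng(0)
    a_t = policy(obs)
    # Temporal term: penalize action change between consecutive steps.
    temporal = np.mean((policy(next_obs) - a_t) ** 2)
    # Spatial term: penalize action change under a small observation perturbation.
    obs_noisy = obs + rng.normal(0.0, sigma, size=obs.shape)
    spatial = np.mean((policy(obs_noisy) - a_t) ** 2)
    return temporal + spatial
```

Such a term is typically added to the policy-gradient loss with a small weight, so the policy is pushed toward smooth, hardware-safe actions without needing an observation history.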