Model-free reinforcement learning (RL) for legged locomotion commonly relies on a physics simulator that can accurately predict the behaviors of every degree of freedom of the robot. In contrast, approximate reduced-order models are often sufficient for many model-based control strategies. In this work we explore how RL can be effectively used with a centroidal model to generate robust control policies for quadrupedal locomotion. Advantages over RL with a full-order model include a simple reward structure, reduced computational costs, and robust sim-to-real transfer. We further show the potential of the method by demonstrating stepping-stone locomotion, two-legged in-place balance, balance beam locomotion, and sim-to-real transfer without further adaptations. Additional Results: https://www.pair.toronto.edu/glide-quadruped/.
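To give a concrete sense of the reduced-order model class the abstract refers to, a centroidal model tracks only the robot's center of mass (CoM) and angular momentum, driven by contact forces, instead of every joint degree of freedom. The sketch below is a minimal, hypothetical illustration (not the paper's implementation); all function and variable names are assumptions:

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])  # world-frame gravity (m/s^2)

def centroidal_step(m, c, v, L, contacts, forces, dt):
    """One explicit-Euler step of simplified centroidal dynamics.

    m        : total robot mass (kg)
    c, v     : CoM position and velocity, shape (3,)
    L        : angular momentum about the CoM, shape (3,)
    contacts : list of contact point positions p_i, each shape (3,)
    forces   : list of contact forces f_i, each shape (3,)
    dt       : integration timestep (s)
    Returns updated (c, v, L).
    """
    # Linear dynamics: m * c_ddot = sum_i f_i + m * g
    f_total = sum(forces, np.zeros(3))
    a = f_total / m + GRAVITY
    # Angular dynamics: L_dot = sum_i (p_i - c) x f_i
    L_dot = sum((np.cross(p - c, f) for p, f in zip(contacts, forces)),
                np.zeros(3))
    return c + dt * v, v + dt * a, L + dt * L_dot
```

Because the state is only 9-dimensional here (plus contact locations), rollouts of such a model are far cheaper than full rigid-body simulation, which is one source of the reduced computational cost the abstract mentions.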