Model-free reinforcement learning (RL) for legged locomotion commonly relies on a physics simulator that can accurately predict the behaviors of every degree of freedom of the robot. In contrast, approximate reduced-order models are often sufficient for many model-based control strategies. In this work we explore how RL can be effectively used with a centroidal model to generate robust control policies for quadrupedal locomotion. Advantages over RL with a full-order model include a simple reward structure, reduced computational costs, and robust sim-to-real transfer. We further show the potential of the method by demonstrating stepping-stone locomotion, two-legged in-place balance, balance beam locomotion, and sim-to-real transfer without further adaptations.