Model-free reinforcement learning (RL) for legged locomotion commonly relies on a physics simulator that can accurately predict the behavior of every degree of freedom of the robot. In contrast, many model predictive control strategies rely on approximate reduced-order models. In this work we abandon the conventional use of high-fidelity dynamics models in RL and instead ask what can be achieved when RL is applied to quadrupedal locomotion with a much simpler centroidal model. We show that RL-based control of the accelerations of a centroidal model is surprisingly effective when combined with a quadratic program that realizes the commanded actions via ground contact forces. The approach allows for a simple reward structure, reduced computational costs, and robust sim-to-real transfer. We show the generality of the method by demonstrating flat-terrain gaits, stepping-stone locomotion, two-legged in-place balance, balance-beam locomotion, and direct sim-to-real transfer.
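To make the role of the quadratic program concrete, the following is a minimal sketch of how such a force-distribution QP is commonly posed; the notation, the regularization weight $\lambda$, and the constraint set are our assumptions for illustration rather than details taken from the text above. The policy's commanded centroidal accelerations define a desired net wrench, which the QP distributes over the stance-foot ground reaction forces:

\[
\min_{f_1,\dots,f_c}\;
\left\| \sum_{i=1}^{c}
\begin{bmatrix} I_3 \\ [\,p_i - p_{\mathrm{com}}\,]_\times \end{bmatrix} f_i
-
\begin{bmatrix} m\,(a_{\mathrm{des}} - g) \\ I_G\,\dot{\omega}_{\mathrm{des}} \end{bmatrix}
\right\|^2
+ \lambda \sum_{i=1}^{c} \| f_i \|^2
\quad \text{s.t.} \quad f_i \in \mathcal{F}_i,\; i = 1,\dots,c,
\]

where $f_i$ is the ground reaction force at stance foot $i$ (of $c$ feet in contact), $p_i$ its position, $p_{\mathrm{com}}$ the center of mass, $m$ and $I_G$ the mass and centroidal inertia, $a_{\mathrm{des}}$ and $\dot{\omega}_{\mathrm{des}}$ the commanded linear and angular accelerations, $g$ the gravity vector, $[\,\cdot\,]_\times$ the skew-symmetric cross-product matrix, and $\mathcal{F}_i$ a linearized friction cone, which also implies the unilateral contact condition $f_{i,z} \ge 0$. Gyroscopic terms are neglected in this sketch. With linear friction-cone constraints the problem is a standard convex QP and can be solved at control rates.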