Generating dynamic motions for legged robots remains a challenging problem. While reinforcement learning has achieved notable success in various legged locomotion tasks, producing highly dynamic behaviors often requires extensive reward tuning or high-quality demonstrations. Leveraging reduced-order models can mitigate these difficulties; however, the discrepancy between the reduced-order and full-body dynamics poses a significant obstacle when transferring policies to full-body environments. In this work, we introduce a continuation-based learning framework that combines simplified-model pretraining with model-homotopy transfer to efficiently generate and refine complex dynamic behaviors. First, we pretrain the policy with a single-rigid-body model to capture core motion patterns in a simplified environment. Next, we employ a continuation strategy to progressively transfer the policy to the full-body environment, minimizing performance loss. To define the continuation path, we construct a model homotopy from the single-rigid-body model to the full-body model by gradually redistributing mass and inertia between the trunk and the legs. The proposed method not only converges faster but also exhibits superior stability during transfer compared to baseline methods. Our framework is validated on a range of dynamic tasks, including flips and wall-assisted maneuvers, and is successfully deployed on a real quadrupedal robot.
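The model homotopy described above can be illustrated with a minimal sketch. All names and numerical masses below are hypothetical assumptions for illustration, not values from the paper: a single parameter interpolates leg masses between zero (the single-rigid-body limit, where all mass is lumped into the trunk) and their full-model values, keeping total mass constant along the continuation path.

```python
# Hypothetical sketch of the mass-redistribution homotopy: names and
# masses are illustrative assumptions, not taken from the paper.

def homotopy_mass(alpha, trunk_full, legs_full):
    """Return (trunk_mass, leg_masses) at homotopy parameter alpha.

    alpha = 0 -> single-rigid-body limit: massless legs, all mass in trunk.
    alpha = 1 -> full-body model masses.
    """
    leg_masses = [alpha * m for m in legs_full]
    # Mass removed from the legs is lumped into the trunk so the robot's
    # total mass is conserved along the continuation path.
    trunk_mass = trunk_full + (1.0 - alpha) * sum(legs_full)
    return trunk_mass, leg_masses

# Example with made-up masses (kg): 20 kg trunk, four 2 kg legs.
trunk0, legs0 = homotopy_mass(0.0, 20.0, [2.0] * 4)  # SRB limit
trunk1, legs1 = homotopy_mass(1.0, 20.0, [2.0] * 4)  # full model
assert trunk0 + sum(legs0) == trunk1 + sum(legs1) == 28.0
```

The same interpolation scheme would apply to the link inertia tensors; stepping `alpha` from 0 to 1 over the course of training traces the continuation path from the pretraining model to the full-body model.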