Optimizing Bipedal Maneuvers of Single Rigid-Body Models for Reinforcement Learning

In this work, we propose a method for generating reduced-order model reference trajectories for general classes of highly dynamic bipedal robot maneuvers, for use in sim-to-real reinforcement learning. Our approach uses a single rigid-body model (SRBM) to optimize libraries of trajectories offline, which then serve as expert references in the reward function of a learned policy. This method translates the model's dynamically rich rotational and translational behavior to a full-order robot model and successfully transfers to real hardware. The SRBM's simplicity allows for fast iteration and refinement of behaviors, while the robustness of learning-based controllers allows highly dynamic motions to be transferred to hardware. We introduce a set of transferability constraints that adapt the SRBM dynamics to actual bipedal robot hardware, our framework for creating optimal trajectories for a variety of highly dynamic maneuvers, and our approach to integrating reference trajectories into a high-speed running reinforcement learning policy. We validate our methods on the bipedal robot Cassie, on which we demonstrate highly dynamic grounded running gaits at up to 3.0 m/s.
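To make the idea of using an offline trajectory library as an expert reference in a reward function concrete, here is a minimal sketch. It is not the reward used in this work: the exponential tracking form, the function names `nearest_reference` and `tracking_reward`, the two-element state (forward velocity, base height), the weights, and every numeric value in the toy library are assumptions chosen purely for illustration.

```python
import math

# Hypothetical library: commanded speed (m/s) -> list of reference states,
# one per step of the gait cycle. Each state is [forward velocity, base height].
# All values are invented for illustration only.
LIBRARY = {
    0.0: [[0.0, 0.90], [0.0, 0.90]],
    1.5: [[1.5, 0.88], [1.5, 0.86]],
    3.0: [[3.0, 0.85], [3.0, 0.82]],
}


def nearest_reference(library, commanded_speed, step):
    """Return the reference state from the trajectory whose nominal speed
    is closest to the commanded speed, wrapping the step index cyclically."""
    key = min(library, key=lambda speed: abs(speed - commanded_speed))
    trajectory = library[key]
    return trajectory[step % len(trajectory)]


def tracking_reward(state, reference, weights, scale=1.0):
    """Assumed reward shape: exp(-scale * weighted squared error).
    Equals 1.0 for perfect tracking and decays toward 0 as error grows."""
    if not (len(state) == len(reference) == len(weights)):
        raise ValueError("state, reference and weights must have equal length")
    error = sum(w * (s - r) ** 2 for s, r, w in zip(state, reference, weights))
    return math.exp(-scale * error)


if __name__ == "__main__":
    ref = nearest_reference(LIBRARY, commanded_speed=2.6, step=0)
    reward = tracking_reward([2.8, 0.84], ref, weights=[1.0, 10.0])
    print(ref, round(reward, 4))
```

In a sketch like this, the policy is never forced to follow the reference exactly; the reference only shapes the reward, which leaves the learned controller free to deviate where the full-order robot dynamics require it.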
