This paper presents a hierarchical and robust framework for learning bipedal locomotion and implements it successfully on Digit, the 3D biped robot built by Agility Robotics. We propose a cascade-structure controller that combines the learning process with intuitive feedback regulations. This design allows the framework to realize robust and stable walking with reduced-dimension state and action spaces for the policy, significantly simplifying the design and improving the sample efficiency of the learning method. Including feedback regulation in the framework improves the robustness of the learned walking gait and ensures successful sim-to-real transfer of the proposed controller with minimal tuning. Specifically, we present a learning pipeline that uses hardware-feasible initial poses of the robot during training, so that the initial state seen in learning matches the initial state of the robot in hardware experiments as closely as possible. Finally, we demonstrate the feasibility of our method by transferring the policy learned in simulation to the Digit robot hardware, realizing sustained walking gaits under external force disturbances and on challenging terrains not included in the training process. To the best of our knowledge, this is the first time a learning-based policy has been transferred successfully to the Digit robot in hardware experiments without using dynamics randomization or curriculum learning.