Recent work on reinforcement learning (RL) for bipedal robots has produced controllers for a variety of dynamic gaits, with robust sim-to-real demonstrations. To maintain balance, these learned controllers are free to place the feet anywhere, which results in highly robust gaits. In the real world, however, the environment often imposes constraints on the feasible footstep locations, which are typically identified by perception systems. Unfortunately, most RL controllers demonstrated on bipedal robots do not allow such constraints to be specified or responded to. This missing control interface greatly limits the real-world application of current RL controllers. In this paper, we aim to maintain the robust and dynamic nature of learned gaits while also respecting externally imposed footstep constraints. We develop an RL formulation for training dynamic gait controllers that can respond to specified touchdown locations. We then demonstrate the approach in simulation and in sim-to-real transfer on the bipedal robot Cassie. In addition, we use supervised learning to induce a transition model that accurately predicts the next touchdown locations the controller can achieve given the robot's proprioceptive observations. This model paves the way for integrating the learned controller into a full-order robot locomotion planner that robustly satisfies both balance and environmental constraints.
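As a rough illustration of the supervised transition model mentioned in the abstract, the sketch below fits a predictor of the next touchdown location from logged data. Everything in it is an assumption made for illustration: the abstract does not specify the model class, the input features, or the training data, so a plain linear model stands in for whatever the authors use, the two features (a forward velocity and a commanded step length) are hypothetical, and the data are synthetic rather than taken from Cassie.

```python
import random


def fit_linear(samples, targets, lr=0.1, epochs=1000):
    """Fit y ~ w.x + b by batch gradient descent on mean squared error."""
    n = len(samples)
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(samples, targets):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for i, xi in enumerate(x):
                grad_w[i] += 2.0 * err * xi / n
            grad_b += 2.0 * err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Predicted touchdown location for one feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Synthetic stand-in for logged rollouts (NOT real robot data).
# Hypothetical features: [forward velocity, commanded step length].
# Hypothetical label: touchdown location actually achieved by the controller.
rng = random.Random(0)
samples = [[rng.uniform(-1.0, 1.0), rng.uniform(-0.5, 0.5)] for _ in range(200)]
targets = [0.3 * v + 0.9 * c + 0.05 for v, c in samples]

w, b = fit_linear(samples, targets)
next_touchdown = predict(w, b, [0.2, 0.1])
```

A planner could query a model of this kind to check whether a candidate footstep is achievable before commanding it. In practice a nonlinear regressor trained on rollouts of the learned controller would likely be needed, since the linear form here is only a placeholder.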