In this work, we demonstrate robust walking on uneven terrain with the bipedal robot Digit by learning only a single linear policy. In particular, we propose a new control pipeline in which a high-level trajectory modulator shapes the ellipsoidal end-foot trajectories and a low-level gait controller regulates the torso and ankle orientation. The foot-trajectory modulator uses a linear policy, and the regulator uses a linear PD control law. In contrast to neural-network-based policies, the proposed linear policy has only 13 learnable parameters, which makes learning sample-efficient and keeps the policy simple and interpretable. This is achieved with no loss of performance on challenging terrains such as slopes, stairs, and outdoor landscapes. We first demonstrate robust walking in a custom MuJoCo simulation environment and then transfer directly to hardware with no modification of the control pipeline. We validate the approach by subjecting the biped to a series of pushes and terrain height changes, both indoors and outdoors.
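To make the two linear components concrete, the following is a minimal sketch of a bias-free linear policy and a linear PD law. It is not the authors' implementation: the function names, the 2x3 weight matrix, the state entries, and the gains are all hypothetical, and the dimensions are illustrative only and do not reproduce the 13-parameter structure mentioned above.

```python
from typing import List, Sequence


def linear_policy(weights: Sequence[Sequence[float]],
                  state: Sequence[float]) -> List[float]:
    """Return action = W @ state: a linear map with no bias and no nonlinearity."""
    for row in weights:
        if len(row) != len(state):
            raise ValueError("each weight row must match the state dimension")
    return [sum(w * s for w, s in zip(row, state)) for row in weights]


def pd_torque(kp: float, kd: float,
              q_des: float, q: float,
              dq_des: float, dq: float) -> float:
    """Linear PD law: kp * position error + kd * velocity error."""
    return kp * (q_des - q) + kd * (dq_des - dq)


# Hypothetical 2x3 weight matrix (6 parameters); values are illustrative only.
W = [[0.5, 0.0, -0.25],
     [0.0, 1.0, 0.5]]

# Hypothetical state vector (assumed entries, e.g. an orientation, its rate,
# and a terrain estimate); the source does not specify the policy inputs.
state = [0.2, -0.4, 0.8]

# High-level step: the linear policy outputs trajectory-shaping parameters.
action = linear_policy(W, state)

# Low-level step: a PD law drives one joint toward a desired orientation.
torque = pd_torque(kp=40.0, kd=2.0, q_des=0.0, q=0.05, dq_des=0.0, dq=-0.5)

print(action, torque)
```

Because the policy is a single matrix, every learnable parameter directly links one state entry to one action entry, which is the sense in which such a policy is easy to inspect.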