Deep reinforcement learning produces robust locomotion policies for legged robots over challenging terrains. To date, few studies have leveraged model-based methods to combine these locomotion skills with the precise control of manipulators. Here, we incorporate external dynamics plans into learning-based locomotion policies for mobile manipulation. We train the base policy by applying random wrench sequences to the robot base in simulation and adding a noise-corrupted prediction of each sequence to the policy observations. The policy thus learns to counteract a partially known future disturbance. At deployment, the random wrench sequences are replaced with wrench predictions generated from the dynamics plans of a model predictive controller. We show zero-shot adaptation to manipulators unseen during training. On hardware, we demonstrate stable locomotion of legged robots under external wrench prediction.
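The training-time disturbance scheme can be sketched as follows. This is a minimal illustration, not the paper's implementation: the piecewise-constant hold structure, the sampling ranges, and the noise magnitudes are all assumed for demonstration.

```python
import numpy as np

def sample_wrench_sequence(horizon, rng, max_force=30.0, max_torque=5.0):
    """Sample a random 6-DoF wrench sequence (force xyz, torque xyz)
    applied to the robot base during training. Assumed piecewise-constant:
    each wrench is held for a random number of control steps."""
    seq = np.zeros((horizon, 6))
    t = 0
    while t < horizon:
        hold = int(rng.integers(10, 50))  # assumed hold duration range
        wrench = np.concatenate([
            rng.uniform(-max_force, max_force, 3),   # forces [N]
            rng.uniform(-max_torque, max_torque, 3), # torques [Nm]
        ])
        seq[t:t + hold] = wrench
        t += hold
    return seq

def noisify_prediction(seq, rng, force_std=2.0, torque_std=0.5):
    """Corrupt the future wrench sequence before appending it to the
    policy observations, so the disturbance is only partially known."""
    scale = np.array([force_std] * 3 + [torque_std] * 3)
    return seq + rng.normal(0.0, 1.0, seq.shape) * scale

rng = np.random.default_rng(0)
true_seq = sample_wrench_sequence(200, rng)       # applied in simulation
obs_prediction = noisify_prediction(true_seq, rng)  # seen by the policy
```

At deployment, `obs_prediction` would instead come from the model predictive controller's dynamics plan, replacing the random training-time sequences.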