Developing robust vision-guided controllers for quadrupedal robots in complex environments, with various obstacles, dynamic surroundings, and uneven terrain, is very challenging. While Reinforcement Learning (RL) provides a promising paradigm for learning agile locomotion skills with visual inputs in simulation, deploying the learned RL policy in the real world remains difficult. Our key insight is that, aside from the domain gap in visual appearance between simulation and the real world, the latency of the control pipeline is also a major source of difficulty. In this paper, we propose Multi-Modal Delay Randomization (MMDR) to address this issue when training RL agents. Specifically, we simulate the latency of real hardware by using past observations, sampled with randomized periods, for both proprioception and vision. We train the RL policy for end-to-end control in a physical simulator without any predefined controller or reference motion, and deploy it directly on a real A1 quadruped robot running in the wild. We evaluate our method in different outdoor environments with complex terrain and obstacles. We demonstrate that the robot can smoothly maneuver at high speed, avoid obstacles, and significantly outperform the baselines. Our project page with videos is at https://mehooz.github.io/mmdr-wild/.
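To make the core idea concrete, below is a minimal sketch of delay randomization as described above: each modality keeps a short history of past observations, and the policy input is sampled from that history with a freshly randomized delay at every step. All names (DelayedObsBuffer, max_delay_steps, the specific delay ranges) are illustrative assumptions, not the authors' actual implementation.

```python
# Sketch of Multi-Modal Delay Randomization (MMDR): feed the policy
# *past* observations, sampled with randomized delays, to mimic the
# latency of the real control pipeline. Names and delay ranges here
# are hypothetical, chosen only for illustration.
import random
from collections import deque

class DelayedObsBuffer:
    """Keeps a short history of one modality and serves stale samples."""

    def __init__(self, max_delay_steps):
        self.max_delay_steps = max_delay_steps
        # One extra slot so we can look back the full max_delay_steps.
        self.history = deque(maxlen=max_delay_steps + 1)

    def push(self, obs):
        self.history.append(obs)

    def sample_delayed(self):
        # Randomize the delay on every query; clamp to the history we have.
        delay = random.randint(0, min(self.max_delay_steps,
                                      len(self.history) - 1))
        # history[-1] is the newest sample; step back by `delay` frames.
        return self.history[-1 - delay]

# Proprioception typically updates faster than vision on real hardware,
# so each modality gets its own buffer and delay range (values made up).
proprio_buf = DelayedObsBuffer(max_delay_steps=4)
vision_buf = DelayedObsBuffer(max_delay_steps=2)

def delayed_policy_input(proprio_obs, depth_image):
    """Push fresh simulator readings, return latency-randomized inputs."""
    proprio_buf.push(proprio_obs)
    vision_buf.push(depth_image)
    return proprio_buf.sample_delayed(), vision_buf.sample_delayed()
```

Sampling the two modalities independently reflects that proprioceptive and visual streams arrive with different, uncorrelated latencies on the physical robot.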