In this work, we present a learning-based pipeline to realise local navigation with a quadrupedal robot in cluttered environments with static and dynamic obstacles. Given high-level navigation commands, the robot is able to safely locomote to a target location based on frames from a depth camera, without any explicit mapping of the environment. First, the sequence of images and the current trajectory of the camera are fused to form a model of the world using state representation learning. The output of this lightweight module is then directly fed into a target-reaching and obstacle-avoiding policy trained with reinforcement learning. We show that decoupling the pipeline into these components results in a sample-efficient policy learning stage that can be fully trained in simulation in just a dozen minutes. The key part is the state representation, which is trained not only to estimate the hidden state of the world in an unsupervised fashion, but also to help bridge the reality gap, enabling successful sim-to-real transfer. In our experiments with the quadrupedal robot ANYmal in simulation and in reality, we show that our system can handle noisy depth images, avoid dynamic obstacles unseen during training, and is endowed with local spatial awareness.
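To make the decoupled design concrete, the sketch below shows one way such a pipeline could be wired up in PyTorch: a recurrent encoder fuses the depth-frame sequence with the camera trajectory into a latent world state, and a small policy network consumes that latent together with the navigation command. All module names, layer sizes, and input shapes here are illustrative assumptions, not the architecture from the paper.

```python
# Minimal sketch of the two-stage pipeline, assuming PyTorch.
# Shapes, layer sizes, and names are hypothetical, chosen for illustration.
import torch
import torch.nn as nn

class StateRepresentation(nn.Module):
    """Fuses a depth-image sequence with the camera trajectory into a latent state."""
    def __init__(self, latent_dim: int = 64, traj_dim: int = 6):
        super().__init__()
        # Small CNN encoder for a single depth frame (1 x 64 x 64 assumed).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 14 * 14, 128), nn.ReLU(),
        )
        # Recurrent fusion of image features and camera pose over time.
        self.gru = nn.GRU(128 + traj_dim, latent_dim, batch_first=True)

    def forward(self, depth_seq, traj_seq):
        # depth_seq: (B, T, 1, 64, 64); traj_seq: (B, T, traj_dim)
        b, t = depth_seq.shape[:2]
        feats = self.cnn(depth_seq.flatten(0, 1)).view(b, t, -1)
        latent_seq, _ = self.gru(torch.cat([feats, traj_seq], dim=-1))
        return latent_seq[:, -1]  # latent world state at the current step

class NavigationPolicy(nn.Module):
    """Maps the latent state plus the target command to a velocity command."""
    def __init__(self, latent_dim: int = 64, cmd_dim: int = 3, act_dim: int = 3):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim + cmd_dim, 128), nn.ReLU(),
            nn.Linear(128, act_dim), nn.Tanh(),
        )

    def forward(self, latent, target_cmd):
        return self.mlp(torch.cat([latent, target_cmd], dim=-1))

# Usage: the state representation is trained first (unsupervised), then its
# output is fed to the policy, which is trained with reinforcement learning.
encoder, policy = StateRepresentation(), NavigationPolicy()
depth = torch.rand(1, 8, 1, 64, 64)   # 8 (noisy) depth frames
traj = torch.rand(1, 8, 6)            # camera poses (xyz + rpy assumed)
action = policy(encoder(depth, traj), torch.rand(1, 3))
```

Freezing the encoder during policy training is one plausible reading of why the RL stage is sample-efficient: the policy only has to learn a mapping from a compact latent state to actions, rather than from raw depth images.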