Legged robots have the potential to expand the reach of autonomy beyond paved roads. In this work, we consider the difficult problem of locomotion on challenging terrains using a single forward-facing depth camera. Due to the partial observability of the problem, the robot has to rely on past observations to infer the terrain currently beneath it. To solve this problem, we follow the paradigm in computer vision that explicitly models the 3D geometry of the scene and propose Neural Volumetric Memory (NVM), a geometric memory architecture that explicitly accounts for the SE(3) equivariance of the 3D world. NVM aggregates feature volumes from multiple camera views by first bringing them back to the ego-centric frame of the robot. We test the learned visual-locomotion policy on a physical robot and show that our approach, which explicitly introduces geometric priors during training, offers superior performance over more naïve methods. We also include ablation studies and show that the representations stored in the neural volumetric memory capture sufficient geometric information to reconstruct the scene. Our project page with videos is https://rchalyang.github.io/NVM.
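To make the ego-centric aggregation step concrete, here is a minimal NumPy sketch of the general idea: each past feature volume is resampled into the robot's current frame using a relative SE(3) pose, and the aligned volumes are then averaged. This is an illustrative toy (nearest-neighbor resampling, hypothetical function names and grid conventions), not the paper's implementation, which learns the transformation and fusion end-to-end.

```python
import numpy as np

def resample_volume(vol, T_rel, origin, voxel_size):
    """Resample a past-frame feature volume into the ego-centric frame.

    vol:        (D, H, W, C) feature volume in the past camera frame.
    T_rel:      4x4 SE(3) matrix mapping ego-frame points to past-frame points.
    origin:     (3,) world position of the grid's corner.
    voxel_size: scalar edge length of a voxel.
    """
    D, H, W, C = vol.shape
    # Voxel-center coordinates of the ego-frame grid.
    idx = np.stack(np.meshgrid(np.arange(D), np.arange(H), np.arange(W),
                               indexing="ij"), axis=-1)
    centers = origin + (idx + 0.5) * voxel_size            # (D, H, W, 3)
    pts = centers.reshape(-1, 3)
    pts_h = np.concatenate([pts, np.ones((len(pts), 1))], axis=1)
    # Map each ego voxel center into the past frame and look up its voxel.
    src = (T_rel @ pts_h.T).T[:, :3]
    src_idx = np.floor((src - origin) / voxel_size).astype(int)
    out = np.zeros_like(vol).reshape(-1, C)
    valid = np.all((src_idx >= 0) & (src_idx < [D, H, W]), axis=1)
    flat = src_idx[valid]
    out[valid] = vol[flat[:, 0], flat[:, 1], flat[:, 2]]   # nearest neighbor
    return out.reshape(D, H, W, C)

def aggregate(volumes, rel_poses, origin, voxel_size):
    """Fuse per-view volumes after aligning them to the ego frame (mean)."""
    aligned = [resample_volume(v, T, origin, voxel_size)
               for v, T in zip(volumes, rel_poses)]
    return np.mean(aligned, axis=0)
```

In this sketch the relative pose for each past view would be computed from odometry as `inv(T_world_from_ego_now) @ T_world_from_cam_past`; NVM instead predicts the alignment from the observations themselves and replaces the mean with a learned fusion.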