The robustness of visual navigation policies trained through imitation often hinges on augmenting the training image-action pairs. Traditionally, this has been done by collecting data from multiple cameras, by applying standard data augmentations from computer vision, such as adding random noise to each image, or by synthesizing training images. In this paper we show that there is another practical alternative for data augmentation in visual navigation, based on extrapolating viewpoint embeddings and actions near those observed in the training data. Our method exploits the geometry of the visual navigation problem in 2D and 3D and relies on policies that are functions of equivariant embeddings, as opposed to images. Given an image-action pair from a training navigation dataset, our neural network model predicts the latent representations of images at nearby viewpoints using the equivariance property, and augments the dataset. We then train a policy on the augmented dataset. Our simulation results indicate that policies trained in this way exhibit reduced cross-track error and require fewer interventions compared to policies trained with standard augmentation methods. We also show similar results in autonomous visual navigation by a real ground robot along a path of over 500 m.
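To make the augmentation idea concrete, the sketch below illustrates one way such equivariant extrapolation could look in practice. It is not the authors' implementation: the latent group action rho, the latent dimension, the yaw offsets, and the action correction are all illustrative assumptions.

```python
# Minimal sketch (hypothetical, not the paper's code): augmenting a dataset of
# (embedding, action) pairs by extrapolating to nearby viewpoints via an
# assumed equivariance map rho(g) acting on the latent space.
import numpy as np

def rho(yaw_offset: float, dim: int) -> np.ndarray:
    """Hypothetical group action on the latent space: a block-diagonal
    2D rotation applied to consecutive pairs of latent coordinates."""
    R = np.eye(dim)
    c, s = np.cos(yaw_offset), np.sin(yaw_offset)
    for i in range(0, dim - 1, 2):
        R[i:i + 2, i:i + 2] = [[c, -s], [s, c]]
    return R

def augment(z: np.ndarray, action: float, yaw_offsets=(-0.1, 0.1)):
    """Given one latent embedding z and its steering action, synthesize
    (embedding, action) pairs for small hypothetical viewpoint offsets."""
    pairs = [(z, action)]
    for dyaw in yaw_offsets:
        z_new = rho(dyaw, z.size) @ z   # equivariant update of the embedding
        a_new = action - dyaw           # assumed correction steering back to the path
        pairs.append((z_new, a_new))
    return pairs

# Usage: z would come from the image encoder that feeds the policy.
z = np.random.randn(64)
augmented_pairs = augment(z, action=0.05)
```

The point of the sketch is only the structure of the augmentation loop: embeddings, not raw images, are transformed, and the paired action is adjusted consistently with the viewpoint change before the policy is trained on the enlarged dataset.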