Localization in topological maps is essential for image-based navigation using an RGB camera. Localization with a single camera is challenging in medium- to large-sized environments because similar-looking images are observed repeatedly, especially indoors. To overcome this issue, we propose a learning-based localization method that simultaneously exploits the spatial consistency of the topological map and the temporal consistency of the time-series images captured by the robot. Our method combines a convolutional neural network (CNN), which embeds image features, with a recurrent-type graph neural network, which performs the localization. Training such a model is difficult because the ground-truth pose of the robot is hard to obtain when images are captured in real-world environments. Hence, we propose a sim2real transfer approach with semi-supervised learning that leverages simulator images annotated with ground-truth poses in addition to real images. We evaluated our method quantitatively and qualitatively and compared it with several state-of-the-art baselines. The proposed method outperformed the baselines in environments where the map contained similar images. Moreover, we evaluated an image-based navigation system incorporating our localization method and confirmed that navigation accuracy improved significantly over the baseline methods in both simulated and real environments.
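To make the idea of combining spatial and temporal consistency concrete, the following is a minimal sketch and not the proposed model: it replaces the CNN embeddings with hand-written two-dimensional feature vectors and replaces the recurrent graph neural network with a fixed, non-learned belief update over the map graph. All names (`cosine`, `localize_step`, `localize`, `sharpness`) and the four-node map are assumptions made for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def localize_step(belief, neighbors, node_feats, obs_feat, sharpness=5.0):
    """One recurrent update of the belief over map nodes (illustrative only)."""
    n = len(belief)
    # Spatial consistency: between frames the robot is assumed to stay at
    # its node or move to an adjacent one, so each node's belief is spread
    # evenly over itself and its neighbors.
    predicted = [0.0] * n
    for i in range(n):
        targets = [i] + list(neighbors[i])
        share = belief[i] / len(targets)
        for j in targets:
            predicted[j] += share
    # Appearance: reweight every node by how well its stored feature
    # matches the current observation (a stand-in for CNN embeddings).
    weights = [math.exp(sharpness * cosine(node_feats[j], obs_feat))
               for j in range(n)]
    posterior = [p * w for p, w in zip(predicted, weights)]
    total = sum(posterior)
    return [p / total for p in posterior]


def localize(neighbors, node_feats, observations):
    """Temporal consistency: carry the belief across the image sequence."""
    n = len(node_feats)
    belief = [1.0 / n] * n  # start with no knowledge of the robot's node
    for obs_feat in observations:
        belief = localize_step(belief, neighbors, node_feats, obs_feat)
    return belief


# Hypothetical chain map 0 - 1 - 2 - 3 in which nodes 0 and 3 look identical.
neighbors = [[1], [0, 2], [1, 3], [2]]
node_feats = [[1.0, 0.0], [0.0, 1.0], [0.0, -1.0], [1.0, 0.0]]
# The robot first sees node 2, then an image matching both node 0 and node 3.
observations = [[0.0, -1.0], [1.0, 0.0]]

belief = localize(neighbors, node_feats, observations)
best_node = max(range(len(belief)), key=belief.__getitem__)
print(best_node, [round(b, 3) for b in belief])
```

In this toy map the last image alone cannot distinguish node 0 from node 3, since both store the same feature. Because the previous frame placed the robot at node 2 and only node 3 is adjacent to it, the propagated belief favors node 3. A learned model would replace both the hand-written features and the fixed update rule.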