Direct visual localization has recently enjoyed a resurgence in popularity with the increasing availability of cheap mobile computing power. The competitive accuracy and robustness of these algorithms compared to state-of-the-art feature-based methods, as well as their natural ability to yield dense maps, make them an appealing choice for a variety of mobile robotics applications. However, direct methods remain brittle in the face of appearance change due to their underlying assumption of photometric consistency, which is commonly violated in practice. In this paper, we propose to mitigate this problem by training deep convolutional encoder-decoder models to transform images of a scene such that they correspond to a previously seen canonical appearance. We validate our method in multiple environments and illumination conditions using high-fidelity synthetic RGB-D datasets, and integrate the trained models into a direct visual localization pipeline, yielding improvements in visual odometry (VO) accuracy through time-varying illumination conditions, as well as improved metric relocalization performance under illumination change, where conventional methods normally fail. We further provide a preliminary investigation of transfer learning from synthetic to real environments in a localization context. An open-source implementation of our method using PyTorch is available at https://github.com/utiasSTARS/cat-net.
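To make the idea concrete, the sketch below shows one way a convolutional encoder-decoder could be trained in PyTorch to map an image captured under arbitrary illumination to a canonical-appearance image of the same scene. This is a minimal illustration only, not the architecture or loss used in the linked repository; the class name `TinyCanonicalAppearanceNet`, the layer sizes, and the per-pixel L1 loss are all assumptions made for brevity.

```python
# Minimal sketch (assumed, not the authors' exact model): an encoder-decoder
# trained to map images under varying illumination to a canonical appearance.
import torch
import torch.nn as nn

class TinyCanonicalAppearanceNet(nn.Module):
    """Hypothetical downsampling/upsampling convolutional encoder-decoder."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1),   # H -> H/2
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),  # H/2 -> H/4
            nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),  # H/4 -> H/2
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),   # H/2 -> H
            nn.Sigmoid(),  # pixel intensities in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Toy training step on random tensors standing in for
# (input image, canonical-appearance target) pairs of the same scene.
model = TinyCanonicalAppearanceNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.L1Loss()

input_img = torch.rand(4, 3, 64, 64)      # images under time-varying illumination
canonical_img = torch.rand(4, 3, 64, 64)  # same scenes under the canonical condition

optimizer.zero_grad()
loss = loss_fn(model(input_img), canonical_img)
loss.backward()
optimizer.step()
```

At test time, such a model would be applied to incoming frames before they are passed to the direct localization pipeline, so that the photometric-consistency assumption holds more closely between the transformed live image and the previously seen canonical imagery.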