Place recognition is key to Simultaneous Localization and Mapping (SLAM) and spatial perception. However, place recognition in the wild often suffers from erroneous predictions due to image variations, e.g., changing viewpoints and street appearance. Integrating uncertainty estimation into the life cycle of place recognition is a promising way to mitigate the impact of such variations on place recognition performance. However, existing uncertainty estimation approaches in this vein are either computationally inefficient (e.g., Monte Carlo dropout) or come at the cost of reduced accuracy. This paper proposes STUN, a self-teaching framework that learns to simultaneously predict the place and estimate the prediction uncertainty given an input image. To this end, we first train a teacher net using a standard metric learning pipeline to produce embedding priors. Then, supervised by the pretrained teacher net, a student net with an additional variance branch is trained to fine-tune the embedding priors and estimate the uncertainty sample by sample. During the online inference phase, we use only the student net to generate a place prediction together with its uncertainty. Compared with place recognition systems that ignore uncertainty, our framework provides uncertainty estimation for free without sacrificing any prediction accuracy. Our experimental results on the large-scale Pittsburgh30k dataset demonstrate that STUN outperforms the state-of-the-art methods in both recognition accuracy and the quality of uncertainty estimation.
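
The teacher-student step described above can be illustrated with a minimal sketch. The abstract does not state the training objective, so the loss below is an assumption: a per-dimension Gaussian negative log-likelihood, a common choice when a network predicts both a mean and a variance. The function names `self_teaching_loss` and `scalar_uncertainty`, and the use of a log-variance output, are hypothetical and chosen only for illustration.

```python
import math


def self_teaching_loss(student_mean, student_log_var, teacher_embedding):
    """Assumed objective: Gaussian NLL of the teacher embedding under the
    student's predicted mean and per-dimension variance (constant dropped).

    student_mean      -- embedding predicted by the student's main branch
    student_log_var   -- log-variance predicted by the student's variance branch
    teacher_embedding -- embedding prior produced by the frozen teacher net
    """
    total = 0.0
    for mu, log_var, target in zip(student_mean, student_log_var, teacher_embedding):
        # Squared error is scaled by 1/variance; the log-variance term
        # penalizes the student for claiming high uncertainty everywhere.
        total += 0.5 * (math.exp(-log_var) * (target - mu) ** 2 + log_var)
    return total / len(student_mean)


def scalar_uncertainty(student_log_var):
    """Collapse per-dimension variances into one score (assumed: the mean variance)."""
    return sum(math.exp(v) for v in student_log_var) / len(student_log_var)


# A student that matches the teacher exactly, with unit variance, incurs zero loss.
exact = self_teaching_loss([1.0, 2.0], [0.0, 0.0], [1.0, 2.0])

# When the student is far from the teacher, predicting a larger variance
# lowers the loss, which is what lets the variance act as an uncertainty signal.
confident = self_teaching_loss([0.0], [0.0], [2.0])
uncertain = self_teaching_loss([0.0], [math.log(4.0)], [2.0])
```

Under this assumed objective, only the student is needed at inference time: its mean output serves as the place embedding and its variance output, summarized by something like `scalar_uncertainty`, serves as the per-sample uncertainty.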