We present a novel monocular localization framework that jointly trains deep-learning-based depth prediction and Bayesian-filtering-based pose reasoning. The proposed cross-modal framework significantly outperforms deep-learning-only prediction in model scalability and tolerance to environmental variations. Specifically, we show little-to-no degradation in pose accuracy even with extremely poor depth estimates from a lightweight depth predictor. Our framework also maintains high pose accuracy under extreme lighting variations, where standard deep learning degrades, even without explicit domain adaptation. By explicitly representing the map and intermediate feature maps (such as depth estimates), our framework further allows faster map updates and lets intermediate predictions be reused for other tasks, such as obstacle avoidance, resulting in much higher resource efficiency.
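The abstract does not spell out the filtering machinery, but the cross-modal coupling it describes can be illustrated concretely. Below is a minimal Python sketch, assuming a particle filter over 2D pose and a ray-cast measurement model against an explicit occupancy map; the function names, map representation, and noise model are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of the cross-modal idea (NOT the authors' implementation):
# a learned monocular depth estimate serves as the measurement in a
# Bayesian (particle) filter over camera pose. The 2D occupancy map,
# ray-casting model, and all names here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def render_depth(pose, occ, n_rays=32, max_range=30.0):
    """Ray-cast the depths the camera *should* see from `pose` against
    a known 2D occupancy map (the explicitly represented map)."""
    x, y, theta = pose
    angles = theta + np.linspace(-0.5, 0.5, n_rays)
    depths = np.full(n_rays, max_range)
    for i, a in enumerate(angles):
        for r in np.linspace(0.1, max_range, 150):
            cx, cy = int(x + r * np.cos(a)), int(y + r * np.sin(a))
            if not (0 <= cx < occ.shape[1] and 0 <= cy < occ.shape[0]):
                break
            if occ[cy, cx]:
                depths[i] = r
                break
    return depths

def measurement_update(particles, weights, depth_obs, occ, sigma=1.0):
    """Re-weight each pose hypothesis by how well the depth network's
    estimate agrees with the map-rendered depth. A generous `sigma`
    is what lets the filter tolerate very poor depth predictions."""
    for i, p in enumerate(particles):
        err = depth_obs - render_depth(p, occ, n_rays=len(depth_obs))
        weights[i] *= np.exp(-0.5 * np.mean(err ** 2) / sigma ** 2)
    return weights / (weights.sum() + 1e-300)

# Toy usage: a 40x40 map with a wall, and a noisy stand-in for the
# depth network's output at the true pose (10, 20, 0).
occ = np.zeros((40, 40), dtype=bool)
occ[:, 30:] = True
particles = rng.uniform([5.0, 5.0, -0.3], [25.0, 35.0, 0.3], size=(300, 3))
weights = np.ones(300) / 300
depth_obs = render_depth((10.0, 20.0, 0.0), occ) + rng.normal(0, 0.5, 32)
weights = measurement_update(particles, weights, depth_obs, occ)
print("MAP pose estimate:", particles[np.argmax(weights)])
```

In this sketch the tolerance claim above corresponds to the measurement noise `sigma`: inflating it discounts a poor depth estimate rather than letting it corrupt the pose posterior, so accuracy degrades gracefully instead of failing outright.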