Self-supervised deep learning methods for joint depth and ego-motion estimation can yield accurate trajectories without needing ground-truth training data. However, as they typically use photometric losses, their performance can degrade significantly when the assumptions these losses make (e.g. temporal illumination consistency, a static scene, and the absence of noise and occlusions) are violated. This limits their use in, for example, nighttime sequences, which tend to contain many point light sources (including on dynamic objects) and to exhibit low signal-to-noise ratio (SNR) in darker image regions. In this paper, we show how to use a combination of three techniques to allow the existing photometric losses to work for both day and nighttime images. First, we introduce a per-pixel neural intensity transformation to compensate for the lighting changes that occur between successive frames. Second, we predict a per-pixel residual flow map that we use to correct the reprojection correspondences induced by the networks' estimated ego-motion and depth. Third, we denoise the training images to improve the robustness and accuracy of our approach. These changes allow us to train a single model for both day and nighttime images without needing separate encoders or extra feature networks like existing methods. We perform extensive experiments and ablation studies on the challenging Oxford RobotCar dataset to demonstrate the efficacy of our approach for both day and nighttime sequences.
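The first two techniques can be illustrated in isolation. The sketch below is a minimal NumPy mock-up, not the paper's implementation: it assumes the per-pixel intensity transformation is affine (scale `a` and offset `b` per pixel) and that the residual flow is simply added to the rigid reprojection coordinates before bilinear sampling; the function names (`bilinear_sample`, `photometric_loss`) and the L1 photometric penalty are illustrative choices.

```python
import numpy as np


def bilinear_sample(img, xs, ys):
    """Sample a single-channel image at continuous (x, y) locations
    with bilinear interpolation (coordinates clipped to the image)."""
    h, w = img.shape
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 2)
    dx = np.clip(xs - x0, 0.0, 1.0)
    dy = np.clip(ys - y0, 0.0, 1.0)
    top = img[y0, x0] * (1 - dx) + img[y0, x0 + 1] * dx
    bot = img[y0 + 1, x0] * (1 - dx) + img[y0 + 1, x0 + 1] * dx
    return top * (1 - dy) + bot * dy


def photometric_loss(target, source, rigid_x, rigid_y,
                     residual_flow=None, a=None, b=None):
    """Mean L1 photometric error between the target frame and the source
    frame warped via the rigid (depth + ego-motion) correspondences,
    optionally corrected by a per-pixel residual flow and a per-pixel
    affine intensity transform (hypothetical stand-ins for the learned
    networks in the paper)."""
    xs, ys = rigid_x.astype(float), rigid_y.astype(float)
    if residual_flow is not None:
        # Correct the reprojection correspondences pixel by pixel.
        xs = xs + residual_flow[..., 0]
        ys = ys + residual_flow[..., 1]
    warped = bilinear_sample(source, xs, ys)
    if a is not None:
        # Compensate for inter-frame lighting changes.
        warped = a * warped + b
    return float(np.mean(np.abs(warped - target)))
```

On a synthetic pair where the source frame is a darkened, one-pixel-shifted copy of the target, supplying the correct residual flow and affine parameters drives the photometric error close to zero, whereas the uncorrected loss stays large, which is the behaviour the training signal relies on.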