Self-supervised depth estimation has drawn considerable attention recently because it can enhance the 3D sensing capabilities of self-driving vehicles. However, it intrinsically relies on the photometric consistency assumption, which rarely holds at nighttime. Although various supervised nighttime image enhancement methods have been proposed, their generalization to challenging driving scenarios is unsatisfactory. To this end, we propose the first method that jointly learns a nighttime image enhancer and a depth estimator without using ground truth for either task. Our method tightly couples the two self-supervised tasks through a newly proposed uncertain pixel masking strategy. This strategy stems from the observation that nighttime images suffer not only from underexposed regions but also from overexposed ones. By fitting a bridge-shaped curve to the illumination map distribution, both kinds of region are suppressed and the two tasks are bridged naturally. We benchmark the method on two established datasets, nuScenes and RobotCar, and demonstrate state-of-the-art performance on both. Detailed ablations also reveal the mechanism behind our proposal. Last but not least, to mitigate the sparse ground truth of existing datasets, we provide a new photo-realistically enhanced nighttime dataset based on CARLA, which brings meaningful new challenges to the community. Code, data, and models are available at https://github.com/ucaszyp/STEPS.
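To make the masking idea concrete, here is a minimal sketch of a bridge-shaped weighting over a per-pixel illumination map. It assumes the illumination values are normalized to [0, 1]. The curve here is a product of two logistic sigmoids with fixed, hand-picked parameters (`low`, `high`, `sharpness` are hypothetical). The paper fits its curve to the illumination distribution, and its exact functional form may differ. Pixels that are too dark or too bright receive near-zero weight, so they contribute little to a photometric loss.

```python
import numpy as np


def bridge_weight(illum, low=0.1, high=0.9, sharpness=50.0):
    """Hypothetical bridge-shaped curve (not the paper's exact form).

    Weight is close to 0 for illumination below `low` (underexposed) or
    above `high` (overexposed), and close to 1 in between.
    """
    illum = np.asarray(illum, dtype=np.float64)
    rise = 1.0 / (1.0 + np.exp(-sharpness * (illum - low)))  # suppresses dark pixels
    fall = 1.0 / (1.0 + np.exp(sharpness * (illum - high)))  # suppresses bright pixels
    return rise * fall


def uncertain_pixel_mask(illum, threshold=0.5, **kwargs):
    """Binary mask: True where a pixel is considered reliable."""
    return bridge_weight(illum, **kwargs) >= threshold


def masked_photometric_loss(residual, illum, **kwargs):
    """Weighted mean of a per-pixel photometric residual."""
    w = bridge_weight(illum, **kwargs)
    return float((w * np.asarray(residual, dtype=np.float64)).sum() / max(w.sum(), 1e-8))


if __name__ == "__main__":
    illum = np.array([0.0, 0.5, 1.0])       # dark, well-exposed, saturated
    residual = np.array([10.0, 1.0, 10.0])  # large errors in unreliable pixels
    print(uncertain_pixel_mask(illum))
    print(masked_photometric_loss(residual, illum))
```

With these inputs, the mask keeps only the mid-illumination pixel, and the weighted loss stays near 1 instead of being dominated by the large residuals in the under- and overexposed pixels. A soft weight rather than a hard mask keeps the loss differentiable with respect to the illumination map, which is one plausible way to let an enhancer and a depth network share the same signal.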