Monocular depth estimation plays a fundamental role in computer vision. Because depth ground truth is costly to acquire, self-supervised methods that leverage adjacent frames to establish a supervisory signal have emerged as the most promising paradigm. In this work, we propose two novel ideas to improve self-supervised monocular depth estimation: 1) self-reference distillation and 2) disparity offset refinement. Specifically, we use a parameter-optimized model as the teacher, updated over the training epochs, to provide additional supervision during the training process. The teacher model has the same structure as the student model, with weights inherited from the historical student model. In addition, a multiview check is introduced to filter out the outliers produced by the teacher model. Furthermore, we leverage the contextual consistency between high-scale and low-scale features to obtain multiscale disparity offsets, which incrementally refine the disparity output by aligning disparity information across scales. Experimental results on the KITTI and Make3D datasets show that our method outperforms previous state-of-the-art competitors.
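The abstract states that the teacher shares the student's architecture and inherits its weights from the historical student, but does not specify the update rule. A common choice for such self-distillation schemes is an exponential moving average (EMA) of the student's weights; the sketch below illustrates this assumed rule over plain weight dictionaries (the momentum value and function name are illustrative, not from the paper).

```python
def update_teacher(teacher: dict, student: dict, momentum: float = 0.999) -> None:
    """EMA-style self-reference update (assumed rule, not specified in the
    abstract): each teacher weight drifts toward the current student weight,
    so the teacher aggregates the historical student models."""
    for name in teacher:
        teacher[name] = momentum * teacher[name] + (1.0 - momentum) * student[name]


# Toy usage: after many student steps, the teacher lags behind the student,
# providing a stable extra supervisory signal during training.
teacher = {"conv.weight": 0.0}
student = {"conv.weight": 1.0}
update_teacher(teacher, student, momentum=0.5)
```

With momentum 0.5 the teacher weight moves halfway toward the student's; in practice a momentum close to 1 keeps the teacher a slowly evolving average of past students.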