This work aims to estimate a high-quality depth map from a single RGB image. In the absence of direct depth cues, making full use of both long-range correlations and local information is critical for accurate depth estimation. To this end, we introduce an uncertainty-rectified cross-distillation between a Transformer and a convolutional neural network (CNN) to learn a unified depth estimator. Specifically, we use the depth estimates from the Transformer branch and the CNN branch as pseudo labels to teach each other. Meanwhile, we model pixel-wise depth uncertainty to rectify the loss weights of noisy pseudo labels. To prevent the large capacity gap induced by the strong Transformer branch from deteriorating the cross-distillation, we transfer feature maps from the Transformer to the CNN and design coupling units that help the weaker CNN branch exploit the transferred features. Furthermore, we propose a surprisingly simple yet highly effective data augmentation technique, CutFlip, which forces the model to exploit cues beyond vertical image position for depth inference. Extensive experiments demonstrate that our model, termed~\textbf{URCDC-Depth}, exceeds previous state-of-the-art methods on the KITTI, NYU-Depth-v2 and SUN RGB-D datasets, with no additional computational burden at inference time. The source code is publicly available at \url{https://github.com/ShuweiShao/URCDC-Depth}.
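To make the uncertainty-rectified cross-distillation idea concrete, the following is a minimal illustrative sketch, not the authors' exact formulation: each branch's depth estimate serves as a pseudo label for the other, and a pixel-wise uncertainty map down-weights the loss where the pseudo label is likely noisy (here via the assumed weighting $w = e^{-u}$; the function name and weighting form are assumptions for illustration).

```python
import numpy as np

def cross_distillation_loss(pred_student, pred_teacher, uncertainty):
    """Illustrative uncertainty-rectified distillation term (assumed form):
    L1 distance between the student's prediction and the other branch's
    pseudo label, weighted per pixel by exp(-uncertainty) so that pixels
    with high pseudo-label uncertainty contribute less to the loss."""
    weights = np.exp(-uncertainty)                # high uncertainty -> small weight
    residual = np.abs(pred_student - pred_teacher)  # per-pixel L1 residual
    return float(np.mean(weights * residual))

# Cross-distillation applies the term in both directions: the Transformer
# teaches the CNN and vice versa, each with its own uncertainty map.
def symmetric_cross_distillation(pred_cnn, pred_tf, unc_cnn, unc_tf):
    return (cross_distillation_loss(pred_cnn, pred_tf, unc_tf)
            + cross_distillation_loss(pred_tf, pred_cnn, unc_cnn))
```

In a real training loop the pseudo labels would be detached from the gradient graph; this sketch only shows how the uncertainty rectifies the per-pixel loss weights.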