Building on recent progress in novel view synthesis, we propose to apply it to improve monocular depth estimation. In particular, we propose a novel training method split into three main steps. First, the prediction results of a monocular depth network are warped to an additional viewpoint. Second, we apply an additional image synthesis network, which corrects and improves the quality of the warped RGB image. The output of this network is required to look as similar as possible to the ground-truth view by minimizing the pixel-wise RGB reconstruction error. Third, we re-apply the same monocular depth network to the synthesized second viewpoint and ensure that its depth predictions are consistent with the associated ground-truth depth. Experimental results show that our method achieves state-of-the-art or comparable performance on the KITTI and NYU-Depth-v2 datasets with a lightweight and simple vanilla U-Net architecture.
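To make the three-step pipeline concrete, below is a minimal PyTorch sketch of one training step. It is an illustration under simplifying assumptions, not the paper's actual implementation: it assumes a rectified stereo setup (known focal length `fx` and baseline) so that warping to the second viewpoint reduces to a horizontal disparity shift realized with backward warping via `grid_sample`, and `depth_net`, `synthesis_net`, the L1 losses, and the unit loss weighting are all placeholders.

```python
import torch
import torch.nn.functional as F


def warp_to_second_view(img, depth, fx, baseline):
    """Backward-warp `img` to a horizontally displaced viewpoint using the
    stereo disparity d = fx * baseline / depth (rectified-pair assumption)."""
    b, _, h, w = img.shape
    disparity = fx * baseline / depth.clamp(min=1e-3)  # (b, 1, h, w), in pixels
    # Build a normalized sampling grid shifted column-wise by the disparity.
    ys, xs = torch.meshgrid(
        torch.arange(h, device=img.device, dtype=img.dtype),
        torch.arange(w, device=img.device, dtype=img.dtype),
        indexing="ij",
    )
    xs = xs.unsqueeze(0) + disparity.squeeze(1)  # (b, h, w)
    grid = torch.stack(
        (
            2.0 * xs / (w - 1) - 1.0,                            # x in [-1, 1]
            2.0 * ys.unsqueeze(0).expand(b, -1, -1) / (h - 1) - 1.0,  # y in [-1, 1]
        ),
        dim=-1,
    )
    return F.grid_sample(img, grid, align_corners=True, padding_mode="border")


def training_step(depth_net, synthesis_net, img, img_2nd, gt_depth_2nd, fx, baseline):
    # Step 1: predict depth and warp the input image to the second viewpoint.
    depth = depth_net(img)
    warped = warp_to_second_view(img, depth, fx, baseline)
    # Step 2: the synthesis network corrects artifacts in the warped image;
    # its output is compared pixel-wise against the ground-truth second view.
    synth = synthesis_net(warped)
    loss_rgb = F.l1_loss(synth, img_2nd)
    # Step 3: re-apply the same depth network to the synthesized view and
    # supervise it with the ground-truth depth of that view.
    depth_2nd = depth_net(synth)
    loss_depth = F.l1_loss(depth_2nd, gt_depth_2nd)
    return loss_rgb + loss_depth
```

Note that backward warping samples the source image at disparity-shifted locations, which is a common simplification; a general-viewpoint version would instead unproject with the camera intrinsics and relative pose before resampling.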