Monocular depth prediction plays a crucial role in understanding 3D scene geometry. Although recent methods have achieved impressive progress on evaluation metrics such as the pixel-wise relative error, most methods neglect the geometric constraints in 3D space. In this work, we show the importance of high-order 3D geometric constraints for depth prediction. By designing a loss term that enforces a simple geometric constraint, namely, the virtual normal directions determined by randomly sampled triplets of points in the reconstructed 3D space, we significantly improve the accuracy and robustness of monocular depth estimation. Notably, the virtual normal loss not only improves the performance of learning metric depth, but also disentangles the scale information and enriches the model with better shape information. Therefore, when absolute metric depth training data are unavailable, we can use the virtual normal loss to learn robust affine-invariant depth on diverse scenes. In experiments, we show state-of-the-art results for learning metric depth on NYU Depth-V2 and KITTI. From the high-quality predicted depth, we can directly recover good 3D structures of the scene, such as the point cloud and surface normals, eliminating the need to rely on additional models as was previously done. To demonstrate the excellent generalizability of learning affine-invariant depth on diverse data with the virtual normal loss, we construct a large-scale and diverse dataset for training affine-invariant depth, termed the Diverse Scene Depth dataset (DiverseDepth), and test on five datasets under the zero-shot setting. Code is available at: https://git.io/Depth
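As a concrete illustration of the loss described above, the following is a minimal NumPy sketch of how a virtual normal constraint can be computed: depth maps are back-projected to point clouds, triplets of points are randomly sampled, and the unit normals of the planes they span are compared between prediction and ground truth. The pinhole back-projection, the uniform triplet sampling, the colinearity threshold, and all function names are illustrative assumptions for this sketch, not the authors' reference implementation.

```python
# Minimal sketch of a virtual normal (VN) style loss (illustrative, not the
# reference implementation). Assumes a pinhole camera model and uniform
# sampling of point triplets.
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Reconstruct a 3D point cloud of shape (H*W, 3) from a depth map and pinhole intrinsics."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

def plane_normals(points, idx):
    """Unit normals of the planes spanned by sampled point triplets, plus cross-product magnitudes."""
    p0, p1, p2 = points[idx[:, 0]], points[idx[:, 1]], points[idx[:, 2]]
    n = np.cross(p1 - p0, p2 - p0)
    mag = np.linalg.norm(n, axis=1, keepdims=True)  # proportional to triangle area
    return n / np.clip(mag, 1e-8, None), mag.squeeze(1)

def virtual_normal_loss(pred_depth, gt_depth, intrinsics, num_samples=100000, rng=None):
    """L1 difference between virtual normals of predicted and ground-truth point clouds."""
    rng = np.random.default_rng() if rng is None else rng
    fx, fy, cx, cy = intrinsics
    pred_pts = backproject(pred_depth, fx, fy, cx, cy)
    gt_pts = backproject(gt_depth, fx, fy, cx, cy)
    # Randomly sample pixel-index triplets, shared between prediction and ground truth.
    idx = rng.integers(0, pred_pts.shape[0], size=(num_samples, 3))
    n_pred, _ = plane_normals(pred_pts, idx)
    n_gt, mag_gt = plane_normals(gt_pts, idx)
    # Discard near-degenerate (almost colinear) triplets in the ground truth.
    valid = mag_gt > 1e-4
    return np.abs(n_pred[valid] - n_gt[valid]).mean()
```

Because the virtual normals depend only on relative 3D geometry rather than the absolute depth scale, a constraint of this form can supervise shape even when the training depth is only affine-invariant, which is the property exploited above.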