In this paper, we propose enhancing monocular depth estimation by adding 3D points as depth guidance. Unlike existing depth completion methods, our approach performs well on extremely sparse and unevenly distributed point clouds, which makes it agnostic to the source of the 3D points. We achieve this by introducing a novel multi-scale 3D point fusion network that is both lightweight and efficient. We demonstrate its versatility on two different depth estimation problems where the 3D points have been acquired with conventional structure-from-motion and with LiDAR. In both cases, our network performs on par with state-of-the-art depth completion methods and achieves significantly higher accuracy when only a small number of points is used, while being more compact in terms of the number of parameters. We show that our method outperforms some contemporary deep-learning-based multi-view stereo and structure-from-motion methods in both accuracy and compactness.