Supervised depth estimation methods can achieve good performance when trained on high-quality ground truth such as LiDAR data. However, LiDAR produces only sparse 3D maps, so information is lost, and dense, high-quality per-pixel ground-truth depth is difficult to acquire. To overcome this limitation, we propose a novel approach that combines structure information from a Plane and Parallax geometry pipeline with depth information in a supervised U-Net, yielding quantitative and qualitative improvements over existing popular learning-based methods. We evaluate the model on two large-scale, challenging datasets, the KITTI Vision Benchmark and Cityscapes, where it achieves the best performance in terms of relative error. Compared with models supervised purely by depth, our model performs particularly well on the depth of thin objects and edges; compared with the structure prediction baseline, it is more robust.
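The abstract reports results in terms of relative error against sparse LiDAR ground truth but does not define the metric. As a minimal sketch, assuming the commonly used absolute relative error and assuming that pixels without a LiDAR return are stored as zero, the evaluation could look like the following. The function name `abs_rel_error` and the `min_depth` threshold are hypothetical and not taken from the paper.

```python
import numpy as np


def abs_rel_error(pred, gt, min_depth=1e-3):
    """Mean absolute relative error over pixels that have ground truth.

    Assumption: pixels with no LiDAR return are encoded as 0 in `gt`,
    so only pixels with gt > min_depth take part in the average.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = gt > min_depth  # mask of pixels with a usable depth label
    if not valid.any():
        raise ValueError("ground truth contains no valid depth values")
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))


# Toy 2x2 example: two of the four pixels have no ground truth (0.0).
gt = np.array([[0.0, 10.0], [20.0, 0.0]])
pred = np.array([[5.0, 11.0], [18.0, 7.0]])
error = abs_rel_error(pred, gt)
```

Because the average runs only over labelled pixels, predictions at unlabelled locations do not influence the score. This is one reason a metric computed on sparse ground truth can under-represent thin objects and edges.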