High-quality reconstruction and comprehension of a scene rely on 3D estimation methods. 3D information has usually been obtained from images by stereo-photogrammetry, but deep learning has recently provided excellent results for monocular depth estimation. Building a sufficiently large and rich training dataset to achieve these results requires onerous processing. In this paper, we address the problem of learning outdoor 3D point clouds from monocular data using a sparse ground-truth dataset. We propose Pix2Point, a deep learning-based approach for monocular 3D point cloud prediction, able to deal with complete and challenging outdoor scenes. Our method relies on a 2D-3D hybrid neural network architecture and a supervised end-to-end minimisation of an optimal transport divergence between point clouds. We show that, when trained on sparse point clouds, our simple yet promising approach achieves better coverage of 3D outdoor scenes than efficient monocular depth methods.
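The abstract's loss, an optimal transport divergence between predicted and ground-truth point clouds, can be illustrated with a minimal Sinkhorn iteration. This is a generic sketch of entropic-regularised OT with uniform weights, not the paper's exact loss; the function name, the regularisation strength `eps`, and the iteration count are illustrative assumptions.

```python
import numpy as np

def sinkhorn_ot_cost(X, Y, eps=0.1, n_iter=200):
    """Entropic-regularised OT cost between point clouds X (n,3) and Y (m,3).

    Illustrative stand-in for an OT divergence loss; uniform point weights,
    squared Euclidean ground cost, plain Sinkhorn scaling iterations.
    """
    n, m = len(X), len(Y)
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # pairwise cost matrix
    K = np.exp(-C / eps)                                # Gibbs kernel
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)     # uniform marginals
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):                             # alternate scalings
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                     # transport plan
    return float((P * C).sum())                         # transported cost
```

In a training setting this scalar would be differentiated through with an automatic-differentiation framework; the point is that, unlike per-pixel depth losses, the cost compares the two clouds as unordered sets, which is what makes supervision from sparse ground-truth point clouds possible.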