While the keypoint-based maps created by sparse monocular simultaneous localisation and mapping (SLAM) systems are useful for camera tracking, dense 3D reconstructions may be desired for many robotic tasks. Solutions involving depth cameras are limited in range and restricted to indoor spaces, and dense reconstruction systems based on minimising the photometric error between frames are typically poorly constrained and suffer from scale ambiguity. To address these issues, we propose a 3D reconstruction system that leverages the output of a convolutional neural network (CNN) to produce fully dense, metrically scaled depth maps for keyframes. Our system, DeepFusion, is capable of producing real-time dense reconstructions on a GPU. It fuses the output of a semi-dense multiview stereo algorithm with the depth and gradient predictions of a CNN in a probabilistic fashion, using learned uncertainties produced by the network. While the network only needs to be run once per keyframe, we are able to optimise the depth map with each new frame so as to constantly make use of new geometric constraints. Based on its performance on synthetic and real-world datasets, we demonstrate that DeepFusion performs at least as well as other comparable systems.
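To make the idea of uncertainty-weighted fusion concrete, the sketch below shows one simple way two per-pixel depth estimates could be combined when each is modelled as an independent Gaussian: the fused depth is the inverse-variance-weighted average, and pixels where the semi-dense stereo has no match fall back to the CNN prediction alone, yielding a fully dense map. This is a minimal illustrative assumption, not the paper's actual optimisation, which additionally incorporates the network's gradient predictions and is re-solved with each new frame; the function name fuse_gaussian_depths and all inputs are hypothetical.

    import numpy as np

    def fuse_gaussian_depths(d_stereo, var_stereo, d_cnn, var_cnn):
        """Fuse two per-pixel depth maps modelled as independent Gaussians.

        Pixels where the semi-dense stereo produced no match are NaN and
        fall back to the CNN prediction alone.
        """
        # Precision (inverse-variance) weights; NaNs from missing stereo
        # pixels propagate through and are patched below.
        w_s = 1.0 / var_stereo
        w_c = 1.0 / var_cnn
        fused = (w_s * d_stereo + w_c * d_cnn) / (w_s + w_c)
        fused_var = 1.0 / (w_s + w_c)
        # Fully dense output: use the CNN alone where stereo is missing.
        missing = np.isnan(d_stereo)
        fused[missing] = d_cnn[missing]
        fused_var[missing] = var_cnn[missing]
        return fused, fused_var

    # Toy example: a 2x2 "image" where stereo matched only two pixels.
    d_stereo = np.array([[2.0, np.nan], [1.5, np.nan]])
    var_stereo = np.array([[0.01, np.nan], [0.04, np.nan]])
    d_cnn = np.array([[2.2, 3.0], [1.4, 2.5]])
    var_cnn = np.array([[0.25, 0.25], [0.25, 0.25]])
    depth, var = fuse_gaussian_depths(d_stereo, var_stereo, d_cnn, var_cnn)

Note how the confident stereo measurements (small variance) dominate where they exist, while the CNN's learned uncertainties govern the remaining pixels; the fused variance is always smaller than either input's, reflecting the extra information gained by combining the two sources.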