In this paper, a complete pipeline for image-based 3D reconstruction of urban scenarios is proposed, based on PatchMatch Multi-View Stereo (MVS). Input images are first fed into an off-the-shelf visual SLAM system to extract camera poses and sparse keypoints, which are used to initialize the PatchMatch optimization. Then, pixel-wise depths and normals are iteratively computed in a multi-scale framework with a novel depth-normal consistency loss term and a global refinement algorithm that balances the inherently local nature of PatchMatch. Finally, a large-scale point cloud is generated by back-projecting multi-view-consistent estimates into 3D. The proposed approach is carefully evaluated against both classical MVS algorithms and monocular depth networks on the KITTI dataset, showing state-of-the-art performance.
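As a rough illustration of the final step described above, the minimal sketch below back-projects a per-pixel depth map into a world-frame point cloud given camera intrinsics and a camera-to-world pose. The function name, array shapes, and the "positive depth means multi-view consistent" convention are assumptions made for this example, not the paper's actual implementation.

```python
import numpy as np

def backproject_depth(depth, K, T_wc):
    """Back-project a per-pixel depth map into a 3D point cloud in world coordinates.

    depth : (H, W) array of metric depths (0 where the estimate was rejected
            by the multi-view consistency check -- an assumed convention)
    K     : (3, 3) camera intrinsic matrix
    T_wc  : (4, 4) camera-to-world pose, e.g. from the visual SLAM front end
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Homogeneous pixel coordinates, one column per pixel
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T.astype(np.float64)

    d = depth.reshape(-1)
    valid = d > 0  # keep only pixels with a surviving depth estimate

    # Rays in camera coordinates, scaled by the estimated depth
    rays = np.linalg.inv(K) @ pix[:, valid]
    pts_cam = rays * d[valid]

    # Transform to world coordinates with the camera pose
    pts_world = (T_wc[:3, :3] @ pts_cam) + T_wc[:3, 3:4]
    return pts_world.T  # (N, 3) point cloud
```

In a full pipeline, point clouds produced this way from each reference view would be fused into the final large-scale reconstruction.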