We present a new learning-based method for multi-frame depth estimation from a color video, which is a fundamental problem in scene understanding, robot navigation, and handheld 3D reconstruction. While recent learning-based methods estimate depth at high accuracy, 3D point clouds exported from their depth maps often fail to preserve important geometric features (e.g., corners, edges, planes) of man-made scenes. Widely used pixel-wise depth errors do not specifically penalize inconsistency on these features. These inaccuracies are particularly severe when subsequent depth reconstructions are accumulated in an attempt to scan a full environment containing man-made objects with such features. Our depth estimation algorithm therefore introduces a Combined Normal Map (CNM) constraint, which is designed to better preserve high-curvature features and global planar regions. To further improve depth estimation accuracy, we introduce a new occlusion-aware strategy that aggregates initial depth predictions from multiple adjacent views into one final depth map and one occlusion probability map for the current reference view. Our method outperforms the state of the art in terms of depth estimation accuracy, and preserves essential geometric features of man-made indoor scenes much better than other algorithms.