In this paper, we present TANDEM, a real-time monocular tracking and dense mapping framework. For pose estimation, TANDEM performs photometric bundle adjustment over a sliding window of keyframes. To increase robustness, we propose a novel tracking front-end that performs dense direct image alignment using depth maps rendered from a global model that is built incrementally from dense depth predictions. To predict the dense depth maps, we propose Cascade View-Aggregation MVSNet (CVA-MVSNet), which utilizes the entire active keyframe window by hierarchically constructing 3D cost volumes with adaptive view aggregation to balance the different stereo baselines between the keyframes. Finally, the predicted depth maps are fused into a consistent global map represented as a truncated signed distance function (TSDF) voxel grid. Our experimental results show that TANDEM outperforms other state-of-the-art traditional and learning-based monocular visual odometry (VO) methods in terms of camera tracking. Moreover, TANDEM shows state-of-the-art real-time 3D reconstruction performance.
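To make the final fusion step concrete, the following is a minimal sketch of the common running weighted-average TSDF update, restricted to the voxels along a single camera ray. It is not taken from the TANDEM implementation; the function name `fuse_tsdf`, its parameters, and the values used are assumptions chosen for illustration only.

```python
def fuse_tsdf(tsdf, weights, voxel_depths, observed_depth,
              trunc=0.15, obs_weight=1.0):
    """Fuse one depth observation into the voxels along a single ray.

    Hypothetical helper for illustration, not the authors' code.
    tsdf, weights  : per-voxel lists, updated in place
    voxel_depths   : distance of each voxel centre from the camera
    observed_depth : predicted depth of the surface along this ray
    trunc          : truncation distance (assumed value)
    """
    for i, z in enumerate(voxel_depths):
        sdf = observed_depth - z      # positive in front of the surface
        if sdf < -trunc:              # far behind the surface: occluded, skip
            continue
        d = min(1.0, sdf / trunc)     # truncate and normalise to [-1, 1]
        w_new = weights[i] + obs_weight
        # Running weighted average of all observations seen so far.
        tsdf[i] = (tsdf[i] * weights[i] + d * obs_weight) / w_new
        weights[i] = w_new
    return tsdf, weights


voxel_depths = [0.8, 0.9, 1.0, 1.1, 1.2]
tsdf = [1.0] * 5       # initialised as free space
weights = [0.0] * 5    # no observations yet

fuse_tsdf(tsdf, weights, voxel_depths, observed_depth=1.0)
fuse_tsdf(tsdf, weights, voxel_depths, observed_depth=1.1)
```

After the two calls, the voxel at depth 1.0 holds the average of both observations, and the surface lies at the zero crossing of the stored values. A full system would apply the same update to every voxel that projects into each predicted depth map.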