Dense image alignment from RGB-D images remains a critical problem for real-world applications, especially under challenging lighting conditions and in wide-baseline settings. In this paper, we propose a new framework that learns a pixel-wise deep feature map and a deep feature-metric uncertainty map predicted by a Convolutional Neural Network (CNN), which together formulate a deep probabilistic feature-metric residual of the two-view constraint that can be minimised using Gauss-Newton in a coarse-to-fine optimisation framework. Furthermore, our network predicts a deep initial pose for faster and more reliable convergence. The optimisation steps are differentiable and unrolled so that the system can be trained in an end-to-end fashion. Owing to its probabilistic formulation, our approach can be easily coupled with other residuals; we demonstrate a combination with ICP. Experimental results show state-of-the-art performance on the TUM RGB-D dataset and a 3D rigid object tracking dataset. We further demonstrate our method's robustness and convergence qualitatively.
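As a minimal sketch of the formulation described above (the notation below is our own and is assumed from the abstract, not taken from the paper), the per-pixel probabilistic feature-metric residual and its Gauss-Newton minimisation can be written as:

\[
r_i(\xi) = \frac{F_{\mathrm{tgt}}\!\left(\pi\!\left(T(\xi)\, X_i\right)\right) - F_{\mathrm{ref}}(x_i)}{\sigma_i},
\qquad
E(\xi) = \sum_i \left\lVert r_i(\xi) \right\rVert^2,
\qquad
\delta\xi = -\left(J^\top J\right)^{-1} J^\top r,
\]

where $F_{\mathrm{ref}}$ and $F_{\mathrm{tgt}}$ are the CNN feature maps of the two views, $\sigma_i$ is the predicted feature-metric uncertainty at pixel $x_i$, $X_i$ is the back-projected 3D point from the RGB-D depth, $\pi$ is the pinhole projection, $T(\xi)$ is the relative pose parametrised by $\xi \in \mathfrak{se}(3)$, and $J$ stacks the Jacobians $\partial r_i / \partial \xi$. The update $\delta\xi$ would be applied iteratively from coarse to fine feature-pyramid levels, initialised by the network-predicted pose.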