We solve the problem of 6-DoF localisation and 3D dense reconstruction in spatial environments as approximate Bayesian inference in a deep state-space model. Our approach leverages both learning and domain knowledge from multiple-view geometry and rigid-body dynamics. This results in an expressive predictive model of the world, often missing in current state-of-the-art visual SLAM solutions. The combination of variational inference, neural networks and a differentiable raycaster ensures that our model is amenable to end-to-end gradient-based optimisation. We evaluate our approach on realistic unmanned aerial vehicle flight data, nearing the performance of state-of-the-art visual-inertial odometry systems. We demonstrate the applicability of the model to generative prediction and planning.