We introduce a method for real-time navigation and tracking with differentiably rendered world models. Learned models for control have produced impressive results in robotics and computer games, but this success has yet to be extended to vision-based navigation. To address this, we transfer advances in the emerging field of differentiable rendering to model-based control. We do so by planning in a learned 3D spatial world model, combined with a pose estimation algorithm previously used for TSDF fusion, here tailored to our setting and extended to incorporate agent dynamics. We evaluate across six simulated environments based on complex, human-designed floor plans and provide quantitative results. Using only image and depth observations, under stochastic, continuous dynamics, we achieve navigation success rates of up to 92% at a frequency of 15 Hz.
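To make the pose-tracking idea concrete, below is a minimal, self-contained sketch (in JAX) of gradient-based camera tracking against a differentiable scene representation, in the spirit of SDF-based tracking used with TSDF fusion. Everything in it is an illustrative assumption, not the paper's implementation: a toy two-sphere analytic SDF stands in for the learned 3D world model, the pose is restricted to planar SE(2), and the paper's extension for agent dynamics (and the planner itself) are omitted.

```python
import jax
import jax.numpy as jnp

def scene_sdf(p):
    """Toy analytic SDF standing in for the learned world model: union of
    two spheres (the asymmetry makes yaw observable)."""
    d1 = jnp.linalg.norm(p - jnp.array([1.5, 0.0, 0.0]), axis=-1) - 1.0
    d2 = jnp.linalg.norm(p - jnp.array([-1.0, 0.8, 0.0]), axis=-1) - 0.6
    return jnp.minimum(d1, d2)

def se2_apply(pose, pts):
    """Apply a planar pose (x, y, yaw) to an (N, 3) camera-frame cloud."""
    x, y, yaw = pose
    c, s = jnp.cos(yaw), jnp.sin(yaw)
    R = jnp.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return pts @ R.T + jnp.array([x, y, 0.0])

def se2_apply_inv(pose, pts):
    """Inverse transform: world-frame points into the camera frame."""
    x, y, yaw = pose
    c, s = jnp.cos(yaw), jnp.sin(yaw)
    R = jnp.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return (pts - jnp.array([x, y, 0.0])) @ R  # right-multiplying by R applies R^T

def tracking_loss(pose, cam_pts):
    """Mean squared SDF of the transformed cloud; zero when every observed
    point lies on the scene surface."""
    return jnp.mean(scene_sdf(se2_apply(pose, cam_pts)) ** 2)

# Synthetic observation: sample surface points, then view them from a
# hidden true pose that the tracker must recover.
k1, k2 = jax.random.split(jax.random.PRNGKey(0))
u1 = jax.random.normal(k1, (128, 3))
u2 = jax.random.normal(k2, (128, 3))
s1 = jnp.array([1.5, 0.0, 0.0]) + u1 / jnp.linalg.norm(u1, axis=-1, keepdims=True)
s2 = jnp.array([-1.0, 0.8, 0.0]) + 0.6 * u2 / jnp.linalg.norm(u2, axis=-1, keepdims=True)
true_pose = jnp.array([0.2, -0.1, 0.15])
cam_pts = se2_apply_inv(true_pose, jnp.concatenate([s1, s2]))

# Gradient-descent tracking, warm-started at the previous pose (identity here).
grad_fn = jax.jit(jax.grad(tracking_loss))
pose = jnp.zeros(3)
for _ in range(500):
    pose = pose - 0.1 * grad_fn(pose, cam_pts)
print("estimated:", pose, "true:", true_pose)
```

The design point the sketch illustrates is that when the scene representation is differentiable, frame-to-frame tracking reduces to a handful of gradient steps from a warm start, which is what makes real-time rates plausible.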