We describe a learning-based system that estimates the camera position and orientation from a single input image relative to a known environment. The system is flexible w.r.t. the amount of information available at training and at test time, catering to different applications. Input images can be RGB-D or RGB, and a 3D model of the environment can be utilized for training but is not necessary. In the minimal case, our system requires only RGB images and ground truth poses at training time, and it requires only a single RGB image at test time. The framework consists of a deep neural network and fully differentiable pose optimization. The neural network predicts so-called scene coordinates, i.e. dense correspondences between the input image and the 3D scene space of the environment. The pose optimization implements robust fitting of pose parameters using differentiable RANSAC (DSAC) to facilitate end-to-end training. The system, an extension of DSAC++ and referred to as DSAC*, achieves state-of-the-art accuracy on various public datasets for RGB-based re-localization, and competitive accuracy for RGB-D-based re-localization.
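The core idea behind differentiable RANSAC (DSAC), as referenced in the abstract, is to replace the hard, non-differentiable selection of the best model hypothesis with a softmax-based probabilistic selection, so that gradients can flow from the final pose loss back into the network. The following is a minimal NumPy sketch of this principle on a toy line-fitting problem (a stand-in for pose fitting); all names, thresholds, and the soft inlier count are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 2D points on the line y = 2x + 1 with heavy outlier contamination.
# Line fitting here stands in for camera pose fitting from 2D-3D correspondences.
n = 100
x = rng.uniform(-1.0, 1.0, n)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.01, n)
outliers = rng.random(n) < 0.3
y[outliers] = rng.uniform(-3.0, 3.0, outliers.sum())
pts = np.stack([x, y], axis=1)

def sample_hypothesis(pts, rng):
    """Fit a model (slope, intercept) to a minimal set of two random points."""
    i, j = rng.choice(len(pts), size=2, replace=False)
    (x1, y1), (x2, y2) = pts[i], pts[j]
    if abs(x2 - x1) < 1e-8:
        return None
    m = (y2 - y1) / (x2 - x1)
    return (m, y1 - m * x1)

def soft_inlier_count(hyp, pts, tau=0.1, beta=0.01):
    """Sigmoid-relaxed inlier count; unlike a hard threshold it is differentiable."""
    m, b = hyp
    residuals = np.abs(pts[:, 1] - (m * pts[:, 0] + b))
    return np.sum(1.0 / (1.0 + np.exp((residuals - tau) / beta)))

# RANSAC loop: sample many hypotheses and score each one.
hyps, scores = [], []
for _ in range(64):
    h = sample_hypothesis(pts, rng)
    if h is not None:
        hyps.append(h)
        scores.append(soft_inlier_count(h, pts))
scores = np.array(scores)

# DSAC step: a softmax over scores defines a selection *distribution*.
# During training, the expected task loss under this distribution is
# differentiable w.r.t. the scores; at test time one may simply take the mode.
probs = np.exp(scores - scores.max())
probs /= probs.sum()
best = hyps[int(np.argmax(probs))]
```

In DSAC* itself the hypotheses are 6D camera poses solved from predicted scene coordinates (via PnP for RGB, or Kabsch for RGB-D), and the soft inlier count is computed over reprojection errors, but the selection mechanism sketched above is the same.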