Visual Camera Re-Localization Using Graph Neural Networks and Relative Pose Supervision

Visual re-localization estimates a camera's location and orientation relative to a pre-recorded environment from a single input image. The highest-scoring methods are "structure-based": they require the query camera's intrinsics as an input to the model and rely on careful geometric optimization. When intrinsics are absent, methods vie for accuracy by making various other assumptions. This yields fairly good localization scores, but the resulting models are "narrow" in some way, e.g., they require costly test-time computation, depth sensors, or multiple query frames. In contrast, our proposed method makes few special assumptions and is fairly lightweight in both training and testing. Our pose regression network learns from only the relative poses of training scenes. For inference, it builds a graph connecting the query image to training counterparts and uses a graph neural network (GNN) with image representations on nodes and image-pair representations on edges. By efficiently passing messages between nodes and edges, both representation types are refined to produce a consistent camera pose estimate. We validate the effectiveness of our approach on standard indoor (7-Scenes) and outdoor (Cambridge Landmarks) camera re-localization benchmarks. Our relative pose regression method matches the accuracy of absolute pose regression networks, while retaining the relative-pose models' test-time speed and ability to generalize to non-training scenes.
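The node and edge refinement described in the abstract can be illustrated with a toy sketch of one message-passing round. This is not the authors' architecture: the names `build_query_graph` and `message_passing_round`, the initial edge feature (an elementwise difference), the random weights, the `tanh` updates, and the choice to connect the query only to training images are all assumptions made for illustration. In a real model the two update functions would be learned networks.

```python
import math
import random


def linear(weights, vec):
    """Apply a bias-free dense layer; `weights` is a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def init_weights(rows, cols, rng):
    """Random stand-in for learned parameters (assumption, not trained)."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def build_query_graph(query_feat, train_feats):
    """Connect the query node 'q' to every training image in both directions."""
    nodes = {"q": query_feat}
    edges = {}
    for idx, feat in enumerate(train_feats):
        nodes[idx] = feat
        # Hypothetical initial image-pair feature: elementwise difference.
        edges[(idx, "q")] = [a - b for a, b in zip(feat, query_feat)]
        edges[("q", idx)] = [b - a for a, b in zip(feat, query_feat)]
    return nodes, edges


def message_passing_round(nodes, edges, w_edge, w_node):
    """Refine edge features from their endpoints, then nodes from incoming edges."""
    new_edges = {}
    for (i, j), e in edges.items():
        # Each edge sees its source node, its own feature, and its target node.
        out = linear(w_edge, nodes[i] + e + nodes[j])
        new_edges[(i, j)] = [math.tanh(x) for x in out]

    new_nodes = {}
    for k, x in nodes.items():
        incoming = [f for (_, j), f in new_edges.items() if j == k]
        if incoming:
            # Mean-aggregate the refined features of all incoming edges.
            agg = [sum(col) / len(incoming) for col in zip(*incoming)]
        else:
            agg = [0.0] * len(x)
        out = linear(w_node, x + agg)
        new_nodes[k] = [math.tanh(v) for v in out]
    return new_nodes, new_edges


# Usage: one query image and three training images with 4-D features.
rng = random.Random(0)
d = 4
query = [rng.random() for _ in range(d)]
train = [[rng.random() for _ in range(d)] for _ in range(3)]
nodes, edges = build_query_graph(query, train)
w_edge = init_weights(d, 3 * d, rng)  # input: source node + edge + target node
w_node = init_weights(d, 2 * d, rng)  # input: node + aggregated edge messages
nodes, edges = message_passing_round(nodes, edges, w_edge, w_node)
```

After the round, every node and edge feature has been updated using its graph neighbours. A pose head, not shown here, would then read the refined edge features to regress relative poses.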
