The estimation of the camera poses associated with a set of images commonly relies on feature matches between the images. In contrast, we are the first to address this challenge by using objectness regions to guide the pose estimation problem rather than explicit semantic object detections. We propose the Pose Refiner Network (PoserNet), a light-weight Graph Neural Network that refines approximate pair-wise relative camera poses. PoserNet exploits associations between the objectness regions, concisely expressed as bounding boxes, across multiple views to globally refine sparsely connected view graphs. We evaluate on the 7-Scenes dataset across graphs of varied sizes and show how this process benefits optimisation-based Motion Averaging algorithms, improving the median rotation error by 62 degrees with respect to the initial estimates obtained from bounding boxes. Code and data are available at https://github.com/IIT-PAVIS/PoserNet.
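As a rough illustration of the kind of architecture described above, the following is a minimal PyTorch sketch of a message-passing refiner over a view graph whose nodes carry bounding-box-derived features and whose edges carry approximate relative rotations. All names and design choices here (ViewGraphRefiner, edge_mlp, node_mlp, quaternion edge features, the residual update) are hypothetical assumptions for illustration only and do not reflect the released implementation; see the repository linked above for the authors' code.

```python
# Hypothetical sketch of a light-weight GNN refining relative poses on a view
# graph; not the authors' implementation.
import torch
import torch.nn as nn


class ViewGraphRefiner(nn.Module):
    """Toy edge-refinement GNN: each edge holds an approximate relative pose
    (unit quaternion) and each node holds pooled bounding-box features."""

    def __init__(self, node_dim=16, edge_dim=4, hidden=32):
        super().__init__()
        # Combines the two endpoint node features with the current edge pose.
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * node_dim + edge_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, edge_dim),
        )
        # Aggregates incident edge messages back into node features.
        self.node_mlp = nn.Sequential(
            nn.Linear(node_dim + edge_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, node_dim),
        )

    def forward(self, node_feat, edge_index, edge_pose, n_rounds=3):
        # node_feat: (N, node_dim) per-view features, e.g. pooled from the
        #            objectness bounding boxes detected in that view.
        # edge_index: (2, E) pairs (i, j) of connected views.
        # edge_pose:  (E, 4) approximate relative rotations as unit quaternions.
        src, dst = edge_index
        for _ in range(n_rounds):
            # Update each edge's pose estimate from its two endpoint views.
            edge_in = torch.cat([node_feat[src], node_feat[dst], edge_pose], dim=-1)
            edge_pose = edge_pose + self.edge_mlp(edge_in)                 # residual update
            edge_pose = edge_pose / edge_pose.norm(dim=-1, keepdim=True)   # renormalise
            # Scatter-mean the edge messages back onto the destination nodes.
            msg = torch.zeros(node_feat.size(0), edge_pose.size(1))
            count = torch.zeros(node_feat.size(0), 1)
            msg.index_add_(0, dst, edge_pose)
            count.index_add_(0, dst, torch.ones(dst.size(0), 1))
            msg = msg / count.clamp(min=1)
            node_feat = node_feat + self.node_mlp(torch.cat([node_feat, msg], dim=-1))
        return edge_pose  # refined relative rotations, one per view-graph edge


if __name__ == "__main__":
    # 4 views, 5 edges: a small, sparsely connected view graph.
    refiner = ViewGraphRefiner()
    nodes = torch.randn(4, 16)
    edges = torch.tensor([[0, 0, 1, 2, 3], [1, 2, 2, 3, 0]])
    poses = torch.randn(5, 4)
    poses = poses / poses.norm(dim=-1, keepdim=True)
    print(refiner(nodes, edges, poses).shape)  # torch.Size([5, 4])
```

The refined per-edge rotations produced by a network of this kind would then be handed to a standard Motion Averaging algorithm, which is where the abstract reports the improvement over the initial bounding-box-based estimates.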