Accurate camera pose estimation is a fundamental requirement for numerous applications, such as autonomous driving, mobile robotics, and augmented reality. In this work, we address the problem of estimating the global 6-DoF camera pose from a single RGB image in a given environment. Previous works treat every part of the image as valuable for localization. However, many image regions, such as the sky, occlusions, and repetitive, indistinguishable patterns, cannot be used for localization. Besides adding unnecessary computation, extracting and matching features from such regions produces many wrong matches, which in turn degrades localization accuracy and efficiency. Our work addresses this particular issue and shows that, by exploiting sparse 3D models, we can focus on discriminative parts of the environment and avoid uninformative image regions when localizing from a single image. By not selecting keypoints from unreliable image regions such as trees, bushes, cars, pedestrians, and occlusions, our method naturally acts as an outlier filter. This makes our system highly efficient, in that only a minimal set of correspondences is needed, and highly accurate, as the number of outliers is low. Our method outperforms state-of-the-art methods on the outdoor Cambridge Landmarks dataset. Relying on only a single image at inference, it is more accurate than methods that exploit pose priors and/or reference 3D models, while being much faster. With as few as 100 correspondences, it surpasses similar methods that localize from thousands of correspondences, while being more efficient. In particular, compared to these methods, it improves localization by 33% on the OldHospital scene. Furthermore, it outperforms direct pose regressors, even those that learn from sequences of images.