Cross-scene model adaptation is crucial for camera relocalization in real-world scenarios. It is often preferable that a pre-learned model can be quickly adapted to a novel scene with as few training samples as possible. Existing state-of-the-art approaches, however, can hardly support such few-shot scene adaptation because image feature extraction and scene coordinate regression are entangled. To address this issue, we approach camera relocalization with a decoupled solution in which feature extraction, coordinate regression, and pose estimation are performed separately. Our key insight is that the feature encoder used for coordinate regression should be learned with the distracting factor of coordinate systems removed, so that the encoder is learned from multiple scenes for general feature representation and, more importantly, view-insensitive capability. With this feature prior, combined with a coordinate regressor, few-shot observations in a new scene are much easier to connect with the 3D world than with existing integrated solutions. Experiments show the superiority of our approach over state-of-the-art methods, producing higher accuracy on several scenes with diverse visual appearance and viewpoint distribution.
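The three-stage decoupling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every function name here (`encode`, `fit_regressor`, `regress`, `kabsch_pose`) is hypothetical, the encoder is a frozen random projection standing in for a network trained on multiple scenes, the regressor is a ridge regression standing in for a learned coordinate regressor, and the pose stage uses Kabsch alignment under the assumption that camera-space 3D points (for example from depth) are available. An RGB-only pipeline would typically use PnP with RANSAC at that stage instead.

```python
import numpy as np

def encode(patches, W_enc):
    """Stage 1 (scene-agnostic): map raw patch descriptors to features.
    W_enc is frozen; it stands in for an encoder pre-trained on many scenes."""
    return np.tanh(patches @ W_enc)

def fit_regressor(features, coords, ridge=1e-3):
    """Stage 2 (scene-specific): fit a ridge regression from features to
    3D scene coordinates using only a few labelled samples."""
    X = np.hstack([features, np.ones((len(features), 1))])  # add bias column
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ coords)

def regress(features, M):
    """Apply the fitted regressor to predict scene coordinates."""
    X = np.hstack([features, np.ones((len(features), 1))])
    return X @ M

def kabsch_pose(cam_pts, world_pts):
    """Stage 3 (assumed variant): rigid transform (R, t) minimizing
    sum ||R @ cam + t - world||^2 via the Kabsch algorithm."""
    mu_c, mu_w = cam_pts.mean(axis=0), world_pts.mean(axis=0)
    H = (cam_pts - mu_c).T @ (world_pts - mu_w)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_w - R @ mu_c
    return R, t
```

In this sketch, adapting to a new scene touches only `fit_regressor`; `encode` and `kabsch_pose` are reused unchanged, which is the property the decoupled design aims for.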