We present a method for localizing a single camera with respect to a point cloud map in indoor and outdoor scenes. The problem is challenging because correspondences of local invariant features are inconsistent across the image and 3D domains. It is made harder still by the need to handle varying environmental conditions such as illumination, weather, and seasonal changes. Our method matches equirectangular images to 3D range projections by extracting cross-domain symmetric place descriptors. Our key insight is to retain condition-invariant 3D geometric features from limited data samples while eliminating condition-related features with a purpose-designed Generative Adversarial Network. On top of these features, we further design a spherical convolution network to learn viewpoint-invariant symmetric place descriptors. We evaluate our method on extensive self-collected datasets covering \textit{Long-term} (varying appearance conditions), \textit{Large-scale} (up to $2\,km$, structured/unstructured environments), and \textit{Multistory} (a four-floor confined space) scenarios. Our method surpasses the current state of the art, achieving roughly $3$ times higher place-retrieval rates under inconsistent environmental conditions and more than $3$ times the accuracy in online localization. To highlight our method's generalization capability, we also evaluate recognition across different datasets. With a single trained model, i3dLoc demonstrates reliable visual localization under arbitrary conditions.
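As a hedged illustration of the place-retrieval step described in the abstract (a minimal sketch, not the authors' i3dLoc implementation), the snippet below matches a query place descriptor against a database of map descriptors by cosine similarity; the function and variable names are hypothetical:

```python
import numpy as np

def normalize(v):
    """L2-normalize descriptors along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def retrieve(query_desc, map_descs, k=1):
    """Return indices of the k map descriptors most similar (cosine) to the query.

    query_desc: (d,) descriptor of the query image.
    map_descs:  (n, d) descriptors of the mapped places (e.g. range projections).
    """
    sims = normalize(map_descs) @ normalize(query_desc)
    return np.argsort(-sims)[:k]

# Illustrative use with toy 2-D descriptors (real descriptors would be
# high-dimensional outputs of the learned network).
db = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
query = np.array([1.0, 0.0])
best = retrieve(query, db, k=1)  # index of the closest map place
```

If the learned descriptors are truly condition-invariant, the same retrieval succeeds whether the query was captured in summer, winter, day, or night.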