Accurately describing and detecting 2D and 3D keypoints is crucial to establishing correspondences across images and point clouds. Although a plethora of learning-based 2D and 3D local feature descriptors and detectors have been proposed, the derivation of a shared descriptor and joint keypoint detector that directly matches pixels and points remains under-explored by the community. This work takes the initiative to establish fine-grained correspondences between 2D images and 3D point clouds. To directly match pixels and points, we present a dual fully convolutional framework that maps 2D and 3D inputs into a shared latent representation space to simultaneously describe and detect keypoints. Furthermore, an ultra-wide reception mechanism, in combination with a novel loss function, is designed to mitigate the intrinsic information variations between pixel and point local regions. Extensive experimental results demonstrate that our framework shows competitive performance in fine-grained matching between images and point clouds and achieves state-of-the-art results for the task of indoor visual localization. Our source code will be available at [no-name-for-blind-review].
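The core idea of matching pixels to points through a shared descriptor space can be illustrated with a minimal sketch. The snippet below is purely hypothetical and not the paper's architecture: it replaces the two fully convolutional branches with fixed random linear projections, maps toy per-pixel (RGB) and per-point (XYZ) features into a common unit-norm descriptor space, and recovers pixel-point correspondences by mutual nearest neighbour under cosine similarity. All names, dimensions, and the matching rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
DESC_DIM = 16  # shared descriptor dimensionality (illustrative choice)

# Hypothetical stand-ins for the 2D and 3D branches: each projects its
# raw per-element features into the same DESC_DIM-dimensional space.
W_pixel = rng.standard_normal((3, DESC_DIM))   # e.g. RGB -> descriptor
W_point = rng.standard_normal((3, DESC_DIM))   # e.g. XYZ -> descriptor

def l2_normalize(x, eps=1e-8):
    # Unit-normalize descriptors so dot products equal cosine similarity.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def describe_pixels(rgb):           # (N, 3) -> (N, DESC_DIM)
    return l2_normalize(rgb @ W_pixel)

def describe_points(xyz):           # (M, 3) -> (M, DESC_DIM)
    return l2_normalize(xyz @ W_point)

def mutual_nearest_matches(d2d, d3d):
    """Match pixels to points by mutual nearest neighbour in the
    shared descriptor space (cosine similarity)."""
    sim = d2d @ d3d.T                       # (N, M) similarity matrix
    nn_2d_to_3d = sim.argmax(axis=1)        # best point for each pixel
    nn_3d_to_2d = sim.argmax(axis=0)        # best pixel for each point
    return [(i, j) for i, j in enumerate(nn_2d_to_3d)
            if nn_3d_to_2d[j] == i]

pixels = rng.random((50, 3))   # toy per-pixel features
points = rng.random((40, 3))   # toy per-point features
matches = mutual_nearest_matches(describe_pixels(pixels),
                                 describe_points(points))
```

In the actual framework the projections would be learned jointly so that corresponding pixel-point pairs land close together in the shared space, with detection scores selecting which elements become keypoints.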