Images and point clouds provide complementary information for robots. Finding correspondences between data from different sensors is crucial for tasks such as localization, mapping, and navigation. Learning-based descriptors have been developed for single sensors, but there is little work on cross-modal features. This work treats cross-modal feature learning as a dense contrastive learning problem and proposes a Tuple-Circle loss function for cross-modality feature learning. Furthermore, to learn good features without losing generality, we develop a variant of the widely used PointNet++ architecture for point clouds and a U-Net CNN architecture for images. We conduct experiments on a real-world dataset to demonstrate the effectiveness of our loss function and network structure, and we show by visualizing the learned features that our models indeed learn information from both images and LiDAR.
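The abstract does not define the Tuple-Circle loss itself. As a rough illustration of the dense contrastive setting it describes, the sketch below applies the standard Circle loss (Sun et al., CVPR 2020) to paired image/point-cloud descriptors, treating matched pixel-point pairs as positives and all other pairs as negatives. The function and tensor names (`circle_loss`, `img_feats`, `pcd_feats`) and the margin/scale hyperparameters `m` and `gamma` are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def circle_loss(img_feats, pcd_feats, m=0.25, gamma=64.0):
    """Circle loss over paired cross-modal descriptors (illustrative sketch).

    img_feats, pcd_feats: (N, D) L2-normalized features; row i of each tensor
    describes the same physical point (a positive pair), all other rows of
    the opposite modality serve as negatives.
    """
    sim = img_feats @ pcd_feats.t()                # (N, N) cosine similarities
    pos_mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)

    sp = sim[pos_mask]                             # (N,) positive similarities
    sn = sim[~pos_mask].view(sim.size(0), -1)      # (N, N-1) negative similarities

    ap = torch.clamp_min(1 + m - sp.detach(), 0.)  # adaptive positive weights
    an = torch.clamp_min(sn.detach() + m, 0.)      # adaptive negative weights
    delta_p, delta_n = 1 - m, m                    # decision margins

    logit_p = -gamma * ap * (sp - delta_p)         # (N,)
    logit_n = gamma * an * (sn - delta_n)          # (N, N-1)

    # log(1 + sum_n exp(logit_n) * exp(logit_p)) per anchor, averaged
    return F.softplus(torch.logsumexp(logit_n, dim=1) + logit_p).mean()

# Minimal usage example: 8 paired descriptors of dimension 32
img = F.normalize(torch.randn(8, 32), dim=1)
pcd = F.normalize(torch.randn(8, 32), dim=1)
print(circle_loss(img, pcd))
```

The adaptive weights `ap` and `an` are what distinguish the Circle loss from a plain InfoNCE-style objective: under-optimized pairs (positives far from 1, negatives far from 0) receive larger gradients, which is useful when cross-modal similarities are noisy.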