We address the problem of registering synchronized color (RGB) and multi-spectral (MS) images with very different resolutions by solving the stereo matching correspondence problem. To this end, we introduce a novel RGB-MS dataset covering 13 different indoor scenes and providing a total of 34 image pairs annotated with semi-dense, high-resolution ground-truth labels in the form of disparity maps. To tackle the task, we propose a deep learning architecture trained in a self-supervised manner by exploiting an additional RGB camera, which is required only while acquiring the training data. In this setup, we can conveniently learn cross-modal matching in the absence of ground-truth labels by distilling knowledge from an easier RGB-RGB matching task, based on a collection of about 11K unlabeled image triplets. Experiments show that the proposed pipeline sets a solid performance baseline (1.16 pixels average registration error) for future research on this novel, challenging task.
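As an illustration of the reported metric, here is a minimal sketch of how an average registration error in pixels could be computed against semi-dense disparity labels. It assumes the error is the mean absolute difference between predicted and ground-truth disparity over labeled pixels only; the abstract does not spell out the exact protocol. The function name `average_registration_error` and the use of `None` to mark unlabeled pixels are hypothetical choices made for this example.

```python
from typing import Optional, Sequence


def average_registration_error(
    pred_disp: Sequence[Sequence[float]],
    gt_disp: Sequence[Sequence[Optional[float]]],
) -> float:
    """Mean absolute disparity error, in pixels, over labeled pixels only.

    Assumption: unlabeled pixels of the semi-dense ground truth are stored
    as None and are skipped, so they do not affect the average.
    """
    if len(pred_disp) != len(gt_disp):
        raise ValueError("prediction and ground truth differ in height")

    total = 0.0
    count = 0
    for pred_row, gt_row in zip(pred_disp, gt_disp):
        if len(pred_row) != len(gt_row):
            raise ValueError("prediction and ground truth differ in width")
        for pred, gt in zip(pred_row, gt_row):
            if gt is None:  # no annotation at this pixel
                continue
            total += abs(pred - gt)
            count += 1

    if count == 0:
        raise ValueError("ground truth contains no labeled pixels")
    return total / count


# Toy 2x3 example: four labeled pixels, two unlabeled ones.
gt = [[None, 10.0, 12.0], [8.0, None, 9.0]]
pred = [[5.0, 11.0, 12.5], [7.0, 3.0, 9.0]]
error = average_registration_error(pred, gt)
```

In the toy example only the four annotated pixels contribute, so predictions at unlabeled locations, however wrong, leave the score unchanged. This is the usual convention when evaluating against semi-dense labels.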