Owing to differences in viewing range, resolution, and relative position, the multi-modality sensing module composed of infrared and visible cameras must be registered to achieve accurate scene perception. In practice, registration based on manual calibration is the most widely used approach, and the cameras must be recalibrated regularly to maintain accuracy, which is time-consuming and labor-intensive. To address these problems, we propose a scene-adaptive infrared and visible image registration method. Specifically, to bridge the discrepancy between multi-modality images, an invertible translation process is developed to establish a modality-invariant domain that comprehensively embraces the feature intensity and distribution of both infrared and visible modalities. We employ a homography to model the deformation between different planes and develop a hierarchical framework that rectifies the deformation inferred from the proposed latent representation in a coarse-to-fine manner. In this framework, the enhanced perception ability coupled with residual estimation facilitates the regression of sparse offsets, and an alternate correlation search enables more accurate correspondence matching. Moreover, we propose the first misaligned infrared and visible image dataset with ground truth, comprising three synthetic sets and one real-world set. Extensive experiments validate the effectiveness of the proposed method against the state of the art, benefiting subsequent applications.
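To make the homography-based deformation model concrete, the sketch below shows how a misaligned pair could be synthesized with a randomly perturbed homography, using the common 4-point corner-offset parameterization. This is a minimal illustration in NumPy, not the paper's pipeline; the function names, the perturbation range, and the use of a direct linear transform (DLT) solve are assumptions.

```python
import numpy as np

def apply_homography(H, pts):
    """Map 2D points through a 3x3 homography (with homogeneous normalization)."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # lift to homogeneous coords
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

def random_perturbed_homography(size, max_shift=8.0, seed=0):
    """Build a homography from random corner offsets (4-point parameterization),
    a common way to synthesize misaligned image pairs with known ground truth."""
    rng = np.random.default_rng(seed)
    w, h = size
    src = np.float64([[0, 0], [w, 0], [w, h], [0, h]])          # image corners
    dst = src + rng.uniform(-max_shift, max_shift, size=(4, 2)) # jittered corners
    # Solve the 8-DoF DLT system A p = b for H's entries (h33 fixed to 1):
    # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), likewise for v.
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    p = np.linalg.solve(np.asarray(A), np.asarray(b))
    return np.append(p, 1.0).reshape(3, 3)

# Ground-truth deformation: where each corner of a 256x256 image lands.
H = random_perturbed_homography((256, 256))
corners = np.float64([[0, 0], [256, 0], [256, 256], [0, 256]])
warped_corners = apply_homography(H, corners)
```

Because the four corner offsets fully determine the 8-DoF homography, the offsets themselves serve as ground-truth labels for the sparse-offset regression described above.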