Infrared and visible image fusion plays a vital role in the field of computer vision. Previous approaches devote considerable effort to designing various fusion rules in their loss functions. However, these empirically designed fusion rules make the methods increasingly complex. Besides, most of them focus only on boosting visual quality, and thus show unsatisfactory performance on follow-up high-level vision tasks. To address these challenges, in this letter, we develop a semantic-level fusion network that fully exploits semantic guidance, freeing the network from empirically designed fusion rules. In addition, to achieve a better semantic understanding of the feature fusion process, a transformer-based fusion block is applied in a multi-scale manner. Moreover, we devise a regularization loss function, together with a training strategy, to fully exploit the semantic guidance from high-level vision tasks. Compared with state-of-the-art methods, our method does not rely on a hand-crafted fusion loss function, yet it achieves superior performance in both visual quality and the follow-up high-level vision tasks.
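To make the transformer-based, multi-scale fusion idea concrete, the following is a minimal numpy sketch of one plausible building block: cross-attention in which visible-spectrum feature tokens attend to infrared feature tokens, applied independently at several spatial scales. The function name `cross_attention_fuse`, the identity Q/K/V projections, the channel width, and the token counts per scale are all illustrative assumptions, not the letter's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(feat_vis, feat_ir):
    """Fuse two token sets (N tokens x d channels) by letting visible
    features attend to infrared features via scaled dot-product attention.
    Hypothetical simplification: identity Q/K/V projections, single head."""
    d_k = feat_vis.shape[-1]
    scores = feat_vis @ feat_ir.T / np.sqrt(d_k)  # (N_vis, N_ir) similarity
    attn = softmax(scores, axis=-1)               # rows sum to 1
    return attn @ feat_ir                         # infrared content re-weighted per visible token

# Multi-scale application: run the same fusion at several resolutions
# (token counts per scale are arbitrary here, for illustration only).
rng = np.random.default_rng(0)
fused_per_scale = []
for n_tokens in (64, 16, 4):
    vis = rng.standard_normal((n_tokens, 32))
    ir = rng.standard_normal((n_tokens, 32))
    fused_per_scale.append(cross_attention_fuse(vis, ir))

print([f.shape for f in fused_per_scale])
```

In a real network the fused tokens at each scale would be projected, combined with the visible features (e.g. by residual addition), and decoded back into the fused image; the sketch only shows the attention-based mixing step itself.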