Color plays an important role in human visual perception, reflecting the spectral characteristics of objects. However, existing infrared and visible image fusion methods rarely explore how to handle multi-spectral/multi-channel data directly or how to achieve high color fidelity. This paper addresses this issue by proposing a novel diffusion-based method, termed Dif-Fusion, which generates the distribution of the multi-channel input data and thereby improves both multi-source information aggregation and color fidelity. Specifically, instead of converting multi-channel images into single-channel data as in existing fusion methods, we construct the multi-channel data distribution with a denoising network in a latent space through the forward and reverse diffusion processes. Then, we use the denoising network to extract multi-channel diffusion features containing both visible and infrared information. Finally, we feed the multi-channel diffusion features into the multi-channel fusion module to directly generate the three-channel fused image. To preserve texture and intensity information, we propose a multi-channel gradient loss and an intensity loss. In addition to the current evaluation metrics for measuring texture and intensity fidelity, we introduce a new evaluation metric to quantify color fidelity. Extensive experiments indicate that our method is more effective than other state-of-the-art image fusion methods, especially in terms of color fidelity.
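As a hedged illustration of the multi-channel diffusion step described above, the standard DDPM forward process can be applied directly to the concatenated visible/infrared tensor rather than to a single-channel image. The sketch below uses a generic linear beta schedule and illustrative names; it is an assumption about one plausible implementation, not the paper's exact code.

```python
import torch

def forward_diffusion(x0, t, alphas_cumprod):
    """DDPM forward process q(x_t | x_0) applied to multi-channel data.
    x0: (B, 4, H, W) tensor, e.g. RGB visible (3 ch) concatenated with infrared (1 ch).
    t:  (B,) integer timesteps; alphas_cumprod: (T,) cumulative products of (1 - beta).
    """
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)           # broadcast over C, H, W
    xt = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    return xt, noise                                      # a denoising network would be trained to predict `noise`

# Minimal usage sketch (illustrative hyper-parameters)
T = 1000
betas = torch.linspace(1e-4, 2e-2, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
x0 = torch.cat([torch.rand(2, 3, 64, 64), torch.rand(2, 1, 64, 64)], dim=1)  # visible + IR
t = torch.randint(0, T, (2,))
xt, eps = forward_diffusion(x0, t, alphas_cumprod)
```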
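The multi-channel gradient loss and intensity loss are only named in the abstract. Below is a minimal, hedged sketch of one common way such losses are formulated in the fusion literature (per-pixel max-aggregation of the sources and Sobel gradients); the function names and weights are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def sobel_gradient(x):
    """Per-channel Sobel gradient magnitude, approximated by |Gx| + |Gy|."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=x.device, dtype=x.dtype)
    ky = kx.t()
    c = x.shape[1]
    kx = kx.view(1, 1, 3, 3).repeat(c, 1, 1, 1)           # depthwise kernels
    ky = ky.view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    gx = F.conv2d(x, kx, padding=1, groups=c)
    gy = F.conv2d(x, ky, padding=1, groups=c)
    return gx.abs() + gy.abs()

def multichannel_fusion_loss(fused, visible, infrared, w_int=1.0, w_grad=1.0):
    """Illustrative losses: the intensity term pulls the fused image toward the
    per-pixel maximum of the sources; the gradient term pulls its Sobel gradients
    toward the stronger source gradient. `infrared` is (B, 1, H, W) and is
    broadcast to the three visible channels."""
    ir3 = infrared.expand_as(visible)
    loss_int = F.l1_loss(fused, torch.maximum(visible, ir3))
    loss_grad = F.l1_loss(sobel_gradient(fused),
                          torch.maximum(sobel_gradient(visible), sobel_gradient(ir3)))
    return w_int * loss_int + w_grad * loss_grad
```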