Multi-modality (MM) image fusion aims to render fused images that maintain the merits of different modalities, e.g., functional highlights and detailed textures. To tackle the challenges of modeling cross-modality features and decomposing desirable modality-specific and modality-shared features, we propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network. Firstly, CDDFuse uses Restormer blocks to extract cross-modality shallow features. We then introduce a dual-branch Transformer-CNN feature extractor, with Lite Transformer (LT) blocks leveraging long-range attention to handle low-frequency global features and Invertible Neural Network (INN) blocks focusing on extracting high-frequency local information. A correlation-driven loss is further proposed to make the low-frequency features correlated while keeping the high-frequency features uncorrelated, based on the embedded information. Then, the LT-based global fusion and INN-based local fusion layers output the fused image. Extensive experiments demonstrate that our CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion. We also show that CDDFuse can boost the performance of downstream infrared-visible semantic segmentation and object detection in a unified benchmark. The code is available at https://github.com/Zhaozixiang1228/MMIF-CDDFuse.
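The correlation-driven loss is only described at a high level above. Below is a minimal sketch of one plausible formulation, assuming PyTorch: a per-sample Pearson correlation coefficient over flattened feature maps, and a decomposition loss that encourages the low-frequency (base) features of the two modalities to be correlated while decorrelating the high-frequency (detail) features. The function names and the stabilizing constant are illustrative assumptions, not taken from the text; the exact formulation is in the paper and repository.

```python
import torch


def correlation_coefficient(f1: torch.Tensor, f2: torch.Tensor,
                            eps: float = 1e-6) -> torch.Tensor:
    """Per-sample Pearson correlation between feature maps of shape (B, C, H, W),
    computed over all channels and spatial positions."""
    b = f1.shape[0]
    f1 = f1.reshape(b, -1) - f1.reshape(b, -1).mean(dim=1, keepdim=True)
    f2 = f2.reshape(b, -1) - f2.reshape(b, -1).mean(dim=1, keepdim=True)
    num = (f1 * f2).sum(dim=1)
    den = f1.norm(dim=1) * f2.norm(dim=1) + eps
    return num / den


def decomposition_loss(base_ir: torch.Tensor, base_vis: torch.Tensor,
                       detail_ir: torch.Tensor, detail_vis: torch.Tensor,
                       stabilizer: float = 1.01) -> torch.Tensor:
    """Sketch of a correlation-driven decomposition loss: minimizing the ratio
    pushes the detail (high-frequency) correlation toward zero while pushing
    the base (low-frequency) correlation up. `stabilizer` is an assumed
    constant that keeps the denominator positive."""
    cc_detail = correlation_coefficient(detail_ir, detail_vis)
    cc_base = correlation_coefficient(base_ir, base_vis)
    return (cc_detail ** 2 / (stabilizer + cc_base)).mean()
```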