TransFuse: A Unified Transformer-based Image Fusion Framework using Self-supervised Learning

Image fusion is a technique for integrating information from multiple source images that carry complementary information, producing a single image that is richer than any of its inputs. Because task-specific training data and corresponding ground truth are scarce, most existing end-to-end image fusion methods are prone to overfitting or require tedious parameter optimization. Two-stage methods avoid the need for large amounts of task-specific training data by training an encoder-decoder network on large natural-image datasets and using the extracted features for fusion, but the domain gap between natural images and the various fusion tasks limits their performance. In this study, we design a novel encoder-decoder based image fusion framework and propose a destruction-reconstruction based self-supervised training scheme that encourages the network to learn task-specific features. Specifically, we propose three destruction-reconstruction self-supervised auxiliary tasks for multi-modal image fusion, multi-exposure image fusion and multi-focus image fusion, based on non-linear pixel intensity transformation, brightness transformation and noise transformation, respectively. To let the different fusion tasks reinforce each other and to improve the generalizability of the trained network, we integrate the three self-supervised auxiliary tasks by randomly choosing one of them to destroy a natural image during model training. In addition, we design a new encoder that combines a CNN and a Transformer for feature extraction, so that the trained model can exploit both local and global information. Extensive experiments on multi-modal, multi-exposure and multi-focus image fusion tasks demonstrate that the proposed method achieves state-of-the-art performance in both subjective and objective evaluations. The code will be publicly available soon.
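The destruction-reconstruction training scheme can be illustrated with a small sketch. The abstract only names the three transformation families, so every concrete choice below is an assumption rather than the authors' setting: the gamma curve standing in for the non-linear intensity transformation, the brightness scale ranges, the Gaussian noise level, corrupting a single random rectangle, and the L1 reconstruction loss. Names such as `destroy`, `training_pair` and `TRANSFORMS` are hypothetical. Images are plain nested lists of grayscale values in [0, 1] so the example runs with the standard library only.

```python
import random
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]           # grayscale image: rows of pixels in [0, 1]
PixelFn = Callable[[float], float]  # per-pixel destruction function


def clip01(v: float) -> float:
    """Clamp a pixel value to the valid [0, 1] range."""
    return min(1.0, max(0.0, v))


def nonlinear_intensity(rng: random.Random) -> PixelFn:
    # Assumed multi-modal destruction: a random gamma curve remaps intensities.
    gamma = rng.uniform(0.4, 2.5)
    return lambda p: clip01(p ** gamma)


def brightness(rng: random.Random) -> PixelFn:
    # Assumed multi-exposure destruction: simulate under- or over-exposure.
    scale = rng.choice([rng.uniform(0.2, 0.7), rng.uniform(1.4, 3.0)])
    return lambda p: clip01(p * scale)


def noise(rng: random.Random) -> PixelFn:
    # Assumed multi-focus destruction: additive Gaussian noise per pixel.
    sigma = rng.uniform(0.05, 0.2)
    return lambda p: clip01(p + rng.gauss(0.0, sigma))


# One hypothetical destruction factory per self-supervised auxiliary task.
TRANSFORMS: Dict[str, Callable[[random.Random], PixelFn]] = {
    "multi-modal": nonlinear_intensity,
    "multi-exposure": brightness,
    "multi-focus": noise,
}


def destroy(img: Image, rng: random.Random) -> Tuple[str, Image]:
    """Randomly pick one auxiliary task and corrupt a random rectangle with it."""
    task = rng.choice(sorted(TRANSFORMS))
    fn = TRANSFORMS[task](rng)
    h, w = len(img), len(img[0])
    top, left = rng.randrange(h), rng.randrange(w)
    bottom, right = rng.randint(top + 1, h), rng.randint(left + 1, w)
    out = [row[:] for row in img]  # copy, so the original stays intact
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = fn(out[y][x])
    return task, out


def training_pair(img: Image, rng: random.Random) -> Tuple[str, Image, Image]:
    """Return (task, network input, reconstruction target = original image)."""
    task, corrupted = destroy(img, rng)
    return task, corrupted, img


def l1_loss(pred: Image, target: Image) -> float:
    """Assumed reconstruction loss: mean absolute pixel error."""
    diffs = [abs(a - b) for pr, tr in zip(pred, target) for a, b in zip(pr, tr)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    rng = random.Random(0)
    img = [[(x + y) / 14 for x in range(8)] for y in range(8)]
    task, x_in, target = training_pair(img, rng)
    print(task, round(l1_loss(x_in, target), 4))
```

In an actual training loop, the encoder-decoder would receive the corrupted image and be optimized to reproduce the untouched target. Because one task is drawn at random for each sample, a single network is exposed to all three kinds of destruction, which mirrors how the abstract describes integrating the auxiliary tasks.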
