In this paper, we propose TransMEF, a transformer-based multi-exposure image fusion framework that uses self-supervised multi-task learning. The framework is built on an encoder-decoder network that can be trained on large natural-image datasets and does not require ground-truth fusion images. We design three self-supervised reconstruction tasks according to the characteristics of multi-exposure images and perform these tasks simultaneously via multi-task learning; through this process, the network learns the characteristics of multi-exposure images and extracts more generalized features. In addition, to compensate for the difficulty CNN-based architectures have in establishing long-range dependencies, we design an encoder that combines a CNN module with a transformer module, enabling the network to attend to both local and global information. We evaluated our method against 11 competitive traditional and deep-learning-based methods on the latest released multi-exposure image fusion benchmark dataset, and our method achieved the best performance in both subjective and objective evaluations.
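To make the hybrid-encoder idea concrete, below is a minimal PyTorch sketch of an encoder that runs a CNN branch (local features) in parallel with a transformer branch (global, long-range dependencies) and merges the two feature maps. The abstract only states that the two modules are combined; the layer sizes, patch embedding, and concatenation-based merge here are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class HybridEncoder(nn.Module):
    """Hypothetical CNN + transformer encoder sketch.

    All dimensions and the concat-then-1x1-conv merge are
    illustrative assumptions, not TransMEF's published design.
    """

    def __init__(self, in_channels=1, dim=64, patch=8, depth=2, heads=4):
        super().__init__()
        # CNN branch: captures local texture and detail.
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Transformer branch: patch embedding, then self-attention
        # over patch tokens to model long-range dependencies.
        self.embed = nn.Conv2d(in_channels, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)
        self.upsample = nn.Upsample(scale_factor=patch, mode="nearest")
        # Merge the two branches back into a single feature map.
        self.merge = nn.Conv2d(2 * dim, dim, 1)

    def forward(self, x):
        local_feat = self.cnn(x)                  # (B, dim, H, W)
        tokens = self.embed(x)                    # (B, dim, H/p, W/p)
        b, c, h, w = tokens.shape
        seq = tokens.flatten(2).transpose(1, 2)   # (B, h*w, dim)
        seq = self.transformer(seq)
        global_feat = seq.transpose(1, 2).reshape(b, c, h, w)
        global_feat = self.upsample(global_feat)  # back to (B, dim, H, W)
        return self.merge(torch.cat([local_feat, global_feat], dim=1))

# Usage: encode a single grayscale 256x256 exposure image.
enc = HybridEncoder()
features = enc(torch.randn(1, 1, 256, 256))
print(features.shape)  # torch.Size([1, 64, 256, 256])
```

In a self-supervised setup like the one the abstract describes, such an encoder would be paired with a decoder and trained to reconstruct images under several hand-designed degradations, with the three task losses summed under multi-task learning.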