SwinFuse: A Residual Swin Transformer Fusion Network for Infrared and Visible Images

Existing deep learning fusion methods mainly rely on convolutional neural networks, and few attempts have been made with transformers. Meanwhile, convolution is a content-independent interaction between the image and the convolution kernel, which may lose important context and further limit fusion performance. To this end, we present a simple and strong fusion baseline for infrared and visible images, namely the \textit{Residual Swin Transformer Fusion Network}, termed SwinFuse. SwinFuse consists of three parts: global feature extraction, a fusion layer, and feature reconstruction. In particular, we build a fully attentional feature encoding backbone to model long-range dependencies; it is a pure transformer network and has a stronger representation ability than convolutional neural networks. Moreover, we design a novel feature fusion strategy based on the $L_{1}$-norm for sequence matrices, which measures the corresponding activity levels along the row and column vector dimensions and can thus retain competitive infrared brightness and distinct visible details. Finally, we compare SwinFuse with nine state-of-the-art traditional and deep learning methods on three different datasets through subjective observations and objective comparisons. The experimental results show that the proposed SwinFuse obtains surprising fusion performance with strong generalization ability and competitive computational efficiency. The code will be available at https://github.com/Zhishe-Wang/SwinFuse.
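
The abstract names the fusion strategy but gives no formula, so the following is only a minimal sketch of one way an $L_{1}$-norm activity measure over rows and columns of a sequence matrix could drive a weighted fusion. Everything specific here is an assumption for illustration and is not taken from the paper or its repository: the function names `l1_activity` and `fuse_l1`, the token-by-channel layout of the matrices, the proportional weighting, and the final averaging of the row-wise and column-wise results.

```python
def l1_activity(seq):
    """Return the L1 norm of every row and every column of a 2-D list.

    Assumption: `seq` is a token-by-channel feature matrix (rows = tokens).
    """
    row_norms = [sum(abs(v) for v in row) for row in seq]
    col_norms = [sum(abs(row[j]) for row in seq) for j in range(len(seq[0]))]
    return row_norms, col_norms


def _weights(a, b):
    """Weight of the first source, proportional to its activity level.

    Falls back to 0.5 when both activities are zero (hypothetical choice).
    """
    return [x / (x + y) if (x + y) > 0 else 0.5 for x, y in zip(a, b)]


def fuse_l1(ir, vis):
    """Fuse two equally shaped feature matrices with L1-norm based weights.

    Assumption: the row-weighted and column-weighted fusions are averaged;
    the paper may combine them differently.
    """
    ir_rows, ir_cols = l1_activity(ir)
    vis_rows, vis_cols = l1_activity(vis)
    w_row = _weights(ir_rows, vis_rows)  # one weight per token
    w_col = _weights(ir_cols, vis_cols)  # one weight per channel

    fused = []
    for i, (ir_row, vis_row) in enumerate(zip(ir, vis)):
        out_row = []
        for j, (a, b) in enumerate(zip(ir_row, vis_row)):
            by_row = w_row[i] * a + (1.0 - w_row[i]) * b
            by_col = w_col[j] * a + (1.0 - w_col[j]) * b
            out_row.append(0.5 * (by_row + by_col))
        fused.append(out_row)
    return fused
```

Calling `fuse_l1(ir_features, vis_features)` on two matrices of the same shape returns a matrix of that shape. Under this weighting, the source with the larger absolute response in a given row or column contributes more to the fused value, which is the intuition behind retaining bright infrared targets alongside visible detail.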
