RGB-thermal salient object detection (RGB-T SOD) aims to locate the common prominent objects in an aligned pair of visible and thermal infrared images and to accurately segment all pixels belonging to those objects. It is promising in challenging scenes such as nighttime and complex backgrounds because thermal images are insensitive to lighting conditions. The key problem of RGB-T SOD is therefore to make the features from the two modalities complement and adjust each other flexibly, since either modality of an RGB-T image pair will inevitably fail in some challenging scenes, such as extreme lighting conditions and thermal crossover. In this paper, we propose a novel mirror complementary Transformer network (MCNet) for RGB-T SOD. Specifically, we introduce a Transformer-based feature extraction module to effectively extract hierarchical features of RGB and thermal images. Then, through attention-based feature interaction and serial multiscale dilated convolution (SDC) based feature fusion modules, the proposed model achieves the complementary interaction of low-level features and the semantic fusion of deep features. Finally, based on the mirror complementary structure, the salient regions of the two modalities can be accurately extracted even when one modality is invalid. To demonstrate the robustness of the proposed model under challenging real-world scenes, we build a novel RGB-T SOD dataset, VT723, based on a large public RGB-T semantic segmentation dataset used in the autonomous driving domain. Extensive experiments on benchmark datasets and VT723 show that the proposed method outperforms state-of-the-art approaches, including CNN-based and Transformer-based methods. The code and dataset will be released later at https://github.com/jxr326/SwinMCNet.
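The abstract names a serial multiscale dilated convolution (SDC) fusion module but does not describe its internals. As a rough illustration only, the toy 1D sketch below assumes that "serial" means dilated convolutions with increasing dilation rates applied one after another, so that each stage widens the receptive field of the previous one. The dilation rates (1, 2, 4), the 3-tap kernel, and all function names are hypothetical and are not taken from the paper or its released code.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Same-size 1D dilated convolution (cross-correlation) with zero padding.

    Assumes an odd kernel length so that the kernel has a center tap.
    """
    half = (len(kernel) // 2) * dilation  # distance from center tap to the edge tap
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j * dilation - half  # input position sampled by tap j
            if 0 <= idx < len(signal):     # positions outside the signal count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def serial_dilated_conv(signal, kernel, dilations=(1, 2, 4)):
    """Chain dilated convolutions serially; return the output of every stage.

    Hypothetical stand-in for an SDC-style block: each stage consumes the
    previous stage's output, so context accumulates across scales.
    """
    stages = []
    x = signal
    for d in dilations:
        x = dilated_conv1d(x, kernel, d)
        stages.append(x)
    return stages


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated convolutions."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Usage: push a unit impulse through three serial stages.
impulse = [0.0] * 21
impulse[10] = 1.0
stages = serial_dilated_conv(impulse, [1.0, 1.0, 1.0], dilations=(1, 2, 4))
spread = sum(1 for v in stages[-1] if v != 0.0)
print(spread, receptive_field(3, (1, 2, 4)))  # both are 15
```

With three 3-tap stages at dilations 1, 2, and 4, a single input sample influences 15 output positions, which is the kind of multiscale context growth that stacked dilated convolutions are generally used for. A real 2D module would also include learned weights, channels, and nonlinearities, none of which are modeled here.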