Benefiting from the color independence, illumination invariance, and location discrimination offered by depth maps, depth information can provide important supplementary cues for extracting salient objects in complex environments. However, high-quality depth sensors are expensive and cannot be widely deployed, while general-purpose depth sensors produce noisy and sparse depth maps that introduce irreversible interference into depth-based networks. In this paper, we propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD). Specifically, we unify three complementary tasks: depth estimation, salient object detection, and contour estimation. The multi-task mechanism encourages the model to learn task-aware features from the auxiliary tasks, so that the depth information can be completed and purified. Moreover, we introduce a multi-modal filtered transformer (MFT) module equipped with three modality-specific filters that generate a transformer-enhanced feature for each modality. The proposed model works in a depth-free manner, requiring no depth input during testing. Experiments show that it not only significantly surpasses depth-based RGB-D SOD methods on multiple datasets but also accurately predicts a high-quality depth map and salient contour at the same time. In addition, the resulting depth maps can help existing RGB-D SOD methods obtain significant performance gains. The source code will be publicly available at https://github.com/Xiaoqi-Zhao-DLUT/MMFT.
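A multi-task setup like the one described above is usually trained with a single combined objective over the three outputs. The following is a minimal sketch under stated assumptions: per-pixel binary cross-entropy for the saliency and contour maps, an L1 loss for the predicted depth map, and equal task weights. The abstract does not say which losses or weights MMFT actually uses, and the function names (`bce`, `l1`, `multitask_loss`) are hypothetical. Maps are flattened to plain Python lists so the example needs only the standard library.

```python
import math
from typing import List

# Hypothetical helpers for illustration only; these are NOT the losses
# or weights confirmed for MMFT, which the abstract does not specify.

def bce(pred: List[float], target: List[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over flattened per-pixel probabilities."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)

def l1(pred: List[float], target: List[float]) -> float:
    """Mean absolute error, assumed here for the depth-estimation head."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def multitask_loss(sal_pred: List[float], sal_gt: List[float],
                   depth_pred: List[float], depth_gt: List[float],
                   contour_pred: List[float], contour_gt: List[float],
                   w_sal: float = 1.0, w_depth: float = 1.0,
                   w_contour: float = 1.0) -> float:
    """Weighted sum of the three task losses (weights are assumptions)."""
    return (w_sal * bce(sal_pred, sal_gt)
            + w_depth * l1(depth_pred, depth_gt)
            + w_contour * bce(contour_pred, contour_gt))

# Toy 4-pixel example: the depth and contour branches act as auxiliary
# supervision, so at test time only the RGB image would be fed to the model.
sal_gt, depth_gt, contour_gt = [1.0, 1.0, 0.0, 0.0], [0.2, 0.4, 0.6, 0.8], [1.0, 0.0, 0.0, 0.0]
good = multitask_loss([0.9, 0.8, 0.1, 0.2], sal_gt, [0.25, 0.4, 0.55, 0.8], depth_gt,
                      [0.7, 0.1, 0.1, 0.1], contour_gt)
bad = multitask_loss([0.3, 0.4, 0.6, 0.7], sal_gt, [0.9, 0.1, 0.1, 0.2], depth_gt,
                     [0.2, 0.6, 0.5, 0.7], contour_gt)
print(f"good={good:.4f} bad={bad:.4f}")
```

Summing the per-task terms lets gradients from the depth and contour heads shape a shared encoder, which is one common way auxiliary tasks are used to refine features; the relative weights are a tuning choice.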