Most existing lightweight RGB-D salient object detection (SOD) models adopt either a two-stream or a single-stream structure. The former first uses two sub-networks to extract unimodal features from the RGB and depth images, respectively, and then fuses them for SOD, while the latter directly extracts multi-modal features from the input RGB-D images and then focuses on exploiting cross-level complementary information. However, two-stream models inevitably require more parameters, and single-stream models cannot fully exploit the cross-modal complementary information since they ignore the modality difference. To address these issues, we propose to employ a middle-level fusion structure for designing a lightweight RGB-D SOD model: two sub-networks first extract low- and middle-level unimodal features, respectively, and the extracted middle-level unimodal features are then fused so that the corresponding high-level multi-modal features can be extracted by a subsequent shared sub-network. Different from existing models, this structure can effectively exploit the cross-modal complementary information and significantly reduce the network's parameters simultaneously. On this basis, we design a novel lightweight SOD model, which contains an information-aware multi-modal feature fusion (IMFF) module for effectively capturing the cross-modal complementary information and a lightweight feature-level and decision-level feature fusion (LFDF) module for aggregating the feature-level and decision-level saliency information across stages with fewer parameters. Our proposed model has only 3.9M parameters and runs at 33 FPS. Experimental results on several benchmark datasets verify the effectiveness and superiority of the proposed method over several state-of-the-art methods.
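To make the middle-level fusion structure concrete, the following is a minimal PyTorch sketch of the idea described above: two unimodal branches stop at the middle level, their features are fused once, and a single shared branch extracts the high-level multi-modal features. All layer widths, the concatenation-plus-1x1-convolution fusion, and the class names here are illustrative assumptions; they stand in for, and do not reproduce, the paper's actual IMFF and LFDF modules.

```python
# Minimal sketch of a middle-level fusion network for RGB-D SOD.
# Assumed/illustrative: stage widths, fusion by concat + 1x1 conv,
# and all names (stage, MiddleLevelFusionNet) are hypothetical.
import torch
import torch.nn as nn

def stage(in_ch, out_ch):
    # One lightweight conv stage: stride-2 conv -> BN -> ReLU.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class MiddleLevelFusionNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Two unimodal sub-networks: low- and middle-level stages only.
        self.rgb_low, self.rgb_mid = stage(3, 16), stage(16, 32)
        self.dep_low, self.dep_mid = stage(1, 16), stage(16, 32)
        # Fuse the two middle-level unimodal features (placeholder for IMFF).
        self.fuse = nn.Conv2d(64, 32, kernel_size=1)
        # A single shared sub-network extracts high-level multi-modal
        # features, so the deepest stages are not duplicated per modality.
        self.high = stage(32, 64)
        self.head = nn.Conv2d(64, 1, kernel_size=1)  # saliency prediction

    def forward(self, rgb, depth):
        fr = self.rgb_mid(self.rgb_low(rgb))         # middle-level RGB features
        fd = self.dep_mid(self.dep_low(depth))       # middle-level depth features
        fm = self.fuse(torch.cat([fr, fd], dim=1))   # cross-modal fusion
        return self.head(self.high(fm))              # coarse saliency map

# Usage: a 224x224 RGB image and its depth map yield a low-resolution map.
net = MiddleLevelFusionNet()
out = net(torch.randn(1, 3, 224, 224), torch.randn(1, 1, 224, 224))
print(out.shape)  # torch.Size([1, 1, 28, 28])
```

Note how the parameter saving arises: only the cheap low- and middle-level stages are duplicated across modalities, while the parameter-heavy high-level stage is shared after fusion.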