RGB-D salient object detection (SOD) has recently attracted increasing research interest because extra depth information can benefit conventional RGB SOD. However, existing RGB-D SOD models often fail to perform well in both efficiency and accuracy, which hinders their application on mobile devices and to real-world problems. An underlying challenge is that model accuracy usually degrades when the model is simplified to have fewer parameters. To tackle this dilemma, and inspired by the fact that depth quality is a key factor influencing accuracy, we propose a novel depth quality-inspired feature manipulation (DQFM) process, which is itself efficient and can serve as a gating mechanism that filters depth features to greatly boost accuracy. DQFM relies on the alignment of low-level RGB and depth features, as well as holistic attention of the depth stream, to explicitly control and enhance cross-modal fusion. We embed DQFM to obtain an efficient lightweight model called DFM-Net, for which we also design a tailored depth backbone and a two-stage decoder to further improve efficiency. Extensive experimental results demonstrate that DFM-Net achieves state-of-the-art accuracy compared to existing non-efficient models, while running at 140 ms on CPU (2.2$\times$ faster than the prior fastest efficient model) with a model size of only $\sim$8.5Mb (14.9% of the prior lightest). Our code will be available at https://github.com/zwbx/DFM-Net.
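To make the gating idea concrete, the following is a minimal sketch, not the DFM-Net implementation. It assumes that depth quality can be scored by a Dice-style overlap between low-level RGB and depth edge maps, and that the resulting scalar, together with a spatial attention map, scales the depth features before they are added to the RGB features. The function names `dice_alignment` and `gated_fusion`, the Dice score, and the additive fusion rule are all illustrative assumptions; the abstract does not specify them.

```python
"""Toy sketch of a depth-quality gate (illustrative; not the DFM-Net code)."""


def dice_alignment(rgb_edges, depth_edges, eps=1e-8):
    """Dice-style overlap between two same-sized maps with values in [0, 1].

    Assumption: well-aligned RGB and depth edges indicate good depth quality.
    """
    inter = sum(
        r * d
        for rgb_row, depth_row in zip(rgb_edges, depth_edges)
        for r, d in zip(rgb_row, depth_row)
    )
    total = sum(map(sum, rgb_edges)) + sum(map(sum, depth_edges))
    return 2.0 * inter / (total + eps)


def gated_fusion(rgb_feat, depth_feat, alpha, attention):
    """Element-wise fusion: rgb + alpha * attention * depth.

    alpha is a scalar quality gate; attention is a spatial map in [0, 1].
    Both play hypothetical roles chosen for this sketch.
    """
    return [
        [r + alpha * a * d for r, d, a in zip(rgb_row, depth_row, att_row)]
        for rgb_row, depth_row, att_row in zip(rgb_feat, depth_feat, attention)
    ]


# Binary edge maps: one depth map matches the RGB edges, the other does not.
rgb_edges = [[1, 1, 0], [0, 1, 0]]
good_depth_edges = [[1, 1, 0], [0, 1, 0]]
poor_depth_edges = [[0, 0, 1], [1, 0, 0]]

alpha_good = dice_alignment(rgb_edges, good_depth_edges)  # close to 1
alpha_poor = dice_alignment(rgb_edges, poor_depth_edges)  # exactly 0

rgb_feat = [[1.0, 2.0]]
depth_feat = [[10.0, 10.0]]
attention = [[1.0, 0.5]]

fused_good = gated_fusion(rgb_feat, depth_feat, alpha_good, attention)
fused_poor = gated_fusion(rgb_feat, depth_feat, alpha_poor, attention)
```

In this toy setting, misaligned depth edges drive the gate to zero, so the fused features fall back to the RGB features alone, while well-aligned depth contributes in proportion to the attention map.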