RGB-D salient object detection (SOD) centers on how to better integrate and exploit cross-modal fusion information. In this paper, we explore this issue from a new perspective. We integrate the features of different modalities through densely connected structures and use the resulting mixed features to generate dynamic filters with receptive fields of different sizes. This yields a more flexible and efficient form of multi-scale cross-modal feature processing, i.e. the dynamic dilated pyramid module. To make the predictions have sharper edges and consistent saliency regions, we design a hybrid enhanced loss function to further optimize the results; this loss function is also shown to be effective in the single-modal RGB SOD task. In terms of six metrics, the proposed method outperforms twelve existing methods on eight challenging benchmark datasets. Extensive experiments verify the effectiveness of the proposed module and loss function. Our code, model and results are available at \url{https://github.com/lartpang/HDFNet}.
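To illustrate the core idea of filtering a feature map with a dynamically generated kernel at different dilation rates, the following is a minimal NumPy sketch. It is a hypothetical simplification, not the paper's implementation: the actual module predicts per-location kernels with small convolutional heads over the fused features, whereas here a single per-image kernel is derived from the fused map for clarity.

```python
import numpy as np

def dynamic_dilated_filter(feature, fused, ksize=3, dilation=2):
    """Filter `feature` with a kernel generated from the fused
    cross-modal feature `fused`, using the given dilation rate.
    Toy single-kernel version of dynamic filtering (assumption:
    the real module predicts spatially varying kernels)."""
    H, W = feature.shape
    # Toy kernel head: take a ksize x ksize patch of the fused map
    # and normalize it so the kernel weights sum to one.
    kernel = fused[:ksize, :ksize].astype(float).copy()
    kernel /= kernel.sum() + 1e-8
    # Dilation enlarges the receptive field without extra parameters.
    pad = dilation * (ksize // 2)
    padded = np.pad(feature, pad, mode="reflect")
    out = np.zeros_like(feature, dtype=float)
    for i in range(ksize):
        for j in range(ksize):
            di, dj = i * dilation, j * dilation
            out += kernel[i, j] * padded[di:di + H, dj:dj + W]
    return out
```

Running this at several dilation rates and combining the outputs approximates the multi-scale receptive fields that the pyramid structure provides.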