In the past few years, numerous deep learning methods have been proposed for segmenting salient objects from RGB images. However, such single-modality approaches fail to achieve state-of-the-art performance on widely used light field salient object detection (SOD) datasets, which collect large-scale natural images and provide multiple modalities such as multi-view images, micro-lens images, and depth maps. Recently proposed light field SOD methods have achieved improved detection accuracy, yet they still predict coarse object structures and suffer from slow inference. To this end, we propose CMA-Net, which consists of two novel cascaded mutual attention modules that fuse high-level features from the all-in-focus and depth modalities. Our CMA-Net outperforms 30 SOD methods by a large margin on two widely applied light field benchmark datasets. Moreover, CMA-Net runs at 53 fps, about four times faster than state-of-the-art multi-modal SOD methods. Extensive quantitative and qualitative experiments demonstrate both the effectiveness and efficiency of CMA-Net, which we hope will inspire future development of multi-modal learning for both RGB-D and light field SOD.
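To make the fusion idea concrete, below is a minimal PyTorch sketch of what a cascaded mutual attention block between all-in-focus and depth features could look like. It assumes a common cross-attention pattern in which each modality is re-weighted by a spatial attention map derived from the other; the module names, the single-channel attention formulation, and the residual connections are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of cascaded mutual attention fusion between an
# all-in-focus (RGB) branch and a depth branch. Design details are
# assumptions for illustration, not the CMA-Net implementation.
import torch
import torch.nn as nn


class MutualAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convs produce single-channel spatial attention maps
        self.att_rgb = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())
        self.att_depth = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, f_rgb, f_depth):
        # Each modality attends to the other (the "mutual" part):
        # depth-derived attention modulates RGB features and vice versa,
        # with residual connections to preserve the original signal.
        f_rgb_out = f_rgb * self.att_depth(f_depth) + f_rgb
        f_depth_out = f_depth * self.att_rgb(f_rgb) + f_depth
        return f_rgb_out, f_depth_out


class CascadedMutualAttention(nn.Module):
    """Two mutual attention blocks applied in sequence (cascaded)."""

    def __init__(self, channels: int):
        super().__init__()
        self.stage1 = MutualAttention(channels)
        self.stage2 = MutualAttention(channels)

    def forward(self, f_rgb, f_depth):
        f_rgb, f_depth = self.stage1(f_rgb, f_depth)
        f_rgb, f_depth = self.stage2(f_rgb, f_depth)
        # Fuse the refined high-level features from both modalities
        return f_rgb + f_depth


if __name__ == "__main__":
    # Dummy high-level backbone features from the two branches
    rgb_feat = torch.randn(1, 256, 14, 14)    # all-in-focus branch
    depth_feat = torch.randn(1, 256, 14, 14)  # depth branch
    fused = CascadedMutualAttention(256)(rgb_feat, depth_feat)
    print(fused.shape)  # torch.Size([1, 256, 14, 14])
```

Spatial attention of this form adds only a handful of 1x1 convolutions per stage, which is consistent with the kind of lightweight fusion needed to reach real-time inference speeds.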