In the past few years, numerous deep learning methods have been proposed to address the task of segmenting salient objects from RGB images. However, these single-modality approaches fail to achieve state-of-the-art performance on widely used light field salient object detection (SOD) datasets, which collect large-scale natural images and provide multiple modalities such as multi-view images, micro-lens images, and depth maps. Recently proposed light field SOD methods have achieved improved detection accuracy, yet they still predict coarse object structures and suffer from slow inference. To this end, we propose CMA-Net, which consists of two novel cascaded mutual attention modules that fuse high-level features from the all-in-focus and depth modalities. The proposed CMA-Net outperforms 30 SOD methods on two widely applied light field benchmark datasets. In addition, CMA-Net runs at an inference speed of 53 fps. Extensive quantitative and qualitative experiments demonstrate both the effectiveness and efficiency of our CMA-Net.
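To make the core idea concrete, below is a minimal sketch of a cascaded mutual attention fusion block in PyTorch. The abstract only states that two cascaded mutual attention modules fuse high-level all-in-focus and depth features, so the specific attention formulation (single-channel spatial gating via 1x1 convolutions), the residual connections, the channel sizes, and the final summation-based fusion are all illustrative assumptions rather than the paper's verified implementation.

```python
import torch
import torch.nn as nn

class MutualAttention(nn.Module):
    """Illustrative mutual attention block: each modality's features
    gate the other's through a spatial attention map. This is a hedged
    sketch of the idea described in the abstract, not the paper's code."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convs produce single-channel spatial attention maps (assumed design)
        self.att_rgb = nn.Conv2d(channels, 1, kernel_size=1)
        self.att_depth = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, f_rgb: torch.Tensor, f_depth: torch.Tensor):
        # Attention from the depth stream modulates the all-in-focus (RGB)
        # stream, and vice versa; residuals keep each original signal.
        a_depth = torch.sigmoid(self.att_depth(f_depth))  # (B, 1, H, W)
        a_rgb = torch.sigmoid(self.att_rgb(f_rgb))        # (B, 1, H, W)
        f_rgb_out = f_rgb * a_depth + f_rgb
        f_depth_out = f_depth * a_rgb + f_depth
        return f_rgb_out, f_depth_out

class CascadedMutualAttention(nn.Module):
    """Two mutual attention blocks applied in sequence ('cascaded');
    the cascade depth of 2 follows the abstract's description."""

    def __init__(self, channels: int):
        super().__init__()
        self.block1 = MutualAttention(channels)
        self.block2 = MutualAttention(channels)

    def forward(self, f_rgb: torch.Tensor, f_depth: torch.Tensor):
        f_rgb, f_depth = self.block1(f_rgb, f_depth)
        f_rgb, f_depth = self.block2(f_rgb, f_depth)
        # Fuse the mutually attended streams; summation is an assumption here.
        return f_rgb + f_depth

# Example usage with dummy high-level feature maps (shapes are illustrative)
cma = CascadedMutualAttention(channels=256)
f_rgb = torch.randn(1, 256, 14, 14)    # all-in-focus stream features
f_depth = torch.randn(1, 256, 14, 14)  # depth stream features
fused = cma(f_rgb, f_depth)            # -> (1, 256, 14, 14)
```

The gating-with-residual pattern used here is one common way to let two modality streams reweight each other without discarding either stream's original features; the actual CMA-Net modules may differ in both attention form and fusion strategy.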