The success of fully supervised saliency detection models relies on large amounts of pixel-wise labeling. In this paper, we study bounding-box-based weakly supervised saliency detection to relieve the labeling effort. Given a bounding box annotation, we observe that pixels inside the box may contain extensive labeling noise. However, since a large amount of background is excluded, the foreground region inside the box has a less complex background, making it feasible to perform handcrafted-feature-based saliency detection on the cropped foreground region alone. As conventional handcrafted features are not representative enough and lead to noisy saliency maps, we further introduce a structure-aware self-supervised loss to regularize the structure of the prediction. Moreover, since pixels outside the bounding box should be background, a partial cross-entropy loss can be used to accurately localize the background region. Experimental results on six benchmark RGB saliency datasets illustrate the effectiveness of our model.
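To make the background supervision concrete, below is a minimal sketch of a partial cross-entropy loss that penalizes only pixels outside the bounding box, treating them as certain background while leaving the noisy box interior unsupervised. The function name, tensor layout, and single-box-per-image assumption are illustrative and not taken from the paper.

```python
import torch
import torch.nn.functional as F

def partial_bce_outside_box(pred_logits, boxes):
    """Partial cross-entropy over pixels outside the bounding box.

    pred_logits: (B, 1, H, W) raw saliency logits.
    boxes: list of (x1, y1, x2, y2) pixel coordinates, one box per image.
    Pixels outside the box are supervised as background (label 0);
    pixels inside the box receive no gradient from this loss.
    """
    B, _, H, W = pred_logits.shape
    outside = torch.ones(B, 1, H, W, device=pred_logits.device)
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        outside[i, :, y1:y2, x1:x2] = 0.0  # mask out the (noisy) box interior

    target = torch.zeros_like(pred_logits)  # background label for outside pixels
    loss = F.binary_cross_entropy_with_logits(pred_logits, target, reduction="none")
    # Average only over the supervised (outside-box) pixels.
    return (loss * outside).sum() / outside.sum().clamp(min=1.0)
```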