This paper identifies and addresses a serious design bias in existing salient object detection (SOD) datasets, which unrealistically assume that every image contains at least one clear and uncluttered salient object. This design bias has led to saturated performance of state-of-the-art SOD models when evaluated on existing datasets, yet these models remain far from satisfactory when applied to real-world scenes. Based on our analyses, we propose a new high-quality dataset and update the previous saliency benchmark. Specifically, our dataset, called Salient Objects in Clutter~\textbf{(SOC)}, includes images with both salient and non-salient objects from several common object categories. In addition to object category annotations, each salient image is accompanied by attributes that reflect common challenges in real-world scenes, which can help provide deeper insight into the SOD problem. Further, given a fixed saliency encoder, e.g., the backbone network, existing saliency models are designed to learn a mapping from the training image set to the training ground-truth set. We therefore argue that improving the dataset can yield larger performance gains than focusing only on the decoder design. With this in mind, we investigate several dataset-enhancement strategies, including label smoothing to implicitly emphasize salient boundaries, random image augmentation to adapt saliency models to various scenarios, and self-supervised learning as a regularization strategy for learning from small datasets. Our extensive results demonstrate the effectiveness of these strategies. We also provide a comprehensive SOD benchmark, which can be found in our repository: https://github.com/DengPingFan/SODBenchmark.
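To make the label-smoothing strategy concrete, below is a minimal sketch of how a binary saliency mask could be softened before being used as a training target; the function name \texttt{smooth\_saliency\_labels} and the default smoothing strength are illustrative assumptions rather than the exact formulation used in this work.
\begin{verbatim}
import numpy as np

def smooth_saliency_labels(mask, epsilon=0.1):
    """Soften a binary saliency mask with standard label smoothing.

    mask    : H x W array with values in {0, 1} (1 = salient pixel).
    epsilon : smoothing strength; the default 0.1 is a hypothetical
              choice, not a value reported in this paper.

    Every pixel is pulled slightly toward 0.5, so a BCE-style loss no
    longer saturates at hard foreground/background transitions, which
    implicitly places more emphasis on pixels near object boundaries.
    """
    mask = mask.astype(np.float32)
    return mask * (1.0 - epsilon) + 0.5 * epsilon

# Example: smooth a toy 4x4 ground-truth mask before computing the loss.
gt = np.zeros((4, 4), dtype=np.uint8)
gt[1:3, 1:3] = 1
soft_gt = smooth_saliency_labels(gt, epsilon=0.1)  # values become 0.05 / 0.95
\end{verbatim}
The smoothed map simply replaces the binary target in the training loss, leaving the model architecture and optimization untouched.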