Saliency prediction has made great strides over the past two decades, with current techniques modeling both low-level information, such as color, intensity, and size contrasts, and high-level information, such as attention and gaze direction for entire objects. Despite this, these methods fail to account for the dissimilarities between objects, which humans naturally take into account. In this paper, we introduce a detection-guided saliency prediction network that explicitly models the differences between multiple objects, such as their dissimilarities in appearance and size. Our approach is general, allowing us to fuse our object dissimilarities with the features extracted by any deep saliency prediction network. As evidenced by our experiments, this consistently boosts the accuracy of the baseline networks, enabling us to outperform state-of-the-art models on three saliency benchmarks, namely SALICON, MIT300, and CAT2000.