Humans perform co-saliency detection by first summarizing the consensus knowledge of the whole group and then searching for the corresponding objects in each image. Previous methods usually lack robustness, scalability, or stability in the first process, and simply fuse consensus features with image features in the second. In this paper, we propose a novel consensus-aware dynamic convolution model to perform this "summarize and search" process explicitly and effectively. To summarize consensus image features, we first extract robust features for each image using an effective pooling method and then aggregate cross-image consensus cues via the self-attention mechanism; this design meets the scalability and stability requirements. Next, we generate dynamic kernels from the consensus features to encode the summarized consensus knowledge. Two kinds of kernels are generated in a complementary manner, capturing fine-grained image-specific consensus object cues and coarse group-wise common knowledge, respectively. We can then perform object searching effectively by employing dynamic convolution at multiple scales. In addition, a novel and effective data synthesis method is proposed to train our network. Experimental results on four benchmark datasets verify the effectiveness of the proposed method. Our code and saliency maps are available at \url{https://github.com/nnizhang/CADC}.
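The "summarize and search" pipeline described above can be illustrated with a minimal sketch. This is not the authors' implementation: the shapes, the single-head attention over image-level vectors, and the use of a 1x1 dynamic kernel (which reduces dynamic convolution to a per-image channel-wise dot product) are all simplifying assumptions made here for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, H, W = 4, 8, 16, 16                  # a group of N images with C-channel feature maps
feats = rng.standard_normal((N, C, H, W))  # stand-in for backbone features

# 1) Summarize: global average pooling gives one robust vector per image.
img_vecs = feats.mean(axis=(2, 3))              # (N, C)

# 2) Aggregate cross-image consensus cues with a minimal self-attention:
#    every image attends to all images in the group, so the result does not
#    depend on a fixed group size or image order.
scores = img_vecs @ img_vecs.T / np.sqrt(C)     # (N, N) scaled dot-product scores
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)         # row-wise softmax
consensus = attn @ img_vecs                     # (N, C) consensus features

# 3) Search: treat each consensus vector as a dynamic 1x1 convolution kernel
#    and convolve it with that image's feature map to locate common objects.
response = np.einsum('nc,nchw->nhw', consensus, feats)  # (N, H, W) response maps
```

In the full model, the kernels would be generated by learned layers (both image-specific and group-wise variants) and applied at multiple feature scales, rather than a single 1x1 dot product as sketched here.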