In contrastive self-supervised learning, the common way to learn discriminative representations is to pull different augmented "views" of the same image closer together while pushing all other images further apart, a strategy that has proven effective. However, the augmentation procedure inevitably constructs undesirable views that contain different semantic concepts, and indiscriminately pulling such augmentations closer in the feature space damages the semantic consistency of the learned representations. In this study, we introduce feature-level augmentation and propose a novel semantics-consistent feature search (SCFS) method to mitigate this negative effect. The main idea of SCFS is to adaptively search for semantics-consistent features, enhancing the contrast between semantics-consistent regions in different augmentations. The trained model thus learns to focus on meaningful object regions, improving its semantic representation ability. Extensive experiments on different datasets and tasks demonstrate that SCFS effectively improves the performance of self-supervised learning and achieves state-of-the-art results on various downstream tasks.
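To make the core idea concrete, the following is a minimal PyTorch-style sketch of how a semantics-consistent feature search could be realized: spatial features of one view are softly weighted by their similarity to the other view's global feature, approximating a "search" for semantics-consistent regions before applying a standard contrastive loss. The function names (`semantics_consistent_search`, `contrastive_loss`), the soft-attention weighting, the temperatures, and the tensor shapes are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def semantics_consistent_search(q, k_map, tau=0.1):
    """Hypothetical sketch: weight the spatial features of one view by
    their similarity to the other view's global feature, approximating
    an adaptive search for semantics-consistent regions.

    q:     (B, D)       global feature of view 1
    k_map: (B, D, H, W) dense feature map of view 2
    """
    k_flat = k_map.flatten(2)                            # (B, D, HW)
    # cosine similarity between the query and every spatial location
    sim = torch.einsum('bd,bdn->bn',
                       F.normalize(q, dim=1),
                       F.normalize(k_flat, dim=1))       # (B, HW)
    attn = F.softmax(sim / tau, dim=1)                   # soft spatial search
    # pool view-2 features, emphasizing semantics-consistent locations
    return torch.einsum('bn,bdn->bd', attn, k_flat)      # (B, D)

def contrastive_loss(z1, z2, temperature=0.2):
    """Standard InfoNCE over a batch: matching indices are positives,
    all other pairs in the batch act as negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                   # (B, B)
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

# Usage: contrast view-1 global features against the searched
# (semantics-consistent) features of view 2.
B, D, H, W = 8, 128, 7, 7
q = torch.randn(B, D)            # e.g. pooled backbone output of view 1
k_map = torch.randn(B, D, H, W)  # dense backbone output of view 2
loss = contrastive_loss(q, semantics_consistent_search(q, k_map))
```

In this sketch, the softmax temperature `tau` controls how sharply the search focuses on the most similar regions: a small `tau` concentrates the contrast on a few consistent locations, while a large `tau` approaches uniform average pooling over the whole feature map.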