Contrastive self-supervised learning has shown impressive results in learning visual representations from unlabeled images by enforcing invariance against different data augmentations. However, the learned representations are often contextually biased toward the spurious scene correlations of different objects, or of objects and backgrounds, which may harm their generalization to downstream tasks. To tackle this issue, we develop a novel object-aware contrastive learning framework that first (a) localizes objects in a self-supervised manner and then (b) debiases scene correlations via appropriate data augmentations that account for the inferred object locations. For (a), we propose the contrastive class activation map (ContraCAM), which uses the contrastively trained model to find the most discriminative regions (e.g., objects) in an image relative to the other images. We further improve ContraCAM to detect multiple objects and entire object shapes via an iterative refinement procedure. For (b), we introduce two ContraCAM-based data augmentations, object-aware random crop and background mixup, which reduce contextual and background biases during contrastive self-supervised learning, respectively. Our experiments demonstrate the effectiveness of our representation learning framework, particularly when trained on multi-object images or evaluated on images with background (and distribution) shifts.
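To make the ContraCAM idea concrete, the following is a minimal illustrative sketch (not the paper's exact objective or code): we score how discriminable the query image's pooled embedding is from a batch of other ("negative") image embeddings, and build a Grad-CAM-style map from the analytic gradient of that score. The score function `s(q) = -logsumexp_i(q . n_i)` and the dot-product similarity are simplifying assumptions for this toy example.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def contra_cam(feat, neg_embs):
    """Toy ContraCAM-style saliency map (illustrative sketch only).

    feat:     (C, H, W) spatial feature map of the query image.
    neg_embs: (N, C) pooled embeddings of other ("negative") images.

    With global average pooling q = mean_hw(feat) and the contrastive score
    s(q) = -logsumexp_i(q . n_i), the gradient of s w.r.t. channel k of feat
    is grad_q[k] / (H * W), so the Grad-CAM-style map reduces to a ReLU'd,
    gradient-weighted sum over channels.
    """
    C, H, W = feat.shape
    q = feat.reshape(C, -1).mean(axis=1)           # global average pooling -> (C,)
    sims = neg_embs @ q                            # (N,) similarities to negatives
    grad_q = -(softmax(sims) @ neg_embs)           # analytic gradient d s / d q -> (C,)
    cam = np.maximum((grad_q[:, None, None] * feat).sum(axis=0), 0.0)  # ReLU
    if cam.max() > 0:
        cam /= cam.max()                           # normalize to [0, 1]
    return cam
```

Regions that make the image easy to tell apart from the negatives receive high saliency, which is what lets the inferred map guide the object-aware crop and background mixup augmentations; the iterative refinement described above would repeat this with previously detected regions masked out.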