Contrastive learning has become a key component of self-supervised learning approaches for computer vision. By learning to embed two augmented versions of the same image close to each other and to push the embeddings of different images apart, one can train highly transferable visual representations. As revealed by recent studies, heavy data augmentation and large sets of negatives are both crucial for learning such representations. At the same time, data mixing strategies, either at the image or the feature level, improve both supervised and semi-supervised learning by synthesizing novel examples, forcing networks to learn more robust features. In this paper, we argue that an important aspect of contrastive learning, i.e., the effect of hard negatives, has so far been neglected. To get more meaningful negative samples, current top contrastive self-supervised learning approaches either substantially increase the batch sizes or keep very large memory banks; increasing the memory size, however, leads to diminishing returns in terms of performance. We therefore start by delving deeper into a top-performing framework and show evidence that harder negatives are needed to facilitate better and faster learning. Based on these observations, and motivated by the success of data mixing, we propose hard negative mixing strategies at the feature level that can be computed on the fly with minimal computational overhead. We exhaustively ablate our approach on linear classification, object detection and instance segmentation, and show that employing our hard negative mixing procedure improves the quality of visual representations learned by a state-of-the-art self-supervised learning method.
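To make the core idea concrete, the following is a minimal NumPy sketch of feature-level hard negative mixing: rank the negatives in the memory bank by similarity to the query embedding, then synthesize new negatives as convex combinations of the hardest ones. The function name, parameters, and sampling details here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def mix_hard_negatives(query, negatives, num_synthetic=4, rng=None):
    """Sketch (hypothetical helper, not the paper's code): synthesize
    hard negatives by mixing the negatives most similar to the query."""
    if rng is None:
        rng = np.random.default_rng(0)
    # L2-normalize query and negatives so dot products are cosine similarities
    q = query / np.linalg.norm(query)
    n = negatives / np.linalg.norm(negatives, axis=1, keepdims=True)
    hardness = n @ q                                # similarity to the query
    hard_idx = np.argsort(-hardness)[:num_synthetic * 2]  # pool of hardest
    synthetic = []
    for _ in range(num_synthetic):
        i, j = rng.choice(hard_idx, size=2, replace=False)
        alpha = rng.uniform(0.0, 1.0)
        mix = alpha * n[i] + (1.0 - alpha) * n[j]   # convex combination
        synthetic.append(mix / np.linalg.norm(mix)) # back onto unit sphere
    return np.stack(synthetic)
```

The synthesized features can then simply be appended to the set of negatives used in the contrastive (e.g. InfoNCE) loss, which is why the overhead is negligible: no extra forward passes through the encoder are needed.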