Contrastive learning has achieved great success in self-supervised visual representation learning, but existing approaches mostly ignore spatial information, which is often crucial for visual representation. This paper presents heterogeneous contrastive learning (HCL), an effective approach that adds spatial information to the encoding stage to alleviate the learning inconsistency between the contrastive objective and strong data augmentation operations. We demonstrate the effectiveness of HCL by showing that (i) it achieves higher accuracy in instance discrimination and (ii) it surpasses existing pre-training methods on a series of downstream tasks while halving the pre-training cost. More importantly, we show that our approach learns visual representations more efficiently, delivering a key message to inspire future research on self-supervised visual representation learning.
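Since the abstract only outlines the idea, the following is a minimal sketch of what "adding spatial information to the encoding stage" could look like, assuming a standard two-view contrastive setup: the encoder retains the pre-pooling feature map as a set of per-location spatial embeddings alongside the usual pooled global embedding fed to an InfoNCE objective. All names here (HCLEncoder, spatial_proj, info_nce) are hypothetical illustrations, not the authors' implementation.

```python
# Sketch only: a contrastive encoder that keeps spatial feature maps in
# addition to the pooled global embedding. The spatial branch is one
# plausible reading of "spatial information at the encoding stage".
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50


class HCLEncoder(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        backbone = resnet50()
        # Drop the average pool and classifier to keep the 7x7 spatial map.
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        self.global_proj = nn.Linear(2048, dim)                   # pooled features
        self.spatial_proj = nn.Conv2d(2048, dim, kernel_size=1)  # per-location features

    def forward(self, x):
        fmap = self.backbone(x)                      # (B, 2048, 7, 7)
        g = self.global_proj(fmap.mean(dim=(2, 3)))  # global embedding (B, dim)
        s = self.spatial_proj(fmap)                  # spatial embeddings (B, dim, 7, 7)
        return F.normalize(g, dim=1), F.normalize(s, dim=1)


def info_nce(q, k, tau=0.2):
    """Standard InfoNCE between global embeddings of two augmented views."""
    logits = q @ k.t() / tau                         # (B, B) similarity matrix
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)


# Usage: two augmented views of the same batch; the spatial maps s1, s2
# would feed an additional, method-specific spatial contrastive term.
enc = HCLEncoder()
v1, v2 = torch.randn(4, 3, 224, 224), torch.randn(4, 3, 224, 224)
(g1, s1), (g2, s2) = enc(v1), enc(v2)
loss = info_nce(g1, g2)
```

The intuition this sketch captures is that strong augmentations (heavy crops, color jitter) can make two views agree globally while disagreeing locally; keeping per-location embeddings gives the objective access to that spatial structure rather than discarding it at the pooling step.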