Despite recent progress made by self-supervised methods in representation learning with residual networks, they still underperform supervised learning on the ImageNet classification benchmark, limiting their applicability in performance-critical settings. Building on prior theoretical insights from ReLIC [Mitrovic et al., 2021], we incorporate additional inductive biases into self-supervised learning. We propose a new self-supervised representation learning method, ReLICv2, which combines an explicit invariance loss with a contrastive objective over a varied set of appropriately constructed data views to avoid learning spurious correlations and obtain more informative representations. ReLICv2 achieves $77.1\%$ top-$1$ accuracy on ImageNet under linear evaluation with a ResNet50, improving on the previous state-of-the-art by an absolute $+1.5\%$; on larger ResNet models, ReLICv2 achieves up to $80.6\%$, outperforming previous self-supervised approaches by margins of up to $+2.3\%$. Most notably, ReLICv2 is the first unsupervised representation learning method to consistently outperform the supervised baseline in a like-for-like comparison across a range of ResNet architectures. Using ReLICv2, we also learn more robust and transferable representations that generalize better out-of-distribution than previous work, on both image classification and semantic segmentation. Finally, we show that despite using ResNet encoders, ReLICv2 is comparable to state-of-the-art self-supervised vision transformers.
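To make the objective described above concrete, the following is a minimal, hypothetical sketch of a loss combining an InfoNCE-style contrastive term with an explicit invariance penalty between the similarity distributions of two augmented views. The function name, the symmetrised-KL form of the invariance term, and the hyperparameters `tau` and `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def _softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relic_style_loss(z1, z2, tau=0.1, alpha=1.0):
    """Illustrative sketch (not the official ReLICv2 loss).

    z1, z2: (n, d) embeddings of two augmented views of the same n images.
    tau:    softmax temperature (assumed hyperparameter).
    alpha:  weight on the invariance penalty (assumed hyperparameter).
    """
    # L2-normalise embeddings so dot products are cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    n = z1.shape[0]

    # Similarity distributions over candidates, one per anchor direction.
    p12 = _softmax(z1 @ z2.T / tau)  # view-1 anchors vs view-2 candidates
    p21 = _softmax(z2 @ z1.T / tau)  # view-2 anchors vs view-1 candidates

    # Contrastive term: the matching index is the positive pair.
    idx = np.arange(n)
    contrastive = -np.mean(np.log(p12[idx, idx] + 1e-12))

    # Explicit invariance term: symmetrised KL between the two
    # similarity distributions, encouraging view-invariant predictions.
    kl12 = np.sum(p12 * (np.log(p12 + 1e-12) - np.log(p21 + 1e-12)), axis=1)
    kl21 = np.sum(p21 * (np.log(p21 + 1e-12) - np.log(p12 + 1e-12)), axis=1)
    invariance = 0.5 * np.mean(kl12 + kl21)

    return contrastive + alpha * invariance
```

When the two views produce identical embeddings, the invariance term vanishes and only the contrastive term remains; in practice the loss would be computed over many augmented views per image, per the varied view set described above.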