Despite the empirical successes of self-supervised learning (SSL) methods, it is unclear which characteristics of their representations lead to high downstream accuracies. In this work, we characterize properties that SSL representations should ideally satisfy. Specifically, we prove necessary and sufficient conditions such that, for any task invariant to the given data augmentations, desired probes (e.g., linear or MLP) trained on the representation attain perfect accuracy. These requirements lead to a unifying conceptual framework for improving existing SSL methods and deriving new ones. For contrastive learning, our framework prescribes simple but significant improvements to previous methods, such as using asymmetric projection heads. For non-contrastive learning, we use our framework to derive a simple and novel objective. Our resulting SSL algorithms outperform baselines on standard benchmarks, including SwAV with multi-crops on linear probing of ImageNet.
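To make the asymmetric-projection-head idea concrete, below is a minimal PyTorch-style sketch of an InfoNCE contrastive learner in which the two augmented views are projected through two distinct heads rather than one shared head. This is an illustrative sketch under assumptions, not the paper's exact method: the class name, head architecture, dimensions, and temperature are all hypothetical choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AsymmetricContrastive(nn.Module):
    """Contrastive learner whose two views use distinct projection heads.

    Hypothetical sketch: `encoder` is any backbone mapping images to
    `feat_dim`-dimensional features; dims and temperature are illustrative.
    """

    def __init__(self, encoder: nn.Module, feat_dim: int = 512, proj_dim: int = 128):
        super().__init__()
        self.encoder = encoder
        # Asymmetry: each view gets its own MLP head instead of a shared one.
        self.head_a = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, proj_dim)
        )
        self.head_b = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, proj_dim)
        )

    def forward(self, x_a: torch.Tensor, x_b: torch.Tensor, temperature: float = 0.1):
        # x_a, x_b: two augmentations of the same batch of inputs.
        z_a = F.normalize(self.head_a(self.encoder(x_a)), dim=-1)
        z_b = F.normalize(self.head_b(self.encoder(x_b)), dim=-1)
        logits = z_a @ z_b.t() / temperature  # (N, N) cosine-similarity matrix
        labels = torch.arange(z_a.size(0), device=z_a.device)
        # InfoNCE: each view-a example should match its own view-b partner
        # (diagonal of the similarity matrix) against all other pairs.
        return F.cross_entropy(logits, labels)
```

A symmetric baseline would tie `head_a` and `head_b` to the same module; the abstract's claim is that breaking this symmetry is one of the simple changes the framework prescribes for improving contrastive methods.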