Contrastive methods have led a recent surge in the performance of self-supervised representation learning (SSL). Recent methods like BYOL or SimSiam purportedly distill these contrastive methods down to their essence, removing bells and whistles, including the negative examples, that do not contribute to downstream performance. These "non-contrastive" methods work surprisingly well without using negatives even though the global minimum lies at trivial collapse. We empirically analyze these non-contrastive methods and find that SimSiam is extraordinarily sensitive to dataset and model size. In particular, SimSiam representations undergo partial dimensional collapse if the model is too small relative to the dataset size. We propose a metric to measure the degree of this collapse and show that it can be used to forecast the downstream task performance without any fine-tuning or labels. We further analyze architectural design choices and their effect on the downstream performance. Finally, we demonstrate that shifting to a continual learning setting acts as a regularizer and prevents collapse, and a hybrid between continual and multi-epoch training can improve linear probe accuracy by as many as 18 percentage points using ResNet-18 on ImageNet. Our project page is at https://alexanderli.com/noncontrastive-ssl/.
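The abstract mentions a metric for quantifying partial dimensional collapse of SimSiam representations. The paper's exact definition is given in the body; below is a minimal, hedged sketch of one common way to measure such collapse, via the eigenvalue spectrum of the feature covariance matrix. The function name, the 90% variance threshold, and the synthetic data are illustrative assumptions, not the authors' specification.

```python
# Illustrative sketch (not necessarily the paper's exact metric): quantify partial
# dimensional collapse of SSL representations from the eigenvalue spectrum of the
# feature covariance. A collapsed representation concentrates its variance in a
# few directions, so few dimensions suffice to explain most of the variance.

import numpy as np

def collapse_metric(features: np.ndarray, variance_threshold: float = 0.9) -> float:
    """Fraction of feature dimensions needed to explain `variance_threshold`
    of the total variance. Values near 0 indicate strong dimensional collapse;
    values near 1 indicate variance spread across most dimensions.

    features: array of shape (num_samples, feature_dim), e.g. embeddings of a
    held-out image set from a ResNet-18 backbone (hypothetical setup).
    """
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (len(features) - 1)
    eigvals = np.linalg.eigvalsh(cov)[::-1]        # eigenvalues, descending
    eigvals = np.clip(eigvals, 0.0, None)          # guard against tiny negatives
    explained = np.cumsum(eigvals) / eigvals.sum() # cumulative explained variance
    dims_needed = int(np.searchsorted(explained, variance_threshold) + 1)
    return dims_needed / features.shape[1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    healthy = rng.normal(size=(4096, 512))                                # full-rank features
    collapsed = rng.normal(size=(4096, 32)) @ rng.normal(size=(32, 512))  # rank-32 features
    print("healthy  :", collapse_metric(healthy))    # close to the 0.9 threshold
    print("collapsed:", collapse_metric(collapsed))  # much smaller
```

Because such a spectrum-based score needs only unlabeled embeddings, it is the kind of quantity that can be computed during pretraining to forecast downstream linear-probe performance without fine-tuning or labels, as the abstract describes.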