Self-supervised learning (SSL) pipelines differ in many design choices such as the architecture, augmentations, or pretraining data. Yet SSL is typically evaluated using a single metric: linear probing on ImageNet. This does not provide much insight into why or when a model is better, nor how to improve it. To address this, we propose an SSL risk decomposition, which generalizes the classical supervised approximation-estimation decomposition by considering errors arising from the representation learning step. Our decomposition consists of four error components: approximation, representation usability, probe generalization, and encoder generalization. We provide efficient estimators for each component and use them to analyze the effect of 30 design choices on 169 SSL vision models evaluated on ImageNet. Our analysis gives valuable insights for designing and using SSL models. For example, it highlights the main sources of error and shows how to improve SSL in specific settings (full- vs few-shot) by trading off error components. All results and pretrained models are at https://github.com/YannDubs/SSL-Risk-Decomposition.
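Since the four components form an additive (telescoping) decomposition of the final test risk, they can be recovered as successive differences of intermediate risks. The sketch below is a minimal illustration of that arithmetic only; the function and argument names are hypothetical, and the paper's actual estimators for each intermediate risk are more involved.

```python
def decompose_ssl_risk(approx_risk, train_probe_risk, heldout_probe_risk, test_risk):
    """Split a final test risk into four additive error components.

    Hypothetical intermediate risks (names are illustrative, not the
    paper's exact estimators):
      approx_risk         -- risk of the best achievable predictor (approximation)
      train_probe_risk    -- risk of a probe fit and evaluated on encoder-seen data
      heldout_probe_risk  -- risk of that probe on held-out samples the encoder saw
      test_risk           -- final risk on fully unseen data
    """
    return {
        "approximation": approx_risk,
        # How much harder the representation makes fitting even the training data.
        "representation_usability": train_probe_risk - approx_risk,
        # Extra error from the probe not generalizing to held-out samples.
        "probe_generalization": heldout_probe_risk - train_probe_risk,
        # Extra error from the encoder not generalizing beyond its pretraining data.
        "encoder_generalization": test_risk - heldout_probe_risk,
    }

# Illustrative numbers: the components always sum back to the test risk.
components = decompose_ssl_risk(0.05, 0.12, 0.18, 0.25)
```

By construction the four components sum exactly to the final test risk, so comparing their magnitudes across models shows which stage (representation quality, probe fitting, or generalization) dominates the error.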