Linear probing (LP) (and $k$-NN) on the labeled upstream dataset (e.g., ImageNet) and transfer learning (TL) to various downstream datasets are commonly employed to evaluate the quality of visual representations learned via self-supervised learning (SSL). Although existing SSL methods have shown good performance under these evaluation protocols, we observe that the results are highly sensitive to the hyperparameters involved in LP and TL. We argue that this is undesirable, since truly generic representations should be easily adaptable to any other visual recognition task; that is, the learned representations should be robust to the choice of LP and TL hyperparameters. In this work, we investigate the cause of this performance sensitivity by conducting extensive experiments with state-of-the-art SSL methods. First, we find that input normalization for LP is crucial for eliminating performance variations across hyperparameters. Specifically, applying batch normalization before feeding inputs to the linear classifier considerably improves the stability of evaluation, and also resolves the inconsistency between $k$-NN and LP metrics. Second, for TL, we demonstrate that the weight decay parameter in SSL significantly affects the transferability of the learned representations, an effect that cannot be identified by LP or $k$-NN evaluations on the upstream dataset. We believe the findings of this study will benefit the community by drawing attention to the shortcomings of current SSL evaluation schemes and underscoring the need to reconsider them.