Self-Supervised Learning (SSL) models rely on a pretext task to learn representations. Because this pretext task differs from the downstream tasks used to evaluate the performance of these models, there is an inherent misalignment, or pretraining bias. A commonly used trick in SSL, shown to make deep networks more robust to such bias, is the addition of a small projector (usually a 2- or 3-layer multi-layer perceptron) on top of a backbone network during training. In contrast to previous work that studied the impact of the projector architecture, we focus here on a simpler, yet overlooked, lever to control the information in the backbone representation. We show that merely changing its dimensionality, by resizing only the backbone's very last block, is a remarkably effective technique to mitigate the pretraining bias. It significantly improves downstream transfer performance for both Self-Supervised and Supervised pretrained models.
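To make the two levers in the abstract concrete, the sketch below (not the authors' code) shows the standard SSL setup of a backbone with a small MLP projector on top, where only the projector output is fed to the pretext loss and only the backbone representation is kept for downstream transfer. The names `Projector`, `build_pretraining_model`, and the `backbone_dim` argument are illustrative assumptions; resizing the backbone's last block is only mimicked here by the dimensionality that the projector consumes.

```python
# Minimal sketch, assuming a torchvision ResNet-50 backbone and a generic
# 3-layer MLP projector; hyperparameters and names are illustrative.
import torch
import torch.nn as nn
from torchvision.models import resnet50


class Projector(nn.Module):
    """Small 3-layer MLP projector, typically discarded after pretraining."""

    def __init__(self, in_dim: int, hidden_dim: int = 8192, out_dim: int = 8192):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.mlp(x)


def build_pretraining_model(backbone_dim: int = 2048):
    """Backbone + projector; `backbone_dim` stands in for the size of the
    backbone's last block (2048 for a stock torchvision ResNet-50)."""
    backbone = resnet50()
    backbone.fc = nn.Identity()  # keep the pooled representation, drop the classifier
    projector = Projector(in_dim=backbone_dim)
    return backbone, projector


if __name__ == "__main__":
    backbone, projector = build_pretraining_model(backbone_dim=2048)
    images = torch.randn(4, 3, 224, 224)
    representation = backbone(images)      # what downstream tasks reuse
    embedding = projector(representation)  # what the SSL pretext loss sees
    print(representation.shape, embedding.shape)
```

In this layout, changing the width of the backbone's final block changes the dimensionality of `representation` without touching the rest of the pipeline, which is the lever the abstract studies.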