Self-supervised learning (SSL) aims to produce useful feature representations without access to human-labeled annotations. The problem has gained popularity due to the success of recent contrastive-learning-based SSL methods such as SimCLR. Most current contrastive learning approaches append a parametrized projection head to the end of a backbone network, optimize the InfoNCE objective, and then discard the learned projection head after training. This raises a fundamental question: why is a learnable projection head required if it is discarded after training? In this work, we first perform a systematic study of SSL training behavior, focusing on the role of the projection head layers. By formulating the projection head as a parametric component of the InfoNCE objective rather than as a part of the network, we present an alternative optimization scheme for training contrastive-learning-based SSL frameworks. Our experimental study on multiple image classification datasets demonstrates the effectiveness of the proposed approach over alternatives in the SSL literature.
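To make the setup concrete, below is a minimal sketch (not the paper's implementation) of the standard contrastive pipeline the abstract refers to: a backbone followed by a learnable projection head, trained with an InfoNCE loss, with the head discarded after training. The layer sizes, the `ContrastiveModel` wrapper, and the SimCLR-style NT-Xent formulation of InfoNCE are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveModel(nn.Module):
    """Backbone + projection head, as in common SimCLR-style SSL setups."""
    def __init__(self, backbone: nn.Module, feat_dim: int, proj_dim: int = 128):
        super().__init__()
        self.backbone = backbone                      # kept for downstream tasks
        self.projection_head = nn.Sequential(         # typically discarded after SSL training
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, proj_dim),
        )

    def forward(self, x):
        h = self.backbone(x)                          # representation used downstream
        z = self.projection_head(h)                   # embedding fed to the contrastive loss
        return F.normalize(z, dim=1)

def info_nce_loss(z1, z2, temperature: float = 0.5):
    """SimCLR-style InfoNCE (NT-Xent) over two batches of L2-normalized views."""
    n = z1.size(0)
    z = torch.cat([z1, z2], dim=0)                    # (2n, d)
    sim = z @ z.t() / temperature                     # cosine similarities as logits
    sim.fill_diagonal_(float("-inf"))                 # exclude self-similarity
    # the positive for view i is its other augmented view
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```

In the standard scheme critiqued here, both the backbone and the projection head receive gradients from `info_nce_loss`, yet only `backbone` outputs are used for evaluation and downstream transfer.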