Recently, self-supervised learning (SSL) has achieved tremendous success in learning image representations. Despite this empirical success, most self-supervised learning methods are rather "inefficient" learners, typically taking hundreds of training epochs to fully converge. In this work, we show that the key to efficient self-supervised learning is to increase the number of crops taken from each image instance. Leveraging one of the state-of-the-art SSL methods, we introduce a simple form of self-supervised learning called Extreme-Multi-Patch Self-Supervised-Learning (EMP-SSL). It does not rely on many of the heuristic techniques common in SSL, such as weight sharing between branches, feature-wise normalization, output quantization, and stop gradient, and it reduces the number of training epochs by two orders of magnitude. We show that the proposed method converges to 85.1% on CIFAR-10, 58.5% on CIFAR-100, 38.1% on Tiny ImageNet, and 58.5% on ImageNet-100 in just one epoch. Furthermore, the proposed method achieves 91.5% on CIFAR-10, 70.1% on CIFAR-100, 51.5% on Tiny ImageNet, and 78.9% on ImageNet-100 with linear probing in fewer than ten training epochs. In addition, we show that EMP-SSL exhibits significantly better transferability to out-of-domain datasets than baseline SSL methods. We will release the code at https://github.com/tsb0601/EMP-SSL.
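To make the core idea concrete, below is a minimal PyTorch-style sketch of multi-patch training, assuming a generic `encoder` network. The patch count, crop parameters, and augmentation pipeline are illustrative assumptions, not the authors' exact configuration, and the anti-collapse regularizer used in the full method is omitted; only the many-crops-per-instance structure and the invariance term are shown.

```python
import torch
import torch.nn.functional as F
from torchvision import transforms

# Hypothetical hyperparameters for illustration only.
NUM_PATCHES = 20   # number of crops taken from each image instance
PATCH_SIZE = 32

# A standard random-crop augmentation; the exact transforms are an assumption.
augment = transforms.Compose([
    transforms.RandomResizedCrop(PATCH_SIZE, scale=(0.25, 0.25)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

def multi_patch_invariance_loss(encoder, images):
    """Embed many patches per image and pull each patch embedding toward
    the mean embedding of its own image. The full objective would add a
    regularizer (omitted here) to prevent representation collapse."""
    # images: a list of PIL images (one batch)
    patches = torch.stack([
        torch.stack([augment(img) for _ in range(NUM_PATCHES)])
        for img in images
    ])                                          # (B, n, C, H, W)
    B, n = patches.shape[:2]
    z = encoder(patches.flatten(0, 1))          # (B * n, d)
    z = F.normalize(z, dim=-1).view(B, n, -1)   # (B, n, d)
    z_mean = z.mean(dim=1, keepdim=True)        # (B, 1, d)
    # Invariance term: maximize cosine similarity between each patch
    # embedding and the mean embedding of the same image.
    return -(z * z_mean).sum(dim=-1).mean()
```

Because every image contributes many patch embeddings per step, each gradient update extracts far more signal from a single pass over the dataset, which is consistent with the rapid one-epoch convergence reported above.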