Recent self-supervised representation learning techniques have largely closed the gap between supervised and unsupervised learning on ImageNet classification. While the particulars of pretraining on ImageNet are now relatively well understood, the field still lacks widely accepted best practices for replicating this success on other datasets. As a first step in this direction, we study contrastive self-supervised learning on four diverse large-scale datasets. By looking through the lenses of data quantity, data domain, data quality, and task granularity, we provide new insights into the necessary conditions for successful self-supervised learning. Our key findings include observations such as: (i) the benefit of additional pretraining data beyond 500k images is modest, (ii) adding pretraining images from another domain does not lead to more general representations, (iii) corrupted pretraining images have a disparate impact on supervised and self-supervised pretraining, and (iv) contrastive learning lags far behind supervised learning on fine-grained visual classification tasks.