Contrastive learning has made considerable progress in computer vision, outperforming supervised pretraining on a range of downstream datasets. However, is contrastive learning the better choice in all situations? We demonstrate two cases where it is not. First, under sufficiently small pretraining budgets, supervised pretraining on ImageNet consistently outperforms a comparable contrastive model on eight diverse image classification datasets. This suggests that the common practice of comparing pretraining approaches at hundreds or thousands of epochs may not produce actionable insights for those with more limited compute budgets. Second, even with larger pretraining budgets, we identify tasks where supervised learning prevails, perhaps because the object-centric bias of supervised pretraining makes the model more resilient to common corruptions and spurious foreground-background correlations. These results underscore the need to characterize the tradeoffs of different pretraining objectives across a wider range of contexts and training regimes.