Self-supervised visual representation learning has seen huge progress in recent months, but no large-scale evaluation has compared the many pre-trained models that are now available. In this paper, we evaluate the transfer performance of 13 top self-supervised models on 25 downstream tasks, including many-shot classification, few-shot classification, object detection and dense prediction. We compare their performance to a supervised baseline and conclude that on most datasets the best self-supervised models outperform supervision, confirming the trend recently observed in the literature. We find ImageNet Top-1 accuracy to be highly correlated with transfer to many-shot recognition, but increasingly less so for few-shot recognition, object detection and dense prediction, as well as for transfer to unstructured data. No single self-supervised method dominates overall, but notably DeepCluster-v2 comes out on top for recognition and SimCLR-v2 for detection and dense prediction. Our analysis of feature properties suggests that top self-supervised learners struggle to preserve colour information as well as supervised learners do (likely due to their use of augmentation), but exhibit better calibration for recognition and suffer less from attentive overfitting than supervised learners.