Recent advancements in self-supervised learning (SSL) have made it possible to learn generalizable visual representations from unlabeled data. The performance of deep learning models fine-tuned on pretrained SSL representations is on par with that of models fine-tuned on state-of-the-art supervised learning (SL) representations. Despite this progress, the generalizability of SSL representations has not been studied extensively. In this article, we perform a deeper analysis of the generalizability of pretrained SSL and SL representations through a domain-based study of transfer learning classification tasks. The representations are learned from the ImageNet source data and then fine-tuned on two types of target datasets: ones similar to the source dataset and ones significantly different from it. We study the generalizability of the SSL- and SL-based models via their prediction accuracy as well as their prediction confidence. In addition, we analyze the attribution of the final convolutional layer of these models to understand how they reason about the semantic identity of the data. We show that SSL representations are more generalizable than SL representations, and we explain this generalizability by investigating their invariance property, which is shown to be better than that observed in SL representations.
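To make the transfer learning setup concrete, the following is a minimal PyTorch sketch of the kind of pipeline the abstract describes: a backbone pretrained on the ImageNet source data receives a new classification head and is fine-tuned on a target dataset. The specific choices here (a ResNet-50 backbone, supervised ImageNet weights standing in for the pretrained representations, CIFAR-10 as a "similar to source" target, and the hyperparameters) are illustrative assumptions, not the paper's exact configuration; for the SSL case one would instead load SSL-pretrained checkpoint weights (e.g., from SimCLR or MoCo).

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Backbone pretrained on ImageNet (the source data). Supervised weights are
# used here for simplicity; an SSL-pretrained checkpoint would be loaded
# instead for the SSL branch of the comparison.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Replace the classification head to match the target dataset
# (CIFAR-10 here, as an illustrative "similar to source" target).
num_classes = 10
model.fc = nn.Linear(model.fc.in_features, num_classes)

transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
train_set = datasets.CIFAR10(root="data", train=True, download=True,
                             transform=transform)
loader = DataLoader(train_set, batch_size=64, shuffle=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# One fine-tuning epoch. The study then compares prediction accuracy and
# prediction confidence (softmax probabilities) of SSL- vs SL-initialized
# models on the target task.
model.train()
for images, labels in loader:
    images, labels = images.to(device), labels.to(device)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```

After fine-tuning, prediction confidence can be read off as the maximum softmax probability per test image, and attribution methods (e.g., Grad-CAM applied to the final convolutional layer) can be used for the layer-attribution analysis the abstract mentions.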