Self-Supervised Learning (SSL) is a paradigm that leverages unlabeled data for model training. Empirical studies show that SSL can achieve promising performance under distribution shift, where the downstream distribution differs from the training distribution. However, the theoretical understanding of its transferability remains limited. In this paper, we develop a theoretical framework to analyze the transferability of self-supervised contrastive learning by investigating the role of data augmentation. Our results reveal that the downstream performance of contrastive learning depends largely on the choice of data augmentation. Moreover, we show that contrastive learning fails to learn domain-invariant features, which limits its transferability. Based on these theoretical insights, we propose a new method called Augmentation-robust Contrastive Learning (ArCL), which provably learns domain-invariant features and can be easily integrated with existing contrastive learning algorithms. Experiments on several datasets show that ArCL significantly improves the transferability of contrastive learning.
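To make the setting concrete, below is a minimal PyTorch sketch of a SimCLR-style InfoNCE loss over two augmented views, together with a hypothetical "augmentation-robust" alignment term that takes the worst case over several sampled augmentations. The worst-case formulation and the function names (`info_nce`, `augmentation_robust_alignment`) are illustrative assumptions, not the paper's exact ArCL objective, which the abstract does not specify.

```python
# Minimal sketch (PyTorch), assuming a SimCLR-style contrastive setup.
# The "augmentation-robust" variant is one plausible reading of the abstract,
# not a reproduction of the ArCL objective.
import torch
import torch.nn.functional as F


def info_nce(z1, z2, temperature=0.5):
    """Standard InfoNCE loss between two batches of augmented views, each (B, D)."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                # (B, B) cosine-similarity logits
    labels = torch.arange(z1.size(0), device=z1.device)  # positives lie on the diagonal
    return F.cross_entropy(logits, labels)


def augmentation_robust_alignment(encoder, views):
    """
    Hypothetical worst-case alignment: given K augmented views of the same batch
    (a list of K tensors shaped (B, C, H, W)), penalize the pair of views whose
    representations are farthest apart, instead of averaging over all pairs.
    """
    feats = [F.normalize(encoder(v), dim=1) for v in views]
    worst = None
    for i in range(len(feats)):
        for j in range(i + 1, len(feats)):
            # Mean squared distance between matched samples under views i and j.
            dist = (feats[i] - feats[j]).pow(2).sum(dim=1).mean()
            worst = dist if worst is None else torch.maximum(worst, dist)
    return worst
```

In this reading, the only change relative to standard contrastive training is that the alignment term is driven by the hardest augmentation pair rather than the average one, which is one way a representation could be pushed toward augmentation (and hence domain) invariance.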