Contrastive learning has achieved state-of-the-art performance in various self-supervised learning tasks and even outperforms its supervised counterpart. Despite this empirical success, theoretical understanding of why contrastive learning works is still limited. In this paper, (i) we provably show that contrastive learning outperforms the autoencoder, a classical unsupervised learning method, on both feature recovery and downstream tasks; (ii) we also illustrate the role of labeled data in supervised contrastive learning. This provides theoretical support for recent findings that contrastive learning with labels improves the performance of learned representations on in-domain downstream tasks, but can harm performance in transfer learning. We verify our theory with numerical experiments.
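To make the comparison concrete, below is a minimal NumPy sketch of the two objectives being compared: an InfoNCE-style contrastive loss and the reconstruction loss of a tied-weight linear autoencoder. This is not the paper's construction; the linear encoder `W`, the augmentation noise level, and the temperature `tau` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n samples in d dimensions; "augmentations" are noisy copies.
n, d, k = 8, 16, 4                          # batch size, input dim, representation dim
x = rng.normal(size=(n, d))
x_aug = x + 0.1 * rng.normal(size=(n, d))   # positive pairs via small perturbations

W = rng.normal(size=(d, k)) / np.sqrt(d)    # shared linear encoder (hypothetical)

def normalize(z):
    # Project representations onto the unit sphere, row-wise.
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def contrastive_loss(x, x_aug, W, tau=0.5):
    """InfoNCE-style loss: pull each sample toward its own augmentation,
    push it away from the other samples in the batch (the negatives)."""
    z, z_aug = normalize(x @ W), normalize(x_aug @ W)
    logits = z @ z_aug.T / tau              # pairwise cosine similarities / tau
    # Positive pairs sit on the diagonal; apply softmax cross-entropy against them.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def autoencoder_loss(x, W):
    """Linear autoencoder with tied weights: reconstruct x from its code x @ W."""
    x_hat = (x @ W) @ W.T
    return np.mean((x - x_hat) ** 2)

print("contrastive:", contrastive_loss(x, x_aug, W))
print("autoencoder:", autoencoder_loss(x, W))
```

Note the structural difference the sketch exposes: the contrastive objective uses the other samples in the batch as negatives, whereas the reconstruction objective treats each sample independently.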